An opensource library for AutoML multimodal clustering on Apache Spark
Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part IV, Vol. 540 (2024), pp. 178-193. This article was harvested from the Math-Net.Ru source.

We present a library for choosing and configuring a clustering algorithm for multimodal datasets, i.e., data in which each object is not stored as a single vector but can be represented as a vector, a text, and an image at the same time, with every modality significant. The library automatically balances exploration and exploitation when searching over its set of implemented clustering algorithms for the input data, guided by the selected internal clustering validation index. It also implements a recommender system for internal validation indices that can predict the best-fitting measure for the input data. The clustering algorithms are implemented on Apache Spark, so the library can be used on distributed computing systems to cluster big multimodal data.
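The abstract does not spell out how the exploration/exploitation tradeoff is resolved. As a rough illustration only, the idea can be sketched with UCB1, a standard multi-armed bandit rule: each clustering algorithm is an arm, and the reward of a pull is the value of the chosen internal validation index for one sampled configuration. This is not the library's API. The names `ucb1_select`, `run_bandit`, `toy_kmeans`, and `toy_gmm` are hypothetical, and the two arms return synthetic scores in place of real Spark clustering runs.

```python
import math
import random


def ucb1_select(counts, totals, step, c=math.sqrt(2)):
    """Return the index of the arm with the highest UCB1 score."""
    for arm, n in enumerate(counts):
        if n == 0:  # play every arm once before using the confidence bound
            return arm
    scores = [
        totals[arm] / counts[arm] + c * math.sqrt(math.log(step) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_bandit(arms, budget, seed=0):
    """Spend `budget` evaluations across `arms` using UCB1.

    `arms` maps a name to a hypothetical callable rng -> (config, score).
    The score stands in for an internal validation index scaled so that
    higher is better. Returns (best_score, best_arm, best_config, counts).
    """
    rng = random.Random(seed)
    names = list(arms)
    counts = [0] * len(names)
    totals = [0.0] * len(names)
    best = (-math.inf, None, None)
    for step in range(1, budget + 1):
        arm = ucb1_select(counts, totals, step)
        config, score = arms[names[arm]](rng)
        counts[arm] += 1
        totals[arm] += score
        if score > best[0]:
            best = (score, names[arm], config)
    return best + (dict(zip(names, counts)),)


# Synthetic stand-ins (assumptions for illustration): a real arm would fit
# a clustering model with the sampled configuration and compute the index.
def toy_kmeans(rng):
    k = rng.randint(2, 10)
    score = 0.8 - 0.05 * abs(k - 4) + rng.uniform(-0.05, 0.05)
    return {"k": k}, max(0.0, score)


def toy_gmm(rng):
    k = rng.randint(2, 10)
    score = 0.5 - 0.05 * abs(k - 6) + rng.uniform(-0.05, 0.05)
    return {"k": k}, max(0.0, score)


best_score, best_arm, best_config, pulls = run_bandit(
    {"kmeans": toy_kmeans, "gmm": toy_gmm}, budget=200
)
print(best_arm, best_config, round(best_score, 3), pulls)
```

In this sketch the arm with the higher average score receives most of the budget, while the weaker arm is still sampled occasionally because its confidence bound grows as it goes untried.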
@article{ZNSL_2024_540_a9,
     author = {S. Muravyov and V. Kazakovtsev and I. Usov and P. Shpineva and O. Muravyova and A. Shalyto},
     title = {An opensource library for {AutoML} multimodal clustering on {Apache} {Spark}},
     journal = {Zapiski Nauchnykh Seminarov POMI},
     pages = {178--193},
     year = {2024},
     volume = {540},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/ZNSL_2024_540_a9/}
}
S. Muravyov; V. Kazakovtsev; I. Usov; P. Shpineva; O. Muravyova; A. Shalyto. An opensource library for AutoML multimodal clustering on Apache Spark. Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part IV, Vol. 540 (2024), pp. 178-193. http://geodesic.mathdoc.fr/item/ZNSL_2024_540_a9/