An opensource library for AutoML multimodal clustering on Apache Spark
Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part IV, Tome 540 (2024), pp. 178-193
See the article record from the source Math-Net.Ru
We present a library for choosing and configuring a clustering algorithm for multimodal datasets, i.e., data in which each object is stored not as a single vector but may be represented simultaneously as a vector, a text, and an image, with every modality being significant. For the given input data, the library automatically balances exploration and exploitation across a set of implemented clustering algorithms, guided by the selected internal clustering validation index. It also implements a recommender system for the internal validation index that can predict the best-fitting measure for the input data. The clustering algorithms are implemented on Apache Spark, so the library can be used on distributed computing systems to cluster big multimodal data.
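The abstract does not say how the exploration/exploitation tradeoff is realised, so the following is only an illustrative sketch and not the library's API or the authors' method. It assumes a UCB1 multi-armed bandit in which each arm is a clustering algorithm, each pull runs that algorithm with a randomly drawn number of clusters, and the reward is the silhouette index rescaled to [0, 1]. The toy 1-D data, the two algorithms (`kmeans_1d`, `gap_cut_1d`), the function `select_algorithm`, and the absence of Spark are all assumptions made to keep the example self-contained.

```python
import math
import random


def silhouette(xs, labels):
    """Mean silhouette width for 1-D points; -1.0 if fewer than two clusters."""
    clusters = {}
    for x, lab in zip(xs, labels):
        clusters.setdefault(lab, []).append(x)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for x, lab in zip(xs, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(abs(x - y) for y in own) / (len(own) - 1)
        b = min(sum(abs(x - y) for y in other) / len(other)
                for key, other in clusters.items() if key != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(xs)


def kmeans_1d(xs, k, rng, iters=20):
    """Plain Lloyd iterations on 1-D data with randomly chosen initial centers."""
    centers = rng.sample(xs, k)
    labels = [0] * len(xs)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in xs]
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels


def gap_cut_1d(xs, k, rng):
    """Sort the points and cut at the k-1 largest gaps (rng is unused)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    positions = sorted(range(1, len(xs)),
                       key=lambda p: xs[order[p]] - xs[order[p - 1]],
                       reverse=True)
    cuts = set(positions[:k - 1])
    labels = [0] * len(xs)
    current = 0
    for pos, i in enumerate(order):
        if pos in cuts:
            current += 1
        labels[i] = current
    return labels


def select_algorithm(xs, arms, budget=30, seed=0):
    """UCB1 over clustering algorithms; returns the best run and pull counts."""
    rng = random.Random(seed)
    pulls = {name: 0 for name in arms}
    mean_reward = {name: 0.0 for name in arms}
    best = (-2.0, None, None)  # (silhouette, algorithm name, k)
    for t in range(1, budget + 1):
        untried = [name for name in arms if pulls[name] == 0]
        if untried:
            name = untried[0]  # try every arm once before using UCB1
        else:
            name = max(arms, key=lambda n: mean_reward[n]
                       + math.sqrt(2 * math.log(t) / pulls[n]))
        k = rng.randint(2, 6)  # hypothetical hyperparameter range
        score = silhouette(xs, arms[name](xs, k, rng))
        reward = (score + 1) / 2  # rescale [-1, 1] to [0, 1] for UCB1
        pulls[name] += 1
        mean_reward[name] += (reward - mean_reward[name]) / pulls[name]
        if score > best[0]:
            best = (score, name, k)
    return best, pulls


# Toy data: three well-separated 1-D blobs.
data_rng = random.Random(1)
data = [data_rng.gauss(c, 0.3) for c in (0.0, 5.0, 10.0) for _ in range(20)]
arms = {"kmeans": kmeans_1d, "gap_cut": gap_cut_1d}
best, pulls = select_algorithm(data, arms)
print("best (silhouette, algorithm, k):", best)
print("pulls per algorithm:", pulls)
```

In this sketch the confidence term sends early pulls to every algorithm, while later pulls concentrate on whichever algorithm has produced the higher validation index so far. A distributed version would replace the toy algorithms and the index computation with Spark jobs.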
@article{ZNSL_2024_540_a9,
author = {S. Muravyov and V. Kazakovtsev and I. Usov and P. Shpineva and O. Muravyova and A. Shalyto},
title = {An opensource library for {AutoML} multimodal clustering on {Apache} {Spark}},
journal = {Zapiski Nauchnykh Seminarov POMI},
pages = {178--193},
publisher = {mathdoc},
volume = {540},
year = {2024},
language = {en},
url = {http://geodesic.mathdoc.fr/item/ZNSL_2024_540_a9/}
}
TY  - JOUR
AU  - S. Muravyov
AU  - V. Kazakovtsev
AU  - I. Usov
AU  - P. Shpineva
AU  - O. Muravyova
AU  - A. Shalyto
TI  - An opensource library for AutoML multimodal clustering on Apache Spark
JO  - Zapiski Nauchnykh Seminarov POMI
PY  - 2024
SP  - 178
EP  - 193
VL  - 540
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/ZNSL_2024_540_a9/
LA  - en
ID  - ZNSL_2024_540_a9
ER  -
%0 Journal Article
%A S. Muravyov
%A V. Kazakovtsev
%A I. Usov
%A P. Shpineva
%A O. Muravyova
%A A. Shalyto
%T An opensource library for AutoML multimodal clustering on Apache Spark
%J Zapiski Nauchnykh Seminarov POMI
%D 2024
%P 178-193
%V 540
%I mathdoc
%U http://geodesic.mathdoc.fr/item/ZNSL_2024_540_a9/
%G en
%F ZNSL_2024_540_a9
S. Muravyov; V. Kazakovtsev; I. Usov; P. Shpineva; O. Muravyova; A. Shalyto. An opensource library for AutoML multimodal clustering on Apache Spark. Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part IV, Tome 540 (2024), pp. 178-193. http://geodesic.mathdoc.fr/item/ZNSL_2024_540_a9/