Convergence of the Algorithm of Additive Regularization of Topic Models
Trudy Instituta matematiki i mehaniki, Trudy Instituta Matematiki i Mekhaniki UrO RAN, Vol. 26 (2020) no. 3, pp. 56-68
See the article record from the source Math-Net.Ru
The problem of probabilistic topic modeling is as follows. Given a collection
of text documents, find the conditional distribution over topics for each
document and the conditional distribution over words (or terms) for each topic.
The problem is solved by log-likelihood maximization. It generally
has infinitely many solutions and is ill-posed in the sense of Hadamard.
In the framework of Additive Regularization of Topic Models (ARTM), a weighted
sum of regularization criteria is added to the main log-likelihood criterion.
The numerical method for solving this optimization problem is an
iterative EM-type algorithm written in a general form for an
arbitrary smooth regularizer as well as for a linear combination of smooth
regularizers. This paper studies the convergence of the iterative EM
process. Sufficient conditions are obtained for the convergence to a stationary
point of the regularized log-likelihood. The constraints imposed on the regularizer
are not too restrictive. We give their interpretations from the point of view
of the practical implementation of the algorithm. A modification of the algorithm
is proposed that improves the convergence without additional time and memory costs.
Experiments on a news text collection have shown that our modification both
accelerates the convergence and improves the value of the criterion to be optimized.
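
To make the iteration discussed above concrete, here is a minimal Python/NumPy sketch of a regularized EM loop of this kind. It is an illustration only: the update rule (count estimates plus the term phi * dR/dphi, truncated at zero and normalized) is the form commonly given in the general ARTM literature, not something stated in this abstract, and the sketch does not implement the modification proposed in the paper. The names artm_em, normalize_columns, log_likelihood, reg_phi and reg_theta are hypothetical.

```python
import numpy as np


def normalize_columns(x):
    """Truncate negatives to zero, then scale each column to sum to 1.

    A column that is entirely zero after truncation is left as zeros.
    """
    x = np.maximum(x, 0.0)
    s = x.sum(axis=0, keepdims=True)
    return np.divide(x, s, out=np.zeros_like(x), where=s > 0)


def log_likelihood(n_dw, phi, theta):
    """Unregularized log-likelihood: sum over d, w of n_dw * ln p(w|d)."""
    p_dw = (phi @ theta).T
    mask = n_dw > 0
    return float(np.sum(n_dw[mask] * np.log(p_dw[mask])))


def artm_em(n_dw, num_topics, reg_phi=None, reg_theta=None,
            num_iters=50, seed=0):
    """Sketch of a regularized EM iteration for a topic model.

    n_dw      : (D, W) array of word counts per document.
    reg_phi   : optional callable (phi, theta) -> phi * dR/dphi, shape (W, T)
                or a scalar (assumed interface, for illustration only).
    reg_theta : optional callable (phi, theta) -> theta * dR/dtheta.
    Returns phi of shape (W, T) with p(w|t) and theta of shape (T, D)
    with p(t|d).
    """
    n_dw = np.asarray(n_dw, dtype=float)
    rng = np.random.default_rng(seed)
    num_docs, num_words = n_dw.shape
    phi = normalize_columns(rng.random((num_words, num_topics)))
    theta = normalize_columns(rng.random((num_topics, num_docs)))

    for _ in range(num_iters):
        # E-step, folded into the counts:
        # p(t|d,w) = phi[w,t] * theta[t,d] / sum_s phi[w,s] * theta[s,d]
        p_wd = phi @ theta
        ratio = np.divide(n_dw.T, p_wd, out=np.zeros_like(p_wd),
                          where=p_wd > 0)
        n_wt = phi * (ratio @ theta.T)   # sum over d of n_dw * p(t|d,w)
        n_td = theta * (phi.T @ ratio)   # sum over w of n_dw * p(t|d,w)

        # M-step: add the regularizer term, truncate at zero, normalize.
        r_phi = reg_phi(phi, theta) if reg_phi is not None else 0.0
        r_theta = reg_theta(phi, theta) if reg_theta is not None else 0.0
        phi = normalize_columns(n_wt + r_phi)
        theta = normalize_columns(n_td + r_theta)

    return phi, theta


if __name__ == "__main__":
    counts = np.array([[4, 3, 0, 0],
                       [5, 2, 0, 1],
                       [0, 0, 6, 3],
                       [0, 1, 4, 5]])
    phi, theta = artm_em(counts, num_topics=2, num_iters=30)
    print(log_likelihood(counts, phi, theta))
```

With reg_phi and reg_theta left as None the loop reduces to the plain PLSA EM algorithm. Passing, for example, reg_phi=lambda phi, theta: -0.1 corresponds to a sparsing regularizer of the form R = -0.1 * sum ln phi, again as an assumed example rather than a setting taken from the paper.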
Keywords: natural language processing, probabilistic topic modeling, probabilistic latent semantic analysis (PLSA), latent Dirichlet allocation (LDA), additive regularization of topic models (ARTM), EM-algorithm, sufficient conditions for convergence.
@article{TIMM_2020_26_3_a5,
author = {I. A. Irkhin and K. V. Vorontsov},
title = {Convergence of the {Algorithm} of {Additive} {Regularization} of {Topic} {Models}},
journal = {Trudy Instituta matematiki i mehaniki},
pages = {56--68},
publisher = {mathdoc},
volume = {26},
number = {3},
year = {2020},
language = {ru},
url = {http://geodesic.mathdoc.fr/item/TIMM_2020_26_3_a5/}
}
TY - JOUR
AU - I. A. Irkhin
AU - K. V. Vorontsov
TI - Convergence of the Algorithm of Additive Regularization of Topic Models
JO - Trudy Instituta matematiki i mehaniki
PY - 2020
SP - 56
EP - 68
VL - 26
IS - 3
PB - mathdoc
UR - http://geodesic.mathdoc.fr/item/TIMM_2020_26_3_a5/
LA - ru
ID - TIMM_2020_26_3_a5
ER -
I. A. Irkhin; K. V. Vorontsov. Convergence of the Algorithm of Additive Regularization of Topic Models. Trudy Instituta matematiki i mehaniki, Trudy Instituta Matematiki i Mekhaniki UrO RAN, Vol. 26 (2020) no. 3, pp. 56-68. http://geodesic.mathdoc.fr/item/TIMM_2020_26_3_a5/