Convergence of the Algorithm of Additive Regularization of Topic Models
Trudy Instituta Matematiki i Mekhaniki UrO RAN, Volume 26 (2020), no. 3, pp. 56-68

See the article record from the source Math-Net.Ru.

The problem of probabilistic topic modeling is as follows: given a collection of text documents, find the conditional distribution over topics for each document and the conditional distribution over words (or terms) for each topic. The problem is solved by log-likelihood maximization. In general it has an infinite set of solutions and is ill-posed in the sense of Hadamard. In the framework of Additive Regularization of Topic Models (ARTM), a weighted sum of regularization criteria is added to the main log-likelihood criterion. The numerical method for solving this optimization problem is a variant of the iterative EM algorithm, written in a general form for an arbitrary smooth regularizer and for a linear combination of smooth regularizers. This paper studies the convergence of the EM iterative process. Sufficient conditions are obtained for convergence to a stationary point of the regularized log-likelihood. The constraints these conditions impose on the regularizer are not overly restrictive, and we interpret them from the standpoint of the practical implementation of the algorithm. We also propose a modification of the algorithm that improves convergence at no additional cost in time or memory. Experiments on a collection of news texts show that the modification both accelerates convergence and improves the value of the optimized criterion.
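To make the regularized EM iteration concrete, here is a minimal sketch in the standard ARTM form: the E-step computes p(t|d,w) = norm_t(φ_wt θ_td), and the M-step sets φ_wt = norm_w(n_wt + φ_wt ∂R/∂φ_wt) and θ_td = norm_t(n_td + θ_td ∂R/∂θ_td), where norm clips negatives to zero and renormalizes. Several assumptions apply. The regularizer is taken to be the simple smoothing one, R = β Σ ln φ_wt + α Σ ln θ_td, so that φ ∂R/∂φ = β and θ ∂R/∂θ = α. The function and parameter names (`artm_em`, `num_topics`, `alpha`, `beta`) are hypothetical. This is not the paper's proposed modification, which the abstract does not specify.

```python
import numpy as np


def _norm(x, axis):
    """ARTM 'norm' operator: clip negatives to zero, then normalize along `axis`.
    Slices that become all-zero are left as zeros."""
    x = np.maximum(x, 0.0)
    s = x.sum(axis=axis, keepdims=True)
    return np.divide(x, s, out=np.zeros_like(x), where=s > 0)


def regularized_objective(n_wd, phi, theta, alpha, beta):
    """Log-likelihood sum_{d,w} n_dw ln p(w|d) plus the smoothing regularizer R."""
    p_wd = phi @ theta
    mask = n_wd > 0
    value = float(np.sum(n_wd[mask] * np.log(p_wd[mask])))
    if beta != 0:
        value += beta * float(np.sum(np.log(phi)))
    if alpha != 0:
        value += alpha * float(np.sum(np.log(theta)))
    return value


def artm_em(n_wd, num_topics, alpha=0.0, beta=0.0, iters=50, seed=0):
    """Regularized EM for a W x D count matrix n_wd (rows: words, columns: documents).
    Returns phi (W x T), theta (T x D), and the objective value after each iteration.
    Assumption: smoothing only (alpha, beta >= 0); alpha = beta = 0 gives plain PLSA."""
    assert alpha >= 0 and beta >= 0, "this sketch covers the smoothing regularizer only"
    n_wd = np.asarray(n_wd, dtype=float)
    rng = np.random.default_rng(seed)
    W, D = n_wd.shape
    phi = _norm(rng.random((W, num_topics)), axis=0)    # columns are p(w|t)
    theta = _norm(rng.random((num_topics, D)), axis=0)  # columns are p(t|d)
    history = []
    for _ in range(iters):
        # E-step, vectorized: n_wt = sum_d n_dw p(t|d,w) and n_td = sum_w n_dw p(t|d,w),
        # using p(t|d,w) = phi_wt theta_td / p(w|d).
        p_wd = phi @ theta
        ratio = np.divide(n_wd, p_wd, out=np.zeros_like(p_wd), where=n_wd > 0)
        n_wt = phi * (ratio @ theta.T)
        n_td = theta * (phi.T @ ratio)
        # M-step: add the regularizer terms phi*dR/dphi = beta, theta*dR/dtheta = alpha.
        phi = _norm(n_wt + beta, axis=0)
        theta = _norm(n_td + alpha, axis=0)
        history.append(regularized_objective(n_wd, phi, theta, alpha, beta))
    return phi, theta, history
```

For α, β ≥ 0 this particular regularizer makes each iteration an ordinary MAP-EM step, so the values in `history` should be non-decreasing up to rounding. A negative β (sparsing) would let the norm operator zero out entries and send ln φ to −∞. That case, and general regularizers, are exactly where the paper's convergence conditions are needed.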
Keywords: natural language processing, probabilistic topic modeling, probabilistic latent semantic analysis (PLSA), latent Dirichlet allocation (LDA), additive regularization of topic models (ARTM), EM algorithm, sufficient conditions for convergence.
@article{TIMM_2020_26_3_a5,
     author = {I. A. Irkhin and K. V. Vorontsov},
     title = {Convergence of the {Algorithm} of {Additive} {Regularization} of {Topic} {Models}},
     journal = {Trudy Instituta matematiki i mehaniki},
     pages = {56--68},
     publisher = {mathdoc},
     volume = {26},
     number = {3},
     year = {2020},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/TIMM_2020_26_3_a5/}
}
TY  - JOUR
AU  - I. A. Irkhin
AU  - K. V. Vorontsov
TI  - Convergence of the Algorithm of Additive Regularization of Topic Models
JO  - Trudy Instituta matematiki i mehaniki
PY  - 2020
SP  - 56
EP  - 68
VL  - 26
IS  - 3
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/TIMM_2020_26_3_a5/
LA  - ru
ID  - TIMM_2020_26_3_a5
ER  - 
%0 Journal Article
%A I. A. Irkhin
%A K. V. Vorontsov
%T Convergence of the Algorithm of Additive Regularization of Topic Models
%J Trudy Instituta matematiki i mehaniki
%D 2020
%P 56-68
%V 26
%N 3
%I mathdoc
%U http://geodesic.mathdoc.fr/item/TIMM_2020_26_3_a5/
%G ru
%F TIMM_2020_26_3_a5
I. A. Irkhin; K. V. Vorontsov. Convergence of the Algorithm of Additive Regularization of Topic Models. Trudy Instituta Matematiki i Mekhaniki UrO RAN, Vol. 26 (2020), no. 3, pp. 56-68. http://geodesic.mathdoc.fr/item/TIMM_2020_26_3_a5/