Matrix text models. Interpretation and experimental verification of models
Matematičeskoe modelirovanie, Vol. 32 (2020) no. 7, pp. 24-46.

See the article record from the source Math-Net.Ru

The interpretation of matrix models of texts and text collections is considered. Examples of computationally constructed models of text collections are presented; they demonstrate the richness of the modeling results and the possibilities for practical use of the proposed approaches. An original method for experimentally verifying whether text models are acceptable for semantic search and for the analysis of unstructured text information is described, and the results of the corresponding large-scale experiment are presented.
Keywords: natural language texts, text models, text collections, text information retrieval, model verification.
@article{MM_2020_32_7_a1,
     author = {M. G. Kreines and E. M. Kreines},
     title = {Matrix text models. {Interpretation} and experimental verification of models},
     journal = {Matemati\v{c}eskoe modelirovanie},
     pages = {24--46},
     publisher = {mathdoc},
     volume = {32},
     number = {7},
     year = {2020},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MM_2020_32_7_a1/}
}
TY  - JOUR
AU  - M. G. Kreines
AU  - E. M. Kreines
TI  - Matrix text models. Interpretation and experimental verification of models
JO  - Matematičeskoe modelirovanie
PY  - 2020
SP  - 24
EP  - 46
VL  - 32
IS  - 7
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/MM_2020_32_7_a1/
LA  - ru
ID  - MM_2020_32_7_a1
ER  - 
%0 Journal Article
%A M. G. Kreines
%A E. M. Kreines
%T Matrix text models. Interpretation and experimental verification of models
%J Matematičeskoe modelirovanie
%D 2020
%P 24-46
%V 32
%N 7
%I mathdoc
%U http://geodesic.mathdoc.fr/item/MM_2020_32_7_a1/
%G ru
%F MM_2020_32_7_a1
M. G. Kreines; E. M. Kreines. Matrix text models. Interpretation and experimental verification of models. Matematičeskoe modelirovanie, Vol. 32 (2020) no. 7, pp. 24-46. http://geodesic.mathdoc.fr/item/MM_2020_32_7_a1/

[1] D. Mimno, H. Wallach, E. Talley, M. Leenders, A. McCallum, “Optimizing semantic coherence in topic models”, Proc. of Conf. on Empirical Methods in Natural Language Processing (Edinburgh, Scotland, UK, July 27–31, 2011), 262–272

[2] D. Newman, J. H. Lau, K. Grieser, T. Baldwin, “Automatic evaluation of topic coherence”, Human Language Technologies, Annual Conf. of the North American Chapter of the ACL (Los Angeles, California, 2010), 100–108

[3] D. Newman, Y. Noh, E. Talley, S. Karimi, T. Baldwin, “Evaluating topic models for digital libraries”, Proc. of 10th Annual Joint Conf. on Digital Libraries, ACM, New York, 2010, 215–224

[4] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors with subword information, 2016, 12 pp., arXiv: 1607.04606v1 [cs.CL]

[5] M. J. Kusner, Y. Sun, N. I. Kolkin, K. Q. Weinberger, “From Word Embeddings To Document Distances”, Proc. of 32nd Inter. Conf. on Machine Learning (Lille, France, 2015), JMLR W&CP, 37, 957–966

[6] G. Huang, Ch. Guo, M. J. Kusner, Y. Sun, K. Q. Weinberger, F. Sha, “Supervised Word Mover's Distance”, 30th Conf. on Neural Inform. Proc. Syst. (Barcelona, Spain, 2016), 9 pp.

[7] T. Saracevic, “Effects of inconsistent relevance judgments on information retrieval test results: A historical perspective”, Library Trends, 56:4 (2008), 763–783

[8] K. V. Vorontsov, “Additive Regularization for Topic Models of Text Collections”, Doklady Mathematics, 89:3 (2014), 301–304

[9] K. V. Vorontsov, A. A. Potapenko, Additivnaia reguliarizatsiia tematicheskikh modelei [Additive regularization of topic models], 2014, 22 pp.

[10] M. G. Kreines, E. M. Kreines, “Matrix text models. Text models and similarity of text contents”, MM, 12:5 (2020)

[11] M. G. Kreines, E. M. Kreines, “Matrix text models. Text corpora models”, MM, 12:5 (2020)

[12] D. Blei, J. Lafferty, “A correlated topic model of Science”, Annals of Applied Statistics, 1 (2007), 17–35

[13] M. G. Kreines, E. M. Kreines, “The control model for the selection of reference collections providing the impartial assessment of the quality of scientific and technological publications by using bibliometric and scientometric indicators”, Journal of Computer and Systems Sciences International, 55:5, 750–766

[14] W. B. Frakes, R. Baeza-Yates, Information Retrieval: Data Structures and Algorithms, Prentice Hall, Englewood Cliffs, New Jersey, 1992, 630 pp.

[15] G. Salton, C. Buckley, “Term-weighting approaches in automatic text retrieval”, Inform. Processing Management, 24:5 (1988), 513–523

[16] S. E. Robertson, S. Walker, M. Beaulieu, “Experimentation as a way of life: Okapi at TREC”, Inform. Processing Management, 36 (2000), 95–108

[17] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, R. Harshman, “Indexing by Latent Semantic Analysis”, J. of American Soc. for Inform. Sci., 41:6 (1990), 391–407

[18] D. M. Blei, “Probabilistic topic models”, Communications of the ACM, 55:4 (2012), 77–84

[19] M. Chen, Z. Xu, K. Q. Weinberger, F. Sha, ICML 2012, 2012, 8 pp., arXiv: 1206.4683 [cs.LG]

[20] A. Perina, N. Jojic, M. Bicego, A. Truski, “Documents as multiple overlapping windows into grids of counts”, NIPS 2013, 10–18