Modification biterm topic model input feature for detecting topic in thematic virtual museums
Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ, Volume 14 (2018) no. 3, pp. 243-251. This article was harvested from the Math-Net.Ru source.

This paper describes a method, developed by the authors, for detecting topics in short text documents. The method, called Feature BTM, is based on a modification of the third step of the generative process of the well-known BTM model. Quality-evaluation experiments conducted by the authors show that the modified Feature BTM model is more effective than the Standard BTM model. The paper also describes the thematic document clustering technology needed to create thematic virtual museums. A performance evaluation by the authors shows that, despite a slight loss of speed (less than 30 seconds), Feature BTM clusters the virtual museum collection more effectively than the Standard BTM model.
Keywords: topic model, short text, BTM, clustering, thematic virtual museums.
Additional keywords: biterm
@article{VSPUI_2018_14_3_a4,
     author = {S. Anggai and I. S. Blekanov and S. L. Sergeev},
     title = {Modification biterm topic model input feature for detecting topic in thematic virtual museums},
     journal = {Vestnik Sankt-Peterburgskogo universiteta. Prikladna\^a matematika, informatika, processy upravleni\^a},
     pages = {243--251},
     year = {2018},
     volume = {14},
     number = {3},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/VSPUI_2018_14_3_a4/}
}
TY  - JOUR
AU  - S. Anggai
AU  - I. S. Blekanov
AU  - S. L. Sergeev
TI  - Modification biterm topic model input feature for detecting topic in thematic virtual museums
JO  - Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ
PY  - 2018
SP  - 243
EP  - 251
VL  - 14
IS  - 3
UR  - http://geodesic.mathdoc.fr/item/VSPUI_2018_14_3_a4/
LA  - en
ID  - VSPUI_2018_14_3_a4
ER  - 
%0 Journal Article
%A S. Anggai
%A I. S. Blekanov
%A S. L. Sergeev
%T Modification biterm topic model input feature for detecting topic in thematic virtual museums
%J Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ
%D 2018
%P 243-251
%V 14
%N 3
%U http://geodesic.mathdoc.fr/item/VSPUI_2018_14_3_a4/
%G en
%F VSPUI_2018_14_3_a4
S. Anggai; I. S. Blekanov; S. L. Sergeev. Modification biterm topic model input feature for detecting topic in thematic virtual museums. Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ, Volume 14 (2018) no. 3, pp. 243-251. http://geodesic.mathdoc.fr/item/VSPUI_2018_14_3_a4/

[1] Anggai S., The design and implementation of social networking at virtual museum of Indonesia (a case study museum of geology), Bandung Institute of Technology, Bandung, Indonesia, 2012, 60 pp.

[2] Foo S., “Online virtual exhibitions: concepts and design considerations”, Journal of Library and Information Technology, 28 (2008), 1–19

[3] Champion E., “Entertaining the similarities and distinctions between serious games and virtual heritage projects”, Entertainment Computing, 14 (2016), 67–74 | DOI

[4] Palombini A., “Storytelling and telling history. Towards a grammar of narratives for cultural heritage dissemination in the digital era”, Journal of Cultural Heritage, 24 (2017), 134–139 | DOI

[5] Deerwester S., Dumais S. T., Furnas G. W., Landauer T. K., Harshman R., “Indexing by latent semantic analysis”, Journal of the American Society for Information Science, 41 (1990), 391–407 | DOI

[6] Foltz P. W., “Using latent semantic indexing for information filtering”, SIGOIS Bull., 11 (1990), 40–47 | DOI

[7] Hofmann T., “Probabilistic latent semantic indexing”, Proceedings of the 22nd Annual International ACM SIGIR conference on Research and Development in Information Retrieval (Berkeley, California, USA, 1999), 50–57

[8] Hofmann T., “Unsupervised learning by probabilistic latent semantic analysis”, Machine Learning, 42 (2001), 177–196 | DOI | Zbl

[9] Blei D. M., Ng A. Y., Jordan M. I., “Latent Dirichlet allocation”, The Journal of Machine Learning Research, 3:2 (2003), 993–1022 | Zbl

[10] Yan X., Guo J., Lan Y., Cheng X., “A biterm topic model for short texts”, Proceedings of the 22nd International conference on World Wide Web, WWW'13 (Rio de Janeiro, Brazil, 2013), 1445–1456

[11] Cheng X., Yan X., Lan Y., Guo J., “Topic modeling over short texts”, IEEE Transactions on Knowledge and Data Engineering, 2014, 2928–2941 | DOI

[12] Xu J., Liu P., Wu G., Sun Z., Xu B., Hao H., “A fast matching method based on semantic similarity for short texts”, Natural Language Processing and Chinese Computing, 400 (2013), 299–309 | DOI

[13] Wang P., Zhang H., Liu B. X., Hao H., “Short text feature enrichment using link analysis on topic-keyword graph”, Natural Language Processing and Chinese Computing, 496 (2014), 79–90

[14] Griffiths T., Gibbs sampling in the generative model of latent Dirichlet allocation, Stanford technical report, Stanford, California, USA, 2002, 3 pp.

[15] He X., Xu H., Li J., He L., Yu L., “FastBTM: reducing the sampling time for biterm topic model”, Knowledge-Based Systems, 132 (2017), 11–20 | DOI

[16] Mimno D., Wallach H. M., Talley E., Leenders M., McCallum A., “Optimizing semantic coherence in topic models”, Proceedings of the Conference on Empirical Methods in Natural Language Processing (Edinburgh, United Kingdom, 2011), 262–272

[17] Stevens K., Kegelmeyer P., Andrzejewski D., Buttler D., “Exploring topic coherence over many models and many topics”, Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (Jeju Island, Korea, 2012), 952–961

[18] Xia Y., Tang N., Hussain A., Cambria E., “Discriminative Bi-Term topic model for headline-based social news clustering”, The Twenty-Eighth International Florida Artificial Intelligence Research Society Conference (Florida, North America, 2015), 311–316

[19] Manning C. D., Raghavan P., Schütze H., Introduction to Information Retrieval, Cambridge University Press, New York, USA, 2008, 61–123 | DOI

[20] Zhang W., Yoshida T., Tang X., “A comparative study of TF*IDF, LSI and multi-words for text classification”, Expert Systems with Applications, 38:2 (2011), 2758–2765 | DOI

[21] Anggai S., Blekanov I. S., Sergeev S. L., “Index data structure, functionality and microservices in thematic virtual museums”, Vestnik of Saint Petersburg University. Applied Mathematics. Computer Science. Control Processes, 14:1 (2018), 31–39 | DOI