Topic categorization based on collectives of term weighting methods for natural language call routing
Žurnal Sibirskogo federalʹnogo universiteta. Matematika i fizika, Vol. 9 (2016) no. 2, pp. 235-245.

See the article record from the source Math-Net.Ru

Natural language call routing is an important data analysis problem with applications in many domains, including the aerospace industry. This paper investigates collectives of term weighting methods for natural language call routing, treated as a text classification task. The main idea is that a collective of different term weighting methods can improve classification effectiveness while the classification algorithm stays the same. Seven unsupervised and supervised term weighting methods were tested and compared with one another, using k-NN as the classifier. Different combinations of these methods were then formed into collectives. Two approaches to handling the collectives were considered: a meta-classifier based on rule induction and a majority vote procedure. The numerical experiments show that the best result is obtained with the vote of all seven term weighting methods. This combination significantly increases classification effectiveness in comparison with the most effective individual term weighting methods.
Keywords: natural language call routing, term weighting, text classification.
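
The majority vote procedure mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes that each term weighting method has already produced one predicted topic per utterance (for example, from a k-NN classifier trained on that weighting). The function name `majority_vote`, the input layout, the example labels, and the tie-breaking rule (the label seen first in method order wins) are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions_per_method):
    """Combine per-method label predictions by plurality vote.

    predictions_per_method: list of equal-length label lists, one list per
    term weighting method (hypothetical layout, not from the paper).
    Ties go to the label seen first in method order (an assumption).
    """
    if not predictions_per_method:
        return []
    n = len(predictions_per_method[0])
    if any(len(p) != n for p in predictions_per_method):
        raise ValueError("all methods must label the same utterances")
    combined = []
    # zip(*...) groups the votes cast for the same utterance together.
    for votes in zip(*predictions_per_method):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Three hypothetical classifiers, each built on a different term weighting.
votes = [
    ["billing", "support", "sales"],
    ["billing", "sales", "sales"],
    ["support", "support", "billing"],
]
print(majority_vote(votes))  # ['billing', 'support', 'sales']
```

With an odd number of voters and two candidate labels a tie cannot occur, but with more than two topics it can, so any real use of such a vote needs an explicit tie-breaking rule like the one assumed here.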
@article{JSFU_2016_9_2_a12,
     author = {Roman B. Sergienko and Muhammad Shan and Wolfgang Minker and Eugene S. Semenkin},
     title = {Topic categorization based on collectives of term weighting methods for natural language call routing},
     journal = {\v{Z}urnal Sibirskogo federalʹnogo universiteta. Matematika i fizika},
     pages = {235--245},
     publisher = {mathdoc},
     volume = {9},
     number = {2},
     year = {2016},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/JSFU_2016_9_2_a12/}
}
Roman B. Sergienko; Muhammad Shan; Wolfgang Minker; Eugene S. Semenkin. Topic categorization based on collectives of term weighting methods for natural language call routing. Žurnal Sibirskogo federalʹnogo universiteta. Matematika i fizika, Vol. 9 (2016) no. 2, pp. 235-245. http://geodesic.mathdoc.fr/item/JSFU_2016_9_2_a12/
