Analysis of the texts for predicting the churn of ISP
Čelâbinskij fiziko-matematičeskij žurnal, Volume 3 (2018) no. 2, pp. 227-236.

Source record: Math-Net.Ru.

The possibility of forecasting customer churn from the data of a Russian ISP is considered. The main stages of, and approaches to, preprocessing the texts of operators' comments are identified. Classification algorithms such as logistic regression, the $k$-nearest neighbors method, gradient boosting and the naive Bayes classifier are proposed. The sample is an input dataset of 23 features for 380,000 subscribers. Typos in the textual data are corrected using the Damerau-Levenshtein distance and the text is lemmatized; it is then converted into feature vectors with the TF-IDF method and added to the model. The main approaches to encoding categorical features are identified. Forecasting models are constructed, the results obtained with different classifiers are compared, and conclusions are drawn.
Keywords: prediction, customer churn, ISP, Python, customer calls, classification, text analysis, TF-IDF.
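The two text-processing steps named in the abstract, typo correction with an edit distance followed by TF-IDF weighting, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the restricted (optimal string alignment) variant of the Damerau-Levenshtein distance [16], a hypothetical `correct_token` helper with an assumed threshold of 2 edits, the textbook weighting tf = count / document length and idf = log(N / df), and made-up comments that are assumed to be already tokenized and lemmatized.

```python
import math
from collections import Counter


def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance:
    insertions, deletions, substitutions and adjacent transpositions, cost 1 each."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete the first i characters of a
    for j in range(len(b) + 1):
        d[0][j] = j  # insert the first j characters of b
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def correct_token(token, dictionary, max_distance=2):
    """Hypothetical helper: replace an unknown token with the nearest
    dictionary word if it is at most max_distance edits away."""
    if token in dictionary:
        return token
    best, best_dist = token, max_distance + 1
    for word in sorted(dictionary):  # sorted for deterministic tie-breaking
        dist = osa_distance(token, word)
        if dist < best_dist:
            best, best_dist = word, dist
    return best


def tf_idf(documents):
    """TF-IDF for tokenized documents: tf = count / document length,
    idf = log(N / df). Returns the sorted vocabulary and one vector per document."""
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    vocabulary = sorted(doc_freq)
    idf = {term: math.log(len(documents) / df) for term, df in doc_freq.items()}
    vectors = []
    for tokens in documents:
        counts = Counter(tokens)
        vectors.append([counts[t] / len(tokens) * idf[t] if tokens else 0.0
                        for t in vocabulary])
    return vocabulary, vectors


# Hypothetical lemmatized operator comments; two tokens contain typos.
dictionary = {"интернет", "отключить", "оплата", "роутер", "замена"}
comments = [["интренет", "отключить"], ["интернет", "оплата"], ["роутр", "замена"]]
cleaned = [[correct_token(t, dictionary) for t in c] for c in comments]
terms, vectors = tf_idf(cleaned)
```

Here "интренет" is one adjacent transposition away from "интернет" and "роутр" one insertion away from "роутер", so both are corrected before weighting. The restricted variant can exceed the unrestricted Damerau-Levenshtein distance: for "ca" and "abc" it gives 3 rather than 2. Library TF-IDF implementations also differ in details; scikit-learn's `TfidfVectorizer`, for example, uses raw counts, a smoothed idf ln((1 + n) / (1 + df)) + 1 and L2 row normalization by default, so its weights will not match this sketch exactly.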
@article{CHFMJ_2018_3_2_a8,
     author = {A. A. Karyakina and D. S. Botov},
     title = {Analysis of the texts for predicting the churn of {ISP}},
     journal = {\v{C}el\^abinskij fiziko-matemati\v{c}eskij \v{z}urnal},
     pages = {227--236},
     publisher = {mathdoc},
     volume = {3},
     number = {2},
     year = {2018},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/CHFMJ_2018_3_2_a8/}
}
TY  - JOUR
AU  - A. A. Karyakina
AU  - D. S. Botov
TI  - Analysis of the texts for predicting the churn of ISP
JO  - Čelâbinskij fiziko-matematičeskij žurnal
PY  - 2018
SP  - 227
EP  - 236
VL  - 3
IS  - 2
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/CHFMJ_2018_3_2_a8/
LA  - ru
ID  - CHFMJ_2018_3_2_a8
ER  - 
%0 Journal Article
%A A. A. Karyakina
%A D. S. Botov
%T Analysis of the texts for predicting the churn of ISP
%J Čelâbinskij fiziko-matematičeskij žurnal
%D 2018
%P 227-236
%V 3
%N 2
%I mathdoc
%U http://geodesic.mathdoc.fr/item/CHFMJ_2018_3_2_a8/
%G ru
%F CHFMJ_2018_3_2_a8
A. A. Karyakina; D. S. Botov. Analysis of the texts for predicting the churn of ISP. Čelâbinskij fiziko-matematičeskij žurnal, Volume 3 (2018) no. 2, pp. 227-236. http://geodesic.mathdoc.fr/item/CHFMJ_2018_3_2_a8/

[1] Ageev M.S., Methods of automatic text classification based on machine learning and expert knowledge, Thesis, Moscow, 2004, 136 pp. (In Russ.)

[2] Popkov M.I., Automatic text classification system for the knowledge base of the enterprise, Master's Thesis, Lomonosov Moscow State University, Moscow, 2014, 56 pp. (In Russ.)

[3] Terminology extraction, (accessed 28.12.2017) https://en.wikipedia.org/wiki/Terminology_extraction

[4] El-Khair I. A., Term Weighting, (accessed 27.12.2017)

[5] Vektornaya model [Vector space model], (In Russ.) (accessed 28.12.2017) https://ru.wikipedia.org/wiki/Vektornaya_model

[6] Tokareva E.I., Hierarchical classification of texts, Graduation thesis, Lomonosov Moscow State University, Moscow, 2010, 46 pp. (In Russ.)

[7] TF-IDF, (accessed 29.12.2017) https://ru.wikipedia.org/wiki/TF-IDF

[8] A text tokenization function in Python, (In Russ.) (accessed 29.12.2017) http://zabaykin.ru/?p=77

[9] Dyakonov A., Python: kategorialnye priznaki [Python: categorical features], (In Russ.) (accessed 29.12.2017) https://alexanderdyakonov.wordpress.com/2016/08/03/python-kategorialnye-priznaki/

[10] Kravchenko A., Open machine learning course. Topic 6. Feature construction and selection, (In Russ.) (accessed 29.12.2017) https://habrahabr.ru/company/ods/blog/325422/

[11] Ikonomakis M., Kotsiantis S., Tampakas V., “Text classification using machine learning techniques”, WSEAS Transactions on Computers, 4:8 (2005), 966–974

[12] Text classification using a bag of words: a guide, (In Russ.) (accessed 29.12.2017) http://datareview.info/article/klassifikatsiya-tekstov-s-pomoshhyu-meshka-slov-rukovodstvo/

[13] Nizhybitskiy E.A., Review of classification algorithms for documents, (In Russ.) (accessed 29.12.2017) http://www.machinelearning.ru/wiki/images/e/ef/NizhibitskyKurs.pdf

[14] Fonarev A.Yu., Machine learning with categorical features, (In Russ.) (accessed 29.12.2017) http://www.machinelearning.ru/wiki/images/6/62/2014_517_FonarevAY.pdf

[15] Stop words, (In Russ.) (accessed 29.12.2017) https://klondike-studio.ru/wiki/stop-slova/

[16] Rasstoyanie Damerau-Levenshteina [Damerau-Levenshtein distance], (In Russ.) (accessed 29.12.2017) https://ru.wikipedia.org/wiki/Rasstoyanie_Damerau_–_Levenshteina

[17] Zobel J., Dart P., Phonetic String Matching: Lessons from Information Retrieval, (accessed 29.12.2017) http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.18.2138&rep=rep1&type=pdf

[18] Source code of the pyxDamerauLevenshtein library, (accessed 29.12.2017) https://github.com/gfairchild/pyxDamerauLevenshtein

[19] Source code, (accessed 29.12.2017) https://github.com/KiraTanaka/Prediction-churn-with-analysis-texts