Learning to predict closed questions on Stack Overflow
Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki, Uchenye Zapiski Kazanskogo Universiteta. Seriya Fiziko-Matematicheskie Nauki, Volume 155 (2013) no. 4, pp. 118-133. This article was harvested from the source Math-Net.Ru.


This paper addresses the problem of predicting whether a user's question will be closed by a moderator on Stack Overflow, a popular question-answering service devoted to software programming. The task, along with the data and evaluation metrics, was offered as an open machine learning competition on the Kaggle platform. To solve this problem, we employed a wide range of classification features related to users, their interactions, and post content. Classification was carried out using several machine learning methods. According to the experimental results, the most important features are user characteristics and topical features of the question. The best results were obtained using Vowpal Wabbit, an implementation of online learning based on stochastic gradient descent. Our results are among the best in the overall ranking, although they were obtained after the official competition was over.
Keywords: community question answering systems, large-scale classification, question classification.
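
The abstract names online learning based on stochastic gradient descent as the best-performing approach. As a rough illustration of that general technique only, the sketch below trains a logistic model one example at a time over hashed string features. It is not the authors' pipeline and does not use Vowpal Wabbit itself; the feature names, the toy examples, the learning rate, and the table size are all assumptions made up for this example.

```python
import math
import zlib
from collections import defaultdict

DIM = 1 << 20  # number of hashed feature slots (assumed value)


def hash_features(tokens):
    """Map string features to indices in a fixed-size weight table."""
    return [zlib.crc32(t.encode("utf-8")) % DIM for t in tokens]


def predict(weights, indices):
    """Probability that a question is closed, under a logistic model."""
    score = sum(weights[i] for i in indices)
    score = max(-30.0, min(30.0, score))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-score))


def sgd_update(weights, indices, label, lr=0.1):
    """One stochastic gradient step on the log loss for a single example."""
    error = predict(weights, indices) - label  # d(log loss)/d(score)
    for i in indices:
        weights[i] -= lr * error


def train(examples, passes=1, lr=0.1):
    """Process examples one at a time, updating weights after each."""
    weights = defaultdict(float)
    for _ in range(passes):
        for tokens, label in examples:
            sgd_update(weights, hash_features(tokens), label, lr)
    return weights


# Hypothetical toy data: (features, label) with 1 = closed, 0 = open.
toy = [
    (["tag=homework", "user_rep=low", "word=urgent"], 1),
    (["tag=python", "user_rep=high", "word=traceback"], 0),
    (["tag=homework", "user_rep=low", "word=plz"], 1),
    (["tag=java", "user_rep=high", "word=exception"], 0),
]
weights = train(toy, passes=20)
p_closed = predict(weights, hash_features(["tag=homework", "user_rep=low"]))
p_open = predict(weights, hash_features(["tag=python", "user_rep=high"]))
```

Because each update touches only the weights of the features present in one example, memory and per-example cost stay small, which is one reason this family of methods suits large-scale text classification.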
@article{UZKU_2013_155_4_a11,
     author = {G. Lezina and A. Kuznetsov and P. Braslavski},
     title = {Learning to predict closed questions on {Stack} {Overflow}},
     journal = {U\v{c}\"enye zapiski Kazanskogo universiteta. Seri\^a Fiziko-matemati\v{c}eskie nauki},
     pages = {118--133},
     year = {2013},
     volume = {155},
     number = {4},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/UZKU_2013_155_4_a11/}
}
TY  - JOUR
AU  - G. Lezina
AU  - A. Kuznetsov
AU  - P. Braslavski
TI  - Learning to predict closed questions on Stack Overflow
JO  - Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki
PY  - 2013
SP  - 118
EP  - 133
VL  - 155
IS  - 4
UR  - http://geodesic.mathdoc.fr/item/UZKU_2013_155_4_a11/
LA  - en
ID  - UZKU_2013_155_4_a11
ER  - 
%0 Journal Article
%A G. Lezina
%A A. Kuznetsov
%A P. Braslavski
%T Learning to predict closed questions on Stack Overflow
%J Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki
%D 2013
%P 118-133
%V 155
%N 4
%U http://geodesic.mathdoc.fr/item/UZKU_2013_155_4_a11/
%G en
%F UZKU_2013_155_4_a11
G. Lezina; A. Kuznetsov; P. Braslavski. Learning to predict closed questions on Stack Overflow. Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki, Uchenye Zapiski Kazanskogo Universiteta. Seriya Fiziko-Matematicheskie Nauki, Volume 155 (2013) no. 4, pp. 118-133. http://geodesic.mathdoc.fr/item/UZKU_2013_155_4_a11/
