Effective clustering of a text sample depending on the different parameterization of this sample
Informacionnye tehnologii i vyčislitelnye sistemy, no. 4 (2019), pp. 60-69
This article was harvested from the source Math-Net.Ru
The Internet has become the primary means of receiving text news. As a result, large amounts of data need to be processed automatically, and one of the most important tasks is the automated organization of text information. In this paper we consider the problem of effectively clustering objects from a text sample. The most common representation of a text set is a matrix whose elements are values of a statistical measure calculated from word frequencies. As an alternative, we propose parameterizing the texts by their keywords. We use two clustering methods: k-means and DBSCAN. The paper analyzes these methods and compares the resulting clustering quality, which depends on the text parameterization and on the algorithm used.
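The two parameterizations contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which tf-idf variant or keyword extraction procedure the paper uses, so the whitespace tokenizer, the tf-idf formula (relative term frequency times log(N/df)), the function names `tfidf_matrix` and `keyword_matrix`, the sample documents, and the keyword list are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, matrix); matrix[i][j] is the tf-idf of vocab[j] in docs[i].

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(w for tokens in tokenized for w in set(tokens))
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        row = [(counts[w] / total) * math.log(n_docs / df[w]) for w in vocab]
        matrix.append(row)
    return vocab, matrix


def keyword_matrix(docs, keywords):
    """Return a binary matrix; entry [i][j] is 1 if keywords[j] occurs in docs[i]."""
    tokenized = [set(doc.lower().split()) for doc in docs]
    return [[1 if k in tokens else 0 for k in keywords] for tokens in tokenized]


# Hypothetical sample: two finance headlines and one sports headline.
docs = [
    "stocks fall as markets react",
    "markets rally as stocks recover",
    "team wins the final match",
]
vocab, X_tfidf = tfidf_matrix(docs)

# Hypothetical, hand-picked keyword list.
X_kw = keyword_matrix(docs, ["stocks", "markets", "match"])
```

Each row of `X_tfidf` or `X_kw` is a feature vector for one document and could be passed to any k-means or DBSCAN implementation. The keyword representation is much lower-dimensional, since it has one column per keyword rather than one per vocabulary word.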
Keywords:
Clustering, text set, sample parameterization, tf-idf-measure, keywords, effective method.
@article{ITVS_2019_4_a5,
author = {E. A. Golovastova and D. N. Krasotin},
title = {Effective clustering of a text sample depending on the different parameterization of this sample},
journal = {Informacionnye tehnologii i vy\v{c}islitelnye sistemy},
pages = {60--69},
year = {2019},
number = {4},
language = {ru},
url = {http://geodesic.mathdoc.fr/item/ITVS_2019_4_a5/}
}
TY - JOUR
AU - E. A. Golovastova
AU - D. N. Krasotin
TI - Effective clustering of a text sample depending on the different parameterization of this sample
JO - Informacionnye tehnologii i vyčislitelnye sistemy
PY - 2019
SP - 60
EP - 69
IS - 4
UR - http://geodesic.mathdoc.fr/item/ITVS_2019_4_a5/
LA - ru
ID - ITVS_2019_4_a5
ER -
%0 Journal Article
%A E. A. Golovastova
%A D. N. Krasotin
%T Effective clustering of a text sample depending on the different parameterization of this sample
%J Informacionnye tehnologii i vyčislitelnye sistemy
%D 2019
%P 60-69
%N 4
%U http://geodesic.mathdoc.fr/item/ITVS_2019_4_a5/
%G ru
%F ITVS_2019_4_a5
E. A. Golovastova; D. N. Krasotin. Effective clustering of a text sample depending on the different parameterization of this sample. Informacionnye tehnologii i vyčislitelnye sistemy, no. 4 (2019), pp. 60-69. http://geodesic.mathdoc.fr/item/ITVS_2019_4_a5/