Automatic Text Categorization: Methods and Problems
Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki, Kazanskii Gosudarstvennyi Universitet. Uchenye Zapiski. Seriya Fiziko-Matematichaskie Nauki, Volume 150 (2008) no. 4, pp. 25-40
See the record for this item from the source Math-Net.Ru
The paper analyzes three approaches to text categorization: manual categorization, knowledge-based categorization, and machine learning. The advantages and problems of each are described. Two methods intended to overcome the problems of automatic text categorization are then considered, and their evaluation on public collections is presented. The first method is based on a large linguistic resource, the RuThes thesaurus, together with the ALOT document processing technique. The second is a machine learning method of text categorization that generates descriptions of categories in the form of Boolean formulas.
Keywords:
document processing, automatic text categorization, thesaurus, machine learning.
@article{UZKU_2008_150_4_a1,
author = {M. S. Ageev and B. V. Dobrov and N. V. Loukachevitch},
title = {Automatic {Text} {Categorization:} {Methods} and {Problems}},
journal = {U\v{c}\"enye zapiski Kazanskogo universiteta. Seri\^a Fiziko-matemati\v{c}eskie nauki},
pages = {25--40},
publisher = {mathdoc},
volume = {150},
number = {4},
year = {2008},
language = {ru},
url = {http://geodesic.mathdoc.fr/item/UZKU_2008_150_4_a1/}
}
TY - JOUR
AU - M. S. Ageev
AU - B. V. Dobrov
AU - N. V. Loukachevitch
TI - Automatic Text Categorization: Methods and Problems
JO - Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki
PY - 2008
SP - 25
EP - 40
VL - 150
IS - 4
PB - mathdoc
UR - http://geodesic.mathdoc.fr/item/UZKU_2008_150_4_a1/
LA - ru
ID - UZKU_2008_150_4_a1
ER -
%0 Journal Article
%A M. S. Ageev
%A B. V. Dobrov
%A N. V. Loukachevitch
%T Automatic Text Categorization: Methods and Problems
%J Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki
%D 2008
%P 25-40
%V 150
%N 4
%I mathdoc
%U http://geodesic.mathdoc.fr/item/UZKU_2008_150_4_a1/
%G ru
%F UZKU_2008_150_4_a1
M. S. Ageev; B. V. Dobrov; N. V. Loukachevitch. Automatic Text Categorization: Methods and Problems. Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki, Kazanskii Gosudarstvennyi Universitet. Uchenye Zapiski. Seriya Fiziko-Matematichaskie Nauki, Volume 150 (2008) no. 4, pp. 25-40. http://geodesic.mathdoc.fr/item/UZKU_2008_150_4_a1/