Automatic Text Categorization: Methods and Problems
Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki, Kazanskii Gosudarstvennyi Universitet. Uchenye Zapiski. Seriya Fiziko-Matematicheskie Nauki, Volume 150 (2008) no. 4, pp. 25-40

See the record for this item from the source Math-Net.Ru

The paper analyzes three approaches to text categorization: manual categorization, knowledge-based categorization, and machine learning. Their advantages and problems are described. Two approaches intended to overcome the problems of automatic text categorization are then considered, and their evaluation on public collections is presented. The first is based on a large linguistic resource, the RuThes thesaurus, and the ALOT document processing technique. The second is a machine learning method of text categorization that generates descriptions of categories in the form of Boolean formulas.
Keywords: document processing, automatic text categorization, thesaurus, machine learning.
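
As an illustration of what a category description "in the form of a Boolean formula" could look like, the following is a minimal sketch, not the authors' algorithm. It assumes a formula in disjunctive normal form over term presence; the names `tokenize`, `matches`, and `SPORT_FORMULA`, as well as the terms themselves, are hypothetical. The sketch only applies a hand-written formula to a document; how the paper's method learns such formulas is not stated in the abstract.

```python
import re


def tokenize(text):
    """Lowercase the text and return the set of alphabetic tokens it contains."""
    return set(re.findall(r"[a-z]+", text.lower()))


# Hypothetical category description in disjunctive normal form (DNF).
# The formula is a list of clauses joined by OR; each clause is a tuple of
# (term, must_be_present) literals joined by AND.
# Reads as: football OR (match AND NOT chess)
SPORT_FORMULA = [
    (("football", True),),
    (("match", True), ("chess", False)),
]


def matches(formula, text):
    """Return True if the document text satisfies at least one clause."""
    tokens = tokenize(text)
    return any(
        all((term in tokens) == present for term, present in clause)
        for clause in formula
    )
```

A document is assigned to the category when `matches(SPORT_FORMULA, text)` returns `True`. One appeal of this representation is that a human expert can read and edit the formula directly.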
@article{UZKU_2008_150_4_a1,
     author = {M. S. Ageev and B. V. Dobrov and N. V. Loukachevitch},
     title = {Automatic {Text} {Categorization:} {Methods} and {Problems}},
     journal = {U\v{c}\"enye zapiski Kazanskogo universiteta. Seri\^a Fiziko-matemati\v{c}eskie nauki},
     pages = {25--40},
     publisher = {mathdoc},
     volume = {150},
     number = {4},
     year = {2008},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/UZKU_2008_150_4_a1/}
}
TY  - JOUR
AU  - M. S. Ageev
AU  - B. V. Dobrov
AU  - N. V. Loukachevitch
TI  - Automatic Text Categorization: Methods and Problems
JO  - Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki
PY  - 2008
SP  - 25
EP  - 40
VL  - 150
IS  - 4
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/UZKU_2008_150_4_a1/
LA  - ru
ID  - UZKU_2008_150_4_a1
ER  - 
%0 Journal Article
%A M. S. Ageev
%A B. V. Dobrov
%A N. V. Loukachevitch
%T Automatic Text Categorization: Methods and Problems
%J Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki
%D 2008
%P 25-40
%V 150
%N 4
%I mathdoc
%U http://geodesic.mathdoc.fr/item/UZKU_2008_150_4_a1/
%G ru
%F UZKU_2008_150_4_a1
M. S. Ageev; B. V. Dobrov; N. V. Loukachevitch. Automatic Text Categorization: Methods and Problems. Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki, Kazanskii Gosudarstvennyi Universitet. Uchenye Zapiski. Seriya Fiziko-Matematicheskie Nauki, Volume 150 (2008) no. 4, pp. 25-40. http://geodesic.mathdoc.fr/item/UZKU_2008_150_4_a1/