Document Representations for Classification of Short Web-Page Descriptions
Yugoslav journal of operations research, Volume 18 (2008) no. 1, p. 123.

See the article record from the source eLibrary of Mathematical Institute of the Serbian Academy of Sciences and Arts

Motivated by the application of text categorization to the classification of Web search results, this paper describes an extensive experimental study of the impact of bag-of-words document representations on the performance of five major classifiers: Naïve Bayes, SVM, Voted Perceptron, kNN and C4.5. The texts, which are short Web-page descriptions sorted into a large hierarchy of topics, are taken from the dmoz Open Directory Web-page ontology, and classifiers are trained to automatically determine the topics that may be relevant to a previously unseen Web page. Different transformations of the input data (stemming, normalization, logtf and idf), together with dimensionality reduction, are found to have a statistically significant improving or degrading effect on classification performance as measured by the classical metrics accuracy, precision, recall, $F_1$ and $F_2$. The emphasis of the study is not on determining the best document representation for each classifier, but rather on describing the effects of every individual transformation on classification, together with their mutual relationships.
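The abstract names the transformations and metrics but does not give their formulas. The sketch below is only an illustration under commonly used definitions, which are assumptions here and not necessarily the ones used in the paper: logtf as log(1 + tf), idf as log(N / df), normalization as scaling each document vector to unit Euclidean length, and $F_\beta$ as the weighted harmonic mean of precision and recall. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Build a sorted vocabulary and raw term-frequency rows (whitespace tokens)."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        rows.append([float(counts[term]) for term in vocab])
    return vocab, rows


def logtf(rows):
    """Dampen raw counts: tf -> log(1 + tf) (assumed definition)."""
    return [[math.log(1.0 + x) for x in row] for row in rows]


def idf(rows):
    """Weight each term by log(N / df), where df counts documents containing it."""
    n_docs = len(rows)
    n_terms = len(rows[0])
    dfs = [sum(1 for row in rows if row[j] > 0) for j in range(n_terms)]
    weights = [math.log(n_docs / df) if df else 0.0 for df in dfs]
    return [[x * w for x, w in zip(row, weights)] for row in rows]


def normalize(rows):
    """Scale every document vector to unit Euclidean length (zero rows are kept)."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row] if norm else list(row))
    return out


def f_beta(precision, recall, beta=1.0):
    """F_beta score; beta=1 gives F_1, beta=2 gives F_2 (recall weighted more)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Toy stand-ins for short Web-page descriptions.
    docs = ["free web mail", "web search engine", "free free games"]
    vocab, tf = bag_of_words(docs)
    vectors = normalize(idf(logtf(tf)))
    for term, weight in zip(vocab, vectors[2]):
        print(f"{term}: {weight:.3f}")
    print("F1:", f_beta(1.0, 0.5), "F2:", f_beta(1.0, 0.5, beta=2.0))
```

The transformations are written as separate functions so that they can be switched on and off independently, which mirrors the kind of per-transformation comparison the abstract describes. Stemming and dimensionality reduction are left out of this sketch.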
Classification: 68T05, 68U35, 68T10
Keywords: Text categorization, document representation, machine learning.
@article{YJOR_2008_18_1_a10,
     author = {Milo\v{s} Radovanovi\'c and Mirjana Ivanovi\'c},
     title = {Document {Representations} for {Classification} of {Short} {Web-Page} {Descriptions}},
     journal = {Yugoslav journal of operations research},
     pages = {123},
     publisher = {mathdoc},
     volume = {18},
     number = {1},
     year = {2008},
     zbl = {1199.68230},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/YJOR_2008_18_1_a10/}
}
TY  - JOUR
AU  - Miloš Radovanović
AU  - Mirjana Ivanović
TI  - Document Representations for Classification of Short Web-Page Descriptions
JO  - Yugoslav journal of operations research
PY  - 2008
SP  - 123 
VL  - 18
IS  - 1
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/YJOR_2008_18_1_a10/
LA  - en
ID  - YJOR_2008_18_1_a10
ER  - 
%0 Journal Article
%A Miloš Radovanović
%A Mirjana Ivanović
%T Document Representations for Classification of Short Web-Page Descriptions
%J Yugoslav journal of operations research
%D 2008
%P 123 
%V 18
%N 1
%I mathdoc
%U http://geodesic.mathdoc.fr/item/YJOR_2008_18_1_a10/
%G en
%F YJOR_2008_18_1_a10
Miloš Radovanović; Mirjana Ivanović. Document Representations for Classification of Short Web-Page Descriptions. Yugoslav journal of operations research, Volume 18 (2008) no. 1, p. 123. http://geodesic.mathdoc.fr/item/YJOR_2008_18_1_a10/