Prospective automated hierarchical classification of digitized documents
Review of the National Center for Digitization, Volume 29 (2016), p. 42
This article was harvested from the source eLibrary of Mathematical Institute of the Serbian Academy of Sciences and Arts
The paper proposes a method for hierarchical classification of the digitized documents of the NCD digital library. The classification model implements the Structured Support Vector Machine (SSVM) method, which has shown excellent performance on the Ebart corpus of documents in the Serbian language. We describe the developed model and its results on the Ebart dataset, suggest two types of class hierarchies for the NCD library based on its content, and define a protocol for applying the method to digitized documents.
Keywords:
hierarchical text classification, structured support vector machine method, n-grams, Ebart
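As an illustration of two ingredients named in the abstract and keywords, the following is a minimal sketch, not the authors' implementation. It assumes character n-grams as document features and uses tree distance between classes as one possible hierarchy-aware error measure; the abstract does not state which n-gram type or loss the model uses. The class hierarchy `PARENT` and the function names are hypothetical and do not reflect the hierarchies proposed for the NCD library.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in `text` (assumed feature type)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Hypothetical class hierarchy: child -> parent (None marks the root).
PARENT = {
    "root": None,
    "science": "root",
    "culture": "root",
    "mathematics": "science",
    "physics": "science",
    "music": "culture",
}


def path_to_root(label, parent=PARENT):
    """Return the classes from `label` up to the root, inclusive."""
    path = []
    while label is not None:
        path.append(label)
        label = parent[label]
    return path


def tree_distance(a, b, parent=PARENT):
    """Number of edges between classes `a` and `b` in the hierarchy."""
    pa, pb = path_to_root(a, parent), path_to_root(b, parent)
    # In a tree, the shared part of both root paths is the set of common ancestors.
    common = len(set(pa) & set(pb))
    return (len(pa) - common) + (len(pb) - common)
```

Under such a measure, confusing two sibling classes (for example `mathematics` and `physics`) costs less than confusing classes from different branches (`mathematics` and `music`), which is the kind of structure a hierarchical classifier can exploit.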
@article{NCD_2016_29_a5,
author = {Jovana Kova\v{c}evi\'c and Jelena Graovac},
title = {Prospective automated hierarchical classification of digitized documents},
journal = {Review of the National Center for Digitization},
pages = {42},
year = {2016},
volume = {29},
language = {en},
url = {http://geodesic.mathdoc.fr/item/NCD_2016_29_a5/}
}
Jovana Kovačević; Jelena Graovac. Prospective automated hierarchical classification of digitized documents. Review of the National Center for Digitization, Volume 29 (2016), p. 42. http://geodesic.mathdoc.fr/item/NCD_2016_29_a5/