Framework for Fuzzy Classification of Digitized Documents
Review of the National Center for Digitization, Volume 32 (2018), no. 1
This article was harvested from the source eLibrary of Mathematical Institute of the Serbian Academy of Sciences and Arts
The classification of a text document with respect to a predefined set of classes is an
assignment of one of the values 0 or 1 to each ordered pair (document, class), depending on whether the
document belongs to the class or not. Fuzzy classification generalizes this notion by enabling the membership
to be expressed by any real number between 0 and 1. In this paper, we show one possible method of fuzzy
classification by using the existing formulas for calculating the distance of a document from a class. As an
illustration, we use this method to form a fuzzy classification of a subset of documents from the Ebart-hier
corpus. After that, we briefly describe the current state of the National Center for Digitization virtual library
and show by an example how fuzzy classification can be used to improve the organization of the Library data
and extend the querying possibilities.
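
The idea of deriving membership degrees from document-to-class distances can be illustrated with a minimal sketch. The abstract does not give the formulas used in the paper, so the linear scaling below, the function names, and the sample distances are all assumptions made for illustration only, not the author's method.

```python
from typing import Dict


def fuzzy_memberships(distances: Dict[str, float]) -> Dict[str, float]:
    """Map document-to-class distances to membership degrees in [0, 1].

    Assumption (not from the paper): membership falls linearly with
    distance, scaled by the largest distance seen for this document.
    """
    if not distances:
        return {}
    if any(d < 0 for d in distances.values()):
        raise ValueError("distances must be non-negative")
    d_max = max(distances.values())
    if d_max == 0:
        # The document is at distance 0 from every class.
        return {c: 1.0 for c in distances}
    return {c: 1.0 - d / d_max for c, d in distances.items()}


def crisp_classification(memberships: Dict[str, float],
                         threshold: float = 0.5) -> Dict[str, int]:
    """Recover a 0/1 classification by thresholding the memberships."""
    return {c: int(m >= threshold) for c, m in memberships.items()}


# Hypothetical distances of one document from three classes.
example = {"politics": 0.2, "culture": 0.5, "sport": 0.8}
degrees = fuzzy_memberships(example)
labels = crisp_classification(degrees)
```

Thresholding the degrees gives back an ordinary 0/1 classification, which shows in what sense the fuzzy version generalizes the crisp one.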
@article{NCD_2018_32_1_a0,
author = {Aleksandar Janji{\'c}},
title = {Framework for {Fuzzy} {Classification} of {Digitized} {Documents}},
journal = {Review of the National Center for Digitization},
pages = {1--26},
year = {2018},
volume = {32},
number = {1},
url = {http://geodesic.mathdoc.fr/item/NCD_2018_32_1_a0/}
}
Aleksandar Janjić. Framework for Fuzzy Classification of Digitized Documents. Review of the National Center for Digitization, Volume 32 (2018), no. 1. http://geodesic.mathdoc.fr/item/NCD_2018_32_1_a0/