Automatic classification of documents in the university electronic document management system
Informacionnye tehnologii i vyčislitelnye sistemy, no. 1 (2023), pp. 3-19
This article was harvested from the source Math-Net.Ru
This paper considers the automatic classification of text documents in a university electronic document management system. A two-stage classification method based on machine learning and a numerical representation of documents is presented. In the first stage, the collection is reduced by screening out documents that do not belong to any accepted class, according to each document's probability of novelty. In the second stage, the documents with the highest occurrence frequencies of words characteristic of the accepted classes are selected (the formation of support vectors). A document is then assigned the class to which most of its closest documents belong, under the adopted distance metric. A set of programs for classifying text documents has been implemented as the basis for the information support of the university electronic document management system, and studies confirming the effectiveness of the proposed method have been carried out.
Keywords: document classification, novelty of text documents, probabilistic topic model, support vector machine, $k$-nearest neighbors.
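The two-stage scheme outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the novelty model, the selection rule, or the distance metric, so the code assumes a bag-of-words representation, a simple word-share threshold in place of the probability of novelty, a per-class top-document selection in place of support vector formation, and cosine distance for the nearest-neighbor vote. All function names, parameters (`top_n`, `min_share`, `per_class`, `k`), and the sample documents are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Numerical representation (assumed): bag-of-words term frequencies."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """Assumed distance metric: 1 - cosine similarity of two term vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0

def class_vocabulary(train, top_n=5):
    """Hypothetical helper: the top_n most frequent words of each class."""
    totals = {}
    for text, label in train:
        totals.setdefault(label, Counter()).update(vectorize(text))
    return {label: {w for w, _ in counts.most_common(top_n)}
            for label, counts in totals.items()}

def is_novel(doc_vec, vocab, min_share=0.2):
    """Stage 1 stand-in: a document is treated as novel when too small a
    share of its words is characteristic of any accepted class."""
    total = sum(doc_vec.values())
    if total == 0:
        return True
    best = max(sum(doc_vec[w] for w in words) for words in vocab.values())
    return best / total < min_share

def select_reference_docs(train, vocab, per_class=2):
    """Stage 2 stand-in: per class, keep the documents with the highest
    counts of that class's characteristic words."""
    selected = []
    for label, words in vocab.items():
        docs = [(vectorize(t), label) for t, l in train if l == label]
        docs.sort(key=lambda d: sum(d[0][w] for w in words), reverse=True)
        selected.extend(docs[:per_class])
    return selected

def classify(text, train, k=3):
    """Return the majority class among the k nearest selected documents,
    or None when the document is screened out as novel."""
    vocab = class_vocabulary(train)
    vec = vectorize(text)
    if is_novel(vec, vocab):
        return None
    refs = select_reference_docs(train, vocab)
    nearest = sorted(refs, key=lambda d: cosine_distance(vec, d[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical training collection with two accepted classes.
TRAIN = [
    ("order on student enrollment for the autumn semester", "order"),
    ("order on student expulsion and enrollment transfer", "order"),
    ("order on staff appointment", "order"),
    ("invoice for payment of laboratory equipment", "invoice"),
    ("invoice for payment of software licenses", "invoice"),
    ("invoice payment reminder for equipment", "invoice"),
]

if __name__ == "__main__":
    print(classify("invoice for payment of printer equipment", TRAIN))
    print(classify("lunch menu changes next week", TRAIN))
```

In this sketch a document that shares no characteristic words with any class is rejected before the nearest-neighbor vote, which mirrors the idea of shrinking the collection first and classifying only what remains.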
@article{ITVS_2023_1_a0,
author = {A. L. Tkachenko and L. A. Denisova},
title = {Automatic classification of documents in the university electronic document management system},
journal = {Informacionnye tehnologii i vy\v{c}islitelnye sistemy},
pages = {3--19},
year = {2023},
number = {1},
language = {ru},
url = {http://geodesic.mathdoc.fr/item/ITVS_2023_1_a0/}
}
TY  - JOUR
AU  - A. L. Tkachenko
AU  - L. A. Denisova
TI  - Automatic classification of documents in the university electronic document management system
JO  - Informacionnye tehnologii i vyčislitelnye sistemy
PY  - 2023
SP  - 3
EP  - 19
IS  - 1
UR  - http://geodesic.mathdoc.fr/item/ITVS_2023_1_a0/
LA  - ru
ID  - ITVS_2023_1_a0
ER  -
%0 Journal Article
%A A. L. Tkachenko
%A L. A. Denisova
%T Automatic classification of documents in the university electronic document management system
%J Informacionnye tehnologii i vyčislitelnye sistemy
%D 2023
%P 3-19
%N 1
%U http://geodesic.mathdoc.fr/item/ITVS_2023_1_a0/
%G ru
%F ITVS_2023_1_a0
A. L. Tkachenko; L. A. Denisova. Automatic classification of documents in the university electronic document management system. Informacionnye tehnologii i vyčislitelnye sistemy, no. 1 (2023), pp. 3-19. http://geodesic.mathdoc.fr/item/ITVS_2023_1_a0/