Automatic training data filtering for errors removing and improving the quality of the final neural network
Informacionnye tehnologii i vyčislitelnye sistemy, no. 3 (2022), pp. 35-42.

See the article record at the source, Math-Net.Ru.

Real-world data are often dirty, and in most cases this reduces the accuracy of a model trained on them. Manual (supervised) data correction is expensive and time-consuming, so one possible way to address the problem is to automate the cleaning process. In this paper, we consider automatic cleaning of training data as a preprocessing technique for improving the quality of the trained network. The proposed iterative method is based on the assumption that polluted samples are most likely located farther from the median of their class than clean ones. It detects the noisy samples and then removes them from the training set. Experiments on a generated synthetic dataset demonstrated that the method gives good results: it cleans the data even at high levels of pollution and significantly improves the quality of the classifier.
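The abstract describes the filtering only at a high level: samples lying far from their class median are treated as likely noise and removed, and the procedure is repeated. The Python sketch that follows illustrates that idea under several assumptions not taken from the paper. Plain coordinate tuples stand in for learned embeddings; the keywords mention a siamese neural network, but the abstract does not specify the feature space. The class median is taken coordinate-wise, distances are Euclidean, and a sample is dropped when its distance exceeds `k` times the median distance within its class. The function names and the `k` and `max_iter` parameters are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def class_medians(points, labels):
    """Coordinate-wise median feature vector of each class (an assumed choice of median)."""
    groups = defaultdict(list)
    for p, y in zip(points, labels):
        groups[y].append(p)
    return {
        y: tuple(statistics.median(coord) for coord in zip(*members))
        for y, members in groups.items()
    }


def filter_by_median_distance(points, labels, k=3.0, max_iter=10):
    """Iteratively drop samples lying unusually far from their class median.

    Hypothetical rule (not from the paper): a sample is dropped when its
    Euclidean distance to the median of its labelled class exceeds k times
    the median of those distances within the class. Medians are recomputed
    on the surviving samples after every pass. Returns sorted kept indices.
    """
    keep = list(range(len(points)))
    for _ in range(max_iter):
        medians = class_medians([points[i] for i in keep], [labels[i] for i in keep])
        dists = defaultdict(list)  # class label -> [(distance, index), ...]
        for i in keep:
            dists[labels[i]].append((math.dist(points[i], medians[labels[i]]), i))
        survivors = []
        for items in dists.values():
            typical = statistics.median([d for d, _ in items])
            # If most samples coincide with the median, skip filtering this class.
            limit = k * typical if typical > 0 else math.inf
            survivors.extend(i for d, i in items if d <= limit)
        if len(survivors) == len(keep):  # nothing removed on this pass: converged
            break
        keep = sorted(survivors)
    return keep


# Toy data: two tight clusters; index 10 sits in cluster 1 but is labelled 0.
points = [(0.0, 0.0), (0.1, 0.2), (-0.1, 0.1), (0.2, -0.1), (0.0, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1), (5.0, 4.8),
          (5.0, 5.1)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
print(filter_by_median_distance(points, labels))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

On this toy data the mislabelled point is removed on the first pass, and the second pass removes nothing, so the loop stops. A median-based threshold is used in the sketch because, unlike a mean, it is not pulled toward the very outliers it is meant to detect.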
Keywords: data cleaning, outlier detection, mislabels, siamese neural network, classifier.
@article{ITVS_2022_3_a3,
     author = {N. Z. Valishina and S. A. Ilyuhin and A. V. Sheshkus and V. L. Arlazarov},
     title = {Automatic training data filtering for errors removing and improving the quality of the final neural network},
     journal = {Informacionnye tehnologii i vy\v{c}islitelnye sistemy},
     pages = {35--42},
     publisher = {mathdoc},
     number = {3},
     year = {2022},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/ITVS_2022_3_a3/}
}
TY  - JOUR
AU  - N. Z. Valishina
AU  - S. A. Ilyuhin
AU  - A. V. Sheshkus
AU  - V. L. Arlazarov
TI  - Automatic training data filtering for errors removing and improving the quality of the final neural network
JO  - Informacionnye tehnologii i vyčislitelnye sistemy
PY  - 2022
SP  - 35
EP  - 42
IS  - 3
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/ITVS_2022_3_a3/
LA  - en
ID  - ITVS_2022_3_a3
ER  - 
%0 Journal Article
%A N. Z. Valishina
%A S. A. Ilyuhin
%A A. V. Sheshkus
%A V. L. Arlazarov
%T Automatic training data filtering for errors removing and improving the quality of the final neural network
%J Informacionnye tehnologii i vyčislitelnye sistemy
%D 2022
%P 35-42
%N 3
%I mathdoc
%U http://geodesic.mathdoc.fr/item/ITVS_2022_3_a3/
%G en
%F ITVS_2022_3_a3
N. Z. Valishina; S. A. Ilyuhin; A. V. Sheshkus; V. L. Arlazarov. Automatic training data filtering for errors removing and improving the quality of the final neural network. Informacionnye tehnologii i vyčislitelnye sistemy, no. 3 (2022), pp. 35-42. http://geodesic.mathdoc.fr/item/ITVS_2022_3_a3/