Synthetic document generation for the task of visual document understanding
Proceedings of the Yerevan State University. Physical and mathematical sciences, Vol. 58 (2024) no. 3, pp. 79-87

See the article record from the source Math-Net.Ru

Solving document analysis problems with machine learning methods requires a large amount of labeled data. Such data is not always available, and when it is, it covers only certain types of documents. In this paper, we present a method for generating synthetic data that can produce documents of any type from predefined document components. By varying the arrangement of document components, the text content, and the visual elements through configurations, we create diverse and realistic datasets that mimic real documents. This method addresses the shortage of labeled datasets and offers a flexible way to improve the results of machine learning models.
Keywords: machine learning, data generation, document understanding
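
The abstract describes the approach only at a high level, so the following is a minimal illustrative sketch of what configuration-driven generation could look like, not the author's implementation. It assumes a simple top-to-bottom page layout; the names `Component`, `generate_document`, `CONFIG`, and every configuration key are hypothetical. Each call produces one synthetic page as a list of labeled bounding boxes with their text, which is the kind of annotation a document understanding model is typically trained on.

```python
import random
from dataclasses import dataclass


@dataclass
class Component:
    kind: str    # label of the component, e.g. "title" or "table"
    height: int  # component height in pixels
    text: str    # text content placed inside the component


# Hypothetical configuration: predefined components plus layout settings.
CONFIG = {
    "page_width": 600,
    "page_height": 800,
    "margin": 20,
    "gap": 10,
    "shuffle": True,  # vary the arrangement of components between pages
    "components": [
        {"kind": "title", "height_range": (40, 60),
         "texts": ["Invoice", "Report", "Contract"]},
        {"kind": "paragraph", "height_range": (100, 200),
         "texts": ["Lorem ipsum dolor sit amet.", "Sample body text."]},
        {"kind": "table", "height_range": (100, 150),
         "texts": ["Item | Qty | Price"]},
    ],
}


def generate_document(config, seed):
    """Return labeled boxes (label, bbox, text) for one synthetic page."""
    rng = random.Random(seed)  # seeded so that each page is reproducible

    # Sample a height and a text for every predefined component.
    components = [
        Component(c["kind"], rng.randint(*c["height_range"]), rng.choice(c["texts"]))
        for c in config["components"]
    ]
    if config.get("shuffle", False):
        rng.shuffle(components)

    margin, gap = config["margin"], config["gap"]
    right = config["page_width"] - margin
    y = margin
    annotations = []
    for comp in components:
        if y + comp.height > config["page_height"] - margin:
            break  # stop placing components once the page is full
        annotations.append({
            "label": comp.kind,
            "bbox": (margin, y, right, y + comp.height),  # (x0, y0, x1, y1)
            "text": comp.text,
        })
        y += comp.height + gap
    return annotations


if __name__ == "__main__":
    for box in generate_document(CONFIG, seed=0):
        print(box)
```

Changing the seed or the configuration yields a different arrangement and content, while the annotations come for free because the generator knows where it placed each component. Rendering the boxes to an image (fonts, noise, stamps, and similar visual elements) is left out of this sketch.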
@article{UZERU_2024_58_3_a1,
     author = {Kh. S. Khechoyan},
     title = {Synthetic document generation for the task of visual document understanding},
     journal = {Proceedings of the Yerevan State University. Physical and mathematical sciences},
     pages = {79--87},
     publisher = {mathdoc},
     volume = {58},
     number = {3},
     year = {2024},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/UZERU_2024_58_3_a1/}
}
TY  - JOUR
AU  - Kh. S. Khechoyan
TI  - Synthetic document generation for the task of visual document understanding
JO  - Proceedings of the Yerevan State University. Physical and mathematical sciences
PY  - 2024
SP  - 79
EP  - 87
VL  - 58
IS  - 3
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/UZERU_2024_58_3_a1/
LA  - en
ID  - UZERU_2024_58_3_a1
ER  - 
%0 Journal Article
%A Kh. S. Khechoyan
%T Synthetic document generation for the task of visual document understanding
%J Proceedings of the Yerevan State University. Physical and mathematical sciences
%D 2024
%P 79-87
%V 58
%N 3
%I mathdoc
%U http://geodesic.mathdoc.fr/item/UZERU_2024_58_3_a1/
%G en
%F UZERU_2024_58_3_a1
Kh. S. Khechoyan. Synthetic document generation for the task of visual document understanding. Proceedings of the Yerevan State University. Physical and mathematical sciences, Vol. 58 (2024) no. 3, pp. 79-87. http://geodesic.mathdoc.fr/item/UZERU_2024_58_3_a1/