Method for analyzing the structure of noisy images of administrative documents
    
    
  
  
  
      
      
      
        
Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ, Matematičeskoe modelirovanie i programmirovanie, Vol. 15 (2022) no. 4, pp. 80-89
    
  
  
  
  
  
    
      
      
        
      
      
      
See the article record from the source Math-Net.Ru
            
We consider the problem of extracting content elements (fields) from images of administrative documents using descriptions of anchoring elements. Administrative documents contain static elements and content elements (filled-in information). The static objects of the document model are the lines of the document structure and the words. We describe sets of objects united by properties and relationships. A text descriptor can contain attributes that distinguish it from similar descriptors. We propose using combined descriptors of line segments and words. Our experiments showed that extracting object sets improves the recognition accuracy of document fields by 17% and the accuracy of information extraction by 16%. In the experiments, optical character recognition was performed with the Smart Document Engine SDK.
			
            
            
            
          
        
      
                  
                    
                    
                    
                        
Keywords: noisy image, document recognition, special text point, descriptor.
                    
                    
                    
                  
                
                
                @article{VYURU_2022_15_4_a6,
     author = {O. A. Slavin and E. L. Pliskin},
     title = {Method for analyzing the structure of noisy images of administrative documents},
     journal = {Vestnik \^U\v{z}no-Uralʹskogo gosudarstvennogo universiteta. Seri\^a, Matemati\v{c}eskoe modelirovanie i programmirovanie},
     pages = {80--89},
     publisher = {mathdoc},
     volume = {15},
     number = {4},
     year = {2022},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/VYURU_2022_15_4_a6/}
}
                      
                      
TY  - JOUR
AU  - O. A. Slavin
AU  - E. L. Pliskin
TI  - Method for analyzing the structure of noisy images of administrative documents
JO  - Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ, Matematičeskoe modelirovanie i programmirovanie
PY  - 2022
SP  - 80
EP  - 89
VL  - 15
IS  - 4
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/VYURU_2022_15_4_a6/
LA  - en
ID  - VYURU_2022_15_4_a6
ER  -
%0 Journal Article
%A O. A. Slavin
%A E. L. Pliskin
%T Method for analyzing the structure of noisy images of administrative documents
%J Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ, Matematičeskoe modelirovanie i programmirovanie
%D 2022
%P 80-89
%V 15
%N 4
%I mathdoc
%U http://geodesic.mathdoc.fr/item/VYURU_2022_15_4_a6/
%G en
%F VYURU_2022_15_4_a6
O. A. Slavin; E. L. Pliskin. Method for analyzing the structure of noisy images of administrative documents. Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ, Matematičeskoe modelirovanie i programmirovanie, Vol. 15 (2022) no. 4, pp. 80-89. http://geodesic.mathdoc.fr/item/VYURU_2022_15_4_a6/
