Selection technique for multiple outputs of optical character recognition
Eurasian journal of mathematical and computer applications, Volume 8 (2020) no. 2, pp. 41-51.

See the article record from the source Math-Net.Ru

Combining multiple OCR outputs is an approach used to improve recognition accuracy for images scanned at low resolution. The idea is to incorporate information from several OCR outputs to improve the final OCR output. The approach includes a selection process that chooses the best resulting words among the multiple outputs. However, most existing selection techniques used in this process are not context-aware. This research therefore proposes a selection technique that overcomes this drawback: it uses sentence context information collected from an N-gram language model to improve the final OCR output. The proposed technique was evaluated against three related existing techniques, using Character Error Rate (CER) and Word Error Rate (WER) as the evaluation metrics. Experiments showed relative decreases of 18.26% in CER and 14.23% in WER compared with the best existing technique. The proposed selection technique will result in better information extraction through the automatic recognition of documents scanned at low resolution.
Keywords: selection technique, low-resolution images, OCR errors, document recognition.
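
The abstract does not give the details of the proposed technique, so the following is only a minimal sketch of what context-aware selection with an N-gram model could look like, not the authors' method. It assumes a bigram model with add-one smoothing, a Viterbi search over the candidate words, and OCR outputs that are already aligned word by word; all function names are hypothetical. CER and WER are computed in the usual way, as edit distance divided by reference length, and the relative decrease is taken as (baseline - new) / baseline.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized training sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed log P(word | prev) (smoothing choice is an assumption)."""
    vocab = len(unigrams)
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab))


def select_words(candidates, unigrams, bigrams):
    """Pick one word per position so that the bigram score of the sentence is maximal.

    candidates[i] lists the words proposed for position i by the OCR outputs
    (assumed to be aligned already).
    """
    # best maps a word to (score of the best path ending in it, that path)
    best = {"<s>": (0.0, [])}
    for options in candidates:
        nxt = {}
        for word in set(options):
            score, path = max(
                (s + log_prob(prev, word, unigrams, bigrams), p)
                for prev, (s, p) in best.items()
            )
            nxt[word] = (score, path + [word])
        best = nxt
    return max(best.values())[1]


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Character Error Rate: character edits divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def wer(ref, hyp):
    """Word Error Rate: word edits divided by number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def relative_decrease(baseline, new):
    """Relative decrease of an error rate with respect to a baseline."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    # Toy training text and toy OCR candidates, made up for illustration only.
    corpus = [
        "the cat sat on the mat".split(),
        "the dog sat on the rug".split(),
    ]
    uni, bi = train_bigram(corpus)
    outputs = [["the", "tbe"], ["cat", "cot"], ["sat"], ["on", "an"],
               ["the", "tho"], ["mat"]]
    chosen = " ".join(select_words(outputs, uni, bi))
    print(chosen, cer("the cat sat on the mat", chosen))
```

A search over whole sentences is used here rather than a per-word vote so that each choice can depend on its neighbours, which is the sense in which the selection is context-aware.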
@article{EJMCA_2020_8_2_a2,
     author = {I. Q. Habeeb and Z. Q. Al-Zaydi and H. N. Abdulkhudhur},
     title = {Selection technique for multiple outputs of optical character recognition},
     journal = {Eurasian journal of mathematical and computer applications},
     pages = {41--51},
     publisher = {mathdoc},
     volume = {8},
     number = {2},
     year = {2020},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/EJMCA_2020_8_2_a2/}
}
I. Q. Habeeb; Z. Q. Al-Zaydi; H. N. Abdulkhudhur. Selection technique for multiple outputs of optical character recognition. Eurasian journal of mathematical and computer applications, Volume 8 (2020) no. 2, pp. 41-51. http://geodesic.mathdoc.fr/item/EJMCA_2020_8_2_a2/