Optical Character Recognition of Historical Texts: End-User Focused Research for Slovenian Books and Newspapers from the 18th and 19th Century
Review of the National Center for Digitization, Volume 21 (2012) no. 1.

See the article record from the source eLibrary of Mathematical Institute of the Serbian Academy of Sciences and Arts

This paper presents research aimed at achieving better OCR quality in large-scale digitisation of newspapers and books and at opening up full-text search of digitised old Slovenian printed texts, which should give digital library end-users better transcriptions of digitised content. The paper describes ongoing work undertaken by the National and University Library of Slovenia and the Jožef Stefan Institute in the framework of the EU research project IMPACT (Improving Access to Text) to develop high-quality datasets, in particular ground-truth transcriptions (a clean corpus) and a lexicon of historical Slovene.
@article{NCD_2012_21_1_a15,
     author = {Ines Jerele and Toma\v{z} Erjavec and Da\v{s}a Pokorn and Alenka Kav\v{c}i\v{c}-\v{C}oli\'c},
     title = {Optical {Character} {Recognition} of {Historical} {Texts:} {End-User} {Focused} {Research} for {Slovenian} {Books} and {Newspapers} from the 18th and 19th {Century}},
     journal = {Review of the National Center for Digitization},
     pages = {117--126},
     publisher = {mathdoc},
     volume = {21},
     number = {1},
     year = {2012},
     url = {http://geodesic.mathdoc.fr/item/NCD_2012_21_1_a15/}
}
TY  - JOUR
AU  - Ines Jerele
AU  - Tomaž Erjavec
AU  - Daša Pokorn
AU  - Alenka Kavčič-Čolić
TI  - Optical Character Recognition of Historical Texts: End-User Focused Research for Slovenian Books and Newspapers from the 18th and 19th Century
JO  - Review of the National Center for Digitization
PY  - 2012
SP  - 117
EP  - 126
VL  - 21
IS  - 1
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/NCD_2012_21_1_a15/
ID  - NCD_2012_21_1_a15
ER  - 
%0 Journal Article
%A Ines Jerele
%A Tomaž Erjavec
%A Daša Pokorn
%A Alenka Kavčič-Čolić
%T Optical Character Recognition of Historical Texts: End-User Focused Research for Slovenian Books and Newspapers from the 18th and 19th Century
%J Review of the National Center for Digitization
%D 2012
%P 117-126
%V 21
%N 1
%I mathdoc
%U http://geodesic.mathdoc.fr/item/NCD_2012_21_1_a15/
%F NCD_2012_21_1_a15
Ines Jerele; Tomaž Erjavec; Daša Pokorn; Alenka Kavčič-Čolić. Optical Character Recognition of Historical Texts: End-User Focused Research for Slovenian Books and Newspapers from the 18th and 19th Century. Review of the National Center for Digitization, Volume 21 (2012) no. 1. http://geodesic.mathdoc.fr/item/NCD_2012_21_1_a15/