Optical Character Recognition of Historical Texts: End-User Focused Research for Slovenian Books and Newspapers from the 18th and 19th Century
Review of the National Center for Digitization, Tome 21 (2012) no. 1
This paper presents research aimed at improving OCR quality in the large-scale digitisation of newspapers and books and at enabling full-text search of digitised old Slovenian printed texts, giving digital library end-users better transcriptions of digitised content. It describes ongoing work undertaken by the National and University Library of Slovenia and the Jožef Stefan Institute within the EU research project IMPACT (Improving Access to Text) to develop high-quality datasets, in particular ground-truth transcriptions (a clean corpus) and a lexicon of historical Slovene.
@article{NCD_2012_21_1_a15,
author = {Ines Jerele and Toma\v{z} Erjavec and Da\v{s}a Pokorn and Alenka Kav\v{c}i\v{c}-\v{C}oli\'c},
title = {Optical {Character} {Recognition} of {Historical} {Texts:} {End-User} {Focused} {Research} for {Slovenian} {Books} and {Newspapers} from the 18th and 19th {Century}},
journal = {Review of the National Center for Digitization},
pages = {117--126},
year = {2012},
volume = {21},
number = {1},
url = {http://geodesic.mathdoc.fr/item/NCD_2012_21_1_a15/}
}
TY - JOUR
AU - Ines Jerele
AU - Tomaž Erjavec
AU - Daša Pokorn
AU - Alenka Kavčič-Čolić
TI - Optical Character Recognition of Historical Texts: End-User Focused Research for Slovenian Books and Newspapers from the 18th and 19th Century
JO - Review of the National Center for Digitization
PY - 2012
SP - 117
EP - 126
VL - 21
IS - 1
UR - http://geodesic.mathdoc.fr/item/NCD_2012_21_1_a15/
ID - NCD_2012_21_1_a15
ER -
%0 Journal Article
%A Ines Jerele
%A Tomaž Erjavec
%A Daša Pokorn
%A Alenka Kavčič-Čolić
%T Optical Character Recognition of Historical Texts: End-User Focused Research for Slovenian Books and Newspapers from the 18th and 19th Century
%J Review of the National Center for Digitization
%D 2012
%P 117 - 126
%V 21
%N 1
%U http://geodesic.mathdoc.fr/item/NCD_2012_21_1_a15/
%F NCD_2012_21_1_a15
Ines Jerele; Tomaž Erjavec; Daša Pokorn; Alenka Kavčič-Čolić. Optical Character Recognition of Historical Texts: End-User Focused Research for Slovenian Books and Newspapers from the 18th and 19th Century. Review of the National Center for Digitization, Tome 21 (2012) no. 1. http://geodesic.mathdoc.fr/item/NCD_2012_21_1_a15/