Review of the National Center for Digitization, Volume 21 (2012) no. 1
Cite this article
Ines Jerele; Tomaž Erjavec; Daša Pokorn; Alenka Kavčič-Čolić. Optical Character Recognition of Historical Texts: End-User Focused Research for Slovenian Books and Newspapers from the 18th and 19th Century. Review of the National Center for Digitization, Volume 21 (2012) no. 1. http://geodesic.mathdoc.fr/item/NCD_2012_21_1_a15/
@article{NCD_2012_21_1_a15,
author = {Ines Jerele and Toma\v{z} Erjavec and Da\v{s}a Pokorn and Alenka Kav\v{c}i\v{c}-\v{C}oli\'c},
title = {Optical {Character} {Recognition} of {Historical} {Texts:} {End-User} {Focused} {Research} for {Slovenian} {Books} and {Newspapers} from the 18th and 19th {Century}},
journal = {Review of the National Center for Digitization},
pages = {117--126},
year = {2012},
volume = {21},
number = {1},
url = {http://geodesic.mathdoc.fr/item/NCD_2012_21_1_a15/}
}
TY - JOUR
AU - Ines Jerele
AU - Tomaž Erjavec
AU - Daša Pokorn
AU - Alenka Kavčič-Čolić
TI - Optical Character Recognition of Historical Texts: End-User Focused Research for Slovenian Books and Newspapers from the 18th and 19th Century
JO - Review of the National Center for Digitization
PY - 2012
SP - 117
EP - 126
VL - 21
IS - 1
UR - http://geodesic.mathdoc.fr/item/NCD_2012_21_1_a15/
ID - NCD_2012_21_1_a15
ER -
%0 Journal Article
%A Ines Jerele
%A Tomaž Erjavec
%A Daša Pokorn
%A Alenka Kavčič-Čolić
%T Optical Character Recognition of Historical Texts: End-User Focused Research for Slovenian Books and Newspapers from the 18th and 19th Century
%J Review of the National Center for Digitization
%D 2012
%P 117 - 126
%V 21
%N 1
%U http://geodesic.mathdoc.fr/item/NCD_2012_21_1_a15/
%F NCD_2012_21_1_a15
This paper presents research aimed at improving OCR quality in the large-scale digitisation of newspapers and books and at enabling full-text search of digitised old Slovenian printed texts, so that digital library end-users have access to better transcriptions of digitised content. It describes ongoing work by the National and University Library of Slovenia and the Jožef Stefan Institute, within the EU research project IMPACT (Improving Access to Text), to develop high-quality datasets, in particular ground-truth transcriptions (a clean corpus) and a lexicon of historical Slovene.