Error Correction for Information Retrieval of Czech Documents
The result's identifiers
Result code in IS VaVaI
RIV/49777513:23520/18:43952043 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F18%3A43952043)
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Error Correction for Information Retrieval of Czech Documents
Original language description
This paper proposes a novel system for searching a set of scanned documents in the Czech language. The documents are raster images, so they are first converted to text by optical character recognition (OCR). OCR errors are then corrected, and the corrected texts are indexed and stored in a full-text database that supports searching over the documents. The paper describes all components of this system, with a particular focus on the proposed OCR correction method. We show experimentally that the proposed approach is effective, as it corrects a significant number of errors. As a further contribution, we create a small Czech corpus for evaluating OCR error correction methods.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
LO1506: Sustainability support of the centre NTIS - New Technologies for the Information Society
Continuities
P - Research and development project financed from public funds (with a link to CEP)
S - Specific research at universities
Others
Publication year
2018
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the 10th International Conference on Agents and Artificial Intelligence
ISBN
978-989-758-275-2
ISSN
—
e-ISSN
not stated
Number of pages
5
Pages from-to
630-634
Publisher name
SciTePress
Place of publication
Setúbal
Event location
Funchal, Madeira - Portugal
Event date
Jan 16, 2018
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—