Error Correction for Information Retrieval of Czech Documents

Result identifiers

Result code in IS VaVaI
RIV/49777513:23520/18:43952043 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F18%3A43952043)
Result on the web
—
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Title in the original language
Error Correction for Information Retrieval of Czech Documents
Description in the original language
This paper proposes a novel system for searching a set of scanned documents in the Czech language. The documents are raster images, so they are first converted to text by optical character recognition (OCR). OCR errors are then corrected, and the corrected texts are indexed and stored in a full-text database, which makes the documents searchable. This paper describes all components of the system, with a particular focus on the proposed OCR correction method. We show experimentally that the proposed approach is effective: it corrects a significant number of errors. We also create a small Czech corpus for evaluating OCR error correction methods, which is another contribution of this paper.

Classification

Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

Project
LO1506: Sustainability support of the centre NTIS - New Technologies for the Information Society
Linkages
P - Research and development project financed from public funds (with a link to CEP)
S - Specific research at universities

Other

Year of implementation
2018
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Title of the article in the proceedings
Proceedings of the 10th International Conference on Agents and Artificial Intelligence
ISBN
978-989-758-275-2
ISSN
—
e-ISSN
not specified
Number of pages
5
Pages from-to
630-634
Publisher name
SciTePress
Place of publication
Setúbal
Event location
Funchal, Madeira - Portugal
Event date
16 January 2018
Event type by nationality
WRD - Worldwide event
UT WoS article code
—

Similar results (10)

An Efficient Unsupervised Approach for OCR Error Correction of Vietnamese OCR Text
OCR error correction using correction patterns and self-organizing migrating algorithm
OCR Error Correction for Vietnamese OCR Text with Different Edit Distances
