OCR Error Correction for Vietnamese OCR Text with Different Edit Distances
The result's identifiers
Result code in IS VaVaI
RIV/61989100:27240/22:10252180 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27240%2F22%3A10252180)
Result on the web
https://link.springer.com/chapter/10.1007/978-3-031-14627-5_13
DOI - Digital Object Identifier
10.1007/978-3-031-14627-5_13 (http://dx.doi.org/10.1007/978-3-031-14627-5_13)
Alternative languages
Result language
English
Original language name
OCR Error Correction for Vietnamese OCR Text with Different Edit Distances
Original language description
Candidate word generation by character edit operations is an important method that has been employed in many OCR error correction approaches. In this paper, we study how character edit distances affect the performance of OCR error correction. We propose an algorithm for generating correction candidates at different edit distances. Correction candidates are considered for both non-word and real-word errors. The candidates are scored and ranked based on linguistic features and edit probability. The experiments are conducted on the VNOnDB database used in the Vietnamese Online Handwritten Text Recognition competition (VOHTR 2018). We evaluate error correction performance at different edit distances in terms of two error metrics, character error rate (CER) and word error rate (WER). The results show that edit distances of 1 and 2 yield better correction results than higher edit distances. (C) 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
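The candidate generation, ranking, and evaluation steps summarized above can be illustrated with a short Python sketch. It is not the authors' algorithm and makes several assumptions: the lexicon and its counts are hypothetical, the paper's linguistic features are replaced by a plain unigram frequency, and edit probability is modeled as one hypothetical constant `edit_prob` per edit. Rather than enumerating every string reachable by up to k insertions, deletions, and substitutions and keeping those found in the lexicon, it filters the lexicon by Levenshtein distance, which yields the same candidate set. CER and WER are computed as character-level and word-level edit distance divided by the reference length. All function names here are illustrative.

```python
import math
import unicodedata
from typing import Dict, List, Sequence, Tuple


def nfc(s: str) -> str:
    # Vietnamese diacritics may arrive precomposed or as combining marks; compare in NFC.
    return unicodedata.normalize("NFC", s)


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Unit-cost edit distance (insert, delete, substitute) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute, or match at cost 0
        prev = cur
    return prev[-1]


def rank_candidates(word: str, freq: Dict[str, int], max_dist: int,
                    edit_prob: float = 0.01) -> List[Tuple[str, int, float]]:
    """Return (candidate, distance, score) tuples, best first.

    Candidates are lexicon words within max_dist edits of `word`. The word itself
    (distance 0) is kept if it is in the lexicon, so a possible real-word error
    competes with its neighbours. Score = log unigram probability
    + distance * log(edit_prob): a hypothetical stand-in for the paper's
    linguistic features and edit probability.
    """
    word = nfc(word)
    total = sum(freq.values())
    scored = []
    for cand, count in freq.items():
        d = levenshtein(word, nfc(cand))
        if d <= max_dist:
            score = math.log(count / total) + d * math.log(edit_prob)
            scored.append((cand, d, score))
    return sorted(scored, key=lambda t: t[2], reverse=True)


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character edit distance / reference length."""
    ref, hyp = nfc(ref), nfc(hyp)
    return levenshtein(ref, hyp) / max(len(ref), 1)


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    r, h = nfc(ref).split(), nfc(hyp).split()
    return levenshtein(r, h) / max(len(r), 1)


if __name__ == "__main__":
    # Hypothetical toy lexicon with made-up counts.
    lexicon = {"việt": 50, "viết": 30, "nam": 80, "năm": 60}
    print(rank_candidates("viêt", lexicon, max_dist=1))  # non-word error: "việt" ranks first
    print(cer("việt nam", "viêt nam"), wer("việt nam", "viêt nam"))  # 0.125 0.5
```

Raising `max_dist` can only add candidates, so at larger distances the ranking step has to reject more distractors. This toy example does not reproduce the paper's VNOnDB experiments.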
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10200 - Computer and information sciences
Result continuities
Project
—
Continuities
S - Specific research at universities
Others
Publication year
2022
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Lecture Notes in Networks and Systems. Volume 527
ISBN
978-3-031-14626-8
ISSN
2367-3370
e-ISSN
2367-3389
Number of pages
10
Pages from-to
130-139
Publisher name
Springer
Place of publication
Cham
Event location
Sanda
Event date
Sep 7, 2022
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000870692600013