OCR error correction using correction patterns and self-organizing migrating algorithm

The result's identifiers

  • Result code in IS VaVaI

    RIV/61989100:27240/21:10247265 - https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27240%2F21%3A10247265

  • Result on the web

    https://link.springer.com/content/pdf/10.1007/s10044-020-00936-y.pdf

  • DOI - Digital Object Identifier

    10.1007/s10044-020-00936-y - http://dx.doi.org/10.1007/s10044-020-00936-y

Alternative languages

  • Result language

    English

  • Original language name

    OCR error correction using correction patterns and self-organizing migrating algorithm

  • Original language description

    Optical character recognition (OCR) systems help to digitize paper-based historical archives. However, the poor quality of scanned documents and the limitations of text recognition techniques result in different kinds of errors in OCR outputs. Post-processing is an essential step in improving the output quality of OCR systems by detecting and cleaning the errors. In this paper, we present an automatic model consisting of both error detection and error correction phases for OCR post-processing. We propose a novel approach to OCR post-processing error correction using correction pattern edits and an evolutionary algorithm, a technique mainly used for solving optimization problems. Our model adopts a variant of the self-organizing migrating algorithm along with a fitness function based on modifications of important linguistic features. We illustrate how to construct the table of correction pattern edits, which involves all types of edit operations and is learned directly from the training dataset. With efficient settings of the algorithm parameters, our model achieves high-quality candidate generation and error correction. The experimental results show that our proposed approach outperforms various baseline approaches as evaluated on the benchmark dataset of the ICDAR 2017 Post-OCR text correction competition. © 2020, Springer-Verlag London Ltd., part of Springer Nature.

  • Czech name

  • Czech description

Classification

  • Type

    Jimp - Article in a specialist periodical that is included in the Web of Science database

  • CEP classification

  • OECD FORD branch

    10200 - Computer and information sciences

Result continuities

  • Project

  • Continuities

    S - Specific university research

Others

  • Publication year

    2021

  • Confidentiality

    S - Complete and truthful data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Name of the periodical

    Pattern Analysis and Applications

  • ISSN

    1433-7541

  • e-ISSN

    1433-755X

  • Volume of the periodical

    24

  • Issue of the periodical within the volume

    2

  • Country of publishing house

    US - UNITED STATES

  • Number of pages

    21

  • Pages from-to

    701-721

  • UT code for WoS article

    000591971700001

  • EID of the result in the Scopus database

    2-s2.0-85096431401