Building an efficient OCR system for historical documents with little training data

Result identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F20%3A43958971" target="_blank" >RIV/49777513:23520/20:43958971 - isvavai.cz</a>

  • Result on the web

    <a href="https://link.springer.com/content/pdf/10.1007/s00521-020-04910-x.pdf" target="_blank" >https://link.springer.com/content/pdf/10.1007/s00521-020-04910-x.pdf</a>

  • DOI - Digital Object Identifier

    <a href="http://dx.doi.org/10.1007/s00521-020-04910-x" target="_blank" >10.1007/s00521-020-04910-x</a>

Alternative languages

  • Result language

    English

  • Original language name

    Building an efficient OCR system for historical documents with little training data

  • Original language description

    As the number of digitized historical documents has increased rapidly, it is necessary to provide efficient methods of information retrieval and knowledge extraction to make the data accessible. Such methods depend on optical character recognition (OCR), which converts document images into textual representations. This paper introduces a set of methods that allow OCR to be performed on historical document images using only a small amount of real, manually annotated training data. The presented OCR system comprises two main tasks: page layout analysis, including text block and line segmentation, and OCR. Our segmentation methods are based on fully convolutional networks, and the OCR approach utilizes recurrent neural networks. We show that both the segmentation and OCR tasks are feasible with only a few annotated real data samples. The experiments aim to determine the best way to achieve good performance with the given small set of data. We also demonstrate that the obtained scores are comparable to or even better than the scores of several state-of-the-art systems.

  • Czech name

  • Czech description

Classification

  • Type

    J<sub>imp</sub> - Article in a specialist periodical, which is included in the Web of Science database

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

  • Continuities

    O - Operational programme project

Others

  • Publication year

    2020

  • Confidentiality

    S - Complete and truthful project data are not subject to protection under special legal regulations

Data specific for result type

  • Name of the periodical

    Neural Computing and Applications

  • ISSN

    0941-0643

  • e-ISSN

  • Volume of the periodical

    32

  • Issue of the periodical within the volume

    23

  • Country of publishing house

    GB - UNITED KINGDOM

  • Number of pages

    19

  • Pages from-to

    17209-17227

  • UT code for WoS article

    000531222300001

  • EID of the result in the Scopus database

    2-s2.0-85084519412