Recognition of OCR Invoice Metadata Block Types
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F18%3A00103049" target="_blank" >RIV/00216224:14330/18:00103049 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1007/978-3-030-00794-2_33" target="_blank" >http://dx.doi.org/10.1007/978-3-030-00794-2_33</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-030-00794-2_33" target="_blank" >10.1007/978-3-030-00794-2_33</a>
Alternative languages
Result language
English
Original language name
Recognition of OCR Invoice Metadata Block Types
Original language description
Automatic cataloging of thousands of paper-based structured documents is a crucial fund-saving task for future document management systems. Current optical character recognition (OCR) systems process tabular data with a sufficient level of character-level accuracy; however, recognizing the overall structure of the document metadata remains an open practical task. In this paper, we introduce the OCRMiner system, designed to extract the indexing metadata of structured documents obtained from an image scanning process and OCR. We present the details of the system's modular architecture and evaluate the detection of text block types that appear within invoice documents. The system is based on text analysis in combination with layout features, and is developed and tested in cooperation with a renowned copy machine producer. The system uses an open-source OCR and reaches an overall accuracy of 80.1%.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
S - Specific research at universities
Others
Publication year
2018
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Text, Speech, and Dialogue, 21st International Conference, TSD 2018
ISBN
9783030007935
ISSN
0302-9743
e-ISSN
—
Number of pages
9
Pages from-to
304-312
Publisher name
Springer International Publishing
Place of publication
Switzerland
Event location
Brno, Czech Republic
Event date
Jan 1, 2018
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000611532300033