An OCR-based application using Tesseract engine to extract text information from ultrasound B-MODE images
Result identifiers
Result code in IS VaVaI
RIV/47813059:19240/24:A0001431 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F47813059%3A19240%2F24%3AA0001431)
Result on the web
https://ceur-ws.org/Vol-3792/paper12.pdf
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Title in original language
An OCR-based application using Tesseract engine to extract text information from ultrasound B-MODE images
Result description in original language
In this paper we introduce an OCR-based application for extracting text data from ultrasound B-MODE images. These images contain not only the visual image but also important additional text information about the examination, the image data, etc. Extracting these data is helpful in clinical practice. The application has a simple, user-friendly interface. The core text-extraction algorithm is built on the Tesseract engine and written in C#; the front end is a Windows Forms application. Although the software is fast, simple and user-friendly, recognition can in some cases produce errors, such as an incorrectly recognized patient name or missing or mistaken characters. That is, some text in the image may be missed or misinterpreted relative to the original. The software could therefore be a helpful tool, but its recognition accuracy should be improved. After that, the application could be used for other types of medical images, e.g. CT/CTA, PET, SPECT and many more. The accuracy currently achieved is about 90% on average. The authors discuss some ideas for increasing recognition accuracy and some front-end features that could be improved for more comfortable use. Extracted text data can be saved as a CSV file for further processing.
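The description reports an average accuracy of about 90% and CSV export, but this record does not say how accuracy is measured or how the CSV is laid out. The Python sketch below is only an illustration under stated assumptions: it scores accuracy at character level as 1 minus the Levenshtein distance divided by the reference length, and writes a hypothetical four-column CSV. The function names, the column names and the sample values are made up for the example and are not taken from the paper, whose application is written in C#.

```python
import csv
import io


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, recognized: str) -> float:
    """Assumed metric: 1 - distance / len(reference), clamped at 0."""
    if not reference:
        return 1.0 if not recognized else 0.0
    return max(0.0, 1.0 - edit_distance(reference, recognized) / len(reference))


def to_csv(rows) -> str:
    """Serialize (field, reference, recognized) triples; layout is hypothetical."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["field", "reference", "recognized", "accuracy"])
    for field, ref, rec in rows:
        writer.writerow([field, ref, rec, f"{char_accuracy(ref, rec):.2f}"])
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up sample: the letter O misread as the digit 0.
    print(to_csv([("patient_name", "DOE JOHN", "D0E JOHN")]))
```

Under this metric, a single misread character in a ten-character field gives an accuracy of 0.9, which is one way a figure of roughly 90% could arise; other definitions, such as word-level or field-level accuracy, would give different numbers.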
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
—
Linkages
I - Institutional support for the long-term conceptual development of a research organization
Other
Year of implementation
2024
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Article title in the proceedings
Proceedings of the 24th Conference Information Technologies – Applications and Theory (ITAT 2024)
ISBN
—
ISSN
1613-0073
e-ISSN
—
Number of pages of the result
6
Pages from-to
105-110
Publisher name
CEUR-WS
Place of publication
Not specified
Event location
Drienica (Slovakia)
Event date
20. 9. 2024
Event type by nationality
EUR - European event
UT WoS article code
—