Audio-visual Broadcast Transcription System Using Artificial Neural Networks
The result's identifiers
Result code in IS VaVaI
RIV/46747885:24220/21:00009296 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F21%3A00009296)
Result on the web
https://ieeexplore.ieee.org/document/9468830
DOI - Digital Object Identifier
10.1109/ECMSM51310.2021.9468830 (http://dx.doi.org/10.1109/ECMSM51310.2021.9468830)
Alternative languages
Result language
English
Original language name
Audio-visual Broadcast Transcription System Using Artificial Neural Networks
Original language description
This paper describes a new system for audio-visual transcription of TV broadcast news. Over the last few years, our audio-only broadcast transcription system has been extended to extract additional visual information, especially from TV video recordings. This contribution describes the new extension modules and algorithms, which mainly serve visual information extraction. Deep Neural Networks combined with Hidden Markov Models (DNN-HMM) are used for recognition of the audio speech signal. Classification of the relevant visual signal is based on Convolutional Neural Networks (CNN). The newly developed transcription system contains additional modules for the detection and identification of human faces, TV logos, and company logos. Another module was designed for Optical Character Recognition (OCR) of text, which appears very often in video recordings of TV news. The whole audio-visual broadcast transcription system was tested on a relatively large database (817 hours) that has been completely transcribed. The system also supports intelligent search in the data transcribed from the audio and/or visual signals.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
TH03010018: DeepSpot - Multilingual technology for spotting and instant alerting
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2021
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
2021 IEEE International Workshop of Electronics, Control, Measurement, Signals and their Application to Mechatronics, ECMSM 2021
ISBN
978-153861757-1
ISSN
—
e-ISSN
—
Number of pages
5
Pages from-to
—
Publisher name
IEEE
Place of publication
—
Event location
Liberec, Czech Republic
Event date
Jan 1, 2021
Type of event by nationality
EUR - European event
UT code for WoS article
—