Multimodal Name Recognition in Live TV Subtitling
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F18%3A43952588" target="_blank" >RIV/49777513:23520/18:43952588 - isvavai.cz</a>
Result on the web
<a href="https://www.isca-speech.org/archive/Interspeech_2018/abstracts/1748.html" target="_blank" >https://www.isca-speech.org/archive/Interspeech_2018/abstracts/1748.html</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.21437/Interspeech.2018-1748" target="_blank" >10.21437/Interspeech.2018-1748</a>
Alternative languages
Result language
English
Original language name
Multimodal Name Recognition in Live TV Subtitling
Original language description
In this paper, we present a method of combining a visual text reader with an automatic speech recognition system to suppress errors on out-of-vocabulary words, specifically names. The visual text reader outputs detected words, which are mapped to a large list of names using the Levenshtein distance. The detected names are inserted into the class-based language model on the fly, which improves recognition results. To demonstrate the effect on a real speech recognition task, we use data from sports TV broadcasting, where many names are present in both the audio and video streams. We replaced manual vocabulary editing in live TV subtitling through re-speaking with an automated online process. Further, we show that automatically adding the names to the recognition vocabulary online, with forgetting, lowers the WER by 39 % relative compared with adding the names of all sportsmen to the vocabulary beforehand, and by 15 % relative compared with adding only the relevant names beforehand.
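The name-mapping and forgetting steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function and class names (`closest_name`, `ForgettingNameClass`), the distance threshold `max_distance`, the time-to-live `ttl_seconds`, and the example names are all assumptions made for illustration. The record does not state how the paper thresholds the distance or how forgetting is scheduled.

```python
from typing import Dict, List, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_name(word: str, names: List[str],
                 max_distance: int = 2) -> Optional[str]:
    """Map a visually detected word to the nearest listed name.

    Returns None when no name lies within max_distance (an assumed
    threshold), so ordinary on-screen words are not treated as names.
    """
    best: Optional[str] = None
    best_distance = max_distance + 1
    for name in names:
        distance = levenshtein(word.lower(), name.lower())
        if distance < best_distance:
            best, best_distance = name, distance
    return best


class ForgettingNameClass:
    """Hypothetical container for names currently active in the name class.

    A name is dropped once it has not been observed for ttl_seconds;
    this time-based rule is an assumed stand-in for "forgetting".
    """

    def __init__(self, ttl_seconds: float = 60.0) -> None:
        self.ttl_seconds = ttl_seconds
        self._last_seen: Dict[str, float] = {}

    def observe(self, name: str, now: float) -> None:
        """Record that a name was seen on screen at time `now`."""
        self._last_seen[name] = now

    def active(self, now: float) -> List[str]:
        """Forget stale names and return the remaining ones, sorted."""
        self._last_seen = {
            name: seen for name, seen in self._last_seen.items()
            if now - seen <= self.ttl_seconds
        }
        return sorted(self._last_seen)
```

In such a setup, a word read from the screen with a character error (for example `N0vak`) would be mapped to the listed name `Novak`, passed to `observe`, and kept available to the recognizer until it has gone unseen for longer than the time-to-live.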
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
20205 - Automation and control systems
Result continuities
Project
<a href="/en/project/GBP103%2F12%2FG084" target="_blank" >GBP103/12/G084: Center for Large Scale Multi-modal Data Interpretation</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2018
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the 19th Annual Conference of the International Speech Communication Association (Interspeech 2018)
ISBN
978-1-5108-7221-9
ISSN
2308-457X
e-ISSN
not stated
Number of pages
4
Pages from-to
3529-3532
Publisher name
Curran Associates, Inc.
Place of publication
Red Hook, NY
Event location
Hyderabad, India
Event date
Sep 2, 2018
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—