Utilizing Lipreading in Large Vocabulary Continuous Speech Recognition

Result code in IS VaVaI
RIV/46747885:24220/17:00004828 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F17%3A00004828)
Result on the web
http://dx.doi.org/10.1007/978-3-319-66429-3_77
DOI - Digital Object Identifier
10.1007/978-3-319-66429-3_77

Result language
English
Original language name
Utilizing Lipreading in Large Vocabulary Continuous Speech Recognition
Original language description
The vast majority of current research on audiovisual speech recognition via lipreading from frontal face videos focuses on simple cases such as isolated phrase recognition or structured speech, where the vocabulary is limited to several tens of units. In this paper, we diverge from these traditional applications and investigate the effect of incorporating visual information in continuous speech recognition with vocabulary sizes ranging from several hundred to half a million words. To this end, we evaluate various visual speech parametrizations, both existing and novel, that are designed to capture different kinds of information in the video signal. The experiments are conducted on a moderately sized dataset of 54 speakers, each uttering 100 sentences in Czech. We show that even for large vocabularies the visual signal contains enough information to improve word accuracy by up to 15% relative to acoustic-only recognition.
Czech name
—
Czech description
—

Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation

Publication year
2017
Confidentiality
S - Complete and truthful project data are not subject to protection under special legal regulations

Article name in the collection
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); 19th International Conference on Speech and Computer, SPECOM 2017
ISBN
9783319664286
ISSN
0302-9743
e-ISSN
—
Number of pages
10
Pages from-to
767-776
Publisher name
Springer Verlag
Place of publication
Federal Republic of Germany
Event location
Hatfield; United Kingdom
Event date
Jan 1, 2017
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—
