Spatiotemporal Convolutional Features for Lipreading
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F17%3A00004827" target="_blank" >RIV/46747885:24220/17:00004827 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1007/978-3-319-64206-2_49" target="_blank" >http://dx.doi.org/10.1007/978-3-319-64206-2_49</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-319-64206-2_49" target="_blank" >10.1007/978-3-319-64206-2_49</a>
Alternative languages
Result language
English
Original language name
Spatiotemporal Convolutional Features for Lipreading
Original language description
We propose a visual parametrization method for lipreading and audiovisual speech recognition from frontal face videos. The presented features use learned spatiotemporal convolutions in a deep neural network that is trained to predict phonemes at the frame level. The network is trained on a manually transcribed, moderate-size dataset of Czech television broadcasts, but we show that the resulting features also generalize well to other languages. On the publicly available OuluVS dataset, a word accuracy of 91% was achieved using vanilla convolutional features, and 97.2% after fine-tuning, both substantial improvements over the state of the art on this popular benchmark. Unlike most work on lipreading, we also demonstrate the usefulness of the proposed parametrization for continuous audiovisual speech recognition.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
20204 - Robotics and automatic control
Result continuities
Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2017
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); 20th International Conference on Text, Speech and Dialogue, TSD 2017
ISBN
9783319642055
ISSN
0302-9743
e-ISSN
—
Number of pages
9
Pages from-to
438-446
Publisher name
Springer Verlag
Place of publication
Federal Republic of Germany
Event location
Prague, Czech Republic
Event date
Jan 1, 2017
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—