Experimenting With Lipreading For Large Vocabulary Continuous Speech Recognition
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F18%3A00006135" target="_blank" >RIV/46747885:24220/18:00006135 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1007/s12193-018-0266-2" target="_blank" >http://dx.doi.org/10.1007/s12193-018-0266-2</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/s12193-018-0266-2" target="_blank" >10.1007/s12193-018-0266-2</a>
Alternative languages
Result language
English
Original language name
Experimenting With Lipreading For Large Vocabulary Continuous Speech Recognition
Original language description
The vast majority of current research in the area of audiovisual speech recognition via lipreading from frontal face videos focuses on simple cases such as isolated phrase recognition or structured speech, where the vocabulary is limited to several tens of units. In this paper, we diverge from these traditional applications and investigate the effect of incorporating visual and also depth information in the task of continuous speech recognition with vocabulary sizes ranging from several hundred to half a million words. To this end, we evaluate various visual speech parametrizations, both existing and novel, that are designed to capture different kinds of information in the video and depth signals. The experiments are conducted on a moderately sized dataset of 54 speakers, each uttering 100 sentences in Czech. Both the video and depth data were captured by the Microsoft Kinect device. We show that even for large vocabularies the visual signal contains enough information to improve word accuracy by up to 22% relative to acoustic-only recognition. Somewhat surprisingly, a relative improvement of up to 16% was also reached using the interpolated depth data.
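To make the reported gains easier to interpret, here is a minimal sketch of how a relative word-accuracy improvement of the kind quoted above (22% for video, 16% for depth) is conventionally computed; the baseline figure used is purely illustrative and not taken from the paper.

def relative_improvement(baseline_acc: float, fused_acc: float) -> float:
    """Relative word-accuracy gain over the acoustic-only baseline, in percent."""
    return (fused_acc - baseline_acc) / baseline_acc * 100.0

# Hypothetical numbers for illustration only (not from the paper):
# if an acoustic-only system reached 60% word accuracy, a 22% relative
# improvement corresponds to an audiovisual accuracy of 73.2%.
acoustic_only = 60.0
audiovisual = acoustic_only * 1.22  # 73.2
print(f"{relative_improvement(acoustic_only, audiovisual):.1f}%")  # -> 22.0%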
Czech name
—
Czech description
—
Classification
Type
Jimp - Article in a specialist periodical, which is included in the Web of Science database
CEP classification
—
OECD FORD branch
20206 - Computer hardware and architecture
Result continuities
Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2018
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Journal on Multimodal User Interfaces
ISSN
1783-7677
e-ISSN
—
Volume of the periodical
12
Issue of the periodical within the volume
4
Country of publishing house
US - UNITED STATES
Number of pages
10
Pages from-to
309-318
UT code for WoS article
000448519400005
EID of the result in the Scopus database
2-s2.0-85049998576