Bimodal speech recognition fusing audio-visual modalities
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F16%3A43929971" target="_blank" >RIV/49777513:23520/16:43929971 - isvavai.cz</a>
Result on the web
<a href="http://link.springer.com/chapter/10.1007/978-3-319-39516-6_16" target="_blank" >http://link.springer.com/chapter/10.1007/978-3-319-39516-6_16</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-319-39516-6_16" target="_blank" >10.1007/978-3-319-39516-6_16</a>
Alternative languages
Result language
English
Original language name
Bimodal speech recognition fusing audio-visual modalities
Original language description
In this paper, we present a novel bimodal speech recognition technique that fuses audio information (the sound signal) and visual information (lip movements) for Russian speech recognition. We propose an architecture for an automatic system for bimodal recognition of audio-visual speech, which uses one stationary Oktava microphone and one high-speed JAI Pulnix camera (200 frames per second at 640 x 480 pixels) to capture the audio and video signals. We also describe the software developed for recording an audio-visual speech database, the phonemic and visemic structures of the Russian language, and probabilistic models of bimodal speech units based on Coupled Hidden Markov Models. An implementation of a method for transforming a Coupled Hidden Markov Model into an equivalent 2-stream Hidden Markov Model is presented as well.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
JD - Use of computers, robotics and its application
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/LO1506" target="_blank" >LO1506: Sustainability support of the centre NTIS - New Technologies for the Information Society</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2016
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Human-Computer Interaction. Interaction Platforms and Techniques: 18th International Conference, HCI International 2016, Toronto, ON, Canada, July 17-22, 2016. Proceedings, Part II
ISBN
978-3-319-39515-9
ISSN
0302-9743
e-ISSN
—
Number of pages
10
Pages from-to
170-179
Publisher name
Springer
Place of publication
Heidelberg
Event location
Toronto, Canada
Event date
Jul 17, 2016
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—