Online Speaker Adaptation of an Acoustic Model Using Face Recognition
The result's identifiers
Result code in IS VaVaI
RIV/68407700:21230/13:00212560 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F13%3A00212560)
Alternative codes found
RIV/49777513:23520/13:43920969
Result on the web
http://dx.doi.org/10.1007/978-3-642-40585-3_48
DOI - Digital Object Identifier
10.1007/978-3-642-40585-3_48
Alternative languages
Result language
English
Original language name
Online Speaker Adaptation of an Acoustic Model Using Face Recognition
Original language description
We have proposed and evaluated a novel approach to online speaker adaptation of an acoustic model based on face recognition. Instead of the traditionally used audio-based speaker identification, we investigated the video modality for the task of speaker detection. A simulated online transcription created by a Large-Vocabulary Continuous Speech Recognition (LVCSR) system for online subtitling is evaluated using speaker-independent acoustic models, gender-dependent models, and models of particular speakers. In the experiment, the speaker-dependent acoustic models were trained offline and switched online based on the decision of the face recognizer, which reduced the Word Error Rate (WER) by 12% relative to the speaker-independent baseline system.
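The switching step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `FaceDecision` structure, the confidence threshold, the model names, and the fallback order (speaker model, then gender model, then speaker-independent model) are all assumptions made for illustration. The second helper only shows how a relative WER reduction is computed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FaceDecision:
    """Hypothetical output of a face recognizer for the current video segment."""
    speaker_id: Optional[str]  # None when no known face is recognized
    gender: Optional[str]      # e.g. "male" / "female", or None when unknown
    confidence: float          # assumed recognizer score in [0, 1]


def select_acoustic_model(
    decision: FaceDecision,
    speaker_models: Dict[str, str],
    gender_models: Dict[str, str],
    independent_model: str,
    threshold: float = 0.8,  # assumed value, not taken from the paper
) -> str:
    """Return the most specific offline-trained model the decision supports."""
    if decision.confidence >= threshold:
        if decision.speaker_id in speaker_models:
            return speaker_models[decision.speaker_id]
        if decision.gender in gender_models:
            return gender_models[decision.gender]
    # Low confidence or unknown face: keep the speaker-independent model.
    return independent_model


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative reduction, computed as (baseline - adapted) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer
```

With illustrative numbers that are not taken from the paper, a baseline WER of 25.0% and an adapted WER of 22.0% correspond to a relative reduction of 0.12, which is how a figure such as "12% relative" is to be read.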
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
JD - Use of computers, robotics and its application
OECD FORD branch
—
Result continuities
Project
GBP103/12/G084: Center for Large Scale Multi-modal Data Interpretation
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2013
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Text, Speech, and Dialogue: 16th International Conference, TSD 2013
ISBN
978-3-642-40584-6
ISSN
0302-9743
e-ISSN
—
Number of pages
8
Pages from-to
378-385
Publisher name
Springer
Place of publication
Heidelberg
Event location
Pilsen
Event date
Sep 1, 2013
Type of event by nationality
EUR - European event
UT code for WoS article
—