Study on the use and adaptation of bottleneck features for robust speech recognition of nonlinearly distorted speech
The result's identifiers
Result code in IS VaVaI
RIV/46747885:24220/16:00000301 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F16%3A00000301)
Result on the web
http://dx.doi.org/10.5220/0005955500650071
DOI - Digital Object Identifier
10.5220/0005955500650071
Alternative languages
Result language
English
Original language name
Study on the use and adaptation of bottleneck features for robust speech recognition of nonlinearly distorted speech
Original language description
This paper focuses on the robust recognition of nonlinearly distorted speech. We have previously reported that hybrid acoustic models based on a combination of Hidden Markov Models and Deep Neural Networks (HMM-DNNs) are better suited to this task than conventional HMMs utilizing Gaussian Mixture Models (HMM-GMMs). To further improve recognition accuracy, this paper investigates the possibility of combining the modeling power of deep neural networks with the adaptation to given acoustic conditions. For this purpose, the deep neural networks are utilized to produce bottleneck coefficients / features (BNC). The BNCs are subsequently used for training of HMM-GMM based acoustic models and then adapted using Constrained Maximum Likelihood Linear Regression (CMLLR). Our results obtained for three types of nonlinear distortions and three types of input features show that the adapted BNC-based system (a) outperforms HMM-DNN acoustic models in the case of strong compression and (b) yields comparable performance for speech affected by nonlinear amplification in the analog domain.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
JC - Computer hardware and software
OECD FORD branch
—
Result continuities
Project
TA04010199: MULTILINMEDIA - Multilingual Multimedia Monitoring and Analyzing Platform
Continuities
P - Research and development project financed from public funds (with a link to CEP)
S - Specific university research
Others
Publication year
2016
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proc. of 13th International Conference on Signal Processing and Multimedia Applications (SIGMAP 2016)
ISBN
978-989-758-196-0
ISSN
—
e-ISSN
—
Number of pages
7
Pages from-to
65-71
Publisher name
SciTePress
Place of publication
Lisbon, Portugal
Event location
Lisbon, Portugal
Event date
Jan 1, 2016
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000391091400006