Tuning of Acoustic Modeling and Adaptation Technique for a Real Speech Recognition Task
Result identifiers
Result code in IS VaVaI
RIV/49777513:23520/19:43956404 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F19%3A43956404)
Result on the web
https://link.springer.com/chapter/10.1007/978-3-030-31372-2_20#aboutcontent
DOI - Digital Object Identifier
10.1007/978-3-030-31372-2_20 (http://dx.doi.org/10.1007/978-3-030-31372-2_20)
Alternative languages
Result language
English
Title in the original language
Tuning of Acoustic Modeling and Adaptation Technique for a Real Speech Recognition Task
Result description in the original language
We began developing a Czech telephone acoustic model by evaluating various Kaldi recipes on a 500-hour Czech telephone Switchboard-like corpus. We selected the Time-Delay Neural Network (TDNN) model variant “d” with i-vector adaptation as the best-performing model on the held-out set from the corpus. The TDNN architecture with an asymmetric time-delay window also met our real-time application constraint. However, the model failed completely on a real call center task. The main problem lay in the i-vector estimation procedure. The training data are split into short utterances; in the recipe, 2-utterance pseudo-speakers are formed and i-vectors are estimated for them. Real call center utterances, however, are much longer, on the order of several minutes or more. The TDNN model was therefore trained on i-vectors that did not match those seen at test time. We propose two ways to normalize the statistics used for i-vector estimation. With this normalization, the test-data i-vectors are more compatible with the training-data i-vectors. In the paper, we also discuss additional ways of improving model accuracy on the out-of-domain real task, including the use of LSTM-based models.
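The abstract does not say which two normalizations the authors propose, so the following is only a toy illustration of the general idea, not the paper's method. It assumes a deliberately simplified 1-D i-vector model (standard normal prior, unit-variance components) and one plausible normalization: scaling the accumulated statistics so that the total frame count never exceeds a cap. The names `cap_stats`, `toy_ivector`, `stats_for`, and all numeric values are hypothetical.

```python
from typing import List, Tuple


def cap_stats(zeroth: List[float], first: List[float],
              max_count: float) -> Tuple[List[float], List[float]]:
    """Scale the statistics so the total frame count does not exceed max_count."""
    total = sum(zeroth)
    scale = min(1.0, max_count / total) if total > 0 else 1.0
    return [n * scale for n in zeroth], [f * scale for f in first]


def toy_ivector(zeroth: List[float], first: List[float],
                loading: List[float]) -> float:
    """MAP point estimate of a 1-D i-vector (simplified, assumed model)."""
    numerator = sum(t * f for t, f in zip(loading, first))
    precision = 1.0 + sum(n * t * t for n, t in zip(zeroth, loading))
    return numerator / precision


def stats_for(frames: float, offset: float) -> Tuple[List[float], List[float]]:
    """Hypothetical statistics: frames split evenly over two components,
    each shifted from its component mean by `offset`."""
    zeroth = [frames / 2, frames / 2]
    first = [n * offset for n in zeroth]
    return zeroth, first


loading = [0.1, 0.1]                           # assumed loading values
short = stats_for(frames=400, offset=0.5)      # training-like: a few seconds
long_ = stats_for(frames=30000, offset=0.5)    # test-like: several minutes

w_short = toy_ivector(*short, loading)
w_long = toy_ivector(*long_, loading)
w_long_capped = toy_ivector(*cap_stats(*long_, max_count=400), loading)
print(w_short, w_long, w_long_capped)
```

In this toy setting the same speaker offset yields a different estimate for the long recording than for the short one, because the prior shrinks the short-utterance estimate more strongly. Capping the count brings the long-recording estimate back to the value the model would have seen for training-length data.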
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
20205 - Automation and control systems
Result linkages
Project
EF16_013/0001781: LINDAT/CLARIN - Research infrastructure for language technologies - extension of the repository and computing capacity
Linkages
P - Research and development project financed from public funds (with a link to CEP)
Other
Year of publication
2019
Data confidentiality code
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific to the result type
Article name in the proceedings
Statistical Language and Speech Processing, 7th International Conference, SLSP 2019, Ljubljana, Slovenia, October 14–16, 2019, Proceedings
ISBN
978-3-030-31371-5
ISSN
0302-9743
e-ISSN
1611-3349
Number of pages
11
Pages from-to
235-245
Publisher name
Springer
Place of publication
Cham
Event location
Ljubljana, Slovenia
Event date
14. 10. 2019
Event type by nationality of participants
WRD - Worldwide event
UT WoS code of the article
—