Lightly supervised vs. semi-supervised training of acoustic model on Luxembourgish for low-resource automatic speech recognition
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F18%3APU130796" target="_blank" >RIV/00216305:26230/18:PU130796 - isvavai.cz</a>
Result on the web
<a href="https://www.isca-speech.org/archive/Interspeech_2018/abstracts/2361.html" target="_blank" >https://www.isca-speech.org/archive/Interspeech_2018/abstracts/2361.html</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.21437/Interspeech.2018-2361" target="_blank" >10.21437/Interspeech.2018-2361</a>
Alternative languages
Result language
English
Title in the original language
Lightly supervised vs. semi-supervised training of acoustic model on Luxembourgish for low-resource automatic speech recognition
Description in the original language
In this work, we focus on exploiting inexpensive data to improve the DNN acoustic model for ASR. We explore two strategies: the first uses untranscribed data from the target domain; the second concerns the proper selection of excerpts from imperfectly transcribed out-of-domain public data, such as parliamentary speeches. We found that both approaches lead to similar results, making them equally beneficial for practical use. The Luxembourgish ASR seed system had a 38.8% WER, which improved by roughly 4% absolute: to 34.6% with untranscribed data and 34.9% with lightly supervised data. Adding both databases simultaneously led to 34.4% WER, only a small further improvement. As a secondary research topic, we experiment with semi-supervised state-level minimum Bayes risk (sMBR) training. However, for sMBR we saw no improvement from adding the automatically transcribed target data, even though similar techniques yield good results with cross-entropy (CE) training.
Title in English
Lightly supervised vs. semi-supervised training of acoustic model on Luxembourgish for low-resource automatic speech recognition
Description in English
In this work, we focus on exploiting inexpensive data to improve the DNN acoustic model for ASR. We explore two strategies: the first uses untranscribed data from the target domain; the second concerns the proper selection of excerpts from imperfectly transcribed out-of-domain public data, such as parliamentary speeches. We found that both approaches lead to similar results, making them equally beneficial for practical use. The Luxembourgish ASR seed system had a 38.8% WER, which improved by roughly 4% absolute: to 34.6% with untranscribed data and 34.9% with lightly supervised data. Adding both databases simultaneously led to 34.4% WER, only a small further improvement. As a secondary research topic, we experiment with semi-supervised state-level minimum Bayes risk (sMBR) training. However, for sMBR we saw no improvement from adding the automatically transcribed target data, even though similar techniques yield good results with cross-entropy (CE) training.
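The lightly supervised strategy described above can be illustrated with a minimal sketch: decode the out-of-domain audio with the seed ASR system, then keep only the segments whose hypothesis agrees closely with the imperfect (e.g. parliamentary) transcript. The function names and the 20% WER threshold here are illustrative assumptions, not the paper's exact recipe.

```python
# Hypothetical sketch of lightly supervised data selection.
# Segments are (segment_id, imperfect_transcript, asr_hypothesis) tuples;
# the 0.2 WER threshold is an assumed, tunable value.

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance via dynamic programming."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)]

def select_segments(segments, max_wer=0.2):
    """Keep (id, transcript) pairs whose ASR hypothesis matches the
    imperfect transcript with a WER at or below max_wer."""
    kept = []
    for seg_id, transcript, hypothesis in segments:
        ref = transcript.split()
        hyp = hypothesis.split()
        wer = edit_distance(ref, hyp) / max(len(ref), 1)
        if wer <= max_wer:
            kept.append((seg_id, transcript))
    return kept

segments = [
    ("utt1", "the chamber will now vote", "the chamber will now vote"),
    ("utt2", "order order the house will rise", "water the mouse is"),
]
print(select_segments(segments))  # keeps only utt1
```

Segments that survive the filter can then be added to the acoustic-model training set; the untranscribed-data strategy is analogous but uses confidence scores instead of a reference transcript.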
Classification
Type
D - Conference paper
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
<a href="/cs/project/TJ01000208" target="_blank" >TJ01000208: NeurOnové sítě pro zpracování SIgnálu a dolování informací v řeČI - NOSIČI</a><br>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Year of implementation
2018
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Name of the proceedings
Proceedings of Interspeech 2018
ISBN
—
ISSN
1990-9772
e-ISSN
—
Number of pages
5
Pages from-to
2883-2887
Publisher name
International Speech Communication Association
Place of publication
Hyderabad
Event venue
Hyderabad, India
Event date
2. 9. 2018
Event type by nationality
WRD - Worldwide event
UT WoS article code
—