Unsupervised Language Model Adaptation for Speech Recognition with no Extra Resources

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F19%3APU134188" target="_blank" >RIV/00216305:26230/19:PU134188 - isvavai.cz</a>
Result on the web
<a href="https://www.dega-akustik.de/publikationen/online-proceedings/" target="_blank" >https://www.dega-akustik.de/publikationen/online-proceedings/</a>
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Title in original language
Unsupervised Language Model Adaptation for Speech Recognition with no Extra Resources
Result description in original language
Classically, automatic speech recognition (ASR) models are decomposed into acoustic models and language models (LMs). LMs usually exploit the linguistic structure on a purely textual level and usually contribute strongly to an ASR system's performance. LMs are estimated on large amounts of textual data covering the target domain. However, most utterances cover more specific topics, e.g. influencing the vocabulary used. Therefore, it is desirable to have the LM adjusted to an utterance's topic. Previous work achieves this by crawling extra data from the web or by using significant amounts of previous speech data to train topic-specific LMs. We propose a way of adapting the LM directly using the target utterance to be recognized. The corresponding adaptation needs to be done in an unsupervised or automatically supervised way based on the speech input. To deal with corresponding errors robustly, we employ topic encodings from the recently proposed Subspace Multinomial Model. This model also avoids any need for explicit topic labelling during training or recognition, making the proposed method straightforward to use. We demonstrate the performance of the method on the Librispeech corpus, which consists of read fiction books, and we discuss its behaviour qualitatively.
Title in English
Unsupervised Language Model Adaptation for Speech Recognition with no Extra Resources
Result description in English
Classically, automatic speech recognition (ASR) models are decomposed into acoustic models and language models (LMs). LMs usually exploit the linguistic structure on a purely textual level and usually contribute strongly to an ASR system's performance. LMs are estimated on large amounts of textual data covering the target domain. However, most utterances cover more specific topics, e.g. influencing the vocabulary used. Therefore, it is desirable to have the LM adjusted to an utterance's topic. Previous work achieves this by crawling extra data from the web or by using significant amounts of previous speech data to train topic-specific LMs. We propose a way of adapting the LM directly using the target utterance to be recognized. The corresponding adaptation needs to be done in an unsupervised or automatically supervised way based on the speech input. To deal with corresponding errors robustly, we employ topic encodings from the recently proposed Subspace Multinomial Model. This model also avoids any need for explicit topic labelling during training or recognition, making the proposed method straightforward to use. We demonstrate the performance of the method on the Librispeech corpus, which consists of read fiction books, and we discuss its behaviour qualitatively.

Classification

Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

Project
<a href="/cs/project/EF16_027%2F0008371" target="_blank" >EF16_027/0008371: International Mobility of Researchers of Brno University of Technology</a><br>
Linkages
P - Research and development project financed from public funds (with a link to CEP)

Other

Year of implementation
2019
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Title of the article in the proceedings
Proceedings of DAGA 2019
ISBN
978-3-939296-14-0
ISSN
—
e-ISSN
—
Number of pages
4
Pages from-to
954-957
Publisher name
DEGA Head office, Deutsche Gesellschaft für Akustik
Place of publication
Rostock
Event location
Rostock
Event date
18. 3. 2019
Event type by nationality
WRD - Worldwide event
UT WoS article code
—

Similar results (10)

Robust Recognition of Speech with Background Music in Acoustically Under-Resourced Scenarios
The IWSLT 2021 BUT Speech Translation Systems
Multi-Condition Training for Unknown Environment Adaptation in Robust ASR Under Real Conditions
