On-the-Fly Text Retrieval for end-to-end ASR Adaptation
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F23%3APU149423" target="_blank" >RIV/00216305:26230/23:PU149423 - isvavai.cz</a>
Result on the web
<a href="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10095857" target="_blank" >https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10095857</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/ICASSP49357.2023.10095857" target="_blank" >10.1109/ICASSP49357.2023.10095857</a>
Alternative languages
Result language
English
Title in original language
On-the-Fly Text Retrieval for end-to-end ASR Adaptation
Result description in original language
End-to-end speech recognition models are improved by incorporating external text sources, typically by fusion with an external language model. Such language models have to be retrained whenever the corpus of interest changes. Furthermore, since they store the entire corpus in their parameters, rare words can be challenging to recall. In this work, we propose augmenting a transducer-based ASR model with a retrieval language model, which directly retrieves from an external text corpus plausible completions for a partial ASR hypothesis. These completions are then integrated into subsequent predictions by an adapter, which is trained once, so that the corpus of interest can be switched without incurring the computational overhead of retraining. Our experiments show that the proposed model significantly improves the performance of a transducer baseline on a pair of question-answering datasets. Further, it outperforms shallow fusion on recognition of named entities by about 7% relative; when the two are combined, the relative improvement increases to 13%.
Title in English
On-the-Fly Text Retrieval for end-to-end ASR Adaptation
Result description in English
End-to-end speech recognition models are improved by incorporating external text sources, typically by fusion with an external language model. Such language models have to be retrained whenever the corpus of interest changes. Furthermore, since they store the entire corpus in their parameters, rare words can be challenging to recall. In this work, we propose augmenting a transducer-based ASR model with a retrieval language model, which directly retrieves from an external text corpus plausible completions for a partial ASR hypothesis. These completions are then integrated into subsequent predictions by an adapter, which is trained once, so that the corpus of interest can be switched without incurring the computational overhead of retraining. Our experiments show that the proposed model significantly improves the performance of a transducer baseline on a pair of question-answering datasets. Further, it outperforms shallow fusion on recognition of named entities by about 7% relative; when the two are combined, the relative improvement increases to 13%.
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
S - Specific research at universities
Others
Year of implementation
2023
Data confidentiality code
S - Complete and true project data are not subject to protection under special legal regulations
Data specific to the result type
Article name in the collection
Proceedings of ICASSP 2023
ISBN
978-1-7281-6327-7
ISSN
—
e-ISSN
—
Number of pages
5
Pages from-to
1-5
Publisher name
IEEE Signal Processing Society
Place of publication
Rhodes Island
Event location
Rhodes Island, Greece
Event date
4 June 2023
Event type by nationality
WRD - Worldwide event
UT WoS article code
—