Contextual Biasing Methods for Improving Rare Word Detection in Automatic Speech Recognition
Identifikátory výsledku
Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F24%3APU154700" target="_blank" >RIV/00216305:26230/24:PU154700 - isvavai.cz</a>
Výsledek na webu
<a href="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10447465" target="_blank" >https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10447465</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/ICASSP48485.2024.10447465" target="_blank" >10.1109/ICASSP48485.2024.10447465</a>
Alternativní jazyky
Jazyk výsledku
angličtina
Název v původním jazyce
Contextual Biasing Methods for Improving Rare Word Detection in Automatic Speech Recognition
Popis výsledku v původním jazyce
In specialized domains like Air Traffic Control (ATC), a notable challenge in porting a deployed Automatic Speech Recognition (ASR) system from one airport to another is the alteration in the set of crucial words that must be ac- curately detected in the new environment. Typically, such words have limited occurrences in training data, making it impractical to retrain the ASR system. This paper explores innovative word-boosting techniques to improve the detec- tion rate of such rare words in the ASR hypotheses for the ATC domain. Two acoustic models are investigated: a hybrid CNN-TDNNF model trained from scratch and a pre-trained wav2vec2-based XLSR model fine-tuned on a common ATC dataset. The word boosting is done in three ways. First, an out-of-vocabulary word addition method is explored. Second, G-boosting is explored, which amends the language model before building the decoding graph. Third, the boosting is performed on the fly during decoding using lattice re-scoring. The results indicate that the G-boosting method performs best and provides an approximately 30-43% relative improvement in recall of the boosted words. Moreover, a relative improve- ment of up to 48% is obtained upon combining G-boosting and lattice-rescoring
Název v anglickém jazyce
Contextual Biasing Methods for Improving Rare Word Detection in Automatic Speech Recognition
Popis výsledku anglicky
In specialized domains like Air Traffic Control (ATC), a notable challenge in porting a deployed Automatic Speech Recognition (ASR) system from one airport to another is the alteration in the set of crucial words that must be ac- curately detected in the new environment. Typically, such words have limited occurrences in training data, making it impractical to retrain the ASR system. This paper explores innovative word-boosting techniques to improve the detec- tion rate of such rare words in the ASR hypotheses for the ATC domain. Two acoustic models are investigated: a hybrid CNN-TDNNF model trained from scratch and a pre-trained wav2vec2-based XLSR model fine-tuned on a common ATC dataset. The word boosting is done in three ways. First, an out-of-vocabulary word addition method is explored. Second, G-boosting is explored, which amends the language model before building the decoding graph. Third, the boosting is performed on the fly during decoding using lattice re-scoring. The results indicate that the G-boosting method performs best and provides an approximately 30-43% relative improvement in recall of the boosted words. Moreover, a relative improve- ment of up to 48% is obtained upon combining G-boosting and lattice-rescoring
Klasifikace
Druh
D - Stať ve sborníku
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Návaznosti výsledku
Projekt
—
Návaznosti
S - Specificky vyzkum na vysokych skolach
Ostatní
Rok uplatnění
2024
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Údaje specifické pro druh výsledku
Název statě ve sborníku
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISBN
979-8-3503-4485-1
ISSN
—
e-ISSN
—
Počet stran výsledku
5
Strana od-do
12652-12656
Název nakladatele
IEEE Signal Processing Society
Místo vydání
Seoul
Místo konání akce
Seoul
Datum konání akce
14. 4. 2024
Typ akce podle státní příslušnosti
WRD - Celosvětová akce
Kód UT WoS článku
—