Contextual Biasing Methods for Improving Rare Word Detection in Automatic Speech Recognition
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F24%3APU154700" target="_blank" >RIV/00216305:26230/24:PU154700 - isvavai.cz</a>
Result on the web
<a href="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10447465" target="_blank" >https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10447465</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/ICASSP48485.2024.10447465" target="_blank" >10.1109/ICASSP48485.2024.10447465</a>
Alternative languages
Result language
English
Original language name
Contextual Biasing Methods for Improving Rare Word Detection in Automatic Speech Recognition
Original language description
In specialized domains like Air Traffic Control (ATC), a notable challenge in porting a deployed Automatic Speech Recognition (ASR) system from one airport to another is the alteration in the set of crucial words that must be accurately detected in the new environment. Typically, such words have limited occurrences in training data, making it impractical to retrain the ASR system. This paper explores innovative word-boosting techniques to improve the detection rate of such rare words in the ASR hypotheses for the ATC domain. Two acoustic models are investigated: a hybrid CNN-TDNNF model trained from scratch and a pre-trained wav2vec2-based XLSR model fine-tuned on a common ATC dataset. The word boosting is done in three ways. First, an out-of-vocabulary word addition method is explored. Second, G-boosting is explored, which amends the language model before building the decoding graph. Third, the boosting is performed on the fly during decoding using lattice re-scoring. The results indicate that the G-boosting method performs best and provides an approximately 30-43% relative improvement in recall of the boosted words. Moreover, a relative improvement of up to 48% is obtained upon combining G-boosting and lattice re-scoring.
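As a rough illustration of the word-boosting idea in the description, the sketch below re-ranks an n-best list by adding a fixed bonus for each occurrence of a boosted word, and computes a relative improvement in recall. It is a simplification under stated assumptions, not the paper's implementation: it works on an n-best list rather than a lattice, and the function names, the scores, the bonus value, and the example word "okrem" are all hypothetical.

```python
def rescore_nbest(nbest, boosted_words, bonus=2.0):
    """Re-rank (text, score) hypotheses, adding `bonus` per boosted-word hit.

    Scores are treated as log-domain values where higher is better
    (an assumption made for this sketch).
    """
    boosted = {w.lower() for w in boosted_words}
    rescored = []
    for text, score in nbest:
        hits = sum(1 for tok in text.lower().split() if tok in boosted)
        rescored.append((text, score + bonus * hits))
    # Best hypothesis first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def relative_improvement(baseline_recall, boosted_recall):
    """Relative change in recall with respect to the baseline."""
    return (boosted_recall - baseline_recall) / baseline_recall


# Hypothetical n-best list: the rare word loses to a common-word split
# until it is boosted.
nbest = [
    ("direct to oak rim", -10.0),
    ("direct to okrem", -11.5),
]
reranked = rescore_nbest(nbest, boosted_words=["okrem"], bonus=2.0)
best_text = reranked[0][0]

# Hypothetical recall values, chosen only to show the arithmetic.
gain = relative_improvement(0.50, 0.65)
```

With these made-up numbers, the hypothesis containing the boosted word moves to the top of the list, and a recall increase from 0.50 to 0.65 corresponds to a relative improvement of about 30%.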
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
S - Specific university research
Others
Publication year
2024
Confidentiality
S - The complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISBN
979-8-3503-4485-1
ISSN
—
e-ISSN
—
Number of pages
5
Pages from-to
12652-12656
Publisher name
IEEE Signal Processing Society
Place of publication
Seoul
Event location
Seoul
Event date
Apr 14, 2024
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—