Fine-Tuning Self-Supervised Models for Language Identification Using Orthonormal Constraint
Result identifiers
Result code in IS VaVaI
RIV/00216305:26230/24:PU154699 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F24%3APU154699)
Result on the web
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446751
DOI - Digital Object Identifier
10.1109/ICASSP48485.2024.10446751 (http://dx.doi.org/10.1109/ICASSP48485.2024.10446751)
Alternative languages
Result language
English
Title in the original language
Fine-Tuning Self-Supervised Models for Language Identification Using Orthonormal Constraint
Result description in the original language
Self-supervised models trained with high linguistic diversity, such as the XLS-R model, can be effectively fine-tuned for the language recognition task. Typically, a back-end classifier followed by a statistics pooling layer is added during training. Commonly used back-end classifiers require a large number of parameters to be trained, which is not ideal in limited data conditions. In this work, we explore back-ends with fewer parameters using the factorized Time Delay Neural Network (TDNN-F). The TDNN-F architecture is also integrated into Emphasized Channel Attention, Propagation and Aggregation-TDNN (ECAPA-TDNN) models, termed ECAPA-TDNN-F, reducing the number of parameters by 30 to 50% absolute, with competitive accuracies and no change in minimum cost. The results show that ECAPA-TDNN-F can be extended to tasks where ECAPA-TDNN is suitable. We also test the effectiveness of a linear classifier and a variant, the Orthonormal linear classifier, previously used in x-vector type systems. The models are trained with NIST LRE17 data and evaluated on the NIST LRE17, LRE22 and ATCO2 LID datasets. Both linear classifiers outperform conventional back-ends, with improvements in accuracy between 0.9% and 9.1%.
Title in English
Fine-Tuning Self-Supervised Models for Language Identification Using Orthonormal Constraint
Result description in English
Self-supervised models trained with high linguistic diversity, such as the XLS-R model, can be effectively fine-tuned for the language recognition task. Typically, a back-end classifier followed by a statistics pooling layer is added during training. Commonly used back-end classifiers require a large number of parameters to be trained, which is not ideal in limited data conditions. In this work, we explore back-ends with fewer parameters using the factorized Time Delay Neural Network (TDNN-F). The TDNN-F architecture is also integrated into Emphasized Channel Attention, Propagation and Aggregation-TDNN (ECAPA-TDNN) models, termed ECAPA-TDNN-F, reducing the number of parameters by 30 to 50% absolute, with competitive accuracies and no change in minimum cost. The results show that ECAPA-TDNN-F can be extended to tasks where ECAPA-TDNN is suitable. We also test the effectiveness of a linear classifier and a variant, the Orthonormal linear classifier, previously used in x-vector type systems. The models are trained with NIST LRE17 data and evaluated on the NIST LRE17, LRE22 and ATCO2 LID datasets. Both linear classifiers outperform conventional back-ends, with improvements in accuracy between 0.9% and 9.1%.
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
—
Linkages
S - Specific university research
Other
Publication year
2024
Data confidentiality code
S - Complete and accurate data on the project are not subject to protection under special legal regulations
Data specific to the result type
Article title in the proceedings
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISBN
979-8-3503-4485-1
ISSN
—
e-ISSN
—
Number of pages
5
Pages from-to
11921-11925
Publisher name
IEEE Signal Processing Society
Place of publication
Seoul
Event location
Seoul
Event date
14 April 2024
Type of event by nationality
WRD - Worldwide event
UT WoS code of the article
—