Non-Parametric Bayesian Subspace Models for Acoustic Unit Discovery
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F22%3APU144939" target="_blank" >RIV/00216305:26230/22:PU144939 - isvavai.cz</a>
Result on the web
<a href="https://ieeexplore.ieee.org/document/9767690" target="_blank" >https://ieeexplore.ieee.org/document/9767690</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/TASLP.2022.3171975" target="_blank" >10.1109/TASLP.2022.3171975</a>
Alternative languages
Result language
English
Title in the original language
Non-Parametric Bayesian Subspace Models for Acoustic Unit Discovery
Result description in the original language
This work investigates subspace non-parametric models for the task of learning a set of acoustic units from unlabeled speech recordings. We constrain the base measure of a Dirichlet process mixture with a phonetic subspace estimated from other source languages to build an educated prior, thereby forcing the learned acoustic units to resemble phones of known source languages. Two types of models are proposed: (i) the Subspace HMM (SHMM), which assumes that the phonetic subspace is the same for every language, and (ii) the Hierarchical-Subspace HMM (H-SHMM), which relaxes this assumption and allows a language-specific subspace to be estimated on the unlabeled target data. These models are applied to three languages (English, Yoruba, and Mboshi) and compared with various competitive acoustic unit discovery baselines. Experimental results show that both subspace models outperform other systems in terms of clustering quality and segmentation accuracy. Moreover, we observe that the H-SHMM provides results superior to the SHMM, supporting the idea that language-specific priors are preferable to language-agnostic priors for acoustic unit discovery.
Title in English
Non-Parametric Bayesian Subspace Models for Acoustic Unit Discovery
Result description in English
This work investigates subspace non-parametric models for the task of learning a set of acoustic units from unlabeled speech recordings. We constrain the base measure of a Dirichlet process mixture with a phonetic subspace estimated from other source languages to build an educated prior, thereby forcing the learned acoustic units to resemble phones of known source languages. Two types of models are proposed: (i) the Subspace HMM (SHMM), which assumes that the phonetic subspace is the same for every language, and (ii) the Hierarchical-Subspace HMM (H-SHMM), which relaxes this assumption and allows a language-specific subspace to be estimated on the unlabeled target data. These models are applied to three languages (English, Yoruba, and Mboshi) and compared with various competitive acoustic unit discovery baselines. Experimental results show that both subspace models outperform other systems in terms of clustering quality and segmentation accuracy. Moreover, we observe that the H-SHMM provides results superior to the SHMM, supporting the idea that language-specific priors are preferable to language-agnostic priors for acoustic unit discovery.
Classification
Type
J<sub>imp</sub> - Article in a journal indexed in the Web of Science database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
<a href="/cs/project/GX19-26934X" target="_blank" >GX19-26934X: Neural Representations in Multimodal and Multilingual Modeling</a><br>
Linkages
P - Research and development project financed from public funds (with a link to CEP)
Other
Year of implementation
2022
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Journal name
IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING
ISSN
2329-9290
e-ISSN
2329-9304
Journal volume
30
Issue number within the volume
5
Publisher country
US - United States of America
Number of pages
16
Pages from-to
1902-1917
UT WoS code of the article
000811572000001
EID of the result in the Scopus database
2-s2.0-85129456463