Noise-robust speech triage
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F18%3APU127787" target="_blank" >RIV/00216305:26230/18:PU127787 - isvavai.cz</a>
Result on the web
<a href="https://asa.scitation.org/doi/10.1121/1.5031029" target="_blank" >https://asa.scitation.org/doi/10.1121/1.5031029</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1121/1.5031029" target="_blank" >10.1121/1.5031029</a>
Alternative languages
Result language
English
Title in original language
Noise-robust speech triage
Result description in original language
A method is presented in which conventional speech algorithms are applied, with no modifications, to improve their performance in extremely noisy environments. It has been demonstrated that, for eigen-channel algorithms, pre-training multiple speaker identification (SID) models at a lattice of signal-to-noise-ratio (SNR) levels and then performing SID using the appropriate SNR-dependent model was successful in mitigating noise at all SNR levels. In those tests, it was found that SID performance was optimized when the SNRs of the testing and training data were close or identical. In this current effort, multiple i-vector algorithms were used, greatly improving both processing throughput and equal error rate classification accuracy. Using identical approaches in the same noisy environment, the performance of SID, language identification, gender identification, and diarization was significantly improved. A critical factor in this improvement is speech activity detection (SAD) that performs reliably in extremely noisy environments, where the speech itself is barely audible. To optimize SAD operation at all SNR levels, two algorithms were employed. The first maximized detection probability at low levels (-10 dB ≤ SNR < +10 dB) using just the voiced speech envelope, and the second exploited features extracted from the original speech to improve overall accuracy at higher quality levels (SNR ≥ +10 dB).
Classification
Type
J<sub>imp</sub> - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
<a href="/cs/project/VI20152020025" target="_blank" >VI20152020025: Dolování infoRmAcí z řeči Pořízené vzdÁlenými miKrofony - DRAPÁK</a><br>
Linkages
S - Specific research at universities
Other
Year of implementation
2018
Data confidentiality code
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific to the result type
Periodical title
Journal of the Acoustical Society of America
ISSN
0001-4966
e-ISSN
1520-8524
Periodical volume
143
Issue number within the volume
4
Country of the periodical's publisher
US - United States of America
Number of pages
8
Pages from-to
2313-2320
UT WoS code of the article
000430570900039
EID of the result in the Scopus database
2-s2.0-85045888415