Noise-robust speech triage
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F18%3APU127787" target="_blank" >RIV/00216305:26230/18:PU127787 - isvavai.cz</a>
Result on the web
<a href="https://asa.scitation.org/doi/10.1121/1.5031029" target="_blank" >https://asa.scitation.org/doi/10.1121/1.5031029</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1121/1.5031029" target="_blank" >10.1121/1.5031029</a>
Alternative languages
Result language
English
Original language name
Noise-robust speech triage
Original language description
A method is presented in which conventional speech algorithms are applied, with no modifications, to improve their performance in extremely noisy environments. It has been demonstrated that, for eigen-channel algorithms, pre-training multiple speaker identification (SID) models at a lattice of signal-to-noise-ratio (SNR) levels and then performing SID using the appropriate SNR dependent model was successful in mitigating noise at all SNR levels. In those tests, it was found that SID performance was optimized when the SNR of the testing and training data were close or identical. In this current effort multiple i-vector algorithms were used, greatly improving both processing throughput and equal error rate classification accuracy. Using identical approaches in the same noisy environment, performance of SID, language identification, gender identification, and diarization were significantly improved. A critical factor in this improvement is speech activity detection (SAD) that performs reliably in extremely noisy environments, where the speech itself is barely audible. To optimize SAD operation at all SNR levels, two algorithms were employed. The first maximized detection probability at low levels (-10 dB ≤ SNR < 10 dB) using just the voiced speech envelope, and the second exploited features extracted from the original speech to improve overall accuracy at higher quality levels (SNR ≥ 10 dB).
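The SNR-matched model selection described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the lattice levels, the model labels, and the function names `nearest_snr_model` and `choose_sad` are hypothetical, and the 10 dB switch point between the two SAD algorithms is assumed from the low/high SNR split mentioned in the description.

```python
from typing import Dict

# Hypothetical lattice: training SNR level (dB) -> label of a model
# pre-trained on data at that SNR. Levels and labels are illustrative only.
SID_MODELS: Dict[float, str] = {
    -10.0: "sid_m10dB",
    0.0: "sid_0dB",
    10.0: "sid_10dB",
    20.0: "sid_20dB",
}


def nearest_snr_model(estimated_snr_db: float, models: Dict[float, str]) -> str:
    """Return the model whose training SNR is closest to the estimated test SNR."""
    if not models:
        raise ValueError("model lattice is empty")
    best_level = min(models, key=lambda level: abs(level - estimated_snr_db))
    return models[best_level]


def choose_sad(estimated_snr_db: float, threshold_db: float = 10.0) -> str:
    """Pick a SAD variant: envelope-based below the threshold, feature-based otherwise.

    The threshold value is an assumption made for this sketch.
    """
    return "envelope_sad" if estimated_snr_db < threshold_db else "feature_sad"
```

For example, `nearest_snr_model(3.0, SID_MODELS)` picks the model trained at 0 dB, reflecting the observation that performance was best when testing and training SNR were close or identical. How the test SNR is estimated is outside the scope of this sketch.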
Czech name
—
Czech description
—
Classification
Type
J<sub>imp</sub> - Article in a specialist periodical, which is included in the Web of Science database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
<a href="/en/project/VI20152020025" target="_blank" >VI20152020025: Information mining in speech acquired by distant microphones - DRAPÁK</a>
Continuities
S - Specific university research
Others
Publication year
2018
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Journal of the Acoustical Society of America
ISSN
0001-4966
e-ISSN
1520-8524
Volume of the periodical
143
Issue of the periodical within the volume
4
Country of publishing house
US - UNITED STATES
Number of pages
8
Pages from-to
2313-2320
UT code for WoS article
000430570900039
EID of the result in the Scopus database
2-s2.0-85045888415