Noise-robust speech triage

The result's identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F18%3APU127787" target="_blank" >RIV/00216305:26230/18:PU127787 - isvavai.cz</a>
Result on the web
<a href="https://asa.scitation.org/doi/10.1121/1.5031029" target="_blank" >https://asa.scitation.org/doi/10.1121/1.5031029</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1121/1.5031029" target="_blank" >10.1121/1.5031029</a>

Alternative languages

Result language
English
Original language name
Noise-robust speech triage
Original language description
A method is presented in which conventional speech algorithms are applied, with no modifications, to improve their performance in extremely noisy environments. It has been demonstrated that, for eigen-channel algorithms, pre-training multiple speaker identification (SID) models at a lattice of signal-to-noise ratio (SNR) levels and then performing SID using the appropriate SNR-dependent model was successful in mitigating noise at all SNR levels. In those tests, it was found that SID performance was optimized when the SNR of the testing and training data were close or identical. In the current effort, multiple i-vector algorithms were used, greatly improving both processing throughput and equal error rate classification accuracy. Using identical approaches in the same noisy environment, performance of SID, language identification, gender identification, and diarization were significantly improved. A critical factor in this improvement is speech activity detection (SAD) that performs reliably in extremely noisy environments, where the speech itself is barely audible. To optimize SAD operation at all SNR levels, two algorithms were employed. The first maximized detection probability at low levels (−10 dB ≤ SNR < 10 dB) using just the voiced speech envelope, and the second exploited features extracted from the original speech to improve overall accuracy at higher quality levels (SNR ≥ 10 dB).
Czech name
—
Czech description
—
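
The SNR-matched selection outlined in the original language description above can be illustrated with a minimal sketch. It is not the authors' implementation: the lattice values, the function names, and the labels "envelope" and "feature" are hypothetical, and the only figure taken from the description is the 10 dB boundary between the two SAD algorithms. The sketch assumes an SNR estimate for the test audio is already available.

```python
from typing import Sequence

# Hypothetical lattice of SNR levels (in dB) at which models were pre-trained.
# The actual levels used in the article are not given in this record.
SNR_LATTICE_DB = (-10.0, -5.0, 0.0, 5.0, 10.0, 15.0, 20.0)


def nearest_snr_level(estimated_snr_db: float,
                      lattice: Sequence[float] = SNR_LATTICE_DB) -> float:
    """Return the lattice level closest to the estimated SNR of the test audio.

    The model trained at this level would then be used, so that training
    and testing SNR are as close as the lattice allows.
    """
    return min(lattice, key=lambda level: abs(level - estimated_snr_db))


def choose_sad(estimated_snr_db: float, threshold_db: float = 10.0) -> str:
    """Pick which of the two SAD algorithms to run.

    Below the threshold the envelope-based detector is used; at or above it
    the feature-based detector is used. The labels are illustrative only.
    """
    return "envelope" if estimated_snr_db < threshold_db else "feature"


if __name__ == "__main__":
    for snr in (-12.0, 3.2, 10.0, 18.0):
        print(snr, nearest_snr_level(snr), choose_sad(snr))
```

Choosing the nearest lattice level reflects the reported finding that performance was best when training and testing SNR were close or identical; a finer lattice would reduce the mismatch at the cost of training more models.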

Classification

Type
J<sub>imp</sub> - Article in a specialist periodical, which is included in the Web of Science database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
<a href="/en/project/VI20152020025" target="_blank" >VI20152020025: Information mining in speech acquired by distant microphones - DRAPÁK</a>
Continuities
S - Specific university research

Others

Publication year
2018
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations

Data specific for result type

Name of the periodical
Journal of the Acoustical Society of America
ISSN
0001-4966
e-ISSN
1520-8524
Volume of the periodical
143
Issue of the periodical within the volume
4
Country of publishing house
US - UNITED STATES
Number of pages
8
Pages from-to
2313-2320
UT code for WoS article
000430570900039
EID of the result in the Scopus database
2-s2.0-85045888415

Similar results(10)

Study on the Use of Deep Neural Networks for Speech Activity Detection in Broadcast Recordings
The Usage of ANN for Regression Analysis in Visible Light Positioning Systems
Research on Passive Assessment of Parkinson’s Disease Utilising Speech Biomarkers
