Voice-activity and overlapped speech detection using x-vectors
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F20%3A00008350" target="_blank" >RIV/46747885:24220/20:00008350 - isvavai.cz</a>
Result on the web
<a href="https://link.springer.com/chapter/10.1007/978-3-030-58323-1_40" target="_blank" >https://link.springer.com/chapter/10.1007/978-3-030-58323-1_40</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-030-58323-1_40" target="_blank" >10.1007/978-3-030-58323-1_40</a>
Alternative languages
Result language
English
Title in original language
Voice-activity and overlapped speech detection using x-vectors
Result description in original language
X-vectors are features extracted from speech signals using pretrained deep neural networks such that they discriminate well among different speakers. Their main application lies in speaker identification and verification. This manuscript studies which other properties are encoded in x-vectors. The focus lies on distinguishing speech from noise, and utterances of a single speaker from overlapped speech. We attempt to show that the x-vector network is capable of extracting multi-purpose features that can be used by several simple back-end classifiers. This provides a common feature-extracting front-end for the tasks of voice-activity detection, overlapped speech detection, and speaker identification. Compared to the alternative strategy, that is, training independent classifiers, each with its own feature-extracting layers, for each of the tasks, the common front-end saves computational time during both the training and test phases.
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
<a href="/cs/project/TH03010018" target="_blank" >TH03010018: DeepSpot - Multilingual technology for detection and early warning</a><br>
Linkages
P - Research and development project financed from public funds (with a link to CEP)
Other
Year of implementation
2020
Data confidentiality code
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific to the result type
Article name in the proceedings
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) - 23rd International Conference on Text, Speech, and Dialogue, TSD 2020
ISBN
978-303058322-4
ISSN
03029743
e-ISSN
—
Number of pages
11
Pages from-to
366-376
Publisher name
Springer Nature Switzerland
Place of publication
Switzerland
Event location
(on-line) Brno, Czech Republic
Event date
1. 1. 2020
Event type by nationality
WRD - Worldwide event
UT WoS code of the article
000611543200040