Utilizing VOiCES dataset for multichannel speaker verification with beamforming

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F20%3APU136531" target="_blank" >RIV/00216305:26230/20:PU136531 - isvavai.cz</a>
Result on the web
<a href="https://www.isca-speech.org/archive/Odyssey_2020/abstracts/80.html" target="_blank" >https://www.isca-speech.org/archive/Odyssey_2020/abstracts/80.html</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.21437/Odyssey.2020-27" target="_blank" >10.21437/Odyssey.2020-27</a>

Alternative languages

Result language
English
Title in original language
Utilizing VOiCES dataset for multichannel speaker verification with beamforming
Result description in original language
The VOiCES from a Distance Challenge 2019 aimed at the evaluation of speaker verification (SV) systems using single-channel trials based on the Voices Obscured in Complex Environmental Settings (VOiCES) corpus. Since the corpus comprises recordings of the same utterances captured simultaneously by multiple microphones in the same environments, it is also suitable for multichannel experiments. In this work, we design a multichannel dataset as well as development and evaluation trials for SV inspired by the VOiCES challenge. Alternatives discarding harmful microphones are presented as well. We assess the utilization of the created dataset for x-vector-based SV with beamforming as a front end. Standard fixed beamforming and NN-supported beamforming using simulated data and ideal binary masks (IBM) are compared with another variant of NN-supported beamforming that is trained solely on the VOiCES data. The lack of data revealed by experiments with the beamformer trained on VOiCES data was tackled by means of a variant of SpecAugment applied to magnitude spectra. This approach led to as much as 10% relative improvement in EER, pushing results closer to those obtained by a good beamformer based on IBMs.
Title in English
Utilizing VOiCES dataset for multichannel speaker verification with beamforming
Result description in English
The VOiCES from a Distance Challenge 2019 aimed at the evaluation of speaker verification (SV) systems using single-channel trials based on the Voices Obscured in Complex Environmental Settings (VOiCES) corpus. Since the corpus comprises recordings of the same utterances captured simultaneously by multiple microphones in the same environments, it is also suitable for multichannel experiments. In this work, we design a multichannel dataset as well as development and evaluation trials for SV inspired by the VOiCES challenge. Alternatives discarding harmful microphones are presented as well. We assess the utilization of the created dataset for x-vector-based SV with beamforming as a front end. Standard fixed beamforming and NN-supported beamforming using simulated data and ideal binary masks (IBM) are compared with another variant of NN-supported beamforming that is trained solely on the VOiCES data. The lack of data revealed by experiments with the beamformer trained on VOiCES data was tackled by means of a variant of SpecAugment applied to magnitude spectra. This approach led to as much as 10% relative improvement in EER, pushing results closer to those obtained by a good beamformer based on IBMs.
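
The description mentions a variant of SpecAugment applied to magnitude spectra but does not say which variant. As a rough illustration only, the following sketch shows the generic idea of masking one random frequency band and one random time span of a magnitude spectrogram. The function name `spec_augment`, its parameters, and the choice of zero as the mask value are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def spec_augment(magnitude, max_freq_width=8, max_time_width=20, rng=None):
    """Hypothetical SpecAugment-style masking of a magnitude spectrogram.

    magnitude: array of shape (freq_bins, frames) with non-negative values.
    Returns a masked copy; the input array is left unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(magnitude, dtype=float, copy=True)
    n_freq, n_frames = out.shape

    # Frequency mask: pick a width in [0, max_freq_width], then a start bin
    # such that the band fits inside the spectrum.
    f_width = int(rng.integers(0, min(max_freq_width, n_freq) + 1))
    f_start = int(rng.integers(0, n_freq - f_width + 1))
    out[f_start:f_start + f_width, :] = 0.0

    # Time mask: same procedure along the frame axis.
    t_width = int(rng.integers(0, min(max_time_width, n_frames) + 1))
    t_start = int(rng.integers(0, n_frames - t_width + 1))
    out[:, t_start:t_start + t_width] = 0.0

    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for an STFT magnitude (257 bins, 100 frames).
    spec = np.abs(rng.standard_normal((257, 100))) + 1.0
    masked = spec_augment(spec, rng=rng)
    print(masked.shape, int((masked == 0.0).sum()), "masked cells")
```

In a setup like the one described, such masking would be applied to the training inputs of the mask-estimating network so that a small training set yields more varied examples; the mask widths here are arbitrary placeholders.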

Classification

Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
The result was created during the realization of more than one project. More information is available in the Projects tab.
Continuities
P - Research and development project financed from public sources (with a link to CEP)

Others

Publication year
2020
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Article name in the collection
Proceedings of Odyssey 2020 The Speaker and Language Recognition Workshop
ISBN
—
ISSN
2312-2846
e-ISSN
—
Number of pages
7
Pages from-to
187-193
Publisher name
International Speech Communication Association
Place of publication
Tokyo
Event location
Tokyo
Event date
1 November 2020
Type of event by nationality
WRD - Worldwide event
UT WoS code of the article
—

Similar results (10)

Speaker Verification with Application-Aware Beamforming
Multisv: Dataset for Far-Field Multi-Channel Speaker Verification
Dereverberation and Beamforming in Robust Far-Field Speaker Recognition
