Learning Speaker Representation for Neural Network Based Multichannel Speaker Extraction

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F17%3APU126480" target="_blank" >RIV/00216305:26230/17:PU126480 - isvavai.cz</a>
Result on the web
<a href="http://www.fit.vutbr.cz/research/groups/speech/publi/2017/zmolikova_asru2017.pdf" target="_blank" >http://www.fit.vutbr.cz/research/groups/speech/publi/2017/zmolikova_asru2017.pdf</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/ASRU.2017.8268910" target="_blank" >10.1109/ASRU.2017.8268910</a>

Alternative languages

Result language
English
Title in the original language
Learning Speaker Representation for Neural Network Based Multichannel Speaker Extraction
Result description in the original language
Recently, schemes employing deep neural networks (DNNs) for extracting speech from noisy observations have demonstrated great potential for noise-robust automatic speech recognition. However, these schemes are not well suited to cases where the interfering noise is another speaker. To enable extraction of a target speaker from a mixture of speakers, we have recently proposed informing the neural network with speaker information extracted from an adaptation utterance from the same speaker. In our previous work, we explored ways to inform the network about the speaker and found a speaker adaptive layer approach to be suitable for this task. In our experiments, we used speaker features designed for speaker recognition tasks as the additional speaker information, which may not be optimal for the speaker extraction task. In this paper, we propose using a sequence summarizing scheme that enables the speaker representation to be learned jointly with the network. Furthermore, we extend the previous experiments to demonstrate the potential of our proposed method as a front-end for speech recognition and explore the effect of additional noise on the performance of the method.

Classification

Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

Project
—
Linkages
S - Specific university research

Other

Year of publication
2017
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Title of the article in the proceedings
Proceedings of ASRU 2017
ISBN
978-1-5090-4788-8
ISSN
—
e-ISSN
—
Number of pages
8
Pages from-to
8-15
Publisher name
IEEE Signal Processing Society
Place of publication
Okinawa
Event location
Okinawa
Event date
16 December 2017
Event type by nationality
WRD - Worldwide event
UT WoS code of the article
000426066100002

Similar results (10)

Optimization of Speaker-aware Multichannel Speech Extraction with ASR Criterion
Single Channel Target Speaker Extraction and Recognition with Speaker Beam
Speaker activity driven neural speech extraction
