Learning Speaker Representation for Neural Network Based Multichannel Speaker Extraction
The result's identifiers
Result code in IS VaVaI
RIV/00216305:26230/17:PU126480 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F17%3APU126480)
Result on the web
http://www.fit.vutbr.cz/research/groups/speech/publi/2017/zmolikova_asru2017.pdf
DOI - Digital Object Identifier
10.1109/ASRU.2017.8268910 (http://dx.doi.org/10.1109/ASRU.2017.8268910)
Alternative languages
Result language
English
Original language name
Learning Speaker Representation for Neural Network Based Multichannel Speaker Extraction
Original language description
Recently, schemes employing deep neural networks (DNNs) for extracting speech from noisy observations have demonstrated great potential for noise-robust automatic speech recognition. However, these schemes are not well suited to cases where the interfering noise is another speaker. To enable extracting a target speaker from a mixture of speakers, we have recently proposed to inform the neural network using speaker information extracted from an adaptation utterance from the same speaker. In our previous work, we explored ways to inform the network about the speaker and found a speaker adaptive layer approach to be suitable for this task. In our experiments, we used speaker features designed for speaker recognition tasks as the additional speaker information, which may not be optimal for the speaker extraction task. In this paper, we propose using a sequence summarizing scheme that enables learning the speaker representation jointly with the network. Furthermore, we extend the previous experiments to demonstrate the potential of our proposed method as a front-end for speech recognition and explore the effect of additional noise on the performance of the method.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
S - Specific research at universities
Others
Publication year
2017
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of ASRU 2017
ISBN
978-1-5090-4788-8
ISSN
—
e-ISSN
—
Number of pages
8
Pages from-to
8-15
Publisher name
IEEE Signal Processing Society
Place of publication
Okinawa
Event location
Okinawa
Event date
Dec 16, 2017
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000426066100002