Optimization of Speaker-aware Multichannel Speech Extraction with ASR Criterion
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F18%3APU130736" target="_blank" >RIV/00216305:26230/18:PU130736 - isvavai.cz</a>
Result on the web
<a href="http://www.fit.vutbr.cz/research/pubs/all.php?id=11722" target="_blank" >http://www.fit.vutbr.cz/research/pubs/all.php?id=11722</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/ICASSP.2018.8461533" target="_blank" >10.1109/ICASSP.2018.8461533</a>
Alternative languages
Result language
angličtina
Original language name
Optimization of Speaker-aware Multichannel Speech Extraction with ASR Criterion
Original language description
This paper addresses the problem of recognizing speech corrupted by overlapping speakers in a multichannel setting. To extract a target speaker from the mixture, we use a neural network based beamformer which uses masks estimated by a neural network to compute statistically optimal spatial filters. Following our previous work, we inform the neural network about the target speaker using information extracted from an adaptation utterance, enabling the network to track the target speaker. While in the previous work, this method was used to separately extract the speaker and then pass such preprocessed speech to a speech recognition system, here we explore training both systems jointly with a common speech recognition criterion. We show that integrating the two systems and training for the final objective improves the performance. In addition, the integration enables further sharing of information between the acoustic model and the speaker extraction system, by making use of the predicted HMMstate posteriors to refine the masks used for beamforming.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
Result was created during the realization of more than one project. More information in the Projects tab.
Continuities
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)
Others
Publication year
2018
Confidentiality
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Data specific for result type
Article name in the collection
Proceedings of ICASSP 2018
ISBN
978-1-5386-4658-8
ISSN
—
e-ISSN
—
Number of pages
5
Pages from-to
6702-6706
Publisher name
IEEE Signal Processing Society
Place of publication
Calgary
Event location
Calgary
Event date
Apr 15, 2018
Type of event by nationality
WRD - Celosvětová akce
UT code for WoS article
000446384606172