Single Channel Target Speaker Extraction and Recognition with Speaker Beam
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F18%3APU130735" target="_blank" >RIV/00216305:26230/18:PU130735 - isvavai.cz</a>
Result on the web
<a href="http://www.fit.vutbr.cz/research/pubs/all.php?id=11721" target="_blank" >http://www.fit.vutbr.cz/research/pubs/all.php?id=11721</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/ICASSP.2018.8462661" target="_blank" >10.1109/ICASSP.2018.8462661</a>
Alternative languages
Result language
English
Original language name
Single Channel Target Speaker Extraction and Recognition with Speaker Beam
Original language description
This paper addresses single-channel speech recognition of a target speaker in a mixture of speech signals. We propose to exploit auxiliary speaker information, provided by an adaptation utterance from the target speaker, to extract and recognize only that speaker. Using such auxiliary information, we can build a speaker extraction neural network (NN) that is independent of the number of sources in the mixture and that can track speakers across different utterances, two issues that challenge conventional approaches to speech recognition of mixtures. We call this informed speaker extraction scheme "SpeakerBeam". SpeakerBeam exploits a recently developed context-adaptive deep NN (CADNN) that tracks speech from a target speaker using a speaker adaptation layer, whose parameters are adjusted according to auxiliary features representing the target speaker's characteristics. SpeakerBeam was previously investigated for speaker extraction with a microphone array. In this paper, we demonstrate that it is also effective for single-channel speaker extraction. The speaker adaptation layer can be employed either to build a speaker-adaptive acoustic model that recognizes only the target speaker, or to build a mask-based speaker extraction network that extracts the target speech from the mixture signal prior to recognition. We also show that the latter extraction network can be optimized jointly with the acoustic model to further improve ASR performance.
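The core mechanism described above, a speaker adaptation layer whose parameters are formed from auxiliary speaker features, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the dimensions, the softmax mixing, and the names (`W_bases`, `W_aux`, `adaptation_layer`) are illustrative assumptions. The idea shown is that the layer holds several parallel base transforms, and an auxiliary speaker embedding selects a speaker-dependent mixture of them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not taken from the paper)
feat_dim, hidden_dim, aux_dim, n_bases = 40, 64, 20, 3

# Base sub-layer weights: several parallel linear transforms of the input
W_bases = rng.standard_normal((n_bases, hidden_dim, feat_dim)) * 0.1

# Auxiliary transform: maps the target-speaker embedding (derived from
# the adaptation utterance) to mixing weights over the bases
W_aux = rng.standard_normal((n_bases, aux_dim)) * 0.1

def adaptation_layer(x, aux):
    """Context-adaptive layer: combine base transforms using
    speaker-dependent mixing weights computed from auxiliary features."""
    alpha = np.exp(W_aux @ aux)
    alpha /= alpha.sum()                       # softmax mixing weights
    W = np.tensordot(alpha, W_bases, axes=1)   # speaker-adapted weight matrix
    return np.maximum(W @ x, 0.0)              # ReLU activation

x = rng.standard_normal(feat_dim)    # one frame of mixture features
aux = rng.standard_normal(aux_dim)   # target-speaker embedding
h = adaptation_layer(x, aux)
print(h.shape)  # (64,)
```

In an actual system such a layer would sit inside either the acoustic model or the mask-estimation network, and its base weights would be trained jointly with the rest of the network.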
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
<a href="/en/project/LQ1602" target="_blank" >LQ1602: IT4Innovations excellence in science</a><br>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2018
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of ICASSP 2018
ISBN
978-1-5386-4658-8
ISSN
—
e-ISSN
—
Number of pages
5
Pages from-to
5554-5558
Publisher name
IEEE Signal Processing Society
Place of publication
Calgary
Event location
Calgary
Event date
Apr 15, 2018
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000446384605144