SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F19%3APU134173" target="_blank" >RIV/00216305:26230/19:PU134173 - isvavai.cz</a>
Result on the web
<a href="https://ieeexplore.ieee.org/document/8736286" target="_blank" >https://ieeexplore.ieee.org/document/8736286</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/JSTSP.2019.2922820" target="_blank" >10.1109/JSTSP.2019.2922820</a>

Alternative languages

Result language
English
Title in the original language
SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures
Result description in the original language
The processing of speech corrupted by interfering overlapping speakers is one of the challenging problems with regard to today's automatic speech recognition systems. Recently, approaches based on deep learning have made great progress toward solving this problem. Most of these approaches tackle the problem as speech separation, i.e., they blindly recover all the speakers from the mixture. In some scenarios, such as smart personal devices, we may, however, be interested in recovering one target speaker from a mixture. In this paper, we introduce SpeakerBeam, a method for extracting a target speaker from the mixture based on an adaptation utterance spoken by the target speaker. Formulating the problem as speaker extraction avoids certain issues such as label permutation and the need to determine the number of speakers in the mixture. With SpeakerBeam, we jointly learn to extract a representation from the adaptation utterance characterizing the target speaker and to use this representation to extract the speaker. We explore several ways to do this, mostly inspired by speaker adaptation in acoustic models for automatic speech recognition. We evaluate the performance on the widely used WSJ0-2mix and WSJ0-3mix datasets, and on these datasets modified with more noise or more realistic overlapping patterns. We further analyze the learned behavior by exploring the speaker representations and assessing the effect of the length of the adaptation data. The results show the benefit of including speaker information in the processing and the effectiveness of the proposed method.
Title in English
SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures
Result description in English
The processing of speech corrupted by interfering overlapping speakers is one of the challenging problems with regard to today's automatic speech recognition systems. Recently, approaches based on deep learning have made great progress toward solving this problem. Most of these approaches tackle the problem as speech separation, i.e., they blindly recover all the speakers from the mixture. In some scenarios, such as smart personal devices, we may, however, be interested in recovering one target speaker from a mixture. In this paper, we introduce SpeakerBeam, a method for extracting a target speaker from the mixture based on an adaptation utterance spoken by the target speaker. Formulating the problem as speaker extraction avoids certain issues such as label permutation and the need to determine the number of speakers in the mixture. With SpeakerBeam, we jointly learn to extract a representation from the adaptation utterance characterizing the target speaker and to use this representation to extract the speaker. We explore several ways to do this, mostly inspired by speaker adaptation in acoustic models for automatic speech recognition. We evaluate the performance on the widely used WSJ0-2mix and WSJ0-3mix datasets, and on these datasets modified with more noise or more realistic overlapping patterns. We further analyze the learned behavior by exploring the speaker representations and assessing the effect of the length of the adaptation data. The results show the benefit of including speaker information in the processing and the effectiveness of the proposed method.

Classification

Type
J<sub>imp</sub> - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
The result was created during the realization of more than one project. More information in the Projects tab.
Continuities
P - Research and development project financed from public funds (with a link to CEP)

Others

Publication year
2019
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Name of the periodical
IEEE J-STSP
ISSN
1932-4553
e-ISSN
1941-0484
Volume of the periodical
13
Issue of the periodical within the volume
4
Country of the periodical's publisher
US - United States of America
Number of pages of the result
15
Pages from-to
800-814
UT WoS code of the article
000477715300003
EID of the result in the Scopus database
2-s2.0-85069900431

Similar results (10)

Learning Speaker Representation for Neural Network Based Multichannel Speaker Extraction
Single Channel Target Speaker Extraction and Recognition with Speaker Beam
Optimization of Speaker-aware Multichannel Speech Extraction with ASR Criterion
