Compact Network for Speakerbeam Target Speaker Extraction

Result identifiers

Result code in IS VaVaI
RIV/00216305:26230/19:PU134186 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F19%3APU134186)
Result on the web
https://ieeexplore.ieee.org/document/8683087
DOI - Digital Object Identifier
10.1109/ICASSP.2019.8683087 (http://dx.doi.org/10.1109/ICASSP.2019.8683087)

Alternative languages

Result language
English
Title in the original language
Compact Network for Speakerbeam Target Speaker Extraction
Result description in the original language
Speech separation that separates a mixture of speech signals into each of its sources has been an active research topic for a long time and has seen recent progress with the advent of deep learning. A related problem is target speaker extraction, i.e. extraction of only speech of a target speaker out of a mixture, given characteristics of his/her voice. We have recently proposed SpeakerBeam, which is a neural network-based target speaker extraction method. SpeakerBeam uses a speech extraction network that is adapted to the target speaker using auxiliary features derived from an adaptation utterance of that speaker. Initially, we implemented SpeakerBeam with a factorized adaptation layer, which consists of several parallel linear transformations weighted by weights derived from the auxiliary features. The factorized layer is effective for target speech extraction, but it requires a large number of parameters. In this paper, we propose to simply scale the activations of a hidden layer of the speech extraction network with weights derived from the auxiliary features. This simpler approach greatly reduces the number of model parameters by up to 60%, making it much more practical, while maintaining a similar level of performance. We tested our approach on simulated and real noisy and reverberant mixtures, showing the potential of SpeakerBeam for real-life applications. Moreover, we showed that speech extraction performance of SpeakerBeam compares favorably with that of a state-of-the-art speech separation method with a similar network configuration.
Title in English
Compact Network for Speakerbeam Target Speaker Extraction
Result description in English
Speech separation that separates a mixture of speech signals into each of its sources has been an active research topic for a long time and has seen recent progress with the advent of deep learning. A related problem is target speaker extraction, i.e. extraction of only speech of a target speaker out of a mixture, given characteristics of his/her voice. We have recently proposed SpeakerBeam, which is a neural network-based target speaker extraction method. SpeakerBeam uses a speech extraction network that is adapted to the target speaker using auxiliary features derived from an adaptation utterance of that speaker. Initially, we implemented SpeakerBeam with a factorized adaptation layer, which consists of several parallel linear transformations weighted by weights derived from the auxiliary features. The factorized layer is effective for target speech extraction, but it requires a large number of parameters. In this paper, we propose to simply scale the activations of a hidden layer of the speech extraction network with weights derived from the auxiliary features. This simpler approach greatly reduces the number of model parameters by up to 60%, making it much more practical, while maintaining a similar level of performance. We tested our approach on simulated and real noisy and reverberant mixtures, showing the potential of SpeakerBeam for real-life applications. Moreover, we showed that speech extraction performance of SpeakerBeam compares favorably with that of a state-of-the-art speech separation method with a similar network configuration.
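
The two adaptation schemes contrasted in the description can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the layer sizes, and the random stand-ins for the auxiliary-network outputs (alpha, scale) are all assumptions made for illustration, and the toy parameter counts are not meant to reproduce the 60% figure reported for the full model.

```python
import numpy as np


def factorized_layer(h, alpha, weights, biases):
    """Factorized adaptation: K parallel linear transforms, mixed by alpha.

    h: (d_in,), alpha: (K,), weights: (K, d_out, d_in), biases: (K, d_out).
    Computes sum_k alpha[k] * (weights[k] @ h + biases[k]).
    """
    per_basis = np.einsum("koi,i->ko", weights, h) + biases  # (K, d_out)
    return alpha @ per_basis  # (d_out,)


def scaled_layer(h, scale, weight, bias):
    """Scaled activations: one linear transform, then element-wise scaling.

    h: (d_in,), scale: (d_out,), weight: (d_out, d_in), bias: (d_out,).
    """
    return scale * (weight @ h + bias)


def count_params(*arrays):
    """Total number of entries in the given parameter arrays."""
    return sum(a.size for a in arrays)


# Hypothetical toy sizes, chosen only to keep the example small.
d_in, d_out, K = 8, 8, 4
rng = np.random.default_rng(0)

h = rng.standard_normal(d_in)          # hidden activations of the extraction network
alpha = rng.standard_normal(K)         # stand-in for auxiliary-network output (K weights)
scale = rng.standard_normal(d_out)     # stand-in for auxiliary-network output (d_out weights)

weights = rng.standard_normal((K, d_out, d_in))
biases = rng.standard_normal((K, d_out))

factorized_out = factorized_layer(h, alpha, weights, biases)
scaled_out = scaled_layer(h, scale, weights[0], biases[0])

n_factorized = count_params(weights, biases)       # K * (d_out * d_in + d_out)
n_scaled = count_params(weights[0], biases[0])     # d_out * d_in + d_out

print("factorized layer parameters:", n_factorized)
print("scaled layer parameters:", n_scaled)
```

In this sketch the adapted layer itself needs K times fewer parameters in the scaled variant, because the K parallel transforms collapse into one. The saving for a whole network depends on how many other layers it contains, which is outside the scope of this example.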

Classification

Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

Project
TJ01000208: NeurOnové sítě pro zpracování SIgnálu a dolování informací v řeČI - NOSIČI
Linkages
P - Research and development project financed from public funds (with a link to CEP)

Other

Year of implementation
2019
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Article name in the proceedings
Proceedings of ICASSP
ISBN
978-1-5386-4658-8
ISSN
—
e-ISSN
—
Number of pages
5
Pages from-to
6965-6969
Publisher name
IEEE Signal Processing Society
Place of publication
Brighton
Event location
Brighton
Event date
12 May 2019
Event type by nationality
WRD - Worldwide event
UT WoS article code
000482554007040

Similar results (10)

Single Channel Target Speaker Extraction and Recognition with Speaker Beam
Improving Speaker Discrimination of Target Speech Extraction With Time-Domain Speakerbeam
Evaluation of SpeakerBeam target speech extraction in real noisy and reverberant conditions