SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures

The result's identifiers

  • Result code in IS VaVaI

    RIV/00216305:26230/19:PU134173 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F19%3APU134173)

  • Result on the web

    https://ieeexplore.ieee.org/document/8736286

  • DOI - Digital Object Identifier

    10.1109/JSTSP.2019.2922820 (http://dx.doi.org/10.1109/JSTSP.2019.2922820)

Alternative languages

  • Result language

    English

  • Original language name

    SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures

  • Original language description

    The processing of speech corrupted by interfering overlapping speakers is one of the challenging problems for today's automatic speech recognition systems. Recently, approaches based on deep learning have made great progress toward solving this problem. Most of these approaches tackle the problem as speech separation, i.e., they blindly recover all the speakers from the mixture. In some scenarios, such as smart personal devices, we may however be interested in recovering one target speaker from a mixture. In this paper, we introduce SpeakerBeam, a method for extracting a target speaker from the mixture based on an adaptation utterance spoken by the target speaker. Formulating the problem as speaker extraction avoids certain issues such as label permutation and the need to determine the number of speakers in the mixture. With SpeakerBeam, we jointly learn to extract a representation from the adaptation utterance characterizing the target speaker and to use this representation to extract the speaker. We explore several ways to do this, mostly inspired by speaker adaptation in acoustic models for automatic speech recognition. We evaluate the performance on the widely used WSJ0-2mix and WSJ0-3mix datasets, and these datasets modified with more noise or more realistic overlapping patterns. We further analyze the learned behavior by exploring the speaker representations and assessing the effect of the length of the adaptation data. The results show the benefit of including speaker information in the processing and the effectiveness of the proposed method. (See the code sketch after this list.)

  • Czech name

  • Czech description
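
Below is a minimal, illustrative Python (PyTorch) sketch of the speaker-extraction idea described in the abstract above: an auxiliary network summarizes the adaptation utterance into a speaker embedding, which then scales the hidden activations of a mask-estimation network applied to the mixture (multiplicative adaptation). All module names, layer sizes, and the simple mean-pooling summary are assumptions for illustration, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class SpeakerBeamSketch(nn.Module):
        """Toy target-speaker extraction with multiplicative adaptation."""

        def __init__(self, n_freq=257, hidden=256):
            super().__init__()
            # Auxiliary network: adaptation utterance -> speaker embedding.
            self.aux = nn.Sequential(
                nn.Linear(n_freq, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden),
            )
            # Main network, first stage: mixture frames -> hidden features.
            self.pre = nn.Sequential(nn.Linear(n_freq, hidden), nn.ReLU())
            # Main network, second stage: adapted features -> time-frequency mask.
            self.post = nn.Sequential(
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_freq), nn.Sigmoid(),
            )

        def forward(self, mixture, adaptation):
            # mixture:    (batch, frames, n_freq) magnitude spectrogram
            # adaptation: (batch, frames_a, n_freq) clean target-speaker utterance
            spk = self.aux(adaptation).mean(dim=1)   # (batch, hidden) embedding
            h = self.pre(mixture)                    # (batch, frames, hidden)
            h = h * spk.unsqueeze(1)                 # multiplicative adaptation
            mask = self.post(h)                      # (batch, frames, n_freq)
            return mask * mixture                    # estimated target speaker

    # Usage: extract the target speaker from a mixed recording.
    model = SpeakerBeamSketch()
    mixture = torch.rand(1, 100, 257)     # 100 frames of a mixed signal
    adaptation = torch.rand(1, 300, 257)  # enrollment utterance, same speaker
    target_estimate = model(mixture, adaptation)

Because the network extracts only the enrolled speaker, training can use a plain reconstruction loss against the clean target; no permutation-invariant training or prior knowledge of the number of speakers is required, which is the label-permutation advantage the abstract mentions.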

Classification

  • Type

    Jimp - Article in a specialist periodical, which is included in the Web of Science database

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

    The result was created during the implementation of more than one project. More information is available in the Projects tab.

  • Continuities

    P - Research and development project financed from public sources (with a link to CEP)

Others

  • Publication year

    2019

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Name of the periodical

    IEEE Journal of Selected Topics in Signal Processing (IEEE J-STSP)

  • ISSN

    1932-4553

  • e-ISSN

    1941-0484

  • Volume of the periodical

    13

  • Issue of the periodical within the volume

    4

  • Country of publishing house

    US - UNITED STATES

  • Number of pages

    15

  • Pages from-to

    800-814

  • UT code for WoS article

    000477715300003

  • EID of the result in the Scopus database

    2-s2.0-85069900431