Single Channel Target Speaker Extraction and Recognition with Speaker Beam

Result identifiers

  • Result code in IS VaVaI

    RIV/00216305:26230/18:PU130735 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F18%3APU130735)

  • Result on the web

    http://www.fit.vutbr.cz/research/pubs/all.php?id=11721

  • DOI - Digital Object Identifier

    10.1109/ICASSP.2018.8462661 (http://dx.doi.org/10.1109/ICASSP.2018.8462661)

Alternative languages

  • Result language

    English

  • Title in the original language

    Single Channel Target Speaker Extraction and Recognition with Speaker Beam

  • Result description in the original language

    This paper addresses the problem of single channel speech recognition of a target speaker in a mixture of speech signals. We propose to exploit auxiliary speaker information provided by an adaptation utterance from the target speaker to extract and recognize only that speaker. Using such auxiliary information, we can build a speaker extraction neural network (NN) that is independent of the number of sources in the mixture, and that can track speakers across different utterances, which are two challenging issues occurring with conventional approaches for speech recognition of mixtures. We call such an informed speaker extraction scheme "SpeakerBeam". SpeakerBeam exploits a recently developed context adaptive deep NN (CADNN) that allows tracking speech from a target speaker using a speaker adaptation layer, whose parameters are adjusted depending on auxiliary features representing the target speaker characteristics. SpeakerBeam was previously investigated for speaker extraction using a microphone array. In this paper, we demonstrate that it is also efficient for single channel speaker extraction. The speaker adaptation layer can be employed either to build a speaker adaptive acoustic model that recognizes only the target speaker or a mask-based speaker extraction network that extracts the target speech from the speech mixture signal prior to recognition. We also show that the latter speaker extraction network can be optimized jointly with an acoustic model to further improve ASR performance.

  • Title in English

    Single Channel Target Speaker Extraction and Recognition with Speaker Beam

  • Result description in English

    (Identical to the result description in the original language above; the result is in English.)
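
The description above outlines the SpeakerBeam scheme: a mask-based extraction network whose speaker adaptation layer is steered by auxiliary features obtained from a target-speaker adaptation utterance. The following PyTorch sketch is only an illustration of that idea, not the authors' implementation: the class names, layer sizes, number of adaptation sub-layers, and the simple mixture-of-sub-layers adaptation are all assumptions made for the example.

    # Illustrative sketch (not the paper's code): mask-based target-speaker
    # extraction with a context-adaptive speaker adaptation layer.
    import torch
    import torch.nn as nn


    class AdaptationLayer(nn.Module):
        """Context-adaptive layer: parallel linear sub-layers whose outputs are
        combined with weights predicted from an auxiliary speaker embedding."""

        def __init__(self, dim, spk_dim, num_sublayers=10):
            super().__init__()
            self.sublayers = nn.ModuleList(
                [nn.Linear(dim, dim) for _ in range(num_sublayers)]
            )
            self.weight_net = nn.Sequential(
                nn.Linear(spk_dim, num_sublayers), nn.Softmax(dim=-1)
            )

        def forward(self, x, spk_emb):
            # x: (batch, time, dim), spk_emb: (batch, spk_dim)
            w = self.weight_net(spk_emb)                              # (batch, K)
            outs = torch.stack([layer(x) for layer in self.sublayers], dim=-1)
            return torch.relu((outs * w[:, None, None, :]).sum(dim=-1))


    class TargetSpeakerMasker(nn.Module):
        """Estimates a time-frequency mask for the target speaker from a mixture
        spectrogram and an embedding of the adaptation utterance."""

        def __init__(self, feat_dim=257, spk_dim=100, hidden=512):
            super().__init__()
            self.pre = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
            self.adapt = AdaptationLayer(hidden, spk_dim)
            self.post = nn.Sequential(
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, feat_dim), nn.Sigmoid(),
            )

        def forward(self, mixture, spk_emb):
            h = self.pre(mixture)        # (batch, time, hidden)
            h = self.adapt(h, spk_emb)   # speaker-dependent transformation
            mask = self.post(h)          # mask values in (0, 1)
            return mask * mixture        # extracted target, to be fed to the ASR model


    # Toy usage with random tensors standing in for spectrogram features of a
    # speech mixture and a speaker embedding from the adaptation utterance.
    net = TargetSpeakerMasker()
    mixture = torch.randn(2, 120, 257)
    spk_emb = torch.randn(2, 100)
    extracted = net(mixture, spk_emb)    # shape: (2, 120, 257)

In the jointly optimized variant described in the abstract, the masked spectrogram would be passed to the acoustic model and both networks trained with the ASR objective; the sketch above stops at the mask application.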

Classification

  • Type

    D - Article in proceedings

  • CEP field

  • OECD FORD field

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

  • Project

    LQ1602: IT4Innovations excellence in science (/cs/project/LQ1602)

  • Linkages

    P - R&D project financed from public funds (with a link to CEP)

Other

  • Year of application

    2018

  • Data confidentiality code

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

  • Article name in the proceedings

    Proceedings of ICASSP 2018

  • ISBN

    978-1-5386-4658-8

  • ISSN

  • e-ISSN

  • Number of pages

    5

  • Pages from-to

    5554-5558

  • Publisher name

    IEEE Signal Processing Society

  • Place of publication

    Calgary

  • Event location

    Calgary

  • Event date

    April 15, 2018

  • Event type by nationality

    WRD - Worldwide event

  • UT WoS code of the article

    000446384605144