All

What are you looking for?

All
Projects
Results
Organizations

Quick search

  • Projects supported by TA ČR
  • Excellent projects
  • Projects with the highest public support
  • Current projects

Smart search

  • That is how I find a specific +word
  • That is how I leave the -word out of the results
  • “That is how I can find the whole phrase”

Target Speech Extraction with Pre-Trained Self-Supervised Learning Models

The result's identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F24%3APU152295" target="_blank" >RIV/00216305:26230/24:PU152295 - isvavai.cz</a>

  • Result on the web

    <a href="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10448315" target="_blank" >https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10448315</a>

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    angličtina

  • Original language name

    Target Speech Extraction with Pre-Trained Self-Supervised Learning Models

  • Original language description

    Pre-trained self-supervised learning (SSL) models have achieved re- markable success in various speech tasks. However, their potential in target speech extraction (TSE) has not been fully exploited. TSE aims to extract the speech of a target speaker in a mixture guided by enrollment utterances. We exploit pre-trained SSL models for two purposes within a TSE framework, i.e., to process the input mixture and to derive speaker embeddings from the enrollment. In this paper, we focus on how to effectively use SSL models for TSE. We first in- troduce a novel TSE downstream task following the SUPERB princi- ples. This simple experiment shows the potential of SSL models for TSE, but extraction performance remains far behind the state-of-the- art. We then extend a powerful TSE architecture by incorporating two SSL-based modules: an Adaptive Input Enhancer (AIE) and a speaker encoder. Specifically, the proposed AIE utilizes intermedi- ate representations from the CNN encoder by adjusting the time re

  • Czech name

  • Czech description

Classification

  • Type

    O - Miscellaneous

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

    <a href="/en/project/GX19-26934X" target="_blank" >GX19-26934X: Neural Representations in Multi-modal and Multi-lingual Modeling</a><br>

  • Continuities

    P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)

Others

  • Publication year

    2024

  • Confidentiality

    S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů