Probing Self-Supervised Learning Models With Target Speech Extraction

The result's identifiers

  • Result code in IS VaVaI

RIV/00216305:26230/24:PU152296 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F24%3APU152296)

  • Result on the web

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10627502

  • DOI - Digital Object Identifier

10.1109/ICASSPW62465.2024.10627502 (http://dx.doi.org/10.1109/ICASSPW62465.2024.10627502)

Alternative languages

  • Result language

English

  • Original language name

    Probing Self-Supervised Learning Models With Target Speech Extraction

  • Original language description

    Large-scale pre-trained self-supervised learning (SSL) models have shown remarkable advancements in speech-related tasks. However, the utilization of these models in complex multi-talker scenarios, such as extracting a target speaker in a mixture, is yet to be fully evaluated. In this paper, we introduce target speech extraction (TSE) as a novel downstream task to evaluate the feature extraction capabilities of pre-trained SSL models. TSE uniquely requires both speaker identification and speech separation, distinguishing it from other tasks in the Speech processing Universal PERformance Benchmark (SUPERB) evaluation. Specifically, we propose a TSE downstream model composed of two lightweight task-oriented modules based on the same frozen SSL model. One module functions as a speaker encoder to obtain target speaker information from an enrollment speech, while the other estimates the target speaker's mask to extract its speech from the mixture. Experimental results on the Libri2mix datasets reveal the relevance of the TSE downstream task to probe SSL models, as its performance cannot be simply deduced from other related tasks such as speaker verification and separation.

  • Czech name

  • Czech description

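The abstract describes the proposed downstream architecture: two lightweight task-oriented modules that share a single frozen SSL backbone, one encoding the enrollment speech into a target-speaker embedding and the other estimating a mask over the mixture's SSL features. The following is a minimal sketch of that structure, not the authors' implementation; the dummy SSL backbone, module names, layer sizes, and shapes are all illustrative assumptions.

import torch
import torch.nn as nn


class DummyFrozenSSL(nn.Module):
    """Stand-in for a pre-trained SSL model (e.g. a WavLM/HuBERT-style encoder); weights are frozen."""
    def __init__(self, feat_dim: int = 768, hop: int = 320):
        super().__init__()
        self.proj = nn.Conv1d(1, feat_dim, kernel_size=hop, stride=hop)
        for p in self.parameters():
            p.requires_grad = False

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # (batch, samples) -> (batch, frames, feat_dim)
        return self.proj(wav.unsqueeze(1)).transpose(1, 2)


class SpeakerEncoder(nn.Module):
    """Lightweight module: pools SSL features of the enrollment speech into one speaker embedding."""
    def __init__(self, feat_dim: int = 768, emb_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, emb_dim))

    def forward(self, enroll_feats: torch.Tensor) -> torch.Tensor:
        # (batch, frames, feat_dim) -> (batch, emb_dim) via mean pooling over frames
        return self.net(enroll_feats).mean(dim=1)


class MaskEstimator(nn.Module):
    """Lightweight module: predicts a per-frame mask over the mixture's SSL features,
    conditioned on the target-speaker embedding."""
    def __init__(self, feat_dim: int = 768, emb_dim: int = 256, hidden: int = 512):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim + emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, feat_dim)

    def forward(self, mix_feats: torch.Tensor, spk_emb: torch.Tensor) -> torch.Tensor:
        cond = spk_emb.unsqueeze(1).expand(-1, mix_feats.size(1), -1)
        h, _ = self.rnn(torch.cat([mix_feats, cond], dim=-1))
        return torch.sigmoid(self.out(h))  # mask in [0, 1], same shape as mix_feats


class TSEDownstream(nn.Module):
    """Both task-oriented modules operate on features from the same frozen SSL backbone."""
    def __init__(self, ssl: nn.Module):
        super().__init__()
        self.ssl = ssl
        self.spk_enc = SpeakerEncoder()
        self.mask_est = MaskEstimator()

    def forward(self, mixture: torch.Tensor, enrollment: torch.Tensor) -> torch.Tensor:
        mix_feats = self.ssl(mixture)
        spk_emb = self.spk_enc(self.ssl(enrollment))
        mask = self.mask_est(mix_feats, spk_emb)
        return mask * mix_feats  # masked features representing the target speaker


if __name__ == "__main__":
    model = TSEDownstream(DummyFrozenSSL())
    mixture = torch.randn(2, 32000)     # 2 s of 16 kHz two-speaker mixture, batch of 2
    enrollment = torch.randn(2, 48000)  # 3 s enrollment utterances of the target speakers
    target_feats = model(mixture, enrollment)
    print(target_feats.shape)           # torch.Size([2, 100, 768])

In the paper's SUPERB-style setting, only such lightweight heads are trained while the SSL model stays frozen; the sketch mirrors that by disabling gradients on the backbone.
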
Classification

  • Type

    D - Article in proceedings

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

    GX19-26934X: Neural Representations in Multi-modal and Multi-lingual Modeling

  • Continuities

    P - Research and development project financed from public funds (with a link to CEP)

Others

  • Publication year

    2024

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

  • ISBN

    979-8-3503-7451-3

  • ISSN

  • e-ISSN

  • Number of pages

    5

  • Pages from-to

    535-539

  • Publisher name

    IEEE Signal Processing Society

  • Place of publication

    Seoul

  • Event location

    Seoul

  • Event date

    Apr 14, 2024

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article