Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F23%3APU149387" target="_blank" >RIV/00216305:26230/23:PU149387 - isvavai.cz</a>
Result on the web
<a href="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10023345" target="_blank" >https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10023345</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/SLT54892.2023.10023345" target="_blank" >10.1109/SLT54892.2023.10023345</a>
Alternative languages
Result language
English
Original language name
Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations
Original language description
Self-supervised learning of speech representations from large amounts of unlabeled data has enabled state-of-the-art results in several speech processing tasks. Aggregating these speech representations across time is typically approached by using descriptive statistics, and in particular, using the first- and second-order statistics of representation coefficients. In this paper, we examine an alternative way of extracting speaker and emotion information from self-supervised trained models, based on the correlations between the coefficients of the representations - correlation pooling. We show improvements over mean pooling and further gains when the pooling methods are combined via fusion. The code is available at github.com/Lamomal/s3prl_correlation.
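The contrast between mean pooling and correlation pooling described above can be illustrated with a minimal NumPy sketch. It is not taken from the linked repository: the function names, the random stand-in features, the `eps` stabiliser, and the choice to keep only the strictly upper triangle of the correlation matrix are assumptions made for illustration, and the published implementation may differ (for example by reducing the channel dimension first).

```python
import numpy as np


def mean_pooling(frames):
    """First-order statistic: average each channel over time.

    frames: array of shape (T, D), T frames with D channels each.
    Returns a vector of shape (D,).
    """
    return frames.mean(axis=0)


def correlation_pooling(frames, eps=1e-8):
    """Hypothetical sketch of channel-wise correlation pooling.

    Computes the Pearson correlation between every pair of channels
    across time and returns the strictly upper-triangular entries as
    one utterance-level vector of shape (D * (D - 1) / 2,).
    """
    # Standardise each channel over the time axis.
    centered = frames - frames.mean(axis=0, keepdims=True)
    std = centered.std(axis=0, keepdims=True)
    normalized = centered / (std + eps)  # eps guards against constant channels

    # (D, D) correlation matrix between channels.
    corr = normalized.T @ normalized / frames.shape[0]

    # The matrix is symmetric with a unit diagonal, so only the entries
    # above the diagonal carry information.
    rows, cols = np.triu_indices(corr.shape[0], k=1)
    return corr[rows, cols]


# Random features standing in for the output of a self-supervised model:
# 200 frames, 16 channels (both sizes are arbitrary).
rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 16))

mean_vec = mean_pooling(frames)
corr_vec = correlation_pooling(frames)
```

Under these assumptions the mean-pooled vector has one entry per channel, while the correlation-pooled vector has one entry per channel pair, so its length grows quadratically with the number of channels.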
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
Result was created during the realization of more than one project. More information in the Projects tab.
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2023
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
ISBN
978-1-6654-7189-3
ISSN
—
e-ISSN
—
Number of pages
8
Pages from-to
1136-1143
Publisher name
IEEE Signal Processing Society
Place of publication
Doha
Event location
Doha
Event date
Jan 9, 2023
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000968851900153