DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F24%3APU152298" target="_blank" >RIV/00216305:26230/24:PU152298 - isvavai.cz</a>
Result on the web
<a href="https://ieeexplore.ieee.org/document/10584294" target="_blank" >https://ieeexplore.ieee.org/document/10584294</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/TASLP.2024.3422818" target="_blank" >10.1109/TASLP.2024.3422818</a>

Alternative languages

Result language
English
Title in the original language
DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors
Result description in the original language
Until recently, the field of speaker diarization was dominated by cascaded systems. Due to their limitations, mainly regarding overlapped speech and cumbersome pipelines, end-to-end models have gained great popularity lately. One of the most successful models is end-to-end neural diarization with encoder-decoder based attractors (EEND-EDA). In this work, we replace the EDA module with a Perceiver-based one and show its advantages over EEND-EDA; namely obtaining better performance on the largely studied Callhome dataset, finding the quantity of speakers in a conversation more accurately, and faster inference time. Furthermore, when exhaustively compared with other methods, our model, DiaPer, reaches remarkable performance with a very lightweight design. Besides, we perform comparisons with other works and a cascaded baseline across more than ten public wide-band datasets. Together with this publication, we release the code of DiaPer as well as models trained on public and free data.
Title in English
DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors
Result description in English
Until recently, the field of speaker diarization was dominated by cascaded systems. Due to their limitations, mainly regarding overlapped speech and cumbersome pipelines, end-to-end models have gained great popularity lately. One of the most successful models is end-to-end neural diarization with encoder-decoder based attractors (EEND-EDA). In this work, we replace the EDA module with a Perceiver-based one and show its advantages over EEND-EDA; namely obtaining better performance on the largely studied Callhome dataset, finding the quantity of speakers in a conversation more accurately, and faster inference time. Furthermore, when exhaustively compared with other methods, our model, DiaPer, reaches remarkable performance with a very lightweight design. Besides, we perform comparisons with other works and a cascaded baseline across more than ten public wide-band datasets. Together with this publication, we release the code of DiaPer as well as models trained on public and free data.

Classification

Type
J<sub>imp</sub> - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

Project
The result was created during the implementation of multiple projects. More information is available in the Projects tab.
Linkages
P - Research and development project financed from public funds (with a link to CEP)

Other

Year of publication
2024
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Periodical name
IEEE Transactions on Audio, Speech, and Language Processing
ISSN
1558-7916
e-ISSN
1558-7924
Periodical volume
32
Issue number within the volume
7
Publisher's country
US - United States of America
Number of pages
16
Pages from-to
3450-3465
UT WoS code of the article
001283673700005
Result EID in the Scopus database
2-s2.0-85197558425

Similar results (10)

From Simulated Mixtures to Simulated Conversations as Training Data for End-to-End Neural Diarization
Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization
Applying EEND Diarization to Telephone Recordings from a Call Center
