Discriminative Training of VBx Diarization
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F24%3APU152297" target="_blank" >RIV/00216305:26230/24:PU152297 - isvavai.cz</a>
Result on the web
<a href="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446119" target="_blank" >https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446119</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/ICASSP48485.2024.10446119" target="_blank" >10.1109/ICASSP48485.2024.10446119</a>
Alternative languages
Result language
English
Original language name
Discriminative Training of VBx Diarization
Original language description
Bayesian HMM clustering of x-vector sequences (VBx) has become a widely adopted diarization baseline model in publications and challenges. It uses an HMM to model speaker turns, a generatively trained probabilistic linear discriminant analysis (PLDA) for speaker distribution modeling, and Bayesian inference to estimate the assignment of x-vectors to speakers. This paper presents a new framework for updating the VBx parameters using discriminative training, which directly optimizes a predefined loss. We also propose a new loss that better correlates with the diarization error rate compared to binary cross-entropy, the default choice for diarization end-to-end systems. Proof-of-concept results across three datasets (AMI, CALLHOME, and DIHARD II) demonstrate the method's capability of automatically finding hyperparameters, achieving comparable performance to those found by extensive grid search, which typically requires additional hyperparameter behavior knowledge. Moreover, we show that discriminative fine-tuning of PLDA can further improve the model's performance. We release the source code with this publication.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
Result was created during the realization of more than one project. More information in the Projects tab.
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2024
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISBN
979-8-3503-4485-1
ISSN
—
e-ISSN
—
Number of pages
5
Pages from-to
11871-11875
Publisher name
IEEE Signal Processing Society
Place of publication
Seoul
Event location
Seoul
Event date
Apr 14, 2024
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—