All

What are you looking for?

All
Projects
Results
Organizations

Quick search

  • Projects supported by TA ČR
  • Excellent projects
  • Projects with the highest public support
  • Current projects

Smart search

  • That is how I find a specific +word
  • That is how I leave the -word out of the results
  • “That is how I can find the whole phrase”

Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: Theory, implementation and analysis on standard tasks

The result's identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F22%3APU142975" target="_blank" >RIV/00216305:26230/22:PU142975 - isvavai.cz</a>

  • Result on the web

    <a href="https://www.sciencedirect.com/science/article/pii/S0885230821000619" target="_blank" >https://www.sciencedirect.com/science/article/pii/S0885230821000619</a>

  • DOI - Digital Object Identifier

    <a href="http://dx.doi.org/10.1016/j.csl.2021.101254" target="_blank" >10.1016/j.csl.2021.101254</a>

Alternative languages

  • Result language

    angličtina

  • Original language name

    Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: Theory, implementation and analysis on standard tasks

  • Original language description

    The recently proposed VBx diarization method uses a Bayesian hidden Markov model to find speaker clusters in a sequence of x-vectors. In this work we perform an extensive comparison of performance of the VBx diarization with other approaches in the literature and we show that VBx achieves superior performance on three of the most popular datasets for evaluating diarization: CALLHOME, AMI and DIHARD II datasets. Further, we present for the first time the derivation and update formulae for the VBx model, focusing on the efficiency and simplicity of this model as compared to the previous and more complex BHMM model working on frame-by-frame standard Cepstral features. Together with this publication, we release the recipe for training the x-vector extractors used in our experiments on both wide and narrowband data, and the VBx recipes that attain state-of-the-art performance on all three datasets. Besides, we point out the lack of a standardized evaluation protocol for AMI dataset and we propose a new protocol for both Beamformed and Mix-Headset audios based on the official AMI partitions and transcriptions.

  • Czech name

  • Czech description

Classification

  • Type

    J<sub>imp</sub> - Article in a specialist periodical, which is included in the Web of Science database

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

    <a href="/en/project/GX19-26934X" target="_blank" >GX19-26934X: Neural Representations in Multi-modal and Multi-lingual Modeling</a><br>

  • Continuities

    P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)

Others

  • Publication year

    2022

  • Confidentiality

    S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Data specific for result type

  • Name of the periodical

    COMPUTER SPEECH AND LANGUAGE

  • ISSN

    0885-2308

  • e-ISSN

    1095-8363

  • Volume of the periodical

    71

  • Issue of the periodical within the volume

    101254

  • Country of publishing house

    GB - UNITED KINGDOM

  • Number of pages

    16

  • Pages from-to

    1-16

  • UT code for WoS article

    000761599000019

  • EID of the result in the Scopus database

    2-s2.0-85109214006