Speaker Diarization Using Convolutional Neural Network for Statistics Accumulation Refinement
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F17%3A43932657" target="_blank" >RIV/49777513:23520/17:43932657 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.21437/Interspeech.2017-51" target="_blank" >http://dx.doi.org/10.21437/Interspeech.2017-51</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.21437/Interspeech.2017-51" target="_blank" >10.21437/Interspeech.2017-51</a>
Alternative languages
Result language
English
Original language name
Speaker Diarization Using Convolutional Neural Network for Statistics Accumulation Refinement
Original language description
The aim of this paper is to investigate the benefit of information from a speaker change detection system based on a Convolutional Neural Network (CNN) when applied to the accumulation of statistics for i-vector generation. The investigation is carried out on the problem of diarization. In our system, the output of the CNN is the probability of a speaker change in a conversation for a given time segment. According to this probability, we cut the conversation into short segments, each of which is then represented by an i-vector describing the speaker in it. We propose a technique that uses the information from the CNN to weight the acoustic data in a segment and so refine the statistics accumulation process. This technique enables us to represent the speaker better in the final i-vector. Experiments on the English part of the CallHome corpus show that the proposed refinement of the statistics accumulation is beneficial, with a relative improvement in Diarization Error Rate of almost 16 % compared to the speaker diarization system without statistics refinement.
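The weighting idea in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the CNN output is turned into a weight, so the mapping `weight = 1 - p_change` is an assumption made here for illustration only, as are the function names `frame_weights` and `accumulate_stats`. The sketch shows per-frame weights scaling the zeroth- and first-order statistics that are normally accumulated over a segment before i-vector extraction, with the component posteriors assumed to come from some background model.

```python
from typing import List, Sequence, Tuple


def frame_weights(change_prob: Sequence[float]) -> List[float]:
    """Map per-frame speaker-change probabilities to weights in [0, 1].

    ASSUMPTION: weight = 1 - p (frames near a likely speaker change count
    less). The source abstract does not specify the actual mapping.
    """
    return [1.0 - min(1.0, max(0.0, p)) for p in change_prob]


def accumulate_stats(
    frames: Sequence[Sequence[float]],
    posteriors: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> Tuple[List[float], List[List[float]]]:
    """Accumulate weighted zeroth- and first-order statistics for one segment.

    frames:     T feature vectors of dimension D
    posteriors: T vectors of C component occupation probabilities
                (hypothetically from a background model)
    weights:    T per-frame weights
    Returns (N, F), where N has length C and F has shape C x D.
    """
    n_comp = len(posteriors[0])
    dim = len(frames[0])
    N = [0.0] * n_comp
    F = [[0.0] * dim for _ in range(n_comp)]
    for x, gamma, w in zip(frames, posteriors, weights):
        for c in range(n_comp):
            g = w * gamma[c]          # weighted occupation of component c
            N[c] += g                 # zeroth-order statistic
            for d in range(dim):
                F[c][d] += g * x[d]   # first-order statistic
    return N, F


if __name__ == "__main__":
    # Toy segment: 3 frames, 2-dimensional features, 2 components.
    frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    posteriors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    change_prob = [0.0, 0.5, 1.0]  # hypothetical CNN outputs per frame
    N, F = accumulate_stats(frames, posteriors, frame_weights(change_prob))
    print(N, F)
```

With all weights set to 1.0 the function reduces to unweighted accumulation, which corresponds to the baseline without statistics refinement mentioned in the description.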
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
20205 - Automation and control systems
Result continuities
Project
<a href="/en/project/LO1506" target="_blank" >LO1506: Sustainability support of the centre NTIS - New Technologies for the Information Society</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2017
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the 18th Annual Conference of the International Speech Communication Association (Interspeech 2017)
ISBN
978-1-5108-4876-4
ISSN
—
e-ISSN
—
Number of pages
5
Pages from-to
3562-3566
Publisher name
Curran Associates, Inc.
Place of publication
Red Hook, NY
Event location
Stockholm, Sweden
Event date
Aug 20, 2017
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000457505000741