Incorporation of the ASR output in speaker segmentation and clustering within the task of speaker diarization of broadcast streams
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F12%3A%230002003" target="_blank" >RIV/46747885:24220/12:#0002003 - isvavai.cz</a>
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Incorporation of the ASR output in speaker segmentation and clustering within the task of speaker diarization of broadcast streams
Original language description
In this paper we study the effect of incorporating automatic transcriptions into the speaker diarization process. We aim to improve both the diarization accuracy as evaluated by standard objective measures and the quality of the diarization output from the user's perspective. Although the presented approach relies on the output of an automatic speech recognizer, it makes no use of lexical information. Instead, we use information about word boundaries and the classification of non-speech events occurring in the processed stream. The former is used as a constraining condition for speaker change-point candidates, and the latter makes it possible to disregard various vocal noises that carry no speaker-specific information (considering representation of the signal by cepstral features) and thus harm the speaker's representation. The experimental evaluation of the presented approach was carried out using the COST278 multilingual broadcast news database. We demonstrate that the approach yields improve
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
JC - Computer hardware and software
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/TA01011204" target="_blank" >TA01011204: Living Archives</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2012
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proc. of IEEE conf. on Multimedia Signal Processing (MMSP)
ISBN
978-1-4673-4572-9
ISSN
—
e-ISSN
—
Number of pages
6
Pages from-to
118-123
Publisher name
—
Place of publication
Canada
Event location
Banff, Canada
Event date
Jan 1, 2012
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000312670200021