Using X-vectors for Speech Activity Detection in Broadcast Streams
Identifikátory výsledku
Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F21%3A00009297" target="_blank" >RIV/46747885:24220/21:00009297 - isvavai.cz</a>
Výsledek na webu
<a href="https://www.isca-speech.org/archive/pdfs/interspeech_2021/mateju21_interspeech.pdf" target="_blank" >https://www.isca-speech.org/archive/pdfs/interspeech_2021/mateju21_interspeech.pdf</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.21437/Interspeech.2021-192" target="_blank" >10.21437/Interspeech.2021-192</a>
Alternativní jazyky
Jazyk výsledku
angličtina
Název v původním jazyce
Using X-vectors for Speech Activity Detection in Broadcast Streams
Popis výsledku v původním jazyce
A new approach to speech activity detection (SAD) is presented in this work. It allows us to reduce the complexity and computation demands, namely in services that process streaming speech, where a SAD module usually forms the first block of the data pipeline (e.g., in a platform for 24/7 broadcast transcription). Our approach utilizes x-vectors as input features so that, within the subsequent pipeline stages, these embedding instances can also directly be employed for speaker diarization and recognition. The x-vectors are extracted by feed-forward sequential memory network (FSMN), allowing for modeling long-time dependencies; they thus form an input into a computationally undemanding binary classifier, whose output is smoothed by a decoder. Evaluation is performed on the standardized QUTNOISE- TIMIT dataset as well as on broadcast data with large portions of music and background noise. The former data allows for comparison with other existing approaches. The latter shows the performance in terms of word error rate (WER) and reduction in real-time factor (RTF) of the transcription process.
Název v anglickém jazyce
Using X-vectors for Speech Activity Detection in Broadcast Streams
Popis výsledku anglicky
A new approach to speech activity detection (SAD) is presented in this work. It allows us to reduce the complexity and computation demands, namely in services that process streaming speech, where a SAD module usually forms the first block of the data pipeline (e.g., in a platform for 24/7 broadcast transcription). Our approach utilizes x-vectors as input features so that, within the subsequent pipeline stages, these embedding instances can also directly be employed for speaker diarization and recognition. The x-vectors are extracted by feed-forward sequential memory network (FSMN), allowing for modeling long-time dependencies; they thus form an input into a computationally undemanding binary classifier, whose output is smoothed by a decoder. Evaluation is performed on the standardized QUTNOISE- TIMIT dataset as well as on broadcast data with large portions of music and background noise. The former data allows for comparison with other existing approaches. The latter shows the performance in terms of word error rate (WER) and reduction in real-time factor (RTF) of the transcription process.
Klasifikace
Druh
D - Stať ve sborníku
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Návaznosti výsledku
Projekt
<a href="/cs/project/TH03010018" target="_blank" >TH03010018: DeepSpot - Multilingvální technologie pro detekci a včasné upozornění</a><br>
Návaznosti
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)<br>S - Specificky vyzkum na vysokych skolach
Ostatní
Rok uplatnění
2021
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Údaje specifické pro druh výsledku
Název statě ve sborníku
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISBN
978-171383690-2
ISSN
2308-457X
e-ISSN
—
Počet stran výsledku
5
Strana od-do
4161 - 4165
Název nakladatele
ISCA
Místo vydání
—
Místo konání akce
Brno, ČR
Datum konání akce
1. 1. 2021
Typ akce podle státní příslušnosti
WRD - Celosvětová akce
Kód UT WoS článku
000841879501118