Investigation into the Use of WFSTs and DNNs for Speech Activity Detection in Broadcast Data Transcription
Result identifiers
Result code in IS VaVaI
RIV/46747885:24220/17:00004824 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F17%3A00004824)
Result on the web
http://dx.doi.org/10.1007/978-3-319-67876-4_16
DOI - Digital Object Identifier
10.1007/978-3-319-67876-4_16
Alternative languages
Language of the result
English
Title in the original language
Investigation into the Use of WFSTs and DNNs for Speech Activity Detection in Broadcast Data Transcription
Description in the original language
This paper deals with the task of Speech Activity Detection (SAD). The main goal is to investigate a new SAD approach suitable for both offline and online transcription of radio/TV broadcasts that contain a large amount of non-speech segments. For this purpose, Deep Neural Networks (DNNs) with various hyper-parameters are adopted and evaluated. They are trained on artificially created mixtures of speech and non-speech signals. Our SAD scheme also employs a decoder based on Weighted Finite State Transducers (WFSTs). The decoder smooths the DNN output, can operate online, and uses a context-based transduction model in which both speech and non-speech events are modeled as sequences of states. The developed approach is finally evaluated on the standardized QUT-NOISE-TIMIT data set for SAD and within a real broadcast transcription system. The results show that our SAD module achieves state-of-the-art results on QUT-NOISE-TIMIT and, at the same time, is capable of (a) operating with low latency and (b) reducing both the computational demands and the error rate of the transcription system.
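The smoothing step can be illustrated with a minimal sketch. It is not the authors' WFST decoder: it assumes the DNN emits one speech posterior per frame and replaces the transducer with a plain Viterbi search over a small state graph in which non-speech and speech are each modeled as a left-to-right chain of states, so a segment must last at least as many frames as its chain has states before the class can switch. The function name `smooth_sad`, its parameters (`min_speech`, `min_nonspeech`, `switch_penalty`) and their default values are hypothetical, not taken from the paper. Unlike the online decoder described above, this sketch decodes the whole input at once.

```python
import math
from typing import List, Tuple


def smooth_sad(speech_post: List[float], min_speech: int = 3,
               min_nonspeech: int = 3, switch_penalty: float = 2.0) -> List[int]:
    """Smooth per-frame speech posteriors into 0/1 labels (1 = speech).

    Hypothetical sketch: states 0..min_nonspeech-1 form the non-speech chain,
    the remaining min_speech states form the speech chain. Only the last state
    of a chain can loop or switch to the other class, so every segment except
    one cut off by the end of the input lasts at least as many frames as its
    chain has states.
    """
    if min_speech < 1 or min_nonspeech < 1:
        raise ValueError("chain lengths must be >= 1")
    n_ns, n_sp = min_nonspeech, min_speech
    n_states = n_ns + n_sp
    label = [0] * n_ns + [1] * n_sp

    # preds[s] lists the (previous_state, transition_cost) arcs entering state s.
    preds: List[List[Tuple[int, float]]] = [[] for _ in range(n_states)]
    for first, length, other_last in ((0, n_ns, n_states - 1), (n_ns, n_sp, n_ns - 1)):
        last = first + length - 1
        preds[first].append((other_last, switch_penalty))  # class switch
        for s in range(first + 1, last + 1):
            preds[s].append((s - 1, 0.0))                  # advance along the chain
        preds[last].append((last, 0.0))                    # self-loop at the chain end

    def emit(s: int, p: float) -> float:
        # Negative log posterior of the class that state s belongs to.
        p = min(max(p, 1e-10), 1.0 - 1e-10)
        return -math.log(p) if label[s] == 1 else -math.log(1.0 - p)

    if not speech_post:
        return []
    inf = float("inf")
    cost = [inf] * n_states
    for s in (0, n_ns):  # a path may start in either class
        cost[s] = emit(s, speech_post[0])

    backptrs: List[List[int]] = []
    for p in speech_post[1:]:
        new_cost, bp = [inf] * n_states, [-1] * n_states
        for s in range(n_states):
            for prev, arc_cost in preds[s]:
                cand = cost[prev] + arc_cost
                if cand < new_cost[s]:
                    new_cost[s], bp[s] = cand, prev
            if bp[s] >= 0:  # state reachable at this frame
                new_cost[s] += emit(s, p)
        cost = new_cost
        backptrs.append(bp)

    # Any state may end a path, since the last segment can be truncated.
    state = min(range(n_states), key=lambda s: cost[s])
    path = [state]
    for bp in reversed(backptrs):
        state = bp[state]
        path.append(state)
    path.reverse()
    return [label[s] for s in path]
```

For instance, with the default three-state chains, the posteriors `[0.1]*4 + [0.9] + [0.1]*4 + [0.9]*5 + [0.1]*4` yield non-speech for the isolated 0.9 frame and speech for the five-frame run. Raising `switch_penalty` or lengthening the chains suppresses more spurious segments at the cost of responsiveness; an online variant would have to emit labels with a bounded look-ahead instead of a full traceback.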
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
20204 - Robotics and automatic control
Result links
Project
TA04010199: MULTILINMEDIA - Multilingual platform for monitoring and analysis of multimedia
Funding links
P - Research and development project financed from public sources (with a link to CEP)
I - Institutional support for the long-term conceptual development of a research organization
Other
Year of implementation
2017
Data confidentiality code
S - Complete and truthful data about the project are not subject to protection under special legal regulations
Data specific to the result type
Proceedings title
Communications in Computer and Information Science
ISBN
978-3-319-67875-7
ISSN
1865-0929
e-ISSN
—
Number of pages
18
Pages (from-to)
341-358
Publisher
Springer Verlag
Place of publication
Federal Republic of Germany
Event venue
Lisbon; Portugal
Event date
1 January 2016
Event type by national scope
WRD - Worldwide event
Web of Science (UT WoS) code of the article
—