Investigation into the Use of WFSTs and DNNs for Speech Activity Detection in Broadcast Data Transcription
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F17%3A00004824" target="_blank" >RIV/46747885:24220/17:00004824 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1007/978-3-319-67876-4_16" target="_blank" >http://dx.doi.org/10.1007/978-3-319-67876-4_16</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-319-67876-4_16" target="_blank" >10.1007/978-3-319-67876-4_16</a>
Alternative languages
Result language
English
Original language name
Investigation into the Use of WFSTs and DNNs for Speech Activity Detection in Broadcast Data Transcription
Original language description
This paper deals with the task of Speech Activity Detection (SAD). The main goal is to investigate a new SAD approach suitable for offline as well as online transcription of various radio/TV broadcasts containing a large amount of non-speech segments. For this purpose, Deep Neural Networks (DNNs) with various hyper-parameters are adopted and evaluated. Their training is carried out using artificially created mixtures of speech and non-speech signals. Our SAD scheme also utilizes a decoder based on Weighted Finite State Transducers (WFSTs). The decoder smooths the output from the DNN, can operate online, and utilizes a context-based transduction model in which both speech and non-speech events are modeled using sequences of states. The final evaluation of the developed approach is carried out on the standardized QUT-NOISE-TIMIT data set for SAD and in a real broadcast transcription system. The obtained results show that our SAD module yields state-of-the-art results on QUT-NOISE-TIMIT and, at the same time, is capable of (a) operating with low latency and (b) reducing the computational demands and error rate of the transcription system.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
20204 - Robotics and automatic control
Result continuities
Project
<a href="/en/project/TA04010199" target="_blank" >TA04010199: MULTILINMEDIA - Multilingual Multimedia Monitoring and Analyzing Platform</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
I - Institutional support for the long-term conceptual development of a research organization
Others
Publication year
2017
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Communications in Computer and Information Science
ISBN
978-3-319-67875-7
ISSN
1865-0929
e-ISSN
—
Number of pages
18
Pages from-to
341-358
Publisher name
Springer Verlag
Place of publication
Federal Republic of Germany
Event location
Lisbon, Portugal
Event date
Jan 1, 2016
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—