Speech Activity Detection in Online Broadcast Transcription Using Deep Neural Networks and Weighted Finite State Transducers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F17%3A00004814" target="_blank" >RIV/46747885:24220/17:00004814 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1109/ICASSP.2017.7953200" target="_blank" >http://dx.doi.org/10.1109/ICASSP.2017.7953200</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/ICASSP.2017.7953200" target="_blank" >10.1109/ICASSP.2017.7953200</a>

Result language
angličtina
Original language name
Speech Activity Detection in Online Broadcast Transcription Using Deep Neural Networks and Weighted Finite State Transducers
Original language description
In this paper, a new approach to online Speech Activity Detection (SAD) is proposed. This approach is designed for the use in a system that carries out 24/7 transcription of radio/TV broadcasts containing a large amount of non-speech segments, such as advertisements or music. To improve the robustness of detection, we adopt Deep Neural Networks (DNNs) trained on artificially created mixtures of speech and non-speech signals at desired levels of signal-to-noise ratio (SNR). An integral part of our approach is an online decoder based on Weighted Finite State Transducers (WFSTs); this decoder smooths the output from DNN. The employed transduction model is context-based, i.e., both speech and non-speech events are modeled using sequences of states. The presented experimental results show that our approach yields state-of-the-art results on standardized QUT-NOISE-TIMIT data set for SAD and, at the same time, it is capable of a) operating with low latency and b) reducing the computational demands and error rate of the target transcription system.
Czech name
—
Czech description
—

Project
<a href="/en/project/TA04010199" target="_blank" >TA04010199: MULTILINMEDIA - Multilingual Multimedia Monitoring and Analyzing Platform</a><br>
Continuities
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)<br>I - Institucionalni podpora na dlouhodoby koncepcni rozvoj vyzkumne organizace

Publication year
2017
Confidentiality
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Article name in the collection
2017 IEEE IICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedingsnternational Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017
ISBN
978-1-5090-4117-6
ISSN
1520-6149
e-ISSN
—
Number of pages
5
Pages from-to
5460-5464
Publisher name
Institute of Electrical and Electronics Engineers Inc.
Place of publication
USA
Event location
New Orleans, USA
Event date
Jan 1, 2017
Type of event by nationality
WRD - Celosvětová akce
UT code for WoS article
000414286205124

Similar results(10)