Robust Automatic Recognition of Speech with Background Music
Result identifiers
Result code in IS VaVaI
RIV/46747885:24220/17:00004811 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F17%3A00004811)
Result on the web
http://dx.doi.org/10.1109/ICASSP.2017.7953150
DOI - Digital Object Identifier
10.1109/ICASSP.2017.7953150
Alternative languages
Result language
English
Title in original language
Robust Automatic Recognition of Speech with Background Music
Description in original language
This paper addresses the task of Automatic Speech Recognition (ASR) with music in the background, where recognition accuracy may deteriorate significantly. To improve the robustness of ASR in this task, e.g. for broadcast news transcription or subtitle creation, we adopt two approaches: 1) multi-condition training of the acoustic models and 2) denoising autoencoders followed by acoustic model training on the preprocessed data. In the latter case, two types of autoencoders are considered: a fully connected and a convolutional network. The presented experimental results show that all of the investigated techniques significantly improve the recognition of speech distorted by music. For example, for artificial mixtures of speech and electronic music with a low Signal-to-Noise Ratio (SNR) of 0 dB, we achieved an absolute accuracy improvement of 35.8%. For real-world broadcast news with a high SNR (about 10 dB), the improvement was 2.4%. An important advantage of the studied approaches is that they do not deteriorate accuracy on clean speech (the decrease is about 1%).
Title in English
Robust Automatic Recognition of Speech with Background Music
Description in English
This paper addresses the task of Automatic Speech Recognition (ASR) with music in the background, where recognition accuracy may deteriorate significantly. To improve the robustness of ASR in this task, e.g. for broadcast news transcription or subtitle creation, we adopt two approaches: 1) multi-condition training of the acoustic models and 2) denoising autoencoders followed by acoustic model training on the preprocessed data. In the latter case, two types of autoencoders are considered: a fully connected and a convolutional network. The presented experimental results show that all of the investigated techniques significantly improve the recognition of speech distorted by music. For example, for artificial mixtures of speech and electronic music with a low Signal-to-Noise Ratio (SNR) of 0 dB, we achieved an absolute accuracy improvement of 35.8%. For real-world broadcast news with a high SNR (about 10 dB), the improvement was 2.4%. An important advantage of the studied approaches is that they do not deteriorate accuracy on clean speech (the decrease is about 1%).
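To make the second approach concrete, the sketch below shows what a convolutional denoising autoencoder for feature enhancement might look like: it is trained on parallel data so that features of speech-plus-music mixtures are mapped to the corresponding clean-speech features, after which the acoustic model is trained on the enhanced output. This is a minimal illustration only; the layer sizes, feature dimensions, optimizer and training setup are assumptions chosen for demonstration and are not taken from the paper, and the fully connected variant and multi-condition training are not shown.

# Illustrative sketch only -- architecture details and hyper-parameters are assumptions,
# not the configuration reported in the paper.
import torch
import torch.nn as nn

class ConvDenoisingAutoencoder(nn.Module):
    """Maps features of speech+music mixtures to clean-speech features."""
    def __init__(self):
        super().__init__()
        # Encoder: 2-D convolutions over (time, frequency) feature patches.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Decoder: mirror of the encoder, reconstructing the clean features.
        self.decoder = nn.Sequential(
            nn.Conv2d(64, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, noisy):  # noisy: (batch, 1, context_frames, n_features)
        return self.decoder(self.encoder(noisy))

# Training sketch: inputs are features of artificial speech+music mixtures,
# targets are the corresponding clean-speech features (parallel data).
model = ConvDenoisingAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

noisy_batch = torch.randn(8, 1, 11, 40)   # placeholder mixture features
clean_batch = torch.randn(8, 1, 11, 40)   # placeholder clean targets

for _ in range(10):                        # a few dummy iterations
    optimizer.zero_grad()
    loss = criterion(model(noisy_batch), clean_batch)
    loss.backward()
    optimizer.step()

At inference time, the same network would be applied to the features of the incoming audio before they are passed to the acoustic model, so that the recognizer sees enhanced, music-suppressed features.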
Classification
Type
D - Conference proceedings paper
CEP field
—
OECD FORD field
20204 - Robotics and automatic control
Result linkages
Project
TA04010199: MULTILINMEDIA - Multilingual platform for monitoring and analysis of multimedia (/cs/project/TA04010199)
Linkages
P - Research and development project financed from public funds (with a link to CEP)
Others
Year of implementation
2017
Data confidentiality code
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific to the result type
Article name in the proceedings
2017 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017); Hilton New Orleans Riverside, New Orleans, United States; 5 March 2017 through 9 March 2017; 16 June 2017; Article number 7953150; Pages 5210-5214; Category number CFP
ISBN
978-1-5090-4117-6
ISSN
1520-6149
e-ISSN
—
Number of pages
5
Pages from-to
5210-5214
Publisher name
Institute of Electrical and Electronics Engineers Inc.
Place of publication
USA
Event location
New Orleans, USA
Event date
5 March 2017 - 9 March 2017
Event type by nationality
WRD - Worldwide event
UT WoS code of the article
000414286205074