Compensation of Nonlinear Distortions in Speech for Automatic Recognition
Result identifiers
Result code in IS VaVaI
RIV/46747885:24220/15:00003411 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F46747885%3A24220%2F15%3A00003411)
Result on the web
http://dx.doi.org/10.1109/TSP.2015.7296378
DOI - Digital Object Identifier
10.1109/TSP.2015.7296378
Alternative languages
Result language
English
Title in the original language
Compensation of Nonlinear Distortions in Speech for Automatic Recognition
Result description in the original language
This paper addresses improving the automatic transcription of speech that was distorted either during recording or by subsequent processing. We focus on distortions that cannot be represented by the most commonly used models, that is, as additive noise or a linear convolutive channel distortion. We consider (a) signals distorted by an overgained microphone preamplifier and (b) recordings exhibiting unnatural spectral sparseness caused by excessive denoising or low-bit-rate compression. We demonstrate that these distortions significantly degrade ASR accuracy. To compensate, we propose combining two general robust speech recognition techniques: a front-end feature normalization method and a channel/speaker adaptation technique. We report a significant improvement in transcription accuracy for lectures distorted during recording, compressed broadcast data, and utterances recorded with inappropriately applied denoising.
Classification
Type
D - Article in proceedings
CEP field
JC - Computer hardware and software
OECD FORD field
—
Result linkages
Project
TA01011142: Automatic transcription and indexing of lectures (Automatická transkripce a indexace přednášek)
Linkages
P - Research and development project financed from public funds (with a link to CEP)
Other
Year of implementation
2015
Data confidentiality code
S - Complete and truthful project data are not subject to protection under special legal regulations
Data specific to the result type
Title of the article in proceedings
38th International Conference on Telecommunications and Signal Processing, TSP 2015
ISBN
978-1-4799-8498-5
ISSN
—
e-ISSN
—
Number of pages
5
Pages from-to
419-423
Publisher name
Institute of Electrical and Electronics Engineers Inc.
Place of publication
Prague, Czech Republic
Event location
Prague
Event date
—
Event type by nationality
WRD - Worldwide event
UT WoS code of the article
—