Improved Estimation of Articulatory Features Based on Acoustic Features with Temporal Context
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F15%3A00232103" target="_blank" >RIV/68407700:21230/15:00232103 - isvavai.cz</a>
Result on the web
<a href="http://link.springer.com/chapter/10.1007/978-3-319-24033-6_63" target="_blank" >http://link.springer.com/chapter/10.1007/978-3-319-24033-6_63</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-319-24033-6_63" target="_blank" >10.1007/978-3-319-24033-6_63</a>
Alternative languages
Result language
English
Title in original language
Improved Estimation of Articulatory Features Based on Acoustic Features with Temporal Context
Result description in original language
The paper deals with neural network-based estimation of articulatory features for Czech, intended for use in automatic phonetic segmentation and automatic speech recognition. In our current approach, multi-layer perceptron networks extract the articulatory features through a non-linear mapping from standard acoustic features computed from the speech signal. We analysed the suitability of various acoustic features and the optimum length of the temporal context at the network input, where the temporal context is represented by a context window created from stacked feature vectors. The optimum context length was identified within windows ranging from 9 to 21 frames. We obtained 90.5% frame-level accuracy on average across all articulatory feature classes for mel-log filter-bank features; the highest classification rate, 95.3%, was achieved for the voicing class.
Title in English
Improved Estimation of Articulatory Features Based on Acoustic Features with Temporal Context
Result description in English
The paper deals with neural network-based estimation of articulatory features for Czech, intended for use in automatic phonetic segmentation and automatic speech recognition. In our current approach, multi-layer perceptron networks extract the articulatory features through a non-linear mapping from standard acoustic features computed from the speech signal. We analysed the suitability of various acoustic features and the optimum length of the temporal context at the network input, where the temporal context is represented by a context window created from stacked feature vectors. The optimum context length was identified within windows ranging from 9 to 21 frames. We obtained 90.5% frame-level accuracy on average across all articulatory feature classes for mel-log filter-bank features; the highest classification rate, 95.3%, was achieved for the voicing class.
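For illustration, a minimal sketch of the context-window stacking described above: each frame's acoustic feature vector is concatenated with its neighbouring frames before being fed to a multi-layer perceptron. This is not the authors' implementation; the function name stack_context, the window length of 11, and the use of scikit-learn's MLPClassifier are assumptions for illustration only.

import numpy as np

def stack_context(features, window=11):
    """Concatenate each frame with its +/- (window // 2) neighbours.

    features: (n_frames, n_dims) array of per-frame acoustic features,
              e.g. mel-log filter-bank energies.
    window:   odd context length; the paper explores 9 to 21 frames.
    Returns a (n_frames, n_dims * window) array.
    """
    assert window % 2 == 1, "use an odd window so the centre frame is defined"
    half = window // 2
    # Replicate edge frames so boundary positions still get a full context.
    padded = np.pad(features, ((half, half), (0, 0)), mode="edge")
    n_frames = features.shape[0]
    return np.hstack([padded[i:i + n_frames] for i in range(window)])

# Hypothetical usage: one MLP per articulatory feature class
# (hidden-layer size and feature dimensionality are placeholders).
# from sklearn.neural_network import MLPClassifier
# X = stack_context(fbank, window=11)            # fbank: (n_frames, 24)
# clf = MLPClassifier(hidden_layer_sizes=(500,))
# clf.fit(X, voicing_labels)                     # frame-level AF labels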
Classification
Type
D - Paper in conference proceedings
CEP field
JA - Electronics and optoelectronics, electrical engineering
OECD FORD field
—
Result continuities
Project
—
Continuities
S - Specific research at universities
Others
Year of publication
2015
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Proceedings title
Text, Speech, and Dialogue. 18th International Conference, TSD 2015
ISBN
978-3-319-24032-9
ISSN
0302-9743
e-ISSN
—
Number of pages
9
Pages from-to
560-568
Publisher name
Springer
Place of publication
Heidelberg
Event location
Plzen
Event date
14. 9. 2015
Event type by nationality
EUR - European event
Article UT WoS code
000365947800063