Applying articulatory features within speech recognition

Result identifiers

Result code in IS VaVaI
RIV/68407700:21230/19:00341897 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F19%3A00341897)
Result on the web
—
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Title in original language
Applying articulatory features within speech recognition
Result description in original language
This thesis investigates Articulatory Features (AF) of speech, with a special focus on improving the recognition of spontaneous Czech speech. Spontaneous speech is characterized by frequent coarticulation, assimilation, and reduction of phones, and because AF carry information about speech production mechanisms, they may offer a way to improve the results of recognition systems. The thesis describes the potential contribution of an AF-based TANDEM ASR architecture to the recognition and phonetic segmentation of spontaneous speech. Multi-valued AF classes were defined and unified for Czech and four East-European languages. Subsequent work focused on estimating AF with artificial neural networks. The suitability of standard and advanced acoustic speech features was analyzed, mainly with respect to the temporal context at the input of the ANN/DNN. The behaviour of AF estimation under mismatched or adverse noisy acoustic conditions was also studied, and the robust DCT-TRAP features proved to be the best choice for this task. AF were applied within ASR in the form of an AF-based TANDEM system, whose performance was analyzed on English phone recognition and Czech ASR tasks. The system had a positive impact on standard monophone and triphone systems based on MFCC features. Combining GMM-HMM/DNN-HMM systems with the AF-based TANDEM system at the level of lattices of decoded hypotheses significantly improved the baseline results. Finally, the phonetic segmentation task was analyzed using various types of acoustic model architectures, with attention to the proper selection of pronunciation variants. This was done for two tasks: read English and casual Czech. A two-stage forced alignment combining DNN-HMM with an optimized monophone-based system was proposed, and it was shown to improve phone boundary determination for both tasks.

Classification

Type
O - Other results
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

Project
—
Linkages
S - Specific research at universities

Other

Year of publication
2019
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Similar results (10)

Automatic Phonetic Segmentation and Pronunciation Detection with Various Approaches of Acoustic Modeling
Study on the use and adaptation of bottleneck features for robust speech recognition of nonlinearly distorted speech
Accuracy of HMM-Based Phonetic Segmentation Using Monophone or Triphone Acoustic Model
