Applying articulatory features within speech recognition

The result's identifiers

Result code in IS VaVaI
RIV/68407700:21230/19:00341897 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F19%3A00341897)
Result on the web
—
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Original language name
Applying articulatory features within speech recognition
Original language description
This thesis investigates Articulatory Features (AF) of speech, with a special focus on improving the recognition of spontaneous Czech speech. Because spontaneous speech is characterized by frequent coarticulation, assimilation, and reduction of phones, and because AF carry information about speech production mechanisms, they may offer a way to improve the results of such recognition systems. The potential contribution of an AF-based TANDEM ASR architecture to the recognition and phonetic segmentation of spontaneous speech is described. Multi-valued AF classes for Czech and four East-European languages were defined and unified. Subsequent work focused on estimating AF with artificial neural networks. The suitability of standard and advanced acoustic speech features was analyzed, mainly with respect to the temporal context at the input of the ANN/DNN. The behaviour of AF estimation in mismatched or adverse noisy acoustic conditions was also studied, and DCT-TRAP features proved to be the most robust choice for this task. AF were applied within ASR in the form of an AF-based TANDEM system, whose performance was analyzed for English phone recognition and Czech ASR tasks. A positive impact of this system was observed for standard monophone and triphone systems based on MFCC features. Combining GMM-HMM/DNN-HMM systems with the AF-based TANDEM system at the level of lattices of decoded hypotheses significantly improved the baseline results. Finally, the phonetic segmentation task was analyzed using various types of acoustic model architectures, with attention to the proper selection of pronunciation variants. This was done for two tasks: read English and casual Czech. A two-stage forced alignment combining DNN-HMM with an optimized monophone-based system was proposed, and it improved phone boundary determination for both tasks.
Czech name
—
Czech description
—

Classification

Type
O - Miscellaneous
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
—
Continuities
S - Specific research at universities

Others

Publication year
2019
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations

Similar results(10)

Automatic Phonetic Segmentation and Pronunciation Detection with Various Approaches of Acoustic Modeling
Study on the use and adaptation of bottleneck features for robust speech recognition of nonlinearly distorted speech
Accuracy of HMM-Based Phonetic Segmentation Using Monophone or Triphone Acoustic Model
