Applying articulatory features within speech recognition

Result identifiers

  • Result code in IS VaVaI

    RIV/68407700:21230/19:00341897 - isvavai.cz: https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F19%3A00341897

  • Result on the web

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    English

  • Title in original language

    Applying articulatory features within speech recognition

  • Result description in original language

    This thesis investigates Articulatory Features (AF) of speech, with a special focus on improving the recognition of spontaneous Czech speech. Because spontaneous speech is characterized by frequent coarticulation, assimilation, and reduction of phones, and because AF carry information about speech production mechanisms, they may offer a way to improve the results of recognition systems. The potential contribution of the AF-based TANDEM ASR architecture to the tasks of recognition and phonetic segmentation of spontaneous speech is described. Multi-valued AF classes for Czech and four East-European languages were defined and unified. Subsequent work focused on the estimation of AF using artificial neural networks. The suitability of standard and advanced acoustic speech features was analyzed, mainly with respect to the temporal context at the input of the ANN/DNN. The behaviour of AF estimation in mismatched or adverse noisy acoustic conditions was also studied, and DCT-TRAP features proved to be the most robust choice for this task. The application of AF within ASR was realized in the form of an AF-based TANDEM system. The performance of this system was analyzed for English phone recognition and Czech ASR tasks. A positive impact was observed relative to standard monophone and triphone systems based on MFCC features. Combining GMM-HMM/DNN-HMM systems with the AF-based TANDEM system at the level of lattices of decoded hypotheses significantly improved on the baseline results. Finally, the phonetic segmentation task was analyzed using various types of acoustic model architectures, with attention to the selection of the proper pronunciation variant. This was done for two tasks: read English and casual Czech. A two-stage forced alignment combining a DNN-HMM system with an optimized monophone-based system was proposed, and an improvement in phone boundary determination was demonstrated for both tasks.

Classification

  • Type

    O - Other results

  • CEP field

  • OECD FORD field

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

  • Project

  • Linkages

    S - Specific research at universities

Other

  • Year of application

    2019

  • Data confidentiality code

    S - Complete and truthful project data are not subject to protection under special legal regulations