Automatic Phonetic Segmentation Using the Kaldi Toolkit

The result's identifiers

  • Result code in IS VaVaI

    RIV/49777513:23520/17:43932638 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F17%3A43932638)

  • Result on the web

    https://link.springer.com/chapter/10.1007%2F978-3-319-64206-2_16

  • DOI - Digital Object Identifier

    10.1007/978-3-319-64206-2_16 (http://dx.doi.org/10.1007/978-3-319-64206-2_16)

Alternative languages

  • Result language

    English

  • Original language name

    Automatic Phonetic Segmentation Using the Kaldi Toolkit

  • Original language description

    In this paper we explore the possibilities of hidden Markov model based automatic phonetic segmentation with the Kaldi toolkit. We compare the Kaldi toolkit and the Hidden Markov Model Toolkit (HTK) in terms of segmentation accuracy. The well-tuned HTK-based phonetic segmentation framework was taken as the baseline and compared to a newly proposed segmentation framework built from the default examples and recipes available in the Kaldi repository. Since the segmentation accuracy of the HTK-based system was significantly higher than that of the Kaldi-based system, the default Kaldi setting was modified with respect to pause model topology, the way of generating phonetic questions for clustering, and the number of Gaussian mixtures used during modeling. The modified Kaldi-based system achieved results comparable to those obtained by HTK—slightly worse for small segmentation errors but better for gross segmentation errors. We also confirmed that, for both toolkits, the standard three-state left-to-right model topology was significantly outperformed by a modified five-state left-to-right topology, especially with respect to small segmentation errors.

  • Czech name

  • Czech description

Classification

  • Type

    D - Article in proceedings

  • CEP classification

  • OECD FORD branch

    20205 - Automation and control systems

Result continuities

  • Project

    TH02010307: Automatic voice banking and reconstruction for patients after total laryngectomy (/en/project/TH02010307)

  • Continuities

    P - Research and development project financed from public funds (with a link to CEP)

Others

  • Publication year

    2017

  • Confidentiality

    S - Complete and truthful data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    Text, Speech and Dialogue, 20th International Conference, TSD 2017, Prague, Czech Republic, August 27-31, 2017, Proceedings

  • ISBN

    978-3-319-64205-5

  • ISSN

    0302-9743

  • e-ISSN

  • Number of pages

    9

  • Pages from-to

    138-146

  • Publisher name

    Springer

  • Place of publication

    Cham

  • Event location

    Prague, Czech Republic

  • Event date

    Aug 27, 2017

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article

    000449869200016