Automatic Phonetic Segmentation Using the Kaldi Toolkit
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F17%3A43932638" target="_blank" >RIV/49777513:23520/17:43932638 - isvavai.cz</a>
Result on the web
<a href="https://link.springer.com/chapter/10.1007%2F978-3-319-64206-2_16" target="_blank" >https://link.springer.com/chapter/10.1007%2F978-3-319-64206-2_16</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-319-64206-2_16" target="_blank" >10.1007/978-3-319-64206-2_16</a>
Alternative languages
Result language
English
Title in original language
Automatic Phonetic Segmentation Using the Kaldi Toolkit
Result description in original language
In this paper we explore the possibilities of hidden Markov model based automatic phonetic segmentation with the Kaldi toolkit. We compare the Kaldi toolkit and the Hidden Markov Model Toolkit (HTK) in terms of segmentation accuracy. The well-tuned HTK-based phonetic segmentation framework was taken as the baseline and compared to a newly proposed segmentation framework built from the default examples and recipes available in the Kaldi repository. Since the segmentation accuracy of the HTK-based system was significantly higher than that of the Kaldi-based system, the default Kaldi setting was modified with respect to pause model topology, the way of generating phonetic questions for clustering, and the number of Gaussian mixtures used during modeling. The modified Kaldi-based system achieved results comparable to those obtained by HTK—slightly worse for small segmentation errors but better for gross segmentation errors. We also confirmed that, for both toolkits, the standard three-state left-to-right model topology was significantly outperformed by a modified five-state left-to-right topology, especially with respect to small segmentation errors.
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
20205 - Automation and control systems
Result continuities
Project
<a href="/cs/project/TH02010307" target="_blank" >TH02010307: Automatická konzervace a rekonstrukce hlasu se zaměřením na pacienty po totální laryngektomii</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Year of publication
2017
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Article name in the proceedings
Text, Speech and Dialogue, 20th International Conference, TSD 2017, Prague, Czech Republic, August 27-31, 2017, Proceedings
ISBN
978-3-319-64205-5
ISSN
0302-9743
e-ISSN
—
Number of pages
9
Pages from-to
138-146
Publisher name
Springer
Place of publication
Cham
Event location
Prague, Czech Republic
Event date
27 August 2017
Event type by nationality
WRD - Worldwide event
UT WoS article code
000449869200016