Automatic Phonetic Segmentation Using the Kaldi Toolkit
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F17%3A43932638" target="_blank" >RIV/49777513:23520/17:43932638 - isvavai.cz</a>
Result on the web
<a href="https://link.springer.com/chapter/10.1007%2F978-3-319-64206-2_16" target="_blank" >https://link.springer.com/chapter/10.1007%2F978-3-319-64206-2_16</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-319-64206-2_16" target="_blank" >10.1007/978-3-319-64206-2_16</a>
Alternative languages
Result language
English
Original language name
Automatic Phonetic Segmentation Using the Kaldi Toolkit
Original language description
In this paper we explore the possibilities of hidden Markov model based automatic phonetic segmentation with the Kaldi toolkit. We compare the Kaldi toolkit and the Hidden Markov Model Toolkit (HTK) in terms of segmentation accuracy. The well-tuned HTK-based phonetic segmentation framework was taken as the baseline and compared to a newly proposed segmentation framework built from the default examples and recipes available in the Kaldi repository. Since the segmentation accuracy of the HTK-based system was significantly higher than that of the Kaldi-based system, the default Kaldi setting was modified with respect to pause model topology, the way of generating phonetic questions for clustering, and the number of Gaussian mixtures used during modeling. The modified Kaldi-based system achieved results comparable to those obtained by HTK—slightly worse for small segmentation errors but better for gross segmentation errors. We also confirmed that, for both toolkits, the standard three-state left-to-right model topology was significantly outperformed by a modified five-state left-to-right topology, especially with respect to small segmentation errors.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
20205 - Automation and control systems
Result continuities
Project
<a href="/en/project/TH02010307" target="_blank" >TH02010307: Automatic voice banking and reconstruction for patients after total laryngectomy</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2017
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Text, Speech and Dialogue, 20th International Conference, TSD 2017, Prague, Czech Republic, August 27-31, 2017, Proceedings
ISBN
978-3-319-64205-5
ISSN
0302-9743
e-ISSN
—
Number of pages
9
Pages from-to
138-146
Publisher name
Springer
Place of publication
Cham
Event location
Prague, Czech Republic
Event date
Aug 27, 2017
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000449869200016