Using LSTM neural networks for cross-lingual phonetic speech segmentation with an iterative correction procedure

Identifikátory výsledku

Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F24%3A43969623" target="_blank" >RIV/49777513:23520/24:43969623 - isvavai.cz</a>
Výsledek na webu
<a href="https://onlinelibrary.wiley.com/doi/10.1111/coin.12602" target="_blank" >https://onlinelibrary.wiley.com/doi/10.1111/coin.12602</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1111/coin.12602" target="_blank" >10.1111/coin.12602</a>

Alternativní jazyky

Jazyk výsledku
angličtina
Název v původním jazyce
Using LSTM neural networks for cross-lingual phonetic speech segmentation with an iterative correction procedure
Popis výsledku v původním jazyce
This article describes experiments on speech segmentation using long short-term memory recurrent neural networks. The main part of the paper deals with multi-lingual and cross-lingual segmentation, that is, it is performed on a language different from the one on which the model was trained. The experimental data involves large Czech, English, German, and Russian speech corpora designated for speech synthesis. For optimal multi-lingual modeling, a compact phonetic alphabet was proposed by sharing and clustering phones of particular languages. Many experiments were performed exploring various experimental conditions and data combinations. We proposed a simple procedure that iteratively adapts the inaccurate default model to the new voice/language. The segmentation accuracy was evaluated by comparison with reference segmentation created by a well-tuned hidden Markov model-based framework with additional manual corrections. The resulting segmentation was also employed in a unit selection text-to-speech system. The generated speech quality was compared with the reference segmentation by a preference listening test.
Název v anglickém jazyce
Using LSTM neural networks for cross-lingual phonetic speech segmentation with an iterative correction procedure
Popis výsledku anglicky
This article describes experiments on speech segmentation using long short-term memory recurrent neural networks. The main part of the paper deals with multi-lingual and cross-lingual segmentation, that is, it is performed on a language different from the one on which the model was trained. The experimental data involves large Czech, English, German, and Russian speech corpora designated for speech synthesis. For optimal multi-lingual modeling, a compact phonetic alphabet was proposed by sharing and clustering phones of particular languages. Many experiments were performed exploring various experimental conditions and data combinations. We proposed a simple procedure that iteratively adapts the inaccurate default model to the new voice/language. The segmentation accuracy was evaluated by comparison with reference segmentation created by a well-tuned hidden Markov model-based framework with additional manual corrections. The resulting segmentation was also employed in a unit selection text-to-speech system. The generated speech quality was compared with the reference segmentation by a preference listening test.

Klasifikace

Druh
J<sub>imp</sub> - Článek v periodiku v databázi Web of Science
CEP obor
—
OECD FORD obor
20205 - Automation and control systems

Návaznosti výsledku

Projekt
<a href="/cs/project/GA19-19324S" target="_blank" >GA19-19324S: Plně trénovatelná syntéza české řeči z textu s využitím hlubokých neuronových sítí</a><br>
Návaznosti
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)

Ostatní

Rok uplatnění
2024
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Údaje specifické pro druh výsledku

Název periodika
Computational Intelligence
ISSN
0824-7935
e-ISSN
1467-8640
Svazek periodika
40
Číslo periodika v rámci svazku
2
Stát vydavatele periodika
US - Spojené státy americké
Počet stran výsledku
36
Strana od-do
—
Kód UT WoS článku
001066674400001
EID výsledku v databázi Scopus
2-s2.0-85171439072

Podobné výsledky(10)

Speakers Talking Foreign Languages in a Multi-lingual TTS System MT4CrossOIE: Multi-stage tuning for cross-lingual open information extraction Multi-lingual Dialogue Act Recognition with Deep Learning Methods

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Using LSTM neural networks for cross-lingual phonetic speech segmentation with an iterative correction procedure

Identifikátory výsledku

Alternativní jazyky

Klasifikace

Návaznosti výsledku

Ostatní

Údaje specifické pro druh výsledku

Podobné výsledky(10)

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Popis výsledku

Identifikátory výsledku

Identifikátory výsledku

Alternativní jazyky

Alternativní jazyky

Klasifikace

Klasifikace

Návaznosti výsledku

Návaznosti výsledku

Ostatní

Ostatní

Údaje specifické pro druh výsledku

Údaje specifické pro druh výsledku

Podobné výsledky(10)