Speakers Talking Foreign Languages in a Multi-lingual TTS System
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F21%3A43962411" target="_blank" >RIV/49777513:23520/21:43962411 - isvavai.cz</a>
Result on the web
<a href="https://link.springer.com/chapter/10.1007%2F978-3-030-83527-9_42" target="_blank" >https://link.springer.com/chapter/10.1007%2F978-3-030-83527-9_42</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-030-83527-9_42" target="_blank" >10.1007/978-3-030-83527-9_42</a>
Alternative languages
Result language
English
Title in the original language
Speakers Talking Foreign Languages in a Multi-lingual TTS System
Result description in the original language
This paper presents experiments with a multi-lingual, multi-speaker TTS synthesis system jointly trained on English, German, Russian, and Czech speech data. The experimental LSTM-based TTS system with a trainable neural vocoder uses the International Phonetic Alphabet (IPA), which allows different languages to be combined directly. We analyzed whether the joint model is capable of generalizing and mixing the information contained in the training data, and whether particular voices can be used to synthesize different languages, including their language-specific phonemes. The intelligibility of the generated speech was assessed by a SUS (Semantically Unpredictable Sentences) listening test containing Czech sentences spoken by non-Czech speakers. The performance of the joint multi-lingual model was also compared with independent single-voice models, in which the missing non-native phonemes were mapped to the most similar native phonemes. Besides the Czech sentences, the preference test also contained English sentences spoken by Czech voices. The multi-lingual model was preferred for all evaluated voices. Although the generated speech did not sound like a native speaker, its phonetic and prosodic features were clearly better than those of the single-voice models.
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
20205 - Automation and control systems
Result linkages
Project
<a href="/cs/project/GA19-19324S" target="_blank" >GA19-19324S: Plně trénovatelná syntéza české řeči z textu s využitím hlubokých neuronových sítí (Fully trainable Czech text-to-speech synthesis using deep neural networks)</a><br>
Linkages
P - Research and development project financed from public funds (with a link to CEP)
Other
Year of implementation
2021
Data confidentiality code
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific to the result type
Proceedings title
Text, Speech, and Dialogue: 24th International Conference, TSD 2021, Olomouc, Czech Republic, September 6–9, 2021, Proceedings
ISBN
978-3-030-83526-2
ISSN
0302-9743
e-ISSN
1611-3349
Number of pages
10
Pages from-to
489-498
Publisher name
Springer International Publishing
Place of publication
Cham
Event location
Olomouc, Czech Republic
Event date
September 6, 2021
Event type by nationality
WRD - Worldwide event
UT WoS article code
—