Speakers Talking Foreign Languages in a Multi-lingual TTS System
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F21%3A43962411" target="_blank" >RIV/49777513:23520/21:43962411 - isvavai.cz</a>
Result on the web
<a href="https://link.springer.com/chapter/10.1007%2F978-3-030-83527-9_42" target="_blank" >https://link.springer.com/chapter/10.1007%2F978-3-030-83527-9_42</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-030-83527-9_42" target="_blank" >10.1007/978-3-030-83527-9_42</a>
Alternative languages
Result language
English
Original language name
Speakers Talking Foreign Languages in a Multi-lingual TTS System
Original language description
This paper presents experiments with a multi-lingual, multi-speaker TTS synthesis system jointly trained on English, German, Russian, and Czech speech data. The experimental LSTM-based TTS system with a trainable neural vocoder uses the International Phonetic Alphabet (IPA), which allows different languages to be combined straightforwardly. We analyzed whether the joint model is capable of generalizing and mixing the information contained in the training data, and whether particular voices can be used to synthesize different languages, including language-specific phonemes. The intelligibility of the generated speech was assessed by an SUS (Semantically Unpredictable Sentences) listening test containing Czech sentences spoken by non-Czech speakers. In a preference test, the joint multi-lingual model was also compared with independent single-voice models in which missing non-native phonemes were mapped to the most similar native phonemes. Besides the Czech sentences, the preference test also contained English sentences spoken by Czech voices. The multi-lingual model was preferred for all evaluated voices. Although the generated speech did not sound like that of a native speaker, its phonetic and prosodic features were clearly better than those of the phoneme-mapping baseline.
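The abstract contrasts a joint model trained on a shared IPA symbol set with single-voice baselines in which non-native phonemes are replaced by the most similar native ones. The following is a minimal Python sketch of such a nearest-phoneme fallback. It assumes a hypothetical articulatory feature table (place, manner, voicing), hypothetical cost weights, and small illustrative consonant inventories. The paper's actual mapping is not given in this record, so `FEATURES`, `distance`, `nearest_native`, and `map_sequence` are illustrative names only.

```python
"""Sketch: nearest-phoneme fallback for a single-language TTS voice.

ASSUMPTION: the feature table, cost weights, and inventories are
illustrative, not the tables used in the paper.
"""

# Ordinal place of articulation, front (lips) to back (glottis).
PLACE = {
    "bilabial": 0, "labiodental": 1, "dental": 2, "alveolar": 3,
    "postalveolar": 4, "palatal": 5, "velar": 6, "glottal": 7,
}

# Hypothetical (place, manner, voiced) features for a subset of IPA consonants.
FEATURES = {
    "p": ("bilabial", "plosive", False), "b": ("bilabial", "plosive", True),
    "m": ("bilabial", "nasal", True),
    "t": ("alveolar", "plosive", False), "d": ("alveolar", "plosive", True),
    "n": ("alveolar", "nasal", True),
    "s": ("alveolar", "fricative", False), "z": ("alveolar", "fricative", True),
    "ɹ": ("alveolar", "approximant", True), "l": ("alveolar", "lateral", True),
    "ʃ": ("postalveolar", "fricative", False),
    "ʒ": ("postalveolar", "fricative", True),
    "j": ("palatal", "approximant", True),
    "c": ("palatal", "plosive", False), "ɟ": ("palatal", "plosive", True),
    "ɲ": ("palatal", "nasal", True),
    "k": ("velar", "plosive", False), "g": ("velar", "plosive", True),
    "h": ("glottal", "fricative", False), "ɦ": ("glottal", "fricative", True),
}

# Hypothetical consonant inventories (small subsets, for illustration only).
ENGLISH = {"p", "b", "m", "t", "d", "n", "s", "z", "ɹ", "l", "ʃ", "ʒ",
           "j", "k", "g", "h"}
CZECH = {"p", "b", "m", "t", "d", "n", "s", "z", "l", "ʃ", "ʒ", "j",
         "c", "ɟ", "ɲ", "k", "g", "ɦ"}


def distance(a: str, b: str) -> int:
    """Weighted distance: place steps + 3 per manner mismatch + 1 per voicing mismatch."""
    place_a, manner_a, voiced_a = FEATURES[a]
    place_b, manner_b, voiced_b = FEATURES[b]
    return (abs(PLACE[place_a] - PLACE[place_b])
            + 3 * (manner_a != manner_b)
            + 1 * (voiced_a != voiced_b))


def nearest_native(phoneme: str, inventory: set[str]) -> str:
    """Keep a native phoneme as-is; otherwise return the closest native one."""
    if phoneme in inventory:
        return phoneme
    # Iterating over a sorted list makes ties resolve deterministically.
    return min(sorted(inventory), key=lambda p: distance(phoneme, p))


def map_sequence(phonemes: list[str], inventory: set[str]) -> list[str]:
    """Map every phoneme of a transcription onto the target inventory."""
    return [nearest_native(p, inventory) for p in phonemes]


if __name__ == "__main__":
    # A joint multi-lingual model can train on the union of IPA symbols directly.
    print("shared inventory size:", len(ENGLISH | CZECH))
    # A single-voice English model needs a fallback for Czech-only phonemes.
    for ph in sorted(CZECH - ENGLISH):
        print(ph, "->", nearest_native(ph, ENGLISH))
```

With these assumed features, the Czech-only palatals and the voiced glottal fricative collapse onto English /k/, /g/, /n/, and /h/. The union inventory used by a joint model needs no such substitution. A production system would more likely rely on a hand-curated mapping table than on a simple feature distance like this one.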
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
20205 - Automation and control systems
Result continuities
Project
<a href="/en/project/GA19-19324S" target="_blank" >GA19-19324S: Fully Trainable Deep Neural Network Based Czech Text-to-Speech Synthesis</a>
Continuities
P - Research and development project financed from public sources (with a link to CEP)
Others
Publication year
2021
Confidentiality
S - Complete and truthful data about the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Text, Speech, and Dialogue: 24th International Conference, TSD 2021, Olomouc, Czech Republic, September 6–9, 2021, Proceedings
ISBN
978-3-030-83526-2
ISSN
0302-9743
e-ISSN
1611-3349
Number of pages
10
Pages from-to
489-498
Publisher name
Springer International Publishing
Place of publication
Cham
Event location
Olomouc, Czech Republic
Event date
Sep 6, 2021
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—