Automatic statistical evaluation of quality of unit selection speech synthesis with different prosody manipulations
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F20%3A43958604" target="_blank" >RIV/49777513:23520/20:43958604 - isvavai.cz</a>
Result on the web
<a href="http://iris.elf.stuba.sk/JEEEC/data/pdf/2_120-02.pdf" target="_blank" >http://iris.elf.stuba.sk/JEEEC/data/pdf/2_120-02.pdf</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.2478/jee-2020-0012" target="_blank" >10.2478/jee-2020-0012</a>
Alternative languages
Result language
English
Title in original language
Automatic statistical evaluation of quality of unit selection speech synthesis with different prosody manipulations
Result description in original language
The quality of speech synthesis is a crucial issue when comparing text-to-speech (TTS) systems. We proposed a system for automatic evaluation of speech quality based on statistical analysis of temporal features (time duration, phrasing, and time structuring of an analysed sentence) together with standard spectral and prosodic features. The system was successfully tested on sentences produced by a unit selection speech synthesizer with a male and a female voice, using two different approaches to prosody manipulation. Experiments showed that all three types of speech features (spectral, prosodic, and temporal) are necessary for correct, sharp, and stable results. Furthermore, the number of statistical parameters used has a significant impact on the correctness and precision of the evaluation results. It was also demonstrated that enlarging the speech material improves the stability of the whole evaluation process. Finally, the functionality of the proposed system was verified by comparing its results with those of a standard listening test.
Classification
Type
J<sub>imp</sub> - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
20205 - Automation and control systems
Result linkages
Project
<a href="/cs/project/GA19-19324S" target="_blank" >GA19-19324S: Fully trainable Czech text-to-speech synthesis using deep neural networks</a>
Linkages
P - Research and development project financed from public funds (with a link to CEP)
Other
Year of publication
2020
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Name of the periodical
Journal of ELECTRICAL ENGINEERING
ISSN
1335-3632
e-ISSN
—
Volume of the periodical
71
Issue of the periodical within the volume
2
Country of the publisher
SK - Slovak Republic
Number of pages
9
Pages from-to
78-86
UT WoS code of the article
000536287900002
EID of the result in the Scopus database
2-s2.0-85085749611