On the Analysis of Training Data for WaveNet-Based Speech Synthesis

Result code in IS VaVaI
RIV/49777513:23520/18:43952771 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F18%3A43952771)
Result on the web
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8461960
DOI - Digital Object Identifier
10.1109/ICASSP.2018.8461960 (http://dx.doi.org/10.1109/ICASSP.2018.8461960)

Result language
English
Original language name
On the Analysis of Training Data for WaveNet-Based Speech Synthesis
Original language description
In this paper, we analyze how much training data a WaveNet-based speech synthesis method needs, and how consistent and accurate that data must be, to generate speech of good quality. We do this by adding artificial noise to the description of our training data and observing how well WaveNet trains and produces speech. More specifically, we add noise to both the phonetic segmentation and the annotation, degrading their accuracy, and we also reduce the size of the training data by using fewer sentences when training a WaveNet model. We conducted MUSHRA listening tests and used objective measures to track speech quality across the experiments. We show that WaveNet retains high quality even after a small amount of noise (up to 10%) is added to the phonetic segmentation and annotation. A small degradation of speech quality was observed for our WaveNet configuration when only 3 hours of training data were used.
Czech name
—
Czech description
—

Project
The result was created during the realization of more than one project. More information is in the Projects tab.
Continuities
P - Research and development project financed from public funds (with a link to CEP)

Publication year
2018
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations

Article name in the collection
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
ISBN
978-1-5386-4658-8
ISSN
—
e-ISSN
2379-190X
Number of pages
5
Pages from-to
5684-5688
Publisher name
IEEE
Place of publication
New York
Event location
Calgary, AB, Canada
Event date
April 15, 2018
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000446384605169
