Effects of Large Multi-Speaker Models on the Quality of Neural Speech Synthesis

The result's identifiers

  • Result code in IS VaVaI

    RIV/49777513:23520/24:43974114 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F24%3A43974114)

  • Result on the web

    https://svk.fav.zcu.cz/download/proceedings_svk_2024.pdf

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    English

  • Original language name

    Effects of Large Multi-Speaker Models on the Quality of Neural Speech Synthesis

  • Original language description

    These days, speech synthesis is usually performed by neural models (Tan et al., 2021). A neural speech synthesizer is dependent on a large number of parameters, whose values must be acquired during the process of model training. In many situations, the result of training can be improved by fine-tuning a pre-trained model, i.e. using the parameter values of a model which has been trained using different training data to initialize the parameters of the target model before the training process begins (Zhang et al., 2023).

    In the field of speech synthesis, a pre-trained model is a speech synthesizer which has been trained to synthesize the voice of another speaker. Furthermore, we can use a multi-speaker pre-trained model, which has been trained using speech recordings of multiple speakers, so it should contain general knowledge about human speech.

    This paper describes how the number of speakers used to train a pre-trained model affects the quality of the final synthetic speech. We used a single-speaker model as well as two multi-speaker models for fine-tuning and we compared the obtained models in a listening test.

  • Czech name

  • Czech description

Classification

  • Type

    O - Miscellaneous

  • CEP classification

  • OECD FORD branch

    20205 - Automation and control systems

Result continuities

  • Project

  • Continuities

    S - Specific research at universities

Others

  • Publication year

    2024

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations
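
Illustrative sketch

The original language description above explains fine-tuning as initializing the target model's parameters with the values of a pre-trained model before training begins. The Python sketch below is one minimal way to picture that initialization step; it is not the authors' implementation. The toy parameter layout, the names `new_model` and `init_from_pretrained`, and the rule of skipping parameters whose size differs (here a per-speaker embedding) are all assumptions made for illustration.

```python
import copy
import random


def new_model(n_speakers, seed):
    """Create a toy 'synthesizer' as a dict of named parameter lists.

    The layout is hypothetical; a real neural synthesizer has far more parameters.
    """
    rng = random.Random(seed)
    return {
        "encoder.weight": [rng.uniform(-1, 1) for _ in range(4)],
        "decoder.weight": [rng.uniform(-1, 1) for _ in range(4)],
        # One value per speaker, so the size depends on the training data (assumption).
        "speaker_embedding": [rng.uniform(-1, 1) for _ in range(n_speakers)],
    }


def init_from_pretrained(target, pretrained):
    """Copy pre-trained values into the target wherever name and size match.

    Parameters that do not match keep their fresh initialization.
    Returns the initialized model and the list of copied parameter names.
    """
    initialized = copy.deepcopy(target)
    copied = []
    for name, values in pretrained.items():
        if name in initialized and len(initialized[name]) == len(values):
            initialized[name] = list(values)
            copied.append(name)
    return initialized, copied


# A multi-speaker pre-trained model and a fresh single-speaker target model.
pretrained = new_model(n_speakers=10, seed=0)
target = new_model(n_speakers=1, seed=1)

# Initialize the target from the pre-trained model; training would start from here.
target, copied = init_from_pretrained(target, pretrained)
print(copied)
```

In this sketch the encoder and decoder values are taken from the pre-trained model, while the speaker embedding keeps its own initialization because its size differs. How a real system treats speaker-specific parameters is not stated in the record.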