Effects of Large Multi-Speaker Models on the Quality of Neural Speech Synthesis

Result identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F24%3A43974114" target="_blank" >RIV/49777513:23520/24:43974114 - isvavai.cz</a>

  • Result on the web

    <a href="https://svk.fav.zcu.cz/download/proceedings_svk_2024.pdf" target="_blank" >https://svk.fav.zcu.cz/download/proceedings_svk_2024.pdf</a>

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    English

  • Title in the original language

    Effects of Large Multi-Speaker Models on the Quality of Neural Speech Synthesis

  • Result description in the original language

    These days, speech synthesis is usually performed by neural models (Tan et al., 2021). A neural speech synthesizer is dependent on a large number of parameters, whose values must be acquired during the process of model training. In many situations, the result of training can be improved by fine-tuning a pre-trained model, i.e. using the parameter values of a model which has been trained using different training data to initialize the parameters of the target model before the training process begins (Zhang et al., 2023). In the field of speech synthesis, a pre-trained model is a speech synthesizer which has been trained to synthesize the voice of another speaker. Furthermore, we can use a multi-speaker pre-trained model, which has been trained using speech recordings of multiple speakers, so it should contain general knowledge about human speech. This paper describes how the number of speakers used to train a pre-trained model affects the quality of the final synthetic speech. We used a single-speaker model as well as two multi-speaker models for fine-tuning and we compared the obtained models in a listening test.

  • Title in English

    Effects of Large Multi-Speaker Models on the Quality of Neural Speech Synthesis

  • Result description in English

    These days, speech synthesis is usually performed by neural models (Tan et al., 2021). A neural speech synthesizer is dependent on a large number of parameters, whose values must be acquired during the process of model training. In many situations, the result of training can be improved by fine-tuning a pre-trained model, i.e. using the parameter values of a model which has been trained using different training data to initialize the parameters of the target model before the training process begins (Zhang et al., 2023). In the field of speech synthesis, a pre-trained model is a speech synthesizer which has been trained to synthesize the voice of another speaker. Furthermore, we can use a multi-speaker pre-trained model, which has been trained using speech recordings of multiple speakers, so it should contain general knowledge about human speech. This paper describes how the number of speakers used to train a pre-trained model affects the quality of the final synthetic speech. We used a single-speaker model as well as two multi-speaker models for fine-tuning and we compared the obtained models in a listening test.

Classification

  • Type

    O - Other results

  • CEP field

  • OECD FORD field

    20205 - Automation and control systems

Result linkages

  • Project

  • Linkages

    S - Specific university research

Other

  • Year of implementation

    2024

  • Data confidentiality code

    S - Complete and true data on the project are not subject to protection under special legal regulations