VITS: Quality vs. Speed Analysis
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F23%3A43969619" target="_blank" >RIV/49777513:23520/23:43969619 - isvavai.cz</a>
Result on the web
<a href="https://link.springer.com/chapter/10.1007/978-3-031-40498-6_19" target="_blank" >https://link.springer.com/chapter/10.1007/978-3-031-40498-6_19</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-031-40498-6_19" target="_blank" >10.1007/978-3-031-40498-6_19</a>
Alternative languages
Result language
English
Original language name
VITS: Quality vs. Speed Analysis
Original language description
In this paper, we analyze the performance of a modern end-to-end speech synthesis model called Variational Inference with adversarial learning for end-to-end Text-to-Speech (VITS). We build on the original VITS model and examine how different modifications to its architecture affect synthetic speech quality and computational complexity. Experiments were carried out with two Czech voices, one male and one female. To assess the quality of speech synthesized by the modified models, MUSHRA listening tests were performed. Computational complexity was measured as synthesis speed relative to real time. While the original VITS model is still preferred in terms of speech quality, we present a modification of the original structure that responds significantly faster while still providing acceptable output quality. Such a configuration can be used when system response latency is critical.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
20205 - Automation and control systems
Result continuities
Project
<a href="/en/project/TL05000546" target="_blank" >TL05000546: Using a multimedia monolingual dictionary for modern teaching of Czech</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2023
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Text, Speech, and Dialogue: 26th International Conference, TSD 2023, Pilsen, Czech Republic, September 4–6, 2023, Proceedings
ISBN
978-3-031-40497-9
ISSN
0302-9743
e-ISSN
1611-3349
Number of pages
12
Pages from-to
214-225
Publisher name
Springer International Publishing
Place of publication
Cham
Event location
Pilsen, Czech Republic
Event date
Sep 4, 2023
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—