Using a multilingual literary parallel corpus to train NMT systems
The result's identifiers
Result code in IS VaVaI
RIV/00216208:90244/24:10495709 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A90244%2F24%3A10495709)
Result on the web
https://aclanthology.org/2024.ctt-1.1.pdf
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Using a multilingual literary parallel corpus to train NMT systems
Original language description
This article presents an application of a multilingual, multidirectional parallel corpus of literary texts in five Romance languages (Spanish, French, Italian, Portuguese, Romanian) and one Slavic language (Croatian), totalling 142,000 segments and 15.7 million words. The corpus is combined with very large, freely available parallel corpora and used to train neural machine translation (NMT) systems tailored to literature. Five NMT systems were trained: Spanish-French, Spanish-Italian, Spanish-Portuguese, Spanish-Romanian and Spanish-Croatian. The systems were evaluated with automatic metrics (BLEU, chrF2 and TER) and compared with a rule-based MT system (Apertium) and a neural MT system (Google Translate). The main conclusion is that the literary corpus proved very useful: most of the trained systems achieve automatic metric scores comparable to, and in some cases better than, those of a widely used commercial NMT system.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
60203 - Linguistics
Result continuities
Project
—
Continuities
—
Others
Publication year
2024
Confidentiality
S - Complete and true project data are not subject to protection under special legal regulations
Data specific for result type
Name of the proceedings
Proceedings of the 1st Workshop on Creative-text Translation and Technology
ISBN
978-1-06-869073-0
ISSN
—
e-ISSN
—
Number of pages
9
Pages from-to
1-9
Publisher name
European Association for Machine Translation
Place of publication
Sheffield
Event location
Sheffield
Event date
Jun 27, 2024
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—