Expand and Filter: CUNI and LMU Systems for the WNGT 2020 Duolingo Shared Task
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/20:10424470 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F20%3A10424470)
Result on the web
https://www.aclweb.org/anthology/2020.ngt-1.18/
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Expand and Filter: CUNI and LMU Systems for the WNGT 2020 Duolingo Shared Task
Original language description
We present our submission to the Simultaneous Translation And Paraphrase for Language Education (STAPLE) challenge. We used a standard Transformer model for translation, with a crosslingual classifier predicting correct translations on the output n-best list. To increase the diversity of the outputs, we used additional data to train the translation model, and we trained a paraphrasing model based on the Levenshtein Transformer architecture to generate further synonymous translations. The paraphrasing results were again filtered using our classifier. While the use of additional data and our classifier filter were able to improve results, the paraphrasing model produced too many invalid outputs to further improve the output quality. Our model without the paraphrasing component finished in the middle of the field for the shared task, improving over the best baseline by a margin of 10-22 % weighted F1 absolute.
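The expand-then-filter idea in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.5 threshold, and the toy scorer are all assumptions made for illustration, standing in for the crosslingual classifier that scores each candidate translation.

```python
from typing import Callable, List


def expand_and_filter(
    source: str,
    candidates: List[str],
    score: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[str]:
    """Keep unique candidates whose classifier score reaches the threshold.

    `candidates` stands for the pooled n-best list and paraphrases;
    `score` is a hypothetical stand-in for a crosslingual classifier.
    """
    seen = set()
    kept = []
    for cand in candidates:
        if cand in seen:
            continue  # drop exact duplicates from the pooled candidates
        seen.add(cand)
        if score(source, cand) >= threshold:
            kept.append(cand)
    return kept


def toy_score(source: str, candidate: str) -> float:
    """Toy scorer for demonstration only; a real system would use a trained model."""
    return 0.9 if "hello" in candidate else 0.1


if __name__ == "__main__":
    pool = ["hello", "hi there", "hello", "hello there"]
    print(expand_and_filter("olá", pool, toy_score))
```

Raising the threshold in such a setup trades coverage of valid translations for fewer invalid ones, which is the trade-off the description reports for the paraphrasing component.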
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
GX19-26934X: Neural Representations in Multi-modal and Multi-lingual Modeling
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2020
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the Fourth Workshop on Neural Generation and Translation
ISBN
978-1-952148-17-0
ISSN
—
e-ISSN
—
Number of pages
8
Pages from-to
153-160
Publisher name
Association for Computational Linguistics
Place of publication
Stroudsburg, PA, USA
Event location
Online
Event date
Jul 10, 2020
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000563428500018