CUNI Transformer Neural MT System for WMT18
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/18:10390199 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F18%3A10390199)
Result on the web
http://www.statmt.org/wmt18/pdf/WMT051.pdf
DOI - Digital Object Identifier
10.18653/v1/W18-64051 (http://dx.doi.org/10.18653/v1/W18-64051)
Alternative languages
Result language
English
Original language name
CUNI Transformer Neural MT System for WMT18
Original language description
We describe our NMT system submitted to the WMT2018 shared task in news translation. Our system is based on the Transformer model (Vaswani et al., 2017). We use an improved technique of backtranslation, where we iterate the process of translating monolingual data in one direction and training an NMT model for the opposite direction using synthetic parallel data. We apply a simple but effective filtering of the synthetic data. We pre-process the input sentences using coreference resolution in order to disambiguate the gender of pro-dropped personal pronouns. Finally, we apply two simple post-processing substitutions on the translated output. Our system is significantly (p < 0.05) better than all other English-Czech and Czech-English systems in WMT2018.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
DG16P02B048: System for permanent preservation of documentation and presentation of historical sources from the period of totalitarian regimes
Continuities
I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2018
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the Third Conference on Machine Translation, Volume 2: Shared Tasks
ISBN
978-1-948087-81-0
ISSN
—
e-ISSN
not stated
Number of pages
6
Pages from-to
486-491
Publisher name
Association for Computational Linguistics
Place of publication
Stroudsburg, PA, USA
Event location
Brussels, Belgium
Event date
Oct 31, 2018
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—