Training Tips for the Transformer Model
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/18:10390090 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F18%3A10390090)
Result on the web
https://ufal.mff.cuni.cz/pbml/110/art-popel-bojar.pdf
DOI - Digital Object Identifier
10.2478/pralin-2018-0002 (http://dx.doi.org/10.2478/pralin-2018-0002)
Alternative languages
Result language
English
Original language name
Training Tips for the Transformer Model
Original language description
This article describes our experiments in neural machine translation using the recent Tensor2Tensor framework and the Transformer sequence-to-sequence model (Vaswani et al., 2017). We examine some of the critical parameters that affect the final translation quality, memory usage, training stability and training time, concluding each experiment with a set of recommendations for fellow researchers. In addition to confirming the general mantra "more data and larger models", we address scaling to multiple GPUs and provide practical tips for improved training regarding batch size, learning rate, warmup steps, maximum sentence length and checkpoint averaging. We hope that our observations will allow others to get better results given their particular hardware and data constraints.
Czech name
—
Czech description
—
Classification
Type
J<sub>ost</sub> - Miscellaneous article in a specialist periodical
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
Result was created during the realization of more than one project.
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2018
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
The Prague Bulletin of Mathematical Linguistics
ISSN
0032-6585
e-ISSN
—
Volume of the periodical
110
Issue of the periodical within the volume
1
Country of publishing house
CZ - CZECH REPUBLIC
Number of pages
28
Pages from-to
43-70
UT code for WoS article
—
EID of the result in the Scopus database
—