Training Tips for the Transformer Model

Identifikátory výsledku

Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F18%3A10390090" target="_blank" >RIV/00216208:11320/18:10390090 - isvavai.cz</a>
Výsledek na webu
<a href="https://ufal.mff.cuni.cz/pbml/110/art-popel-bojar.pdf" target="_blank" >https://ufal.mff.cuni.cz/pbml/110/art-popel-bojar.pdf</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.2478/pralin-2018-0002" target="_blank" >10.2478/pralin-2018-0002</a>

Alternativní jazyky

Jazyk výsledku
angličtina
Název v původním jazyce
Training Tips for the Transformer Model
Popis výsledku v původním jazyce
This article describes our experiments in neural machine translation using the recent Tensor2Tensor framework and the Transformer sequence-to-sequence model (Vaswani et al., 2017). We examine some of the critical parameters that affect the final translation quality, memory usage, training stability and training time, concluding each experiment with a set of recommendations for fellow researchers. In addition to confirming the general mantra "more data and larger models", we address scaling to multiple GPUs and provide practical tips for improved training regarding batch size, learning rate, warmup steps, maximum sentence length and checkpoint averaging. We hope that our observations will allow others to get better results given their particular hardware and data constraints.
Název v anglickém jazyce
Training Tips for the Transformer Model
Popis výsledku anglicky
This article describes our experiments in neural machine translation using the recent Tensor2Tensor framework and the Transformer sequence-to-sequence model (Vaswani et al., 2017). We examine some of the critical parameters that affect the final translation quality, memory usage, training stability and training time, concluding each experiment with a set of recommendations for fellow researchers. In addition to confirming the general mantra "more data and larger models", we address scaling to multiple GPUs and provide practical tips for improved training regarding batch size, learning rate, warmup steps, maximum sentence length and checkpoint averaging. We hope that our observations will allow others to get better results given their particular hardware and data constraints.

Klasifikace

Druh
J<sub>ost</sub> - Ostatní články v recenzovaných periodicích
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)

Návaznosti výsledku

Projekt
Výsledek vznikl pri realizaci vícero projektů. Více informací v záložce Projekty.
Návaznosti
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)

Ostatní

Rok uplatnění
2018
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Údaje specifické pro druh výsledku

Název periodika
The Prague Bulletin of Mathematical Linguistics
ISSN
0032-6585
e-ISSN
—
Svazek periodika
110
Číslo periodika v rámci svazku
1
Stát vydavatele periodika
CZ - Česká republika
Počet stran výsledku
28
Strana od-do
43-70
Kód UT WoS článku
—
EID výsledku v databázi Scopus
—

Podobné výsledky(10)

CUNI Submissions in WMT18 Curriculum Learning and Minibatch Bucketing in Neural Machine Translation CUni Multilingual Matrix in the WMT 2013 Shared Task

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Training Tips for the Transformer Model

Identifikátory výsledku

Alternativní jazyky

Klasifikace

Návaznosti výsledku

Ostatní

Údaje specifické pro druh výsledku

Podobné výsledky(10)

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Popis výsledku

Identifikátory výsledku

Identifikátory výsledku

Alternativní jazyky

Alternativní jazyky

Klasifikace

Klasifikace

Návaznosti výsledku

Návaznosti výsledku

Ostatní

Ostatní

Údaje specifické pro druh výsledku

Údaje specifické pro druh výsledku

Podobné výsledky(10)