Results of the WMT16 Metrics Shared Task
Result identifiers
Result code in IS VaVaI
RIV/00216208:11320/16:10335451 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F16%3A10335451)
Result on the web
http://www.statmt.org/wmt16/pdf/W16-2302.pdf
DOI - Digital Object Identifier
10.18653/v1/w16-2302
Alternative languages
Result language
English
Title in the original language
Results of the WMT16 Metrics Shared Task
Description in the original language
This paper presents the results of the WMT16 Metrics Shared Task. We asked participants in this task to score the outputs of the MT systems involved in the WMT16 Shared Translation Task. We collected scores of 16 metrics from 9 research groups. In addition, we computed scores of 9 standard metrics (BLEU, SentBLEU, NIST, WER, PER, TER and CDER) as baselines. The collected scores were evaluated in terms of system-level correlation (how well each metric's scores correlate with the WMT16 official manual ranking of systems) and in terms of segment-level correlation (how often a metric agrees with humans in comparing two translations of a particular sentence). This year there were several additions to the setup: a large number of language pairs (18 in total), datasets from different domains (news, IT and medical), and different kinds of judgments: relative ranking (RR), direct assessment (DA) and HUME manual semantic judgments. Finally, the generation of a large number of hybrid systems was trialed.
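As a rough illustration of the two evaluation views named in the description, the Python sketch below computes a system-level Pearson correlation and a segment-level Kendall-tau-like agreement. The system names, scores and judgment pairs are invented placeholders, not WMT16 data, and the official task used its own released scripts with a slightly different treatment of ties; this is only a minimal sketch of the idea.

from scipy.stats import pearsonr

# System-level view: correlate a metric's per-system scores with the human
# assessment of the same systems (all numbers below are hypothetical).
human_scores  = {"sysA": 0.71, "sysB": 0.64, "sysC": 0.58}
metric_scores = {"sysA": 34.2, "sysB": 31.9, "sysC": 30.1}

systems = sorted(human_scores)
r, _ = pearsonr([human_scores[s] for s in systems],
                [metric_scores[s] for s in systems])
print(f"system-level Pearson r = {r:.3f}")

# Segment-level view: for each human relative-ranking (RR) judgment, check
# whether the metric scores the human-preferred translation higher than the
# other one. Each tuple: (metric score of preferred, metric score of other).
pairs = [(0.62, 0.55), (0.40, 0.47), (0.71, 0.70)]
concordant = sum(a > b for a, b in pairs)
discordant = sum(a < b for a, b in pairs)
tau_like = (concordant - discordant) / (concordant + discordant)
print(f"segment-level Kendall-tau-like agreement = {tau_like:.3f}")

In both views, values closer to 1 indicate closer agreement between the metric and the human judgments.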
Title in English
Results of the WMT16 Metrics Shared Task
Description in English
This paper presents the results of the WMT16 Metrics Shared Task. We asked participants in this task to score the outputs of the MT systems involved in the WMT16 Shared Translation Task. We collected scores of 16 metrics from 9 research groups. In addition, we computed scores of 9 standard metrics (BLEU, SentBLEU, NIST, WER, PER, TER and CDER) as baselines. The collected scores were evaluated in terms of system-level correlation (how well each metric's scores correlate with the WMT16 official manual ranking of systems) and in terms of segment-level correlation (how often a metric agrees with humans in comparing two translations of a particular sentence). This year there were several additions to the setup: a large number of language pairs (18 in total), datasets from different domains (news, IT and medical), and different kinds of judgments: relative ranking (RR), direct assessment (DA) and HUME manual semantic judgments. Finally, the generation of a large number of hybrid systems was trialed.
Classification
Result type
D - Article in conference proceedings
CEP field
IN - Informatics
OECD FORD field
—
Result linkages
Project
—
Linkages
I - Institutional support for the long-term conceptual development of a research organisation
Other
Year of implementation
2016
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Proceedings title
Proceedings of the First Conference on Machine Translation (WMT). Volume 2: Shared Task Papers
ISBN
978-1-945626-10-4
ISSN
—
e-ISSN
—
Number of pages
33
Pages from-to
199-231
Publisher name
Association for Computational Linguistics
Place of publication
Stroudsburg, PA, USA
Event venue
Berlin, Germany
Event date
11 August 2016
Event type by nationality
WRD - Worldwide event
UT WoS article code
—