XGLUE: A New Benchmark Datasetfor Cross-lingual Pre-training, Understanding and Generation
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F20%3A10426955" target="_blank" >RIV/00216208:11320/20:10426955 - isvavai.cz</a>
Result on the web
<a href="https://www.aclweb.org/anthology/2020.emnlp-main.484" target="_blank" >https://www.aclweb.org/anthology/2020.emnlp-main.484</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
angličtina
Original language name
XGLUE: A New Benchmark Datasetfor Cross-lingual Pre-training, Understanding and Generation
Original language description
In this paper, we introduce XGLUE, a new benchmark dataset to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora, and evaluate their performance across a diverse set of cross-lingual tasks. Comparing to GLUE (Wang et al.,2019), which is labeled in English and includes natural language understanding tasks only, XGLUE has three main advantages: (1) it provides two corpora with different sizes for cross-lingual pre-training; (2) it provides 11 diversified tasks that cover both natural language understanding and generation scenarios; (3) for each task, it provides labeled data in multiple languages. We extend a recent cross-lingual pre-trained model Unicoder (Huang et al., 2019) to cover both understanding and generation tasks, which is evaluated on XGLUE as a strong baseline. We also evaluate the base versions (12-layer) of Multilingual BERT, XLM and XLM-R for comparison.
Czech name
—
Czech description
—
Classification
Type
O - Miscellaneous
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2020
Confidentiality
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů