Multilingual ELMo and the Effects of Corpus Sampling
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/21:10442289 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F21%3A10442289)
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Multilingual ELMo and the Effects of Corpus Sampling
Original language description
Multilingual pretrained language models are rapidly gaining popularity in NLP systems for non-English languages. Most of these models feature an important corpus sampling step when accumulating training data in different languages, to ensure that the signal from better-resourced languages does not drown out poorly resourced ones. In this study, we train multiple multilingual recurrent language models based on the ELMo architecture and analyse both the effect of varying corpus size ratios on downstream performance and the performance difference between monolingual models for each language and broader multilingual language models. As part of this effort, we also make the trained models available for public use.
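The corpus sampling step mentioned in the description is, in many multilingual pretraining pipelines, implemented as exponential smoothing of per-language corpus sizes (sampling probability proportional to the corpus size raised to a power alpha < 1). The Python sketch below illustrates that general idea only; the exponent, the language codes, and the token counts are illustrative assumptions and are not taken from this publication.

# Illustrative exponentially smoothed corpus sampling: p_i ∝ n_i ** alpha.
# The alpha value and the corpus sizes below are assumptions for demonstration,
# not the scheme or data reported in the paper.

def sampling_probabilities(corpus_sizes, alpha=0.7):
    """Return per-language sampling probabilities from raw corpus sizes.

    alpha < 1 upweights low-resource languages relative to their raw share;
    alpha = 1 reproduces proportional sampling, alpha = 0 gives uniform sampling.
    """
    smoothed = {lang: size ** alpha for lang, size in corpus_sizes.items()}
    total = sum(smoothed.values())
    return {lang: value / total for lang, value in smoothed.items()}


if __name__ == "__main__":
    # Hypothetical token counts, chosen only to show the rebalancing effect.
    sizes = {"high_resource": 1_000_000_000, "mid_resource": 50_000_000, "low_resource": 5_000_000}
    for lang, p in sampling_probabilities(sizes).items():
        print(f"{lang}: {p:.3f}")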
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2021
Confidentiality
S - Complete and true data about the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
ISBN
978-91-7929-614-8
ISSN
—
e-ISSN
—
Number of pages
7
Pages from-to
378-384
Publisher name
Linköping University Electronic Press
Place of publication
Linköping
Event location
Reykjavik
Event date
May 31, 2021
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—