High Quality ELMo Embeddings for Seven Less-Resourced Languages
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F19%3A10427042" target="_blank" >RIV/00216208:11320/19:10427042 - isvavai.cz</a>
Result on the web
<a href="https://www.aclweb.org/anthology/2020.lrec-1.582" target="_blank" >https://www.aclweb.org/anthology/2020.lrec-1.582</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
High Quality ELMo Embeddings for Seven Less-Resourced Languages
Original language description
Recent results show that deep neural networks using contextual embeddings significantly outperform those using non-contextual embeddings on a majority of text classification tasks. We offer precomputed embeddings from the popular contextual ELMo model for seven languages: Croatian, Estonian, Finnish, Latvian, Lithuanian, Slovenian, and Swedish. We demonstrate that the quality of embeddings strongly depends on the size of the training set and show that the existing publicly available ELMo embeddings for the listed languages should be improved. We train new ELMo embeddings on much larger training sets and show their advantage over baseline non-contextual FastText embeddings. In the evaluation, we use two benchmarks: the analogy task and the NER task.
Czech name
—
Czech description
—
Classification
Type
O - Miscellaneous
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2019
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations