On the Language Neutrality of Pre-trained Multilingual Representations

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F20%3A10424472" target="_blank" >RIV/00216208:11320/20:10424472 - isvavai.cz</a>
Result on the web
<a href="https://www.aclweb.org/anthology/2020.findings-emnlp.150/" target="_blank" >https://www.aclweb.org/anthology/2020.findings-emnlp.150/</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.18653/v1/2020.findings-emnlp.150" target="_blank" >10.18653/v1/2020.findings-emnlp.150</a>

Alternative languages

Result language
English
Title in original language
On the Language Neutrality of Pre-trained Multilingual Representations
Result description in original language
Multilingual contextual embeddings, such as multilingual BERT (mBERT) and XLM-RoBERTa, have proved useful for many multilingual tasks. Previous work probed the cross-linguality of the representations indirectly, using zero-shot transfer learning on morphological and syntactic tasks. We instead focus on the language neutrality of mBERT with respect to lexical semantics. Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings, which are explicitly trained for language neutrality. By default, contextual embeddings are still only moderately language-neutral; however, we show two simple methods for achieving stronger language neutrality: first, unsupervised centering of the representations for each language, and second, fitting an explicit projection on small parallel data. In addition, we show how to reach state-of-the-art accuracy on language identification and word alignment in parallel sentences.
Title in English
On the Language Neutrality of Pre-trained Multilingual Representations
Result description in English
Multilingual contextual embeddings, such as multilingual BERT (mBERT) and XLM-RoBERTa, have proved useful for many multilingual tasks. Previous work probed the cross-linguality of the representations indirectly, using zero-shot transfer learning on morphological and syntactic tasks. We instead focus on the language neutrality of mBERT with respect to lexical semantics. Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings, which are explicitly trained for language neutrality. By default, contextual embeddings are still only moderately language-neutral; however, we show two simple methods for achieving stronger language neutrality: first, unsupervised centering of the representations for each language, and second, fitting an explicit projection on small parallel data. In addition, we show how to reach state-of-the-art accuracy on language identification and word alignment in parallel sentences.
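
The first method named in the description, unsupervised centering, can be illustrated with a minimal sketch. It assumes that centering means subtracting each language's mean vector from all vectors of that language; the record itself does not spell out the procedure. The function name `center_by_language` and the toy two-dimensional vectors are hypothetical and stand in for real mBERT embeddings.

```python
from collections import defaultdict


def center_by_language(vectors, languages):
    """Subtract the per-language mean vector from each vector.

    Assumption: "centering" is read here as removing the language
    centroid. `vectors` is a list of equal-length float lists and
    `languages` holds one language label per vector.
    """
    sums = {}
    counts = defaultdict(int)
    for vec, lang in zip(vectors, languages):
        if lang not in sums:
            sums[lang] = [0.0] * len(vec)
        # Accumulate the component-wise sum for this language.
        sums[lang] = [s + x for s, x in zip(sums[lang], vec)]
        counts[lang] += 1

    # Mean vector (centroid) of each language.
    centroids = {
        lang: [s / counts[lang] for s in total]
        for lang, total in sums.items()
    }

    # Shift every vector by the centroid of its own language.
    centered = [
        [x - c for x, c in zip(vec, centroids[lang])]
        for vec, lang in zip(vectors, languages)
    ]
    return centered, centroids


# Toy data (hypothetical): two "English" and two "Czech" vectors.
vectors = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [12.0, 14.0]]
languages = ["en", "cs", "en", "cs"]
languages = ["en", "en", "cs", "cs"]  # one label per vector, in order
centered, centroids = center_by_language(vectors, languages)
```

After centering, every language's vectors have a zero mean, so an offset that separates the languages as a whole no longer contributes to distances between vectors from different languages.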

Classification

Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
<a href="/cs/project/GA18-02196S" target="_blank" >GA18-02196S: Representation of linguistic structure in neural networks</a><br>
Continuities
P - Research and development project financed from public funds (with a link to CEP)

Others

Year of implementation
2020
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Article name in the proceedings
Findings of the Association for Computational Linguistics: EMNLP 2020
ISBN
978-1-952148-90-3
ISSN
—
e-ISSN
—
Number of pages
12
Pages from-to
1663-1674
Publisher name
Association for Computational Linguistics
Place of publication
Stroudsburg, PA, USA
Event location
Online
Event date
16 November 2020
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—

Similar results (10)

Combining Static and Contextualised Multilingual Embeddings
Cross-Lingual BERT Transformation for Zero-Shot Dependency Parsing
Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT