Exploring the Relationship between Alignment and Cross-lingual Transfer in Multilingual Transformers
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/23:XPPSPEE9 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3AXPPSPEE9)
Result on the web
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85175463133&partnerID=40&md5=cd5ea22c37e7c425a68cbb45417daa8b
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Exploring the Relationship between Alignment and Cross-lingual Transfer in Multilingual Transformers
Original language description
"Without any explicit cross-lingual training data, multilingual language models can achieve cross-lingual transfer. One common way to improve this transfer is to perform realignment steps before fine-tuning, i.e., to train the model to build similar representations for pairs of words from translated sentences. But such realignment methods were found to not always improve results across languages and tasks, which raises the question of whether aligned representations are truly beneficial for cross-lingual transfer. We provide evidence that alignment is actually significantly correlated with cross-lingual transfer across languages, models and random seeds. We show that fine-tuning can have a significant impact on alignment, depending mainly on the downstream task and the model. Finally, we show that realignment can, in some instances, improve cross-lingual transfer, and we identify conditions in which realignment methods provide significant improvements. Namely, we find that realignment works better on tasks for which alignment is correlated with cross-lingual transfer when generalizing to a distant language and with smaller models, as well as when using a bilingual dictionary rather than FastAlign to extract realignment pairs. For example, for POS-tagging, between English and Arabic, realignment can bring a +15.8 accuracy improvement on distilmBERT, even outperforming XLM-R Large by 1.7. We thus advocate for further research on realignment methods for smaller multilingual models as an alternative to scaling. © 2023 Association for Computational Linguistics."
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2023
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
"Proc. Annu. Meet. Assoc. Comput. Linguist."
ISBN
978-195942962-3
ISSN
0736-587X
e-ISSN
—
Number of pages
23
Pages from-to
3020-3042
Publisher name
Association for Computational Linguistics (ACL)
Place of publication
—
Event location
Melaka, Malaysia
Event date
Jan 1, 2023
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—