Is Transformer-Based Attention Agnostic of the Pretraining Language and Task?
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/25:X7T7VVAC (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AX7T7VVAC)
Result on the web
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85200685211&doi=10.1007%2f978-3-031-64881-6_6&partnerID=40&md5=a62794440b7cf4cb3595f122ce95dac7
DOI - Digital Object Identifier
10.1007/978-3-031-64881-6_6 (http://dx.doi.org/10.1007/978-3-031-64881-6_6)
Alternative languages
Result language
English
Original language name
Is Transformer-Based Attention Agnostic of the Pretraining Language and Task?
Original language description
Since the introduction of the Transformer by Vaswani et al. in 2017, the attention mechanism has been used in multiple state-of-the-art large language models (LLMs), such as BERT, ELECTRA, and various GPT versions. Due to the complexity and the large size of LLMs and deep neural networks in general, intelligible explanations for specific model outputs can be difficult to formulate. However, mechanistic interpretability research aims to make this problem more tractable. In this paper, we show that models with different training objectives—namely, masked language modelling and replaced token detection—have similar internal patterns of attention, even when trained for different languages, in our case English, Afrikaans, Xhosa, and Zulu. This result suggests that, on a high level, the learnt role of attention is language-agnostic. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
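The description above compares internal attention patterns of models pretrained with masked language modelling (e.g. BERT) and replaced token detection (e.g. ELECTRA). The following is a minimal sketch, not the authors' code, of one way such a comparison can be set up, assuming the Hugging Face transformers library; the model names, the example sentence, and the layer-wise cosine-similarity measure are illustrative choices, and the paper's actual analysis may differ.

```python
# Hedged sketch: extract head-averaged attention maps from two models with
# different pretraining objectives and compare them layer by layer.
import torch
from transformers import AutoModel, AutoTokenizer

def mean_attention(model_name: str, text: str) -> torch.Tensor:
    """Return attention averaged over heads, shape (num_layers, seq, seq)."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_attentions=True)
    model.eval()
    with torch.no_grad():
        out = model(**tok(text, return_tensors="pt"))
    # out.attentions is a tuple of (batch, heads, seq, seq), one per layer
    return torch.stack([att[0].mean(dim=0) for att in out.attentions])

text = "The quick brown fox jumps over the lazy dog."
bert_att = mean_attention("bert-base-uncased", text)                      # masked LM
electra_att = mean_attention("google/electra-base-discriminator", text)   # replaced token detection

# Rough proxy comparison: cosine similarity of flattened attention maps.
# Note the two tokenizers may segment the text differently, so this simple
# truncation to a common length is only a crude alignment.
layers = min(bert_att.shape[0], electra_att.shape[0])
seq = min(bert_att.shape[1], electra_att.shape[1])
for layer in range(layers):
    a = bert_att[layer, :seq, :seq].flatten()
    b = electra_att[layer, :seq, :seq].flatten()
    sim = torch.nn.functional.cosine_similarity(a, b, dim=0)
    print(f"layer {layer:2d}: cosine similarity = {sim:.3f}")
```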
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2024
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Commun. Comput. Info. Sci.
ISBN
978-303164880-9
ISSN
1865-0929
e-ISSN
—
Number of pages
29
Pages from-to
95-123
Publisher name
Springer Science and Business Media Deutschland GmbH
Place of publication
—
Event location
Gqeberha
Event date
Jan 1, 2025
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—