What does the language system look like in pre-trained language models? A study using complex networks
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AGT6VV3NW" target="_blank" >RIV/00216208:11320/25:GT6VV3NW - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85194716679&doi=10.1016%2fj.knosys.2024.111984&partnerID=40&md5=ec2ec53d32e48fc00652690f12e63b82" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85194716679&doi=10.1016%2fj.knosys.2024.111984&partnerID=40&md5=ec2ec53d32e48fc00652690f12e63b82</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.knosys.2024.111984" target="_blank" >10.1016/j.knosys.2024.111984</a>
Alternative languages
Result language
English
Original language name
What does the language system look like in pre-trained language models? A study using complex networks
Original language description
Pre-trained language models (PLMs) have advanced the field of natural language processing (NLP). The exceptional capabilities exhibited by PLMs in NLP tasks have attracted researchers to explore the underlying factors responsible for their success. However, most work focuses on specific linguistic knowledge encoded in PLMs rather than investigating how these models comprehend language from a holistic perspective, and it therefore cannot explain how PLMs organize the language system as a whole. We adopt a complex-network approach to represent the language system and investigate how language elements are organized within it. Specifically, we take as the research object the attention relationships among words generated by attention heads within BERT models. Words are treated as nodes, and the connections between words and their most-attended words are represented as edges. After obtaining these "words' attention networks", we analyze their properties from various perspectives by calculating network metrics. Several constructive conclusions are summarized, including: (1) the English attention networks demonstrate exceptional performance in organizing words; (2) most words' attention networks exhibit the small-world property and scale-free behavior; (3) some networks generated by multilingual BERT reflect typological information well, achieving preferable clustering performance among language groups; (4) in cross-layer analysis, the networks from layers 8 to 10 in Chinese BERT and layers 6 to 9 in English BERT exhibit more consistent characteristics. Our study provides a comprehensive explanation of how PLMs organize language systems, which can be utilized to evaluate and develop improved models. © 2024 Elsevier B.V.
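As an illustration of the approach described above, the following is a minimal sketch (not the authors' code) of how a "words' attention network" could be built from one BERT attention head and measured with standard network metrics, assuming the Hugging Face transformers and networkx libraries; the model name, example sentence, and layer/head choice are arbitrary assumptions.

# Illustrative sketch: build a words' attention network from one BERT head
# and compute basic complex-network metrics. Model, sentence, and the
# layer/head indices below are assumptions, not the paper's settings.
import torch
import networkx as nx
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

sentence = "Pre-trained language models organize words into a structured system."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

layer, head = 8, 0                            # example choice of layer and head
attn = outputs.attentions[layer][0, head]     # (seq_len, seq_len) attention weights
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Each token becomes a node; an edge links it to the token it attends to most.
G = nx.Graph()
G.add_nodes_from(range(len(tokens)))
for i in range(len(tokens)):
    j = int(attn[i].argmax())
    if i != j:
        G.add_edge(i, j)

# Metrics of the kind used to probe small-world / scale-free structure.
print("average clustering coefficient:", nx.average_clustering(G))
if nx.is_connected(G):
    print("average shortest path length:", nx.average_shortest_path_length(G))
print("degree sequence:", sorted((d for _, d in G.degree()), reverse=True))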
Czech name
—
Czech description
—
Classification
Type
J_SC - Article in a specialist periodical, which is included in the SCOPUS database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2024
Confidentiality
S - Complete and true data about the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Knowledge-Based Systems
ISSN
0950-7051
e-ISSN
—
Volume of the periodical
299
Issue of the periodical within the volume
2024
Country of publishing house
US - UNITED STATES
Number of pages
11
Pages from-to
1-11
UT code for WoS article
—
EID of the result in the Scopus database
2-s2.0-85194716679