
What does the language system look like in pre-trained language models? A study using complex networks

The result's identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AGT6VV3NW" target="_blank" >RIV/00216208:11320/25:GT6VV3NW - isvavai.cz</a>

  • Result on the web

    <a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85194716679&doi=10.1016%2fj.knosys.2024.111984&partnerID=40&md5=ec2ec53d32e48fc00652690f12e63b82" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85194716679&doi=10.1016%2fj.knosys.2024.111984&partnerID=40&md5=ec2ec53d32e48fc00652690f12e63b82</a>

  • DOI - Digital Object Identifier

    <a href="http://dx.doi.org/10.1016/j.knosys.2024.111984" target="_blank" >10.1016/j.knosys.2024.111984</a>

Alternative languages

  • Result language

    English

  • Original language name

    What does the language system look like in pre-trained language models? A study using complex networks

  • Original language description

    Pre-trained language models (PLMs) have advanced the field of natural language processing (NLP). The exceptional capabilities exhibited by PLMs in NLP tasks have attracted researchers to explore the underlying factors responsible for their success. However, most work focuses on specific pieces of linguistic knowledge encoded in PLMs, rather than investigating how these models comprehend language from a holistic perspective; consequently, it cannot explain how PLMs organize the language system as a whole. We therefore adopt a complex-network approach to represent the language system and investigate how language elements are organized within it. Specifically, we take as the research object the attention relationships among words generated by attention heads within BERT models. The words are treated as nodes, and the connections between words and their most-attending words are represented as edges. After obtaining these "words' attention networks", we analyze their properties from various perspectives by computing network metrics. Several constructive conclusions are summarized, including: (1) the English attention networks demonstrate exceptional performance in organizing words; (2) most words' attention networks exhibit the small-world property and scale-free behavior; (3) some networks generated by multilingual BERT reflect typological information well, achieving good clustering performance among language groups; (4) in cross-layer analysis, the networks from layers 8 to 10 in Chinese BERT and from layers 6 to 9 in English BERT exhibit more consistent characteristics. Our study provides a comprehensive explanation of how PLMs organize language systems, which can be used to evaluate and develop improved models. © 2024 Elsevier B.V. (A minimal code sketch of this network construction appears after this list.)

  • Czech name

  • Czech description
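
The description above outlines how the words' attention networks are built: attention weights from BERT heads define edges between word nodes, and standard network metrics (clustering, path length) are then computed. The sketch below illustrates one plausible reading of that construction; it is not the authors' code, and the model name, the choice of layer and head, the edge convention (each token linked to the token it attends to most strongly), and the metrics printed are illustrative assumptions.

    import networkx as nx
    import torch
    from transformers import BertModel, BertTokenizer

    # Illustrative model choice; the study examines BERT models for several languages.
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
    model.eval()

    sentence = "Pre-trained language models organize words into a language system."
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    layer, head = 8, 0                              # assumed layer/head indices
    attn = outputs.attentions[layer][0, head]       # (seq_len, seq_len) attention matrix
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

    # Words as nodes; an edge links each token to the token it attends to most
    # (self-attention excluded) -- one possible reading of the construction.
    G = nx.DiGraph()
    G.add_nodes_from(range(len(tokens)))
    for i in range(len(tokens)):
        weights = attn[i].clone()
        weights[i] = 0.0
        j = int(torch.argmax(weights))
        G.add_edge(i, j, weight=float(weights[j]))

    # Network metrics of the kind the study reports (clustering, path length).
    und = G.to_undirected()
    print("nodes:", und.number_of_nodes(), "edges:", und.number_of_edges())
    print("average clustering:", nx.average_clustering(und))
    if nx.is_connected(und):
        print("average shortest path length:", nx.average_shortest_path_length(und))

In the study such networks are built per attention head and layer and compared across languages; the small-world and scale-free properties mentioned in the abstract correspond, respectively, to high clustering with short path lengths and to a heavy-tailed degree distribution.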

Classification

  • Type

    J_SC - Article in a specialist periodical, which is included in the SCOPUS database

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

  • Continuities

Others

  • Publication year

    2024

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Name of the periodical

    Knowledge-Based Systems

  • ISSN

    0950-7051

  • e-ISSN

  • Volume of the periodical

    299

  • Issue of the periodical within the volume

    2024

  • Country of publishing house

    US - UNITED STATES

  • Number of pages

    11

  • Pages from-to

    1-11

  • UT code for WoS article

  • EID of the result in the Scopus database

    2-s2.0-85194716679