Morphosyntactic probing of multilingual BERT models
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3ACPTG9Z2P" target="_blank" >RIV/00216208:11320/23:CPTG9Z2P - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85161321791&doi=10.1017%2fS1351324923000190&partnerID=40&md5=92981f23f267b885a8052ce234546706" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85161321791&doi=10.1017%2fS1351324923000190&partnerID=40&md5=92981f23f267b885a8052ce234546706</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1017/s1351324923000190" target="_blank" >10.1017/s1351324923000190</a>
Alternative languages
Result language
English
Original language name
Morphosyntactic probing of multilingual BERT models
Original language description
"We introduce an extensive dataset for multilingual probing of morphological information in language models (247 tasks across 42 languages from 10 families), each consisting of a sentence with a target word and a morphological tag as the desired label, derived from the Universal Dependencies treebanks. We find that pre-trained Transformer models (mBERT and XLM-RoBERTa) learn features that attain strong performance across these tasks. We then apply two methods to locate, for each probing task, where the disambiguating information resides in the input. The first is a new perturbation method that masks various parts of context; the second is the classical method of Shapley values. The most intriguing finding that emerges is a strong tendency for the preceding context to hold more information relevant to the prediction than the following context. © The Author(s), 2023. Published by Cambridge University Press."
Czech name
—
Czech description
—
Classification
Type
J<sub>SC</sub> - Article in a specialist periodical, which is included in the SCOPUS database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2023
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
"Natural Language Engineering"
ISSN
1351-3249
e-ISSN
—
Volume of the periodical
1
Issue of the periodical within the volume
1
Country of publishing house
US - UNITED STATES
Number of pages
40
Pages from-to
1-40
UT code for WoS article
001007784400001
EID of the result in the Scopus database
2-s2.0-85161321791