
How Linguistically Fair Are Multilingual Pre-Trained Language Models?

Result identifiers

  • Result code in IS VaVaI

    RIV/00216208:11320/21:10440953 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F21%3A10440953)

  • Result on the web

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    English

  • Title in original language

    How Linguistically Fair Are Multilingual Pre-Trained Language Models?

  • Result description in original language

    Massively multilingual pre-trained language models, such as mBERT and XLM-RoBERTa, have received significant attention in the recent NLP literature for their excellent capability for cross-lingual zero-shot transfer of NLP tasks. This is especially promising because a large number of languages have no or very little labeled data for supervised learning. Moreover, a substantially improved performance on low-resource languages without any significant degradation of accuracy for high-resource languages leads us to believe that these models will help attain a fairer distribution of language technologies, despite the prevalent unfair and extremely skewed distribution of resources across the world's languages. Nevertheless, these models, and the experimental approaches adopted by researchers to arrive at them, have been criticised by some for lacking a nuanced and thorough comparison of benefits across languages and tasks. A related and important question that has received little attention is how to choose from a set of models when no single model significantly outperforms the others on all tasks and languages. As we discuss in this paper, this is often the case, and the choices are usually made without a clear articulation of reasons or underlying fairness assumptions. In this work, we scrutinize the choices made in previous work and propose a few different strategies for fair and efficient model selection based on the principles of fairness in economics and social choice theory. In particular, we emphasize Rawlsian fairness, which provides an appropriate framework for making fair choices (with respect to languages, tasks, or both) while selecting multilingual pre-trained language models for a practical or scientific set-up.

  • Title in English

    How Linguistically Fair Are Multilingual Pre-Trained Language Models?

  • Result description in English

    Massively multilingual pre-trained language models, such as mBERT and XLM-RoBERTa, have received significant attention in the recent NLP literature for their excellent capability for cross-lingual zero-shot transfer of NLP tasks. This is especially promising because a large number of languages have no or very little labeled data for supervised learning. Moreover, a substantially improved performance on low-resource languages without any significant degradation of accuracy for high-resource languages leads us to believe that these models will help attain a fairer distribution of language technologies, despite the prevalent unfair and extremely skewed distribution of resources across the world's languages. Nevertheless, these models, and the experimental approaches adopted by researchers to arrive at them, have been criticised by some for lacking a nuanced and thorough comparison of benefits across languages and tasks. A related and important question that has received little attention is how to choose from a set of models when no single model significantly outperforms the others on all tasks and languages. As we discuss in this paper, this is often the case, and the choices are usually made without a clear articulation of reasons or underlying fairness assumptions. In this work, we scrutinize the choices made in previous work and propose a few different strategies for fair and efficient model selection based on the principles of fairness in economics and social choice theory. In particular, we emphasize Rawlsian fairness, which provides an appropriate framework for making fair choices (with respect to languages, tasks, or both) while selecting multilingual pre-trained language models for a practical or scientific set-up.
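
As a rough illustration of the Rawlsian (maximin) model-selection strategy the abstract refers to, the minimal Python sketch below picks the model whose worst-performing language fares best. The model names come from the abstract, but the per-language accuracy figures and the rawlsian_choice helper are hypothetical placeholders, not the paper's actual experiments or code.

    # Rawlsian (maximin) model selection: choose the model that maximizes
    # its minimum per-language score.
    # The accuracy figures below are hypothetical, not results from the paper.

    scores = {
        "mBERT":       {"en": 0.91, "hi": 0.62, "sw": 0.55},
        "XLM-RoBERTa": {"en": 0.93, "hi": 0.70, "sw": 0.51},
    }

    def rawlsian_choice(scores: dict) -> str:
        """Return the model whose worst per-language score is highest."""
        return max(scores, key=lambda model: min(scores[model].values()))

    print(rawlsian_choice(scores))
    # -> "mBERT": its worst language (sw, 0.55) beats XLM-RoBERTa's (sw, 0.51),
    # even though XLM-RoBERTa has the higher average accuracy.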

Classification

  • Type

    D - Article in proceedings

  • CEP field

  • OECD FORD field

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

  • Project

  • Linkages

Others

  • Year of implementation

    2021

  • Data confidentiality code

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

  • Proceedings title

    Thirty-Fifth AAAI Conference on Artificial Intelligence, Thirty-Third Conference on Innovative Applications of Artificial Intelligence and the Eleventh Symposium on Educational Advances in Artificial Intelligence

  • ISBN

    978-1-57735-866-4

  • ISSN

    2159-5399

  • e-ISSN

    2374-3468

  • Number of pages

    9

  • Pages from-to

    12710-12718

  • Publisher name

    Association for the Advancement of Artificial Intelligence

  • Place of publication

    Palo Alto

  • Event venue

    online

  • Event date

    Feb 2, 2021

  • Event type by nationality

    WRD - Worldwide event

  • UT WoS code of the article

    000681269804043