Linguistic Injustice in Multilingual Technologies: The TenTen Corpus Family as a Case Study*

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3AMYFXI8YM" target="_blank" >RIV/00216208:11320/23:MYFXI8YM - isvavai.cz</a>
Result on the web
<a href="https://www.taylorfrancis.com/chapters/edit/10.4324/9781003393696-12/linguistic-injustice-multilingual-technologies-david-bordonaba-plou-laila-jreis-navarro" target="_blank" >https://www.taylorfrancis.com/chapters/edit/10.4324/9781003393696-12/linguistic-injustice-multilingual-technologies-david-bordonaba-plou-laila-jreis-navarro</a>
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Title in the original language
Linguistic Injustice in Multilingual Technologies: The TenTen Corpus Family as a Case Study*
Result description in the original language
"The aim of this work is twofold. First, to distinguish a phenomenon that produces a new type of linguistic injustice, “the paradox of Anglocentric multilingualism.” This paradox arises when a multilingual philosophy is pursued in constructing complex systems of analysis in the digital environment; however, these systems imply advantages in the study of English over other languages. The injustice derives from a poor level of precision in the output of a technology when analyzing non-English languages. Second, to contend that multilingual DH should deal with the deficiencies of tools’ performance, in addition to those of the language resources, because this disadvantage makes it difficult for any cross-linguistic study to provide reliable empirical data in (dis)proving linguistic intuitions. To illustrate some of the potential problems derived from the paradox, this work will detail the difficulties we have faced in a cross-linguistic study on color terms, when using the Arabic corpus arTenTen and the Spanish corpus esTenTen in Sketch Engine. We will study the different performances of the tool in Arabic and Spanish, compared to English, to signal the weaknesses of this tool in a multilingual arena, enabling its improvement and enriching the critical and inclusive framework of multilingual DH."
Title in English
Linguistic Injustice in Multilingual Technologies: The TenTen Corpus Family as a Case Study*
Result description in English
"The aim of this work is twofold. First, to distinguish a phenomenon that produces a new type of linguistic injustice, “the paradox of Anglocentric multilingualism.” This paradox arises when a multilingual philosophy is pursued in constructing complex systems of analysis in the digital environment; however, these systems imply advantages in the study of English over other languages. The injustice derives from a poor level of precision in the output of a technology when analyzing non-English languages. Second, to contend that multilingual DH should deal with the deficiencies of tools’ performance, in addition to those of the language resources, because this disadvantage makes it difficult for any cross-linguistic study to provide reliable empirical data in (dis)proving linguistic intuitions. To illustrate some of the potential problems derived from the paradox, this work will detail the difficulties we have faced in a cross-linguistic study on color terms, when using the Arabic corpus arTenTen and the Spanish corpus esTenTen in Sketch Engine. We will study the different performances of the tool in Arabic and Spanish, compared to English, to signal the weaknesses of this tool in a multilingual arena, enabling its improvement and enriching the critical and inclusive framework of multilingual DH."

Classification

Type
C - Chapter in a scholarly book
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

Project
—
Linkages
—

Other

Year of implementation
2023
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Title of the book or proceedings
"Multilingual Digital Humanities"
ISBN
978-1-00-339369-6
Number of pages of the result
16
Pages from-to
1-16
Number of pages of the book
264
Publisher name
Routledge
Place of publication
—
UT WoS code of the chapter
—

Similar results (10)

Probing Multilingual Sentence Representations With X-Probe
Multi-document multilingual summarization corpus preparation, Part 2: Czech, Hebrew and Spanish
Typological Challenges for the Application of Multilingual Language Models in the Digital Humanities
