Cross-linguistic variations in syntactic complexity: insights from a multilingual parallel corpus

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F24%3A10489058" target="_blank" >RIV/00216208:11210/24:10489058 - isvavai.cz</a>
Result on the web
<a href="https://jakobson.korpus.cz/~rosen/public/2024_Singapur.pdf" target="_blank" >https://jakobson.korpus.cz/~rosen/public/2024_Singapur.pdf</a>
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Title in the original language
Cross-linguistic variations in syntactic complexity: insights from a multilingual parallel corpus
Result description in the original language
The study examines cross-linguistic variation in syntactic complexity (SC). Using InterCorp, a multilingual parallel corpus in which 49 languages are annotated with Universal Dependencies (UD) and 6 syntactic complexity measures (SCMs), we analyse parallel texts both cross-linguistically and intra-linguistically (across text types). A pilot study of 12 languages has shown that: (i) Both language and text type influence the syntactic complexity of texts and sentences, but text type shows a larger effect size than language (in our sample). This highlights the importance of distinguishing text types in cross-linguistic analyses. (ii) Within fiction, languages cluster differently according to different SCMs (sLength and subRatio). The differences stem from structural variation between languages but also, in Japanese, partly from the specifics of the UD annotation (tokenization). In fiction, languages show a high degree of variation, due to the stylistic diversity of the analysed texts. (iii) A strong correlation was observed between the two NP measures (maxNPDepth and maxNPLength) and between the two clausal measures (maxTreeDepth and subRatio). Most of the languages in the sample show a similar PCA pattern, but in the remaining languages, patterns vary both cross-linguistically and intra-linguistically (text type variation).
Title in English
Cross-linguistic variations in syntactic complexity: insights from a multilingual parallel corpus
Result description in English
The study examines cross-linguistic variation in syntactic complexity (SC). Using InterCorp, a multilingual parallel corpus in which 49 languages are annotated with Universal Dependencies (UD) and 6 syntactic complexity measures (SCMs), we analyse parallel texts both cross-linguistically and intra-linguistically (across text types). A pilot study of 12 languages has shown that: (i) Both language and text type influence the syntactic complexity of texts and sentences, but text type shows a larger effect size than language (in our sample). This highlights the importance of distinguishing text types in cross-linguistic analyses. (ii) Within fiction, languages cluster differently according to different SCMs (sLength and subRatio). The differences stem from structural variation between languages but also, in Japanese, partly from the specifics of the UD annotation (tokenization). In fiction, languages show a high degree of variation, due to the stylistic diversity of the analysed texts. (iii) A strong correlation was observed between the two NP measures (maxNPDepth and maxNPLength) and between the two clausal measures (maxTreeDepth and subRatio). Most of the languages in the sample show a similar PCA pattern, but in the remaining languages, patterns vary both cross-linguistically and intra-linguistically (text type variation).

Classification

Type
O - Other results
CEP field
—
OECD FORD field
60203 - Linguistics

Result linkages

Project
<a href="/cs/project/LM2023044" target="_blank" >LM2023044: Český národní korpus</a><br>
Linkages
P - Research and development project financed from public funds (with a link to CEP)<br>I - Institutional support for the long-term conceptual development of a research organisation

Other

Year of implementation
2024
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Similar results (10)

Korpus InterCorp, verze 16ud
Korpus InterCorp, verze 13ud
InterCorp multilingual parallel corpus and its Croatian component
