Cross-linguistic variations in syntactic complexity: insights from a multilingual parallel corpus

The result's identifiers

Result code in IS VaVaI
RIV/00216208:11210/24:10489058 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F24%3A10489058)
Result on the web
https://jakobson.korpus.cz/~rosen/public/2024_Singapur.pdf
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Original language name
Cross-linguistic variations in syntactic complexity: insights from a multilingual parallel corpus
Original language description
The study examines cross-linguistic variation in syntactic complexity (SC). Using InterCorp, a multilingual parallel corpus in which 49 languages are annotated with Universal Dependencies (UD) and 6 syntactic complexity measures (SCMs), we analyse parallel texts both cross-linguistically and intra-linguistically (across text types). The pilot study of 12 languages has shown that: (i) Both language and text type influence the syntactic complexity of texts and sentences, but text type shows a larger effect size than language (in our sample). This highlights the importance of distinguishing text types in cross-linguistic analyses. (ii) Within fiction, languages cluster differently according to different SCMs (sLength and subRatio). The differences stem from structural variation between languages, but also, in Japanese, partly from the specifics of the UD annotation (tokenization). In fiction, languages show a high degree of variation, due to the stylistic diversity of the analyzed texts. (iii) A strong correlation was observed between the two NP measures (maxNPDepth and maxNPLength) and between the two clausal measures (maxTreeDepth and subRatio). Most of the languages in the sample show a similar PCA pattern, but in other languages, patterns vary both cross-linguistically and intra-linguistically (text type variation).
Czech name
—
Czech description
—

Classification

Type
O - Miscellaneous
CEP classification
—
OECD FORD branch
60203 - Linguistics

Result continuities

Project
LM2023044: Czech National Corpus
Continuities
P - Research and development project financed from public funds (with a link to CEP)
I - Institutional support for the long-term conceptual development of a research organisation

Others

Publication year
2024
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations

Similar results(10)

The InterCorp corpus, release 16ud
The InterCorp corpus, release 13ud
InterCorp multilingual parallel corpus and its Croatian component
