Modeling of complexity in Czech literary texts

Project name in Czech
Modelování komplexity českých literárních textů
Annotation (translated from Czech)
• Create a Czech dataset modeled on the Hamburg concept of intelligibility;
• Compare the correlations between the comprehension test, the probands' subjective ratings of clarity, and the Hamburg criteria on the one hand, and individual text-clarity metrics on the other;
• Adapt one of the metrics to Czech, based on the correlations found;
• Conduct a stylometric experiment with Czech texts and compare its success rate without and with clarity taken into account; do the same for selected Slavic languages and English (using the original metrics).
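The planned comparison of comprehension-test results with individual text-clarity metrics is, at its core, a correlation analysis. Below is a minimal Python sketch under stated assumptions: average sentence length stands in for a clarity metric, the helper names `mean_sentence_length` and `pearson` are hypothetical, and the texts and comprehension scores are invented for illustration. None of them are taken from the project's metrics or data.

```python
import math
import re


def mean_sentence_length(text: str) -> float:
    """Average number of words per sentence, a common readability-formula ingredient.

    Hypothetical helper: sentences are split naively on ., ! or ? followed by
    whitespace or end of text; words are runs of Unicode word characters, so
    Czech letters with diacritics (š, ě, ...) stay inside words.
    """
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    words = re.findall(r"\w+", text)
    return len(words) / len(sentences) if sentences else 0.0


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Made-up example texts and made-up comprehension scores (share of correct
# answers); these are NOT from the project's dataset.
texts = [
    "Pes štěká. Kočka spí.",
    "Starý pes, který celý den hlídal dvůr, večer konečně usnul u kamen.",
    "Ráno pršelo. Odpoledne svítilo slunce a děti si hrály venku.",
]
comprehension = [0.9, 0.4, 0.7]

metric = [mean_sentence_length(t) for t in texts]
r = pearson(metric, comprehension)
print(metric)          # [2.0, 12.0, 5.0]
print(f"r = {r:.2f}")  # r = -0.99
```

With these toy numbers the coefficient is negative, i.e. texts with longer sentences score lower on comprehension. With real ratings on ordinal scales, a rank correlation such as Spearman's may be the more appropriate choice.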

R&D category
ZV - Basic research
OECD FORD - main branch
60203 - Linguistics
OECD FORD - secondary branch
60201 - General language studies
OECD FORD - another secondary branch
—
CEP - equivalent branches (according to the converter: http://www.vyzkum.cz/storage/att/E6EF7938F0E854BAE520AC119FB22E8D/Prevodnik_oboru_Frascati.pdf)
AI - Linguistics

Provider evaluation
V - Excellent project results (with international significance, etc.)
Project results evaluation
A dataset of 91 paraphrased Czech literary texts was created; for 32 of these texts, reading comprehension was measured on a large sample of probands. In addition, multiple experts annotated the texts according to the so-called Hamburg concept of intelligibility (Hamburger Verständlichkeitskonzept). The annotation subjectively rates various aspects of a text that demonstrably influence its readability. The data are documented in detail and available under the CC-BY license in the LINDAT/CLARIAH-CZ domain repository under the permanent link http://hdl.handle.net/11234/1-4610.
A software library for the R programming language called tidystopwords was created, an auxiliary library for natural language processing and text mining. The library is freely available in the R repository (CRAN) under the permanent link https://CRAN.R-project.org/package=tidystopwords.
We have published a study that uses a stylometric library to quantify how stylistic transcription affects the author's stylistic signal.

Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data delivery code
CEP22-MSM-LT-U
Data delivery date
Jun 30, 2022
