Language complexity across sub-styles and genres in legal Russian
Identifikátory výsledku
Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3AUZ62N7J4" target="_blank" >RIV/00216208:11320/23:UZ62N7J4 - isvavai.cz</a>
Výsledek na webu
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85169335305&doi=10.18413%2f2313-8912-2023-9-2-0-5&partnerID=40&md5=c60cdc2650a34264b3f78e4d84b9783b" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85169335305&doi=10.18413%2f2313-8912-2023-9-2-0-5&partnerID=40&md5=c60cdc2650a34264b3f78e4d84b9783b</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.18413/2313-8912-2023-9-2-0-5" target="_blank" >10.18413/2313-8912-2023-9-2-0-5</a>
Alternativní jazyky
Jazyk výsledku
angličtina
Název v původním jazyce
Language complexity across sub-styles and genres in legal Russian
Popis výsledku v původním jazyce
"The purpose of the paper is to find out the differences in linguistic complexity between legal documents, opposed by domain, sub-style and genre. The authors explore the large and diverse corpus of Russian legal texts and compare (1) international documents and documents of national law, (2) documents of the three sub-styles (administrative, legislative and justiciary), and (3) texts of different genres within sub-styles. To obtain complexity scores, an automatic model is used whose modules are capable of predicting complexity either by using the fine-tuned ruBERT model, or by using 133 language metrics, or in a hybrid way. The paper analyzes a dataset consisting of 43,804 documents, 118,768,028 words. National law documents are classified into three sub-styles. In addition, each document is characterized according to the genre and to the issuing body. Thus, 68 genres were identified. All documents were assigned complexity scores ranging from “0” to “12”. The vast majority of all documents were scored as maximally complex. The hybrid model assigned a complexity level of “12” to 97.1% of administrative sub-style documents, 94.5% of legislative sub-style documents, and 99.7% of judicial sub-style documents of national law. For all international law documents, the proportion of documents with a level of complexity of “12” is 94.1%. The set of legislative sub-style texts is the most varied in complexity. On average, the most complex documents in the dataset are of justiciary sub-style ones. Linguistic features successfully contrast international and national documents, as well as legislative and justiciary sub-styles. When comparing documents by genre, the authors interpreted only the average values of the 22 syntactic metrics. In general, a comparison of the genre-based document groups showed that it is not the genre itself that may be decisive for the complexity score, but the issuing body. © 2023 Belgorod State National Research University. All rights reserved."
Název v anglickém jazyce
Language complexity across sub-styles and genres in legal Russian
Popis výsledku anglicky
"The purpose of the paper is to find out the differences in linguistic complexity between legal documents, opposed by domain, sub-style and genre. The authors explore the large and diverse corpus of Russian legal texts and compare (1) international documents and documents of national law, (2) documents of the three sub-styles (administrative, legislative and justiciary), and (3) texts of different genres within sub-styles. To obtain complexity scores, an automatic model is used whose modules are capable of predicting complexity either by using the fine-tuned ruBERT model, or by using 133 language metrics, or in a hybrid way. The paper analyzes a dataset consisting of 43,804 documents, 118,768,028 words. National law documents are classified into three sub-styles. In addition, each document is characterized according to the genre and to the issuing body. Thus, 68 genres were identified. All documents were assigned complexity scores ranging from “0” to “12”. The vast majority of all documents were scored as maximally complex. The hybrid model assigned a complexity level of “12” to 97.1% of administrative sub-style documents, 94.5% of legislative sub-style documents, and 99.7% of judicial sub-style documents of national law. For all international law documents, the proportion of documents with a level of complexity of “12” is 94.1%. The set of legislative sub-style texts is the most varied in complexity. On average, the most complex documents in the dataset are of justiciary sub-style ones. Linguistic features successfully contrast international and national documents, as well as legislative and justiciary sub-styles. When comparing documents by genre, the authors interpreted only the average values of the 22 syntactic metrics. In general, a comparison of the genre-based document groups showed that it is not the genre itself that may be decisive for the complexity score, but the issuing body. © 2023 Belgorod State National Research University. All rights reserved."
Klasifikace
Druh
J<sub>SC</sub> - Článek v periodiku v databázi SCOPUS
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Návaznosti výsledku
Projekt
—
Návaznosti
—
Ostatní
Rok uplatnění
2023
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Údaje specifické pro druh výsledku
Název periodika
"Research Result. Theoretical and Applied Linguistics"
ISSN
2313-8912
e-ISSN
—
Svazek periodika
9
Číslo periodika v rámci svazku
2
Stát vydavatele periodika
US - Spojené státy americké
Počet stran výsledku
24
Strana od-do
73-96
Kód UT WoS článku
—
EID výsledku v databázi Scopus
2-s2.0-85169335305