Unpacking lexical intertextuality: Vocabulary shared among texts

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F22%3A10452153" target="_blank" >RIV/00216208:11210/22:10452153 - isvavai.cz</a>
Result on the web
<a href="https://doi.org/10.1515/9783110763560-009" target="_blank" >https://doi.org/10.1515/9783110763560-009</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1515/9783110763560-009" target="_blank" >10.1515/9783110763560-009</a>

Alternative languages

Result language
English
Title in original language
Unpacking lexical intertextuality: Vocabulary shared among texts
Result description in original language
This paper focuses on lexical intertextuality, namely the following three intertextual properties: 1) the number of word-types shared by two texts; 2) the number of word-types shared by all texts in a collection; 3) the number of word-types shared by equal-sized segments of a collection. We have observed that the relation between the number of texts and the number of shared types follows a power law; similar behavior can be seen if text borders are disregarded and the corpus is artificially divided into equal-sized segments. The number of shared types is proportional to the size of these sequences. We developed baseline models for the number of shared types, i.e. models predicting the number of types shared by texts if all tokens were randomly shuffled and evenly spread among texts. The comparison between the empirical data and the baseline model can be used for contrastive purposes, to compare the number of shared types in corpora of different languages.
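The quantities named in the description can be illustrated with a minimal Python sketch. It is not the chapter's own method: the tokenization (lowercased regex word tokens), the function names `word_types`, `shared_types` and `shuffled_baseline`, and the use of a single random shuffle in place of the authors' baseline models are all assumptions made for illustration. It covers properties 1) and 2) and one draw of a shuffled baseline.

```python
import random
import re


def word_types(text):
    """Return the set of word-types in a text (assumed: lowercased regex word tokens)."""
    return set(re.findall(r"\w+", text.lower()))


def shared_types(texts):
    """Number of word-types that occur in every text of the collection."""
    type_sets = [word_types(t) for t in texts]
    return len(set.intersection(*type_sets)) if type_sets else 0


def shuffled_baseline(texts, seed=0):
    """Shared types after pooling all tokens, shuffling them, and dealing them
    out evenly into as many parts as there are texts (one random draw only)."""
    tokens = [tok for t in texts for tok in re.findall(r"\w+", t.lower())]
    random.Random(seed).shuffle(tokens)
    k = len(texts)
    # Dealing every k-th token to part i keeps part sizes within one token of each other.
    parts = [set(tokens[i::k]) for i in range(k)]
    return len(set.intersection(*parts)) if parts else 0


# Hypothetical toy collection, not data from the chapter.
texts = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "a cat and a dog",
]
print(shared_types(texts[:2]))   # types shared by two texts
print(shared_types(texts))       # types shared by all texts
print(shuffled_baseline(texts))  # one shuffled-baseline draw
```

Comparing `shared_types(texts)` with the average of `shuffled_baseline` over many seeds would give a rough empirical-versus-baseline contrast of the kind the description mentions.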

Classification

Type
C - Chapter in a scholarly book
CEP field
—
OECD FORD field
60203 - Linguistics

Result linkages

Project
—
Linkages
I - Institutional support for the long-term conceptual development of a research organisation

Other

Year of publication
2022
Data confidentiality code
S - Complete and accurate data on the project are not subject to protection under special legal regulations

Data specific to the result type

Title of the book or proceedings
Quantitative Approaches to Universality and Individuality in Language
ISBN
978-3-11-076356-0
Number of pages of the result
15
Pages from-to
101-115
Number of pages of the book
237
Publisher name
De Gruyter Mouton
Place of publication
Deutschland
UT WoS code of the chapter
—

Similar results (10)

Kvantitativní určení lexikálního jádra jazyka
Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe
Morphological Segmentation with Neural Networks: Performance Effects of Architecture, Data Size, and Cross-Lingual Transfer in Seven Languages
