Unpacking lexical intertextuality: Vocabulary shared among texts

The result's identifiers

  • Result code in IS VaVaI

    RIV/00216208:11210/22:10452153 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F22%3A10452153)

  • Result on the web

    https://doi.org/10.1515/9783110763560-009

  • DOI - Digital Object Identifier

    10.1515/9783110763560-009 (http://dx.doi.org/10.1515/9783110763560-009)

Alternative languages

  • Result language

    English

  • Original language name

    Unpacking lexical intertextuality: Vocabulary shared among texts

  • Original language description

    This paper focuses on lexical intertextuality, namely the following three intertextual properties: 1) the number of word-types shared by two texts; 2) the number of word-types shared by all texts in a collection; 3) the number of word-types shared by equal-sized segments of a collection. We have observed that the relation between the number of texts and the number of shared types follows a power law; similar behavior can be seen if text borders are disregarded and the corpus is artificially divided into equal-sized segments. The number of shared types is proportional to the size of these sequences. We developed baseline models for the number of shared types, i.e. models predicting the number of types shared by texts if all tokens were randomly shuffled and evenly spread among texts. The comparison between the empirical data and the baseline model can be used for contrastive purposes, to compare the number of shared types in corpora of different languages.

  • Czech name

  • Czech description
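
    The three shared-type counts named in the description above, together with a shuffled baseline, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the chapter's method: texts are assumed to be already tokenized into lists of strings, all function names are hypothetical, and the baseline here is a simple simulation (shuffle all tokens, cut them into as many equal-sized parts as there are texts), whereas the chapter's baseline models may well be analytical.

```python
import random


def shared_by_pair(text_a, text_b):
    """Number of word-types that occur in both token lists."""
    return len(set(text_a) & set(text_b))


def shared_by_all(texts):
    """Number of word-types that occur in every text of the collection."""
    if not texts:
        return 0
    return len(set.intersection(*(set(t) for t in texts)))


def shared_by_segments(texts, segment_size):
    """Disregard text borders, cut the corpus into equal-sized segments,
    and count the word-types shared by all full segments.
    A trailing remainder shorter than segment_size is dropped."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    tokens = [tok for text in texts for tok in text]
    n_full = len(tokens) // segment_size
    segments = [
        tokens[i * segment_size:(i + 1) * segment_size] for i in range(n_full)
    ]
    return shared_by_all(segments)


def shuffled_baseline(texts, seed=0):
    """Assumed baseline simulation: shuffle all tokens of the corpus and
    spread them evenly over as many parts as there are texts, then count
    the word-types shared by all parts."""
    tokens = [tok for text in texts for tok in text]
    if not texts or len(tokens) < len(texts):
        return 0
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rng.shuffle(tokens)
    part_size = len(tokens) // len(texts)
    return shared_by_segments([tokens], part_size)


if __name__ == "__main__":
    # Toy corpus, invented purely for illustration.
    corpus = [["a", "b", "c"], ["a", "c", "d"], ["a", "e", "f"]]
    print(shared_by_pair(corpus[0], corpus[1]))
    print(shared_by_all(corpus))
    print(shared_by_segments(corpus, 4))
    print(shuffled_baseline(corpus))
```

    Comparing the empirical counts with the shuffled count for the same corpus gives the kind of contrast the description mentions; a single shuffle is noisy, so in practice one would average over many seeds.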

Classification

  • Type

    C - Chapter in a specialist book

  • CEP classification

  • OECD FORD branch

    60203 - Linguistics

Result continuities

  • Project

  • Continuities

    I - Institutional support for the long-term conceptual development of a research organization

Others

  • Publication year

    2022

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Book/collection name

    Quantitative Approaches to Universality and Individuality in Language

  • ISBN

    978-3-11-076356-0

  • Number of pages of the result

    15

  • Pages from-to

    101-115

  • Number of pages of the book

    237

  • Publisher name

    De Gruyter Mouton

  • Place of publication

    Germany

  • UT code for WoS chapter