Expressing Time in English and Czech Children's Literature: A Contrastive N-gram-Based Study of Typologically Distant Languages

The result's identifiers

  • Result code in IS VaVaI

    RIV/00216208:11210/19:10397901 (isvavai.cz: https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F19%3A10397901)

  • Result on the web

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    English

  • Original language name

    Expressing Time in English and Czech Children's Literature: A Contrastive N-gram-Based Study of Typologically Distant Languages

  • Original language description

    The study addresses two issues raised by previous studies dealing with children's literature and phraseology. First, we explore how TIME is expressed in English and Czech children's fiction (cf. Hunt, 2005; Thompson & Sealey, 2007). Our approach relies on the neo-Firthian phraseological tradition, "where meaning... is said to reside in multi-word units rather than single words" (Ebeling & Ebeling, 2013: 65). The study is data-driven, based on n-gram extraction. This raises the question of "the potential contribution" of n-gram-based approaches to language comparison (Granger, 2014). N-grams appear a useful starting point when comparing typologically related languages, and rather "challenging" when dealing with distant ones, e.g. predominantly analytical English and inflectional Czech (Čermáková & Chlumská, 2017; Hasselgård, 2017; Ebeling & Ebeling, 2013). The study uses comparable English and Czech corpora of children's fiction: two small (650,000 words each) and two large ones (2,700,000 words each, sub-corpora of the Czech National Corpus (SYN) and British National Corpus). For technical reasons, queries are restricted to 250,000 hits in the large corpora. The small corpora enabled detailed examination, the large ones served to verify our small-corpus findings, supplementing them by lemma and POS queries. We extracted 2-5-grams (i.e. continuous sequences of 2-5 words excluding punctuation) from the smaller corpora. Numbers of n-grams above the threshold are consistently higher in English. The ratios suggest a larger extent of recurrent patterning in analytical English than in Czech, characterized by high morphological variability and free word-order (cf. Czech 4-grams: se nedá nic dělat, nedá se nic dělat, nedalo se nic dělat). Higher type/token ratios in Czech again point to a higher variability of Czech.
Another difference is the higher representation of verbs within the most frequent n-grams in Czech (e.g. se vydal na cestu), and prepositional phrases in English (e.g. for a long time). This is again in accord with the typological expectations, Czech generally preferring (finite) verbal expression and English being more 'nominal'. The POS observations highlighted the importance of verbs for Czech but also their high morphological variability as a potential hindrance to the use of the n-gram approach. Frequent 3-5-grams in the small corpora were classified semantically. We then focused on TIME n-grams. The expression of TIME tends to rely on n-grams comprising temporal nouns in English (e.g. end, time, moment), while in Czech adverbs and conjunctions were salient (pak, hned, když), pointing to the 'nominal' vs. 'verbal' character of English and Czech, respectively. The recurrent lexemes can then be used to identify (partly lemmatized) patterns expressing TIME in both languages (e.g. a pak SE, by the time) (Ebeling & Ebeling, 2013; Gries, 2008). The n-gram method proved a useful starting point in corpus-driven cross-linguistic genre analysis, highlighting typological characteristics of the languages compared. Owing to the limitations on the n-gram method in Czech, a combination of approaches seems beneficial, including semantic analysis, partial lemmatization and n-gram based patterns.

  • Czech name

  • Czech description
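
As a rough illustration of the n-gram extraction outlined in the original language description above, the following Python sketch counts contiguous 2-5-grams, filters them by a frequency threshold, and computes a type/token ratio over the n-grams. It is not the authors' pipeline: the function names, the regex tokenizer, the `min_freq` value, and the decision to let n-grams run across punctuation once it has been dropped are all assumptions made for the example. The sample sentence reuses the English phrase "for a long time" cited in the abstract.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; punctuation is dropped (assumed tokenization).

    Word-internal apostrophes are kept, so "children's" stays one token.
    """
    return re.findall(r"\w+(?:'\w+)*", text.lower())


def ngram_counts(tokens, n_min=2, n_max=5):
    """Count every contiguous n-gram for n_min <= n <= n_max."""
    counts = {n: Counter() for n in range(n_min, n_max + 1)}
    for n, counter in counts.items():
        for i in range(len(tokens) - n + 1):
            counter[tuple(tokens[i:i + n])] += 1
    return counts


def above_threshold(counter, min_freq):
    """Keep n-grams occurring at least min_freq times (hypothetical cut-off)."""
    return {gram: c for gram, c in counter.items() if c >= min_freq}


def type_token_ratio(counter):
    """Distinct n-grams (types) divided by all n-gram occurrences (tokens)."""
    total = sum(counter.values())
    return len(counter) / total if total else 0.0


# Toy input only; a real study would read a whole corpus.
sample = "For a long time, for a long time."
counts = ngram_counts(tokenize(sample))
frequent_4grams = above_threshold(counts[4], min_freq=2)
ttr_4grams = type_token_ratio(counts[4])
```

Under these assumptions, a morphologically variable language such as Czech would be expected to yield fewer n-grams above the threshold and a higher type/token ratio, because variants like "se nedá nic dělat" and "nedalo se nic dělat" are counted as different types unless the tokens are lemmatized first.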

Classification

  • Type

    D - Article in proceedings

  • CEP classification

  • OECD FORD branch

    60203 - Linguistics

Result continuities

  • Project

  • Continuities

    S - Specific university research
    I - Institutional support for the long-term conceptual development of a research organisation

Others

  • Publication year

    2019

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    Language Use and Linguistic Structure: Proceedings of the Olomouc Linguistics Colloquium 2018

  • ISBN

    978-80-244-5525-9

  • ISSN

  • e-ISSN

  • Number of pages

    15

  • Pages from-to

    469-483

  • Publisher name

    Palacký University

  • Place of publication

    Olomouc

  • Event location

    Olomouc: Palacký University

  • Event date

    Jun 7, 2018

  • Type of event by nationality

    EUR - European event

  • UT code for WoS article