Expressing Time in English and Czech Children's Literature: A Contrastive N-gram-Based Study of Typologically Distant Languages

Result identifiers

  • Result code in IS VaVaI

    RIV/00216208:11210/19:10397901 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F19%3A10397901)

  • Result on the web

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    English

  • Title in the original language

    Expressing Time in English and Czech Children's Literature: A Contrastive N-gram-Based Study of Typologically Distant Languages

  • Description in the original language

    The study addresses two issues raised by previous studies dealing with children's literature and phraseology. First, we explore how TIME is expressed in English and Czech children's fiction (cf. Hunt, 2005; Thompson & Sealey, 2007). Our approach relies on the neo-Firthian phraseological tradition, "where meaning... is said to reside in multi-word units rather than single words" (Ebeling & Ebeling, 2013: 65). The study is data-driven, based on n-gram extraction. This raises the question of "the potential contribution" of n-gram-based approaches to language comparison (Granger, 2014). N-grams appear a useful starting point when comparing typologically related languages, and rather "challenging" when dealing with distant ones, e.g. predominantly analytical English and inflectional Czech (Čermáková & Chlumská, 2017; Hasselgård, 2017; Ebeling & Ebeling, 2013). The study uses comparable English and Czech corpora of children's fiction: two small (650,000 words each) and two large ones (2,700,000 words each, sub-corpora of the Czech National Corpus (SYN) and British National Corpus). For technical reasons, queries are restricted to 250,000 hits in the large corpora. The small corpora enabled detailed examination, the large ones served to verify our small-corpus findings, supplementing them by lemma and POS queries. We extracted 2-5-grams (i.e. continuous sequences of 2-5 words excluding punctuation) from the smaller corpora. Numbers of n-grams above the threshold are consistently higher in English. The ratios suggest a larger extent of recurrent patterning in analytical English than in Czech, characterized by high morphological variability and free word-order (cf. Czech 4-grams: se nedá nic dělat, nedá se nic dělat, nedalo se nic dělat). Higher type/token ratios in Czech again point to a higher variability of Czech.
Another difference is the higher representation of verbs within the most frequent n-grams in Czech (e.g. se vydal na cestu), and prepositional phrases in English (e.g. for a long time). This is again in accord with the typological expectations, Czech generally preferring (finite) verbal expression and English being more 'nominal'. The POS observations highlighted the importance of verbs for Czech but also their high morphological variability as a potential hindrance to the use of the n-gram approach. Frequent 3-5-grams in the small corpora were classified semantically. We then focused on TIME n-grams. The expression of TIME tends to rely on n-grams comprising temporal nouns in English (e.g. end, time, moment), while in Czech adverbs and conjunctions were salient (pak, hned, když), pointing to the 'nominal' vs. 'verbal' character of English and Czech, respectively. The recurrent lexemes can then be used to identify (partly lemmatized) patterns expressing TIME in both languages (e.g. a pak SE, by the time) (Ebeling & Ebeling, 2013; Gries, 2008). The n-gram method proved a useful starting point in corpus-driven cross-linguistic genre analysis, highlighting typological characteristics of the languages compared. Owing to the limitations on the n-gram method in Czech, a combination of approaches seems beneficial, including semantic analysis, partial lemmatization and n-gram based patterns.
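
The extraction step described in the abstract (2-5-grams over word tokens, a frequency threshold, and type/token ratios) can be illustrated with a minimal Python sketch. This is not the authors' tooling: the function names, the regex tokenizer, and the `min_freq` threshold value are assumptions made for illustration, and the abstract does not state the actual threshold. The sketch also lets n-grams run across dropped punctuation, which is only one possible reading of "excluding punctuation".

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return word tokens only (punctuation is dropped).

    Assumption: a simple regex tokenizer; \\w is Unicode-aware for str patterns,
    so Czech diacritics are kept. An apostrophe inside a word is retained.
    """
    return re.findall(r"\w+(?:'\w+)?", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def recurrent_ngrams(tokens, n_min=2, n_max=5, min_freq=2):
    """Count n-grams for n_min..n_max and keep those at or above min_freq.

    min_freq is a hypothetical threshold chosen for this example.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        counts.update(ngrams(tokens, n))
    return {gram: freq for gram, freq in counts.items() if freq >= min_freq}


def type_token_ratio(tokens):
    """Number of distinct tokens divided by the total number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    sample = "Nedá se nic dělat. Nedá se nic dělat!"
    tokens = tokenize(sample)
    for gram, freq in sorted(recurrent_ngrams(tokens).items()):
        print(" ".join(gram), freq)
    print("TTR:", type_token_ratio(tokens))
```

Because inflected variants such as nedá and nedalo count as different tokens here, the three Czech 4-grams quoted in the abstract would be counted separately, which is the variability effect the abstract describes.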

Classification

  • Type

    D - Article in proceedings

  • CEP field

  • OECD FORD field

    60203 - Linguistics

Result continuities

  • Project

  • Continuities

    S - Specific research at universities
    I - Institutional support for the long-term conceptual development of a research organisation

Others

  • Publication year

    2019

  • Data confidentiality code

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

  • Title of the article in the proceedings

    Language Use and Linguistic Structure: Proceedings of the Olomouc Linguistics Colloquium 2018

  • ISBN

    978-80-244-5525-9

  • ISSN

  • e-ISSN

  • Number of pages

    15

  • Pages from-to

    469-483

  • Publisher name

    Palacký University

  • Place of publication

    Olomouc

  • Event location

    Olomouc: Palacký University

  • Event date

    7 June 2018

  • Type of event by nationality

    EUR - European event

  • UT WoS article code