Expressing Time in English and Czech Children's Literature: A Contrastive N-gram-Based Study of Typologically Distant Languages
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F19%3A10397901" target="_blank" >RIV/00216208:11210/19:10397901 - isvavai.cz</a>
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Expressing Time in English and Czech Children's Literature: A Contrastive N-gram-Based Study of Typologically Distant Languages
Original language description
The study addresses two issues raised by previous studies dealing with children's literature and phraseology. First, we explore how TIME is expressed in English and Czech children's fiction (cf. Hunt, 2005; Thompson & Sealey, 2007). Our approach relies on the neo-Firthian phraseological tradition, "where meaning... is said to reside in multi-word units rather than single words" (Ebeling & Ebeling, 2013: 65). The study is data-driven, based on n-gram extraction. Second, this raises the question of "the potential contribution" of n-gram-based approaches to language comparison (Granger, 2014). N-grams appear to be a useful starting point when comparing typologically related languages, but rather "challenging" when dealing with distant ones, e.g. predominantly analytical English and inflectional Czech (Čermáková & Chlumská, 2017; Hasselgård, 2017; Ebeling & Ebeling, 2013). The study uses comparable English and Czech corpora of children's fiction: two small ones (650,000 words each) and two large ones (2,700,000 words each, sub-corpora of the Czech National Corpus (SYN) and the British National Corpus). For technical reasons, queries are restricted to 250,000 hits in the large corpora. The small corpora enabled detailed examination; the large ones served to verify our small-corpus findings, supplementing them with lemma and POS queries. We extracted 2-5-grams (i.e. continuous sequences of 2-5 words excluding punctuation) from the smaller corpora. The numbers of n-grams above the frequency threshold are consistently higher in English. The ratios suggest a larger extent of recurrent patterning in analytical English than in Czech, which is characterized by high morphological variability and free word order (cf. Czech 4-grams: se nedá nic dělat, nedá se nic dělat, nedalo se nic dělat). Higher type/token ratios in Czech again point to its higher variability. Another difference is the higher representation of verbs within the most frequent n-grams in Czech (e.g. se vydal na cestu), and of prepositional phrases in English (e.g. for a long time). This is again in accord with typological expectations, with Czech generally preferring (finite) verbal expression and English being more 'nominal'. The POS observations highlighted the importance of verbs for Czech, but also their high morphological variability as a potential hindrance to the use of the n-gram approach. Frequent 3-5-grams in the small corpora were classified semantically. We then focused on TIME n-grams. The expression of TIME tends to rely on n-grams comprising temporal nouns in English (e.g. end, time, moment), while in Czech adverbs and conjunctions were salient (pak, hned, když), pointing to the 'nominal' vs. 'verbal' character of English and Czech, respectively. The recurrent lexemes can then be used to identify (partly lemmatized) patterns expressing TIME in both languages (e.g. a pak SE, by the time) (Ebeling & Ebeling, 2013; Gries, 2008). The n-gram method proved a useful starting point in corpus-driven cross-linguistic genre analysis, highlighting typological characteristics of the languages compared. Owing to the limitations of the n-gram method in Czech, a combination of approaches seems beneficial, including semantic analysis, partial lemmatization and n-gram-based patterns.
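The extraction step described in the abstract (continuous sequences of 2-5 words, punctuation excluded, filtered by a frequency threshold, plus a type/token ratio) can be illustrated with a minimal Python sketch. This is not the authors' tooling: the function names, the regex tokenizer and the `min_freq` parameter are hypothetical, the abstract does not state the threshold value, and the sketch simply drops punctuation, so n-grams here may span sentence boundaries.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only word tokens (punctuation is dropped).

    Assumption: a word is a run of Unicode letters, optionally followed by
    an apostrophe and more letters (e.g. "children's").
    """
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())


def extract_ngrams(tokens, n_min=2, n_max=5):
    """Count every contiguous n-gram for n_min <= n <= n_max."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def above_threshold(counts, min_freq):
    """Keep n-grams occurring at least min_freq times (hypothetical cut-off)."""
    return {gram: c for gram, c in counts.items() if c >= min_freq}


def type_token_ratio(tokens):
    """Number of distinct word forms divided by the number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    # Two word-order variants quoted in the abstract.
    sample = "Se nedá nic dělat. Nedá se nic dělat!"
    tokens = tokenize(sample)
    counts = extract_ngrams(tokens)
    print(above_threshold(counts, min_freq=2))
    print(type_token_ratio(tokens))
```

On the two Czech variants quoted in the abstract, only the shared tail of the expression recurs, which illustrates how free word order and inflection spread one expression over several low-frequency n-grams.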
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
60203 - Linguistics
Result continuities
Project
—
Continuities
S - Specific university research
I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2019
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Language Use and Linguistic Structure: Proceedings of the Olomouc Linguistics Colloquium 2018
ISBN
978-80-244-5525-9
ISSN
—
e-ISSN
—
Number of pages
15
Pages from-to
469-483
Publisher name
Palacký University
Place of publication
Olomouc
Event location
Olomouc: Palacký University
Event date
Jun 7, 2018
Type of event by nationality
EUR - European event
UT code for WoS article
—