The structuralist tradition meets empirical data: Corpus data enhancing the Czech Internet Language Reference Book

The result's identifiers

Result code in IS VaVaI
RIV/68378092:_____/23:00584073 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68378092%3A_____%2F23%3A00584073)
Alternative codes found
RIV/00216208:11210/23:10465205
Result on the web
https://www.euppublishing.com/doi/full/10.3366/word.2023.0230
DOI - Digital Object Identifier
10.3366/word.2023.0230 (http://dx.doi.org/10.3366/word.2023.0230)

Alternative languages

Result language
English
Original language name
The structuralist tradition meets empirical data: Corpus data enhancing the Czech Internet Language Reference Book
Original language description
This paper demonstrates how the corpus grammar tool GramatiKat can be used to improve and refine morphological information in the Internet Language Reference Book (ILRB), which presents complete declension paradigms for 45,632 standard Czech nouns. The paradigm tables are based mainly on morphological types, following structuralist conceptions of language as a fully articulated system. The paper discusses how to update the ILRB and provide users with empirically based grammatical information for individual word forms in each cell of the paradigm. All noun lemmas have been investigated using GramatiKat, a tool for research into grammatical categories in Czech. The tool compares the distribution of word forms of a particular lexeme with the standard distribution across the whole word class. It can identify nouns with an unusually high occurrence of a certain word form, as well as nouns with unattested word forms. GramatiKat is based on data from two corpora of Czech written texts, SYN2015 and SYN2020 (200 million word tokens). The paper investigates the relationship between defectiveness and overabundance on the one hand and language variability and potentiality on the other. Based on the unique combination of data from the ILRB and GramatiKat, the paper suggests how unusually frequent or overabundant word forms, as well as unattested ones, should be flagged so that the ILRB provides the user with accurate, empirically based data.
Czech name
—
Czech description
—
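
The comparison described in the abstract (a lexeme's word-form distribution set against the distribution for the whole word class) can be illustrated with a minimal sketch. This is not GramatiKat's implementation: the cell labels, the counts, the `ratio` cut-off, and the function names `proportions` and `profile` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical paradigm cells (Czech singular cases); not taken from the paper.
CELLS = ["nom", "gen", "dat", "acc", "voc", "loc", "ins"]


def proportions(counts):
    """Turn raw cell counts into relative frequencies over CELLS."""
    total = sum(counts.get(c, 0) for c in CELLS)
    if total == 0:
        return {c: 0.0 for c in CELLS}
    return {c: counts.get(c, 0) / total for c in CELLS}


def profile(lemma_counts, baseline, ratio=3.0):
    """Flag cells that are unattested or far above the word-class baseline.

    `ratio` is an illustrative cut-off, not a value used by GramatiKat.
    """
    props = proportions(lemma_counts)
    unattested = [c for c in CELLS if lemma_counts.get(c, 0) == 0]
    overused = [
        c for c in CELLS
        if baseline[c] > 0 and props[c] / baseline[c] >= ratio
    ]
    return {"unattested": unattested, "overused": overused}


# Invented counts for illustration only.
baseline = proportions(
    Counter(nom=300, gen=250, dat=50, acc=200, voc=10, loc=110, ins=80)
)
lemma = Counter(nom=5, gen=10, dat=0, acc=5, voc=0, loc=75, ins=5)
result = profile(lemma, baseline)
print(result)
```

With these invented counts, the noun is flagged for a locative share far above the baseline and for two cells with no attested forms. A real profile would need a statistically motivated threshold rather than a fixed ratio.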

Classification

Type
Jimp - Article in a specialist periodical, which is included in the Web of Science database
CEP classification
—
OECD FORD branch
60203 - Linguistics

Result continuities

Project
LM2023044: Czech National Corpus
Continuities
I - Institutional support for the long-term conceptual development of a research organisation

Others

Publication year
2023
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations

Data specific for result type

Name of the periodical
Word Structure
ISSN
1750-1245
e-ISSN
1755-2036
Volume of the periodical
16
Issue of the periodical within the volume
2/3
Country of publishing house
GB - UNITED KINGDOM
Number of pages
25
Pages from-to
233-257
UT code for WoS article
001099547400005
EID of the result in the Scopus database
2-s2.0-85179302875

Similar results(10)

GramatiKat (version 2): A tool for grammatical categories research and grammatical profiles
Sharing data through specialized corpus-based tools: the case of GramatiKat
GramatiKat
