Introducing a corpus of non-native Czech with automatic annotation
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F17%3A10366730" target="_blank" >RIV/00216208:11210/17:10366730 - isvavai.cz</a>
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Introducing a corpus of non-native Czech with automatic annotation
Original language description
A learner corpus can be annotated with linguistic categories, target hypotheses and error labels. We show that useful results can be achieved even for non-native Czech by applying methods and tools developed for the standard language. The corpus includes more than 8.6 thousand short essays, totalling nearly one million words. First, the texts are processed by a tagger and lemmatizer. Then, a stochastic spelling and grammar checker is used to propose correct forms for non-words and some incorrect 'real words'. The precision of this step is above 80%. The corrected texts are tagged again. Original and corrected forms are then compared, and error labels are assigned based on criteria that can be applied in a formally specifiable way. The metadata include, among other things, the author's sex, age, first language and CEFR level of proficiency in Czech, as well as the task's time limit and topic. The corpus is available online via a search interface or for download.
Czech name
—
Czech description
—
Classification
Type
C - Chapter in a specialist book
CEP classification
—
OECD FORD branch
60203 - Linguistics
Result continuities
Project
<a href="/en/project/LM2011023" target="_blank" >LM2011023: Czech National Corpus</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2017
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Book/collection name
Language, Corpora and Cognition
ISBN
978-3-631-70709-8
Number of pages of the result
18
Pages from-to
163-180
Number of pages of the book
296
Publisher name
Peter Lang
Place of publication
Frankfurt am Main
UT code for WoS chapter
—