CzeSL - an error tagged corpus of Czech as a second language
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F12%3A10132322" target="_blank" >RIV/00216208:11320/12:10132322 - isvavai.cz</a>
Alternative codes found
RIV/00216208:11210/12:10132322
Result on the web
<a href="http://utkl.ff.cuni.cz/~rosen/public/2011-czesl-palc.pdf" target="_blank" >http://utkl.ff.cuni.cz/~rosen/public/2011-czesl-palc.pdf</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
CzeSL - an error tagged corpus of Czech as a second language
Original language description
The paper describes a corpus of texts produced by non-native speakers of Czech. We discuss its annotation scheme, consisting of three interlinked levels, designed to handle a wide range of error types present in the input. Each level corrects different types of errors; links between the levels allow capturing errors in word order and complex discontinuous expressions. Errors are not only corrected, but also classified. The annotation scheme is tested on a doubly-annotated sample of approx. 10,000 words, with fair inter-annotator agreement results. We also explore the possibility of applying automated linguistic annotation tools (taggers, spell checkers and grammar checkers) to the learner text to support or even substitute manual annotation.
Czech name
—
Czech description
—
Classification
Type
O - Miscellaneous
CEP classification
AI - Linguistics
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/GPP406%2F10%2FP328" target="_blank" >GPP406/10/P328: Resource-light Morphological Analysis and Tagging</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2012
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations