Building a Corpus of Old Czech

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F12%3A10130057" target="_blank" >RIV/00216208:11320/12:10130057 - isvavai.cz</a>
Result on the web
—
DOI - Digital Object Identifier
—

Result language
English
Original language name
Building a Corpus of Old Czech
Original language description
In this paper we describe our efforts to build a corpus of Old Czech. We report on the tools, resources and methodologies used during corpus development and discuss the corpus sources and structure, the tagset used, and the approach to lemmatization, morphological analysis and tagging. Due to practical restrictions we adapt resources and tools developed for Modern Czech. However, some of the described challenges, such as the non-standardized spelling in early Czech and the form and lemma variability due to language change during the covered time-span, are unique and never arise when building synchronic corpora of Modern Czech.
Czech name
—
Czech description
—

Project
The result was created during the realization of more than one project. More information is available in the Projects tab.
Continuities
P - Research and development project financed from public funds (with a link to CEP)

Publication year
2012
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations

Article name in the collection
Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012)
ISBN
978-2-9517408-7-7
ISSN
—
e-ISSN
—
Number of pages
1
Pages from-to
1
Publisher name
European Language Resources Association
Place of publication
Istanbul, Turkey
Event location
Istanbul, Turkey
Event date
May 21, 2012
Type of event by nationality
CST - Nationwide event
UT code for WoS article
—
