SYN2015: Representative Corpus of Contemporary Written Czech
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F16%3A10332726" target="_blank" >RIV/00216208:11210/16:10332726 - isvavai.cz</a>
Result on the web
<a href="http://www.lrec-conf.org/proceedings/lrec2016/pdf/186_Paper.pdf" target="_blank" >http://www.lrec-conf.org/proceedings/lrec2016/pdf/186_Paper.pdf</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
SYN2015: Representative Corpus of Contemporary Written Czech
Original language description
The paper concentrates on the design, composition and annotation of SYN2015, a new 100-million-word representative corpus of contemporary written Czech. SYN2015 is a sequel to the representative corpora of the SYN series, which can be described as traditional (as opposed to web-crawled corpora): they feature cleared copyright, well-defined composition, reliable annotation and high-quality text processing. At the same time, SYN2015 is designed to reflect the variety of written Czech text production, with the necessary methodological and technological enhancements, including detailed bibliographic annotation and text classification based on an updated scheme. The corpus was produced using SynKorp, a completely rebuilt text-processing toolchain. SYN2015 is lemmatized and morphologically and syntactically annotated with state-of-the-art tools. It has been published within the framework of the Czech National Corpus and is available via the standard corpus query interface KonText at http://kontext.korpus.cz and as a dataset in shuffled format.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
AI - Linguistics
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/LM2015044" target="_blank" >LM2015044: Czech National Corpus</a><br>
Continuities
P - Research and development project financed from public sources (with a link to CEP)
Others
Publication year
2016
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
ISBN
978-2-9517408-9-1
ISSN
—
e-ISSN
—
Number of pages
7
Pages from-to
2522-2528
Publisher name
ELRA
Place of publication
Portorož
Event location
Portorož
Event date
May 25, 2016
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—