Czech Text Document Corpus v 2.0
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F19%3A43952007" target="_blank" >RIV/49777513:23520/19:43952007 - isvavai.cz</a>
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
angličtina
Original language name
Czech Text Document Corpus v 2.0
Original language description
This paper introduces “Czech Text Document Corpus v 2.0”, a collection of text documents for automatic document classification in Czech language. It is composed of the text documents provided by the Czech News Agency and is freely available for research purposes. This corpus was created in order to facilitate a straightforward comparison of document classification approaches on Czech data. It is particularly dedicated to evaluation of multi-label document classification approaches, because one document is usually labelled with more than one label. Besides the information about the document classes, the corpus is also annotated at the morphological layer. This paper further shows the results of selected state-of-the-art methods on this corpus to offer the possibility of an easy comparison with these approaches.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
<a href="/en/project/LO1506" target="_blank" >LO1506: Sustainability support of the centre NTIS - New Technologies for the Information Society</a><br>
Continuities
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)
Others
Publication year
2019
Confidentiality
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Data specific for result type
Article name in the collection
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
ISBN
979-10-95546-00-9
ISSN
—
e-ISSN
—
Number of pages
4
Pages from-to
4345-4348
Publisher name
European Language Resources Association (ELRA)
Place of publication
Francie
Event location
Miyazaki, Japan
Event date
May 7, 2018
Type of event by nationality
WRD - Celosvětová akce
UT code for WoS article
—