The Joy of Parallelism with CzEng 1.0
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F12%3A10130029" target="_blank" >RIV/00216208:11320/12:10130029 - isvavai.cz</a>
Result on the web
<a href="http://www.lrec-conf.org/proceedings/lrec2012/summaries/645.html" target="_blank" >http://www.lrec-conf.org/proceedings/lrec2012/summaries/645.html</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
angličtina
Original language name
The Joy of Parallelism with CzEng 1.0
Original language description
CzEng 1.0 is an updated release of our Czech-English parallel corpus, freely available for non-commercial research or educational purposes. In this release, we approximately doubled the corpus size, reaching 15 million sentence pairs (about 200 million tokens per language). More importantly, we carefully filtered the data to reduce the amount of non-matching sentence pairs. CzEng 1.0 is automatically aligned at the level of sentences as well as words. We provide not only the plain text representation, but also automatic morphological tags, surface syntactic as well as deep syntactic dependency parse trees and automatic co-reference links in both English and Czech. This paper describes key properties of the released resource including the distribution of text domains, the corpus data formats, and a toolkit to handle the provided rich annotation. We also summarize the procedure of the rich annotation (incl. co-reference resolution) and of the automatic filtering. Finally, we provide some
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
IN - Informatics
OECD FORD branch
—
Result continuities
Project
Result was created during the realization of more than one project. More information in the Projects tab.
Continuities
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)<br>S - Specificky vyzkum na vysokych skolach
Others
Publication year
2012
Confidentiality
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Data specific for result type
Article name in the collection
Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012)
ISBN
978-2-9517408-7-7
ISSN
—
e-ISSN
—
Number of pages
8
Pages from-to
3921-3928
Publisher name
European Language Resources Association
Place of publication
?stanbul, Turkey
Event location
?stanbul, Turkey
Event date
May 21, 2012
Type of event by nationality
CST - Celostátní akce
UT code for WoS article
—