CzEng 0.9, Building a Large Czech-English Automatic Parallel Treebank
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/09:00207387 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F09%3A00207387)
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
CzEng 0.9, Building a Large Czech-English Automatic Parallel Treebank
Original language description
We describe our ongoing efforts in collecting a Czech-English parallel corpus, CzEng. The paper provides full details on the current version 0.9 and focuses on its new features: (1) data from new sources were added, most importantly a few hundred electronically available books, technical documentation, and some parallel web pages; (2) the full corpus has been automatically annotated up to the tectogrammatical layer (surface and deep syntactic analysis); (3) sentence segmentation has been refined; and (4) several heuristic filters to improve corpus quality were implemented. In total, we provide a sentence-aligned automatic parallel treebank of 8.0 million sentences, 93 million English and 82 million Czech words. CzEng 0.9 is freely available for non-commercial research purposes.
Czech name
—
Czech description
—
Classification
Type
Jx - Unclassified - Peer-reviewed scientific article (Jimp, Jsc and Jost)
CEP classification
AI - Linguistics
OECD FORD branch
—
Result continuities
Project
The result was created as part of more than one project.
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Z - Research intention (with a link to CEZ)
Others
Publication year
2009
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Prague Bulletin of Mathematical Linguistics
ISSN
0032-6585
e-ISSN
—
Volume of the periodical
Not stated
Issue of the periodical within the volume
92
Country of publishing house
CZ - CZECH REPUBLIC
Number of pages
20
Pages from-to
—
UT code for WoS article
—
EID of the result in the Scopus database
—