W2C - Web To Corpus
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F11%3A10109519" target="_blank" >RIV/00216208:11320/11:10109519 - isvavai.cz</a>
Result on the web
<a href="http://ufal.mff.cuni.cz/~majlis/w2c/" target="_blank" >http://ufal.mff.cuni.cz/~majlis/w2c/</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
W2C - Web To Corpus
Original language description
W2C is a collection of software and data. The software part greatly simplifies the creation of new text corpora for a given language from text materials freely available on the Internet. Special attention was given to the filtering components, which keep the quality of the material very high. The data part contains corpora for more than 100 languages, with around 10 million words each. This language resource is intended especially for researchers developing multilingual technologies.
Czech name
—
Czech description
—
Classification
Type
R - Software
CEP classification
AI - Linguistics
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/1ET201120505" target="_blank" >1ET201120505: From a Natural Language to Knowledge and the Semantic Web</a><br>
Continuities
P - Research and development project funded from public sources (with a link to CEP)<br>Z - Research plan (with a link to CEZ)
Others
Publication year
2011
Confidentiality
S - Complete and accurate data on the project are not subject to protection under special legal regulations
Data specific for result type
Internal product ID
UFAL-SW-W2C-1.0
Technical parameters
http://ufal.mff.cuni.cz/~majlis/w2c/
Economical parameters
1 060 000 CZK
Owner IČO
00216208
Owner name
Univerzita Karlova v Praze