W2C - Web To Corpus

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F11%3A10109519" target="_blank" >RIV/00216208:11320/11:10109519 - isvavai.cz</a>
Result on the web
<a href="http://ufal.mff.cuni.cz/~majlis/w2c/" target="_blank" >http://ufal.mff.cuni.cz/~majlis/w2c/</a>
DOI - Digital Object Identifier
—

Result language
English
Original language name
W2C - Web To Corpus
Original language description
W2C is a collection of software and data. The software part greatly simplifies creating new text corpora for a given language from text materials freely available on the Internet. Special attention was given to the filtering components, which keep the quality of the material very high. The data part contains corpora for more than 100 languages, with around 10 million words in each. This language data resource is especially useful to researchers who specialize in developing multilingual technologies.
Czech name
—
Czech description
—

Project
<a href="/en/project/1ET201120505" target="_blank" >1ET201120505: From a Natural Language to Knowledge and the Semantic Web</a><br>
Continuities
P - Research and development project financed from public funds (with a link to CEP)<br>Z - Research plan (with a link to CEZ)

Publication year
2011
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
