W2C - Web To Corpus

The result's identifiers

  • Result code in IS VaVaI

    RIV/00216208:11320/11:10109519 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F11%3A10109519)

  • Result on the web

    http://ufal.mff.cuni.cz/~majlis/w2c/

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    English

  • Original language name

    W2C - Web To Corpus

  • Original language description

    W2C is a collection of software and data. The software part greatly simplifies the creation of new text corpora for a given language from text materials freely available on the Internet. Special attention was given to the filtering components, which keep the quality of the material very high. The data part contains corpora for more than 100 languages, with around 10 million words in each. This language data resource is especially useful to researchers who specialize in developing multilingual technologies.

  • Czech name

  • Czech description

Classification

  • Type

    R - Software

  • CEP classification

    AI - Linguistics

  • OECD FORD branch

Result continuities

  • Project

    1ET201120505: From a Natural Language to Knowledge and the Semantic Web

  • Continuities

    P - Research and development project financed from public sources (with a link to CEP)
    Z - Research plan (with a link to CEZ)

Others

  • Publication year

    2011

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Internal product ID

    UFAL-SW-W2C-1.0

  • Technical parameters

    http://ufal.mff.cuni.cz/~majlis/w2c/

  • Economic parameters

    1 060 000 CZK

  • Owner IČO

    00216208

  • Owner name

    Univerzita Karlova v Praze