All

What are you looking for?

All
Projects
Results
Organizations

Quick search

  • Projects supported by TA ČR
  • Excellent projects
  • Projects with the highest public support
  • Current projects

Smart search

  • That is how I find a specific +word
  • That is how I leave the -word out of the results
  • “That is how I can find the whole phrase”

Czech MWE Database

The result's identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F08%3A00024204" target="_blank" >RIV/00216224:14330/08:00024204 - isvavai.cz</a>

  • Result on the web

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    angličtina

  • Original language name

    Czech MWE Database

  • Original language description

    In this paper we deal with a recently developed large Czech MWE database containing at the moment 160 000 MWEs (treated as lexical units). We describe the structure of the database and give basic types of MWEs according to domains they belong to. We compare the built MWEs database with the corpus data from Czech National Corpus (approx. 100 mil. tokens) and present results of this comparison in the paper. To obtain a more complete list of MWEs we propose and use a technique exploiting the Word Sketch Engine, which allows us to work with statistical parameters such as frequency of MWEs and their components as well as with the salience for the whole MWEs. We also discuss exploitation of the database for working out a more adequate tagging and lemmatization. The final goal is to be able to recognize MWEs in corpus text and lemmatize them as complete lexical units, i. e. to make tagging and lemmatization more adequate.

  • Czech name

    Česká databáze víceslovných vyrazů

  • Czech description

    Článek popisuje strukturu a obsah české databáze víceslovných výrazů obsahující v současnosti více než 160 000 položek a porovnává ji s daty Českého národního korpusu. Dále je navrženo, jak databázi doplňovat pomocí Word Sketch Engine.

Classification

  • Type

    D - Article in proceedings

  • CEP classification

    IN - Informatics

  • OECD FORD branch

Result continuities

  • Project

    Result was created during the realization of more than one project. More information in the Projects tab.

  • Continuities

    P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)

Others

  • Publication year

    2008

  • Confidentiality

    S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Data specific for result type

  • Article name in the collection

    Proceedings of the Sixth International Language Resources and Evaluation Conference (LREC '08)

  • ISBN

    2-9517408-4-0

  • ISSN

  • e-ISSN

  • Number of pages

    5

  • Pages from-to

  • Publisher name

    European Language Resources Association (ELRA)

  • Place of publication

    Marrakech, Morocco

  • Event location

    Marrakech, Morocco

  • Event date

    May 28, 2008

  • Type of event by nationality

    WRD - Celosvětová akce

  • UT code for WoS article