Multi-word lexemes in syntactic context

The result's identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F60461373%3A22310%2F20%3A43921768" target="_blank" >RIV/60461373:22310/20:43921768 - isvavai.cz</a>

  • Result on the web

    <a href="https://dspace.cuni.cz/handle/20.500.11956/123090" target="_blank" >https://dspace.cuni.cz/handle/20.500.11956/123090</a>

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    Czech

  • Original language name

    Víceslovné lexémy v syntaktickém kontextu

  • Original language description

    We start from three assumptions: (i) a corpus represents the use of language, i.e., linguistic performance; (ii) a rule-based grammar represents language as a system, i.e., linguistic competence; and (iii) corpus annotation represents the interface between the two. To detect and diagnose mismatches between language use and the language system, we use a constraint-based grammar, run as a constraint solver, on texts that have been tagged and dependency-parsed by stochastic tools. Before the grammar is applied, multi-word expressions (MWEs) are identified in the texts and the texts are transformed into a constituency-based format. We describe the role and results of the grammar and its use in checking texts annotated with morphosyntactic categories, syntactic structure, and information about the MWE status of relevant expressions. The grammar also draws on lexical resources, such as a valency lexicon and a database of MWEs, to make the checking more accurate and the annotation more informative. The results are represented as typed feature structures in which MWE-related information can be shared by lexical and phrasal nodes. This allows MWEs to be annotated as lexical units independently of their analysis in terms of syntactic structure. Focusing on the interplay of MWEs with their syntactic context, we analyse a number of representative examples, pointing out the pros and cons of specific solutions and of the approach as a whole.
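    The description above says the results are typed feature structures in which MWE information can be shared by lexical and phrasal nodes. As a hedged illustration of that general idea only (not the authors' grammar or constraint solver), the toy Python sketch below implements typed feature structures with structure sharing and simple destructive unification. The type hierarchy, the feature names (`form`, `cat`, `mwe`, `lemma`) and the example expression "mít na mysli" are hypothetical choices made for this sketch.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical toy type hierarchy: each type maps to its parent (None = top).
TYPE_PARENT = {"top": None, "sign": "top", "word": "sign",
               "phrase": "sign", "mwe": "top", "idiom": "mwe"}


def ancestors(t: str) -> list[str]:
    """Return t followed by all of its supertypes up to the root."""
    out = []
    while t is not None:
        out.append(t)
        t = TYPE_PARENT[t]
    return out


def unify_types(a: str, b: str) -> str | None:
    """Return the more specific type if one subsumes the other, else None.
    Strings outside the hierarchy are treated as atoms: equal atoms unify."""
    if a == b:
        return a
    if a not in TYPE_PARENT or b not in TYPE_PARENT:
        return None  # distinct atoms (or atom vs. type) clash
    if a in ancestors(b):
        return b
    if b in ancestors(a):
        return a
    return None


@dataclass(eq=False)  # identity, not value equality, marks shared nodes
class FS:
    type: str
    feats: dict[str, FS] = field(default_factory=dict)
    forward: FS | None = None  # set when this node is merged into another


def deref(fs: FS) -> FS:
    """Follow forwarding pointers to the node that currently represents fs."""
    while fs.forward is not None:
        fs = fs.forward
    return fs


def unify(a: FS, b: FS) -> bool:
    """Destructively unify a and b. Toy version: no undo on failure."""
    a, b = deref(a), deref(b)
    if a is b:
        return True
    t = unify_types(a.type, b.type)
    if t is None:
        return False
    b.forward = a  # b is now an alias of a
    a.type = t
    for name, val in b.feats.items():
        if name not in a.feats:
            a.feats[name] = val
        elif not unify(a.feats[name], val):
            return False
    return True


# Hypothetical example: a verb and the phrase it heads share ONE mwe value.
shared_mwe = FS("mwe", {"lemma": FS("mít na mysli")})
verb = FS("word", {"form": FS("mít"), "mwe": shared_mwe})
vp = FS("phrase", {"cat": FS("VP"), "mwe": shared_mwe})

# Refine the MWE type through the phrasal node; the lexical node sees it too.
ok = unify(vp.feats["mwe"], FS("idiom"))
print(ok, deref(verb.feats["mwe"]).type)  # True idiom
```

    Because both nodes point to the same `mwe` object, information added through one is visible from the other, which mirrors in miniature the idea of annotating an MWE as a lexical unit independently of phrase structure. A realistic unifier would also copy or undo structures on failure; this sketch deliberately omits that.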

Classification

  • Type

    J<sub>ost</sub> - Miscellaneous article in a specialist periodical

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

  • Continuities

    I - Institutional support for the long-term conceptual development of a research organization

Others

  • Publication year

    2020

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Name of the periodical

    Studie z aplikované lingvistiky

  • ISSN

    2336-6702

  • e-ISSN

  • Volume of the periodical

    2020

  • Issue of the periodical within the volume

    12

  • Country of publishing house

    CZ - CZECH REPUBLIC

  • Number of pages

    22

  • Pages from-to

    63-84

  • UT code for WoS article

  • EID of the result in the Scopus database