Longest-commonest Match

The result's identifiers

  • Result code in IS VaVaI

    RIV/00216224:14330/15:00080952 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F15%3A00080952)

  • Result on the web

    https://elex.link/elex2015/proceedings/eLex_2015_26_Kilgarriff+etal.pdf

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    English

  • Original language name

    Longest-commonest Match

  • Original language description

    Finding two-word collocations is a well-studied task in natural language processing. The result of this task for a given headword is usually a list of collocations sorted by a salience score. In the corpus manager Sketch Engine, these pairs are extracted from data using word sketch grammar relation rules and log-dice statistics, resulting in a sorted list of triples. The longest-commonest match is a straightforward extension of these two-word collocations into multiword expressions. The resulting expressions are also very useful for representing the most common realisation of the collocational pair and for facilitating the interpretation of the raw triple, because it is sometimes not clear from what texts such a triple comes. We present here an algorithm behind the longest-commonest match together with a simple evaluation. The longest-commonest match is already implemented in Sketch Engine.

  • Czech name

  • Czech description
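
The English description above mentions two computations: the log-dice score used to rank collocation pairs, and the extension of a pair into its most common multiword realisation. The following is a minimal sketch of both, not the implementation used in Sketch Engine. The log-dice formula is the commonly cited form, 14 + log2(2·f_xy / (f_x + f_y)). The function names, the representation of hits as token lists, and the `min_share` threshold are assumptions made for illustration; the record itself gives no algorithmic details.

```python
import math
from collections import Counter


def log_dice(f_xy, f_x, f_y):
    """Commonly cited logDice score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def longest_commonest_match(hits, headword, collocate, min_share=0.25):
    """Sketch: longest contiguous token span that contains both words
    and occurs in at least `min_share` of the hits.

    `hits` is a list of token lists (concordance lines for the pair).
    `min_share` is a hypothetical threshold, not taken from the record.
    Returns the span as a string, or None if no span is frequent enough.
    """
    counts = Counter()
    for tokens in hits:
        spans = set()
        n = len(tokens)
        for start in range(n):
            for end in range(start + 2, n + 1):
                span = tuple(tokens[start:end])
                if headword in span and collocate in span:
                    spans.add(span)
        counts.update(spans)  # each span is counted at most once per hit

    threshold = min_share * len(hits)
    frequent = [(span, c) for span, c in counts.items() if c >= threshold]
    if not frequent:
        return None
    # Prefer the longest span, then the most frequent; the span itself
    # is the final tie-breaker so the result is deterministic.
    best_span, _ = max(frequent, key=lambda sc: (len(sc[0]), sc[1], sc[0]))
    return " ".join(best_span)


hits = [
    "she took the bull by the horns yesterday".split(),
    "he took the bull by the horns".split(),
    "take the bull by the horns now".split(),
    "a bull with long horns".split(),
]
print(longest_commonest_match(hits, "bull", "horns", min_share=0.75))
```

With the toy hits above and `min_share=0.75`, the sketch returns "the bull by the horns", since that is the longest span shared by three of the four lines. Enumerating all spans is quadratic in line length, which is acceptable for short concordance lines but would need a smarter search on large data.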

Classification

  • Type

    D - Article in proceedings

  • CEP classification

    IN - Informatics

  • OECD FORD branch

Result continuities

  • Project

    Result was created during the realization of more than one project. More information in the Projects tab.

  • Continuities

    P - Research and development project financed from public funds (with a link to CEP)
    S - Specific research at universities

Others

  • Publication year

    2015

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    Electronic lexicography in the 21st century: linking lexical data in the digital age. Proceedings of the eLex 2015 conference, 11-13 August 2015, Herstmonceux Castle, United Kingdom.

  • ISBN

    9789619359433

  • ISSN

  • e-ISSN

  • Number of pages

    8

  • Pages from-to

    397-404

  • Publisher name

    Trojina, Institute for Applied Slovene Studies

  • Place of publication

    Ljubljana

  • Event location

    Herstmonceux

  • Event date

    Aug 11-13, 2015

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article