Longest-commonest Match
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F15%3A00080952" target="_blank" >RIV/00216224:14330/15:00080952 - isvavai.cz</a>
Result on the web
<a href="https://elex.link/elex2015/proceedings/eLex_2015_26_Kilgarriff+etal.pdf" target="_blank" >https://elex.link/elex2015/proceedings/eLex_2015_26_Kilgarriff+etal.pdf</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Longest-commonest Match
Original language description
Finding two-word collocations is a well-studied task within natural language processing. The result of this task for a given headword is usually a list of collocations sorted by a salience score. In the corpus manager Sketch Engine, these pairs are extracted from data using word sketch grammar relation rules and the log-dice statistic, resulting in a sorted list of triples. The longest-commonest match is a straightforward extension of these two-word collocations into multiword expressions. The resulting expressions are also very useful for representing the most common realisation of the collocational pair and for facilitating the interpretation of the raw triple, because for such a triple it is sometimes unclear what texts it comes from. We present here the algorithm behind the longest-commonest match together with a simple evaluation. The longest-commonest match is already implemented in Sketch Engine.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
IN - Informatics
OECD FORD branch
—
Result continuities
Project
Result was created during the realization of more than one project. More information in the Projects tab.
Continuities
P - Research and development project financed from public funds (with a link to CEP)<br>S - Specific university research
Others
Publication year
2015
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Electronic lexicography in the 21st century: linking lexical data in the digital age. Proceedings of the eLex 2015 conference, 11-13 August 2015, Herstmonceux Castle, United Kingdom.
ISBN
9789619359433
ISSN
—
e-ISSN
—
Number of pages
8
Pages from-to
397-404
Publisher name
Trojina, Institute for Applied Slovene Studies
Place of publication
Ljubljana
Event location
Herstmonceux
Event date
Aug 11, 2015
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—