Attempting to separate inflection and derivation using vector space representations
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/19:10405598 - https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F19%3A10405598
Result on the web
https://www.aclweb.org/anthology/W19-8508
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Attempting to separate inflection and derivation using vector space representations
Original language description
We investigate to what extent inflection can be automatically separated from derivation, just based on the word forms. We expect pairs of inflected forms of the same lemma to be closer to each other than pairs of inflected forms of two different lemmas (still derived from the same root, though), given a proper distance measure. We estimate distances of word forms using edit distance, which represents character-based similarity, and word embedding similarity, which serves as a proxy for meaning similarity. Specifically, we explore Levenshtein and Jaro-Winkler edit distances, and cosine similarity of FastText word embeddings. We evaluate the separability of inflection and derivation on a sample from DeriNet, a database of word formation relations in Czech. We investigate the word distance measures directly, as well as embedded in a clustering setup. Best results are achieved by using a combination of Jaro-Winkler edit distance and word embedding cosine similarity, outperforming each of the individual measu
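The idea of combining a character-based measure with an embedding-based one can be illustrated with a minimal Python sketch. This is not the authors' implementation: it uses Levenshtein distance only (Jaro-Winkler is left out for brevity), the weighted average in `combined_similarity` is an assumed way of combining the two scores, and the three-dimensional vectors are made-up stand-ins for FastText embeddings.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Levenshtein distance rescaled to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def combined_similarity(a: str, b: str, vectors, weight: float = 0.5) -> float:
    """Assumed combination: weighted average of the two similarities."""
    return (weight * edit_similarity(a, b)
            + (1.0 - weight) * cosine_similarity(vectors[a], vectors[b]))


# Hypothetical toy vectors, NOT real FastText embeddings.
toy_vectors = {
    "ucitel": [0.9, 0.1, 0.0],    # lemma "teacher" (diacritics dropped)
    "ucitele": [0.8, 0.2, 0.0],   # inflected form of the same lemma
    "ucitelka": [0.3, 0.1, 0.9],  # derived lemma "female teacher"
}

if __name__ == "__main__":
    for other in ("ucitele", "ucitelka"):
        print(other, combined_similarity("ucitel", other, toy_vectors))
```

With these toy values the inflected form scores higher against the lemma than the derived form does, which is the kind of separation the abstract hopes a proper distance measure will show; real behaviour depends on the actual embeddings and data.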
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
Result was created during the realization of more than one project.
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2019
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the Second International Workshop on Resources and Tools for Derivational Morphology (DeriMo 2019)
ISBN
978-80-88132-08-0
ISSN
—
e-ISSN
—
Number of pages
10
Pages from-to
61-70
Publisher name
ÚFAL MFF UK
Place of publication
Praha, Czechia
Event location
Praha, Czechia
Event date
Sep 19, 2019
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—