Semi-Automatic Construction of Word-Formation Networks (for Polish and Spanish)
Result identifiers
Result code in IS VaVaI
RIV/00216208:11320/18:10390174 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F18%3A10390174)
Result on the web
http://www.lrec-conf.org/proceedings/lrec2018/pdf/231.pdf
DOI - Digital Object Identifier
—
Alternative languages
Language of the result
English
Title in the original language
Semi-Automatic Construction of Word-Formation Networks (for Polish and Spanish)
Description in the original language
The paper presents a semi-automatic method for the construction of derivational networks. The proposed approach applies a sequential pattern mining technique to construct useful morphological features in an unsupervised manner. The features take the form of regular expressions and are then used to feed a machine-learned ranking model. The network is constructed by applying the resulting model to sort the lists of possible base words and selecting the most probable ones. Apart from a relatively small training set and a lexicon, this approach does not require any additional language resources, such as a list of alternation groups or POS tags. The proposed approach is applied to the lexeme sets of two languages, Polish and Spanish, resulting in two novel word-formation networks. Finally, the network constructed for Polish is merged with the derivational connections extracted from the Polish WordNet and those resulting from derivational rules developed by a linguist.
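The ranking step described above (regular-expression features, candidate base words sorted by score) can be illustrated with a deliberately simplified Python sketch. It is not the authors' implementation: here the features are ending alternations read off the longest common prefix of each training pair rather than patterns found by sequential pattern mining, and raw pattern frequencies stand in for the machine-learned ranking model. All function names and the toy Spanish word pairs are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical sketch: ending-alternation patterns stand in for the paper's
# sequential-pattern-mined features, and raw pattern counts stand in for
# its machine-learned ranking model.


def alternation(derivative: str, base: str) -> tuple[str, str]:
    """Strip the longest common prefix and return (derivative ending, base ending)."""
    k = 0
    for x, y in zip(derivative, base):
        if x != y:
            break
        k += 1
    return derivative[k:], base[k:]


def pattern_regex(pattern: tuple[str, str]) -> re.Pattern:
    """Compile a regex over 'derivative>base' strings.

    The backreference to group 1 forces the base to start with the same
    stem that precedes the derivative ending.
    """
    d_end, b_end = pattern
    return re.compile(r"(\w*)" + re.escape(d_end) + ">" + r"\1" + re.escape(b_end))


def mine_patterns(pairs, min_count: int = 1):
    """Count the alternations seen in (derivative, base) training pairs."""
    counts = Counter(alternation(d, b) for d, b in pairs)
    return [(pattern_regex(p), c) for p, c in counts.items() if c >= min_count]


def score(derivative: str, candidate: str, patterns) -> int:
    """Sum the counts of all patterns that license candidate as a base of derivative."""
    key = f"{derivative}>{candidate}"
    return sum(c for rx, c in patterns if rx.fullmatch(key))


def rank_bases(derivative: str, lexicon, patterns) -> list[str]:
    """Sort lexicon words (excluding the derivative itself) by descending score."""
    candidates = [w for w in lexicon if w != derivative]
    return sorted(candidates, key=lambda w: score(derivative, w, patterns), reverse=True)


if __name__ == "__main__":
    # Toy Spanish training pairs (illustrative only, not the paper's data).
    train = [("cantante", "cantar"), ("estudiante", "estudiar"),
             ("librería", "libro"), ("zapatería", "zapato")]
    patterns = mine_patterns(train)
    print(rank_bases("pescadería", ["pescado", "pescar", "pesca", "pescadería"], patterns))
    # prints ['pescado', 'pescar', 'pesca']
```

Summing counts keeps the example short; a learned ranker, as the abstract describes, would instead weight such regex features using the training set before sorting the candidates.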
Classification
Type
D - Paper in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result links
Project
The result was produced within several projects.
Links
P - Research and development project funded from public sources (with a link to CEP)
Other
Year of application
2018
Data confidentiality code
S - Complete and accurate project data are not subject to protection under special legal regulations
Data specific to the result type
Proceedings title
Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018)
ISBN
979-10-95546-00-9
ISSN
—
e-ISSN
not stated
Number of pages
8
Pages (from-to)
1853-1860
Publisher
European Language Resources Association
Place of publication
Paris, France
Event venue
Miyazaki, Japan
Event date
7 May 2018
Event type by nationality
WRD - Worldwide event
WoS UT article code
—