Supervised Morphological Segmentation Using Rich Annotated Lexicon
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F19%3A10405556" target="_blank" >RIV/00216208:11320/19:10405556 - isvavai.cz</a>
Result on the web
<a href="http://lml.bas.bg/ranlp2019/proceedings-ranlp-2019.pdf" target="_blank" >http://lml.bas.bg/ranlp2019/proceedings-ranlp-2019.pdf</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Supervised Morphological Segmentation Using Rich Annotated Lexicon
Original language description
Morphological segmentation of words is the process of dividing a word into smaller units called morphemes; it is especially tricky for morphologically rich or polysynthetic languages. In this work, we designed and evaluated several Recurrent Neural Network (RNN) based models as well as various other machine learning based approaches for the morphological segmentation task. We trained our models using annotated segmentation lexicons. To evaluate the effect of the training data size on our models, we decided to create a large hand-annotated morphologically segmented corpus of Persian words, which is, to the best of our knowledge, the first and the only segmentation lexicon for the Persian language. In the experimental phase, using the hand-annotated Persian lexicon and two smaller similar lexicons for Czech and Finnish, we evaluated the effect of the training data size, different hyper-parameter settings, and different RNN-based models.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
Result was created during the realization of more than one project. More information in the Projects tab.
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2019
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
International Conference "Recent Advances in Natural Language Processing"
ISBN
978-954-452-055-7
ISSN
1313-8502
e-ISSN
—
Number of pages
10
Pages from-to
52-61
Publisher name
INCOMA Ltd.
Place of publication
Varna, Bulgaria
Event location
Varna, Bulgaria
Event date
Sep 2, 2019
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—