Supervised Morphological Segmentation Using Rich Annotated Lexicon

Identifikátory výsledku

Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F19%3A10405556" target="_blank" >RIV/00216208:11320/19:10405556 - isvavai.cz</a>
Výsledek na webu
<a href="http://lml.bas.bg/ranlp2019/proceedings-ranlp-2019.pdf" target="_blank" >http://lml.bas.bg/ranlp2019/proceedings-ranlp-2019.pdf</a>
DOI - Digital Object Identifier
—

Alternativní jazyky

Jazyk výsledku
angličtina
Název v původním jazyce
Supervised Morphological Segmentation Using Rich Annotated Lexicon
Popis výsledku v původním jazyce
Morphological segmentation of words is the process of dividing a word into smaller units called morphemes; it is tricky especially when a morphologically rich or polysynthetic language is under question. In this work, we designed and evaluated several Recurrent Neural Network (RNN) based models as well as various other machine learning based approaches for the morphological segmentation task. We trained our models using annotated segmentation lexicons. To evaluate the effect of the training data size on our models, we decided to create a large hand-annotated morphologically segmented corpus of Persian words, which is, to the best of our knowledge, the first and the only segmentation lexicon for the Persian language. In the experimental phase, using the hand-annotated Persian lexicon and two smaller similar lexicons for Czech and Finnish languages, we evaluated the effect of the training data size, different hyper-parameters settings as well as different RNN-based models.
Název v anglickém jazyce
Supervised Morphological Segmentation Using Rich Annotated Lexicon
Popis výsledku anglicky
Morphological segmentation of words is the process of dividing a word into smaller units called morphemes; it is tricky especially when a morphologically rich or polysynthetic language is under question. In this work, we designed and evaluated several Recurrent Neural Network (RNN) based models as well as various other machine learning based approaches for the morphological segmentation task. We trained our models using annotated segmentation lexicons. To evaluate the effect of the training data size on our models, we decided to create a large hand-annotated morphologically segmented corpus of Persian words, which is, to the best of our knowledge, the first and the only segmentation lexicon for the Persian language. In the experimental phase, using the hand-annotated Persian lexicon and two smaller similar lexicons for Czech and Finnish languages, we evaluated the effect of the training data size, different hyper-parameters settings as well as different RNN-based models.

Klasifikace

Druh
D - Stať ve sborníku
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)

Návaznosti výsledku

Projekt
Výsledek vznikl pri realizaci vícero projektů. Více informací v záložce Projekty.
Návaznosti
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)

Ostatní

Rok uplatnění
2019
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Údaje specifické pro druh výsledku

Název statě ve sborníku
International Conference "Recent Advances in Natural Language Processing"
ISBN
978-954-452-055-7
ISSN
1313-8502
e-ISSN
—
Počet stran výsledku
10
Strana od-do
52-61
Název nakladatele
INCOMA Ltd.
Místo vydání
Varna, Bulgaria
Místo konání akce
Varna, Bulgaria
Datum konání akce
2. 9. 2019
Typ akce podle státní příslušnosti
WRD - Celosvětová akce
Kód UT WoS článku
—

Podobné výsledky(10)

Building a Morphological Network for Persian on Top of a Morpheme-Segmented Lexicon Morphological Networks for Persian and Turkish: What Can Be Induced from Morpheme Segmentation?Morphological Segmentation with Neural Networks: Performance Effects of Architecture, Data Size, and Cross-Lingual Transfer in Seven Languages

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Supervised Morphological Segmentation Using Rich Annotated Lexicon

Identifikátory výsledku

Alternativní jazyky

Klasifikace

Návaznosti výsledku

Ostatní

Údaje specifické pro druh výsledku

Podobné výsledky(10)

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Popis výsledku

Identifikátory výsledku

Identifikátory výsledku

Alternativní jazyky

Alternativní jazyky

Klasifikace

Klasifikace

Návaznosti výsledku

Návaznosti výsledku

Ostatní

Ostatní

Údaje specifické pro druh výsledku

Údaje specifické pro druh výsledku

Podobné výsledky(10)