Supervised Morphological Segmentation Using Rich Annotated Lexicon

The result's identifiers

  • Result code in IS VaVaI

    RIV/00216208:11320/19:10405556 - isvavai.cz: https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F19%3A10405556

  • Result on the web

    http://lml.bas.bg/ranlp2019/proceedings-ranlp-2019.pdf

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    English

  • Original language name

    Supervised Morphological Segmentation Using Rich Annotated Lexicon

  • Original language description

    Morphological segmentation of words is the process of dividing a word into smaller units called morphemes; it is especially tricky when the language in question is morphologically rich or polysynthetic. In this work, we designed and evaluated several Recurrent Neural Network (RNN) based models as well as various other machine learning based approaches for the morphological segmentation task. We trained our models using annotated segmentation lexicons. To evaluate the effect of the training data size on our models, we decided to create a large hand-annotated morphologically segmented corpus of Persian words, which is, to the best of our knowledge, the first and the only segmentation lexicon for the Persian language. In the experimental phase, using the hand-annotated Persian lexicon and two smaller similar lexicons for Czech and Finnish, we evaluated the effect of the training data size, different hyper-parameter settings, and different RNN-based models.

  • Czech name

  • Czech description

Classification

  • Type

    D - Article in proceedings

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

    Result was created during the realization of more than one project. More information in the Projects tab.

  • Continuities

    P - Research and development project financed from public funds (with a link to CEP)

Others

  • Publication year

    2019

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    International Conference "Recent Advances in Natural Language Processing"

  • ISBN

    978-954-452-055-7

  • ISSN

    1313-8502

  • e-ISSN

  • Number of pages

    10

  • Pages from-to

    52-61

  • Publisher name

    INCOMA Ltd.

  • Place of publication

    Varna, Bulgaria

  • Event location

    Varna, Bulgaria

  • Event date

    Sep 2, 2019

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article