Hybrid embeddings for transition-based dependency parsing of free word order languages

The result's identifiers

  • Result code in IS VaVaI

    RIV/00216208:11320/23:5KLEQ72G - isvavai.cz: https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3A5KLEQ72G

  • Result on the web

    https://www.scopus.com/inward/record.uri?eid=2-s2.0-85150066344&doi=10.1016%2fj.ipm.2023.103334&partnerID=40&md5=bf97bf992dc6554eb855a6e6dbddd1ae

  • DOI - Digital Object Identifier

    10.1016/j.ipm.2023.103334 (http://dx.doi.org/10.1016/j.ipm.2023.103334)

Alternative languages

  • Result language

    English

  • Original language name

    Hybrid embeddings for transition-based dependency parsing of free word order languages

  • Original language description

    "Neural dependency parsing relies on embeddings such as word embeddings and part-of-speech (POS) embeddings. We propose embeddings that convey more meaning for Arabic-scripted, morphologically rich, free word order languages. In such languages, the part of speech (POS) and morphological features (feats) of a particular word in a sentence govern the suffixes of another word in the same sentence. Keeping this in view, we extend the famous quote “a word is known by the company it keeps” and propose that “a POS is known by the company of suffixes it keeps” and “a morphological feat is known by the company of suffixes it keeps”. We propose two novel embeddings: XPOSngram and FEATSngram embeddings. These embeddings are trained on heterogeneous items, i.e. pairs of a language-specific POS (XPOS) and n-grams, referred to as ‘XPOSngram’, and pairs of morphological feats and n-grams, called ‘FEATSngram’. We call this new type of embedding hybrid embeddings. We perform experiments on five treebanks taken from Universal Dependencies (UD), which belong to four Arabic-scripted, morphologically rich, free word order, low-resource languages (Urdu, Arabic, Persian and Uyghur). These treebanks consist of 42,985 sentences in total. The experimental results show that, on average, the proposed approach gains ≈1.24%, ≈0.84% and ≈3.31% in unlabelled attachment score (UAS) over state-of-the-art approaches based on language-specific POS embeddings, universal POS embeddings and n-gram embeddings, respectively. We have compared the results of hybrid embeddings for Arabic with the state-of-the-art ArWordVec embeddings. The proposed solution achieves a UAS ≈10.27% higher than that achieved by ArWordVec. We have further compared the results of hybrid embeddings for Urdu with two state-of-the-art Urdu word embeddings. The results show that the best hybrid embedding has a UAS ≈3.32% and ≈5.015% higher than the two embeddings.
    We have also tested the proposed methodology on five treebanks of non-Arabic-scripted languages from UD: Belarusian, Dutch, German, Greek and Hungarian. The experimental results demonstrate that the proposed approach not only performs better on Arabic-scripted languages but also generalizes well to non-Arabic-scripted, free word order languages, with an average UAS gain of ≈2.5%, ≈2.8% and ≈7.5% over the state-of-the-art XPOS-, UPOS- and n-gram-based approaches. © 2023 Elsevier Ltd"
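    The abstract says only that the hybrid embeddings are trained on heterogeneous (XPOS, n-gram) and (FEATS, n-gram) items and that parsers are compared by UAS. The following is a minimal, hypothetical sketch of how such training items might be produced and how UAS is computed. It rests on several assumptions not stated in the record. The n-grams are taken to be word-final character n-grams (suffixes) of the other words in a fixed window around each word. The window size and maximum n are arbitrary. The function names `suffix_ngrams`, `hybrid_pairs` and `uas` are illustrative and are not the authors' code. The resulting pairs could then be passed to a skip-gram-style embedding trainer, a step that is omitted here.

```python
from typing import List, Tuple

# A token as (FORM, XPOS, FEATS), mirroring three CoNLL-U columns.
Token = Tuple[str, str, str]


def suffix_ngrams(form: str, max_n: int = 3) -> List[str]:
    """Word-final character n-grams of length 1..max_n (assumed n-gram scheme)."""
    return [form[-n:] for n in range(1, min(max_n, len(form)) + 1)]


def hybrid_pairs(sentence: List[Token], field: str = "xpos",
                 window: int = 2, max_n: int = 3) -> List[Tuple[str, str]]:
    """Hypothetical item builder: pair each word's XPOS tag (field="xpos") or
    FEATS string (field="feats") with the suffix n-grams of every other word
    within +/- window positions."""
    col = {"xpos": 1, "feats": 2}[field]
    pairs: List[Tuple[str, str]] = []
    for i, token in enumerate(sentence):
        tag = token[col]
        start, stop = max(0, i - window), min(len(sentence), i + window + 1)
        for j in range(start, stop):
            if j == i:
                continue  # a word's own suffixes are not used as its context
            for gram in suffix_ngrams(sentence[j][0], max_n):
                pairs.append((tag, gram))
    return pairs


def uas(gold_heads: List[int], pred_heads: List[int]) -> float:
    """Unlabelled attachment score in percent: share of tokens whose predicted
    head index equals the gold head index (dependency labels are ignored)."""
    if not gold_heads or len(gold_heads) != len(pred_heads):
        raise ValueError("need two non-empty head lists of equal length")
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return 100.0 * correct / len(gold_heads)


if __name__ == "__main__":
    # Invented toy sentence with English Penn-style tags, for illustration only.
    toy = [("houses", "NNS", "Number=Plur"), ("fell", "VBD", "Tense=Past")]
    print(hybrid_pairs(toy, window=1, max_n=2))
    print(hybrid_pairs(toy, field="feats", window=1, max_n=2))
    print(uas([2, 0, 2], [2, 0, 1]))
```

    In this sketch, each tag is paired with the suffixes of its neighbours rather than its own. That choice follows the abstract's claim that a word's POS and features govern the suffixes of *another* word in the sentence. Whether the paper uses the same context definition is not stated in this record.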

Classification

  • Type

    J<sub>SC</sub> - Article in a specialist periodical, which is included in the SCOPUS database

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformatics (hardware development falls under 2.2, social aspects under 5.8)

Others

  • Publication year

    2023

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Name of the periodical

    "Information Processing and Management"

  • ISSN

    0306-4573

  • Volume of the periodical

    60

  • Issue of the periodical within the volume

    3

  • Country of publishing house

    US - UNITED STATES

  • Number of pages

    21

  • Pages from-to

    1-21

  • UT code for WoS article

    000956224800001

  • EID of the result in the Scopus database

    2-s2.0-85150066344