Hybrid embeddings for transition-based dependency parsing of free word order languages
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/23:5KLEQ72G (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3A5KLEQ72G)
Result on the web
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85150066344&doi=10.1016%2fj.ipm.2023.103334&partnerID=40&md5=bf97bf992dc6554eb855a6e6dbddd1ae
DOI - Digital Object Identifier
10.1016/j.ipm.2023.103334 (http://dx.doi.org/10.1016/j.ipm.2023.103334)
Alternative languages
Result language
English
Original language name
Hybrid embeddings for transition-based dependency parsing of free word order languages
Original language description
"Neural dependency parsing relies on embeddings such as word embeddings and part-of-speech (POS) embeddings. We propose embeddings that convey more meaning for Arabic-scripted, morphologically rich, free word order languages. In such languages, the POS and morphological features (feats) of a particular word in a sentence govern the suffixes of another word in the same sentence. With this in mind, we extend the famous quote “a word is known by the company it keeps” and propose that “a POS is known by the company of suffixes it keeps” and “a morphological feat is known by the company of suffixes it keeps”. We propose two novel embeddings: XPOSngram and FEATSngram embeddings. These embeddings are trained on heterogeneous items, i.e. pairs of a language-specific POS tag (XPOS) and an n-gram, referred to as ‘XPOSngram’, and pairs of morphological feats and n-grams, called ‘FEATSngram’. We call this new type of embeddings hybrid embeddings. We perform experiments on five treebanks taken from Universal Dependencies (UD), which belong to four Arabic-scripted, morphologically rich, free word order, low-resource languages (Urdu, Arabic, Persian and Uyghur). These treebanks contain 42,985 sentences in total. The experimental results show that, on average, the proposed approach achieves gains of ≈1.24%, ≈0.84% and ≈3.31% in unlabelled attachment score (UAS) over the state-of-the-art approaches based on language-specific POS embeddings, universal POS embeddings and n-gram embeddings, respectively. We have compared the results of hybrid embeddings for Arabic with the state-of-the-art ArWordVec embeddings: the proposed solution achieves a UAS ≈10.27% higher than that achieved by ArWordVec. We have further compared the results of hybrid embeddings for Urdu with two state-of-the-art Urdu word embeddings: the best hybrid embedding achieves a UAS ≈3.32% and ≈5.015% higher than the two embeddings.
We have also tested the proposed methodology on five treebanks of non-Arabic-scripted languages from UD: Belarusian, Dutch, German, Greek and Hungarian. The experimental results demonstrate that the proposed approach not only outperforms the baselines on Arabic-scripted languages but also generalizes well to non-Arabic-scripted, free word order languages, with average UAS gains of ≈2.5%, ≈2.8% and ≈7.5% over the state-of-the-art XPOS-, UPOS- and n-gram-based approaches. © 2023 Elsevier Ltd"
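The abstract describes training embeddings on heterogeneous (tag, n-gram) items and reports results in UAS, but it does not say exactly how the pairs are extracted or which trainer consumes them. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes a skip-gram-style context window, pairs each word's tag (an XPOS tag for XPOSngram, or a FEATS string for FEATSngram) with the word-final character n-grams of the *other* words in that window, and uses n-gram lengths 1 to 3. The helper names `suffix_ngrams`, `hybrid_pairs` and `uas` and the parameters `window` and `max_n` are hypothetical. The UAS helper follows the standard definition (share of tokens whose predicted head equals the gold head) and ignores any punctuation-exclusion convention.

```python
from typing import List, Sequence, Tuple


def suffix_ngrams(word: str, max_n: int = 3) -> List[str]:
    """Word-final character n-grams of length 1..max_n (hypothetical choice).

    Works on Unicode code points, so Arabic-script words are handled
    character by character like any other string.
    """
    return [word[-n:] for n in range(1, min(max_n, len(word)) + 1)]


def hybrid_pairs(
    tokens: Sequence[str],
    tags: Sequence[str],
    window: int = 2,
    max_n: int = 3,
) -> List[Tuple[str, str]]:
    """Pair each token's tag with the suffix n-grams of its neighbours.

    ASSUMPTION: skip-gram-style context window; the token's own suffixes
    are skipped because the abstract says a word's POS/feats govern the
    suffixes of *another* word. Pass XPOS tags for XPOSngram-style items
    or FEATS strings for FEATSngram-style items.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    pairs: List[Tuple[str, str]] = []
    for i, tag in enumerate(tags):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue  # only other words' suffixes form the context
            for gram in suffix_ngrams(tokens[j], max_n):
                pairs.append((tag, gram))
    return pairs


def uas(gold_heads: Sequence[int], pred_heads: Sequence[int]) -> float:
    """Unlabelled attachment score in percent: tokens with the correct head."""
    if len(gold_heads) != len(pred_heads) or not gold_heads:
        raise ValueError("need two non-empty head sequences of equal length")
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return 100.0 * correct / len(gold_heads)


if __name__ == "__main__":
    # Toy input with hypothetical tags T1/T2.
    print(hybrid_pairs(["ab", "cde"], ["T1", "T2"], window=1, max_n=2))
    # UD-style heads (0 = root): two of three heads are correct.
    print(uas([2, 0, 2], [2, 0, 1]))
```

The resulting (tag, n-gram) pairs could then be fed to any word2vec-style trainer that accepts explicit (target, context) pairs; the choice of trainer is likewise an assumption here.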
Czech name
—
Czech description
—
Classification
Type
JSC - Article in a specialist periodical that is included in the Scopus database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2023
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
"Information Processing and Management"
ISSN
0306-4573
e-ISSN
—
Volume of the periodical
60
Issue of the periodical within the volume
3
Country of publishing house
US - UNITED STATES
Number of pages
21
Pages from-to
1-21
UT code for WoS article
000956224800001
EID of the result in the Scopus database
2-s2.0-85150066344