An accurate transformer-based model for transition-based dependency parsing of free word order languages
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AYJ3TDMYL" target="_blank" >RIV/00216208:11320/25:YJ3TDMYL - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85196791144&doi=10.1016%2fj.jksuci.2024.102107&partnerID=40&md5=53c288a4abdb146ff518c1db179c9722" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85196791144&doi=10.1016%2fj.jksuci.2024.102107&partnerID=40&md5=53c288a4abdb146ff518c1db179c9722</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.jksuci.2024.102107" target="_blank" >10.1016/j.jksuci.2024.102107</a>
Alternative languages
Result language
English
Title in original language
An accurate transformer-based model for transition-based dependency parsing of free word order languages
Description in original language
Transformer models are the state of the art in Natural Language Processing (NLP) and form the core of Large Language Models (LLMs). We propose a transformer-based model for transition-based dependency parsing of free word order languages. We performed experiments on five treebanks from the Universal Dependencies (UD) dataset, version 2.12. Our experiments show that a transformer model trained with dynamic word embeddings performs better than a multilayer perceptron (MLP) trained on state-of-the-art static word embeddings, even when the dynamic word embeddings have a vocabulary ten times smaller than the static ones. For Urdu, the transformer trained on dynamic word embeddings achieves an unlabeled attachment score (UAS) of 84.17%. This is approximately 3.6 and 1.9 percentage points higher than the UAS scores of 80.56857% and 82.26859% achieved by the MLP using two state-of-the-art static word embeddings. In addition to Urdu, the proposed approach is evaluated on Arabic, Persian and Uyghur in terms of UAS, and the results suggest that it outperforms the MLP-based approaches.
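The headline metric, UAS, is the percentage of tokens whose predicted syntactic head matches the gold head. The quoted gains are absolute differences between two such percentages.

The sketch below makes both points concrete under stated assumptions:
- Heads are given as integer lists following the CoNLL-U HEAD convention (1-based token indices, 0 = ROOT).
- Every token is counted; some evaluations exclude punctuation.
- The function names are hypothetical and not taken from the paper.

```python
from typing import Sequence


def unlabeled_attachment_score(gold_heads: Sequence[int], pred_heads: Sequence[int]) -> float:
    """Percentage of tokens whose predicted head index equals the gold head.

    Assumes CoNLL-U-style heads: 1-based token indices, 0 for the artificial ROOT.
    Every token is counted (no punctuation filtering).
    """
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold_heads:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return 100.0 * correct / len(gold_heads)


def gap_in_points(ours: float, baseline: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return ours - baseline


if __name__ == "__main__":
    # Hypothetical 5-token sentence, not data from the paper.
    gold = [2, 0, 2, 5, 3]
    pred = [2, 0, 2, 3, 3]  # token 4 gets the wrong head
    print(unlabeled_attachment_score(gold, pred))  # 4 of 5 heads correct -> 80.0

    # Urdu UAS figures as reported in the abstract.
    transformer = 84.17
    for mlp in (80.56857, 82.26859):
        print(f"{gap_in_points(transformer, mlp):.1f} points")  # 3.6, then 1.9
```

Treating the gaps as percentage points rather than relative percentages is what makes the reported 3.6 and 1.9 line up with the raw scores.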
Classification
Type
J<sub>imp</sub> - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result links
Project
—
Links
—
Other
Year of application
2024
Data confidentiality code
S - Complete and truthful project data are not subject to protection under special legal regulations
Data specific to the result type
Journal title
JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES
ISSN
1319-1578
e-ISSN
2213-1248
Journal volume
36
Issue number within the volume
6
Country of the journal publisher
US - United States of America
Number of pages
12
Pages (from-to)
1-12
WoS UT code of the article
001261229500001
EID of the result in the Scopus database
2-s2.0-85196791144