Towards accurate dependency parsing for Galician with limited resources
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AL39SNLPG" target="_blank" >RIV/00216208:11320/25:L39SNLPG - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85206510805&doi=10.26342%2f2024-73-18&partnerID=40&md5=0fa04a9f64eb9d360809cbc16f8c0cb2" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85206510805&doi=10.26342%2f2024-73-18&partnerID=40&md5=0fa04a9f64eb9d360809cbc16f8c0cb2</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.26342/2024-73-18" target="_blank" >10.26342/2024-73-18</a>
Alternative languages
Result language
English
Original language name
Towards accurate dependency parsing for Galician with limited resources
Original language description
Automatic syntactic parsing is a fundamental task in NLP. However, effective parsing tools require extensive, high-quality annotated treebanks to perform satisfactorily. Consequently, parsing quality for low-resource languages such as Galician remains inadequate. In this context, the present study explores several approaches to improving the automatic syntactic analysis of Galician within the UD framework. Experimentally, we analyze how model quality changes when the initial training corpus is enlarged with data from the Galician PUD treebank. Additionally, we explore the benefits of incorporating contextualized vector representations by comparing various BERT models. Lastly, we assess the impact of integrating cross-lingual training data from similar varieties, analyzing the models’ performance across the treebanks used. Our findings underscore (1) a positive correlation between augmented training data and enhanced model performance across the treebanks used; (2) the superior performance of monolingual BERT models compared to their multilingual counterparts; (3) an improvement in overall model performance across the treebanks used when cross-lingual data is incorporated. © 2024 Sociedad Española para el Procesamiento del Lenguaje Natural.
Czech name
—
Czech description
—
Classification
Type
J<sub>SC</sub> - Article in a specialist periodical, which is included in the SCOPUS database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2024
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Procesamiento del Lenguaje Natural
ISSN
1135-5948
e-ISSN
—
Volume of the periodical
2024
Issue of the periodical within the volume
73
Country of publishing house
US - UNITED STATES
Number of pages
11
Pages from-to
247-257
UT code for WoS article
—
EID of the result in the Scopus database
2-s2.0-85206510805