Improving Latin Dependency Parsing by Combining Treebanks and Predictions
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/25:IHFX53EU - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AIHFX53EU)
Result on the web
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85216580264&partnerID=40&md5=ef3021de3b5b84c97577b16b7f7d7772
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Improving Latin Dependency Parsing by Combining Treebanks and Predictions
Original language description
This paper introduces new models designed to improve the morpho-syntactic parsing of the five largest Latin treebanks in the Universal Dependencies (UD) framework. First, using two state-of-the-art parsers, Trankit and Stanza, along with our custom UD tagger, we train new models on the five treebanks both individually and by combining them into novel merged datasets. We also test the models on the CIRCSE test set. In an additional experiment, we evaluate whether this set can be accurately tagged using the novel LASLA corpus (https://github.com/CIRCSE/LASLA). Second, we aim to improve the results by combining the predictions of different models through an atomic morphological feature voting system. The results of our two main experiments demonstrate significant improvements, particularly for the smaller treebanks, with LAS scores increasing by 16.10 and 11.85 percentage points for UDante and Perseus, respectively (Gamba and Zeman, 2023a). Additionally, the voting system for morphological features (FEATS) brings improvements, especially for the smaller Latin treebanks: 3.15 percentage points for Perseus and 2.47 for CIRCSE. Tagging the CIRCSE set with our custom model using the LASLA model improves POS by 6.71 and FEATS by 11.04 percentage points compared to our best-performing UD PROIEL model. Our results show that larger datasets and ensemble predictions can significantly improve performance. © 2024 Association for Computational Linguistics.
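The abstract does not spell out how the atomic feature voting works, so the following is only a minimal sketch of one plausible reading: each model's UD FEATS string for a token is split into individual `Name=Value` pairs, and each feature is decided by majority vote. The function names `parse_feats` and `vote_feats`, the rule that a model omitting a feature votes for its absence, and the tie-break in favour of the earliest-listed model are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def parse_feats(feats: str) -> dict[str, str]:
    """Split a UD FEATS string such as 'Case=Nom|Number=Sing' into a dict."""
    if feats in ("_", ""):
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def vote_feats(predictions: list[str]) -> str:
    """Majority vote per atomic feature over several models' FEATS for one token.

    Assumptions (hypothetical, not taken from the paper):
    - a model that omits a feature counts as a vote for "feature absent";
    - ties go to the value seen first, i.e. the earliest model in the list.
    """
    parsed = [parse_feats(p) for p in predictions]
    # UD orders features alphabetically, ignoring case.
    names = sorted({name for p in parsed for name in p}, key=str.lower)
    winners = {}
    for name in names:
        votes = Counter(p.get(name) for p in parsed)
        value, _count = votes.most_common(1)[0]
        if value is not None:
            winners[name] = value
    return "|".join(f"{k}={v}" for k, v in winners.items()) or "_"


# Three hypothetical model outputs for the same token.
print(vote_feats([
    "Case=Nom|Number=Sing",
    "Case=Acc|Number=Sing",
    "Case=Nom|Gender=Masc|Number=Plur",
]))
```

Voting on each feature separately, rather than on the whole FEATS string, lets models that disagree on the full string still agree on most of its parts. That is presumably the motivation for the "atomic" design.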
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2024
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
NLP4DH - Int. Conf. Nat. Lang. Process. Digit. Humanit., Proc. Conf.
ISBN
979-889176181-0
ISSN
—
e-ISSN
—
Number of pages
13
Pages from-to
216-228
Publisher name
Association for Computational Linguistics (ACL)
Place of publication
—
Event location
Miami
Event date
Jan 1, 2025
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—