Latin Treebanks in Review: An Evaluation of Morphological Tagging Across Time
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3ADA7KHNJ6" target="_blank" >RIV/00216208:11320/25:DA7KHNJ6 - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85204809265&partnerID=40&md5=c1292301b87f9a90321c2b969df83e5f" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85204809265&partnerID=40&md5=c1292301b87f9a90321c2b969df83e5f</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Latin Treebanks in Review: An Evaluation of Morphological Tagging Across Time
Original language description
Existing Latin treebanks draw from Latin’s long written tradition, spanning 17 centuries and a variety of cultures. Recent efforts have begun to harmonize these treebanks’ annotations to better train and evaluate morphological taggers. However, the heterogeneity of these treebanks must be carefully considered to build effective and reliable data. In this work, we review existing Latin treebanks to identify the texts they draw from, identify their overlap, and document their coverage across time and genre. We additionally design automated conversions of their morphological feature annotations into the conventions of standard Latin grammar. From this, we build new time-period data splits that draw from the existing treebanks which we use to perform a broad cross-time analysis for POS and morphological feature tagging. We find that BERT-based taggers outperform existing taggers while also being more robust to cross-domain shifts. © 2024 Association for Computational Linguistics.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development belongs under 2.2, social aspects under 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2024
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
ML4AL - Workshop on Machine Learning for Ancient Languages, Proceedings of the Workshop
ISBN
979-8-89176-144-5
ISSN
—
e-ISSN
—
Number of pages
16
Pages from-to
203-218
Publisher name
Association for Computational Linguistics (ACL)
Place of publication
—
Event location
Hybrid, Bangkok
Event date
Jan 1, 2025
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—