Behr at EvaLatin 2024: Latin Dependency Parsing Using Historical Sentence Embeddings
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/25:DLBSJT6M - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3ADLBSJT6M)
Result on the web
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85195176114&partnerID=40&md5=69c3c78bc426f034eba61d968b40f787
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Behr at EvaLatin 2024: Latin Dependency Parsing Using Historical Sentence Embeddings
Original language description
This paper identifies the system used for my submission to EvaLatin’s shared dependency parsing task as part of the LT4HALA 2024 workshop. EvaLatin presented new Latin prose and poetry dependency test data from potentially different time periods, and imposed no restriction on training data or model selection for the task. This paper, therefore, sought to build a general Latin dependency parser that would perform accurately regardless of the Latin age to which the test data belongs. To train a general parser, all of the available Universal Dependencies treebanks were used, but in order to address the changes in the Latin language over time, this paper introduces historical sentence embeddings. A model was trained to encode sentences of the same Latin age into vectors of high cosine similarity, which are referred to as historical sentence embeddings. The system introduces these historical sentence embeddings into a biaffine dependency parser with the hopes of enabling training across the Latin treebanks in a more efficacious manner, but their inclusion shows no improvement over the base model. © 2024 ELRA Language Resources Association: CC BY-NC 4.0.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2024
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Workshop Lang. Technol. Hist. Anc. Lang., LT4HALA LREC-COLING - Workshop Proc.
ISBN
978-249381446-3
ISSN
—
e-ISSN
—
Number of pages
5
Pages from-to
198-202
Publisher name
European Language Resources Association (ELRA)
Place of publication
—
Event location
Torino, Italia
Event date
Jan 1, 2025
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—