TartuNLP @ SIGTYP 2024 Shared Task: Adapting XLM-RoBERTa for Ancient and Historical Languages
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3ALYWAWLAQ" target="_blank" >RIV/00216208:11320/25:LYWAWLAQ - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85189633553&partnerID=40&md5=cd7bcd5d9096a21fd919e344a11bf1e6" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85189633553&partnerID=40&md5=cd7bcd5d9096a21fd919e344a11bf1e6</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
TartuNLP @ SIGTYP 2024 Shared Task: Adapting XLM-RoBERTa for Ancient and Historical Languages
Original language description
We present our submission to the unconstrained subtask of the SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages, covering morphological annotation, POS-tagging, lemmatization, and character- and word-level gap-filling. We developed a simple, uniform, and computationally lightweight approach based on the adapters framework using parameter-efficient fine-tuning. We applied the same adapter-based approach to all tasks and 16 languages by fine-tuning stacked language- and task-specific adapters. Our submission placed second overall out of three submissions, with first place in word-level gap-filling. Our results show the feasibility of adapting language models pre-trained on modern languages to historical and ancient languages via adapter training. © 2024 Association for Computational Linguistics.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2024
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
SIGTYP - Workshop Res. Comput. Linguist. Typology Multiling. NLP, Proc. Workshop
ISBN
979-889176071-4
ISSN
—
e-ISSN
—
Number of pages
11
Pages from-to
120-130
Publisher name
Association for Computational Linguistics (ACL)
Place of publication
—
Event location
St. Julian's, Malta
Event date
Jan 1, 2025
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—