Allen Institute for AI @ SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3ANYSU2TXF" target="_blank" >RIV/00216208:11320/25:NYSU2TXF - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85189634378&partnerID=40&md5=1803adfc9424430edf3373bb0474c0aa" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85189634378&partnerID=40&md5=1803adfc9424430edf3373bb0474c0aa</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Allen Institute for AI @ SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages
Original language description
In this paper, we describe Allen AI’s submission to the constrained track of the SIGTYP 2024 Shared Task. Using only the data provided by the organizers, we pretrained a transformer-based multilingual model, then finetuned it on the Universal Dependencies (UD) annotations of a given language for a downstream task. Our systems achieved decent performance on the test set, beating the baseline in most language-task pairs, yet struggled with subtoken tags in multiword expressions, as seen in Coptic and Ancient Hebrew. On the validation set, we obtained ≥70% F1-score on most language-task pairs. We also explored the cross-lingual capability of our trained models. This paper highlights our pretraining and finetuning process, and the findings from our internal evaluations. © 2024 Association for Computational Linguistics.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2024
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
SIGTYP - Workshop Res. Comput. Linguist. Typology Multiling. NLP, Proc. Workshop
ISBN
979-889176071-4
ISSN
—
e-ISSN
—
Number of pages
9
Pages from-to
151-159
Publisher name
Association for Computational Linguistics (ACL)
Place of publication
—
Event location
St. Julian's, Malta
Event date
Jan 1, 2025
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—