AUTOMATIC LEMMATIZATION OF ANCIENT GREEK INSCRIPTIONS: A PRESENTATION OF AGILE
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3A4IQH7Y6K" target="_blank" >RIV/00216208:11320/25:4IQH7Y6K - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85201624017&doi=10.19272%2f202413701002&partnerID=40&md5=77d1f4c3bfb4d6a8b4ea4d655af04d6f" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85201624017&doi=10.19272%2f202413701002&partnerID=40&md5=77d1f4c3bfb4d6a8b4ea4d655af04d6f</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.19272/202413701002" target="_blank" >10.19272/202413701002</a>
Alternative languages
Result language
English
Original language name
AUTOMATIC LEMMATIZATION OF ANCIENT GREEK INSCRIPTIONS: A PRESENTATION OF AGILE
Original language description
In this paper, we present the first automatic lemmatizer for Ancient Greek Inscriptions (AGILe). Lemmatization of ancient texts, the process of tagging each word with the base form equal to the dictionary entry, benefits researchers, since searches on a lemmatized corpus can retrieve all occurrences of a lemma in one query. Whereas the corpus of literary texts (e.g. the Thesaurus Linguae Graecae) has been lemmatized, the vast majority of Ancient Greek inscriptions have not. Lemmatization is especially useful for inscriptions, since these texts show a great amount of dialectal and spelling variation, but lemmatizing this vast corpus by hand would be an enormous task. We evaluated the performance of five existing automatic lemmatizers, developed for literary Greek, on epigraphic texts. Since their performance was disappointing (61.5% accuracy at best), we developed a new lemmatizer dedicated to Greek inscriptions. Our lemmatizer reaches an accuracy of 85.6%. We provide a detailed error analysis as well as concrete suggestions for future improvement, as first steps towards the integration of AGILe in an online corpus of inscriptions. © 2024 Fabrizio Serra Editore Srl. All rights reserved.
Czech name
—
Czech description
—
Classification
Type
J<sub>SC</sub> - Article in a specialist periodical, which is included in the SCOPUS database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2024
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Journal of Epigraphic Studies
ISSN
2611-979X
e-ISSN
—
Volume of the periodical
7
Issue of the periodical within the volume
2024
Country of publishing house
US - UNITED STATES
Number of pages
22
Pages from-to
29-50
UT code for WoS article
—
EID of the result in the Scopus database
2-s2.0-85201624017