Construction of Amharic information retrieval resources and corpora
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AU77CT9GK" target="_blank" >RIV/00216208:11320/25:U77CT9GK - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85197301968&doi=10.1007%2fs10579-024-09719-x&partnerID=40&md5=54b748f1a7c16f31baa227ead33e086d" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85197301968&doi=10.1007%2fs10579-024-09719-x&partnerID=40&md5=54b748f1a7c16f31baa227ead33e086d</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/s10579-024-09719-x" target="_blank" >10.1007/s10579-024-09719-x</a>
Alternative languages
Result language
English
Original language name
Construction of Amharic information retrieval resources and corpora
Original language description
For many natural languages, the development of information retrieval systems and natural language processing tools has been made possible by the availability of language resources and corpora. Although Amharic is the working language of Ethiopia, it remains an under-resourced language. To date, there are no adequate resources or corpora for evaluating Amharic ad-hoc retrieval. The existing ones are not publicly accessible and are not suitable for the scientific evaluation of information retrieval systems. To promote the development of Amharic ad-hoc retrieval, we built an ad-hoc retrieval test collection that consists of raw text, morphologically annotated stem-based and root-based corpora, a stopword list, stem-based and root-based lexicons, and WordNet-like resources. We also created word embeddings from the raw text and from the morphologically segmented forms of the corpora. When building these resources and corpora, we took the morphological characteristics of the language heavily into account. The aim of this paper is to present these Amharic resources and corpora, which we have made available to the research community for information retrieval tasks. The resources and corpora were also evaluated both experimentally and by linguists. © The Author(s), under exclusive licence to Springer Nature B.V. 2024.
Czech name
—
Czech description
—
Classification
Type
J<sub>SC</sub> - Article in a specialist periodical, which is included in the SCOPUS database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2024
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Language Resources and Evaluation
ISSN
1574-020X
e-ISSN
—
Volume of the periodical
2024
Issue of the periodical within the volume
2024
Country of publishing house
US - UNITED STATES
Number of pages
29
Pages from-to
1-29
UT code for WoS article
—
EID of the result in the Scopus database
2-s2.0-85197301968