Construction of Amharic information retrieval resources and corpora

The result's identifiers

Result code in IS VaVaI
RIV/00216208:11320/25:U77CT9GK - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AU77CT9GK)
Result on the web
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85197301968&doi=10.1007%2fs10579-024-09719-x&partnerID=40&md5=54b748f1a7c16f31baa227ead33e086d
DOI - Digital Object Identifier
10.1007/s10579-024-09719-x (http://dx.doi.org/10.1007/s10579-024-09719-x)

Alternative languages

Result language
English
Original language name
Construction of Amharic information retrieval resources and corpora
Original language description
The development of information retrieval systems and natural language processing tools has been made possible for many natural languages because of the availability of natural language resources and corpora. Although Amharic is the working language of Ethiopia, it is still an under-resourced language. There are no adequate resources and corpora for Amharic ad-hoc retrieval evaluation to date. The existing ones are not publicly accessible and are not suitable for making scientific evaluation of information retrieval systems. To promote the development of Amharic ad-hoc retrieval, we build an ad-hoc retrieval test collection that consists of raw text, morphologically annotated stem-based and root-based corpora, a stopword list, stem-based and root-based lexicons, and WordNet-like resources. We also created word embeddings using the raw text and morphologically segmented forms of the corpora. When building these resources and corpora, we heavily consider the morphological characteristics of the language. The aim of this paper is to present these Amharic resources and corpora that we made available to the research community for information retrieval tasks. These resources and corpora are also evaluated experimentally and by linguists. © The Author(s), under exclusive licence to Springer Nature B.V. 2024.
Czech name
—
Czech description
—

Classification

Type
J<sub>SC</sub> - Article in a specialist periodical, which is included in the SCOPUS database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
—
Continuities
—

Others

Publication year
2024
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

Name of the periodical
Language Resources and Evaluation
ISSN
1574-020X
e-ISSN
—
Volume of the periodical
2024
Issue of the periodical within the volume
2024
Country of publishing house
US - UNITED STATES
Number of pages
29
Pages from-to
1-29
UT code for WoS article
—
EID of the result in the Scopus database
2-s2.0-85197301968

Similar results(10)

HaBiT system
Corpus Generation to Develop Amharic Morphological Segmenter
Annotated Amharic Corpora
