Construction of Amharic information retrieval resources and corpora

The result's identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AU77CT9GK" target="_blank" >RIV/00216208:11320/25:U77CT9GK - isvavai.cz</a>

  • Result on the web

    <a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85197301968&doi=10.1007%2fs10579-024-09719-x&partnerID=40&md5=54b748f1a7c16f31baa227ead33e086d" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85197301968&doi=10.1007%2fs10579-024-09719-x&partnerID=40&md5=54b748f1a7c16f31baa227ead33e086d</a>

  • DOI - Digital Object Identifier

    <a href="http://dx.doi.org/10.1007/s10579-024-09719-x" target="_blank" >10.1007/s10579-024-09719-x</a>

Alternative languages

  • Result language

    English

  • Original language name

    Construction of Amharic information retrieval resources and corpora

  • Original language description

    The development of information retrieval systems and natural language processing tools has been made possible for many natural languages by the availability of natural language resources and corpora. Although Amharic is the working language of Ethiopia, it remains an under-resourced language. To date, there are no adequate resources or corpora for Amharic ad-hoc retrieval evaluation: the existing ones are not publicly accessible and are not suitable for the scientific evaluation of information retrieval systems. To promote the development of Amharic ad-hoc retrieval, we built an ad-hoc retrieval test collection that consists of raw text, morphologically annotated stem-based and root-based corpora, a stopword list, stem-based and root-based lexicons, and WordNet-like resources. We also created word embeddings using the raw text and the morphologically segmented forms of the corpora. When building these resources and corpora, we took the morphological characteristics of the language heavily into account. The aim of this paper is to present these Amharic resources and corpora, which we have made available to the research community for information retrieval tasks. The resources and corpora are also evaluated experimentally and by linguists. © The Author(s), under exclusive licence to Springer Nature B.V. 2024.

Classification

  • Type

    J<sub>SC</sub> - Article in a specialist periodical, which is included in the SCOPUS database

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Others

  • Publication year

    2024

  • Confidentiality

    S - Complete and true data about the project are not subject to protection under special legal regulations

Data specific for result type

  • Name of the periodical

    Language Resources and Evaluation

  • ISSN

    1574-020X

  • Volume of the periodical

    2024

  • Issue of the periodical within the volume

    2024

  • Country of publishing house

    US - UNITED STATES

  • Number of pages

    29

  • Pages from-to

    1-29

  • EID of the result in the Scopus database

    2-s2.0-85197301968