Context-based Translation for the Out of Vocabulary Words Applied to Hindi-English Cross-Lingual Information Retrieval

The result's identifiers

  • Result code in IS VaVaI

    RIV/00216208:11320/22:GAEAEM9K (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F22%3AGAEAEM9K)

  • Alternative codes found

    RIV/00216208:11320/23:Q9Y7NBVE

  • Result on the web

    https://doi.org/10.1080/02564602.2020.1843553

  • DOI - Digital Object Identifier

    10.1080/02564602.2020.1843553 (http://dx.doi.org/10.1080/02564602.2020.1843553)

Alternative languages

  • Result language

    English

  • Original language name

    Context-based Translation for the Out of Vocabulary Words Applied to Hindi-English Cross-Lingual Information Retrieval

  • Original language description

    Cross-Lingual Information Retrieval (CLIR) gives users the flexibility to query in their regional (source) language regardless of the language of the target documents. CLIR uses two currently popular translation techniques, Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). SMT and NMT achieve good results for foreign languages but not for Indian languages, due to non-absoluteness of the parallel corpus. Source-language user queries may contain Out Of Vocabulary (OOV) words that are not present in the parallel corpus; SMT may skip such words without translating them. In this paper, a context-based translation algorithm is proposed to translate the OOV words by utilizing two unlabeled and unrelated large raw corpora (in the source and target languages) and a small bilingual parallel corpus. Since SMT performs better than NMT for Hindi-to-English translation according to the literature, experimental results are evaluated on the FIRE datasets against a baseline SMT. The proposed algorithm improves Recall by up to 6.04% (to 0.8785) for FIRE 2010 and up to 3.96% (to 0.7365) for FIRE 2011, and Mean Average Precision (MAP) by up to 14.37% (to 0.3239) for FIRE 2010 and up to 5.46% (to 0.1988) for FIRE 2011, compared with the baseline SMT, which achieves Recall of 0.8284 and 0.7084 and MAP of 0.2832 and 0.1885 for FIRE 2010 and 2011, respectively. An analysis of the number of OOV words shows that the proposed algorithm reduces the number of OOV words more effectively, by up to 0.81% for FIRE 2010 and 1.73% for FIRE 2011.

  • Czech name

  • Czech description
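
The percentage gains quoted in the original language description appear to be relative improvements over the baseline SMT scores. The following is a minimal Python sketch for checking that reading, using only the figures stated in the description; the helper name `relative_gain` and the `reported` table are illustrative assumptions, not part of the paper or of this record.

```python
def relative_gain(proposed: float, baseline: float) -> float:
    """Relative improvement of `proposed` over `baseline`, in percent."""
    return (proposed - baseline) / baseline * 100.0


# (proposed score, baseline SMT score) pairs copied from the description.
reported = {
    ("Recall", "FIRE 2010"): (0.8785, 0.8284),
    ("Recall", "FIRE 2011"): (0.7365, 0.7084),
    ("MAP", "FIRE 2010"): (0.3239, 0.2832),
    ("MAP", "FIRE 2011"): (0.1988, 0.1885),
}

# Recompute each gain under the "relative improvement" assumption.
gains = {key: relative_gain(p, b) for key, (p, b) in reported.items()}

for (measure, dataset), gain in gains.items():
    print(f"{measure} {dataset}: {gain:.2f}%")
```

Under this assumption the recomputed gains agree with the quoted 6.04%, 3.96%, 14.37% and 5.46% to within roughly 0.01 percentage points; the small remainder is presumably due to the scores being reported to four decimal places.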

Classification

  • Type

    Jimp - Article in a specialist periodical that is included in the Web of Science database

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

  • Continuities

Others

  • Publication year

    2022

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Name of the periodical

    IETE Technical Review (Institution of Electronics and Telecommunication Engineers, India)

  • ISSN

    0256-4602

  • e-ISSN

    0974-5971

  • Volume of the periodical

    39

  • Issue of the periodical within the volume

    2

  • Country of publishing house

    IN - INDIA

  • Number of pages

    10

  • Pages from-to

    276-285

  • UT code for WoS article

    000592603100001

  • EID of the result in the Scopus database

    2-s2.0-85096773748