Context-based Translation for the Out of Vocabulary Words Applied to Hindi-English Cross-Lingual Information Retrieval

Result identifiers

  • Result code in IS VaVaI

    RIV/00216208:11320/22:GAEAEM9K - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F22%3AGAEAEM9K)

  • Alternative codes found

    RIV/00216208:11320/23:Q9Y7NBVE

  • Result on the web

    https://doi.org/10.1080/02564602.2020.1843553

  • DOI - Digital Object Identifier

    10.1080/02564602.2020.1843553

Alternative languages

  • Result language

    English

  • Title in the original language

    Context-based Translation for the Out of Vocabulary Words Applied to Hindi-English Cross-Lingual Information Retrieval

  • Result description in the original language

    Cross-Lingual Information Retrieval (CLIR) lets users query in their regional (source) language regardless of the language of the target documents. CLIR relies on two prevailing translation techniques, Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). SMT and NMT achieve good results for foreign languages but not for Indian languages, due to non-absoluteness of the parallel corpus. Source-language user queries may contain Out Of Vocabulary (OOV) words, which are not present in the parallel corpus; SMT may skip such words without translating them. In this paper, a context-based translation algorithm is proposed to translate the OOV words by utilizing two unlabeled and unrelated large raw corpora (in the source and target languages) and a small bilingual parallel corpus. Since, according to the literature, SMT performs better than NMT for Hindi-to-English translation, experimental results are evaluated on the FIRE datasets against a baseline SMT. The proposed algorithm improves Recall by up to 6.04% (0.8785) for FIRE 2010 and by up to 3.96% (0.7365) for FIRE 2011, and Mean Average Precision (MAP) by up to 14.37% (0.3239) for FIRE 2010 and by up to 5.46% (0.1988) for FIRE 2011. The baseline SMT achieves Recall of 0.8284 and 0.7084, and MAP of 0.2832 and 0.1885, for FIRE 2010 and 2011 respectively. An analysis of the number of OOV words shows that the proposed algorithm reduces the number of OOV words more effectively, up to 0.81% for FIRE 2010 and 1.73% for FIRE 2011.
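
    The percentages in the description appear to be gains relative to the baseline SMT scores rather than absolute differences. That reading is an assumption inferred from the quoted numbers, not something the record states. A minimal sketch that recomputes the gains from the quoted scores (the helper name `relative_improvement` is hypothetical):

```python
def relative_improvement(baseline: float, proposed: float) -> float:
    """Percentage gain of `proposed` over `baseline` (assumed interpretation)."""
    return (proposed - baseline) / baseline * 100.0


# (baseline SMT score, proposed-algorithm score) as quoted in the description
scores = {
    ("Recall", "FIRE 2010"): (0.8284, 0.8785),
    ("Recall", "FIRE 2011"): (0.7084, 0.7365),
    ("MAP", "FIRE 2010"): (0.2832, 0.3239),
    ("MAP", "FIRE 2011"): (0.1885, 0.1988),
}

# Relative gain for every measure/dataset pair
gains = {key: relative_improvement(b, p) for key, (b, p) in scores.items()}

for (measure, dataset), gain in gains.items():
    print(f"{measure} {dataset}: +{gain:.2f}%")
```

    Under this assumption the recomputed gains agree with the quoted 6.04%, 3.96%, 14.37% and 5.46% to within roughly 0.01 percentage point. The small residual for Recall is consistent with the published scores having been rounded to four decimals.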

Classification

  • Type

    Jimp - Article in a periodical indexed in the Web of Science database

  • CEP field

  • OECD FORD field

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

  • Project

  • Linkages

Other

  • Year of implementation

    2022

  • Data confidentiality code

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

  • Name of the periodical

    IETE Technical Review (Institution of Electronics and Telecommunication Engineers, India)

  • ISSN

    0256-4602

  • e-ISSN

    0974-5971

  • Volume of the periodical

    39

  • Issue of the periodical within the volume

    2

  • Country of the periodical's publisher

    IN - Republic of India

  • Number of pages of the result

    10

  • Pages from-to

    276-285

  • UT WoS code of the article

    000592603100001

  • EID of the result in the Scopus database

    2-s2.0-85096773748