LKMT: Linguistics Knowledge-Driven Multi-Task Neural Machine Translation for Urdu and English

Identifiers of the result

  • Result code in IS VaVaI

    RIV/00216208:11320/25:MHAJJG69 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AMHAJJG69)

  • Result on the web

    https://www.scopus.com/inward/record.uri?eid=2-s2.0-85206828110&doi=10.32604%2fcmc.2024.054673&partnerID=40&md5=11cc8e5a13114b45ab4a4331dd861ce1

  • DOI - Digital Object Identifier

    10.32604/cmc.2024.054673 (http://dx.doi.org/10.32604/cmc.2024.054673)

Alternative languages

  • Result language

    English

  • Title in the original language

    LKMT: Linguistics Knowledge-Driven Multi-Task Neural Machine Translation for Urdu and English

  • Result description in the original language

    Thanks to the strong representation capability of pre-trained language models, supervised machine translation models have achieved outstanding performance. However, the performances of these models drop sharply when the scale of the parallel training corpus is limited. Considering the pre-trained language model has a strong ability for monolingual representation, it is the key challenge for machine translation to construct the in-depth relationship between the source and target language by injecting the lexical and syntactic information into pre-trained language models. To alleviate the dependence on the parallel corpus, we propose a Linguistics Knowledge-Driven Multi-Task (LKMT) approach to inject part-of-speech and syntactic knowledge into pre-trained models, thus enhancing the machine translation performance. On the one hand, we integrate part-of-speech and dependency labels into the embedding layer and exploit large-scale monolingual corpus to update all parameters of pre-trained language models, thus ensuring the updated language model contains potential lexical and syntactic information. On the other hand, we leverage an extra self-attention layer to explicitly inject linguistic knowledge into the pre-trained language model-enhanced machine translation model. Experiments on the benchmark dataset show that our proposed LKMT approach improves the Urdu-English translation accuracy by 1.97 points and the English-Urdu translation accuracy by 2.42 points, highlighting the effectiveness of our LKMT framework. Detailed ablation experiments confirm the positive impact of part-of-speech and dependency parsing on machine translation. © 2024 The Authors.

  • Title in English

    LKMT: Linguistics Knowledge-Driven Multi-Task Neural Machine Translation for Urdu and English

  • Result description in English

    Thanks to the strong representation capability of pre-trained language models, supervised machine translation models have achieved outstanding performance. However, the performances of these models drop sharply when the scale of the parallel training corpus is limited. Considering the pre-trained language model has a strong ability for monolingual representation, it is the key challenge for machine translation to construct the in-depth relationship between the source and target language by injecting the lexical and syntactic information into pre-trained language models. To alleviate the dependence on the parallel corpus, we propose a Linguistics Knowledge-Driven Multi-Task (LKMT) approach to inject part-of-speech and syntactic knowledge into pre-trained models, thus enhancing the machine translation performance. On the one hand, we integrate part-of-speech and dependency labels into the embedding layer and exploit large-scale monolingual corpus to update all parameters of pre-trained language models, thus ensuring the updated language model contains potential lexical and syntactic information. On the other hand, we leverage an extra self-attention layer to explicitly inject linguistic knowledge into the pre-trained language model-enhanced machine translation model. Experiments on the benchmark dataset show that our proposed LKMT approach improves the Urdu-English translation accuracy by 1.97 points and the English-Urdu translation accuracy by 2.42 points, highlighting the effectiveness of our LKMT framework. Detailed ablation experiments confirm the positive impact of part-of-speech and dependency parsing on machine translation. © 2024 The Authors.
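
The description above names two mechanisms: part-of-speech and dependency labels integrated at the embedding layer, and an extra self-attention layer that explicitly injects linguistic knowledge into the pre-trained-LM-enhanced translation model. The following is a minimal PyTorch sketch of how such a design might look; it is not the authors' implementation, and every module name, dimension, and the exact attention wiring are assumptions made only for illustration.

# Minimal, illustrative sketch (not the authors' released code) of the two ideas
# described in the abstract: (1) part-of-speech and dependency-label embeddings
# summed into the token embeddings, and (2) an additional attention layer that
# injects the linguistic representation into the encoder states of the
# translation model. All names, vocabulary sizes and dimensions are assumptions.
import torch
import torch.nn as nn


class LinguisticsAwareEmbedding(nn.Module):
    def __init__(self, vocab_size, num_pos_tags, num_dep_labels, d_model=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)      # word pieces
        self.pos = nn.Embedding(num_pos_tags, d_model)    # part-of-speech tags
        self.dep = nn.Embedding(num_dep_labels, d_model)  # dependency labels

    def forward(self, token_ids, pos_ids, dep_ids):
        # Lexical and syntactic information enters at the embedding layer.
        return self.tok(token_ids) + self.pos(pos_ids) + self.dep(dep_ids)


class KnowledgeInjectionAttention(nn.Module):
    """Additional attention layer that lets encoder states attend over the
    linguistics-aware representation; an assumed realisation of the paper's
    'extra self-attention layer'."""

    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, encoder_states, linguistic_states):
        # Queries come from the encoder states; keys/values from the
        # linguistics-aware representation, so the knowledge is injected
        # explicitly before decoding.
        fused, _ = self.attn(encoder_states, linguistic_states, linguistic_states)
        return self.norm(encoder_states + fused)


if __name__ == "__main__":
    emb = LinguisticsAwareEmbedding(vocab_size=32000, num_pos_tags=50, num_dep_labels=40)
    inject = KnowledgeInjectionAttention()
    tokens = torch.randint(0, 32000, (2, 10))    # batch of 2, sequence length 10
    pos_tags = torch.randint(0, 50, (2, 10))
    dep_labels = torch.randint(0, 40, (2, 10))
    linguistic = emb(tokens, pos_tags, dep_labels)
    encoder_states = torch.randn(2, 10, 512)     # stand-in for pre-trained LM encoder output
    print(inject(encoder_states, linguistic).shape)  # torch.Size([2, 10, 512])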

Classification

  • Type

    JSC - Article in a periodical indexed in the SCOPUS database

  • CEP field

  • OECD FORD field

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

  • Continuities

Others

  • Year of implementation

    2024

  • Data confidentiality code

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

  • Periodical name

    Computers, Materials and Continua

  • ISSN

    1546-2218

  • e-ISSN

  • Periodical volume

    81

  • Issue of the periodical within the volume

    1

  • Country of the periodical publisher

    US - United States of America

  • Number of pages of the result

    19

  • Pages from-to

    951-969

  • UT WoS code of the article

  • EID of the result in the Scopus database

    2-s2.0-85206828110