Javanese part-of-speech tagging using cross-lingual transfer learning

Result identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AG87LC4VZ" target="_blank" >RIV/00216208:11320/25:G87LC4VZ - isvavai.cz</a>

  • Result on the web

    <a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85200057183&doi=10.11591%2fijai.v13.i3.pp3498-3509&partnerID=40&md5=3bc107ded6fef1573c58cdb8f371ff2c" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85200057183&doi=10.11591%2fijai.v13.i3.pp3498-3509&partnerID=40&md5=3bc107ded6fef1573c58cdb8f371ff2c</a>

  • DOI - Digital Object Identifier

    <a href="http://dx.doi.org/10.11591/ijai.v13.i3.pp3498-3509" target="_blank" >10.11591/ijai.v13.i3.pp3498-3509</a>

Alternative languages

  • Result language

    English

  • Title in original language

    Javanese part-of-speech tagging using cross-lingual transfer learning

  • Result description in original language

    Large, publicly available datasets for part-of-speech (POS) tagging do not exist for every language. One such language is Javanese, a local language of Indonesia that is considered low-resource. This research examines the effectiveness of cross-lingual transfer learning for Javanese POS tagging by fine-tuning state-of-the-art transformer-based models (such as IndoBERT, mBERT, and XLM-RoBERTa) on higher-resource source languages (such as Indonesian, English, Uyghur, Latin, and Hungarian), and then fine-tuning them again on Javanese as the target language. We found that models using cross-lingual transfer learning are more accurate than models without it: by 14.3%–15.3% over long short-term memory (LSTM)-based models, and by 0.21%–3.95% over transformer-based models. Our results show that the most accurate Javanese POS tagger is XLM-RoBERTa fine-tuned in two stages (first on Indonesian as the source language, then on Javanese as the target language), which achieves an accuracy of 87.65%. © 2024, Institute of Advanced Engineering and Science. All rights reserved.
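
The two-stage schedule described in the abstract (fine-tune on a higher-resource source language, then continue on Javanese) can be illustrated with a minimal sketch. This is not the authors' code: a tiny perceptron tagger stands in for the transformer models, and the class `PerceptronTagger`, the helper names, the feature set, and the six-word toy corpora are all hypothetical choices made only to keep the example self-contained.

```python
from collections import defaultdict

def features(word):
    # Hypothetical feature set: lowercased form plus a 2-character suffix.
    w = word.lower()
    return ["w=" + w, "suf=" + w[-2:]]

class PerceptronTagger:
    """Toy stand-in for a transformer tagger (not a model from the abstract)."""

    def __init__(self, tags):
        self.tags = list(tags)
        self.weights = defaultdict(float)  # (feature, tag) -> weight

    def predict(self, word):
        feats = features(word)
        # Ties are broken in favour of the tag listed first.
        return max(self.tags, key=lambda t: sum(self.weights[(f, t)] for f in feats))

    def fine_tune(self, sentences, epochs=5):
        # Training continues from the current weights, so a second call
        # on another language is the "second stage" of fine-tuning.
        for _ in range(epochs):
            for sentence in sentences:
                for word, gold in sentence:
                    guess = self.predict(word)
                    if guess != gold:
                        for f in features(word):
                            self.weights[(f, gold)] += 1.0
                            self.weights[(f, guess)] -= 1.0

def accuracy(tagger, sentences):
    # Token-level accuracy: fraction of words whose predicted tag is correct.
    pairs = [(w, t) for s in sentences for w, t in s]
    return sum(tagger.predict(w) == t for w, t in pairs) / len(pairs)

# Toy corpora, invented for illustration only (not the paper's datasets).
indonesian = [
    [("saya", "PRON"), ("makan", "VERB"), ("nasi", "NOUN")],
    [("dia", "PRON"), ("minum", "VERB"), ("air", "NOUN")],
]
javanese = [
    [("aku", "PRON"), ("mangan", "VERB"), ("sega", "NOUN")],
    [("dheweke", "PRON"), ("ngombe", "VERB"), ("banyu", "NOUN")],
]

tagger = PerceptronTagger(["NOUN", "PRON", "VERB"])

tagger.fine_tune(indonesian)                  # stage 1: source language
acc_after_stage1 = accuracy(tagger, javanese)

tagger.fine_tune(javanese)                    # stage 2: target language
acc_after_stage2 = accuracy(tagger, javanese)

print(acc_after_stage1, acc_after_stage2)
```

In this toy setup the suffix feature learned from Indonesian "makan" already tags Javanese "mangan" correctly before any Javanese training, which is the kind of transfer the abstract relies on. Accuracy here is measured on the training sentences only, so the numbers say nothing about the results reported in the paper.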

Classification

  • Type

    J<sub>SC</sub> - Article in a periodical indexed in the SCOPUS database

  • CEP field

  • OECD FORD field

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

  • Project

  • Linkages

Other

  • Year of implementation

    2024

  • Data confidentiality code

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

  • Periodical name

    IAES International Journal of Artificial Intelligence

  • ISSN

    2089-4872

  • e-ISSN

  • Periodical volume

    13

  • Issue number within the volume

    3

  • Country of the periodical's publisher

    US - United States of America

  • Number of pages of the result

    12

  • Pages from-to

    3498-3509

  • UT WoS code of the article

  • Result EID in the Scopus database

    2-s2.0-85200057183