Javanese part-of-speech tagging using cross-lingual transfer learning
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AG87LC4VZ" target="_blank" >RIV/00216208:11320/25:G87LC4VZ - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85200057183&doi=10.11591%2fijai.v13.i3.pp3498-3509&partnerID=40&md5=3bc107ded6fef1573c58cdb8f371ff2c" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85200057183&doi=10.11591%2fijai.v13.i3.pp3498-3509&partnerID=40&md5=3bc107ded6fef1573c58cdb8f371ff2c</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.11591/ijai.v13.i3.pp3498-3509" target="_blank" >10.11591/ijai.v13.i3.pp3498-3509</a>
Alternative languages
Result language
English
Title in the original language
Javanese part-of-speech tagging using cross-lingual transfer learning
Result description in the original language
Large, publicly available datasets for part-of-speech (POS) tagging do not exist for every language. One such language is Javanese, a local language of Indonesia that is considered a low-resource language. This research examines the effectiveness of cross-lingual transfer learning for Javanese POS tagging by fine-tuning state-of-the-art transformer-based models (such as IndoBERT, mBERT, and XLM-RoBERTa) on various higher-resource source languages (such as Indonesian, English, Uyghur, Latin, and Hungarian), and then fine-tuning them again on Javanese as the target language. We found that cross-lingual transfer learning increases accuracy over models trained without it: by 14.3%–15.3% over long short-term memory (LSTM)-based models, and by 0.21%–3.95% over transformer-based models. Our results show that the most accurate Javanese POS tagger is XLM-RoBERTa fine-tuned in two stages (first on Indonesian as the source language, then on Javanese as the target language), which achieves an accuracy of 87.65%. © 2024, Institute of Advanced Engineering and Science. All rights reserved.
Title in English
Javanese part-of-speech tagging using cross-lingual transfer learning
Result description in English
Large, publicly available datasets for part-of-speech (POS) tagging do not exist for every language. One such language is Javanese, a local language of Indonesia that is considered a low-resource language. This research examines the effectiveness of cross-lingual transfer learning for Javanese POS tagging by fine-tuning state-of-the-art transformer-based models (such as IndoBERT, mBERT, and XLM-RoBERTa) on various higher-resource source languages (such as Indonesian, English, Uyghur, Latin, and Hungarian), and then fine-tuning them again on Javanese as the target language. We found that cross-lingual transfer learning increases accuracy over models trained without it: by 14.3%–15.3% over long short-term memory (LSTM)-based models, and by 0.21%–3.95% over transformer-based models. Our results show that the most accurate Javanese POS tagger is XLM-RoBERTa fine-tuned in two stages (first on Indonesian as the source language, then on Javanese as the target language), which achieves an accuracy of 87.65%. © 2024, Institute of Advanced Engineering and Science. All rights reserved.
Classification
Type
J<sub>SC</sub> - Article in a periodical in the SCOPUS database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2024
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Name of the periodical
IAES International Journal of Artificial Intelligence
ISSN
2089-4872
e-ISSN
—
Volume of the periodical
13
Issue of the periodical within the volume
3
Country of the periodical's publisher
US - United States of America
Number of pages
12
Pages from-to
3498-3509
UT WoS code of the article
—
EID of the result in the Scopus database
2-s2.0-85200057183