Deep Learning-based POS Tagger and Chunker for Odia Language Using Pre-trained Transformers

Identifikátory výsledku

Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3A52H827PR" target="_blank" >RIV/00216208:11320/23:52H827PR - isvavai.cz</a>
Výsledek na webu
<a href="https://dl.acm.org/doi/10.1145/3637877" target="_blank" >https://dl.acm.org/doi/10.1145/3637877</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1145/3637877" target="_blank" >10.1145/3637877</a>

Alternativní jazyky

Jazyk výsledku
angličtina
Název v původním jazyce
Deep Learning-based POS Tagger and Chunker for Odia Language Using Pre-trained Transformers
Popis výsledku v původním jazyce
"Developing effective natural language processing (NLP) tools for low-resourced languages poses significant challenges. This article centers its attention on the task of Part-of-speech (POS) tagging and chunking, which pertains to the identification and categorization of linguistic units within sentences. POS tagging and Chunking have already produced positive results in English and other European languages. However, in Indian languages, particularly in Odia language, it is not yet well explored because of the lack of supporting tools, resources, and its complex linguistic morphology. This study presents the building of a manually annotated dataset for Odia phrase chunking task and the development of a deep learning-based models specifically tailored to accommodate the distinctive properties of the language. The process of annotating the Odia chunking corpus involved the utilization of IOB (inside-outside-begin) labels, which were tagged by using designed Odia chunking tagset. We utilize the constructed Odia chunking dataset to build Odia chunker based on deep learning techniques, employing state-of-the-art architectures. Various techniques, such as Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and transformer-based models, are investigated in order to determine the most effective approach for Odia POS tagging and chunking. In addition, we conduct experiments utilizing diverse input representations, including Odia word embeddings, character-level representations, and sub-word units, in order to effectively capture the complex linguistic characteristics of the Odia language. Numerous experiments are conducted that evaluate the performance of our Odia POS tagger and chunker, employing standard evaluation metrics and making comparisons with existing approaches. The results demonstrate that our transformer-based tagger and chunker achieves superior accuracy and robustness in identifying and categorizing linguistic POS tags and chunks within Odia sentences. It outperforms existing work and exhibits consistent performance across diverse linguistic contexts and sentence structures. The developed Odia POS tagger and chunker have enormous potential for a variety of NLP applications, including information extraction, syntactic parsing, and machine translation, all of which are tailored to the low-resource Odia language. This work contributes to developing NLP tools and technologies for low-resource languages, thereby facilitating enhanced language processing capabilities in various linguistic contexts."
Název v anglickém jazyce
Deep Learning-based POS Tagger and Chunker for Odia Language Using Pre-trained Transformers
Popis výsledku anglicky
"Developing effective natural language processing (NLP) tools for low-resourced languages poses significant challenges. This article centers its attention on the task of Part-of-speech (POS) tagging and chunking, which pertains to the identification and categorization of linguistic units within sentences. POS tagging and Chunking have already produced positive results in English and other European languages. However, in Indian languages, particularly in Odia language, it is not yet well explored because of the lack of supporting tools, resources, and its complex linguistic morphology. This study presents the building of a manually annotated dataset for Odia phrase chunking task and the development of a deep learning-based models specifically tailored to accommodate the distinctive properties of the language. The process of annotating the Odia chunking corpus involved the utilization of IOB (inside-outside-begin) labels, which were tagged by using designed Odia chunking tagset. We utilize the constructed Odia chunking dataset to build Odia chunker based on deep learning techniques, employing state-of-the-art architectures. Various techniques, such as Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and transformer-based models, are investigated in order to determine the most effective approach for Odia POS tagging and chunking. In addition, we conduct experiments utilizing diverse input representations, including Odia word embeddings, character-level representations, and sub-word units, in order to effectively capture the complex linguistic characteristics of the Odia language. Numerous experiments are conducted that evaluate the performance of our Odia POS tagger and chunker, employing standard evaluation metrics and making comparisons with existing approaches. The results demonstrate that our transformer-based tagger and chunker achieves superior accuracy and robustness in identifying and categorizing linguistic POS tags and chunks within Odia sentences. It outperforms existing work and exhibits consistent performance across diverse linguistic contexts and sentence structures. The developed Odia POS tagger and chunker have enormous potential for a variety of NLP applications, including information extraction, syntactic parsing, and machine translation, all of which are tailored to the low-resource Odia language. This work contributes to developing NLP tools and technologies for low-resource languages, thereby facilitating enhanced language processing capabilities in various linguistic contexts."

Klasifikace

Druh
J<sub>ost</sub> - Ostatní články v recenzovaných periodicích
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)

Návaznosti výsledku

Projekt
—
Návaznosti
—

Ostatní

Rok uplatnění
2023
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Údaje specifické pro druh výsledku

Název periodika
"ACM Transactions on Asian and Low-Resource Language Information Processing"
ISSN
2375-4699
e-ISSN
—
Svazek periodika
""
Číslo periodika v rámci svazku
2023
Stát vydavatele periodika
US - Spojené státy americké
Počet stran výsledku
23
Strana od-do
1-23
Kód UT WoS článku
—
EID výsledku v databázi Scopus
—

Podobné výsledky(10)

Part-of-Speech Tagging of Odia Language Using Statistical and Deep Learning Based Approaches BertOdia: BERT Pre-training for Low Resource Odia Language Part of Speech Tagging in Urdu: Comparison of Machine and Deep Learning Approaches

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Deep Learning-based POS Tagger and Chunker for Odia Language Using Pre-trained Transformers

Identifikátory výsledku

Alternativní jazyky

Klasifikace

Návaznosti výsledku

Ostatní

Údaje specifické pro druh výsledku

Podobné výsledky(10)

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Popis výsledku

Identifikátory výsledku

Identifikátory výsledku

Alternativní jazyky

Alternativní jazyky

Klasifikace

Klasifikace

Návaznosti výsledku

Návaznosti výsledku

Ostatní

Ostatní

Údaje specifické pro druh výsledku

Údaje specifické pro druh výsledku

Podobné výsledky(10)