Context-Based Bigram Model for POS Tagging in Hindi: A Heuristic Approach
Identifikátory výsledku
Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3ANXJGVCU7" target="_blank" >RIV/00216208:11320/23:NXJGVCU7 - isvavai.cz</a>
Výsledek na webu
<a href="https://doi.org/10.1007/s40745-022-00434-4" target="_blank" >https://doi.org/10.1007/s40745-022-00434-4</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/s40745-022-00434-4" target="_blank" >10.1007/s40745-022-00434-4</a>
Alternativní jazyky
Jazyk výsledku
angličtina
Název v původním jazyce
Context-Based Bigram Model for POS Tagging in Hindi: A Heuristic Approach
Popis výsledku v původním jazyce
"In the domain of natural language processing, part-of-speech (POS) tagging is the most important task. It plays a vital role in applications like sentiment analysis, text summarization, opinion mining, etc. POS tagging is a process of assigning POS information (noun, pronoun, verb, etc.) to the given word. This information is considered in the context of their relationship with the surrounding words. Hindi is very popular language in countries like India, Nepal, United States, Mauritius, etc. Majority of Indians are accustomed to Hindi for reading and writing. They also use Hindi for writing on social media such as Twitter, Facebook, WhatsApp, etc. POS tagging is the most important phase to analyze these Hindi text from social media. The text scripted in Hindi is ambiguous in nature and rich in morphology. It makes identification of POS information challenging. In this article, a heuristic based approach is proposed for identifying POS information. The proposed method deployed a context-based bigram model that create a bigram sequence based on the relationship with the adjacent words. Subsequently, it selects the most likelihood POS information for a word based on both the forward and reverse bigram sequences. The experimental result of the proposed heuristic approach is compared with existing state-of-the-art techniques like hidden Markov model, decision tree, conditional random fields, support vector machine, neural network, and recurrent neural networks. Finally, it is observe that the proposed heuristic approach for POS tagging in Hindi outperforms the existing techniques and attains an accuracy of 94.3%."
Název v anglickém jazyce
Context-Based Bigram Model for POS Tagging in Hindi: A Heuristic Approach
Popis výsledku anglicky
"In the domain of natural language processing, part-of-speech (POS) tagging is the most important task. It plays a vital role in applications like sentiment analysis, text summarization, opinion mining, etc. POS tagging is a process of assigning POS information (noun, pronoun, verb, etc.) to the given word. This information is considered in the context of their relationship with the surrounding words. Hindi is very popular language in countries like India, Nepal, United States, Mauritius, etc. Majority of Indians are accustomed to Hindi for reading and writing. They also use Hindi for writing on social media such as Twitter, Facebook, WhatsApp, etc. POS tagging is the most important phase to analyze these Hindi text from social media. The text scripted in Hindi is ambiguous in nature and rich in morphology. It makes identification of POS information challenging. In this article, a heuristic based approach is proposed for identifying POS information. The proposed method deployed a context-based bigram model that create a bigram sequence based on the relationship with the adjacent words. Subsequently, it selects the most likelihood POS information for a word based on both the forward and reverse bigram sequences. The experimental result of the proposed heuristic approach is compared with existing state-of-the-art techniques like hidden Markov model, decision tree, conditional random fields, support vector machine, neural network, and recurrent neural networks. Finally, it is observe that the proposed heuristic approach for POS tagging in Hindi outperforms the existing techniques and attains an accuracy of 94.3%."
Klasifikace
Druh
J<sub>ost</sub> - Ostatní články v recenzovaných periodicích
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Návaznosti výsledku
Projekt
—
Návaznosti
—
Ostatní
Rok uplatnění
2023
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Údaje specifické pro druh výsledku
Název periodika
"Annals of Data Science"
ISSN
2198-5812
e-ISSN
—
Svazek periodika
""
Číslo periodika v rámci svazku
2023
Stát vydavatele periodika
US - Spojené státy americké
Počet stran výsledku
32
Strana od-do
347-378
Kód UT WoS článku
—
EID výsledku v databázi Scopus
—