Graph-based Dependency Parser Building for Myanmar Language
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F22%3AXWWRJVEQ" target="_blank" >RIV/00216208:11320/22:XWWRJVEQ - isvavai.cz</a>
Result on the web
<a href="https://doi.org/10.1109/iSAI-NLP56921.2022.9960267" target="_blank" >https://doi.org/10.1109/iSAI-NLP56921.2022.9960267</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/iSAI-NLP56921.2022.9960267" target="_blank" >10.1109/iSAI-NLP56921.2022.9960267</a>
Alternative languages
Result language
English
Title in original language
Graph-based Dependency Parser Building for Myanmar Language
Result description in original language
Dependency parsing (DP) examines the relationships between the words of a sentence to determine its grammatical structure. On this basis, a sentence is broken down into several components. The process rests on the idea that every linguistic unit in a sentence is directly related to another unit; these relations are called dependencies. Dependency parsing is a key step in natural language processing (NLP) for several text mining approaches. In recent years, Universal Dependencies (UD) has emerged as the dominant formalism for dependency parsing. Various UD corpora and dependency parsers are publicly accessible for resource-rich languages. However, no dependency parsing resources are publicly available for Myanmar, a low-resource language. We therefore manually extended the existing small Myanmar UD corpus (the myPOS UD corpus) into the myPOS version 3.0 UD corpus in order to publish it as a publicly available resource. To evaluate the effect of the extended corpus relative to the original, we used two graph-based neural dependency parsing models, jPTDP (joint POS tagging and dependency parsing) and UniParse (universal graph-based parsing), and measured unlabeled and labeled attachment scores (UAS and LAS). We compared the accuracy of the graph-based neural models based on the original and extended UD corpora. The experimental results showed that the extended myPOS version 3.0 UD corpus improved the accuracy of the dependency parsing models compared with the original myPOS UD corpus.
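The two evaluation metrics named in the description can be illustrated with a minimal sketch. It assumes each token is represented by a hypothetical `(head, label)` pair and that every token is scored; the record does not state the exact evaluation protocol used in the paper (for example, whether punctuation is excluded), so the function name and the toy sentence below are illustrative assumptions only.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency label).
# Head index 0 denotes the artificial root, as in UD-style annotation.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for one or more sentences flattened into token lists.

    UAS: fraction of tokens whose predicted head matches the gold head.
    LAS: fraction of tokens whose predicted head AND label both match.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty token list")

    n = len(gold)
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return uas_hits / n, las_hits / n


# Hypothetical four-token sentence (not taken from the myPOS corpus).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "punct")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

In this toy case three of four heads are correct, while only two tokens have both the correct head and the correct label, so LAS can never exceed UAS.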
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2022
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Article title in the proceedings
2022 17th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)
ISBN
978-1-66545-727-9
ISSN
2831-4565
e-ISSN
—
Number of pages
6
Pages from-to
1-6
Publisher name
IEEE
Place of publication
—
Event location
Chiang Mai, Thailand
Event date
1. 1. 2022
Event type by nationality
WRD - Worldwide event
UT WoS article code
000900145700024