Sequence-Labeling RoBERTa Model for Dependency-Parsing in Classical Chinese and Its Application to Vietnamese and Thai
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3A2MU6KJB2" target="_blank" >RIV/00216208:11320/23:2MU6KJB2 - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85164289459&doi=10.1109%2fICBIR57571.2023.10147628&partnerID=40&md5=2fb9f35120c62448088e49e9154f0479" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85164289459&doi=10.1109%2fICBIR57571.2023.10147628&partnerID=40&md5=2fb9f35120c62448088e49e9154f0479</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/ICBIR57571.2023.10147628" target="_blank" >10.1109/ICBIR57571.2023.10147628</a>
Alternative languages
Result language
English
Title in original language
Sequence-Labeling RoBERTa Model for Dependency-Parsing in Classical Chinese and Its Application to Vietnamese and Thai
Result description in original language
"The author and his colleagues have been developing a classical Chinese treebank using Universal Dependencies. We have also developed a RoBERTa-Classical-Chinese model pre-trained on 1.7 billion characters of classical Chinese text. In this paper we describe how to fine-tune a sequence-labeling RoBERTa model for dependency parsing in classical Chinese. We introduce 'goeswith'-labeled edges into the directed acyclic graphs of Universal Dependencies in order to resolve the mismatch between the token length of RoBERTa-Classical-Chinese and the word length in classical Chinese. We utilize the [MASK] token of the RoBERTa model to handle outgoing edges and to produce the adjacency matrices for the Universal Dependencies graphs. Our RoBERTa-UDgoeswith model outperforms other dependency parsers for classical Chinese on LAS/MLAS/BLEX benchmark scores. We then apply our methods to other isolating languages. For Vietnamese we introduce 'goeswith'-labeled edges to split words into space-separated syllables, and fine-tune RoBERTa and PhoBERT models. For Thai we try three kinds of tokenizers (character-wise, quasi-syllable, and SentencePiece) to produce RoBERTa models. © 2023 IEEE."
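The 'goeswith' device mentioned in the description can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the helper `split_with_goeswith` and the Vietnamese example sentence are assumptions made for this record. In UD terms, when a tokenizer splits one word into several tokens, the first token inherits the word's head and relation, and every following token attaches to its predecessor via a 'goeswith' edge.

```python
# Hedged sketch of 'goeswith'-labeled edges for token/word mismatch.
# Words are (form, head, deprel) triples with 1-based heads, 0 = root.

def split_with_goeswith(words, split):
    """words: list of (form, head, deprel); split: form -> list of tokens."""
    pieces = [split(form) for form, _, _ in words]
    # 1-based index of the first token of each original word
    first, pos = [], 1
    for p in pieces:
        first.append(pos)
        pos += len(p)
    tokens = []
    for (form, head, deprel), toks, start in zip(words, pieces, first):
        # the first token keeps the word's relation; heads are remapped
        # to the head word's first token (root stays 0)
        new_head = 0 if head == 0 else first[head - 1]
        tokens.append((toks[0], new_head, deprel))
        # each remaining token chains to the previous one with 'goeswith'
        for i, t in enumerate(toks[1:], start=start):
            tokens.append((t, i, "goeswith"))
    return tokens

# Vietnamese-style example: multi-syllable words split on spaces
print(split_with_goeswith([("Hà Nội", 2, "nsubj"), ("đẹp", 0, "root")],
                          lambda w: w.split(" ")))
# → [('Hà', 3, 'nsubj'), ('Nội', 1, 'goeswith'), ('đẹp', 0, 'root')]
```

The same splitting function could be swapped for a character-wise or quasi-syllable tokenizer, which is how the Thai variants in the paper presumably differ.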
Title in English
Sequence-Labeling RoBERTa Model for Dependency-Parsing in Classical Chinese and Its Application to Vietnamese and Thai
Result description in English
"The author and his colleagues have been developing a classical Chinese treebank using Universal Dependencies. We have also developed a RoBERTa-Classical-Chinese model pre-trained on 1.7 billion characters of classical Chinese text. In this paper we describe how to fine-tune a sequence-labeling RoBERTa model for dependency parsing in classical Chinese. We introduce 'goeswith'-labeled edges into the directed acyclic graphs of Universal Dependencies in order to resolve the mismatch between the token length of RoBERTa-Classical-Chinese and the word length in classical Chinese. We utilize the [MASK] token of the RoBERTa model to handle outgoing edges and to produce the adjacency matrices for the Universal Dependencies graphs. Our RoBERTa-UDgoeswith model outperforms other dependency parsers for classical Chinese on LAS/MLAS/BLEX benchmark scores. We then apply our methods to other isolating languages. For Vietnamese we introduce 'goeswith'-labeled edges to split words into space-separated syllables, and fine-tune RoBERTa and PhoBERT models. For Thai we try three kinds of tokenizers (character-wise, quasi-syllable, and SentencePiece) to produce RoBERTa models. © 2023 IEEE."
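The description casts dependency parsing as sequence labeling whose output is decoded into the adjacency matrix of a UD graph. Below is a minimal sketch of one common relative-offset labeling scheme; it is an assumption for illustration only, and the authors' actual scheme, which uses the [MASK] token to handle outgoing edges, differs in detail.

```python
# Hedged sketch: dependency parsing as per-token sequence labeling.
# Each token i (1-based) gets a label "offset|deprel", where offset is
# head - i, and "+0|root" marks the root. Decoding the label sequence
# reconstructs the adjacency matrix adj[head][dependent] = deprel.

def encode(heads, deprels):
    """heads: 1-based head per token (0 = root) -> labels like '+1|nsubj'."""
    labels = []
    for i, (h, d) in enumerate(zip(heads, deprels), start=1):
        off = 0 if h == 0 else h - i
        labels.append(f"{off:+d}|{d}")
    return labels

def decode(labels):
    """Label sequence back to an (n+1) x (n+1) adjacency matrix."""
    n = len(labels)
    adj = [[None] * (n + 1) for _ in range(n + 1)]
    for i, lab in enumerate(labels, start=1):
        off, d = lab.split("|")
        o = int(off)
        h = 0 if o == 0 else i + o  # offset 0 is reserved for the root
        adj[h][i] = d
    return adj

# e.g. a 3-token sentence whose second token is the root
labels = encode([2, 0, 2], ["nsubj", "root", "obj"])
print(labels)  # → ['+1|nsubj', '+0|root', '-1|obj']
```

Because each token carries exactly one label, any token-classification head over RoBERTa embeddings can in principle predict such labels, which is the appeal of the sequence-labeling formulation.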
Classification
Type
D - Proceedings paper
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
—
Linkages
—
Others
Year of implementation
2023
Data confidentiality code
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific to the result type
Proceedings name
"Int. Conf. Bus. Ind. Res., ICBIR - Proc."
ISBN
979-835039964-6
ISSN
—
e-ISSN
—
Number of result pages
5
Pages from-to
169-173
Publisher name
Institute of Electrical and Electronics Engineers Inc.
Place of publication
—
Event venue
Cham
Event date
1 Jan 2023
Event type by nationality
WRD - Worldwide event
Article UT WoS code
—