Natural language generation from Universal Dependencies using data augmentation and pre-trained language models

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3AEEYSAUH5" target="_blank" >RIV/00216208:11320/23:EEYSAUH5 - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85147539360&doi=10.1504%2fIJIIDS.2023.10053426&partnerID=40&md5=536d01463061350f89f904914ea31353" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85147539360&doi=10.1504%2fIJIIDS.2023.10053426&partnerID=40&md5=536d01463061350f89f904914ea31353</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1504/ijiids.2023.10053426" target="_blank" >10.1504/ijiids.2023.10053426</a>

Alternative languages

Result language
English
Title in the original language
Natural language generation from Universal Dependencies using data augmentation and pre-trained language models
Description in the original language
"Natural language generation (NLG) has focused on data-to-text tasks with different structured inputs in recent years. The generated text should contain given information, be grammatically correct, and meet other criteria. We propose in this research an approach that combines solid pre-trained language models with input data augmentation. The studied data in this work are Universal Dependencies (UDs) which is developed as a framework for consistent annotation of grammar (parts of speech, morphological features and syntactic dependencies) for cross-lingual learning. We study the English UD structures, which are modified into two groups. In the first group, the modification phase is to remove the order information of each word and lemmatise the tokens. In the second group, the modification phase is to remove the functional words and surface-oriented morphological details. With both groups of modified structures, we apply the same approach to explore how pre-trained sequence-to-sequence models text-to-text transfer transformer (T5) and BART perform on the training data. We augment the training data by creating several permutations for each input structure. The result shows that our approach can generate good quality English text with the exciting idea of studying strategies to represent UD inputs. Copyright © 2023 Inderscience Enterprises Ltd."
Title in English
Natural language generation from Universal Dependencies using data augmentation and pre-trained language models
Description in English
"Natural language generation (NLG) has focused on data-to-text tasks with different structured inputs in recent years. The generated text should contain given information, be grammatically correct, and meet other criteria. We propose in this research an approach that combines solid pre-trained language models with input data augmentation. The studied data in this work are Universal Dependencies (UDs) which is developed as a framework for consistent annotation of grammar (parts of speech, morphological features and syntactic dependencies) for cross-lingual learning. We study the English UD structures, which are modified into two groups. In the first group, the modification phase is to remove the order information of each word and lemmatise the tokens. In the second group, the modification phase is to remove the functional words and surface-oriented morphological details. With both groups of modified structures, we apply the same approach to explore how pre-trained sequence-to-sequence models text-to-text transfer transformer (T5) and BART perform on the training data. We augment the training data by creating several permutations for each input structure. The result shows that our approach can generate good quality English text with the exciting idea of studying strategies to represent UD inputs. Copyright © 2023 Inderscience Enterprises Ltd."
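The augmentation step mentioned in the description (several permutations of each order-free input structure) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Token` layout, the `lemma|UPOS|deprel` encoding, and the helper names `linearise` and `augment` are hypothetical and are not taken from the paper, whose actual input representation is not given in this record. Head indices are omitted for brevity.

```python
import random

# Hypothetical token representation: (lemma, UPOS tag, dependency relation).
# The paper's real input encoding is not described in this record.
Token = tuple[str, str, str]


def linearise(tokens: list[Token]) -> str:
    """Join tokens into one flat source string for a seq2seq model."""
    return " ".join(f"{lemma}|{upos}|{deprel}" for lemma, upos, deprel in tokens)


def augment(tokens: list[Token], n_permutations: int, seed: int = 0) -> list[str]:
    """Return up to n_permutations distinct shuffled linearisations."""
    rng = random.Random(seed)
    seen: set[str] = set()
    variants: list[str] = []
    # Cap attempts so short inputs with few distinct orders cannot loop forever.
    for _ in range(n_permutations * 10):
        if len(variants) == n_permutations:
            break
        shuffled = tokens[:]
        rng.shuffle(shuffled)
        text = linearise(shuffled)
        if text not in seen:
            seen.add(text)
            variants.append(text)
    return variants


# Toy example: lemmatised, order-free input paired with one target sentence.
lemmas = [("the", "DET", "det"), ("dog", "NOUN", "nsubj"), ("bark", "VERB", "root")]
pairs = [(source, "The dog barks.") for source in augment(lemmas, n_permutations=3)]
```

Each resulting `(source, target)` pair keeps the same target sentence, so a model such as T5 or BART fine-tuned on these pairs would see the same content in several input orders.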

Classification

Type
J<sub>SC</sub> - Article in a periodical indexed in the SCOPUS database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
—
Continuities
—

Others

Publication year
2023
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Name of the periodical
"International Journal of Intelligent Information and Database Systems"
ISSN
1751-5858
e-ISSN
—
Volume of the periodical
16
Issue of the periodical within the volume
1
Country of the periodical's publisher
US - United States of America
Number of pages
17
Pages from-to
89-105
UT WoS code of the article
—
EID of the result in the Scopus database
2-s2.0-85147539360

Similar results (10)

Part-of-Speech Tagging of Odia Language Using Statistical and Deep Learning Based Approaches
Artificially Evolved Chunks for Morphosyntactic Analysis
A Comparative Study of Lemmatization Approaches for Rojak Language
