Natural language generation from Universal Dependencies using data augmentation and pre-trained language models

The result's identifiers

Result code in IS VaVaI
RIV/00216208:11320/23:EEYSAUH5 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3AEEYSAUH5)
Result on the web
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85147539360&doi=10.1504%2fIJIIDS.2023.10053426&partnerID=40&md5=536d01463061350f89f904914ea31353
DOI - Digital Object Identifier
10.1504/ijiids.2023.10053426 (http://dx.doi.org/10.1504/ijiids.2023.10053426)

Alternative languages

Result language
English
Original language name
Natural language generation from Universal Dependencies using data augmentation and pre-trained language models
Original language description
"Natural language generation (NLG) has focused on data-to-text tasks with different structured inputs in recent years. The generated text should contain given information, be grammatically correct, and meet other criteria. We propose in this research an approach that combines solid pre-trained language models with input data augmentation. The studied data in this work are Universal Dependencies (UDs) which is developed as a framework for consistent annotation of grammar (parts of speech, morphological features and syntactic dependencies) for cross-lingual learning. We study the English UD structures, which are modified into two groups. In the first group, the modification phase is to remove the order information of each word and lemmatise the tokens. In the second group, the modification phase is to remove the functional words and surface-oriented morphological details. With both groups of modified structures, we apply the same approach to explore how pre-trained sequence-to-sequence models text-to-text transfer transformer (T5) and BART perform on the training data. We augment the training data by creating several permutations for each input structure. The result shows that our approach can generate good quality English text with the exciting idea of studying strategies to represent UD inputs. Copyright © 2023 Inderscience Enterprises Ltd."
Czech name
—
Czech description
—

Classification

Type
J<sub>SC</sub> - Article in a specialist periodical, which is included in the SCOPUS database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
—
Continuities
—

Others

Publication year
2023
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations

Data specific for result type

Name of the periodical
"International Journal of Intelligent Information and Database Systems"
ISSN
1751-5858
e-ISSN
—
Volume of the periodical
16
Issue of the periodical within the volume
1
Country of publishing house
US - UNITED STATES
Number of pages
17
Pages from-to
89-105
UT code for WoS article
—
EID of the result in the Scopus database
2-s2.0-85147539360

Similar results(10)

Part-of-Speech Tagging of Odia Language Using Statistical and Deep Learning Based Approaches
Artificially Evolved Chunks for Morphosyntactic Analysis
CUNI x-ling: Parsing under-resourced languages in CoNLL 2018 UD Shared Task
