MultIPAs: applying program transformations to introductory programming assignments for data augmentation
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21730%2F22%3A00364210" target="_blank" >RIV/68407700:21730/22:00364210 - isvavai.cz</a>
Result on the web
<a href="https://doi.org/10.1145/3540250.3558931" target="_blank" >https://doi.org/10.1145/3540250.3558931</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1145/3540250.3558931" target="_blank" >10.1145/3540250.3558931</a>
Alternative languages
Result language
English
Original language name
MultIPAs: applying program transformations to introductory programming assignments for data augmentation
Original language description
There has been a growing interest, over the last few years, in the topic of automated program repair applied to fixing introductory programming assignments (IPAs). However, the datasets of IPAs publicly available tend to be small and with no valuable annotations about the defects of each program. Small datasets are not very useful for program repair tools that rely on machine learning models. Furthermore, a large diversity of correct implementations allows computing a smaller set of repairs to fix a given incorrect program rather than always using the same set of correct implementations for a given IPA. For these reasons, there has been an increasing demand for the task of augmenting IPAs benchmarks. This paper presents MultIPAs, a program transformation tool that can augment IPAs benchmarks by: (1) applying six syntactic mutations that conserve the program's semantics and (2) applying three semantic mutilations that introduce faults in the IPAs. Moreover, we demonstrate the usefulness of MultIPAs by augmenting with millions of programs two publicly available benchmarks of programs written in the C language, and also by generating an extensive benchmark of semantically incorrect programs.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2022
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
ISBN
978-1-4503-9413-0
ISSN
—
e-ISSN
—
Number of pages
5
Pages from-to
1657-1661
Publisher name
ACM
Place of publication
New York
Event location
Singapore
Event date
Nov 14, 2022
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
001118262900146