The Fault in Our Stars: Designing Reproducible Large-scale Code Analysis Experiments
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F24%3A10492062" target="_blank" >RIV/00216208:11320/24:10492062 - isvavai.cz</a>
Result on the web
<a href="http://10.1145/3689490.3690404" target="_blank" >http://10.1145/3689490.3690404</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.4230/LIPIcs.ECOOP.2024.27" target="_blank" >10.4230/LIPIcs.ECOOP.2024.27</a>
Alternative languages
Result language
English
Original language name
The Fault in Our Stars: Designing Reproducible Large-scale Code Analysis Experiments
Original language description
Large-scale software repositories are a source of insights for software engineering. They offer an unmatched window into the software development process at scale. Their sheer number and size hold the promise of broadly applicable results. At the same time, that very size presents practical challenges for scaling tools and algorithms to millions of projects. A reasonable approach is to limit studies to representative samples of the population of interest. Broadly applicable conclusions can then be obtained by generalizing to the entire population. The contribution of this paper is a standardized experimental design methodology for choosing the inputs of studies working with large-scale repositories. We advocate for a methodology that clearly lays out what the population of interest is and how to sample it, and that fosters reproducibility. Along the way, we discourage researchers from using extrinsic attributes of projects, such as stars, which measure some unclear notion of popularity.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
<a href="/en/project/LL2325" target="_blank" >LL2325: Engineering of Data Analysis Pipelines</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2024
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
38th European Conference on Object-Oriented Programming, ECOOP 2024
ISBN
978-3-95977-341-6
ISSN
1868-8969
e-ISSN
—
Number of pages
27
Pages from-to
1-27
Publisher name
Schloss Dagstuhl
Place of publication
Germany
Event location
Vienna
Event date
Sep 16, 2024
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—