The Fault in Our Stars: Designing Reproducible Large-scale Code Analysis Experiments

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F24%3A10492062" target="_blank" >RIV/00216208:11320/24:10492062 - isvavai.cz</a>
Result on the web
<a href="http://10.1145/3689490.3690404" target="_blank" >http://10.1145/3689490.3690404</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.4230/LIPIcs.ECOOP.2024.27" target="_blank" >10.4230/LIPIcs.ECOOP.2024.27</a>

Alternative languages

Result language
English
Title in original language
The Fault in Our Stars: Designing Reproducible Large-scale Code Analysis Experiments
Result description in original language
Large-scale software repositories are a source of insights for software engineering. They offer an unmatched window into the software development process at scale. Their sheer number and size hold the promise of broadly applicable results. At the same time, that very size presents practical challenges for scaling tools and algorithms to millions of projects. A reasonable approach is to limit studies to representative samples of the population of interest. Broadly applicable conclusions can then be obtained by generalizing to the entire population. The contribution of this paper is a standardized experimental design methodology for choosing the inputs of studies working with large-scale repositories. We advocate for a methodology that clearly lays out what the population of interest is and how to sample it, and that fosters reproducibility. Along the way, we discourage researchers from using extrinsic attributes of projects, such as stars, which measure some unclear notion of popularity.

Classification

Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
<a href="/cs/project/LL2325" target="_blank" >LL2325: Konstrukce kanálů pro analýzu dat</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)

Others

Publication year
2024
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Name of the article in the proceedings
38th European Conference on Object-Oriented Programming, ECOOP 2024
ISBN
978-3-95977-341-6
ISSN
1868-8969
e-ISSN
—
Number of pages
27
Pages from-to
1-27
Publisher name
Schloss Dagstuhl
Place of publication
Germany
Event location
Vienna
Event date
16 September 2024
Type of event by nationality
WRD - Worldwide event
UT WoS article code
—

Similar results (10)

CodeDJ: Reproducible queries over large-scale software repositories
A Large-Scale Study on Source Code Reviewer Recommendation
The reuse of avian samples: opportunities, pitfalls, and a solution
