Evaluating subset selection methods for use case points estimation
Identifikátory výsledku
Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F70883521%3A28140%2F18%3A63519976" target="_blank" >RIV/70883521:28140/18:63519976 - isvavai.cz</a>
Výsledek na webu
<a href="https://www.sciencedirect.com/science/article/pii/S0950584917305153" target="_blank" >https://www.sciencedirect.com/science/article/pii/S0950584917305153</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.infsof.2017.12.009" target="_blank" >10.1016/j.infsof.2017.12.009</a>
Alternativní jazyky
Jazyk výsledku
angličtina
Název v původním jazyce
Evaluating subset selection methods for use case points estimation
Popis výsledku v původním jazyce
When the Use Case Points method is used for software effort estimation, users are faced with low model accuracy which impacts on its practical application. This study investigates the significance of using subset selection methods for the prediction accuracy of Multiple Linear Regression models, obtained by the stepwise approach. K-means, Spectral Clustering, the Gaussian Mixture Model and Moving Window are evaluated as appropriate subset selection techniques. The methods were evaluated according to several evaluation criteria and then statistically tested. Evaluation was performing on two independent datasets - which differ in project types and size. Both were cut by the hold-out method. If clustering were used, the training sets were clustered into 3 classes; and, for each of class, an independent regression model was created. These were later used for the prediction of testing sets. If Moving Window was used, then window of sizes 5, 10 and 15 were tested. The results show that clustering techniques decrease prediction errors significantly when compared to Use Case Points or moving windows methods. Spectral Clustering was selected as the best-performing solution, because it achieves a Sum of Squared Errors reduction of 32% for the first dataset, and 98% for the second dataset. The Mean Absolute Percentage Error is less than 1% for the second dataset for Spectral Clustering; 9% for moving window; and 27% for Use Case Points. When the first dataset is used, then prediction errors are significantly higher – 53% for Spectral Clustering, but Use Case Points produces a 165% result. It can be concluded that this study proves subset selection techniques as a significant method for improving the prediction ability of linear regression models - which are used for software development effort prediction. It can also be concluded that the clustering method performs better than the moving window method.
Název v anglickém jazyce
Evaluating subset selection methods for use case points estimation
Popis výsledku anglicky
When the Use Case Points method is used for software effort estimation, users are faced with low model accuracy which impacts on its practical application. This study investigates the significance of using subset selection methods for the prediction accuracy of Multiple Linear Regression models, obtained by the stepwise approach. K-means, Spectral Clustering, the Gaussian Mixture Model and Moving Window are evaluated as appropriate subset selection techniques. The methods were evaluated according to several evaluation criteria and then statistically tested. Evaluation was performing on two independent datasets - which differ in project types and size. Both were cut by the hold-out method. If clustering were used, the training sets were clustered into 3 classes; and, for each of class, an independent regression model was created. These were later used for the prediction of testing sets. If Moving Window was used, then window of sizes 5, 10 and 15 were tested. The results show that clustering techniques decrease prediction errors significantly when compared to Use Case Points or moving windows methods. Spectral Clustering was selected as the best-performing solution, because it achieves a Sum of Squared Errors reduction of 32% for the first dataset, and 98% for the second dataset. The Mean Absolute Percentage Error is less than 1% for the second dataset for Spectral Clustering; 9% for moving window; and 27% for Use Case Points. When the first dataset is used, then prediction errors are significantly higher – 53% for Spectral Clustering, but Use Case Points produces a 165% result. It can be concluded that this study proves subset selection techniques as a significant method for improving the prediction ability of linear regression models - which are used for software development effort prediction. It can also be concluded that the clustering method performs better than the moving window method.
Klasifikace
Druh
J<sub>imp</sub> - Článek v periodiku v databázi Web of Science
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Návaznosti výsledku
Projekt
—
Návaznosti
I - Institucionalni podpora na dlouhodoby koncepcni rozvoj vyzkumne organizace
Ostatní
Rok uplatnění
2018
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Údaje specifické pro druh výsledku
Název periodika
INFORMATION AND SOFTWARE TECHNOLOGY
ISSN
0950-5849
e-ISSN
—
Svazek periodika
neuveden
Číslo periodika v rámci svazku
97
Stát vydavatele periodika
NL - Nizozemsko
Počet stran výsledku
9
Strana od-do
"nestrankovano"
Kód UT WoS článku
000428008600001
EID výsledku v databázi Scopus
2-s2.0-85039969351