Evaluating subset selection methods for use case points estimation
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F70883521%3A28140%2F18%3A63519976" target="_blank" >RIV/70883521:28140/18:63519976 - isvavai.cz</a>
Result on the web
<a href="https://www.sciencedirect.com/science/article/pii/S0950584917305153" target="_blank" >https://www.sciencedirect.com/science/article/pii/S0950584917305153</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.infsof.2017.12.009" target="_blank" >10.1016/j.infsof.2017.12.009</a>
Alternative languages
Result language
angličtina
Original language name
Evaluating subset selection methods for use case points estimation
Original language description
When the Use Case Points method is used for software effort estimation, users are faced with low model accuracy which impacts on its practical application. This study investigates the significance of using subset selection methods for the prediction accuracy of Multiple Linear Regression models, obtained by the stepwise approach. K-means, Spectral Clustering, the Gaussian Mixture Model and Moving Window are evaluated as appropriate subset selection techniques. The methods were evaluated according to several evaluation criteria and then statistically tested. Evaluation was performing on two independent datasets - which differ in project types and size. Both were cut by the hold-out method. If clustering were used, the training sets were clustered into 3 classes; and, for each of class, an independent regression model was created. These were later used for the prediction of testing sets. If Moving Window was used, then window of sizes 5, 10 and 15 were tested. The results show that clustering techniques decrease prediction errors significantly when compared to Use Case Points or moving windows methods. Spectral Clustering was selected as the best-performing solution, because it achieves a Sum of Squared Errors reduction of 32% for the first dataset, and 98% for the second dataset. The Mean Absolute Percentage Error is less than 1% for the second dataset for Spectral Clustering; 9% for moving window; and 27% for Use Case Points. When the first dataset is used, then prediction errors are significantly higher – 53% for Spectral Clustering, but Use Case Points produces a 165% result. It can be concluded that this study proves subset selection techniques as a significant method for improving the prediction ability of linear regression models - which are used for software development effort prediction. It can also be concluded that the clustering method performs better than the moving window method.
Czech name
—
Czech description
—
Classification
Type
J<sub>imp</sub> - Article in a specialist periodical, which is included in the Web of Science database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
I - Institucionalni podpora na dlouhodoby koncepcni rozvoj vyzkumne organizace
Others
Publication year
2018
Confidentiality
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Data specific for result type
Name of the periodical
INFORMATION AND SOFTWARE TECHNOLOGY
ISSN
0950-5849
e-ISSN
—
Volume of the periodical
neuveden
Issue of the periodical within the volume
97
Country of publishing house
NL - THE KINGDOM OF THE NETHERLANDS
Number of pages
9
Pages from-to
"nestrankovano"
UT code for WoS article
000428008600001
EID of the result in the Scopus database
2-s2.0-85039969351