Cost-Sensitive Strategies for Data Imbalance in Bug Severity Classification: Experimental Results

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F17%3A00100027" target="_blank" >RIV/00216224:14330/17:00100027 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1109/SEAA.2017.71" target="_blank" >http://dx.doi.org/10.1109/SEAA.2017.71</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/SEAA.2017.71" target="_blank" >10.1109/SEAA.2017.71</a>

Alternative languages

Result language
English
Title in original language
Cost-Sensitive Strategies for Data Imbalance in Bug Severity Classification: Experimental Results
Result description in original language
Context: Software bug severity classification can help to improve the software bug triaging process. However, severity levels present a high level of data imbalance that needs to be taken into account. Aim: We investigate cost-sensitive strategies in multi-class bug severity classification to counteract data imbalance. Method: We transform datasets from three severity classification papers to a common format, totaling 17 projects. We test different cost-sensitive strategies to penalize majority classes. We adopt a Support Vector Machine (SVM) classifier that we also compare to a baseline "majority class" classifier. Results: A model weighting classes based on the inverse of instance frequencies yields a statistically significant improvement (low effect size) over the standard unweighted SVM model in the assembled dataset. Conclusions: Data imbalance should be given more consideration in future severity classification research papers.
Title in English
Cost-Sensitive Strategies for Data Imbalance in Bug Severity Classification: Experimental Results
Result description in English
Context: Software bug severity classification can help to improve the software bug triaging process. However, severity levels present a high level of data imbalance that needs to be taken into account. Aim: We investigate cost-sensitive strategies in multi-class bug severity classification to counteract data imbalance. Method: We transform datasets from three severity classification papers to a common format, totaling 17 projects. We test different cost-sensitive strategies to penalize majority classes. We adopt a Support Vector Machine (SVM) classifier that we also compare to a baseline "majority class" classifier. Results: A model weighting classes based on the inverse of instance frequencies yields a statistically significant improvement (low effect size) over the standard unweighted SVM model in the assembled dataset. Conclusions: Data imbalance should be given more consideration in future severity classification research papers.
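
The weighting idea in the description above (classes weighted by the inverse of their instance frequencies, compared against a "majority class" baseline) can be illustrated with a minimal sketch. The record does not state the exact formula or the tooling used, so everything here is an assumption: the normalisation n_samples / (n_classes * count), the function names, and the severity labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class, proportional to 1 / class count.

    Assumed normalisation: n_samples / (n_classes * count), so a
    perfectly balanced dataset gives every class a weight of 1.0.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


def majority_class(labels):
    """Label that a 'majority class' baseline would always predict."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical severity labels, made up for illustration only.
severities = ["normal"] * 6 + ["major"] * 3 + ["critical"] * 1

weights = inverse_frequency_weights(severities)
baseline = majority_class(severities)

# Rarer classes receive larger weights than frequent ones.
for cls in sorted(weights, key=weights.get):
    print(f"{cls}: {weights[cls]:.3f}")
print("baseline prediction:", baseline)
```

A dictionary like this could be handed to an SVM implementation that accepts per-class weights (for example the `class_weight` parameter of scikit-learn's `SVC`); whether the authors used that library is not stated in this record.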

Classification

Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
—
Continuities
S - Specific university research

Other

Year of publication
2017
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Title of the article in the proceedings
43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA) 2017
ISBN
9781538621400
ISSN
—
e-ISSN
—
Number of pages
4
Pages from-to
426-429
Publisher name
IEEE
Place of publication
Not specified
Event location
Vienna
Event date
1. 1. 2017
Event type by nationality
WRD - Worldwide event
UT WoS code of the article
000426074600063

Similar results (10)

Predicting regional credit ratings using ensemble classification with metacost
Towards an Improvement of Bug Severity Classification
Inverse free reduced universum twin support vector machine for imbalanced data classification
