Predicting Breast Cancer from Risk Factors Using SVM and Extra-Trees-Based Feature Selection Method
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27350%2F22%3A10250524" target="_blank" >RIV/61989100:27350/22:10250524 - isvavai.cz</a>
Result on the web
<a href="https://www.mdpi.com/2073-431X/11/9/136" target="_blank" >https://www.mdpi.com/2073-431X/11/9/136</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.3390/computers11090136" target="_blank" >10.3390/computers11090136</a>
Alternative languages
Result language
English
Title in the original language
Predicting Breast Cancer from Risk Factors Using SVM and Extra-Trees-Based Feature Selection Method
Result description in the original language
Developing a prediction model from risk factors can provide an efficient method to recognize breast cancer. Machine learning (ML) algorithms have been applied to increase the efficiency of early-stage diagnosis. This paper studies a support vector machine (SVM) combined with an extremely randomized trees classifier (extra-trees) to diagnose breast cancer at an early stage based on risk factors. The extra-trees classifier was used to remove irrelevant features, while the SVM was used to diagnose breast cancer status. A breast cancer dataset of 116 subjects was used by the machine learning models to predict breast cancer, and stratified 10-fold cross-validation was employed for model evaluation. Our proposed combined SVM and extra-trees model reached the highest accuracy, up to 80.23%, which was significantly better than the other ML models. The experimental results demonstrated that applying extra-trees-based feature selection improved the average ML prediction accuracy by up to 7.29% compared with ML without feature selection. In addition, we present how the proposed prediction model could be employed for web-based breast cancer prediction. The proposed model is expected to increase the efficiency of breast cancer diagnosis based on risk factors and to improve diagnostic decision-support systems by predicting breast cancer accurately.
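The approach described in the abstract (extra-trees feature selection followed by an SVM, scored with stratified 10-fold cross-validation) might be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn, uses synthetic data from make_classification in place of the 116-subject dataset, and the feature count, scaling step, kernel, selection threshold, and random seeds are all assumptions, so the printed accuracy is not comparable to the reported figures.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 116 samples to mirror the cohort size,
# with an arbitrary number of features. This is NOT the paper's dataset.
X, y = make_classification(
    n_samples=116, n_features=9, n_informative=4, n_redundant=2, random_state=0
)

# Feature selection sits inside the pipeline so that it is refit on the
# training part of each fold and never sees the held-out fold.
pipeline = make_pipeline(
    StandardScaler(),  # assumption: SVMs usually benefit from scaled inputs
    SelectFromModel(ExtraTreesClassifier(n_estimators=100, random_state=0)),
    SVC(kernel="rbf"),  # assumption: the kernel is not stated in the abstract
)

# Stratified 10-fold cross-validation keeps the class ratio in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")

print(f"mean accuracy over {len(scores)} folds: {scores.mean():.3f}")
```

Keeping the selector inside the pipeline is a design choice of this sketch: selecting features on the full dataset before cross-validation would leak information from the test folds into the model.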
Classification
Type
J<sub>imp</sub> - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
10200 - Computer and information sciences
Result linkages
Project
—
Linkages
V - Research activity supported from other public sources
Other
Year of implementation
2022
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Periodical name
Computers
ISSN
2073-431X
e-ISSN
—
Periodical volume
11
Periodical issue within the volume
9
Country of the periodical's publisher
CH - Swiss Confederation
Number of pages of the result
14
Pages from-to
unpaginated
UT WoS code of the article
000856323500001
EID of the result in the Scopus database
2-s2.0-85138679296