Enhancing Cardiovascular Risk Assessment with Advanced Data Balancing and Domain Knowledge-driven Explainability

Identifikátory výsledku

Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216275%3A25410%2F24%3A39922247" target="_blank" >RIV/00216275:25410/24:39922247 - isvavai.cz</a>
Výsledek na webu
<a href="https://www.sciencedirect.com/science/article/pii/S0957417424017536" target="_blank" >https://www.sciencedirect.com/science/article/pii/S0957417424017536</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.eswa.2024.124886" target="_blank" >10.1016/j.eswa.2024.124886</a>

Alternativní jazyky

Jazyk výsledku
angličtina
Název v původním jazyce
Enhancing Cardiovascular Risk Assessment with Advanced Data Balancing and Domain Knowledge-driven Explainability
Popis výsledku v původním jazyce
In medical risk prediction, such as predicting heart disease, machine learning (ML) classifiers must achieve high accuracy, precision, and recall to minimize the chances of incorrect diagnoses or treatment recommendations. However, real-world datasets often have imbalanced data, which can affect classifier performance. Traditional data balancing methods can lead to overfitting and underfitting, making it difficult to identify potential health risks accurately. Early prediction of heart attacks is of paramount importance, and researchers have developed ML-based systems to address this problem. However, much of the existing ML research is based on a single dataset, often ignoring performance evaluation across multiple datasets. As the demand for interpretable ML models grows, model interpretability becomes central to revealing insights and feature effects within predictive models. To address these challenges, we present a novel data balancing technique that uses a divide-and- conquer strategy with the K-Means clustering algorithm to segment the dataset. The performance of our approach is highlighted through comparisons with established techniques, which demonstrate the superiority of our proposed method. To address the challenge of inter-dataset discrepancies, we use two different datasets. Our holistic pipeline, strengthened by the innovative balancing technique, effectively addresses performance discrepancies, culminating in a significant improvement from 81% to 90%. Furthermore, through advanced statistical analysis, it has been determined that the 95% confidence interval for the AUC metric of our method ranges from 0.8187 to 0.8411. This observation serves to underscore the consistency and reliability of our approach, demonstrating its ability to achieve high performance across a range of scenarios. Incorporating Explainable AI (XAI), we examine the feature rankings and their contributions within the best performing Random Forest model. While the domain expert feedback is consistent with the explanatory power of XAI, some differences remain. Nevertheless, a remarkable convergence in feature ranking and weighting is observed, bridging the insights from XAI tools and domain expert perspectives.
Název v anglickém jazyce
Enhancing Cardiovascular Risk Assessment with Advanced Data Balancing and Domain Knowledge-driven Explainability
Popis výsledku anglicky
In medical risk prediction, such as predicting heart disease, machine learning (ML) classifiers must achieve high accuracy, precision, and recall to minimize the chances of incorrect diagnoses or treatment recommendations. However, real-world datasets often have imbalanced data, which can affect classifier performance. Traditional data balancing methods can lead to overfitting and underfitting, making it difficult to identify potential health risks accurately. Early prediction of heart attacks is of paramount importance, and researchers have developed ML-based systems to address this problem. However, much of the existing ML research is based on a single dataset, often ignoring performance evaluation across multiple datasets. As the demand for interpretable ML models grows, model interpretability becomes central to revealing insights and feature effects within predictive models. To address these challenges, we present a novel data balancing technique that uses a divide-and- conquer strategy with the K-Means clustering algorithm to segment the dataset. The performance of our approach is highlighted through comparisons with established techniques, which demonstrate the superiority of our proposed method. To address the challenge of inter-dataset discrepancies, we use two different datasets. Our holistic pipeline, strengthened by the innovative balancing technique, effectively addresses performance discrepancies, culminating in a significant improvement from 81% to 90%. Furthermore, through advanced statistical analysis, it has been determined that the 95% confidence interval for the AUC metric of our method ranges from 0.8187 to 0.8411. This observation serves to underscore the consistency and reliability of our approach, demonstrating its ability to achieve high performance across a range of scenarios. Incorporating Explainable AI (XAI), we examine the feature rankings and their contributions within the best performing Random Forest model. While the domain expert feedback is consistent with the explanatory power of XAI, some differences remain. Nevertheless, a remarkable convergence in feature ranking and weighting is observed, bridging the insights from XAI tools and domain expert perspectives.

Klasifikace

Druh
J<sub>imp</sub> - Článek v periodiku v databázi Web of Science
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)

Návaznosti výsledku

Projekt
—
Návaznosti
I - Institucionalni podpora na dlouhodoby koncepcni rozvoj vyzkumne organizace

Ostatní

Rok uplatnění
2024
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Údaje specifické pro druh výsledku

Název periodika
Expert Systems with Applications
ISSN
0957-4174
e-ISSN
1873-6793
Svazek periodika
255
Číslo periodika v rámci svazku
December
Stát vydavatele periodika
GB - Spojené království Velké Británie a Severního Irska
Počet stran výsledku
20
Strana od-do
124886
Kód UT WoS článku
001286672200001
EID výsledku v databázi Scopus
2-s2.0-85200145279

Podobné výsledky(10)

Bridging the Explanation Gap in AI Security: A Task-Driven Approach to XAI Methods Evaluation Explainable Machine Learning for Intrusion Detection Utilization of Deep Learning and Expert Feature Classifier for Detection of Heart Murmurs

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Enhancing Cardiovascular Risk Assessment with Advanced Data Balancing and Domain Knowledge-driven Explainability

Identifikátory výsledku

Alternativní jazyky

Klasifikace

Návaznosti výsledku

Ostatní

Údaje specifické pro druh výsledku

Podobné výsledky(10)

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Popis výsledku

Identifikátory výsledku

Identifikátory výsledku

Alternativní jazyky

Alternativní jazyky

Klasifikace

Klasifikace

Návaznosti výsledku

Návaznosti výsledku

Ostatní

Ostatní

Údaje specifické pro druh výsledku

Údaje specifické pro druh výsledku

Podobné výsledky(10)