Applying the Deep Learning Techniques to Solve Classification Tasks Using Gene Expression Data
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F44555601%3A13440%2F24%3A43898369" target="_blank" >RIV/44555601:13440/24:43898369 - isvavai.cz</a>
Result on the web
<a href="https://ieeexplore.ieee.org/document/10440636" target="_blank" >https://ieeexplore.ieee.org/document/10440636</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/ACCESS.2024.3368070" target="_blank" >10.1109/ACCESS.2024.3368070</a>
Alternative languages
Result language
English
Title in original language
Applying the Deep Learning Techniques to Solve Classification Tasks Using Gene Expression Data
Result description in original language
This manuscript explores the application of deep learning (DL) techniques for classifying gene expression data. A key aspect of our research is the comparative analysis of various DL neural network architectures, including Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) Recurrent Neural Networks (RNN), as well as hybrid models that combine these networks. We applied the Bayesian optimization algorithm with 5-fold cross-validation for hyperparameter tuning, which is crucial for DL algorithm performance. Significantly, we have advanced the methods for applying RNNs in processing gene expression data, particularly focusing on LSTM and GRU types. Our study also introduces a novel hybrid quality criterion for data classification, calculated as a weighted sum of partial quality criteria, incorporating an integrated F1-score derived through the Harrington desirability method. Furthermore, we investigate hybrid models that leverage various DL methods, enhancing decision-making objectivity in sample identification. These models use a step-by-step information processing procedure, initially applying different DL models to gene expression data and subsequently processing their results through a CART-based classifier for final decision-making. Our experiments, performed on gene expression data from patients with eight cancer types and one subset of normal samples (without cancer), demonstrated that GRU-RNN-based models, particularly a two-layer GRU-RNN, achieved the highest classification efficacy, with an accuracy of 97.8% on the test dataset. The performance of this model exceeded that of the other models, whose accuracy varied between 96.6% and 97.3%. Comparative analysis with other studies in this field suggests that the proposed techniques demonstrate higher efficacy than similar research on the application of DL models for cancer-type diagnosis.
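The hybrid quality criterion mentioned in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the scaling of the F1-scores nor the weights, so the linear mapping bounds (`y_min`, `y_max`), the use of per-class F1-scores as inputs, and all function names below are assumptions made for illustration. What the sketch does rely on is the standard one-sided Harrington desirability function d = exp(-exp(-Y)) and the usual aggregation of individual desirabilities by their geometric mean.

```python
import math


def harrington_desirability(x, y_min=-2.0, y_max=5.0):
    """Map a score x in [0, 1] to a desirability value in (0, 1).

    y_min and y_max are assumed bounds of the dimensionless scale Y;
    the source does not state which values were used.
    """
    y = y_min + (y_max - y_min) * x  # assumed linear scaling of the score
    return math.exp(-math.exp(-y))   # one-sided Harrington function


def integrated_f1(per_class_f1):
    """Geometric mean of the desirabilities of per-class F1-scores."""
    ds = [harrington_desirability(f) for f in per_class_f1]
    return math.exp(sum(math.log(d) for d in ds) / len(ds))


def hybrid_criterion(partial_scores, weights):
    """Weighted sum of partial quality criteria, normalized by total weight."""
    if len(partial_scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, partial_scores)) / total


# Hypothetical values, not results from the paper.
f1_index = integrated_f1([0.97, 0.98, 0.96])
score = hybrid_criterion([0.978, f1_index], weights=[1.0, 1.0])
```

Because the geometric mean is pulled down by any single low desirability, a model that performs poorly on one class is penalized more strongly than it would be by a plain average of F1-scores.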
Classification
Type
J<sub>imp</sub> - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2024
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Periodical name
IEEE Access
ISSN
2169-3536
e-ISSN
—
Periodical volume
2024
Issue number within the volume
12
Publisher country
US - United States of America
Number of pages
12
Pages from-to
28437-28448
UT WoS code of the article
001174249000001
EID of the result in the Scopus database
2-s2.0-85186090110