Biostatistic and machine learning in MALDI mass spectrometry research
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14110%2F23%3A00132354" target="_blank" >RIV/00216224:14110/23:00132354 - isvavai.cz</a>
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Title in original language
Biostatistic and machine learning in MALDI mass spectrometry research
Result description in original language
With increasing demands for precise analysis of biological samples in complex biological matrices, there is also a need to develop and optimize mass spectrometric (MS) methods. MS analysis of whole cells, plasma samples, and other biological materials is of great importance for monitoring and elucidating biological processes in the organism and provides important information about the organism's phenotype and genotype. The two topics presented herein cover different techniques for whole-cell samples and peripheral blood plasma. Whole-cell MALDI TOF MS is already used in clinical microbiology and diagnostics. In recent years it has also been introduced to cell biology, immunology, and cancer biology. The first project focuses on classifying ovarian cancer cells with different percentages of cell populations carrying a knockout of a single gene (TUSC3). Different cell types (4 in total) from different organisms (human and mouse) were subjected to MS analysis. The MS method was combined with multivariate statistical and machine learning algorithms (for example PLS-DA, ANN, and RF) using the R programming language. Data obtained from MS were analysed with an R script developed in-house. In total, 5 optimized classifiers based on different algorithms were established and compared on 175 mass spectra divided into 5 groups. PLS-DA was identified as the best-performing model, with 100% accuracy (95% confidence interval, CI = 94.7-100%) on the test data. The method described above was further used in other studies, for example to follow the differentiation of hESCs to ELEPs. We visualized the full differentiation trajectory based on spectral data only and also revealed some phenotypic abnormalities linked to passage number and, by proxy, to the aneuploidy status of hESCs. The second project deals with developing a method for the analysis of human plasma samples using MALDI TOF MS.
This project aims to discriminate between patients with multiple myeloma (MM) and patients with similar diseases such as plasma cell leukemia (PCL) and extramedullary multiple myeloma (EMD). A two-step protein extraction protocol was developed for the classification of MM, PCL, and EMD patients. Intensity across the whole m/z range increased approximately 50-fold when the extraction protocol was used (compared with diluted plasma samples measured directly). The accuracy of classification models using ML algorithms (RF, PLS-DA, and ANN) was 80-90% for the training dataset and 80-85% for the test dataset. These findings may help accelerate the integration of MALDI MS into clinical practice, as the diagnosis of MM, PCL, and EMD is currently rather inaccurate.
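The description above reports 100% test accuracy with a 95% confidence interval of 94.7-100%, but does not say how the interval was computed or how many spectra were in the test set. As a rough illustration only, the sketch below assumes an exact binomial (Clopper-Pearson) interval and a hypothetical test set of 68 spectra, a size chosen here because it happens to reproduce the reported lower bound. The function names are illustrative, and Python is used for the sketch although the original analysis was done in R.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(correct, total, conf=0.95):
    """Exact (Clopper-Pearson) confidence interval for a classification accuracy.

    correct: number of correctly classified test spectra
    total:   number of test spectra (hypothetical here; not given in the source)
    """
    alpha = 1.0 - conf

    def solve(f, target):
        # Bisection for f(p) = target, where f is decreasing in p on [0, 1].
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: the p at which P(X >= correct) equals alpha / 2.
    if correct == 0:
        lower = 0.0
    else:
        lower = solve(lambda p: binom_cdf(correct - 1, total, p), 1 - alpha / 2)

    # Upper bound: the p at which P(X <= correct) equals alpha / 2.
    if correct == total:
        upper = 1.0
    else:
        upper = solve(lambda p: binom_cdf(correct, total, p), alpha / 2)

    return lower, upper


# Assumed example: all 68 (hypothetical) test spectra classified correctly.
lower, upper = clopper_pearson(68, 68)
print(f"accuracy 100%, 95% CI = {lower * 100:.1f}-{upper * 100:.0f}%")
```

When every test spectrum is classified correctly, the lower bound reduces to (alpha/2)^(1/n), so the width of the interval depends only on the test set size. This is why a perfect score on a small test set still leaves several percentage points of uncertainty.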
Classification
Type
O - Other results
CEP field
—
OECD FORD field
30400 - Medical biotechnology
Result linkages
Project
The result was created during the implementation of multiple projects. More information is available on the Projects tab.
Linkages
P - Research and development project financed from public funds (with a link to CEP)
Other
Year of implementation
2023
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations