Data mining of publicly available SCRNA-SEQ datasets: application of machine learning to interrogate normal cellular counterparts of chronic lymphocytic leukemia
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F65269705%3A_____%2F21%3A00075047" target="_blank" >RIV/65269705:_____/21:00075047 - isvavai.cz</a>
Result on the web
<a href="https://www.conference.csac.cz/Amca-CSAC/media/content/2021/program/Book-of-abstracts-CSAC2021.pdf" target="_blank" >https://www.conference.csac.cz/Amca-CSAC/media/content/2021/program/Book-of-abstracts-CSAC2021.pdf</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Title in original language
Data mining of publicly available SCRNA-SEQ datasets: application of machine learning to interrogate normal cellular counterparts of chronic lymphocytic leukemia
Result description in original language
Chronic lymphocytic leukemia (CLL) is a lymphoproliferative disease of mature CD5+ B cells. There is no consensus on the origin of CLL, and it remains unclear whether the disease derives from a single precursor or multiple precursors and at what stage the transformation occurs. Extreme gradient boosting (XGBoost) is a machine learning algorithm that combines a large number of weak learners based on decision trees into a single strong classifier. The classifier can then be applied to a single sample to calculate a class probability that reflects the sample's similarity to a given class. The aim of this study was to collect publicly available datasets of B cells, unify their annotation, and identify the subpopulation of healthy B cells most similar to CLL cells. Several single-cell RNA-sequencing (scRNA-seq) datasets of B cells have been published; however, differences in their quality and annotation hinder straightforward exploration and generalization of the observations. Here, we reanalyzed datasets of B cells from multiple tissues. We identified five shared subtypes in peripheral blood (PB), which we denoted transitional, naïve, IgM+ alternative memory (AMB), CD1C high, and classical memory. To identify the normal counterpart of CLL, we built an XGBoost-based classification model for predicting B cell populations in PB. We then multiplexed nine well-characterized CLL patient samples using antibody-based hashtag oligos and performed scRNA-seq. We applied the classifier to predict the similarity of CLL cells to the five B cell populations in healthy PB. Most CLL cells were classified as either naïve or AMB. We then subclustered naïve and memory B cells and built two classifiers to predict the similarity of CLL cells predicted as either naïve or memory to these subclusters. Strikingly, within the subclusters most similar to CLL, we detected a small percentage of cells harboring some CLL markers.
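The classification idea described above (many weak tree-based learners combined into one classifier that returns per-class probabilities for a single cell) can be illustrated with a small sketch. This is not the authors' pipeline and not XGBoost itself: it is a simplified pure-Python gradient boosting of decision stumps with a softmax output, without XGBoost's regularization or second-order updates. The three class labels, the two "marker" features, the cluster centers, and all function names are hypothetical and the data are synthetic.

```python
import math
import random

def fit_stump(X, residuals):
    """Fit a depth-1 regression tree (stump) to the residuals by least squares."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    return best[1:]  # (feature index, threshold, left value, right value)

def stump_predict(stump, row):
    j, thr, lmean, rmean = stump
    return lmean if row[j] <= thr else rmean

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fit_boosted(X, y, n_classes, rounds=30, lr=0.3):
    """Multiclass gradient boosting: one stump per class per round,
    each fitted to the residual (one-hot label minus current probability)."""
    scores = [[0.0] * n_classes for _ in X]
    stumps = []
    for _ in range(rounds):
        probs = [softmax(s) for s in scores]
        round_stumps = []
        for k in range(n_classes):
            residuals = [(1.0 if yi == k else 0.0) - p[k]
                         for yi, p in zip(y, probs)]
            stump = fit_stump(X, residuals)
            round_stumps.append(stump)
            for i, row in enumerate(X):
                scores[i][k] += lr * stump_predict(stump, row)
        stumps.append(round_stumps)
    return {"lr": lr, "n_classes": n_classes, "stumps": stumps}

def predict_proba(model, row):
    """Class probabilities for one sample (one cell)."""
    scores = [0.0] * model["n_classes"]
    for round_stumps in model["stumps"]:
        for k, stump in enumerate(round_stumps):
            scores[k] += model["lr"] * stump_predict(stump, row)
    return softmax(scores)

# Synthetic "healthy reference" cells: hypothetical labels and cluster centers.
LABELS = ["transitional", "naive", "memory"]
CENTERS = [(1.0, 1.0), (5.0, 1.0), (3.0, 5.0)]
rng = random.Random(0)
X, y = [], []
for k, (cx, cy) in enumerate(CENTERS):
    for _ in range(20):
        X.append([rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)])
        y.append(k)

model = fit_boosted(X, y, n_classes=len(LABELS))

# Score one new (synthetic) cell against the reference populations.
query_proba = predict_proba(model, [4.8, 1.2])
for label, p in zip(LABELS, query_proba):
    print(f"{label}: {p:.3f}")
```

In this toy setup the query cell lies near the second cluster, so its highest probability falls on the hypothetical "naive" label. In practice one would more likely use an established library implementation on normalized expression data rather than a hand-written booster.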
Classification
Type
O - Other results
CEP field
—
OECD FORD field
30204 - Oncology
Result linkages
Project
<a href="/cs/project/NU20-08-00314" target="_blank" >NU20-08-00314: Single cell analysis: a modern tool for studying clonal evolution in patients with high-risk chronic lymphocytic leukemia</a><br>
Linkages
P - Research and development project financed from public funds (with a link to CEP)<br>I - Institutional support for the long-term conceptual development of a research organisation
Other
Year of implementation
2021
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations