Data mining of publicly available SCRNA-SEQ datasets: application of machine learning to interrogate normal cellular counterparts of chronic lymphocytic leukemia
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F65269705%3A_____%2F21%3A00075047" target="_blank" >RIV/65269705:_____/21:00075047 - isvavai.cz</a>
Result on the web
<a href="https://www.conference.csac.cz/Amca-CSAC/media/content/2021/program/Book-of-abstracts-CSAC2021.pdf" target="_blank" >https://www.conference.csac.cz/Amca-CSAC/media/content/2021/program/Book-of-abstracts-CSAC2021.pdf</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Data mining of publicly available SCRNA-SEQ datasets: application of machine learning to interrogate normal cellular counterparts of chronic lymphocytic leukemia
Original language description
Chronic lymphocytic leukemia (CLL) is a lymphoproliferative disease of mature CD5+ B cells. There is no consensus on the origin of CLL. It is also unclear whether the disease is derived from a single precursor or from multiple precursors, and at what stage the transformation occurs. Extreme gradient boosting (XGBoost) is a machine learning algorithm that combines a large number of weak learners based on decision trees into a single strong classifier. This classifier can then be applied to a single sample to calculate a class probability that reflects its similarity to a given class. The aim of this study was to collect publicly available datasets of B cells, unify their annotation, and identify the subpopulation of healthy B cells most similar to CLL cells. Several single-cell RNA-sequencing (scRNA-seq) datasets of B cells have been published; however, differences in their quality and annotation hinder their straightforward exploration and the generalization of observations. Here, we reanalyzed datasets of B cells from multiple tissues. We identified five shared subtypes in peripheral blood (PB) that we denoted as transitional, naïve, IgM+ alternative memory (AMB), CD1C high, and classical memory. To identify the normal counterpart of CLL, we built an XGBoost-based classification model for predicting B cell populations in PB. We then multiplexed nine well-characterized CLL patient samples using antibody-based hashtag oligos and performed scRNA-seq. We applied the classifier to predict the similarity of CLL cells to the five B cell populations in healthy PB. Most CLL cells were classified as either naïve or AMB. We then subclustered naïve and memory B cells and built two classifiers to predict the subcluster similarity of CLL cells classified as either naïve or memory. Strikingly, within the subclusters most similar to CLL, we detected a small percentage of cells harboring some CLL markers.
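The classification idea in the description (train a boosted ensemble of weak tree learners on annotated healthy B cells, then read the class probabilities of a query cell as its similarity to each population) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are simulated, each population is given one hypothetical marker gene, and the booster is a small from-scratch gradient-boosting routine with decision stumps standing in for the XGBoost library, which additionally uses second-order gradients, deeper trees, and regularization. It is not the authors' pipeline.

```python
import math
import random

# Population names follow the abstract; the marker genes are hypothetical.
POPULATIONS = ["transitional", "naive", "AMB", "CD1C_high", "classical_memory"]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def fit_stump(X, residuals):
    """Least-squares regression stump: (feature, threshold, left, right)."""
    n = len(X)
    total = sum(residuals)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for pos in range(1, n):
            left_sum += residuals[order[pos - 1]]
            lo, hi = X[order[pos - 1]][j], X[order[pos]][j]
            if lo == hi:
                continue  # no valid split between identical values
            right_sum = total - left_sum
            # Maximising this term is equivalent to minimising squared error.
            score = left_sum ** 2 / pos + right_sum ** 2 / (n - pos)
            if best is None or score > best[0]:
                best = (score, j, (lo + hi) / 2,
                        left_sum / pos, right_sum / (n - pos))
    return best[1:]


def train(X, y, n_classes, rounds=20, lr=0.3):
    """Multiclass gradient boosting: one stump per class per round."""
    F = [[0.0] * n_classes for _ in X]  # running raw scores per cell
    model = []
    for _ in range(rounds):
        probs = [softmax(f) for f in F]
        stumps = []
        for k in range(n_classes):
            # Negative gradient of the softmax loss: indicator minus probability.
            res = [(1.0 if y[i] == k else 0.0) - probs[i][k]
                   for i in range(len(X))]
            j, t, left, right = fit_stump(X, res)
            left, right = lr * left, lr * right  # store shrunken leaf values
            stumps.append((j, t, left, right))
            for i, x in enumerate(X):
                F[i][k] += left if x[j] <= t else right
        model.append(stumps)
    return model


def predict_proba(model, x):
    """Class probabilities for one cell; read here as similarity scores."""
    scores = [0.0] * len(model[0])
    for stumps in model:
        for k, (j, t, left, right) in enumerate(stumps):
            scores[k] += left if x[j] <= t else right
    return softmax(scores)


def simulate_cells(rng, label, n, shift=3.0):
    """Simulated expression: gene `label` is elevated in population `label`."""
    return [[rng.gauss(0, 1) + (shift if g == label else 0.0)
             for g in range(len(POPULATIONS))] for _ in range(n)]


rng = random.Random(0)
X, y = [], []
for label in range(len(POPULATIONS)):
    X.extend(simulate_cells(rng, label, 40))
    y.extend([label] * 40)

model = train(X, y, n_classes=len(POPULATIONS))

# Noise-free query profile with only the hypothetical "naive" marker elevated,
# standing in for a single CLL cell scored against the healthy populations.
query = [0.0, 3.0, 0.0, 0.0, 0.0]
probs = predict_proba(model, query)
predicted = POPULATIONS[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

In this sketch the probability vector returned by `predict_proba` plays the role of the per-cell similarity described above; the second-level classifiers for naïve and memory subclusters would follow the same pattern with subcluster labels in place of `POPULATIONS`.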
Czech name
—
Czech description
—
Classification
Type
O - Miscellaneous
CEP classification
—
OECD FORD branch
30204 - Oncology
Result continuities
Project
<a href="/en/project/NU20-08-00314" target="_blank" >NU20-08-00314: Single cell analysis: a modern tool to study clonal evolution in high-risk patients with chronic lymphocytic leukemia</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)<br>I - Institutional support for the long-term conceptual development of a research organization
Others
Publication year
2021
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations