Weighted pivot coordinates for partial least squares-based marker discovery in high-throughput compositional data

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989592%3A15310%2F21%3A73610091" target="_blank" >RIV/61989592:15310/21:73610091 - isvavai.cz</a>
Result on the web
<a href="https://onlinelibrary.wiley.com/doi/full/10.1002/sam.11514" target="_blank" >https://onlinelibrary.wiley.com/doi/full/10.1002/sam.11514</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1002/sam.11514" target="_blank" >10.1002/sam.11514</a>

Alternative languages

Language of the result
English
Title in the original language
Weighted pivot coordinates for partial least squares-based marker discovery in high-throughput compositional data
Description in the original language
High-throughput data representing large mixtures of chemical or biological signals are routinely produced in the molecular sciences. Given a set of samples, partial least squares (PLS) regression is a well-established statistical method for investigating associations between these signals and any continuous response variable of interest. However, technical artifacts generally make the raw signals not directly comparable between samples, so data normalization is required before any meaningful scientific information can be drawn. This often allows the processed signals to be characterized as compositional data, in which the relevant information is contained in the pairwise log-ratios between the components of the mixture. The (log-ratio) pivot coordinate approach aggregates the pairwise log-ratios of one component to all the remaining components into a single variable. This simplifies interpretation and the investigation of relative importance, but, particularly in a high-dimensional context, the aggregated log-ratios can easily mix up information from different underlying processes. We therefore propose a weighting strategy for constructing pivot coordinates for PLS regression that draws on the correlation between the response variable and the pairwise log-ratios. Using real and simulated data sets, we demonstrate that this proposal enhances the discovery of biological markers in high-throughput compositional data.
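The pivot-coordinate idea summarized in the abstract can be illustrated with a short sketch. It is not the authors' implementation. The unweighted case follows the standard pivot coordinate: the equally weighted mean of the pairwise log-ratios ln(x_k/x_j), scaled by sqrt((D-1)/D). The weighted case is a simplification that replaces the equal weights 1/(D-1) with hypothetical weights proportional to |corr(y, ln(x_k/x_j))|, rescaled to sum to one, and keeps the same scaling constant. The paper's actual weighting scheme may differ. The helper names `pivot_coordinate`, `correlation_weights` and `weighted_pivot_matrix`, as well as the toy data, are hypothetical. Repeating the construction with each part as the pivot gives one aggregated variable per component.

```python
import math
from typing import List, Optional, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def pivot_coordinate(x: Sequence[float], k: int,
                     weights: Optional[Sequence[float]] = None) -> float:
    """Pivot coordinate of the strictly positive composition x, with part k as pivot.

    Aggregates the D-1 pairwise log-ratios ln(x[k] / x[j]), j != k. With
    weights=None each log-ratio gets weight 1/(D-1), which gives the standard
    pivot coordinate sqrt((D-1)/D) * ln(x[k] / geometric_mean(other parts)).
    ASSUMPTION: the weighted case keeps the same sqrt((D-1)/D) scaling.
    """
    D = len(x)
    others = [j for j in range(D) if j != k]
    if weights is None:
        weights = [1.0 / (D - 1)] * (D - 1)
    logratios = [math.log(x[k] / x[j]) for j in others]
    return math.sqrt((D - 1) / D) * sum(w * lr for w, lr in zip(weights, logratios))


def correlation_weights(X: Sequence[Sequence[float]], y: Sequence[float],
                        k: int) -> List[float]:
    """HYPOTHETICAL weighting: |corr(y, ln(x_k / x_j))| rescaled to sum to one.

    Falls back to equal weights if every correlation is zero.
    """
    D = len(X[0])
    others = [j for j in range(D) if j != k]
    abs_corr = [abs(pearson([math.log(row[k] / row[j]) for row in X], y))
                for j in others]
    total = sum(abs_corr)
    if total == 0:
        return [1.0 / (D - 1)] * (D - 1)
    return [c / total for c in abs_corr]


def weighted_pivot_matrix(X: Sequence[Sequence[float]],
                          y: Sequence[float]) -> List[List[float]]:
    """One weighted pivot coordinate per part (columns) for every sample (rows)."""
    D = len(X[0])
    columns = []
    for k in range(D):
        w = correlation_weights(X, y, k)  # weights estimated from (X, y)
        columns.append([pivot_coordinate(row, k, w) for row in X])
    return [list(r) for r in zip(*columns)]


# Toy data (illustrative only): 4 samples, 3 parts, one continuous response.
X = [[1, 1, 1], [2, 1, 1.5], [4, 1, 1], [8, 1, 1.5]]
y = [0, 1, 2, 3]
print(correlation_weights(X, y, 0))
Z = weighted_pivot_matrix(X, y)
```

The matrix `Z` could then serve as the predictor matrix for a PLS routine, for example `PLSRegression` from scikit-learn's `sklearn.cross_decomposition` module. Because the weights use the response, they would need to be estimated on training data only, so that validation results are not optimistically biased.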

Classification

Type
J<sub>imp</sub> - Journal article in the Web of Science database
CEP field
—
OECD FORD field
10103 - Statistics and probability

Result links

Project
<a href="/cs/project/GA19-07155S" target="_blank" >GA19-07155S: Identification of regulatory networks controlling pea seed coat development using RNA sequencing, protein and metabolomic analyses.</a><br>
Links
P - Research and development project financed from public sources (with a link to CEP)

Other

Year of application
2021
Data confidentiality code
S - Complete and truthful data on the project are not subject to protection under special legal regulations

Data specific to the result type

Journal title
Statistical Analysis and Data Mining
ISSN
1932-1864
e-ISSN
—
Journal volume
14
Issue number within the volume
4
Country of the journal publisher
US - United States of America
Number of pages
16
Pages (from-to)
315-330
Article UT WoS code
000651867400001
Result EID in the Scopus database
2-s2.0-85106329473
2-s2.0-85106329473

Similar results (10)

Principal balances of compositional data for regression and classification using partial least squares
Selective pivot logratio coordinates for partial least squares discriminant analysis modelling with applications in metabolomics
Partial least squares regression with compositional response variables and covariates
