Weighted pivot coordinates for partial least squares-based marker discovery in high-throughput compositional data
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989592%3A15310%2F21%3A73610091" target="_blank" >RIV/61989592:15310/21:73610091 - isvavai.cz</a>
Result on the web
<a href="https://onlinelibrary.wiley.com/doi/full/10.1002/sam.11514" target="_blank" >https://onlinelibrary.wiley.com/doi/full/10.1002/sam.11514</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1002/sam.11514" target="_blank" >10.1002/sam.11514</a>
Alternative languages
Result language
English
Original language name
Weighted pivot coordinates for partial least squares-based marker discovery in high-throughput compositional data
Original language description
High-throughput data representing large mixtures of chemical or biological signals are routinely produced in the molecular sciences. Given a number of samples, partial least squares (PLS) regression is a well-established statistical method for investigating associations between these signals and any continuous response variables of interest. However, technical artifacts generally make the raw signals not directly comparable between samples. Thus, data normalization is required before any meaningful scientific information can be drawn. This often allows the processed signals to be characterized as compositional data, where the relevant information is contained in the pairwise log-ratios between the components of the mixture. The (log-ratio) pivot coordinate approach facilitates the aggregation of the pairwise log-ratios of a component to all the remaining components into single variables. This simplifies interpretation and the investigation of the components' relative importance but, particularly in a high-dimensional context, the aggregated log-ratios can easily mix up information from different underlying processes. In this context, we propose a weighting strategy for the construction of pivot coordinates for PLS regression which draws on the correlation between the response variable and the pairwise log-ratios. Using real and simulated data sets, we demonstrate that this proposal enhances the discovery of biological markers in high-throughput compositional data.
Czech name
—
Czech description
—
Classification
Type
J<sub>imp</sub> - Article in a specialist periodical, which is included in the Web of Science database
CEP classification
—
OECD FORD branch
10103 - Statistics and probability
Result continuities
Project
<a href="/en/project/GA19-07155S" target="_blank" >GA19-07155S: Identification of regulatory networks controlling pea seed coat development using combination of RNA sequencing, protein and metabolites analysis.</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2021
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Statistical Analysis and Data Mining
ISSN
1932-1864
e-ISSN
—
Volume of the periodical
14
Issue of the periodical within the volume
4
Country of publishing house
US - UNITED STATES
Number of pages
16
Pages from-to
315-330
UT code for WoS article
000651867400001
EID of the result in the Scopus database
2-s2.0-85106329473