Streamlining NMR Chemical Shift Predictions for Intrinsically Disordered Proteins: Design of Ensembles with Dimensionality Reduction and Clustering
Identifikátory výsledku
Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11160%2F24%3A10487989" target="_blank" >RIV/00216208:11160/24:10487989 - isvavai.cz</a>
Nalezeny alternativní kódy
RIV/00216224:14310/24:00136971 RIV/62690094:18470/24:50021657
Výsledek na webu
<a href="https://verso.is.cuni.cz/pub/verso.fpl?fname=obd_publikace_handle&handle=MQoIS34g.v" target="_blank" >https://verso.is.cuni.cz/pub/verso.fpl?fname=obd_publikace_handle&handle=MQoIS34g.v</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1021/acs.jcim.4c00809" target="_blank" >10.1021/acs.jcim.4c00809</a>
Alternativní jazyky
Jazyk výsledku
angličtina
Název v původním jazyce
Streamlining NMR Chemical Shift Predictions for Intrinsically Disordered Proteins: Design of Ensembles with Dimensionality Reduction and Clustering
Popis výsledku v původním jazyce
By merging advanced dimensionality reduction (DR) and clustering algorithm (CA) techniques, our study advances the sampling procedure for predicting NMR chemical shifts (CS) in intrinsically disordered proteins (IDPs), making a significant leap forward in the field of protein analysis/modeling. We enhance NMR CS sampling by generating clustered ensembles that accurately reflect the different properties and phenomena encapsulated by the IDP trajectories. This investigation critically assessed different rapid CS predictors, both neural network (e.g., Sparta+ and ShiftX2) and database-driven (ProCS-15), and highlighted the need for more advanced quantum calculations and the subsequent need for more tractable-sized conformational ensembles. Although neural network CS predictors outperformed ProCS-15 for all atoms, all tools showed poor agreement with H-N CSs, and the neural network CS predictors were unable to capture the influence of phosphorylated residues, highly relevant for IDPs. This study also addressed the limitations of using direct clustering with collective variables, such as the widespread implementation of the GROMOS algorithm. Clustered ensembles (CEs) produced by this algorithm showed poor performance with chemical shifts compared to sequential ensembles (SEs) of similar size. Instead, we implement a multiscale DR and CA approach and explore the challenges and limitations of applying these algorithms to obtain more robust and tractable CEs. The novel feature of this investigation is the use of solvent-accessible surface area (SASA) as one of the fingerprints for DR alongside previously investigated alpha carbon distance/angles or phi/psi dihedral angles. The ensembles produced with SASA tSNE DR produced CEs better aligned with the experimental CS of between 0.17 and 0.36 r(2) (0.18-0.26 ppm) depending on the system and replicate. Furthermore, this technique produced CEs with better agreement than traditional SEs in 85.7% of all ensemble sizes. This study investigates the quality of ensembles produced based on different input features, comparing latent spaces produced by linear vs nonlinear DR techniques and a novel integrated silhouette score scanning protocol for tSNE DR.
Název v anglickém jazyce
Streamlining NMR Chemical Shift Predictions for Intrinsically Disordered Proteins: Design of Ensembles with Dimensionality Reduction and Clustering
Popis výsledku anglicky
By merging advanced dimensionality reduction (DR) and clustering algorithm (CA) techniques, our study advances the sampling procedure for predicting NMR chemical shifts (CS) in intrinsically disordered proteins (IDPs), making a significant leap forward in the field of protein analysis/modeling. We enhance NMR CS sampling by generating clustered ensembles that accurately reflect the different properties and phenomena encapsulated by the IDP trajectories. This investigation critically assessed different rapid CS predictors, both neural network (e.g., Sparta+ and ShiftX2) and database-driven (ProCS-15), and highlighted the need for more advanced quantum calculations and the subsequent need for more tractable-sized conformational ensembles. Although neural network CS predictors outperformed ProCS-15 for all atoms, all tools showed poor agreement with H-N CSs, and the neural network CS predictors were unable to capture the influence of phosphorylated residues, highly relevant for IDPs. This study also addressed the limitations of using direct clustering with collective variables, such as the widespread implementation of the GROMOS algorithm. Clustered ensembles (CEs) produced by this algorithm showed poor performance with chemical shifts compared to sequential ensembles (SEs) of similar size. Instead, we implement a multiscale DR and CA approach and explore the challenges and limitations of applying these algorithms to obtain more robust and tractable CEs. The novel feature of this investigation is the use of solvent-accessible surface area (SASA) as one of the fingerprints for DR alongside previously investigated alpha carbon distance/angles or phi/psi dihedral angles. The ensembles produced with SASA tSNE DR produced CEs better aligned with the experimental CS of between 0.17 and 0.36 r(2) (0.18-0.26 ppm) depending on the system and replicate. Furthermore, this technique produced CEs with better agreement than traditional SEs in 85.7% of all ensemble sizes. This study investigates the quality of ensembles produced based on different input features, comparing latent spaces produced by linear vs nonlinear DR techniques and a novel integrated silhouette score scanning protocol for tSNE DR.
Klasifikace
Druh
J<sub>imp</sub> - Článek v periodiku v databázi Web of Science
CEP obor
—
OECD FORD obor
10403 - Physical chemistry
Návaznosti výsledku
Projekt
Výsledek vznikl pri realizaci vícero projektů. Více informací v záložce Projekty.
Návaznosti
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)
Ostatní
Rok uplatnění
2024
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Údaje specifické pro druh výsledku
Název periodika
Journal of Chemical Information and Modeling
ISSN
1549-9596
e-ISSN
1549-960X
Svazek periodika
64
Číslo periodika v rámci svazku
16
Stát vydavatele periodika
US - Spojené státy americké
Počet stran výsledku
15
Strana od-do
6542-6556
Kód UT WoS článku
001284734600001
EID výsledku v databázi Scopus
2-s2.0-85200534789