Semantic biclustering for finding local, interpretable and predictive expression patterns
Result identifiers
Result code in IS VaVaI
RIV/68407700:21230/17:00314750 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F17%3A00314750)
Result on the web
https://doi.org/10.1186/s12864-017-4132-5
DOI - Digital Object Identifier
10.1186/s12864-017-4132-5
Alternative languages
Result language
English
Title in the original language
Semantic biclustering for finding local, interpretable and predictive expression patterns
Result description in the original language
Background: One of the major challenges in the analysis of gene expression data is to identify local patterns composed of genes showing coherent expression across subsets of experimental conditions. Such patterns may provide an understanding of underlying biological processes related to these conditions. This understanding can be further improved by providing concise characterizations of the genes and situations delimiting the pattern. Results: We propose a method called semantic biclustering that aims to detect interpretable rectangular patterns in binary data matrices. As usual in biclustering, we seek homogeneous submatrices; however, we also require that the included elements can be jointly described in terms of semantic annotations pertaining to both rows (genes) and columns (samples). To find such interpretable biclusters, we explore two strategies. The first endows an existing biclustering algorithm with the semantic ingredients. The other is based on rule and tree learning known from machine learning. Conclusions: The two alternatives are tested in experiments with two Drosophila melanogaster gene expression datasets. Both strategies are shown to detect sets of compact biclusters with semantic descriptions that also remain largely valid for unseen (testing) data. This desirable generalization is more pronounced in the strategy stemming from conventional biclustering, although it comes at the cost of more complex descriptions (a higher number of ontology terms employed); the alternative strategy yields simpler descriptions.
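The notion of a bicluster that is both homogeneous and semantically describable can be illustrated with a small sketch. This is not the authors' algorithm: the matrix, the term names (`termA`, `stage1`, and so on) and the helper `semantic_bicluster` are hypothetical, and the sketch assumes the simplest possible description, one annotation term for rows and one for columns.

```python
import numpy as np

# Toy binary expression matrix: rows are genes, columns are samples.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

# Hypothetical semantic annotations (illustrative names, not from the paper).
gene_terms = [{"termA"}, {"termA", "termB"}, {"termB"}]
sample_terms = [{"stage1"}, {"stage1"}, {"stage2"}, {"stage2"}]


def semantic_bicluster(X, gene_terms, sample_terms, gene_term, sample_term):
    """Select the submatrix described by one row term and one column term.

    Returns the selected row indices, column indices, and the fraction of
    ones in the submatrix as a simple homogeneity score.
    """
    rows = [i for i, terms in enumerate(gene_terms) if gene_term in terms]
    cols = [j for j, terms in enumerate(sample_terms) if sample_term in terms]
    sub = X[np.ix_(rows, cols)]
    density = float(sub.mean()) if sub.size else 0.0
    return rows, cols, density


rows, cols, density = semantic_bicluster(
    X, gene_terms, sample_terms, "termA", "stage1"
)
print(rows, cols, density)
```

In this toy setting, a description is useful when the submatrix it selects is dense in ones; checking the same description on held-out rows and columns would correspond to the validity on unseen data mentioned in the abstract.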
Classification
Type
Jimp - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
GA14-21421S: Automatic analysis of spatial gene expression patterns (Automatická analýza prostorových vzorů genové exprese)
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2017
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Name of the periodical
BMC Genomics
ISSN
1471-2164
e-ISSN
1471-2164
Volume of the periodical
18
Issue of the periodical within the volume
4132
Country of the periodical's publisher
GB - United Kingdom of Great Britain and Northern Ireland
Number of pages of the result
13
Pages from-to
—
UT WoS code of the article
000413785600005
EID of the result in the Scopus database
2-s2.0-85031494977