Genetic network inference
These are my notes from reading the paper "Genetic network inference".
Abstract
- high-throughput gene expression assays
- gene regulatory network
- clustering of co-expression profiles –> infer shared regulatory inputs and functional pathways
- various aspects of clustering –> distance measures, clustering algorithms, multiple-cluster memberships
- infer causal connections
- discrete Boolean networks, continuous linear and non-linear models
- combination of predictive modeling with systematic experimental verification
Introduction
assays for RNA and protein expression profiles, with different levels of precision and depth:
- hybridization microarrays
- automated RT-PCR
- 2-dimensional gel electrophoresis
- antibody arrays
genetic information flow: defining the mapping from sequence space to functional space
complex dynamic systems: from states to trajectories to attractors
- principles of genetic network organization
- computational methods for extracting network architectures from experimental data
A conceptual approach to complex network dynamics
- The global gene expression pattern is the result of the collective behavior of individual regulatory pathways
- In such highly interconnected cellular signaling networks, gene function depends on its cellular context
- Dynamic systems with large numbers of variables present a difficult mathematical problem –> radically simplify the individual molecular interactions and focus on the collective outcome.
Boolean Networks
- Each gene is considered as a binary variable (either ON or OFF), regulated by other genes through logical or Boolean functions.
- Cell differentiation corresponds to transitions from one global gene expression pattern to another.
- Stability of global gene expression patterns can be understood as a consequence of the dynamic properties of the network, namely that all networks fall into one or more attractors, representing stable states of cell differentiation, adaptation or disease.
- $N$ genes: $2^N$ possible gene expression patterns
- each gene is controlled by up to $K$ other genes in the network
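To make the attractor idea concrete for myself, here is a minimal sketch of a synchronous Boolean network in Python. The three genes and their rules are hypothetical (my own toy example with $N = 3$, $K = 2$, not taken from the paper). The code enumerates all $2^N$ states and follows each one until it falls into a repeating cycle.

```python
from itertools import product

# Hypothetical 3-gene network: each rule maps the current state (a, b, c)
# to the next value of one gene. Every rule reads at most K = 2 inputs.
RULES = (
    lambda a, b, c: b and c,   # gene A is ON only if B and C are both ON
    lambda a, b, c: a or c,    # gene B is ON if A or C is ON
    lambda a, b, c: not a,     # gene C is repressed by A
)

def step(state):
    """Synchronously update every gene from the current global state."""
    return tuple(int(bool(rule(*state))) for rule in RULES)

def find_attractor(state):
    """Iterate until a state repeats; return the repeating cycle of states."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return tuple(seen[seen.index(state):])

def canonical(cycle):
    """Rotate a cycle so it starts at its smallest state (for deduplication)."""
    i = cycle.index(min(cycle))
    return cycle[i:] + cycle[:i]

def attractors(n_genes):
    """Collect the distinct attractors reached from all 2**n_genes states."""
    starts = product((0, 1), repeat=n_genes)
    return {canonical(find_attractor(s)) for s in starts}

if __name__ == "__main__":
    for cycle in attractors(len(RULES)):
        print(len(cycle), cycle)
```

Exhaustive enumeration only works for tiny $N$, since the state space doubles with every added gene.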
Inference of regulation through clustering of gene expression data
- Large-scale gene screening technologies such as mRNA hybridization micro-arrays have dramatically increased our ability to explore the living organism at the genomic level
- Classify gene expression patterns to explore shared functions and regulation.
- The simplest approach to clustering, sometimes referred to as guilt by association (GBA), is to select a gene and determine its nearest neighbors in expression space within a certain user-defined distance cut-off.
- Genes sharing the same expression pattern are likely to be involved in the same regulatory process.
- Clustering allows us to extract groups of genes that are tightly co-expressed over a range of different experiments.
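The nearest-neighbor approach (GBA) can be sketched in a few lines. This is only an illustration: the gene names, expression values, cut-off and the choice of Euclidean distance are my own assumptions.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def guilt_by_association(profiles, query, cutoff, distance=euclidean):
    """Return the genes within `cutoff` of the query gene, nearest first."""
    ref = profiles[query]
    hits = [(distance(ref, p), g) for g, p in profiles.items() if g != query]
    return [g for d, g in sorted(hits) if d <= cutoff]

# Hypothetical expression profiles over four experiments.
profiles = {
    "geneA": [0.1, 0.5, 0.9, 0.4],
    "geneB": [0.2, 0.5, 0.8, 0.4],
    "geneC": [0.9, 0.4, 0.1, 0.6],
}

if __name__ == "__main__":
    print(guilt_by_association(profiles, "geneA", cutoff=0.3))
```

The result depends entirely on the cut-off and the distance measure, which is why the distance choice matters so much.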
Distance measures and preprocessing
Distance measures can be divided into at least three classes, emphasizing different regularities present within the data:
- similarity according to positive correlations, which may identify similar or identical regulation;
- similarity according to positive and negative correlations, which may also help identify control processes that antagonistically regulate downstream pathways;
- similarity according to mutual information, which may detect even more complex relationships.
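A rough sketch of one possible distance per class, using only the standard library. The function names are mine, and I assume the profiles are non-constant (otherwise the correlation is undefined) and that profiles passed to `mutual_information` have already been discretized into a few levels.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # fails for constant profiles (sx or sy == 0)

def positive_corr_distance(x, y):
    """Class 1: small only when the profiles are positively correlated."""
    return 1.0 - pearson(x, y)

def signed_corr_distance(x, y):
    """Class 2: small for positive or negative correlation."""
    return 1.0 - abs(pearson(x, y))

def mutual_information(x, y):
    """Class 3: mutual information in bits between discretized profiles."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())
```

Mutual information is a similarity (higher means more related), so it has to be inverted or normalized before being used as a distance.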
Distances
- For even longer time series, distance measures based on Fourier or wavelet transforms may be considered.
Normalization and other preprocessing of the data
- Distance measures that are sensitive to scaling and/or offsets (such as Euclidean distance) may require normalization of the data.
- Normalization can be done with respect to the maximum expression level for each gene, with respect to both minimum and maximum expression levels or with respect to the mean and standard deviation of each expression profile.
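The three normalization options can be written down directly. A minimal sketch: the function names are mine, and I assume each profile is a non-constant list of numbers (a constant profile would divide by zero).

```python
import statistics

def scale_by_max(profile):
    """Normalize with respect to the maximum expression level of the gene."""
    peak = max(profile)
    return [v / peak for v in profile]

def scale_min_max(profile):
    """Normalize with respect to both minimum and maximum: range becomes [0, 1]."""
    lo, hi = min(profile), max(profile)
    return [(v - lo) / (hi - lo) for v in profile]

def z_score(profile):
    """Normalize with respect to the mean and (population) standard deviation."""
    mean = statistics.mean(profile)
    sd = statistics.pstdev(profile)
    return [(v - mean) / sd for v in profile]
```

For example, `[1, 2, 3]` and `[10, 20, 30]` are far apart in Euclidean distance but have identical z-scored profiles, which is the scaling and offset sensitivity the notes mention.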
Clustering algorithms
non-hierarchical methods
- K-means, self-organizing maps (SOM)
- EM
- AutoClass (related to EM)
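As an illustration of the non-hierarchical family, here is a bare-bones K-means sketch. It is simplified: the initial centroids are just the first `k` profiles (my assumption; real implementations use better initialization), and the data in the usage example are made up.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def kmeans(points, k, iterations=100):
    """Assign each profile to one of k clusters; return (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k profiles
    labels = None
    for _ in range(iterations):
        # Assignment step: each profile goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

if __name__ == "__main__":
    data = [[0.0, 0.0], [5.0, 5.0], [0.0, 1.0], [5.0, 6.0]]
    print(kmeans(data, k=2))
```

Note that K-means puts every gene in exactly one cluster, which does not allow for multiple-cluster membership.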
Hierarchical methods
- FITCH
Gene expression clustering is potentially useful in at least three areas:
- extraction of regulatory motifs (co-regulation from co-expression)
- inference of functional annotation
- as a molecular signature in distinguishing cell or tissue types
- Clustering of gene expression patterns helps distinguish cell types.
- For complex datasets spanning a variety of biological responses, a gene should by definition be a member of several clusters, each reflecting a particular aspect of its function and control.
- Several clustering methods could be used simultaneously, allocating each gene to several clusters based on the different regularities emphasized by each method.
Modeling methodologies
Gene-regulation models
- very abstract: Kauffman’s random Boolean networks –> the most mathematically tractable, and its simplicity allows examination of very large systems
- very concrete: the full biochemical interaction models with stochastic kinetics –> fits the biochemical reality better and may carry more weight with the experimental biologists –> its complexity necessarily restricts it to very small systems
Negative feedback with a moderate feedback gain has a stabilizing effect on the output of the system. However, negative feedback in Boolean circuits will always cause oscillations, rather than increased stability, because the Boolean transfer function effectively has an infinite slope (saturating at 0 and 1).
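To convince myself of this, here is a toy comparison for a single self-repressing gene. It is my own construction, not the paper's model: a discrete-time map $x_{t+1} = f(x_t)$ with a sigmoidal repression function, where `gain` sets the slope at the midpoint. The Boolean case is the limit of infinite slope.

```python
import math

def repress(x, gain):
    """Sigmoidal self-repression: output falls as x rises past 0.5."""
    return 1.0 / (1.0 + math.exp(gain * (x - 0.5)))

def trajectory(x0, gain, steps):
    """Iterate the continuous-valued map x -> repress(x, gain)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(repress(xs[-1], gain))
    return xs

def boolean_trajectory(x0, steps):
    """Boolean self-repression: the gene simply negates itself each step."""
    xs = [x0]
    for _ in range(steps):
        xs.append(1 - xs[-1])
    return xs

if __name__ == "__main__":
    print(trajectory(0.9, gain=2.0, steps=50)[-3:])   # moderate gain
    print(trajectory(0.9, gain=50.0, steps=50)[-3:])  # very steep transfer function
    print(boolean_trajectory(1, 6))
```

In this toy map the slope at the fixed point $x = 0.5$ is $-\text{gain}/4$, so the moderate gain settles to 0.5 while the steep one keeps flipping between values near 0 and 1, like the Boolean version.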
hybrid Boolean systems: each gene has a continuous-valued internal state and a Boolean external state
Deterministic or stochastic
Stochastic Petri Nets (SPNs): a subset of Markov processes, can be used to model molecular interactions
Spatial or non-spatial
Data availability
Forward and inverse modeling
reverse engineering problem: given an amount of data, what can we deduce about the unknown underlying regulatory network?
Gene network inference: reverse engineering
Clustering only tells us which genes are co-regulated, not what is regulating what.
Data requirements
To correctly infer the regulation of a single gene, we need to observe the expression of that gene under many different combinations of expression levels of its regulatory inputs.
- time series
- individual measurements
The advantage of the time series is that it can provide insight into the dynamics of the process. On the other hand, data sets consisting of individual measurements provide an efficient way to map the attractors of the network.
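The data requirement can be illustrated with a brute-force search for the inputs of one gene in a Boolean network. This is a naive consistency check of my own, not an algorithm from the paper, and the 3-gene rules that generate the data are hypothetical. For increasing $k$, it looks for sets of $k$ genes whose values always determine the target gene's next value in the observed transitions.

```python
from itertools import combinations, product

def infer_inputs(transitions, target, max_k):
    """Return the smallest input sets (with truth tables) consistent with the data."""
    n = len(transitions[0][0])
    for k in range(max_k + 1):
        found = []
        for inputs in combinations(range(n), k):
            table = {}
            consistent = True
            for before, after in transitions:
                key = tuple(before[i] for i in inputs)
                # Same input values must always give the same output.
                if table.setdefault(key, after[target]) != after[target]:
                    consistent = False
                    break
            if consistent:
                found.append((inputs, table))
        if found:
            return found
    return []

def step(state):
    """Hypothetical network used only to generate example data."""
    a, b, c = state
    return (int(b and c), int(a or c), int(not a))

# All 8 possible states with their successors (the ideal, fully observed case).
transitions = [(s, step(s)) for s in product((0, 1), repeat=3)]

if __name__ == "__main__":
    print(infer_inputs(transitions, target=2, max_k=2))      # full data
    print(infer_inputs(transitions[:2], target=2, max_k=2))  # only two observations
```

With all transitions observed, the search recovers that gene 2 depends only on gene 0. With only two observations, the target never changes, so the simplest consistent answer is "no inputs at all", which is wrong.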
Estimates for various network models
Correlation metric construction
Systems of differential equations
Conclusions and outlook
- Measurement quantity, depth and quality.
- Clustering and functional categorization.
- Reverse engineering.
- Integrated modeling: combine large-scale gene expression data with other sources, ranging from sequence homology and cis-regulatory sequences to disease association and a wide variety of functional knowledge from targeted experiments. The challenge is the reliability and compatibility of these data sets.
- Coupling of modeling with systematic experimental design.
new words
- unravel
- metazoa
- basin
- fluctuate
- fluctuation
- yeast
- motifs
- Conspicuously
- elliptical
- mutual
- obscure
- agglomerative
- ligand
- coherent
- circuitry
- asynchronously
- eukaryotic
- compartments
- deduce
- simplistic
- coarse
- copious
- devise
- strain
- compatibility
- fidelity