WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Genetic network inference

Tags: Bioinformatics

These are my notes from reading the paper called Genetic network inference.

Abstract

  1. high-throughput gene expression assays
  2. gene regulatory network
  3. clustering of co-expression profiles –> infer shared regulatory inputs and functional pathways
  4. various aspects of clustering –> distance measures, clustering algorithms, multiple-cluster memberships
  5. infer causal connections
  6. discrete Boolean networks; continuous linear and non-linear models
  7. combination of predictive modeling with systematic experimental verification

Introduction

Technologies to assay RNA and protein expression profiles with different levels of precision and depth:

  1. hybridization microarrays
  2. automated RT-PCR
  3. 2-dimensional gel electrophoresis
  4. antibody arrays

genetic information flow: defining the mapping from sequence space to functional space

complex dynamic systems: from states to trajectories to attractors

  1. principles of genetic network organization
  2. computational methods for extracting network architectures from experimental data

A conceptual approach to complex network dynamics

  1. The global gene expression pattern is the result of the collective behavior of individual regulatory pathways
  2. In such highly interconnected cellular signaling networks, gene function depends on its cellular context
  3. Dynamic systems with large numbers of variables present a difficult mathematical problem –> radically simplify the individual molecular interactions and focus on the collective outcome.

Boolean Networks

Each gene is considered a binary variable (either ON or OFF), regulated by other genes through logical or Boolean functions. Cell differentiation corresponds to transitions from one global gene expression pattern to another. The stability of global gene expression patterns can be understood as a consequence of the dynamic properties of the network, namely that all networks fall into one or more attractors, representing stable states of cell differentiation, adaptation or disease.

With $N$ genes there are $2^N$ possible gene expression patterns, and each gene is controlled by up to $K$ other genes in the network.
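
To make the $2^N$ state space and the idea of attractors concrete, here is a minimal Python sketch. The three-gene network and its rules (`RULES`) are hypothetical, chosen only for illustration, and genes are assumed to update synchronously (all at once), which is one common convention for Boolean networks but not the only one.

```python
from itertools import product

# Hypothetical 3-gene network (N = 3, each gene has K <= 2 inputs).
# Each rule reads the current global state (a tuple of 0/1 values for
# genes A, B, C) and returns the next value of one gene.
RULES = [
    lambda s: s[1] and not s[2],  # A is ON if B is ON and C is OFF
    lambda s: s[0] or s[2],       # B is ON if A or C is ON
    lambda s: s[0] and s[1],      # C is ON if A and B are ON
]


def step(state):
    """Synchronous update: every gene reads the same current state."""
    return tuple(int(bool(rule(state))) for rule in RULES)


def attractor_from(state):
    """Follow the trajectory until a state repeats; return the cycle reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return tuple(seen[seen.index(state):])


def all_attractors(n):
    """Map each attractor (as a frozenset of states) to its basin of attraction."""
    basins = {}
    for state in product((0, 1), repeat=n):  # all 2**n expression patterns
        cycle = frozenset(attractor_from(state))
        basins.setdefault(cycle, []).append(state)
    return basins


basins = all_attractors(len(RULES))
for cycle, states in basins.items():
    print(sorted(cycle), "basin size:", len(states))
# [(0, 0, 0)] basin size: 1
# [(0, 1, 0), (1, 0, 0)] basin size: 7
```

In this toy network the all-OFF pattern is a fixed point, and the other seven patterns all fall into a two-state cycle.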

Inference of regulation through clustering of gene expression data

  1. Large-scale gene screening technologies such as mRNA hybridization micro-arrays have dramatically increased our ability to explore the living organism at the genomic level
  2. classify gene expression patterns to explore shared functions and regulation. –> The simplest approach to clustering, sometimes referred to as "guilt by association" (GBA), is to select a gene and determine its nearest neighbors in expression space within a certain user-defined distance cut-off –> Genes sharing the same expression pattern are likely to be involved in the same regulatory process. –> Clustering allows us to extract groups of genes that are tightly co-expressed over a range of different experiments.
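
Guilt by association boils down to a nearest-neighbor query in expression space. A minimal sketch, assuming Euclidean distance and a hypothetical toy dataset `profiles` (gene name to expression values across the same experiments); the function name `guilt_by_association` is illustrative only.

```python
import math

# Toy expression profiles: hypothetical genes measured in the same four experiments.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.1, 2.9, 4.2],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 2.5, 3.0, 3.5],
}


def guilt_by_association(profiles, query, cutoff):
    """Return (gene, distance) pairs within `cutoff` of the query gene, nearest first."""
    target = profiles[query]
    hits = []
    for gene, profile in profiles.items():
        if gene == query:
            continue
        d = math.dist(target, profile)  # Euclidean distance in expression space
        if d <= cutoff:
            hits.append((gene, d))
    return sorted(hits, key=lambda hit: hit[1])


neighbors = guilt_by_association(profiles, "g1", cutoff=1.0)
print([gene for gene, _ in neighbors])  # ['g2', 'g4']
```

Note that g3, the mirror image of g1, is excluded under Euclidean distance; whether it should count as related depends on the distance measure chosen.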

Distance measures and preprocessing

Distance measures can be divided into at least three classes, emphasizing different regularities present within the data:

  1. similarity according to positive correlations, which may identify similar or identical regulation;
  2. similarity according to positive and negative correlations, which may also help identify control processes that antagonistically regulate downstream pathways;
  3. similarity according to mutual information, which may detect even more complex relationships.
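
One concrete instance of each class, as a sketch: $1 - r$ for positive correlation, $1 - |r|$ for positive or negative correlation, and mutual information estimated by equal-width binning. The binning scheme and the default of three bins are my assumptions, not the paper's; mutual information estimates on short profiles are quite sensitive to that choice.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient r of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


def dist_positive(x, y):
    """Class 1: only positive correlation counts as similar (range 0..2)."""
    return 1.0 - pearson(x, y)


def dist_signed(x, y):
    """Class 2: strong positive or negative correlation counts as similar (range 0..1)."""
    return 1.0 - abs(pearson(x, y))


def mutual_information(x, y, bins=3):
    """Class 3: mutual information in bits, after equal-width binning of each profile."""
    def discretize(v):
        lo, hi = min(v), max(v)
        width = (hi - lo) / bins or 1.0  # avoid division by zero for flat profiles
        return [min(int((a - lo) / width), bins - 1) for a in v]

    dx, dy = discretize(x), discretize(y)
    n = len(dx)
    joint, px, py = Counter(zip(dx, dy)), Counter(dx), Counter(dy)
    # Sum over observed bin pairs of p(i,j) * log2(p(i,j) / (p(i) * p(j))).
    return sum((c / n) * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


up = [1, 2, 3, 4, 5, 6]
down = [6, 5, 4, 3, 2, 1]
print(dist_positive(up, down), dist_signed(up, down))  # 2.0 0.0

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]  # deterministic but non-linear, and uncorrelated
print(dist_positive(x, y), dist_signed(x, y))  # 1.0 1.0
print(round(mutual_information(x, y), 2))      # 0.59
```

The anti-correlated pair is maximally distant under class 1 but identical under class 2, and the quadratic pair looks unrelated to both correlation measures while its mutual information is clearly positive.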

Distances

  1. For long time series, distance measures based on Fourier or wavelet transforms may be considered.
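
A minimal sketch of one possible Fourier-based distance, assuming evenly sampled series of equal length: compare the magnitude spectra of the two profiles. Dropping the phase makes the measure insensitive to circular time shifts. The specific choice (magnitudes only, Euclidean comparison) is my assumption, and a plain O(n²) DFT is used to keep the sketch dependency-free.

```python
import cmath
import math


def dft_magnitudes(x):
    """|X_k| for k = 0..n//2 of the discrete Fourier transform of a real series."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def fourier_distance(x, y):
    """Euclidean distance between magnitude spectra; phase (timing) is ignored."""
    return math.dist(dft_magnitudes(x), dft_magnitudes(y))


x = [math.sin(2 * math.pi * t / 8) for t in range(8)]
y = x[2:] + x[:2]  # the same oscillation, circularly shifted by two time points

print(math.dist(x, y))         # about 2.83: point-by-point comparison sees a big difference
print(fourier_distance(x, y))  # close to 0: same frequency content
```

Only circular shifts leave the magnitudes exactly unchanged; for real, non-periodic time series the shift invariance is approximate.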

normalization and other preprocessing of the data.

  1. Distance measures that are sensitive to scaling and/or offsets (such as Euclidean distance) may require normalization of the data.
  2. Normalization can be done with respect to the maximum expression level for each gene, with respect to both minimum and maximum expression levels or with respect to the mean and standard deviation of each expression profile.
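
The three normalizations listed above, as a minimal sketch. Assumptions: each profile has a positive maximum and is not constant (otherwise the min-max and z-score versions divide by zero), and the z-score uses the population standard deviation.

```python
import math


def norm_max(profile):
    """Divide by the gene's maximum expression level (assumes max > 0)."""
    top = max(profile)
    return [v / top for v in profile]


def norm_min_max(profile):
    """Map the gene's minimum to 0 and maximum to 1 (assumes a non-constant profile)."""
    lo, hi = min(profile), max(profile)
    return [(v - lo) / (hi - lo) for v in profile]


def norm_z(profile):
    """Subtract the mean and divide by the (population) standard deviation."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in profile) / n)
    return [(v - mean) / sd for v in profile]


a = [2.0, 4.0, 8.0]
b = [4.0, 8.0, 16.0]  # same shape as a, twice the scale
print(math.dist(a, b))                  # about 9.17 on the raw values
print(math.dist(norm_z(a), norm_z(b)))  # close to 0 after z-scoring
print(norm_max(a))                      # [0.25, 0.5, 1.0]
```

Because b is just a scaled copy of a, the two profiles coincide after max or z-score normalization, while Euclidean distance on the raw values treats them as far apart.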

Clustering algorithms

non-hierarchical methods

  1. K-means, and the closely related self-organizing maps (SOMs)
  2. expectation maximization (EM)
  3. AutoClass (related to EM)
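
Of the non-hierarchical methods, K-means is the easiest to write down. A minimal sketch of plain K-means (not a SOM, not EM), assuming Euclidean distance, a fixed `k`, and initial centers drawn at random from the data with a fixed seed; `kmeans` and the toy profiles are hypothetical.

```python
import math
import random


def kmeans(profiles, k, iters=100, seed=0):
    """Plain K-means: alternate nearest-center assignment and center update."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(profiles, k)]
    labels = []
    for _ in range(iters):
        # Assignment step: each profile goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in profiles]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's old center
        if new_centers == centers:  # converged: assignments will not change
            break
        centers = new_centers
    return labels, centers


profiles = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = kmeans(profiles, k=2)
print(labels)
```

The first three profiles end up sharing one label and the last three the other; which group gets label 0 depends on the random initialization.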

Hierarchical methods

  1. FITCH
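
FITCH builds a tree from a matrix of pairwise distances. As a simpler stand-in (not FITCH itself), here is a minimal sketch of agglomerative hierarchical clustering with average linkage: start with every gene in its own cluster and repeatedly merge the two closest clusters. Euclidean distance, the stopping rule `n_clusters`, and the toy profiles are assumptions.

```python
import math
from itertools import combinations


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: merge the two closest clusters until n_clusters remain.

    profiles: dict of gene name -> expression profile.
    Returns the final clusters and the list of merges in the order they happened.
    """
    clusters = [[gene] for gene in profiles]  # start: one cluster per gene

    def cluster_distance(a, b):
        # Average linkage: mean Euclidean distance over all cross-cluster pairs.
        pairs = [(g, h) for g in a for h in b]
        return sum(math.dist(profiles[g], profiles[h]) for g, h in pairs) / len(pairs)

    merges = []
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        merges.append(merged)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters, merges


profiles = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [1.0, 2.0, 3.5],
    "g3": [8.0, 8.0, 8.0],
    "g4": [9.0, 8.0, 8.0],
}
clusters, merges = average_linkage(profiles, n_clusters=2)
print(merges)  # [['g1', 'g2'], ['g3', 'g4']]
```

Recording the merge order gives the tree (dendrogram); stopping at a different `n_clusters` corresponds to cutting it at a different height.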

Gene expression clustering is potentially useful in at least three areas:

  1. extraction of regulatory motifs (co-regulation from co-expression)
  2. inference of functional annotation
  3. as a molecular signature in distinguishing cell or tissue types

Clustering of gene expression patterns helps distinguish cell types

for complex datasets spanning a variety of biological responses, a gene should by definition be a member of several clusters, each reflecting a particular aspect of its function and control

several clustering methods could be used simultaneously, allocating each gene to several clusters based on the different regularities emphasized by each method.

Modeling methodologies

Gene-regulation models

  1. very abstract: Kauffman’s random Boolean networks –> the most mathematically tractable, and its simplicity allows examination of very large systems
  2. very concrete: the full biochemical interaction models with stochastic kinetics –> fits the biochemical reality better and may carry more weight with the experimental biologists –> its complexity necessarily restricts it to very small systems

Negative feedback with a moderate feedback gain has a stabilizing effect on the output of the system. However, negative feedback in Boolean circuits will always cause oscillations, rather than increased stability, because the Boolean transfer function effectively has an infinite slope (saturating at 0 and 1).
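
The contrast can be seen with a single self-repressing gene. A minimal sketch under assumed dynamics: the Boolean version updates synchronously as x ← NOT x, while the continuous version follows $dx/dt = f(x) - x$, integrated with a forward Euler step, where f is a decreasing sigmoid whose slope is set by a hypothetical `gain` parameter. As `gain` grows, f approaches the Boolean step function.

```python
import math


def boolean_self_repressor(x0, steps):
    """Boolean gene that represses itself, updated synchronously: x <- NOT x."""
    xs = [x0]
    for _ in range(steps):
        xs.append(1 - xs[-1])
    return xs


def continuous_self_repressor(x0, steps, gain=2.0, dt=0.1):
    """Continuous version: dx/dt = f(x) - x, forward Euler with step dt.

    f is a decreasing sigmoid centred at 0.5; `gain` sets its slope.
    """
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        f = 1.0 / (1.0 + math.exp(gain * (x - 0.5)))  # high x -> low production
        xs.append(x + dt * (f - x))
    return xs


print(boolean_self_repressor(0, 6))                       # [0, 1, 0, 1, 0, 1, 0]
print(round(continuous_self_repressor(0.0, 300)[-1], 4))  # 0.5
```

With this moderate gain the continuous gene settles at the fixed point 0.5, whereas the Boolean gene flips state at every step forever.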

In hybrid Boolean systems, each gene has a continuous-valued internal state and a Boolean external state.

Deterministic or stochastic

Stochastic Petri Nets (SPNs), a subset of Markov processes, can be used to model molecular interactions.
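
As a minimal illustration of the stochastic, Markov-process view, here is Gillespie's direct method for one mRNA species with two reactions: production at rate `k_make` and degradation at rate `k_decay` times the copy number. Read as an SPN, this is one place (the mRNA count) with one transition that adds a token and one that removes a token. The reaction scheme, rates and function name are assumptions for illustration, not taken from the paper.

```python
import random


def gillespie_birth_death(k_make, k_decay, t_end, n0=0, seed=1):
    """Gillespie direct method for mRNA production and first-order degradation.

    Assumes k_make > 0 and n0 >= 0. Returns event times and the copy
    number after each event.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        rate_make = k_make            # transition 1: add one mRNA
        rate_decay = k_decay * n      # transition 2: remove one mRNA
        total = rate_make + rate_decay
        t += rng.expovariate(total)   # exponential waiting time to the next event
        if t > t_end:
            break
        # Pick which transition fires, with probability proportional to its rate.
        if rng.random() * total < rate_make:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


times, counts = gillespie_birth_death(k_make=10.0, k_decay=1.0, t_end=50.0)
print(len(times), min(counts), max(counts))
```

For this simple process the copy number fluctuates around `k_make / k_decay` = 10; the stationary distribution is Poisson with that mean.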

Spatial or non-spatial

Data availability

Forward and inverse modeling

reverse engineering problem: given a certain amount of data, what can we deduce about the unknown underlying regulatory network?

Gene network inference: reverse engineering

clustering only tells us which genes are co-regulated, not what is regulating what.
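
One simple way to ask "what regulates what" in the Boolean setting is an exhaustive consistency search: for a target gene, try every set of K candidate regulators and keep those for which identical input values never lead to different outputs in the observed state transitions. A minimal sketch; the network that generates the data (`true_step`) is hypothetical, and this brute-force search is just one illustrative approach, not necessarily the one the paper emphasizes.

```python
from itertools import combinations, product


def true_step(s):
    """Hypothetical 3-gene network used only to generate example data."""
    return (s[2], s[0], int(s[0] and s[1]))


def consistent_inputs(transitions, gene, k):
    """All k-gene input sets that are consistent with the observed transitions.

    transitions: list of (state, next_state) pairs of 0/1 tuples.
    A candidate set is consistent if equal values of its genes never
    lead to different next values of `gene`.
    """
    n = len(transitions[0][0])
    found = []
    for inputs in combinations(range(n), k):
        table = {}
        consistent = True
        for state, nxt in transitions:
            key = tuple(state[i] for i in inputs)
            if table.setdefault(key, nxt[gene]) != nxt[gene]:
                consistent = False
                break
        if consistent:
            found.append(inputs)
    return found


full = [(s, true_step(s)) for s in product((0, 1), repeat=3)]
print(consistent_inputs(full, gene=2, k=2))  # [(0, 1)]
print(consistent_inputs(full, gene=0, k=1))  # [(2,)]

sparse = [(s, true_step(s)) for s in [(0, 0, 0), (1, 1, 0)]]
print(consistent_inputs(sparse, gene=2, k=1))  # [(0,), (1,)]
```

With all eight transitions the true regulators are recovered uniquely; with only two observations, genes 0 and 1 explain gene 2 equally well, so the data cannot tell which one is the real regulator.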

Data requirements

To correctly infer the regulation of a single gene, we need to observe the expression of that gene under many different combinations of expression levels of its regulatory inputs.

  1. time series
  2. individual measurements

The advantage of the time series is that it can provide insight into the dynamics of the process. On the other hand, data sets consisting of individual measurements provide an efficient way to map the attractors of the network.
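
A quick way to check the data requirement above for one Boolean gene with K candidate regulators is to count how many of the $2^K$ input combinations actually occur in the data set. A minimal sketch, assuming binarized expression levels; `input_coverage` and the toy observations are hypothetical.

```python
from itertools import product


def input_coverage(observations, regulators):
    """Fraction of the 2**K input combinations of `regulators` seen in the data,
    plus the combinations that were never observed."""
    seen = {tuple(obs[r] for r in regulators) for obs in observations}
    missing = [c for c in product((0, 1), repeat=len(regulators)) if c not in seen]
    return len(seen) / 2 ** len(regulators), missing


observations = [(0, 0, 1), (1, 0, 0), (1, 1, 1), (0, 0, 0)]
print(input_coverage(observations, regulators=(0, 1)))  # (0.75, [(0, 1)])
```

A time series that has settled into an attractor keeps revisiting the same few states, so its coverage can stay low however long it runs.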

Estimates for various network models

Correlation metric construction

Systems of differential equations

Conclusions and outlook

  1. Measurement quantity, depth and quality.
  2. Clustering and functional categorization.
  3. Reverse engineering.
  4. Integrated modeling. –> large-scale gene expression data –> other sources, ranging from sequence homology and cis-regulatory sequences, to disease association and a wide variety of functional knowledge from targeted experiments –> challenge: reliability and compatibility of these data sets.
  5. Coupling of modeling with systematic experimental design.

new words

  1. unravel
  2. metazoa
  3. basin
  4. fluctuate
  5. fluctuation
  6. yeast
  7. motifs
  8. conspicuously
  9. elliptical
  10. mutual
  11. obscure
  12. agglomerative
  13. ligand
  14. coherent
  15. circuitry
  16. asynchronously
  17. eukaryotic
  18. compartments
  19. deduce
  20. simplistic
  21. coarse
  22. copious
  23. devise
  24. strain
  25. compatibility
  26. fidelity

Published in categories Note