Genetic network inference

Posted on Mar 14, 2017

Tags: Bioinformatics

These are my notes from reading the paper Genetic network inference.

Abstract

high-throughput gene expression assays
gene regulatory network
clustering of co-expression profiles –> infer shared regulatory inputs and functional pathways
various aspects of clustering –> distance measures, clustering algorithms, multiple-cluster memberships
infer causal connections
discrete Boolean networks, continuous linear and non-linear models
combination of predictive modeling with systematic experimental verification

Introduction

assay RNA and protein expression profiles with different levels of precision and depth

hybridization microarrays
automated RT-PCR
2-dimensional gel electrophoresis
antibody arrays

genetic information flow: defining the mapping from sequence space to functional space

complex dynamic systems: from states to trajectories to attractors

principles of genetic network organization
computational methods for extracting network architectures from experimental data

A conceptual approach to complex network dynamics

The global gene expression pattern is the result of the collective behavior of individual regulatory pathways
In such highly interconnected cellular signaling networks, gene function depends on its cellular context
dynamic systems with large numbers of variables present a difficult mathematical problem. –> radically simplify the individual molecular interactions, and focus on the collective outcome.

Boolean Networks

each gene is considered as a binary variable, either ON or OFF, regulated by other genes through logical or Boolean functions
cell differentiation corresponds to transitions from one global gene expression pattern to another
Stability of global gene expression patterns can be understood as a consequence of the dynamic properties of the network, namely that all networks fall into one or more attractors, representing stable states of cell differentiation, adaptation or disease.

$N$ genes: $2^N$ gene expression patterns
each gene is controlled by up to $K$ other genes in the network
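
To make the attractor idea concrete, here is a minimal sketch of a toy network with $N = 3$ genes. The update rules in `step` are made up for illustration (they are not from the paper), and synchronous updating is assumed. The code enumerates all $2^N$ states and groups them by the attractor they fall into.

```python
from itertools import product

def step(state):
    """Synchronously update a toy 3-gene network (hypothetical rules)."""
    a, b, c = state
    # A copies B, B is (A OR C), C is (NOT A); each gene has at most K = 2 inputs.
    return (int(b), int(a or c), int(not a))

def find_attractor(state):
    """Follow the trajectory from `state` until a state repeats; return the cycle."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    cycle = seen[seen.index(state):]
    # Rotate the cycle so it starts at its smallest state (canonical form).
    k = cycle.index(min(cycle))
    return tuple(cycle[k:] + cycle[:k])

def basins(n_genes=3):
    """Map each attractor to the list of states that end up in it."""
    result = {}
    for state in product((0, 1), repeat=n_genes):
        result.setdefault(find_attractor(state), []).append(state)
    return result

if __name__ == "__main__":
    for attractor, states in basins().items():
        print(attractor, "basin size:", len(states))
```

With these particular rules the 8 states split into a fixed point and a cycle of length two, which is the kind of structure the notes above call stable states.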

Inference of regulation through clustering of gene expression data

Large-scale gene screening technologies such as mRNA hybridization microarrays have dramatically increased our ability to explore the living organism at the genomic level.
classify gene expression patterns to explore shared functions and regulation. –> The simplest approach to clustering, sometimes referred to as guilt by association (GBA), is to select a gene and determine its nearest neighbors in expression space within a certain user-defined distance cut-off. –> Genes sharing the same expression pattern are likely to be involved in the same regulatory process. –> Clustering allows us to extract groups of genes that are tightly co-expressed over a range of different experiments.
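
The nearest-neighbor idea can be sketched in a few lines. The gene names, the expression values and the choice of Euclidean distance below are all assumptions for illustration only.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def neighbors_within(profiles, query, cutoff):
    """Return genes whose profile lies within `cutoff` of the query gene, nearest first."""
    target = profiles[query]
    hits = [(euclidean(target, p), g) for g, p in profiles.items() if g != query]
    return [g for d, g in sorted(hits) if d <= cutoff]

# Hypothetical profiles: one row per gene, one column per experiment.
profiles = {
    "geneA": [0.1, 0.9, 0.8, 0.2],
    "geneB": [0.2, 1.0, 0.7, 0.1],
    "geneC": [0.9, 0.1, 0.2, 0.8],
    "geneD": [0.1, 0.8, 0.9, 0.3],
}

if __name__ == "__main__":
    print(neighbors_within(profiles, "geneA", cutoff=0.5))
```

Here `geneC` has the opposite pattern to `geneA`, so a plain Euclidean cut-off excludes it. Whether that is desirable depends on the distance measure, which is the topic of the next section.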

Distance measures and preprocessing

Distance measures can be divided into at least three classes, emphasizing different regularities present within the data:

similarity according to positive correlations, which may identify similar or identical regulation;
similarity according to positive and negative correlations, which may also help identify control processes that antagonistically regulate downstream pathways;
similarity according to mutual information, which may detect even more complex relationships.
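
A rough sketch of the three classes, under some assumptions of mine: the first two are taken as $1 - r$ and $1 - |r|$ with $r$ the Pearson correlation, and mutual information is computed on values that are already discrete (real expression data would need to be binned first).

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def dist_positive(x, y):
    """Small only for positively correlated profiles."""
    return 1.0 - pearson(x, y)

def dist_absolute(x, y):
    """Small for both positively and negatively correlated profiles."""
    return 1.0 - abs(pearson(x, y))

def mutual_information(x, y):
    """Mutual information in bits between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

# Anti-correlated pair: far apart under 1 - r, identical under 1 - |r|.
up, down = [1, 2, 3, 4], [4, 3, 2, 1]
# Non-linear dependence (y = x^2): zero correlation, but high mutual information.
x, y = [-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]
```

The last pair is the interesting case: correlation-based measures see no relationship at all, while mutual information does.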

Distances

For even longer time series, distance measures based on Fourier or wavelet transforms may be considered.

normalization and other preprocessing of the data.

Distance measures that are sensitive to scaling and/or offsets (such as Euclidean distance) may require normalization of the data.
Normalization can be done with respect to the maximum expression level for each gene, with respect to both minimum and maximum expression levels or with respect to the mean and standard deviation of each expression profile.
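
The three normalization options can be sketched as below. The function names are mine, and the code assumes each profile is non-constant with a positive maximum (otherwise the divisions fail).

```python
import statistics

def by_max(profile):
    """Scale so that the maximum expression level becomes 1."""
    top = max(profile)
    return [v / top for v in profile]

def min_max(profile):
    """Rescale so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(profile), max(profile)
    return [(v - lo) / (hi - lo) for v in profile]

def z_score(profile):
    """Center on the mean and divide by the (population) standard deviation."""
    mu = statistics.mean(profile)
    sd = statistics.pstdev(profile)
    return [(v - mu) / sd for v in profile]

if __name__ == "__main__":
    profile = [2.0, 4.0, 8.0]
    print(by_max(profile))
    print(min_max(profile))
    print(z_score(profile))
```

After `z_score`, the Euclidean distance between two profiles depends only on their shape, not on their absolute level or amplitude.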

Clustering algorithms

non-hierarchical methods

K-means (SOM)
EM
AutoClass (related to EM)
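
As an illustration of a non-hierarchical method, here is a bare-bones K-means sketch. It is a simplification: the initial centroids are just the first `k` profiles (real implementations usually use random or smarter initialization), the number of iterations is fixed, and the data are made up.

```python
import math

def kmeans(points, k, n_iter=20):
    """Plain K-means: alternate between assigning points and updating centroids."""
    centroids = [list(p) for p in points[:k]]  # naive deterministic start (assumption)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each profile goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical two-experiment profiles forming two obvious groups.
points = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0]]
labels, centroids = kmeans(points, k=2)
```

Note that every gene gets exactly one label here, which is the limitation the notes on multiple-cluster membership further down are about.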

Hierarchical methods

FITCH

Gene expression clustering is potentially useful in at least three areas:

extraction of regulatory motifs (co-regulation from co-expression)
inference of functional annotation
as a molecular signature in distinguishing cell or tissue types

Clustering of gene expression patterns helps differentiate different cell types

for complex datasets spanning a variety of biological responses, a gene should by definition be a member of several clusters, each reflecting a particular aspect of its function and control

several clustering methods could be used simultaneously, allocating each gene to several clusters based on the different regularities emphasized by each method.

Modeling methodologies

Gene-regulation models

very abstract: Kauffman’s random Boolean networks –> the most mathematically tractable, and its simplicity allows examination of very large systems
very concrete: the full biochemical interaction models with stochastic kinetics –> fits the biochemical reality better and may carry more weight with the experimental biologists –> its complexity necessarily restricts it to very small systems

Negative feedback with a moderate feedback gain has a stabilizing effect on the output of the system. However, negative feedback in Boolean circuits will always cause oscillations, rather than increased stability, because the Boolean transfer function effectively has an infinite slope (saturating at 0 and 1).
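
A small numerical sketch of this point, under assumptions of mine: a single gene that represses itself, updated in discrete time, with a decreasing Hill-type transfer function whose exponent `n` plays the role of the feedback gain. With a shallow slope the level settles at the fixed point 0.5; with a very steep slope the function behaves almost like a Boolean NOT gate and the level flips back and forth.

```python
def repress(x, n, theta=0.5):
    """Decreasing Hill-type transfer function; larger n means a steeper slope."""
    return 1.0 / (1.0 + (x / theta) ** n)

def simulate(n, x0=0.2, steps=60):
    """Iterate x(t+1) = repress(x(t)) for a single self-repressing gene."""
    xs = [x0]
    for _ in range(steps):
        xs.append(repress(xs[-1], n))
    return xs

moderate = simulate(n=1)   # slope at the fixed point is -0.5: damped, converges
steep = simulate(n=50)     # slope at the fixed point is -25: close to Boolean NOT

if __name__ == "__main__":
    print("moderate gain, last values:", moderate[-3:])
    print("steep gain, last values:   ", steep[-3:])
```

For this map the slope at the fixed point is $-n/2$, so the iteration is stable only for $n < 2$. This is a toy discrete-time model, not the continuous-time system a control engineer would write down, but it shows the same qualitative contrast.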

hybrid Boolean systems: each gene has a continuous-valued internal state and a Boolean external state

Deterministic or stochastic

Stochastic Petri Nets (SPNs): a subset of Markov processes, can be used to model molecular interactions

Spatial or non-spatial

Data availability

Forward and inverse modeling

reverse engineering problem: given an amount of data, what can we deduce about the unknown underlying regulatory network?

Gene network inference: reverse engineering

clustering only tells us which genes are co-regulated, not what is regulating what.

Data requirements

To correctly infer the regulation of a single gene, we need to observe the expression of that gene under many different combinations of expression levels of its regulatory inputs.
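
In the Boolean setting this requirement is easy to state: a gene with $K$ inputs has $2^K$ input combinations, and its rule is only pinned down once all of them have been observed. A minimal sketch follows, with hypothetical gene names and observations, assuming binarized, noise-free measurements.

```python
from itertools import product

def infer_truth_table(observations, inputs, target):
    """Build a partial truth table for `target` and list unobserved input combinations."""
    table = {}
    for obs in observations:
        key = tuple(obs[g] for g in inputs)
        table[key] = obs[target]
    missing = [combo for combo in product((0, 1), repeat=len(inputs))
               if combo not in table]
    return table, missing

# Hypothetical binarized measurements; the hidden rule is C = A AND B.
observations = [
    {"A": 0, "B": 0, "C": 0},
    {"A": 1, "B": 0, "C": 0},
    {"A": 1, "B": 1, "C": 1},
]
table, missing = infer_truth_table(observations, inputs=("A", "B"), target="C")

if __name__ == "__main__":
    print("observed:", table)
    print("still needed:", missing)
```

With the combination `(0, 1)` unobserved, the data cannot distinguish `C = A AND B` from `C = A AND NOT (NOT A AND B)`-style alternatives such as `C = B` restricted to these rows, so one more experiment is needed.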

time series
individual measurements

The advantage of the time series is that it can provide insight into the dynamics of the process. On the other hand, data sets consisting of individual measurements provide an efficient way to map the attractors of the network.

Estimates for various network models

Correlation metric construction

Systems of differential equations

Conclusions and outlook

Measurement quantity, depth and quality.
Clustering and functional categorization.
Reverse engineering.
Integrated modeling. –> large-scale gene expression data –> other sources, ranging from sequence homology and cis-regulatory sequences, to disease association and a wide variety of functional knowledge from targeted experiments –> challenge: reliability and compatibility of these data sets.
Coupling of modeling with systematic experimental design.

new words

unravel
metazoa
basin
fluctuate
fluctuation
yeast
motifs
Conspicuously
elliptical
mutual
obscure
agglomerative
ligand
coherent
circuitry
asynchronously
eukaryotic
compartments
deduce
simplistic
coarse
copious
devise
strain
compatibility
fidelity

Published in categories Note

WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.