Benchmarking Algorithms for Gene Regulatory Network Inference
scRNA-seq can be used to trace cellular lineages during differentiation and to identify new cell types.
A central question is whether we can discover the gene regulatory networks (GRNs) that control cellular differentiation and drive transitions from one cell type to another.
In such a GRN, each edge connects a transcription factor (TF) to a gene it regulates. Ideally, the edge is directed from the TF to the target gene.
Results
Datasets from synthetic networks
There are two motivations for using synthetic networks:
- the ground-truth network is known
- the evaluation is isolated from any limitations of pseudotime inference algorithms
Simulating these networks should produce a variety of the trajectories seen in differentiating and developing cells.
Methods
GeneNetWeaver
GeneNetWeaver starts with a network of regulatory interactions among TFs and their targets. It computes a connected, dense subnetwork around a randomly selected seed node and converts this subnetwork into a system of differential equations.
To express this network in the form of ODEs, it assigns each node $i$ in the network a “gene” variable $x_i$ representing the level of messenger RNA expression and a “protein” variable $p_i$ representing the amount of TF produced by protein translation as follows
\[\begin{aligned}\frac{d[x_i]}{dt} &= mf(R_i) - l_x[x_i]\\ \frac{d[p_i]}{dt} &= r[x_i] - l_p[p_i]\end{aligned}\]
where
- $m$: mRNA transcription rate
- $l_x$: mRNA degradation rate
- $r$: the protein translation rate
- $l_p$: the protein degradation rate
- $R_i$: the set of regulators of node $i$
- $f(R_i)$: a nonlinear input function that captures all the regulatory interactions controlling the expression of node $i$
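As a quick check on how the four rates interact, one can set both derivatives to zero while holding $f(R_i)$ fixed. Holding $f$ fixed is an assumption made only for this illustration, since in general $f$ changes with the protein levels of the regulators. Under that assumption the steady state, written here with a star, is
\[[x_i]^* = \frac{m\,f(R_i)}{l_x}, \qquad [p_i]^* = \frac{r\,[x_i]^*}{l_p} = \frac{m\,r\,f(R_i)}{l_x\,l_p}\]
so the mRNA level is set by the ratio of transcription to mRNA degradation, and the protein level scales that by the ratio of translation to protein degradation.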
If there are $N$ regulators for a given gene, there are $2^N$ possible configurations of how the regulators can bind to the gene’s promoter.
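The input function sums over these configurations:
\[f(R_i) = \sum_{S\in 2^{R_i}} \alpha_S\Pr(S)\]
where $S$ ranges over the subsets of $R_i$ (the sets of bound regulators), $\alpha_S$ is the activation level associated with configuration $S$, and $\Pr(S)$ is the probability of that configuration.
BoolODE uses Boolean models to create simulated datasets.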
Consider a gene $X$ with two activators ($P$ and $Q$) and one inhibitor $R$, represented by the following rule:
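\[X = (P \lor Q) \land \neg (R)\]
The ODE governing the time dynamics of gene $X$ is
\[\frac{d[X]}{dt} = m\left(\frac{\alpha_0+\alpha_P[P]+\cdots}{1+[P]+[Q]+[R]+[P][Q]+[P][R]+[Q][R]+[P][Q][R]}\right) - l_x[X]\]
The numerator has one term $\alpha_S\prod_{j\in S}[j]$ for each of the $2^3 = 8$ configurations $S$, matching the terms of the denominator. Only $\alpha_P$, $\alpha_Q$ and $\alpha_{PQ}$ have the value one, because the rule is true exactly when $P$ or $Q$ is bound and $R$ is not; every other parameter has the value zero.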
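The construction above can be sketched in a few lines of Python: enumerate every subset of regulators, set $\alpha_S$ to one when the Boolean rule is true for that subset, and divide the weighted sum by the sum over all configurations. This is a minimal sketch that follows the formula as written in these notes (plain concentrations in each product). The names `input_function` and `rule_x` are hypothetical and are not taken from the BoolODE code.

```python
from itertools import combinations


def input_function(rule, conc):
    """Evaluate f = sum_S alpha_S * weight(S) / sum_S weight(S) for one gene.

    rule: function mapping {regulator: bound?} to True/False (the Boolean rule)
    conc: dict mapping each regulator name to its concentration
    """
    regulators = list(conc)
    numerator = 0.0
    denominator = 0.0
    # Enumerate all 2^N subsets S of bound regulators.
    for k in range(len(regulators) + 1):
        for subset in combinations(regulators, k):
            bound = {r: (r in subset) for r in regulators}
            # Weight of configuration S: product of the bound concentrations
            # (the empty configuration has weight 1).
            weight = 1.0
            for r in subset:
                weight *= conc[r]
            # alpha_S is 1 when the rule is true for S, and 0 otherwise.
            alpha = 1.0 if rule(bound) else 0.0
            numerator += alpha * weight
            denominator += weight
    return numerator / denominator


def rule_x(b):
    # X = (P or Q) and not R
    return (b["P"] or b["Q"]) and not b["R"]


f = input_function(rule_x, {"P": 1.0, "Q": 0.0, "R": 0.0})
```

With $[P] = 1$ and $[Q] = [R] = 0$, only the empty configuration and $\{P\}$ have nonzero weight, so the numerator is 1, the denominator is 2, and `f` is 0.5.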
To create stochastic simulations, a noise term is added to each ODE:
\[\begin{aligned}\frac{d[x_i]}{dt} &= mf(R_i) - l_x[x_i] + s\sqrt{[x_i]}\Delta W_t\\ \frac{d[p_i]}{dt} &= r[x_i] - l_p[p_i] + s\sqrt{[p_i]}\Delta W_t\end{aligned}\]
where $\Delta W_t \sim N(0, h)$ is a Gaussian noise increment and $s$ is the noise strength.
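One way to read the stochastic equations is as an Euler-Maruyama update: the deterministic part is multiplied by a step size and the noise part by a Gaussian increment. The sketch below does this for a single gene and its protein. Several things in it are assumptions rather than details from the source: $h$ is treated as both the step size and the variance of $\Delta W_t$, the two equations get independent increments, values are clipped at zero so the square roots stay defined, and all parameter values are illustrative. The function `simulate` and its argument `f` (a callable returning the input function value) are hypothetical names.

```python
import math
import random


def simulate(f, m=20.0, l_x=10.0, r=10.0, l_p=1.0,
             s=0.1, h=0.01, steps=500, seed=0):
    """Euler-Maruyama simulation of one mRNA level x and one protein level p.

    f: callable taking the current protein level and returning f(R_i)
    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    sd = math.sqrt(h)  # N(0, h) read as variance h, so std is sqrt(h)
    x, p = [0.0], [0.0]
    for _ in range(steps):
        xt, pt = x[-1], p[-1]
        # Independent noise increments for the two equations (assumption).
        dw_x = rng.gauss(0.0, sd)
        dw_p = rng.gauss(0.0, sd)
        # Drift scaled by the step size, noise scaled by the increment.
        dx = (m * f(pt) - l_x * xt) * h + s * math.sqrt(xt) * dw_x
        dp = (r * xt - l_p * pt) * h + s * math.sqrt(pt) * dw_p
        # Clip at zero so concentrations stay non-negative (assumption).
        x.append(max(xt + dx, 0.0))
        p.append(max(pt + dp, 0.0))
    return x, p


# Constant activation, purely for illustration.
x, p = simulate(lambda protein: 1.0)
```

Setting `s=0.0` recovers the deterministic ODE, which makes it easy to check the integrator before adding noise.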
The vector of gene expression values at a particular time point in a model simulation is defined as a single cell.
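Under this definition, a dataset of cells can be built by picking time points from a simulated trajectory. The sketch below picks them uniformly at random without replacement, which is an assumption and not necessarily the sampling scheme used by BoolODE. `sample_cells` and the toy `trajectory` are hypothetical.

```python
import random


def sample_cells(trajectory, n_cells, seed=0):
    """Pick n_cells time points; each expression vector becomes one cell.

    trajectory: list of expression vectors, one per simulated time point
    Returns a list of (time_index, expression_vector) pairs.
    """
    rng = random.Random(seed)
    # Distinct time indices, sorted so cells appear in simulation order.
    time_points = sorted(rng.sample(range(len(trajectory)), n_cells))
    return [(t, trajectory[t]) for t in time_points]


# Toy trajectory with two genes and 100 time points.
trajectory = [[0.1 * t, 0.2 * t] for t in range(100)]
cells = sample_cells(trajectory, n_cells=10)
```

Keeping the time index next to each vector records which point of the simulation a cell came from.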
Summary: the BoolODE approach converts Boolean functions specifying a GRN directly into ODEs. The BoolODE pipeline accepts a file describing a Boolean model as input, creates an equivalent ODE model, adds noise terms, and numerically simulates a stochastic time course.
Datasets: a major challenge when evaluating GRN inference algorithms for single-cell RNA-seq data is that the “ground truth”