WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Benchmarking Algorithms for Gene Regulatory Network Inference

Tags: Gene Regulatory Network, Pseudotime

This note is for Pratapa, A., Jalihal, A. P., Law, J. N., Bharadwaj, A., & Murali, T. M. (2020). Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nature Methods, 17(2), Article 2.

scRNA-seq can be used to trace cellular lineages during differentiation and to identify new cell types.

A central question is whether we can discover the gene regulatory networks (GRNs) that control cellular differentiation and drive transitions from one cell type to another.

In such a GRN, each edge connects a transcription factor (TF) to a gene it regulates. Ideally, the edge is directed from the TF to its target gene.

Results

Datasets from synthetic networks

There are two motivations for using synthetic networks:

  • the ground-truth network is known
  • the evaluation is isolated from any limitations of pseudotime inference algorithms

Simulating these networks should produce the variety of trajectories seen in differentiating and developing cells.

Methods

GeneNetWeaver

GeneNetWeaver starts with a network of regulatory interactions among TFs and their targets. It computes a connected, dense subnetwork around a randomly selected seed node and converts this subnetwork into a system of ordinary differential equations (ODEs).

To express this network in the form of ODEs, it assigns each node $i$ in the network a “gene” variable $x_i$ representing the level of messenger RNA expression and a “protein” variable $p_i$ representing the amount of TF produced by protein translation, as follows:

\[\begin{aligned}\frac{d[x_i]}{dt} &= mf(R_i) - l_x[x_i]\\ \frac{d[p_i]}{dt} &= r[x_i] - l_p[p_i]\end{aligned}\]

where

  • $m$: the mRNA transcription rate
  • $l_x$: the mRNA degradation rate
  • $r$: the protein translation rate
  • $l_p$: the protein degradation rate
  • $R_i$: the set of regulators of node $i$
  • $f(R_i)$: a nonlinear input function that captures all the regulatory interactions controlling the expression of node $i$
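As a rough illustration of how this pair of equations behaves, the sketch below integrates them for a single gene with forward Euler. The function name `simulate_gene`, the parameter values, and the choice to hold $f(R_i)$ constant are all assumptions made for this example; they do not come from the paper.

```python
def simulate_gene(m, l_x, r, l_p, f_value, x0=0.0, p0=0.0, dt=0.01, n_steps=5000):
    """Forward-Euler integration of the mRNA/protein pair for one gene.

    f_value stands in for f(R_i) and is held constant (an assumption
    made only to keep the example small).
    """
    x, p = x0, p0
    for _ in range(n_steps):
        dx = m * f_value - l_x * x   # d[x_i]/dt = m f(R_i) - l_x [x_i]
        dp = r * x - l_p * p         # d[p_i]/dt = r [x_i] - l_p [p_i]
        x += dt * dx
        p += dt * dp
    return x, p


# Hypothetical parameter values, chosen only for illustration.
x_end, p_end = simulate_gene(m=20.0, l_x=5.0, r=1.0, l_p=1.0, f_value=0.5)
```

With a constant input, setting both derivatives to zero gives the steady state $[x_i]^* = m f / l_x$ and $[p_i]^* = r [x_i]^* / l_p$, which the simulation should approach after a long enough run.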

If there are $N$ regulators for a given gene, there are $2^N$ possible configurations of how the regulators can bind to the gene’s promoter.

\[f(R_i) = \sum_{S\in 2^{R_i}} \alpha_S\Pr(S)\]
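The sum over the $2^N$ configurations can be sketched in a few lines. The note does not define $\Pr(S)$, so the sketch assumes that regulators bind independently, giving $\Pr(S) = \prod_{j \in S}[j] / \prod_{j \in R_i}(1 + [j])$; this choice is consistent with the expanded denominator in the three-regulator BoolODE example later in this note, but it is an assumption here. All function names and the example values are hypothetical.

```python
from itertools import combinations
from math import prod


def binding_configurations(regulators):
    """Yield every subset S of the regulators (2^N subsets in total)."""
    names = sorted(regulators)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            yield frozenset(subset)


def config_probability(subset, conc):
    """Pr(S) under the independent-binding assumption described above."""
    numerator = prod(conc[name] for name in subset)   # empty product is 1
    denominator = prod(1.0 + c for c in conc.values())
    return numerator / denominator


def input_function(alpha, conc):
    """f(R_i) = sum over S of alpha_S * Pr(S); missing alpha_S default to 0."""
    return sum(
        alpha.get(S, 0.0) * config_probability(S, conc)
        for S in binding_configurations(conc)
    )


# Hypothetical concentrations and alpha values for two regulators.
conc = {"P": 2.0, "Q": 1.0}
alpha = {frozenset({"P"}): 1.0, frozenset({"P", "Q"}): 0.5}
f_value = input_function(alpha, conc)
```

Under this assumption the probabilities of all configurations sum to one, so $f(R_i)$ is a weighted average of the $\alpha_S$ values.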

BoolODE uses Boolean models to create simulated datasets

Consider a gene $X$ with two activators ($P$ and $Q$) and one inhibitor $R$, represented by the following rule:

\[X = (P \lor Q) \land \neg (R)\]

The ODE governing the time dynamics of gene $X$ is

\[\frac{dX}{dt} = m\left(\frac{\alpha_0+\alpha_P[P]+\cdots}{1+[P]+[Q]+[R]+[P][Q]+[P][R]+[Q][R]+[P][Q][R]}\right)\]

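Only $\alpha_P$, $\alpha_Q$, and $\alpha_{PQ}$ have the value one, because these are exactly the binding configurations in which the rule evaluates to true ($P$ or $Q$ bound, $R$ not bound); every other parameter has the value zero.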
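The assignment of the $\alpha$ values can be checked by enumerating the truth table of the rule. The sketch below is a minimal illustration, not BoolODE's actual code: the helper `alphas_from_rule` and the labelling scheme (`"0"` for the configuration with nothing bound) are made up for this example.

```python
from itertools import product


def alphas_from_rule(rule, regulators):
    """Set alpha = 1 for each bound/unbound configuration where the rule is true."""
    alphas = {}
    for bound in product([False, True], repeat=len(regulators)):
        state = dict(zip(regulators, bound))
        # Label a configuration by the regulators that are bound, e.g. "PQ".
        label = "".join(name for name in regulators if state[name]) or "0"
        alphas[label] = 1 if rule(**state) else 0
    return alphas


def rule_x(P, Q, R):
    """X = (P or Q) and not R."""
    return (P or Q) and not R


alphas = alphas_from_rule(rule_x, ["P", "Q", "R"])
active = sorted(label for label, value in alphas.items() if value == 1)
print(active)  # ['P', 'PQ', 'Q']
```

Every configuration that includes $R$ gets $\alpha = 0$, as does the empty configuration, which matches the statement above.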

To create stochastic simulations, add a noise term to each ODE:

\[\begin{aligned}\frac{d[x_i]}{dt} & = mf(R_i) -l_x[x_i] + s\sqrt{[x_i]}\Delta W_t\\ \frac{d[p_i]}{dt} & = r[x_i] - l_p[p_i] + s\sqrt{[p_i]}\Delta W_t\end{aligned}\]

where $\Delta W_t \sim N(0, h)$ and $s$ is the noise strength.
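A stochastic system of this form is commonly simulated with an Euler-Maruyama scheme. The sketch below makes several assumptions that the note does not state: $h$ is read as the integration time step and as the variance of $\Delta W_t$, the mRNA and protein equations get independent noise draws, $f(R_i)$ is held constant, and values are clipped at zero so the square roots stay defined. The function name and parameter values are hypothetical.

```python
import numpy as np


def euler_maruyama(f_value, m, l_x, r, l_p, s, h, n_steps, seed=0):
    """Simulate one gene's mRNA (x) and protein (p) with multiplicative noise."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_steps + 1)
    p = np.zeros(n_steps + 1)
    for t in range(n_steps):
        # Delta W ~ N(0, h): standard deviation sqrt(h); independent draws (assumption).
        dw_x, dw_p = rng.normal(0.0, np.sqrt(h), size=2)
        x_next = x[t] + h * (m * f_value - l_x * x[t]) + s * np.sqrt(x[t]) * dw_x
        p_next = p[t] + h * (r * x[t] - l_p * p[t]) + s * np.sqrt(p[t]) * dw_p
        # Clip at zero so concentrations stay non-negative (assumption).
        x[t + 1] = max(x_next, 0.0)
        p[t + 1] = max(p_next, 0.0)
    return x, p


# Hypothetical parameter values, chosen only for illustration.
x, p = euler_maruyama(f_value=0.5, m=20.0, l_x=5.0, r=1.0, l_p=1.0,
                      s=0.1, h=0.01, n_steps=1000)
```

Setting `s=0.0` removes the noise terms and recovers the deterministic ODE, which is a convenient sanity check on the drift part of the update.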

A single cell is defined as the vector of gene expression values at a particular time point in a model simulation.
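Under this definition, a simulated dataset is just a set of rows picked from a simulated trajectory. The sketch below assumes the trajectory is stored as a (time points × genes) array and that time points are drawn uniformly without replacement; the note does not say how the sampling is actually done, and `sample_cells` and the toy trajectory are hypothetical.

```python
import numpy as np


def sample_cells(trajectory, times, n_cells, seed=0):
    """Pick n_cells time points; each selected row is treated as one cell."""
    rng = np.random.default_rng(seed)
    idx = np.sort(rng.choice(len(times), size=n_cells, replace=False))
    # The simulation time of each sampled row is returned alongside the cell.
    return trajectory[idx], times[idx]


# Toy two-gene trajectory: one gene rising, one gene decaying.
times = np.linspace(0.0, 10.0, 101)
trajectory = np.column_stack([1.0 - np.exp(-times), np.exp(-times)])

cells, cell_times = sample_cells(trajectory, times, n_cells=20)
```

The result is a cells × genes expression matrix, with the simulation time of each cell available as a known ordering along the trajectory.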

Summary: the authors develop the BoolODE approach to convert the Boolean functions specifying a GRN directly into ODEs. The BoolODE pipeline accepts a file describing a Boolean model as input, creates an equivalent ODE model, adds noise terms, and numerically simulates a stochastic time course.

Datasets: a major challenge in evaluating GRN inference algorithms for single-cell RNA-seq data is that the “ground truth”


Published in categories Note