WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Benchmarking Algorithms for Gene Regulatory Network Inference

Tags: Gene Regulatory Network, Pseudotime

This note is for Pratapa, A., Jalihal, A. P., Law, J. N., Bharadwaj, A., & Murali, T. M. (2020). Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nature Methods, 17(2), 147–154.

Single-cell RNA sequencing (scRNA-seq) can be used to trace cellular lineages during differentiation and to identify new cell types.

A central question is whether we can discover the gene regulatory networks (GRNs) that control cellular differentiation and drive transitions from one cell type to another.

In such a GRN, each edge connects a transcription factor (TF) to a gene it regulates. Ideally, the edge is directed from the TF to the gene it regulates.
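To make edge direction concrete, a GRN can be stored as a list of directed (TF, target) edges, from which each gene's set of regulators is read off. This is a minimal sketch: the toy edges, the `sign` field, and the helper name `regulator_sets` are hypothetical, with gene names borrowed from the Boolean example later in this note.

```python
from collections import defaultdict

# Hypothetical toy GRN: (TF, target, sign) with +1 = activation, -1 = inhibition.
edges = [
    ("P", "X", +1),
    ("Q", "X", +1),
    ("R", "X", -1),
    ("P", "Q", +1),
]

def regulator_sets(edges):
    """Map each target gene to the set of TFs with a directed edge into it."""
    regs = defaultdict(set)
    for tf, target, _sign in edges:
        regs[target].add(tf)  # direction matters: TF -> target, never the reverse
    return dict(regs)

reg_sets = regulator_sets(edges)
print(sorted(reg_sets["X"]))  # ['P', 'Q', 'R']
```

Because edges are directed, P appears as a regulator of Q but Q is not a regulator of P, and genes with no incoming edges (here P, Q's regulator) have no entry at all.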

Results

Datasets from synthetic networks

There are two motivations for using synthetic networks:

  • the ground-truth network is known;
  • the evaluation is isolated from any limitations of pseudotime inference algorithms.

Simulating these networks should produce the variety of trajectories seen in differentiating and developing cells.

Methods

GeneNetWeaver

GeneNetWeaver starts with a network of regulatory interactions among TFs and their targets. It extracts a connected, dense subnetwork around a randomly selected seed node and converts this subnetwork into a system of ordinary differential equations (ODEs).
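The subnetwork step can be pictured as growing a connected node set outward from the seed. The following is a simplified greedy sketch, not GeneNetWeaver's actual extraction procedure: the function name `extract_subnetwork`, the adjacency-dict input, and the rule "add the frontier node with the most links into the current set" are all assumptions made for illustration.

```python
import random

def extract_subnetwork(adj, size, seed=None, rng=None):
    """Grow a connected node set of up to `size` nodes around a seed node.

    adj: dict node -> set of neighbours (undirected view of the network).
    Each step adds the frontier node with the most links into the current set,
    breaking ties at random. Simplified illustration, not GeneNetWeaver's code.
    """
    rng = rng or random.Random(0)
    if seed is None:
        seed = rng.choice(sorted(adj))  # randomly selected seed node
    chosen = {seed}

    def links_in(v):
        return len(adj[v] & chosen)  # edges from v into the current set

    while len(chosen) < size:
        frontier = {v for u in chosen for v in adj[u]} - chosen
        if not frontier:
            break  # the seed's connected component is exhausted
        best = max(links_in(v) for v in frontier)
        candidates = sorted(v for v in frontier if links_in(v) == best)
        chosen.add(rng.choice(candidates))
    return chosen

adj = {
    "a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"},
    "d": {"b", "e"}, "e": {"d"},
}
print(sorted(extract_subnetwork(adj, 3, seed="a")))  # ['a', 'b', 'c']
```

Preferring nodes with many links into the current set is what keeps the result dense: the triangle a, b, c is picked before the chain through d and e.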

To express this network in the form of ODEs, GeneNetWeaver assigns each node $i$ in the network a “gene” variable $x_i$ representing the level of messenger RNA (mRNA) expression and a “protein” variable $p_i$ representing the amount of TF produced by protein translation, as follows:

\[\begin{aligned}\frac{d[x_i]}{dt} &= mf(R_i) - l_x[x_i]\\ \frac{d[p_i]}{dt} &= r[x_i] - l_p[p_i]\end{aligned}\]

where

  • $m$: mRNA transcription rate
  • $l_x$: mRNA degradation rate
  • $r$: protein translation rate
  • $l_p$: protein degradation rate
  • $R_i$: the set of regulators of node $i$
  • $f(R_i)$: nonlinear input function that captures all the regulatory interactions controlling the expression of node $i$
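As a sanity check on the two equations, holding $f(R_i)$ fixed gives the steady states $x^* = m f / l_x$ and $p^* = r x^* / l_p$. The sketch below integrates the system with forward Euler for a single gene; the parameter values, the step size $h$, and the function name `simulate_gene` are arbitrary choices for illustration, not values from the paper.

```python
def simulate_gene(f, m=1.0, l_x=0.5, r=2.0, l_p=1.0, x0=0.0, p0=0.0, h=0.01, steps=5000):
    """Forward-Euler integration of
        dx/dt = m f - l_x x   (mRNA)
        dp/dt = r x - l_p p   (protein)
    for one gene whose input-function value f is held constant."""
    x, p = x0, p0
    for _ in range(steps):
        dx = m * f - l_x * x   # transcription minus mRNA degradation
        dp = r * x - l_p * p   # translation minus protein degradation
        x, p = x + h * dx, p + h * dp
    return x, p

x_end, p_end = simulate_gene(f=0.8)
# Analytic steady state: x* = m f / l_x = 1.6, p* = r x* / l_p = 3.2
print(round(x_end, 3), round(p_end, 3))
```

After 50 time units the transients (decaying at rates $l_x$ and $l_p$) have vanished, so the numerical values match the analytic steady state.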

If there are $N$ regulators for a given gene, there are $2^N$ possible configurations of how the regulators can bind to the gene’s promoter.

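\[f(R_i) = \sum_{S\in 2^{R_i}} \alpha_S\Pr(S)\]

where the sum runs over all $2^N$ subsets $S$ of $R_i$ (each a configuration of bound regulators), $\Pr(S)$ is the probability of configuration $S$, and $\alpha_S$ is the relative activation when exactly the regulators in $S$ are bound.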
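If regulators bind independently, each configuration probability factorizes: $\Pr(S) = \prod_{j\in S}\nu_j \big/ \prod_{j\in R_i}(1+\nu_j)$. The sketch below assumes that independence and a Hill-type binding term $\nu_j = ([x_j]/k)^n$ with shared $k$ and $n$ (both assumptions, not values from the paper), enumerates all $2^N$ subsets, and sums $\alpha_S \Pr(S)$. The names `occupancy` and `input_function` are hypothetical.

```python
from itertools import combinations
from math import prod

def occupancy(level, k=1.0, n=2):
    """Assumed Hill-type binding term nu_j = (level / k)^n."""
    return (level / k) ** n

def input_function(alpha, levels, k=1.0, n=2):
    """f(R_i) = sum over subsets S of R_i of alpha_S * Pr(S), assuming independent binding.

    alpha:  dict frozenset(S) -> alpha_S (missing subsets count as 0).
    levels: dict regulator name -> regulator level.
    """
    regs = sorted(levels)
    nu = {j: occupancy(levels[j], k, n) for j in regs}
    total = 0.0
    for size in range(len(regs) + 1):          # 2^N subsets in total
        for S in combinations(regs, size):
            # bound regulators contribute nu/(1+nu), unbound ones 1/(1+nu)
            pr = prod(nu[j] / (1 + nu[j]) if j in S else 1 / (1 + nu[j]) for j in regs)
            total += alpha.get(frozenset(S), 0.0) * pr
    return total

alpha = {frozenset({"P"}): 1.0, frozenset({"Q"}): 1.0, frozenset({"P", "Q"}): 1.0}
print(input_function(alpha, {"P": 1.0, "Q": 0.0, "R": 0.0}))  # 0.5
```

Expanding $\prod_j(1+\nu_j)$ gives one term per subset, which is the same shape as the denominator of the BoolODE equation below; the probabilities over all subsets always sum to one.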

BoolODE uses Boolean models to create simulated datasets.

Consider a gene $X$ with two activators ($P$ and $Q$) and one inhibitor ($R$), represented by the following rule:

\[X = (P \lor Q) \land \neg (R)\]

The ODE governing the time dynamics of gene $X$ is

\[\frac{dX}{dt} = m\left(\frac{\alpha_0+\alpha_P[P]+\cdots}{1+[P]+[Q]+[R]+[P][Q]+[P][R]+[Q][R]+[P][Q][R]}\right) - l_x[X]\]

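To encode this rule, only $\alpha_P$, $\alpha_Q$ and $\alpha_{PQ}$ have the value one and every other parameter has the value zero: $X$ is produced exactly when at least one activator is bound and the inhibitor $R$ is not.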
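To check that these values encode the rule, the sketch below evaluates the production term (the $m(\cdot)$ part of the equation) at every corner of the truth table, with "on" regulators set to a large value and "off" ones to zero. It assumes the elided numerator has one $\alpha_S$ term per subset, mirroring the denominator, and treats $[P]$, $[Q]$, $[R]$ as already-scaled regulator terms; the value 100 for "high" and the name `boolode_rate` are illustrative assumptions.

```python
from itertools import product

def boolode_rate(P, Q, R, m=1.0, alpha=None):
    """Production term m * numerator / denominator for X = (P or Q) and not R."""
    if alpha is None:
        # alpha_P = alpha_Q = alpha_PQ = 1; every other alpha (including alpha_0) = 0
        alpha = {"P": 1.0, "Q": 1.0, "PQ": 1.0}
    # One term per subset of {P, Q, R}; key "0" is the empty configuration.
    terms = {"0": 1.0, "P": P, "Q": Q, "R": R,
             "PQ": P * Q, "PR": P * R, "QR": Q * R, "PQR": P * Q * R}
    num = sum(alpha.get(key, 0.0) * val for key, val in terms.items())
    den = sum(terms.values())  # 1 + [P] + [Q] + [R] + [P][Q] + ... + [P][Q][R]
    return m * num / den

HIGH, LOW = 100.0, 0.0
for p, q, r in product([False, True], repeat=3):
    rate = boolode_rate(*(HIGH if v else LOW for v in (p, q, r)))
    print(p, q, r, (p or q) and not r, round(rate, 3))
```

Each row with the Boolean output `True` gives a rate close to $m$, and each `False` row gives a rate close to zero, so the smooth ODE reproduces the truth table at its corners.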

To create stochastic simulations, BoolODE adds noise terms to the ODE expressions as follows:

\[\begin{aligned}\frac{d[x_i]}{dt} &= mf(R_i) - l_x[x_i] + s\sqrt{[x_i]}\,\Delta W_t\\ \frac{d[p_i]}{dt} &= r[x_i] - l_p[p_i] + s\sqrt{[p_i]}\,\Delta W_t\end{aligned}\]

where $\Delta W_t \sim N(0, h)$ is a Wiener-process increment over the integration time step $h$, and $s$ is the noise strength.
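A standard way to integrate such noisy equations is the Euler-Maruyama scheme. The sketch below applies it to a single gene with a constant input value; drawing independent increments for $x$ and $p$, clipping at zero so the square roots stay defined, the parameter values, and the name `simulate_sde` are all assumptions, not BoolODE's implementation.

```python
import numpy as np

def simulate_sde(f, m=1.0, l_x=0.5, r=2.0, l_p=1.0, s=0.1, h=0.01, steps=5000, seed=0):
    """Euler-Maruyama sketch for one gene with constant input value f:
        x <- x + (m f - l_x x) h + s sqrt(x) dW_x
        p <- p + (r x - l_p p) h + s sqrt(p) dW_p,   dW ~ N(0, h)
    Returns the mRNA and protein trajectories as arrays of length steps + 1."""
    rng = np.random.default_rng(seed)
    x = np.zeros(steps + 1)
    p = np.zeros(steps + 1)
    for t in range(steps):
        # variance h means standard deviation sqrt(h)
        dw_x, dw_p = rng.normal(0.0, np.sqrt(h), size=2)
        x_next = x[t] + (m * f - l_x * x[t]) * h + s * np.sqrt(x[t]) * dw_x
        p_next = p[t] + (r * x[t] - l_p * p[t]) * h + s * np.sqrt(p[t]) * dw_p
        x[t + 1] = max(x_next, 0.0)  # clip so sqrt stays defined (assumed choice)
        p[t + 1] = max(p_next, 0.0)
    return x, p

x, p = simulate_sde(f=0.8)
print(x[-5:])
```

With `s=0.0` the noise terms vanish and the trajectory converges to the deterministic steady state $x^* = mf/l_x$; with small `s` it fluctuates around that value.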

The vector of gene expression values at a particular time point of a model simulation is treated as a single cell.
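Under this convention, a simulated single-cell dataset is built by picking time points from simulated trajectories. The sketch below assumes trajectories are stored as an array of shape `(n_sims, n_steps, n_genes)` and picks a uniformly random simulation and time step for each cell; the sampling scheme and the name `sample_cells` are hypothetical.

```python
import numpy as np

def sample_cells(trajectories, n_cells, seed=0):
    """Draw n_cells "cells", each the expression vector at one simulated time point.

    trajectories: array (n_sims, n_steps, n_genes) from repeated simulations.
    Returns a (n_cells, n_genes) matrix and each cell's true time index."""
    rng = np.random.default_rng(seed)
    n_sims, n_steps, _ = trajectories.shape
    sim_idx = rng.integers(0, n_sims, size=n_cells)    # which simulation
    time_idx = rng.integers(0, n_steps, size=n_cells)  # which time point
    cells = trajectories[sim_idx, time_idx, :]         # one row per cell
    return cells, time_idx

traj = np.arange(2 * 5 * 3, dtype=float).reshape(2, 5, 3)  # 2 sims, 5 steps, 3 genes
cells, t = sample_cells(traj, 4)
print(cells.shape)  # (4, 3)
```

Keeping the time index is possible only in simulation; with real data, this ordering would have to be estimated by a pseudotime algorithm.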

Summary: the authors develop BoolODE to convert Boolean functions specifying a GRN directly into ODEs. The BoolODE pipeline accepts a file describing a Boolean model as input, creates an equivalent ODE model, adds noise terms, and numerically simulates a stochastic time course.
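The conversion step can be sketched by evaluating the Boolean rule on every configuration of its regulators and setting $\alpha_S = 1$ exactly when the rule is true with only the regulators in $S$ on. This is an illustrative sketch, not BoolODE's code, and the function names are hypothetical; for the rule $X = (P \lor Q) \land \neg R$ it recovers the nonzero parameters $\alpha_P$, $\alpha_Q$, $\alpha_{PQ}$ stated above.

```python
from itertools import combinations

def rule_to_alphas(rule, regulators):
    """Derive alpha_S for every subset S of regulators from a Boolean rule.

    rule: function taking a dict {regulator: bool} and returning bool.
    alpha_S = 1 if the rule is True when exactly the regulators in S are on, else 0.
    """
    alphas = {}
    for size in range(len(regulators) + 1):
        for S in combinations(regulators, size):
            state = {g: (g in S) for g in regulators}  # only members of S are on
            alphas[frozenset(S)] = 1.0 if rule(state) else 0.0
    return alphas

def rule_x(s):
    """X = (P or Q) and not R."""
    return (s["P"] or s["Q"]) and not s["R"]

alphas = rule_to_alphas(rule_x, ["P", "Q", "R"])
print(sorted("".join(sorted(S)) for S, a in alphas.items() if a == 1.0))  # ['P', 'PQ', 'Q']
```

The resulting dictionary has one entry per subset ($2^3 = 8$ here), which is exactly the set of numerator coefficients the ODE needs.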

Datasets: a major challenge in evaluating GRN inference algorithms on single-cell RNA-seq data is that the “ground truth”


Published in categories Note