WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Uncertainty of Pseudotime Trajectory

Tags: Pseudotime, Single-cell, Permutation Test, Tree Variability

This post is for Tenha, Lovemore, and Mingzhou Song. “Statistical Evidence for the Presence of Trajectory in Single-cell Data.” BMC Bioinformatics 23, no. Suppl 8 (August 16, 2022): 340.

Many computational methods have been developed to infer trajectories from single-cell data. However, few methods address the problem of determining whether a trajectory exists in the observed data before attempting trajectory inference.

The paper introduces a method to identify the existence of a trajectory using three graph-based statistics, with statistical significance assessed by a permutation test.
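To make the permutation-test idea concrete, here is a minimal sketch in plain Python. The notes above do not say how the null distribution is generated, so the scheme used here (shuffling each feature column independently across cells, which breaks any joint structure between features) is an assumption, as are the function name `permutation_p_value` and its parameters.

```python
import random


def permutation_p_value(data, statistic, n_perm=199, seed=0, larger_is_extreme=True):
    """Permutation p-value of statistic(data).

    data: list of rows (cells), each a list of feature values.
    statistic: function mapping such a matrix to a number.
    ASSUMPTION: the null is simulated by shuffling every feature
    column independently across cells; the notes do not specify this.
    """
    rng = random.Random(seed)
    observed = statistic(data)
    n_features = len(data[0])
    count = 0
    for _ in range(n_perm):
        # Shuffle each column on its own, then reassemble the rows.
        columns = []
        for j in range(n_features):
            column = [row[j] for row in data]
            rng.shuffle(column)
            columns.append(column)
        permuted = [list(row) for row in zip(*columns)]
        null_value = statistic(permuted)
        if larger_is_extreme:
            extreme = null_value >= observed
        else:
            extreme = null_value <= observed
        if extreme:
            count += 1
    # The +1 terms count the observed data as one of the permutations,
    # so the p-value is never exactly zero.
    return (count + 1) / (n_perm + 1)
```

Whether large or small values count as extreme depends on the statistic, hence the `larger_is_extreme` switch; for a statistic that should be small when a trajectory exists, one would pass `larger_is_extreme=False`.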

The statistics are graph-based and quantify trajectory existence. Minimum-spanning-tree (MST) based statistics have been successfully used in analyzing global structures in galaxy data.

  • input: a data matrix such as a single-cell RNA-seq dataset, cells × features
  • output: a set of $p$-values, each corresponding to a given number of clusters $k$ with $a\le k\le b$, where $a$ and $b$ are the minimum and maximum number of clusters, respectively. The median of this set of $p$-values measures the statistical significance for the presence of a trajectory in the particular dataset.
  1. To capture any global topological structure, first partition the data into $k$ homogeneous regions using $k$-means clustering.
  2. Build a weighted undirected graph $G$ whose nodes are the $k$ cluster centers.
  3. Compute an MST $H$ of $G$ using Prim’s algorithm; $H$ has $k$ nodes and $k-1$ edges.
  4. Compute three tree-based statistics to characterize the presence of a trajectory in the data:
    1. number of degree-one nodes $T_1(X) = D_1(H)$: the hypothesis is that if there is a trajectory in the data, the MST built on the cluster centers has fewer branches and thus tends to be more linear, so it has fewer degree-one nodes (leaves).
    2. number of degree-two nodes $T_2(X) = D_2(H)$
    3. length of a longest path $T_3(X) = L_\max(H)$: a more compact tree tends to have more branches and a shorter longest path, and is representative of data with no trajectory pattern.
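
Steps 2 to 4 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the cluster centers are taken as given (for example from any $k$-means implementation), $G$ is assumed to be the complete graph with Euclidean distances as edge weights, and $L_\max$ is counted in number of edges rather than summed edge weight. The function names `prim_mst` and `tree_statistics` are hypothetical.

```python
import math
from collections import deque


def prim_mst(centers):
    """Prim's algorithm on the complete graph over `centers`.

    ASSUMPTION: edge weight = Euclidean distance between two centers.
    Returns the k-1 MST edges as (i, j) index pairs.
    """
    k = len(centers)
    in_tree = [False] * k
    best = [math.inf] * k   # cheapest known connection to the tree
    parent = [-1] * k
    if k:
        best[0] = 0.0
    edges = []
    for _ in range(k):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(k) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(k):
            if not in_tree[v]:
                d = math.dist(centers[u], centers[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def tree_statistics(edges, k):
    """Return (D_1, D_2, L_max) for a tree with k >= 1 nodes.

    ASSUMPTION: L_max is the number of edges on a longest path.
    """
    adj = [[] for _ in range(k)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    d1 = sum(1 for a in adj if len(a) == 1)
    d2 = sum(1 for a in adj if len(a) == 2)

    def farthest(src):
        # Breadth-first search; the last node dequeued is a farthest one.
        dist = {src: 0}
        queue = deque([src])
        last = src
        while queue:
            u = queue.popleft()
            last = u
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return last, dist[last]

    # In a tree, two BFS passes give the longest path (the diameter).
    end, _ = farthest(0)
    _, l_max = farthest(end)
    return d1, d2, l_max


# Centers along a line: the MST is a path (few leaves, long path).
line = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
print(tree_statistics(prim_mst(line), len(line)))   # (2, 3, 4)

# Centers in a compact star: many leaves, short longest path.
star = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
print(tree_statistics(prim_mst(star), len(star)))   # (4, 0, 2)
```

The two toy inputs show the contrast the statistics are meant to capture: the line-like tree has fewer degree-one nodes, more degree-two nodes, and a longer longest path than the compact star.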

Published in categories Note