Uncertainty of Pseudotime Trajectory
Many computational methods have been developed to infer trajectories from single-cell data. However, few of them address whether a trajectory exists in the observed data at all before attempting trajectory inference.
The paper introduces a method that tests for the existence of a trajectory using three graph-based statistics, with significance assessed by a permutation test.
The graph-based statistics quantify trajectory existence. Minimum-spanning-tree (MST) based statistics have previously been used successfully to analyze global structure in galaxy data.
- input: a data matrix, such as a single-cell RNA-seq dataset (cells × features)
- output: a set of $p$-values, one for each number of clusters $k$ with $a\le k\le b$, where $a$ and $b$ are the minimum and maximum number of clusters, respectively. The median of these $p$-values measures the statistical significance of the presence of a trajectory in the dataset.
- to capture any global topological structure, the data are first partitioned into $k$ homogeneous regions using $k$-means clustering
- build a weighted undirected graph $G$ on the $k$ cluster centers
- compute an MST $H$ of $G$ using Prim’s algorithm; $H$ has $k$ nodes and $k-1$ edges
- three tree-based statistics characterize the presence of a trajectory in the data:
  - number of degree-one nodes $T_1(X) = D_1(H)$: the hypothesis is that if there is a trajectory in the data, the MST built on the cluster centers has fewer branches and thus tends to be more linear.
  - number of degree-two nodes $T_2(X) = D_2(H)$
  - length of a longest path $T_3(X) = L_\max(H)$: a longer path suggests a trajectory, whereas a more compact tree tends to have more branches and is representative of data with no trajectory pattern.
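
The steps above can be sketched in Python with NumPy only. This is a minimal illustration, not the paper's implementation, and several details are assumptions that the notes do not specify: the edge weights of $G$ are taken to be Euclidean distances between cluster centers, the longest path is measured in number of edges, the null distribution is generated by independently permuting each feature column, and the three statistics get separate one-sided $p$-values rather than being combined. All function names (`kmeans_centers`, `prim_mst`, `tree_statistics`, `permutation_pvalues`) are hypothetical.

```python
from collections import deque

import numpy as np


def kmeans_centers(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def prim_mst(centers):
    """Prim's algorithm on the complete graph of centers (assumed Euclidean weights)."""
    k = len(centers)
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    in_tree = np.zeros(k, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()            # cheapest known link from each node into the tree
    parent = np.zeros(k, dtype=int)  # tree node that realizes that link
    edges = []
    for _ in range(k - 1):
        v = int(np.where(in_tree, np.inf, best).argmin())
        edges.append((int(parent[v]), v))
        in_tree[v] = True
        closer = dist[v] < best
        best[closer] = dist[v][closer]
        parent[closer] = v
    return edges


def tree_statistics(edges, k):
    """Return (T1, T2, T3): degree-one count, degree-two count, longest path in edges."""
    adj = [[] for _ in range(k)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    degree = np.array([len(a) for a in adj])

    def farthest(src):
        hops = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in hops:
                    hops[w] = hops[u] + 1
                    queue.append(w)
        node = max(hops, key=hops.get)
        return node, hops[node]

    end, _ = farthest(0)        # two BFS passes give the diameter of a tree
    _, longest = farthest(end)
    return int((degree == 1).sum()), int((degree == 2).sum()), longest


def permutation_pvalues(X, k, n_perm=200, seed=0):
    """One-sided permutation p-values for (T1, T2, T3) at a fixed k (assumed null)."""
    rng = np.random.default_rng(seed)

    def stats(data):
        return tree_statistics(prim_mst(kmeans_centers(data, k, seed=seed)), k)

    obs = np.array(stats(X))
    null = np.empty((n_perm, 3))
    for b in range(n_perm):
        # Assumed null: shuffle each feature independently to break joint structure.
        Xp = np.column_stack([rng.permutation(col) for col in X.T])
        null[b] = stats(Xp)
    # A trajectory is suggested by few leaves (small T1), many degree-two nodes, a long path.
    extreme = np.array([
        (null[:, 0] <= obs[0]).sum(),
        (null[:, 1] >= obs[1]).sum(),
        (null[:, 2] >= obs[2]).sum(),
    ])
    return (1 + extreme) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = rng.uniform(0, 1, size=150)
    X = np.column_stack([t, t ** 2]) + rng.normal(scale=0.02, size=(150, 2))
    pvals = np.array([permutation_pvalues(X, k, n_perm=50) for k in range(3, 7)])
    print(np.median(pvals, axis=0))  # median p-value over k, per statistic
```

The demo at the bottom draws noisy points along a curve, evaluates $k = 3, \dots, 6$, and reports the median $p$-value over $k$ for each statistic separately; how the three statistics feed into the single $p$-value per $k$ described above is not covered by these notes, so the sketch leaves them uncombined.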