C-SIDE for Cell-type-specific Spatial DE
cell type-specific inference of differential expression in spatial transcriptomics
a central problem: detect DE genes within cell types across tissue context
Challenges to learn DE:
- changing cell type composition across space
- measurement pixels detecting transcripts from multiple cell types
C-SIDE: identifies cell type-specific DE in spatial transcriptomics, accounting for localization of other cell types
model gene expression as an additive mixture across cell types of log-linear cell type-specific expression functions
current methods for DE in spatial transcriptomics fall into two categories
- nonparametric: do not use constrained hypotheses about gene expression patterns, but rather fit general smooth spatial patterns of gene expression
  - some do not consider cell types, while others operate on individual cell types
- parametric
  - no general parametric framework is currently available
an important challenge unaddressed by current spatial transcriptomics DE methods:
- observations from cell type mixtures
sequencing-based, RNA-capture spatial transcriptomics technologies, such as Visium, GeoMx and Slide-seq, can capture multiple cell types on individual measurement pixels
although imaging-based spatial transcriptomics technologies, such as MERFISH, ExSeq, and STARmap, have the potential to achieve single-cell resolution, they may still encounter mixing across cell types due to diffusion or imperfect cellular segmentation
the paper introduces cell type-specific inference of DE (C-SIDE), a general parametric statistical method that estimates cell type-specific DE in the context of cell type mixtures
- estimate cell type proportions on each pixel using a cell type-annotated scRNA-seq reference
- fit a parametric model, using predefined covariates such as spatial location or cellular microenvironment, that accounts for cell type differences to obtain cell type-specific DE estimates and corresponding standard errors
- the model accounts for sampling noise, gene-specific overdispersion, multiple hypothesis testing and platform effects between the scRNA-seq reference and the spatial data
- permits statistical inference across multiple experimental samples and/or replicates
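The notes mention per-gene DE estimates with standard errors and a correction for multiple hypothesis testing. As a minimal sketch of how those pieces could fit together, the code below assumes a two-sided z-test followed by a Benjamini-Hochberg adjustment. Both choices are illustrative assumptions, not a statement of the exact procedure C-SIDE uses, and the function names and numbers are hypothetical.

```python
import math


def two_sided_z_pvalue(estimate: float, std_error: float) -> float:
    """Two-sided p-value for H0: effect = 0, assuming a normal test statistic."""
    z = abs(estimate / std_error)
    # 2 * (1 - Phi(z)) equals erfc(z / sqrt(2)) for the standard normal CDF Phi.
    return math.erfc(z / math.sqrt(2.0))


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical DE estimates (log scale) and standard errors for four genes.
estimates = [0.9, 0.1, -0.6, 0.05]
std_errors = [0.2, 0.3, 0.25, 0.4]
pvals = [two_sided_z_pvalue(e, s) for e, s in zip(estimates, std_errors)]
qvals = benjamini_hochberg(pvals)
```

Genes whose adjusted value falls below a chosen threshold would then be reported as differentially expressed for that cell type.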
C-SIDE inputs one or more experimental samples of spatial transcriptomics data, consisting of $Y_{i,j,g}$ as the observed RNA counts for pixel $i$, gene $j$ and experimental sample $g$
assume Poisson sampling
\[Y_{i,j,g}\mid \lambda_{i,j,g} \sim \text{Poisson}(N_{i,g}\lambda_{i,j,g})\]

- $\lambda_{ijg}$: expected count rate (the Poisson mean is $N_{ig}\lambda_{ijg}$)
- $N_{ig}$: total transcript count (e.g., total unique molecular identifiers, UMIs)
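To make the sampling assumption concrete, here is a small sketch of the Poisson log-probability for one pixel and gene. The function name and the example numbers are hypothetical, chosen only for illustration.

```python
import math


def poisson_log_pmf(y: int, n_total: float, lam: float) -> float:
    """Log-probability of count y when Y ~ Poisson(n_total * lam)."""
    mean = n_total * lam
    if mean <= 0:
        raise ValueError("Poisson mean must be positive")
    # log P(Y = y) = y * log(mean) - mean - log(y!)
    return y * math.log(mean) - mean - math.lgamma(y + 1)


# Hypothetical pixel: 1,000 total UMIs, expected rate 0.002, 3 observed counts.
log_p = poisson_log_pmf(3, 1000.0, 0.002)
```

Scaling by the total count $N_{ig}$ lets the same rate $\lambda_{ijg}$ describe pixels with very different sequencing depth.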
accounting for platform effects and other sources of technical and natural variability, assume $\lambda_{ijg}$ is a mixture of $K$ cell type expression profiles, defined by
\[\log (\lambda_{ijg}) = \log\left( \sum_{k=1}^K \beta_{ikg}\mu_{ikjg} \right) + \gamma_{jg} + \epsilon_{ijg}\]

- $\mu_{ikjg}$: cell type-specific expected gene expression rate
- $\beta_{ikg}$: proportion of cell type
- $\gamma_{jg}$: gene-specific random effect that accounts for platform variability
- $\epsilon_{ijg}$: random effect to account for gene-specific overdispersion
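The mixture equation can be evaluated directly for a single pixel and gene. The sketch below uses hypothetical values for the proportions, the cell type rates, and the two random effects, and only illustrates the arithmetic, not how the parameters are estimated.

```python
import math


def mixture_log_rate(beta: list[float], mu: list[float],
                     gamma: float, epsilon: float) -> float:
    """log(lambda) = log(sum_k beta_k * mu_k) + gamma + epsilon."""
    if len(beta) != len(mu):
        raise ValueError("beta and mu must have one entry per cell type")
    mixed = sum(b * m for b, m in zip(beta, mu))
    return math.log(mixed) + gamma + epsilon


# Hypothetical pixel containing two cell types in a 70/30 split.
beta = [0.7, 0.3]      # cell type proportions on this pixel
mu = [0.001, 0.005]    # cell type-specific expression rates for one gene
log_lam = mixture_log_rate(beta, mu, gamma=0.2, epsilon=0.0)
lam = math.exp(log_lam)
```

Note that the mixing happens on the natural scale inside the logarithm, while the platform effect and overdispersion terms act multiplicatively on the mixed rate.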
to account for cell type-specific DE, model the log of the cell type-specific profiles $\mu_{ikjg}$ across pixel locations as a linear combination of $L$ covariates used to explain DE:
\[\log(\mu_{ikjg}) = \alpha_{0kjg} + \sum_{\ell=1}^L x_{i\ell g}\alpha_{\ell kjg}\]

- $x_{i\ell g}$: predefined covariates that explain DE
- $\alpha$: DE effect size of covariate $\ell$ for gene $j$ in cell type $k$ for sample $g$
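A short sketch of the log-linear model for one cell type and gene shows why $\alpha$ reads as a log fold-change when the covariate is binary. The covariate (inside vs. outside a region) and all numeric values are hypothetical.

```python
import math


def cell_type_rate(alpha0: float, alpha: list[float], x: list[float]) -> float:
    """mu = exp(alpha0 + sum_l x_l * alpha_l) for one cell type, gene, and pixel."""
    if len(alpha) != len(x):
        raise ValueError("alpha and x must have one entry per covariate")
    return math.exp(alpha0 + sum(xl * al for xl, al in zip(x, alpha)))


# Hypothetical single binary covariate: 1 inside a region of interest, 0 outside.
alpha0 = math.log(0.001)   # baseline log expression rate
alpha = [math.log(2.0)]    # DE effect size: a doubling inside the region

mu_outside = cell_type_rate(alpha0, alpha, x=[0.0])
mu_inside = cell_type_rate(alpha0, alpha, x=[1.0])
fold_change = mu_inside / mu_outside  # equals exp(alpha[0])
```

With a continuous covariate such as a spatial coordinate, the same coefficient gives the change in log expression per unit of the covariate.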