WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Counting Process Based Dimension Reduction Methods for Censored Data

Posted on
Tags: Survival Analysis, Sliced Inverse Regression, Estimating Equations, Sufficient Dimension Reduction

This note is for Sun, Q., Zhu, R., Wang, T., & Zeng, D. (2017). Counting Process Based Dimension Reduction Methods for Censored Outcomes. ArXiv:1704.05046 [Stat].

Dimension reduction

Multiple-index model

The aim is to extract a low-dimensional subspace from the $p$-dimensional covariate vector $X=(X_1,\ldots,X_p)^T$ to predict an outcome of interest $T$, under the model

\[\begin{equation} T=h(B^TX,\epsilon)\,, \label{eq:model} \end{equation}\]

where $\epsilon$ is random error independent of $X$, $B\in\IR^{p\times d}$ is a coefficient matrix with $d < p$, and $h(\cdot)$ is a completely unknown link function. This model is equivalent to assuming

\[\begin{equation} T\perp X\mid B^TX\,. \label{eq:indep} \end{equation}\]
  1. $\cS(B)$: the linear space spanned by the columns of $B$.
  2. central subspace $\cS_{T\mid X}$: the intersection of all $\cS(B)$ satisfying $\eqref{eq:indep}$.
  3. structural dimension: the dimension of $\cS_{T\mid X}$.
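
Only the span of $B$ matters in $\eqref{eq:indep}$, which is why the target is a subspace rather than a matrix. A short check, for any invertible $A\in\IR^{d\times d}$ (a symbol introduced here for illustration):

\[(BA)^TX = A^T(B^TX)\,,\]

so $B^TX$ and $(BA)^TX$ are one-to-one functions of each other. Conditioning on either carries the same information, hence $T\perp X\mid B^TX$ holds if and only if $T\perp X\mid (BA)^TX$, while $\cS(BA)=\cS(B)$. In particular, one may take $B^TB=I_d$ without loss of generality.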

Goal of sufficient dimension reduction:

determine the structural dimension and the central subspace using empirical data.

Literature

  1. There is an extensive literature on estimating the central subspace for completely observed data.
  2. There is limited literature on estimating the dimension reduction space from censored observations.

Proposal

  • A class of dimension reduction methods for right censored survival data using a counting process representation of the failure process.
  • Construct semiparametric estimating equations to estimate the dimension reduction subspace for the failure time model.

Notation:

  • $Y=\min(T,C)$: the observed event time
  • $\delta=I(T\le C)$: the censoring indicator.
  • $N(u)=I(Y\le u,\delta=1)$: the observed counting process
  • $Y(u)=I(Y\ge u)$: the at-risk process
  • $\lambda(u\mid X)$: the conditional hazard for $T$ given $X$
  • $dM(u,X)=dM(u,B^TX)=dN(u)-\lambda(u\mid B^TX)Y(u)du$: the martingale increment process indexed by $u$; the first equality holds because $\lambda(u\mid X)=\lambda(u\mid B^TX)$ under $\eqref{eq:indep}$.
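
To make the notation concrete, here is a minimal base-R sketch that evaluates $Y$, $\delta$, $N(u)$ and the at-risk indicator on a toy sample. The numbers and the helper names `N_proc` and `at_risk` are made up for illustration, and the at-risk indicator uses the usual left-continuous convention $I(Y\ge u)$.

```r
# Toy data (hypothetical values, for illustration only)
T_fail <- c(2.0, 5.0, 1.5, 4.0)   # latent failure times T
C_cens <- c(3.0, 3.5, 2.5, 6.0)   # censoring times C

Y     <- pmin(T_fail, C_cens)           # observed time Y = min(T, C)
delta <- as.integer(T_fail <= C_cens)   # delta = I(T <= C), 1 = failure observed

# Counting process N_i(u) = I(Y_i <= u, delta_i = 1), evaluated for all subjects
N_proc <- function(u) as.integer(Y <= u & delta == 1)

# At-risk indicator I(Y_i >= u), evaluated for all subjects
at_risk <- function(u) as.integer(Y >= u)

u   <- 3
N_u <- N_proc(u)    # who has an observed failure by time u
R_u <- at_risk(u)   # who is still under observation at time u
```

Subjects with `delta == 0` never make $N(u)$ jump; they only leave the risk set at their censoring time.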

To estimate $B$, consider the unbiased estimating equations:

\[\begin{equation} \E\Big[ \int \{ \alpha(u,X) - \alpha^*(u, B^TX) \}\{ dN(u) - \lambda(u\mid B^TX)Y(u)du \} \Big] = 0\,. \label{eq:esteq} \end{equation}\]

where

\[\alpha^*(u,B^TX) = \E\{\alpha(u,X)\mid \cF_u, B^TX\}\,.\]
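
Why $\eqref{eq:esteq}$ holds: a sketch, assuming conditionally independent censoring $T\perp C\mid X$ (an assumption that this note does not state explicitly) and writing $\cF_{u-}$ for the history just before $u$. The martingale increment has conditional mean zero,

\[\E\{dM(u,X)\mid \cF_{u-},X\} = \E\{dN(u)\mid \cF_{u-},X\} - \lambda(u\mid B^TX)Y(u)du = 0\,,\]

so for any weight $g(u,X)$ that is determined by $X$ and the history before $u$,

\[\E\Big[\int g(u,X)\,dM(u,X)\Big] = \E\Big[\int g(u,X)\,\E\{dM(u,X)\mid \cF_{u-},X\}\Big] = 0\,.\]

Taking $g=\alpha-\alpha^*$ gives $\eqref{eq:esteq}$.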

The sample version based on $n$ independent and identically distributed copies $\{Y_i,\delta_i,X_i\}_{i=1}^n$ is given by

\[\psi_n(B)\triangleq n^{-1}\sum_{i=1}^n\Big[ \int \{ \alpha(u,X_i) - \alpha^*(u, B^TX_i) \}\{ dN_i(u) - \lambda(u\mid B^TX_i)Y_i(u)du \} \Big]=0\,.\]

For a given choice of $\alpha(u,X)$, the estimator is defined as

\[\widehat B=\underset{B\in\Theta}{\argmin}\{\psi_n(B)^T\psi_n(B)\}\,.\]

Advantages

  1. Does not require estimating the censoring distribution to compensate for the bias in estimating the dimension reduction subspace.
  2. Circumvents the curse of dimensionality, since the nonparametric part adapts to the structural dimension.

Others

  1. Asymptotic normality.
  2. A computationally efficient approach. Optimization approach on the Stiefel manifold.
  3. Numerical studies & real data analysis (The Cancer Genome Atlas)
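
Since only the span of $B$ is identified, estimates are naturally constrained to have orthonormal columns (a point on the Stiefel manifold) and compared through their projection matrices. The following base-R sketch illustrates both ideas; the helper names `orthonormalise`, `proj` and `subspace_dist` are hypothetical and are not part of any package.

```r
# Orthonormalise the columns of a p x d matrix via QR, so that t(Q) %*% Q = I_d
orthonormalise <- function(B) qr.Q(qr(B))

# Projection matrix onto the column space of B
proj <- function(B) {
  Q <- orthonormalise(B)
  Q %*% t(Q)
}

# Frobenius distance between the column spaces of B1 and B2
subspace_dist <- function(B1, B2) norm(proj(B1) - proj(B2), type = "F")

B <- cbind(c(1, 1, 0, 0), c(0, 0, 1, -1))   # a 4 x 2 basis
A <- matrix(c(2, 1, 0, 3), 2, 2)            # an invertible 2 x 2 matrix
Q <- orthonormalise(B)

# B and B %*% A span the same subspace, so the distance is numerically zero
d_same <- subspace_dist(B, B %*% A)
# A genuinely different subspace gives a strictly positive distance
d_diff <- subspace_dist(B, cbind(c(1, 0, 0, 0), c(0, 1, 0, 0)))
```

Comparing projections rather than the matrices themselves avoids penalising an estimate that differs from the truth only by a rotation or rescaling of the basis.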

Approaches

forward regression approach

Set $\alpha(u,X)=X$ in $\eqref{eq:esteq}$, then the population version of the $p$-dimensional estimating equations is given by:

\[\E\Big(\int\big[ X-\E\{X\mid Y(u)=1,B^TX\} \big]dN(u)\Big) = 0\,.\]
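
The hazard term drops out of this equation, which is why $dN(u)$ appears instead of $dM(u)$. A short check, using the shorthand $\bar X(u)=\E\{X\mid Y(u)=1,B^TX\}$ (introduced here for brevity):

\[\E\big[\{X-\bar X(u)\}\lambda(u\mid B^TX)Y(u)\big] = \E\big[\lambda(u\mid B^TX)\,P\{Y(u)=1\mid B^TX\}\,\E\{X-\bar X(u)\mid Y(u)=1,B^TX\}\big] = 0\,,\]

because the inner conditional expectation is zero by the definition of $\bar X(u)$. Hence the compensator part of $dM(u)$ contributes nothing, and no estimate of the hazard is needed.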

semiparametric inverse regression approach

Set

\[\alpha(u,X)-\alpha^*(u,B^TX) = [X - \E\{X\mid Y(u)=1,B^TX\}]\varphi^T(u)\,,\]

then

\[\E\Big(\int\big[ X-\E\{X\mid Y(u)=1,B^TX\} \big]\varphi^T(u)dM(u)\Big) = 0\,.\]

counting process inverse regression approach

Replacing $dM(u)$ with $dN(u)$,

\[\E\Big(\int\big[ X-\E\{X\mid Y(u)=1,B^TX\} \big]\varphi^T(u)dN(u)\Big) = 0\,.\]

computationally efficient approach

Dropping the conditioning on $B^TX$ in the inner expectation gives

\[\E\Big(\int\big[ X-\E\{X\mid Y(u)=1\} \big]\varphi^T(u)dN(u)\Big)\]
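
The sample analogue of $X-\E\{X\mid Y(u)=1\}$ at an observed failure time is the subject's covariate minus the average covariate in the risk set. Below is a minimal base-R sketch of that centring step with $\varphi\equiv 1$; the function name `risk_centred` and the toy data are hypothetical, and this is only the centring step, not the package's `CP_SIR` implementation.

```r
# Risk-set-centred covariates at each observed failure time.
# X: n x p matrix, Y: observed times, delta: 1 = failure observed, 0 = censored
risk_centred <- function(X, Y, delta) {
  fail <- which(delta == 1)
  out  <- matrix(NA_real_, nrow = length(fail), ncol = ncol(X))
  for (k in seq_along(fail)) {
    i <- fail[k]
    in_risk_set <- Y >= Y[i]   # subjects still at risk at u = Y_i
    out[k, ] <- X[i, ] - colMeans(X[in_risk_set, , drop = FALSE])
  }
  out
}

# Toy data (hypothetical values)
X     <- cbind(c(1, 2, 3, 4), c(0, 1, 0, 1))
Y     <- c(1, 2, 3, 4)
delta <- c(1, 0, 1, 1)

Z <- risk_centred(X, Y, delta)   # one row per observed failure

# Sample version of E( int [X - E{X | Y(u) = 1}] dN(u) ) with phi = 1
psi_hat <- colSums(Z) / nrow(X)
```

No kernel smoothing over $B^TX$ is involved here, which is what makes this variant cheap: the risk-set means do not depend on $B$.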

Simulation

I use their R package, orthoDr, to reproduce the simulation of setting 1. See simulation.R for the complete source code.

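library(orthoDr)

# `res1` (with components X, T and C) is generated in simulation.R
# arguments: covariates, observed time min(T, C), indicator I(T <= C)

# forward regression
orthoDr_surv(res1$X, pmin(res1$T, res1$C), res1$T <= res1$C, method = "forward")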
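# semiparametric inverse regression (martingale version, dM);
# orthoDr_reg(..., method = "sir") would ignore the censoring indicator
orthoDr_surv(res1$X, pmin(res1$T, res1$C), res1$T <= res1$C, method = "dm", ndr = 2)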
# counting process inverse regression (dN)
orthoDr_surv(res1$X, pmin(res1$T, res1$C), res1$T <= res1$C, method = "dn")
# computationally efficient approach
CP_SIR(res1$X, pmin(res1$T, res1$C), res1$T <= res1$C)
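
The object `res1` comes from simulation.R, which is not reproduced here. To try the calls above without that file, a stand-in with the same components can be generated as follows. This is a hypothetical single-index exponential model chosen only for illustration; it is not setting 1 of the paper, and `beta`, `n` and `p` are made-up values.

```r
set.seed(1)
n <- 200
p <- 6
X    <- matrix(rnorm(n * p), n, p)
beta <- c(1, 0.5, 0, 0, 0, 0)   # hypothetical direction spanning the subspace

res1 <- list(
  X = X,
  # failure time depends on X only through the index X %*% beta
  T = rexp(n, rate = exp(drop(X %*% beta))),
  # censoring time independent of X
  C = rexp(n, rate = 0.5)
)

y     <- pmin(res1$T, res1$C)   # observed time
delta <- res1$T <= res1$C       # TRUE when the failure is observed
mean(delta)                     # proportion of observed failures
```

With this stand-in the structural dimension is one, so the true subspace is the span of `beta`.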

If possible, I will rewrite the source code of this package in Julia to get a better understanding of the algorithms.


Published in categories Note