Counting Process Based Dimension Reduction Methods for Censored Data
Dimension reduction
Multiple-index model
The aim is to extract a low-dimensional subspace from the $p$-dimensional covariates $X=(X_1,\ldots,X_p)^T$ to predict an outcome of interest $T$.
\[\begin{equation} T=h(B^TX,\epsilon)\,, \label{eq:model} \end{equation}\]
where $\epsilon$ is a random error independent of $X$, $B\in\IR^{p\times d}$ is a coefficient matrix with $d < p$, and $h(\cdot)$ is a completely unknown link function. This model is equivalent to assuming
\[\begin{equation} T\perp X\mid B^TX\,. \label{eq:indep} \end{equation}\]
- $\cS(B)$: the linear space spanned by the columns of $B$.
- central subspace $\cS_{T\mid X}$: the intersection of all $\cS(B)$ satisfying $\eqref{eq:indep}$.
- structural dimension: the dimension of $\cS_{T\mid X}$.
Goal of sufficient dimension reduction:
determine the structural dimension and the central subspace using empirical data.
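Because $h$ is unknown, only the column space $\cS(B)$ is identifiable, not $B$ itself: for any invertible $d\times d$ matrix $A$, the matrix $BA$ gives the same model with link $\tilde h(v,\epsilon)=h(A^{-T}v,\epsilon)$. A minimal sketch in base R, with `B` and `A` chosen arbitrarily for illustration, checks that both matrices yield the same projection onto the subspace:

```r
# Projection matrix onto the column space S(B)
proj <- function(B) B %*% solve(crossprod(B)) %*% t(B)

set.seed(1)
p <- 6; d <- 2
B  <- matrix(rnorm(p * d), p, d)      # hypothetical coefficient matrix
A  <- matrix(c(2, 1, -1, 3), d, d)    # an arbitrary invertible d x d matrix (det = 7)
B2 <- B %*% A                         # different matrix, same column space

# Largest entrywise difference between the two projections (rounding error only)
max(abs(proj(B) - proj(B2)))
```

This is why estimates of $B$ are usually compared through their projection matrices rather than entry by entry.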
Literature
- There is an extensive literature on estimating the central subspace for completely observed data.
- The literature on estimating the dimension reduction space from censored observations is limited.
Proposal
- A class of dimension reduction methods for right censored survival data using a counting process representation of the failure process.
- Construct semiparametric estimating equations to estimate the dimension reduction subspace for the failure time model.
Notation:
- $Y=\min(T,C)$: the observed event time
- $\delta=I(T\le C)$: the censoring indicator.
- $N(u)=I(Y\le u,\delta=1)$: the observed counting process
- $Y(u)=I(Y\ge u)$: the at-risk process
- $\lambda(u\mid X)$: conditional hazard for $T$ given $X$
- $dM(u,X)=dM(u,B^TX)=dN(u)-\lambda(u\mid B^TX)Y(u)du$: martingale increment process indexed by $u$, since $\lambda(u\mid X)=\lambda(u\mid B^TX)$.
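The observed quantities in this list can be computed directly from the latent times. The following base R sketch uses hypothetical exponential failure and censoring times (my choice, not taken from the paper), and it uses the left-continuous convention $I(Y\ge u)$ for the at-risk indicator, which is an assumption of this sketch:

```r
set.seed(2)
n <- 5
T_fail <- rexp(n, rate = 1)      # latent failure times T (hypothetical distribution)
C_cens <- rexp(n, rate = 0.5)    # latent censoring times C (hypothetical distribution)

Y     <- pmin(T_fail, C_cens)          # observed time Y = min(T, C)
delta <- as.integer(T_fail <= C_cens)  # delta = I(T <= C)

# Counting process N_i(u) and at-risk process Y_i(u) for all subjects at time u
N_at    <- function(u) as.integer(Y <= u & delta == 1)
at_risk <- function(u) as.integer(Y >= u)

u <- median(Y)
cbind(Y, delta, N = N_at(u), at_risk = at_risk(u))
```

Each row of the printed matrix is one subject: `N` jumps to 1 only if the failure was observed by time `u`, and `at_risk` drops to 0 once the subject has failed or been censored.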
To estimate $B$, consider the unbiased estimating equations:
\[\begin{equation} \E\Big[ \int \{ \alpha(u,X) - \alpha^*(u, B^TX) \}\{ dN(u) - \lambda(u\mid B^TX)Y(u)du \} \Big] = 0\,, \label{eq:esteq} \end{equation}\]
where
\[\alpha^*(u,B^TX) = \E\{\alpha(u,X)\mid \cF_u, B^TX\}\,.\]
The sample versions based on $n$ independent and identically distributed copies $\{Y_i,\delta_i,X_i\}_{i=1}^n$ are given by
\[\psi_n(B)\triangleq n^{-1}\sum_{i=1}^n\Big[ \int \{ \alpha(u,X_i) - \alpha^*(u, B^TX_i) \}\{ dN_i(u) - \lambda(u\mid B^TX_i)Y_i(u)du \} \Big]=0\,.\]
For a particular choice of $\alpha(u,X)$, the estimator is
\[\widehat{B}=\underset{B\in\Theta}{\argmin}\{\psi_n(B)^T\psi_n(B)\}\,.\]
Advantages
- Does not require estimating the censoring distribution to correct the bias in estimating the dimension reduction subspace.
- Circumvents the curse of dimensionality since the nonparametric part is adaptive to the structural dimension.
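One way to see the first point is sketched here. It assumes the usual conditionally independent censoring, $T\perp C\mid X$ (not stated explicitly in these notes), and reads $\alpha^*(u,B^TX)$ as $\E\{\alpha(u,X)\mid Y(u)=1,B^TX\}$. By iterated expectations, the hazard term in $\eqref{eq:esteq}$ vanishes for every $u$:

\[\E\big[\{\alpha(u,X)-\alpha^*(u,B^TX)\}\lambda(u\mid B^TX)Y(u)\big] = \E\big[\lambda(u\mid B^TX)Y(u)\,\E\{\alpha(u,X)-\alpha^*(u,B^TX)\mid Y(u)=1,B^TX\}\big]=0\,,\]

so $\eqref{eq:esteq}$ reduces to

\[\E\Big[\int\{\alpha(u,X)-\alpha^*(u,B^TX)\}\,dN(u)\Big]=0\,.\]

This involves only the observed counting process and conditional means among subjects still at risk. Neither the hazard $\lambda$ nor the distribution of $C$ has to be estimated, and the conditional means depend on $X$ only through the $d$-dimensional $B^TX$.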
Others
- Asymptotic normality.
- A computationally efficient approach. Optimization approach on the Stiefel manifold.
- Numerical studies & real data analysis (The Cancer Genome Atlas)
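To give a feel for optimization on the Stiefel manifold $\{B: B^TB=I_d\}$, here is a minimal base R sketch of a Cayley-transform update, one common way of taking a descent step while keeping the columns of $B$ orthonormal. Whether `orthoDr` uses exactly this update is an assumption on my part, and the objective below is a toy quadratic, not the $\psi_n(B)^T\psi_n(B)$ criterion from the paper.

```r
# One Cayley-transform step: given Euclidean gradient G at B, returns a new
# matrix whose columns are still orthonormal
cayley_step <- function(B, G, tau) {
  p <- nrow(B)
  W <- G %*% t(B) - B %*% t(G)   # skew-symmetric p x p matrix
  solve(diag(p) + (tau / 2) * W, (diag(p) - (tau / 2) * W) %*% B)
}

set.seed(3)
p <- 5; d <- 2
S    <- crossprod(matrix(rnorm(p * p), p, p))     # toy symmetric matrix
f    <- function(B) -sum(diag(t(B) %*% S %*% B))  # toy objective -tr(B' S B)
grad <- function(B) -2 * S %*% B                  # its Euclidean gradient

B  <- qr.Q(qr(matrix(rnorm(p * d), p, d)))        # orthonormal starting point
f0 <- f(B)
for (k in 1:200) B <- cayley_step(B, grad(B), tau = 0.005)

max(abs(crossprod(B) - diag(d)))   # deviation from orthonormality
c(start = f0, end = f(B))          # objective before and after the iterations
```

The key property is that $(I+\tfrac{\tau}{2}W)^{-1}(I-\tfrac{\tau}{2}W)$ is an orthogonal matrix whenever $W$ is skew-symmetric, so no re-orthogonalization is needed after each step. The fixed step size here is only for illustration; a practical implementation would choose it by a line search.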
Approaches
forward regression approach
Set $\alpha(u,X)=X$ in $\eqref{eq:esteq}$, then the population version of the $p$-dimensional estimating equations is given by:
\[\E\Big(\int\big[ X-\E\{X\mid Y(u)=1,B^TX\} \big]dN(u)\Big) = 0\,.\]
semiparametric inverse regression approach
Set
\[\alpha(u,X)-\alpha^*(u,B^TX) = [X - \E\{X\mid Y(u)=1,B^TX\}]\varphi^T(u)\,,\]
then $\eqref{eq:esteq}$ becomes
\[\E\Big(\int\big[ X-\E\{X\mid Y(u)=1,B^TX\} \big]\varphi^T(u)dM(u)\Big) = 0\,.\]
counting process inverse regression approach
Replacing $dM(u)$ with $dN(u)$ gives
\[\E\Big(\int\big[ X-\E\{X\mid Y(u)=1,B^TX\} \big]\varphi^T(u)dN(u)\Big) = 0\,.\]
computationally efficient approach
Dropping the conditioning on $B^TX$ in the inner expectation leads to the matrix
\[\E\Big(\int\big[ X-\E\{X\mid Y(u)=1\} \big]\varphi^T(u)dN(u)\Big)\,.\]
Simulation
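I use the authors' R package `orthoDr` to reproduce the simulation of setting 1; see simulation.R for the complete source code. In the calls below, `res1` is the simulated data set created by that script, with covariates `X`, failure times `T`, and censoring times `C`.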
library(orthoDr)
# forward regression
orthoDr_surv(res1$X, pmin(res1$T, res1$C), res1$T < res1$C, method = "forward")
# semiparametric inverse regression (martingale estimating equations)
orthoDr_surv(res1$X, pmin(res1$T, res1$C), res1$T < res1$C, method = "dm")
# counting process inverse regression
orthoDr_surv(res1$X, pmin(res1$T, res1$C), res1$T < res1$C, method = "dn")
# computationally efficient approach
CP_SIR(res1$X, pmin(res1$T, res1$C), res1$T < res1$C)
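The calls above assume that `res1` has already been created by simulation.R. For a quick smoke test without that script, a stand-in with the same fields might be generated as follows. The model, the coefficient matrix, and the censoring rate are my own illustrative choices and are not the paper's setting 1.

```r
set.seed(4)
n <- 200; p <- 6
X <- matrix(rnorm(n * p), n, p)

# Hypothetical true directions (two orthonormal columns)
B <- cbind(c(1, 1, 0, 0, 0, 0), c(0, 0, 1, -1, 0, 0)) / sqrt(2)
idx <- X %*% B    # n x 2 matrix of indices B'X

# Failure times depend on X only through B'X; censoring is independent of X
T_fail <- rexp(n, rate = exp(idx[, 1] + 0.5 * idx[, 2]^2 - 1))
C_cens <- rexp(n, rate = 0.3)

toy <- list(X = X, T = T_fail, C = C_cens)
mean(toy$T <= toy$C)    # proportion of observed (uncensored) failures
```

`toy` can then be passed in place of `res1` in the calls above, and the estimated directions compared with the column space of `B`.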
If possible, I will rewrite the source code of this package in Julia to get a better understanding of the algorithms.