# Counting Process Based Dimension Reduction Methods for Censored Data


## Dimension reduction

### Multiple-index model

The goal is to extract a low-dimensional subspace of the $p$-dimensional covariates $X=(X_1,\ldots,X_p)^T$ that suffices to predict an outcome of interest $T$:

\[\begin{equation} T=h(B^TX,\epsilon)\,, \label{eq:model} \end{equation}\]

where $\epsilon$ is a random error independent of $X$, $B\in\IR^{p\times d}$ is a coefficient matrix with $d < p$, and $h(\cdot)$ is a completely unknown link function. This model is equivalent to assuming

\[\begin{equation} T\perp X\mid B^TX\,. \label{eq:indep} \end{equation}\]

- $\cS(B)$: the linear space spanned by the columns of $B$.
- central subspace $\cS_{T\mid X}$: the intersection of all $\cS(B)$ satisfying $\eqref{eq:indep}$.
- structural dimension: the dimension of $\cS_{T\mid X}$.
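
The definitions above imply that only the subspace $\cS(B)$ is identifiable, not $B$ itself. A minimal base-R sketch of this point follows; the matrix `B`, the link function, and the helper `proj` are hypothetical choices for illustration and are not taken from the paper.

```r
set.seed(1)
n <- 200; p <- 6; d <- 2

# Hypothetical coefficient matrix with d = 2 directions (not from the paper).
B <- cbind(c(1, 1, 0, 0, 0, 0), c(0, 0, 1, -1, 0, 0))

X   <- matrix(rnorm(n * p), n, p)
eps <- rexp(n)

# An arbitrary link h(B^T X, eps): the outcome depends on X only through B^T X.
idx    <- X %*% B
T_fail <- exp(idx[, 1]) * eps / (1 + idx[, 2]^2)

# Projection matrix onto S(B); this is the identifiable quantity.
proj <- function(B) B %*% solve(crossprod(B)) %*% t(B)

# Any invertible d x d matrix A spans the same subspace: S(BA) = S(B).
A <- matrix(c(2, 1, -1, 3), 2, 2)
max(abs(proj(B) - proj(B %*% A)))   # zero up to rounding error
```

Because of this invariance, estimates of $B$ are usually compared through their projection matrices rather than entry by entry.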

#### Goal of sufficient dimension reduction:

Determine the structural dimension and the central subspace from empirical data.

#### Literature

- There is an extensive literature on estimating the central subspace for completely observed data.
- The literature on estimating the dimension reduction space from censored observations is limited.

## Proposal

- A class of dimension reduction methods for right censored survival data using a counting process representation of the failure process.
- Construct semiparametric estimating equations to estimate the dimension reduction subspace for the failure time model.

#### Notation:

- $Y=\min(T,C)$: the observed event time
- $\delta=I(T\le C)$: the censoring indicator.
- $N(u)=I(Y\le u,\delta=1)$: the observed counting process
- $Y(u)=I(Y\ge u)$: the at-risk process
- $\lambda(u\mid X)$: conditional hazard for $T$ given $X$
- $dM(u,X)=dM(u,B^TX)=dN(u)-\lambda(u\mid B^TX)Y(u)du$: martingale increment process indexed by $u$, since $\lambda(u\mid X)=\lambda(u\mid B^TX)$.
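
The observed quantities in this list are straightforward to compute from data. The sketch below uses simulated exponential failure and censoring times; the variable names (`T_fail`, `C_cens`, `N_proc`, `at_risk`) are hypothetical, and the at-risk indicator uses the usual left-continuous convention $I(Y\ge u)$.

```r
set.seed(2)
n <- 5
T_fail <- rexp(n)               # latent failure times (simulated)
C_cens <- rexp(n, rate = 0.5)   # latent censoring times (simulated)

Y     <- pmin(T_fail, C_cens)            # observed time Y = min(T, C)
delta <- as.numeric(T_fail <= C_cens)    # delta = 1 if the failure is observed

# Counting process N_i(u) = I(Y_i <= u, delta_i = 1) evaluated at time u.
N_proc  <- function(u) as.numeric(Y <= u & delta == 1)
# At-risk process Y_i(u) = I(Y_i >= u) evaluated at time u.
at_risk <- function(u) as.numeric(Y >= u)

u <- mean(range(Y))   # an arbitrary time point inside the observed range
cbind(Y, delta, N = N_proc(u), at_risk = at_risk(u))
```

Only $(Y,\delta,X)$ enter these processes, so nothing about the censoring distribution has to be modelled to evaluate them.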

To estimate $B$, consider the unbiased estimating equations:

\[\begin{equation} \E\Big[ \int \{ \alpha(u,X) - \alpha^*(u, B^TX) \}\{ dN(u) - \lambda(u\mid B^TX)Y(u)du \} \Big] = 0\,, \label{eq:esteq} \end{equation}\]

where

\[\alpha^*(u,B^TX) = \E\{\alpha(u,X)\mid \cF_u, B^TX\}\,.\]

The sample versions based on $n$ independent and identically distributed copies $\{Y_i,\delta_i,X_i\}_{i=1}^n$ are given by

\[\psi_n(B)\triangleq n^{-1}\sum_{i=1}^n\Big[ \int \{ \alpha(u,X_i) - \alpha^*(u, B^TX_i) \}\{ dN_i(u) - \lambda(u\mid B^TX_i)Y_i(u)du \} \Big]=0\,.\]

For some particular $\alpha(u,X)$, we have

\[\widehat{B}=\underset{B\in\Theta}{\argmin}\{\psi_n(B)^T\psi_n(B)\}\,.\]

### Advantages

- Does not require estimating the censoring distribution to correct for bias in the estimated dimension reduction subspace.
- Circumvents the curse of dimensionality since the nonparametric part is adaptive to the structural dimension.

### Others

- Asymptotic normality.
- A computationally efficient approach. Optimization approach on the Stiefel manifold.
- Numerical studies & real data analysis (The Cancer Genome Atlas)
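
To illustrate what optimization on the Stiefel manifold means, here is a minimal sketch that keeps $B^TB=I$ at every iteration by following each gradient step with a QR retraction. It is not the algorithm used in the paper or in the package: the toy objective $f(B)=-\mathrm{tr}(B^TSB)$, the matrix `S`, and the step size are assumptions chosen so that the answer (the span of the top $d$ eigenvectors of `S`) is known in advance. In the actual problem the objective would be $\psi_n(B)^T\psi_n(B)$.

```r
set.seed(4)
p <- 5; d <- 2
S <- diag(c(5, 4, 1, 0.5, 0.2))   # toy symmetric matrix defining f(B) = -tr(B^T S B)

# Random starting point on the Stiefel manifold (orthonormal columns).
B <- qr.Q(qr(matrix(rnorm(p * d), p, d)))

step <- 1
for (k in 1:200) {
  grad <- -2 * S %*% B             # Euclidean gradient of f at B
  B <- qr.Q(qr(B - step * grad))   # gradient step, then QR retraction onto the manifold
}

max(abs(crossprod(B) - diag(d)))   # orthonormality is preserved
round(B %*% t(B), 3)               # projection onto the estimated subspace
```

The retraction is what distinguishes this from unconstrained gradient descent: every iterate is a valid orthonormal basis, so the scale and rotation of $B$ never drift.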

## Approaches

### forward regression approach

Setting $\alpha(u,X)=X$ in $\eqref{eq:esteq}$ gives the population version of the $p$-dimensional estimating equations:

\[\E\Big(\int\big[ X-\E\{X\mid Y(u)=1,B^TX\} \big]dN(u)\Big) = 0\,.\]

### semiparametric inverse regression approach

Set

\[\alpha(u,X)-\alpha^*(u,B^TX) = [X - \E\{X\mid Y(u)=1,B^TX\}]\varphi^T(u)\,,\]then

\[\E\Big(\int\big[ X-\E\{X\mid Y(u)=1,B^TX\} \big]\varphi^T(u)dM(u)\Big) = 0\,.\]

### counting process inverse regression approach

Replacing $dM(u)$ with $dN(u)$,

\[\E\Big(\int\big[ X-\E\{X\mid Y(u)=1,B^TX\} \big]\varphi^T(u)dN(u)\Big)\]

### computationally efficient approach

\[\E\Big(\int\big[ X-\E\{X\mid Y(u)=1\} \big]\varphi^T(u)dN(u)\Big)\]

## Simulation

I use the authors' R package `orthoDr` to reproduce the simulation in setting 1. See simulation.R for the complete source code.

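```r
library(orthoDr)

# res1 is assumed to come from simulation.R and to hold the covariates X,
# the failure times T and the censoring times C
y     <- pmin(res1$T, res1$C)           # observed time Y = min(T, C)
delta <- as.numeric(res1$T <= res1$C)   # delta = I(T <= C), 1 = failure

# forward regression
orthoDr_surv(res1$X, y, delta, method = "forward")
# semiparametric inverse regression (martingale version, dM)
orthoDr_surv(res1$X, y, delta, method = "dm", ndr = 2)
# counting process inverse regression (dN)
orthoDr_surv(res1$X, y, delta, method = "dn")
# computationally efficient approach
CP_SIR(res1$X, y, delta)
```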
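
As a complement to the package calls above, the forward regression estimating function from the Approaches section can be written out in a few lines of base R. This is a minimal sketch and not the `orthoDr` implementation: the Gaussian kernel, the fixed bandwidth `bw`, the function names `psi_forward` and `objective`, and the toy single-index data are all assumptions made for illustration.

```r
# Sample version of the forward regression estimating function psi_n(B).
# bw is a hypothetical fixed bandwidth; a data-driven choice would be better.
psi_forward <- function(B, X, Y, delta, bw = 0.5) {
  Z   <- X %*% B                       # reduced covariates B^T X_j, one row per subject
  psi <- numeric(ncol(X))
  for (i in which(delta == 1)) {       # dN_i(u) jumps only at u = Y_i when delta_i = 1
    risk <- as.numeric(Y >= Y[i])      # at-risk indicators at time Y_i
    dz   <- sweep(Z, 2, Z[i, ])        # B^T X_j - B^T X_i
    w    <- risk * exp(-rowSums(dz^2) / (2 * bw^2))   # Gaussian kernel weights
    xbar <- colSums(X * w) / sum(w)    # kernel estimate of E{X | Y(u) = 1, B^T X}
    psi  <- psi + X[i, ] - xbar
  }
  psi / nrow(X)
}

# Objective psi_n(B)^T psi_n(B) to be minimized over B.
objective <- function(B, ...) sum(psi_forward(B, ...)^2)

# Toy single-index data (hypothetical, not the paper's setting 1).
set.seed(3)
n <- 300; p <- 4
X  <- matrix(rnorm(n * p), n, p)
B0 <- matrix(c(1, 1, 0, 0) / sqrt(2), p, 1)
T_fail <- rexp(n) * exp(-as.vector(X %*% B0))
C_cens <- rexp(n, rate = 0.3)
Y      <- pmin(T_fail, C_cens)
delta  <- as.numeric(T_fail <= C_cens)

# Compare the objective at the true direction and at an unrelated one.
B_wrong <- matrix(c(0, 0, 1, 0), p, 1)
c(true = objective(B0, X, Y, delta), wrong = objective(B_wrong, X, Y, delta))
```

One would expect the objective to be closer to zero at the true direction than at an unrelated one, although the size of the gap depends on the bandwidth and the sample size. Note that the censoring times enter only through `Y` and `delta`.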

If possible, I will rewrite the source code of this package in Julia to get a better understanding of the algorithms.