# Counting Process Based Dimension Reduction Methods for Censored Data

## Dimension reduction

### Multiple-index model

The multiple-index model aims to extract a low-dimensional subspace from the $p$-dimensional covariates $X=(X_1,\ldots,X_p)^T$ in order to predict an outcome of interest $T$:

$$
T = h(B^TX, \epsilon)\,,
$$

where $\epsilon$ is a random error independent of $X$, $B\in\IR^{p\times d}$ is a coefficient matrix with $d < p$, and $h(\cdot)$ is a completely unknown link function. This model is equivalent to assuming

$$
T \perp\!\!\!\perp X \mid B^TX\,. \label{eq:indep}
$$

- $\cS(B)$: the linear space spanned by the columns of $B$.
- central subspace $\cS_{T\mid X}$: the intersection of all $\cS(B)$ satisfying $\eqref{eq:indep}$.
- structural dimension: the dimension of $\cS_{T\mid X}$.
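Since only the space $\cS(B)$ matters, $B$ itself is identifiable only up to an invertible $d\times d$ transformation. The following is a minimal base-R sketch of that point, comparing subspaces through their projection matrices. The helper names `proj` and `span_dist` and the simulated matrices are illustrative assumptions, not part of the paper or of any package.

```r
# Projection matrix onto the column space of B (B need not be orthonormal).
proj <- function(B) B %*% solve(crossprod(B), t(B))

# Frobenius distance between projection matrices: zero iff the spans coincide.
span_dist <- function(B1, B2) norm(proj(B1) - proj(B2), type = "F")

set.seed(1)
p <- 6; d <- 2
B <- matrix(rnorm(p * d), p, d)          # hypothetical basis of S(B)
A <- matrix(c(2, 1, 0, 3), d, d)         # an invertible d x d matrix
B_rot <- B %*% A                         # different matrix, same column space
B_other <- matrix(rnorm(p * d), p, d)    # a different subspace (almost surely)

span_dist(B, B_rot)    # numerically zero
span_dist(B, B_other)  # strictly positive
```

A distance of this kind is one reasonable way to compare an estimated basis with a true one in simulations, because it does not depend on which basis of the subspace is reported.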

#### Goal of sufficient dimension reduction:

Determine the structural dimension and the central subspace from empirical data.

#### Literature

- There is an extensive literature on estimating the central subspace for completely observed data.
- The literature on estimating the dimension reduction space from censored observations is limited.

## Proposal

- A class of dimension reduction methods for right-censored survival data, based on a counting process representation of the failure process.
- Construct semiparametric estimating equations to estimate the dimension reduction subspace for the failure time model.

#### Notation:

- $Y=\min(T,C)$: the observed event time
- $\delta=I(T\le C)$: the censoring indicator.
- $N(u)=I(Y\le u,\delta=1)$: the observed counting process
- $Y(u)=I(Y>u)$: at-risk process
- $\lambda(u\mid X)$: conditional hazard for $T$ given $X$
- $dM(u,X)=dM(u,B^TX)=dN(u)-\lambda(u\mid B^TX)Y(u)du$: martingale increment process indexed by $u$, since $\lambda(u\mid X)=\lambda(u\mid B^TX)$.
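The observable quantities in this notation can be built directly from the failure and censoring times. Below is a small base-R sketch on simulated data. The exponential distributions, the sample size, and the names `T_fail`, `C_cens`, `N_proc` and `at_risk` are assumptions made purely for illustration.

```r
set.seed(2)
n <- 5
T_fail <- rexp(n, rate = 1)      # latent failure times T (simulated, illustrative)
C_cens <- rexp(n, rate = 0.5)    # latent censoring times C (simulated, illustrative)

Y     <- pmin(T_fail, C_cens)            # observed time Y = min(T, C)
delta <- as.numeric(T_fail <= C_cens)    # delta = I(T <= C)

# Counting process N(u) = I(Y <= u, delta = 1) and at-risk process Y(u) = I(Y > u),
# evaluated for all n subjects at a single time point u.
N_proc  <- function(u) as.numeric(Y <= u & delta == 1)
at_risk <- function(u) as.numeric(Y > u)

u <- median(Y)
data.frame(Y = Y, delta = delta, N_u = N_proc(u), at_risk_u = at_risk(u))
```

For each subject, $N(u)$ jumps from 0 to 1 at $Y$ only if the failure was observed, while $Y(u)$ drops from 1 to 0 at $Y$ regardless of whether the subject failed or was censored.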

To estimate $B$, consider the unbiased estimating equations:

where

The sample versions based on $n$ independent and identically distributed copies $\{Y_i,\delta_i,X_i\}_{i=1}^n$ are given by

For some particular $\alpha(u,X)$, we have

### Advantages

- Does not require estimating the censoring distribution to compensate for the bias in estimating the dimension reduction subspace.
- Circumvents the curse of dimensionality since the nonparametric part is adaptive to the structural dimension.

### Others

- Asymptotic normality.
- A computationally efficient approach. Optimization approach on the Stiefel manifold.
- Numerical studies & real data analysis (The Cancer Genome Atlas)
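The Stiefel manifold is the set of $p\times d$ matrices with orthonormal columns, $B^TB=I_d$. One common way to optimize over it is a Cayley-transform update, which moves $B$ while keeping the constraint satisfied exactly. The sketch below illustrates that generic update in base R. The function `cayley_step`, the stand-in gradient `G`, and the step size are assumptions for illustration, and this post does not confirm that the paper or the package uses exactly this scheme.

```r
# One Cayley-transform update of B.
# B: p x d with orthonormal columns; G: p x d gradient of some objective at B;
# tau: step size. All three are illustrative inputs.
cayley_step <- function(B, G, tau) {
  p <- nrow(B)
  W <- G %*% t(B) - B %*% t(G)            # skew-symmetric p x p matrix
  I <- diag(p)
  # Solve (I + tau/2 W) B_new = (I - tau/2 W) B for B_new.
  solve(I + (tau / 2) * W, (I - (tau / 2) * W) %*% B)
}

set.seed(3)
p <- 6; d <- 2
B0 <- qr.Q(qr(matrix(rnorm(p * d), p, d)))  # random point on the Stiefel manifold
G  <- matrix(rnorm(p * d), p, d)            # stand-in gradient (hypothetical)
B1 <- cayley_step(B0, G, tau = 0.1)

max(abs(crossprod(B1) - diag(d)))  # numerically zero: B1 still has orthonormal columns
```

Because $W$ is skew-symmetric, $(I+\tfrac{\tau}{2}W)^{-1}(I-\tfrac{\tau}{2}W)$ is an orthogonal matrix, so no separate re-orthonormalization step is needed after the update.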

## Approaches

### Forward regression approach

Setting $\alpha(u,X)=X$ in $\eqref{eq:esteq}$, the population version of the $p$-dimensional estimating equations is given by:

### Semiparametric inverse regression approach

Set

then

### Counting process inverse regression approach

Replacing $dM(u)$ with $dN(u)$,

### Computationally efficient approach

## Simulation

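The authors' R package `orthoDr` can be used to reproduce the simulation of setting 1, where `res1` holds the simulated covariates `X`, failure times `T` and censoring times `C`. See `simulation.R` for the complete source code.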

```
library(orthoDr)
# forward regression
orthoDr_surv(res1$X, pmin(res1$T, res1$C), res1$T < res1$C, method = "forward")
# semiparametric inverse regression (martingale-based, so censoring is used)
orthoDr_surv(res1$X, pmin(res1$T, res1$C), res1$T < res1$C, method = "dm")
# counting process SIR
orthoDr_surv(res1$X, pmin(res1$T, res1$C), res1$T < res1$C, method = "dn")
# computationally efficient approach
CP_SIR(res1$X, pmin(res1$T, res1$C), res1$T < res1$C)
```

If possible, I will rewrite the source code of this package in Julia to get a better understanding of the algorithms.