# Counting Process Based Dimension Reduction Methods for Censored Data

## Dimension reduction

### Multiple-index model

Sufficient dimension reduction aims to extract a low-dimensional subspace from the $p$-dimensional covariates $X=(X_1,\ldots,X_p)^T$ to predict an outcome of interest $T$. The multiple-index model assumes

$$
T = h(B^TX, \epsilon),
$$

where $\epsilon$ is a random error independent of $X$, $B\in\IR^{p\times d}$ is a coefficient matrix with $d < p$, and $h(\cdot)$ is a completely unknown link function. This model is equivalent to assuming

$$
T \perp X \mid B^TX. \label{eq:indep}
$$

1. $\cS(B)$: the linear space spanned by the columns of $B$.
2. central subspace $\cS_{T\mid X}$: the intersection of all $\cS(B)$ satisfying $\eqref{eq:indep}$.
3. structural dimension: the dimension of $\cS_{T\mid X}$.
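
Because only the column space $\cS(B)$ matters in these definitions, two coefficient matrices are best compared through their projection matrices rather than entry by entry. The following is a minimal base-R sketch of that idea. The helper names `proj_matrix` and `subspace_dist` and the simulated matrices are illustrative assumptions, not part of any package.

```r
# Projection matrix onto the column space of B (B must have full column rank).
proj_matrix <- function(B) {
  B %*% solve(crossprod(B), t(B))
}

# Frobenius distance between the projections onto span(B1) and span(B2).
# It is zero exactly when the two matrices span the same subspace.
subspace_dist <- function(B1, B2) {
  norm(proj_matrix(B1) - proj_matrix(B2), type = "F")
}

set.seed(1)
p <- 6; d <- 2
B <- matrix(rnorm(p * d), p, d)          # a hypothetical p x d coefficient matrix
A <- matrix(c(2, 1, 0, 3), d, d)         # an invertible d x d matrix
B_rot <- B %*% A                         # different entries, same column space
B_other <- matrix(rnorm(p * d), p, d)    # an unrelated subspace

subspace_dist(B, B_rot)     # numerically zero
subspace_dist(B, B_other)   # strictly positive
```

A distance of this kind is one common way to compare an estimated basis with the true one in simulations, since $B$ itself is only identified up to an invertible transformation of its columns.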

#### Goal of sufficient dimension reduction:

Determine the structural dimension and the central subspace from empirical data.

#### Literature

1. There is an extensive literature on estimating the central subspace for completely observed data.
2. There is only a limited literature on estimating the dimension reduction space from censored observations.

## Proposal

• A class of dimension reduction methods for right censored survival data using a counting process representation of the failure process.
• Construct semiparametric estimating equations to estimate the dimension reduction subspace for the failure time model.

#### Notation:

• $Y=\min(T,C)$: the observed event time
• $\delta=I(T\le C)$: the censoring indicator.
• $N(u)=I(Y\le u,\delta=1)$: the observed counting process
• $Y(u)=I(Y>u)$: at-risk process
• $\lambda(u\mid X)$: conditional hazard for $T$ given $X$
• $dM(u,X)=dM(u,B^TX)=dN(u)-\lambda(u\mid B^TX)Y(u)du$: martingale increment process indexed by $u$; the first equality holds because $\lambda(u\mid X)=\lambda(u\mid B^TX)$.
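
The notation above can be made concrete with a small simulation. The sketch below uses base R only. The exponential failure and censoring times, the sample size, and the helper names `N_at` and `Y_at` are assumptions chosen for illustration. The at-risk helper follows the definition given in this note.

```r
set.seed(2024)
n <- 8
T_fail <- rexp(n, rate = 1)      # latent failure times T (assumed exponential)
C_cens <- rexp(n, rate = 0.5)    # censoring times C (assumed exponential)

Y_obs <- pmin(T_fail, C_cens)            # Y = min(T, C)
delta <- as.integer(T_fail <= C_cens)    # delta = I(T <= C)

# Counting process N(u) = I(Y <= u, delta = 1), evaluated for every subject.
N_at <- function(u) as.integer(Y_obs <= u & delta == 1)
# At-risk process Y(u) = I(Y > u), following the definition in this note.
Y_at <- function(u) as.integer(Y_obs > u)

u <- 0.5
data.frame(Y = round(Y_obs, 3), delta = delta, N_u = N_at(u), Y_u = Y_at(u))
```

In the package calls in the Simulation section, `pmin(res1$T, res1$C)` and `res1$T < res1$C` play the roles of $Y$ and $\delta$.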

To estimate $B$, consider the unbiased estimating equations:

where

The sample versions based on $n$ independent and identically distributed copies $\{Y_i,\delta_i,X_i\}_{i=1}^n$ are given by

For some particular $\alpha(u,X)$, we have

### Advantages

1. Does not require estimating the censoring distribution to compensate for the bias in estimating the dimension reduction subspace.
2. Circumvents the curse of dimensionality, since the nonparametric part adapts to the structural dimension.

### Others

1. Asymptotic normality.
2. A computationally efficient approach: optimization on the Stiefel manifold.
3. Numerical studies & real data analysis (The Cancer Genome Atlas)
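
Optimizing over the Stiefel manifold means searching over $p\times d$ matrices with orthonormal columns, $B^TB=I_d$, which removes the scale ambiguity in $B$. The sketch below shows one generic ingredient of such methods: mapping an arbitrary full-rank matrix back onto the manifold with a QR decomposition. It is a base-R illustration only and is not claimed to be the update rule used in the paper or in the R package. `qr_retract` and the random update direction are hypothetical.

```r
# Map a full-column-rank p x d matrix to one with orthonormal columns
# spanning the same subspace (a point on the Stiefel manifold).
qr_retract <- function(B) {
  qr.Q(qr(B))
}

set.seed(7)
p <- 5; d <- 2
B0 <- qr_retract(matrix(rnorm(p * d), p, d))   # starting point with B0'B0 = I
step <- 0.1 * matrix(rnorm(p * d), p, d)       # an arbitrary (hypothetical) update direction
B1 <- qr_retract(B0 + step)                    # move, then return to the manifold

round(crossprod(B1), 10)   # identity matrix of size d
```

Keeping every iterate orthonormal in this way means the objective is always evaluated at a valid basis of a $d$-dimensional subspace.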

## Approaches

### Forward regression approach

Setting $\alpha(u,X)=X$ in $\eqref{eq:esteq}$, the population version of the $p$-dimensional estimating equations is given by:

Set

then

### Counting process inverse regression approach

Replacing $dM(u)$ with $dN(u)$,

## Simulation

I use the authors' R package `orthoDr` to reproduce the simulation of setting 1. See simulation.R for the complete source code.

    library(orthoDr)

    # res1 holds the simulated covariates X, failure times T and censoring times C
    # (generated in simulation.R)

    # forward regression
    orthoDr_surv(res1$X, pmin(res1$T, res1$C), res1$T < res1$C, method = "forward")
    # semiparametric SIR
    orthoDr_reg(res1$X, pmin(res1$T, res1$C), ndr = 2, method = "sir")
    # counting process SIR
    orthoDr_surv(res1$X, pmin(res1$T, res1$C), res1$T < res1$C, method = "dn")
    # computationally efficient counting process SIR
    CP_SIR(res1$X, pmin(res1$T, res1$C), res1$T < res1$C)


If possible, I will rewrite the source code of this package in Julia to get a better understanding of the algorithms.
