Counting Process Based Dimension Reduction Methods for Censored Data
Dimension reduction
Multiple-index model
Aims to extract a low-dimensional subspace from the $p$-dimensional covariates $X=(X_1,\ldots,X_p)^T$ in order to predict an outcome of interest $T$.
$$T=h(B^TX,\epsilon), \tag{1}$$
where $\epsilon$ is a random error independent of $X$, $B\in\mathbb{R}^{p\times d}$ is a coefficient matrix with $d<p$, and $h(\cdot)$ is a completely unknown link function. This model is equivalent to assuming
$$T\perp X\mid B^TX. \tag{2}$$
- $\mathcal{S}(B)$: the linear space spanned by the columns of $B$.
- central subspace $\mathcal{S}_{T\mid X}$: the intersection of all $\mathcal{S}(B)$ satisfying (2).
- structural dimension: the dimension of $\mathcal{S}_{T\mid X}$.
Goal of sufficient dimension reduction:
determine the structural dimension and the central subspace using empirical data.
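The target is a subspace rather than a particular matrix: $B$ and $BA$ span the same space for any invertible $d\times d$ matrix $A$, so only the projection onto that space is identifiable. A small base R sketch of this point (the matrices `B` and `A` and the helper `proj` are made up for illustration):

```r
set.seed(1)
p <- 6; d <- 2

# An orthonormal basis B of a d-dimensional subspace of R^p (arbitrary example)
B <- qr.Q(qr(matrix(rnorm(p * d), p, d)))

# Any invertible d x d matrix A gives another basis of the same subspace
A  <- matrix(c(2, 1, 0, 3), d, d)
B2 <- B %*% A

# Projection matrix onto the column space of a basis
proj <- function(B) B %*% solve(crossprod(B), t(B))

# The two projections coincide up to rounding error
max(abs(proj(B) - proj(B2)))
```

For the same reason, a natural way to compare an estimate with the truth is through the difference of the two projection matrices rather than through the entries of $B$.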
Literature
- Extensive literature exists on estimating the central subspace for completely observed data.
- Limited literature addresses estimating the dimension reduction space from censored observations.
Proposal
- A class of dimension reduction methods for right censored survival data using a counting process representation of the failure process.
- Construct semiparametric estimating equations to estimate the dimension reduction subspace for the failure time model.
Notation:
- $Y=\min(T,C)$: the observed event time
- $\delta=I(T\le C)$: the censoring indicator
- $N(u)=I(Y\le u,\delta=1)$: the observed counting process
- $Y(u)=I(Y\ge u)$: the at-risk process
- $\lambda(u\mid X)$: the conditional hazard of $T$ given $X$
- $dM(u,X)=dM(u,B^TX)=dN(u)-\lambda(u\mid B^TX)Y(u)\,du$: the martingale increment process indexed by $u$, where the first equality holds since $\lambda(u\mid X)=\lambda(u\mid B^TX)$.
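To make the notation concrete, here is a small base R sketch that builds the observed data and evaluates the two processes at a fixed time. The failure and censoring times are invented toy values, and the at-risk process is taken as $I(Y\ge u)$, the usual convention.

```r
# Toy latent failure and censoring times (hypothetical values)
T_fail <- c(2.0, 5.0, 1.5, 4.0)
C_cens <- c(3.0, 2.5, 4.0, 6.0)

Y     <- pmin(T_fail, C_cens)            # observed times: 2.0 2.5 1.5 4.0
delta <- as.integer(T_fail <= C_cens)    # event indicators: 1 0 1 1

# Counting process N_i(u) and at-risk process Y_i(u), one entry per subject
N_proc <- function(u) as.integer(Y <= u & delta == 1)
Y_proc <- function(u) as.integer(Y >= u)

N_proc(2.2)   # 1 0 1 0: subjects 1 and 3 have failed by u = 2.2
Y_proc(2.2)   # 0 1 0 1: subjects 2 and 4 are still under observation
```

Each $N_i(u)$ jumps at most once, at $u=Y_i$ and only when $\delta_i=1$, which is why integrals against $dN_i(u)$ reduce to sums over observed failures.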
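To estimate $B$, consider the unbiased estimating equations
$$E\left[\int\{\alpha(u,X)-\alpha^*(u,B^TX)\}\{dN(u)-\lambda(u\mid B^TX)Y(u)\,du\}\right]=0, \tag{3}$$
where
$$\alpha^*(u,B^TX)=E\{\alpha(u,X)\mid\mathcal{F}_u,B^TX\}.$$
The sample versions based on $n$ independent and identically distributed copies $\{Y_i,\delta_i,X_i\}_{i=1}^n$ are given by
$$\psi_n(B)\triangleq n^{-1}\sum_{i=1}^n\left[\int\{\alpha(u,X_i)-\alpha^*(u,B^TX_i)\}\{dN_i(u)-\lambda(u\mid B^TX_i)Y_i(u)\,du\}\right]=0.$$
For some particular $\alpha(u,X)$, we have
$$\widehat{B}=\arg\min_{B\in\Theta}\{\psi_n(B)^T\psi_n(B)\}.$$
Advantages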
- Does not require estimating the censoring distribution to compensate for bias in estimating the dimension reduction subspace.
- Circumvents the curse of dimensionality, since the nonparametric part adapts to the structural dimension.
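The second point can be illustrated with the conditional mean $E\{X\mid Y(u)=1,B^TX\}$ that appears in the approaches below: it is smoothed over the $d$-dimensional index $B^TX$ rather than over all $p$ covariates. The following base R function is only a sketch of that idea using a Nadaraya-Watson estimator; the Gaussian kernel, the bandwidth `h`, and the function name `cond_mean_at_risk` are my own assumptions, not the paper's or the package's choices.

```r
# Kernel estimate of E{X | Y(u) = 1, B^T X = z}: smoothing happens in d dimensions only.
# X: n x p covariates, Y: observed times, B: p x d basis, z: length-d evaluation point.
cond_mean_at_risk <- function(X, Y, B, u, z, h) {
  at_risk <- Y >= u                              # subjects with Y_i(u) = 1
  Xr <- X[at_risk, , drop = FALSE]
  Z  <- Xr %*% B                                 # d-dimensional index B^T X_i
  D  <- sweep(Z, 2, z) / h                       # scaled differences from z
  w  <- exp(-0.5 * rowSums(D^2))                 # Gaussian product kernel weights
  colSums(w * Xr) / sum(w)                       # weighted mean of X, length p
}

set.seed(4)
n <- 200; p <- 5; d <- 2
X <- matrix(rnorm(n * p), n, p)
Y <- rexp(n)
B <- qr.Q(qr(matrix(rnorm(p * d), p, d)))
cond_mean_at_risk(X, Y, B, u = 0.5, z = c(0, 0), h = 0.5)
```

With a very large bandwidth every at-risk subject gets the same weight, so the estimate collapses to the plain risk-set mean $E\{X\mid Y(u)=1\}$.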
Others
- Asymptotic normality.
- A computationally efficient approach. Optimization approach on the Stiefel manifold.
- Numerical studies & real data analysis (The Cancer Genome Atlas)
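Optimizing over the Stiefel manifold means every iterate $B$ must keep orthonormal columns, $B^TB=I_d$. One standard way to achieve this is a Cayley-transform update; the base R sketch below shows a single such step. Whether the package uses exactly this update is not stated in these notes, so treat it as an illustration, with `cayley_step`, the step size `tau`, and the random matrix standing in for a gradient all being assumptions.

```r
# One Cayley-transform step: moves B in a descent direction built from G
# while keeping t(B) %*% B equal to the identity.
cayley_step <- function(B, G, tau) {
  p <- nrow(B)
  W <- G %*% t(B) - B %*% t(G)                 # p x p skew-symmetric matrix
  I <- diag(p)
  solve(I + (tau / 2) * W, (I - (tau / 2) * W) %*% B)
}

set.seed(2)
p <- 5; d <- 2
B0 <- qr.Q(qr(matrix(rnorm(p * d), p, d)))     # orthonormal starting point
G  <- matrix(rnorm(p * d), p, d)               # stand-in for the objective's gradient
B1 <- cayley_step(B0, G, tau = 0.1)

max(abs(crossprod(B1) - diag(d)))              # orthonormality is preserved up to rounding
```

Because the update matrix is orthogonal whenever `W` is skew-symmetric, no re-orthogonalization is needed after the step.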
Approaches
forward regression approach
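Set $\alpha(u,X)=X$ in (3); then the population version of the $p$-dimensional estimating equations is given by
$$E\left(\int[X-E\{X\mid Y(u)=1,B^TX\}]\,dN(u)\right)=0.$$
The compensator term $\lambda(u\mid B^TX)Y(u)\,du$ drops out because it depends only on $Y(u)$ and $B^TX$, given which $X-E\{X\mid Y(u)=1,B^TX\}$ has mean zero on the at-risk set.
semiparametric inverse regression approach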
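Set
$$\alpha(u,X)-\alpha^*(u,B^TX)=[X-E\{X\mid Y(u)=1,B^TX\}]\,\varphi^T(u),$$
then the left-hand side of (3) becomes
$$E\left(\int[X-E\{X\mid Y(u)=1,B^TX\}]\,\varphi^T(u)\,dM(u)\right)$$
counting process inverse regression approach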
Replacing $dM(u)$ with $dN(u)$ gives
$$E\left(\int[X-E\{X\mid Y(u)=1,B^TX\}]\,\varphi^T(u)\,dN(u)\right)$$
computationally efficient approach
$$E\left(\int[X-E\{X\mid Y(u)=1\}]\,\varphi^T(u)\,dN(u)\right)$$
Simulation
Use the authors' R package, orthoDr, to reproduce the simulation of setting 1. See simulation.R for the complete source code.
library(orthoDr)
# forward regression
orthoDr_surv(res1$X, pmin(res1$T, res1$C), res1$T < res1$C, method = "forward")
# semiparametric inverse regression (martingale-based, uses the censoring indicator)
orthoDr_surv(res1$X, pmin(res1$T, res1$C), res1$T < res1$C, ndr = 2, method = "dm")
# counting process SIR
orthoDr_surv(res1$X, pmin(res1$T, res1$C), res1$T < res1$C, method = "dn")
# computationally efficient
CP_SIR(res1$X, pmin(res1$T, res1$C), res1$T < res1$C)
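As a first step toward understanding what the last approach computes, here is a base R sketch of the sample analogue of $E(\int[X-E\{X\mid Y(u)=1\}]\varphi^T(u)\,dN(u))$. It is not the `CP_SIR` implementation: the function name `cp_moment`, the choice $\varphi(u)=(1,u)^T$, the use of the leading singular vector, and the toy Cox-type data are all my own assumptions.

```r
# Sample version of E( int [X - E{X | Y(u) = 1}] phi(u)^T dN(u) ), a p x k matrix.
cp_moment <- function(X, Y, delta, phi) {
  X <- as.matrix(X)
  k <- length(phi(Y[1]))
  M <- matrix(0, ncol(X), k)
  for (i in which(delta == 1)) {                 # dN_i jumps only at observed failures
    at_risk <- Y >= Y[i]                         # risk set at time Y_i
    xbar <- colMeans(X[at_risk, , drop = FALSE]) # empirical E{X | Y(u) = 1}
    M <- M + outer(X[i, ] - xbar, phi(Y[i]))     # p x k contribution of subject i
  }
  M / nrow(X)
}

# Toy data: hazard exp(beta^T X), independent exponential censoring
set.seed(3)
n <- 2000; p <- 6
beta <- c(1, -1, 0, 0, 0, 0) / sqrt(2)
X <- matrix(rnorm(n * p), n, p)
T_fail <- rexp(n, rate = as.vector(exp(X %*% beta)))
C_cens <- rexp(n, rate = 0.5)
Y <- pmin(T_fail, C_cens)
delta <- as.integer(T_fail <= C_cens)

M <- cp_moment(X, Y, delta, phi = function(u) c(1, u))
b_hat <- svd(M)$u[, 1]                           # leading left singular vector
abs(sum(b_hat * beta))                           # absolute cosine with the true direction
```

No censoring distribution is estimated and no optimization over $B$ is needed, since $B$ does not appear inside the expectation; in this single-index toy setting the leading direction should roughly align with `beta`.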
If possible, I will rewrite the source code of this package in Julia to get a better understanding of the algorithms.