WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Model-Free Scoring System for Risk Prediction

Tags: Biostatistics

Method

Notation

  • $T$: survival time
  • $\mathbf X$: $p$-vector of covariates
  • $S(\mathbf X)$: scoring system, where higher scores imply higher risk levels and shorter survival time
  • $C$: censoring time, independent of $T$ conditional on $\mathbf X$
  • $(Z_i, \delta_i, \mathbf X_i),\ i=1,2,\ldots,n$: observed data, i.i.d. copies of $(Z,\delta,\mathbf X)$, where $Z=\min(T,C)$ and $\delta=I(T\le C)$ indicates an observed failure; $Z$ is allowed to depend on the covariates $\mathbf X$.
  • $\cal R(t) = \{j: Z_j>t\}$: the risk set at time $t$
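
The incident true positive rate and the dynamic false positive rate of the score at threshold $c$ are

\[TP_t^{I}(c)=P\{S(\mathbf X)>c\mid T=t\}, \; FP_t^D(c)=P\{S(\mathbf X)>c\mid T>t\}\]

The time-dependent ROC curve at time $t$ plots $TP_t^I(c)$ against $FP_t^D(c)$ over all thresholds $c$, and the area under it is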

\[AUC(t)=P\{S(\mathbf X_i)>S(\mathbf X_j)\mid T_i=t, T_j>t\}\]
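
For reference, the link between the two rates and the area can be written out under the usual incident/dynamic convention. The symbol $ROC_t(p)$ is introduced here only for illustration and does not appear elsewhere in these notes; the identity with the probability statement above assumes continuous scores (no ties).

\[ROC_t(p)=TP_t^{I}\{[FP_t^{D}]^{-1}(p)\},\quad p\in[0,1],\qquad AUC(t)=\int_0^1 ROC_t(p)\,dp\]

where $[FP_t^{D}]^{-1}(p)=\inf\{c: FP_t^{D}(c)\le p\}$.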

Estimation

Consider a linear scoring system

\[S(\mathbf X_i;\mathbf \beta)=\beta_1X_{i1}+\cdots + \beta_pX_{ip}\]

Let $t_1<\ldots <t_M$ be the ordered unique failure times among $\{Z_1,\ldots, Z_n\}$, with $M\le n$.

At each time point $t_m$, let $i$ denote the subject who fails at $t_m$. The subjects in the risk set $\cal R(t_m)$ can then be divided into two groups:

  • $\cal R^L(t_m)$, the set of patients with relatively lower risk, whose score values are lower than $S(\mathbf X_i;\mathbf \beta)$
  • $\cal R^H(t_m)$, the set of patients with relatively higher risk compared with subject $i$.

Use the proportion of low-risk patients in the risk set, $\frac{\vert \cal R^L(t_m)\vert}{\vert \cal R(t_m)\vert}$, as an estimator of $AUC(t_m)$.
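
A minimal Python sketch of this estimator on toy data. The function name `auc_hat` and the data are hypothetical, failure times are assumed to have no ties, and failure times with an empty risk set are skipped because the ratio is undefined there.

```python
def auc_hat(z, delta, score):
    """Return (failure times, estimated AUC at each failure time).

    For the subject i failing at time t, the estimate is
    |R^L(t)| / |R(t)| with R(t) = {j : z[j] > t} and
    R^L(t) = {j in R(t) : score[j] < score[i]}.
    """
    times, aucs = [], []
    # Visit subjects in order of observed time.
    for i in sorted(range(len(z)), key=lambda k: z[k]):
        if delta[i] == 0:
            continue  # censored: contributes only through risk sets
        risk = [j for j in range(len(z)) if z[j] > z[i]]
        if not risk:
            continue  # empty risk set: ratio undefined, skip
        low = [j for j in risk if score[j] < score[i]]
        times.append(z[i])
        aucs.append(len(low) / len(risk))
    return times, aucs


# Toy data (hypothetical): observed times, event indicators, scores.
z = [2.0, 5.0, 3.0, 8.0, 6.0]
delta = [1, 0, 1, 1, 1]
score = [3.0, 2.8, 2.5, 0.2, 0.4]

times, aucs = auc_hat(z, delta, score)
for t, a in zip(times, aucs):
    print(f"t = {t}: AUC estimate = {a:.3f}")
```

At the second failure time the censored subject with score 2.8 is still at risk and outranks the failing subject, so the estimate drops to 2/3 there.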

And construct the following pseudo-likelihood function

\[L(\mathbf \beta)=\prod\limits_{m=1}^M\widehat{AUC}(t_m)\]

Then estimate $\mathbf \beta$ by maximizing the log-pseudo-likelihood function.
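
As a rough illustration, the log-pseudo-likelihood can be evaluated directly for a linear score and compared across candidate coefficients. Everything named here (`linear_score`, `log_pseudo_likelihood`, the toy data, the small grid) is hypothetical, and fixing the first coefficient at 1 is an assumption made for identifiability, since rescaling $\mathbf \beta$ by a positive constant does not change the ranking of scores.

```python
import math


def linear_score(x, beta):
    """Linear score beta_1 * x_1 + ... + beta_p * x_p."""
    return sum(b * v for b, v in zip(beta, x))


def log_pseudo_likelihood(beta, z, delta, X):
    """Sum over failure times of log(|R^L(t)| / |R(t)|)."""
    s = [linear_score(x, beta) for x in X]
    total = 0.0
    for i in range(len(z)):
        if delta[i] == 0:
            continue  # censored subjects are not failure times
        risk = [j for j in range(len(z)) if z[j] > z[i]]
        if not risk:
            continue  # empty risk set: no contribution
        n_low = sum(1 for j in risk if s[j] < s[i])
        if n_low == 0:
            return -math.inf  # an estimated AUC of 0 gives log(0)
        total += math.log(n_low / len(risk))
    return total


# Toy data (hypothetical): two covariates per subject.
X = [(2.0, 0.5), (1.0, 1.5), (1.5, -0.5), (0.2, 0.1), (0.5, 0.3)]
Z = [2.0, 5.0, 3.0, 8.0, 6.0]
DELTA = [1, 0, 1, 1, 1]

ll_10 = log_pseudo_likelihood((1.0, 0.0), Z, DELTA, X)
ll_11 = log_pseudo_likelihood((1.0, 1.0), Z, DELTA, X)

# Crude grid search over beta_2 with beta_1 fixed at 1 (assumption).
grid = [0.0, 0.5, 1.0]
best_b2 = max(grid, key=lambda b: log_pseudo_likelihood((1.0, b), Z, DELTA, X))
print(ll_10, ll_11, best_b2)
```

The objective is piecewise constant in $\mathbf \beta$ because it depends on the scores only through their ranks, which is why a grid is used here instead of a gradient-based optimizer.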

Because the indicator functions make the log-pseudo-likelihood non-smooth and hard to optimize, replace each indicator with a smooth kernel approximation. The resulting smoothed log-pseudo-likelihood is denoted $\ell_n^s(\mathbf \beta)$.
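
One way to sketch the smoothing step: replace the indicator $I\{S_j<S_i\}$ with a logistic sigmoid of $(S_i-S_j)/h$. The choice of the logistic kernel, the bandwidth name `h`, and the function names are assumptions for illustration; the notes do not say which kernel is used.

```python
import math


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def smooth_log_pseudo_likelihood(beta, z, delta, X, h):
    """Log-pseudo-likelihood with I{s_j < s_i} replaced by
    sigmoid((s_i - s_j) / h); a smaller h is closer to the indicator."""
    s = [sum(b * v for b, v in zip(beta, x)) for x in X]
    total = 0.0
    for i in range(len(z)):
        if delta[i] == 0:
            continue  # censored subjects are not failure times
        risk = [j for j in range(len(z)) if z[j] > z[i]]
        if not risk:
            continue  # empty risk set: no contribution
        soft_low = sum(sigmoid((s[i] - s[j]) / h) for j in risk)
        if soft_low <= 0.0:
            return -math.inf  # guard against underflow to log(0)
        total += math.log(soft_low / len(risk))
    return total


# Toy data (hypothetical).
X = [(2.0, 0.5), (1.0, 1.5), (1.5, -0.5), (0.2, 0.1), (0.5, 0.3)]
Z = [2.0, 5.0, 3.0, 8.0, 6.0]
DELTA = [1, 0, 1, 1, 1]
beta = (1.0, 0.5)

sharp = smooth_log_pseudo_likelihood(beta, Z, DELTA, X, h=1e-3)
soft = smooth_log_pseudo_likelihood(beta, Z, DELTA, X, h=0.5)
print(sharp, soft)
```

With a very small bandwidth and no tied scores, the smoothed value agrees with the indicator-based one; a larger bandwidth trades that fidelity for a differentiable objective.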

Variable Selection

Maximize the following penalized objective function,

\[Q_n(\mathbf \beta)=\ell_n^s(\mathbf \beta)-\lambda_n\sum\limits_{j=2}^pJ(\vert\beta_j\vert)\]

where $\lambda_n$ is a tuning parameter and $J(\cdot)$ is a penalty function, here the adaptive LASSO penalty. The sum starts at $j=2$, so $\beta_1$ is not penalized.

The optimization is carried out with a coordinate descent algorithm.
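
The following is a simplified sketch of a coordinate-wise scheme for $Q_n$, not the paper's algorithm. It assumes a logistic smoothing kernel with bandwidth `h`, takes the adaptive LASSO penalty as $J(\vert\beta_j\vert)=\vert\beta_j\vert/\vert\tilde\beta_j\vert$ for a nonzero initial estimate $\tilde{\mathbf \beta}$, holds $\beta_1$ fixed, and replaces each one-dimensional update with a grid search. All function names and data are hypothetical.

```python
import math


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def smooth_loglik(beta, z, delta, X, h):
    """Smoothed log-pseudo-likelihood (logistic kernel, an assumption)."""
    s = [sum(b * v for b, v in zip(beta, x)) for x in X]
    total = 0.0
    for i in range(len(z)):
        if delta[i] == 0:
            continue
        risk = [j for j in range(len(z)) if z[j] > z[i]]
        if not risk:
            continue
        soft_low = sum(sigmoid((s[i] - s[j]) / h) for j in risk)
        if soft_low <= 0.0:
            return -math.inf
        total += math.log(soft_low / len(risk))
    return total


def penalized_objective(beta, beta_init, lam, z, delta, X, h):
    """Q_n = smoothed log-likelihood - lam * sum_{j>=2} |b_j| / |b_init_j|.
    Requires nonzero entries in beta_init[1:]."""
    penalty = sum(abs(b) / abs(b0) for b, b0 in zip(beta[1:], beta_init[1:]))
    return smooth_loglik(beta, z, delta, X, h) - lam * penalty


def coordinate_ascent(beta_init, lam, z, delta, X, h, grid, n_sweeps=5):
    """Cycle over coordinates j >= 2, maximizing Q_n over a grid in each."""
    beta = list(beta_init)
    for _ in range(n_sweeps):
        for j in range(1, len(beta)):  # beta[0] is held fixed
            best_val, best_b = None, beta[j]
            # Include the current value so the objective never decreases.
            for b in list(grid) + [beta[j]]:
                trial = beta[:j] + [b] + beta[j + 1:]
                val = penalized_objective(trial, beta_init, lam, z, delta, X, h)
                if best_val is None or val > best_val:
                    best_val, best_b = val, b
            beta[j] = best_b
    return beta


# Toy data (hypothetical).
X = [(2.0, 0.5), (1.0, 1.5), (1.5, -0.5), (0.2, 0.1), (0.5, 0.3)]
Z = [2.0, 5.0, 3.0, 8.0, 6.0]
DELTA = [1, 0, 1, 1, 1]
init = [1.0, 0.5]
grid = [k * 0.25 for k in range(-8, 9)]

fit_heavy = coordinate_ascent(init, 100.0, Z, DELTA, X, 0.2, grid)
fit_free = coordinate_ascent(init, 0.0, Z, DELTA, X, 0.2, grid)
print(fit_heavy, fit_free)
```

With a large $\lambda_n$ the penalized coefficient is driven exactly to zero, which is the variable-selection behavior the penalty is meant to produce; with $\lambda_n=0$ the scheme reduces to maximizing the smoothed log-pseudo-likelihood alone.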

Asymptotic Results

Simulation

Score System Without Variable Selection

case 1 - case 5

Variable Selection Examples

case 6 - case 9

References

Shen W, Ning J, Yuan Y, Lok AS, Feng Z. Model‐free scoring system for risk prediction with application to hepatocellular carcinoma study. Biometrics. 2017 Jul 25.

