WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Model-Free Scoring System for Risk Prediction

Tags: Biostatistics



  • $T$: survival time
  • $\mathbf X$: $p$-vector of covariates
  • $S(\mathbf X)$: scoring system, where higher scores imply higher risk levels and shorter survival time
  • $C$: censoring time, independent of $T$ conditional on $\mathbf X$
  • $(Z_i, \delta_i, \mathbf X_i), i=1,2,\ldots, n$: observed data, $\overset{iid}{\sim} (Z,\delta, \mathbf X)$, where $Z=\min(T,C)$ and $\delta=I(T\le C)$; the censoring time $C$ is allowed to depend on the covariates $\mathbf X$
  • $\mathcal R(t) = \{j:Z_j>t\}$: the risk set
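
For a threshold $c$, define the incident true positive rate and the dynamic false positive rate at time $t$ as

\[TP_t^{I}(c)=P\{S(\mathbf X)>c\mid T=t\}, \; FP_t^D(c)=P\{S(\mathbf X)>c\mid T>t\}\]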

The time-dependent ROC curve plots $TP_t^I(c)$ against $FP_t^D(c)$ as $c$ varies, and the area under it is

\[AUC(t)=P\{S(\mathbf X_i)>S(\mathbf X_j)\mid T_i=t, T_j>t\}\]


Consider a linear scoring system

\[S(\mathbf X_i;\mathbf \beta)=\beta_1X_{i1}+\cdots + \beta_pX_{ip}\]

Let $t_1<\cdots <t_M$ be the ordered unique failure times among $\{Z_1,\ldots, Z_n\}$, with $M\le n$.

At each failure time $t_m$, let $i$ denote the subject who fails at $t_m$. The subjects in the risk set $\mathcal R(t_m)$ can then be divided into two groups:

  • $\mathcal R^L(t_m)$, the set of patients with relatively lower risk, whose scores are lower than $S(\mathbf X_i;\mathbf \beta)$
  • $\mathcal R^H(t_m)$, the set of patients with relatively higher risk compared with subject $i$.

Use the proportion of low-risk patients in the risk set, $\frac{\vert \mathcal R^L(t_m)\vert}{\vert \mathcal R(t_m)\vert}$, as an estimator of $AUC(t_m)$.
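The risk-set proportion can be computed directly from the observed data. The following is a minimal Python sketch, not the authors' implementation: the function name `auc_hat` and its arguments are made up for illustration, and it assumes there are no tied failure times, so that exactly one subject fails at each failure time.

```python
import numpy as np

def auc_hat(scores, z, delta):
    """Return the unique failure times and |R^L(t)| / |R(t)| at each of them.

    scores : S(X_i; beta) for every subject
    z      : observed times Z_i = min(T_i, C_i)
    delta  : event indicators (1 = failure observed, 0 = censored)
    """
    scores = np.asarray(scores, dtype=float)
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=bool)

    times = np.unique(z[delta])  # ordered unique failure times
    estimates = []
    for t in times:
        # Subject failing at t (assumption: no ties, so take the first match).
        case = np.flatnonzero(delta & (z == t))[0]
        risk = z > t  # risk set R(t) = {j : Z_j > t}
        n_risk = risk.sum()
        if n_risk == 0:
            estimates.append(np.nan)  # nobody left to compare against
            continue
        low = risk & (scores < scores[case])  # R^L(t): lower score than the case
        estimates.append(low.sum() / n_risk)
    return times, np.array(estimates)
```

At the last failure time the risk set may be empty, in which case the sketch returns `nan` rather than a proportion.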

Then construct the following pseudo-likelihood function

\[L(\mathbf \beta)=\prod\limits_{m=1}^M\widehat{AUC}(t_m)\]

Then estimate $\mathbf \beta$ by maximizing the log-pseudo-likelihood function.

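The indicator functions inside the estimator make this objective non-smooth and hard to optimize. To ease the computation, replace each indicator with a smooth kernel approximation, which gives a smoothed log-pseudo-likelihood $\ell_n^s(\mathbf \beta)$.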
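As an illustration of the smoothing step, here is a small Python sketch. It is only a sketch under assumptions: these notes do not say which kernel is used, so the logistic sigmoid with bandwidth `h` is a stand-in, and the name `smoothed_log_pl` is hypothetical. The loop runs over observed failures, which matches the product over unique failure times when there are no ties.

```python
import numpy as np

def smoothed_log_pl(beta, X, z, delta, h=0.1):
    """Smoothed log-pseudo-likelihood for a linear score S(X; beta) = X @ beta.

    Each indicator I(S_j < S_i) is replaced by sigmoid((S_i - S_j) / h),
    an assumed smoothing kernel with bandwidth h.
    """
    X = np.asarray(X, dtype=float)
    z = np.asarray(z, dtype=float)
    s = X @ np.asarray(beta, dtype=float)

    total = 0.0
    for i in np.flatnonzero(np.asarray(delta, dtype=bool)):
        risk = z > z[i]  # subjects still at risk after subject i fails
        if not risk.any():
            continue  # an empty risk set contributes nothing
        # Clip the argument so np.exp cannot overflow.
        arg = np.clip((s[i] - s[risk]) / h, -50.0, 50.0)
        smooth_indicator = 1.0 / (1.0 + np.exp(-arg))
        # Smoothed version of |R^L| / |R| at this failure time.
        total += np.log(smooth_indicator.mean())
    return total
```

A smaller `h` tracks the indicator more closely but makes the surface steeper; the value `0.1` is arbitrary and would need tuning in practice.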

Variable Selection

Maximize the following penalized objective function,

\[Q_n(\mathbf \beta)=\ell_n^s(\mathbf \beta)-\lambda_n\sum\limits_{j=2}^pJ(\vert\beta_j\vert)\]

where $\lambda_n$ is a tuning parameter and $J(\cdot)$ is a penalty function; here the adaptive LASSO penalty is used.

Solve the optimization with a coordinate descent algorithm.
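To make the coordinate-wise idea concrete, here is a rough Python sketch. It is not the algorithm from the paper: each coordinate update is a crude grid search, the adaptive LASSO weights use the common form $1/\vert\tilde\beta_j\vert^\gamma$ built from an initial estimate `beta_init` (assumed to have no zero entries), and holding $\beta_1$ at its initial value is my assumption; the penalty sum starting at $j=2$ only shows that $\beta_1$ is unpenalized. The names `penalized_objective` and `coordinate_search` are hypothetical, and `loglik` is any callable that returns the smoothed log-pseudo-likelihood.

```python
import numpy as np

def penalized_objective(beta, loglik, lam, beta_init, gamma=1.0):
    """Q_n(beta) = loglik(beta) - lam * sum_{j>=2} w_j * |beta_j|."""
    beta = np.asarray(beta, dtype=float)
    init = np.asarray(beta_init, dtype=float)
    # Adaptive LASSO weights (assumes every initial coefficient is nonzero).
    weights = 1.0 / np.abs(init[1:]) ** gamma
    return loglik(beta) - lam * np.sum(weights * np.abs(beta[1:]))

def coordinate_search(loglik, beta_init, lam, grid, n_sweeps=20):
    """Maximize Q_n one coordinate at a time over a grid of candidate values."""
    beta = np.asarray(beta_init, dtype=float).copy()
    # Always offer exactly 0 so a coefficient can be dropped from the score.
    candidates = np.append(np.asarray(grid, dtype=float), 0.0)
    for _ in range(n_sweeps):
        previous = beta.copy()
        for j in range(1, beta.size):  # beta[0] is held fixed (assumption)
            values = []
            for b in candidates:
                beta[j] = b
                values.append(penalized_objective(beta, loglik, lam, beta_init))
            beta[j] = candidates[int(np.argmax(values))]
        if np.array_equal(beta, previous):
            break  # a full sweep changed nothing
    return beta
```

Coefficients with a small initial estimate receive a large weight and tend to be set exactly to zero, which is how the penalty performs variable selection.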

Asymptotic Results


Score System Without Variable Selection

case 1 - case 5

Variable Selection Examples

case 6 - case 9


Shen W, Ning J, Yuan Y, Lok AS, Feng Z. Model‐free scoring system for risk prediction with application to hepatocellular carcinoma study. Biometrics. 2017 Jul 25.

Published in categories Note