# Model-Free Scoring System for Risk Prediction

##### Posted on Oct 17, 2017
Tags: Biostatistics

## Method

### Notation

• $T$: survival time
• $\mathbf X$: $p$-vector of covariates
• $S(\mathbf X)$: scoring system, where higher scores imply higher risk levels and shorter survival time
• $C$: censoring time, independent of $T$ conditional on $\mathbf X$
• $(Z_i, \delta_i, \mathbf X_i), i=1,2,\ldots, n$: observed data, $\overset{iid}{\sim} (Z,\delta, \mathbf X)$, where $Z=\min(T,C)$ and $\delta=I(T\le C)$ is the event indicator; the censoring time $C$ is allowed to depend on the covariates $\mathbf X$.
• $\mathcal R(t) = \{j: Z_j>t\}$: the risk set at time $t$

For a threshold $c$, define the incident true-positive rate and the dynamic false-positive rate at time $t$ as

$TP_t^{I}(c)=P\{S(\mathbf X)>c\mid T=t\}, \quad FP_t^D(c)=P\{S(\mathbf X)>c\mid T>t\}.$

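The time-dependent ROC curve is $ROC_t(p)=TP_t^I\{(FP_t^D)^{-1}(p)\}$ for $p\in[0,1]$, and the area under it is

$AUC(t)=\int_0^1 ROC_t(p)\,dp=P\{S(\mathbf X_i)>S(\mathbf X_j)\mid T_i=t,\ T_j>t\},$

that is, the probability that a subject who fails at time $t$ has a higher score than a subject who is still event-free at $t$.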
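To make the definition concrete, the minimal Python sketch below evaluates $AUC(t)$ empirically in the idealized case where survival times are fully observed (no censoring): cases are the subjects with $T_i=t$ and controls are those with $T_j>t$. The function name `empirical_auc` and the toy data are hypothetical.

```python
import numpy as np

def empirical_auc(t, scores, T):
    """Share of (case, control) pairs with S(X_i) > S(X_j), where cases
    fail exactly at time t and controls are still event-free after t."""
    scores = np.asarray(scores, dtype=float)
    T = np.asarray(T, dtype=float)
    cases = scores[T == t]       # subjects with T_i = t
    controls = scores[T > t]     # subjects with T_j > t
    if cases.size == 0 or controls.size == 0:
        return float("nan")      # AUC(t) is undefined without cases or controls
    # Compare every case with every control through broadcasting
    return float(np.mean(cases[:, None] > controls[None, :]))

# Toy, fully observed data: larger scores should go with earlier failure
T = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([0.9, 0.4, 0.7, 0.2, 0.1])
print(empirical_auc(2.0, scores, T))  # case 0.4 beats controls 0.2 and 0.1 but not 0.7
```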

### Estimation

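Consider a linear scoring system

$S(\mathbf X_i;\mathbf \beta)=\beta_1X_{i1}+\cdots + \beta_pX_{ip}.$

Since $AUC(t)$ depends on the scores only through their ordering, $\mathbf\beta$ is identifiable only up to a positive scale factor, so some normalization is needed, e.g. fixing $\beta_1=1$; the penalty in the variable-selection step, which leaves $\beta_1$ unpenalized, is consistent with this choice.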
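Because $AUC(t)$ depends on the scores only through pairwise comparisons, multiplying $\mathbf\beta$ by a positive constant leaves every comparison unchanged. The short Python check below, on made-up data with hypothetical helper names, illustrates this scale invariance, which is why $\mathbf\beta$ can only be estimated up to a normalization.

```python
import numpy as np

def linear_score(X, beta):
    """S(X_i; beta) = beta_1 X_i1 + ... + beta_p X_ip for every row of X."""
    return np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)

def pairwise_order(scores):
    """Matrix of indicators I{S_i > S_j}; AUC-type criteria depend only on this."""
    return scores[:, None] > scores[None, :]

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
beta = np.array([1.0, -0.5, 2.0])

same = np.array_equal(pairwise_order(linear_score(X, beta)),
                      pairwise_order(linear_score(X, 2.0 * beta)))
print(same)  # True: a positive rescaling does not change any comparison
```

The factor 2 is used so that the rescaled scores are exact in floating point; mathematically any positive factor works.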

Let $t_1<\cdots<t_M$ be the ordered distinct failure times among $\{Z_1,\ldots,Z_n\}$, that is, among the $Z_i$ with $\delta_i=1$, where $M\le n$.

At each failure time $t_m$, let subject $i$ be the one who fails at $t_m$. The subjects in the risk set $\mathcal R(t_m)$ can then be divided into two groups:

• $\mathcal R^L(t_m)=\{j\in\mathcal R(t_m): S(\mathbf X_j;\mathbf \beta)<S(\mathbf X_i;\mathbf \beta)\}$, the patients with relatively lower risk, whose scores are lower than that of subject $i$;
• $\mathcal R^H(t_m)=\mathcal R(t_m)\setminus\mathcal R^L(t_m)$, the patients with relatively higher risk compared with subject $i$.

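The proportion of low-risk patients in the risk set, $\frac{\vert \mathcal R^L(t_m)\vert}{\vert \mathcal R(t_m)\vert}$, is used as the estimator $\widehat{AUC}(t_m)$ of $AUC(t_m)$: it is the empirical probability that the subject failing at $t_m$ has a higher score than a subject still at risk after $t_m$.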
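The grouping and the ratio above can be sketched in Python for fixed scores and observed data $(Z_i,\delta_i)$. This is an illustration, not the authors' code: the name `auc_hat_at_failures` is hypothetical, tied failure times are assumed absent (so exactly one subject fails at each $t_m$), and failure times whose risk set $\{j: Z_j>t_m\}$ is empty are skipped because the ratio is undefined there.

```python
import numpy as np

def auc_hat_at_failures(scores, Z, delta):
    """Return (t_m, |R^L(t_m)| / |R(t_m)|) for each observed failure time t_m.

    Assumes no tied failure times, so exactly one subject i fails at t_m.
    """
    scores = np.asarray(scores, dtype=float)
    Z = np.asarray(Z, dtype=float)
    delta = np.asarray(delta, dtype=int)
    out = []
    # Indices of observed failures, in increasing order of failure time
    for i in np.where(delta == 1)[0][np.argsort(Z[delta == 1])]:
        t_m = Z[i]
        risk_set = Z > t_m                       # R(t_m) = {j : Z_j > t_m}
        if not risk_set.any():
            continue                             # ratio undefined: empty risk set
        lower = risk_set & (scores < scores[i])  # R^L(t_m): lower score than subject i
        out.append((float(t_m), float(lower.sum() / risk_set.sum())))
    return out

# Toy data: the third subject (Z = 3) is censored, so it is never a "case"
Z = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
delta = np.array([1, 1, 0, 1, 1])
scores = np.array([0.9, 0.4, 0.7, 0.2, 0.1])
print(auc_hat_at_failures(scores, Z, delta))
```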

This leads to the pseudo-likelihood function

$L(\mathbf \beta)=\prod\limits_{m=1}^M\widehat{AUC}(t_m)$

$\mathbf \beta$ is then estimated by maximizing the log-pseudo-likelihood $\ell_n(\mathbf \beta)=\log L(\mathbf \beta)=\sum_{m=1}^M\log\widehat{AUC}(t_m)$.
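For the linear score, the log-pseudo-likelihood is $\ell_n(\mathbf\beta)=\sum_{m=1}^M\log\{\vert\mathcal R^L(t_m)\vert/\vert\mathcal R(t_m)\vert\}$. A minimal Python sketch follows, assuming no tied failure times and skipping failure times with an empty risk set; the function name is hypothetical. It also shows why direct maximization is awkward: the criterion only changes when two scores swap order, and it equals $-\infty$ as soon as some $\vert\mathcal R^L(t_m)\vert=0$.

```python
import numpy as np

def log_pseudo_likelihood(beta, X, Z, delta):
    """Unsmoothed l_n(beta) = sum_m log(|R^L(t_m)| / |R(t_m)|), with S(X; beta) = X @ beta.

    Assumes no tied failure times; failure times with an empty risk set are skipped.
    """
    scores = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    Z = np.asarray(Z, dtype=float)
    total = 0.0
    for i in np.flatnonzero(np.asarray(delta) == 1):  # each observed failure t_m = Z_i
        risk_set = Z > Z[i]                            # R(t_m) = {j : Z_j > t_m}
        if not risk_set.any():
            continue
        n_lower = np.count_nonzero(risk_set & (scores < scores[i]))  # |R^L(t_m)|
        total += np.log(n_lower / risk_set.sum()) if n_lower > 0 else -np.inf
    return total

# Toy data with one covariate; the third subject (Z = 3) is censored
X = np.array([[0.9], [0.4], [0.7], [0.2], [0.1]])
Z = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
delta = np.array([1, 1, 0, 1, 1])
print(log_pseudo_likelihood([1.0], X, Z, delta))   # log(1) + log(2/3) + log(1)
print(log_pseudo_likelihood([-1.0], X, Z, delta))  # -inf: reversed ranking
```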

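Because $\vert\mathcal R^L(t_m)\vert=\sum_{j\in\mathcal R(t_m)}I\{S(\mathbf X_j;\mathbf\beta)<S(\mathbf X_i;\mathbf\beta)\}$ is a sum of indicator functions, the log-pseudo-likelihood is a step function of $\mathbf\beta$ and is hard to maximize directly. For computation, the indicator is therefore approximated by a smooth kernel function, which yields a smoothed log-pseudo-likelihood $\ell_n^s(\mathbf\beta)$.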
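One way to implement the smoothed criterion is sketched below in Python. Several choices here are assumptions rather than details from the source: the indicator $I\{S(\mathbf X_j;\mathbf\beta)<S(\mathbf X_i;\mathbf\beta)\}$ is replaced by the logistic function `expit((S_i - S_j) / h)` with a hypothetical bandwidth `h=0.1`, $\beta_1$ is fixed at 1 for identifiability, the maximization uses SciPy's general-purpose Nelder-Mead optimizer, and the data are simulated. The helper names `smoothed_log_pl` and `fit_beta` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def smoothed_log_pl(beta, X, Z, delta, h=0.1):
    """Smoothed l_n^s(beta): each indicator I{S_j < S_i} over the risk set is
    replaced by the logistic function expit((S_i - S_j) / h) (assumed smoother)."""
    scores = X @ beta
    total = 0.0
    for i in np.flatnonzero(delta == 1):
        risk_set = Z > Z[i]                 # R(t_m) = {j : Z_j > t_m}
        if not risk_set.any():
            continue                        # skip failure times with an empty risk set
        smooth_lower = expit((scores[i] - scores[risk_set]) / h)
        total += np.log(np.mean(smooth_lower))
    return total

def fit_beta(X, Z, delta, h=0.1):
    """Maximize l_n^s over (beta_2, ..., beta_p), holding beta_1 = 1 (assumed normalization)."""
    def neg_objective(rest):
        return -smoothed_log_pl(np.r_[1.0, rest], X, Z, delta, h)
    res = minimize(neg_objective, x0=np.zeros(X.shape[1] - 1), method="Nelder-Mead")
    return np.r_[1.0, res.x]

# Simulated data (made up): larger scores mean shorter survival
rng = np.random.default_rng(1)
n, beta_true = 200, np.array([1.0, 0.5, -0.5])
X = rng.normal(size=(n, 3))
T = np.exp(-X @ beta_true + 0.5 * rng.normal(size=n))
C = rng.uniform(0.0, 3.0, size=n)          # censoring drawn independently of T and X
Z, delta = np.minimum(T, C), (T <= C).astype(int)
beta_hat = fit_beta(X, Z, delta)
print(beta_hat)
```

A smaller `h` tracks the indicator more closely but makes the objective rougher; the bandwidth rule used in the paper may differ.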

### Variable Selection

Variables are selected by maximizing the penalized objective function

$Q_n(\mathbf \beta)=\ell_n^s(\mathbf \beta)-\lambda_n\sum\limits_{j=2}^pJ(\vert\beta_j\vert)$

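where $\ell_n^s(\mathbf\beta)$ is the smoothed log-pseudo-likelihood, $\lambda_n$ is a tuning parameter, and $J(\cdot)$ is a penalty function, here the adaptive LASSO penalty, which weights each $\vert\beta_j\vert$ inversely to the magnitude of an initial estimate of $\beta_j$ so that small coefficients are penalized more heavily. The sum starts at $j=2$, so $\beta_1$ is not penalized.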
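A minimal Python sketch of the penalized objective, assuming the common adaptive LASSO form $J(\vert\beta_j\vert)=w_j\vert\beta_j\vert$ with weights $w_j=1/\vert\tilde\beta_j\vert$ from an initial estimate $\tilde{\mathbf\beta}$. The toy concave `loglik` only stands in for $\ell_n^s$, and all names are hypothetical.

```python
import numpy as np

def adaptive_lasso_weights(beta_init, eps=1e-8):
    """Adaptive LASSO weights w_j = 1 / |beta_init_j|; eps avoids division by zero."""
    return 1.0 / np.maximum(np.abs(np.asarray(beta_init, dtype=float)), eps)

def penalized_objective(beta, loglik, lam, weights):
    """Q_n(beta) = loglik(beta) - lam * sum_{j>=2} w_j |beta_j|.

    Position 0 (beta_1) is not penalized, mirroring the sum that starts at j = 2.
    """
    beta = np.asarray(beta, dtype=float)
    return loglik(beta) - lam * np.sum(weights[1:] * np.abs(beta[1:]))

# Toy concave log-likelihood standing in for l_n^s (hypothetical)
beta_init = np.array([1.0, 2.0, 0.1])

def loglik(b):
    return -np.sum((b - beta_init) ** 2)

w = adaptive_lasso_weights(beta_init)   # small initial estimates get large weights
print(w)
print(penalized_objective(beta_init, loglik, lam=1.0, weights=w))
```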

The penalized objective is maximized with a coordinate descent algorithm.
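The source does not spell out the coordinate-wise update. One common scheme, sketched below in Python as an assumption rather than the authors' algorithm, cycles over $\beta_2,\ldots,\beta_p$ (keeping $\beta_1=1$ fixed), approximates the log-likelihood in $\beta_j$ by a quadratic built from finite-difference derivatives, and maximizes that quadratic minus $\lambda_n w_j\vert\beta_j\vert$ in closed form by soft-thresholding. It is written as coordinate ascent because $Q_n$ is maximized; the toy `loglik` and all function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Solution of min_b 0.5 * (b - z)**2 + gamma * |b|."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def coordinate_ascent(loglik, beta0, lam, weights, n_sweeps=50, step=1e-4, min_curv=1e-2):
    """Maximize loglik(beta) - lam * sum_{j>=2} w_j |beta_j| one coordinate at a time.

    Each update maximizes a local quadratic approximation of loglik in beta_j
    (finite-difference gradient and curvature) minus the L1 term, which has a
    closed-form soft-thresholding solution. beta[0] is held fixed (assumed beta_1 = 1).
    """
    beta = np.asarray(beta0, dtype=float).copy()
    for _ in range(n_sweeps):
        for j in range(1, beta.size):
            e = np.zeros_like(beta)
            e[j] = step
            f0, f_plus, f_minus = loglik(beta), loglik(beta + e), loglik(beta - e)
            grad = (f_plus - f_minus) / (2 * step)
            # a = -(second derivative); floored so the quadratic model stays concave
            curv = max(-(f_plus - 2 * f0 + f_minus) / step**2, min_curv)
            beta[j] = soft_threshold(beta[j] + grad / curv, lam * weights[j] / curv)
    return beta

# Toy separable log-likelihood with maximizer (1, 3, 0.2) (hypothetical)
target = np.array([1.0, 3.0, 0.2])

def loglik(b):
    return -0.5 * np.sum((b - target) ** 2)

beta = coordinate_ascent(loglik, beta0=[1.0, 0.0, 0.0], lam=1.0, weights=np.ones(3))
print(beta)  # approximately [1, 2, 0]: 3 shrunk by 1, 0.2 set exactly to zero
```

Soft-thresholding is what produces exact zeros, which is how coefficients drop out of the scoring system; `lam` plays the role of $\lambda_n$ and would still have to be tuned.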

## Simulation

Cases 1 to 5.

### Variable Selection Examples

Cases 6 to 9.

Shen W, Ning J, Yuan Y, Lok AS, Feng Z. Model‐free scoring system for risk prediction with application to hepatocellular carcinoma study. Biometrics. 2017 Jul 25.