# Model Selection for Cox Models with Time-Varying Coefficients

This note is for Yan, J., & Huang, J. (2012). Model Selection for Cox Models with Time-Varying Coefficients. Biometrics, 68(2), 419–428.

A Cox model with time-varying coefficients can capture the temporal dynamics of covariate effects on right-censored failure times.

Challenge: not all covariate coefficients are time-varying, so we need to distinguish covariates with time-varying coefficients from those with time-independent coefficients.

Proposal: an adaptive group lasso method that not only selects important variables but also selects between time-independent and time-varying specifications of their presence in the model.

Each covariate effect is partitioned into a time-independent part and a time-varying part, the latter of which is characterized by a group of coefficients of basis splines without intercept.

Model selection and estimation are carried out through a fast, iterative group shooting algorithm.

## Introduction

An ideal model selection procedure for Cox models with time-varying coefficients should distinguish three kinds of covariates:

- those not in the model
- those in the model with time-independent coefficients
- those in the model with time-varying coefficients

For standard Cox models with time-independent coefficients, effective variable selection techniques are available:

- Lasso
- Zhang and Lu (2007): adaptive lasso where the penalty on each coefficient is weighted by the inverse magnitude of an initial estimate of the coefficient.
- Fan and Li (2001): a general nonconcave penalized partial likelihood approach

For Cox models with time-varying coefficients,

- the literature on model selection for varying-coefficient models in general appears to be limited
- Lin and Zhang (2006): a component selection and smoothing operator (COSSO)
- Li and Liang (2008): a two-part variable selection approach for semiparametric regression models
- …

- in the context of Cox models with time-varying coefficients, simultaneous selection between varying and fixed coefficients, in addition to selection between nonzero and zero coefficients, has not been studied.

There are two main classes of approaches for varying-coefficient Cox models:

- the penalized partial likelihood approach uses smooth functions for coefficients, maximizing the log partial likelihood with a penalty on the roughness of the coefficients
- the kernel-weighted partial likelihood approach finds a point estimator at each time point by maximizing a weighted “local” log partial likelihood function

The paper focuses on the first class of models, where each time-varying coefficient is expanded over a B-spline basis. Each coefficient is characterized by a set of basis coefficients, which are further split into two groups:

- the first group captures the time-independent, overall level of the covariate effect
- the second group captures the changes over time relative to the overall level

## Adaptive Group Lasso with B-splines

- $n$: sample size
- $T_i^*$: failure time
- $C_i$: censoring time
- covariate: $X_i=(X_{i1},\ldots, X_{ip})^T$
- $T_i=\min(T_i^*, C_i)$
- $\Delta_i = I(T_i^*\le C_i)$
- assume $T_i^*$ and $C_i$ are conditionally independent given $X_i$, and the censoring scheme is non-informative
- the observed data are iid copies $\{T_i, \Delta_i, X_i\},i=1,\ldots,n$
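
As a concrete reading of the notation above, here is a minimal sketch that builds the observed pairs $(T_i, \Delta_i)$ from latent failure and censoring times. The function name `observe` and the toy numbers are illustrative assumptions, not something taken from the paper.

```python
def observe(failure_times, censoring_times):
    """Return (T_i, Delta_i) pairs from latent failure and censoring times.

    T_i = min(T*_i, C_i), and Delta_i = 1 if the failure is observed
    (T*_i <= C_i), 0 if the subject is censored first.
    """
    observed = []
    for t_star, c in zip(failure_times, censoring_times):
        observed.append((min(t_star, c), int(t_star <= c)))
    return observed


# Toy values (hypothetical): the second subject is censored at time 1.0.
pairs = observe([2.0, 5.0, 3.0], [4.0, 1.0, 3.0])
print(pairs)
```

Only `pairs` (together with the covariates $X_i$) is available to the analyst; the latent $T_i^*$ of a censored subject is never seen.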

The Cox model with time-varying coefficients is

\[h(t\mid X_i) = h_0(t)\exp[X_i^T\beta(t)]\]

Assume $\beta(t) = \Theta F(t)$, where $\Theta$ is a $p\times q$ matrix of parameters to be estimated and $F(t)$ is a vector of $q$ basis functions.

Each time-varying coefficient is $\beta_j(t)=\Theta_jF(t)$, where $\Theta_j$ is the $j$-th row of $\Theta$.

Decompose each $\beta_j(t)$ into two parts by partitioning $\Theta_j$ into two blocks, each corresponding to a block of $F(t)$.

Write $\Theta_j=(\Theta_{j,1},\Theta_{j,-1})$
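
To make the split $\Theta_j=(\Theta_{j,1},\Theta_{j,-1})$ concrete, here is a small sketch. It assumes (my reading, not a transcription from the paper) that $F(t)$ is a constant $1$ followed by a clamped B-spline basis with its first function dropped, so that $\Theta_{j,1}$ is the overall level and $\Theta_{j,-1}$ holds the spline coefficients. The knots, the degree, and the coefficient values are hypothetical.

```python
def bspline_basis(x, knots, degree):
    """All B-spline basis values at x via the Cox-de Boor recursion.

    Valid for knots[0] <= x < knots[-1] (spans are half-open).
    """
    # Degree 0: indicator of the knot span that contains x.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
             for i in range(len(knots) - 1)]
    for k in range(1, degree + 1):
        raised = []
        for i in range(len(knots) - k - 1):
            left_den = knots[i + k] - knots[i]
            right_den = knots[i + k + 1] - knots[i + 1]
            # Terms with a zero-width span are dropped (0/0 := 0).
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = ((knots[i + k + 1] - x) / right_den * basis[i + 1]
                     if right_den > 0 else 0.0)
            raised.append(left + right)
        basis = raised
    return basis


def basis_vector(x, knots, degree):
    """Assumed F(t): constant 1, then the basis without its first function."""
    return [1.0] + bspline_basis(x, knots, degree)[1:]


def beta_j(x, theta_j, knots, degree):
    """beta_j(t) = Theta_j F(t), with Theta_j = (level, spline coefficients)."""
    level, changes = theta_j[0], theta_j[1:]
    f = basis_vector(x, knots, degree)
    return level * f[0] + sum(c * b for c, b in zip(changes, f[1:]))


# Hypothetical cubic basis on [0, 1) with one interior knot, so q = 5.
KNOTS = [0.0] * 4 + [0.5] + [1.0] * 4
constant_effect = [0.7, 0.0, 0.0, 0.0, 0.0]   # Theta_{j,-1} = 0
varying_effect = [0.7, 0.2, -0.1, 0.4, 0.3]   # Theta_{j,-1} != 0

print(beta_j(0.2, constant_effect, KNOTS, 3), beta_j(0.8, constant_effect, KNOTS, 3))
print(beta_j(0.2, varying_effect, KNOTS, 3), beta_j(0.8, varying_effect, KNOTS, 3))
```

Under this assumed basis, setting $\Theta_{j,-1}=0$ leaves a constant coefficient equal to $\Theta_{j,1}$, and setting the whole row to zero removes the covariate, which is what lets a group penalty choose among the three cases.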

Consider the log partial likelihood function combined with a penalty function.
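
The note does not reproduce the objective itself. As a reference point, here is a sketch of the standard unpenalized log partial likelihood for a Cox model whose coefficients depend on time, $\sum_i \Delta_i\,[X_i^T\beta(T_i) - \log\sum_{l:\,T_l\ge T_i}\exp\{X_l^T\beta(T_i)\}]$, assuming no tied event times. The function name and the toy data are assumptions for illustration, not the paper's code.

```python
import math


def log_partial_likelihood(times, events, covariates, beta):
    """Log partial likelihood with time-varying coefficients.

    times, events: observed T_i and Delta_i.
    covariates: list of covariate rows X_i.
    beta: callable t -> coefficient vector beta(t).
    """
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        b = beta(t_i)
        # Linear predictors X_l^T beta(T_i) for every subject l.
        eta = [sum(x * bk for x, bk in zip(row, b)) for row in covariates]
        # Risk set at T_i: subjects still under observation.
        risk = sum(math.exp(eta[l]) for l, t_l in enumerate(times) if t_l >= t_i)
        total += eta[i] - math.log(risk)
    return total


# Toy data (hypothetical). With beta(t) = 0 the value is -sum(log |risk set|).
times = [1.0, 2.0, 3.0]
events = [1, 0, 1]
covariates = [[0.5, 1.0], [-0.2, 0.0], [1.5, 1.0]]
value_at_zero = log_partial_likelihood(times, events, covariates, lambda t: [0.0, 0.0])
print(value_at_zero)
```

A penalized fit would minimize the negative of this quantity plus the penalty, with `beta` built from $\Theta$ and $F(t)$.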

Suppose the parameter vector $\theta$ is partitioned into $g$ groups, $\theta_1,\ldots,\theta_g$; the penalty function is

\[P(\theta;\lambda_n) = \lambda_n\sum_{i=1}^gW_i\Vert \theta_i\Vert\]

Consider two types of penalty:

- combined penalty:

- separate penalty:
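
The formulas for the two penalties are not reproduced in this note. The sketch below only evaluates the general form $\lambda_n\sum_i W_i\Vert\theta_i\Vert$ and compares two groupings of a single row $\Theta_j$: the whole row as one group, and the split $(\Theta_{j,1},\Theta_{j,-1})$. Reading "combined" and "separate" this way is my assumption based on the partition above, as is the Euclidean norm; the numbers and unit weights are hypothetical.

```python
import math


def group_penalty(groups, weights, lam):
    """lam * sum_i W_i * ||theta_i||, using Euclidean norms (assumed)."""
    return lam * sum(
        w * math.sqrt(sum(v * v for v in group))
        for group, w in zip(groups, weights)
    )


theta_j = [3.0, 0.0, 4.0]  # hypothetical row: (Theta_{j,1}, Theta_{j,-1})

# Assumed "combined" grouping: the whole row is one group.
combined = group_penalty([theta_j], [1.0], lam=0.5)

# Assumed "separate" grouping: level and time-varying part penalized apart.
separate = group_penalty([theta_j[:1], theta_j[1:]], [1.0, 1.0], lam=0.5)

print(combined, separate)
```

With one group per row, the penalty can only zero out a covariate entirely; with two groups per row, it can zero out the time-varying part while keeping the level.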

The model selection procedure is

- minimize with the combined penalty and weights $W_j=\sqrt q$ to obtain an initial estimate $\tilde\theta$
- minimize with the separate penalty and weights $W_{j1}, W_{j2}$ computed from $W_i = \sqrt{p_i}/\Vert\tilde\theta_i\Vert$, where $p_i$ is the number of coefficients in group $i$
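
The weight formula in the second step can be sketched as follows. The helper name, the first-stage values, and the convention of giving a group with a zero initial estimate an infinite weight (so it stays out of the model) are assumptions for illustration.

```python
import math


def adaptive_weights(initial_groups):
    """W_i = sqrt(p_i) / ||theta_tilde_i|| for each group of the initial fit."""
    weights = []
    for group in initial_groups:
        norm = math.sqrt(sum(v * v for v in group))
        # Assumed convention: a zero initial estimate gets infinite weight.
        weights.append(math.sqrt(len(group)) / norm if norm > 0 else math.inf)
    return weights


# Hypothetical first-stage row: clear level, almost no change over time.
theta_tilde_j = [0.5, 0.03, -0.04, 0.0]
w_level, w_change = adaptive_weights([theta_tilde_j[:1], theta_tilde_j[1:]])
print(w_level, w_change)
```

Here the time-varying group has a small initial norm and therefore a large weight, so the second step penalizes it heavily and tends to shrink it to zero, leaving a time-independent coefficient.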