WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Cox Models with Time-Varying Covariates vs Time-Varying Coefficients

Posted on
Tags: Cox Model, Model Selection, Time-varying

This note is for Zhang, Z., Reinikainen, J., Adeleke, K. A., Pieterse, M. E., & Groothuis-Oudshoorn, C. G. M. (2018). Time-varying covariates and coefficients in Cox regression models. Annals of Translational Medicine, 6(7), 121.

The classical Cox model is

\[\lambda(t\mid X) = \lambda_0(t)\exp(\beta^TX)\,,\]

one approach for using time-varying covariate data is to extend the Cox model to allow time-varying covariates

\[\lambda(t\mid Z(t)) = \lambda_0(t) \exp(\beta^Tx +\gamma^TXg(t))\]

where

\[Z(t) = [x_1,\ldots,x_p, X_1g(t),\ldots, X_qg(t)]\]

In the time-varying coefficient situation, a covariate is measured at baseline but its effect on the outcome is not constant over the follow-up time, which is a violation of the proportional hazards assumption.

If the time-varying coefficient can be written as $g(\beta, t) = \beta g(t)$, the model with a time-varying coefficient can be expressed as a model with time-varying covariate with a constant coefficient.

Working example on time-varying covariates

generate a data frame containing two time-fixed variables: group and age

library(survsim)
library(survival)
N = 100
set.seed(123)
df.tf = simple.surv.sim(
  n = N, foltime = 500,
  dist.ev = c("llogistic"),
  anc.ev = c(0.68),
  beta0.ev = c(5.8),
  anc.cens = 1.2,
  beta0.cens = 7.4,
  z = list(c("unif", 0.8, 1.2)),
  beta = list(c(-0.4), c(0)),
  x = list(c("bern", 0.5), c("normal", 70, 13))
)
names(df.tf)[c(1, 6, 7)] = c("id", "grp", "age")
head(df.tf)
#   id status start      stop         z grp      age
# 1  1      1     0  48.12817 1.1532070   0 80.40721
# 2  2      0     0 121.43904 0.9826459   1 86.11345
# 3  3      1     0  30.65770 1.1599300   1 72.38007
# 4  4      0     0 500.00000 1.0771214   0 91.97240
# 5  5      1     0  69.54117 1.0376568   1 77.13826
# 6  6      1     0 233.05486 1.1181870   1 86.83192

construct a time-varying covariate crp, generate a data frame in a counting process data structure, in which each individual is represented by one or more rows. In such a data frame, each row represents a follow-up tine interval at which the value of a covariate is recorded.

set.seed(123)
nft = sample(1:10, N, replace = TRUE)
crp = round(abs(rnorm(sum(nft)+N, mean = 100, sd = 40)), 1)
time = NULL
id = NULL
i = 0
for (n in nft) {
  i = i+1
  time.n = sample(1:500, n)
  time.n = c(0, sort(time.n))
  time = c(time, time.n)
  id.n = rep(i, n+1) 
  id = c(id, id.n)
}
df.td = cbind(data.frame(id, time), crp)
head(df.td)
> head(df.td)
#   id time   crp
# 1  1    0 115.4
# 2  1  207  85.2
# 3  1  253 125.8
# 4  1  373  91.2
# 5  2    0 113.3
# 6  2  110 143.9

Merging data frames via tmerge

# convert the data frame to a longitudinal type. the first run defining the basic structure
# and subsequent runs add variables to that structure
df = tmerge(df.tf, df.tf, id = id, endpt = event(stop, status))
head(df)
#   id status start      stop         z grp      age tstart     tstop endpt
# 1  1      1     0  48.12817 1.1532070   0 80.40721      0  48.12817     1
# 2  2      0     0 121.43904 0.9826459   1 86.11345      0 121.43904     0
# 3  3      1     0  30.65770 1.1599300   1 72.38007      0  30.65770     1
# 4  4      0     0 500.00000 1.0771214   0 91.97240      0 500.00000     0
# 5  5      1     0  69.54117 1.0376568   1 77.13826      0  69.54117     1
# 6  6      1     0 233.05486 1.1181870   1 86.83192      0 233.05486     1
df = tmerge(df, df.td, id = id, crp = tdc(time, crp))
head(df)
#   id status start      stop         z grp      age tstart     tstop endpt   crp
# 1  1      1     0  48.12817 1.1532070   0 80.40721      0  48.12817     1 115.4
# 2  2      0     0 121.43904 0.9826459   1 86.11345      0 110.00000     0 113.3
# 3  2      0     0 121.43904 0.9826459   1 86.11345    110 121.43904     0 143.9
# 4  3      1     0  30.65770 1.1599300   1 72.38007      0  30.65770     1 146.0
# 5  4      0     0 500.00000 1.0771214   0 91.97240      0  18.00000     0  71.6
# 6  4      0     0 500.00000 1.0771214   0 91.97240     18 366.00000     0 110.3

Fit the Cox model with a time-varying covariate

# fit the cox model with a time-varying covariate
fit.tdc = coxph(Surv(tstart, tstop, endpt)~grp+age+crp+cluster(id), df)
fit.tdc

Time-varying coefficients.

It emerges when the proportional hazards assumption is not fulfilled to identify time-varying coefficients is actually to test the proportional hazards assumption after fitting a Cox proportional hazard model.

The examination of proportional hazards assumption can be performed using cox.zph() function, shipped with the survival package

fit2 = coxph(Surv(time, status) ~ age + ph.karno + sex, data = lung)
zph = cox.zph(fit2)
zph
#           chisq df      p
# age       0.478  1 0.4892
# ph.karno  8.017  1 0.0046
# sex       3.085  1 0.0790
# GLOBAL   10.359  3 0.0157

Schoenfeld’s global test for the violation of proportional assumption The result shows that there is significant deviation from the proportional hazards assumption for the variable pb.karno

We can visualize the results

plot(zph[2], lwd = 2)
abline(0, 0, col = 1, lty = 3, lwd = 2)
abline(h = fit2$coef[2], col = 3, lwd = 2, lty = 2)
legend("bottomright",
       legend=c("Reference line for null effect",
                "Average hazard over time",
                "Time-varying hazard"),
       lty = c(3, 2, 1), col = c(1, 3, 1), lwd = 2)

image

Step function to explore time-varying coefficient

one way to model time-varying coefficients is to use a step function, the effect of fixed baseline covariates becomes stronger or weaker over time, which can be explored via stratification by time

lung.split = survSplit(Surv(time, status) ~ ., 
                       data = lung, cut = c(180, 350), 
                       episode = "tgroup", id = "id")
fit.split = coxph(Surv(tstart, time, status) ~ age + ph.karno:strata(tgroup) + 
                    sex, data = lung.split)
cox.zph(fit.split)
#                            chisq df    p
# age                     0.000321  1 0.99
# sex                     2.531266  1 0.11
# ph.karno:strata(tgroup) 4.840271  3 0.18
# GLOBAL                  7.315512  5 0.20

Continuous function to describe the time-varying coefficient

## continuous function to describe the time-varying coefficient
fit.tt = coxph(Surv(time, status) ~ age+ph.karno+tt(ph.karno)+sex, data = lung, tt = function(x, t,...) x*log(t+20))

zph.tt = cox.zph(fit2, transform = function(t) log(t+20))
plot(zph.tt[2])
abline(coef(fit.tt)[2:3], col=2)

timereg package

Cox regression model is fit similarly as in the survival package with the only difference that resampling methods are used for the statistical inference

library(timereg)
fit.out = timecox(Surv(time, status) ~ age+sex+ph.karno, data = lung, n.sim = 500, max.time = 700)

# the effects of all variables
par(mfrow = c(2, 2))
plot(fit.out)

image


Published in categories Note