WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Joint Model in High Dimension

Tags: Mixture Model, Hypothesis Testing

This post is for Liu, M., Sun, J., Herazo-Maya, J. D., Kaminski, N., & Zhao, H. (2019). Joint Models for Time-to-Event Data and Longitudinal Biomarkers of High Dimension. Statistics in Biosciences, 11(3), 614–629.

Joint Models for Time-to-Event Data and Longitudinal Biomarkers of High Dimension

study the association between a biomarker repeatedly measured over time and the survival outcome

  • landmarking methods: Anderson et al. (1983), van Houwelingen and Putter (2011)
  • joint modeling methods: Faucett and Thomas (1996); Tsiatis and Davidian (2004); Rizopoulos et al. (2014)

The paper is interested in predicting patients’ survival time using relevant clinical variables and repeatedly measured gene expression profiles.

many joint modeling methods for different types of longitudinal biomarkers and survival outcomes

  • Tsiatis and Davidian (2004): a comprehensive overview of some early work on joint models
  • Proust-Lima et al. (2014): discussed some recent advances on this topic
  • Brown et al. (2005): proposed a joint modeling method that uses cubic B-splines to model the longitudinal markers flexibly; the approach can handle multivariate biomarkers, but only a few of them
  • Rizopoulos and Ghosh (2011): developed a spline-based approach for longitudinal outcomes with unusual time-dependent shapes
  • Rizopoulos (2012): to improve computational efficiency, proposed a pseudo-adaptive Gauss-Hermite quadrature rule for fast evaluation of the integrals in joint models
  • Rizopoulos et al. (2014): used Bayesian model averaging in the joint modeling framework to obtain individualized predictions by aggregating results from different types of submodels
  • He et al. (2015): developed a penalized likelihood method with a LASSO penalty for simultaneous selection of fixed and random effects in joint models

However, none of the aforementioned methods can deal with a large number of biomarkers simultaneously, such as gene expression profiles.

This paper proposes a new joint modeling method under the Bayesian framework to deal with longitudinal biomarkers of high dimension.

  • specifically, assume that only a few unobserved latent variables are related to the survival outcome.
  • adopt a factor analysis model (West, 2003; Carvalho et al., 2008) to infer the latent variables, which greatly reduces the dimensionality of the biomarkers. In addition, the factor analysis model can account for the high correlations among the biomarkers that are often observed in gene expression data.

Model Specification

Longitudinal submodel

Let $y^{(i)}(t) = (y_1^{(i)}(t), \ldots, y_G^{(i)}(t))^T$ denote the gene expression profiles for subject $i$ at time point $t$, where $G$ is the total number of genes.

A standard factor analysis model can be written as

\[y^{(i)}(t) = \Lambda \eta^{(i)}(t) + \varepsilon^{(i)}(t), \quad \varepsilon^{(i)}(t) \overset{iid}{\sim} N(0, \operatorname{diag}\{\sigma_1^2,\ldots, \sigma_G^2\})\]

where $\eta^{(i)}(t) = (\eta_1^{(i)}(t), \ldots, \eta_K^{(i)}(t))^T$ is a $K$-dimensional latent factor score vector for the $i$-th subject at time point $t$, and $K$ is a pre-specified number of factors.

For $k=1,2,\ldots, K$,

\[\eta_k^{(i)}(t) = x_k^{(i)}(t)^T\beta_k + z_k^{(i)}(t)^Tb_k^{(i)}\,,\]

where $x_k^{(i)}(t)$ and $z_k^{(i)}(t)$ denote the time-dependent design vectors for subject $i$.
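To make the two layers concrete, here is a minimal simulation sketch of the longitudinal submodel for one subject. It assumes $x_k(t) = z_k(t) = (1, t)^T$ for every factor (an assumption for illustration), and the function name `simulate_expression` and all parameter values are made up, not taken from the paper.

```python
import numpy as np

def simulate_expression(times, Lambda, beta, b, sigma, rng):
    """Simulate y(t) = Lambda @ eta(t) + eps(t) for one subject.

    times  : (T,) visit times
    Lambda : (G, K) factor loading matrix
    beta   : (K, 2) fixed effects (intercept, slope) for each factor
    b      : (K, 2) subject-specific random effects for each factor
    sigma  : (G,) residual standard deviations, one per gene
    """
    # Assumed design: x_k(t) = z_k(t) = (1, t) for all k.
    design = np.column_stack([np.ones_like(times), times])   # (T, 2)
    # eta_k(t) = x_k(t)^T beta_k + z_k(t)^T b_k
    eta = design @ (beta + b).T                               # (T, K)
    # Independent Gaussian noise with gene-specific variances.
    noise = rng.normal(0.0, sigma, size=(len(times), len(sigma)))
    y = eta @ Lambda.T + noise                                # (T, G)
    return eta, y

# Illustrative values only.
rng = np.random.default_rng(0)
G, K = 50, 3
Lambda = rng.normal(size=(G, K))
beta = rng.normal(size=(K, 2))
b = rng.normal(scale=0.5, size=(K, 2))
sigma = np.full(G, 0.3)
times = np.array([0.0, 1.0, 2.0])
eta, y = simulate_expression(times, Lambda, beta, b, sigma, rng)
```

Here the $G = 50$ observed trajectories are driven entirely by $K = 3$ latent linear trajectories, which is the dimension reduction the model relies on.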

design vector for the latent vector?

Survival submodel

take the form

\[h_i(t) = h_0(t)\exp(\gamma^Tw_i + \alpha^T\eta^{(i)}(t))\]
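A small sketch of how this hazard could be evaluated for one subject. The constant baseline hazard, the coefficient values, and the latent trajectory below are all hypothetical; the notes do not say how $h_0(t)$ is modeled.

```python
import numpy as np

def hazard(t, h0, gamma, w, alpha, eta):
    """h_i(t) = h0(t) * exp(gamma^T w_i + alpha^T eta_i(t)).

    h0  : callable, baseline hazard at time t
    w   : (p,) baseline covariates of the subject
    eta : callable, returns the (K,) latent factor scores at time t
    """
    return h0(t) * np.exp(gamma @ w + alpha @ eta(t))

# Illustrative values only.
h0 = lambda t: 0.1                          # assumed constant baseline hazard
gamma = np.array([0.5])
w = np.array([1.0])
alpha = np.array([0.2, -0.1])
eta = lambda t: np.array([1.0 + t, 2.0])    # assumed latent trajectories

h_at_0 = hazard(0.0, h0, gamma, w, alpha, eta)
h_at_5 = hazard(5.0, h0, gamma, w, alpha, eta)
```

Only the $K$ latent scores enter the hazard, so $\alpha$ has length $K$ rather than $G$.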

Estimation and prediction

perform MCMC to estimate parameters in the submodels jointly


adopt the deviance information criterion (DIC)
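The notes do not spell out the DIC formula. The sketch below uses the usual definition, $\mathrm{DIC} = \bar D + p_D$ with $p_D = \bar D - D(\bar\theta)$, computed from MCMC output; the function name and the inputs are illustrative assumptions.

```python
import numpy as np

def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, p_D) from MCMC output.

    deviance_draws             : deviance D(theta) evaluated at each posterior draw
    deviance_at_posterior_mean : deviance evaluated at the posterior mean of theta
    """
    d_bar = float(np.mean(deviance_draws))        # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean      # effective number of parameters
    return d_bar + p_d, p_d

# Toy numbers, not results from the paper.
dic_value, p_d = dic([10.0, 12.0, 14.0], 11.0)
```

Smaller DIC values indicate a better trade-off between fit and complexity when comparing candidate models.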

set both $x^{(i)}(t)$ and $z^{(i)}(t)$ to include only linear functions of $t$

Dynamic Prediction

For any $u > t$, the probability that subject $j$ will survive at least up to $u$ can be written as

\[\zeta_j(u\mid t) = \Pr(T_j^\star\ge u\mid T_j^\star > t,...)\]
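For a fixed set of parameter values and a fixed latent trajectory, this conditional probability reduces to $S_j(u)/S_j(t) = \exp\{-\int_t^u h_j(s)\,ds\}$. The sketch below evaluates that quantity numerically with the trapezoidal rule. It is only a plug-in illustration with a hypothetical `hazard_fn`; in a Bayesian fit one would typically average it over posterior draws.

```python
import numpy as np

def conditional_survival(hazard_fn, t, u, n_grid=2001):
    """Pr(T >= u | T > t) = exp(-integral of the hazard from t to u)."""
    s = np.linspace(t, u, n_grid)
    h = np.array([hazard_fn(v) for v in s])
    # Trapezoidal rule for the cumulative hazard on [t, u].
    cum_hazard = np.sum((h[1:] + h[:-1]) * np.diff(s)) / 2.0
    return float(np.exp(-cum_hazard))

# Illustrative check with a constant hazard of 0.1.
p = conditional_survival(lambda s: 0.1, t=1.0, u=3.0)
```

With a constant hazard the result is available in closed form, $\exp(-0.1 \times 2)$, which gives a quick sanity check on the integration.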

Simulation studies

Simulation setting

  • number of subjects: 50
  • number of genes: 50
  • number of visits per subject is randomly drawn from $\{2, 3, 4, 5\}$ with equal probability
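The sizes above can be set up in a few lines. This sketch only draws the number of visits; the seed and variable names are arbitrary choices, and the visit times and model parameters are not given in these notes, so they are left out.

```python
import numpy as np

rng = np.random.default_rng(2019)   # arbitrary seed

n_subjects = 50
n_genes = 50

# Each subject gets 2, 3, 4 or 5 visits with equal probability.
n_visits = rng.choice([2, 3, 4, 5], size=n_subjects)
```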

Assessment of Predictive Performance

To evaluate the predictive performance of the proposed model, the paper designs two other models for comparison:

  • JMsig: pick three biomarkers with the largest factors
  • JMran: select three biomarkers at random, then fit a standard joint model with these randomly selected genes, as in the first model

calculate the AUC
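The notes do not say which AUC variant is used. As a rough illustration, the sketch below computes a plain rank-based (Mann-Whitney) AUC for a binary outcome, for example "event before time $u$" versus "no event"; a time-dependent AUC that handles censoring would need more care. The function name `auc` is illustrative.

```python
import numpy as np

def auc(scores, labels):
    """Probability that a random positive has a higher score than a random negative.

    scores : predicted risk scores (higher means higher risk)
    labels : 1 for subjects with the event, 0 otherwise
    Ties count as one half.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]          # all positive-negative pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))

# Toy example: risk scores that separate the two groups perfectly.
perfect = auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```

An AUC of 0.5 corresponds to random ranking, so the comparison between the proposed model, JMsig, and JMran comes down to how far each is above that level.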
