Joint Model in High Dimension

Posted on Jan 31, 2024 (Update: Feb 06, 2024)

This post is for Liu, M., Sun, J., Herazo-Maya, J. D., Kaminski, N., & Zhao, H. (2019). Joint Models for Time-to-Event Data and Longitudinal Biomarkers of High Dimension. Statistics in Biosciences, 11(3), 614–629.

Joint Models for Time-to-Event Data and Longitudinal Biomarkers of High Dimension

study the association between a biomarker repeatedly measured over time and the survival outcome

landmarking methods: Anderson et al. (1983), van Houwelingen and Putter (2011)
joint modeling methods: Faucett and Thomas (1996); Tsiatis and Davidian (2004); Rizopoulos et al. (2014)

The paper is interested in predicting patients’ survival time using relevant clinical variables and repeatedly measured gene expression profiles.

many joint modeling methods for different types of longitudinal biomarkers and survival outcomes

Tsiatis and Davidian (2004): a comprehensive overview of some early work on joint models
Proust-Lima et al. (2014): discussed some recent advances on this topic
Brown et al. (2005): proposed a joint modelling method that uses cubic B-splines to model the longitudinal markers flexibly; the approach can handle multivariate biomarkers, but only a few of them
Rizopoulos and Ghosh (2011): developed a spline-based approach for longitudinal outcomes with unusual time-dependent shapes.
Rizopoulos (2012): to improve computational efficiency, proposed a pseudo-adaptive Gauss-Hermite quadrature rule that speeds up the integrations in joint models
Rizopoulos et al. (2014): used the Bayesian model averaging idea in the joint modeling framework to obtain individualized predictions by aggregating results from different types of submodels
He et al. (2015): developed a penalized likelihood method with a LASSO penalty for simultaneous selection of fixed and random effects in joint models.

However, none of the aforementioned methods can deal with a large number of biomarkers simultaneously, such as gene expression profiles.

This paper proposes a new joint modeling method under the Bayesian framework to deal with longitudinal biomarkers of high dimension.

Specifically, assume that only a few unobserved latent variables are related to the survival outcome.
Adopt a factor analysis model (West, 2003; Carvalho et al., 2008) to infer the latent variables, which greatly reduces the dimensionality of the biomarkers. In addition, the factor analysis model can account for the high correlations among the biomarkers that are often observed in gene expression data.

Model Specification

Longitudinal submodel

Let $y^{(i)}(t) = (y_1^{(i)}(t), \ldots, y_G^{(i)}(t))^T$ denote the gene expression profiles for subject $i$ at time point $t$, where $G$ is the total number of genes.

A standard factor analysis model can be written as

\[y^{(i)}(t) = \Lambda \eta^{(i)}(t) + \varepsilon^{(i)}(t), \qquad \varepsilon^{(i)}(t) \overset{\text{iid}}{\sim} N(0, \operatorname{diag}\{\sigma_1^2,\ldots, \sigma_G^2\})\]

where $\Lambda$ is the $G \times K$ factor loading matrix and $\eta^{(i)}(t) = (\eta_1^{(i)}(t), \ldots, \eta_K^{(i)}(t))^T$ is the $K$-dimensional latent factor score vector for the $i$-th subject at time point $t$, with $K$ being a pre-specified number of factors.

For $k=1,2,\ldots, K$,

\[\eta_k^{(i)}(t) = x_k^{(i)}(t)^T\beta_k + z_k^{(i)}(t)^Tb_k^{(i)}\,,\]

where $x_k^{(i)}(t)$ and $z_k^{(i)}(t)$ denote the time-dependent design vectors for subject $i$.
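To see how the two layers of the longitudinal submodel fit together, here is a minimal simulation sketch for one subject. It assumes, for illustration only, that $x_k^{(i)}(t) = z_k^{(i)}(t) = (1, t)^T$ for every factor; the function name and argument layout are my own and not taken from the paper.

```python
import numpy as np

def simulate_longitudinal(times, Lambda, beta, b, sigma, rng):
    """Simulate y(t) = Lambda @ eta(t) + eps(t) for a single subject.

    times  : visit times, length n_visits
    Lambda : (G, K) factor loading matrix
    beta   : (K, 2) fixed effects (intercept, slope), one row per factor
    b      : (K, 2) subject-specific random effects, one row per factor
    sigma  : (G,) residual standard deviations
    """
    times = np.asarray(times, dtype=float)
    # Assumed design vectors: x_k(t) = z_k(t) = (1, t) for all k
    design = np.column_stack([np.ones_like(times), times])  # (n_visits, 2)
    # eta_k(t) = x_k(t)' beta_k + z_k(t)' b_k
    eta = design @ (beta + b).T                              # (n_visits, K)
    # Independent Gaussian noise with gene-specific variances
    noise = rng.normal(scale=sigma, size=(len(times), len(sigma)))
    y = eta @ Lambda.T + noise                               # (n_visits, G)
    return eta, y

rng = np.random.default_rng(0)
G, K = 50, 2
Lambda = rng.normal(size=(G, K))
beta = rng.normal(size=(K, 2))
b = rng.normal(scale=0.5, size=(K, 2))
eta, y = simulate_longitudinal([0.0, 1.0, 2.0], Lambda, beta, b, np.full(G, 0.3), rng)
```

Each row of `y` is a $G$-dimensional expression profile driven by only $K$ latent trajectories, which is where the dimension reduction comes from.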

design vector for the latent vector?

Survival submodel

The hazard function for subject $i$ takes the form

\[h_i(t) = h_0(t)\exp(\gamma^Tw_i + \alpha^T\eta^{(i)}(t))\]
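As a rough illustration of the hazard above, the sketch below evaluates $h_i(t)$ and the implied survival function $S_i(t) = \exp\{-\int_0^t h_i(s)\,ds\}$ by numerical integration. The callables `h0` and `eta_fn`, and the use of the trapezoidal rule, are my own illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def hazard(t, h0, gamma, w, alpha, eta_fn):
    """h_i(t) = h0(t) * exp(gamma' w_i + alpha' eta_i(t))."""
    return h0(t) * np.exp(gamma @ w + alpha @ eta_fn(t))

def survival(t, h0, gamma, w, alpha, eta_fn, n_grid=2001):
    """S_i(t) = exp(-integral_0^t h_i(s) ds), using the trapezoidal rule."""
    grid = np.linspace(0.0, t, n_grid)
    h = np.array([hazard(s, h0, gamma, w, alpha, eta_fn) for s in grid])
    cum_hazard = np.sum((h[1:] + h[:-1]) * np.diff(grid)) / 2.0
    return float(np.exp(-cum_hazard))

# Hypothetical inputs: constant baseline hazard, one covariate, two factors
h0 = lambda t: 0.1
gamma, w = np.array([0.5]), np.array([1.0])
alpha = np.array([0.3, -0.2])
eta_fn = lambda t: np.array([1.0 + 0.5 * t, -0.5 + 0.1 * t])  # linear in t

s2 = survival(2.0, h0, gamma, w, alpha, eta_fn)
```

Because $\eta^{(i)}(t)$ enters the exponent, the hazard changes continuously with the latent trajectories instead of with the raw $G$-dimensional biomarkers.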

Estimation and prediction

perform MCMC to estimate parameters in the submodels jointly

adopt the deviance information criterion (DIC)
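For reference, DIC in its standard form is $\bar D + p_D$, where $\bar D$ is the posterior mean deviance and $p_D = \bar D - D(\bar\theta)$ is the effective number of parameters. A minimal sketch of computing it from MCMC output follows; it assumes the deviance has already been evaluated at each draw and at the posterior mean, and the function name is hypothetical.

```python
import numpy as np

def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, p_D) from MCMC deviance evaluations.

    deviance_draws             : D(theta^(m)) = -2 log-likelihood at each draw
    deviance_at_posterior_mean : D(theta_bar), deviance at the posterior mean
    """
    d_bar = float(np.mean(deviance_draws))      # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean    # effective number of parameters
    return d_bar + p_d, p_d

dic_value, p_d = dic([10.0, 12.0, 14.0], 11.0)
```

Smaller DIC indicates a better trade-off between fit and complexity when comparing candidate models.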

set both $x^{(i)}(t)$ and $z^{(i)}(t)$ to include only linear functions of $t$

Dynamic Prediction

For any $u > t$, the probability that subject $j$ will survive at least up to $u$ can be written as

\[\zeta_j(u\mid t) = \Pr(T_j^\star\ge u\mid T_j^\star > t,...)\]
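For a fixed set of parameter values, the conditional survival probability reduces to $S_j(u)/S_j(t) = \exp\{-(H_j(u) - H_j(t))\}$, with $H_j$ the cumulative hazard. In a Bayesian fit, one common way to estimate $\zeta_j(u\mid t)$ is to average this ratio over posterior draws. The sketch below assumes each draw supplies its own cumulative hazard function; this interface is hypothetical and not necessarily how the paper organizes the computation.

```python
import numpy as np

def conditional_survival(u, t, cum_hazard_draws):
    """Monte Carlo estimate of Pr(T >= u | T > t).

    cum_hazard_draws : list of callables, one per posterior draw, each
                       mapping a time s to the cumulative hazard H(s)
                       for the subject under that draw.
    """
    if u <= t:
        raise ValueError("u must be larger than t")
    # S(u) / S(t) = exp(-(H(u) - H(t))) for each draw, then average
    ratios = [np.exp(-(H(u) - H(t))) for H in cum_hazard_draws]
    return float(np.mean(ratios))

# Two hypothetical posterior draws with constant hazards 0.1 and 0.3
draws = [lambda s: 0.1 * s, lambda s: 0.3 * s]
zeta = conditional_survival(3.0, 1.0, draws)
```

The prediction is dynamic in the sense that it can be recomputed each time a new visit extends the observed history up to a later $t$.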

Simulation studies

Simulation setting

number of subjects: 50
number of genes: 50
number of visits per subject is randomly drawn from $\{2, 3, 4, 5\}$ with equal probability
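The visit counts in this setting can be drawn as in the short sketch below. The seed and the helper name are arbitrary choices of mine; the notes above do not record the visit times themselves, so only the counts are generated.

```python
import numpy as np

def draw_visit_counts(n_subjects, rng):
    """Draw the number of visits per subject uniformly from {2, 3, 4, 5}."""
    return rng.choice([2, 3, 4, 5], size=n_subjects)

rng = np.random.default_rng(2024)
n_subjects, n_genes = 50, 50
n_visits = draw_visit_counts(n_subjects, rng)
```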

Assessment of Predictive Performance

to evaluate the predictive performance of the proposed model, we design two other models for comparison

JMsig: pick the three biomarkers with the largest factor loadings, then fit a standard joint model with them
JMran: first select three biomarkers at random and then a standard joint model is fitted with these randomly selected genes as in the first model

calculate the AUC
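As a simple illustration of the metric, the sketch below computes the AUC as the probability that a subject who experienced the event has a higher predicted risk than one who did not, with ties counted as one half. It ignores censoring and treats event status within the prediction window as fully observed, which is a simplification of mine; the paper's exact time-dependent AUC definition may differ.

```python
import numpy as np

def auc(risk_scores, events):
    """AUC via pairwise comparison of event and non-event subjects.

    risk_scores : predicted risk per subject, e.g. 1 - zeta_j(u | t)
    events      : 1/True if the subject had the event in the window, else 0/False
    """
    risk = np.asarray(risk_scores, dtype=float)
    ev = np.asarray(events, dtype=bool)
    pos, neg = risk[ev], risk[~ev]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one event and one non-event subject")
    diff = pos[:, None] - neg[None, :]           # all event/non-event pairs
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

perfect = auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```

An AUC of 0.5 corresponds to random ranking, so the comparison between the proposed model, JMsig, and JMran comes down to how far each one sits above that baseline.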
