MMRM: Mixed-Models for Repeated Measures
Posted on (Update: )
This post is based on vignettes of MMRM R package: https://openpharma.github.io/mmrm/main/index.html
Laid and Ware (1982) introduced the basic linear mixed-effects model for a single level of grouping to be any model that expresses an $n_i$-dimensional response (column) vector $y_i$ for the $i$-th subject (or, more generally, group or unit) as
\[y_i = X_i\beta + Z_ib_i + \epsilon_i, i=1,\ldots,M \\ b_i\sim N(0, \Psi), \varepsilon_i\sim N(0, \sigma^2I)\]MMRM model
directly model the variance-covariance structure of the response, i.e., to account for within-subject dependencies using the within-group component $\Lambda_i$.
This yields the MMRM with
\[y_i = X_i\beta + \epsilon_i, \epsilon_i\sim N(0, \Sigma_i), i=1,\ldots,M\]In total,
\[Y = X\beta + \epsilon\,,\]where $\epsilon\sim N(0, \Omega)$ with $\Omega\in \IR^{N\times N}$ being a block-diagonal matrix, containing the subject-specific $\Sigma_i$ on the diagonal.
Covariance Structures
In many covariance structures, use the decomposition
\[\Sigma = DPD\]where the diagonal standard deviation matrix is
$D = \diag\{\sigma_1,\ldots, \sigma_m\}$ with entries $\sigma_i > 0$ and the symmetric correlation matrix $P = {\rho_{ij}}$ with entries $\rho_{ij}\in (-1, 1)$.
- for $\sigma$, use the natural logarithm $\log(\sigma)$ to map them to $\IR$
- for correlation parameters $\rho$, use the transformation
which maps the correlation parameter to $\theta\in \IR$. It has the inverse
\[\rho = f^{-1}(\theta) = \frac{\theta}{\sqrt{1+\theta^2}}.\]Unstructured (us
)
then we have $k=m(m+1)/2$ parameters
Homogeneous (ad
) and Heterogeneous Ante-dependence (adh
)
the ante-dependence correlation structure (Gabriel 1962) is useful for balanced designs where the observations are not necessarily equally spaced in time.
Use an order one ante-dependence model, where the correlation matrix $P$ has elements
\[\rho_{ij} = \prod_{k=i}^{j-1}\rho_k\,.\]then
\[\theta = (\log(\sigma_1),\ldots, \log(\sigma_m), f(\rho_1), \ldots, f(\rho_{m-1}))\]in total $2m-1$ variance parameters.
Homogeneous (toep
) and Heterogeneous Toeplitz (toeph
)
Toeplitz matrices are diagonal-constant matrices. The correlation between two time points only depends on the distance between them, i.e., $\rho_{ij}=\rho_{\vert i-j\vert }$.
then
\[\theta = (\log(\sigma_1),\ldots, \log(\sigma_m), f(\rho_1), \ldots, f(\rho_{m-1}))\]in total $2m-1$ variance parameters.
Homogeneous (cs
) and Heterogeneous Compound Symmetry (csh
)
the compound symmetry covariance structures assume a constant correlation between different time points
\[\rho_{ij} = \rho\]then in total there are $k=m+1$ variance parameters.
Spatial Covariance Structure: spatial exponential (sp_exp
)
spatial covariance structures, unlike other covariance structures, does not require that the timepoints are consistent between subjects. Instead, as long as the distance between visits can be quantified in terms of time and/or other coordinates, the spatial covariance structure can be applied.
for spatial exponential, the covariance structure is define as follows:
\[\rho_{ij} = \rho^{d_{ij}}\]where $d_{ij}$ is the distance between time point $i$ and time point $j$.
total number of parameters $k=2$ is needed.
Hypothesis Testing
Contained Effect
For effect $E_2$, it is said to contain $E_1$ if
- the involved numerical covariates are identical for these two effects
- all the categorical covariates associated with $E_1$ are also associated with $E_2$
For example, for the following model formula
\[Y = A + B + A * B\]using $E_A, E_B$ and $E_{A*B}$ to denote the effect for $A, B$ and $A * B$ respectively, we have
- if $A$ and $B$ are both categorical, then $E_{A*B}$ contains $E_A$ and $E_B$
- if $A$ and $B$ are both numerical, then $E_{A*B}$ do not contain $E_A$ or $E_B$
- if $A$ is categorical and $B$ is numerical, then $E_{A*B}$ contains $E_B$ but not $E_A$
Model Fitting Algorithm
Let $i=1,\ldots, n$ denote the subjects from which we observe multiple observations $j=1,\ldots,m_i$ for total $m_i$ time points $t_{ij}\in {t_1,\ldots,t_m}$.
Note that the number of time points for a specific subject $m_i$ can be smaller than $m$, when only a subset of the possible $m$ time points have been observed.
Linear Model
For each subject $i$ we observe a vector $Y_i = (y_{i1}, \ldots, y_{im_i})^\top \in \IR^{m_i}$
and given a design matrix $X_i \in \IR^{m_i\times p}$
and a corresponding coefficient vector $\beta \in \IR^p$ we assume that the observations are multivariate normal distributed:
\[Y_i \sim N(X_i\beta, \Sigma_i)\]where the covariance matrix $\Sigma_i\in \IR^{m_i\times m_i}$ is derived by subsetting the overall covariance matrix $\Sigma \in \IR^{m\times m}$ appropriately by
\[\Sigma_i = G_i^{-1/2} S_i^T\Sigma S_iG_i^{-1/2}\]where the subsetting matrix $S_i \in {0, 1}^{m\times m_i}$. For example, assume a subject was observed on time points 1, 3, 4 out of total 5 then the subsetting matrix is
\[S_i = \begin{bmatrix} 1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{bmatrix}\]$G_i \in \IR^{m_i\times m_i}_{>0}$ is the diagonal weight matrix, which is the identity matrix if no weights are specified.
Combine together
\[Y = X\beta + \epsilon\]where $N=\sum_{i=1}^n m_i$
the MLE estimate of $\beta$ is the WLSE
for REML estimate, first obtain the marginal likelihood of the variance parameter $\theta$ by integrating out the remaining parameters $\beta$ from the likelihood.
Prediction of conditional mean
since residuals can be correlated, potentially existing observed outcomes of the same individuals can be informative for predicting the unobserved values of the same individual.
assume that the data is sorted such that $Y_{ij} = y_{ij}, j=k+1,k+2,\ldots, p$ are observed and $Y_{ij}, j=1,2,\ldots, k$ are not. The special case of all outcomes being unobserved (new individual) is covered with $k=p$
Let further
\[\Sigma_i(X_i, \theta) = \begin{pmatrix} \Sigma_{i}^{new, new}(X_i, \theta) & \Sigma_i^{new, old}(X_i, \theta)\\ \Sigma_{i}^{old, new}(X_i, \theta) & \Sigma_i^{old, old}(X_i, \theta) \end{pmatrix}\]be a block decomposition.
Predictions can then be made based on the conditional distribution
Prediction Sampling for Prediction Interval
with the conditional variance formula
\[\Var(Y_i) = \Var(E(Y_i\mid \theta)) + E(Var(Y_i\mid \theta))\]Prediction of Conditional Mean for New Subjects
If there are no observations for a subject, then the prediction is quite simple.
\[Y_i = X_i\hat\beta\]An Example
> head(fev_data)
USUBJID AVISIT ARMCD RACE SEX FEV1_BL
1 PT1 VIS1 TRT Black or African American Female 25.27144
2 PT1 VIS2 TRT Black or African American Female 25.27144
3 PT1 VIS3 TRT Black or African American Female 25.27144
4 PT1 VIS4 TRT Black or African American Female 25.27144
5 PT2 VIS1 PBO Asian Male 45.02477
6 PT2 VIS2 PBO Asian Male 45.02477
FEV1 WEIGHT VISITN VISITN2
1 NA 0.6767231 1 -0.6264538
2 39.97105 0.8006186 2 0.1836433
3 NA 0.7087859 3 -0.8356286
4 20.48379 0.8088997 4 1.5952808
5 NA 0.4651848 1 0.3295078
> length(fev_data$FEV1)
[1] 800
> sum(is.na(fev_data$FEV1))
[1] 263
# the missing value for each visit
> table(fev_data$AVISIT[is.na(fev_data$FEV1)])
VIS1 VIS2 VIS3 VIS4
66 60 71 66
use an MMRM with the given covariates and an unstructured covariance matrix for the timepoints (AVISIT
) within the subjects (USUBJID
). By default, this uses restricted maximum likelihood (REML)
> fit
mmrm fit
Formula: FEV1 ~ RACE + SEX + ARMCD * AVISIT + us(AVISIT | USUBJID)
Data: fev_data (used 537 observations from 197 subjects with maximum 4
timepoints)
Covariance: unstructured (10 variance parameters)
Inference: REML
Deviance: 3386.45
Coefficients:
(Intercept) RACEBlack or African American
30.7774065 1.5305950
RACEWhite SEXFemale
5.6435679 0.3260274
ARMCDTRT AVISITVIS2
3.7744139 4.8396039
AVISITVIS3 AVISITVIS4
10.3421671 15.0537863
ARMCDTRT:AVISITVIS2 ARMCDTRT:AVISITVIS3
-0.0420899 -0.6938068
ARMCDTRT:AVISITVIS4
0.6241229
Model Inference Optimization:
Converged with code 0 and message: No message provided.
we can see the coefficients table with Satterthwaite degrees of freedom as well as the covariance matrix estimate
> summary(fit)
mmrm fit
Formula: FEV1 ~ RACE + SEX + ARMCD * AVISIT + us(AVISIT | USUBJID)
Data: fev_data (used 537 observations from 197 subjects with maximum 4
timepoints)
Covariance: unstructured (10 variance parameters)
Method: Satterthwaite
Vcov Method: Asymptotic
Inference: REML
Model selection criteria:
AIC BIC logLik deviance
3406.4 3439.3 -1693.2 3386.4
Coefficients:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 30.77741 0.88657 218.79000 34.715 < 2e-16
RACEBlack or African American 1.53059 0.62446 168.67000 2.451 0.015263
RACEWhite 5.64357 0.66559 157.14000 8.479 1.56e-14
SEXFemale 0.32603 0.53194 166.14000 0.613 0.540776
ARMCDTRT 3.77441 1.07416 145.55000 3.514 0.000589
AVISITVIS2 4.83960 0.80173 143.87000 6.036 1.27e-08
AVISITVIS3 10.34217 0.82269 155.56000 12.571 < 2e-16
AVISITVIS4 15.05379 1.31288 138.46000 11.466 < 2e-16
ARMCDTRT:AVISITVIS2 -0.04209 1.12933 138.56000 -0.037 0.970324
ARMCDTRT:AVISITVIS3 -0.69381 1.18764 158.17000 -0.584 0.559924
ARMCDTRT:AVISITVIS4 0.62412 1.85096 129.71000 0.337 0.736520
(Intercept) ***
RACEBlack or African American *
RACEWhite ***
SEXFemale
ARMCDTRT ***
AVISITVIS2 ***
AVISITVIS3 ***
AVISITVIS4 ***
ARMCDTRT:AVISITVIS2
ARMCDTRT:AVISITVIS3
ARMCDTRT:AVISITVIS4
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Covariance estimate:
VIS1 VIS2 VIS3 VIS4
VIS1 40.5544 14.3960 4.9760 13.3779
VIS2 14.3960 26.5714 2.7836 7.4773
VIS3 4.9760 2.7836 14.8980 0.9036
VIS4 13.3779 7.4773 0.9036 95.5565