# MMRM: Mixed-Models for Repeated Measures


This post is based on the vignettes of the mmrm R package: https://openpharma.github.io/mmrm/main/index.html

Laird and Ware (1982) introduced the basic linear mixed-effects model for a single level of grouping as any model that expresses an $n_i$-dimensional response (column) vector $y_i$ for the $i$-th subject (or, more generally, group or unit) as

\[y_i = X_i\beta + Z_ib_i + \epsilon_i, \quad i=1,\ldots,M, \\ b_i\sim N(0, \Psi), \quad \epsilon_i\sim N(0, \sigma^2 I).\]

The MMRM instead directly models the variance-covariance structure of the response, i.e., it accounts for within-subject dependencies using the within-group component $\Lambda_i$.

This yields the MMRM with

\[y_i = X_i\beta + \epsilon_i, \quad \epsilon_i\sim N(0, \Sigma_i), \quad i=1,\ldots,M.\]

In total,

\[Y = X\beta + \epsilon\,,\]where $\epsilon\sim N(0, \Omega)$ with $\Omega\in \IR^{N\times N}$ being a block-diagonal matrix, containing the subject-specific $\Sigma_i$ on the diagonal.

## Covariance Structures

Many covariance structures use the decomposition

\[\Sigma = DPD\]where the diagonal standard deviation matrix is

$D = \diag(\sigma_1,\ldots, \sigma_m)$ with entries $\sigma_i > 0$, and the symmetric correlation matrix is $P = (\rho_{ij})$ with entries $\rho_{ij}\in (-1, 1)$.

- for the standard deviations $\sigma_i$, use the natural logarithm $\log(\sigma_i)$ to map them to $\IR$
- for correlation parameters $\rho$, use the transformation

\[\theta = f(\rho) = \frac{\rho}{\sqrt{1-\rho^2}}\,,\]

which maps the correlation parameter to $\theta\in \IR$. It has the inverse

\[\rho = f^{-1}(\theta) = \frac{\theta}{\sqrt{1+\theta^2}}.\]

### Unstructured (`us`)

All entries of $\Sigma$ are estimated freely, so we have $k=m(m+1)/2$ variance parameters: $m$ standard deviations and $m(m-1)/2$ correlations.
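
The variance-parameter transformations above can be checked numerically. A minimal sketch in plain Python (illustrative helpers, not part of the mmrm package):

```python
import math

def rho_to_theta(rho):
    # map a correlation in (-1, 1) to the unconstrained real line
    return rho / math.sqrt(1 - rho**2)

def theta_to_rho(theta):
    # inverse map from the real line back to (-1, 1)
    return theta / math.sqrt(1 + theta**2)

def n_params_unstructured(m):
    # m log-standard-deviations plus m*(m-1)/2 correlations
    return m * (m + 1) // 2

theta = rho_to_theta(0.7)
print(round(theta_to_rho(theta), 10))  # round trip recovers 0.7
print(n_params_unstructured(4))        # 10 parameters for m = 4
```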

### Homogeneous (`ad`) and Heterogeneous Ante-dependence (`adh`)

The ante-dependence correlation structure (Gabriel 1962) is useful for balanced designs where the observations are not necessarily equally spaced in time.

We use an order-one ante-dependence model, where the correlation matrix $P$ has elements

\[\rho_{ij} = \prod_{k=i}^{j-1}\rho_k\,.\]

Then, for the heterogeneous case,

\[\theta = (\log(\sigma_1),\ldots, \log(\sigma_m), f(\rho_1), \ldots, f(\rho_{m-1})),\]giving in total $2m-1$ variance parameters; the homogeneous case uses a single $\sigma$, giving $m$ parameters.
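
The product formula for $\rho_{ij}$ can be sketched in plain Python (illustrative only, not mmrm code):

```python
def ad_corr(rhos):
    # rhos: correlations (rho_1, ..., rho_{m-1}) between adjacent time points
    m = len(rhos) + 1
    P = [[1.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            prod = 1.0
            for k in range(i, j):      # rho_ij = prod_{k=i}^{j-1} rho_k
                prod *= rhos[k]
            P[i][j] = P[j][i] = prod
    return P

P = ad_corr([0.8, 0.5])
print(P[0][2])  # 0.8 * 0.5 = 0.4: correlation decays with separation
```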

### Homogeneous (`toep`) and Heterogeneous Toeplitz (`toeph`)

Toeplitz matrices are diagonal-constant matrices. The correlation between two time points only depends on the distance between them, i.e., $\rho_{ij}=\rho_{\vert i-j\vert }$.

Then, for the heterogeneous case,

\[\theta = (\log(\sigma_1),\ldots, \log(\sigma_m), f(\rho_1), \ldots, f(\rho_{m-1})),\]giving in total $2m-1$ variance parameters; the homogeneous case uses a single $\sigma$, giving $m$ parameters.
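
The diagonal-constant property is easy to see in a small sketch (plain Python, illustrative only):

```python
def toeplitz_corr(rhos):
    # rhos: (rho_1, ..., rho_{m-1}); rho_ij depends only on |i - j|
    m = len(rhos) + 1
    return [[1.0 if i == j else rhos[abs(i - j) - 1] for j in range(m)]
            for i in range(m)]

P = toeplitz_corr([0.6, 0.3, 0.1])
print(P[0][1], P[1][2], P[0][3])  # 0.6 0.6 0.1  (constant along diagonals)
```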

### Homogeneous (`cs`) and Heterogeneous Compound Symmetry (`csh`)

The compound symmetry covariance structures assume a constant correlation between different time points,

\[\rho_{ij} = \rho\,.\]

In the heterogeneous case there are in total $k=m+1$ variance parameters ($m$ standard deviations and one correlation); the homogeneous case has $k=2$.
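
A sketch of the heterogeneous compound symmetry covariance via the decomposition $\Sigma = DPD$ (plain Python, illustrative only):

```python
def csh_cov(sds, rho):
    # heterogeneous compound symmetry: per-time-point standard
    # deviations sds and a single constant correlation rho
    m = len(sds)
    return [[sds[i] * sds[j] * (1.0 if i == j else rho) for j in range(m)]
            for i in range(m)]

S = csh_cov([1.0, 2.0, 3.0], 0.5)
print(S[0][0], S[1][1], S[0][1])  # 1.0 4.0 1.0  (variances on the diagonal)
```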

### Spatial Covariance Structure: Spatial Exponential (`sp_exp`)

Spatial covariance structures, unlike the other covariance structures, do not require that the time points are consistent between subjects. Instead, as long as the distance between visits can be quantified in terms of time and/or other coordinates, the spatial covariance structure can be applied.

For the spatial exponential structure, the correlation is defined as follows:

\[\rho_{ij} = \rho^{d_{ij}}\]where $d_{ij}$ is the distance between time point $i$ and time point $j$.

In total only $k=2$ parameters are needed: one standard deviation and the correlation $\rho$.
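
A sketch of the spatial exponential correlation from arbitrary (possibly unequally spaced) visit times, in plain Python (illustrative only):

```python
def sp_exp_corr(times, rho):
    # spatial exponential: rho_ij = rho ** |t_i - t_j|
    return [[rho ** abs(ti - tj) for tj in times] for ti in times]

# visits need not be equally spaced or shared across subjects
P = sp_exp_corr([0.0, 1.0, 3.5], 0.8)
print(round(P[0][1], 4), round(P[1][2], 4))  # 0.8 and 0.8**2.5
```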

## Hypothesis Testing

### Contained Effect

An effect $E_2$ is said to contain an effect $E_1$ if

- the involved numerical covariates are identical for these two effects
- all the categorical covariates associated with $E_1$ are also associated with $E_2$

For example, for the model formula

\[Y = A + B + A * B\,,\]

using $E_A, E_B$ and $E_{A*B}$ to denote the effects of $A, B$ and $A * B$ respectively, we have:

- if $A$ and $B$ are both categorical, then $E_{A*B}$ contains $E_A$ and $E_B$
- if $A$ and $B$ are both numerical, then $E_{A*B}$ does not contain $E_A$ or $E_B$
- if $A$ is categorical and $B$ is numerical, then $E_{A*B}$ contains $E_B$ but not $E_A$
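
The containment rule can be sketched as a small predicate; representing an effect as its sets of numerical and categorical covariates is my own illustration, not mmrm's internal representation:

```python
def contains(e2, e1):
    # an effect is a pair (numerical covariates, categorical covariates)
    num2, cat2 = e2
    num1, cat1 = e1
    # E2 contains E1 iff the numerical covariates are identical and the
    # categorical covariates of E1 are a subset of those of E2
    return num1 == num2 and cat1 <= cat2

f = frozenset
# A and B categorical: A*B contains both A and B
print(contains((f(), f("AB")), (f(), f("A"))))    # True
# A and B numerical: A*B contains neither
print(contains((f("AB"), f()), (f("A"), f())))    # False
# A categorical, B numerical: A*B contains E_B but not E_A
print(contains((f("B"), f("A")), (f("B"), f())))  # True
print(contains((f("B"), f("A")), (f(), f("A"))))  # False
```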

## Model Fitting Algorithm

Let $i=1,\ldots, n$ denote the subjects, for each of which we observe $m_i$ observations $j=1,\ldots,m_i$ at time points $t_{ij}\in \{t_1,\ldots,t_m\}$.

Note that the number of time points for a specific subject $m_i$ can be smaller than $m$, when only a subset of the possible $m$ time points have been observed.

### Linear Model

For each subject $i$ we observe a vector $Y_i = (y_{i1}, \ldots, y_{im_i})^\top \in \IR^{m_i}$

and given a design matrix $X_i \in \IR^{m_i\times p}$

and a corresponding coefficient vector $\beta \in \IR^p$, we assume that the observations are multivariate normally distributed:

\[Y_i \sim N(X_i\beta, \Sigma_i)\]where the covariance matrix $\Sigma_i\in \IR^{m_i\times m_i}$ is derived by subsetting the overall covariance matrix $\Sigma \in \IR^{m\times m}$ appropriately by

\[\Sigma_i = G_i^{-1/2} S_i^\top\Sigma S_iG_i^{-1/2}\]where the subsetting matrix $S_i \in \{0, 1\}^{m\times m_i}$. For example, if a subject was observed at time points 1, 3, and 4 out of 5 in total, then the subsetting matrix is

\[S_i = \begin{bmatrix} 1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{bmatrix}\]$G_i \in \IR^{m_i\times m_i}_{>0}$ is the diagonal weight matrix, which is the identity matrix if no weights are specified.
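
The subsetting operation $S_i^\top \Sigma S_i$ just picks out the observed rows and columns of $\Sigma$; a plain-Python sketch (illustrative only):

```python
def subset_matrix(sigma, observed):
    # equivalent to S_i^T Sigma S_i, where S_i has a 1 in row t, column k
    # for the k-th observed time point t (0-based indices here)
    return [[sigma[s][t] for t in observed] for s in observed]

m = 5
# a recognizable test matrix: entry (i, j) reads as "row, column" in digits
sigma = [[10 * (i + 1) + (j + 1) for j in range(m)] for i in range(m)]
# subject observed at time points 1, 3, 4 (1-based) -> indices 0, 2, 3
sigma_i = subset_matrix(sigma, [0, 2, 3])
print(sigma_i[0])  # [11, 13, 14]
```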

Combining all subjects,

\[Y = X\beta + \epsilon\,,\]where $Y\in\IR^N$ with $N=\sum_{i=1}^n m_i$ and $\epsilon\sim N(0, \Omega)$ as above.

The maximum likelihood estimate of $\beta$ is the weighted least squares estimate

\[\hat\beta = (X^\top \Omega^{-1} X)^{-1} X^\top \Omega^{-1} Y\,.\]

For the REML estimate, one first obtains the marginal likelihood of the variance parameters $\theta$ by integrating the remaining parameters $\beta$ out of the likelihood.

## Prediction of conditional mean

Since residuals can be correlated, observed outcomes of an individual can be informative for predicting the unobserved values of the same individual.

Assume that the data are sorted such that $Y_{ij} = y_{ij}, j=k+1,k+2,\ldots, p$ are observed and $Y_{ij}, j=1,2,\ldots, k$ are not. The special case of all outcomes being unobserved (a new individual) is covered by $k=p$.

Let further

\[\Sigma_i(X_i, \theta) = \begin{pmatrix} \Sigma_{i}^{new, new}(X_i, \theta) & \Sigma_i^{new, old}(X_i, \theta)\\ \Sigma_{i}^{old, new}(X_i, \theta) & \Sigma_i^{old, old}(X_i, \theta) \end{pmatrix}\]be a block decomposition.

Predictions can then be made based on the conditional distribution of a multivariate normal,

\[Y_i^{new}\mid Y_i^{old} = y_i^{old} \sim N\bigl(\mu^{new} + \Sigma_i^{new,old}(\Sigma_i^{old,old})^{-1}(y_i^{old} - \mu^{old}),\; \Sigma_i^{new,new} - \Sigma_i^{new,old}(\Sigma_i^{old,old})^{-1}\Sigma_i^{old,new}\bigr)\,,\]where $\mu^{new}$ and $\mu^{old}$ are the corresponding subvectors of $X_i\beta$.
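
In the bivariate case, Gaussian conditioning reduces to a scalar formula; a sketch assuming one observed and one unobserved time point (plain Python, illustrative only):

```python
def cond_mean_bivariate(mu_new, mu_old, cov_new_old, var_old, y_old):
    # E(Y_new | Y_old = y_old) for a bivariate normal:
    # mu_new + cov_new_old / var_old * (y_old - mu_old)
    return mu_new + cov_new_old / var_old * (y_old - mu_old)

# with positive covariance, observing y_old above its mean pulls the
# prediction for the unobserved time point above its marginal mean
print(cond_mean_bivariate(1.0, 2.0, 0.5, 1.0, 3.0))  # 1.0 + 0.5 * 1.0 = 1.5
```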

### Prediction Sampling for Prediction Interval

with the conditional variance formula

\[\Var(Y_i) = \Var(E(Y_i\mid \theta)) + E(\Var(Y_i\mid \theta))\,.\]

### Prediction of Conditional Mean for New Subjects

If there are no observations for a subject, then the prediction is simply the estimated mean:

\[\hat Y_i = X_i\hat\beta\,.\]