High-dimensional linear mixed-effects model
This post is based on Li, S., Cai, T. T., & Li, H. (2019). Inference for high-dimensional linear mixed-effects models: A quasi-likelihood approach. arXiv:1907.06116 [stat].
- Application of linear mixed-effects models: analyzing clustered or repeated-measures data.
- Proposal:
  - a quasi-likelihood approach for estimation and inference of the unknown parameters in linear mixed-effects models with high-dimensional fixed effects, applicable to general settings where the cluster sizes may be large or unbalanced;
  - rate-optimal estimators and valid inference procedures that do not rely on assumptions about the specific structure of the variance components;
  - rate-optimal estimators of the variance components. Under proper conditions, the convergence rate for estimating the variance components of the random effects does not depend on the accuracy of the fixed-effects estimation.
Introduction
Linear mixed-effects models incorporate both fixed and random effects; the random effects induce correlations among the observations within each cluster and thereby accommodate the cluster structure.
A variety of statistical models and approaches have been proposed and studied for analyzing high-dimensional data, but most of them, such as linear models and generalized linear models, are restricted to independent observations.
Statistical inference for high-dimensional linear mixed-effects models remains a challenging problem.
Consider the following setting for clustered data:
- $i=1,\ldots,n$: the cluster index
- $y_i\in\bbR^{m_i}$: the response vector
- $X^i\in\bbR^{m_i\times p}$: the design matrix for the fixed effects
- $Z^i\in\bbR^{m_i\times q}$: the design matrix for the random effects
A linear mixed-effects model can typically be written as
\[y_i = X^i\beta^* + Z^i\gamma_i+\epsilon_i,\quad i=1,\ldots,n,\]
where
- $\beta^*\in\bbR^p$: the vector of true fixed effects
- $\gamma_i\in\bbR^q$: the vector of random effects of the $i$th cluster
- $\epsilon_i\in\bbR^{m_i}$: the noise vector.

$\gamma_i$ and $\epsilon_i$ are independent, with mean zero and covariance matrices $\Psi_{\eta^*}\in\bbR^{q\times q}$ and $\sigma_e^2 I_{m_i}$, respectively.
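Because $\gamma_i$ and $\epsilon_i$ are independent with mean zero, the random effects show up only in the second moments of $y_i$. A short derivation, treating the designs $X^i$ and $Z^i$ as fixed (or conditioning on them):

\[
\begin{aligned}
\mathbb{E}(y_i) &= X^i\beta^*,\\
\operatorname{Cov}(y_i) &= Z^i\operatorname{Cov}(\gamma_i)(Z^i)^\top+\operatorname{Cov}(\epsilon_i)
= Z^i\Psi_{\eta^*}(Z^i)^\top+\sigma_e^2 I_{m_i}.
\end{aligned}
\]

For example, in a random-intercept model ($q=1$, $Z^i=\mathbf{1}_{m_i}$, $\Psi_{\eta^*}=\psi$), any two distinct observations in the same cluster have covariance $\psi$ and correlation $\psi/(\psi+\sigma_e^2)$, which is how the random effects induce within-cluster correlation.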
- Fixed-dimensional setting: $p$, $q$, and $\{m_i\}_{i=1}^n$ are all fixed numbers.
- High-dimensional setting: $p$ is large and possibly much larger than the total sample size $N=\sum_{i=1}^n m_i$.
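To make the high-dimensional setting concrete, here is a small simulation from the model above with unbalanced cluster sizes and $p > N$. It is only an illustrative sketch: the Gaussian designs and errors, the choice of reusing the first $q$ fixed-effects columns as $Z^i$, and the helper names `simulate_lmm` and `marginal_cov` are our own assumptions, not part of the paper.

```python
import numpy as np


def simulate_lmm(m, p, q, beta, psi, sigma_e, seed=0):
    """Draw clusters (y_i, X^i, Z^i) from y_i = X^i beta + Z^i gamma_i + eps_i.

    Illustrative assumptions: Gaussian X^i, gamma_i ~ N(0, psi),
    eps_i ~ N(0, sigma_e^2 I), and Z^i equal to the first q columns of X^i.
    """
    rng = np.random.default_rng(seed)
    data = []
    for m_i in m:
        X_i = rng.standard_normal((m_i, p))
        Z_i = X_i[:, :q]                                  # random-effects design
        gamma_i = rng.multivariate_normal(np.zeros(q), psi)
        eps_i = sigma_e * rng.standard_normal(m_i)
        data.append((X_i @ beta + Z_i @ gamma_i + eps_i, X_i, Z_i))
    return data


def marginal_cov(Z_i, psi, sigma_e):
    """Cov(y_i) = Z^i Psi (Z^i)^T + sigma_e^2 I_{m_i}, for a fixed design."""
    return Z_i @ psi @ Z_i.T + sigma_e ** 2 * np.eye(Z_i.shape[0])


# Unbalanced clusters (sizes 2 to 25) with p = 200 > N = 40.
m = [2, 3, 10, 25]
p, q = 200, 2
beta = np.zeros(p)
beta[:3] = [1.0, -2.0, 0.5]                               # sparse fixed effects
data = simulate_lmm(m, p, q, beta, psi=0.5 * np.eye(q), sigma_e=1.0)
N = sum(len(y_i) for y_i, _, _ in data)
print(N, p > N)                                           # 40 True
```

The same `(y_i, X_i, Z_i)` tuple layout is a convenient container for any per-cluster computation, since cluster sizes differ and the data cannot be stacked into one rectangular array without extra bookkeeping.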
Related Literature
In the fixed-dimensional setting, many methods have been proposed to jointly estimate the fixed effects and the variance components:
- likelihood-based: MLE and restricted MLE, which
  - are not applicable in the high-dimensional setting;
  - rely heavily on the normality assumptions of the random components;
  - generally lead to a nonconvex optimization problem.
- moment-based.
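The likelihood-based criterion follows directly from the Gaussian marginal model $y_i\sim N\big(X^i\beta,\,Z^i\Psi(Z^i)^\top+\sigma_e^2 I_{m_i}\big)$, which is exactly where the normality assumption enters. A minimal sketch, assuming Gaussian random components; the function name `lmm_neg_loglik` is hypothetical. MLE minimizes this jointly over $(\beta,\Psi,\sigma_e^2)$, and the last lines check numerically that the objective is not convex even in $\sigma_e^2$ alone.

```python
import numpy as np


def lmm_neg_loglik(beta, psi, sigma2, data):
    """Gaussian negative log-likelihood summed over clusters (y_i, X_i, Z_i),
    with y_i ~ N(X_i beta, Z_i psi Z_i^T + sigma2 * I)."""
    total = 0.0
    for y_i, X_i, Z_i in data:
        m_i = len(y_i)
        V_i = Z_i @ psi @ Z_i.T + sigma2 * np.eye(m_i)   # marginal covariance
        r_i = y_i - X_i @ beta
        _, logdet = np.linalg.slogdet(V_i)
        quad = r_i @ np.linalg.solve(V_i, r_i)            # r_i^T V_i^{-1} r_i
        total += 0.5 * (m_i * np.log(2 * np.pi) + logdet + quad)
    return total


# One observation with unit residual and no random effect:
# f(s) = 0.5 * (log(2 pi) + log(s) + 1 / s), which is concave for s > 2.
one = [(np.array([1.0]), np.zeros((1, 1)), np.zeros((1, 1)))]


def f(s):
    return lmm_neg_loglik(np.zeros(1), np.zeros((1, 1)), s, one)


print(f(4.0) > 0.5 * (f(3.0) + f(5.0)))                  # True: midpoint convexity fails
```

The log-determinant term is concave in the covariance while the quadratic term is convex, so the sum is generally neither; this is the nonconvexity referred to in the list above.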
For statistical inference on the fixed effects, likelihood-ratio, score, and Wald tests are widely used, and these tests are also available for the variance components. However, since they take the MLEs or restricted MLEs as initial estimators, they inherit the drawbacks of the likelihood-based methods listed above.
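For intuition about these tests in the fixed-dimensional setting, the sketch below computes the generalized least squares (GLS) estimate of $\beta$ and a Wald statistic for $H_0:\beta_j=0$, treating the marginal covariances $V_i=Z^i\Psi(Z^i)^\top+\sigma_e^2 I_{m_i}$ as known. In practice $V_i$ is replaced by a plug-in from MLE or restricted MLE, which is how the likelihood-based drawbacks carry over. The function `gls_wald` is a hypothetical helper, not code from the paper.

```python
import math

import numpy as np


def gls_wald(data, V_list, j):
    """GLS estimate of beta and a two-sided Wald test of H0: beta_j = 0.

    data holds (y_i, X_i, Z_i) triples and V_list the matching marginal
    covariances V_i, which are treated as known here.
    """
    p = data[0][1].shape[1]
    A = np.zeros((p, p))                  # sum_i X_i^T V_i^{-1} X_i
    b = np.zeros(p)                       # sum_i X_i^T V_i^{-1} y_i
    for (y_i, X_i, _), V_i in zip(data, V_list):
        Vinv_X = np.linalg.solve(V_i, X_i)
        A += X_i.T @ Vinv_X
        b += Vinv_X.T @ y_i               # uses the symmetry of V_i
    cov = np.linalg.inv(A)                # Cov(beta_hat) when the V_i are correct
    beta_hat = cov @ b
    z = beta_hat[j] / math.sqrt(cov[j, j])
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return beta_hat, z, p_value


# Toy example: one cluster with V = I, so GLS reduces to ordinary least squares.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta_hat, z, p_value = gls_wald([(y, X, None)], [np.eye(3)], j=0)
print(beta_hat, z, p_value)
```

With $V_i=I_{m_i}$ this is ordinary least squares with a known unit noise variance; in the toy example the estimate is $(1,2)$ and the Wald statistic is $1/\sqrt{2/3}$.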
In the high-dimensional setting:
- Schelldorfer et al. (2011) assume fixed cluster sizes.
- Fan and Li (2012) study the selection of fixed and random effects in a high-dimensional linear mixed-effects model when the cluster sizes are balanced, i.e., $\max_i m_i / \min_i m_i<\infty$.
- Bradic et al. (2017) test a single coefficient of the fixed effects in high-dimensional linear mixed-effects models with fixed cluster sizes and sub-Gaussian designs.

All of them require the covariance matrix of the random effects to be positive definite. This condition requires prior knowledge of the existence of the random effects and can be hard to satisfy in applications. Moreover, the optimal convergence rate of parameter estimation remains unknown.
The problems of estimation and inference for the fixed effects in linear mixed-effects models are closely connected with the literature on high-dimensional linear models:
- many penalized methods have been proposed for prediction, estimation, and variable selection in high-dimensional linear models;
- statistical inference on a low-dimensional component of high-dimensional regression coefficients has been studied in linear models and generalized linear models using “debiased” estimators;
- the idea of debiasing has also been extended to other statistical problems, such as inference in the Cox model and simultaneous inference.
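Since these tools for independent observations are the starting point for the mixed-effects setting, here is a minimal sketch of a penalized estimator and a debiased estimate of one coefficient in a high-dimensional linear model. It assumes i.i.d. errors, a Lasso fitted by cyclic coordinate descent, and a one-coordinate debiasing step whose projection direction is the residual of a Lasso regression of $X_j$ on the other columns. This is one common construction chosen for illustration, not the specific estimator of the paper; the function names and the tuning value $\lambda=\sqrt{2\log p/n}$ are assumptions.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=100):
    """Lasso for (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    r = np.asarray(y, dtype=float).copy()         # current residual y - X b
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            r += X[:, j] * (b[j] - new)           # keep the residual in sync
            b[j] = new
    return b


def debiased_coef(X, y, j, lam, lam_node):
    """One-step bias correction of the Lasso estimate of coefficient j.

    z is the residual of a Lasso regression of X_j on the other columns;
    adding z^T (y - X b) / z^T X_j corrects the first-order shrinkage bias.
    """
    n = X.shape[0]
    b = lasso_cd(X, y, lam)
    X_rest = np.delete(X, j, axis=1)
    z = X[:, j] - X_rest @ lasso_cd(X_rest, X[:, j], lam_node)
    resid = y - X @ b
    b_j = b[j] + z @ resid / (z @ X[:, j])
    # Plug-in noise level and standard error, assuming i.i.d. errors.
    sigma = np.sqrt(resid @ resid / max(n - np.count_nonzero(b), 1))
    se = sigma * np.sqrt(z @ z) / abs(z @ X[:, j])
    return b_j, se


rng = np.random.default_rng(0)
n, p = 200, 300                                   # p > n
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + rng.standard_normal(n)
lam = np.sqrt(2 * np.log(p) / n)                  # a common heuristic scale
b_0, se_0 = debiased_coef(X, y, j=0, lam=lam, lam_node=lam)
print(b_0, se_0)
```

An approximate 95% interval is then $\hat b_0 \pm 1.96\,\widehat{\mathrm{se}}_0$; with clustered data the i.i.d. error assumption behind this standard error no longer holds, which is the gap the mixed-effects setting has to address.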
Paper’s contributions
The paper develops methods for inference on the unknown parameters in high-dimensional linear mixed-effects models that are broadly applicable and computationally efficient:
- the proposed methods are computationally fast and stable;
- the proposed estimator of the fixed effects is rate-optimal in the minimax sense under general conditions;
- the variance components can be estimated using any consistent estimator of the fixed effects.
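To illustrate why a consistent estimator of $\beta^*$ can serve as a plug-in for the variance components, the sketch below forms residuals $r_i=y_i-X^i\hat\beta$ in the special case of a random-intercept model ($q=1$, $Z^i=\mathbf{1}_{m_i}$, $\Psi_{\eta^*}=\psi$) and matches moments. Within-cluster deviations do not involve $\gamma_i$, so $\mathbb{E}\sum_j (r_{ij}-\bar r_i)^2=(m_i-1)\sigma_e^2$, while $\mathbb{E}\,\bar r_i^2=\psi+\sigma_e^2/m_i$ when $\hat\beta=\beta^*$. This is a textbook-style moment estimator chosen for illustration, not the paper's quasi-likelihood estimator, and `random_intercept_moments` is a hypothetical name.

```python
import numpy as np


def random_intercept_moments(data, beta_hat):
    """Moment estimates (psi_hat, sigma2_hat) for y_i = X_i beta + gamma_i * 1 + eps_i.

    data holds (y_i, X_i) pairs; clusters of size 1 add nothing to the
    within-cluster sum, so at least one cluster must have m_i >= 2.
    """
    within_ss, within_df = 0.0, 0
    means, sizes = [], []
    for y_i, X_i in data:
        r_i = y_i - X_i @ beta_hat                  # plug-in residuals
        means.append(r_i.mean())
        sizes.append(len(r_i))
        within_ss += ((r_i - r_i.mean()) ** 2).sum()   # gamma_i cancels here
        within_df += len(r_i) - 1
    sigma2_hat = within_ss / within_df
    means, sizes = np.array(means), np.array(sizes)
    psi_hat = max(float(np.mean(means ** 2 - sigma2_hat / sizes)), 0.0)  # truncate at 0
    return psi_hat, sigma2_hat


# Simulated check with psi = 1, sigma_e^2 = 0.5 and unbalanced clusters of size 2 to 6.
rng = np.random.default_rng(1)
beta = np.array([1.0, -1.0, 0.5])
data = []
for _ in range(2000):
    m_i = int(rng.integers(2, 7))
    X_i = rng.standard_normal((m_i, 3))
    y_i = X_i @ beta + rng.normal(0.0, 1.0) + rng.normal(0.0, np.sqrt(0.5), m_i)
    data.append((y_i, X_i))

# Any consistent estimator works as the plug-in; here, pooled OLS ignoring clustering.
X_all = np.vstack([X_i for _, X_i in data])
y_all = np.concatenate([y_i for y_i, _ in data])
beta_hat = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
print(random_intercept_moments(data, beta_hat))
```

Pooled OLS ignores the within-cluster correlation and is therefore inefficient, but it is still consistent here, which is all the plug-in step needs in this simple example.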