WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

High-dimensional linear mixed-effect model

Tags: High-Dimensional, Random Effects

This post is based on Li, S., Cai, T. T., & Li, H. (2019). Inference for high-dimensional linear mixed-effects models: A quasi-likelihood approach. ArXiv:1907.06116 [Stat].

  • Application of linear mixed-effects models: analyzing clustered or repeated-measures data
  • Proposal:
    • a quasi-likelihood approach for estimation and inference of the unknown parameters in linear mixed-effects models with high-dimensional fixed effects. (applicable to general settings where the cluster sizes are possibly large or unbalanced.)
    • rate optimal estimators and valid inference procedures that are free of the assumptions on the specific structure of the variance components
    • derive the rate optimal estimators of the variance components. Under proper conditions, the convergence rate for estimating the variance components of the random effects does not depend on the accuracy of fixed-effects estimation.

Introduction

Linear mixed-effects models incorporate both fixed and random effects, where the random effects induce correlations among the observations within each cluster and thereby accommodate the cluster structure.

A variety of statistical models and approaches have been proposed and studied for analyzing high-dimensional data. But most of them are restricted to dealing with independent observations, such as linear models and generalized linear models.

Statistical inference for high-dimensional linear mixed-effects models remains a challenging problem.

Consider the setting for clustered data,

  • $i=1,\ldots,n$: the clustering index
  • $y_i\in\bbR^{m_i}$: response vector
  • $X^i\in\bbR^{m_i\times p}$: a design matrix for the fixed effects
  • $Z^i\in\bbR^{m_i\times q}$: a design matrix for the random effects

A linear mixed-effects model can typically be written as

\[y_i = X^i\beta^* + Z^i\gamma_i+\epsilon_i,\;i=1,\ldots,n\,,\]

where

  • $\beta^*\in\bbR^p$: the vector of true fixed effects
  • $\gamma_i\in\bbR^q$: the vector of the random effects of the $i$-th cluster
  • $\epsilon_i$: noise vector.
  • $\gamma_i$ and $\epsilon_i$ are independently distributed with mean zero and covariance matrices $\Psi_{\eta^*}\in\bbR^{q\times q}$ and $\sigma_e^2I_{m_i}$, respectively
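
As a quick check of how the random effects induce within-cluster correlation, the first two marginal moments of $y_i$ can be worked out directly. This is a sketch under two assumptions not spelled out above: the design matrices $X^i, Z^i$ are treated as fixed, and $\gamma_i$ is independent of $\epsilon_i$. Here $\sigma_e^2I_{m_i}$ denotes the covariance of the noise vector.

\[\mathrm{E}[y_i] = X^i\beta^*,\qquad \mathrm{Cov}(y_i) = Z^i\Psi_{\eta^*}(Z^i)^\top + \sigma_e^2I_{m_i}\,.\]

The term $Z^i\Psi_{\eta^*}(Z^i)^\top$ is generally not diagonal, which is what couples the observations within a cluster; if $\Psi_{\eta^*}=0$ the model reduces to an ordinary linear model with independent errors.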

  • fixed-dimensional setting: if $p,q$ and $\{m_i\}_{i=1}^n$ are all fixed numbers
  • high-dimensional setting: if $p$ is large and possibly much larger than $N=\sum_{i=1}^nm_i$
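
To make the notation concrete, here is a minimal simulation sketch of the clustered model in the high-dimensional regime ($p > N$) with unbalanced cluster sizes. It is only an illustration: the function name `simulate_lmm`, the Gaussian draws, the sparse `beta`, and all numeric values are assumptions made for this example (the model above only specifies means and covariances).

```python
import numpy as np

def simulate_lmm(cluster_sizes, beta, Psi, sigma_e, seed=0):
    """Draw clusters from y_i = X^i beta + Z^i gamma_i + eps_i.

    Hypothetical helper for illustration; Gaussian designs, random
    effects and noise are assumptions, not part of the model definition.
    """
    rng = np.random.default_rng(seed)
    p, q = beta.shape[0], Psi.shape[0]
    clusters = []
    for m_i in cluster_sizes:
        X = rng.standard_normal((m_i, p))                  # fixed-effects design
        Z = rng.standard_normal((m_i, q))                  # random-effects design
        gamma = rng.multivariate_normal(np.zeros(q), Psi)  # cluster-level random effect
        eps = sigma_e * rng.standard_normal(m_i)           # noise with covariance sigma_e^2 I
        clusters.append((X @ beta + Z @ gamma + eps, X, Z))
    return clusters

# Unbalanced clusters with p much larger than N = sum of cluster sizes.
cluster_sizes = [3, 5, 8, 4]
p = 50
beta = np.zeros(p)
beta[:3] = [1.0, -2.0, 0.5]              # sparse fixed effects (assumed)
Psi = np.array([[1.0, 0.3], [0.3, 0.5]])  # q = 2 random-effects covariance (assumed)

clusters = simulate_lmm(cluster_sizes, beta, Psi, sigma_e=1.0)
N = sum(cluster_sizes)
print(f"n = {len(clusters)} clusters, N = {N}, p = {p}")
```

Each element of `clusters` is a tuple `(y_i, X^i, Z^i)`, so the per-cluster shapes are `(m_i,)`, `(m_i, p)` and `(m_i, q)`.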

In the fixed-dimensional setting, many methods have been proposed to jointly estimate the fixed effects and variance components.

  • likelihood-based: MLE and restricted MLE
    • not applicable for high-dimensional
    • rely heavily on the normality assumptions of the random components
    • generally lead to a nonconvex optimization problem
  • moment-based

In terms of statistical inference on the fixed effects, the likelihood ratio, score, and Wald tests are broadly used. These tests are also available for the variance components. However, because they use the MLEs or restricted MLEs as initial estimators, they suffer from the same drawbacks as the likelihood-based methods.

In the high-dimensional setting,

  • Schelldorfer et al. (2011): assume fixed cluster sizes.
  • Fan and Li (2012): study the selection of fixed effects and random effects in a high-dimensional linear mixed-effects model when cluster sizes are balanced, $\max_i m_i / \min_i m_i<\infty$.
  • Bradic et al. (2017): test a single coefficient of the fixed effects in high-dimensional linear mixed-effects models with fixed cluster sizes and sub-Gaussian designs.
  • All of them require positive definiteness of the covariance matrix of the random effects. This condition requires prior knowledge of the existence of the random effects and can be hard to fulfill in applications.

The optimal convergence rate of parameter estimation remains unknown.

The problems of estimation and inference of the fixed effects in linear mixed-effects models are well connected with the literature on high-dimensional linear models.

  • many penalized methods have been proposed for prediction, estimation, and variable selection in high-dimensional linear models.
  • statistical inference on a low-dimensional component of high-dimensional regression coefficients has been considered and studied in linear models and generalized linear models with “debiased” estimators.
  • the idea of debiasing has also been studied and extended to solve other statistical problems, such as statistical inference in Cox model and simultaneous inference.

Paper’s contributions

Develop a method for inference for the unknown parameters in high-dimensional linear mixed-effects models with general applicability and computational efficiency.

  • the proposed methods are computationally fast and stable
  • the proposed estimator for the fixed effects is rate optimal from the minimax perspective under general conditions.
  • propose to estimate the variance components with any consistent estimators of the fixed effects.
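
To illustrate the idea of plugging a consistent fixed-effects estimate into variance-component estimation, here is a toy method-of-moments sketch. It is not the paper's quasi-likelihood procedure: it assumes a random-intercept model ($q=1$, $Z^i$ a vector of ones, so $\Psi_{\eta^*}$ is a scalar `psi`), balanced clusters, Gaussian data, and a low-dimensional $p$ so that pooled least squares is a consistent stand-in for a high-dimensional estimator. All names and values are hypothetical. It uses the fact that, for residuals $r_i = y_i - X^i\hat\beta$, $\mathrm{E}[r_{ij}r_{ik}]\approx\psi$ for $j\neq k$ and $\mathrm{E}[r_{ij}^2]\approx\psi+\sigma_e^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p = 2000, 4, 3                  # clusters, cluster size, fixed effects (toy values)
beta = np.array([1.0, -2.0, 0.5])
psi, sigma_e2 = 0.8, 0.5              # true random-intercept and noise variances (assumed)

# Simulate a random-intercept model: y_ij = x_ij' beta + gamma_i + eps_ij.
X = rng.standard_normal((n, m, p))
gamma = np.sqrt(psi) * rng.standard_normal((n, 1))
eps = np.sqrt(sigma_e2) * rng.standard_normal((n, m))
y = X @ beta + gamma + eps

# Any consistent estimator of beta could be plugged in here; pooled OLS is a
# stand-in that is only appropriate because p is small in this toy example.
beta_hat, *_ = np.linalg.lstsq(X.reshape(n * m, p), y.reshape(n * m), rcond=None)
resid = y - X @ beta_hat              # shape (n, m)

# Within-cluster cross products: sum_{j != k} r_j r_k = (sum_j r_j)^2 - sum_j r_j^2.
sq = resid ** 2
cross = (resid.sum(axis=1) ** 2 - sq.sum(axis=1)) / (m * (m - 1))
psi_hat = cross.mean()                # estimates psi
sigma_e2_hat = sq.mean() - psi_hat    # estimates sigma_e^2

print(f"psi_hat = {psi_hat:.3f}, sigma_e2_hat = {sigma_e2_hat:.3f}")
```

The variance estimates depend on $\hat\beta$ only through the residuals, which is why any sufficiently accurate fixed-effects estimator can be used in this step.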
