WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Optimal estimation of functionals of high-dimensional mean and covariance matrix

Tags: Minimax, LDA

This post is based on Fan, J., Weng, H., & Zhou, Y. (2019). Optimal estimation of functionals of high-dimensional mean and covariance matrix. ArXiv:1908.07460 [Math, Stat].


  • Estimate a functional $\mu'\Sigma^{-1}\mu$ involving both the mean vector $\mu$ and the covariance matrix $\Sigma$.
  • Study the minimax estimation of the functional in the high-dimensional setting where $\Sigma^{-1}\mu$ is sparse.


Many new theories and methods have been proposed to address the challenges arising from high dimensionality.

  • Various regularization techniques have been proposed to estimate large covariance matrices under different structural assumptions such as sparsity, conditional sparsity, and smoothness.
  • Two review papers: TODO

Let $\x_i\in\IR^p$ be i.i.d. random vectors with $\bbE(\x_i)=\mu$ and $\Cov(\x_i)=\Sigma$.

Primary goal: estimate the functional $\mu'\Sigma^{-1}\mu$ based on the observations $\{\x_i\}_{i=1}^n$, under the assumption that $\Sigma^{-1}\mu$ is (approximately) sparse.
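As a point of reference, the naive plug-in estimate in the classical regime $n > p$ can be sketched as follows. This is only an illustration of the target quantity, not the estimator studied in the paper; the function name `plugin_quadratic_functional` and the simulated Gaussian data are assumptions made for this example.

```python
import numpy as np

def plugin_quadratic_functional(X):
    """Naive plug-in estimate of mu' Sigma^{-1} mu from the rows of X (n x p, n > p)."""
    mu_hat = X.mean(axis=0)
    sigma_hat = np.cov(X, rowvar=False)  # sample covariance, shape (p, p)
    # Solve sigma_hat @ beta = mu_hat rather than forming an explicit inverse.
    beta_hat = np.linalg.solve(sigma_hat, mu_hat)
    return float(mu_hat @ beta_hat)

# Hypothetical low-dimensional example: mu = e_1, Sigma = I, so the true value is 1.
rng = np.random.default_rng(0)
p, n = 5, 2000
mu = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
Sigma = np.eye(p)
X = rng.multivariate_normal(mu, Sigma, size=n)

theta_hat = plugin_quadratic_functional(X)
theta_true = float(mu @ np.linalg.solve(Sigma, mu))
print(theta_hat, theta_true)
```

With $n$ much larger than $p$ the plug-in value lands close to the truth; the difficulty discussed in this post arises when $p$ is comparable to or larger than $n$.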

Further goal: establish the optimal rate of convergence for estimating $\mu'\Sigma^{-1}\mu$, that is, the minimax estimation rate of the functional in the high-dimensional multivariate problem.

Preliminaries and Examples

Suppose that $\x_1,\ldots,\x_n$ are independent copies of $\x \in\IR^p$ with $\bbE(\x)=\mu$ and $\Cov(\x)=\Sigma$. Throughout the paper, assume $\x$ is sub-gaussian. That is, $\x = \Sigma^{1/2}y+\mu$ and the zero-mean isotropic random vector $y$ satisfies

with $\nu > 0$ being a constant.
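For reference, one standard way to formalize sub-gaussianity of a zero-mean isotropic vector $y$ with parameter $\nu$ is a bound on the moment generating function of every one-dimensional projection. This is an assumed textbook form; the paper's exact condition may be stated differently (for example, as a tail bound).

$$
\bbE\exp(t\,u'y)\le \exp\left(\frac{\nu^2t^2}{2}\right)\quad\text{for all } t\in\IR \text{ and all } u\in\IR^p \text{ with } \Vert u\Vert_2=1\,.
$$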

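Study the estimation problem under the minimax framework. The central goal is to characterize the minimax rate of the estimation error given by $$ \inf_{\hat\theta}\sup_{(\mu,\Sigma)\in \cH}\bbE\vert\hat\theta-\mu'\Sigma^{-1}\mu\vert\,, $$ where the infimum is taken over all estimators $\hat\theta$ and $\cH$ is a parameter space.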

Consider $\cH=\{(\mu,\Sigma):\mu'\Sigma^{-1}\mu\le c\}$, where $c > 0$ is a fixed constant. If $p \ge n^2$, it holds that $$ \inf_{\hat\theta}\sup_{(\mu,\Sigma)\in \cH}\bbE\vert\hat\theta-\mu'\Sigma^{-1}\mu\vert \ge \tilde c\,, $$ where $\tilde c>0$ is a constant that depends on $c$.

This lower bound shows that it is impossible to consistently estimate the functional $\mu'\Sigma^{-1}\mu$ under the scaling $p\ge n^2$, which is not uncommon in high-dimensional problems. To overcome the difficulty, we need a more structured parameter space.
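Some intuition for why an unstructured parameter space is hopeless in high dimensions: as soon as $p \ge n$, the sample covariance is singular, so even the naive plug-in $\hat\mu'\hat\Sigma^{-1}\hat\mu$ is undefined. The sketch below only illustrates this rank deficiency on simulated data (the sizes `n` and `p` are arbitrary choices); it is not a proof of the lower bound above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 50                         # more variables than observations
X = rng.standard_normal((n, p))       # rows are observations

sigma_hat = np.cov(X, rowvar=False)   # p x p sample covariance
rank = np.linalg.matrix_rank(sigma_hat)

# Centering removes one degree of freedom, so the rank is at most n - 1 < p
# and sigma_hat cannot be inverted.
print(rank, p)
```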

High-dimensional LDA

The classical LDA procedure approximates Fisher’s rule by replacing the unknown parameters $\mu_1,\mu_2,\Sigma$ by their sample versions.

However, in high-dimensional settings, standard LDA can perform no better than random guessing. Various high-dimensional LDA approaches have been proposed under sparsity assumptions on $\mu_1-\mu_2$ or $\Sigma$. An alternative approach to sparse linear discriminant analysis imposes sparsity directly on $\Sigma^{-1}(\mu_1-\mu_2)$, based on the key observation that Fisher’s rule depends on $\mu_1-\mu_2$ and $\Sigma$ only through the product $\Sigma^{-1}(\mu_1-\mu_2)$.
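A minimal sketch of the classical (low-dimensional) plug-in procedure, assuming two classes with equal priors and a common covariance: a point $x$ is assigned to class 1 when $(x-(\mu_1+\mu_2)/2)'\Sigma^{-1}(\mu_1-\mu_2)>0$. The helper names `fit_lda_direction` and `classify`, the pooled covariance estimate, and the simulated data are choices made for this illustration, not taken from the paper.

```python
import numpy as np

def fit_lda_direction(X1, X2):
    """Plug-in estimates of the discriminant direction and the class midpoint."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    # Pooled sample covariance of the two classes.
    S = ((n1 - 1) * np.cov(X1, rowvar=False)
         + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    # The rule needs only beta = Sigma^{-1} (mu1 - mu2), not Sigma itself.
    beta = np.linalg.solve(S, mu1 - mu2)
    midpoint = (mu1 + mu2) / 2
    return beta, midpoint

def classify(X, beta, midpoint):
    """Return label 1 where the linear score is positive, otherwise label 2."""
    return np.where((X - midpoint) @ beta > 0, 1, 2)

# Hypothetical well-separated example in p = 3 dimensions.
rng = np.random.default_rng(2)
p = 3
mu1 = np.array([2.0, 0.0, 0.0])
mu2 = -mu1
X1 = rng.multivariate_normal(mu1, np.eye(p), size=500)
X2 = rng.multivariate_normal(mu2, np.eye(p), size=500)

beta, midpoint = fit_lda_direction(X1, X2)
labels = classify(np.vstack([X1, X2]), beta, midpoint)
truth = np.repeat([1, 2], 500)
accuracy = float((labels == truth).mean())
print(accuracy)
```

Only the vector `beta` and the midpoint enter the classifier, which is why imposing sparsity directly on $\Sigma^{-1}(\mu_1-\mu_2)$ is a natural alternative when $p$ is large.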

Although the functional estimation in the LDA problem looks different from the problem studied here, it is possible to extend the results to the LDA setting by a simple adaptation.
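To see why the two problems are related, consider the additional assumption (not stated in these notes) that the two classes are Gaussian with common covariance $\Sigma$ and equal prior probabilities. Then Fisher's rule with known parameters has misclassification rate

$$
R=\Phi\left(-\frac{\Delta}{2}\right),\qquad \Delta^2=(\mu_1-\mu_2)'\Sigma^{-1}(\mu_1-\mu_2)\,,
$$

where $\Phi$ is the standard normal distribution function. The quantity $\Delta^2$ has the same form as $\mu'\Sigma^{-1}\mu$ with $\mu$ replaced by $\mu_1-\mu_2$.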

De-biasing ideas for statistical inference in high-dimensional linear models: TODO.

Published in categories Memo