# Optimal estimation of functionals of high-dimensional mean and covariance matrix


This post is based on Fan, J., Weng, H., & Zhou, Y. (2019). Optimal estimation of functionals of high-dimensional mean and covariance matrix. arXiv:1908.07460 [math, stat].

## Abstract

- Estimate the functional $\mu'\Sigma^{-1}\mu$, which involves both the mean vector $\mu$ and the covariance matrix $\Sigma$.
- Study the minimax estimation of the functional in the high-dimensional setting where $\Sigma^{-1}\mu$ is sparse.

## Introduction

Many new theories and methods have been proposed to address the challenges arising from high dimensionality.

- Various regularization techniques have been proposed to estimate large covariance matrices under different structural assumptions on the matrix, such as sparsity, conditional sparsity, and smoothness.
- Two review papers: TODO

Let $\x_i\in\IR^p$ be i.i.d. random vectors with $\bbE(\x_i)=\mu$ and $\Cov(\x_i)=\Sigma$.

**Primary goal:** estimate the functional $\mu'\Sigma^{-1}\mu$ based on the observations $\{\x_i\}_{i=1}^n$, under the assumption that $\Sigma^{-1}\mu$ is (approximately) sparse.

**Other goal:** establish the optimal rate of convergence for estimating the functional $\mu'\Sigma^{-1}\mu$, i.e., reveal its minimax estimation rate in the high-dimensional multivariate problem.
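As a reference point, here is a minimal simulation sketch of the naive plug-in estimate $\bar{\x}'\hat{\Sigma}^{-1}\bar{\x}$; this is *not* the estimator proposed in the paper, and the helper name `plugin_estimate` and the toy setting with $\Sigma=I_p$ are illustrative assumptions only. Even with $n>p$ the plug-in overshoots the truth, and its bias grows with $p/n$, which is part of the motivation for exploiting the sparsity of $\Sigma^{-1}\mu$.

```python
import numpy as np

rng = np.random.default_rng(0)

def plugin_estimate(X):
    """Naive plug-in estimate of mu' Sigma^{-1} mu from an n-by-p sample X (for contrast only)."""
    xbar = X.mean(axis=0)                    # sample mean
    S = np.cov(X, rowvar=False)              # sample covariance matrix (p x p)
    return float(xbar @ np.linalg.solve(S, xbar))

n, p = 200, 50
mu = np.zeros(p)
mu[:5] = 1.0                                 # Sigma^{-1} mu = mu is 5-sparse since Sigma = I
Sigma = np.eye(p)
truth = float(mu @ np.linalg.solve(Sigma, mu))   # = 5.0

X = rng.multivariate_normal(mu, Sigma, size=n)
print(f"truth = {truth:.2f}, plug-in = {plugin_estimate(X):.2f}")  # plug-in is biased upward
```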

## Preliminaries and Examples

Suppose that $\x_1,\ldots,\x_n$ are independent copies of $\x \in\IR^p$ with $\bbE(\x)=\mu$ and $\Cov(\x)=\Sigma$. Throughout the paper, assume $\x$ is sub-gaussian. That is, $\x = \Sigma^{1/2}y+\mu$ and the zero-mean isotropic random vector $y$ satisfies

$$\bbE\exp(a'y) \le \exp\left(\nu^2\|a\|_2^2/2\right) \quad \text{for all } a\in\IR^p,$$

with $\nu > 0$ being a constant.
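For example, a Gaussian vector $\x\sim N(\mu,\Sigma)$ satisfies this condition with $\nu=1$: if $y\sim N(0,I_p)$, then $\bbE\exp(a'y)=\exp(\|a\|_2^2/2)$ for every $a\in\IR^p$.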

Study the estimation problem under the minimax framework. The central goal is to characterize the minimax rate of the estimation error given by

$$\inf_{\hat{T}}\;\sup_{(\mu,\Sigma)\in\Theta}\;\bbE\left|\hat{T}-\mu'\Sigma^{-1}\mu\right|,$$

where the infimum is taken over all estimators $\hat{T}$ built from $\{\x_i\}_{i=1}^n$ and $\Theta$ is the parameter space.

It can be shown that it is impossible to consistently estimate the functional $\mu'\Sigma^{-1}\mu$ over an unstructured parameter space under the scaling $p\ge n^2$, which is not uncommon in high-dimensional problems. To overcome this difficulty, we need a more structured parameter space.
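One natural way to encode such structure (a sketch; the paper's exact definition of the parameter space may differ) is to restrict the sparsity of $\Sigma^{-1}\mu$ while keeping $\Sigma$ well conditioned, e.g.,

$$\Theta(s) = \Big\{(\mu,\Sigma)\,:\, \|\Sigma^{-1}\mu\|_0 \le s,\ \ c \le \lambda_{\min}(\Sigma)\le\lambda_{\max}(\Sigma)\le C \Big\},$$

with approximate sparsity captured by replacing the $\ell_0$ constraint with an $\ell_q$-ball constraint ($0<q<1$).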

### High-dimensional LDA

The classical LDA procedure approximates Fisher’s rule by replacing the unknown parameters $\mu_1,\mu_2,\Sigma$ by their sample versions.
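Concretely, for two classes $N(\mu_1,\Sigma)$ and $N(\mu_2,\Sigma)$ with equal priors, Fisher's rule assigns a new observation $z$ to class 1 if

$$\Big(z-\frac{\mu_1+\mu_2}{2}\Big)'\Sigma^{-1}(\mu_1-\mu_2) > 0,$$

and to class 2 otherwise; the classical LDA plugs in $\hat\mu_1$, $\hat\mu_2$, and $\hat\Sigma$.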

However, in high-dimensional settings, the standard LDA can be no better than random guessing. Various high-dimensional LDA approaches have been proposed under sparsity assumptions on $\mu_1-\mu_2$ or $\Sigma$. An alternative approach to sparse linear discriminant analysis imposes sparsity directly on $\Sigma^{-1}(\mu_1-\mu_2)$, based on the key observation that Fisher’s rule depends on $\mu_1-\mu_2$ and $\Sigma$ only through the product $\Sigma^{-1}(\mu_1-\mu_2)$.

Although the functional estimation in the LDA problem looks different from the problem studied above, the results can be extended to the LDA setting by a simple adaptation; the connection is sketched below.
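Under the two-class Gaussian model with equal priors, the misclassification rate of Fisher’s rule (with the true parameters) is

$$\Phi\Big(-\frac{\Delta}{2}\Big), \qquad \Delta^2=(\mu_1-\mu_2)'\Sigma^{-1}(\mu_1-\mu_2),$$

where $\Phi$ is the standard normal CDF, so quantifying the discriminating power of LDA amounts to estimating a functional of the same form, with $\mu$ replaced by $\mu_1-\mu_2$.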

De-biasing ideas for statistical inference in high-dimensional linear models: TODO.