WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Leave-one-out CV for Lasso

Posted on
Tags: Lasso, Cross-Validation

This note is for Homrighausen, D., & McDonald, D. J. (2013). Leave-one-out cross-validation is risk consistent for lasso. ArXiv:1206.6128 [Math, Stat].

Suppose we form the adaptive ridge regression estimator (Grandvalet (1998))

\[\argmin_{\theta, (\lambda_j)} \Vert Y-X\theta\Vert_2^2 +\sum_{j=1}^p\lambda_j\theta_j^2\]

subject to the constraint $\lambda\sum_{j=1}^p 1/\lambda_j=p$. Then the solution is equivalent, under a reparameterization of $\lambda$, to the lasso solution.
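As a quick numerical illustration (a sketch only: the function name `adaptive_ridge`, the data, and the penalty level are my own, not from the paper), one can alternate between solving the generalized ridge problem for fixed $(\lambda_j)$ and setting each $\lambda_j$ to its optimal value $\lambda_j = \lambda\sum_k\vert\theta_k\vert/(p\vert\theta_j\vert)$ under the constraint; substituting these weights back turns the penalty into $(\lambda/p)\Vert\theta\Vert_1^2$, a lasso-type penalty.

```python
import numpy as np

def adaptive_ridge(X, y, lam, n_iter=200, eps=1e-8):
    """Iteratively reweighted ridge: a sketch of Grandvalet-style adaptive ridge.

    For fixed theta, the optimal weights under the constraint
    lam * sum_j 1/lam_j = p are lam_j = lam * ||theta||_1 / (p * |theta_j|);
    plugging them back in makes the effective penalty (lam/p) * ||theta||_1^2,
    i.e. a lasso penalty up to a reparameterization of lam.
    """
    n, p = X.shape
    theta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from least squares
    for _ in range(n_iter):
        w = np.abs(theta) + eps                    # guard against division by zero
        lam_j = lam * w.sum() / (p * w)            # optimal per-coordinate weights
        theta = np.linalg.solve(X.T @ X + np.diag(lam_j), X.T @ y)
    return theta
```

Each iteration is a majorize-minimize step for $\Vert Y - X\theta\Vert_2^2 + (\lambda/p)\Vert\theta\Vert_1^2$ (by Cauchy-Schwarz, the weighted quadratic penalty majorizes the squared $\ell_1$ penalty), so the objective is non-increasing; coefficients the lasso would zero out are driven toward, but never exactly to, zero because of the $\varepsilon$ guard.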

Main assumptions ensure that the sequence $(X_i)_{i=1}^n$ is sufficiently regular.

  • A:
\[C_n := \frac 1n \sum_{i=1}^n X_iX_i^T \rightarrow C\,,\]

where $C$ is a positive definite matrix with minimum eigenvalue $c_\min > 0$.

  • B:

There exists a constant $C_X < \infty$, independent of $n$, such that

\[\Vert X_i\Vert \le C_X \text{ for all } i\,.\]

Define the predictive risk and the leave-one-out cross-validation estimator of the risk, respectively, as

\[R_n(\lambda) = \frac 1n \bbE \Vert X(\hat \theta(\lambda) - \theta)\Vert^2 +\sigma^2\]


\[\hat R_n(\lambda) = \frac 1n \sum_{i=1}^n(Y_i - X_i^T\hat\theta^{(i)}(\lambda))^2\,,\]

where $\hat\theta^{(i)}(\lambda)$ is the lasso estimator computed with the $i$-th observation left out.
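A direct way to compute $\hat R_n(\lambda)$ is to refit the lasso $n$ times, once per held-out observation. Here is a minimal sketch with scikit-learn (the helper name `loo_cv_risk`, the grid, and the synthetic data are my own; note also that sklearn's `Lasso` minimizes $\frac{1}{2n}\Vert y - X\theta\Vert_2^2 + \alpha\Vert\theta\Vert_1$, so `alpha` is a reparameterization of $\lambda$):

```python
import numpy as np
from sklearn.linear_model import Lasso

def loo_cv_risk(X, y, alpha):
    """Leave-one-out CV risk estimate hat{R}_n for the lasso at one penalty level."""
    n = X.shape[0]
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop observation i
        fit = Lasso(alpha=alpha, fit_intercept=False, max_iter=50_000).fit(X[keep], y[keep])
        errs[i] = (y[i] - fit.predict(X[i : i + 1])[0]) ** 2
    return errs.mean()

# choose the penalty by minimizing hat{R}_n over a grid
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 8))
theta = np.zeros(8)
theta[:2] = [1.5, -2.0]
y = X @ theta + 0.5 * rng.normal(size=40)

grid = np.logspace(-3, 0, 10)
risks = [loo_cv_risk(X, y, a) for a in grid]
alpha_hat = grid[int(np.argmin(risks))]
```

The grid minimizer `alpha_hat` plays the role of $\hat\lambda_n$: it is exactly the tuning rule whose risk consistency the paper establishes.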

Suppose that Assumptions A and B hold and that there exists a $C_\theta < \infty$ such that $\Vert\theta\Vert_1\le C_\theta$. Suppose also that the noise $W_i$ is sub-Gaussian. Then \(R_n(\hat\lambda_n) - R_n(\lambda_n) \rightarrow 0\).

To prove the theorem, first show that

\[\sup_\lambda \vert \hat R_n(\lambda) - R_n(\lambda)\vert \rightarrow 0\]

in probability. Then the result follows as

\[\begin{align*} R_n(\hat \lambda_n) - R_n(\lambda_n) &= (R_n(\hat\lambda_n) - \hat R_n(\hat\lambda_n)) + (\hat R_n(\hat\lambda_n) - R_n(\lambda_n)) \\ &\le (R_n(\hat\lambda_n) - \hat R_n(\hat\lambda_n)) + (\hat R_n(\lambda_n) - R_n(\lambda_n))\\ &\le 2\sup_\lambda \vert R_n(\lambda) - \hat R_n(\lambda)\vert\\ &=o_p(1)\,, \end{align*}\]

where the first inequality holds because $\hat\lambda_n$ minimizes $\hat R_n$, so $\hat R_n(\hat\lambda_n) \le \hat R_n(\lambda_n)$. Since $R_n(\hat \lambda_n) - R_n(\lambda_n)$ is non-stochastic, convergence in probability implies convergence of the sequence itself.


\[\begin{align*} \vert R_n(\lambda) - \hat R_n(\lambda)\vert &=\left\vert \frac 1n \bbE\Vert X\hat\theta(\lambda)\Vert_2^2 + \frac 1n \Vert X\theta\Vert_2^2 -\frac 2n \bbE (X\hat\theta(\lambda))^TX\theta + \sigma^2 - \frac 1n\sum_{i=1}^n\left(Y_i^2 + (X_i^T\hat\theta^{(i)}(\lambda))^2 - 2Y_iX_i^T\hat\theta^{(i)}(\lambda)\right) \right\vert\\ &\le \left\vert \frac 1n \bbE\Vert X\hat\theta(\lambda)\Vert_2^2 - \frac 1n\sum_{i=1}^n (X_i^T\hat\theta^{(i)}(\lambda))^2\right\vert + 2\left\vert \frac 1n \bbE (X\hat\theta(\lambda))^TX\theta - \frac 1n \sum_{i=1}^nY_iX_i^T\hat\theta^{(i)}(\lambda)\right\vert +\left\vert \frac 1n \Vert X\theta\Vert_2^2 + \sigma^2 - \frac 1n\sum_{i=1}^nY_i^2\right\vert\\ &\triangleq a + b + c \end{align*}\]


Observe that, by the triangle inequality,

\[\left\vert \frac 1n \bbE\Vert X\hat\theta(\lambda)\Vert_2^2 - \frac 1n \sum_{i=1}^n(X_i^T\hat\theta^{(i)}(\lambda))^2\right\vert \le \left\vert \frac 1n \bbE\Vert X\hat\theta(\lambda)\Vert_2^2 - \frac 1n\Vert X\hat\theta(\lambda)\Vert_2^2 \right\vert + \left\vert \frac 1n\Vert X\hat\theta(\lambda)\Vert_2^2 - \frac 1n\sum_{i=1}^n(X_i^T\hat\theta^{(i)}(\lambda))^2 \right\vert\]

Published in categories Note