# Leave-one-out CV for Lasso

##### Posted on Mar 14, 2022
Tags: Lasso, Cross-Validation

Suppose we form the adaptive ridge regression estimator (Grandvalet, 1998)

$\argmin_{\theta, (\lambda_j)} \Vert Y-X\theta\Vert_2^2 +\sum_{j=1}^p\lambda_j\theta_j^2$

subject to the constraint $\lambda\sum_{j=1}^p 1/\lambda_j=p$. Then the solution is equivalent, under a reparameterization of $\lambda$, to the lasso solution.
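To see the reparameterization concretely: the inner minimization over $(\lambda_j)$ under the constraint has the closed form $\lambda_j = \lambda\Vert\theta\Vert_1/(p\vert\theta_j\vert)$, which turns the penalty into $(\lambda/p)\Vert\theta\Vert_1^2$, an $\ell_1$-type penalty. A minimal sketch of the resulting alternating scheme is below; the synthetic data, penalty level, and iteration count are illustrative choices, not from Grandvalet's paper.

```python
import numpy as np

def obj(theta, X, Y, lam, p):
    # equivalent single-penalty objective: ||Y - X theta||^2 + (lam/p) ||theta||_1^2
    return np.sum((Y - X @ theta) ** 2) + (lam / p) * np.abs(theta).sum() ** 2

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
theta_true = np.zeros(p)
theta_true[:3] = [2.0, -1.5, 1.0]
Y = X @ theta_true + 0.5 * rng.normal(size=n)

lam = 5.0  # hypothetical penalty level, chosen for the demo
theta = np.linalg.lstsq(X, Y, rcond=None)[0]  # initialize at least squares
start = obj(theta, X, Y, lam, p)
for _ in range(500):
    # minimizing over (lambda_j) subject to lam * sum_j 1/lambda_j = p gives
    # the closed form lambda_j = lam * ||theta||_1 / (p * |theta_j|)
    abs_t = np.maximum(np.abs(theta), 1e-12)  # guard against division by zero
    lam_j = lam * abs_t.sum() / (p * abs_t)
    # weighted ridge step: argmin ||Y - X theta||^2 + sum_j lam_j * theta_j^2
    theta = np.linalg.solve(X.T @ X + np.diag(lam_j), X.T @ Y)
end = obj(theta, X, Y, lam, p)
print(start, end)
```

Each step is an exact block minimization of the joint objective, so the equivalent single-penalty objective never increases along the iterations (up to the tiny division guard).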

The main assumptions ensure that the design sequence $(X_i)_{i=1}^n$ is sufficiently regular.

• A:
$C_n := \frac 1n \sum_{i=1}^n X_iX_i^T \rightarrow C\,,$

where $C$ is a positive definite matrix with minimum eigenvalue $c_\min > 0$.

• B:

There exists a constant $C_X < \infty$ independent of $n$ such that

$\Vert X_i\Vert \le C_X$ for all $i$.

For the linear model $Y_i = X_i^T\theta + W_i$ with noise variance $\sigma^2$, define the predictive risk and the leave-one-out cross-validation estimator of risk

$R_n(\lambda) = \frac 1n \bbE \Vert X(\hat \theta(\lambda) - \theta)\Vert^2 +\sigma^2$

and

$\hat R_n(\lambda) = \frac 1n \sum_{i=1}^n(Y_i - X_i^T\hat\theta^{(i)}(\lambda))^2\,,$

where $\hat\theta^{(i)}(\lambda)$ denotes the estimator fit with the $i$-th observation held out.
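As a concrete sketch, $\hat R_n(\lambda)$ can be computed by refitting $n$ times, once per held-out observation. The simple coordinate-descent lasso solver below (for the objective $\Vert Y - X\theta\Vert_2^2 + \lambda\Vert\theta\Vert_1$), the penalty scaling, and the synthetic data are illustrative choices, not from the post.

```python
import numpy as np

def lasso_cd(X, Y, lam, n_iter=100):
    """Coordinate descent for argmin ||Y - X theta||^2 + lam * ||theta||_1."""
    n, p = X.shape
    theta = np.zeros(p)
    r = Y.copy()                      # running residual Y - X theta
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * theta[j]   # remove coordinate j from the fit
            z = X[:, j] @ r           # partial correlation with column j
            # soft-threshold update for coordinate j
            theta[j] = np.sign(z) * max(abs(z) - lam / 2, 0.0) / col_sq[j]
            r -= X[:, j] * theta[j]   # add the updated coordinate back
    return theta

def loo_risk(X, Y, lam):
    """LOO estimate: (1/n) sum_i (Y_i - X_i^T theta^{(i)}(lam))^2."""
    n = len(Y)
    errs = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        theta_i = lasso_cd(X[mask], Y[mask], lam)  # refit without obs i
        errs[i] = (Y[i] - X[i] @ theta_i) ** 2
    return errs.mean()

rng = np.random.default_rng(0)
n, p = 60, 8
X = rng.normal(size=(n, p))
theta = np.zeros(p)
theta[:2] = [1.5, -1.0]
Y = X @ theta + 0.5 * rng.normal(size=n)

for lam in [1.0, 10.0, 1000.0]:
    print(lam, loo_risk(X, Y, lam))
```

With a very large $\lambda$ the fit is all zeros and $\hat R_n$ reverts to the raw second moment of $Y$, while a moderate $\lambda$ tracks the noise level, so the LOO curve can discriminate between penalty levels.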

Suppose that Assumptions A and B hold, that there exists a $C_\theta < \infty$ such that $\Vert\theta\Vert_1\le C_\theta$, and that the noise $W_i$ is sub-Gaussian. Let $\hat\lambda_n$ minimize $\hat R_n$ and $\lambda_n$ minimize $R_n$. Then $$R_n(\hat\lambda_n) - R_n(\lambda_n) \rightarrow 0$$ in probability.

To prove the theorem, we first show that

$\sup_\lambda \vert \hat R_n(\lambda) - R_n(\lambda)\vert \rightarrow 0$

in probability. Then the result follows as

\begin{align*} R_n(\hat \lambda_n) - R_n(\lambda_n) &= (R_n(\hat\lambda_n) - \hat R_n(\hat\lambda_n)) + (\hat R_n(\hat\lambda_n) - R_n(\lambda_n)) \\ &\le (R_n(\hat\lambda_n) - \hat R_n(\hat\lambda_n)) + (\hat R_n(\lambda_n) - R_n(\lambda_n))\\ &\le 2\sup_\lambda \vert R_n(\lambda) - \hat R_n(\lambda)\vert\\ &=o_p(1) \end{align*}

where the first inequality holds because $\hat\lambda_n$ minimizes $\hat R_n$, so $\hat R_n(\hat\lambda_n) \le \hat R_n(\lambda_n)$. Since $\lambda_n$ minimizes $R_n$, the difference $R_n(\hat \lambda_n) - R_n(\lambda_n)$ is also nonnegative, and a nonnegative quantity bounded above by an $o_p(1)$ term converges to zero in probability.
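Note that this sandwich step uses nothing specific to the lasso: for any two functions on a grid, evaluating the first at the second's minimizer exceeds the first's own minimum by at most twice their sup-distance. A quick numerical check with made-up stand-ins for $R_n$ and $\hat R_n$:

```python
import numpy as np

grid = np.linspace(0.0, 5.0, 500)       # grid of lambda values
R = (grid - 2.0) ** 2 + 1.0             # stand-in for the true risk R_n
Rhat = R + 0.3 * np.sin(7.0 * grid)     # stand-in for the LOO estimate

i_star = np.argmin(R)                   # index of lambda_n
i_hat = np.argmin(Rhat)                 # index of hat-lambda_n
excess = R[i_hat] - R[i_star]           # R_n(hat lambda_n) - R_n(lambda_n)
bound = 2.0 * np.max(np.abs(R - Rhat))  # 2 sup_lambda |R_n - hat R_n|
print(excess, bound)
```

By the argument above, `excess` always lies in `[0, bound]`, whatever the two curves are; the theorem's content is showing that the bound itself vanishes.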

Write

\begin{align*} \vert R_n(\lambda) - \hat R_n(\lambda)\vert &=\left\vert \frac 1n \bbE\Vert X\hat\theta(\lambda)\Vert_2^2 + \frac 1n \Vert X\theta\Vert_2^2 -\frac 2n \bbE (X\hat\theta(\lambda))^TX\theta + \sigma^2 - \frac 1n\sum_{i=1}^n\left(Y_i^2 + (X_i^T\hat\theta^{(i)}(\lambda))^2 - 2Y_iX_i^T\hat\theta^{(i)}(\lambda)\right) \right\vert\\ &\le \left\vert \frac 1n \bbE\Vert X\hat\theta(\lambda)\Vert_2^2 - \frac 1n\sum_{i=1}^n (X_i^T\hat\theta^{(i)}(\lambda))^2\right\vert + 2\left\vert \frac 1n \bbE (X\hat\theta(\lambda))^TX\theta - \frac 1n \sum_{i=1}^nY_iX_i^T\hat\theta^{(i)}(\lambda)\right\vert +\left\vert \frac 1n \Vert X\theta\Vert_2^2 + \sigma^2 - \frac 1n\sum_{i=1}^nY_i^2\right\vert\\ &\triangleq a + b + c \end{align*}

## a

Observe that

$\left\vert \frac 1n \bbE\Vert X\hat\theta(\lambda)\Vert_2^2 - \frac 1n \sum_{i=1}^n(X_i^T\hat\theta^{(i)}(\lambda))^2\right\vert \le \left\vert \frac 1n \bbE\Vert X\hat\theta(\lambda)\Vert_2^2 - \frac 1n\Vert X\hat\theta(\lambda)\Vert_2^2 \right\vert + \left\vert \frac 1n\Vert X\hat\theta(\lambda)\Vert_2^2 - \frac 1n\sum_{i=1}^n(X_i^T\hat\theta^{(i)}(\lambda))^2 \right\vert$

Published in categories Note