WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Cross-Validation for High-Dimensional Ridge and Lasso

Tags: Cross-Validation, Ridge, Lasso, High-Dimensional

This note collects several references on cross-validation in high-dimensional regression.

\[\newcommand\loo{\mathrm{loo}} \newcommand\gcv{\mathrm{gcv}} \newcommand\err{\mathrm{err}} \newcommand\Err{\mathrm{Err}}\]

Ridge Regression

Refer to Patil, P., Wei, Y., Rinaldo, A., & Tibshirani, R. (2021). Uniform Consistency of Cross-Validation Estimators for High-Dimensional Ridge Regression. International Conference on Artificial Intelligence and Statistics, 3178–3186.

The paper examines generalized cross-validation (GCV) and leave-one-out cross-validation (LOOCV) for ridge regression in a proportional asymptotic framework, where the dimension of the feature space grows proportionally with the number of observations.

  • given i.i.d. samples from a linear model with an arbitrary feature covariance and a signal vector that is bounded in $\ell_2$ norm, it shows that generalized cross-validation for ridge regression converges almost surely to the expected out-of-sample prediction error, uniformly over a range of ridge regularization parameters that includes zero (and even negative values)
  • it proves the analogous result for leave-one-out cross-validation
  • as a consequence, ridge tuning via minimization of generalized or leave-one-out cross-validation asymptotically almost surely delivers the optimal level of regularization for predictive accuracy.
  • Ridge Error Analysis
  • Ridge Cross Validation

Problem Setup

Consider the out-of-sample prediction error, i.e., the prediction error conditional on the training dataset,

\[\err(\lambda) = \Err(\hat\beta_\lambda)\]

of the ridge estimate $\hat\beta_\lambda$. The goal is to analyze the differences between the cross-validation estimators of the risk and the risk itself,

\[\loo(\lambda) - \err(\lambda)\]

and

\[\gcv(\lambda) - \err(\lambda)\]
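
For reference, the two estimators are usually written in the following shortcut form. This is the standard textbook expression in my own notation, assuming the ridge estimate is scaled as $\hat\beta_\lambda = (X^\top X/n + \lambda I)^{-1}X^\top y/n$ with smoother matrix $L_\lambda = X(X^\top X/n + \lambda I)^{-1}X^\top/n$; the paper's notation may differ.

\[\loo(\lambda) = \frac{1}{n}\sum_{i=1}^n\left(\frac{y_i - x_i^\top\hat\beta_\lambda}{1 - [L_\lambda]_{ii}}\right)^2, \qquad \gcv(\lambda) = \frac{1}{n}\sum_{i=1}^n\left(\frac{y_i - x_i^\top\hat\beta_\lambda}{1 - \mathrm{tr}(L_\lambda)/n}\right)^2\]

GCV replaces each diagonal entry $[L_\lambda]_{ii}$ by the average $\mathrm{tr}(L_\lambda)/n$, so the two agree whenever the diagonal of $L_\lambda$ is constant.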

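Also, denote by $\lambda_I^\star, \hat\lambda_I^\gcv, \hat\lambda_I^\loo$ the parameters that minimize $\err$, $\gcv$, and $\loo$, respectively, over a range $I$ of $\lambda$. The paper then compares the prediction errors of the models tuned using GCV and LOOCV with the optimal one.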
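
To make the two estimators concrete, here is a minimal NumPy sketch that computes LOOCV and GCV for ridge regression through the smoother matrix and tunes $\lambda$ on a grid. The scaling $\hat\beta_\lambda = (X^\top X/n + \lambda I)^{-1}X^\top y/n$, the simulated data, the grid, and all function names are my own assumptions for illustration, not taken from the paper.

```python
import numpy as np


def ridge_loo_gcv(X, y, lam):
    """Return (loo, gcv) for ridge with penalty lam via the smoother matrix."""
    n, p = X.shape
    # Assumed scaling: beta_hat = (X'X/n + lam I)^{-1} X'y/n
    G = X.T @ X / n + lam * np.eye(p)
    S = X @ np.linalg.solve(G, X.T) / n  # smoother matrix: y_hat = S y
    resid = y - S @ y
    loo = np.mean((resid / (1.0 - np.diag(S))) ** 2)
    gcv = np.mean((resid / (1.0 - np.trace(S) / n)) ** 2)
    return loo, gcv


def ridge_loo_brute(X, y, lam):
    """Brute-force LOOCV: refit n times, each time leaving one sample out."""
    n, p = X.shape
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        # Keep the penalty n*lam of the full fit so the shortcut is exact.
        b = np.linalg.solve(Xi.T @ Xi + n * lam * np.eye(p), Xi.T @ yi)
        errs[i] = (y[i] - X[i] @ b) ** 2
    return errs.mean()


# Toy data (all settings are illustrative assumptions).
rng = np.random.default_rng(0)
n, p = 60, 20
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta + rng.standard_normal(n)

# Tune lambda by minimizing each criterion over a grid.
lams = np.logspace(-3, 1, 20)
scores = np.array([ridge_loo_gcv(X, y, lam) for lam in lams])
lam_loo = lams[scores[:, 0].argmin()]
lam_gcv = lams[scores[:, 1].argmin()]
print("lambda chosen by LOOCV:", lam_loo)
print("lambda chosen by GCV:  ", lam_gcv)
```

The brute-force function is only there as a sanity check: with the same penalty matrix in every refit, the shortcut and the $n$ refits should agree up to floating-point error.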


Lasso

Refer to Chetverikov, D., Liao, Z., & Chernozhukov, V. (2020). On cross-validated Lasso in high dimensions. ArXiv:1605.02214 [Math, Stat].

There exist very few results about the properties of the Lasso estimator when $\lambda$ is chosen using cross-validation.

The paper derives non-asymptotic error bounds for the Lasso estimator when the penalty parameter for the estimator is chosen using $K$-fold cross-validation.

  • the bounds imply that the cross-validated Lasso estimator has nearly optimal rates of convergence in the prediction, $L^2$, and $L^1$ norms

For example, when the conditional distribution of the noise $\epsilon$ given $X$ is Gaussian, the bound in the $L^2$ norm implies that

\[\Vert \hat\beta(\hat\lambda) - \beta\Vert_2 = O_P\left(...\right)\]

  • the results cover the case when $p$ is (potentially much) larger than $n$ and also allow for the case of non-Gaussian noise.
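
The estimator under study, the Lasso with $\lambda$ picked by $K$-fold cross-validation from a candidate set, can be sketched as follows. This is only an illustration in plain NumPy: the coordinate-descent solver, the objective scaling $\frac{1}{2n}\Vert y - Xb\Vert_2^2 + \lambda\Vert b\Vert_1$, the geometric grid of candidates, and the simulated sparse model are all my own assumptions, not the paper's exact setup.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/(2n))||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()  # running residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (b_j removed).
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            b_new = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (b[j] - b_new)
            b[j] = b_new
    return b


def cv_lasso(X, y, lams, K=5, seed=0):
    """Choose lam by K-fold CV over the candidates, then refit on all data."""
    n = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), K)
    cv_err = np.zeros(len(lams))
    for test_idx in folds:
        train = np.setdiff1d(np.arange(n), test_idx)
        for k, lam in enumerate(lams):
            b = lasso_cd(X[train], y[train], lam)
            cv_err[k] += np.sum((y[test_idx] - X[test_idx] @ b) ** 2)
    cv_err /= n
    lam_hat = lams[cv_err.argmin()]
    return lam_hat, lasso_cd(X, y, lam_hat), cv_err


# Toy sparse model with p > n (all settings are illustrative assumptions).
rng = np.random.default_rng(0)
n, p, s = 40, 60, 3
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 2.0
y = X @ beta + rng.standard_normal(n)

lam_max = np.max(np.abs(X.T @ y)) / n       # smallest lam giving the zero solution
lams = lam_max * 0.6 ** np.arange(1, 9)     # geometric grid of candidates
lam_hat, b_hat, cv_err = cv_lasso(X, y, lams)
print("selected lambda:", lam_hat)
print("L2 estimation error:", np.linalg.norm(b_hat - beta))
```

The printed $L^2$ error is the quantity $\Vert \hat\beta(\hat\lambda) - \beta\Vert_2$ that the bound above controls; a single simulated draw says nothing about rates, it only shows what is being measured.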


Also have a look at Fushiki, T. (2011). Estimation of prediction error by using K-fold cross-validation. Statistics and Computing, 21(2), 137–146.

As estimators of the prediction error, the training error has a downward bias and K-fold cross-validation has an upward bias. The paper investigates two families of estimators that connect the training error and K-fold cross-validation.
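
The two biases are easy to see in a small simulation. The sketch below does not reproduce the families from the paper; it only compares the average training error, the average K-fold estimate, and the average true prediction error for ordinary least squares in a toy Gaussian model, where every setting ($n$, $p$, $K$, the coefficients) is my own assumption.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def kfold_cv_error(X, y, K, rng):
    """K-fold cross-validation estimate of the squared prediction error."""
    n = X.shape[0]
    folds = np.array_split(rng.permutation(n), K)
    sse = 0.0
    for test_idx in folds:
        train = np.setdiff1d(np.arange(n), test_idx)
        b = ols_fit(X[train], y[train])
        sse += np.sum((y[test_idx] - X[test_idx] @ b) ** 2)
    return sse / n


rng = np.random.default_rng(1)
n, p, K, sigma, reps = 40, 10, 5, 1.0, 400
beta = np.ones(p)

train_err, cv_err, true_err = [], [], []
for _ in range(reps):
    X = rng.standard_normal((n, p))
    y = X @ beta + sigma * rng.standard_normal(n)
    b = ols_fit(X, y)
    train_err.append(np.mean((y - X @ b) ** 2))
    cv_err.append(kfold_cv_error(X, y, K, rng))
    # For a new x_0 ~ N(0, I) with independent noise, the conditional
    # prediction error is sigma^2 + ||b - beta||^2 exactly.
    true_err.append(sigma ** 2 + np.sum((b - beta) ** 2))

print("mean training error:  ", np.mean(train_err))
print("mean true pred. error:", np.mean(true_err))
print("mean K-fold CV error: ", np.mean(cv_err))
```

The K-fold estimate is biased upward because each fold is fitted on only $n(K-1)/K$ observations, while the training error is biased downward because it reuses the data that produced the fit.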

This strategy reminds me of the one used for the bootstrap, as discussed in ESL.

Published in categories Note