WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Cross-Validation With Confidence

Tags: Cross-Validation

This note is for Lei, J. (2020). Cross-Validation With Confidence. Journal of the American Statistical Association, 115(532), 1978–1997.

  • traditional CV tends to overfit because it ignores the uncertainty in the testing sample
  • the paper develops a statistically principled inference tool based on cross-validation that accounts for the uncertainty in the testing sample
  • this method outputs a set of highly competitive candidate models that contains the optimal one with guaranteed probability
  • the method can achieve consistent variable selection in a classical linear regression setting (reminds me of the consistency of fdr and cv error), whereas existing cross-validation methods require unconventional split ratios to do so

Introduction

  • early theoretical studies of cross-validation indicate that, under a low-dimensional linear model, CV cannot consistently select the correct model unless the training-testing split ratio tends to zero
  • for each candidate model (or tuning parameter value) $m$, CVC tests the null hypothesis that the regression function estimated from candidate $m$ has the smallest predictive risk among all candidates
  • this hypothesis test is carried out individually for each candidate $m$, yielding a valid $p$-value $\hat p_m$ by comparing the cross-validated residuals of all candidate models
  • the subset of candidate models for which the null hypotheses are not rejected, denoted $\cA = \{m: \hat p_m\ge \alpha\}$, forms a confidence set for model (or tuning parameter) selection
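The confidence-set construction in the last bullet is easy to sketch once the $p$-values are in hand (the $\hat p_m$ values below are made up purely for illustration):

```python
import numpy as np

# Hypothetical p-values \hat p_m for M = 5 candidate models
p_hat = np.array([0.50, 0.02, 0.30, 0.01, 0.08])
alpha = 0.05

# Confidence set \cA = {m : \hat p_m >= alpha}: candidates whose
# "m has the smallest predictive risk" hypothesis is not rejected
conf_set = {m for m, p in enumerate(p_hat) if p >= alpha}
print(conf_set)  # {0, 2, 4}
```

With guaranteed probability, the optimal candidate is among the survivors, so downstream choices (e.g., the most parsimonious member) can be made within this set.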

Cross-validation with confidence

3.1 Sample-Split Validation With Hypothesis Testing

  • candidate estimates $\{\hat f_m:m \in \cM\}$ obtained from training data $D_{tr}$
  • for each $m\in \cM$, with $(X, Y)$ denoting a point in the testing sample, define the random vector $\xi_m = (\xi_{m, j}: j\neq m)$ as
\[\xi_{m,j} = \ell(\hat f_m(X), Y) - \ell(\hat f_j(X), Y)\]

let $\mu_{m, j} = \bbE[\xi_{m,j}\mid D_{tr}]$. Consider a hypothesis testing problem

\[H_{0,m}: \max_{j\neq m} \mu_{m, j}\le 0\quad \text{versus}\quad H_{1, m}: \max_{j\neq m} \mu_{m, j} > 0\]

the CVC procedure outputs the confidence set

\[\cA_{ss} = \{m\in\cM: \hat p_{ss, m} > \alpha\}\]
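A minimal sketch of the test for one candidate $m$: studentize the mean differences $\bar\xi_{m,j}$ on the testing sample and approximate the null distribution of their maximum with a Gaussian multiplier bootstrap. This is a simplified stand-in for the paper's exact calibration; `loss` is a hypothetical $(n_{test} \times M)$ matrix of per-observation losses $\ell(\hat f_j(X_i), Y_i)$.

```python
import numpy as np

def cvc_pvalue(loss, m, n_boot=2000, seed=0):
    """Approximate p-value for H_{0,m} (candidate m has smallest risk).

    loss: (n, M) array, loss[i, j] = per-observation loss of model j
          on testing point i. Simplified multiplier-bootstrap sketch,
    not the paper's exact procedure."""
    rng = np.random.default_rng(seed)
    n, M = loss.shape
    others = [j for j in range(M) if j != m]
    xi = loss[:, [m]] - loss[:, others]        # xi_{m,j}, shape (n, M-1)
    mu = xi.mean(axis=0)                       # \bar xi_{m,j}
    sd = xi.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                          # guard against degenerate ties
    t_obs = np.sqrt(n) * np.max(mu / sd)       # studentized max statistic
    # Gaussian multiplier bootstrap of the centered, studentized maximum
    g = rng.standard_normal((n_boot, n))
    centered = (xi - mu) / sd                  # (n, M-1)
    t_boot = (g @ centered / np.sqrt(n)).max(axis=1)
    return float(np.mean(t_boot >= t_obs))
```

Candidates with clearly larger risk than some competitor get a small $p$-value and are rejected; the (near-)optimal candidate's statistic is negative or small, so its $p$-value stays large and it remains in $\cA_{ss}$.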

3.2 V-Fold CV With Confidence

4. Theoretical Properties

4.2 Model Selection Consistency in Linear Models

\[Y=X^T\beta + \epsilon\]

given a collection of subsets $\cJ = \{J_1,\ldots, J_M\}\subset 2^{[p]}$, we would like to find the $m^\star$ such that

\[J_{m^\star} = J^\star := \{j:\beta_j\neq 0\}\]

consider the most parsimonious model in the confidence set

\[\hat m_{ssc} = \argmin_{m\in\cA_{ss}} \vert J_m\vert\]

and

\[\hat m_{cvc} = \argmin_{m\in \cA_{cv}} \vert J_m\vert\]
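Given the confidence set and the supports $J_m$ of the candidate models, picking the most parsimonious member is a one-liner (the supports and accepted set below are hypothetical):

```python
# Hypothetical candidate supports J_m and confidence set \cA (accepted indices)
supports = {0: {1, 2, 3}, 1: {1, 2}, 2: {1, 2, 4, 5}}
conf_set = {0, 1, 2}

# \hat m = argmin_{m in \cA} |J_m|: smallest accepted model
m_hat = min(conf_set, key=lambda m: len(supports[m]))
print(m_hat)  # 1
```

Ties in $\vert J_m\vert$ would need an explicit tie-breaking rule; `min` simply returns the first minimizer it encounters.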

Numerical Experiments

Simulation 1: Subset Selection Consistency in Linear Models

Simulation 2: Tuning the Lasso for Risk Minimization

Concluding Remarks

future extensions

  • study and extend CVC for high-dimensional regression model selection problems
  • extend the framework of CVC to unsupervised learning problems, e.g., in k-means or model-based clustering
