Cross-Validation With Confidence
This note is for Lei, J. (2020). Cross-Validation With Confidence. Journal of the American Statistical Association, 115(532), 1978–1997.
- traditional CV tends to overfit because it ignores the uncertainty in the testing sample
- develop a novel statistically principled inference tool based on cross-validation that takes into account the uncertainty in the testing sample
- this method outputs a set of highly competitive candidate models containing the optimal one with guaranteed probability
- the method can achieve consistent variable selection in a classical linear regression setting (this reminds me of the consistency of FDR and of CV error), whereas existing cross-validation methods require unconventional split ratios to do so
Introduction
- early theoretical studies of cross-validation indicate that, under a low-dimensional linear model, CV cannot consistently select the correct model unless the training-testing split ratio tends to zero
- for each candidate model (or tuning parameter value) $m$, CVC tests the null hypothesis that the regression functions estimated from candidate $m$ have the smallest predictive risk among all candidates
- this hypothesis test is carried out individually for each candidate $m$, yielding a valid $p$-value $\hat p_m$ obtained by comparing the cross-validated residuals of all candidate models
- the subset of candidate models for which the null hypotheses are not rejected, denoted $\cA = \{m: \hat p_m\ge \alpha\}$, forms a confidence set for model (or tuning parameter) selection
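In code terms, once the per-candidate $p$-values are in hand, forming the confidence set is a one-liner; the $p$-values below are made up purely for illustration:

```python
# Forming the CVC confidence set from per-candidate p-values.
# The p-values here are hypothetical; computing \hat p_m is the real work.
alpha = 0.05
p_hat = {"m1": 0.40, "m2": 0.03, "m3": 0.12}  # p_hat[m]: test of "m is best"

# A = {m : p_hat_m >= alpha} -- every candidate we cannot reject as best
conf_set = {m for m, p in p_hat.items() if p >= alpha}  # {"m1", "m3"}
```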
Cross-validation with confidence
3.1 Sample-Split Validation With Hypothesis Testing
- candidate estimates ${\hat f_m:m \in \cM}$ obtained from training data $D_{tr}$
- for each $m\in \cM$, define the random vector $\xi_m = (\xi_{m, j}: j\neq m)$ as the loss differences between candidate $m$ and each competitor $j$ evaluated on the testing data $D_{te}$
let $\mu_{m, j} = \bbE[\xi_{m,j}\mid D_{tr}]$. Consider a hypothesis testing problem
\[H_{0,m}: \max_{j\neq m} \mu_{m, j}\le 0\quad \text{versus}\quad H_{1, m}: \max_{j\neq m} \mu_{m, j} > 0\]the CVC procedure outputs the confidence set
\[\cA_{ss} = \{m\in\cM: \hat p_{ss, m} > \alpha\}\]
3.2 V-fold CV with Confidence
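The sample-split procedure of Section 3.1 can be sketched as below, using squared-error loss. The `fit_fns` interface is my own convention (each element fits on $D_{tr}$ and returns a predictor), and where the paper calibrates the max statistic with a Gaussian multiplier bootstrap, I substitute one-sided $t$-tests with a Bonferroni correction over $j$, a conservative simplification:

```python
import numpy as np
from scipy import stats

def cvc_sample_split(fit_fns, X_tr, y_tr, X_te, y_te, alpha=0.05):
    """Sketch of sample-split CVC with squared-error loss.

    For each candidate m, tests H0_m: max_{j != m} mu_{m,j} <= 0, where
    xi_{m,j} collects the per-test-point loss differences between m and j.
    NOTE: the paper uses a Gaussian multiplier bootstrap for the max
    statistic; here we use Bonferroni-corrected one-sided t-tests instead,
    which is a conservative simplification.
    """
    # fit each candidate on D_tr, record per-point squared errors on D_te
    losses = [(y_te - f(X_tr, y_tr)(X_te)) ** 2 for f in fit_fns]
    M = len(fit_fns)
    p_hat = {}
    for m in range(M):
        p_one_sided = []
        for j in range(M):
            if j == m:
                continue
            xi = losses[m] - losses[j]           # xi_{m,j} on the test set
            t, p_two = stats.ttest_1samp(xi, 0.0)
            # one-sided p-value for H1: E[xi_{m,j} | D_tr] > 0
            p_one_sided.append(p_two / 2 if t > 0 else 1 - p_two / 2)
        p_hat[m] = min(1.0, min(p_one_sided) * (M - 1))  # Bonferroni over j
    # confidence set: candidates whose "is best" null survives at level alpha
    return {m for m, p in p_hat.items() if p > alpha}, p_hat
```

In a toy linear-model run (true signal in one column, pure noise in another), the candidate using the signal column survives while the noise-only candidate is rejected.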
4. Theoretical Properties
4.2 Model Selection Consistency in Linear Models
\[Y=X^T\beta + \epsilon\]given a collection of subsets $\cJ = \{J_1,\ldots, J_M\}\subset 2^{\{1,\ldots,p\}}$, we would like to find the $m^\star$ such that
\[J_{m^\star} = J^\star := \{j:\beta_j\neq 0\}\]consider the most parsimonious model in the confidence set
\[\hat m_{ssc} = \argmin_{m\in\cA_{ss}} \vert J_m\vert\]and
\[\hat m_{cvc} = \argmin_{m\in \cA_{cv}} \vert J_m\vert\]
Numerical Experiments
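In the notation above, picking the most parsimonious surviving model is just an argmin over support sizes; a minimal sketch, with hypothetical supports and confidence set:

```python
# Hypothetical candidate supports J_m and a hypothetical confidence set A_ss
supports = {
    "m1": {0, 1},        # |J_1| = 2
    "m2": {0, 1, 2, 3},  # |J_2| = 4
    "m3": {0, 1, 4},     # |J_3| = 3
}
conf_set = {"m1", "m3"}  # candidates surviving the CVC test

# \hat m_ssc = argmin_{m in A_ss} |J_m| -- smallest surviving support
m_hat = min(conf_set, key=lambda m: len(supports[m]))  # "m1"
```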
Simulation 1: Subset Selection Consistency in Linear Models
Simulation 2: Tuning the Lasso for Risk Minimization
Concluding Remarks
Future extensions
- study and extend CVC for high-dimensional regression model selection problems
- extend the framework of CVC to unsupervised learning problems, e.g., in k-means or model-based clustering