Cross-Validation With Confidence
This note is for Lei, J. (2020). Cross-Validation With Confidence. Journal of the American Statistical Association, 115(532), 1978–1997.
- traditional CV tends to overfit because it ignores the uncertainty in the testing sample
- develop a novel statistically principled inference tool based on cross-validation that takes into account the uncertainty in the testing sample
- this method outputs a set of highly competitive candidate models containing the optimal one with guaranteed probability
- the method can achieve consistent variable selection in a classical linear regression setting (this reminds me of the consistency of FDR and of CV error), whereas existing cross-validation methods require unconventional split ratios to do so
Introduction
- early theoretical studies of cross-validation indicate that, under a low-dimensional linear model, CV cannot consistently select the correct model unless the training-testing split ratio tends to zero
- for each candidate model (or tuning parameter value) $m$, CVC tests the null hypothesis that the regression functions estimated from candidate $m$ have the smallest predictive risk among all candidates
- this hypothesis test is carried out individually for each candidate $m$, yielding a valid $p$-value $\hat p_m$ obtained by comparing the cross-validated residuals of all candidate models
- the subset of candidate models for which the null hypotheses are not rejected, denoted $\cA = \{m: \hat p_m\ge \alpha\}$, forms a confidence set for model (or tuning parameter) selection
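In code terms, once the per-candidate $p$-values are in hand, forming the confidence set is a one-liner; the $p$-values below are made up purely for illustration:

```python
# Forming the CVC confidence set from per-candidate p-values.
# The p-values here are hypothetical; computing \hat p_m is the real work.
alpha = 0.05
p_hat = {"m1": 0.40, "m2": 0.03, "m3": 0.12}  # p_hat[m]: test of "m is best"

# A = {m : p_hat_m >= alpha} -- every candidate we cannot reject as best
conf_set = {m for m, p in p_hat.items() if p >= alpha}  # {"m1", "m3"}
```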
Cross-validation with confidence
3.1 Sample-Split Validation With Hypothesis Testing
- candidate estimates ${\hat f_m:m \in \cM}$ obtained from training data $D_{tr}$
- for each $m\in \cM$, define the random vector $\xi_m = (\xi_{m, j}: j\neq m)$ as the loss differences between candidate $m$ and each competitor $j$ evaluated on the testing data $D_{te}$
let $\mu_{m, j} = \bbE[\xi_{m,j}\mid D_{tr}]$. Consider a hypothesis testing problem
\[H_{0,m}: \max_{j\neq m} \mu_{m, j}\le 0\quad \text{versus}\quad H_{1, m}: \max_{j\neq m} \mu_{m, j} > 0\]the CVC procedure outputs the confidence set
\[\cA_{ss} = \{m\in\cM: \hat p_{ss, m} > \alpha\}\]
3.2 V-fold CV with Confidence
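The sample-split procedure of Section 3.1 can be sketched as below, using squared-error loss. The `fit_fns` interface is my own convention (each element fits on $D_{tr}$ and returns a predictor), and where the paper calibrates the max statistic with a Gaussian multiplier bootstrap, I substitute one-sided $t$-tests with a Bonferroni correction over $j$, a conservative simplification:

```python
import numpy as np
from scipy import stats

def cvc_sample_split(fit_fns, X_tr, y_tr, X_te, y_te, alpha=0.05):
    """Sketch of sample-split CVC with squared-error loss.

    For each candidate m, tests H0_m: max_{j != m} mu_{m,j} <= 0, where
    xi_{m,j} collects the per-test-point loss differences between m and j.
    NOTE: the paper uses a Gaussian multiplier bootstrap for the max
    statistic; here we use Bonferroni-corrected one-sided t-tests instead,
    which is a conservative simplification.
    """
    # fit each candidate on D_tr, record per-point squared errors on D_te
    losses = [(y_te - f(X_tr, y_tr)(X_te)) ** 2 for f in fit_fns]
    M = len(fit_fns)
    p_hat = {}
    for m in range(M):
        p_one_sided = []
        for j in range(M):
            if j == m:
                continue
            xi = losses[m] - losses[j]           # xi_{m,j} on the test set
            t, p_two = stats.ttest_1samp(xi, 0.0)
            # one-sided p-value for H1: E[xi_{m,j} | D_tr] > 0
            p_one_sided.append(p_two / 2 if t > 0 else 1 - p_two / 2)
        p_hat[m] = min(1.0, min(p_one_sided) * (M - 1))  # Bonferroni over j
    # confidence set: candidates whose "is best" null survives at level alpha
    return {m for m, p in p_hat.items() if p > alpha}, p_hat
```

In a toy linear-model run (true signal in one column, pure noise in another), the candidate using the signal column survives while the noise-only candidate is rejected.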
4. Theoretical Properties
4.2 Model Selection Consistency in Linear Models
\[Y=X^T\beta + \epsilon\]given a collection of subsets $\cJ = \{J_1,\ldots, J_M\}\subset 2^{\{1,\ldots,p\}}$, we would like to find the $m^\star$ such that
\[J_{m^\star} = J^\star := \{j:\beta_j\neq 0\}\]consider the most parsimonious model in the confidence set
\[\hat m_{ssc} = \argmin_{m\in\cA_{ss}} \vert J_m\vert\]and
\[\hat m_{cvc} = \argmin_{m\in \cA_{cv}} \vert J_m\vert\]
Numerical Experiments
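In the notation above, picking the most parsimonious surviving model is just an argmin over support sizes; a minimal sketch, with hypothetical supports and confidence set:

```python
# Hypothetical candidate supports J_m and a hypothetical confidence set A_ss
supports = {
    "m1": {0, 1},        # |J_1| = 2
    "m2": {0, 1, 2, 3},  # |J_2| = 4
    "m3": {0, 1, 4},     # |J_3| = 3
}
conf_set = {"m1", "m3"}  # candidates surviving the CVC test

# \hat m_ssc = argmin_{m in A_ss} |J_m| -- smallest surviving support
m_hat = min(conf_set, key=lambda m: len(supports[m]))  # "m1"
```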
Simulation 1: Subset Selection Consistency in Linear Models
Simulation 2: Tuning the Lasso for Risk Minimization
Concluding Remarks
Future extensions
- study and extend CVC for high-dimensional regression model selection problems
- extend the framework of CVC to unsupervised learning problems, e.g., in k-means or model-based clustering