Post-selection inference via algorithmic stability
This note is about the paper: Zrnic, T., & Jordan, M. I. (2023). Post-selection inference via algorithmic stability. The Annals of Statistics, 51(4), 1666–1691.
The paper proposes a solution to the problem of inference after selection by building on the framework of algorithmic stability, in particular the branch with origins in the field of differential privacy.
Stability is achieved via randomization of the selection, and it serves as a quantitative measure that is sufficient to obtain nontrivial post-selection corrections for classical confidence intervals.
The theoretical framework delivers intervals of tunable width, since the confidence intervals derive from a quantitative measure of the algorithmic stability of the selection procedure.
- $\hat S$: a data-dependent outcome of a selection
- $\beta_S$: the resulting inferential target
- $\hat S_0$: an oracle's guess of $\hat S$; imagine an oracle that knows the method used to arrive at the selection and the distribution of the data, but not its realization.
A selection procedure is $\eta$-stable for some $\eta > 0$ if there exists an oracle such that, with high probability over the distribution of the data, the likelihood of any selection under $\hat S$ and the likelihood of the same selection under $\hat S_0$ differ by at most a multiplicative factor of $e^\eta$.
The magnitude of stability depends not only on the selection method, but also on the distribution of the data.
For every fixed selection $S$, suppose that $\text{CI}_S^{(\alpha)}$ are confidence intervals with valid coverage: \(\bbP\{\beta_S\not\in \text{CI}_S^{(\alpha)}\}\le \alpha.\) Let $\hat S$ be an $\eta$-stable selection. Then \(\bbP\{\beta_{\hat S} \not\in \text{CI}_{\hat S}^{(\alpha e^{-\eta})}\} \le \alpha.\)
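For instance, with $\alpha = 0.05$ and $\eta = 0.1$, the corrected level is $\alpha e^{-\eta} \approx 0.0452$, so the two-sided Gaussian quantile only grows from $z_{0.975} \approx 1.96$ to $z_{1 - 0.0452/2} \approx 2.00$.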
2. Motivating vignettes
2.1 Vignette 1: Winner’s curse
Consider the problem of selecting the largest observed effect. Suppose one observes an $n$-dimensional vector $y\sim N(\mu, \sigma^2I)$; for example, each entry in this vector could be an observed treatment effect for a separate treatment.
We are interested in doing inference on the most significant effect.
More formally, denoting $i_\star = \argmax_i y_i$, we want to construct a confidence interval for $\mu_{i_\star}$.
This is a random inferential target because $i_\star$ is a function of the data.
One simple way to provide valid inference for $\mu_{i_\star}$ is to apply the Bonferroni correction:
\[\bbP\{\mu_{i_\star} \in (y_{i_\star} \pm z_{1-\alpha/(2n)}\sigma) \} \ge 1 - \alpha\]
Benjamini et al. show that a tighter correction is valid:
\[\bbP\{\mu_{i_\star} \in (y_{i_\star} \pm z_{1-\alpha/(n+1)}\sigma)\} \ge 1 - \alpha\]
The paper shows that, if we randomize the selection step rather than select $i_\star$ exactly, the intervals can be made even tighter.
Suppose that we select $\hat i_\star = \argmax_i (y_i + \xi_i)$, where $\xi_i \sim_{iid} \text{Lap}\left(\frac{2z_{1-\alpha\delta/(2n)}}{\eta}\right)$, for user-chosen parameters $\eta > 0$ and $\delta \in (0, 1)$. Then \(\bbP\{\mu_{\hat i_\star} \in (y_{\hat i_\star} \pm z_{1-\alpha(1-\delta)e^{-\eta}/2}\sigma)\} \ge 1 - \alpha.\)
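A minimal simulation sketch of this proposition, taking $\sigma = 1$ so the Laplace scale matches the display verbatim; the helper name and all parameter values are illustrative:

```python
import numpy as np
from scipy.stats import norm

# Simulation of the randomized winner selection from Vignette 1.
# All parameter choices (n, mu, sigma, alpha, eta, delta) are illustrative.
rng = np.random.default_rng(0)
n, sigma, alpha, eta, delta = 50, 1.0, 0.05, 1.0, 0.2
mu = np.zeros(n)  # true effects; the coverage guarantee holds for any mu

def randomized_winner_ci(y):
    # Laplace scale from the proposition: 2 * z_{1 - alpha*delta/(2n)} / eta
    b = 2 * norm.ppf(1 - alpha * delta / (2 * n)) / eta
    i_hat = int(np.argmax(y + rng.laplace(scale=b, size=n)))
    # Corrected quantile: z_{1 - alpha*(1-delta)*e^{-eta}/2} instead of z_{1-alpha/2}
    z = norm.ppf(1 - alpha * (1 - delta) * np.exp(-eta) / 2)
    return i_hat, (y[i_hat] - z * sigma, y[i_hat] + z * sigma)

# Monte Carlo check of coverage for the selected effect mu_{i_hat}
cover, trials = 0, 2000
for _ in range(trials):
    y = rng.normal(mu, sigma)
    i_hat, (lo, hi) = randomized_winner_ci(y)
    cover += lo <= mu[i_hat] <= hi
print(f"empirical coverage: {cover / trials:.3f}  (target >= {1 - alpha})")
```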
2.2 Vignette 2: Feature Selection
Suppose the goal of selection is simply to maximize the absolute correlation of the selected feature with $y$:
\[i_\star = \argmax_i \vert X_i^Ty\vert\]
Then the results imply the following.
Suppose that we select $\hat i_\star = \argmax_i \vert X_i^Ty + \xi_i\vert$, where $\xi_i \sim_{iid} \text{Lap}\left(\frac{2z_{1-\alpha\delta/(2d)}}{\eta}\right)$, for user-chosen parameters $\eta > 0$ and $\delta \in (0, 1)$. Then \(\bbP\{X_{\hat i_\star}^T\mu\in (X_{\hat i_\star}^Ty \pm z_{1-\alpha(1-\delta)e^{-\eta}/2}\sigma)\} \ge 1 - \alpha.\)
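The same sketch adapts to feature selection; here the columns of $X$ are normalized to unit norm, an assumption I add so that $X_{\hat i_\star}^Ty$ has standard deviation $\sigma$:

```python
import numpy as np
from scipy.stats import norm

# Sketch of the randomized feature selection from Vignette 2, with the Laplace
# scale taken verbatim from the display above (sigma = 1). Illustrative values.
rng = np.random.default_rng(1)
n, d, sigma, alpha, eta, delta = 200, 30, 1.0, 0.05, 1.0, 0.2
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=0)          # unit-norm features (an assumption)
y = rng.normal(np.zeros(n), sigma)

b = 2 * norm.ppf(1 - alpha * delta / (2 * d)) / eta   # Laplace scale
scores = np.abs(X.T @ y + rng.laplace(scale=b, size=d))
i_hat = int(np.argmax(scores))

z = norm.ppf(1 - alpha * (1 - delta) * np.exp(-eta) / 2)
center = X[:, i_hat] @ y
print(f"CI for X_i^T mu: ({center - z * sigma:.3f}, {center + z * sigma:.3f})")
```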
3. Related work
3.1 Simultaneous coverage
In the formulation of post-selection inference by Berk et al., the goal is to construct simultaneous confidence intervals that are valid for any model selection method $\hat M: y\rightarrow \cM$ mapping the data into a prespecified model class $\cM$.
The stability-based approach, in contrast, does not rely on any independence assumption between different observations.
The intervals computed by simultaneous methods and related approaches are unnecessarily wide, as they do not exploit any knowledge of how the analyst arrives at the selected target.
3.2 Conditional coverage
Conditional methods exploit properties of the selection procedure. In particular, the goal of conditional post-selection inference is to design $\text{CI}_S$ such that, for all fixed selections $S$,
\[\bbP\{\beta_S\in \text{CI}_S\mid \hat S = S\} \ge 1 -\alpha\]
However, the conditional approach can lead to overconditioning and thus to wide intervals. Informally, overconditioning refers to the phenomenon of overstating the cost of selection, leaving little information for inference.
3.3 Algorithmic stability
A more recent line of work on adaptive data analysis has recognized that a stability concept can be extracted from differential privacy and exploited to obtain perturbation-based generalization guarantees in learning theory.
4. Algorithmic stability and selection
A random variable $Q$ is $(\eta, \tau)$-indistinguishable from $W$, denoted $Q\approx_{\eta, \tau}W$, if for all measurable sets $\cO$ \(\bbP\{Q\in \cO\} \le e^\eta \bbP\{W\in \cO\} + \tau\)
Let $\cA: \IR^n\rightarrow \cS$ be a randomized algorithm. We say that $\cA$ is $(\eta, \tau, \nu)$-stable with respect to a distribution $\cP$ supported on $\IR^n$ if there exists a random variable $A_0$, possibly dependent on $\cP$, such that \(\cP\{w\in \IR^n: \cA(w)\approx_{\eta, \tau}A_0\}\ge 1 - \nu\)
Example
Suppose we want to compute $w^Ty$ for some fixed vector $w$, and suppose $\cP_y = N(\mu, \sigma^2 I)$ with known $\sigma > 0$. Let $\cA(y) = w^Ty + \xi$, where $\xi \sim \text{Lap}\left(\frac{z_{1-\nu/2}\sigma\Vert w\Vert_2}{\eta}\right)$.
This mechanism is $(\eta, 0, \nu)$-stable.
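A numerical illustration of this example, taking the oracle to be $A_0 = w^T\mu + \xi$ (the oracle knows $\cP_y$, hence $\mu$, but not $y$); the grid, helper name, and all values are my own:

```python
import numpy as np
from scipy.stats import norm

# Checking the (eta, 0, nu)-stability of the Laplace mechanism A(y) = w^T y + xi.
eta, nu, sigma = 0.5, 0.05, 1.0
w = np.array([0.6, -0.8])               # fixed query vector, ||w||_2 = 1
mu = np.array([1.0, 2.0])
b = norm.ppf(1 - nu / 2) * sigma * np.linalg.norm(w) / eta   # Laplace scale

def lap_logpdf(x, loc, scale):
    return -np.log(2 * scale) - np.abs(x - loc) / scale

# With prob. >= 1 - nu, |w^T y - w^T mu| <= z_{1-nu/2} * sigma * ||w||_2 =: Delta,
# and shifting a Lap(b) density by Delta moves its log-density by at most
# Delta / b = eta, which is exactly (eta, 0)-indistinguishability from A_0.
rng = np.random.default_rng(2)
y = rng.normal(mu, sigma)
grid = np.linspace(-5, 10, 1000)
ratio = lap_logpdf(grid, w @ y, b) - lap_logpdf(grid, w @ mu, b)
print(f"max |log density ratio| on grid: {np.abs(ratio).max():.3f}  (eta = {eta})")
```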
5. Confidence intervals after stable selection
Suppose that, under selection $S$, the target of inference is $\beta_S$. Moreover, suppose that $\text{CI}_S^{(\alpha)}$ are valid intervals at level $1-\alpha$ for any fixed $S$, meaning \(\bbP\{\beta_S\not\in \text{CI}_S^{(\alpha)}\} \le \alpha.\)
Fix $\delta\in (0, 1)$, and let $\hat S$ be an $(\eta, \tau, \nu)$-stable selection algorithm. Then \(\bbP\{\beta_{\hat S}\not \in \text{CI}_{\hat S}^{(\delta e^{-\eta})}\}\le \delta + \tau + \nu.\)
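A hypothetical helper that turns the theorem into a recipe: to target overall miscoverage $\alpha$, set $\delta = \alpha - \tau - \nu$ and build the classical Gaussian interval at nominal level $\delta e^{-\eta}$:

```python
import numpy as np
from scipy.stats import norm

def corrected_quantile(alpha, eta, tau, nu):
    # delta + tau + nu <= alpha requires delta = alpha - tau - nu
    delta = alpha - tau - nu
    if delta <= 0:
        raise ValueError("alpha must exceed tau + nu")
    level = delta * np.exp(-eta)          # nominal level for CI_S^{(delta e^-eta)}
    return norm.ppf(1 - level / 2)        # two-sided Gaussian quantile

# Example: compare to the uncorrected z_{1-alpha/2} = 1.96
print(corrected_quantile(alpha=0.05, eta=0.5, tau=0.0, nu=0.01))  # ~2.25
```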
5.1 Comparison with data splitting
Suppose we allocate an $f$-fraction of the data to selection and the remaining $(1-f)$-fraction to inference. Then the resulting intervals will roughly look like classical intervals inflated by a factor of $\sqrt{\frac{1}{1-f}}$, regardless of how the selection is performed.
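A rough width comparison under illustrative parameter values (my own calculation, not one from the paper):

```python
import numpy as np
from scipy.stats import norm

# Data splitting inflates the classical z_{1-alpha/2} quantile by sqrt(1/(1-f)),
# while the stability correction replaces the quantile as in Vignette 1.
alpha, eta, delta, f = 0.05, 1.0, 0.2, 0.5
classical = norm.ppf(1 - alpha / 2)
splitting = classical * np.sqrt(1 / (1 - f))
stable = norm.ppf(1 - alpha * (1 - delta) * np.exp(-eta) / 2)
print(f"classical {classical:.2f} | splitting {splitting:.2f} | stable {stable:.2f}")
# -> classical 1.96 | splitting 2.77 | stable 2.44
```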
Suppose for concreteness that $y\sim N(\mu, I)$ and that we are considering doing inference on one of two targets, $\nu_0^T\mu$ or $\nu_1^T\mu$, where the selection $\hat S\in \{0, 1\}$ depends on the data $y$.
Another proposal that is conceptually closely related to data splitting is the (U, V) decomposition.
6. Model selection in linear regression
Here, we do not assume that $\mu = \bbE[y]$ can be expressed as a linear combination of $\{X_i\}_{i=1}^d$.
The goal is to construct simultaneous CIs for the target of inference $\beta_{\hat M}$.
The intervals resulting from this approach are of the form
\[\text{CI}_{j\hat M}(K):= (\hat \beta_{j\hat M} \pm K\hat \sigma_{j\hat M})\]
We refer to the minimal such valid $K$ as the PoSI constant.
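A brute-force Monte Carlo sketch of this constant for a small design, assuming known $\sigma = 1$ and enumerating all submodels; illustrative, not the paper's algorithm:

```python
import numpy as np
from itertools import combinations

# K is (an estimate of) the 1-alpha quantile of the maximal |t-ratio| over all
# submodels M and coordinates j in M. The pivots do not involve mu, so we may
# simulate with mu = 0; K depends on the design X only.
def posi_constant(X, alpha=0.05, trials=2000, rng=np.random.default_rng(3)):
    n, d = X.shape
    models = [list(c) for r in range(1, d + 1) for c in combinations(range(d), r)]
    maxima = np.empty(trials)
    for t in range(trials):
        eps = rng.normal(size=n)
        stats = []
        for M in models:
            XM = X[:, M]
            beta = np.linalg.lstsq(XM, eps, rcond=None)[0]   # beta_hat - beta
            se = np.sqrt(np.diag(np.linalg.inv(XM.T @ XM)))  # sigma = 1
            stats.append(np.max(np.abs(beta / se)))
        maxima[t] = max(stats)
    return np.quantile(maxima, 1 - alpha)

X = np.random.default_rng(4).normal(size=(100, 4))
print(f"estimated PoSI constant: {posi_constant(X):.2f}")
```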
7. Conditional coverage
7.1 Example: Publication bias
Suppose we observe an effect $y\sim \cP_y$ with $\bbE[y]=\mu$ and $\text{supp}(\cP_y)\subset \IR$. We are interested in constructing an interval for $\mu$ only if the observed effect is deemed interesting enough; for example, if $y > T$ for some threshold $T$. Denote by $\text{report}(y)$ the event that we decide to report the confidence interval.
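A quick simulation of why this conditioning matters: given the report event, the naive interval $y \pm z_{1-\alpha/2}$ badly undercovers $\mu$ (all values illustrative):

```python
import numpy as np
from scipy.stats import norm

# Publication bias: only effects with y > T are reported, so coverage must be
# assessed conditionally on report(y) = {y > T}.
rng = np.random.default_rng(5)
mu, T, alpha, trials = 0.0, 1.5, 0.05, 200_000
z = norm.ppf(1 - alpha / 2)
y = rng.normal(mu, 1.0, size=trials)
reported = y > T                          # only "interesting" effects reported
covered = np.abs(y - mu) <= z
print(f"unconditional coverage:   {covered.mean():.3f}")   # ~0.95
print(f"coverage given report(y): {covered[reported].mean():.3f}")  # ~0.63
```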
8. The design of stable selection algorithms
8.1 Properties of stability
- closure under post-processing: applying a data-independent function to a stable selection preserves its stability
- composition: running multiple stable selections together is itself stable, with accumulated parameters (see the sketch after this list)
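A hypothetical helper illustrating composition, assuming the basic DP-style rule in which the three parameters simply add up (the paper may prove a tighter accounting):

```python
# Compose the stability parameters of several selections run on the same data.
def compose(*params):
    # each element of params is a tuple (eta, tau, nu)
    eta = sum(p[0] for p in params)
    tau = sum(p[1] for p in params)
    nu = sum(p[2] for p in params)
    return eta, tau, nu

# Two stable selection steps run in sequence:
print(compose((0.5, 0.0, 0.01), (0.3, 0.001, 0.01)))  # -> (0.8, 0.001, 0.02)
```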
8.2 Model selection algorithms
8.2.1 Model selection via the Lasso
8.2.2 Model selection via marginal screening
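In the spirit of Vignette 2, marginal screening can presumably be made stable by adding Laplace noise to each absolute correlation before taking the top $k$; a minimal sketch, with the noise scale and helper name purely illustrative (the paper's calibration may differ):

```python
import numpy as np

# Randomized marginal screening: perturb each score |X_i^T y| with independent
# Laplace noise, then keep the k largest noisy scores.
def noisy_marginal_screening(X, y, k, scale, rng=np.random.default_rng(6)):
    scores = np.abs(X.T @ y) + rng.laplace(scale=scale, size=X.shape[1])
    return np.argsort(scores)[-k:]        # indices of the k largest noisy scores

X = np.random.default_rng(7).normal(size=(200, 50))
y = X[:, :3] @ np.ones(3) + np.random.default_rng(8).normal(size=200)
print(sorted(noisy_marginal_screening(X, y, k=5, scale=2.0)))
```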
9. Experimental results
The experiments evaluate the selective intervals for the Lasso and marginal screening, and compare them with data splitting.