Exact Post-Selection Inference for Sequential Regression Procedures
The paper proposes new inference tools for forward stepwise regression (FS), least angle regression (LAR), and the lasso.
Assuming a Gaussian model for the observation vector $y$, it first describes a general scheme to perform valid inference after any selection event that can be characterized as $y$ falling into a polyhedral set.
This framework yields conditional (post-selection) hypothesis tests at any step of forward stepwise or least angle regression, or at any step along the lasso regularization path.
The p-values associated with these tests are exactly uniform under the null distribution, in finite samples, yielding exact Type I error control.
R package: selectiveInference
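A minimal usage sketch of the package (function names as in recent versions of selectiveInference; exact arguments may differ), on simulated data: fit the forward stepwise path, then compute selective p-values and intervals at each step.

```r
library(selectiveInference)

# simulated data (hypothetical); sigma = 1 by construction
set.seed(1)
n <- 50; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] + rnorm(n)

fsfit <- fs(X, y)                 # forward stepwise path
out   <- fsInf(fsfit, sigma = 1)  # selective p-values and intervals, known sigma
print(out)
```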
Introduction
Consider observations $y\in \IR^n$ drawn from a Gaussian model,
\[y = \theta + \epsilon, \quad \epsilon\sim N(0, \sigma^2 I)\,.\]
Importantly, the true mean $\theta$ is not assumed to be linear in a given set of predictors.
Related Work
Prior work on inference for high-dimensional regression models falls roughly into two groups:
- Based on sample splitting or resampling methods:
  - Wasserman and Roeder (2009)
  - Meinshausen and Bühlmann (2010)
  - Minnier, Tian, and Cai (2011)
- Based on “debiasing” a regularized regression estimator, such as the lasso:
  - Zhang and Zhang (2014)
  - Bühlmann (2013)
  - van de Geer et al. (2014)
  - Javanmard and Montanari (2014a, 2014b)
The inferential targets considered in the aforementioned works are all fixed in advance, not post-selected.
It is clear (at least conceptually) how to use sample-splitting techniques to accommodate post-selection inferential goals; it is much less clear how to do so with the debiasing tools mentioned above.
- Berk et al. (2013) carried out valid post-selection inference (PoSI) by considering all possible model selection procedures that could have produced the given submodel.
- As the authors state, these inferences are generally conservative for any particular selection procedure, but have the advantage that they do not depend on the correctness of the selected submodel.
- Lee et al. (2016), concurrent with this paper, constructed p-values and intervals for lasso coefficients at a fixed value of the regularization parameter $\lambda$. Both works leverage the same core statistical framework, based on truncated Gaussian (TG) distributions, for exact post-selection inference, but differ in the applications pursued within that framework.
Summary of Results
Consider testing the hypothesis
\[H_0: \nu^T\theta = 0\,,\]conditional on having observed $y\in \cP$, where $\cP$ is a given polyhedral set, and $\nu$ is a given contrast vector.
The paper derives a test statistic $T(y, \cP, \nu)$ with the property that
\[T(y, \cP, \nu) \sim_{P_0} \mathrm{Unif}(0, 1)\,,\]where $P_0(\cdot) = P_{\nu^T\theta = 0}(\cdot \mid y\in \cP)$.
For many regression procedures of interest, in particular the sequential algorithms FS, LAR, and the lasso, the event that the procedure selects a given model (after a given number of steps) can be represented in this form.
For example, consider FS after one step, with $p=3$ variables in total: the FS procedure selects variable 3, and assigns it a positive coefficient, iff
\[X_3^Ty/\Vert X_3\Vert_2 \ge \pm X_1^Ty/\Vert X_1\Vert_2\,, \quad X_3^Ty/\Vert X_3\Vert_2 \ge \pm X_2^Ty / \Vert X_2\Vert_2\,.\]With $X$ considered fixed, these four inequalities can be written compactly as $\Gamma y \ge 0$.
If $\hat j_1(y)$ and $\hat s_1(y)$ denote the variable and sign selected by FS at the first step, then
\[\{y: \hat j_1(y) = 3, \hat s_1(y) = 1\} = \{y:\Gamma y\ge 0\}\,,\]for a particular matrix $\Gamma$.
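As a concrete check, here is a small sketch (simulated $X$ and $y$, purely for illustration) that stacks the four inequalities above into $\Gamma$ and verifies that $\Gamma y \ge 0$ coincides with variable 3 entering first with a positive sign.

```r
# Simulated design and response (hypothetical); theta need not be linear in X.
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 3), n, 3)
y <- 0.5 * X[, 3] + rnorm(n)

# Unit-normalize the columns, then stack the four inequalities as rows of Gamma.
Xu <- scale(X, center = FALSE, scale = sqrt(colSums(X^2)))
Gamma <- rbind(Xu[, 3] - Xu[, 1],
               Xu[, 3] + Xu[, 1],
               Xu[, 3] - Xu[, 2],
               Xu[, 3] + Xu[, 2])

# Gamma y >= 0 holds exactly when variable 3 attains the largest absolute
# correlation with y and its coefficient sign is positive.
j1 <- which.max(abs(crossprod(Xu, y)))
s1 <- sign(sum(Xu[, 3] * y))
all(Gamma %*% y >= 0) == (j1 == 3 && s1 == 1)
```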
To test the significance of the third variable, conditional on it being selected (with a positive sign) at the first step of FS, take the null hypothesis $H_0$ with $\nu = X_3$ and $\cP = \{y: \Gamma y \ge 0\}$.
Writing $T_1$ for the resulting test statistic, the exact uniformity property can be re-expressed as
\[P_{X_3^T\theta = 0}(T_1\le \alpha\mid \hat j_1(y) = 3, \hat s_1(y) = 1) = \alpha\,,\]for all $\alpha\in [0, 1]$.
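The sketch below (not the package implementation) computes a statistic of this form. Holding the part of $y$ orthogonal to $\nu$ fixed, the event $\Gamma y \ge 0$ restricts $\nu^Ty$ to an interval $[\mathcal{V}^-, \mathcal{V}^+]$, and the CDF of the correspondingly truncated Gaussian, evaluated at $\nu^Ty$, is uniform under the null. It assumes the `Gamma`, `X`, and `y` from the example above and a known $\sigma$.

```r
# Truncated Gaussian (TG) pivot for H0: nu^T theta = mu, conditional on Gamma y >= 0.
tg_pivot <- function(y, Gamma, nu, sigma, mu = 0) {
  vty  <- sum(nu * y)
  cvec <- nu / sum(nu^2)               # decomposition y = z + cvec * (nu^T y)
  z    <- y - cvec * vty
  rho  <- as.vector(Gamma %*% cvec)
  u    <- as.vector(Gamma %*% z)
  # truncation limits implied by Gamma y = u + rho * (nu^T y) >= 0, given z
  vlo  <- if (any(rho > 0)) max(-u[rho > 0] / rho[rho > 0]) else -Inf
  vup  <- if (any(rho < 0)) min(-u[rho < 0] / rho[rho < 0]) else  Inf
  s    <- sigma * sqrt(sum(nu^2))
  # CDF of N(mu, s^2) truncated to [vlo, vup], evaluated at nu^T y
  (pnorm((vty - mu) / s) - pnorm((vlo - mu) / s)) /
    (pnorm((vup - mu) / s) - pnorm((vlo - mu) / s))
}

# One-sided p-value T1 for the first FS step (Gamma, X, y as above; sigma known)
T1 <- 1 - tg_pivot(y, Gamma, nu = X[, 3], sigma = 1)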
A similar construction holds for a general step $k$ of FS: letting $\hat A_k(y) = [\hat j_1(y), \ldots, \hat j_k(y)]$ denote the active list after $k$ steps and $\hat s_{A_k}(y) = [\hat s_1(y), \ldots, \hat s_k(y)]$ denote the signs of the corresponding coefficients, we have, for any fixed $A_k$ and $s_{A_k}$,
\[\{y: \hat A_k(y) = A_k, \hat s_{A_k}(y) = s_{A_k}\} = \{y:\Gamma y\ge 0\}\,,\]for another matrix $\Gamma$.
Write $(M^TM)^+$ for the Moore-Penrose pseudoinverse of the square matrix $M^TM$, and $M^+ = (M^TM)^+M^T$ for the pseudoinverse of the rectangular matrix $M$.
With $\nu = (X_{A_k}^+)^Te_k$, where $e_k$ is the $k$-th standard basis vector, the null hypothesis becomes $e_k^TX_{A_k}^+\theta = 0$, i.e., it specifies that the partial regression coefficient of the last selected variable is zero in the projected linear model of $\theta$ on $X_{A_k}$.
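The contrast itself is easy to form from the active design matrix; a small sketch (the active list `Ak` here is hypothetical), using `MASS::ginv` for the pseudoinverse:

```r
library(MASS)  # ginv: Moore-Penrose pseudoinverse

k  <- 2
Ak <- c(3, 1)                  # hypothetical active list after k = 2 FS steps
nu <- t(ginv(X[, Ak]))[, k]    # nu = (X_Ak^+)^T e_k
# With the step-k polyhedron {Gamma y >= 0}, tg_pivot(y, Gamma, nu, sigma) from
# the sketch above gives the exact conditional p-value for the k-th coefficient.
```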
Conditional Confidence Intervals
By inverting the test statistic, one can obtain a conditional confidence interval $I_k$ satisfying
\[P(e_k^TX_{A_k}^+\theta\in I_k\mid \hat A_k(y) = A_k, \hat s_{A_k}(y) = s_{A_k}) = 1-\alpha\,.\]
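A grid-inversion sketch of such an interval: collect the values $\mu$ for which the TG pivot is not rejected at level $\alpha$. The inputs `vty` ($=\nu^Ty$), the truncation limits `vlo`, `vup`, and the scale `s` ($=\sigma\Vert\nu\Vert_2$) are assumed to have been computed as inside `tg_pivot` above; numerical care is needed when $\nu^Ty$ sits deep in a tail of the truncation interval.

```r
# Equal-tailed (1 - alpha) conditional interval for e_k^T X_Ak^+ theta,
# obtained by inverting the truncated-Gaussian pivot over a grid of means.
tg_interval <- function(vty, vlo, vup, s, alpha = 0.1) {
  grid  <- seq(vty - 20 * s, vty + 20 * s, length.out = 5000)
  pivot <- function(mu)                    # truncated-normal CDF at vty, mean mu
    (pnorm((vty - mu) / s) - pnorm((vlo - mu) / s)) /
      (pnorm((vup - mu) / s) - pnorm((vlo - mu) / s))
  pv   <- vapply(grid, pivot, numeric(1))
  keep <- !is.na(pv) & pv >= alpha / 2 & pv <= 1 - alpha / 2  # mu not rejected
  range(grid[keep])                                           # endpoints of I_k
}
```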