Exact Post-Selection Inference for Sequential Regression Procedures
The paper proposes new inference tools for forward stepwise regression (FS), least angle regression (LAR), and the lasso.
It assumes a Gaussian model for the observation vector $y$, and first describes a general scheme to perform valid inference after any selection event that can be characterized as $y$ falling into a polyhedral set.
This framework allows one to derive conditional (post-selection) hypothesis tests at any step of forward stepwise or least angle regression, or at any step along the lasso regularization path.
The p-values associated with these tests are exactly uniform under the null hypothesis, in finite samples, yielding exact Type I error control.
R package: selectiveInference
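The package provides `fs()`/`fsInf()` for forward stepwise, `lar()`/`larInf()` for LAR, and `fixedLassoInf()` for the lasso at a fixed $\lambda$. A minimal usage sketch (the simulated data here are purely illustrative):

```r
## Minimal sketch, assuming selectiveInference is installed from CRAN.
## fsInf() returns the exact conditional p-values described in the paper,
## one per step of the forward stepwise path.
library(selectiveInference)

set.seed(1)
n <- 50; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- 2 * X[, 3] + rnorm(n)   # variable 3 carries the signal

fsfit <- fs(X, y)            # run forward stepwise
out   <- fsInf(fsfit)        # selective tests at each step
out
```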
Introduction
Consider observations $y \in \mathbb{R}^n$ drawn from a Gaussian model,

$$y = \theta + \epsilon, \quad \epsilon \sim N(0, \sigma^2 I).$$

The paper does not assume that the true model is itself linear.
Related Work
Inference for high-dimensional regression models:
- Based on sample splitting or resampling methods:
  - Wasserman and Roeder (2009)
  - Meinshausen and Bühlmann (2010)
  - Minnier, Tian, and Cai (2011)
- Based on “debiasing” a regularized regression estimator, like the lasso:
  - Zhang and Zhang (2014)
  - Bühlmann (2013)
  - van de Geer et al. (2014)
  - Javanmard and Montanari (2014a, 2014b)
The inferential targets considered in the aforementioned works are all fixed, not post-selected.
It is clear (at least conceptually) how to use sample-splitting techniques to accommodate post-selection inferential goals; it is much less clear how to do so with the debiasing tools mentioned above.
- Berk et al. (2013) carried out valid post-selection inference (PoSI) by considering all possible model selection procedures that could have produced the given submodel.
- As the authors state, these inferences are generally conservative for any particular selection procedure, but have the advantage that they do not depend on the correctness of the selected submodel.
- Lee et al. (2016), concurrent with this paper, constructed p-values and intervals for lasso coefficients at a fixed value of the regularization parameter $\lambda$. Both papers leverage the same core statistical framework for exact post-selection inference, based on truncated Gaussian (TG) distributions, but differ in the applications pursued with this framework.
Summary of Results
Consider testing the hypothesis
$$H_0: \nu^T \theta = 0,$$

conditional on having observed $y \in P$, where $P$ is a given polyhedral set and $\nu$ is a given contrast vector.
The paper derives a test statistic $T(y, P, \nu)$ with the property that

$$T(y, P, \nu) \overset{P_0}{\sim} \mathrm{Unif}(0,1), \quad \text{where } P_0(\cdot) = P_{\nu^T \theta = 0}(\cdot \mid y \in P).$$
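Concretely, $T$ is a truncated Gaussian (TG) pivot: writing the polyhedron as $P = \{y : \Gamma y \geq 0\}$ (the representation used below), and conditioning on the component of $y$ orthogonal to $\nu$, the selection event confines $\nu^T y$ to an interval $[\mathcal{V}^-, \mathcal{V}^+]$, and the corresponding truncated normal CDF of $\nu^T y$ is exactly uniform. A minimal sketch of this computation, assuming $\sigma$ is known (`tg_pivot` is a hypothetical helper, not the package's internal code):

```r
## TG pivot for the event {Gamma y >= 0}; sigma assumed known.
tg_pivot <- function(y, Gamma, nu, sigma, mu = 0) {
  tobs <- sum(nu * y)                    # observed value of nu^T y
  cc   <- nu / sum(nu^2)                 # direction with nu^T cc = 1
  z    <- y - cc * tobs                  # component independent of nu^T y
  rho  <- as.vector(Gamma %*% cc)
  gz   <- as.vector(Gamma %*% z)
  ## Given z, {Gamma y >= 0} truncates nu^T y to [V_lo, V_hi]:
  V_lo <- if (any(rho > 0)) max(-gz[rho > 0] / rho[rho > 0]) else -Inf
  V_hi <- if (any(rho < 0)) min(-gz[rho < 0] / rho[rho < 0]) else  Inf
  s <- sigma * sqrt(sum(nu^2))           # sd of nu^T y
  ## Survival form of the truncated normal CDF: Unif(0,1) when nu^T theta = mu
  (pnorm((V_hi - mu) / s) - pnorm((tobs - mu) / s)) /
    (pnorm((V_hi - mu) / s) - pnorm((V_lo - mu) / s))
}
```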
For many regression procedures of interest, in particular the sequential algorithms FS, LAR, and the lasso, the event that the procedure selects a given model (after a given number of steps) can be represented in this form.
For example, consider FS after one step, with $p = 3$ variables in total: the FS procedure selects variable 3, and assigns it a positive coefficient, iff

$$\frac{X_3^T y}{\|X_3\|_2} \geq \pm \frac{X_1^T y}{\|X_1\|_2}, \qquad \frac{X_3^T y}{\|X_3\|_2} \geq \pm \frac{X_2^T y}{\|X_2\|_2}.$$

With $X$ considered fixed, these four inequalities can be compactly represented as $\Gamma y \geq 0$.
If $\hat{j}_1(y)$ and $\hat{s}_1(y)$ denote the variable and sign selected by FS at the first step, then

$$\{y : \hat{j}_1(y) = 3, \; \hat{s}_1(y) = 1\} = \{y : \Gamma y \geq 0\},$$

for a particular matrix $\Gamma$.
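As a sketch, this $\Gamma$ can be built by stacking the four inequalities above as rows (the function name is illustrative, not from the package):

```r
## Gamma for the event {FS step 1 selects variable j with sign s}:
## rows encode s * X_j^T y / ||X_j||_2 >= +/- X_k^T y / ||X_k||_2, k != j.
fs_step1_Gamma <- function(X, j = 3, s = 1) {
  Xu <- sweep(X, 2, sqrt(colSums(X^2)), "/")   # unit-norm columns
  rows <- list()
  for (k in setdiff(seq_len(ncol(X)), j)) {
    rows[[length(rows) + 1]] <- s * Xu[, j] - Xu[, k]
    rows[[length(rows) + 1]] <- s * Xu[, j] + Xu[, k]
  }
  do.call(rbind, rows)    # with p = 3, a 4 x n matrix; event is {Gamma y >= 0}
}
```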
To test the significance of the third variable, conditional on it being selected at the first step of FS, consider the null hypothesis $H_0$ with $\nu = X_3$ and $P = \{y : \Gamma y \geq 0\}$.
The uniformity property can then be re-expressed as

$$P_{X_3^T \theta = 0}\big(T_1 \leq \alpha \mid \hat{j}_1(y) = 3, \; \hat{s}_1(y) = 1\big) = \alpha \quad \text{for all } \alpha \in [0,1].$$
A similar construction holds for a general step $k$ of FS: letting $\hat{A}_k(y) = [\hat{j}_1(y), \ldots, \hat{j}_k(y)]$ denote the active list after $k$ steps and $\hat{s}_{A_k}(y) = [\hat{s}_1(y), \ldots, \hat{s}_k(y)]$ denote the signs of the corresponding coefficients, we have, for any fixed $A_k$ and $s_{A_k}$,

$$\{y : \hat{A}_k(y) = A_k, \; \hat{s}_{A_k}(y) = s_{A_k}\} = \{y : \Gamma y \geq 0\},$$

for another matrix $\Gamma$.
Write $(M^T M)^+$ for the Moore-Penrose pseudoinverse of the square matrix $M^T M$, and $M^+ = (M^T M)^+ M^T$ for the pseudoinverse of the rectangular matrix $M$.
With $\nu = (X_{A_k}^+)^T e_k$, where $e_k$ is the $k$-th standard basis vector, the hypothesis is $H_0: e_k^T X_{A_k}^+ \theta = 0$, i.e., it specifies that the last partial regression coefficient is not significant in a projected linear model of $\theta$ on $X_{A_k}$.
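As a sketch, assuming $X_{A_k}$ has full column rank so that $X_{A_k}^+ = (X_{A_k}^T X_{A_k})^{-1} X_{A_k}^T$ exactly (the helper name is mine):

```r
## nu = (X_A^+)^T e_k, i.e., the k-th row of X_A^+, for active list A.
contrast_nu <- function(X, active, k = length(active)) {
  XA     <- X[, active, drop = FALSE]      # assumes full column rank
  XAplus <- solve(crossprod(XA), t(XA))    # (X_A^T X_A)^{-1} X_A^T
  XAplus[k, ]
}
```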
Conditional Confidence Intervals
By inverting the test statistic, one can obtain a conditional confidence interval $I_k$ satisfying

$$P\big(e_k^T X_{A_k}^+ \theta \in I_k \mid \hat{A}_k(y) = A_k, \; \hat{s}_{A_k}(y) = s_{A_k}\big) = 1 - \alpha.$$
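Since the TG pivot is monotone increasing in the hypothesized mean $\mu = \nu^T \theta$, the interval endpoints solve $T(\mu) = \alpha/2$ and $T(\mu) = 1 - \alpha/2$. A numerically naive sketch, reusing the hypothetical `tg_pivot()` above (the fixed root-search bracket may need widening or robustification in practice):

```r
## Invert the TG pivot over mu for a (1 - alpha) conditional interval.
tg_interval <- function(y, Gamma, nu, sigma, alpha = 0.1) {
  f  <- function(mu) tg_pivot(y, Gamma, nu, sigma, mu)
  s  <- sigma * sqrt(sum(nu^2))
  t0 <- sum(nu * y)
  br <- c(t0 - 10 * s, t0 + 10 * s)        # naive search bracket
  lo <- uniroot(function(m) f(m) - alpha / 2,       interval = br)$root
  hi <- uniroot(function(m) f(m) - (1 - alpha / 2), interval = br)$root
  c(lo, hi)
}
```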