

Exact Post-Selection Inference for Sequential Regression Procedures

Tags: p-values, False Discovery Rate, Lasso, Forward Stepwise Regression, Least Angle Regression

This post is for Tibshirani, R. J., Taylor, J., Lockhart, R., & Tibshirani, R. (2016). Exact Post-Selection Inference for Sequential Regression Procedures. Journal of the American Statistical Association, 111(514), 600–620.

propose new inference tools for forward stepwise regression, least angle regression, and the lasso.

assume a Gaussian model for the observation vector y, and first describe a general scheme to perform valid inference after any selection event that can be characterized as y falling into a polyhedral set.

this framework allows us to derive conditional (post-selection) hypothesis tests at any step of forward stepwise or least angle regression, or any step along the lasso regularization path

the p-values associated with these tests are exactly uniform under the null distribution, in finite samples, yielding exact Type I error control.

R package: selectiveInference
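The companion R package can be used to reproduce these tests directly. Below is a minimal sketch on simulated data, assuming the `fs()`/`fsInf()` interface exposed by selectiveInference (not code from the paper):

```r
# install.packages("selectiveInference")
library(selectiveInference)

set.seed(1)
n <- 50; p <- 10
X <- matrix(rnorm(n * p), n, p)
beta <- c(3, 2, rep(0, p - 2))
y <- as.numeric(X %*% beta + rnorm(n))

# run forward stepwise, then compute selective (post-selection)
# p-values for the variable entering at each step
fsfit <- fs(X, y)
out <- fsInf(fsfit)
out
```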

Introduction

Consider observations $y \in \mathbb{R}^n$ drawn from a Gaussian model,

$$y = \theta + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 I).$$

do not assume that the true model is itself linear

inference for high-dimensional regression models

  • based on sample splitting or resampling methods
    • Wasserman and Roeder (2009)
    • Meinshausen and Bühlmann (2010)
    • Minnier, Tian, and Cai (2011)
  • based on “debiasing” a regularized regression estimator, like the lasso
    • Zhang and Zhang (2014)
    • Bühlmann (2013)
    • van de Geer et al. (2014)
    • Javanmard and Montanari (2014a, 2014b)

the inferential targets considered in the aforementioned works are all fixed, and not post-selected

It is clear (at least conceptually) how to use sample-splitting techniques to accommodate post-selection inferential goals; it is much less clear how to do so with the debiasing tools mentioned above.

  • Berk et al. (2013) carried out valid post-selection inference (PoSI) by considering all possible model selection procedures that could have produced the given submodel.
    • as the authors state, the inferences are generally conservative for particular selection procedures, but have the advantage that they do not depend on the correctness of the selected submodel.
  • Lee et al. (2016), concurrent with this paper, constructed p-values and intervals for lasso coefficients at a fixed value of the regularization parameter λ. Both leverage the same core statistical framework, using truncated Gaussian (TG) distributions, for exact post-selection inference, but differ in the applications pursued with this framework.

Summary of Results

Consider testing the hypothesis

$$H_0: \nu^T \theta = 0,$$

conditional on having observed $y \in \mathcal{P}$, where $\mathcal{P}$ is a given polyhedral set and $\nu$ is a given contrast vector.

Derive a test statistic $T(y, \mathcal{P}, \nu)$ with the property that

$$T(y, \mathcal{P}, \nu) \overset{P_0}{\sim} \mathrm{Unif}(0, 1),$$

where $P_0(\cdot) = P_{\nu^T \theta = 0}(\cdot \mid y \in \mathcal{P})$.

For many regression procedures of interest, in particular the sequential algorithms FS, LAR, and lasso, the event that the procedure selects a given model (after a given number of steps) can be represented in this form.

For example, consider FS after one step, with $p = 3$ variables in total: the FS procedure selects variable 3 and assigns it a positive coefficient if and only if

$$\frac{X_3^T y}{\|X_3\|_2} \geq \pm \frac{X_1^T y}{\|X_1\|_2}, \qquad \frac{X_3^T y}{\|X_3\|_2} \geq \pm \frac{X_2^T y}{\|X_2\|_2}.$$

With $X$ considered fixed, these inequalities can be compactly represented as $\Gamma y \geq 0$.

If $\hat j_1(y)$ and $\hat s_1(y)$ denote the variable and sign selected by FS at the first step, then

$$\{y: \hat j_1(y) = 3,\ \hat s_1(y) = 1\} = \{y: \Gamma y \geq 0\},$$

for a particular matrix $\Gamma$.
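As an illustration of this representation, here is a small R sketch (my own construction on simulated data, not code from the paper) that builds the $4 \times n$ matrix $\Gamma$ encoding the four inequalities above:

```r
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 3), n, 3)   # p = 3 columns, considered fixed
y <- rnorm(n)

unit <- function(v) v / sqrt(sum(v^2))
x1 <- unit(X[, 1]); x2 <- unit(X[, 2]); x3 <- unit(X[, 3])

# rows of Gamma: variable 3 beats +/- variable 1 and +/- variable 2
Gamma <- rbind(x3 - x1,
               x3 + x1,
               x3 - x2,
               x3 + x2)

# TRUE exactly when FS would pick variable 3 first with a positive sign
all(Gamma %*% y >= 0)
```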

To test the significance of the third variable, conditional on it being selected at the first step of FS, consider the null hypothesis $H_0$ with $\nu = X_3$ and $\mathcal{P} = \{y: \Gamma y \geq 0\}$.

This can be re-expressed as

$$P_{X_3^T \theta = 0}\big(T_1 \leq \alpha \mid \hat j_1(y) = 3,\ \hat s_1(y) = 1\big) = \alpha$$

for all $\alpha \in [0, 1]$.
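The statistic behind this guarantee is a truncated-Gaussian (TG) pivot: conditioning on $\{\Gamma y \geq 0\}$ (and on the part of $y$ orthogonal to $\nu$) leaves $\nu^T y$ Gaussian, but truncated to a computable interval $[\mathcal{V}^-, \mathcal{V}^+]$. A rough R sketch of this construction, assuming known $\sigma$ (a hypothetical helper of mine, not the paper's or the package's code):

```r
# pivot for H0: nu^T theta = 0, conditional on the event {Gamma y >= 0}
tg_pivot <- function(y, Gamma, nu, sigma) {
  vty  <- sum(nu * y)                  # observed nu^T y
  nrm2 <- sum(nu^2)
  z    <- y - nu * vty / nrm2          # component of y carrying no info on nu^T y
  rho  <- as.numeric(Gamma %*% nu) / nrm2
  res  <- as.numeric(Gamma %*% z)
  # each row: res_j + rho_j * (nu^T y) >= 0, giving bounds on nu^T y
  vlo <- if (any(rho > 0)) max(-res[rho > 0] / rho[rho > 0]) else -Inf
  vup <- if (any(rho < 0)) min(-res[rho < 0] / rho[rho < 0]) else  Inf
  sd  <- sigma * sqrt(nrm2)
  # CDF of N(0, sd^2) truncated to [vlo, vup], evaluated at nu^T y;
  # exactly Unif(0,1) under H0, conditional on the selection event
  (pnorm(vty / sd) - pnorm(vlo / sd)) / (pnorm(vup / sd) - pnorm(vlo / sd))
}
```

With the `Gamma`, `X`, and `y` from the previous sketch, `tg_pivot(y, Gamma, X[, 3], sigma = 1)` would give the pivot for the first selected variable; either this conditional CDF or its complement can serve as the test statistic, since both are uniform under the null.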

A similar construction holds for a general step $k$ of FS: letting $\hat A_k(y) = [\hat j_1(y), \ldots, \hat j_k(y)]$ denote the active list after $k$ steps and $\hat s_{A_k}(y) = [\hat s_1(y), \ldots, \hat s_k(y)]$ denote the signs of the corresponding coefficients, we have, for any fixed $A_k$ and $s_{A_k}$,

$$\{y: \hat A_k(y) = A_k,\ \hat s_{A_k}(y) = s_{A_k}\} = \{y: \Gamma y \geq 0\},$$

for another matrix $\Gamma$.

Write $(M^T M)^+$ for the Moore-Penrose pseudoinverse of the square matrix $M^T M$, and $M^+ = (M^T M)^+ M^T$ for the pseudoinverse of the rectangular matrix $M$.

With $\nu = (X_{A_k}^+)^T e_k$, where $e_k$ is the $k$-th standard basis vector, the hypothesis is $e_k^T X_{A_k}^+ \theta = 0$, i.e., it specifies that the last partial regression coefficient is not significant in a projected linear model of $\theta$ on $X_{A_k}$.
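For instance, the contrast $\nu$ for step $k$ can be computed from the active columns; a small sketch (my own helper, assuming $X_{A_k}$ has full column rank so that $X_{A_k}^+ = (X_{A_k}^T X_{A_k})^{-1} X_{A_k}^T$):

```r
# nu = (X_Ak^+)^T e_k, so that nu^T theta = e_k^T X_Ak^+ theta,
# the k-th coefficient of the projected linear model of theta on X_Ak
contrast_step_k <- function(X, Ak) {
  XA  <- X[, Ak, drop = FALSE]
  k   <- length(Ak)
  XAp <- solve(crossprod(XA), t(XA))   # X_Ak^+ under full column rank
  as.numeric(t(XAp)[, k])              # (X_Ak^+)^T e_k
}
```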

Conditional Confidence Intervals

By inverting the test statistic, one can obtain a conditional confidence interval $I_k$ satisfying

$$P\big(e_k^T X_{A_k}^+ \theta \in I_k \mid \hat A_k(y) = A_k,\ \hat s_{A_k}(y) = s_{A_k}\big) = 1 - \alpha.$$
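A crude way to form $I_k$ is to shift the truncated-Gaussian pivot by a candidate mean $\mu$ and keep every $\mu$ at which the pivot stays inside $[\alpha/2, 1 - \alpha/2]$. A grid-search sketch (hypothetical helper; `vty`, `vlo`, `vup`, and `sd` would come from the `tg_pivot` construction above, and the package performs a more careful search):

```r
tg_interval <- function(vty, vlo, vup, sd, alpha = 0.1) {
  # pivot with the null mean replaced by a candidate value mu
  piv <- function(mu) {
    (pnorm((vty - mu) / sd) - pnorm((vlo - mu) / sd)) /
      (pnorm((vup - mu) / sd) - pnorm((vlo - mu) / sd))
  }
  grid <- seq(vty - 10 * sd, vty + 10 * sd, length.out = 2000)
  keep <- sapply(grid, function(mu) {
    p <- piv(mu)
    is.finite(p) && p >= alpha / 2 && p <= 1 - alpha / 2
  })
  if (!any(keep)) return(c(NA, NA))
  range(grid[keep])   # approximate endpoints of the conditional (1 - alpha) interval
}
```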

Marginalization


Published in categories Note