WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

SuSiE: Sum of Single Effects Model

Tags: Variable Selection, Linear Regression, Genetic Fine Mapping

This note is for Wang, G., Sarkar, A., Carbonetto, P., & Stephens, M. (2020). A Simple New Approach to Variable Selection in Regression, with Application to Genetic Fine Mapping. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(5), 1273–1300.

writing the sparse vector of regression coefficients as a sum of “single-effect” vectors, each with one non-zero element

a new fitting procedure: iterative Bayesian stepwise selection (IBSS)

a Bayesian analogue of stepwise selection methods

instead of selecting a single variable at each step, IBSS computes a distribution on variables that captures uncertainty in which variable to select

\[y = Xb + e\]

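Here $y$ is an $n$-vector of responses, $X$ an $n\times p$ matrix of variables, $b$ a $p$-vector of regression coefficients, and $e \sim N_n(0, \sigma^2 I_n)$ the error. Variables $j$ with non-zero effects ($b_j \neq 0$) are called “effect variables”.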

Suppose variables 1 and 4 are the two effect variables, but each is perfectly correlated with a non-effect variable, say $x_1 = x_2$ and $x_3 = x_4$.

we may conclude that there are (at least) two effect variables, and that

\[(b_1\neq 0 \text{ or } b_2 \neq 0)\text{ and } (b_3\neq 0\text{ or } b_4\neq 0)\]
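To see why only this “or” statement is supported, here is a small numerical check on hypothetical simulated data (numpy assumed). With $x_1 = x_2$ and $x_3 = x_4$, moving each effect onto its duplicate leaves the fitted values, and hence the residual sum of squares, unchanged, so no method can tell the members of a pair apart.

```python
import numpy as np

# Hypothetical toy data matching the example: x1 = x2 and x3 = x4.
rng = np.random.default_rng(0)
n = 100
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
X = np.column_stack([z1, z1, z2, z2])  # columns x1, x2, x3, x4

# True model: variables 1 and 4 are the effect variables.
b_true = np.array([1.0, 0.0, 0.0, -1.0])
y = X @ b_true + rng.normal(size=n)

def rss(b):
    """Residual sum of squares of y for coefficient vector b."""
    r = y - X @ b
    return float(r @ r)

# Each way of placing the effects within the identical pairs fits equally well.
candidates = {
    "b1, b4": np.array([1.0, 0.0, 0.0, -1.0]),
    "b2, b4": np.array([0.0, 1.0, 0.0, -1.0]),
    "b1, b3": np.array([1.0, 0.0, -1.0, 0.0]),
    "b2, b3": np.array([0.0, 1.0, -1.0, 0.0]),
}
for name, b in candidates.items():
    print(name, rss(b))
```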

The data cannot tell which variable within each pair is the effect variable, so the goal is to provide methods that directly produce this kind of inferential statement.

two approaches:

  • select groups of variables
  • Bayesian approach

the marginal posterior inclusion probability (PIP) of each variable

\[PIP_j = \Pr(b_j\neq 0\mid X, y)\]
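On a tiny problem the PIP can be computed by brute force. This is a sketch under assumptions that are not taken from the paper: a standard Bayesian variable selection prior where each variable is included independently with probability `prior_incl`, included coefficients are $N(0, \sigma_0^2)$, and $\sigma^2$ is known. With only $p = 4$ variables we can enumerate all $2^4$ models, normalize their posterior probabilities, and sum over the models containing $j$. The helpers `log_marginal` and `exact_pips` are hypothetical names.

```python
import itertools
import numpy as np

def log_marginal(y, X, S, sigma2, sigma02):
    """log N(y; 0, sigma2*I + sigma02 * X_S X_S^T), i.e. b_S integrated out."""
    n = len(y)
    XS = X[:, list(S)]
    cov = sigma2 * np.eye(n) + sigma02 * XS @ XS.T
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

def exact_pips(y, X, prior_incl=0.1, sigma2=1.0, sigma02=1.0):
    """Enumerate all 2^p models and return PIP_j = Pr(b_j != 0 | X, y)."""
    p = X.shape[1]
    models, logpost = [], []
    for k in range(p + 1):
        for S in itertools.combinations(range(p), k):
            log_prior = k * np.log(prior_incl) + (p - k) * np.log1p(-prior_incl)
            models.append(S)
            logpost.append(log_prior + log_marginal(y, X, S, sigma2, sigma02))
    logpost = np.array(logpost)
    post = np.exp(logpost - logpost.max())  # stabilized, then normalized
    post /= post.sum()
    pip = np.zeros(p)
    for S, w in zip(models, post):
        pip[list(S)] += w  # model S contributes to PIP_j for every j in S
    return pip

# Hypothetical data for the x1 = x2, x3 = x4 example (effects on variables 1 and 4).
rng = np.random.default_rng(1)
n = 100
z1, z2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([z1, z1, z2, z2])
y = X @ np.array([1.0, 0.0, 0.0, -1.0]) + rng.normal(size=n)
pip = exact_pips(y, X)
print(np.round(pip, 3))
```

Within each duplicated pair the PIPs come out equal, so neither member of a pair looks convincing on its own even though the pair as a whole clearly contains an effect.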

level $\rho$ credible set: a subset of variables that has probability $\rho$ or greater of containing at least one effect variable. Equivalently, the probability that all variables in the credible set have zero regression coefficients is $1-\rho$ or less.
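The second form of the definition is easy to compute if a posterior over models is available: the coverage of a set $C$ is $1 - \Pr(b_j = 0 \ \forall j \in C \mid X, y)$, i.e. the total probability of models that share at least one variable with $C$. The posterior below is hypothetical, an idealized version of the $x_1 = x_2$, $x_3 = x_4$ example in which the mass is split evenly over the four indistinguishable two-variable models.

```python
# Hypothetical model posterior for the x1 = x2, x3 = x4 example.
model_post = {
    frozenset({1, 3}): 0.25,
    frozenset({1, 4}): 0.25,
    frozenset({2, 3}): 0.25,
    frozenset({2, 4}): 0.25,
}

def coverage(cs, model_post):
    """Pr(cs contains at least one effect variable) = 1 - Pr(b_j = 0 for all j in cs)."""
    cs = set(cs)
    return sum(p for model, p in model_post.items() if model & cs)

print(coverage({1, 2}, model_post))  # 1.0
print(coverage({1}, model_post))     # 0.5
print(coverage({1, 3}, model_post))  # 0.75
```

So $\{1, 2\}$ is a level-0.95 credible set while $\{1\}$ alone is not, which is exactly the “$b_1 \neq 0$ or $b_2 \neq 0$” statement.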

primary aim: report as many credible sets as the data support, each with as few variables as possible.

second goal: prioritize the variables within each credible set, assigning each a probability that reflects the strength of the evidence for that variable being an effect variable.

Posterior under the single-effect regression (SER) model
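For reference, a compact statement of the SER model and its posterior (my summary of the setup, with the residual variance $\sigma^2$ and prior variance $\sigma_0^2$ treated as known; $\pi$ is the vector of prior probabilities over the $p$ variables):

\[y = Xb + e,\quad e\sim N_n(0, \sigma^2 I_n),\quad b = \beta\gamma,\quad \gamma\sim \text{Mult}(1, \pi),\quad \beta\sim N(0, \sigma_0^2)\]

\[\gamma\mid X, y \sim \text{Mult}(1, \alpha),\qquad \beta\mid X, y, \gamma_j = 1 \sim N(\mu_{1j}, \sigma_{1j}^2)\]

\[\sigma_{1j}^2 = \left(\frac{1}{\sigma_0^2} + \frac{x_j^\top x_j}{\sigma^2}\right)^{-1},\qquad \mu_{1j} = \frac{\sigma_{1j}^2}{\sigma^2}\,x_j^\top y,\qquad \alpha_j = \frac{\pi_j \mathrm{BF}_j}{\sum_{k=1}^p \pi_k \mathrm{BF}_k}\]

Here $\mathrm{BF}_j$ is the Bayes factor for $\beta \neq 0$ versus $\beta = 0$ in the single-variable regression of $y$ on $x_j$. With $\hat\beta_j = x_j^\top y / x_j^\top x_j$, $s_j^2 = \sigma^2 / x_j^\top x_j$ and $z_j = \hat\beta_j / s_j$, it is the ratio of the $N(0, \sigma_0^2 + s_j^2)$ and $N(0, s_j^2)$ densities at $\hat\beta_j$:

\[\mathrm{BF}_j = \sqrt{\frac{s_j^2}{\sigma_0^2 + s_j^2}}\,\exp\left(\frac{z_j^2}{2}\cdot\frac{\sigma_0^2}{\sigma_0^2 + s_j^2}\right)\]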

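  • $\alpha$ is the vector of PIPs: $\alpha_j$ is the posterior probability that variable $j$ is the (single) effect variable
  • from $\alpha$, one can compute a level $\rho$ credible set, $CS(\alpha, \rho)$: rank the variables by decreasing $\alpha_j$ and keep the smallest number of top-ranked variables whose $\alpha_j$ sum to at least $\rho$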
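A minimal numpy sketch of the SER posterior and $CS(\alpha, \rho)$, assuming $\sigma^2$ and $\sigma_0^2$ are known and the prior $\pi$ is uniform unless given. Bayes factors are handled on the log scale to avoid overflow. The function names are my own and are not the API of any package.

```python
import numpy as np

def single_effect_regression(X, y, sigma2, sigma02, prior=None):
    """SER posterior with known sigma2, sigma02.

    Returns alpha (PIPs), mu1 and s1sq (posterior mean / variance of beta given gamma_j = 1).
    """
    p = X.shape[1]
    pi = np.full(p, 1.0 / p) if prior is None else np.asarray(prior)
    xtx = np.sum(X**2, axis=0)   # x_j^T x_j
    xty = X.T @ y                # x_j^T y
    betahat = xty / xtx
    s2 = sigma2 / xtx
    z2 = betahat**2 / s2
    # log Bayes factor of each single-variable regression
    log_bf = 0.5 * np.log(s2 / (sigma02 + s2)) + 0.5 * z2 * sigma02 / (sigma02 + s2)
    w = np.log(pi) + log_bf
    alpha = np.exp(w - w.max())
    alpha /= alpha.sum()
    s1sq = 1.0 / (1.0 / sigma02 + xtx / sigma2)
    mu1 = s1sq * xty / sigma2
    return alpha, mu1, s1sq

def credible_set(alpha, rho=0.95):
    """CS(alpha, rho): fewest top-ranked variables whose alphas sum to >= rho."""
    order = np.argsort(-alpha)
    k = int(np.searchsorted(np.cumsum(alpha[order]), rho)) + 1
    return order[:k]

# Hypothetical data: one effect variable (x1) with an identical copy (x2).
rng = np.random.default_rng(2)
n, p = 200, 10
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0]
y = 0.5 * X[:, 0] + rng.normal(size=n)
alpha, mu1, s1sq = single_effect_regression(X, y, sigma2=1.0, sigma02=1.0)
print(np.round(alpha, 3))
print(sorted(credible_set(alpha, 0.95).tolist()))
```

Because $x_1$ and $x_2$ are identical, $\alpha$ splits between them and the 95% credible set contains both, which is the inferential statement the method is after.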

The sum of single-effects regression model
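The SuSiE model writes $b = \sum_{l=1}^L b_l$, where each $b_l = \beta_l \gamma_l$ is a single-effect vector as in the SER model. IBSS cycles over $l$, fits an SER to the partial residual $r_l = y - X\sum_{l'\neq l}\bar b_{l'}$, and sets $\bar b_l = \alpha_l \circ \mu_{1l}$, the SER posterior mean of $b_l$. PIPs combine across components as $\mathrm{PIP}_j = 1 - \prod_{l=1}^L (1 - \alpha_{lj})$. Below is a minimal sketch, assuming $\sigma^2$ and $\sigma_0^2$ are fixed and known (the paper also estimates them) and a uniform prior over variables, with a simple change-in-coefficients stopping rule in place of a proper convergence check. `ser` and `ibss` are hypothetical names, not the susieR API.

```python
import numpy as np

def ser(X, r, sigma2, sigma02):
    """Single-effect regression of r on X; uniform prior, so log(pi) cancels."""
    xtx = np.sum(X**2, axis=0)
    xtr = X.T @ r
    s2 = sigma2 / xtx
    z2 = (xtr / xtx) ** 2 / s2
    log_bf = 0.5 * np.log(s2 / (sigma02 + s2)) + 0.5 * z2 * sigma02 / (sigma02 + s2)
    alpha = np.exp(log_bf - log_bf.max())
    alpha /= alpha.sum()
    s1sq = 1.0 / (1.0 / sigma02 + xtx / sigma2)
    mu1 = s1sq * xtr / sigma2
    return alpha, mu1

def ibss(X, y, L=5, sigma2=1.0, sigma02=1.0, n_iter=100, tol=1e-8):
    """Iterative Bayesian stepwise selection with sigma2 and sigma02 held fixed."""
    p = X.shape[1]
    alpha = np.zeros((L, p))
    mu1 = np.zeros((L, p))
    bbar = np.zeros((L, p))  # posterior mean of each single-effect vector b_l
    for _ in range(n_iter):
        old = bbar.copy()
        for l in range(L):
            # partial residual: remove the fitted effects of the other L - 1 components
            r = y - X @ (bbar.sum(axis=0) - bbar[l])
            alpha[l], mu1[l] = ser(X, r, sigma2, sigma02)
            bbar[l] = alpha[l] * mu1[l]
        if np.max(np.abs(bbar - old)) < tol:
            break
    pip = 1.0 - np.prod(1.0 - alpha, axis=0)
    return alpha, mu1, pip

# Hypothetical data for the running example: effects on variables 1 and 4,
# with x2 a copy of x1 and x3 a copy of x4 (0-based columns 0..3).
rng = np.random.default_rng(3)
n, p = 500, 20
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0]
X[:, 2] = X[:, 3]
b = np.zeros(p)
b[0], b[3] = 1.0, -1.0
y = X @ b + rng.normal(size=n)
alpha, mu1, pip = ibss(X, y, L=5)
print(np.round(pip[:5], 2))
```

In this example one component should put its $\alpha$ on $\{x_1, x_2\}$ and another on $\{x_3, x_4\}$, giving two small credible sets. With $L$ larger than the number of true effects, the spare components tend to spread $\alpha$ over many variables, so their credible sets are large and uninformative.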

Numerical Comparisons

simulated data ($b$ and the noise level) are specified by two parameters:

  • S: the number of effect variables
  • $\phi$: the proportion of variance in $y$ explained by $X$
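One way to turn $(S, \phi)$ into a simulated data set is sketched below. The details are assumptions for illustration and may differ from the paper's design: effect variables are chosen uniformly at random, their effects are drawn from $N(0, \text{effect\_sd}^2)$, and $\sigma^2$ is set from the sample variance of $Xb$ so that $\phi = \mathrm{Var}(Xb) / (\mathrm{Var}(Xb) + \sigma^2)$. The function `simulate` and its `effect_sd` parameter are hypothetical.

```python
import numpy as np

def simulate(X, S, phi, rng, effect_sd=1.0):
    """Simulate y = Xb + e with S effect variables explaining proportion phi of Var(y).

    Hypothetical design: effect variables chosen uniformly at random,
    effects ~ N(0, effect_sd^2), sigma2 set from the sample variance of Xb.
    """
    n, p = X.shape
    b = np.zeros(p)
    effect_idx = rng.choice(p, size=S, replace=False)
    b[effect_idx] = rng.normal(0.0, effect_sd, size=S)
    signal = X @ b
    # phi = var(Xb) / (var(Xb) + sigma2)  =>  sigma2 = var(Xb) * (1 - phi) / phi
    sigma2 = np.var(signal) * (1.0 - phi) / phi
    y = signal + rng.normal(0.0, np.sqrt(sigma2), size=n)
    return y, b, sigma2

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 50))
y, b, sigma2 = simulate(X, S=3, phi=0.3, rng=rng)
print(np.count_nonzero(b), sigma2)
```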

Published in categories Note