WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Prediction Risk for the Horseshoe Regression

Tags: Horseshoe, Ridge

The note is for Bhadra, A., Datta, J., Li, Y., Polson, N. G., & Willard, B. (2019). Prediction Risk for the Horseshoe Regression. 39.

The paper develops theoretical results on prediction risk in the high-dimensional linear regression model

\[y = X\beta + \epsilon\]

Consider the quadratic predictive risk

\[R = E_{y^\star, y\mid X, \beta}\Vert y^\star - X\hat\beta\Vert^2\]

focus on comparing estimators $\hat\beta$ in a non-asymptotic fixed $n$, fixed $p > n$ setting.

global shrinkage: shrinkage estimators with a single tuning parameter

  • ridge regression
  • principal components regression

purely global shrinkage regression methods suffer from two major difficulties

  • the amount of relative shrinkage is monotone in the singular values of the design matrix
  • the shrinkage is determined by a single tuning parameter
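The first difficulty can be made concrete for ridge regression. In the singular value basis, ridge with penalty $k$ multiplies the $i$-th least-squares component by $d_i^2/(d_i^2+k)$. This is a standard identity rather than a formula quoted from the paper, and the singular values and penalty below are made up for illustration. A minimal sketch:

```python
def ridge_shrinkage_factors(singular_values, penalty):
    """Fraction of the i-th least-squares component that ridge retains."""
    return [d**2 / (d**2 + penalty) for d in singular_values]

# Illustrative singular values (assumed, sorted in decreasing order).
singular_values = [5.0, 2.0, 1.0, 0.5, 0.1]
factors = ridge_shrinkage_factors(singular_values, penalty=1.0)

for d, f in zip(singular_values, factors):
    print(f"d = {d:4.1f}  retained fraction = {f:.4f}")
```

The retained fraction falls as $d_i$ falls, so directions with small singular values are shrunk the most regardless of how much signal they carry, and a single penalty cannot change that ordering.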

A finite sample unbiased estimate of $R$ is given by Stein’s unbiased risk estimate or SURE
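For a fit $\hat y = X\hat\beta$ that is weakly differentiable in $y$, with Gaussian noise of known variance $\sigma^2$ and $y^\star$ an independent copy of $y$ at the same design, SURE for the predictive risk takes the standard form

\[\mathrm{SURE} = \Vert y - \hat y\Vert^2 + 2\sigma^2\sum_{i=1}^n \frac{\partial \hat y_i}{\partial y_i}\]

The sketch below checks unbiasedness numerically for ridge regression, where the divergence term is the trace of the hat matrix. The dimensions, penalty, and coefficient vector are assumptions chosen for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma, penalty = 20, 50, 1.0, 2.0   # illustrative values (assumed)
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0
mu = X @ beta                              # true mean of y

# Ridge hat matrix H = X (X'X + penalty * I)^{-1} X', so y_hat = H y.
H = X @ np.linalg.solve(X.T @ X + penalty * np.eye(p), X.T)
df = np.trace(H)                           # sum_i d(y_hat_i)/d(y_i) for a linear fit

def sure(y):
    """SURE for the predictive risk of the ridge fit at data y."""
    resid = y - H @ y
    return resid @ resid + 2 * sigma**2 * df

# Exact predictive risk E||y* - H y||^2 for a linear smoother.
bias_sq = np.sum(((np.eye(n) - H) @ mu) ** 2)
exact_risk = n * sigma**2 + bias_sq + sigma**2 * np.trace(H @ H)

# Average SURE over repeated data sets drawn from the model.
draws = [sure(mu + sigma * rng.standard_normal(n)) for _ in range(5000)]
mean_sure = float(np.mean(draws))
print(f"exact risk = {exact_risk:.3f}, average SURE = {mean_sure:.3f}")
```

The average of SURE over simulated data sets should sit close to the exact risk, which is what unbiasedness means here.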


analyze the finite sample predictive risk of global shrinkage regression methods, examine where these methods fall short, and demonstrate a remedy using local shrinkage parameters

  • theoretical findings: an orthogonalized representation that allows shrinkage regression estimates to be viewed as posterior means under some suitable priors.
    • provide explicit finite sample risk comparisons between the global ridge and global-local horseshoe regressions

Shrinkage regression estimates as posterior means

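Let $X=UDW^T$ be the singular value decomposition of the $n\times p$ design matrix, with $D=\mathrm{diag}(d_1,\ldots,d_n)$ holding the singular values. Let $Z=UD$ be $n\times n$, and $\alpha = W^T\beta$ be $n\times 1$. Since $X\beta = UDW^T\beta = Z\alpha$, the regression model becomes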

\[y=Z\alpha + \epsilon\]
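The orthogonalization can be checked numerically. The sketch below uses a random design with $p > n$ (sizes and seed are arbitrary assumptions) and confirms that $X\beta = Z\alpha$, that $Z^TZ = D^2$ is diagonal, and that the least-squares estimate of $\alpha$ reduces to $D^{-1}U^Ty$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 12                                # illustrative sizes with p > n
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + rng.standard_normal(n)

# Thin SVD: U is n x n, d has length n, Wt = W^T is n x p.
U, d, Wt = np.linalg.svd(X, full_matrices=False)
Z = U * d                                   # same as U @ np.diag(d)
alpha = Wt @ beta                           # orthogonalized coefficients

same_mean = np.allclose(X @ beta, Z @ alpha)
gram_is_diag = np.allclose(Z.T @ Z, np.diag(d**2))

# Least squares in the orthogonalized model has a closed form.
alpha_hat = (U.T @ y) / d
alpha_hat_lstsq = np.linalg.lstsq(Z, y, rcond=None)[0]
ols_matches = np.allclose(alpha_hat, alpha_hat_lstsq)
print(same_mean, gram_is_diag, ols_matches)
```

Because $Z^TZ$ is diagonal, the components of $\hat\alpha$ are independent with variances $\sigma^2 d_i^{-2}$, which is the first line of the hierarchical model used next.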

The estimates of many shrinkage regression methods can be expressed as the posterior mean of the orthogonalized regression coefficients $\alpha$ under the following hierarchical model:

\[\begin{align*} (\hat\alpha_i\mid \alpha_i, \sigma^2) &\sim_{ind} N(\alpha_i, \sigma^2d_i^{-2})\\ (\alpha_i\mid \sigma^2,\tau^2, \lambda_i^2) &\sim_{ind} N(0, \sigma^2\tau^2\lambda_i^2) \end{align*}\]

where $\hat\alpha = D^{-1}U^Ty$ is the least-squares estimate of $\alpha$ and $\sigma^2,\tau^2 > 0$,

  • $\tau$: controls the global amount of shrinkage
  • fixed $\lambda_i^2$: depends on the method at hand

several examples:

  • ridge: $\lambda_i^2=1$
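Under the two-level normal model with $\tau^2$ and $\lambda_i^2$ held fixed, standard normal-normal conjugacy gives the posterior mean

\[E(\alpha_i\mid \hat\alpha_i) = \frac{\tau^2\lambda_i^2d_i^2}{1+\tau^2\lambda_i^2d_i^2}\,\hat\alpha_i\]

in which $\sigma^2$ cancels. The sketch below checks the ridge case $\lambda_i^2=1$ against the usual closed-form ridge solution with penalty $1/\tau^2$. The function name `posterior_mean_alpha` and all numerical settings are illustrative assumptions.

```python
import numpy as np

def posterior_mean_alpha(alpha_hat, d, tau2, lam2):
    """Posterior mean of alpha under the two-level normal model (tau2, lam2 fixed)."""
    k = tau2 * lam2 * d**2
    return k / (1.0 + k) * alpha_hat

rng = np.random.default_rng(2)
n, p, tau2 = 6, 15, 0.5                     # illustrative values (assumed)
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

U, d, Wt = np.linalg.svd(X, full_matrices=False)
alpha_hat = (U.T @ y) / d                   # least squares in the orthogonalized model

# Ridge as a posterior mean: lambda_i^2 = 1 for every component.
alpha_bayes = posterior_mean_alpha(alpha_hat, d, tau2, lam2=np.ones(n))

# Classical ridge in the original coordinates with penalty 1 / tau2.
penalty = 1.0 / tau2
beta_ridge = np.linalg.solve(X.T @ X + penalty * np.eye(p), X.T @ y)

coef_match = np.allclose(Wt @ beta_ridge, alpha_bayes)
fit_match = np.allclose(X @ beta_ridge, (U * d) @ alpha_bayes)
print(coef_match, fit_match)
```

With $\lambda_i^2=1$ the shrinkage weight depends on $i$ only through $d_i$, which is why the relative shrinkage of a purely global method is monotone in the singular values.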

Stein’s unbiased risk estimate for the global shrinkage regression

Stein’s unbiased risk estimate for the horseshoe regression

the global-local horseshoe shrinkage regression extends the global shrinkage regression models by placing a local (component-specific), heavy-tailed half-Cauchy prior on the $\lambda_i$ terms, which allows these terms to be learned from the data

\[\begin{align*} (\hat\alpha_i\mid \alpha_i, \sigma^2) &\sim_{ind} N(\alpha_i, \sigma^2d_i^{-2})\\ (\alpha_i\mid \sigma^2, \tau^2, \lambda_i^2) & \sim_{ind} N(0, \sigma^2\tau^2\lambda_i^2)\\ \lambda_i & \sim_{ind} C^+(0, 1) \end{align*}\]

where $C^+(0, 1)$ denotes a standard half-Cauchy random variable with density $p(\lambda_i) = (2/\pi)(1+\lambda_i^2)^{-1}$ for $\lambda_i > 0$.
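To see what the half-Cauchy prior implies for shrinkage, write the conditional posterior mean as $(1-\kappa_i)\hat\alpha_i$ with $\kappa_i = 1/(1+\tau^2\lambda_i^2d_i^2)$, which follows from the normal-normal part of the model. The sketch below samples $\lambda_i$ by inverting the half-Cauchy CDF $(2/\pi)\arctan(\lambda)$ and looks at the prior distribution of $\kappa_i$ in the simplified case $\tau = d_i = 1$ (an assumption made only for illustration).

```python
import math
import random

def half_cauchy_cdf(lam):
    """CDF of the standard half-Cauchy distribution."""
    return (2.0 / math.pi) * math.atan(lam) if lam >= 0 else 0.0

def sample_half_cauchy(rng):
    """Inverse-CDF draw: lam = tan(pi * u / 2) with u uniform on [0, 1)."""
    return math.tan(0.5 * math.pi * rng.random())

rng = random.Random(0)
draws = [sample_half_cauchy(rng) for _ in range(20000)]

# Shrinkage weight kappa = 1 / (1 + lam^2) when tau = d_i = 1 (assumed).
kappa = [1.0 / (1.0 + lam**2) for lam in draws]
frac_near_zero = sum(k < 0.1 for k in kappa) / len(kappa)   # almost no shrinkage
frac_near_one = sum(k > 0.9 for k in kappa) / len(kappa)    # almost total shrinkage
print(f"P(kappa < 0.1) ~ {frac_near_zero:.3f}, P(kappa > 0.9) ~ {frac_near_one:.3f}")
```

A priori the weight piles up near both 0 and 1, so each component is either left nearly untouched or shrunk nearly to zero. This component-specific behavior is what a single global parameter cannot provide.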

Published in categories Note