Prediction Risk for the Horseshoe Regression

Posted on Mar 24, 2022

Tags: Horseshoe, Ridge

This is a note on Bhadra, A., Datta, J., Li, Y., Polson, N. G., & Willard, B. (2019). Prediction Risk for the Horseshoe Regression. 39.

The paper develops theoretical results on prediction risk in the high-dimensional linear regression model

\[y = X\beta + \epsilon\]

Consider the quadratic predictive risk

\[R = E_{y^\star, y\mid X, \beta}\|y^\star - X\hat\beta\|^2,\]

where $y^\star$ is a new response vector drawn at the same design $X$.
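Assuming, as is standard for fixed-design prediction, that $y^\star = X\beta + \epsilon^\star$ with $\epsilon^\star \sim N(0, \sigma^2 I_n)$ independent of $y$ (this noise model is an assumption of this sketch, not spelled out above), the risk splits into irreducible noise plus in-sample estimation error:

\[R = E\|X\beta + \epsilon^\star - X\hat\beta\|^2 = n\sigma^2 + E_{y\mid X,\beta}\|X\beta - X\hat\beta\|^2,\]

because the cross term $2E[\epsilon^{\star T}(X\beta - X\hat\beta)]$ vanishes: $\epsilon^\star$ has mean zero and is independent of $\hat\beta$, which depends only on $y$.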

The focus is on comparing estimators $\hat\beta$ in a non-asymptotic setting with fixed $n$ and fixed $p > n$.

Global shrinkage refers to shrinkage estimators with a single tuning parameter, for example:

- ridge regression
- principal components regression

Purely global shrinkage regression methods suffer from two major difficulties:

- the amount of relative shrinkage is monotone in the singular values of the design matrix;
- the shrinkage is determined by a single tuning parameter.
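To see the first difficulty concretely, the sketch below relies on two standard facts not stated in the note: writing $X = UDW^T$ for the singular value decomposition (as in the next section), ridge with penalty $\lambda$ multiplies the $i$-th orthogonalized component by $d_i^2/(d_i^2+\lambda)$, and principal components regression with $k$ components keeps the $k$ components with the largest singular values and drops the rest. The helper names and the simulated design are hypothetical.

```python
import numpy as np

def ridge_factors(d, lam):
    # Ridge shrinks the i-th orthogonalized component by d_i^2 / (d_i^2 + lam)
    return d**2 / (d**2 + lam)

def pcr_factors(d, k):
    # PCR keeps the k components with the largest singular values, drops the rest
    f = np.zeros_like(d)
    f[np.argsort(d)[::-1][:k]] = 1.0
    return f

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 30))               # hypothetical design with p > n
d = np.linalg.svd(X, compute_uv=False)          # singular values, sorted descending

print(np.round(ridge_factors(d, lam=5.0), 3))   # decreases along with d
print(pcr_factors(d, k=3))                      # ones first, then zeros
```

However $\lambda$ or $k$ is tuned, a component with a small singular value is shrunk at least as much as one with a large singular value, regardless of how large its coefficient actually is.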

A finite-sample unbiased estimate of $R$ is given by Stein's unbiased risk estimate (SURE).
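A minimal sketch of SURE for a linear smoother $X\hat\beta = Hy$, assuming Gaussian noise with known $\sigma^2$ and using ridge with penalty $\tau^{-2}$ as the smoother. For the predictive risk $R$, SURE takes the form $\|y - Hy\|^2 + 2\sigma^2\operatorname{tr}(H)$; the code checks by simulation that its average matches the closed-form risk $n\sigma^2 + \|(I-H)X\beta\|^2 + \sigma^2\operatorname{tr}(HH^T)$. All function names and the simulated $X$, $\beta$ are hypothetical.

```python
import numpy as np

def ridge_hat_matrix(X, tau2):
    """Hat matrix H with X @ beta_hat = H @ y for ridge with penalty 1 / tau2.

    Ridge is the posterior mean under beta ~ N(0, sigma^2 tau^2 I),
    so the penalty is lambda = 1 / tau^2.
    """
    p = X.shape[1]
    return X @ np.linalg.solve(X.T @ X + np.eye(p) / tau2, X.T)

def sure_linear(y, H, sigma2):
    """SURE for the predictive risk of y_hat = H y: ||y - H y||^2 + 2 sigma^2 tr(H)."""
    resid = y - H @ y
    return resid @ resid + 2.0 * sigma2 * np.trace(H)

def exact_pred_risk(X, beta, H, sigma2):
    """E||y* - H y||^2 with y, y* independent N(X beta, sigma^2 I)."""
    n = X.shape[0]
    bias = X @ beta - H @ (X @ beta)
    return n * sigma2 + bias @ bias + sigma2 * np.trace(H @ H.T)

rng = np.random.default_rng(0)
n, p, sigma2, tau2 = 20, 50, 1.0, 0.5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = 3.0                     # a few large signals, the rest exactly zero
H = ridge_hat_matrix(X, tau2)

# Monte Carlo average of SURE over fresh noise draws
sures = [sure_linear(X @ beta + np.sqrt(sigma2) * rng.standard_normal(n), H, sigma2)
         for _ in range(4000)]
mc_sure = float(np.mean(sures))
exact = float(exact_pred_risk(X, beta, H, sigma2))
print(mc_sure, exact)              # the two numbers should be close
```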

Contributions:

- Analyze the finite-sample predictive risk of global shrinkage regression methods, examine where these methods fall short, and demonstrate a remedy using local shrinkage parameters.
- Theoretical findings:
  - an orthogonalized representation that allows shrinkage regression estimates to be viewed as posterior means under suitable priors;
  - explicit finite-sample risk comparisons between the global ridge and the global-local horseshoe regressions.

Shrinkage regression estimates as posterior means

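Let $X = UDW^T$ be the singular value decomposition of $X$, where (assuming $X$ has full row rank $n$) $U$ is an $n\times n$ orthogonal matrix, $D = \mathrm{diag}(d_1, \dots, d_n)$ holds the singular values, and $W$ is $p\times n$ with orthonormal columns. Let $Z = UD$ ($n\times n$) and $\alpha = W^T\beta$ ($n\times 1$). Since $X\beta = UDW^T\beta = Z\alpha$, the regression model becomes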

\[y=Z\alpha + \epsilon\]
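A quick numerical check of this reparametrization, assuming a thin SVD (`full_matrices=False`) and a random Gaussian design, which has full row rank almost surely. It also confirms that $Z^TZ = D^2$, so the least squares estimate in the new coordinates is $\hat\alpha = D^{-1}U^Ty$. Sizes and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 8, 25                         # hypothetical sizes with p > n
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)

# Thin SVD: U is n x n, d holds the n singular values, Wt is n x p (i.e. W^T)
U, d, Wt = np.linalg.svd(X, full_matrices=False)
Z = U * d                            # Z = U D (column i scaled by d_i)
alpha = Wt @ beta                    # alpha = W^T beta, length n

# The mean of y is unchanged by the reparametrization
print(np.allclose(X @ beta, Z @ alpha))
# Z has orthogonal columns with Z^T Z = D^2, so the model decouples
print(np.allclose(Z.T @ Z, np.diag(d**2)))

# Least squares in the new coordinates: alpha_hat = D^{-1} U^T y
y = X @ beta + rng.standard_normal(n)
alpha_hat = (U.T @ y) / d
print(np.allclose(alpha_hat, np.linalg.solve(Z.T @ Z, Z.T @ y)))
```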

The estimates of many shrinkage regression methods can be expressed as posterior means of the orthogonalized regression coefficients $\alpha$. Let $\hat\alpha = (Z^TZ)^{-1}Z^Ty = D^{-1}U^Ty$ be the least squares estimate of $\alpha$, and consider the following hierarchical model:

\[\begin{align*} (\hat\alpha_i\mid \alpha_i, \sigma^2) &\sim_{ind} N(\alpha_i, \sigma^2d_i^{-2})\\ (\alpha_i\mid \sigma^2,\tau^2, \lambda_i^2) &\sim_{ind} N(0, \sigma^2\tau^2\lambda_i^2) \end{align*}\]

where $\sigma^2, \tau^2 > 0$ and

- $\tau$ is a global parameter that controls the amount of shrinkage;
- the fixed $\lambda_i^2$ depend on the method at hand.
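Because both levels are Gaussian, the posterior mean follows by combining the data precision $d_i^2/\sigma^2$ with the prior precision $1/(\sigma^2\tau^2\lambda_i^2)$. This short derivation is added here for clarity and is not quoted from the paper:

\[E(\alpha_i\mid\hat\alpha_i,\sigma^2,\tau^2,\lambda_i^2) = \frac{d_i^2/\sigma^2}{d_i^2/\sigma^2 + 1/(\sigma^2\tau^2\lambda_i^2)}\,\hat\alpha_i = \frac{\tau^2\lambda_i^2 d_i^2}{1+\tau^2\lambda_i^2 d_i^2}\,\hat\alpha_i = (1-\kappa_i)\,\hat\alpha_i, \qquad \kappa_i = \frac{1}{1+\tau^2\lambda_i^2 d_i^2}.\]

The shrinkage coefficient $\kappa_i \in (0, 1)$ is, for fixed $\lambda_i$, decreasing in $d_i$: components with small singular values are shrunk the most.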

For example:

- ridge regression: $\lambda_i^2 = 1$ for all $i$
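With $\lambda_i^2 = 1$ the posterior mean shrinks each $\hat\alpha_i$ by $\tau^2 d_i^2/(1+\tau^2 d_i^2) = d_i^2/(d_i^2+\tau^{-2})$. The sketch below checks numerically, assuming a thin SVD and mapping back with $\hat\beta = W\,E(\alpha\mid\hat\alpha)$, that this coincides with the usual ridge solution $(X^TX + \tau^{-2}I_p)^{-1}X^Ty$. The simulated data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, tau2 = 10, 40, 0.3                     # hypothetical sizes and tau^2
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

U, d, Wt = np.linalg.svd(X, full_matrices=False)
alpha_hat = (U.T @ y) / d                    # least squares in alpha coordinates

# Posterior mean with lambda_i^2 = 1: shrink alpha_hat_i by tau^2 d_i^2 / (1 + tau^2 d_i^2)
weights = tau2 * d**2 / (1.0 + tau2 * d**2)
beta_post = Wt.T @ (weights * alpha_hat)     # map back: beta = W alpha

# Ordinary ridge with penalty 1 / tau^2
beta_ridge = np.linalg.solve(X.T @ X + np.eye(p) / tau2, X.T @ y)

print(np.allclose(beta_post, beta_ridge))    # True
```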

Stein’s unbiased risk estimate for the global shrinkage regression
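A derivation sketch under the hierarchy above (not quoted from the paper), with fixed $\lambda_i$ and known $\sigma^2$. Writing $\kappa_i = 1/(1+\tau^2\lambda_i^2 d_i^2)$, the fitted values are $X\hat\beta = Z\,\mathrm{diag}(1-\kappa_i)\,\hat\alpha = U\,\mathrm{diag}(1-\kappa_i)\,U^Ty$, a linear smoother whose trace is $\sum_i (1-\kappa_i)$. Since $U$ is orthogonal and $u_i^Ty = d_i\hat\alpha_i$,

\[\widehat{R} = \|y - X\hat\beta\|^2 + 2\sigma^2\sum_{i=1}^n \frac{\partial (X\hat\beta)_i}{\partial y_i} = \sum_{i=1}^n \left[\kappa_i^2 d_i^2\hat\alpha_i^2 + 2\sigma^2(1-\kappa_i)\right].\]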

Stein’s unbiased risk estimate for the horseshoe regression

The global-local horseshoe shrinkage regression extends the global shrinkage regression model by placing a local (component-specific), heavy-tailed half-Cauchy prior on each $\lambda_i$, which allows these terms to be learned from the data:

\[\begin{align*} (\hat\alpha_i\mid \alpha_i, \sigma^2) &\sim_{ind} N(\alpha_i, \sigma^2d_i^{-2})\\ (\alpha_i\mid \sigma^2, \tau^2, \lambda_i^2) & \sim_{ind} N(0, \sigma^2\tau^2\lambda_i^2)\\ \lambda_i & \sim_{ind} C^+(0, 1) \end{align*}\]

where $C^+(0, 1)$ denotes a standard half-Cauchy random variable with density $p(\lambda_i) = (2/\pi)(1+\lambda_i^2)^{-1}$ for $\lambda_i > 0$.
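Under the horseshoe the shrinkage factor is no longer fixed. Given $\tau$, $\sigma$ and $d_i$, integrating out $\alpha_i$ gives $\hat\alpha_i\mid\lambda_i \sim N(0, \sigma^2(d_i^{-2}+\tau^2\lambda_i^2))$, and the posterior mean is $E(1-\kappa_i\mid\hat\alpha_i)\,\hat\alpha_i$ with $\kappa_i = 1/(1+\tau^2\lambda_i^2d_i^2)$. A minimal quadrature sketch, assuming $\tau$, $\sigma$ and $d_i$ are known and using the substitution $\lambda_i = \tan\theta$, under which the half-Cauchy prior becomes uniform on $(0, \pi/2)$; the function name and grid size are hypothetical choices.

```python
import numpy as np

def horseshoe_weight(alpha_hat, d, tau, sigma=1.0, grid=200_000):
    """Posterior mean of 1 - kappa_i given alpha_hat_i, with lambda_i ~ C+(0, 1).

    The posterior mean of alpha_i is horseshoe_weight(...) * alpha_hat.
    With lambda = tan(theta) the half-Cauchy prior is uniform on (0, pi/2),
    so a plain midpoint rule in theta is enough.
    """
    theta = (np.arange(grid) + 0.5) * (np.pi / 2) / grid   # midpoints in (0, pi/2)
    lam = np.tan(theta)
    # Marginal variance of alpha_hat_i given lambda_i (alpha_i integrated out)
    var = sigma**2 * (1.0 / d**2 + tau**2 * lam**2)
    lik = np.exp(-0.5 * alpha_hat**2 / var) / np.sqrt(var)  # normal density up to a constant
    shrink = tau**2 * lam**2 * d**2 / (1.0 + tau**2 * lam**2 * d**2)  # 1 - kappa_i
    return float(np.sum(lik * shrink) / np.sum(lik))

ridge_weight = 0.5   # lambda_i = 1 with tau = d = 1: tau^2 d^2 / (1 + tau^2 d^2)
for a in (0.5, 2.0, 5.0):
    print(a, round(horseshoe_weight(a, d=1.0, tau=1.0), 3), ridge_weight)
```

In this sketch the ridge weight is the same for every $\hat\alpha_i$, while the horseshoe weight grows with $|\hat\alpha_i|$: small estimates are shrunk hard and large ones are left nearly untouched.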

Published in categories Note


WeiYa's Work Yard
