# Prediction Risk for the Horseshoe Regression

This note is on Bhadra, A., Datta, J., Li, Y., Polson, N. G., & Willard, B. (2019). Prediction Risk for the Horseshoe Regression. 39.

The paper develops theoretical results on prediction risk in the high-dimensional linear regression model

\[y = X\beta + \epsilon\]

Consider the quadratic predictive risk

\[R = E_{y^\star, y\mid X, \beta}\|y^\star - X\hat\beta\|^2\]

where $y^\star$ is a future response at the same design $X$. The focus is on comparing estimators $\hat\beta$ in a non-asymptotic setting with fixed $n$ and fixed $p > n$.

global shrinkage: shrinkage estimators with a single tuning parameter

- ridge regression
- principal components regression

purely global shrinkage regression methods suffer from two major difficulties

- the amount of relative shrinkage is monotone in the singular values of the design matrix
- the shrinkage is determined by a single tuning parameter
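
The first difficulty can be seen numerically for ridge regression. The sketch below is a minimal illustration, assuming the usual ridge fit in which the component of the least-squares fit along the $i$th singular direction is multiplied by $d_i^2/(d_i^2+\kappa)$; the penalty name `kappa` and the singular values are hypothetical, chosen only for illustration.

```python
# Relative shrinkage of ridge regression along each singular direction.
# Assumption: the ridge penalty is called `kappa` here (hypothetical name),
# so direction i with singular value d_i is scaled by d_i^2 / (d_i^2 + kappa).

def ridge_shrinkage_factors(singular_values, kappa):
    """Return the factor applied to the least-squares fit in each direction."""
    return [d * d / (d * d + kappa) for d in singular_values]


singular_values = [5.0, 2.0, 1.0, 0.5, 0.1]  # illustrative values, largest first
for kappa in (0.1, 1.0, 10.0):
    factors = ridge_shrinkage_factors(singular_values, kappa)
    print(kappa, [round(f, 3) for f in factors])
```

Whatever value `kappa` takes, the ordering of the factors follows the ordering of the singular values, so a single tuning parameter cannot shrink a large-$d_i$ direction more heavily than a small-$d_i$ one.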

A finite-sample unbiased estimate of $R$ is given by Stein’s unbiased risk estimate (SURE).
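
For reference, a standard form of SURE for prediction error is sketched here. It assumes $\epsilon \sim N(0, \sigma^2 I_n)$ and a fit $\hat y = X\hat\beta$ that is weakly differentiable in $y$; these conditions are stated as assumptions, since the note does not spell them out.

\[\mathrm{SURE} = \|y - \hat y\|^2 + 2\sigma^2\sum_{i=1}^n \frac{\partial \hat y_i}{\partial y_i}, \qquad E(\mathrm{SURE}) = R\]

The first term is the training error and the second corrects its optimism. For a linear fit $\hat y = Hy$ the sum reduces to $\mathrm{tr}(H)$.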

### contributions

analyze the finite sample predictive risk of global shrinkage regression methods, examine where these methods fall short, and demonstrate a remedy using local shrinkage parameters

- theoretical findings: an orthogonalized representation that allows shrinkage regression estimates to be viewed as posterior means under some suitable priors.
- provide explicit finite sample risk comparisons between the global ridge and global-local horseshoe regressions

## Shrinkage regression estimates as posterior means

Let $X=UDW^T$ be the singular value decomposition of $X$, with $D = \mathrm{diag}(d_1,\ldots,d_n)$, and let $Z=UD$ be $n\times n$ and $\alpha = W^T\beta$ be $n\times 1$. Since $X\beta = UDW^T\beta = Z\alpha$, the regression model becomes

\[y=Z\alpha + \epsilon\]

The estimates of many shrinkage regression methods can be expressed in terms of the posterior mean of the orthogonalized regression coefficients $\alpha$ under the following hierarchical model:

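\[\begin{align*} (\hat\alpha_i\mid \alpha_i, \sigma^2) &\sim_{ind} N(\alpha_i, \sigma^2d_i^{-2})\\ (\alpha_i\mid \sigma^2,\tau^2, \lambda_i^2) &\sim_{ind} N(0, \sigma^2\tau^2\lambda_i^2) \end{align*}\]

where $\hat\alpha = (Z^TZ)^{-1}Z^Ty = D^{-1}U^Ty$ is the least-squares estimate of $\alpha$, $d_i$ is the $i$th diagonal entry of $D$, and $\sigma^2,\tau^2 > 0$. The two scale parameters play different roles:

- $\tau$: controls the amount of shrinkage
- $\lambda_i^2$ (fixed): depends on the method at hand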

several examples:

- ridge: $\lambda_i^2=1$
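
Because both layers of the hierarchy are normal, the posterior mean of $\alpha_i$ is available in closed form by standard conjugacy. A short worked version, writing $\tilde\alpha_i$ for the posterior mean (a shorthand assumed here, not taken from the paper):

\[\tilde\alpha_i = E(\alpha_i\mid \hat\alpha_i, \sigma^2, \tau^2, \lambda_i^2) = \frac{\sigma^2\tau^2\lambda_i^2}{\sigma^2\tau^2\lambda_i^2 + \sigma^2 d_i^{-2}}\,\hat\alpha_i = \frac{\tau^2\lambda_i^2 d_i^2}{1+\tau^2\lambda_i^2 d_i^2}\,\hat\alpha_i\]

For ridge ($\lambda_i^2 = 1$) the weight becomes $d_i^2/(d_i^2 + \tau^{-2})$, which is the familiar ridge shrinkage factor with penalty $\tau^{-2}$.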

## Stein’s unbiased risk estimate for the global shrinkage regression
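
As a rough illustration of what SURE looks like when the $\lambda_i^2$ are fixed, the sketch below evaluates it in the orthogonalized coordinates. It is derived here from the hierarchical model above rather than quoted from the paper, and it assumes $U$ is $n\times n$ orthogonal and $\sigma^2$ is known. The fit shrinks $\hat\alpha_i$ by $w_i = s_i/(1+s_i)$ with $s_i = \tau^2\lambda_i^2 d_i^2$, so the training residual in direction $i$ is $d_i(1-w_i)\hat\alpha_i$ and the divergence term is $\sum_i w_i$. The function name and all input values are hypothetical.

```python
def global_shrinkage_sure(alpha_hat, d, sigma2, tau2, lam2=None):
    """SURE for the fit Z * alpha_tilde with fixed local scales lam2.

    alpha_tilde_i = w_i * alpha_hat_i, with w_i = s_i / (1 + s_i)
    and s_i = tau2 * lam2_i * d_i^2. Defaults to ridge (lam2_i = 1).
    """
    if lam2 is None:
        lam2 = [1.0] * len(d)  # ridge: all local scales equal to one
    total = 0.0
    for a, di, l2 in zip(alpha_hat, d, lam2):
        s = tau2 * l2 * di * di
        w = s / (1.0 + s)
        residual = (di * (1.0 - w) * a) ** 2  # squared training residual, direction i
        penalty = 2.0 * sigma2 * w            # 2 * sigma^2 * d(yhat_i)/d(y_i)
        total += residual + penalty
    return total


# Hypothetical inputs, for illustration only.
alpha_hat = [3.0, -0.5, 0.2]
d = [2.0, 1.0, 0.5]
sigma2 = 1.0
tau2_grid = [0.01, 0.1, 1.0, 10.0, 100.0]
best_tau2 = min(tau2_grid, key=lambda t: global_shrinkage_sure(alpha_hat, d, sigma2, t))
print(best_tau2)
```

Minimizing over a grid of `tau2` values is one simple way to use SURE for tuning; the grid itself is an arbitrary choice.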

## Stein’s unbiased risk estimate for the horseshoe regression

The global-local horseshoe shrinkage regression extends the global shrinkage regression models by putting a local (component-specific), heavy-tailed half-Cauchy prior on the $\lambda_i$ terms, which allows these terms to be learned from the data:

\[\begin{align*} (\hat\alpha_i\mid \alpha_i, \sigma^2) &\sim_{ind} N(\alpha_i, \sigma^2d_i^{-2})\\ (\alpha_i\mid \sigma^2, \tau^2, \lambda_i^2) & \sim_{ind} N(0, \sigma^2\tau^2\lambda_i^2)\\ \lambda_i & \sim_{ind} C^+(0, 1) \end{align*}\]

where $C^+(0, 1)$ denotes the standard half-Cauchy distribution with density $p(\lambda_i) = (2/\pi)(1+\lambda_i^2)^{-1}$ for $\lambda_i > 0$.
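
To get a feel for what the half-Cauchy prior implies, the sketch below draws $\lambda_i$ by inverting the half-Cauchy CDF, $(2/\pi)\arctan(\lambda)$, and looks at the conditional shrinkage weight $\tau^2\lambda_i^2d_i^2/(1+\tau^2\lambda_i^2d_i^2)$ that the normal layers give for a fixed $\lambda_i$. It only illustrates the prior: the actual horseshoe estimate averages over the posterior of $\lambda_i$, which is not done here. The values $d_i = 1$ and $\tau^2 = 1$, the seed, and the function names are assumptions made for illustration.

```python
import math
import random


def half_cauchy_pdf(lam):
    """Density of C^+(0, 1): (2 / pi) / (1 + lam^2) for lam >= 0."""
    return (2.0 / math.pi) / (1.0 + lam * lam) if lam >= 0 else 0.0


def half_cauchy_quantile(u):
    """Inverse CDF; the CDF is (2 / pi) * atan(lam)."""
    return math.tan(0.5 * math.pi * u)


def shrinkage_weight(lam, d, tau2):
    """Weight on alpha_hat_i given lam: s / (1 + s), s = tau2 * lam^2 * d^2."""
    s = tau2 * lam * lam * d * d
    return s / (1.0 + s)


rng = random.Random(0)  # fixed seed, illustration only
lams = [half_cauchy_quantile(rng.random()) for _ in range(50_000)]
weights = [shrinkage_weight(lam, d=1.0, tau2=1.0) for lam in lams]

# Fraction of prior draws that shrink almost fully or almost not at all.
near_zero = sum(w < 0.1 for w in weights) / len(weights)
near_one = sum(w > 0.9 for w in weights) / len(weights)
print(round(near_zero, 3), round(near_one, 3))
```

A sizeable share of the prior mass sits near both ends of the weight range, which is the sense in which the local scales let some components be shrunk heavily while others are left nearly untouched.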