Prediction Risk for the Horseshoe Regression
This note is on Bhadra, A., Datta, J., Li, Y., Polson, N. G., & Willard, B. (2019). Prediction Risk for the Horseshoe Regression. 39.
The paper develops theoretical results on prediction risk in the high-dimensional linear regression model
\[y = X\beta + \epsilon\]
Consider the quadratic predictive risk
\[R = E_{y^\star, y\mid X, \beta}\Vert y^\star - X\hat\beta\Vert^2\]
and focus on comparing estimators $\hat\beta$ in a non-asymptotic setting with fixed $n$ and fixed $p > n$.
global shrinkage: shrinkage estimators with a single tuning parameter
- ridge regression
- principal components regression
purely global shrinkage regression methods suffer from two major difficulties
- the amount of relative shrinkage is monotone in the singular values of the design matrix
- the shrinkage is determined by a single tuning parameter
A finite-sample unbiased estimate of $R$ is given by Stein’s unbiased risk estimate (SURE).
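To make the unbiasedness claim concrete, here is a minimal sketch for ridge regression. It assumes that $R$ sums squared errors over the $n$ observations, that $\sigma^2$ is known, and that SURE takes the standard form $\Vert y - X\hat\beta\Vert^2 + 2\sigma^2\sum_i \partial (X\hat\beta)_i/\partial y_i$, which for a linear fit $X\hat\beta = Hy$ reduces to $\Vert y - Hy\Vert^2 + 2\sigma^2\,\mathrm{tr}(H)$. The sizes, the signal `beta`, and the penalty `k` are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma = 20, 50, 1.0            # illustrative sizes with p > n
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                       # hypothetical sparse signal
mu = X @ beta                        # noiseless mean of y

k = 1.0                              # hypothetical ridge penalty
# Ridge hat matrix H, so that the fitted values are H @ y.
H = X @ np.linalg.solve(X.T @ X + k * np.eye(p), X.T)

def sure_linear(y, H, sigma):
    """SURE for a linear fit H @ y: residual sum of squares + 2 sigma^2 tr(H)."""
    resid = y - H @ y
    return resid @ resid + 2.0 * sigma**2 * np.trace(H)

# Exact predictive risk E||y* - H y||^2 for a fixed linear fit:
# noise in y* + squared bias + variance of the fit.
bias = mu - H @ mu
risk = n * sigma**2 + bias @ bias + sigma**2 * np.trace(H @ H.T)

# Average SURE over simulated data sets; it should be close to the exact risk.
reps = 5000
sure_vals = np.array([
    sure_linear(mu + sigma * rng.standard_normal(n), H, sigma)
    for _ in range(reps)
])
sure_mean = sure_vals.mean()
print(risk, sure_mean)
```

The point of the comparison is that SURE is computed from the observed $y$ alone, while the exact risk needs the unknown mean $X\beta$.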
contributions
analyze the finite sample predictive risk of global shrinkage regression methods, examine where these methods fall short, and demonstrate a remedy using local shrinkage parameters
- theoretical findings: an orthogonalized representation that allows shrinkage regression estimates to be viewed as posterior means under some suitable priors.
- provide explicit finite sample risk comparisons between the global ridge and global-local horseshoe regressions
Shrinkage regression estimates as posterior means
Let $X=UDW^T$ be the singular value decomposition of $X$, with $D=\mathrm{diag}(d_1,\ldots,d_n)$, and let $Z=UD$ be $n\times n$ and $\alpha = W^T\beta$ be $n\times 1$. Then the regression model becomes
\[y=Z\alpha + \epsilon\]
The estimates of many shrinkage regression methods can be expressed in terms of the posterior mean of the orthogonalized regression coefficients $\alpha$ under the following hierarchical model:
\[\begin{align*} (\hat\alpha_i\mid \alpha_i, \sigma^2) &\sim_{ind} N(\alpha_i, \sigma^2d_i^{-2})\\ (\alpha_i\mid \sigma^2,\tau^2, \lambda_i^2) &\sim_{ind} N(0, \sigma^2\tau^2\lambda_i^2) \end{align*}\]
where $\hat\alpha = D^{-1}U^Ty$ is the least squares estimate of $\alpha$, $d_i$ is the $i$-th singular value of $X$, and $\sigma^2,\tau^2 > 0$,
- $\tau$: controls the amount of shrinkage
- fixed $\lambda_i^2$: depends on the method at hand
several examples:
- ridge: $\lambda_i^2=1$
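Under the hierarchical model above, the posterior mean of each $\alpha_i$ follows from normal-normal conjugacy: $E(\alpha_i\mid\hat\alpha_i) = \frac{\tau^2\lambda_i^2d_i^2}{1+\tau^2\lambda_i^2d_i^2}\hat\alpha_i$. The sketch below checks numerically that with $\lambda_i^2=1$ this reproduces the ridge fit with penalty $1/\tau^2$. The simulated data, the value of `tau2`, and the helper name `posterior_mean` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50                        # illustrative sizes with p > n
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                       # hypothetical sparse signal
y = X @ beta + rng.standard_normal(n)

# Thin SVD: U is n x n, d holds the n singular values, Wt is n x p.
U, d, Wt = np.linalg.svd(X, full_matrices=False)
Z = U * d                            # Z = U D (scales column i of U by d_i)
alpha_hat = (U.T @ y) / d            # least squares estimate D^{-1} U^T y

def posterior_mean(alpha_hat, d, tau2, lam2):
    """Posterior mean of alpha given alpha_hat for fixed tau^2 and lambda_i^2."""
    w = tau2 * lam2 * d**2 / (1.0 + tau2 * lam2 * d**2)
    return w * alpha_hat

tau2 = 0.5                           # hypothetical global shrinkage parameter
ridge_weights = tau2 * d**2 / (1.0 + tau2 * d**2)
alpha_ridge = posterior_mean(alpha_hat, d, tau2, lam2=np.ones(n))
fit_posterior = Z @ alpha_ridge

# Direct ridge fit with penalty k = 1 / tau^2 for comparison.
k = 1.0 / tau2
beta_ridge = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
fit_ridge = X @ beta_ridge

print(np.allclose(fit_posterior, fit_ridge))
```

Because `np.linalg.svd` returns the singular values in decreasing order, `ridge_weights` is decreasing too: directions with small singular values are always shrunk the most, whatever the data say.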
Stein’s unbiased risk estimate for the global shrinkage regression
Stein’s unbiased risk estimate for the horseshoe regression
the global-local horseshoe shrinkage regression extends the global shrinkage regression models by putting a local (component-specific), heavy-tailed half-Cauchy prior on the $\lambda_i$ terms, which allows these terms to be learned from the data
\[\begin{align*} (\hat\alpha_i\mid \alpha_i, \sigma^2) &\sim_{ind} N(\alpha_i, \sigma^2d_i^{-2})\\ (\alpha_i\mid \sigma^2, \tau^2, \lambda_i^2) & \sim_{ind} N(0, \sigma^2\tau^2\lambda_i^2)\\ \lambda_i & \sim_{ind} C^+(0, 1) \end{align*}\]
where $C^+(0, 1)$ denotes a standard half-Cauchy random variable with density $p(\lambda_i) = (2/\pi)(1+\lambda_i^2)^{-1}$ for $\lambda_i > 0$.
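To see what learning $\lambda_i$ from the data buys, the following sketch computes the posterior shrinkage weight $E\left(\frac{\tau^2\lambda_i^2d_i^2}{1+\tau^2\lambda_i^2d_i^2}\,\Big|\,\hat\alpha_i\right)$ by numerical integration over $\lambda_i$, for fixed $\tau$ and $\sigma$. It uses the substitution $\lambda_i=\tan\theta$, under which the half-Cauchy prior becomes uniform on $(0,\pi/2)$, and the marginal $\hat\alpha_i\mid\lambda_i \sim N(0, \sigma^2(d_i^{-2}+\tau^2\lambda_i^2))$. The function name, the grid size, and the midpoint rule are illustrative choices, not the paper's method.

```python
import numpy as np

def horseshoe_shrinkage_weight(alpha_hat, d, tau=1.0, sigma=1.0, n_grid=20000):
    """Posterior mean of the shrinkage weight for one coordinate.

    Integrates lambda out with a midpoint rule in theta, where lambda = tan(theta);
    the half-Cauchy prior on lambda is uniform in theta on (0, pi/2).
    """
    theta = (np.arange(n_grid) + 0.5) * (np.pi / 2) / n_grid
    lam2 = np.tan(theta) ** 2
    # Marginal variance of alpha_hat given lambda (alpha integrated out).
    var = sigma**2 * (1.0 / d**2 + tau**2 * lam2)
    # Marginal likelihood of alpha_hat given lambda, up to a constant factor.
    lik = np.exp(-0.5 * alpha_hat**2 / var) / np.sqrt(var)
    # Shrinkage weight for a fixed lambda, as in the global model.
    w = tau**2 * lam2 * d**2 / (1.0 + tau**2 * lam2 * d**2)
    return np.sum(w * lik) / np.sum(lik)

w_small = horseshoe_shrinkage_weight(alpha_hat=0.1, d=1.0)
w_large = horseshoe_shrinkage_weight(alpha_hat=5.0, d=1.0)
print(w_small, w_large)
```

The posterior mean of $\alpha_i$ is this weight times $\hat\alpha_i$. Unlike the ridge weight, it depends on $\hat\alpha_i$ itself: a large $\lvert\hat\alpha_i\rvert$ pulls the weight toward 1 (little shrinkage), while a small one is shrunk heavily. With $d_i=\tau=\sigma=1$ and $\hat\alpha_i=0$ the integral can be done by hand and gives a weight of $1/3$.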