Bayesian Sparse Multiple Regression

Posted on Sep 16, 2021

In the context of multiple response regression, a popular technique to achieve parsimony and interpretability is to consider a reduced-rank decomposition of the coefficient matrix, commonly known as reduced rank regression.

Let $X\in \IR^{n\times p}$ and $Y\in \IR^{n\times q}$, and consider the multivariate linear regression model

$Y = XC + E\,, \quad E=(e_1,\ldots, e_n)^T$

where

• the response has been centred, so no intercept term is included
• the rows of the error matrix are independent, with $e_i\sim N(0, \Sigma)$
• we focus on the high-dimensional case, $p > \max(n, q)$
• the dimension of the response $q$ is modest relative to the sample size

The basic assumption in reduced rank regression is

$\rank(C) = r \le \min(p, q)$

where

$C = B_\star A_\star^T\,, B_\star\in \IR^{p\times r}, A_\star\in \IR^{q\times r}$
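A minimal simulation sketch of this setup may help fix notation; the dimensions below are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: high-dimensional case p > max(n, q), modest q
n, p, q, r = 50, 100, 5, 2

X = rng.standard_normal((n, p))
B_star = rng.standard_normal((p, r))   # B_* in R^{p x r}
A_star = rng.standard_normal((q, r))   # A_* in R^{q x r}
C = B_star @ A_star.T                  # rank-r coefficient matrix

# Diagonal error covariance Sigma = diag(sigma_1^2, ..., sigma_q^2)
sigma2 = rng.uniform(0.5, 1.5, size=q)
E = rng.standard_normal((n, q)) * np.sqrt(sigma2)  # rows e_i ~ N(0, Sigma)
Y = X @ C + E

print(np.linalg.matrix_rank(C))  # rank(C) = r = 2 (almost surely)
```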

While it is possible to treat $r$ as a parameter and assign it a prior distribution inside a hierarchical formulation, posterior inference on $r$ then requires calculating intractable marginal likelihoods or resorting to reversible-jump MCMC (RJMCMC).

To avoid specifying a prior on $r$, the paper works within a parameter-expanded framework to

• consider a potentially full-rank decomposition $C=BA^T$ with $B\in \IR^{p\times q}, A\in \IR^{q\times q}$,
• assign shrinkage priors to $A$ and $B$ to shrink out the redundant columns when $C$ is indeed low rank.

Consider independent standard normal priors on the entries of $A$, i.e., $a_{hk}\sim N(0, 1)$ independently for $h, k=1,\ldots,q$; use $\Pi_A$ to denote this prior. Alternatively, a uniform prior on the Stiefel manifold of orthogonal matrices can be used, but sampling from it is slow.

Use independent horseshoe priors on the columns of $B$, and denote the resulting prior by $\Pi_B$; stronger shrinkage is warranted on the columns of $B$:

$b_{jh}\mid \lambda_{jh}, \tau_h \sim N(0, \lambda_{jh}^2\tau_h^2),\quad \lambda_{jh}\sim Ca_+(0, 1), \quad \tau_h\sim Ca_+(0, 1)$

independently for $j=1,\ldots,p$ and $h=1,\ldots,q$, where $Ca_+(0,1)$ denotes the standard half-Cauchy distribution, i.e., the standard Cauchy truncated to $(0,\infty)$, with density proportional to $(1+t^2)^{-1}1_{(0,\infty)}(t)$.
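Drawing from this prior is straightforward, since a $Ca_+(0,1)$ variate is the absolute value of a standard Cauchy variate; a sketch for one column of $B$ (variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 100  # number of predictors (illustrative)

# Half-Cauchy Ca_+(0,1) draws via absolute values of standard Cauchy draws
tau_h = abs(rng.standard_cauchy())         # global (column-level) scale tau_h
lam = np.abs(rng.standard_cauchy(size=p))  # local scales lambda_{jh}

# b_{jh} | lambda_{jh}, tau_h ~ N(0, lambda_{jh}^2 tau_h^2)
b_h = rng.standard_normal(p) * lam * tau_h

# The local-global scale mixture concentrates most entries near zero
# while its heavy tails leave a few entries large.
```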

We primarily restrict attention to settings where $\Sigma$ is diagonal, $\Sigma=\diag(\sigma_1^2,\ldots,\sigma_q^2)$, and assign independent improper priors $\pi(\sigma_h^2)\propto \sigma_h^{-2}$ ($h=1,\ldots,q$) to the diagonal elements.

Then the model becomes

$Y = XBA^T + E, e_i\sim N(0, \Sigma)$

where

$B\sim \Pi_B, A\sim \Pi_A, \Sigma\sim \Pi_\Sigma$

The likelihood of $(C,\Sigma)$ is

$p^{(n)}(Y\mid C, \Sigma; X) \propto \vert \Sigma\vert^{-n/2}\exp(-\trace\{(Y-XC)\Sigma^{-1}(Y-XC)^T\}/2)$
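For diagonal $\Sigma$, the trace reduces to a column-wise weighted residual sum of squares, so the log-likelihood is cheap to evaluate; a sketch (function name and test data are my own):

```python
import numpy as np

def log_likelihood(Y, X, C, sigma2):
    """Log-likelihood of (C, Sigma), up to an additive constant,
    for diagonal Sigma = diag(sigma2)."""
    n = Y.shape[0]
    R = Y - X @ C  # residual matrix
    # -(n/2) log|Sigma| - (1/2) trace{R Sigma^{-1} R^T},
    # with the trace computed column-wise via broadcasting
    return -0.5 * n * np.sum(np.log(sigma2)) - 0.5 * np.sum(R * R / sigma2)

# Sanity check: zero residuals and unit variances give log-likelihood 0
rng = np.random.default_rng(2)
X = rng.standard_normal((10, 4))
C = rng.standard_normal((4, 3))
Y = X @ C
print(log_likelihood(Y, X, C, np.ones(3)))
```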

Published in categories Note