Bayesian Sparse Multiple Regression

Posted on Sep 16, 2021

Tags: Cross-Validation, Ridge, Lasso, High-Dimensional

This note is for Chakraborty, A., Bhattacharya, A., & Mallick, B. K. (2020). Bayesian sparse multiple regression for simultaneous rank reduction and variable selection. Biometrika, 107(1), 205–221.

In the context of multiple response regression, a popular technique to achieve parsimony and interpretability is to consider a reduced-rank decomposition of the coefficient matrix, commonly known as reduced rank regression.

Let $X\in \IR^{n\times p}$ and $Y\in \IR^{n\times q}$, and consider the multivariate linear regression model

\[Y = XC + E\,, \quad E=(e_1,\ldots, e_n)^T\]

where

the response has been centred,
there is no intercept term,
the rows of the error matrix are independent, with $e_i\sim N(0, \Sigma)$,
the setting is high-dimensional, $p > \max(n, q)$,
the dimension of the response, $q$, is modest relative to the sample size.

The basic assumption in reduced rank regression is

\[\rank(C) = r \le \min(p, q)\]

where

\[C = B_\star A_\star^T\,, \quad B_\star\in \IR^{p\times r}, \quad A_\star\in \IR^{q\times r}\]
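To make the reduced-rank assumption concrete, here is a minimal simulation sketch in NumPy. The function name `simulate_reduced_rank`, the dimensions, and the choice $\Sigma = \sigma^2 I_q$ are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simulate_reduced_rank(n=50, p=80, q=5, r=2, sigma=1.0, seed=0):
    """Draw (X, Y, C) from Y = X C + E with rank(C) = r (illustrative only)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))           # design matrix, p > max(n, q)
    B_star = rng.standard_normal((p, r))      # p x r factor
    A_star = rng.standard_normal((q, r))      # q x r factor
    C = B_star @ A_star.T                     # p x q coefficient matrix, rank at most r
    E = sigma * rng.standard_normal((n, q))   # rows e_i ~ N(0, sigma^2 I_q), an assumed special case
    Y = X @ C + E
    return X, Y, C

X, Y, C = simulate_reduced_rank()
print(C.shape, np.linalg.matrix_rank(C))
```

With Gaussian factors, the numerical rank of `C` equals `r` almost surely, even though `C` itself has $p\times q$ entries.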

It is possible to treat $r$ as a parameter and assign it a prior distribution inside a hierarchical formulation, but posterior inference on $r$ then requires calculating intractable marginal likelihoods or resorting to reversible jump MCMC (RJMCMC).

To avoid specifying a prior on $r$, the paper works within a parameter-expanded framework to

consider a potentially full-rank decomposition $C=BA^T$ with $B\in \IR^{p\times q}, A\in \IR^{q\times q}$,
assign shrinkage priors to $A$ and $B$ to shrink out the redundant columns when $C$ is indeed low rank.
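The idea behind the parameter expansion can be checked numerically: if all but $r$ columns of $B$ are exactly zero, then $C=BA^T$ has rank at most $r$ even though $A$ is a full $q\times q$ matrix. The sketch below uses exact zeros as a stand-in for columns that the prior would shrink towards zero; the dimensions and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, r = 30, 6, 2

A = rng.standard_normal((q, q))          # q x q, full rank almost surely
B = np.zeros((p, q))
B[:, :r] = rng.standard_normal((p, r))   # only the first r columns are active

C = B @ A.T                              # equals B[:, :r] @ A[:, :r].T
rank_C = np.linalg.matrix_rank(C)
print(rank_C)
```

In the actual posterior the redundant columns are only shrunk close to zero, not set to zero, so the rank reduction there is approximate.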

Consider independent standard normal priors on the entries of $A$, and use $\Pi_A$ to denote the prior on $A$; that is, $a_{hk}\sim N(0, 1)$ independently for $h, k=1,\ldots,q$. Alternatively, a uniform prior on the Stiefel manifold of orthogonal matrices can be used, but it is computationally slower.

Stronger shrinkage is warranted on the columns of $B$, so use independent horseshoe priors on the columns of $B$, and denote the resulting prior by $\Pi_B$:

\[b_{jh}\mid \lambda_{jh}, \tau_h \sim N(0, \lambda_{jh}^2\tau_h^2),\quad \lambda_{jh}\sim Ca_+(0, 1), \quad \tau_h\sim Ca_+(0, 1)\]

independently for $j=1,\ldots,p$ and $h=1,\ldots,q$, where $Ca_+(0,1)$ denotes the truncated standard half-Cauchy distribution with density proportional to $(1+t^2)^{-1}1_{(0,\infty)}(t)$.
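As a sketch of what a draw from this prior looks like, the half-Cauchy scales can be sampled as absolute values of standard Cauchy draws. The helper name `sample_horseshoe_B` and the dimensions are hypothetical, chosen only for illustration.

```python
import numpy as np

def sample_horseshoe_B(p, q, rng):
    """Draw B (p x q) from the horseshoe prior with column-specific scales tau_h."""
    lam = np.abs(rng.standard_cauchy((p, q)))     # local scales lambda_{jh} ~ Ca+(0, 1)
    tau = np.abs(rng.standard_cauchy(q))          # column scales tau_h ~ Ca+(0, 1)
    B = rng.standard_normal((p, q)) * lam * tau   # b_{jh} ~ N(0, lambda_{jh}^2 tau_h^2)
    return B, lam, tau

B, lam, tau = sample_horseshoe_B(p=10, q=4, rng=np.random.default_rng(2))
print(np.round(np.abs(B).max(axis=0), 3))  # largest |b_{jh}| in each column
```

Because `tau` has one entry per column, a small $\tau_h$ shrinks the whole $h$-th column of $B$ at once, while the $\lambda_{jh}$ allow individual entries to escape shrinkage.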

The paper primarily restricts attention to settings where $\Sigma$ is diagonal, $\Sigma=\diag(\sigma_1^2,\ldots,\sigma_q^2)$, and assigns independent improper priors $\pi(\sigma_h^2)\propto \sigma_h^{-2}$, $h=1,\ldots,q$, on the diagonal elements. Denote this prior by $\Pi_\Sigma$.

Then the model becomes

\[Y = XBA^T + E, \quad e_i\sim N(0, \Sigma)\]

where

\[B\sim \Pi_B, \quad A\sim \Pi_A, \quad \Sigma\sim \Pi_\Sigma\]

The likelihood of $(C,\Sigma)$ is

\[p^{(n)}(Y\mid C, \Sigma; X) \propto \vert \Sigma\vert^{-n/2}\exp(-\trace\{(Y-XC)\Sigma^{-1}(Y-XC)^T\}/2)\]
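For diagonal $\Sigma$ the trace term reduces to a sum of squared residuals scaled column by column, which gives a simple way to evaluate the log-likelihood up to an additive constant. The following is a minimal sketch; the function name `log_likelihood` and the toy data are assumptions made for illustration.

```python
import numpy as np

def log_likelihood(Y, X, C, sigma2):
    """Log-likelihood up to an additive constant, with Sigma = diag(sigma2)."""
    n = Y.shape[0]
    R = Y - X @ C                       # n x q residual matrix
    quad = np.sum(R**2 / sigma2)        # trace{R Sigma^{-1} R^T} for diagonal Sigma
    log_det = np.sum(np.log(sigma2))    # log |Sigma|
    return -0.5 * n * log_det - 0.5 * quad

# Toy data, purely illustrative
rng = np.random.default_rng(3)
n, p, q = 8, 12, 3
X = rng.standard_normal((n, p))
C = rng.standard_normal((p, q))
sigma2 = np.array([0.5, 1.0, 2.0])
Y = X @ C + rng.standard_normal((n, q)) * np.sqrt(sigma2)
ll = log_likelihood(Y, X, C, sigma2)
print(ll)
```

Under the model above, `C` would be formed as `B @ A.T` before calling the function.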
