Generalizing Ridge Regression
This note is for Chapter 3 of van Wieringen, W. N. (2021). Lecture notes on ridge regression. ArXiv:1509.09169 [Stat].
Consider the loss function
\[(Y - X\beta)'W(Y - X\beta) + (\beta - \beta_0)'\Delta(\beta - \beta_0)\]
which comprises a weighted least squares criterion and a generalized ridge penalty.
 $W$: an $n\times n$ diagonal matrix with $W_{ii}\in[0,1]$ representing the weight of the $i$-th observation.
 $\Delta$: a $p\times p$ positive definite, symmetric matrix, which allows
 different penalization per regression parameter
 joint (or correlated) shrinkage among the elements of $\beta$
 $\beta_0$: a user-specified, non-random target towards which $\beta$ is shrunken as the penalty parameter increases
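To see where the minimizer comes from, write the criterion above as $\mathcal{L}(\beta)$ (a label introduced here for convenience, not taken from the lecture notes). Since $W$ and $\Delta$ are symmetric, its gradient is
\[\frac{\partial \mathcal{L}}{\partial \beta} = -2X'W(Y - X\beta) + 2\Delta(\beta - \beta_0).\]
Setting the gradient to zero gives the estimating equation
\[(X'WX + \Delta)\beta = X'WY + \Delta\beta_0.\]
Because $X'WX$ is positive semi-definite and $\Delta$ is positive definite, $X'WX + \Delta$ is invertible and the criterion is strictly convex, so this stationary point is the unique minimizer.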
The solution is
\[\hat\beta(\Delta) = (X'WX + \Delta)^{-1}(X'WY + \Delta\beta_0)\]
Examples:
 Fused ridge estimation
 A ridge to homogeneity
 Co-data: groups of covariates are deemed to be differentially important for the explanation of the response.
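The closed-form estimator can also be checked numerically. The following is a minimal NumPy sketch, not code from the lecture notes: the function name `generalized_ridge`, the simulated data, the diagonal choice of $\Delta$ (a different penalty per parameter), and the target $\beta_0 = (1,\dots,1)'$ are all assumptions made purely for illustration.

```python
import numpy as np

def generalized_ridge(X, Y, W, Delta, beta0):
    """Hypothetical helper: solve (X'WX + Delta) beta = X'WY + Delta beta0."""
    A = X.T @ W @ X + Delta
    b = X.T @ W @ Y + Delta @ beta0
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(A, b)

# Simulated data (illustrative assumption).
rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.normal(size=(n, p))
Y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=n)

W = np.diag(rng.uniform(0.0, 1.0, size=n))   # observation weights in [0, 1]
Delta = np.diag([0.1, 1.0, 10.0, 100.0])     # different penalty per parameter
beta0 = np.ones(p)                           # shrinkage target

beta_hat = generalized_ridge(X, Y, W, Delta, beta0)

# The gradient of the criterion should vanish at the estimate.
grad = -2 * X.T @ W @ (Y - X @ beta_hat) + 2 * Delta @ (beta_hat - beta0)

# With a very large penalty the estimate should approach the target beta0.
beta_big = generalized_ridge(X, Y, W, 1e8 * np.eye(p), beta0)
```

Here `grad` should be zero up to floating-point error, and `beta_big` should be close to `beta0`, which matches the description of $\beta_0$ as the target towards which $\beta$ is shrunken. A non-diagonal, positive definite `Delta` could be passed in the same way to obtain joint shrinkage.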