Generalizing Ridge Regression
This note covers Chapter 3 of van Wieringen, W. N. (2021), Lecture notes on ridge regression, arXiv:1509.09169 [stat].
Consider
\[(Y-X\beta)'W(Y-X\beta) + (\beta-\beta_0)'\Delta(\beta - \beta_0),\]
which comprises a weighted least squares criterion and a generalized ridge penalty.
- $W$: an $n\times n$-dimensional, diagonal matrix with $W_{ii}\in[0,1]$ representing the weight of the $i$-th observation.
- $\Delta$: a $p\times p$-dimensional, symmetric, positive definite matrix. It allows:
- different penalization per regression parameter
- joint (or correlated) shrinkage among the elements of $\beta$
- $\beta_0$: a user-specified, non-random target towards which $\beta$ is shrunken as the penalty parameter increases
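As a minimal sketch of evaluating this criterion in NumPy (the function name `gen_ridge_loss` and the argument layout are my own, not from the lecture notes):

```python
import numpy as np

def gen_ridge_loss(beta, X, Y, W, Delta, beta0):
    """Weighted least squares term plus generalized ridge penalty:
    (Y - X beta)' W (Y - X beta) + (beta - beta0)' Delta (beta - beta0)."""
    r = Y - X @ beta        # residuals
    d = beta - beta0        # deviation from the shrinkage target
    return r @ W @ r + d @ Delta @ d
```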
The solution is
\[\hat\beta(\Delta) = (X'WX + \Delta)^{-1}(X'WY + \Delta\beta_0).\]
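A minimal numerical sketch of this closed-form estimator (names and toy data are my own; it solves the linear system rather than forming the explicit inverse):

```python
import numpy as np

def gen_ridge(X, Y, W, Delta, beta0):
    """Generalized ridge estimate: (X'WX + Delta)^{-1} (X'WY + Delta beta0)."""
    XtW = X.T @ W                      # X'W
    A = XtW @ X + Delta                # X'WX + Delta
    b = XtW @ Y + Delta @ beta0        # X'WY + Delta beta0
    return np.linalg.solve(A, b)

# Toy illustration (made-up data).
rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5, 0.0])
Y = X @ beta_true + rng.normal(scale=0.3, size=n)

W = np.eye(n)            # equal observation weights
Delta = 2.0 * np.eye(p)  # ordinary ridge: Delta = lambda * I
beta0 = np.zeros(p)      # shrink towards zero

print(gen_ridge(X, Y, W, Delta, beta0))
```

With $W = I$, $\Delta = \lambda I$, and $\beta_0 = 0$ this reduces to the ordinary ridge estimator $(X'X + \lambda I)^{-1}X'Y$.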
Examples:
- Fused ridge estimation (see the sketch after this list)
- A ridge to homogeneity
- Co-data: groups of covariates are deemed to be differentially important for the explanation of the response.
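For the fused ridge case, one illustrative (not necessarily the notes' exact parameterization) choice of $\Delta$ is built from a first-difference matrix $D$, so that adjacent coefficients are shrunken towards each other; a small multiple of the identity keeps $\Delta$ positive definite:

```python
import numpy as np

def fused_ridge_penalty(p, lam_fuse, lam_ridge=1e-4):
    """Illustrative Delta = lam_fuse * D'D + lam_ridge * I,
    where D is the (p-1) x p first-difference matrix.
    D'D alone is only positive semi-definite, hence the small ridge term."""
    D = np.eye(p - 1, p, k=1) - np.eye(p - 1, p)   # row j encodes beta_{j+1} - beta_j
    return lam_fuse * (D.T @ D) + lam_ridge * np.eye(p)

print(fused_ridge_penalty(p=4, lam_fuse=5.0))
```

Plugging such a $\Delta$ into $\hat\beta(\Delta)$ above penalizes differences between neighbouring elements of $\beta$, which is the idea behind fused shrinkage.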