A Faster Algorithm for Repeated Linear Regression

Posted on Sep 21, 2017

Tags: Linear Regression

Repeated Linear Regression means that repeat the fitting of linear regression for many times, and there are some common parts among these regressions.

Background

Assume that we have a set of independent variables, $x_1, x_2,\ldots, x_n$, and a predictive variable $y$. In addition to these variables, there are also 4 variables called covariates, $cov_1, cov_2, cov_3, cov_4$. Now we want to fit the following linear regressions,

\[\begin{array}{ll} y&\sim x_1+x_2+x_1x_2 + cov_1 + cov_2+cov_3+cov_4\\ y&\sim x_1+x_3+x_1x_3 + cov_1 +cov_2+cov_3+cov_4\\ \cdot &\sim \cdots\\ y&\sim x_1+x_n+x_1x_n + cov_1 +cov_2+cov_3+cov_4\\ \end{array}\]

Naive Method

Denote $X_{1i}=(1, x_1,x_i, x_1x_i, cov_1,cov_2,cov_3,cov_4), i=2,\ldots,n$. Repeat the linear regression for $n-1$ times,

\[\hat\beta_{1i}=(X_{1i}'X_{1i})^{-1}X_{1i}'Y\]

The main complexity of the above estimation is the matrix inverse, whose complexity is $O(N^3)$. By using that method, we need repeat the $8\times 8$ matrix inverse for $n-1$ times.

Improved Method

Woodbury formula:
\[(A+UCV)^{-1}=A^{-1}-A^{-1}U(C^{-1}+VA^{-1}U)^{-1}VA^{-1}\]

Denote

\[X_{1i}=(x_i, x_1x_i, 1, x_1, cov_1,cov_2,cov_3,cov_4) = (\mathbf a_{1i}, \mathbf b)\]

where $\mathbf a_{1i}=(x_i,x_1x_i), \mathbf b=(1, x_1,cov_1,cov_2,cov_3,cov_4)$. Notice that we only adjust the order of $X_{1i}$, compared to the naive method, it does not affect the estimation of $\beta_{1i}$ (just different order of components.)

Then we have

\[X_{1i}'X_{1i}=\left[ \begin{array}{ll} \mathbf a_{1i}'\mathbf a_{1i} & \mathbf a_{1i}'\mathbf b\\ \mathbf b'\mathbf a_{1i} &\mathbf{b'b} \end{array} \right]= \left[ \begin{array}{ll} A_{1i}& V_{1i}'\\ V_{1i}& B \end{array} \right]\]

Let

\[A=\left[ \begin{array}{ll} A_{1i}&O\\ O&B \end{array}\right],\; U=\left[ \begin{array}{ll} I_{2\times 2} & O_{2\times 2}\\ O_{6\times 2} & V_{1i} \end{array} \right],\; V= \left[ \begin{array}{ll} O_{2\times 2}& V_{1i}'\\ I_{2\times 2}& O_{2\times 6} \end{array} \right]; \; i=2,\ldots, n\]

and $C=I_{4\times 4}$. Then we can apply woodbury formula

\[(X_{1i}'X_{1i})^{-1}=(A+UCV)^{-1}=A^{-1}-A^{-1}U(C^{-1}+VA^{-1}U)^{-1}VA^{-1}\]

where

\[A^{-1}=\left[ \begin{array}{ll} A_{1i}^{-1}&O\\ O&B^{-1} \end{array} \right]\]

We can do further calculation to simplify the inverse.

\[A^{-1}U = \left[ \begin{array}{ll} A_{1i}^{-1}&O\\ O&B^{-1}V_{1i} \end{array} \right]\] \[VA^{-1} = \left[ \begin{array}{cc} O&V_{1i}'B^{-1}\\ A_{1i}^{-1}&O \end{array} \right]\] \[VA^{-1}U= \left[ \begin{array}{cc} O&V_{1i}'B^{-1}V_{1i}\\ A_{1i}^{-1}&O \end{array} \right]\] \[\begin{array}{ll} (C^{-1}+VAU)^{-1}&\\=\left[ \begin{array}{cc} I&V_{1i}'B^{-1}V_{1i}\\ A_{1i}^{-1} &I\\ \end{array} \right]^{-1}&\\ =\left[ \begin{array}{cc} (I-V_{1i}'B^{-1}V_{1i}A_{1i}^{-1})^{-1}&-V_{1i}'B^{-1}V_{1i}(I-A_{1i}^{-1}V_{1i}'B^{-1}V_{1i})^{-1}\\ -A_{1i}^{-1}(I-V_{1i}'B^{-1}V_{1i}A_{1i}^{-1})^{-1}& (I-A_{1i}^{-1}V_{1i}'B^{-1}V_{1i})^{-1} \end{array} \right]&\\ =\left[ \begin{array}{cc} D & -B_1D'\\ -A_{1i}^{-1}D&D' \end{array} \right] \end{array}\]

where $B_1 = V_{1i}’B^{-1}V_{1i}$, $D=(I-B_1A_{1i}^{-1})^{-1}$

\[(X_{1i}'X_{1i})^{-1}\\=\left[ \begin{array}{cc} A_{1i}^{-1}+A_{1i}^{-1}B_1D'A_{1i}^{-1}&-A_{1i}^{-1}DV_{1i}'B^{-1}\\ -B^{-1}V_{1i}D'A_{1i}^{-1}&B^{-1}+B^{-1}V_{1i}A_{1i}^{-1}DV_{1i}'B^{-1} \end{array} \right]\]

Notice that $B$ is the same for these regressions, so we just need to calculate its inverse for the first time. And $A_{1i}$ is just $2\times 2$, it is much simpler. Then we just need to calculate the inverse of $2\times 2$ matrix, which is much smaller than $8\times 8$.

So I can use this process to reduce the dimension of matrix inverse significantly, which would speed the program theoretically, although need to calculate more matrix multiplication at the same time.

Continue to calculate $X_{1i}’Y$ to finish the estimation of $\beta$.

\[X_{1i}'Y = (\mathbf a_{1i}, \mathbf b)' Y = \left[\begin{array}{c}\mathbf a_{1i}'Y\\\mathbf b'Y\end{array}\right]\]

Then

\[\hat\beta_{1i} =(X_{1i}'X_{1i})^{-1}X_{1i}'Y\]

Practically Simulation

Apply these algorithm to the project epi, and compare these two method. It is obvious that the reduced-dimension method can speed the program significantly.

size_i	size_j	method	parallel	time(s)
50	5000	reduced	4	2.813
50	5000	reduced	1	9.440
50	5000	naive	4	14.729
50	5000	naive	1	53.507
500	5000	reduced	4	25.250
500	5000	reduced	1	83.836
500	5000	naive	4	141.837
500	5000	naive	1	493.661
5000	5000	reduced	4	128.087
5000	5000	reduced	1	438.277
5000	5000	naive	4	*
5000	5000	naive	1	*

The above cases are tested on my laptop, whose processor is “Intel® Core™ i5-6300HQ CPU @ 2.30GHz × 4”, and the actual memory is “7.7GB”. Now run these two programs on a server, whose processor is “20 Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz”.

size_i	size_j	method	parallel	time(s)
5000	5000	reduced	20	22.622
5000	5000	naive	20	75.422

Reference

https://en.wikipedia.org/wiki/Woodbury_matrix_identity

Published in categories Idea

← previous next →

See all posts →

WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.