# Conjugate Gradient for Regression

##### Posted on Aug 13, 20170 Comments

The conjugate gradient method is an iterative method for solving a linear system of equations, so we can use conjugate method to estimate the parameters in (linear/ridge) regression.

### Problem

Considering a linear system of equations

where $A$ is a $n\times n$ symmetric positive definite matrix. It can be stated equivalently as the following minimization problem

The gradient of $\phi$ equals the residual of the linear system,

so in particular at $x=x_k$ we have

## Conjugate Direction Method

A set of nonzero vectors ${p_0,p_1,\ldots,p_l}$ is said to be conjugate with respect to the symmetric definite matrix $A$ if

Given a starting point $x_0\in R^n$ and a set of conjugate directions ${p_0,p_1,\ldots,p_{n-1}}$, let us generate the sequence ${x_k}$ by

where $\alpha_k$ is the one dimensional minimizer of the quadratic function $\phi(\cdot)$ along $x_k+\alpha p_k$, given explicitly by

Consider the function $\phi(\alpha)$ with respect step length $\alpha$

$\phi(\alpha)=f(x_k+\alpha p_k)$ where $f(x) = \frac 12x^TAx-b^Tx$. Let $\phi’(\alpha)=0$, we have $\alpha_k=-\frac{\nabla f_kp_k}{p_kAp_k}$

Theorem: For any $x_0\in R^n$, the sequence $x_k$ by the conjugate direction algorithm converges to the solution $x^*$ of the linear system in at most $n$ steps.

Theorem: Let $x_0\in R^n$ be any starting point and suppose that the sequence ${x_k}$ is generated by the conjugate direction algorithm. Then $r_k^Tp_i=0\;\text{for }i=0,1\ldots,k-1$

## Conjugate Gradient Method

The conjugate gradient method is a conjugate direction method with a very special property: In generating its set of conjugate vectors, it can compute a new vector $p_k$ by using only the previous vector $p_{k-1}$

Each direction $p_k$ is chosen to be a linear combination of $-r_k$ and the previous direction $p_{k-1}$, that is

where the scalar $\beta_k$ is to be determined by the requirement that $p_{k-1}$ and $p_k$ must be conjugate with respect to $A$. By premultiplying by $p_{k-1}^TA$ and imposing the condition $p_{k-1}^TAp_k=0$, we find that Figure 1: CG-Preliminary Version

Theorem: Suppose that the $k$-th iterate generated by the conjugate gradient method is not the solution point $x^*$. The following properties hold:

We can simplify $\alpha_k$ by using the property $r_k^Tp_{k-1}=0$ and obtain

Also, we can simplify $\beta_k$ by using the property $r_{k+1}^Tr_k=0$ and obtain

Then we obtain the following standard form of the conjugate gradient method Figure 2: CG-Standard Version

## Application in Regression

For linear regression, we can recognize $X’X$ as $A$ and $X’Y$ as $b$. It is cautious when there is multiple collinearity in $X$. Because $A$ should be a symmetric positive definite matrix, while $X’X$ doesn’t satisfy this requirement if there is multiple collinearity in $X$. Fortunately, for ridge regression, $X’X+\lambda I,\lambda>0$ is a symmetric positive definite matrix, so we don’t worry about multiple collinearity.

The following R code is for estimating the parameters in regression by using conjugate gradient method.

### Simulation

Generate the dataset by random, and apply CG method to estimate the parameter of linear regression.

It is obvious that the differences of these two methods so small.

As for ridge regression, we can also test the performance by using the following R code.