WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Conjugate Gradient for Regression

Posted on 0 Comments
Tags: Conjugate Gradient, Linear Regression

The conjugate gradient method is an iterative method for solving a linear system of equations, so we can use conjugate method to estimate the parameters in (linear/ridge) regression.


Considering a linear system of equations

where $A$ is a $n\times n$ symmetric positive definite matrix. It can be stated equivalently as the following minimization problem

The gradient of $\phi$ equals the residual of the linear system,

so in particular at $x=x_k$ we have

Conjugate Direction Method

A set of nonzero vectors ${p_0,p_1,\ldots,p_l}$ is said to be conjugate with respect to the symmetric definite matrix $A$ if

Given a starting point $x_0\in R^n$ and a set of conjugate directions ${p_0,p_1,\ldots,p_{n-1}}$, let us generate the sequence ${x_k}$ by

where $\alpha_k$ is the one dimensional minimizer of the quadratic function $\phi(\cdot)$ along $x_k+\alpha p_k$, given explicitly by

Consider the function $\phi(\alpha)$ with respect step length $\alpha$

where $f(x) = \frac 12x^TAx-b^Tx$. Let $\phi’(\alpha)=0$, we have

Theorem: For any $x_0\in R^n$, the sequence $x_k$ by the conjugate direction algorithm converges to the solution $x^* $ of the linear system in at most $n$ steps.

Theorem: Let $x_0\in R^n$ be any starting point and suppose that the sequence ${x_k}$ is generated by the conjugate direction algorithm. Then

Conjugate Gradient Method

The conjugate gradient method is a conjugate direction method with a very special property: In generating its set of conjugate vectors, it can compute a new vector $p_k$ by using only the previous vector $p_{k-1}$

Each direction $p_k$ is chosen to be a linear combination of $-r_k$ and the previous direction $p_{k-1}$, that is

where the scalar $\beta_k$ is to be determined by the requirement that $p_{k-1}$ and $p_k$ must be conjugate with respect to $A$. By premultiplying by $p_{k-1}^TA$ and imposing the condition $p_{k-1}^TAp_k=0$, we find that

Figure 1: CG-Preliminary Version

Theorem: Suppose that the $k$-th iterate generated by the conjugate gradient method is not the solution point $x^* $. The following properties hold:

We can simplify $\alpha_k$ by using the property $r_k^Tp_{k-1}=0$ and obtain

Also, we can simplify $\beta_k$ by using the property $r_{k+1}^Tr_k=0$ and obtain

Then we obtain the following standard form of the conjugate gradient method

Figure 2: CG-Standard Version

Application in Regression

For linear regression, we can recognize $X’X$ as $A$ and $X’Y$ as $b$. It is cautious when there is multiple collinearity in $X$. Because $A$ should be a symmetric positive definite matrix, while $X’X$ doesn’t satisfy this requirement if there is multiple collinearity in $X$. Fortunately, for ridge regression, $X’X+\lambda I,\lambda>0$ is a symmetric positive definite matrix, so we don’t worry about multiple collinearity.

The following R code is for estimating the parameters in regression by using conjugate gradient method.


Generate the dataset by random, and apply CG method to estimate the parameter of linear regression.

It is obvious that the differences of these two methods so small.

As for ridge regression, we can also test the performance by using the following R code.


  1. 邱怡轩, 共轭梯度法计算回归
  2. Jorge Nocedal, Stephen J. Wright, Numerical Optimization (Second Edition)

Published in categories Report