
Interior-point Method

Tags: Optimization

Nocedal and Wright (2006) and Boyd and Vandenberghe (2004) give slightly different introductions to interior-point methods. The former works with the standard-form LP, whose constraints are equalities plus nonnegativity bounds, while the latter treats general convex problems with inequality constraints.

Nocedal and Wright (2006)

Consider the LP problem in standard form,

$$\min\; c^Tx\quad \text{s.t. } Ax=b,\; x\ge 0,\tag{primal}\label{primal}$$

where $c,x\in\IR^n$ and $b\in\IR^m$, and $A$ is an $m\times n$ matrix with full row rank.

The dual problem is

$$\max\; b^T\lambda\quad \text{s.t. } A^T\lambda+s=c,\; s\ge 0,\tag{dual}\label{dual}$$

where $\lambda\in \IR^m$ and $s\in \IR^n$. Solutions of \eqref{primal} and \eqref{dual} can be characterized by the KKT conditions,

$$A^T\lambda + s = c,\qquad Ax = b,\qquad x_is_i = 0,\; i=1,\ldots,n,\qquad (x,s)\ge 0.$$

Restate the optimality conditions in a slightly different form by means of a mapping $F$ from $\IR^{2n+m}$ to $\IR^{2n+m}$:

$$F(x,\lambda,s) = \begin{bmatrix}A^T\lambda + s - c\\ Ax - b\\ XSe\end{bmatrix} = 0,\qquad (x,s)\ge 0,$$

where $X=\diag(x_1,\ldots,x_n), S=\diag(s_1,\ldots,s_n)$ and $e=(1,\ldots,1)^T$. Primal-dual methods generate iterates $(x^k,\lambda^k,s^k)$ that satisfy the bounds strictly, i.e., $x^k > 0$ and $s^k > 0$. This property is the origin of the term interior-point.

Like most iterative algorithms in optimization, primal-dual interior-point methods have two basic ingredients: a procedure for determining the step and a measure of the desirability of each point in the search space.

First, the duality measure is defined by

$$\mu = \frac{1}{n}\sum_{i=1}^n x_is_i = \frac{x^Ts}{n}.$$

The procedure for determining the search direction has its origins in Newton's method. Newton's method forms a linear model for $F$ around the current point and obtains the search direction $(\Delta x,\Delta \lambda, \Delta s)$ by solving the following system of linear equations,

$$J(x,\lambda,s)\begin{bmatrix}\Delta x\\ \Delta\lambda\\ \Delta s\end{bmatrix} = -F(x,\lambda,s),$$

where $J$ is the Jacobian of $F$. Let $r_c$ and $r_b$ be the first two block rows in $F$,

$$r_c = A^T\lambda + s - c,\qquad r_b = Ax - b,$$

then

$$\begin{bmatrix}0 & A^T & I\\ A & 0 & 0\\ S & 0 & X\end{bmatrix}\begin{bmatrix}\Delta x\\ \Delta\lambda\\ \Delta s\end{bmatrix} = \begin{bmatrix}-r_c\\ -r_b\\ -XSe\end{bmatrix}.$$

Note that $J_{ij} = \frac{\partial f_i}{\partial x_j}$, i.e., $J_{i\cdot} = \frac{\partial f_i}{\partial x^T}$, so the $(1,2)$ block of $J$ is $\frac{\partial f_1}{\partial \lambda^T} = A^T$.

Usually a full step along this direction would violate the bound $(x,s)\ge 0$, so we perform a line search along the Newton direction and define the new iterate as

$$(x,\lambda,s) + \alpha(\Delta x,\Delta\lambda,\Delta s)$$

for some line search parameter $\alpha\in (0,1]$. We often can take only a small step along this direction ($\alpha \ll 1$) before violating the condition $(x,s) > 0$. Hence, the pure Newton direction, sometimes known as the affine scaling direction, often does not allow us to make much progress toward a solution.

Most primal-dual methods use a less aggressive Newton direction, one that aims for a point whose pairwise products $x_is_i$ are reduced to a lower average value, not all the way to zero. Specifically, take a Newton step toward the point for which $x_is_i=\sigma\mu$, where $\mu$ is the current duality measure and $\sigma\in[0, 1]$ is the reduction factor that we wish to achieve in the duality measure on this step. The modified step equation is then

$$\begin{bmatrix}0 & A^T & I\\ A & 0 & 0\\ S & 0 & X\end{bmatrix}\begin{bmatrix}\Delta x\\ \Delta\lambda\\ \Delta s\end{bmatrix} = \begin{bmatrix}-r_c\\ -r_b\\ -XSe + \sigma\mu e\end{bmatrix}.$$

Here $\sigma$ is called the centering parameter.
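
As a concrete illustration, here is a minimal numpy sketch of one such centered step for the standard-form LP. It is only a sketch under simplifying assumptions: the current iterate already has $x>0$, $s>0$, the Newton system is solved densely, and the function name `primal_dual_step` and the default $\sigma=0.1$ are my own choices rather than anything from the book.

```python
# A minimal sketch (not from the book) of one centered primal-dual step for the
# standard-form LP  min c^T x  s.t.  A x = b, x >= 0, assuming the current
# iterate already satisfies x > 0 and s > 0.
import numpy as np

def primal_dual_step(A, b, c, x, lam, s, sigma=0.1):
    m, n = A.shape
    r_c = A.T @ lam + s - c                  # dual residual
    r_b = A @ x - b                          # primal residual
    mu = x @ s / n                           # duality measure

    # Newton system with the centering term sigma*mu*e on the right-hand side
    J = np.block([
        [np.zeros((n, n)), A.T,              np.eye(n)],
        [A,                np.zeros((m, m)), np.zeros((m, n))],
        [np.diag(s),       np.zeros((n, m)), np.diag(x)],
    ])
    rhs = np.concatenate([-r_c, -r_b, -x * s + sigma * mu])
    d = np.linalg.solve(J, rhs)
    dx, dlam, ds = d[:n], d[n:n + m], d[n + m:]

    # step length: stay strictly inside the region x > 0, s > 0
    alpha = 1.0
    for v, dv in ((x, dx), (s, ds)):
        neg = dv < 0
        if neg.any():
            alpha = min(alpha, 0.99 * np.min(-v[neg] / dv[neg]))
    return x + alpha * dx, lam + alpha * dlam, s + alpha * ds
```

A practical implementation would exploit the block structure of this system (e.g., eliminating $\Delta s$ and $\Delta x$ to obtain a smaller system in $\Delta\lambda$) rather than factorizing the full $(2n+m)\times(2n+m)$ matrix.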

In practice, software implementing interior-point methods does not usually start from a feasible point $(x^0,\lambda^0,s^0)$; infeasible-interior-point variants require only $x^0 > 0$ and $s^0 > 0$.

Central Path

  • feasible set $\cF=\{(x,\lambda,s)\mid Ax=b,A^T\lambda+s=c,(x,s)\ge 0\}$
  • strictly feasible set $\cF^0 = \{(x,\lambda,s)\mid Ax=b,A^T\lambda+s=c,(x,s)> 0\}$

The central path $\cC$ is an arc of strictly feasible points, parametrized by a scalar $\tau > 0$; each point $(x_\tau,\lambda_\tau,s_\tau)\in\cC$ satisfies the following equations:

$$A^T\lambda + s = c,\qquad Ax = b,\qquad x_is_i = \tau,\; i=1,\ldots,n,\qquad (x,s)> 0.$$

Then define the central path as

$$\cC = \{(x_\tau,\lambda_\tau,s_\tau)\mid \tau > 0\}.$$
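
One way to read these conditions (a standard remark, spelled out here for context): they agree with the KKT conditions except that complementarity is relaxed,

$$x_is_i = \tau,\quad i=1,\ldots,n,\qquad\text{instead of}\qquad x_is_i = 0,\quad i=1,\ldots,n,$$

so, provided $\cF^0$ is nonempty, the central path is well defined for every $\tau>0$ and approaches the primal-dual solution set as $\tau\downarrow 0$.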

Central Path Neighborhoods and Path-Following Methods

Path-following algorithms explicitly restrict the iterates to a neighborhood of the central path $\cC$ and follow $\cC$ to a solution of the linear program.

The two most interesting neighborhoods of $\cC$ are

  • $\cN_2(\theta) = \{(x,\lambda, s)\in \cF^0\mid \Vert XSe-\mu e\Vert_2\le \theta\mu\}$ for some $\theta\in [0,1)$
  • $\cN_{-\infty}(\gamma) = \{(x,\lambda, s)\in \cF^0\mid x_is_i\ge \gamma\mu\;\forall i=1,2,\ldots,n\}$ for some $\gamma\in (0,1]$
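
A quick consequence of the two definitions (not stated above, but easy to verify): $\cN_2(\theta)\subseteq\cN_{-\infty}(\gamma)$ whenever $\gamma\le 1-\theta$, since for any $(x,\lambda,s)\in\cN_2(\theta)$ and each $i$,

$$\vert x_is_i - \mu\vert \le \Vert XSe - \mu e\Vert_2 \le \theta\mu \quad\Longrightarrow\quad x_is_i \ge (1-\theta)\mu\ge\gamma\mu.$$

In this sense $\cN_{-\infty}(\gamma)$, especially for small $\gamma$, is the much wider neighborhood.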

Boyd and Vandenberghe (2004)

Consider a convex optimization problem that includes inequality constraints,

$$\begin{array}{ll}\text{minimize} & f_0(x)\\ \text{subject to} & f_i(x)\le 0,\quad i=1,\ldots,m\\ & Ax = b,\end{array}\tag{11.1}\label{11.1}$$

where $f_0,\ldots,f_m:\IR^n\rightarrow \IR$ are convex and twice continuously differentiable, and $A\in \IR^{p\times n}$ with rank $p < n$.

  • Assume that the problem is solvable, i.e., an optimal $x^\star$ exists. Denote the optimal value $f_0(x^\star)$ as $p^\star$.
  • Assume that the problem is strictly feasible, i.e., there exists $x\in\cD$ that satisfies $Ax=b$ and $f_i(x) < 0$ for $i=1,\ldots,m$.

Thus, by Slater's condition, strong duality holds, so there exist dual optimal $\lambda^\star\in\IR^m,\nu^\star\in\IR^p$, which together with $x^\star$ satisfy the KKT conditions

$$Ax^\star = b,\quad f_i(x^\star)\le 0,\quad \lambda^\star\succeq 0,\quad \nabla f_0(x^\star) + \sum_{i=1}^m\lambda_i^\star\nabla f_i(x^\star) + A^T\nu^\star = 0,\quad \lambda_i^\star f_i(x^\star) = 0,\; i=1,\ldots,m.$$

Logarithmic barrier function and central path

The goal is to approximately formulate the inequality constrained problem as an equality constrained problem to which Newton’s method can be applied.

Rewrite \eqref{11.1} to make the inequality constraints implicit in the objective,

$$\begin{array}{ll}\text{minimize} & f_0(x) + \sum_{i=1}^m I_{\_}(f_i(x))\\ \text{subject to} & Ax=b,\end{array}$$

where $I_{\_}:\IR\rightarrow \IR$ is the indicator function for the nonpositive reals,

$$I_{\_}(u) = \begin{cases}0 & u\le 0\\ \infty & u > 0.\end{cases}$$

The basic idea of the barrier method is to approximate the indicator function $I_{\_}$ by the function

$$\hat I_{\_}(u) = -(1/t)\log(-u),\qquad \text{dom}\,\hat I_{\_} = -\IR_{++},$$

where $t>0$ is a parameter that sets the accuracy of the approximation. Unlike $I_{\_}$, $\hat I_{\_}$ is differentiable and closed: it increases to $\infty$ as $u$ increases to 0.
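
To see in what sense $t$ controls the accuracy (a standard observation, written out for completeness): for every fixed $u<0$,

$$\hat I_{\_}(u) = -\frac 1t\log(-u)\;\longrightarrow\; 0 = I_{\_}(u)\qquad\text{as } t\to\infty,$$

while $\hat I_{\_}(u)$ still increases to $\infty$ as $u\uparrow 0$ for each fixed $t$, so the approximation of $I_{\_}$ improves as $t$ grows.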

Thus, we have the approximation

$$\begin{array}{ll}\text{minimize} & f_0(x) + \sum_{i=1}^m -(1/t)\log(-f_i(x))\\ \text{subject to} & Ax = b.\end{array}$$

The function

$$\phi(x) = -\sum_{i=1}^m\log(-f_i(x)),$$

with $\text{dom}\,\phi=\{x\in\IR^n\mid f_i(x) < 0,i=1,\ldots,m\}$, is called the logarithmic barrier or log barrier.
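
For later use in the Newton-step derivations below (standard formulas, obtained by direct differentiation of $\phi$), the gradient and Hessian of the log barrier are

$$\nabla\phi(x) = \sum_{i=1}^m\frac{1}{-f_i(x)}\nabla f_i(x),\qquad
\nabla^2\phi(x) = \sum_{i=1}^m\frac{1}{f_i(x)^2}\nabla f_i(x)\nabla f_i(x)^T + \sum_{i=1}^m\frac{1}{-f_i(x)}\nabla^2 f_i(x).$$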

We can simplify notation by multiplying the objective by $t$, which gives the equivalent problem

$$\begin{array}{ll}\text{minimize} & tf_0(x) + \phi(x)\\ \text{subject to} & Ax = b.\end{array}\tag{11.6}\label{11.6}$$

For $t > 0$, define $x^\star(t)$ as the solution of \eqref{11.6}. The central path associated with problem \eqref{11.1} is defined as the set of points $x^\star(t)$, $t>0$, which we call the central points. Points on the central path are characterized by the following necessary and sufficient conditions: $x^\star(t)$ is strictly feasible, i.e., it satisfies

$$Ax^\star(t) = b,\qquad f_i(x^\star(t)) < 0,\quad i=1,\ldots,m,$$

and there exists a $\hat \nu\in\IR^p$ such that

$$0 = t\nabla f_0(x^\star(t)) + \nabla\phi(x^\star(t)) + A^T\hat\nu = t\nabla f_0(x^\star(t)) + \sum_{i=1}^m\frac{1}{-f_i(x^\star(t))}\nabla f_i(x^\star(t)) + A^T\hat\nu.\tag{11.7}\label{11.7}$$

It follows that every central point yields a dual feasible point, and hence a lower bound on the optimal value $p^\star$. More specifically, define

$$\lambda_i^\star(t) = -\frac{1}{tf_i(x^\star(t))},\quad i=1,\ldots,m,\qquad \nu^\star(t) = \hat\nu/t.$$

One can verify that the pair $(\lambda^\star(t),\nu^\star(t))$ is dual feasible.
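
Why this yields a bound (a sketch of the standard argument, with $g$ the Lagrange dual function of \eqref{11.1}): the centrality condition \eqref{11.7} says exactly that $x^\star(t)$ minimizes the Lagrangian $L(x,\lambda^\star(t),\nu^\star(t))$ over $x$, so

$$g(\lambda^\star(t),\nu^\star(t)) = f_0(x^\star(t)) + \sum_{i=1}^m\lambda_i^\star(t)f_i(x^\star(t)) + \nu^\star(t)^T(Ax^\star(t)-b) = f_0(x^\star(t)) - m/t,$$

since $\lambda_i^\star(t)f_i(x^\star(t)) = -1/t$ for every $i$ and $Ax^\star(t)=b$. Hence $f_0(x^\star(t)) - p^\star \le m/t$: the central point $x^\star(t)$ is no more than $m/t$-suboptimal.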

The barrier method

Since the duality gap associated with $x^\star(t)$ is $m/t$, taking $t=m/\epsilon$ and solving \eqref{11.6} once would already give an $\epsilon$-suboptimal solution of \eqref{11.1}. The barrier method instead solves a sequence of centering problems \eqref{11.6} for increasing values of $t$, starting each Newton minimization from the previously computed central point, and stops once $m/t\le\epsilon$.

The central path conditions \eqref{11.7} can be interpreted as a continuous deformation of the KKT optimality conditions. A point $x$ is equal to $x^\star(t)$ if and only if there exist $\lambda,\nu$ such that

$$\nabla f_0(x) + \sum_{i=1}^m\lambda_i\nabla f_i(x) + A^T\nu = 0,\qquad -\lambda_if_i(x) = 1/t,\quad i=1,\ldots,m,\qquad Ax = b,$$

with $f_i(x) < 0$ and $\lambda\succeq 0$. This nonlinear system can be solved approximately by a Newton step (ignoring the inequalities).

First, we can eliminate the variable $\lambda_i = -1/(tf_i(x))$, which leaves

$$\nabla f_0(x) + \sum_{i=1}^m\frac{1}{-tf_i(x)}\nabla f_i(x) + A^T\nu = 0,\qquad Ax = b.$$

Linearizing the first equation around the current strictly feasible $x$ (writing $x+v$ for the new point) and collecting terms, the result in matrix form is

$$\begin{bmatrix}\nabla^2f_0(x) + \frac 1t\nabla^2\phi(x) & A^T\\ A & 0\end{bmatrix}\begin{bmatrix}v\\ \nu\end{bmatrix} = -\begin{bmatrix}\nabla f_0(x) + \frac 1t\nabla\phi(x)\\ 0\end{bmatrix},$$

from which we can read off the Newton step $v$; it coincides, up to the scaling $\nu\mapsto t\nu$, with the Newton step for the centering problem \eqref{11.6}.
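
To see how the pieces fit together, here is a small numpy sketch of the barrier method that applies this equality-constrained Newton step to the centering problem \eqref{11.6} in the special case $f_0(x)=c^Tx$, $f_i(x)=g_i^Tx-h_i$ (so $\nabla^2 f_0=0$). The problem form, the names `centering_step` and `barrier_method`, and the parameter defaults are my own; a strictly feasible start `x0` with $Gx_0<h$, $Ax_0=b$ and a nonsingular KKT matrix are assumed.

```python
# A minimal sketch (my own illustration, not the book's code) of the barrier
# method for  min c^T x  s.t.  G x <= h,  A x = b, with a strictly feasible
# starting point x0 (G x0 < h, A x0 = b) assumed to be available.
import numpy as np

def centering_step(c, G, h, A, x, t, tol=1e-8, max_iter=50):
    """Equality-constrained Newton on the centering problem
    minimize t*c^T x - sum_i log(h_i - g_i^T x)  subject to  A x = b."""
    p = A.shape[0]
    for _ in range(max_iter):
        d = h - G @ x                              # slacks, kept > 0
        grad = t * c + G.T @ (1.0 / d)             # t*grad f0 + grad phi
        hess = G.T @ (G / d[:, None] ** 2)         # grad^2 phi  (f0 is linear)
        kkt = np.block([[hess, A.T], [A, np.zeros((p, p))]])
        rhs = np.concatenate([-grad, np.zeros(p)])
        v = np.linalg.solve(kkt, rhs)[: x.size]    # Newton direction, A v = 0
        if grad @ v > -tol:                        # Newton decrement ~ 0: done
            return x
        # backtracking line search, keeping the slacks strictly positive
        s, psi = 1.0, t * (c @ x) - np.sum(np.log(d))
        while np.min(h - G @ (x + s * v)) <= 0:
            s *= 0.5
        while (t * (c @ (x + s * v)) - np.sum(np.log(h - G @ (x + s * v)))
               > psi + 0.25 * s * (grad @ v)):
            s *= 0.5
        x = x + s * v
    return x

def barrier_method(c, G, h, A, x0, t0=1.0, mu=10.0, eps=1e-6):
    x, t, m = x0, t0, G.shape[0]
    while m / t >= eps:            # f0(x*(t)) - p* <= m/t, so stop when small
        x = centering_step(c, G, h, A, x, t)
        t *= mu
    return x
```

Because each centering problem is warm-started from the previous central point, increasing $t$ by a constant factor `mu` at every outer iteration keeps the number of Newton steps per centering small in practice.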

Primal-dual search direction

Define

$$r_t(x,\lambda,\nu) = \begin{bmatrix}\nabla f_0(x) + Df(x)^T\lambda + A^T\nu\\ -\diag(\lambda)f(x) - (1/t)\mathbf 1\\ Ax - b\end{bmatrix},$$

where $f(x) = (f_1(x),\ldots,f_m(x))^T$, $Df(x)$ is its derivative (Jacobian) matrix, and $t > 0$. The first block component of $r_t$,

$$r_{dual} = \nabla f_0(x) + Df(x)^T\lambda + A^T\nu,$$

is called the dual residual, and the last block component, $r_{pri}=Ax-b$, is called the primal residual. The middle block,

$$r_{cent} = -\diag(\lambda)f(x) - (1/t)\mathbf 1,$$

is the centrality residual. The Newton step is obtained by linearizing $r_t(x+\Delta x,\lambda+\Delta\lambda,\nu+\Delta\nu)=0$ at the current point, which gives

$$\begin{bmatrix}\nabla^2f_0(x) + \sum_{i=1}^m\lambda_i\nabla^2f_i(x) & Df(x)^T & A^T\\ -\diag(\lambda)Df(x) & -\diag(f(x)) & 0\\ A & 0 & 0\end{bmatrix}\begin{bmatrix}\Delta x\\ \Delta\lambda\\ \Delta\nu\end{bmatrix} = -\begin{bmatrix}r_{dual}\\ r_{cent}\\ r_{pri}\end{bmatrix}.$$
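
For concreteness, here is a small numpy sketch of these residuals and the resulting Newton system, specialized (my own choice, to keep the code short) to a quadratic objective and affine inequality constraints; the function name and calling convention are illustrative only.

```python
# A minimal sketch (my own specialization, not the book's code) of r_t and the
# primal-dual Newton system for f0(x) = 0.5 x^T P x + q^T x with affine
# inequalities f(x) = G x - h <= 0 and equalities A x = b, so that Df(x) = G
# and the Hessians of the f_i vanish.
import numpy as np

def primal_dual_newton_step(P, q, G, h, A, b, x, lam, nu, t):
    f = G @ x - h                              # f_i(x), assumed all < 0
    r_dual = P @ x + q + G.T @ lam + A.T @ nu  # dual residual
    r_cent = -lam * f - 1.0 / t                # centrality residual
    r_pri = A @ x - b                          # primal residual

    n, m, p = x.size, lam.size, nu.size
    M = np.block([
        [P,                  G.T,              A.T],
        [-lam[:, None] * G,  -np.diag(f),      np.zeros((m, p))],
        [A,                  np.zeros((p, m)), np.zeros((p, p))],
    ])
    d = np.linalg.solve(M, -np.concatenate([r_dual, r_cent, r_pri]))
    return d[:n], d[n:n + m], d[n + m:]        # (dx, dlam, dnu)
```

A complete primal-dual solver would wrap this in a backtracking line search that keeps $\lambda > 0$ and $f(x) < 0$ while decreasing $\Vert r_t\Vert_2$, and would update $t$ from the surrogate duality gap $-f(x)^T\lambda$, as described in Boyd and Vandenberghe.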

