WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Restricted Isometry Property

Posted on
Tags: Compressed Sensing, Restricted Isometry

I encounter the term RIP in Larry Wasserman’s post, RIP RIP (Restricted Isometry Property, Rest In Peace), and also find some material in Hastie et al.’s book: Statistical Learning with Sparsity about RIP.

For a tolerance $\delta\in(0,1)$ and integer $k\in\{1,2,\ldots,p\}$, we say that $\mathrm{RIP}(k,\delta)$ holds if \begin{equation} \Vert \X_S^T\X_S-\I_{k\times k}\Vert_{\op} \le \delta\label{eq:rip} \end{equation} for all subsets $S\subset{1,2,\ldots,p}$ of cardinality $k$.

Note that $\Vert \cdot \Vert_{\op}$ denotes the operator norm, or maximal singular value of a matrix. Due to the symmetry of $\X_S^T\X_S$, we have the equivalent representation

\[\Vert \X_S^T\X_S-\I_{k\times k}\Vert_{\op} = \sup_{\Vert u\Vert_2=1}\Big\vert u^T(\X_S^T\X_S-\I_{k\times k})u \Big\vert = \sup_{\Vert u\Vert_2=1}\Big\vert \Vert \X_Su\Vert_2^2-1\Big\vert\,.\]

Thus, $\mathrm{RIP}(k,\delta)$ holds iff for all subsets $S$ of cardinality $k$, we have

\[\frac{\Vert \X_Su\Vert_2^2}{\Vert u\Vert_2^2}\in [1-\delta, 1+\delta]\qquad \text{for all }u\in \bbR^k\backslash\{0\}\,.\]

Relationship with Restricted Nullspace

Given an observation vector $\y\in\bbR^p$ and a design matrix $\X\in \bbR^{N\times p}$, consider two problems

\begin{align} \min_{\beta\in\bbR^p}\Vert \beta\Vert_0 &\qquad \text{ such that }\X\beta=\y\label{eq:l0}\
\min_{\beta\in\bbR^p}\Vert \beta\Vert_1 &\qquad \text{ such that }\X\beta=\y\label{eq:l1} \end{align}

As Wikipedia said, there is one $\ell_0$ norm established mathematically and another function called the $\ell_0$ “norm” (with quotation marks to warn that this is not a proper norm), which is the number of non-zero entries of the vector. People often omit the quotation marks.

Suppose that the \eqref{eq:l0} has a unique optimal problem, say $\beta^*\in\bbR^p$. Our interest is in understanding when $\beta^*$ is also the unique optimal solution of the $\ell_1$-based problem \eqref{eq:l1}.

For a given subset $S\subseteq {1,\ldots,p}$, define

\[\bbC(S):=\{\beta\in\bbR^p\vert \;\Vert\beta_{S^c}\Vert_1\le \Vert\beta_S\Vert_1\}\,.\]

Given a matrix $\X\in \bbR^{N\times p}$, its nullspace is given by $\null(\X)=\{\beta\in\bbR^p\mid\X\beta=\0\}$.

For a given subset $S\subseteq \{1,\ldots,p\}$, we say that the design matrix $\X\in \bbR^{N\times p}$ satisfies the restricted nullspace property over $S$, denoted by $\mathrm{RN}(S)$, if $$ \null(\X) \cap \bbC(S)=\{0\}\,. $$

In words, the $\mathrm{RN}(S)$ property holds when the only element of the cone $\bbC(S)$ that lies within the nullspace of $\X$ is the all-zeros vector.

Remarkably, there exists a very simple necessary and sufficient condition on the design matrix $\X$ for the equivalence of $\ell_0$ and $\ell_1$.

Suppose that $\beta^*\in\bbR^p$ is the unique solution to the $\ell_0$ problem \eqref{eq:l0}, and has support $S$. Then the basis pursuit relaxation \eqref{eq:l1} has a unique solution equal to $\beta^*$ iff $\X$ satisfies $\mathrm{RN}(S)$ property.

Now return to the title of this section, RIP is a sufficient condition for the RN to hold.

If $\mathrm{RIP}(2k,\delta)$ holds with $\delta < 1/3$, then the uniform RN property of order $k$ holds, and hence the $\ell_1$ relaxation is exact for all vectors supported on at most $k$ elements.


Larry pointed out that RIP is a very strong assumption, and it is NOT a reasonable assumption in regression problems, but it is a design feature in compressed sensing, and hence reasonable. In words, the data comes from nature in regression problems, while we can design and build the device to satisfy the assumptions in compressed sensing.

Larry showed that the lasso can work well without RIP, and he wrote in his post

I feel that we (as a field) have gone off in a bad theoretical direction. In regression research we should let the RIP rest in peace.

Published in categories Note