Restricted Isometry Property
I encountered the term RIP in Larry Wasserman’s post, RIP RIP (Restricted Isometry Property, Rest In Peace), and also found some related material in Hastie et al.’s book, Statistical Learning with Sparsity.
For an integer $k\in\{1,\ldots,p\}$ and a constant $\delta>0$, a design matrix $\X\in\bbR^{N\times p}$ is said to satisfy $\mathrm{RIP}(k,\delta)$ if
\[\Vert \X_S^T\X_S-\I_{k\times k}\Vert_{\op}\le \delta\qquad \text{for all subsets }S\text{ of cardinality }k\,,\]where $\X_S\in\bbR^{N\times k}$ denotes the submatrix of $\X$ with columns indexed by $S$. Note that $\Vert \cdot \Vert_{\op}$ denotes the operator norm, or maximal singular value of a matrix. Due to the symmetry of $\X_S^T\X_S$, we have the equivalent representation
\[\Vert \X_S^T\X_S-\I_{k\times k}\Vert_{\op} = \sup_{\Vert u\Vert_2=1}\Big\vert u^T(\X_S^T\X_S-\I_{k\times k})u \Big\vert = \sup_{\Vert u\Vert_2=1}\Big\vert \Vert \X_Su\Vert_2^2-1\Big\vert\,.\]Thus, $\mathrm{RIP}(k,\delta)$ holds iff for all subsets $S$ of cardinality $k$, we have
\[\frac{\Vert \X_Su\Vert_2^2}{\Vert u\Vert_2^2}\in [1-\delta, 1+\delta]\qquad \text{for all }u\in \bbR^k\backslash\{0\}\,.\]
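As a quick numerical illustration of this characterization, here is a minimal Python sketch (my own, not from the post or the book) that brute-forces the RIP constant of order $k$ by enumerating all subsets $S$; this is only feasible for small $p$ and $k$.

```python
import numpy as np
from itertools import combinations

def rip_constant(X, k):
    """Brute-force delta(k) = max over |S| = k of ||X_S^T X_S - I||_op.

    Enumerates all subsets, so this is only practical for small p and k.
    """
    N, p = X.shape
    delta = 0.0
    for S in combinations(range(p), k):
        XS = X[:, list(S)]
        # operator norm of a symmetric matrix = max |eigenvalue|
        evals = np.linalg.eigvalsh(XS.T @ XS - np.eye(k))
        delta = max(delta, np.abs(evals).max())
    return delta

# Columns of a scaled Gaussian matrix are nearly orthonormal in small subsets,
# so delta should be small here
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20)) / np.sqrt(200)
print(rip_constant(X, k=3))
```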
Relationship with Restricted Nullspace
Given an observation vector $\y\in\bbR^N$ and a design matrix $\X\in \bbR^{N\times p}$, consider the two problems
\begin{align}
\min_{\beta\in\bbR^p}\Vert \beta\Vert_0 &\qquad \text{ such that }\X\beta=\y\label{eq:l0}\\
\min_{\beta\in\bbR^p}\Vert \beta\Vert_1 &\qquad \text{ such that }\X\beta=\y\label{eq:l1}
\end{align}
As Wikipedia notes, besides the mathematically established $\ell_0$ norm there is another function called the $\ell_0$ “norm” (with quotation marks to warn that it is not a proper norm), namely the number of non-zero entries of a vector. People often omit the quotation marks.
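In code, the $\ell_0$ “norm” is just a count of non-zeros, and the count also shows why it fails homogeneity, hence is not a proper norm:

```python
import numpy as np

beta = np.array([0.0, 3.0, 0.0, -1.0])
print(np.count_nonzero(beta))      # l0 "norm" = 2
print(np.count_nonzero(2 * beta))  # still 2: ||2b||_0 != 2 ||b||_0, so not a norm
```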
Suppose that problem \eqref{eq:l0} has a unique optimal solution, say $\beta^*\in\bbR^p$. Our interest is in understanding when $\beta^*$ is also the unique optimal solution of the $\ell_1$-based problem \eqref{eq:l1}.
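To make the question concrete, here is a minimal sketch (again my own illustration, not from either source) that solves the $\ell_1$ problem \eqref{eq:l1} as a linear program with scipy and checks whether it recovers a sparse ground truth; with $N=40$ Gaussian measurements of a 3-sparse $\beta^*$, exact recovery is typical.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, p, k = 40, 100, 3

# Gaussian design, a common compressed-sensing model
X = rng.standard_normal((N, p)) / np.sqrt(N)

# k-sparse ground truth and its noiseless observations
beta_star = np.zeros(p)
beta_star[rng.choice(p, size=k, replace=False)] = rng.standard_normal(k)
y = X @ beta_star

# Basis pursuit as an LP over stacked variables z = [beta, t]:
#   min sum(t)  s.t.  -t <= beta <= t,  X beta = y
c = np.concatenate([np.zeros(p), np.ones(p)])
A_eq = np.hstack([X, np.zeros((N, p))])
A_ub = np.vstack([np.hstack([np.eye(p), -np.eye(p)]),    #  beta - t <= 0
                  np.hstack([-np.eye(p), -np.eye(p)])])  # -beta - t <= 0
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * p),
              A_eq=A_eq, b_eq=y,
              bounds=[(None, None)] * p + [(0, None)] * p)
beta_hat = res.x[:p]
print("recovery error:", np.linalg.norm(beta_hat - beta_star))
```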
For a given subset $S\subseteq \{1,\ldots,p\}$, define
\[\bbC(S):=\{\beta\in\bbR^p\vert \;\Vert\beta_{S^c}\Vert_1\le \Vert\beta_S\Vert_1\}\,.\]Given a matrix $\X\in \bbR^{N\times p}$, its nullspace is given by $\null(\X)=\{\beta\in\bbR^p\mid\X\beta=\0\}$.
We say that $\X$ satisfies the restricted nullspace property with respect to $S$, denoted by $\mathrm{RN}(S)$, if $\bbC(S)\cap \null(\X)=\{\0\}$. In words, the $\mathrm{RN}(S)$ property holds when the only element of the cone $\bbC(S)$ that lies within the nullspace of $\X$ is the all-zeros vector.
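Verifying $\mathrm{RN}(S)$ exactly is hard in general, but one can at least search for violations numerically. The sketch below (the function name violates_rn is mine) samples random directions in $\null(\X)$, computed via scipy.linalg.null_space, and tests the cone condition; finding a nonzero $\beta\in\bbC(S)\cap\null(\X)$ disproves $\mathrm{RN}(S)$, while finding none proves nothing, since this is random search rather than a certificate.

```python
import numpy as np
from scipy.linalg import null_space

def violates_rn(X, S, n_trials=20000, seed=0):
    """Heuristic search for a nonzero beta in C(S) ∩ null(X).

    Returns True if such a beta is found (so RN(S) fails);
    returning False is NOT a proof that RN(S) holds.
    """
    rng = np.random.default_rng(seed)
    Z = null_space(X)                       # orthonormal basis of null(X)
    S = np.asarray(S)
    Sc = np.setdiff1d(np.arange(X.shape[1]), S)
    for _ in range(n_trials):
        beta = Z @ rng.standard_normal(Z.shape[1])
        if np.abs(beta).sum() > 1e-10 and \
           np.abs(beta[Sc]).sum() <= np.abs(beta[S]).sum():
            return True                     # cone element found in the nullspace
    return False
```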
Remarkably, there exists a very simple necessary and sufficient condition on the design matrix $\X$ for the equivalence of $\ell_0$ and $\ell_1$: for a fixed subset $S$, $\beta^*$ is the unique optimal solution of \eqref{eq:l1} for every vector $\beta^*$ supported on $S$ if and only if $\X$ satisfies $\mathrm{RN}(S)$.
Returning to the title of this section: RIP is a sufficient condition for RN to hold. In particular, if $\X$ satisfies $\mathrm{RIP}(2k,\delta)$ with a sufficiently small constant $\delta$, then $\mathrm{RN}(S)$ holds for every subset $S$ of cardinality $k$.
Discussion
Larry pointed out that RIP is a very strong assumption: it is NOT reasonable in regression problems, but it is a design feature in compressed sensing, and hence reasonable there. In words, in regression the data comes from nature, while in compressed sensing we can design and build the measurement device to satisfy the assumption.
Larry also showed that the lasso can work well without RIP, and he wrote in his post:
I feel that we (as a field) have gone off in a bad theoretical direction. In regression research we should let the RIP rest in peace.