Tensor Completion
Prof. YUAN Ming will give a distinguished lecture on Low Rank Tensor Methods in High Dimensional Data Analysis. To get familiar with his work on tensors, I read his paper, Yuan, M., & Zhang, C.-H. (2016). On Tensor Completion via Nuclear Norm Minimization. Foundations of Computational Mathematics, 16(4), 1031–1068, which is the topic of this post.
Matrix Completion
Image Source: Wiki - Matrix completion. Matrix completion of a partially revealed rank-1 $5\times 5$ matrix. Left: observed incomplete matrix; right: matrix completion result.
Consider a problem of considerable practical interest: the recovery of a data matrix from a sampling of its entries. Suppose that we observe $m$ entries selected uniformly at random from a matrix $\M$. Can we complete the matrix and recover the entries that we have not seen?
Candès and Recht (2009) showed that one can perfectly recover most low-rank matrices from what appears to be an incomplete set of entries. If the number $m$ of sampled entries obeys
\[m\ge Cn^{1.2}r\log n\]for some positive numerical constant $C$, then with very high probability, most $n\times n$ matrices of rank $r$ can be perfectly recovered by solving a simple convex optimization program.
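For a rough sense of scale (taking $C=1$ and the natural logarithm purely for illustration), $n=1000$ and $r=5$ give $m \ge 1000^{1.2}\cdot 5\cdot \log 1000 \approx 1.4\times 10^5$, i.e., only about 14% of the $10^6$ entries.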
Tensor Completion
Let $\T\in\bbR^{d_1\times d_2\times\cdots\times d_N}$ be an $N$-th order tensor, and let $\Omega$ be a randomly sampled subset of $[d_1]\times \cdots \times[d_N]$, where $[d]=\{1,2,\ldots,d\}$. The goal of tensor completion is to recover $\T$ from only the observed entries $\T(\omega)$, $\omega\in \Omega$.
As in many related problems, the tensor $\T$ can often be identified with a certain low-rank structure. The low-rankness entails a reduction in degrees of freedom: for instance, a rank-one $d\times d\times d$ tensor is determined by roughly $3d$ numbers rather than $d^3$. As a result, it is possible to recover $\T$ exactly even when the sample size $\vert\Omega\vert$ is much smaller than the total number, $d_1d_2\cdots d_N$, of entries in $\T$.
In particular, when $N=2$, this becomes the so-called matrix completion problem. An especially attractive approach is through nuclear norm minimization:
\[\min_{\X\in\bbR^{d_1\times d_2}}\Vert \X\Vert_*\qquad\text{subject to }\X(\omega) = \T(\omega)\;\forall \omega\in \Omega\,,\]where the nuclear norm $\Vert \cdot \Vert_*$ of a matrix is given by
\[\Vert \X\Vert_* = \sum_{k=1}^{\min\{d_1,d_2\}}\sigma_k(\X)\,,\]and $\sigma_k(\cdot)$ stands for the $k$-th largest singular value of a matrix.
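As a quick check of this definition, the nuclear norm can be computed directly from the singular values; here is a minimal numpy sketch (the random matrix `A` is just an illustrative example):

```python
import numpy as np

def nuclear_norm(A):
    """Sum of the singular values of A."""
    return np.linalg.svd(A, compute_uv=False).sum()

A = np.random.randn(5, 4)                 # an arbitrary example matrix
print(nuclear_norm(A))
print(np.linalg.norm(A, ord="nuc"))       # numpy's built-in gives the same value
```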
If an unknown $d_1\times d_2$ matrix $\T$ of rank $r$ has low coherence with respect to the canonical basis, then the solution $\hat\T$ of this nuclear norm minimization perfectly reconstructs it with high probability whenever $\vert \Omega\vert \ge C(d_1+d_2)r\log^2(d_1+d_2)$, where $C$ is a numerical constant. In other words, perfect recovery of a matrix is possible from observations of a very small fraction of its entries.
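To make the program above concrete, here is a small sketch of nuclear norm minimization for matrix completion using cvxpy; the package, the default solver, and the problem sizes are my own choices for illustration, not part of the paper.

```python
import numpy as np
import cvxpy as cp

d1, d2, r = 30, 30, 2
rng = np.random.default_rng(0)

# A random rank-r matrix T and a 0/1 mask marking the observed entries Omega
T = rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2))
mask = (rng.random((d1, d2)) < 0.5).astype(float)   # observe roughly half the entries

# Nuclear norm minimization subject to agreeing with T on the observed entries
X = cp.Variable((d1, d2))
constraints = [cp.multiply(mask, X) == cp.multiply(mask, T)]
problem = cp.Problem(cp.Minimize(cp.norm(X, "nuc")), constraints)
problem.solve()

# Relative error of the completed matrix (small if recovery succeeds)
print(np.linalg.norm(X.value - T) / np.linalg.norm(T))
```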
The challenge in generalizing these ideas from matrices to tensors is that basic notions such as rank and the singular value decomposition become ambiguous for higher-order tensors. A common strategy is to unfold a tensor into matrices and then resort to the usual nuclear norm minimization heuristics for matrices.
Following the matricization approach, $\T$ can be reconstructed by the solution of the following convex program:
\[\min_{\X\in\bbR^{d_1\times d_2\times d_3}}\{\Vert X^{(1)}\Vert_* + \Vert X^{(2)}\Vert_* + \Vert X^{(3)}\Vert_*\}\quad \text{subject to }\X(\omega) =\T(\omega)\;\forall \omega\in \Omega\,,\]where $X^{(j)}$ is a $d_j\times (\prod_{k\neq j}d_k)$ matrix whose columns are the mode-$j$ fibers of $\X$.
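The mode-$j$ unfolding itself is easy to compute; below is a minimal numpy sketch of the unfoldings and of the sum-of-nuclear-norms objective (the column ordering of the unfolding may differ from other conventions, but the nuclear norm is invariant to permuting columns).

```python
import numpy as np

def unfold(X, j):
    """Mode-j unfolding: a d_j x (prod of the other dims) matrix
    whose columns are the mode-j fibers of X."""
    return np.moveaxis(X, j, 0).reshape(X.shape[j], -1)

def sum_of_nuclear_norms(X):
    """Objective of the matricization approach: sum over all unfoldings."""
    return sum(np.linalg.norm(unfold(X, j), ord="nuc") for j in range(X.ndim))

X = np.random.randn(4, 5, 6)
print([unfold(X, j).shape for j in range(3)])   # [(4, 30), (5, 24), (6, 20)]
print(sum_of_nuclear_norms(X))
```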
It is of great interest to investigate whether this sample size requirement can be improved by avoiding matricization of tensors. More specifically, for two tensors $\X,\Y\in\bbR^{d_1\times d_2\times d_3}$, write
\[\langle \X,\Y \rangle= \sum_{\omega\in [d_1]\times [d_2]\times [d_3]}\X(\omega)\Y(\omega)\]for their inner product. Define
\[\Vert \X\Vert = \max_{\u_j\in\bbR^{d_j}:\Vert \u_1\Vert = \Vert \u_2\Vert=\Vert \u_3\Vert=1}\langle \X, \u_1\otimes \u_2\otimes \u_3 \rangle\,,\]where, for vectors $\u_j=(u_1^j,\ldots,u_{d_j}^j)'$,
\[\u_1\otimes \u_2\otimes \u_3 = (u_a^1u_b^2u_c^3)_{1\le a\le d_1,1\le b\le d_2,1\le c\le d_3}\,.\]The $\Vert \cdot\Vert$ defined above is an operator norm for tensors and can be viewed as an extension of the usual matrix spectral norm (a small numerical sketch of these quantities is given after the convex program below). Appealing to the duality between the spectral norm and the nuclear norm in the matrix case, consider the following nuclear norm for tensors:
\[\Vert \X\Vert_* = \max_{\Y\in \bbR^{d_1\times d_2\times d_3}:\Vert \Y\Vert\le 1}\langle \Y,\X\rangle.\]Consider reconstructing $\T$ via the solution to the following convex problem:
\[\min_{\X\in\bbR^{d_1\times d_2\times d_3}}\Vert\X\Vert_*\quad \text{subject to }\X(\omega) = \T(\omega)\;\forall\omega\in \Omega\,.\]
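The tensor spectral norm in the definition above has no simple closed form, but a lower bound can be obtained by alternating maximization over the unit vectors $\u_1,\u_2,\u_3$ (higher-order power iteration). The sketch below builds the rank-one tensor $\u_1\otimes\u_2\otimes\u_3$ with `np.einsum` and estimates $\Vert\X\Vert$ this way; it is only a heuristic illustration of the definitions, not an algorithm from the paper.

```python
import numpy as np

def outer3(u1, u2, u3):
    """Rank-one tensor u1 ⊗ u2 ⊗ u3 with entries u1[a] * u2[b] * u3[c]."""
    return np.einsum("a,b,c->abc", u1, u2, u3)

def spectral_norm_lb(X, n_iter=200):
    """Lower bound on ||X|| = max <X, u1 ⊗ u2 ⊗ u3> over unit vectors,
    via alternating maximization (higher-order power iteration)."""
    rng = np.random.default_rng(0)
    u = [v / np.linalg.norm(v) for v in (rng.standard_normal(d) for d in X.shape)]
    for _ in range(n_iter):
        u[0] = np.einsum("abc,b,c->a", X, u[1], u[2]); u[0] /= np.linalg.norm(u[0])
        u[1] = np.einsum("abc,a,c->b", X, u[0], u[2]); u[1] /= np.linalg.norm(u[1])
        u[2] = np.einsum("abc,a,b->c", X, u[0], u[1]); u[2] /= np.linalg.norm(u[2])
    # <X, u1 ⊗ u2 ⊗ u3> at the (locally) optimal unit vectors
    return float(np.einsum("abc,abc->", X, outer3(u[0], u[1], u[2])))

X = np.random.randn(4, 5, 6)
print(spectral_norm_lb(X))   # <= ||X||; for matrices this reduces to the top singular value
```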
Prof. Yuan’s Talk
Prof. Yuan mainly discussed Tensor PCA. More thought is needed on its motivation and necessity.
Finally, he shared his thoughts on research directions for tensors.