Functional Data Analysis by Matrix Completion
This note is based on Descary, M.-H., & Panaretos, V. M. (2019). Functional data analysis by matrix completion. The Annals of Statistics, 47(1), 1–38.
Abstract
The paradigm of functional PCA implicitly assumes that rough variation is due to nuisance noise. Nevertheless, relevant functional features may indeed be rough relative to the global scale, but still smooth at shorter scales. These may be confounded with the global smooth components of variation by the smoothing and PCA, potentially distorting the parsimony and interpretability of the analysis.
Goal: investigate how both smooth and rough variations can be recovered on the basis of discretely observed functional data.
Assuming that a functional datum arises as the sum of two uncorrelated components, one smooth and one rough, the authors develop identifiability conditions for the recovery of the two corresponding covariance operators.
Key insight: the two covariances should possess complementary forms of parsimony: one smooth and of finite rank (capturing larger-scale variation), the other banded and potentially of infinite rank (capturing small-scale variation).
The conditions elucidate the precise interplay between rank, bandwidth and grid resolution.
Under these conditions, the recovery problem is shown to be equivalent to rank-constrained matrix completion, and this equivalence is exploited to construct estimators of the two covariances without assuming knowledge of the true bandwidth or rank. Their asymptotic behaviour is then used to recover the smooth and rough components of each functional datum by best linear prediction.
As a result, the method effectively produces separate functional PCAs for smooth and rough variation.
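To make the completion idea concrete, here is a minimal numerical sketch (not the authors' estimator): a smooth rank-3 covariance plus a banded rough covariance is generated on a grid, and the smooth part is recovered by a simple "hard-impute"-style rank-constrained completion of the off-band entries. Unlike in the paper, the rank and bandwidth are taken as known here, and all constants are illustrative.

```python
import numpy as np

K = 100
t = np.linspace(0, 1, K)

# Smooth, rank-3 covariance L built from three smooth "eigenfunctions".
phis = np.stack([np.sin((k + 1) * np.pi * t) for k in range(3)])   # 3 x K
lams = np.array([3.0, 1.0, 0.5])
L_true = (phis.T * lams) @ phis                                    # K x K, rank 3

# Rough, banded covariance B: zero whenever |s - t| > delta.
delta = 0.05
dist = np.abs(t[:, None] - t[None, :])
B_true = 0.5 * np.exp(-dist / 0.01) * (dist <= delta)

R = L_true + B_true              # the covariance the data can reveal on the grid

# Recover L by rank-3 completion of the off-band entries (hard-impute style):
mask = dist > delta              # off-band entries, where R coincides with L
L_hat = np.where(mask, R, 0.0)   # initialise the unknown band with zeros
for _ in range(200):
    U, s, Vt = np.linalg.svd(L_hat, full_matrices=False)
    low_rank = (U[:, :3] * s[:3]) @ Vt[:3]          # best rank-3 approximation
    L_hat = np.where(mask, R, low_rank)             # re-impose observed entries

print("relative error:", np.linalg.norm(low_rank - L_true) / np.linalg.norm(L_true))
```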
Introduction
Functional PCA aims to construct a parsimonious yet accurate finite-dimensional representation of $n$ observable i.i.d. replicates $\{X_1,\ldots,X_n\}$ of a real-valued random function $\{X(t):t\in [0,1]\}$.
Since the covariance operator $\scrR$ is unknown in practice, functional PCA must be based on its empirical counterpart.
\[\hat \scrR_n = \frac{1}{n}\sum_{i=1}^n(X_i-\bar X)\otimes (X_i-\bar X)\]Here $u\otimes v$ denotes the tensor product, which acts as $(u\otimes v)f = \langle v, f\rangle u$ and in finite dimensions corresponds to the outer product $uv^\top$.
One cannot perfectly observe the complete sample paths of $\{X_1,\ldots,X_n\}$. Instead, one has to make do with discrete measurements
\[X_{ij} = X_i(t_j) + \varepsilon_{ij}\,,\qquad i=1,\ldots,n,\; j=1,\ldots,K\,,\]where the points $t_j$ can be random or deterministic and the array $\varepsilon_{ij}$ is assumed to consist of centred i.i.d. perturbations, independent of the $X_i$.
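As a concrete (toy, numpy-based) illustration of this observation scheme and of the empirical covariance above computed directly from the discrete measurements: the one noise-specific point shown is that centred i.i.d. errors inflate only the diagonal of the raw covariance matrix, by the noise variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, sigma = 500, 50, 0.3
t = np.linspace(0, 1, K)                             # a regular grid t_1, ..., t_K

# Toy latent curves X_i evaluated on the grid, plus centred i.i.d. noise eps_ij.
scores = rng.normal(size=(n, 2))
X_latent = scores[:, [0]] * np.sin(np.pi * t) + scores[:, [1]] * np.cos(2 * np.pi * t)
X_obs = X_latent + sigma * rng.normal(size=(n, K))   # X_ij = X_i(t_j) + eps_ij

# Raw empirical covariance of the discrete data: an average of outer products
# of the centred observation vectors, i.e. the grid version of R_hat_n.
Xc = X_obs - X_obs.mean(axis=0)
R_raw = Xc.T @ Xc / n

# Off-diagonal entries estimate R(t_j, t_k); the diagonal is inflated by sigma^2.
R_true = np.cov(X_latent, rowvar=False, bias=True)
print(np.mean(np.diag(R_raw) - np.diag(R_true)))     # close to sigma**2 = 0.09
```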
Roughly speaking, there are two major approaches to deal with discrete measurements:
- to smooth the discretely observed curves and then obtain the covariance operator and spectrum of the smoothed curves.
- to first obtain a smoothed estimate of the covariance operator and to use this to estimate the unobservable curves and their spectrum.
In the first approach, one defines the smoothed curves $\tilde X_i$ as
\[\tilde X_i(t) = \arg\min_{f\in C^2[0,1]}\left\{\sum_{j=1}^K(f(t_j)-X_{ij})^2+\tau \Vert \partial_t^2 f\Vert^2_{L^2}\right\}\quad i=1,\ldots,n\,,\]for $C^2[0,1]$ the space of twice continuously differentiable functions on $[0,1]$, and $\tau > 0$ a regularising constant. The proxy curves $\{\tilde X_i\}$ are used in lieu of the unobservable $\{X_i\}$ in order to construct a “smooth” empirical covariance operator $\tilde \scrR$, and the curves $\{\tilde X_i\}$ are finally projected onto the span of the first $r$ eigenfunctions of $\tilde \scrR$.
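A hedged sketch of this first approach: each curve is smoothed with scipy's `make_smoothing_spline` (SciPy ≥ 1.10), whose penalty matches the second-derivative roughness penalty in the display above, and the smoothed sample is then subjected to a grid-level PCA. The data, the value of $\tau$, and the truncation level $r$ are all illustrative.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(1)
n, K, sigma = 500, 50, 0.3
t = np.linspace(0, 1, K)
scores_true = rng.normal(size=(n, 2))
X_obs = (scores_true[:, [0]] * np.sin(np.pi * t)
         + scores_true[:, [1]] * np.cos(2 * np.pi * t)
         + sigma * rng.normal(size=(n, K)))        # noisy discrete measurements

# Step 1: smooth each curve; lam plays the role of the regularising constant tau.
tau = 1e-4
X_tilde = np.stack([make_smoothing_spline(t, X_obs[i], lam=tau)(t) for i in range(n)])

# Step 2: "smooth" empirical covariance of the proxy curves and its spectrum.
Xc = X_tilde - X_tilde.mean(axis=0)
R_tilde = Xc.T @ Xc / n
eigvals, eigvecs = np.linalg.eigh(R_tilde)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # leading components first

# Step 3: project the smoothed curves onto the first r eigenfunctions
# (grid-level approximation, ignoring quadrature weights).
r = 2
pc_scores = Xc @ eigvecs[:, :r]
X_fpca = X_tilde.mean(axis=0) + pc_scores @ eigvecs[:, :r].T
```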
The second general approach is exemplified by Principal Analysis by Conditional Expectation (PACE): a smoothed estimate of the covariance operator is obtained first, and each curve is then recovered from its discrete measurements by conditional expectation (best linear prediction) of its principal component scores.
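For concreteness, a hedged grid-level sketch of that conditional-expectation step: under Gaussian assumptions, the best linear predictor of the $k$-th score given the discrete observations is $\hat\xi_{ik} = \lambda_k \boldsymbol\phi_{ik}^\top \Sigma_{Y_i}^{-1}(Y_i - \mu_i)$, the standard PACE score formula. The mean, eigenfunctions and eigenvalues below are toy stand-ins for the smoothed estimates.

```python
import numpy as np

rng = np.random.default_rng(2)
K, sigma2 = 50, 0.09
t = np.linspace(0, 1, K)

# Toy stand-ins for the smoothed mean, eigenfunctions and eigenvalues.
mu = np.zeros(K)
phi = np.stack([np.sqrt(2) * np.sin(np.pi * t),
                np.sqrt(2) * np.sin(2 * np.pi * t)])      # 2 x K
lam = np.array([2.0, 0.5])
R_smooth = (phi.T * lam) @ phi                            # R(t_j, t_k) on the grid

# One curve observed with noise at a subset of the grid points.
obs = np.sort(rng.choice(K, size=15, replace=False))
xi_true = np.sqrt(lam) * rng.normal(size=2)
Y = mu[obs] + xi_true @ phi[:, obs] + np.sqrt(sigma2) * rng.normal(size=obs.size)

# Best linear prediction (conditional expectation) of the scores given Y:
# xi_hat_k = lam_k * phi_k(t_obs)^T Sigma_Y^{-1} (Y - mu(t_obs)).
Sigma_Y = R_smooth[np.ix_(obs, obs)] + sigma2 * np.eye(obs.size)
xi_hat = (lam[:, None] * phi[:, obs]) @ np.linalg.solve(Sigma_Y, Y - mu[obs])
print(xi_true, xi_hat)
```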
Proceeding in either of these two ways essentially consigns any variation of smoothness class below $C^2$ to pure noise and smears it out by means of smoothing; any remaining rough variation is expected to be negligible, attributed to small fluctuations along eigenfunctions of order $r+1$ and higher, and is discarded after the PCA step.