One-way Matching with Low Rank
Posted on (Update: )
This post is for Chen, Shuxiao, Sizun Jiang, Zongming Ma, Garry P. Nolan, and Bokai Zhu. “One-Way Matching of Datasets with Low Rank Signals.” arXiv, October 3, 2022.
Suppose we have two datasets
- $X\in \IR^{n\times p}$
- $Y\in \IR^{m\times p}$
assume $n\le m$
assume that both data matrices admit a “signal+noise” structure
- $X=M_x+Z_x$
- $Y=M_y+Z_y$
One-way matching as estimation
assume $m = n$, special the model to
\[X = UDV^\top + \sigma_xN_x\\ \Pi^\star Y = UDV^\top + \sigma_y N_y\]For brevity, write $(X, Y)\sim M(U, D, V, \sigma_x, \sigma_y, \Pi^\star)$
each permutation matrix $\Pi$ has an one-to-one correspondence to a vector $\pi \in S_n$ where $S_n$ collects all permutations of the set ${1, \ldots, n}$.
the paper focus on minimax estimation of $\Pi^\star$ under the normalized Hamming loss
\[\frac 1n d(\hat \pi, \pi^\star) = \frac{1}{n}\sum_{i\in [n]}1\{\hat \pi_i \neq \pi_i^\star\} = \frac{1}{2n}\Vert \Pi-\hat\Pi\Vert_F^2 = \ell(\hat\Pi, \Pi^\star)\,.\]main results
- derive minimax lower bounds for estimating $\Pi^\star$
- a mathcing algorithm based on solving a linear assignment problem on projected data
- improve the error bound under more strigent “weak symmetry condition”
one-way setting: the columns of the datasets are already aligned and only permutation of rows is to be estimated