WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

One-way Matching with Low Rank

Posted on (Update: )
Tags: Data Alignment, Single-cell, Linear Assignment, Spatial Proteomics, Record Linkage

This post is for Chen, Shuxiao, Sizun Jiang, Zongming Ma, Garry P. Nolan, and Bokai Zhu. “One-Way Matching of Datasets with Low Rank Signals.” arXiv, October 3, 2022.

Suppose we have two datasets

  • $X\in \IR^{n\times p}$
  • $Y\in \IR^{m\times p}$

assume $n\le m$

assume that both data matrices admit a “signal+noise” structure

  • $X=M_x+Z_x$
  • $Y=M_y+Z_y$

One-way matching as estimation

assume $m = n$, special the model to

\[X = UDV^\top + \sigma_xN_x\\ \Pi^\star Y = UDV^\top + \sigma_y N_y\]

For brevity, write $(X, Y)\sim M(U, D, V, \sigma_x, \sigma_y, \Pi^\star)$

each permutation matrix $\Pi$ has an one-to-one correspondence to a vector $\pi \in S_n$ where $S_n$ collects all permutations of the set ${1, \ldots, n}$.

the paper focus on minimax estimation of $\Pi^\star$ under the normalized Hamming loss

\[\frac 1n d(\hat \pi, \pi^\star) = \frac{1}{n}\sum_{i\in [n]}1\{\hat \pi_i \neq \pi_i^\star\} = \frac{1}{2n}\Vert \Pi-\hat\Pi\Vert_F^2 = \ell(\hat\Pi, \Pi^\star)\,.\]

main results

  • derive minimax lower bounds for estimating $\Pi^\star$
  • a mathcing algorithm based on solving a linear assignment problem on projected data
  • improve the error bound under more strigent “weak symmetry condition”

one-way setting: the columns of the datasets are already aligned and only permutation of rows is to be estimated


Published in categories Note