WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Guarantees of Lloyd’s Algorithm

Posted on (Update: ) 0 Comments
Tags: K-means

This note is for Lu, Y., & Zhou, H. H. (2016). Statistical and Computational Guarantees of Lloyd’s Algorithm and its Variants (No. arXiv:1612.02099). arXiv. http://arxiv.org/abs/1612.02099

two issues with Lloyd’s algorithm under the worst case analysis

  1. as a greedy algorithm, Lloyd’s algorithm is only guaranteed to converge to a local minimum. The k-means objective function that Lloyd’s algorithm attemps to minimize is NP-hard
  2. the convergence rate of Lloyd’s algorithm can be very slow

analyze its performance on the Gaussian mixture model

Mixture of sub-Gaussians

$y_i \in \IR^d$ from a mixture of $k$ sub-Gaussian distributions

\[y_i = \theta_{z_i} + w_i\quad \text{for } i\in [n]\]

assume the noise ${w_i, i \in [n]}$ are independent zero mean sub-Gaussian vectors with parameter $\sigma > 0$

a special case is the symmetric two-component mixture model, in which the two centers are $\theta^\star$ and $-\theta^\star$. We observe $y_1,\ldots, y_n$ from the following generative model

\[y_i = z_i\theta^\star + \xi_i\]

where

  • $z_i\in {-1, 1}$
  • ${\xi, i\in [n]}$ are independent Gaussian noise with covariance matrix $\sigma^2I_d$

for these clustering problems, the main goal is to recover the unknown label $z_i$ rather than to estimate the centers ${\theta_j}$

define the mis-clustering rate of estimated labels $\hat z_1$, \ldots, $\hat z_n$

given $n$ vectors $y_1, \ldots, y_n\in\IR^d$ and an integer $k$, the goal is to find $k$ points $\theta_1,\ldots,\theta_k\in \IR^d$ to minimize the following objective function

\[\sum_{i\in [n]}\min_{j\in [k]} \Vert y_i - \theta_j\Vert^2\]

the problem is equivalent to find $\hat\theta_1,\ldots,\hat\theta_k\in \IR^d$ and $\hat z_1,\ldots, z_n\in [k]$ such that

\[(\hat\theta, \hat z) = \argmin_{(\theta, z)} \sum_{i\in [n]} \Vert y_i - \sum_{j=1}^k\theta_jI\{z_i = j\}\Vert^2\]

a key quantity in determining the basin of attraction of Lloyd’s algorithm is the normalized signal-to-noise ratio

\[r = \frac{\Vert \theta^\star\Vert}{\sigma\sqrt{1+\eta}}\]

where $\eta = 9d/n$

image


Published in categories