WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

FDR Control for Byzantine Machines

Tags: False Discovery Rate, Distributed Learning, Federated Learning

This is the note for Qian, C., Wang, M., Ren, H., & Zou, C. (2024). ByMI: Byzantine Machine Identification with False Discovery Rate Control. Proceedings of the 41st International Conference on Machine Learning, 41357–41382. https://proceedings.mlr.press/v235/qian24b.html

In a distributed learning system, there is usually a small fraction of worker machines that send arbitrary information, due to malicious attacks on worker machines and communication channels, or due to variation and contamination in the data sources. Such abnormal machines are called Byzantine machines.

Byzantine robust learning

it focuses on aggregating workers' messages, e.g., averages of gradients, in the master machine via robust estimation algorithms

one popular choice is to take the median instead of the sample mean. however, such median-type algorithms suffer a dimension-dependent bias under Byzantine failures
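a minimal toy illustration of the mean-vs-median aggregation above (the setup and numbers are made up for illustration, not from the paper):

```python
import numpy as np

# toy setup: m workers send p-dimensional gradients whose true mean is 1;
# a few Byzantine workers send arbitrary garbage instead
rng = np.random.default_rng(0)
m, p = 20, 5
grads = rng.normal(loc=1.0, scale=0.1, size=(m, p))  # honest gradients near 1
grads[:3] = 100.0                                    # 3 Byzantine machines

mean_agg = grads.mean(axis=0)          # sample mean: dragged far from 1
median_agg = np.median(grads, axis=0)  # coordinate-wise median: stays near 1
```

the sample mean is ruined by three corrupted rows, while the coordinate-wise median remains close to the true gradient in every coordinate.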

another direction is to detect and delete Byzantine machines, and then make estimations based on reputation scores which measure the trustworthiness of worker machines

Outlier detection methods

a Byzantine machine behaves like an outlier.

traditionally, an outlier detection procedure first obtains some robust center estimate, and then computes p-values via some efficient tests to evaluate whether each point is an outlier

Contributions

the paper suggests a p-value-free and dimension-insensitive detection procedure, named Byzantine Machine Identification (ByMI)

Problem Formulation

assume $N$ independent samples $\{s_i\}_{i=1}^N$ are evenly stored in $m + 1$ machines $\cM_0, \cM_1,\ldots, \cM_m$, each of which contains $n$ observations, so $N = (m+1)n$

  • $s$ could either be a $p$-variate random vector $x$, or $(y, x)$
  • $\cM_0$: the master machine that is in charge of integrating information from worker machines $\cM_1,\ldots,\cM_m$
  • $\cB$: Byzantine machines set
  • $\cG$: good/normal machines set

assume the normal data are i.i.d. draws from $P_0$, while the data on Byzantine machines are not from $P_0$

this yields the following multiple testing problem

\[H_{0j}: \cM_j \in \cG\quad \text{vs.} \quad H_{1j}: \cM_j\in \cB,\quad j \in [m]\]

Byzantine Machines Identification Procedure

consider a generic distributed risk minimization framework.

  • $\ell(s, \theta)$: the loss function with parameter $\theta$
  • $g(s, \theta)$: gradient function

each worker machine computes an empirical gradient

\[g_j(\theta_0) = \frac 1n \sum_{i\in \cM_j} g(s_i, \theta_0)\]
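to make the notation concrete, here is a toy instance of the empirical gradient on one worker machine, using the squared loss (the loss choice and all numbers are illustrative, not from the paper):

```python
import numpy as np

# illustrative example: squared loss l(s, theta) = (y - x^T theta)^2 / 2,
# so the per-sample gradient is g(s, theta) = -(y - x^T theta) x, with s = (y, x)
rng = np.random.default_rng(0)
n, p = 200, 4
theta0 = np.zeros(p)                           # evaluation point theta_0
X = rng.normal(size=(n, p))                    # the n observations on machine M_j
y = X @ np.ones(p) + 0.1 * rng.normal(size=n)  # true parameter is the all-ones vector

# empirical gradient g_j(theta_0): average of the per-sample gradients
residual = y - X @ theta0
g_j = -(residual[:, None] * X).mean(axis=0)
```

here $g_j(\theta_0)$ is a length-$p$ vector; for a normal machine it concentrates around its population mean $\mu^\star$ (approximately the all-minus-ones vector in this toy setup).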

denote by $\mu^\star$ the mean of $g_j(\theta_0)$ when $\cM_j\in \cG$ is a normal machine.

then the alternative hypotheses can be written as

\[H_{1j}: \bbE[g_j(\theta_0)] \neq \mu^\star\quad\text{for } j \in [m]\]
  1. data splitting: split the data on each worker machine into two halves and compute the corresponding gradients $g_{1j}$ and $g_{2j}$
  2. obtain a robust mean estimator of $\mu^\star$, denoted by $\hat g(\theta)$
  3. construct the ranking score, which measures how likely $\cM_j$ is to be a Byzantine machine
\[W_j = (g_{1j}(\theta) - \hat g(\theta))^\top \Omega (g_{2j}(\theta) - \hat g(\theta))\]
  • a large positive $W_j$ indicates that $\cM_j$ is likely to be a Byzantine machine
  4. FDR control: choose a data-driven threshold for $W_j$ so that the false discovery rate is controlled
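the four steps can be sketched end-to-end on a mean-estimation problem (where the "gradient" is just the sample itself). this is a minimal sketch under stated assumptions, not the authors' exact procedure: $\Omega$ is taken to be the identity, the robust estimator is a coordinate-wise median, and step 4 uses a knockoff-style threshold based on the sign-symmetry of null scores.

```python
import numpy as np

def bymi_sketch(samples_per_machine, q=0.1):
    """illustrative sketch of the four ByMI steps (not the authors' code);
    each element of samples_per_machine is an (n_j, p) array of one
    worker's observations, and q is the target FDR level"""
    # 1. data splitting: two halves of each machine's data
    g1 = np.array([s[: len(s) // 2].mean(axis=0) for s in samples_per_machine])
    g2 = np.array([s[len(s) // 2:].mean(axis=0) for s in samples_per_machine])

    # 2. robust estimate of mu*: coordinate-wise median across machines
    full_means = np.array([s.mean(axis=0) for s in samples_per_machine])
    mu_hat = np.median(full_means, axis=0)

    # 3. ranking scores W_j (with Omega = I): roughly symmetric around 0
    #    for normal machines, large and positive for Byzantine ones
    W = np.sum((g1 - mu_hat) * (g2 - mu_hat), axis=1)

    # 4. FDR control: smallest threshold t whose estimated false discovery
    #    proportion (counting mirror-image negative scores) is below q
    for t in np.sort(np.abs(W)):
        fdp_hat = (np.sum(W <= -t) + 1) / max(np.sum(W >= t), 1)
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)  # machines flagged as Byzantine
    return np.array([], dtype=int)

# toy run: 50 machines, the first 5 Byzantine (mean-shifted data)
rng = np.random.default_rng(1)
data = [rng.normal(size=(40, 3)) for _ in range(50)]
for j in range(5):
    data[j] = data[j] + 3.0
flagged = bymi_sketch(data, q=0.2)
```

the key design point is that the two halves $g_{1j}$ and $g_{2j}$ are independent for a normal machine, so under $H_{0j}$ the score $W_j$ is symmetric around zero; the negative scores then serve as a free estimate of how many false positives a given threshold admits, which is what makes the procedure p-value free.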


