WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Multiaccuracy: Black-Box Post-Processing for Fairness

Tags: Classification, Fairness

This note is for Kim, M. P., Ghorbani, A., & Zou, J. (2018). Multiaccuracy: Black-Box Post-Processing for Fairness in Classification. arXiv:1805.12317.

The paper develops a rigorous framework of multiaccuracy auditing and post-processing to ensure accurate predictions across identifiable subgroups.

The algorithm, Multiaccuracy Boost, works in any setting where we have black-box access to a predictor and a relatively small set of labeled data for auditing.

Empirically, the results support the intuition that machine-learned classifiers may optimize predictions to perform well on the majority population, inadvertently hurting performance on minority populations in significant ways.

Given black-box access to a classifier $f_0$ and a relatively small ‘‘validation set’’ of labeled samples drawn from some representative distribution $\cD$, the goal is to audit $f_0$ to determine whether it satisfies a strong notion of subgroup fairness: multiaccuracy.

2 Setting and Multiaccuracy

2.1 Multiaccuracy

Let $\alpha\ge 0$ and $\cC\subseteq [-1, 1]^\cX$ be a class of functions on $\cX$. A hypothesis $f:\cX\rightarrow [0, 1]$ is $(\cC, \alpha)$-multiaccurate if for all $c\in \cC$: \(\left\vert \bbE_{x\sim \cD}[c(x)\cdot (f(x) - y(x))]\right\vert \le \alpha\)
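As a toy illustration (not from the paper), the definition can be checked empirically when $\cC$ is a small class of subgroup indicator functions: estimate the correlation of each $c$ with the residual $f - y$ and flag those exceeding $\alpha$. The data, group names, and threshold below are made up for the sketch:

```python
import numpy as np

def multiaccuracy_violations(f, y, groups, alpha=0.05):
    """Estimate E[c(x) * (f(x) - y(x))] for each audit function c and
    return those whose correlation with the residual f - y exceeds
    alpha in absolute value (i.e. the multiaccuracy violations)."""
    residual = f - y
    corrs = {name: float(np.mean(c * residual)) for name, c in groups.items()}
    return {name: v for name, v in corrs.items() if abs(v) > alpha}

# toy data: the predictor systematically over-predicts on subgroup A
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000).astype(float)     # labels in {0, 1}
in_A = rng.integers(0, 2, size=1000).astype(float)  # subgroup indicator
f = np.clip(y + 0.3 * in_A, 0, 1)                   # inflated on A
viol = multiaccuracy_violations(f, y, {"A": in_A, "not A": 1 - in_A})
```

Here `viol` contains only `"A"`: the residual is zero off the subgroup, so the bias is invisible to aggregate accuracy but is caught by the indicator of $A$.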

2.2 Classification accuracy from multiaccuracy

Multiaccuracy guarantees that the predictions of a classifier appear unbiased on a rich class of subpopulations.

2.3 Auditing for multiaccuracy

Use a learning algorithm $\cA$ to audit a classifier $f$ for multiaccuracy.

The algorithm $\cA$ receives a small sample from $\cD$ and aims to learn a function $h$ that correlates with the residual function $f - y$.
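A minimal auditing sketch, with a linear least-squares learner standing in for the generic algorithm $\cA$ (a hypothetical choice; any agnostic learner, such as a shallow tree, would do): fit $h$ to the residual on half of the validation data, then test on the other half whether $h$ correlates with $f - y$ by more than $\alpha$.

```python
import numpy as np

def audit(X, f_vals, y_vals, alpha=0.05):
    """Fit a linear h(x) ~ f(x) - y(x) on the first half of the data,
    then check on the held-out half whether h correlates with the
    residual by more than alpha.  Returns (h values, correlation) on
    the held-out half, with h = None when no violation is found."""
    n = len(X) // 2
    res = f_vals - y_vals
    M = np.column_stack([X[:n], np.ones(n)])            # design with intercept
    w, *_ = np.linalg.lstsq(M, res[:n], rcond=None)     # least-squares fit
    M2 = np.column_stack([X[n:], np.ones(len(X) - n)])
    h = M2 @ w                                          # held-out h(x) values
    corr = float(np.mean(h * res[n:]))
    return (h, corr) if corr > alpha else (None, corr)

# toy check: a constant predictor is badly calibrated on {x : x_0 > 0}
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] > 0).astype(float)
f = np.full(1000, 0.5)
h, corr = audit(X, f, y)
```

The split matters: evaluating the correlation on held-out data keeps the auditor from certifying a violation that is merely overfitting of $h$ to the sample.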

3 Post-Processing for Multiaccuracy

The post-processing algorithm is an iterative procedure, similar to boosting, that uses the multiplicative weights framework to improve suboptimal predictions identified by the auditor.

“Do-no-harm” guarantee: informally, if $f_0$ has low classification error on some subpopulation $S\subseteq \cX$ identified by $\cA$, then post-processing cannot significantly increase the classification error on $S$.
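The iterative procedure can be sketched as follows: while the auditor finds an $h$ correlated with the residual, apply a multiplicative-weights style update $f(x) \leftarrow f(x)\, e^{-\eta h(x)}$ (clipped back into $(0,1)$). This is an illustrative sketch under made-up step size, stopping rule, and toy auditor, not the authors' implementation.

```python
import numpy as np

def multiaccuracy_boost(f0, y, auditor, eta=0.1, T=50):
    """Post-processing sketch in the spirit of Multiaccuracy Boost:
    repeatedly ask the auditor for h correlated with the residual
    f - y and apply f(x) <- f(x) * exp(-eta * h(x)), clipped."""
    f = f0.copy()
    for _ in range(T):
        h = auditor(f, y)          # array of h(x) values, or None
        if h is None:              # no violation found: done
            break
        f = np.clip(f * np.exp(-eta * h), 1e-3, 1 - 1e-3)
    return f

def make_group_auditor(groups, alpha=0.02):
    """Toy auditor searching over +/- indicators of the given subgroups."""
    cs = [s * g for g in groups for s in (1.0, -1.0)]
    def auditor(f, y):
        res = f - y
        best = max(cs, key=lambda c: np.mean(c * res))
        return best if np.mean(best * res) > alpha else None
    return auditor

# toy example: a constant predictor is biased on both subgroups
rng = np.random.default_rng(0)
in_A = rng.integers(0, 2, size=2000).astype(float)
y = (rng.random(2000) < np.where(in_A == 1, 0.8, 0.2)).astype(float)
f0 = np.full(2000, 0.5)
auditor = make_group_auditor([in_A, 1 - in_A])
f = multiaccuracy_boost(f0, y, auditor)
```

After boosting, the predictions on each subgroup move toward that subgroup's base rate, so the residual no longer correlates with either indicator beyond the audit threshold.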

4 Experimental Evaluation

4.1 Multiaccuracy improves gender detection

4.2 Additional case studies

Adult Income Prediction

4.2.1 Semi-Synthetic Disease Prediction from UK Biobank

