Talagrand Concentration
Posted on (Update: ) 0 Comments
This note is for Wainwright, M. J. (n.d.). High-Dimensional Statistics: A Non-Asymptotic Viewpoint. 604.
Let $\cF$ be a class of functions (each of the form: $f:\cX\rightarrow \bbR$), and let $(X_1,\ldots, X_n)$ be drawn from a product distribution $\bbP = \otimes_{i=1}^n \bbP_i$, where each $\bbP_i$ is supported on some set $\cX_i\subseteq \cX$. Consider the random variable
\[Z = \sup_{f\in \cF}\frac 1n \sum_{i=1}^n f(X_i)\,.\]Consider a countable class of functions $\cF$ uniformly bounded by $b$. Then for all $\delta > 0$, the random variable $Z$ satisfies the upper tail bound
\[\bbP[Z > \bbE[Z] + \delta] \le 2\exp\left( \frac{-n\delta^2}{56\bbE[\Sigma^2] + 4b\delta} \right)\,,\]where $\Sigma^2 = \sup_{f\in \cF}\frac 1n\sum_{i=1}^n f^2(X_i)$.
The expectation $\bbE[\Sigma^2]$ can be upper bounded. Using symmetrization techniques, it can be shown that
\[\bbE[\Sigma^2] \le \sigma^2 + 2b\bbE[Z]\,,\]where $\sigma^2 = \sup_{f\in \cF} \bbE[f^2(X)]$.
Let $\cF$ be a class of integrable real-valued functions with domain $\cX$, and let $X_1^n = {X_1, \ldots, X_n}$ be a collection of i.i.d. samples from some distribution $\bbP$ over $\cX$. Consider the random variable
\[\Vert \bbP_n - \bbP\Vert_{\cF} = \sup_{f\in \cF}\left\vert \frac 1n \sum_{i=1}^n f(X_i) -\bbE f(X) \right\vert\]Relate the random variable $\Vert \bbP_n -\bbP\Vert_{\cF}$ to its symmetrized version
\[\Vert \bbS_n\Vert_{\cF} = \sup_{f\in \cF}\vert \frac{1}{n}\sum_{i=1}^n \varepsilon_i f(X_i)\vert\]Note that the expectation of $\Vert S_n\Vert_{\cF}$ corresponds to the Rademacher complexity.
For any convex non-decreasing function $\Phi: \bbR \rightarrow \bbR$, we have
\[\bbE_{X, \varepsilon} [\Phi(\frac 12\Vert S_n\Vert_{\bar \cF})] \le \bbE_{X}[\Phi(\Vert P_n - P\Vert_{\cF})] \le \bbE_{X,\varepsilon}[\Phi(2\Vert S_n\Vert_{\cF})]\,,\]where $\bar \cF = {f-\bbE[f], f\in \cF}$ is the re-centered function class.