Data Thinning for Convolution-Closed Distributions

Posted on Aug 29, 2024

Tags: Data Thinning

This note is for Neufeld, A., Dharamshi, A., Gao, L. L., & Witten, D. (2024). Data Thinning for Convolution-Closed Distributions. Journal of Machine Learning Research, 25(57), 1–35.

Sample splitting cannot be applied when there is one parameter of interest per observation, or when the parameter of interest is a function of the $n$ observations. For example:

when estimating a low-rank approximation to a matrix, there is one parameter of interest (a latent variable coordinate) for each of the $n$ rows in the matrix
in fixed-covariate regression under model misspecification, the target parameter depends on the specific $n$ observations included in the data set
settings in which we wish to draw observation-specific inferences about each of the $n$ observations

Data thinning is an alternative to sample splitting.

Outside of the following two cases, no existing proposals split a random variable into independent parts that belong to the same distributional family as the original random variable:

split $X \sim N(\mu, \sigma^2)$ with known $\sigma^2$ into two independent Gaussian random variables
split $X \sim Poisson(\lambda)$ into two independent Poisson random variables
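A minimal sketch of these two splits, assuming the usual conditional-draw recipe: for a fraction $\epsilon \in (0,1)$ (called `eps` below, a name chosen here for illustration), draw $X^{(1)} \mid X = x \sim N(\epsilon x, \epsilon(1-\epsilon)\sigma^2)$ in the Gaussian case, or $X^{(1)} \mid X = x \sim Binomial(x, \epsilon)$ in the Poisson case, and set $X^{(2)} = X - X^{(1)}$. The function names are hypothetical and not taken from any package.

```python
import numpy as np


def thin_gaussian(x, sigma2, eps=0.5, rng=None):
    """Split Gaussian data with KNOWN variance sigma2 into two folds.

    Returns (x1, x2) with x1 + x2 == x; marginally x1 ~ N(eps*mu, eps*sigma2)
    and x2 ~ N((1-eps)*mu, (1-eps)*sigma2), and the folds are independent.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    # Conditional draw: X1 | X = x ~ N(eps * x, eps * (1 - eps) * sigma2)
    x1 = rng.normal(loc=eps * x, scale=np.sqrt(eps * (1 - eps) * sigma2))
    return x1, x - x1


def thin_poisson(x, eps=0.5, rng=None):
    """Split Poisson counts into two folds.

    Returns (x1, x2) with x1 + x2 == x; marginally x1 ~ Poisson(eps*lambda)
    and x2 ~ Poisson((1-eps)*lambda), and the folds are independent.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x)
    # Conditional draw: X1 | X = x ~ Binomial(x, eps)
    x1 = rng.binomial(x, eps)
    return x1, x - x1
```

Note that neither function needs the mean parameter ($\mu$ or $\lambda$); only the Gaussian case needs a nuisance parameter ($\sigma^2$) to be known.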

Gamma decomposition into $M$ components via data thinning: suppose that $X\sim Gamma(\alpha, \beta)$, where $\beta$ is unknown. Take $(X^{(1)},\ldots, X^{(M)}) = XZ$, where $Z\sim Dirichlet(\alpha/M,\ldots, \alpha/M)$ is drawn independently of $X$. Then $X^{(1)},\ldots, X^{(M)}$ are mutually independent, they sum to $X$, and each is marginally drawn from a $Gamma(\alpha/M, \beta)$ distribution.
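The Gamma construction above can be sketched in a few lines. This is an illustrative implementation under the assumption that the shape $\alpha$ is known and the input is a one-dimensional array of observations; the function name `thin_gamma` is hypothetical.

```python
import numpy as np


def thin_gamma(x, alpha, M, rng=None):
    """Split Gamma(alpha, beta) observations into M folds via Dirichlet weights.

    x     : 1-D array of Gamma observations (scalars are promoted to length 1)
    alpha : known shape parameter
    M     : number of folds
    Returns an array of shape (len(x), M) whose rows sum to x.
    The rate beta is never used, so it may remain unknown.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # One independent Dirichlet(alpha/M, ..., alpha/M) weight vector per observation
    z = rng.dirichlet(np.full(M, alpha / M), size=x.shape[0])
    return x[:, None] * z
```

For example, `thin_gamma(x, alpha=6.0, M=3)` returns three columns that each behave like $Gamma(2, \beta)$ draws, whatever the value of $\beta$.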

Section 6: validating the results of clustering and low-rank matrix approximations

The Data Thinning Proposal

A review of convolution-closed distributions

Let $F_\lambda$ denote a distribution indexed by a parameter $\lambda$ in parameter space $\Lambda$. Let $X' \sim F_{\lambda_1}$ and $X'' \sim F_{\lambda_2}$ with $X' \perp\!\!\!\perp X''$. If $X' + X'' \sim F_{\lambda_1 +\lambda_2}$ whenever $\lambda_1 + \lambda_2 \in \Lambda$, then $F_\lambda$ is convolution-closed in the parameter $\lambda$.
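For concreteness, a few standard families satisfy this definition. These are well-known facts about sums of independent random variables, written in the notation above; the shape/rate convention $Gamma(\alpha, \beta)$ is assumed. Each line reads as: if independent $X'$ and $X''$ have the two distributions on the left, then $X' + X''$ has the distribution on the right.

$$
\begin{aligned}
Poisson(\lambda_1),\ Poisson(\lambda_2) &\;\Longrightarrow\; X' + X'' \sim Poisson(\lambda_1 + \lambda_2), \\
N(\mu_1, \sigma_1^2),\ N(\mu_2, \sigma_2^2) &\;\Longrightarrow\; X' + X'' \sim N(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2), \\
Gamma(\alpha_1, \beta),\ Gamma(\alpha_2, \beta) &\;\Longrightarrow\; X' + X'' \sim Gamma(\alpha_1 + \alpha_2,\ \beta).
\end{aligned}
$$

So the Poisson is convolution-closed in $\lambda$, the Gaussian in $(\mu, \sigma^2)$, and the Gamma in $\alpha$ with $\beta$ held fixed.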

Data thinning

Effect of unknown nuisance parameters

Consider what happens when we perform data thinning on Gaussian data using an incorrect value of the variance.
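A small simulation can make this concrete. The sketch below assumes the Gaussian thinning recipe $X^{(1)} \mid X = x \sim N(\epsilon x, \epsilon(1-\epsilon)\tilde\sigma^2)$, $X^{(2)} = X - X^{(1)}$, where $\tilde\sigma^2$ is the variance value plugged in and $\sigma^2$ is the true one. Writing $X^{(1)} = \epsilon X + W$ with $W \sim N(0, \epsilon(1-\epsilon)\tilde\sigma^2)$ independent of $X$ gives $Cov(X^{(1)}, X^{(2)}) = \epsilon(1-\epsilon)(\sigma^2 - \tilde\sigma^2)$, so the folds are independent only when $\tilde\sigma^2 = \sigma^2$. The function name, default values, and seed are illustrative choices.

```python
import numpy as np


def fold_covariance(assumed_sigma2, true_sigma2=4.0, eps=0.5, n=200_000, seed=0):
    """Thin N(1, true_sigma2) data using assumed_sigma2 and report the
    empirical and theoretical covariance between the two folds."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=1.0, scale=np.sqrt(true_sigma2), size=n)
    # Thinning step uses the ASSUMED variance, which may be wrong
    x1 = rng.normal(loc=eps * x, scale=np.sqrt(eps * (1 - eps) * assumed_sigma2))
    x2 = x - x1
    empirical = np.cov(x1, x2)[0, 1]
    predicted = eps * (1 - eps) * (true_sigma2 - assumed_sigma2)
    return empirical, predicted


# Correct variance (4.0) versus an underestimate (1.0)
for assumed in (4.0, 1.0):
    emp, pred = fold_covariance(assumed)
    print(f"assumed sigma^2 = {assumed}: empirical cov = {emp:.3f}, predicted = {pred:.3f}")
```

Underestimating the variance leaves the folds positively correlated, and overestimating it makes them negatively correlated.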

Multifold data thinning

Comparing data thinning and sample splitting

Theoretical comparison to sample splitting
