WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Data Thinning for Convolution-Closed Distributions

Tags: Data Thinning

This note is for Neufeld, A., Dharamshi, A., Gao, L. L., & Witten, D. (2024). Data Thinning for Convolution-Closed Distributions. Journal of Machine Learning Research, 25(57), 1–35.

sample splitting cannot be applied when there is one parameter of interest per observation, or when the parameter of interest is a function of the $n$ observations, as in the following settings:

  • when estimating a low-rank approximation to a matrix, there is one parameter of interest (a latent variable coordinate) for each of the $n$ rows in the matrix
  • in fixed-covariate regression under model misspecification, the target parameter depends on the specific $n$ observations included in the data set
  • settings in which we wish to draw observation-specific inferences about each of the $n$ observations

an alternative to sample splitting

outside of the following two distributions, no proposals are available to split a random variable into independent parts that follow the same distribution family as the original random variable:

  • split $X \sim N(\mu, \sigma^2)$ with known $\sigma^2$ into two independent Gaussian random variables
  • split $X \sim Poisson(\lambda)$ into two independent Poisson random variables
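The two classical splits can be sketched as follows. This is a minimal sketch assuming the standard constructions, which the note does not spell out: for Poisson counts, binomial thinning $X^{(1)} \mid X \sim Binomial(X, \epsilon)$; for Gaussian data with known $\sigma^2$, $X^{(1)} \mid X \sim N(\epsilon X, \epsilon(1-\epsilon)\sigma^2)$. In both cases $X^{(2)} = X - X^{(1)}$. The helper names `thin_poisson` and `thin_gaussian` and the fraction `eps` are hypothetical, chosen for illustration.

```python
import numpy as np

def thin_poisson(x, eps, rng):
    """Binomial thinning: X1 | X = x ~ Binomial(x, eps), X2 = x - X1."""
    x1 = rng.binomial(x, eps)
    return x1, x - x1

def thin_gaussian(x, eps, sigma2, rng):
    """Gaussian thinning with known variance sigma2:
    X1 | X = x ~ N(eps * x, eps * (1 - eps) * sigma2), X2 = x - X1."""
    x1 = rng.normal(eps * x, np.sqrt(eps * (1 - eps) * sigma2))
    return x1, x - x1

rng = np.random.default_rng(0)
n, eps = 200_000, 0.3

lam = 5.0
xp = rng.poisson(lam, size=n)
p1, p2 = thin_poisson(xp, eps, rng)

mu, sigma2 = 2.0, 4.0
xg = rng.normal(mu, np.sqrt(sigma2), size=n)
g1, g2 = thin_gaussian(xg, eps, sigma2, rng)

# Monte Carlo checks: P1 ~ Poisson(eps * lam), G1 ~ N(eps * mu, eps * sigma2),
# and in each case the two parts are uncorrelated.
print(p1.mean(), p2.mean())       # close to 1.5 and 3.5
print(np.corrcoef(p1, p2)[0, 1])  # close to 0
print(g1.mean(), g1.var())        # close to 0.6 and 1.2
print(np.corrcoef(g1, g2)[0, 1])  # close to 0
```

The printed moments are Monte Carlo estimates, so they match $\epsilon\lambda$, $\epsilon\mu$, $\epsilon\sigma^2$ and zero correlation only approximately; the parts always sum exactly (Poisson) or up to rounding (Gaussian) to the original draw.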

Gamma decomposition into $M$ components via data thinning: suppose that $X\sim Gamma(\alpha, \beta)$, where $\alpha$ is known and $\beta$ is unknown. Take $(X^{(1)},\ldots, X^{(M)}) = XZ$, where $Z\sim Dirichlet(\alpha/M,\ldots, \alpha/M)$ is drawn independently of $X$. Then $X^{(1)},\ldots, X^{(M)}$ are mutually independent, they sum to $X$, and each is marginally drawn from a $Gamma(\alpha/M, \beta)$ distribution.
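A minimal NumPy sketch of the Gamma decomposition, assuming $\beta$ is a rate parameter (NumPy's `Generator.gamma` takes a scale, so the code passes `1 / beta`). The function name `thin_gamma` and the parameter values are hypothetical. Each row of the output sums to the corresponding $X$, and the columns should behave like independent $Gamma(\alpha/M, \beta)$ draws.

```python
import numpy as np

def thin_gamma(x, alpha, M, rng):
    """Multiply each x by an independent Z ~ Dirichlet(alpha/M, ..., alpha/M).

    Returns an array of shape (len(x), M) whose rows sum to x.
    """
    z = rng.dirichlet(np.full(M, alpha / M), size=len(x))
    return x[:, None] * z

rng = np.random.default_rng(1)
alpha, beta, M, n = 3.0, 2.0, 3, 200_000
# beta is treated as a rate; NumPy's gamma sampler takes scale = 1 / rate.
x = rng.gamma(shape=alpha, scale=1 / beta, size=n)
parts = thin_gamma(x, alpha, M, rng)

# Gamma(alpha/M, beta) has mean (alpha/M)/beta = 0.5 and variance (alpha/M)/beta**2 = 0.25 here.
print(parts.mean(axis=0))     # each close to 0.5
print(parts.var(axis=0))      # each close to 0.25
print(np.corrcoef(parts.T))   # off-diagonal entries close to 0
```

Note that the Dirichlet parameters use the shape $\alpha$, which is why the construction requires $\alpha$ to be known, while the unknown $\beta$ never enters the thinning step.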

  • Section 6: validating the results of clustering and low-rank matrix approximations

The Data Thinning Proposal

A review of convolution-closed distributions

Let $F_\lambda$ denote a distribution indexed by a parameter $\lambda$ in parameter space $\Lambda$. Let $X' \sim F_{\lambda_1}$ and $X'' \sim F_{\lambda_2}$ with $X' \perp\!\!\!\perp X''$. If $X' + X'' \sim F_{\lambda_1 + \lambda_2}$ whenever $\lambda_1 + \lambda_2 \in \Lambda$, then $F_\lambda$ is convolution-closed in the parameter $\lambda$.
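As a concrete check of the definition, the Poisson family is convolution-closed in $\lambda$: if $X' \sim Poisson(\lambda_1)$ and $X'' \sim Poisson(\lambda_2)$ are independent, then $X' + X'' \sim Poisson(\lambda_1 + \lambda_2)$. The sketch below verifies this numerically by convolving the two probability mass functions; the helper `poisson_pmf` and the values of $\lambda_1$, $\lambda_2$ are hypothetical, chosen for illustration.

```python
import math
import numpy as np

def poisson_pmf(lam, K):
    """Poisson(lam) probabilities P(X = k) for k = 0, ..., K-1."""
    return np.array([math.exp(-lam) * lam**k / math.factorial(k) for k in range(K)])

lam1, lam2, K = 1.5, 2.5, 30
# The pmf of X' + X'' (independent) is the discrete convolution of the two pmfs.
# Entries 0..K-1 of the convolution are exact even though each pmf is truncated at K,
# because P(X' + X'' = k) only involves P(X' = i) and P(X'' = k - i) with i <= k.
conv = np.convolve(poisson_pmf(lam1, K), poisson_pmf(lam2, K))[:K]
target = poisson_pmf(lam1 + lam2, K)
print(np.max(np.abs(conv - target)))  # on the order of floating-point rounding error
```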


data thinning


effect of unknown nuisance parameters

consider what happens when we perform data thinning on Gaussian data using an incorrect value of the variance
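A short calculation shows what goes wrong, assuming the Gaussian thinning construction $X^{(1)} \mid X \sim N(\epsilon X, \epsilon(1-\epsilon)\tilde\sigma^2)$, $X^{(2)} = X - X^{(1)}$, where $\tilde\sigma^2$ is the assumed variance and $X \sim N(\mu, \sigma^2)$ has true variance $\sigma^2$. Writing $X^{(1)} = \epsilon X + W$ with $W \sim N(0, \epsilon(1-\epsilon)\tilde\sigma^2)$ independent of $X$ gives $\mathrm{Cov}(X^{(1)}, X^{(2)}) = \epsilon(1-\epsilon)\sigma^2 - \mathrm{Var}(W) = \epsilon(1-\epsilon)(\sigma^2 - \tilde\sigma^2)$, and with $\epsilon = 1/2$ the correlation is $(\sigma^2 - \tilde\sigma^2)/(\sigma^2 + \tilde\sigma^2)$. So the two parts are independent only when $\tilde\sigma^2 = \sigma^2$: underestimating the variance makes them positively correlated, and overestimating it makes them negatively correlated. The simulation below (hypothetical helper names, arbitrary parameter values) compares the empirical correlation with this formula.

```python
import numpy as np

def thin_gaussian(x, eps, sigma2_assumed, rng):
    """Gaussian data thinning that uses an assumed (possibly wrong) variance."""
    x1 = rng.normal(eps * x, np.sqrt(eps * (1 - eps) * sigma2_assumed))
    return x1, x - x1

def theoretical_corr(eps, sigma2, sigma2_assumed):
    """Corr(X1, X2) implied by the construction when the true variance is sigma2."""
    cov = eps * (1 - eps) * (sigma2 - sigma2_assumed)
    var1 = eps**2 * sigma2 + eps * (1 - eps) * sigma2_assumed
    var2 = (1 - eps)**2 * sigma2 + eps * (1 - eps) * sigma2_assumed
    return cov / np.sqrt(var1 * var2)

rng = np.random.default_rng(2)
mu, sigma2, eps, n = 0.0, 1.0, 0.5, 200_000
x = rng.normal(mu, np.sqrt(sigma2), size=n)

# Map each assumed variance to (empirical correlation, theoretical correlation).
results = {}
for sigma2_assumed in [0.25, 1.0, 4.0]:
    x1, x2 = thin_gaussian(x, eps, sigma2_assumed, rng)
    empirical = np.corrcoef(x1, x2)[0, 1]
    results[sigma2_assumed] = (empirical, theoretical_corr(eps, sigma2, sigma2_assumed))
    print(sigma2_assumed, results[sigma2_assumed])
```

With $\sigma^2 = 1$ and $\epsilon = 1/2$, the formula gives correlations of $0.6$, $0$ and $-0.6$ for assumed variances $0.25$, $1$ and $4$.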

Multifold data thinning
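For Poisson data, one way to thin into $M > 2$ folds is multinomial thinning, $(X^{(1)}, \ldots, X^{(M)}) \mid X \sim Multinomial(X; \epsilon_1, \ldots, \epsilon_M)$ with $\sum_m \epsilon_m = 1$, which yields mutually independent $X^{(m)} \sim Poisson(\epsilon_m \lambda)$ that sum to $X$. The sketch below is an illustration under that assumption, not necessarily the construction used in the paper; it draws the multinomial as a chain of binomials, and the name `thin_poisson_multifold` and the fold fractions are hypothetical.

```python
import numpy as np

def thin_poisson_multifold(x, eps, rng):
    """Split Poisson counts x into len(eps) folds with Multinomial(x, eps) thinning.

    The multinomial is sampled as a chain of binomials: fold m receives a
    Binomial(remaining, eps[m] / mass_not_yet_allocated) share of what is left,
    and the last fold takes the remainder.
    """
    remaining = np.asarray(x).copy()
    mass_left = 1.0
    folds = []
    for e in eps[:-1]:
        draw = rng.binomial(remaining, min(e / mass_left, 1.0))
        folds.append(draw)
        remaining = remaining - draw
        mass_left -= e
    folds.append(remaining)
    return np.stack(folds, axis=1)  # shape (len(x), len(eps)); rows sum to x

rng = np.random.default_rng(3)
lam, n = 6.0, 200_000
eps = [0.2, 0.3, 0.5]
x = rng.poisson(lam, size=n)
folds = thin_poisson_multifold(x, eps, rng)

print(folds.mean(axis=0))    # each close to eps[m] * lam, i.e. 1.2, 1.8, 3.0
print(folds.var(axis=0))     # Poisson folds: variances close to the means
print(np.corrcoef(folds.T))  # off-diagonal entries close to 0
```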

Comparing data thinning and sample splitting

Theoretical comparison to sample splitting

