# Data Thinning for Convolution-Closed Distributions

Sample splitting cannot be applied when **there is one parameter of interest per observation, or the parameter of interest is a function of the $n$ observations**. For example:

- when estimating a low-rank approximation to a matrix, there is one parameter of interest (a latent variable coordinate) for each of the $n$ rows in the matrix
- in fixed-covariate regression under model misspecification, the target parameter depends on the specific $n$ observations included in the data set
- settings in which we wish to draw observation-specific inferences about each of the $n$ observations

An alternative to sample splitting is to split each observation itself into independent parts.

Outside of the following two distributions, no existing proposals split a random variable into independent parts that belong to the same family of distributions as the original random variable:

- split $X \sim N(\mu, \sigma^2)$ with known $\sigma^2$ into two independent Gaussian random variables
- split $X \sim Poisson(\lambda)$ into two independent Poisson random variables
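The notes above do not spell out how these two splits are carried out. A minimal sketch, assuming the standard constructions (conditional Gaussian draw and binomial thinning), might look as follows. The function names and the split fraction `eps` are hypothetical and do not appear in the text above.

```python
import numpy as np

def thin_gaussian(x, sigma2, eps=0.5, rng=None):
    """Split x ~ N(mu, sigma2), with sigma2 known, into two parts.

    Assumed construction: draw X1 | X ~ N(eps * X, eps * (1 - eps) * sigma2)
    and set X2 = X - X1. Then X1 ~ N(eps * mu, eps * sigma2) and
    X2 ~ N((1 - eps) * mu, (1 - eps) * sigma2), independently.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    x1 = rng.normal(loc=eps * x, scale=np.sqrt(eps * (1 - eps) * sigma2))
    return x1, x - x1

def thin_poisson(x, eps=0.5, rng=None):
    """Split x ~ Poisson(lam) into two parts by binomial thinning.

    Assumed construction: X1 | X ~ Binomial(X, eps) and X2 = X - X1.
    Then X1 ~ Poisson(eps * lam) and X2 ~ Poisson((1 - eps) * lam),
    independently.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x)
    x1 = rng.binomial(x, eps)
    return x1, x - x1
```

In both cases the two parts sum back to the original observation, so no information is discarded. The Gaussian split needs the variance as an input, while the Poisson split needs no parameter at all.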

Gamma decomposition into $M$ components by data thinning: suppose that $X\sim Gamma(\alpha, \beta)$, where $\beta$ is unknown. Take $(X^{(1)},\ldots, X^{(M)}) = XZ$, where $Z\sim Dirichlet(\alpha/M,\ldots, \alpha/M)$. Then $X^{(1)},\ldots, X^{(M)}$ are mutually independent, they sum to $X$, and each is marginally drawn from a $Gamma(\alpha/M, \beta)$ distribution.
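The Gamma decomposition can be sketched in a few lines of NumPy. This is an illustrative sketch, not a reference implementation: the function name `thin_gamma` is hypothetical, and the shape $\alpha$ is assumed known. Note that $\beta$ never enters the computation.

```python
import numpy as np

def thin_gamma(x, alpha, M, rng=None):
    """Split each x ~ Gamma(alpha, beta) into M parts that sum to x.

    Draws Z ~ Dirichlet(alpha/M, ..., alpha/M) independently for each
    observation and returns x * Z as an array of shape (n, M).
    Only the shape alpha is needed; beta is not used.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float).ravel()
    z = rng.dirichlet(np.full(M, alpha / M), size=x.size)
    return x[:, None] * z
```

Each row of the output holds the $M$ parts of one observation, so a row sum recovers the original value.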

- Section 6: validating the results of clustering and low-rank matrix approximations

## The Data Thinning Proposal

### A review of convolution-closed distributions

Let $F_\lambda$ denote a distribution indexed by a parameter $\lambda$ in parameter space $\Lambda$. Let $X' \sim F_{\lambda_1}$ and $X'' \sim F_{\lambda_2}$ with $X' \perp X''$. If $X' + X'' \sim F_{\lambda_1 + \lambda_2}$ whenever $\lambda_1 + \lambda_2 \in \Lambda$, then $F_\lambda$ is convolution-closed in the parameter $\lambda$.
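Three familiar families satisfy this definition and may help fix ideas. In each line the two summands are assumed independent, and for the Gamma family the second parameter $\beta$ is held common to both:

$$
\begin{aligned}
Poisson(\lambda_1) + Poisson(\lambda_2) &\sim Poisson(\lambda_1 + \lambda_2), \\
N(\mu_1, \sigma_1^2) + N(\mu_2, \sigma_2^2) &\sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2), \\
Gamma(\alpha_1, \beta) + Gamma(\alpha_2, \beta) &\sim Gamma(\alpha_1 + \alpha_2, \beta).
\end{aligned}
$$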

### Data thinning

### Effect of unknown nuisance parameters

Consider what happens when we perform data thinning on Gaussian data using an incorrect value of the variance.
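A small simulation can make the question concrete. The sketch below assumes the conditional-Gaussian split $X^{(1)} \mid X \sim N(\epsilon X, \epsilon(1-\epsilon)\tilde\sigma^2)$ and $X^{(2)} = X - X^{(1)}$, where $\tilde\sigma^2$ is the variance plugged in by the analyst. The symbols $\epsilon$ and $\tilde\sigma^2$ and the function name are hypothetical. Under this construction a direct calculation gives $Cov(X^{(1)}, X^{(2)}) = \epsilon(1-\epsilon)(\sigma^2 - \tilde\sigma^2)$, so the parts are uncorrelated only when $\tilde\sigma^2 = \sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu, sigma2, eps = 200_000, 1.0, 4.0, 0.5  # true variance is sigma2

def thinned_covariance(sigma2_used):
    """Empirical covariance of the two parts when thinning with sigma2_used."""
    x = rng.normal(mu, np.sqrt(sigma2), size=n)
    # Thinning step, using the (possibly incorrect) variance sigma2_used.
    x1 = rng.normal(eps * x, np.sqrt(eps * (1 - eps) * sigma2_used))
    x2 = x - x1
    return np.cov(x1, x2)[0, 1]

for s2 in (4.0, 1.0, 9.0):
    theory = eps * (1 - eps) * (sigma2 - s2)
    print(f"sigma2_used={s2}: empirical={thinned_covariance(s2):.3f}, theory={theory:.3f}")
```

Under these assumptions, plugging in too small a variance leaves the two parts positively correlated, and too large a variance leaves them negatively correlated.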