Data Thinning for Convolution-Closed Distributions
sample splitting cannot be applied when there is one parameter of interest per observation, or when the parameter of interest is a function of the observations:
- when estimating a low-rank approximation to a matrix, there is one parameter of interest (a latent variable coordinate) for each row of the matrix
- in fixed-covariate regression under model misspecification, the target parameter depends on the specific observations included in the data set
- settings in which we wish to draw observation-specific inferences about each of the observations
an alternative to sample splitting
outside of the following two distributions, no proposals are available to split a random variable into independent parts that follow the same distribution as the original random variable.
- split $X \sim N(\mu, \sigma^2)$ with known $\sigma^2$ into two independent Gaussian random variables
- split $X \sim \mathrm{Poisson}(\lambda)$ into two independent Poisson random variables
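The two known splits can be sketched in a short simulation. This is a minimal illustration, not the authors' code: the split fraction `eps`, the parameter values, and the seed are all assumptions chosen for the example, and the Gaussian step uses one standard construction (scale the observation, then add independent noise of matching variance).

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n, eps = 100_000, 0.5           # eps: share of the information given to part 1 (assumed)

# Poisson: X ~ Poisson(lam). Draw X1 | X ~ Binomial(X, eps) and set X2 = X - X1.
# Then X1 ~ Poisson(eps * lam) and X2 ~ Poisson((1 - eps) * lam), independently.
lam = 4.0
x_pois = rng.poisson(lam, size=n)
x1_pois = rng.binomial(x_pois, eps)
x2_pois = x_pois - x1_pois

# Gaussian: X ~ N(mu, sigma2) with sigma2 known.
# X1 = eps * X + W with W ~ N(0, eps * (1 - eps) * sigma2), and X2 = X - X1.
# Then X1 ~ N(eps * mu, eps * sigma2) and X2 ~ N((1 - eps) * mu, (1 - eps) * sigma2).
mu, sigma2 = 1.0, 2.0
x_gauss = rng.normal(mu, np.sqrt(sigma2), size=n)
noise = rng.normal(0.0, np.sqrt(eps * (1 - eps) * sigma2), size=n)
x1_gauss = eps * x_gauss + noise
x2_gauss = x_gauss - x1_gauss

# Empirical correlations between the two parts; both should be close to zero.
corr_pois = np.corrcoef(x1_pois, x2_pois)[0, 1]
corr_gauss = np.corrcoef(x1_gauss, x2_gauss)[0, 1]
print(corr_pois, corr_gauss)
```

In both cases the parts sum back to the original observation, and only the Gaussian split needs a parameter value (the variance) to be known in advance.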
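Gamma decomposition into $M$ components, data thinning: suppose that $X \sim \mathrm{Gamma}(\alpha, \beta)$, where $\alpha$ is known and $\beta$ is unknown. Take $(X^{(1)}, \ldots, X^{(M)}) = XZ$, where $Z \sim \mathrm{Dirichlet}(\alpha/M, \ldots, \alpha/M)$. Then $X^{(1)}, \ldots, X^{(M)}$ are mutually independent, they sum to $X$, and each is marginally drawn from a $\mathrm{Gamma}(\alpha/M, \beta)$ distribution.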
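A quick simulation of the Gamma example, as a sketch under assumed values: the shape `alpha`, rate `beta`, number of folds `M`, and seed below are illustrative choices only. The rate is fixed here just to generate data; the thinning step itself uses only the shape.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
n, M = 100_000, 3               # number of simulated observations and of folds (assumed)
alpha, beta = 6.0, 2.0          # shape (treated as known) and rate (used only to simulate X)

# Simulate X ~ Gamma(shape=alpha, rate=beta); NumPy parameterizes by scale = 1 / rate.
x = rng.gamma(shape=alpha, scale=1.0 / beta, size=n)

# Thinning step: Z ~ Dirichlet(alpha/M, ..., alpha/M), independent of X.
# Note that beta does not appear here.
z = rng.dirichlet(np.full(M, alpha / M), size=n)  # shape (n, M)
parts = x[:, None] * z                            # column m holds X^(m)

# Each column should look like Gamma(alpha / M, beta):
# mean (alpha / M) / beta and variance (alpha / M) / beta**2.
print(parts.mean(axis=0), parts.var(axis=0))

# Off-diagonal correlations between folds should be close to zero.
corr = np.corrcoef(parts, rowvar=False)
print(corr.round(3))
```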
- Section 6: validating the results of clustering and low-rank matrix approximations
The Data Thinning Proposal
A review of convolution-closed distributions
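Let $F_\lambda$ denote a distribution indexed by a parameter $\lambda$ in parameter space $\Lambda$. Let $X' \sim F_{\lambda_1}$ and $X'' \sim F_{\lambda_2}$ with $X' \perp X''$. If $X' + X'' \sim F_{\lambda_1 + \lambda_2}$ whenever $\lambda_1 + \lambda_2 \in \Lambda$, then $F_\lambda$ is convolution-closed in the parameter $\lambda$.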
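A few standard instances of convolution closure, written out for independent $X'$ and $X''$. These are well-known facts about the three families rather than something taken from the notes above; the Gamma line holds the rate $\beta$ fixed, so closure is in the shape parameter only.

```latex
\begin{align*}
X' \sim \mathrm{Poisson}(\lambda_1),\ X'' \sim \mathrm{Poisson}(\lambda_2)
  &\implies X' + X'' \sim \mathrm{Poisson}(\lambda_1 + \lambda_2), \\
X' \sim N(\mu_1, \sigma_1^2),\ X'' \sim N(\mu_2, \sigma_2^2)
  &\implies X' + X'' \sim N(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2), \\
X' \sim \mathrm{Gamma}(\alpha_1, \beta),\ X'' \sim \mathrm{Gamma}(\alpha_2, \beta)
  &\implies X' + X'' \sim \mathrm{Gamma}(\alpha_1 + \alpha_2,\ \beta).
\end{align*}
```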
data thinning
effect of unknown nuisance parameters
consider what happens when we perform data thinning on Gaussian data using an incorrect value of the variance
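A small simulation makes the effect concrete. This is a sketch under assumed values: `sigma2_true`, `sigma2_used`, `eps`, and the seed are illustrative, and the thinning step is the scale-plus-noise construction $X^{(1)} = \epsilon X + W$, $X^{(2)} = X - X^{(1)}$ with $W \sim N(0, \epsilon(1-\epsilon)\tilde\sigma^2)$, where $\tilde\sigma^2$ is the variance plugged in. Under that construction the two parts have covariance $\epsilon(1-\epsilon)(\sigma^2 - \tilde\sigma^2)$, which is zero only when the plugged-in variance is correct.

```python
import numpy as np

rng = np.random.default_rng(2)   # arbitrary seed
n, eps = 200_000, 0.5            # sample size and split fraction (assumed)
mu, sigma2_true = 0.0, 4.0       # true mean and variance of the data (assumed)
sigma2_used = 1.0                # incorrect variance plugged into the thinning step

x = rng.normal(mu, np.sqrt(sigma2_true), size=n)

# Thinning with the wrong variance: the added noise is too small here.
noise = rng.normal(0.0, np.sqrt(eps * (1 - eps) * sigma2_used), size=n)
x1 = eps * x + noise
x2 = x - x1

# Empirical covariance between the parts versus the closed-form value.
emp_cov = np.cov(x1, x2)[0, 1]
theory_cov = eps * (1 - eps) * (sigma2_true - sigma2_used)
print(emp_cov, theory_cov)
```

Setting `sigma2_used` equal to `sigma2_true` drives both quantities to roughly zero, recovering independent parts; underestimating the variance leaves the parts positively correlated, and overestimating it makes them negatively correlated.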