Causal Inference on Distribution Functions
Wasserstein Space
Let
- $\cI$ be an interval of $\IR$,
- $V_1$ and $V_2$ be random variables taking values in $\cI$ with finite second moments,
- $\lambda_1, \lambda_2$ be the cumulative distribution functions of $V_1$ and $V_2$, respectively.
The 2-Wasserstein distance between $\lambda_1$ and $\lambda_2$ is defined as
\[W_2(\lambda_1, \lambda_2) = \left(\inf_{\lambda_{12}\in \Lambda(\lambda_1,\lambda_2)}\int_{\cI\times \cI}(s-t)^2d\lambda_{12}(s, t)\right)^{1/2}\,,\]
where $\Lambda(\lambda_1,\lambda_2)$ is the set of joint distributions on $\cI\times\cI$ with marginals $\lambda_1$ and $\lambda_2$. The Wasserstein distance corresponds to the minimum effort required to transport the mass of $\lambda_1$ so as to produce the mass distribution of $\lambda_2$.
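
On the real line the infimum has a well-known closed form in terms of quantile functions, $W_2^2(\lambda_1, \lambda_2) = \int_0^1 (\lambda_1^{-1}(u) - \lambda_2^{-1}(u))^2\,du$. For two empirical distributions with the same number of equally weighted atoms, this reduces to matching the sorted samples. The sketch below illustrates that special case only; the function name `wasserstein2_empirical` and the toy data are assumptions made for illustration, not something taken from the paper.

```python
import math

def wasserstein2_empirical(x, y):
    """W_2 between the empirical distributions of two equal-size samples.

    Assumes equal sample sizes and equal weights, so the optimal
    coupling simply pairs the i-th smallest x with the i-th smallest y.
    """
    if len(x) != len(y):
        raise ValueError("this sketch only handles equal sample sizes")
    xs, ys = sorted(x), sorted(y)
    mean_sq = sum((s - t) ** 2 for s, t in zip(xs, ys)) / len(xs)
    return math.sqrt(mean_sq)

# Toy data (hypothetical): y is x shifted by a constant c = 2
x = [0.3, -1.2, 2.5, 0.9]
y = [v + 2.0 for v in x]

# A pure shift by c gives W_2 = |c|, up to floating-point rounding
print(wasserstein2_empirical(x, y))
```

Unequal sample sizes or weights would require evaluating both quantile functions on a common grid instead of a direct sorted pairing.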
Causal inference on distribution functions
Since both $Y_i(1)$ and $Y_i(0)$ take values in the Wasserstein space, we define their means using their Wasserstein barycentres:
\[\mu_a = \bbE Y(a) = \argmin_{v\in\cW_2(\cI)}\bbE[W_2^2(Y(a), v)]\]
\[\newcommand\oE{\mathrm{E}\!\!\circ}\]
Ideally, a causal effect definition in the Wasserstein space should satisfy the following desiderata:
- (a) when $\oE Y(1) = \oE Y(0)$, the causal effect equals zero
- (b) in the degenerate case where $Y_i(a) = \delta_{y_i(a)}$, the definition reduces to the classical scenario
- (c) the average causal effect is a contrast between the averages of potential outcomes in two hypothetical populations
- (d) the average causal effect equals the average of individual causal effects
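
The barycentres $\mu_a$ that enter desiderata (a) and (c) are easy to compute on the real line: the quantile function of the barycentre is the pointwise average of the individual quantile functions, $\mu_a^{-1}(u) = \bbE[Y(a)^{-1}(u)]$. A minimal sketch for a finite collection of empirical distributions, each with the same number of equally weighted atoms, is given below. The helper name `barycentre_support` and the toy data are hypothetical.

```python
def barycentre_support(samples):
    """Atoms of the 2-Wasserstein barycentre of equal-size empirical distributions.

    Each element of `samples` is a list of observations from one distribution.
    Assumes every list has the same length and equal weights, so averaging
    the sorted samples position by position averages the quantile functions.
    """
    sorted_samples = [sorted(s) for s in samples]
    n = len(sorted_samples[0])
    if any(len(s) != n for s in sorted_samples):
        raise ValueError("this sketch only handles equal sample sizes")
    k = len(sorted_samples)
    return [sum(column) / k for column in zip(*sorted_samples)]

# Two toy distributions (hypothetical data)
dists = [[0.0, 1.0, 2.0], [4.0, 2.0, 3.0]]
print(barycentre_support(dists))  # [1.0, 2.0, 3.0]
```

Note that the barycentre averages quantiles, not densities: the result here is a distribution located halfway between the two inputs rather than a mixture of them.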
A causal effect defined in this way satisfies desiderata (a)-(c) but, in general, fails to satisfy desideratum (d).
The paper introduces a novel definition of average causal effect, called the causal effect map.
Let $\lambda$ be a continuous distribution function. The individual causal effect map of $A$ on $Y$ is defined as
\[\Delta_i^\lambda (\cdot) = Y_i(1)^{-1} \circ \lambda(\cdot) -Y_i(0)^{-1}\circ \lambda(\cdot)\,,\]
where $\lambda$ is a reference distribution.
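
To make the definition concrete, here is a small sketch that evaluates the map for one unit whose potential outcomes are Gaussian distributions, with a standard normal reference $\lambda$. The Gaussian choice, the parameter values, and the function name `causal_effect_map` are assumptions made purely for illustration. In this case $Y_i(a)^{-1}\circ\lambda(t) = m_a + s_a t$, so the map is $(m_1 - m_0) + (s_1 - s_0)t$, which the code reproduces numerically.

```python
from statistics import NormalDist

def causal_effect_map(y1, y0, ref):
    """Return t -> Y(1)^{-1}(lambda(t)) - Y(0)^{-1}(lambda(t)).

    y1, y0: potential-outcome distributions (here NormalDist objects).
    ref:    continuous reference distribution lambda.
    """
    def delta(t):
        u = ref.cdf(t)  # quantile level of t under the reference
        # inv_cdf requires 0 < u < 1, so extreme t values are not supported
        return y1.inv_cdf(u) - y0.inv_cdf(u)
    return delta

# Hypothetical potential outcomes for a single unit
y1 = NormalDist(mu=3.0, sigma=2.0)
y0 = NormalDist(mu=1.0, sigma=1.0)
ref = NormalDist()  # standard normal reference

delta = causal_effect_map(y1, y0, ref)
for t in (-1.0, 0.0, 1.0):
    # Analytically 2 + t here, i.e. approximately 1, 2 and 3
    print(t, delta(t))
```

When the two standard deviations coincide, the map is the constant $m_1 - m_0$. Likewise, for point masses $Y_i(a) = \delta_{y_i(a)}$ the quantile functions are constant and the map equals $y_i(1) - y_i(0)$, the classical individual effect.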