
Identifying Effects of Multiple Treatments under Unmeasured Confounding

Tags: Instrumental Variable, Unmeasured Confounding, Multiple Treatments

This note is for Miao, W., Hu, W., Ogburn, E. L., & Zhou, X.-H. (2023). Identifying Effects of Multiple Treatments in the Presence of Unmeasured Confounding. Journal of the American Statistical Association, 118(543), 1953–1967.

  • identification of treatment effects in the presence of unmeasured confounding
  • the problem of unmeasured confounding with multiple treatments is most common in statistical genetics and bioinformatics, where researchers have developed many successful statistical strategies without engaging deeply with the causal aspects of the problem
  • there have been a number of attempts to bridge the gap between these statistical approaches and causal inference, but they are either flawed or rely on fully parametric assumptions

the paper proposes two strategies for identifying and estimating causal effects of multiple treatments in the presence of unmeasured confounding

  • the auxiliary variables approach leverages variables that are not causally associated with the outcome;
    • in the case of a univariate confounder, the method only requires one auxiliary variable, unlike existing instrumental variable methods that would require as many instruments as there are treatments
  • an alternative null treatment approach relies on the assumption that at least half of the confounded treatments have no causal effect on the outcome, but does not require a priori knowledge of which treatments are null

the identification strategies do not impose parametric assumptions on the outcome model and do not rest on estimation of the confounder

the article extends and generalizes existing work on unmeasured confounding with a single treatment and models commonly used in bioinformatics.

Introduction

  • identification means that the treatment effect of interest is uniquely determined from the joint distribution of observed variables
  • most of the work on unmeasured confounding by causal inference researchers focuses on settings with a single treatment, and either harnesses auxiliary variables to achieve point identification of causal effects, or relies on sensitivity analyses or on weak assumptions to derive bounds for the effects of interest
  • a large body of work from statistical genetics and computational biology is concerned with multiple treatments
    • e.g., GWAS with confounding by population structure and computational biology applications with confounding by batch effects

for a single treatment, a variety of methods have been developed to test, adjust for, and eliminate unmeasured confounding bias

  • sensitivity analysis and bounding are used to evaluate the robustness of causal inference to unmeasured confounding

for point identification of the treatment effect, the instrumental variable (IV) is an influential tool.

potential of using confounder proxy variables and negative controls for adjustment of confounding bias

similar methods can sometimes be used in settings with multiple treatments, simply by treating them as a single vector-valued treatment. these approaches allow for unrestricted correlations among the treatments

  • however, in typical GWAS and computational biology settings, correlations among treatments contain useful information about the confounding, which these methods cannot leverage

latent variable methods leveraging the multi-treatment correlation structure have been used to estimate and control for unmeasured confounders in biological applications since the early 2000s

recently, a few authors have attempted to elucidate the causal structure underlying these statistical procedures and to establish rigorous theoretical guarantees for identification, using fully parametric models.

  • Wang et al. (2017): propose confounding adjustment approaches for the effects of a treatment on multiple outcomes under a linear factor model; by reversing the labeling of the outcome and treatments, their approaches can test but not identify the effects of multiple treatments on the outcome.
  • Kong, Yang, and Wang (2022): consider a binary outcome with a univariate confounder and prove identification under a linear factor model for multiple treatments and a parametric outcome model via analysis of the link distribution, but their approach cannot generalize to the multivariate confounder setting
  • linear outcome models with high-dimensional treatments that are confounded or mismeasured
    • identification is implied by the fact that confounding on each treatment vanishes as the number of treatments goes to infinity
    • in contrast, the paper takes a fundamentally causal approach to confounding and to identification of treatment effects by allowing the outcome model to be unrestricted, the treatment-confounder distribution to lie in a more general, though not unrestricted, class of models, the number of treatments to be finite, and confounding to not vanish.
  • Wang and Blei (2019): an intuitive justification for using latent variable methods in general multi-treatment unmeasured confounding settings; they call the justification and resulting method the deconfounder
    • their approach uses a factor model assuming that treatments are independent conditional on the confounder to estimate the confounder, and the confounder estimate is used for adjustment of bias
      • however, as demonstrated in a counterexample, identification is not guaranteed for the deconfounder, that is, the treatment effects cannot be uniquely determined from the observed data even with an infinite number of data samples
      • additionally, an infinite number of treatments are required for consistent estimation of the confounder, complicating finite sample inference and undermining positivity
  • refinements and discussions of the deconfounder approach:
      1. proximal inference
      2. conventional instrumental variable approach to facilitate identification
    • However, if correlations among the multiple treatments are indicative of confounding, as the deconfounder approach assumes, neither of these methods makes use of that correlation.
    • moreover, their extension to the multi-treatment setting is complicated by the fact that proximal inference requires confounder proxies to be causally uncorrelated with any of the treatments and the instrumental variable approach requires at least as many instrumental variables as there are treatments

Contribution

  • Section 2: review the challenges for identifying multi-treatment effects in the presence of unmeasured confounding.
  • Sections 3 & 4: propose two novel approaches for the identification of causal effects of multiple treatments with unmeasured confounding: an auxiliary variables approach and a null treatments approach
    • both approaches rely on two assumptions restricting the joint distribution of the unmeasured confounder and treatments
    • equivalence assumption: the joint treatment-confounder distribution lies in a class of models that satisfy a particular equivalence property, which is known to hold for many commonly used models, e.g., many types of factor and mixture models
    • completeness assumption: the treatment-confounder distribution satisfies a completeness condition that is standard in nonparametric identification problems
    • in addition to these two assumptions, the auxiliary variables approach leverages an auxiliary variable that does not directly affect the outcome to identify treatment effects, such as an IV or confounder proxy
      • in the presence of a univariate confounder, identification can be achieved with the proposed approach even if only one auxiliary variable is available and if it is associated with only one confounded treatment
      • in contrast, IV approaches require as many instrumental variables as there are treatments and require that all confounded treatments be associated with the instrumental variables
      • the null treatment approach does not require any auxiliary variables, but instead rests on the assumption that at least half of the confounded treatments are null, without requiring knowledge of which are active and which are null
    • in the absence of auxiliary variables, and even if the null treatments assumption fails to hold, the method still constitutes a valid test of the null hypothesis of no joint treatment effect
  • Section 5: some estimation strategies
  • Section 6: simulations, the proposed approaches perform well with little bias and appropriate coverage rates
  • Section 7: a data example about mouse obesity, apply the approaches to detect genes possibly causing mouse obesity, which reinforces previous findings by taking unmeasured confounding into account.
  • Section 8: conclusion with brief mention of some potential extension of the approaches

Preliminaries and Challenges to Identification

  • $X = (X_1,\ldots, X_p)^T$: a vector of $p$ treatments
  • $Y$: outcome
  • we are interested in the effects of $X$ on $Y$, which may be confounded by a vector of $q$ unobserved covariates $U$
  • $q$: the dimension of the confounder, is assumed to be known a priori
  • $f(x, u)$: the treatment-confounder distribution
  • $f(y\mid u, x)$: the outcome model
  • $Y(x)$: the potential outcome that would have been observed had the treatment $X$ been set to $x$
  • treatment effects are defined by contrasts of potential outcomes between different treatment conditions, and thus, we focus on identification of $f(Y(x))$
  • we say that $f(Y(x))$ is identified if and only if it is uniquely determined by the joint distribution of observed variables

three standard identifying assumptions:

Assumption 1:

  • (i) Consistency: when $X = x$, $Y = Y(x)$
  • (ii) Ignorability: $Y(x)\ind X\mid U$
  • (iii) Positivity: $0 < f(X=x\mid U=u) < 1$ for all $(x, u)$

  • Ignorability, also called “exchangeability”, ensures that treatment assignments are effectively randomized conditional on $U$ and implies that $U$ suffices to control for all confounding.
  • Positivity, also called “overlap”, ensures that for all values of $U$ all treatment values have positive probability

if one were able to observe the confounder $U$, Assumption 1 would permit fully nonparametric identification of $f\{Y(x)\}$ by the back-door formula or the g-formula

\[f(Y(x) = y) = \int_u f(y\mid u, x)f(u)du\]
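
As a quick toy illustration of the back-door formula (my own sketch, not from the paper; the model and numbers below are made up), the adjusted mean $E\{Y(x)\}$ is obtained by averaging the outcome model over the marginal distribution of $U$, while a naive regression of $Y$ on $X$ absorbs the confounding:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical data-generating process, for illustration only:
# U ~ Bernoulli(0.4), X = U + N(0, 1), Y = 2*X + 3*U + N(0, 1)
n = 200_000
U = rng.binomial(1, 0.4, n)
X = U + rng.normal(size=n)
Y = 2 * X + 3 * U + rng.normal(size=n)

# if U were observed, the g-formula averages E(Y | U = u, X = x) over f(u);
# here the outcome model is known, in practice it would be estimated
def g_formula_mean(x):
    return sum((2 * x + 3 * u) * p for u, p in [(0, 0.6), (1, 0.4)])

print(g_formula_mean(1.0) - g_formula_mean(0.0))  # causal effect: 2.0
# the naive regression slope is confounded: about 2 + 3*Cov(U, X)/Var(X) > 2
print(np.polyfit(X, Y, 1)[0])
```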

but when $U$ is not observed, all information contained in the observed data is captured by $f(y, x)$, from which one cannot uniquely determine the joint distribution $f(y, x, u)$. One has to solve for $f(x, u)$ and $f(y\mid u, x)$ from

\[f(x) = \int_u f(x, u)du\\ f(y\mid x) = \int_u f(y\mid u, x)f(u\mid x)du\]
  • $f(x, u)$ cannot be uniquely determined, even if a factor model is imposed on $f(x, u)$
  • even if $f(x, u)$ is known, the outcome model $f(y\mid u, x)$ cannot be identified
  • the lack of identification of $f(y\mid u, x)$ is due to the unknown copula of $f(y\mid x)$ and $f(u\mid x)$

one cannot identify the true joint distribution $f(y, x, u)$ that is essential for the g-formula
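
A familiar single-treatment Gaussian example (recalled here for intuition; it is not spelled out in the note above) makes the non-identification concrete. Suppose

\[U\sim N(0, 1), \quad X = \alpha U + \varepsilon, \quad Y = \beta X + \delta U + e,\]

with mutually independent normal errors. The observed distribution of $(X, Y)$ is bivariate normal and is determined by only three quantities, $\mathrm{var}(X) = \alpha^2 + \sigma_\varepsilon^2$, $\mathrm{cov}(X, Y) = \beta\,\mathrm{var}(X) + \delta\alpha$, and $\mathrm{var}(Y)$, whereas the model has five free parameters; different choices of $(\alpha, \beta, \delta)$ reproduce the same observed moments, so the treatment effect $\beta$ is not identified without further assumptions.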

we call a joint distribution $\tilde f(y, x, u)$ admissible if it conforms to the observed data distribution $f(y, x)$, that is,

\[f(y, x) = \int_u \tilde f(y, x, u)du\]
  • different admissible joint distributions result in different potential outcome distributions, that is, the potential outcome distribution is not identified without additional assumptions
  • some previous approaches estimated $U$ directly with a deterministic function of $X$, but this controverts the positivity assumption and requires an infinite number of treatments in order to consistently estimate $U$
  • in these settings, the effect of $X$ on $Y$ is asymptotically unconfounded and a naive regression of $Y$ on $X$ weakly dominates these more involved approaches

Identification with Auxiliary Variables

The Auxiliary Variables Assumption

suppose we have available a vector of auxiliary variables, $Z$, then the observed data distribution is captured by $f(x, y, z)$, from which one aims to identify the potential outcome distribution $f\{Y(x)\}$.

  • $f(x, u\mid z; \alpha)$: a model for the treatment-confounder distribution indexed by a possibly infinite-dimensional parameter $\alpha$
  • $f(x\mid z;\alpha)$: the resulting marginal distribution

given $f(x\mid z;\alpha)$, let $f(x, u\mid z;\tilde \alpha)$ denote an arbitrary admissible joint distribution such that

\[f(x\mid z;\alpha) = \int_u f(x, u\mid z;\tilde \alpha)du\]

write $\tilde f(x, u\mid z) = f(x, u\mid z;\tilde \alpha)$

Assumption 2:

  • (i) Exclusion restriction
  • (ii) Equivalence
  • (iii) Completeness

the exclusion restriction rules out the existence of a direct causal association between the auxiliary variable and the outcome


equivalence is a high-level assumption stating that the treatment-confounder distribution lies in a model that is identified upon a one-to-one transformation of $U$

  • it restricts the class of treatment-confounder distributions; e.g., it is not met if the dimension of confounders exceeds that of the treatments
  • it admits a large class of models. In particular, it allows for any factor model or mixture model that is identified, where identification in the context of these models does not imply point identification but rather identification up to a rotation (factor models) or up to label switching (mixture models); see the small numerical sketch after this list
  • such model assumptions are often used in bioinformatics applications where the unmeasured confounder represents population structure (GWAS) or lab batch effects
  • identification results for factor and mixture models have been very well established
  • a major limitation of factor models is that they are in general not identified when there are single-treatment confounders or when there are causal relationships among the treatments
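
To make the "identified up to a rotation" point concrete, here is a tiny numpy check (my own sketch with hypothetical loadings, not code from the paper): rotating the factor loadings by any orthogonal matrix reproduces exactly the same treatment covariance, so the data cannot distinguish $\alpha$ from $\alpha R$.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical loadings for p = 6 treatments and q = 2 confounders
alpha = rng.normal(size=(6, 2))
Sigma_eps = np.diag(rng.uniform(0.5, 1.5, size=6))

# treatment covariance implied by the factor model X = alpha U + eps
Sigma_X = alpha @ alpha.T + Sigma_eps

# any orthogonal rotation R of the loadings gives the same covariance
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
alpha_tilde = alpha @ R
print(np.allclose(alpha_tilde @ alpha_tilde.T + Sigma_eps, Sigma_X))  # True
```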

completeness is a fundamental concept in statistics, and primitive conditions are readily available in the literature, including the fact that it holds for very general exponential families of distributions and for many regression models

  • the role of completeness in this article is analogous to its wide use in a variety of nonparametric and semiparametric identification problems, e.g., in IV regression, IV quantile regression, measurement error problem, missing data, and proximal inference
  • the completeness assumption means that, conditional on $X$, any variability in $U$ is captured by variability in $Z$, analogous to the relevance condition in the instrumental variable identification
  • for the binary confounder case, completeness holds if $U$ and $Z$ are correlated within each level of $X$
  • when both $U$ and $Z$ have $k$ levels, completeness means that the matrix $[f(u_i\mid x, z_j)]_{k\times k}$ of conditional probabilities is invertible.
  • this is stronger than dependence of $Z$ and $U$ given $X$: roughly speaking, dependence says that variability in $U$ is accompanied by variability in $Z$, while completeness requires that any infinitesimal variability in $U$ is accompanied by variability in $Z$ (see the small example after this list)
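
A small made-up example (mine, not the paper's) of the gap between dependence and completeness when $U$ and $Z$ each take three levels: the conditional distributions below vary with $z$, so $U$ and $Z$ are dependent given $X$, yet the matrix $[f(u_i\mid x, z_j)]$ is singular, so completeness fails.

```python
import numpy as np

# columns are f(u | x, z_j) for a fixed x; the numbers are made up
c1 = np.array([0.5, 0.3, 0.2])
c2 = np.array([0.2, 0.5, 0.3])
c3 = (c1 + c2) / 2            # third column is an average of the first two
M = np.column_stack([c1, c2, c3])

# f(u | x, z) varies with z, so U and Z are dependent given X ...
print(M)
# ... but the matrix is rank-deficient, so completeness fails
print(np.linalg.matrix_rank(M))  # 2, not 3
```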

Proposition 1: Consider a factor model $X = \alpha U + \eta Z + \varepsilon$ for a vector of $p$ observed variables $X$, $q$ unobserved confounders $U$, and $r (\ge q)$ instrumental variables $Z$ such that $Z\ind U\ind \varepsilon$. W.l.o.g., let $E(U) = 0, \Sigma_U = I_q, E(\varepsilon) = 0$, and $\Sigma_\varepsilon$ be diagonal. Assuming there remain two disjoint submatrices of rank $q$ after deleting any row of $\alpha$, we have that

  • $\alpha\alpha^T$ and $\Sigma_\varepsilon$ are uniquely determined from $\Sigma_{X-\eta Z} = \alpha\alpha^T + \Sigma_\varepsilon$, and any admissible value for $\alpha$ can be written as $\tilde \alpha = \alpha R$ with $R$ an arbitrary $q\times q$ orthogonal matrix
  • if the components of $\varepsilon$ are mutually independent and the joint characteristic function of $X$ does not vanish, then any admissible joint distribution can be written as $\tilde f(x, u\mid z) = f(X=x, R^TU=u\mid Z=z;\alpha)$ with $R$ an arbitrary $q\times q$ orthogonal matrix
  • if $U, Z, \varepsilon^T$ are normal variables and $\eta^T\gamma$ has full rank $q$, then $f(u\mid x, z) \sim N(\gamma^Tx - \gamma^T\eta z, \Sigma)$ and is complete in $z$, where $\gamma = (\Sigma_{X-\eta Z})^{-1}\alpha, \Sigma = I_q - \alpha^T(\Sigma_{X-\eta Z})^{-1}\alpha$; a numerical check of the conditional mean appears after this list
  • Proposition 1 requires that $p \ge 2q + 1$ and that each confounder is correlated with at least three observed variables, and therefore, implies “no single- or dual-treatment confounders”
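
The closed form for $f(u\mid x, z)$ in the third bullet follows from standard multivariate normal conditioning; below is a quick Monte Carlo sanity check (my own sketch, with made-up parameter values) that the conditional mean is indeed $\gamma^T(x - \eta z)$ with $\gamma = (\Sigma_{X - \eta Z})^{-1}\alpha$.

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, r, n = 6, 2, 2, 200_000

# hypothetical parameter values for X = alpha U + eta Z + eps
alpha = rng.normal(size=(p, q))
eta = rng.normal(size=(p, r))
sigma_eps = rng.uniform(0.5, 1.5, size=p)

U = rng.normal(size=(n, q))
Z = rng.normal(size=(n, r))
X = U @ alpha.T + Z @ eta.T + rng.normal(size=(n, p)) * np.sqrt(sigma_eps)

# gamma = Sigma_{X - eta Z}^{-1} alpha, as in Proposition 1
Sigma_res = alpha @ alpha.T + np.diag(sigma_eps)
gamma = np.linalg.solve(Sigma_res, alpha)

# regressing U on (X - eta Z) should recover gamma, since
# E(U | X, Z) = gamma^T (X - eta Z) under the normal factor model
coef, *_ = np.linalg.lstsq(X - Z @ eta.T, U, rcond=None)
print(np.max(np.abs(coef - gamma)))  # small (Monte Carlo error only)
```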

3.2 Identification

Theorem 1: Under Assumptions 1 and 2, for any admissible joint distribution $\tilde f(x, u\mid z)$ that solves $f(x\mid z) = \int_u \tilde f(x, u\mid z)du$, there exists a unique solution $\tilde f(y\mid u, x)$ to the equation \[f(y\mid x, z) = \int_u \tilde f(y\mid u, x)\tilde f(u\mid x, z)du \tag{4}\] and the potential outcome distribution is identified by \[f(Y(x) = y) = \int_u \tilde f(y\mid u, x)\tilde f(u)du\] where $\tilde f(u)$ is obtained from $\tilde f(x, u\mid z)$ and $f(z)$

Theorem 1 depicts three steps of the auxiliary variables approach

  1. first we obtain an arbitrary admissible distribution $\tilde f(x, u\mid z)$
  2. solve Equation (4) to identify $\tilde f(y\mid u, x)$, which encodes the treatment effect within each stratum of the confounder
  3. integrate the stratified effect to obtain the treatment effect in the population
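
To see the three steps in a tractable special case, here is a rough linear-Gaussian sketch (my own simplification with made-up parameters; it is not the estimator developed in Section 5.2): step 1 takes an admissible factor-model value $\tilde\alpha$ (here the truth up to sign, since $q = 1$), step 2 solves the linear analogue of Equation (4) by regressing $Y$ on $X$ and the implied $\tilde E(U\mid X, Z)$, and step 3 averages over $\tilde f(u)$, which has mean zero, so the $X$ coefficients are the treatment effects.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200_000, 6

# hypothetical linear-Gaussian model with one confounder and one IV
alpha = np.array([1.0, 0.8, -0.6, 0.5, 0.9, 0.0])   # loadings of U on X
eta   = np.array([0.7, 0.0, 0.0, 0.0, 0.0, 0.0])    # the IV Z acts on X1 only
beta  = np.array([0.5, 0.0, 1.0, 0.0, -0.5, 0.3])   # true treatment effects
delta = 1.5                                          # confounding on Y

U = rng.normal(size=n)
Z = rng.normal(size=n)
X = np.outer(U, alpha) + np.outer(Z, eta) + rng.normal(size=(n, p))
Y = X @ beta + delta * U + rng.normal(size=n)

# step 1: an admissible treatment-confounder model; with q = 1 any admissible
# loading is +/- alpha, so take the sign-flipped truth for the sketch
alpha_tilde = -alpha
Sigma_res = np.outer(alpha_tilde, alpha_tilde) + np.eye(p)   # Sigma_{X - eta Z}
gamma_tilde = np.linalg.solve(Sigma_res, alpha_tilde)
U_tilde = (X - np.outer(Z, eta)) @ gamma_tilde               # tilde E(U | X, Z)

# step 2: solve the linear analogue of Equation (4) by regressing Y on (X, U_tilde)
coef, *_ = np.linalg.lstsq(np.column_stack([X, U_tilde]), Y, rcond=None)

# step 3: averaging over tilde f(u) (mean zero) leaves the X coefficients
print(np.round(coef[:p], 3))                                # approx beta
print(np.round(np.linalg.lstsq(X, Y, rcond=None)[0], 3))    # naive OLS, biased
```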

the auxiliary variables approach does not estimate the confounder, or even a surrogate confounder, and thus, dispenses with the need for an infinite number of treatments and avoids the forced positivity violations

the auxiliary variable is indispensable in the second stage of the approach; without it one has to solve

\[f(y\mid x) = \int_u \tilde f(y\mid u, x)\tilde f(u\mid x)du\]

for the outcome model. the solution to this equation is not unique given $f(y\mid x)$ and $\tilde f(u\mid x)$

by incorporating an auxiliary variable satisfying the exclusion restriction, we obtain Equation (4), a Fredholm integral equation of the first kind. The solution of this equation is unique under the completeness condition and thus identifies the outcome model, up to an invertible transformation of the confounder.
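
When $U$ and $Z$ are both discrete with $k$ levels, Equation (4) reduces to a finite linear system, which makes the role of completeness transparent (this discretized restatement is my own paraphrase, not taken verbatim from the paper): for each fixed $(y, x)$,

\[f(y\mid x, z_j) = \sum_{i=1}^k \tilde f(y\mid u_i, x)\, \tilde f(u_i\mid x, z_j), \qquad j = 1, \ldots, k,\]

so the unknowns $\tilde f(y\mid u_i, x)$ are obtained by inverting the $k\times k$ matrix $[\tilde f(u_i\mid x, z_j)]$, and completeness is exactly the invertibility of that matrix.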

Equation (4) also offers testable implications for Assumption 2: if the equation does not have a solution, then Assumption 2 must be partially violated

unlike the g-formula, we do not identify the true outcome model $f(y\mid u, x)$ or the true confounder distribution $f(u)$, but instead we obtain

\[\tilde f(y\mid u, x) = f\{y\mid V(U) = u, x\}\]

and

\[\tilde f(x, u\mid z) = f\{x, V(U) = u\mid z\}\]

for some invertible transformation $V(U)$.

Nonetheless, for any such admissible pair of outcome model and treatment-confounder distribution, we can still identify the potential outcome distribution, because ignorability holds conditional on any such transformation of $U$.

The equivalence assumption guarantees that any admissible distribution $\tilde f(x, u\mid z)$ can be used for identifying the potential outcome distribution; we do not need to use the truth $f(x, u\mid z)$, and thus, bypass the challenge to identifying it.

Although Theorem 1 shows that the potential outcome distribution is identified, the integral Equation (4) does not admit an analytic solution in general and one has to resort to numerical methods

Suppose $p$ treatments, one confounder, one instrumental variable, and one outcome are generated as $X = \alpha U + \eta Z + \varepsilon$ and $Y=m(X, U, e)$, where $(\varepsilon^T, U, Z)$ is a vector of independent normal variables with mean zero, $\Sigma_U = 1$, $m$ is unknown, and $e\ind (\varepsilon^T, U, Z)$. We require that at least three entries of $\alpha$ are nonzero and that $\eta^T\gamma\neq 0$, in which case, the equivalence and completeness assumptions are met according to Proposition 1. Given an admissible value $\tilde \alpha$, we let $\tilde \gamma = (\Sigma_{X-\eta Z})^{-1}\tilde \alpha, \tilde\sigma^2 = 1 - \tilde\alpha^T(\Sigma_{X-\eta Z})^{-1}\tilde \alpha$, then $\tilde f(u\mid x, z)\sim N(\tilde\gamma^Tx - \tilde \gamma^T\eta z, \tilde\sigma^2)$ is an admissible distribution for $f(u\mid x, z)$. Let $h_1(t)$ and $h_2(y, x, t)$ be the Fourier transforms of the standard normal density function $\phi$ and $f(y\mid x, z)$, respectively. Then the solution to Equation (4) with $\tilde f(u\mid x, z)$ given above is \(\tilde f(y\mid x, u) = \frac{1}{2\pi} \int_{-\infty}^\infty \exp\left(\frac{itu}{\tilde \sigma}\right)\frac{h_2(y, x, t)}{h_1(t)}dt\) and the potential outcome distribution is \(f(Y(x) = y) = \int_{-\infty}^\infty \tilde f(y\mid u, x)\phi(u)du\)

3.3 A Comparison to the Conventional Instrumental Variable and Proximal Inference Approaches

In addition to the exclusion restriction $Z\ind Y\mid (X, U)$, the instrumental variable approach requires additional assumptions to achieve identification, such as an additive outcome model not allowing for interaction of the treatment and the confounder $E(Y\mid u, x) = m(x) + u$ as well as completeness in $z$ of $f(x\mid z)$

Alternative strands of using IV for confounding adjustment include nonseparable outcome models and local average treatment effect models

However, these two approaches typically focus on a single (or binary) treatment and the authors are not aware of any extensions for multiple treatments.

therefore, they compare their approach to the additive model, which has a straightforward extension to multiple treatments

the completeness of $f(x\mid z)$ guarantees uniqueness of the solution to $E(Y\mid z) = \int_x m(x) f(x\mid z)dx$, an integral equation identifying $m(x)$
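
In the linear special case $m(x) = \beta^Tx$, this integral equation becomes $E(Y\mid z) = \beta^T E(X\mid z)$ and the conventional IV estimator is two-stage least squares; below is a minimal numpy sketch (mine, with made-up parameters) of that baseline, which needs as many instruments as treatments.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 200_000, 3

# hypothetical linear model with p treatments, p instruments, one confounder
Pi = rng.normal(size=(p, p))          # first-stage coefficients of Z on X
beta = np.array([1.0, -0.5, 0.0])     # true treatment effects
U = rng.normal(size=n)
Z = rng.normal(size=(n, p))
X = Z @ Pi.T + np.outer(U, np.ones(p)) + rng.normal(size=(n, p))
Y = X @ beta + 2.0 * U + rng.normal(size=n)

# two-stage least squares: project X on Z, then regress Y on the projection
Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta_2sls = np.linalg.lstsq(Xhat, Y, rcond=None)[0]
print(np.round(beta_2sls, 3))  # approx beta
```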

in contrast, the proposed approach does not rest on outcome model restrictions and hence accommodates interactions.

  • the completeness of $f(x\mid z)$ in $z$ entails at least as many instrumental variables as there are confounded treatments, and requires each confounded treatment to be correlated with at least one instrumental variable.
  • for the auxiliary variables approach, completeness of $f(u\mid x, z)$ in $z$ requires the dimension of $Z$ to be as great as that of $U$, which can be much smaller than that of the treatments
  • proximal inference allows for unrestricted outcome models, but entails at least two confounder proxies $(W, Z)$ with exclusion restrictions: $W\ind (X, Z)\mid U$ and $Z\ind Y\mid (X, U)$, even in the single confounder setting. This approach additionally assumes existence of a function $h(w, y, x)$, called the confounding bridge function, such that $f(y\mid u, x) = \int_w h(w, y, x)f(w\mid u)dw$, that is, $h(w, y, x)$ suffices to depict the relationship between the confounding on $Y$ and $W$. The integral equation $f(y\mid x, z) = \int_w h(w, y, x) f(w\mid x, z)dw$ is solved for the confounding bridge $h(w, y, x)$, and the potential outcome distribution is obtained by $f\{Y(x) = y\} = \int_w h(w, y, x)f(w)dw$, where completeness of $f(u\mid x, z)$ in $z$ is also required for identification of $f\{Y(x)\}$.

a strength of these two approaches is that they leave the treatment-confounder distribution unrestricted. However, when the correlation structure of multiple treatments is informative about the presence and nature of confounding, as is generally the case in GWAS and computational biology applications, the proposed method can exploit this correlation structure to remove confounding bias, while the conventional instrumental variable and proximal inference approaches are agnostic to the treatment-confounder distribution and therefore unable to leverage any information it contains.

4. Identification Under the Null Treatments Assumption

4.1 Identification

  • $\cC = \{i: f(u\mid x) \text{ varies with } x_i\}$ denotes the indices of confounded treatments
  • $\cA = \{i: f(y\mid u, x) \text{ varies with } x_i\}$ denotes the indices of active treatments that affect the outcome

the null treatments assumption entails that fewer than half of the confounded treatments can have causal effects on the outcome but does not require knowledge of which treatments are active

4.2 Hypothesis Testing Without Auxiliary Variables and Null Treatments Assumptions

a test of the sharp null hypothesis of no joint effects, which requires neither auxiliary variables nor the null treatments assumption. The sharp null hypothesis is

\[H_0: f(y\mid u, x) = f(y\mid u)\]

for all $x$.

5. Estimation

5.1 General Estimation Strategies

a common principle of causal inference, and indeed statistics more broadly, that is nicely summed up by Cox and Donnelly (2011):

If an issue can be addressed nonparametrically, then it will often be better to tackle it parametrically; however, if it cannot be resolved nonparametrically then it is usually dangerous to resolve it parametrically.

5.2 The Auxiliary Variables Approach with Linear Models

5.3 The Null Treatments Approach with Linear Models

6. Simulations

6.1 The Auxiliary Variables Setting

  • two confounders $U$
  • six instrumental variables $Z$
  • six treatments $X$
  • an outcome $Y$
  • two outcome-inducing confounder proxies $W$
\[X = \alpha U + \eta Z + \varepsilon_X, Y = \beta^TX + \delta_YU +\varepsilon_Y\\ W = \delta_W U + \varepsilon_W\\ U\sim N(0, I_2), Z\sim N(0, I_6), \varepsilon_X\sim N(0, I_6), \\ \varepsilon_Y\sim N(0, 1); \varepsilon_W\sim N(0, I_2)\]
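
A hedged sketch of this data-generating process (the parameter values below are placeholders I made up; the note does not record the values used in the paper), with the first row of $\alpha$ set to zero so that $X_1$ is unconfounded:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000

# placeholder parameters: 6 treatments, 2 confounders, 6 IVs, 2 proxies
alpha = np.vstack([np.zeros(2), rng.normal(size=(5, 2))])   # X1 unconfounded
eta = np.eye(6)
beta = np.array([0.5, 1.0, 0.0, -0.5, 0.0, 0.8])
delta_Y = np.array([1.0, -1.0])
delta_W = np.array([[1.0, 0.5], [0.5, 1.0]])

U = rng.normal(size=(n, 2))
Z = rng.normal(size=(n, 6))
X = U @ alpha.T + Z @ eta.T + rng.normal(size=(n, 6))
Y = X @ beta + U @ delta_Y + rng.normal(size=n)
W = U @ delta_W.T + rng.normal(size=(n, 2))
```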

under the setting, $(X_2,\ldots, X_6)$ are confounded but $X_1$ is not. For estimation, consider eight methods:

  • $IV_1$: conventional IV approach using all six IVs
  • $IV_2$: conventional IV approach using five IVs $(Z_1, \ldots, Z_5)$ and treating $(X_6, Z_6)$ as covariates
  • $Aux_1$: the proposed auxiliary variables approach assuming two factors and using all six IVs
  • $Aux_2$
  • $Aux_3$
  • $PI_1$
  • $PI_2$
  • OLS: regressing $Y$ on $X$ and $Z$

7. Applications to a Mouse Obesity Study

8. Discussion

  • the paper extends results for the identification of treatment effects in the presence of unmeasured confounding from the single-treatment setting to the multi-treatment setting
  • extends the parametric approach to identification of multi-treatment effects with an unrestricted outcome model
  • they have assumed that the number of confounders is known, which is realistic in confounder measurement error or misclassification problems.
  • the identification framework rests on the auxiliary variables or the null treatments assumption. These assumptions are partially testable.
  • they considered fixed dimensions of treatments and confounders, and it might be of interest to extend to large and high-dimensional settings
