WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Gaussian DAGs on Network Data

Posted on (Update: )
Tags: Graph

This post is based on Li, H., & Zhou, Q. (2019). Gaussian DAGs on network data. arXiv:1905.10848 [cs, stat].

Graphical models based on DAGs have a variety of applications, including

  • genetics
  • causal inference
  • machine learning

There has been extensive research on recovering the structure of DAGs from observational and experimental data. Most structure learning algorithms can be categorized as

  • score-based: maximizing a scoring function, such as MDL, BIC, Bayesian scores, with various search strategies
    • order-based search
    • greedy search
    • coordinate descent
  • constraint-based: perform conditional independence tests among variables to construct a skeleton and then orient some of the edges
  • hybrid methods: combine the above two approaches

Formulation of the classical DAG learning problem

  • $X=(X_1,\ldots,X_p)\in \IR^{n\times p}$: data matrix of $n$ i.i.d. observations on $p$ nodes.
  • goal: learn the structure of $G^*$
  • focus: the Gaussian DAG model, which is equivalent to the following Gaussian linear structural equation model (SEM):

\begin{equation} x_i = B^Tx_i + e_i, \; e_i\sim N_p(0, \Omega), \; i\in [n]\label{eq:sem1} \end{equation} where $B = (\beta_{kj})_{p\times p}$ is the weighted adjacency matrix of $G^*$ with the edge set $E=\{(k, j):\beta_{kj} \neq 0 \}$ and $\Omega =\diag(\omega_1^2,\ldots, \omega_p^2)$.

To get a better understanding, we can write out the $j$-th component,

\[x_{ij}=(B^Tx_i)_j + e_{ij} = \sum_{k=1}^p(B^T)_{jk}x_{ik} + e_{ij} = \sum_{k=1}^p\beta_{kj}x_{ik} + e_{ij}\,.\]
  • goal (re-expressed): given i.i.d. samples $\{x_i\}_{i=1}^n$, estimate $B$ (see the simulation sketch below).
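
To make the data-generating mechanism concrete, here is a minimal simulation sketch of SEM \eqref{eq:sem1} (my own toy example, not from the paper): since $x_i = B^Tx_i + e_i$, we have $x_i = (I-B^T)^{-1}e_i$, so the whole data matrix can be drawn at once.

```python
import numpy as np

# toy weighted adjacency matrix for the chain DAG 1 -> 2 -> 3
# (beta_{kj} != 0 means an edge k -> j)
B = np.array([[0.0, 0.8,  0.0],
              [0.0, 0.0, -0.5],
              [0.0, 0.0,  0.0]])
Omega = np.diag([1.0, 0.5, 0.25])   # noise variances omega_j^2
n, p = 100, 3

rng = np.random.default_rng(0)
E = rng.multivariate_normal(np.zeros(p), Omega, size=n)  # rows are e_i ~ N_p(0, Omega)
# x_i = B^T x_i + e_i  <=>  x_i = (I - B^T)^{-1} e_i, i.e. X = E (I - B)^{-1} row-wise
X = E @ np.linalg.inv(np.eye(p) - B)
```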

A model for network data

Independence among the observed samples $\{x_i\}_{i=1}^n$ is one of the main assumptions of all the aforementioned DAG learning methods.

To model the dependence among observations, use an undirected graph $\cG$ with each node representing an observation $x_i$ and each edge representing the conditional dependence between two observations given the rest. Let $A(\cG)$ be the adjacency matrix of $\cG$.

Generalize the SEM \eqref{eq:sem1} to

\begin{equation}\label{eq:sem2} X_j =\sum_{k\in \Pi_j}\beta_{kj}X_k + \varepsilon_j\,, \; \varepsilon_j =(\varepsilon_{1j},\ldots, \varepsilon_{nj})\sim N_n(0, \omega_j^2\Sigma) \end{equation}

where $\Sigma$ is positive definite with $\diag(\Sigma)=1$, $\Pi_j$ is the parent set of $X_j$, and the $\varepsilon_j$ are independent of each other. (So the parent sets encode the DAG structure, while the dependence among observations is captured by the undirected graphical model.)

When $\Sigma=I_n$, the SEM \eqref{eq:sem2} reduces to \eqref{eq:sem1}.
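
As a rough sketch (again a toy setup of mine, not the paper's), \eqref{eq:sem2} can be simulated by writing it in matrix form $X = XB + E$, where the $j$-th column of $E$ is $\varepsilon_j\sim N_n(0,\omega_j^2\Sigma)$, so that $X = E(I-B)^{-1}$ and the rows of $X$ are correlated through $\Sigma$. The equicorrelated $\Sigma$ below is just a stand-in for a network-induced correlation matrix.

```python
import numpy as np

n, p = 50, 3
B = np.array([[0.0, 0.8,  0.0],
              [0.0, 0.0, -0.5],
              [0.0, 0.0,  0.0]])
omega2 = np.array([1.0, 0.5, 0.25])   # omega_j^2

rho = 0.3                                               # toy equicorrelation among observations
Sigma = (1 - rho) * np.eye(n) + rho * np.ones((n, n))   # positive definite, diag(Sigma) = 1

rng = np.random.default_rng(1)
L = np.linalg.cholesky(Sigma)
# draw E column by column: eps_j ~ N_n(0, omega_j^2 * Sigma)
E = np.column_stack([np.sqrt(w2) * (L @ rng.standard_normal(n)) for w2 in omega2])
# X_j = sum_{k in Pi_j} beta_{kj} X_k + eps_j  <=>  X = E (I - B)^{-1}
X = E @ np.linalg.inv(np.eye(p) - B)
```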

The distinction between \eqref{eq:sem1} and \eqref{eq:sem2} becomes clearer when we regard \eqref{eq:sem2} as a semi-Markovian causal model. Represent each variable $z_i$ in a DAG $G$ on vertices $\{z_1,\ldots,z_p\}$ by a deterministic function:

\[z_i = f_i(\Pi_i, u_i), i\in [p]\,,\]

where $\Pi_i$ denotes the parents of node $z_i$ in $G$ and the $u_i$ are noise variables, sometimes also referred to as background variables.

  • Markovian: the noise variables $u_i$ are jointly independent
  • semi-Markovian: they are dependent

Contributions

When the ordering of the DAG nodes is

  • known (i.e., a natural ordering of the variables is given): develop a maximum penalized likelihood algorithm based on block coordinate descent (BCD) to jointly estimate the DAG structure and the sample correlations
  • unknown: show empirically that the algorithm can still estimate the sample correlation accurately, which can then be used to improve existing DAG learning methods via data decorrelation (see the sketch below)
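
A minimal sketch of the decorrelation idea, assuming some estimate `Sigma_hat` of the sample correlation matrix is already available (the function below is a placeholder of mine, not the authors' implementation): left-multiplying $X$ by $\hat\Sigma^{-1/2}$ makes the rows approximately i.i.d., so that \eqref{eq:sem2} is brought back to the form of \eqref{eq:sem1} and any standard DAG learner can be run on the transformed data.

```python
import numpy as np

def decorrelate(X, Sigma_hat):
    """Return Sigma_hat^{-1/2} @ X, whose rows are approximately i.i.d."""
    # inverse square root via eigendecomposition (Sigma_hat is positive definite)
    vals, vecs = np.linalg.eigh(Sigma_hat)
    Sigma_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return Sigma_inv_sqrt @ X

# X_dec = decorrelate(X, Sigma_hat)   # then apply an existing DAG learning method to X_dec
```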

Published in categories Memo