WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Gaussian DAGs on Network Data

Posted on (Update: )
Tags: Graph

This post is based on Li, H., & Zhou, Q. (2019). Gaussian DAGs on network data. arXiv:1905.10848 [cs, stat].

Graphical models based on DAGs have a variety of applications, including

  • genetics
  • causal inference
  • machine learning

There has been extensive research on recovering the structure of DAGs from observational and experimental data. Most structure learning algorithms can be categorized as

  • score-based: maximizing a scoring function, such as MDL, BIC, Bayesian scores, with various search strategies
    • order-based search
    • greedy search
    • coordinate descent
  • constraint-based: perform conditional independence tests among variables to construct a skeleton and then orient some of the edges
  • hybrid methods: combine the above two approaches

Formulation of the classical DAG learning problem

  • $X=(X_1,\ldots,X_p)\in \IR^{n\times p}$: data matrix of $n$ i.i.d. observations on $p$ nodes.
  • goal: learn the structure of $G^*$
  • focus: the Gaussian DAG model, which is equivalent to the following Gaussian linear structural equation model (SEM):

\begin{equation} x_i = B^Tx_i + e_i, \; e_i\sim N_p(0, \Omega), \; i\in [n]\label{eq:sem1} \end{equation} where $B = (\beta_{kj})_{p\times p}$ is the weighted adjacency matrix of $G^*$ with the edge set $E=\{(k, j):\beta_{kj} \neq 0 \}$ and $\Omega =\diag(\omega_1^2,\ldots, \omega_p^2)$.

To get a better understanding, we can write out the $j$-th component,

\[x_{ij}=(B^Tx_i)_j + e_{ij} = \sum_{k=1}^p(B^T)_{jk}x_{ik} + e_{ij} = \sum_{k=1}^p\beta_{kj}x_{ik} + e_{ij}\,.\]
  • goal (re-expressed): given i.i.d. samples $\{x_i\}_{i=1}^n$, estimate $B$ (see the simulation sketch below).
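
To make the data-generating mechanism concrete, here is a minimal simulation sketch of SEM \eqref{eq:sem1} (my own toy example, not from the paper): since $x_i = B^Tx_i + e_i$, we have $x_i = (I-B^T)^{-1}e_i$, so the whole data matrix can be drawn at once.

```python
import numpy as np

# toy weighted adjacency matrix for the chain DAG 1 -> 2 -> 3
# (beta_{kj} != 0 means an edge k -> j)
B = np.array([[0.0, 0.8,  0.0],
              [0.0, 0.0, -0.5],
              [0.0, 0.0,  0.0]])
Omega = np.diag([1.0, 0.5, 0.25])   # noise variances omega_j^2
n, p = 100, 3

rng = np.random.default_rng(0)
E = rng.multivariate_normal(np.zeros(p), Omega, size=n)  # rows are e_i ~ N_p(0, Omega)
# x_i = B^T x_i + e_i  <=>  x_i = (I - B^T)^{-1} e_i, i.e. X = E (I - B)^{-1} row-wise
X = E @ np.linalg.inv(np.eye(p) - B)
```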

A model for network data

Independence among the observed samples $\{x_i\}_{i=1}^n$ is one of the main assumptions of all the aforementioned DAG learning methods.

To model the dependence among observations, use an undirected graph $\cG$ with each node representing an observation $x_i$ and each edge representing the conditional dependence between two observations given the rest. Let $A(\cG)$ be the adjacency matrix of $\cG$.

Generalize the SEM \eqref{eq:sem1} to

\begin{equation}\label{eq:sem2} X_j =\sum_{k\in \Pi_j}\beta_{kj}X_k + \varepsilon_j\,, \; \varepsilon_j =(\varepsilon_{1j},\ldots, \varepsilon_{nj})\sim N_n(0, \omega_j^2\Sigma) \end{equation}

where $\Sigma$ is positive definite with $\diag(\Sigma)=1$, $\Pi_j$ is the parent set of $X_j$, and the $\varepsilon_j$ are independent of each other. (So the parent sets encode the DAG structure, while the dependence among observations is captured by the undirected graphical model.)

When $\Sigma=I_n$, the SEM \eqref{eq:sem2} reduces to \eqref{eq:sem1}.
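
As a rough sketch (again a toy setup of mine, not the paper's), \eqref{eq:sem2} can be simulated by writing it in matrix form $X = XB + E$, where the $j$-th column of $E$ is $\varepsilon_j\sim N_n(0,\omega_j^2\Sigma)$, so that $X = E(I-B)^{-1}$ and the rows of $X$ are correlated through $\Sigma$. The equicorrelated $\Sigma$ below is just a stand-in for a network-induced correlation matrix.

```python
import numpy as np

n, p = 50, 3
B = np.array([[0.0, 0.8,  0.0],
              [0.0, 0.0, -0.5],
              [0.0, 0.0,  0.0]])
omega2 = np.array([1.0, 0.5, 0.25])   # omega_j^2

rho = 0.3                                               # toy equicorrelation among observations
Sigma = (1 - rho) * np.eye(n) + rho * np.ones((n, n))   # positive definite, diag(Sigma) = 1

rng = np.random.default_rng(1)
L = np.linalg.cholesky(Sigma)
# draw E column by column: eps_j ~ N_n(0, omega_j^2 * Sigma)
E = np.column_stack([np.sqrt(w2) * (L @ rng.standard_normal(n)) for w2 in omega2])
# X_j = sum_{k in Pi_j} beta_{kj} X_k + eps_j  <=>  X = E (I - B)^{-1}
X = E @ np.linalg.inv(np.eye(p) - B)
```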

The distinction between \eqref{eq:sem1} and \eqref{eq:sem2} becomes clearer when we regard \eqref{eq:sem2} as a semi-Markovian causal model. Represent each variable $z_i$ in a DAG $G$ on vertices $\{z_1,\ldots,z_p\}$ by a deterministic function:

\[z_i = f_i(\Pi_i, u_i), i\in [p]\,,\]

where $\Pi_i$ denotes the parents of node $z_i$ in $G$ and the $u_i$ are noise variables, sometimes also referred to as background variables.

  • Markovian: the noise variables $u_i$ are jointly independent
  • semi-Markovian: they are dependent

Contributions

When the ordering of the DAG nodes is

  • known (i.e., a natural ordering of the variables is given): develop a maximum penalized likelihood algorithm based on block coordinate descent (BCD) to jointly estimate the DAG structure and the sample correlations
  • unknown: show empirically that the algorithm can still estimate the sample correlation accurately, which can then be used to improve existing DAG learning methods via data decorrelation (see the sketch below)
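
A minimal sketch of the decorrelation idea, assuming some estimate `Sigma_hat` of the sample correlation matrix is already available (the function below is a placeholder of mine, not the authors' implementation): left-multiplying $X$ by $\hat\Sigma^{-1/2}$ makes the rows approximately i.i.d., so that \eqref{eq:sem2} is brought back to the form of \eqref{eq:sem1} and any standard DAG learner can be run on the transformed data.

```python
import numpy as np

def decorrelate(X, Sigma_hat):
    """Return Sigma_hat^{-1/2} @ X, whose rows are approximately i.i.d."""
    # inverse square root via eigendecomposition (Sigma_hat is positive definite)
    vals, vecs = np.linalg.eigh(Sigma_hat)
    Sigma_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return Sigma_inv_sqrt @ X

# X_dec = decorrelate(X, Sigma_hat)   # then apply an existing DAG learning method to X_dec
```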

Published in categories Memo