WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Reconstruct Gaussian DAG

Posted on
Tags: DAG, Nonconvex, Optimization

This note is based on Yuan, Y., Shen, X., Pan, W., & Wang, Z. (n.d.). Constrained likelihood for reconstructing a directed acyclic Gaussian graph. Biometrika.

DAG is widely used to describe directional pairwise relations. The relations are estimated by reconstructing a directed acyclic graph’s structure, which is challenging when the ordering of nodes of the graph is unknown.


Existing methods such as the neighbourhood and search-and-score methods have high estimation errors or computational complexities, especially when a local or sequential approach is used to enumerate edge directions by testing or optimizing a criterion locally, as a local method may break down even for moderately sized graphs.


A novel approach to simultaneously identifying all estimable directed edges and model parameters, using constrained maximum likelihood with nonconvex constraints.

Develop a constraint reduction method that constructs a set of active constraints from super-exponentially many constraints.

Coupled with an alternating direction method of multipliers and a difference convex method, permits efficient computation for large-graph learning.

The proposed method consistently reconstructs identifiable directions of the true graph and achieves the optimal performances in terms of parameter estimation.

Numerically, the method compares favourably with competitors. A protein network is analysed to demonstrate that the proposed method can make a difference in identifying the network’s structure.


Two types

local conditional independence tests sequentially to check the existence of directed edges

usually have worst-case complexity that is exponential in the number of nodes

  • PC algorithm: $O(p^q)$ with $p$ nodes and maximal neighborhood size $q$.
  • Search-and-score: optimize a goodness-of-fit measure for possible directions in a neighborhood of interest.
  • $L_1$ penalization method for learning a directed acyclic graphical model with a known topological ordering of the nodes.
  • $L_1$ penalization method for interventional data
  • a high-dimensional additive structural equation model based on order search and penalized regression.

exact learning method

  • dynamic programming: guarantee a global minimizer but has exponential time complexity and is feasible only for small graphs.
  • mixed integer linear programming: may reduce the complexity to a polynomial order when the maximal number of parent nodes is restricted to one or two in the search. such a restriction limits the scope of application and may destroy the acyclically requirement of a directed acyclic graph and hence the local Markov property.

the paper claims that they can employ an iterative difference convex algorithm to seek a good local minimizer with fast convergence.

theoretical challenges:

  1. the reconstruction error of a directed acyclic graph may grow super-exponentially in $p$
  2. little theory to guide practice for reconstructing a directed acyclic graph

Gaussian directed acyclic graph

The joint distribution of $(X_1,\ldots,X_p)$ is $\prod_{j=1}^p\pr(X_j\mid \pa_j)$. The factorization property is equivalent to the local Markov property of a directed acyclic graph.

Embed directional effects induced by directed edges into a structural equation model:

where $Z_j$ is the latent error representing unexplained variation in node $j$. Consider a Gaussian structural equation model in which each $f_j(\cdot,\cdot)$ becomes linear in $(\pa_j,Z_j)$ and $Z_j$ follows an independent $N(0,\sigma_j^2)$ distribution,

where $A$ is the parametrized adjacency matrix in which a nonzero $(j,k)$ element $A_{jk}$ corresponds to a directed edge from parent node $k$ to child node $j$, with its value $A_{jk}$ indicating the strength of the relation. Our goal is to estimate $A$ subject to the requirement that $A$ defines a directed acyclic graph.

DAG models have been under-studies compared with undirected graphs. The major challenge stems from acyclically and the corresponding nonconvex parameter space, defined by nonconvex constraints reinforcing acyclically of a graph. In contrast, the parameter space for Gaussian UG models is a positive-semidefinite cone, which a convex.

Denote a directed cycle by $(j_1,\ldots,j_L)$, a sequence of indices of nodes, where $L$ is the length of the directed cycle, and $j_1$ is required to be the smallest index. A DAG doesn’t contain a directed cycle. This implies that the number of directed edges is smaller than the number of nodes involved in every possible directed cycle. For example, to prevent a directed cycle, say $(1,2,3)$, we introduce the constraint

On this basis, we introduce constraints on entries of $A$ to reinforce acyclically of any order.

The total number of constraints is

Theorem 1 implies that the $O(p^p)$ constraints can be reduced to $p^3-p^2$ constraints.

Constrained Maximum Likelihood

Suppose that $\sigma_j(j=1,\ldots,p)$ are equal, we have

where all the $Z_j$ are independent and identically distributed according to $N(0,\sigma^2)$ with $\sigma > 0$.

Given an $n\times p$ data matrix ${x_{ij}}_{n\times p}$, the negative log-likelihood, after dropping a constant term, is

which is convex in $A$.

In addition to the constraints on acyclicity, also impose a constraint to regularize its sparsity. Then the directed acyclic graph learning problem can be formulated as

Applying their theorem and approximating the indicator functions by piecewise-linear functions to circumvent the discontinuity, the above problem can be reduced further, and they can use difference convex programming.

In practice, $\tau$ and $K$ are treated as tuning parameters, optimized using an independent validation set or by cross validation over a set of grids on their domain.

They solve this constrained minimization through difference convex programming and the alternating direction method of multipliers.

Published in categories Note