
WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Reconstruct Gaussian DAG

Posted on
Tags: Graph, Nonconvex, Optimization

This note is based on Yuan, Y., Shen, X., Pan, W., & Wang, Z. (2019). Constrained likelihood for reconstructing a directed acyclic Gaussian graph. Biometrika, 106(1), 109–125.

A DAG is widely used to describe directional pairwise relations. These relations are estimated by reconstructing the structure of a directed acyclic graph, which is challenging when the ordering of the graph's nodes is unknown.

Motivation

Existing methods, such as neighbourhood and search-and-score methods, suffer from high estimation error or high computational complexity, especially when a local or sequential approach is used to enumerate edge directions by testing or optimizing a criterion locally; such a local method may break down even for moderately sized graphs.

Proposal

A novel approach that simultaneously identifies all estimable directed edges and model parameters, using constrained maximum likelihood with nonconvex constraints.

Develop a constraint reduction method that constructs a set of active constraints from super-exponentially many constraints.

Coupled with the alternating direction method of multipliers and a difference convex method, this permits efficient computation for large-graph learning.

The proposed method consistently reconstructs the identifiable directions of the true graph and achieves optimal performance in terms of parameter estimation.

Numerically, the method compares favourably with competitors. A protein network is analysed to demonstrate that the proposed method can make a difference in identifying the network’s structure.

Literature

Two types

Methods that apply local conditional independence tests sequentially to check the existence of directed edges.

These usually have worst-case complexity that is exponential in the number of nodes.

  • PC algorithm: $O(p^q)$ with $p$ nodes and maximal neighborhood size $q$.
  • Search-and-score: optimize a goodness-of-fit measure for possible directions in a neighborhood of interest.
  • L1 penalization method for learning a directed acyclic graphical model with a known topological ordering of the nodes.
  • L1 penalization method for interventional data
  • a high-dimensional additive structural equation model based on order search and penalized regression.

exact learning methods

  • dynamic programming: guarantee a global minimizer but has exponential time complexity and is feasible only for small graphs.
  • mixed integer linear programming: may reduce the complexity to a polynomial order when the maximal number of parent nodes is restricted to one or two in the search. However, such a restriction limits the scope of application and may destroy the acyclicity requirement of a directed acyclic graph and hence the local Markov property.

The paper claims that an iterative difference convex algorithm can be employed to seek a good local minimizer with fast convergence.

theoretical challenges:

  1. the reconstruction error of a directed acyclic graph may grow super-exponentially in p
  2. little theory to guide practice for reconstructing a directed acyclic graph

Gaussian directed acyclic graph

The joint distribution of $(X_1,\ldots,X_p)$ is $\prod_{j=1}^p \mathrm{pr}(X_j \mid \mathrm{pa}_j)$, where $\mathrm{pa}_j$ denotes the parents of node $j$. The factorization property is equivalent to the local Markov property of a directed acyclic graph.

Embed directional effects induced by directed edges into a structural equation model:

$$X_j = f_j(\mathrm{pa}_j, Z_j),$$

where $Z_j$ is the latent error representing unexplained variation in node $j$. Consider a Gaussian structural equation model in which each $f_j(\cdot,\cdot)$ becomes linear in $(\mathrm{pa}_j, Z_j)$ and $Z_j$ follows an independent $N(0,\sigma_j^2)$ distribution,

$$X_j = \sum_{k:\,k\neq j} A_{jk} X_k + Z_j, \qquad Z_j \sim N(0,\sigma_j^2),$$

where $A$ is the parametrized adjacency matrix in which a nonzero $(j,k)$ element $A_{jk}$ corresponds to a directed edge from parent node $k$ to child node $j$, with its value $A_{jk}$ indicating the strength of the relation. The goal is to estimate $A$ subject to the requirement that $A$ defines a directed acyclic graph.
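As a quick illustration (my own sketch, not code from the paper), a strictly lower-triangular $A$ is automatically acyclic, and data from the model can be simulated by solving $X = AX + Z$, i.e. $X = (I-A)^{-1}Z$:

```python
import numpy as np

# Hypothetical 3-node DAG: a strictly lower-triangular A is acyclic,
# with A[j, k] != 0 meaning a directed edge from parent k to child j.
A = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],   # edge 1 -> 2 with strength 0.8
              [0.0, 0.5, 0.0]])  # edge 2 -> 3 with strength 0.5
p = A.shape[0]
rng = np.random.default_rng(0)
sigma = 1.0

# X = A X + Z  =>  X = (I - A)^{-1} Z
n = 5000
Z = rng.normal(0.0, sigma, size=(n, p))
X = Z @ np.linalg.inv(np.eye(p) - A).T  # each row solves (I - A) x = z

# Sanity check: regressing X_2 on its parent X_1 should recover A[1, 0]
beta = float(X[:, 0] @ X[:, 1] / (X[:, 0] @ X[:, 0]))
print(beta)  # close to 0.8
```

Any DAG admits a topological ordering under which $A$ becomes strictly lower triangular, so this simulation scheme applies to general Gaussian DAGs after permuting the nodes.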

DAG models have been under-studied compared with undirected graphs. The major challenge stems from acyclicity and the corresponding nonconvex parameter space, defined by nonconvex constraints enforcing acyclicity of a graph. In contrast, the parameter space for Gaussian undirected graph models is a positive-semidefinite cone, which is convex.

Denote a directed cycle by $(j_1,\ldots,j_L)$, a sequence of node indices, where $L$ is the length of the directed cycle and $j_1$ is required to be the smallest index. A DAG contains no directed cycle. This means that, along every candidate directed cycle, the number of directed edges actually present must be smaller than the number of nodes involved. For example, to prevent the directed cycle $(1,2,3)$, we introduce the constraint

$$I(A_{21}\neq 0) + I(A_{32}\neq 0) + I(A_{13}\neq 0) \leq 2.$$

On this basis, we introduce constraints on the entries of $A$ to enforce acyclicity for cycles of any length:
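For intuition, here is a small sketch (mine, not the paper's code, using 0-based indices) that checks whether a candidate cycle violates the constraint $\sum_k I(A_{j_{k+1}j_k}\neq 0)\le L-1$:

```python
import numpy as np

def violates_cycle_constraint(A, cycle):
    """True iff every edge of the candidate directed cycle is present,
    i.e. sum_k I(A[j_{k+1}, j_k] != 0) exceeds L - 1."""
    L = len(cycle)
    closed = list(cycle) + [cycle[0]]  # close the cycle: j_{L+1} = j_1
    n_edges = sum(int(A[closed[k + 1], closed[k]] != 0) for k in range(L))
    return n_edges > L - 1

# 0 -> 1 -> 2 -> 0 is a directed cycle: all three edges are nonzero
A_cyclic = np.array([[0.0, 0.0, 0.3],
                     [0.7, 0.0, 0.0],
                     [0.0, 0.4, 0.0]])
print(violates_cycle_constraint(A_cyclic, (0, 1, 2)))  # True

A_acyclic = A_cyclic.copy()
A_acyclic[0, 2] = 0.0  # drop edge 2 -> 0, breaking the cycle
print(violates_cycle_constraint(A_acyclic, (0, 1, 2)))  # False
```

The constraint is satisfied as soon as at least one edge of the candidate cycle is absent, which is exactly what acyclicity requires.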

$$\sum_{1\le k\le L} I(A_{j_{k+1} j_k}\neq 0) \le L-1, \qquad j_{L+1}=j_1,$$

for all $\{(j_1,\ldots,j_L): j_1=\min_{1\le k\le L} j_k,\ j_1,\ldots,j_L \text{ distinct},\ L=2,\ldots,p\}$.

The total number of constraints is

$$\binom{p}{2}1! + \binom{p}{3}2! + \cdots + \binom{p}{p}(p-1)! = O(p^p).$$

Theorem 1 implies that the $O(p^p)$ constraints can be reduced to $p^3-p^2$ constraints.
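The count can be checked numerically: there are $\binom{p}{L}(L-1)!$ candidate cycles of length $L$ (choose the $L$ nodes, then order all but the fixed smallest index), so the total grows super-exponentially in $p$:

```python
from math import comb, factorial

def n_cycle_constraints(p):
    """Total number of cycle constraints: sum over lengths L = 2..p of
    C(p, L) * (L-1)!, since j_1 is pinned to the smallest index."""
    return sum(comb(p, L) * factorial(L - 1) for L in range(2, p + 1))

for p in (5, 10, 15):
    print(p, n_cycle_constraints(p))
# p = 5 already gives 84 constraints, and the count explodes from there
```

This is exactly why the constraint-reduction result (Theorem 1) matters: without it, even writing down the feasible set is intractable for moderate $p$.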

Constrained Maximum Likelihood

Suppose that the $\sigma_j$ $(j=1,\ldots,p)$ are all equal; then

$$X_j = \sum_{k:\,k\neq j} A_{jk} X_k + Z_j,$$

where the $Z_j$ are independent and identically distributed according to $N(0,\sigma^2)$ with $\sigma>0$.

Given an $n\times p$ data matrix $(x_{ij})_{n\times p}$, the negative log-likelihood, after dropping a constant term, is

$$l(A)=\frac{1}{2}\sum_{j=1}^{p}\sum_{i=1}^{n}\Big(x_{ij}-\sum_{k:\,k\neq j}x_{ik}A_{jk}\Big)^2,$$

which is convex in $A$.
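Under the equal-variance assumption, $l(A)$ is just a sum of per-node least-squares terms; a direct translation (my own sketch, with hypothetical data):

```python
import numpy as np

def neg_loglik(A, x):
    """l(A) = 1/2 * sum_j sum_i (x_ij - sum_{k != j} x_ik A_jk)^2.
    With diag(A) = 0, the inner sum is the j-th entry of x_i @ A.T."""
    resid = x - x @ A.T  # residual matrix: resid[i, j] = x_ij - sum_k x_ik A_jk
    return 0.5 * float(np.sum(resid ** 2))

# toy data just to exercise the function
rng = np.random.default_rng(1)
x = rng.normal(size=(100, 3))
A0 = np.zeros((3, 3))
print(neg_loglik(A0, x))  # at A = 0 this is 0.5 * ||x||_F^2
```

Since $l$ is a quadratic form in the entries of $A$, all the nonconvexity of the learning problem comes from the acyclicity and sparsity constraints, not from the loss itself.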

In addition to the acyclicity constraints, they also impose a constraint to regularize the sparsity of $A$. The directed acyclic graph learning problem can then be formulated as

$$\min_{A}\ l(A)\quad\text{subject to}\quad \sum_{i,j:\,i\neq j} I(A_{ij}\neq 0)\le K, \qquad \sum_{1\le k\le L} I(A_{j_{k+1} j_k}\neq 0)\le L-1,\ \ j_{L+1}=j_1,$$

for all $\{(j_1,\ldots,j_L): j_1=\min_{1\le k\le L} j_k,\ j_1,\ldots,j_L \text{ distinct},\ L=2,\ldots,p\}$.

Applying their Theorem 1, and approximating the indicator functions by piecewise-linear functions $J_\tau$ to circumvent the discontinuity, the above problem can be reduced further and attacked with difference convex programming:

$$\min_{(A,\lambda)}\ l(A)\quad\text{subject to}\quad \sum_{i,j:\,i\neq j} J_\tau(A_{ij})\le K,$$

together with the reduced acyclicity constraints, written in terms of auxiliary variables $\lambda_{jk}$ and the surrogate $J_\tau$ in place of the indicators (see the paper for their exact form).

In practice, $\tau$ and $K$ are treated as tuning parameters, optimized using an independent validation set or by cross-validation over a grid on their domain.

They solve this constrained minimization through difference convex programming and the alternating direction method of multipliers.
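The difference-of-convex structure comes from the surrogate itself. The truncated-L1 function $J_\tau(z)=\min(|z|/\tau,1)$ is the standard piecewise-linear surrogate in this line of work (I am assuming this form here; the paper's exact definition may differ in details), and it decomposes as a difference of two convex functions, which the DC algorithm then linearizes iteratively:

```python
import numpy as np

def J_tau(z, tau):
    """Truncated-L1 surrogate for the indicator I(z != 0): min(|z|/tau, 1)."""
    return np.minimum(np.abs(z) / tau, 1.0)

def J_tau_dc(z, tau):
    """The same function written as a difference of two convex pieces:
    J_tau(z) = |z|/tau - max(|z|/tau - 1, 0)."""
    return np.abs(z) / tau - np.maximum(np.abs(z) / tau - 1.0, 0.0)

z = np.linspace(-3, 3, 7)
print(np.allclose(J_tau(z, 0.5), J_tau_dc(z, 0.5)))  # True: the two forms agree
```

At each DC iteration the concave part $-\max(|z|/\tau - 1, 0)$ is replaced by its linear majorization at the current iterate, leaving a convex subproblem that ADMM can handle.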


Published in categories Note