
Infinite Relational Model

Tags: Chinese Restaurant Process, Clustering, Infinite Relational Model

This note is based on Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T., & Ueda, N. (2006). Learning Systems of Concepts with an Infinite Relational Model. AAAI 2006; and Saad, F. A., & Mansinghka, V. K. (2021). Hierarchical Infinite Relational Model. arXiv:2108.07208 [cs, stat].

Suppose we are given one or more relations involving one or more types.

Goal: partition each type into clusters, where a good set of partitions allows relationships between entities to be predicted by their cluster assignments.

  • organize the entities into clusters that relate to each other in predictable ways
  • predicate types: when multiple relations are defined over the same domain, group them into a type and refer to them as predicates

e.g., given several social predicates defined over the domain people × people, such as likes(·, ·), admires(·, ·), …, we can introduce a type for these social predicates and define a ternary relation applies(i, j, p), which is true if predicate $p$ applies to the pair $(i, j)$.
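As a concrete illustration of this encoding (the data and names here are invented for the example, not taken from the paper), the sketch below folds two binary social predicates into a single ternary relation:

```python
# Hypothetical data: two binary social predicates over people × people,
# each stored as the set of pairs (i, j) for which it is true.
likes   = {(0, 1), (1, 2)}
admires = {(0, 2)}

predicates = {"likes": likes, "admires": admires}
n_people = 3

# applies[(i, j, p)] is True iff predicate p applies to the pair (i, j).
applies = {
    (i, j, p): (i, j) in pairs
    for p, pairs in predicates.items()
    for i in range(n_people)
    for j in range(n_people)
}
```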

Suppose that the observed data are $m$ relations involving $n$ types. Let $R^i$ be the $i$-th relation, $T^j$ be the $j$-th type, and $z^j$ be a vector of cluster assignments for $T^j$. The task is to infer the cluster assignments; ultimately, we are interested in the posterior distribution

\[P(z^1,\ldots, z^n\mid R^1,\ldots, R^m)\]

HIRM

Given a set of relations defined over a collection of domains, the model first infers multiple non-overlapping clusters of relations using a top-level Chinese restaurant process.

Within each cluster of relations, a Dirichlet process mixture is then used to partition the domain entities and model the probability distribution of relation values.
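A minimal sketch of this two-level prior (toy code with invented relation names and a single domain of people; not the paper's implementation):

```python
import random

def crp(items, gamma):
    """Sample a CRP partition of `items`; returns item -> cluster label."""
    labels, counts = {}, []
    for x in items:
        weights = counts + [gamma]            # existing clusters, then a new one
        j = random.choices(range(len(weights)), weights=weights)[0]
        if j == len(counts):
            counts.append(0)
        counts[j] += 1
        labels[x] = j
    return labels

# Top level: a CRP partitions the relations into non-overlapping clusters.
relations = ["likes", "admires", "works_with"]   # all over people × people
blocks = crp(relations, gamma=1.0)               # relation -> block label

# Within each block: an independent clustering of the domain entities.
people = list(range(8))
entity_clusters = {b: crp(people, gamma=1.0) for b in set(blocks.values())}
```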

The HIRM generalizes the standard infinite relational model and can be used for a variety of data analysis tasks including dependence detection, clustering, and density estimation.

The paper also presents new algorithms for fully Bayesian posterior inference via Gibbs sampling.
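The paper's samplers target the full HIRM; as a much smaller illustration of the flavor of such updates, the following is a sketch of one collapsed Gibbs step for the plain IRM defined below, specialized to a single binary relation over one domain with a Beta prior and Bernoulli likelihood (the function names and hyperparameters are my own, not the paper's):

```python
import numpy as np
from scipy.special import betaln

def log_marginal(R, z, a, b):
    """log P(R | z) for a binary relation R (n × n) over one domain,
    with each block parameter theta(p, q) ~ Beta(a, b) integrated out."""
    lp = 0.0
    for p in range(z.max() + 1):
        for q in range(z.max() + 1):
            block = R[np.ix_(z == p, z == q)]
            n1 = block.sum()
            n0 = block.size - n1
            lp += betaln(a + n1, b + n0) - betaln(a, b)  # empty blocks add 0
    return lp

def gibbs_update(l, z, R, gamma, a, b, rng):
    """Resample z[l]: CRP prior weight (n_j for an existing cluster,
    gamma for a fresh one) times the marginal likelihood. Mutates z."""
    others = np.delete(z, l)
    labels, counts = np.unique(others, return_counts=True)
    cands = np.append(labels, labels.max() + 1)   # existing clusters + a new one
    log_w = np.log(np.append(counts, gamma))
    for i, c in enumerate(cands):
        z_try = z.copy()
        z_try[l] = c
        log_w[i] += log_marginal(R, z_try, a, b)
    w = np.exp(log_w - log_w.max())               # stabilize before normalizing
    z[l] = rng.choice(cands, p=w / w.sum())
```

Sweeping `gibbs_update` over all entities (and, in the HIRM case, also over relation assignments) gives one full Gibbs scan.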

For relational data, we observe attributes of and interactions among a set of entities, and the goal is to learn models that are useful for explaining or making predictions about the entities, their attributes, and/or their interactions.

A relational system $S$ consists of $n$ domains $D_1,\ldots,D_n$ and $m$ relations $R_1,\ldots,R_m$. Each domain $D_i$ ($1\le i\le n$) is a countably infinite set of distinct entities $\{e_1^i, e_2^i, \ldots\}$. Each relation $R_k$ ($1\le k\le m$) is a map from the Cartesian product of $t_k$ domains to an arbitrary codomain $C_k$. The symbol $d_{ki}$ ($1\le k\le m$, $1\le i\le t_k$) denotes the domain index of the $i$-th argument of $R_k$.
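In code, such a system might be represented as follows (a sketch with invented names, using 0-based domain indices instead of the paper's 1-based ones):

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class Relation:
    name: str
    arg_domains: Tuple[int, ...]            # (d_k1, ..., d_kt_k): one domain index per argument
    values: Dict[Tuple[int, ...], object]   # observed entries, mapping into the codomain C_k

domains = ["people", "movies"]                                         # D_1, D_2
likes = Relation("likes", arg_domains=(0, 0), values={(0, 1): True})   # people × people
rated = Relation("rated", arg_domains=(0, 1), values={(0, 3): 4})      # people × movies
```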

Consider a system $S$ with $n$ domains and $m$ relations. For each $i=1,\ldots,n$, the IRM assumes that the entities $\{e_1^i, e_2^i, \ldots\}$ in domain $D_i$ are associated with integer cluster assignments $\{z_1^i, z_2^i, \ldots\} =: z^i$.

The IRM defines a joint probability distribution over cluster assignments and relation values with the following factorization structure:

\[P(z^1,\ldots,z^n, R_1,\ldots,R_m) = \prod_{i=1}^nP(z^i)\prod_{k=1}^mP(R_k\mid z^1,\ldots,z^n)\]

To allow the IRM to discover an arbitrary number of clusters for each domain $D_i$, the cluster assignments $z^i$ for the entities are given a nonparametric prior that assigns positive probability to every possible partition: the Chinese restaurant process (CRP).

For each $i=1,\ldots,n$, the cluster assignment probabilities $P(z^i) = P(z_1^i, z_2^i, \ldots)$ are defined inductively with $z_1^i := 1$, and for $l\ge 2$,

\[P(z_l^i=j\mid z_1^i,\ldots,z_{l-1}^i) \propto \begin{cases} n_j & \text{if } 1\le j\le M\\ \gamma & \text{if }j=M+1 \end{cases}\]

where $n_j$ is the number of previous entities assigned to cluster $j$, $M$ is the number of clusters among the first $l-1$ entities, and $\gamma > 0$ is a concentration parameter.
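This predictive rule translates directly into code; the helper below (my naming) returns the normalized probabilities over the $M$ existing clusters plus a new one:

```python
from collections import Counter

def crp_next_probs(z_prev, gamma):
    """P(z_l = j | z_1, ..., z_{l-1}) for j = 1, ..., M+1 under a CRP,
    where z_prev holds the (1-based) labels of the first l-1 entities."""
    counts = Counter(z_prev)                  # n_j: previous entities in cluster j
    M = len(counts)
    weights = {j: counts[j] for j in sorted(counts)}
    weights[M + 1] = gamma                    # weight for opening a new cluster
    total = sum(weights.values())             # (l - 1) + gamma
    return {j: w / total for j, w in weights.items()}

print(crp_next_probs([1, 1, 2, 1], gamma=1.0))
# {1: 0.6, 2: 0.2, 3: 0.2}
```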

Next, for each relation $R_k$ ($1\le k\le m$), a set of parameters $\theta_k(j_1,\ldots,j_{t_k})$, one per tuple of clusters, dictates the distribution of the relation values $R_k(i_1,\ldots,i_{t_k})$ whose arguments fall in clusters $j_1,\ldots,j_{t_k}$.

The generative model of the IRM is

\[\begin{align} \{z_1^i,z_2^i,\ldots\} & \sim CRP(\gamma_i)\\ \theta_k(j_1,\ldots,j_{t_k}) & \sim \pi_k(\lambda_k)\\ R_k(i_1,\ldots,i_{t_k}) &\sim L_k(\theta_k(z_{i_1}^{d_{k1}}, \ldots,z_{i_{t_k}}^{d_{kt_k}})) \end{align}\]
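To make these three lines concrete, here is a sketch of an end-to-end sample for the special case of one domain and one binary relation, taking $\pi_k$ to be a Beta prior and $L_k$ a Bernoulli likelihood (the hyperparameter values are invented):

```python
import numpy as np

def crp_sample(n, gamma, rng):
    """Sample CRP cluster assignments for n entities (0-based labels)."""
    z, counts = [], []
    for _ in range(n):
        w = np.append(counts, gamma)          # existing clusters, then a new one
        j = rng.choice(len(w), p=w / w.sum())
        if j == len(counts):
            counts.append(0)
        counts[j] += 1
        z.append(j)
    return np.array(z)

rng = np.random.default_rng(0)
n, gamma, a, b = 10, 1.0, 0.5, 0.5

z = crp_sample(n, gamma, rng)                 # z^i ~ CRP(gamma_i)
M = z.max() + 1
theta = rng.beta(a, b, size=(M, M))           # theta_k(j1, j2) ~ pi_k = Beta(a, b)
R = rng.binomial(1, theta[z[:, None], z[None, :]])   # R_k(i1, i2) ~ L_k = Bernoulli
```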

Limitations of the IRM

  • enforcing domain clusterings that are shared across all relations can lead to overfitting
  • the single shared partition per domain is restrictive when clustering multiple relations
