Biclustering on Gene Expression Data

Posted on Nov 10, 2021

Tags: Biclustering

The note is based on Padilha, V. A., & Campello, R. J. G. B. (2017). A systematic comparative evaluation of biclustering techniques. BMC Bioinformatics, 18(1), 55.

Biclustering (simultaneous clustering of rows and columns) is quite popular and widely-used in gene expression data. (Simple googling with keywords “review biclustering” returns many papers on gene expression data)

The drawbacks of partitional or hierarchical clustering algorithm:

each gene or experimental condition is assigned to only one cluster
each gene or experimental condition must be assigned to some cluster

Greedy Algorithm:

Cheng and Church’s Algorithm (CCA): add/remove rows and columns to minimize the mean squared residue measure.
OPSM: search biclusters with rows following the same linear order under the same columns
xMOTIFs: nondeterministic algorithm for discrete data matrix
Iterative Signature Algorithm (ISA): starts with randomly selected genes and experimental conditions, evaluating and updating them through iterative steps until convergence.
Minimum Sum-Squared Residue Coclustering (MSSRCC): finding checkerboard patterns

Qualitative Biclustering: discretizes the input, and build a graph where each node corresponds to a gene, each edge has a weight equal to the number of experimental conditions for which two genes have the same nonzero integer values, then searches for biclusters corresponding to heavy subgraphs
Combinatorial Algorithm for Expression and Sequence-Based Cluster Extraction: initiate a bicluster with two maximally correlated genes across all experimental conditions
Correlated Pattern Biclusters (CPB): starts with randomly selected subsets of rows and columns, and searches for biclusters with highly correlated rows regarding some reference genes
Large Average Submatrices (LAS): search for submatrices by locally maximizing a Bonferroni-based significance score.

Divide-and-conquer algorithms:

Split a problem into smaller instances, solving each one recursively and combining their solutions into the solution for the original problem.

Binary Inclusion-Maximal Biclustering Algorithm (Bimax): recursively divides a binary matrix for searching submatrices formed with entries whose values ae all equal to one.

Exhaustive enumeration algorithms:

Assume that the best submatrices can only be identified by generating all possible row and column combinations of the dataset.

Statistical-Algorithmic Method for Bicluster Analysis (SAMBA): model the input datasets as a bipartite graph, where one set of nodes corresponds to the genes and the other is related to the experimental conditions
Bit-Pattern Biclustering Algorithm: search for maximal biclusters in binary datasets by applying the logical AND operator over all possible gene pairs.
Differentially Expressed Biclusters (DeBi): based on a frequent itemset approach

Distribution parameter identification algorithms:

Assume some statistical model related to the structure of the biclusters and then apply some iterative procedure to adapt its parameters.

Plaid: the expression value in a bicluster is the sum of the main effect $\mu$, the gene effect $\alpha_i$, the condition effect $\beta_j$, and the noise term $y_{ij}=(\mu_0+\alpha_{i0}+\beta_{j0})+\sum_{k=1}^K(\mu_k+\alpha_{ik}+\beta_{jk})\rho_{ik}\kappa_{jk} + \varepsilon_{ij}$
Bayesian BiClustering (BBC)
- for a single bicluster, same model as in Plaid model
- for multiple biclusters, constrain the overlapping of biclusters to only one direction: either gene $\sum_{k=1}^K\rho_{ik}\le 1$ or condition $\sum_{k=1}^K\kappa_{jk}\le 1$.

Spectral: use SVD to simultaneously cluster genes and experimental conditions to find checkerboard patterns
Factor Analysis for Bicluster Acquisition (FABIA): assume a multiplicative model, and uses a factor analysis approach

Published in categories Note

← previous next →

See all posts →

WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.