# Biclustering on Gene Expression Data

##### Posted on Nov 10, 20210 Comments
Tags: Biclustering

Biclustering (simultaneous clustering of rows and columns) is quite popular and widely-used in gene expression data. (Simple googling with keywords “review biclustering” returns many papers on gene expression data)

The drawbacks of partitional or hierarchical clustering algorithm:

• each gene or experimental condition is assigned to only one cluster
• each gene or experimental condition must be assigned to some cluster

## Greedy Algorithm:

• Cheng and Church’s Algorithm (CCA): add/remove rows and columns to minimize the mean squared residue measure.
• OPSM: search biclusters with rows following the same linear order under the same columns
• xMOTIFs: nondeterministic algorithm for discrete data matrix
• Iterative Signature Algorithm (ISA): starts with randomly selected genes and experimental conditions, evaluating and updating them through iterative steps until convergence.
• Minimum Sum-Squared Residue Coclustering (MSSRCC): finding checkerboard patterns

• Qualitative Biclustering: discretizes the input, and build a graph where each node corresponds to a gene, each edge has a weight equal to the number of experimental conditions for which two genes have the same nonzero integer values, then searches for biclusters corresponding to heavy subgraphs
• Combinatorial Algorithm for Expression and Sequence-Based Cluster Extraction: initiate a bicluster with two maximally correlated genes across all experimental conditions
• Correlated Pattern Biclusters (CPB): starts with randomly selected subsets of rows and columns, and searches for biclusters with highly correlated rows regarding some reference genes
• Large Average Submatrices (LAS): search for submatrices by locally maximizing a Bonferroni-based significance score.

## Divide-and-conquer algorithms:

Split a problem into smaller instances, solving each one recursively and combining their solutions into the solution for the original problem.

• Binary Inclusion-Maximal Biclustering Algorithm (Bimax): recursively divides a binary matrix for searching submatrices formed with entries whose values ae all equal to one.

## Exhaustive enumeration algorithms:

Assume that the best submatrices can only be identified by generating all possible row and column combinations of the dataset.

• Statistical-Algorithmic Method for Bicluster Analysis (SAMBA): model the input datasets as a bipartite graph, where one set of nodes corresponds to the genes and the other is related to the experimental conditions
• Bit-Pattern Biclustering Algorithm: search for maximal biclusters in binary datasets by applying the logical AND operator over all possible gene pairs.
• Differentially Expressed Biclusters (DeBi): based on a frequent itemset approach

## Distribution parameter identification algorithms:

Assume some statistical model related to the structure of the biclusters and then apply some iterative procedure to adapt its parameters.

• Plaid: the expression value in a bicluster is the sum of the main effect $\mu$, the gene effect $\alpha_i$, the condition effect $\beta_j$, and the noise term $$y_{ij}=(\mu_0+\alpha_{i0}+\beta_{j0})+\sum_{k=1}^K(\mu_k+\alpha_{ik}+\beta_{jk})\rho_{ik}\kappa_{jk} + \varepsilon_{ij}$$
• Bayesian BiClustering (BBC)
• for a single bicluster, same model as in Plaid model
• for multiple biclusters, constrain the overlapping of biclusters to only one direction: either gene $\sum_{k=1}^K\rho_{ik}\le 1$ or condition $\sum_{k=1}^K\kappa_{jk}\le 1$.

• Spectral: use SVD to simultaneously cluster genes and experimental conditions to find checkerboard patterns
• Factor Analysis for Bicluster Acquisition (FABIA): assume a multiplicative model, and uses a factor analysis approach

Published in categories Note