These are my notes from reading the paper "Maximal information component analysis".
A basic assumption
Many co-expression methods assume that relationships in a biological network can be accurately described using a linear dependence measure such as Pearson correlation or a monotonic dependence measure such as Spearman’s correlation.
However, linear or monotonic relationships approximate only a fraction of the true relationship types observed in a biological system.
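To make the limitation concrete, here is a minimal sketch (my own toy data, not from the paper) comparing Pearson and Spearman on a monotonic cubic relationship and on a U-shaped one. The helper names `pearson`, `ranks` and `spearman` are written just for this illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy data (assumption): a symmetric range of integer "expression" values.
x = list(range(-50, 51))
monotonic = [a ** 3 for a in x]   # non-linear but monotonic
u_shaped = [a ** 2 for a in x]    # fully dependent on x, but not monotonic

print("cubic:    ", pearson(x, monotonic), spearman(x, monotonic))
print("U-shaped: ", pearson(x, u_shaped), spearman(x, u_shaped))
```

The cubic case scores high on both measures, while the U-shaped case scores essentially zero on both even though `u_shaped` is completely determined by `x`.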
Mutual information (MI) is capable of identifying non-linear connections in the data, and has been used in several previously described algorithms.
A drawback of MI, difficult to address in some cases, has been its sensitivity to bin size and number, as well as an unsatisfying [0, infinity) range.
A modification to MI termed Maximal Information-based Non-parametric Exploration (MINE) has been described that eliminates these two limitations of MI by identifying the ideal bin size and renormalizing the MI measure into a [0,1] state space.
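The two limitations and the idea behind the fix can be sketched in a few lines. This is a simplified illustration under my own assumptions, not the MINE algorithm itself: it uses equal-width bins with the same bin count on both axes, and the function names `binned_mi` and `normalized_max_mi` are hypothetical. The actual method searches over grid partitions far more carefully.

```python
import math
import random

def binned_mi(x, y, bins):
    """Plug-in mutual information (in bits) using equal-width bins per axis."""
    def bin_index(v, lo, hi):
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    lox, hix, loy, hiy = min(x), max(x), min(y), max(y)
    n = len(x)
    joint = {}
    px = [0] * bins
    py = [0] * bins
    for a, b in zip(x, y):
        i, j = bin_index(a, lox, hix), bin_index(b, loy, hiy)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] += 1
        py[j] += 1
    # Sum p(i,j) * log2( p(i,j) / (p(i) * p(j)) ) over occupied cells.
    return sum((c / n) * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())

def normalized_max_mi(x, y, max_bins=8):
    """Best MI over several grid sizes, each divided by its upper bound log2(bins)."""
    return max(binned_mi(x, y, b) / math.log2(b) for b in range(2, max_bins + 1))

rng = random.Random(0)
x = [rng.random() for _ in range(200)]
noise = [rng.random() for _ in range(200)]   # independent of x by construction

# Raw MI of two unrelated variables grows just because the grid gets finer.
for b in (2, 5, 10, 20):
    print(b, round(binned_mi(x, noise, b), 3))

# The normalized score stays in [0, 1].
print(round(normalized_max_mi(x, noise), 3))
```

Dividing by `log2(bins)` works because MI on a `bins` by `bins` grid can never exceed the entropy of either binned variable, which is at most `log2(bins)`.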
Another common assumption
genes are clustered into modules after the underlying network structure has been identified
- Many methods adopt a strict clustering approach, where genes are partitioned uniquely into a single module per gene.
- In some cases, this is done out of necessity (hierarchical tree-based methods), but in many cases it is done purely for computational efficiency.
Every gene can be found in many modules. (my idea)
Although convenient and fast, clustering methods that force genes to uniquely exist in a single module result in incomplete modules, missing key genes that link the modules to one another.
MICA avoids some of the above unlikely assumptions made by other network algorithms.
MICA ALLOWS GENES TO EXIST WITHIN MULTIPLE MODULES
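A small sketch of the difference between strict and overlapping assignment, assuming each gene already has a per-module membership score. The scores, gene names and the functions `assign_by_cutoff` and `assign_strict` are all hypothetical. How MICA actually derives its memberships is not covered in these notes.

```python
def assign_by_cutoff(membership, cutoff):
    """Put a gene in every module where its membership score reaches the cutoff."""
    modules = {}
    for gene, scores in membership.items():
        for module, score in scores.items():
            if score >= cutoff:
                modules.setdefault(module, set()).add(gene)
    return modules

def assign_strict(membership):
    """Put each gene only in its single best-scoring module."""
    modules = {}
    for gene, scores in membership.items():
        best = max(scores, key=scores.get)
        modules.setdefault(best, set()).add(gene)
    return modules

# Made-up membership scores for three genes and two modules.
membership = {
    "geneA": {"M1": 0.90, "M2": 0.05},
    "geneB": {"M1": 0.50, "M2": 0.45},   # sits between the two modules
    "geneC": {"M1": 0.10, "M2": 0.80},
}

fuzzy = assign_by_cutoff(membership, cutoff=0.3)
strict = assign_strict(membership)
print(fuzzy)
print(strict)
```

With the cutoff, `geneB` lands in both modules. Strict assignment keeps it only in `M1`, so `M2` loses the gene that links it to `M1`.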
MICA REPRODUCES SCALE-FREE TOPOLOGY
PRINCIPAL COMPONENT ANALYSIS IS CONSERVED ACROSS A WIDE RANGE OF POTENTIAL MM CUTOFFS
Two common goals of module construction algorithms
- the identification of enriched pathways, domains, and molecular functions within modules,
- the discovery of modules which are strongly correlated with disease severity or other phenotypes of interest
??? one weighted PCA ??? directly from GWAS
STABILITY OF EIGENGENES ALLOWS FOR SELECTION OF OPTIMAL MODULES IN TERMS OF SIZE AND GENE-SET ENRICHMENT
Network analysis and module construction should prioritize specific pathways and genes for further analysis by targeted approaches. To achieve that goal, ideal modules should be both highly enriched for specific gene categories, and also small enough to reasonably examine all the genes in the module for interesting candidates and drivers without eliminating large numbers of genes from consideration.
- DAVID enrichment scores were used to determine this ideal cutoff and identify the optimal modules for further analysis
- a metric that incorporates both module sizes and enrichment significance was then applied
COMPARISON OF MICA TO WGCNA
two recently described gene expression microarray datasets from a large mouse panel
- one from control and OxPAPC-treated macrophages
- another from liver
several measures of network fitness
- the SFT criterion defined by comparing the observed distribution of edge connections across the inferred network to the power-law distribution of an ideal scale-free system
- The next comparison metric is perplexity, a measure of the entropy of a system
- utilized differences in GO enrichments as one measure of network fitness, but felt that a strict comparison of GO enrichment values only captured part of the overall “usefulness” of the constructed modules. **a combination of DAVID gene-set enrichment, module size and number of genes unplaced in modules** (Ideally, as many modules as possible in a network should be highly enriched and reasonably small to assist in further study.)
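One common way to check scale-free topology is to regress log p(k) on log k over the node degrees and look at the R² of the fit, since a power law is a straight line on log-log axes. The sketch below does this on made-up degree lists. The function name `scale_free_fit` is hypothetical, and the paper's exact procedure (for example, how degrees are binned) may differ.

```python
import math
from collections import Counter

def scale_free_fit(degrees):
    """Least-squares fit of log10 p(k) against log10 k.

    Returns (slope, r_squared). An r_squared near 1 with a negative slope
    means the degree distribution looks like a power law.
    """
    counts = Counter(d for d in degrees if d > 0)
    if len(counts) < 2:
        raise ValueError("need at least two distinct positive degrees")
    n = sum(counts.values())
    xs = [math.log10(k) for k in counts]
    ys = [math.log10(c / n) for c in counts.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    ss_res = sum((y - (my + slope * (x - mx))) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, 1 - ss_res / ss_tot

# Made-up degree lists: node counts fall off as k^-2 in the first one,
# while the second is peaked around k = 2.
power_law_like = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
peaked = [1] * 1 + [2] * 10 + [3] * 1

print(scale_free_fit(power_law_like))
print(scale_free_fit(peaked))
```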
The MICA-analyzed network contained more modules that were significantly enriched for these GO terms, with six modules being enriched for one or more terms of interest as opposed to four in WGCNA.
- a dataset consisting of 7000 highly expressed genes from livers taken from a large panel of mouse strains.
- small power
- In order to determine the overall stability of the modules observed in both WGCNA and MICA, we randomly partitioned the macrophage dataset into two equal parts and ran both MICA and WGCNA on each half.
EFFECTS OF DATASET ON MICA
If there are more non-linear interactions in a dataset, then MICA should perform better than WGCNA, which does not take into account the non-linear interactions in the data.
While the linear interactions will be picked up by Pearson correlation, the increased number of non-linear interactions can only be detected appropriately through MINE.
two possible explanations for the differences between the two datasets
- the macrophage dataset is an in vitro system containing a single cell type, while the liver samples contain multiple cell types.
- the improvement comes because we analyzed both treated and untreated data together, rather than separately.
- These observations suggest that the improvement observed when using MICA on the macrophage dataset is a result of MICA’s ability to capture gene by environment interactions between the treated and control samples.
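A toy illustration of why pooling treated and control samples can hide a relationship from Pearson correlation while a binned MI still sees it. The data are invented: two hypothetical genes move together in control samples and in opposite directions in treated samples. `pearson` and `binned_mi` are simple helpers written for this sketch, not the paper's implementation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def binned_mi(x, y, bins):
    """Plug-in mutual information (in bits) using equal-width bins per axis."""
    def bin_index(v, lo, hi):
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    lox, hix, loy, hiy = min(x), max(x), min(y), max(y)
    n = len(x)
    joint, px, py = {}, [0] * bins, [0] * bins
    for a, b in zip(x, y):
        i, j = bin_index(a, lox, hix), bin_index(b, loy, hiy)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] += 1
        py[j] += 1
    return sum((c / n) * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())

# Invented expression values for gene 1, reused in both conditions.
gene1 = list(range(-20, 21))
control_gene2 = [a for a in gene1]     # control: gene 2 follows gene 1
treated_gene2 = [-a for a in gene1]    # treated: the relationship flips sign

pooled_gene1 = gene1 + gene1
pooled_gene2 = control_gene2 + treated_gene2

print("control r:", pearson(gene1, control_gene2))
print("treated r:", pearson(gene1, treated_gene2))
print("pooled r: ", pearson(pooled_gene1, pooled_gene2))
print("pooled MI:", binned_mi(pooled_gene1, pooled_gene2, bins=4))
```

The two within-condition correlations cancel in the pooled data, so Pearson reports no relationship, while the X-shaped joint distribution still carries substantial mutual information.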
MICA may be particularly well suited for the analysis of networks in which gene by environment interactions are expected to occur
- it does not discard non-linear interactions;
- it removes the need for soft thresholding
- employs a fuzzy clustering algorithm for module detection
- shows improvements over correlation algorithms in certain cases, particularly those involving gene by environment interactions.
perplexity: a measure of the entropy of a system
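For reference, the usual definition is perplexity = 2^H, where H is the Shannon entropy in bits. A minimal sketch on a plain probability distribution follows. The function names are mine, and how the paper maps a network onto a distribution is not covered in these notes.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def perplexity(probs):
    """Perplexity = 2 ** entropy: the effective number of equally likely outcomes."""
    return 2 ** entropy_bits(probs)

print(perplexity([0.25, 0.25, 0.25, 0.25]))  # uniform over four outcomes
print(perplexity([0.7, 0.1, 0.1, 0.1]))      # skewed, so lower than the uniform case
print(perplexity([1.0]))                     # one certain outcome
```

Lower perplexity means a more concentrated, more predictable distribution.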