Cluster Analysis of Transcriptomic Datasets of IPF
Considerable clinical heterogeneity in idiopathic pulmonary fibrosis (IPF) suggests the existence of multiple disease endotypes.
methods:
- co-normalized, pooled, and clustered three publicly available blood transcriptomic datasets (220 IPF cases in total)
- compared clinical traits across clusters and used gene enrichment analysis to identify biological pathways and processes over-represented among the genes differentially expressed across clusters
- developed a gene-based classifier and validated it in three additional independent datasets (194 IPF cases in total)
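The enrichment step is usually an over-representation test: is a pathway's gene set hit by more differentially expressed genes than chance would predict? A minimal sketch, assuming a one-sided hypergeometric test with Benjamini-Hochberg correction (the notes do not say which enrichment tool or gene-set database the authors used); `hypergeom_enrichment_p`, `benjamini_hochberg`, and `enrich` are hypothetical helper names.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_de, n_overlap):
    """One-sided P(X >= n_overlap), X ~ Hypergeometric(N=n_universe, K=n_pathway, n=n_de)."""
    upper = min(n_pathway, n_de)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_de)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(de_genes, universe, pathways):
    """Test every pathway for over-representation among the DE genes."""
    universe = set(universe)
    de = set(de_genes) & universe
    names, pvals = [], []
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = len(de & members)
        names.append(name)
        pvals.append(hypergeom_enrichment_p(len(universe), len(members), len(de), overlap))
    qvals = benjamini_hochberg(pvals)
    return {name: (p, q) for name, p, q in zip(names, pvals, qvals)}
```

The gene universe matters: it should be the genes actually measured on the arrays, since a larger background inflates apparent enrichment.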
findings:
- identified three clusters of patients with IPF, with statistically significant differences in lung function and mortality between the clusters
- developed and validated a 13-gene cluster classifier that predicted mortality in IPF (high-risk vs low-risk clusters: HR 4.25). Open question: with three clusters, how were the high-risk and low-risk groups defined?
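An HR such as 4.25 is a hazard ratio, typically estimated with a Cox proportional-hazards model. Because the notes leave open how three clusters become "high-risk" and "low-risk", the sketch below simply assumes a 0/1 group indicator is already given. It fits a Cox model with that single binary covariate by Newton-Raphson on the Breslow partial likelihood; `cox_hr_binary` is a hypothetical name, and the covariates or stratification a real analysis would use are not handled.

```python
import math


def cox_hr_binary(times, events, group, max_iter=100, tol=1e-10):
    """Hazard ratio for group == 1 vs group == 0 from a one-covariate Cox model.

    Breslow handling of tied event times, Newton-Raphson on the log partial
    likelihood. Returns (hazard_ratio, (ci_low, ci_high)), a Wald 95% interval.
    """
    n = len(times)
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue
            # Risk set: everyone still under observation at this event time.
            s0 = s1 = 0.0
            for j in range(n):
                if times[j] >= times[i]:
                    w = math.exp(beta * group[j])
                    s0 += w
                    s1 += group[j] * w
            p1 = s1 / s0  # weighted share of the risk set in group 1
            score += group[i] - p1
            info += p1 * (1.0 - p1)  # 0/1 covariate, so its risk-set variance is p(1-p)
        if info <= 0.0:
            raise ValueError("no information: need events with both groups at risk")
        step = max(-2.0, min(2.0, score / info))  # damp large Newton steps
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)  # from the observed information at (near) convergence
    return math.exp(beta), (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
```

With perfect separation (every event in one group while the other is at risk) the estimate diverges, which is why the function only guards the zero-information case and a production analysis would use an established survival package.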
interpretation:
- identified blood gene expression signatures capable of distinguishing groups of patients with IPF that differ significantly in survival
discovery stage:
- co-normalized the discovery datasets using the COmbat CO-Normalization Using conTrols (COCONUT) method
- used the Combined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL) method to identify the optimal number of clusters in the pooled, co-normalized data
- developed a gene expression-based classifier
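COCONUT anchors batch correction on healthy controls: ComBat is run on the control samples only, and each dataset's resulting adjustment is applied to its disease samples, so the case-vs-control signal is not removed along with the batch effect. The sketch below is a deliberately simplified version, assuming each dataset is a genes × samples NumPy array whose rows are already aligned to the same genes, plus a boolean control mask. It uses plain per-gene location/scale adjustment and omits ComBat's empirical Bayes shrinkage; `conormalize_on_controls` is a hypothetical name.

```python
import numpy as np


def conormalize_on_controls(datasets):
    """Control-anchored location/scale co-normalization (simplified, no empirical Bayes).

    datasets: list of (expr, is_control); expr is (genes, samples) with rows aligned
    to the same genes, is_control is a boolean mask over samples (>= 2 controls each).
    Returns the adjusted expression matrices in the same order.
    """
    ctrl = [expr[:, mask] for expr, mask in datasets]
    # Target location: grand mean over all control samples, per gene.
    target_mean = np.concatenate(ctrl, axis=1).mean(axis=1)
    # Target scale: pooled within-dataset control variance, per gene.
    dof = sum(c.shape[1] - 1 for c in ctrl)
    pooled_var = sum((c.shape[1] - 1) * c.var(axis=1, ddof=1) for c in ctrl) / dof
    target_sd = np.sqrt(pooled_var)

    adjusted = []
    for (expr, _), c in zip(datasets, ctrl):
        mu = c.mean(axis=1, keepdims=True)
        sd = c.std(axis=1, ddof=1, keepdims=True)
        sd = np.where(sd > 0, sd, 1.0)  # guard genes that are constant in controls
        # Every sample, control or case, is shifted and scaled with the control parameters.
        z = (expr - mu) / sd
        adjusted.append(z * target_sd[:, None] + target_mean[:, None])
    return adjusted
```

After adjustment the controls of every dataset share the same per-gene mean and spread, while IPF samples keep their offset from their own controls; the cost is that each dataset needs enough healthy controls to estimate the parameters.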
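COMMUNAL runs many clustering algorithms and cluster-validity measures over a range of cluster numbers and looks for the number on which they agree. As a much simpler stand-in (not COMMUNAL itself), the sketch below scores each candidate K with the mean silhouette width from two algorithms, k-means with k-means++ seeding and average-linkage agglomerative clustering, and keeps the K with the highest average. It assumes `X` is a patients × genes NumPy array of co-normalized expression; all function names are hypothetical.

```python
import numpy as np


def pairwise_dist(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def kmeans(X, k, seed=0, iters=100):
    """Lloyd's k-means with k-means++ seeding; returns cluster labels."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def average_linkage(D, k):
    """Agglomerative clustering with average linkage on a distance matrix D."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > k:
        best, pair = np.inf, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    labels = np.empty(len(D), dtype=int)
    for j, members in enumerate(clusters):
        labels[members] = j
    return labels


def silhouette(D, labels):
    """Mean silhouette width; 0 if there are fewer than two clusters."""
    ids = np.unique(labels)
    if len(ids) < 2:
        return 0.0
    s = np.zeros(len(D))
    for i in range(len(D)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            continue  # singleton cluster: silhouette 0 by convention
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean() for c in ids if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return float(s.mean())


def choose_k(X, ks=range(2, 6), seeds=(0, 1, 2)):
    """Pick K by the silhouette averaged across two clustering algorithms."""
    D = pairwise_dist(X)
    scores = {}
    for k in ks:
        km = np.mean([silhouette(D, kmeans(X, k, seed=s)) for s in seeds])
        hc = silhouette(D, average_linkage(D, k))
        scores[k] = float((km + hc) / 2)
    return max(scores, key=scores.get), scores
```

Averaging over algorithms is the point of the exercise: a K favoured by only one method is less trustworthy than one on which different methods agree.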
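The notes do not say which model underlies the 13-gene classifier. One simple way to turn cluster labels into a small gene panel is sketched here purely as an assumption: rank genes by a one-way ANOVA F-statistic across clusters, keep the top `n_genes` (13 to mirror the panel size in the findings), and assign new samples to the nearest cluster centroid on those genes. `select_genes`, `fit_centroids`, and `predict_cluster` are hypothetical names, and `X` is assumed to be a samples × genes NumPy array on a common scale.

```python
import numpy as np


def anova_f(X, labels):
    """One-way ANOVA F-statistic for every gene (column) across cluster labels."""
    classes = np.unique(labels)
    grand = X.mean(axis=0)
    ss_between = sum((labels == c).sum() * (X[labels == c].mean(axis=0) - grand) ** 2
                     for c in classes)
    ss_within = sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum(axis=0)
                    for c in classes)
    df_between, df_within = len(classes) - 1, len(X) - len(classes)
    return (ss_between / df_between) / (ss_within / df_within)


def select_genes(X, labels, n_genes=13):
    """Indices of the n_genes columns with the largest F-statistic."""
    return np.argsort(-anova_f(X, labels))[:n_genes]


def fit_centroids(X, labels, genes):
    """Per-cluster mean expression on the selected genes."""
    classes = np.unique(labels)
    centroids = np.stack([X[labels == c][:, genes].mean(axis=0) for c in classes])
    return classes, centroids


def predict_cluster(X, genes, classes, centroids):
    """Assign each sample to the cluster whose centroid is nearest (Euclidean)."""
    d2 = ((X[:, genes][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[d2.argmin(axis=1)]
```

In a real validation the genes and centroids would be fixed on the discovery data and then applied unchanged to the independent datasets.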
validation stage:
- compared the classifier’s performance at predicting survival in IPF across the three independent validation datasets
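The notes do not specify how survival prediction was scored in the validation datasets. A common way to compare survival between two predicted groups (for example, high-risk vs low-risk) is the log-rank test. A minimal pure-Python sketch follows, with `logrank_test` as a hypothetical name and `group` assumed to be a 0/1 indicator.

```python
import math


def logrank_test(times, events, group):
    """Two-sample log-rank test. Returns (chi-square statistic, p-value) on 1 df."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        at_risk = [g for ti, g in zip(times, group) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, group) if ti == t and e and g == 1)
        # Observed minus expected deaths in group 1 at this event time.
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the risk-set margins.
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("zero variance: both groups must be at risk at some event time")
    chi2 = o_minus_e ** 2 / var
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

For example, `logrank_test([1, 2], [1, 1], [1, 0])` gives a statistic of 1.0 and p ≈ 0.317. A log-rank test only says whether the groups differ; the size of the difference is what a hazard ratio reports.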