WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Cluster Analysis of Transcriptomic Datasets of IPF

Tags: Clustering, IPF

Kraven, L. M., Taylor, A. R., Molyneaux, P. L., Maher, T. M., McDonough, J. E., Mura, M., Yang, I. V., Schwartz, D. A., Huang, Y., Noth, I., Ma, S. F., Yeo, A. J., Fahy, W. A., Jenkins, R. G., & Wain, L. V. (2023). Cluster analysis of transcriptomic datasets to identify endotypes of idiopathic pulmonary fibrosis. Thorax, 78(6), 551–558.

Considerable clinical heterogeneity in idiopathic pulmonary fibrosis (IPF) suggests the existence of multiple disease endotypes.


  • co-normalized, pooled, and clustered three publicly available blood transcriptomic datasets (total 220 IPF cases)

  • compared clinical traits across clusters and used gene enrichment analysis to identify biological pathways and processes over-represented among the genes differentially expressed across clusters

  • A gene-based classifier was developed and validated using three additional independent datasets (total 194 IPF cases)
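To make the idea of a gene-based cluster classifier concrete, here is a minimal sketch using a nearest-centroid rule. This is an assumption for illustration only: the notes above do not say which classification method the paper used, and the function names `fit_centroids` and `predict_cluster` are hypothetical.

```python
from math import dist


def fit_centroids(samples, labels):
    """Average the expression vectors of the samples in each cluster.

    samples: list of expression vectors (one value per classifier gene)
    labels:  cluster label of each sample, taken from the discovery clustering
    """
    groups = {}
    for row, label in zip(samples, labels):
        groups.setdefault(label, []).append(row)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in groups.items()
    }


def predict_cluster(centroids, sample):
    """Assign a new sample to the cluster with the closest centroid (Euclidean)."""
    return min(centroids, key=lambda label: dist(centroids[label], sample))


# Toy data with two genes and two clusters (made-up values, not from the paper).
train = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
clusters = [1, 1, 2, 2]
centroids = fit_centroids(train, clusters)
new_label = predict_cluster(centroids, [9.0, 9.0])
```

In a setting like the one in the notes, each vector would hold the classifier genes (13 in the paper), and the labels would come from the clusters found in the discovery data.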


  • identified three clusters of patients with IPF with statistically significant differences in lung function and mortality between groups
  • developed and validated a 13-gene cluster classifier that predicted mortality in IPF (high-risk clusters vs low-risk clusters: HR 4.25). Open question: with three clusters, how are the low-risk and high-risk groups defined?


  • identify blood gene expression signatures capable of discerning groups of patients with IPF with significant differences in survival

discovery stage

  • co-normalized the discovery datasets using the COmbat CO-Normalization Using conTrols (COCONUT) method
  • used the Combined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL) method to identify the optimal number of clusters within the pooled, co-normalized data
  • developed a gene expression-based classifier
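The co-normalization step can be illustrated with a simplified sketch. It only captures the core idea of using each dataset's controls to remove batch effects: it applies a per-gene location and scale adjustment, whereas COCONUT builds on ComBat's empirical Bayes estimates. The function name `conormalize_with_controls` and the data layout are assumptions.

```python
from statistics import mean, stdev


def conormalize_with_controls(datasets):
    """Align case samples across datasets using each dataset's own controls.

    datasets: dict mapping a dataset name to {"controls": rows, "cases": rows},
              where each row is a list of expression values (one per gene).
    Assumes every gene has non-zero variance among the controls of each dataset.
    """
    pooled = [row for d in datasets.values() for row in d["controls"]]
    n_genes = len(pooled[0])
    # Reference scale: mean and SD of each gene over all pooled controls.
    ref_mu = [mean([r[g] for r in pooled]) for g in range(n_genes)]
    ref_sd = [stdev([r[g] for r in pooled]) for g in range(n_genes)]

    adjusted = {}
    for name, d in datasets.items():
        mu = [mean([r[g] for r in d["controls"]]) for g in range(n_genes)]
        sd = [stdev([r[g] for r in d["controls"]]) for g in range(n_genes)]
        # Standardize cases against their own controls, then map to the reference.
        adjusted[name] = [
            [(row[g] - mu[g]) / sd[g] * ref_sd[g] + ref_mu[g] for g in range(n_genes)]
            for row in d["cases"]
        ]
    return adjusted


# Toy example: dataset "B" is dataset "A" shifted by a batch effect of +10.
toy = {
    "A": {"controls": [[1.0], [3.0]], "cases": [[4.0]]},
    "B": {"controls": [[11.0], [13.0]], "cases": [[14.0]]},
}
aligned = conormalize_with_controls(toy)
```

After the adjustment, the two toy cases land on the same value, so pooled clustering would no longer separate them by dataset of origin.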

validation stage

compare the classifier’s performance at predicting survival in IPF
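One crude way to compare predicted risk groups on survival is a ratio of event rates per unit of follow-up time. The sketch below assumes a constant hazard within each group, so it is only a rough stand-in for a hazard ratio from a proper survival model; the notes do not state which model produced the reported HR. The function name and the toy follow-up data are hypothetical.

```python
def crude_hazard_ratio(high_risk, low_risk):
    """Ratio of death rates per unit follow-up time between two groups.

    Each group is a list of (follow_up_time, died) pairs, with died a bool.
    Assumes a constant hazard in each group and at least one death in low_risk.
    """
    def rate(group):
        deaths = sum(1 for _, died in group if died)
        person_time = sum(time for time, _ in group)
        return deaths / person_time

    return rate(high_risk) / rate(low_risk)


# Made-up follow-up data (years, died) for patients assigned to each group.
high = [(1.0, True), (2.0, True), (2.0, False)]
low = [(4.0, False), (5.0, True), (1.0, False)]
ratio = crude_hazard_ratio(high, low)
```

A ratio above 1 means the predicted high-risk group dies at a faster rate than the low-risk group, which is the direction of the HR reported in the notes.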

Published in categories Note