WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Cell type-specific and disease-associated eQTL in the human lung

Posted on
Tags: eQTL, IPF

This post is for Natri, H. M., Azodi, C. B. D., Peter, L., Taylor, C. J., Chugh, S., Kendle, R., Chung, M., Flaherty, D. K., Matlock, B. K., Calvi, C. L., Blackwell, T. S., Ware, L. B., Bacchetta, M., Walia, R., Shaver, C. M., Kropski, J. A., McCarthy, D. J., & Banovich, N. E. (2023). Cell type-specific and disease-associated eQTL in the human lung (p. 2023.03.17.533161). bioRxiv.

An expression quantitative trait locus (eQTL) is a genetic locus that affects the expression level of a gene; most eQTLs are single-nucleotide polymorphisms (SNPs).

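eQTLs generally fall into two classes: (1) cis-eQTLs, which lie close to the gene they regulate, typically within 1 Mb upstream or downstream of it; (2) trans-eQTLs, which by contrast lie far from the gene they regulate, sometimes more than 5 Mb away. An eQTL analysis therefore usually has to consider two things: the strength of the association between the SNP and the gene's expression level, and the distance between the SNP and the gene.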
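The distance rule above can be sketched in a few lines. This is only an illustration: the `Locus` type, the function name, and the choice to call everything outside the 1 Mb window (including pairs on different chromosomes) "trans" are my own assumptions, not something the sources prescribe.

```python
from dataclasses import dataclass

CIS_WINDOW_BP = 1_000_000  # 1 Mb window, following the definition above


@dataclass(frozen=True)
class Locus:
    """Hypothetical container: chromosome name and base-pair position."""
    chrom: str
    pos: int  # for a gene, assumed here to be a single reference coordinate


def classify_eqtl(snp: Locus, gene: Locus, cis_window: int = CIS_WINDOW_BP) -> str:
    """Label a SNP-gene pair 'cis' or 'trans' by genomic distance (assumed convention)."""
    if snp.chrom != gene.chrom:
        return "trans"  # different chromosome: treated as trans in this sketch
    return "cis" if abs(snp.pos - gene.pos) <= cis_window else "trans"


gene = Locus("chr1", 5_000_000)
print(classify_eqtl(Locus("chr1", 5_400_000), gene))   # 0.4 Mb away on the same chromosome
print(classify_eqtl(Locus("chr1", 12_000_000), gene))  # 7 Mb away on the same chromosome
print(classify_eqtl(Locus("chr2", 5_000_000), gene))   # different chromosome
```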

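To run an eQTL analysis from raw data we need at least three files: (1) a sample-information file, with each sample's age, sex, ancestry, and so on; (2) a gene expression file, giving the expression level of each gene in each sample; (3) a genotype file, giving the genotype of each sample.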

gene1 ~ snp1 + sex + age + error_term

source: https://zhuanlan.zhihu.com/p/378403055
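To make the regression `gene1 ~ snp1 + sex + age + error_term` concrete, here is a minimal sketch with ordinary least squares on simulated data. All numbers (sample size, effect sizes, covariate coding) are made up for illustration; real eQTL tools add normalization, more covariates, and multiple-testing correction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # number of samples (simulated, not real data)

snp = rng.integers(0, 3, size=n).astype(float)  # genotype dosage coded 0/1/2
sex = rng.integers(0, 2, size=n).astype(float)  # 0/1 coding (assumed)
age = rng.uniform(20, 80, size=n)

true_beta = 0.5  # simulated SNP effect on expression
gene1 = 1.0 + true_beta * snp + 0.2 * sex + 0.01 * age + rng.normal(0.0, 1.0, size=n)

# Design matrix: intercept, SNP, sex, age
X = np.column_stack([np.ones(n), snp, sex, age])
coef, *_ = np.linalg.lstsq(X, gene1, rcond=None)

# Standard errors from the usual OLS formula sigma^2 (X'X)^{-1}
resid = gene1 - X @ coef
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

print(f"SNP effect estimate: {coef[1]:.3f} (SE {se[1]:.3f}, t = {coef[1] / se[1]:.2f})")
```

The coefficient on `snp` is the estimated eQTL effect; its t-statistic measures the strength of the SNP-expression association.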

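An expression quantitative trait locus is a chromosomal region that specifically regulates mRNA and protein expression levels, with the mRNA/protein expression level proportional to a quantitative trait. eQTL analysis links variation in gene expression levels to genotype in order to study the association between genetic variants and gene expression.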

source: https://xsliulab.github.io/Workshop/2021/week32/eqtl%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95.html

Background: sc-eQTL studies

Motivation of eQTL studies: many disease-associated variants identified in GWAS are located in the regulatory regions of the genome and contribute to disease risk and progression by effecting changes in gene expression.

  • Bulk RNA-seq -> single cell RNA-seq
  • Tissue-specific -> cell-type-specific or context-dependent

Background: IPF

GWAS and meta-analyses have identified many IPF-associated variants, some of which are eQTLs in bulk lung tissue; however, their cell-type-specific regulatory consequences have not been explored.

Methods

  • Pseudo-bulk mean aggregation.
    • Perform single-cell-level normalization using scran, and then calculate the mean on the resulting normalized (logged) counts.
  • Perform eQTL mapping with a linear mixed model (LMM).
    • The random effect term in the LMM accounts for the expected correlation between cells from the same donor as well as for genetic relatedness between donors, which is encoded in a kinship matrix.
\[Y = \text{covs} + \text{SNP} + u_K + \varepsilon\]

  • A gene was considered an eGene for a cell type if any eQTL for that gene was significant in that cell type.
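A rough sketch of the two steps above, under strong simplifications. The function names are mine; the input to `pseudobulk_mean` is assumed to be already normalized (scran is an R package and is not run here); and the variance components in `lmm_gls` are taken as known, whereas an actual LMM fit would estimate them (for example by REML). The kinship matrix is built from simulated genotypes purely for illustration.

```python
import numpy as np


def pseudobulk_mean(logcounts, donors):
    """Mean of normalized log-counts over the cells of each donor (one cell type)."""
    logcounts = np.asarray(logcounts, dtype=float)
    donors = np.asarray(donors)
    ids = np.unique(donors)
    means = np.vstack([logcounts[donors == d].mean(axis=0) for d in ids])
    return ids, means


def lmm_gls(y, X, K, var_g, var_e):
    """GLS fixed effects for y = X b + u + e, u ~ N(0, var_g K), e ~ N(0, var_e I).

    Variance components are assumed known in this sketch.
    """
    V = var_g * K + var_e * np.eye(len(y))
    XtVinvX = X.T @ np.linalg.solve(V, X)
    beta = np.linalg.solve(XtVinvX, X.T @ np.linalg.solve(V, y))
    se = np.sqrt(np.diag(np.linalg.inv(XtVinvX)))
    return beta, se


# Toy pseudo-bulk: 5 cells x 2 genes from 2 donors
ids, pb = pseudobulk_mean(
    [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0], [0.0, 4.0], [4.0, 2.0]],
    ["d1", "d1", "d1", "d2", "d2"],
)
print(ids, pb)

# Simulated donors: genotype matrix, kinship from standardized genotypes
rng = np.random.default_rng(1)
n, m = 60, 500
G = rng.integers(0, 3, size=(n, m)).astype(float)
Z = (G - G.mean(axis=0)) / G.std(axis=0)
K = Z @ Z.T / m  # simple genetic relatedness matrix (illustrative)

snp = G[:, 0]
X = np.column_stack([np.ones(n), snp])   # covariates here: intercept only, plus the SNP
y = 0.8 * snp + rng.normal(size=n)       # simulated pseudo-bulk expression of one gene
beta, se = lmm_gls(y, X, K, var_g=0.3, var_e=0.7)
print(f"SNP effect: {beta[1]:.3f} (SE {se[1]:.3f})")
```

With `K` set to the identity and `var_g = 0`, `lmm_gls` reduces to ordinary least squares, which is a convenient sanity check.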

int-eQTL: disease-state interaction eQTL
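One common way to test for such an interaction is to add a genotype × disease-status term to the regression and test its coefficient. The sketch below does this with plain least squares on simulated data; whether the paper uses exactly this parameterization is an assumption on my part, and the random effect from the LMM is left out to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400  # simulated samples
genotype = rng.integers(0, 3, size=n).astype(float)  # dosage 0/1/2
disease = rng.integers(0, 2, size=n).astype(float)   # 0 = control, 1 = disease (assumed coding)

# Simulated truth: SNP effect is 0.1 in controls and 0.1 + 0.6 in disease samples
true_interaction = 0.6
expr = (0.1 * genotype + 0.3 * disease
        + true_interaction * genotype * disease
        + rng.normal(size=n))

# Design matrix: intercept, genotype, disease, genotype x disease
X = np.column_stack([np.ones(n), genotype, disease, genotype * disease])
coef, *_ = np.linalg.lstsq(X, expr, rcond=None)

resid = expr - X @ coef
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

print(f"interaction estimate: {coef[3]:.3f} (SE {se[3]:.3f}, t = {coef[3] / se[3]:.2f})")
```

A large interaction t-statistic says the SNP's effect on expression differs between disease and control samples, which is what an int-eQTL captures.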

