Bipartite eQTL Network Construction
We evaluate eQTLs by modeling the association between SNP genotypes and gene expression, given:
- an $r\times n$ matrix $S$ of SNP genotypes
- an $r\times m$ matrix $G$ of gene expression

Each matrix has $r$ rows representing observations. The columns represent the $n$ SNPs (in $S$) and the $m$ genes (in $G$).
We also consider a covariate matrix $X$ with one row per observation. It includes features such as principal components for population structure, sex, and age. The eQTL effect of a particular SNP $i$ on the expression of gene $j$ is modeled as
\[G_j = X\alpha + \beta_{ij}S_i + \varepsilon,\]
where $G_j$ is the $j$th column of $G$, $S_i$ is the $i$th column of $S$, $\alpha$ is the vector of covariate effects, and $\varepsilon$ is the residual error.

The eQTL associations between all pairs of SNPs and genes can be represented as a bipartite network. Each SNP $i$ and each gene $j$ is a node, and a function of their association defines the edges.
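The following is a minimal sketch of the per-pair regression. It assumes ordinary least squares and uses the $t$-statistic $\hat\beta_{ij}/\mathrm{SE}(\hat\beta_{ij})$ as the $z$-statistic, which is a large-sample approximation. The function name `eqtl_z` and the simulated data are hypothetical and are not the original analysis.

```python
import numpy as np

def eqtl_z(g, s, X):
    """Fit g = X @ alpha + beta * s + noise by ordinary least squares.

    g : (r,) expression of one gene (a column G_j)
    s : (r,) genotypes of one SNP (a column S_i)
    X : (r, p) covariates; include a column of ones for an intercept
    Returns (beta_hat, z) with z = beta_hat / SE(beta_hat).
    """
    D = np.column_stack([X, s])                 # design matrix [X | S_i]
    coef, _, _, _ = np.linalg.lstsq(D, g, rcond=None)
    resid = g - D @ coef
    r, k = D.shape
    sigma2 = resid @ resid / (r - k)            # residual variance estimate
    cov = sigma2 * np.linalg.inv(D.T @ D)       # covariance of coefficients
    beta = coef[-1]                             # SNP effect is the last coefficient
    return beta, beta / np.sqrt(cov[-1, -1])

# Hypothetical toy data: r = 200 samples, intercept plus 2 covariates.
rng = np.random.default_rng(0)
r = 200
X = np.column_stack([np.ones(r), rng.normal(size=(r, 2))])
s = rng.integers(0, 3, size=r).astype(float)    # genotypes coded 0/1/2
g = X @ np.array([1.0, 0.5, -0.3]) + 0.8 * s + rng.normal(size=r)

beta, z = eqtl_z(g, s, X)
print(round(beta, 2), round(z, 1))
```

Running this over every SNP/gene pair yields the $n\times m$ matrix of $z_{ij}$ values used for the edge weights.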
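We define a set of adjacency matrix representations $A = (a_{ij})$ based on summary statistics from the eQTL analyses:
\[a_{ij} = \vert z_{ij}\vert\, I\{Y_{ij} < \tau\},\]
Here $I\{\cdot\}$ is the indicator function and $\tau$ is a significance threshold. The remaining terms are: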
- $z_{ij}$ is set to 1 for an unweighted representation, or to the $z$-statistic for testing $\beta_{ij}$ in the eQTL regression between SNP $i$ and gene $j$ for a weighted representation
- $Y_{ij}$ is a measure of the significance of the eQTL association
Three definitions of $Y$ are considered.
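The thresholded adjacency can be sketched in NumPy as follows. For illustration only, this assumes that $Y$ holds per-pair p-values and that $\tau = 0.05$. The document's own definitions of $Y$ are not reproduced here. The `adjacency` helper and the toy matrices are hypothetical.

```python
import numpy as np

def adjacency(Y, tau, Z=None):
    """Return A with a_ij = |z_ij| * 1{Y_ij < tau} (SNPs as rows, genes as columns).

    Y   : (n, m) significance measures, assumed here to be p-values
    tau : significance threshold
    Z   : (n, m) z-statistics for a weighted network; if None, z_ij = 1
          for every pair, giving the unweighted 0/1 network
    """
    Y = np.asarray(Y, dtype=float)
    Z = np.ones_like(Y) if Z is None else np.asarray(Z, dtype=float)
    return np.abs(Z) * (Y < tau)   # boolean mask zeroes non-significant pairs

# Hypothetical example: 3 SNPs x 2 genes.
Y = np.array([[0.001, 0.40],
              [0.200, 0.01],
              [0.030, 0.002]])
Z = np.array([[ 3.3,  0.8],
              [-1.3, -2.6],
              [ 2.2,  3.1]])

A_unweighted = adjacency(Y, tau=0.05)        # entries are 0 or 1
A_weighted = adjacency(Y, tau=0.05, Z=Z)     # entries are |z_ij| or 0
```

When $\tau$ is small, most entries of $A$ are zero. For genome-scale $n\times m$ matrices, a natural choice is to store only the nonzero entries, for example with `scipy.sparse`.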
To identify nodes (either SNPs or genes in the bipartite representation) that are central to the network, we consider the network metric of degree.
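The following sketch assumes that degree is taken as the row and column sums of an adjacency matrix $A$, with SNPs as rows and genes as columns. For an unweighted $A$, this counts significant eQTL partners. For a weighted $A$, it sums $\vert z_{ij}\vert$. The toy matrix is hypothetical.

```python
import numpy as np

# Hypothetical unweighted adjacency matrix: 3 SNPs (rows) x 2 genes (columns).
A = np.array([[1, 0],
              [0, 1],
              [1, 1]])

snp_degree = A.sum(axis=1)              # genes linked to each SNP
gene_degree = A.sum(axis=0)             # SNPs linked to each gene
hub_snp = int(np.argmax(snp_degree))    # index of the highest-degree SNP

print(snp_degree, gene_degree)  # [1 1 2] [2 2]
```

Both degree vectors sum to the total edge weight of $A$, because every edge joins exactly one SNP and one gene.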
For the sparse representation of $A$, the degrees of SNP $i$ and gene $j$ are defined as follows: