WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Effective Gene Expression Prediction

Tags: Gene Expression, Transformer

This note is for Avsec, Ž., Agarwal, V., Visentin, D., Ledsam, J. R., Grabska-Barwinska, A., Taylor, K. R., Assael, Y., Jumper, J., Kohli, P., & Kelley, D. R. (2021). Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods, 18(10), 1196–1203.

Effective gene expression prediction from sequence by integrating long-range interactions

  • how noncoding DNA determines gene expression in different cell types is a major unsolved problem
  • the paper reports substantially improved gene expression prediction accuracy from DNA sequence via a deep learning architecture called Enformer
    • able to integrate information from long-range interactions (up to 100 kb away) in the genome
    • Enformer learned to predict enhancer-promoter interactions directly from the DNA sequence, competitively with methods that take direct experimental data as input

Introduction

  • increasing information flow between distal elements is a promising path to increase predictive accuracy

  • introduce a neural network architecture based on self-attention towards this goal
  • frame the machine learning problem as predicting thousands of epigenetic and transcriptional datasets in a multitask setting across long DNA sequences
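
To see why self-attention increases information flow between distal elements, here is a minimal sketch of single-head scaled dot-product self-attention in NumPy. It is a generic illustration, not the paper's implementation: Enformer's transformer blocks use multi-head attention with relative positional encodings, both omitted here, and the toy sizes and random weights are assumptions made only for the example.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (length, channels) array, one embedding per sequence position.
    Returns the attended output and the (length, length) attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ v, weights


# Toy sizes and random weights (hypothetical, not the paper's values).
rng = np.random.default_rng(0)
length, channels = 16, 8
x = rng.normal(size=(length, channels))
w_q, w_k, w_v = (rng.normal(size=(channels, channels)) for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v)
```

Every row of `weights` is a distribution over all positions, so a single layer lets position 0 read directly from position 15, whereas a convolution only reaches as far as its receptive field.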

Enformer attends to cell-type-specific enhancers

Methods

Model architecture

The Enformer architecture consists of three parts:

  • 7 convolutional blocks with pooling
  • 11 transformer blocks
  • a cropping layer followed by final pointwise convolutions branching into 2 organism-specific network heads
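
The sequence-length bookkeeping through these three parts can be sketched as follows. The input length and the number of pooling blocks come from this note; the pooling factor of 2 per block and the crop of 320 positions per side are not stated here and are filled in from my reading of the paper, so treat them as assumptions.

```python
SEQ_LEN = 196_608      # input length in bp (from the note)
NUM_POOL_BLOCKS = 7    # convolutional blocks with pooling (from the note)
POOL_FACTOR = 2        # assumption: each block halves the sequence length
CROP_PER_SIDE = 320    # assumption: positions trimmed on each side


def enformer_lengths(seq_len=SEQ_LEN):
    """Sequence length after the convolutional tower and after cropping."""
    bin_size = POOL_FACTOR ** NUM_POOL_BLOCKS      # bp per position
    positions = seq_len // bin_size                # seen by the transformer
    kept = positions - 2 * CROP_PER_SIDE           # seen by the output heads
    return {
        "bin_size_bp": bin_size,
        "transformer_positions": positions,
        "output_positions": kept,
        "output_span_bp": kept * bin_size,
    }


lengths = enformer_lengths()
print(lengths)
```

Under these assumptions the transformer blocks operate on 1,536 positions of 128 bp each, and the heads predict 896 bins covering the central 114,688 bp.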

The model takes as input a one-hot-encoded DNA sequence of length 196,608 bp and predicts 5,313 genomic tracks for the human genome and 1,643 tracks for the mouse genome.
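
A minimal sketch of the input encoding, assuming the channel order A, C, G, T and an all-zero row for any other character such as N (both are conventions I picked for illustration, not confirmed by this note). The helper name `one_hot_encode` is hypothetical.

```python
import numpy as np

SEQ_LEN = 196_608                              # input length in bp
NUM_TRACKS = {"human": 5313, "mouse": 1643}    # output tracks per head
BASES = "ACGT"                                 # assumed channel order


def one_hot_encode(seq):
    """Encode a DNA string as a (len(seq), 4) float32 array.

    Characters outside ACGT (e.g. N) become all-zero rows.
    """
    lookup = np.full(256, -1, dtype=np.int64)
    for i, base in enumerate(BASES):
        lookup[ord(base)] = i
    codes = np.frombuffer(seq.upper().encode("ascii"), dtype=np.uint8)
    idx = lookup[codes]
    out = np.zeros((len(seq), 4), dtype=np.float32)
    valid = idx >= 0
    out[np.nonzero(valid)[0], idx[valid]] = 1.0
    return out


small = one_hot_encode("ACGTN")

# A dummy full-length input: the model would see an array of this shape.
x = one_hot_encode("ACGT" * (SEQ_LEN // 4))
print(x.shape)
```

Each organism-specific head then maps the shared representation to its own number of tracks, as listed in `NUM_TRACKS`.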

