Effective Gene Expression Prediction

Posted on Jan 26, 2024 (Update: Feb 06, 2024)

This note is for Avsec, Ž., Agarwal, V., Visentin, D., Ledsam, J. R., Grabska-Barwinska, A., Taylor, K. R., Assael, Y., Jumper, J., Kohli, P., & Kelley, D. R. (2021). Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods, 18(10), 1196–1203.

Effective gene expression prediction from sequence by integrating long-range interactions

open question: how noncoding DNA determines gene expression in different cell types
the paper reports substantially improved gene expression prediction accuracy from DNA sequence using a deep learning architecture called Enformer
- able to integrate information from long-range interactions (up to 100 kb away) in the genome
- Enformer learned to predict enhancer-promoter interactions directly from the DNA sequence, competitively with methods that take direct experimental data as input

Introduction

increasing information flow between distal elements is a promising path to increase predictive accuracy
introduce a neural network architecture based on self-attention towards this goal
frame the machine learning problem as predicting thousands of epigenetic and transcriptional datasets in a multitask setting across long DNA sequences

Enformer attends to cell-type-specific enhancers

Methods

Model architecture

the Enformer architecture consists of three parts:

- 7 convolutional blocks with pooling
- 11 transformer blocks
- a cropping layer followed by final pointwise convolutions branching into 2 organism-specific network heads
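
As a rough check on how the three parts fit together, the sketch below tracks only the sequence-length dimension through the network for the 196,608 bp input. It assumes that each of the 7 convolutional blocks pools by a factor of 2 and that the cropping layer keeps the central 896 positions; both values are taken from the paper rather than from this note, and the names `track_lengths`, `POOL_FACTOR`, and `TARGET_BINS` are illustrative only. The transformer blocks are assumed to leave the length unchanged.

```python
# Length bookkeeping for the Enformer trunk (no weights, no real layers).
SEQ_LEN = 196_608      # input length in bp
N_CONV_BLOCKS = 7      # convolutional blocks with pooling
POOL_FACTOR = 2        # assumption: each block halves the length
TARGET_BINS = 896      # assumption: positions kept by the cropping layer


def track_lengths(seq_len=SEQ_LEN, n_conv=N_CONV_BLOCKS,
                  pool=POOL_FACTOR, target_bins=TARGET_BINS):
    """Return the sequence length at each stage of the architecture."""
    length = seq_len
    for _ in range(n_conv):
        if length % pool != 0:
            raise ValueError("length must be divisible by the pooling factor")
        length //= pool
    if target_bins > length or (length - target_bins) % 2 != 0:
        raise ValueError("cannot crop symmetrically to target_bins")
    bin_size = seq_len // length          # bp summarized by one position
    crop = (length - target_bins) // 2    # positions dropped on each side
    return {
        "after_conv": length,
        "bp_per_position": bin_size,
        "cropped_each_side": crop,
        "output_bins": target_bins,
        "output_span_bp": target_bins * bin_size,
    }


shapes = track_lengths()
for name, value in shapes.items():
    print(f"{name}: {value}")
```

Under these assumptions the transformer blocks operate on 1,536 positions of 128 bp each, so a position near the center can attend to elements roughly 98 kb away on either side, which is consistent with the "up to 100 kb" range quoted above. The heads then predict tracks for the central 114,688 bp.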

takes as input a one-hot-encoded DNA sequence of length 196,608 bp and predicts 5,313 genomic tracks for the human genome and 1,643 tracks for the mouse genome
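
To make the input format concrete, here is a minimal sketch of one-hot encoding in plain Python. The channel order A, C, G, T and the all-zero row for unknown bases such as N are assumptions for illustration, not details confirmed by this note; `one_hot` and `CHANNELS` are hypothetical names.

```python
# One-hot encoding of a DNA string, in plain Python.
CHANNELS = "ACGT"  # assumption: channel order A, C, G, T


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows, one per base."""
    index = {base: i for i, base in enumerate(CHANNELS)}
    rows = []
    for base in seq.upper():
        row = [0.0] * len(CHANNELS)
        if base in index:  # assumption: unknown bases (e.g. N) stay all-zero
            row[index[base]] = 1.0
        rows.append(row)
    return rows


encoded = one_hot("ACGTN")
for base, row in zip("ACGTN", encoded):
    print(base, row)
```

A full-length input would then be a 196,608 × 4 array, and the two heads would emit 5,313 values per output position for human and 1,643 for mouse.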

