DNA copy number profiling: from bulk tissue to single cells

Posted on Jan 02, 2020

Tags: Copy Number Variation, Single-cell

This post is based on the talk given by Yuchao Jiang at the 11th ICSA International Conference on Dec. 20th, 2019.

Copy number variation (CNV)

CNV detection by next-generation DNA-seq

Bulk DNA-seq:

Whole-genome sequencing (WGS)
Whole-exome sequencing (WES) & targeted sequencing

Single-cell DNA-seq:

Conventional whole-genome amplification
10X Genomics Chromium Single Cell CNV Solution

Goal

Use bulk/single-cell DNA sequencing to accurately detect CNV

Based on depth of coverage, i.e., the number of times a genomic region is “read”
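To make "depth of coverage" concrete, here is a minimal sketch that counts how many reads overlap each genomic target. The function name `read_depth`, the half-open interval convention, and the toy coordinates are assumptions for illustration only; real pipelines compute depth from aligned BAM files with dedicated tools.

```python
def read_depth(reads, targets):
    """Count the reads overlapping each target.

    Both reads and targets are (start, end) tuples on one chromosome,
    treated as half-open intervals [start, end). This is an assumption
    made for the sketch.
    """
    depths = []
    for t_start, t_end in targets:
        # A read overlaps a target if it starts before the target ends
        # and ends after the target starts.
        n = sum(1 for r_start, r_end in reads
                if r_start < t_end and r_end > t_start)
        depths.append(n)
    return depths


# Toy data (hypothetical coordinates)
reads = [(0, 100), (50, 150), (120, 220), (300, 400)]
targets = [(0, 100), (100, 200), (200, 300)]

depths = read_depth(reads, targets)
print(depths)  # [2, 2, 1]
```

A target with a copy number gain would be expected to show higher depth than comparable targets, and a deletion lower depth, once the biases listed next are accounted for.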

Biases:

GC content
Capture and amplification efficiency
Sequencing bias
Latent factors
Batch effect
Population stratification
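As an illustration of the first bias in the list, the sketch below computes the GC content of a target sequence and applies a simple median-by-GC-bin rescaling of read depth. This is one commonly used style of correction, not necessarily the approach discussed in the talk; the names `gc_content`, `gc_normalize`, and `n_bins`, and the toy numbers, are hypothetical.

```python
from statistics import median


def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_normalize(depths, gcs, n_bins=10):
    """Rescale each target's depth so every GC bin has the same median.

    depths: raw read depth per target
    gcs:    GC fraction per target, in [0, 1]
    n_bins: number of equal-width GC bins (an assumed tuning parameter)
    """
    # Assign each target to a GC bin; a GC fraction of 1.0 goes in the last bin.
    bins = [min(int(gc * n_bins), n_bins - 1) for gc in gcs]
    overall = median(depths)

    by_bin = {}
    for b, d in zip(bins, depths):
        by_bin.setdefault(b, []).append(d)
    bin_median = {b: median(v) for b, v in by_bin.items()}

    # Scale each depth by (overall median / median of its GC bin).
    return [d * overall / bin_median[b] if bin_median[b] > 0 else 0.0
            for d, b in zip(depths, bins)]


# Toy data: the two high-GC targets are systematically under-covered.
raw_depths = [100, 110, 50, 60]
gcs = [0.45, 0.48, 0.85, 0.88]
normalized = gc_normalize(raw_depths, gcs)
```

After rescaling, the high-GC and mid-GC targets have the same median depth, so any remaining differences are less likely to be explained by GC content alone. The other biases in the list (batch effects, latent factors) are not addressed by this step.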

Existing Work

Fromer et al. (2012) describe the challenge as follows:

Because exome sequencing takes aim at a sparse (~1%) set of noncontiguous genomic targets (the exons), most CNV breakpoints will not be sequenced, leaving read depth as the predominant indicator of CNVs. However, the quantitative relationship between true copy number and depth is distorted by target- and sample-specific biases in exome hybridization (“capture”), PCR amplification, sequencing efficiency, and in silico read mapping, all of which are in turn affected by GC content of the targets, target size and sequence complexity, proximity to segmental duplications, nucleotide-level variation (SNPs), DNA concentration, hybridization temperature, experimental sample batching, and the complex interplay among these and various indeterminate factors.

They proposed a method called XHMM, which they describe as follows:

XHMM extracts copy-number signal from noisy read depth by leveraging the large-scale nature of sequencing projects to discern patterns of read-depth biases. Specifically, we ran a principal-component analysis (PCA) on the sample-by-target-depth matrix by “rotating” the high-dimensional data to find the main modes in which depth varies across multiple samples and targets, and we removed the largest of such effects.
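The PCA step described in the quote can be sketched roughly as follows: center the sample-by-target depth matrix, decompose it with an SVD, and zero out the largest components. This is a simplified illustration under assumptions, not the XHMM implementation. In particular, the function name `remove_top_components` and the fixed number of components `k` are hypothetical, and the simulated data below are made up.

```python
import numpy as np


def remove_top_components(depth, k=1):
    """Remove the k largest principal components from a depth matrix.

    depth: array of shape (n_samples, n_targets)
    k:     number of leading components to discard (assumed fixed here)
    Returns the mean-centered matrix with those components removed.
    """
    depth = np.asarray(depth, dtype=float)
    # Center each target (column) across samples.
    centered = depth - depth.mean(axis=0)
    # The SVD of the centered matrix gives its principal components.
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    # Zero out the k largest singular values and reconstruct.
    s_kept = s.copy()
    s_kept[:k] = 0.0
    return (U * s_kept) @ Vt


# Simulated data: one strong systematic effect shared across samples and
# targets, plus small independent noise.
rng = np.random.default_rng(0)
sample_effect = rng.normal(size=6)
target_loading = rng.normal(size=8)
noise = rng.normal(scale=0.1, size=(6, 8))
depth = 100.0 + 20.0 * np.outer(sample_effect, target_loading) + noise

residual = remove_top_components(depth, k=1)
```

In this toy setting the dominant systematic effect is rank one, so removing a single component leaves a residual on the scale of the noise. With real data, how many components to remove is a judgment call, since removing too many can also erase genuine copy-number signal.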

Here is a diagram illustrating some basic concepts of DNA, adapted from 潘高的小站 (Pan Gao's site), "The differences and connections among exons, introns, mRNA, CDS, and ORF"