Why Normalize?

Normalization removes technical variation between the samples we are comparing, so that the differences that remain can be attributed more confidently to biology.

We usually normalize for:

  1. Sequencing depth: Say we are comparing gene counts in sample A against sample B. If sample A has 10 million reads and sample B has 1 million, an apparent 10-fold increase in expression in sample A can be explained by sequencing depth alone.
  2. Gene length: At the same expression level, a gene that is twice as long is expected to collect about twice as many reads.
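  3. GC content: Read coverage can vary with the GC content of the sequenced fragments, so GC-rich or GC-poor genes may be over- or under-counted for purely technical reasons.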
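
The sequencing depth effect in point 1 can be illustrated with a small sketch that rescales each sample to counts per million (CPM). The gene names, counts, and the helper `counts_per_million` are made up for illustration; this is a minimal example under those assumptions, not a replacement for the methods listed further down.

```python
def counts_per_million(counts):
    """Rescale raw gene counts so that the sample total is one million."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


# Hypothetical toy data: sample A was sequenced 10x deeper than sample B,
# but the relative expression of the two genes is identical.
sample_a = {"gene1": 500, "gene2": 9500}
sample_b = {"gene1": 50, "gene2": 950}

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)

# The raw counts differ 10-fold; the depth-scaled values do not.
print(cpm_a["gene1"], cpm_b["gene1"])
```

After scaling, both samples report the same value for each gene, so the 10-fold difference in the raw counts was entirely a depth effect.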

Some Normalization Methods

RPKM (reads per kilobase per million mapped reads):
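
RPKM corrects for both sequencing depth and gene length. With C the number of reads mapped to a gene, L the gene length in base pairs, and N the total number of mapped reads in the sample, the usual definition is RPKM = (10^9 × C) / (N × L). The sketch below applies that formula; the function name and the numbers are illustrative assumptions.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene of 2,000 bp measured in two samples of different depth.
deep = rpkm(read_count=1000, gene_length_bp=2000, total_mapped_reads=10_000_000)
shallow = rpkm(read_count=100, gene_length_bp=2000, total_mapped_reads=1_000_000)

print(deep, shallow)
```

Both calls give the same value because the 10-fold difference in read count is matched by a 10-fold difference in library size.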

 

Median scaling (DESeq method):
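
The median-of-ratios idea behind the DESeq size factors can be sketched as follows: build a pseudo-reference from the geometric mean of each gene across samples, divide each sample by that reference, and take the median ratio as the sample's size factor. This is a simplified standard-library sketch, not the DESeq implementation itself; the function name `size_factors` and the toy counts are assumptions.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: dict of sample name -> list of raw counts (same gene order)."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean per gene; genes with a zero in any sample are skipped
    # because their geometric mean is zero and the ratio is undefined.
    log_geo_means = []
    for i in range(n_genes):
        values = [counts[s][i] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][i]) - lg
            for i, lg in enumerate(log_geo_means)
            if lg is not None
        ]
        # Raises statistics.StatisticsError if no gene is usable.
        factors[s] = math.exp(median(log_ratios))
    return factors


# Hypothetical toy data: sample A is sample B sequenced 10x deeper.
toy = {"A": [100, 200, 300, 0], "B": [10, 20, 30, 5]}
factors = size_factors(toy)
normalized_a = [c / factors["A"] for c in toy["A"]]
```

Dividing each sample's raw counts by its size factor puts the samples on a common scale. Using the median makes the factor robust to a small number of strongly differentially expressed genes.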

 

TMM (trimmed mean of M-values):
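
TMM compares a sample against a reference sample gene by gene, discards the genes with the most extreme log fold changes (M-values) and the most extreme average abundances (A-values), and uses a weighted mean of the remaining M-values as the scaling factor. The code below is a simplified sketch of that idea. The 30% and 5% trim fractions are the commonly used defaults, while the function name, the rank-based trimming, and the toy counts are assumptions. It also omits steps that a full implementation such as edgeR performs, for example choosing the reference sample and rescaling the factors across samples.

```python
import math


def tmm_factor(sample, reference, m_trim=0.30, a_trim=0.05):
    """Simplified TMM scaling factor of `sample` relative to `reference`."""
    n_s, n_r = sum(sample), sum(reference)
    rows = []
    for y_s, y_r in zip(sample, reference):
        if y_s == 0 or y_r == 0:
            continue  # log ratios are undefined for zero counts
        p_s, p_r = y_s / n_s, y_r / n_r
        m = math.log2(p_s / p_r)          # log fold change
        a = 0.5 * math.log2(p_s * p_r)    # average log abundance
        # Inverse of the approximate variance of m, used as a weight.
        w = 1.0 / ((n_s - y_s) / (n_s * y_s) + (n_r - y_r) / (n_r * y_r))
        rows.append((m, a, w))

    n = len(rows)

    def kept(values, trim):
        # Indices that survive trimming `trim` of the genes from each tail.
        order = sorted(range(n), key=lambda i: values[i])
        cut = int(n * trim)
        return set(order[cut:n - cut])

    keep = kept([r[0] for r in rows], m_trim) & kept([r[1] for r in rows], a_trim)
    if not keep:
        return 1.0  # nothing left to estimate from; leave the sample unscaled

    weighted_m = sum(rows[i][2] * rows[i][0] for i in keep)
    total_w = sum(rows[i][2] for i in keep)
    return 2 ** (weighted_m / total_w)


# Hypothetical toy data: nine genes are unchanged, one gene is 10x higher in
# the sample and inflates its library size from 1,000 to 1,900 reads.
reference = [100] * 10
sample = [100] * 9 + [1000]

factor = tmm_factor(sample, reference)
effective_library_size = sum(sample) * factor
```

In this toy case the single highly expressed gene is trimmed away, and the effective library size (library size times the factor) comes out at about 1,000, matching the reference. Plain total-count scaling would instead make the nine unchanged genes look down-regulated.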

 

Quantile:
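
Quantile normalization forces every sample to have the same distribution of values: the values in each sample are ranked, the values at each rank are averaged across samples, and each value is replaced by the average for its rank. The sketch below is a minimal standard-library version. The function name and toy values are assumptions, and ties are simply kept in input order rather than averaged as some implementations do.

```python
def quantile_normalize(samples):
    """samples: dict of sample name -> list of values (same gene order)."""
    names = list(samples)
    n = len(samples[names[0]])

    # Mean of the values at each rank across all samples.
    sorted_columns = [sorted(samples[s]) for s in names]
    rank_means = [
        sum(col[r] for col in sorted_columns) / len(names) for r in range(n)
    ]

    result = {}
    for s in names:
        values = samples[s]
        # Gene indices ordered from the smallest to the largest value.
        order = sorted(range(n), key=lambda i: values[i])
        normalized = [0.0] * n
        for rank, gene_index in enumerate(order):
            normalized[gene_index] = rank_means[rank]
        result[s] = normalized
    return result


# Hypothetical toy data for three genes in two samples.
toy = {"A": [5, 2, 3], "B": [4, 1, 6]}
normalized = quantile_normalize(toy)
print(normalized)
```

After normalization both samples contain the same set of values (1.5, 3.5, 5.5), and only the assignment of those values to genes differs between samples.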
