Let's recap what we learned on Day 3:

Part 1. Annotated genes/transcripts

1. Read counting:

  • When mapping to the genome (using a tool like hisat2), use a counting tool (e.g., bedtools, htseq, stringtie) to get gene/transcript counts from the mapping results.
  • When mapping to the transcriptome using kallisto, you do not need an extra gene/transcript counting step.
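To make the idea of read counting concrete, here is a minimal sketch of what a counting tool does conceptually: it assigns each mapped read to the gene it overlaps. The function name `count_reads`, the gene names, and all coordinates are hypothetical, and the logic is heavily simplified (no strand, no exon structure, no multi-mapping reads). Skipping reads that overlap more than one gene is a simplifying assumption of this sketch, not a description of how any particular tool behaves.

```python
def count_reads(genes, reads):
    """Count reads per gene using half-open [start, end) intervals.

    genes: dict of gene name -> (chromosome, start, end)
    reads: list of (chromosome, start, end) for mapped reads
    Reads overlapping zero genes or more than one gene are not counted.
    """
    counts = {name: 0 for name in genes}
    for chrom, start, end in reads:
        hits = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and start < g_end and end > g_start
        ]
        if len(hits) == 1:  # unambiguous assignment only
            counts[hits[0]] += 1
    return counts


# Toy annotation and toy alignments (made-up values for illustration)
genes = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 450, 900),
    "geneC": ("chr2", 100, 300),
}
reads = [
    ("chr1", 120, 170),  # inside geneA
    ("chr1", 460, 510),  # overlaps geneA and geneB, skipped as ambiguous
    ("chr1", 600, 650),  # inside geneB
    ("chr2", 150, 200),  # inside geneC
    ("chr2", 400, 450),  # overlaps no gene
    ("chr1", 130, 180),  # inside geneA
]
gene_counts = count_reads(genes, reads)
print(gene_counts)
```

In practice you would rely on the counting tools named above rather than code like this; the sketch only shows where the numbers in a count table come from.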

2. Filtering, visualizations:

  • Consider filtering out low-count or low-variance genes before differential expression analysis. DESeq2 does some of this filtering for you through its automatic independent filtering of low-count genes.
  • Consider running a PCA on the count data to identify factors driving variation in gene expression. You may want to use a subset of the data for this (such as the 20% of genes with the highest variance).
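The two filtering ideas above can be sketched in a few lines. This is an illustration under stated assumptions, not the course's own code: the function names, the `min_total=10` threshold, the toy count values, and the choice to rank variance on log2(count + 1) values are all assumptions made for the example.

```python
import math
from statistics import pvariance


def filter_low_counts(counts, min_total=10):
    """Keep genes whose summed count across samples is at least min_total."""
    return {gene: row for gene, row in counts.items() if sum(row) >= min_total}


def top_variance_genes(counts, fraction=0.2):
    """Return the highest-variance genes, ranked on log2(count + 1) values."""
    def log_variance(gene):
        return pvariance([math.log2(c + 1) for c in counts[gene]])

    ranked = sorted(counts, key=log_variance, reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


# Toy count matrix: gene -> counts in four samples (made-up values)
counts = {
    "geneA": [0, 1, 0, 2],
    "geneB": [100, 110, 95, 105],
    "geneC": [10, 200, 15, 220],
    "geneD": [50, 52, 49, 51],
    "geneE": [5, 300, 8, 310],
}
filtered = filter_low_counts(counts)
# fraction=0.5 only because the toy matrix is tiny; the text suggests about 0.2
pca_genes = top_variance_genes(filtered, fraction=0.5)
print(sorted(filtered), pca_genes)
```

The genes in `pca_genes` would then be the rows passed on to the PCA.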

3. Differential expression analysis:

  • DESeq2 takes raw (un-normalized) count data as input, along with sample metadata. It has convenient importers for kallisto and htseq counts, but you can also read in your own gene expression count matrix.
  • The design formula tells DESeq2 what you are testing: ~condition tests for differences among conditions; ~batch + condition tests for differences among conditions while controlling for batch. The DESeq2 vignette gives details on how you can adjust this design formula.
  • DESeq2 results contain log2 fold changes, p values, and adjusted p values for every gene.
  • Impose cutoffs on fold change and adjusted p value to get the significantly differentially expressed genes.
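The final cutoff step can be sketched as follows, assuming the results have been exported as rows with `log2FoldChange` and `padj` fields (the column names DESeq2 uses). The function name, the toy rows, and the cutoff values (absolute log2 fold change of at least 1, adjusted p value below 0.05) are assumptions chosen for illustration; pick cutoffs that suit your experiment.

```python
def significant_genes(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Return genes passing both the fold-change and adjusted p value cutoffs."""
    hits = []
    for row in results:
        padj = row["padj"]
        if padj is None:  # adjusted p value may be missing (NA) for some genes
            continue
        if abs(row["log2FoldChange"]) >= lfc_cutoff and padj < padj_cutoff:
            hits.append(row["gene"])
    return hits


# Toy results table (made-up values for illustration)
results = [
    {"gene": "geneA", "log2FoldChange": 2.3, "padj": 0.001},
    {"gene": "geneB", "log2FoldChange": -1.8, "padj": 0.03},
    {"gene": "geneC", "log2FoldChange": 0.4, "padj": 0.0001},  # fold change too small
    {"gene": "geneD", "log2FoldChange": 3.1, "padj": 0.2},     # not significant
    {"gene": "geneE", "log2FoldChange": 1.5, "padj": None},    # no adjusted p value
]
hits = significant_genes(results)
print(hits)
```

Taking the absolute value of the log2 fold change keeps both up-regulated and down-regulated genes.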

4. Visualization:

  • MA plots, heat maps, and PCA are all good ways to visualize your gene expression data. Make sure to use normalized, log-transformed data for these visualizations.
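As a small illustration of the log transformation and of the quantities behind an MA plot, the sketch below uses a log2(x + 1) transform and one common definition of M (log ratio) and A (average log expression). The function names, the pseudocount of 1, and the input values are assumptions for the example; plotting packages may define the axes somewhat differently.

```python
import math


def log2_transform(values, pseudocount=1.0):
    """Log-transform normalized counts; the pseudocount avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


def ma_values(mean_a, mean_b, pseudocount=1.0):
    """Return (M, A) for one gene given its mean normalized count per condition.

    M is the log2 ratio between conditions (condition b over condition a).
    A is the average of the two log2 expression values.
    """
    log_a = math.log2(mean_a + pseudocount)
    log_b = math.log2(mean_b + pseudocount)
    return log_b - log_a, (log_a + log_b) / 2


# Toy normalized counts (made-up values for illustration)
logged = log2_transform([0, 1, 3, 7])
m, a = ma_values(mean_a=3, mean_b=15)
print(logged, m, a)
```

Each gene contributes one (A, M) point to the MA plot, and the log-transformed values are what you would feed to a heat map or PCA.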

Part 2. Novel transcripts 

Use a pipeline of hisat2 (mapping to the genome), stringtie (transcript assembly and quantification), and ballgown (differential expression testing):

  • If you want to identify novel transcripts in your particular samples
  • If you want to look for differential expression in these novel transcripts


