Day 3 Take Away Points

Let's recap what we learned on Day 3:

Part 1. Annotated genes/transcripts

1. Read counting:

When mapping to the genome (using a tool like hisat2), use a counting tool (e.g. bedtools, htseq, stringtie) to get gene/transcript counts from the alignments.
When mapping to the transcriptome using kallisto, you do not need an extra counting step: kallisto reports estimated transcript-level counts directly.
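To make the counting step concrete, here is a toy sketch of what "getting gene counts from mapping results" means: each aligned read is assigned to the gene it overlaps. This is not how htseq, bedtools or stringtie are implemented. It assumes single-interval reads and genes, ignores strand and splicing, and (loosely like htseq-count's default behaviour) does not count reads that overlap no gene or more than one gene. The gene names and coordinates are made up.

```python
# Toy gene-level read counter (illustrative sketch, not how htseq-count or bedtools work internally).
# Simplifying assumptions: reads and genes are single 0-based, half-open intervals;
# strand, splicing and multi-mapping reads are ignored.

def count_reads(genes, reads):
    """genes: dict gene_id -> (chrom, start, end); reads: list of (chrom, start, end).

    Returns (counts per gene, reads hitting no gene, reads hitting more than one gene).
    """
    counts = {gene_id: 0 for gene_id in genes}
    no_feature = 0
    ambiguous = 0
    for chrom, r_start, r_end in reads:
        # a read overlaps a gene if they share a chromosome and their intervals intersect
        hits = [
            gene_id
            for gene_id, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and r_start < g_end and g_start < r_end
        ]
        if not hits:
            no_feature += 1
        elif len(hits) > 1:
            ambiguous += 1  # overlapping genes: the read is not counted for either
        else:
            counts[hits[0]] += 1
    return counts, no_feature, ambiguous


# Hypothetical genes and reads for illustration
genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 400, 900), "geneC": ("chr2", 0, 300)}
reads = [("chr1", 120, 220), ("chr1", 450, 480), ("chr1", 600, 700), ("chr2", 50, 150), ("chr3", 0, 100)]
print(count_reads(genes, reads))
# ({'geneA': 1, 'geneB': 1, 'geneC': 1}, 1, 1)
```

The read at chr1:450-480 falls in both geneA and geneB, so it is reported as ambiguous rather than split between them.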

2. Filtering and visualization:

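Consider filtering out low-count/low-variance genes before differential expression analysis. DESeq2 does some of this filtering for you: by default its results apply independent filtering based on the mean of normalized counts, and genes removed this way get an NA adjusted p value.
Consider doing a PCA of the count data to identify factors that drive variation in gene expression (e.g. condition or batch). You may want to use a subset of genes for this, such as the 20% with the highest variance.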
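The two steps above can be sketched with NumPy as a pre-filter followed by a PCA on the most variable genes. This is a minimal illustration, not DESeq2's plotPCA. The thresholds (at least 10 reads in at least 2 samples, top 20% by variance) are illustrative assumptions, and the input is assumed to be a genes x samples matrix that has already been library-size normalized.

```python
import numpy as np


def filter_low_counts(counts, min_count=10, min_samples=2):
    """Keep genes (rows) with >= min_count reads in >= min_samples samples.

    Thresholds are illustrative assumptions, not DESeq2 defaults.
    """
    counts = np.asarray(counts)
    keep = (counts >= min_count).sum(axis=1) >= min_samples
    return counts[keep], keep


def pca_top_variance(counts, top_fraction=0.2, n_components=2):
    """PCA of samples on log2(count + 1), using only the highest-variance genes.

    counts: genes x samples, assumed already normalized for library size.
    Returns (sample scores on each PC, fraction of variance explained per PC).
    """
    log_counts = np.log2(np.asarray(counts, dtype=float) + 1.0)
    variances = log_counts.var(axis=1)
    n_top = max(1, int(round(top_fraction * len(variances))))
    top = np.argsort(variances)[::-1][:n_top]  # indices of the most variable genes
    x = log_counts[top].T                      # samples x genes
    x = x - x.mean(axis=0)                     # center each gene
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]
```

Plotting the first two columns of `scores` and coloring points by condition or batch shows which factor separates the samples. The sign of each PC is arbitrary, so only the relative positions of samples matter.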

3. Differential expression analysis:

DESeq2 takes raw (un-normalized) integer counts as input, along with sample metadata. It has convenient functions for importing kallisto (via tximport) and htseq counts, but you can also supply your own gene expression count matrix.
The design formula tells DESeq2 what you are testing: ~condition tests for differences among conditions; ~batch + condition tests for differences between conditions while controlling for batch (by default, results are reported for the last variable in the formula, so put the variable of interest last). The DESeq2 vignette describes how to tweak the design formula.
DESeq2 results contain a log2 fold change, a p value and an adjusted p value for every gene.
Impose cutoffs on fold change and adjusted p value (e.g. padj < 0.05 and |log2 fold change| >= 1) to get significantly differentially expressed genes.
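DESeq2 already computes adjusted p values (Benjamini-Hochberg by default), so the sketch below only illustrates what that adjustment does and how the final cutoffs are applied. It is written in Python rather than R; the function names and the cutoffs (padj < 0.05, |log2 fold change| >= 1) are assumptions for illustration. Genes with an NA adjusted p value, such as those removed by DESeq2's independent filtering, are treated as not significant.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values; NaN inputs stay NaN and are not counted."""
    p = np.asarray(pvalues, dtype=float)
    padj = np.full_like(p, np.nan)
    ok = ~np.isnan(p)
    pv = p[ok]
    m = pv.size
    if m == 0:
        return padj
    order = np.argsort(pv)
    ranked = pv[order] * m / np.arange(1, m + 1)       # p_(i) * m / i
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    ranked = np.minimum(ranked, 1.0)                    # cap at 1
    adj = np.empty(m)
    adj[order] = ranked                                 # back to original order
    padj[ok] = adj
    return padj


def significant_genes(genes, log2fc, padj, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Genes passing both cutoffs; NaN padj never passes."""
    return [
        gene
        for gene, lfc, q in zip(genes, log2fc, padj)
        if not np.isnan(q) and q < padj_cutoff and abs(lfc) >= lfc_cutoff
    ]


# Hypothetical p values and fold changes
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.5, float("nan")])
print(significant_genes(["g1", "g2", "g3", "g4", "g5"], [2.0, -1.5, 0.5, 3.0, 4.0], padj))
# ['g1']
```

Note that g2 has a raw p value of 0.04 but does not pass once the p values are adjusted for multiple testing.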

4. Visualization:

MA plots, heat maps and PCA plots are all good ways to visualize your gene expression data. Make sure to use normalized, log-transformed data for these visualizations (e.g. DESeq2's vst or rlog output for heat maps and PCA) rather than raw counts.
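To show what "normalized, log-transformed" means, here is a sketch of median-of-ratios size factors (the normalization approach DESeq2 describes), followed by a log2(x + 1) transform and the classic M (log fold change) and A (average log expression) values of an MA plot. This is a simplified illustration: it is not vst or rlog, and DESeq2's own MA plot uses its estimated fold changes rather than this raw ratio of group means. The counts are made up, with sample 2 sequenced twice as deeply as sample 1.

```python
import numpy as np


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix.

    Genes with a zero in any sample are skipped (their geometric mean would be 0);
    at least one gene must be nonzero in every sample.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    log_geo_means = log_counts.mean(axis=1, keepdims=True)  # per-gene geometric mean (log)
    return np.exp(np.median(log_counts - log_geo_means, axis=0))


def normalized_log2(counts):
    """Divide each sample by its size factor, then log2(x + 1)."""
    norm = np.asarray(counts, dtype=float) / size_factors(counts)
    return np.log2(norm + 1.0)


def ma_values(counts, group_a, group_b):
    """M = log2 ratio of group means, A = mean of their log2 values (pseudocount 1)."""
    norm = np.asarray(counts, dtype=float) / size_factors(counts)
    mean_a = norm[:, group_a].mean(axis=1) + 1.0
    mean_b = norm[:, group_b].mean(axis=1) + 1.0
    m = np.log2(mean_b / mean_a)
    a = 0.5 * (np.log2(mean_a) + np.log2(mean_b))
    return m, a


counts = np.array([[10, 20], [0, 5], [100, 200], [7, 14]])
print(size_factors(counts))  # roughly [0.707, 1.414]
```

After normalization the doubled sequencing depth disappears, so genes 0, 2 and 3 get an M value of 0; only gene 1 (zero in one sample) shows a fold change.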

Part 2. Novel transcripts

Use a pipeline of hisat2 (mapping to the genome), stringtie (transcript assembly and quantification) and ballgown (differential expression testing).
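As a rough illustration of how novel transcripts show up in stringtie output, the sketch below scans the GTF that stringtie writes and keeps transcript records that were not matched to a reference annotation. It assumes stringtie was run with a reference GTF, in which case matched transcripts carry a reference_id attribute and all transcripts carry a TPM attribute. Treating "no reference_id" as "novel" is a simplification; a dedicated comparison tool such as gffcompare classifies transcripts more carefully. The GTF lines and IDs below are hypothetical.

```python
import re

# GTF attribute pairs look like: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict of attribute -> value."""
    return dict(ATTR_RE.findall(field))


def novel_transcripts(gtf_lines):
    """Return (transcript_id, TPM) for transcript records without a reference_id."""
    novel = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # only transcript records, not exons
        attrs = parse_attributes(cols[8])
        if "reference_id" not in attrs:
            novel.append((attrs.get("transcript_id"), float(attrs.get("TPM", "nan"))))
    return novel


# Hypothetical stringtie output lines
gtf = [
    '# stringtie output (hypothetical)',
    'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\tgene_id "STRG.1"; transcript_id "STRG.1.1"; '
    'reference_id "ENST0001"; ref_gene_id "ENSG0001"; cov "5.0"; FPKM "3.0"; TPM "10.5";',
    'chr1\tStringTie\ttranscript\t2000\t3000\t1000\t-\t.\tgene_id "STRG.2"; transcript_id "STRG.2.1"; '
    'cov "2.0"; FPKM "1.0"; TPM "4.25";',
    'chr1\tStringTie\texon\t2000\t2500\t1000\t-\t.\tgene_id "STRG.2"; transcript_id "STRG.2.1"; '
    'exon_number "1"; cov "2.0";',
]
print(novel_transcripts(gtf))
# [('STRG.2.1', 4.25)]
```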
