Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Objectives

In RNA-Seq, the abundance level of a gene is measured by the number of reads that map to that gene.  Once the reads have been mapped to our reference, we must now count the number of reads that map to RNA units of interest to obtain gene/exon/transcript counts. Here, we shall look at different methods for doing this.

...

In our RNA-Seq analysis paths, here we will be exploring this path:

Count reads mapping to genes

Get set up

Code Block
titleGet set up for the exercises
cds
cd my_rnaseq_course
cd tophat_results

Bedtools

Bedtools is a great utility for working with sequence features and mapped reads in BAM, BED, VCF, and GFF formats.

...

Warning
titleWarning: To submit to queue
Code Block
nano commands.bedtools
 
bedtools multicov -bams C1_R1_thout/accepted_hits.bam C1_R2_thout/accepted_hits.bam C1_R3_thout/accepted_hits.bam C2_R1_thout/accepted_hits.bam C2_R2_thout/accepted_hits.bam C2_R3_thout/accepted_hits.bam -bed reference/genes.formatted.gtf > gene_counts.gff 
Code Block
launcher_creator.py -j commands.bedtools -n multicov -q normal -t 02:00:00 -a CCBB -l bedtools_launcher.sge
qsub bedtools_launcher.sge

HTseq

HTseq is another tool to count reads. bedtools has many many useful functions, and counting reads is just one of them. In contrast, HTseq is a specialized utility for counting reads.  HTseq is very slow and you need to run multiple command lines in order to do the same job as what bedtools multicov did. However, if you are looking for more fine grained control over how to count genes, especially when a read overlaps more than one gene/feature, htseq-count would be an option.

...

HTseq-count is strand-specific in default. Therefore, read counts for each gene in gene_counts_HTseq.gff are approximately a half counts in gene_counts.gff for the corresponding gene.

Other Gene Counting Options

If you want to perform all above operations in R enviornment, GRanges (along with Rsamtools) is a useful option. An example vignette is available here

...

Let's look at how to check for differential expression now.