...

Note how the CIGAR string is 3S20M13S, meaning that 3 bases were soft clipped from one end of the read and 13 from the other, leaving 20 aligned bases in between. If we had run the same alignment in --end-to-end mode, or with BWA in the same way as we did in Exercise #1, very little of this file would have aligned. Likewise, if we had not lowered the seed parameter of Bowtie2 from its default of 22, we would not have found many of the alignments like the one shown above: the read only matched for 20 bases, so a matching 22-base seed does not exist. Such is the nature of Bowtie2. It can be a powerful tool for sifting the alignments you want out of a messy dataset with limited information, but doing so requires careful tuning of the parameters, which can itself take a lot of time to get right.
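To make the CIGAR arithmetic concrete, here is a minimal Python sketch that breaks a CIGAR string into its operations and reports the soft-clipped and aligned lengths. The function names (`parse_cigar`, `summarize`, `seed_can_fit`) are made up for this illustration and are not part of Bowtie2 or BWA. Note also that an M operation can include mismatches, so the check in `seed_can_fit` is only a necessary condition for an exact seed match, not a sufficient one.

```python
import re

# One CIGAR operation: a length followed by an operation letter from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '3S20M13S' into (length, op) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Rebuild the string to make sure nothing was silently skipped.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return ops


def summarize(cigar):
    """Report soft clipping at each end, aligned bases, and read length."""
    ops = parse_cigar(cigar)
    left_clip = ops[0][0] if ops[0][1] == "S" else 0
    right_clip = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    # M, = and X consume both read and reference bases.
    aligned = sum(n for n, op in ops if op in "M=X")
    # M, I, S, = and X consume read bases.
    read_length = sum(n for n, op in ops if op in "MIS=X")
    return {
        "left_clip": left_clip,
        "right_clip": right_clip,
        "aligned": aligned,
        "read_length": read_length,
    }


def seed_can_fit(cigar, seed_length):
    """True if some aligned run is at least as long as the seed.

    Necessary but not sufficient: M runs may contain mismatches.
    """
    longest = max((n for n, op in parse_cigar(cigar) if op in "M=X"), default=0)
    return longest >= seed_length


if __name__ == "__main__":
    print(summarize("3S20M13S"))
    print("22-base seed fits:", seed_can_fit("3S20M13S", 22))
    print("20-base seed fits:", seed_can_fit("3S20M13S", 20))
```

For the read above, the 20-base aligned stretch is too short to contain a 22-base seed, which is why the shorter seed length was needed to recover it.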

Exercise #3: BWA-MEM (and Tophat2) - Human mRNA-seq

After Bowtie2 came out with a local alignment option, it wasn't long before the developers of BWA released their own local aligner, called BWA-MEM (for Maximal Exact Matches). This aligner is very nice because it combines much of the simplicity of using BWA with the flexibility of local alignment. This functionality, while enabling the alignment of datasets like the miRBase data we just examined, also permits more complex alignments, such as those of spliced mRNAs. In a long RNA-seq experiment, some fraction of reads will span a splice junction themselves, and in a paired-end library the two reads of a pair may fall on either side of a splice junction. We want to be able to align such reads for many reasons, from accurate transcript quantification to novel fusion transcript discovery. Thus, our last exercise will be the alignment of a human LONG RNA-seq dataset composed (by design) almost exclusively of reads that cross splice junctions.

...