Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

After bowtie2 came out with a local alignment option, it wasn't long before bwa developed its own local aligner alignment algorithm called BWA-MEM (for Maximal Exact Matches). This aligner is very, very nice because it , implemented by the bwa mem command. bwa mem has the following advantages:

  • It incorporates a lot of the simplicity of using

...

  • bwa with the complexities of local alignment

...

  • ,

...

  • enabling

...

  • straightforward alignment of datasets like the mirbase data we just examined

...

  • It can align different portions of a read to different locations on the genome
    • In a long RNA-seq experiment, reads will (at some frequency) span a splice junction themselves, or a pair of reads in a paired-end library will fall on either side of a splice junction.

...

    • We want to be able to align reads that do this for many reasons, from accurate transcript quantification to novel fusion transcript discovery.

Thus  Thus, our last exercise will be the alignment of a human LONG long RNA-seq dataset composed (by design) almost exclusively of reads that cross splice junctions.

BWA-MEM should have been loaded bwa mem was made available when we loaded the BWA bwa module, so to look at the details of MEM alignment, just enter "bwa mem" to get the help menu with the options list.  The take a look at its usage information. The most important parameters, similar to those we've manipulated in the past two sections, are the following:

OptionEffect
-kControls the minimum seed length (default = 19)
-wControls the "gap bandwidth", or the length of a maximum gap. This is particularly relevant for MEM, since it can determine whether a read is split into two separate alignments , or is reported as one long alignment with a long gap in the middle (default = 100)
-rControls how long an alignment must be relative to its seed before it is re-seeded to try to find a best-fit local match (default = 1.5, e.g. the value of -k multiplied by 1.5)
-cControls how many matches a MEM must have in the genome before it is discarded (default = 10000)
-tControls the number of threads to use

There are many more parameters to control the scoring scheme and other details, but these are the most essential parameters ones to use to get anything of value at all.

...