Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Expand
titleIf our reads were longer

Because these are short reads we do not have to adjust parameters like inter-seed distance (-i) or minimum alignment score (--min-score) that are a function of read length. If we were processing longer reads, we might need to use parameters like this, to force bowtie2 to "pretend" the read is a short, constant length:

-i C,1,0
--score-min=C,32,0

Yes, that looks complicated, and it kind of is. It's basically saying "slide the seed down the read one base at a time", and "report alignments as long as they have a minimum alignment score of 32 (16 matching bases x 2 points per match, minimum).

See the bowtie2 manual (after you have had a good stiff drink) for a full explanation.

...

Expand
titleWhat's going on?
  • --local – local alignment mode
  • -L 16 – seed length 16
  • -N 1 – allow 1 mismatch in the seed
  • -x <pfx> mb20/hairpin_cDNA_hsa.fa – prefix patch path of index files
  • -U <fq> fq/human_mirnaseq.fastq.gz – FASTQ file for single-end (UnpairedUnpaired) alignment
  • -S <outhuman_mirnaseq.sam> sam – tells bowtie2 to report alignments in SAM format to the specified file

Now you should have a human_mirnaseq.sam file that you can examine using whatever commands you like. An example alignment looks like this (note this is one alignment record, although it has been broken up below for readability).

Code Block
titleExample miRNA alignment record
TUPAC_0037_FC62EE7AAXX:2:1:2607:1430#0/1  0  hsa-mir-302b  50  22 3S20M13S * 0 0
    TACGTGCTTCCATGTTTTANTAGAAAAAAAAAAAAG  ZZFQV]Z[\IacaWc]RZIBVGSHL_b[XQQcXQcc
    AS:i:37 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:16G3       YT:Z:UU

Notes:

  • This is one alignment record, although it has been broken up below for readability.
  • Notice the CIGAR string is 3S20M13S, meaning that 3 bases were soft clipped from one end (3S), and 13 from the other (13S).
    • If we did the same alignment using either bowtie2 --end-to-end mode, or using bwa aln as in Exercise #1, very little of this file would have aligned.
  • The 20M part of the CIGAR string says there was a block of 20 read bases that mapped to the reference.
    • If we had not lowered the seed parameter of Bowtie2 from its default of 22, we would not have found many of the alignments like this one that only matched for 20 bases.

...