Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Expand
titleIf our reads were longer

Because these are short reads we do not have to adjust parameters like inter-seed distance (-i) or minimum alignment score (--min-score) that are a function of read length. If we were processing longer reads, we might need to use parameters like this, to force bowtie2 to "pretend" the read is a short, constant length:

-i C,1,0
--score-min=C,32,0

Yes, that looks complicated, and it kind of is. It's basically saying "slide the seed down the read one base at a time", and "report alignments as long as they have a minimum alignment score of 32 (16 matching bases x 2 points per match, minimum).

See the bowtie2 manual (after you have had a good stiff drink) for a full explanation.

As you can tell from looking at the bowtie2 help message, the general syntax looks like this:

Code Block
bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} [-S <sam>]

 

Let's make a link to the mirbase index directory to make our command line simpler:

...

Create a commands file called bt2.cmds with this task definition then generate and submit a batch job for it (time 1 hour, normal development queue).

Use nano to create the bt2.cmds file. Then:

Expand
titleWhat's going on?
Code Block
languagebash
titleLocal bowti2 alignment of miRNA data
launcher_creator.py -n bt2 -j bt2.cmds -t 01:00:00 -q normal
sbatch bt2.slurm; showq -u

...

Expand
titleSolution

We can use our cut / grep trick from Exercise #1, but on the human_mirnaseq.sam file. Since all read names in this file start with TUPAC, we'll use that pattern to select non-header lines.

Code Block
languagebash
titleCount aligned SAM records
grep -P -v '^TUPAC^@' human_mirnaseq.sam | cut -f 3 | grep -v '*' | wc -l

This expressions returns 221086.

...

Code Block
languagebash
titleCut mapping quality SAM field
grep -P -v '^TUPAC^@' human_mirnaseq.sam | cut -f 5 | head

...

Code Block
languagebash
titleCut mapping quality SAM field
grep -P -v '^TUPAC^@' human_mirnaseq.sam | cut -f 5 | sort -n | uniq -c | more

...

Expand
titleAnswer

We are looking at mapping quality values for both aligned and un-aligned records, but mapping quality only makes sense for aligned reads. This expression does not distinguish between mapping quality = 0 because the read mapped to multiple locations, and mapping quality = 0 because the sequence did not align.

The proper solution will await the use of samtools to filter out unmapped reads.

...

.

Exercise #4: BWA-MEM - Human mRNA-seq

...