...
Expand | ||
---|---|---|
| ||
Because these are short reads we do not have to adjust parameters like inter-seed distance (-i) or minimum alignment score (--min-score) that are a function of read length. If we were processing longer reads, we might need to use parameters like this, to force bowtie2 to "pretend" the read is a short, constant length: -i C,1,0 Yes, that looks complicated, and it kind of is. It's basically saying "slide the seed down the read one base at a time", and "report alignments as long as they have a minimum alignment score of 32 (16 matching bases x 2 points per match, minimum). See the bowtie2 manual (after you have had a good stiff drink) for a full explanation. |
As you can tell from looking at the bowtie2 help message, the general syntax looks like this:
Code Block |
---|
bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} [-S <sam>] |
Let's make a link to the mirbase index directory to make our command line simpler:
...
Create a commands file called bt2.cmds with this task definition then generate and submit a batch job for it (time 1 hour, normal development queue).
Use nano to create the bt2.cmds file. Then:
Expand | |||||||
---|---|---|---|---|---|---|---|
| |||||||
|
...
Expand | |||||||
---|---|---|---|---|---|---|---|
| |||||||
We can use our cut / grep trick from Exercise #1, but on the human_mirnaseq.sam file. Since all read names in this file start with TUPAC, we'll use that pattern to select non-header lines.
This expressions returns 221086. |
...
Code Block | ||||
---|---|---|---|---|
| ||||
grep -P -v '^TUPAC^@' human_mirnaseq.sam | cut -f 5 | head |
...
Code Block | ||||
---|---|---|---|---|
| ||||
grep -P -v '^TUPAC^@' human_mirnaseq.sam | cut -f 5 | sort -n | uniq -c | more |
...
Expand | ||
---|---|---|
| ||
We are looking at mapping quality values for both aligned and un-aligned records, but mapping quality only makes sense for aligned reads. This expression does not distinguish between mapping quality = 0 because the read mapped to multiple locations, and mapping quality = 0 because the sequence did not align. The proper solution will await the use of samtools to filter out unmapped reads. |
...
. |
Exercise #4: BWA-MEM - Human mRNA-seq
...