Using threading (where available)

Recall that we added a 'time' command to the BWA and Bowtie commands. This prints a message to your job's log reporting how long each task took to run. Your jobs are probably still running, but we have included log files from our own BWA alignment in the $BASE directory: align_bwa_01.e774173 and align_bwa_01.o774173. The one we are interested in is align_bwa_01.e774173. Let's dig into it and pull out some numbers, but first, an introduction to the 'time' command is in order.

Interpreting the output of the UNIX time command

Type in the following command, which reports how long it takes to 'sleep' for 30 seconds:

login1$ time sleep 30s
real	0m30.006s
user	0m0.000s
sys	0m0.000s

Question: What line in the result from your time command reports the total duration that the task ran?

The 'real' line reports the total elapsed (wall-clock) time. The 'user' line reports how much CPU time the task spent running its own code, and 'sys' reports how much CPU time the operating system spent working on the task's behalf over the same period.
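Values such as 0m30.006s are awkward to compare or add up by eye. As a minimal sketch, the helper below converts one of these values to plain seconds with awk. The function name to_seconds is made up for this illustration and is not part of any tool used in this tutorial.

```shell
# Hypothetical helper: convert a 'time' value such as 0m30.006s to seconds.
to_seconds() {
    # Split on 'm' and 's': field 1 is minutes, field 2 is seconds.
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

to_seconds 0m30.006s   # the 'real' value from the sleep example, prints 30.006
to_seconds 0m0.000s    # the 'user' value: sleep used essentially no CPU time
```

The large gap between 'real' and 'user' here is expected: a sleeping process waits on the clock rather than doing work on the CPU.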

Now, use the 'less' command to page through the align_bwa_01.e774173 file.

Question: How much elapsed time was required for each BWA (aln and sampe) task to complete?

...
real	13m7.858s
user	9m30.156s
sys	0m4.176s
...
real	9m33.198s
user	9m23.711s
sys	0m2.860s
...
real	15m18.292s
user	12m47.308s
sys	0m22.737s
...

The first task took about 13m, the second about 9.5m, and the third about 15m. Summed together, it took ~38m for BWA to finish all tasks associated with alignment.
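The same sum can be computed directly from the log instead of by hand. The sketch below assumes the log contains 'real' lines in the format shown above; sum_real is a hypothetical helper name, not an existing command.

```shell
# Hypothetical helper: add up every 'real' line read from standard input
# and report the total in minutes.
sum_real() {
    awk '$1 == "real" {
             split($2, t, "[ms]")            # t[1] = minutes, t[2] = seconds
             total += t[1] * 60 + t[2]
         }
         END { printf "%.1f minutes\n", total / 60 }'
}

# The three 'real' values from the excerpt above.
printf 'real\t13m7.858s\nreal\t9m33.198s\nreal\t15m18.292s\n' | sum_real
# prints: 38.0 minutes
```

To run it on an actual log, redirect the file into the function, for example: sum_real < align_bwa_01.e774173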

Bonus question: If your Bowtie and BWA tasks have run to completion, compare how long it took each aligner to generate a SAM file of the sequence alignments.

Use 'less' to page through the .e files resulting from your jobs. Sum the real times for the BWA tasks in order to compare directly to the single Bowtie task.

Adding parallelism to your BWA and Bowtie alignments

It turns out that both BWA and Bowtie support on-node parallelism via threads. Let's test them out! Create a new folder 'threaded-alignment' in your $WORK directory, then copy the original align_bwa_01.sh and align_bowtie_01.sh files into it under the new names align_bwa_threads.sh and align_bowtie_threads.sh.

login1$ cd $WORK
login1$ mkdir threaded-alignment
login1$ cd threaded-alignment
login1$ cp $BASE/align_bwa_01.sh align_bwa_threads.sh
login1$ cp $BASE/align_bowtie_01.sh align_bowtie_threads.sh

Find out how to invoke 'bwa aln' with multiple threads:

login1$ module load bwa/0.6.1
login1$ bwa aln

Usage:   bwa aln [options] <prefix> <in.fq>

Options: -n NUM    max #diff (int) or missing prob under 0.02 err rate (float) [0.04]
         -o INT    maximum number or fraction of gap opens [1]
         -e INT    maximum number of gap extensions, -1 for disabling long gaps [-1]
         -i INT    do not put an indel within INT bp towards the ends [5]
         -d INT    maximum occurrences for extending a long deletion [10]
         -l INT    seed length [32]
         -k INT    maximum differences in the seed [2]
         -m INT    maximum entries in the queue [2000000]
         -t INT    number of threads [1]
...

Pass the -t parameter to bwa aln to get threaded behavior. Does bwa sampe support threads?

Question: Given the number of processors on a Lonestar compute node, what is a reasonable number of threads to request?

  • A) 1
  • B) 6
  • C) 12
  • D) 144

A, B, and C. Since most Lonestar compute nodes have 12 processors, you can reasonably request anywhere between 1 and 12 threads. Asking for more threads than the node has processors oversubscribes it and is not expected to make the job any faster.
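One way to avoid requesting too many threads is to check the core count from inside the job script and cap the request. The following is a sketch only: pick_threads is a hypothetical helper, and getconf _NPROCESSORS_ONLN is assumed to be available (it is on typical Linux systems). Note that on a login node it reports the login node's cores, not those of a compute node.

```shell
# Hypothetical helper: never use more threads than there are cores.
# $1 = threads requested, $2 = cores available
pick_threads() {
    if [ "$1" -gt "$2" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

cores=$(getconf _NPROCESSORS_ONLN)    # cores on the machine running this script
threads=$(pick_threads 6 "$cores")
echo "using $threads of $cores cores"
```

The resulting value could then be passed to the aligner, for example as bwa aln -t $threads.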

Open align_bwa_threads.sh in a text editor and find the 'bwa aln' lines. Edit them to request 6 threads, then submit the job to the Lonestar queue.

...
time bwa aln -t 6 $BASE/human_variation/ref/hs37d5.fa $BASE/human_variation/allseqs_R1.fastq > r1.sai
time bwa aln -t 6 $BASE/human_variation/ref/hs37d5.fa $BASE/human_variation/allseqs_R2.fastq > r2.sai
...
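If you would rather not edit by hand, the same change can be made with sed. This is a sketch under the assumption that every alignment line contains the literal text 'bwa aln ' followed by its arguments; add_bwa_threads is a made-up helper name, and the file names in the sample line are shortened placeholders.

```shell
# Hypothetical helper: insert '-t N' after every 'bwa aln' in a script
# read from standard input. $1 = thread count
add_bwa_threads() {
    sed "s/bwa aln /bwa aln -t $1 /"
}

# Try it on a sample line first; 'bwa sampe' lines are left untouched.
echo 'time bwa aln ref.fa reads_R1.fastq > r1.sai' | add_bwa_threads 6
# prints: time bwa aln -t 6 ref.fa reads_R1.fastq > r1.sai
```

To apply it to the real script, write the result to a new file and inspect it before submitting, for example: add_bwa_threads 6 < align_bwa_threads.sh > align_bwa_threads.new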

Question: Based on your time profiling from before, how long do you expect the entire BWA mapping job to take now that you have added threading to the alignment steps?

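With 6 threads, the time required for each 'aln' step should drop by up to ~6x (from about 13m and 9.5m to roughly 2m each, if the speedup is ideal), while the time to complete 'sampe' stays at about 15m because it is not threaded here. We can therefore expect the entire workflow to complete in about 2 + 2 + 15 = 19 minutes.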
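The estimate can be reproduced for any thread count with a few lines of awk. This sketch assumes that the first two timings in the log belong to the 'aln' steps and the third to 'sampe', and that 'aln' speeds up perfectly linearly with the number of threads, which real runs rarely achieve. estimate_minutes is a hypothetical helper name.

```shell
# Hypothetical helper: predicted total run time in minutes for N threads.
# $1 = number of threads given to each 'bwa aln' step
estimate_minutes() {
    awk -v n="$1" 'BEGIN {
        aln1  = 13 * 60 + 7.858     # first timing, assumed to be an aln step
        aln2  =  9 * 60 + 33.198    # second timing, assumed to be an aln step
        sampe = 15 * 60 + 18.292    # third timing, assumed to be sampe (serial)
        printf "%.1f\n", (aln1 / n + aln2 / n + sampe) / 60
    }'
}

estimate_minutes 1    # prints 38.0, the unthreaded total
estimate_minutes 6    # prints 19.1
estimate_minutes 12   # prints 17.2
```

Going from 6 to 12 threads saves only about two more minutes under these assumptions, because the serial 'sampe' step dominates the remaining run time.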

Your turn: Modify align_bowtie_threads.sh to add thread-level parallelism to your Bowtie alignment and submit the job to the Lonestar queue.

First, find out how to invoke threading in Bowtie.

login1$ module load bowtie/0.12.8
login1$ bowtie
No index, query, or output file specified!
Usage:
  bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]

  <m1>    Comma-separated list of files containing upstream mates (or the
          sequences themselves, if -c is set) paired with mates in <m2>
  <m2>    Comma-separated list of files containing downstream mates (or the
          sequences themselves if -c is set) paired with mates in <m1>
  <r>     Comma-separated list of files containing Crossbow-style reads.  Can be
          a mixture of paired and unpaired.  Specify "-" for stdin.
  <s>     Comma-separated list of files containing unpaired reads, or the
          sequences themselves, if -c is set.  Specify "-" for stdin.
  <hit>   File to write hits to (default: stdout)
...
Performance:
  -o/--offrate <int> override offrate of index; must be >= index's offrate
  -p/--threads <int> number of alignment threads to launch (default: 1)
  --mm               use memory-mapped I/O for index; many 'bowtie's can share
  --shmem            use shared mem for index; many 'bowtie's can share
...

Pass -p or --threads to bowtie when you set up the alignment.

time bowtie --chunkmbs 256 --threads 6 -x -t -S $BASE/human_variation/ref/hs37d5.fa -1 $BASE/human_variation/allseqs_R1.fastq -2 $BASE/human_variation/allseqs_R2.fastq hs37d5_allseqs_bowtie.sam

Feel free to experiment with increasing the number of threads you assign to BWA or Bowtie, up to a maximum of 12.
