MPI BLAST

The mpiBLAST module allows one to perform large, distributed BLAST database searches that could be prohibitively time-consuming on single-node computing systems.

You will learn from this exercise:

The basics of configuring mpiBLAST
Setting up a parallel environment (-pe) variable
Launching MPI jobs using ibrun

Set up .ncbirc

mkdir $WORK/mpiblast
chmod a+rw $WORK/mpiblast
Create a file named .ncbirc in your home directory and open it (e.g. nano ~/.ncbirc). Edit it to look similar to this, substituting the path to your own $WORK/mpiblast directory:


 [mpiBLAST]
 Shared=/work/01374/vaughn/mpiblast
 Local=/tmp

 [NCBI]
 data=/work/01374/vaughn/mpiblast/data
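
The same file can also be generated from the shell, so that the paths follow your own directories without hand editing. This is a minimal sketch: the function name write_ncbirc is hypothetical, and it assumes the bracketed section headers ([mpiBLAST] and [NCBI]) of the usual NCBI configuration file format.

```shell
# Hypothetical helper: write an .ncbirc that points at a shared mpiBLAST directory.
#   $1 = shared directory (for this tutorial, $WORK/mpiblast)
#   $2 = file to write (defaults to ~/.ncbirc; an existing file is overwritten)
write_ncbirc() {
    local shared_dir="$1"
    local target="${2:-$HOME/.ncbirc}"

    # Unquoted heredoc delimiter, so $shared_dir is expanded when the file is written.
    cat > "$target" <<EOF
[mpiBLAST]
Shared=$shared_dir
Local=/tmp

[NCBI]
data=$shared_dir/data
EOF
}
```

Calling write_ncbirc "$WORK/mpiblast" and then cat ~/.ncbirc lets you confirm by eye that the expanded paths are the ones you expect.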

Set up work directory and create an mpiBLAST-formatted database

cp -R /work/01374/vaughn/home/lonestar/UTRC/tutorial02 $WORK
cd $WORK/tutorial02
cp -R data $WORK/mpiblast/
module load mpiblast
$TACC_MPIBLAST_BIN/mpiformatdb -i plantrefseq.fa --nfrags=24
mv plantrefseq.fa.* $WORK/mpiblast/
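
Before submitting a job, it can be worth confirming that the formatted database files actually reached the shared directory. The sketch below uses a hypothetical helper, count_db_files; it relies only on the plantrefseq.fa. prefix matched by the mv command above and makes no assumption about the individual file extensions that mpiformatdb produces.

```shell
# Hypothetical helper: count the files in a directory whose names start with
# "<database name>." (the same pattern the mv command above matches).
#   $1 = directory to inspect, $2 = database name
count_db_files() {
    local dir="$1"
    local db="$2"
    local n=0
    local f

    for f in "$dir/$db".*; do
        # When nothing matches, the pattern stays literal, so test for existence.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

For this tutorial the call would be count_db_files "$WORK/mpiblast" plantrefseq.fa. A result of 0 suggests the database is not in the directory that the Shared setting in .ncbirc points to.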

Write and submit an MPI-based SGE script for mpiBLAST


 #!/bin/bash

 # You will learn from this exercise:
 ## The basics of configuring mpiBLAST
 ## Setting up a parallel environment (-pe) variable
 ## Launching MPI jobs using ibrun

 #$ -V

 #$ -cwd
 #$ -N tut2-mpiblast
 #$ -A 20111206BIO
 #$ -j y
 #$ -pe 12way 48
 #$ -q development
 #$ -l h_rt=00:15:00

# Set up module
module load mpiblast

# ibrun is the MPI launcher at TACC
# euc_assembly.fa contains 721 transcript assemblies from Eucalyptus that are 2kb or longer
# plantrefseq.fa is the reference peptide set from NCBI refseq
## Corresponds to the database we built using mpiformatdb
# We are asking blastx to emit tabular results ( -m 9 )

ibrun -n 24 -o 0 $TACC_MPIBLAST_BIN/mpiblast -p blastx \
-i euc_assembly.fa -d plantrefseq.fa -m 9 \
-o euc_assembly.blast9
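
The relationship between the -pe request and the ibrun flags in the script above can be worked through with a little shell arithmetic. This is a sketch under the assumption that each node has 12 cores, as the 12way name suggests; the variable names are illustrative and are not part of SGE or ibrun.

```shell
# Values copied from the job script: "#$ -pe 12way 48" and "ibrun -n 24 -o 0".
tasks_per_node=12     # the "12way" part: MPI tasks placed on each node
cores_requested=48    # the number that follows "12way"
cores_per_node=12     # assumption: 12 cores per node
ibrun_tasks=24        # ibrun -n: number of MPI tasks to start
ibrun_offset=0        # ibrun -o: first task slot to use

nodes=$((cores_requested / cores_per_node))
task_slots=$((nodes * tasks_per_node))
last_slot=$((ibrun_offset + ibrun_tasks - 1))

echo "nodes allocated:     $nodes"
echo "MPI task slots:      $task_slots"
echo "slots used by ibrun: $ibrun_offset to $last_slot"
```

Under these assumptions the job reserves 4 nodes with 48 task slots, while ibrun starts mpiblast on only the first 24 of them. Once the job script is saved to a file, it is submitted with qsub, for example qsub tut2-mpiblast.sge (the file name here is hypothetical).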

Links

http://www.mpiblast.org/Docs/Guide