The mpiBLAST module allows one to perform large, distributed BLAST database searches that could be prohibitively time-consuming on single-node computing systems.

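You will learn from this exercise:

  1. The basics of configuring mpiBLAST
  2. Setting up a parallel environment (-pe) variable
  3. Launching MPI jobs using ibrun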

Set up .ncbirc

  1. mkdir $WORK/mpiblast
  2. chmod a+rw $WORK/mpiblast
  3. Create a file named .ncbirc in your home directory and open it (e.g. nano ~/.ncbirc). Edit it so that its paths point to your own $WORK/mpiblast directory.
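
As a hedged sketch of what such a file might contain, the snippet below writes a .ncbirc using the [NCBI], [BLAST] and [mpiBLAST] sections that mpiBLAST reads. The Data and BLASTMAT paths assume that the data directory copied into $WORK/mpiblast in the next section holds the scoring matrices. The Local=/tmp scratch location, the fallback to $HOME when $WORK is unset, and the NCBIRC_PATH override are assumptions made for illustration, not values taken from this tutorial.

```shell
# Sketch only: write a starting-point .ncbirc for mpiBLAST.
# Assumption: the shared directory is the $WORK/mpiblast created in step 1.
shared_dir="${WORK:-$HOME}/mpiblast"
# Assumption: NCBIRC_PATH is a hypothetical override, handy for a dry run.
ncbirc="${NCBIRC_PATH:-$HOME/.ncbirc}"

# Keep a copy of any existing file instead of silently overwriting it.
if [ -f "$ncbirc" ]; then
  cp "$ncbirc" "$ncbirc.bak"
fi

# The heredoc is unquoted so that ${shared_dir} expands to a literal path.
cat > "$ncbirc" <<EOF
[NCBI]
Data=${shared_dir}/data

[BLAST]
BLASTDB=${shared_dir}
BLASTMAT=${shared_dir}/data

[mpiBLAST]
Shared=${shared_dir}
Local=/tmp
EOF

# Show the result so the paths can be checked by eye.
cat "$ncbirc"
```

The file stores literal paths, so rerun the snippet (or edit the file by hand) if your $WORK location changes.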


Set up work directory and create an mpiBLAST-formatted database

  1. cp -R /work/01374/vaughn/home/lonestar/UTRC/tutorial02 $WORK
  2. cd $WORK/tutorial02
  3. cp -R data $WORK/mpiblast/
  4. module load mpiblast
  5. $TACC_MPIBLAST_BIN/mpiformatdb -i plantrefseq.fa --nfrags=24
  6. mv plantrefseq.fa.* $WORK/mpiblast/
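
One way to confirm that the formatted database landed in the shared directory is to count the files matching the same plantrefseq.fa.* pattern used in step 6. This is a minimal sketch: the helper name count_db_files is made up for illustration, and because the exact number and naming of the files written by mpiformatdb is not specified here, the check only reports a count rather than asserting an expected value.

```shell
# Hypothetical helper: count files named <prefix>.* inside a directory.
count_db_files() {
  dir="$1"
  prefix="$2"
  n=0
  for f in "$dir"/"$prefix".*; do
    # When nothing matches, the glob stays literal; skip that case.
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Usage with the paths from this tutorial (falls back to $HOME if $WORK is unset).
count_db_files "${WORK:-$HOME}/mpiblast" plantrefseq.fa
```

A count of zero suggests that the mv in step 6 did not run or that the database was formatted in a different directory.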

Write and submit an MPI-based SGE script for mpiBLAST
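
The job script in this section requests -pe 12way 48 and then starts 24 MPI tasks with ibrun -n 24. As a rough check of that arithmetic, the sketch below assumes 12 cores per node; that figure is an assumption about the target cluster, not something stated in this tutorial, so adjust it for other systems.

```shell
# Sketch: relate the -pe request to nodes and MPI task slots.
tasks_per_node=12   # the "12way" part of "-pe 12way 48"
total_cores=48      # the trailing number in "-pe 12way 48"
cores_per_node=12   # assumption: 12 cores per node on the target cluster
ibrun_tasks=24      # the value passed to "ibrun -n"

nodes=$(( total_cores / cores_per_node ))
max_tasks=$(( nodes * tasks_per_node ))

echo "nodes=${nodes} max_tasks=${max_tasks} ibrun_tasks=${ibrun_tasks}"

# The launch only fits if ibrun asks for no more tasks than the request allows.
if [ "$ibrun_tasks" -le "$max_tasks" ]; then
  echo "ok: ${ibrun_tasks} tasks fit within ${max_tasks} slots"
else
  echo "error: ${ibrun_tasks} tasks exceed ${max_tasks} slots" >&2
fi
```

With these numbers the request covers 4 nodes and 48 task slots, of which the ibrun line uses 24.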

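#!/bin/bash
# You will learn from this exercise:
#   The basics of configuring mpiBLAST
#   Setting up a parallel environment (-pe) variable
#   Launching MPI jobs using ibrun
#$ -V                    # Inherit the submission environment
#$ -cwd                  # Start the job in the submission directory
#$ -N tut2-mpiblast      # Job name
#$ -A 20111206BIO        # Project/allocation to charge
#$ -j y                  # Combine stderr and stdout
#$ -pe 12way 48          # 12 tasks per node, 48 cores in total
#$ -q development        # Queue name
#$ -l h_rt=00:15:00      # Run time limit (hh:mm:ss)

# Set up module
module load mpiblast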

# ibrun is the MPI launcher at TACC
# euc_assembly.fa contains 721 transcript assemblies from Eucalyptus that are 2kb or longer
# plantrefseq.fa is the reference peptide set from NCBI refseq
## Corresponds to the database we built using mpiformatdb
# We are asking blastx to emit tabular results ( -m 9 )


ibrun -n 24 -o 0 $TACC_MPIBLAST_BIN/mpiblast -p blastx \
-i euc_assembly.fa -d plantrefseq.fa -m 9 \
-o euc_assembly.blast9
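
Once the job finishes, the tabular report can be summarized with standard shell tools. The sketch below assumes the usual layout of BLAST's -m 9 output: comment lines start with #, and each hit is one tab-separated line of 12 fields with the query ID in column 1 and the e-value in column 11. The helper name queries_with_hits and its default cutoff are invented for illustration.

```shell
# Hypothetical helper: count distinct queries that have at least one hit
# at or below an e-value cutoff in a BLAST -m 9 (tabular) report.
queries_with_hits() {
  report="$1"
  cutoff="${2:-1e-5}"   # assumption: 1e-5 is only an illustrative default
  awk -F '\t' -v cutoff="$cutoff" '
    /^#/ { next }                                   # skip comment lines
    NF >= 12 && ($11 + 0) <= (cutoff + 0) { seen[$1] = 1 }
    END { n = 0; for (q in seen) n++; print n }     # number of distinct queries
  ' "$report"
}

# Usage after the job has finished:
#   queries_with_hits euc_assembly.blast9 1e-10
```

Comparing the printed count against the 721 input assemblies gives a quick sense of how many transcripts found a match in the reference peptide set.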