MPI BLAST
The mpiBLAST module allows one to perform large, distributed BLAST database searches that could be prohibitively time-consuming on single-node computing systems.
You will learn from this exercise:
- The basics of configuring mpiBLAST
- Setting up a parallel environment (-pe) variable
- Launching MPI jobs using ibrun
Set up .ncbirc
- mkdir $WORK/mpiblast
- chmod a+rw $WORK/mpiblast
- Create a file named .ncbirc in your home directory and open it (e.g. nano ~/.ncbirc). Edit it to look like the following, substituting the path to your own $WORK/mpiblast directory:
[mpiBLAST]
Shared=/work/01374/vaughn/mpiblast
Local=/tmp
[NCBI]
data=/work/01374/vaughn/mpiblast/data
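Rather than typing the paths by hand, the file can be generated from your own work directory. The following is a minimal sketch, not part of the original exercise: `render_ncbirc` is a hypothetical helper name, the section headers are written in the bracketed INI style that .ncbirc files use, and `/path/to/work` is only a placeholder printed when `$WORK` is not set.

```shell
# Print an .ncbirc for mpiBLAST to stdout, built from the given work directory.
# render_ncbirc is a hypothetical helper, not something mpiBLAST provides.
render_ncbirc() {
    work_dir=$1
    cat <<EOF
[mpiBLAST]
Shared=${work_dir}/mpiblast
Local=/tmp
[NCBI]
data=${work_dir}/mpiblast/data
EOF
}

# Preview the result first. The fallback path is a placeholder for illustration.
render_ncbirc "${WORK:-/path/to/work}"
```

Once the preview looks right, redirecting the output (`render_ncbirc "$WORK" > ~/.ncbirc`) writes the file in place. Printing to stdout first avoids overwriting an existing ~/.ncbirc by accident.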
Set up work directory and create an mpiBLAST-formatted database
- cp -R /work/01374/vaughn/home/lonestar/UTRC/tutorial02 $WORK
- cd $WORK/tutorial02
- cp -R data $WORK/mpiblast/
- module load mpiblast
- $TACC_MPIBLAST_BIN/mpiformatdb -i plantrefseq.fa --nfrags=24
- mv plantrefseq.fa.* $WORK/mpiblast/
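Before submitting a job, it can be worth checking that the fragments actually landed in the shared directory. The sketch below counts distinct fragment indices. It assumes that fragment files are named `<database>.<NNN>.<extension>` (for example `plantrefseq.fa.000.phr`); that naming is an assumption here rather than something this tutorial states, and `count_fragments` is a hypothetical helper name.

```shell
# Count distinct fragment indices for a formatted database in a directory.
# Assumes files are named <db>.<NNN>.<ext>; adjust the pattern if yours differ.
count_fragments() {
    dir=$1
    db=$2
    for f in "$dir/$db".[0-9]*.*; do
        [ -e "$f" ] || continue      # glob matched nothing
        name=${f##*/}                # strip the directory part
        rest=${name#"$db".}          # e.g. 003.phr
        echo "${rest%%.*}"           # fragment index, e.g. 003
    done | sort -u | wc -l | tr -d ' '
}
```

For this exercise the call would be `count_fragments "$WORK/mpiblast" plantrefseq.fa`, and the number printed can be compared with the value passed to `--nfrags`. A result of 0 usually means the files were not moved or the naming assumption does not hold.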
Write and submit an MPI-based SGE script for mpiBLAST
#!/bin/bash
# You will learn from this exercise:
## The basics of configuring mpiBLAST
## Setting up a parallel environment (-pe) variable
## Launching MPI jobs using ibrun
#$ -V
#$ -cwd
#$ -N tut2-mpiblast
#$ -A 20111206BIO
#$ -j y
#$ -pe 12way 48
#$ -q development
#$ -l h_rt=00:15:00
# Set up module
module load mpiblast
# ibrun is the MPI launcher at TACC
# euc_assembly.fa contains 721 transcript assemblies from Eucalyptus that are 2kb or longer
# plantrefseq.fa is the reference peptide set from NCBI refseq
## Corresponds to the database we built using mpiformatdb
# We are asking blastx to emit tabular results ( -m 9 )
ibrun -n 24 -o 0 $TACC_MPIBLAST_BIN/mpiblast -p blastx \
    -i euc_assembly.fa -d plantrefseq.fa -m 9 \
    -o euc_assembly.blast9
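Once the job (submitted with qsub) has finished, the tabular output can be summarized from the command line. The sketch below prints the first hit reported for each query. It assumes the standard 12-column BLAST tabular layout (query, subject, ..., e-value in column 11, bit score in column 12), that comment lines start with `#`, and that hits for each query are listed best-first, as in standard BLAST output. `best_hits` is a hypothetical helper name.

```shell
# Print query, subject, e-value and bit score of the first hit for each query.
# Skips the '#' comment lines that -m 9 adds to the tabular output.
best_hits() {
    awk -F '\t' '!/^#/ && NF >= 12 && !seen[$1]++ {
        print $1 "\t" $2 "\t" $11 "\t" $12
    }' "$1"
}

# Run on the job output only if it exists in the current directory.
if [ -f euc_assembly.blast9 ]; then
    best_hits euc_assembly.blast9
fi
```

Piping the result through `wc -l` gives the number of query sequences with at least one hit, which is a quick sanity check against the 721 input assemblies.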
Links