The DIAMOND protein aligner is a recent tool that aligns protein sequences against reference databases much faster than BLAST (100× to 1000× faster). Whenever the BLAST databases are updated and installed, diamond prepdb is run on each of the protein-format databases so that DIAMOND can search them directly. The BLAST databases are in /stor/system/opt/blastdb, and you can run diamond against any of the protein databases.

For example:

export BLASTDB=/stor/system/opt/blastdb
diamond blastp --db $BLASTDB/nr ...
diamond blastp --db $BLASTDB/refseq_protein ...

According to DIAMOND's developer, BLAST databases prepared this way are faster to load than DIAMOND's own .dmnd-format databases. For NCBI nr searches, for example, you may therefore want to use --db $BLASTDB/nr instead of --db $DIAMOND_NR.
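A minimal sketch of a complete search against one of the prepared BLAST databases. The query file, output file, and the choice of tabular output (--outfmt 6) are assumptions for illustration, and run_search and DRY_RUN are hypothetical names, not part of DIAMOND. Setting DRY_RUN=1 prints the command instead of running it, which is handy for checking paths before a long job.

```shell
#!/usr/bin/env bash
# Sketch: run a DIAMOND blastp search against a BLAST-format protein database.
# Hypothetical helper; query/output names and --outfmt 6 are illustrative choices.
set -euo pipefail

export BLASTDB=/stor/system/opt/blastdb

# run_search DB_NAME QUERY_FASTA OUTPUT_FILE
# Uses $BLASTDB/DB_NAME (e.g. nr, refseq_protein) instead of a .dmnd file
# such as $DIAMOND_NR. With DRY_RUN=1 it only prints the command.
run_search() {
    local db_name=$1 query=$2 out=$3
    # Build the command as an array so arguments with spaces stay intact.
    local cmd=(diamond blastp --db "$BLASTDB/$db_name" --query "$query" --out "$out" --outfmt 6)
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, DRY_RUN=1 run_search nr queries.faa nr_hits.tsv prints the full diamond command line; drop DRY_RUN to actually run the search.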

The BLAST databases use the "v5" format, which stores rich taxonomic information with the sequences, and they only work with the BLAST tools from the module blast/2.8.0+ and later. Earlier module versions can still be used, but you will need to provide or build your own databases. As of February 2020, NCBI no longer updates the older "v4" databases, and they have been deleted from our systems. The final versions of the v4 databases are (as of this writing) available from NCBI over FTP at ftp://ftp.ncbi.nlm.nih.gov/blast/db/v4
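A small sketch for checking whether the BLAST+ currently on your PATH is new enough for the v5 databases, using 2.8.0 as the cutoff from the module version above. It assumes the first line of blastp -version looks like "blastp: 2.9.0+" and that GNU sort -V is available; parse_blast_version and supports_v5 are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: check that the loaded BLAST+ is new enough (>= 2.8.0) for v5 databases.
# Assumes `blastp -version` prints a first line like "blastp: 2.9.0+"
# and that GNU `sort -V` (version-aware sort) is available.
set -euo pipefail

# Read a line such as "blastp: 2.9.0+" on stdin and print "2.9.0".
parse_blast_version() {
    sed -E 's/^blastp: ([0-9.]+).*/\1/'
}

# Succeed if version $1 is at least 2.8.0, comparing numerically per component
# (so 2.10.1 counts as newer than 2.9.0).
supports_v5() {
    local have=$1 need=2.8.0
    [[ "$(printf '%s\n' "$need" "$have" | sort -V | head -n 1)" == "$need" ]]
}

# Only run the live check if blastp is on PATH (e.g. after loading a blast module).
if command -v blastp >/dev/null 2>&1; then
    ver=$(blastp -version | head -n 1 | parse_blast_version)
    if supports_v5 "$ver"; then
        echo "BLAST+ $ver can read the v5 databases"
    else
        echo "BLAST+ $ver is too old for the v5 databases; load blast/2.8.0+ or later" >&2
    fi
fi
```

If the check fails, either load a newer blast module or point BLASTDB at databases you have built yourself for the older version.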
