With 454 Reads
A few helpful tools for using BLAST with 454 data. They are currently installed only on Fourierseq.
- bacfish.sh <blast.out> - After running BLAST on 454 Newbler contigs (usually blastn against nt with an E-value cutoff of 1e-50 and m=1), run this script on the BLAST output file to group your contigs. It was written specifically for sorting out fragment sequencing results of BACs, where you get several contigs and want to quickly validate them and bin them into E. coli vs. something previously sequenced vs. something new.
- 454blastStats <454reads.fna> - Given raw 454 reads in <454reads.fna>, this script runs a high-stringency BLAST against nt and produces a quick-and-dirty frequency plot of the top hits. Useful for making sure you sequenced what you thought you were sequencing.
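The idea behind both tools (take the best hit per query, then tally or bin by what was hit) can be sketched in a few lines of Python. This is only an illustration, not the code of bacfish.sh or 454blastStats. It assumes tab-separated BLAST output with the query ID in column 1 and the subject ID in column 2, and that hits for each query are listed best-first. The function names `top_hit_counts` and `text_histogram` are hypothetical.

```python
from collections import Counter


def top_hit_counts(lines):
    """Count how often each subject is the top hit (one vote per query).

    Assumes tab-separated BLAST output: query ID in column 1, subject ID
    in column 2, hits for a query listed best-first.
    """
    counts = Counter()
    seen_queries = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        if query in seen_queries:
            continue  # only the first (best) hit per query counts
        seen_queries.add(query)
        counts[subject] += 1
    return counts


def text_histogram(counts, width=40):
    """Return one 'subject<TAB>count<TAB>bar' row per subject, most frequent first."""
    if not counts:
        return []
    biggest = max(counts.values())
    rows = []
    for subject, n in counts.most_common():
        bar = "#" * max(1, round(width * n / biggest))
        rows.append(f"{subject}\t{n}\t{bar}")
    return rows
```

To try it on a real file, something like `print("\n".join(text_histogram(top_hit_counts(open("blast.out")))))` would print the tally, where `blast.out` is a placeholder for your own tabular BLAST output.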
Conversion To GFF (With Track Features)
blast2gff.py parses a file of BLAST results and writes the relevant parts of each BLAST record to a GFF3 file. It also provides options for decorating the data when it is loaded as a track into a genome browser (specifically IGV or the UCSC Genome Browser). Most of the options work with either browser, but a few are specific to IGV.
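For readers unfamiliar with track decoration: a track line is a single line of space-separated `key=value` settings that begins with the word `track`, with values containing spaces placed in double quotes. The following is a minimal sketch of composing such a line, not how blast2gff.py handles its options. The helper name `track_line` is hypothetical, and the settings shown (`name`, `description`, `color`) are ordinary track line attributes chosen for illustration.

```python
def track_line(**options):
    """Build a browser track line from keyword settings.

    Values containing whitespace are wrapped in double quotes; settings
    are emitted in the order they were passed.
    """
    parts = ["track"]
    for key, value in options.items():
        text = str(value)
        if any(ch.isspace() for ch in text):
            text = f'"{text}"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


# Example: a blue track with a readable label.
example = track_line(name="BLAST hits", description="blastn vs nt", color="0,0,255")
```

Which settings each browser honors varies, so check the browser's own documentation before relying on a particular attribute.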
Parsing is most reliable when the BLAST results are in XML format. blast2gff.py also accepts tabular data, but it expects a specific table layout when parsing, so XML is the safer choice.
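To show why the table layout matters, here is a minimal sketch of turning one tabular hit into a GFF3 line. It is not blast2gff.py's implementation, and the layout blast2gff.py expects may differ. The sketch assumes the common 12-column layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score) and treats the subject as the reference sequence shown in the browser. The function name `tabular_hit_to_gff3` and the `hitN` IDs are hypothetical.

```python
from urllib.parse import quote


def tabular_hit_to_gff3(line, hit_number, source="BLAST"):
    """Convert one 12-column tabular BLAST hit to a GFF3 feature line.

    Assumption: the subject is the reference sequence, so its ID becomes
    the GFF3 seqid and its coordinates become start/end.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 tab-separated columns, got {len(fields)}")
    query, subject = fields[0], fields[1]
    qstart, qend, sstart, send = (int(x) for x in fields[6:10])
    evalue = fields[10]
    # A hit on the reverse strand is reported with subject start > end;
    # GFF3 requires start <= end plus an explicit strand.
    strand = "+" if sstart <= send else "-"
    start, end = min(sstart, send), max(sstart, send)
    # Percent-encode the query ID so characters reserved in GFF3
    # attributes (such as ';', '=', and spaces) cannot break the column.
    name = quote(query, safe="")
    attributes = f"ID=hit{hit_number};Name={name};Target={name} {qstart} {qend}"
    return "\t".join(
        [subject, source, "match", str(start), str(end), evalue, strand, ".", attributes]
    )
```

If a column is missing or reordered, the coordinates land in the wrong fields without any obvious error, which is the main reason XML input is more robust.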
To run blast2gff.py, give the script the input BLAST file and use standard Unix redirection to write the GFF3 results to a file. For example:
To add track options, append the name of each option to the command along with its parameter, as specified by the standard track line designation. A generic example:
To see the supported options, run