To run BWA on SOLiD colorspace data, do the following:
1. Create fastq:
Your output fastq file will be named "p1.<experiment name>.fastq"
Use solid2fastq_v2.pl to convert the csfasta file to a fastq file. csfastaToFastq seems to trim off the last digit of the read ID, so solid2fastq_v2.pl may be the better choice.
If your csfasta and quality files are named test_F3.csfasta, test_F3_QV.qual, test_F5-RNA.csfasta and test_F5-RNA_QV.qual, run:
Output files will be called test_F3.out.single.fastq and test_F5.out.single.fastq
2. Create an index of your reference:
This will create a set of index files with the root name "reference.fasta". In subsequent commands, you simply refer to this set by the base name.
Note: if you are working with a small reference (less than 10MB), you need to use -a is rather than -a bwtsw, because the bwtsw algorithm will fail on a reference that small. Details are at the BWA Manual page.
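The rule in the note can be written down as a small helper. This is only a sketch: the function names are made up for illustration, the 10MB cut-off is the rule of thumb from the note rather than a documented limit, and the -c (colorspace) flag is assumed to be available, which is only true of older BWA releases that still support SOLiD data.

```python
# Rule-of-thumb threshold from the note above (assumption: 10 MB).
SMALL_REFERENCE_BYTES = 10 * 1024 * 1024


def choose_index_algorithm(reference_size_bytes):
    """Return the value to pass to `bwa index -a` for a reference of this size."""
    return "is" if reference_size_bytes < SMALL_REFERENCE_BYTES else "bwtsw"


def index_command(reference_path, reference_size_bytes, colorspace=True):
    """Build the argv list for `bwa index`; nothing is executed here."""
    cmd = ["bwa", "index", "-a", choose_index_algorithm(reference_size_bytes)]
    if colorspace:
        # Assumption: a BWA version that still accepts -c for colorspace indexes.
        cmd.append("-c")
    cmd.append(reference_path)
    return cmd
```

The size can come from os.path.getsize("reference.fasta"), and the returned list can be handed to subprocess.run(cmd, check=True).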
3. Align the fastq to the reference:
If your reads are mate-pair, you must run this command once for your F3 reads and then again for your R3 reads, writing to separate output files.
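To keep the two mate-pair runs from overwriting each other, it helps to derive each output name from its input. A minimal sketch, assuming a BWA version whose aln subcommand accepts -c for colorspace reads; the helper name and the example file names are hypothetical.

```python
def aln_commands(reference, fastq_files):
    """Return one (argv, sai_path) pair per fastq file.

    `bwa aln` writes its alignment to stdout, so sai_path is where the
    caller should redirect that output.
    """
    jobs = []
    for fastq in fastq_files:
        # test_F3.out.single.fastq -> test_F3.out.single.sai
        sai = fastq.rsplit(".fastq", 1)[0] + ".sai"
        # Assumption: -c (colorspace) is supported by the installed BWA.
        jobs.append((["bwa", "aln", "-c", reference, fastq], sai))
    return jobs
```

Each job can then be run with something like subprocess.run(argv, stdout=open(sai_path, "wb"), check=True), once for the F3 file and once for the R3 file.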
4. Convert the alignment output to a SAM file; the command required depends on whether the data is paired-end or single-end:
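The choice between the two subcommands comes down to how many .sai files you have: bwa samse takes one .sai and one fastq, bwa sampe takes two of each. A sketch of that decision, with a hypothetical helper name and file names:

```python
def sam_command(reference, sai_files, fastq_files):
    """Build the argv for `bwa samse` (one read file) or `bwa sampe` (two).

    Argument order follows the BWA usage lines:
      bwa samse <prefix> <in.sai> <in.fq>
      bwa sampe <prefix> <in1.sai> <in2.sai> <in1.fq> <in2.fq>
    """
    if len(sai_files) != len(fastq_files) or len(sai_files) not in (1, 2):
        raise ValueError("expected one or two .sai files with matching fastq files")
    subcommand = "samse" if len(sai_files) == 1 else "sampe"
    return ["bwa", subcommand, reference, *sai_files, *fastq_files]
```

Both subcommands write SAM to stdout, so redirect the output to a file such as out.sam when running the command.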
NOTE: these SAM files contain ALL reads, whether they hit the reference or not, which can make them LARGE. To filter for just hits, use something like this:
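One way to do the filtering, sketched here in Python rather than as the original one-liner: a read is unmapped when bit 0x4 is set in the FLAG column (the second SAM field), so keeping only the lines without that bit leaves just the hits. The function name is made up for illustration; samtools view -F 4 applies the same test.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def keep_mapped(sam_lines):
    """Yield header lines and alignment lines whose read hit the reference."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.split("\t")
        if len(fields) > 1 and not (int(fields[1]) & UNMAPPED_FLAG):
            yield line
```

For a large file, stream it line by line: open the input and output files and write each line that keep_mapped yields, rather than loading the whole SAM file into memory.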
From here, you might wish to:
out.sorted.bam and out.sorted.bam.bai will be ready for IGV.
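A typical way to get to those two files is convert, sort, index. The sketch below only builds the command lines; it assumes a samtools 1.x installation (older 0.1.x releases use a different sort syntax and need -S to read SAM), and the helper name and the "out" prefix are illustrative.

```python
def bam_pipeline(sam_path, prefix="out"):
    """Return the samtools argv lists that turn a SAM file into an indexed BAM."""
    bam = prefix + ".bam"
    sorted_bam = prefix + ".sorted.bam"
    return [
        ["samtools", "view", "-b", "-o", bam, sam_path],  # SAM -> BAM
        ["samtools", "sort", "-o", sorted_bam, bam],      # coordinate sort
        ["samtools", "index", sorted_bam],                # writes <sorted_bam>.bai
    ]
```

Running the three commands in order, for example with subprocess.run(cmd, check=True), leaves out.sorted.bam and out.sorted.bam.bai in the working directory.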
For some reason, I often have a one-bp offset between IGV's view of the genome and BWA's; I fix this by adjusting the positions while parsing the bwa sampe output:
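A sketch of that kind of adjustment, not the original parsing code: it shifts the POS column (field 4) and the mate position PNEXT (field 8) of each alignment line by a fixed amount. The direction of the shift is left as a parameter because it depends on which way your data is off; the function name is hypothetical.

```python
def shift_positions(sam_lines, offset):
    """Yield SAM lines with POS and PNEXT moved by `offset` bases.

    Header lines and positions of 0 (meaning "not available") are left alone.
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        newline = "\n" if line.endswith("\n") else ""
        fields = line.rstrip("\n").split("\t")
        for column in (3, 7):  # 0-based indexes of POS and PNEXT
            if len(fields) > column and fields[column].isdigit() and int(fields[column]) > 0:
                fields[column] = str(int(fields[column]) + offset)
        yield "\t".join(fields) + newline
```

Check a few reads in IGV after shifting to confirm the offset went the right way, and note that a negative offset applied to a read at position 1 would produce 0, which SAM treats as unmapped.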
BWA is also installed on Phylocluster.