BWA
BWA version
To run bwa on SOLiD colorspace data do the following:
1. Create fastq:
Code Block |
---|
csfastaToFastq -f <in.csfasta> -q <in_QV.qual> -e <experiment name> |
Your output fastq file will be named "p1.<experiment name>.fastq"
(or)
Use solid2fastq_v2.pl to convert csfasta to fastq file. csfastaToFastq seems to trim off the last digit in the read id, so you may prefer to use solid2fastq_v2.pl
When your csfasta files and quality files are named test_F3.csfasta test_F3_QV.qual, test_F5-RNA.csfasta and test_F5-RNA_QV.qual run:
Code Block |
---|
solid2fastq_v2.pl test_F3 test_F3.out solid2fastq_v2.pl test_F5-RNA test_F5-RNA.out |
Output files will be called test_F3.out.single.fastq and test_F5.out.single.fastq
2. Create an index of your reference:
Code Block |
---|
bwa index -a bwtsw <reference.fasta> |
For colorspace,
Code Block |
---|
bwa index -a bwtsw -c <reference.fasta> |
This will create a bunch of files with the root name "reference.fasta". In subsequent commands, you simply refer to this set by the base name.
Note: if you are working with a small reference (less than 10MB), you need to use -a is
rather than -a bwtsw
, as the BWT will fail. Details are at the BWA Manual page.
3. Align the fastq to the reference:
Code Block |
---|
bwa aln -f out.sai <reference.fasta> p1.<experiment name>.fastq &>bwa.log & |
For colorspace,
Code Block |
---|
bwa aln -c -f out.sai <reference.fasta> p1.<experiment name>.fastq &>bwa.log & |
If your reads are mate-pair, you must run this command once for your F3 reads and then again for your R3 reads, into seperate output files.
4. Convert the alignment output to a SAM file; the command required depends on whether data is paired-end or single-end:
Code Block |
---|
bwa sampe -f out.sam <reference.fasta> F3.sai F5.sai F3.fastq F5.fastq &>bwa.sampe.log & bwa -f out.sam samse <reference.fasta> F3.sai F3.fastq &>bwa.samse.log & |
NOTE: these sam files contain ALL reads, whether they hit the reference or not which can make them LARGE. To filter for just hits, use something like this:
Code Block |
---|
bwa sampe <reference.fasta> F3.sai F5.sai F3.fastq F5.fastq | \ awk '{if (substr($1,1,1)=="@") {print $0} else {if ($3!="*") {print $0}}}' > out.sam |
From here, you might wish to:
Code Block |
---|
samtools view \-b \-S <in.sam> > out.bam (convert to BAM file) samtools sort out.bam out.sorted (sort the BAM file) samtools index out.sorted.bam (index the BAM file) |
out.sorted.bam and out.sorted.bam.bai will be ready for IGV.
For some reason, I often have a one bp offset between IGV's view of the genome and BWA's; I fix this by doing math in the parsing of the bwa sampe output:
Code Block |
---|
awk 'BEGIN {OFS="\t"} {if (substr($1,1,1)=="@") {print $0} else {if ($3!="*") {$4=$4-1; print $0}}}' |
BWA is also installed in Phylocluster