Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 5.3

BWA

BWA version
To run bwa on SOLiD colorspace data do the following:
1. Create fastq:

Code Block
csfastaToFastq -f <in.csfasta> -q <in_QV.qual> -e <experiment name>

Your output fastq file will be named "p1.<experiment name>.fastq"

(or)
Use solid2fastq_v2.pl to convert csfasta to fastq file. csfastaToFastq seems to trim off the last digit in the read id, so you may prefer to use solid2fastq_v2.pl

When your csfasta files and quality files are named test_F3.csfasta test_F3_QV.qual, test_F5-RNA.csfasta and test_F5-RNA_QV.qual run:

Code Block
solid2fastq_v2.pl test_F3 test_F3.out
solid2fastq_v2.pl test_F5-RNA test_F5-RNA.out

Output files will be called test_F3.out.single.fastq and test_F5.out.single.fastq

2. Create an index of your reference:

Code Block
bwa index -a bwtsw <reference.fasta>

For colorspace,

Code Block
bwa index -a bwtsw -c <reference.fasta>

This will create a bunch of files with the root name "reference.fasta". In subsequent commands, you simply refer to this set by the base name.

Note: if you are working with a small reference (less than 10MB), you need to use -a is rather than -a bwtsw, as the BWT will fail. Details are at the BWA Manual page.

3. Align the fastq to the reference:

Code Block
bwa aln -f out.sai <reference.fasta> p1.<experiment name>.fastq &>bwa.log & 

For colorspace,

Code Block
bwa aln -c -f out.sai <reference.fasta> p1.<experiment name>.fastq &>bwa.log & 

If your reads are mate-pair, you must run this command once for your F3 reads and then again for your R3 reads, into seperate output files.

4. Convert the alignment output to a SAM file; the command required depends on whether data is paired-end or single-end:

Code Block
bwa sampe -f out.sam <reference.fasta> F3.sai F5.sai F3.fastq F5.fastq &>bwa.sampe.log &
bwa -f out.sam samse <reference.fasta> F3.sai F3.fastq &>bwa.samse.log &

NOTE: these sam files contain ALL reads, whether they hit the reference or not which can make them LARGE.  To filter for just hits, use something like this:

Code Block
bwa sampe <reference.fasta> F3.sai F5.sai F3.fastq F5.fastq | \
awk '{if (substr($1,1,1)=="@") {print $0} else {if ($3!="*") {print $0}}}' > out.sam

From here, you might wish to:

Code Block
samtools view \-b \-S <in.sam> > out.bam  (convert to BAM file)
samtools sort out.bam out.sorted  (sort the BAM file)
samtools index out.sorted.bam (index the BAM file)

out.sorted.bam and out.sorted.bam.bai will be ready for IGV.

For some reason, I often have a one bp offset between IGV's view of the genome and BWA's; I fix this by doing math in the parsing of the bwa sampe output:

Code Block
awk 'BEGIN {OFS="\t"} {if (substr($1,1,1)=="@") {print $0} else {if ($3!="*") {$4=$4-1; print $0}}}'

BWA is also installed in Phylocluster

http://bio-bwa.sourceforge.net/