Page tree
Skip to end of metadata
Go to start of metadata


BWA version
To run bwa on SOLiD colorspace data do the following:
1. Create fastq:

csfastaToFastq -f <in.csfasta> -q <in_QV.qual> -e <experiment name>

Your output fastq file will be named "p1.<experiment name>.fastq"

Use to convert csfasta to fastq file. csfastaToFastq seems to trim off the last digit in the read id, so you may prefer to use

When your csfasta files and quality files are named test_F3.csfasta test_F3_QV.qual, test_F5-RNA.csfasta and test_F5-RNA_QV.qual run: test_F3 test_F3.out test_F5-RNA test_F5-RNA.out

Output files will be called test_F3.out.single.fastq and test_F5.out.single.fastq

2. Create an index of your reference:

bwa index -a bwtsw <reference.fasta>

For colorspace,

bwa index -a bwtsw -c <reference.fasta>

This will create a bunch of files with the root name "reference.fasta". In subsequent commands, you simply refer to this set by the base name.

Note: if you are working with a small reference (less than 10MB), you need to use -a is rather than -a bwtsw, as the BWT will fail. Details are at the BWA Manual page.

3. Align the fastq to the reference:

bwa aln -f out.sai <reference.fasta> p1.<experiment name>.fastq &>bwa.log & 

For colorspace,

bwa aln -c -f out.sai <reference.fasta> p1.<experiment name>.fastq &>bwa.log & 

If your reads are mate-pair, you must run this command once for your F3 reads and then again for your R3 reads, into seperate output files.

4. Convert the alignment output to a SAM file; the command required depends on whether data is paired-end or single-end:

bwa sampe -f out.sam <reference.fasta> F3.sai F5.sai F3.fastq F5.fastq &>bwa.sampe.log &
bwa -f out.sam samse <reference.fasta> F3.sai F3.fastq &>bwa.samse.log &

NOTE: these sam files contain ALL reads, whether they hit the reference or not which can make them LARGE.  To filter for just hits, use something like this:

bwa sampe <reference.fasta> F3.sai F5.sai F3.fastq F5.fastq | \
awk '{if (substr($1,1,1)=="@") {print $0} else {if ($3!="*") {print $0}}}' > out.sam

From here, you might wish to:

samtools view \-b \-S <in.sam> > out.bam  (convert to BAM file)
samtools sort out.bam out.sorted  (sort the BAM file)
samtools index out.sorted.bam (index the BAM file)

out.sorted.bam and out.sorted.bam.bai will be ready for IGV.

For some reason, I often have a one bp offset between IGV's view of the genome and BWA's; I fix this by doing math in the parsing of the bwa sampe output:

awk 'BEGIN {OFS="\t"} {if (substr($1,1,1)=="@") {print $0} else {if ($3!="*") {$4=$4-1; print $0}}}'

BWA is also installed in Phylocluster

  • No labels