MAINTENANCE OUTAGE: The University Wiki Service will undergo maintenance on September 26th, 2017, from 6 pm to 8 pm. During this 2 hour time period https://wikis.utexas.edu may be unavailable. Users are advised to save content locally that may be needed during this time and to otherwise save all edits as unsaved work may be lost. Please contact the UT Service Desk at 512-475-9400 for any questions.
The University Wiki Service has upgraded the Confluence Server software, from version 5.9.14 to 5.10.8. Please refer to the knowledge base article, KB0015891, for a high level summary of upgrade changes. Thank you!
Skip to end of metadata
Go to start of metadata

BWA

BWA version
To run bwa on SOLiD colorspace data do the following:
1. Create fastq:

csfastaToFastq -f <in.csfasta> -q <in_QV.qual> -e <experiment name>

Your output fastq file will be named "p1.<experiment name>.fastq"

(or)
Use solid2fastq_v2.pl to convert csfasta to fastq file. csfastaToFastq seems to trim off the last digit in the read id, so you may prefer to use solid2fastq_v2.pl

When your csfasta files and quality files are named test_F3.csfasta test_F3_QV.qual, test_F5-RNA.csfasta and test_F5-RNA_QV.qual run:

solid2fastq_v2.pl test_F3 test_F3.out
solid2fastq_v2.pl test_F5-RNA test_F5-RNA.out

Output files will be called test_F3.out.single.fastq and test_F5.out.single.fastq

2. Create an index of your reference:

bwa index -a bwtsw <reference.fasta>

For colorspace,

bwa index -a bwtsw -c <reference.fasta>

This will create a bunch of files with the root name "reference.fasta". In subsequent commands, you simply refer to this set by the base name.

Note: if you are working with a small reference (less than 10MB), you need to use -a is rather than -a bwtsw, as the BWT will fail. Details are at the BWA Manual page.

3. Align the fastq to the reference:

bwa aln -f out.sai <reference.fasta> p1.<experiment name>.fastq &>bwa.log & 

For colorspace,

bwa aln -c -f out.sai <reference.fasta> p1.<experiment name>.fastq &>bwa.log & 

If your reads are mate-pair, you must run this command once for your F3 reads and then again for your R3 reads, into seperate output files.

4. Convert the alignment output to a SAM file; the command required depends on whether data is paired-end or single-end:

bwa sampe -f out.sam <reference.fasta> F3.sai F5.sai F3.fastq F5.fastq &>bwa.sampe.log &
bwa -f out.sam samse <reference.fasta> F3.sai F3.fastq &>bwa.samse.log &

NOTE: these sam files contain ALL reads, whether they hit the reference or not which can make them LARGE.  To filter for just hits, use something like this:

bwa sampe <reference.fasta> F3.sai F5.sai F3.fastq F5.fastq | \
awk '{if (substr($1,1,1)=="@") {print $0} else {if ($3!="*") {print $0}}}' > out.sam

From here, you might wish to:

samtools view \-b \-S <in.sam> > out.bam  (convert to BAM file)
samtools sort out.bam out.sorted  (sort the BAM file)
samtools index out.sorted.bam (index the BAM file)

out.sorted.bam and out.sorted.bam.bai will be ready for IGV.

For some reason, I often have a one bp offset between IGV's view of the genome and BWA's; I fix this by doing math in the parsing of the bwa sampe output:

awk 'BEGIN {OFS="\t"} {if (substr($1,1,1)=="@") {print $0} else {if ($3!="*") {$4=$4-1; print $0}}}'

BWA is also installed in Phylocluster

http://bio-bwa.sourceforge.net/

  • No labels