Mapping with BWA

Objectives

In this lab, you will explore a popular fast mapper called BWA. Simulated RNA-seq data will be provided to you; the data contain 75 bp paired-end reads that have been generated in silico to replicate real gene count data from Drosophila. The data simulate two biological groups with three biological replicates per group (6 samples total). The main objective of this lab is to:

Learn how BWA works and how to use it.

Introduction

BWA (the Burrows-Wheeler Aligner) is a fast short-read aligner. It is the successor to another aligner you might have used or heard of called MAQ (Mapping and Assembly with Quality). As the name suggests, it uses the Burrows-Wheeler transform to perform alignment in a time- and memory-efficient manner.

BWA Variants

BWA has three different algorithms:

For reads up to 100 bp long:
- BWA-backtrack : BWA aln/samse/sampe
For reads up to 1 Mbp long:
- BWA-SW
- BWA-MEM : Newer! Typically faster and more accurate.

Get your data

Six raw data files have been provided for all our further RNA-seq analysis:

c1_r1, c1_r2, c1_r3 from the first biological condition
c2_r1, c2_r2, and c2_r3 from the second biological condition

Get set up for the exercises

cds
cd my_rnaseq_course
cp -r /corral-repl/utexas/BioITeam/rnaseq_course_2015/bwa_exercise .
cd bwa_exercise

Run BWA

Load the module:

module load bwa/0.7.7

There are multiple versions of BWA on TACC, so you might want to check which one you have loaded for when you write up your awesome publication that was made possible by your analysis of next-gen sequencing data.

Here are some commands that could help...

module spider bwa
module list
bwa
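
If you want just the version string for your notes, one option is to filter the usage text that bwa prints when run with no arguments. This is a minimal sketch: bwa_version is a hypothetical helper name, and it assumes the usage text contains a line starting with "Version:".

```shell
# Hypothetical helper: read bwa's usage text on stdin and print the version field.
bwa_version() {
  awk '/^Version:/ { print $2 }'
}

# Usage on TACC (bwa writes its usage text to stderr, hence 2>&1):
#   bwa 2>&1 | bwa_version
```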

You can see the different commands available under the bwa package from the command line help:

bwa

Part 1. Create an index of your reference

NO NEED TO RUN THIS NOW- YOUR INDEX HAS ALREADY BEEN BUILT!

bwa index -a bwtsw reference/genome.fa
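
Since the index was built for you, it can be worth confirming that it is really there before submitting jobs. bwa index writes five files next to the reference, with the extensions .amb, .ann, .bwt, .pac and .sa. The sketch below checks for them; has_bwa_index is a hypothetical helper name, and the reference/genome.fa path is taken from the command above.

```shell
# Hypothetical helper: succeed only if all five bwa index files exist and are non-empty.
has_bwa_index() {
  prefix=$1
  for ext in amb ann bwt pac sa; do
    if [ ! -s "${prefix}.${ext}" ]; then
      echo "missing: ${prefix}.${ext}"
      return 1
    fi
  done
  echo "index complete: ${prefix}"
}

# Usage in the exercise directory:
#   has_bwa_index reference/genome.fa
```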

Part 2. Align the samples to reference using bwa mem

Let's run the alignment using the newest of the three algorithms, BWA-MEM. With bwa mem, alignment is a single step.

Submit to the TACC queue or run in an idev shell

Create a commands file and use launcher_creator.py followed by sbatch.

Put this in your commands file

nano commands.mem
 
bwa mem reference/genome.fa data/GSM794483_C1_R1_1.fq data/GSM794483_C1_R1_2.fq > C1_R1.mem.sam
bwa mem reference/genome.fa data/GSM794484_C1_R2_1.fq data/GSM794484_C1_R2_2.fq > C1_R2.mem.sam
bwa mem reference/genome.fa data/GSM794485_C1_R3_1.fq data/GSM794485_C1_R3_2.fq > C1_R3.mem.sam
bwa mem reference/genome.fa data/GSM794486_C2_R1_1.fq data/GSM794486_C2_R1_2.fq > C2_R1.mem.sam
bwa mem reference/genome.fa data/GSM794487_C2_R2_1.fq data/GSM794487_C2_R2_2.fq > C2_R2.mem.sam
bwa mem reference/genome.fa data/GSM794488_C2_R3_1.fq data/GSM794488_C2_R3_2.fq > C2_R3.mem.sam
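
Typing six nearly identical lines invites typos, so you may prefer to generate the commands file instead. The loop below is a sketch that reproduces the six lines above; the sample names are copied from this lab, and the output name is derived by stripping the GSM prefix.

```shell
# Sample names as used in this lab (GSM accession + condition + replicate).
samples="GSM794483_C1_R1 GSM794484_C1_R2 GSM794485_C1_R3 GSM794486_C2_R1 GSM794487_C2_R2 GSM794488_C2_R3"

# Write one bwa mem command per sample into commands.mem.
for s in $samples; do
  out=${s#*_}   # drop everything up to the first underscore, e.g. C1_R1
  echo "bwa mem reference/genome.fa data/${s}_1.fq data/${s}_2.fq > ${out}.mem.sam"
done > commands.mem
```

Check the result with cat commands.mem before handing it to launcher_creator.py.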

Use this launcher_creator.py command

launcher_creator.py -n mem -t 04:00:00 -j commands.mem -q normal -a UT-2015-05-18 -m "module load bwa/0.7.7" -l bwa_mem_launcher.slurm
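
The launcher_creator.py command writes bwa_mem_launcher.slurm; the sbatch step mentioned earlier would then look roughly like this. It is a sketch that assumes the standard SLURM sbatch and squeue commands are available on the login node; the guard simply skips submission anywhere they are not.

```shell
slurm_file=bwa_mem_launcher.slurm

if command -v sbatch >/dev/null 2>&1 && [ -f "$slurm_file" ]; then
  sbatch "$slurm_file"    # submit the job to the queue
  squeue -u "$USER"       # list your queued and running jobs
  submitted=yes
else
  echo "sbatch or $slurm_file not available here; run this on a TACC login node"
  submitted=no
fi
```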

Since this will take a while to run, you can look at the pre-generated results in bwa_mem_results.
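
For a first sanity check of a SAM file, you can count how many alignment records are flagged as mapped. The sketch below uses only awk: it skips header lines (those starting with @) and tests the 0x4 "unmapped" bit of the FLAG column. count_mapped is a hypothetical helper name, and the file name in the usage line is an assumption about what bwa_mem_results contains. Note that it counts records, not reads, so secondary and supplementary alignments are included.

```shell
# Hypothetical helper: print "<mapped records> <total records>" for one SAM file.
count_mapped() {
  awk -F'\t' '
    /^@/ { next }                               # skip header lines
    { total++
      if (int($2 / 4) % 2 == 0) mapped++ }      # FLAG bit 0x4 unset means mapped
    END { printf "%d %d\n", mapped, total }
  ' "$1"
}

# Usage (assumed file name):
#   count_mapped bwa_mem_results/C1_R1.mem.sam
```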

Help! I have a large number of reads. Make BWA go faster!

Use the threading option in the bwa command (bwa mem -t <number of threads>)
Split one data file into smaller chunks and run multiple instances of bwa. Finally, concatenate the output.
- WAIT! We have a pipeline for that!
- Look for runBWA.sh in $BI/bin (it should be in your path)
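
If you want to try the split-and-merge approach by hand, the idea can be sketched as below. This is not what runBWA.sh does internally (its contents are not shown here); split_fastq and merge_sam are hypothetical helper names. The sketch assumes each FASTQ record occupies exactly four lines, and that both mate files are split with the same chunk size so the pairs stay in step.

```shell
# Hypothetical helper: split FILE into chunks of READS records each, named PREFIXaa, PREFIXab, ...
split_fastq() {
  file=$1; reads=$2; prefix=$3
  split -l $(( reads * 4 )) "$file" "$prefix"   # 4 lines per FASTQ record
}

# Hypothetical helper: concatenate SAM files, keeping header lines from the first file only.
merge_sam() {
  awk 'FNR == 1 { nfile++ }
       /^@/     { if (nfile == 1) print; next }
                { print }' "$@"
}

# Usage (1000000 reads per chunk and 4 threads are arbitrary choices):
#   split_fastq data/GSM794483_C1_R1_1.fq 1000000 C1_R1_1.chunk.
#   split_fastq data/GSM794483_C1_R1_2.fq 1000000 C1_R1_2.chunk.
#   bwa mem -t 4 reference/genome.fa C1_R1_1.chunk.aa C1_R1_2.chunk.aa > C1_R1.aa.sam
#   merge_sam C1_R1.aa.sam C1_R1.ab.sam > C1_R1.mem.sam
```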

Now that we are done mapping, let's look at how to assess mapping results.
