Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Overview ChIP-seq alignment workflow with BWA

We will perform a global alignment of the paired-end Yeast ChIP-seq sequences using bwa. This workflow generally has the following steps:

...

We're going to skip the trimming step for now and see how it goes. We'll perform steps 2 - 5 now and leave samtools for the next course section, since those steps (6 - 10) are common to nearly all post-alignment workflows.

...

Introducing BWA

...

Like other tools you've worked with so far, you first need to load bwa using the module system.  Go ahead and do that now, and then enter bwa with no arguments to view the top-level help page (many NGS tools will provide some help when called with no arguments).

...

As you can see, bwa offers a number of sub-commands one can use with to do different things.

Building the BWA sacCer3 index

We're going to index the genome with the index command. To learn what this sub-command needs in the way of options and arguments, enter bwa index with no arguments.

...

Now, we're ready to execute the actual alignment, with the goal of initially producing a SAM file from the input FASTQ files and reference. First go to the align directory, and link to the sacCer3 reference directory (this will make our commands more readable).

...

Now we're going to switch over to a different aligner that was originally designed for very short reads , and is frequently used for RNA-seq data.  AccordinglyAccordingly, we have prepared a another test microRNA-seq dataset for you to experiment with .  This (not the same one you used cutadapt on). This data is derived from a human H1 embryonic stem cell (H1-hESC) small RNA dataset generated by the ENCODE Consortium (its about a half million reads).  However

However, there is a problem!  We don't know (or, well, you don't know) what the adapter structure or sequences were.  SoSo, you have a bunch of 36 base pair reads, but many of those reads will include extra sequence that will can impede alignment !  We – and we don't know where! We need an aligner that can find subsections of the read that do align, and discard (or "soft-clip") the rest away .  Bowtie2 – an aligner with a local alignment mode. Bowtie2 is just such an aligner.

Overview miRNA alignment workflow with bowtie2

If the adapter structure were known, the normal workflow would be to first remove the adapter sequences with cutadapt. Since we can't do that, we will instead perform a local lignment of the single-end miRNA sequences using bowtie2. This workflow has the following steps:

  1. Prepare the mirbase v20 reference index for bowtie2 (one time) using bowtie2-build 
  2. Perform local alignment of the R1 reads with bowtie2, producing a SAM file directly
  3. Convert the SAM file to a BAM file (samtools view)
  4. Sort the BAM file by genomic location (samtools sort)
  5. Index the BAM file (samtools index)
  6. Gather simple alignment statistics (samtools flagstat and samtools idxstat)

This looks so much simpler than bwa – only one alignment step instead of three! We'll see the price for this "simplicity" in a moment...

As before, we will just do the alignment steps leave samtools for the next section.

Introducing bowtie2

Go ahead and load the bowtie2 module so we can examine some help pages and options. To do that,  Go ahead and load it up so we can examine some help pages and options.  To do that, you must first load the "perl" module, and then the bowtie2 module designated "bowtie/2.2.0". 

...

a specific version of bowtie2

Code Block
module load perl
module load bowtie/2.2.0
bowtie2

Now that it's loaded, check out the options.  There There are a LOT lot of them! In fact for the full range of options and their meaning, Google "Bowtie2 manual" and bring up that page. Ouch!

This  This is the key to using Bowtie2 bowtie2 - it allows you to control almost everything about its behavior, but that also makes it is much more challenging to use than bwa – and it's easier to screw things up too!

Building the bowtie2 mirbase index

Before the alignment, of course.  Before getting to using the tool, though, we've got to build a mirbase index .  This requires using the command " bowtie2-build " - (go ahead and check out its options).  Unlike Unlike for the aligner itself, we only need to worry about a few things here:

Code Block
bowtie2-build <reference_in> <bt2_index_base>

Here, the reference_in file is just our the FASTA file containing mirbase v20 sequences, and the bt2_index_base is the prefix of where we want the files to go.  Following Following what we did earlier for BWA indexing:

...