...

 With that, we're ready to get started on the first exercise.

Exercise #1: BWA global alignment – Yeast ChIP-seq

Overview ChIP-seq alignment workflow with BWA

...

Answer:
Recall that these are 100 bp reads and that we did not remove adapter contamination. Fragment sizes follow a distribution, so some fragments are shorter than the read length. Reads from those short fragments run into adapter sequence and may not align without adapter removal (fastx_trimmer or cutadapt).

Exercise #2: Bowtie2 global alignment - Vibrio cholerae RNA-seq

While we have focused on aligning eukaryotic data, the same tools can be used in the same way with prokaryotic data. The major differences are less about the underlying data and much more about the external/public databases established to store and distribute reference data. For example, the Illumina iGenome resource provides pre-processed and uniform reference data, designed to be out-of-the-box compatible with aligners like bowtie2 and bwa. However, the limited set of available species is heavily biased towards model eukaryotes. If we want to study a prokaryote, the reference data must be downloaded from a resource like GenBank, then processed and indexed in much the same way as for mirbase.

...

Now we will go back to our scratch area to do the alignment, and set up symbolic links to the index in the work area to simplify the alignment command:

Code Block
languagebash
cd $SCRATCH/core_ngs/alignment
ln -s -f $WORK/core_ngs/references/bt2/vibCho vibCho
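
Before running the aligner, it can be worth confirming that the link resolves to a complete index. The following is a minimal sketch, assuming a standard "small" bowtie2 index, which consists of six files sharing one prefix (`.1.bt2`, `.2.bt2`, `.3.bt2`, `.4.bt2`, `.rev.1.bt2`, `.rev.2.bt2`). The function name `check_bt2_index` is hypothetical and not part of bowtie2; the prefix in the usage comment is the one passed to `-x` in this exercise.

```bash
# Hypothetical helper (not part of bowtie2): report whether all six files
# of a small bowtie2 index exist for the given prefix.
# Assumption: a small index; large indexes use the .bt2l extension instead.
check_bt2_index() {
    local prefix="$1" suffix missing=0
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        if [ ! -e "${prefix}.${suffix}" ]; then
            echo "missing: ${prefix}.${suffix}" >&2
            missing=1
        fi
    done
    # Exit status 0 if every file was found, 1 otherwise
    return "$missing"
}

# Example usage (run from $SCRATCH/core_ngs/alignment):
# check_bt2_index vibCho/vibCho.O395.fa && echo "index looks complete"
```

Because `-e` follows symbolic links, a broken link shows up as missing files rather than as a cryptic aligner error later on.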

Note that here the data is from standard mRNA sequencing, meaning that the DNA fragments are typically longer than the reads. There is therefore likely to be very little contamination of the kind that would call for local rather than global alignment, or for pre-processing steps such as adapter trimming. Thus, we will run bowtie2 with default parameters, omitting options other than the input, output, and reference index.

As you can tell from looking at the bowtie2 help message, the general syntax looks like this:

Code Block
bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} [-S <sam>]

So our command would look like this:

Code Block
languagebash
bowtie2 -x vibCho/vibCho.O395.fa -U fastq/cholera_rnaseq.fastq.gz -S cholera_rnaseq.sam

...

When the job is complete you should have a cholera_rnaseq.sam file that you can examine using whatever commands you like. In the previous exercise we explored a few different commands that you might run to get different kinds of information. Those commands will provide similar information here since this file is also in SAM/BAM format. In the last exercise, we will come back to this SAM file to explore ways to compress it effectively and to extract basic quality statistics.
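
As one example of examining the file, the sketch below tallies header lines, mapped records, and unmapped records using only awk. It relies on two properties of the SAM format: header lines start with `@`, and bit 0x4 of the FLAG field (column 2) marks an unmapped read. The function name `sam_summary` is hypothetical, introduced only for illustration.

```bash
# Hypothetical helper: count header lines, mapped records, and unmapped
# records in a SAM file given as the first argument.
sam_summary() {
    awk -F'\t' '
        /^@/ { header++; next }                    # header lines start with @
        int($2 / 4) % 2 == 1 { unmapped++; next }  # FLAG bit 0x4 set: unmapped
        { mapped++ }                               # everything else is mapped
        END { printf "header=%d mapped=%d unmapped=%d\n", header, mapped, unmapped }
    ' "$1"
}

# Example usage:
# sam_summary cholera_rnaseq.sam
```

The arithmetic test `int($2 / 4) % 2` is used instead of a bitwise function so that the sketch does not depend on gawk-specific extensions.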

Exercise #3: Bowtie2 local alignment - Human microRNA-seq

...

Code Block
bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} [-S <sam>]

Let's make a link to the mirbase index directory to make our command line simpler:

...

Answer:

We are looking at mapping quality values for both aligned and unaligned records, but mapping quality only makes sense for aligned reads. This expression cannot distinguish a mapping quality of 0 that arises because the read mapped to multiple locations from a mapping quality of 0 that arises because the sequence did not align at all.

The proper solution will await the use of samtools to filter out unmapped reads.
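
In the meantime, the distinction can be sketched with awk alone: skip every record whose FLAG (column 2) has bit 0x4 set, then tally the mapping quality values (column 5) of what remains. With samtools, the usual equivalent is to exclude those records with `samtools view -F 0x4` before counting. The function name `mapq_counts` is hypothetical, and the file name in the usage comment is a placeholder.

```bash
# Hypothetical helper: tally MAPQ values (column 5) for mapped records only.
# Output is one line per MAPQ value: the value, a tab, then its count.
mapq_counts() {
    awk -F'\t' '
        /^@/ { next }                    # skip header lines
        int($2 / 4) % 2 == 1 { next }    # skip unmapped reads (FLAG bit 0x4)
        { print $5 }                     # emit MAPQ for mapped records
    ' "$1" | sort -n | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Example usage (placeholder file name):
# mapq_counts my_alignments.sam
```

Any mapping quality of 0 that remains in this tally belongs to a read that did align, which is the case the unfiltered expression could not isolate.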

In previous exercises we explored a few different commands that you might run to get different kinds of information. Those commands will provide similar information here since this file is also in SAM/BAM format. In the last exercise, we will come back to this SAM file to explore ways to compress it effectively and to extract basic quality statistics.

Exercise #4: BWA-MEM - Human mRNA-seq

...