...

Like other tools you've worked with so far, you first need to load bwa. Do that now, and then enter bwa with no arguments to view the top-level help page (many NGS tools will provide some help when called with no arguments). Note that bwa is available both from the standard TACC module system and as a BioContainers module.

Code Block
languagebash
module load biocontainers # optional, may take a while
module load bwa
bwa

...

You should now have a SAM file (yeast_pairedend.sam) that contains the alignments. It's just a text file, so take a look with head, more, less, tail, or whatever you feel like. Later you'll learn some additional ways to analyze the data with samtools once you create a BAM file.
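As a rough sketch of what "taking a look" can involve, the helper below summarizes a SAM text file using only standard Unix tools. The function name sam_peek is hypothetical (it is not part of bwa or samtools), and it relies on the SAM convention that header lines start with @ while alignment records do not.

```bash
# Hypothetical helper (not part of any tool): summarize a SAM text file.
sam_peek() {
    local sam="$1"
    # SAM header lines start with '@'; every other line is an alignment record
    echo "header lines: $(grep -c '^@' "$sam")"
    echo "alignment records: $(grep -vc '^@' "$sam")"
    # Show read name, flag, contig, and start position for the first 5 alignments
    grep -v '^@' "$sam" | head -n 5 | cut -f 1-4
}

# Usage, assuming the alignment file is in the current directory
if [ -f yeast_pairedend.sam ]; then
    sam_peek yeast_pairedend.sam
fi
```

Fields 1-4 of an alignment record are the read name, the bitwise flag, the contig name, and the leftmost mapping position.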

...

In this exercise, we will explore five utilities provided by samtools: view, sort, index, flagstat, and idxstats. Each of these is executed in one line for a given SAM/BAM file. In the SAMtools/BEDtools sections tomorrow we will explore samtools in more depth.

Warning
titleKnow your samtools version!

There are two main "eras" of SAMtools development:

  • "original" samtools
    • v 0.1.19 is the last stable version
  • "modern" samtools
    • v 1.0, 1.1, 1.2 – avoid these (very buggy!)
    • v 1.3+ – finally stable!

Unfortunately, some functions with the same name in both version eras have different options and arguments! So be sure you know which version you're using. (The samtools version is usually reported at the top of its usage listing.)

The default version in the ls5 module system is 1.3.1, a "modern" version, but the BioITeam has a copy of the version 0.1.19 samtools for programs that might need it: /work/projects/BioITeam/ls5/bin/samtools-0.1.19. That version is also available as a TACC BioContainers module.
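One way to check which era you are using is to pull the version line out of the usage listing. This is a minimal sketch: the function name show_samtools_version is hypothetical, and it assumes the usage listing contains a line beginning with "Version", which is the usual layout.

```bash
# Hypothetical helper: print the version line from a samtools usage listing.
show_samtools_version() {
    local cmd="${1:-samtools}"
    # Called with no arguments, samtools prints its usage to stderr and exits
    # non-zero, so merge stderr into stdout and keep only the version line.
    "$cmd" 2>&1 | grep '^Version' | head -n 1
}

# Version of whichever samtools is first on your PATH
if command -v samtools >/dev/null 2>&1; then
    show_samtools_version
fi

# Version of the BioITeam copy mentioned above, if it is accessible
if [ -x /work/projects/BioITeam/ls5/bin/samtools-0.1.19 ]; then
    show_samtools_version /work/projects/BioITeam/ls5/bin/samtools-0.1.19
fi
```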

samtools view

The samtools view utility provides a way of converting between SAM (text) and BAM (binary, compressed) format. It also provides many, many other functions which we will discuss later. To get a preview, execute samtools view without any other arguments. You should see:

...

Expand
titleAnswer

samtools view -h shows header records along with alignment records.

samtools view -H shows header records only.

samtools sort

Looking at some of the alignment record information (e.g. samtools view yeast_pairedend.bam | cut -f 1-4 | more), you will notice that read names appear in adjacent pairs (for the R1 and R2), in the same order they appeared in the original FASTQ file. That means the corresponding mappings are in no particular order, with chromosomes and start positions all mixed up, which makes searching through the file very inefficient. samtools sort re-orders entries in the SAM file either by locus (contig name + coordinate position) or by read name.
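To make "locus order" concrete, here is an illustration using the standard Unix sort command on a SAM text file. It is only a sketch of the idea, not a substitute for samtools sort: the function name sam_locus_order is hypothetical, it works on SAM text rather than BAM, and it orders contigs alphabetically, which is an assumption that may differ from the contig order samtools uses.

```bash
# Hypothetical helper: print a SAM text file with its alignment records in
# locus order (contig name, then numeric start position), header lines first.
sam_locus_order() {
    local sam="$1"
    # Pass header lines (starting with '@') through unchanged
    grep '^@' "$sam"
    # Field 3 is the contig name, field 4 the leftmost mapping position
    grep -v '^@' "$sam" | LC_ALL=C sort -t $'\t' -k3,3 -k4,4n
}

# Usage, assuming the SAM file from the alignment step is in the current directory
if [ -f yeast_pairedend.sam ]; then
    sam_locus_order yeast_pairedend.sam | grep -v '^@' | cut -f 1-4 | head
fi
```

After this re-ordering, the R1 and R2 records of a pair are no longer guaranteed to be adjacent, but all records for a given region sit next to each other.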

If you execute samtools sort without any options, you see its help page:

...

In most cases you will be sorting a BAM file from name order to locus (coordinate) order. You can use either -o or redirection with > to control the output.

Expand
titleSetup (if needed)

Copy aligned yeast BAM file

Code Block
languagebash
mkdir -p $SCRATCH/core_ngs/alignment/yeast_bwa
cp $CORENGS/???/yeast_pairedend.bam $SCRATCH/core_ngs/alignment/yeast_bwa

To sort the paired-end yeast BAM file by coordinate, and get a BAM file named yeast_pairedend.sort.bam as output, execute the following command:

...