...
Like other tools you've worked with so far, you first need to load bwa as a BioContainers module. Do that now, and then enter bwa with no arguments to view the top-level help page (many NGS tools will provide some help when called with no arguments). Note that bwa is available both from the standard TACC module system and as a BioContainers module.
Code Block | ||
---|---|---|
| ||
module load biocontainers # optional, may take a while
module load bwa
bwa |
...
You should now have a SAM file (yeast_pairedend.sam) that contains the alignments. It's just a text file, so take a look with head, more, less, tail, or whatever you feel like. Later you'll learn some additional ways to analyze the data with samtools once you create a BAM file.
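Because SAM is plain text, ordinary command-line tools are enough for a first look. The sketch below is only an illustration: it builds a tiny, made-up SAM file (the read names, positions, and sequences are hypothetical, not the course data) and then separates header lines, which begin with @, from alignment records.

```shell
# Build a tiny, made-up SAM file (hypothetical data, not the yeast alignments).
sam="$(mktemp)"
{
  printf '@HD\tVN:1.0\tSO:unsorted\n'
  printf '@SQ\tSN:chrI\tLN:230218\n'
  printf 'read1\t99\tchrI\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII\n'
  printf 'read1\t147\tchrI\t200\t60\t10M\t=\t100\t-110\tACGTACGTAC\tIIIIIIIIII\n'
} > "$sam"

# Header lines begin with '@'; every other line is an alignment record.
header_count=$(grep -c '^@' "$sam")
record_count=$(grep -vc '^@' "$sam")

# Read name, flags, contig, and start position of the first few records.
first_fields=$(grep -v '^@' "$sam" | cut -f 1-4 | head)

echo "header lines: $header_count"
echo "alignment records: $record_count"
echo "$first_fields"
```

The same grep and cut pipeline should work on yeast_pairedend.sam once you replace "$sam" with that file name.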
...
In this exercise, we will explore five utilities provided by samtools: view, sort, index, flagstat, and idxstats. Each of these is executed in one line for a given SAM/BAM file. In the SAMtools/BEDtools sections tomorrow we will explore samtools in more depth.
Warning | ||
---|---|---|
| ||
There are two main "eras" of SAMtools development:
Unfortunately, some functions with the same name in both version eras have different options and arguments! So be sure you know which version you're using. (The samtools version is usually reported at the top of its usage listing.) The default version in the ls5 module system is 1.3.1, a "modern" version, but the BioITeam has a copy of the version 0.1.19 samtools for programs that might need it: /work/projects/BioITeam/ls5/bin/samtools-0.1.19. That version is also available as a TACC BioContainers module. |
samtools view
The samtools view utility provides a way of converting between SAM (text) and BAM (binary, compressed) format. It also provides many, many other functions which we will discuss later. To get a preview, execute samtools view without any other arguments. You should see:
...
Expand | ||
---|---|---|
| ||
samtools view -h shows header records along with alignment records. samtools view -H shows header records only. |
samtools sort
Looking at some of the alignment record information (e.g. samtools view yeast_pairedend.bam | cut -f 1-4 | more), you will notice that read names appear in adjacent pairs (for the R1 and R2), in the same order they appeared in the original FASTQ file. That means the corresponding mappings are in no particular order, with chromosomes and start positions all mixed up. This makes searching through the file very inefficient. samtools sort re-orders entries in the SAM file either by locus (contig name + coordinate position) or by read name.
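To make the two orderings concrete, here is a minimal sketch that uses the standard sort utility as a stand-in on a few made-up records (hypothetical read names and positions). It is not how samtools sort works internally; in particular, samtools orders contigs as they are listed in the header rather than alphabetically.

```shell
# Made-up records in "FASTQ order" (hypothetical): name, flags, contig, position.
records="$(mktemp)"
{
  printf 'readB\t99\tchrII\t500\n'
  printf 'readB\t147\tchrII\t650\n'
  printf 'readA\t83\tchrI\t900\n'
  printf 'readA\t163\tchrI\t40\n'
} > "$records"

tab="$(printf '\t')"

# Locus order: contig name (field 3), then numeric start position (field 4).
by_locus=$(LC_ALL=C sort -t "$tab" -k3,3 -k4,4n "$records" | cut -f 1,3,4)

# Read-name order: name (field 1), then flags (field 2) to keep the output deterministic.
by_name=$(LC_ALL=C sort -t "$tab" -k1,1 -k2,2n "$records" | cut -f 1,3,4)

echo "by locus:"; echo "$by_locus"
echo "by name:";  echo "$by_name"
```

In locus order the two readA mates are listed by position, while in name order mates stay together regardless of where they mapped.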
If you execute samtools sort without any options, you see its help page:
...
In most cases you will be sorting a BAM file from name order to coordinate (locus) order. You can use either -o or redirection with > to control the output.
Expand | |||||
---|---|---|---|---|---|
| |||||
Copy aligned yeast BAM file
|
To sort the paired-end yeast BAM file by coordinate, and get a BAM file named yeast_pairedend.sort.bam as output, execute the following command:
...