...

NOTE: The most recent version of SAMtools is 1.2, which differs in some important ways from the previous version, 0.1.19.  Everything in this section is the same between the two versions, but if you see code from other sources that uses samtools, the version difference may matter.

Samtools

...

sort

Look at the SAM file briefly using less.  If you scroll down, you will notice that the alignments are in no particular order, with chromosomes and start positions all mixed up.  This makes searching through the file very inefficient.  The samtools sort utility re-orders the entries in the SAM file, either by coordinate or by read name.  If you execute samtools sort without any options, you will see its help page as follows:

...

samtools sort -O sam -T yeast_pairedend -o yeast_pairedend.sort.sam yeast_pairedend.sam
less yeast_pairedend.sort.sam
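
Besides scrolling through the file in less, the sort order can be checked from the header.  The sketch below assumes the sorted file carries an @HD header line; the SAM specification stores the sort order in that line's SO tag (unknown, unsorted, queryname, or coordinate).  The helper name sam_sort_order is hypothetical and not part of samtools.

```shell
# Hypothetical helper: print the SO (sort order) tag from the @HD header
# line of a SAM file read on stdin. Prints "unknown" if the tag is absent.
sam_sort_order() {
    awk -F'\t' '
        /^@HD/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
            exit
        }
        !/^@/ { exit }   # first alignment line reached, no @HD line seen
        END { if (!found) print "unknown" }
    '
}

# Only runs if the sorted file from the command above is present.
if [ -f yeast_pairedend.sort.sam ]; then
    sam_sort_order < yeast_pairedend.sort.sam
fi
```

A file sorted by coordinate should report coordinate here; a file sorted by read name (the -n option of samtools sort) should report queryname.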

Samtools

...

view

You may have noticed in the last help page that samtools sort can take a BAM file, the smaller, binary form of a SAM file, as input or output.  This is a viable option if the file needs sorting - however, in many cases you may just want to compress a SAM file by converting it to BAM without any other modifications.  The utility samtools view converts SAM files to BAM files directly.  It also provides many other functions, which we will discuss in the next section.  To get a preview, execute samtools view without any other arguments.  You will see:

...

Above, the -b option tells the tool to output BAM, and the -o option specifies the name of the BAM file that will be created.  All the other options have their uses, but we will not discuss them right now.  It is worth noting, however, that if you wanted to convert back from BAM to SAM to read some alignments, you would simply remove the -b option and adjust the file names accordingly.
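
The reverse conversion described above might be wrapped as follows.  This is a minimal sketch: the function name bam_to_sam and the rule for deriving the output name are assumptions made for illustration.  It also adds the -h option, which is not mentioned above, because samtools view leaves the header out of SAM output unless -h is given.

```shell
# Hypothetical helper: convert a BAM file back to SAM, keeping the header.
# The output name is derived by swapping the .bam suffix for .sam.
bam_to_sam() {
    bam=$1
    sam=${bam%.bam}.sam
    # No -b here, so the output is SAM; -h keeps the @ header lines.
    samtools view -h -o "$sam" "$bam"
}

# Example (file name assumed from this tutorial):
# bam_to_sam yeast_pairedend.bam    # writes yeast_pairedend.sam
```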

Samtools

...

index

Many tools (like the UCSC Genome Browser) only need to use sub-sections of the BAM file at a given point in time.  For example, if you are viewing all alignments that are within a particular gene, then the alignments on other chromosomes generally do not need to be loaded.  To speed up access, BAM index (BAI) files allow other programs to navigate directly to the alignments of interest.  This is especially important when you have many alignments.  The utility samtools index creates an index with the same name as the input file, with '.bai' appended.  The help page, if you execute samtools index with no arguments, is as follows:

...

This will produce a file named yeast_pairedend.bam.bai.  Most of the time when an index is required, it will be located automatically, provided it is in the same directory as the BAM file it was produced from and shares the same name up to the '.bai' extension.
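
To see the index at work, samtools view accepts a region after the BAM file name and uses the index to jump straight to the alignments that overlap it.  The sketch below is illustrative only: view_region is a hypothetical helper, and the region chrI:1000-2000 assumes a reference sequence named chrI, which may not match the names in your BAM header.

```shell
# Hypothetical helper: print the alignments overlapping a region,
# building the .bai index first if it is not already next to the BAM file.
view_region() {
    bam=$1
    region=$2
    if [ ! -f "$bam.bai" ]; then
        samtools index "$bam" || return 1
    fi
    # A region query only works on a coordinate-sorted, indexed BAM file.
    samtools view "$bam" "$region"
}

# Example (reference name chrI is an assumption):
# view_region yeast_pairedend.bam chrI:1000-2000 | less
```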

Samtools

...

idxstats

Now that we have a sorted, compressed, and indexed BAM file, we might like to get some simple statistics about the alignment run.  For example, we might like to know how many reads aligned to each chromosome/contig.  The samtools idxstats utility is a very simple tool that provides this information.  If you type the command without any arguments, you will see that it could hardly be simpler - just type the following command:

...

The output is tab-delimited text with four columns, which have the following meanings: (1) chromosome name, (2) chromosome length, (3) number of mapped reads, and (4) number of unmapped reads.  The "unmapped reads" field for the named chromosomes is not zero because, if one read of a pair aligns while the other does not, the unmapped read is flagged as unmapped but is still assigned to the chromosome its mate mapped to.
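
Because the output is plain tab-delimited text, it is easy to post-process.  As a minimal sketch, the hypothetical helper below (summarize_idxstats is not part of samtools) sums the third and fourth columns to give overall mapped and unmapped totals.  It assumes the four-column layout described above, including the final line that idxstats prints for unmapped reads with no assigned chromosome, whose name is '*'.

```shell
# Hypothetical helper: total the mapped (column 3) and unmapped (column 4)
# counts from samtools idxstats output read on stdin.
summarize_idxstats() {
    awk -F'\t' '
        { mapped += $3; unmapped += $4 }
        END { printf "mapped\t%d\nunmapped\t%d\n", mapped, unmapped }
    '
}

# Example (file name assumed from the index step above):
# samtools idxstats yeast_pairedend.bam | summarize_idxstats
```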

Samtools

...

flagstat

Finally, we might like to obtain some other statistics, such as the percent of all reads that aligned to the genome.  The samtools flagstat tool provides a very simple analysis of the SAM flag fields, which include information such as whether reads are properly paired, whether they aligned, and a few other things.  Its syntax is identical to that of samtools idxstats:

...
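
The percent of aligned reads can also be cross-checked by counting records directly.  This sketch does not use flagstat itself: it relies on samtools view -c, which prints a count instead of the alignments, and -F 4, which skips records whose unmapped flag (0x4) is set.  The helper name percent_mapped is hypothetical, and because it counts alignment records rather than distinct reads, the result may differ slightly from other tallies if the file contains secondary alignments.

```shell
# Hypothetical helper: percent of alignment records that are mapped.
percent_mapped() {
    bam=$1
    total=$(samtools view -c "$bam") || return 1
    mapped=$(samtools view -c -F 4 "$bam") || return 1
    # Guard against an empty file to avoid dividing by zero.
    awk -v m="$mapped" -v t="$total" \
        'BEGIN { if (t > 0) printf "%.2f\n", 100 * m / t; else print "NA" }'
}

# Example (file name assumed from the index step above):
# percent_mapped yeast_pairedend.bam
```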