
Now that we have a sorted, compressed, and indexed BAM file, we might like to get some simple statistics about the alignment run. For example, we might like to know how many reads aligned to each chromosome/contig. The samtools idxstats command is a very simple tool that provides this information. If you type the command without any arguments, you will see that its usage could not be simpler. Just type the following command:

Code Block (bash)
samtools idxstats yeast_pairedend.bam

The output is tab-delimited text with four columns, which have the following meanings:

  1. chromosome name
  2. chromosome length
  3. number of mapped reads
  4. number of unmapped reads

The "unmapped reads" field for a named chromosome can be non-zero because, when one read of a pair aligns and its mate does not, the unmapped mate is still assigned to the chromosome its mate mapped to, even though it remains flagged as unmapped.
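As a rough illustration of how the four columns can be used, the sketch below totals the mapped and unmapped columns across all lines of idxstats output. The function name idxstats_totals is hypothetical (it is not part of samtools), and the sketch assumes the four-column, tab-delimited layout described above arrives on standard input.

```bash
# Hypothetical helper (not part of samtools): total the mapped and
# unmapped read counts from `samtools idxstats` output on stdin.
# Assumed columns: 1 = name, 2 = length, 3 = mapped, 4 = unmapped.
idxstats_totals() {
    awk -F'\t' '
        { mapped += $3; unmapped += $4 }   # accumulate over every line
        END {
            printf "mapped\t%d\n", mapped
            printf "unmapped\t%d\n", unmapped
        }
    '
}
```

With the function defined in your shell, one way to use it would be: samtools idxstats yeast_pairedend.bam | idxstats_totals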

Tip

If you're mapping to a non-genomic reference such as miRBase miRNAs or another set of genes (a transcriptome), samtools idxstats gives you a quick look at quantitative alignment results.
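For a reference with many sequences, such as a transcriptome, it can help to rank the references by mapped read count. The following is a minimal sketch, not a samtools feature: top_references is a hypothetical helper name, and it assumes idxstats output on standard input with the mapped count in column 3.

```bash
# Hypothetical helper (not part of samtools): list the N references
# with the most mapped reads, given `samtools idxstats` output on stdin.
top_references() {
    local n="${1:-10}"              # number of references to show (default 10)
    local tab
    tab="$(printf '\t')"            # literal tab for use as sort delimiter
    # Drop any "*" line, which is not a named reference, then sort
    # numerically on column 3 (mapped reads), largest first.
    awk -F'\t' '$1 != "*"' | sort -t "$tab" -k3,3nr | head -n "$n"
}
```

For example, samtools idxstats yeast_pairedend.bam | top_references 5 would be expected to show the five references with the most mapped reads.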

Samtools flagstat

Finally, we might like to obtain some other statistics, such as the percentage of all reads that aligned to the genome. The samtools flagstat tool provides a very simple analysis of the SAM flag fields, which include information such as whether reads are properly paired, whether they aligned, and a few other things. Its syntax is identical to that of samtools idxstats:

Code Block (bash)
samtools flagstat yeast_pairedend.bam

Ignore the "+ 0" on each line; it is a carry-over convention for counting QC-failed reads that is no longer necessary. The most important statistic is arguably the alignment rate, but this readout also allows you to verify that some common expectations are met (e.g. that about the same number of R1 and R2 reads aligned, and that most mapped reads are proper pairs).
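If you want these numbers in a script rather than by eye, the sketch below pulls the relevant counts out of the flagstat text and recomputes two percentages. It is only an illustration: flagstat_rates is a hypothetical helper name, and it assumes the usual flagstat line layout of "COUNT + COUNT description", with lines described as "in total", "mapped", and "properly paired". The proper-pair figure here is expressed as a fraction of mapped reads, which is a rough check and not the same percentage that flagstat itself prints.

```bash
# Hypothetical helper (not part of samtools): compute rough rates
# from `samtools flagstat` text output on stdin.
# Assumed line layout: "<passed> + <failed> <description...>"
flagstat_rates() {
    awk '
        $4 == "in" && $5 == "total"         { total  = $1 }
        $4 == "mapped"                      { mapped = $1 }
        $4 == "properly" && $5 == "paired"  { proper = $1 }
        END {
            # Guard against division by zero on empty or unexpected input.
            if (total > 0)  printf "mapped_pct\t%.2f\n", 100 * mapped / total
            if (mapped > 0) printf "proper_pct_of_mapped\t%.2f\n", 100 * proper / mapped
        }
    '
}
```

One way to use it would be: samtools flagstat yeast_pairedend.bam | flagstat_rates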