Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Expand
titleAnswer

While the yeast_pairedend.sort.bam text file is ~91 MB, its index (yeast_pairedend.sort.bai) is only 20 KB.

samtools flagstat

we might like to obtain some other statistics, such as the percent of all reads that aligned to the genomeSince the BAM file contains records for both mapped and unmapped reads, just counting records doesn't provide information about the mapping rate of our alignment. The samtools flagstat tool provides very a simple analysis of mapping rate based on the the SAM flag fields, which includes information like whether reads are properly paired, aligned or not, and a few other things. Its syntax is identical to that of samtools idxstats. Here's how to run samtools flagstat and both see the output in the terminal and save it in a file – the samtools flagstat standard output is piped to tee, which both writes it to the specified file and sends it to its standard output:

Code Block
languagebash
titleRun samtools flagstat using tee
samtools flagstat yeast_pairedend.sort.bam | tee yeast_pariedend.flagstat.txt

You should see something like this: 

Code Block
1184360 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
547664 + 0 mapped (46.24%:-nan%)
1184360 + 0 paired in sequencing
592180 + 0 read1
592180 + 0 read2
473114 + 0 properly paired (39.95%:-nan%)
482360 + 0 with itself and mate mapped
65304 + 0 singletons (5.51%:-nan%)
534 + 0 with mate mapped to a different chr
227 + 0 with mate mapped to a different chr (mapQ>=5)

Ignore the "+ 0" addition to each line - that is a carry-over convention for counting QA-failed reads that is no longer necessaryrelevant.

The most important statistic is the mapping rate , (here 46%) but this readout also allows you to verify that some common expectations (e.g. that about the same number of R1 and R2 reads aligned, and that most mapped reads are proper pairs) are met.

Exercise: What proportion of mapped reads were properly paired?


Expand
titleHint

Divide the number of properly paired reads by the number of mapped reads:

Code Block
languagebash
echo $(( 473114 / 547664 ))
# or
awk 'BEGIN{ print 473114 / 547664 }'


Expand
titleAnswer

About 86% of mapped read were properly paired. This is actually a bit on the low side for ChIP-seq alignments which typically over 90%.


samtools idxstats

Now that we have a sorted, indexed BAM file, we might like to get some simple statistics about the alignment run. For example, we might like to know how many reads aligned to each chromosome/contig. The samtools idxstats is a very simple tool that provides this information. If you type the command without any arguments, you will see that it could not be simpler - just type the following command:

...