...
Expand | ||
---|---|---|
| ||
While the yeast_pairedend.sort.bam text file is ~91 MB, its index (yeast_pairedend.sort.bai) is only 20 KB. |
samtools flagstat
we might like to obtain some other statistics, such as the percent of all reads that aligned to the genomeSince the BAM file contains records for both mapped and unmapped reads, just counting records doesn't provide information about the mapping rate of our alignment. The samtools flagstat tool provides very a simple analysis of mapping rate based on the the SAM flag fields, which includes information like whether reads are properly paired, aligned or not, and a few other things. Its syntax is identical to that of samtools idxstats. Here's how to run samtools flagstat and both see the output in the terminal and save it in a file – the samtools flagstat standard output is piped to tee, which both writes it to the specified file and sends it to its standard output:
Code Block | ||||
---|---|---|---|---|
| ||||
samtools flagstat yeast_pairedend.sort.bam | tee yeast_pariedend.flagstat.txt |
You should see something like this:
Code Block |
---|
1184360 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 secondary 0 + 0 supplementary 0 + 0 duplicates 547664 + 0 mapped (46.24%:-nan%) 1184360 + 0 paired in sequencing 592180 + 0 read1 592180 + 0 read2 473114 + 0 properly paired (39.95%:-nan%) 482360 + 0 with itself and mate mapped 65304 + 0 singletons (5.51%:-nan%) 534 + 0 with mate mapped to a different chr 227 + 0 with mate mapped to a different chr (mapQ>=5) |
Ignore the "+ 0" addition to each line - that is a carry-over convention for counting QA-failed reads that is no longer necessaryrelevant.
The most important statistic is the mapping rate , (here 46%) but this readout also allows you to verify that some common expectations (e.g. that about the same number of R1 and R2 reads aligned, and that most mapped reads are proper pairs) are met.
Exercise: What proportion of mapped reads were properly paired?
Expand | |||||
---|---|---|---|---|---|
| |||||
Divide the number of properly paired reads by the number of mapped reads:
|
Expand | ||
---|---|---|
| ||
About 86% of mapped read were properly paired. This is actually a bit on the low side for ChIP-seq alignments which typically over 90%. |
samtools idxstats
Now that we have a sorted, indexed BAM file, we might like to get some simple statistics about the alignment run. For example, we might like to know how many reads aligned to each chromosome/contig. The samtools idxstats is a very simple tool that provides this information. If you type the command without any arguments, you will see that it could not be simpler - just type the following command:
...