SAMtools is a suite of commands for dealing with databases of mapped reads. You'll be using it quite a bit throughout the course.
Calling variants in reads mapped by bowtie
Right now, we'll be using it to call variants (find mutations) in the re-sequenced E. coli genome from the Introduction to mapping (bowtie, BWA). You will need the output SAM files from that tutorial to continue here. We assume that you are still in the main directory of
introduction_to_mapping data that you copied to
cp bowtie/SRR030257.sam samtools_bowtie cp bowtie/NC_012967.1.fasta samtools_bowtie
Take a look at this file. It has a nice header explaining what the columns mean.
VCF format has Allele Frequency tags denoted by AF1. Try the following command to see what values we have in our files.
What does the -v flag do in grep?
Calling variants in reads mapped by BWA
Follow the same directions to call variants in the BWA-mapped reads.
You could also try running all of the commands from inside of the
samtools_bwa directory, just for a change of pace.
Comparing the results of different mappers
Set up a new output directory and copy the respective VCF files to it.
subtractBed -a bowtie.vcf -b common_bowtie_bwa.vcf > unique_bowtie.vcf subtractBed -a bwa.vcf -b common_bowtie_bwa.vcf > unique_bwa.vcf
- Which mapper finds more variants?
- Can you figure out how to filter the VCF files on various criteria, like coverage, quality, ... ?
- How many high quality mutations are there in these E. coli samples relative to the reference genome?
We will get a chance to look at how the reads supporting these variants were aligned to the reference genome by Using the Integrative Genomics Viewer (IGV)