Right now, we'll be using it to call variants (find mutations) in the re-sequenced E. coli genome from the Mapping tutorial. You will need the output SAM files from that tutorial to continue here.
If you do not have the output from the Mapping tutorial, run these commands to copy over the output that would have been produced. Then, you can immediately start this tutorial!
We assume that you are still working in the main directory of called
intro_to_mapping data that you copied to created on
Load the SAMtools module (if not already loaded).
mkdir comparison cp samtools_bowtie/SRR030257.vcf comparison/bowtie.vcf cp samtools_bwa/SRR030257.vcf comparison/bwa.vcf cp samtools_bowtie2/SRR030257.vcf comparison/bowtie2.vcf cd comparison
Use the subcommands
bedtools intersect and
bedtools subtract we can find equal and different predictions between mappers. Try to figure out how to to do this on your own first. Hint: Remember that adding
> output.vcf to the end of a command will pipe the output that is to the terminal into a file, so that you can save it.
- Which mapper finds more variants?
- Can you figure out how to filter the VCF files on various criteria, like coverage, quality, ... ?
- How many high quality mutations are there in these E. coli samples relative to the reference genome?
Look at how the reads supporting these variants were aligned to the reference genome in the Integrative Genomics Viewer (IGV) tutorial.