...
Right now, we'll be using it to call variants (find mutations) in the re-sequenced E. coli genome from the Introduction to mapping (bowtie, BWA) Mapping tutorial. You will need the output SAM files from that tutorial to continue here. We assume that you are still in the main directory of introductionintro_to_mapping
data that you copied to $SCRATCH
.
Load the SAMtools module (if not already loaded):
Code Block |
---|
module load samtools |
...
Let's copy over the read alignment file in the SAM format and the reference genome in FASTA format to the new directory, so that we don't have so many files cluttering our space up.
Code Block |
---|
cp bowtie/SRR030257.sam samtools_bowtie cp bowtie/NC_012967.1.fasta samtools_bowtie |
Index the reference file. (This isn't indexing it for mapping. It's indexing it so that SAMtools can quickly jump to a certain base in the reference.)
Code Block |
---|
samtools faidx samtools_bowtie/NC_012967.1.fasta |
...
SAM is a text file, so it is slow to access information about how any given read was mapped. SAMtools and many of the commands that we will run later work on BAM files (essentially GZIP compressed binary forms of the text SAM files). These can be loaded much more quickly. Typically, they also need to be sorted, so that when the program wants to look at all reads overlapping position 4,129,888, it can easily find them all at once without having to search through the entire BAM file.
...
Code Block |
---|
samtools mpileup -ufu -f samtools_bowtie/NC_012967.1.fasta samtools_bowtie/SRR030257.sorted.bam > samtools_bowtie/SRR030257.bcf |
...
Code Block |
---|
$TACC_SAMTOOLS_DIR/bcftools/bcftools view -vcgv -c -g samtools_bowtie/SRR030257.bcf > samtools_bowtie/SRR030257.vcf |
...