Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Right now, we'll be using it to call variants (find mutations) in the re-sequenced E. coli genome from the Introduction to mapping (bowtie, BWA) Mapping tutorial. You will need the output SAM files from that tutorial to continue here. We assume that you are still in the main directory of introductionintro_to_mapping data that you copied to $SCRATCH.

Load the SAMtools module (if not already loaded):

Code Block
module load samtools

...

Let's copy over the read alignment file in the SAM format and the reference genome in FASTA format to the new directory, so that we don't have so many files cluttering our space up.

Code Block
cp bowtie/SRR030257.sam samtools_bowtie
cp bowtie/NC_012967.1.fasta samtools_bowtie

Index the reference file. (This isn't indexing it for mapping. It's indexing it so that SAMtools can quickly jump to a certain base in the reference.)

Code Block
samtools faidx samtools_bowtie/NC_012967.1.fasta

...

SAM is a text file, so it is slow to access information about how any given read was mapped. SAMtools and many of the commands that we will run later work on BAM files (essentially GZIP compressed binary forms of the text SAM files). These can be loaded much more quickly. Typically, they also need to be sorted, so that when the program wants to look at all reads overlapping position 4,129,888, it can easily find them all at once without having to search through the entire BAM file.

...

Code Block
samtools mpileup -ufu -f samtools_bowtie/NC_012967.1.fasta  samtools_bowtie/SRR030257.sorted.bam > samtools_bowtie/SRR030257.bcf

...

Code Block
$TACC_SAMTOOLS_DIR/bcftools/bcftools view -vcgv -c -g samtools_bowtie/SRR030257.bcf > samtools_bowtie/SRR030257.vcf

...