
As Nathan showed you yesterday, the main output from aligning reads to a database is a binary alignment file, or BAM file. BAM files are binary and compressed, so they can't be read with standard Unix file viewers such as more, less and head. Samtools lets you manipulate BAM files: they can be converted to the plain-text SAM format (see the SAM format specification), sorted by reference coordinate, and filtered on properties such as mapping quality. Filtering is a good way to remove low-quality alignments, or to make a BAM file restricted to a single chromosome.

We'll be focusing on just a few samtools functions in this series of exercises.

Since most aligners produce a BAM file, we'll work on some basic manipulations of the BAM files we produced from our alignments yesterday. A typical BAM workflow looks like this:

  1.     SAM files are converted into BAM files (samtools view)
  2.     BAM files are sorted by reference coordinates (samtools sort)
  3.     Sorted BAM files are indexed (samtools index)
  4.     Sorted, indexed bam files are filtered (based on location, flags, mapping quality)

These steps presume that you are using an aligner such as bwa, which records both mapped and unmapped reads in its output. Make sure you check how your aligner writes its output to SAM/BAM format, or you may get a strange surprise!
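Whether a read is mapped is recorded in the FLAG field (column 2 of each SAM record): bit 0x4 is set when the read is unmapped. The sketch below uses two hypothetical helper names, `is_mapped` and `count_mapped`, to show how that bit can be tested in bash and how mapped records can be counted in SAM text read from standard input.

```shell
# Hypothetical helper: succeed if a SAM FLAG value marks the read as mapped.
# Bit 0x4 (decimal 4, "segment unmapped") is clear for mapped reads.
is_mapped() {
  [ $(( $1 & 4 )) -eq 0 ]
}

# Hypothetical helper: count mapped records in SAM text read from stdin.
# Header lines start with "@"; column 2 is the FLAG.
count_mapped() {
  awk -F '\t' '!/^@/ && int($2 / 4) % 2 == 0 { n++ } END { print n + 0 }'
}
```

For example, `samtools view yeast_pairedend.bam | count_mapped` should report the same number as `samtools view -c -F 4 yeast_pairedend.bam`, where `-F 4` excludes records that have the unmapped bit set. Both count alignment records, not distinct reads.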

The code block below details some basic samtools functionality:

basic samtools functionality
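samtools view -bS -o outfile_view.bam infile.sam   # SAM -> BAM; -S (SAM input) is needed by older samtools and ignored by newer ones; use -c to just count alignments
samtools sort infile.bam outfile.sorted            # older samtools: writes outfile.sorted.bam (newer: samtools sort -o outfile.sorted.bam infile.bam)
samtools index outfile.sorted.bam                  # writes the index outfile.sorted.bam.bai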
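The four workflow steps can be chained in a small shell function. This is a sketch, not a course script: the function name `bam_workflow`, the example region and the mapping-quality cutoff are hypothetical, and it assumes samtools 1.3 or later, where `samtools sort` takes the output file name with `-o` (older versions take an output prefix instead).

```shell
# Hypothetical helper: SAM -> BAM -> coordinate-sorted BAM -> index -> filtered BAM.
# Assumes samtools 1.3+ is on the PATH (e.g. after "module load samtools").
bam_workflow() {
  local sam="$1"          # aligner output in SAM format
  local prefix="$2"       # prefix for every output file
  local region="$3"       # region to keep, e.g. chrI or chrI:1000-2000 (placeholder)
  local minq="${4:-20}"   # minimum mapping quality to keep (placeholder default)

  # 1. Convert SAM to BAM (-b: BAM output; input format is auto-detected)
  samtools view -b -o "${prefix}.bam" "$sam" || return 1
  # 2. Sort by reference coordinate
  samtools sort -o "${prefix}.sorted.bam" "${prefix}.bam" || return 1
  # 3. Index the sorted BAM (creates ${prefix}.sorted.bam.bai)
  samtools index "${prefix}.sorted.bam" || return 1
  # 4. Keep alignments in the region with MAPQ >= minq (region queries need the index)
  samtools view -b -q "$minq" -o "${prefix}.filtered.bam" \
    "${prefix}.sorted.bam" "$region"
}
```

Running `bam_workflow aln.sam aln chrI 20` would leave aln.bam, aln.sorted.bam, aln.sorted.bam.bai and aln.filtered.bam in the current directory. The function stops at the first failing step, so a bad SAM file does not go on to produce a misleading sorted file or index.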

First, log on to Stampede and copy the file yeast_pairedend.bam into a new samtools directory in your scratch area:

get bam file from Nathan's scratch
cds   # TACC shortcut for cd $SCRATCH
mkdir samtools
cd samtools
cp /scratch/02423/nsabell/core_ngs/bam/yeast_pairedend.bam .

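Now that we have a BAM file, we need to sort and index it. Every BAM file you want to query by region needs an index: BAM files tend to be large, and the index lets samtools jump straight to the reads in a given region (a chromosome, say) instead of reading through the whole file. A BAM file must be sorted by coordinate before it can be indexed.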

Exercise 1:  sort and index the file "yeast_pairedend.bam"

solution
module load samtools
samtools sort yeast_pairedend.bam yeast_pairedend_sort   # older samtools: writes yeast_pairedend_sort.bam
samtools index yeast_pairedend_sort.bam
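To check that the sort worked, you can confirm that the records come out grouped by reference and in increasing position within each reference. The hypothetical `check_coord_sorted` function below is a sketch that reads SAM text on standard input and succeeds only if that holds; records with no reference name (`*`) are skipped, since a coordinate-sorted file places them at the end.

```shell
# Hypothetical helper: succeed if SAM records on stdin are coordinate-sorted.
check_coord_sorted() {
  awk -F '\t' '
    /^@/      { next }                   # skip header lines
    $3 == "*" { next }                   # skip records without a reference name
    $3 != ref {                          # moved on to a new reference
      if ($3 in seen) { bad = 1; exit }  # reference seen before: not grouped
      seen[$3] = 1; ref = $3; pos = 0
    }
    {
      if ($4 + 0 < pos) { bad = 1; exit }  # position went backwards
      pos = $4 + 0
    }
    END { exit bad }
  '
}
```

For example, `samtools view yeast_pairedend_sort.bam | check_coord_sorted && echo sorted` should print `sorted`, while the same check on the unsorted yeast_pairedend.bam is expected to fail. It streams every record, so it is slow on large files.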

 

Exercise 2:  counting the number of reads on a chromosome
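One possible approach to Exercise 2, sketched under assumptions: with the sorted, indexed file from Exercise 1, `samtools view -c` counts the records overlapping a region, and a region can be a whole chromosome name. The helper name `count_on_chrom` is hypothetical, and the chromosome name must match the reference names in your BAM header (`chrI` below is only an example).

```shell
# Hypothetical helper: count mapped alignments on one chromosome.
# Needs a coordinate-sorted BAM with a .bai index (see Exercise 1).
count_on_chrom() {
  local bam="$1"    # e.g. yeast_pairedend_sort.bam
  local chrom="$2"  # reference name exactly as it appears in the BAM header
  # -c: print a count instead of records; -F 4: skip unmapped records
  samtools view -c -F 4 "$bam" "$chrom"
}
```

`samtools idxstats yeast_pairedend_sort.bam` lists each reference name with its length and its mapped and unmapped counts, which tells you which names to use and gives a quick cross-check. Then run, for example, `count_on_chrom yeast_pairedend_sort.bam chrI`. This counts alignment records, so the two mates of a pair are counted separately.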

Exercise 3:  Making a sorted bam file with reads that map to multiple places removed
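A sketch for Exercise 3, assuming the reads were aligned with bwa, which gives a mapping quality (MAPQ) of 0 to reads that align equally well to more than one place. Other aligners score multi-mapping reads differently, so check yours before reusing this. The helper name `remove_multimappers` and the output file name are hypothetical.

```shell
# Hypothetical helper: write a sorted, indexed BAM without multi-mapping reads.
# Assumption: the aligner (e.g. bwa) gives MAPQ 0 to multi-mapping reads.
remove_multimappers() {
  local in_bam="$1"   # sorted input, e.g. yeast_pairedend_sort.bam
  local out_bam="$2"  # output path, e.g. yeast_pairedend_uniq.bam
  # -q 1: keep MAPQ >= 1
  # -F 0x904: drop unmapped (0x4), secondary (0x100) and supplementary (0x800) records
  samtools view -b -q 1 -F 0x904 -o "$out_bam" "$in_bam" || return 1
  # Filtering keeps the input order, so the output is still sorted
  samtools index "$out_bam"
}
```

Usage: `remove_multimappers yeast_pairedend_sort.bam yeast_pairedend_uniq.bam`. Because `samtools view` writes records in input order, filtering a sorted file gives a sorted file, so no second sort is needed before indexing. Dropping secondary and supplementary records as well is a design choice; remove those bits from `-F` if you want to keep them.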

 
