Analyzing your aligned experiment with BedTools

Now that we have a bam file with only the reads we want included, we can do some more sophisticated analysis using bedtools. Bedtools changes from version to version, and here we are using version 2.22, the newest version, and what is currently installed on stampede.

Converting a bam file to a fastq file

Sometimes, especially when working with external data, we need to go from a bam file back to a fastq file. This can be useful for re-aligning reads using a different aligner, different settings on the original aligner used. It can also be useful for extracting the sequence of interesting regions of the genome after you have manipulated your bam file.

For this exercise, you'll be using bamtofastq. This function takes an aligned bam file as input and outputs a fastq format file. You can use the options if you have paired end data to output R1 and R2 reads for your fastq file. This type of function is especially useful if you need to to analyze sequences after you've compared several bam or bed files.

bedtools bamtofastq [OPTIONS] -i input.bam -fq output.fastq

Exercise 1: convert bam to fastq and look at the quality scores

Solution

solution code

module load bedtools
bedtools bamtofastq [OPTIONS] -i input.bam -fq output.fastq
more output.fastq

Bed file format: converting a bam file into a bed file.

While it's useful to be able to look at the fastq file, many analyses will be easiest to perform in bed format. Bed format is a simple tab delimited format that designates various properties about segments of the genome, defined by the chromosome, start coordinates and end coordinates. Bedtools provides a simple utility to convert bam files over into bed files, termed bamtobed.

bedtools bamtobed [OPTIONS] -i input.bam > output.bed

Note that the output will be piped to standard out unless you redirect to a program (head, more, less) or a file (output.bed).

Exercise 2: Convert the filtered yeast paired end bam to bed using bamtobed, look at your file in more, and find the number of lines in the file

Solution

solution code

module load bedtools
bedtools bamtobed [OPTIONS] -i input.bam > output.bed

more output.bed #to examine the bed file visually
wc -l output.bed #to get the number of lines in a file

use ctrl+c to quit more

Bedtools Coverage: how much of the genome does my data cover?

One way of characterizing data is to understand what percentage of the genome your data covers. What type of experiment you performed should affect the coverage of your data. A ChIP-seq experiment will cover binding sites, and a RNA-seq experiment will cover expressed transcripts. Bedtools coverage allows you to compare one bed file to another and compute the breadth and depth of coverage. The resulting output will contain several additional columns which summarize this information:

After each interval in B, coverageBed will report:

The number of features in A that overlapped (by at least one base pair) the B interval.
The number of bases in B that had non-zero coverage from features in A.
The length of the entry in B.
The fraction of bases in B that had non-zero coverage from features in A.

For this exercise, we'll use a bed file that summerizes the S. cerevisiae genome, version 3 (aka SacCer3)

Exercise 3: Find the coverage of your bed file over the SacCer3 genome

Solution

solution code

module load bedtools
bedtools coverage -a output.bed -b sacCer3.chrom.sizes.bed > sacCer3coverage.bed
more sacCer3coverage.bed #this file should have

Bedtools merge: collapsing bookended elements (or elements within a certain distance)

Exercise 4: Use bedtools merge to merge an experiment

Solution

solution code

solution goes here

Bedtools intersect: identifying where two experiments overlap

Exercise 5: Intersect two experiments using intersect

Solution

solution code

solution goes here

Page tree

Analyzing your aligned experiment with BedTools

Converting a bam file to a fastq file

Bed file format: converting a bam file into a bed file.

Bedtools Coverage: how much of the genome does my data cover?

Bedtools merge: collapsing bookended elements (or elements within a certain distance)

Bedtools intersect: identifying where two experiments overlap