BED format

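BED is a simple, tab-delimited format for location-oriented data. Each line has 3 required columns (chromosome, start, end) and up to 9 optional ones, for a maximum of 12 columns.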

See the UCSC Genome Browser documentation on supported data formats for custom tracks for more information and examples.

Important rules in BED format

  • The number of fields per line must be consistent throughout any single set of data in an annotation track.
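  • The first base in a chromosome is numbered 0, and the end coordinate is not included in the feature. For example, the first 100 bases of a chromosome are written as start 0, end 100 (bases 0 to 99).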
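Both rules can be checked from the command line. The sketch below uses a small hypothetical BED file (example.bed, with made-up coordinates) to list the distinct number of fields per line and to show that, with 0-based starts and excluded ends, a feature's length is simply end minus start.

```shell
# Hypothetical 3-column BED file (coordinates made up for illustration)
printf 'chrI\t0\t100\nchrI\t334\t649\n' > example.bed

# Rule 1: every line should have the same number of tab-separated fields.
# This prints each distinct field count; a consistent file prints one value.
awk 'BEGIN{FS="\t"} {print NF}' example.bed | sort -u

# Rule 2: starts are 0-based and ends are excluded,
# so the feature length is end - start (100 and 315 here).
awk 'BEGIN{FS="\t";OFS="\t"} {print $1, $2, $3, $3 - $2}' example.bed
```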
BED format practice 1

Q: Convert saccharomyces_cerevisiae_R64-1-1_20110208.gff into a 3-column BED file containing only the 'gene' features.

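Answer (conversion with awk):

```shell
# Keep only rows whose feature type (column 3) is 'gene'.
# GFF starts (column 4) are 1-based, BED starts are 0-based, so subtract 1;
# the end coordinate (column 5) is the same number in both conventions.
awk 'BEGIN{FS="\t";OFS="\t"} $3=="gene" {print $1, $4-1, $5}' saccharomyces_cerevisiae_R64-1-1_20110208.gff > sc_gene_3c.bed
```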
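To see the coordinate shift on concrete numbers, the same awk program can be run on a tiny hypothetical GFF excerpt (toy.gff and its two records are made up for illustration). Only the 'gene' row is kept, and its start moves from 335 to 334 while the end stays at 649:

```shell
# Hypothetical GFF excerpt: tab-separated, 1-based inclusive coordinates
printf 'chrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=GENE_A;Name=GENE_A\n' > toy.gff
printf 'chrI\tSGD\tCDS\t335\t649\t.\t+\t0\tParent=GENE_A_mRNA\n' >> toy.gff

# Keep 'gene' rows only and convert the start to 0-based
awk 'BEGIN{FS="\t";OFS="\t"} $3=="gene" {print $1, $4-1, $5}' toy.gff > toy_gene_3c.bed
cat toy_gene_3c.bed
```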
BED format practice 2

Q: Convert saccharomyces_cerevisiae_R64-1-1_20110208.gff into a 4-column BED file containing only the 'gene' features, with the gene IDs in the 4th column.

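Answer (conversion with awk):

```shell
# Column 9 holds the attributes, e.g. ID=...;Name=...;...
# This assumes ID= is the first attribute: split on ";" and
# substr(x[1], 4) drops the 3-character "ID=" prefix.
awk 'BEGIN{FS="\t";OFS="\t"} $3=="gene" {split($9, x, ";"); id = substr(x[1], 4); print $1, $4-1, $5, id}' saccharomyces_cerevisiae_R64-1-1_20110208.gff > sc_gene.bed
```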
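A quick sanity check after the conversion is that the BED file has one line per 'gene' row in the GFF and that no name field is empty. The function below sketches that check; check_gene_bed is a hypothetical helper (not part of any tool), the toy GFF records are made up, and the conversion line reuses the practice 2 logic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare 'gene' rows in a GFF with lines in the derived
# BED file, and count BED lines whose 4th (name) column is missing or empty.
check_gene_bed() {
    local gff=$1 bed=$2
    local n_gff n_bed n_empty
    n_gff=$(awk 'BEGIN{FS="\t"} $3=="gene"' "$gff" | wc -l)
    n_bed=$(wc -l < "$bed")
    n_empty=$(awk 'BEGIN{FS="\t"} $4==""' "$bed" | wc -l)
    echo "gene rows in GFF: $n_gff; BED lines: $n_bed; empty names: $n_empty"
    # Return status 0 only if the counts match and no name is empty
    [ "$n_gff" -eq "$n_bed" ] && [ "$n_empty" -eq 0 ]
}

# Toy data (made up for illustration) in place of the real yeast files
printf 'chrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=GENE_A;Name=GENE_A\n' > toy.gff
printf 'chrI\tSGD\tgene\t1000\t1500\t.\t-\t.\tID=GENE_B;Name=GENE_B\n' >> toy.gff
awk 'BEGIN{FS="\t";OFS="\t"} $3=="gene" {split($9, x, ";"); print $1, $4-1, $5, substr(x[1], 4)}' toy.gff > toy_gene.bed

check_gene_bed toy.gff toy_gene.bed && echo "OK"
```

For the real files, the call would be check_gene_bed saccharomyces_cerevisiae_R64-1-1_20110208.gff sc_gene.bed; a non-zero return status signals a mismatch.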

BEDTools multicov

BEDTools is a suite of utilities for comparing genomic features in BAM, BED, VCF, and GFF formats, and its documentation is thorough and well written. Here, we will practice three commonly used sub-commands: multicov, merge, and intersect.

We will use it to count the number of reads that map to each gene in the genome. Load the module, then look at the help for bedtools and for the multicov sub-command that we are going to use:

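```shell
# Make bedtools available on a system that uses environment modules
module load bedtools
# Run without arguments to list all sub-commands
bedtools
# Run a sub-command without arguments to print its usage and options
bedtools multicov
```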

The multicov command takes a feature file (GFF/BED/VCF) and, for each feature, counts how many alignments from one or more BAM files overlap it. By default, reads on either strand are counted; the -s option restricts the count to reads on the same strand as the feature.
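The -s option compares each read's strand with the feature's strand, which BED stores in column 6, so the 4-column sc_gene.bed from practice 2 is not enough for it. A minimal sketch, assuming the strand is copied from GFF column 7 and a placeholder score of 0 fills column 5; toy.gff and its records are made up for illustration.

```shell
# Toy GFF records (made up) standing in for saccharomyces_cerevisiae_R64-1-1_20110208.gff
printf 'chrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=GENE_A;Name=GENE_A\n' > toy.gff
printf 'chrI\tSGD\tgene\t1000\t1500\t.\t-\t.\tID=GENE_B;Name=GENE_B\n' >> toy.gff

# 6-column BED: chrom, start (0-based), end, name, score (placeholder 0), strand
awk 'BEGIN{FS="\t";OFS="\t"} $3=="gene" {
    split($9, x, ";")
    print $1, $4-1, $5, substr(x[1], 4), 0, $7
}' toy.gff > toy_gene6.bed

cat toy_gene6.bed
```

With the real annotation in place of toy.gff, the resulting six-column file could be given to multicov instead of sc_gene.bed, with -s added to the command.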

Note: The chromosome names in your GFF (or BED) file must match the names used in the reference FASTA file from the mapping step, because those are the names recorded in the BAM file. For example, if the reference names its chromosomes chr1, chrX, etc., the GFF file must also use chr1, chrX, and so on.
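One way to check this before counting is to compare the chromosome names in the BAM header with those in the BED file. In this sketch a made-up header excerpt (toy_header.sam, in the @SQ/SN: layout that samtools view -H prints) and a made-up BED file stand in for the real files; the BED file deliberately uses "II" instead of "chrII" for one gene.

```shell
# Hypothetical BAM header excerpt in SAM header format
printf '@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chrI\tLN:230218\n@SQ\tSN:chrII\tLN:813184\n' > toy_header.sam
# Hypothetical BED file with one mismatched chromosome name
printf 'chrI\t334\t649\tGENE_A\nII\t100\t900\tGENE_B\n' > toy_genes.bed

# Chromosome names known to the BAM: the SN: tag of each @SQ line
awk 'BEGIN{FS="\t"} $1=="@SQ" {sub(/^SN:/, "", $2); print $2}' toy_header.sam | sort -u > bam_chroms.txt
# Chromosome names used in the BED file
cut -f1 toy_genes.bed | sort -u > bed_chroms.txt

# Names present in the BED file but missing from the BAM header
comm -23 bed_chroms.txt bam_chroms.txt
```

For the real data, the output of samtools view -H yeast_chip_sort.bam can be piped into the first awk command and sc_gene.bed used in place of toy_genes.bed; empty output from comm means every BED chromosome name is present in the BAM.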

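multicov expects each BAM file to be sorted by coordinate and indexed (a .bai index next to it, as produced by samtools index). To count reads per gene in our two sorted BAM files, run: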

```shell
# One output line per gene in sc_gene.bed, with one count column per BAM file
bedtools multicov -bams yeast_chip_sort.bam yeast_chip2_sort.bam -bed sc_gene.bed > gene_counts.txt
```

Then look at the first few lines of the output:

```shell
head gene_counts.txt
```
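Each line of gene_counts.txt repeats the columns of sc_gene.bed and then adds one count per BAM file, in the order the files were given to -bams. As a sketch of working with that layout, the commands below use a small made-up table (toy_counts.txt; the numbers are invented for illustration) to append a total across both samples and rank genes by it.

```shell
# Hypothetical gene_counts.txt layout: columns 1-4 from sc_gene.bed,
# column 5 = count for yeast_chip_sort.bam, column 6 = count for yeast_chip2_sort.bam
printf 'chrI\t334\t649\tGENE_A\t12\t30\n'  > toy_counts.txt
printf 'chrI\t999\t1500\tGENE_B\t85\t40\n' >> toy_counts.txt
printf 'chrI\t2000\t2600\tGENE_C\t0\t3\n'  >> toy_counts.txt

# Append the per-gene total as column 7, then sort by it, largest first
awk 'BEGIN{FS="\t";OFS="\t"} {print $0, $5 + $6}' toy_counts.txt \
    | sort -t "$(printf '\t')" -k7,7nr > toy_counts_ranked.txt

# Show gene name and total
cut -f4,7 toy_counts_ranked.txt
```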

BEDTools merge

 

BEDTools intersect

 

Online tutorial

Overview and use cases

http://bedtools.readthedocs.org/en/latest/content/bedtools-suite.html

bedtools intersect

http://bedtools.readthedocs.org/en/latest/content/tools/intersect.html

bedtools multicov

http://bedtools.readthedocs.org/en/latest/content/tools/multicov.html

bedtools merge

http://bedtools.readthedocs.org/en/latest/content/tools/merge.html