Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

BED (Browser Extensible Data) format is a simple text format for location-oriented data (genomic regions) developed to support UCSC Genome Browser tracks. Standard BED files have 3 to 6 Tab-separated columns, although up to 12 columns are defined. (Read more about the UCSC Genome Browser's official BED format.)

...

  1. chrom (required) – string naming the chromosome or othre other contig
  2. start (required) – the 0-based start position of the region
  3. end (required) – the 1-based end position of the region
  4. name (optional) – an arbitrary string describing the region
    • for BED files loaded as UCSC Genome Browser tracks, this text is displayed above the region
  5. score (optional) – an integer score for the region
    • for BED files to be loaded as UCSC Genome Browser tracks, this should be a number between 0 and 1000, higher = "better"
    • for non-GenBrowse BED files, this can be any integer value (e.g. the length of the region)
  6. strand (optional) - a single character describing the region's strand
    • +plus strand (Watson strand) strand region
    • -minus strand (Crick strand) strand region
    • . no strand the region is not associated with a strand (e.g. a transcription factor binding region)

...

  • The number of fields per line must be consistent throughout any single BED file
    • e.g. they must all have 3 fields or all have 6 fields
  • The first base on a contig is numbered 0
    • versus 1 for BAM file positions
    • so the a BED start of 99 is actually the 100th base on the contig
    • but end positions are 1-based
      • so a BED end of 200 is the 200th base on the contig
    • the length of a BED region is end - start
      • not end - start  + 1, as it would be if both coordinates with 0-based or both 1-based
    • this difference is the single greatest source of errors dealing with BED files!

Note that the UCSC Genome Browser also defines many BED-like data formats (e.g. bedGraph, narrowPeak, tagAlign and various RNA elements element formats). See supported UCSC Genome Browser data formats for more information and examples.

In addition to standard-format BED files, one can create custom BED files that have at least 3 of the standard fields (chrom, start, end), followed by any number of custom fields. For example:

  • A BED3+ file contains the 3 required BED fields, followed by some number of user-defined columns (all records with the same number)
  • A BED6+ file contains the 3 required BED fields, 3 additional standard BED fields (name, score, strand), followed by some number of user-defined columns

As we will see, BEDTools functions require BED3+ input files, or BED6+ if strand-specific operations are requested.

...