...
BED (Browser Extensible Data) format is a simple text format for location-oriented data (genomic regions) developed to support UCSC Genome Browser tracks. Standard BED files have 3 to 6 tab-separated columns, although up to 12 columns are defined. (Read more about the UCSC Genome Browser's official BED format.)
...
- chrom (required) – string naming the chromosome or other contig
- start (required) – the 0-based start position of the region
- end (required) – the 1-based end position of the region
- name (optional) – an arbitrary string describing the region
  - for BED files loaded as UCSC Genome Browser tracks, this text is displayed above the region
- score (optional) – an integer score for the region
  - for BED files to be loaded as UCSC Genome Browser tracks, this should be a number between 0 and 1000, higher = "better"
  - for BED files not intended for the Genome Browser, this can be any integer value (e.g. the length of the region)
- strand (optional) – a single character describing the region's strand
  - + – plus strand (Watson strand) region
  - - – minus strand (Crick strand) region
  - . – no strand: the region is not associated with a strand (e.g. a transcription factor binding region)
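To make the field list concrete, here is a minimal Python sketch of parsing one BED3 to BED6 line into a record. The names `BedRecord` and `parse_bed_line` are hypothetical, chosen for illustration; it assumes the score column is an integer, as described above, and simply ignores any columns beyond the sixth.

```python
from dataclasses import dataclass
from typing import Optional

VALID_STRANDS = {"+", "-", "."}


@dataclass
class BedRecord:
    chrom: str                     # chromosome or other contig name
    start: int                     # 0-based start position
    end: int                       # 1-based end position
    name: Optional[str] = None     # optional field 4
    score: Optional[int] = None    # optional field 5 (assumed to be an integer)
    strand: Optional[str] = None   # optional field 6: "+", "-" or "."


def parse_bed_line(line: str) -> BedRecord:
    """Parse one tab-separated BED line (hypothetical helper); columns past 6 are ignored."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"need at least 3 fields, got {len(fields)}")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError(f"invalid interval: start={start}, end={end}")
    name = fields[3] if len(fields) > 3 else None
    score = int(fields[4]) if len(fields) > 4 else None
    strand = fields[5] if len(fields) > 5 else None
    if strand is not None and strand not in VALID_STRANDS:
        raise ValueError(f"invalid strand: {strand!r}")
    return BedRecord(chrom, start, end, name, score, strand)
```

For example, `parse_bed_line("chr1\t99\t200\tpeak1\t500\t+")` yields a record with `start=99`, `end=200` and `strand="+"`, while a BED3 line such as `"chr2\t0\t10"` leaves `name`, `score` and `strand` as `None`.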
...
- The number of fields per line must be consistent throughout any single BED file
  - e.g. they must all have 3 fields or all have 6 fields
- The first base on a contig is numbered 0
  - versus 1 for SAM/BAM alignment positions (as reported in SAM text)
  - so a BED start of 99 is actually the 100th base on the contig
- but end positions are 1-based
  - so a BED end of 200 is the 200th base on the contig
- the length of a BED region is end - start
  - not end - start + 1, as it would be if both coordinates were 0-based or both were 1-based
  - this difference is the single greatest source of errors when dealing with BED files!
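Because the mixed 0-based/1-based convention causes so many off-by-one errors, it can help to keep the conversions in small, explicit functions. The following Python sketch uses hypothetical helper names; it assumes "1-based" means a fully 1-based, inclusive range (first and last base both counted from 1), as in the BED start 99 / end 200 example above.

```python
from typing import Tuple


def bed_length(start: int, end: int) -> int:
    """Length of a BED region: end - start, with no +1."""
    return end - start


def bed_to_one_based(start: int, end: int) -> Tuple[int, int]:
    """BED (0-based start, 1-based end) -> fully 1-based, inclusive (first, last)."""
    return start + 1, end


def one_based_to_bed(first: int, last: int) -> Tuple[int, int]:
    """Fully 1-based, inclusive (first, last) -> BED (0-based start, 1-based end)."""
    return first - 1, last
```

With the values used above, `bed_to_one_based(99, 200)` gives `(100, 200)`, i.e. bases 100 through 200, and `bed_length(99, 200)` gives 101, which matches the inclusive count `200 - 100 + 1`. Only the start coordinate ever shifts; the end is the same number in both conventions.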
Note that the UCSC Genome Browser also defines many BED-like data formats (e.g. bedGraph, narrowPeak, tagAlign and various RNA element formats). See supported UCSC Genome Browser data formats for more information and examples.
In addition to standard-format BED files, one can create custom BED files that have at least 3 of the standard fields (chrom, start, end), followed by any number of custom fields. For example:
- A BED3+ file contains the 3 required BED fields, followed by some number of user-defined columns (all records with the same number)
- A BED6+ file contains the 3 required BED fields, 3 additional standard BED fields (name, score, strand), followed by some number of user-defined columns
As we will see, BEDTools functions require BED3+ input files, or BED6+ if strand-specific operations are requested.
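Before handing a custom file to BEDTools, a quick sanity check can confirm that every record has the same number of fields and whether the file qualifies as BED3+ or BED6+. The Python sketch below is a hypothetical helper, not part of BEDTools. It assumes a file counts as BED6+ only if it has at least 6 columns and every value in column 6 is `+`, `-` or `.`; otherwise columns 4 onward are treated as user-defined and the file is reported as BED3+. Skipping blank, `#`, `track` and `browser` lines is also an assumption.

```python
from typing import Iterable


def classify_bed(lines: Iterable[str]) -> str:
    """Return 'BED6+' or 'BED3+' for the given lines (hypothetical helper).

    Raises ValueError if records have different field counts or fewer than 3 fields.
    """
    n_fields = None
    strands_ok = True
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Assumption: blank lines and header-style lines are not records.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if n_fields is None:
            n_fields = len(fields)
        elif len(fields) != n_fields:
            raise ValueError(f"line {lineno}: {len(fields)} fields, expected {n_fields}")
        # Column 6 must hold a strand character for the file to count as BED6+.
        if n_fields >= 6 and fields[5] not in {"+", "-", "."}:
            strands_ok = False
    if n_fields is None or n_fields < 3:
        raise ValueError("not a BED3+ file: need at least 3 fields per record")
    return "BED6+" if n_fields >= 6 and strands_ok else "BED3+"
```

For instance, two 6-column records with `+` and `-` strands are classified as BED6+, a plain 3-column file as BED3+, and a file mixing 3- and 4-column records raises `ValueError`, reflecting the rule that all records must have the same number of fields.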
...