UCSC Genome Browser tracks

The UCSC Genome Browser is an invaluable resource both for obtaining public sequencing data and for visualizing it.

Tip: The UCSC Genome Browser at http://genome.ucsc.edu/ can be slow, since it is a resource shared across the eukaryotic genomics community. There is also a second, "beta test" version of the browser at http://hgwdev.cse.ucsc.edu/. It runs slightly newer (and possibly less stable) code, but fewer people use it.

  • http://genome.ucsc.edu/ Genome Browser, submit
  • navigation
    • type GAPDH in gene box, jump
    • note zoom out/zoom in buttons; click on position or click/drag
  • track detail
    • click "Simple Nucleotide Polymorphisms (dbSNP build 130)" to expand track detail
    • click on one of the SNPs to expand track detail
      • then click on the snp name to see details
  • selecting/hiding tracks
    • under "Regulation" section, change "ENCODE Regulation" track from "show" to "hide", refresh
    • right click "Multiz Alignments", hide
    • under "Phenotype and Disease Association" change GWAS Catalog from "hide" to "squish",  refresh
  • type PRNP in gene box, jump
    • click on "NHGRI Catalog..." track description to expand detail
    • note correspondence between SNPs (SNP 132) and disease SNPs (GWAS)
    • click on one of the disease SNPs for detail
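The navigation steps above can also be captured as a link. The following is a minimal Python sketch, assuming the browser's hgTracks URL accepts `db`, `position`, and `trackName=visibility` parameters. The assembly `hg19` and the internal track name `gwasCatalog` are assumptions for illustration, not values given in this walkthrough; check the track's settings page for its actual name.

```python
from urllib.parse import urlencode

BASE = "http://genome.ucsc.edu/cgi-bin/hgTracks"

def browser_url(db, position, visibility=None):
    """Build a Genome Browser link for an assembly and a position or gene name."""
    params = {"db": db, "position": position}
    # Extra parameters take the form trackName=visibility
    # (hide, dense, squish, pack or full). Track names here are assumptions.
    params.update(visibility or {})
    return BASE + "?" + urlencode(params)

# Jump to GAPDH with the GWAS Catalog track set to "squish".
url = browser_url("hg19", "GAPDH", {"gwasCatalog": "squish"})
print(url)
```

Pasting the printed link into a web browser should open the Genome Browser at that gene, which is handy for bookmarking a view or sending it to a colleague.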

Configuring custom tracks

The UCSC Genome Browser has a "Custom Tracks" feature that lets you visualize your own data using the Genome Browser web application. This data is visible only to you, not publicly (unless you choose to share a link to it with others).

There are two approaches to visualizing your data in the UCSC Genome Browser:

  1. Directly upload a data file, in one of the supported formats.
  2. Host your data locally, and configure the UCSC Genome Browser with its URL.

BED data

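BED format is a simple column-based text format for location-oriented data. The first 3 columns (chromosome, start, end) are required, and up to 9 more optional columns may follow, for 3 to 12 columns in total.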

See supported data formats for custom tracks for more information and examples.
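As a rough illustration of the format, the Python sketch below builds a small BED custom track as text. The coordinates, region names, and track name are made up for the example. BED start positions are 0-based and end positions are exclusive, and optional columns have to be filled in order from left to right.

```python
def bed_line(chrom, start, end, name=None, score=None, strand=None):
    """Format one BED record; start is 0-based, end is exclusive."""
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    fields = [chrom, str(start), str(end)]
    for value in (name, score, strand):
        if value is None:
            break  # optional columns cannot be skipped, so stop at the first gap
        fields.append(str(value))
    return "\t".join(fields)

# Made-up regions, for illustration only.
records = [
    ("chr12", 1000, 2000, "region_a", 0, "+"),
    ("chr20", 5000, 5600, "region_b", 0, "-"),
]

# An optional "track" line names the custom track in the browser.
lines = ['track name="example_regions" description="Made-up example regions"']
lines += [bed_line(*record) for record in records]
bed_text = "\n".join(lines) + "\n"
print(bed_text, end="")
```

The resulting text can be saved to a file and uploaded, or pasted into the custom track submission box.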

VCF data

VCF data can only be configured as a URL, not uploaded directly. Directions are found at http://genome.ucsc.edu/goldenPath/help/vcf.html.

The steps in those directions have already been carried out, and the resulting files are at this URL: http://loving.corral.tacc.utexas.edu/bioiteam/ucsc_custom_tracks/, filename progeria_ctcf.vcf.gz. These are hg18 SNP calls from published Iyer Lab CTCF ChIP-seq data in Progeria cells. The VCF file was produced using the Broad Institute's GATK.
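To point the browser at the hosted VCF, you paste a one-line "track line" into the custom track form. The Python sketch below assembles such a line. The helper `track_line` and the track name `progeria_ctcf_snps` are hypothetical; `vcfTabix` is the track type that the UCSC VCF help page describes for compressed, indexed VCF files. The browser also expects the tabix index (the same URL with `.tbi` appended) to sit next to the data file.

```python
def track_line(track_type, name, big_data_url, **settings):
    """Assemble a UCSC custom track line for a remotely hosted file."""
    parts = ["track", f"type={track_type}", f'name="{name}"']
    # Any extra settings are written as key=value pairs, in the order given.
    parts += [f"{key}={value}" for key, value in settings.items()]
    parts.append(f'bigDataUrl="{big_data_url}"')
    return " ".join(parts)

vcf_url = ("http://loving.corral.tacc.utexas.edu/bioiteam/"
           "ucsc_custom_tracks/progeria_ctcf.vcf.gz")

# The track name is an arbitrary label chosen for this example.
vcf_track = track_line("vcfTabix", "progeria_ctcf_snps", vcf_url)
print(vcf_track)
```

Because these calls are on hg18, select that assembly in the browser before submitting the track line.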

BAM data

BAM data can only be configured as a URL, not uploaded directly. Directions are found at http://genome.ucsc.edu/goldenPath/help/bam.html.

The steps in those directions have already been carried out, and the resulting files are at this URL: http://loving.corral.tacc.utexas.edu/bioiteam/ucsc_custom_tracks/, filename hela_totrna.sorted.bam. This is single-end RNA-seq data mapped directly to the human genome, hg19.

Here is another example, a track line for paired-end RNA-seq data processed with a TopHat/Cufflinks pipeline:

track type=bam name="rnaseq_bam" pairEndsByName=Y bigDataUrl="http://loving.corral.tacc.utexas.edu/bioiteam/ucsc_custom_tracks/accepted_hits.sorted.bam"
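A track line like this is easy to mistype, so it can help to check it before submitting. The sketch below is one way to do that in Python, using `shlex` to split the line into its settings; the helper `parse_track_line` is hypothetical. It also derives the URL where the browser looks for the BAM index, which is the data URL with `.bai` appended.

```python
import shlex

def parse_track_line(line):
    """Split a custom track line into a dict of its key=value settings."""
    tokens = shlex.split(line)  # handles the double-quoted values
    if not tokens or tokens[0] != "track":
        raise ValueError("a track line must start with 'track'")
    return dict(token.split("=", 1) for token in tokens[1:])

line = ('track type=bam name="rnaseq_bam" pairEndsByName=Y '
        'bigDataUrl="http://loving.corral.tacc.utexas.edu/bioiteam/'
        'ucsc_custom_tracks/accepted_hits.sorted.bam"')

settings = parse_track_line(line)

# Both settings are needed for a remotely hosted BAM track.
for required in ("type", "bigDataUrl"):
    if required not in settings:
        raise ValueError(f"missing {required}")

# The index must be hosted alongside the BAM file.
index_url = settings["bigDataUrl"] + ".bai"
print(settings["name"], index_url)
```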

Downloading annotation data

For RNA-seq analysis you often need a gene annotation file in GTF format. One way to get one is to download annotations from the UCSC Table Browser in GTF format:

  • http://genome.ucsc.edu/cgi-bin/hgTables
    • clade: Mammal, genome: Human, assembly: hg19
    • group: Genes and Gene Prediction tracks, track: RefSeq genes
    • output format: GTF - gene transfer format
    • optional: enter a filename in the "output file" box
    • get output
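Once downloaded, a GTF file is plain tab-delimited text with 9 columns, the last of which holds `key "value";` attributes. The Python sketch below shows one way to read such lines and count exons per transcript. The two sample lines are made up for illustration (including the ID `NM_EXAMPLE`), and `parse_gtf_line` is a hypothetical helper, not part of any UCSC tool. GTF coordinates are 1-based and inclusive, unlike BED.

```python
import re
from collections import Counter

# Matches attribute pairs such as: gene_id "NM_EXAMPLE";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)";')

def parse_gtf_line(line):
    """Parse one GTF record into a dict; coordinates are 1-based, inclusive."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("a GTF record has exactly 9 tab-separated columns")
    chrom, source, feature, start, end, score, strand, frame, attrs = cols
    return {
        "chrom": chrom,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": dict(ATTRIBUTE.findall(attrs)),
    }

# Made-up records in the general shape of a GTF download.
attrs = 'gene_id "NM_EXAMPLE"; transcript_id "NM_EXAMPLE";'
sample = [
    "\t".join(["chr1", "hg19_refGene", "exon", "1000", "1200", "0.000000", "+", ".", attrs]),
    "\t".join(["chr1", "hg19_refGene", "exon", "1500", "1700", "0.000000", "+", ".", attrs]),
]

records = [parse_gtf_line(line) for line in sample]
exon_counts = Counter(
    r["attributes"]["transcript_id"] for r in records if r["feature"] == "exon"
)
print(dict(exon_counts))
```

To run this on a real download, replace `sample` with the lines of the file, skipping any that start with `#`.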

A couple of exercises

Exercise: Alzheimer's disease SNP

Using the UCSC Genome Browser, determine whether Craig Venter or James Watson has a higher risk of Alzheimer's disease.

Hints

APOE gene.

Variation & Repeats, Genome Variants

Phenotype & Disease Associations, GWAS Catalog

A solution

Exercise 2

Using the UCSC Genome Browser, find and download a list of high-sequencing-depth regions in BED format.

Hints

group: Mapping and Sequencing tracks

track: Hi Seq Depth

A solution