The UCSC Genome Browser is an invaluable resource both for obtaining public sequencing data and for visualizing it.
Tip: Sometimes the UCSC Genome Browser at http://genome.ucsc.edu/ is pretty slow -- after all, it's a resource shared among the eukaryotic genomics community. But there's also a second "Beta test" version of the browser at http://hgwdev.cse.ucsc.edu/. It has slightly newer (and possibly less stable) code, but fewer people use it.
The UCSC Genome Browser has a "Custom Tracks" feature that lets you visualize your own data using the Genome Browser web application. This data is visible only to you, not publicly (unless you choose to share a link to it with others).
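There are two approaches to visualizing your data in the UCSC Genome Browser: uploading a file directly, or configuring a track line that points to a URL where the data is hosted.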
BED format is a simple tab-delimited format for location-oriented data, with 3 required columns (chromosome, start, end) and up to 9 optional ones.
See supported data formats for custom tracks for more information and examples.
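As a rough illustration of the BED layout described above, the sketch below writes a tiny six-column BED file with a track line. The file name `example_track.bed`, the track name, and the regions themselves are all made up for the example; they are not part of the course data.

```shell
# Write a tiny BED custom track (hypothetical file name and made-up regions).
# Columns are tab-separated: chrom, start (0-based), end (exclusive), name, score, strand.
write_example_bed() {
  out="${1:-example_track.bed}"
  {
    printf 'track name="example_regions" description="Made-up example regions"\n'
    printf 'chr1\t1000\t2000\tregionA\t500\t+\n'
    printf 'chr1\t5000\t5600\tregionB\t900\t-\n'
  } > "$out"
}

write_example_bed example_track.bed

# Sanity check: every data line should have the same number of columns.
awk -F'\t' '!/^track/ { print NF }' example_track.bed | sort -u
```

A file like this is small enough to upload or paste directly on the Custom Tracks page.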
VCF data can only be configured as a URL, not uploaded directly. Directions are found at http://genome.ucsc.edu/goldenPath/help/vcf.html.
module load tabix   # also loads bgzip
cd $BI/web
bgzip progeria_ctcf.vcf
tabix -p vcf progeria_ctcf.vcf.gz
This has already been done, and the resulting files are at this URL: http://loving.corral.tacc.utexas.edu/bioiteam/ucsc_custom_tracks/, filename progeria_ctcf.vcf.gz. These are hg18 SNP calls from published Iyer Lab CTCF ChIP-seq data in Progeria cells. The VCF file was produced using Broad's GATK.
track type=vcfTabix name="progeria_ctcf_snp_calls" bigDataUrl="http://loving.corral.tacc.utexas.edu/bioiteam/ucsc_custom_tracks/progeria_ctcf.vcf.gz"
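If you host several VCF files, a small helper can assemble the track line for each one. This is only a sketch: `make_vcf_track` is a hypothetical function name, and it assumes the tabix index sits next to the data file under the same name plus `.tbi`, which is where the browser looks for it by default.

```shell
# Build a vcfTabix track line for a bgzip-compressed, tabix-indexed VCF.
# Arguments: track name, URL of the .vcf.gz file.
# (make_vcf_track is a hypothetical helper, not a UCSC tool.)
make_vcf_track() {
  printf 'track type=vcfTabix name="%s" bigDataUrl="%s"\n' "$1" "$2"
}

base="http://loving.corral.tacc.utexas.edu/bioiteam/ucsc_custom_tracks"
make_vcf_track "progeria_ctcf_snp_calls" "$base/progeria_ctcf.vcf.gz"

# Assumed location of the matching index: same URL with ".tbi" appended.
echo "$base/progeria_ctcf.vcf.gz.tbi"
```

The printed track line is what you paste into the Custom Tracks text box.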
BAM data can only be configured as a URL, not uploaded directly. Directions are found at http://genome.ucsc.edu/goldenPath/help/bam.html.
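The BAM file must be sorted and indexed, with its .bai index file stored alongside it at the same location. This has already been done, and the resulting files are at this URL: http://loving.corral.tacc.utexas.edu/bioiteam/ucsc_custom_tracks/, filename hela_totrna.sorted.bam. This is single-end (SE) RNAseq data mapped directly to the human genome, hg19.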
track type=bam name="hela_rnaseq" bigDataUrl="http://loving.corral.tacc.utexas.edu/bioiteam/ucsc_custom_tracks/hela_totrna.sorted.bam"
Here is another example, using paired-end RNAseq data processed with a TopHat/Cufflinks pipeline:
track type=bam name="rnaseq_bam" pairEndsByName=Y bigDataUrl="http://loving.corral.tacc.utexas.edu/bioiteam/ucsc_custom_tracks/accepted_hits.sorted.bam"
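The two BAM track lines above differ only in the track name, the file, and the pairEndsByName setting, so they can be generated from one helper. The sketch below assumes a hypothetical function `make_bam_track` whose optional third argument, `paired`, switches on pairEndsByName; neither the function nor that argument is part of the Genome Browser itself.

```shell
# Build a BAM track line for a sorted, indexed BAM file hosted at a URL.
# Arguments: track name, URL of the .bam file, optional "paired" flag.
# (make_bam_track and its "paired" argument are hypothetical conventions.)
make_bam_track() {
  name="$1"; url="$2"; paired="${3:-no}"
  if [ "$paired" = "paired" ]; then
    # Paired-end data: ask the browser to pair mates by read name.
    printf 'track type=bam name="%s" pairEndsByName=Y bigDataUrl="%s"\n' "$name" "$url"
  else
    printf 'track type=bam name="%s" bigDataUrl="%s"\n' "$name" "$url"
  fi
}

base="http://loving.corral.tacc.utexas.edu/bioiteam/ucsc_custom_tracks"
make_bam_track "hela_rnaseq" "$base/hela_totrna.sorted.bam"
make_bam_track "rnaseq_bam" "$base/accepted_hits.sorted.bam" paired
```

The two calls reproduce the single-end and paired-end track lines shown above.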
Downloading annotation data
For RNAseq you often need a GTF file, but how do you find one? One way is to download annotations from the UCSC Table Browser in GTF format.
A couple of exercises

Exercise: Alzheimer's disease SNP
Using the UCSC Genome Browser, determine whether Craig Venter or James Watson has a higher risk of Alzheimer's disease.

Hints
APOE gene.
Variation & Repeats, Genome Variants
Phenotype & Disease Associations, GWAS Catalog

Exercise 2
Using the UCSC Genome Browser, find and download a list of high-sequencing-depth regions in BED format.

Hints
group: Mapping and Sequencing tracks
track: Hi Seq Depth