
1. Align reads from each sample to the reference genome using tophat

For Illumina (base-space) data:

nohup tophat -p 4 -r <mate-inner-distance> <bowtie_index_prefix> <R1.fastq> <R2.fastq> &>tophat.log &

For ABI SOLiD (colorspace) data:

nohup tophat --bowtie1 -C -p 4 -r <mate-inner-distance> <bowtie_colorspace_index_prefix> <R1.fasta> <R2.fasta> <R1.qual> <R2.qual> &>tophat.log &

The alignment output file, accepted_hits.bam, is used in the following steps.

Adding -G transcripts.gtf makes tophat align reads to the transcriptome first, and then align to the genome only those reads that do not map to the transcriptome. This can speed up tophat.
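
When several samples are aligned from the same working directory, each run needs its own output directory, since tophat otherwise writes to ./tophat_out every time. The following is a minimal sketch that uses tophat's -o option for this. The sample names, the reads/<sample>_R1.fastq naming scheme, the index prefix and the inner distance are all hypothetical placeholders, not values from this page. The function only prints the commands so they can be reviewed before anything is run.

```shell
# Hypothetical values: adjust to your own data.
INDEX=genome_index                  # bowtie index prefix (assumed name)
INNER_DIST=50                       # mate inner distance (assumed value)
SAMPLES="sample1 sample2 sample3"   # assumed sample names

# Print one tophat command per sample; -o gives each sample its own output directory.
tophat_commands() {
  for s in $SAMPLES; do
    echo "tophat -p 4 -r $INNER_DIST -o $s $INDEX reads/${s}_R1.fastq reads/${s}_R2.fastq"
  done
}

tophat_commands
```

Once the printed commands look right, they can be run one after another with: tophat_commands | sh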

2. Assemble transcripts using cufflinks on each alignment result

nohup cufflinks -o <outputdirectory> accepted_hits.bam &>cufflinks.log &

or, to assemble transcripts guided by a reference annotation (GFF/GTF) file, which assembles both novel and known transcripts:

nohup cufflinks -g <gfffile> -o <outputdirectory> accepted_hits.bam &>cufflinks.log &

The output file, transcripts.gtf, will be used in the following steps.

3. Merge the assembled transcripts to make a unified gff/gtf file.

Make a text file with locations of each sample's transcripts.gtf file, one line per sample.

transcripts_list.txt

sample1/transcripts.gtf
sample2/transcripts.gtf
sample3/transcripts.gtf
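
The list file can also be generated instead of typed by hand. This is a minimal sketch, assuming each sample's cufflinks output sits in its own subdirectory of the current directory, as in the listing above. The function name is made up for illustration.

```shell
# Write one transcripts.gtf path per line to transcripts_list.txt.
# Assumes the layout <sample>/transcripts.gtf under the current directory.
make_transcripts_list() {
  for gtf in */transcripts.gtf; do
    # Skip the unexpanded pattern if no sample directory matches.
    if [ -f "$gtf" ]; then
      echo "$gtf"
    fi
  done > transcripts_list.txt
}
```

Run make_transcripts_list from the directory that contains the sample directories, then check the result with cat transcripts_list.txt before passing it to cuffmerge.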

nohup cuffmerge -g <gfffile> transcripts_list.txt &>cuffmerge.log &

The output file, merged.gtf, will be used in the following steps (4a and 4b).

4a. Identify differentially expressed transcripts using cuffdiff

If you have more than one replicate for a sample, supply the SAM/BAM files for that sample's replicates as a single comma-separated list.

nohup cuffdiff -o <outputdirectory> merged.gtf <sample1_accepted_hits.bam> <sample2_accepted_hits.bam> <sample3_accepted_hits.bam> &>cuffdiff.log &
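
With replicates, the files belonging to one condition are joined by commas and the conditions are separated by spaces. The sketch below builds such a command line. The directory names, the output directory and the condition labels are hypothetical, and the command is only printed, not run. The -L option, which sets the condition labels used in the output, is optional.

```shell
# Join all arguments with commas (the subshell body keeps the IFS change local).
join_by_comma() (
  IFS=,
  printf '%s\n' "$*"
)

# Hypothetical layout: two conditions with two replicates each.
COND1=$(join_by_comma c1_rep1/accepted_hits.bam c1_rep2/accepted_hits.bam)
COND2=$(join_by_comma c2_rep1/accepted_hits.bam c2_rep2/accepted_hits.bam)

# Replicates are comma-joined; conditions are space-separated.
echo "cuffdiff -o diff_out -L cond1,cond2 merged.gtf $COND1 $COND2"
```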

Several output files are generated, containing raw and normalized counts for genes, isoforms and transcription start sites. The output files are described at http://cufflinks.cbcb.umd.edu/manual.html#cuffdiff_output
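
A common first look at the results is to pull out the rows flagged as significant. This is a minimal sketch, assuming the differential expression files (for example gene_exp.diff) are tab-separated with a header line and a final column holding a yes/no "significant" flag. Check the header of your own file first, since the layout may differ between cufflinks versions. The function name is made up for illustration.

```shell
# Print the header line plus every row whose last column is "yes".
# Assumes a tab-separated file whose final column is the significance flag.
significant_rows() {
  awk -F '\t' 'NR == 1 || $NF == "yes"' "$1"
}
```

For example: significant_rows <outputdirectory>/gene_exp.diff > significant_genes.txt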

4b. Check for differences between the assembled transcripts and known transcripts.

cuffcompare -s <reference.fasta> -r <gtffile/gfffile> merged.gtf 