Page History

Table of Contents

Differential expression with splice variant analysis at the same time: the Tuxedo pipeline

...

the original RNAseq analysis protocol using Tuxedo article in Nature Protocols, and
the URL for Tuxedo resource bundles for selected organisms (gff annotations, pre-built bowtie references, etc.)
the example data we'll use for this tutorial came from this experiment which has the raw fastq data in the SRA.

Objectives

In this lab, you will explore a fairly typical RNA-seq analysis workflow using the Tuxedo pipeline. Simulated RNA-seq data will be provided to you; the data contains 75 bp paired-end reads that have been generated in silico to replicate real gene count data from Drosophila. The data simulates two biological groups with three biological replicates per group (6 samples total). This simulated data has already been run through a basic RNA-seq analysis workflow. We will look at:

How the workflow was run and what steps are involved.
What genes and isoforms are significantly differentially expressed

Introduction

Overall Workflow Diagram

An This overview of tophat cufflinks workflow Diagram outlines the Tuxedo pipeline. We have annotated the image from the original paper to include the important file types at each stage, and to note the steps skippin in the "fast path" (no de novo junction assembly).

This is the full workflow that includes de novo splice variant detection. For simple differential gene expression, Steps 2 (cufflinks) and 3-4 (cuffmerge) can be omitted.

...

Simple differential gene expression analysis against a set of known splice variants.
- A GTF/GFF file is provided, and you specify that no novel junctions should be explored
- This is by far the fastest path through the workflow.
Same as 1), but novel splice junctions should be explored in addition to known splice junctions
- A GTF/GFF file is provided, and you let the tool search for novel junctions also
Use the input data to construct de novo splice junctions without reference to any known splice junctions
- No GTF/GFF is provided

What tophat does

The 1st, Tophat step is always required and sets the stage for all that follows. Tophat does a transcriptome-aware alignment of the input sequences to a reference genome using either the Bowtie or Bowtie2 aligner (in theory it can use other aligners, but we do not recommend this).

Split Read Alignment (Splice Finding)

To do thisTo do this, Tophat goes through several steps:

...

At the end of the Tophat process, you have a BAM file describing the alignment of the input data to genomic coordinates.

FASTQ preparation

Although we won't cover these issues here, there are some issues you should consider before embarking on the Tuxedo pipeline:

...

Possibly, if FastQC or other base quality reports show the data is really poor. But generally the fact that Tophat splits long reads into smaller fragments mitigates the need to do this.

...

This is usually a good idea because un-template adapter bases have a more drastic effect on reducing mappability than do low-quality 3' bases.

...

This is also usually a good idea, because such rRNA sequences can be a substantial proportion of your data (depending on library prep method), and this can skew cuffdiff's fragment counting statistics.

...

Maybe something like this:

Align your sequences to a reference "genome" consisting only of rRNA gene sequences.
Extract only the sequences that do not align to the rRNA reference into a new FASTQ file and use that as Tophat input.

...

There are many, and it will depend on your data and what you want to get out of it.

If you have paired-end data, tophat asks you to provide the mean fragment (insert) size and the standard deviation for insert sizes in your library. One common pre-processing step to achieve this would be to do a quick paired-end alignment of, for example, about 1 million sequences to a reference genome. Then you could calculate the mean and standard deviation of insert sizes for properly paired reads from the resulting BAM file records, and pass these values to Tophat.

Some Logistics...

Six raw data files were provided as the starting point:

c1_r1, c1_r2, c1_r3 from the first biological condition
c2_r1, c2_r2, and c2_r3 from the second biological condition

Due to the size of the data and length of run time, most of the programs have already been run for this exercise. The commands run are in different *.commands files. We will spend some time looking through these commands to understand them. You will then be parsing the output, finding answers, and visualizing results.

Data for this section is all located under $BI/ngs_course/tophat_cufflinks/ So cd into this directory now.

Commands used are in different *.commands files located in $BI/ngs_course/tophat_cufflinks/run_commands

Some output, like bam files can be gotten from the URL http://loving.corral.tacc.utexas.edu/bioiteam/tophat_cufflinks/ and directly loaded into IGV, using Load from URL.

If you generate your own output and would like to view them on IGV, you will need to scp the files from lonestar to your computer.

...

On your computer's side:

Go to the directory where you want to copy files to.

Code Block
scp my_user_name@lonestar.tacc.utexas.edu:/home/.../stuff.fastq ./

Replace the "/home/..." with the "pwd" information obtained earlier.

This command would transfer "stuff.fastq" from the specified directory on Lonestar to your current directory on your computer.

This file can be used as input for downstream applications like Cuffmerge, which will assemble parsimonious consensus fragments from the BAM file coordinates.

What cufflinks and cuffmerge do

Cuffmerge Assembly

For each separate dataset representing a specific replicate and condition, cufflinks assembles a map of genomic areas enriched in aligned reads. cuffmerge then takes the set of individual assemblies and merges them into a consensus assembly for all the provided datasets. The consensus may include known splice variant annotations if you have provided those to the program.

What cuffdiff and cummeRbund do

Next, cuffdiff uses the consensus splice variant annotations (and/or the known splice variants) to quantify expression levels of genes and isoforms, using FPKM (fragments per kilobase per million reads) metrics.

Finally, cummeRbund creates pretty differential expression plots of the FPKM data using R.

Notes on FASTQ preparation

Although we won't cover these issues here, there are some issues you should consider before embarking on the Tuxedo pipeline:

Should my FASTQ sequences be trimmed to remove low-quality 3' bases?

Expand

	Suggestion
	Suggestion

Possibly, if FastQC or other base quality reports show the data is really poor. But generally the fact that Tophat splits long reads into smaller fragments mitigates the need to do this.

Should I remove adapter sequences before running Tophat?
Expand
Suggestion
Suggestion
This is usually a good idea because un-template adapter bases have a more drastic effect on reducing mappability than do low-quality 3' bases.

Should I attempt to remove sequences that map to undesired RNAs before running Tophat? (rRNA for example)

Expand

	Suggestion
	Suggestion

This is also usually a good idea, because such rRNA sequences can be a substantial proportion of your data (depending on library prep method), and this can skew cuffdiff's fragment counting statistics.

How would, for example, rRNA sequence removal be done?

Expand

	Suggestion
	Suggestion

Maybe something like this:

Align your sequences to a reference "genome" consisting only of rRNA gene sequences.
Extract only the sequences that do not align to the rRNA reference into a new FASTQ file and use that as Tophat input.

What other pre-processing steps might I consider?

Expand

	Suggestion
	Suggestion

There are many, and it will depend on your data and what you want to get out of it.

If you have paired-end data, tophat asks you to provide the mean fragment (insert) size and the standard deviation for insert sizes in your library. One common pre-processing step to achieve this would be to do a quick paired-end alignment of, for example, about 1 million sequences to a reference genome. Then you could calculate the mean and standard deviation of insert sizes for properly paired reads from the resulting BAM file records, and pass these values to Tophat.

Some Logistics...

Six raw data files were provided as the starting point:

c1_r1, c1_r2, c1_r3 from the first biological condition
c2_r1, c2_r2, and c2_r3 from the second biological condition

Due to the size of the data and length of run time, most of the programs have already been run for this exercise. The commands run are in different *.commands files. We will spend some time looking through these commands to understand them. You will then be parsing the output, finding answers, and visualizing results.

Data for this section is all located under $BI/ngs_course/tophat_cufflinks/ So cd into this directory now.

Commands used are in different *.commands files located in $BI/ngs_course/tophat_cufflinks/run_commands

Some output, like bam files can be gotten from the URL http://loving.corral.tacc.utexas.edu/bioiteam/tophat_cufflinks/ and directly loaded into IGV, using Load from URL.

If you generate your own output and would like to view them on IGV, you will need to scp the files from lonestar to your computer.

Expand

	Reminder on how to scp files from lonestar
	Reminder on how to scp files from lonestar

On your computer's side:

Go to the directory where you want to copy files to.

Code Block
scp my_user_name@lonestar.tacc.utexas.edu:/home/.../stuff.fastq ./

Replace the "/home/..." with the "pwd" information obtained earlier.

This command would transfer "stuff.fastq" from the specified directory on Lonestar to your current directory on your computer.

For help, remember to type the commands and hit enter to see what help they can offer you. You will also almost certainly need to consult the documentation for tophat, cufflinks and cummeRbund:

Exercise Workflow

Here are the steps for the full workflow. Steps 2 and 3 can be omitted if you don't need to explore novel splice junctions. However here we will explore the full set of steps:

map reads against the Drosophila genome (tophat)
assemble the transcripts (cufflinks)
merge the assemblies (cuffmerge)
compute differential expression (cuffdiff)
inspect the results
examine the differential expression results (using Linux, IGV and cummeRbund)
(Extra) compare assembled transcripts to annotated transcripts

For help, remember to type the commands and hit enter to see what help they can offer you. You will also almost certainly need to consult the documentation for tophat, cufflinks and cummeRbund:

Exercise Workflow

Here are the steps for the full workflow. Steps 2 and 3 can be omitted if you don't need to explore novel splice junctions. However here we will explore the full set of steps:

map reads against the Drosophila genome (tophat)
assemble the transcripts (cufflinks)
merge the assemblies (cuffmerge)
compute differential expression (cuffdiff)
inspect the results
examine the differential expression results (using Linux, IGV and cummeRbund)
(Extra) compare assembled transcripts to annotated transcripts to identify potentially novel ones (cuffcmp)

...

On lonestar, to run tophat, cufflinks etc, following modules need to be loaded.

Code Block
module load boost/1.45.0 module load bowtie module load tophat module load cufflinks/2.0.2

Code Block

title	General syntax for tophat command


tophat [options] <bowtie_index_prefix> <reads1> <reads2>

...

Code Block

title	Take a minute to look at the output files produced by one tophat run.


cd $BI/ngs_course/tophat_cufflinks/C1_R1_thout
ls -l

-rwxr-xr-x 1 daras G-803889 323M Aug 16 11:47 accepted_hits.bam
-r-xr-xr-x 1 daras G-803889 237K Aug 16 11:46 accepted_hits.bam.bai

-rwxr-xr-x 1 daras G-803889   52 Aug 16 11:46 deletions.bed
-rwxr-xr-x 1 daras G-803889   54 Aug 16 11:46 insertions.bed
-rwxr-xr-x 1 daras G-803889 2.9M Aug 16 11:46 junctions.bed
-rwxr-xr-x 1 daras G-803889   70 Aug 16 11:46 left_kept_reads.info
drwxr-xr-x 2 daras G-803889  32K Aug 16 11:46 logs
-rwxr-xr-x 1 daras G-803889   70 Aug 16 11:46 right_kept_reads.info
-rwxr-xr-x 1 daras G-803889 9.7K Aug 16 11:46 unmapped_left.fq.z
-rwxr-xr-x 1 daras G-803889 9.9K Aug 16 11:46 unmapped_right.fq.z

Exercise 1a: Providing a transcript annotation file

If I wanted Which tophat option is used to provide a trancript transcript annotation file (GTF file) to use in tophat, what option would I add to the command?

Expand

	Hints
	Hints

Remember that you can type the command and hit enter to get help info about it.
Or Google tophat to find its online manual

Expand

	Solution
	Solution

Add The -G <GTF filename> to use the annotated splice junctions in the supplied GTF file

Exercise 1b: Using only annotated junctions
How would I tell tophat to only use a specified set of transcript annotation and not assemble any novel transcripts?

Expand

	Solution
	Solution

Add Specify -G <gtf filename> to have tophat use the annotated transcripts in the supplied GTF file
Also add the --no-novel-juncs option to suppress de novo junction detection

...

The GTF file for our Drosophila genome (dm3) is in $BI/ngs_course/tophat_cufflinks/reference/genes.gtf. What does it look like?

Expand

	hint
	hint

Any one of these:

Code Block


less $BI/ngs_course/tophat_cufflinks/reference/genes.gtf  # :q to exit
cat $BI/ngs_course/tophat_cufflinks/reference/genes.gtf | head
cat $BI/ngs_course/tophat_cufflinks/reference/genes.gtf | cut -f 1-8 | more
cat $BI/ngs_course/tophat_cufflinks/reference/genes.gtf | cut -f 9 | more

...

Expand

	How to
	How to

Code Block
cd $BI/ngs_course/tophat_cufflinks/C1_R1_thout ls -la samtools view -x accepted_hits.bam \| less # :q to quit

...

Expand

	Hint
	Hint

The 6th BAM file field is the CIGAR string which tells you how your query sequence mapped to the reference.

Expand

	Answer
	Answer

Code Block

	Looking at the CIGAR string
	Looking at the CIGAR string


samtools view -x accepted_hits.bam | cut -f 1-9 | more

26      0x61    2L      8008    255     75M     =       8205    272
16      0x93    2L      8021    255     75M     =       7902    -194
9       0x93    2L      8051    255     66M76N9M        =       7954    -248
3       0x63    2L      8059    255     58M76N17M       =       8220    236
10      0x93    2L      8093    255     24M76N51M       =       7972    -272
20      0x93    2L      8102    255     15M76N60M       =       7984    -269

The CIGAR string "58M76N17M" representst a spliced sequence. The codes mean:

56M - the first 58 bases match the 56M - the first 58 bases match the reference
76N - there are then 76 bases on the reference with no corresponding bases in the sequence (an intron)
17M - the last 17 bases match the reference

...

Expand

	How to
	How to

Code Block
cd $BI/ngs_course/tophat_cufflinks/C1_R1_thout samtools view accepted_hits.bam \| cut -f 6 \| grep 'N' \| wc -l

...

Code Block

title	General syntax for cufflinks command


cufflinks [options] <hits.bam>

...

Code Block

title	Cufflinks output files


cd $BI/ngs_course/tophat_cufflinks/C1_R1_clout
ls -l

ls -l


drwxrwxr-x 2 nsabell G-801021    32768 May 22 15:10 cuffcmp
-rwxr-xr-x 1 daras G-803889  14M Aug 16 12:49 transcripts.gtf

-rwxr-xr-x 1 daras G-803889 597K Aug 16 12:49 genes.fpkm_tracking
-rwxr-xr-x 1 daras G-803889 960K Aug 16 12:49 isoforms.fpkm_tracking
-rwxr-xr-x 1 daras G-803889    0 Aug 16 12:33 skipped.gtf

...

Create a file listing the paths of all per-sample transcripts.gtf files so far, then pass that to cuffmerge:

Code Block
cd $BI/ngs_course/tophat_cufflinks find . -name transcripts.gtf > assembly_list.txt cuffmerge <assembly_list.txt>

Expand

	assembly_list.txt contents
	assembly_list.txt contents

Code Block
cat $BI/ngs_course/tophat_cufflinks/assembly_list.txt

...

Code Block

title	cuffmerge output


cd $BI/ngs_course/tophat_cufflinks/merged_asm
ls -l

-rwxrwxr-x  1 daras G-803889  1571816 Aug 16  2012 genes.fpkm_tracking
-rwxrwxr-x  1 daras G-803889  2281319 Aug 16  2012 isoforms.fpkm_tracking
drwxrwxr-x  2 daras G-803889    32768 Aug 16  2012 logs
-r-xrwxr-x  1 daras G-803889 32090408 Aug 16  2012 merged.gtf
-rwxrwxr-x  1 daras G-803889        0 Aug 16  2012 skipped.gtf
drwxrwxr-x  2 daras G-803889    32768 Aug 16  2012 tmp
-rwxrwxr-x  1 daras G-803889 34844830 Aug 16  2012 transcripts.gtf

Step 4: Finding differentially expressed genes and isoforms using cuffdiff

Code Block
cuffdiff [options] <merged.gtf> <sample1_rep1.bam,sample1_rep2.bam> <sample2_rep1.bam,sample2_rep2.bam>

Exercise 2: What does cuffdiff -b do?

Expand

	Solution
	Solution

-b is for enabling fragment bias correction.

...

Step 5: Inspect the mapped results

Before you even start, do a sanity check on your data by examining the bam files from the mapping output.

IWe've included the directory "bwa_genome" containing the output from bwa of the C1_R1_1 and C1_R1_2 files mapped directly to the genome using bwa. Tophat output is in C1_R1_thout for C1_R1 sample.

Report the total number of mapped reads for each sample using "samtools flagstat" for the bwa output (bam file) and the tophat output (bam file). There is a big difference between the bam file from tophat and the bam file from bwa. Can you spot it?

Expand

	Reminder on how to run samtools flagstat
	Reminder on how to run samtools flagstat

Code Block
samtools flagstat C1_R1.bam

You might want to load the bwa bam file on IGV alongside the tophat bam file for the same sample to see the differences between mapping to the transcriptome and mapping to the genome.

...

If you would like, you can also load another bam file output from the tophat run (remember, tophat calls bowtie for mapping). These are large file, so it may slow your computer down. Since we've already loaded C1_R1, let's load one from the C2 condition, say, C2_R1, to see genuine expression level changes.

Keep this open to come back to, when we further explore differential expression results.

Expand

	Reminder on how to download and run IGV
	Reminder on how to download and run IGV

Warning
Do this on your local machine, not on TACC

Code Block

title	Downloading and running IGV


wget http://www.broadinstitute.org/igv/projects/downloads/IGV_2.1.14.zip
unzip IGV_2.1.14.zip
cd IGV_2.1.14
java -Xmx2g -jar igv.jar

Examine the differential expression analysis results

The cuffdiff output is in a directory called diff_out. We are going to spend some time parsing through this output. So, copy it over to your scratch directory and move to your SCRATCH directory.

Code Block
cds cp -r $BI/ngs_course/tophat_cufflinks/diff_out

...

 .
ls diff_out

Note that cuffdiff has performed a statistical test on the expression values between our two biological groups. It reports the FPKM expression levels for each group, the log2(group 1 FPKM/ group 2 FPKM), and a p-value measure of statistical confidence, among many other helpful data items.

Take a minute to look at the output files produced by cuffdiff.

Code Block

title	cuffdiff output

cds
cd diff_out
ls -l

-rwxr-x--- 1 daras G-801020  2691192 Aug 21 12:20 isoform_exp.diff  : Differential expression testing for transcripts
-rwxr-x--- 1 daras G-801020  1483520 Aug 21 12:20 gene_exp.diff     : Differential expression testing for genes

Code Block
cp -r diff_out $SCRATCH cds ls diff_out

Note that cuffdiff has performed a statistical test on the expression values between our two biological groups. It reports the FPKM expression levels for each group, the log2(group 1 FPKM/ group 2 FPKM), and a p-value measure of statistical confidence, among many other helpful data items.

Code Block

title	Take a minute to look at the output files produced by cuffdiff.

-rwxr-x--- 1 daras G-801020  26911921729831 Aug 21 12:20 isoformtss_group_exp.diff  : Differential expression testing for primary transcripts
-rwxr-x--- 1 daras G-801020  14835201369451 Aug 21 12:20 genecds_exp.diff      : Differential expression testing for genescoding sequences

-rwxr-x--- 1 daras G-801020  17298313277177 Aug 21 12:20 tss_group_exp.diff: Differential expression testing for primary transcriptsisoforms.fpkm_tracking
-rwxr-x--- 1 daras G-801020  1628659 Aug 21 12:20 genes.fpkm_tracking
-rwxr-x--- 1 daras G-801020  13694511885773 Aug 21 12:20 cdstss_exp.diff      : Differential expression testing for coding sequences

groups.fpkm_tracking
-rwxr-x--- 1 daras G-801020  32771771477492 Aug 21 12:20 isoformscds.fpkm_tracking

-rwxr-x--- 1 daras G-801020  16286591349574 Aug 21 12:20 genes.fpkm_trackingsplicing.diff  : Differential splicing tests
-rwxr-x--- 1 daras G-801020  18857731158560 Aug 21 12:20 tss_groups.fpkm_trackingpromoters.diff : Differential promoter usage
-rwxr-x--- 1 daras G-801020  1477492 919690 Aug 21 12:20 cds.fpkm_tracking

-rwxr-x--- 1 daras G-801020  1349574 Aug 21 12:20 splicing.diff : Differential splicing tests
-rwxr-x--- 1 daras G-801020  1158560 Aug 21 12:20 promoters.diff : Differential promoter usage
-rwxr-x--- 1 daras G-801020   919690 Aug 21 12:20 cds.diff      : Differential coding output.

Here is a basic command useful for parsing/sorting the gene_exp.diff or isoform_exp.diff files:

Code Block

title	Linux one-liner for sorting cuffdiff output by log2 fold-change values


cat isoform_exp.diff | awk '{print $10 "\t" $4}' | sort -n -r | head

diff       : Differential coding output.

Here is a basic command useful for parsing/sorting the gene_exp.diff or isoform_exp.diff files:

Code Block

title	Linux one-liner for sorting cuffdiff output by log2 fold-change values

cat isoform_exp.diff | awk '{print $10 "\t" $4}' | sort -n -r | head

Exercise 1: Find the 10 most up-regulated genes, by fold change that are classified as significantly changed. Look at one example of a up-regulated gene, regucalcin, on IGV.

Expand

	Solution
	Solution

Top 10 upregulated genes
scf
CG8979
CG4389
crc
KdelR
Vha68-2
CG3835
Df31
by
Dhpr

Expand

	Hint
	Hint

Code Block

title	One-line command to get 10 most up regulated genes

cat gene_exp.diff |grep 'yes'|sort -k10nr,10|head

Code Block

title	One-line command to get 10 most down regulated genes

cat gene_exp.diff |grep 'yes'|sort -k10n,10|head

Exercise 2Exercise 3: Find the 10 most up-regulated genesisoforms, by fold change that are classified as significantly changed. Look at one example of a up-regulated gene, regucalcin, on IGV.What genes do they belong to?

Expand

	Solution
	Solution

simj
CG2177
sPLA2
Nipsnap
Pde8Top 10 upregulated genes
scf
CG8979
CG4389
crc
KdelR
Vha68-2
CG3835
Df31
by
CG15814
Dhpr
eIF-4E
spir

Expand

	Hint
	Hint

Code Block

title	One-line command to get 10 most up-regulated genesisoforms


cat geneisoform_exp.diff |grep 'yes'|sort -k10nr,10|head

Code Block

title	One-line command to get 10 most down regulated genes


cat gene_exp.diff |grep 'yes'|sort -k10n,10|head

Exercise 4: Find the 10 most up-regulated isoforms, by fold change that are classified as significantly changed. What genes do they belong to?

...

simj
CG2177
sPLA2
Nipsnap
Pde8
by
CG15814
Dhpr
eIF-4E
spir

...

Code Block

title	One-line command to get 10 most up-regulated isoforms


cat isoform_exp.diff |grep 'yes'|sort -k10nr,10|head

,10|head

Step 6: Using cummeRbund to inspect differential expression data.

A) First load R and enter R environment

Code Block
module load R R

B) Within R environment, set up cummeRbund

Code Block
source("http://bioconductor.org/biocLite.R") biocLite("cummeRbund")

C) Load cummeRbund library and read in the differential expression results. If you save and exit the R environment and return, these commands must be executed again.

Code Block
library(cummeRbund) cuff_data <- readCufflinks('diff_out')

D) Use cummeRbund to visualize the differential expression results.

NOTE: Any graphic outputs will be automatically saved as "Rplots.pdf" which can create problems when you want to create multiple plots with different names in the same process. To save different plots with different names, preface any of the commands below with the command:

Code Block
pdf(file="myPlotName.pdf")

And after executing the necessary commands, add the line:

Code Block
dev.off()

Thus, to use the csScatter command and save the results in a file called scatterplot.pdf, one would execute the following commands:

Code Block
pdf(file="scatterplot.pdf") csScatter(genes(cuff_data), 'C1', 'C2') dev.off()

Step 6: Using cummeRbund to inspect differential expression data.

A) First load R and enter R environment

Code Block
module load R R

B) Within R environment, set up cummeRbund

Code Block
source("http://bioconductor.org/biocLite.R") biocLite("cummeRbund")

C) Load cummeRbund library and read in the differential expression results.

Code Block
library(cummeRbund) cuff_data <- readCufflinks('diff_out')

...

Code Block

title	To pull out significantly differentially expressed genes and isoforms


gene_diff_data  <- diffData(genes(cuff_data))
sig_gene_data  <- subset(gene_diff_data, (significant ==  'yes'))
nrow(sig_gene_data)
isoform_diff_data <-diffData(isoforms(cuff_data))
sig_isoform_data <- subset(isoform_diff_data, (significant == 'yes'))
nrow(sig_isoform_data)

Code Block

title	To draw a scatterplot


csScatter(genes(cuff_data), 'C1', 'C2')

Code Block

title	To plot gene level and isoform level expression for gene regucalcin


mygene1 <- getGene(cuff_data,'regucalcin')
expressionBarplot(mygene1)
expressionBarplot(isoforms(mygene1))

Code Block

title	To plot gene level and isoform level expression for gene Rala


mygene2 <- getGene(cuff_data, 'Rala')
expressionBarplot(mygene2)
expressionBarplot(isoforms(mygene2))

...

Expand

	Solution
	Solution

Code Block

title	R command to generate box plot of gene level fpkms


csBoxplot(genes(cuff_data))

Code Block

title	R command to generate box plot of isoform level fpkms


csBoxplot(isoforms(cuff_data))

...

Expand

	Solution
	Solution

Code Block

title	One possible solution


gene_diff_data  <- diffData(genes(cuff_data))
sig_gene_data  <- subset(gene_diff_data, (ln_fold_change > 1.5))
head(sig_gene_data)
sig_geneids <- c(sig_gene_data$gene_id)

myGenes <- getGenes(cuff_data, sig_geneids)
expressionBarplot(myGenes)

...

Space shortcuts

Page tree

Versions Compared

Old Version 134

New Version Current

Key

Differential expression with splice variant analysis at the same time: the Tuxedo pipeline

Objectives

Introduction

Overall Workflow Diagram

What tophat does

FASTQ preparation

Some Logistics...

What cufflinks and cuffmerge do

What cuffdiff and cummeRbund do

Notes on FASTQ preparation

Some Logistics...

Exercise Workflow

Exercise Workflow

Step 4: Finding differentially expressed genes and isoforms using cuffdiff

Step 5: Inspect the mapped results

Examine the differential expression analysis results

Step 6: Using cummeRbund to inspect differential expression data.

Step 6: Using cummeRbund to inspect differential expression data.