...
Differential
...
expression
...
with
...
splice
...
variant
...
analysis
...
at
...
the
...
same
...
time:
...
the
...
Tophat/Cufflinks
...
workflow
...
In
...
this
...
lab,
...
you
...
will
...
explore
...
a
...
fairly
...
typical
...
RNA-seq
...
analysis
...
workflow.
...
Simulated
...
RNA-seq
...
data
...
will
...
be
...
provided
...
to
...
you;
...
the
...
data
...
contains
...
75
...
bp
...
paired-end
...
reads
...
that
...
have
...
been
...
generated
...
in
...
silico
...
to
...
replicate
...
real
...
gene
...
count
...
data
...
from
...
Drosophila.
...
The
...
data
...
simulates
...
two
...
biological
...
groups
...
with
...
three
...
biological
...
replicates
...
per
...
group
...
(6
...
samples
...
total).
...
This
...
simulated
...
data
...
has
...
already
...
been
...
run
...
through
...
a
...
basic
...
RNA-seq
...
analysis
...
workflow.
...
We
...
will
...
look
...
at:
...
i.
...
How
...
the
...
workflow
...
was
...
run
...
and
...
what
...
steps
...
are
...
involved.
...
ii.
...
What
...
genes
...
and
...
isoforms
...
are
...
significantly
...
differentially
...
expressed
...
Six
...
raw
...
data
...
files
...
were
...
provided
...
as
...
the
...
starting
...
point:
...
c1_r1,
...
c1_r2,
...
c1_r3
...
from
...
the
...
first
...
biological
...
condition,
...
and
...
c2_r1,
...
c2_r2,
...
and
...
c2_r3
...
from
...
the
...
second.
...
Due
...
to
...
the
...
size
...
of
...
the
...
data
...
and
...
length
...
of
...
run
...
time,
...
*most
...
of
...
the
...
programs
...
have
...
already
...
been
...
run
...
for
...
this
...
exercise*.
...
The
...
commands
...
run
...
are
...
in
...
different
...
*.commands
...
files.
...
We
...
will
...
spend
...
some
...
time
...
looking
...
through
...
these
...
commands
...
to
...
understand
...
them.
...
You
...
will
...
then
...
be
...
parsing
...
the
...
output,
...
finding
...
answers,
...
and
...
visualizing
...
results.
...
Introduction
An overview of tophat cufflinks workflow
Some Logistics...
...
This
...
section
...
is
...
all
...
located
...
under
...
$BI/tophat_cufflinks/
...
So
...
cd
...
into
...
this
...
directory
...
now.
...
Commands
...
used
...
are
...
in
...
different
...
*.commands
...
files
...
located
...
in
...
$BI/tophat_cufflinks/run_commands
...
Some
...
output,
...
like
...
bam
...
files
...
can
...
be
...
gotten
...
from
...
the
...
URL
...
http://loving.corral.tacc.utexas.edu/bioiteam/tophat_cufflinks/
...
...
directly
...
loaded
...
into
...
IGV,
...
using
...
Load
...
from
...
URL.
...
If
...
you
...
generate
...
your
...
own
...
output
...
and
...
would
...
like
...
to
...
view
...
them
...
on
...
IGV,
...
you
...
will
...
need
...
to
...
scp
...
the
...
files
...
from
...
lonestar
...
to
...
your
...
computer.
Expand | ||||
---|---|---|---|---|
| ||||
On your computer's side: Go to the directory where you want to copy files to. {expand:Reminder on how to scp files from lonestar} On your computer's side: Go to the directory where you want to copy files to. {code}
Replace the "/home/..." with the "pwd" information obtained earlier. This command would transfer "stuff.fastq" from the specified directory on Lonestar to your current directory on your computer. {expand} For |
For help,
...
remember
...
to
...
type
...
the
...
commands
...
and
...
hit
...
enter
...
to
...
see
...
what
...
help
...
they
...
can
...
offer
...
you.
...
You
...
will
...
also
...
almost
...
certainly
...
need
...
to
...
consult
...
the
...
documentation
...
for
...
tophat,
...
cufflinks
...
and
...
cummeRbund:
...
http://tophat.cbcb.umd.edu/manual.html
...
http://cufflinks.cbcb.umd.edu/manual.html
...
http://compbio.mit.edu/cummeRbund/manual.html
...
The workflow
a)
...
map
...
reads
...
against
...
the
...
Drosophila
...
genome
...
(using
...
tophat)
...
(this
...
has
...
already
...
been
...
done
...
to
...
save
...
time)
...
b)
...
assemble
...
the
...
putative
...
transcripts
...
(using
...
cufflinks)
...
(this
...
has
...
already
...
been
...
done
...
to
...
save
...
time)
...
c)
...
merge
...
the
...
assemblies
...
(using
...
cuffmerge)
...
(this
...
has
...
already
...
been
...
done
...
to
...
save
...
time)
...
d)
...
compute
...
differential
...
expression
...
(using
...
cuffdiff)
...
(this
...
has
...
already
...
been
...
done
...
to
...
save
...
time)
...
e)
...
inspect
...
the
...
results
...
f)
...
examine
...
the
...
differential
...
expression
...
results
...
(using
...
unix,
...
IGV
...
and
...
cummeRbund)
...
Extra:
...
g)
...
compare
...
assembled
...
transcripts
...
to
...
annotated
...
transcripts
...
to
...
identify
...
potentially
...
novel
...
ones
...
(using
...
cuffcmp)
...
a)
...
and
...
b)
...
Running
...
tophat
...
and
...
cufflinks:
...
Look
...
at
...
run_commands/tophat.commands
...
and
...
run_commands/cufflinks.commands
...
to
...
see
...
how
...
tophat
...
and
...
cufflinks
...
were
...
run.
Expand | ||||
---|---|---|---|---|
| ||||
cat * {expand:How do I look into the files} cat$BI/tophat_cufflinks/run_commands/tophat.commands
$BI/tophat_cufflinks/run_commands/cufflinks.commands {expand} On |
On lonestar,
...
to
...
run
...
tophat,
...
cufflinks
...
etc,
...
following
...
modules
...
need
...
to
...
be
...
loaded.
Code Block |
---|
} module load boost/1.45.0 module load bowtie module load tophat module load cufflinks/2.0.2 {code} {code: |
Code Block | ||
---|---|---|
| ||
title= General syntax for tophat command} tophat [options] <bowtie_index_prefix> <reads1> <reads2> {code} {code: title=Take a minute to look at the output files produced by one tophat run.} |
Code Block | ||
---|---|---|
| ||
The most important files are in bold. *-rwxr-xr-x 1 daras G-803889 323M Aug 16 11:47 accepted_hits.bam* *-r-xr-xr-x 1 daras G-803889 237K Aug 16 11:46 accepted_hits.bam.bai* -rwxr-xr-x 1 daras G-803889 52 Aug 16 11:46 deletions.bed -rwxr-xr-x 1 daras G-803889 54 Aug 16 11:46 insertions.bed -rwxr-xr-x 1 daras G-803889 2.9M Aug 16 11:46 junctions.bed -rwxr-xr-x 1 daras G-803889 70 Aug 16 11:46 left_kept_reads.info drwxr-xr-x 2 daras G-803889 32K Aug 16 11:46 logs -rwxr-xr-x 1 daras G-803889 70 Aug 16 11:46 right_kept_reads.info -rwxr-xr-x 1 daras G-803889 9.7K Aug 16 11:46 unmapped_left.fq.z -rwxr-xr-x 1 daras G-803889 9.9K Aug 16 11:46 unmapped_right.fq.z |
Code Block | ||
---|---|---|
| ||
{code} {code: title=General syntax for cufflinks command} cufflinks [options] <hits.bam> {code} {code title=Take a minute to look at the output files produced by one |
Code Block | ||
---|---|---|
| ||
cufflinks run.} The most important files are in bold. -rwxr-xr-x 1 daras G-803889 597K Aug 16 12:49 genes.fpkm_tracking -rwxr-xr-x 1 daras G-803889 960K Aug 16 12:49 isoforms.fpkm_tracking -rwxr-xr-x 1 daras G-803889 0 Aug 16 12:33 skipped.gtf *-rwxr-xr-x 1 daras G-803889 14M Aug 16 12:49 transcripts.gtf* {code} Exercise |
Exercise 1:
...
If
...
I
...
wanted
...
to
...
provide
...
a
...
trancript
...
annotation
...
file
...
(gtf
...
file)
...
in
...
cufflinks
...
command,
...
what
...
would
...
I
...
add
...
to
...
the
...
command?
...
Remember
...
that
...
you
...
can
...
type
...
the
...
command
...
and
...
hit
...
enter
...
to
...
get
...
help
...
info
...
about
...
it.
...
i.
...
If
...
I
...
wanted
...
cufflinks
...
to
...
not
...
assemble
...
any
...
novel
...
transcripts
...
outside
...
of
...
what
...
is
...
in
...
the
...
gtf
...
file
...
ii.
...
If
...
I
...
wanted
...
cufflinks
...
to
...
assemble
...
novel
...
as
...
well
...
as
...
annotated
...
transcripts?
Expand | ||||
---|---|---|---|---|
| ||||
{expand:Solution}
\ -G to use only the annotated transcripts in the gtf file \
to use the annotated transcripts in the gtf file as a guide, but also assemble novel transcripts {expand} \\ *transcripts |
c)
...
Merging
...
assemblies
...
using
...
cuffmerge
Code Block |
---|
} find . -name transcripts.gtf > assembly_list.txt cuffmerge <assembly_list.txt> {code} \\ * |
d)
...
Finding
...
differentially
...
expressed
...
genes
...
and
...
isoforms
...
using
...
cuffdiff
Code Block |
---|
} cuffdiff [options] <merged.gtf> <sample1_rep1.bam,sample1_rep2.bam> <sample2_rep1.bam,sample2_rep2.bam> {code} |
Exercise
...
2:
...
What
...
does
...
cuffdiff
...
-b
...
do?
Expand | ||||
---|---|---|---|---|
| ||||
{expand:Solution}
\ -b is for enabling fragment bias correction. {expand} h2. |
e)
...
Inspect
...
the
...
mapped
...
results
...
Before
...
you
...
even
...
start,
...
do
...
a
...
sanity
...
check
...
on
...
your
...
data
...
by
...
examining
...
the
...
bam
...
files
...
from
...
the
...
mapping
...
output.
...
I've
...
included
...
the
...
directory
...
"bwa_genome"
...
containing
...
the
...
output
...
from
...
bwa
...
of
...
the
...
C1_R1_1
...
and
...
C1_R1_2
...
files
...
mapped
...
directly
...
to
...
the
...
genome
...
using
...
bwa.
...
Tophat
...
output
...
is
...
in
...
C1_R1_thout
...
for
...
C1_R1
...
sample.
...
Report
...
the
...
total
...
number
...
of
...
input
...
reads
...
and
...
the
...
total
...
number
...
of
...
mapped
...
reads
...
for
...
each
...
sample
...
using
...
"samtools
...
flagstat"
...
for
...
the
...
bwa
...
output
...
and
...
the
...
tophat
...
output.
...
There
...
is
...
a
...
big
...
difference
...
between
...
the
...
bam
...
file
...
from
...
tophat
...
and
...
the
...
bam
...
file
...
from
...
bwa.
...
Can
...
you
...
spot
...
it?
...
You
...
might
...
want
...
to
...
load
...
the
...
bwa
...
bam
...
file
...
on
...
IGV
...
alongside
...
the
...
tophat
...
bam
...
file
...
for
...
the
...
same
...
sample
...
to
...
see
...
the
...
differences
...
between
...
mapping
...
to
...
the
...
transcriptome
...
and
...
mapping
...
to
...
the
...
genome.
...
Load
...
tophat
...
bam
...
file
...
from
...
URL:
...
/corral-repl/utexas/BioITeam/web/tophat_cufflinks/
...
Load
...
bwa
...
bam
...
file
...
from
...
URL:
...
/corral-repl/utexas/BioITeam/web/tophat_cufflinks/bwa_genome/
...
Now,
...
load
...
TWO
...
of
...
the
...
bam
...
file
...
outputs
...
from
...
the
...
tophat
...
run
...
(remember,
...
tophat
...
calls
...
bowtie
...
for
...
mapping).
...
These
...
are
...
large
...
file,
...
so
...
it
...
may
...
slow
...
your
...
computer
...
down.
...
I'd
...
suggest
...
loading
...
one
...
from
...
C1
...
and
...
one
...
from
...
C2
...
so
...
you
...
see
...
genuine
...
expression
...
level
...
changes.
...
Keep
...
this
...
open
...
to
...
come
...
back
...
to,
...
when
...
we
...
further
...
explore
...
differential
...
expression
...
results.
Expand | ||||||
---|---|---|---|---|---|---|
| ||||||
{expand:Reminder on how to download and run IGV}
{code:title=Downloading and running IGV}
|
f)
...
Examine
...
the
...
differential
...
expression
...
analysis
...
results
...
Find
...
the
...
cuffdiff
...
output
...
(either
...
by
...
understanding
...
cuffdiff.commands
...
or
...
by
...
looking)
...
and
...
by
...
looking
...
at
...
it
...
and/or
...
reading
...
the
...
documentation
...
find
...
the
...
isoforms
...
and
...
genes
...
that
...
are
...
differentially
...
expressed.
...
Note
...
that
...
cuffdiff
...
has
...
performed
...
a
...
statistical
...
test
...
on
...
the
...
expression
...
values
...
between
...
our
...
two
...
biological
...
groups.
...
It
...
reports
...
the
...
FPKM
...
expression
...
levels
...
for
...
each
...
group,
...
the
...
log2(group
...
1
...
FPKM/
...
group
...
2
...
FPKM),
...
and
...
a
...
p-value
...
measure
...
of
...
statistical
...
confidence,
...
among
...
many
...
other
...
helpful
...
data
...
items.
...
Exercise
...
3:
...
Find
...
the
...
10
...
most
...
up-regulated
...
genes,
...
by
...
fold
...
change.
...
Look
...
at
...
one
...
example
...
of
...
a
...
up-regulated
...
gene,
...
regucalcin,
...
on
...
IGV.
...
Find
...
the
...
10
...
most
...
down-regulated
...
genes,
...
by
...
fold
...
change.
...
Look
...
at
...
one
...
example
...
of
...
a
...
down-regulated
...
gene,
...
on
...
IGV.
Expand | ||||
---|---|---|---|---|
| ||||
Top 10 upregulated genes Top 10 downregulated genes |
Expand | ||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||||||||||||||||||||||
{expand:Solution}
*Top 10 upregulated genes*
scf
CG8979
CG4389
crc
KdelR
Vha68-2
CG3835
Df31
by
Dhpr
*Top 10 downregulated genes*
Git
CG1657
Myo31DF
RpII215
RpL9
Pde8
Dsp1
Bsg
eIF-4E
HDAC6,dah
{expand}
{expand:Hint}
{code:title=One-line command to get 10 most up regulated genes}
|
Here is a basic command useful for parsing/sorting
...
the
...
gene_exp.diff
...
or
...
isoform_exp.diff
...
files:
Code Block | ||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| =
|
|
|
|
|
|
|
|
|
| ||||||||||||
} cat isoform_exp.diff | awk '{print $10 "\t" $4}' | sort -n -r | head {code} |
Exercise
...
4:
...
Find
...
the
...
10
...
most
...
up-regulated
...
isoforms,
...
by
...
fold
...
change.
...
What
...
genes
...
do
...
they
...
belong
...
to?
Expand | ||||
---|---|---|---|---|
| ||||
simj |
Expand | |||||
---|---|---|---|---|---|
| |||||
{expand:Solution}
simj
CG2177
sPLA2
Nipsnap
Pde8
by
CG15814
Dhpr
eIF-4E
spir
{expand}
{expand:Hint}
{code: title=One-line command to get 10 most up-regulated isoforms}
|
Using cummeRbund to inspect differential expression data.
Step 1: Load R and enter R environment
Code Block |
---|
{code} {expand} h4. Using cummeRbund to inspect differential expression data. Step 1: Load R and enter R environment {code} module load R R {code} Step R R |
Step 2:
...
Within
...
R
...
environment,
...
set
...
up
...
cummeRbund
Code Block |
---|
} source("http://bioconductor.org/biocLite.R") biocLite("cummeRbund") {code} |
Step
...
3:
...
Load
...
cummeRbund
...
library
...
and
...
read
...
in
...
the
...
differential
...
expression
...
results.
Code Block |
---|
} library(cummeRbund) cuff_data <- readCufflinks('diff_out') {code} |
Step
...
4:
...
Use
...
cummeRbund
...
to
...
visualize
...
the
...
differential
...
expression
...
results
Code Block | ||
---|---|---|
| ||
{code: title= To draw a scatterplot} csScatter(genes(cuff_data), 'C1', 'C2') {code} {code: title= To |
Code Block | ||
---|---|---|
| ||
pull out significantly differentially expressed genes and isoforms} gene_diff_data <- diffData(genes(cuff_data)) sig_gene_data <- subset(gene_diff_data, (significant == 'yes')) nrow(sig_gene_data) isoform_diff_data <-diffData(isoforms(cuff_data), 'C1', 'C2') sig_isoform_data <- subset(isoform_diff_data, (significant == 'yes')) nrow(sig_isoform_data) {code} {code: title= To plot gene level and isoform level expression for gene regucalcin} |
Code Block | ||
---|---|---|
| ||
mygene1 <- getGene(cuff_data,'regucalcin')
expressionBarplot(mygene1)
expressionBarplot(isoforms(mygene1))
{ |
Code Block | ||
---|---|---|
| ||
code} {code: title= To plot gene level and isoform level expression for gene Rala} mygene2 <- getGene(cuff_data, 'Rala') expressionBarplot(mygene2) expressionBarplot(isoforms(mygene2)) {code} h4. Take cummeRbund for a |
Take cummeRbund for a spin...
...
CummeRbund
...
is
...
powerful
...
package
...
with
...
many
...
different
...
functions.
...
Above
...
was
...
an
...
illustration
...
of
...
a
...
few
...
of
...
them.
...
Try
...
any
...
of
...
the
...
suggested
...
exercises
...
below
...
to
...
further
...
explore
...
the
...
differential
...
expression
...
results
...
with
...
different
...
cummeRbund
...
functions.
...
If
...
you
...
would
...
rather
...
just
...
look
...
at
...
the
...
resulting
...
graphs
...
,they
...
are
...
at
...
the
...
URL:
...
http://loving.corral.tacc.utexas.edu/bioiteam/tophat_cufflinks/
...
...
exercise5_Rplots.pdf,
...
exercise6_Rplots.pdf,
...
and
...
exercise7_Rplots.pdf.
...
You
...
can
...
refer
...
to
...
the
...
cummeRbund
...
manual
...
for
...
more
...
help
...
and
...
remember
...
that
...
?functionName
...
will
...
provide
...
syntax
...
information
...
for
...
different
...
functions.
...
http://compbio.mit.edu/cummeRbund/manual.html
...
You
...
may
...
need
...
to
...
redo
...
Step
...
3.
...
everytime
...
you
...
reopen
...
an
...
R
...
session.
...
Exercise
...
5:
...
Visualize
...
the
...
distribution
...
of
...
fpkm
...
values
...
across
...
the
...
two
...
different
...
conditions
...
using
...
a
...
boxplot.
Expand | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
| |||||||||||
{expand:Solution}
{code: title= R command to generate box plot of gene level fpkms}
csBoxplot(isoforms(cuff_data)) {expand} {expand:Hint} Use csBoxplot function on |
Expand | ||||
---|---|---|---|---|
| ||||
Use csBoxplot function on cuff_data object to generate a boxplot of gene or isoform level fpkms. {expand} Exercise |
Exercise 6:
...
Visualize
...
the
...
significant
...
vs
...
non-significant
...
genes
...
using
...
a
...
volcano
...
plot.
Expand | ||||
---|---|---|---|---|
| ||||
{expand:Solution}
csVolcano(genes(cuff_data), "C1", "C2") {expand} {expand:Hint} Use csVolcano function on |
Expand | ||||
---|---|---|---|---|
| ||||
Use csVolcano function on cuff_data object to generate a volcano plot. {expand} Exercise |
Exercise 7:
...
Pull
...
out
...
a
...
subset
...
of
...
the
...
genes
...
using
...
a
...
p_value
...
cutoff
...
of
...
0.01.
...
Generate
...
expression
...
bar
...
plots
...
for
...
just
...
those
...
genes.
Expand | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
myGenes <- getGenes(cuff_data, sig_geneids)
|
If you have trouble sourcing cummeRbund, try this:
module swap gcc intel
Reenter R and redo above steps.