Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Differential

...

expression

...

with

...

splice

...

variant

...

analysis

...

at

...

the

...

same

...

time:

...

the

...

Tophat/Cufflinks

...

workflow

...

In

...

this

...

lab,

...

you

...

will

...

explore

...

a

...

fairly

...

typical

...

RNA-seq

...

analysis

...

workflow.

...

Simulated

...

RNA-seq

...

data

...

will

...

be

...

provided

...

to

...

you;

...

the

...

data

...

contains

...

75

...

bp

...

paired-end

...

reads

...

that

...

have

...

been

...

generated

...

in

...

silico

...

to

...

replicate

...

real

...

gene

...

count

...

data

...

from

...

Drosophila.

...

The

...

data

...

simulates

...

two

...

biological

...

groups

...

with

...

three

...

biological

...

replicates

...

per

...

group

...

(6

...

samples

...

total).

...

This

...

simulated

...

data

...

has

...

already

...

been

...

run

...

through

...

a

...

basic

...

RNA-seq

...

analysis

...

workflow.

...

We

...

will

...

look

...

at:

...

i.

...

How

...

the

...

workflow

...

was

...

run

...

and

...

what

...

steps

...

are

...

involved.

...


ii.

...

What

...

genes

...

and

...

isoforms

...

are

...

significantly

...

differentially

...

expressed

...

Six

...

raw

...

data

...

files

...

were

...

provided

...

as

...

the

...

starting

...

point:

...


c1_r1,

...

c1_r2,

...

c1_r3

...

from

...

the

...

first

...

biological

...

condition,

...

and

...

c2_r1,

...

c2_r2,

...

and

...

c2_r3

...

from

...

the

...

second.

...

Due

...

to

...

the

...

size

...

of

...

the

...

data

...

and

...

length

...

of

...

run

...

time,

...

*most

...

of

...

the

...

programs

...

have

...

already

...

been

...

run

...

for

...

this

...

exercise*.

...

The

...

commands

...

run

...

are

...

in

...

different

...

*.commands

...

files.

...

We

...

will

...

spend

...

some

...

time

...

looking

...

through

...

these

...

commands

...

to

...

understand

...

them.

...

You

...

will

...

then

...

be

...

parsing

...

the

...

output,

...

finding

...

answers,

...

and

...

visualizing

...

results.

...

Introduction

An overview of tophat cufflinks workflow

Some Logistics...

...

This

...

section

...

is

...

all

...

located

...

under

...

$BI/tophat_cufflinks/

...

So

...

cd

...

into

...

this

...

directory

...

now.

...

Commands

...

used

...

are

...

in

...

different

...

*.commands

...

files

...

located

...

in

...

$BI/tophat_cufflinks/run_commands

...

Some

...

output,

...

like

...

bam

...

files

...

can

...

be

...

gotten

...

from

...

the

...

URL

...

http://loving.corral.tacc.utexas.edu/bioiteam/tophat_cufflinks/

...

Image Added and

...

directly

...

loaded

...

into

...

IGV,

...

using

...

Load

...

from

...

URL.

...

If

...

you

...

generate

...

your

...

own

...

output

...

and

...

would

...

like

...

to

...

view

...

them

...

on

...

IGV,

...

you

...

will

...

need

...

to

...

scp

...

the

...

files

...

from

...

lonestar

...

to

...

your

...

computer.

Expand
Reminder on how to scp files from lonestar
Reminder on how to scp files from lonestar

On your computer's side:

Go to the directory where you want to copy files to.

{expand:Reminder on how to scp files from lonestar} On your computer's side: Go to the directory where you want to copy files to. {code}
Code Block

 scp my_user_name@lonestar.tacc.utexas.edu:/home/.../stuff.fastq ./
{code} Replace the

Replace the "/home/..."

with

the

"pwd"

information

obtained

earlier.

This

command

would

transfer

"stuff.fastq"

from

the

specified

directory

on

Lonestar

to

your

current

directory

on

your

computer.

{expand} For

For help,

...

remember

...

to

...

type

...

the

...

commands

...

and

...

hit

...

enter

...

to

...

see

...

what

...

help

...

they

...

can

...

offer

...

you.

...

You

...

will

...

also

...

almost

...

certainly

...

need

...

to

...

consult

...

the

...

documentation

...

for

...

tophat,

...

cufflinks

...

and

...

cummeRbund:

...


http://tophat.cbcb.umd.edu/manual.html

...


http://cufflinks.cbcb.umd.edu/manual.html

...


http://compbio.mit.edu/cummeRbund/manual.html

...

The workflow

a)

...

map

...

reads

...

against

...

the

...

Drosophila

...

genome

...

(using

...

tophat)

...

(this

...

has

...

already

...

been

...

done

...

to

...

save

...

time)

...


b)

...

assemble

...

the

...

putative

...

transcripts

...

(using

...

cufflinks)

...

(this

...

has

...

already

...

been

...

done

...

to

...

save

...

time)

...


c)

...

merge

...

the

...

assemblies

...

(using

...

cuffmerge)

...

(this

...

has

...

already

...

been

...

done

...

to

...

save

...

time)

...


d)

...

compute

...

differential

...

expression

...

(using

...

cuffdiff)

...

(this

...

has

...

already

...

been

...

done

...

to

...

save

...

time)

...


e)

...

inspect

...

the

...

results

...


f)

...

examine

...

the

...

differential

...

expression

...

results

...

(using

...

unix,

...

IGV

...

and

...

cummeRbund)

...


Extra:

...

g)

...

compare

...

assembled

...

transcripts

...

to

...

annotated

...

transcripts

...

to

...

identify

...

potentially

...

novel

...

ones

...

(using

...

cuffcmp)

...

a)

...

and

...

b)

...

Running

...

tophat

...

and

...

cufflinks:

...

Look

...

at

...

run_commands/tophat.commands

...

and

...

run_commands/cufflinks.commands

...

to

...

see

...

how

...

tophat

...

and

...

cufflinks

...

were

...

run.

Expand
How do I look into the files
How do I look into the files

cat

* {expand:How do I look into the files} cat

$BI/tophat_cufflinks/run_commands/tophat.commands


cat

$BI/tophat_cufflinks/run_commands/cufflinks.commands

{expand} On

On lonestar,

...

to

...

run

...

tophat,

...

cufflinks

...

etc,

...

following

...

modules

...

need

...

to

...

be

...

loaded.

{
Code Block
}
module load boost/1.45.0
module load bowtie
module load tophat
module load cufflinks/2.0.2
{code}

{code:
Code Block
titleGeneral syntax for tophat command
 title= General syntax for tophat command}
tophat [options] <bowtie_index_prefix> <reads1> <reads2>
{code}

{code: title=Take a minute to look at the output files produced by one tophat run.}
Code Block
titleTake a minute to look at the output files produced by one tophat run.

The most important files are in bold.

*-rwxr-xr-x 1 daras G-803889 323M Aug 16 11:47 accepted_hits.bam*
*-r-xr-xr-x 1 daras G-803889 237K Aug 16 11:46 accepted_hits.bam.bai*
-rwxr-xr-x 1 daras G-803889   52 Aug 16 11:46 deletions.bed
-rwxr-xr-x 1 daras G-803889   54 Aug 16 11:46 insertions.bed
-rwxr-xr-x 1 daras G-803889 2.9M Aug 16 11:46 junctions.bed
-rwxr-xr-x 1 daras G-803889   70 Aug 16 11:46 left_kept_reads.info
drwxr-xr-x 2 daras G-803889  32K Aug 16 11:46 logs
-rwxr-xr-x 1 daras G-803889   70 Aug 16 11:46 right_kept_reads.info
-rwxr-xr-x 1 daras G-803889 9.7K Aug 16 11:46 unmapped_left.fq.z
-rwxr-xr-x 1 daras G-803889 9.9K Aug 16 11:46 unmapped_right.fq.z
Code Block
titleGeneral syntax for cufflinks command
{code}

{code: title=General syntax for cufflinks command}
cufflinks [options] <hits.bam>
{code}

{code title=Take a minute to look at the output files produced by one
Code Block
titleTake a minute to look at the output files produced by one cufflinks run.
 cufflinks run.}

The most important files are in bold.

-rwxr-xr-x 1 daras G-803889 597K Aug 16 12:49 genes.fpkm_tracking
-rwxr-xr-x 1 daras G-803889 960K Aug 16 12:49 isoforms.fpkm_tracking
-rwxr-xr-x 1 daras G-803889    0 Aug 16 12:33 skipped.gtf
*-rwxr-xr-x 1 daras G-803889  14M Aug 16 12:49 transcripts.gtf*
{code}


Exercise 

Exercise 1:

...

If

...

I

...

wanted

...

to

...

provide

...

a

...

trancript

...

annotation

...

file

...

(gtf

...

file)

...

in

...

cufflinks

...

command,

...

what

...

would

...

I

...

add

...

to

...

the

...

command?

...

Remember

...

that

...

you

...

can

...

type

...

the

...

command

...

and

...

hit

...

enter

...

to

...

get

...

help

...

info

...

about

...

it.

...


i.

...

If

...

I

...

wanted

...

cufflinks

...

to

...

not

...

assemble

...

any

...

novel

...

transcripts

...

outside

...

of

...

what

...

is

...

in

...

the

...

gtf

...

file

...


ii.

...

If

...

I

...

wanted

...

cufflinks

...

to

...

assemble

...

novel

...

as

...

well

...

as

...

annotated

...

transcripts?

Expand
Solution
Solution
{expand:Solution} \

-G

to

use

only

the

annotated

transcripts

in

the

gtf

file

\


-g

to

use

the

annotated

transcripts

in

the

gtf

file

as

a

guide,

but

also

assemble

novel

transcripts {expand} \\ *

transcripts


c)

...

Merging

...

assemblies

...

using

...

cuffmerge

* {
Code Block
}
find . -name transcripts.gtf > assembly_list.txt
cuffmerge <assembly_list.txt>
{code}
\\
*


d)

...

Finding

...

differentially

...

expressed

...

genes

...

and

...

isoforms

...

using

...

cuffdiff

* {
Code Block
}
cuffdiff [options] <merged.gtf> <sample1_rep1.bam,sample1_rep2.bam> <sample2_rep1.bam,sample2_rep2.bam>
{code}

Exercise

...

2:

...

What

...

does

...

cuffdiff

...

-b

...

do?

Expand
Solution
Solution
{expand:Solution} \

-b

is

for

enabling

fragment

bias

correction.

{expand} h2.

e)

...

Inspect

...

the

...

mapped

...

results

...

Before

...

you

...

even

...

start,

...

do

...

a

...

sanity

...

check

...

on

...

your

...

data

...

by

...

examining

...

the

...

bam

...

files

...

from

...

the

...

mapping

...

output.

...

I've

...

included

...

the

...

directory

...

"bwa_genome"

...

containing

...

the

...

output

...

from

...

bwa

...

of

...

the

...

C1_R1_1

...

and

...

C1_R1_2

...

files

...

mapped

...

directly

...

to

...

the

...

genome

...

using

...

bwa.

...

Tophat

...

output

...

is

...

in

...

C1_R1_thout

...

for

...

C1_R1

...

sample.

...

Report

...

the

...

total

...

number

...

of

...

input

...

reads

...

and

...

the

...

total

...

number

...

of

...

mapped

...

reads

...

for

...

each

...

sample

...

using

...

"samtools

...

flagstat"

...

for

...

the

...

bwa

...

output

...

and

...

the

...

tophat

...

output.

...

There

...

is

...

a

...

big

...

difference

...

between

...

the

...

bam

...

file

...

from

...

tophat

...

and

...

the

...

bam

...

file

...

from

...

bwa.

...

Can

...

you

...

spot

...

it?

...

You

...

might

...

want

...

to

...

load

...

the

...

bwa

...

bam

...

file

...

on

...

IGV

...

alongside

...

the

...

tophat

...

bam

...

file

...

for

...

the

...

same

...

sample

...

to

...

see

...

the

...

differences

...

between

...

mapping

...

to

...

the

...

transcriptome

...

and

...

mapping

...

to

...

the

...

genome.

...

Load

...

tophat

...

bam

...

file

...

from

...

URL:

...

/corral-repl/utexas/BioITeam/web/tophat_cufflinks/

...


Load

...

bwa

...

bam

...

file

...

from

...

URL:

...

/corral-repl/utexas/BioITeam/web/tophat_cufflinks/bwa_genome/

...

Now,

...

load

...

TWO

...

of

...

the

...

bam

...

file

...

outputs

...

from

...

the

...

tophat

...

run

...

(remember,

...

tophat

...

calls

...

bowtie

...

for

...

mapping).

...

These

...

are

...

large

...

file,

...

so

...

it

...

may

...

slow

...

your

...

computer

...

down.

...

I'd

...

suggest

...

loading

...

one

...

from

...

C1

...

and

...

one

...

from

...

C2

...

so

...

you

...

see

...

genuine

...

expression

...

level

...

changes.

...

Keep

...

this

...

open

...

to

...

come

...

back

...

to,

...

when

...

we

...

further

...

explore

...

differential

...

expression

...

results.

Expand
Reminder on how to download and run IGV
Reminder on how to download and run IGV
{expand:Reminder on how to download and run IGV} {code:title=Downloading and running IGV}
Code Block
titleDownloading and running IGV

wget http://www.broadinstitute.org/igv/projects/downloads/IGV_2.1.14.zip
unzip IGV_2.1.14.zip
cd IGV_2.1.14
java -Xmx2g -jar igv.jar
{code} {expand} h2.

f)

...

Examine

...

the

...

differential

...

expression

...

analysis

...

results

...

Find

...

the

...

cuffdiff

...

output

...

(either

...

by

...

understanding

...

cuffdiff.commands

...

or

...

by

...

looking)

...

and

...

by

...

looking

...

at

...

it

...

and/or

...

reading

...

the

...

documentation

...

find

...

the

...

isoforms

...

and

...

genes

...

that

...

are

...

differentially

...

expressed.

...

Note

...

that

...

cuffdiff

...

has

...

performed

...

a

...

statistical

...

test

...

on

...

the

...

expression

...

values

...

between

...

our

...

two

...

biological

...

groups.

...

It

...

reports

...

the

...

FPKM

...

expression

...

levels

...

for

...

each

...

group,

...

the

...

log2(group

...

1

...

FPKM/

...

group

...

2

...

FPKM),

...

and

...

a

...

p-value

...

measure

...

of

...

statistical

...

confidence,

...

among

...

many

...

other

...

helpful

...

data

...

items.

...

Exercise

...

3:

...

Find

...

the

...

10

...

most

...

up-regulated

...

genes,

...

by

...

fold

...

change.

...

Look

...

at

...

one

...

example

...

of

...

a

...

up-regulated

...

gene,

...

regucalcin,

...

on

...

IGV.

...

Find

...

the

...

10

...

most

...

down-regulated

...

genes,

...

by

...

fold

...

change.

...

Look

...

at

...

one

...

example

...

of

...

a

...

down-regulated

...

gene,

...

on

...

IGV.

Expand
Solution
Solution

Top 10 upregulated genes
scf
CG8979
CG4389
crc
KdelR
Vha68-2
CG3835
Df31
by
Dhpr

Top 10 downregulated genes
Git
CG1657
Myo31DF
RpII215
RpL9
Pde8
Dsp1
Bsg
eIF-4E
HDAC6,dah

Expand
Hint
Hint
{expand:Solution} *Top 10 upregulated genes* scf CG8979 CG4389 crc KdelR Vha68-2 CG3835 Df31 by Dhpr *Top 10 downregulated genes* Git CG1657 Myo31DF RpII215 RpL9 Pde8 Dsp1 Bsg eIF-4E HDAC6,dah {expand} {expand:Hint} {code:title=One-line command to get 10 most up regulated genes}
Code Block
titleOne-line command to get 10 most up regulated genes

cat gene_exp.diff |grep 'yes'|sort -k10nr,10|head
{code} {code:title=
Code Block
titleOne-line
command
to
get
10
most
down
regulated
genes
}

cat gene_exp.diff |grep 'yes'|sort -k10n,10|head
{code} {expand} Here is a basic command useful for

Here is a basic command useful for parsing/sorting

...

the

...

gene_exp.diff

...

or

...

isoform_exp.diff

...

files:

{:=
Code Block
title
Linux
one-liner
for
sorting
cuffdiff
output
by
log2
fold-change
values
}
cat isoform_exp.diff | awk '{print $10 "\t" $4}' | sort -n -r | head
{code}

Exercise

...

4:

...

Find

...

the

...

10

...

most

...

up-regulated

...

isoforms,

...

by

...

fold

...

change.

...

What

...

genes

...

do

...

they

...

belong

...

to?

Expand
Solution
Solution

simj
CG2177
sPLA2
Nipsnap
Pde8
by
CG15814
Dhpr
eIF-4E
spir

Expand
Hint
Hint
{expand:Solution} simj CG2177 sPLA2 Nipsnap Pde8 by CG15814 Dhpr eIF-4E spir {expand} {expand:Hint} {code: title=One-line command to get 10 most up-regulated isoforms}
Code Block
titleOne-line command to get 10 most up-regulated isoforms

cat isoform_exp.diff |grep 'yes'|sort -k10nr,10|head

Using cummeRbund to inspect differential expression data.

Step 1: Load R and enter R environment

Code Block
{code}
{expand}

h4. Using cummeRbund to inspect differential expression data.

Step 1: Load R and enter R environment

{code}
module load R
R
{code}

Step R
R

Step 2:

...

Within

...

R

...

environment,

...

set

...

up

...

cummeRbund

{
Code Block
}
source("http://bioconductor.org/biocLite.R")
biocLite("cummeRbund")
{code}

Step

...

3:

...

Load

...

cummeRbund

...

library

...

and

...

read

...

in

...

the

...

differential

...

expression

...

results.

{
Code Block
}
library(cummeRbund)
cuff_data <- readCufflinks('diff_out')
{code}

Step

...

4:

...

Use

...

cummeRbund

...

to

...

visualize

...

the

...

differential

...

expression

...

results

Code Block
titleTo draw a scatterplot


{code: title= To draw a scatterplot}
csScatter(genes(cuff_data), 'C1', 'C2')
{code}
{code: title= To
Code Block
titleTo pull out significantly differentially expressed genes and isoforms
 pull out significantly differentially expressed genes and isoforms}
gene_diff_data  <- diffData(genes(cuff_data))
sig_gene_data  <- subset(gene_diff_data, (significant ==  'yes'))
nrow(sig_gene_data)
isoform_diff_data <-diffData(isoforms(cuff_data), 'C1', 'C2')
sig_isoform_data <- subset(isoform_diff_data, (significant == 'yes'))
nrow(sig_isoform_data)
{code}

{code: title= To plot gene level and isoform level expression for gene regucalcin}
Code Block
titleTo plot gene level and isoform level expression for gene regucalcin
mygene1 <- getGene(cuff_data,'regucalcin')
expressionBarplot(mygene1)
expressionBarplot(isoforms(mygene1))
{
Code Block
titleTo plot gene level and isoform level expression for gene Rala
code}

{code: title= To plot gene level and isoform level expression for gene Rala}
mygene2 <- getGene(cuff_data, 'Rala')
expressionBarplot(mygene2)
expressionBarplot(isoforms(mygene2))
{code}



h4. Take cummeRbund for a 

Take cummeRbund for a spin...

...

CummeRbund

...

is

...

powerful

...

package

...

with

...

many

...

different

...

functions.

...

Above

...

was

...

an

...

illustration

...

of

...

a

...

few

...

of

...

them.

...

Try

...

any

...

of

...

the

...

suggested

...

exercises

...

below

...

to

...

further

...

explore

...

the

...

differential

...

expression

...

results

...

with

...

different

...

cummeRbund

...

functions.

...

If

...

you

...

would

...

rather

...

just

...

look

...

at

...

the

...

resulting

...

graphs

...

,they

...

are

...

at

...

the

...

URL:

...

http://loving.corral.tacc.utexas.edu/bioiteam/tophat_cufflinks/

...

Image Added as

...

exercise5_Rplots.pdf,

...

exercise6_Rplots.pdf,

...

and

...

exercise7_Rplots.pdf.

...

You

...

can

...

refer

...

to

...

the

...

cummeRbund

...

manual

...

for

...

more

...

help

...

and

...

remember

...

that

...

?functionName

...

will

...

provide

...

syntax

...

information

...

for

...

different

...

functions.

...


http://compbio.mit.edu/cummeRbund/manual.html

...

You

...

may

...

need

...

to

...

redo

...

Step

...

3.

...

everytime

...

you

...

reopen

...

an

...

R

...

session.

...

Exercise

...

5:

...

Visualize

...

the

...

distribution

...

of

...

fpkm

...

values

...

across

...

the

...

two

...

different

...

conditions

...

using

...

a

...

boxplot.

Expand
Solution
Solution
{expand:Solution} {code: title= R command to generate box plot of gene level fpkms}
Code Block
titleR command to generate box plot of gene level fpkms

csBoxplot(genes(cuff_data))
{code} {code: title= R command to generate box plot of isoform level fpkms}
Code Block
titleR command to generate box plot of isoform level fpkms

csBoxplot(isoforms(cuff_data))

{expand} {expand:Hint} Use csBoxplot function on

Expand
Hint
Hint

Use csBoxplot function on cuff_data

object

to

generate

a

boxplot

of

gene

or

isoform

level

fpkms.

{expand} Exercise

Exercise 6:

...

Visualize

...

the

...

significant

...

vs

...

non-significant

...

genes

...

using

...

a

...

volcano

...

plot.

Expand
Solution
Solution
{expand:Solution}

csVolcano(genes(cuff_data),

"C1",

"C2")

{expand} {expand:Hint} Use csVolcano function on

Expand
Hint
Hint

Use csVolcano function on cuff_data

object

to

generate

a

volcano

plot.

{expand} Exercise

Exercise 7:

...

Pull

...

out

...

a

...

subset

...

of

...

the

...

genes

...

using

...

a

...

p_value

...

cutoff

...

of

...

0.01.

...

Generate

...

expression

...

bar

...

plots

...

for

...

just

...

those

...

genes.

Expand
Solution
Solution
Code Block
titleOne possible solution
{expand:Solution} {code: title=One possible solution}

Wiki Markup

gene_diff_data  <- diffData(genes(cuff_data))
sig_gene_data  <- subset(gene_diff_data, (ln_fold_change > 1.5))
head(sig_gene_data)
sig_geneids <- c(sig_gene_data\[1\]$gene_id)

myGenes

<-

getGenes(cuff_data,

sig_geneids)


expressionBarplot(myGenes)

{expand} *If you have trouble sourcing cummeRbund, try this:* module swap gcc intel Reenter R and redo above steps.

If you have trouble sourcing cummeRbund, try this:
module swap gcc intel
Reenter R and redo above steps.