...

Answer
Code Block
Looking at the CIGAR string
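# In samtools 0.1.x, -x prints the FLAG column (column 2) in hexadecimal;
# newer samtools releases use -x for a different option
samtools view -x accepted_hits.bam | cut -f 1-9 | more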

26      0x61    2L      8008    255     75M     =       8205    272
16      0x93    2L      8021    255     75M     =       7902    -194
9       0x93    2L      8051    255     66M76N9M        =       7954    -248
3       0x63    2L      8059    255     58M76N17M       =       8220    236
10      0x93    2L      8093    255     24M76N51M       =       7972    -272
20      0x93    2L      8102    255     15M76N60M       =       7984    -269

The CIGAR string "58M76N17M" represents a spliced sequence. The codes mean:

  • 58M - the first 58 bases match the reference
  • 76N - there are then 76 bases on the reference with no corresponding bases in the sequence (an intron)
  • 17M - the last 17 bases match the reference
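Decoding a CIGAR string by hand gets tedious, so here is a minimal awk sketch that splits one into its length/operation pairs and totals the bases it covers. The function name cigar_breakdown is hypothetical (it is not a samtools command), and it assumes the standard SAM operation letters M, I, D, N, S, H, P, = and X.

```shell
# Hypothetical helper (not part of samtools): split a CIGAR string into
# its length/operation pairs and total the read and reference bases.
cigar_breakdown() {
  echo "$1" | awk '{
    s = $0; read_len = 0; ref_len = 0
    while (match(s, /^[0-9]+[MIDNSHP=X]/)) {
      len = substr(s, 1, RLENGTH - 1) + 0   # the number before the operation letter
      op = substr(s, RLENGTH, 1)            # the operation letter itself
      printf "%d %s\n", len, op
      if (op ~ /[MIS=X]/) read_len += len   # operations that consume read bases
      if (op ~ /[MDN=X]/) ref_len += len    # operations that consume reference bases
      s = substr(s, RLENGTH + 1)            # move on to the rest of the string
    }
    printf "read bases: %d, reference span: %d\n", read_len, ref_len
  }'
}

cigar_breakdown 58M76N17M
# 58 M
# 76 N
# 17 M
# read bases: 75, reference span: 151
```

For 58M76N17M the 75 read bases agree with the 75-base reads seen in the 75M alignments, while the alignment spans 151 bases of the reference because of the 76-base intron.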
...

Exercise 4: Count spliced sequences
How many spliced sequences are there in the C1_R1 alignment file?

Hint

Use samtools view, cut and grep.

How to
Code Block

cd $BI/ngs_course/tophat_cufflinks/C1_R1_thout
# Column 6 holds the CIGAR string; an N operation marks a spliced alignment
samtools view accepted_hits.bam | cut -f 6 | grep 'N' | wc -l
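The pipeline above counts alignment records rather than reads. Both mates of a pair appear as separate records sharing one read name, and TopHat may report a multi-mapped read at more than one location, so the two numbers can differ. A minimal awk sketch that reports both; count_spliced is a hypothetical helper name, and it assumes header-free SAM text on stdin, which is what samtools view prints by default.

```shell
# Hypothetical helper: reads SAM records on stdin and reports how many
# alignment records have an N in their CIGAR string (column 6), and how
# many distinct read names (column 1) those records belong to.
# Mates of a pair share a read name, so a spliced pair counts once.
count_spliced() {
  awk '$6 ~ /N/ {
         records++
         if (!($1 in seen)) { seen[$1] = 1; names++ }
       }
       END { printf "records: %d, distinct reads: %d\n", records + 0, names + 0 }'
}

# Usage, from $BI/ngs_course/tophat_cufflinks/C1_R1_thout:
#   samtools view accepted_hits.bam | count_spliced
```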
Code Block
General syntax for cufflinks command

cufflinks [options] <hits.bam>

Look at $BI/ngs_course/tophat_cufflinks/run_commands/cufflinks.commands to see how it was run.

How do I look into the file?

cat $BI/ngs_course/tophat_cufflinks/run_commands/cufflinks.commands

Take a minute to look at the output files produced by one cufflinks run.
The most important file is transcripts.gtf, which contains the transcripts Cufflinks assembled from the C1_R1 TopHat alignments.

Code Block
Cufflinks output files

cd $BI/ngs_course/tophat_cufflinks/C1_R1_clout
ls -l

-rwxr-xr-x 1 daras G-803889  14M Aug 16 12:49 transcripts.gtf

-rwxr-xr-x 1 daras G-803889 597K Aug 16 12:49 genes.fpkm_tracking
-rwxr-xr-x 1 daras G-803889 960K Aug 16 12:49 isoforms.fpkm_tracking
-rwxr-xr-x 1 daras G-803889    0 Aug 16 12:33 skipped.gtf
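To see what transcripts.gtf actually holds, one option is to tally the feature types in column 3, the standard GTF feature column; for a Cufflinks assembly you would typically expect transcript and exon rows. A minimal sketch, with gtf_feature_counts as a hypothetical helper name:

```shell
# Hypothetical helper: tally the feature types (column 3) in a GTF file,
# skipping any comment lines that start with #. Reads the named files,
# or stdin if none are given.
gtf_feature_counts() {
  awk -F '\t' '!/^#/ && NF >= 3 { count[$3]++ }
               END { for (f in count) printf "%s\t%d\n", f, count[f] }' "$@" | sort
}

# Usage, from $BI/ngs_course/tophat_cufflinks/C1_R1_clout:
#   gtf_feature_counts transcripts.gtf
```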

Step 3: Merging assemblies using cuffmerge

Create a file listing the paths of all the per-sample transcripts.gtf files, then pass that file to cuffmerge:

Code Block

cd $BI/ngs_course/tophat_cufflinks

...


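# List the path of every per-sample cufflinks assembly
find . -name transcripts.gtf > assembly_list.txt
# cuffmerge takes the list file as its argument and writes to ./merged_asm by default
cuffmerge assembly_list.txt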
assembly_list.txt contents
Code Block

cat $BI/ngs_course/tophat_cufflinks/assembly_list.txt
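Because cuffmerge reads every path listed in assembly_list.txt, a quick check that each listed file exists and is non-empty can save a failed run. A minimal sketch under that assumption; check_assembly_list is a hypothetical helper, not part of Cufflinks, and it expects one path per line, as find prints them.

```shell
# Hypothetical helper: check that every path listed in an assembly list
# file (one GTF path per line) exists and is non-empty. Relative paths
# (find . prints paths starting with ./) are resolved from the current
# directory. Returns non-zero if any entry fails.
check_assembly_list() {
  local list=$1 status=0 gtf
  while IFS= read -r gtf || [ -n "$gtf" ]; do
    [ -z "$gtf" ] && continue              # ignore blank lines
    if [ ! -s "$gtf" ]; then               # -s: file exists and has size > 0
      echo "missing or empty: $gtf" >&2
      status=1
    fi
  done < "$list"
  return "$status"
}

# Usage, from $BI/ngs_course/tophat_cufflinks:
#   check_assembly_list assembly_list.txt && cuffmerge assembly_list.txt
```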

Take a minute to look at the output files produced by cuffmerge.
The most important file is merged.gtf, which contains the consensus transcriptome annotations cuffmerge has calculated.

Code Block
cuffmerge output files

cd $BI/ngs_course/tophat_cufflinks/merged_asm
ls -l

-rwxrwxr-x  1 daras G-803889  1571816 Aug 16  2012 genes.fpkm_tracking
-rwxrwxr-x  1 daras G-803889  2281319 Aug 16  2012 isoforms.fpkm_tracking
drwxrwxr-x  2 daras G-803889    32768 Aug 16  2012 logs
-rwxrwxr-x  1 daras G-803889 32090408 Aug 16  2012 merged.gtf
-rwxrwxr-x  1 daras G-803889        0 Aug 16  2012 skipped.gtf
drwxrwxr-x  2 daras G-803889    32768 Aug 16  2012 tmp
-rwxrwxr-x  1 daras G-803889 34844830 Aug 16  2012 transcripts.gtf
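One quick way to size up merged.gtf is to count the distinct gene_id and transcript_id values in its attribute column (GTF column 9) and compare those numbers with the per-sample transcripts.gtf files. A minimal sketch; gtf_id_counts is a hypothetical helper name, and it assumes attributes are separated by semicolons as in standard GTF.

```shell
# Hypothetical helper: count distinct gene_id and transcript_id values in
# the attribute column (column 9) of a GTF file read from the named files
# or from stdin. Attributes such as ref_gene_id are not counted.
gtf_id_counts() {
  awk -F '\t' '!/^#/ && NF >= 9 {
      n = split($9, parts, ";")
      for (i = 1; i <= n; i++) {
        a = parts[i]
        sub(/^ +/, "", a)                    # trim the space after each separator
        if (a ~ /^gene_id /) genes[a] = 1
        else if (a ~ /^transcript_id /) txs[a] = 1
      }
    }
    END {
      ng = 0; nt = 0
      for (g in genes) ng++
      for (t in txs) nt++
      printf "genes: %d, transcripts: %d\n", ng, nt
    }' "$@"
}

# Usage, from $BI/ngs_course/tophat_cufflinks/merged_asm:
#   gtf_id_counts merged.gtf
```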

Step 4: Finding differentially expressed genes and isoforms using cuffdiff

...