...
Expand |
---|
|
Code Block |
---|
| Looking at the CIGAR string |
---|
|
samtools view -x accepted_hits.bam | cut -f 1-9 | more
26 0x61 2L 8008 255 75M = 8205 272
16 0x93 2L 8021 255 75M = 7902 -194
9 0x93 2L 8051 255 66M76N9M = 7954 -248
3 0x63 2L 8059 255 58M76N17M = 8220 236
10 0x93 2L 8093 255 24M76N51M = 7972 -272
20 0x93 2L 8102 255 15M76N60M = 7984 -269
|
The CIGAR string "58M76N17M" represents a spliced sequence. The codes mean:
- 58M - the first 58 bases match the reference
- 76N - there are then 76 bases on the reference with no corresponding bases in the sequence (an intron)
- 17M - the last 17 bases match the reference
|
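To make the CIGAR arithmetic concrete, here is a minimal sketch that adds up how many reference bases an alignment spans. The helper name cigar_ref_span is made up for this example; it assumes the standard SAM convention that the M, D, N, = and X operations consume reference bases while I, S, H and P do not.

```shell
# Sketch: reference span of a CIGAR string (hypothetical helper, not part of samtools).
# Operations M, D, N, = and X consume reference bases; I, S, H and P do not.
cigar_ref_span() {
  printf '%s\n' "$1" \
    | grep -oE '[0-9]+[MDN=X]' \
    | tr -d 'MDN=X' \
    | awk '{ total += $1 } END { print total + 0 }'
}

cigar_ref_span 58M76N17M   # 58 + 76 + 17
cigar_ref_span 75M
```

For "58M76N17M" this prints 151: the 75-base read covers 151 bases of the reference because of the 76-base intron.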
...
Exercise 4: Count spliced sequences
How many spliced sequences are there in the C1_R1 alignment file?
Expand |
---|
|
samtools view and cut and grep |
Expand |
---|
|
Code Block |
---|
cd $BI/ngs_course/tophat_cufflinks/C1_R1_thout
samtools view accepted_hits.bam | cut -f 6 | grep 'N' | wc -l
|
|
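As a variation on the same idea, the sketch below does the column selection and the counting in a single awk pass and also reports the total, so you can see what fraction of alignments are spliced. The function name count_spliced is an assumption for this example. Note that it counts alignment records, so a read that maps to several places is counted once per alignment.

```shell
# Sketch: count spliced alignments among SAM records read from stdin.
# Column 6 is the CIGAR string; header lines (starting with "@") are skipped.
count_spliced() {
  awk -F'\t' '!/^@/ { total++; if ($6 ~ /N/) spliced++ }
              END { printf "%d of %d alignments are spliced\n", spliced, total }'
}

# Usage with the course data (requires samtools and the BAM file):
#   cd $BI/ngs_course/tophat_cufflinks/C1_R1_thout
#   samtools view accepted_hits.bam | count_spliced
```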
Step 2: Run cufflinks
Code Block |
---|
title | General syntax for cufflinks command |
---|
|
cufflinks [options] <hits.bam>
|
Look at $BI/ngs_course/tophat_cufflinks/run_commands/cufflinks.commands to see how it was run.
Expand |
---|
| How do I look into the file? |
---|
|
cat $BI/ngs_course/tophat_cufflinks/run_commands/cufflinks.commands
|
Take a minute to look at the output files produced by one cufflinks run.
The important file is transcripts.gtf, which contains the transcripts Cufflinks assembled for C1_R1.
Code Block |
---|
title | Cufflinks output files |
---|
|
cd $BI/ngs_course/tophat_cufflinks/C1_R1_clout
ls -l
-rwxr-xr-x 1 daras G-803889 14M Aug 16 12:49 transcripts.gtf
-rwxr-xr-x 1 daras G-803889 597K Aug 16 12:49 genes.fpkm_tracking
-rwxr-xr-x 1 daras G-803889 960K Aug 16 12:49 isoforms.fpkm_tracking
-rwxr-xr-x 1 daras G-803889 0 Aug 16 12:33 skipped.gtf
|
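One quick way to get a feel for transcripts.gtf is to tally the feature types in its third column. The sketch below assumes a standard tab-delimited, 9-column GTF; the function name gtf_feature_counts is made up for this example. In Cufflinks output you would expect to see "transcript" and "exon" records.

```shell
# Sketch: tally the feature types (column 3) of a GTF file read from stdin.
# Comment lines and lines with fewer than 9 tab-separated fields are ignored.
gtf_feature_counts() {
  awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ }
              END { for (f in n) print f "\t" n[f] }' | sort
}

# Usage with the course data:
#   gtf_feature_counts < $BI/ngs_course/tophat_cufflinks/C1_R1_clout/transcripts.gtf
```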
Step 3: Merging assemblies using cuffmerge
Create a file listing the paths of all per-sample transcripts.gtf files so far, then pass that to cuffmerge:
Code Block |
---|
cd $BI/ngs_course/tophat_cufflinks
...
find . -name transcripts.gtf > assembly_list.txt
cuffmerge <assembly_list.txt>
|
Expand |
---|
| assembly_list.txt contents |
---|
|
|
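Before handing the list to cuffmerge, it can be worth checking that every path in it points at a real, non-empty file. This is an optional sanity check rather than part of the course commands, and the function name check_assembly_list is an assumption. Because find . writes paths relative to the current directory, run the check from the same directory where the list was created.

```shell
# Sketch: verify that every GTF path listed in a file exists and is non-empty.
# $1: list file with one path per line, as written by find.
check_assembly_list() {
  missing=0
  while IFS= read -r gtf; do
    [ -z "$gtf" ] && continue          # skip blank lines
    if [ ! -s "$gtf" ]; then           # -s: file exists and has size > 0
      echo "missing or empty: $gtf" >&2
      missing=$((missing + 1))
    fi
  done < "$1"
  [ "$missing" -eq 0 ]                 # success only if nothing was flagged
}

# Usage:
#   check_assembly_list assembly_list.txt && echo "assembly list looks OK"
```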
Take a minute to look at the output files produced by cuffmerge.
The most important file is merged.gtf, which contains the consensus transcriptome annotations cuffmerge has calculated.
Code Block |
---|
title | cuffmerge output files |
---|
|
cd $BI/ngs_course/tophat_cufflinks/merged_asm
ls -l
-rwxrwxr-x 1 daras G-803889 1571816 Aug 16 2012 genes.fpkm_tracking
-rwxrwxr-x 1 daras G-803889 2281319 Aug 16 2012 isoforms.fpkm_tracking
drwxrwxr-x 2 daras G-803889 32768 Aug 16 2012 logs
-rwxrwxr-x 1 daras G-803889 32090408 Aug 16 2012 merged.gtf
-rwxrwxr-x 1 daras G-803889 0 Aug 16 2012 skipped.gtf
drwxrwxr-x 2 daras G-803889 32768 Aug 16 2012 tmp
-rwxrwxr-x 1 daras G-803889 34844830 Aug 16 2012 transcripts.gtf
|
Step 4: Finding differentially expressed genes and isoforms using cuffdiff
...