Objectives

In this lab, you will explore a faster splice-aware mapper called STAR. Simulated RNA-seq data will be provided to you; the data contains 75 bp paired-end reads that have been generated in silico to replicate real gene count data from Drosophila. The data simulates two biological groups with three biological replicates per group (6 samples total). The main objective of this lab is to:


  1. Learn how STAR works and how to use it.

Introduction

STAR ("Spliced Transcripts Alignment to a Reference") is a faster alternative for splice-aware read alignment. STAR can align non-contiguous sequences directly to the genome. The STAR algorithm consists of two major steps: a seed searching step and a clustering/stitching/scoring step. STAR is more memory intensive (30 GB of RAM required for the human genome, compared to ~5 GB required by hisat2), but it is fast.




Get your data

Six raw data files have been provided for all our further RNA-seq analysis:

  • c1_r1, c1_r2, c1_r3 from the first biological condition
  • c2_r1, c2_r2, and c2_r3 from the second biological condition


Get set up for the exercises
cds
cd my_rnaseq_course/day_2
cd star_exercise

Run STAR

See if STAR is a module that is available on stampede2.

module spider star

#You will need to load the intel/17.0.4 module for STAR
module load intel/17.0.4
module load star
STAR

Part 1. Create an index of your reference

NO NEED TO RUN THIS NOW: YOUR INDEX HAS ALREADY BEEN BUILT!

STAR --runMode genomeGenerate --genomeDir STAR_genome/ --genomeFastaFiles genome.fa --sjdbGTFfile genes.formatted.gtf --sjdbOverhang 74 --genomeChrBinNbits 14
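The --sjdbOverhang value of 74 in the command above is the read length minus one (75 - 1 for this lab's reads), which is the value the STAR manual recommends. As a minimal sketch, the same number can be derived from a FASTQ file. The function names fastq_read_length and sjdb_overhang are made up for this illustration, and the sketch assumes all reads in the file have the same length as the first one.

```shell
# Length of the first read in a FASTQ file (line 2 of the file is the sequence).
# Assumption: every read has the same length as the first.
fastq_read_length() {
    awk 'NR == 2 { print length($0); exit }' "$1"
}

# Suggested --sjdbOverhang: read length minus one.
sjdb_overhang() {
    echo $(( $1 - 1 ))
}

# For the 75 bp reads used in this lab:
sjdb_overhang 75
```

With the course data in place, something like sjdb_overhang "$(fastq_read_length ../data/GSM794483_C1_R1_1.fq)" should give the same value.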


Part 2. Align the samples to the reference

Submit to the TACC queue or run in an idev shell

Create a commands file and use launcher_creator.py followed by sbatch.

nano commands.star


Put this in your commands file:

STAR --runThreadN 1 --outFileNamePrefix C1_R1 --outSAMstrandField intronMotif --genomeDir ../reference/STAR_genome --outSAMtype BAM Unsorted --readFilesIn ../data/GSM794483_C1_R1_1.fq ../data/GSM794483_C1_R1_2.fq

STAR --runThreadN 1 --outFileNamePrefix C1_R2 --outSAMstrandField intronMotif --genomeDir ../reference/STAR_genome --outSAMtype BAM Unsorted --readFilesIn ../data/GSM794484_C1_R2_1.fq ../data/GSM794484_C1_R2_2.fq

STAR --runThreadN 1 --outFileNamePrefix C1_R3 --outSAMstrandField intronMotif --genomeDir ../reference/STAR_genome --outSAMtype BAM Unsorted --readFilesIn ../data/GSM794485_C1_R3_1.fq ../data/GSM794485_C1_R3_2.fq

STAR --runThreadN 1 --outFileNamePrefix C2_R1 --outSAMstrandField intronMotif --genomeDir ../reference/STAR_genome --outSAMtype BAM Unsorted --readFilesIn ../data/GSM794486_C2_R1_1.fq ../data/GSM794486_C2_R1_2.fq

STAR --runThreadN 1 --outFileNamePrefix C2_R2 --outSAMstrandField intronMotif --genomeDir ../reference/STAR_genome --outSAMtype BAM Unsorted --readFilesIn ../data/GSM794487_C2_R2_1.fq ../data/GSM794487_C2_R2_2.fq

STAR --runThreadN 1 --outFileNamePrefix C2_R3 --outSAMstrandField intronMotif --genomeDir ../reference/STAR_genome --outSAMtype BAM Unsorted --readFilesIn ../data/GSM794488_C2_R3_1.fq ../data/GSM794488_C2_R3_2.fq
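The six commands above differ only in the sample label and the GSM accession number, so the commands file can also be generated rather than typed. The loop below is one possible sketch, not part of the original exercise; it assumes the accession numbers run consecutively from GSM794483 to GSM794488 in the sample order shown above, and the variable names are arbitrary.

```shell
# Write one STAR command per sample into commands.star.
out=commands.star
: > "$out"            # create or empty the commands file

gsm=794483            # accession number of the first sample (C1_R1)
for cond in 1 2; do
    for rep in 1 2 3; do
        sample="C${cond}_R${rep}"
        fq="../data/GSM${gsm}_${sample}"
        echo "STAR --runThreadN 1 --outFileNamePrefix ${sample} --outSAMstrandField intronMotif --genomeDir ../reference/STAR_genome --outSAMtype BAM Unsorted --readFilesIn ${fq}_1.fq ${fq}_2.fq" >> "$out"
        gsm=$((gsm + 1))   # next sample uses the next accession number
    done
done
```

Check the result with cat commands.star before passing it to launcher_creator.py.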

launcher_creator.py -j commands.star -n starmap -q normal -t 08:00:00 -a UT-2015-05-18 -l star_launcher.slurm -m "module load intel/17.0.4; module load star" -N 6 -w 1

sbatch star_launcher.slurm

The -N and -w options bring up the concept of wayness when running jobs on stampede2! Let's go back to Submitting Jobs to stampede2 to discuss that.
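Once the job has finished, it is worth confirming that every sample produced output. STAR appends fixed file names to the --outFileNamePrefix value, and because the prefixes used here have no trailing separator, the files are named like C1_R1Aligned.out.bam and C1_R1Log.final.out. The helper below is a hypothetical sketch (check_star_outputs is not a STAR or TACC tool) that reports any of these files that are missing or empty in a given directory.

```shell
# Report missing or empty STAR outputs for the six lab samples.
# Returns 0 if everything is present, 1 otherwise.
check_star_outputs() {
    cso_dir="$1"
    cso_status=0
    for cso_sample in C1_R1 C1_R2 C1_R3 C2_R1 C2_R2 C2_R3; do
        for cso_suffix in Aligned.out.bam Log.final.out; do
            # -s is true only if the file exists and is not empty
            if [ ! -s "${cso_dir}/${cso_sample}${cso_suffix}" ]; then
                echo "missing or empty: ${cso_sample}${cso_suffix}"
                cso_status=1
            fi
        done
    done
    return "$cso_status"
}
```

Run it from the star_exercise directory as check_star_outputs . and then look at the mapping summary in each Log.final.out file.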


