Environment setup

Directories and symlinks

Create the directories and links needed in your home directory:

cd
ln -s -f $SCRATCH scratch
ln -s -f $WORK work
ln -s -f /work/projects/BioITeam

mkdir -p ~/local/bin
cd ~/local/bin
ln -s -f /work/projects/BioITeam/common/bin/launcher_maker.py
ln -s -f /work/projects/BioITeam/ls5/opt/cutadapt-1.10/bin/cutadapt
ln -s -f /work/projects/BioITeam/ls5/opt/multiqc-1.0/multiqc
ln -s -f /work/projects/BioITeam/ls5/opt/samstat-1.09/samstat
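
A link whose target path is wrong is created without complaint and only fails later, when you try to run the program. A minimal sketch for checking the links, assuming a POSIX shell; check_links is a hypothetical helper name, not part of the course materials:

```shell
# check_links DIR: list symlinks in DIR whose targets do not resolve.
# Hypothetical helper for illustration; DIR defaults to ~/local/bin.
check_links() {
    cl_dir="${1:-$HOME/local/bin}"
    cl_status=0
    for cl_link in "$cl_dir"/*; do
        # -L: the entry is a symlink; ! -e: its target does not exist
        if [ -L "$cl_link" ] && [ ! -e "$cl_link" ]; then
            echo "broken: $cl_link -> $(readlink "$cl_link")"
            cl_status=1
        fi
    done
    return "$cl_status"
}
```

Run check_links with no arguments after the ln commands. No output and an exit status of 0 mean every link resolves.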

.bashrc setup

If you already have a .bashrc set up, make a backup copy first. You can restore your original login script after class is over.

cp ~/.bashrc ~/.bashrc.beforeNGS
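
Running the backup copy a second time, after the class profile is installed, would overwrite your original backup with the class version. A minimal sketch of a guarded backup, assuming a POSIX shell; backup_once is a hypothetical helper name:

```shell
# backup_once SRC DST: copy SRC to DST unless DST already exists.
# Hypothetical helper; guards against overwriting an earlier backup.
backup_once() {
    if [ ! -e "$1" ]; then
        echo "nothing to back up: $1"
    elif [ -e "$2" ]; then
        echo "keeping existing backup: $2"
    else
        # -p preserves the original mode and timestamps
        cp -p "$1" "$2"
    fi
}
```

For this class the call would be: backup_once ~/.bashrc ~/.bashrc.beforeNGS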

Copy and configure the login profile for this class:

cp /work/projects/BioITeam/projects/courses/Core_NGS_Tools/tacc/bashrc.corengs.ls5 ~/.bashrc
chmod 600 ~/.bashrc

Source it to make it active (if this doesn't work, log off then log back in):

source ~/.bashrc

Environment variables


export ALLOCATION=UT-2015-05-18
export BI=/corral-repl/utexas/BioITeam
export BIWORK=/work/projects/BioITeam
export CORENGS=$BIWORK/projects/courses/Core_NGS_Tools

export PATH=.:$HOME/local/bin:$PATH
# For cutadapt support:
export PYTHONPATH=$BIWORK/ls5/lib/python2.7/site-packages:$PYTHONPATH
# For MultiQC support:
export PYTHONPATH=$BIWORK/ls5/lib/python2.7/annab-packages:$PYTHONPATH
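
After sourcing the profile, it can be worth confirming that these variables are actually set, since later commands silently expand an unset variable to an empty string. A minimal sketch, assuming a POSIX shell; require_vars is a hypothetical helper name:

```shell
# require_vars NAME...: report each named variable that is unset or empty.
# Hypothetical helper; each NAME must be a plain variable name.
require_vars() {
    rv_missing=0
    for rv_name in "$@"; do
        # Indirect lookup of the variable whose name is in $rv_name
        eval "rv_value=\${$rv_name:-}"
        if [ -z "$rv_value" ]; then
            echo "not set: $rv_name"
            rv_missing=1
        fi
    done
    return "$rv_missing"
}
```

For example: require_vars ALLOCATION BI BIWORK CORENGS SCRATCH WORK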

Turn on coloring by file type in the shell:

export LS_OPTIONS='-N --color=auto -T 0'

# For better colors on a dark-background terminal, un-comment this line:
#export LS_COLORS=$LS_COLORS:'di=1;33:'

# For better colors on a white-background terminal, un-comment this line:
#export LS_COLORS=$LS_COLORS:'di=1;34:'

TACC intro

Commands files

Simple commands

mkdir -p $SCRATCH/core_ngs/slurm/simple
cd $SCRATCH/core_ngs/slurm/simple
cp $CORENGS/tacc/simple.cmds .

Wayness commands

mkdir -p $SCRATCH/core_ngs/slurm/wayness
cd $SCRATCH/core_ngs/slurm/wayness
cp $CORENGS/tacc/wayness.cmds .

Start an idev session

To start a 3-hour idev (interactive development) session:

idev -p normal -m 180 -N 1 -n 24 -A UT-2015-05-18 --reservation=CCBB

You can tell you're in an idev session because the hostname command returns a compute node name (e.g. nid00438) instead of a login node name (e.g. login5).

The idev session will terminate when the requested time has expired or when you use the exit command.
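
The hostname check can be scripted. A minimal sketch, assuming a POSIX shell and assuming that compute nodes are always named nid... and login nodes login..., as in the two examples above; node_type is a hypothetical helper name:

```shell
# node_type HOSTNAME: classify a host name by its prefix.
# The nid*/login* patterns are assumptions taken from the example names.
node_type() {
    case "$1" in
        nid*)   echo "compute" ;;
        login*) echo "login" ;;
        *)      echo "unknown" ;;
    esac
}
```

Call it as node_type "$(hostname)" before starting heavy work. Anything other than "compute" suggests you are not in an idev session.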

Working with FASTQ

Yeast data

Working with some yeast ChIP-seq FASTQ data:

# Area for "original" sequencing data
mkdir -p $WORK/archive/original/2018_05.core_ngs
cd $WORK/archive/original/2018_05.core_ngs
wget http://web.corral.tacc.utexas.edu/BioITeam/yeast_stuff/Sample_Yeast_L005_R1.cat.fastq.gz
wget http://web.corral.tacc.utexas.edu/BioITeam/yeast_stuff/Sample_Yeast_L005_R2.cat.fastq.gz

# Create a $SCRATCH area for FASTQ prep and link the yeast data there
mkdir -p $SCRATCH/core_ngs/fastq_prep
cd $SCRATCH/core_ngs/fastq_prep
ln -s -f $WORK/archive/original/2018_05.core_ngs/Sample_Yeast_L005_R1.cat.fastq.gz
ln -s -f $WORK/archive/original/2018_05.core_ngs/Sample_Yeast_L005_R2.cat.fastq.gz

# Copy over a small FASTQ file
cd $SCRATCH/core_ngs/fastq_prep
cp $CORENGS/misc/small.fq .
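
The two yeast files are the R1 and R2 files of one sample, so it is worth confirming that both are in place before going further. A minimal sketch, assuming a POSIX shell and the _R1/_R2 naming seen in the file names above; mate_of and has_mate are hypothetical helper names:

```shell
# mate_of R1_FILE: print the matching R2 file name.
# Assumes _R1 is followed by "." or "_" in the name.
mate_of() {
    printf '%s\n' "$1" | sed 's/_R1\([._]\)/_R2\1/'
}

# has_mate R1_FILE: succeed only if both files of the pair exist.
has_mate() {
    [ -e "$1" ] && [ -e "$(mate_of "$1")" ]
}
```

For example: has_mate Sample_Yeast_L005_R1.cat.fastq.gz && echo "pair complete"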

ATACseq data for MultiQC

Get some FastQC reports for MultiQC:

mkdir -p $SCRATCH/core_ngs/multiqc/fqc.atacseq
cd $SCRATCH/core_ngs/multiqc/fqc.atacseq
cp $CORENGS/multiqc/fqc.atacseq/*.html .

FASTQ files for cutadapt

For command-line cutadapt exploration:

cd $SCRATCH/core_ngs/fastq_prep
cp $CORENGS/human_stuff/Sample_H54_miRNA_L004_R1.cat.fastq.gz .
cp $CORENGS/human_stuff/Sample_H54_miRNA_L005_R1.cat.fastq.gz .
zcat Sample_H54_miRNA_L004_R1.cat.fastq.gz | head -2000 > miRNA_test.fq
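
Assuming the usual four-line FASTQ records (header, sequence, "+", qualities), head -2000 keeps the first 500 reads. A minimal sketch for confirming a read count, assuming a POSIX shell with awk; count_reads is a hypothetical helper name:

```shell
# count_reads: read FASTQ text on standard input and print the number
# of reads, assuming exactly four lines per record.
count_reads() {
    awk 'END { print int(NR / 4) }'
}
```

For the test file: count_reads < miRNA_test.fq. For a compressed file, pipe it in: zcat Sample_H54_miRNA_L004_R1.cat.fastq.gz | count_reads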

For batch cutadapt processing:

mkdir -p $SCRATCH/core_ngs/cutadapt
cd $SCRATCH/core_ngs/cutadapt
cp $CORENGS/human_stuff/Sample_H54_miRNA_L004_R1.cat.fastq.gz .
cp $CORENGS/human_stuff/Sample_H54_miRNA_L005_R1.cat.fastq.gz .
cp $CORENGS/yeast_stuff/Yeast_RNAseq_L002_R1.fastq.gz .
cp $CORENGS/yeast_stuff/Yeast_RNAseq_L002_R2.fastq.gz .
cp $CORENGS/tacc/cuta.cmds .

Alignment workflow

Alignment workflow setup

Starting files:

# FASTA (for building references)
mkdir -p $SCRATCH/core_ngs/references/fasta
cp $CORENGS/references/*.* $SCRATCH/core_ngs/references/fasta/

# FASTQ (to align)
mkdir -p $SCRATCH/core_ngs/alignment/fastq
cp $CORENGS/alignment/*fastq.gz $SCRATCH/core_ngs/alignment/fastq/


Get a copy of all references we build in the exercises (including FASTA):

mkdir -p $SCRATCH/core_ngs/references
rsync -ptlvrP $CORENGS/references/ $SCRATCH/core_ngs/references/

BWA PE alignment of yeast data

To jump straight into aligning paired-end (PE) yeast data with BWA:

# Pre-built references
mkdir -p $SCRATCH/core_ngs/references
rsync -ptlvrP $CORENGS/references/ $SCRATCH/core_ngs/references/

# FASTQ (to align)
mkdir -p $SCRATCH/core_ngs/alignment/fastq
cp $CORENGS/alignment/*fastq.gz $SCRATCH/core_ngs/alignment/fastq/

# Alignment directory
mkdir -p $SCRATCH/core_ngs/alignment/yeast_bwa
cd $SCRATCH/core_ngs/alignment/yeast_bwa
ln -s -f ../fastq
ln -s -f ../../references/bwa/sacCer3