...
Code Block |
---|
language | bash |
---|
title | Start an idev session |
---|
|
idev -p developmentnormal -m 120 -A UT-2015-05-18 -N 1 -n 68 --reservation=BIO_DATA_week_1 |
Then stage the sample datasets and references we will use.
Code Block |
---|
language | bash |
---|
title | Get the alignment exercises files |
---|
|
mkdir -p $SCRATCH/core_ngs/references/fasta
mkdir -p $SCRATCH/core_ngs/alignment/fastq
cp $CORENGS/references/*.fa $SCRATCH/core_ngs/references/fasta/
cp $CORENGS/alignment/*fastq.gz $SCRATCH/core_ngs/alignment/fastq/
cd $SCRATCH/core_ngs/alignment/fastq |
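To confirm the copies succeeded, you can count what landed in each directory. The helper below is a hypothetical convenience function, not part of the course setup; it uses only the standard find and wc commands and looks at the top level of the given directory.

```bash
# Hypothetical helper (not part of the course materials):
# count regular files directly inside a directory that match a glob pattern.
count_files() {
  local dir="$1" pattern="$2"
  # -maxdepth 1 keeps find from descending into subdirectories
  find "$dir" -maxdepth 1 -type f -name "$pattern" | wc -l
}
```

For example, `count_files $SCRATCH/core_ngs/references/fasta '*.fa'` should print a nonzero count if the FASTA references were copied, and `count_files $SCRATCH/core_ngs/alignment/fastq '*fastq.gz'` does the same for the FASTQ files. Quote the pattern so the shell passes it to find instead of expanding it.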
...
Make sure you're in an idev session. If you are, the hostname command will display a name like c455-021.stampede2.tacc.utexas.edu; if you're on a login node, it will display something like login3.stampede2.tacc.utexas.edu instead. If you're on a login node, start an idev session like this:
Code Block |
---|
language | bash |
---|
title | Start an idev session |
---|
| idev -p developmentnormal -m 120 -A UT-2015-05-18 -N 1 -n 68 --reservation=BIO_DATA_week_1 |
|
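If you'd rather not eyeball the hostname, a small function can classify it. This is a hypothetical helper, not part of the course materials, and it assumes only the naming patterns described above: login node names start with `login`, and compute node names look like `c455-021.stampede2.tacc.utexas.edu`.

```bash
# Hypothetical helper: classify a hostname as a login node, a compute node,
# or unknown, based only on the Stampede2 naming patterns described above.
# With no argument it checks the current machine via the hostname command.
node_type() {
  local host="${1:-$(hostname)}"
  case "$host" in
    login*)           echo "login" ;;    # e.g. login3.stampede2.tacc.utexas.edu
    c[0-9]*-[0-9]*.*) echo "compute" ;;  # e.g. c455-021.stampede2.tacc.utexas.edu
    *)                echo "unknown" ;;
  esac
}
```

Run `node_type` with no argument to check where you are; if it prints `login`, start an idev session before continuing.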
Code Block |
---|
|
module load biocontainers # takes a while
module load bwa
bwa |
...
First, we need to make sure that we don't look at fields in the SAM header lines. We're going to end up with a series of pipe operations, and the best way to make sure you're on track is to build them up one at a time, piping each intermediate result to head:
...
The expression above returns 612,968. There were 1,184,360 records in total, so the percentage is:
Code Block |
---|
language | bash |
---|
title | Calculate alignment rate |
---|
| awk 'BEGIN{print 612968/1184360}' |
or about 52%. Not great. Note that we perform this calculation in awk's BEGIN block, which is always executed before any input is read, rather than in the main body, which is executed once for each line of input. That's why we can call awk here without piping it any input. See Linux fundamentals: cut,sort,uniq,grep,awk |
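The counting and the division can also be combined into a single pass over the SAM file, this time using awk's END block, which runs after all input has been read. This is a hypothetical sketch, not the exact pipeline used in this exercise: it skips header lines (those starting with `@`) and treats a record as mapped when bit 0x4 ("segment unmapped") of the FLAG field (field 2) is not set.

```bash
# Hypothetical sketch (not the course's exact pipeline): count alignment
# records in a SAM file and report the mapped percentage in one pass.
#   - lines starting with '@' are header lines and are skipped
#   - FLAG is field 2; bit 0x4 set means "segment unmapped"
#   - int(FLAG / 4) % 2 extracts that bit using only POSIX awk arithmetic
mapping_rate() {
  grep -v '^@' "$1" |
    awk -F '\t' '
      { total++; if (int($2 / 4) % 2 == 0) mapped++ }
      END {
        if (total == 0) { print "no alignment records"; exit }
        printf "%d of %d records mapped (%.1f%%)\n", mapped, total, 100 * mapped / total
      }'
}
```

Usage would be `mapping_rate your_alignments.sam` (a placeholder file name). Because every non-header line is counted, including any secondary or supplementary alignments, the totals may differ from the numbers above depending on how the aligner reported multi-mapping reads.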
Exercise: What might we try in order to improve the alignment rate?
...
Make sure you're in an idev session. If you are, the hostname command will display a name like c455-021.stampede2.tacc.utexas.edu; if you're on a login node, it will display something like login3.stampede2.tacc.utexas.edu instead. If you're on a login node, start an idev session like this:
Code Block |
---|
language | bash |
---|
title | Start an idev session |
---|
| idev -p developmentnormal -m 120 -A UT-2015-05-18 -N 1 -n 68 --reservation=BIO_DATA_week_1 |
|
Code Block |
---|
|
# If not already loaded
module load biocontainers # takes a while
module load samtools
samtools |
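Running a tool with no arguments, as above, prints its usage message, which confirms the module loaded. To check several tools at once, a hypothetical helper (not part of the course materials) can look each one up on your PATH with the shell builtin `command -v`:

```bash
# Hypothetical helper: report whether each named tool is on the PATH,
# e.g. after "module load bwa" and "module load samtools".
# Returns 0 only if every tool was found.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_tools bwa samtools` should report both as found once the biocontainers, bwa and samtools modules are loaded.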
...
Catch up (if needed):
Code Block |
---|
| # Copy over the Yeast data if needed
mkdir -p $SCRATCH/core_ngs/alignment/fastq
cp $CORENGS/alignment/Sample_Yeast*.gz $SCRATCH/core_ngs/alignment/fastq/
# Copy a pre-built sacCer3 reference if you didn't build one already
mkdir -p $SCRATCH/core_ngs/references
rsync -avP $CORENGS/references/ $SCRATCH/core_ngs/references/ |
|
Code Block |
---|
language | bash |
---|
title | Run multiple alignments using the TACC batch system |
---|
|
# Make sure you're not in an idev session by looking at the hostname
hostname
# If the hostname looks like "c455-004.stampede2.tacc.utexas.edu", exit the idev session
# Make a new alignment directory for running these scripts
mkdir -p $SCRATCH/core_ngs/alignment/bwa_script
cd $SCRATCH/core_ngs/alignment/bwa_script
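# Link the shared ../fastq directory here as ./fastq
ln -s -f ../fastq
# Copy the alignment commands file
cp $CORENGS/tacc/aln_script.cmds .
# Create the batch script (aln_script.slurm) from the commands file, then submit it
launcher_creator.py -j aln_script.cmds -n aln_script -t 02:00:00 -w 4 -a UT-2015-05-18 -q normal
sbatch --reservation=BIO_DATA_week_1 aln_script.slurm
# Check on the status of your jobs
showq -u |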
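The contents of aln_script.cmds aren't shown here, but a launcher commands file is simply a list of independent commands, one per line, each of which the launcher runs as a separate task. The function below is a hypothetical sketch of how such a file could be generated for paired-end data. The `_R1`/`_R2` file naming and the bwa index prefix passed as the second argument are assumptions, and the provided aln_script.cmds may use different bwa options, paths or output names.

```bash
# Hypothetical sketch: print one "bwa mem" command per paired-end sample,
# suitable for redirecting into a launcher commands file.
# ASSUMPTIONS: mates are named <sample>_R1*.fastq.gz and <sample>_R2*.fastq.gz,
# and $2 is the prefix of a bwa index you built earlier.
make_aln_cmds() {
  local fastq_dir="$1" ref="$2" r1 r2 sample
  for r1 in "$fastq_dir"/*_R1*.fastq.gz; do
    [ -e "$r1" ] || continue        # skip if the glob matched nothing
    r2="${r1/_R1/_R2}"              # mate file: replace the first _R1 with _R2
    sample=$(basename "$r1")
    sample="${sample%%_R1*}"        # sample name: drop everything from _R1 onward
    echo "bwa mem $ref $r1 $r2 > $sample.sam"
  done
}
```

For example, `make_aln_cmds fastq /path/to/bwa/index/prefix > my_aln.cmds` would write a commands file that could then be passed to launcher_creator.py with `-j my_aln.cmds`, in the same way as aln_script.cmds above.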
...