...

Code Block
languagebash
$path_code/script/align/align_bwa_illumina.sh ./fastq/Sample_Yeast_L005_R1.cat.fastq.gz yeast_chip sacCer3 1 2>&1 | tee aln.yeast_chip.log
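
The 2>&1 | tee at the end of the command above merges standard error into standard output and sends the combined stream both to the terminal and to the log file. The minimal sketch below shows that behavior using a stand-in function (fake_pipeline is hypothetical, not part of the pipeline) so it can be run without any alignment tools installed.

```bash
#!/usr/bin/env bash
# Stand-in for the pipeline script (hypothetical): writes one line to
# stdout and one line to stderr.
fake_pipeline() {
    echo "progress message on stdout"
    echo "diagnostic message on stderr" >&2
}

log=$(mktemp)

# 2>&1 sends stderr to the same place as stdout (here, the pipe),
# so tee receives both streams, prints them, and writes them to the log.
fake_pipeline 2>&1 | tee "$log"

# Count the lines that ended up in the log file.
lines_in_log=$(wc -l < "$log" | tr -d ' ')
echo "lines captured in log: $lines_in_log"
```

Without the 2>&1, only the stdout line would reach tee and the log.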

Output files

This alignment pipeline script performs the following steps:

  • Hard trims FASTQ, if optionally specified (fastx_trimmer)
  • Aligns the R1 FASTQ (bwa aln)
  • Aligns the R2 FASTQ, if paired end alignment specified (bwa aln)
  • Reports the alignments as SAM (bwa samse for single end, or bwa sampe for paired end)
  • Converts SAM to BAM (samtools view)
  • Sorts the BAM (samtools sort)
  • Marks duplicates (Picard MarkDuplicates)
  • Indexes the sorted, duplicate-marked BAM (samtools index)
  • Gathers statistics (samtools idxstats, samtools flagstat, plus a custom statistics script of Anna's)
  • Removes intermediate files

There are a number of output files; the most important are described below.

  1. aln.<prefix>.log – Log file of the entire alignment process.
    • Check the tail of this file to make sure the alignment was successful.
  2. <prefix>.sort.dup.bam – Sorted, duplicate-marked alignment file.
  3. <prefix>.sort.dup.bam.bai – Index for the sorted, duplicate-marked alignment file.
  4. <prefix>.samstats.txt – Summary alignment statistics from Anna's stats script.
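
As a quick sanity check after a run, the four files listed above can be tested for existence and the end of the log displayed. The following is a minimal sketch, not part of the pipeline: check_alignment_outputs is a hypothetical helper name, and it assumes the outputs were written to the directory you pass in (the current directory by default).

```bash
#!/usr/bin/env bash
# check_alignment_outputs PREFIX [DIR]
# Hypothetical helper: reports which of the expected output files exist
# and are non-empty, then prints the last lines of the log file.
# The return status is the number of missing (or empty) files.
check_alignment_outputs() {
    local prefix=$1 dir=${2:-.} missing=0 f
    for f in "aln.$prefix.log" \
             "$prefix.sort.dup.bam" \
             "$prefix.sort.dup.bam.bai" \
             "$prefix.samstats.txt"; do
        if [ -s "$dir/$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"
            missing=$((missing + 1))
        fi
    done
    # Show the tail of the log, which is the part to inspect after a run.
    if [ -f "$dir/aln.$prefix.log" ]; then
        tail -n 5 "$dir/aln.$prefix.log"
    fi
    return "$missing"
}
```

For the example command above, this would be called as check_alignment_outputs yeast_chip from the directory where the alignment was run.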

TACC batch system considerations

...

Tip: Always specify wayness 2 for these pipeline scripts

 These pipeline scripts should always be run with a wayness of 2 (-w 2) in the TACC batch system, meaning two commands per node.

Assuming you have your alignment commands in a file called aln.cmds, here's how to create and submit a batch job for the commands.
 

Code Block
languagebash
titleSubmit BWA alignment pipeline job
launcher_creator.py -n aln -j aln.cmds -t 12:00:00 -q normal -w 2
sbatch aln.slurm
showq -u
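
To make the wayness arithmetic concrete: at two commands per node, N commands running at the same time occupy N/2 nodes, rounded up. The sketch below only illustrates that calculation. nodes_needed is a hypothetical helper, the three placeholder commands are made up for the demonstration, and how launcher_creator.py itself sizes the job is not shown here.

```bash
#!/usr/bin/env bash
# nodes_needed N_COMMANDS WAYNESS
# Ceiling division: number of nodes occupied if N_COMMANDS tasks all run
# at once with WAYNESS tasks per node.
nodes_needed() {
    local n=$1 w=$2
    echo $(( (n + w - 1) / w ))
}

# Example: count the non-empty lines of a (placeholder) commands file.
cmds=$(mktemp)
printf '%s\n' "cmd one" "cmd two" "cmd three" > "$cmds"
n_cmds=$(grep -c . "$cmds")

echo "$n_cmds commands at wayness 2 -> $(nodes_needed "$n_cmds" 2) node(s)"
```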

...

Exercise: What would the alignment commands look like if you were putting them in a batch system .cmds file?

Answer

Assuming you have $path_code set properly before submitting the job, the batch command would look like the command above, but you don't need the tee pipe. Instead, just redirect all output to a file. Put the file redirect first and 2>&1 after it; with the order reversed, bash points standard error at wherever standard output was going before the file redirect takes effect, so error messages never reach the log. The example below shows how you would run alignments on two yeast samples in a batch file, adjusting the output prefix (yeast1, yeast2) and log file (aln.yeast1.log, aln.yeast2.log) accordingly.

Code Block
languagebash
$path_code/script/align/align_bwa_illumina.sh ./fastq/Sample_Yeast_L005_R1.cat.fastq.gz yeast1 sacCer3 1 > aln.yeast1.log 2>&1
$path_code/script/align/align_bwa_illumina.sh ./fastq/Sample_ABCDE_L005_R1.cat.fastq.gz yeast2 sacCer3 1 > aln.yeast2.log 2>&1
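
With more than a couple of samples, the commands file can be generated with a loop rather than typed by hand. The following is a minimal sketch under a few assumptions: the FASTQ:prefix pairs are the two from the example above, the CMDS_FILE variable and the colon-separated list format are conventions invented for this illustration, and the log redirect is written file-first (> log 2>&1) so that standard error also lands in the log.

```bash
#!/usr/bin/env bash
# Write one alignment command per sample to a commands file.
# Each entry is "FASTQ:PREFIX" (an ad hoc format for this sketch).
cmds_file=${CMDS_FILE:-aln.cmds}
samples=(
    "./fastq/Sample_Yeast_L005_R1.cat.fastq.gz:yeast1"
    "./fastq/Sample_ABCDE_L005_R1.cat.fastq.gz:yeast2"
)

: > "$cmds_file"    # start with an empty file
for entry in "${samples[@]}"; do
    fastq=${entry%%:*}     # text before the colon
    prefix=${entry##*:}    # text after the colon
    # The script path is single-quoted so $path_code stays literal in the
    # file and is expanded only when the batch job runs the command.
    printf '%s %s %s sacCer3 1 > aln.%s.log 2>&1\n' \
        '$path_code/script/align/align_bwa_illumina.sh' \
        "$fastq" "$prefix" "$prefix" >> "$cmds_file"
done

cat "$cmds_file"
```

Each line of the resulting file is one independent command, which is the form the batch launcher expects.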

...