...
```bash
$path_code/script/align/align_bwa_illumina.sh ./fastq/Sample_Yeast_L005_R1.cat.fastq.gz yeast_chip sacCer3 1 2>&1 | tee aln.yeast_chip.log
```
Output files
This alignment pipeline script performs the following steps:
- Hard-trims the FASTQ reads, if trimming is specified (fastx_trimmer)
- Aligns the R1 FASTQ (bwa aln)
- Aligns the R2 FASTQ, if paired end alignment specified (bwa aln)
- Reports the alignments as SAM (bwa samse for single end, or bwa sampe for paired end)
- Converts SAM to BAM (samtools view)
- Sorts the BAM (samtools sort)
- Marks duplicates (Picard MarkDuplicates)
- Indexes the sorted, duplicate-marked BAM (samtools index)
- Gathers statistics (samtools idxstats, samtools flagstat, plus a custom statistics script of Anna's)
- Removes intermediate files
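For orientation, the single-end path through these steps can be sketched as a dry run that prints roughly equivalent commands instead of executing them. This is not the contents of align_bwa_illumina.sh. It assumes samtools 1.x syntax, a Picard jar at a hypothetical path (PICARD_JAR), and a hypothetical bwa-indexed reference FASTA passed in place of the assembly name. It also skips the optional trimming step and Anna's statistics script.

```bash
#!/usr/bin/env bash
# Sketch only: prints, but does not run, commands roughly equivalent to the
# single-end steps listed above. Flags, tool versions, and paths are
# assumptions, not the actual contents of align_bwa_illumina.sh.
set -euo pipefail

# print_se_pipeline FASTQ PREFIX REF_FASTA
#   REF_FASTA:  hypothetical path to a bwa-indexed reference FASTA
#   PICARD_JAR: hypothetical path to picard.jar (defaults to picard.jar)
print_se_pipeline() {
  local fq=$1 prefix=$2 ref=$3
  local picard=${PICARD_JAR:-picard.jar}
  cat <<EOF
bwa aln $ref $fq > $prefix.sai
bwa samse $ref $prefix.sai $fq > $prefix.sam
samtools view -b -S -o $prefix.bam $prefix.sam
samtools sort -o $prefix.sort.bam $prefix.bam
java -jar $picard MarkDuplicates I=$prefix.sort.bam O=$prefix.sort.dup.bam M=$prefix.dup.metrics
samtools index $prefix.sort.dup.bam
samtools idxstats $prefix.sort.dup.bam > $prefix.idxstats.txt
samtools flagstat $prefix.sort.dup.bam > $prefix.flagstat.txt
rm -f $prefix.sai $prefix.sam $prefix.bam $prefix.sort.bam
EOF
}

# Example: sacCer3.fa is a hypothetical reference FASTA path.
print_se_pipeline ./fastq/Sample_Yeast_L005_R1.cat.fastq.gz yeast_chip sacCer3.fa
```

A paired-end run would add a second bwa aln for the R2 FASTQ and use bwa sampe (reference, both .sai files, both FASTQ files) in place of bwa samse.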
There are a number of output files; the most important are described below.
- aln.<prefix>.log – Log file of the entire alignment process.
  - Check the tail of this file to make sure the alignment was successful.
- <prefix>.sort.dup.bam – Sorted, duplicate-marked alignment file.
- <prefix>.sort.dup.bam.bai – Index for the sorted, duplicate-marked alignment file.
- <prefix>.samstats.txt – Summary alignment statistics from Anna's stats script.
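A small helper can confirm that the three main output files exist and are non-empty, then print the last lines of the log for a manual look. This is a sketch: the function name is hypothetical, and it does not try to recognize a specific success message, since the script's log format is not described here.

```bash
#!/usr/bin/env bash
# Sketch: check the main outputs for a given <prefix> and show the log tail.
# The file names follow the <prefix> patterns listed above.

# check_alignment_outputs PREFIX  (hypothetical helper)
# Returns 0 if all expected files are present and non-empty, 1 otherwise.
check_alignment_outputs() {
  local prefix=$1 missing=0 f
  for f in "$prefix.sort.dup.bam" "$prefix.sort.dup.bam.bai" "$prefix.samstats.txt"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      missing=1
    fi
  done
  # Print the end of the log so it can be inspected by eye.
  if [ -f "aln.$prefix.log" ]; then
    tail -n 5 "aln.$prefix.log"
  else
    echo "no log file: aln.$prefix.log" >&2
    missing=1
  fi
  return "$missing"
}
```

For the example run above, this would be called as `check_alignment_outputs yeast_chip` from the directory holding the outputs.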
TACC batch system considerations
...
Tip: These pipeline scripts should always be run with a wayness of 2 (-w 2) in the TACC batch system, meaning two commands per node.
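Since a wayness of 2 means two commands per node, a .cmds file with N commands spans N/2 nodes, rounded up, if every command runs at the same time. The arithmetic is shown below as a small hypothetical helper; how launcher_creator.py itself sizes the job is not shown here.

```bash
#!/usr/bin/env bash
# nodes_needed NUM_COMMANDS WAYNESS  (hypothetical helper)
# Prints NUM_COMMANDS / WAYNESS rounded up, using integer arithmetic.
nodes_needed() {
  local ncmds=$1 wayness=$2
  echo $(( (ncmds + wayness - 1) / wayness ))
}

nodes_needed 5 2   # 5 commands at 2 per node
```

To apply it to a real command file, count its non-empty lines, e.g. `nodes_needed "$(grep -c . aln.cmds)" 2`.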
Assuming you have your alignment commands in a file called aln.cmds, here's how to create and submit a batch job for the commands.
```bash
launcher_creator.py -n aln -j aln.cmds -t 12:00:00 -q normal -w 2   # create the batch script (aln.slurm)
sbatch aln.slurm                                                    # submit it
showq -u                                                            # check your job's status
```
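The aln.cmds file is plain text with one command per line. For many samples it can be written with a loop. The sketch below assumes hypothetical sample prefixes (sampleA, sampleB) and FASTQ paths; the trailing sacCer3 and 1 arguments are copied from the interactive example and may need changing for other data. Each command redirects all of its output to its own log rather than piping through tee, and $path_code is written literally so that it is expanded when the job runs.

```bash
#!/usr/bin/env bash
# Sketch: write one alignment command per sample into aln.cmds.
# Sample prefixes and FASTQ paths are hypothetical placeholders.
set -euo pipefail

cmds_file=aln.cmds
: > "$cmds_file"   # start from an empty file

samples=(sampleA sampleB)
for prefix in "${samples[@]}"; do
  fq="./fastq/${prefix}_R1.fastq.gz"
  # The backslash keeps $path_code literal in the file; stdout and stderr
  # both go to aln.<prefix>.log instead of through tee.
  printf '%s\n' "\$path_code/script/align/align_bwa_illumina.sh $fq $prefix sacCer3 1 > aln.$prefix.log 2>&1" >> "$cmds_file"
done

cat "$cmds_file"
```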
...
Exercise: What would the alignment commands look like if you were putting them in a batch system .cmds file?
Assuming $path_code is set properly before you submit the job, the batch command looks like the command above, except that you don't need the tee pipe; instead, just redirect all output (stdout and stderr) to a file. The example below shows how you would run alignments on two yeast samples in a batch file, adjusting the output prefix (yeast1, yeast2) and log file (aln.yeast1.log, aln.yeast2.log) accordingly.
...