...
Now that you've done everything the hard way, let's see how to do run an alignment pipeline using a BWA alignment script maintained by the BioITeam
First the setup, to work in a new directory:
Code Block | ||
---|---|---|
| ||
mkdir -p $SCRATCH/core_ngs/alignment/bwa_script
cd $SCRATCH/core_ngs/alignment/bwa_script
ln -s -f ../../fastq |
The BioITeam BWA alignment script is , /work/projects/BioITeam/common/script/align_bwa_illumina.sh. Type in the script name to see its usage.
...
- aln_mode – whether to perform a global or local alignment (the 1st argument must be one of those words)
- global mode uses the bwa aln workflow as we did above
- local mode uses the bwa mem command
- FASTQ file – full or relative path to the FASTQ file (just the R1 fastq if paired end). Can be compressed (.gz)
- output prefix – prefix for all the output files produced by the script. Should relate back to what the data is.
- assembly – genome assembly to use.
- there are pre-built indexes for some common eukaryotes (hg19, hg18, mm10, mm9, danRer7, sacCer3) that you can use
- or provide a full path for a bwa reference index you have built somewhere
- paired flag – 0 means single end (the default); 1 means paired end
- hard trim length – if you want the FASTQ hard trimmed down to a specific length before alignment, supply that number here
We're going to run this script and a similar Bowtie2 alignment script, on the yeast data using the TACC batch system. Copy over the commands file and take a look at itand submit the batch job:
Code Block | ||
---|---|---|
| ||
mkdir -p $SCRATCH/core_ngs/alignment/bwa_script cd $SCRATCH/core_ngs/alignment/bwa_script ln -s -f ../../fastq cp $CORENGS/tacc/aln_script.cmds . cat alnlauncher_scriptmaker.cmdspy |
Code Block | ||||
---|---|---|---|---|
| ||||
cd $SCRATCH/core_ngs/alignment/bwa_script
cp $CORENGS/tacc/aln_script.cmds .
cat aln_script.cmds |
While we're waiting
Code Block | ||||
---|---|---|---|---|
| ||||
/work/projects/BioITeam/common/script/alignalign_bwa_illumina.sh global ./fastq/Sample_Yeast_L005_R1.cat.fastq.gz yeastbwa_chipglobal sacCer3 1 50 /work/projects/BioITeam/common/script/align_bwa_illumina.sh local ./fastq/Sample_Yeast_L005_R1.cat.fastq.gz bwa_local sacCer3 1 /work/projects/BioITeam/common/script/align_bowtie2_illumina.sh global ./fastq/Sample_Yeast_L005_R1.cat.fastq.gz bt2_global sacCer3 1 50 /work/projects/BioITeam/common/script/align_bowtie2_illumina.sh local ./fastq/Sample_Yeast_L005_R1.cat.fastq.gz bt2_local sacCer3 1 |
Notes:
- The 1st command performs a paired-end BWA global alignment similar to what we did before, but asks that the 100 bp reads be trimmed to 50 first.
- we refer to the pre-built index for yeast by name: sacCer3
- this index is located in the /work/projects/BioITeam/ref_genome/bwa/bwtsw/sacCer3/ directory
- we provide the name of the R1 FASTQ file
- because we request a PE alignment (the 1 argument) the script will look for a similarly-named R2 file.
- all output files associated with this command will be named with the prefix bwa_global.
- we refer to the pre-built index for yeast by name: sacCer3
- The 2nd command performs a paired-end BWA local alignment.
- all output files associated with this command will be named with the prefix bwa_local.
- no trimming is requested because the local alignment should ignore 5' and 3' bases that don't match the reference genome
- The 3rd command performs a paired-end Bowtie2 global alignment.
- the Bowtie2 alignment script has the same first arguments as the BWA alignment script.
- all output files associated with this command will be named with the prefix bt2_global.
- again, we specify that reads should first be trimmed to 50 bp.
- The 4th command performs a paired-end Bowtie2 local alignment.
- all output files associated with this command will be named with the prefix bt2_local.
- again, no trimming is requested for the local alignment
in both global and local alignment modes, and also ru
...