Page History

...

Now that you've done everything the hard way, let's see how to do run an alignment pipeline using a BWA alignment script maintained by the BioITeam

First the setup, to work in a new directory:

Code Block

language	bash

mkdir -p $SCRATCH/core_ngs/alignment/bwa_script
cd $SCRATCH/core_ngs/alignment/bwa_script
ln -s -f ../../fastq

The BioITeam BWA alignment script is , /work/projects/BioITeam/common/script/align_bwa_illumina.sh. Type in the script name to see its usage.

...

aln_mode – whether to perform a global or local alignment (the 1st argument must be one of those words)
- global mode uses the bwa aln workflow as we did above
- local mode uses the bwa mem command
FASTQ file – full or relative path to the FASTQ file (just the R1 fastq if paired end). Can be compressed (.gz)
output prefix – prefix for all the output files produced by the script. Should relate back to what the data is.
assembly – genome assembly to use.
- there are pre-built indexes for some common eukaryotes (hg19, hg18, mm10, mm9, danRer7, sacCer3) that you can use
- or provide a full path for a bwa reference index you have built somewhere
paired flag – 0 means single end (the default); 1 means paired end
hard trim length – if you want the FASTQ hard trimmed down to a specific length before alignment, supply that number here

We're going to run this script and a similar Bowtie2 alignment script, on the yeast data using the TACC batch system. Copy over the commands file and take a look at itand submit the batch job:

Code Block

language	bash

mkdir -p $SCRATCH/core_ngs/alignment/bwa_script
cd $SCRATCH/core_ngs/alignment/bwa_script
ln -s -f ../../fastq
cp $CORENGS/tacc/aln_script.cmds .
cat alnlauncher_scriptmaker.cmdspy

Code Block

language	bash
title	Copy the commands file

cd $SCRATCH/core_ngs/alignment/bwa_script
cp $CORENGS/tacc/aln_script.cmds .
cat aln_script.cmds

While we're waiting

Code Block

language	bash
title	Commands to run multiple alignments

/work/projects/BioITeam/common/script/alignalign_bwa_illumina.sh global     ./fastq/Sample_Yeast_L005_R1.cat.fastq.gz yeastbwa_chipglobal sacCer3 1 50
/work/projects/BioITeam/common/script/align_bwa_illumina.sh local      ./fastq/Sample_Yeast_L005_R1.cat.fastq.gz bwa_local sacCer3 1
/work/projects/BioITeam/common/script/align_bowtie2_illumina.sh global ./fastq/Sample_Yeast_L005_R1.cat.fastq.gz bt2_global sacCer3 1 50
/work/projects/BioITeam/common/script/align_bowtie2_illumina.sh local  ./fastq/Sample_Yeast_L005_R1.cat.fastq.gz bt2_local sacCer3 1

Notes:

The 1st command performs a paired-end BWA global alignment similar to what we did before, but asks that the 100 bp reads be trimmed to 50 first.
- we refer to the pre-built index for yeast by name: sacCer3
  - this index is located in the /work/projects/BioITeam/ref_genome/bwa/bwtsw/sacCer3/ directory
- we provide the name of the R1 FASTQ file
  - because we request a PE alignment (the 1 argument) the script will look for a similarly-named R2 file.
- all output files associated with this command will be named with the prefix bwa_global.
The 2nd command performs a paired-end BWA local alignment.
- all output files associated with this command will be named with the prefix bwa_local.
- no trimming is requested because the local alignment should ignore 5' and 3' bases that don't match the reference genome
The 3rd command performs a paired-end Bowtie2 global alignment.
- the Bowtie2 alignment script has the same first arguments as the BWA alignment script.
- all output files associated with this command will be named with the prefix bt2_global.
- again, we specify that reads should first be trimmed to 50 bp.
The 4th command performs a paired-end Bowtie2 local alignment.
- all output files associated with this command will be named with the prefix bt2_local.
- again, no trimming is requested for the local alignment

in both global and local alignment modes, and also ru

...

Page tree

Versions Compared

Old Version 140

New Version 141

Key