Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Importantly, the output of this command is a group of files that are all required together as the index. So, within the references directory, we will create another directory called bwa/sacCer3, make a symbolic link to the yeast FASTA there, and run the index command in that directory.  So, first we will create a directory called fasta to hold our reference FASTA file, then create the bwa/sacCer3 directory, and construct the symbolic links.

Code Block
languagebash
titlePrepare BWA reference directory for sacCer3
mkdir -p $WORK/archivecore_ngs/references/bwa/sacCer3
mkdir -p $WORK/core_ngs/references/fasta
mv $WORK/core_ngs/references/sacCer3.fa $WORK/core_ngs/references/fasta
cd $WORK/archivecore_ngs/references/bwa/sacCer3
ln -s ../../fasta/sacCer3.fa
ls -la

...

Code Block
languagebash
titlePrepare to align yeast data
cd $SCRATCH/core_ngs/alignalignment
ln -s $WORK/archivecore_ngs/references/bwa/sacCer3
ls sacCer3

...

Code Block
languagebash
titlebwa aln commands for yeast R1 and R2
bwa aln -t 8 sacCer3/sacCer3.fa fqfastq/Sample_Yeast_L005_R1.cat.fastq.gz > yeast_R1.sai 2> aln.yeast_R1.log
bwa aln -t 8 sacCer3/sacCer3.fa fqfastq/Sample_Yeast_L005_R2.cat.fastq.gz > yeast_R2.sai 2> aln.yeast_R2.log

...

Code Block
languagebash
titleBWA global alignment of R1 reads
bwa sampe sacCer3/sacCer3.fa yeast_R1.sai yeast_R2.sai fqfastq/Sample_Yeast_L005_R1.cat.fastq.gz fqfastq/Sample_Yeast_L005_R2.cat.fastq.gz > yeast_pairedend.sam

...

Exercise: How many sequences were in the R1 and R2 FASTQ files combined?

Expand
titleHint

gunzip -c fqfastq/Sample_Yeast_L005_R1.cat.fastq.gz | echo $((`wc -l` / 2))

...

 Following what we did earlier for BWA indexing:, namely move our FASTA into place, create the index directory, and establish our symbolic links.

Code Block
languagebash
title
Code Block
languagebash
titlePrepare Bowtie2 index directory for mirbase
mkdir -p $WORK/archivecore_ngs/references/bt2/mirbase.v20
mv $WORK/core_ngs/references/hairpin_cDNA_hsa.fa $WORK/core_ngs/references/fasta
cd $WORK/archivecore_ngs/references/bt2/mirbase.v20
ln -s -f ../../fasta/hairpin_cDNA_hsa.fa
ls -la 

...

Code Block
languagebash
titleExamine miRNA sequences to be aligned
cd $SCRATCH/core_ngs/alignalignment
less fqfastq/human_mirnaseq.fastq.gz

Lots of reads have long strings of A's, which must be an adapter or protocol artifact.  Even though we see how we might be able to fix it using some tools we've talked about, what if we had no idea what the adapter sequence was, or couldn't use cutadapt or other programs to prepare the reads?

...

Code Block
languagebash
titleLink to mirbase index for bowtie2
cd $SCRATCH/core_ngs/alignalignment
ln -s -f $WORK/archivecore_ngs/references/bt2/mirbase.v20 mb20

...

Code Block
languagebash
titleLocal bowti2 alignment of miRNA data
bowtie2 --local -N 1 -L 16 -x mb20/hairpin_cDNA_hsa.fa -U fqfastq/human_mirnaseq.fastq.gz -S human_mirnaseq.sam

...

  1. Prepare the vibCho reference index for bowtie2 from GenBank files using BioPerl
  2.  Align reads using bowtie2, producing a SAM file
  3. Convert the SAM file to a BAM file (samtools view) 
  4. Sort the BAM file by genomic location (samtools sort)
  5. Index the BAM file (samtools index)
  6. Gather simple alignment statistics (samtools flagstat and samtools idxstat)

Obtaining GenBank sequence files for Vibrio Cholerae

 

Converting GenBank files into usable reference FASTA files and genome annotations (GFF files)

BioPerl is available through the module system, so load it up like we have done before, with the caveat that we load regular Perl before we load BioPerl.

...

Expand
titleHint:
Code Block
languagebash
cds
cd $SCRATCH/core_ngs/align/alignment
ls fqfastq
gunzip -c fqfastq/human_rnaseq.fastq.gz | echo $((`wc -l`/4))

...

Now, try aligning it with bwa aln like we did in Example #1, but first link to the hg19 bwa index directory.  In this case, due to the size of the hg19 index, we are linking to Anna's scratch area INSTEAD of our own work area containing indexes that we built ourselves.

Code Block
languagebash
titleLink to BWA hg19 index directory
cd $SCRATCH/core_ngs/alignalignment
ln -s -f /scratch/01063/abattenh/ref_genome/bwa/bwtsw/hg19
ls hg19

...

Code Block
languagebash
titleCommands to bwa aln RNA-seq data
bwa aln hg19/hg19.fa fqfastq/human_rnaseq.fastq.gz > human_rnaseq_bwa.sai
bwa samse hg19/hg19.fa human_rnaseq_bwa.sai fqfastq/human_rnaseq.fastq.gz > human_rnaseq_bwa.sam

...