Page History

...

Importantly, the output of this command is a group of files that are all required together as the index. So, within the references directory, we will create another directory called bwa/sacCer3, make a symbolic link to the yeast FASTA there, and run the index command in that directory. So, first we will create a directory called fasta to hold our reference FASTA file, then create the bwa/sacCer3 directory, and construct the symbolic links.

Code Block

language	bash
title	Prepare BWA reference directory for sacCer3

mkdir -p $WORK/archivecore_ngs/references/bwa/sacCer3
mkdir -p $WORK/core_ngs/references/fasta
mv $WORK/core_ngs/references/sacCer3.fa $WORK/core_ngs/references/fasta
cd $WORK/archivecore_ngs/references/bwa/sacCer3
ln -s ../../fasta/sacCer3.fa
ls -la

...

Code Block

language	bash
title	Prepare to align yeast data

cd $SCRATCH/core_ngs/alignalignment
ln -s $WORK/archivecore_ngs/references/bwa/sacCer3
ls sacCer3

...

Code Block

language	bash
title	bwa aln commands for yeast R1 and R2

bwa aln -t 8 sacCer3/sacCer3.fa fqfastq/Sample_Yeast_L005_R1.cat.fastq.gz > yeast_R1.sai 2> aln.yeast_R1.log
bwa aln -t 8 sacCer3/sacCer3.fa fqfastq/Sample_Yeast_L005_R2.cat.fastq.gz > yeast_R2.sai 2> aln.yeast_R2.log

...

Code Block

language	bash
title	BWA global alignment of R1 reads

bwa sampe sacCer3/sacCer3.fa yeast_R1.sai yeast_R2.sai fqfastq/Sample_Yeast_L005_R1.cat.fastq.gz fqfastq/Sample_Yeast_L005_R2.cat.fastq.gz > yeast_pairedend.sam

...

Exercise: How many sequences were in the R1 and R2 FASTQ files combined?

Expand

title	Hint

gunzip -c fqfastq/Sample_Yeast_L005_R1.cat.fastq.gz | echo $((`wc -l` / 2))

...

Following what we did earlier for BWA indexing:, namely move our FASTA into place, create the index directory, and establish our symbolic links.

Code Block

title
language	bash

Code Block

language	bash
title	Prepare Bowtie2 index directory for mirbase

mkdir -p $WORK/archivecore_ngs/references/bt2/mirbase.v20
mv $WORK/core_ngs/references/hairpin_cDNA_hsa.fa $WORK/core_ngs/references/fasta
cd $WORK/archivecore_ngs/references/bt2/mirbase.v20
ln -s -f ../../fasta/hairpin_cDNA_hsa.fa
ls -la

...

Code Block

language	bash
title	Examine miRNA sequences to be aligned

cd $SCRATCH/core_ngs/alignalignment
less fqfastq/human_mirnaseq.fastq.gz

Lots of reads have long strings of A's, which must be an adapter or protocol artifact. Even though we see how we might be able to fix it using some tools we've talked about, what if we had no idea what the adapter sequence was, or couldn't use cutadapt or other programs to prepare the reads?

...

Code Block

language	bash
title	Link to mirbase index for bowtie2

cd $SCRATCH/core_ngs/alignalignment
ln -s -f $WORK/archivecore_ngs/references/bt2/mirbase.v20 mb20

...

Code Block

language	bash
title	Local bowti2 alignment of miRNA data

bowtie2 --local -N 1 -L 16 -x mb20/hairpin_cDNA_hsa.fa -U fqfastq/human_mirnaseq.fastq.gz -S human_mirnaseq.sam

...

Prepare the vibCho reference index for bowtie2 from GenBank files using BioPerl
Align reads using bowtie2, producing a SAM file
Convert the SAM file to a BAM file (samtools view)
Sort the BAM file by genomic location (samtools sort)
Index the BAM file (samtools index)
Gather simple alignment statistics (samtools flagstat and samtools idxstat)

Obtaining GenBank sequence files for Vibrio Cholerae

Converting GenBank files into usable reference FASTA files and genome annotations (GFF files)

BioPerl is available through the module system, so load it up like we have done before, with the caveat that we load regular Perl before we load BioPerl.

...

Expand

title	Hint:

Code Block

language	bash

cds
cd $SCRATCH/core_ngs/align/alignment
ls fqfastq
gunzip -c fqfastq/human_rnaseq.fastq.gz | echo $((`wc -l`/4))

...

Now, try aligning it with bwa aln like we did in Example #1, but first link to the hg19 bwa index directory. In this case, due to the size of the hg19 index, we are linking to Anna's scratch area INSTEAD of our own work area containing indexes that we built ourselves.

Code Block

language	bash
title	Link to BWA hg19 index directory

cd $SCRATCH/core_ngs/alignalignment
ln -s -f /scratch/01063/abattenh/ref_genome/bwa/bwtsw/hg19
ls hg19

...

Code Block

language	bash
title	Commands to bwa aln RNA-seq data

bwa aln hg19/hg19.fa fqfastq/human_rnaseq.fastq.gz > human_rnaseq_bwa.sai
bwa samse hg19/hg19.fa human_rnaseq_bwa.sai fqfastq/human_rnaseq.fastq.gz > human_rnaseq_bwa.sam

...

Page tree

Versions Compared

Old Version 77

New Version 78

Key

Obtaining GenBank sequence files for Vibrio Cholerae

Converting GenBank files into usable reference FASTA files and genome annotations (GFF files)