...
Importantly, the output of this command is a group of files that are all required together as the index. So, within the references directory, we will create another directory called bwa/sacCer3, make a symbolic link to the yeast FASTA there, and run the index command in that directory. So, first we will create a directory called fasta to hold our reference FASTA file, then create the bwa/sacCer3 directory, and construct the symbolic links.
Code Block |
---|
language | bash |
---|
title | Prepare BWA reference directory for sacCer3 |
---|
|
mkdir -p $WORK/archivecore_ngs/references/bwa/sacCer3
mkdir -p $WORK/core_ngs/references/fasta
mv $WORK/core_ngs/references/sacCer3.fa $WORK/core_ngs/references/fasta
cd $WORK/archivecore_ngs/references/bwa/sacCer3
ln -s ../../fasta/sacCer3.fa
ls -la |
...
Code Block |
---|
language | bash |
---|
title | Prepare to align yeast data |
---|
|
cd $SCRATCH/core_ngs/alignalignment
ln -s $WORK/archivecore_ngs/references/bwa/sacCer3
ls sacCer3 |
...
Code Block |
---|
language | bash |
---|
title | bwa aln commands for yeast R1 and R2 |
---|
|
bwa aln -t 8 sacCer3/sacCer3.fa fqfastq/Sample_Yeast_L005_R1.cat.fastq.gz > yeast_R1.sai 2> aln.yeast_R1.log
bwa aln -t 8 sacCer3/sacCer3.fa fqfastq/Sample_Yeast_L005_R2.cat.fastq.gz > yeast_R2.sai 2> aln.yeast_R2.log |
...
Code Block |
---|
language | bash |
---|
title | BWA global alignment of R1 reads |
---|
|
bwa sampe sacCer3/sacCer3.fa yeast_R1.sai yeast_R2.sai fqfastq/Sample_Yeast_L005_R1.cat.fastq.gz fqfastq/Sample_Yeast_L005_R2.cat.fastq.gz > yeast_pairedend.sam |
...
Exercise: How many sequences were in the R1 and R2 FASTQ files combined?
Expand |
---|
|
gunzip -c fqfastq/Sample_Yeast_L005_R1.cat.fastq.gz | echo $((`wc -l` / 2))
|
...
Following what we did earlier for BWA indexing:, namely move our FASTA into place, create the index directory, and establish our symbolic links.
Code Block |
---|
|
Code Block |
---|
language | bash |
---|
title | Prepare Bowtie2 index directory for mirbase |
---|
|
mkdir -p $WORK/archivecore_ngs/references/bt2/mirbase.v20
mv $WORK/core_ngs/references/hairpin_cDNA_hsa.fa $WORK/core_ngs/references/fasta
cd $WORK/archivecore_ngs/references/bt2/mirbase.v20
ln -s -f ../../fasta/hairpin_cDNA_hsa.fa
ls -la |
...
Code Block |
---|
language | bash |
---|
title | Examine miRNA sequences to be aligned |
---|
|
cd $SCRATCH/core_ngs/alignalignment
less fqfastq/human_mirnaseq.fastq.gz |
Lots of reads have long strings of A's, which must be an adapter or protocol artifact. Even though we see how we might be able to fix it using some tools we've talked about, what if we had no idea what the adapter sequence was, or couldn't use cutadapt or other programs to prepare the reads?
...
Code Block |
---|
language | bash |
---|
title | Link to mirbase index for bowtie2 |
---|
|
cd $SCRATCH/core_ngs/alignalignment
ln -s -f $WORK/archivecore_ngs/references/bt2/mirbase.v20 mb20
|
...
Code Block |
---|
language | bash |
---|
title | Local bowti2 alignment of miRNA data |
---|
|
bowtie2 --local -N 1 -L 16 -x mb20/hairpin_cDNA_hsa.fa -U fqfastq/human_mirnaseq.fastq.gz -S human_mirnaseq.sam |
...
- Prepare the vibCho reference index for bowtie2 from GenBank files using BioPerl
- Align reads using bowtie2, producing a SAM file
- Convert the SAM file to a BAM file (samtools view)
- Sort the BAM file by genomic location (samtools sort)
- Index the BAM file (samtools index)
- Gather simple alignment statistics (samtools flagstat and samtools idxstat)
Obtaining GenBank sequence files for Vibrio Cholerae
Converting GenBank files into usable reference FASTA files and genome annotations (GFF files)
BioPerl is available through the module system, so load it up like we have done before, with the caveat that we load regular Perl before we load BioPerl.
...
Expand |
---|
|
Code Block |
---|
| cds
cd $SCRATCH/core_ngs/align/alignment
ls fqfastq
gunzip -c fqfastq/human_rnaseq.fastq.gz | echo $((`wc -l`/4)) |
|
...
Now, try aligning it with bwa aln like we did in Example #1, but first link to the hg19 bwa index directory. In this case, due to the size of the hg19 index, we are linking to Anna's scratch area INSTEAD of our own work area containing indexes that we built ourselves.
Code Block |
---|
language | bash |
---|
title | Link to BWA hg19 index directory |
---|
|
cd $SCRATCH/core_ngs/alignalignment
ln -s -f /scratch/01063/abattenh/ref_genome/bwa/bwtsw/hg19
ls hg19 |
...
Code Block |
---|
language | bash |
---|
title | Commands to bwa aln RNA-seq data |
---|
|
bwa aln hg19/hg19.fa fqfastq/human_rnaseq.fastq.gz > human_rnaseq_bwa.sai
bwa samse hg19/hg19.fa human_rnaseq_bwa.sai fqfastq/human_rnaseq.fastq.gz > human_rnaseq_bwa.sam |
...