Page History

...

Expand

title	What's going on?

Parameters are:

--local – local alignment mode
-L 16 – seed length 16
-N 1 – allow 1 mismatch in the seed
-x mb20/hairpin_cDNA_hsa.fa – prefix path of index files
-U fqfastq/human_mirnaseq.fastq.gz – FASTQ file for single-end (Unpaired) alignment
-S human_mirnaseq.sam – tells bowtie2 to report alignments in SAM format to the specified file

...

As noted earlier, many microbial genomes are available through repositories like GenBank that use specific file format conventions for storage and distribution of genome sequence and annotations. The "Genbank" file format is a text file that can be parsed to yield other files that are compatible with the pipelines we have been implementing. Go ahead and look at some of the contents of a Genbank file with the following commands:

Code Block
cd $SCRATCH$WORK/core_ngs/references less vibCho.O395.gbk grep -A 5 ORIGIN vibCho.O395.gbk

...

To build the reference index for alignment, we actually only need the FASTA file, since annotations are frequently often not necessary for alignment purposes. (This is not always true , as - extensively spliced transcriptomes requires splice junction annotations to align RNA-seq data properly, but for now we will only use the FASTA file.) We build the reference files exactly as we did with mirbase, only swapping out the FASTA files as follows:

Code Block

bp_genbank2gff3.plmkdir --format Genbank p $WORK/core_ngs/references/bt2/vibCho
mv $WORK/core_ngs/references/vibCho.O395.gbk
mv vibCho.O395.gbk.gff fa $WORK/core_ngs/references/fasta
cd $WORK/core_ngs/references/bt2/vibCho
ln -s -f ../../fasta/vibCho.O395.gfffa
less vibCho.O395.gff

Performing the bowtie2 alignment

ls -la

Now build the index using bowtie2-build like we did for the mirbase index:

Code Block
bowtie2-build vibCho.O395.fa vibCho.O395

This should also go pretty fast: you can see the resulting files using ls like before.

Performing the bowtie2 alignment

Now we will go back to our scratch area to do the alignment, and set up symbolic links to the index in the work area to simplify the alignment command:

Code Block
cd $SCRATCH/core_ngs/alignment ln -s -f $WORK/core_ngs/references/bt2/vibCho vibCho

In the previous exercise, we set specific local alignment parameters for bowtie2 that took into account the fact that the dataset was derived from small RNA. However, in this case the data is from standard mRNA sequencing, meaning that the DNA fragments are typically longer than the reads. There is likely to be very little contamination that would require using a local rather than global alignment, or many other pre-processing steps (e.g. adapter trimming). Thus, we will run bowtie2 with default parameters, omitting any options other than the input, output, and reference index. The command would look like this:

Code Block
bowtie2 -x vibCho/vibCho.O395.fa -U fastq/cholera_rnaseq.fastq.gz -S cholera_rnaseq.sam

Expand

title	What's going on?

Parameters are:

-x vibCho/vibCho.O395.fa – prefix path of index files
-U fastq/cholera_rnaseq.fastq.gz – FASTQ file for single-end (Unpaired) alignment
-S cholera_rnaseq.sam – tells bowtie2 to report alignments in SAM format to the specified file

Create a commands file called bt2_vibCho.cmds with this task definition then generate and submit a batch job for it (time 1 hour, normal queue).

Expand

title	What's going on?

Use nano to create the bt2_vibCho.cmds file. Then:

Code Block

language	bash
title	Local bowti2 alignment of miRNA data

launcher_creator.py -n bt2_vibCho -j bt2_vibCho.cmds -t 01:00:00 -q normal
sbatch bt2_vibCho.slurm; showq -u

When the job is complete you should have a cholera_rnaseq.sam file that you can examine using whatever commands you like. In the past two exercises, we have explored a few different sorts of commands that you might run on this file to get different sorts of information. Those commands will provide similar information here, since this file is also in SAM/BAM format. In the last exercise, we will come back to this SAM file (and the others) to explore ways to compress them effectively and to extract basic quality statistics.(this will stop here because samtools index/idxstats/flagstat will be Exercise #5)

Exercise #4: BWA-MEM - Human mRNA-seq

...

Expand

title	Hint:

Code Block

language	bash

bwa mem hg19/hg19.fa fqfastq/human_rnaseq.fastq.gz > human_rnaseq_mem.sam

...

Page tree

Versions Compared

Old Version 79

New Version 80

Key

Performing the bowtie2 alignment

Performing the bowtie2 alignment

Exercise #4: BWA-MEM - Human mRNA-seq