Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Expand
titleWhat's going on?

Parameters are:

  • --local – local alignment mode
  • -L 16 – seed length 16
  • -N 1 – allow 1 mismatch in the seed
  • -x  mb20/hairpin_cDNA_hsa.fa – prefix path of index files
  • -U fqfastq/human_mirnaseq.fastq.gz – FASTQ file for single-end (Unpaired) alignment
  • -S human_mirnaseq.sam – tells bowtie2 to report alignments in SAM format to the specified file

...

As noted earlier, many microbial genomes are available through repositories like GenBank that use specific file format conventions for storage and distribution of genome sequence and annotations.  The "Genbank" file format is a text file that can be parsed to yield other files that are compatible with the pipelines we have been implementing.  Go ahead and look at some of the contents of a Genbank file with the following commands:

Code Block
cd $SCRATCH$WORK/core_ngs/references
less vibCho.O395.gbk
grep -A 5 ORIGIN vibCho.O395.gbk

...

To build the reference index for alignment, we actually only need the FASTA file, since annotations are frequently often not necessary for alignment purposes. (This is not always true , as - extensively spliced transcriptomes requires splice junction annotations to align RNA-seq data properly, but for now we will only use the FASTA file.)  We build the reference files exactly as we did with mirbase, only swapping out the FASTA files as follows:

Code Block
bp_genbank2gff3.plmkdir --format Genbank p $WORK/core_ngs/references/bt2/vibCho
mv $WORK/core_ngs/references/vibCho.O395.gbk
mv vibCho.O395.gbk.gff fa $WORK/core_ngs/references/fasta
cd $WORK/core_ngs/references/bt2/vibCho
ln -s -f ../../fasta/vibCho.O395.gfffa
less vibCho.O395.gff

 

 

Performing the bowtie2 alignment

ls -la

Now build the index using bowtie2-build like we did for the mirbase index:

Code Block
bowtie2-build vibCho.O395.fa vibCho.O395

This should also go pretty fast: you can see the resulting files using ls like before.

Performing the bowtie2 alignment

Now we will go back to our scratch area to do the alignment, and set up symbolic links to the index in the work area to simplify the alignment command:

Code Block
cd $SCRATCH/core_ngs/alignment
ln -s -f $WORK/core_ngs/references/bt2/vibCho vibCho

In the previous exercise, we set specific local alignment parameters for bowtie2 that took into account the fact that the dataset was derived from small RNA.  However, in this case the data is from standard mRNA sequencing, meaning that the DNA fragments are typically longer than the reads.  There is likely to be very little contamination that would require using a local rather than global alignment, or many other pre-processing steps (e.g. adapter trimming).  Thus, we will run bowtie2 with default parameters, omitting any options other than the input, output, and reference index.  The command would look like this:

Code Block
bowtie2 -x vibCho/vibCho.O395.fa -U fastq/cholera_rnaseq.fastq.gz -S cholera_rnaseq.sam
Expand
titleWhat's going on?

Parameters are:

  • -x  vibCho/vibCho.O395.fa – prefix path of index files
  • -U fastq/cholera_rnaseq.fastq.gz – FASTQ file for single-end (Unpaired) alignment
  • -S cholera_rnaseq.sam – tells bowtie2 to report alignments in SAM format to the specified file

Create a commands file called bt2_vibCho.cmds with this task definition then generate and submit a batch job for it (time 1 hour, normal queue).

Expand
titleWhat's going on?

Use nano to create the bt2_vibCho.cmds file. Then:

Code Block
languagebash
titleLocal bowti2 alignment of miRNA data
launcher_creator.py -n bt2_vibCho -j bt2_vibCho.cmds -t 01:00:00 -q normal
sbatch bt2_vibCho.slurm; showq -u

When the job is complete you should have a cholera_rnaseq.sam file that you can examine using whatever commands you like.  In the past two exercises, we have explored a few different sorts of commands that you might run on this file to get different sorts of information.  Those commands will provide similar information here, since this file is also in SAM/BAM format.  In the last exercise, we will come back to this SAM file (and the others) to explore ways to compress them effectively and to extract basic quality statistics.(this will stop here because samtools index/idxstats/flagstat will be Exercise #5)

Exercise #4: BWA-MEM - Human mRNA-seq

...

Expand
titleHint:
Code Block
languagebash
bwa mem hg19/hg19.fa fqfastq/human_rnaseq.fastq.gz > human_rnaseq_mem.sam

...