
Before the alignment, of course, we've got to build a mirbase index using bowtie2-build (go ahead and check out its options). Unlike for the aligner itself, we only need to worry about a few things here:

Code Block
bowtie2-build <reference_in> <bt2_index_base>

...

  • reference_in file is just the FASTA file containing mirbase v20 sequences

...

  • bt2_index_base is the prefix of where we want the files to go

Following what we did earlier for BWA indexing:

Code Block
languagebash
titlePrepare Bowtie2 index directory for mirbase
mkdir -p $WORK/archive/references/bt2/mirbase.v20
cd $WORK/archive/references/bt2/mirbase.v20
ln -s -f ../../fasta/hairpin_cDNA_hsa.fa
ls -la

Now build the index with bowtie2-build:

Code Block
languagebash
titlePrepare Bowtie2 index directory for mirbase
bowtie2-build hairpin_cDNA_hsa.fa hairpin_cDNA_hsa.fa

That was very fast! It's because the mirbase reference is so small compared to what programs like this are used to dealing with, which is the human genome (or bigger). You should see the following files:

Code Block
titlebowtie2 index files for miRNAs
hairpin_cDNA_hsa.fa
hairpin_cDNA_hsa.fa.1.bt2
hairpin_cDNA_hsa.fa.2.bt2
hairpin_cDNA_hsa.fa.3.bt2
hairpin_cDNA_hsa.fa.4.bt2
hairpin_cDNA_hsa.fa.rev.1.bt2
hairpin_cDNA_hsa.fa.rev.2.bt2
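
If you'd rather not eyeball the listing, a small check like the one below can confirm that all six index files are there and non-empty. This is only a sketch: check_bt2_index is a made-up helper name, not something that ships with bowtie2, and it assumes the small-index .bt2 suffixes shown in the listing above.

```shell
# Hypothetical helper (not part of bowtie2): report any of the six
# small-index files that is missing or empty for a given index prefix.
check_bt2_index() {
    local prefix="$1" missing=0 suffix
    for suffix in 1 2 3 4 rev.1 rev.2; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "${prefix}.${suffix}.bt2" ]; then
            echo "missing: ${prefix}.${suffix}.bt2"
            missing=1
        fi
    done
    return "$missing"
}

# Example use from inside the index directory:
#   check_bt2_index hairpin_cDNA_hsa.fa && echo "index looks complete"
```

The prefix you pass is the same bt2_index_base you gave to bowtie2-build, which here happens to be the FASTA file name itself.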

Now, we're ready to actually try to do the alignment. Remember, unlike BWA, we actually need to set some options depending on what we're after. Some of the most important options for bowtie2 are:

Option | Effect
--end-to-end or --local | Controls whether the entire read must align to the reference, or whether soft-clipping the ends is allowed to find internal alignments. Default --end-to-end
-L | Controls the length of seed substrings generated from each read (default = 22)
-N | Controls the number of mismatches allowable in the seed of each alignment (default = 0)
--ma | Controls the alignment score contribution of a matching base (0 for --end-to-end, 2 for --local)
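
To get a feel for what the match bonus does, here is a rough back-of-the-envelope sketch of alignment scores in the two modes. It is a simplification, not bowtie2's actual scoring code: aln_score is a made-up helper, it ignores gaps, Ns and base qualities, and it assumes every mismatch costs the maximum default mismatch penalty of 6 (bowtie2's --mp option).

```shell
# Hypothetical helper: score of a gap-free alignment under simplified rules.
# Arguments: matches, mismatches, match bonus (--ma), and optionally the
# mismatch penalty (assumed to be 6 if not given).
aln_score() {
    local matches="$1" mismatches="$2" ma="$3" mp="${4:-6}"
    echo $(( matches * ma - mismatches * mp ))
}

aln_score 22 0 0   # end-to-end mode, perfect 22-base match: prints 0
aln_score 22 0 2   # local mode, perfect 22-base match: prints 44
aln_score 21 1 2   # local mode, 21 matches and 1 mismatch: prints 36
```

So in --end-to-end mode the best possible score is 0 and everything else is negative, while in --local mode longer matching stretches earn higher positive scores.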

To decide how we want to go about doing our alignment, check out the file we're aligning with less:

Expand
titleHint:
Code Block
cds
less fastq_align/human_mirnaseq.fastq.gz
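
Scrolling through the reads with less gives a good first impression, and a quick tally of read lengths can complement it, since the seed length (-L) only makes sense relative to how long the reads are. The sketch below assumes a standard 4-line-per-record gzipped FASTQ file; fastq_read_lengths is a made-up helper name.

```shell
# Hypothetical helper: tabulate read lengths in a gzipped FASTQ file.
# In a 4-line FASTQ record the sequence is line 2, so print the length of
# every such line, then count how often each length occurs.
fastq_read_lengths() {
    gzip -dc "$1" | awk 'NR % 4 == 2 { print length($0) }' | sort -n | uniq -c
}

# Example use, after "cds":
#   fastq_read_lengths fastq_align/human_mirnaseq.fastq.gz
```

Each output line is a count followed by a read length, sorted from the shortest reads to the longest.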

...