...

Before the alignment, of course, we've got to build a mirbase index using bowtie2-build (go ahead and check out its options). Unlike for the aligner itself, we only need to worry about a few things here:

Code Block
bowtie2-build <reference_in> <bt2_index_base>

...

reference_in file is just the FASTA file containing mirbase v20 sequences

...

bt2_index_base is the prefix of where we want the files to go

Following what we did earlier for BWA indexing:

Code Block

language	bash
title	Prepare Bowtie2 index directory for mirbase

# Make a directory for the mirbase reference inside the references directory
cd $SCRATCH/references/
mkdir -p mirbase
# Move the mirbase hairpin FASTA file into the new directory
mv hairpin_cDNA_hsa.fa mirbase
ls -la

Now build the index with bowtie2-build:

Code Block

language	bash
title	Build the Bowtie2 index for mirbase

# Use the FASTA file name as the index prefix so the index files sit next to it
cd mirbase
bowtie2-build hairpin_cDNA_hsa.fa hairpin_cDNA_hsa.fa

That was very fast! The mirbase reference is tiny compared to what programs like this usually deal with, such as the human genome (or bigger). Your $SCRATCH/references/mirbase directory should now contain the following files:

Code Block

title	bowtie2 index files for miRNAs

hairpin_cDNA_hsa.fa
hairpin_cDNA_hsa.fa.1.bt2
hairpin_cDNA_hsa.fa.2.bt2
hairpin_cDNA_hsa.fa.3.bt2
hairpin_cDNA_hsa.fa.4.bt2
hairpin_cDNA_hsa.fa.rev.1.bt2
hairpin_cDNA_hsa.fa.rev.2.bt2
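
If you would rather not compare the listing by eye, the check can be scripted. The following is a minimal sketch, not part of the original walkthrough: check_bt2_index is a hypothetical helper name, and it only assumes the six file suffixes shown in the listing above.

```bash
# Hypothetical helper: report whether all six bowtie2 index files exist
# for a given index prefix (the second argument given to bowtie2-build).
check_bt2_index() {
    local prefix="$1"
    local missing=0
    local suffix
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        if [ ! -e "${prefix}.${suffix}" ]; then
            echo "missing: ${prefix}.${suffix}" >&2
            missing=1
        fi
    done
    # Exit status 0 means every file was found, 1 means at least one is missing
    return "$missing"
}
```

For the index built here, the call would look like: check_bt2_index $SCRATCH/references/mirbase/hairpin_cDNA_hsa.fa && echo "index looks complete"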

Now we're ready to actually try the alignment. Remember, unlike with BWA, we need to set some options depending on what we're after. Some of the most important options for bowtie2 are:

Option	Effect
-N	Controls the number of mismatches allowable in the seed of each alignment (0 or 1; default = 0)
-L	Controls the length of seed substrings generated from each read (default = 22)
--end-to-end or --local	Controls whether the entire read must align to the reference, or whether soft-clipping the ends is allowed to find internal alignments (default = --end-to-end)
--ma	Controls the alignment score contribution of a matching base (0 for --end-to-end, 2 for --local)
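
To see how these options fit together on one command line, here is a rough sketch. The option values are placeholders chosen for illustration, not the settings this tutorial recommends; print_bt2_cmd is a hypothetical helper, and the output file name in the usage line is also an assumption. The function only prints the command so it can be reviewed before running.

```bash
# Hypothetical helper: print (do not run) a bowtie2 command that combines the
# options from the table. The values -N 1 and -L 16 are illustrative only.
#   $1 = index prefix, $2 = reads file, $3 = output SAM file
print_bt2_cmd() {
    printf 'bowtie2 --local -N 1 -L 16 -x %s -U %s -S %s\n' "$1" "$2" "$3"
}
```

For example, print_bt2_cmd mirbase/hairpin_cDNA_hsa.fa fastq_align/human_mirnaseq.fastq.gz human_mirnaseq.sam prints a single command line that can be pasted once the option values have been chosen.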

To decide how we want to go about doing our alignment, take a look at the file we're aligning using less:

Expand

title	Hint:

Code Block
cds
less fastq_align/human_mirnaseq.fastq.gz
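
Paging through the file gives a feel for the reads; a tally of read lengths can complement that when thinking about seed length. This is a small sketch under the assumption that the file is a standard four-line-per-record FASTQ compressed with gzip; read_length_counts is a hypothetical helper name.

```bash
# Hypothetical helper: tabulate read lengths in a gzipped FASTQ file.
# Assumes four lines per record, with the sequence on the second line.
read_length_counts() {
    gzip -dc "$1" |
        awk 'NR % 4 == 2 { count[length($0)]++ }
             END { for (len in count) print len, count[len] }' |
        sort -n
}
```

Each output line is a read length followed by the number of reads with that length, for example: read_length_counts fastq_align/human_mirnaseq.fastq.gz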

...
