Page tree

Versions Compared


  • This line was added.
  • This line was removed.
  • Formatting was changed.


File NameDescriptionSample Illumina, First of pair, FASTQYeast ChIP-seq Illumina, Second of pair, FASTQYeast ChIP-seq
human_rnaseq.fastq.gzPaired-end Illumina, First of pair only, FASTQHuman RNA-seq
human_mirnaseq.fastq.gzSingle-end Illumina, FASTQHuman microRNA-seq
cholera_rnaseq.fastq.gzSingle-end Illumina, FASTQV. cholerae RNA-seq

Reference Genomes

Before we get to alignment, we need a genome to align to.  We will use four different references here: 


ReferenceSpeciesBase LengthContig NumberSourceDownload
hg19Human3.1 Gbp25 (really 93)UCSCUCSC GoldenPath
sacCer3Yeast12.2 Mbp17UCSCUCSC GoldenPath
mirbase V20Human160 Kbp1908MirbaseMirbase Downloads
vibCho (O395)V. cholerae~4 Mbp2GenBankGenBank Downloads


Recall that these are 100 bp reads and we did not remove adapter contamination. There will be a distribution of fragment sizes – some will be short – and those short fragments may not align without adapter removal (fastx_trimmer or cutadapt).

Exercise #2: Bowtie2 - Vibrio


cholerae RNA-seq

While we have focused on aligning eukaryotic data, the same tools can be used to perform identical functions with prokaryotic data.  The major differences are less about the underlying data and much more about the external/public databases established to store and distribute reference data.  For example, the Illumina iGenome resource provides pre-processed and uniform reference data, designed to be out-of-the-box compatible with aligners like bowtie2 and bwa.  However, the limited number of available species are heavily biased towards model eukaryotes. If we wanted to study a prokaryote, the reference data must be downloaded from a resource like GenBank, and processed/indexed similarly to the procedure for mirbase.