...
File Name | Description | Sample |
---|---|---|
Sample_Yeast_L005_R1.cat.fastq.gz | Paired-end Illumina, First of pair, FASTQ | Yeast ChIP-seq |
Sample_Yeast_L005_R2.cat.fastq.gz | Paired-end Illumina, Second of pair, FASTQ | Yeast ChIP-seq |
human_rnaseq.fastq.gz | Paired-end Illumina, First of pair only, FASTQ | Human RNA-seq |
human_mirnaseq.fastq.gz | Single-end Illumina, FASTQ | Human microRNA-seq |
...
Searching genomes, however, is hard work and takes a long time if done on an un-indexed, linear genomic sequence. So, most aligners require that references be indexed for quick access. The aligners we are using each require a different index, but use the same method (the Burrows-Wheeler Transform) to get the job done. Indexing takes a FASTA file as input, with each chromosome (or contig) as a separate entry, and produces an aligner-specific set of files as output. The aligner then reads those output files whenever it executes an alignment command. Here are some details of where you can find the references we need now (and here are many more):
Reference | Species | Base Length | Contig Number | Source | Download Link |
---|---|---|---|---|---|
Hg19 | Human | 3,137,161,264 | 25 (really 93) | UCSC | |
MirbaseV20 | Human | | 1908 | Mirbase | |
SacCer3 | Yeast | | 17 | UCSC | |
Mm10 | Mouse | | 22 (really 66) | UCSC | |
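
The "Base Length" and "Contig Number" columns can be checked directly against a FASTA file, since each chromosome or contig is one `>` entry. The following is a minimal Python sketch, not part of the course materials; the function names `fasta_stats` and `fasta_stats_from_path` are made up for illustration, and the toy sequence is invented.

```python
import gzip
import io


def fasta_stats(handle):
    """Return (contig_count, total_bases) for a FASTA text stream."""
    contigs = 0
    bases = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            contigs += 1  # each header line starts a new chromosome/contig entry
        else:
            bases += len(line)  # sequence lines contribute to the base length
    return contigs, bases


def fasta_stats_from_path(path):
    """Same as fasta_stats, but opens a plain or gzip-compressed file by path."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return fasta_stats(handle)


# Toy two-entry reference (invented sequence, for illustration only)
toy = io.StringIO(">chrI\nACGTAC\nGT\n>chrM\nTTGA\n")
print(fasta_stats(toy))  # (2, 12)
```

Calling `fasta_stats_from_path` on a real reference FASTA should report the number of entries the aligner will see. That is one way to notice cases like the "25 (really 93)" note above, where a download contains more contigs than the primary chromosomes.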
The yeast and miRBase FASTA files (containing only the reference sequences) are located at the path:
Code Block |
---|
/ |
Hg19 is way too big for us to index here, so we've already done it.
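To give a feel for why an index makes searching fast, here is a toy Python sketch of the Burrows-Wheeler Transform and the backward search built on it. It is an illustration under simplifying assumptions, not how BWA or Bowtie2 are actually implemented: it sorts full suffixes in memory, which would be far too slow and memory-hungry for a real genome, and the names `bwt_index` and `find` are hypothetical.

```python
def bwt_index(text):
    """Build a toy index: suffix array, BWT string, C table, and Occ table."""
    text += "$"  # sentinel; sorts before A, C, G, T
    # Suffix array: start positions of all suffixes in sorted order.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[ch] = number of characters in text strictly smaller than ch.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i] = number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, bwt, c_table, occ


def find(pattern, index):
    """Return sorted 0-based positions of exact matches of pattern."""
    sa, bwt, c_table, occ = index
    lo, hi = 0, len(bwt)  # half-open range of matching suffix-array rows
    for ch in reversed(pattern):  # backward search: last character first
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = bwt_index("GATTACAGATTACA")  # invented toy "reference"
print(find("TTA", index))  # [2, 9]
```

Note that the search loop does one table lookup per character of the read, regardless of how long the reference is. That is the property that makes building the index once, ahead of time, worthwhile.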
BWA - Yeast ChIP-seq
Bowtie2 and Local Alignment - Human microRNA-seq
...