...

After the header lines, each feature in the genome is represented by a line giving its chromosome, start, stop, strand, and other information. Feature types include "mRNA," "CDS," and "exon." As you would expect in a prokaryotic genome, the gene, mRNA, CDS, and exon annotations are frequently identical, meaning they share the same coordinates. You could parse these files further with commands like grep and awk, for example to extract all exons from the full file or to remove the header lines that begin with "#".
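The same filtering can be sketched in Python. This is a minimal sketch, assuming a standard 9-column, tab-delimited GFF file (seqid, source, type, start, end, score, strand, phase, attributes). The SAMPLE lines, the "chr1" sequence name, and the function names are hypothetical toy values for illustration, not taken from the vibCho annotation.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Feature(NamedTuple):
    seqid: str
    feature_type: str
    start: int
    end: int
    strand: str


def parse_gff(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield features from GFF lines, skipping blank lines and '#' header/comment lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a 9-column feature line (e.g. sequence after a ##FASTA directive)
        seqid, _source, ftype, start, end, _score, strand, _phase, _attrs = cols[:9]
        yield Feature(seqid, ftype, int(start), int(end), strand)


def exons(lines: Iterable[str]) -> List[Feature]:
    """Keep only exon features; roughly what this shell filter does:
    grep -v '^#' genes.gff | awk -F'\t' '$3 == "exon"'   (genes.gff is a hypothetical name)
    """
    return [f for f in parse_gff(lines) if f.feature_type.lower() == "exon"]


# Hypothetical toy annotation: one prokaryotic gene whose gene, mRNA, exon,
# and CDS lines all share the same coordinates.
SAMPLE = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t100\t400\t.\t+\t.\tID=g1",
    "chr1\tsrc\tmRNA\t100\t400\t.\t+\t.\tID=m1;Parent=g1",
    "chr1\tsrc\texon\t100\t400\t.\t+\t.\tParent=m1",
    "chr1\tsrc\tCDS\t100\t400\t.\t+\t0\tParent=m1",
]

print(exons(SAMPLE))
print({(f.start, f.end) for f in parse_gff(SAMPLE)})  # a single shared coordinate pair
```

Because every feature line in the toy gene collapses to one (start, end) pair, the sketch also illustrates why gene, mRNA, CDS, and exon records are often redundant in a prokaryotic annotation.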

Building the bowtie2 vibCho index

To build the reference index for alignment, we only need the FASTA file, because annotations are often not necessary for alignment. (This is not always true: in genomes with extensive splicing, aligning RNA-seq data properly often requires splice junction annotations. For now, though, we will use only the FASTA file.) We build the reference files exactly as we did with mirbase, swapping in the vibCho FASTA file as follows:

...
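If you prefer to script the indexing step, here is a minimal Python sketch that assembles and optionally runs a bowtie2-build call of the form `bowtie2-build [options] <reference_in> <bt2_base>`. It assumes bowtie2-build is on your PATH and, if you pass a thread count, that your bowtie2 version accepts `--threads`. The file name `vibCho.fa` and the index prefix `vibCho` are hypothetical placeholders, not necessarily the paths used in this tutorial.

```python
import os
import shutil
import subprocess
from typing import List, Optional


def bowtie2_build_cmd(fasta: str, index_prefix: str, threads: Optional[int] = None) -> List[str]:
    """Assemble the argument list: bowtie2-build [--threads N] <reference_in> <bt2_base>."""
    cmd = ["bowtie2-build"]
    if threads is not None:
        cmd += ["--threads", str(threads)]  # assumes a bowtie2 version with --threads
    cmd += [fasta, index_prefix]
    return cmd


def expected_index_files(index_prefix: str) -> List[str]:
    """The six files bowtie2-build writes for a small (non-large) index."""
    suffixes = ["1", "2", "3", "4", "rev.1", "rev.2"]
    return [f"{index_prefix}.{s}.bt2" for s in suffixes]


def build_index(fasta: str, index_prefix: str, threads: Optional[int] = None) -> None:
    """Run bowtie2-build, then check that the expected index files exist."""
    if shutil.which("bowtie2-build") is None:
        raise RuntimeError("bowtie2-build not found on PATH")
    subprocess.run(bowtie2_build_cmd(fasta, index_prefix, threads), check=True)
    missing = [p for p in expected_index_files(index_prefix) if not os.path.exists(p)]
    if missing:
        raise RuntimeError(f"index files missing: {missing}")


# Hypothetical usage (paths are placeholders):
# build_index("vibCho.fa", "vibCho", threads=4)
print(bowtie2_build_cmd("vibCho.fa", "vibCho"))
```

Building the argument list separately from running it keeps the command easy to inspect or log before bowtie2-build is actually invoked.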