...

Alignment of this prokaryotic data follows the workflow below. Here we will concentrate on steps 1 and 2.

  1. Prepare the vibCho reference index for bowtie2 from GenBank files using BioPerl
  2. Align reads using bowtie2, producing a SAM file
  3. Convert the SAM file to a BAM file (samtools view) 
  4. Sort the BAM file by genomic location (samtools sort)
  5. Index the BAM file (samtools index)
  6. Gather simple alignment statistics (samtools flagstat and samtools idxstats)

...

Clearly, this script can convert between many file formats.  In our case, we are moving "from" GenBank "to" FASTA, so the commands we would execute to produce and view the FASTA files would look like this:

...

Now we have a reference sequence file that we can use with the bowtie2 reference builder, and ultimately align sequence data against.

However, recall from when we viewed the GenBank file that genome annotations are available as well, and we would like to extract them into GFF format.  The bp_seqconvert.pl script is designed to convert sequence formats, not annotation formats.  Fortunately, there is another script called bp_genbank2gff3.pl that can take a GenBank file and produce a GFF3 file (GFF3 is the most recent format convention for GFF files).  To run it and see the output, use these commands:

...

After the header lines, each feature in the genome is represented by a line that gives chromosome, start, stop, strand, and other information.  Features are things like "mRNA," "CDS," and "EXON."  As you would expect in a prokaryotic genome, it is frequently the case that the gene, mRNA, CDS, and exon annotations are identical, meaning they share coordinate information.  You could parse these files further using commands like grep and awk to extract, say, all exons from the full file, or to remove the header lines that begin with "#".
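As a minimal sketch of that kind of parsing, the commands below build a tiny made-up GFF3 file and then filter it with grep and awk.  The file name sample.gff and every feature line in it are hypothetical stand-ins, not output from bp_genbank2gff3.pl, and the sketch assumes the feature type sits in the third tab-separated column and is spelled in lowercase ("exon"); check the spelling in your own file before filtering.

```shell
# Hypothetical sample file standing in for real GFF3 output (names and
# coordinates are invented for illustration only).
printf '##gff-version 3\n' > sample.gff
printf '# made-up comment line\n' >> sample.gff
printf 'chr1\tGenBank\tgene\t10\t200\t.\t+\t.\tID=gene1\n' >> sample.gff
printf 'chr1\tGenBank\tmRNA\t10\t200\t.\t+\t.\tID=mrna1;Parent=gene1\n' >> sample.gff
printf 'chr1\tGenBank\tCDS\t10\t200\t.\t+\t0\tID=cds1;Parent=mrna1\n' >> sample.gff
printf 'chr1\tGenBank\texon\t10\t200\t.\t+\t.\tID=exon1;Parent=mrna1\n' >> sample.gff
printf 'chr1\tGenBank\texon\t300\t450\t.\t-\t.\tID=exon2;Parent=mrna2\n' >> sample.gff

# Remove the header/comment lines, i.e. every line that begins with "#".
grep -v '^#' sample.gff > sample.noheader.gff

# Keep only the lines whose feature type (column 3) is exactly "exon".
# Header lines have no third tab-separated column, so they drop out too.
awk -F'\t' '$3 == "exon"' sample.gff > sample.exons.gff

# Tally how many features of each type the file contains.
grep -v '^#' sample.gff | cut -f3 | sort | uniq -c
```

Matching on the third column with awk, rather than running grep for the word "exon" anywhere on the line, avoids accidentally picking up lines where "exon" only appears inside the attribute column (for example in an ID).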

Building the bowtie2 vibCho index

...