Plasmids are used extensively in both synthetic biology and academic settings. They are often passed around labs, modified slightly to suit a new purpose, and then handed off to yet another researcher, again for some slightly different purpose. While repositories are getting better about fully sequencing all plasmids, plasmids were used for decades without significant sequencing efforts, followed by decades in which only key inserts were sequenced via Sanger sequencing. Even in my own work, it has not been uncommon to get a functioning plasmid without getting a reference sequence. Recently the Barrick lab developed pLannotate, a program capable of quickly annotating and visualizing plasmid sequences.
Using the assembled plasmid you identified in the SPAdes genome assembly tutorial, the novel DNA identification tutorial, or a plasmid of your own:
- Use the pLannotate website to annotate the sequence.
- Compare the results with any existing annotations you are aware of.
- Consider how any partial artifacts found on the plasmid could be affecting its known behavior.
As with most other programs we have worked with, pLannotate has both:
- An associated publication: https://doi.org/10.1093/nar/gkab374
- A GitHub page covering both usage and installation details: https://github.com/barricklab/pLannotate
For this tutorial we will also be making use of something new: a dedicated web server capable of doing the analysis for us: http://plannotate.barricklab.org/. While installation instructions are present on the GitHub page and include a not surprising conda installation, the use of mamba appears to be required for installation on stampede2 for reasons that are not currently known.
mamba create -n plannotate -c conda-forge -c bioconda plannotate
If you decide to install mamba as discussed in Friday's review tutorial, consider installing plannotate yourself, and using the command line tools. The information on command line use is sufficient for you to figure out how to run the program on whatever plasmid sequence you have at this point in the class.
Get Some Data
The data source is either the product of another tutorial or your own plasmid
As mentioned above the reference sequence(s) you will annotate will come from one (or more) of the following:
- The SPAdes genome assembly tutorial
- The novel DNA identification tutorial
- A plasmid you currently work with
If you use a product of one of the tutorials, you can transfer the plasmid sequence (in fasta or genbank format) back to your laptop, possibly with the help of the scp tutorial. Alternatively, use cat/more/less to display the sequence, highlight all of it, copy it, and paste it in the next step.
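Before uploading or pasting, it can help to confirm that the whole plasmid sequence made it into the file or the clipboard. The following is a minimal sketch in plain Python, not part of pLannotate; the function names `parse_fasta` and `summarize` are made up for this illustration. It reports the length of each fasta record and flags any characters other than A, C, G, T, or N.

```python
def parse_fasta(text):
    """Split fasta-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def summarize(records, alphabet="ACGTN"):
    """Report the name, length, and unexpected characters of each record."""
    rows = []
    for header, seq in records:
        rows.append({
            "name": header.split()[0] if header else "(unnamed)",
            "length": len(seq),
            "unexpected": sorted(set(seq) - set(alphabet)),
        })
    return rows
```

To use it on a file, read the text first, for example `summarize(parse_fasta(open("my_plasmid.fasta").read()))`, where `my_plasmid.fasta` is a placeholder for whatever file you transferred. If more than one record comes back, or the length is far from what you expect for your plasmid, check that you copied the right contig.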
As we have not installed the program locally, we will instead use the program's web server to annotate the plasmid. Navigate to http://plannotate.barricklab.org/ and upload the file, or paste the sequence into the appropriate section.
pLannotate will generate 3 key things:
- An interactive graphic of your plasmid
  - This allows you to see the location of the genes and gene fragments identified, as well as popup information about each.
- An annotated genbank file
  - As with the genbank files we have worked with throughout the class, this can be downloaded to your local computer (or is produced automatically if you are instead using the program via the command line).
- A csv file
  - The csv file contains the same information in a generic format, which can be useful if you are working with multiple plasmids and, say, want to identify any plasmid that has a certain antibiotic resistance gene.
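As a rough illustration of the multiple-plasmid use case for the csv output, the sketch below searches several annotation tables for a keyword such as the name of a resistance gene. It is not part of pLannotate. The function name `plasmids_with_feature` is made up for this example, and the column names `Feature` and `Description` are an assumption about the csv header, so check the first line of your own file and adjust them if needed.

```python
import csv
import io


def plasmids_with_feature(csv_texts, keyword, columns=("Feature", "Description")):
    """Return the sorted names of plasmids whose annotation csv mentions keyword.

    csv_texts maps a plasmid name to the text of its annotation csv.
    The default column names are an assumption; match them to your header row.
    """
    keyword = keyword.lower()
    hits = set()
    for name, text in csv_texts.items():
        for row in csv.DictReader(io.StringIO(text)):
            # Case-insensitive substring match in any of the chosen columns
            if any(keyword in (row.get(col) or "").lower() for col in columns):
                hits.add(name)
                break  # one matching row is enough for this plasmid
    return sorted(hits)
```

A typical call would build the dictionary from downloaded files, for example `{"pA": open("pA.csv").read(), "pB": open("pB.csv").read()}` with placeholder file names, and then search for something like `"AmpR"`. Substring matching is deliberately loose, so review the hits by eye before drawing conclusions.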
Next steps and optional exercises
As mentioned above, if you install mamba after consulting the review tutorial, you can install plannotate on stampede2 and perform the same analysis via the command line.