...

  • Deliverables
    • Reports generated by FastQC.
  • Tools Used
    • FastQC (Andrews 2010): used to generate quality summaries of the data:
      • Per base sequence quality report: useful for deciding whether trimming is necessary.
      • Sequence duplication levels: evaluation of library complexity. Higher levels of sequence duplication may be expected for high coverage RNAseq data.
      • Overrepresented sequences: evaluation of adapter contamination.
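
As a minimal sketch of how these three reports could be checked programmatically, the snippet below reads the tab-separated `summary.txt` that FastQC writes alongside each report (status, module name, file name on each line) and keeps only the modules listed above. The function name `parse_fastqc_summary` and the sample text are illustrative assumptions, not part of our pipeline.

```python
from typing import Dict

# Modules named in the list above, lowercased for case-insensitive matching.
MODULES_OF_INTEREST = {
    "per base sequence quality",
    "sequence duplication levels",
    "overrepresented sequences",
}


def parse_fastqc_summary(text: str) -> Dict[str, str]:
    """Map module name to PASS/WARN/FAIL for the modules of interest.

    `text` is the content of a FastQC summary.txt file, where each line is
    "<status>\t<module name>\t<input file name>".
    """
    statuses: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        status, module = parts[0], parts[1]
        if module.lower() in MODULES_OF_INTEREST:
            statuses[module] = status
    return statuses


if __name__ == "__main__":
    # Hypothetical summary content, for illustration only.
    sample = (
        "PASS\tBasic Statistics\tsample.fastq\n"
        "FAIL\tPer base sequence quality\tsample.fastq\n"
        "WARN\tSequence Duplication Levels\tsample.fastq\n"
    )
    for module, status in parse_fastqc_summary(sample).items():
        print(f"{status}\t{module}")
```

A FAIL on per base sequence quality would suggest trimming; a WARN or FAIL on duplication levels is not necessarily a problem for high coverage RNAseq data, as noted above.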

2. Assembly

We use community-standard assemblers to generate a de novo assembly. Assembly is computationally demanding and may not finish within the time limits imposed on compute jobs at TACC, especially for large data sets. To increase the chance of getting an assembly, we first run an assembly with the original data; if that run doesn't complete within TACC time limits, we employ strategies such as in silico normalization to 50x coverage before the main assembly starts.

  • Deliverables
    • FASTA file of assembly from full data (if it finishes).
    • Otherwise, FASTA file of assembly with in silico normalization to 50x coverage (if it finishes).
    • If we are unable to finish an assembly, no charge.

  • Tools Used
    • Trinity (Grabherr et al. 2011), a widely used transcriptome assembler, for eukaryotes
    • rnaSPAdes for bacteria
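
The fallback described in this section (full data first, then in silico normalization to 50x) can be sketched roughly as below. This is an illustration under stated assumptions, not our production workflow: the helper names, output directory names, CPU and memory values are hypothetical, and a `subprocess` timeout stands in for the time limit that the TACC scheduler actually enforces. The Trinity options used (`--no_normalize_reads`, `--normalize_max_read_cov`) exist in recent Trinity releases but vary by version, so check `Trinity --help` for the installed one.

```python
import subprocess
from typing import Callable, List, Optional


def trinity_command(left: str, right: str, out_dir: str,
                    max_cov: Optional[int] = None,
                    cpu: int = 16, max_memory: str = "100G") -> List[str]:
    """Build a Trinity command line for paired-end FASTQ input.

    max_cov=None requests a run on the full data (normalization disabled);
    an integer requests in silico normalization to that coverage.
    cpu and max_memory are placeholder values.
    """
    cmd = [
        "Trinity", "--seqType", "fq",
        "--left", left, "--right", right,
        "--CPU", str(cpu), "--max_memory", max_memory,
        "--output", out_dir,  # Trinity expects "trinity" in this name
    ]
    if max_cov is None:
        cmd.append("--no_normalize_reads")
    else:
        cmd += ["--normalize_max_read_cov", str(max_cov)]
    return cmd


def assemble_with_fallback(left: str, right: str, time_limit_s: float,
                           runner: Callable = subprocess.run) -> Optional[str]:
    """Try the full-data assembly, then the 50x-normalized one.

    Returns the output directory of the first run that finishes
    successfully within the time limit, or None if neither does.
    """
    attempts = [("trinity_full", None), ("trinity_norm50", 50)]
    for out_dir, max_cov in attempts:
        cmd = trinity_command(left, right, out_dir, max_cov)
        try:
            result = runner(cmd, timeout=time_limit_s)
        except subprocess.TimeoutExpired:
            continue  # ran out of time; move on to the next strategy
        if result.returncode == 0:
            return out_dir
    return None
```

A `None` result corresponds to the "no charge" case in the deliverables. The location of the final FASTA file relative to the output directory depends on the Trinity version, so the sketch returns the directory rather than guessing a file path.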

3. Optional: Homology Against Standard Databases

...