

The GS De Novo Assembler performs de novo assembly of reads and generates contigs. The current version is 2.5.3. See the full Roche manual for details.

Input options

  • SFF files
  • FASTA files
  • Converted Sanger data: FASTA files and their corresponding quality files

Output options

  • Consensus sequence (contigs)
  • Corresponding quality scores
  • ACE files
  • Assembly metrics files
  • Pairwise alignments
  • Read status file
  • Alignment views - GUI only
  • Flowgrams - GUI only
  • For paired end data, scaffold files

Running GS De Novo assembler

GUI Assembler - 

  • Can be accessed by typing gsAssembler

Commandline Assembler - 

  • runAssembly -o /data/filename /data/R_/D_
  • For paired end data, runAssembly -o /data/filename -p /data/R_/D_
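
The two command forms above differ only in the -p flag placed before the paired end input. A minimal sketch of a helper that prints (but does not run) the matching command line is shown here; the function name build_run_assembly and the example paths are hypothetical, and only the -o and -p options listed above are used.

```shell
# Print (not run) a runAssembly command line.
# Usage: build_run_assembly OUTPUT_DIR INPUT [paired]
# Both the function name and its arguments are illustrative assumptions.
build_run_assembly() {
    out_dir=$1
    input=$2
    mode=${3:-single}
    if [ "$mode" = "paired" ]; then
        # -p marks the input that follows it as paired end data
        echo "runAssembly -o $out_dir -p $input"
    else
        echo "runAssembly -o $out_dir $input"
    fi
}

# Hypothetical paths, for illustration only
build_run_assembly /tmp/asm_out /tmp/reads.sff
build_run_assembly /tmp/asm_out /tmp/paired_reads.sff paired
```

Printing the command first makes it easy to check the output directory and input path before starting a long assembly.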

Some options

  • Incremental de novo assembly - will allow you to add more data to the assembly when needed.
  • Large or complex genomes - for genomes larger than 15 Mb, use this option.
  • Trimming database file - Provide a file with fasta sequences that need to be removed (trimmed) from reads (like vectors).
  • Screening database file - Provide a file containing contamination sequences for screening.
  • cDNA assembly - use the -cdna option.
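
The incremental option mentioned above can be sketched from the command line as follows. This assumes the newAssembly, addRun, and runProject commands that ship with the Roche assembler suite are on the PATH; the project name, file names, helper function names, and the DRY_RUN switch are hypothetical. With DRY_RUN=1 (the default in this sketch) the commands are only printed.

```shell
# DRY_RUN=1 prints each command instead of executing it (assumed convention).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create an empty assembly project directory.
start_assembly() {
    run newAssembly "$1"
}

# Add one or more read files to an existing project, then (re)assemble.
add_reads_and_assemble() {
    project=$1
    shift
    for sff in "$@"; do
        run addRun "$project" "$sff"
    done
    run runProject "$project"
}

# Hypothetical project and file names
start_assembly my_project
add_reads_and_assemble my_project run1.sff
# Later, when more data arrives:
add_reads_and_assemble my_project run2.sff
```

Set DRY_RUN=0 to execute the commands instead of printing them.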

Things to remember

  • Reads shorter than 50 bp are removed by default.
  • The tool is more powerful and produces better assemblies with SFF files as input than with FASTA files alone. The flowgrams are used when computing signals.
  • It is a good idea to use RepeatMasker to handle repeats before assembly.
  • The current assembler version uses 3 to 4 bytes of memory per base and runs on a single processor only. If memory is insufficient for an assembly, try the incremental de novo assembly option.
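
As a rough worked example of the memory figure above, the sketch below multiplies a base count by 3 and by 4 bytes to give a range. It assumes "per base" refers to the total number of bases across all input reads and that 1 MB is 10^6 bytes; the function name estimate_assembly_mem_mb is hypothetical.

```shell
# Rough memory range for an assembly, using the 3 to 4 bytes per base figure.
# Argument: total number of bases across all input reads (an assumption
# about what "per base" means).
estimate_assembly_mem_mb() {
    total_bases=$1
    # bytes = bases * bytes per base; divide by 10^6 for MB (integer division)
    low=$(( total_bases * 3 / 1000000 ))
    high=$(( total_bases * 4 / 1000000 ))
    echo "${low}-${high} MB"
}

# 500 million bases of input
estimate_assembly_mem_mb 500000000   # prints: 1500-2000 MB
```

If the upper end of the range is close to the available RAM on the machine, that is a hint to consider the incremental option.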