GS De Novo Assembler performs de novo assembly of reads and generates contigs. Current version: 2.5.3. See the full Roche manual for details.
Input
- Sff files
- Fasta files
- Converted Sanger data: Fasta files and corresponding quality files
Output
- Consensus sequences (contigs)
- Corresponding quality scores
- ACE files
- Assembly metrics files
- Pairwise alignments
- Read status file
- Alignment views - GUI only
- Flowgrams - GUI only
- For paired end data, scaffold files
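As a quick sanity check on the contig output, the sketch below summarizes a contigs fasta file: number of contigs, total bases, largest contig, and N50. The helper name contig_stats and the file name in the usage comment are assumptions made for illustration. N50 is a common assembly statistic rather than something defined on this page, and the assembler's own metrics file remains the authoritative source.

```shell
#!/bin/sh
# contig_stats FASTA
# Hypothetical helper (not part of the Roche software): summarize a
# contigs fasta file using only awk and sort.
contig_stats() {
    # First pass: print one length per contig (sequence may span lines).
    awk '
        /^>/ { if (seen) print len; seen = 1; len = 0; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { if (seen) print len }
    ' "$1" | sort -rn | awk '
        # Second pass: lengths arrive sorted from largest to smallest.
        { lens[NR] = $1; total += $1 }
        END {
            if (NR == 0) { print "contigs=0 total_bases=0 largest=0 n50=0"; exit }
            run = 0
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                # N50: length of the contig at which the running sum
                # first reaches half of the total assembled bases.
                if (run * 2 >= total) { n50 = lens[i]; break }
            }
            printf "contigs=%d total_bases=%.0f largest=%d n50=%d\n", NR, total, lens[1], n50
        }
    '
}

# Usage (file name is an assumption; point it at your own contigs file):
#   contig_stats /data/filename/contigs.fna
```

The two-pass layout keeps the awk code short: the first pass only measures contigs, and sorting is left to sort so that the N50 loop can stop at the first contig that crosses the halfway point.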
Running GS De Novo Assembler
GUI Assembler -
- Launch it by typing gsAssembler
Command-line Assembler -
- runAssembly -o /data/filename /data/R_/D_
- For paired end data, runAssembly -o /data/filename -p /data/R_/D_
- Incremental de novo assembly - lets you add more data to an existing assembly when needed.
- Large or complex genomes - use this option for genomes larger than 15 Mb.
- Trimming database file - provide a file of fasta sequences (such as vectors) to be trimmed from reads.
- Screening database file - provide a file of contaminant sequences to screen reads against.
- cDNA assembly - use option -cdna
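The command forms above can be wrapped in a small dry-run helper that prints the runAssembly command line instead of executing it, which is handy for checking a batch script before committing compute time. This is a minimal sketch: build_assembly_cmd is a hypothetical name, the example paths are placeholders, and placing -cdna before the input path is an assumption, since this page only says to use that option.

```shell
#!/bin/sh
# build_assembly_cmd MODE OUTPUT INPUT
# Hypothetical helper (not part of the Roche software): print, but do
# not run, a runAssembly command line for one input path.
#   MODE   one of: single, paired, cdna
#   OUTPUT value passed to -o
#   INPUT  read data path (paths containing spaces are not handled)
build_assembly_cmd() {
    mode=$1
    out=$2
    input=$3
    case "$mode" in
        single) echo "runAssembly -o $out $input" ;;
        paired) echo "runAssembly -o $out -p $input" ;;
        # Assumption: -cdna goes before the input path.
        cdna)   echo "runAssembly -o $out -cdna $input" ;;
        *)      echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Usage (placeholder paths):
#   build_assembly_cmd paired /tmp/asm_out reads.sff
# prints:
#   runAssembly -o /tmp/asm_out -p reads.sff
```

Once the printed command looks right, it can be copied to the prompt or run with eval. Only the options named on this page are covered; consult the Roche manual for the flags behind the other options.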
Things to remember
- Reads shorter than 50 bp are removed by default.
- The tool produces better assemblies from sff files than from fasta files alone, because the flowgrams in sff files are used when computing signals.
- It is a good idea to use RepeatMasker to handle repeats before assembly.
- The current assembler version uses 3 to 4 bytes of memory per base and runs on a single processor only. If there is not enough memory to complete an assembly, try the incremental de novo assembly option.
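To get a rough idea of whether a data set will fit in memory, the two numbers above (the 50 bp minimum read length and 3 to 4 bytes per base) can be combined in a short script. This is a back-of-the-envelope sketch, not the assembler's own accounting: the helper name estimate_assembly_memory is hypothetical, it assumes the per-base figure applies to the total bases in the reads that survive the length filter, and it measures raw read lengths in a fasta file without any trimming.

```shell
#!/bin/sh
# estimate_assembly_memory FASTA [MIN_LEN]
# Hypothetical helper (not part of the Roche software). Counts reads of
# at least MIN_LEN bases (default 50) and multiplies their total bases
# by 3 and by 4 to give a low and high memory estimate in bytes.
estimate_assembly_memory() {
    fasta=$1
    min_len=${2:-50}
    awk -v min="$min_len" '
        # Close out the read collected so far and classify it by length.
        function end_read() {
            if (have) {
                if (len >= min) { kept++; bases += len } else { dropped++ }
            }
        }
        /^>/ { end_read(); have = 1; len = 0; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END {
            end_read()
            # %.0f avoids integer overflow in some awk builds for large totals.
            printf "reads_kept=%d reads_dropped=%d bases=%.0f mem_low_bytes=%.0f mem_high_bytes=%.0f\n", kept, dropped, bases, bases * 3, bases * 4
        }
    ' "$fasta"
}

# Usage (placeholder file name):
#   estimate_assembly_memory reads.fna
```

If the high estimate is close to or above the memory available on the machine, that is a hint to consider the incremental de novo assembly option mentioned above.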