Genome Assembly

First, some background

De novo assembly is building a genome sequence from reads without a reference genome. Assembling reads with the help of a reference genome is called mapping assembly.

This paper is an excellent review of the theory and practice of NGS assemblers as of 2010. Read lengths will continue to get longer, error rates lower, coverage higher, but the basic concepts embodied in that paper will probably remain useful for several more years.

The figures embedded in this wiki page for educational purposes are from that paper.

Up front, we need to discuss the two basic assembler types, overlap graph and de Bruijn graph:

[Figure: overlap graph and de Bruijn graph assemblers]

In either case, more and longer reads are better. With an overlap graph (also called the overlap-layout-consensus or overlap-layout algorithm), your assembly improves much more with longer reads, and there are few parameters you can tweak. With a de Bruijn approach, your choice of k can have a strong impact on your assembly: a small k collapses repeats into branching nodes, while a large k needs longer reads and more coverage to keep the graph connected.
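To make the effect of k concrete, here is a minimal Python sketch of de Bruijn graph construction. It is not how Velvet or any real assembler is implemented; the toy genome, the read layout, and the function names `de_bruijn_graph` and `count_branches` are all made up for illustration. Each k-mer in a read becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and a node with more than one successor marks a place where the assembler cannot tell which way to go.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def count_branches(graph):
    """Count nodes with more than one outgoing edge (ambiguous extensions)."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


# Hypothetical toy genome containing the 4-base repeat "TGGC" twice.
genome = "ATGGCGTGGCA"
# Error-free, overlapping 8 bp reads that tile the toy genome.
reads = [genome[i:i + 8] for i in range(4)]

for k in (3, 4, 5, 6):
    graph = de_bruijn_graph(reads, k)
    print(f"k={k}: nodes={len(graph)}, branching nodes={count_branches(graph)}")
```

In this toy case the branch caused by the repeat disappears once k-1 is longer than the repeat. Real data add sequencing errors and uneven coverage, which push in the opposite direction and favor a smaller k.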

[Figure: Effect of trade-off in read length and coverage]
[Figure: k-mer distributions inherent in select genomes]
[Figures: Some example assembly statistics]

Many (many) assemblers are available. A list of assemblers can be found here.

We'll take a look at Velvet: it's a fast, easy-to-use de Bruijn assembler.

OK - let's try an exercise

Pick your data:

  1. simulated single-end 100 bp reads
  2. simulated paired-end reads 2x100, one insert size (400 bp)
  3. simulated paired-end reads 2x100, two insert sizes (400 + 3000 bp)
  4. real data 2x100

Velvet is available on Lonestar. Type:

 module load velvet

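Running Velvet is a two-stage process:
velveth builds the k-mer hash table from your reads, then velvetg builds the de Bruijn graph from that table and writes out the contigs.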
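The two stages can be sketched as a small Python helper that builds the command lines. This is only an illustration: the output directory, read file names, and k value are placeholders, the helper names `velvet_commands` and `run_velvet` are hypothetical, and the paired-end case assumes the two mates are interleaved in one FASTQ file. Check `velveth` and `velvetg` usage on your system before relying on the exact options.

```python
import subprocess


def velvet_commands(out_dir, k, reads, paired=False, ins_length=None):
    """Return (velveth_argv, velvetg_argv) for one Velvet run.

    out_dir, reads and k are placeholders chosen by the caller.
    """
    read_type = "-shortPaired" if paired else "-short"
    # Stage 1: hash the reads into k-mers.
    velveth = ["velveth", out_dir, str(k), "-fastq", read_type, reads]
    # Stage 2: build the graph and write contigs, letting Velvet
    # estimate expected coverage and the coverage cutoff.
    velvetg = ["velvetg", out_dir, "-exp_cov", "auto", "-cov_cutoff", "auto"]
    if paired and ins_length is not None:
        velvetg += ["-ins_length", str(ins_length)]
    return velveth, velvetg


def run_velvet(*args, **kwargs):
    """Run both stages in order; raises if either command fails."""
    for argv in velvet_commands(*args, **kwargs):
        subprocess.run(argv, check=True)


# Example: print the commands for a hypothetical paired-end data set.
for argv in velvet_commands("out_k31", 31, "reads_interleaved.fastq",
                            paired=True, ins_length=400):
    print(" ".join(argv))
```

Velvet works with odd k values, so it is common to try several odd values of k, each in its own output directory, and compare the resulting assemblies.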

Look at the assembly statistics:
N50, assembly size, maximum contig length, and % coding genes (wait for annotation). See also velvetContigStats in BioITeam/sphsmith.
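As a rough illustration of the first three statistics, here is a small Python sketch that computes them from a list of contig lengths. It is not the velvetContigStats script; the function names and the example lengths are made up. N50 is taken here as the length of the contig at which the running total, counting from the longest contig down, first reaches half of the assembly size.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def assembly_stats(lengths):
    """Summarize an assembly given its contig lengths in bp."""
    return {
        "contigs": len(lengths),
        "assembly_size": sum(lengths),
        "max_contig": max(lengths, default=0),
        "n50": n50(lengths),
    }


# Hypothetical contig lengths, in bp.
example = [8, 8, 4, 3, 3, 2, 2, 2]
print(assembly_stats(example))
```

In practice the lengths would come from the contigs FASTA file written by the assembler. Keep in mind that a higher N50 alone does not mean a more correct assembly, which is why assembly size and annotation-based checks are listed alongside it.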

Comparing assemblies with Mauve

http://asap.ahabs.wisc.edu/mauve/mauve-user-guide/
