Velvet 1.2.03 is on Fourierseq as of 03/12/12 - compiled with max Kmer = 73 and long sequences.  SPHS.
Velvet 1.1.07 was on Fourierseq as of 12/15/11 - compiled with max Kmer = 71.  SPHS.

This is the CURRENT version of the manual (which may not match the version we have installed).

Simple Velvet commands for paired-end Illumina data

Each of these commands has many options worth experimenting with; the steps below are only the most basic assembly instructions.

1. Preprocess the Illumina FASTQ files. Velvet needs the R1 and R2 reads interleaved in a single file.

shuffleSequences_fastq.pl R1.fastq R2.fastq R1_R2.fastq &
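Before shuffling, it can be worth confirming that both files hold the same number of reads, since the shuffled file pairs reads by position. The following is a minimal sketch, assuming standard four-line FASTQ records; the function names count_fastq_reads and check_pairs are made up for illustration and are not part of Velvet.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Velvet). Assumes 4-line FASTQ records.

# Print the number of reads in a FASTQ file (line count divided by 4).
count_fastq_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Succeed only if both files contain the same number of reads.
check_pairs() {
    r1=$(count_fastq_reads "$1")
    r2=$(count_fastq_reads "$2")
    if [ "$r1" -ne "$r2" ]; then
        echo "read count mismatch: $1 has $r1, $2 has $r2" >&2
        return 1
    fi
    echo "$r1 read pairs"
}

# Usage:
#   check_pairs R1.fastq R2.fastq && shuffleSequences_fastq.pl R1.fastq R2.fastq R1_R2.fastq
```

A matching read count does not prove that the reads are in the same order in both files, so this is only a quick sanity check.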

2. Make a hash of the reads using velveth. The hash length (k-mer length) matters, and you may have to run Velvet several times to find the best value. It should be long enough to avoid false positives, but not so long that reads are left out of the assembly. Recommendation: try odd k-mer lengths between 61 and 73 (Velvet requires an odd hash length).

nohup velveth outputdirectory_61 61 -shortPaired R1_R2.fastq &

outputdirectory_61: the output directory
61: k-mer length (hash length)
-shortPaired: indicates Illumina Sanger-like paired-end data
R1_R2.fastq: the input FASTQ file (shuffled R1 and R2 reads)

Note: If you have multiple fastq files that have been processed using shuffleSequences_fastq.pl, you can supply them all to velveth, as below:

nohup velveth outputdirectory_61 61 -shortPaired R1_R2.1.fastq R1_R2.2.fastq R1_R2.3.fastq &

3. Assembly using velvetg

nohup velvetg outputdirectory_61 -cov_cutoff 10 -ins_length 400 &

outputdirectory_61: the output directory
-cov_cutoff 10: coverage cutoff for the assembly (try -cov_cutoff auto to let Velvet infer this value).
-ins_length 400: expected distance between the two reads of a pair (required for paired-end data). This is your library/template size.
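Because finding a good k-mer length usually means repeating steps 2 and 3 several times, the two commands can be wrapped in a loop. This is only a sketch: velvet_sweep and the DRY_RUN variable are hypothetical names introduced here, the output directory naming follows the outputdirectory_61 pattern used above, and -cov_cutoff auto is chosen purely for illustration.

```shell
#!/bin/sh
# Sketch only: velvet_sweep is a hypothetical helper, not a Velvet command.
# With DRY_RUN=1 it prints the commands instead of running them.

velvet_sweep() {
    reads=$1        # shuffled FASTQ from shuffleSequences_fastq.pl
    ins_length=$2   # expected insert size, e.g. 400
    shift 2
    for k in "$@"; do
        outdir="outputdirectory_$k"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "velveth $outdir $k -shortPaired $reads"
            echo "velvetg $outdir -cov_cutoff auto -ins_length $ins_length"
        else
            # Only run velvetg if velveth succeeded for this k.
            velveth "$outdir" "$k" -shortPaired "$reads" &&
            velvetg "$outdir" -cov_cutoff auto -ins_length "$ins_length"
        fi
    done
}

# Usage:
#   DRY_RUN=1 velvet_sweep R1_R2.fastq 400 61 65 69 73
```

Running with DRY_RUN=1 first makes it easy to check the generated commands before committing to several long assemblies.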

4. Checking the assembly output

contigs.fa in the output directory contains all the assembled contigs. Each contig header gives the length and coverage of that contig.
Note that the length and coverage in the header are expressed in k-mers and in k-mer coverage, respectively. E.g. for a 500 bp contig and a k-mer length of 21, the length in the header will be 480 (500 - 21 + 1).
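The conversion from the header length back to base pairs is simply length in k-mers + k - 1. A small sketch of that arithmetic, where kmers_to_bp is a made-up helper name:

```shell
#!/bin/sh
# Convert a contig length given in k-mers (as in the contigs.fa header)
# to base pairs: bp = kmers + k - 1.
# kmers_to_bp is a hypothetical helper, not part of Velvet.
kmers_to_bp() {
    echo $(( $1 + $2 - 1 ))
}

kmers_to_bp 480 21   # prints 500, matching the example above
```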

To list the headers of the 50 largest contigs, with the length in bp prepended:

grep '^>' contigs.fa | awk -F_ -v k=61 '{ print $4 + k - 1 "\t" $0 }' | sort -n -r | head -50

Set k in the command to the actual k-mer length used with velveth (61 in this example).
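The same header parsing can give a quick overall summary of an assembly, which helps when comparing runs at different k-mer lengths. This is a sketch under the assumption that headers look like >NODE_<id>_length_<kmers>_cov_<coverage>, so that the fourth underscore-separated field is the length in k-mers; contig_summary is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: summarise contig lengths (in bp) from contigs.fa headers.
# Assumes headers of the form >NODE_<id>_length_<kmers>_cov_<coverage>.
contig_summary() {
    fasta=$1   # path to contigs.fa
    k=$2       # k-mer length used with velveth
    grep '^>' "$fasta" | awk -F_ -v k="$k" '
        { bp = $4 + k - 1; total += bp; if (bp > max) max = bp }
        END { printf "contigs=%d total_bp=%d longest_bp=%d\n", NR, total, max }'
}

# Usage:
#   contig_summary outputdirectory_61/contigs.fa 61
```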
