SOAP 2.18 (Short Oligonucleotide Analysis Package) is a versatile and fast aligner for short reads. It uses the 2-way Burrows-Wheeler transform (2way-BWT) to reduce the amount of memory needed for mapping. It handles only base space data.
- To get started using SOAP, visit the SOAP website.
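The BWT-based index idea can be illustrated with the hypothetical Python sketch below. It builds a plain, one-directional Burrows-Wheeler transform by sorting rotations and counts pattern occurrences with FM-index backward search. It is a teaching example only, not SOAP's 2way-BWT: SOAP2 also works with the BWT of the reversed text and stores its tables far more compactly, whereas this sketch recomputes occurrence counts naively. The function names are illustrative.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform by sorting all rotations (quadratic; for illustration only)."""
    if not text.endswith("$"):
        text += "$"  # unique end-of-text sentinel; '$' sorts before A, C, G and T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_text: str, pattern: str) -> int:
    """Count occurrences of pattern in the original text via FM-index backward search."""
    # C[ch] = number of characters in the text that sort strictly before ch
    first_column = sorted(bwt_text)
    C = {}
    for position, ch in enumerate(first_column):
        C.setdefault(ch, position)

    def occ(ch: str, i: int) -> int:
        # occurrences of ch in bwt_text[:i]; real indexes keep sampled counts instead
        return bwt_text[:i].count(ch)

    lo, hi = 0, len(bwt_text)  # half-open range of sorted rotations matching the suffix seen so far
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ(ch, lo)
        hi = C[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


genome = "ACGTACGTTACG"
genome_bwt = bwt(genome)
print(count_occurrences(genome_bwt, "ACG"))  # 3
print(count_occurrences(genome_bwt, "GTT"))  # 1
print(count_occurrences(genome_bwt, "CCC"))  # 0
```

Each pattern character narrows the range of matching rotations in constant work (given the occurrence counts), which is why BWT-based aligners avoid scanning the genome for every read.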
How to run SOAP
Because SOAP does not handle color space data, the only way to use SOAP with color space reads is to convert both the reads and the reference to mock base space.
Example pipeline for running SOAP with color space reads (for base space reads, start at step 3):
1. Convert the reference to mock base space.
bs2cs ref.fasta > ref.csfasta
cs2mbs ref.csfasta > ref.m.fasta
ref.fasta : reference in base space
ref.csfasta : reference in color space (temporary intermediate file)
ref.m.fasta : reference in mock base space
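The two conversion commands can be pictured with the hypothetical Python sketch below. In SOLiD color space each color encodes the transition between two adjacent bases, and that color equals the XOR of the bases' 2-bit codes (A=0, C=1, G=2, T=3). Mock base space then writes colors 0, 1, 2, 3 as letters so a base-space aligner will accept them. The sketch assumes the mapping 0->A, 1->C, 2->G, 3->T, a csfasta-style color string that keeps the leading base, '.' for colors next to an ambiguous base, and N for '.' in mock space. The real bs2cs and cs2mbs may differ in such details, for example in how they treat the leading base.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
MOCK_BASES = "ACGT"  # assumed mock mapping: color 0 -> A, 1 -> C, 2 -> G, 3 -> T


def base_to_color(seq: str) -> str:
    """Base space -> csfasta-style color space: leading base, then one color per adjacent pair."""
    seq = seq.upper()
    colors = []
    for left, right in zip(seq, seq[1:]):
        if left in BASE_CODE and right in BASE_CODE:
            # SOLiD 2-base encoding equals the XOR of the two 2-bit base codes
            colors.append(str(BASE_CODE[left] ^ BASE_CODE[right]))
        else:
            colors.append(".")  # pair touches an ambiguous base such as N
    return seq[:1] + "".join(colors)


def colors_to_mock(colors: str) -> str:
    """Color digits -> mock base letters; '.' becomes N (assumption)."""
    return "".join(MOCK_BASES[int(c)] if c in "0123" else "N" for c in colors)


def reference_to_mock(seq: str) -> str:
    """Whole reference conversion; this sketch keeps only the colors, not the leading base."""
    return colors_to_mock(base_to_color(seq)[1:])


print(base_to_color("AACGT"))      # A0131
print(reference_to_mock("AACGT"))  # ACTC
```

Because complementing both bases of a pair leaves its color unchanged, the reverse complement of a sequence has exactly the reversed color sequence (for example AACG gives colors 013, and its reverse complement CGTT gives 310).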
2. Convert the reads to mock base space.
cs2mbs -d -r in.csfasta > in.m.fasta
in.csfasta : reads file in color space
in.m.fasta : reads file in mock base space
-d : drop the first color space base during conversion. This ignores the leading base of each read, which is part of the primer.
-r : for each read, also include the reverse of the mock base space sequence.
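The read conversion can be sketched the same way, again assuming the 0/1/2/3 -> A/C/G/T mapping. The hypothetical functions below take (name, csfasta read) pairs, skip the leading primer base when drop_first is set (mirroring -d as described above) and, when add_reverse is set, also emit the reversed mock sequence (mirroring -r). The reversed copy matters because a read from the opposite strand appears as the reversed color string, while a base-space aligner's own reverse complement would also swap A<->T and C<->G, i.e. colors 0<->3 and 1<->2, which is not valid for colors. The `_rev` suffix is an invented naming choice, and whether cs2mbs -d also trims the first color is not stated here; this sketch does not.

```python
MOCK = {"0": "A", "1": "C", "2": "G", "3": "T", ".": "N"}  # assumed color -> mock base mapping


def read_to_mock(cs_read: str, drop_first: bool = True) -> str:
    """Convert one csfasta read such as 'T0123' to mock base space."""
    body = cs_read[1:] if drop_first else cs_read  # drop_first mirrors -d: skip the primer base
    return "".join(MOCK.get(ch, ch) for ch in body)  # non-color characters pass through unchanged


def convert_reads(records, drop_first=True, add_reverse=True):
    """Yield (name, mock_sequence) pairs from (name, csfasta_read) pairs."""
    for name, cs_read in records:
        mock = read_to_mock(cs_read, drop_first)
        yield name, mock
        if add_reverse:
            # add_reverse mirrors -r: reversed colors correspond to the opposite strand
            yield name + "_rev", mock[::-1]


for name, seq in convert_reads([("read1", "T0120")]):
    print(f">{name}\n{seq}")
```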
3. Create SOAP indexes for the reference genome.
2bwt-builder ref.m.fasta
ref.m.fasta : reference in mock base space (the index files are written with the base name ref.m.fasta.index)
4. Align using SOAP.
soap -D ref.m.fasta.index -v 3 -a in.m.fasta -o out
ref.m.fasta.index : base name for the SOAP reference indexes
in.m.fasta : reads file (in mock base space)
out : mapping output file
-v 3 : maximum number of mismatches allowed in the entire alignment
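To make the -v 3 criterion concrete, the hypothetical sketch below scans every forward-strand window of a reference and keeps the positions whose mismatch count (Hamming distance) to the read is within the budget. It illustrates only the acceptance rule, not how SOAP searches: SOAP uses its BWT index rather than a linear scan, and this sketch ignores the reverse strand. With mock base space input, note that a single base-space SNP changes two adjacent colors (one if it sits at a read end), so it consumes two mismatches of the budget, while a single color sequencing error consumes one.

```python
def count_mismatches(read: str, window: str) -> int:
    """Hamming distance between a read and an equal-length reference window."""
    return sum(1 for a, b in zip(read, window) if a != b)


def naive_hits(read: str, reference: str, max_mismatches: int = 3):
    """Return (position, mismatches) for each forward-strand window within the mismatch budget."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        mm = count_mismatches(read, reference[pos:pos + len(read)])
        if mm <= max_mismatches:  # accept at most max_mismatches, as -v 3 does with a budget of 3
            hits.append((pos, mm))
    return hits


print(naive_hits("ACGT", "TTACGTTT", max_mismatches=1))  # [(2, 0)]
```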
- If you see many warning messages such as 'length y < 0, countinue as 13', your reads are too short for SOAP to handle properly. Currently, SOAP supports only reads longer than 30 bp (according to a newsgroup article describing version 2.18; version 2.20 shows the same behavior).
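If you prefer to remove such reads before aligning, a small filter like the hypothetical Python sketch below can do it. It parses FASTA text (skipping '#' comment lines, which csfasta files may contain) and splits the records into reads longer than 30 bases and the rest; the threshold follows the note above, and the function names are illustrative.

```python
def parse_fasta(lines):
    """Yield (name, sequence) pairs, skipping blank lines and '#' comment lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        yield name, "".join(chunks)


def filter_short_reads(records, threshold=30):
    """Split records into reads longer than threshold bases and the rest."""
    kept, dropped = [], []
    for name, seq in records:
        (kept if len(seq) > threshold else dropped).append((name, seq))
    return kept, dropped


example = ["# csfasta-style comment", ">r1", "A" * 31, ">r2", "C" * 30, ">r3", "ACGT" * 5, "ACGT" * 3]
kept, dropped = filter_short_reads(parse_fasta(example))
print([name for name, _ in kept])     # ['r1', 'r3']
print([name for name, _ in dropped])  # ['r2']
```

In practice you would pass the lines of in.m.fasta to parse_fasta and write the kept records back out as FASTA before running soap.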