Tricks to preprocess SOLiD and 454 data

Created by Dhivya Arasappan, last modified on Dec 16, 2011

Some tricks to preprocess/assess ABI SOLiD data

Look for dominant sequences in your data
- grep -v '^>' F3.csfasta |sort|uniq -c -w 25|sort -n -r|head -20

- F3.csfasta : Input file, the raw csfasta file from ABI SOLiD
- This command groups reads that share the same first 25 characters, counts each group, and lists the 20 most frequent groups. Change 25 if you want more or less of the read to be considered when looking for dominant sequences.
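
The one-liner above can be wrapped in a small shell function so the prefix width and the number of reported groups are easy to change. This is only a sketch: the function name `dominant_prefixes` and its arguments are made up for illustration, it relies on the GNU `uniq -w` option, and it assumes the csfasta file may also contain `#` comment lines, which it skips along with the `>` header lines.

```shell
# dominant_prefixes FILE [WIDTH] [TOP]
# Hypothetical helper: counts reads sharing the same first WIDTH characters
# and prints the TOP most frequent groups (defaults: 25 and 20).
dominant_prefixes() {
    file=$1
    width=${2:-25}
    top=${3:-20}
    # Drop ">" header lines and "#" comment lines, then group by prefix.
    grep -v '^[>#]' "$file" \
        | sort \
        | uniq -c -w "$width" \
        | sort -n -r \
        | head -n "$top"
}

# Example call (file name taken from the command above):
# dominant_prefixes F3.csfasta 25 20
```

Each output line is a count followed by the first read of that group, so a very high count at the top usually points to an adapter or another over-represented sequence.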

Some tricks to preprocess/assess 454 data

Make 454 data into format of one sequence per line
- makeSeqsOneLine 454.fna > 454.modified.fna

- 454.fna : Input file of raw 454 data
- 454.modified.fna : Output file of modified 454 data
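
If `makeSeqsOneLine` is not available on your machine, something like the awk sketch below may serve as a stand-in. It assumes that the script simply joins the wrapped sequence lines of each read into a single line and leaves the header lines unchanged; the function name `one_line_fasta` is hypothetical.

```shell
# one_line_fasta FILE
# Hypothetical stand-in: prints every record as one header line
# followed by exactly one sequence line.
one_line_fasta() {
    awk '
        /^>/ {                      # header line: flush the previous sequence first
            if (seq != "") print seq
            print
            seq = ""
            next
        }
        { seq = seq $0 }            # sequence line: append to the current read
        END { if (seq != "") print seq }   # flush the last read
    ' "$1"
}

# Example call (file names taken from the command above):
# one_line_fasta 454.fna > 454.modified.fna
```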

Pull out read sequences (with read id) containing a certain pattern (Let's say 'TAGGAC')
- grep -B 1 'TAGGAC' 454.modified.fna |grep -v '^-' > 454.pattern.fna

- 454.modified.fna : Modified 454 data
- 454.pattern.fna : Fasta file with only reads containing the specified pattern.

Pull out read sequences (with read id) starting with a certain pattern (Let's say 'TAGGAC')
- grep -B 1 '^TAGGAC' 454.modified.fna |grep -v '^-' > 454.pattern.fna

- 454.modified.fna : Modified 454 data
- 454.pattern.fna : Fasta file with only reads starting with the specified pattern.
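
Both pattern searches can also be written with awk, which avoids the separator lines that `grep -B` prints between matches and never matches the pattern inside a read id. The sketch below assumes the one-sequence-per-line format produced above; the function name `reads_matching` is made up for illustration.

```shell
# reads_matching FILE REGEX
# Hypothetical helper: prints the read id and the sequence of every read
# whose sequence line matches REGEX.
# Assumes one header line followed by one sequence line per read.
reads_matching() {
    awk -v pat="$2" '
        /^>/     { header = $0; next }      # remember the most recent read id
        $0 ~ pat { print header; print }    # sequence matches: print the record
    ' "$1"
}

# Reads containing the pattern anywhere:
# reads_matching 454.modified.fna 'TAGGAC' > 454.pattern.fna
# Reads starting with the pattern:
# reads_matching 454.modified.fna '^TAGGAC' > 454.pattern.fna
```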

To get the reverse complement sequences for a fasta file, run the following command on fourierseq:

- reversecomplement.pl test.fasta|sed 's/U/T/g' > test.revcomp.fasta

- test.fasta : Fasta input file
- test.revcomp.fasta : Fasta output file, with reverse complemented sequences
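
`reversecomplement.pl` is a local script, so away from fourierseq a rough equivalent may be needed. The awk sketch below is such a stand-in, not the script itself: the function name `revcomp_fasta` is hypothetical, it assumes one sequence per line, and it maps A directly to T (and U to A) instead of post-processing with sed.

```shell
# revcomp_fasta FILE
# Hypothetical stand-in: reverse complements every sequence line and
# copies header lines unchanged. Assumes one sequence line per read.
revcomp_fasta() {
    awk '
        BEGIN {
            # Complement table for DNA/RNA bases; other characters pass through as-is.
            from = "ACGTUNacgtun"
            to   = "TGCAANtgcaan"
            for (i = 1; i <= length(from); i++)
                comp[substr(from, i, 1)] = substr(to, i, 1)
        }
        /^>/ { print; next }        # header line: copy as-is
        {
            out = ""
            for (i = length($0); i >= 1; i--) {   # walk the read from its last base
                c = substr($0, i, 1)
                out = out ((c in comp) ? comp[c] : c)
            }
            print out
        }
    ' "$1"
}

# Example call (file names taken from the command above):
# revcomp_fasta test.fasta > test.revcomp.fasta
```

Header lines pass through untouched here, whereas `sed 's/U/T/g'` applied to the whole file also rewrites any U that appears in a read id.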
