...

Code Block
FASTX_toolkit module description
module spider fastx

The FASTX toolkit is not available as a module on lonestar5, but it is installed in my /work/01184/daras/bin/ directory. Because that directory is in your PATH, you can run the toolkit's programs without typing the full path to them.

Code Block
FASTX_toolkit module description
which fastx_trimmer


echo $PATH

fastx_trimmer -h
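The same PATH check can be done in a script-friendly way. The sketch below is only an illustration: the helper name `on_path` is made up for this example, and it relies on the standard shell builtin `command -v` rather than anything specific to the FASTX toolkit.

```shell
# on_path is a hypothetical helper name used only for this illustration.
# It reports whether the shell can find a program through PATH.
on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH"
    fi
}

# Check the trimmer used in this tutorial.
on_path fastx_trimmer

# Show the PATH directories one per line, which is easier to scan
# than the colon-separated output of echo $PATH.
echo "$PATH" | tr ':' '\n'
```

If the first command reports that the program is not found, the directory holding the toolkit is probably missing from your PATH.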


Let's run fastx_trimmer to trim all input sequences down to 90 bases:

...

  • The -l 90 option says that base 90 should be the last base kept (i.e., trim each read down to 90 bases).
  • The -Q 33 option specifies how base qualities on the 4th line of each FASTQ entry are encoded. The FASTX toolkit is an older program, written at a time when Illumina base qualities were encoded differently, so the offset has to be given explicitly. These days Illumina base qualities follow the Sanger FASTQ standard (Phred score + 33 to make an ASCII character).
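To make the "Phred score + 33" encoding concrete, here is a minimal sketch that decodes a single quality character back into its Phred score. The function name `phred_score` is an assumption made for this example and is not part of the FASTX toolkit.

```shell
# phred_score is a hypothetical helper for illustration only.
# It decodes one FASTQ quality character, assuming the Sanger offset of 33.
phred_score() {
    local char="$1"
    local ascii
    # printf with a leading single quote prints the character's ASCII code.
    ascii=$(printf '%d' "'$char")
    echo $(( ascii - 33 ))
}

phred_score 'I'   # ASCII 73, prints 40
phred_score '5'   # ASCII 53, prints 20
phred_score '!'   # ASCII 33, prints 0
```

A tool that instead assumed the older offset of 64 would read every one of these characters as a score 31 points lower, which is why the encoding has to be stated with -Q 33.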

Exercise: fastx toolkit programs

What other fastx manipulation programs are part of the fastx toolkit?

...

Hint

Type fastx_ and then press Tab to see their names.
See all the programs like this:

...

fastx toolkit programs

...

Exercise: What if you just want to get rid of reads that are too low in quality?


Code Block
fastq_quality_filter syntax
fastq_quality_filter -q <N> -p <N> -i <inputfile> -o <outputfile>
-q N: Minimum Base quality score
-p N: Minimum percent of bases that must have [-q] quality
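The way -q and -p work together can be sketched for a single quality string. This is an illustration of the rule as described above, not the toolkit's actual implementation: the function name `passes_filter` is made up, a Phred+33 encoding is assumed, and both thresholds are assumed to be inclusive.

```shell
# passes_filter QUALITY_STRING MIN_Q MIN_PERCENT
# Hypothetical helper: prints "keep" if at least MIN_PERCENT of the bases
# have a Phred score of MIN_Q or higher, otherwise prints "drop".
passes_filter() {
    printf '%s\n' "$1" | awk -v min_q="$2" -v min_pct="$3" '
        # awk has no ord() function, so build an ASCII lookup table first.
        BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
        {
            n = length($0)
            good = 0
            for (i = 1; i <= n; i++)
                if (ord[substr($0, i, 1)] - 33 >= min_q) good++
            # Compare good/n with min_pct/100 without dividing.
            verdict = (good * 100 >= min_pct * n) ? "keep" : "drop"
            print verdict
        }'
}

# "I" encodes Phred 40 and "5" encodes Phred 20.
passes_filter 'IIIIIIII55' 30 80   # 8 of 10 bases reach Q30, prints keep
passes_filter 'IIIIIII555' 30 80   # 7 of 10 bases reach Q30, prints drop
passes_filter 'IIIIIII555' 20 80   # all bases reach Q20, prints keep
```

Note that a read is kept or dropped as a whole; unlike trimming, filtering never shortens a read.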

Let's try it on our data: filter it to keep only reads in which at least 80% of the bases have a quality score of 20 or above.

Code Block
Run fastq_quality_filter
fastq_quality_filter -q 20 -p 80 -i data/Sample1_R1.fastq -Q 33 -o Sample1_R1.filtered.fastq

Exercise: Compare the results of fastx_trimmer vs fastq_quality_filter


Code Block
Compare results
grep '^@HWI' Sample1_R1.trimmed.fastq | wc -l
grep '^@HWI' Sample1_R1.filtered.fastq | wc -l
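Counting header lines with grep depends on knowing how the read names start, and a quality line can itself begin with "@". As a cross-check, the sketch below counts reads by dividing the line count by four. It assumes every record occupies exactly four lines (no wrapped sequences); the function name `count_reads` and the read names in the demo file are invented for this example.

```shell
# count_reads is a hypothetical helper for illustration only.
# It assumes each FASTQ record is exactly four lines long.
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Build a tiny two-read demo file; the second quality line starts with "@".
demo=$(mktemp)
cat > "$demo" <<'EOF'
@HWI-EXAMPLE:1
ACGT
+
IIII
@HWI-EXAMPLE:2
ACGT
+
@III
EOF

count_reads "$demo"     # prints 2
grep -c '^@' "$demo"    # prints 3, because the quality line is counted too
rm -f "$demo"
```

On the tutorial data you would run count_reads on Sample1_R1.trimmed.fastq and Sample1_R1.filtered.fastq and compare the two numbers.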


Adaptor Trimming

...