This is the first step in tag-seq data analysis. We use the pipeline linked below to filter out duplicates, reads lacking the expected header, and low-quality reads:

https://github.com/z0on/tag-based_RNAseq
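
If the pipeline scripts are not already on your PATH, one way to set them up is sketched below (an assumption about your setup, not part of the original protocol; adjust the install location as needed):

# clone the pipeline repository and make its perl scripts callable from anywhere
git clone https://github.com/z0on/tag-based_RNAseq.git
chmod +x tag-based_RNAseq/*.pl
export PATH=$PATH:$(pwd)/tag-based_RNAseq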

Trimming fastq files

Within the directory containing the tag-seq fastq files, run the following:

tagseq_trim_launch.pl '\.fastq$' > clean

chmod +x clean

nohup ./clean &>trim.stats &

When this completes, trim.stats will contain the trimming statistics for every fastq file. Note that the last command can take a while, but because it runs under nohup you can close your ssh session and check back later; the command will keep running. You may need to type the first command by hand, since the single quotes can get garbled when copy-pasted.
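
To check on progress from a later session, something like the following works (a sketch; trim.stats simply accumulates output as each file finishes, and "clean" here is just the script name used above):

# see whether the trimming job is still running
pgrep -fl clean

# peek at the statistics written so far
tail trim.stats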

This will do adaptor trimming, deduplication, and quality filtering. Dhivya has found it useful to skip the quality-filtering step with some datasets, but running the full pipeline is enough to get an idea of data loss due to missing headers, duplicates, and low-quality reads.
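
One quick way to quantify that data loss is to compare read counts before and after trimming. The loop below is a sketch only; it assumes the pipeline writes each sample's trimmed reads to a file named like the input but ending in .trim (check the output file names in your directory and adjust if yours differ):

# per-sample read counts before and after trimming (4 lines per read in fastq)
for f in *.fastq; do
  raw=$(( $(wc -l < "$f") / 4 ))
  kept=$(( $(wc -l < "${f%.fastq}.trim") / 4 ))
  echo -e "$f\traw:$raw\tkept:$kept"
done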