Table of Contents

Overview

As mentioned in the introduction tutorial as well as the read processing tutorial, read processing can make a huge impact on downstream work. While cutadapt which was introduced in the read processing tutorial is great for quick evaluation or dealing with a single bad sample, it is not as robust as some other trimmers in particular when it comes to removing sequence that you know shouldn't be present but may exist in odd orientations (such as adapter sequences from the library preparation).

Learning objectives:

Install trimmomatic
Set up a small script to work around the annoying java invocation
Remove adapter sequences from some plasmids and evaluate effect on read quality, or assembly.

Installing trimmomatic

Trimmomatic's home page can be found at this link which includes links to the paper discussing the program, and a user manual. Trimmomatic is far above average for as far as programs go, most will not have a user manual, may not have been updated since originally published, etc. This is what makes it such a good tool.

Checking for installation

Code Block

language	bash
title	To verify you have placed the .jar file in a location that is already in your path try the following java invocation to pull up the help information.

java -jar $HOME/local/bin/trimmomatic-0.39.jar

If the above command works, jump down to the section on making a bash script. Otherwise continue with the next section to install the program

Installing using wget

In a new web browser window/tab, navigate to the trimmomatic home page. In the Downloading Trimmomatic section; right click on the 'binary' link for version 0.39 and copy that link address.

...

Code Block

language	bash

unzip Trimmomatic-0.39.zip
cdcp Trimmomatic-0.39
cp /trimmomatic-0.39.jar $HOME/local/bin

If you don't see the zip file or are unable to cd into the 0.39 directory after unzipping it let the instructor know.

Code Block

language	bash
title	To verify you have placed the .jar file in a location that is already in your path try the following java invocation to pull up the help information.

java -jar $HOME/local/bin/trimmomatic-0.39.jar

When you compare how wordy and complicated that is to the other programs you have encountered in the course, it makes sense that we would want a simpler way of accessing the program which is exactly what we will do next.

Anchor
bashScript
bashScript
Creating bash script to launch trimmomatic

The goal here will be to create an executable file with 2 lines. The first line will specify that it is a bash script. The second line will be the command we want to run, followed by information needed by bash to know to put the rest of the command you type after the executable file name.

Launch nano with the command: nano $HOME/local/bin/trimmomatic
In the nano window enter the following 2 lines exactly as they are typed
No Format
#!/bin/bash java -jar $HOME/local/bin/trimmomatic-0.39.jar $@
write and exit nano with control+o and control+x
make the trimmomatic file you just created executable
Code Block
language bash
title making file executable
chmod +x $HOME/local/bin/trimmomatic

Verify that you have successfully made your bash script by typing 'trimmomatic' (try using the tab key in the middle as further sign things are working) and then press enter. You should see the same help that you saw above.

Trimming adapter sequences

Example generic command

Code Block

language	bash
title	Example command for trimming illumina paired end adapters

trimmomatic PE <READ1> <READ2> -baseout Trim_Reads/<basename> ILLUMINACLIP:<FASTAadapterFILE>:4:30:10 MINLEN:30

Breaking down the parts of the above command:

Part	Purpose	replace with/note
trimmomatic	tell the computer you are using the bash script/command you just made
PE	tell trimmomatic program you are using the paired end mode
<READ1>	fastq file read1 you are trying to trim	actual name of fastq file
<READ2>	fastq file read2 you are trying to trim	actual name of paired fastq file
-baseout	add a prefix to the resulting trimmed files
Trimreads/<basename>	put all trimmed reads in a new folder. This is very good practice	typically you will replace with the first part of the fastq file name (before the Snumber or Lnumber depending on the file usually)
ILLUMINACLIP	tell trimmomatic program how you want to trim the adapters
<FASTAadapterFILE>	name of file you containing your adapters	good practice to copy the fasta file you want to use to the current directory
:4:30:10	allow 4 mismatches require 30 bp of overlap between R1 and R2 to identify the fragment as being less than the read length Require 10bp of sequence to match before removing anything
MINLEN:30	discard any reads that are less than 30bp after trimming

All of the above has been put together using the trimmomatic manual and experience in our lab given the adapters we use and the library kits we prepare the samples with.

What does this look like in practice?

Get some data

Code Block

language	bash
title	set up directories and copy files

mkdir -p $SCRATCH/GVA_trimmomatic_test/Trim_Reads
cd $SCRATCH/GVA_trimmomatic_test
cp $BI/gva_course/plasmid_qc/E1-7* .

cp $WORK/src/Trimmomatic-0.39/adapters/TruSeq3-PE-2.fa .

The ls command should show you 2 gzipped fastq files.

Trim the fastq files

The following command can be run on the head node. Like with FastQC if we are dealing with less than say 1-2Million reads, it is reasonable to run the command on the head node unless we have 100s of samples in which case submitting to the queue will be faster as the files can be trimmed all at once rather than 1 at a time. Use what you have learned in the class to determine if you think this command should be run on the head node. (this was covered in more detail in the first part of the evaluating and processing read quality tutorial.)

Code Block

language	bash
title	Figuring out how many reads are in each file
collapse	true

zgrep -c "^+$" *.fastq.gz

Code Block

language	bash
title	Example command for trimming illumina paired end adapters

trimmomatic PE E1-7_S187_L001_R1_001.fastq.gz E1-7_S187_L001_R2_001.fastq.gz -baseout Trim_Reads/E1-7.fastq.gz ILLUMINACLIP:TruSeq3-PE-2.fa:4:30:10 MINLEN:30

Evaluating the output

The last 2 lines of text you get should read:

No Format
Input Read Pairs: 6891 Both Surviving: 2391 (34.70%) Forward Only Surviving: 1844 (26.76%) Reverse Only Surviving: 2644 (38.37%) Dropped: 12 (0.17%) TrimmomaticPE: Completed successfully

From the last line we know things worked correctly.

2nd to last line tells us:

we had 6891 total reads
34.7% of reads both R1 and R2 were long enough to be kept after trimming
26.76% of reads and 38.37% of reads only 1 of the reads were long enough and/or not a complete duplicate of the other read
only 0.17% of reads were discarded for both R1 and R2. This is a rough estimate of adapter dimer being present in the sample.

When we interrogate the Trim_Reads folder with ls we see 4 files:

_1P
_2P
_1U
_2U

1/2 represent read 1 and read2 ... P/U represent paired and unpaired reads. They are kept separate as many programs require R1 and R2 files to be the same length when being used as input.

Useful command line for loop

Much as I suggest storing trimmed reads in a directory named Trim_Reads (or similar) I also suggest storing raw reads in a directory named Raw_Reads. The following command will pair all R1 reads in the Raw_Reads folder with its R2 pair, determine the base name, and generate a command to trim the file

Code Block

language	bash
title	command taken from my history to trim all files

for r1 in Raw_Reads/*_R1_*.fastq.gz; do r2=$(echo $r1|sed 's/_R1_/_R2_/'); name=$(echo $r1|sed 's/_R1_001.fastq.gz//'|sed 's/Raw_Reads\///'); echo "trimmomatic PE $r1 $r2 -baseout Trim_Reads/$name.fastq.gz ILLUMINACLIP:TruSeq3-PE-2.fa:4:30:10 MINLEN:30";done > trim_commands

Optional steps

Consdier moving over to the multiqc tutorial to get a large set of samples to trim all at once and use multiqc to determine which samples have the most surviving reads. The reads could then further be assembled in the Spades tutorial as they are all plasmid samples.

Return to GVA2020

Space shortcuts

Page tree

Versions Compared

Old Version 1

New Version 2

Key

Overview

Learning objectives:

Installing trimmomatic

Checking for installation

Installing using wget

Anchor
bashScript
bashScript
Creating bash script to launch trimmomatic

Trimming adapter sequences

Example generic command

Get some data

Trim the fastq files

Evaluating the output

Useful command line for loop

Optional steps

Space shortcuts

Page tree

Page History

Versions Compared

Old Version 1

New Version 2

Key

Overview

Learning objectives:

Installing trimmomatic

Checking for installation

Installing using wget

AnchorbashScriptbashScriptCreating bash script to launch trimmomatic

Trimming adapter sequences

Example generic command

Get some data

Trim the fastq files

Evaluating the output

Useful command line for loop

Optional steps

Anchor
bashScript
bashScript
Creating bash script to launch trimmomatic