...
Similar to our comments about version control, it is not always necessary to update to the newest and best tools; sometimes the old ones work "well enough" and can be "left alone". For this tutorial, we will be working with flexbar, which, like breseq, is installed in the BioITeam directory because it is not available as a TACC module. Test that flexbar is installed and working correctly using the which command and 'flexbar -h'. If these commands work, this is a perfect example of the benefits of the BioITeam. Flexbar has proven extremely difficult to install and configure in the past, which led me to place it in the $BI/bin directory, and even then it required additional options (the next section describes what those may have been). If it now works, that strongly suggests another member of the BioITeam found it and fixed it in some way so that it works by default. While we would never again fight with flexbar to make it work somewhere new or with a different set of data, as long as it is working we will keep using it. As an optional exercise, you could figure out how to accomplish the same result with the fastx_toolkit, using what you learned in the quality control tutorial.
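The installation check can be wrapped in a small shell function, shown below as a minimal sketch rather than part of the original tutorial. `check_tool` is a hypothetical helper name. It uses the shell builtin `command -v`, which behaves like `which` for ordinary executables. Only the first 20 lines of the help text are shown because the full output is long.

```shell
# Report whether a command is on PATH. "command -v" is a shell builtin that,
# for an ordinary executable, prints the same path that "which" would.
check_tool() {
    local tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool found at: $(command -v "$tool")"
    else
        echo "$tool not found on PATH" >&2
        return 1
    fi
}

# If flexbar is found, show the start of its (lengthy) help text.
if check_tool flexbar; then
    flexbar -h | head -n 20
fi
```

If nothing is printed except "flexbar not found on PATH", the setup steps in the next section are the first thing to check.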
Some additional modules must be loaded for flexbar to work correctly, and the LD_LIBRARY_PATH variable must be modified as follows:
module swap intel gcc
export LD_LIBRARY_PATH=/corral-repl/utexas/BioITeam/flexbar_v2.23_linux64:$LD_LIBRARY_PATH
For this tutorial, it is sufficient to simply type these commands. If this becomes something you want to do more often, or you want to submit it as a job, add these lines to your .bashrc so that they are loaded each time you log in and are available by default on the compute nodes.
Typing flexbar -h should display a lengthy list of optional arguments that can be used for a variety of purposes. For this tutorial, we will only focus on trimming the first 16 bases off each read, as these represent the 12-base molecular index and a 4-base constant region. See if you can work out the command from the help output, paying special attention to the -t option.
The following arguments are the ones that are needed to successfully trim the first 16 bases of the sequence:
In an idev shell this should take less than 5 minutes to complete. Once it finishes there should be 4 new files. They will all begin with "trimmed" if you used the answer above, or with whatever string you entered for the -t argument otherwise. These 4 files include the trimmed fastq files and the length distribution. Using the head command, see if you can figure out which file is which.
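The sketch below can help tell the output files apart and check the trimming. `peek_outputs` and `read_length_dist` are hypothetical helper names. The prefix "trimmed" is assumed to match the -t value, and the FASTQ names in the usage comments are placeholders. Comparing the read length distribution of an input file with that of a trimmed file is one way to confirm that about 16 bases were removed from each read. Depending on its settings, flexbar may also discard reads that end up too short, so the counts need not match exactly.

```shell
# Show the first 8 lines (two FASTQ records, or the start of a text file)
# of every file whose name starts with the given prefix.
peek_outputs() {
    local prefix="$1" f
    for f in "$prefix"*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        echo "==> $f <=="
        head -n 8 "$f"
    done
}

# Print "length count" pairs for the sequence lines of a FASTQ file
# (line 2 of every 4-line record), sorted by read length.
read_length_dist() {
    awk 'NR % 4 == 2 { counts[length($0)]++ }
         END { for (len in counts) print len, counts[len] }' "$1" | sort -n
}

# "trimmed" is assumed to be the string given to -t.
peek_outputs trimmed

# Usage (placeholder file names):
#   read_length_dist your_input.fastq
#   read_length_dist your_trimmed_output.fastq
```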
...