...

As with our comments about version control, it is not always necessary to update to the newest and best tools; sometimes the old ones work "well enough" and can be "left alone". For this tutorial, we will be working with flexbar which, like breseq, is installed in the BioITeam because it is not a TACC module. Test that flexbar is installed and working correctly using the which command and 'flexbar -h'. If these commands work, this is a perfect example of the benefits of the BioITeam. Flexbar has proven extremely difficult to install and configure in the past, which led me to place it in the $BI/bin directory, and even then it still required additional options (expand the next section to see what those may have been). If it is working now, that strongly suggests some other member of the BioITeam fixed it in some way so that it works by default. We will not fight with flexbar again to make it work somewhere new or with a different set of data, but as long as it is working we will keep using it. As an optional exercise, you could figure out how to use the fastx_toolkit, using what you learned in the quality control tutorial, to accomplish the same result.
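The installation check described above can be sketched as a small shell function. This is only an illustration: check_tool is a hypothetical helper name that is not part of flexbar or the BioITeam setup, and it uses command -v as a stand-in for which.

```bash
# Hypothetical helper: report whether a command is on the PATH, and where.
check_tool() {
    tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool found at: $(command -v "$tool")"
    else
        echo "$tool not found on PATH" >&2
        return 1
    fi
}

# If flexbar is missing, the most likely cause is that $BI/bin is not on your PATH.
check_tool flexbar || echo "flexbar missing: check that \$BI/bin is on your PATH"
```

If the check succeeds, follow it with 'flexbar -h' to confirm that the program actually runs and does not just exist on the PATH.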

Expand
titleIf error messages are displayed you may need to try the following as has been necessary in previous years

Some additional modules must be loaded in order for it to work correctly, and the LD_LIBRARY_PATH variable must be modified as listed below.

Code Block
languagebash
titleIF YOU DO NOT HAVE AN ERRMESSAGE DO NOT TYPE THIS
collapsetrue
module swap intel gcc
export LD_LIBRARY_PATH=/corral-repl/utexas/BioITeam/flexbar_v2.23_linux64:$LD_LIBRARY_PATH

For this tutorial, it is sufficient to simply type these commands. If this becomes something you want to do more often, or want to submit as a job, it would be important to add these lines to your .bashrc so they are loaded each time you log in and are available by default to the compute nodes.
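One possible way to add such lines to a startup file without duplicating them on repeated runs is sketched below. The function name add_line_once is an assumption made for illustration; only use it if you actually needed the commands above.

```bash
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so running it twice does not duplicate anything.
add_line_once() {
    line="$1"
    file="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example use (commented out so nothing is changed by accident):
# add_line_once 'module swap intel gcc' "$HOME/.bashrc"
# add_line_once 'export LD_LIBRARY_PATH=/corral-repl/utexas/BioITeam/flexbar_v2.23_linux64:$LD_LIBRARY_PATH' "$HOME/.bashrc"
```

The single quotes matter: they keep $LD_LIBRARY_PATH from being expanded when the line is written, so it is expanded each time you log in instead.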

Typing flexbar -h should display a lengthy list of optional arguments which can be used for a variety of purposes. For this tutorial, we will focus only on trimming the first 16 bases off each read, as these represent the 12 bases of the molecular index and a 4 base constant region. See if you can figure out what the command is based on the help output; pay special attention to the -t option.
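To make the 16 base layout concrete without giving away the flexbar command, here is a toy illustration using cut on a single made-up sequence. The sequence and variable names are invented for this sketch and do not come from the tutorial data.

```bash
# Toy read: 12-base molecular index + 4-base constant region + remaining read.
# The sequence below is invented for illustration only.
read_seq="ACGTACGTACGTTTGAGGCTAGCTAGGATCC"

molecular_index=$(printf '%s\n' "$read_seq" | cut -c1-12)   # bases 1-12
constant_region=$(printf '%s\n' "$read_seq" | cut -c13-16)  # bases 13-16
trimmed_read=$(printf '%s\n' "$read_seq" | cut -c17-)       # everything after base 16

echo "molecular index: $molecular_index"
echo "constant region: $constant_region"
echo "trimmed read:    $trimmed_read"
```

In a real fastq record the quality line has to be shortened by the same 16 characters, which is one reason to use a dedicated trimming tool rather than cut.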

Expand
titleIf you need a hint without the answer click the triangle...

The following arguments are the ones that are needed to successfully trim the first 16 bases of the sequence:

Code Block
    -u, --max-uncalled NUM
          Allowed uncalled bases (N or .) in reads, default: 0
    -x, --pre-trim-left NUM
          Trim specified number of bases on 5' end of reads before alignment
    -t, --target STR
          Prefix for output file names
    -r, --reads FILE
          Input file with reads, that may contain barcodes
    -p, --reads2 FILE
          Second input file for paired read scenario
    -f, --format STR
          Input format of reads: csfasta, csfastq, fasta, fastq, fastq-sanger, fastq-solexa, fastq-i1.3, fastq-i1.5,
          fastq-i1.8 (illumina 1.8+)
    -d, --length-dist
          Write length distribution for read output files
Code Block
languagebash
titleclick here for the answer
collapsetrue
flexbar -u 100 -x 16 -t trimmed -r DED110_CATGGC_L006_R1_001.fastq -p DED110_CATGGC_L006_R2_001.fastq -f fastq -d

In an idev shell this should take less than 5 minutes to complete. Once completed there should be 4 new files, all of which begin with "trimmed" if you took the answer from the above help, or with whatever string you entered for the -t argument if you did not. These 4 files represent the trimmed fastq files and the length distributions. Using the head command, see if you can figure out which file is which.
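A minimal sketch of one way to look at the top of every output file in a single pass is shown below. The function name peek_outputs is hypothetical, and the prefix "trimmed" assumes you used the answer above.

```bash
# Hypothetical helper: print the first 4 lines of every file whose name
# starts with the given prefix (4 lines is one complete fastq record).
peek_outputs() {
    prefix="$1"
    for f in "$prefix"*; do
        [ -e "$f" ] || continue   # skip if the pattern matched nothing
        echo "==> $f <=="
        head -n 4 "$f"
    done
}

peek_outputs trimmed
```

Files that show a 4 line record starting with an @ header are fastq output; anything else among the 4 files is something other than reads.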

...