...

Code Block
languagebash
titlebreseq command
cd $SCRATCH/BDIB_breseq_lambda_mixed_pop
module unload samtools
breseq -j 48 -r lambda.gbk lambda_mixed_population.fastq &> log.txt &
Expand
titleWhy do we have to unload samtools before running breseq?

As alluded to in the SNV tutorial, breseq uses (and includes) an older version of samtools that had slightly different options, so breseq will not run without the correct version of samtools. Speak to your instructors about changes to your .bashrc file and about setting up breseq to run in the best way if you expect to use it frequently.

 

A bunch of progress messages will stream by during the breseq run, which would clutter the screen if not for the redirection to the log.txt file. The & at the end of the line tells the system to run the previous command in the background, which lets you keep typing and executing other commands while breseq runs. The output text details several steps in a pipeline that combines mapping (using SSAHA2), variant calling, annotating mutations, etc. You can examine these steps by peeking in the log.txt file as your job runs using tail log.txt. While breseq is running, let's look at what the different parts of the command are actually doing:
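The redirection and backgrounding described above can be tried on a harmless stand-in before using them with breseq. This is only a minimal sketch: fake_pipeline and demo_log.txt are hypothetical names invented for illustration and are not part of breseq. The form "> demo_log.txt 2>&1" is the portable spelling of the bash shorthand "&> demo_log.txt" used in the breseq command.

```bash
# Hypothetical stand-in for a long-running program (NOT breseq):
# it writes progress messages to both stdout and stderr.
fake_pipeline() {
    echo "step 1: progress message on stdout"
    echo "step 2: progress message on stderr" >&2
    echo "step 3: finished"
}

# Send stdout and stderr to demo_log.txt and run in the background.
# In bash, "&> demo_log.txt &" is shorthand for the same thing.
fake_pipeline > demo_log.txt 2>&1 &

# The prompt returns immediately; wait blocks until the background job ends.
wait

# Peek at the end of the log, as you would with "tail log.txt".
tail -n 1 demo_log.txt
```

Because both output streams go to the same file, nothing is printed to the screen until you ask for it with tail.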

...

This will finish very quickly (likely before you begin reading this), with a final line of "Creating index HTML table...". Check this using the tail command. If you instead see this output:

No Format
!!!!!!!!!!!!!!!!!!!!!!!> FATAL ERROR <!!!!!!!!!!!!!!!!!!!!!!!
Error running command:
[system] /opt/apps/samtools/1.3/bin/samtools sort ./03_candidate_junctions/best.unsorted.bam ./03_candidate_junctions/best
Result code: 256
FILE: libbreseq/common.h   LINE: 1294
!!!!!!!!!!!!!!!!!!!!!!!> FATAL ERROR <!!!!!!!!!!!!!!!!!!!!!!!
breseq: libbreseq/common.h:92: void breseq::my_assertion_handler(bool, const char*, const char*, int, const string&): Assertion `false' failed.
Aborted

It means that you tried to run breseq with the incorrect version of samtools loaded. Execute the following 4 commands and then retry running breseq. 

Code Block
languagebash
titleCommands IF breseq failed
collapsetrue
#DO NOT DO THIS IF BRESEQ COMPLETED SUCCESSFULLY 
rm -r 0*
rm -r data
rm -r output
module unload samtools
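If you would rather not inspect the log by eye, the success and failure cases above could be told apart with a small helper along these lines. This is a sketch under assumptions: check_breseq_log is a hypothetical function (not something breseq provides), and it relies only on the two strings quoted in this tutorial, "FATAL ERROR" and "Creating index HTML table", which may differ in other breseq versions.

```bash
# Hypothetical helper (not part of breseq): classify a breseq log file
# using the two strings quoted in this tutorial.
check_breseq_log() {
    log_file="$1"
    if grep -q "FATAL ERROR" "$log_file"; then
        echo "failed"
    elif tail -n 1 "$log_file" | grep -q "Creating index HTML table"; then
        echo "completed"
    else
        echo "still running or unknown"
    fi
}

# Usage from the run directory, only if log.txt exists there:
if [ -f log.txt ]; then
    check_breseq_log log.txt
fi
```

Only a result of "failed" would call for the cleanup commands above.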

 

Looking at breseq predictions

breseq produced a lot of directories beginning 01_sequence_conversion, 02_reference_alignment, ... Each of these contains intermediate files that are used to 'pick up where it left off' if the run doesn't complete successfully. These can be deleted when the run completes, or explored if you are interested in the inner guts of what is going on. More importantly, breseq will also produce two directories called data and output, which contain the files used to create the .html output files and the .html output files themselves, respectively. The most interesting files are the .html files, which can't be viewed directly on Lonestar. Therefore we first need to copy the output directory back to your desktop computer. Go back to the first tutorial (BDIB_breseq_tutorial_1) and use scp to transfer the contents of the output directory back to your local computer.

...

Expand
titleWe have previously covered using scp to transfer files, but here we present another detailed example. Click to expand.

To use scp you will need to open a second terminal window that is on your desktop and not on the remote TACC system. It can be tricky to figure out where the files are on the remote TACC system, because your desktop won't understand what $HOME, $WORK, $SCRATCH mean (they are only defined on TACC).

To figure out the full path to your file, you can use the pwd command in your terminal on TACC in the window that you ran breseq in (it should contain an "output" folder). Rather than copying the entire contents of the folder which can be rather large, we are going to add a twist of compressing the entire folder into a single compressed archive using the tar command so that the size will be smaller and it will transfer faster:

Code Block
languagebash
titleCommand to type in TACC
cd $SCRATCH/BDIB_breseq_lambda_mixed_pop
tar -czvf output.tar.gz output  # the czvf options in order mean Create, gZip, Verbose, File (the archive file name follows)
pwd

You can then copy and paste that information (in the correct position) into the scp command on the desktop's command line:

Code Block
languagebash
titleCommand to type in the desktop's terminal window
scp -r <username>@ls5.tacc.utexas.edu:<the_directory_returned_by_pwd>/output.tar.gz .
 
# Enter your password and Token number and wait for the file transfer to complete
 
tar -xvzf output.tar.gz  # the new "x" option at the front means eXtract 
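To avoid pasting the path into the wrong position, one option is to have the TACC terminal print the finished scp command for you. This is a sketch only: print_scp_command is a hypothetical helper, and the output still contains the <username> placeholder, which you must replace yourself before running the command on your desktop.

```bash
# Hypothetical helper: print the scp command for an archive that sits in
# the current TACC directory. <username> is a placeholder to fill in by hand.
print_scp_command() {
    archive="$1"
    echo "scp -r <username>@ls5.tacc.utexas.edu:$(pwd)/${archive} ."
}

# Run this on TACC in the directory that holds output.tar.gz,
# then copy the printed line into the desktop's terminal window.
print_scp_command output.tar.gz
```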

...

  • Deatherage, D.E., Barrick, J.E. (2014) Identification of mutations in laboratory-evolved microbes from next-generation sequencing data using breseq. Methods Mol. Biol. 1151:165-188.

 

Examining breseq results

Exercise: Can you figure out how to archive all of the output directories and copy only those files (and not all of the very large intermediate files) back to your machine? - without deleting any files?

Expand
titleClick here for a hint without the answer

You will want to use the tar command again, but you will need to use a wildcard to specify what goes into the compressed file, and only the output directories within each of the wildcard-matched directories.

Code Block
languagebash
titleclick here to check your solution, or get the answer
collapsetrue
tar -cvzf output.tgz output_*/output
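To convince yourself that the wildcard picks up only the output directories, the same command can be tried on a throwaway directory tree and the archive listed without extracting it. This is a minimal sketch: every directory and file name below (output_A, output_B, demo_outputs.tgz, and so on) is made up for the demonstration and is not produced by breseq.

```bash
# Build a throwaway directory tree that mimics several breseq runs.
# All names below are made up for this demonstration.
demo_dir=$(mktemp -d)
cd "$demo_dir"
for sample in output_A output_B; do
    mkdir -p "$sample/output" "$sample/data" "$sample/01_sequence_conversion"
    echo "results" > "$sample/output/index.html"
    echo "large intermediate file" > "$sample/data/reference.bam"
done

# The wildcard expands to output_A/output and output_B/output only.
tar -czf demo_outputs.tgz output_*/output

# List the archive contents without extracting (t = list).
tar -tzf demo_outputs.tgz
```

The listing should show only paths under each output directory, while the data and intermediate directories remain on disk, untouched and unarchived.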
Expand
titleHere are the commands we showed you for the previous example (with the trick of getting a single compressed output directory you just learned) to transfer so you don't have to scroll back and forth. See if you can remember how to do it without going back over the lesson.

To use scp you will need to run it in a terminal that is on your desktop and not on the remote TACC system. It can be tricky to figure out where the files are on the remote TACC system, because your desktop won't understand what $HOME, $WORK, $SCRATCH mean (they are only defined on TACC).

To figure out the full path to your file, you can use the pwd command in your terminal on TACC in the window that you ran breseq in (it should contain an "output" folder). Rather than copying the entire contents of the folder which can be rather large, we are going to add a twist of compressing the entire folder into a single compressed archive using the tar command so that the size will be smaller and it will transfer faster:

Code Block
languagebash
titleCommand to type in TACC
tar -czvf output.tar.gz output_*/output  # the czvf options in order mean Create, gZip, Verbose, File (the archive file name follows)
pwd

You can then copy and paste that information (in the correct position) into the scp command on the desktop's command line:

Code Block
languagebash
titleCommand to type in the desktop's terminal window
scp -r <username>@ls5.tacc.utexas.edu:<the_directory_returned_by_pwd>/output.tar.gz .
tar -xvzf output.tar.gz  # the new "x" option at the front means eXtract 

 

Click around in the results and see the different types of mutations you can detect.