
Introduction

OK. So you just read the latest issue of Bioinformatics (or did a Google search) and have discovered some new pieces of software that promise to slice and dice your data in new, interesting, and useful ways. Most often, these tools will be designed to run in a Linux environment. Unfortunately, the helpful support staff at TACC may not have had time to test these tools and make a proper module out of them (or maybe they didn't want to make 1,000+ modules, one for every piece of bioinformatics software out there). Perhaps there is a TACC module, but it was made a month or two back when the software was at version 1.01, and now it's at version 1.03, which has a bug fix or some nifty new bells and whistles.

The bottom line is that you are going to find yourself in a situation where module spider comes up empty and you're on your own installing a piece of software that you are dying to try out on TACC.

Unfortunately, there is no double-click installer for TACC. Fortunately, a majority of the better and more mature programs out there (but by no means all bioinformatics software) can be fairly easily installed. If these instructions fail, you might need to find your nearest Linux guru. Or, you might try to consult Google and tinker with things a bit.

The overall steps for installing a program on a Linux system are:

  1. Download the executable or source code
  2. Compile or make the project (if installing from source code)
  3. Set up your $PATH to find the new executable

Note: Most Linux installs will work similarly on Mac OS X, with just a few additional preliminary steps (install Xcode, maybe some extra libraries, etc.). With more work, it is possible to set up a Linux-like environment on Windows as well. Both of these topics are outside the scope of what we are going to cover here.

Case 1: Installing a precompiled binary (executable)

For programs that are already compiled (converted from high-level source code in a language like C into machine-specific code), you are often offered several builds and need to download the one that matches the CPU architecture of your machine.

You can get your CPU architecture with this command:

Code Block: Finding out the CPU architecture of your machine
login1$ arch

Output might be something like i386 (for my MacBook) or x86_64 (for Lonestar).
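
On Linux, arch prints the same machine name as uname -m, so a script can use that value to pick a download. The sketch below assumes archives are named by architecture like the SSAHA2 file used in the example that follows; the 32-bit file name in it is a hypothetical placeholder, so check the real list of files on the download page.

```shell
#!/usr/bin/env bash
# Choose a precompiled archive that matches this machine's CPU architecture.
machine="$(uname -m)"   # same value that `arch` prints on Linux

case "$machine" in
  x86_64)
    archive="ssaha2_v2.5.5_x86_64.tgz" ;;   # 64-bit build (e.g. Lonestar)
  i386|i486|i586|i686)
    archive="ssaha2_v2.5.5_i686.tgz" ;;     # HYPOTHETICAL 32-bit file name
  *)
    archive="" ;;                           # no prebuilt binary for this CPU
esac

if [ -n "$archive" ]; then
  echo "$machine: download $archive"
else
  echo "$machine: no matching binary; consider compiling from source"
fi
```

If no build matches your architecture, installing from source (Case 2) is the usual fallback.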

Example: Install SSAHA2 precompiled binary

The website for the SSAHA2 read mapper has links to download executables compiled for several different architectures. Using commands that you have learned in earlier lessons, download the correct one to Lonestar and place it under the directory $HOME/local/bin.

Hint:

You can often right-click to copy the URL of a link on a website and then use wget to download it directly to TACC.

One possible answer:
Code Block
login1$ cdw
login1$ wget ftp://ftp.sanger.ac.uk/pub4/resources/software/ssaha2/ssaha2_v2.5.5_x86_64.tgz
login1$ tar -xvzf ssaha2_v2.5.5_x86_64.tgz
login1$ cd ssaha2_v2.5.5_x86_64
login1$ mkdir -p $HOME/local/bin
login1$ mv ssaha* $HOME/local/bin
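
If you are not sure what a downloaded archive will create, tar can list its contents without extracting anything (the -t option). This sketch builds a small stand-in archive in a scratch directory so that it runs without a download; demo_tool and its files are made-up names, and $scratch/local/bin stands in for $HOME/local/bin.

```shell
#!/usr/bin/env bash
# Stand-in for a downloaded .tgz, built in a throwaway directory.
scratch="$(mktemp -d)"
trap 'rm -rf "$scratch"' EXIT
cd "$scratch" || exit 1

mkdir -p demo_tool_v1.0                       # hypothetical release directory
printf '#!/bin/sh\necho "demo_tool 1.0"\n' > demo_tool_v1.0/demo_tool
chmod +x demo_tool_v1.0/demo_tool
echo "read me" > demo_tool_v1.0/README
tar -czf demo_tool_v1.0.tgz demo_tool_v1.0
rm -rf demo_tool_v1.0

# -t lists the archive contents without extracting anything
tar -tzf demo_tool_v1.0.tgz

# Same extract-and-move steps as above, into a scratch bin directory
tar -xzf demo_tool_v1.0.tgz
mkdir -p "$scratch/local/bin"
mv demo_tool_v1.0/demo_tool "$scratch/local/bin/"
"$scratch/local/bin/demo_tool"
```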

How the shell finds executables: $PATH

Now, you might want to tell your login shell that it should look for executable files in this new directory $HOME/local/bin. This will allow you to use the executable as a one-word command like you are used to:

Code Block
login1$ ssaha2

Instead of writing out the entire path to the executable to run it, like in one of these examples:

Code Block
login1$ /home1/01502/jbarrick/local/bin/ssaha2
login1$ $HOME/local/bin/ssaha2

Assuming you are using the bash shell, you can do this by editing your $HOME/.profile_user (or $HOME/.profile) configuration file. These files are basically just bash scripts that run whenever you log in. Add a line like this to the top of $HOME/.profile_user:

Code Block: Adding a new location to your PATH
export PATH="$HOME/local/bin:$PATH"

This sets the environment variable PATH to its old value with your new directory added to the front (the : separates multiple paths). As a result, the shell looks for executables in this new location first, and then in all of the standard locations after that. For more information on environment variables, see the Bash Beginner's Guide.
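
To see why the order matters, the sketch below creates two hypothetical copies of a program called mytool in a scratch directory (one standing in for a system-wide install, one for $HOME/local/bin) and compares putting the personal directory first with putting it last.

```shell
#!/usr/bin/env bash
# Two hypothetical copies of "mytool" in a throwaway directory.
scratch="$(mktemp -d)"
trap 'rm -rf "$scratch"' EXIT
mkdir -p "$scratch/system/bin" "$scratch/local/bin"
printf '#!/bin/sh\necho "mytool 1.01 (system copy)"\n' > "$scratch/system/bin/mytool"
printf '#!/bin/sh\necho "mytool 1.03 (personal copy)"\n' > "$scratch/local/bin/mytool"
chmod +x "$scratch/system/bin/mytool" "$scratch/local/bin/mytool"

# Each $( ... ) runs in a subshell, so these PATH changes do not leak out.
prepended="$(PATH="$scratch/local/bin:$scratch/system/bin:$PATH"; mytool)"
appended="$(PATH="$scratch/system/bin:$scratch/local/bin:$PATH"; mytool)"

echo "personal dir first: $prepended"   # the personal copy wins
echo "personal dir last:  $appended"    # the system copy wins
```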

Important! For this change to take effect, you must log out and log in again so that the shell re-reads the $HOME/.profile_user file. Alternatively, you can re-read it at any time with either of these equivalent commands:

Code Block
login1$ . $HOME/.profile_user
login1$ source $HOME/.profile_user
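
Each time the file is re-read, the export line adds $HOME/local/bin to the front of PATH again, so PATH can collect duplicate entries. These are harmless but make echo $PATH harder to read. One common bash idiom checks for the directory first; path_prepend is a hypothetical helper name, and this is a sketch rather than anything TACC provides.

```shell
#!/usr/bin/env bash
# Add a directory to the front of PATH only if it is not already listed.
# Surrounding PATH with colons lets one pattern match the first, a middle,
# or the last entry.
path_prepend() {   # hypothetical helper, not a standard command
  case ":$PATH:" in
    *":$1:"*) ;;                # already present: leave PATH alone
    *) PATH="$1:$PATH" ;;
  esac
  export PATH
}

path_prepend "$HOME/local/bin"
path_prepend "$HOME/local/bin"   # second call changes nothing
echo "$PATH"
```

To use it, you would put the function definition and the path_prepend "$HOME/local/bin" line in $HOME/.profile_user in place of the plain export line.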

If your path is not working, or you're curious about where else your shell is looking for commands and in what order, then you might want to see the value of your $PATH.

Code Block: Print out the value of your PATH or all environment variables
login1$ echo $PATH
login1$ env
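
PATH is one long colon-separated string, which can be hard to read. This bash sketch prints one directory per line, numbered in the order the shell searches them.

```shell
#!/usr/bin/env bash
# Split PATH on ":" into an array and print the entries in search order.
IFS=':' read -r -a entries <<< "$PATH"
i=0
for dir in "${entries[@]}"; do
  i=$((i + 1))
  printf '%2d  %s\n' "$i" "$dir"
done
```

A shorter, unnumbered version is echo "$PATH" | tr ':' '\n'.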

Warning! If you forget to include $PATH on the right side of the export line above, you will tell your shell to stop looking in the usual places for executables. This means that ls, cp, and other common commands will no longer work without typing out their whole paths, e.g. /bin/ls. This can be extremely confusing!
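
You can see the effect safely by making the mistake inside a subshell, which cannot change the PATH of your login session. In this sketch, command -v (a bash builtin) reports where ls would be found, or prints nothing if it cannot be found.

```shell
#!/usr/bin/env bash
# Reproduce the mistake inside command substitutions, which run in subshells,
# so the PATH of your real session is never changed.
broken="$(
  PATH="$HOME/local/bin"          # WRONG: the old value of PATH is dropped
  command -v ls || echo "ls: not found"
)"
fixed="$(
  PATH="$HOME/local/bin:$PATH"    # RIGHT: new directory first, old PATH kept
  command -v ls
)"
echo "wrong PATH: $broken"
echo "right PATH: $fixed"
```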

Handling multiple versions

If you install your own newer version of a command that is already available on TACC, you might get confused about which version runs when you type the command. The which command shows the full path to the executable that a one-word command will run.

Code Block: Determining the location of an executable
login1$ which ssaha2

Many tools will also have a -v or --version flag, or output their version information in a header when they are run. This can help you be sure that you are running the version that you think you are.
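
In bash, the builtins command -v and type -a answer the same question as which, and type -a also lists every matching file on your PATH in search order, which makes a shadowed older copy easy to spot. The sketch uses bash itself as the example so that it runs anywhere; substitute the program you installed (for example ssaha2) and that program's own version flag.

```shell
#!/usr/bin/env bash
tool="bash"   # substitute the program you installed, e.g. ssaha2

command -v "$tool"             # path that a one-word command would run
type -a "$tool"                # every match on PATH, first one wins
"$tool" --version | head -n 1  # version banner (the flag varies by program)
```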

Code Block: Finding out the version of a program
login1$ ssaha2 -v

Case 2: Install from the source code

Note on TACC compilers

There are multiple compilers available on TACC:

  • intel or icc - the default compiler. Preferred for optimizing speed of compiled executables.
  • gcc - the GNU compiler collection. Tends to be more compatible.

Be aware that if you compile libraries and the programs that link to them, you generally must compile all of these components with the same compiler.

If you run into an error during compilation, try the gcc compiler by loading its module. You may get a message like this:

Code Block
login1$ module load gcc

Error: You can only have one compiler module loaded at time.
You already have intel loaded.
To correct the situation, please enter the following command:

  module swap intel gcc/4.4.5

Please submit a consulting ticket if you require additional assistance.

So, follow the directions:

Code Block
login1$ module swap intel gcc/4.4.5

You will need to do this to get breseq to compile in the next example.

Example: Install breseq from a source code archive

breseq is a tool developed by the Barrick lab. You might use it in a later lesson. It is a good example of a tool that can be downloaded and compiled.

breseq web page
breseq download page

breseq uses the common GNU build system install sequence. Other tools that use the GNU build system can often be installed with the same ./configure; make; make install command sequence.

Code Block: Installing breseq from source
login1$ cdw
login1$ wget http://breseq.googlecode.com/files/breseq-0.19.tar.gz
login1$ tar -xvzf breseq-0.19.tar.gz
login1$ cd breseq-0.19
login1$ ./configure --prefix=$HOME/local
login1$ make
login1$ make install

The extra option --prefix to ./configure sets where the executable and any other files associated with the program will be installed. If you leave off this flag, it will try to install them in a system-wide location. You must have administrator privileges to do this and would generally have to substitute sudo make install for the last step. That won't work on TACC! (sudo means "super-user do".)
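
For packages built with the GNU build system, the other install locations are usually derived from the prefix. The sketch below prints the typical defaults (bin, lib, and share/man under the prefix); individual packages can change these, and ./configure --help lists the options that control them. It also checks whether the bin directory is already on your PATH.

```shell
#!/usr/bin/env bash
prefix="$HOME/local"            # the value passed to ./configure --prefix
# Typical GNU defaults derived from the prefix (a package may override them)
bindir="$prefix/bin"
libdir="$prefix/lib"
mandir="$prefix/share/man"
printf 'programs:   %s\nlibraries:  %s\nman pages:  %s\n' "$bindir" "$libdir" "$mandir"

# Programs installed into bindir run by name only if bindir is on PATH
case ":$PATH:" in
  *":$bindir:"*) echo "$bindir is on PATH" ;;
  *)             echo "$bindir is not on PATH yet" ;;
esac
```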

For some other tools, the instructions may tell you to skip straight to make, or you might also have to install some other programs or libraries that the tool you want to use needs to run. Generally, you can find this information in the online documentation or an INSTALL file in the root of the downloaded code.

More Examples

Example: Install the latest version of Bowtie2

There is a newer version of Bowtie2 available than the one loaded into a module on TACC. You might want to use it because it includes some new bug fixes. You can download either a source code version to compile using the above instructions or a binary version of bowtie2. Try to get this running on your own.

One possible complication if you are installing the binaries:

Bowtie2 consists of multiple executables. You will need to copy or move all of them into $HOME/local/bin to have a functioning Bowtie2 install. (Be sure that both bowtie2 and bowtie2-build work.)
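
One way to avoid missing an executable is to copy everything that starts with bowtie2 and has its executable bit set. In this sketch, a scratch directory with stand-in files replaces the real unpacked download and $HOME/local/bin; the file names other than bowtie2 and bowtie2-build are hypothetical, so check what your release actually contains.

```shell
#!/usr/bin/env bash
scratch="$(mktemp -d)"
trap 'rm -rf "$scratch"' EXIT
src="$scratch/bowtie2-release"   # stands in for the unpacked download
bin="$scratch/local/bin"         # stands in for $HOME/local/bin
mkdir -p "$src" "$bin"

# Stand-in executables (names beyond bowtie2 and bowtie2-build are hypothetical)
for name in bowtie2 bowtie2-build bowtie2-inspect; do
  printf '#!/bin/sh\necho "%s ok"\n' "$name" > "$src/$name"
  chmod +x "$src/$name"
done
echo "notes" > "$src/bowtie2-notes.txt"   # matches the pattern but is not executable

# Copy every regular, executable file whose name starts with bowtie2
for f in "$src"/bowtie2*; do
  if [ -f "$f" ] && [ -x "$f" ]; then
    cp "$f" "$bin/"
  fi
done

# Both main commands should now run by name
export PATH="$bin:$PATH"
bowtie2
bowtie2-build
```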

Other Cases

In other lessons we'll cover various deviations and elaborations on these two procedures in order to install specific programs, R modules, Perl modules, Python modules, etc.