...

Here is a comparison of the configurations of ls6 and stampede2. As you can see, stampede2, launched in 2017, is the larger cluster, while ls6, launched this year, has fewer but more powerful nodes.


| | ls6 | stampede2 |
|---|---|---|
| login nodes | 3; 128 cores each; 256 GB memory | 6; 28 cores each; 128 GB memory |
| standard compute nodes | 560; 128 cores per node; 256 GB memory | 4,200 KNL (Knights Landing): 68 cores per node (272 virtual), 96 GB memory; 1,736 SKX (Skylake): 48 cores per node (96 virtual), 192 GB memory |
| GPU nodes | 16 total; 128 cores per node; 256 GB memory; 2x NVIDIA A100 GPUs w/ 40GB RAM onboard | -- |
| batch system | SLURM | SLURM |
| maximum job run time | 48 hours, normal queue; 2 hours, development queue | 96 hours on KNL nodes, normal queue; 48 hours on SKX nodes, normal queue; 2 hours, development queue |

...

The module system is an incredibly powerful way to have literally thousands of software packages available, some of which are incompatible with each other, without causing complete havoc. The TACC staff builds each package from source code and stages it in a well-known location that is NOT on your $PATH. Then, when a module is loaded, its binaries are added to your $PATH.

...

Code Block
languagebash
titleHow module load affects $PATH
# first type "matlab" to show that it is not present in your environment:
matlab
# it's not on your $PATH either:
which matlab

# now add matlab to your environment and try again:
module load matlab
matlab
# and see how it's now on your $PATH:
which matlab
# you can see the new directory at the front of $PATH
echo $PATH

# to remove it, use "unload"
module unload matlab
matlab
# gone from $PATH again...
which matlab
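
What module load does to $PATH can be imitated by hand. The sketch below is not how the module system is implemented; it only illustrates the idea, assuming a hypothetical program named mytool staged in a temporary directory that stands in for a package location.

```bash
# Hypothetical stand-in for a package directory that is not on $PATH
pkg_dir=$(mktemp -d)
printf '#!/bin/sh\necho "mytool ran"\n' > "$pkg_dir/mytool"
chmod +x "$pkg_dir/mytool"

# Before "loading": the shell cannot find mytool
before=$(command -v mytool || echo "not found")

# "Load": put the package directory at the front of $PATH
old_path=$PATH
PATH="$pkg_dir:$PATH"
after=$(command -v mytool)

# "Unload": restore the previous $PATH
PATH=$old_path
unloaded=$(command -v mytool || echo "not found")

echo "before:   $before"
echo "after:    $after"
echo "unloaded: $unloaded"

# Clean up the temporary directory
rm -rf "$pkg_dir"
```

The program is found only while its directory is on $PATH, which is the same effect you observe around module load and module unload.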

module spider

These days the TACC module system includes hundreds of useful bioinformatics programs. To see if your favorite software package has been installed at TACC, use module spider:

Code Block
languagebash
module spider matlab

TACC BioContainers modules

...

These BioContainers are not visible in TACC's "standard" module system, but only after the master biocontainers module is loaded. Once it has been loaded, you can search for your favorite bioinformatics program using module spider.

Code Block
languagebash
# Verify that samtools is not available
samtools

# Load the Biocontainers master module (this takes a while)
module load biocontainers

# Now look for these programs
module spider samtools
module spider Rstats
module spider kallisto
module spider bowtie2
module spider minimap2
module spider multiqc
module spider GATK
module spider velvet

...

Tip

The standard TACC module system has been phased out for bioinformatics programs, so always look for your application in BioContainers.

While it's great that there are now hundreds of programs available through BioContainers, the one drawback is that they can only be run on cluster nodes, not on login nodes. To test a BioContainer program interactively, you will need to use TACC's idev command to obtain an interactive cluster node. More on this shortly...

loading a biocontainer module

Once the biocontainers module has been loaded, you can just module load the desired tool module, as with the kallisto pseudo-aligner program below.

Code Block
languagebash
# Load the Biocontainers master module 
module load biocontainers

# Verify kallisto is not yet available
kallisto 

# Load the default kallisto biocontainer 
module load kallisto

# Verify kallisto is now available (although not on login nodes)
kallisto

Note that loading a BioContainer does not add anything to your $PATH. Instead, it defines an alias, which is just a shortcut for executing the command. You can see the alias definition using the type command. And you can ensure the program is available using the command -v utility.

Code Block
languagebash
# Note that kallisto has not been added to your $PATH
which kallisto

# Instead, an alias has been defined. Use type to see its definition
type kallisto

# Ensure kallisto is available with command -v
command -v kallisto
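
The difference between an alias and a $PATH entry can be reproduced without any module loaded. This is a minimal sketch, assuming a hypothetical alias named demo_tool that stands in for the one a BioContainers module defines; the real alias text will differ.

```bash
# Let a non-interactive bash look up aliases (harmless in shells without shopt)
shopt -s expand_aliases 2>/dev/null || true

# Hypothetical alias standing in for the one a BioContainers module defines
alias demo_tool='echo "demo_tool would run inside its container"'

# command -v reports the alias definition rather than a file path
lookup=$(command -v demo_tool)
echo "$lookup"

# Searching the $PATH directories directly finds no such executable
found_on_path=no
old_ifs=$IFS
IFS=:
for dir in $PATH; do
    if [ -x "$dir/demo_tool" ]; then
        found_on_path=yes
    fi
done
IFS=$old_ifs
echo "on PATH: $found_on_path"

# Remove the demonstration alias
unalias demo_tool
```

This is why a lookup that only searches $PATH directories comes up empty for a BioContainers tool, while the shell built-ins type and command -v still report it.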

...

For one thing, remember that your $HOME directory quota is fairly small (10 GB on ls6), and that can fill up quickly if you install many programs. We recommend creating an installation area in your $WORK directory and installing programs there. You can then make symbolic links to the binaries you need in your $HOME/local/bin directory (which was added to your $PATH in your .bashrc).
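
The install-then-link pattern might look like the following sketch. To avoid touching real directories it uses temporary stand-ins for the work and home areas, and the program name mytool and the apps/mytool/bin layout are hypothetical.

```bash
# Temporary stand-ins so the sketch does not touch your real directories;
# on TACC you would use your work and home directories instead
work_dir=$(mktemp -d)
home_dir=$(mktemp -d)

# Hypothetical program installed under the work area
mkdir -p "$work_dir/apps/mytool/bin"
printf '#!/bin/sh\necho "mytool ran"\n' > "$work_dir/apps/mytool/bin/mytool"
chmod +x "$work_dir/apps/mytool/bin/mytool"

# Link the binary into local/bin, which is assumed to be on $PATH
mkdir -p "$home_dir/local/bin"
ln -s "$work_dir/apps/mytool/bin/mytool" "$home_dir/local/bin/mytool"

# The link is tiny; the program itself stays in the work area
link_target=$(readlink "$home_dir/local/bin/mytool")
echo "$link_target"
```

Because only the link lives under the home directory, the installed files count against the work area's quota rather than the small home quota.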

...

Job execution is controlled by the SLURM batch system on both stampede2 and ls6.

To run a job you prepare 2 files:

  1. a commands file containing the commands to run, one command per line (<job_name>.cmds)
  2. a job control file that describes how to run the job (<job_name>.slurm)
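
As a rough illustration of the two files, the sketch below writes a pair into a temporary directory. The job name myjob, the two echo commands, and the resource values are all assumptions chosen for illustration; the #SBATCH lines use standard SLURM options, and the part of the control file that actually runs each line of the commands file is site-specific and deliberately left out.

```bash
job_dir=$(mktemp -d)
job_name=myjob   # hypothetical job name

# Commands file: one independent command per line
cat > "$job_dir/$job_name.cmds" <<'EOF'
echo "Command 1 on $(hostname)" > cmd1.log
echo "Command 2 on $(hostname)" > cmd2.log
EOF

# Job control file: SLURM directives describing how to run the job
# (values below are illustrative assumptions, not recommended settings)
cat > "$job_dir/$job_name.slurm" <<EOF
#!/bin/bash
#SBATCH -J $job_name
#SBATCH -N 1
#SBATCH -n 2
#SBATCH -p development
#SBATCH -t 00:10:00
#SBATCH -o $job_name.o%j
# site-specific lines that run each command in $job_name.cmds go here
EOF

# Count the commands and show the control file
n_cmds=$(grep -c '' "$job_dir/$job_name.cmds")
echo "commands to run: $n_cmds"
cat "$job_dir/$job_name.slurm"
```

A control file like this would be submitted with sbatch; here -J sets the job name, -N and -n the node and task counts, -p the queue, -t the run time limit, and -o the output file name.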

...

Code Block
languagebash
titleCopy simple commands
cds
mkdir -p $SCRATCH/core_ngs/slurm/simple
cd $SCRATCH/core_ngs/slurm/simple
cp $CORENGS/tacc/simple.cmds .

...