...

Tip
titleReservations

Use our summer school reservation (CoreNGS-Tue) when submitting batch jobs to get higher priority on the ls6 normal queue today:

sbatch --reservation=CoreNGS-Tue <batch_file>.slurm
idev -m 180 -N 1 -A OTH21164 -r CoreNGS-Tue

Note that the reservation name (CoreNGS-Tue) is different from the TACC allocation/project for this class, which is OTH21164.

Compute cluster overview

When you SSH into ls6, your session is assigned to one of a small set of login nodes (also called head nodes). These are separate from the cluster compute nodes that will run your jobs.

Think of a node as a computer, like your laptop, but probably with more cores and memory. Now multiply that computer by a thousand or more, and you have a cluster.

...

The small set of login nodes is a shared resource (type the users command to see everyone currently logged in), and they are not meant for running interactive programs – for that you submit a description of what you want done to a batch system, which distributes the work to one or more compute nodes.
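
For example, you can get a rough sense of how busy a login node is with a couple of standard Linux commands (a quick sketch; nothing here is TACC-specific):

Code Block
languagebash
titleWho else is on this login node?
# list everyone currently logged in to this login node
users
# count the number of distinct logged-in users
users | tr ' ' '\n' | sort -u | wc -l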

...

Here is a comparison of the configurations of ls6 and stampede2. As you can see, stampede2 is the larger cluster, launched in 2017, but ls6, launched in 2022, has fewer but more powerful nodes.

...

Note the use of the term virtual core above for stampede2. Compute cores are standalone processors – mini CPUs, each of which can execute its own separate set of instructions. However, modern cores may also have hyper-threading enabled, where a single core can appear as more than one virtual processor to the operating system (see https://en.wikipedia.org/wiki/Hyper-threading for more on hyper-threading). For example, stampede2 nodes have 2 or 4 hyperthreads (HTs) per core, so KNL nodes, with 4 HTs for each of their 68 physical cores, have a total of 272 virtual cores.
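
If you're curious, you can see the physical vs. virtual core layout of any Linux machine you're logged in to (a quick sketch using the standard nproc and lscpu utilities; exact lscpu field names can vary by distribution):

Code Block
languagebash
titlePhysical vs. virtual cores
# total number of virtual (logical) cores visible to the operating system
nproc
# sockets, physical cores per socket, and hyperthreads per core
lscpu | grep -E 'Socket|Core|Thread'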

...

Unfortunately, the TACC user guides are aimed at a different user community – the weather modelers and aerodynamic flow simulators who need very fast matrix manipulation and other High Performance Computing (HPC) features. The usage patterns of bioinformatics – generally running 3rd party tools on many different datasets – are rather a special case for HPC. TACC calls our type of processing "parameter sweep jobs" and has a special process for running them, using their launcher module.
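
To make "parameter sweep" concrete: the input to the launcher is just a plain text commands file with one complete, independent command per line. Here is a tiny hypothetical sketch that runs the same tool on several datasets (the file names are made up):

Code Block
languagebash
titleExample commands file (my_fastqc.cmds) – file names are illustrative
fastqc sample_A.fastq.gz
fastqc sample_B.fastq.gz
fastqc sample_C.fastq.gz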

...

When you type in the name of an arbitrary program (ls for example), how does the shell know where to find that program? The answer is your $PATH. $PATH is a predefined environment variable whose value is a list of directories. The shell looks for program names in that list, in the order the directories appear.
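
For example, you can inspect the search order and ask the shell where a particular program was found (a quick sketch using standard shell commands):

Code Block
languagebash
titleExploring your $PATH
# show the directories searched, one per line, in search order
echo $PATH | tr ':' '\n'
# show which directory the "ls" program is found in
which ls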

...

For example, the following module load command makes the singularity container management system available to you:

Code Block
languagebash
titleHow module load affects $PATH
# first type "matlabsingularity" to show that it is not present in your environment:
matlabsingularity
# it's not on your $PATH either:
which matlabsingularity

# now add biocontainers to your environment and try again:
module load biocontainers
# and see how singularity is now on your $PATH:
which singularity
# you can see the new directory at the front of $PATH
echo $PATH

# to remove it, use "unload"
module unload biocontainers
singularity
# gone from $PATH again...
which singularity

TACC BioContainers modules

...

TACC obtains its containers from BioContainers (https://biocontainers.pro/ and https://github.com/BioContainers/containers), a large public repository of Singularity containers for bioinformatics tools. This has allowed TACC to easily provision thousands of such tools!

These BioContainers are not visible in TACC's "standard" module system; they become available only after the master biocontainers module is loaded. Once it has been loaded, you can search for your favorite bioinformatics program using module spider.

Code Block
languagebash
# Verify that samtools is not available
samtools
# and cannot be found in the standard module system
module spider samtools

# Load the BioContainers master module (this takes a while)
module load biocontainers

# Now look for these programs
module spider samtools
module spider Rstats
module spider kallisto
module spider bowtie2
module spider minimap2
module spider multiqc
module spider gatk
module spider velvet

Notice how the BioContainers module names have "ctr" in them, along with version numbers and other identifying information.

Tip

The standard TACC module system has been phased out for bioinformatics programs, so always look for your application in BioContainers.

While it's great that there are now hundreds of programs available through BioContainers, the one drawback is that they can only be run on cluster compute nodes, not on login nodes. To test a BioContainers program interactively, you will need to use TACC's idev command to obtain an interactive cluster node. More on this shortly...
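
Here is a rough sketch of that interactive workflow, reusing the idev command from the Tip above. The exact BioContainers module name for a tool is whatever module spider reports on your system; the commented-out name below is only a placeholder:

Code Block
languagebash
titleTesting a BioContainers program interactively (sketch)
# request an interactive compute node using this class's allocation and reservation
idev -m 180 -N 1 -A OTH21164 -r CoreNGS-Tue

# then, on the compute node, load the master BioContainers module
module load biocontainers
# find the full module name for the tool you want
module spider samtools
# load it using the exact name module spider reported, for example (placeholder name):
#   module load samtools/ctr-1.9--h91753b0_8
# and run the program
#   samtools --version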

...

For one thing, remember that your $HOME directory quota is fairly small (10 GB on ls6), and it can fill up quickly if you install many programs. We recommend creating an installation area in your $WORK directory and installing programs there. You can then make symbolic links to the binaries you need in your $HOME/local/bin directory (which was added to your $PATH in your .bashrc).
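
A minimal sketch of this pattern (directory and program names here are only illustrative):

Code Block
languagebash
titleInstalling in $WORK and linking into $HOME/local/bin (sketch)
# create an installation area in $WORK (the directory name is illustrative)
mkdir -p $WORK/install/bin

# ...build or copy your tool's binaries into $WORK/install/bin...

# make sure the $HOME/local/bin directory exists
mkdir -p $HOME/local/bin
# then symbolically link the binary you need onto your $PATH
ln -sf $WORK/install/bin/mytool $HOME/local/bin/mytool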

...

Warning
title$PATH caveat

Remember that the order of locations in the $PATH environment variable is the order in which the locations will be searched. In particular, the (non-BioContainers) module load command adds directories to the front of your $PATH. This can mask similarly named programs, for example those in your $HOME/local/bin directory.

Job Execution

...

The process of running the job involves these steps:

  1. Create a commands file containing exactly one task per line.
  2. Prepare a job control file for the commands file that describes how the job should be run.
  3. Submit the job control file to the batch system. The job is then said to be queued to run.
  4. The batch system prioritizes the job based on the number of compute nodes needed and the job run time requested.
  5. When compute nodes become available, the job tasks (command lines in the <job_name>.cmds file) are assigned to one or more compute nodes and begin to run in parallel.
  6. The job completes when one of the following occurs:
    1. you cancel the job manually
    2. all job tasks in the job complete (successfully or not!)
    3. the requested job run time has expired
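
Putting steps 1 and 2 together, here is a rough sketch of what a launcher-based job control file can look like. The #SBATCH settings and file names are illustrative, and the LAUNCHER_JOB_FILE variable and paramrun script belong to TACC's launcher module (check its documentation or the course templates for the exact form used in class):

Code Block
languagebash
titleSketch of a launcher job control file (my_job.slurm)
#!/bin/bash
#SBATCH -J my_job          # job name (illustrative)
#SBATCH -p normal          # queue (partition)
#SBATCH -N 1               # number of compute nodes requested
#SBATCH -n 4               # total tasks to run in parallel
#SBATCH -t 01:00:00        # requested run time (hh:mm:ss)
#SBATCH -A OTH21164        # allocation/project

# run each line of the commands file as a separate parallel task
module load launcher
export LAUNCHER_JOB_FILE=my_job.cmds
${LAUNCHER_DIR}/paramrun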

SLURM at a glance

Here are the main components of the SLURM batch system.


                          stampede2, ls6
batch system              SLURM
batch control file name   <job_name>.slurm
job submission command    sbatch <job_name>.slurm
job monitoring command    showq -u
job stop command          scancel -n <job name>
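
For example (the job name here is illustrative):

Code Block
languagebash
titleCommon SLURM commands on ls6
# submit the batch job described in my_job.slurm
sbatch my_job.slurm
# monitor your queued and running jobs
showq -u
# cancel a job by name
scancel -n my_job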

...