...
Here is a comparison of the configurations of ls6 and stampede2. As you can see, stampede2 is the larger cluster, launched in 2017, but ls6, launched this year, has fewer but more powerful nodes.
| | ls6 | stampede2 |
|---|---|---|
| login nodes | 43, 128 cores each | 6, 28 cores each |
| standard compute nodes | 560, 128 cores per node | 4,200 KNL (Knights Landing); 1,736 SKX (Skylake) |
| GPU nodes | 16 total, 128 cores per node, 2x NVIDIA A100 GPUs | -- |
| batch system | SLURM | SLURM |
| maximum job run time | 48 hours, normal queue; 2 hours, development queue | 96 hours on KNL nodes, normal queue; 48 hours on SKX nodes, normal queue; 2 hours, development queue |
...
The module system is an incredibly powerful way to have literally thousands of software packages available, some of which are incompatible with each other, without causing complete havoc. The TACC staff builds each package from source code and stages it in a well-known location that is NOT on your $PATH. Then, when a module is loaded, its binaries are added to your $PATH.
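The effect of loading and unloading a module on $PATH can be imitated with plain shell commands. The sketch below is only an illustration of the idea, not how the TACC module system is implemented; hypotool and the temporary staging directory are made-up names.

```shell
#!/usr/bin/env bash
# Sketch of the idea behind "module load": a program staged outside $PATH
# becomes runnable once its directory is prepended to $PATH.
# "hypotool" and the temporary directory are made up for this demo.

stage_dir=$(mktemp -d)                 # stands in for a staged package location
printf '#!/bin/sh\necho "hypotool ran"\n' > "$stage_dir/hypotool"
chmod +x "$stage_dir/hypotool"

# Before "loading": the program is not found on $PATH
if command -v hypotool >/dev/null 2>&1; then before=found; else before=missing; fi
echo "before:   $before"

# "Load": prepend the staged directory to $PATH
old_path=$PATH
PATH="$stage_dir:$PATH"
after=$(command -v hypotool)
echo "after:    $after"
hypotool

# "Unload": restore the original $PATH
PATH=$old_path
if command -v hypotool >/dev/null 2>&1; then final=found; else final=missing; fi
echo "unloaded: $final"

rm -rf "$stage_dir"
```

Because the staged directory is placed at the front of $PATH, it takes precedence over any other program of the same name found later in the search path.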
...
Code Block | ||||
---|---|---|---|---|
| ||||
# first type "matlab" to show that it is not present in your environment:
matlab
# it's not on your $PATH either:
which matlab
# now add matlab to your environment and try again:
module load matlab
# and see how it's now on your $PATH:
which matlab
# you can see the new directory at the front of $PATH
echo $PATH
# to remove it, use "unload"
module unload matlab
matlab
# gone from $PATH again...
which matlab
|
module spider
These days the TACC module system includes hundreds of useful bioinformatics programs. To see if your favorite software package has been installed at TACC, use module spider:
Code Block | ||
---|---|---|
| ||
module spider fastqc
module spider samtools
module spider matlab
|
TACC BioContainers modules
...
These BioContainers are not visible in TACC's "standard" module system; they become visible only after the master biocontainers module is loaded. Once it has been loaded, you can search for your favorite bioinformatics program using module spider.
Code Block | ||
---|---|---|
| ||
# Verify that samtools is not available
samtools
# Load the Biocontainers master module (this takes a while)
module load biocontainers
# Now look for these programs
module spider samtools
module spider Rstats
module spider kallisto
module spider bowtie2
module spider minimap2
module spider multiqc
module spider GATK
module spider velvet
|
...
Tip |
---|
The standard TACC module system has been phased out for bioinformatics programs in favor of BioContainers modules, so always look for your application in BioContainers. While it's great that there are now hundreds of programs available through BioContainers, the one drawback is that they can only be run on cluster nodes, not on login nodes. To test a BioContainer program interactively, you will need to use TACC's idev command to obtain an interactive cluster node. More on this shortly... |
loading a biocontainer module
Once the biocontainers module has been loaded, you can just module load the desired tool module, as with the kallisto pseudo-aligner program below.
Code Block | ||
---|---|---|
| ||
# Load the Biocontainers master module
module load biocontainers
# Verify kallisto is not yet available
kallisto
# Load the default kallisto biocontainer
module load kallisto
# Verify kallisto is now available (although not on login nodes)
kallisto
|
Note that loading a BioContainer does not add anything to your $PATH. Instead, it defines an alias, which is just a shortcut for executing the command. You can see the alias definition using the type command. And you can ensure the program is available using the command -v utility.
Code Block | ||
---|---|---|
| ||
# Note that kallisto has not been added to your $PATH
which kallisto
# Instead, an alias has been defined. Use type to see its definition
type kallisto
# Ensure kallisto is available with command -v
command -v kallisto
|
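The difference between an alias and a program on $PATH can also be seen without a cluster. The following is a minimal sketch using a made-up alias named hypotool; the alias body is purely illustrative and is not what a BioContainers module actually defines.

```shell
#!/usr/bin/env bash
# Sketch: how an alias differs from a program found on $PATH.
# "hypotool" is a made-up name; the alias body is only illustrative.

shopt -s expand_aliases              # needed in scripts; interactive shells enable it
alias hypotool='echo "running hypotool via alias"'

kind=$(type -t hypotool)             # prints "alias", "file", "function", ...
on_path=$(type -P hypotool || true)  # empty when nothing by that name is on $PATH

echo "kind:    $kind"
echo "on PATH: ${on_path:-<nothing>}"

# command -v succeeds for an alias and prints its definition
command -v hypotool
```

Here type -P searches only $PATH, so it finds nothing, while type -t and command -v both recognize the alias.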
...
For one thing, remember that your $HOME directory quota is fairly small (10 GB on ls6), and that can fill up quickly if you install many programs. We recommend creating an installation area in your $WORK directory and installing programs there. You can then make symbolic links to the binaries you need in your $HOME/local/bin directory (which was added to your $PATH in your .bashrc).
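The install-then-link pattern might look like the sketch below. To keep it safe to run anywhere, temporary directories stand in for $WORK and $HOME, and hypotool and the apps/hypotool layout are made-up names; on TACC you would use your real $WORK area and $HOME/local/bin.

```shell
#!/usr/bin/env bash
# Sketch: install under a work area, then symlink the binary into a bin
# directory that is on $PATH. Temporary directories stand in for $WORK and
# $HOME; "hypotool" is a made-up program.

demo_work=$(mktemp -d)     # stands in for $WORK
demo_home=$(mktemp -d)     # stands in for $HOME

# "Install" the program in the work area
mkdir -p "$demo_work/apps/hypotool/bin"
printf '#!/bin/sh\necho "hypotool ok"\n' > "$demo_work/apps/hypotool/bin/hypotool"
chmod +x "$demo_work/apps/hypotool/bin/hypotool"

# Link it into local/bin under the stand-in home directory
mkdir -p "$demo_home/local/bin"
ln -sf "$demo_work/apps/hypotool/bin/hypotool" "$demo_home/local/bin/hypotool"

# With local/bin on $PATH, the program resolves through the link
PATH="$demo_home/local/bin:$PATH"
result=$(hypotool)
echo "$result"
ls -l "$demo_home/local/bin"
```

Only the small symbolic link lives under the home directory, so the program's files count against the work area rather than the $HOME quota.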
...
Job execution is controlled by the SLURM batch system on both stampede2 and ls6.
To run a job you prepare 2 files:
- a commands file containing the commands to run, one command per line (<job_name>.cmds)
- a job control file that describes how to run the job (<job_name>.slurm)
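As a rough illustration of the two files, the sketch below writes a small commands file and a minimal job control file into a temporary directory. The job name, task counts, and time limit are placeholders, and the body of the control file simply runs the commands serially as a stand-in; a real TACC job control file may need additional settings (for example an allocation name) and would normally use the site's own mechanism for running the commands file.

```shell
#!/usr/bin/env bash
# Sketch: write the two job files into a temporary directory.
# Job name, node/task counts, and time limit are placeholders.

job_dir=$(mktemp -d)
job_name=simple_demo

# Commands file: one command per line
cat > "$job_dir/$job_name.cmds" <<'EOF'
echo "command 1 on $(hostname)"
echo "command 2 on $(hostname)"
echo "command 3 on $(hostname)"
EOF

# Job control file: the #SBATCH lines are options read by sbatch
cat > "$job_dir/$job_name.slurm" <<EOF
#!/bin/bash
#SBATCH -J $job_name          # job name
#SBATCH -p development        # queue (partition)
#SBATCH -N 1                  # number of nodes
#SBATCH -n 3                  # total number of tasks
#SBATCH -t 00:10:00           # run time limit (hh:mm:ss)
#SBATCH -o $job_name.o%j      # output file; %j expands to the job ID

# Placeholder body: runs the commands file serially
bash $job_name.cmds
EOF

n_cmds=$(wc -l < "$job_dir/$job_name.cmds")
n_opts=$(grep -c '^#SBATCH' "$job_dir/$job_name.slurm")
echo "wrote $((n_cmds)) commands and $((n_opts)) sbatch options in $job_dir"
```

A file like this would be submitted with sbatch, and the requested time must fit within the chosen queue's maximum run time.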
...
Code Block | ||||
---|---|---|---|---|
| ||||
cds
mkdir -p $SCRATCH/core_ngs/slurm/simple
cd $SCRATCH/core_ngs/slurm/simple
cp $CORENGS/tacc/simple.cmds .
|
...