...

Tip
titleImportant Tip -- the Tab key is your BFF!

The Tab key is one of your best friends in Linux. Hitting it invokes "shell completion", which is as close to magic as it gets!

  • Tab once will expand the current command line contents as far as it can unambiguously.
    • if nothing shows up, there is no unambiguous match
  • Tab twice will give you a list of everything the shell finds matching the current command line.
    • you then decide where to go next

...

As you can see, there are a lot of locations on the $PATH. That's because when you load modules at TACC (such as the module load lines in the common login script), that mechanism makes the programs available to you by putting their installation directories on your $PATH. We'll learn more about modules shortly.
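To see this for yourself, it can help to print each $PATH entry on its own line. The sketch below also imitates, in a very simplified way, what loading a module does by putting a directory at the front of $PATH. The directory /opt/apps/example_tool/bin is made up for illustration and is not a real TACC path.

```bash
# Print each directory on $PATH on its own line
echo "$PATH" | tr ':' '\n'

# Imitate what loading a module does: add an installation directory to $PATH.
# NOTE: /opt/apps/example_tool/bin is a hypothetical directory, for illustration only.
old_path="$PATH"
PATH="/opt/apps/example_tool/bin:$PATH"

# The added directory is now the first place the shell looks for programs
first_dir=$(echo "$PATH" | tr ':' '\n' | head -1)
echo "First directory searched: $first_dir"

# Restore the original $PATH so nothing else is affected
PATH="$old_path"
```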

...

Code Block
languagebash
titleSetting up the friendly shell prompt for stampede
##########
# SECTION 3 -- controlling the prompt
# for NGS course
if [[ -n "$PS1" ]]; then
  PS1='ls5:\w$ '
fi

File systems at TACC

...

TACC storage areas and Linux commands to access data (all commands to be executed at TACC except laptop-to-TACC copies, which must be executed on your laptop)

Local file systems

...

On lonestar5 these local file systems have the following characteristics:


| | Home | Work | Scratch |
|---|---|---|---|
| quota | 10 GB | 1024 GB = 1 TB | 12+ PB (basically infinite) |
| policy | backed up | not backed up, not purged | not backed up, purged if not accessed recently (~10 days) |
| access command | cd | cdw | cds |
| environment variable | $HOME | $STOCKYARD (root of the shared Work file system); $WORK (different sub-directory for each cluster) | $SCRATCH |
| root file system | /home | /work | /scratch |
| use for | Small files such as scripts that you don't want to lose. | Medium-sized artifacts you don't want to copy over all the time. For example, custom programs you install (these can get large), or annotation files used for analysis. | Large files accessed from batch jobs. Your starting files will be copied here from somewhere else, and your final results files will be copied elsewhere (e.g. stockyard, corral, or your BRCF POD). |
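Because Scratch files are purged when they have not been accessed recently, it may be worth checking which of your files are at risk. Here is a minimal sketch, assuming GNU find and the ~10 day window mentioned above (the actual purge policy is set by TACC and may differ). The function name list_stale_files is hypothetical, and the fallback to the current directory exists only so the snippet also runs outside TACC.

```bash
# Hypothetical helper: list regular files whose last access time is
# more than N full days ago (default 10).
#   $1 = directory to check (defaults to $SCRATCH)
#   $2 = age threshold in days (defaults to 10)
list_stale_files() {
  find "${1:-$SCRATCH}" -type f -atime +"${2:-10}"
}

# At TACC this checks $SCRATCH; elsewhere it falls back to the current directory
list_stale_files "${SCRATCH:-.}"
```

Any file this prints has not been read for a while; copy it somewhere safer if you still need it.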

When you first log in, the system gives you information about your disk quotas and your compute allocation balances:

Code Block
--------------------- Project balances for user abattenh -----------------------
| Name           Avail SUs     Expires  | Name           Avail SUs     Expires |
| CancerGenetics      821054856  20152018-09-30 | human_brains A-cm10              456341096  20152018-0612-3031 |
| UT-2015-05-18      10000 2100 2 0152019-0603-3031 | genomeAnalysis     29324 2500  20162019-03-31 |
------------------------ Disk quotas for user abattenh -------------------------
| Disk         Usage (GB)     Limit    %Used   File Usage       Limit   %Used |
| /home1              0.0       510.0     0.0312          178 91     1500001000000    0.1201 |
| /work              54538.85    1024.0     5.35  52.59        61053     3000000    2.04 |
| /scratch       2621  3725.9   3000000    0.09 |
-----------------------------0     0.00         4137           0    0.00 |
-------------------------------------------------------------------------------

changing TACC directories

...

Tip

The cd (change directory) command with no arguments takes you to your home directory on any Linux/Unix system. The cdw and cds commands are specific to the TACC environment.
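As a rough sketch of what these shortcuts do: cdw and cds behave like changing to the directories named by $WORK and $SCRATCH. The functions go_work and go_scratch below are hypothetical stand-ins, not TACC's actual definitions, and the fallback to $HOME is there only so the snippet runs on a machine where those variables are not set.

```bash
# Hypothetical stand-ins for the TACC shortcuts, written as shell functions
go_work()    { cd "${WORK:-$HOME}"; }     # roughly what cdw does
go_scratch() { cd "${SCRATCH:-$HOME}"; }  # roughly what cds does

cd            # no arguments: go to your home directory
pwd
go_work       # go to your Work area
pwd
go_scratch    # go to your Scratch area
pwd
```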

Stockyard (shared Work)

TACC compute clusters now share a common Work file system called stockyard. So files in your Work area do not have to be copied, for example from ls5 to stampede2 – they can be accessed directly from either cluster.

Note that there are two environment variables pertaining to the shared Work area:

  • $STOCKYARD - This refers to the root of your shared Work area
    • e.g. /work/01063/abattenh
  • $WORK - Refers to a sub-directory of the shared Work area that is different for different clusters, e.g.:
    • /work/01063/abattenh/lonestar on lonestar5
    • /work/01063/abattenh/stampede2 on stampede2
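The relationship between the two variables can be checked directly in the shell. The sketch below uses the example paths from the list above as fallback values; at TACC both variables are already set for you at login, so the fallbacks are only an assumption that lets the snippet run elsewhere.

```bash
# Example values copied from the text above; used only if the variables are not already set
STOCKYARD="${STOCKYARD:-/work/01063/abattenh}"
WORK="${WORK:-$STOCKYARD/lonestar}"

echo "Shared Work root:      $STOCKYARD"
echo "Cluster-specific Work: $WORK"

# Check that $WORK is a sub-directory of $STOCKYARD
case "$WORK" in
  "$STOCKYARD"/*) work_is_under_stockyard=yes ;;
  *)              work_is_under_stockyard=no ;;
esac
echo "WORK under STOCKYARD? $work_is_under_stockyard"
```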

A mechanism for purchasing larger stockyard allocations (above the 1 TB basic quota) is in development.

The UT Austin BioInformatics Team, a loose group of researchers, maintains a common directory area on stockyard.

Code Block
languagebash
titleThe shared BioITeam directory
ls /work/projects/BioITeam

Files we will use in this course are in a sub-directory there:

Code Block
languagebash
titleOur shared class directory
ls /work/projects/BioITeam/courses/Core_NGS_Tools

Corral

Corral is a gigantic (multiple PB) storage system (spinning disk) where researchers can store data. UT researchers may request up to 5 TB of corral storage through the normal TACC allocation request process. Additional space on corral can be rented for ~$85/TB/year.

The UT Austin BioInformatics Team also has an older, common directory area on corral.

Code Block
languagebash
titleThe shared BioITeam directory
ls /corral-repl/utexas/BioITeam

A couple of things to keep in mind regarding corral:

  • corral is a great place to store data in between analyses.
    • Store your permanent, original sequence data on corral
    • Copy the data you want to work with from corral to $SCRATCH
    • Run your analyses (batch jobs)
    • Copy your results back to corral
  • On stampede you can access corral directories from login nodes (like the one you're on now), but your batch jobs cannot access corral.
    • This is because corral is a network file system, like Samba or NFS.
    • Since stampede has so many compute nodes, it doesn't have the network bandwidth that would allow simultaneous access to corral.
  • Occasionally corral can become unavailable. This can cause any command that tries to access corral data to hang.
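The store, copy, run, copy-back cycle from the list above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name stage_run_save is hypothetical, a simple line count stands in for a real analysis (which at TACC would be a batch job), and the copies are assumed to run somewhere that can see both directories, such as a login node.

```bash
# Hypothetical helper: stage data from an archive area into a work area,
# run an analysis there, then copy the results back.
#   $1 = archive directory (e.g. your project directory on corral)
#   $2 = working directory (e.g. a directory under $SCRATCH)
#   $3 = input file name
stage_run_save() {
  archive_dir=$1; work_dir=$2; input=$3
  mkdir -p "$work_dir" "$archive_dir/results"
  # 1. Copy the data you want to work with from the archive to the work area
  cp "$archive_dir/$input" "$work_dir/"
  # 2. Run the analysis. Here a line count stands in for a real batch job.
  wc -l < "$work_dir/$input" > "$work_dir/$input.linecount"
  # 3. Copy the results back to the archive
  cp "$work_dir/$input.linecount" "$archive_dir/results/"
}
```

You would call it with your own corral project directory as the first argument, a directory under $SCRATCH as the second, and the name of an input file as the third.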


Ranch

Ranch is a gigantic (multiple PB) tape archive system where researchers can archive data. UT researchers may request large (multi-TB) ranch storage allocations through the normal TACC allocation request process.

...