Welcome to the Research Computing Task Force (RCTF) Users wiki!

Upcoming maintenance:

  • Marcotte POD - Thursday December 15, 2016, 9 am - 5 pm Central Time
  • January monthly maintenance - Thursday January 12, 2017, 9 am - 5 pm Central Time

RCTF provides local, centralized storage and compute systems called PODs. A POD consists of one or more compute servers along with a shared storage server. Files on POD storage can be accessed from any server within that POD.

A graphic illustration of the RCTF POD Compute/Storage model is shown below:

Technical features of this architecture include:

  • A large set of Bioinformatics software available on all compute servers
  • Storage managed by the high-performance ZFS file system, with
    • built-in data integrity checking, far superior to that of standard Unix file systems
    • large, contiguous address space
    • automatic file compression
    • RAID-like redundancy capabilities
    • good compatibility with inexpensive commodity disks
  • Central OS configuration and software installations via the Puppet deployment tool
  • Global RCTF user and group IDs, deployable to any POD
  • Self-service Account Management web interface
  • OS monitoring via the Nagios tool
  • Hardware-level monitoring via Out-Of-Band Management (OOBM) interfaces
    • IPMI, iDRAC
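
As an illustration of the OS-level monitoring mentioned above, the following is a minimal sketch of a Nagios-style disk usage check. It is not RCTF's actual plugin: the function names, the default thresholds (80% warning, 90% critical), and the default path are assumptions made for this example. The only convention taken from Nagios itself is the plugin contract of printing one status line and exiting with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN).

```python
#!/usr/bin/env python3
"""Sketch of a Nagios-style disk usage check (hypothetical, not RCTF's plugin)."""
import shutil
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify_usage(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to a Nagios status code.

    The 80/90 thresholds are illustrative defaults, not RCTF settings.
    """
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path, warn=80.0, crit=90.0):
    """Return (status_code, one-line message) for the file system holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path}: reported size is zero"
    percent = 100.0 * usage.used / usage.total
    status = classify_usage(percent, warn, crit)
    return status, f"DISK {LABELS[status]} - {path} is {percent:.1f}% full"


if __name__ == "__main__":
    # Defaults to "/" for demonstration; pass a storage mount point as the
    # first argument to check a different file system.
    target = sys.argv[1] if len(sys.argv) > 1 else "/"
    code, message = check_disk(target)
    print(message)
    sys.exit(code)
```

Run as a script, it prints a single status line and sets the exit code, which is the form a Nagios server expects from a check command. Keeping the threshold logic in `classify_usage` separate from the file system query makes it testable without touching a real disk.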

RCTF Architecture goals

The Research Computing Task Force (RCTF) is a working group of IT-knowledgeable UT staff and students from CSSB, GSAF and CCBB. With assistance from the College of Natural Sciences Office of Information Technology (CNS-OIT), RCTF has implemented a standard hardware and software architecture, suitable for research computing, that can be efficiently and centrally managed.

Broadly, our goals are to supplement TACC's offerings by providing extensive local storage, including backups and archiving, along with easy-access non-batch local compute.

Before the RCTF initiative, various labs had their own legacy computational equipment and storage, as well as a hodgepodge of backup solutions (a common solution being "none"). This diversity, combined with dwindling systems administration resources, led to an untenable situation.

The Texas Advanced Computing Center (TACC) provides excellent computational resources for performing large-scale computations in parallel. However, its batch system orientation is not well suited for running smaller, one-off computations, for developing scripts and pipelines, or for executing very long-running (> 2 days) computations. While TACC offers a no-cost tape archive facility (ranch), its persistent storage offerings (corral, global work file system) are expensive and cumbersome to use for collaboration.

The RCTF IT architecture has been designed to address these issues and needs.

  • Provide adequate local storage (spinning disk) in a large, non-partitioned address space.
    • Implement some common file system structures to assist data organization and automation
  • Provide flexible local compute capability with both common and lab-specific bioinformatics tools installed.
    • Augment TACC offerings with non-batch computing environment
    • Robust and "highly available" (but 24x7 uptime not required)
  • Provide automated backups to spinning disk and periodic data archiving to TACC's ranch tape system.
  • Avoid large "one-time" equipment purchases
    • Aim for rolling hardware upgrades as technology evolves
    • Target "sweet spot" of cost-versus-function commodity hardware offerings
  • Provide centralized management of IT equipment
    • Automate software deployment and system monitoring
    • Make it easy to deploy new equipment

 
