Welcome to the Research Computing Task Force (RCTF) Users wiki! The RCTF is now part of the Biomedical Research Computing Facility (BRCF), and has an official organizational home in the Center for Biomedical Research Support (CBRS; see http://ccbb.utexas.edu/CBRSNewsletterSpring2017.pdf).
A special 3-day POD maintenance will take place over Spring Break, Wednesday March 14 through Friday March 16, 2018. During this maintenance we will be re-working our data center networking.
All POD access and services will be unavailable during this window. Because all networking will be modified simultaneously, we are unable to honor requests for maintenance exceptions or time changes.
The next regular POD maintenance window after March is scheduled for Thursday April 26, 2018.
- Obtaining an RCTF Account (https://rctf-account-request.icmb.utexas.edu)
- POD Resources and Access
- Support and Maintenance
- POD Software Information
RCTF provides local, centralized storage and compute systems called PODs. A POD consists of one or more compute servers along with a shared storage server. Files on POD storage can be accessed from any server within that POD.
A graphic illustration of the RCTF POD Compute/Storage model is shown below:
Technical features of this architecture include:
- A large set of Bioinformatics software available on all compute servers
- Storage managed by the high-performance ZFS file system, with
  - built-in data integrity, far superior to standard Unix file systems
  - large, contiguous address space
  - automatic file compression
  - RAID-like redundancy capabilities
  - works well with inexpensive, commodity disks
- Centralized deployment, administration and monitoring
  - OS configuration and 3rd-party software installed via the Puppet deployment tool
  - Global RCTF user and group IDs, deployable to any POD
  - Self-service Account Management web interface
  - OS monitoring via the Nagios tool
  - Hardware-level monitoring via Out-Of-Band Management (OOBM) interfaces
    - IPMI, iDRAC
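Because ZFS compresses files transparently, the space a file occupies on disk can differ from its apparent size. The following is a minimal sketch, not an RCTF-provided tool, of how a user might compare the two for a directory tree using only the Python standard library. The function names `size_report` and `tree_report` are hypothetical, and the sketch assumes a Linux system where `st_blocks` is reported in 512-byte units.

```python
import os


def size_report(path):
    """Return (apparent_bytes, on_disk_bytes) for a single file."""
    st = os.stat(path)
    apparent = st.st_size
    # Assumption: st_blocks counts 512-byte units, as it does on Linux.
    on_disk = st.st_blocks * 512
    return apparent, on_disk


def tree_report(root):
    """Sum apparent and on-disk sizes of all regular files under root."""
    total_apparent = 0
    total_on_disk = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symbolic links and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            apparent, on_disk = size_report(full)
            total_apparent += apparent
            total_on_disk += on_disk
    return total_apparent, total_on_disk
```

Calling `tree_report` on a directory returns the two totals. On a compressed file system the on-disk total is often smaller for text-like data, although block rounding can make it larger for very small files.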
RCTF Architecture goals
The Research Computing Task Force (RCTF) is a working group of IT-knowledgeable UT staff and students from CSSB, GSAF and CCBB. With assistance from the College of Natural Sciences Office of Information Technology (CNS-OIT), RCTF has implemented a standard hardware and software architecture, suitable for research computing, that can be efficiently and centrally managed.
Broadly, our goals are to supplement TACC's offerings by providing extensive local storage, including backups and archiving, along with easy-access non-batch local compute.
Before the RCTF initiative, various labs had their own legacy computational equipment and storage, as well as a hodgepodge of backup solutions (a common solution being "none"). This diversity, combined with dwindling systems administration resources, led to an untenable situation.
The Texas Advanced Computing Center (TACC) provides excellent computation resources for performing large-scale computations in parallel. However, its batch-system orientation is not well suited to running smaller, one-off computations, developing scripts and pipelines, or executing very long-running (> 2 days) computations. While TACC offers a no-cost tape archive facility (ranch), its persistent storage offerings (corral, global work file system) can be expensive and cumbersome to use for collaboration.
The RCTF IT architecture has been designed to address these issues and needs.
- Provide adequate local storage (spinning disk) in a large, non-partitioned address space.
- Implement some common file system structures to assist data organization and automation
- Provide flexible local compute capability with both common and lab-specific bioinformatics tools installed.
- Augment TACC offerings with non-batch computing environment
- Robust and "highly available" (but 24x7 uptime not required)
- Provide automated backups to spinning disk and periodic data archiving to TACC's ranch tape system.
- Target "sweet spot" of cost-versus-function commodity hardware offerings
- Aim for rolling hardware upgrades as technology evolves
- Provide centralized management of IT equipment
- Automate software deployment and system monitoring
- Make it easy to deploy new equipment