The BRCF has access to several GPU servers. See:

  • AMD GPU servers
  • NVIDIA GPU servers
  • Common Resources

Testing GPU access

To verify GPU access, run either the rocm-smi command (on AMD GPU servers) or the nvidia-smi command (on NVIDIA GPU servers).
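For example (a minimal check, assuming the vendor utilities are already on your PATH; output layout varies by driver version):

  # AMD GPU server: print a summary table of the GPUs, their utilization and temperature
  rocm-smi

  # NVIDIA GPU server: print a summary table of the GPUs, their memory use and utilization
  nvidia-smi

If the command prints its GPU table without errors, your GPU access is working.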

GPU-enabled software

See the server-specific pages for how to run TensorFlow, PyTorch, and AlphaFold.
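As a quick sanity check from the command line (a sketch only; it assumes TensorFlow and/or PyTorch are already installed in your active environment, which the server-specific pages cover), you can ask each framework whether it sees a GPU:

  # PyTorch: prints True when a GPU is usable (ROCm builds of PyTorch also use the torch.cuda API)
  python -c "import torch; print(torch.cuda.is_available())"

  # TensorFlow: prints the list of GPU devices it can see (an empty list means no GPU is visible)
  python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"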

Sharing resources

Since there is no batch system on BRCF POD compute servers, it is important for users to monitor their own resource usage, and that of other users, in order to share resources appropriately. The following tools help (see the example session after this list):

  • Use top to monitor running tasks (or top -i to exclude idle processes)
    • useful keystroke commands while top is running include:
      • M - sort the task list by memory usage
      • P - sort the task list by processor usage
      • N - sort the task list by process ID (PID)
      • T - sort the task list by run time
      • 1 - show usage of each individual hyperthread
        • these are labeled "CPUs" but are really hyperthreads
        • this list can be long; the non-interactive mpstat may be preferred
  • Use mpstat to monitor overall CPU usage
    • mpstat -P ALL to see usage for all hyperthreads
    • mpstat -P 0 to see usage for a specific hyperthread (here, hyperthread 0)
  • Use free -g to monitor overall RAM and swap space usage (in GB)
  • Use rocm-smi (AMD GPUs) or nvidia-smi (NVIDIA GPUs) to see GPU usage
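For example, a typical check of a server's load before starting a large job might look like this (a sketch; flags and sampling intervals can of course be adjusted):

  # non-idle processes only; press M, P, N, T or 1 while it runs, as described above
  top -i

  # per-hyperthread CPU usage, sampled once per second, three samples
  mpstat -P ALL 1 3

  # RAM and swap usage, in GB
  free -g

  # GPU utilization and GPU memory usage
  rocm-smi       # on AMD GPU servers
  nvidia-smi     # on NVIDIA GPU servers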


