Catscan¶
The Catscan cluster is a standalone cluster provisioned with xCAT.
Login node: catscan.lbl.gov
It provides a 269 TB ZFS filesystem on /pool0 for data and computational scratch space (see below).
Compute nodes: n0000, n0001, n0002, n0003
Cluster Configuration¶
Node | Access | Storage | Filesystems | Description of Use | CPU | CORES | MEMORY | GPU |
---|---|---|---|---|---|---|---|---|
catscan.lbl.gov | ssh with either LDAP credentials or a password provided by an administrator | /home: local drive, 211 GB; /pool0: ZFS filesystem, 269 TB | /clusterfs/bebb/users, /clusterfs/bebb/group-sw | Login node | Intel(R) Xeon(R) Gold 6126 | 48 (HT enabled) | 196 GB | N/A |
n000[0-1] | ssh from catscan.lbl.gov with cluster key | As above, via NFS | As above, via NFS | Compute node | Intel(R) Xeon(R) Gold 6126 | 48 (HT enabled) | 196 GB | 4x NVIDIA GeForce GTX 1080 Ti |
n000[2-3] | ssh from catscan.lbl.gov with cluster key | As above, via NFS | As above, via NFS | Compute node | | 24 | 188 GB | 2x NVIDIA RTX A4500 |
Operating System: CentOS Linux release 7.9.2009 (Core)
NVIDIA driver (NVRM version): NVIDIA UNIX x86_64 Kernel Module 510.73.05 (Sat May 7 05:30:26 UTC 2022)
The GTX 1080 Ti compute nodes (n000[0-1]) each have 4 GPUs; deviceQuery reports the following for these devices:
```
  CUDA Driver Version / Runtime Version          10.1 / 10.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 11178 MBytes (11721506816 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1582 MHz (1.58 GHz)
  Memory Clock rate:                             5505 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 2883584 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):     (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 94 / 0
  Compute Mode:                                  < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
```
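To inspect the GPUs on a node yourself, you can run nvidia-smi (installed with the NVIDIA driver) once logged in to a compute node, for example:

```
# Show driver version, per-GPU utilization, and memory usage
nvidia-smi

# List only the device names
nvidia-smi -L
```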
Queue Configuration¶
At the moment, there is no resource manager/scheduler. Depending on how the resources are used, this may change.
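In the meantime, jobs are started by logging in to a compute node and running them by hand. A minimal sketch, assuming a hypothetical script my_job.sh in your home directory (which the compute nodes see over NFS) and the intra-cluster ssh keys described below:

```
# Launch the job on n0000, detached from your ssh session, logging to your home directory
ssh n0000 'nohup bash ~/my_job.sh > ~/my_job.log 2>&1 &'
```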
Additional Notes¶
Authentication¶
Authentication is currently via LDAP, using the same credentials you would use to access gmail.lbl.gov. We are in the process of evaluating OTP over ssh keys, or simply OTP.
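For example, to log in from your workstation (replace username with your LDAP username):

```
ssh username@catscan.lbl.gov
```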
Accessing the Compute nodes¶
You will need to generate ssh keys for intra-cluster access to the compute nodes. To do this, run:

```
ssh-keygen -t ed25519
cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```
The suggested default key name and location are fine. ssh-keygen will prompt for a passphrase; to leave it blank, just hit Enter. Passphrases can interfere with intra-cluster node communication when launching jobs, particularly with a scheduler, should we choose to deploy one.
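Once the key is in place, you should be able to reach the compute nodes from the login node without being prompted, for example:

```
# Quick check: run a command on n0000 from the login node
ssh n0000 hostname
```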
Storage for Data¶
Each user will be granted space under /clusterfs/bebb/users for data. There is also a group directory, /clusterfs/bebb/group-sw, to which the group has write permission; it should be used for custom software builds that the group may want to share. This is modeled after what we offer on our clusters in 1275, but this is your cluster, so you can use it however you see fit.
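As an illustration, assuming your user directory is named after your login (ask an administrator if it has not been created yet):

```
# Stage data under your user directory (mydata.tar.gz is a placeholder file name)
mkdir -p /clusterfs/bebb/users/$USER/projects
cp mydata.tar.gz /clusterfs/bebb/users/$USER/projects/

# Shared software builds live in the group directory
ls /clusterfs/bebb/group-sw
```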
Software Module Farm¶
See Documentation on using "modules".
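Typical usage looks like this (the module name below is illustrative; run module avail to see what is actually installed on Catscan):

```
module avail        # list available software modules
module load gcc     # load a module into your environment (example name)
module list         # show currently loaded modules
```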
Apptainer (formerly known as Singularity)¶
Some of you have expressed the need to import custom software built on different architectures. Rather than rebuilding that software for Catscan, you can use Apptainer.
Apptainer enables users to have full control of their environment. Apptainer can be used to package entire scientific workflows, software and libraries, and even data.
To get started, check out these links:
Documentation on using Singularity on Savio, a UC Berkeley cluster that we manage
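As a quick example of the basic workflow (the image below is arbitrary; any Docker Hub or locally built image works the same way):

```
# Pull an image from Docker Hub and convert it to a local SIF file
apptainer pull ubuntu.sif docker://ubuntu:22.04

# Run a command inside the container
apptainer exec ubuntu.sif cat /etc/os-release
```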