
Catscan

The Catscan cluster is a stand-alone cluster managed with xCAT.

Login node: catscan.lbl.gov

Provides a 269 TB ZFS filesystem for data and computational scratch space at /pool0 (see the configuration table below; a quick free-space check is shown after this list)

Compute nodes: n0000, n0001, n0002, n0003
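
To see how much space is free on /pool0, df works from any node; the zfs command gives dataset-level detail, assuming the dataset is also named pool0:

df -h /pool0
zfs list pool0        # dataset name assumed to match the mount point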

Cluster Configuration

catscan.lbl.gov (login node)
    Access: ssh with either LDAP credentials or a password provided by an administrator
    Storage / filesystems: /home (local drive, 211 GB); /pool0 (ZFS filesystem, 269 TB); /clusterfs/bebb/users; /clusterfs/bebb/group-sw
    CPU / cores: Intel(R) Xeon(R) Gold 6126, 48 cores (HT enabled)
    Memory: 196 GB
    GPU: N/A

n000[0-1] (compute nodes)
    Access: ssh from catscan.lbl.gov with cluster key
    Storage / filesystems: as above, via NFS
    CPU / cores: Intel(R) Xeon(R) Gold 6126, 48 cores (HT enabled)
    Memory: 196 GB
    GPU: 4x NVIDIA GeForce GTX 1080 Ti

n000[2-3] (compute nodes)
    Access: ssh from catscan.lbl.gov with cluster key
    Storage / filesystems: as above, via NFS
    Cores: 24
    Memory: 188 GB
    GPU: 2x NVIDIA RTX A4500

Operating System: CentOS Linux release 7.9.2009 (Core)

Nvidia driver & NVRM version: NVIDIA UNIX x86_64 Kernel Module 510.73.05 Sat May 7 05:30:26 UTC 2022

Nodes n0000 and n0001 each have four GTX 1080 Ti GPUs; deviceQuery reports the following for each of them:

CUDA Driver Version / Runtime Version          10.1 / 10.0 
CUDA Capability Major/Minor version number:    6.1 
Total amount of global memory:                 11178 MBytes (11721506816 bytes) 
(28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores 
GPU Max Clock rate:                            1582 MHz (1.58 GHz) 
Memory Clock rate:                             5505 MHz 
Memory Bus Width:                              352-bit 
L2 Cache Size:                                 2883584 bytes 
Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384) 
Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers 
Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers 
Total amount of constant memory:               65536 bytes 
Total amount of shared memory per block:       49152 bytes 
Total number of registers available per block: 65536 
Warp size:                                     32 
Maximum number of threads per multiprocessor:  2048 
Maximum number of threads per block:           1024 
Max dimension size of a thread block (x,y,z): (1024, 1024, 64) 
Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535) 
Maximum memory pitch:                          2147483647 bytes 
Texture alignment:                             512 bytes 
Concurrent copy and kernel execution:          Yes with 2 copy engine(s) 
Run time limit on kernels:                     No 
Integrated GPU sharing Host Memory:            No 
Support host page-locked memory mapping:       Yes 
Alignment requirement for Surfaces:            Yes 
Device has ECC support:                        Disabled 
Device supports Unified Addressing (UVA):      Yes 
Device supports Compute Preemption:            Yes 
Supports Cooperative Kernel Launch:            Yes 
Supports MultiDevice Co-op Kernel Launch:      Yes 
Device PCI Domain ID / Bus ID / location ID:   0 / 94 / 0 
Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
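
The listing above is typical deviceQuery output. To inspect the GPUs on a node yourself, nvidia-smi is always available with the driver; deviceQuery ships with the CUDA samples, and its install path may differ from the one shown here:

nvidia-smi                                        # driver version, utilization, and memory use per GPU
nvidia-smi -L                                     # list GPUs with index and UUID
/usr/local/cuda/extras/demo_suite/deviceQuery     # path is an assumption; adjust to your CUDA install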

Queue Configuration

At the moment, there is no resource manager/scheduler. Depending on how the resources are used, this may change.
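
In the meantime, it is a good idea to check whether a node is already in use before starting work on it, for example (after setting up the cluster key described under "Accessing the Compute nodes" below):

ssh n0000 nvidia-smi        # see whether the GPUs are busy and how much GPU memory is in use
ssh n0000 uptime            # check the CPU load average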

Additional Notes

Authentication

Authentication is currently via LDAP, using the same credentials you would use to access gmail.lbl.gov. We are in the process of evaluating OTP over ssh keys, or simply OTP.

Accessing the Compute nodes

You will need to generate an ssh key for intra-cluster access to the compute nodes.

To do this run:

ssh-keygen -t ed25519

Then run

cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys

Then finally

chmod 600 ~/.ssh/authorized_keys

The suggested default file name and location are fine. When prompted for a passphrase, leave it blank by pressing Enter. A passphrase can interfere with intra-cluster node communication when launching jobs, particularly under a scheduler, should we choose to deploy one.
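
Once the key is in place, you can verify passwordless access from the login node, for example:

ssh n0000 hostname        # should print n0000 without prompting for a password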

Storage for Data

Each user will be granted space under /clusterfs/bebb/users for data. There is also a group directory, /clusterfs/bebb/group-sw, to which the group has write permission; it should be used for custom software builds that the group may want to share. This is modeled after what we offer on our clusters in 1275, but this is your cluster, so you can use it as you see fit.
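
As an illustration (your directory name under /clusterfs/bebb/users is assumed to match your username):

ls /clusterfs/bebb/users/$USER      # your personal data directory (name assumed)
df -h /clusterfs/bebb/users         # space remaining on the shared filesystem
ls /clusterfs/bebb/group-sw         # group-writable area for shared software builds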

Software Module Farm

See the documentation on using "modules".
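
Typical commands look like the following (the module name is illustrative; module avail shows what is actually installed):

module avail                # list available software modules
module load gcc             # load a module into your environment (example name)
module list                 # show what is currently loaded
module unload gcc           # remove it again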

Apptainer (formerly known as Singularity)

Some of you have expressed the need to import custom software built on other architectures. Instead of rebuilding it on the catscan architecture, you can use Apptainer.

Apptainer enables users to have full control of their environment. Apptainer can be used to package entire scientific workflows, software and libraries, and even data.
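
A minimal sketch of pulling and running a container (the image names and commands here are placeholders; on the GPU nodes, the --nv flag exposes the NVIDIA driver inside the container):

apptainer pull ubuntu.sif docker://ubuntu:22.04     # build a SIF image from a Docker Hub image
apptainer exec ubuntu.sif cat /etc/os-release       # run a command inside the container
apptainer exec --nv mycode.sif nvidia-smi           # placeholder image; run with GPU access on a compute node

If the installed command is still named singularity, the same subcommands apply.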

To get started, check out these links:

Documentation on using Singularity on Savio, a cluster we manage at UC Berkeley