Merge pull request #345 from NMannall/main
Add documentation for the high memory node. Policy has been confirmed for the PR update on highmem node usage.
xguo-epcc authored Aug 13, 2024
2 parents 347911a + 1deb37c commit d06515d
Showing 2 changed files with 45 additions and 9 deletions.
8 changes: 5 additions & 3 deletions docs/index.md
@@ -12,9 +12,11 @@ information on how to get access to the system please see the [Cirrus
website](http://www.cirrus.ac.uk).

The Cirrus facility is based around an SGI ICE XA system. There are 280
-standard compute nodes and 38 GPU compute nodes. Each standard compute
-node has 256 GiB of memory and contains two 2.1 GHz, 18-core Intel Xeon
-(Broadwell) processors. Each GPU compute node has 384 GiB of memory,
+standard compute nodes, 1 high memory compute node and 38 GPU compute
+nodes. Each standard compute node has 256 GiB of memory and contains two
+2.1 GHz, 18-core Intel Xeon (Broadwell) processors. Each high memory
+compute node has 3 TiB of memory and contains four 2.7 GHz, 28-core Intel
+Xeon (Platinum) processors. Each GPU compute node has 384 GiB of memory,
contains two 2.4 GHz, 20-core Intel Xeon (Cascade Lake) processors and
four NVIDIA Tesla V100-SXM2-16GB (Volta) GPU accelerators connected to
the host processors and each other via PCIe. All nodes are connected
46 changes: 40 additions & 6 deletions docs/user-guide/batch.md
@@ -199,16 +199,49 @@ you request 1 GPU card, then you will be assigned a maximum of 384/4 =



### Primary resources on high memory (CPU) compute nodes

The *primary resource* you request on the high memory compute node is CPU
cores. The maximum amount of memory you are allocated is computed as the
number of CPU cores you requested multiplied by 1/112th of the total
memory available (as there are 112 CPU cores per node). So, if you
request the full node (112 cores), then you will be allocated a maximum
of all of the memory (3 TB) available on the node; however, if you
request 1 core, then you will be assigned a maximum of 3000/112 = 26.8 GB
of the memory available on the node.

!!! Note

Using the `--exclusive` option in jobs will give you access to the full
node memory even if you do not explicitly request all of the CPU cores
on the node.


!!! Warning

Using the `--exclusive` option will charge your account for the usage of
the entire node, even if you don't request all the cores in your
scripts.

!!! Note

You will not generally have access to the full amount of memory resource
on the node as some is retained for running the operating system and
other system processes.
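
To make the arithmetic above concrete, here is a minimal sketch of a high memory job script. It assumes Slurm's default of one CPU per task; the job name, budget code (`z04`) and executable are placeholders, not values taken from this page.

```bash
#!/bin/bash

# Sketch: use 56 of the 112 cores on the high memory node, which gives
# access to at most roughly 56 * (3000/112) = 1500 GB of the node's memory.
#SBATCH --job-name=highmem_example   # placeholder job name
#SBATCH --partition=highmem          # the single high memory node
#SBATCH --qos=highmem                # highmem QoS: max 24 hours, 1 node
#SBATCH --account=z04                # placeholder budget code - use your own
#SBATCH --time=12:00:00
#SBATCH --ntasks=56                  # half the cores => roughly half the memory

srun ./my_program.x                  # placeholder executable
```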



### Partitions

On Cirrus, compute nodes are grouped into partitions. You will have to
specify a partition using the `--partition` option in your submission
script. The following table has a list of active partitions on Cirrus.

-| Partition | Description | Total nodes available | Notes |
-|-----------|--------------------------------------------------------------------------------|-----------------------|-------|
-| standard | CPU nodes with 2x 18-core Intel Broadwell processors | 352 | |
-| gpu | GPU nodes with 4x Nvidia V100 GPU and 2x 20-core Intel Cascade Lake processors | 36 | |
+| Partition | Description | Total nodes available | Notes |
+|-----------|-----------------------------------------------------------------------------------------------|-----------------------|-------|
+| standard | CPU nodes with 2x 18-core Intel Broadwell processors, 256 GB memory | 352 | |
+| highmem | CPU node with 4x 28-core Intel Xeon Platinum processors, 3 TB memory | 1 | |
+| gpu | GPU nodes with 4x Nvidia V100 GPU and 2x 20-core Intel Cascade Lake processors, 384 GB memory | 36 | |

Cirrus Partitions
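
As a minimal sketch (the budget code, job name and executable are placeholders), a submission script targeting the standard partition would set the partition and a matching QoS in its header:

```bash
#!/bin/bash

#SBATCH --job-name=partition_example   # placeholder job name
#SBATCH --partition=standard           # one of: standard, highmem, gpu (see table above)
#SBATCH --qos=standard                 # the QoS must be one that applies to the chosen partition
#SBATCH --account=z04                  # placeholder budget code - use your own
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=36           # a standard node has 2 x 18 = 36 cores

srun ./my_program.x                    # placeholder executable
```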

@@ -232,12 +265,13 @@ resource limits. The following table has a list of active QoS on Cirrus.
| QoS Name | Jobs Running Per User | Jobs Queued Per User | Max Walltime | Max Size | Applies to Partitions | Notes |
|--------------|-----------------------|----------------------|--------------|-----------------------------------------|-----------------------|-------|
| standard | No limit | 500 jobs | 4 days | 88 nodes (3168 cores/25%) | standard | |
| highmem | 1 job | 2 jobs | 24 hours | 1 node | highmem | |
| largescale | 1 job | 4 jobs | 24 hours | 228 nodes (8192+ cores/65%) or 144 GPUs | standard, gpu | |
| long | 5 jobs | 20 jobs | 14 days | 16 nodes or 8 GPUs | standard, gpu | |
| highpriority | 10 jobs | 20 jobs | 4 days | 140 nodes | standard | charged at 1.5 x normal rate |
| gpu | No limit | 128 jobs | 4 days | 64 GPUs (16 nodes/40%) | gpu | |
| short | 1 job | 2 jobs | 20 minutes | 2 nodes or 4 GPUs | standard, gpu | |
| lowpriority | No limit | 100 jobs | 2 days | 36 nodes (1296 cores/10%) or 16 GPUs | standard, gpu | usage is not charged |

#### Cirrus QoS
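
As a usage note (a general Slurm behaviour, not specific to this page): the QoS can also be chosen at submission time on the `sbatch` command line, which overrides any `#SBATCH --qos` directive in the script. For example, a quick test within the `short` QoS limits (20 minutes walltime, up to 2 nodes) might be submitted as:

```bash
# submit.sh is a placeholder for an existing job script; options given on the
# command line override the corresponding #SBATCH directives inside it.
sbatch --partition=standard --qos=short --time=00:15:00 submit.sh
```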

