diff --git a/404.html b/404.html index 299fd618..873a457e 100644 --- a/404.html +++ b/404.html @@ -996,7 +996,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/e1000-migration/index.html b/e1000-migration/index.html index 3be10dd3..7e621c28 100644 --- a/e1000-migration/index.html +++ b/e1000-migration/index.html @@ -1001,7 +1001,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/index.html b/index.html index 4b3a2be7..74619aaf 100644 --- a/index.html +++ b/index.html @@ -1013,7 +1013,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/search/search_index.json b/search/search_index.json index e9705773..c149589d 100644 --- a/search/search_index.json +++ b/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Overview","text":""},{"location":"#cirrus","title":"Cirrus","text":"

Cirrus is an HPC and data science service hosted and run by EPCC at The University of Edinburgh. It is one of the EPSRC Tier-2 National HPC Services.

Cirrus is available to industry and academic researchers. For information on how to get access to the system please see the Cirrus website.

The Cirrus facility is based around an SGI ICE XA system. There are 280 standard compute nodes and 38 GPU compute nodes. Each standard compute node has 256 GiB of memory and contains two 2.1 GHz, 18-core Intel Xeon (Broadwell) processors. Each GPU compute node has 384 GiB of memory, contains two 2.4 GHz, 20-core Intel Xeon (Cascade Lake) processors and four NVIDIA Tesla V100-SXM2-16GB (Volta) GPU accelerators connected to the host processors and each other via PCIe. All nodes are connected using a single Infiniband fabric. This documentation covers:

Information on using the SAFE web interface for managing and reporting on your usage on Cirrus can be found in the Tier-2 SAFE Documentation.

This documentation draws on the documentation for the ARCHER2 National Supercomputing Service.

"},{"location":"e1000-migration/","title":"Cirrus migration to E1000 system","text":"

There will be a full service maintenance on Tuesday 12th March from 09:00 to 17:00 GMT to allow for some major changes on the Cirrus service.

Tip

If you need help or have questions on the Cirrus E1000 migration, please contact the Cirrus service desk.

"},{"location":"e1000-migration/#change-of-authentication-protocol","title":"Change of authentication protocol","text":"

We are changing the authentication protocol on Cirrus from LDAP to FreeIPA.

We expect this change to be transparent to users, but you may notice a change from username@cirrus to username@eidf within your SAFE account.

You should be able to connect using your existing Cirrus authentication factors, i.e. your SSH key pair and your TOTP token.

If you do experience issues, then please reset your tokens and try to reconnect. If problems persist then please contact the service desk.

Further details on Connecting to Cirrus

"},{"location":"e1000-migration/#new-work-file-system","title":"New /work file system","text":"

We are replacing the existing Lustre /work file system with a new, more performant Lustre file system, E1000.

The old /work file system will be available as read-only and we ask you to copy any files you require onto the new /work file system.

The old read-only file system will be removed on 1st May so please retrieve all required data by then.

For example, a user called username in project x01 could copy data from /mnt/lustre/indy2lfs/work/x01/x01/username/directory_to_copy to /work/x01/x01/username/destination_directory by running:

cp -r /mnt/lustre/indy2lfs/work/x01/x01/username/directory_to_copy \\ /work/x01/x01/username/destination_directory
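It can be worth checking a copy before the old file system is removed. The following is a minimal sketch, not an official procedure: it uses temporary directories as stand-ins for the old and new /work paths (an assumption made so the example is self-contained), copies a small tree with cp -r as above, and then compares the two trees with diff -r.

```shell
#!/bin/bash
# Sketch only: temporary directories stand in for the real paths.
old_work=$(mktemp -d)   # stands in for /mnt/lustre/indy2lfs/work/x01/x01/username
new_work=$(mktemp -d)   # stands in for /work/x01/x01/username

# Create a small example tree to copy.
mkdir -p "$old_work/directory_to_copy/sub"
echo "example data" > "$old_work/directory_to_copy/sub/file.txt"

# Recursive copy, in the same form as the command above.
cp -r "$old_work/directory_to_copy" "$new_work/destination_directory"

# diff -r exits with a non-zero status if any file differs or is missing.
if diff -r "$old_work/directory_to_copy" "$new_work/destination_directory"; then
    copy_status="verified"
else
    copy_status="mismatch"
fi
echo "$copy_status"
```

On Cirrus you would replace the two temporary directories with your own old and new project paths.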

Further details of Data Management and Transfer on Cirrus

Note

Slurm pending jobs: as the underlying pathname for /work will change with the addition of the new file system, all pending work in the Slurm queue will be removed during the migration. When the service returns, please resubmit your Slurm jobs to Cirrus.

"},{"location":"e1000-migration/#cse-module-updates","title":"CSE Module Updates","text":"

Our Computational Science and Engineering (CSE) Team have taken the opportunity of the arrival of the new file system to update modules and also remove older versions of modules. A full list of the changes to the modules can be found below.

Please contact the service desk if you have concerns about the removal of any of the older modules.
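Existing job scripts that load retired modules will need updating. As a rough sketch, the helper below (update_modules is a hypothetical name, not a Cirrus tool) uses sed to rewrite a handful of the retired module names from the list in this section to their suggested replacements. Only a few one-to-one mappings are included; entries that offer several alternatives need a manual choice.

```shell
#!/bin/bash
# update_modules FILE: print FILE with some retired module names replaced.
# The mappings are taken from the list of removed modules; extend as needed.
update_modules() {
    sed -e 's|autoconf/2\.69|autoconf/2.71|g' \
        -e 's|bison/3\.6\.4|bison/3.8.2|g' \
        -e 's|boost/1\.67\.0|boost/1.84.0|g' \
        -e 's|boost/1\.73\.0|boost/1.84.0|g' \
        -e 's|git/2\.21\.0|git/2.37.3|g' \
        -e 's|zlib/1\.2\.11|zlib/1.3.1|g' \
        "$1"
}

# Example job script fragment (hypothetical).
job_script=$(mktemp)
printf 'module load boost/1.67.0\nmodule load git/2.21.0\n' > "$job_script"

updated=$(update_modules "$job_script")
echo "$updated"
```

The function prints the rewritten script rather than editing it in place, so the result can be reviewed before the original is replaced.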

"},{"location":"e1000-migration/#to-be-removed","title":"TO BE REMOVED","text":"Package/module Advice for users altair-hwsolvers/13.0.213 Please contact the service desk if you wish to use Altair Hyperworks. altair-hwsolvers/14.0.210 Please contact the service desk if you wish to use Altair Hyperworks. ansys/18.0 Please contact the service desk if you wish to use ANSYS Fluent. ansys/19.0 Please contact the service desk if you wish to use ANSYS Fluent.

autoconf/2.69

Please use autoconf/2.71

bison/3.6.4

Please use bison/3.8.2

boost/1.67.0

Please use boost/1.84.0

boost/1.73.0

Please use boost/1.84.0

cmake/3.17.3

cmake/3.22.1

Please use cmake/3.25.2

CUnit/2.1.3

Please contact the service desk if you wish to use CUnit.

dolfin/2019.1.0-intel-mpi

dolfin/2019.1.0-mpt

Dolfin is no longer supported and will not be replaced.

eclipse/2020-09

Please contact the service desk if you wish to use Eclipse.

expat/2.2.9

Please use expat/2.6.0

fenics/2019.1.0-intel-mpi

fenics/2019.1.0-mpt

Fenics is no longer supported and will not be replaced.

fftw/3.3.8-gcc8-ompi4

fftw/3.3.8-intel19

fftw/3.3.9-ompi4-cuda11-gcc8

fftw/3.3.8-intel18

fftw/3.3.9-impi19-gcc8

fftw/3.3.10-intel19-mpt225

fftw/3.3.10-ompi4-cuda116-gcc8

Please use one of the following

fftw/3.3.10-gcc10.2-mpt2.25

fftw/3.3.10-gcc10.2-impi20.4

fftw/3.3.10-gcc10.2-ompi4-cuda11.8

fftw/3.3.10-gcc12.3-impi20.4

fftw/3.3.10-intel20.4-impi20.4

flacs/10.9.1

flacs-cfd/20.1

flacs-cfd/20.2

flacs-cfd/21.1

flacs-cfd/21.2

flacs-cfd/22.1

Please contact the helpdesk if you wish to use FLACS.

forge/22.1.3

Please use forge/23.1.1

gcc/6.2.0

Please use gcc/8.2.0 or later

gcc/6.3.0

Please use gcc/8.2.0 or later

gcc/12.2.0-offload

Please use gcc/12.3.0-offload

gdal/2.1.2-gcc

gdal/2.1.2-intel

gdal/2.4.4-gcc

Please use gdal/3.6.2-gcc

git/2.21.0

Please use git/2.37.3

gmp/6.2.0-intel

gmp/6.2.1-mpt

gmp/6.3.0-mpt

Please use gmp/6.3.0-gcc or gmp/6.3.0-intel

gnu-parallel/20200522-gcc6

Please use gnu-parallel/20240122-gcc10

gromacs/2022.1

gromacs/2022.1-gpu

gromacs/2022.3-gpu

Please use one of:

gromacs/2023.4

gromacs/2023.4-gpu

hdf5parallel/1.10.4-intel18-impi18

Please use hdf5parallel/1.14.3-intel20-impi20

hdf5parallel/1.10.6-gcc6-mpt225

Please use hdf5parallel/1.14.3-gcc10-mpt225

hdf5parallel/1.10.6-intel18-mpt225

Please use hdf5parallel/1.14.3-intel20-mpt225

hdf5parallel/1.10.6-intel19-mpt225

Please use hdf5parallel/1.14.3-intel20-mpt225

hdf5serial/1.10.6-intel18

Please use hdf5serial/1.14.3-intel20

horovod/0.25.0

horovod/0.25.0-gpu

horovod/0.26.1-gpu

Please use one of the pytorch or tensorflow modules

htop/3.1.2

Please use htop/3.2.1

intel 18.0 compilers etc

Please use Intel 19.5 or later; or oneAPI

intel 19.0 compilers etc

Please use Intel 19.5 or later

lammps/23Jun2022_intel19_mpt

lammps/8Feb2023-gcc8-impi

lammps/23Sep2023-gcc8-impi

lammps/8Feb2023-gcc8-impi-cuda118

lammps/23Sep2023-gcc8-impi-cuda118

Please use one of:

lammps/15Dec2023-gcc10.2-impi20.4

lammps-gpu/15Dec2023-gcc10.2-impi20.4-cuda11.8

libxkbcommon/1.0.1

Please contact the service desk if you wish to use libxkbcommon.

libnsl/1.3.0

Please contact the helpdesk if you wish to use libnsl.

libpng/1.6.30

This is no longer supported as the central module.

libtirpc/1.2.6

Please contact the helpdesk if you wish to use libtirpc.

libtool/2.4.6

Please use libtool/2.4.7

nco/4.9.3

Please use nco/5.1.9

nco/4.9.7

Please use nco/5.1.9

ncview/2.1.7

Please use ncview/2.1.10

netcdf-parallel/4.6.2-intel18-impi18

Please use netcdf-parallel/4.9.2-intel20-impi20

netcdf-parallel/4.6.2-intel19-mpt225

Please use netcdf-parallel/4.9.2-intel20-mpt225

nvidia/cudnn/8.2.1-cuda-11.6

nvidia/cudnn/8.2.1-cuda-11.6

nvidia/cudnn/8.9.4-cuda-11.6

nvidia/cudnn/8.9.7-cuda-11.6

Please use one of the following

nvidia/cudnn/8.6.0-cuda-11.6

nvidia/cudnn/8.6.0-cuda-11.6

nvidia/nvhpc/22.11-no-gcc

Use nvidia/nvhpc/22.11

nvidia/tensorrt/7.2.3.4

Please use nvidia/tensorrt/8.4.3.1-u2

openfoam/v8.0

Please consider a later version, e.g., v10.0

openfoam/v9.0

Please consider a later version, e.g., v11.0

openfoam/v2006

Please consider a later version, e.g., v2306

openmpi/4.1.2-cuda-11.6

openmpi/4.1.4

openmpi/4.1.4-cuda-11.6

openmpi/4.1.4-cuda-11.6-nvfortran

openmpi/4.1.4-cuda-11.8

openmpi/4.1.4-cuda-11.8-nvfortran

openmpi/4.1.5

openmpi/4.1.5-cuda-11.6

Please use one of the following

openmpi/4.1.6

openmpi/4.1.6-cuda-11.6

openmpi/4.1.6-cuda-11.6-nvfortran

openmpi/4.1.6-cuda-11.8

openmpi/4.1.6-cuda-11.8-nvfortran

petsc/3.13.2-intel-mpi-18

petsc/3.13.2-mpt

Please contact the helpdesk if you require a more recent version of PETSc.

pyfr/1.14.0-gpu

Please use pyfr/1.15.0-gpu

pytorch/1.12.1

pytorch/1.12.1-gpu

Please use one of the following

pytorch/1.13.1

pytorch/1.13.1-gpu

quantum-espresso/6.5-intel-19

Please use QE/6.5-intel-20.4

specfem3d

Please contact the helpdesk if you wish to use SPECFEM3D

starccm+/14.04.013-R8

starccm+/14.06.013-R8 → 2019.3.1-R8

starccm+/15.02.009-R8 → 2020.1.1-R8

starccm+/15.04.010-R8 → 2020.2.1-R8

starccm+/15.06.008-R8 → 2020.3.1-R8

starccm+/16.02.009 → 2021.1.1

Please contact the helpdesk if you wish to use STAR-CCM+

tensorflow/2.9.1-gpu

tensorflow/2.10.0

tensorflow/2.11.0-gpu

Please use one of the following

tensorflow/2.15.0

tensorflow/2.15.0-gpu

ucx/1.9.0

ucx/1.9.0-cuda-11.6

ucx/1.9.0-cuda-11.8

Please use one of the following

ucx/1.15.0

ucx/1.15.0-cuda-11.6

ucx/1.15.0-cuda-11.8

vasp-5.4.4-intel19-mpt220

zlib/1.2.11

Please use zlib/1.3.1"},{"location":"software-libraries/hdf5/","title":"HDF5","text":"

Serial and parallel versions of HDF5 are available on Cirrus.

Module name | Library version | Compiler | MPI library
hdf5parallel/1.10.4-intel18-impi18 | 1.10.4 | Intel 18 | Intel MPI 18
hdf5parallel/1.10.6-intel18-mpt222 | 1.10.6 | Intel 18 | HPE MPT 2.22
hdf5parallel/1.10.6-intel19-mpt222 | 1.10.6 | Intel 19 | HPE MPT 2.22
hdf5parallel/1.10.6-gcc6-mpt222 | 1.10.6 | GCC 6.3.0 | HPE MPT 2.22

Instructions to install a local version of HDF5 can be found on this repository: https://github.com/hpc-uk/build-instructions/tree/main/utils/HDF5

"},{"location":"software-libraries/intel_mkl/","title":"Intel MKL: BLAS, LAPACK, ScaLAPACK","text":"

The Intel Math Kernel Library (MKL) contains a variety of optimised numerical libraries including BLAS, LAPACK, and ScaLAPACK. In general, the exact commands required to build against MKL depend on the compiler, the environment, the requirements for parallelism, and so on. Consult the Intel MKL link line advisor for specific cases.

See https://software.intel.com/content/www/us/en/develop/articles/intel-mkl-link-line-advisor.html

Some examples are given below. Note that loading the appropriate Intel tools module provides the environment variable MKLROOT, which holds the location of the various MKL components.

"},{"location":"software-libraries/intel_mkl/#intel-compilers","title":"Intel Compilers","text":""},{"location":"software-libraries/intel_mkl/#blas-and-lapack","title":"BLAS and LAPACK","text":"

To use MKL libraries with the Intel compilers you just need to load the relevant Intel compiler module, and the Intel cmkl module, e.g.:

module load intel-20.4/fc\nmodule load intel-20.4/cmkl\n

To include MKL you specify the -mkl option on your compile and link lines. For example, to compile a simple Fortran program with MKL you could use:

ifort -c -mkl -o lapack_prb.o lapack_prb.f90\nifort -mkl -o lapack_prb.x lapack_prb.o\n

The -mkl flag without any options builds against the threaded version of MKL. If you wish to build against the serial version of MKL, you would use -mkl=sequential.

"},{"location":"software-libraries/intel_mkl/#scalapack","title":"ScaLAPACK","text":"

The distributed memory linear algebra routines in ScaLAPACK require MPI in addition to the compiler and MKL libraries. Here we use Intel MPI via:

module load intel-20.4/fc\nmodule load intel-20.4/mpi\nmodule load intel-20.4/cmkl\n

ScaLAPACK requires the Intel versions of BLACS at link time in addition to ScaLAPACK libraries; remember also to use the MPI versions of the compilers:

mpiifort -c -o linsolve.o linsolve.f90\nmpiifort -o linsolve.x linsolve.o -L${MKLROOT}/lib/intel64 \\\n-lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core \\\n-lmkl_blacs_intelmpi_lp64 -lpthread -lm -ldl\n
"},{"location":"software-libraries/intel_mkl/#gnu-compiler","title":"GNU Compiler","text":""},{"location":"software-libraries/intel_mkl/#blas-and-lapack_1","title":"BLAS and LAPACK","text":"

To use MKL libraries with the GNU compiler you first need to load the GNU compiler module and the Intel MKL module, e.g.:

module load gcc\nmodule load intel-20.4/cmkl\n

To include MKL you need to link explicitly against the MKL libraries. For example, to compile a single source file Fortran program with MKL you could use:

gfortran -c -o lapack_prb.o lapack_prb.f90\ngfortran -o lapack_prb.x lapack_prb.o -L$MKLROOT/lib/intel64 \\\n-lmkl_gf_lp64 -lmkl_core -lmkl_sequential\n

This will build against the serial version of MKL; to build against the threaded version use:

gfortran -c -o lapack_prb.o lapack_prb.f90\ngfortran -fopenmp -o lapack_prb.x lapack_prb.o -L$MKLROOT/lib/intel64 \\\n-lmkl_gf_lp64 -lmkl_core -lmkl_gnu_thread\n
"},{"location":"software-libraries/intel_mkl/#scalapack_1","title":"ScaLAPACK","text":"

The distributed memory linear algebra routines in ScaLAPACK require MPI in addition to the MKL libraries. On Cirrus, this is usually provided by SGI MPT.

module load gcc\nmodule load mpt\nmodule load intel-20.4/cmkl\n

Once you have the modules loaded you need to link against two additional libraries to include ScaLAPACK. Note we use here the relevant mkl_blacs_sgimpt_lp64 version of the BLACS library. Remember to use the MPI versions of the compilers:

mpif90 -f90=gfortran -c -o linsolve.o linsolve.f90\nmpif90 -f90=gfortran -o linsolve.x linsolve.o -L${MKLROOT}/lib/intel64 \\\n-lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core \\\n-lmkl_blacs_sgimpt_lp64 -lpthread -lm -ldl\n
"},{"location":"software-libraries/intel_mkl/#ilp-vs-lp-interface-layer","title":"ILP vs LP interface layer","text":"

Many applications will use 32-bit (4-byte) integers. This means the MKL 32-bit integer interface should be selected (which gives the _lp64 extensions seen in the examples above).

For applications which require, e.g., very large array indices (greater than 2^31-1 elements), the 64-bit integer interface is required. This gives rise to _ilp64 appended to library names. This may also require -DMKL_ILP64 at the compilation stage. Check the Intel link line advisor for specific cases.
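To make the threshold concrete, the sketch below computes the largest value a signed 32-bit integer can hold and reports which interface layer a given element count would need. The function name needs_ilp64 is purely illustrative; it is not part of MKL or of the Cirrus environment.

```shell
#!/bin/bash
# Largest value a signed 32-bit integer can hold: 2^31 - 1.
max_lp64_index=$(( (1 << 31) - 1 ))

# needs_ilp64 N: print "ilp64" if N exceeds the 32-bit integer range,
# otherwise print "lp64".
needs_ilp64() {
    if [ "$1" -gt "$max_lp64_index" ]; then
        echo "ilp64"
    else
        echo "lp64"
    fi
}

echo "$max_lp64_index"      # 2147483647
needs_ilp64 1000000000      # lp64
needs_ilp64 3000000000      # ilp64
```

An array of three billion elements, for example, cannot be indexed with 32-bit integers, so it would need the _ilp64 libraries.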

"},{"location":"software-packages/Ansys/","title":"ANSYS Fluent","text":"

ANSYS Fluent is a computational fluid dynamics (CFD) tool. Fluent includes well-validated physical modelling capabilities to deliver fast, accurate results across the widest range of CFD and multi-physics applications.

"},{"location":"software-packages/Ansys/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/Ansys/#using-ansys-fluent-on-cirrus","title":"Using ANSYS Fluent on Cirrus","text":"

ANSYS Fluent on Cirrus is only available to researchers who bring their own licence. Other users cannot access the version centrally-installed on Cirrus.

If you have any questions regarding ANSYS Fluent on Cirrus please contact the Cirrus Helpdesk.

"},{"location":"software-packages/Ansys/#running-parallel-ansys-fluent-jobs","title":"Running parallel ANSYS Fluent jobs","text":"

The following batch file starts Fluent in command-line mode (no GUI) and runs the Fluent batch file \"inputfile.fl\". One parameter that requires particular attention is \"-t504\". In this example 14 Cirrus nodes (14 * 72 = 1008 cores) are allocated, where half of the 1008 cores are physical and the other half are virtual. To run Fluent optimally on Cirrus, only the physical cores should be employed. As such, Fluent's -t flag should reflect the number of physical cores: in this example, \"-t504\" is employed.

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=ANSYS_test\n#SBATCH --time=0:20:0\n#SBATCH --exclusive\n# 14 nodes x 36 physical cores per node = 504 tasks (matches -t504 below)\n#SBATCH --nodes=14\n#SBATCH --tasks-per-node=36\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\nexport HOME=${HOME/home/work}\n\nscontrol show hostnames $SLURM_NODELIST > ~/fluent.launcher.host.txt\n\n# Launch the parallel job\n./fluent 3ddp -g -i inputfile.fl \\\n  -pinfiniband -alnamd64 -t504 -pib    \\\n  -cnf=~/fluent.launcher.host.txt      \\\n  -ssh  >& outputfile.txt\n
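The value passed to -t follows from the node count described above. As a small sketch, the hypothetical helper physical_core_count below halves the logical core count to get the physical cores; SLURM_JOB_NUM_NODES is the variable Slurm sets inside a job, and the fallback of 14 is only there so the example also runs outside a job.

```shell
#!/bin/bash
# physical_core_count NODES LOGICAL_CORES_PER_NODE
# Half of the logical cores on each node are physical.
physical_core_count() {
    echo $(( $1 * $2 / 2 ))
}

# Inside a Slurm job the node count is available as SLURM_JOB_NUM_NODES;
# fall back to the 14 nodes used in the description above.
nodes=${SLURM_JOB_NUM_NODES:-14}
fluent_t_flag="-t$(physical_core_count "$nodes" 72)"
echo "$fluent_t_flag"
```

For 14 nodes with 72 logical cores each, the helper gives 504, matching the \"-t504\" value discussed above.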

Below is the Fluent \"inputfile.fl\" batch script. Anything that starts with a \";\" is a comment. This script does the following:

"},{"location":"software-packages/Ansys/#actual-fluent-script-inputfilefl","title":"Actual Fluent script (\"inputfile.fl\"):","text":"

Replace [Your Path To ] before running

; Start transcript\n/file/start-transcript [Your Path To ]/transcript_output_01.txt\n; Read case file\nrc [Your Path To ]/200M-CFD-Benchmark.cas\n; Read data file\n/file/read-data [Your Path To ]/200M-CFD-Benchmark-500.dat\n; Print statistics\n/parallel/bandwidth\n/parallel/latency\n/parallel/timer/usage\n/parallel/timer/reset\n; Calculate 50 iterations\nit 50\n; Print statistics\n/parallel/timer/usage\n/parallel/timer/reset\n; Write data file\nwd [Your Path To ]/200M-CFD-Benchmark-500-new.dat\n; Stop transcript\n/file/stop-transcript\n; Exit Fluent\nexit\nyes\n
"},{"location":"software-packages/MATLAB/","title":"MATLAB","text":"

MATLAB combines a desktop environment tuned for iterative analysis and design processes with a programming language that expresses matrix and array mathematics directly.

"},{"location":"software-packages/MATLAB/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/MATLAB/#using-matlab-on-cirrus","title":"Using MATLAB on Cirrus","text":"

MATLAB R2020b and R2021b are available on Cirrus. R2020b is the current default.

This installation of MATLAB on Cirrus is covered by an Academic License - for use in teaching, academic research, and meeting course requirements at degree granting institutions only. Not for government, commercial, or other organizational use.

If your use of MATLAB is not covered by this license then please do not use this installation. Please contact the Cirrus Helpdesk to arrange use of your own MATLAB license on Cirrus.

Detailed version information:

-----------------------------------------------------------------------------------------------------\nMATLAB Version: 9.9.0.2037887 (R2020b) Update 8\nMATLAB License Number: 904098\nOperating System: Linux 4.18.0-305.25.1.el8_4.x86_64 #1 SMP Mon Oct 18 14:34:11 EDT 2021 x86_64\nJava Version: Java 1.8.0_202-b08 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode\n-----------------------------------------------------------------------------------------------------\nMATLAB                                                Version 9.9         (R2020b)\nSimulink                                              Version 10.2        (R2020b)\nDSP System Toolbox                                    Version 9.11        (R2020b)\nDeep Learning HDL Toolbox                             Version 1.0         (R2020b)\nDeep Learning Toolbox                                 Version 14.1        (R2020b)\nImage Processing Toolbox                              Version 11.2        (R2020b)\nParallel Computing Toolbox                            Version 7.3         (R2020b)\nSignal Processing Toolbox                             Version 8.5         (R2020b)\nStatistics and Machine Learning Toolbox               Version 12.0        (R2020b)\nSymbolic Math Toolbox                                 Version 8.6         (R2020b)\nWavelet Toolbox                                       Version 5.5         (R2020b)\n
"},{"location":"software-packages/MATLAB/#running-matlab-jobs","title":"Running MATLAB jobs","text":"

On Cirrus, MATLAB is intended to be used on the compute nodes within Slurm job scripts. Use on the login nodes should be restricted to setting preferences, accessing help, and launching MDCS jobs. It is recommended that MATLAB is used without a GUI on the compute nodes, as the interactive response is slow.

"},{"location":"software-packages/MATLAB/#running-parallel-matlab-jobs-using-the-local-cluster","title":"Running parallel MATLAB jobs using the local cluster","text":"

The license for this installation of MATLAB provides only 32 workers via MDCS but provides 36 workers via the local cluster profile (there are 36 cores on a Cirrus compute node), so we only recommend the use of MDCS to test the configuration of distributed memory parallel computations for eventual use of your own MDCS license.

The local cluster should be used within a Slurm job script - you submit a job that runs MATLAB and uses the local cluster, which is the compute node that the job is running on.

MATLAB will normally use up to the total number of cores on a node for multi-threaded operations (e.g. matrix inversions) and for parallel computations. It also makes no restriction on its memory use. These features are incompatible with the shared use of nodes on Cirrus. For the local cluster, a wrapper script is provided to limit the number of cores and amount of memory used, in proportion to the number of CPUs selected in the Slurm job script. Please use this wrapper instead of using MATLAB directly.

Say you have a job that requires 3 workers, each running 2 threads. You should therefore request 3 x 2 = 6 cores. An example job script for this particular case would be:

#!/bin/bash\n\n#SBATCH --job-name=Example_MATLAB_Job\n#SBATCH --time=0:20:0\n#SBATCH --nodes=1\n#SBATCH --tasks-per-node=6\n#SBATCH --cpus-per-task=1\n\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\nmodule load matlab\n\nmatlab_wrapper -nodisplay < /mnt/lustre/indy2lfs/sw/cse-matlab/examples/testp.m > testp.log\n

Note, for MATLAB versions R2019 and later, the matlab_wrapper_2019 script may be required (see 2019 section below).

This would run the testp.m script, without a display, and exit when testp.m has finished. 6 CPUs are selected, which correspond to 6 cores, and the following limits would be set initially:

ncores = 6\nmemory = 42GB\n\nMaximum number of computational threads (maxNumCompThreads)          = 6\nPreferred number of workers in a parallel pool (PreferredNumWorkers) = 6\nNumber of workers to start on your local machine (NumWorkers)        = 6\nNumber of computational threads to use on each worker (NumThreads)   = 1\n

The testp.m program sets NumWorkers to 3 and NumThreads to 2:

cirrus_cluster = parcluster('local');\nncores = cirrus_cluster.NumWorkers * cirrus_cluster.NumThreads;\ncirrus_cluster.NumWorkers = 3;\ncirrus_cluster.NumThreads = 2;\nfprintf(\"NumWorkers = %d NumThreads = %d ncores = %d\\n\",cirrus_cluster.NumWorkers,cirrus_cluster.NumThreads,ncores);\nif cirrus_cluster.NumWorkers * cirrus_cluster.NumThreads > ncores\n    disp(\"NumWorkers * NumThreads > ncores\");\n    disp(\"Exiting\");\n    exit(1);\nend\nsaveProfile(cirrus_cluster);\nclear cirrus_cluster;\n\n\nn = 3;\nA = 3000;\n\na=zeros(A,A,n);\nb=1:n;\n\nparpool;\n\ntic\nparfor i = 1:n\n    a(:,:,i) = rand(A);\nend\ntoc\ntic\nparfor i = 1:n\n    b(i) = max(abs(eig(a(:,:,i))));\nend\ntoc\n

Note that PreferredNumWorkers, NumWorkers and NumThreads persist between MATLAB sessions but will be updated correctly if you use the wrapper each time.

NumWorkers and NumThreads can be changed (using parcluster and saveProfile) but NumWorkers * NumThreads should be less than or equal to the number of cores (ncores above). If you wish a worker to run a threaded routine in serial, you must set NumThreads to 1 (the default).
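The constraint can be checked before submitting a job. The sketch below is illustrative only (fits_in_allocation is a hypothetical helper, not part of the MATLAB wrapper) and tests whether a chosen combination of workers and threads fits in the number of cores requested from Slurm.

```shell
#!/bin/bash
# fits_in_allocation WORKERS THREADS NCORES
# Print "ok" if WORKERS * THREADS fits within NCORES, otherwise "too many".
fits_in_allocation() {
    if [ $(( $1 * $2 )) -le "$3" ]; then
        echo "ok"
    else
        echo "too many"
    fi
}

fits_in_allocation 3 2 6    # ok: the testp.m case, 3 workers x 2 threads on 6 cores
fits_in_allocation 4 2 6    # too many: 8 threads would not fit on 6 cores
```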

If you specify exclusive node access, then all the cores and memory will be available. On the login nodes, a single core is used and memory is not limited.

"},{"location":"software-packages/MATLAB/#matlab-2019-versions","title":"MATLAB 2019 versions","text":"

From MATLAB version R2019 onwards, the configuration options have changed: the -r flag has been replaced with the -batch flag. To accommodate this, a new job wrapper script is required to run applications. For these versions of MATLAB, if you need to use the -r or -batch flag, replace this line in your Slurm script:

matlab_wrapper -nodisplay -nodesktop -batch \"main_simulated_data_FINAL_clean(\"$ind\",\"$gamma\",\"$rw\",'\"$SLURM_JOB_ID\"')\"\n

with:

matlab_wrapper_2019 -nodisplay -nodesktop -batch \"main_simulated_data_FINAL_clean(\"$ind\",\"$gamma\",\"$rw\",'\"$SLURM_JOB_ID\"')\"\n

and this should allow scripts to run normally.

"},{"location":"software-packages/MATLAB/#running-parallel-matlab-jobs-using-mdcs","title":"Running parallel MATLAB jobs using MDCS","text":"

It is possible to use MATLAB on the login node to set up an MDCS Slurm cluster profile and then launch jobs using that profile. However, this does not give per-job control of the number of cores and walltime; these are set once in the profile.

This MDCS profile can be used in MATLAB on the login node - the MDCS computations are done in Slurm jobs launched using the profile.

"},{"location":"software-packages/MATLAB/#configuration","title":"Configuration","text":"

Start MATLAB on the login node. Configure MATLAB to run parallel jobs on your cluster by calling configCluster. For each cluster, configCluster only needs to be called once per version of MATLAB:

configCluster\n

Jobs will now default to the cluster rather than submit to the local machine (the login node in this case).

"},{"location":"software-packages/MATLAB/#configuring-jobs","title":"Configuring jobs","text":"

Prior to submitting the job, you can specify various parameters to pass to your jobs, such as walltime, e-mail, etc. Other than ProjectCode and WallTime, none of these are required to be set.

NOTE: Any parameters specified using this workflow will be persistent between MATLAB sessions:

% Get a handle to the cluster.\nc = parcluster('cirrus');\n\n% Assign the project code for the job.  **[REQUIRED]**\nc.AdditionalProperties.ProjectCode = 'project-code';\n\n% Specify the walltime (e.g. 5 hours).  **[REQUIRED]**\nc.AdditionalProperties.WallTime = '05:00:00';\n\n% Specify e-mail address to receive notifications about your job.\nc.AdditionalProperties.EmailAddress = 'your_name@your_address';\n\n% Request a specific reservation to run your job.  It is better to\n% use the queues rather than a reservation.\nc.AdditionalProperties.Reservation = 'your-reservation';\n\n% Set the job placement (e.g., pack, excl, scatter:excl).\n% Usually the default of free is what you want.\nc.AdditionalProperties.JobPlacement = 'pack';\n\n% Request to run in a particular queue.  Usually the default (no\n% specific queue requested) will route the job to the correct queue.\nc.AdditionalProperties.QueueName = 'queue-name';\n\n% If you are using GPUs, request up to 4 GPUs per node (this will\n% override a requested queue name and will use the 'gpu' queue).\nc.AdditionalProperties.GpusPerNode = 4;\n

Save changes after modifying AdditionalProperties fields:

c.saveProfile\n

To see the values of the current configuration options, call the specific AdditionalProperties name:

c.AdditionalProperties\n

To clear a value, assign the property an empty value ('', [], or false):

% Turn off email notifications.\nc.AdditionalProperties.EmailAddress = '';\n
"},{"location":"software-packages/MATLAB/#interactive-jobs","title":"Interactive jobs","text":"

To run an interactive pool job on the cluster, use parpool as before. configCluster sets NumWorkers to 32 in the cluster to match the number of MDCS workers available in our TAH licence. If you have your own MDCS licence, you can change this by setting c.NumWorkers and saving the profile:

% Open a pool of 32 workers on the cluster.\np = parpool('cirrus',32);\n

Rather than running locally on one compute node, this pool can run across multiple nodes on the cluster:

% Run a parfor over 1000 iterations.\nparfor idx = 1:1000\n  a(idx) = ...\nend\n

Once you have finished using the pool, delete it:

% Delete the pool\np.delete\n
"},{"location":"software-packages/MATLAB/#serial-jobs","title":"Serial jobs","text":"

Rather than running interactively, use the batch command to submit asynchronous jobs to the cluster. This is generally more useful on Cirrus, which usually has long queues. The batch command will return a job object which is used to access the output of the submitted job. See the MATLAB documentation for more help on batch:

% Get a handle to the cluster.\nc = parcluster('cirrus');\n\n% Submit job to query where MATLAB is running on the cluster.\nj = c.batch(@pwd, 1, {});\n\n% Query job for state.\nj.State\n\n% If state is finished, fetch results.\nj.fetchOutputs{:}\n\n% Delete the job after results are no longer needed.\nj.delete\n

To retrieve a list of currently running or completed jobs, call parcluster to retrieve the cluster object. The cluster object stores an array of jobs that were run, are running, or are queued to run. This allows you to fetch the results of completed jobs. Retrieve and view the list of jobs as shown below:

c = parcluster('cirrus');\njobs = c.Jobs\n

Once you have identified the job you want, you can retrieve the results as you have done previously.

fetchOutputs is used to retrieve function output arguments; if using batch with a script, use load instead. Data that has been written to files on the cluster needs to be retrieved directly from the file system.

To view results of a previously completed job:

% Get a handle on job with ID 2.\nj2 = c.Jobs(2);\n

NOTE: You can view a list of your jobs, as well as their IDs, using the above c.Jobs command:

% Fetch results for job with ID 2.\nj2.fetchOutputs{:}\n\n% If the job produces an error, view the error log file.\nc.getDebugLog(j2.Tasks(1))\n

NOTE: When submitting independent jobs, with multiple tasks, you will have to specify the task number.

"},{"location":"software-packages/MATLAB/#parallel-jobs","title":"Parallel jobs","text":"

Users can also submit parallel workflows with batch. You can use the following example (parallel_example.m) for a parallel job:

function t = parallel_example(iter)\n\n  if nargin==0, iter = 16; end\n\n  disp('Start sim')\n\n  t0 = tic;\n  parfor idx = 1:iter\n    A(idx) = idx;\n    pause(2);\n  end\n  t =toc(t0);\n\n  disp('Sim completed.')\n

Use the batch command again, but since you are running a parallel job, you also specify a MATLAB Pool:

% Get a handle to the cluster.\nc = parcluster('cirrus');\n\n% Submit a batch pool job using 4 workers for 16 simulations.\nj = c.batch(@parallel_example, 1, {}, 'Pool', 4);\n\n% View current job status.\nj.State\n\n% Fetch the results after a finished state is retrieved.\nj.fetchOutputs{:}\n\nans =\n\n8.8872\n

The job ran in 8.89 seconds using 4 workers. Note that these jobs will always request N+1 CPU cores, since one worker is required to manage the batch job and pool of workers. For example, a job that needs eight workers will consume nine CPU cores. With an MDCS licence for 32 workers, you will be able to have a pool of 31 workers.

Run the same simulation but increase the Pool size. This time, to retrieve the results later, keep track of the job ID.

NOTE: For some applications, there will be a diminishing return when allocating too many workers, as the overhead may exceed computation time.

% Get a handle to the cluster.\nc = parcluster('cirrus');\n\n% Submit a batch pool job using 8 workers for 16 simulations.\nj = c.batch(@parallel_example, 1, {}, 'Pool', 8);\n\n% Get the job ID\nid = j.ID\n\nid =\n\n4\n
% Clear workspace, as though you have quit MATLAB.\nclear j\n

Once you have a handle to the cluster, call the findJob method to search for the job with the specified job ID:

% Get a handle to the cluster.\nc = parcluster('cirrus');\n\n% Find the old job\nj = c.findJob('ID', 4);\n\n% Retrieve the state of the job.\nj.State\n\nans =\n\nfinished\n\n% Fetch the results.\nj.fetchOutputs{:}\n\nans =\n\n4.7270\n\n% If necessary, retrieve an output/error log file.\nc.getDebugLog(j)\n

The job now runs in 4.73 seconds using 8 workers. Run the code with different numbers of workers to determine the ideal number to use.

Alternatively, to retrieve job results via a graphical user interface, use the Job Monitor (Parallel > Monitor Jobs).

"},{"location":"software-packages/MATLAB/#debugging","title":"Debugging","text":"

If a serial job produces an error, you can call the getDebugLog method to view the error log file:

j.Parent.getDebugLog(j.Tasks(1))\n

When submitting independent jobs with multiple tasks, you will have to specify the task number. For Pool jobs, do not dereference into the job object:

j.Parent.getDebugLog(j)\n

The scheduler ID can be derived by calling schedID:

schedID(j)\n\nans =\n\n25539\n
"},{"location":"software-packages/MATLAB/#to-learn-more","title":"To learn more","text":"

To learn more about the MATLAB Parallel Computing Toolbox, check out these resources:

"},{"location":"software-packages/MATLAB/#gpus","title":"GPUs","text":"

Calculations using GPUs can be done on the GPU nodes (see ../user-guide/gpu). This can be done using MATLAB within a Slurm job script, similar to using the local cluster, or using the MDCS profile. The GPUs are shared unless you request exclusive access to the node (4 GPUs), so you may find that you share a GPU with another user.

"},{"location":"software-packages/altair_hw/","title":"Altair Hyperworks","text":"

Hyperworks includes best-in-class modeling, linear and nonlinear analyses, structural and system-level optimization, fluid and multi-body dynamics simulation, electromagnetic compatibility (EMC), multiphysics analysis, model-based development, and data management solutions.

"},{"location":"software-packages/altair_hw/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/altair_hw/#using-hyperworks-on-cirrus","title":"Using Hyperworks on Cirrus","text":"

Hyperworks is licensed software, so you require access to a Hyperworks licence to use it. For queries on access to Hyperworks on Cirrus and to enable your access please contact the Cirrus helpdesk.

The standard mode of using Hyperworks on Cirrus is to use the installation of the Desktop application on your local workstation or laptop to set up your model/simulation. Once this has been done you would transfer the required files over to Cirrus using SSH and then launch the appropriate Solver program (OptiStruct, RADIOSS, MotionSolve).

Once the Solver has finished you can transfer the output back to your local system for visualisation and analysis in the Hyperworks Desktop.

"},{"location":"software-packages/altair_hw/#running-serial-hyperworks-jobs","title":"Running serial Hyperworks jobs","text":"

Each of the Hyperworks Solvers can be run in serial on Cirrus in a similar way. You should construct a batch submission script with the command to launch your chosen Solver with the correct command line options.

For example, here is a job script to run a serial RADIOSS job on Cirrus:

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=HW_RADIOSS_test\n#SBATCH --time=0:20:0\n#SBATCH --exclusive\n#SBATCH --nodes=1\n#SBATCH --tasks-per-node=1\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Set the number of threads to the CPUs per task\nexport OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK\n\n# Load Hyperworks module\nmodule load altair-hwsolvers/14.0.210\n\n# Launch the serial job\n#   Using a single task with one thread\n#   srun picks up the distribution from the sbatch options\nsrun --cpu-bind=cores radioss box.fem\n
"},{"location":"software-packages/altair_hw/#running-parallel-hyperworks-jobs","title":"Running parallel Hyperworks jobs","text":"

Only the OptiStruct Solver currently supports parallel execution. OptiStruct supports a number of parallel execution modes of which two can be used on Cirrus:

"},{"location":"software-packages/altair_hw/#optistruct-smp","title":"OptiStruct SMP","text":"

You can use up to 36 physical cores (or 72 virtual cores using HyperThreading) for OptiStruct SMP mode as these are the maximum numbers available on each Cirrus compute node.

You use the -nt option to OptiStruct to specify the number of cores to use.

For example, to run an 18-core OptiStruct SMP calculation you could use the following job script:

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=HW_OptiStruct_SMP\n#SBATCH --time=0:20:0\n#SBATCH --exclusive\n#SBATCH --nodes=1\n#SBATCH --tasks-per-node=1\n#SBATCH --cpus-per-task=36\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load Hyperworks module\nmodule load altair-hwsolvers/14.0.210\n\n# Launch the SMP job\n#   Using a single task with 18 threads (set by -nt)\n#   srun picks up the distribution from the sbatch options\nsrun --cpu-bind=cores --ntasks=1 optistruct box.fem -nt 18\n
"},{"location":"software-packages/altair_hw/#optistruct-spmd-mpi","title":"OptiStruct SPMD (MPI)","text":"

There are four different parallelisation schemes for SPMD OptiStruct that are selected by different flags:

You should launch OptiStruct SPMD using the standard Intel MPI mpirun command.

Note: OptiStruct does not support the use of SGI MPT; you must use Intel MPI.

Example OptiStruct SPMD job submission script:

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=HW_OptiStruct_SPMD\n#SBATCH --time=0:20:0\n#SBATCH --exclusive\n#SBATCH --nodes=2\n#SBATCH --tasks-per-node=36\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load Hyperworks module and Intel MPI\nmodule load altair-hwsolvers/14.0.210\nmodule load intel-mpi-17\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\n# Run the OptiStruct SPMD Solver (domain decomposition mode)\n#   Use 72 cores, 36 on each node (i.e. all physical cores)\n#   srun picks up the distribution from the sbatch options\nsrun --ntasks=72 $ALTAIR_HOME/hwsolvers/optistruct/bin/linux64/optistruct_14.0.211_linux64_impi box.fem -ddmmode\n
"},{"location":"software-packages/castep/","title":"CASTEP","text":"

CASTEP is a leading code for calculating the properties of materials from first principles. Using density functional theory, it can simulate a wide range of materials properties including energetics, structure at the atomic level, vibrational properties, electronic response properties etc. In particular it has a wide range of spectroscopic features that link directly to experiment, such as infra-red and Raman spectroscopies, NMR, and core level spectra.

"},{"location":"software-packages/castep/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/castep/#using-castep-on-cirrus","title":"Using CASTEP on Cirrus","text":"

CASTEP is only available to users who have a valid CASTEP licence.

If you have a CASTEP licence and wish to have access to CASTEP on Cirrus please submit a request through the SAFE.

Note

CASTEP versions 19 and above require a separate licence from CASTEP versions 18 and below so these are treated as two separate access requests.

"},{"location":"software-packages/castep/#running-parallel-castep-jobs","title":"Running parallel CASTEP jobs","text":"

CASTEP can exploit multiple nodes on Cirrus and will generally be run in exclusive mode over more than one node.

For example, the following script will run a CASTEP job using 4 nodes (144 cores).

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=CASTEP_Example\n#SBATCH --time=1:0:0\n#SBATCH --exclusive\n#SBATCH --nodes=4\n#SBATCH --tasks-per-node=36\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load CASTEP version 18 module\nmodule load castep/18\n\n# Set OMP_NUM_THREADS=1 to avoid unintentional threading\nexport OMP_NUM_THREADS=1\n\n# Run using input in test_calc.in\nsrun --distribution=block:block castep.mpi test_calc\n
"},{"location":"software-packages/cp2k/","title":"CP2K","text":"

CP2K is a quantum chemistry and solid state physics software package that can perform atomistic simulations of solid state, liquid, molecular, periodic, material, crystal, and biological systems. CP2K provides a general framework for different modelling methods such as DFT using the mixed Gaussian and plane waves approaches GPW and GAPW. Supported theory levels include DFTB, LDA, GGA, MP2, RPA, semi-empirical methods (AM1, PM3, PM6, RM1, MNDO, \u2026), and classical force fields (AMBER, CHARMM, \u2026). CP2K can do simulations of molecular dynamics, metadynamics, Monte Carlo, Ehrenfest dynamics, vibrational analysis, core level spectroscopy, energy minimisation, and transition state optimisation using NEB or dimer method.

"},{"location":"software-packages/cp2k/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/cp2k/#using-cp2k-on-cirrus","title":"Using CP2K on Cirrus","text":"

CP2K is available through the cp2k module. Loading this module provides access to the MPI/OpenMP hybrid cp2k.psmp executable.

To run CP2K after loading this module you should also source the environment setup script that was generated by CP2K's toolchain (see the example job script below).

"},{"location":"software-packages/cp2k/#running-parallel-cp2k-jobs-mpiopenmp-hybrid-mode","title":"Running Parallel CP2K Jobs - MPI/OpenMP Hybrid Mode","text":"

To run CP2K using MPI and OpenMP, load the cp2k module and use the cp2k.psmp executable.

For example, the following script will run a CP2K job using 8 nodes, with 2 OpenMP threads per MPI process:

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=CP2K_test\n#SBATCH --time=0:20:0\n#SBATCH --exclusive\n#SBATCH --nodes=8\n#SBATCH --tasks-per-node=18\n#SBATCH --cpus-per-task=2\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load CP2K\nmodule load cp2k\n\n# Source the environment setup script generated by CP2K's install toolchain\nsource $CP2K/tools/toolchain/install/setup\n\n# Set the number of threads to the value specified for --cpus-per-task above\nexport OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK\n\n# Run using input in test.inp\nsrun cp2k.psmp -i test.inp\n
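The values of --tasks-per-node and --cpus-per-task in the script above are chosen so that their product fills the 36 cores of a standard compute node. The arithmetic can be sketched as follows; the variable names are illustrative only and are not read by Slurm or CP2K.

```shell
# Sketch: work out MPI tasks per node for a given OpenMP thread count.
cores_per_node=36     # physical cores on a standard Cirrus compute node
threads_per_task=2    # value used for --cpus-per-task / OMP_NUM_THREADS
nodes=8               # value used for --nodes

# Each MPI task occupies threads_per_task cores
tasks_per_node=$((cores_per_node / threads_per_task))
total_tasks=$((tasks_per_node * nodes))

echo "--tasks-per-node=${tasks_per_node} (${total_tasks} MPI processes in total)"
```

If you change the number of threads per task, adjust --tasks-per-node in the same way so that the node is neither under- nor over-subscribed.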
"},{"location":"software-packages/elements/","title":"ELEMENTS","text":"

ELEMENTS is a computational fluid dynamics (CFD) software tool based on the HELYX\u00ae package developed by ENGYS. The software features an advanced open-source CFD simulation engine and a client-server GUI to provide a flexible and cost-effective HPC solver platform for automotive and motorsports design applications, including a dedicated virtual wind tunnel wizard for external vehicle aerodynamics and other proven methods for UHTM, HVAC, aeroacoustics, etc.

"},{"location":"software-packages/elements/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/elements/#using-elements-on-cirrus","title":"Using ELEMENTS on Cirrus","text":"

ELEMENTS is only available on Cirrus to authorised users with a valid license of the software. For any queries regarding ELEMENTS on Cirrus, please contact ENGYS or the Cirrus Helpdesk.

ELEMENTS applications can be run on Cirrus in two ways:

A complete user's guide to access ELEMENTS on demand via Cirrus is provided by ENGYS as part of this service.

"},{"location":"software-packages/elements/#running-elements-jobs-in-parallel","title":"Running ELEMENTS Jobs in Parallel","text":"

The standard execution of ELEMENTS applications on Cirrus is handled through the command line using a submission script to control Slurm. A basic submission script for running multiple ELEMENTS applications in parallel using the SGI-MPT (Message Passing Toolkit) module is included below. In this example the applications helyxHexMesh, caseSetup and helyxAero are run sequentially using 4 nodes (144 cores).

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=Test\n#SBATCH --time=1:00:00\n#SBATCH --exclusive\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=36\n#SBATCH --cpus-per-task=1\n#SBATCH --output=test.out\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=t01\n\n# Replace [partition name] below with your partition name (e.g. standard)\n#SBATCH --partition=standard\n\n\n# Replace [QoS name] below with your QoS name (e.g. commercial)\n#SBATCH --qos=commercial\n\n# Load any required modules\nmodule load gcc\nmodule load mpt\n\n# Load the HELYX-Core environment v3.5.0 (select version as needed, e.g. 3.5.0)\nsource /scratch/sw/elements/v3.5.0/CORE/HELYXcore-3.5.0/platforms/activeBuild.shrc\n\n# Launch ELEMENTS applications in parallel\nexport myoptions=\"-parallel\"\njobs=\"helyxHexMesh caseSetup helyxAero\"\n\nfor job in `echo $jobs`\ndo\n\n  case \"$job\" in\n   *                )   options=\"$myoptions\" ;;\n  esac\n\n  srun $job $myoptions 2>&1 | tee log/$job.$SLURM_JOB_ID.out\n\ndone\n

Alternatively, the user can execute most ELEMENTS applications on Cirrus interactively via the GUI by following these simple steps:

  1. Launch ELEMENTS GUI in your local Windows or Linux machine.
  2. Create a client-server connection to Cirrus using the dedicated node provided for this service in the GUI. Enter your Cirrus user login details and the total number of processors to be employed in the cluster for parallel execution.
  3. Use the GUI in the local machine to access the remote file system in Cirrus to load a geometry, create a computational grid, set up a simulation, solve the flow, and post-process the results using the HPC resources available in the cluster. The Slurm scheduling associated with every ELEMENTS job is handled automatically by the client-server.
  4. Visualise the remote data from your local machine, perform changes to the model and complete as many flow simulations in Cirrus as required, all interactively from within the GUI.
  5. Disconnect the client-server at any point during execution, leave a utility or solver running in the cluster, and resume the connection to Cirrus from another client machine to reload an existing case in the GUI when needed.
"},{"location":"software-packages/flacs/","title":"FLACS","text":"

FLACS from Gexcon is the industry standard for CFD explosion modelling and one of the best validated tools for modeling flammable and toxic releases in a technical safety context.

The Cirrus cluster is ideally suited to run multiple FLACS simulations simultaneously, via its batch system. Short lasting simulations (of typically up to a few hours computing time each) can be processed efficiently and you could get a few hundred done in a day or two. In contrast, the Cirrus cluster is not particularly suited for running single big FLACS simulations with many threads: each node on Cirrus has 2x4 memory channels, and for memory-bound applications like FLACS multi-threaded execution will not scale linearly beyond eight cores. On most systems, FLACS will not scale well to more than four cores (cf. the FLACS User's Manual), and therefore multi-core hardware is normally best used by increasing the number of simulations running in parallel rather than by increasing the number of cores per simulation.

Gexcon has two different service offerings on Cirrus: FLACS-Cloud and FLACS-HPC. FLACS-Cloud is the preferable way to exploit the HPC cluster, directly from the FLACS graphical user interfaces. For users who are familiar with accessing remote Linux HPC systems manually, FLACS-HPC may be an option. Both services are presented below.

"},{"location":"software-packages/flacs/#flacs-cloud","title":"FLACS-Cloud","text":"

FLACS-Cloud is a high performance computing service available right from the FLACS-Risk user interface, as well as from the FLACS RunManager. It allows you to run FLACS simulations on the high performance cloud computing infrastructure of Gexcon's partner EPCC straight from the graphical user interfaces of FLACS -- no need to manually log in, transfer data, or start jobs!

By using the FLACS-Cloud service, you can run a large number of simulations very quickly, without having to invest in in-house computing hardware. The FLACS-Cloud service scales to your demand and facilitates running projects with rapid development cycles.

The workflow for using FLACS-Cloud is described in the FLACS User's Manual and in the FLACS-Risk documentation; you can also find basic information in the knowledge base of the FLACS User Portal (accessible for FLACS license holders).

"},{"location":"software-packages/flacs/#flacs-hpc","title":"FLACS-HPC","text":"

Compared to FLACS-Cloud, the FLACS-HPC service is built on more traditional ways of accessing and using a remote Linux cluster. Therefore the user experience is more basic, and FLACS has to be run manually. For an experienced user, however, this way of exploiting the HPC system can be at least as efficient as FLACS-Cloud.

Follow the steps below to use the FLACS-HPC facilities on Cirrus.

Note: The instructions below assume you have a valid account on Cirrus. To get an account please first get in touch with FLACS support at flacs@gexcon.com and then see the instructions in the Tier-2 SAFE Documentation.

Note: In the instructions below you should replace \"username\" with your actual Cirrus username.

"},{"location":"software-packages/flacs/#log-into-cirrus","title":"Log into Cirrus","text":"

Log into Cirrus following the instructions at ../user-guide/connecting.

"},{"location":"software-packages/flacs/#upload-your-data-to-cirrus","title":"Upload your data to Cirrus","text":"

Transfer your data to Cirrus by following the instructions at ../user-guide/data.

For example, to copy the scenario definition files from the current directory to the folder project_folder in your home directory on Cirrus run the following command on your local machine:

rsync -avz c*.dat3 username@cirrus.epcc.ed.ac.uk:project_folder\n

Note that this will preserve soft links as such; the link targets are not copied if they are outside the current directory.

"},{"location":"software-packages/flacs/#flacs-license-manager","title":"FLACS license manager","text":"

A valid license is required to use FLACS, and a license manager is used to check that one is available. To be able to connect to the license manager from the batch system, users wishing to use FLACS should add the following file as ~/.hasplm/hasp_104628.ini (that is, in their home directory):

; copy this file (vendor is gexcon) to ~/.hasplm/hasp_104628.ini\naggressive = 0\nbroadcastsearch = 0\nserveraddr = cirrus-services1\ndisable_IPv6 = 1\n
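One way to create this file from the command line is sketched below. The HASP_DIR variable is introduced here purely for illustration (it defaults to ~/.hasplm and is not something FLACS reads); the file contents are exactly those listed above.

```shell
# Sketch: write the license manager configuration shown above.
# HASP_DIR is a hypothetical helper variable used only by this snippet.
HASP_DIR="${HASP_DIR:-$HOME/.hasplm}"
mkdir -p "$HASP_DIR"

# Quoted heredoc delimiter so the contents are written literally
cat > "$HASP_DIR/hasp_104628.ini" <<'EOF'
; copy this file (vendor is gexcon) to ~/.hasplm/hasp_104628.ini
aggressive = 0
broadcastsearch = 0
serveraddr = cirrus-services1
disable_IPv6 = 1
EOF

# Show the result so it can be checked by eye
cat "$HASP_DIR/hasp_104628.ini"
```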
"},{"location":"software-packages/flacs/#submit-a-flacs-job-to-the-queue","title":"Submit a FLACS job to the queue","text":"

To run FLACS on Cirrus you must first change to the directory where your FLACS jobs are located, using the Linux cd (change directory) command. For example:

cd projects/sim\n

The usual way to submit work to the queue system is to write a submission script, which would be located in the working directory. This is a standard bash shell script, a simple example of which is given here:

#!/bin/bash --login\n\n#SBATCH --job-name=test_flacs_1\n#SBATCH --ntasks=1\n#SBATCH --cpus-per-task=1\n#SBATCH --time=02:00:00\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load flacs-cfd/21.2\n\nrun_runflacs 012345\n

The script has a series of special comments (introduced by #SBATCH) which give information to the queue system to allow the system to allocate space for the job and to execute the work. These are discussed in more detail below.

The flacs module is loaded to make the application available. Note that you should specify the specific version you require:

module load flacs-cfd/21.2\n

(Use module avail flacs to see which versions are available.) The appropriate FLACS commands can then be executed in the usual way.

Submit your FLACS jobs using the sbatch command, e.g.:

$ sbatch --account=i123 script.sh\nSubmitted batch job 157875\n

The --account=i123 option is obligatory and states that account i123 will be used to record the CPU time consumed by the job, which is then billed to the relevant customer. Replace i123 with your own project account code. You can check your account details in SAFE.

The name of the submission script here is script.sh. The queue system returns a unique job id (here 157875) to identify the job. For example, the standard output here will appear in a file named slurm-157875.out in the current working directory.
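If later steps in a workflow need the job id, it can be extracted from the message that sbatch prints. The following is a minimal sketch that parses the sample message shown above so that it can be run anywhere; the variable names are illustrative only.

```shell
# Sketch: extract the job id from sbatch's confirmation message.
# On Cirrus the message would come from:
#   message=$(sbatch --account=i123 script.sh)
# Here the sample output from above is used instead.
message="Submitted batch job 157875"

# The job id is the last space-separated word of the message
job_id="${message##* }"

# Default name of the standard output file for that job
output_file="slurm-${job_id}.out"

echo "Job id:      ${job_id}"
echo "Output file: ${output_file}"
```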

"},{"location":"software-packages/flacs/#options-for-flacs-jobs","title":"Options for FLACS jobs","text":"

The #SBATCH lines in the script above set various parameters which control execution of the job. The first, --job-name, just provides a label which will be associated with the job.

The parameter --ntasks=1 is the number of tasks or processes involved in the job. For a serial FLACS job you would use --ntasks=1.

The maximum length of time (i.e. wall clock time) you want the job to run is specified with the --time=hh:mm:ss option. After this time, your job will be terminated by the job scheduler. The default time limit is 12 hours. It is useful to have an estimate of how long your job will take to be able to specify the correct limit (which can take some experience). Note that shorter jobs can sometimes be scheduled more quickly by the system.

Multithreaded FLACS simulations can be run on Cirrus with the following job submission, schematically:

#SBATCH --ntasks=1\n#SBATCH --cpus-per-task=4\n...\n\nrun_runflacs -dir projects/sim 010101 NumThreads=4\n

When submitting multithreaded FLACS simulations the --cpus-per-task option should be used in order for the queue system to allocate the correct resources (here 4 threads running on 4 cores). In addition, one must also specify the number of threads used by the simulation with the NumThreads=4 option to run_runflacs.
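To keep the two settings from drifting apart, the thread count passed to run_runflacs can be taken from the Slurm environment. This is a sketch only: SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is given, the fallback of 1 is an assumption for runs outside a job, and the command is printed rather than executed.

```shell
# Sketch: derive NumThreads from the value given to --cpus-per-task.
# Falls back to 1 thread when run outside a Slurm job (assumption).
num_threads="${SLURM_CPUS_PER_TASK:-1}"

# Build the command from the example above; 010101 is the example run number.
flacs_cmd="run_runflacs -dir projects/sim 010101 NumThreads=${num_threads}"

# Print the command; in a real job script you would run it instead.
echo "${flacs_cmd}"
```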

One can also specify the OpenMP version of FLACS explicitly via, e.g.,

export OMP_NUM_THREADS=20\n\nrun_runflacs version _omp <run number> NumThreads=20\n

See the FLACS manual for further details.

"},{"location":"software-packages/flacs/#monitor-your-jobs","title":"Monitor your jobs","text":"

You can monitor the progress of your jobs with the squeue command. This will list all jobs that are running or queued on the system. To list only your jobs use:

squeue -u username\n
"},{"location":"software-packages/flacs/#submitting-many-flacs-jobs-as-a-job-array","title":"Submitting many FLACS jobs as a job array","text":"

Running many related scenarios with the FLACS simulator is ideally suited for using job arrays, i.e. running the simulations as part of a single job.

Note that you must determine ahead of time the number of scenarios involved. This determines the number of array elements, which must be specified at the point of job submission. The number of array elements is specified by the --array argument to sbatch.

A job script for running a job array with 128 FLACS scenarios that are located in the current directory could look like this:

#!/bin/bash --login\n\n# Recall that the resource specification is per element of the array\n# so this would give 128 instances of one task (with one thread per\n# task --cpus-per-task=1).\n\n#SBATCH --array=1-128\n\n#SBATCH --ntasks=1\n#SBATCH --cpus-per-task=1\n#SBATCH --time=02:00:00\n#SBATCH --account=z04\n\n#SBATCH --partition=standard\n#SBATCH --qos=commercial\n\n# Abbreviate some SLURM variables for brevity/readability\n\nTASK_MIN=${SLURM_ARRAY_TASK_MIN}\nTASK_MAX=${SLURM_ARRAY_TASK_MAX}\nTASK_ID=${SLURM_ARRAY_TASK_ID}\nTASK_COUNT=${SLURM_ARRAY_TASK_COUNT}\n\n# Form a list of relevant files, and check the number of array elements\n# matches the number of cases with 6-digit identifiers.\n\nCS_FILES=(`ls -1 cs??????.dat3`)\n\nif test \"${#CS_FILES[@]}\" -ne \"${TASK_COUNT}\";\nthen\n  printf \"Number of files is:       %s\\n\" \"${#CS_FILES[@]}\"\n  printf \"Number of array tasks is: %s\\n\" \"${TASK_COUNT}\"\n  printf \"Do not match!\\n\"\nfi\n\n# All tasks loop through the entire list to find their specific case.\n\nfor (( jid = $((${TASK_MIN})); jid <= $((${TASK_MAX})); jid++ ));\ndo\n  if test \"${TASK_ID}\" -eq \"${jid}\";\n  then\n    # File list index with offset zero\n    file_id=$((${jid} - ${TASK_MIN}))\n    # Form the substring file_id (recall syntax is :offset:length)\n    my_file=${CS_FILES[${file_id}]}\n    my_file_id=${my_file:2:6}\n  fi\ndone\n\nprintf \"Task %d has file %s id %s\\n\" \"${TASK_ID}\" \"${my_file}\" \"${my_file_id}\"\n\nmodule load flacs-cfd/21.2\n`which run_runflacs` ${my_file_id}\n
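Because each array element handles exactly one file, the case for a given task can also be selected directly by position, without looping over all task ids. The sketch below is not part of the FLACS tooling: it creates three dummy scenario files in a temporary directory and sets the SLURM_ARRAY_TASK_ variables by hand so that it can be tried outside a job.

```shell
# Sketch: pick the scenario file for one array task by position.
# Dummy files and hand-set variables stand in for a real job array.
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1
touch cs000001.dat3 cs000002.dat3 cs000003.dat3

SLURM_ARRAY_TASK_MIN=1   # set by Slurm in a real array job
SLURM_ARRAY_TASK_ID=2    # set by Slurm in a real array job

# Load the sorted file list into the positional parameters
set -- cs??????.dat3

# Skip the files belonging to earlier tasks; $1 is then this task's file
shift $((SLURM_ARRAY_TASK_ID - SLURM_ARRAY_TASK_MIN))
my_file=$1

# Strip the "cs" prefix and ".dat3" suffix to get the 6-digit identifier
my_file_id=${my_file#cs}
my_file_id=${my_file_id%.dat3}

printf "Task %d has file %s id %s\n" "$SLURM_ARRAY_TASK_ID" "$my_file" "$my_file_id"
```

This relies on the shell expanding the glob in sorted order, which gives the same ordering as the ls-based list in the script above.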
"},{"location":"software-packages/flacs/#transfer-data-from-cirrus-to-your-local-system","title":"Transfer data from Cirrus to your local system","text":"

After your simulations are finished, transfer the data back from Cirrus following the instructions at ../user-guide/data.

For example, to copy the result files from the directory project_folder in your home directory on Cirrus to the folder /tmp on your local machine use:

rsync -rvz --include='r[13t]*.*' --exclude='*' username@cirrus.epcc.ed.ac.uk:project_folder/ /tmp\n
"},{"location":"software-packages/flacs/#billing-for-flacs-hpc-use-on-cirrus","title":"Billing for FLACS-HPC use on Cirrus","text":"

CPU time on Cirrus is measured in CPUh for each job run on a compute node, based on the number of physical cores employed. Only jobs submitted to compute nodes via sbatch are charged. Any processing on a login node is not charged. However, using login nodes for computations other than simple pre- or post-processing is strongly discouraged.
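As a rough illustration of the unit, the sketch below multiplies physical cores by wall-clock time. It assumes a simple cores-times-hours calculation, which the description above implies but does not spell out; the figures are examples, and the authoritative usage is whatever Cirrus logs for the job.

```shell
# Sketch: estimate CPUh as physical cores x wall-clock hours (assumed formula).
cores=4                 # e.g. a job submitted with --cpus-per-task=4
wallclock_seconds=5400  # e.g. a job that ran for 1 h 30 min

# Work in core-seconds to stay in integer arithmetic, then convert
core_seconds=$((cores * wallclock_seconds))
cpuh_whole=$((core_seconds / 3600))
cpuh_tenths=$(( (core_seconds % 3600) * 10 / 3600 ))

echo "Estimated usage: ${cpuh_whole}.${cpuh_tenths} CPUh"
```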

Gexcon normally bills monthly for the use of FLACS-Cloud and FLACS-HPC, based on the Cirrus CPU usage logging.

"},{"location":"software-packages/flacs/#getting-help","title":"Getting help","text":"

Get in touch with FLACS Support by email to flacs@gexcon.com if you encounter any problems. For specific issues related to Cirrus rather than FLACS contact the Cirrus helpdesk.

"},{"location":"software-packages/gaussian/","title":"Gaussian","text":"

Gaussian is a general-purpose computational chemistry package.

"},{"location":"software-packages/gaussian/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/gaussian/#using-gaussian-on-cirrus","title":"Using Gaussian on Cirrus","text":"

Gaussian on Cirrus is only available to University of Edinburgh researchers through the University's site licence. Users from other institutions cannot access the version centrally-installed on Cirrus.

If you wish to have access to Gaussian on Cirrus please request access via SAFE.

Gaussian cannot run across multiple nodes. This means that the maximum number of cores you can use for Gaussian jobs is 36 (the number of cores on a compute node). In reality, even large Gaussian jobs will not be able to make effective use of more than 8 cores. You should explore the scaling and performance of your calculations on the system before running production jobs.

"},{"location":"software-packages/gaussian/#scratch-directories","title":"Scratch Directories","text":"

You will typically add lines to your job submission script to create a scratch directory on the solid state storage for temporary Gaussian files, e.g.:

export GAUSS_SCRDIR=\"/scratch/space1/x01/auser/$SLURM_JOBID.tmp\"\nmkdir -p $GAUSS_SCRDIR\n

You should also add a line at the end of your job script to remove the scratch directory, e.g.:

rm -r $GAUSS_SCRDIR\n
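A more defensive variant, sketched below, registers the clean-up with a shell trap so that the scratch directory is also removed if the script stops before reaching its last line. This is an optional pattern rather than something Gaussian requires; SCRATCH_BASE is a hypothetical variable used so that the snippet can be tried off Cirrus, where you would use the /scratch path shown above instead.

```shell
# Sketch: create the scratch directory and remove it when the script exits.
# SCRATCH_BASE is hypothetical; on Cirrus use your /scratch/space1 directory.
SCRATCH_BASE="${SCRATCH_BASE:-${TMPDIR:-/tmp}}"
export GAUSS_SCRDIR="${SCRATCH_BASE}/${SLURM_JOBID:-manual}.tmp"
mkdir -p "$GAUSS_SCRDIR"

# Remove the directory when the script exits, whether normally or after an error
trap 'rm -rf "$GAUSS_SCRDIR"' EXIT

# ... the g16 command would go here ...
echo "Scratch directory: $GAUSS_SCRDIR"
```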
"},{"location":"software-packages/gaussian/#running-serial-gaussian-jobs","title":"Running serial Gaussian jobs","text":"

In many cases you will use Gaussian in serial mode. The following example script will run a serial Gaussian job on Cirrus (before using, ensure you have created a Gaussian scratch directory as outlined above).

#!/bin/bash\n\n# job options (name, compute nodes, job time)\n#SBATCH --job-name=G16_test\n#SBATCH --ntasks=1\n#SBATCH --time=0:20:0\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load Gaussian module\nmodule load gaussian\n\n# Setup the Gaussian environment\nsource $g16root/g16/bsd/g16.profile\n\n# Location of the scratch directory\nexport GAUSS_SCRDIR=\"/scratch/space1/x01/auser/$SLURM_JOBID.tmp\"\nmkdir -p $GAUSS_SCRDIR\n\n# Run using input in \"test0027.com\"\ng16 test0027\n\n# Remove the temporary scratch directory\nrm -r $GAUSS_SCRDIR\n
"},{"location":"software-packages/gaussian/#running-parallel-gaussian-jobs","title":"Running parallel Gaussian jobs","text":"

Gaussian on Cirrus can use shared memory parallelism through OpenMP by setting the OMP_NUM_THREADS environment variable. The number of cores requested in the job should also be modified to match.

For example, the following script will run a Gaussian job using 4 cores.

#!/bin/bash --login\n\n# job options (name, compute nodes, job time)\n#SBATCH --job-name=G16_test\n#SBATCH --ntasks=1\n#SBATCH --cpus-per-task=4\n#SBATCH --time=0:20:0\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load Gaussian module\nmodule load gaussian\n\n# Setup the Gaussian environment\nsource $g16root/g16/bsd/g16.profile\n\n# Location of the scratch directory\nexport GAUSS_SCRDIR=\"/scratch/space1/x01/auser/$SLURM_JOBID.tmp\"\nmkdir -p $GAUSS_SCRDIR\n\n# Run using input in \"test0027.com\"\nexport OMP_NUM_THREADS=4\ng16 test0027\n\n# Remove the temporary scratch directory\nrm -r $GAUSS_SCRDIR\n
"},{"location":"software-packages/gromacs/","title":"GROMACS","text":"

GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.

"},{"location":"software-packages/gromacs/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/gromacs/#using-gromacs-on-cirrus","title":"Using GROMACS on Cirrus","text":"

GROMACS is Open Source software and is freely available to all Cirrus users. A number of versions are available:

"},{"location":"software-packages/gromacs/#running-parallel-gromacs-jobs-pure-mpi","title":"Running parallel GROMACS jobs: pure MPI","text":"

GROMACS can exploit multiple nodes on Cirrus and will generally be run in exclusive mode over more than one node.

For example, the following script will run a GROMACS MD job using 2 nodes (72 cores) with pure MPI.

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=gmx_test\n#SBATCH --nodes=2\n#SBATCH --tasks-per-node=36\n#SBATCH --time=0:25:0\n# Make sure you are not sharing nodes with other users\n#SBATCH --exclusive\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load GROMACS module\nmodule load gromacs\n\n# Run using input in test_calc.tpr\nexport OMP_NUM_THREADS=1 \nsrun gmx_mpi mdrun -s test_calc.tpr\n
"},{"location":"software-packages/gromacs/#running-parallel-gromacs-jobs-hybrid-mpiopenmp","title":"Running parallel GROMACS jobs: hybrid MPI/OpenMP","text":"

The following script will run a GROMACS MD job using 2 nodes (72 cores) with 6 MPI processes per node (12 MPI processes in total) and 6 OpenMP threads per MPI process.

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=gmx_test\n#SBATCH --nodes=2\n#SBATCH --tasks-per-node=6\n#SBATCH --cpus-per-task=6\n#SBATCH --time=0:25:0\n# Make sure you are not sharing nodes with other users\n#SBATCH --exclusive\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load GROMACS and MPI modules\nmodule load gromacs\n\n# Run using input in test_calc.tpr\nexport OMP_NUM_THREADS=6\nsrun gmx_mpi mdrun -s test_calc.tpr\n
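
The layout arithmetic behind the hybrid example can be checked with a few lines of shell. This is an illustrative sketch only: the variable names are hypothetical, and the figure of 36 cores per standard node is taken from the 2 nodes (72 cores) quoted above.

```shell
# Sketch: check that a hybrid MPI/OpenMP layout fills the nodes exactly.
# The values mirror the script above; 36 cores per standard node follows
# from the 2 nodes (72 cores) figure quoted in this section.
nodes=2
tasks_per_node=6
threads_per_task=6
cores_per_node=36

total_tasks=$((nodes * tasks_per_node))
cores_used_per_node=$((tasks_per_node * threads_per_task))
total_cores=$((nodes * cores_used_per_node))

echo MPI processes in total: $total_tasks
echo Cores used per node: $cores_used_per_node of $cores_per_node
echo Cores used in total: $total_cores
```

If you change --tasks-per-node or --cpus-per-task, the product of the two should still equal the number of cores per node so that no cores are left idle or oversubscribed.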
"},{"location":"software-packages/gromacs/#gromacs-gpu-jobs","title":"GROMACS GPU jobs","text":"

The following script will run a GROMACS GPU MD job using 1 node (40 cores and 4 GPUs). The job is set up to run on <MPI task count> MPI processes with <OMP thread count> OpenMP threads per process; you will need to replace these placeholders with actual values when running your script.

Note

Unlike the base version of GROMACS, the GPU version comes with only MDRUN installed. For any pre- and post-processing, you will need to use the non-GPU version of GROMACS.

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=gmx_test\n#SBATCH --nodes=1\n#SBATCH --time=0:25:0\n#SBATCH --exclusive\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n#SBATCH --gres=gpu:4\n\n# Load GROMACS and MPI modules\nmodule load gromacs/2023.4-gpu\n\n# Run using input in test_calc.tpr\nexport OMP_NUM_THREADS=<OMP thread count>\nsrun --ntasks=<MPI task count> --cpus-per-task=<OMP thread count> \\\n     gmx_mpi mdrun -ntomp <OMP thread count> -s test_calc.tpr\n

Information on how to assign different types of calculation to the CPU or GPU appears in the GROMACS documentation under Getting good performance from mdrun.

"},{"location":"software-packages/helyx/","title":"HELYX\u00ae","text":"

HELYX is a comprehensive, general-purpose, computational fluid dynamics (CFD) software package for engineering analysis and design optimisation developed by ENGYS. The package features an advanced open-source CFD simulation engine and a client-server GUI to provide a flexible and cost-effective HPC solver platform for enterprise applications.

"},{"location":"software-packages/helyx/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/helyx/#using-helyx-on-cirrus","title":"Using HELYX on Cirrus","text":"

HELYX is only available on Cirrus to authorised users with a valid license to use the software. For any queries regarding HELYX on Cirrus, please contact ENGYS or the Cirrus Helpdesk.

HELYX applications can be run on Cirrus in two ways:

A complete user\u2019s guide to access HELYX on demand via Cirrus is provided by ENGYS as part of this service.

"},{"location":"software-packages/helyx/#running-helyx-jobs-in-parallel","title":"Running HELYX Jobs in Parallel","text":"

The standard execution of HELYX applications on Cirrus is handled through the command line using a submission script to control Slurm. A basic submission script for running multiple HELYX applications in parallel using the SGI-MPT (Message Passing Toolkit) module is included below. In this example the applications helyxHexMesh, caseSetup and helyxSolve are run sequentially using 4 nodes (144 cores).

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=Test\n#SBATCH --time=1:00:00\n#SBATCH --exclusive\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=36\n#SBATCH --cpus-per-task=1\n#SBATCH --output=test.out\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=t01\n\n# Replace [partition name] below with your partition name (e.g. standard)\n#SBATCH --partition=standard\n\n# Replace [QoS name] below with your QoS name (e.g. commercial)\n#SBATCH --qos=commercial\n\n# Load any required modules\nmodule load gcc\nmodule load mpt\n\n# Load the HELYX-Core environment v3.5.0 (select version as needed, e.g. 3.5.0)\nsource /scratch/sw/helyx/v3.5.0/CORE/HELYXcore-3.5.0/platforms/activeBuild.shrc\n\n# Set the number of threads to 1\nexport OMP_NUM_THREADS=1\n\n# Launch HELYX applications in parallel\nexport myoptions=\"-parallel\"\njobs=\"helyxHexMesh caseSetup helyxSolve\"\n\nfor job in `echo $jobs`\ndo\n\n   case \"$job\" in\n    *                )   options=\"$myoptions\" ;;\n   esac\n\n   srun $job $myoptions 2>&1 | tee log/$job.$SLURM_JOB_ID.out\n\ndone\n

Alternatively, the user can execute most HELYX applications on Cirrus interactively via the GUI by following these simple steps:

  1. Launch the HELYX GUI on your local Windows or Linux machine.
  2. Create a client-server connection to Cirrus using the dedicated node provided for this service in the GUI. Enter your Cirrus user login details and the total number of processors to be employed in the cluster for parallel execution.
  3. Use the GUI in the local machine to access the remote file system in Cirrus to load a geometry, create a computational grid, set up a simulation, solve the flow, and post-process the results using the HPC resources available in the cluster. The Slurm scheduling associated with every HELYX job is handled automatically by the client-server.
  4. Visualise the remote data from your local machine, perform changes to the model and complete as many flow simulations in Cirrus as required, all interactively from within the GUI.
  5. Disconnect the client-server at any point during execution, leave a utility or solver running in the cluster, and resume the connection to Cirrus from another client machine to reload an existing case in the GUI when needed.
"},{"location":"software-packages/lammps/","title":"LAMMPS","text":"

LAMMPS is a classical molecular dynamics code; the name is an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS has potentials for solid-state materials (metals, semiconductors), soft matter (biomolecules, polymers) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale.

"},{"location":"software-packages/lammps/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/lammps/#using-lammps-on-cirrus","title":"Using LAMMPS on Cirrus","text":"

LAMMPS is Open Source software, and is freely available to all Cirrus users. A number of versions are available:

"},{"location":"software-packages/lammps/#running-parallel-lammps-jobs-mpi","title":"Running parallel LAMMPS jobs (MPI)","text":"

LAMMPS can exploit multiple nodes on Cirrus and will generally be run in exclusive mode over more than one node.

For example, the following script will run a LAMMPS MD job using 4 nodes (144 cores) with pure MPI.

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=lammps_Example\n#SBATCH --time=00:20:00\n#SBATCH --exclusive\n#SBATCH --nodes=4\n#SBATCH --tasks-per-node=36\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load LAMMPS module\nmodule load lammps\n\n# Run using input in in.test\nsrun lmp_mpi < in.test\n
"},{"location":"software-packages/lammps/#running-parallel-lammps-jobs-gpu","title":"Running parallel LAMMPS jobs (GPU)","text":"

LAMMPS can exploit multiple GPUs, although the performance scaling depends heavily on the particular system, so each user should run benchmarks for their particular use case. While not every LAMMPS force field or fix is available for GPU, the vast majority are, and more are added with each new version. Check the LAMMPS documentation for GPU compatibility with a specific command.

For example, the following script will run a LAMMPS MD job using 2 GPUs.

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=lammps_Example\n#SBATCH --time=00:20:00\n#SBATCH --nodes=1\n#SBATCH --gres=gpu:2\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load LAMMPS module\nmodule load lammps-gpu\n\n# Run using input in input.file, writing the log to log.file\nsrun lmp -sf gpu -pk gpu 2 -in input.file -l log.file\n
"},{"location":"software-packages/lammps/#compiling-lammps-on-cirrus","title":"Compiling LAMMPS on Cirrus","text":"

Compile instructions for LAMMPS on Cirrus can be found on GitHub:

"},{"location":"software-packages/molpro/","title":"Molpro","text":"

Molpro is a comprehensive system of ab initio programs for advanced molecular electronic structure calculations, designed and maintained by H.-J. Werner and P. J. Knowles, and containing contributions from many other authors. It comprises efficient and well parallelized programs for standard computational chemistry applications, such as DFT with a large choice of functionals, as well as state-of-the art high-level coupled-cluster and multi-reference wave function methods.

"},{"location":"software-packages/molpro/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/molpro/#using-molpro-on-cirrus","title":"Using Molpro on Cirrus","text":"

In order to use the Molpro binaries on Cirrus you must possess a valid Molpro licence key. Without a key you will be able to access the binaries but will not be able to run any calculations.

"},{"location":"software-packages/molpro/#running","title":"Running","text":"

To run Molpro you need to add the correct module to your environment, specify your licence key using the MOLPRO_KEY environment variable, and make sure you specify the location for the temporary files using the TMPDIR environment variable. You can load the default Molpro module with:

module add molpro\n

Once you have loaded the module, the Molpro executables are available in your PATH.

"},{"location":"software-packages/molpro/#example-job-submission-script","title":"Example Job Submission Script","text":"

An example Molpro job submission script is shown below.

#!/bin/bash\n#SBATCH --job-name=molpro_test\n#SBATCH --nodes=1\n#SBATCH --tasks-per-node=36\n#SBATCH --exclusive\n#SBATCH --time=0:15:0\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Replace \"budget\" with your budget code in the line below\n#SBATCH --account=budget\n\n# Load the molpro module\nmodule add molpro\n\n# Specify your Molpro licence key\n#   Replace this with the value of your Molpro licence key\nexport MOLPRO_KEY=\"...your Molpro key...\"\n\n# Make sure temporary files are in your home file space\nexport TMPDIR=$SLURM_SUBMIT_DIR\n\n# Run Molpro using the input my_file.inp\n#    Requested 1 node above = 36 cores\n#    Note use of \"molpro\" command rather than usual \"srun\"\nmolpro -n 36 my_file.inp\n
"},{"location":"software-packages/namd/","title":"NAMD","text":"

NAMD, recipient of a 2002 Gordon Bell Award and a 2012 Sidney Fernbach Award, is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 500,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.

"},{"location":"software-packages/namd/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/namd/#using-namd-on-cirrus","title":"Using NAMD on Cirrus","text":"

NAMD is freely available to all Cirrus users.

"},{"location":"software-packages/namd/#running-parallel-namd-jobs","title":"Running parallel NAMD jobs","text":"

NAMD can exploit multiple nodes on Cirrus and will generally be run in exclusive mode over more than one node.

For example, the following script will run a NAMD MD job across 2 nodes (72 cores) with 2 processes/tasks per node and 18 cores per process, one of which is reserved for communications.

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=NAMD_Example\n#SBATCH --time=01:00:00\n#SBATCH --exclusive\n#SBATCH --nodes=2\n#SBATCH --tasks-per-node=2\n#SBATCH --cpus-per-task=18\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load namd/2.14\n\nsrun namd2 +setcpuaffinity +ppn 17 +pemap 1-17,19-35 +commap 0,18 input.namd\n
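
The +ppn, +pemap and +commap values follow a regular pattern: within each process, the first core handles communications and the remaining cores run worker threads. The loop below is a minimal sketch of that pattern with hypothetical variable names; it only prints the options and does not run NAMD.

```shell
# Sketch: derive the NAMD core maps from the Slurm layout used above.
# Each process owns a contiguous block of cores; the first core of the
# block is the communication thread and the rest are worker threads.
tasks_per_node=2
cores_per_task=18

pemap=
commap=
i=0
while [ $i -lt $tasks_per_node ]; do
  first=$((i * cores_per_task))
  last=$((first + cores_per_task - 1))
  commap=${commap:+$commap,}$first
  pemap=${pemap:+$pemap,}$((first + 1))-$last
  i=$((i + 1))
done

echo +ppn $((cores_per_task - 1)) +pemap $pemap +commap $commap
```

With 4 tasks per node and 10 cores per task, the same loop gives the mapping used in the GPU example later in this section.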

NAMD can also be run without SMP.

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=NAMD_Example\n#SBATCH --time=01:00:00\n#SBATCH --exclusive\n#SBATCH --nodes=2\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load namd/2.14-nosmp\n\nsrun namd2 +setcpuaffinity input.namd\n

And, finally, there's also a GPU version. The example below uses 8 GPUs across two GPU nodes, running one process per GPU and 9 worker threads per process (+ 1 comms thread).

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=NAMD_Example\n#SBATCH --time=01:00:00\n#SBATCH --nodes=2\n#SBATCH --account=[budget code]\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n# Request 4 GPUs per node (8 GPUs in total across the 2 nodes)\n#SBATCH --gres=gpu:4\n\nmodule load namd/2022.07.21-gpu\n\nsrun --hint=nomultithread --ntasks=8 --tasks-per-node=4 \\\n    namd2 +ppn 9 +setcpuaffinity +pemap 1-9,11-19,21-29,31-39 +commap 0,10,20,30 \\\n          +devices 0,1,2,3 input.namd\n
"},{"location":"software-packages/openfoam/","title":"OpenFOAM","text":"

OpenFOAM is an open-source toolbox for computational fluid dynamics. OpenFOAM consists of generic tools to simulate complex physics for a variety of fields of interest, from fluid flows involving chemical reactions, turbulence and heat transfer, to solid dynamics, electromagnetism and the pricing of financial options.

The core technology of OpenFOAM is a flexible set of modules written in C++. These are used to build solvers and utilities to perform pre- and post-processing tasks ranging from simple data manipulation to visualisation and mesh processing.

"},{"location":"software-packages/openfoam/#available-versions","title":"Available Versions","text":"

OpenFOAM comes in a number of different flavours. The two main releases are from https://openfoam.org/ and from https://www.openfoam.com/.

You can query which versions of OpenFOAM are currently available on Cirrus from the command line with module avail openfoam.

Versions from https://openfoam.org/ are typically v8 etc, while versions from https://www.openfoam.com/ are typically v2006 (released June 2020).

"},{"location":"software-packages/openfoam/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/openfoam/#using-openfoam-on-cirrus","title":"Using OpenFOAM on Cirrus","text":"

Any batch script which intends to use OpenFOAM should first load the appropriate openfoam module. You then need to source the etc/bashrc file provided by OpenFOAM to set all the relevant environment variables. The relevant command is printed to screen when the module is loaded. For example, for OpenFOAM v8:

module add openfoam/v8.0\nsource ${FOAM_INSTALL_PATH}/etc/bashrc\n

You should then be able to use OpenFOAM in the usual way.

"},{"location":"software-packages/openfoam/#example-batch-submisison","title":"Example Batch Submission","text":"

The following example batch submission script would run OpenFOAM on two nodes, with 36 MPI tasks per node.

#!/bin/bash\n\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=36\n#SBATCH --exclusive\n#SBATCH --time=00:10:00\n\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the openfoam module and source the bashrc file\n\nmodule load openfoam/v8.0\nsource ${FOAM_INSTALL_PATH}/etc/bashrc\n\n# Compose OpenFOAM work in the usual way, except that parallel\n# executables are launched via srun. For example:\n\nsrun interFoam -parallel\n

A Slurm submission script would usually also contain an account token of the form

#SBATCH --account=your_account_here\n

where your_account_here should be replaced by the relevant token for your account. This is available from SAFE with your budget details.

"},{"location":"software-packages/orca/","title":"ORCA","text":"

ORCA is an ab initio quantum chemistry program package that contains modern electronic structure methods including density functional theory, many-body perturbation, coupled cluster, multireference methods, and semi-empirical quantum chemistry methods. Its main field of application is larger molecules, transition metal complexes, and their spectroscopic properties. ORCA is developed in the research group of Frank Neese. The free version is available only for academic use at academic institutions.

"},{"location":"software-packages/orca/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/orca/#using-orca-on-cirrus","title":"Using ORCA on Cirrus","text":"

ORCA is available for academic use on Cirrus only. If you wish to use ORCA for commercial applications, you must contact the ORCA developers.

ORCA cannot use GPUs.

"},{"location":"software-packages/orca/#running-parallel-orca-jobs","title":"Running parallel ORCA jobs","text":"

The following script will run an ORCA job on Cirrus using 4 MPI processes on a single node. Each MPI process will be placed on a separate physical core. The script assumes that the input file is h2o_2.inp.

#!/bin/bash\n\n# job options (name, compute nodes, job time)\n#SBATCH --job-name=ORCA_test\n#SBATCH --nodes=1\n#SBATCH --tasks-per-node=4\n\n#SBATCH --time=0:20:0\n\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load ORCA module\nmodule load orca\n\n# Launch the ORCA calculation\n#   * You must use \"$ORCADIR/orca\" so the application has the full executable path\n#   * Do not use \"srun\" to launch parallel ORCA jobs as they use internal ORCA routines to launch in parallel\n#   * Remember to change the name of the input file to match your file name\n$ORCADIR/orca h2o_2.inp\n

The example input file h2o_2.inp contains:

! DLPNO-CCSD(T) cc-pVTZ cc-pVTZ/C cc-pVTZ/jk rijk verytightscf TightPNO LED\n# Specify number of processors\n%pal\nnprocs 4\nend\n# Specify memory\n%maxcore 12000\n%mdci\nprintlevel 3\nend\n* xyz 0 1\nO 1.327706 0.106852 0.000000\nH 1.612645 -0.413154 0.767232\nH 1.612645 -0.413154 -0.767232\nO -1.550676 -0.120030 -0.000000\nH -0.587091 0.053367 -0.000000\nH -1.954502 0.759303 -0.000000\n*\n%geom\nFragments\n2 {3:5} end\nend\nend\n
"},{"location":"software-packages/qe/","title":"Quantum Espresso (QE)","text":"

Quantum Espresso is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials.

"},{"location":"software-packages/qe/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/qe/#using-qe-on-cirrus","title":"Using QE on Cirrus","text":"

QE is Open Source software and is freely available to all Cirrus users.

"},{"location":"software-packages/qe/#running-parallel-qe-jobs","title":"Running parallel QE jobs","text":"

QE can exploit multiple nodes on Cirrus and will generally be run in exclusive mode over more than one node.

For example, the following script will run a QE pw.x job using 4 nodes (144 cores).

#!/bin/bash\n#\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=pw_test\n#SBATCH --nodes=4\n#SBATCH --tasks-per-node=36\n#SBATCH --time=0:20:0\n# Make sure you are not sharing nodes with other users\n#SBATCH --exclusive\n\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load QE module\nmodule load quantum-espresso\n\n# Run using input in test_calc.in\nsrun pw.x -i test_calc.in\n
"},{"location":"software-packages/starccm%2B/","title":"STAR-CCM+","text":"

STAR-CCM+ is a computational fluid dynamics (CFD) code that also covers a wider range of physics. It provides a broad range of validated models to simulate disciplines and physics including CFD, computational solid mechanics (CSM), electromagnetics, heat transfer, multiphase flow, particle dynamics, reacting flow, electrochemistry, aero-acoustics and rheology; the simulation of rigid and flexible body motions with techniques including mesh morphing, overset mesh and six degrees of freedom (6DOF) motion; and the ability to combine and account for the interaction between the various physics and motion models in a single simulation to cover your specific application.

"},{"location":"software-packages/starccm%2B/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/starccm%2B/#licensing","title":"Licensing","text":"

All users must provide their own licence for STAR-CCM+. Currently we only support Power on Demand (PoD) licences.

For queries about other types of license options please contact the Cirrus Helpdesk with the relevant details.

"},{"location":"software-packages/starccm%2B/#using-star-ccm-on-cirrus-interactive-remote-gui-mode","title":"Using STAR-CCM+ on Cirrus: Interactive remote GUI Mode","text":"

A fast and responsive way of running with a GUI is to install STAR-CCM+ on your local Windows (7 or 10) or Linux workstation. You can then start your local STAR-CCM+ and connect to Cirrus in order to submit new jobs or query the status of running jobs.

You will need to set up passwordless SSH connections to Cirrus.

"},{"location":"software-packages/starccm%2B/#jobs-using-power-on-demand-pod-licences","title":"Jobs using Power on Demand (PoD) licences","text":"

You can then start the STAR-CCM+ server on the compute nodes. The following script starts the server:

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=STAR-CCM_test\n#SBATCH --time=0:20:0\n#SBATCH --exclusive\n#SBATCH --nodes=14\n#SBATCH --tasks-per-node=36\n#SBATCH --cpus-per-task=1\n\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the default HPE MPI environment\nmodule load mpt\nmodule load starccm+\n\nexport SGI_MPI_HOME=$MPI_ROOT\nexport PATH=$STARCCM_EXE:$PATH\nexport LM_LICENSE_FILE=48002@192.168.191.10\nexport CDLMD_LICENSE_FILE=48002@192.168.191.10\n\nexport LIBNSL_PATH=/mnt/lustre/indy2lfs/sw/libnsl/1.3.0\n\nscontrol show hostnames $SLURM_NODELIST > ./starccm.launcher.host.$SLURM_JOB_ID.txt\n\nstarccm+ -clientldlibpath ${LIBNSL_PATH}/lib -ldlibpath ${LIBNSL_PATH}/lib \\\n         -power -podkey <PODkey> -licpath ${LM_LICENSE_FILE} \\\n         -server -machinefile ./starccm.launcher.host.$SLURM_JOB_ID.txt \\\n         -np 504 -rsh ssh\n

You should replace \"<PODkey>\" with your PoD licence key.
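
The value passed to -np is the number of nodes multiplied by the tasks per node (14 x 36 = 504 in the script above). As a hedged sketch, the count can be derived from Slurm's environment rather than typed by hand. SLURM_JOB_NUM_NODES and SLURM_NTASKS_PER_NODE are standard Slurm variables, but whether both are populated for a given job depends on the options used, so fallback values are assumed here.

```shell
# Sketch: derive the process count instead of hard-coding -np 504.
# SLURM_JOB_NUM_NODES and SLURM_NTASKS_PER_NODE are standard Slurm
# variables; the fallbacks (14 and 36) are the values from the script
# above and are assumed here so the arithmetic also works outside a job.
nodes=${SLURM_JOB_NUM_NODES:-14}
tasks_per_node=${SLURM_NTASKS_PER_NODE:-36}
np=$((nodes * tasks_per_node))
echo starccm+ would be started with -np $np
```

The starccm+ command line could then use -np $np, so that changing --nodes in the job options is enough to resize the run.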

"},{"location":"software-packages/starccm%2B/#automatically-load-and-start-a-star-ccm-simulation","title":"Automatically load and start a Star-CCM+ simulation","text":"

You can use the \"-batch\" option to automatically load and start a Star-CCM+ simulation.

Your submission script will look like this (the only difference from the previous example is the \"starccm+\" line):

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=STAR-CCM_test\n#SBATCH --time=0:20:0\n#SBATCH --exclusive\n#SBATCH --nodes=14\n#SBATCH --tasks-per-node=36\n#SBATCH --cpus-per-task=1\n\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the default HPE MPI environment\nmodule load mpt\nmodule load starccm+\n\nexport SGI_MPI_HOME=$MPI_ROOT\nexport PATH=$STARCCM_EXE:$PATH\nexport LM_LICENSE_FILE=48002@192.168.191.10\nexport CDLMD_LICENSE_FILE=48002@192.168.191.10\n\nexport LIBNSL_PATH=/mnt/lustre/indy2lfs/sw/libnsl/1.3.0\n\nscontrol show hostnames $SLURM_NODELIST > ./starccm.launcher.host.$SLURM_JOB_ID.txt\n\nstarccm+ -clientldlibpath ${LIBNSL_PATH}/lib -ldlibpath ${LIBNSL_PATH}/lib \\\n         -power -podkey <PODkey> -licpath ${LM_LICENSE_FILE} \\\n         -batch simulation.java \\\n         -machinefile ./starccm.launcher.host.$SLURM_JOB_ID.txt \\\n         -np 504 -rsh ssh\n

This script will load the file \"simulation.java\". You can find instructions on how to write a suitable \"simulation.java\" in the Star-CCM+ documentation.

The file \"simulation.java\" must be in the same directory as your Slurm submission script (or you can provide a full path).

"},{"location":"software-packages/starccm%2B/#local-star-ccm-client-configuration","title":"Local Star-CCM+ client configuration","text":"

Start your local STAR-CCM+ application and connect to your server. Click on the File -> \"Connect to Server...\" option and use the following settings:

Select the \"Connect through SSH tunnel\" option and use:

Your local STAR-CCM+ client should now be connected to the remote server. You should be able to run a new simulation or interact with an existing one.

"},{"location":"software-packages/vasp/","title":"VASP","text":"

The Vienna Ab initio Simulation Package (VASP) is a computer program for atomic scale materials modelling, e.g. electronic structure calculations and quantum-mechanical molecular dynamics, from first principles.

VASP computes an approximate solution to the many-body Schr\u00f6dinger equation, either within density functional theory (DFT), solving the Kohn-Sham equations, or within the Hartree-Fock (HF) approximation, solving the Roothaan equations. Hybrid functionals that mix the Hartree-Fock approach with density functional theory are implemented as well. Furthermore, Green's functions methods (GW quasiparticles, and ACFDT-RPA) and many-body perturbation theory (2nd-order M\u00f8ller-Plesset) are available in VASP.

In VASP, central quantities, like the one-electron orbitals, the electronic charge density, and the local potential are expressed in plane wave basis sets. The interactions between the electrons and ions are described using norm-conserving or ultrasoft pseudopotentials, or the projector-augmented-wave method.

To determine the electronic groundstate, VASP makes use of efficient iterative matrix diagonalisation techniques, like the residual minimisation method with direct inversion of the iterative subspace (RMM-DIIS) or blocked Davidson algorithms. These are coupled to highly efficient Broyden and Pulay density mixing schemes to speed up the self-consistency cycle.

"},{"location":"software-packages/vasp/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/vasp/#using-vasp-on-cirrus","title":"Using VASP on Cirrus","text":"

CPU and GPU versions of VASP are available on Cirrus.

VASP is only available to users who have a valid VASP licence. VASP 5 and VASP 6 are separate packages on Cirrus and requests for access need to be made separately for the two versions via SAFE.

If you have a VASP 5 or VASP 6 licence and wish to have access to VASP on Cirrus please request access through SAFE:

Once your access has been enabled, you access the VASP software using the vasp modules in your job submission script. You can see which versions of VASP are currently available on Cirrus with

module avail vasp\n

Once loaded, the executables are called:

All executables include the additional MD algorithms accessed via the MDALGO keyword.

"},{"location":"software-packages/vasp/#running-parallel-vasp-jobs-cpu","title":"Running parallel VASP jobs - CPU","text":"

The CPU version of VASP can exploit multiple nodes on Cirrus and will generally be run in exclusive mode over more than one node.

The following script will run a VASP job using 4 nodes (144 cores).

#!/bin/bash\n\n# job options (name, compute nodes, job time)\n#SBATCH --job-name=VASP_CPU_test\n#SBATCH --nodes=4\n#SBATCH --tasks-per-node=36\n#SBATCH --exclusive\n#SBATCH --time=0:20:0\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load VASP version 6 module\nmodule load vasp/6\n\n# Set number of OpenMP threads to 1\nexport OMP_NUM_THREADS=1\n\n# Run standard VASP executable\nsrun --hint=nomultithread --distribution=block:block vasp_std\n
"},{"location":"software-packages/vasp/#running-parallel-vasp-jobs-gpu","title":"Running parallel VASP jobs - GPU","text":"

The GPU version of VASP can exploit multiple GPUs across multiple nodes. You should benchmark your system to ensure you understand how many GPUs can be used in parallel for your calculations.

The following script will run a VASP job using 2 GPUs on 1 node.

#!/bin/bash\n\n# job options (name, compute nodes, job time)\n#SBATCH --job-name=VASP_GPU_test\n#SBATCH --nodes=1\n#SBATCH --gres=gpu:2\n#SBATCH --time=0:20:0\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n\n# Load VASP version 6 module\nmodule load vasp/6/6.3.2-gpu-nvhpc22\n\n# Set number of OpenMP threads to 1\nexport OMP_NUM_THREADS=1\n\n# Run standard VASP executable with 1 MPI process per GPU\nsrun --ntasks=2 --cpus-per-task=10 --hint=nomultithread --distribution=block:block vasp_std\n
"},{"location":"software-tools/ddt/","title":"Debugging using Arm DDT","text":"

The Arm Forge tool suite is installed on Cirrus. This includes DDT, which is a debugging tool for scalar, multi-threaded and large-scale parallel applications. To compile your code for debugging you will usually want to specify the -O0 option to turn off all code optimisation (as this can produce a mismatch between source code line numbers and debugging information) and -g to include debugging information in the compiled executable. To use this package you will need to log in to Cirrus with X11-forwarding enabled, load the Arm Forge module and execute forge:

module load forge\nforge\n
"},{"location":"software-tools/ddt/#debugging-runs-on-the-login-nodes","title":"Debugging runs on the login nodes","text":"

You can execute and debug your MPI code on the login node which is useful for immediate development work with short, small, simple runs to avoid having to wait in the queue. Firstly ensure you have loaded the mpt module and any other dependencies of your code, then start Forge and click Run. Fill in the necessary details of your code under the Application pane, then tick the MPI tick box, specify the number of MPI processes you wish to run and ensure the implementation is set to HPE MPT (2.18+). If this is not set correctly then you can update the configuration by clicking the Change button and selecting this option on the MPI/UPC Implementation field of the system pane. When you are happy with this hit Run to start.

"},{"location":"software-tools/ddt/#debugging-runs-on-the-compute-nodes","title":"Debugging runs on the compute nodes","text":"

This involves DDT submitting your job to the queue, and as soon as the compute nodes start executing you will drop into the debug session and be able to interact with your code. Start Forge and click on Run, then in the Application pane provide the details needed for your code. Then tick the MPI box -- when running on the compute nodes, you must set the MPI implementation to Slurm (generic). You must also tick the Submit to Queue box. Clicking the Configure button in this section, you must now choose the submission template. One is provided for you at /mnt/lustre/indy2lfs/sw/arm/forge/latest/templates/cirrus.qtf which you should copy and modify to suit your needs. You will need to load any modules required for your code and perform any other necessary setup, such as providing extra sbatch options, i.e., whatever is needed for your code to run in a normal batch job.

Note

The current Arm Forge licence permits use on the Cirrus CPU nodes only. The licence does not permit use of DDT/MAP for codes that run on the Cirrus GPUs.

Back in the DDT run window, you can click on Parameters in the same queue pane to set the partition and QoS to use, the account to which the job should be charged, and the maximum walltime. You can also now look at the MPI pane again and select the number of processes and nodes to use. Finally, clicking Submit will place the job in the queue. A new window will show you the queue until the job starts, at which point you can start to debug.

"},{"location":"software-tools/ddt/#memory-debugging-with-ddt","title":"Memory debugging with DDT","text":"

If you are dynamically linking your code and debugging it on the login node then this is fine (just ensure that the Preload the memory debugging library option is ticked in the Details pane). If you are dynamically linking but intending to debug on the compute nodes, or statically linking, then you need to include the compile option -Wl,--allow-multiple-definition and explicitly link your executable with Allinea's memory debugging library. The exact library to link against depends on your code: -ldmalloc (for no threading with C), -ldmallocth (for threading with C), -ldmallocxx (for no threading with C++) or -ldmallocthcxx (for threading with C++). The library locations are all set up when the forge module is loaded so these libraries should be found without further arguments.
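The choice of memory debugging library can be written down as a small lookup. The following is a minimal sketch only: the helper name ddt_memlib, the compiler wrapper mpicc and the file names main.c and main are assumptions made for illustration, and the link line is printed rather than executed.

```shell
#!/bin/sh
# Map (language, threading) to the memory debugging library flag listed above.
# ddt_memlib is a hypothetical helper name; it is not part of Forge.
#   $1: language, "c" or "cxx"
#   $2: "serial" or "threaded"
ddt_memlib() {
    case "$1:$2" in
        c:serial)     echo "-ldmalloc" ;;
        c:threaded)   echo "-ldmallocth" ;;
        cxx:serial)   echo "-ldmallocxx" ;;
        cxx:threaded) echo "-ldmallocthcxx" ;;
        *) echo "unknown combination: $1 $2" >&2; return 1 ;;
    esac
}

# Print (do not run) a possible link line for a threaded C code.
# mpicc, main.c and main are placeholder names.
echo "mpicc -g -O0 main.c -o main -Wl,--allow-multiple-definition $(ddt_memlib c threaded)"
```

The forge module should be loaded before the real link step so that the library is found.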

"},{"location":"software-tools/ddt/#remote-client","title":"Remote Client","text":"

Arm Forge can connect to remote systems using SSH so you can run the user interface on your desktop or laptop machine without the need for X forwarding. Native remote clients are available for Windows, macOS and Linux. You can download the remote clients from the Arm website. No licence file is required by a remote client.

Note

The same versions of Arm Forge must be installed on the local and remote systems in order to use DDT remotely.

To configure the remote client to connect to Cirrus, start it and then click on the Remote Launch drop-down box and click on Configure. In the new window, click Add to create a new login profile. For the hostname you should provide username@login.cirrus.ac.uk where username is your login username. For Remote Installation Directory enter /mnt/lustre/indy2lfs/sw/arm/forge/latest. To ensure your SSH private key can be used to connect, the SSH agent on your local machine should be configured to provide it. You can ensure this by running ssh-add ~/.ssh/id_rsa_cirrus before using the Forge client, where you should replace ~/.ssh/id_rsa_cirrus with the path to the key you normally use to log in to Cirrus. This should persist until your local machine is restarted -- only then should you have to re-run ssh-add.

If you only intend to debug jobs on the compute nodes no further configuration is needed. If however you want to use the login nodes, you will likely need to write a short bash script to prepare the same environment you would use if you were running your code interactively on the login node -- otherwise, the necessary libraries will not be found while running. For example, if using MPT, you might create a file in your home directory containing only one line:

module load mpt\n

In your local Forge client you should then edit the Remote Script field in the Cirrus login details to contain the path to this script. When you log in the script will be sourced and the software provided by whatever modules it loads become usable.

When you start the Forge client, you will now be able to select the Cirrus login from the Remote Launch drop-down box. After providing your usual login password the connection to Cirrus will be established and you will be able to start debugging.

You can find more detailed information here.

"},{"location":"software-tools/ddt/#getting-further-help-on-ddt","title":"Getting further help on DDT","text":""},{"location":"software-tools/intel-vtune/","title":"Intel VTune","text":""},{"location":"software-tools/intel-vtune/#profiling-using-vtune","title":"Profiling using VTune","text":"

Intel VTune allows profiling of compiled codes, and is particularly suited to analysing high performance applications involving threads (OpenMP), and MPI (or some combination thereof).

Using VTune is a two-stage process. First, an application is compiled using an appropriate Intel compiler and run in a \"collection\" phase. The results are stored to file, and may then be inspected interactively via the VTune GUI.

"},{"location":"software-tools/intel-vtune/#collection","title":"Collection","text":"

Compile the application in the normal way, and run a batch job in exclusive mode to ensure the node is not shared with other jobs. An example is given below.

Collection of performance data is based on a collect option, which defines which set of hardware counters is monitored in a given run. As not all counters are available at the same time, a number of different collections are available. A different one may be relevant depending on which aspect of performance you are interested in. Some standard options are:

vtune -collect=performance-snapshot may be used to produce a text summary of performance (typically to standard output), which can be used as a basis for further investigation.

vtune -collect=hotspots produces a more detailed analysis which can be used to inspect time taken per function and per line of code.

vtune -collect=hpc-performance may be useful for HPC codes.

vtune --collect=memory-access will provide figures for memory-related measures including application memory bandwidth.

Use vtune --help collect for a full summary of collection options. Note that not all options are available (e.g., prefer NVIDIA profiling for GPU codes).

"},{"location":"software-tools/intel-vtune/#example-slurm-script","title":"Example SLURM script","text":"

Here we give an example of profiling an application which has been compiled with Intel 20.4 and requests the memory-access collection. We assume the application involves OpenMP threads, but no MPI.

#!/bin/bash\n\n#SBATCH --time=00:10:00\n#SBATCH --nodes=1\n#SBATCH --exclusive\n\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nexport OMP_NUM_THREADS=18\n\n# Load relevant (cf. compile-time) Intel options \nmodule load intel-20.4/compilers\nmodule load intel-20.4/vtune\n\nvtune -collect=memory-access -r results-memory ./my_application\n

Profiling will generate a certain amount of additional text information; this appears on standard output. Detailed profiling data will be stored in various files in a sub-directory, the name of which can be specified using the -r option.

Notes

"},{"location":"software-tools/intel-vtune/#profiling-an-mpi-code","title":"Profiling an MPI code","text":"

Intel VTune can also be used to profile MPI codes. It is recommended that the relevant Intel MPI module is used for compilation. The following example uses Intel 18 with the older amplxe-cl command:

#!/bin/bash\n\n#SBATCH --time=00:10:00\n#SBATCH --nodes=2\n#SBATCH --exclusive\n\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nexport OMP_NUM_THREADS=18\n\nmodule load intel-mpi-18\nmodule load intel-compilers-18\nmodule load intel-vtune-18\n\nmpirun -np 4 -ppn 2 amplxe-cl -collect hotspots -r vtune-hotspots \\\n       ./my_application\n

Note that the Intel MPI launcher mpirun is used, and this precedes the VTune command. The example runs a total of 4 MPI tasks (-np 4) with two tasks per node (-ppn 2). Each task runs 18 OpenMP threads.

"},{"location":"software-tools/intel-vtune/#viewing-the-results","title":"Viewing the results","text":"

We recommend that the latest version of the VTune GUI is used to view results; this can be run interactively with an appropriate X connection. The latest version is available via

$ module load oneapi\n$ module load vtune/latest\n$ vtune-gui\n

From the GUI, navigate to the appropriate results file to load the analysis. Note that the latest version of VTune will be able to read results generated with previous versions of the Intel compilers.

"},{"location":"software-tools/scalasca/","title":"Profiling using Scalasca","text":"

Scalasca, an open source performance profiling tool, is installed on Cirrus. Two versions are provided, using GCC 8.2.0 and the Intel 19 compilers; both use the HPE MPT library to provide MPI and SHMEM. An important distinction is that the GCC+MPT installation cannot be used to profile Fortran code as MPT does not provide GCC Fortran module files. To profile Fortran code, please use the Intel+MPT installation.

Loading one of the modules will autoload the correct compiler and MPI library:

module load scalasca/2.6-gcc8-mpt225\n

or

module load scalasca/2.6-intel19-mpt225\n

Once loaded, the profiler may be run with the scalasca or scan commands, but the code must first be compiled with the Score-P instrumentation wrapper tool. This is done by prepending the compilation commands with scorep, e.g.:

scorep mpicc main.c -o main\nscorep mpif90 -openmp main.f90 -o main\n

Advanced users may also wish to make use of the Score-P API. This allows you to manually define function and region entry and exit points.

You can then profile the execution during a Slurm job by prepending your srun commands with one of the equivalent commands scalasca -analyze or scan -s:

scalasca -analyze srun ./main\nscan -s srun ./main\n

You will see some output from Scalasca to stdout during the run. Included in that output will be the name of an experiment archive directory, starting with scorep_, which will be created in the working directory. If you want, you can set the name of the directory by exporting the SCOREP_EXPERIMENT_DIRECTORY environment variable in your job script.

There is an associated GUI called Cube which can be used to process and examine the experiment results, allowing you to understand your code's performance. This has been made available via a Singularity container. To start it, run the command cube followed by the file in the experiment archive directory ending in .cubex (or alternatively the whole archive), as seen below:

cube scorep_exp_1/profile.cubex\n

The Scalasca quick reference guide found here provides a good overview of the toolset's use, from instrumentation and use of the API to analysis with Cube.

"},{"location":"user-guide/batch/","title":"Running Jobs on Cirrus","text":"

As with most HPC services, Cirrus uses a scheduler to manage access to resources and ensure that the thousands of different users of the system are able to share it and all get access to the resources they require. Cirrus uses the Slurm software to schedule jobs.

Writing a submission script is typically the most convenient way to submit your job to the scheduler. Example submission scripts (with explanations) for the most common job types are provided below.

Interactive jobs are also available and can be particularly useful for developing and debugging applications. More details are available below.

Hint

If you have any questions on how to run jobs on Cirrus do not hesitate to contact the Cirrus Service Desk.

You typically interact with Slurm by issuing Slurm commands from the login nodes (to submit, check and cancel jobs), and by specifying Slurm directives that describe the resources required for your jobs in job submission scripts.

"},{"location":"user-guide/batch/#basic-slurm-commands","title":"Basic Slurm commands","text":"

There are four key commands used to interact with Slurm on the command line: sinfo, sbatch, squeue and scancel.

We cover each of these commands in more detail below.

"},{"location":"user-guide/batch/#sinfo-information-on-resources","title":"sinfo: information on resources","text":"

sinfo is used to query information about available resources and partitions. Without any options, sinfo lists the status of all resources and partitions, e.g.

[auser@cirrus-login3 ~]$ sinfo\n\nPARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST\nstandard       up   infinite    280   idle r1i0n[0-35],r1i1n[0-35],r1i2n[0-35],r1i3n[0-35],r1i4n[0-35],r1i5n[0-35],r1i6n[0-35],r1i7n[0-6,9-15,18-24,27-33]\ngpu            up   infinite     36   idle r2i4n[0-8],r2i5n[0-8],r2i6n[0-8],r2i7n[0-8]\n
"},{"location":"user-guide/batch/#sbatch-submitting-jobs","title":"sbatch: submitting jobs","text":"

sbatch is used to submit a job script to the job submission system. The script will typically contain one or more srun commands to launch parallel tasks.

When you submit the job, the scheduler provides the job ID, which is used to identify this job in other Slurm commands and when looking at resource usage in SAFE.

[auser@cirrus-login3 ~]$ sbatch test-job.slurm\nSubmitted batch job 12345\n
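If a later command needs the job ID, it can be taken from the confirmation message. The following is a minimal sketch, assuming the message has exactly the form shown above; job_id_from_sbatch is a hypothetical helper name, and the message is hard-coded here so that the snippet runs without a scheduler.

```shell
#!/bin/sh
# Extract the numeric job ID from sbatch's confirmation message.
# job_id_from_sbatch is a hypothetical helper name.
job_id_from_sbatch() {
    # The message has the form "Submitted batch job <ID>".
    # Strip everything up to and including the last space.
    echo "${1##* }"
}

# In a real session the message would come from: message=$(sbatch test-job.slurm)
message="Submitted batch job 12345"
job_id=$(job_id_from_sbatch "$message")

# Print a command that could then be used to check on this job only.
echo "squeue -j ${job_id}"
```

Slurm also provides sbatch --parsable, which prints the job ID without the surrounding text and may be simpler in scripts.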
"},{"location":"user-guide/batch/#squeue-monitoring-jobs","title":"squeue: monitoring jobs","text":"

squeue without any options or arguments shows the current status of all jobs known to the scheduler. For example:

[auser@cirrus-login3 ~]$ squeue\n          JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)\n          1554  comp-cse CASTEP_a  auser  R       0:03      2 r2i0n[18-19]\n

will list all jobs on Cirrus.

The output of this is often overwhelmingly large. You can restrict the output to just your jobs by adding the -u $USER option:

[auser@cirrus-login3 ~]$ squeue -u $USER\n
"},{"location":"user-guide/batch/#scancel-deleting-jobs","title":"scancel: deleting jobs","text":"

scancel is used to delete a job from the scheduler. If the job is waiting to run, it is simply cancelled; if it is running, it is stopped immediately. You need to provide the job ID of the job you wish to cancel/stop. For example:

[auser@cirrus-login3 ~]$ scancel 12345\n

will cancel (if waiting) or stop (if running) the job with ID 12345.

"},{"location":"user-guide/batch/#resource-limits","title":"Resource Limits","text":"

Note

If you have requirements which do not fit within the current QoS, please contact the Service Desk and we can discuss how to accommodate your requirements.

There are different resource limits on Cirrus for different purposes. There are three different things you need to specify for each job:

Each of these aspects is described in more detail below.

The primary resources you request are compute resources: either CPU cores on the standard compute nodes or GPU cards on the GPU compute nodes. Other node resources (memory on the standard compute nodes; memory and CPU cores on the GPU nodes) are assigned pro rata based on the primary resource that you request.

Warning

On Cirrus, you cannot specify the memory for a job using the --mem options to Slurm (e.g. --mem, --mem-per-cpu, --mem-per-gpu). The amount of memory you are assigned is calculated from the amount of primary resource you request.

"},{"location":"user-guide/batch/#primary-resources-on-standard-cpu-compute-nodes","title":"Primary resources on standard (CPU) compute nodes","text":"

The primary resource you request on standard compute nodes is CPU cores. The maximum amount of memory you are allocated is computed as the number of CPU cores you requested multiplied by 1/36th of the total memory available (as there are 36 CPU cores per node). So, if you request the full node (36 cores), then you will be allocated a maximum of all of the memory (256 GB) available on the node; however, if you request 1 core, then you will be assigned a maximum of 256/36 = 7.1 GB of the memory available on the node.
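The pro rata rule can be checked with a short calculation. This is a sketch only: max_mem_gb is a hypothetical helper name, and the figures of 256 GB and 36 cores per node are taken from the paragraph above.

```shell
#!/bin/sh
# Estimate the maximum memory (in GB) assigned on a standard compute node
# for a given number of requested CPU cores.
# max_mem_gb is a hypothetical helper name.
max_mem_gb() {
    # awk performs the floating-point division; result to one decimal place.
    LC_ALL=C awk -v cores="$1" 'BEGIN { printf "%.1f\n", cores * 256 / 36 }'
}

max_mem_gb 1     # a single core
max_mem_gb 18    # half a node
max_mem_gb 36    # the full node
```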

Note

Using the --exclusive option in jobs will give you access to the full node memory even if you do not explicitly request all of the CPU cores on the node.

Warning

Using the --exclusive option will charge your account for the usage of the entire node, even if you don't request all the cores in your scripts.

Note

You will not generally have access to the full amount of memory resource on the node as some is retained for running the operating system and other system processes.

"},{"location":"user-guide/batch/#primary-resources-on-gpu-nodes","title":"Primary resources on GPU nodes","text":"

The primary resource you request on GPU compute nodes is GPU cards. The maximum amount of memory and CPU cores you are allocated is computed as the number of GPU cards you requested multiplied by 1/4 of the total available (as there are 4 GPU cards per node). So, if you request the full node (4 GPU cards), then you will be allocated a maximum of all of the memory (384 GB) available on the node; however, if you request 1 GPU card, then you will be assigned a maximum of 384/4 = 96 GB of the memory available on the node.
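The same rule applies to CPU cores on the GPU nodes. The sketch below assumes 384 GB of memory and 40 CPU cores per GPU node (two 20-core processors, as described elsewhere in this documentation); gpu_share is a hypothetical helper name.

```shell
#!/bin/sh
# Maximum memory and CPU cores assigned for a given number of GPU cards.
# Assumes 4 GPU cards, 384 GB and 40 CPU cores per GPU node.
# gpu_share is a hypothetical helper name.
gpu_share() {
    echo "$(( $1 * 384 / 4 )) GB of memory, $(( $1 * 40 / 4 )) CPU cores"
}

gpu_share 1
gpu_share 2
gpu_share 4
```

With 2 GPU cards this gives 20 CPU cores, i.e. 10 cores for each of 2 MPI processes when running one process per GPU.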

Note

Using the --exclusive option in jobs will give you access to all of the CPU cores and the full node memory even if you do not explicitly request all of the GPU cards on the node.

Warning

In order to run jobs on the GPU nodes your budget must have positive GPU hours and core hours associated with it. However, only your GPU hours will be consumed when running these jobs.

Warning

Using the --exclusive option will charge your account for the usage of the entire node, i.e., 4 GPUs, even if you don't request all the GPUs in your submission script.

"},{"location":"user-guide/batch/#partitions","title":"Partitions","text":"

On Cirrus, compute nodes are grouped into partitions. You will have to specify a partition using the --partition option in your submission script. The following table has a list of active partitions on Cirrus.

| Partition | Description | Total nodes available | Notes |
|---|---|---|---|
| standard | CPU nodes with 2x 18-core Intel Broadwell processors | 352 | |
| gpu | GPU nodes with 4x Nvidia V100 GPU and 2x 20-core Intel Cascade Lake processors | 36 | |

Cirrus Partitions

You can list the active partitions using

sinfo\n

Note

You may not have access to all the available partitions.

"},{"location":"user-guide/batch/#quality-of-service-qos","title":"Quality of Service (QoS)","text":"

On Cirrus, Quality of Service (QoS) is used alongside partitions to set resource limits. The following table has a list of active QoS on Cirrus.

| QoS Name | Jobs Running Per User | Jobs Queued Per User | Max Walltime | Max Size | Applies to Partitions | Notes |
|---|---|---|---|---|---|---|
| standard | No limit | 500 jobs | 4 days | 88 nodes (3168 cores/25%) | standard | |
| largescale | 1 job | 4 jobs | 24 hours | 228 nodes (8192+ cores/65%) or 144 GPUs | standard, gpu | |
| long | 5 jobs | 20 jobs | 14 days | 16 nodes or 8 GPUs | standard, gpu | |
| highpriority | 10 jobs | 20 jobs | 4 days | 140 nodes | standard | charged at 1.5 x normal rate |
| gpu | No limit | 128 jobs | 4 days | 64 GPUs (16 nodes/40%) | gpu | |
| short | 1 job | 2 jobs | 20 minutes | 2 nodes or 4 GPUs | standard, gpu | |
| lowpriority | No limit | 100 jobs | 2 days | 36 nodes (1296 cores/10%) or 16 GPUs | standard, gpu | usage is not charged |
"},{"location":"user-guide/batch/#cirrus-qos","title":"Cirrus QoS","text":"

You can find out the QoS that you can use by running the following command:

sacctmgr show assoc user=$USER cluster=cirrus format=cluster,account,user,qos%50\n
"},{"location":"user-guide/batch/#troubleshooting","title":"Troubleshooting","text":""},{"location":"user-guide/batch/#slurm-error-handling","title":"Slurm error handling","text":""},{"location":"user-guide/batch/#mpi-jobs","title":"MPI jobs","text":"

Users of MPI codes may wish to ensure termination of all tasks on the failure of one individual task by specifying the --kill-on-bad-exit argument to srun. E.g.,

srun -n 36 --kill-on-bad-exit ./my-mpi-program\n

This can prevent effective \"hanging\" of the job until the wall time limit is reached.

"},{"location":"user-guide/batch/#automatic-resubmission","title":"Automatic resubmission","text":"

Jobs that fail are not automatically resubmitted by Slurm on Cirrus. Automatic resubmission can be enabled for a job by specifying the --requeue option to sbatch.

"},{"location":"user-guide/batch/#slurm-error-messages","title":"Slurm error messages","text":"

An incorrect submission will cause Slurm to return an error. Some common problems are listed below, with a suggestion about the likely cause:

A --partition= option is missing. You must specify the partition (see the list above). This is most often --partition=standard.

error: Batch job submission failed: Invalid partition name specified

Check the partition exists and check the spelling is correct.

This probably means an invalid account has been given. Check the --account= options against valid accounts in SAFE.

A QoS option is either missing or invalid. Check the script has a --qos= option and that the option is a valid one from the table above. (Check the spelling of the QoS is correct.)

Add an option of the form --time=hours:minutes:seconds to the submission script. E.g., --time=01:30:00 gives a time limit of 90 minutes.

The script has probably specified a time limit which is too long for the corresponding QoS. E.g., the time limit for the short QoS is 20 minutes.
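A requested time limit can be compared against a QoS limit before submitting. The following is a minimal sketch, assuming the hours:minutes:seconds form described above; walltime_minutes is a hypothetical helper name, and any non-zero seconds field is rounded up to a whole minute.

```shell
#!/bin/sh
# Convert a --time value of the form hours:minutes:seconds into minutes.
# walltime_minutes is a hypothetical helper name.
walltime_minutes() {
    # awk treats "01" or "08" as plain decimal numbers.
    echo "$1" | awk -F: '{ print $1 * 60 + $2 + (($3 + 0 > 0) ? 1 : 0) }'
}

# Check a request against the 20 minute limit of the short QoS.
limit=20
requested=$(walltime_minutes 0:20:0)
if [ "$requested" -le "$limit" ]; then
    echo "${requested} minutes fits within the short QoS"
else
    echo "${requested} minutes exceeds the short QoS limit of ${limit} minutes"
fi
```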

"},{"location":"user-guide/batch/#slurm-queued-reasons","title":"Slurm queued reasons","text":"

The squeue command allows users to view information for jobs managed by Slurm. Jobs typically go through the following states: PENDING, RUNNING, COMPLETING, and COMPLETED. The first table provides a description of some job state codes. The second table provides a description of the reasons that cause a job to be in a state.

| Status | Code | Description |
|---|---|---|
| PENDING | PD | Job is awaiting resource allocation. |
| RUNNING | R | Job currently has an allocation. |
| SUSPENDED | S | Job has an allocation, but execution has been suspended. |
| COMPLETING | CG | Job is in the process of completing. Some processes on some nodes may still be active. |
| COMPLETED | CD | Job has terminated all processes on all nodes with an exit code of zero. |
| TIMEOUT | TO | Job terminated upon reaching its time limit. |
| STOPPED | ST | Job has an allocation, but execution has been stopped with SIGSTOP signal. CPUs have been retained by this job. |
| OUT_OF_MEMORY | OOM | Job experienced out of memory error. |
| FAILED | F | Job terminated with non-zero exit code or other failure condition. |
| NODE_FAIL | NF | Job terminated due to failure of one or more allocated nodes. |
| CANCELLED | CA | Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated. |

Slurm Job State codes

For a full list, see Job State Codes.

| Reason | Description |
|---|---|
| Priority | One or more higher priority jobs exist for this partition or advanced reservation. |
| Resources | The job is waiting for resources to become available. |
| BadConstraints | The job's constraints can not be satisfied. |
| BeginTime | The job's earliest start time has not yet been reached. |
| Dependency | This job is waiting for a dependent job to complete. |
| Licenses | The job is waiting for a license. |
| WaitingForScheduling | No reason has been set for this job yet. Waiting for the scheduler to determine the appropriate reason. |
| Prolog | Its PrologSlurmctld program is still running. |
| JobHeldAdmin | The job is held by a system administrator. |
| JobHeldUser | The job is held by the user. |
| JobLaunchFailure | The job could not be launched. This may be due to a file system problem, invalid program name, etc. |
| NonZeroExitCode | The job terminated with a non-zero exit code. |
| InvalidAccount | The job's account is invalid. |
| InvalidQOS | The job's QOS is invalid. |
| QOSUsageThreshold | Required QOS threshold has been breached. |
| QOSJobLimit | The job's QOS has reached its maximum job count. |
| QOSResourceLimit | The job's QOS has reached some resource limit. |
| QOSTimeLimit | The job's QOS has reached its time limit. |
| NodeDown | A node required by the job is down. |
| TimeLimit | The job exhausted its time limit. |
| ReqNodeNotAvail | Some node specifically required by the job is not currently available. The node may currently be in use, reserved for another job, in an advanced reservation, DOWN, DRAINED, or not responding. Nodes which are DOWN, DRAINED, or not responding will be identified as part of the job's \"reason\" field as \"UnavailableNodes\". Such nodes will typically require the intervention of a system administrator to make available. |

Slurm Job Reasons

For a full list, see Job Reasons.

"},{"location":"user-guide/batch/#output-from-slurm-jobs","title":"Output from Slurm jobs","text":"

Slurm places standard output (STDOUT) and standard error (STDERR) for each job in the file slurm-<JobID>.out. This file appears in the job's working directory once your job starts running.

Note

This file is plain text and can contain useful information to help debugging if a job is not working as expected. The Cirrus Service Desk team will often ask you to provide the contents of this file if you contact them for help with issues.

"},{"location":"user-guide/batch/#specifying-resources-in-job-scripts","title":"Specifying resources in job scripts","text":"

You specify the resources you require for your job using directives at the top of your job submission script using lines that start with the directive #SBATCH.

Note

Options provided using #SBATCH directives can also be specified as command line options to srun.

If you do not specify any options, then the default for each option will be applied. As a minimum, all job submissions must specify the budget that they wish to charge the job to, the partition they wish to use and the QoS they want to use with the options:

Other common options that are used are:

Other not so common options that are used are:

In addition, parallel jobs will also need to specify how many nodes, parallel processes and threads they require.

Note

For parallel jobs, you should request exclusive node access with the --exclusive option to ensure you get the expected resources and performance.

"},{"location":"user-guide/batch/#srun-launching-parallel-jobs","title":"srun: Launching parallel jobs","text":"

If you are running parallel jobs, your job submission script should contain one or more srun commands to launch the parallel executable across the compute nodes. As well as launching the executable, srun also allows you to specify the distribution and placement (or pinning) of the parallel processes and threads.

If you are running MPI jobs that do not also use OpenMP threading, then you should use srun with no additional options. srun will use the specification of nodes and tasks from your job script, sbatch or salloc command to launch the correct number of parallel tasks.

If you are using OpenMP threads then you will generally add the --cpu-bind=cores option to srun to bind threads to cores to obtain the best performance.

Note

See the example job submission scripts below for examples of using srun for pure MPI jobs and for jobs that use OpenMP threading.

"},{"location":"user-guide/batch/#example-parallel-job-submission-scripts","title":"Example parallel job submission scripts","text":"

A subset of example job submission scripts are included in full below.

Hint

Do not replace srun with mpirun in the following examples. Although this might work under special circumstances, it is not guaranteed and therefore not supported.

"},{"location":"user-guide/batch/#example-job-submission-script-for-mpi-parallel-job","title":"Example: job submission script for MPI parallel job","text":"

A simple MPI job submission script to submit a job using 4 compute nodes and 36 MPI ranks per node for 20 minutes would look like:

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=Example_MPI_Job\n#SBATCH --time=0:20:0\n#SBATCH --exclusive\n#SBATCH --nodes=4\n#SBATCH --tasks-per-node=36\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n# We use the \"standard\" partition as we are running on CPU nodes\n#SBATCH --partition=standard\n# We use the \"standard\" QoS as our runtime is less than 4 days\n#SBATCH --qos=standard\n\n# Load the default HPE MPI environment\nmodule load mpt\n\n# Change to the submission directory\ncd $SLURM_SUBMIT_DIR\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\n# Launch the parallel job\n#   Using 144 MPI processes and 36 MPI processes per node\n#   srun picks up the distribution from the sbatch options\nsrun ./my_mpi_executable.x\n

This will run your executable \"my_mpi_executable.x\" in parallel on 144 MPI processes using 4 nodes (36 cores per node, i.e. not using hyper-threading). Slurm will allocate 4 nodes to your job and srun will place 36 MPI processes on each node (one per physical core).

By default, srun will launch an MPI job that uses all of the cores you have requested via the \"nodes\" and \"tasks-per-node\" options. If you want to run fewer MPI processes than cores you will need to change the script.

For example, to run this program on 128 MPI processes you have two options:

Note

If you specify --ntasks explicitly and it is not compatible with the value of tasks-per-node then you will get a warning message from srun such as srun: Warning: can't honor --ntasks-per-node set to 36.

In this case, srun does the sensible thing and allocates MPI processes as evenly as it can across nodes. For example, the second option above would result in 32 MPI processes on each of the 4 nodes.
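The even spread described above can be illustrated with a small calculation. This is a sketch of the arithmetic only, not of srun's actual placement algorithm: tasks_per_node is a hypothetical helper name, and giving any remainder to the lowest-numbered nodes is an assumption made for illustration.

```shell
#!/bin/sh
# Spread a number of MPI processes as evenly as possible over some nodes.
# tasks_per_node is a hypothetical helper name.
#   $1: total number of MPI processes
#   $2: number of nodes
tasks_per_node() {
    node=0
    while [ "$node" -lt "$2" ]; do
        # Assumption: the first (processes mod nodes) nodes take one extra process.
        if [ "$node" -lt $(( $1 % $2 )) ]; then
            extra=1
        else
            extra=0
        fi
        echo "node ${node}: $(( $1 / $2 + extra )) processes"
        node=$(( node + 1 ))
    done
}

tasks_per_node 128 4
```

For 128 processes on 4 nodes, every node receives 32 processes, matching the description above.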

See above for a more detailed discussion of the different sbatch options.

"},{"location":"user-guide/batch/#note-on-mpt-task-placement","title":"Note on MPT task placement","text":"

By default, MPT will distribute processes to physical cores (cores 0-17 on socket 0, and cores 18-35 on socket 1) in a cyclic fashion. That is, rank 0 would be placed on core 0, rank 1 on core 18, rank 2 on core 1, and so on (in a single-node job). This may be undesirable. Block, rather than cyclic, distribution can be obtained with

#SBATCH --distribution=block:block\n

The block:block here refers to the distribution on both nodes and sockets. This will distribute rank 0 to core 0, rank 1 to core 1, rank 2 to core 2, and so on.
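
The two placements described above can be modelled with a little integer arithmetic. This is only a sketch of the description for a single 36-core node (two sockets of 18 cores); the function names are hypothetical and the code does not query MPT or Slurm.

```shell
#!/bin/bash
cores_per_socket=18
sockets=2

# Cyclic placement: consecutive ranks alternate between the two sockets
cyclic_core() {
    r=$1
    echo $(( (r % sockets) * cores_per_socket + r / sockets ))
}

# Block placement: consecutive ranks fill consecutive cores
block_core() {
    echo $1
}

for rank in 0 1 2 3
do
    echo rank $rank: cyclic core $(cyclic_core $rank), block core $(block_core $rank)
done
```

With cyclic placement, ranks 0 and 1 land on different sockets, whereas block placement keeps neighbouring ranks on the same socket until it is full.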

"},{"location":"user-guide/batch/#example-job-submission-script-for-mpiopenmp-mixed-mode-parallel-job","title":"Example: job submission script for MPI+OpenMP (mixed mode) parallel job","text":"

Mixed mode codes that use both MPI (or another distributed memory parallel model) and OpenMP should take care to ensure that the shared memory portion of the process/thread placement does not span more than one node. This means that the number of shared memory threads should be a factor of 36.

In the example below, we are using 4 nodes for 20 minutes. There are 8 MPI processes in total (2 MPI processes per node) and 18 OpenMP threads per MPI process. This results in all 36 physical cores per node being used.

Note

The example below uses the --cpu-bind=cores option to generate the correct affinity settings.

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=Example_MPI_Job\n#SBATCH --time=0:20:0\n#SBATCH --exclusive\n#SBATCH --nodes=4\n#SBATCH --ntasks=8\n#SBATCH --tasks-per-node=2\n#SBATCH --cpus-per-task=18\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# We use the \"standard\" partition as we are running on CPU nodes\n#SBATCH --partition=standard\n# We use the \"standard\" QoS as our runtime is less than 4 days\n#SBATCH --qos=standard\n\n# Load the default HPE MPI environment\nmodule load mpt\n\n# Change to the submission directory\ncd $SLURM_SUBMIT_DIR\n\n# Set the number of threads to 18\n#   There are 18 OpenMP threads per MPI process\nexport OMP_NUM_THREADS=18\n\n# Launch the parallel job\n#   Using 8 MPI processes\n#   2 MPI processes per node\n#   18 OpenMP threads per MPI process\n\nsrun --cpu-bind=cores ./my_mixed_executable.x arg1 arg2\n
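
As a sanity check on the layout in the script above, the following sketch confirms that the MPI processes and OpenMP threads requested per node exactly fill the 36 physical cores. The variable names are illustrative only.

```shell
#!/bin/bash
cores_per_node=36
nodes=4
tasks_per_node=2
cpus_per_task=18

# Total MPI processes and cores occupied on each node
total_tasks=$(( nodes * tasks_per_node ))
cores_used=$(( tasks_per_node * cpus_per_task ))
echo MPI processes in total: $total_tasks
echo cores used per node: $cores_used

if [ $cores_used -gt $cores_per_node ]
then
    echo layout oversubscribes the node
else
    echo layout fits on the node
fi
```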
"},{"location":"user-guide/batch/#example-job-submission-script-for-openmp-parallel-job","title":"Example: job submission script for OpenMP parallel job","text":"

A simple OpenMP job submission script to submit a job using 1 compute node and 36 threads for 20 minutes would look like:

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=Example_OpenMP_Job\n#SBATCH --time=0:20:0\n#SBATCH --exclusive\n#SBATCH --nodes=1\n#SBATCH --tasks-per-node=1\n#SBATCH --cpus-per-task=36\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n# We use the \"standard\" partition as we are running on CPU nodes\n#SBATCH --partition=standard\n# We use the \"standard\" QoS as our runtime is less than 4 days\n#SBATCH --qos=standard\n\n# Load any required modules\nmodule load mpt\n\n# Change to the submission directory\ncd $SLURM_SUBMIT_DIR\n\n# Set the number of threads to the CPUs per task\nexport OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK\n\n# Launch the parallel job\n#   Using 36 threads per node\n#\u00a0  srun picks up the distribution from the sbatch options\nsrun --cpu-bind=cores ./my_openmp_executable.x\n

This will run your executable \"my_openmp_executable.x\" in parallel on 36 threads. Slurm will allocate 1 node to your job and srun will place 36 threads (one per physical core).

See above for a more detailed discussion of the different sbatch options.

"},{"location":"user-guide/batch/#job-arrays","title":"Job arrays","text":"

The Slurm job scheduling system offers the job array concept for running collections of almost-identical jobs, for example running the same program several times with different arguments or input data.

Each job in a job array is called a subjob. The subjobs of a job array can be submitted and queried as a unit, making it easier and cleaner to handle the full set, compared to individual jobs.

All subjobs in a job array are started by running the same job script. The job script also contains information on the number of jobs to be started, and Slurm provides a subjob index which can be passed to the individual subjobs or used to select the input data per subjob.

"},{"location":"user-guide/batch/#job-script-for-a-job-array","title":"Job script for a job array","text":"

As an example, the following script runs 56 subjobs, with the subjob index as the only argument to the executable. Each subjob requests a single node and uses all 36 cores on the node by placing 1 MPI process per core and specifies 4 hours maximum runtime per subjob:

#!/bin/bash\n# Slurm job options (name, compute nodes, job time)\n\n#SBATCH --job-name=Example_Array_Job\n#SBATCH --time=04:00:00\n#SBATCH --exclusive\n#SBATCH --nodes=1\n#SBATCH --tasks-per-node=36\n#SBATCH --cpus-per-task=1\n#SBATCH --array=0-55\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n# We use the \"standard\" partition as we are running on CPU nodes\n#SBATCH --partition=standard\n# We use the \"standard\" QoS as our runtime is less than 4 days\n#SBATCH --qos=standard\n\n# Load the default HPE MPI environment\nmodule load mpt\n\n# Change to the submission directory\ncd $SLURM_SUBMIT_DIR\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\nsrun /path/to/exe $SLURM_ARRAY_TASK_ID\n
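
The subjob index can also be used to select input data rather than being passed straight to the executable. The sketch below assumes a hypothetical naming scheme of one file per index (input_0.dat, input_1.dat, and so on); neither the file names nor the helper function come from the Cirrus documentation.

```shell
#!/bin/bash
# Hypothetical naming scheme: one input file per subjob index
input_for_index() {
    echo input_${1}.dat
}

# SLURM_ARRAY_TASK_ID is set by Slurm in each subjob; fall back to 0 so the
# sketch can also be run outside a job array
index=${SLURM_ARRAY_TASK_ID:-0}
input_file=$(input_for_index $index)
echo $input_file
```

Under this assumption, the launch line in the job script would pass the selected file name to the executable in place of the bare index.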
"},{"location":"user-guide/batch/#submitting-a-job-array","title":"Submitting a job array","text":"

Job arrays are submitted using sbatch in the same way as for standard jobs:

sbatch job_script.pbs\n
"},{"location":"user-guide/batch/#job-chaining","title":"Job chaining","text":"

Job dependencies can be used to construct complex pipelines or chain together long simulations requiring multiple steps.

Note

The --parsable option to sbatch can simplify working with job dependencies. It returns the job ID in a format that can be used as the input to other commands.

For example:

jobid=$(sbatch --parsable first_job.sh)\nsbatch --dependency=afterok:$jobid second_job.sh\n

or for a longer chain:

jobid1=$(sbatch --parsable first_job.sh)\njobid2=$(sbatch --parsable --dependency=afterok:$jobid1 second_job.sh)\njobid3=$(sbatch --parsable --dependency=afterok:$jobid1 third_job.sh)\nsbatch --dependency=afterok:$jobid2,afterok:$jobid3 last_job.sh\n
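
When a final job depends on many earlier jobs, the dependency string can be assembled by a small helper. This is a sketch only: the function name is hypothetical and the job IDs 101 and 102 are placeholders standing in for values returned by sbatch --parsable.

```shell
#!/bin/bash
# Join any number of job IDs into the afterok:ID,afterok:ID form used above
afterok_list() {
    list=
    for id
    do
        # Add a comma separator only when the list already has an entry
        list=${list}${list:+,}afterok:${id}
    done
    echo $list
}

# Placeholder job IDs; in practice these would come from sbatch --parsable
afterok_list 101 102    # prints afterok:101,afterok:102
```

The output can then be supplied as the value of the --dependency option when submitting the last job in the chain.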
"},{"location":"user-guide/batch/#interactive-jobs","title":"Interactive Jobs","text":"

When you are developing or debugging code you often want to run many short jobs with a small amount of editing of the code between runs. This can be achieved by using the login nodes to run small/short MPI jobs. However, you may want to test on the compute nodes (e.g. you may want to test running on multiple nodes across the high performance interconnect). One way to achieve this on Cirrus is to use an interactive job.

Interactive jobs via Slurm take two slightly different forms. The first uses srun directly to allocate resources to be used interactively; the second uses both salloc and srun.

"},{"location":"user-guide/batch/#using-srun","title":"Using srun","text":"

An interactive job via srun allows you to execute commands directly from the command line without using a job submission script, and to see the output from your program directly in the terminal.

A convenient way to do this is as follows.

[user@cirrus-login1]$ srun --exclusive --nodes=1 --time=00:20:00 --partition=standard --qos=standard --account=z04 --pty /usr/bin/bash --login\n[user@r1i0n14]$\n

This requests the exclusive use of one node for the given time (here, 20 minutes). The --pty /usr/bin/bash --login requests an interactive login shell be started. (Note the prompt has changed.) Interactive commands can then be used as normal and will execute on the compute node. When no longer required, you can type exit or CTRL-D to release the resources and return control to the front end shell.

[user@r1i0n14]$ exit\nlogout\n[user@cirrus-login1]$\n

Note that the new interactive shell will reflect the environment of the original login shell. If you do not wish this, add the --export=none argument to srun to provide a clean login environment.

Within an interactive job, one can use srun to launch parallel jobs in the normal way, e.g.,

[user@r1i0n14]$ srun -n 2 ./a.out\n

In this context, one could also use mpirun directly. Note we are limited to the 36 cores of our original --nodes=1 srun request.

"},{"location":"user-guide/batch/#using-salloc-with-srun","title":"Using salloc with srun","text":"

This approach uses the salloc command to reserve compute nodes and then srun to launch relevant work.

To submit a request for a job reserving 2 nodes (72 physical cores) for 1 hour you would issue the command:

[user@cirrus-login1]$ salloc --exclusive --nodes=2 --tasks-per-node=36 --cpus-per-task=1 --time=01:00:00  --partition=standard --qos=standard --account=t01\nsalloc: Granted job allocation 8699\nsalloc: Waiting for resource configuration\nsalloc: Nodes r1i7n[13-14] are ready for job\n[user@cirrus-login1]$\n

Note that this starts a new shell on the login node associated with the allocation (the prompt has not changed). The allocation may be released by exiting this new shell.

[user@cirrus-login1]$ exit\nsalloc: Relinquishing job allocation 8699\n[user@cirrus-login1]$\n

While the allocation lasts you will be able to run parallel jobs on the compute nodes by issuing the srun command in the normal way. The resources available are those specified in the original salloc command. For example, with the above allocation,

$ srun ./mpi-code.out\n

will run 36 MPI tasks per node on two nodes.

If your allocation reaches its time limit, it will automatically be terminated and the associated shell will exit. To check that the allocation is still running, use squeue:

[user@cirrus-login1]$ squeue -u user\n           JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)\n            8718  standard     bash    user   R       0:07      2 r1i7n[18-19]\n

Choose a time limit long enough to allow the relevant work to be completed.

The salloc method may be useful if one wishes to associate operations on the login node (e.g., via a GUI) with work in the allocation itself.

"},{"location":"user-guide/batch/#reservations","title":"Reservations","text":"

Reservations are available on Cirrus. These allow users to reserve a number of nodes for a specified length of time starting at a particular time on the system.

Reservations require justification. They will only be approved if the request could not be fulfilled with the standard queues, for example if you require a job or jobs to run at a particular time (e.g. for a demonstration or course).

Note

Reservation requests must be submitted at least 120 hours in advance of the reservation start time. We cannot guarantee to meet all reservation requests due to potential conflicts with other demands on the service but will do our best to meet all requests.

Reservations will be charged at 1.5 times the usual rate and our policy is that they will be charged the full rate for the entire reservation at the time of booking, whether or not you use the nodes for the full time. In addition, you will not be refunded the compute time if you fail to use them due to a job crash unless this crash is due to a system failure.

To request a reservation you complete a form on SAFE:

  1. [Log into SAFE](https://safe.epcc.ed.ac.uk)
  2. Under the \"Login accounts\" menu, choose the \"Request reservation\" option

On the first page, you need to provide the following:

On the second page, you will need to specify which username you wish the reservation to be charged against and, once the username has been selected, the budget you want to charge the reservation to. (The selected username will be charged for the reservation but the reservation can be used by all members of the selected budget.)

Your request will be checked by the Cirrus User Administration team and, if approved, you will be provided a reservation ID which can be used on the system. To submit jobs to a reservation, you need to add --reservation=<reservation ID> and --qos=reservation options to your job submission script or Slurm job submission command.

Note

You must have at least 1 CPUh - and 1 GPUh for reservations on GPU nodes - to be able to submit jobs to reservations.

Tip

You can submit jobs to a reservation as soon as the reservation has been set up; jobs will remain queued until the reservation starts.

"},{"location":"user-guide/batch/#serial-jobs","title":"Serial jobs","text":"

Unlike parallel jobs, serial jobs will generally not need to specify the number of nodes and exclusive access (unless they want access to all of the memory on a node). You usually only need the --ntasks=1 specifier. For example, a serial job submission script could look like:

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=Example_Serial_Job\n#SBATCH --time=0:20:0\n#SBATCH --ntasks=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n# We use the \"standard\" partition as we are running on CPU nodes\n#SBATCH --partition=standard\n# We use the \"standard\" QoS as our runtime is less than 4 days\n#SBATCH --qos=standard\n\n# Change to the submission directory\ncd $SLURM_SUBMIT_DIR\n\n# Enforce threading to 1 in case underlying libraries are threaded\nexport OMP_NUM_THREADS=1\n\n# Launch the serial job\n#   Using 1 thread\nsrun --cpu-bind=cores ./my_serial_executable.x\n

Note

Remember that you will be allocated memory based on the number of tasks (i.e. CPU cores) that you request. You will get ~7.1 GB per task/core. If you need more than this for your serial job then you should ask for the number of tasks you need for the required memory (or use the --exclusive option to get access to all the memory on a node) and launch specifying a single task using srun --ntasks=1 --cpu-bind=cores.
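
The number of tasks to request for a given memory requirement can be estimated by rounding up. The sketch below treats the approximate 7.1 GB per task as 7100 MB, which is an assumption made for illustration; the helper name is hypothetical.

```shell
#!/bin/bash
# Approximate memory per task on a standard node, in MB (from the ~7.1 GB figure above)
mem_per_task_mb=7100

# Number of tasks to request so that the allocation covers the required memory
# (integer division rounded up)
tasks_for_memory() {
    echo $(( ($1 + mem_per_task_mb - 1) / mem_per_task_mb ))
}

tasks_for_memory 20000    # a 20 GB requirement: prints 3
```

The serial program is still launched as a single task; the extra tasks only enlarge the memory allocation.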

"},{"location":"user-guide/batch/#temporary-files-and-tmp-in-batch-jobs","title":"Temporary files and /tmp in batch jobs","text":"

Applications which normally read and write temporary files from /tmp may require some care in batch jobs on Cirrus. As the size of /tmp on backend nodes is relatively small (< 150 MB), applications should use a different location to prevent possible failures. This is relevant for both CPU and GPU nodes.

Note also that the default value of the variable TMPDIR in batch jobs is a memory-resident file system location specific to the current job (typically in the /dev/shm directory). Files here reduce the available capacity of main memory on the node.

It is recommended that applications with significant temporary file space requirements use the solid state storage (/user-guide/solidstate). E.g., a submission script might contain:

export TMPDIR=\"/scratch/space1/x01/auser/$SLURM_JOBID.tmp\"\nmkdir -p $TMPDIR\n

to set the standard temporary directory to a unique location in the solid state storage. You will also probably want to add a line to clean up the temporary directory at the end of your job script, e.g.

rm -r $TMPDIR\n

Tip

Applications should not hard-code specific locations such as /tmp. Parallel applications should further ensure that there are no collisions in temporary file names on separate processes/nodes.

"},{"location":"user-guide/connecting/","title":"Connecting to Cirrus","text":"

On the Cirrus system, interactive access can be achieved via SSH, either directly from a command line terminal or using an SSH client. In addition, data can be transferred to and from the Cirrus system using scp from the command line or by using a file transfer client.

Before following the process below, we assume you have set up an account on Cirrus through the EPCC SAFE. Documentation on how to do this can be found at:

SAFE Guide for Users

This section covers the basic connection methods.

"},{"location":"user-guide/connecting/#access-credentials-mfa","title":"Access credentials: MFA","text":"

To access Cirrus, you need to use two credentials (this is known as multi-factor authentication or MFA): your SSH key pair, protected by a passphrase, and a time-based one-time passcode (sometimes known as a TOTP code). You can find more detailed instructions on how to set up your credentials to access Cirrus from Windows, macOS and Linux below.

Note

The first time you log into a new account you will also need to enter a one-time password from SAFE. This is described in more detail below.

"},{"location":"user-guide/connecting/#ssh-key-pairs","title":"SSH Key Pairs","text":"

You will need to generate an SSH key pair protected by a passphrase to access Cirrus.

Using a terminal (the command line), set up a key pair that contains your e-mail address and enter a passphrase you will use to unlock the key:

$ ssh-keygen -t rsa -C \"your@email.com\"\n...\n-bash-4.1$ ssh-keygen -t rsa -C \"your@email.com\"\nGenerating public/private rsa key pair.\nEnter file in which to save the key (/Home/user/.ssh/id_rsa): [Enter]\nEnter passphrase (empty for no passphrase): [Passphrase]\nEnter same passphrase again: [Passphrase]\nYour identification has been saved in /Home/user/.ssh/id_rsa.\nYour public key has been saved in /Home/user/.ssh/id_rsa.pub.\nThe key fingerprint is:\n03:d4:c4:6d:58:0a:e2:4a:f8:73:9a:e8:e3:07:16:c8 your@email.com\nThe key's randomart image is:\n+--[ RSA 2048]----+\n|    . ...+o++++. |\n| . . . =o..      |\n|+ . . .......o o |\n|oE .   .         |\n|o =     .   S    |\n|.    +.+     .   |\n|.  oo            |\n|.  .             |\n| ..              |\n+-----------------+\n

(remember to replace \"your@email.com\" with your e-mail address).

"},{"location":"user-guide/connecting/#upload-public-part-of-key-pair-to-safe","title":"Upload public part of key pair to SAFE","text":"

You should now upload the public part of your SSH key pair to SAFE by following the instructions below.

Log in to SAFE. Then:

  1. Go to the Login accounts menu and select the Cirrus account you want to add the SSH key to
  2. On the subsequent Login account details page click the Add Credential button
  3. Select SSH public key as the Credential Type and click Next
  4. Either copy and paste the public part of your SSH key into the SSH Public key box or use the button to select the public key file on your computer.
  5. Click Add to associate the public SSH key part with your account

Once you have done this, your SSH key will be added to your Cirrus account.

"},{"location":"user-guide/connecting/#time-based-one-time-passcode-totp-code","title":"Time-based one-time passcode (TOTP code)","text":"

Remember, you will need to use both an SSH key and time-based one-time passcode (TOTP code) to log into Cirrus so you will also need to set up a method for generating a TOTP code before you can log into Cirrus.

"},{"location":"user-guide/connecting/#first-login-password-required","title":"First login: password required","text":"

Important

You will not use your password when logging on to Cirrus after the first login for a new account.

As an additional security measure, you will also need to use a password from SAFE for your first login to Cirrus with a new account. When you log into Cirrus for the first time with a new account, you will be prompted to change your initial password. This is a three-step process:

  1. When prompted to enter your ldap password: Enter the password which you retrieve from SAFE
  2. When prompted to enter your new password: type in a new password
  3. When prompted to re-enter the new password: re-enter the new password

Your password has now been changed. From this point forwards you will no longer need this password to log into Cirrus; you will use your SSH key and TOTP code as described above.

"},{"location":"user-guide/connecting/#ssh-clients","title":"SSH Clients","text":"

Interaction with Cirrus is done remotely, over an encrypted communication channel, Secure Shell version 2 (SSH-2). This allows command-line access to one of the login nodes of Cirrus, from which you can run commands or use a command-line text editor to edit files. SSH can also be used to run graphical programs such as GUI text editors and debuggers when used in conjunction with an X server on your local machine.

"},{"location":"user-guide/connecting/#logging-in-from-linux-and-macos","title":"Logging in from Linux and MacOS","text":"

Linux distributions and MacOS each come installed with a terminal application that can be used for SSH access to the login nodes. Linux users will have different terminals depending on their distribution and window manager (e.g. GNOME Terminal in GNOME, Konsole in KDE). Consult your Linux distribution's documentation for details on how to load a terminal.

MacOS users can use the Terminal application, located in the Utilities folder within the Applications folder.

You can use the following command from the terminal window to log in to Cirrus:

ssh username@login.cirrus.ac.uk\n

You will first be prompted for the passphrase associated with your SSH key pair. Once you have entered your passphrase successfully, you will then be prompted for your TOTP code. You need to enter both correctly to be able to access Cirrus.

Note

If your SSH key pair is not stored in the default location (usually ~/.ssh/id_rsa) on your local system, you may need to specify the path to the private part of the key with the -i option to ssh. For example, if your key is in a file called keys/id_rsa_cirrus you would use the command ssh -i keys/id_rsa_cirrus username@login.cirrus.ac.uk to log in.

To allow remote programs, especially graphical applications, to control your local display (for example, to open a new GUI window for a debugger), use:

ssh -X username@login.cirrus.ac.uk\n

Some sites recommend using the -Y flag. While this can fix some compatibility issues, the -X flag is more secure.

Current MacOS systems do not have an X window system. Users should install the XQuartz package to allow for SSH with X11 forwarding on MacOS systems:

"},{"location":"user-guide/connecting/#logging-in-from-windows-using-mobaxterm","title":"Logging in from Windows using MobaXterm","text":"

A typical Windows installation will not include a terminal client, though there are various clients available. We recommend that all Windows users download and install MobaXterm to access Cirrus. It is very easy to use and includes an integrated X server with SSH client to run any graphical applications on Cirrus.

You can download MobaXterm Home Edition (Installer Edition) from the following link:

Double-click the downloaded Microsoft Installer file (.msi), and the Windows wizard will automatically guide you through the installation process. Note that you might need administrator rights to install on some Windows systems. Also check that Windows Firewall has not blocked any features of the program after installation.

Start MobaXterm using, for example, the icon added to the Start menu during the installation process.

If you would like to run any small remote GUI applications, then make sure to use the -X option along with the ssh command (see above) to enable X11 forwarding, which allows you to run graphical clients on your local X server.

"},{"location":"user-guide/connecting/#host-keys","title":"Host Keys","text":"

Adding the host keys to your SSH configuration file provides an extra level of security for your connections to Cirrus. The host keys are checked against the login nodes when you log in to Cirrus and if the remote server key does not match the one in the configuration file, the connection will be refused. This provides protection against potential malicious servers masquerading as the Cirrus login nodes.

"},{"location":"user-guide/connecting/#logincirrusacuk","title":"login.cirrus.ac.uk","text":"
login.cirrus.ac.uk ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBOXYXQEFJfIBZRadNjVU9T0bYVlssht4Qz9Urliqor3L+S8rQojSQtPAjsxxgtD/yeaUWAaBZnXcbPFl2/uFPro=\n\nlogin.cirrus.ac.uk ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC4YJNc0yYfUPtiApGzzkwTYxUhFB1q1G2/vO8biwDL4W0LOcaBFCNTVst1IXQ6tZ9l0GfvlmYTb1LHYoTYLn5CyUL5KKS7X4FkhM9n2EExy/WK+H7kOvOwnWEAWM3GOwPYfhPWdddIHO7cI3CTd1kAL3NVzlt/yvx0CKGtw2QyL9gLGPJ23soDlIJYp/OC/f7E6U+JM6jx8QshQn0PiBPN3gB9MLWNX7ZsYXaSafIw1/txoh7D7CawsTrlKEHgEyNpQIgZFR7pLYlydRijbWEtD40DxlgaF1l/OuJrBfddRXC7VYHNvHq0jv0HCncCjxcHZmr3FW9B3PuRvBeWJpzV6Bv2pLGTPPwd8p7QgkAmTQ1Ews/Q4giUboZyqRcJAkFQtOBCmv43+qxWXKMAB7OdbjJL2oO9UIfPtUmE6oj+rnPxpJMhJuQX2aHIlS0Mev7NzaTUpQqNa4QgsI7Kj/m2JT0ZfQ0I33NO10Z3PLZghKqhTH5yy+2nSYLK6rnxZLU=\n\nlogin.cirrus.ac.uk ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFk4UnY1DaS+LFSS8AFKbmAmlevxShN4hGpn+gGGX8Io\n

Host key verification can fail if this key is out of date, a problem which can be fixed by removing the offending entry in ~/.ssh/known_hosts and replacing it with the new key published here. We recommend users should check this page for any key updates and not just accept a new key from the server without confirmation.

"},{"location":"user-guide/connecting/#making-access-more-convenient-using-the-ssh-configuration-file","title":"Making access more convenient using the SSH configuration file","text":"

Typing in the full command to login or transfer data to Cirrus can become tedious as it often has to be repeated many times. You can use the SSH configuration file, usually located on your local machine at .ssh/config to make things a bit more convenient.

Each remote site (or group of sites) can have an entry in this file which may look something like:

Host cirrus\n  HostName login.cirrus.ac.uk\n  User username\n

(remember to replace username with your actual username!).

The Host cirrus line defines a short name for the entry. In this case, instead of typing ssh username@login.cirrus.ac.uk to access the Cirrus login nodes, you could use ssh cirrus instead. The remaining lines define the options for the cirrus host.

Now you can use SSH to access Cirrus without needing to enter your username or the full hostname every time:

-bash-4.1$ ssh cirrus\n

You can set up as many of these entries as you need in your local configuration file. Other options are available. See the ssh_config man page (or man ssh_config on any machine with SSH installed) for a description of the SSH configuration file. You may find the IdentityFile option useful if you have to manage multiple SSH key pairs for different systems as this allows you to specify which SSH key to use for each system.

Note

There is a known bug with Windows ssh-agent. If you get the error message: Warning: agent returned different signature type ssh-rsa (expected rsa-sha2-512), you will need to either specify the path to your ssh key in the command line (using the -i option as described above) or add the path to your SSH config file by using the IdentityFile option.

"},{"location":"user-guide/connecting/#accessing-cirrus-from-more-than-1-machine","title":"Accessing Cirrus from more than 1 machine","text":"

It is common for users to want to access Cirrus from more than one local machine (e.g. a Linux desktop and a laptop). This can be achieved through use of an ~/.ssh/authorized_keys file on Cirrus to hold the additional keys you generate. Note that if you want to access Cirrus via another remote service, see the next section, SSH forwarding.

You need to consider one of your local machines as your primary machine - this is the machine you should connect to Cirrus with using the instructions further up this page, adding your public key to SAFE.

On your second local machine, generate a new SSH key pair. Copy the public key to your primary machine (e.g. by email, USB stick, or cloud storage); the default location for this on a Linux or MacOS machine will be ~/.ssh/id_rsa.pub. If you are a Windows user using MobaXterm, you should export the public key it generates to OpenSSH format (Conversions > Export OpenSSH Key). You should never move the private key off the machine on which it was generated.

Once back on your primary machine, you should copy the public key from your secondary machine to Cirrus using:

scp id_rsa.pub <user>@login.cirrus.ac.uk:id_secondary.pub\n

You should then log into Cirrus, as normal: ssh <user>@login.cirrus.ac.uk, and then:

mkdir ~/.ssh\nchmod 700 ~/.ssh\n
cat ~/id_secondary.pub >> ~/.ssh/authorized_keys\nchmod 600 ~/.ssh/authorized_keys\nrm ~/id_secondary.pub\n

You can then repeat this process for any more local machines you want to access Cirrus from, omitting the mkdir and chmod lines as the relevant files and directories will already exist with the correct permissions. You don't need to add the public key from your primary machine in your authorized_keys file, because Cirrus can find this in SAFE.

Note that the permissions on the .ssh directory must be set to 700 (Owner can read, can write and can execute but group and world do not have access) and on the authorized_keys file must be 600 (Owner can read and write but group and world do not have access). Keys will be ignored if this is not the case.

"},{"location":"user-guide/connecting/#ssh-forwarding-to-use-cirrus-from-a-second-remote-machine","title":"SSH forwarding (to use Cirrus from a second remote machine)","text":"

If you want to access Cirrus from a machine you already access remotely (e.g. to copy data from Cirrus onto a different cluster), you can forward your local Cirrus SSH keys so that you don't need to create a new key pair on the intermediate machine.

If your local machine is MacOS or Linux, add your Cirrus SSH key to the SSH Agent:

eval \"$(ssh-agent -s)\"\nssh-add ~/.ssh/id_rsa\n

(If you created your key with a different name, replace id_rsa in the command with the name of your private key file). You will be prompted for your SSH key's passphrase.

You can then use the -A flag when connecting to your intermediate cluster:

ssh -A <user>@<host>\n

Once on the intermediate cluster, you should be able to SSH to Cirrus directly:

ssh <user>@login.cirrus.ac.uk\n
"},{"location":"user-guide/connecting/#ssh-debugging-tips","title":"SSH debugging tips","text":"

If you find you are unable to connect via SSH there are a number of ways you can try and diagnose the issue. Some of these are collected below - if you are having difficulties connecting we suggest trying these before contacting the Cirrus service desk.

"},{"location":"user-guide/connecting/#can-you-connect-to-the-login-node","title":"Can you connect to the login node?","text":"

Try the command ping -c 3 login.cirrus.ac.uk. If you successfully connect to the login node, the output should include:

--- login.dyn.cirrus.ac.uk ping statistics ---\n3 packets transmitted, 3 received, 0% packet loss, time 38ms\n

(the ping time '38ms' is not important). If not all packets are received there could be a problem with your internet connection, or the login node could be unavailable.

"},{"location":"user-guide/connecting/#ssh-key","title":"SSH key","text":"

If you get the error message Permission denied (publickey) this can indicate a problem with your SSH key. Some things to check:

    $ ls -al ~/.ssh/\n    drwx------.  2 user group    48 Jul 15 20:24 .\n    drwx------. 12 user group  4096 Oct 13 12:11 ..\n    -rw-------.  1 user group   113 Jul 15 20:23 authorized_keys\n    -rw-------.  1 user group 12686 Jul 15 20:23 id_rsa\n    -rw-r--r--.  1 user group  2785 Jul 15 20:23 id_rsa.pub\n    -rw-r--r--.  1 user group  1967 Oct 13 14:11 known_hosts\n

The important section here is the string of letters and dashes at the start, for the lines ending in ., id_rsa, and id_rsa.pub, which indicate permissions on the containing directory, private key, and public key respectively. If your permissions are not correct, they can be set with chmod. Consult the table below for the relevant chmod command. On Windows, permissions are handled differently but can be set by right-clicking on the file and selecting Properties > Security > Advanced. The user, SYSTEM, and Administrators should have Full control, and no other permissions should exist for both public and private key files, and the containing folder.

| Target | Permissions | chmod Code |
|---|---|---|
| Directory | drwx------ | 700 |
| Private Key | -rw------- | 600 |
| Public Key | -rw-r--r-- | 644 |

chmod can be used to set permissions on the target in the following way: chmod <code> <target>. So for example to set correct permissions on the private key file id_rsa_cirrus one would use the command chmod 600 id_rsa_cirrus.
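As an illustration, the three chmod codes can be applied in one go. The helper name fix_ssh_perms and its default arguments are assumptions made for this sketch; adjust the key name to match your own files.

```shell
# Sketch: apply the directory, private key and public key permissions
# described above. fix_ssh_perms is a hypothetical helper name.
fix_ssh_perms() {
    local ssh_dir="${1:-$HOME/.ssh}"
    local key_name="${2:-id_rsa}"

    chmod 700 "$ssh_dir" || return 1                 # directory:   drwx------
    chmod 600 "$ssh_dir/$key_name" || return 1       # private key: -rw-------
    chmod 644 "$ssh_dir/$key_name.pub"               # public key:  -rw-r--r--
}

# Usage:
#   fix_ssh_perms ~/.ssh id_rsa_cirrus
```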

Note

Unix file permissions can be understood in the following way. There are three groups that can have file permissions: (owning) users, (owning) groups, and others. The available permissions are read, write, and execute.

The first character indicates whether the target is a file -, or directory d. The next three characters indicate the owning user's permissions. The first character is r if they have read permission, - if they don't, the second character is w if they have write permission, - if they don't, the third character is x if they have execute permission, - if they don't. This pattern is then repeated for group, and other permissions.

For example, the pattern -rw-r--r-- indicates that the owning user can read and write the file, members of the owning group can read it, and anyone else can also read it. The chmod codes are constructed by treating the user, group, and other permission strings as three-bit binary numbers, then converting each one to a single digit. For example, the permission string -rwx------ becomes 111 000 000 -> 700.
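The conversion can be sketched as a short shell function. perm_to_octal is a hypothetical name used only for illustration; it reads r as 4, w as 2 and x as 1 within each group of three characters.

```shell
# Sketch: convert a permission string such as -rw-r--r-- into its chmod code.
# perm_to_octal is a hypothetical helper name.
perm_to_octal() {
    local perms code start triplet digit
    # Keep characters 2-10: drop the file-type character and any trailing marker.
    perms=$(printf '%s' "$1" | cut -c2-10)
    code=""
    for start in 1 4 7; do
        triplet=$(printf '%s' "$perms" | cut -c"$start"-"$((start + 2))")
        digit=0
        case $triplet in r??) digit=$((digit + 4)) ;; esac   # read
        case $triplet in ?w?) digit=$((digit + 2)) ;; esac   # write
        case $triplet in ??x) digit=$((digit + 1)) ;; esac   # execute
        code="$code$digit"
    done
    printf '%s\n' "$code"
}

perm_to_octal "-rw-r--r--"   # prints 644
perm_to_octal "drwx------"   # prints 700
```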

"},{"location":"user-guide/connecting/#mfa","title":"MFA","text":"

If your TOTP passcode is being consistently rejected, you can remove MFA from your account and then re-enable it.

"},{"location":"user-guide/connecting/#ssh-verbose-output","title":"SSH verbose output","text":"

Verbose debugging output from ssh can be very useful for diagnosing the issue. In particular, it can be used to distinguish between problems with the SSH key and password - further details are given below. To enable verbose output add the -vvv flag to your SSH command. For example:

ssh -vvv username@login.cirrus.ac.uk\n

The output is lengthy, but somewhere in there you should see lines similar to the following:

debug1: Next authentication method: publickey\ndebug1: Offering public key: RSA SHA256:<key-hash> <path_to_private_key>\ndebug3: send_pubkey_test\ndebug3: send packet: type 50\ndebug2: we sent a publickey packet, wait for reply\ndebug3: receive packet: type 60\ndebug1: Server accepts key: pkalg ssh-rsa vlen 2071\ndebug2: input_userauth_pk_ok: fp SHA256:<key-hash>\ndebug3: sign_and_send_pubkey: RSA SHA256:<key-hash>\nEnter passphrase for key '<path_to_private_key>':\ndebug3: send packet: type 50\ndebug3: receive packet: type 51\nAuthenticated with partial success.\n

Most importantly, you can see which files ssh has checked for private keys, and you can see if any key is accepted. The line Authenticated with partial success indicates that the SSH key has been accepted, and you will next be asked for your password. By default ssh will go through a list of standard private key files, as well as any you have specified with -i or a config file. This is fine, as long as one of the files mentioned is the one that matches the public key uploaded to SAFE.

If you do not see Authenticated with partial success anywhere in the verbose output, consider the suggestions under SSH key above. If you do, but are unable to connect, consider the suggestions under MFA above.
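Because the verbose output is long, it can help to save it to a file and search it. The following is a sketch only: ssh_key_accepted and ssh_debug.log are illustrative names, and the pattern is deliberately loose because the exact wording of the message may vary between OpenSSH versions.

```shell
# Sketch: report whether a saved ssh -vvv log shows the key being accepted.
# ssh_key_accepted is a hypothetical helper name.
ssh_key_accepted() {
    grep -q 'Authenticated .*with partial success' "$1"
}

# Usage (ssh writes its debug output to stderr):
#   ssh -vvv username@login.cirrus.ac.uk 2> ssh_debug.log
#   ssh_key_accepted ssh_debug.log && echo "key accepted" || echo "check your SSH key"
```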

The equivalent information can be obtained in PuTTY or MobaXterm by enabling all logging in settings.

"},{"location":"user-guide/connecting/#default-shell-environment","title":"Default shell environment","text":"

Usually, when a new login shell is created, the commands in $HOME/.bashrc are executed. This typically includes setting user-defined aliases, changing environment variables, and, in the case of an HPC system, loading modules.

Cirrus does not currently read the $HOME/.bashrc file, but it does read the $HOME/.bash_profile file, so, if you wish to read a $HOME/.bashrc file, you can add the following to your $HOME/.bash_profile file (or create one, if it doesn't exist):

# $HOME/.bash_profile\n# load $HOME/.bashrc, if it exists\nif [ -f $HOME/.bashrc ]; then\n        . $HOME/.bashrc\nfi\n
"},{"location":"user-guide/data/","title":"Data Management and Transfer","text":"

This section covers the storage and file systems available on the system; the different ways that you can transfer data to and from Cirrus; and how to transfer backed up data from prior to the March 2022 Cirrus upgrade.

In all cases of data transfer, users should use the Cirrus login nodes.

"},{"location":"user-guide/data/#cirrus-file-systems-and-storage","title":"Cirrus file systems and storage","text":"

The Cirrus service, like many HPC systems, has a complex structure. There are a number of different data storage types available to users:

Each type of storage has different characteristics and policies, and is suitable for different types of use.

There are also two different types of node available to users:

Each type of node sees a different combination of the storage types. The following table shows which storage options are available on different node types:

| Storage | Login nodes | Compute nodes | Notes |
|---|---|---|---|
| Home | yes | no | No backup |
| Work | yes | yes | No backup |
| Solid state | yes | yes | No backup |

"},{"location":"user-guide/data/#home-file-system","title":"Home file system","text":"

Every project has an allocation on the home file system and your project's space can always be accessed via the path /home/[project-code]. The home file system is approximately 1.5 PB in size and is implemented using the Ceph technology. This means that this storage is not particularly high performance but is well suited to standard operations like compilation and file editing. This file system is visible from the Cirrus login nodes.

There are currently no backups of any data on the home file system.

"},{"location":"user-guide/data/#quotas-on-home-file-system","title":"Quotas on home file system","text":"

All projects are assigned a quota on the home file system. The project PI or manager can split this quota up between groups of users if they wish.

You can view any home file system quotas that apply to your account by logging into SAFE and navigating to the page for your Cirrus login account.

  1. Log into SAFE
  2. Use the \"Login accounts\" menu and select your Cirrus login account
  3. The \"Login account details\" table lists any user or group quotas that are linked with your account. (If there is no quota shown for a row then you have an unlimited quota for that item, but you may still be limited by another quota.)

Quota and usage data on SAFE is updated twice daily so may not be exactly up to date with the situation on the system itself.

"},{"location":"user-guide/data/#from-the-command-line","title":"From the command line","text":"

Some useful information on the current contents of directories on the /home file system is available from the command line by using the Ceph command getfattr. This is to be preferred over standard Unix commands such as du for reasons of efficiency.

For example, the number of entries (files plus directories) in a home directory can be queried via

$ cd\n$ getfattr -n ceph.dir.entries .\n# file: .\nceph.dir.entries=\"33\"\n

The corresponding attribute rentries gives the recursive total in all subdirectories, that is, the total number of files and directories:

$ getfattr -n ceph.dir.rentries .\n# file: .\nceph.dir.rentries=\"1619179\"\n

Other useful attributes (all prefixed with ceph.dir.) include files which is the number of ordinary files, subdirs the number of subdirectories, and bytes the total number of bytes used. All these have a corresponding recursive version, respectively: rfiles, rsubdirs, and rbytes.

A full path name can be specified if required.
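Several attributes can be queried in one pass. This is a minimal sketch: ceph_dir_summary is a hypothetical helper, and it relies on the getfattr --only-values flag to print just the attribute value.

```shell
# Sketch: print the recursive file, subdirectory and byte counts for a directory.
# ceph_dir_summary is a hypothetical helper name.
ceph_dir_summary() {
    local dir="${1:-.}"
    local attr
    for attr in rfiles rsubdirs rbytes; do
        printf '%-10s %s\n' "$attr" \
            "$(getfattr --only-values -n "ceph.dir.$attr" "$dir")"
    done
}

# Usage:
#   ceph_dir_summary /home/[project-code]
```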

"},{"location":"user-guide/data/#work-file-system","title":"Work file system","text":"

Every project has an allocation on the work file system and your project's space can always be accessed via the path /work/[project-code]. The work file system is approximately 400 TB in size and is implemented using the Lustre parallel file system technology. It is designed to support data in large files. The performance for data stored in large numbers of small files is unlikely to be as good.

There are currently no backups of any data on the work file system.

Ideally, the work file system should only contain data that is:

In practice it may be convenient to keep copies of datasets on the work file system that you know will be needed at a later date. However, make sure that important data is always backed up elsewhere and that your work would not be significantly impacted if the data on the work file system was lost.

If you have data on the work file system that you are not going to need in the future please delete it.

"},{"location":"user-guide/data/#quotas-on-the-work-file-system","title":"Quotas on the work file system","text":"

Tip

The capacity of the home file system is much larger than the work file system so you should store most data on home and only move data to work that you need for current running work.

As for the home file system, all projects are assigned a quota on the work file system. The project PI or manager can split this quota up between groups of users if they wish.

You can view any work file system quotas that apply to your account by logging into SAFE and navigating to the page for your Cirrus login account.

  1. Log into SAFE
  2. Use the \"Login accounts\" menu and select your Cirrus login account
  3. The \"Login account details\" table lists any user or group quotas that are linked with your account. (If there is no quota shown for a row then you have an unlimited quota for that item, but you may still be limited by another quota.)

Quota and usage data on SAFE is updated twice daily so may not be exactly up to date with the situation on the system itself.

You can also examine up to date quotas and usage on the Cirrus system itself using the lfs quota command. To do this:

Change directory to the work directory where you want to check the quota. For example, if I wanted to check the quota for user auser in project t01 then I would:

[auser@cirrus-login1 auser]$ cd /work/t01/t01/auser\n\n[auser@cirrus-login1 auser]$ lfs quota -hu auser .\nDisk quotas for usr auser (uid 68826):\n     Filesystem    used   quota   limit   grace   files   quota   limit   grace\n              .  5.915G      0k      0k       -   51652       0       0       -\nuid 68826 is using default block quota setting\nuid 68826 is using default file quota setting\n

The quota and limit of 0k here indicate that no user quota is set for this user.

To check your project (group) quota, you would use the command:

[auser@cirrus-login1 auser]$ lfs quota -hg t01 .\nDisk quotas for grp t01 (gid 37733):\n     Filesystem    used   quota   limit   grace   files   quota   limit   grace\n           .  958.3G      0k  13.57T       - 1427052       0       0       -\ngid 37733 is using default file quota setting\n

The limit of 13.57T indicates the quota for the group.

"},{"location":"user-guide/data/#solid-state-storage","title":"Solid state storage","text":"

More information on using the solid state storage can be found in the /user-guide/solidstate section of the user guide.

The solid state storage is not backed up.

"},{"location":"user-guide/data/#accessing-cirrus-data-from-before-march-2022","title":"Accessing Cirrus data from before March 2022","text":"

Prior to the March 2022 Cirrus upgrade, all user data on the /lustre/sw filesystem was archived. Users can access their archived data from the Cirrus login nodes in the /home-archive directory. Assuming you are user auser from project x01, your pre-rebuild archived data can be found in:

/home-archive/x01/auser\n

The data in the /home-archive file system is read-only, meaning that you will not be able to create, edit, or copy new information to this file system.

To make archived data visible from the compute nodes, you will need to copy the data from the /home-archive file system to the /home file system. Assuming again that you are user auser from project x01 and that you were wanting to copy data from /home-archive/x01/auser/directory_to_copy to /home/x01/x01/auser/destination_directory, you would do this by running:

cp -r /home-archive/x01/auser/directory_to_copy \\\n   /home/x01/x01/auser/destination_directory\n

Note that the project code appears once in the path for the old home archive and twice in the path on the new /home file system.

Note

The capacity of the home file system is much larger than the work file system so you should move data to home rather than work.

"},{"location":"user-guide/data/#archiving","title":"Archiving","text":"

If you have related data that consists of a large number of small files it is strongly recommended to pack the files into a larger \"archive\" file for ease of transfer and manipulation. A single large file makes more efficient use of the file system and is easier to move, copy, and transfer because significantly fewer meta-data operations are required. Archive files can be created using tools like tar and zip.

"},{"location":"user-guide/data/#tar","title":"tar","text":"

The tar command packs files into a \"tape archive\" format. The command has general form:

tar [options] [file(s)]\n

Common options include:

Putting these together:

tar -cvWlf mydata.tar mydata\n

will create and verify an archive.

To extract files from a tar file, the option -x is used. For example:

tar -xf mydata.tar\n

will recover the contents of mydata.tar to the current working directory.

To verify an existing tar file against a set of data, the -d (diff) option can be used. By default, no output is given if the verification succeeds. An example of a failed verification follows:

$> tar -df mydata.tar mydata/*\nmydata/damaged_file: Mod time differs\nmydata/damaged_file: Size differs\n

Note

tar files do not store checksums with their data, requiring the original data to be present during verification.
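One way to work around this is to record checksums alongside the archive when it is created. The following is a sketch, not a Cirrus-provided tool: archive_with_checksums and the .sha256 file name are illustrative choices, and sha256sum is assumed to be available.

```shell
# Sketch: write a SHA-256 manifest next to a tar archive so the extracted
# files can be checked later without the original data.
# archive_with_checksums is a hypothetical helper name.
archive_with_checksums() {
    local src="$1"
    local archive="$2"

    # Record a checksum for every regular file under the source directory.
    find "$src" -type f -exec sha256sum {} + > "$archive.sha256" || return 1
    # Pack the directory into the archive.
    tar -cf "$archive" "$src"
}

# Usage:
#   archive_with_checksums mydata mydata.tar
#   # later, in the directory where the archive was extracted:
#   sha256sum -c mydata.tar.sha256
```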

Tip

Further information on using tar can be found in the tar manual (accessed via man tar or in the online manual pages).

"},{"location":"user-guide/data/#zip","title":"zip","text":"

The zip file format is widely used for archiving files and is supported by most major operating systems. The utility to create zip files can be run from the command line as:

zip [options] mydata.zip [file(s)]\n

Common options are:

Together:

zip -0r mydata.zip mydata\n

will create an archive.

Note

Unlike tar, zip files do not preserve hard links. File data will be copied on archive creation, e.g. an uncompressed zip archive of a 100MB file and a hard link to that file will be approximately 200MB in size. This makes zip an unsuitable format if you wish to precisely reproduce the file system layout.

The corresponding unzip command is used to extract data from the archive. The simplest use case is:

unzip mydata.zip\n

which recovers the contents of the archive to the current working directory.

Files in a zip archive are stored with a CRC checksum to help detect data loss. unzip provides options for verifying this checksum against the stored files. The relevant flag is -t and is used as follows:

$> unzip -t mydata.zip\nArchive:  mydata.zip\n    testing: mydata/                 OK\n    testing: mydata/file             OK\nNo errors detected in compressed data of mydata.zip.\n

Tip

Further information on using zip can be found in the zip manual (accessed via man zip or in the online manual pages).

"},{"location":"user-guide/data/#data-transfer","title":"Data transfer","text":""},{"location":"user-guide/data/#before-you-start","title":"Before you start","text":"

Read Harry Mangalam's guide on How to transfer large amounts of data via network. This tells you all you want to know about transferring data.

"},{"location":"user-guide/data/#data-transfer-via-ssh","title":"Data Transfer via SSH","text":"

The easiest way of transferring data to/from Cirrus is to use one of the standard programs based on the SSH protocol such as scp, sftp or rsync. These all use the same underlying mechanism (ssh) as you normally use to login to Cirrus. So, once the command has been executed via the command line, you will be prompted for your password for the specified account on the remote machine.

To avoid having to type in your password multiple times you can set up an SSH key as documented in the Connecting section of the User Guide.

"},{"location":"user-guide/data/#ssh-transfer-performance-considerations","title":"SSH Transfer Performance Considerations","text":"

The ssh protocol encrypts all traffic it sends. This means that file transfer using ssh consumes a relatively large amount of CPU time at both ends of the transfer. The encryption algorithm used is negotiated between the ssh client and the ssh server. There are command line flags that allow you to specify a preference for which encryption algorithm should be used. You may be able to improve transfer speeds by requesting a different algorithm than the default. The aes128-ctr algorithm is usually quite fast, assuming both hosts support it.

A single ssh based transfer will usually not be able to saturate the available network bandwidth or the available disk bandwidth so you may see an overall improvement by running several data transfer operations in parallel. To reduce metadata interactions it is a good idea to overlap transfers of files from different directories.
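Running transfers in parallel can be sketched with xargs. The helper name parallel_transfer, the limit of four simultaneous transfers and the directory names in the usage line are all assumptions for illustration; choose a limit that suits your network and be considerate of other users of the login nodes.

```shell
# Sketch: start one rsync per source directory, at most four at a time.
# parallel_transfer is a hypothetical helper name.
parallel_transfer() {
    local dest="$1"
    shift
    # Each remaining argument is a separate source directory.
    printf '%s\n' "$@" | xargs -P 4 -I{} rsync -a -e ssh {} "$dest"
}

# Usage:
#   parallel_transfer user@login.cirrus.ac.uk:/work/t01/t01/auser/ run_a run_b run_c
```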

In addition, you should consider the following when transferring data.

"},{"location":"user-guide/data/#scp-command","title":"scp command","text":"

The scp command creates a copy of a file, or if given the -r flag, a directory, on a remote machine.

For example, to transfer files to Cirrus:

scp [options] source user@login.cirrus.ac.uk:[destination]\n

(Remember to replace user with your Cirrus username in the example above.)

In the above example, the [destination] is optional, as when left out scp will simply copy the source into the user's home directory. Also the source should be the absolute path of the file/directory being copied or the command should be executed in the directory containing the source file/directory.

Tip

If your local version of OpenSSH (the software suite that provides scp) is very new, you may see errors transferring data to Cirrus using scp, because the version of OpenSSH on Cirrus is older. The errors typically look like scp: upload \"mydata\": path canonicalization failed. You can get around this issue by adding the -O option to scp.

If you want to request a different encryption algorithm add the -c [algorithm-name] flag to the scp options. For example, to use the (usually faster) aes128-ctr encryption algorithm you would use:

scp [options] -c aes128-ctr source user@login.cirrus.ac.uk:[destination]\n

(Remember to replace user with your Cirrus username in the example above.)

"},{"location":"user-guide/data/#rsync-command","title":"rsync command","text":"

The rsync command can also transfer data between hosts using a ssh connection. It creates a copy of a file or, if given the -r flag, a directory at the given destination, similar to scp above.

Given the -a option, rsync can also make exact copies (including permissions); this is referred to as mirroring. In this case the rsync command is executed with ssh to create the copy on a remote machine.

To transfer files to Cirrus using rsync the command should have the form:

rsync [options] -e ssh source user@login.cirrus.ac.uk:[destination]\n

(Remember to replace user with your Cirrus username in the example above.)

In the above example, the [destination] is optional, as when left out rsync will simply copy the source into the user's home directory. Also the source should be the absolute path of the file/directory being copied or the command should be executed in the directory containing the source file/directory.

Additional flags can be specified for the underlying ssh command by using a quoted string as the argument of the -e flag. e.g.

rsync [options] -e \"ssh -c aes128-ctr\" source user@login.cirrus.ac.uk:[destination]\n

(Remember to replace user with your Cirrus username in the example above.)

"},{"location":"user-guide/data/#data-transfer-using-rclone","title":"Data transfer using rclone","text":"

Rclone is a command-line program to manage files on cloud storage. You can transfer files directly to/from cloud storage services, such as MS OneDrive and Dropbox. The program preserves timestamps and verifies checksums at all times.

First of all, you must download and unzip rclone on Cirrus:

wget https://downloads.rclone.org/v1.62.2/rclone-v1.62.2-linux-amd64.zip\nunzip rclone-v1.62.2-linux-amd64.zip\ncd rclone-v1.62.2-linux-amd64/\n

The previous code snippet uses rclone v1.62.2, which was the latest version when these instructions were written.

Configure rclone using ./rclone config. This will guide you through an interactive setup process where you can make a new remote (called remote). See the following for detailed instructions for:

Please note that a token is required to connect from Cirrus to the cloud service. You need a web browser to get the token. The recommendation is to run rclone on your laptop using rclone authorize, get the token, and then copy the token from your laptop to Cirrus. The rclone website contains further instructions on configuring rclone on a remote machine without a web browser.

Once all the above is done, you\u2019re ready to go. If you want to copy a directory, please use:

rclone copy <cirrus_directory> remote:<cloud_directory>\n

Please note that \u201cremote\u201d is the name that you have chosen when running rclone config. To copy files, please use:

rclone copyto <cirrus_file> remote:<cloud_file>\n

Note

If the session times out while the data transfer takes place, adding the -vv flag to an rclone transfer forces rclone to output to the terminal and therefore avoids triggering the timeout process.

"},{"location":"user-guide/development/","title":"Application Development Environment","text":"

The application development environment on Cirrus is primarily controlled through the modules environment. By loading and switching modules you control the compilers, libraries and software available.

This means that for compiling on Cirrus you typically set the compiler you wish to use using the appropriate modules, then load all the required library modules (e.g. numerical libraries, IO format libraries).

Additionally, if you are compiling parallel applications using MPI (or SHMEM, etc.) then you will need to load one of the MPI environments and use the appropriate compiler wrapper scripts.

By default, all users on Cirrus start with no modules loaded.

Basic usage of the module command on Cirrus is covered below. For full documentation please see:

"},{"location":"user-guide/development/#using-the-modules-environment","title":"Using the modules environment","text":""},{"location":"user-guide/development/#information-on-the-available-modules","title":"Information on the available modules","text":"

Finding out which modules (and hence which compilers, libraries and software) are available on the system is performed using the module avail command:

[user@cirrus-login0 ~]$ module avail\n...\n

This will list all the names and versions of the modules available on the service. Not all of them may work in your account though due to, for example, licensing restrictions. You will notice that for many modules we have more than one version, each of which is identified by a version number. One of these versions is the default. As the service develops the default version will change.

You can list all the modules of a particular type by providing an argument to the module avail command. For example, to list all available versions of the Intel Compiler type:

[user@cirrus-login0 ~]$ module avail intel-compilers\n\n--------------------------------- /mnt/lustre/indy2lfs/sw/modulefiles --------------------------------\nintel-compilers-18/18.05.274  intel-compilers-19/19.0.0.117\n

If you want more info on any of the modules, you can use the module help command:

[user@cirrus-login0 ~]$ module help mpt\n\n-------------------------------------------------------------------\nModule Specific Help for /usr/share/Modules/modulefiles/mpt/2.25:\n\nThe HPE Message Passing Toolkit (MPT) is an optimized MPI\nimplementation for HPE systems and clusters.  See the\nMPI(1) man page and the MPT User's Guide for more\ninformation.\n-------------------------------------------------------------------\n

The simple module list command will give the names of the modules and their versions you have presently loaded in your environment, e.g.:

[user@cirrus-login0 ~]$ module list\nCurrently Loaded Modulefiles:\n1) git/2.35.1(default)                                  6) gcc/8.2.0(default)\n2) singularity/3.7.2(default)                           7) intel-cc-18/18.0.5.274\n3) epcc/utils                                           8) intel-fc-18/18.0.5.274\n4) /mnt/lustre/indy2lfs/sw/modulefiles/epcc/setup-env   9) intel-compilers-18/18.05.274\n5) intel-license                                       10) mpt/2.25\n
"},{"location":"user-guide/development/#loading-unloading-and-swapping-modules","title":"Loading, unloading and swapping modules","text":"

To load a module, use module add or module load. For example, to load the intel-compilers-18 module into the development environment:

module load intel-compilers-18\n

This will load the default version of the Intel compilers. If you need a specific version of the module, you can add more information:

module load intel-compilers-18/18.0.5.274\n

will load version 18.0.5.274 for you, regardless of the default.

If a module loading file cannot be accessed within 10 seconds, a warning message will appear: Warning: Module system not loaded.

If you want to clean up, module remove will remove a loaded module:

module remove intel-compilers-18\n

(or module rm intel-compilers-18 or module unload intel-compilers-18) will unload whatever version of intel-compilers-18 you might have loaded, even if it is not the default. There are many situations in which you might want to change the presently loaded version to a different one, such as trying the latest version which is not yet the default or using a legacy version to keep compatibility with old data. This can be achieved most easily by using \"module swap oldmodule newmodule\".

Suppose you have loaded version 18 of the Intel compilers; the following command will change to version 19:

module swap intel-compilers-18 intel-compilers-19\n
"},{"location":"user-guide/development/#available-compiler-suites","title":"Available Compiler Suites","text":"

Note

As Cirrus uses dynamic linking by default you will generally also need to load any modules you used to compile your code in your job submission script when you run your code.

"},{"location":"user-guide/development/#intel-compiler-suite","title":"Intel Compiler Suite","text":"

The Intel compiler suite is accessed by loading the intel-compilers-* and intel-*/compilers modules, where * references the version. For example, to load the 2019 release, you would run:

module load intel-compilers-19\n

Once you have loaded the module, the compilers are available as:

See the extended section below for further details of available Intel compiler versions and tools.

"},{"location":"user-guide/development/#gcc-compiler-suite","title":"GCC Compiler Suite","text":"

The GCC compiler suite is accessed by loading the gcc/* modules, where * again is the version. For example, to load version 8.2.0 you would run:

module load gcc/8.2.0\n

Once you have loaded the module, the compilers are available as:

"},{"location":"user-guide/development/#compiling-mpi-codes","title":"Compiling MPI codes","text":"

MPI on Cirrus is currently provided by the HPE MPT library.

You should also consult the chapter on running jobs through the batch system for examples of how to run jobs compiled against MPI.

Note

By default, all compilers produce dynamic executables on Cirrus. This means that you must load the same modules at runtime (usually in your job submission script) as you have loaded at compile time.

"},{"location":"user-guide/development/#using-hpe-mpt","title":"Using HPE MPT","text":"

To compile MPI code with HPE MPT, using any compiler, you must first load the \"mpt\" module.

module load mpt\n

This makes the compiler wrapper scripts mpicc, mpicxx and mpif90 available to you.

What you do next depends on which compiler (Intel or GCC) you wish to use to compile your code.

Note

We recommend that you use the Intel compiler wherever possible to compile MPI applications as this is the method officially supported and tested by HPE.

Note

You can always check which compiler the MPI compiler wrapper scripts are using with, for example, mpicc -v or mpif90 -v.

"},{"location":"user-guide/development/#using-intel-compilers-and-hpe-mpt","title":"Using Intel Compilers and HPE MPT","text":"

Once you have loaded the MPT module you should next load the Intel compilers module you intend to use (e.g. intel-compilers-19):

module load intel-compilers-19\n

The compiler wrappers are then available as

Note

The MPT compiler wrappers use GCC by default rather than the Intel compilers:

When compiling C applications you must also specify that mpicc should use the icc compiler with, for example, mpicc -cc=icc. Similarly, when compiling C++ applications you must also specify that mpicxx should use the icpc compiler with, for example, mpicxx -cxx=icpc. (This is not required for Fortran as the mpif90 compiler automatically uses ifort.) If in doubt use mpicc -cc=icc -v or mpicxx -cxx=icpc -v to see which compiler is actually being called.

Alternatively, you can set the environment variables MPICC_CC=icc and/or MPICXX_CXX=icpc to ensure the correct base compiler is used:

export MPICC_CC=icc\nexport MPICXX_CXX=icpc\n
"},{"location":"user-guide/development/#using-gcc-compilers-and-hpe-mpt","title":"Using GCC Compilers and HPE MPT","text":"

Once you have loaded the MPT module you should next load the gcc module:

module load gcc\n

Compilers are then available as

Note

HPE MPT does not support the syntax use mpi in Fortran applications with the GCC compiler gfortran. You should use the older include \"mpif.h\" syntax when using GCC compilers with mpif90. If you cannot change this, then use the Intel compilers with MPT.

"},{"location":"user-guide/development/#using-intel-mpi","title":"Using Intel MPI","text":"

Although HPE MPT remains the default MPI library and we recommend that first attempts at building code follow that route, you may also choose to use Intel MPI. To use it, load the appropriate intel-mpi module, for example intel-mpi-19:

module load intel-mpi-19\n

Please note that the name of the wrappers to use when compiling with Intel MPI depends on whether you are using the Intel compilers or GCC. You should make sure that you or any tools use the correct ones when building software.

Note

Although Intel MPI is available on Cirrus, HPE MPT remains the recommended and default MPI library to use when building applications.

Note

Using Intel MPI 18 can cause warnings in your output similar to no hfi units are available or The /dev/hfi1_0 device failed to appear. These warnings can be safely ignored, or, if you would prefer to prevent them, you may add the line

export I_MPI_FABRICS=shm:ofa\n

to your job scripts after loading the Intel MPI 18 module.

Note

When using Intel MPI 18, you should always launch MPI tasks with srun, the supported method on Cirrus. Launches with mpirun or mpiexec will likely fail.

"},{"location":"user-guide/development/#using-intel-compilers-and-intel-mpi","title":"Using Intel Compilers and Intel MPI","text":"

After first loading Intel MPI, you should next load the appropriate intel-compilers module (e.g. intel-compilers-19):

module load intel-compilers-19\n

You may then use the following MPI compiler wrappers:

"},{"location":"user-guide/development/#using-gcc-compilers-and-intel-mpi","title":"Using GCC Compilers and Intel MPI","text":"

After loading Intel MPI, you should next load the gcc module you wish to use:

module load gcc\n

You may then use these MPI compiler wrappers:

"},{"location":"user-guide/development/#using-openmpi","title":"Using OpenMPI","text":"

There are a number of OpenMPI modules available on Cirrus; these can be listed by running module avail openmpi. You'll notice that the majority of these modules are intended for use on the GPU nodes.

The fact that OpenMPI is open source means that we have full control over how the OpenMPI libraries are built. Indeed, the OpenMPI configure script supports a wealth of options that allow us to build OpenMPI for a specific CUDA version, one that is fully compatible with the underlying NVIDIA GPU device driver. See the link below for an example of how an OpenMPI build is configured.

Build instructions for OpenMPI 4.1.5 on Cirrus

All this means we can build OpenMPI such that it supports direct GPU-to-GPU communications using the NVLink intra-node GPU comm links (and inter-node GPU comms go directly to Infiniband instead of passing through the host processor).

Hence, the OpenMPI GPU modules allow the user to run GPU-aware MPI code as efficiently as possible; see Compiling and using GPU-aware MPI.

OpenMPI modules for use on the CPU nodes are also available, but these are not expected to provide any performance advantage over HPE MPT or Intel MPI.

"},{"location":"user-guide/development/#compiler-information-and-options","title":"Compiler Information and Options","text":"

The manual pages for the different compiler suites are available:

GCC Fortran man gfortran, C/C++ man gcc

Intel Fortran man ifort, C/C++ man icc

"},{"location":"user-guide/development/#useful-compiler-options","title":"Useful compiler options","text":"

Whilst different codes will benefit from compiler optimisations in different ways, for reasonable performance on Cirrus, at least initially, we suggest the following compiler options:

Intel -O2

GNU -O2 -ftree-vectorize -funroll-loops -ffast-math

When you have an application that you are happy is working correctly and has reasonable performance, you may wish to investigate some more aggressive compiler optimisations. Below is a list of some further optimisations that you can try on your application (Note: these optimisations may result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions):

Intel -fast

GNU -Ofast -funroll-loops

Vectorisation, which is one of the important compiler optimisations for Cirrus, is enabled by default as follows:

Intel At -O2 and above

GNU At -O3 and above or when using -ftree-vectorize

To promote integer and real variables from four to eight byte precision for Fortran codes the following compiler flags can be used:

Intel -real-size 64 -integer-size 64 -xAVX (Sometimes the Intel compiler incorrectly generates AVX2 instructions if the -real-size 64 or -r8 options are set. Using the -xAVX option prevents this.)

GNU -freal-4-real-8 -finteger-4-integer-8

"},{"location":"user-guide/development/#using-static-linkinglibraries","title":"Using static linking/libraries","text":"

By default, executables on Cirrus are built using shared/dynamic libraries (that is, libraries which are loaded at run-time as and when needed by the application) when using the wrapper scripts.

An application compiled this way to use shared/dynamic libraries will use the default version of the library installed on the system (just like any other Linux executable), even if the system modules were set differently at compile time. This means that the application may potentially be using slightly different object code each time it runs, as the defaults may change. This is usually the desired behaviour for many applications, as any fixes or improvements to the default linked libraries are used without having to recompile the application; however, some users may feel this is not the desired behaviour for their applications.

Alternatively, applications can be compiled to use static libraries (i.e. all of the object code of referenced libraries is contained in the executable file). This has the advantage that once an executable is created, whenever it is run in the future, it will always use the same object code (within the limit of changing runtime environment). However, executables compiled with static libraries have the potential disadvantage that when multiple instances are running simultaneously, multiple copies of the libraries used are held in memory. This can lead to large amounts of memory being used to hold the executable and not application data.

To create an application that uses static libraries you must pass an extra flag during compilation, -Bstatic.

Use the UNIX command ldd exe_file to check whether you are using an executable that depends on shared libraries. This utility will also report the shared libraries this executable will use if it has been dynamically linked.

"},{"location":"user-guide/development/#intel-modules-and-tools","title":"Intel modules and tools","text":"

There are a number of different Intel compiler versions available, and there is also a slight difference in the way different versions appear.

A full list is available via module avail intel.

The different available compiler versions are:

We recommend the most up-to-date version in the first instance, unless you have particular reasons for preferring an older version.

For a note on Intel compiler version numbers, see this Intel page

The different module names (or parts thereof) indicate:

"},{"location":"user-guide/gpu/","title":"Using the Cirrus GPU Nodes","text":"

Cirrus has 38 GPU compute nodes each equipped with 4 NVIDIA V100 (Volta) GPU cards. This section of the user guide gives some details of the hardware; it also covers how to compile and run standard GPU applications.

The GPU cards on Cirrus do not support graphics rendering tasks; they are set to compute cluster mode and so only support computational tasks.

"},{"location":"user-guide/gpu/#hardware-details","title":"Hardware details","text":"

All of the Cirrus GPU nodes contain four Tesla V100-SXM2-16GB (Volta) cards. Each card has 16 GB of high-bandwidth memory, HBM, often referred to as device memory. Maximum device memory bandwidth is in the region of 900 GB per second. Each card has 5,120 CUDA cores and 640 Tensor cores.

There is one GPU Slurm partition installed on Cirrus called simply gpu. The 36 nodes in this partition have the Intel Cascade Lake architecture. Users concerned with host performance should add the specific compilation options appropriate for the processor.

The Cascade Lake nodes have two 20-core sockets (2.5 GHz) and a total of 384 GB host memory (192 GB per socket). Each core supports two threads in hardware.

For further details of the V100 architecture, see https://www.nvidia.com/en-gb/data-center/tesla-v100/ .

"},{"location":"user-guide/gpu/#compiling-software-for-the-gpu-nodes","title":"Compiling software for the GPU nodes","text":""},{"location":"user-guide/gpu/#nvidia-hpc-sdk","title":"NVIDIA HPC SDK","text":"

NVIDIA now make regular releases of a unified HPC SDK which provides the relevant compilers and libraries needed to build and run GPU programs. Versions of the SDK are available via the module system.

$ module avail nvidia/nvhpc\n

NVIDIA encourage the use of the latest available version, unless there are particular reasons to use earlier versions. The default version is therefore the latest module version present on the system.

Each release of the NVIDIA HPC SDK may include several different versions of the CUDA toolchain. For example, the nvidia/nvhpc/21.2 module comes with CUDA 10.2, 11.0 and 11.2. Only one of these CUDA toolchains can be active at any one time and for nvhpc/22.11 this is CUDA 11.8.

Here is a list of available HPC SDK versions, and the corresponding version of CUDA:

| Module | Supported CUDA Version |
| nvidia/nvhpc/22.11 | CUDA 11.8 |
| nvidia/nvhpc/22.2 | CUDA 11.6 |

To load the latest NVIDIA HPC SDK use

$ module load nvidia/nvhpc\n

The following sections provide some details of compilation for different programming models.

"},{"location":"user-guide/gpu/#cuda","title":"CUDA","text":"

CUDA is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs).

Programs, typically written in C or C++, are compiled with nvcc. As well as nvcc, a host compiler is required. By default, a gcc module is added when nvidia/nvhpc is loaded.

Compile your source code in the usual way.

nvcc -arch=sm_70 -o cuda_test.x cuda_test.cu\n

Note

The -arch=sm_70 compile option ensures that the binary produced is compatible with the NVIDIA Volta architecture.

"},{"location":"user-guide/gpu/#using-cuda-with-intel-compilers","title":"Using CUDA with Intel compilers","text":"

You can load either the Intel 18 or Intel 19 compilers to use with nvcc.

module unload gcc\nmodule load intel-compilers-19\n

You can now use nvcc -ccbin icpc to compile your source code with the Intel C++ compiler icpc.

nvcc -arch=sm_70 -ccbin icpc -o cuda_test.x cuda_test.cu\n
"},{"location":"user-guide/gpu/#compiling-openacc-code","title":"Compiling OpenACC code","text":"

OpenACC is a directive-based approach to introducing parallelism into either C/C++ or Fortran codes. A code with OpenACC directives may be compiled like so.

$ module load nvidia/nvhpc\n$ nvc program.c\n\n$ nvc++ program.cpp\n

Note that nvc and nvc++ are distinct from the NVIDIA CUDA compiler nvcc. They provide a way to compile standard C or C++ programs without explicit CUDA content. See man nvc or man nvc++ for further details.

"},{"location":"user-guide/gpu/#cuda-fortran","title":"CUDA Fortran","text":"

CUDA Fortran provides extensions to standard Fortran which allow GPU functionality. CUDA Fortran files (with file extension .cuf) may be compiled with the NVIDIA Fortran compiler.

$ module load nvidia/nvhpc\n$ nvfortran program.cuf\n

See man nvfortran for further details.

"},{"location":"user-guide/gpu/#openmp-for-gpus","title":"OpenMP for GPUs","text":"

The OpenMP API supports multi-platform shared-memory parallel programming in C/C++ and Fortran and can offload computation from the host (i.e. CPU) to one or more target devices (such as the GPUs on Cirrus). OpenMP code can be compiled with the NVIDIA compilers in a similar manner to OpenACC. To enable this functionality, you must add -mp=gpu to your compile command.

$ module load nvidia/nvhpc\n$ nvc++ -mp=gpu program.cpp\n

You can specify exactly which GPU to target with the -gpu flag. For example, the Volta cards on Cirrus use the flag -gpu=cc70.

During development it can be useful to have the compiler report information about how it is processing OpenMP pragmas. This can be enabled by the use of -Minfo=mp, see below.

nvc -mp=gpu -Minfo=mp testprogram.c\nmain:\n24, #omp target teams distribute parallel for thread_limit(128)\n24, Generating Tesla and Multicore code\nGenerating \"nvkernel_main_F1L88_2\" GPU kernel\n26, Loop parallelized across teams and threads(128), schedule(static)\n
"},{"location":"user-guide/gpu/#submitting-jobs-to-the-gpu-nodes","title":"Submitting jobs to the GPU nodes","text":"

To run a GPU job, a SLURM submission must specify a GPU partition and a quality of service (QoS) as well as the number of GPUs required. You specify the number of GPU cards you want using the --gres=gpu:N option, where N is typically 1, 2 or 4.

Note

As there are 4 GPUs per node, each GPU is associated with 1/4 of the resources of the node, i.e., 10/40 physical cores and roughly 91/384 GB in host memory.

Allocations of host resources are made pro-rata. For example, if 2 GPUs are requested, sbatch will allocate 20 cores and around 190 GB of host memory (in addition to 2 GPUs). Any attempt to use more than the allocated resources will result in an error.

This automatic allocation by SLURM for GPU jobs means that the submission script should not specify options such as --ntasks and --cpus-per-task. Such a job submission will be rejected. See below for some examples of how to use host resources and how to launch MPI applications.

If you specify the --exclusive option, you will automatically be allocated all host cores and all memory from the node irrespective of how many GPUs you request. This may be needed if the application has a large host memory requirement.

If more than one node is required, exclusive mode --exclusive and --gres=gpu:4 options must be included in your submission script. It is, for example, not possible to request 6 GPUs other than via exclusive use of two nodes.
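The allocation rules above can be summarised in a few lines of Python. This is only an illustrative sketch, assuming the figures given in this section (4 GPUs and 40 physical cores per node); the helper name gpu_sbatch_options is hypothetical and is not part of Slurm or of any Cirrus tool.

```python
import math

GPUS_PER_NODE = 4    # from the text: 4 GPUs per node
CORES_PER_GPU = 10   # from the text: 10 of the 40 physical cores per GPU


def gpu_sbatch_options(n_gpus):
    """Return (sbatch options, host cores allocated) for a request of n_gpus GPUs.

    Hypothetical helper for illustration; it only encodes the rules stated above.
    """
    if n_gpus < 1:
        raise ValueError("at least one GPU must be requested")
    if n_gpus <= GPUS_PER_NODE:
        # Single node: host cores are allocated pro-rata with the GPUs.
        return [f"--gres=gpu:{n_gpus}"], n_gpus * CORES_PER_GPU
    # More than one node: whole nodes in exclusive mode, 4 GPUs on each.
    nodes = math.ceil(n_gpus / GPUS_PER_NODE)
    options = [f"--nodes={nodes}", "--exclusive", f"--gres=gpu:{GPUS_PER_NODE}"]
    return options, nodes * GPUS_PER_NODE * CORES_PER_GPU


for n in (1, 2, 6):
    options, cores = gpu_sbatch_options(n)
    print(n, " ".join(options), cores)
```

A request for 6 GPUs is rounded up to two exclusive nodes, which matches the statement that 6 GPUs can only be obtained via exclusive use of two nodes. The sketch does not model a single-node job that adds --exclusive to obtain all host cores.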

Warning

In order to run jobs on the GPU nodes your budget must have positive GPU hours and positive CPU core hours associated with it. However, only your GPU hours will be consumed when running these jobs.

"},{"location":"user-guide/gpu/#partitions","title":"Partitions","text":"

Your job script must specify a partition. The following table has a list of relevant GPU partition(s) on Cirrus.

| Partition | Description | Maximum Job Size (Nodes) |
| gpu | GPU nodes with Cascade Lake processors | 36 |"},{"location":"user-guide/gpu/#quality-of-service-qos","title":"Quality of Service (QoS)","text":"

Your job script must specify a QoS relevant for the GPU nodes. Available QoS specifications are as follows.

| QoS Name | Jobs Running Per User | Jobs Queued Per User | Max Walltime | Max Size | Partition |
| gpu | No limit | 128 jobs | 4 days | 64 GPUs | gpu |
| long | 5 jobs | 20 jobs | 14 days | 8 GPUs | gpu |
| short | 1 job | 2 jobs | 20 minutes | 4 GPUs | gpu |
| lowpriority | No limit | 100 jobs | 2 days | 16 GPUs | gpu |
| largescale | 1 job | 4 jobs | 24 hours | 144 GPUs | gpu |"},{"location":"user-guide/gpu/#examples","title":"Examples","text":""},{"location":"user-guide/gpu/#job-submission-script-using-one-gpu-on-a-single-node","title":"Job submission script using one GPU on a single node","text":"

A job script that requires 1 GPU accelerator and 10 CPU cores for 20 minutes would look like the following.

#!/bin/bash\n#\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n#SBATCH --gres=gpu:1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n\n# Load the required modules \nmodule load nvidia/nvhpc\n\nsrun ./cuda_test.x\n

This will execute one host process with access to one GPU. If we wish to make use of the 10 host cores in this allocation, we could use host threads via OpenMP.

export OMP_NUM_THREADS=10\nexport OMP_PLACES=cores\n\nsrun --ntasks=1 --cpus-per-task=10 --hint=nomultithread ./cuda_test.x\n

The launch configuration is specified directly to srun because, for the GPU partitions, it is not possible to do this via sbatch.

"},{"location":"user-guide/gpu/#job-submission-script-using-multiple-gpus-on-a-single-node","title":"Job submission script using multiple GPUs on a single node","text":"

A job script that requires 4 GPU accelerators and 40 CPU cores for 20 minutes would appear as follows.

#!/bin/bash\n#\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n#SBATCH --gres=gpu:4\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n\n# Load the required modules \nmodule load nvidia/nvhpc\n\nsrun ./cuda_test.x\n

A typical MPI application might assign one device per MPI process, in which case we would want 4 MPI tasks in this example. This would again be specified directly to srun.

srun --ntasks=4 ./mpi_cuda_test.x\n
"},{"location":"user-guide/gpu/#job-submission-script-using-multiple-gpus-on-multiple-nodes","title":"Job submission script using multiple GPUs on multiple nodes","text":"

See below for a job script that requires 8 GPU accelerators for 20 minutes.

#!/bin/bash\n#\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n#SBATCH --gres=gpu:4\n#SBATCH --nodes=2\n#SBATCH --exclusive\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n\n# Load the required modules \nmodule load nvidia/nvhpc\n\nsrun ./cuda_test.x\n

An MPI application with four MPI tasks per node would be launched as follows.

srun --ntasks=8 --tasks-per-node=4 ./mpi_cuda_test.x\n

Again, these options are specified directly to srun rather than being declared as sbatch directives.

Attempts to oversubscribe an allocation (10 cores per GPU) will fail, and generate an error message.

srun: error: Unable to create step for job 234123: More processors requested\nthan permitted\n
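Whether a launch configuration stays within the allocation can be checked before submitting. The following is a minimal sketch, assuming the 10 cores per GPU stated above; step_fits_allocation is a hypothetical name, and the check ignores hardware threads and any extra cores gained through --exclusive.

```python
CORES_PER_GPU = 10  # from the text: 10 host cores per allocated GPU


def step_fits_allocation(n_gpus, ntasks, cpus_per_task=1):
    """Return True if ntasks * cpus_per_task fits in the cores allocated with n_gpus GPUs."""
    return ntasks * cpus_per_task <= n_gpus * CORES_PER_GPU


print(step_fits_allocation(1, ntasks=1, cpus_per_task=10))  # True: 10 cores with 1 GPU
print(step_fits_allocation(4, ntasks=4, cpus_per_task=10))  # True: 40 cores with 4 GPUs
print(step_fits_allocation(1, ntasks=2, cpus_per_task=10))  # False: 20 cores with 1 GPU
```

The last case corresponds to the kind of request that srun rejects with the error shown above.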
"},{"location":"user-guide/gpu/#debugging-gpu-applications","title":"Debugging GPU applications","text":"

Applications may be debugged using cuda-gdb. This is an extension of gdb which can be used with CUDA. We assume the reader is familiar with gdb.

First, compile the application with the -g -G flags in order to generate debugging information for both host and device code. Then, obtain an interactive session like so.

$ srun --nodes=1 --partition=gpu --qos=short --gres=gpu:1 \\\n       --time=0:20:0 --account=[budget code] --pty /bin/bash\n

Next, load the NVIDIA HPC SDK module and start cuda-gdb for your application.

$ module load nvidia/nvhpc\n$ cuda-gdb ./my-application.x\nNVIDIA (R) CUDA Debugger\n...\n(cuda-gdb)\n

Debugging then proceeds as usual. One can use the help facility within cuda-gdb to find details on the various debugging commands. Type quit to end your debug session followed by exit to close the interactive session.

Note, it may be necessary to set the temporary directory to somewhere in the user space (e.g., export TMPDIR=$(pwd)/tmp) to prevent unexpected internal CUDA driver errors.

For further information on CUDA-GDB, see https://docs.nvidia.com/cuda/cuda-gdb/index.html.

"},{"location":"user-guide/gpu/#profiling-gpu-applications","title":"Profiling GPU applications","text":"

NVIDIA provide two useful tools for profiling performance of applications: Nsight Systems and Nsight Compute; the former provides an overview of application performance, while the latter provides detailed information specifically on GPU kernels.

"},{"location":"user-guide/gpu/#using-nsight-systems","title":"Using Nsight Systems","text":"

Nsight Systems provides an overview of application performance and should therefore be the starting point for investigation. To run an application, compile as normal (including the -g flag) and then submit a batch job.

#!/bin/bash\n\n#SBATCH --time=00:10:00\n#SBATCH --nodes=1\n#SBATCH --exclusive  \n#SBATCH --partition=gpu\n#SBATCH --qos=short\n#SBATCH --gres=gpu:1\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n\nmodule load nvidia/nvhpc\n\nsrun -n 1 nsys profile -o prof1 ./my_application.x\n

The run should then produce an additional output file called, in this case, prof1.qdrep. The recommended way to view the contents of this file is to download the NVIDIA Nsight package to your own machine (you do not need the entire HPC SDK). Then copy the .qdrep file produced on Cirrus so that it can be viewed locally.

Note, a profiling run should probably be of a short duration so that the profile information (contained in the .qdrep file) does not become prohibitively large.

Details of the download of Nsight Systems and a user guide can be found via the links below.

https://developer.nvidia.com/nsight-systems

https://docs.nvidia.com/nsight-systems/UserGuide/index.html

If your code was compiled with the tools provided by nvidia/nvhpc/21.2 you should download and install Nsight Systems v2020.5.1.85.

"},{"location":"user-guide/gpu/#using-nsight-compute","title":"Using Nsight Compute","text":"

Nsight Compute may be used in a similar way to Nsight Systems. A job may be submitted like so.

#!/bin/bash\n\n#SBATCH --time=00:10:00\n#SBATCH --nodes=1\n#SBATCH --exclusive\n#SBATCH --partition=gpu\n#SBATCH --qos=short\n#SBATCH --gres=gpu:1\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n\nmodule load nvidia/nvhpc\n\nsrun -n 1 nv-nsight-cu-cli --section SpeedOfLight_RooflineChart \\\n                           -o prof2 -f ./my_application.x\n

In this case, a file called prof2.ncu-rep should be produced. Again, the recommended way to view this file is to download the Nsight Compute package to your own machine, along with the .ncu-rep file from Cirrus. The --section option determines which statistics are recorded (typically not all hardware counters can be accessed at the same time). A common starting point is --section MemoryWorkloadAnalysis.

Consult the NVIDIA documentation for further details.

https://developer.nvidia.com/nsight-compute

https://docs.nvidia.com/nsight-compute/2021.2/index.html

Nsight Compute v2021.3.1.0 has been found to work for codes compiled using nvhpc versions 21.2 and 21.9.

"},{"location":"user-guide/gpu/#monitoring-the-gpu-power-usage","title":"Monitoring the GPU Power Usage","text":"

NVIDIA also provides a useful command line utility for the management and monitoring of NVIDIA GPUs: the NVIDIA System Management Interface nvidia-smi.

The nvidia-smi command queries the available GPUs and reports current information, including but not limited to: driver versions, CUDA version, name, temperature, current power usage and maximum power capability. In this example output, there is one available GPU and it is idle:

  +-----------------------------------------------------------------------------+\n  | NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |\n  |-------------------------------+----------------------+----------------------+\n  | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |\n  | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |\n  |                               |                      |               MIG M. |\n  |===============================+======================+======================|\n  |   0  Tesla V100-SXM2...  Off  | 00000000:1C:00.0 Off |                  Off |\n  | N/A   38C    P0    57W / 300W |      0MiB / 16384MiB |      1%      Default |\n  |                               |                      |                  N/A |\n  +-------------------------------+----------------------+----------------------+\n\n  +-----------------------------------------------------------------------------+\n  | Processes:                                                                  |\n  |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |\n  |        ID   ID                                                   Usage      |\n  |=============================================================================|\n  |  No running processes found                                                 |\n  +-----------------------------------------------------------------------------+\n

To monitor power usage throughout the duration of a job, use the --loop=SEC option: nvidia-smi then reports data every SEC seconds, sleeping in between queries. The following command writes the output of nvidia-smi to the specified output file every 10 seconds.

nvidia-smi --loop=10 --filename=out-nvidia-smi.txt &\n

Example submission script:

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=lammps_Example\n#SBATCH --time=00:20:00\n#SBATCH --nodes=1\n#SBATCH --gres=gpu:4\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n\n# Load the required modules\nmodule load nvidia/nvhpc\n\n# Save the output of NVIDIA-SMI every 10 seconds\nnvidia-smi --loop=10 --filename=out-nvidia-smi.txt &\nsrun ./cuda_test.x\n

This submission script uses 4 GPU accelerators for 20 minutes, writing the output of nvidia-smi every 10 seconds to the out-nvidia-smi.txt output file. The & means the shell executes the command in the background.
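Once the job has finished, the power figures can be pulled out of the log with a short script. The sketch below assumes the default table layout shown in the example output above, where power appears as a field such as 57W / 300W; the function name power_readings is hypothetical, and other driver versions or an N/A power field would need a different pattern.

```python
import re

# Matches the "Pwr:Usage/Cap" field, e.g. "57W / 300W", in nvidia-smi's default table output.
POWER_FIELD = re.compile(r"(\d+)W\s*/\s*(\d+)W")


def power_readings(text):
    """Return a list of (usage_watts, cap_watts) pairs found in nvidia-smi output."""
    return [(int(use), int(cap)) for use, cap in POWER_FIELD.findall(text)]


# One row copied from the example output above.
sample = "| N/A   38C    P0    57W / 300W |      0MiB / 16384MiB |      1%      Default |"
print(power_readings(sample))  # [(57, 300)]
```

To process a real log, read the contents of out-nvidia-smi.txt and pass the text to power_readings; with 4 GPUs, each 10-second sample contributes four pairs.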

Consult the NVIDIA documentation for further details.

https://developer.nvidia.com/nvidia-system-management-interface

"},{"location":"user-guide/gpu/#compiling-and-using-gpu-aware-mpi","title":"Compiling and using GPU-aware MPI","text":"

For applications using message passing via MPI, considerable improvements in performance may be available by allowing device memory references in MPI calls. This allows replacement of relevant host device transfers by direct communication within a node via NVLink. Between nodes, MPI communication will remain limited by network latency and bandwidth.

Versions of OpenMPI with both CUDA-aware MPI support and SLURM support are available; you should load the following modules:

module load openmpi/4.1.4-cuda-11.8\nmodule load nvidia/nvhpc-nompi/22.11\n

The command you use to compile depends on whether you are compiling C/C++ or Fortran.

"},{"location":"user-guide/gpu/#compiling-cc","title":"Compiling C/C++","text":"

The location of the MPI include files and libraries must be specified explicitly, e.g.,

nvcc -I${MPI_HOME}/include  -L${MPI_HOME}/lib -lmpi -o my_program.x my_program.cu\n

This will produce an executable in the usual way.

"},{"location":"user-guide/gpu/#compiling-fortran","title":"Compiling Fortran","text":"

Use the mpif90 compiler wrapper to compile Fortran code for GPU, e.g.,

mpif90 -o my_program.x my_program.f90\n

This will produce an executable in the usual way.

"},{"location":"user-guide/gpu/#run-time","title":"Run time","text":"

A batch script to use such an executable might be:

#!/bin/bash\n\n#SBATCH --time=00:20:00\n\n#SBATCH --nodes=1\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n#SBATCH --gres=gpu:4\n\n# Load the appropriate modules, e.g.,\nmodule load openmpi/4.1.4-cuda-11.8\nmodule load nvidia/nvhpc-nompi/22.11\n\nexport OMP_NUM_THREADS=1\n\n# Note the addition of this variable, which is required for correct operation\nexport OMPI_MCA_pml=ob1\n\nsrun --ntasks=4 --cpus-per-task=10 --hint=nomultithread ./my_program.x\n

Note that the addition of the environment variable OMPI_MCA_pml=ob1 is required for correct operation. As before, MPI and placement options should be specified directly to srun and not via SBATCH directives.

"},{"location":"user-guide/introduction/","title":"Introduction","text":"

This guide is designed to be a reference for users of the high-performance computing (HPC) facility: Cirrus. It provides all the information needed to access the system, transfer data, manage your resources (disk and compute time), submit jobs, compile programs and manage your environment.

"},{"location":"user-guide/introduction/#acknowledging-cirrus","title":"Acknowledging Cirrus","text":"

You should use the following phrase to acknowledge Cirrus in all research outputs that have used the facility:

This work used the Cirrus UK National Tier-2 HPC Service at EPCC (http://www.cirrus.ac.uk) funded by the University of Edinburgh and EPSRC (EP/P020267/1)

You should also tag outputs with the keyword Cirrus whenever possible.

"},{"location":"user-guide/introduction/#hardware","title":"Hardware","text":"

Details of the Cirrus hardware are available on the Cirrus website:

"},{"location":"user-guide/introduction/#useful-terminology","title":"Useful terminology","text":"

This is a list of terminology used throughout this guide and its meaning.

CPUh Cirrus CPU time is measured in CPUh. Each job you run on the service consumes CPUhs from your budget. You can find out more about CPUhs and how to track your usage in the resource management section

GPUh Cirrus GPU time is measured in GPUh. Each job you run on the GPU nodes consumes GPUhs from your budget, and requires positive CPUh, even though these will not be consumed. You can find out more about GPUhs and how to track your usage in the resource management section

"},{"location":"user-guide/network-upgrade-2023/","title":"Cirrus Network Upgrade: 2023","text":"

During September 2023 Cirrus will be undergoing a Network upgrade.

On this page we describe the impact this will have and provide links to further information.

If you have any questions or concerns, please contact the Cirrus Service Desk: https://www.cirrus.ac.uk/support/

"},{"location":"user-guide/network-upgrade-2023/#when-will-the-upgrade-happen-and-how-long-will-it-take","title":"When will the upgrade happen and how long will it take?","text":"

The outage dates will be:

We will notify users if we are able to complete this work ahead of schedule.

"},{"location":"user-guide/network-upgrade-2023/#what-are-the-impacts-on-users-from-the-upgrade","title":"What are the impacts on users from the upgrade?","text":"

During the upgrade process

Submitting new work, and running work

We will therefore be encouraging users to submit jobs to the queues in the period prior to the work, so that Cirrus can continue to run jobs during the outage.

"},{"location":"user-guide/network-upgrade-2023/#relaxing-of-queue-limits","title":"Relaxing of queue limits","text":"

In preparation for the Data Centre Network (DCN) upgrade we have relaxed the queue limits on all the QoS\u2019s, so that users can submit a significantly larger number of jobs to Cirrus. These changes are intended to allow users to submit jobs that they wish to run during the upgrade, in advance of the start of the upgrade. The changes will be in place until the end of the Data Centre Network upgrade.

"},{"location":"user-guide/network-upgrade-2023/#quality-of-service-qos","title":"Quality of Service (QoS)","text":"

The relaxed QoS limits that will be in force during the network upgrade are as follows.

| QoS Name | Jobs Running Per User | Jobs Queued Per User | Max Walltime | Max Size | Applies to Partitions | Notes |
| standard | No limit | 1000 jobs | 4 days | 88 nodes (3168 cores/25%) | standard | |
| largescale | 1 job | 20 jobs | 24 hours | 228 nodes (8192+ cores/65%) or 144 GPUs | standard, gpu | |
| long | 5 jobs | 40 jobs | 14 days | 16 nodes or 8 GPUs | standard, gpu | |
| highpriority | 10 jobs | 20 jobs | 4 days | 140 nodes | standard | |
| gpu | No limit | 256 jobs | 4 days | 64 GPUs (16 nodes/40%) | gpu | |
| lowpriority | No limit | 1000 jobs | 2 days | 36 nodes (1296 cores/10%) or 16 GPUs | standard, gpu | |"},{"location":"user-guide/python/","title":"Using Python","text":"

Python on Cirrus is provided by a number of Miniconda modules and one Anaconda module. (Miniconda is a small bootstrap version of Anaconda.)

The Anaconda module is called anaconda/python3 and is suitable for running serial applications - for parallel applications using mpi4py see mpi4py for CPU or mpi4py for GPU.

You can list the Miniconda modules by running module avail python on a login node. Those module versions that have the gpu suffix are suitable for use on the Cirrus GPU nodes. There are also modules that extend these Python environments, e.g., pyfr, horovod, tensorflow and pytorch - simply run module help <module name> for further info.

The Miniconda modules support Python-based parallel codes, i.e., each such python module provides a suite of packages pertinent to parallel processing and numerical analysis such as dask, ipyparallel, jupyter, matplotlib, numpy, pandas and scipy.

All the packages provided by a module can be obtained by running pip list. We now give some examples that show how the python modules can be used on the Cirrus CPU/GPU nodes.

"},{"location":"user-guide/python/#mpi4py-for-cpu","title":"mpi4py for CPU","text":"

The python/3.9.13 module provides mpi4py 3.1.3 linked with OpenMPI 4.1.4.

See numpy-broadcast.py below, which is a simple MPI Broadcast example, and the Slurm script submit-broadcast.slurm, which demonstrates how to run it across two compute nodes.

numpy-broadcast.py
#!/usr/bin/env python\n\n\"\"\"\nParallel Numpy Array Broadcast \n\"\"\"\n\nfrom mpi4py import MPI\nimport numpy as np\nimport sys\n\ncomm = MPI.COMM_WORLD\n\nsize = comm.Get_size()\nrank = comm.Get_rank()\nname = MPI.Get_processor_name()\n\narraySize = 100\nif rank == 0:\n    data = np.arange(arraySize, dtype='i')\nelse:\n    data = np.empty(arraySize, dtype='i')\n\ncomm.Bcast(data, root=0)\n\nif rank == 0:\n    sys.stdout.write(\n        \"Rank %d of %d (%s) has broadcast %d integers.\\n\"\n        % (rank, size, name, arraySize))\nelse:\n    sys.stdout.write(\n        \"Rank %d of %d (%s) has received %d integers.\\n\"\n        % (rank, size, name, arraySize))\n\n    arrayBad = False\n    for i in range(100):\n        if data[i] != i:\n            arrayBad = True\n            break\n\n    if arrayBad:\n        sys.stdout.write(\n            \"Error, rank %d array is not as expected.\\n\"\n            % (rank))\n

The MPI initialisation is done automatically as a result of calling from mpi4py import MPI.

submit-broadcast.slurm
#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=broadcast\n#SBATCH --time=00:20:00\n#SBATCH --exclusive\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n#SBATCH --account=[budget code]\n#SBATCH --nodes=2\n#SBATCH --tasks-per-node=36\n#SBATCH --cpus-per-task=1\n\nmodule load python/3.9.13\n\nexport OMPI_MCA_mca_base_component_show_load_errors=0\n\nsrun numpy-broadcast.py\n

The Slurm submission script (submit-broadcast.slurm) above sets an OMPI_MCA environment variable before launching the job. That particular variable suppresses warnings written to the job output file; it can of course be removed. Please see the OpenMPI documentation for info on all OMPI_MCA variables.

"},{"location":"user-guide/python/#mpi4py-for-gpu","title":"mpi4py for GPU","text":"

There's also an mpi4py module (again using OpenMPI 4.1.4) that is tailored for CUDA 11.6 on the Cirrus GPU nodes, python/3.9.13-gpu. We show below an example that features an MPI reduction performed on a CuPy array (cupy-allreduce.py).

cupy-allreduce.py
#!/usr/bin/env python\n\n\"\"\"\nReduce-to-all CuPy Arrays \n\"\"\"\n\nfrom mpi4py import MPI\nimport cupy as cp\nimport sys\n\ncomm = MPI.COMM_WORLD\n\nsize = comm.Get_size()\nrank = comm.Get_rank()\nname = MPI.Get_processor_name()\n\nsendbuf = cp.arange(10, dtype='i')\nrecvbuf = cp.empty_like(sendbuf)\nassert hasattr(sendbuf, '__cuda_array_interface__')\nassert hasattr(recvbuf, '__cuda_array_interface__')\ncp.cuda.get_current_stream().synchronize()\ncomm.Allreduce(sendbuf, recvbuf)\n\nassert cp.allclose(recvbuf, sendbuf*size)\n\nsys.stdout.write(\n    \"%d (%s): recvbuf = %s\\n\"\n    % (rank, name, str(recvbuf)))\n

By default, the CuPy cache is located within the user's home directory. Because /home is not accessible from the GPU nodes, CUPY_CACHE_DIR must be set so that the cache is on the /work file system instead.

submit-allreduce.slurm
#!/bin/bash\n\n#SBATCH --job-name=allreduce\n#SBATCH --time=00:20:00\n#SBATCH --exclusive\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n#SBATCH --account=[budget code]\n#SBATCH --nodes=2\n#SBATCH --gres=gpu:4\n\nmodule load python/3.9.13-gpu\n\nexport CUPY_CACHE_DIR=${HOME/home/work}/.cupy/kernel_cache\n\nexport OMPI_MCA_mpi_warn_on_fork=0\nexport OMPI_MCA_mca_base_component_show_load_errors=0\n\nsrun --ntasks=8 --tasks-per-node=4 --cpus-per-task=1 cupy-allreduce.py\n

Again, the submission script (submit-allreduce.slurm) is the place to set OMPI_MCA variables - the two shown are optional, see the link below for further details.

https://www.open-mpi.org/faq/?category=tuning#mca-def

"},{"location":"user-guide/python/#machine-learning-frameworks","title":"Machine Learning frameworks","text":"

There are several more Python-based modules that also target the Cirrus GPU nodes. These include two machine learning frameworks, pytorch/1.12.1-gpu and tensorflow/2.9.1-gpu. Both modules are Python virtual environments that extend python/3.9.13-gpu. MPI communication is handled by the Horovod 0.25.0 package along with the NVIDIA Collective Communications Library v2.11.4.

A full package list for these environments can be obtained by loading the module of interest and then running pip list.

Please see the linked pages for examples of how to use the PyTorch and TensorFlow modules.

"},{"location":"user-guide/python/#installing-your-own-python-packages-with-pip","title":"Installing your own Python packages (with pip)","text":"

This section shows how to set up a local custom Python environment such that it extends a centrally-installed python module. By extend, we mean being able to install packages locally that are not provided by the central python. This is needed because some packages such as mpi4py must be built specifically for the Cirrus system and so are best provided centrally.

You can do this by creating a lightweight virtual environment where the local packages can be installed. Further, this environment is created on top of an existing Python installation, known as the environment's base Python.

Select the base Python by loading the python module you wish to extend, e.g., python/3.9.13-gpu (you can run module avail python to list all the available python modules).

[auser@cirrus-login1 auser]$ module load python/3.9.13\n

Tip

In the commands below, remember to replace x01 with your project code and auser with your username.

Next, create the virtual environment within a designated folder:

python -m venv --system-site-packages /work/x01/x01/auser/myvenv\n

In our example, the environment is created within a myvenv folder located on /work, which means the environment will be accessible from the compute nodes. The --system-site-packages option ensures that this environment is based on the currently loaded python module. See https://docs.python.org/3/library/venv.html for more details.

extend-venv-activate /work/x01/x01/auser/myvenv\n

The extend-venv-activate command ensures that your virtual environment's activate script loads and unloads the base python module when appropriate. You're now ready to activate your environment.

source /work/x01/x01/auser/myvenv/bin/activate\n

Important

The path above uses a fictitious project code, x01, and username, auser. Please remember to replace those values with your actual project code and username. Alternatively, you could enter ${HOME/home/work} in place of /work/x01/x01/auser. That command fragment expands ${HOME} and then replaces the home part with work.
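
As a rough illustration of what that substitution produces, the following Python sketch mimics the bash expansion ${HOME/home/work}, which replaces only the first occurrence of home. The function name and the example home path are illustrative assumptions, not part of the Cirrus tooling.

```python
def home_to_work(home_path):
    # Mirrors bash ${HOME/home/work}: only the first 'home' is replaced.
    return home_path.replace('home', 'work', 1)


# Hypothetical home directory for project x01 and user auser.
print(home_to_work('/home/x01/x01/auser'))
```

Because only the first match is replaced, a later occurrence of home elsewhere in the path is left untouched.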

Installing packages to your local environment can now be done as follows.

(myvenv) [auser@cirrus-login1 auser]$ python -m pip install <package name>\n

Running pip directly as in pip install <package name> will also work, but we show the python -m approach as this is consistent with the way the virtual environment was created. And when you have finished installing packages, you can deactivate your environment by issuing the deactivate command.

(myvenv) [auser@cirrus-login1 auser]$ deactivate\n[auser@cirrus-login1 auser]$\n

The packages you have just installed locally will only be available once the local environment has been activated. So, when running code that requires these packages, you must first activate the environment, by adding the activation command to the submission script, as shown below.

submit-myvenv.slurm
#!/bin/bash\n\n#SBATCH --job-name=myvenv\n#SBATCH --time=00:20:00\n#SBATCH --exclusive\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n#SBATCH --account=[budget code]\n#SBATCH --nodes=2\n#SBATCH --gres=gpu:4\n\nsource /work/x01/x01/auser/myvenv/bin/activate\n\nsrun --ntasks=8 --tasks-per-node=4 --cpus-per-task=10 myvenv-script.py\n

Lastly, the environment being extended does not have to come from one of the centrally-installed python modules. You could just as easily create a local virtual environment based on one of the Machine Learning (ML) modules, e.g., horovod, tensorflow or pytorch. This means you would avoid having to install ML packages within your local area. Each of those ML modules is based on a python module. For example, tensorflow/2.11.0-gpu is itself an extension of python/3.10.8-gpu.

"},{"location":"user-guide/python/#installing-your-own-python-packages-with-conda","title":"Installing your own Python packages (with conda)","text":"

This section shows you how to set up a local custom Python environment such that it duplicates a centrally-installed python module, ensuring that your local conda environment will contain packages that are compatible with the Cirrus system.

Select the base Python by loading the python module you wish to duplicate, e.g., python/3.9.13-gpu (you can run module avail python to list all the available python modules).

[auser@cirrus-login1 auser]$ module load python/3.9.13\n

Next, create the folder for holding your conda environments. This folder should be on the /work file system as /home is not accessible from the compute nodes.

CONDA_ROOT=/work/x01/x01/auser/condaenvs\nmkdir -p ${CONDA_ROOT}\n

The following commands tell conda where to save your custom environments and packages.

conda config --prepend envs_dirs ${CONDA_ROOT}/envs\nconda config --prepend pkgs_dirs ${CONDA_ROOT}/pkgs\n

The conda config commands are executed just once and the configuration details are held in a .condarc file located in your home directory. You now need to move this .condarc file to a directory visible from the compute nodes.

mv ~/.condarc ${CONDA_ROOT}\n

You can now activate the conda configuration.

export CONDARC=${CONDA_ROOT}/.condarc\neval \"$(conda shell.bash hook)\"\n

These two lines need to be called each time you want to use your virtual conda environment. The next command creates that virtual environment.

conda create --clone base --name myvenv\n

The above creates an environment called myvenv that will hold the same packages provided by the base python module. As this command involves a significant amount of file copying and downloading, it may take a long time to complete. When it has completed please activate the local myvenv conda environment.

conda activate myvenv\n

You can now install packages using conda install -p ${CONDA_ROOT}/envs/myvenv <package_name>. You can see the packages currently installed in the active environment with the command conda list. After all packages have been installed, simply run conda deactivate twice in order to restore the original command prompt.

(myvenv) [auser@cirrus-login1 auser]$ conda deactivate\n(base) [auser@cirrus-login1 auser]$ conda deactivate\n[auser@cirrus-login1 auser]$\n

The submission script below shows how to use the conda environment within a job running on the compute nodes.

submit-myvenv.slurm
#!/bin/bash\n\n#SBATCH --job-name=myvenv\n#SBATCH --time=00:20:00\n#SBATCH --exclusive\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n#SBATCH --account=[budget code]\n#SBATCH --nodes=2\n#SBATCH --gres=gpu:4\n\nmodule load python/3.9.13\n\nCONDA_ROOT=/work/x01/x01/auser/condaenvs\nexport CONDARC=${CONDA_ROOT}/.condarc\neval \"$(conda shell.bash hook)\"\n\nconda activate myvenv\n\nsrun --ntasks=8 --tasks-per-node=4 --cpus-per-task=10 myvenv-script.py\n

You can see that using conda is less convenient than using pip. In particular, the centrally-installed Python packages are copied into the local conda environment, consuming some of the disk space allocated to your project. Secondly, activating the conda environment within a submission script is more involved: five commands are required (including an explicit load for the base python module), instead of the single source command that is sufficient for a pip environment.

Further, conda cannot be used if the base environment is one of the Machine Learning (ML) modules, as conda is not flexible enough to gather Python packages from both the ML and base python modules (e.g., the ML module pytorch/2.0.0-gpu is itself based on python/3.10.8-gpu, and so conda will only duplicate packages provided by the python module and not the ones supplied by pytorch).

"},{"location":"user-guide/python/#using-jupyterlab-on-cirrus","title":"Using JupyterLab on Cirrus","text":"

It is possible to view and run JupyterLab on both the login and compute nodes of Cirrus. Please note, you can test notebooks on the login nodes, but please don\u2019t attempt to run any computationally intensive work (such jobs will be killed should they reach the login node CPU limit).

If you want to run your JupyterLab on a compute node, you will need to enter an interactive session; otherwise you can start from a login node prompt.

  1. As described above, load the Anaconda module on Cirrus using module load anaconda/python3.

  2. Run export JUPYTER_RUNTIME_DIR=$(pwd).

  3. Start the JupyterLab server by running jupyter lab --ip=0.0.0.0 --no-browser

    Or copy and paste one of these URLs:\n    ...\n or http://127.0.0.1:8888/lab?token=<string>\n

    You will need the URL shown above for step 6.

  4. Please skip this step if you are connecting from Windows. If you are connecting from Linux or macOS, open a new terminal window, and run the following command.

    ssh <username>@login.cirrus.ac.uk -L<port_number>:<node_id>:<port_number>\n

    where <username> is your username, <port_number> is as shown in the URL from the Jupyter output and <node_id> is the name of the node you are currently on. On a login node, this will be cirrus-login1, or similar; on a compute node, it will be a mix of numbers and letters such as r2i5n5.

    Note

    If, when you connect in the new terminal, you see a message of the form channel_setup_fwd_listener_tcpip: cannot listen to port: 8888, it means port 8888 is already in use. You need to go back to step 3 (kill the existing jupyter lab) and retry with a new explicit port number by adding the --port=N option. The port number N can be in the range 5000-65535. You should then use the same port number in place of 8888.

  5. Please skip this step if you are connecting from Linux or macOS. If you are connecting from Windows, you should use MobaXterm to configure an SSH tunnel as follows.

    5.1. Click on the Tunnelling button above the MobaXterm terminal. Create a new tunnel by clicking on New SSH tunnel in the window that opens.

    5.2. In the new window that opens, make sure the Local port forwarding radio button is selected.

    5.3. In the forwarded port text box on the left under My computer with MobaXterm, enter the port number indicated in the Jupyter server output.

    5.4. In the three text boxes on the bottom right under SSH server enter login.cirrus.ac.uk, your Cirrus username, and then 22.

    5.5. At the top right, under Remote server, enter the name of the Cirrus login or compute node that you noted earlier followed by the port number (e.g. 8888).

    5.6. Click on the Save button.

    5.7. In the tunnelling window, you will now see a new row for the settings you just entered. If you like, you can give a name to the tunnel in the leftmost column to identify it. Click on the small key icon close to the right for the new connection to tell MobaXterm which SSH private key to use when connecting to Cirrus. You should tell it to use the same .ppk private key that you normally use.

    5.8. The tunnel should now be configured. Click on the small start button (like a play > icon) for the new tunnel to open it. You'll be asked to enter your Cirrus password -- please do so.

  6. Now, if you open a browser window on your local machine, you should be able to navigate to the URL from step 3, and this should display the JupyterLab server.

Note

If you have extended a central Python venv following the instructions above for Installing your own Python packages (with pip), Jupyter Lab will load the central ipython kernel, not the one for your venv. To enable loading of the ipython kernel for your venv from within Jupyter Lab, first install the ipykernel module and then use this to install the kernel for your venv.

source /work/x01/x01/auser/myvenv/bin/activate\npython -m pip install ipykernel\npython -m ipykernel install --user --name=myvenv\n
Change the placeholder project code and username as appropriate. Thereafter, launch Jupyter Lab as above and select the myvenv kernel.

If you are on a compute node, the JupyterLab server will be available for the length of the interactive session you have requested.

You can also run Jupyter sessions using the centrally-installed Miniconda3 modules available on Cirrus. For example, the following link provides instructions for how to set up a Jupyter server on a GPU node.

https://github.com/hpc-uk/build-instructions/tree/main/pyenvs/ipyparallel

"},{"location":"user-guide/reading/","title":"References and further reading","text":""},{"location":"user-guide/reading/#online-documentation-and-resources","title":"Online Documentation and Resources","text":""},{"location":"user-guide/reading/#mpi-programming","title":"MPI programming","text":""},{"location":"user-guide/reading/#openmp-programming","title":"OpenMP programming","text":""},{"location":"user-guide/reading/#parallel-programming","title":"Parallel programming","text":""},{"location":"user-guide/reading/#programming-languages","title":"Programming languages","text":""},{"location":"user-guide/reading/#programming-skills","title":"Programming skills","text":""},{"location":"user-guide/resource_management/","title":"File and Resource Management","text":"

This section covers some of the tools and technical knowledge that will be key to maximising the usage of the Cirrus system, such as the online administration tool SAFE and calculating the CPU-time available.

The default file permissions are then outlined, along with a description of changing these permissions to the desired setting. This leads on to the sharing of data between users and systems, often a vital tool for project groups and collaboration.

Finally, we cover some guidelines for I/O and data archiving on Cirrus.

"},{"location":"user-guide/resource_management/#the-cirrus-administration-web-site-safe","title":"The Cirrus Administration Web Site (SAFE)","text":"

All users have a login and password on the Cirrus Administration Web Site (also known as the 'SAFE'): SAFE. Once logged into this web site, users can find out much about their usage of the Cirrus system, including:

"},{"location":"user-guide/resource_management/#checking-your-cpugpu-time-allocations","title":"Checking your CPU/GPU time allocations","text":"

You can view these details by logging into the SAFE (https://safe.epcc.ed.ac.uk).

Use the Login accounts menu to select the user account that you wish to query. The page for the login account will summarise the resources available to that account.

You can also generate reports on your usage over a particular period and examine the details of how many CPUh or GPUh individual jobs on the system cost. To do this, use the Service information menu and select Report generator.

"},{"location":"user-guide/resource_management/#disk-quotas","title":"Disk quotas","text":"

Disk quotas on Cirrus are managed via SAFE.

For live disk usage figures on the Lustre /work file system, use

lfs quota -hu <username> /work\n\nlfs quota -hg <groupname> /work\n
"},{"location":"user-guide/resource_management/#backup-policies","title":"Backup policies","text":"

The /home file system is not backed up.

The /work file system is not backed up.

The solid-state storage /scratch/space1 file system is not backed up.

We strongly advise that you keep copies of any critical data on an alternative system that is fully backed up.

"},{"location":"user-guide/resource_management/#sharing-data-with-other-cirrus-users","title":"Sharing data with other Cirrus users","text":"

How you share data with other Cirrus users depends on whether or not they belong to the same project as you. Each project has two shared folders that can be used for sharing data.

"},{"location":"user-guide/resource_management/#sharing-data-with-cirrus-users-in-your-project","title":"Sharing data with Cirrus users in your project","text":"

Each project has an inner shared folder on the /home and /work filesystems:

/home/[project code]/[project code]/shared\n\n/work/[project code]/[project code]/shared\n

This folder has read/write permissions for all project members. You can place any data you wish to share with other project members in this directory. For example, if your project code is x01 the inner shared folder on the /work file system would be located at /work/x01/x01/shared.

"},{"location":"user-guide/resource_management/#sharing-data-with-all-cirrus-users","title":"Sharing data with all Cirrus users","text":"

Each project also has an outer shared folder on the /home and /work filesystems:

/home/[project code]/shared\n\n/work/[project code]/shared\n

It is writable by all project members and readable by any user on the system. You can place any data you wish to share with other Cirrus users who are not members of your project in this directory. For example, if your project code is x01 the outer shared folder on the /work file system would be located at /work/x01/shared.

"},{"location":"user-guide/resource_management/#file-permissions-and-security","title":"File permissions and security","text":"

You should check the permissions of any files that you place in the shared area, especially if those files were created in your own Cirrus account. Files of the latter type are likely to be readable by you only.

The chmod command below shows how to make sure that a file placed in the outer shared folder is also readable by all Cirrus users.

chmod a+r /work/x01/shared/your-shared-file.txt\n

Similarly, for the inner shared folder, chmod can be called such that read permission is granted to all users within the x01 project.

chmod g+r /work/x01/x01/shared/your-shared-file.txt\n

If you're sharing a set of files stored within a folder hierarchy the chmod is slightly more complicated.

chmod -R a+Xr /work/x01/shared/my-shared-folder\nchmod -R g+Xr /work/x01/x01/shared/my-shared-folder\n

The -R option ensures that the read permission is enabled recursively and the +X guarantees that the user(s) you're sharing the folder with can access the subdirectories below my-shared-folder.
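
The effect of the recursive chmod can be sketched in Python as follows. This is only an approximation under stated assumptions: the function name share_tree and its who argument are hypothetical, and the sketch handles just the two cases used above (a for all users, g for the project group).

```python
import os
import stat

# Read and execute bits to add for each audience (hypothetical helper table).
BITS = {
    'a': (stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH,
          stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH),
    'g': (stat.S_IRGRP, stat.S_IXGRP),
}


def share_tree(root, who='a'):
    # Rough equivalent of chmod -R <who>+Xr <root>.
    read_bits, exec_bits = BITS[who]
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths += [os.path.join(dirpath, name) for name in dirnames + filenames]
    for path in paths:
        mode = stat.S_IMODE(os.stat(path).st_mode)
        new_mode = mode | read_bits
        # Capital X: add execute only for directories and for files that
        # are already executable by someone.
        if os.path.isdir(path) or mode & 0o111:
            new_mode |= exec_bits
        os.chmod(path, new_mode)
```

For example, share_tree('/work/x01/x01/shared/my-shared-folder', 'g') would correspond to the second chmod command above. Ordinary data files gain read permission only, while directories also gain the execute (search) permission needed to enter them.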

Default Unix file permissions can be specified by the umask command. The default umask value on Cirrus is 022, which provides \"group\" and \"other\" read permissions for all files created, and \"group\" and \"other\" read and execute permissions for all directories created. This is highly undesirable, as it allows everyone else on the system to read (but at least not modify or delete) every file you create. Thus it is strongly recommended that users change this default umask behaviour by adding the command umask 077 to their $HOME/.profile file. This umask setting allows only the owner to access any file or directory created. The user can then selectively enable \"group\" and/or \"other\" access to particular files or directories if required.
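
To see how the umask value shapes the permissions of newly created files, the short Python sketch below creates a file in a temporary directory under two different umask settings and reports the resulting permission bits. The function name and the temporary file are illustrative assumptions; the requested creation mode of 666 is the usual default for ordinary files.

```python
import os
import stat
import tempfile


def mode_of_new_file(umask_value):
    # Create a file under the given umask and return its permission bits.
    previous = os.umask(umask_value)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, 'example.txt')
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        # Restore the umask that was in force before the call.
        os.umask(previous)


print(oct(mode_of_new_file(0o022)))  # group and other keep read access
print(oct(mode_of_new_file(0o077)))  # only the owner keeps access
```

The umask removes bits from the requested mode, so 022 strips write access for group and other, whereas 077 strips all group and other access.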

"},{"location":"user-guide/resource_management/#file-types","title":"File types","text":""},{"location":"user-guide/resource_management/#ascii-or-formatted-files","title":"ASCII (or formatted) files","text":"

These are the most portable, but can be extremely inefficient to read and write. There is also the problem that if the formatting is not done correctly, the data may not be output to full precision (or to the subsequently required precision), resulting in inaccurate results when the data is used. Another common problem with formatted files is FORMAT statements that fail to provide an adequate range to accommodate future requirements, e.g. if we wish to output the total number of processors, NPROC, used by the application, the statement:

WRITE (*,'(I3)') NPROC\n

will not work correctly if NPROC is greater than 999.
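
The failure mode can be illustrated with a small Python emulation of the Fortran Iw edit descriptor, which right-justifies an integer in a field of width w and fills the field with asterisks when the value does not fit. The function name fortran_iw is hypothetical and the sketch ignores the other options of the descriptor.

```python
def fortran_iw(value, width):
    # Right-justify in the field, or fill with asterisks on overflow.
    text = str(value)
    if len(text) > width:
        return '*' * width
    return text.rjust(width)


for nproc in (36, 999, 1000):
    print(repr(fortran_iw(nproc, 3)))
```

A wider field, such as I6, leaves room for larger processor counts.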

"},{"location":"user-guide/resource_management/#binary-or-unformatted-files","title":"Binary (or unformatted) files","text":"

These are much faster to read and write, especially if an entire array is read or written with a single READ or WRITE statement. However the files produced may not be readable on other systems.

GNU compiler: the -fconvert=swap compiler option. This compiler option often needs to be used together with a second option, -frecord-marker, which specifies the length of the record marker (extra bytes inserted before or after the actual data in the binary file) for unformatted files generated on a particular system. To read a binary file generated by a big-endian system on Cirrus, use -fconvert=swap -frecord-marker=4. Please note that, because record marker lengths differ, the unformatted files generated by GNU and other compilers on Cirrus are not compatible. In fact, the same WRITE statements would result in slightly larger files with the GNU compiler. It is therefore recommended to use the same compiler for your simulations and related pre- and post-processing jobs.

Other options for file formats include:

Direct access files: Fortran unformatted files with specified record lengths. These may be more portable between different systems than ordinary (i.e. sequential IO) unformatted files, with significantly better performance than formatted (or ASCII) files. The \"endian\" issue will, however, still be a potential problem.

Portable data formats: These machine-independent formats for representing scientific data are specifically designed to enable the same data files to be used on a wide variety of different hardware and operating systems. The most common formats are:

It is important to note that these portable data formats are evolving standards, so make sure you are aware of which version of the standard/software you are using, and keep up-to-date with any backward-compatibility implications of each new release.

"},{"location":"user-guide/resource_management/#file-io-performance-guidelines","title":"File IO Performance Guidelines","text":"

Here are some general guidelines

"},{"location":"user-guide/resource_management/#common-io-patterns","title":"Common I/O patterns","text":"

There are a number of I/O patterns that are frequently used in applications:

"},{"location":"user-guide/resource_management/#single-file-single-writer-serial-io","title":"Single file, single writer (Serial I/O)","text":"

A common approach is to funnel all the I/O through a single master process. Although this has the advantage of producing a single file, the fact that only a single client is doing all the I/O means that it gains little benefit from the parallel file system.

"},{"location":"user-guide/resource_management/#file-per-process-fpp","title":"File-per-process (FPP)","text":"

One of the first parallel strategies people use for I/O is for each parallel process to write to its own file. This is a simple scheme to implement and understand but has the disadvantage that, at the end of the calculation, the data is spread across many different files and may therefore be difficult to use for further analysis without a data reconstruction stage.

"},{"location":"user-guide/resource_management/#single-file-multiple-writers-without-collective-operations","title":"Single file, multiple writers without collective operations","text":"

There are a number of ways to achieve this. For example, many processes can open the same file but access different parts by skipping some initial offset; parallel I/O libraries such as MPI-IO, HDF5 and NetCDF also enable this.

Shared-file I/O has the advantage that all the data is organised correctly in a single file making analysis or restart more straightforward.

The problem is that, with many clients all accessing the same file, there can be a lot of contention for file system resources.
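
The offset-based scheme mentioned above can be sketched in Python. This is a minimal, serial stand-in under stated assumptions: a loop over rank values plays the role of the separate processes, the helper names are hypothetical, and a real application would more likely use MPI-IO, HDF5 or NetCDF.

```python
import os
import tempfile


def write_block(path, rank, block):
    # Each writer skips rank * len(block) bytes and writes only its own region.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.pwrite(fd, block, rank * len(block))
    finally:
        os.close(fd)


def demo(n_writers=4, block_size=8):
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'shared.dat')
        # Written in reverse order to show that the ordering does not matter.
        for rank in reversed(range(n_writers)):
            write_block(path, rank, bytes([rank]) * block_size)
        with open(path, 'rb') as handle:
            return handle.read()


print(demo())
```

Each writer touches a disjoint byte range, so the file contents are the same regardless of the order in which the writers run.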

"},{"location":"user-guide/resource_management/#single-shared-file-with-collective-writes-ssf","title":"Single Shared File with collective writes (SSF)","text":"

The problem with having many clients performing I/O at the same time is that, to prevent them clashing with each other, the I/O library may have to take a conservative approach. For example, a file may be locked while each client is accessing it which means that I/O is effectively serialised and performance may be poor.

However, if I/O is done collectively where the library knows that all clients are doing I/O at the same time, then reads and writes can be explicitly coordinated to avoid clashes. It is only through collective I/O that the full bandwidth of the file system can be realised while accessing a single file.

"},{"location":"user-guide/resource_management/#achieving-efficient-io","title":"Achieving efficient I/O","text":"

This section provides information on getting the best performance out of the /work parallel file system on Cirrus when writing data, particularly using parallel I/O patterns.

You may find that using the solid-state storage gives better performance than /work for some applications and I/O patterns.

"},{"location":"user-guide/resource_management/#lustre","title":"Lustre","text":"

The Cirrus /work file system uses Lustre as a parallel file system technology. The Lustre file system provides POSIX semantics (changes on one node are immediately visible on other nodes) and can support very high data rates for appropriate I/O patterns.

"},{"location":"user-guide/resource_management/#striping","title":"Striping","text":"

One of the main factors leading to the high performance of /work Lustre file systems is the ability to stripe data across multiple Object Storage Targets (OSTs) in a round-robin fashion. Files are striped when the data is split up into chunks that are then stored on different OSTs across the /work file system. Striping might improve the I/O performance because it increases the available bandwidth, since multiple processes can read and write the same files simultaneously. However, striping can also increase the overhead. Choosing the right striping configuration is key to obtaining high performance.

Users have control of a number of striping settings on Lustre file systems. Although these parameters can be set on a per-file basis, they are usually set on the directory where your output files will be written so that all output files inherit the settings.
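
The round-robin layout can be made concrete with a simplified Python model. It assumes a plain striped layout in which consecutive chunks of stripe_size bytes are assigned to the file's OSTs in turn; the function names are hypothetical, and the returned indices refer to positions within the file's own set of OSTs rather than to actual OST identifiers on Cirrus.

```python
def ost_for_offset(offset, stripe_size, stripe_count):
    # Position, within the file's stripe set, of the OST holding this byte.
    return (offset // stripe_size) % stripe_count


def bytes_per_ost(file_size, stripe_size, stripe_count):
    # Total number of bytes that land on each OST for a file of this size.
    totals = [0] * stripe_count
    offset = 0
    while offset < file_size:
        chunk = min(stripe_size, file_size - offset)
        totals[ost_for_offset(offset, stripe_size, stripe_count)] += chunk
        offset += chunk
    return totals


MIB = 1024 * 1024
print(bytes_per_ost(10 * MIB, 1 * MIB, 4))
print(bytes_per_ost(10 * MIB, 1 * MIB, 1))
```

With a stripe count of 1 the whole file sits on a single OST, while a larger stripe count spreads the data, and therefore the I/O load, over several OSTs.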

"},{"location":"user-guide/resource_management/#default-configuration","title":"Default configuration","text":"

The file system on Cirrus has the following default stripe settings:

These settings have been chosen to provide a good compromise for the wide variety of I/O patterns that are seen on the system but are unlikely to be optimal for any one particular scenario. The Lustre command to query the stripe settings for a directory (or file) is lfs getstripe. For example, to query the stripe settings of an already created directory res_dir:

$ lfs getstripe res_dir/\nres_dir\nstripe_count:   1 stripe_size:    1048576 stripe_offset:  -1\n
"},{"location":"user-guide/resource_management/#setting-custom-striping-configurations","title":"Setting Custom Striping Configurations","text":"

Users can set stripe settings for a directory (or file) using the lfs setstripe command. The options for lfs setstripe are:

For example, to set a stripe size of 4 MiB for the existing directory res_dir, along with the maximum stripe count, you would use:

$ lfs setstripe -s 4m -c -1 res_dir/\n
"},{"location":"user-guide/singularity/","title":"Singularity Containers","text":"

This page was originally based on the documentation at the University of Sheffield HPC service.

Designed around the notion of mobility of compute and reproducible science, Singularity enables users to have full control of their operating system environment. This means that a non-privileged user can \"swap out\" the Linux operating system and environment on the host for a Linux OS and environment that they control. So if the host system is running CentOS Linux but your application runs in Ubuntu Linux with a particular software stack, you can create an Ubuntu image, install your software into that image, copy the image to another host (e.g. Cirrus), and run your application on that host in its native Ubuntu environment.

Singularity also allows you to leverage the resources of whatever host you are on. This includes high-speed interconnects (e.g. Infiniband), file systems (e.g. Lustre) and potentially other resources (such as the licensed Intel compilers on Cirrus).

Note

Singularity only supports Linux containers. You cannot create images that use Windows or macOS (this is a restriction of the containerisation model rather than of Singularity).

"},{"location":"user-guide/singularity/#useful-links","title":"Useful Links","text":""},{"location":"user-guide/singularity/#about-singularity-containers-images","title":"About Singularity Containers (Images)","text":"

Similar to Docker, a Singularity container (or, more commonly, image) is a self-contained software stack. As Singularity does not require a root-level daemon to run its images (as is required by Docker) it is suitable for use on a multi-user HPC system such as Cirrus. Within the container/image, you have exactly the same permissions as you do in a standard login session on the system.

In principle, this means that an image created on your local machine with all your research software installed for local development will also run on Cirrus.

Pre-built images (such as those on DockerHub or SingularityHub) can simply be downloaded and used on Cirrus (or anywhere else Singularity is installed); see Using Singularity Images on Cirrus.

Creating and modifying images requires root permission and so must be done on a system where you have such access (in practice, this is usually within a virtual machine on your laptop/workstation); see Creating Your Own Singularity Images.

"},{"location":"user-guide/singularity/#using-singularity-images-on-cirrus","title":"Using Singularity Images on Cirrus","text":"

Singularity images can be used on Cirrus in a number of ways.

  1. Interactively on the login nodes
  2. Interactively on compute nodes
  3. As serial processes within a non-interactive batch script
  4. As parallel processes within a non-interactive batch script

We provide information on each of these scenarios. First, we describe briefly how to get existing images onto Cirrus so that you can use them.

"},{"location":"user-guide/singularity/#getting-existing-images-onto-cirrus","title":"Getting existing images onto Cirrus","text":"

Singularity images are simply files, so if you already have an image file, you can use scp to copy the file to Cirrus as you would with any other file.

If you wish to get a file from one of the container image repositories then Singularity allows you to do this from Cirrus itself.

For example, to retrieve an image from SingularityHub on Cirrus we can simply issue a Singularity command to pull the image.

[user@cirrus-login1 ~]$ module load singularity\n[user@cirrus-login1 ~]$ singularity pull hello-world.sif shub://vsoch/hello-world\n

The image located at the shub URI is written to a Singularity Image File (SIF) called hello-world.sif.

"},{"location":"user-guide/singularity/#interactive-use-on-the-login-nodes","title":"Interactive use on the login nodes","text":"

The container represented by the image file can be run on the login node like so.

[user@cirrus-login1 ~]$ singularity run hello-world.sif \nRaawwWWWWWRRRR!! Avocado!\n[user@cirrus-login1 ~]$\n

We can also shell into the container.

[user@cirrus-login1 ~]$ singularity shell hello-world.sif\nSingularity> ls /\nbin  boot  dev  environment  etc  home  lib  lib64  lustre  media  mnt  opt  proc  rawr.sh  root  run  sbin  singularity  srv  sys  tmp  usr  var\nSingularity> exit\nexit\n[user@cirrus-login1 ~]$\n

For more information see the Singularity documentation.

"},{"location":"user-guide/singularity/#interactive-use-on-the-compute-nodes","title":"Interactive use on the compute nodes","text":"

The process for using an image interactively on the compute nodes is very similar to that for using them on the login nodes. The only difference is that you first have to submit an interactive serial job to get interactive access to the compute node.

First though, move to a suitable location on /work and re-pull the hello-world image. This step is necessary as the compute nodes do not have access to the /home file system.

[user@cirrus-login1 ~]$ cd ${HOME/home/work}\n[user@cirrus-login1 ~]$ singularity pull hello-world.sif shub://vsoch/hello-world\n
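The ${HOME/home/work} expression used above is a bash pattern substitution: it replaces the first occurrence of home in the value of HOME with work. A minimal sketch, using a hypothetical home directory path (/home/t01/t01/auser is an assumption for illustration, not a real account):

```shell
# Hypothetical home directory, used only for illustration.
example_home=/home/t01/t01/auser

# ${var/pattern/replacement} replaces the first match of "home" with "work".
example_work=${example_home/home/work}

echo "${example_work}"   # prints /work/t01/t01/auser
```

Only the first match is replaced, so a later occurrence of home in the path (for example in a username) would be left untouched.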

Now reserve a full node to work on interactively by issuing an salloc command, see below.

[user@cirrus-login1 ~]$ salloc --exclusive --nodes=1 \\\n  --tasks-per-node=36 --cpus-per-task=1 --time=00:20:00 \\\n  --partition=standard --qos=standard --account=[budget code] \nsalloc: Pending job allocation 14507\nsalloc: job 14507 queued and waiting for resources\nsalloc: job 14507 has been allocated resources\nsalloc: Granted job allocation 14507\nsalloc: Waiting for resource configuration\nsalloc: Nodes r1i0n8 are ready for job\n[user@cirrus-login1 ~]$ ssh r1i0n8\n

Note the prompt has changed to show you are on a compute node. Once you are logged in to the compute node (you may need to submit your account password), move to a suitable location on /work as before. You can now use the hello-world image in the same way you did on the login node.

[user@r1i0n8 ~]$ cd ${HOME/home/work}\n[user@r1i0n8 ~]$ singularity shell hello-world.sif\nSingularity> exit\nexit\n[user@r1i0n8 ~]$ exit\nlogout\nConnection to r1i0n8 closed.\n[user@cirrus-login1 ~]$ exit\nexit\nsalloc: Relinquishing job allocation 14507\nsalloc: Job allocation 14507 has been revoked.\n[user@cirrus-login1 ~]$\n

Note we used exit to leave the interactive container shell and then called exit twice more to close the interactive job on the compute node.

"},{"location":"user-guide/singularity/#serial-processes-within-a-non-interactive-batch-script","title":"Serial processes within a non-interactive batch script","text":"

You can also use Singularity images within a non-interactive batch script as you would any other command. If your image contains a runscript then you can use singularity run to execute the runscript in the job. You can also use singularity exec to execute arbitrary commands (or scripts) within the image.

An example job submission script to run a serial job that executes the runscript within the hello-world.sif image we pulled above on Cirrus would be as follows.

#!/bin/bash --login\n\n# job options (name, compute nodes, job time)\n#SBATCH --job-name=hello-world\n#SBATCH --ntasks=1\n#SBATCH --exclusive\n#SBATCH --time=0:20:0\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n\n# Load any required modules\nmodule load singularity\n\n# Run the serial executable\nsrun --cpu-bind=cores singularity run ${HOME/home/work}/hello-world.sif\n
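The script above uses singularity run. As a rough sketch of the singularity exec alternative mentioned earlier, the launch line could instead name a command to run inside the image. The command chosen here (cat /etc/os-release) is only an illustrative assumption, and the guard lets the snippet print the command on machines where srun or singularity is not available.

```shell
# Path to the image on /work, derived as in the script above.
image="${HOME/home/work}/hello-world.sif"

# Command line that runs an arbitrary command inside the container.
# "cat /etc/os-release" is an illustrative choice, not part of the original example.
launch=(srun --cpu-bind=cores singularity exec "${image}" cat /etc/os-release)

if command -v srun >/dev/null 2>&1 && command -v singularity >/dev/null 2>&1; then
    "${launch[@]}"
else
    echo "Would run: ${launch[*]}"
fi
```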

Submit this script using the sbatch command and once the job has finished, you should see RaawwWWWWWRRRR!! Avocado! in the Slurm output file.

"},{"location":"user-guide/singularity/#parallel-processes-within-a-non-interactive-batch-script","title":"Parallel processes within a non-interactive batch script","text":"

Running a Singularity container on the compute nodes isn't too different from launching a normal parallel application. The submission script below shows that the srun command now contains an additional singularity clause.

#!/bin/bash --login\n\n# job options (name, compute nodes, job time)\n#SBATCH --job-name=[name of application]\n#SBATCH --nodes=4\n#SBATCH --tasks-per-node=36\n#SBATCH --cpus-per-task=1\n#SBATCH --exclusive\n#SBATCH --time=0:20:0\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n\n# Load any required modules\nmodule load mpt\nmodule load singularity\n\n# The host bind paths for the Singularity container.\nBIND_ARGS=/mnt/lustre/indy2lfs/sw,/opt/hpe,/etc/libibverbs.d,/path/to/input/files\n\n# The file containing environment variable settings that will allow\n# the container to find libraries on the host, e.g., LD_LIBRARY_PATH.\nENV_PATH=/path/to/container/environment/file\n\n# The Singularity image file on the host.\nCONTAINER_PATH=/path/to/singularity/image/file\n\nAPP_PATH=/path/to/containerized/application/executable\nAPP_PARAMS=[application parameters]\n\nsrun --distribution=block:block --hint=nomultithread \\\n    singularity exec --bind ${BIND_ARGS} --env-file ${ENV_PATH} ${CONTAINER_PATH} \\\n        ${APP_PATH} ${APP_PARAMS}\n

The script above runs a containerized application such that each of the four nodes requested is fully populated. In general, the containerized application's input and output will be read from and written to a location on the host; hence, it is necessary to pass a suitable bind path to singularity (/path/to/input/files).

Note

The paths in the submission script that begin /path/to should be provided by the user. All but one of these paths are host specific. The exception is APP_PATH, which should be given as a path within the container file system.

If the Singularity image file was built according to the Bind model, you will need to specify certain paths (--bind) and environment variables (--env-file) that allow the containerized application to find the required MPI libraries.
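As a hedged sketch of the Bind model, the environment file passed via --env-file could contain shell-style variable settings such as the one written below. The library directory shown is a hypothetical placeholder; the real value depends on where the host MPI libraries live under the bind paths.

```shell
# Write a small environment file to a temporary location (illustration only).
env_file=$(mktemp)

# Hypothetical host MPI library directory; replace with the real location.
host_mpi_lib=/path/to/host/mpi/lib

# The escaped \$ keeps ${LD_LIBRARY_PATH} literal, so it is expanded
# inside the container rather than when this file is written.
cat > "${env_file}" <<EOF
export LD_LIBRARY_PATH=${host_mpi_lib}:\${LD_LIBRARY_PATH}
EOF

cat "${env_file}"
```

The path of the resulting file would then be assigned to ENV_PATH in the submission script.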

Otherwise, if the image follows the Hybrid model and so contains its own MPI implementation, you instead need to be sure that the containerized MPI is compatible with the host MPI, the one loaded in the submission script. In the example above, the host MPI is HPE MPT 2.25, but you could also use OpenMPI (with mpirun), either by loading a suitable openmpi module or by referencing the paths to an OpenMPI installation that was built locally (i.e., within your Cirrus work folder).

"},{"location":"user-guide/singularity/#creating-your-own-singularity-images","title":"Creating Your Own Singularity Images","text":"

You can create Singularity images by importing from DockerHub or Singularity Hub directly to Cirrus. If you wish to create your own custom image then you must install Singularity on a system where you have root (or administrator) privileges - often your own laptop or workstation.

We provide links below to instructions on how to install Singularity locally and then cover what options you need to include in a Singularity definition file in order to create images that can run on Cirrus and access the software development modules. This can be useful if you want to create a custom environment but still want to compile and link against libraries that you only have access to on Cirrus such as the Intel compilers and HPE MPI libraries.

"},{"location":"user-guide/singularity/#installing-singularity-on-your-local-machine","title":"Installing Singularity on Your Local Machine","text":"

You will need Singularity installed on your machine in order to locally run, create and modify images. How you install Singularity on your laptop/workstation depends on the operating system you are using.

If you are using Windows or macOS, the simplest solution is to use Vagrant to give you an easy to use virtual environment with Linux and Singularity installed. The Singularity website has instructions on how to use this method to install Singularity.

If you are using Linux then you can usually install Singularity directly.

"},{"location":"user-guide/singularity/#accessing-cirrus-modules-from-inside-a-container","title":"Accessing Cirrus Modules from Inside a Container","text":"

You may want your custom image to be able to access the modules environment on Cirrus so you can make use of custom software that you cannot access elsewhere. We demonstrate how to do this for a CentOS 7 image but the steps are easily translated for other flavours of Linux.

For the Cirrus modules to be available in your Singularity container you need to ensure that the environment-modules package is installed in your image.

In addition, when you use the container you must start it with a login shell to have access to the module commands.

Below is an example Singularity definition file that builds a CentOS 7 image with access to the TCL modules already installed on Cirrus.

BootStrap: docker\nFrom: centos:centos7\n\n%post\n    yum update -y\n    yum install environment-modules -y\n    echo 'module() { eval `/usr/bin/modulecmd bash $*`; }' >> /etc/bashrc\n    yum install wget -y\n    yum install which -y\n    yum install squashfs-tools -y\n

If we save this definition to a file called centos7.def, we can use the following build command to build the image (remember this command must be run on a system where you have root access, not on Cirrus).

me@my-system:~> sudo singularity build centos7.sif centos7.def\n

The resulting image file (centos7.sif) can then be copied to Cirrus using scp; such an image already exists on Cirrus and can be found in the /mnt/lustre/indy2lfs/sw/singularity/images folder.

When you use that image interactively on Cirrus you must start with a login shell and also bind /mnt/lustre/indy2lfs/sw so that the container can see all the module files, see below.

[user@cirrus-login1 ~]$ module load singularity\n[user@cirrus-login1 ~]$ singularity exec -B /mnt/lustre/indy2lfs/sw \\\n  /mnt/lustre/indy2lfs/sw/singularity/images/centos7.sif \\\n    /bin/bash --login\nSingularity> module avail intel-compilers\n\n--------- /mnt/lustre/indy2lfs/sw/modulefiles -------------\nintel-compilers-18/18.05.274  intel-compilers-19/19.0.0.117\nSingularity> exit\nlogout\n[user@cirrus-login1 ~]$\n
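The same approach can be used non-interactively, for example in a batch script, by passing a command to the login shell with -c. This is a sketch rather than a tested recipe; the command run inside the container (module avail intel-compilers) is taken from the session above, and the guard simply prints the command where singularity is not installed.

```shell
sw_root=/mnt/lustre/indy2lfs/sw
image="${sw_root}/singularity/images/centos7.sif"

# bash --login makes the module function available; -c runs one command and exits.
container_cmd=(singularity exec -B "${sw_root}" "${image}"
               /bin/bash --login -c "module avail intel-compilers")

if command -v singularity >/dev/null 2>&1; then
    "${container_cmd[@]}"
else
    echo "Would run: ${container_cmd[*]}"
fi
```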
"},{"location":"user-guide/singularity/#altering-a-container-on-cirrus","title":"Altering a Container on Cirrus","text":"

A container image file is immutable but it is possible to alter the image if you convert the file to a sandbox. The sandbox is essentially a directory on the host system that contains the full container file hierarchy.

You first run the singularity build command to perform the conversion followed by a shell command with the --writable option. You are now free to change the files inside the container sandbox.

[user@cirrus-login1 ~]$ singularity build --sandbox image.sif.sandbox image.sif\n[user@cirrus-login1 ~]$ singularity shell -B /mnt/lustre/indy2lfs/sw --writable image.sif.sandbox\nSingularity>\n

In the example above, the /mnt/lustre/indy2lfs/sw bind path is specified, allowing you to build code that links to the Cirrus module libraries.

Finally, once you are finished with the sandbox you can exit and convert back to the original image file.

Singularity> exit\nexit\n[user@cirrus-login1 ~]$ singularity build --force image.sif image.sif.sandbox\n

Note

Altering a container in this way will cause the associated definition file to be out of step with the current image. Care should be taken to keep a record of the commands that were run within the sandbox so that the image can be reproduced.

"},{"location":"user-guide/solidstate/","title":"Solid state storage","text":"

In addition to the Lustre file system, the Cirrus login and compute nodes have access to a shared, high-performance, solid state storage system (also known as RPOOL). This storage system is network mounted and shared across the login nodes and GPU compute nodes in a similar way to the normal, spinning-disk Lustre file system but has different performance characteristics.

The solid state storage has a maximum usable capacity of 256 TB which is shared between all users.

"},{"location":"user-guide/solidstate/#backups-quotas-and-data-longevity","title":"Backups, quotas and data longevity","text":"

There are no backups of any data on the solid state storage so you should ensure that you have copies of critical data elsewhere.

In addition, the solid state storage does not currently have any quotas (user or group) enabled so all users are potentially able to access the full 256 TB capacity of the storage system. We ask all users to be considerate in their use of this shared storage system and to delete any data on the solid state storage as soon as it no longer needs to be there.

We monitor the usage of the storage system by users and groups and will potentially remove data that is stopping other users getting fair access to the storage and data that has not been actively used for long periods of time.

"},{"location":"user-guide/solidstate/#accessing-the-solid-state-storage","title":"Accessing the solid-state storage","text":"

You access the solid-state storage at /scratch/space1 on both the login nodes and on the compute nodes.

All users can create directories and add data, so we suggest that you create a directory for your project and/or user to avoid clashes with files and data added by other users. For example, if your project is t01 and your username is auser, you could create a directory with

mkdir -p /scratch/space1/t01/auser\n

When these directories are initially created they will be world-readable. If you do not want users from other projects to be able to see your data, you should change the permissions on your new directory. For example, to restrict the directory so that only other users in your project can read the data you would use:

chmod -R o-rwx /scratch/space1/t01\n
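To see the effect of these commands without touching the shared storage, the following sketch repeats them under a temporary directory that stands in for /scratch/space1 (the temporary location is an assumption made purely for illustration).

```shell
# Temporary directory standing in for /scratch/space1.
base=$(mktemp -d)

# Create the project/user directory, then remove all access for "other" users.
mkdir -p "${base}/t01/auser"
chmod -R o-rwx "${base}/t01"

# Characters 8-10 of the ls -ld mode string are the "other" permissions.
other_perms=$(ls -ld "${base}/t01/auser" | cut -c8-10)
echo "other permissions: ${other_perms}"   # prints: other permissions: ---
```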
"},{"location":"user-guide/solidstate/#copying-data-tofrom-solid-state-storage","title":"Copying data to/from solid-state storage","text":"

You can move data to/from the solid-state storage in a number of different ways:

"},{"location":"user-guide/solidstate/#local-data-transfer","title":"Local data transfer","text":"

The most efficient tool for copying between the Cirrus file systems (/home, /work) and the solid state storage is generally the cp command, e.g.

cp -r /path/to/data-dir /scratch/space1/t01/auser/\n

where /path/to/data-dir should be replaced with the path to the data directory you want to copy (assuming, of course, that you have set up the t01/auser subdirectories as described above).

Note

If you are transferring data from your /work directory, these commands can also be added to job submission scripts running on the compute nodes to move data as part of the job. If you do this, remember to include the data transfer time in the overall walltime for the job.

Data from your /home directory is not available from the compute nodes and must therefore be transferred from a login node.
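A minimal sketch of staging data inside a job script while keeping track of how long the copy takes, so that the transfer time can be budgeted in the walltime. The function name stage_data and the temporary directories are hypothetical; in a real job the source would be on /work and the destination under /scratch/space1.

```shell
# Copy a directory to a destination and report the elapsed time in seconds.
stage_data() {
    local src=$1 dest=$2 start end
    start=$(date +%s)
    mkdir -p "${dest}" && cp -r "${src}" "${dest}/"
    end=$(date +%s)
    echo "Copied ${src} to ${dest} in $((end - start)) s"
}

# Demonstration with temporary directories standing in for /work and /scratch/space1.
src_dir=$(mktemp -d)
dest_dir=$(mktemp -d)
echo "example" > "${src_dir}/input.dat"
stage_data "${src_dir}" "${dest_dir}"
```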

"},{"location":"user-guide/solidstate/#remote-data-transfer","title":"Remote data transfer","text":"

You can transfer data directly to the solid state storage from external locations using scp or rsync in exactly the same way as you would usually transfer data to Cirrus. Simply use the path to the location on the solid state storage in place of the path you would normally use on Cirrus. For example, from an external location (e.g. your laptop), you could use something like:

scp -r data_dir user@login.cirrus.ac.uk:/scratch/space1/t01/auser/\n

You can also use commands such as wget and curl to pull data from external locations directly to the solid state storage.

Note

You cannot transfer data from external locations in job scripts as the Cirrus compute nodes do not have external network access.

"}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Overview","text":""},{"location":"#cirrus","title":"Cirrus","text":"

Cirrus is a HPC and data science service hosted and run by EPCC at The University of Edinburgh. It is one of the EPSRC Tier-2 National HPC Services.

Cirrus is available to industry and academic researchers. For information on how to get access to the system please see the Cirrus website.

The Cirrus facility is based around an SGI ICE XA system. There are 280 standard compute nodes and 38 GPU compute nodes. Each standard compute node has 256 GiB of memory and contains two 2.1 GHz, 18-core Intel Xeon (Broadwell) processors. Each GPU compute node has 384 GiB of memory, contains two 2.4 GHz, 20-core Intel Xeon (Cascade Lake) processors and four NVIDIA Tesla V100-SXM2-16GB (Volta) GPU accelerators connected to the host processors and each other via PCIe. All nodes are connected using a single Infiniband fabric. This documentation covers:

Information on using the SAFE web interface for managing and reporting on your usage on Cirrus can be found on the Tier-2 SAFE Documentation

This documentation draws on the documentation for the ARCHER2 National Supercomputing Service.

"},{"location":"e1000-migration/","title":"Cirrus migration to E1000 system","text":"

There will be a full service maintenance on Tuesday 12th March from 0900 - 1700 GMT to allow for some major changes on the Cirrus service.

Tip

If you need help or have questions on the Cirrus E1000 migration please contact the Cirrus service desk

"},{"location":"e1000-migration/#change-of-authentication-protocol","title":"Change of authentication protocol","text":"

We are changing the authentication protocol on Cirrus from LDAP to FreeIPA.

We expect this change to be transparent to users but you may notice a change from username@cirrus to username@eidf within your SAFE account.

You should be able to connect using your existing Cirrus authentication factors i.e. your ssh key pair and your TOTP token.

If you do experience issues, then please reset your tokens and try to reconnect. If problems persist then please contact the service desk.

Further details on Connecting to Cirrus

"},{"location":"e1000-migration/#new-work-file-system","title":"New /work file system","text":"

We are replacing the existing Lustre /work file system with a new, more performant Lustre file system, E1000.

The old /work file system will be available as read-only and we ask you to copy any files you require onto the new /work file system.

The old read-only file system will be removed on 1st May so please retrieve all required data by then.

For example, for user username in project x01, to copy data from /mnt/lustre/indy2lfs/work/x01/x01/username/directory_to_copy to /work/x01/x01/username/destination_directory you would run:

cp -r /mnt/lustre/indy2lfs/work/x01/x01/username/directory_to_copy \\ /work/x01/x01/username/destination_directory
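Before the old file system is removed, it may be worth checking that a copy is complete. The sketch below uses diff -r to compare the source and destination trees; the temporary directories are stand-ins (assumptions) for the old and new /work paths.

```shell
# Temporary directories standing in for the old and new /work locations.
old_work=$(mktemp -d)
new_work=$(mktemp -d)
mkdir -p "${old_work}/directory_to_copy"
echo "results" > "${old_work}/directory_to_copy/output.dat"

# Copy the directory, then compare the two trees recursively.
cp -r "${old_work}/directory_to_copy" "${new_work}/destination_directory"

if diff -r "${old_work}/directory_to_copy" "${new_work}/destination_directory" >/dev/null; then
    copy_status="identical"
else
    copy_status="different"
fi
echo "Copy check: ${copy_status}"
```

diff -r reads every file in both trees, so for very large data sets the check can take about as long as the copy itself.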

Further details of Data Management and Transfer on Cirrus

Note

Slurm pending jobs: as the underlying pathname for /work will change with the addition of the new file system, all pending work in the Slurm queue will be removed during the migration. When the service is returned, please resubmit your Slurm jobs to Cirrus.

"},{"location":"e1000-migration/#cse-module-updates","title":"CSE Module Updates","text":"

Our Computational Science and Engineering (CSE) Team have taken the opportunity of the arrival of the new file system to update modules and also remove older versions of modules. A full list of the changes to the modules can be found below.

Please contact the service desk if you have concerns about the removal of any of the older modules.

"},{"location":"e1000-migration/#to-be-removed","title":"TO BE REMOVED","text":"Package/module Advice for users altair-hwsolvers/13.0.213 Please contact the service desk if you wish to use Altair Hyperworks. altair-hwsolvers/14.0.210 Please contact the service desk if you wish to use Altair Hyperworks. ansys/18.0 Please contact the service desk if you wish to use ANSYS Fluent. ansys/19.0 Please contact the service desk if you wish to use ANSYS Fluent.

autoconf/2.69

Please use autoconf/2.71

bison/3.6.4

Please use bison/3.8.2

boost/1.67.0

Please use boost/1.84.0

boost/1.73.0

Please use boost/1.84.0

cmake/3.17.3

cmake/3.22.1

Please use cmake/3.25.2

CUnit/2.1.3

Please contact the service desk if you wish to use CUnit.

dolfin/2019.1.0-intel-mpi

dolfin/2019.1.0-mpt

Dolfin is no longer supported and will not be replaced.

eclipse/2020-09

Please contact the service desk if you wish to use Eclipse.

expat/2.2.9

Please use expat/2.6.0

fenics/2019.1.0-intel-mpi

fenics/2019.1.0-mpt

Fenics is no longer supported and will not be replaced.

fftw/3.3.8-gcc8-ompi4

fftw/3.3.8-intel19

fftw/3.3.9-ompi4-cuda11-gcc8\u00a0

fftw/3.3.8-intel18 \u00a0\u00a0

fftw/3.3.9-impi19-gcc8 \u00a0

fftw/3.3.10-intel19-mpt225 \u00a0\u00a0

fftw/3.3.10-ompi4-cuda116-gcc8

Please use one of the following

fftw/3.3.10-gcc10.2-mpt2.25

fftw/3.3.10-gcc10.2-impi20.4

fftw/3.3.10-gcc10.2-ompi4-cuda11.8

fftw/3.3.10-gcc12.3-impi20.4

fftw/3.3.10-intel20.4-impi20.4

flacs/10.9.1

flacs-cfd/20.1

flacs-cfd/20.2

flacs-cfd/21.1

flacs-cfd/21.2

flacs-cfd/22.1

Please contact the helpdesk if you wish to use FLACS.

forge/22.1.3

Please use forge/23.1.1

gcc/6.2.0

Please use gcc/8.2.0 or later

gcc/6.3.0

Please use gcc/8.2.0 or later

gcc/12.2.0-offload

Please use gcc/12.3.0-offload

gdal/2.1.2-gcc

gdal/2.1.2-intel\u00a0

gdal/2.4.4-gcc

Please use gdal/3.6.2-gcc

git/2.21.0

Please use git/2.37.3

gmp/6.2.0-intel\u00a0

gmp/6.2.1-mpt

gmp/6.3.0-mpt

Please use gmp/6.3.0-gcc or gmp/6.3.0-intel\u00a0

gnu-parallel/20200522-gcc6

Please use gnu-parallel/20240122-gcc10

gromacs/2022.1

gromacs/2022.1-gpu

gromacs/2022.3-gpu

Please use one of:

gromacs/2023.4

gromacs/2023.4-gpu

hdf5parallel/1.10.4-intel18-impi18

Please use hdf5parallel/1.14.3-intel20-impi20

hdf5parallel/1.10.6-gcc6-mpt225

Please use hdf5parallel/1.14.3-gcc10-mpt225

hdf5parallel/1.10.6-intel18-mpt225

Please use hdf5parallel/1.14.3-intel20-mpt225

hdf5parallel/1.10.6-intel19-mpt225

Please use hdf5parallel/1.14.3-intel20-mpt225

hdf5serial/1.10.6-intel18

Please use hdf5serial/1.14.3-intel20

horovod/0.25.0

horovod/0.25.0-gpu

horovod/0.26.1-gpu

Please use one of the pytorch or tensorflow modules

htop/3.1.2\u00a0

Please use htop/3.2.1\u00a0

intel 18.0 compilers etc

Please use Intel 19.5 or later; or oneAPI

intel 19.0 compilers etc

Please use Intel 19.5 or later

lammps/23Jun2022_intel19_mpt

lammps/8Feb2023-gcc8-impi

lammps/23Sep2023-gcc8-impi

lammps/8Feb2023-gcc8-impi-cuda118

lammps/23Sep2023-gcc8-impi-cuda118

Please use one of:

lammps/15Dec2023-gcc10.2-impi20.4

lammps-gpu/15Dec2023-gcc10.2-impi20.4-cuda11.8

libxkbcommon/1.0.1

Please contact the service desk if you wish to use libxkbcommon.

libnsl/1.3.0\u00a0

Please contact the helpdesk if you wish to use libnsl.

libpng/1.6.30

This is no longer supported as the central module.

libtirpc/1.2.6

Please contact the helpdesk if you wish to use libtirpc.

libtool/2.4.6

Please use libtool/2.4.7

nco/4.9.3

Please use nco/5.1.9

nco/4.9.7

Please use nco/5.1.9

ncview/2.1.7

Please use ncview/2.1.10

netcdf-parallel/4.6.2-intel18-impi18

Please use netcdf-parallel/4.9.2-intel20-impi20

netcdf-parallel/4.6.2-intel19-mpt225

Please use netcdf-parallel/4.9.2-intel20-mpt225

nvidia/cudnn/8.2.1-cuda-11.6

nvidia/cudnn/8.2.1-cuda-11.6

nvidia/cudnn/8.9.4-cuda-11.6

nvidia/cudnn/8.9.7-cuda-11.6

Please use one of the following

nvidia/cudnn/8.6.0-cuda-11.6

nvidia/cudnn/8.6.0-cuda-11.6

nvidia/nvhpc/22.11-no-gcc

Use nvidia/nvhpc/22.11

nvidia/tensorrt/7.2.3.4

Please use nvidia/tensorrt/8.4.3.1-u2

openfoam/v8.0

Please consider a later version, e.g., v10.0

openfoam/v9.0

Please consider a later version, e.g., v11.0

openfoam/v2006

Please consider a later version, e.g., v2306

openmpi/4.1.2-cuda-11.6

openmpi/4.1.4

openmpi/4.1.4-cuda-11.6

openmpi/4.1.4-cuda-11.6-nvfortran

openmpi/4.1.4-cuda-11.8

openmpi/4.1.4-cuda-11.8-nvfortran

openmpi/4.1.5

openmpi/4.1.5-cuda-11.6

Please use one of the following

openmpi/4.1.6

openmpi/4.1.6-cuda-11.6

openmpi/4.1.6-cuda-11.6-nvfortran

openmpi/4.1.6-cuda-11.8

openmpi/4.1.6-cuda-11.8-nvfortran

petsc/3.13.2-intel-mpi-18

petsc/3.13.2-mpt

Please contact the helpdesk if you require a more recent version of PETSc.

pyfr/1.14.0-gpu

Please use pyfr/1.15.0-gpu

pytorch/1.12.1

pytorch/1.12.1-gpu

Please use one of the following

pytorch/1.13.1

pytorch/1.13.1-gpu

quantum-espresso/6.5-intel-19

Please use QE/6.5-intel-20.4

specfem3d

Please contact the helpdesk if you wish to use SPECFEM3D

starccm+/14.04.013-R8

starccm+/14.06.013-R8 \u2192 2019.3.1-R8

starccm+/15.02.009-R8 \u2192 2020.1.1-R8\u00a0

starccm+/15.04.010-R8 \u2192 2020.2.1-R8\u00a0

starccm+/15.06.008-R8 \u2192 2020.3.1-R8

starccm+/16.02.009 \u2192 2021.1.1

Please contact the helpdesk if you wish to use STAR-CCM+

tensorflow/2.9.1-gpu

tensorflow/2.10.0

tensorflow/2.11.0-gpu

Please use one of the following

tensorflow/2.15.0

tensorflow/2.15.0-gpu

ucx/1.9.0

ucx/1.9.0-cuda-11.6

ucx/1.9.0-cuda-11.8

Please use one of the following

ucx/1.15.0

ucx/1.15.0-cuda-11.6

ucx/1.15.0-cuda-11.8

vasp-5.4.4-intel19-mpt220

zlib/1.2.11

Please use zlib/1.3.1"},{"location":"software-libraries/hdf5/","title":"HDF5","text":"

Serial and parallel versions of HDF5 are available on Cirrus.

Module name  Library version  Compiler  MPI library
hdf5parallel/1.10.4-intel18-impi18  1.10.4  Intel 18  Intel MPI 18
hdf5parallel/1.10.6-intel18-mpt222  1.10.6  Intel 18  HPE MPT 2.22
hdf5parallel/1.10.6-intel19-mpt222  1.10.6  Intel 19  HPE MPT 2.22
hdf5parallel/1.10.6-gcc6-mpt222  1.10.6  GCC 6.3.0  HPE MPT 2.22

Instructions to install a local version of HDF5 can be found on this repository: https://github.com/hpc-uk/build-instructions/tree/main/utils/HDF5

"},{"location":"software-libraries/intel_mkl/","title":"Intel MKL: BLAS, LAPACK, ScaLAPACK","text":"

The Intel Maths Kernel Libraries (MKL) contain a variety of optimised numerical libraries including BLAS, LAPACK, and ScaLAPACK. In general, the exact commands required to build against MKL depend on the details of compiler, environment, requirements for parallelism, and so on. The Intel MKL link line advisor should be consulted.

See https://software.intel.com/content/www/us/en/develop/articles/intel-mkl-link-line-advisor.html

Some examples are given below. Note that loading the appropriate intel tools module will provide the environment variable MKLROOT which holds the location of the various MKL components.

"},{"location":"software-libraries/intel_mkl/#intel-compilers","title":"Intel Compilers","text":""},{"location":"software-libraries/intel_mkl/#blas-and-lapack","title":"BLAS and LAPACK","text":"

To use MKL libraries with the Intel compilers you just need to load the relevant Intel compiler module, and the Intel cmkl module, e.g.:

module load intel-20.4/fc\nmodule load intel-20.4/cmkl\n

To include MKL you specify the -mkl option on your compile and link lines. For example, to compile a simple Fortran program with MKL you could use:

ifort -c -mkl -o lapack_prb.o lapack_prb.f90\nifort -mkl -o lapack_prb.x lapack_prb.o\n

The -mkl flag without any options builds against the threaded version of MKL. If you wish to build against the serial version of MKL, you would use -mkl=sequential.

"},{"location":"software-libraries/intel_mkl/#scalapack","title":"ScaLAPACK","text":"

The distributed memory linear algebra routines in ScaLAPACK require MPI in addition to the compiler and MKL libraries. Here we use Intel MPI via:

module load intel-20.4/fc\nmodule load intel-20.4/mpi\nmodule load intel-20.4/cmkl\n

ScaLAPACK requires the Intel versions of BLACS at link time in addition to ScaLAPACK libraries; remember also to use the MPI versions of the compilers:

mpiifort -c -o linsolve.o linsolve.f90\nmpiifort -o linsolve.x linsolve.o -L${MKLROOT}/lib/intel64 \\\n-lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core \\\n-lmkl_blacs_intelmpi_lp64 -lpthread -lm -ldl\n
"},{"location":"software-libraries/intel_mkl/#gnu-compiler","title":"GNU Compiler","text":""},{"location":"software-libraries/intel_mkl/#blas-and-lapack_1","title":"BLAS and LAPACK","text":"

To use MKL libraries with the GNU compiler you first need to load the GNU compiler module and Intel MKL module, e.g.:

module load gcc\nmodule load intel-20.4/cmkl\n

To include MKL you need to link explicitly against the MKL libraries. For example, to compile a single source file Fortran program with MKL you could use:

gfortran -c -o lapack_prb.o lapack_prb.f90\ngfortran -o lapack_prb.x lapack_prb.o -L$MKLROOT/lib/intel64 \\\n-lmkl_gf_lp64 -lmkl_core -lmkl_sequential\n

This will build against the serial version of MKL; to build against the threaded version use:

gfortran -c -o lapack_prb.o lapack_prb.f90\ngfortran -fopenmp -o lapack_prb.x lapack_prb.o -L$MKLROOT/lib/intel64 \\\n-lmkl_gf_lp64 -lmkl_core -lmkl_gnu_thread\n
"},{"location":"software-libraries/intel_mkl/#scalapack_1","title":"ScaLAPACK","text":"

The distributed memory linear algebra routines in ScaLAPACK require MPI in addition to the MKL libraries. On Cirrus, this is usually provided by SGI MPT.

module load gcc\nmodule load mpt\nmodule load intel-20.4/cmkl\n

Once you have the modules loaded you need to link against two additional libraries to include ScaLAPACK. Note we use here the relevant mkl_blacs_sgimpt_lp64 version of the BLACS library. Remember to use the MPI versions of the compilers:

mpif90 -f90=gfortran -c -o linsolve.o linsolve.f90\nmpif90 -f90=gfortran -o linsolve.x linsolve.o -L${MKLROOT}/lib/intel64 \\\n-lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core \\\n-lmkl_blacs_sgimpt_lp64 -lpthread -lm -ldl\n
"},{"location":"software-libraries/intel_mkl/#ilp-vs-lp-interface-layer","title":"ILP vs LP interface layer","text":"

Many applications will use 32-bit (4-byte) integers. This means the MKL 32-bit integer interface should be selected (which gives the _lp64 extensions seen in the examples above).

For applications which require, e.g., very large array indices (greater than 2^31-1 elements), the 64-bit integer interface is required. In that case the library names carry the _ilp64 suffix instead of _lp64. You may also need to add -DMKL_ILP64 at the compilation stage. Check the Intel link line advisor for specific cases.
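
The choice between the two interface layers amounts to a threshold test on the largest index an application needs. The sketch below is illustrative only: the function name mkl_interface_suffix is hypothetical and is not part of MKL or the Cirrus environment. It compares an element count with 2^31-1 and prints the library-name suffix discussed above.

```shell
#!/bin/bash
# Hypothetical helper (not part of MKL): print the MKL interface suffix
# suited to an array with the given number of elements.
mkl_interface_suffix() {
    local n_elements=$1
    local max_int32=2147483647   # 2^31-1, largest 32-bit signed integer
    if [ $n_elements -gt $max_int32 ]; then
        echo _ilp64
    else
        echo _lp64
    fi
}

mkl_interface_suffix 1000000      # fits in 32-bit integers
mkl_interface_suffix 3000000000   # exceeds 2^31-1
```

The printed suffix would replace _lp64 in the interface, ScaLAPACK and BLACS library names on the link line; the Intel link line advisor remains the authoritative source for the full set of flags.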

"},{"location":"software-packages/Ansys/","title":"ANSYS Fluent","text":"

ANSYS Fluent is a computational fluid dynamics (CFD) tool. Fluent includes well-validated physical modelling capabilities to deliver fast, accurate results across the widest range of CFD and multi-physics applications.

"},{"location":"software-packages/Ansys/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/Ansys/#using-ansys-fluent-on-cirrus","title":"Using ANSYS Fluent on Cirrus","text":"

ANSYS Fluent on Cirrus is only available to researchers who bring their own licence. Other users cannot access the version centrally-installed on Cirrus.

If you have any questions regarding ANSYS Fluent on Cirrus please contact the Cirrus Helpdesk.

"},{"location":"software-packages/Ansys/#running-parallel-ansys-fluent-jobs","title":"Running parallel ANSYS Fluent jobs","text":"

The following batch file starts Fluent in command line mode (no GUI) and runs the Fluent batch file \"inputfile.fl\". One parameter that requires particular attention is \"-t504\". In this example 14 Cirrus nodes (14 * 72 = 1008 cores) are allocated, where half of the 1008 cores are physical and the other half are virtual (HyperThreads). To run Fluent optimally on Cirrus, only the physical cores should be employed. As such, Fluent's -t flag should reflect the number of physical cores: in this example, 14 * 36 = 504, hence \"-t504\".

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=ANSYS_test\n#SBATCH --time=0:20:0\n#SBATCH --exclusive\n#SBATCH --nodes=14\n#SBATCH --tasks-per-node=36\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\nexport HOME=${HOME/home/work}\n\nscontrol show hostnames $SLURM_NODELIST > ~/fluent.launcher.host.txt\n\n# Launch the parallel job\n./fluent 3ddp -g -i inputfile.fl \\\n  -pinfiniband -alnamd64 -t504 -pib    \\\n  -cnf=~/fluent.launcher.host.txt      \\\n  -ssh  >& outputfile.txt\n
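
The value given to the -t flag follows from simple arithmetic on the node count. The following is a minimal sketch, assuming 36 physical cores per standard Cirrus compute node (72 with HyperThreading); the function name fluent_t_value is hypothetical and is not part of Fluent or Slurm.

```shell
#!/bin/bash
# Hypothetical helper: number of physical cores for a given node count,
# assuming 36 physical cores (72 with HyperThreading) per node.
fluent_t_value() {
    local nodes=$1
    local physical_cores_per_node=36
    echo $(( nodes * physical_cores_per_node ))
}

# 14 nodes, as in the example described above
echo -t$(fluent_t_value 14)
```

If you change the number of nodes requested from Slurm, recompute the value in the same way so that -t stays equal to the number of physical cores allocated.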

Below is the Fluent \"inputfile.fl\" batch script. Anything that starts with a \";\" is a comment. This script does the following:

"},{"location":"software-packages/Ansys/#actual-fluent-script-inputfilefl","title":"Actual Fluent script (\"inputfile.fl\"):","text":"

Replace [Your Path To ] before running

; Start transcript\n/file/start-transcript [Your Path To ]/transcript_output_01.txt\n; Read case file\nrc [Your Path To ]/200M-CFD-Benchmark.cas\n; Read data file\n/file/read-data [Your Path To ]/200M-CFD-Benchmark-500.dat\n; Print statistics\n/parallel/bandwidth\n/parallel/latency\n/parallel/timer/usage\n/parallel/timer/reset\n; Calculate 50 iterations\nit 50\n; Print statistics\n/parallel/timer/usage\n/parallel/timer/reset\n; Write data file\nwd [Your Path To ]/200M-CFD-Benchmark-500-new.dat\n; Stop transcript\n/file/stop-transcript\n; Exit Fluent\nexit\nyes\n
"},{"location":"software-packages/MATLAB/","title":"MATLAB","text":"

MATLAB combines a desktop environment tuned for iterative analysis and design processes with a programming language that expresses matrix and array mathematics directly.

"},{"location":"software-packages/MATLAB/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/MATLAB/#using-matlab-on-cirrus","title":"Using MATLAB on Cirrus","text":"

MATLAB R2020b and R2021b are available on Cirrus. R2020b is the current default.

This installation of MATLAB on Cirrus is covered by an Academic License - for use in teaching, academic research, and meeting course requirements at degree granting institutions only. Not for government, commercial, or other organizational use.

If your use of MATLAB is not covered by this license then please do not use this installation. Please contact the Cirrus Helpdesk to arrange use of your own MATLAB license on Cirrus.

Detailed version information:

-----------------------------------------------------------------------------------------------------\nMATLAB Version: 9.9.0.2037887 (R2020b) Update 8\nMATLAB License Number: 904098\nOperating System: Linux 4.18.0-305.25.1.el8_4.x86_64 #1 SMP Mon Oct 18 14:34:11 EDT 2021 x86_64\nJava Version: Java 1.8.0_202-b08 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode\n-----------------------------------------------------------------------------------------------------\nMATLAB                                                Version 9.9         (R2020b)\nSimulink                                              Version 10.2        (R2020b)\nDSP System Toolbox                                    Version 9.11        (R2020b)\nDeep Learning HDL Toolbox                             Version 1.0         (R2020b)\nDeep Learning Toolbox                                 Version 14.1        (R2020b)\nImage Processing Toolbox                              Version 11.2        (R2020b)\nParallel Computing Toolbox                            Version 7.3         (R2020b)\nSignal Processing Toolbox                             Version 8.5         (R2020b)\nStatistics and Machine Learning Toolbox               Version 12.0        (R2020b)\nSymbolic Math Toolbox                                 Version 8.6         (R2020b)\nWavelet Toolbox                                       Version 5.5         (R2020b)\n
"},{"location":"software-packages/MATLAB/#running-matlab-jobs","title":"Running MATLAB jobs","text":"

On Cirrus, MATLAB is intended to be used on the compute nodes within Slurm job scripts. Use on the login nodes should be restricted to setting preferences, accessing help, and launching MDCS jobs. It is recommended that MATLAB is used without a GUI on the compute nodes, as the interactive response is slow.

"},{"location":"software-packages/MATLAB/#running-parallel-matlab-jobs-using-the-local-cluster","title":"Running parallel MATLAB jobs using the local cluster","text":"

The license for this installation of MATLAB provides only 32 workers via MDCS but provides 36 workers via the local cluster profile (there are 36 cores on a Cirrus compute node), so we only recommend the use of MDCS to test the configuration of distributed memory parallel computations for eventual use of your own MDCS license.

The local cluster should be used within a Slurm job script - you submit a job that runs MATLAB and uses the local cluster, which is the compute node that the job is running on.

MATLAB will normally use up to the total number of cores on a node for multi-threaded operations (e.g. matrix inversions) and for parallel computations. It also makes no restriction on its memory use. These features are incompatible with the shared use of nodes on Cirrus. For the local cluster, a wrapper script is provided to limit the number of cores and amount of memory used, in proportion to the number of CPUs selected in the Slurm job script. Please use this wrapper instead of using MATLAB directly.

Say you have a job that requires 3 workers, each running 2 threads. You should therefore request 3x2=6 cores. An example job script for this case would be:

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=Example_MATLAB_Job\n#SBATCH --time=0:20:0\n#SBATCH --nodes=1\n#SBATCH --tasks-per-node=6\n#SBATCH --cpus-per-task=1\n\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\nmodule load matlab\n\nmatlab_wrapper -nodisplay < /mnt/lustre/indy2lfs/sw/cse-matlab/examples/testp.m > testp.log\n

Note, for MATLAB versions R2019 and later, the matlab_wrapper_2019 script may be required (see 2019 section below).

This would run the testp.m script, without a display, and exit when testp.m has finished. 6 CPUs are selected, which correspond to 6 cores, and the following limits would be set initially:

ncores = 6\nmemory = 42GB\n\nMaximum number of computational threads (maxNumCompThreads)          = 6\nPreferred number of workers in a parallel pool (PreferredNumWorkers) = 6\nNumber of workers to start on your local machine (NumWorkers)        = 6\nNumber of computational threads to use on each worker (NumThreads)   = 1\n

The testp.m program sets NumWorkers to 3 and NumThreads to 2:

cirrus_cluster = parcluster('local');\nncores = cirrus_cluster.NumWorkers * cirrus_cluster.NumThreads;\ncirrus_cluster.NumWorkers = 3;\ncirrus_cluster.NumThreads = 2;\nfprintf(\"NumWorkers = %d NumThreads = %d ncores = %d\\n\",cirrus_cluster.NumWorkers,cirrus_cluster.NumThreads,ncores);\nif cirrus_cluster.NumWorkers * cirrus_cluster.NumThreads > ncores\n    disp(\"NumWorkers * NumThreads > ncores\");\n    disp(\"Exiting\");\n    exit(1);\nend\nsaveProfile(cirrus_cluster);\nclear cirrus_cluster;\n\n\nn = 3;\nA = 3000;\n\na=zeros(A,A,n);\nb=1:n;\n\nparpool;\n\ntic\nparfor i = 1:n\n    a(:,:,i) = rand(A);\nend\ntoc\ntic\nparfor i = 1:n\n    b(i) = max(abs(eig(a(:,:,i))));\nend\ntoc\n

Note that PreferredNumWorkers, NumWorkers and NumThreads persist between MATLAB sessions but will be updated correctly if you use the wrapper each time.

NumWorkers and NumThreads can be changed (using parcluster and saveProfile) but NumWorkers * NumThreads should be less than or equal to the number of cores (ncores above). If you wish a worker to run a threaded routine in serial, you must set NumThreads to 1 (the default).

If you specify exclusive node access, then all the cores and memory will be available. On the login nodes, a single core is used and memory is not limited.
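
The constraint that NumWorkers * NumThreads must not exceed the number of cores can be checked before a job is submitted. The following is a minimal sketch only; the function name fits_allocation is hypothetical and is not part of the MATLAB wrapper or of Slurm.

```shell
#!/bin/bash
# Hypothetical pre-submission check: do the requested MATLAB workers and
# threads fit within the cores requested from Slurm?
fits_allocation() {
    local num_workers=$1
    local num_threads=$2
    local ncores=$3
    [ $(( num_workers * num_threads )) -le $ncores ]
}

# The example above: 3 workers with 2 threads each on 6 cores
if fits_allocation 3 2 6; then
    echo 3 workers x 2 threads fit in 6 cores
fi
```

The function returns a non-zero status when the product exceeds the core count, which mirrors the check that testp.m performs inside MATLAB before saving the profile.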

"},{"location":"software-packages/MATLAB/#matlab-2019-versions","title":"MATLAB 2019 versions","text":"

From version R2019 onwards, the MATLAB configuration options changed: the -r flag has been replaced with the -batch flag. To accommodate this, a new job wrapper script is required to run applications. For these versions of MATLAB, if you need to use the -r or -batch flag, replace this line in your Slurm script:

matlab_wrapper -nodisplay -nodesktop -batch \"main_simulated_data_FINAL_clean(\"$ind\",\"$gamma\",\"$rw\",'\"$SLURM_JOB_ID\"')\"\n

with:

matlab_wrapper_2019 -nodisplay -nodesktop -batch \"main_simulated_data_FINAL_clean(\"$ind\",\"$gamma\",\"$rw\",'\"$SLURM_JOB_ID\"')\"\n

and this should allow scripts to run normally.

"},{"location":"software-packages/MATLAB/#running-parallel-matlab-jobs-using-mdcs","title":"Running parallel MATLAB jobs using MDCS","text":"

It is possible to use MATLAB on the login node to set up an MDCS Slurm cluster profile and then launch jobs using that profile. However, this does not give per-job control of the number of cores and walltime; these are set once in the profile.

This MDCS profile can be used in MATLAB on the login node - the MDCS computations are done in Slurm jobs launched using the profile.

"},{"location":"software-packages/MATLAB/#configuration","title":"Configuration","text":"

Start MATLAB on the login node. Configure MATLAB to run parallel jobs on your cluster by calling configCluster. For each cluster, configCluster only needs to be called once per version of MATLAB:

configCluster\n

Jobs will now default to the cluster rather than submit to the local machine (the login node in this case).

"},{"location":"software-packages/MATLAB/#configuring-jobs","title":"Configuring jobs","text":"

Prior to submitting the job, you can specify various parameters to pass to your jobs, such as walltime, e-mail, etc. Other than ProjectCode and WallTime, none of these are required to be set.

NOTE: Any parameters specified using this workflow will be persistent between MATLAB sessions:

% Get a handle to the cluster.\nc = parcluster('cirrus');\n\n% Assign the project code for the job.  **[REQUIRED]**\nc.AdditionalProperties.ProjectCode = 'project-code';\n\n% Specify the walltime (e.g. 5 hours).  **[REQUIRED]**\nc.AdditionalProperties.WallTime = '05:00:00';\n\n% Specify e-mail address to receive notifications about your job.\nc.AdditionalProperties.EmailAddress = 'your_name@your_address';\n\n% Request a specific reservation to run your job.  It is better to\n% use the queues rather than a reservation.\nc.AdditionalProperties.Reservation = 'your-reservation';\n\n% Set the job placement (e.g., pack, excl, scatter:excl).\n% Usually the default of free is what you want.\nc.AdditionalProperties.JobPlacement = 'pack';\n\n% Request to run in a particular queue.  Usually the default (no\n% specific queue requested) will route the job to the correct queue.\nc.AdditionalProperties.QueueName = 'queue-name';\n\n% If you are using GPUs, request up to 4 GPUs per node (this will\n% override a requested queue name and will use the 'gpu' queue).\nc.AdditionalProperties.GpusPerNode = 4;\n

Save changes after modifying AdditionalProperties fields:

c.saveProfile\n

To see the values of the current configuration options, display AdditionalProperties:

c.AdditionalProperties\n

To clear a value, assign the property an empty value ('', [], or false):

% Turn off email notifications.\nc.AdditionalProperties.EmailAddress = '';\n
"},{"location":"software-packages/MATLAB/#interactive-jobs","title":"Interactive jobs","text":"

To run an interactive pool job on the cluster, use parpool as before. configCluster sets NumWorkers to 32 in the cluster to match the number of MDCS workers available in our TAH licence. If you have your own MDCS licence, you can change this by setting c.NumWorkers and saving the profile:

% Open a pool of 32 workers on the cluster.\np = parpool('cirrus',32);\n

Rather than running locally on a single compute node, this pool can run across multiple nodes on the cluster:

% Run a parfor over 1000 iterations.\nparfor idx = 1:1000\n  a(idx) = ...\nend\n

Once you have finished using the pool, delete it:

% Delete the pool\np.delete\n
"},{"location":"software-packages/MATLAB/#serial-jobs","title":"Serial jobs","text":"

Rather than running interactively, use the batch command to submit asynchronous jobs to the cluster. This is generally more useful on Cirrus, which usually has long queues. The batch command will return a job object which is used to access the output of the submitted job. See the MATLAB documentation for more help on batch:

% Get a handle to the cluster.\nc = parcluster('cirrus');\n\n% Submit job to query where MATLAB is running on the cluster.\nj = c.batch(@pwd, 1, {});\n\n% Query job for state.\nj.State\n\n% If state is finished, fetch results.\nj.fetchOutputs{:}\n\n% Delete the job after results are no longer needed.\nj.delete\n

To retrieve a list of currently running or completed jobs, call parcluster to retrieve the cluster object. The cluster object stores an array of jobs that were run, are running, or are queued to run. This allows you to fetch the results of completed jobs. Retrieve and view the list of jobs as shown below:

c = parcluster('cirrus');\njobs = c.Jobs\n

Once you have identified the job you want, you can retrieve the results as you have done previously.

fetchOutputs is used to retrieve function output arguments; if using batch with a script, use load instead. Data that has been written to files on the cluster needs to be retrieved directly from the file system.

To view results of a previously completed job:

% Get a handle on job with ID 2.\nj2 = c.Jobs(2);\n

NOTE: You can view a list of your jobs, as well as their IDs, using the above c.Jobs command:

% Fetch results for job with ID 2.\nj2.fetchOutputs{:}\n\n% If the job produces an error, view the error log file.\nc.getDebugLog(j2.Tasks(1))\n

NOTE: When submitting independent jobs, with multiple tasks, you will have to specify the task number.

"},{"location":"software-packages/MATLAB/#parallel-jobs","title":"Parallel jobs","text":"

Users can also submit parallel workflows with batch. You can use the following example (parallel_example.m) for a parallel job:

function t = parallel_example(iter)\n\n  if nargin==0, iter = 16; end\n\n  disp('Start sim')\n\n  t0 = tic;\n  parfor idx = 1:iter\n    A(idx) = idx;\n    pause(2);\n  end\n  t =toc(t0);\n\n  disp('Sim completed.')\n

Use the batch command again, but since you are running a parallel job, you also specify a MATLAB Pool:

% Get a handle to the cluster.\nc = parcluster('cirrus');\n\n% Submit a batch pool job using 4 workers for 16 simulations.\nj = c.batch(@parallel_example, 1, {}, 'Pool', 4);\n\n% View current job status.\nj.State\n\n% Fetch the results after a finished state is retrieved.\nj.fetchOutputs{:}\n\nans =\n\n8.8872\n

The job ran in 8.89 seconds using 4 workers. Note that these jobs will always request N+1 CPU cores, since one worker is required to manage the batch job and pool of workers. For example, a job that needs eight workers will consume nine CPU cores. With an MDCS licence for 32 workers, you will be able to have a pool of 31 workers.
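
The N+1 rule can be written out as two small calculations. This is a sketch for illustration only; the function names batch_pool_cores and max_pool_size are hypothetical and are not MATLAB or Slurm commands.

```shell
#!/bin/bash
# Hypothetical helpers for the N+1 rule for batch pool jobs.

# CPU cores consumed by a batch job with a pool of the given size
# (one extra worker manages the batch job and the pool).
batch_pool_cores() {
    echo $(( $1 + 1 ))
}

# Largest pool that fits within a licence for the given number of workers.
max_pool_size() {
    echo $(( $1 - 1 ))
}

batch_pool_cores 8    # pool of 8 workers
max_pool_size 32      # licence for 32 workers
```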

Run the same simulation but increase the Pool size. This time, to retrieve the results later, keep track of the job ID.

NOTE: For some applications, there will be a diminishing return when allocating too many workers, as the overhead may exceed computation time.

% Get a handle to the cluster.\nc = parcluster('cirrus');\n\n% Submit a batch pool job using 8 workers for 16 simulations.\nj = c.batch(@parallel_example, 1, {}, 'Pool', 8);\n\n% Get the job ID\nid = j.ID\n\nid =\n\n4\n
% Clear workspace, as though you have quit MATLAB.\nclear j\n

Once you have a handle to the cluster, call the findJob method to search for the job with the specified job ID:

% Get a handle to the cluster.\nc = parcluster('cirrus');\n\n% Find the old job\nj = c.findJob('ID', 4);\n\n% Retrieve the state of the job.\nj.State\n\nans =\n\nfinished\n\n% Fetch the results.\nj.fetchOutputs{:}\n\nans =\n\n4.7270\n\n% If necessary, retrieve an output/error log file.\nc.getDebugLog(j)\n

The job now runs in 4.73 seconds using 8 workers. Run the code with different numbers of workers to determine the ideal number to use.
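
One way to compare the two runs is to compute the speedup from the reported wall times. The sketch below uses integer arithmetic only, with times given in tenths of a millisecond; the function name speedup_percent is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: speedup as a percentage, from two wall times given
# in the same integer unit (here tenths of a millisecond).
speedup_percent() {
    local t_before=$1
    local t_after=$2
    echo $(( t_before * 100 / t_after ))
}

# 8.8872 s with 4 workers, 4.7270 s with 8 workers
speedup_percent 88872 47270
```

A result of 188 means that doubling the pool from 4 to 8 workers made the job about 1.88 times faster, slightly short of the ideal factor of 2. A value that stops growing as workers are added suggests the overhead is starting to dominate.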

Alternatively, to retrieve job results via a graphical user interface, use the Job Monitor (Parallel > Monitor Jobs).

"},{"location":"software-packages/MATLAB/#debugging","title":"Debugging","text":"

If a serial job produces an error, you can call the getDebugLog method to view the error log file:

j.Parent.getDebugLog(j.Tasks(1))\n

When submitting independent jobs, with multiple tasks, you will have to specify the task number. For Pool jobs, do not dereference into the job object:

j.Parent.getDebugLog(j)\n

The scheduler ID can be derived by calling schedID:

schedID(j)\n\nans =\n\n25539\n
"},{"location":"software-packages/MATLAB/#to-learn-more","title":"To learn more","text":"

To learn more about the MATLAB Parallel Computing Toolbox, check out these resources:

"},{"location":"software-packages/MATLAB/#gpus","title":"GPUs","text":"

Calculations using GPUs can be done using the GPU nodes. This can be done using MATLAB within a Slurm job script, similar to using the local cluster, or can be done using the MDCS profile. The GPUs are shared unless you request exclusive access to the node (4 GPUs), so you may find that you share a GPU with another user.

"},{"location":"software-packages/altair_hw/","title":"Altair Hyperworks","text":"

Hyperworks includes best-in-class modeling, linear and nonlinear analyses, structural and system-level optimization, fluid and multi-body dynamics simulation, electromagnetic compatibility (EMC), multiphysics analysis, model-based development, and data management solutions.

"},{"location":"software-packages/altair_hw/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/altair_hw/#using-hyperworks-on-cirrus","title":"Using Hyperworks on Cirrus","text":"

Hyperworks is licensed software, so you require access to a Hyperworks licence to use it. For queries on access to Hyperworks on Cirrus and to enable your access please contact the Cirrus helpdesk.

The standard mode of using Hyperworks on Cirrus is to use the installation of the Desktop application on your local workstation or laptop to set up your model/simulation. Once this has been done you would transfer the required files over to Cirrus using SSH and then launch the appropriate Solver program (OptiStruct, RADIOSS, MotionSolve).

Once the Solver has finished you can transfer the output back to your local system for visualisation and analysis in the Hyperworks Desktop.

"},{"location":"software-packages/altair_hw/#running-serial-hyperworks-jobs","title":"Running serial Hyperworks jobs","text":"

Each of the Hyperworks Solvers can be run in serial on Cirrus in a similar way. You should construct a batch submission script with the command to launch your chosen Solver with the correct command line options.

For example, here is a job script to run a serial RADIOSS job on Cirrus:

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=HW_RADIOSS_test\n#SBATCH --time=0:20:0\n#SBATCH --exclusive\n#SBATCH --nodes=1\n#SBATCH --tasks-per-node=1\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Set the number of threads to the CPUs per task\nexport OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK\n\n# Load Hyperworks module\nmodule load altair-hwsolvers/14.0.210\n\n# Launch the serial job\n#   Using a single task with one thread\n#   srun picks up the distribution from the sbatch options\nsrun --cpu-bind=cores radioss box.fem\n
"},{"location":"software-packages/altair_hw/#running-parallel-hyperworks-jobs","title":"Running parallel Hyperworks jobs","text":"

Only the OptiStruct Solver currently supports parallel execution. OptiStruct supports a number of parallel execution modes of which two can be used on Cirrus:

"},{"location":"software-packages/altair_hw/#optistruct-smp","title":"OptiStruct SMP","text":"

You can use up to 36 physical cores (or 72 virtual cores using HyperThreading) for OptiStruct SMP mode as these are the maximum numbers available on each Cirrus compute node.

You use the -nt option to OptiStruct to specify the number of cores to use.

For example, to run an 18-core OptiStruct SMP calculation you could use the following job script:

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=HW_OptiStruct_SMP\n#SBATCH --time=0:20:0\n#SBATCH --exclusive\n#SBATCH --nodes=1\n#SBATCH --tasks-per-node=1\n#SBATCH --cpus-per-task=36\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load Hyperworks module\nmodule load altair-hwsolvers/14.0.210\n\n# Launch the SMP job\n#   Using one task with 18 threads on a single node\n#   srun picks up the distribution from the sbatch options\nsrun --cpu-bind=cores --ntasks=1 optistruct box.fem -nt 18\n
"},{"location":"software-packages/altair_hw/#optistruct-spmd-mpi","title":"OptiStruct SPMD (MPI)","text":"

There are four different parallelisation schemes for SPMD OptiStruct that are selected by different flags:

You should launch OptiStruct SPMD using the standard Intel MPI mpirun command.

Note: OptiStruct does not support the use of SGI MPT; you must use Intel MPI.

Example OptiStruct SPMD job submission script:

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=HW_OptiStruct_SPMD\n#SBATCH --time=0:20:0\n#SBATCH --exclusive\n#SBATCH --nodes=2\n#SBATCH --tasks-per-node=36\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load Hyperworks module and Intel MPI\nmodule load altair-hwsolvers/14.0.210\nmodule load intel-mpi-17\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\n# Run the OptiStruct SPMD Solver (domain decomposition mode)\n#   Use 72 cores, 36 on each node (i.e. all physical cores)\n#   srun picks up the distribution from the sbatch options\nsrun --ntasks=72 $ALTAIR_HOME/hwsolvers/optistruct/bin/linux64/optistruct_14.0.211_linux64_impi box.fem -ddmmode\n
"},{"location":"software-packages/castep/","title":"CASTEP","text":"

CASTEP is a leading code for calculating the properties of materials from first principles. Using density functional theory, it can simulate a wide range of materials properties including energetics, structure at the atomic level, vibrational properties, electronic response properties etc. In particular it has a wide range of spectroscopic features that link directly to experiment, such as infra-red and Raman spectroscopies, NMR, and core level spectra.

"},{"location":"software-packages/castep/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/castep/#using-castep-on-cirrus","title":"Using CASTEP on Cirrus","text":"

CASTEP is only available to users who have a valid CASTEP licence.

If you have a CASTEP licence and wish to have access to CASTEP on Cirrus please submit a request through the SAFE.

Note

CASTEP versions 19 and above require a separate licence from CASTEP versions 18 and below so these are treated as two separate access requests.

"},{"location":"software-packages/castep/#running-parallel-castep-jobs","title":"Running parallel CASTEP jobs","text":"

CASTEP can exploit multiple nodes on Cirrus and will generally be run in exclusive mode over more than one node.

For example, the following script will run a CASTEP job using 4 nodes (144 cores).

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=CASTEP_Example\n#SBATCH --time=1:0:0\n#SBATCH --exclusive\n#SBATCH --nodes=4\n#SBATCH --tasks-per-node=36\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load CASTEP version 18 module\nmodule load castep/18\n\n# Set OMP_NUM_THREADS=1 to avoid unintentional threading\nexport OMP_NUM_THREADS=1\n\n# Run using input in test_calc.in\nsrun --distribution=block:block castep.mpi test_calc\n
"},{"location":"software-packages/cp2k/","title":"CP2K","text":"

CP2K is a quantum chemistry and solid state physics software package that can perform atomistic simulations of solid state, liquid, molecular, periodic, material, crystal, and biological systems. CP2K provides a general framework for different modelling methods such as DFT using the mixed Gaussian and plane waves approaches GPW and GAPW. Supported theory levels include DFTB, LDA, GGA, MP2, RPA, semi-empirical methods (AM1, PM3, PM6, RM1, MNDO, \u2026), and classical force fields (AMBER, CHARMM, \u2026). CP2K can do simulations of molecular dynamics, metadynamics, Monte Carlo, Ehrenfest dynamics, vibrational analysis, core level spectroscopy, energy minimisation, and transition state optimisation using NEB or dimer method.

"},{"location":"software-packages/cp2k/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/cp2k/#using-cp2k-on-cirrus","title":"Using CP2K on Cirrus","text":"

CP2K is available through the cp2k module. Loading this module provides access to the MPI/OpenMP hybrid cp2k.psmp executable.

To run CP2K after loading this module you should also source the environment setup script that was generated by CP2K's toolchain (see example job script below).

"},{"location":"software-packages/cp2k/#running-parallel-cp2k-jobs-mpiopenmp-hybrid-mode","title":"Running Parallel CP2K Jobs - MPI/OpenMP Hybrid Mode","text":"

To run CP2K using MPI and OpenMP, load the cp2k module and use the cp2k.psmp executable.

For example, the following script will run a CP2K job using 8 nodes, with 2 OpenMP threads per MPI process:

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=CP2K_test\n#SBATCH --time=0:20:0\n#SBATCH --exclusive\n#SBATCH --nodes=8\n#SBATCH --tasks-per-node=18\n#SBATCH --cpus-per-task=2\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load CP2K\nmodule load cp2k\n\n# Source the environment setup script generated by CP2K's install toolchain\nsource $CP2K/tools/toolchain/install/setup\n\n# Set the number of threads to the value specified for --cpus-per-task above\nexport OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK\n\n# Run using input in test.inp\nsrun cp2k.psmp -i test.inp\n
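
The values of --tasks-per-node and --cpus-per-task in the script above are linked: their product should equal the number of physical cores on a node. The following is a minimal sketch, assuming 36 physical cores per standard Cirrus compute node; the function name tasks_per_node is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: MPI tasks per node for a given number of OpenMP
# threads per task, assuming 36 physical cores per node.
tasks_per_node() {
    local threads=$1
    local cores_per_node=36
    echo $(( cores_per_node / threads ))
}

nodes=8
threads=2
tpn=$(tasks_per_node $threads)
echo tasks-per-node=$tpn total-mpi-tasks=$(( nodes * tpn ))
```

Thread counts that divide 36 exactly (for example 1, 2, 3, 4, 6, 9, 12 or 18) leave no cores idle; other values would round down and leave part of each node unused.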
"},{"location":"software-packages/elements/","title":"ELEMENTS","text":"

ELEMENTS is a computational fluid dynamics (CFD) software tool based on the HELYX\u00ae package developed by ENGYS. The software features an advanced open-source CFD simulation engine and a client-server GUI to provide a flexible and cost-effective HPC solver platform for automotive and motorsports design applications, including a dedicated virtual wind tunnel wizard for external vehicle aerodynamics and other proven methods for UHTM, HVAC, aeroacoustics, etc.

"},{"location":"software-packages/elements/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/elements/#using-elements-on-cirrus","title":"Using ELEMENTS on Cirrus","text":"

ELEMENTS is only available on Cirrus to authorised users with a valid license of the software. For any queries regarding ELEMENTS on Cirrus, please contact ENGYS or the Cirrus Helpdesk.

ELEMENTS applications can be run on Cirrus in two ways:

A complete user's guide to access ELEMENTS on demand via Cirrus is provided by ENGYS as part of this service.

"},{"location":"software-packages/elements/#running-elements-jobs-in-parallel","title":"Running ELEMENTS Jobs in Parallel","text":"

The standard execution of ELEMENTS applications on Cirrus is handled through the command line using a submission script to control Slurm. A basic submission script for running multiple ELEMENTS applications in parallel using the SGI-MPT (Message Passing Toolkit) module is included below. In this example the applications helyxHexMesh, caseSetup and helyxAero are run sequentially using 4 nodes (144 cores).

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=Test\n#SBATCH --time=1:00:00\n#SBATCH --exclusive\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=36\n#SBATCH --cpus-per-task=1\n#SBATCH --output=test.out\n\n# Replace t01 below with your budget code\n#SBATCH --account=t01\n\n# Replace standard below with your partition name\n#SBATCH --partition=standard\n\n# Replace commercial below with your QoS name\n#SBATCH --qos=commercial\n\n# Load any required modules\nmodule load gcc\nmodule load mpt\n\n# Load the HELYX-Core environment v3.5.0 (select version as needed, e.g. 3.5.0)\nsource /scratch/sw/elements/v3.5.0/CORE/HELYXcore-3.5.0/platforms/activeBuild.shrc\n\n# Make sure the log directory used by tee below exists\nmkdir -p log\n\n# Launch ELEMENTS applications in parallel\nexport myoptions=\"-parallel\"\njobs=\"helyxHexMesh caseSetup helyxAero\"\n\nfor job in $jobs\ndo\n\n  # Per-application options (every application uses the same options here)\n  case \"$job\" in\n   *                )   options=\"$myoptions\" ;;\n  esac\n\n  srun $job $options 2>&1 | tee log/$job.$SLURM_JOB_ID.out\n\ndone\n

Alternatively, the user can execute most ELEMENTS applications on Cirrus interactively via the GUI by following these simple steps:

  1. Launch ELEMENTS GUI in your local Windows or Linux machine.
  2. Create a client-server connection to Cirrus using the dedicated node provided for this service in the GUI. Enter your Cirrus user login details and the total number of processors to be employed in the cluster for parallel execution.
  3. Use the GUI in the local machine to access the remote file system in Cirrus to load a geometry, create a computational grid, set up a simulation, solve the flow, and post-process the results using the HPC resources available in the cluster. The Slurm scheduling associated with every ELEMENTS job is handled automatically by the client-server.
  4. Visualise the remote data from your local machine, perform changes to the model and complete as many flow simulations in Cirrus as required, all interactively from within the GUI.
  5. Disconnect the client-server at any point during execution, leave a utility or solver running in the cluster, and resume the connection to Cirrus from another client machine to reload an existing case in the GUI when needed.
"},{"location":"software-packages/flacs/","title":"FLACS","text":"

FLACS from Gexcon is the industry standard for CFD explosion modelling and one of the best validated tools for modeling flammable and toxic releases in a technical safety context.

The Cirrus cluster is ideally suited to running multiple FLACS simulations simultaneously via its batch system. Short simulations (typically up to a few hours of computing time each) can be processed efficiently, and you could complete a few hundred in a day or two. In contrast, the Cirrus cluster is not particularly suited to running single large FLACS simulations with many threads: each node on Cirrus has 2x4 memory channels, and for memory-bound applications like FLACS multi-threaded execution will not scale linearly beyond eight cores. On most systems, FLACS will not scale well to more than four cores (cf. the FLACS User's Manual), and therefore multi-core hardware is normally best used by increasing the number of simulations running in parallel rather than by increasing the number of cores per simulation.

Gexcon has two different service offerings on Cirrus: FLACS-Cloud and FLACS-HPC. FLACS-Cloud is the preferable way to exploit the HPC cluster, directly from the FLACS graphical user interfaces. For users who are familiar with accessing remote Linux HPC systems manually, FLACS-HPC may be an option. Both services are presented below.

"},{"location":"software-packages/flacs/#flacs-cloud","title":"FLACS-Cloud","text":"

FLACS-Cloud is a high performance computing service available right from the FLACS-Risk user interface, as well as from the FLACS RunManager. It allows you to run FLACS simulations on the high performance cloud computing infrastructure of Gexcon's partner EPCC straight from the graphical user interfaces of FLACS -- no need to manually log in, transfer data, or start jobs!

By using the FLACS-Cloud service, you can run a large number of simulations very quickly, without having to invest in in-house computing hardware. The FLACS-Cloud service scales to your demand and facilitates running projects with rapid development cycles.

The workflow for using FLACS-Cloud is described in the FLACS User's Manual and in the FLACS-Risk documentation; you can also find basic information in the knowledge base of the FLACS User Portal (accessible for FLACS license holders).

"},{"location":"software-packages/flacs/#flacs-hpc","title":"FLACS-HPC","text":"

Compared to FLACS-Cloud, the FLACS-HPC service is built on more traditional ways of accessing and using a remote Linux cluster. Therefore the user experience is more basic, and FLACS has to be run manually. For an experienced user, however, this way of exploiting the HPC system can be at least as efficient as FLACS-Cloud.

Follow the steps below to use the FLACS-HPC facilities on Cirrus.

Note: The instructions below assume you have a valid account on Cirrus. To get an account please first get in touch with FLACS support at flacs@gexcon.com and then see the instructions in the Tier-2 SAFE Documentation.

Note: In the instructions below you should replace \"username\" with your actual Cirrus username.

"},{"location":"software-packages/flacs/#log-into-cirrus","title":"Log into Cirrus","text":"

Log into Cirrus following the instructions at ../user-guide/connecting.

"},{"location":"software-packages/flacs/#upload-your-data-to-cirrus","title":"Upload your data to Cirrus","text":"

Transfer your data to Cirrus by following the instructions at ../user-guide/data.

For example, to copy the scenario definition files from the current directory to the folder project_folder in your home directory on Cirrus run the following command on your local machine:

rsync -avz c*.dat3 username@cirrus.epcc.ed.ac.uk:project_folder\n

Note that this will preserve soft links as such; the link targets are not copied if they are outside the current directory.

"},{"location":"software-packages/flacs/#flacs-license-manager","title":"FLACS license manager","text":"

A valid license is required to use FLACS, and a license manager is used to check that a license is available. To be able to connect to the license manager from the batch system, users wishing to use FLACS should add the following file as ~/.hasplm/hasp_104628.ini (that is, in their home directory):

; copy this file (vendor is gexcon) to ~/.hasplm/hasp_104628.ini\naggressive = 0\nbroadcastsearch = 0\nserveraddr = cirrus-services1\ndisable_IPv6 = 1\n
"},{"location":"software-packages/flacs/#submit-a-flacs-job-to-the-queue","title":"Submit a FLACS job to the queue","text":"

To run FLACS on Cirrus you must first change to the directory where your FLACS jobs are located, using the cd (change directory) command. For example:

cd projects/sim\n

The usual way to submit work to the queue system is to write a submission script, which would be located in the working directory. This is a standard bash shell script, a simple example of which is given here:

#!/bin/bash --login\n\n#SBATCH --job-name=test_flacs_1\n#SBATCH --ntasks=1\n#SBATCH --cpus-per-task=1\n#SBATCH --time=02:00:00\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load flacs-cfd/21.2\n\nrun_runflacs 012345\n

The script has a series of special comments (introduced by #SBATCH) which give information to the queue system to allow the system to allocate space for the job and to execute the work. These are discussed in more detail below.

The flacs module is loaded to make the application available. Note that you should specify the specific version you require:

module load flacs-cfd/21.2\n

(Use module avail flacs to see which versions are available.) The appropriate FLACS commands can then be executed in the usual way.

Submit your FLACS jobs using the sbatch command, e.g.:

$ sbatch --account=i123 script.sh\nSubmitted batch job 157875\n

The --account=i123 option is obligatory and states that account i123 will be used to record the CPU time consumed by the job, and result in billing to the relevant customer. You will need your project account code here to replace i123. You can check your account details in SAFE.

The name of the submission script here is script.sh. The queue system returns a unique job id (here 157875) to identify the job. For example, the standard output here will appear in a file named slurm-157875.out in the current working directory.
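If a later step needs the job id, it can be pulled out of the confirmation line that sbatch prints. The following is a minimal sketch rather than part of the FLACS tooling: the message is hard-coded from the example above so that the snippet runs without a scheduler, and the variable names are illustrative assumptions.

```bash
# Confirmation line as printed by sbatch in the example above (hard-coded here)
msg='Submitted batch job 157875'

# The job id is the fourth whitespace-separated field of that line
job_id=$(echo $msg | awk '{print $4}')

# Default name of the standard output file for that job
out_file=slurm-$job_id.out
echo $out_file
```

On a real system the first assignment would instead capture the output of the sbatch command itself.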

"},{"location":"software-packages/flacs/#options-for-flacs-jobs","title":"Options for FLACS jobs","text":"

The #SBATCH lines in the script above set various parameters which control execution of the job. The first, --job-name, just provides a label which will be associated with the job.

The parameter --ntasks=1 is the number of tasks or processes involved in the job. For a serial FLACS job you would use --ntasks=1.

The maximum length of time (i.e. wall clock time) you want the job to run is specified with the --time=hh:mm:ss option. After this time, your job will be terminated by the job scheduler. The default time limit is 12 hours. It is useful to have an estimate of how long your job will take to be able to specify the correct limit (which can take some experience). Note that shorter jobs can sometimes be scheduled more quickly by the system.

Multithreaded FLACS simulations can be run on Cirrus with the following job submission, schematically:

#SBATCH --ntasks=1\n#SBATCH --cpus-per-task=4\n...\n\nrun_runflacs -dir projects/sim 010101 NumThreads=4\n

When submitting multithreaded FLACS simulations the --cpus-per-task option should be used so that the queue system allocates the correct resources (here 4 threads running on 4 cores). In addition, you must also specify the number of threads used by the simulation with the NumThreads=4 option to run_runflacs.
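One way to keep the two settings in step is to derive the thread count from the Slurm environment instead of typing it twice. This is a sketch under the assumption that SLURM_CPUS_PER_TASK is set inside the job, as it is when --cpus-per-task is given; the fallback value of 4 exists only so that the snippet runs outside a job, and the command is echoed rather than executed.

```bash
# Thread count taken from Slurm, with a fallback for running outside a job
threads=${SLURM_CPUS_PER_TASK:-4}

# Build the FLACS command line from the example above with a matching NumThreads
flacs_cmd='run_runflacs -dir projects/sim 010101 NumThreads='$threads
echo $flacs_cmd
```

Inside a submission script the echo would be dropped and the command run directly.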

One can also specify the OpenMP version of FLACS explicitly via, e.g.,

export OMP_NUM_THREADS=20\n\nrun_runflacs version _omp <run number> NumThreads=20\n

See the FLACS manual for further details.

"},{"location":"software-packages/flacs/#monitor-your-jobs","title":"Monitor your jobs","text":"

You can monitor the progress of your jobs with the squeue command. This will list all jobs that are running or queued on the system. To list only your jobs use:

squeue -u username\n
"},{"location":"software-packages/flacs/#submitting-many-flacs-jobs-as-a-job-array","title":"Submitting many FLACS jobs as a job array","text":"

Running many related scenarios with the FLACS simulator is ideally suited for using job arrays, i.e. running the simulations as part of a single job.

Note that you must determine the number of scenarios ahead of time. This determines the number of array elements, which must be specified at the point of job submission using the --array argument to sbatch.
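The count can be taken from the scenario files themselves. The sketch below is illustrative only: it creates three hypothetical scenario files in a throwaway directory so that it runs anywhere, and it echoes the submission command instead of calling sbatch.

```bash
# Work in a throwaway directory with three hypothetical scenario files
workdir=$(mktemp -d)
cd $workdir
touch cs000001.dat3 cs000002.dat3 cs000003.dat3

# Count the scenario files with 6-digit identifiers
set -- cs??????.dat3
count=$#

# Build the matching --array argument (elements numbered from 1)
array_arg=--array=1-$count
echo sbatch $array_arg script.sh
```

Note that if no file matches, the pattern is left unexpanded and the count is 1, so a real script should check that at least one scenario file exists first.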

A job script for running a job array with 128 FLACS scenarios that are located in the current directory could look like this:

#!/bin/bash --login\n\n# Recall that the resource specification is per element of the array\n# so this would give 128 instances of one task (with one thread per\n# task --cpus-per-task=1).\n\n#SBATCH --array=1-128\n\n#SBATCH --ntasks=1\n#SBATCH --cpus-per-task=1\n#SBATCH --time=02:00:00\n#SBATCH --account=z04\n\n#SBATCH --partition=standard\n#SBATCH --qos=commercial\n\n# Abbreviate some SLURM variables for brevity/readability\n\nTASK_MIN=${SLURM_ARRAY_TASK_MIN}\nTASK_MAX=${SLURM_ARRAY_TASK_MAX}\nTASK_ID=${SLURM_ARRAY_TASK_ID}\nTASK_COUNT=${SLURM_ARRAY_TASK_COUNT}\n\n# Form a list of relevant files, and check the number of array elements\n# matches the number of cases with 6-digit identifiers.\n\nCS_FILES=(`ls -1 cs??????.dat3`)\n\nif test \"${#CS_FILES[@]}\" -ne \"${TASK_COUNT}\";\nthen\n  printf \"Number of files is:       %s\\n\" \"${#CS_FILES[@]}\"\n  printf \"Number of array tasks is: %s\\n\" \"${TASK_COUNT}\"\n  printf \"Do not match!\\n\"\nfi\n\n# All tasks loop through the entire list to find their specific case.\n\nfor (( jid = $((${TASK_MIN})); jid <= $((${TASK_MAX})); jid++ ));\ndo\n  if test \"${TASK_ID}\" -eq \"${jid}\";\n  then\n    # File list index with offset zero\n    file_id=$((${jid} - ${TASK_MIN}))\n    # Form the substring file_id (recall syntax is :offset:length)\n    my_file=${CS_FILES[${file_id}]}\n    my_file_id=${my_file:2:6}\n  fi\ndone\n\nprintf \"Task %d has file %s id %s\\n\" \"${TASK_ID}\" \"${my_file}\" \"${my_file_id}\"\n\nmodule load flacs-cfd/21.2\n`which run_runflacs` ${my_file_id}\n
"},{"location":"software-packages/flacs/#transfer-data-from-cirrus-to-your-local-system","title":"Transfer data from Cirrus to your local system","text":"

After your simulations are finished, transfer the data back from Cirrus following the instructions at ../user-guide/data.

For example, to copy the result files from the directory project_folder in your home directory on Cirrus to the folder /tmp on your local machine use:

rsync -rvz --include='r[13t]*.*' --exclude='*' username@cirrus.epcc.ed.ac.uk:project_folder/ /tmp\n
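To see which files the include pattern in the command above selects, the same pattern can be tried against a few names in the shell. This is a rough sketch: the file names are hypothetical, and a shell case pattern is used as a stand-in for the rsync filter, which should agree with it for simple names that contain no slashes.

```bash
copied=0
skipped=0

# Hypothetical file names: three result files and two other files
for f in r1010101.dat3 r3010101.dat3 rt010101.dat3 cs010101.dat3 cg010101.dat3; do
  case $f in
    r[13t]*.*) copied=$((copied + 1)) ;;   # matches the include pattern
    *)         skipped=$((skipped + 1)) ;; # falls through to the exclude
  esac
done

echo copied=$copied skipped=$skipped
```

Adding the -n (dry run) option to the rsync command itself is a more direct way to preview a transfer before copying any data.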
"},{"location":"software-packages/flacs/#billing-for-flacs-hpc-use-on-cirrus","title":"Billing for FLACS-HPC use on Cirrus","text":"

CPU time on Cirrus is measured in CPUh for each job run on a compute node, based on the number of physical cores employed. Only jobs submitted to compute nodes via sbatch are charged. Any processing on a login node is not charged. However, using login nodes for computations other than simple pre- or post-processing is strongly discouraged.
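As a rough illustration of the arithmetic, the sketch below estimates an upper bound for the job array example earlier on this page (128 elements, one core each, two hour time limit). It assumes the charge is simply cores multiplied by elapsed hours; the actual figure depends on how long each element really runs.

```bash
cores=1       # --cpus-per-task in the array example
hours=2       # --time limit in the array example
elements=128  # --array=1-128

# Upper bound in CPUh if every element runs to its full time limit
cpuh=$((cores * hours * elements))
echo $cpuh
```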

Gexcon normally bills monthly for the use of FLACS-Cloud and FLACS-HPC, based on the Cirrus CPU usage logging.

"},{"location":"software-packages/flacs/#getting-help","title":"Getting help","text":"

Get in touch with FLACS Support by email to flacs@gexcon.com if you encounter any problems. For specific issues related to Cirrus rather than FLACS contact the Cirrus helpdesk.

"},{"location":"software-packages/gaussian/","title":"Gaussian","text":"

Gaussian is a general-purpose computational chemistry package.

"},{"location":"software-packages/gaussian/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/gaussian/#using-gaussian-on-cirrus","title":"Using Gaussian on Cirrus","text":"

Gaussian on Cirrus is only available to University of Edinburgh researchers through the University's site licence. Users from other institutions cannot access the centrally installed version on Cirrus.

If you wish to have access to Gaussian on Cirrus, please request access via SAFE.

Gaussian cannot run across multiple nodes. This means that the maximum number of cores you can use for Gaussian jobs is 36 (the number of cores on a compute node). In reality, even large Gaussian jobs will not be able to make effective use of more than 8 cores. You should explore the scaling and performance of your calculations on the system before running production jobs.

"},{"location":"software-packages/gaussian/#scratch-directories","title":"Scratch Directories","text":"

You will typically add lines to your job submission script to create a scratch directory on the solid state storage for temporary Gaussian files, e.g.:

export GAUSS_SCRDIR=\"/scratch/space1/x01/auser/$SLURM_JOBID.tmp\"\nmkdir -p $GAUSS_SCRDIR\n

You should also add a line at the end of your job script to remove the scratch directory, e.g.:

rm -r $GAUSS_SCRDIR\n
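A possible variation, not taken from the Cirrus examples, is to register the clean-up with a shell trap so that the scratch directory is also removed if the script stops early because a command fails. In this sketch a throwaway base directory stands in for the real scratch path and 12345 stands in for the job id when run outside Slurm; both are assumptions made so that the snippet runs anywhere.

```bash
# Throwaway base directory standing in for the real scratch path
base=$(mktemp -d)
export GAUSS_SCRDIR=$base/${SLURM_JOBID:-12345}.tmp
mkdir -p $GAUSS_SCRDIR

# Remove the scratch directory whenever the script exits
cleanup() {
  rm -rf $GAUSS_SCRDIR
}
trap cleanup EXIT
```

The Gaussian commands would follow the trap line, and the explicit rm at the end of the script then becomes optional.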
"},{"location":"software-packages/gaussian/#running-serial-gaussian-jobs","title":"Running serial Gaussian jobs","text":"

In many cases you will use Gaussian in serial mode. The following example script will run a serial Gaussian job on Cirrus (before using, ensure you have created a Gaussian scratch directory as outlined above).

#!/bin/bash\n\n# job options (name, compute nodes, job time)\n#SBATCH --job-name=G16_test\n#SBATCH --ntasks=1\n#SBATCH --time=0:20:0\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load Gaussian module\nmodule load gaussian\n\n# Setup the Gaussian environment\nsource $g16root/g16/bsd/g16.profile\n\n# Location of the scratch directory\nexport GAUSS_SCRDIR=\"/scratch/space1/x01/auser/$SLURM_JOBID.tmp\"\nmkdir -p $GAUSS_SCRDIR\n\n# Run using input in \"test0027.com\"\ng16 test0027\n\n# Remove the temporary scratch directory\nrm -r $GAUSS_SCRDIR\n
"},{"location":"software-packages/gaussian/#running-parallel-gaussian-jobs","title":"Running parallel Gaussian jobs","text":"

Gaussian on Cirrus can use shared memory parallelism through OpenMP by setting the OMP_NUM_THREADS environment variable. The number of cores requested in the job should also be modified to match.

For example, the following script will run a Gaussian job using 4 cores.

#!/bin/bash --login\n\n# job options (name, compute nodes, job time)\n#SBATCH --job-name=G16_test\n#SBATCH --ntasks=1\n#SBATCH --cpus-per-task=4\n#SBATCH --time=0:20:0\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load Gaussian module\nmodule load gaussian\n\n# Setup the Gaussian environment\nsource $g16root/g16/bsd/g16.profile\n\n# Location of the scratch directory\nexport GAUSS_SCRDIR=\"/scratch/space1/x01/auser/$SLURM_JOBID.tmp\"\nmkdir -p $GAUSS_SCRDIR\n\n# Run using input in \"test0027.com\"\nexport OMP_NUM_THREADS=4\ng16 test0027\n\n# Remove the temporary scratch directory\nrm -r $GAUSS_SCRDIR\n
"},{"location":"software-packages/gromacs/","title":"GROMACS","text":"

GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.

"},{"location":"software-packages/gromacs/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/gromacs/#using-gromacs-on-cirrus","title":"Using GROMACS on Cirrus","text":"

GROMACS is Open Source software and is freely available to all Cirrus users. A number of versions are available:

"},{"location":"software-packages/gromacs/#running-parallel-gromacs-jobs-pure-mpi","title":"Running parallel GROMACS jobs: pure MPI","text":"

GROMACS can exploit multiple nodes on Cirrus and will generally be run in exclusive mode over more than one node.

For example, the following script will run a GROMACS MD job using 2 nodes (72 cores) with pure MPI.

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=gmx_test\n#SBATCH --nodes=2\n#SBATCH --tasks-per-node=36\n#SBATCH --time=0:25:0\n# Make sure you are not sharing nodes with other users\n#SBATCH --exclusive\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load GROMACS module\nmodule load gromacs\n\n# Run using input in test_calc.tpr\nexport OMP_NUM_THREADS=1 \nsrun gmx_mpi mdrun -s test_calc.tpr\n
"},{"location":"software-packages/gromacs/#running-parallel-gromacs-jobs-hybrid-mpiopenmp","title":"Running parallel GROMACS jobs: hybrid MPI/OpenMP","text":"

The following script will run a GROMACS MD job using 2 nodes (72 cores) with 6 MPI processes per node (12 MPI processes in total) and 6 OpenMP threads per MPI process.

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=gmx_test\n#SBATCH --nodes=2\n#SBATCH --tasks-per-node=6\n#SBATCH --cpus-per-task=6\n#SBATCH --time=0:25:0\n# Make sure you are not sharing nodes with other users\n#SBATCH --exclusive\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load GROMACS and MPI modules\nmodule load gromacs\n\n# Run using input in test_calc.tpr\nexport OMP_NUM_THREADS=6\nsrun gmx_mpi mdrun -s test_calc.tpr\n
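The layout used by the script above can be checked with a little shell arithmetic before submitting. This is a sketch only; the variable names are illustrative, and the figure of 36 cores per standard compute node is taken from the examples on this page.

```bash
nodes=2
tasks_per_node=6
cpus_per_task=6
cores_per_node=36

# Total MPI processes and total cores occupied by their threads
total_tasks=$((nodes * tasks_per_node))
cores_used=$((total_tasks * cpus_per_task))
cores_available=$((nodes * cores_per_node))

echo tasks=$total_tasks cores=$cores_used/$cores_available
```

If cores_used comes out larger than cores_available, the job would oversubscribe the nodes and the task or thread count should be reduced.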
"},{"location":"software-packages/gromacs/#gromacs-gpu-jobs","title":"GROMACS GPU jobs","text":"

The following script will run a GROMACS GPU MD job using 1 node (40 cores and 4 GPUs). The job is set up to run on <MPI task count> MPI processes and <OMP thread count> OMP threads; you will need to replace these placeholders with real values when running your script.

Note

Unlike the base version of GROMACS, the GPU version comes with only MDRUN installed. For any pre- and post-processing, you will need to use the non-GPU version of GROMACS.

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=gmx_test\n#SBATCH --nodes=1\n#SBATCH --time=0:25:0\n#SBATCH --exclusive\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n#SBATCH --gres=gpu:4\n\n# Load GROMACS and MPI modules\nmodule load gromacs/2023.4-gpu\n\n# Run using input in test_calc.tpr\nexport OMP_NUM_THREADS=<OMP thread count>\nsrun --ntasks=<MPI task count> --cpus-per-task=<OMP thread count> \\\n     gmx_mpi mdrun -ntomp <OMP thread count> -s test_calc.tpr\n
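One possible starting point for the two placeholders in the script above, offered as an assumption rather than a recommendation from the GROMACS or Cirrus documentation, is one MPI process per GPU with the 40 cores of the node shared equally between the processes. The sketch only prints the resulting command line; benchmarking is still needed to find the best split for a given system.

```bash
gpus=4
cores_per_node=40

# Assumption: one MPI process per GPU, cores divided equally between them
mpi_tasks=$gpus
omp_threads=$((cores_per_node / mpi_tasks))

echo srun --ntasks=$mpi_tasks --cpus-per-task=$omp_threads gmx_mpi mdrun -ntomp $omp_threads -s test_calc.tpr
```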

Information on how to assign different types of calculation to the CPU or GPU appears in the GROMACS documentation under Getting good performance from mdrun.

"},{"location":"software-packages/helyx/","title":"HELYX\u00ae","text":"

HELYX is a comprehensive, general-purpose, computational fluid dynamics (CFD) software package for engineering analysis and design optimisation developed by ENGYS. The package features an advanced open-source CFD simulation engine and a client-server GUI to provide a flexible and cost-effective HPC solver platform for enterprise applications.

"},{"location":"software-packages/helyx/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/helyx/#using-helyx-on-cirrus","title":"Using HELYX on Cirrus","text":"

HELYX is only available on Cirrus to authorised users with a valid license to use the software. For any queries regarding HELYX on Cirrus, please contact ENGYS or the Cirrus Helpdesk.

HELYX applications can be run on Cirrus in two ways:

A complete user\u2019s guide to access HELYX on demand via Cirrus is provided by ENGYS as part of this service.

"},{"location":"software-packages/helyx/#running-helyx-jobs-in-parallel","title":"Running HELYX Jobs in Parallel","text":"

The standard execution of HELYX applications on Cirrus is handled through the command line using a submission script to control Slurm. A basic submission script for running multiple HELYX applications in parallel using the SGI-MPT (Message Passing Toolkit) module is included below. In this example the applications helyxHexMesh, caseSetup and helyxSolve are run sequentially using 4 nodes (144 cores).

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=Test\n#SBATCH --time=1:00:00\n#SBATCH --exclusive\n#SBATCH --nodes=4\n#SBATCH --ntasks-per-node=36\n#SBATCH --cpus-per-task=1\n#SBATCH --output=test.out\n\n# Replace t01 below with your budget code\n#SBATCH --account=t01\n\n# Replace standard below with your partition name\n#SBATCH --partition=standard\n\n# Replace commercial below with your QoS name\n#SBATCH --qos=commercial\n\n# Load any required modules\nmodule load gcc\nmodule load mpt\n\n# Load the HELYX-Core environment v3.5.0 (select version as needed, e.g. 3.5.0)\nsource /scratch/sw/helyx/v3.5.0/CORE/HELYXcore-3.5.0/platforms/activeBuild.shrc\n\n# Set the number of threads to 1\nexport OMP_NUM_THREADS=1\n\n# Make sure the log directory used by tee below exists\nmkdir -p log\n\n# Launch HELYX applications in parallel\nexport myoptions=\"-parallel\"\njobs=\"helyxHexMesh caseSetup helyxSolve\"\n\nfor job in $jobs\ndo\n\n   # Per-application options (every application uses the same options here)\n   case \"$job\" in\n    *                )   options=\"$myoptions\" ;;\n   esac\n\n   srun $job $options 2>&1 | tee log/$job.$SLURM_JOB_ID.out\n\ndone\n

Alternatively, the user can execute most HELYX applications on Cirrus interactively via the GUI by following these simple steps:

  1. Launch HELYX GUI in your local Windows or Linux machine.
  2. Create a client-server connection to Cirrus using the dedicated node provided for this service in the GUI. Enter your Cirrus user login details and the total number of processors to be employed in the cluster for parallel execution.
  3. Use the GUI in the local machine to access the remote file system in Cirrus to load a geometry, create a computational grid, set up a simulation, solve the flow, and post-process the results using the HPC resources available in the cluster. The Slurm scheduling associated with every HELYX job is handled automatically by the client-server.
  4. Visualise the remote data from your local machine, perform changes to the model and complete as many flow simulations in Cirrus as required, all interactively from within the GUI.
  5. Disconnect the client-server at any point during execution, leave a utility or solver running in the cluster, and resume the connection to Cirrus from another client machine to reload an existing case in the GUI when needed.
"},{"location":"software-packages/lammps/","title":"LAMMPS","text":"

LAMMPS is a classical molecular dynamics code; the name is an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS has potentials for solid-state materials (metals, semiconductors), soft matter (biomolecules, polymers), and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale.

"},{"location":"software-packages/lammps/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/lammps/#using-lammps-on-cirrus","title":"Using LAMMPS on Cirrus","text":"

LAMMPS is Open Source software, and is freely available to all Cirrus users. A number of versions are available:

"},{"location":"software-packages/lammps/#running-parallel-lammps-jobs-mpi","title":"Running parallel LAMMPS jobs (MPI)","text":"

LAMMPS can exploit multiple nodes on Cirrus and will generally be run in exclusive mode over more than one node.

For example, the following script will run a LAMMPS MD job using 4 nodes (144 cores) with pure MPI.

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=lammps_Example\n#SBATCH --time=00:20:00\n#SBATCH --exclusive\n#SBATCH --nodes=4\n#SBATCH --tasks-per-node=36\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load LAMMPS module\nmodule load lammps\n\n# Run using input in in.test\nsrun lmp_mpi < in.test\n
"},{"location":"software-packages/lammps/#running-parallel-lammps-jobs-gpu","title":"Running parallel LAMMPS jobs (GPU)","text":"

LAMMPS can exploit multiple GPUs, although the performance scaling depends heavily on the particular system, so each user should run benchmarks for their particular use case. While not every LAMMPS force field or fix is available for GPU, the vast majority are, and more are added with each new version. Check the LAMMPS documentation for GPU compatibility with a specific command.

For example, the following script will run a LAMMPS MD job using 2 GPUs:

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=lammps_Example\n#SBATCH --time=00:20:00\n#SBATCH --nodes=1\n#SBATCH --gres=gpu:2\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load LAMMPS module\nmodule load lammps-gpu\n\n# Run using input in input.file, writing the log to log.file\nsrun lmp -sf gpu -pk gpu 2 -in input.file -l log.file\n
"},{"location":"software-packages/lammps/#compiling-lammps-on-cirrus","title":"Compiling LAMMPS on Cirrus","text":"

Compile instructions for LAMMPS on Cirrus can be found on GitHub:

"},{"location":"software-packages/molpro/","title":"Molpro","text":"

Molpro is a comprehensive system of ab initio programs for advanced molecular electronic structure calculations, designed and maintained by H.-J. Werner and P. J. Knowles, and containing contributions from many other authors. It comprises efficient and well parallelized programs for standard computational chemistry applications, such as DFT with a large choice of functionals, as well as state-of-the art high-level coupled-cluster and multi-reference wave function methods.

"},{"location":"software-packages/molpro/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/molpro/#using-molpro-on-cirrus","title":"Using Molpro on Cirrus","text":"

In order to use the Molpro binaries on Cirrus you must possess a valid Molpro licence key. Without a key you will be able to access the binaries but will not be able to run any calculations.

"},{"location":"software-packages/molpro/#running","title":"Running","text":"

To run Molpro you need to add the correct module to your environment, specify your licence key using the MOLPRO_KEY environment variable, and specify the location for the temporary files using the TMPDIR environment variable. You can load the default Molpro module with:

module add molpro\n

Once you have loaded the module, the Molpro executables are available in your PATH.

"},{"location":"software-packages/molpro/#example-job-submission-script","title":"Example Job Submission Script","text":"

An example Molpro job submission script is shown below.

#!/bin/bash\n#SBATCH --job-name=molpro_test\n#SBATCH --nodes=1\n#SBATCH --tasks-per-node=36\n#SBATCH --exclusive\n#SBATCH --time=0:15:0\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Replace \"budget\" with your budget code in the line below\n#SBATCH --account=budget\n\n# Load the molpro module\nmodule add molpro\n\n# Specify your Molpro licence key\n#   Replace this with the value of your Molpro licence key\nexport MOLPRO_KEY=\"...your Molpro key...\"\n\n# Make sure temporary files are in your home file space\nexport TMPDIR=$SLURM_SUBMIT_DIR\n\n# Run Molpro using the input my_file.inp\n#    Requested 1 node above = 36 cores\n#    Note use of \"molpro\" command rather than usual \"srun\"\nmolpro -n 36 my_file.inp\n
"},{"location":"software-packages/namd/","title":"NAMD","text":"

NAMD, recipient of a 2002 Gordon Bell Award and a 2012 Sidney Fernbach Award, is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 500,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.

"},{"location":"software-packages/namd/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/namd/#using-namd-on-cirrus","title":"Using NAMD on Cirrus","text":"

NAMD is freely available to all Cirrus users.

"},{"location":"software-packages/namd/#running-parallel-namd-jobs","title":"Running parallel NAMD jobs","text":"

NAMD can exploit multiple nodes on Cirrus and will generally be run in exclusive mode over more than one node.

For example, the following script will run a NAMD MD job across 2 nodes (72 cores) with 2 processes/tasks per node and 18 cores per process, one of which is reserved for communications.

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=NAMD_Example\n#SBATCH --time=01:00:00\n#SBATCH --exclusive\n#SBATCH --nodes=2\n#SBATCH --tasks-per-node=2\n#SBATCH --cpus-per-task=18\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load namd/2.14\n\nsrun namd2 +setcpuaffinity +ppn 17 +pemap 1-17,19-35 +commap 0,18 input.namd\n
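The +ppn, +pemap and +commap values in the script follow from dividing the cores on a node equally between the processes and reserving the first core of each block for communication. The sketch below reproduces that arithmetic; namd_layout is a hypothetical helper, and it assumes the core count divides evenly with at least two cores per process.

```shell
# Hypothetical helper: print NAMD placement flags for a node with $1 cores
# shared equally between $2 processes. The first core of each block is the
# communication thread; the remaining cores are worker threads.
namd_layout() {
    cores_per_node=$1
    tasks_per_node=$2
    block=$((cores_per_node / tasks_per_node))
    pemap=""
    commap=""
    task=0
    while [ "$task" -lt "$tasks_per_node" ]; do
        first=$((task * block))
        last=$((first + block - 1))
        commap="${commap:+$commap,}$first"
        pemap="${pemap:+$pemap,}$((first + 1))-$last"
        task=$((task + 1))
    done
    echo "+ppn $((block - 1)) +pemap $pemap +commap $commap"
}

namd_layout 36 2   # prints: +ppn 17 +pemap 1-17,19-35 +commap 0,18
namd_layout 40 4   # prints: +ppn 9 +pemap 1-9,11-19,21-29,31-39 +commap 0,10,20,30
```

The first call matches the 36-core standard nodes used above; the second shows the same rule applied to a 40-core node running four processes.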

NAMD can also be run without SMP.

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=NAMD_Example\n#SBATCH --time=01:00:00\n#SBATCH --exclusive\n#SBATCH --nodes=2\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nmodule load namd/2.14-nosmp\n\nsrun namd2 +setcpuaffinity input.namd\n

And, finally, there's also a GPU version. The example below uses 8 GPUs across two GPU nodes, running one process per GPU and 9 worker threads per process (+ 1 comms thread).

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=NAMD_Example\n#SBATCH --time=01:00:00\n#SBATCH --nodes=2\n#SBATCH --account=[budget code]\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n#SBATCH --gres=gpu:4\n\nmodule load namd/2022.07.21-gpu\n\nsrun --hint=nomultithread --ntasks=8 --tasks-per-node=4 \\\n    namd2 +ppn 9 +setcpuaffinity +pemap 1-9,11-19,21-29,31-39 +commap 0,10,20,30 \\\n          +devices 0,1,2,3 input.namd\n
"},{"location":"software-packages/openfoam/","title":"OpenFOAM","text":"

OpenFOAM is an open-source toolbox for computational fluid dynamics. OpenFOAM consists of generic tools to simulate complex physics for a variety of fields of interest, from fluid flows involving chemical reactions, turbulence and heat transfer, to solid dynamics, electromagnetism and the pricing of financial options.

The core technology of OpenFOAM is a flexible set of modules written in C++. These are used to build solvers and utilities to perform pre- and post-processing tasks ranging from simple data manipulation to visualisation and mesh processing.

"},{"location":"software-packages/openfoam/#available-versions","title":"Available Versions","text":"

OpenFOAM comes in a number of different flavours. The two main releases are from https://openfoam.org/ and from https://www.openfoam.com/.

You can query which versions of OpenFOAM are currently available on Cirrus from the command line with module avail openfoam.

Versions from https://openfoam.org/ are typically v8 etc, while versions from https://www.openfoam.com/ are typically v2006 (released June 2020).

"},{"location":"software-packages/openfoam/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/openfoam/#using-openfoam-on-cirrus","title":"Using OpenFOAM on Cirrus","text":"

Any batch script which intends to use OpenFOAM should first load the appropriate openfoam module. You then need to source the etc/bashrc file provided by OpenFOAM to set all the relevant environment variables. The relevant command is printed to screen when the module is loaded. For example, for OpenFOAM v8:

module add openfoam/v8.0\nsource ${FOAM_INSTALL_PATH}/etc/bashrc\n

You should then be able to use OpenFOAM in the usual way.

"},{"location":"software-packages/openfoam/#example-batch-submisison","title":"Example Batch Submisison","text":"

The following example batch submission script would run OpenFOAM on two nodes, with 36 MPI tasks per node.

#!/bin/bash\n\n#SBATCH --nodes=2\n#SBATCH --ntasks-per-node=36\n#SBATCH --exclusive\n#SBATCH --time=00:10:00\n\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the openfoam module and source the bashrc file\n\nmodule load openfoam/v8.0\nsource ${FOAM_INSTALL_PATH}/etc/bashrc\n\n# Compose OpenFOAM work in the usual way, except that parallel\n# executables are launched via srun. For example:\n\nsrun interFoam -parallel\n

A SLURM submission script would usually also contain an account token of the form

#SBATCH --account=your_account_here\n

where your_account_here should be replaced by the relevant token for your account. This is available from SAFE with your budget details.

"},{"location":"software-packages/orca/","title":"ORCA","text":"

ORCA is an ab initio quantum chemistry program package that contains modern electronic structure methods including density functional theory, many-body perturbation, coupled cluster, multireference methods, and semi-empirical quantum chemistry methods. Its main field of application is larger molecules, transition metal complexes, and their spectroscopic properties. ORCA is developed in the research group of Frank Neese. The free version is available only for academic use at academic institutions.

"},{"location":"software-packages/orca/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/orca/#using-orca-on-cirrus","title":"Using ORCA on Cirrus","text":"

ORCA is available on Cirrus for academic use only. If you wish to use ORCA for commercial applications, you must contact the ORCA developers.

ORCA cannot use GPUs.

"},{"location":"software-packages/orca/#running-parallel-orca-jobs","title":"Running parallel ORCA jobs","text":"

The following script will run an ORCA job on Cirrus using 4 MPI processes on a single node. Each MPI process will be placed on a separate physical core. It assumes that the input file is h2o_2.inp.

#!/bin/bash\n\n# job options (name, compute nodes, job time)\n#SBATCH --job-name=ORCA_test\n#SBATCH --nodes=1\n#SBATCH --tasks-per-node=4\n\n#SBATCH --time=0:20:0\n\n#SBATCH --account=[budget code]\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load ORCA module\nmodule load orca\n\n# Launch the ORCA calculation\n#   * You must use \"$ORCADIR/orca\" so the application has the full executable path\n#   * Do not use \"srun\" to launch parallel ORCA jobs as they use internal ORCA routines to launch in parallel\n#   * Remember to change the name of the input file to match your file name\n$ORCADIR/orca h2o_2.inp\n

The example input file h2o_2.inp contains:

! DLPNO-CCSD(T) cc-pVTZ cc-pVTZ/C cc-pVTZ/jk rijk verytightscf TightPNO LED\n# Specify number of processors\n%pal\nnprocs 4\nend\n# Specify memory\n%maxcore 12000\n%mdci\nprintlevel 3\nend\n* xyz 0 1\nO 1.327706 0.106852 0.000000\nH 1.612645 -0.413154 0.767232\nH 1.612645 -0.413154 -0.767232\nO -1.550676 -0.120030 -0.000000\nH -0.587091 0.053367 -0.000000\nH -1.954502 0.759303 -0.000000\n*\n%geom\nFragments\n2 {3:5} end\nend\nend\n
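In this input, nprocs should match the number of tasks requested in the job script, and %maxcore is a per-process memory limit in MB, so the total request is nprocs multiplied by %maxcore. The sketch below reads both values from an input file; orca_total_mem_mb is a hypothetical helper and assumes each keyword and its value appear on one line, as in the example.

```shell
# Hypothetical helper: read nprocs and %maxcore from an ORCA input file
# and print the total memory request in MB (nprocs x maxcore).
orca_total_mem_mb() {
    awk '
        tolower($1) == "nprocs"   { nprocs = $2 }
        tolower($1) == "%maxcore" { maxcore = $2 }
        END { print nprocs * maxcore }
    ' "$1"
}

# Usage, assuming h2o_2.inp is in the current directory:
#   orca_total_mem_mb h2o_2.inp
```

For the example input this gives 4 x 12000 = 48000 MB, which fits comfortably within the 256 GiB of a standard compute node.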
"},{"location":"software-packages/qe/","title":"Quantum Espresso (QE)","text":"

Quantum Espresso is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials.

"},{"location":"software-packages/qe/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/qe/#using-qe-on-cirrus","title":"Using QE on Cirrus","text":"

QE is Open Source software and is freely available to all Cirrus users.

"},{"location":"software-packages/qe/#running-parallel-qe-jobs","title":"Running parallel QE jobs","text":"

QE can exploit multiple nodes on Cirrus and will generally be run in exclusive mode over more than one node.

For example, the following script will run a QE pw.x job using 4 nodes (144 cores).

#!/bin/bash\n#\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=pw_test\n#SBATCH --nodes=4\n#SBATCH --tasks-per-node=36\n#SBATCH --time=0:20:0\n# Make sure you are not sharing nodes with other users\n#SBATCH --exclusive\n\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load QE and MPI modules\nmodule load quantum-espresso\n\n# Run using input in test_calc.in\nsrun pw.x -i test_calc.in\n
"},{"location":"software-packages/starccm%2B/","title":"STAR-CCM+","text":"

STAR-CCM+ is a computational fluid dynamics (CFD) code and beyond. It provides a broad range of validated models to simulate disciplines and physics including CFD, computational solid mechanics (CSM), electromagnetics, heat transfer, multiphase flow, particle dynamics, reacting flow, electrochemistry, aero-acoustics and rheology; the simulation of rigid and flexible body motions with techniques including mesh morphing, overset mesh and six degrees of freedom (6DOF) motion; and the ability to combine and account for the interaction between the various physics and motion models in a single simulation to cover your specific application.

"},{"location":"software-packages/starccm%2B/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/starccm%2B/#licensing","title":"Licensing","text":"

All users must provide their own licence for STAR-CCM+. Currently we only support Power on Demand (PoD) licences.

For queries about other types of licence, please contact the Cirrus Helpdesk with the relevant details.

"},{"location":"software-packages/starccm%2B/#using-star-ccm-on-cirrus-interactive-remote-gui-mode","title":"Using STAR-CCM+ on Cirrus: Interactive remote GUI Mode","text":"

A fast and responsive way of running with a GUI is to install STAR-CCM+ on your local Windows (7 or 10) or Linux workstation. You can then start your local STAR-CCM+ and connect to Cirrus in order to submit new jobs or query the status of running jobs.

You will need to set up passwordless SSH connections to Cirrus.

"},{"location":"software-packages/starccm%2B/#jobs-using-power-on-demand-pod-licences","title":"Jobs using Power on Demand (PoD) licences","text":"

You can then start the STAR-CCM+ server on the compute nodes. The following script starts the server:

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=STAR-CCM_test\n#SBATCH --time=0:20:0\n#SBATCH --exclusive\n#SBATCH --nodes=14\n#SBATCH --tasks-per-node=36\n#SBATCH --cpus-per-task=1\n\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the default HPE MPI environment\nmodule load mpt\nmodule load starccm+\n\nexport SGI_MPI_HOME=$MPI_ROOT\nexport PATH=$STARCCM_EXE:$PATH\nexport LM_LICENSE_FILE=48002@192.168.191.10\nexport CDLMD_LICENSE_FILE=48002@192.168.191.10\n\nexport LIBNSL_PATH=/mnt/lustre/indy2lfs/sw/libnsl/1.3.0\n\nscontrol show hostnames $SLURM_NODELIST > ./starccm.launcher.host.$SLURM_JOB_ID.txt\n\nstarccm+ -clientldlibpath ${LIBNSL_PATH}/lib -ldlibpath ${LIBNSL_PATH}/lib \\\n         -power -podkey <PODkey> -licpath ${LM_LICENSE_FILE} \\\n         -server -machinefile ./starccm.launcher.host.$SLURM_JOB_ID.txt \\\n         -np 504 -rsh ssh\n
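The value passed to -np has to equal the number of nodes multiplied by the tasks per node (14 x 36 = 504 here). As a sketch, the value can be derived rather than hard-coded; total_ranks is a hypothetical helper, and it is an assumption that SLURM_JOB_NUM_NODES and SLURM_NTASKS_PER_NODE are both set inside your job.

```shell
# Hypothetical helper: total MPI ranks = nodes x tasks per node.
total_ranks() {
    echo $(( $1 * $2 ))
}

# Inside a Slurm job these values come from the environment. The defaults
# mirror the script above so the lines also run outside a job.
NP=$(total_ranks "${SLURM_JOB_NUM_NODES:-14}" "${SLURM_NTASKS_PER_NODE:-36}")
echo "-np $NP"
```

Deriving the count this way keeps -np consistent if the #SBATCH --nodes line is later changed.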

You should replace \"<PODkey>\" with your PoD licence key.

"},{"location":"software-packages/starccm%2B/#automatically-load-and-start-a-star-ccm-simulation","title":"Automatically load and start a Star-CCM+ simulation","text":"

You can use the \"-batch\" option to automatically load and start a Star-CCM+ simulation.

Your submission script will look like this (the only difference from the previous example is the \"starccm+\" line):

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=STAR-CCM_test\n#SBATCH --time=0:20:0\n#SBATCH --exclusive\n#SBATCH --nodes=14\n#SBATCH --tasks-per-node=36\n#SBATCH --cpus-per-task=1\n\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Load the default HPE MPI environment\nmodule load mpt\nmodule load starccm+\n\nexport SGI_MPI_HOME=$MPI_ROOT\nexport PATH=$STARCCM_EXE:$PATH\nexport LM_LICENSE_FILE=48002@192.168.191.10\nexport CDLMD_LICENSE_FILE=48002@192.168.191.10\n\nexport LIBNSL_PATH=/mnt/lustre/indy2lfs/sw/libnsl/1.3.0\n\nscontrol show hostnames $SLURM_NODELIST > ./starccm.launcher.host.$SLURM_JOB_ID.txt\n\nstarccm+ -clientldlibpath ${LIBNSL_PATH}/lib -ldlibpath ${LIBNSL_PATH}/lib \\\n         -power -podkey <PODkey> -licpath ${LM_LICENSE_FILE} \\\n         -batch simulation.java \\\n         -machinefile ./starccm.launcher.host.$SLURM_JOB_ID.txt \\\n         -np 504 -rsh ssh\n

This script will load the file \"simulation.java\". You can find instructions on how to write a suitable \"simulation.java\" in the Star-CCM+ documentation.

The file \"simulation.java\" must be in the same directory as your Slurm submission script (or you can provide a full path).

"},{"location":"software-packages/starccm%2B/#local-star-ccm-client-configuration","title":"Local Star-CCM+ client configuration","text":"

Start your local STAR-CCM+ application and connect to your server. Click on the File -> \"Connect to Server...\" option and use the following settings:

Select the \"Connect through SSH tunnel\" option and use:

Your local STAR-CCM+ client should now be connected to the remote server. You should be able to run a new simulation or interact with an existing one.

"},{"location":"software-packages/vasp/","title":"VASP","text":"

The Vienna Ab initio Simulation Package (VASP) is a computer program for atomic scale materials modelling, e.g. electronic structure calculations and quantum-mechanical molecular dynamics, from first principles.

VASP computes an approximate solution to the many-body Schr\u00f6dinger equation, either within density functional theory (DFT), solving the Kohn-Sham equations, or within the Hartree-Fock (HF) approximation, solving the Roothaan equations. Hybrid functionals that mix the Hartree-Fock approach with density functional theory are implemented as well. Furthermore, Green's functions methods (GW quasiparticles, and ACFDT-RPA) and many-body perturbation theory (2nd-order M\u00f8ller-Plesset) are available in VASP.

In VASP, central quantities, like the one-electron orbitals, the electronic charge density, and the local potential are expressed in plane wave basis sets. The interactions between the electrons and ions are described using norm-conserving or ultrasoft pseudopotentials, or the projector-augmented-wave method.

To determine the electronic groundstate, VASP makes use of efficient iterative matrix diagonalisation techniques, like the residual minimisation method with direct inversion of the iterative subspace (RMM-DIIS) or blocked Davidson algorithms. These are coupled to highly efficient Broyden and Pulay density mixing schemes to speed up the self-consistency cycle.

"},{"location":"software-packages/vasp/#useful-links","title":"Useful Links","text":""},{"location":"software-packages/vasp/#using-vasp-on-cirrus","title":"Using VASP on Cirrus","text":"

CPU and GPU versions of VASP are available on Cirrus.

VASP is only available to users who have a valid VASP licence. VASP 5 and VASP 6 are separate packages on Cirrus and requests for access need to be made separately for the two versions via SAFE.

If you have a VASP 5 or VASP 6 licence and wish to have access to VASP on Cirrus please request access through SAFE:

Once your access has been enabled, you access the VASP software using the vasp modules in your job submission script. You can see which versions of VASP are currently available on Cirrus with

module avail vasp\n

Once loaded, the executables are called:

All executables include the additional MD algorithms accessed via the MDALGO keyword.

"},{"location":"software-packages/vasp/#running-parallel-vasp-jobs-cpu","title":"Running parallel VASP jobs - CPU","text":"

The CPU version of VASP can exploit multiple nodes on Cirrus and will generally be run in exclusive mode over more than one node.

The following script will run a VASP job using 4 nodes (144 cores).

#!/bin/bash\n\n# job options (name, compute nodes, job time)\n#SBATCH --job-name=VASP_CPU_test\n#SBATCH --nodes=4\n#SBATCH --tasks-per-node=36\n#SBATCH --exclusive\n#SBATCH --time=0:20:0\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# Replace [partition name] below with your partition name (e.g. standard,gpu)\n#SBATCH --partition=[partition name]\n# Replace [qos name] below with your qos name (e.g. standard,long,gpu)\n#SBATCH --qos=[qos name]\n\n# Load VASP version 6 module\nmodule load vasp/6\n\n# Set number of OpenMP threads to 1\nexport OMP_NUM_THREADS=1\n\n# Run standard VASP executable\nsrun --hint=nomultithread --distribution=block:block vasp_std\n
"},{"location":"software-packages/vasp/#running-parallel-vasp-jobs-gpu","title":"Running parallel VASP jobs - GPU","text":"

The GPU version of VASP can exploit multiple GPUs across multiple nodes. You should benchmark your system to ensure you understand how many GPUs can be used in parallel for your calculations.

The following script will run a VASP job using 2 GPUs on 1 node.

#!/bin/bash\n\n# job options (name, compute nodes, job time)\n#SBATCH --job-name=VASP_GPU_test\n#SBATCH --nodes=1\n#SBATCH --gres=gpu:2\n#SBATCH --time=0:20:0\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n\n# Load VASP version 6 module\nmodule load vasp/6/6.3.2-gpu-nvhpc22\n\n# Set number of OpenMP threads to 1\nexport OMP_NUM_THREADS=1\n\n# Run standard VASP executable with 1 MPI process per GPU\nsrun --ntasks=2 --cpus-per-task=10 --hint=nomultithread --distribution=block:block vasp_std\n
"},{"location":"software-tools/ddt/","title":"Debugging using Linaro DDT","text":"

The Linaro Forge tool suite is installed on Cirrus. This includes DDT, which is a debugging tool for scalar, multi-threaded and large-scale parallel applications. To compile your code for debugging you will usually want to specify the -O0 option to turn off all code optimisation (as this can produce a mismatch between source code line numbers and debugging information) and -g to include debugging information in the compiled executable. To use this package you will need to log in to Cirrus with X11-forwarding enabled, load the Linaro Forge module and execute forge:

module load forge\nforge\n
"},{"location":"software-tools/ddt/#debugging-runs-on-the-login-nodes","title":"Debugging runs on the login nodes","text":"

You can execute and debug your MPI code on the login node which is useful for immediate development work with short, small, simple runs to avoid having to wait in the queue. Firstly ensure you have loaded the mpt module and any other dependencies of your code, then start Forge and click Run. Fill in the necessary details of your code under the Application pane, then tick the MPI tick box, specify the number of MPI processes you wish to run and ensure the implementation is set to HPE MPT (2.18+). If this is not set correctly then you can update the configuration by clicking the Change button and selecting this option on the MPI/UPC Implementation field of the system pane. When you are happy with this hit Run to start.

"},{"location":"software-tools/ddt/#debugging-runs-on-the-compute-nodes","title":"Debugging runs on the compute nodes","text":"

This involves DDT submitting your job to the queue, and as soon as the compute nodes start executing you will drop into the debug session and be able to interact with your code. Start Forge and click on Run, then in the Application pane provide the details needed for your code. Then tick the MPI box -- when running on the compute nodes, you must set the MPI implementation to Slurm (generic). You must also tick the Submit to Queue box. Clicking the Configure button in this section, you must now choose the submission template. One is provided for you at /work/y07/shared/cirrus-software/forge/latest/templates/cirrus.qtf which you should copy and modify to suit your needs. You will need to load any modules required for your code and perform any other necessary setup, such as providing extra sbatch options, i.e., whatever is needed for your code to run in a normal batch job.

Note

The current Linaro Forge licence permits use on the Cirrus CPU nodes only. The licence does not permit use of DDT/MAP for codes that run on the Cirrus GPUs.

Back in the DDT run window, you can click on Parameters in the same queue pane to set the partition and QoS to use, the account to which the job should be charged, and the maximum walltime. You can also now look at the MPI pane again and select the number of processes and nodes to use. Finally, clicking Submit will place the job in the queue. A new window will show you the queue until the job starts, at which point you can start to debug.

"},{"location":"software-tools/ddt/#memory-debugging-with-ddt","title":"Memory debugging with DDT","text":"

If you are dynamically linking your code and debugging it on the login node then no extra link step is needed: just ensure that the Preload the memory debugging library option is ticked in the Details pane. If you are dynamically linking but intend to debug on the compute nodes, or are statically linking, then you need to include the compile option -Wl,--allow-multiple-definition and explicitly link your executable with Allinea's memory debugging library. The exact library to link against depends on your code: -ldmalloc (for no threading with C), -ldmallocth (for threading with C), -ldmallocxx (for no threading with C++) or -ldmallocthcxx (for threading with C++). The library locations are all set up when the forge module is loaded so these libraries should be found without further arguments.
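The choice between the four libraries can be written as a small lookup. The sketch below is illustrative only: ddt_malloc_lib is a hypothetical helper, and the compiler and file names in the link line are placeholders.

```shell
# Hypothetical helper: choose the memory debugging library named above
# from the language (c or cxx) and whether the code is threaded (yes or no).
ddt_malloc_lib() {
    case "$1:$2" in
        c:no)    echo "-ldmalloc" ;;
        c:yes)   echo "-ldmallocth" ;;
        cxx:no)  echo "-ldmallocxx" ;;
        cxx:yes) echo "-ldmallocthcxx" ;;
        *)       echo "usage: ddt_malloc_lib c|cxx yes|no" >&2; return 1 ;;
    esac
}

# Example link line for a threaded C code (placeholder compiler and file names)
echo "mpicc -g -O0 -Wl,--allow-multiple-definition main.c $(ddt_malloc_lib c yes)"
```

The echoed line shows where the flags would go; in practice you would add them to your own build system after loading the forge module.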

"},{"location":"software-tools/ddt/#remote-client","title":"Remote Client","text":"

Linaro Forge can connect to remote systems using SSH so you can run the user interface on your desktop or laptop machine without the need for X forwarding. Native remote clients are available for Windows, macOS and Linux. You can download the remote clients from the Linaro Forge website. No licence file is required by a remote client.

Note

The same versions of Linaro Forge must be installed on the local and remote systems in order to use DDT remotely.

To configure the remote client to connect to Cirrus, start it and then click on the Remote Launch drop-down box and click on Configure. In the new window, click Add to create a new login profile. For the hostname you should provide username@login.cirrus.ac.uk where username is your login username. For Remote Installation Directory enter /work/y07/shared/cirrus-software/forge/latest. To ensure your SSH private key can be used to connect, the SSH agent on your local machine should be configured to provide it. You can ensure this by running ssh-add ~/.ssh/id_rsa_cirrus before using the Forge client, where you should replace ~/.ssh/id_rsa_cirrus with the path to the key you normally use to log in to Cirrus. This should persist until your local machine is restarted; only then should you have to re-run ssh-add.

If you only intend to debug jobs on the compute nodes no further configuration is needed. If however you want to use the login nodes, you will likely need to write a short bash script to prepare the same environment you would use if you were running your code interactively on the login node -- otherwise, the necessary libraries will not be found while running. For example, if using MPT, you might create a file in your home directory containing only one line:

module load mpt\n

In your local Forge client you should then edit the Remote Script field in the Cirrus login details to contain the path to this script. When you log in, the script will be sourced and the software provided by whatever modules it loads becomes usable.

When you start the Forge client, you will now be able to select the Cirrus login from the Remote Launch drop-down box. After providing your usual login password the connection to Cirrus will be established and you will be able to start debugging.

You can find more detailed information here.

"},{"location":"software-tools/ddt/#getting-further-help-on-ddt","title":"Getting further help on DDT","text":""},{"location":"software-tools/intel-vtune/","title":"Intel VTune","text":""},{"location":"software-tools/intel-vtune/#profiling-using-vtune","title":"Profiling using VTune","text":"

Intel VTune allows profiling of compiled codes, and is particularly suited to analysing high performance applications involving threads (OpenMP), and MPI (or some combination thereof).

Using VTune is a two-stage process. First, an application is compiled using an appropriate Intel compiler and run in a \"collection\" phase. The results are stored to file, and may then be inspected interactively via the VTune GUI.

"},{"location":"software-tools/intel-vtune/#collection","title":"Collection","text":"

Compile the application in the normal way, and run a batch job in exclusive mode to ensure the node is not shared with other jobs. An example is given below.

Collection of performance data is based on a collect option, which defines which set of hardware counters are monitored in a given run. As not all counters are available at the same time, a number of different collections are available. A different one may be relevant if you are interested in different aspects of performance. Some standard options are:

vtune -collect=performance-snapshot may be used to produce a text summary of performance (typically to standard output), which can be used as a basis for further investigation.

vtune -collect=hotspots produces a more detailed analysis which can be used to inspect time taken per function and per line of code.

vtune -collect=hpc-performance may be useful for HPC codes.

vtune -collect=memory-access will provide figures for memory-related measures including application memory bandwidth.

Use vtune --help collect for a full summary of collection options. Note that not all options are available (e.g., prefer NVIDIA profiling for GPU codes).

"},{"location":"software-tools/intel-vtune/#example-slurm-script","title":"Example SLURM script","text":"

Here we give an example of profiling an application which has been compiled with Intel 20.4 and requests the memory-access collection. We assume the application involves OpenMP threads, but no MPI.

#!/bin/bash\n\n#SBATCH --time=00:10:00\n#SBATCH --nodes=1\n#SBATCH --exclusive\n\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nexport OMP_NUM_THREADS=18\n\n# Load relevant (cf. compile-time) Intel options \nmodule load intel-20.4/compilers\nmodule load intel-20.4/vtune\n\nvtune -collect=memory-access -r results-memory ./my_application\n

Profiling will generate a certain amount of additional text information; this appears on standard output. Detailed profiling data will be stored in various files in a sub-directory, the name of which can be specified using the -r option.

Notes

"},{"location":"software-tools/intel-vtune/#profiling-an-mpi-code","title":"Profiling an MPI code","text":"

Intel VTune can also be used to profile MPI codes. It is recommended that the relevant Intel MPI module is used for compilation. The following example uses Intel 18 with the older amplxe-cl command:

#!/bin/bash\n\n#SBATCH --time=00:10:00\n#SBATCH --nodes=2\n#SBATCH --exclusive\n\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\nexport OMP_NUM_THREADS=18\n\nmodule load intel-mpi-18\nmodule load intel-compilers-18\nmodule load intel-vtune-18\n\nmpirun -np 4 -ppn 2 amplxe-cl -collect hotspots -r vtune-hotspots \\\n       ./my_application\n

Note that the Intel MPI launcher mpirun is used, and this precedes the VTune command. The example runs a total of 4 MPI tasks (-np 4) with two tasks per node (-ppn 2). Each task runs 18 OpenMP threads.
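The thread count follows from the node size: a 36-core standard node shared between two tasks leaves 18 cores per task. A minimal sketch of that arithmetic is shown below; threads_per_task is a hypothetical helper and assumes the cores divide evenly between the tasks.

```shell
# Hypothetical helper: cores available to each MPI task when $1 cores
# per node are divided between $2 tasks per node.
threads_per_task() {
    echo $(( $1 / $2 ))
}

# 36-core standard node, 2 tasks per node (-ppn 2)
OMP_NUM_THREADS=$(threads_per_task 36 2)
export OMP_NUM_THREADS
echo "$OMP_NUM_THREADS"   # prints 18
```

If you change -ppn, recomputing OMP_NUM_THREADS in this way avoids oversubscribing the cores on each node.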

"},{"location":"software-tools/intel-vtune/#viewing-the-results","title":"Viewing the results","text":"

We recommend that the latest version of the VTune GUI is used to view results; this can be run interactively with an appropriate X connection. The latest version is available via

$ module load oneapi\n$ module load vtune/latest\n$ vtune-gui\n

From the GUI, navigate to the appropriate results file to load the analysis. Note that the latest version of VTune will be able to read results generated with previous versions of the Intel compilers.

"},{"location":"software-tools/scalasca/","title":"Profiling using Scalasca","text":"

Scalasca, an open source performance profiling tool, is installed on Cirrus. Two versions are provided, using GCC 8.2.0 and the Intel 19 compilers; both use the HPE MPT library to provide MPI and SHMEM. An important distinction is that the GCC+MPT installation cannot be used to profile Fortran code as MPT does not provide GCC Fortran module files. To profile Fortran code, please use the Intel+MPT installation.

Loading one of the modules will autoload the correct compiler and MPI library:

module load scalasca/2.6-gcc8-mpt225\n

or

module load scalasca/2.6-intel19-mpt225\n

Once loaded, the profiler may be run with the scalasca or scan commands, but the code must first be compiled with the Score-P instrumentation wrapper tool. This is done by prepending the compilation commands with scorep, e.g.:

scorep mpicc main.c -o main\nscorep mpif90 -openmp main.f90 -o main\n

Advanced users may also wish to make use of the Score-P API. This allows you to manually define function and region entry and exit points.

You can then profile the execution during a Slurm job by prepending your srun commands with one of the equivalent commands scalasca -analyze or scan -s:

scalasca -analyze srun ./main\nscan -s srun ./main\n

You will see some output from Scalasca to stdout during the run. Included in that output will be the name of an experiment archive directory, starting with scorep_, which will be created in the working directory. If you want, you can set the name of the directory by exporting the SCOREP_EXPERIMENT_DIRECTORY environment variable in your job script.
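As a sketch of the last point, the archive name can be built from the job name and job ID so that each run writes to its own directory. The naming scheme here is only an illustration, and the fallback values are assumptions that apply when the lines are run outside a Slurm job.

```shell
# Name the experiment archive after the job so that separate runs are easy
# to tell apart. Outside a Slurm job the fallbacks "local" and "0" are used.
SCOREP_EXPERIMENT_DIRECTORY="scorep_${SLURM_JOB_NAME:-local}_${SLURM_JOB_ID:-0}"
export SCOREP_EXPERIMENT_DIRECTORY
echo "$SCOREP_EXPERIMENT_DIRECTORY"
```

These lines would go in the job script before the scalasca -analyze or scan -s command.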

There is an associated GUI called Cube which can be used to process and examine the experiment results, allowing you to understand your code's performance. This has been made available via a Singularity container. To start it, run the command cube followed by the file in the experiment archive directory ending in .cubex (or alternatively the whole archive), as seen below:

cube scorep_exp_1/profile.cubex\n

The Scalasca quick reference guide found here provides a good overview of the toolset's use, from instrumentation and use of the API to analysis with Cube.

"},{"location":"user-guide/batch/","title":"Running Jobs on Cirrus","text":"

As with most HPC services, Cirrus uses a scheduler to manage access to resources and ensure that the thousands of different users of the system are able to share it and all get access to the resources they require. Cirrus uses the Slurm software to schedule jobs.

Writing a submission script is typically the most convenient way to submit your job to the scheduler. Example submission scripts (with explanations) for the most common job types are provided below.

Interactive jobs are also available and can be particularly useful for developing and debugging applications. More details are available below.

Hint

If you have any questions on how to run jobs on Cirrus do not hesitate to contact the Cirrus Service Desk.

You typically interact with Slurm by issuing Slurm commands from the login nodes (to submit, check and cancel jobs), and by specifying Slurm directives that describe the resources required for your jobs in job submission scripts.

"},{"location":"user-guide/batch/#basic-slurm-commands","title":"Basic Slurm commands","text":"

There are four key commands used to interact with Slurm on the command line: sinfo, sbatch, squeue and scancel.

We cover each of these commands in more detail below.

"},{"location":"user-guide/batch/#sinfo-information-on-resources","title":"sinfo: information on resources","text":"

sinfo is used to query information about available resources and partitions. Without any options, sinfo lists the status of all resources and partitions, e.g.

[auser@cirrus-login3 ~]$ sinfo\n\nPARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST\nstandard       up   infinite    280   idle r1i0n[0-35],r1i1n[0-35],r1i2n[0-35],r1i3n[0-35],r1i4n[0-35],r1i5n[0-35],r1i6n[0-35],r1i7n[0-6,9-15,18-24,27-33]\ngpu            up   infinite     36   idle r2i4n[0-8],r2i5n[0-8],r2i6n[0-8],r2i7n[0-8]\n
"},{"location":"user-guide/batch/#sbatch-submitting-jobs","title":"sbatch: submitting jobs","text":"

sbatch is used to submit a job script to the job submission system. The script will typically contain one or more srun commands to launch parallel tasks.

When you submit the job, the scheduler provides the job ID, which is used to identify this job in other Slurm commands and when looking at resource usage in SAFE.

[auser@cirrus-login3 ~]$ sbatch test-job.slurm\nSubmitted batch job 12345\n
"},{"location":"user-guide/batch/#squeue-monitoring-jobs","title":"squeue: monitoring jobs","text":"

squeue without any options or arguments shows the current status of all jobs known to the scheduler. For example:

[auser@cirrus-login3 ~]$ squeue\n          JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)\n          1554  comp-cse CASTEP_a  auser  R       0:03      2 r2i0n[18-19]\n

will list all jobs on Cirrus.

The output of this is often overwhelmingly large. You can restrict the output to just your jobs by adding the -u $USER option:

[auser@cirrus-login3 ~]$ squeue -u $USER\n
"},{"location":"user-guide/batch/#scancel-deleting-jobs","title":"scancel: deleting jobs","text":"

scancel is used to delete a job from the scheduler. If the job is waiting to run it is simply cancelled; if it is running, it is stopped immediately. You need to provide the job ID of the job you wish to cancel/stop. For example:

[auser@cirrus-login3 ~]$ scancel 12345\n

will cancel (if waiting) or stop (if running) the job with ID 12345.

"},{"location":"user-guide/batch/#resource-limits","title":"Resource Limits","text":"

Note

If you have requirements which do not fit within the current QoS, please contact the Service Desk and we can discuss how to accommodate your requirements.

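There are different resource limits on Cirrus for different purposes. There are three different things you need to specify for each job: the amount of primary resource you require, the partition you want to use, and the Quality of Service (QoS) you want to use.

Each of these aspects is described in more detail below.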

The primary resources you request are compute resources: either CPU cores on the standard compute nodes or GPU cards on the GPU compute nodes. Other node resources (memory on the standard compute nodes; memory and CPU cores on the GPU nodes) are assigned pro rata based on the primary resource that you request.

Warning

On Cirrus, you cannot specify the memory for a job using the --mem options to Slurm (e.g. --mem, --mem-per-cpu, --mem-per-gpu). The amount of memory you are assigned is calculated from the amount of primary resource you request.

"},{"location":"user-guide/batch/#primary-resources-on-standard-cpu-compute-nodes","title":"Primary resources on standard (CPU) compute nodes","text":"

The primary resource you request on standard compute nodes is CPU cores. The maximum amount of memory you are allocated is computed as the number of CPU cores you requested multiplied by 1/36th of the total memory available (as there are 36 CPU cores per node). So, if you request the full node (36 cores), then you will be allocated a maximum of all of the memory (256 GB) available on the node; however, if you request 1 core, then you will be assigned a maximum of 256/36 = 7.1 GB of the memory available on the node.
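The pro rata rule above can be sketched as a small shell helper. This is only an illustration of the arithmetic: the function name mem_for_cores is hypothetical, and the figures (256 GB and 36 cores per standard node) are taken from the text rather than queried from the system.

```shell
# Hypothetical helper (not a Cirrus or Slurm command): estimate the maximum
# memory, in GB, assigned to a job requesting a given number of CPU cores on
# a standard node, assuming 256 GB and 36 cores per node as stated above.
mem_for_cores() (
    cores=$1
    awk -v c="$cores" 'BEGIN { printf "%.1f\n", c * 256 / 36 }'
)

mem_for_cores 1     # a single core
mem_for_cores 18    # half a node
mem_for_cores 36    # the full node
```

As noted elsewhere on this page, the memory actually usable will be somewhat lower because part of it is retained for the operating system.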

Note

Using the --exclusive option in jobs will give you access to the full node memory even if you do not explicitly request all of the CPU cores on the node.

Warning

Using the --exclusive option will charge your account for the usage of the entire node, even if you don't request all the cores in your scripts.

Note

You will not generally have access to the full amount of memory resource on the node as some is retained for running the operating system and other system processes.

"},{"location":"user-guide/batch/#primary-resources-on-gpu-nodes","title":"Primary resources on GPU nodes","text":"

The primary resource you request on GPU compute nodes is GPU cards. The maximum amount of memory and CPU cores you are allocated is computed as the number of GPU cards you requested multiplied by 1/4 of the total available (as there are 4 GPU cards per node). So, if you request the full node (4 GPU cards), then you will be allocated a maximum of all of the memory (384 GB) available on the node; however, if you request 1 GPU card, then you will be assigned a maximum of 384/4 = 96 GB of the memory available on the node.
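The same calculation for GPU nodes could look like the sketch below. The function name gpu_share is hypothetical; the figure of 40 CPU cores per GPU node is an assumption derived from the two 20-core processors listed for the gpu partition, and 384 GB is the node memory quoted above.

```shell
# Hypothetical helper (not a Cirrus or Slurm command): estimate the CPU cores
# and memory assigned pro rata for a given number of GPU cards, assuming
# 4 GPUs, 40 CPU cores and 384 GB of memory per GPU node.
gpu_share() (
    gpus=$1
    echo "cores=$(( gpus * 40 / 4 )) mem_gb=$(( gpus * 384 / 4 ))"
)

gpu_share 1    # one GPU card
gpu_share 4    # the full node
```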

Note

Using the --exclusive option in jobs will give you access to all of the CPU cores and the full node memory even if you do not explicitly request all of the GPU cards on the node.

Warning

In order to run jobs on the GPU nodes your budget must have positive GPU hours and core hours associated with it. However, only your GPU hours will be consumed when running these jobs.

Warning

Using the --exclusive option will charge your account for the usage of the entire node, i.e., 4 GPUs, even if you don't request all the GPUs in your submission script.

"},{"location":"user-guide/batch/#partitions","title":"Partitions","text":"

On Cirrus, compute nodes are grouped into partitions. You will have to specify a partition using the --partition option in your submission script. The following table has a list of active partitions on Cirrus.

| Partition | Description | Total nodes available | Notes |
|---|---|---|---|
| standard | CPU nodes with 2x 18-core Intel Broadwell processors | 352 | |
| gpu | GPU nodes with 4x Nvidia V100 GPU and 2x 20-core Intel Cascade Lake processors | 36 | |

Cirrus Partitions

You can list the active partitions using

sinfo\n

Note

You may not have access to all the available partitions.

"},{"location":"user-guide/batch/#quality-of-service-qos","title":"Quality of Service (QoS)","text":"

On Cirrus, Quality of Service (QoS) is used alongside partitions to set resource limits. The following table has a list of active QoS on Cirrus.

| QoS Name | Jobs Running Per User | Jobs Queued Per User | Max Walltime | Max Size | Applies to Partitions | Notes |
|---|---|---|---|---|---|---|
| standard | No limit | 500 jobs | 4 days | 88 nodes (3168 cores/25%) | standard | |
| largescale | 1 job | 4 jobs | 24 hours | 228 nodes (8192+ cores/65%) or 144 GPUs | standard, gpu | |
| long | 5 jobs | 20 jobs | 14 days | 16 nodes or 8 GPUs | standard, gpu | |
| highpriority | 10 jobs | 20 jobs | 4 days | 140 nodes | standard | charged at 1.5 x normal rate |
| gpu | No limit | 128 jobs | 4 days | 64 GPUs (16 nodes/40%) | gpu | |
| short | 1 job | 2 jobs | 20 minutes | 2 nodes or 4 GPUs | standard, gpu | |
| lowpriority | No limit | 100 jobs | 2 days | 36 nodes (1296 cores/10%) or 16 GPUs | standard, gpu | usage is not charged |
"},{"location":"user-guide/batch/#cirrus-qos","title":"Cirrus QoS","text":"

You can find out the QoS that you can use by running the following command:

sacctmgr show assoc user=$USER cluster=cirrus format=cluster,account,user,qos%50\n
"},{"location":"user-guide/batch/#troubleshooting","title":"Troubleshooting","text":""},{"location":"user-guide/batch/#slurm-error-handling","title":"Slurm error handling","text":""},{"location":"user-guide/batch/#mpi-jobs","title":"MPI jobs","text":"

Users of MPI codes may wish to ensure termination of all tasks on the failure of one individual task by specifying the --kill-on-bad-exit argument to srun. E.g.,

srun -n 36 --kill-on-bad-exit ./my-mpi-program\n

This can prevent effective \"hanging\" of the job until the wall time limit is reached.

"},{"location":"user-guide/batch/#automatic-resubmission","title":"Automatic resubmission","text":"

Jobs that fail are not automatically resubmitted by Slurm on Cirrus. Automatic resubmission can be enabled for a job by specifying the --requeue option to sbatch.

"},{"location":"user-guide/batch/#slurm-error-messages","title":"Slurm error messages","text":"

An incorrect submission will cause Slurm to return an error. Some common problems are listed below, with a suggestion about the likely cause:

A --partition= option is missing. You must specify the partition (see the list above). This is most often --partition=standard.

error: Batch job submission failed: Invalid partition name specified

Check the partition exists and check the spelling is correct.

This probably means an invalid account has been given. Check the --account= options against valid accounts in SAFE.

A QoS option is either missing or invalid. Check the script has a --qos= option and that the option is a valid one from the table above. (Check the spelling of the QoS is correct.)

Add an option of the form --time=hours:minutes:seconds to the submission script. E.g., --time=01:30:00 gives a time limit of 90 minutes.

The script has probably specified a time limit which is too long for the corresponding QoS. E.g., the time limit for the short QoS is 20 minutes.
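The --time=hours:minutes:seconds format mentioned above can be generated from a number of minutes with a short helper. This is a sketch only: minutes_to_slurm_time is a hypothetical name, and it assumes a whole, non-negative number of minutes written without leading zeros.

```shell
# Hypothetical helper: convert a whole number of minutes into the
# hours:minutes:seconds form accepted by the --time option.
minutes_to_slurm_time() (
    total=$1
    printf '%02d:%02d:00\n' $(( total / 60 )) $(( total % 60 ))
)

minutes_to_slurm_time 90    # prints 01:30:00
minutes_to_slurm_time 20    # prints 00:20:00, the limit of the short QoS
```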

"},{"location":"user-guide/batch/#slurm-queued-reasons","title":"Slurm queued reasons","text":"

The squeue command allows users to view information for jobs managed by Slurm. Jobs typically go through the following states: PENDING, RUNNING, COMPLETING, and COMPLETED. The first table provides a description of some job state codes. The second table provides a description of the reasons that cause a job to be in a state.

| Status | Code | Description |
|---|---|---|
| PENDING | PD | Job is awaiting resource allocation. |
| RUNNING | R | Job currently has an allocation. |
| SUSPENDED | S | Job has an allocation, but execution has been suspended. |
| COMPLETING | CG | Job is in the process of completing. Some processes on some nodes may still be active. |
| COMPLETED | CD | Job has terminated all processes on all nodes with an exit code of zero. |
| TIMEOUT | TO | Job terminated upon reaching its time limit. |
| STOPPED | ST | Job has an allocation, but execution has been stopped with SIGSTOP signal. CPUs have been retained by this job. |
| OUT_OF_MEMORY | OOM | Job experienced out of memory error. |
| FAILED | F | Job terminated with non-zero exit code or other failure condition. |
| NODE_FAIL | NF | Job terminated due to failure of one or more allocated nodes. |
| CANCELLED | CA | Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated. |

Slurm Job State codes

For a full list, see Job State Codes.

| Reason | Description |
|---|---|
| Priority | One or more higher priority jobs exist for this partition or advanced reservation. |
| Resources | The job is waiting for resources to become available. |
| BadConstraints | The job's constraints can not be satisfied. |
| BeginTime | The job's earliest start time has not yet been reached. |
| Dependency | This job is waiting for a dependent job to complete. |
| Licenses | The job is waiting for a license. |
| WaitingForScheduling | No reason has been set for this job yet. Waiting for the scheduler to determine the appropriate reason. |
| Prolog | Its PrologSlurmctld program is still running. |
| JobHeldAdmin | The job is held by a system administrator. |
| JobHeldUser | The job is held by the user. |
| JobLaunchFailure | The job could not be launched. This may be due to a file system problem, invalid program name, etc. |
| NonZeroExitCode | The job terminated with a non-zero exit code. |
| InvalidAccount | The job's account is invalid. |
| InvalidQOS | The job's QOS is invalid. |
| QOSUsageThreshold | Required QOS threshold has been breached. |
| QOSJobLimit | The job's QOS has reached its maximum job count. |
| QOSResourceLimit | The job's QOS has reached some resource limit. |
| QOSTimeLimit | The job's QOS has reached its time limit. |
| NodeDown | A node required by the job is down. |
| TimeLimit | The job exhausted its time limit. |
| ReqNodeNotAvail | Some node specifically required by the job is not currently available. The node may currently be in use, reserved for another job, in an advanced reservation, DOWN, DRAINED, or not responding. Nodes which are DOWN, DRAINED, or not responding will be identified as part of the job's \"reason\" field as \"UnavailableNodes\". Such nodes will typically require the intervention of a system administrator to make available. |

Slurm Job Reasons

For a full list, see Job Reasons.

"},{"location":"user-guide/batch/#output-from-slurm-jobs","title":"Output from Slurm jobs","text":"

Slurm places standard output (STDOUT) and standard error (STDERR) for each job in the file slurm-<JobID>.out. This file appears in the job's working directory once your job starts running.

Note

This file is plain text and can contain useful information to help debugging if a job is not working as expected. The Cirrus Service Desk team will often ask you to provide the contents of this file if you contact them for help with issues.

"},{"location":"user-guide/batch/#specifying-resources-in-job-scripts","title":"Specifying resources in job scripts","text":"

You specify the resources you require for your job using directives at the top of your job submission script using lines that start with the directive #SBATCH.

Note

Options provided using #SBATCH directives can also be specified as command line options to srun.

If you do not specify any options, then the default for each option will be applied. As a minimum, all job submissions must specify the budget that they wish to charge the job to, the partition they wish to use and the QoS they want to use with the options --account=[budget code], --partition=[partition] and --qos=[QoS].

Other common options that are used are:

Other not so common options that are used are:

In addition, parallel jobs will also need to specify how many nodes, parallel processes and threads they require.

Note

For parallel jobs, you should request exclusive node access with the --exclusive option to ensure you get the expected resources and performance.

"},{"location":"user-guide/batch/#srun-launching-parallel-jobs","title":"srun: Launching parallel jobs","text":"

If you are running parallel jobs, your job submission script should contain one or more srun commands to launch the parallel executable across the compute nodes. As well as launching the executable, srun also allows you to specify the distribution and placement (or pinning) of the parallel processes and threads.

If you are running MPI jobs that do not also use OpenMP threading, then you should use srun with no additional options. srun will use the specification of nodes and tasks from your job script, sbatch or salloc command to launch the correct number of parallel tasks.

If you are using OpenMP threads then you will generally add the --cpu-bind=cores option to srun to bind threads to cores to obtain the best performance.

Note

See the example job submission scripts below for examples of using srun for pure MPI jobs and for jobs that use OpenMP threading.

"},{"location":"user-guide/batch/#example-parallel-job-submission-scripts","title":"Example parallel job submission scripts","text":"

A subset of example job submission scripts are included in full below.

Hint

Do not replace srun with mpirun in the following examples. Although this might work under special circumstances, it is not guaranteed and therefore not supported.

"},{"location":"user-guide/batch/#example-job-submission-script-for-mpi-parallel-job","title":"Example: job submission script for MPI parallel job","text":"

A simple MPI job submission script to submit a job using 4 compute nodes and 36 MPI ranks per node for 20 minutes would look like:

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=Example_MPI_Job\n#SBATCH --time=0:20:0\n#SBATCH --exclusive\n#SBATCH --nodes=4\n#SBATCH --tasks-per-node=36\n#SBATCH --cpus-per-task=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n# We use the \"standard\" partition as we are running on CPU nodes\n#SBATCH --partition=standard\n# We use the \"standard\" QoS as our runtime is less than 4 days\n#SBATCH --qos=standard\n\n# Load the default HPE MPI environment\nmodule load mpt\n\n# Change to the submission directory\ncd $SLURM_SUBMIT_DIR\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\n# Launch the parallel job\n#   Using 144 MPI processes and 36 MPI processes per node\n#\u00a0  srun picks up the distribution from the sbatch options\nsrun ./my_mpi_executable.x\n

This will run your executable \"my_mpi_executable.x\" in parallel on 144 MPI processes using 4 nodes (36 cores per node, i.e. not using hyper-threading). Slurm will allocate 4 nodes to your job and srun will place 36 MPI processes on each node (one per physical core).

By default, srun will launch an MPI job that uses all of the cores you have requested via the \"nodes\" and \"tasks-per-node\" options. If you want to run fewer MPI processes than cores you will need to change the script.

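For example, to run this program on 128 MPI processes you have two options: set --tasks-per-node=32 for an even distribution across the 4 nodes, or set the number of MPI tasks explicitly using #SBATCH --ntasks=128.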

Note

If you specify --ntasks explicitly and it is not compatible with the value of tasks-per-node then you will get a warning message from srun such as srun: Warning: can't honor --ntasks-per-node set to 36.

In this case, srun does the sensible thing and allocates MPI processes as evenly as it can across nodes. For example, the second option above would result in 32 MPI processes on each of the 4 nodes.
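The even distribution described above can be illustrated with a short calculation. The function tasks_per_node is a hypothetical helper, and the rule that any remainder goes to the first nodes is an assumption made for illustration; the actual placement is decided by Slurm.

```shell
# Hypothetical helper: print how many MPI processes each node would receive
# if ntasks were spread as evenly as possible over the given number of nodes.
# Assumption: any remainder is given to the first nodes, one task each.
tasks_per_node() (
    ntasks=$1
    nodes=$2
    base=$(( ntasks / nodes ))
    extra=$(( ntasks % nodes ))
    node=0
    while [ "$node" -lt "$nodes" ]; do
        if [ "$node" -lt "$extra" ]; then
            echo $(( base + 1 ))
        else
            echo "$base"
        fi
        node=$(( node + 1 ))
    done
)

tasks_per_node 128 4    # prints 32 four times, one line per node
```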

See above for a more detailed discussion of the different sbatch options.

"},{"location":"user-guide/batch/#note-on-mpt-task-placement","title":"Note on MPT task placement","text":"

By default, MPT will distribute processes to physical cores (cores 0-17 on socket 0, and cores 18-35 on socket 1) in a cyclic fashion. That is, rank 0 would be placed on core 0, rank 1 on core 18, rank 2 on core 1, and so on (in a single-node job). This may be undesirable. Block, rather than cyclic, distribution can be obtained with

#SBATCH --distribution=block:block\n

The block:block here refers to the distribution on both nodes and sockets. This will distribute rank 0 to core 0, rank 1 to core 1, rank 2 to core 2, and so on.

"},{"location":"user-guide/batch/#example-job-submission-script-for-mpiopenmp-mixed-mode-parallel-job","title":"Example: job submission script for MPI+OpenMP (mixed mode) parallel job","text":"

Mixed mode codes that use both MPI (or another distributed memory parallel model) and OpenMP should take care to ensure that the shared memory portion of the process/thread placement does not span more than one node. This means that the number of shared memory threads should be a factor of 36.

In the example below, we are using 4 nodes for 20 minutes. There are 8 MPI processes in total (2 MPI processes per node) and 18 OpenMP threads per MPI process. This results in all 36 physical cores per node being used.

Note

The script below uses the --cpu-bind=cores option to generate the correct affinity settings.

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=Example_MPI_Job\n#SBATCH --time=0:20:0\n#SBATCH --exclusive\n#SBATCH --nodes=4\n#SBATCH --ntasks=8\n#SBATCH --tasks-per-node=2\n#SBATCH --cpus-per-task=18\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n# We use the \"standard\" partition as we are running on CPU nodes\n#SBATCH --partition=standard\n# We use the \"standard\" QoS as our runtime is less than 4 days\n#SBATCH --qos=standard\n\n# Load the default HPE MPI environment\nmodule load mpt\n\n# Change to the submission directory\ncd $SLURM_SUBMIT_DIR\n\n# Set the number of threads to 18\n#   There are 18 OpenMP threads per MPI process\nexport OMP_NUM_THREADS=18\n\n# Launch the parallel job\n#   Using 8 MPI processes\n#   2 MPI processes per node\n#   18 OpenMP threads per MPI process\n\nsrun --cpu-bind=cores ./my_mixed_executable.x arg1 arg2\n
"},{"location":"user-guide/batch/#example-job-submission-script-for-openmp-parallel-job","title":"Example: job submission script for OpenMP parallel job","text":"

A simple OpenMP job submission script to submit a job using 1 compute node and 36 threads for 20 minutes would look like:

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=Example_OpenMP_Job\n#SBATCH --time=0:20:0\n#SBATCH --exclusive\n#SBATCH --nodes=1\n#SBATCH --tasks-per-node=1\n#SBATCH --cpus-per-task=36\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n# We use the \"standard\" partition as we are running on CPU nodes\n#SBATCH --partition=standard\n# We use the \"standard\" QoS as our runtime is less than 4 days\n#SBATCH --qos=standard\n\n# Load any required modules\nmodule load mpt\n\n# Change to the submission directory\ncd $SLURM_SUBMIT_DIR\n\n# Set the number of threads to the CPUs per task\nexport OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK\n\n# Launch the parallel job\n#   Using 36 threads per node\n#\u00a0  srun picks up the distribution from the sbatch options\nsrun --cpu-bind=cores ./my_openmp_executable.x\n

This will run your executable \"my_openmp_executable.x\" in parallel on 36 threads. Slurm will allocate 1 node to your job and srun will place 36 threads (one per physical core).

See above for a more detailed discussion of the different sbatch options.

"},{"location":"user-guide/batch/#job-arrays","title":"Job arrays","text":"

The Slurm job scheduling system offers the job array concept for running collections of almost-identical jobs, for example running the same program several times with different arguments or input data.

Each job in a job array is called a subjob. The subjobs of a job array can be submitted and queried as a unit, making it easier and cleaner to handle the full set, compared to individual jobs.

All subjobs in a job array are started by running the same job script. The job script also contains information on the number of jobs to be started, and Slurm provides a subjob index which can be passed to the individual subjobs or used to select the input data per subjob.
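Selecting input data per subjob could be done along the following lines. This is a sketch under stated assumptions: input_for_subjob and the list file are hypothetical, the list holds one input file name per line, and the array indices start at 0. SLURM_ARRAY_TASK_ID is the index variable that Slurm sets for each subjob.

```shell
# Hypothetical helper: print the entry of a list file that corresponds to a
# 0-based subjob index. The index defaults to SLURM_ARRAY_TASK_ID, which
# Slurm sets for each subjob of a job array, or to 0 outside a job array.
input_for_subjob() (
    list_file=$1
    index=${2:-${SLURM_ARRAY_TASK_ID:-0}}
    # sed line numbers are 1-based, so shift the index by one
    sed -n "$(( index + 1 ))p" "$list_file"
)

# Possible use inside a job array script (inputs.txt is an assumed file name):
#   srun /path/to/exe "$(input_for_subjob inputs.txt)"
```

Keeping the mapping in a plain text file means the job script itself does not change when the set of inputs does; only the --array range needs to match the number of lines.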

"},{"location":"user-guide/batch/#job-script-for-a-job-array","title":"Job script for a job array","text":"

As an example, the following script runs 56 subjobs, with the subjob index as the only argument to the executable. Each subjob requests a single node and uses all 36 cores on the node by placing 1 MPI process per core and specifies 4 hours maximum runtime per subjob:

#!/bin/bash\n# Slurm job options (name, compute nodes, job time)\n\n#SBATCH --job-name=Example_Array_Job\n#SBATCH --time=04:00:00\n#SBATCH --exclusive\n#SBATCH --nodes=1\n#SBATCH --tasks-per-node=36\n#SBATCH --cpus-per-task=1\n#SBATCH --array=0-55\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n# We use the \"standard\" partition as we are running on CPU nodes\n#SBATCH --partition=standard\n# We use the \"standard\" QoS as our runtime is less than 4 days\n#SBATCH --qos=standard\n\n# Load the default HPE MPI environment\nmodule load mpt\n\n# Change to the submission directory\ncd $SLURM_SUBMIT_DIR\n\n# Set the number of threads to 1\n#   This prevents any threaded system libraries from automatically\n#   using threading.\nexport OMP_NUM_THREADS=1\n\nsrun /path/to/exe $SLURM_ARRAY_TASK_ID\n
"},{"location":"user-guide/batch/#submitting-a-job-array","title":"Submitting a job array","text":"

Job arrays are submitted using sbatch in the same way as for standard jobs:

sbatch job_script.slurm\n
"},{"location":"user-guide/batch/#job-chaining","title":"Job chaining","text":"

Job dependencies can be used to construct complex pipelines or chain together long simulations requiring multiple steps.

Note

The --parsable option to sbatch can simplify working with job dependencies. It returns the job ID in a format that can be used as the input to other commands.

For example:

jobid=$(sbatch --parsable first_job.sh)\nsbatch --dependency=afterok:$jobid second_job.sh\n

or for a longer chain:

jobid1=$(sbatch --parsable first_job.sh)\njobid2=$(sbatch --parsable --dependency=afterok:$jobid1 second_job.sh)\njobid3=$(sbatch --parsable --dependency=afterok:$jobid1 third_job.sh)\nsbatch --dependency=afterok:$jobid2,afterok:$jobid3 last_job.sh\n
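When a job depends on several earlier jobs, the dependency specification can be assembled from the collected job IDs. The helper below is a hypothetical sketch; it relies on the colon-separated form type:jobid[:jobid...] that sbatch accepts for a single dependency type.

```shell
# Hypothetical helper: build a --dependency option from a dependency type
# (for example afterok) and one or more job IDs, using the colon-separated
# form type:jobid[:jobid...].
dependency_option() (
    type=$1
    shift
    spec=$type
    for id in "$@"; do
        spec="$spec:$id"
    done
    echo "--dependency=$spec"
)

dependency_option afterok 101 102    # prints --dependency=afterok:101:102

# Possible use with job IDs captured via --parsable, as in the chain above:
#   sbatch "$(dependency_option afterok "$jobid2" "$jobid3")" last_job.sh
```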
"},{"location":"user-guide/batch/#interactive-jobs","title":"Interactive Jobs","text":"

When you are developing or debugging code you often want to run many short jobs with a small amount of editing the code between runs. This can be achieved by using the login nodes to run small/short MPI jobs. However, you may want to test on the compute nodes (e.g. you may want to test running on multiple nodes across the high performance interconnect). One way to achieve this on Cirrus is to use an interactive job.

Interactive jobs via Slurm take two slightly different forms. The first uses srun directly to allocate resources to be used interactively; the second uses both salloc and srun.

"},{"location":"user-guide/batch/#using-srun","title":"Using srun","text":"

An interactive job via srun allows you to execute commands directly from the command line without using a job submission script, and to see the output from your program directly in the terminal.

A convenient way to do this is as follows.

[user@cirrus-login1]$ srun --exclusive --nodes=1 --time=00:20:00 --partition=standard --qos=standard --account=z04 --pty /usr/bin/bash --login\n[user@r1i0n14]$\n

This requests the exclusive use of one node for the given time (here, 20 minutes). The --pty /usr/bin/bash --login requests an interactive login shell be started. (Note the prompt has changed.) Interactive commands can then be used as normal and will execute on the compute node. When no longer required, you can type exit or CTRL-D to release the resources and return control to the front end shell.

[user@r1i0n14]$ exit\nlogout\n[user@cirrus-login1]$\n

Note that the new interactive shell will reflect the environment of the original login shell. If you do not wish this, add the --export=none argument to srun to provide a clean login environment.

Within an interactive job, one can use srun to launch parallel jobs in the normal way, e.g.,

[user@r1i0n14]$ srun -n 2 ./a.out\n

In this context, one could also use mpirun directly. Note we are limited to the 36 cores of our original --nodes=1 srun request.

"},{"location":"user-guide/batch/#using-salloc-with-srun","title":"Using salloc with srun","text":"

This approach uses the salloc command to reserve compute nodes and then srun to launch relevant work.

To submit a request for a job reserving 2 nodes (72 physical cores) for 1 hour you would issue the command:

[user@cirrus-login1]$ salloc --exclusive --nodes=2 --tasks-per-node=36 --cpus-per-task=1 --time=01:00:00  --partition=standard --qos=standard --account=t01\nsalloc: Granted job allocation 8699\nsalloc: Waiting for resource configuration\nsalloc: Nodes r1i7n[13-14] are ready for job\n[user@cirrus-login1]$\n

Note that this starts a new shell on the login node associated with the allocation (the prompt has not changed). The allocation may be released by exiting this new shell.

[user@cirrus-login1]$ exit\nsalloc: Relinquishing job allocation 8699\n[user@cirrus-login1]$\n

While the allocation lasts you will be able to run parallel jobs on the compute nodes by issuing the srun command in the normal way. The resources available are those specified in the original salloc command. For example, with the above allocation,

$ srun ./mpi-code.out\n

will run 36 MPI tasks per node on two nodes.

If your allocation reaches its time limit, it will automatically be terminated and the associated shell will exit. To check that the allocation is still running, use squeue:

[user@cirrus-login1]$ squeue -u user\n           JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)\n            8718  standard     bash    user   R       0:07      2 r1i7n[18-19]\n

Choose a time limit long enough to allow the relevant work to be completed.

The salloc method may be useful if one wishes to associate operations on the login node (e.g., via a GUI) with work in the allocation itself.

"},{"location":"user-guide/batch/#reservations","title":"Reservations","text":"

Reservations are available on Cirrus. These allow users to reserve a number of nodes for a specified length of time starting at a particular time on the system.

Reservations require justification. They will only be approved if the request could not be fulfilled with the standard queues, for example if you require a job or jobs to run at a particular time for a demonstration or course.

Note

Reservation requests must be submitted at least 120 hours in advance of the reservation start time. We cannot guarantee to meet all reservation requests due to potential conflicts with other demands on the service but will do our best to meet all requests.

Reservations will be charged at 1.5 times the usual rate and our policy is that they will be charged the full rate for the entire reservation at the time of booking, whether or not you use the nodes for the full time. In addition, you will not be refunded the compute time if you fail to use them due to a job crash unless this crash is due to a system failure.

To request a reservation you complete a form on SAFE:

  1. [Log into SAFE](https://safe.epcc.ed.ac.uk)
  2. Under the \"Login accounts\" menu, choose the \"Request reservation\" option

On the first page, you need to provide the following:

On the second page, you will need to specify which username you wish the reservation to be charged against and, once the username has been selected, the budget you want to charge the reservation to. (The selected username will be charged for the reservation but the reservation can be used by all members of the selected budget.)

Your request will be checked by the Cirrus User Administration team and, if approved, you will be provided a reservation ID which can be used on the system. To submit jobs to a reservation, you need to add --reservation=<reservation ID> and --qos=reservation options to your job submission script or Slurm job submission command.

Note

You must have at least 1 CPUh - and 1 GPUh for reservations on GPU nodes - to be able to submit jobs to reservations.

Tip

You can submit jobs to a reservation as soon as the reservation has been set up; jobs will remain queued until the reservation starts.

"},{"location":"user-guide/batch/#serial-jobs","title":"Serial jobs","text":"

Unlike parallel jobs, serial jobs will generally not need to specify the number of nodes and exclusive access (unless they want access to all of the memory on a node). You usually only need the --ntasks=1 specifier. For example, a serial job submission script could look like:

#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=Example_Serial_Job\n#SBATCH --time=0:20:0\n#SBATCH --ntasks=1\n\n# Replace [budget code] below with your budget code (e.g. t01)\n#SBATCH --account=[budget code]\n# We use the \"standard\" partition as we are running on CPU nodes\n#SBATCH --partition=standard\n# We use the \"standard\" QoS as our runtime is less than 4 days\n#SBATCH --qos=standard\n\n# Change to the submission directory\ncd $SLURM_SUBMIT_DIR\n\n# Enforce threading to 1 in case underlying libraries are threaded\nexport OMP_NUM_THREADS=1\n\n# Launch the serial job\n#   Using 1 thread\nsrun --cpu-bind=cores ./my_serial_executable.x\n

Note

Remember that you will be allocated memory based on the number of tasks (i.e. CPU cores) that you request. You will get ~7.1 GB per task/core. If you need more than this for your serial job then you should ask for the number of tasks you need for the required memory (or use the --exclusive option to get access to all the memory on a node) and launch specifying a single task using srun --ntasks=1 --cpu-bind=cores.

"},{"location":"user-guide/batch/#temporary-files-and-tmp-in-batch-jobs","title":"Temporary files and /tmp in batch jobs","text":"

Applications which normally read and write temporary files from /tmp may require some care in batch jobs on Cirrus. As the size of /tmp on backend nodes is relatively small (\\< 150 MB), applications should use a different location to prevent possible failures. This is relevant for both CPU and GPU nodes.

Note also that the default value of the variable TMPDIR in batch jobs is a memory-resident file system location specific to the current job (typically in the /dev/shm directory). Files here reduce the available capacity of main memory on the node.

It is recommended that applications with significant temporary file space requirements use the solid state storage (see /user-guide/solidstate). For example, a submission script might contain:

export TMPDIR=\"/scratch/space1/x01/auser/$SLURM_JOBID.tmp\"\nmkdir -p $TMPDIR\n

to set the standard temporary directory to a unique location in the solid state storage. You will also probably want to add a line to clean up the temporary directory at the end of your job script, e.g.

rm -r $TMPDIR\n

Tip

Applications should not hard-code specific locations such as /tmp. Parallel applications should further ensure that there are no collisions in temporary file names on separate processes/nodes.
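
One way to follow this advice in a shell script is to create temporary files with mktemp under whatever TMPDIR is set to. This is a minimal sketch: the myapp prefix is a placeholder, and the fallback to /tmp is only there so the snippet also runs outside a batch job.

```shell
#!/bin/bash
# Create a uniquely named scratch file under TMPDIR rather than
# hard-coding /tmp. mktemp replaces the XXXXXX suffix with random
# characters and refuses to reuse an existing name.
tmp_root="${TMPDIR:-/tmp}"
scratch_file=$(mktemp "${tmp_root}/myapp.XXXXXX")

# Remove the file when the script exits, even after an error.
trap 'rm -f "$scratch_file"' EXIT

echo "intermediate data" > "$scratch_file"
```

Because each process gets its own randomly named file, several copies of the script can share one TMPDIR without overwriting each other's data.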

"},{"location":"user-guide/connecting/","title":"Connecting to Cirrus","text":"

On the Cirrus system, interactive access can be achieved via SSH, either directly from a command line terminal or using an SSH client. In addition, data can be transferred to and from the Cirrus system using scp from the command line or by using a file transfer client.

Before following the process below, we assume you have set up an account on Cirrus through the EPCC SAFE. Documentation on how to do this can be found at:

SAFE Guide for Users

This section covers the basic connection methods.

"},{"location":"user-guide/connecting/#access-credentials-mfa","title":"Access credentials: MFA","text":"

To access Cirrus, you need to use two credentials (this is known as multi-factor authentication or MFA): your SSH key pair, protected by a passphrase, and a time-based one-time passcode (sometimes known as a TOTP code). You can find more detailed instructions on how to set up your credentials to access Cirrus from Windows, macOS and Linux below.

Note

The first time you log into a new account you will also need to enter a one-time password from SAFE. This is described in more detail below.

"},{"location":"user-guide/connecting/#ssh-key-pairs","title":"SSH Key Pairs","text":"

You will need to generate an SSH key pair protected by a passphrase to access Cirrus.

Using a terminal (the command line), set up a key pair that contains your e-mail address and enter a passphrase you will use to unlock the key:

$ ssh-keygen -t rsa -C \"your@email.com\"\n...\n-bash-4.1$ ssh-keygen -t rsa -C \"your@email.com\"\nGenerating public/private rsa key pair.\nEnter file in which to save the key (/Home/user/.ssh/id_rsa): [Enter]\nEnter passphrase (empty for no passphrase): [Passphrase]\nEnter same passphrase again: [Passphrase]\nYour identification has been saved in /Home/user/.ssh/id_rsa.\nYour public key has been saved in /Home/user/.ssh/id_rsa.pub.\nThe key fingerprint is:\n03:d4:c4:6d:58:0a:e2:4a:f8:73:9a:e8:e3:07:16:c8 your@email.com\nThe key's randomart image is:\n+--[ RSA 2048]----+\n|    . ...+o++++. |\n| . . . =o..      |\n|+ . . .......o o |\n|oE .   .         |\n|o =     .   S    |\n|.    +.+     .   |\n|.  oo            |\n|.  .             |\n| ..              |\n+-----------------+\n

(remember to replace \"your@email.com\" with your e-mail address).

"},{"location":"user-guide/connecting/#upload-public-part-of-key-pair-to-safe","title":"Upload public part of key pair to SAFE","text":"

You should now upload the public part of your SSH key pair to the SAFE by following the instructions at:

Login to SAFE. Then:

  1. Go to the Menu Login accounts and select the Cirrus account you want to add the SSH key to
  2. On the subsequent Login account details page click the Add Credential button
  3. Select SSH public key as the Credential Type and click Next
  4. Either copy and paste the public part of your SSH key into the SSH Public key box or use the button to select the public key file on your computer.
  5. Click Add to associate the public SSH key part with your account

Once you have done this, your SSH key will be added to your Cirrus account.

"},{"location":"user-guide/connecting/#time-based-one-time-passcode-totp-code","title":"Time-based one-time passcode (TOTP code)","text":"

Remember, you will need to use both an SSH key and time-based one-time passcode (TOTP code) to log into Cirrus so you will also need to set up a method for generating a TOTP code before you can log into Cirrus.

"},{"location":"user-guide/connecting/#first-login-password-required","title":"First login: password required","text":"

Important

You will not use your password when logging on to Cirrus after the first login for a new account.

As an additional security measure, you will also need to use a password from SAFE for your first login to Cirrus with a new account. When you log into Cirrus for the first time with a new account, you will be prompted to change your initial password. This is a three step process:

  1. When prompted to enter your ldap password: Enter the password which you retrieve from SAFE
  2. When prompted to enter your new password: type in a new password
  3. When prompted to re-enter the new password: re-enter the new password

Your password has now been changed. You will no longer need this password to log into Cirrus from this point forwards, you will use your SSH key and TOTP code as described above.

"},{"location":"user-guide/connecting/#ssh-clients","title":"SSH Clients","text":"

Interaction with Cirrus is done remotely, over an encrypted communication channel, Secure Shell version 2 (SSH-2). This allows command-line access to one of the login nodes of Cirrus, from which you can run commands or use a command-line text editor to edit files. SSH can also be used to run graphical programs such as GUI text editors and debuggers when used in conjunction with an X server on your local machine.

"},{"location":"user-guide/connecting/#logging-in-from-linux-and-macos","title":"Logging in from Linux and MacOS","text":"

Linux distributions and MacOS each come installed with a terminal application that can be used for SSH access to the login nodes. Linux users will have different terminals depending on their distribution and window manager (e.g. GNOME Terminal in GNOME, Konsole in KDE). Consult your Linux distribution's documentation for details on how to load a terminal.

MacOS users can use the Terminal application, located in the Utilities folder within the Applications folder.

You can use the following command from the terminal window to log into Cirrus:

ssh username@login.cirrus.ac.uk\n

You will first be prompted for the passphrase associated with your SSH key pair. Once you have entered your passphrase successfully, you will then be prompted for your password. You need to enter both correctly to be able to access Cirrus.

Note

If your SSH key pair is not stored in the default location (usually ~/.ssh/id_rsa) on your local system, you may need to specify the path to the private part of the key with the -i option to ssh. For example, if your key is in a file called keys/id_rsa_cirrus you would use the command ssh -i keys/id_rsa_cirrus username@login.cirrus.ac.uk to log in.

To allow remote programs, especially graphical applications, to control your local display (for example, to open a new GUI window for a debugger), use:

ssh -X username@login.cirrus.ac.uk\n

Some sites recommend using the -Y flag. While this can fix some compatibility issues, the -X flag is more secure.

Current MacOS systems do not have an X window system. Users should install the XQuartz package to allow for SSH with X11 forwarding on MacOS systems:

"},{"location":"user-guide/connecting/#logging-in-from-windows-using-mobaxterm","title":"Logging in from Windows using MobaXterm","text":"

A typical Windows installation will not include a terminal client, though there are various clients available. We recommend all our Windows users to download and install MobaXterm to access Cirrus. It is very easy to use and includes an integrated X server with SSH client to run any graphical applications on Cirrus.

You can download MobaXterm Home Edition (Installer Edition) from the following link:

Double-click the downloaded Microsoft Installer file (.msi), and the Windows wizard will automatically guide you through the installation process. Note that you might need administrator rights to install on some Windows systems. Also check that Windows Firewall has not blocked any features of this program after installation.

Start MobaXterm using, for example, the icon added to the Start menu during the installation process.

If you would like to run any small remote GUI applications, make sure to use the -X option with the ssh command (see above) to enable X11 forwarding, which allows you to run graphical clients on your local X server.

"},{"location":"user-guide/connecting/#host-keys","title":"Host Keys","text":"

Adding the host keys to your SSH configuration file provides an extra level of security for your connections to Cirrus. The host keys are checked against the login nodes when you login to Cirrus and if the remote server key does not match the one in the configuration file, the connection will be refused. This provides protection against potential malicious servers masquerading as the Cirrus login nodes.

"},{"location":"user-guide/connecting/#logincirrusacuk","title":"login.cirrus.ac.uk","text":"
login.cirrus.ac.uk ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBOXYXQEFJfIBZRadNjVU9T0bYVlssht4Qz9Urliqor3L+S8rQojSQtPAjsxxgtD/yeaUWAaBZnXcbPFl2/uFPro=\n\nlogin.cirrus.ac.uk ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC4YJNc0yYfUPtiApGzzkwTYxUhFB1q1G2/vO8biwDL4W0LOcaBFCNTVst1IXQ6tZ9l0GfvlmYTb1LHYoTYLn5CyUL5KKS7X4FkhM9n2EExy/WK+H7kOvOwnWEAWM3GOwPYfhPWdddIHO7cI3CTd1kAL3NVzlt/yvx0CKGtw2QyL9gLGPJ23soDlIJYp/OC/f7E6U+JM6jx8QshQn0PiBPN3gB9MLWNX7ZsYXaSafIw1/txoh7D7CawsTrlKEHgEyNpQIgZFR7pLYlydRijbWEtD40DxlgaF1l/OuJrBfddRXC7VYHNvHq0jv0HCncCjxcHZmr3FW9B3PuRvBeWJpzV6Bv2pLGTPPwd8p7QgkAmTQ1Ews/Q4giUboZyqRcJAkFQtOBCmv43+qxWXKMAB7OdbjJL2oO9UIfPtUmE6oj+rnPxpJMhJuQX2aHIlS0Mev7NzaTUpQqNa4QgsI7Kj/m2JT0ZfQ0I33NO10Z3PLZghKqhTH5yy+2nSYLK6rnxZLU=\n\nlogin.cirrus.ac.uk ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFk4UnY1DaS+LFSS8AFKbmAmlevxShN4hGpn+gGGX8Io\n

Host key verification can fail if this key is out of date, a problem which can be fixed by removing the offending entry in ~/.ssh/known_hosts and replacing it with the new key published here. We recommend users should check this page for any key updates and not just accept a new key from the server without confirmation.

"},{"location":"user-guide/connecting/#making-access-more-convenient-using-the-ssh-configuration-file","title":"Making access more convenient using the SSH configuration file","text":"

Typing in the full command to login or transfer data to Cirrus can become tedious as it often has to be repeated many times. You can use the SSH configuration file, usually located on your local machine at .ssh/config to make things a bit more convenient.

Each remote site (or group of sites) can have an entry in this file which may look something like:

Host cirrus\n  HostName login.cirrus.ac.uk\n  User username\n

(remember to replace username with your actual username!).

The Host cirrus line defines a short name for the entry. In this case, instead of typing ssh username@login.cirrus.ac.uk to access the Cirrus login nodes, you could use ssh cirrus instead. The remaining lines define the options for the cirrus host.

Now you can use SSH to access Cirrus without needing to enter your username or the full hostname every time:

-bash-4.1$ ssh cirrus\n

You can set up as many of these entries as you need in your local configuration file. Other options are available. See the ssh_config man page (or man ssh_config on any machine with SSH installed) for a description of the SSH configuration file. You may find the IdentityFile option useful if you have to manage multiple SSH key pairs for different systems as this allows you to specify which SSH key to use for each system.
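
As an illustration of the IdentityFile option, the sketch below prints a configuration entry that pins a particular key for Cirrus. The function name cirrus_ssh_entry, the username and the key path ~/.ssh/id_rsa_cirrus are all placeholders, so substitute your own values.

```shell
#!/bin/bash
# Print an SSH configuration entry for Cirrus that names a specific key.
# Both arguments are placeholders: pass your own username and the path
# to the private key whose public part you uploaded to SAFE.
cirrus_ssh_entry() {
    printf 'Host cirrus\n  HostName login.cirrus.ac.uk\n  User %s\n  IdentityFile %s\n' "$1" "$2"
}

cirrus_ssh_entry username ~/.ssh/id_rsa_cirrus
```

After checking the output, you could append it to your local .ssh/config file; ssh cirrus would then use that key without needing the -i option.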

Note

There is a known bug with Windows ssh-agent. If you get the error message: Warning: agent returned different signature type ssh-rsa (expected rsa-sha2-512), you will need to either specify the path to your ssh key in the command line (using the -i option as described above) or add the path to your SSH config file by using the IdentityFile option.

"},{"location":"user-guide/connecting/#accessing-cirrus-from-more-than-1-machine","title":"Accessing Cirrus from more than 1 machine","text":"

It is common for users to want to access Cirrus from more than one local machine (e.g. a Linux desktop and a laptop) - this can be achieved through use of an ~/.ssh/authorized_keys file on Cirrus to hold the additional keys you generate. Note that if you want to access Cirrus via another remote service, see the next section, SSH forwarding.

You need to consider one of your local machines as your primary machine - this is the machine you should connect to Cirrus with using the instructions further up this page, adding your public key to SAFE.

On your second local machine, generate a new SSH key pair. Copy the public key to your primary machine (e.g. by email, USB stick, or cloud storage); the default location for this on a Linux or MacOS machine will be ~/.ssh/id_rsa.pub. If you are a Windows user using MobaXterm, you should export the public key it generates to OpenSSH format (Conversions > Export OpenSSH Key). You should never move the private key off the machine on which it was generated.

Once back on your primary machine, you should copy the public key from your secondary machine to Cirrus using:

scp id_rsa.pub <user>@login.cirrus.ac.uk:id_secondary.pub\n

You should then log into Cirrus, as normal: ssh <user>@login.cirrus.ac.uk, and then:

mkdir ~/.ssh\nchmod 700 ~/.ssh\n
cat ~/id_secondary.pub >> ~/.ssh/authorized_keys\nchmod 600 ~/.ssh/authorized_keys\nrm ~/id_secondary.pub\n

You can then repeat this process for any more local machines you want to access Cirrus from, omitting the mkdir and chmod lines as the relevant files and directories will already exist with the correct permissions. You don't need to add the public key from your primary machine in your authorized_keys file, because Cirrus can find this in SAFE.

Note that the permissions on the .ssh directory must be set to 700 (Owner can read, can write and can execute but group and world do not have access) and on the authorized_keys file must be 600 (Owner can read and write but group and world do not have access). Keys will be ignored if this is not the case.

"},{"location":"user-guide/connecting/#ssh-forwarding-to-use-cirrus-from-a-second-remote-machine","title":"SSH forwarding (to use Cirrus from a second remote machine)","text":"

If you want to access Cirrus from a machine you already access remotely (e.g. to copy data from Cirrus onto a different cluster), you can forward your local Cirrus SSH keys so that you don't need to create a new key pair on the intermediate machine.

If your local machine is MacOS or Linux, add your Cirrus SSH key to the SSH Agent:

eval \"$(ssh-agent -s)\"\nssh-add ~/.ssh/id_rsa\n

(If you created your key with a different name, replace id_rsa in the command with the name of your private key file). You will be prompted for your SSH key's passphrase.

You can then use the -A flag when connecting to your intermediate cluster:

ssh -A <user>@<host>\n

Once on the intermediate cluster, you should be able to SSH to Cirrus directly:

ssh <user>@login.cirrus.ac.uk\n
"},{"location":"user-guide/connecting/#ssh-debugging-tips","title":"SSH debugging tips","text":"

If you find you are unable to connect via SSH there are a number of ways you can try and diagnose the issue. Some of these are collected below - if you are having difficulties connecting we suggest trying these before contacting the Cirrus service desk.

"},{"location":"user-guide/connecting/#can-you-connect-to-the-login-node","title":"Can you connect to the login node?","text":"

Try the command ping -c 3 login.cirrus.ac.uk. If you successfully connect to the login node, the output should include:

--- login.dyn.cirrus.ac.uk ping statistics ---\n3 packets transmitted, 3 received, 0% packet loss, time 38ms\n

(the ping time '38ms' is not important). If not all packets are received there could be a problem with your internet connection, or the login node could be unavailable.

"},{"location":"user-guide/connecting/#ssh-key","title":"SSH key","text":"

If you get the error message Permission denied (publickey) this can indicate a problem with your SSH key. Some things to check:

    $ ls -al ~/.ssh/\n    drwx------.  2 user group    48 Jul 15 20:24 .\n    drwx------. 12 user group  4096 Oct 13 12:11 ..\n    -rw-------.  1 user group   113 Jul 15 20:23 authorized_keys\n    -rw-------.  1 user group 12686 Jul 15 20:23 id_rsa\n    -rw-r--r--.  1 user group  2785 Jul 15 20:23 id_rsa.pub\n    -rw-r--r--.  1 user group  1967 Oct 13 14:11 known_hosts\n

The important section here is the string of letters and dashes at the start, for the lines ending in ., id_rsa, and id_rsa.pub, which indicate permissions on the containing directory, private key, and public key respectively. If your permissions are not correct, they can be set with chmod. Consult the table below for the relevant chmod command. On Windows, permissions are handled differently but can be set by right-clicking on the file and selecting Properties > Security > Advanced. The user, SYSTEM, and Administrators should have Full control, and no other permissions should exist for both public and private key files, and the containing folder.

Target Permissions chmod Code Directory drwx------ 700 Private Key -rw------- 600 Public Key -rw-r--r-- 644

chmod can be used to set permissions on the target in the following way: chmod <code> <target>. So for example to set correct permissions on the private key file id_rsa_cirrus one would use the command chmod 600 id_rsa_cirrus.

Note

Unix file permissions can be understood in the following way. There are three groups that can have file permissions: (owning) users, (owning) groups, and others. The available permissions are read, write, and execute.

The first character indicates whether the target is a file -, or directory d. The next three characters indicate the owning user's permissions. The first character is r if they have read permission, - if they don't, the second character is w if they have write permission, - if they don't, the third character is x if they have execute permission, - if they don't. This pattern is then repeated for group, and other permissions.

For example, the pattern -rw-r--r-- indicates that the owning user can read and write the file, members of the owning group can read it, and anyone else can also read it. The chmod codes are constructed by treating the user, group, and other permission strings as binary numbers, then converting each to a decimal digit. For example the permission string -rwx------ becomes 111 000 000 -> 700.
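
The conversion described above can be sketched as a small shell function. The name perm_to_octal is hypothetical; it is written only to illustrate the arithmetic and handles just the plain r, w and x flags.

```shell
#!/bin/bash
# Convert a permission string such as -rw-r--r-- into its chmod code.
# Illustrative only: handles the plain r, w and x flags.
perm_to_octal() {
    # Keep characters 2-10, i.e. drop the leading file-type character.
    perms=$(printf '%s' "$1" | cut -c2-10)
    code=""
    for start in 1 4 7; do
        triplet=$(printf '%s' "$perms" | cut -c"$start"-"$((start + 2))")
        digit=0
        case $triplet in r??) digit=$((digit + 4)) ;; esac   # read    = 4
        case $triplet in ?w?) digit=$((digit + 2)) ;; esac   # write   = 2
        case $triplet in ??x) digit=$((digit + 1)) ;; esac   # execute = 1
        code="${code}${digit}"
    done
    printf '%s\n' "$code"
}

perm_to_octal drwx------   # prints 700
perm_to_octal -rw-r--r--   # prints 644
```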

"},{"location":"user-guide/connecting/#mfa","title":"MFA","text":"

If your TOTP passcode is being consistently rejected, you can remove MFA from your account and then re-enable it.

"},{"location":"user-guide/connecting/#ssh-verbose-output","title":"SSH verbose output","text":"

Verbose debugging output from ssh can be very useful for diagnosing the issue. In particular, it can be used to distinguish between problems with the SSH key and password - further details are given below. To enable verbose output add the -vvv flag to your SSH command. For example:

ssh -vvv username@login.cirrus.ac.uk\n

The output is lengthy, but somewhere in there you should see lines similar to the following:

debug1: Next authentication method: publickey\ndebug1: Offering public key: RSA SHA256:<key-hash> <path_to_private_key>\ndebug3: send_pubkey_test\ndebug3: send packet: type 50\ndebug2: we sent a publickey packet, wait for reply\ndebug3: receive packet: type 60\ndebug1: Server accepts key: pkalg ssh-rsa vlen 2071\ndebug2: input_userauth_pk_ok: fp SHA256:<key-hash>\ndebug3: sign_and_send_pubkey: RSA SHA256:<key-hash>\nEnter passphrase for key '<path_to_private_key>':\ndebug3: send packet: type 50\ndebug3: receive packet: type 51\nAuthenticated with partial success.\n

Most importantly, you can see which files ssh has checked for private keys, and you can see if any key is accepted. The line Authenticated with partial success indicates that the SSH key has been accepted, and you will next be asked for your password. By default ssh will go through a list of standard private key files, as well as any you have specified with -i or a config file. This is fine, as long as one of the files mentioned is the one that matches the public key uploaded to SAFE.

If you do not see Authenticated with partial success anywhere in the verbose output, consider the suggestions under SSH key above. If you do, but are unable to connect, consider the suggestions under Password above.

The equivalent information can be obtained in PuTTY or MobaXterm by enabling all logging in settings.

"},{"location":"user-guide/connecting/#default-shell-environment","title":"Default shell environment","text":"

Usually, when a new login shell is created, the commands in $HOME/.bashrc are executed. This typically includes setting user-defined aliases, changing environment variables, and, in the case of an HPC system, loading modules.

Cirrus does not currently read the $HOME/.bashrc file, but it does read the $HOME/.bash_profile file, so, if you wish to read a $HOME/.bashrc file, you can add the following to your $HOME/.bash_profile file (or create one, if it doesn't exist):

# $HOME/.bash_profile\n# load $HOME/.bashrc, if it exists\nif [ -f $HOME/.bashrc ]; then\n        . $HOME/.bashrc\nfi\n
"},{"location":"user-guide/data/","title":"Data Management and Transfer","text":"

This section covers the storage and file systems available on the system; the different ways that you can transfer data to and from Cirrus; and how to transfer backed up data from prior to the March 2022 Cirrus upgrade.

In all cases of data transfer, users should use the Cirrus login nodes.

"},{"location":"user-guide/data/#cirrus-file-systems-and-storage","title":"Cirrus file systems and storage","text":"

The Cirrus service, like many HPC systems, has a complex structure. There are a number of different data storage types available to users:

Each type of storage has different characteristics and policies, and is suitable for different types of use.

There are also two different types of node available to users:

Each type of node sees a different combination of the storage types. The following table shows which storage options are available on different node types:

Storage Login nodes Compute nodes Notes Home yes no No backup Work yes yes No backup Solid state yes yes No backup"},{"location":"user-guide/data/#home-file-system","title":"Home file system","text":"

Every project has an allocation on the home file system and your project's space can always be accessed via the path /home/[project-code]. The home file system is approximately 1.5 PB in size and is implemented using the Ceph technology. This means that this storage is not particularly high performance but is well suited to standard operations like compilation and file editing. This file system is visible from the Cirrus login nodes.

There are currently no backups of any data on the home file system.

"},{"location":"user-guide/data/#quotas-on-home-file-system","title":"Quotas on home file system","text":"

All projects are assigned a quota on the home file system. The project PI or manager can split this quota up between groups of users if they wish.

You can view any home file system quotas that apply to your account by logging into SAFE and navigating to the page for your Cirrus login account.

  1. Log into SAFE
  2. Use the \"Login accounts\" menu and select your Cirrus login account
  3. The \"Login account details\" table lists any user or group quotas that are linked with your account. (If there is no quota shown for a row then you have an unlimited quota for that item, but you may still be limited by another quota.)

Quota and usage data on SAFE is updated twice daily so may not be exactly up to date with the situation on the system itself.

"},{"location":"user-guide/data/#from-the-command-line","title":"From the command line","text":"

Some useful information on the current contents of directories on the /home file system is available from the command line by using the Ceph command getfattr. This is to be preferred over standard Unix commands such as du for reasons of efficiency.

For example, the number of entries (files plus directories) in a home directory can be queried via

$ cd\n$ getfattr -n ceph.dir.entries .\n# file: .\nceph.dir.entries=\"33\"\n

The corresponding attribute rentries gives the recursive total in all subdirectories, that is, the total number of files and directories:

$ getfattr -n ceph.dir.rentries .\n# file: .\nceph.dir.rentries=\"1619179\"\n

Other useful attributes (all prefixed with ceph.dir.) include files which is the number of ordinary files, subdirs the number of subdirectories, and bytes the total number of bytes used. All these have a corresponding recursive version, respectively: rfiles, rsubdirs, and rbytes.

A full path name can be specified if required.
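
The rbytes value is a raw byte count, so it can help to convert it to more readable units. The sketch below assumes the GNU numfmt utility is available and that getfattr supports the --only-values option; the function names human_bytes and dir_usage are hypothetical.

```shell
#!/bin/bash
# Convert a byte count to binary units (K, M, G, ...) using GNU numfmt.
human_bytes() {
    numfmt --to=iec "$1"
}

# Report the recursive size of a directory on the /home file system
# using the ceph.dir.rbytes attribute described above.
dir_usage() {
    # --only-values prints just the attribute value, without the
    # "# file:" header or the attribute name.
    human_bytes "$(getfattr --only-values -n ceph.dir.rbytes "$1")"
}

# Example (only meaningful on a Ceph file system such as /home):
#   dir_usage "$HOME"
```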

"},{"location":"user-guide/data/#work-file-system","title":"Work file system","text":"

Every project has an allocation on the work file system and your project's space can always be accessed via the path /work/[project-code]. The work file system is approximately 400 TB in size and is implemented using the Lustre parallel file system technology. It is designed to support data in large files. The performance for data stored in large numbers of small files is probably not going to be as good.

There are currently no backups of any data on the work file system.

Ideally, the work file system should only contain data that is:

In practice it may be convenient to keep copies of datasets on the work file system that you know will be needed at a later date. However, make sure that important data is always backed up elsewhere and that your work would not be significantly impacted if the data on the work file system was lost.

If you have data on the work file system that you are not going to need in the future please delete it.

"},{"location":"user-guide/data/#quotas-on-the-work-file-system","title":"Quotas on the work file system","text":"

Tip

The capacity of the home file system is much larger than the work file system so you should store most data on home and only move data to work that you need for current running work.

As for the home file system, all projects are assigned a quota on the work file system. The project PI or manager can split this quota up between groups of users if they wish.

You can view any work file system quotas that apply to your account by logging into SAFE and navigating to the page for your Cirrus login account.

  1. Log into SAFE
  2. Use the \"Login accounts\" menu and select your Cirrus login account
  3. The \"Login account details\" table lists any user or group quotas that are linked with your account. (If there is no quota shown for a row then you have an unlimited quota for that item, but you may still be limited by another quota.)

Quota and usage data on SAFE is updated twice daily so may not be exactly up to date with the situation on the system itself.

You can also examine up to date quotas and usage on the Cirrus system itself using the lfs quota command. To do this:

Change directory to the work directory where you want to check the quota. For example, if I wanted to check the quota for user auser in project t01 then I would:

[auser@cirrus-login1 auser]$ cd /work/t01/t01/auser\n\n[auser@cirrus-login1 auser]$ lfs quota -hu auser .\nDisk quotas for usr auser (uid 68826):\n     Filesystem    used   quota   limit   grace   files   quota   limit   grace\n              .  5.915G      0k      0k       -   51652       0       0       -\nuid 68826 is using default block quota setting\nuid 68826 is using default file quota setting\n

The quota and limit of 0k here indicate that no user quota is set for this user.

To check your project (group) quota, you would use the command:

[auser@cirrus-login1 auser]$ lfs quota -hg t01 .\nDisk quotas for grp t01 (gid 37733):\n     Filesystem    used   quota   limit   grace   files   quota   limit   grace\n           .  958.3G      0k  13.57T       - 1427052       0       0       -\ngid 37733 is using default file quota setting\n

The limit of 13.57T indicates the quota for the group.

"},{"location":"user-guide/data/#solid-state-storage","title":"Solid state storage","text":"

More information on using the solid state storage can be found in the /user-guide/solidstate section of the user guide.

The solid state storage is not backed up.

"},{"location":"user-guide/data/#accessing-cirrus-data-from-before-march-2022","title":"Accessing Cirrus data from before March 2022","text":"

Prior to the March 2022 Cirrus upgrade, all user data on the /lustre/sw filesystem was archived. Users can access their archived data from the Cirrus login nodes in the /home-archive directory. Assuming you are user auser from project x01, your pre-rebuild archived data can be found in:

/home-archive/x01/auser\n

The data in the /home-archive file system is read only meaning that you will not be able to create, edit, or copy new information to this file system.

To make archived data visible from the compute nodes, you will need to copy the data from the /home-archive file system to the /home file system. Assuming again that you are user auser from project x01 and that you were wanting to copy data from /home-archive/x01/auser/directory_to_copy to /home/x01/x01/auser/destination_directory, you would do this by running:

cp -r /home-archive/x01/auser/directory_to_copy \\\n   /home/x01/x01/auser/destination_directory\n

Note that the project code appears once in the path for the old home archive and twice in the path on the new /home file system.

Note

The capacity of the home file system is much larger than the work file system so you should move data to home rather than work.

"},{"location":"user-guide/data/#archiving","title":"Archiving","text":"

If you have related data that consists of a large number of small files, it is strongly recommended to pack the files into a larger \"archive\" file for ease of transfer and manipulation. A single large file makes more efficient use of the file system and is easier to move, copy, and transfer because significantly fewer meta-data operations are required. Archive files can be created using tools like tar and zip.

"},{"location":"user-guide/data/#tar","title":"tar","text":"

The tar command packs files into a \"tape archive\" format. The command has general form:

tar [options] [file(s)]\n

Common options include:

Putting these together:

tar -cvWlf mydata.tar mydata\n

will create and verify an archive.

To extract files from a tar file, the option -x is used. For example:

tar -xf mydata.tar\n

will recover the contents of mydata.tar to the current working directory.

To verify an existing tar file against a set of data, the -d (diff) option can be used. By default, no output is given if verification succeeds. An example of a failed verification follows:

$> tar -df mydata.tar mydata/*\nmydata/damaged_file: Mod time differs\nmydata/damaged_file: Size differs\n

Note

tar files do not store checksums with their data, requiring the original data to be present during verification.

Tip

Further information on using tar can be found in the tar manual (accessed via man tar or online).
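Because tar does not store checksums, one common workaround is to write a checksum manifest next to the archive so that an unpacked copy can be checked without the original data. The following is a minimal sketch, assuming GNU coreutils (sha256sum) is available; the file names, the manifest name mydata.sha256 and the restore directory are illustrative only.

```shell
# Work in a scratch directory so nothing outside it is touched.
cd $(mktemp -d)
mkdir -p mydata restore
echo 'example payload' > mydata/file_a
echo 'more payload' > mydata/file_b

# Record a checksum for every file before archiving.
find mydata -type f -exec sha256sum {} + > mydata.sha256

# Create the archive, then unpack it somewhere else.
tar -cf mydata.tar mydata
tar -xf mydata.tar -C restore

# Verify the unpacked copy against the manifest; the original
# mydata directory is not needed for this step.
(cd restore && sha256sum --quiet -c ../mydata.sha256) && verify_status=ok
echo verification: $verify_status
```

Keeping the manifest alongside the archive means the check can be repeated on any machine the archive is copied to.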

"},{"location":"user-guide/data/#zip","title":"zip","text":"

The zip file format is widely used for archiving files and is supported by most major operating systems. The utility to create zip files can be run from the command line as:

zip [options] mydata.zip [file(s)]\n

Common options are:

Together:

zip -0r mydata.zip mydata\n

will create an archive.

Note

Unlike tar, zip files do not preserve hard links. File data will be copied on archive creation, e.g. an uncompressed zip archive of a 100MB file and a hard link to that file will be approximately 200MB in size. This makes zip an unsuitable format if you wish to precisely reproduce the file system layout.

The corresponding unzip command is used to extract data from the archive. The simplest use case is:

unzip mydata.zip\n

which recovers the contents of the archive to the current working directory.

Files in a zip archive are stored with a CRC checksum to help detect data loss. unzip provides options for verifying this checksum against the stored files. The relevant flag is -t and is used as follows:

$> unzip -t mydata.zip\nArchive:  mydata.zip\n    testing: mydata/                 OK\n    testing: mydata/file             OK\nNo errors detected in compressed data of mydata.zip.\n

Tip

Further information on using zip can be found in the zip manual (accessed via man zip or online).

"},{"location":"user-guide/data/#data-transfer","title":"Data transfer","text":""},{"location":"user-guide/data/#before-you-start","title":"Before you start","text":"

Read Harry Mangalam's guide on How to transfer large amounts of data via network. This tells you all you want to know about transferring data.

"},{"location":"user-guide/data/#data-transfer-via-ssh","title":"Data Transfer via SSH","text":"

The easiest way of transferring data to/from Cirrus is to use one of the standard programs based on the SSH protocol such as scp, sftp or rsync. These all use the same underlying mechanism (ssh) as you normally use to log in to Cirrus. So, once the command has been executed via the command line, you will be prompted for your password for the specified account on the remote machine.

To avoid having to type in your password multiple times you can set up an SSH key as documented in the User Guide section on connecting.

"},{"location":"user-guide/data/#ssh-transfer-performance-considerations","title":"SSH Transfer Performance Considerations","text":"

The ssh protocol encrypts all traffic it sends. This means that file transfer using ssh consumes a relatively large amount of CPU time at both ends of the transfer. The encryption algorithm used is negotiated between the ssh client and the ssh server. There are command line flags that allow you to specify a preference for which encryption algorithm should be used. You may be able to improve transfer speeds by requesting a different algorithm than the default. The aes128-ctr algorithm is usually quite fast, assuming both hosts support it.

A single ssh based transfer will usually not be able to saturate the available network bandwidth or the available disk bandwidth so you may see an overall improvement by running several data transfer operations in parallel. To reduce metadata interactions it is a good idea to overlap transfers of files from different directories.
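As a rough illustration of running several transfers at once, the sketch below starts one rsync per top-level directory in the background and waits for all of them. The directory names dir_a, dir_b and dir_c are hypothetical, and each command is prefixed with echo so that it is only printed; remove the echo to perform real transfers.

```shell
# File that collects the planned commands (a dry run).
plan_file=$(mktemp)

# Start one transfer per directory in the background.
# Each directory is handled by its own rsync, so transfers of
# files from different directories overlap.
for d in dir_a dir_b dir_c; do
    echo rsync -a -e ssh $d/ user@login.cirrus.ac.uk:$d/ &
done > $plan_file

# Block until every background transfer has finished.
wait
cat $plan_file
```

Remember to replace user with your Cirrus username. A small number of simultaneous transfers is usually enough; starting many more than the number of available cores is unlikely to help because each transfer also spends CPU time on encryption.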

In addition, you should consider the following when transferring data.

"},{"location":"user-guide/data/#scp-command","title":"scp command","text":"

The scp command creates a copy of a file, or if given the -r flag, a directory, on a remote machine.

For example, to transfer files to Cirrus:

scp [options] source user@login.cirrus.ac.uk:[destination]\n

(Remember to replace user with your Cirrus username in the example above.)

In the above example, the [destination] is optional, as when left out scp will simply copy the source into the user's home directory. Also the source should be the absolute path of the file/directory being copied or the command should be executed in the directory containing the source file/directory.

Tip

If your local version of OpenSSH (the software suite that provides scp) is very new you may see errors transferring data to Cirrus using scp where the version of OpenSSH is older. The errors typically look like scp: upload \"mydata\": path canonicalization failed. You can get around this issue by adding the -O option to scp.

If you want to request a different encryption algorithm add the -c [algorithm-name] flag to the scp options. For example, to use the (usually faster) aes128-ctr encryption algorithm you would use:

scp [options] -c aes128-ctr source user@login.cirrus.ac.uk:[destination]\n

(Remember to replace user with your Cirrus username in the example above.)

"},{"location":"user-guide/data/#rsync-command","title":"rsync command","text":"

The rsync command can also transfer data between hosts using a ssh connection. It creates a copy of a file or, if given the -r flag, a directory at the given destination, similar to scp above.

Given the -a option, rsync can also make exact copies (including permissions); this is referred to as mirroring. In this case the rsync command is executed with ssh to create the copy on a remote machine.

To transfer files to Cirrus using rsync the command should have the form:

rsync [options] -e ssh source user@login.cirrus.ac.uk:[destination]\n

(Remember to replace user with your Cirrus username in the example above.)

In the above example, the [destination] is optional, as when left out rsync will simply copy the source into the user's home directory. Also the source should be the absolute path of the file/directory being copied or the command should be executed in the directory containing the source file/directory.

Additional flags can be specified for the underlying ssh command by using a quoted string as the argument of the -e flag, e.g.

rsync [options] -e \"ssh -c aes128-ctr\" source user@login.cirrus.ac.uk:[destination]\n

(Remember to replace user with your Cirrus username in the example above.)

"},{"location":"user-guide/data/#data-transfer-using-rclone","title":"Data transfer using rclone","text":"

Rclone is a command-line program to manage files on cloud storage. You can transfer files directly to/from cloud storage services, such as MS OneDrive and Dropbox. The program preserves timestamps and verifies checksums at all times.

First of all, you must download and unzip rclone on Cirrus:

wget https://downloads.rclone.org/v1.62.2/rclone-v1.62.2-linux-amd64.zip\nunzip rclone-v1.62.2-linux-amd64.zip\ncd rclone-v1.62.2-linux-amd64/\n

The previous code snippet uses rclone v1.62.2, which was the latest version when these instructions were written.

Configure rclone using ./rclone config. This will guide you through an interactive setup process where you can make a new remote (called remote). See the following for detailed instructions for:

Please note that a token is required to connect from Cirrus to the cloud service. You need a web browser to get the token. The recommendation is to run rclone on your laptop using rclone authorize, get the token, and then copy the token from your laptop to Cirrus. The rclone website contains further instructions on configuring rclone on a remote machine without a web browser.

Once all the above is done, you\u2019re ready to go. If you want to copy a directory, please use:

rclone copy <cirrus_directory> remote:<cloud_directory>\n

Please note that \u201cremote\u201d is the name that you have chosen when running rclone config. To copy files, please use:

rclone copyto <cirrus_file> remote:<cloud_file>\n

Note

If the session times out while the data transfer takes place, adding the -vv flag to an rclone transfer forces rclone to output to the terminal and therefore avoids triggering the timeout process.

"},{"location":"user-guide/development/","title":"Application Development Environment","text":"

The application development environment on Cirrus is primarily controlled through the modules environment. By loading and switching modules you control the compilers, libraries and software available.

This means that for compiling on Cirrus you typically set the compiler you wish to use using the appropriate modules, then load all the required library modules (e.g. numerical libraries, IO format libraries).

Additionally, if you are compiling parallel applications using MPI (or SHMEM, etc.) then you will need to load one of the MPI environments and use the appropriate compiler wrapper scripts.

By default, all users on Cirrus start with no modules loaded.

Basic usage of the module command on Cirrus is covered below. For full documentation please see:

"},{"location":"user-guide/development/#using-the-modules-environment","title":"Using the modules environment","text":""},{"location":"user-guide/development/#information-on-the-available-modules","title":"Information on the available modules","text":"

Finding out which modules (and hence which compilers, libraries and software) are available on the system is performed using the module avail command:

[user@cirrus-login0 ~]$ module avail\n...\n

This will list all the names and versions of the modules available on the service. Not all of them may work in your account though due to, for example, licensing restrictions. You will notice that for many modules we have more than one version, each of which is identified by a version number. One of these versions is the default. As the service develops the default version will change.

You can list all the modules of a particular type by providing an argument to the module avail command. For example, to list all available versions of the Intel Compiler type:

[user@cirrus-login0 ~]$ module avail intel-*/compilers\n\n--------------------------------- /work/y07/shared/cirrus-modulefiles --------------------------------\nintel-19.5/compilers  intel-20.4/compilers\n

If you want more info on any of the modules, you can use the module help command:

[user@cirrus-login0 ~]$ module help mpt\n\n-------------------------------------------------------------------\nModule Specific Help for /usr/share/Modules/modulefiles/mpt/2.25:\n\nThe HPE Message Passing Toolkit (MPT) is an optimized MPI\nimplementation for HPE systems and clusters.  See the\nMPI(1) man page and the MPT User's Guide for more\ninformation.\n-------------------------------------------------------------------\n

The simple module list command will give the names of the modules and their versions you have presently loaded in your environment, e.g.:

[user@cirrus-login0 ~]$ module list\nCurrently Loaded Modulefiles:\n1) git/2.35.1(default)                                  \n2) epcc/utils\n3) /mnt/lustre/e1000/home/y07/shared/cirrus-modulefiles/epcc/setup-env\n
"},{"location":"user-guide/development/#loading-unloading-and-swapping-modules","title":"Loading, unloading and swapping modules","text":"

To load a module, use module add or module load. For example, to load the Intel 20.4 compilers into the development environment:

module load intel-20.4/compilers\n

This will load version 20.4 of the Intel compilers.

If a module loading file cannot be accessed within 10 seconds, a warning message will appear: Warning: Module system not loaded.

If you want to clean up, module remove will remove a loaded module:

module remove intel-20.4/compilers\n

You could also run module rm intel-20.4/compilers or module unload intel-20.4/compilers. There are many situations in which you might want to change the presently loaded version to a different one, such as trying the latest version which is not yet the default or using a legacy version to keep compatibility with old data. This can be achieved most easily by using \"module swap oldmodule newmodule\".

Suppose you have loaded version 19 of the Intel compilers; the following command will change to version 20:

module swap intel-19.5/compilers intel-20.4/compilers\n
"},{"location":"user-guide/development/#available-compiler-suites","title":"Available Compiler Suites","text":"

Note

As Cirrus uses dynamic linking by default you will generally also need to load any modules you used to compile your code in your job submission script when you run your code.

"},{"location":"user-guide/development/#intel-compiler-suite","title":"Intel Compiler Suite","text":"

The Intel compiler suite is accessed by loading the intel-*/compilers module, where * references the version. For example, to load the v20 release, you would run:

module load intel-20.4/compilers\n

Once you have loaded the module, the compilers are available as:

See the extended section below for further details of available Intel compiler versions and tools.

"},{"location":"user-guide/development/#gcc-compiler-suite","title":"GCC Compiler Suite","text":"

The GCC compiler suite is accessed by loading the gcc/* modules, where * again is the version. For example, to load version 10.2.0 you would run:

module load gcc/10.2.0\n

Once you have loaded the module, the compilers are available as:

"},{"location":"user-guide/development/#compiling-mpi-codes","title":"Compiling MPI codes","text":"

MPI on Cirrus is currently provided by the HPE MPT library.

You should also consult the chapter on running jobs through the batch system for examples of how to run jobs compiled against MPI.

Note

By default, all compilers produce dynamic executables on Cirrus. This means that you must load the same modules at runtime (usually in your job submission script) as you have loaded at compile time.

"},{"location":"user-guide/development/#using-hpe-mpt","title":"Using HPE MPT","text":"

To compile MPI code with HPE MPT, using any compiler, you must first load the \"mpt\" module.

module load mpt\n

This makes the compiler wrapper scripts mpicc, mpicxx and mpif90 available to you.

What you do next depends on which compiler (Intel or GCC) you wish to use to compile your code.

Note

We recommend that you use the Intel compiler wherever possible to compile MPI applications as this is the method officially supported and tested by HPE.

Note

You can always check which compiler the MPI compiler wrapper scripts are using with, for example, mpicc -v or mpif90 -v.

"},{"location":"user-guide/development/#using-intel-compilers-and-hpe-mpt","title":"Using Intel Compilers and HPE MPT","text":"

Once you have loaded the MPT module you should next load the Intel compilers module you intend to use (e.g. intel-20.4/compilers):

module load intel-20.4/compilers\n

The compiler wrappers are then available as

Note

The MPT compiler wrappers use GCC by default rather than the Intel compilers:

When compiling C applications you must also specify that mpicc should use the icc compiler with, for example, mpicc -cc=icc. Similarly, when compiling C++ applications you must also specify that mpicxx should use the icpc compiler with, for example, mpicxx -cxx=icpc. (This is not required for Fortran as the mpif90 compiler automatically uses ifort.) If in doubt use mpicc -cc=icc -v or mpicxx -cxx=icpc -v to see which compiler is actually being called.

Alternatively, you can set the environment variables MPICC_CC=icc and/or MPICXX_CXX=icpc to ensure the correct base compiler is used:

export MPICC_CC=icc\nexport MPICXX_CXX=icpc\n
"},{"location":"user-guide/development/#using-gcc-compilers-and-hpe-mpt","title":"Using GCC Compilers and HPE MPT","text":"

Once you have loaded the MPT module you should next load the gcc module:

module load gcc\n

Compilers are then available as

Note

HPE MPT does not support the syntax use mpi in Fortran applications with the GCC compiler gfortran. You should use the older include \"mpif.h\" syntax when using GCC compilers with mpif90. If you cannot change this, then use the Intel compilers with MPT.

"},{"location":"user-guide/development/#using-intel-mpi","title":"Using Intel MPI","text":"

Although HPE MPT remains the default MPI library and we recommend that first attempts at building code follow that route, you may also choose to use Intel MPI if you wish. To use it, load the appropriate MPI module, for example intel-20.4/mpi:

module load intel-20.4/mpi\n

Please note that the name of the wrappers to use when compiling with Intel MPI depends on whether you are using the Intel compilers or GCC. You should make sure that you or any tools use the correct ones when building software.

Note

Although Intel MPI is available on Cirrus, HPE MPT remains the recommended and default MPI library to use when building applications.

"},{"location":"user-guide/development/#using-intel-compilers-and-intel-mpi","title":"Using Intel Compilers and Intel MPI","text":"

After first loading Intel MPI, you should next load the appropriate Intel compilers module (e.g. intel-20.4/compilers):

module load intel-20.4/compilers\n

You may then use the following MPI compiler wrappers:

"},{"location":"user-guide/development/#using-gcc-compilers-and-intel-mpi","title":"Using GCC Compilers and Intel MPI","text":"

After loading Intel MPI, you should next load the gcc module you wish to use:

module load gcc\n

You may then use these MPI compiler wrappers:

"},{"location":"user-guide/development/#using-openmpi","title":"Using OpenMPI","text":"

There are a number of OpenMPI modules available on Cirrus; these can be listed by running module avail openmpi. You'll notice that the majority of these modules are intended for use on the GPU nodes.

The fact that OpenMPI is open source means that we have full control over how the OpenMPI libraries are built. Indeed the OpenMPI configure script supports a wealth of options that allow us to build OpenMPI for a specific CUDA version, one that is fully compatible with the underlying NVIDIA GPU device driver. See the link below for an example of how an OpenMPI build is configured.

Build instructions for OpenMPI 4.1.6 on Cirrus

All this means we can build OpenMPI such that it supports direct GPU-to-GPU communications using the NVLink intra-node GPU comm links (and inter-node GPU comms are direct to Infiniband instead of passing through the host processor).

Hence, the OpenMPI GPU modules allow the user to run GPU-aware MPI code as efficiently as possible; see Compiling and using GPU-aware MPI.

OpenMPI modules for use on the CPU nodes are also available, but these are not expected to provide any performance advantage over HPE MPT or Intel MPI.

"},{"location":"user-guide/development/#compiler-information-and-options","title":"Compiler Information and Options","text":"

The manual pages for the different compiler suites are available:

GCC Fortran man gfortran , C/C++ man gcc

Intel Fortran man ifort , C/C++ man icc

"},{"location":"user-guide/development/#useful-compiler-options","title":"Useful compiler options","text":"

Whilst different codes will benefit from compiler optimisations in different ways, for reasonable performance on Cirrus, at least initially, we suggest the following compiler options:

Intel -O2

GNU -O2 -ftree-vectorize -funroll-loops -ffast-math

When you have an application that you are happy is working correctly and has reasonable performance you may wish to investigate some more aggressive compiler optimisations. Below is a list of some further optimisations that you can try on your application (Note: these optimisations may result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions):

Intel -fast

GNU -Ofast -funroll-loops

Vectorisation, which is one of the important compiler optimisations for Cirrus, is enabled by default as follows:

Intel At -O2 and above

GNU At -O3 and above or when using -ftree-vectorize

To promote integer and real variables from four to eight byte precision for Fortran codes the following compiler flags can be used:

Intel -real-size 64 -integer-size 64 -xAVX (Sometimes the Intel compiler incorrectly generates AVX2 instructions if the -real-size 64 or -r8 options are set. Using the -xAVX option prevents this.)

GNU -freal-4-real-8 -finteger-4-integer-8

"},{"location":"user-guide/development/#using-static-linkinglibraries","title":"Using static linking/libraries","text":"

By default, executables on Cirrus are built using shared/dynamic libraries (that is, libraries which are loaded at run-time as and when needed by the application) when using the wrapper scripts.

An application compiled this way to use shared/dynamic libraries will use the default version of the library installed on the system (just like any other Linux executable), even if the system modules were set differently at compile time. This means that the application may potentially be using slightly different object code each time the application runs as the defaults may change. This is usually the desired behaviour for many applications as any fixes or improvements to the default linked libraries are used without having to recompile the application, however some users may feel this is not the desired behaviour for their applications.

Alternatively, applications can be compiled to use static libraries (i.e. all of the object code of referenced libraries are contained in the executable file). This has the advantage that once an executable is created, whenever it is run in the future, it will always use the same object code (within the limit of changing runtime environment). However, executables compiled with static libraries have the potential disadvantage that when multiple instances are running simultaneously multiple copies of the libraries used are held in memory. This can lead to large amounts of memory being used to hold the executable and not application data.

To create an application that uses static libraries you must pass an extra flag during compilation, -Bstatic.

Use the UNIX command ldd exe_file to check whether you are using an executable that depends on shared libraries. This utility will also report the shared libraries this executable will use if it has been dynamically linked.
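The ldd check can be wrapped in a small helper for use in build scripts. This is a minimal sketch that relies on ldd exiting with a non-zero status for executables that are not dynamically linked; the function name is_dynamic is hypothetical and /bin/ls is used only as a stand-in for your own executable.

```shell
# Succeeds if the given executable depends on shared libraries.
# ldd exits with a non-zero status for statically linked binaries.
is_dynamic() {
    ldd $1 > /dev/null 2>&1
}

# /bin/ls stands in for your own executable here.
if is_dynamic /bin/ls; then
    link_type=dynamic
else
    link_type=static
fi
echo /bin/ls is $link_type
```

Run ldd on the executable directly when you want to see the list of shared libraries rather than a yes or no answer.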

"},{"location":"user-guide/development/#intel-modules-and-tools","title":"Intel modules and tools","text":"

There are a number of different Intel compiler versions available, and there is also a slight difference in the way different versions appear.

A full list is available via module avail intel.

The different available compiler versions are:

We recommend the most up-to-date version in the first instance, unless you have particular reasons for preferring an older version.

For a note on Intel compiler version numbers, see this Intel page

The different module names (or parts thereof) indicate:

"},{"location":"user-guide/gpu/","title":"Using the Cirrus GPU Nodes","text":"

Cirrus has 38 GPU compute nodes each equipped with 4 NVIDIA V100 (Volta) GPU cards. This section of the user guide gives some details of the hardware; it also covers how to compile and run standard GPU applications.

The GPU cards on Cirrus do not support graphics rendering tasks; they are set to compute cluster mode and so only support computational tasks.

"},{"location":"user-guide/gpu/#hardware-details","title":"Hardware details","text":"

All of the Cirrus GPU nodes contain four Tesla V100-SXM2-16GB (Volta) cards. Each card has 16 GB of high-bandwidth memory, HBM, often referred to as device memory. Maximum device memory bandwidth is in the region of 900 GB per second. Each card has 5,120 CUDA cores and 640 Tensor cores.

There is one GPU Slurm partition installed on Cirrus called simply gpu. The 36 nodes in this partition have the Intel Cascade Lake architecture. Users concerned with host performance should add the specific compilation options appropriate for the processor.

The Cascade Lake nodes have two 20-core sockets (2.5 GHz) and a total of 384 GB host memory (192 GB per socket). Each core supports two threads in hardware.

For further details of the V100 architecture see https://www.nvidia.com/en-gb/data-center/tesla-v100/.

"},{"location":"user-guide/gpu/#compiling-software-for-the-gpu-nodes","title":"Compiling software for the GPU nodes","text":""},{"location":"user-guide/gpu/#nvidia-hpc-sdk","title":"NVIDIA HPC SDK","text":"

NVIDIA now make regular releases of a unified HPC SDK which provides the relevant compilers and libraries needed to build and run GPU programs. Versions of the SDK are available via the module system.

$ module avail nvidia/nvhpc\n

NVIDIA encourage the use of the latest available version, unless there are particular reasons to use earlier versions. The default version is therefore the latest module version present on the system.

Each release of the NVIDIA HPC SDK may include several different versions of the CUDA toolchain. Only one of these CUDA toolchains can be active at any one time and for nvhpc/22.11 this is CUDA 11.8.

Here is a list of available HPC SDK versions, and the corresponding version of CUDA:

Module | Supported CUDA Version
nvidia/nvhpc/22.11 | CUDA 11.8
nvidia/nvhpc/22.2 | CUDA 11.6

To load the latest NVIDIA HPC SDK use

$ module load nvidia/nvhpc\n

The following sections provide some details of compilation for different programming models.

"},{"location":"user-guide/gpu/#cuda","title":"CUDA","text":"

CUDA is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs).

Programs, typically written in C or C++, are compiled with nvcc. As well as nvcc, a host compiler is required. By default, a gcc module is added when nvidia/nvhpc is loaded.

Compile your source code in the usual way.

nvcc -arch=sm_70 -o cuda_test.x cuda_test.cu\n

Note

The -arch=sm_70 compile option ensures that the binary produced is compatible with the NVIDIA Volta architecture.

"},{"location":"user-guide/gpu/#using-cuda-with-intel-compilers","title":"Using CUDA with Intel compilers","text":"

You can load either the Intel 19 or Intel 20 compilers to use with nvcc.

module unload gcc\nmodule load intel-20.4/compilers\n

You can now use nvcc -ccbin icpc to compile your source code with the Intel C++ compiler icpc.

nvcc -arch=sm_70 -ccbin icpc -o cuda_test.x cuda_test.cu\n
"},{"location":"user-guide/gpu/#compiling-openacc-code","title":"Compiling OpenACC code","text":"

OpenACC is a directive-based approach to introducing parallelism into either C/C++ or Fortran codes. A code with OpenACC directives may be compiled like so.

$ module load nvidia/nvhpc\n$ nvc program.c\n\n$ nvc++ program.cpp\n

Note that nvc and nvc++ are distinct from the NVIDIA CUDA compiler nvcc. They provide a way to compile standard C or C++ programs without explicit CUDA content. See man nvc or man nvc++ for further details.

"},{"location":"user-guide/gpu/#cuda-fortran","title":"CUDA Fortran","text":"

CUDA Fortran provides extensions to standard Fortran which allow GPU functionality. CUDA Fortran files (with file extension .cuf) may be compiled with the NVIDIA Fortran compiler.

$ module load nvidia/nvhpc\n$ nvfortran program.cuf\n

See man nvfortran for further details.

"},{"location":"user-guide/gpu/#openmp-for-gpus","title":"OpenMP for GPUs","text":"

The OpenMP API supports multi-platform shared-memory parallel programming in C/C++ and Fortran and can offload computation from the host (i.e. CPU) to one or more target devices (such as the GPUs on Cirrus). OpenMP code can be compiled with the NVIDIA compilers in a similar manner to OpenACC. To enable this functionality, you must add -mp=gpu to your compile command.

$ module load nvidia/nvhpc\n$ nvc++ -mp=gpu program.cpp\n

You can specify exactly which GPU to target with the -gpu flag. For example, the Volta cards on Cirrus use the flag -gpu=cc70.

During development it can be useful to have the compiler report information about how it is processing OpenMP pragmas. This can be enabled by the use of -Minfo=mp, see below.

nvc -mp=gpu -Minfo=mp testprogram.c\nmain:\n24, #omp target teams distribute parallel for thread_limit(128)\n24, Generating Tesla and Multicore code\nGenerating \"nvkernel_main_F1L88_2\" GPU kernel\n26, Loop parallelized across teams and threads(128), schedule(static)\n
"},{"location":"user-guide/gpu/#submitting-jobs-to-the-gpu-nodes","title":"Submitting jobs to the GPU nodes","text":"

To run a GPU job, a SLURM submission must specify a GPU partition and a quality of service (QoS) as well as the number of GPUs required. You specify the number of GPU cards you want using the --gres=gpu:N option, where N is typically 1, 2 or 4.

Note

As there are 4 GPUs per node, each GPU is associated with 1/4 of the resources of the node, i.e., 10/40 physical cores and roughly 91/384 GB in host memory.

Allocations of host resources are made pro-rata. For example, if 2 GPUs are requested, sbatch will allocate 20 cores and around 190 GB of host memory (in addition to 2 GPUs). Any attempt to use more than the allocated resources will result in an error.

This automatic allocation by SLURM for GPU jobs means that the submission script should not specify options such as --ntasks and --cpus-per-task. Such a job submission will be rejected. See below for some examples of how to use host resources and how to launch MPI applications.

If you specify the --exclusive option, you will automatically be allocated all host cores and all memory from the node irrespective of how many GPUs you request. This may be needed if the application has a large host memory requirement.

If more than one node is required, both the --exclusive and --gres=gpu:4 options must be included in your submission script. It is, for example, not possible to request 6 GPUs other than via exclusive use of two nodes.

Warning

In order to run jobs on the GPU nodes your budget must have positive GPU hours and positive CPU core hours associated with it. However, only your GPU hours will be consumed when running these jobs.

"},{"location":"user-guide/gpu/#partitions","title":"Partitions","text":"

Your job script must specify a partition. The following table has a list of relevant GPU partition(s) on Cirrus.

Partition | Description | Maximum Job Size (Nodes)
gpu | GPU nodes with Cascade Lake processors | 36"},{"location":"user-guide/gpu/#quality-of-service-qos","title":"Quality of Service (QoS)","text":"

Your job script must specify a QoS relevant for the GPU nodes. Available QoS specifications are as follows.

QoS Name | Jobs Running Per User | Jobs Queued Per User | Max Walltime | Max Size | Partition
gpu | No limit | 128 jobs | 4 days | 64 GPUs | gpu
long | 5 jobs | 20 jobs | 14 days | 8 GPUs | gpu
short | 1 job | 2 jobs | 20 minutes | 4 GPUs | gpu
lowpriority | No limit | 100 jobs | 2 days | 16 GPUs | gpu
largescale | 1 job | 4 jobs | 24 hours | 144 GPUs | gpu"},{"location":"user-guide/gpu/#examples","title":"Examples","text":""},{"location":"user-guide/gpu/#job-submission-script-using-one-gpu-on-a-single-node","title":"Job submission script using one GPU on a single node","text":"

A job script that requires 1 GPU accelerator and 10 CPU cores for 20 minutes would look like the following.

#!/bin/bash\n#\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n#SBATCH --gres=gpu:1\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n\n# Load the required modules \nmodule load nvidia/nvhpc\n\nsrun ./cuda_test.x\n

This will execute one host process with access to one GPU. If we wish to make use of the 10 host cores in this allocation, we could use host threads via OpenMP.

export OMP_NUM_THREADS=10\nexport OMP_PLACES=cores\n\nsrun --ntasks=1 --cpus-per-task=10 --hint=nomultithread ./cuda_test.x\n

The launch configuration is specified directly to srun because, for the GPU partitions, it is not possible to do this via sbatch.

"},{"location":"user-guide/gpu/#job-submission-script-using-multiple-gpus-on-a-single-node","title":"Job submission script using multiple GPUs on a single node","text":"

A job script that requires 4 GPU accelerators and 40 CPU cores for 20 minutes would appear as follows.

#!/bin/bash\n#\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n#SBATCH --gres=gpu:4\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n\n# Load the required modules \nmodule load nvidia/nvhpc\n\nsrun ./cuda_test.x\n

A typical MPI application might assign one device per MPI process, in which case we would want 4 MPI tasks in this example. This would again be specified directly to srun.

srun --ntasks=4 ./mpi_cuda_test.x\n
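
The one-device-per-task mapping mentioned above can be sketched in Python as follows. This is only an illustration: it assumes each task can see every GPU on its node and that Slurm exports SLURM_LOCALID (the node-local task index) to each task. The helper name device_for_task is hypothetical and is not part of any Cirrus tooling.

```python
import os


def device_for_task(local_id, n_devices=4):
    """Return the GPU index a task could use, given its node-local task index.

    Hypothetical helper: tasks are assigned to devices round-robin.
    """
    if n_devices < 1:
        raise ValueError('n_devices must be at least 1')
    return local_id % n_devices


# SLURM_LOCALID is set by Slurm for each task; default to 0 outside a job.
local_id = int(os.environ.get('SLURM_LOCALID', '0'))
print('local task', local_id, 'would use device', device_for_task(local_id))
```

With four tasks and four GPUs per node, each task receives a distinct device index; the application would then select that device through its own CUDA or library call.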
"},{"location":"user-guide/gpu/#job-submission-script-using-multiple-gpus-on-multiple-nodes","title":"Job submission script using multiple GPUs on multiple nodes","text":"

See below for a job script that requires 8 GPU accelerators for 20 minutes.

#!/bin/bash\n#\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n#SBATCH --gres=gpu:4\n#SBATCH --nodes=2\n#SBATCH --exclusive\n#SBATCH --time=00:20:00\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n\n# Load the required modules \nmodule load nvidia/nvhpc\n\nsrun ./cuda_test.x\n

An MPI application with four MPI tasks per node would be launched as follows.

srun --ntasks=8 --tasks-per-node=4 ./mpi_cuda_test.x\n

Again, these options are specified directly to srun rather than being declared as sbatch directives.

Attempts to oversubscribe an allocation (that is, to request more than 10 cores per GPU) will fail and generate an error message such as the following.

srun: error: Unable to create step for job 234123: More processors requested\nthan permitted\n
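
A launch configuration can be checked against this limit before submitting. The following is a minimal sketch, assuming the only constraint is the 10 cores per GPU stated above; the function name and its arguments are illustrative and do not correspond to any Slurm interface.

```python
def fits_allocation(gpus_per_node, tasks_per_node, cpus_per_task, cores_per_gpu=10):
    """Return True if the cores requested per node fit within the GPU allocation."""
    cores_available = gpus_per_node * cores_per_gpu
    cores_requested = tasks_per_node * cpus_per_task
    return cores_requested <= cores_available


# Four tasks with 10 cores each fit on four GPUs (40 cores available).
print(fits_allocation(gpus_per_node=4, tasks_per_node=4, cpus_per_task=10))
# Four tasks with 11 cores each ask for 44 cores and do not fit.
print(fits_allocation(gpus_per_node=4, tasks_per_node=4, cpus_per_task=11))
```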
"},{"location":"user-guide/gpu/#debugging-gpu-applications","title":"Debugging GPU applications","text":"

Applications may be debugged using cuda-gdb. This is an extension of gdb which can be used with CUDA. We assume the reader is familiar with gdb.

First, compile the application with the -g -G flags in order to generate debugging information for both host and device code. Then, obtain an interactive session like so.

$ srun --nodes=1 --partition=gpu --qos=short --gres=gpu:1 \\\n       --time=0:20:0 --account=[budget code] --pty /bin/bash\n

Next, load the NVIDIA HPC SDK module and start cuda-gdb for your application.

$ module load nvidia/nvhpc\n$ cuda-gdb ./my-application.x\nNVIDIA (R) CUDA Debugger\n...\n(cuda-gdb)\n

Debugging then proceeds as usual. One can use the help facility within cuda-gdb to find details on the various debugging commands. Type quit to end your debug session followed by exit to close the interactive session.

Note, it may be necessary to set the temporary directory to somewhere in the user space (e.g., export TMPDIR=$(pwd)/tmp) to prevent unexpected internal CUDA driver errors.

For further information on CUDA-GDB, see https://docs.nvidia.com/cuda/cuda-gdb/index.html.

"},{"location":"user-guide/gpu/#profiling-gpu-applications","title":"Profiling GPU applications","text":"

NVIDIA provide two useful tools for profiling performance of applications: Nsight Systems and Nsight Compute; the former provides an overview of application performance, while the latter provides detailed information specifically on GPU kernels.

"},{"location":"user-guide/gpu/#using-nsight-systems","title":"Using Nsight Systems","text":"

Nsight Systems provides an overview of application performance and should therefore be the starting point for investigation. To run an application, compile as normal (including the -g flag) and then submit a batch job.

#!/bin/bash\n\n#SBATCH --time=00:10:00\n#SBATCH --nodes=1\n#SBATCH --exclusive  \n#SBATCH --partition=gpu\n#SBATCH --qos=short\n#SBATCH --gres=gpu:1\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n\nmodule load nvidia/nvhpc\n\nsrun -n 1 nsys profile -o prof1 ./my_application.x\n

The run should then produce an additional output file called, in this case, prof1.qdrep. The recommended way to view the contents of this file is to download the NVIDIA Nsight package to your own machine (you do not need the entire HPC SDK). Then copy the .qdrep file produced on Cirrus to your machine so that it can be viewed locally.

Note, a profiling run should probably be of a short duration so that the profile information (contained in the .qdrep file) does not become prohibitively large.

Details of the download of Nsight Systems and a user guide can be found via the links below.

https://developer.nvidia.com/nsight-systems

https://docs.nvidia.com/nsight-systems/UserGuide/index.html

If your code was compiled with the tools provided by nvidia/nvhpc/22.2 you should download and install Nsight Systems v2023.4.1.97.

"},{"location":"user-guide/gpu/#using-nsight-compute","title":"Using Nsight Compute","text":"

Nsight Compute may be used in a similar way to Nsight Systems. A job may be submitted like so.

#!/bin/bash\n\n#SBATCH --time=00:10:00\n#SBATCH --nodes=1\n#SBATCH --exclusive\n#SBATCH --partition=gpu\n#SBATCH --qos=short\n#SBATCH --gres=gpu:1\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n\nmodule load nvidia/nvhpc\n\nsrun -n 1 nv-nsight-cu-cli --section SpeedOfLight_RooflineChart \\\n                           -o prof2 -f ./my_application.x\n

In this case, a file called prof2.ncu-rep should be produced. Again, the recommended way to view this file is to download the Nsight Compute package to your own machine, along with the .ncu-rep file from Cirrus. The --section option determines which statistics are recorded (typically not all hardware counters can be accessed at the same time). A common starting point is --section MemoryWorkloadAnalysis.

Consult the NVIDIA documentation for further details.

https://developer.nvidia.com/nsight-compute

https://docs.nvidia.com/nsight-compute/2023.3/index.html

Nsight Compute v2023.3.1.0 has been found to work for codes compiled using nvhpc versions 22.2 and 22.11.

"},{"location":"user-guide/gpu/#monitoring-the-gpu-power-usage","title":"Monitoring the GPU Power Usage","text":"

NVIDIA also provides a useful command line utility for the management and monitoring of NVIDIA GPUs: the NVIDIA System Management Interface nvidia-smi.

The nvidia-smi command queries the available GPUs and reports current information, including but not limited to: driver versions, CUDA version, name, temperature, current power usage and maximum power capability. In this example output, there is one available GPU and it is idle:

  +-----------------------------------------------------------------------------+\n  | NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |\n  |-------------------------------+----------------------+----------------------+\n  | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |\n  | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |\n  |                               |                      |               MIG M. |\n  |===============================+======================+======================|\n  |   0  Tesla V100-SXM2...  Off  | 00000000:1C:00.0 Off |                  Off |\n  | N/A   38C    P0    57W / 300W |      0MiB / 16384MiB |      1%      Default |\n  |                               |                      |                  N/A |\n  +-------------------------------+----------------------+----------------------+\n\n  +-----------------------------------------------------------------------------+\n  | Processes:                                                                  |\n  |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |\n  |        ID   ID                                                   Usage      |\n  |=============================================================================|\n  |  No running processes found                                                 |\n  +-----------------------------------------------------------------------------+\n

To monitor power usage throughout a job, run nvidia-smi with the --loop=SEC option: the tool then reports data every SEC seconds, sleeping in between queries. The following command writes the output of nvidia-smi to the specified file every 10 seconds.

nvidia-smi --loop=10 --filename=out-nvidia-smi.txt &\n

Example submission script:

#!/bin/bash --login\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=lammps_Example\n#SBATCH --time=00:20:00\n#SBATCH --nodes=1\n#SBATCH --gres=gpu:4\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n\n# Load the required modules\nmodule load nvidia/nvhpc\n\n# Save the output of NVIDIA-SMI every 10 seconds\nnvidia-smi --loop=10 --filename=out-nvidia-smi.txt &\nsrun ./cuda_test.x\n

This submission script uses 4 GPU accelerators for 20 minutes, writing the output of nvidia-smi every 10 seconds to the out-nvidia-smi.txt output file. The & means the shell executes the command in the background.
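
Once the job has finished, the power readings can be pulled out of the log. The sketch below is one possible approach, assuming the log uses the table layout shown earlier, with power reported as whole watts in the form 57W / 300W; the function names are illustrative, and other nvidia-smi versions may format this column differently.

```python
import re

# Matches the 'Pwr:Usage/Cap' column, e.g. '57W / 300W'.
POWER_PATTERN = re.compile('([0-9]+)W */ *([0-9]+)W')


def power_readings(text):
    """Return a list of (usage_watts, cap_watts) pairs found in nvidia-smi output."""
    return [(int(usage), int(cap)) for usage, cap in POWER_PATTERN.findall(text)]


def summarise(readings):
    """Return (mean, peak) power usage in watts, or None if there are no readings."""
    if not readings:
        return None
    usages = [usage for usage, _ in readings]
    return sum(usages) / len(usages), max(usages)


# One row copied from the example output above.
sample = '| N/A   38C    P0    57W / 300W |      0MiB / 16384MiB |      1%      Default |'
print(power_readings(sample))
print(summarise(power_readings(sample)))
```

To process a real log, pass the contents of out-nvidia-smi.txt to power_readings in place of the sample string.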

Consult the NVIDIA documentation for further details.

https://developer.nvidia.com/nvidia-system-management-interface

"},{"location":"user-guide/gpu/#compiling-and-using-gpu-aware-mpi","title":"Compiling and using GPU-aware MPI","text":"

For applications using message passing via MPI, considerable improvements in performance may be available by allowing device memory references in MPI calls. This allows the relevant host-device transfers to be replaced by direct communication within a node via NVLink. Between nodes, MPI communication will remain limited by network latency and bandwidth.

Versions of OpenMPI with both CUDA-aware MPI support and Slurm support are available. To use them, load the following modules:

module load openmpi/4.1.6-cuda-11.6\nmodule load nvidia/nvhpc-nompi/22.2\n

The command you use to compile depends on whether you are compiling C/C++ or Fortran.

"},{"location":"user-guide/gpu/#compiling-cc","title":"Compiling C/C++","text":"

The location of the MPI include files and libraries must be specified explicitly, e.g.,

nvcc -I${MPI_HOME}/include  -L${MPI_HOME}/lib -lmpi -o my_program.x my_program.cu\n

This will produce an executable in the usual way.

"},{"location":"user-guide/gpu/#compiling-fortran","title":"Compiling Fortran","text":"

Use the mpif90 compiler wrapper to compile Fortran code for GPU, e.g.,

mpif90 -o my_program.x my_program.f90\n

This will produce an executable in the usual way.

"},{"location":"user-guide/gpu/#run-time","title":"Run time","text":"

A batch script to use such an executable might be:

#!/bin/bash\n\n#SBATCH --time=00:20:00\n\n#SBATCH --nodes=1\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n#SBATCH --gres=gpu:4\n\n# Load the appropriate modules, e.g.,\nmodule load openmpi/4.1.6-cuda-11.6\nmodule load nvidia/nvhpc-nompi/22.2\n\nexport OMP_NUM_THREADS=1\n\n# Required for correct operation with GPU-aware MPI\nexport OMPI_MCA_pml=ob1\n\nsrun --ntasks=4 --cpus-per-task=10 --hint=nomultithread ./my_program.x\n

Note that setting the environment variable OMPI_MCA_pml=ob1 is required for correct operation. As before, MPI and placement options should be specified directly to srun and not via SBATCH directives.

"},{"location":"user-guide/introduction/","title":"Introduction","text":"

This guide is designed to be a reference for users of the high-performance computing (HPC) facility: Cirrus. It provides all the information needed to access the system, transfer data, manage your resources (disk and compute time), submit jobs, compile programs and manage your environment.

"},{"location":"user-guide/introduction/#acknowledging-cirrus","title":"Acknowledging Cirrus","text":"

You should use the following phrase to acknowledge Cirrus in all research outputs that have used the facility:

This work used the Cirrus UK National Tier-2 HPC Service at EPCC (http://www.cirrus.ac.uk) funded by the University of Edinburgh and EPSRC (EP/P020267/1)

You should also tag outputs with the keyword Cirrus whenever possible.

"},{"location":"user-guide/introduction/#hardware","title":"Hardware","text":"

Details of the Cirrus hardware are available on the Cirrus website:

"},{"location":"user-guide/introduction/#useful-terminology","title":"Useful terminology","text":"

This is a list of terminology used throughout this guide and its meaning.

CPUh Cirrus CPU time is measured in CPUh. Each job you run on the service consumes CPUhs from your budget. You can find out more about CPUhs and how to track your usage in the resource management section

GPUh Cirrus GPU time is measured in GPUh. Each job you run on the GPU nodes consumes GPUhs from your budget. Your budget must also hold a positive CPUh balance, even though no CPUhs will be consumed. You can find out more about GPUhs and how to track your usage in the resource management section.
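
As a rough guide to how much budget a job might consume, the arithmetic can be sketched as below. This assumes that one GPUh corresponds to one GPU held for one hour of walltime; that reading is an assumption here, and the resource management section remains the authoritative description of charging.

```python
def gpu_hours(n_gpus, walltime_seconds):
    """Return GPU hours, assuming 1 GPUh = one GPU held for one hour."""
    return n_gpus * walltime_seconds / 3600.0


# Example: 4 GPUs for 20 minutes.
print(round(gpu_hours(4, 20 * 60), 2))
```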

"},{"location":"user-guide/network-upgrade-2023/","title":"Cirrus Network Upgrade: 2023","text":"

During September 2023 Cirrus will be undergoing a Network upgrade.

On this page we describe the impact this will have and provide links to further information.

If you have any questions or concerns, please contact the Cirrus Service Desk: https://www.cirrus.ac.uk/support/

"},{"location":"user-guide/network-upgrade-2023/#when-will-the-upgrade-happen-and-how-long-will-it-take","title":"When will the upgrade happen and how long will it take?","text":"

The outage dates will be:

We will notify users if we are able to complete this work ahead of schedule.

"},{"location":"user-guide/network-upgrade-2023/#what-are-the-impacts-on-users-from-the-upgrade","title":"What are the impacts on users from the upgrade?","text":"

During the upgrade process

Submitting new work, and running work

We will therefore be encouraging users to submit jobs to the queues in the period prior to the work, so that Cirrus can continue to run jobs during the outage.

"},{"location":"user-guide/network-upgrade-2023/#relaxing-of-queue-limits","title":"Relaxing of queue limits","text":"

In preparation for the Data Centre Network (DCN) upgrade we have relaxed the queue limits on all the QoS\u2019s, so that users can submit a significantly larger number of jobs to Cirrus. These changes are intended to allow users to submit jobs that they wish to run during the upgrade, in advance of the start of the upgrade. The changes will be in place until the end of the Data Centre Network upgrade.

"},{"location":"user-guide/network-upgrade-2023/#quality-of-service-qos","title":"Quality of Service (QoS)","text":"

The relaxed QoS limits that will be in force during the network upgrade are as follows.

| QoS Name | Jobs Running Per User | Jobs Queued Per User | Max Walltime | Max Size | Applies to Partitions | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| standard | No limit | 1000 jobs | 4 days | 88 nodes (3168 cores/25%) | standard | |
| largescale | 1 job | 20 jobs | 24 hours | 228 nodes (8192+ cores/65%) or 144 GPUs | standard, gpu | |
| long | 5 jobs | 40 jobs | 14 days | 16 nodes or 8 GPUs | standard, gpu | |
| highpriority | 10 jobs | 20 jobs | 4 days | 140 nodes | standard | |
| gpu | No limit | 256 jobs | 4 days | 64 GPUs (16 nodes/40%) | gpu | |
| lowpriority | No limit | 1000 jobs | 2 days | 36 nodes (1296 cores/10%) or 16 GPUs | standard, gpu | |"},{"location":"user-guide/python/","title":"Using Python","text":"

Python on Cirrus is provided by a number of Miniconda modules and one Anaconda module (Miniconda being a small bootstrap version of Anaconda).

The Anaconda module is called anaconda3/2023.9 and is suitable for running serial applications - for parallel applications using mpi4py see mpi4py for CPU or mpi4py for GPU.

You can list the Miniconda modules by running module avail python on a login node. Those module versions that have the gpu suffix are suitable for use on the Cirrus GPU nodes. There are also modules that extend these Python environments, e.g., pyfr, tensorflow and pytorch - simply run module help <module name> for further info.

The Miniconda modules support Python-based parallel codes, i.e., each such python module provides a suite of packages pertinent to parallel processing and numerical analysis such as dask, ipyparallel, jupyter, matplotlib, numpy, pandas and scipy.

All the packages provided by a module can be obtained by running pip list. We now give some examples that show how the python modules can be used on the Cirrus CPU/GPU nodes.

"},{"location":"user-guide/python/#mpi4py-for-cpu","title":"mpi4py for CPU","text":"

The python/3.9.13 module provides mpi4py 3.1.5 linked with OpenMPI 4.1.6.

See numpy-broadcast.py below, which is a simple MPI Broadcast example, and the Slurm script submit-broadcast.slurm, which demonstrates how to run it across two compute nodes.

numpy-broadcast.py
#!/usr/bin/env python\n\n\"\"\"\nParallel Numpy Array Broadcast \n\"\"\"\n\nfrom mpi4py import MPI\nimport numpy as np\nimport sys\n\ncomm = MPI.COMM_WORLD\n\nsize = comm.Get_size()\nrank = comm.Get_rank()\nname = MPI.Get_processor_name()\n\narraySize = 100\nif rank == 0:\n    data = np.arange(arraySize, dtype='i')\nelse:\n    data = np.empty(arraySize, dtype='i')\n\ncomm.Bcast(data, root=0)\n\nif rank == 0:\n    sys.stdout.write(\n        \"Rank %d of %d (%s) has broadcast %d integers.\\n\"\n        % (rank, size, name, arraySize))\nelse:\n    sys.stdout.write(\n        \"Rank %d of %d (%s) has received %d integers.\\n\"\n        % (rank, size, name, arraySize))\n\n    arrayBad = False\n    for i in range(100):\n        if data[i] != i:\n            arrayBad = True\n            break\n\n    if arrayBad:\n        sys.stdout.write(\n            \"Error, rank %d array is not as expected.\\n\"\n            % (rank))\n

The MPI initialisation is done automatically as a result of calling from mpi4py import MPI.

submit-broadcast.slurm
#!/bin/bash\n\n# Slurm job options (name, compute nodes, job time)\n#SBATCH --job-name=broadcast\n#SBATCH --time=00:20:00\n#SBATCH --exclusive\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n#SBATCH --account=[budget code]\n#SBATCH --nodes=2\n#SBATCH --tasks-per-node=36\n#SBATCH --cpus-per-task=1\n\nmodule load python/3.9.13\n\nexport OMPI_MCA_mca_base_component_show_load_errors=0\n\nsrun numpy-broadcast.py\n

The Slurm submission script (submit-broadcast.slurm) above sets an OMPI_MCA environment variable before launching the job. That particular variable suppresses warnings written to the job output file; it can of course be removed. Please see the OpenMPI documentation for info on all OMPI_MCA variables.

"},{"location":"user-guide/python/#mpi4py-for-gpu","title":"mpi4py for GPU","text":"

There's also an mpi4py module (again using OpenMPI 4.1.6) that is tailored for CUDA 11.6 on the Cirrus GPU nodes, python/3.9.13-gpu. We show below an example that features an MPI reduction performed on a CuPy array (cupy-allreduce.py).

cupy-allreduce.py
#!/usr/bin/env python\n\n\"\"\"\nReduce-to-all CuPy Arrays \n\"\"\"\n\nfrom mpi4py import MPI\nimport cupy as cp\nimport sys\n\ncomm = MPI.COMM_WORLD\n\nsize = comm.Get_size()\nrank = comm.Get_rank()\nname = MPI.Get_processor_name()\n\nsendbuf = cp.arange(10, dtype='i')\nrecvbuf = cp.empty_like(sendbuf)\nassert hasattr(sendbuf, '__cuda_array_interface__')\nassert hasattr(recvbuf, '__cuda_array_interface__')\ncp.cuda.get_current_stream().synchronize()\ncomm.Allreduce(sendbuf, recvbuf)\n\nassert cp.allclose(recvbuf, sendbuf*size)\n\nsys.stdout.write(\n    \"%d (%s): recvbuf = %s\\n\"\n    % (rank, name, str(recvbuf)))\n

By default, the CuPy cache is located within the user's home directory. As /home is not accessible from the GPU nodes, it is necessary to set CUPY_CACHE_DIR so that the cache is on the /work file system instead.

submit-allreduce.slurm
#!/bin/bash\n\n#SBATCH --job-name=allreduce\n#SBATCH --time=00:20:00\n#SBATCH --exclusive\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n#SBATCH --account=[budget code]\n#SBATCH --nodes=2\n#SBATCH --gres=gpu:4\n\nmodule load python/3.9.13-gpu\n\nexport CUPY_CACHE_DIR=${HOME/home/work}/.cupy/kernel_cache\n\nexport OMPI_MCA_mpi_warn_on_fork=0\nexport OMPI_MCA_mca_base_component_show_load_errors=0\n\nsrun --ntasks=8 --tasks-per-node=4 --cpus-per-task=1 cupy-allreduce.py\n

Again, the submission script (submit-allreduce.slurm) is the place to set OMPI_MCA variables - the two shown are optional, see the link below for further details.

https://www.open-mpi.org/faq/?category=tuning#mca-def

"},{"location":"user-guide/python/#machine-learning-frameworks","title":"Machine Learning frameworks","text":"

There are several more Python-based modules that also target the Cirrus GPU nodes. These include two machine learning frameworks, pytorch/1.13.1-gpu and tensorflow/2.15.0-gpu. Both modules are Python virtual environments that extend python/3.10.8-gpu. MPI communication is handled by the Horovod 0.28.1 package along with the NVIDIA Collective Communications Library v2.11.4.

A full package list for these environments can be obtained by loading the module of interest and then running pip list.

Please see the linked pages for examples of how to use the PyTorch and TensorFlow modules.

"},{"location":"user-guide/python/#installing-your-own-python-packages-with-pip","title":"Installing your own Python packages (with pip)","text":"

This section shows how to set up a local custom Python environment such that it extends a centrally-installed python module. By extend, we mean being able to install packages locally that are not provided by the central python. This is needed because some packages such as mpi4py must be built specifically for the Cirrus system and so are best provided centrally.

You can do this by creating a lightweight virtual environment where the local packages can be installed. Further, this environment is created on top of an existing Python installation, known as the environment's base Python.

Select the base Python by loading the python module you wish to extend, e.g., python/3.9.13-gpu (you can run module avail python to list all the available python modules).

[auser@cirrus-login1 auser]$ module load python/3.9.13\n

Tip

In the commands below, remember to replace x01 with your project code and auser with your username.

Next, create the virtual environment within a designated folder:

python -m venv --system-site-packages /work/x01/x01/auser/myvenv\n

In our example, the environment is created within a myvenv folder located on /work, which means the environment will be accessible from the compute nodes. The --system-site-packages option ensures that this environment is based on the currently loaded python module. See https://docs.python.org/3/library/venv.html for more details.
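
If you want to confirm that an environment was created with access to the base Python's packages, one option is to inspect the pyvenv.cfg file that venv writes at the top of the environment. The sketch below assumes the standard include-system-site-packages key written by Python's venv module; the helper name is illustrative, and the demonstration uses a throwaway directory instead of a real /work path.

```python
import os
import tempfile
import venv


def uses_system_site_packages(venv_dir):
    """Return True if pyvenv.cfg records include-system-site-packages = true."""
    with open(os.path.join(venv_dir, 'pyvenv.cfg')) as cfg:
        for line in cfg:
            key, _, value = line.partition('=')
            if key.strip() == 'include-system-site-packages':
                return value.strip().lower() == 'true'
    return False


# Demonstration in a temporary directory (pip is skipped to keep it quick).
with tempfile.TemporaryDirectory() as tmp:
    env_dir = os.path.join(tmp, 'myvenv')
    venv.create(env_dir, system_site_packages=True, with_pip=False)
    print(uses_system_site_packages(env_dir))
```

For an existing environment, call the helper with the environment's path, e.g. /work/x01/x01/auser/myvenv.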

extend-venv-activate /work/x01/x01/auser/myvenv\n

The extend-venv-activate command ensures that your virtual environment's activate script loads and unloads the base python module when appropriate. You're now ready to activate your environment.

source /work/x01/x01/auser/myvenv/bin/activate\n

Important

The path above uses a fictitious project code, x01, and username, auser. Please remember to replace those values with your actual project code and username. Alternatively, you could enter ${HOME/home/work} in place of /work/x01/x01/auser. That command fragment expands ${HOME} and then replaces the home part with work.
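
The effect of the ${HOME/home/work} expansion can be reproduced in Python, which may help when building paths inside a script. This is a small sketch: the function name work_path is illustrative, and it assumes, as in the shell form, that only the first occurrence of home is replaced.

```python
import os


def work_path(home):
    """Mimic the shell expansion ${HOME/home/work}: replace the first 'home' with 'work'."""
    return home.replace('home', 'work', 1)


# The fictitious home directory used in this guide.
print(work_path('/home/x01/x01/auser'))
# The same substitution applied to the current HOME, if it is set.
print(work_path(os.environ.get('HOME', '/home/x01/x01/auser')))
```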

Installing packages to your local environment can now be done as follows.

(myvenv) [auser@cirrus-login1 auser]$ python -m pip install <package name>\n

Running pip directly as in pip install <package name> will also work, but we show the python -m approach as this is consistent with the way the virtual environment was created. And when you have finished installing packages, you can deactivate your environment by issuing the deactivate command.

(myvenv) [auser@cirrus-login1 auser]$ deactivate\n[auser@cirrus-login1 auser]$\n

The packages you have just installed locally will only be available once the local environment has been activated. So, when running code that requires these packages, you must first activate the environment, by adding the activation command to the submission script, as shown below.

submit-myvenv.slurm
#!/bin/bash\n\n#SBATCH --job-name=myvenv\n#SBATCH --time=00:20:00\n#SBATCH --exclusive\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n#SBATCH --account=[budget code]\n#SBATCH --nodes=2\n#SBATCH --gres=gpu:4\n\nsource /work/x01/x01/auser/myvenv/bin/activate\n\nsrun --ntasks=8 --tasks-per-node=4 --cpus-per-task=10 myvenv-script.py\n

Lastly, the environment being extended does not have to come from one of the centrally-installed python modules. You could just as easily create a local virtual environment based on one of the Machine Learning (ML) modules, e.g., tensorflow or pytorch. This means you would avoid having to install ML packages within your local area. Each of those ML modules is based on a python module. For example, tensorflow/2.15.0-gpu is itself an extension of python/3.10.8-gpu.

"},{"location":"user-guide/python/#installing-your-own-python-packages-with-conda","title":"Installing your own Python packages (with conda)","text":"

This section shows you how to set up a local custom Python environment such that it duplicates a centrally-installed python module, ensuring that your local conda environment will contain packages that are compatible with the Cirrus system.

Select the base Python by loading the python module you wish to duplicate, e.g., python/3.9.13-gpu (you can run module avail python to list all the available python modules).

[auser@cirrus-login1 auser]$ module load python/3.9.13\n

Next, create the folder for holding your conda environments. This folder should be on the /work file system as /home is not accessible from the compute nodes.

CONDA_ROOT=/work/x01/x01/auser/condaenvs\nmkdir -p ${CONDA_ROOT}\n

The following commands tell conda where to save your custom environments and packages.

conda config --prepend envs_dirs ${CONDA_ROOT}/envs\nconda config --prepend pkgs_dirs ${CONDA_ROOT}/pkgs\n

The conda config commands are executed just once and the configuration details are held in a .condarc file located in your home directory. You now need to move this .condarc file to a directory visible from the compute nodes.

mv ~/.condarc ${CONDA_ROOT}\n

You can now activate the conda configuration.

export CONDARC=${CONDA_ROOT}/.condarc\neval \"$(conda shell.bash hook)\"\n

These two lines need to be called each time you want to use your virtual conda environment. The next command creates that virtual environment.

conda create --clone base --name myvenv\n

The above creates an environment called myvenv that will hold the same packages provided by the base python module. As this command involves a significant amount of file copying and downloading, it may take a long time to complete. When it has completed please activate the local myvenv conda environment.

conda activate myvenv\n

You can now install packages using conda install -p ${CONDA_ROOT}/envs/myvenv <package_name>. You can see the packages currently installed in the active environment with the command conda list. After all packages have been installed, simply run conda deactivate twice in order to restore the original command prompt.

(myvenv) [auser@cirrus-login1 auser]$ conda deactivate\n(base) [auser@cirrus-login1 auser]$ conda deactivate\n[auser@cirrus-login1 auser]$\n

The submission script below shows how to use the conda environment within a job running on the compute nodes.

submit-myvenv.slurm
#!/bin/bash\n\n#SBATCH --job-name=myvenv\n#SBATCH --time=00:20:00\n#SBATCH --exclusive\n#SBATCH --partition=gpu\n#SBATCH --qos=gpu\n#SBATCH --account=[budget code]\n#SBATCH --nodes=2\n#SBATCH --gres=gpu:4\n\nmodule load python/3.9.13\n\nCONDA_ROOT=/work/x01/x01/auser/condaenvs\nexport CONDARC=${CONDA_ROOT}/.condarc\neval \"$(conda shell.bash hook)\"\n\nconda activate myvenv\n\nsrun --ntasks=8 --tasks-per-node=4 --cpus-per-task=10 myvenv-script.py\n

You can see that using conda is less convenient than using pip. Firstly, the centrally-installed Python packages are copied into the local conda environment, consuming some of the disk space allocated to your project. Secondly, activating the conda environment within a submission script is more involved: five commands are required (including an explicit load for the base python module), instead of the single source command that is sufficient for a pip environment.

Further, conda cannot be used if the base environment is one of the Machine Learning (ML) modules, as conda is not flexible enough to gather Python packages from both the ML and base python modules (e.g., the ML module pytorch/1.13.1-gpu is itself based on python/3.10.8-gpu, and so conda will only duplicate packages provided by the python module and not the ones supplied by pytorch).

"},{"location":"user-guide/python/#using-jupyterlab-on-cirrus","title":"Using JupyterLab on Cirrus","text":"

It is possible to view and run JupyterLab on both the login and compute nodes of Cirrus. Please note, you can test notebooks on the login nodes, but please don\u2019t attempt to run any computationally intensive work (such jobs will be killed should they reach the login node CPU limit).

If you want to run your JupyterLab on a compute node, you will need to enter an interactive session; otherwise you can start from a login node prompt.

  1. As described above, load the Anaconda module on Cirrus using module load anaconda3/2023.9.

  2. Run export JUPYTER_RUNTIME_DIR=$(pwd).

  3. Start the JupyterLab server by running jupyter lab --ip=0.0.0.0 --no-browser

    Or copy and paste one of these URLs:\n    ...\n or http://127.0.0.1:8888/lab?token=<string>\n

    You will need the URL shown above for step 6.

  4. Please skip this step if you are connecting from Windows. If you are connecting from Linux or macOS, open a new terminal window, and run the following command.

    ssh <username>@login.cirrus.ac.uk -L<port_number>:<node_id>:<port_number>\n

    where <username> is your username, <port_number> is as shown in the URL from the Jupyter output and <node_id> is the name of the node we\u2019re currently on. On a login node, this will be cirrus-login1, or similar; on a compute node, it will be a mix of numbers and letters such as r2i5n5.

    Note

    If, when you connect in the new terminal, you see a message of the form channel_setup_fwd_listener_tcpip: cannot listen to port: 8888, it means port 8888 is already in use. You need to go back to step 3 (kill the existing jupyter lab) and retry with a new explicit port number by adding the --port=N option. The port number N can be in the range 5000-65535. You should then use the same port number in place of 8888.

  5. Please skip this step if you are connecting from Linux or macOS. If you are connecting from Windows, you should use MobaXterm to configure an SSH tunnel as follows.

    5.1. Click on the Tunnelling button above the MobaXterm terminal. Create a new tunnel by clicking on New SSH tunnel in the window that opens.

    5.2. In the new window that opens, make sure the Local port forwarding radio button is selected.

    5.3. In the forwarded port text box on the left under My computer with MobaXterm, enter the port number indicated in the Jupyter server output.

    5.4. In the three text boxes on the bottom right under SSH server enter login.cirrus.ac.uk, your Cirrus username, and then 22.

    5.5. At the top right, under Remote server, enter the name of the Cirrus login or compute node that you noted earlier followed by the port number (e.g. 8888).

    5.6. Click on the Save button.

    5.7. In the tunnelling window, you will now see a new row for the settings you just entered. If you like, you can give a name to the tunnel in the leftmost column to identify it. Click on the small key icon close to the right for the new connection to tell MobaXterm which SSH private key to use when connecting to Cirrus. You should tell it to use the same .ppk private key that you normally use.

    5.8. The tunnel should now be configured. Click on the small start button (like a play > icon) for the new tunnel to open it. You'll be asked to enter your Cirrus password -- please do so.

  6. Now, if you open a browser window on your local machine, you should be able to navigate to the URL from step 3, and this should display the JupyterLab server.

Note

If you have extended a central Python venv following the instructions for Installing your own Python packages (with pip), Jupyter Lab will load the central ipython kernel, not the one for your venv. To enable loading of the ipython kernel for your venv from within Jupyter Lab, first install the ipykernel module and then use this to install the kernel for your venv.

source /work/x01/x01/auser/myvenv/bin/activate\npython -m pip install ipykernel\npython -m ipykernel install --user --name=myvenv\n
changing placeholder account and username as appropriate. Thereafter, launch Jupyter Lab as above and select the myvenv kernel.

If you are on a compute node, the JupyterLab server will be available for the length of the interactive session you have requested.

You can also run Jupyter sessions using the centrally-installed Miniconda3 modules available on Cirrus. For example, the following link provides instructions for how to set up a Jupyter server on a GPU node.

https://github.com/hpc-uk/build-instructions/tree/main/pyenvs/ipyparallel

"},{"location":"user-guide/reading/","title":"References and further reading","text":""},{"location":"user-guide/reading/#online-documentation-and-resources","title":"Online Documentation and Resources","text":""},{"location":"user-guide/reading/#mpi-programming","title":"MPI programming","text":""},{"location":"user-guide/reading/#openmp-programming","title":"OpenMP programming","text":""},{"location":"user-guide/reading/#parallel-programming","title":"Parallel programming","text":""},{"location":"user-guide/reading/#programming-languages","title":"Programming languages","text":""},{"location":"user-guide/reading/#programming-skills","title":"Programming skills","text":""},{"location":"user-guide/resource_management/","title":"File and Resource Management","text":"

This section covers some of the tools and technical knowledge that will be key to maximising the usage of the Cirrus system, such as the online administration tool SAFE and calculating the CPU-time available.

The default file permissions are then outlined, along with a description of changing these permissions to the desired setting. This leads on to the sharing of data between users and systems, often a vital tool for project groups and collaboration.

Finally we cover some guidelines for I/O and data archiving on Cirrus.

"},{"location":"user-guide/resource_management/#the-cirrus-administration-web-site-safe","title":"The Cirrus Administration Web Site (SAFE)","text":"

All users have a login and password on the Cirrus Administration Web Site (also known as the 'SAFE'): SAFE. Once logged into this web site, users can find out much about their usage of the Cirrus system, including:

"},{"location":"user-guide/resource_management/#checking-your-cpugpu-time-allocations","title":"Checking your CPU/GPU time allocations","text":"

You can view these details by logging into the SAFE (https://safe.epcc.ed.ac.uk).

Use the Login accounts menu to select the user account that you wish to query. The page for the login account will summarise the resources available to the account.

You can also generate reports on your usage over a particular period and examine the details of how many CPUh or GPUh individual jobs on the system cost. To do this use the Service information menu and select Report generator.

"},{"location":"user-guide/resource_management/#disk-quotas","title":"Disk quotas","text":"

Disk quotas on Cirrus are managed via SAFE.

For live disk usage figures on the Lustre /work file system, use

lfs quota -hu <username> /work\n\nlfs quota -hg <groupname> /work\n
"},{"location":"user-guide/resource_management/#backup-policies","title":"Backup policies","text":"

The /home file system is not backed up.

The /work file system is not backed up.

The solid-state storage /scratch/space1 file system is not backed up.

We strongly advise that you keep copies of any critical data on an alternative system that is fully backed up.

"},{"location":"user-guide/resource_management/#sharing-data-with-other-cirrus-users","title":"Sharing data with other Cirrus users","text":"

How you share data with other Cirrus users depends on whether or not they belong to the same project as you. Each project has two shared folders that can be used for sharing data.

"},{"location":"user-guide/resource_management/#sharing-data-with-cirrus-users-in-your-project","title":"Sharing data with Cirrus users in your project","text":"

Each project has an inner shared folder on the /home and /work filesystems:

/home/[project code]/[project code]/shared\n\n/work/[project code]/[project code]/shared\n

This folder has read/write permissions for all project members. You can place any data you wish to share with other project members in this directory. For example, if your project code is x01 the inner shared folder on the /work file system would be located at /work/x01/x01/shared.

"},{"location":"user-guide/resource_management/#sharing-data-with-all-cirrus-users","title":"Sharing data with all Cirrus users","text":"

Each project also has an outer shared folder on the /home and /work filesystems:

/home/[project code]/shared\n\n/work/[project code]/shared\n

It is writable by all project members and readable by any user on the system. You can place any data you wish to share with other Cirrus users who are not members of your project in this directory. For example, if your project code is x01 the outer shared folder on the /work file system would be located at /work/x01/shared.

"},{"location":"user-guide/resource_management/#file-permissions-and-security","title":"File permissions and security","text":"

You should check the permissions of any files that you place in the shared area, especially if those files were created in your own Cirrus account. Files of the latter type are likely to be readable by you only.

The chmod command below shows how to make sure that a file placed in the outer shared folder is also readable by all Cirrus users.

chmod a+r /work/x01/shared/your-shared-file.txt\n

Similarly, for the inner shared folder, chmod can be called such that read permission is granted to all users within the x01 project.

chmod g+r /work/x01/x01/shared/your-shared-file.txt\n

If you're sharing a set of files stored within a folder hierarchy, the chmod command is slightly more complicated.

chmod -R a+Xr /work/x01/shared/my-shared-folder\nchmod -R g+Xr /work/x01/x01/shared/my-shared-folder\n

The -R option ensures that the read permission is enabled recursively and the +X guarantees that the user(s) you're sharing the folder with can access the subdirectories below my-shared-folder.
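The effect of the recursive chmod described above can be checked with a small sketch. It uses a throwaway temporary directory in place of a real shared folder (the directory and file names are purely illustrative) and assumes GNU stat, as found on Linux systems.

```shell
# Illustrative sketch only: a temporary directory stands in for the shared folder.
umask 077                                   # start with private files and directories
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/my-shared-folder/results"
touch "$demo_dir/my-shared-folder/results/data.txt"

# Grant read access to everyone; X adds execute (search) permission
# only to directories and to files that are already executable.
chmod -R a+Xr "$demo_dir/my-shared-folder"

# Report the resulting octal modes (GNU stat syntax).
dir_mode=$(stat -c '%a' "$demo_dir/my-shared-folder/results")
file_mode=$(stat -c '%a' "$demo_dir/my-shared-folder/results/data.txt")
echo "directory: $dir_mode, file: $file_mode"

rm -rf "$demo_dir"
```

The subdirectory becomes searchable by everyone (755) while the plain data file only gains read permission (644), which is why +X is preferred to +x when sharing a folder hierarchy.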

Default Unix file permissions can be specified by the umask command. The default umask value on Cirrus is 22, which provides \"group\" and \"other\" read permissions for all files created, and \"group\" and \"other\" read and execute permissions for all directories created. This is highly undesirable, as it allows everyone else on the system to access (but at least not modify or delete) every file you create. Thus it is strongly recommended that users change this default umask behaviour, by adding the command umask 077 to their $HOME/.profile file. This umask setting only allows the user access to any file or directory created. The user can then selectively enable \"group\" and/or \"other\" access to particular files or directories if required.
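As a rough illustration of the difference between the two umask values discussed above, the following sketch creates a file and a directory under each setting and prints their octal permissions. It works in a temporary directory (not a Cirrus path), the file names are made up for the example, and it assumes GNU stat.

```shell
# Illustrative sketch only: compares umask 022 with the recommended umask 077.
demo_dir=$(mktemp -d)

umask 022                                   # the default behaviour described above
touch "$demo_dir/default_file"
mkdir "$demo_dir/default_dir"

umask 077                                   # the recommended setting
touch "$demo_dir/private_file"
mkdir "$demo_dir/private_dir"

# Capture the octal modes of each new entry (GNU stat syntax).
default_file_mode=$(stat -c '%a' "$demo_dir/default_file")
default_dir_mode=$(stat -c '%a' "$demo_dir/default_dir")
private_file_mode=$(stat -c '%a' "$demo_dir/private_file")
private_dir_mode=$(stat -c '%a' "$demo_dir/private_dir")

echo "umask 022: file $default_file_mode, directory $default_dir_mode"
echo "umask 077: file $private_file_mode, directory $private_dir_mode"

rm -rf "$demo_dir"
```

With umask 022 the new file and directory are readable by group and other (644 and 755); with umask 077 only the owner has any access (600 and 700).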

"},{"location":"user-guide/resource_management/#file-types","title":"File types","text":""},{"location":"user-guide/resource_management/#ascii-or-formatted-files","title":"ASCII (or formatted) files","text":"

These are the most portable, but can be extremely inefficient to read and write. There is also the problem that if the formatting is not done correctly, the data may not be output to full precision (or to the subsequently required precision), resulting in inaccurate results when the data is used. Another common problem with formatted files is FORMAT statements that fail to provide an adequate range to accommodate future requirements, e.g. if we wish to output the total number of processors, NPROC, used by the application, the statement:

WRITE (*,'(I3)') NPROC\n

will not work correctly if NPROC is greater than 999.

"},{"location":"user-guide/resource_management/#binary-or-unformatted-files","title":"Binary (or unformatted) files","text":"

These are much faster to read and write, especially if an entire array is read or written with a single READ or WRITE statement. However the files produced may not be readable on other systems.

The GNU compiler provides the -fconvert=swap option, which swaps the byte order (endianness) of unformatted data as it is read or written. This compiler option often needs to be used together with a second option, -frecord-marker, which specifies the length of the record marker (extra bytes inserted before or after the actual data in the binary file) for unformatted files generated on a particular system. To read a binary file generated by a big-endian system on Cirrus, use -fconvert=swap -frecord-marker=4. Please note that, because of the same record-marker length issue, the unformatted files generated by GNU and other compilers on Cirrus are not compatible. In fact, the same WRITE statements would result in slightly larger files with the GNU compiler. Therefore it is recommended to use the same compiler for your simulations and related pre- and post-processing jobs.

Other options for file formats include:

Direct access files: Fortran unformatted files with specified record lengths. These may be more portable between different systems than ordinary (i.e. sequential IO) unformatted files, with significantly better performance than formatted (or ASCII) files. The \"endian\" issue will, however, still be a potential problem.

Portable data formats: These machine-independent formats for representing scientific data are specifically designed to enable the same data files to be used on a wide variety of different hardware and operating systems. The most common formats are:

It is important to note that these portable data formats are evolving standards, so make sure you are aware of which version of the standard/software you are using, and keep up-to-date with any backward-compatibility implications of each new release.

"},{"location":"user-guide/resource_management/#file-io-performance-guidelines","title":"File IO Performance Guidelines","text":"

Here are some general guidelines

"},{"location":"user-guide/resource_management/#common-io-patterns","title":"Common I/O patterns","text":"

There are a number of I/O patterns that are frequently used in applications:

"},{"location":"user-guide/resource_management/#single-file-single-writer-serial-io","title":"Single file, single writer (Serial I/O)","text":"

A common approach is to funnel all the I/O through a single master process. Although this has the advantage of producing a single file, the fact that only a single client is doing all the I/O means that it gains little benefit from the parallel file system.

"},{"location":"user-guide/resource_management/#file-per-process-fpp","title":"File-per-process (FPP)","text":"

One of the first parallel strategies people use for I/O is for each parallel process to write to its own file. This is a simple scheme to implement and understand but has the disadvantage that, at the end of the calculation, the data is spread across many different files and may therefore be difficult to use for further analysis without a data reconstruction stage.

"},{"location":"user-guide/resource_management/#single-file-multiple-writers-without-collective-operations","title":"Single file, multiple writers without collective operations","text":"

There are a number of ways to achieve this. For example, many processes can open the same file but access different parts by skipping some initial offset; parallel I/O libraries such as MPI-IO, HDF5 and NetCDF also enable this.

Shared-file I/O has the advantage that all the data is organised correctly in a single file making analysis or restart more straightforward.

The problem is that, with many clients all accessing the same file, there can be a lot of contention for file system resources.
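The idea of several writers placing their data at different offsets of one shared file can be sketched with standard tools. The example below is a serial stand-in, not a parallel program: a loop plays the role of four hypothetical ranks, each writing a fixed-size 4-byte block at its own offset with dd. The block size, rank count and payload are assumptions chosen for illustration.

```shell
# Illustrative sketch only: a loop stands in for four independent writers.
shared_file=$(mktemp)
block_size=4                                # bytes written by each "rank"

# Write the blocks out of order to show that each lands at its own offset.
for rank in 2 0 3 1; do
  # seek skips "rank" blocks from the start of the file;
  # conv=notrunc stops dd truncating data written by the other ranks.
  printf 'R%dxx' "$rank" |
    dd of="$shared_file" bs="$block_size" seek="$rank" conv=notrunc 2>/dev/null
done

contents=$(cat "$shared_file")
echo "$contents"

rm -f "$shared_file"
```

Whatever order the writers run in, the data ends up laid out by rank in a single file. In a real application the writers would run concurrently, which is where the contention described above comes from.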

"},{"location":"user-guide/resource_management/#single-shared-file-with-collective-writes-ssf","title":"Single Shared File with collective writes (SSF)","text":"

The problem with having many clients performing I/O at the same time is that, to prevent them clashing with each other, the I/O library may have to take a conservative approach. For example, a file may be locked while each client is accessing it which means that I/O is effectively serialised and performance may be poor.

However, if I/O is done collectively where the library knows that all clients are doing I/O at the same time, then reads and writes can be explicitly coordinated to avoid clashes. It is only through collective I/O that the full bandwidth of the file system can be realised while accessing a single file.

"},{"location":"user-guide/resource_management/#achieving-efficient-io","title":"Achieving efficient I/O","text":"

This section provides information on getting the best performance out of the /work parallel file system on Cirrus when writing data, particularly using parallel I/O patterns.

You may find that using the solid state storage (see /user-guide/solidstate) gives better performance than /work for some applications and I/O patterns.

"},{"location":"user-guide/resource_management/#lustre","title":"Lustre","text":"

The Cirrus /work file system uses Lustre as a parallel file system technology. The Lustre file system provides POSIX semantics (changes on one node are immediately visible on other nodes) and can support very high data rates for appropriate I/O patterns.

"},{"location":"user-guide/resource_management/#striping","title":"Striping","text":"

One of the main factors leading to the high performance of /work Lustre file systems is the ability to stripe data across multiple Object Storage Targets (OSTs) in a round-robin fashion. Files are striped when the data is split up into chunks that are then stored on different OSTs across the /work file system. Striping might improve the I/O performance because it increases the available bandwidth, since multiple processes can read and write the same files simultaneously. However, striping can also increase the overhead. Choosing the right striping configuration is key to obtaining high performance.
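The round-robin placement can be illustrated with a little arithmetic. The sketch below is a simplified model, not a query of the real file system: it assumes a stripe size of 1 MiB and a hypothetical stripe count of 4, and works out which stripe (numbered 0 to 3) holds the data at a given file offset. Which physical OST backs each stripe is decided by Lustre and is not modelled here.

```shell
# Simplified model of round-robin striping (assumed values, for illustration only).
stripe_size=$((1024 * 1024))                # 1 MiB per chunk
stripe_count=4                              # hypothetical number of OSTs used by the file

layout=""
for offset_mib in 0 1 2 3 4 5; do
  offset=$((offset_mib * 1024 * 1024))
  # Chunk number within the file, wrapped around the available stripes.
  stripe_index=$(( (offset / stripe_size) % stripe_count ))
  echo "offset ${offset_mib} MiB -> stripe ${stripe_index}"
  layout="${layout}${stripe_index} "
done
layout=${layout% }                          # drop the trailing space
```

Consecutive 1 MiB chunks cycle through stripes 0, 1, 2, 3 and then wrap back to 0, so processes working on different parts of a large file tend to hit different OSTs.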

Users have control of a number of striping settings on Lustre file systems. Although these parameters can be set on a per-file basis, they are usually set on the directory where your output files will be written so that all output files inherit the settings.

"},{"location":"user-guide/resource_management/#default-configuration","title":"Default configuration","text":"

The file system on Cirrus has the following default stripe settings:

These settings have been chosen to provide a good compromise for the wide variety of I/O patterns that are seen on the system but are unlikely to be optimal for any one particular scenario. The Lustre command to query the stripe settings for a directory (or file) is lfs getstripe. For example, to query the stripe settings of an already created directory res_dir:

$ lfs getstripe res_dir/\nres_dir\nstripe_count:   1 stripe_size:    1048576 stripe_offset:  -1\n
"},{"location":"user-guide/resource_management/#setting-custom-striping-configurations","title":"Setting Custom Striping Configurations","text":"

Users can set stripe settings for a directory (or file) using the lfs setstripe command. The options for lfs setstripe are:

For example, to set a stripe size of 4 MiB for the existing directory res_dir, along with the maximum stripe count, you would use:

$ lfs setstripe -s 4m -c -1 res_dir/\n
"},{"location":"user-guide/singularity/","title":"Singularity Containers","text":"

This page was originally based on the documentation at the University of Sheffield HPC service.

Designed around the notion of mobility of compute and reproducible science, Singularity enables users to have full control of their operating system environment. This means that a non-privileged user can \"swap out\" the Linux operating system and environment on the host for a Linux OS and environment that they control. So if the host system is running CentOS Linux but your application runs in Ubuntu Linux with a particular software stack, you can create an Ubuntu image, install your software into that image, copy the image to another host (e.g. Cirrus), and run your application on that host in its native Ubuntu environment.

Singularity also allows you to leverage the resources of whatever host you are on. This includes high-speed interconnects (e.g. Infiniband), file systems (e.g. Lustre) and potentially other resources (such as the licensed Intel compilers on Cirrus).

Note

Singularity only supports Linux containers. You cannot create images that use Windows or macOS (this is a restriction of the containerisation model rather than of Singularity).

"},{"location":"user-guide/singularity/#useful-links","title":"Useful Links","text":""},{"location":"user-guide/singularity/#about-singularity-containers-images","title":"About Singularity Containers (Images)","text":"

Similar to Docker, a Singularity container (or, more commonly, image) is a self-contained software stack. As Singularity does not require a root-level daemon to run its images (as is required by Docker) it is suitable for use on a multi-user HPC system such as Cirrus. Within the container/image, you have exactly the same permissions as you do in a standard login session on the system.

In principle, this means that an image created on your local machine with all your research software installed for local development will also run on Cirrus.

Pre-built images (such as those on DockerHub or SingularityHub) can simply be downloaded and used on Cirrus (or anywhere else Singularity is installed); see Using Singularity Images on Cirrus.

Creating and modifying images requires root permission and so must be done on a system where you have such access (in practice, this is usually within a virtual machine on your laptop/workstation); see Creating Your Own Singularity Images.

"},{"location":"user-guide/singularity/#using-singularity-images-on-cirrus","title":"Using Singularity Images on Cirrus","text":"

Singularity images can be used on Cirrus in a number of ways.

  1. Interactively on the login nodes
  2. Interactively on compute nodes
  3. As serial processes within a non-interactive batch script
  4. As parallel processes within a non-interactive batch script

We provide information on each of these scenarios. First, we describe briefly how to get existing images onto Cirrus so that you can use them.

"},{"location":"user-guide/singularity/#getting-existing-images-onto-cirrus","title":"Getting existing images onto Cirrus","text":"

Singularity images are simply files, so if you already have an image file, you can use scp to copy the file to Cirrus as you would with any other file.

If you wish to get a file from one of the container image repositories then Singularity allows you to do this from Cirrus itself.

For example, to retrieve an image from SingularityHub on Cirrus we can simply issue a Singularity command to pull the image.

[user@cirrus-login1 ~]$ module load singularity\n[user@cirrus-login1 ~]$ singularity pull hello-world.sif shub://vsoch/hello-world\n

The image located at the shub URI is written to a Singularity Image File (SIF) called hello-world.sif.

"},{"location":"user-guide/singularity/#interactive-use-on-the-login-nodes","title":"Interactive use on the login nodes","text":"

The container represented by the image file can be run on the login node like so.

[user@cirrus-login1 ~]$ singularity run hello-world.sif \nRaawwWWWWWRRRR!! Avocado!\n[user@cirrus-login1 ~]$\n

We can also shell into the container.

[user@cirrus-login1 ~]$ singularity shell hello-world.sif\nSingularity> ls /\nbin  boot  dev  environment  etc  home  lib  lib64  lustre  media  mnt  opt  proc  rawr.sh  root  run  sbin  singularity  srv  sys  tmp  usr  var\nSingularity> exit\nexit\n[user@cirrus-login1 ~]$\n

For more information see the Singularity documentation.

"},{"location":"user-guide/singularity/#interactive-use-on-the-compute-nodes","title":"Interactive use on the compute nodes","text":"

The process for using an image interactively on the compute nodes is very similar to that for using them on the login nodes. The only difference is that you first have to submit an interactive serial job to get interactive access to the compute node.

First though, move to a suitable location on /work and re-pull the hello-world image. This step is necessary as the compute nodes do not have access to the /home file system.

[user@cirrus-login1 ~]$ cd ${HOME/home/work}\n[user@cirrus-login1 ~]$ singularity pull hello-world.sif shub://vsoch/hello-world\n
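
The cd command above relies on bash pattern substitution: ${HOME/home/work} expands to the value of HOME with the first occurrence of home replaced by work. A minimal sketch, using a made-up home directory path rather than a real account, shows the effect. This syntax is specific to bash and similar shells.

```shell
#!/bin/bash
# Illustrative sketch only: example_home is a hypothetical home directory path.
example_home=/home/x01/x01/auser

# ${variable/pattern/replacement} replaces the first match of the pattern.
example_work=${example_home/home/work}

echo "$example_work"
```

For the hypothetical path above this prints /work/x01/x01/auser, the matching location on the /work file system.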

Now reserve a full node to work on interactively by issuing an salloc command, see below.

[user@cirrus-login1 ~]$ salloc --exclusive --nodes=1 \\\n  --tasks-per-node=36 --cpus-per-task=1 --time=00:20:00 \\\n  --partition=standard --qos=standard --account=[budget code] \nsalloc: Pending job allocation 14507\nsalloc: job 14507 queued and waiting for resources\nsalloc: job 14507 has been allocated resources\nsalloc: Granted job allocation 14507\nsalloc: Waiting for resource configuration\nsalloc: Nodes r1i0n8 are ready for job\n[user@cirrus-login1 ~]$ ssh r1i0n8\n

Note the prompt has changed to show you are on a compute node. Once you are logged in to the compute node (you may need to submit your account password), move to a suitable location on /work as before. You can now use the hello-world image in the same way you did on the login node.

[user@r1i0n8 ~]$ cd ${HOME/home/work}\n[user@r1i0n8 ~]$ singularity shell hello-world.sif\nSingularity> exit\nexit\n[user@r1i0n8 ~]$ exit\nlogout\nConnection to r1i0n8 closed.\n[user@cirrus-login1 ~]$ exit\nexit\nsalloc: Relinquishing job allocation 14507\nsalloc: Job allocation 14507 has been revoked.\n[user@cirrus-login1 ~]$\n

Note we used exit to leave the interactive container shell and then called exit twice more to close the interactive job on the compute node.

"},{"location":"user-guide/singularity/#serial-processes-within-a-non-interactive-batch-script","title":"Serial processes within a non-interactive batch script","text":"

You can also use Singularity images within a non-interactive batch script as you would any other command. If your image contains a runscript then you can use singularity run to execute the runscript in the job. You can also use singularity exec to execute arbitrary commands (or scripts) within the image.

An example job submission script to run a serial job that executes the runscript within the hello-world.sif we pulled above on Cirrus would be as follows.

#!/bin/bash --login\n\n# job options (name, compute nodes, job time)\n#SBATCH --job-name=hello-world\n#SBATCH --ntasks=1\n#SBATCH --exclusive\n#SBATCH --time=0:20:0\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n\n# Load any required modules\nmodule load singularity\n\n# Run the serial executable\nsrun --cpu-bind=cores singularity run ${HOME/home/work}/hello-world.sif\n

Submit this script using the sbatch command and once the job has finished, you should see RaawwWWWWWRRRR!! Avocado! in the Slurm output file.

"},{"location":"user-guide/singularity/#parallel-processes-within-a-non-interactive-batch-script","title":"Parallel processes within a non-interactive batch script","text":"

Running a Singularity container on the compute nodes isn't too different from launching a normal parallel application. The submission script below shows that the srun command now contains an additional singularity clause.

#!/bin/bash --login\n\n# job options (name, compute nodes, job time)\n#SBATCH --job-name=[name of application]\n#SBATCH --nodes=4\n#SBATCH --tasks-per-node=36\n#SBATCH --cpus-per-task=1\n#SBATCH --exclusive\n#SBATCH --time=0:20:0\n#SBATCH --partition=standard\n#SBATCH --qos=standard\n\n# Replace [budget code] below with your project code (e.g. t01)\n#SBATCH --account=[budget code]\n\n# Load any required modules\nmodule load mpt\nmodule load singularity\n\n# The host bind paths for the Singularity container.\nBIND_ARGS=/work/y07/shared/cirrus-software,/opt/hpe,/etc/libibverbs.d,/path/to/input/files\n\n# The file containing environment variable settings that will allow\n# the container to find libraries on the host, e.g., LD_LIBRARY_PATH.\nENV_PATH=/path/to/container/environment/file\n\nCONTAINER_PATH=/path/to/singularity/image/file\n\nAPP_PATH=/path/to/containerized/application/executable\nAPP_PARAMS=[application parameters]\n\nsrun --distribution=block:block --hint=nomultithread \\\n    singularity exec --bind ${BIND_ARGS} --env-file ${ENV_PATH} ${CONTAINER_PATH} \\\n        ${APP_PATH} ${APP_PARAMS}\n

The script above runs a containerized application such that each of the four nodes requested is fully populated. In general, the containerized application's input and output will be read from and written to a location on the host; hence, it is necessary to pass a suitable bind path to singularity (/path/to/input/files).

Note

The paths in the submission script that begin /path/to should be provided by the user. All but one of these paths are host specific. The exception is APP_PATH, which should be given a path relative to the container file system.

If the Singularity image file was built according to the Bind model, you will need to specify certain paths (--bind) and environment variables (--env-file) that allow the containerized application to find the required MPI libraries.

Otherwise, if the image follows the Hybrid model and so contains its own MPI implementation, you instead need to be sure that the containerized MPI is compatible with the host MPI, the one loaded in the submission script. In the example above, the host MPI is HPE MPT 2.25, but you could also use OpenMPI (with mpirun), either by loading a suitable openmpi module or by referencing the paths to an OpenMPI installation that was built locally (i.e., within your Cirrus work folder).

"},{"location":"user-guide/singularity/#creating-your-own-singularity-images","title":"Creating Your Own Singularity Images","text":"

You can create Singularity images by importing from DockerHub or Singularity Hub directly to Cirrus. If you wish to create your own custom image then you must install Singularity on a system where you have root (or administrator) privileges - often your own laptop or workstation.

We provide links below to instructions on how to install Singularity locally and then cover what options you need to include in a Singularity definition file in order to create images that can run on Cirrus and access the software development modules. This can be useful if you want to create a custom environment but still want to compile and link against libraries that you only have access to on Cirrus such as the Intel compilers and HPE MPI libraries.

"},{"location":"user-guide/singularity/#installing-singularity-on-your-local-machine","title":"Installing Singularity on Your Local Machine","text":"

You will need Singularity installed on your machine in order to locally run, create and modify images. How you install Singularity on your laptop/workstation depends on the operating system you are using.

If you are using Windows or macOS, the simplest solution is to use Vagrant to give you an easy to use virtual environment with Linux and Singularity installed. The Singularity website has instructions on how to use this method to install Singularity.

If you are using Linux then you can usually install Singularity directly.

"},{"location":"user-guide/singularity/#accessing-cirrus-modules-from-inside-a-container","title":"Accessing Cirrus Modules from Inside a Container","text":"

You may want your custom image to be able to access the modules environment on Cirrus so you can make use of custom software that you cannot access elsewhere. We demonstrate how to do this for a CentOS 7 image but the steps are easily translated for other flavours of Linux.

For the Cirrus modules to be available in your Singularity container you need to ensure that the environment-modules package is installed in your image.

In addition, when you use the container you must start it with a login shell to have access to the module commands.

Below is an example Singularity definition file that builds a CentOS 7 image with access to TCL modules already installed on Cirrus.

BootStrap: docker\nFrom: centos:centos7\n\n%post\n    yum update -y\n    yum install environment-modules -y\n    echo 'module() { eval `/usr/bin/modulecmd bash $*`; }' >> /etc/bashrc\n    yum install wget -y\n    yum install which -y\n    yum install squashfs-tools -y\n

If we save this definition to a file called centos7.def, we can use the following build command to build the image (remember this command must be run on a system where you have root access, not on Cirrus).

me@my-system:~> sudo singularity build centos7.sif centos7.def\n

The resulting image file (centos7.sif) can then be copied to Cirrus using scp; such an image already exists on Cirrus and can be found in the /work/y07/shared/cirrus-software/singularity/images folder.

When you use that image interactively on Cirrus you must start with a login shell and also bind /work/y07/shared/cirrus-software so that the container can see all the module files, see below.

[user@cirrus-login1 ~]$ module load singularity\n[user@cirrus-login1 ~]$ singularity exec -B /work/y07/shared/cirrus-software \\\n  /work/y07/shared/cirrus-software/singularity/images/centos7.sif \\\n    /bin/bash --login\nSingularity> module avail intel-*/compilers\n\n--------- /work/y07/shared/cirrus-modulefiles -------------\nintel-19.5/compilers  intel-20.4/compilers\nSingularity> exit\nlogout\n[user@cirrus-login1 ~]$\n
"},{"location":"user-guide/singularity/#altering-a-container-on-cirrus","title":"Altering a Container on Cirrus","text":"

A container image file is immutable but it is possible to alter the image if you convert the file to a sandbox. The sandbox is essentially a directory on the host system that contains the full container file hierarchy.

You first run the singularity build command to perform the conversion followed by a shell command with the --writable option. You are now free to change the files inside the container sandbox.

[user@cirrus-login1 ~]$ singularity build --sandbox image.sif.sandbox image.sif\n[user@cirrus-login1 ~]$ singularity shell -B /work/y07/shared/cirrus-software --writable image.sif.sandbox\nSingularity>\n

In the example above, the /work/y07/shared/cirrus-software bind path is specified, allowing you to build code that links to the Cirrus module libraries.

Finally, once you are finished with the sandbox you can exit and convert back to the original image file.

Singularity> exit\nexit\n[user@cirrus-login1 ~]$ singularity build --force image.sif image.sif.sandbox\n

Note

Altering a container in this way will cause the associated definition file to be out of step with the current image. Care should be taken to keep a record of the commands that were run within the sandbox so that the image can be reproduced.

"},{"location":"user-guide/solidstate/","title":"Solid state storage","text":"

In addition to the Lustre file system, the Cirrus login and compute nodes have access to a shared, high-performance, solid state storage system (also known as RPOOL). This storage system is network mounted and shared across the login nodes and GPU compute nodes in a similar way to the normal, spinning-disk Lustre file system but has different performance characteristics.

The solid state storage has a maximum usable capacity of 256 TB which is shared between all users.

"},{"location":"user-guide/solidstate/#backups-quotas-and-data-longevity","title":"Backups, quotas and data longevity","text":"

There are no backups of any data on the solid state storage so you should ensure that you have copies of critical data elsewhere.

In addition, the solid state storage does not currently have any quotas (user or group) enabled so all users are potentially able to access the full 256 TB capacity of the storage system. We ask all users to be considerate in their use of this shared storage system and to delete any data on the solid state storage as soon as it no longer needs to be there.

We monitor the usage of the storage system by users and groups and will potentially remove data that is stopping other users getting fair access to the storage and data that has not been actively used for long periods of time.
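
Because old data may be removed, it can help to check for your own stale files periodically. The following is a minimal sketch, not an EPCC-provided tool: the `list_stale` helper, the 30-day threshold and the file names are all hypothetical, and the demonstration runs against a temporary directory rather than the solid state storage itself.

```shell
#!/bin/bash
# list_stale DIR DAYS: print regular files under DIR that were last
# modified more than DAYS days ago (hypothetical helper for illustration).
list_stale() {
    find "$1" -type f -mtime +"$2" -print
}

# Demonstration in a temporary directory; on Cirrus you would point this
# at your own directory on the solid state storage instead.
demo_dir=$(mktemp -d)
touch -d "60 days ago" "$demo_dir/old-results.dat"   # simulated stale file
touch "$demo_dir/new-results.dat"                    # recently modified file

stale=$(list_stale "$demo_dir" 30)
echo "$stale"
```

Review the list before deleting anything; the sketch only reports candidates and never removes files.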

"},{"location":"user-guide/solidstate/#accessing-the-solid-state-storage","title":"Accessing the solid-state storage","text":"

You access the solid-state storage at /scratch/space1 on both the login nodes and on the compute nodes.

All users can create directories and add data, so we suggest that you create a directory for your project and/or user to avoid clashes with files and data added by other users. For example, if your project is t01 and your username is auser, you could create a directory with

mkdir -p /scratch/space1/t01/auser\n

When these directories are initially created they will be world-readable. If you do not want users from other projects to be able to see your data, you should change the permissions on your new directory. For example, to restrict the directory so that only other users in your project can read the data you would use:

chmod -R o-rwx /scratch/space1/t01\n
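
The two steps above can be combined into one short script. This is a sketch under stated assumptions: `scratch_root` is a hypothetical stand-in (a temporary directory) so the commands can be tried anywhere, and `t01` and `auser` are the example project and username from the text.

```shell
#!/bin/bash
# Hypothetical stand-in for /scratch/space1 so the sketch can be tried anywhere;
# on Cirrus you would set scratch_root=/scratch/space1 instead.
scratch_root=$(mktemp -d)
project=t01      # example project code from the text
username=auser   # example username from the text

project_dir="$scratch_root/$project"
user_dir="$project_dir/$username"

# Create the project and user directories in one step.
mkdir -p "$user_dir"

# Remove all access for users outside the owning user and group.
chmod -R o-rwx "$project_dir"

# Show the resulting permissions on both directories.
ls -ld "$project_dir" "$user_dir"
```

Whether other members of your project can then read the data depends on the group ownership of the directory, which this sketch does not change.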
"},{"location":"user-guide/solidstate/#copying-data-tofrom-solid-state-storage","title":"Copying data to/from solid-state storage","text":"

You can move data to/from the solid-state storage in a number of different ways:

"},{"location":"user-guide/solidstate/#local-data-transfer","title":"Local data transfer","text":"

The most efficient tool for copying between the Cirrus file systems (/home, /work) and the solid state storage is generally the cp command, e.g.

cp -r /path/to/data-dir /scratch/space1/t01/auser/\n

where /path/to/data-dir should be replaced with the path to the data directory you want to copy (assuming, of course, that you have set up the t01/auser subdirectories as described above).

Note

If you are transferring data from your /work directory, these commands can also be added to job submission scripts running on the compute nodes to move data as part of the job. If you do this, remember to include the data transfer time in the overall walltime for the job.

Data from your /home directory is not available from the compute nodes and must therefore be transferred from a login node.
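
When staging data inside a job, timing the copy helps you budget walltime. The following is a minimal sketch, assuming a hypothetical `stage_in` helper; the demonstration uses temporary directories in place of /work and /scratch/space1 so it can be run anywhere.

```shell
#!/bin/bash
# stage_in SRC DEST: copy SRC recursively into DEST and report the elapsed
# time in seconds (hypothetical helper for illustration).
stage_in() {
    local src="$1" dest="$2" start=$SECONDS
    mkdir -p "$dest"
    cp -r "$src" "$dest"/ || return 1
    echo "staged $(basename "$src") in $((SECONDS - start)) s"
}

# Demonstration with temporary directories standing in for /work and
# /scratch/space1; in a job script you would use your real paths.
work_dir=$(mktemp -d)
ssd_dir=$(mktemp -d)
mkdir -p "$work_dir/data-dir"
echo "sample" > "$work_dir/data-dir/input.txt"

stage_in "$work_dir/data-dir" "$ssd_dir/t01/auser"
```

The reported time for a representative data set gives a rough figure to add to the walltime you request for the job.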

"},{"location":"user-guide/solidstate/#remote-data-transfer","title":"Remote data transfer","text":"

You can transfer data directly to the solid state storage from external locations using scp or rsync in exactly the same way as you would usually transfer data to Cirrus. Simply substitute the path to the location on the solid state storage for the path you would normally use on Cirrus. For example, from an external machine (e.g. your laptop), you could use something like:

scp -r data_dir user@login.cirrus.ac.uk:/scratch/space1/t01/auser/\n

You can also use commands such as wget and curl to pull data from external locations directly to the solid state storage.

Note

You cannot transfer data from external locations in job scripts as the Cirrus compute nodes do not have external network access.

"}]} \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz index d0093ae7..e9f47e21 100644 Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ diff --git a/software-libraries/hdf5/index.html b/software-libraries/hdf5/index.html index d5656a19..a4ae4e67 100644 --- a/software-libraries/hdf5/index.html +++ b/software-libraries/hdf5/index.html @@ -1017,7 +1017,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-libraries/intel_mkl/index.html b/software-libraries/intel_mkl/index.html index 44dc3785..9d5f5cc4 100644 --- a/software-libraries/intel_mkl/index.html +++ b/software-libraries/intel_mkl/index.html @@ -1122,7 +1122,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-packages/Ansys/index.html b/software-packages/Ansys/index.html index 5735440c..ad880755 100644 --- a/software-packages/Ansys/index.html +++ b/software-packages/Ansys/index.html @@ -1001,7 +1001,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-packages/MATLAB/index.html b/software-packages/MATLAB/index.html index 6c80af41..27bc5b1e 100644 --- a/software-packages/MATLAB/index.html +++ b/software-packages/MATLAB/index.html @@ -1179,7 +1179,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-packages/altair_hw/index.html b/software-packages/altair_hw/index.html index 0a4c5651..d27af520 100644 --- a/software-packages/altair_hw/index.html +++ b/software-packages/altair_hw/index.html @@ -1001,7 +1001,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-packages/castep/index.html b/software-packages/castep/index.html index 11fa5317..4c4e0fc6 100644 --- a/software-packages/castep/index.html +++ b/software-packages/castep/index.html @@ -1074,7 +1074,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-packages/cp2k/index.html b/software-packages/cp2k/index.html index 54875421..ad30e690 100644 --- 
a/software-packages/cp2k/index.html +++ b/software-packages/cp2k/index.html @@ -1074,7 +1074,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-packages/elements/index.html b/software-packages/elements/index.html index 003fa68a..72061aa5 100644 --- a/software-packages/elements/index.html +++ b/software-packages/elements/index.html @@ -1074,7 +1074,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-packages/flacs/index.html b/software-packages/flacs/index.html index 262536b7..822c6c2d 100644 --- a/software-packages/flacs/index.html +++ b/software-packages/flacs/index.html @@ -1161,7 +1161,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-packages/gaussian/index.html b/software-packages/gaussian/index.html index 27313190..bc101591 100644 --- a/software-packages/gaussian/index.html +++ b/software-packages/gaussian/index.html @@ -1092,7 +1092,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-packages/gromacs/index.html b/software-packages/gromacs/index.html index 77b4b583..d550ea8a 100644 --- a/software-packages/gromacs/index.html +++ b/software-packages/gromacs/index.html @@ -1092,7 +1092,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-packages/helyx/index.html b/software-packages/helyx/index.html index 4933090d..cd705470 100644 --- a/software-packages/helyx/index.html +++ b/software-packages/helyx/index.html @@ -1074,7 +1074,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-packages/lammps/index.html b/software-packages/lammps/index.html index 975c65e8..a4e9b97d 100644 --- a/software-packages/lammps/index.html +++ b/software-packages/lammps/index.html @@ -1092,7 +1092,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-packages/molpro/index.html b/software-packages/molpro/index.html index a6d06b50..d8b82bcc 100644 --- 
a/software-packages/molpro/index.html +++ b/software-packages/molpro/index.html @@ -1001,7 +1001,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-packages/namd/index.html b/software-packages/namd/index.html index 6081cb3e..c2bc97ef 100644 --- a/software-packages/namd/index.html +++ b/software-packages/namd/index.html @@ -1074,7 +1074,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-packages/openfoam/index.html b/software-packages/openfoam/index.html index ddfb571f..ee6edf4b 100644 --- a/software-packages/openfoam/index.html +++ b/software-packages/openfoam/index.html @@ -1083,7 +1083,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-packages/orca/index.html b/software-packages/orca/index.html index e941055f..bd64eb70 100644 --- a/software-packages/orca/index.html +++ b/software-packages/orca/index.html @@ -1074,7 +1074,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-packages/qe/index.html b/software-packages/qe/index.html index f3b28a82..60f169e8 100644 --- a/software-packages/qe/index.html +++ b/software-packages/qe/index.html @@ -1074,7 +1074,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-packages/starccm+/index.html b/software-packages/starccm+/index.html index 4ec59633..aa389b3c 100644 --- a/software-packages/starccm+/index.html +++ b/software-packages/starccm+/index.html @@ -1107,7 +1107,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-packages/vasp/index.html b/software-packages/vasp/index.html index dc12e504..91d9ebab 100644 --- a/software-packages/vasp/index.html +++ b/software-packages/vasp/index.html @@ -1083,7 +1083,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-tools/ddt/index.html b/software-tools/ddt/index.html index e0182439..a47b7999 100644 --- a/software-tools/ddt/index.html +++ b/software-tools/ddt/index.html @@ 
-20,7 +20,7 @@ - Debugging using Arm DDT - Cirrus User Documentation + Debugging using Linaro DDT - Cirrus User Documentation @@ -82,7 +82,7 @@
- + Skip to content @@ -117,7 +117,7 @@
- Debugging using Arm DDT + Debugging using Linaro DDT
@@ -1016,7 +1016,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT @@ -1027,7 +1027,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT @@ -1260,15 +1260,15 @@ -

Debugging using Arm DDT

-

The Arm Forge tool suite is installed on Cirrus. This includes DDT, +

Debugging using Linaro DDT

+

The Linaro Forge tool suite is installed on Cirrus. This includes DDT, which is a debugging tool for scalar, multi-threaded and large-scale parallel applications. To compile your code for debugging you will usually want to specify the -O0 option to turn off all code optimisation (as this can produce a mismatch between source code line numbers and debugging information) and -g to include debugging information in the compiled executable. To use this package you will -need to log in to Cirrus with X11-forwarding enabled, load the Arm Forge +need to log in to Cirrus with X11-forwarding enabled, load the Linaro Forge module and execute forge:

module load forge
 forge
@@ -1294,14 +1294,14 @@ 

Debugging runs on the compute nodes the MPI implementation to Slurm (generic). You must also tick the Submit to Queue box. Clicking the Configure button in this section, you must now choose the submission template. One is provided for you at -/mnt/lustre/indy2lfs/sw/arm/forge/latest/templates/cirrus.qtf which +/work/y07/shared/cirrus-software/forge/latest/templates/cirrus.qtf which you should copy and modify to suit your needs. You will need to load any modules required for your code and perform any other necessary setup, such as providing extra sbatch options, i.e., whatever is needed for your code to run in a normal batch job.

Note

-

The current Arm Forge licence permits use on the Cirrus CPU nodes only. +

The current Linaro Forge licence permits use on the Cirrus CPU nodes only. The licence does not permit use of DDT/MAP for codes that run on the Cirrus GPUs.

@@ -1326,15 +1326,15 @@

Memory debugging with DDT

locations are all set up when the forge module is loaded so these libraries should be found without further arguments.

Remote Client

-

Arm Forge can connect to remote systems using SSH so you can run the +

Linaro Forge can connect to remote systems using SSH so you can run the user interface on your desktop or laptop machine without the need for X forwarding. Native remote clients are available for Windows, macOS and -Linux. You can download the remote clients from the Arm +Linux. You can download the remote clients from the Linaro Forge website. No licence file is required by a remote client.

Note

-

The same versions of Arm Forge must be installed on the local and remote +

The same versions of Linaro Forge must be installed on the local and remote systems in order to use DDT remotely.

To configure the remote client to connect to Cirrus, start it and then @@ -1342,7 +1342,7 @@

Remote Client

the new window, click Add to create a new login profile. For the hostname you should provide username@login.cirrus.ac.uk where username is your login username. For Remote Installation Directory* -enter /mnt/lustre/indy2lfs/sw/arm/forge/latest. To ensure your SSH +enter /work/y07/shared/cirrus-software/forge/latest. To ensure your SSH private key can be used to connect, the SSH agent on your local machine should be configured to provide it. You can ensure this by running ssh-add ~/.ssh/id_rsa_cirrus before using the Forge client where you @@ -1367,12 +1367,12 @@

Remote Client

usual login password the connection to Cirrus will be established and you will be able to start debugging.

You can find more detailed information -here.

+here.

Getting further help on DDT

diff --git a/software-tools/intel-vtune/index.html b/software-tools/intel-vtune/index.html index ec7285b7..6841e392 100644 --- a/software-tools/intel-vtune/index.html +++ b/software-tools/intel-vtune/index.html @@ -1007,7 +1007,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/software-tools/scalasca/index.html b/software-tools/scalasca/index.html index 6f95449b..c5d90a15 100644 --- a/software-tools/scalasca/index.html +++ b/software-tools/scalasca/index.html @@ -1007,7 +1007,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/user-guide/batch/index.html b/user-guide/batch/index.html index 3f1147c7..a987ea84 100644 --- a/user-guide/batch/index.html +++ b/user-guide/batch/index.html @@ -1416,7 +1416,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/user-guide/connecting/index.html b/user-guide/connecting/index.html index 202fc165..399fa20b 100644 --- a/user-guide/connecting/index.html +++ b/user-guide/connecting/index.html @@ -1242,7 +1242,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/user-guide/data/index.html b/user-guide/data/index.html index 076c2311..c6cb7a0f 100644 --- a/user-guide/data/index.html +++ b/user-guide/data/index.html @@ -1239,7 +1239,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/user-guide/development/index.html b/user-guide/development/index.html index e589d295..6216b73b 100644 --- a/user-guide/development/index.html +++ b/user-guide/development/index.html @@ -1245,7 +1245,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT @@ -1600,10 +1600,10 @@

Information on the available modul

You can list all the modules of a particular type by providing an argument to the module avail command. For example, to list all available versions of the Intel Compiler type:

-
[user@cirrus-login0 ~]$ module avail intel-compilers
+
[user@cirrus-login0 ~]$ module avail intel-*/compilers
 
---------------------------------- /mnt/lustre/indy2lfs/sw/modulefiles --------------------------------
-intel-compilers-18/18.05.274  intel-compilers-19/19.0.0.117
+--------------------------------- /work/y07/shared/cirrus-modulefiles --------------------------------
+intel-19.5/compilers  intel-20.4/compilers
 

If you want more info on any of the modules, you can use the module help command:

@@ -1622,38 +1622,30 @@

Information on the available modul their versions you have presently loaded in your environment, e.g.:

[user@cirrus-login0 ~]$ module list
 Currently Loaded Modulefiles:
-1) git/2.35.1(default)                                  6) gcc/8.2.0(default)
-2) singularity/3.7.2(default)                           7) intel-cc-18/18.0.5.274
-3) epcc/utils                                           8) intel-fc-18/18.0.5.274
-4) /mnt/lustre/indy2lfs/sw/modulefiles/epcc/setup-env   9) intel-compilers-18/18.05.274
-5) intel-license                                       10) mpt/2.25
+1) git/2.35.1(default)                                  
+2) epcc/utils
+2) /mnt/lustre/e1000/home/y07/shared/cirrus-modulefiles/epcc/setup-env
 

Loading, unloading and swapping modules

To load a module to use module add or module load. For example, to -load the intel-compilers-18 into the development environment:

-
module load intel-compilers-18
+load the intel 20.4 compilers into the development environment:

+
module load intel-20.4/compilers
 
-

This will load the default version of the intel compilers. If you need a -specific version of the module, you can add more information:

-
module load intel-compilers-18/18.0.5.274
-
-

will load version 18.0.2.274 for you, regardless of the default.

+

This will load the default version of the intel compilers.

If a module loading file cannot be accessed within 10 seconds, a warning message will appear: Warning: Module system not loaded.

If you want to clean up, module remove will remove a loaded module:

-
module remove intel-compilers-18
+
module remove intel-20.4/compilers
 
-

(or module rm intel-compilers-18 or -module unload intel-compilers-18) will unload what ever version of -intel-compilers-18 (even if it is not the default) you might have -loaded. There are many situations in which you might want to change the +

You could also run module rm intel-20.4/compilers or module unload intel-20.4/compilers. +There are many situations in which you might want to change the presently loaded version to a different one, such as trying the latest version which is not yet the default or using a legacy version to keep compatibility with old data. This can be achieved most easily by using "module swap oldmodule newmodule".

-

Suppose you have loaded version 18 of the Intel compilers; the following -command will change to version 19:

-
module swap intel-compilers-18 intel-compilers-19
+

Suppose you have loaded version 19 of the Intel compilers; the following +command will change to version 20:

+
module swap intel-19.5/compilers intel-20.4/compilers
 

Available Compiler Suites

@@ -1661,10 +1653,10 @@

Available Compiler Suites

As Cirrus uses dynamic linking by default you will generally also need to load any modules you used to compile your code in your job submission script when you run your code.

Intel Compiler Suite

-

The Intel compiler suite is accessed by loading the intel-compilers-* -and intel-*/compilers modules, where * references the version. For -example, to load the 2019 release, you would run:

-
module load intel-compilers-19
+

The Intel compiler suite is accessed by loading the intel-*/compilers +module, where * references the version. For example, to load the v20 +release, you would run:

+
module load intel-20.4/compilers
 

Once you have loaded the module, the compilers are available as:

    @@ -1676,9 +1668,9 @@

    Intel Compiler Suite

    compiler versions and tools.

    GCC Compiler Suite

    The GCC compiler suite is accessed by loading the gcc/* modules, where -* again is the version. For example, to load version 8.2.0 you would +* again is the version. For example, to load version 10.2.0 you would run:

    -
    module load gcc/8.2.0
    +
    module load gcc/10.2.0
     

    Once you have loaded the module, the compilers are available as:

      @@ -1713,8 +1705,8 @@

      Using HPE MPT

Using Intel Compilers and HPE MPT

Once you have loaded the MPT module you should next load the Intel -compilers module you intend to use (e.g. intel-compilers-19):

-
module load intel-compilers-19
+compilers module you intend to use (e.g. intel-20.4/compilers):

+
module load intel-20.4/compilers
 

The compiler wrappers are then available as

    @@ -1750,8 +1742,8 @@

    Using Intel MPI

    Although HPE MPT remains the default MPI library and we recommend that first attempts at building code follow that route, you may also choose to use Intel MPI if you wish. To use these, load the appropriate -intel-mpi module, for example intel-mpi-19:

    -
    module load intel-mpi-19
    +MPI module, for example intel-20.4/mpi:

    +
    module load intel-20.4/mpi
     

    Please note that the name of the wrappers to use when compiling with Intel MPI depends on whether you are using the Intel compilers or GCC. @@ -1761,26 +1753,10 @@

    Using Intel MPI

    Note

    Although Intel MPI is available on Cirrus, HPE MPT remains the recommended and default MPI library to use when building applications.

    -
    -

    Note

    -

    Using Intel MPI 18 can cause warnings in your output similar to -no hfi units are available or -The /dev/hfi1_0 device failed to appear. These warnings can be safely -ignored, or, if you would prefer to prevent them, you may add the line

    -
    export I_MPI_FABRICS=shm:ofa
    -
    -

    to your job scripts after loading the Intel MPI 18 module.

    -
    -
    -

    Note

    -

    When using Intel MPI 18, you should always launch MPI tasks with srun, -the supported method on Cirrus. Launches with mpirun or mpiexec will -likely fail.

    -

    Using Intel Compilers and Intel MPI

    After first loading Intel MPI, you should next load the appropriate -intel-compilers module (e.g. intel-compilers-19):

    -
    module load intel-compilers-19
    +Intel compilers module (e.g. intel-20.4/compilers):

    +
    module load intel-20.4/compilers
     

    You may then use the following MPI compiler wrappers:

      @@ -1809,7 +1785,7 @@

      Using OpenMPI

      specific CUDA version, one that is fully compatible with the underlying NVIDIA GPU device driver. See the link below for an example how an OpenMPI build is configured.

      -

      Build instructions for OpenMPI 4.1.5 on Cirrus

      +

      Build instructions for OpenMPI 4.1.6 on Cirrus

      All this means we build can OpenMPI such that it supports direct GPU-to-GPU communications using the NVLink intra-node GPU comm links (and inter-node GPU comms are direct to Infiniband intead of passing through the host processor).

      @@ -1892,8 +1868,6 @@

      Intel modules and tools

      A full list is available via module avail intel.

      The different available compiler versions are:

        -
      • intel-*/18.0.5.274 Intel 2018 Update 4
      • -
      • intel-*/19.0.0.117 Intel 2019 Initial release
      • intel-19.5/* Intel 2019 Update 5
      • intel-20.4/* Intel 2020 Update 4
      diff --git a/user-guide/gpu/index.html b/user-guide/gpu/index.html index c7d983af..207453b5 100644 --- a/user-guide/gpu/index.html +++ b/user-guide/gpu/index.html @@ -1299,7 +1299,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT @@ -1708,10 +1708,8 @@

      NVIDIA HPC SDK

      are particular reasons to use earlier versions. The default version is therefore the latest module version present on the system.

      Each release of the NVIDIA HPC SDK may include several different -versions of the CUDA toolchain. For example, the nvidia/nvhpc/21.2 -module comes with CUDA 10.2, 11.0 and 11.2. Only one of these CUDA -toolchains can be active at any one time and for nvhpc/22.11 this is -CUDA 11.8.

      +versions of the CUDA toolchain. Only one of these CUDA toolchains +can be active at any one time and for nvhpc/22.11 this is CUDA 11.8.

      Here is a list of available HPC SDK versions, and the corresponding version of CUDA:

      @@ -1753,10 +1751,10 @@

      CUDA

      compatible with the NVIDIA Volta architecture.

      Using CUDA with Intel compilers

      -

      You can load either the Intel 18 or Intel 19 compilers to use with +

      You can load either the Intel 19 or Intel 20 compilers to use with nvcc.

      module unload gcc
      -module load intel-compilers-19
      +module load intel-20.4/compilers
       

      You can now use nvcc -ccbin icpc to compile your source code with the Intel C++ compiler icpc.

      @@ -2062,8 +2060,8 @@

      Using Nsight Systems

      via the links below.

      https://developer.nvidia.com/nsight-systems

      https://docs.nvidia.com/nsight-systems/UserGuide/index.html

      -

      If your code was compiled with the tools provided by nvidia/nvhpc/21.2 -you should download and install Nsight Systems v2020.5.1.85.

      +

      If your code was compiled with the tools provided by nvidia/nvhpc/22.2 +you should download and install Nsight Systems v2023.4.1.97.

      Using Nsight Compute

      Nsight Compute may be used in a similar way as Nsight Systems. A job may be submitted like so.

      @@ -2092,9 +2090,9 @@

      Using Nsight Compute

      A common starting point is --section MemoryWorkloadAnalysis.

      Consult the NVIDIA documentation for further details.

      https://developer.nvidia.com/nsight-compute

      -

      https://docs.nvidia.com/nsight-compute/2021.2/index.html

      -

      Nsight Compute v2021.3.1.0 has been found to work for codes compiled -using nvhpc versions 21.2 and 21.9.

      +

      https://docs.nvidia.com/nsight-compute/2023.3/index.html

      +

      Nsight Compute v2023.3.1.0 has been found to work for codes compiled +using nvhpc versions 22.2 and 22.11.

      Monitoring the GPU Power Usage

      NVIDIA also provides a useful command line utility for the management and monitoring of NVIDIA GPUs: the NVIDIA System Management Interface nvidia-smi.

      The nvidia-smi command queries the available GPUs and reports current information, including but not limited to: driver versions, CUDA version, name, temperature, current power usage and maximum power capability. In this example output, there is one available GPU and it is idle:

      @@ -2154,8 +2152,8 @@

      Compiling and using GPU-aware MPI

      Version of OpenMPI with both CUDA-aware MPI support and SLURM support are available, you should load the following modules:

      -
      module load openmpi/4.1.4-cuda-11.8
      -module load nvidia/nvhpc-nompi/22.11
      +
      module load openmpi/4.1.6-cuda-11.6
      +module load nvidia/nvhpc-nompi/22.2
       

      The command you use to compile depends on whether you are compiling C/C++ or Fortran.

      @@ -2182,7 +2180,7 @@

      Run time

      #SBATCH --gres=gpu:4 # Load the appropriate modules, e.g., -module load openmpi/4.1.4-cuda-11.8 +module load openmpi/4.1.6-cuda-11.6 module load nvidia/nvhpc-nompi/22.2 export OMP_NUM_THREADS=1 diff --git a/user-guide/introduction/index.html b/user-guide/introduction/index.html index dc7a10b2..f470023d 100644 --- a/user-guide/introduction/index.html +++ b/user-guide/introduction/index.html @@ -1074,7 +1074,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/user-guide/network-upgrade-2023/index.html b/user-guide/network-upgrade-2023/index.html index cd4b5cd2..0c203437 100644 --- a/user-guide/network-upgrade-2023/index.html +++ b/user-guide/network-upgrade-2023/index.html @@ -1001,7 +1001,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/user-guide/python/index.html b/user-guide/python/index.html index 30b3cccc..04f4e162 100644 --- a/user-guide/python/index.html +++ b/user-guide/python/index.html @@ -1101,7 +1101,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT @@ -1283,15 +1283,14 @@

      Using Python

      Miniconda modules and one Anaconda module. (Miniconda being a small bootstrap version of Anaconda).

      -

      The Anaconda module is called anaconda/python3 and is suitable for +

      The Anaconda module is called anaconda3/2023.9 and is suitable for running serial applications - for parallel applications using mpi4py see mpi4py for CPU or mpi4py for GPU.

      You can list the Miniconda modules by running module avail python on a login node. Those module versions that have the gpu suffix are suitable for use on the Cirrus GPU nodes. There are also -modules that extend these Python environments, e.g., pyfr, horovod, -tensorflow and pytorch - simply run module help <module name> for -further info.

      +modules that extend these Python environments, e.g., pyfr, tensorflow +and pytorch - simply run module help <module name> for further info.

      The Miniconda modules support Python-based parallel codes, i.e., each such python module provides a suite of packages pertinent to parallel processing and numerical analysis such as dask, ipyparallel, @@ -1300,8 +1299,8 @@

      Using Python

      pip list. We now give some examples that show how the python modules can be used on the Cirrus CPU/GPU nodes.

      mpi4py for CPU

      -

      The python/3.9.13 module provides mpi4py 3.1.3 linked with OpenMPI -4.1.4.

      +

      The python/3.9.13 module provides mpi4py 3.1.5 linked with OpenMPI +4.1.6.

      See numpy-broadcast.py below which is a simple MPI Broadcast example, and the Slurm script submit-broadcast.slurm which demonstrates how to run across it two compute nodes.

      @@ -1462,15 +1461,15 @@

      mpi4py for GPU

      Machine Learning frameworks

      There are several more Python-based modules that also target the Cirrus GPU nodes. These include two machine learning frameworks, -pytorch/1.12.1-gpu and tensorflow/2.9.1-gpu. Both modules are Python -virtual environments that extend python/3.9.13-gpu. The MPI comms is +pytorch/1.13.1-gpu and tensorflow/2.15.0-gpu. Both modules are Python +virtual environments that extend python/3.10.8-gpu. The MPI comms is handled by the Horovod -0.25.0 package along with the NVIDIA Collective Communications +0.28.1 package along with the NVIDIA Collective Communications Library v2.11.4.

      A full package list for these environments can be obtained by loading the module of interest and then running pip list.

      Please click on the link indicated to see examples of how to use the -PyTorch and TensorFlow +PyTorch and TensorFlow modules .

      Installing your own Python packages (with pip)

      @@ -1557,10 +1556,10 @@

      Installing your own Python

      Lastly, the environment being extended does not have to come from one of the centrally-installed python modules. You could just as easily create a local virtual environment based on one of the Machine Learning -(ML) modules, e.g., horovod, tensorflow or pytorch. This means you -would avoid having to install ML packages within your local area. Each -of those ML modules is based on a python module. For example, -tensorflow/2.11.0-gpu is itself an extension of python/3.10.8-gpu.

      +(ML) modules, e.g., tensorflow or pytorch. This means you would avoid +having to install ML packages within your local area. Each of those ML +modules is based on a python module. For example, tensorflow/2.15.0-gpu +is itself an extension of python/3.10.8-gpu.

      Installing your own Python packages (with conda)

      This section shows you how to setup a local custom Python environment such that it duplicates a centrally-installed python module, ensuring @@ -1650,7 +1649,7 @@

      Installing your own Pyth

      Further, conda cannot be used if the base environment is one of the Machine Learning (ML) modules, as conda is not flexible enough to gather Python packages from both the ML and base python modules (e.g., -the ML module pytorch/2.0.0-gpu is itself based on +the ML module pytorch/1.13.1-gpu is itself based on python/3.10.8-gpu, and so conda will only duplicate packages provided by the python module and not the ones supplied by pytorch).

      Using JupyterLab on Cirrus

      diff --git a/user-guide/reading/index.html b/user-guide/reading/index.html index e10329a8..df8c1dd6 100644 --- a/user-guide/reading/index.html +++ b/user-guide/reading/index.html @@ -1003,7 +1003,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/user-guide/resource_management/index.html b/user-guide/resource_management/index.html index f3ad26b7..648ebdc0 100644 --- a/user-guide/resource_management/index.html +++ b/user-guide/resource_management/index.html @@ -1275,7 +1275,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT diff --git a/user-guide/singularity/index.html b/user-guide/singularity/index.html index 0f4412e2..39aa391d 100644 --- a/user-guide/singularity/index.html +++ b/user-guide/singularity/index.html @@ -1167,7 +1167,7 @@ - Debugging using Arm DDT + Debugging using Linaro DDT @@ -1596,7 +1596,7 @@

      Parallel proce
       module load singularity
       
       # The host bind paths for the Singularity container.
      -BIND_ARGS=/mnt/lustre/indy2lfs/sw,/opt/hpe,/etc/libibverbs.d,/path/to/input/files
      +BIND_ARGS=/work/y07/shared/cirrus-software,/opt/hpe,/etc/libibverbs.d,/path/to/input/files
       
       # The file containing environment variable settings that will allow
       # the container to find libraries on the host, e.g., LD_LIBRARY_PATH .
      @@ -1701,18 +1701,18 @@

      Accessing Cirrus Modul

      The resulting image file (centos7.sif) can then be copied to Cirrus using scp; such an image already exists on Cirrus and can be found in
      -the /mnt/lustre/indy2lfs/sw/singularity/images folder.

      +the /work/y07/shared/cirrus-software/singularity/images folder.

      When you use that image interactively on Cirrus you must start with a
      -login shell and also bind /mnt/lustre/indy2lfs/sw so that the container
      +login shell and also bind /work/y07/shared/cirrus-software so that the container
       can see all the module files, see below.

      [user@cirrus-login1 ~]$ module load singularity
      -[user@cirrus-login1 ~]$ singularity exec -B /mnt/lustre/indy2lfs/sw \
      -  /mnt/lustre/indy2lfs/sw/singularity/images/centos7.sif \
      +[user@cirrus-login1 ~]$ singularity exec -B /work/y07/shared/cirrus-software \
      +  /work/y07/shared/cirrus-software/singularity/images/centos7.sif \
           /bin/bash --login
      -Singularity> module avail intel-compilers
      +Singularity> module avail intel-*/compilers
       
      ---------- /mnt/lustre/indy2lfs/sw/modulefiles -------------
      -intel-compilers-18/18.05.274  intel-compilers-19/19.0.0.117
      +--------- /work/y07/shared/cirrus-modulefiles -------------
      +intel-19.5/compilers  intel-20.4/compilers
       Singularity> exit
       logout
       [user@cirrus-login1 ~]$
      @@ -1726,10 +1726,10 @@ 

      Altering a Container on Cirrus

      followed by a shell command with the --writable option. You are now free to change the files inside the container sandbox.

      [user@cirrus-login1 ~]$ singularity build --sandbox image.sif.sandbox image.sif
      -[user@cirrus-login1 ~]$ singularity shell -B /mnt/lustre/indy2lfs/sw --writable image.sif.sandbox
      +[user@cirrus-login1 ~]$ singularity shell -B /work/y07/shared/cirrus-software --writable image.sif.sandbox
       Singularity>
       
      -In the example above, the /mnt/lustre/indy2lfs/sw bind path is specified, allowing
      +In the example above, the /work/y07/shared/cirrus-software bind path is specified, allowing
       you to build code that links to the Cirrus module libraries.
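If the sandbox also needs other host directories, the -B option accepts a comma-separated list, as in the BIND_ARGS setting used in the parallel job script earlier. The following is a minimal sketch of assembling that list; join_bind_paths is a hypothetical helper introduced here for illustration, and the paths are the ones quoted in the job script, not a definitive set.

```shell
# Hypothetical helper (not part of Singularity or the Cirrus modules):
# join all arguments with commas, the list form that -B accepts.
join_bind_paths() {
    # The subshell keeps the IFS change local to this function.
    (IFS=,; printf '%s\n' "$*")
}

# Paths copied from the job script example; /path/to/input/files is a placeholder.
BIND_ARGS=$(join_bind_paths \
    /work/y07/shared/cirrus-software \
    /opt/hpe \
    /etc/libibverbs.d \
    /path/to/input/files)

echo "${BIND_ARGS}"
```

The resulting value can then be passed as a single argument, e.g. singularity shell -B "${BIND_ARGS}" --writable image.sif.sandbox.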

      Finally, once you are finished with the sandbox you can exit and convert back to the original image file.
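The conversion back can be sketched as below, assuming the file names from the example above. The rebuild_image wrapper and its dry-run fallback are illustrative assumptions, added so the command can be inspected on a machine where Singularity or the sandbox is absent; the underlying step is a singularity build with --force, which allows the existing image file to be overwritten.

```shell
# Hypothetical wrapper: rebuild an image file from an edited sandbox.
#   $1: sandbox directory, $2: image file to overwrite
rebuild_image() {
    if command -v singularity >/dev/null 2>&1 && [ -d "$1" ]; then
        # --force overwrites the image file if it already exists.
        singularity build --force "$2" "$1"
    else
        # Dry run: Singularity or the sandbox was not found, so only print the command.
        echo "Would run: singularity build --force $2 $1"
    fi
}

# Run this after leaving the container shell with "exit".
rebuild_image image.sif.sandbox image.sif
```

Once the image has been rebuilt, the sandbox directory is no longer needed and can be removed.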

      diff --git a/user-guide/solidstate/index.html b/user-guide/solidstate/index.html
      index dd691706..ae55feb1 100644
      --- a/user-guide/solidstate/index.html
      +++ b/user-guide/solidstate/index.html
      @@ -1098,7 +1098,7 @@
      - Debugging using Arm DDT
      + Debugging using Linaro DDT