This is a quick-start guide aimed at HECBioSim users of the UK Tier 2 JADE facility. It is not intended as official documentation for the JADE HPC facility, which can be found here.

Using Slurm
Query Usage
Running AMBER
Running GROMACS
Running NAMD

 

Using Slurm

Slurm is a batch scheduling system for managing simulations run by many users on HPC resources. The following commands will get you started with Slurm, but are by no means exhaustive; consult the JADE documentation for more examples, or the Slurm documentation for more advanced usage.

 

Submitting a job

Assuming you have created a bash script called "submit.sh" containing your job's resource requirements, the software to run and so on, then:

sbatch submit.sh

will submit a job to the JADE queue.
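
For reference, a submission script on JADE generally takes the shape sketched below; the module name and run command are placeholders only, see the AMBER, GROMACS and NAMD sections later in this guide for working examples.

#!/bin/bash

# resource requests: one node, one GPU, one hour of walltime
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
# job name and partition (see the partition table below)
#SBATCH -J myjob
#SBATCH -p small

# placeholder: load the software you need
module load <software-module>

# placeholder: the command that runs your simulation
<command to run your simulation>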

 

Query job status

If you have jobs running on JADE, then you can query their status with:

squeue -u <user-id>

This will list all jobs that your user account has submitted that have not yet completed.
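
If you want more detail on a particular job, for example to check why it is still queued, the standard Slurm command below can also be used (replace <jobid> with the ID of your job):

scontrol show job <jobid>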

 

Delete job

If you want to delete a job you have submitted, use the job ID (either given when you submitted, or found by querying your job status) with the following command:

scancel <jobid>
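
Slurm can also cancel every job belonging to your user in one go, which can be handy if a batch of submissions has gone wrong; use this with care:

scancel -u <user-id>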

 

Job partitions on JADE

One of the things to be aware of is the partitions on JADE. There are three partitions: "small", "big" and "devel". The "devel" partition is for developers to test building codes and for users to verify their job scripts are working when running unfamiliar regimes (don't run a full production run here). The other two partitions classify job sizes: the "small" partition is for simulations using a single GPU, whilst the "big" partition is for jobs using 4 or 8 GPUs. You will need to pay attention to this when writing your job submission scripts. The key numbers for these partitions are in the table below:

 

Partition name   Partition size   Job walltime limit   Running job limit
big              11 nodes         24 hours             5 jobs
small            10 nodes         6 days               8 jobs
devel            1 node           1 hour               1 job
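
As an example of putting the partition names to use, when verifying an unfamiliar job script you could first send a short run to the "devel" partition by swapping the partition and walltime lines in one of the scripts from the application sections below, for instance:

#SBATCH -p devel
#SBATCH --time=00:30:00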

 

Query Usage

Usage on JADE broadly falls into two categories: how much compute you have used and how much storage you are using. To find out your compute usage, run:

$ account-balance

and you will see something like the output below, where the users in your project account(s) are listed with their individual usage in the right hand column, and the project total is on the top row.

Cluster   Account              Login     Proper Name     GPU Name       Used
--------- -------------------- --------- --------------- -------------- ----------
     jade account1                                       gres/gpu              314
     jade account1             user1-bu+ user1           gres/gpu              249
     jade account1             user2-bu+ user2           gres/gpu               65

 

To find out your disk usage, run:

$ get-quota

and you will see something like the output below, describing how much space you have used and any quotas and limits, along with the same information for the number of files.

Filesystem    used     quota limit grace  files     quota limit grace
/jmain01       5.135G     0k     0k     -    55315       0      0      -

 

Running AMBER

AMBER 18 is installed and maintained on JADE by HECBioSim core SLA support. AMBER is fairly straightforward to run on JADE and shows excellent GPU performance. See the example scripts below.

 

Single GPU

 

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --time=1:00:00
#SBATCH -J job1
#SBATCH -p small

module load amber/18

pmemd.cuda -O -i md1.in -p example.top -c example.crd -ref example-ref.crd
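
Multiple GPU

If you do want to try AMBER across several GPUs, the script below is a sketch only: it assumes the amber/18 module also provides the MPI build of the GPU engine (pmemd.cuda.MPI), which is worth confirming with a short test on the "devel" partition, and note that many systems run best on a single GPU.

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:4
#SBATCH --time=1:00:00
#SBATCH -J job1
#SBATCH -p big

module load amber/18

# assumes pmemd.cuda.MPI is provided by the amber/18 module, one MPI task per GPU
mpirun -np ${SLURM_NTASKS_PER_NODE} pmemd.cuda.MPI -O -i md1.in -p example.top -c example.crd -ref example-ref.crd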

 

Running GROMACS

GROMACS 2019.3 is installed and maintained on JADE by HECBioSim core SLA support.

Running GROMACS on a single GPU is probably the most common way users will submit GROMACS jobs to JADE. In terms of performance it is often better to run many single GPU jobs in tandem than to run one large parallelised simulation across many GPUs; a good use case for multiple GPUs is a system that is too big for the memory on one Volta GPU (16 GB). Here are example job submission scripts for both cases, along with a job array sketch for running several single GPU jobs in tandem.

 

Single GPU

For a single GPU job you will also have 5 CPUs available. You are free to use some or all 5 depending on your system; decreasing or increasing this number (up to five) may affect performance.

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=5
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH -J job1
#SBATCH -p small

module purge
module load gromacs/2019.3

gmx mdrun -deffnm simulation1 -ntomp ${SLURM_CPUS_PER_TASK}
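
If you want to run many single GPU simulations in tandem, as suggested above, one convenient way is a Slurm job array. The script below is a sketch: it assumes job arrays are permitted on JADE and that your input files are named simulation1.tpr, simulation2.tpr and so on.

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=5
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH -J array-job
#SBATCH -p small
# one independent single GPU job per array index (assumes job arrays are allowed on JADE)
#SBATCH --array=1-8

module purge
module load gromacs/2019.3

# each array task picks up its own .tpr file, e.g. simulation3.tpr for index 3
gmx mdrun -deffnm simulation${SLURM_ARRAY_TASK_ID} -ntomp ${SLURM_CPUS_PER_TASK}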

 

Multiple GPU

Although we are using 4 GPUs here, we are still only running on one node. We are using one MPI task per GPU and 5 OpenMP CPU threads per task, so 4 GPUs and 20 CPUs in total. You can of course play with this combination of MPI tasks and OpenMP threads to see what suits your system best.

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=5
#SBATCH --gres=gpu:4
#SBATCH --time=02:00:00
#SBATCH -J job1
#SBATCH -p big

module purge
module load gromacs/2019.3

mpirun -np ${SLURM_NTASKS_PER_NODE} --bind-to socket mdrun_mpi -deffnm simulation1 -ntomp ${SLURM_CPUS_PER_TASK}

 

Running NAMD

NAMD 2.12 is installed and maintained on JADE by HECBioSim core SLA support.

Single GPU

 

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=5
#SBATCH --gres=gpu:1
#SBATCH --time=00:30:00
#SBATCH -J job1
#SBATCH -p small

module purge
module load namd/2.12

namd2 +p $SLURM_NTASKS_PER_NODE +setcpuaffinity +devices $CUDA_VISIBLE_DEVICES ./bench.in &> bench.out