This is a quick-start guide aimed at HECBioSim users of the UK Tier 2 JADE facility. It is not intended as official documentation for the JADE HPC facility, which can be found here.
Using Slurm
Query Usage
Running AMBER
Running GROMACS
Running NAMD
Using Slurm
Slurm is a batch scheduling system for managing simulations run by many users on HPC resources. The following commands will get you started with Slurm, but they are by no means exhaustive; consult the JADE documentation for more examples, or the Slurm documentation for more advanced usage.
Submitting a job
Assuming you have created a bash script called "submit.sh" containing your resource requirements, the software to run and so on, then:
sbatch submit.sh
will submit a job to the JADE queue.
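As a minimal illustration (the job name, resource numbers and commands here are placeholders to adapt to your own workload, not a recommended configuration), a submit.sh might look like:
#!/bin/bash
#SBATCH --nodes=1                 # number of nodes
#SBATCH --gres=gpu:1              # GPUs per node
#SBATCH --time=01:00:00           # walltime limit (hh:mm:ss)
#SBATCH -J myjob                  # job name (placeholder)
#SBATCH -p small                  # partition, see "Job partitions on JADE" below
module load <software-module>     # e.g. amber/18, gromacs/2019.3 or namd/2.12
<command to run your simulation>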
Querying job status
If you have jobs running on JADE, then you can query their status with:
squeue -u <user-id>
This will list all jobs submitted by your user account that have not yet completed.
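If you only want to see jobs in a particular state, squeue can also filter on state; for example, to list only your running jobs (state names follow the standard Slurm conventions):
squeue -u <user-id> -t RUNNING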
Deleting a job
To delete a job you have submitted, use its job ID (shown when you submitted the job, or found by querying your job status) with the following command:
scancel <jobid>
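scancel can also take a user ID instead of a job ID; the following cancels every job you have queued or running, so use it with care:
scancel -u <user-id>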
Job partitions on JADE
One thing to be aware of is the partitions on JADE. There are three: "small", "big" and "devel". The "devel" partition is for developers testing code builds and for users verifying that their job scripts work when running unfamiliar regimes (do not run full production simulations here). The other two partitions classify jobs by size: "small" is for simulations using a single GPU, whilst "big" is for jobs using 4 or 8 GPUs. You will need to pay attention to this when writing your job submission scripts. The key numbers for these partitions are in the table below:
| Partition name | Partition size | Job walltime limit | Running job limit |
| --- | --- | --- | --- |
| big | 11 nodes | 24 hours | 5 Jobs |
| small | 10 nodes | 6 days | 8 Jobs |
| devel | 1 node | 1 hour | 1 Job |
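For example, to send a short test of an unfamiliar job script to the "devel" partition before a full production run, you would set the partition and walltime accordingly in your submit script (the job name below is a placeholder):
#SBATCH --time=01:00:00   # devel jobs are limited to 1 hour
#SBATCH -J testjob        # placeholder job name
#SBATCH -p devel          # use the devel partition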
Query Usage
Usage on JADE broadly falls into two categories: how much compute you have used and how much storage you are using. To find out your compute usage, run:
$ account-balance
and you will see something like the output below. The users in your project account(s) are listed with their individual usage in the right-hand column, with the project total in the top row.
Cluster Account Login Proper Name GPU Name Used
--------- -------------------- --------- --------------- -------------- ----------
jade account1 gres/gpu 314
jade account1 user1-bu+ user1 gres/gpu 249
jade account1 user2-bu+ user2 gres/gpu 65
To find out your disk usage, run:
$ get-quota
and you will see something like the output below, describing how much space you have used, your quota and limit, along with the same information for the number of files.
Filesystem used quota limit grace files quota limit grace
/jmain01 5.135G 0k 0k - 55315 0 0 -
Running AMBER
AMBER 18 is installed and maintained on JADE by HECBioSim core SLA support. AMBER is fairly straightforward to run on JADE and shows excellent GPU performance. See the example scripts below.
Single GPU
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --time=1:00:00
#SBATCH -J job1
#SBATCH -p small
module load amber/18
pmemd.cuda -O -i md1.in -p example.top -c example.crd -ref example-ref.crd
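Multiple GPU
If your system is too large for a single GPU, AMBER can also be run across several GPUs via its MPI build. The script below is a sketch only, assuming the amber/18 module provides pmemd.cuda.MPI and an MPI launcher; multi-GPU scaling in AMBER depends strongly on system size, so benchmark a short run before committing resources.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4       # one MPI task per GPU
#SBATCH --gres=gpu:4
#SBATCH --time=1:00:00
#SBATCH -J job1
#SBATCH -p big
module load amber/18
mpirun -np ${SLURM_NTASKS_PER_NODE} pmemd.cuda.MPI -O -i md1.in -p example.top -c example.crd -ref example-ref.crd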
Running GROMACS
GROMACS 2019.3 is installed and maintained on JADE by HECBioSim core SLA support.
Running GROMACS on a single GPU is probably the most common way users will submit GROMACS jobs to JADE. In terms of performance, it is often better to run many single-GPU jobs in tandem than to run one large parallelised simulation across many GPUs. A good use case for multiple GPUs is a system that is too big for the memory on one Volta GPU (16 GB). Here are example job submission scripts for both cases.
Single GPU
For a single GPU job, you will also have 5 CPUs available. You are free to use some or all of them depending on your system; changing this number (up to a maximum of five) may affect performance.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=5
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH -J job1
#SBATCH -p small
module purge
module load gromacs/2019.3
gmx mdrun -deffnm simulation1 -ntomp ${SLURM_CPUS_PER_TASK}
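If a run hits the walltime limit, GROMACS can carry on from its checkpoint file in a follow-up job; a minimal sketch, assuming the previous run wrote simulation1.cpt:
gmx mdrun -deffnm simulation1 -cpi simulation1.cpt -ntomp ${SLURM_CPUS_PER_TASK}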
Multiple GPU
Although we are using 4 GPUs here, we are still only running on one node. We are using one MPI task per GPU and 5 OpenMP CPU threads per GPU, so in total 4 GPUs and 20 CPUs. You can of course experiment with this combination of MPI and OpenMP to see what suits your system best.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=5
#SBATCH --gres=gpu:4
#SBATCH --time=02:00:00
#SBATCH -J job1
#SBATCH -p big
module purge
module load gromacs/2019.3
mpirun -np ${SLURM_NTASKS_PER_NODE} --bind-to socket mdrun_mpi -deffnm simulation1 -ntomp ${SLURM_CPUS_PER_TASK}
Running NAMD
NAMD 2.12 is installed and maintained on JADE by HECBioSim core SLA support.
Single GPU
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=5
#SBATCH --gres=gpu:1
#SBATCH --time=00:30:00
#SBATCH -J job1
#SBATCH -p small
module purge
module load namd/2.12
namd2 +p $SLURM_NTASKS_PER_NODE +setcpuaffinity +devices $CUDA_VISIBLE_DEVICES ./bench.in &> bench.out