SLURM cheatsheet help

This page collects commonly used SLURM commands with short tips and how-to quickies. More details can be found at (first two hits on a Google search):

Submitting a job

To submit a job, pass a script that contains the scheduler directives to sbatch:

$ sbatch <job_script>
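
As a minimal sketch of what such a script might look like, the snippet below writes a small job script and submits it only if sbatch is available. The file name my_job.sh, the job name, and the resource values are placeholders chosen for illustration; the partition and account names are borrowed from the srun examples on this page and may differ on your cluster.

```shell
#!/bin/bash
# Write a minimal job script (file name and values are illustrative).
cat > my_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=my_test_job
#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=8000
#SBATCH --account=my_project
#SBATCH --time=0-01:00:00
#SBATCH --output=my_test_job-%j.out

# the commands to run on the compute node go here
echo "running on $(hostname)"
EOF

# Submit only when sbatch is on the PATH (i.e. on a cluster login node).
if command -v sbatch >/dev/null 2>&1; then
    sbatch my_job.sh
else
    echo "sbatch not found; my_job.sh was written but not submitted"
fi
```

The %j in the output file name is replaced by the job id, so each run writes to its own file.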

To submit an interactive job for testing, debugging, or development, use the srun command:

# single-core interactive bash terminal on a compute node (e.g. for development)
$ srun --pty /bin/bash

# allocate a CPU-only job (specifying the resource details)
$ srun --partition=normal --nodes=1 --ntasks-per-node=4 --cpus-per-task=1 --mem=8000 --account=my_project --time=0-01:00:00 --pty /bin/bash

# allocate a gpu job
$ srun --partition=gpu --nodes=1 --ntasks-per-node=1 --cpus-per-task=1 --mem=8000 --gres=gpu --account=my_project --time=0-01:00:00 --pty /bin/bash
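
Once inside an interactive session, it can be useful to confirm what was actually allocated. The sketch below prints a few environment variables that SLURM normally sets inside a job; describe_allocation is a hypothetical helper name, and whether CUDA_VISIBLE_DEVICES is set for GPU jobs depends on how the cluster is configured.

```shell
# Print what the current shell was allocated; meant to be run inside
# the srun session. Variables that are not set are shown as "unset".
describe_allocation() {
    echo "job id:        ${SLURM_JOB_ID:-unset}"
    echo "node list:     ${SLURM_JOB_NODELIST:-unset}"
    echo "cpus per task: ${SLURM_CPUS_PER_TASK:-unset}"
    echo "visible gpus:  ${CUDA_VISIBLE_DEVICES:-unset}"
}

describe_allocation
```

If every line reads "unset", the shell is most likely still on the login node rather than on a compute node.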

List of running jobs

The list of jobs belonging to the current user (i.e. you) that are queued or running

$ squeue

The list of all jobs running or queued on the cluster

$ squeue -a

To show the estimated starting time of a pending job

$ squeue --start -j <job_id>
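
For a quick summary instead of a full listing, the squeue output can be piped through standard text tools. The following is a small sketch that counts your jobs per state; count_by_state is a hypothetical helper, and the sample input is only used when squeue is not installed. The -u option restricts the listing to one user regardless of how the cluster's default view is configured.

```shell
# Count jobs per state from a list of state names on stdin.
count_by_state() {
    sort | uniq -c | awk '{ print $2 ": " $1 }'
}

if command -v squeue >/dev/null 2>&1; then
    # -h drops the header line, -o "%T" prints only the job state
    squeue -h -u "$USER" -o "%T" | count_by_state
else
    # sample input for illustration when squeue is unavailable
    printf 'RUNNING\nPENDING\nRUNNING\n' | count_by_state
fi
```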

Remove a job from the queue

Use squeue to list the running jobs and find the JOBID (an integer in the first column of the squeue output). Once the id of the job to be killed is known, execute:

$ scancel job_to_be_killed_id
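
The job id can also be captured at submission time, which avoids looking it up in squeue later. The sketch below extracts the id from the "Submitted batch job" message that sbatch prints; job_id_from_sbatch is a hypothetical helper and my_job.sh is a placeholder script name.

```shell
# Extract the job id from sbatch's "Submitted batch job <id>" message.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Example with a sample message (no cluster needed):
echo "Submitted batch job 12345" | job_id_from_sbatch

# On a cluster this could be used as follows:
#   job_id=$(sbatch my_job.sh | job_id_from_sbatch)
#   scancel "$job_id"
# To cancel all of your own jobs at once:
#   scancel -u "$USER"
```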

List of hosts and queues/partitions on the cluster

$ sinfo

To see the details of the compute nodes with their respective specs

$ sinfo_all
onode01    idle     up     16    2:8:1   0.01     62536      intel                none
onode02    idle     up     16    2:8:1   0.01     63275      intel                none
onode03    idle     up     16    2:8:1   0.01     63317      intel                none
onode04    idle     up     16    2:8:1   0.08     63295      intel                none
onode05    idle     up     16    2:8:1   0.06     18614      amd                  none
onode06    idle     up     16    2:8:1   0.03     25758      amd                  none
onode07    idle     up     16    2:8:1   0.01     59303      amd                  none
onode08    idle     up     16    2:8:1   0.01     21531      amd                  none
onode09    idle     up     16    2:8:1   0.01     18060      amd                  none
onode10    idle     up     8     1:8:1   0.07     14140      amd                  none
onode11    idle     up     8     1:8:1   0.01     32087      amd                  none
onode12    idle     up     8     1:8:1   0.15     31365      amd                  none
onode13    idle     up     64    8:8:1   0.01     63232      amd                  none
onode14    idle     up     64    8:8:1   0.01     56430      amd                  none
onode15    idle     up     64    8:8:1   0.01     63092      amd                  none
onode16    idle     up     64    8:8:1   0.01     62363      amd                  none
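
sinfo_all appears to be a helper specific to this cluster. Where it is not available, a roughly similar per-node listing can be sketched with plain sinfo. The format string below is an assumption about how such a listing might be produced, not the actual definition of sinfo_all, and list_nodes is a hypothetical function name.

```shell
# Per-node listing: name, state, availability, CPUs, sockets:cores:threads,
# CPU load, free memory (MB), features, generic resources (e.g. GPUs).
NODE_FORMAT="%10N %8T %6a %5c %8z %8O %10e %20f %G"

list_nodes() {
    if command -v sinfo >/dev/null 2>&1; then
        # -N prints one line per node instead of one line per partition
        sinfo -N -o "$NODE_FORMAT"
    else
        echo "sinfo not found; would run: sinfo -N -o \"$NODE_FORMAT\""
    fi
}

list_nodes
```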

To see the details of the available partitions with their respective specs

$ sinfo_partitions
PARTITION           TIMELIMIT           NODELIST            MAX_CPUS_PER_NODE   NODES               JOB_SIZE            CPUS                MEMORY              GRES                NODES(A/I/O/T)
normal              1-00:00:00          onode[01-09]        UNLIMITED           9                   1-infinite          16                  60000+              (null)              0/9/0/9
large               1-00:00:00          onode[13-16]        UNLIMITED           4                   1-infinite          64                  256000              (null)              1/3/0/4
gpu                 6:00:00             onode10             UNLIMITED           1                   1-infinite          8                   15000               gpu:v100d16q:1      1/0/0/1
gpu                 6:00:00             onode[11-12]        UNLIMITED           2                   1-infinite          8                   32000               gpu:v100d32q:1      1/1/0/2
msfea-ai            3-00:00:00          onode12             UNLIMITED           1                   1-infinite          8                   32000               gpu:v100d32q:1      1/0/0/1
msfea-ai            3-00:00:00          onode10             UNLIMITED           1                   1-infinite          8                   15000               gpu:v100d16q:1      1/0/0/1
cmps-ai             3-00:00:00          onode11             UNLIMITED           1                   1-infinite          8                   32000               gpu:v100d32q:1      0/1/0/1
physics             1-00:00:00          onode[13-16]        UNLIMITED           4                   1-infinite          64                  256000              (null)              1/3/0/4
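
The last column, NODES(A/I/O/T), gives the allocated/idle/other/total node counts, so it can be used to spot partitions where a job could start right away. The sketch below filters the listing for rows with at least one idle node; partitions_with_idle_nodes is a hypothetical helper, and the sample rows are copied from the output above.

```shell
# Print partition, node list and idle-node count for every row that has
# at least one idle node. The first input line is treated as the header.
partitions_with_idle_nodes() {
    awk 'NR > 1 {
        split($NF, n, "/")              # n[1]=allocated, n[2]=idle, ...
        if (n[2] + 0 > 0) print $1, $3, n[2]
    }'
}

# Sample rows copied from the listing above (header plus three rows).
partitions_with_idle_nodes <<'EOF'
PARTITION TIMELIMIT NODELIST MAX_CPUS_PER_NODE NODES JOB_SIZE CPUS MEMORY GRES NODES(A/I/O/T)
normal 1-00:00:00 onode[01-09] UNLIMITED 9 1-infinite 16 60000+ (null) 0/9/0/9
gpu 6:00:00 onode10 UNLIMITED 1 1-infinite 8 15000 gpu:v100d16q:1 1/0/0/1
cmps-ai 3-00:00:00 onode11 UNLIMITED 1 1-infinite 8 32000 gpu:v100d32q:1 0/1/0/1
EOF
```

On the cluster, the same filter could be applied directly with: sinfo_partitions | partitions_with_idle_nodes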