Job scripts

The following script can be used as a template to execute bash commands for a serial or parallel program. It is just a template that includes the most commonly used flags. For working examples, see the job script examples below.

template job script

#!/bin/bash

## specify the job and project name
#SBATCH --job-name=my_job_name
#SBATCH --account=abc123

## specify the required resources
#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:v100d32q:1
#SBATCH --mem=12000
#SBATCH --time=0-01:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=abc123@mail.aub.edu

#
# add your command here, e.g.
#
echo "hello world"

Flags Description

  • #SBATCH --job-name=my_job_name: Set the name of the job. This name appears, e.g., when the squeue command is executed to query the queued or running jobs.

  • #SBATCH --account=abc123: Specify the ID of the project. This should correspond to the project ID of the service request. Jobs without this flag will be rejected.

  • #SBATCH --partition=normal: The name of the partition (a.k.a. queue) to which the job will be submitted.

  • #SBATCH --nodes=2: The number of nodes that will be reserved for the job.

  • #SBATCH --ntasks-per-node=8: The number of tasks (e.g. MPI processes) that will be reserved per node.

  • #SBATCH --cpus-per-task=2: The number of cores per task to be reserved for the job (e.g. the number of OpenMP threads per MPI task). The total number of cores reserved for the job is the product of the values of --nodes, --ntasks-per-node and --cpus-per-task.

  • #SBATCH --mem=32000: The amount of memory per node, in MB, that will be reserved for the job. Jobs that do not specify this flag will be rejected.

  • #SBATCH --time=1-00:00:00: The time limit of the job. When the limit is reached, the job is killed by the scheduler. Jobs that do not specify this flag will be rejected.

  • #SBATCH --mail-type=ALL: Receive email notifications for all stages of the job, e.g. when the job starts and terminates.

  • #SBATCH --mail-user=abc123@aub.edu.lb: The email address to which job notification emails are sent.
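As a quick check of the core-count rule above, the product can be computed with plain shell arithmetic; the values below are the hypothetical ones from the template (--nodes=1, --ntasks-per-node=8, --cpus-per-task=1):

```shell
# hypothetical values matching the template flags above
NODES=1
NTASKS_PER_NODE=8
CPUS_PER_TASK=1

# total cores = --nodes x --ntasks-per-node x --cpus-per-task
echo $((NODES * NTASKS_PER_NODE * CPUS_PER_TASK))
```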

Job scripts examples

Below is a list of job scripts that can be used to run serial or parallel jobs. For more details, please refer to the scientific computing section.

Note

All the jobs below are basic working examples that can be copied and modified to suit the user's needs. Make sure to change the account to the one you are using.

Note

Every application is different and might need special flags to run correctly. Please consult the documentation of the application you are using to make sure you are using the correct flags. You may also email it.helpdesk@aub.edu.lb for advice.

serial - single core job

#!/bin/bash

#SBATCH --job-name=test-job
#SBATCH --account=abc123

#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=8000
#SBATCH --time=0-03:00:00

#SBATCH --mail-type=ALL
#SBATCH --mail-user=abc123@mail.aub.edu

echo "Hello World!"

single node smp job

#!/bin/bash

#SBATCH --job-name=test-job
#SBATCH --account=abc123

#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32000
#SBATCH --time=0-03:00:00

#SBATCH --mail-type=ALL
#SBATCH --mail-user=abc123@mail.aub.edu

echo "Hello World!"

parallel multi-host job

#!/bin/bash

#SBATCH --job-name=test-job
#SBATCH --account=abc123

#SBATCH --partition=large   # normal, arza, medium ...
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=16
#SBATCH --mem=256000
#SBATCH --time=0-03:00:00

#SBATCH --mail-type=ALL
#SBATCH --mail-user=abc123@mail.aub.edu

echo "Hello World!"

single host GPU job

#!/bin/bash

#SBATCH --job-name=test-job
#SBATCH --account=abc123

#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32000
#SBATCH --gres=gpu:v100d32q:1
#SBATCH --time=0-03:00:00

#SBATCH --mail-type=ALL
#SBATCH --mail-user=abc123@mail.aub.edu

echo "Hello World!"

multi-host GPU job

#!/bin/bash

#SBATCH --job-name=test-job
#SBATCH --account=abc123

#SBATCH --partition=gpu
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32000
#SBATCH --gres=gpu:v100d32q:1
#SBATCH --time=0-03:00:00

#SBATCH --mail-type=ALL
#SBATCH --mail-user=abc123@mail.aub.edu

echo "Hello World!"

Batch job submission and monitoring procedure

  • submit the job script using SLURM

    $ sbatch my_job_script.sh
    

    This will submit the job to the queueing system. The job may run immediately or remain pending until the requested resources become available.

  • check the status of the job

    $ squeue -a
    
  • After the job is dispatched for execution (starts running), monitor the output by checking the .o file.

For more information on using SLURM, please consult the man pages:

$ man sbatch

Jobs time limits and checkpoints

In order to ensure fair usage of the resources and the partitions (queues), different partitions have different time limits. The maximum time limit for jobs is 3 days. Partitions also have different priorities to enforce fair usage; for example, short jobs have higher priorities than long jobs. When a job reaches the time limit specified in the job script, or the time limit of the partition, it is automatically killed and removed from the queue. It is the responsibility of the user to set the job parameters based on the requirements of the job and the available resources.

In all the examples below, it is the responsibility of the user to manage writing the checkpoint file and loading it.

In the following example, a job array (#SBATCH --array=1-30%1) is used to indicate that the job should be run as a chain of 30 jobs, back to back, with at most one job running at a time (the %1 suffix). Using this flow, a job can run for arbitrarily long periods; here, for the sake of demonstration, the work is split into a chain of 30 individual jobs. When one job finishes, a checkpoint file foo.chkp is written to disk; when the next job starts, foo.chkp is read, the program state is restored, and execution resumes.

#!/bin/bash

#SBATCH --job-name=my_job_name
#SBATCH --account=abc123

## specify the required resources
#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=2
#SBATCH --mem=12000
#SBATCH --time=0-01:00:00
#SBATCH --array=1-30%1

## load some modules
module load python

# start executing the program
MY_CHECKPOINT_FILE=foo.chkp
if [ ! -f "${MY_CHECKPOINT_FILE}" ]; then
    # checkpoint file does not exist: start from scratch
    python train_model_from_scratch.py
else
    # checkpoint file exists: read it and continue training
    python train_model_from_scratch.py --use-checkpoint=${MY_CHECKPOINT_FILE}
fi
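Note that the branching above should depend on whether the checkpoint file exists on disk; in bash that is the [ -f path ] test (whereas [ -z str ] tests for an empty string). A minimal stand-alone sketch of this pattern, using a temporary path:

```shell
# pick a temporary path that does not exist yet
CHKP="$(mktemp -u)"

if [ ! -f "$CHKP" ]; then
    echo "no checkpoint found: start from scratch"
fi

touch "$CHKP"        # pretend the program wrote a checkpoint

if [ -f "$CHKP" ]; then
    echo "checkpoint found: resume from it"
fi

rm -f "$CHKP"        # clean up
```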

Each job in the job array will have its own .out file suffixed with the job array index, e.g. my_slurm_30.out.
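If you prefer explicit output file names, sbatch supports filename patterns in the --output flag: %A expands to the master array job ID and %a to the array index. A hypothetical addition to the script above:

```shell
## optional: set an explicit output file name per array task
#SBATCH --output=my_slurm_%A_%a.out
```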

The main difference between using job dependencies and a job array is that with dependencies the job is resubmitted indefinitely, until the user decides to cancel the automatic re-submission.

Warning

It is important to include a wait time of a few minutes (e.g. 5 minutes) so that the scheduler is not overloaded by the recursive resubmission of jobs in case something goes wrong.

In the template job script below, an sbatch command inside the job submits the next job as a dependency of the current one. The simulation/program resume procedure is the same as with job arrays, i.e. if a checkpoint exists, run the program from the checkpoint; otherwise run the program and create the checkpoint.

#!/bin/bash

#SBATCH --job-name=my_job_name
#SBATCH --account=abc123

## specify the required resources
#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=2
#SBATCH --mem=12000
#SBATCH --time=0-01:00:00

## submit the dependency that will start after the current job finishes
sbatch --dependency=afterok:${SLURM_JOBID} job.sh
sleep 300

# start executing the program
MY_CHECKPOINT_FILE=foo.chkp
if [ ! -f "${MY_CHECKPOINT_FILE}" ]; then
    # checkpoint file does not exist: start from scratch
    python train_model_from_scratch.py
else
    # checkpoint file exists: read it and continue training
    python train_model_from_scratch.py --use-checkpoint=${MY_CHECKPOINT_FILE}
fi
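To stop a self-resubmitting chain like the one above, cancel your jobs by name so the pending dependent job is removed as well (my_job_name below is the job name set in this script; replace it with yours):

```shell
# cancel all of your jobs carrying this name, including
# the pending dependent job, which stops the chain
scancel --name=my_job_name
```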