    # Slurm Job Submission and Execution
    
    The [Slurm][1] workload manager is used to allocate and access Barbora cluster and Complementary systems resources. Support for the Karolina cluster is coming soon.
    
    A `man` page exists for all Slurm commands, and the `--help` command option provides a brief summary of options. Slurm [documentation][c] and [man pages][d] are also available online.
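    
    For example, to read the manual page or the brief option summary for `sbatch`:
    
    ```console
    $ man sbatch
    $ sbatch --help
    ```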
    
    ## Getting Partitions Information
    
    Display partitions/queues on system:
    
    
    ```console
    $ sinfo -s
    
    PARTITION    AVAIL  TIMELIMIT   NODES(A/I/O/T) NODELIST
    
    qcpu*           up 2-00:00:00      1/191/0/192 cn[1-192]
    qcpu_biz        up 2-00:00:00      1/191/0/192 cn[1-192]
    qcpu_exp        up    1:00:00      1/191/0/192 cn[1-192]
    qcpu_free       up   18:00:00      1/191/0/192 cn[1-192]
    qcpu_long       up 6-00:00:00      1/191/0/192 cn[1-192]
    qcpu_preempt    up   12:00:00      1/191/0/192 cn[1-192]
    
    qgpu            up 2-00:00:00          0/8/0/8 cn[193-200]
    qgpu_biz        up 2-00:00:00          0/8/0/8 cn[193-200]
    qgpu_exp        up    1:00:00          0/8/0/8 cn[193-200]
    qgpu_free       up   18:00:00          0/8/0/8 cn[193-200]
    qgpu_preempt    up   12:00:00          0/8/0/8 cn[193-200]
    qfat            up 2-00:00:00          0/1/0/1 cn201
    qdgx            up 2-00:00:00          0/1/0/1 cn202
    qviz            up    8:00:00          0/2/0/2 vizserv[1-2]
    
    ```
    
    The `NODES(A/I/O/T)` column summarizes node counts per state, where `A/I/O/T` stands for `allocated/idle/other/total`. For example, `1/191/0/192` means 1 node allocated, 191 idle, 0 in another state, and 192 in total.
    
    The example output is from the Barbora cluster.
    
    On the Barbora cluster, all queues/partitions provide full node allocation, i.e., whole nodes are allocated to jobs.
    
    
    On Complementary systems, only some queues/partitions provide full node allocation; see the [Complementary systems documentation][2] for details.
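    
    Details of a particular partition (time limit, node list, defaults) can also be queried with `scontrol`; a quick sketch, using the `qcpu` partition from the listing above:
    
    ```console
    $ scontrol show partition qcpu
    ```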
    
    ## Getting Job Information
    
    Show all jobs on system:
    
    ```console
    $ squeue
    ```
    
    
    Show my jobs:
    
    ```console
    $ squeue --me
                 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                   104   qcpu    interact    user   R       1:48      2 cn[101-102]
    ```
    
    Show job details for a specific job:
    
    ```console
    $ scontrol show job JOBID
    ```
    
    Show job details for the executing job from within the job session:
    
    ```console
    $ scontrol show job $SLURM_JOBID
    ```
    
    
    Show my jobs using long output format which includes time limit:
    
    
    ```console
    $ squeue --me -l
    ```
    
    Show my jobs in running state:
    
    
    ```console
    $ squeue --me -t running
    ```
    
    
    Show my jobs in pending state:
    
    
    ```console
    $ squeue --me -t pending
    ```
    
    
    Show jobs for a given project:
    
    ```console
    $ squeue -A PROJECT-ID
    ```
    
    ## Running Interactive Jobs
    
    
    Run an interactive job - queue qcpu_exp, one node by default, one task by default:
    
    ```console
    $ salloc -A PROJECT-ID -p qcpu_exp
    ```
    
    Run an interactive job on four nodes, 36 tasks per node (the recommended value for the Barbora cluster CPU partitions, based on the node core count), with a two-hour time limit:
    
    ```console
    $ salloc -A PROJECT-ID -p qcpu -N 4 --ntasks-per-node 36 -t 2:00:00
    ```
    
    Run an interactive job with X11 forwarding:
    
    ```console
    $ salloc -A PROJECT-ID -p qcpu_exp --x11
    ```
    
    To finish the interactive job, use either the `exit` command or the Ctrl+D (`^D`) control sequence.
    
    
    !!! warning
        Do not use `srun` to initiate interactive jobs; subsequent `srun` and `mpirun` invocations would block forever.
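    
    A correct interactive workflow is therefore `salloc` first, then `srun`/`mpirun` inside the allocation. A sketch with illustrative output; the module and program names are placeholders:
    
    ```console
    $ salloc -A PROJECT-ID -p qcpu -N 2 --ntasks-per-node 36 -t 1:00:00
    salloc: Granted job allocation 123456
    $ ml OpenMPI/4.1.4-GCC-11.3.0
    $ srun ./my_mpi_program
    $ exit
    ```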
    
    ## Running Batch Jobs
    
    Create an example batch script called script.sh with the following content:
    
    ```shell
    #!/usr/bin/bash
    #SBATCH --job-name MyJobName
    #SBATCH --account PROJECT-ID
    #SBATCH --partition qcpu
    #SBATCH --nodes 4
    #SBATCH --ntasks-per-node 36
    #SBATCH --time 12:00:00
    
    ml OpenMPI/4.1.4-GCC-11.3.0
    
    srun hostname | sort | uniq -c
    ```
    
    In the example above, the batch job will:
    
    * use the bash shell interpreter
    * use MyJobName as the job name
    * use project PROJECT-ID for job access and accounting
    * use the partition/queue qcpu
    * use four nodes
    * use 36 tasks per node
    * set the job time limit to 12 hours
    * load the appropriate module
    * run the command; `srun` is Slurm's native way of executing MPI-enabled applications, and `hostname` is used here just for the sake of simplicity
    
    
    Run the batch job:
    
    ```console
    ### the submit directory my_work_dir will also be used as the working directory of the submitted job
    $ cd my_work_dir
    $ sbatch script.sh
    ```
    
    Example output of the job:
    ```shell
         36 cn17.barbora.it4i.cz
         36 cn18.barbora.it4i.cz
         36 cn19.barbora.it4i.cz
         36 cn20.barbora.it4i.cz
    ```
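    
    By default, Slurm writes the job's standard output to a file `slurm-JOBID.out` in the submit directory; the job id below is just a placeholder:
    
    ```console
    $ cat slurm-123456.out
    ```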
    
    ## Job Environment Variables
    
    Slurm provides useful information to the job via environment variables. These variables are available on all nodes allocated to the job when accessed via Slurm-supported means (`srun`, compatible `mpirun`).
    
    See all Slurm variables:
    
    ```console
    $ set | grep ^SLURM
    ```
    
    | variable name | description | example |
    | ------ | ------ | ------ |
    | SLURM_JOBID | job id of the executing job| 593 |
    | SLURM_JOB_NODELIST | nodes allocated to the job | cn[101-102] |
    | SLURM_JOB_NUM_NODES | number of nodes allocated to the job | 2 |
    | SLURM_STEP_NODELIST | nodes allocated to the job step | cn101 |
    | SLURM_STEP_NUM_NODES | number of nodes allocated to the job step | 1 |
    | SLURM_JOB_PARTITION | name of the partition | qcpu |
    | SLURM_SUBMIT_DIR | submit directory | /scratch/project/open-xx-yy/work |
    
    See relevant [Slurm documentation][3] for details.
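    
    A minimal (hypothetical) batch script sketch illustrating how these variables can be used from within a job; account, partition, and limits are placeholders:
    
    ```shell
    #!/usr/bin/bash
    #SBATCH --job-name EnvDemo
    #SBATCH --account PROJECT-ID
    #SBATCH --partition qcpu
    #SBATCH --nodes 2
    #SBATCH --time 00:05:00
    
    # job-level variables are available directly in the batch script
    echo "Job ${SLURM_JOBID} runs in partition ${SLURM_JOB_PARTITION}"
    echo "Allocated ${SLURM_JOB_NUM_NODES} node(s): ${SLURM_JOB_NODELIST}"
    echo "Submitted from ${SLURM_SUBMIT_DIR}"
    
    # step-level variables are set for each job step launched via srun
    srun -N 1 -n 1 bash -c 'echo "Step runs on ${SLURM_STEP_NODELIST} (${SLURM_STEP_NUM_NODES} node)"'
    ```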
    
    Get the job nodelist:
    
    ```console
    $ echo $SLURM_JOB_NODELIST
    cn[101-102]
    ```
    
    Expand the nodelist to a list of individual nodes:
    
    ```console
    $ scontrol show hostnames
    cn101
    cn102
    ```
    
    ## Modifying Jobs
    
    Modify a job attribute (in general):
    
    ```console
    $ scontrol update JobId=JOBID ATTR=VALUE
    ```
    
    Modify a job's time limit:
    
    ```console
    $ scontrol update JobId=JOBID timelimit=4:00:00
    ```
    
    Set/modify a job's comment:
    
    ```console
    $ scontrol update JobId=JOBID Comment='The best job ever'
    ```
    
    ## Deleting Jobs
    
    Delete a job by job id:
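    
    ```console
    $ scancel JOBID
    ```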
    
    
    Delete all my jobs:
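    
    ```console
    $ scancel --me
    ```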
    
    Delete all my jobs in interactive mode, confirming every action:
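    
    ```console
    $ scancel --me -i
    ```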
    
    Delete all my running jobs:
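    
    ```console
    $ scancel --me -t running
    ```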
    
    Delete all my pending jobs:
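    
    ```console
    $ scancel --me -t pending
    ```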
    
    Delete all my pending jobs for project PROJECT-ID:
    
    
    ```console
    $ scancel --me -t pending -A PROJECT-ID
    ```
    
    
    [1]: https://slurm.schedmd.com/
    [2]: /cs/job-scheduling/#partitions
    [3]: https://slurm.schedmd.com/srun.html#SECTION_OUTPUT-ENVIRONMENT-VARIABLES
    [a]: https://slurm.schedmd.com/
    [b]: http://slurmlearning.deic.dk/
    [c]: https://slurm.schedmd.com/documentation.html
    [d]: https://slurm.schedmd.com/man_index.html
    [e]: https://slurm.schedmd.com/sinfo.html
    [f]: https://slurm.schedmd.com/squeue.html
    [g]: https://slurm.schedmd.com/scancel.html
    [h]: https://slurm.schedmd.com/scontrol.html
    [i]: https://slurm.schedmd.com/job_array.html