The [Slurm][1] workload manager is used to allocate and access resources on the Barbora cluster and Complementary systems. Support for the Karolina cluster is coming soon.
A `man` page exists for each Slurm command, and every command also accepts the `--help` option, which prints a brief summary of its options. Slurm [documentation][c] and [man pages][d] are also available online.
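For example, to open the full manual page for `sbatch`, or to print just a brief summary of its options:

```console
$ man sbatch
$ sbatch --help
```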
## Getting Partitions Information
Display partitions/queues on the system:

```console
$ sinfo -s
PARTITION    AVAIL TIMELIMIT  NODES(A/I/O/T) NODELIST
qcpu*           up 2-00:00:00    1/191/0/192 cn[1-192]
qcpu_biz        up 2-00:00:00    1/191/0/192 cn[1-192]
qcpu_exp        up 1:00:00       1/191/0/192 cn[1-192]
qcpu_free       up 18:00:00      1/191/0/192 cn[1-192]
qcpu_long       up 6-00:00:00    1/191/0/192 cn[1-192]
qcpu_preempt    up 12:00:00      1/191/0/192 cn[1-192]
qgpu            up 2-00:00:00        0/8/0/8 cn[193-200]
qgpu_biz        up 2-00:00:00        0/8/0/8 cn[193-200]
qgpu_exp        up 1:00:00           0/8/0/8 cn[193-200]
qgpu_free       up 18:00:00          0/8/0/8 cn[193-200]
qgpu_preempt    up 12:00:00          0/8/0/8 cn[193-200]
qfat            up 2-00:00:00        0/1/0/1 cn201
qdgx            up 2-00:00:00        0/1/0/1 cn202
qviz            up 8:00:00           0/2/0/2 vizserv[1-2]
```
The `NODES(A/I/O/T)` column summarizes the node count per state, where `A/I/O/T` stands for `allocated/idle/other/total`.
The example output is from the Barbora cluster.
On the Barbora cluster, all queues/partitions provide full node allocation, i.e. whole nodes are allocated to jobs.
On Complementary systems, only some queues/partitions provide full node allocation; see the [Complementary systems documentation][2] for details.
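To inspect the full configuration of a single partition (time limit, node list, allowed accounts, etc.), you can also query it directly; a minimal example, using the qcpu partition from the listing above:

```console
$ scontrol show partition qcpu
```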
## Running Interactive Jobs
Sometimes you may want to run a job interactively, for example for debugging or for running commands one by one from the command line.
Run an interactive job in the qcpu_exp queue (one node and one task by default):

```console
$ salloc -A PROJECT-ID -p qcpu_exp
```

Run an interactive job on four nodes with 36 tasks per node (the recommended value on the Barbora CPU partitions, matching the node core count) and a two-hour time limit:

```console
$ salloc -A PROJECT-ID -p qcpu -N 4 --ntasks-per-node 36 -t 2:00:00
```

Run an interactive job with X11 forwarding:

```console
$ salloc -A PROJECT-ID -p qcpu_exp --x11
```

To finish the interactive job, either use the `exit` command or press Ctrl+D (`^D`).
!!! warning
    Do not use `srun` to initiate interactive jobs; subsequent `srun` or `mpirun` invocations would block forever.
## Running Batch Jobs
Batch jobs are the standard way of running jobs and utilizing HPC clusters.
Create an example batch script called `script.sh` with the following content:
```shell
#!/usr/bin/bash
#SBATCH --job-name MyJobName
#SBATCH --account PROJECT-ID
#SBATCH --partition qcpu
#SBATCH --nodes 4
#SBATCH --ntasks-per-node 36
#SBATCH --time 12:00:00

ml OpenMPI/4.1.4-GCC-11.3.0

srun hostname | sort | uniq -c
```
* use MyJobName as the job name
* use project PROJECT-ID for job access and accounting
* use the qcpu partition/queue
* use four nodes
* use 36 tasks per node
* set the job time limit to 12 hours
* load the appropriate module
* run the command; `srun` is Slurm's native way of executing MPI-enabled applications, and `hostname` is used in the example just for the sake of simplicity
### Job Submit
Submit the batch job:

```console
# the submit directory my_work_dir will also be used as the working directory of the submitted job
$ cd my_work_dir
$ sbatch script.sh
```
Example output of the job:
```shell
36 cn17.barbora.it4i.cz
36 cn18.barbora.it4i.cz
36 cn19.barbora.it4i.cz
36 cn20.barbora.it4i.cz
```
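By default, Slurm writes the job's standard output and error to a single file named `slurm-<jobid>.out` in the directory from which the job was submitted. If you prefer explicit file names, a minimal sketch of the corresponding `#SBATCH` directives (the file names are purely illustrative):

```shell
#SBATCH --output my_job.%j.out   # %j expands to the job id
#SBATCH --error  my_job.%j.err   # without --error, stderr goes to the output file
```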
### Job Environment Variables
Slurm provides useful information to the job via environment variables. These variables are available on all nodes allocated to the job when accessed via Slurm-supported means (`srun`, compatible `mpirun`). Commonly used variables include:
| Variable | Description | Example |
| -------- | ----------- | ------- |
| SLURM_JOBID | job id of the executing job | 593 |
| SLURM_JOB_NODELIST | nodes allocated to the job | cn[101-102] |
| SLURM_JOB_NUM_NODES | number of nodes allocated to the job | 2 |
| SLURM_STEP_NODELIST | nodes allocated to the job step | cn101 |
| SLURM_STEP_NUM_NODES | number of nodes allocated to the job step | 1 |
| SLURM_JOB_PARTITION | name of the partition | qcpu |
| SLURM_SUBMIT_DIR | submit directory | /scratch/project/open-xx-yy/work |
See relevant [Slurm documentation][3] for details.
```console
$ echo $SLURM_JOB_NODELIST
cn[101-102]
```
Expand the nodelist to a list of individual nodes:
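One way to do this is with `scontrol`, which prints one hostname per line (a short sketch; the hostnames follow the example above):

```console
$ scontrol show hostnames $SLURM_JOB_NODELIST
cn101
cn102
```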
## Job Management
### Getting Job Information
Show all jobs on the system:
```console
$ squeue
```
Show my jobs:
```console
$ squeue --me
   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
     104      qcpu interact     user  R       1:48      2 cn[101-102]
```
Show job details for a specific job:
```console
$ scontrol show job JOBID
```
Show job details for the executing job from within the job session:
```console
$ scontrol show job $SLURM_JOBID
```
Show my jobs using the long output format, which includes the time limit:
```console
$ squeue --me -l
```
Show my jobs in running state:
```console
$ squeue --me -t running
```
Show my jobs in pending state:
```console
$ squeue --me -t pending
```
Show jobs for given project:
```console
$ squeue -A PROJECT-ID
```
### Job States
The most common job states are (in alphabetical order):
| Code | Job State | Explanation |
| :--: | :------------ | :------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| CA | CANCELLED | Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated. |
| CD | COMPLETED | Job has terminated all processes on all nodes with an exit code of zero. |
| CG | COMPLETING | Job is in the process of completing. Some processes on some nodes may still be active. |
| F | FAILED | Job terminated with non-zero exit code or other failure condition. |
| NF | NODE_FAIL | Job terminated due to failure of one or more allocated nodes. |
| OOM | OUT_OF_MEMORY | Job experienced out of memory error. |
| PD | PENDING | Job is awaiting resource allocation. |
| PR | PREEMPTED | Job terminated due to preemption. |
| R | RUNNING | Job currently has an allocation. |
| RQ | REQUEUED | Completing job is being requeued. |
| SI | SIGNALING | Job is being signaled. |
| TO | TIMEOUT | Job terminated upon reaching its time limit. |
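Once a job has left the queue, `squeue` no longer shows it; its final state can usually be retrieved from Slurm's accounting database (a sketch, assuming job accounting is enabled on the cluster):

```console
$ sacct -j JOBID --format=JobID,JobName,State,ExitCode,Elapsed
```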
### Modifying Jobs

Modify an attribute of a queued or running job (general form, followed by an example setting a four-hour time limit):

```console
$ scontrol update JobId=JOBID ATTR=VALUE
$ scontrol update JobId=JOBID timelimit=4:00:00
```

Set a comment on a job:

```console
$ scontrol update JobId=JOBID Comment='The best job ever'
```

### Deleting Jobs

Delete a job by its job id:

```console
$ scancel JOBID
```

Delete all my jobs:

```console
$ scancel --me
```

Delete all my jobs in interactive mode, confirming every action:

```console
$ scancel --me -i
```

Delete all my running jobs:

```console
$ scancel --me -t running
```

Delete all my pending jobs:

```console
$ scancel --me -t pending
```

Delete all my pending jobs for project PROJECT-ID:

```console
$ scancel --me -t pending -A PROJECT-ID
```
### Invalid Account
If job submission fails with an error message such as

`sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified`

one of the following usually applies:

* an invalid account (i.e. project) was specified in the job submission,
* the user does not have access to the given account/project,
* the given account/project does not have access to the given partition, or
* access to the given partition was retracted due to the project's allocation exhaustion.
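One way to check which account/partition combinations are available to you is to list your Slurm associations (a sketch; the exact fields and availability may vary with the site's accounting configuration):

```console
$ sacctmgr show associations user=$USER format=Account,Partition,QOS
```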
[1]: https://slurm.schedmd.com/
[2]: /cs/job-scheduling/#partitions
[3]: https://slurm.schedmd.com/srun.html#SECTION_OUTPUT-ENVIRONMENT-VARIABLES
[a]: https://slurm.schedmd.com/
[b]: http://slurmlearning.deic.dk/
[c]: https://slurm.schedmd.com/documentation.html
[d]: https://slurm.schedmd.com/man_index.html
[e]: https://slurm.schedmd.com/sinfo.html
[f]: https://slurm.schedmd.com/squeue.html
[g]: https://slurm.schedmd.com/scancel.html
[h]: https://slurm.schedmd.com/scontrol.html
[i]: https://slurm.schedmd.com/job_array.html