The [Slurm][1] workload manager is used to allocate and access resources on the Barbora cluster and the Complementary systems.
## Getting Partition Information
Display partitions/queues:

```
$ sinfo -s
PARTITION     AVAIL  TIMELIMIT  NODES(A/I/O/T)  NODELIST
qcpu*            up 2-00:00:00     1/191/0/192  cn[1-192]
qcpu_biz         up 2-00:00:00     1/191/0/192  cn[1-192]
qcpu_exp         up    1:00:00     1/191/0/192  cn[1-192]
qcpu_free        up   18:00:00     1/191/0/192  cn[1-192]
qcpu_long        up 6-00:00:00     1/191/0/192  cn[1-192]
qcpu_preempt     up   12:00:00     1/191/0/192  cn[1-192]
qgpu             up 2-00:00:00         0/8/0/8  cn[193-200]
qgpu_biz         up 2-00:00:00         0/8/0/8  cn[193-200]
qgpu_exp         up    1:00:00         0/8/0/8  cn[193-200]
qgpu_free        up   18:00:00         0/8/0/8  cn[193-200]
qgpu_preempt     up   12:00:00         0/8/0/8  cn[193-200]
qfat             up 2-00:00:00         0/1/0/1  cn201
qdgx             up 2-00:00:00         0/1/0/1  cn202
qviz             up    8:00:00         0/2/0/2  vizserv[1-2]
```
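For the full configuration of a single partition (limits, defaults, node list), `scontrol` can be queried; `qcpu` below is just one of the partitions from the listing above:

```
$ scontrol show partition qcpu
```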
## Getting Job Information
Display your jobs:

```
$ squeue --me
  JOBID PARTITION     NAME     USER ST   TIME  NODES NODELIST(REASON)
    104      qcpu interact     user  R   1:48      2 cn[101-102]
```
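`squeue` also accepts filters and a long output format; as an illustrative example (the partition and state below are placeholders), list only your running jobs in one partition:

```
$ squeue --me -p qcpu -t RUNNING -l
```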
Show job details for a specific job:

```
$ scontrol -d show job JOBID
```
Show job details for the executing job from within the job session:

```
$ scontrol -d show job $SLURM_JOBID
```
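From within a job session, a single attribute can also be queried via an output format specifier; the sketch below uses the standard `%L` field to print the job's remaining walltime:

```
$ squeue -h -j $SLURM_JOBID -o "%L"
```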
## Running Interactive Jobs
Run an interactive job:

```
$ salloc -A PROJECT-ID -p qcpu
```

Run an interactive job with X11 forwarding:

```
$ salloc -A PROJECT-ID -p qcpu --x11
```
!!! warning
    Do not use `srun` to initiate interactive jobs; subsequent `srun` or `mpirun` invocations would block forever.
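A fuller interactive allocation simply combines the usual resource options; the node count, time limit, and module below are illustrative (the module version is taken from the batch example that follows):

```
$ salloc -A PROJECT-ID -p qcpu -N 2 --ntasks-per-node=36 -t 1:00:00
$ ml OpenMPI/4.1.4-GCC-11.3.0
$ srun hostname | uniq -c
```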
## Running Batch Jobs
Submit a batch job script:

```
$ sbatch ./script.sh
```

Example job script:
```shell
#!/usr/bin/bash
#SBATCH -J MyJobName
#SBATCH -A OPEN-00-00
#SBATCH -N 4
#SBATCH --ntasks-per-node 36
#SBATCH -p qcpu
#SBATCH -t 12:00:00
# load the MPI toolchain module
ml OpenMPI/4.1.4-GCC-11.3.0

# run one task per allocated slot and summarize the reported hostnames
srun hostname | uniq -c
```
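If the job ID is needed for scripting, for example to chain jobs, `sbatch --parsable` prints only the ID; the follow-up script `postprocess.sh` below is a hypothetical placeholder:

```
$ JOBID=$(sbatch --parsable ./script.sh)
$ sbatch --dependency=afterok:$JOBID ./postprocess.sh
```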
Useful command options (`salloc`, `sbatch`, `srun`); a combined example follows the list:

* `-N`, `--nodes`
* `--ntasks-per-node`
* `-n`, `--ntasks`
* `-c`, `--cpus-per-task`
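As a hedged sketch of how these options combine, the script below requests a hybrid MPI+OpenMP layout on two 36-core nodes; the binary name `./my_app`, the task/thread split, and the time limit are illustrative assumptions, not site requirements:

```shell
#!/usr/bin/bash
#SBATCH -J HybridExample
#SBATCH -A OPEN-00-00
#SBATCH -p qcpu
#SBATCH -N 2                   # two nodes
#SBATCH --ntasks-per-node 4    # four MPI ranks per node
#SBATCH -c 9                   # nine CPUs per rank (4 x 9 = 36 cores per node)
#SBATCH -t 2:00:00

ml OpenMPI/4.1.4-GCC-11.3.0

# one OpenMP thread per allocated CPU; pass -c to srun explicitly,
# since some Slurm versions do not propagate it from sbatch automatically
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun -c $SLURM_CPUS_PER_TASK ./my_app
```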
## Slurm Job Environment Variables
Slurm provides useful information to the job via environment variables. These variables are available on all nodes allocated to the job when accessed via Slurm-supported means (`srun`, compatible `mpirun`).

See all Slurm variables set in the job environment.
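A plain shell filter (not a Slurm-specific command) is enough for a quick look:

```
$ env | grep ^SLURM
```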
### Useful Variables
| variable name | description | example |
| ------ | ------ | ------ |
| SLURM_JOBID | job id of the executing job| 593 |
| SLURM_JOB_NODELIST | nodes allocated to the job | cn[101-102] |
| SLURM_JOB_NUM_NODES | number of nodes allocated to the job | 2 |
| SLURM_STEP_NODELIST | nodes allocated to the job step | cn101 |
| SLURM_STEP_NUM_NODES | number of nodes allocated to the job step | 1 |
| SLURM_JOB_PARTITION | name of the partition | qcpu |
| SLURM_SUBMIT_DIR | submit directory | /scratch/project/open-xx-yy/work |
See [Slurm srun documentation][2] for details.
```
$ echo $SLURM_JOB_NODELIST
cn[101-102]
```

Expand the nodelist to a list of individual nodes:
```
$ scontrol show hostnames $SLURM_JOB_NODELIST
cn101
cn102
```
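Inside a job script, these variables are typically combined; the following is a hedged sketch (the directory layout and message are illustrative assumptions, not a site convention):

```shell
# illustrative only: create a per-job working directory next to the submit directory
WORKDIR=$SLURM_SUBMIT_DIR/job_$SLURM_JOBID
mkdir -p "$WORKDIR"
cd "$WORKDIR"

echo "Running on $SLURM_JOB_NUM_NODES node(s): $SLURM_JOB_NODELIST"
```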
## Modifying Jobs

```
$ scontrol update JobId=JOBID ATTR=VALUE
```

for example

```
$ scontrol update JobId=JOBID Comment='The best job ever'
```

## Deleting Jobs

```
$ scancel JOBID
```
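`scancel` also accepts filters for cancelling several jobs at once; as an illustrative example (the partition and state are placeholders), cancel all of your pending jobs in one partition:

```
$ scancel -u $USER -p qcpu -t PENDING
```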
[1]: https://slurm.schedmd.com/
[2]: https://slurm.schedmd.com/srun.html#SECTION_OUTPUT-ENVIRONMENT-VARIABLES