Commit d69905de authored by Jan Siwiec: Update job-scheduling.md
## Slurm Job Environment Variables
Slurm provides useful information to the job via environment variables. Environment variables are available on all nodes allocated to the job when accessed via Slurm-supported means (srun, compatible mpirun).
See all Slurm variables:

```
set | grep ^SLURM
```
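As a sketch, a job script can read these variables to adapt to its allocation; the variable names below are standard Slurm, but the fallback values are placeholders only so the snippet also runs outside a job:

```shell
#!/bin/bash
# Read common Slurm job variables; outside an allocation they are
# unset, so fall back to illustrative placeholder values.
jobid="${SLURM_JOB_ID:-interactive}"
nodes="${SLURM_JOB_NODELIST:-$(hostname)}"
ntasks="${SLURM_NTASKS:-1}"
echo "job=$jobid nodes=$nodes ntasks=$ntasks"
```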
See [Slurm srun documentation][2] for details.
Get the job nodelist:

```
$ echo $SLURM_JOB_NODELIST
p03-amd[01-02]
```
Expand the nodelist to a list of nodes:

```
$ scontrol show hostnames $SLURM_JOB_NODELIST
p03-amd01
p03-amd02
```
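For a quick check without scontrol, a simple single-range nodelist such as `p03-amd[01-02]` can also be expanded with plain bash brace expansion — a sketch that covers only this form, not Slurm's full nodelist syntax (`scontrol show hostnames` remains the authoritative tool):

```shell
#!/bin/bash
# Turn "p03-amd[01-02]" into "p03-amd{01..02}" and let bash brace
# expansion (which preserves zero padding) produce the node names.
nodelist="p03-amd[01-02]"
expanded=$(eval echo "$(sed -E 's/\[([0-9]+)-([0-9]+)\]/{\1..\2}/' <<< "$nodelist")")
echo "$expanded"   # p03-amd01 p03-amd02
```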
Use the `-t`, `--time` option to specify the job run time limit. The default job time limit is 2 hours; the maximum job time limit is 24 hours.
FIFO scheduling with backfilling is employed.
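The time limit is usually set in the batch script itself; a minimal sketch (the account and partition names are placeholders to adapt):

```shell
#!/bin/bash
#SBATCH -A PROJECT-ID    # project account (placeholder)
#SBATCH -p p03-amd       # partition (placeholder)
#SBATCH -t 04:00:00      # wall-time limit: 4 hours (maximum is 24:00:00)

# The #SBATCH lines are shell comments; they take effect only when
# the script is submitted via sbatch.
echo "job started on $(hostname)"
```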
## Partition 00 - ARM (Cortex-A72)
Slurm supports the ability to define and schedule arbitrary resources - Generic RESources (GRES) in Slurm's terminology. We use GRES for scheduling/allocating GPUs and FPGAs.
!!! warning
    Use only allocated GPUs and FPGAs. Resource separation is not enforced. If you use non-allocated resources, you may observe strange behavior and run into trouble.
### Node Resources
### Request Resources
To allocate the required resources (GPUs or FPGAs), use the `--gres` option of salloc/srun.
Example: Allocate one FPGA:
```
$ salloc -A PROJECT-ID -p p03-amd --gres fpga:1
```
### Find Out Allocated Resources
Information about allocated resources is available in the Slurm job details, attributes `JOB_GRES` and `GRES`.
```
$ scontrol -d show job $SLURM_JOBID | grep GRES=
JOB_GRES=fpga:xilinx_alveo_u250:1
Nodes=p03-amd01 CPU_IDs=0-1 Mem=0 GRES=fpga:xilinx_alveo_u250:1(IDX:0)
```
The IDX value in the GRES attribute specifies the index(es) of the FPGA(s) (or GPUs) allocated to the job on the node. In the example above, the allocated resource is `fpga:xilinx_alveo_u250:1(IDX:0)`, so we should use the FPGA with index 0 on node p03-amd01.
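A sketch of pulling the device index out of the GRES string programmatically (the sample value below is hard-coded from the output above; in a real job it would come from the scontrol command shown earlier):

```shell
#!/bin/bash
# Sample GRES attribute as reported by scontrol; in a job, obtain it
# with: scontrol -d show job $SLURM_JOBID | grep GRES=
gres="fpga:xilinx_alveo_u250:1(IDX:0)"
idx=$(sed -E 's/.*\(IDX:([0-9,-]+)\).*/\1/' <<< "$gres")
echo "use device index: $idx"   # use device index: 0
```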
### Request Specific Resources