Commit fb57b0d4 authored by Roman Sliva

Update job-scheduling.md

parent ed3a3130
Pipeline #29187 passed with warnings
@@ -153,6 +153,7 @@ sbatch -A PROJECT-ID -p p01-arm -N=8 ./script.sh
FPGAs are treated as generic resources (GRES). See below for more details about resources.
Partial allocation is possible per FPGA; resource separation is not enforced.
Use only FPGAs allocated to the job!
One FPGA:
@@ -177,6 +178,7 @@ sbatch -A PROJECT-ID -p p02-intel -N 2 --gres=fpga:2 ./script.sh
GPGPUs and FPGAs are treated as generic resources (GRES). See below for more details about resources.
Partial allocation is possible per GPGPU and per FPGA; resource separation is not enforced.
Use only GPGPUs and FPGAs allocated to the job!
One GPU:
@@ -301,6 +303,9 @@ $ scontrol -d show node p02-intel02 | grep ActiveFeatures
Slurm can define and schedule arbitrary resources, called Generic RESources (GRES) in Slurm's terminology. We use GRES to schedule and allocate GPGPUs and FPGAs.
!!! warning
    Use only the GPGPUs and FPGAs allocated to your job. Resource separation is not enforced, so using resources that were not allocated to you can lead to unexpected behavior and may interfere with other jobs sharing the node.
Get information about GRES on a node:
```
@@ -310,7 +315,29 @@ $ scontrol -d show node p03-amd02 | grep Gres=
Gres=gpgpu:amd_mi100:4,fpga:xilinx_alveo_u280:2
```
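To process this output in a script, the POSIX shell sketch below splits a node's `Gres=` field into one `name type count` row per resource. The function name `list_node_gres` is hypothetical, and the sketch assumes every entry has the full `name:type:count` form shown for p03-amd02; entries without a type, or with extra annotations, are not handled.

```shell
#!/bin/sh
# Hypothetical helper: split the Gres= field printed by
# `scontrol -d show node NODE | grep Gres=` into "name type count" rows.
# Assumes every entry is of the form name:type:count (as on p03-amd02).
list_node_gres() (
    # Subshell body: the IFS and set -f changes do not leak to the caller.
    gres_field=${1#*Gres=}   # drop everything up to and including the first "Gres="
    set -f                   # disable pathname expansion while splitting
    IFS=,                    # resources are separated by commas
    for entry in $gres_field; do
        IFS=:                # fields within one resource are separated by colons
        set -- $entry
        printf '%s %s %s\n' "$1" "$2" "$3"
    done
)

list_node_gres 'Gres=gpgpu:amd_mi100:4,fpga:xilinx_alveo_u280:2'
```

For the p03-amd02 line above, this prints `gpgpu amd_mi100 4` and `fpga xilinx_alveo_u280 2` on separate lines. Running the loop in a subshell keeps the `IFS` changes local without having to save and restore it.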
Example: allocate one FPGA.
```
$ salloc -A PROJECT-ID -p p03-amd --gres fpga:1
salloc: Granted job allocation XXX
salloc: Waiting for resource configuration
salloc: Nodes p03-amd01 are ready for job
avx2 modules + all modules
```
Find out which FPGA was allocated:
```
$ scontrol -d show job $SLURM_JOBID | grep GRES=
JOB_GRES=fpga:xilinx_alveo_u250:1
Nodes=p03-amd01 CPU_IDs=0-1 Mem=0 GRES=fpga:xilinx_alveo_u250:1(IDX:0)
```
IDX in the GRES attribute specifies the index (or indexes) of the FPGAs (or GPGPUs) allocated to the job on the node. In the example above, the allocated resource is fpga:xilinx_alveo_u250:1(IDX:0), so the job should use the FPGA with index 0.
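Because resource separation is not enforced, a job script may want to read the IDX value itself before touching a device. The POSIX shell sketch below expands the first `(IDX:...)` group of a `GRES=` line into one index per line and checks whether a given index was allocated. The helper names `gres_indexes` and `is_allocated` are hypothetical, and the code assumes IDX is a comma-separated list of single indexes and `a-b` ranges (such as `IDX:0` or `IDX:0-1` in the examples on this page); jobs spanning several nodes or several GRES types would need more careful parsing.

```shell
#!/bin/sh
# Hypothetical helper: print the device indexes listed in the first
# "(IDX:...)" group of a GRES= line, one index per line.
# Assumes IDX holds comma-separated indexes and a-b ranges, e.g. IDX:0, IDX:0-1.
gres_indexes() (
    idx=${1#*"(IDX:"}                # text after the first "(IDX:"
    [ "$idx" = "$1" ] && exit 1      # no IDX group on this line
    idx=${idx%%")"*}                 # text before the closing ")"
    set -f
    IFS=,
    for part in $idx; do
        case $part in
            *-*) first=${part%-*} last=${part#*-} ;;  # range a-b
            *)   first=$part last=$part ;;            # single index
        esac
        i=$first
        while [ "$i" -le "$last" ]; do
            echo "$i"
            i=$((i + 1))
        done
    done
)

# Hypothetical helper: succeed only if index $1 appears in the IDX list of GRES line $2.
is_allocated() {
    gres_indexes "$2" | grep -qx "$1"
}

gres_indexes 'Nodes=p03-amd01 CPU_IDs=0-1 Mem=0 GRES=fpga:xilinx_alveo_u250:1(IDX:0)'
```

The call above prints `0`. Inside a running job, the line could come from the same command as in the example, e.g. `gres_indexes "$(scontrol -d show job "$SLURM_JOBID" | grep GRES=)"`.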
### Request Specific Resources
It is possible to request resources of a specific type. This is useful for partition p03-amd, where FPGAs of different types are available.
A GRES entry uses the format "name[[:type]:count]". In the following example, the name is fpga, the type is xilinx_alveo_u280, and the count is 2.
```
$ salloc -A PROJECT-ID -p p03-amd --gres=fpga:xilinx_alveo_u280:2
@@ -320,7 +347,7 @@ salloc: Nodes p03-amd02 are ready for job
$ scontrol -d show job $SLURM_JOBID | grep -i gres
JOB_GRES=fpga:xilinx_alveo_u280:2
Nodes=p03-amd02 CPU_IDs=0 Mem=0 GRES=fpga:xilinx_alveo_u280(IDX:0-1)
TresPerNode=gres:fpga:xilinx_alveo_u280:2
```
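The entry format "name[[:type]:count]" described above can be illustrated with a small POSIX shell parser. This is only a sketch: the function name `parse_gres_spec` is hypothetical, and it assumes that an omitted count means 1 and that a second field consisting only of digits is a count rather than a type (so `fpga:1` requests one FPGA of any type).

```shell
#!/bin/sh
# Hypothetical helper: split one GRES entry of the form name[[:type]:count].
# Assumptions: a missing count means 1, and a second field made only of
# digits is a count rather than a type.
parse_gres_spec() (
    set -f
    IFS=:
    set -- $1                         # split the entry on colons
    name=$1 type= count=1
    case $# in
        2) case $2 in
               *[!0-9]*) type=$2 ;;   # name:type
               *)        count=$2 ;;  # name:count
           esac ;;
        3) type=$2 count=$3 ;;        # name:type:count
    esac
    printf 'name=%s type=%s count=%s\n' "$name" "$type" "$count"
)

parse_gres_spec fpga:xilinx_alveo_u280:2
parse_gres_spec fpga:1
```

With the two calls shown, it prints `name=fpga type=xilinx_alveo_u280 count=2` and `name=fpga type= count=1`. An empty type here stands for "any type available in the partition".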