Commit fb57b0d4 authored by Roman Sliva

Update job-scheduling.md

parent ed3a3130
Pipeline #29187 passed with warnings
......@@ -153,6 +153,7 @@ sbatch -A PROJECT-ID -p p01-arm -N=8 ./script.sh
FPGAs are treated as resources. See below for more details about resources.
Partial allocation is per FPGA; resource separation is not enforced.
Use only FPGAs allocated to the job!
One FPGA:
......@@ -177,6 +178,7 @@ sbatch -A PROJECT-ID -p p02-intel -N 2 --gres=fpga:2 ./script.sh
GPGPUs and FPGAs are treated as resources. See below for more details about resources.
Partial allocation is per GPGPU and per FPGA; resource separation is not enforced.
Use only GPGPUs and FPGAs allocated to the job!
One GPU:
......@@ -301,6 +303,9 @@ $ scontrol -d show node p02-intel02 | grep ActiveFeatures
Slurm supports the ability to define and schedule arbitrary resources - Generic RESources (GRES) in Slurm's terminology. We use GRES for scheduling/allocating GPGPUs and FPGAs.
!!! warning
Use only allocated GPGPUs and FPGAs. Resource separation is not enforced. If you use non-allocated resources, you may observe strange behaviour and run into trouble.
Get information about GRES on node:
```
......@@ -310,7 +315,29 @@ $ scontrol -d show node p03-amd02 | grep Gres=
Gres=gpgpu:amd_mi100:4,fpga:xilinx_alveo_u280:2
```
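The node's `Gres=` line packs several entries into one comma-separated string. A minimal sketch for listing them one per line follows; `list_gres` is a hypothetical helper name, and it assumes the simple `name:type:count` entries shown above, with no extra parenthesised annotations.

```shell
# Hypothetical helper: print each GRES entry of a node on its own line.
# Reads `scontrol -d show node <node>` output on stdin.
list_gres() {
    # Keep only the "Gres=" line, drop the prefix, split on commas.
    sed -n 's/^ *Gres=//p' | tr ',' '\n'
}

# Demo on the sample line from the example above.
printf '   Gres=gpgpu:amd_mi100:4,fpga:xilinx_alveo_u280:2\n' | list_gres
```

On a cluster, the same helper could be fed by `scontrol -d show node p03-amd02 | list_gres`.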
Request specified GRES. A GRES entry uses the format "name[[:type]:count]"; in the following example, name is fpga, type is xilinx_alveo_u280, and count is 2.
Example: Allocate one FPGA
```
$ salloc -A PROJECT-ID -p p03-amd --gres fpga:1
salloc: Granted job allocation XXX
salloc: Waiting for resource configuration
salloc: Nodes p03-amd01 are ready for job
avx2 modules + all modules
```
Find out which FPGA was allocated:
```
$ scontrol -d show job $SLURM_JOBID |grep GRES=
JOB_GRES=fpga:xilinx_alveo_u250:1
Nodes=p03-amd01 CPU_IDs=0-1 Mem=0 GRES=fpga:xilinx_alveo_u250:1(IDX:0)
```
IDX in the GRES attribute specifies the index or indexes of the FPGAs (or GPGPUs) allocated to the job on the node. In the given example, the allocated resource is fpga:xilinx_alveo_u250:1(IDX:0), so we should use the FPGA with index 0.
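To avoid reading the index by eye, the IDX value can be pulled out of the job description in a script. This is a minimal sketch, assuming the output format shown above; `gres_idx` is a hypothetical helper name, not part of Slurm.

```shell
# Hypothetical helper: extract the IDX value (e.g. "0" or "0-1") from
# `scontrol -d show job` output read on stdin.
gres_idx() {
    # Match "GRES=<entry>(IDX:<indexes>)" and print only <indexes>.
    # Lines without an "(IDX:...)" part, such as JOB_GRES=, print nothing.
    sed -n 's/.*GRES=[^ ]*(IDX:\([0-9,-]*\)).*/\1/p'
}

# Demo on the sample line from the example above.
echo 'Nodes=p03-amd01 CPU_IDs=0-1 Mem=0 GRES=fpga:xilinx_alveo_u250:1(IDX:0)' | gres_idx
```

Inside a job, the helper could be used as `scontrol -d show job $SLURM_JOBID | gres_idx`.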
### Request Specific Resource
It is possible to allocate specific resources. This is useful for partition p03-amd, where FPGAs of different types are available.
A GRES entry uses the format "name[[:type]:count]"; in the following example, name is fpga, type is xilinx_alveo_u280, and count is 2.
```
$ salloc -A PROJECT-ID -p p03-amd --gres=fpga:xilinx_alveo_u280:2
......@@ -320,7 +347,7 @@ salloc: Nodes p03-amd02 are ready for job
$ scontrol -d show job $SLURM_JOBID | grep -i gres
JOB_GRES=fpga:xilinx_alveo_u280:2
Nodes=p03-amd02 CPU_IDs=0 Mem=0 GRES=fpga:xilinx_alveo_u280(CNT:2)
Nodes=p03-amd02 CPU_IDs=0 Mem=0 GRES=fpga:xilinx_alveo_u280(IDX:0-1)
TresPerNode=gres:fpga:xilinx_alveo_u280:2
```
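The "name[[:type]:count]" format can be illustrated by splitting an entry into its parts. This is a sketch only: `parse_gres_entry` is a hypothetical helper, it assumes a purely numeric count, and it assumes that a missing count means 1.

```shell
# Hypothetical helper: split one GRES entry into name, type and count.
parse_gres_entry() {
    entry=$1
    name=${entry%%:*}        # text before the first colon
    type=""
    count=1                  # assumption: count defaults to 1 when omitted
    rest=${entry#"$name"}    # whatever follows the name
    rest=${rest#:}           # drop the separating colon, if any
    case $rest in
        "")       ;;                                      # name only
        *:*)      type=${rest%%:*}; count=${rest#*:} ;;   # name:type:count
        *[!0-9]*) type=$rest ;;                           # name:type (assumed accepted)
        *)        count=$rest ;;                          # name:count
    esac
    printf 'name=%s type=%s count=%s\n' "$name" "$type" "$count"
}

parse_gres_entry fpga:xilinx_alveo_u280:2
parse_gres_entry fpga:1
```

The first call corresponds to the typed request above; the second corresponds to the untyped `--gres fpga:1` request.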
......