Commit 0b9aba56 authored by Roman Sliva

Update job-scheduling.md

parent ec050eed
Pipeline #29238 passed with warnings
@@ -133,7 +133,7 @@ $ scancel JOBID
| p00-arm | 1 | 64 | aarch64,cortex-a72 |
| p01-arm | 8 | 48 | aarch64,a64fx,ib |
| p02-intel | 2 | 64 | x86_64,intel,icelake,ib,fpga,bitware,nvdimm |
-| p03-amd | 2 | 64 | x86_64,amd,milan,ib,gpgpu,mi100,fpga,xilinx |
+| p03-amd | 2 | 64 | x86_64,amd,milan,ib,gpu,mi100,fpga,xilinx |
| p04-edge | 1 | 16 | x86_64,intel,broadwell,ib |
| p05-synt | 1 | 8 | x86_64,amd,milan,ib,ht |
@@ -198,33 +198,33 @@ salloc -A PROJECT-ID -p p02-intel -N 2 --gres=fpga:2
## Partition 03 - AMD (Milan, MI100 GPUs + Xilinx FPGAs)
-GPGPUs and FPGAs are treated as resources. See below for more details about resources.
+GPUs and FPGAs are treated as resources. See below for more details about resources.
-Partial allocation is possible per GPGPU and per FPGA; resource separation is not enforced.
-Use only the GPGPUs and FPGAs allocated to the job!
+Partial allocation is possible per GPU and per FPGA; resource separation is not enforced.
+Use only the GPUs and FPGAs allocated to the job!
One GPU:
```console
-salloc -A PROJECT-ID -p p03-amd --gres=gpgpu
+salloc -A PROJECT-ID -p p03-amd --gres=gpu
```
Two GPUs on the same node:
```console
-salloc -A PROJECT-ID -p p03-amd --gres=gpgpu:2
+salloc -A PROJECT-ID -p p03-amd --gres=gpu:2
```
Four GPUs on the same node:
```console
-salloc -A PROJECT-ID -p p03-amd --gres=gpgpu:4
+salloc -A PROJECT-ID -p p03-amd --gres=gpu:4
```
All GPUs:
```console
-salloc -A PROJECT-ID -p p03-amd -N 2 --gres=gpgpu:4
+salloc -A PROJECT-ID -p p03-amd -N 2 --gres=gpu:4
```
One FPGA:
@@ -248,19 +248,19 @@ salloc -A PROJECT-ID -p p03-amd -N 2 --gres=fpga:2
One GPU and one FPGA on the same node:
```console
-salloc -A PROJECT-ID -p p03-amd --gres=gpgpu,fpga
+salloc -A PROJECT-ID -p p03-amd --gres=gpu,fpga
```
Four GPUs and two FPGAs on the same node:
```console
-salloc -A PROJECT-ID -p p03-amd --gres=gpgpu:4,fpga:2
+salloc -A PROJECT-ID -p p03-amd --gres=gpu:4,fpga:2
```
All GPUs and FPGAs:
```console
-salloc -A PROJECT-ID -p p03-amd -N 2 --gres=gpgpu:4,fpga:2
+salloc -A PROJECT-ID -p p03-amd -N 2 --gres=gpu:4,fpga:2
```
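The hunks above rename the GRES type in the `salloc` examples from `gpgpu` to `gpu`, and the node feature tag changes the same way further down. Assuming the cluster configuration was renamed to match, existing job scripts that still say `--gres=gpgpu...` or `--constraint=gpgpu` would likely need the same edit. A minimal sketch of that rewrite with `sed` follows; the function name `migrate_gres` is hypothetical, and it only handles the `--gres=...`/`--constraint=...` spelling used in these examples (not a space-separated `--gres gpgpu:2`).

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): read a job script or command line
# on stdin and rename the old "gpgpu" GRES type / feature tag to "gpu",
# but only inside --gres=... and --constraint=... values.
migrate_gres() {
    # [^ ]* keeps each match inside a single option value, so other words on
    # the line (e.g. "echo gpgpu") are untouched. The ":a ... ta" loop repeats
    # the substitutions until no "gpgpu" is left in those values.
    sed -e ':a' \
        -e 's/\(--gres=[^ ]*\)gpgpu/\1gpu/' \
        -e 's/\(--constraint=[^ ]*\)gpgpu/\1gpu/' \
        -e 'ta'
}
```

For example, `migrate_gres < old-job.sh > new-job.sh`; the result is plain text substitution, so it is worth reviewing before submitting.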
## Partition 04 - Edge Server
@@ -294,7 +294,7 @@ Users can select nodes based on the feature tags using the --constraint option.
| broadwell | processor family |
| milan | processor family |
| ib | Infiniband |
-| gpgpu | equipped with GPGPU |
+| gpu | equipped with GPU |
| fpga | equipped with FPGA |
| nvdimm | equipped with NVDIMMs |
| ht | Hyperthreading enabled |
@@ -307,8 +307,8 @@ p00-arm01 aarch64,cortex-a72
p01-arm[01-08] aarch64,a64fx,ib
p02-intel01 x86_64,intel,icelake,ib,fpga,bitware,nvdimm,ht
p02-intel02 x86_64,intel,icelake,ib,fpga,bitware,nvdimm,noht
-p03-amd01 x86_64,amd,milan,ib,gpgpu,mi100,fpga,xilinx,ht
-p03-amd02 x86_64,amd,milan,ib,gpgpu,mi100,fpga,xilinx,noht
+p03-amd01 x86_64,amd,milan,ib,gpu,mi100,fpga,xilinx,ht
+p03-amd02 x86_64,amd,milan,ib,gpu,mi100,fpga,xilinx,noht
p04-edge01 x86_64,intel,broadwell,ib,ht
p05-synt01 x86_64,amd,milan,ib,ht
```
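A listing in the `NODES FEATURES` shape shown in this hunk can be filtered by feature tag, for instance to see which nodes carry the renamed `gpu` tag. A minimal `awk` sketch, assuming one node (or node range) per line followed by a comma-separated tag list; `nodes_with_feature` is a hypothetical name. Comparing whole comma-separated elements, rather than grepping for a substring, avoids false hits such as `ht` matching `noht`.

```shell
#!/bin/sh
# Hypothetical helper: print the node names (first column) whose
# comma-separated feature list (second column) contains TAG exactly.
nodes_with_feature() {
    awk -v tag="$1" '{
        n = split($2, feats, ",")
        for (i = 1; i <= n; i++)
            if (feats[i] == tag) { print $1; break }
    }'
}
```

Fed the four `p02-intel`/`p03-amd` lines above with the tag `ht`, it prints `p02-intel01` and `p03-amd01`.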
@@ -324,10 +324,10 @@ $ scontrol -d show node p02-intel02 | grep ActiveFeatures
## Resources, GRES
-Slurm can define and schedule arbitrary resources, called Generic RESources (GRES) in Slurm's terminology. We use GRES to schedule and allocate GPGPUs and FPGAs.
+Slurm can define and schedule arbitrary resources, called Generic RESources (GRES) in Slurm's terminology. We use GRES to schedule and allocate GPUs and FPGAs.
!!! warning
-Use only allocated GPGPUs and FPGAs. Resource separation is not enforced. If you use resources that were not allocated to your job, you may observe strange behaviour and run into trouble.
+Use only allocated GPUs and FPGAs. Resource separation is not enforced. If you use resources that were not allocated to your job, you may observe strange behaviour and run into trouble.
### Node Resources
@@ -339,14 +339,14 @@ $ scontrol -d show node p02-intel01 | grep Gres=
$ scontrol -d show node p02-intel02 | grep Gres=
Gres=fpga:bitware_520n_mx:2
$ scontrol -d show node p03-amd01 | grep Gres=
-Gres=gpgpu:amd_mi100:4,fpga:xilinx_alveo_u250:2
+Gres=gpu:amd_mi100:4,fpga:xilinx_alveo_u250:2
$ scontrol -d show node p03-amd02 | grep Gres=
-Gres=gpgpu:amd_mi100:4,fpga:xilinx_alveo_u280:2
+Gres=gpu:amd_mi100:4,fpga:xilinx_alveo_u280:2
```
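The `Gres=` values printed above follow a `TYPE:MODEL:COUNT` pattern, with entries separated by commas. A minimal `awk` sketch that turns such a line into one row per resource, assuming that pattern; optional parenthesised suffixes are dropped, and a two-part `TYPE:COUNT` entry gets `-` as its model. The function name `gres_table` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn a "Gres=TYPE:MODEL:COUNT,..." line (as printed by
# "scontrol -d show node NODE | grep Gres=") into "TYPE MODEL COUNT" rows.
gres_table() {
    awk '{
        s = $1                          # the Gres=... token
        sub(/^Gres=/, "", s)            # drop the key
        gsub(/\([^)]*\)/, "", s)        # drop optional "(...)" suffixes
        n = split(s, items, ",")
        for (i = 1; i <= n; i++) {
            m = split(items[i], f, ":")
            if (m == 3) print f[1], f[2], f[3]       # TYPE:MODEL:COUNT
            else if (m == 2) print f[1], "-", f[2]   # TYPE:COUNT, no model
        }
    }'
}
```

For the `p03-amd01` value listed above, `scontrol -d show node p03-amd01 | grep Gres= | gres_table` would print `gpu amd_mi100 4` and `fpga xilinx_alveo_u250 2`.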
### Request Resources
-To allocate the required resources (GPGPUs or FPGAs), use the --gres option of salloc/srun.
+To allocate the required resources (GPUs or FPGAs), use the --gres option of salloc/srun.
Example: Allocate one FPGA
```
@@ -362,7 +362,7 @@ $ scontrol -d show job $SLURM_JOBID |grep GRES=
JOB_GRES=fpga:xilinx_alveo_u250:1
Nodes=p03-amd01 CPU_IDs=0-1 Mem=0 GRES=fpga:xilinx_alveo_u250:1(IDX:0)
```
-IDX in the GRES attribute specifies the index or indexes of the FPGA(s) (or GPGPUs) allocated to the job on the node. In the given example, the allocated resources are fpga:xilinx_alveo_u250:1(IDX:0), so the job should use the FPGA with index 0 on node p03-amd01.
+IDX in the GRES attribute specifies the index or indexes of the FPGA(s) (or GPUs) allocated to the job on the node. In the given example, the allocated resources are fpga:xilinx_alveo_u250:1(IDX:0), so the job should use the FPGA with index 0 on node p03-amd01.
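Reading the index off by hand works for one device; a job script may prefer to extract it. A minimal `awk` sketch that pulls the IDX values for one GRES type out of `scontrol -d show job` output and expands ranges such as `0-3`. The function name `gres_indices` is hypothetical, and the formatting of multi-device entries (e.g. `gpu:amd_mi100:4(IDX:0-3)`, or several types joined by commas) is an assumption beyond the single-FPGA example above. Indices are per node, so feed it the line for one node only.

```shell
#!/bin/sh
# Hypothetical helper: print the device indices (IDX) allocated for one GRES
# type, e.g. "fpga" or "gpu", as a comma-separated list. Reads the output of
# "scontrol -d show job JOBID" on stdin; ranges such as 0-3 are expanded.
gres_indices() {
    awk -v type="$1" '
    {
        s = $0
        # TYPE:MODEL:COUNT(IDX:...) preceded by "=" or "," (or line start)
        re = "(^|[=,])" type ":[^(]*\\(IDX:[^)]*\\)"
        while (match(s, re)) {
            chunk = substr(s, RSTART, RLENGTH)
            s = substr(s, RSTART + RLENGTH)   # continue after this entry
            sub(/.*IDX:/, "", chunk)          # keep only the index list...
            sub(/\)$/, "", chunk)             # ...without the closing paren
            n = split(chunk, parts, ",")
            for (i = 1; i <= n; i++) {
                if (split(parts[i], r, "-") == 2) {
                    for (k = r[1] + 0; k <= r[2] + 0; k++) { out = out sep k; sep = "," }
                } else {
                    out = out sep parts[i]; sep = ","
                }
            }
        }
    }
    END { print out }'
}
```

For the allocation shown above, `scontrol -d show job $SLURM_JOBID | gres_indices fpga` would print `0`.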
### Request Specific Resources
......