Commit 0b9aba56 authored by Roman Sliva

Update job-scheduling.md

parent ec050eed
Pipeline #29238 passed with warnings
...@@ -133,7 +133,7 @@ $ scancel JOBID ...@@ -133,7 +133,7 @@ $ scancel JOBID
| p00-arm | 1 | 64 | aarch64,cortex-a72 | | p00-arm | 1 | 64 | aarch64,cortex-a72 |
| p01-arm | 8 | 48 | aarch64,a64fx,ib | | p01-arm | 8 | 48 | aarch64,a64fx,ib |
| p02-intel | 2 | 64 | x86_64,intel,icelake,ib,fpga,bitware,nvdimm | | p02-intel | 2 | 64 | x86_64,intel,icelake,ib,fpga,bitware,nvdimm |
| p03-amd | 2 | 64 | x86_64,amd,milan,ib,gpgpu,mi100,fpga,xilinx | | p03-amd | 2 | 64 | x86_64,amd,milan,ib,gpu,mi100,fpga,xilinx |
| p04-edge | 1 | 16 | x86_64,intel,broadwell,ib | | p04-edge | 1 | 16 | x86_64,intel,broadwell,ib |
| p05-synt | 1 | 8 | x86_64,amd,milan,ib,ht | | p05-synt | 1 | 8 | x86_64,amd,milan,ib,ht |
...@@ -198,33 +198,33 @@ salloc -A PROJECT-ID -p p02-intel -N 2 --gres=fpga:2 ...@@ -198,33 +198,33 @@ salloc -A PROJECT-ID -p p02-intel -N 2 --gres=fpga:2
## Partition 03 - AMD (Milan, MI100 GPUs + Xilinx FPGAs) ## Partition 03 - AMD (Milan, MI100 GPUs + Xilinx FPGAs)
GPGPUs and FPGAs are treated as resources. See below for more details about resources. GPUs and FPGAs are treated as resources. See below for more details about resources.
Partial allocation - per GPGPU and per FPGA, resource separation is not enforced. Partial allocation - per GPU and per FPGA, resource separation is not enforced.
Use only GPGPUs and FPGAs allocated to the job! Use only GPUs and FPGAs allocated to the job!
One GPU: One GPU:
```console ```console
salloc -A PROJECT-ID -p p03-amd --gres=gpgpu salloc -A PROJECT-ID -p p03-amd --gres=gpu
``` ```
Two GPUs on the same node: Two GPUs on the same node:
```console ```console
salloc -A PROJECT-ID -p p03-amd --gres=gpgpu:2 salloc -A PROJECT-ID -p p03-amd --gres=gpu:2
``` ```
Four GPUs on the same node: Four GPUs on the same node:
```console ```console
salloc -A PROJECT-ID -p p03-amd --gres=gpgpu:4 salloc -A PROJECT-ID -p p03-amd --gres=gpu:4
``` ```
All GPUs: All GPUs:
```console ```console
salloc -A PROJECT-ID -p p03-amd -N 2 --gres=gpgpu:4 salloc -A PROJECT-ID -p p03-amd -N 2 --gres=gpu:4
``` ```
One FPGA: One FPGA:
...@@ -248,19 +248,19 @@ salloc -A PROJECT-ID -p p03-amd -N 2--gres=fpga:2 ...@@ -248,19 +248,19 @@ salloc -A PROJECT-ID -p p03-amd -N 2--gres=fpga:2
One GPU and one FPGA on the same node: One GPU and one FPGA on the same node:
```console ```console
salloc -A PROJECT-ID -p p03-amd --gres=gpgpu,fpga salloc -A PROJECT-ID -p p03-amd --gres=gpu,fpga
``` ```
Four GPUs and two FPGAs on the same node: Four GPUs and two FPGAs on the same node:
```console ```console
salloc -A PROJECT-ID -p p03-amd --gres=gpgpu:4,fpga:2 salloc -A PROJECT-ID -p p03-amd --gres=gpu:4,fpga:2
``` ```
All GPUs and FPGAs: All GPUs and FPGAs:
```console ```console
salloc -A PROJECT-ID -p p03-amd -N 2 --gres=gpgpu:4,fpga:2 salloc -A PROJECT-ID -p p03-amd -N 2 --gres=gpu:4,fpga:2
``` ```
## Partition 04 - Edge Server ## Partition 04 - Edge Server
...@@ -294,7 +294,7 @@ Users can select nodes based on the feature tags using --constraint option. ...@@ -294,7 +294,7 @@ Users can select nodes based on the feature tags using --constraint option.
| broadwell | processor family | | broadwell | processor family |
| milan | processor family | | milan | processor family |
| ib | Infiniband | | ib | Infiniband |
| gpgpu | equipped with GPGPU | | gpu | equipped with GPU |
| fpga | equipped with FPGA | | fpga | equipped with FPGA |
| nvdimm | equipped with NVDIMMs | | nvdimm | equipped with NVDIMMs |
| ht | Hyperthreading enabled | | ht | Hyperthreading enabled |
...@@ -307,8 +307,8 @@ p00-arm01 aarch64,cortex-a72 ...@@ -307,8 +307,8 @@ p00-arm01 aarch64,cortex-a72
p01-arm[01-08] aarch64,a64fx,ib p01-arm[01-08] aarch64,a64fx,ib
p02-intel01 x86_64,intel,icelake,ib,fpga,bitware,nvdimm,ht p02-intel01 x86_64,intel,icelake,ib,fpga,bitware,nvdimm,ht
p02-intel02 x86_64,intel,icelake,ib,fpga,bitware,nvdimm,noht p02-intel02 x86_64,intel,icelake,ib,fpga,bitware,nvdimm,noht
p03-amd01 x86_64,amd,milan,ib,gpgpu,mi100,fpga,xilinx,ht p03-amd01 x86_64,amd,milan,ib,gpu,mi100,fpga,xilinx,ht
p03-amd02 x86_64,amd,milan,ib,gpgpu,mi100,fpga,xilinx,noht p03-amd02 x86_64,amd,milan,ib,gpu,mi100,fpga,xilinx,noht
p04-edge01 x86_64,intel,broadwell,ib,ht p04-edge01 x86_64,intel,broadwell,ib,ht
p05-synt01 x86_64,amd,milan,ib,ht p05-synt01 x86_64,amd,milan,ib,ht
``` ```
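To see which nodes carry a given combination of feature tags, the listing above can be filtered locally. The following is a minimal sketch, not part of the original documentation: `nodes_with_features` is a hypothetical helper name, and it only does plain text matching on a two-column "node features" listing. It does not reproduce how Slurm itself evaluates `--constraint`.

```shell
# Hypothetical helper (assumption, not a Slurm command): print the nodes
# from a "NODE FEATURES" listing that carry every requested feature tag.
nodes_with_features() {
    # $1: comma-separated feature tags; the listing is read from stdin
    awk -v want="$1" '
    {
        n = split(want, tags, ",")
        ok = 1
        # Wrap both sides in commas so that "ht" does not match "noht"
        for (i = 1; i <= n; i++)
            if (index("," $2 ",", "," tags[i] ",") == 0) ok = 0
        if (ok) print $1
    }'
}

# Sample input copied from the feature listing above
printf '%s\n' \
    'p03-amd01 x86_64,amd,milan,ib,gpu,mi100,fpga,xilinx,ht' \
    'p03-amd02 x86_64,amd,milan,ib,gpu,mi100,fpga,xilinx,noht' |
    nodes_with_features gpu,noht
```

With the two sample lines, only `p03-amd02` has both the `gpu` and `noht` tags.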
...@@ -324,10 +324,10 @@ $ scontrol -d show node p02-intel02 | grep ActiveFeatures ...@@ -324,10 +324,10 @@ $ scontrol -d show node p02-intel02 | grep ActiveFeatures
## Resources, GRES ## Resources, GRES
Slurm supports the ability to define and schedule arbitrary resources - Generic RESources (GRES) in Slurm's terminology. We use GRES for scheduling/allocating GPGPUs and FPGAs. Slurm supports the ability to define and schedule arbitrary resources - Generic RESources (GRES) in Slurm's terminology. We use GRES for scheduling/allocating GPUs and FPGAs.
!!! warning !!! warning
Use only allocated GPGPUs and FPGAs. Resource separation is not enforced. If you use non-allocated resources, you can observe strange behaviour and run into trouble. Use only allocated GPUs and FPGAs. Resource separation is not enforced. If you use non-allocated resources, you can observe strange behaviour and run into trouble.
### Node Resources ### Node Resources
...@@ -339,14 +339,14 @@ $ scontrol -d show node p02-intel01 | grep Gres= ...@@ -339,14 +339,14 @@ $ scontrol -d show node p02-intel01 | grep Gres=
$ scontrol -d show node p02-intel02 | grep Gres= $ scontrol -d show node p02-intel02 | grep Gres=
Gres=fpga:bitware_520n_mx:2 Gres=fpga:bitware_520n_mx:2
$ scontrol -d show node p03-amd01 | grep Gres= $ scontrol -d show node p03-amd01 | grep Gres=
Gres=gpgpu:amd_mi100:4,fpga:xilinx_alveo_u250:2 Gres=gpu:amd_mi100:4,fpga:xilinx_alveo_u250:2
$ scontrol -d show node p03-amd02 | grep Gres= $ scontrol -d show node p03-amd02 | grep Gres=
Gres=gpgpu:amd_mi100:4,fpga:xilinx_alveo_u280:2 Gres=gpu:amd_mi100:4,fpga:xilinx_alveo_u280:2
``` ```
### Request Resources ### Request Resources
To allocate required resources (GPGPUs or FPGAs) use --gres salloc/srun option. To allocate required resources (GPUs or FPGAs) use --gres salloc/srun option.
Example: Allocate one FPGA Example: Allocate one FPGA
``` ```
...@@ -362,7 +362,7 @@ $ scontrol -d show job $SLURM_JOBID |grep GRES= ...@@ -362,7 +362,7 @@ $ scontrol -d show job $SLURM_JOBID |grep GRES=
JOB_GRES=fpga:xilinx_alveo_u250:1 JOB_GRES=fpga:xilinx_alveo_u250:1
Nodes=p03-amd01 CPU_IDs=0-1 Mem=0 GRES=fpga:xilinx_alveo_u250:1(IDX:0) Nodes=p03-amd01 CPU_IDs=0-1 Mem=0 GRES=fpga:xilinx_alveo_u250:1(IDX:0)
``` ```
IDX in the GRES attribute specifies index/indexes of FPGA(s) (or GPGPUs) allocated to the job on the node. In the given example - allocated resources are fpga:xilinx_alveo_u250:1(IDX:0), we should use FPGA with index/number 0 on node p03-amd01. IDX in the GRES attribute specifies index/indexes of FPGA(s) (or GPUs) allocated to the job on the node. In the given example - allocated resources are fpga:xilinx_alveo_u250:1(IDX:0), we should use FPGA with index/number 0 on node p03-amd01.
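The index can also be pulled out of the `scontrol` output in a script. The following is a minimal sketch, not part of the original documentation: `gres_indexes` is a hypothetical helper name, and it assumes the line contains a single `GRES=...(IDX:...)` entry in the format shown in the example output above.

```shell
# Hypothetical helper (assumption, not a Slurm command): print the index
# list found in the "(IDX:...)" suffix of one line of scontrol output.
gres_indexes() {
    # $1: one line of `scontrol -d show job` output.
    # If the line held several "(IDX:...)" groups, only the last would be printed.
    printf '%s\n' "$1" | sed -n 's/.*(IDX:\([^)]*\)).*/\1/p'
}

# Sample line copied from the example output above
line='Nodes=p03-amd01 CPU_IDs=0-1 Mem=0 GRES=fpga:xilinx_alveo_u250:1(IDX:0)'
gres_indexes "$line"
```

For the sample line this prints `0`, the index of the FPGA allocated on p03-amd01. Lines without an `(IDX:...)` suffix produce no output.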
### Request Specific Resources ### Request Specific Resources
......