diff --git a/docs.it4i/cs/job-scheduling.md b/docs.it4i/cs/job-scheduling.md
index f516828d055b3d1b3180f8a328d62bba19825fd5..50c1d4cdcc421b9d815546a24c6cb356527625be 100644
--- a/docs.it4i/cs/job-scheduling.md
+++ b/docs.it4i/cs/job-scheduling.md
@@ -133,7 +133,7 @@ $ scancel JOBID
 | p00-arm | 1 | 64 | aarch64,cortex-a72 |
 | p01-arm | 8 | 48 | aarch64,a64fx,ib |
 | p02-intel | 2 | 64 | x86_64,intel,icelake,ib,fpga,bitware,nvdimm |
-| p03-amd | 2 | 64 | x86_64,amd,milan,ib,gpgpu,mi100,fpga,xilinx |
+| p03-amd | 2 | 64 | x86_64,amd,milan,ib,gpu,mi100,fpga,xilinx |
-| p04-edge | 1 | 16 | 86_64,intel,broadwell,ib |
+| p04-edge | 1 | 16 | x86_64,intel,broadwell,ib |
 | p05-synt | 1 | 8 | x86_64,amd,milan,ib,ht |

@@ -198,33 +198,33 @@ salloc -A PROJECT-ID -p p02-intel -N 2 --gres=fpga:2

 ## Partition 03 - AMD (Milan, MI100 GPUs + Xilinx FPGAs)

-GPGPUs and FPGAs are treated as resources. See below for more details about resources.
+GPUs and FPGAs are treated as resources. See below for more details about resources.

-Partial allocation - per GPGPU and per FPGA, resource separation is not enforced.
-Use only GPGPUs and FPGAs allocated to the job!
+Partial allocation is possible: per GPU and per FPGA. Resource separation is not enforced.
+Use only the GPUs and FPGAs allocated to the job!

 One GPU:

 ```console
-salloc -A PROJECT-ID -p p03-amd --gres=gpgpu
+salloc -A PROJECT-ID -p p03-amd --gres=gpu
 ```

 Two GPUs on the same node:

 ```console
-salloc -A PROJECT-ID -p p03-amd --gres=gpgpu:2
+salloc -A PROJECT-ID -p p03-amd --gres=gpu:2
 ```

 Four GPUs on the same node:

 ```console
-salloc -A PROJECT-ID -p p03-amd --gres=gpgpu:4
+salloc -A PROJECT-ID -p p03-amd --gres=gpu:4
 ```

 All GPUs:

 ```console
-salloc -A PROJECT-ID -p p03-amd -N 2 --gres=gpgpu:4
+salloc -A PROJECT-ID -p p03-amd -N 2 --gres=gpu:4
 ```

 One FPGA:

@@ -248,19 +248,19 @@ salloc -A PROJECT-ID -p p03-amd -N 2--gres=fpga:2

 One GPU and one FPGA on the same node:

 ```console
-salloc -A PROJECT-ID -p p03-amd --gres=gpgpu,fpga
+salloc -A PROJECT-ID -p p03-amd --gres=gpu,fpga
 ```

 Four GPUs and two FPGAs on the same node:

 ```console
-salloc -A PROJECT-ID -p p03-amd --gres=gpgpu:4,fpga:2
+salloc -A PROJECT-ID -p p03-amd --gres=gpu:4,fpga:2
 ```

 All GPUs and FPGAs:

 ```console
-salloc -A PROJECT-ID -p p03-amd -N 2 --gres=gpgpu:4,fpga:2
+salloc -A PROJECT-ID -p p03-amd -N 2 --gres=gpu:4,fpga:2
 ```

 ## Partition 04 - Edge Server

@@ -294,7 +294,7 @@ Users can select nodes based on the feature tags using --constraint option.
 | broadwell | processor family |
 | milan | processor family |
 | ib | Infiniband |
-| gpgpu | equipped with GPGPU |
+| gpu | equipped with GPU |
 | fpga | equipped with FPGA |
 | nvdimm | equipped with NVDIMMs |
 | ht | Hyperthreading enabled |

@@ -307,8 +307,8 @@ p00-arm01 aarch64,cortex-a72
 p01-arm[01-08] aarch64,a64fx,ib
 p02-intel01 x86_64,intel,icelake,ib,fpga,bitware,nvdimm,ht
 p02-intel02 x86_64,intel,icelake,ib,fpga,bitware,nvdimm,noht
-p03-amd01 x86_64,amd,milan,ib,gpgpu,mi100,fpga,xilinx,ht
-p03-amd02 x86_64,amd,milan,ib,gpgpu,mi100,fpga,xilinx,noht
+p03-amd01 x86_64,amd,milan,ib,gpu,mi100,fpga,xilinx,ht
+p03-amd02 x86_64,amd,milan,ib,gpu,mi100,fpga,xilinx,noht
 p04-edge01 x86_64,intel,broadwell,ib,ht
 p05-synt01 x86_64,amd,milan,ib,ht
 ```

@@ -324,10 +324,10 @@ $ scontrol -d show node p02-intel02 | grep ActiveFeatures

 ## Resources, GRES

-Slurm supports the ability to define and schedule arbitrary resources - Generic RESources (GRES) in Slurm's terminology. We use GRES for scheduling/allocating GPGPUs and FPGAs.
+Slurm can define and schedule arbitrary resources, called Generic RESources (GRES) in Slurm's terminology. We use GRES to schedule and allocate GPUs and FPGAs.

 !!! warning
-    Use only allocated GPGPUs and FPGAs. Resource separation is not enforced. If you use non-allocated resources, you can observe strange behaviour and get into troubles.
+    Use only the allocated GPUs and FPGAs. Resource separation is not enforced. If you use non-allocated resources, you may observe strange behaviour and run into problems.

 ### Node Resources

@@ -339,14 +339,14 @@ $ scontrol -d show node p02-intel01 | grep Gres=
 $ scontrol -d show node p02-intel02 | grep Gres=
    Gres=fpga:bitware_520n_mx:2
 $ scontrol -d show node p03-amd01 | grep Gres=
-   Gres=gpgpu:amd_mi100:4,fpga:xilinx_alveo_u250:2
+   Gres=gpu:amd_mi100:4,fpga:xilinx_alveo_u250:2
 $ scontrol -d show node p03-amd02 | grep Gres=
-   Gres=gpgpu:amd_mi100:4,fpga:xilinx_alveo_u280:2
+   Gres=gpu:amd_mi100:4,fpga:xilinx_alveo_u280:2
 ```

 ### Request Resources

-To allocate required resources (GPGPUs or FPGAs) use --gres salloc/srun option.
+To allocate the required resources (GPUs or FPGAs), use the --gres option of salloc/srun.
-Example: Alocate one FPGA
+Example: Allocate one FPGA
 ```

@@ -362,7 +362,7 @@ $ scontrol -d show job $SLURM_JOBID |grep GRES=
 JOB_GRES=fpga:xilinx_alveo_u250:1
      Nodes=p03-amd01 CPU_IDs=0-1 Mem=0 GRES=fpga:xilinx_alveo_u250:1(IDX:0)
 ```

-IDX in the GRES attribute specifies index/indexes of FPGA(s) (or GPGPUs) allocated to the job on the node. In the given example - allocated resources are fpga:xilinx_alveo_u250:1(IDX:0), we should use FPGA with index/number 0 on node p03-amd01.
+The IDX value in the GRES attribute lists the indices of the FPGAs (or GPUs) allocated to the job on the node. In the example above, the allocated resource is fpga:xilinx_alveo_u250:1(IDX:0), so the job should use the FPGA with index 0 on node p03-amd01.

 ### Request Specific Resources
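
Because resource separation on p03-amd is not enforced, a job script has to confine itself to the GPU indices reported in IDX. The following is a minimal sketch of one way to do that for the MI100 GPUs; it is not taken from the documentation being patched. It assumes a single-node allocation, GNU grep (`-P`), and that applications use the ROCm runtime, which honours `ROCR_VISIBLE_DEVICES`. The range-expansion step is an assumption about how Slurm prints IDX (either a comma list such as `0,2` or a range such as `0-1`).

```bash
#!/bin/bash
# Sketch: confine the ROCm runtime to the GPUs Slurm allocated to this job.
# Assumes a single-node job submitted with e.g. --gres=gpu:2 on p03-amd.

# Extract the IDX list (e.g. "0-1" or "0,2") from the job's detailed GRES info.
GPU_IDX=$(scontrol -d show job "$SLURM_JOB_ID" \
    | grep -oP 'GRES=gpu[^(]*\(IDX:\K[^)]+' \
    | head -n 1)

# IDX may be printed as a range ("0-1"); expand it into the comma-separated
# list of device indices that the ROCm runtime expects.
if [[ "$GPU_IDX" == *-* ]]; then
    GPU_IDX=$(seq -s, "${GPU_IDX%%-*}" "${GPU_IDX##*-}")
fi

# Applications launched from here on see only the allocated MI100s.
export ROCR_VISIBLE_DEVICES="$GPU_IDX"
```

Depending on the cluster's GRES configuration, Slurm may already export `ROCR_VISIBLE_DEVICES` for `gpu` GRES on its own; a manual step like the one sketched above is only a fallback for setups where it does not, which the warning about unenforced resource separation suggests is the case here.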