diff --git a/docs.it4i/cs/job-scheduling.md b/docs.it4i/cs/job-scheduling.md
index ecd9b0ca284316c04fdb60956d452494af81b186..c396bbcfef0c4c81ebd9b0ccfc5db844c1b7f01c 100644
--- a/docs.it4i/cs/job-scheduling.md
+++ b/docs.it4i/cs/job-scheduling.md
@@ -153,6 +153,7 @@ sbatch -A PROJECT-ID -p p01-arm -N=8 ./script.sh
 FPGAs are treated as resources. See below for more details about resources.
 
 Partial allocation - per FPGA, resource separation is not enforced.
+Use only FPGAs allocated to the job!
 
 One FPGA:
 
@@ -177,6 +178,7 @@ sbatch -A PROJECT-ID -p p02-intel -N 2 --gres=fpga:2 ./script.sh
 GPGPUs and FPGAs are treated as resources. See below for more details about resources.
 
 Partial allocation - per GPGPU and per FPGA, resource separation is not enforced.
+Use only GPGPUs and FPGAs allocated to the job!
 
 One GPU:
 
@@ -301,6 +303,9 @@ $ scontrol -d show node p02-intel02 | grep ActiveFeatures
 
 Slurm supports the ability to define and schedule arbitrary resources - Generic RESources (GRES) in Slurm's terminology. We use GRES for scheduling/allocating GPGPUs and FPGAs.
 
+!!! warning
+    Use only allocated GPGPUs and FPGAs. Resource separation is not enforced. If you use non-allocated resources, you may observe strange behaviour and run into trouble.
+
 Get information about GRES on node:
 
 ```
@@ -310,7 +315,29 @@ $ scontrol -d show node p03-amd02 | grep Gres=
 Gres=gpgpu:amd_mi100:4,fpga:xilinx_alveo_u280:2
 ```
 
-Request specified GRES. GRES entry is using format "name[[:type]:count", in the following example name is fpga, type is xilinx_alveo_u280, and count is count 2.
+Example: Allocate one FPGA
+```
+$ salloc -A PROJECT-ID -p p03-amd --gres fpga:1
+salloc: Granted job allocation XXX
+salloc: Waiting for resource configuration
+salloc: Nodes p03-amd01 are ready for job
+
+ avx2 modules
+ all modules
+```
+
+Find out which FPGA is allocated.
+```
+$ scontrol -d show job $SLURM_JOBID |grep GRES=
+ JOB_GRES=fpga:xilinx_alveo_u250:1
+ Nodes=p03-amd01 CPU_IDs=0-1 Mem=0 GRES=fpga:xilinx_alveo_u250:1(IDX:0)
+```
+IDX in the GRES attribute specifies the index or indexes of the FPGAs (or GPGPUs) allocated to the job on the node. In the given example, the allocated resources are fpga:xilinx_alveo_u250:1(IDX:0), so we should use the FPGA with index 0.
+
+### Request Specific Resource
+
+It is possible to allocate specific resources. This is useful for partition p03-amd, where FPGAs of different types are available.
+
+A GRES entry uses the format "name[[:type]:count]"; in the following example, name is fpga, type is xilinx_alveo_u280, and count is 2.
 
 ```
 $ salloc -A PROJECT-ID -p p03-amd --gres=fpga:xilinx_alveo_u280:2
@@ -320,7 +347,7 @@ salloc: Nodes p03-amd02 are ready for job
 
 $ scontrol -d show job $SLURM_JOBID | grep -i gres
  JOB_GRES=fpga:xilinx_alveo_u280:2
- Nodes=p03-amd02 CPU_IDs=0 Mem=0 GRES=fpga:xilinx_alveo_u280(CNT:2)
+ Nodes=p03-amd02 CPU_IDs=0 Mem=0 GRES=fpga:xilinx_alveo_u280(IDX:0-1)
 TresPerNode=gres:fpga:xilinx_alveo_u280:2
 ```
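
Since the change tells users to work only with the devices listed in `IDX`, a job script may want to read that value programmatically. The following is a minimal sketch, not part of the patch: the helper name `gres_idx` is hypothetical, and the parsing assumes only the `GRES=...(IDX:...)` format visible in the `scontrol` output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not provided by Slurm): reads `scontrol -d show job`
# output on stdin and prints the IDX value of each line that carries one,
# for example "0" or "0-1". Lines without "(IDX:...)" produce no output.
gres_idx() {
    sed -n 's/.*GRES=[^ ]*(IDX:\([0-9,-]*\)).*/\1/p'
}

# Sample line copied from the documentation example above. Inside a real job,
# the input would instead come from: scontrol -d show job "$SLURM_JOBID"
sample=' Nodes=p03-amd01 CPU_IDs=0-1 Mem=0 GRES=fpga:xilinx_alveo_u250:1(IDX:0)'
printf '%s\n' "$sample" | gres_idx   # prints: 0
```

How the extracted index is then passed to the FPGA or GPGPU runtime depends on the toolchain in use and is outside the scope of this sketch.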