diff --git a/docs.it4i/cs/amd.md b/docs.it4i/cs/amd.md
index dda29f379a9b6798f05e38b7c0b515ea24673d83..7286751e9683f55796806a6deeb3a39a21375460 100644
--- a/docs.it4i/cs/amd.md
+++ b/docs.it4i/cs/amd.md
@@ -9,14 +9,15 @@ salloc -N 1 -c 64 -A PROJECT-ID -p p03-amd --gres=gpu:4 --time=08:00:00
 
 where:
 
-- -N 1 means allocating one server,
-- -c 64 means allocation 64 cores,
-- -A is your project,
-- -p p03-amd is AMD partition,
-- --gres=gpu:4 means allcating all 4 GPUs of the node,
-- --time=08:00:00 means allocation for 8 hours.
+- `-N 1` means allocating one server,
+- `-c 64` means allocating 64 cores,
+- `-A` is your project,
+- `-p p03-amd` is the AMD partition,
+- `--gres=gpu:4` means allocating all 4 GPUs of the node,
+- `--time=08:00:00` means allocation for 8 hours.
 
-You have also an option to allocate subset of the resources only, by reducing the -c and --gres=gpu to smaller values.
+You also have the option to allocate only a subset of the resources,
+by reducing the `-c` and `--gres=gpu` values.
 
 ```console
 salloc -N 1 -c 48 -A PROJECT-ID -p p03-amd --gres=gpu:3 --time=08:00:00
@@ -25,15 +26,15 @@ salloc -N 1 -c 16 -A PROJECT-ID -p p03-amd --gres=gpu:1 --time=08:00:00
 ```
 
 !!! Note
-    p03-amd01 server has hyperthreading enabled therefore htop shows 128 cores.
-
-    p03-amd02 server has hyperthreading dissabled therefore htop shows 64 cores.
+    The p03-amd01 server has hyperthreading **enabled**, therefore `htop` shows 128 cores.<br>
+    The p03-amd02 server has hyperthreading **disabled**, therefore `htop` shows 64 cores.
 
 ## Using AMD MI100 GPUs
 
 The AMD GPUs can be programmed using the [ROCm open-source platform](https://docs.amd.com/).
 
-ROCm and related libraries are installed directly in the system. You can find it here:
+ROCm and related libraries are installed directly in the system.
+You can find them here:
 
 ```console
 /opt/rocm/
@@ -51,7 +52,9 @@ The actual version can be found here:
 
 The first way how to program AMD GPUs is to use HIP.
 
-The basic vector addition code in HIP looks like this. This a full code and you can copy and paste it into a file. For this example we use `vector_add.hip.cpp`.
+The basic vector addition code in HIP looks like this.
+This is the complete code and you can copy and paste it into a file.
+For this example we use `vector_add.hip.cpp`.
 
 ```console
 #include <cstdio>
@@ -126,7 +129,8 @@ int main()
 }
 ```
 
-To compile the code we use `hipcc` compiler. The compiler information can be found like this:
+To compile the code we use the `hipcc` compiler.
+For compiler information, use `hipcc --version`:
 
 ```console
 [user@p03-amd02.cs ~]$ hipcc --version
@@ -193,7 +197,9 @@ drwxr-xr-x 3 root root 29 Jun 7 14:09 rocthrust
 
 ## Using HipBlas Library
 
-The basic code in HIP that uses hipBlas looks like this. This a full code and you can copy and paste it into a file. For this example we use `hipblas.hip.cpp`.
+The basic code in HIP that uses hipBlas looks like this.
+This is the complete code and you can copy and paste it into a file.
+For this example we use `hipblas.hip.cpp`.
 
 ```console
 #include <cstdio>
@@ -316,7 +322,9 @@ hipcc hipblas.hip.cpp -o hipblas.x -lhipblas
 
 ## Using HipSolver Library
 
-The basic code in HIP that uses hipSolver looks like this. This a full code and you can copy and paste it into a file. For this example we use `hipsolver.hip.cpp`.
+The basic code in HIP that uses hipSolver looks like this.
+This is the complete code and you can copy and paste it into a file.
+For this example we use `hipsolver.hip.cpp`.
 
 ```console
 #include <cstdio>
@@ -452,9 +460,12 @@ hipcc hipsolver.hip.cpp -o hipsolver.x -lhipblas -lhipsolver
 
 ## Using OpenMP Offload to Program AMD GPUs
 
-The ROCm™ installation includes an LLVM-based implementation that fully supports the OpenMP 4.5 standard and a subset of the OpenMP 5.0 standard. Fortran, C/C++ compilers, and corresponding runtime libraries are included.
+The ROCm™ installation includes an LLVM-based implementation that fully supports the OpenMP 4.5 standard
+and a subset of the OpenMP 5.0 standard.
+Fortran, C/C++ compilers, and corresponding runtime libraries are included.
 
-The OpenMP toolchain is automatically installed as part of the standard ROCm installation and is available under `/opt/rocm/llvm`. The sub-directories are:
+The OpenMP toolchain is automatically installed as part of the standard ROCm installation
+and is available under `/opt/rocm/llvm`. The sub-directories are:
 
 - `bin` : Compilers (flang and clang) and other binaries.
 - `examples` : The usage section below shows how to compile and run these programs.
@@ -466,7 +477,9 @@ More information can be found in the [AMD OpenMP Support Guide](https://docs.amd
 
 ## Compilation of OpenMP Code
 
-Basic example that uses OpenMP offload is here. Again, code is comlete and can be copy and pasted into file. Here we use `vadd.cpp`.
+A basic example that uses OpenMP offload follows.
+Again, the code is complete and can be copied and pasted into a file.
+Here we use `vadd.cpp`.
 
 ```console
 #include <cstdio>
@@ -535,8 +548,11 @@ These options are required for target offload from an OpenMP program:
 - `-fopenmp-targets=amdgcn-amd-amdhsa`
 - `-Xopenmp-target=amdgcn-amd-amdhsa`
 
-This flag specifies the GPU architecture of targeted GPU. You need to chage this when moving for instance to LUMI with MI250X GPU. The MI100 GPUs presented in CS have code `gfx908`:
+This flag specifies the GPU architecture of the targeted GPU.
+You need to change this when moving, for instance, to LUMI with its MI250X GPUs.
+The MI100 GPUs present in CS have the code `gfx908`:
 
 - `-march=gfx908`
 
-Note: You also have to include the `O0`, `O2`, `O3` or `O3` flag. Without this flag the execution of the compiled code fails.
+Note: You also have to include one of the `-O0`, `-O2`, or `-O3` optimization flags.
+Without an optimization flag, the execution of the compiled code fails.
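+
+Putting these options together, a complete compile command for `vadd.cpp` may look like the following sketch
+(the `clang++` path follows the `/opt/rocm/llvm` layout described above; the output name `vadd.x` is an arbitrary choice):
+
+```console
+# sketch only: combines the flags listed above; adjust paths and names to your setup
+/opt/rocm/llvm/bin/clang++ -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx908 vadd.cpp -o vadd.x
+```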