diff --git a/docs.it4i/cs/amd.md b/docs.it4i/cs/amd.md index 03ab26c1aefcb8d3e088f1bdce2110b14d34a10a..c9a1c996aab392ddfbb1b52714b10673e41bd3f0 100644 --- a/docs.it4i/cs/amd.md +++ b/docs.it4i/cs/amd.md @@ -3,9 +3,10 @@ For testing your application on the AMD partition, you need to prepare a job script for that partition or use the interactive job: -``` +```console salloc -N 1 -c 64 -A PROJECT-ID -p p03-amd --gres=gpu:4 --time=08:00:00 ``` + where: - -N 1 means allocating one server, - -c 64 means allocation 64 cores, @@ -16,17 +17,15 @@ where: You have also an option to allocate subset of the resources only, by reducing the -c and --gres=gpu to smaller values. -``` +```console salloc -N 1 -c 48 -A PROJECT-ID -p p03-amd --gres=gpu:3 --time=08:00:00 salloc -N 1 -c 32 -A PROJECT-ID -p p03-amd --gres=gpu:2 --time=08:00:00 salloc -N 1 -c 16 -A PROJECT-ID -p p03-amd --gres=gpu:1 --time=08:00:00 ``` -### Note: - -p03-amd01 server has hyperthreading enabled therefore htop shows 128 cores. - -p03-amd02 server has hyperthreading dissabled therefore htop shows 64 cores. +!!! Note + p03-amd01 server has hyperthreading enabled therefore htop shows 128 cores. + p03-amd02 server has hyperthreading dissabled therefore htop shows 64 cores. ## Using AMD MI100 GPUs @@ -34,23 +33,26 @@ p03-amd02 server has hyperthreading dissabled therefore htop shows 64 cores. The AMD GPUs can be programmed using the [ROCm open-source platform](https://docs.amd.com/). ROCm and related libraries are installed directly in the system. You can find it here: -``` + +```console /opt/rocm/ ``` + The actual version can be found here: -``` + +```console [user@p03-amd02.cs]$ cat /opt/rocm/.info/version 5.5.1-74 ``` -## Basic HIP code +## Basic HIP Code The first way how to program AMD GPUs is to use HIP. The basic vector addition code in HIP looks like this. This a full code and you can copy and paste it into a file. For this example we use `vector_add.hip.cpp` . 
-``` +```console #include <cstdio> #include <hip/hip_runtime.h> @@ -125,7 +127,7 @@ int main() To compile the code we use `hipcc` compiler. The compiler information can be found like this: -```` +```console [user@p03-amd02.cs ~]$ hipcc --version HIP version: 5.5.30202-eaf00c0b @@ -133,16 +135,17 @@ AMD clang version 16.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc- Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /opt/rocm-5.5.1/llvm/bin -```` +``` The code is compiled a follows: -``` +```console hipcc vector_add.hip.cpp -o vector_add.x ``` The correct output of the code is: -``` + +```console [user@p03-amd02.cs ~]$ ./vector_add.x X: 0.00 1.00 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00 Y: 0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 80.00 90.00 @@ -151,18 +154,20 @@ Y: 0.00 110.00 220.00 330.00 440.00 550.00 660.00 770.00 880.00 990. More details on HIP programming is in the [HIP Programming Guide](https://docs.amd.com/bundle/HIP-Programming-Guide-v5.5/page/Introduction_to_HIP_Programming_Guide.html) -## HIP and ROCm libraries +## HIP and ROCm Libraries The list of official AMD libraries can be found [here](https://docs.amd.com/category/libraries). The libraries are installed in the same directory is ROCm -``` + +```console /opt/rocm/ ``` Following libraries are installed: -``` + +```console drwxr-xr-x 4 root root 44 Jun 7 14:09 hipblas drwxr-xr-x 3 root root 17 Jun 7 14:09 hipblas-clients drwxr-xr-x 3 root root 29 Jun 7 14:09 hipcub @@ -175,7 +180,7 @@ drwxr-xr-x 4 root root 44 Jun 7 14:09 hipsparse and -``` +```console drwxr-xr-x 4 root root 32 Jun 7 14:09 rocalution drwxr-xr-x 4 root root 44 Jun 7 14:09 rocblas drwxr-xr-x 4 root root 44 Jun 7 14:09 rocfft @@ -186,13 +191,11 @@ drwxr-xr-x 4 root root 44 Jun 7 14:09 rocsparse drwxr-xr-x 3 root root 29 Jun 7 14:09 rocthrust ``` - - -### Using hipBlas library +## Using HipBlas Library The basic code in HIP that uses hipBlas looks like this. 
This a full code and you can copy and paste it into a file. For this example we use `hipblas.hip.cpp` . -``` +```console #include <cstdio> #include <vector> #include <cstdlib> @@ -306,15 +309,16 @@ int main() ``` The code compilation can be done as follows: -``` + +```console hipcc hipblas.hip.cpp -o hipblas.x -lhipblas ``` -### Using hipSolver library +## Using HipSolver Library The basic code in HIP that uses hipSolver looks like this. This a full code and you can copy and paste it into a file. For this example we use `hipsolver.hip.cpp` . -``` +```console #include <cstdio> #include <vector> #include <cstdlib> @@ -441,11 +445,12 @@ int main() ``` The code compilation can be done as follows: -``` + +```console hipcc hipsolver.hip.cpp -o hipsolver.x -lhipblas -lhipsolver ``` -## Using OpenMP offload to program AMD GPUs +## Using OpenMP Offload to Program AMD GPUs The ROCm™ installation includes an LLVM-based implementation that fully supports the OpenMP 4.5 standard and a subset of the OpenMP 5.0 standard. Fortran, C/C++ compilers, and corresponding runtime libraries are included. @@ -459,12 +464,11 @@ The OpenMP toolchain is automatically installed as part of the standard ROCm ins More information can be found in the [AMD OpenMP Support Guide](https://docs.amd.com/bundle/OpenMP-Support-Guide-v5.5/page/Introduction_to_OpenMP_Support_Guide.html). - -### Compilation of OpenMP code +## Compilation of OpenMP Code Basic example that uses OpenMP offload is here. Again, code is comlete and can be copy and pasted into file. Here we use `vadd.cpp`. 
-``` +```console #include <cstdio> #include <cstdlib> @@ -520,7 +524,7 @@ int main(int argc, char ** argv) This code can be compiled like this: -``` +```console /opt/rocm/llvm/bin/clang++ -O3 -target x86_64-pc-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx908 vadd.cpp -o vadd.x ``` @@ -533,4 +537,4 @@ These options are required for target offload from an OpenMP program: This flag specifies the GPU architecture of targeted GPU. You need to chage this when moving for instance to LUMI with MI250X GPU. The MI100 GPUs presented in CS have code `gfx908`: - `-march=gfx908` -Note: You also have to include the `O0`, `O2`, `O3` or `O3` flag. Without this flag the execution of the compiled code fails. +Note: You also have to include the `O0`, `O2`, `O3` or `O3` flag. Without this flag the execution of the compiled code fails.