diff --git a/docs.it4i/cs/amd.md b/docs.it4i/cs/amd.md
index f1d7076056412bea5c62b8a34aa9e4b3002506e5..03ab26c1aefcb8d3e088f1bdce2110b14d34a10a 100644
--- a/docs.it4i/cs/amd.md
+++ b/docs.it4i/cs/amd.md
@@ -6,18 +6,15 @@ you need to prepare a job script for that partition or use the interactive job:
 ```
 salloc -N 1 -c 64 -A PROJECT-ID -p p03-amd --gres=gpu:4 --time=08:00:00
 ```
+where:
+
+- `-N 1` means allocating one server,
+- `-c 64` means allocating 64 cores,
+- `-A` is your project,
+- `-p p03-amd` is the AMD partition,
+- `--gres=gpu:4` means allocating all 4 GPUs of the node,
+- `--time=08:00:00` means allocation for 8 hours.
 
-where:
-
-- `-N 1` means allocating one server,
-- `-c 64` means allocation 64 cores,
-- `-A` is your project,
-- `-p p03-amd` is AMD partition,
-- `--gres=gpu:4` means allocating all 4 GPUs of the node,
-- `--time=08:00:00` means allocation for 8 hours.
-
-You have also an option to allocate a subset of the resources only,
-by reducing the `-c` and `--gres=gpu` to smaller values.
+You also have the option to allocate only a subset of the resources, by reducing the `-c` and `--gres=gpu` values.
 
 ```
 salloc -N 1 -c 48 -A PROJECT-ID -p p03-amd --gres=gpu:3 --time=08:00:00
@@ -25,39 +22,33 @@ salloc -N 1 -c 32 -A PROJECT-ID -p p03-amd --gres=gpu:2 --time=08:00:00
 salloc -N 1 -c 16 -A PROJECT-ID -p p03-amd --gres=gpu:1 --time=08:00:00
 ```
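+
+For non-interactive use, the same resources can be requested from a batch job script. This is a minimal sketch using the same `PROJECT-ID` placeholder as above; the script name and the `./my_app.x` command are hypothetical:
+
+```
+#!/usr/bin/env bash
+#SBATCH -N 1
+#SBATCH -c 64
+#SBATCH -A PROJECT-ID
+#SBATCH -p p03-amd
+#SBATCH --gres=gpu:4
+#SBATCH --time=08:00:00
+
+# run your application on the allocated node
+./my_app.x
+```
+
+Submit it with `sbatch job_script.sh`.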
 
-!!! Note
+### Note
 
-    p03-amd01 server has hyperthreading **enabled** therefore htop shows 128 cores.
+The p03-amd01 server has hyperthreading **enabled**, therefore htop shows 128 cores.
 
-    p03-amd02 server has hyperthreading **disabled** therefore htop shows 64 cores.
+The p03-amd02 server has hyperthreading **disabled**, therefore htop shows 64 cores.
 
-## Using AMD MI100 GPUs
 
-The AMD GPUs can be programmed using the ROCm open-source platform
-(for more information, see [https://docs.amd.com/][1].)
+## Using AMD MI100 GPUs
 
-ROCm and related libraries are installed directly in the system.
-You can find it here:
+The AMD GPUs can be programmed using the [ROCm open-source platform](https://docs.amd.com/). 
 
+ROCm and related libraries are installed directly in the system. You can find it here: 
 ```
 /opt/rocm/
 ```
-
-The actual version can be found here:
-
+The actual version can be found here: 
 ```
 [user@p03-amd02.cs]$ cat /opt/rocm/.info/version
 
 5.5.1-74
 ```
 
-## Basic HIP Code
+## Basic HIP code
 
-The first way how to program AMD GPUs is to use HIP.
+The first way to program AMD GPUs is to use HIP.
 
-The basic vector addition code in HIP looks like this.
-This a full code and you can copy and paste it into a file.
-For this example, we use `vector_add.hip.cpp`.
+The basic vector addition code in HIP looks like this. This is a full code and you can copy and paste it into a file. For this example we use `vector_add.hip.cpp`.
 
 ```
 #include <cstdio>
@@ -96,7 +87,7 @@ int main()
     for(long long i = 0; i < count; i++)
         printf(" %7.2f", h_y[i]);
     printf("\n");
-
+
     // allocation of memory on the GPU device
     float * d_x;
     float * d_y;
@@ -132,46 +123,45 @@ int main()
 }
 ```
 
-To compile the code, we use `hipcc` compiler.
-The compiler information can be found like this:
+To compile the code, we use the `hipcc` compiler. The compiler information can be found like this:
 
-```
-[user@p03-amd02.cs ~]$ hipcc --version
+```
+[user@p03-amd02.cs ~]$ hipcc --version

HIP version: 5.5.30202-eaf00c0b
AMD clang version 16.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.5.1 23194 69ef12a7c3cc5b0ccf820bc007bd87e8b3ac3037)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.5.1/llvm/bin
+```
 
-The code is compiled as follows:
+The code is compiled as follows:
 
 ```
 hipcc vector_add.hip.cpp -o vector_add.x
 ```
 
-The correct output of the code is:
-
+The correct output of the code is: 
 ```
-[user@p03-amd02.cs ~]$ ./vector_add.x
+[user@p03-amd02.cs ~]$ ./vector_add.x 
 X:    0.00    1.00    2.00    3.00    4.00    5.00    6.00    7.00    8.00    9.00
 Y:    0.00   10.00   20.00   30.00   40.00   50.00   60.00   70.00   80.00   90.00
 Y:    0.00  110.00  220.00  330.00  440.00  550.00  660.00  770.00  880.00  990.00
 ```
 
-## HIP and ROCm Libraries
+More details on HIP programming are available in the [HIP Programming Guide](https://docs.amd.com/bundle/HIP-Programming-Guide-v5.5/page/Introduction_to_HIP_Programming_Guide.html).
 
-The list of official AMD libraries can be found here: [https://docs.amd.com/category/libraries][2].
+## HIP and ROCm libraries
 
-The libraries are installed in the same directory as ROCm
+The list of official AMD libraries can be found [here](https://docs.amd.com/category/libraries). 
 
+The libraries are installed in the same directory as ROCm:
 ```
 /opt/rocm/
 ```
 
-Following libraries are installed:
-
+The following libraries are installed:
 ```
 drwxr-xr-x  4 root root   44 Jun  7 14:09 hipblas
 drwxr-xr-x  3 root root   17 Jun  7 14:09 hipblas-clients
@@ -183,7 +173,7 @@ drwxr-xr-x  4 root root   44 Jun  7 14:09 hipsolver
 drwxr-xr-x  4 root root   44 Jun  7 14:09 hipsparse
 ```
 
-and
+and 
 
 ```
 drwxr-xr-x  4 root root   32 Jun  7 14:09 rocalution
@@ -196,11 +186,11 @@ drwxr-xr-x  4 root root   44 Jun  7 14:09 rocsparse
 drwxr-xr-x  3 root root   29 Jun  7 14:09 rocthrust
 ```
 
-### Using hipBlas Library
 
-The basic code in HIP that uses hipBlas looks like this.
-This is a full code and you can copy and paste it into a file.
-For this example we use `hipblas.hip.cpp`.
+### Using hipBlas library
+
+The basic code in HIP that uses hipBlas looks like this. This is a full code and you can copy and paste it into a file. For this example we use `hipblas.hip.cpp`.
 
 ```
 #include <cstdio>
@@ -211,7 +201,7 @@ For this example we use `hipblas.hip.cpp`.
 
 
 int main()
-{
+{
     srand(9600);
 
     int width = 10;
@@ -241,7 +231,7 @@ int main()
     for(int i = 0; i < width; i++)
         printf("%6.3f  ", h_x[i]);
     printf("\n");
-
+
     float * h_y;
     hipHostMalloc(&h_y, height * sizeof(*h_y));
     for(int i = 0; i < height; i++)
@@ -251,7 +241,7 @@ int main()
         printf("%6.3f  ", h_x[i]);
     printf("\n");
 
-
+
     // initialization of data in GPU memory
 
     float * d_A;
@@ -263,7 +253,7 @@ int main()
     float * d_x;
     hipMalloc(&d_x, width * sizeof(*d_x));
     hipMemcpy(d_x, h_x, width * sizeof(*d_x), hipMemcpyHostToDevice);
-
+
     float * d_y;
     hipMalloc(&d_y, height * sizeof(*d_y));
     hipMemcpy(d_y, h_y, height * sizeof(*d_y), hipMemcpyHostToDevice);
@@ -282,8 +272,8 @@ int main()
     for(int i = 0; i < height; i++)
         printf("%6.3f  ", h_y[i]);
     printf("\n");
-
-
+
+
     // calculation of the result on the GPU using the hipBLAS library
 
     hipblasHandle_t blas_handle;
@@ -293,7 +283,7 @@ int main()
     hipDeviceSynchronize();
 
     hipblasDestroy(blas_handle);
-
+
 
     // copy the GPU result to CPU memory and print it
     hipMemcpy(h_y, d_y, height * sizeof(*d_y), hipMemcpyDeviceToHost);
@@ -315,17 +305,14 @@ int main()
 }
 ```
 
-The code compilation can be done as follows:
-
+The code compilation can be done as follows: 
 ```
 hipcc hipblas.hip.cpp -o hipblas.x -lhipblas
 ```
 
-### Using hipSolver Library
+### Using hipSolver library
 
-The basic code in HIP that uses hipSolver looks like this.
-This a full code and you can copy and paste it into a file.
-For this example we use `hipsolver.hip.cpp`.
+The basic code in HIP that uses hipSolver looks like this. This is a full code and you can copy and paste it into a file. For this example we use `hipsolver.hip.cpp`.
 
 ```
 #include <cstdio>
@@ -356,8 +343,8 @@ int main()
         for(int c = 0; c < size; c++)
             printf("%6.3f  ", h_A[r * h_A_ld + c]);
         printf("\n");
-    }
-
+    }
+
     std::vector<float> h_b(size);
     for(int i = 0; i < size; i++)
         h_b[i] = (10.0 * rand()) / RAND_MAX;
@@ -378,7 +365,7 @@ int main()
 
     float * d_b;
     hipMalloc(&d_b, size * sizeof(float));
-
+
     float * d_x;
     hipMalloc(&d_x, size * sizeof(float));
 
@@ -390,7 +377,7 @@ int main()
 
     hipMemcpy2D(d_A, d_A_pitch, h_A.data(), h_A_pitch, size * sizeof(float), size, hipMemcpyHostToDevice);
     hipMemcpy(d_b, h_b.data(), size * sizeof(float), hipMemcpyHostToDevice);
-
+
 
     // solving the system using hipSOLVER
 
@@ -403,7 +390,7 @@ int main()
     float * workspace;
     int wss = std::max(wss_trf, wss_trs);
     hipMalloc(&workspace, wss * sizeof(float));
-
+
     hipsolverSgetrf(solverHandle, size, size, d_A, d_A_ld, workspace, wss, d_piv, info);
     hipsolverSgetrs(solverHandle, HIPSOLVER_OP_N, size, 1, d_A, d_A_ld, d_piv, d_b, size, workspace, wss, info);
 
@@ -453,18 +440,97 @@ int main()
 }
 ```
 
-The code compilation can be done as follows:
-
+The code compilation can be done as follows: 
 ```
 hipcc hipsolver.hip.cpp -o hipsolver.x -lhipblas -lhipsolver
 ```
 
-### Other AMD Libraries and Frameworks
+## Using OpenMP offload to program AMD GPUs 
+
+The ROCm™ installation includes an LLVM-based implementation that fully supports the OpenMP 4.5 standard and a subset of the OpenMP 5.0 standard. Fortran, C/C++ compilers, and corresponding runtime libraries are included.
+
+The OpenMP toolchain is automatically installed as part of the standard ROCm installation and is available under `/opt/rocm/llvm`. The sub-directories are:
+
+- `bin` : Compilers (flang and clang) and other binaries.
+- `examples` : The usage section below shows how to compile and run these programs.
+- `include` : Header files.
+- `lib` : Libraries including those required for target offload.
+- `lib-debug` : Debug versions of the above libraries.
+
+More information can be found in the [AMD OpenMP Support Guide](https://docs.amd.com/bundle/OpenMP-Support-Guide-v5.5/page/Introduction_to_OpenMP_Support_Guide.html). 
+
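+
+The toolchain can be checked the same way as `hipcc` above, for example (a console sketch, assuming the default install path):
+
+```
+[user@p03-amd02.cs ~]$ /opt/rocm/llvm/bin/clang++ --version
+```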
+
+### Compilation of OpenMP code 
+
+A basic example that uses OpenMP offload follows. Again, the code is complete and can be copied and pasted into a file. Here we use `vadd.cpp`.
+
+```
+#include <cstdio>
+#include <cstdlib>
+
+int main(int argc, char ** argv)
+{
+    long long count = 1 << 20;
+    if(argc > 1)
+        count = atoll(argv[1]);
+    long long print_count = 16;
+    if(argc > 2)
+        print_count = atoll(argv[2]);
+
+    long long * a = new long long[count];
+    long long * b = new long long[count];
+    long long * c = new long long[count];
+
+#pragma omp parallel for
+    for(long long i = 0; i < count; i++)
+    {
+        a[i] = i;
+        b[i] = 10 * i;
+    }
+
+    printf("A: ");
+    for(long long i = 0; i < print_count; i++)
+        printf("%3lld ", a[i]);
+    printf("\n");
+
+    printf("B: ");
+    for(long long i = 0; i < print_count; i++)
+        printf("%3lld ", b[i]);
+    printf("\n");
+
+#pragma omp target map(to: a[0:count],b[0:count]) map(from: c[0:count])
+#pragma omp teams distribute parallel for
+    for(long long i = 0; i < count; i++)
+    {
+        c[i] = a[i] + b[i];
+    }
+
+    printf("C: ");
+    for(long long i = 0; i < print_count; i++)
+        printf("%3lld ", c[i]);
+    printf("\n");
+
+    delete[] a;
+    delete[] b;
+    delete[] c;
+
+    return 0;
+}
+```
+
+This code can be compiled like this: 
+
+```
+/opt/rocm/llvm/bin/clang++ -O3 -target x86_64-pc-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx908 vadd.cpp -o vadd.x
+```
+
+These options are required for target offload from an OpenMP program:
+
+- `-target x86_64-pc-linux-gnu`
+- `-fopenmp`
+- `-fopenmp-targets=amdgcn-amd-amdhsa`
+- `-Xopenmp-target=amdgcn-amd-amdhsa`
 
-Please see [gcc options](https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html) for more advanced compilation settings.
-No complications are expected as long as the application does not use any intrinsic for `x64` architecture.
-If you want to use intrinsic,
-[SVE](https://developer.arm.com/documentation/102699/0100/Optimizing-with-intrinsics) instruction set is available.
+The `-march` flag specifies the architecture of the targeted GPU. You need to change this when moving, for instance, to LUMI with its MI250X GPUs. The MI100 GPUs present in CS have the code `gfx908`:
+
+- `-march=gfx908`
 
-[1]: https://docs.amd.com/
-[2]: https://docs.amd.com/category/libraries
+Note: You also have to include one of the optimization flags `-O0`, `-O1`, `-O2`, or `-O3`. Without an optimization flag, the execution of the compiled code fails.