p03-amd01 server has hyperthreading enabled therefore htop shows 128 cores.
p03-amd02 server has hyperthreading dissabled therefore htop shows 64 cores.
## Using AMD MI100 GPUs
The AMD GPUs can be programmed using the ROCm open-source platform (see: https://docs.amd.com/ for more information.)
ROCm and related libraries are installed directly in the system. You can find it here:
```
/opt/rocm/
```
The actual version can be found here:
```
[user@p03-amd02.cs]$ cat /opt/rocm/.info/version
5.5.1-74
```
## Basic HIP code
The first way how to program AMD GPUs is to use HIP.
The basic vector addition code in HIP looks like this. This a full code and you can copy and paste it into a file. For this example we use `vector_add.hip.cpp` .
```
#include <cstdio>
#include <hip/hip_runtime.h>
__global__ void add_vectors(float * x, float * y, float alpha, int count)
{
long long idx = blockIdx.x * blockDim.x + threadIdx.x;
if(idx < count)
y[idx] += alpha * x[idx];
}
int main()
{
// number of elements in the vectors
long long count = 10;
// allocation and initialization of data on the host (CPU memory)
The list of official AMD libraries can be found here: https://docs.amd.com/category/libraries.
The libraries are installed in the same directory is ROCm
```
/opt/rocm/
```
Following libraries are installed:
```
drwxr-xr-x 4 root root 44 Jun 7 14:09 hipblas
drwxr-xr-x 3 root root 17 Jun 7 14:09 hipblas-clients
drwxr-xr-x 3 root root 29 Jun 7 14:09 hipcub
drwxr-xr-x 4 root root 44 Jun 7 14:09 hipfft
drwxr-xr-x 3 root root 25 Jun 7 14:09 hipfort
drwxr-xr-x 4 root root 32 Jun 7 14:09 hiprand
drwxr-xr-x 4 root root 44 Jun 7 14:09 hipsolver
drwxr-xr-x 4 root root 44 Jun 7 14:09 hipsparse
```
and
```
drwxr-xr-x 4 root root 32 Jun 7 14:09 rocalution
drwxr-xr-x 4 root root 44 Jun 7 14:09 rocblas
drwxr-xr-x 4 root root 44 Jun 7 14:09 rocfft
drwxr-xr-x 4 root root 32 Jun 7 14:09 rocprim
drwxr-xr-x 4 root root 32 Jun 7 14:09 rocrand
drwxr-xr-x 4 root root 44 Jun 7 14:09 rocsolver
drwxr-xr-x 4 root root 44 Jun 7 14:09 rocsparse
drwxr-xr-x 3 root root 29 Jun 7 14:09 rocthrust
```
### Using hipBlas library
The basic code in HIP that uses hipBlas looks like this. This a full code and you can copy and paste it into a file. For this example we use `hipblas.hip.cpp` .
The basic code in HIP that uses hipSolver looks like this. This a full code and you can copy and paste it into a file. For this example we use `hipsolver.hip.cpp` .
```
#include <cstdio>
#include <vector>
#include <cstdlib>
#include <algorithm>
#include <hipsolver/hipsolver.h>
#include <hipblas/hipblas.h>
int main()
{
srand(63456);
int size = 10;
// allocation and initialization of data on host. this time we use std::vector
int h_A_ld = size;
int h_A_pitch = h_A_ld * sizeof(float);
std::vector<float> h_A(size * h_A_ld);
for(int r = 0; r < size; r++)
for(int c = 0; c < size; c++)
h_A[r * h_A_ld + c] = (10.0 * rand()) / RAND_MAX;
printf("System matrix A:\n");
for(int r = 0; r < size; r++)
{
for(int c = 0; c < size; c++)
printf("%6.3f ", h_A[r * h_A_ld + c]);
printf("\n");
}
std::vector<float> h_b(size);
for(int i = 0; i < size; i++)
h_b[i] = (10.0 * rand()) / RAND_MAX;
printf("RHS vector b:\n");
for(int i = 0; i < size; i++)
printf("%6.3f ", h_b[i]);
printf("\n");
std::vector<float> h_x(size);
// memory allocation on the device and initialization