This example demonstrates how to use AOMP, which can compile programs that use OpenMP offloading.
The `vadd.cpp` source file contains a simple vector addition. The loop performing the addition begins on line 35 and is annotated with several OpenMP constructs. The `target` construct makes the code execute on the GPU, and the `map` clause tells OpenMP which data transfers to perform. The `teams` construct creates a league of teams, and `distribute` splits the loop iterations between the teams, much like dividing work between threadblocks in CUDA/HIP. `parallel for` then creates several threads within each team, which together work on that team's share of the iterations, just like threads in a threadblock.
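As a rough illustration only (the array names, sizes, and exact `map` clauses are made up here and need not match `vadd.cpp`), an offloaded vector-add loop combining these constructs can look like this:
```
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::vector<double> a(n, 1.0), b(n, 2.0), c(n, 0.0);
    double *pa = a.data(), *pb = b.data(), *pc = c.data();

    // target: run on the GPU; map: copy a and b in, copy c back out.
    // teams distribute: split iterations between teams (~threadblocks),
    // parallel for: split a team's share between its threads.
    #pragma omp target teams distribute parallel for \
        map(to: pa[0:n], pb[0:n]) map(from: pc[0:n])
    for (int i = 0; i < n; ++i)
        pc[i] = pa[i] + pb[i];

    printf("c[0] = %f\n", pc[0]);
    return 0;
}
```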
The code can be compiled using
```
aompcc vadd.cpp -o vadd.x
```
On machines with a GPU other than the default (Vega, gfx900), either set the target architecture via `export AOMP_GPU=gfx908`, or compile using `aompcc --offload-arch gfx908 vadd.cpp -o vadd.x` (gfx908 corresponds to the AMD Instinct MI100).
This example compares different OpenMP parallelization techniques applied to a simple algorithm that calculates $\pi$ by numerical integration, using the fact that $\pi = \int_0^1 \frac{4}{1+x^2} \;\mathrm{d} x$.
The `pi_seq.cpp` source file contains the sequential version of this algorithm, `pi_omp.cpp` is parallelized using OpenMP, `pi_omp_offload.cpp` uses OpenMP offloading, and `pi_hip.hip.cpp` implements the same algorithm in HIP. Compile the sources with `make` and run them all with `make run`. Note how many different ways the code is compiled, which commands are used for the compilation, and compare the differences in computation time.
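For orientation, a minimal sketch of the offloaded variant (assuming a midpoint-rule discretization with an OpenMP reduction; the actual `pi_omp_offload.cpp` may differ in details) could look roughly like this:
```
#include <cstdio>

int main() {
    const long n = 100000000;   // number of integration intervals (illustrative)
    const double h = 1.0 / n;   // interval width
    double sum = 0.0;

    // Midpoint rule for the integral of 4/(1+x^2) on [0,1];
    // the reduction combines the per-thread partial sums.
    #pragma omp target teams distribute parallel for reduction(+: sum)
    for (long i = 0; i < n; ++i) {
        double x = (i + 0.5) * h;
        sum += 4.0 / (1.0 + x * x);
    }

    printf("pi ~= %.15f\n", sum * h);
    return 0;
}
```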