Commit 9ffc5b63 authored by Jan Siwiec

Update file grace.md

parent 591495cb
Pipeline #36629 passed with warnings
......@@ -17,6 +17,7 @@ where:
## Available Toolchains
The platform offers three toolchains:
- Standard GCC (as a module `ml GCC`)
- [NVHPC](https://developer.nvidia.com/hpc-sdk) (as a module `ml NVHPC`)
- [Clang for NVIDIA Grace](https://developer.nvidia.com/grace/clang) (installed in `/opt/nvidia/clang`)
......@@ -38,6 +39,7 @@ for(int i = 0; i < 1000000; ++i) {
}
}
```
may emit scalar code for the inner loop, so no vectorization is used at all.
### Clang (For Grace) Toolchain
......@@ -73,6 +75,7 @@ The basic libraries (BLAS and LAPACK) are included in NVHPC toolchain and can be
### NVIDIA Performance Libraries
The [NVPL](https://developer.nvidia.com/nvpl) package includes a more extensive set of libraries, in both sequential and multi-threaded versions:
- BLACS: `-lnvpl_blacs_{lp64,ilp64}_{mpich,openmpi3,openmpi4,openmpi5}`
- BLAS: `-lnvpl_blas_{lp64,ilp64}_{seq,gomp}`
- FFTW: `-lnvpl_fftw`
......@@ -112,6 +115,7 @@ ml OpenMPI
mpic++ -fast -fopenmp hello.cpp -o hello
OMP_PROC_BIND=close OMP_NUM_THREADS=4 mpirun -np 4 --map-by slot:pe=36 ./hello
```
In this configuration, we run 4 ranks, each bound to one quarter of the cores and each running 4 OpenMP threads.
## Simple BLAS Application
......