Commit 9ffc5b63 authored by Jan Siwiec

Update file grace.md

parent 591495cb
Pipeline #36629 passed with warnings
......@@ -17,6 +17,7 @@ where:
## Available Toolchains
The platform offers three toolchains:
- Standard GCC (as a module `ml GCC`)
- [NVHPC](https://developer.nvidia.com/hpc-sdk) (as a module `ml NVHPC`)
- [Clang for NVIDIA Grace](https://developer.nvidia.com/grace/clang) (installed in `/opt/nvidia/clang`)
......@@ -38,6 +39,7 @@ for(int i = 0; i < 1000000; ++i) {
}
}
```
may emit scalar code for the inner loop, in which case no vectorization is used at all.
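As a rough illustration, the sketch below assumes the elided loop nest has a long outer loop whose iterations depend on each other and a short inner loop whose iterations are independent. This is an assumption, since the loop bodies are not shown here, and the names `kInner` and `accumulate_rows` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical inner trip count. Short, fixed-size inner loops are the case
// where a compiler may decide that vectorization is not worthwhile.
constexpr std::size_t kInner = 8;

// Outer iterations depend on each other (each step reads the previous state),
// while the inner iterations over j are independent of one another.
std::array<double, kInner> accumulate_rows(const std::vector<double>& data) {
    std::array<double, kInner> state{};  // zero-initialized
    const std::size_t rows = data.size() / kInner;
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < kInner; ++j) {
            state[j] = 0.5 * state[j] + data[i * kInner + j];
        }
    }
    return state;
}
```

Whether the inner loop is actually vectorized depends on the compiler and its flags. The vectorization reports can help check this, for example `-fopt-info-vec` with GCC, `-Rpass=loop-vectorize` with Clang, or `-Minfo=vect` with NVHPC.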
### Clang (For Grace) Toolchain
......@@ -73,6 +75,7 @@ The basic libraries (BLAS and LAPACK) are included in NVHPC toolchain and can be
### NVIDIA Performance Libraries
The [NVPL](https://developer.nvidia.com/nvpl) package includes a more extensive set of libraries in both sequential and multi-threaded versions:
- BLACS: `-lnvpl_blacs_{lp64,ilp64}_{mpich,openmpi3,openmpi4,openmpi5}`
- BLAS: `-lnvpl_blas_{lp64,ilp64}_{seq,gomp}`
- FFTW: `-lnvpl_fftw`
......@@ -112,6 +115,7 @@ ml OpenMPI
mpic++ -fast -fopenmp hello.cpp -o hello
OMP_PROC_BIND=close OMP_NUM_THREADS=4 mpirun -np 4 --map-by slot:pe=36 ./hello
```
In this configuration, we run 4 ranks, each bound to one quarter of the cores and running 4 OpenMP threads.
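The arithmetic behind this layout can be sketched as follows, assuming the ranks are placed on consecutive blocks of cores. The helper `partition_cores` and the `CoreRange` struct are hypothetical and only illustrate the numbers; the real placement is decided by `mpirun`.

```cpp
#include <vector>

// Inclusive range of cores [first, last] assigned to one MPI rank.
struct CoreRange {
    int rank;
    int first;
    int last;
};

// Assumes ranks are laid out consecutively, each taking `cores_per_rank`
// cores (the pe=36 value from the mpirun command). The actual placement
// chosen by mpirun may differ.
std::vector<CoreRange> partition_cores(int ranks, int cores_per_rank) {
    std::vector<CoreRange> out;
    out.reserve(ranks);
    for (int r = 0; r < ranks; ++r) {
        out.push_back({r, r * cores_per_rank, (r + 1) * cores_per_rank - 1});
    }
    return out;
}
```

Calling `partition_cores(4, 36)` gives four blocks covering 4 × 36 = 144 cores. With `OMP_NUM_THREADS=4`, each rank runs 4 threads inside its own block of 36 cores.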
## Simple BLAS Application
......