Commit 9ffc5b63 authored by Jan Siwiec

Update file grace.md

parent 591495cb
Pipeline #36629 passed with warnings
@@ -17,6 +17,7 @@ where:
## Available Toolchains
The platform offers three toolchains:
- Standard GCC (as a module `ml GCC`)
- [NVHPC](https://developer.nvidia.com/hpc-sdk) (as a module `ml NVHPC`)
- [Clang for NVIDIA Grace](https://developer.nvidia.com/grace/clang) (installed in `/opt/nvidia/clang`)
@@ -38,6 +39,7 @@ for(int i = 0; i < 1000000; ++i) {
}
}
```
may emit scalar code for the inner loop, so that no vectorization is used at all.
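
To illustrate the kind of loop nest discussed here, the following is a minimal sketch, assuming an outer loop whose iterations depend on each other and a short inner loop of 8 independent iterations. The function name `accumulate_lanes`, the constant `kLanes`, and the arithmetic are hypothetical and not taken from the original example. The `#pragma omp simd` hint is one common way to ask a compiler to vectorize the inner loop (it takes effect with `-fopenmp` or `-fopenmp-simd` and is ignored otherwise); whether it changes the generated code on this platform is not verified here.

```cpp
#include <array>
#include <cassert>

// Hypothetical kernel: iterations are dependent in "i" but independent in "j".
constexpr int kLanes = 8;

std::array<double, kLanes> accumulate_lanes(int steps) {
    assert(steps >= 0);
    std::array<double, kLanes> acc{};  // all lanes start at zero
    for (int i = 0; i < steps; ++i) {
        // Each lane only reads its own value from the previous "i" iteration,
        // so the 8 "j" iterations can run as SIMD lanes.
        #pragma omp simd
        for (int j = 0; j < kLanes; ++j) {
            acc[j] = acc[j] * 0.5 + static_cast<double>(j);
        }
    }
    return acc;
}
```

Inspecting the compiler's vectorization report or the generated assembly for such a kernel is a practical way to check whether the inner loop was vectorized.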
### Clang (For Grace) Toolchain
@@ -73,6 +75,7 @@ The basic libraries (BLAS and LAPACK) are included in NVHPC toolchain and can be
### NVIDIA Performance Libraries
The [NVPL](https://developer.nvidia.com/nvpl) package includes a more extensive set of libraries in both sequential and multi-threaded versions:
- BLACS: `-lnvpl_blacs_{lp64,ilp64}_{mpich,openmpi3,openmpi4,openmpi5}`
- BLAS: `-lnvpl_blas_{lp64,ilp64}_{seq,gomp}`
- FFTW: `-lnvpl_fftw`
@@ -112,6 +115,7 @@ ml OpenMPI
mpic++ -fast -fopenmp hello.cpp -o hello
OMP_PROC_BIND=close OMP_NUM_THREADS=4 mpirun -np 4 --map-by slot:pe=36 ./hello
```
In this configuration, we run 4 ranks, each bound to one quarter of the cores and each running 4 OpenMP threads.
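
The arithmetic behind this layout can be sketched as follows, assuming that `--map-by slot:pe=36` gives each rank a block of 36 consecutive cores, so that 4 ranks cover 4 × 36 = 144 cores. The helper `core_range` is hypothetical and only illustrates the partitioning; the actual core numbering is decided by the MPI runtime.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: first and last core index assigned to a rank,
// assuming ranks receive consecutive blocks of cores_per_rank cores.
std::pair<int, int> core_range(int rank, int cores_per_rank) {
    assert(rank >= 0 && cores_per_rank > 0);
    int first = rank * cores_per_rank;
    int last = first + cores_per_rank - 1;
    return {first, last};
}
```

Under this assumption, `core_range(0, 36)` yields cores 0 to 35 and `core_range(3, 36)` yields cores 108 to 143. With `OMP_NUM_THREADS=4`, only 4 of the 36 cores in each block run OpenMP threads.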
## Simple BLAS Application