Commit 591495cb authored by Jan Siwiec

Update file grace.md

parent ad625d8b
Pipeline #36628 passed with warnings
@@ -40,7 +40,7 @@ for(int i = 0; i < 1000000; ++i) {
```
may emit scalar code for the inner loop, so no vectorization is used at all.
-### Clang (for Grace) Toolchain
+### Clang (For Grace) Toolchain
Clang/LLVM tends to behave similarly, but it can be guided to properly vectorize the inner loop either with the flags `-O3 -ffast-math -march=native -fno-unroll-loops -mllvm -force-vector-width=8` or with pragmas such as `#pragma clang loop vectorize_width(8)` and `#pragma clang loop unroll(disable)`.
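As a rough illustration of the pragma route, the sketch below applies both pragmas to a short inner loop nested in a longer outer loop. The function `scale_add`, its parameters, and the flat row-major layout are hypothetical and not taken from the original example; the `__clang__` guard is an added assumption so that other compilers simply ignore the hints.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: out[i][j] += a * in[i][j] over a flat row-major buffer.
// The inner trip count is small, which is the case where the compiler may
// otherwise fall back to scalar code.
void scale_add(std::vector<float>& out, const std::vector<float>& in,
               float a, std::size_t outer, std::size_t inner)
{
    // Precondition: both buffers hold at least outer * inner elements.
    assert(out.size() >= outer * inner && in.size() >= outer * inner);

    for (std::size_t i = 0; i < outer; ++i) {
#if defined(__clang__)
// Ask Clang for 8-wide vectors and no unrolling of the inner loop.
#pragma clang loop vectorize_width(8)
#pragma clang loop unroll(disable)
#endif
        for (std::size_t j = 0; j < inner; ++j) {
            out[i * inner + j] += a * in[i * inner + j];
        }
    }
}
```

The pragmas are hints, so the optimizer may still decline to vectorize; the generated assembly is the reliable way to check whether vector instructions were actually emitted.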
@@ -257,7 +257,7 @@ OMP_NUM_THREADS=144 OMP_PROC_BIND=spread ./main
!!! note
It may be advantageous to use NVPL libraries instead of the NVHPC ones. For example, the DGEMM BLAS 3 routine from NVPL is almost 30% faster than the NVHPC one.
-### Using Clang (for Grace) Toolchain
+### Using Clang (For Grace) Toolchain
Similarly, the Clang for Grace toolchain with NVPL BLAS can be used to compile the C++ version of the example.