Update file grace.md

Commit 591495cb authored 1 year ago by Jan Siwiec
--- a/docs.it4i/cs/guides/grace.md
+++ b/docs.it4i/cs/guides/grace.md
@@ -40,7 +40,7 @@ for(int i = 0; i < 1000000; ++i) {
 ```
 may emit scalar code for the inner loop leading to no vectorization being used at all.

-### Clang (for Grace) Toolchain
+### Clang (For Grace) Toolchain

 The Clang/LLVM tends to behave similarly, but can be guided to properly vectorize the inner loop with either flags `-O3 -ffast-math -march=native -fno-unroll-loops -mllvm -force-vector-width=8` or pragmas such as `#pragma clang loop vectorize_width(8)` and `#pragma clang loop unroll(disable)`.

@@ -257,7 +257,7 @@ OMP_NUM_THREADS=144 OMP_PROC_BIND=spread ./main
 !!! note
    It may be advantageous to use NVPL libraries instead NVHPC ones. For example DGEMM BLAS 3 routine from NVPL is almost 30% faster than NVHPC one.

-### Using Clang (for Grace) Toolchain
+### Using Clang (For Grace) Toolchain

 Similarly Clang for Grace toolchain with NVPL BLAS can be used to compile C++ version of the example.