From 591495cb97d6e704d74ef92ad1637668f51cdd7e Mon Sep 17 00:00:00 2001
From: Jan Siwiec <jan.siwiec@vsb.cz>
Date: Thu, 15 Feb 2024 08:59:58 +0100
Subject: [PATCH] Update file grace.md

---
 docs.it4i/cs/guides/grace.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs.it4i/cs/guides/grace.md b/docs.it4i/cs/guides/grace.md
index d51729b31..a2fba21d7 100644
--- a/docs.it4i/cs/guides/grace.md
+++ b/docs.it4i/cs/guides/grace.md
@@ -40,7 +40,7 @@ for(int i = 0; i < 1000000; ++i) {
 ```
 may emit scalar code for the inner loop leading to no vectorization being used at all.
 
-### Clang (for Grace) Toolchain
+### Clang (For Grace) Toolchain
 
 The Clang/LLVM tends to behave similarly, but can be guided to properly vectorize the inner loop with either flags `-O3 -ffast-math -march=native -fno-unroll-loops -mllvm -force-vector-width=8` or pragmas such as `#pragma clang loop vectorize_width(8)` and `#pragma clang loop unroll(disable)`.
 
@@ -257,7 +257,7 @@ OMP_NUM_THREADS=144 OMP_PROC_BIND=spread ./main
 !!! note
     It may be advantageous to use NVPL libraries instead NVHPC ones. For example DGEMM BLAS 3 routine from NVPL is almost 30% faster than NVHPC one.
 
-### Using Clang (for Grace) Toolchain
+### Using Clang (For Grace) Toolchain
 
 Similarly Clang for Grace toolchain with NVPL BLAS can be used to compile C++ version of the example.
 
-- 
GitLab