diff --git a/docs.it4i/cs/guides/grace.md b/docs.it4i/cs/guides/grace.md
index d51729b31a5de109b932a8423d6959cfdea1414f..a2fba21d7a2331fa10ef44a2c73274b32e542446 100644
--- a/docs.it4i/cs/guides/grace.md
+++ b/docs.it4i/cs/guides/grace.md
@@ -40,7 +40,7 @@ for(int i = 0; i < 1000000; ++i) {
 ```
 may emit scalar code for the inner loop leading to no vectorization being used at all.
 
-### Clang (for Grace) Toolchain
+### Clang (For Grace) Toolchain
 
 The Clang/LLVM tends to behave similarly, but can be guided to properly vectorize the inner loop with either flags `-O3 -ffast-math -march=native -fno-unroll-loops -mllvm -force-vector-width=8` or pragmas such as `#pragma clang loop vectorize_width(8)` and `#pragma clang loop unroll(disable)`.
 
@@ -257,7 +257,7 @@ OMP_NUM_THREADS=144 OMP_PROC_BIND=spread ./main
 !!! note
     It may be advantageous to use NVPL libraries instead NVHPC ones. For example DGEMM BLAS 3 routine from NVPL is almost 30% faster than NVHPC one.
 
-### Using Clang (for Grace) Toolchain
+### Using Clang (For Grace) Toolchain
 
 Similarly Clang for Grace toolchain with NVPL BLAS can be used to compile C++ version of the example.
 