Commit 5fad5f33 authored by Jan Siwiec

Update file grace.md

parent 9b0b74f3
Pipeline #36632 passed with warnings
@@ -23,7 +23,7 @@ The platform offers three toolchains:
 - [Clang for NVIDIA Grace](https://developer.nvidia.com/grace/clang) (installed in `/opt/nvidia/clang`)
 !!! note
-The NVHPC toolchain showed strong results with minimal amount of tunning necessary in our initial evaluation.
+The NVHPC toolchain showed strong results with minimal amount of tuning necessary in our initial evaluation.
 ### GCC Toolchain
@@ -59,11 +59,11 @@ for(int i = 0; i < 1000000; ++i) {
 ```
 !!! note
-Our basic experiments show that fixed width vectorization (NEON) tends to perform better in the case of short (register-length) loops than SVE. In cases (like above), where specified `vectorize_width` is larger than avaliable vector unit width, Clang will emit multiple NEON instructions (eg. 4 instructions will be emitted to process 8 64-bit operations in 128-bit units of Grace).
+Our basic experiments show that fixed width vectorization (NEON) tends to perform better in the case of short (register-length) loops than SVE. In cases (like above), where specified `vectorize_width` is larger than availiable vector unit width, Clang will emit multiple NEON instructions (eg. 4 instructions will be emitted to process 8 64-bit operations in 128-bit units of Grace).
 ### NVHPC Toolchain
-The NVHPC toolchain handled aforementioned case without any additional tunning. Simple `-O3 -march=native -fast` should be therefore sufficient.
+The NVHPC toolchain handled aforementioned case without any additional tuning. Simple `-O3 -march=native -fast` should be therefore sufficient.
 ## Basic Math Libraries
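The note in the hunk above says that a `vectorize_width` of 8 on 64-bit elements becomes 4 NEON instructions on the 128-bit units of Grace. The C sketch below works through that arithmetic and shows one way such a hint can be attached to a loop. It is an illustration only, not the code from `grace.md`: the function names `vector_ops_per_step` and `scale_add` and the loop body are hypothetical, and the `#pragma clang loop vectorize_width(8)` form is an assumption, since the file's own example is elided from this diff.

```c
#include <assert.h>
#include <stddef.h>

/* Number of fixed-width vector instructions needed to cover one requested
   vectorization step: ceil(vectorize_width * element_bits / register_bits).
   Hypothetical helper, used only to illustrate the arithmetic in the note. */
static unsigned vector_ops_per_step(unsigned vectorize_width,
                                    unsigned element_bits,
                                    unsigned register_bits)
{
    assert(element_bits > 0 && register_bits >= element_bits);
    unsigned total_bits = vectorize_width * element_bits;
    return (total_bits + register_bits - 1) / register_bits;
}

/* Hypothetical kernel: y[i] += a * x[i].
   The pragma asks Clang for 8 elements per vectorized iteration
   (assumed form; other compilers typically ignore it). */
static void scale_add(double *restrict y, const double *restrict x,
                      double a, size_t n)
{
#pragma clang loop vectorize_width(8)
    for (size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Under these assumptions, `vector_ops_per_step(8, 64, 128)` evaluates to 4, which matches the count quoted in the note. Whether Clang actually emits NEON or SVE for `scale_add` depends on the target flags, so the generated assembly is worth checking.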
@@ -84,7 +84,7 @@ The [NVPL](https://developer.nvidia.com/nvpl) package includes more extensive se
 - RAND: `-lnvpl_rand` or `-lnvpl_rand_mt`
 - SPARSE: `-lnvpl_sparse`
-This package should be compatible with all avaliable toolchains and includes CMake module files for easy integration into CMake-based projects. For further documentation see also [NVPL](https://docs.nvidia.com/nvpl).
+This package should be compatible with all availiable toolchains and includes CMake module files for easy integration into CMake-based projects. For further documentation see also [NVPL](https://docs.nvidia.com/nvpl).
 ## Basic Communication Libraries
@@ -242,7 +242,7 @@ end program main
 ### Using NVHPC Toolchain
-The C++ version of the example can be compiled with NVHPC and ran as folows
+The C++ version of the example can be compiled with NVHPC and ran as follows
 ```console
 ml NVHPC
......