Issue #4: NVIDIA Grace CPU Guide
docs.it4i/cs/guides/grace.md
Our basic experiments show that fixed-width vectorization (NEON) tends to perform better than SVE on short (register-length) loops. When the specified `vectorize_width` is larger than the available vector unit width, as in the case above, Clang emits multiple NEON instructions per operation. For example, Grace's 128-bit units hold two 64-bit elements each, so 4 instructions are emitted to process 8 64-bit operations.
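The instruction count quoted above can be reproduced with a small sketch. Both `vector_ops_needed` and `scale_add` are hypothetical names used only for illustration, and the kernel is not taken from our experiments. The `fixed` keyword in `vectorize_width(8, fixed)` assumes a Clang version recent enough to accept the second argument; other compilers will most likely ignore the pragma and just warn about it.

```c
#include <stddef.h>

/* Hypothetical helper: how many vector instructions are needed to cover
 * `width` elements of `elem_bits` bits each on a unit `unit_bits` wide. */
unsigned vector_ops_needed(unsigned width, unsigned elem_bits, unsigned unit_bits)
{
    unsigned lanes = unit_bits / elem_bits;   /* 128 / 64 = 2 lanes on Grace */
    return (width + lanes - 1) / lanes;       /* round up to whole instructions */
}

/* Illustrative kernel (an assumption, not the guide's benchmark):
 * y[i] += a * x[i] over n doubles. */
void scale_add(double *restrict y, const double *restrict x, double a, size_t n)
{
    /* Ask Clang for 8 elements per vector iteration using fixed-width
     * (NEON) code. With 64-bit doubles in 128-bit registers, each
     * operation in the loop body is expected to expand into 4 instructions. */
#pragma clang loop vectorize(enable) vectorize_width(8, fixed)
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

To inspect what Clang actually generates, one option is to compile with something like `clang -O2 -mcpu=neoverse-v2 -Rpass=loop-vectorize -S` and read the vectorization remark and the emitted assembly. The exact flags depend on the installed toolchain.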