see [Lorenz Compiler performance benchmark][a].
## 2. Use BLAS Library
It is important to use the BLAS library that performs well on AMD processors.
To combine the optimizations for general CPU code with the most efficient BLAS routines, we recommend the latest Intel Compiler suite together with Cray's Scientific Library bundle (LibSci). The Intel Compiler suite also includes support for an efficient MPI implementation using the Intel MPI library over the InfiniBand interconnect.
For compilation, as well as at runtime of the compiled code, use:
...
...
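As an illustrative sketch of the compile step, the module and wrapper names below are assumptions (actual names depend on your site's module system and Programming Environment); `-qopenmp` is the Intel compiler's OpenMP flag:

```shell
# Assumed module names -- check `module avail` on your system.
module load intel          # Intel Compiler suite and Intel MPI
module load cray-libsci    # Cray Scientific Library (LibSci): BLAS/LAPACK/ScaLAPACK

# Compile with OpenMP enabled and link the threaded LibSci;
# mpiicx is the Intel MPI wrapper around the icx C compiler.
mpiicx -O3 -qopenmp -o BINARY.x source.c -lsci_intel_mp
```

The `_mp` suffix selects the multi-threaded LibSci variant, so the BLAS calls can use the OpenMP threads of each rank.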
There are usually two standard situations for compiling and running the code
### OpenMP Without MPI
To compile the code against LibSci without MPI, but still enabling OpenMP to run over multiple cores, use:
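A minimal sketch of such a compile line, assuming the Intel `icx` compiler and the threaded LibSci library (library and flag names are assumptions, not site-verified):

```shell
# No MPI wrapper: plain Intel compiler, OpenMP enabled,
# linked against the threaded (_mp) LibSci for multi-core BLAS.
icx -O3 -qopenmp -o BINARY.x source.c -lsci_intel_mp
```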
This example runs BINARY.x, placed in ${HOME}, as 2 MPI processes, each using the 64 cores of a single socket of a single node.
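A Slurm launch matching that description could look like the following sketch (the exact binding options are assumptions and may need tuning for your scheduler configuration):

```shell
# 2 MPI ranks on one node, each rank pinned to one 64-core socket,
# with 64 OpenMP threads per rank for the threaded BLAS.
export OMP_NUM_THREADS=64
srun --nodes=1 --ntasks=2 --cpus-per-task=64 ${HOME}/BINARY.x
```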
Another example would be to run a job on 2 full nodes, utilizing 128 cores on each (256 cores in total), letting LibSci efficiently place the BLAS routines across the allocated CPU sockets:
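One possible Slurm invocation for this case, assuming one rank per node with the threaded LibSci spanning all 128 cores (an assumption; the job could equally be laid out as one rank per socket):

```shell
# 2 full nodes, one rank each, 128 OpenMP threads per rank;
# the threaded LibSci distributes BLAS work across both sockets.
export OMP_NUM_THREADS=128
srun --nodes=2 --ntasks-per-node=1 --cpus-per-task=128 ${HOME}/BINARY.x
```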