diff --git a/docs.it4i/software/karolina-compilation.md b/docs.it4i/software/karolina-compilation.md
index ec47f7ee7ba57996deb4b8b4f0b1fa1d84ea56e4..27d4ab77c8aae21c79a58be2ea7cda2e812857f6 100644
--- a/docs.it4i/software/karolina-compilation.md
+++ b/docs.it4i/software/karolina-compilation.md
@@ -24,7 +24,7 @@ see [Lorenz Compiler performance benchmark][a].
 ## 2. Use BLAS Library
 
 It is important to use the BLAS library that performs well on AMD processors.
-To combine the optimizations for the general CPU code and have the most efficient BLAS routines we recommend the combination of lastest Intel Compiler suite, with Cray's Scientific Library bundle (LIBSCI). When using the Intel Compiler suite includes also support for efficient MPI implementation utilizing Intel MPI library over the Infiniband interconnect.
+To combine the optimizations for general CPU code with the most efficient BLAS routines, we recommend using the latest Intel Compiler suite together with Cray's Scientific Library bundle (LibSci). The Intel Compiler suite also includes support for an efficient MPI implementation utilizing the Intel MPI library over the InfiniBand interconnect.
 
 For the compilation as well for the runtime of compiled code use:
 
@@ -39,10 +39,10 @@ There are usually two standard situation how to compile and run the code
 
 ### OpenMP Without MPI
 
-To compile the code against the LIBSCI, without MPI, but still enabling OpenMP run over multiple cores use:
+To compile the code against LibSci without MPI, but still enabling OpenMP execution over multiple cores, use:
 
 ```code
-icx -qopenmp -L$CRAY_LIBSCI_PREFIX_DIR/lib -I$CRAY_LIBSCI_PREFIX_DIR/include -o BINARY.x SURCE_CODE.c -lsci_intel_mp
+icx -qopenmp -L$CRAY_LIBSCI_PREFIX_DIR/lib -I$CRAY_LIBSCI_PREFIX_DIR/include -o BINARY.x SOURCE_CODE.c -lsci_intel_mp
 ```
 
 To run the resulting binary use:
@@ -55,10 +55,10 @@ This enables effective run over all 128 cores available on a single Karlina comp
 
 ### OpenMP With MPI
 
-To compile the code against the LIBSCI, with MPI, use:
+To compile the code against LibSci with MPI, use:
 
 ```code
-mpiicx -qopenmp -L$CRAY_LIBSCI_PREFIX_DIR/lib -I$CRAY_LIBSCI_PREFIX_DIR/include -o BINARY.x SURCE_CODE.c -lsci_intel_mp -lsci_intel_mpi_mp
+mpiicx -qopenmp -L$CRAY_LIBSCI_PREFIX_DIR/lib -I$CRAY_LIBSCI_PREFIX_DIR/include -o BINARY.x SOURCE_CODE.c -lsci_intel_mp -lsci_intel_mpi_mp
 ```
 
 To run the resulting binary use:
@@ -69,7 +69,7 @@ OMP_NUM_THREADS=64 OMP_PROC_BIND=true mpirun -n 2 ${HOME}/BINARY.x
 
 This example runs the BINARY.x, placed in ${HOME} as 2 MPI processes, each using 64 cores of a single socket of a single node.
 
-Another example would be to run a job on 2 full nodes, utilizing 128 cores on each (so 256 cores in total) and letting the LIBSCI efficiently placing the BLAS routines across the allocated CPU sockets:
+Another example would be to run a job on 2 full nodes, utilizing 128 cores on each (256 cores in total), letting LibSci efficiently place the BLAS routines across the allocated CPU sockets:
 
 ```code
 OMP_NUM_THREADS=128 OMP_PROC_BIND=true mpirun -n 2 ${HOME}/BINARY.x
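
The `SOURCE_CODE.c` referenced in the compile commands above is only a placeholder; the page does not ship an example source. A minimal sketch of what such a source could look like is shown below, assuming a C program that fills two matrices with OpenMP threads and then calls the standard Fortran-style BLAS symbol `dgemm_` supplied by LibSci; the file name, matrix order, and fill values are illustrative only.

```code
/* Illustrative SOURCE_CODE.c (hypothetical example, not part of the docs):
 * fills two matrices in parallel with OpenMP and multiplies them with the
 * DGEMM routine provided by the LibSci BLAS (-lsci_intel_mp). */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

/* Standard Fortran-interface BLAS symbol exported by LibSci. */
void dgemm_(const char *transa, const char *transb, const int *m, const int *n,
            const int *k, const double *alpha, const double *a, const int *lda,
            const double *b, const int *ldb, const double *beta, double *c,
            const int *ldc);

int main(void) {
    const int n = 2048;                    /* example matrix order */
    const double alpha = 1.0, beta = 0.0;
    double *a = malloc((size_t)n * n * sizeof *a);
    double *b = malloc((size_t)n * n * sizeof *b);
    double *c = malloc((size_t)n * n * sizeof *c);

    /* Fill A and B in parallel; OMP_NUM_THREADS controls the thread count. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n * n; i++) {
        a[i] = (double)(i % 7);
        b[i] = (double)(i % 13);
    }

    /* C = alpha * A * B + beta * C, computed by the multi-threaded DGEMM. */
    dgemm_("N", "N", &n, &n, &n, &alpha, a, &n, b, &n, &beta, c, &n);

    printf("threads: %d, C[0] = %f\n", omp_get_max_threads(), c[0]);
    free(a); free(b); free(c);
    return 0;
}
```

Built with the `icx` command shown in the diff, the resulting `BINARY.x` can then be launched exactly as documented, e.g. `OMP_NUM_THREADS=128 OMP_PROC_BIND=true ${HOME}/BINARY.x` on a single node.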