Typos and correction of LibSCI naming.
@@ -24,7 +24,7 @@ see [Lorenz Compiler performance benchmark][a].
-To combine the optimizations for the general CPU code and have the most efficient BLAS routines we recommend the combination of lastest Intel Compiler suite, with Cray's Scientific Library bundle (LIBSCI). When using the Intel Compiler suite includes also support for efficient MPI implementation utilizing Intel MPI library over the Infiniband interconnect.
+To combine the optimizations for the general CPU code and have the most efficient BLAS routines we recommend the combination of lastest Intel Compiler suite, with Cray's Scientific Library bundle (LibSci). When using the Intel Compiler suite includes also support for efficient MPI implementation utilizing Intel MPI library over the Infiniband interconnect.
@@ -39,10 +39,10 @@ There are usually two standard situation how to compile and run the code
To compile the code against the LIBSCI, without MPI, but still enabling OpenMP run over multiple cores use:
icx -qopenmp -L$CRAY_LIBSCI_PREFIX_DIR/lib -I$CRAY_LIBSCI_PREFIX_DIR/include -o BINARY.x SURCE_CODE.c -lsci_intel_mp
@@ -55,10 +55,10 @@ This enables effective run over all 128 cores available on a single Karlina comp
mpiicx -qopenmp -L$CRAY_LIBSCI_PREFIX_DIR/lib -I$CRAY_LIBSCI_PREFIX_DIR/include -o BINARY.x SURCE_CODE.c -lsci_intel_mp -lsci_intel_mpi_mp
@@ -69,7 +69,7 @@ OMP_NUM_THREADS=64 OMP_PROC_BIND=true mpirun -n 2 ${HOME}/BINARY.x
This example runs the BINARY.x, placed in ${HOME} as 2 MPI processes, each using 64 cores of a single socket of a single node.
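The thread count in that command follows from dividing the cores of a node by the MPI processes placed on it. A minimal sketch of the arithmetic, assuming 128 cores per node and one MPI process per socket as described above; the helper name `threads_per_rank` and the two variables are hypothetical, and the script only prints the resulting command rather than launching it:

```shell
#!/bin/sh
# Assumed layout, taken from the surrounding text: 128 cores per node,
# 2 MPI processes per node (one per socket).
CORES_PER_NODE=128
RANKS_PER_NODE=2

# Hypothetical helper: OpenMP threads per MPI process = cores / processes.
threads_per_rank() {
    echo $(( $1 / $2 ))
}

OMP_NUM_THREADS=$(threads_per_rank "$CORES_PER_NODE" "$RANKS_PER_NODE")

# Print the launch line instead of running it, so the sketch has no side effects.
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS} OMP_PROC_BIND=true mpirun -n ${RANKS_PER_NODE} \${HOME}/BINARY.x"
```

With the values above the printed line matches the command shown in the hunk header; changing `RANKS_PER_NODE` shows how the per-process thread count would have to change to keep all cores occupied.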
Another example would be to run a job on 2 full nodes, utilizing 128 cores on each (so 256 cores in total) and letting the LIBSCI efficiently placing the BLAS routines across the allocated CPU sockets: