diff --git a/docs.it4i/software/karolina-compilation.md b/docs.it4i/software/karolina-compilation.md
index a0e222347722eb67bc5c7b49251ba29a31a86467..d278240c96afb41005c07894781a3fc9064fbd5d 100644
--- a/docs.it4i/software/karolina-compilation.md
+++ b/docs.it4i/software/karolina-compilation.md
@@ -34,47 +34,58 @@ ml cray-pmi/6.1.14
 export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CRAY_LD_LIBRARY_PATH:$CRAY_LIBSCI_PREFIX_DIR/lib:/opt/cray/pals/1.3.2/lib
 ```
+
-There are usually two standard situation how to compile and run the code
+There are usually two standard situations for compiling and running the code:
 
 ### OpenMP without MPI
 
-To compile the code against the LIBSCI, without MPI, but still enabling OpenMP run over multiple cores use:
+To compile the code against the LIBSCI without MPI, but with OpenMP enabled to run over multiple cores, use:
+
 ```code
-icx -qopenmp -L$CRAY_LIBSCI_PREFIX_DIR/lib -I$CRAY_LIBSCI_PREFIX_DIR/include -o BINARY.x SURCE_CODE.c -lsci_intel_mp
+icx -qopenmp -L$CRAY_LIBSCI_PREFIX_DIR/lib -I$CRAY_LIBSCI_PREFIX_DIR/include -o BINARY.x SOURCE_CODE.c -lsci_intel_mp
 ```
 
 To run the resulting binary use:
+
 ```code
 OMP_NUM_THREADS=128 OMP_PROC_BIND=true BINARY.x
 ```
+
-This enables effective run over all 128 cores available on a single Karlina compute node.
+This enables an effective run over all 128 cores available on a single Karolina compute node.
 
 ### OpenMP with MPI
+
 To compile the code against the LIBSCI, with MPI, use:
+
 ```code
-mpiicx -qopenmp -L$CRAY_LIBSCI_PREFIX_DIR/lib -I$CRAY_LIBSCI_PREFIX_DIR/include -o BINARY.x SURCE_CODE.c -lsci_intel_mp -lsci_intel_mpi_mp
+mpiicx -qopenmp -L$CRAY_LIBSCI_PREFIX_DIR/lib -I$CRAY_LIBSCI_PREFIX_DIR/include -o BINARY.x SOURCE_CODE.c -lsci_intel_mp -lsci_intel_mpi_mp
 ```
 
 To run the resulting binary use:
+
 ```code
 OMP_NUM_THREADS=64 OMP_PROC_BIND=true mpirun -n 2 ${HOME}/BINARY.x
 ```
+
-This example runs the BINARY.x, placed in ${HOME} as 2 MPI processes, each using 64 cores of a single socket of a single node.
+This example runs BINARY.x, placed in ${HOME}, as 2 MPI processes, each using the 64 cores of a single socket of a single node.
 
-Another example would be to run a job on 2 full nodes, utilizing 128 cores on each (so 256 cores in total) and letting the LIBSCI efficiently placing the BLAS routines across the allocated CPU sockets:
+Another example would be to run a job on 2 full nodes, utilizing 128 cores on each (256 cores in total) and letting the LIBSCI efficiently place the BLAS routines across the allocated CPU sockets:
+
 ```code
 OMP_NUM_THREADS=128 OMP_PROC_BIND=true mpirun -n 2 ${HOME}/BINARY.x
 ```
+
-This assumes you have allocated 2 full nodes on Karolina using SLURM's directives, e. g. in a submission script:
+This assumes you have allocated 2 full nodes on Karolina using SLURM directives, e.g. in a submission script:
+
 ```code
 #SBATCH --nodes 2
 #SBATCH --ntasks-per-node 128
 ```
 
-**Don't forget** before the run to ensure you have the correct modules and loaded and that you have set up the LD_LIBRARY_PATH environment variable set as shown above (e.g. part of your submission script for SLURM).
+**Don't forget** to ensure before the run that you have the correct modules loaded and the LD_LIBRARY_PATH environment variable set as shown above (e.g. as part of your submission script for SLURM).
 
-!!! note
-Most MPI libraries do the binding automatically. The binding of MPI ranks can be inspected for any MPI by running `$ mpirun -n num_of_ranks numactl --show`. However, if the ranks spawn threads, binding of these threads should be done via the environment variables described above.
+!!! note
+    Most MPI libraries do the binding automatically. The binding of MPI ranks can be inspected for any MPI by running `$ mpirun -n num_of_ranks numactl --show`. However, if the ranks spawn threads, the binding of these threads should be done via the environment variables described above.
 
-The choice of BLAS library and its performance may be verified with our benchmark,
+The choice of BLAS library and its performance may be verified with our benchmark,
 see [Lorenz BLAS performance benchmark](https://code.it4i.cz/jansik/lorenz/-/blob/main/README.md).
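+
+For convenience, the pieces above can be combined into a single submission script. The following is a minimal sketch only: the job name, project account, and partition are placeholders, and the module selection must match the modules loaded for compilation above.
+
+```code
+#!/bin/bash
+# Placeholder job name, project account, and partition; adjust these to your allocation.
+#SBATCH --job-name BINARY
+#SBATCH --account PROJECT-ID
+#SBATCH --partition qcpu
+#SBATCH --nodes 2
+#SBATCH --ntasks-per-node 128
+
+# Load the same modules used for the compilation (see above), e.g.:
+ml cray-pmi/6.1.14
+
+# Make the Cray runtime and LibSci libraries visible at run time:
+export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CRAY_LD_LIBRARY_PATH:$CRAY_LIBSCI_PREFIX_DIR/lib:/opt/cray/pals/1.3.2/lib
+
+# 2 MPI ranks (one per node), each running 128 OpenMP threads:
+OMP_NUM_THREADS=128 OMP_PROC_BIND=true mpirun -n 2 ${HOME}/BINARY.x
+```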