```code
ml cray-pmi/6.1.14
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CRAY_LD_LIBRARY_PATH:$CRAY_LIBSCI_PREFIX_DIR/lib:/opt/cray/pals/1.3.2/lib
```
There are usually two standard situations for compiling and running the code:

### OpenMP without MPI

To compile the code against LIBSCI without MPI, but still enabling an OpenMP run over multiple cores, use:

```code
icx -qopenmp -L$CRAY_LIBSCI_PREFIX_DIR/lib -I$CRAY_LIBSCI_PREFIX_DIR/include -o BINARY.x SOURCE_CODE.c -lsci_intel_mp
```

To run the resulting binary, use:

```code
OMP_NUM_THREADS=128 OMP_PROC_BIND=true BINARY.x
```

This enables an effective run over all 128 cores available on a single Karolina compute node.
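
For illustration, a minimal SOURCE_CODE.c matching the compile line above could look like the following sketch. It calls the standard `dgemm_` BLAS routine provided by LIBSCI; the prototype declaration, matrix size, and output are illustrative assumptions added here, not part of the official example.

```code
/* Sketch only: an OpenMP-threaded BLAS call against LIBSCI.
 * The dgemm_ prototype follows the standard Fortran BLAS convention;
 * names and sizes are illustrative. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

/* Standard Fortran BLAS symbol exported by LIBSCI (and other BLAS libraries). */
void dgemm_(const char *transa, const char *transb, const int *m, const int *n,
            const int *k, const double *alpha, const double *a, const int *lda,
            const double *b, const int *ldb, const double *beta, double *c,
            const int *ldc);

int main(void) {
    const int n = 2048;                      /* illustrative matrix size */
    double *a = malloc((size_t)n * n * sizeof *a);
    double *b = malloc((size_t)n * n * sizeof *b);
    double *c = malloc((size_t)n * n * sizeof *c);
    if (!a || !b || !c) return 1;

    /* Fill the inputs in parallel; the threaded LIBSCI (-lsci_intel_mp)
     * then runs dgemm_ with the same OMP_NUM_THREADS threads. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n * n; i++) {
        a[i] = 1.0;
        b[i] = 2.0;
        c[i] = 0.0;
    }

    const double alpha = 1.0, beta = 0.0;
    dgemm_("N", "N", &n, &n, &n, &alpha, a, &n, b, &n, &beta, c, &n);

    printf("threads: %d, c[0] = %f\n", omp_get_max_threads(), c[0]);
    free(a); free(b); free(c);
    return 0;
}
```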
### OpenMP with MPI

To compile the code against LIBSCI with MPI, use:

```code
mpiicx -qopenmp -L$CRAY_LIBSCI_PREFIX_DIR/lib -I$CRAY_LIBSCI_PREFIX_DIR/include -o BINARY.x SOURCE_CODE.c -lsci_intel_mp -lsci_intel_mpi_mp
```

To run the resulting binary, use:

```code
OMP_NUM_THREADS=64 OMP_PROC_BIND=true mpirun -n 2 ${HOME}/BINARY.x
```

This example runs BINARY.x, placed in ${HOME}, as 2 MPI processes, each using the 64 cores of a single socket of a single node.
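
For illustration, a hybrid MPI + OpenMP source built by the `mpiicx` line above might be structured like the sketch below, where each rank performs its own threaded `dgemm_` call; the names, sizes, and prototype are assumptions added here, not taken from the original example.

```code
/* Sketch only: a hybrid MPI + OpenMP program linked against LIBSCI.
 * Each MPI rank runs one threaded dgemm_ using its own OpenMP threads. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <omp.h>

void dgemm_(const char *transa, const char *transb, const int *m, const int *n,
            const int *k, const double *alpha, const double *a, const int *lda,
            const double *b, const int *ldb, const double *beta, double *c,
            const int *ldc);

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 2048;                      /* illustrative matrix size */
    double *a = calloc((size_t)n * n, sizeof *a);
    double *b = calloc((size_t)n * n, sizeof *b);
    double *c = calloc((size_t)n * n, sizeof *c);
    if (!a || !b || !c) MPI_Abort(MPI_COMM_WORLD, 1);

    const double alpha = 1.0, beta = 0.0;
    /* The threaded LIBSCI uses OMP_NUM_THREADS threads inside this call,
     * i.e. 64 threads per rank in the two-socket example above. */
    dgemm_("N", "N", &n, &n, &n, &alpha, a, &n, b, &n, &beta, c, &n);

    printf("rank %d of %d finished dgemm_ with %d OpenMP threads\n",
           rank, size, omp_get_max_threads());

    free(a); free(b); free(c);
    MPI_Finalize();
    return 0;
}
```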
Another example would be to run a job on 2 full nodes, utilizing 128 cores on each (256 cores in total), and letting LIBSCI efficiently place the BLAS routines across the allocated CPU sockets:

```code
OMP_NUM_THREADS=128 OMP_PROC_BIND=true mpirun -n 2 ${HOME}/BINARY.x
```

This assumes you have allocated 2 full nodes on Karolina using SLURM directives, e.g. in a submission script:

```code
#SBATCH --nodes 2
#SBATCH --ntasks-per-node 128
```

**Don't forget** to ensure before the run that you have the correct modules loaded and the LD_LIBRARY_PATH environment variable set as shown above (e.g. as part of your SLURM submission script).
!!! note
    Most MPI libraries do the binding automatically. The binding of MPI ranks can be inspected for any MPI by running `$ mpirun -n num_of_ranks numactl --show`. However, if the ranks spawn threads, binding of these threads should be done via the environment variables described above.
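
As an illustrative complement to `numactl --show`, thread placement can also be checked from inside the code. The sketch below is an assumption added here, not part of the original page; it relies on glibc's `sched_getcpu()` and can be built with the `mpiicx` line above.

```code
/* Sketch only: print the CPU each OpenMP thread of each MPI rank runs on.
 * sched_getcpu() is a glibc extension, hence _GNU_SOURCE. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* With OMP_PROC_BIND=true each thread should report a stable CPU id. */
    #pragma omp parallel
    {
        printf("rank %d, thread %d -> cpu %d\n",
               rank, omp_get_thread_num(), sched_getcpu());
    }

    MPI_Finalize();
    return 0;
}
```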
The choice of BLAS library and its performance may be verified with our benchmark;
see [Lorenz BLAS performance benchmark](https://code.it4i.cz/jansik/lorenz/-/blob/main/README.md).