```code
ml cray-pmi/6.1.14
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CRAY_LD_LIBRARY_PATH:$CRAY_LIBSCI_PREFIX_DIR/lib:/opt/cray/pals/1.3.2/lib
```
There are usually two standard situations for compiling and running the code:

### OpenMP without MPI

To compile the code against LIBSCI without MPI, but still enabling an OpenMP run over multiple cores, use:

```code
icx -qopenmp -L$CRAY_LIBSCI_PREFIX_DIR/lib -I$CRAY_LIBSCI_PREFIX_DIR/include -o BINARY.x SOURCE_CODE.c -lsci_intel_mp
```

To run the resulting binary, use:

```code
OMP_NUM_THREADS=128 OMP_PROC_BIND=true BINARY.x
```

This enables an effective run over all 128 cores available on a single Karolina compute node.
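
For illustration, a minimal SOURCE_CODE.c matching the compile line above could look like the following sketch. It calls the standard `dgemm_` BLAS routine provided by LIBSCI; the prototype declaration, matrix size, and output are illustrative assumptions added here, not part of the official example.

```code
/* Sketch only: an OpenMP-threaded BLAS call against LIBSCI.
 * The dgemm_ prototype follows the standard Fortran BLAS convention;
 * names and sizes are illustrative. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

/* Standard Fortran BLAS symbol exported by LIBSCI (and other BLAS libraries). */
void dgemm_(const char *transa, const char *transb, const int *m, const int *n,
            const int *k, const double *alpha, const double *a, const int *lda,
            const double *b, const int *ldb, const double *beta, double *c,
            const int *ldc);

int main(void) {
    const int n = 2048;                      /* illustrative matrix size */
    double *a = malloc((size_t)n * n * sizeof *a);
    double *b = malloc((size_t)n * n * sizeof *b);
    double *c = malloc((size_t)n * n * sizeof *c);
    if (!a || !b || !c) return 1;

    /* Fill the inputs in parallel; the threaded LIBSCI (-lsci_intel_mp)
     * then runs dgemm_ with the same OMP_NUM_THREADS threads. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n * n; i++) {
        a[i] = 1.0;
        b[i] = 2.0;
        c[i] = 0.0;
    }

    const double alpha = 1.0, beta = 0.0;
    dgemm_("N", "N", &n, &n, &n, &alpha, a, &n, b, &n, &beta, c, &n);

    printf("threads: %d, c[0] = %f\n", omp_get_max_threads(), c[0]);
    free(a); free(b); free(c);
    return 0;
}
```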
### OpenMP with MPI

To compile the code against LIBSCI with MPI, use:

```code
mpiicx -qopenmp -L$CRAY_LIBSCI_PREFIX_DIR/lib -I$CRAY_LIBSCI_PREFIX_DIR/include -o BINARY.x SOURCE_CODE.c -lsci_intel_mp -lsci_intel_mpi_mp
```

To run the resulting binary, use:

```code
OMP_NUM_THREADS=64 OMP_PROC_BIND=true mpirun -n 2 ${HOME}/BINARY.x
```

This example runs BINARY.x, placed in ${HOME}, as 2 MPI processes, each using the 64 cores of a single socket of a single node.
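
For illustration, a hybrid MPI + OpenMP source built by the `mpiicx` line above might be structured like the sketch below, where each rank performs its own threaded `dgemm_` call; the names, sizes, and prototype are assumptions added here, not taken from the original example.

```code
/* Sketch only: a hybrid MPI + OpenMP program linked against LIBSCI.
 * Each MPI rank runs one threaded dgemm_ using its own OpenMP threads. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <omp.h>

void dgemm_(const char *transa, const char *transb, const int *m, const int *n,
            const int *k, const double *alpha, const double *a, const int *lda,
            const double *b, const int *ldb, const double *beta, double *c,
            const int *ldc);

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 2048;                      /* illustrative matrix size */
    double *a = calloc((size_t)n * n, sizeof *a);
    double *b = calloc((size_t)n * n, sizeof *b);
    double *c = calloc((size_t)n * n, sizeof *c);
    if (!a || !b || !c) MPI_Abort(MPI_COMM_WORLD, 1);

    const double alpha = 1.0, beta = 0.0;
    /* The threaded LIBSCI uses OMP_NUM_THREADS threads inside this call,
     * i.e. 64 threads per rank in the two-socket example above. */
    dgemm_("N", "N", &n, &n, &n, &alpha, a, &n, b, &n, &beta, c, &n);

    printf("rank %d of %d finished dgemm_ with %d OpenMP threads\n",
           rank, size, omp_get_max_threads());

    free(a); free(b); free(c);
    MPI_Finalize();
    return 0;
}
```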
Another example would be to run a job on 2 full nodes, utilizing 128 cores on each (256 cores in total), and letting LIBSCI efficiently place the BLAS routines across the allocated CPU sockets:

```code
OMP_NUM_THREADS=128 OMP_PROC_BIND=true mpirun -n 2 ${HOME}/BINARY.x
```

This assumes you have allocated 2 full nodes on Karolina using SLURM directives, e.g. in a submission script:

```code
#SBATCH --nodes 2
#SBATCH --ntasks-per-node 128
```

**Don't forget** to ensure before the run that you have the correct modules loaded and the LD_LIBRARY_PATH environment variable set as shown above (e.g. as part of your SLURM submission script).
!!! note
    Most MPI libraries do the binding automatically. The binding of MPI ranks can be inspected for any MPI by running `$ mpirun -n num_of_ranks numactl --show`. However, if the ranks spawn threads, binding of these threads should be done via the environment variables described above.
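
As an illustrative complement to `numactl --show`, thread placement can also be checked from inside the code. The sketch below is an assumption added here, not part of the original page; it relies on glibc's `sched_getcpu()` and can be built with the `mpiicx` line above.

```code
/* Sketch only: print the CPU each OpenMP thread of each MPI rank runs on.
 * sched_getcpu() is a glibc extension, hence _GNU_SOURCE. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* With OMP_PROC_BIND=true each thread should report a stable CPU id. */
    #pragma omp parallel
    {
        printf("rank %d, thread %d -> cpu %d\n",
               rank, omp_get_thread_num(), sched_getcpu());
    }

    MPI_Finalize();
    return 0;
}
```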
The choice of BLAS library and its performance may be verified with our benchmark;
see [Lorenz BLAS performance benchmark](https://code.it4i.cz/jansik/lorenz/-/blob/main/README.md).