Commit f4669cd6 authored by Lukáš Krupčík

Update docs.it4i/software/machine-learning/tensorflow.md, docs.it4i/software/mpi/mpi.md files

parent b8c720d8
@@ -7,6 +7,7 @@ For searching available modules type:
```console
$ ml av Tensorflow
```
<!---
## Salomon Modules

Salomon provides (among others) these TensorFlow modules:
@@ -22,6 +23,7 @@ Salomon provides (besides other) these TensorFlow modules:
* GCC/7.1.0-2.28
* Python/3.6.1
* protobuf/3.2.0-GCC-7.1.0-2.28-Python-3.6.1
-->
## TensorFlow Application Example
...

docs.it4i/software/mpi/mpi.md
@@ -2,14 +2,11 @@
## Setting Up MPI Environment
The Karolina cluster provides several implementations of the MPI library:

* OpenMPI
* Intel MPI (impi)
* MPICH

MPI libraries are activated via the environment modules.
@@ -17,56 +14,94 @@ Look up the modulefiles/mpi section in `ml av`:
```console
$ ml av

------------------------------------------------------- /apps/modules/mpi -------------------------------------------------------
OpenMPI/3.1.4-GCC-6.3.0-2.27              OpenMPI/4.1.1-GCC-10.2.0
OpenMPI/4.0.3-GCC-9.3.0                   OpenMPI/4.1.1-GCC-10.3.0                               (D)
OpenMPI/4.0.5-GCC-10.2.0                  impi/2017.4.239-iccifort-2017.8.262-GCC-6.3.0-2.27
OpenMPI/4.0.5-gcccuda-2020b               impi/2018.4.274-iccifort-2018.5.274-GCC-8.3.0-2.32
OpenMPI/4.0.5-iccifort-2020.4.304         impi/2018.4.274-iccifort-2019.1.144-GCC-8.2.0-2.31.1
OpenMPI/4.0.5-NVHPC-21.2-CUDA-11.2.2      impi/2019.9.304-iccifort-2020.1.217
OpenMPI/4.0.5-NVHPC-21.2-CUDA-11.3.0      impi/2019.9.304-iccifort-2020.4.304
OpenMPI/4.1.1-GCC-10.2.0-Java-1.8.0_221   impi/2021.2.0-intel-compilers-2021.2.0                 (D)
MPICH/3.3.2-GCC-10.2.0
```
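To activate a particular MPI implementation, load the corresponding module; for example, to use the default OpenMPI build from the listing above:

```console
$ ml OpenMPI/4.1.1-GCC-10.3.0
```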
There are default compilers associated with any particular MPI implementation. The defaults may be changed; the MPI libraries may be used in conjunction with any compiler.

Examples:
```console
$ ml gompi/2020b
```
In this example, we activate OpenMPI with the GNU compilers (OpenMPI 4.0.5 and GCC 10.2.0). For more information about toolchains, see the [Environment and Modules][1] section.
To use OpenMPI with the Intel compiler suite, use:
```console
$ ml iompi/2020b
```
In this example, OpenMPI 4.0.5 with the Intel compilers 2020.4.304 is activated. It uses the `iompi` toolchain.
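To verify which implementation and compiler the loaded toolchain provides, you can query the launcher and the wrapper directly (a quick check; the printed version strings will vary with the loaded modules):

```console
$ mpirun --version   # reports the active OpenMPI version
$ mpicc --version    # reports the compiler invoked by the wrapper
```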
## Compiling MPI Programs
After setting up your MPI environment, compile your program using one of the MPI wrappers:
For the `gompi/2020b` module:
```console
$ mpicc -v
Using built-in specs.
COLLECT_GCC=/apps/all/GCCcore/10.2.0/bin/gcc
COLLECT_LTO_WRAPPER=/apps/all/GCCcore/10.2.0/libexec/gcc/x86_64-pc-linux-gnu/10.2.0/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++,fortran --without-cuda-driver --enable-offload-targets=nvptx-none --enable-lto --enable-checking=release --disable-multilib --enable-shared=yes --enable-static=yes --enable-threads=posix --enable-plugins --enable-gold=default --enable-ld --with-plugin-ld=ld.gold --prefix=/apps/all/GCCcore/10.2.0 --with-local-prefix=/apps/all/GCCcore/10.2.0 --enable-bootstrap --with-isl=/dev/shm/easybuild/build/GCCcore/10.2.0/system-system/gcc-10.2.0/stage2_stuff
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.2.0 (GCC)
$ mpif77 -v
Using built-in specs.
COLLECT_GCC=/apps/all/GCCcore/10.2.0/bin/gfortran
COLLECT_LTO_WRAPPER=/apps/all/GCCcore/10.2.0/libexec/gcc/x86_64-pc-linux-gnu/10.2.0/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++,fortran --without-cuda-driver --enable-offload-targets=nvptx-none --enable-lto --enable-checking=release --disable-multilib --enable-shared=yes --enable-static=yes --enable-threads=posix --enable-plugins --enable-gold=default --enable-ld --with-plugin-ld=ld.gold --prefix=/apps/all/GCCcore/10.2.0 --with-local-prefix=/apps/all/GCCcore/10.2.0 --enable-bootstrap --with-isl=/dev/shm/easybuild/build/GCCcore/10.2.0/system-system/gcc-10.2.0/stage2_stuff
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.2.0 (GCC)
$ mpif90 -v
Using built-in specs.
COLLECT_GCC=/apps/all/GCCcore/10.2.0/bin/gfortran
COLLECT_LTO_WRAPPER=/apps/all/GCCcore/10.2.0/libexec/gcc/x86_64-pc-linux-gnu/10.2.0/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++,fortran --without-cuda-driver --enable-offload-targets=nvptx-none --enable-lto --enable-checking=release --disable-multilib --enable-shared=yes --enable-static=yes --enable-threads=posix --enable-plugins --enable-gold=default --enable-ld --with-plugin-ld=ld.gold --prefix=/apps/all/GCCcore/10.2.0 --with-local-prefix=/apps/all/GCCcore/10.2.0 --enable-bootstrap --with-isl=/dev/shm/easybuild/build/GCCcore/10.2.0/system-system/gcc-10.2.0/stage2_stuff
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.2.0 (GCC)
```
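In practice, you compile by passing normal compiler flags to the wrapper; for example (file names are illustrative):

```console
$ mpicc -O2 -o hello_c hello.c
$ mpif90 -O2 -o hello_f hello.f90
```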
When using Intel MPI, use the following MPI wrappers:
For the `intel/2020b` module:
```console
$ mpiicc -v
mpiicc for the Intel(R) MPI Library 2019 Update 9 for Linux*
Copyright 2003-2020, Intel Corporation.
icc version 19.1.3.304 (gcc version 10.2.0 compatibility)
ld /lib/../lib64/crt1.o /lib/../lib64/crti.o /apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/crtbegin.o --eh-frame-hdr --build-id -dynamic-linker /lib64/ld-linux-x86-64.so.2 -m elf_x86_64 -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib/release -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib -o a.out -L/apps/all/imkl/2020.4.304-iimpi-2020b/mkl/lib/intel64 -L/apps/all/imkl/2020.4.304-iimpi-2020b/lib/intel64 -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/libfabric/lib -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib/release -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib -L/apps/all/UCX/1.9.0-GCCcore-10.2.0/lib -L/apps/all/numactl/2.0.13-GCCcore-10.2.0/lib -L/apps/all/iccifort/2020.4.304/compilers_and_libraries_2020.4.304/linux/tbb/lib/intel64/gcc4.8 -L/apps/all/binutils/2.35-GCCcore-10.2.0/lib -L/apps/all/zlib/1.2.11-GCCcore-10.2.0/lib -L/apps/all/iccifort/2020.4.304/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin -L/apps/all/UCX/1.9.0-GCCcore-10.2.0/lib/../lib64 -L/apps/all/UCX/1.9.0-GCCcore-10.2.0/lib/../lib64/ -L/apps/all/numactl/2.0.13-GCCcore-10.2.0/lib/../lib64 -L/apps/all/numactl/2.0.13-GCCcore-10.2.0/lib/../lib64/ -L/apps/all/binutils/2.35-GCCcore-10.2.0/lib/../lib64 -L/apps/all/binutils/2.35-GCCcore-10.2.0/lib/../lib64/ -L/apps/all/zlib/1.2.11-GCCcore-10.2.0/lib/../lib64 -L/apps/all/zlib/1.2.11-GCCcore-10.2.0/lib/../lib64/ -L/apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/ -L/apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib64 -L/apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib64/ -L/lib/../lib64 -L/lib/../lib64/ -L/usr/lib/../lib64 -L/usr/lib/../lib64/ -L/apps/all/imkl/2020.4.304-iimpi-2020b/mkl/lib/intel64/ -L/apps/all/imkl/2020.4.304-iimpi-2020b/lib/intel64/ -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/libfabric/lib/ -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib/release/ -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib/ -L/apps/all/UCX/1.9.0-GCCcore-10.2.0/lib64 -L/apps/all/UCX/1.9.0-GCCcore-10.2.0/lib/ -L/apps/all/numactl/2.0.13-GCCcore-10.2.0/lib64 -L/apps/all/numactl/2.0.13-GCCcore-10.2.0/lib/ -L/apps/all/iccifort/2020.4.304/compilers_and_libraries_2020.4.304/linux/tbb/lib/intel64/gcc4.8/ -L/apps/all/binutils/2.35-GCCcore-10.2.0/lib64 -L/apps/all/binutils/2.35-GCCcore-10.2.0/lib/ -L/apps/all/zlib/1.2.11-GCCcore-10.2.0/lib64 -L/apps/all/zlib/1.2.11-GCCcore-10.2.0/lib/ -L/apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../ -L/lib64 -L/lib/ -L/usr/lib64 -L/usr/lib --enable-new-dtags -rpath /apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib/release -rpath /apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib -lmpifort -lmpi -ldl -lrt -lpthread -Bdynamic -Bstatic -limf -lsvml -lirng -Bdynamic -lm -Bstatic -lipgo -ldecimal --as-needed -Bdynamic -lcilkrts -lstdc++ --no-as-needed -lgcc -lgcc_s -Bstatic -lirc -lsvml -Bdynamic -lc -lgcc -lgcc_s -Bstatic -lirc_s -Bdynamic -ldl -lc /apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/crtend.o /lib/../lib64/crtn.o
$ mpiifort -v
mpiifort for the Intel(R) MPI Library 2019 Update 9 for Linux*
Copyright 2003-2020, Intel Corporation.
ifort version 19.1.3.304
ld /lib/../lib64/crt1.o /lib/../lib64/crti.o /apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/crtbegin.o --eh-frame-hdr --build-id -dynamic-linker /lib64/ld-linux-x86-64.so.2 -m elf_x86_64 -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib/release -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib -o a.out /apps/all/iccifort/2020.4.304/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/for_main.o -L/apps/all/imkl/2020.4.304-iimpi-2020b/mkl/lib/intel64 -L/apps/all/imkl/2020.4.304-iimpi-2020b/lib/intel64 -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/libfabric/lib -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib/release -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib -L/apps/all/UCX/1.9.0-GCCcore-10.2.0/lib -L/apps/all/numactl/2.0.13-GCCcore-10.2.0/lib -L/apps/all/iccifort/2020.4.304/compilers_and_libraries_2020.4.304/linux/tbb/lib/intel64/gcc4.8 -L/apps/all/binutils/2.35-GCCcore-10.2.0/lib -L/apps/all/zlib/1.2.11-GCCcore-10.2.0/lib -L/apps/all/iccifort/2020.4.304/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin -L/apps/all/UCX/1.9.0-GCCcore-10.2.0/lib/../lib64 -L/apps/all/UCX/1.9.0-GCCcore-10.2.0/lib/../lib64/ -L/apps/all/numactl/2.0.13-GCCcore-10.2.0/lib/../lib64 -L/apps/all/numactl/2.0.13-GCCcore-10.2.0/lib/../lib64/ -L/apps/all/binutils/2.35-GCCcore-10.2.0/lib/../lib64 -L/apps/all/binutils/2.35-GCCcore-10.2.0/lib/../lib64/ -L/apps/all/zlib/1.2.11-GCCcore-10.2.0/lib/../lib64 -L/apps/all/zlib/1.2.11-GCCcore-10.2.0/lib/../lib64/ -L/apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/ -L/apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib64 -L/apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib64/ -L/lib/../lib64 -L/lib/../lib64/ -L/usr/lib/../lib64 -L/usr/lib/../lib64/ -L/apps/all/imkl/2020.4.304-iimpi-2020b/mkl/lib/intel64/ -L/apps/all/imkl/2020.4.304-iimpi-2020b/lib/intel64/ -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/libfabric/lib/ -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib/release/ -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib/ -L/apps/all/UCX/1.9.0-GCCcore-10.2.0/lib64 -L/apps/all/UCX/1.9.0-GCCcore-10.2.0/lib/ -L/apps/all/numactl/2.0.13-GCCcore-10.2.0/lib64 -L/apps/all/numactl/2.0.13-GCCcore-10.2.0/lib/ -L/apps/all/iccifort/2020.4.304/compilers_and_libraries_2020.4.304/linux/tbb/lib/intel64/gcc4.8/ -L/apps/all/binutils/2.35-GCCcore-10.2.0/lib64 -L/apps/all/binutils/2.35-GCCcore-10.2.0/lib/ -L/apps/all/zlib/1.2.11-GCCcore-10.2.0/lib64 -L/apps/all/zlib/1.2.11-GCCcore-10.2.0/lib/ -L/apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../ -L/lib64 -L/lib/ -L/usr/lib64 -L/usr/lib --enable-new-dtags -rpath /apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib/release -rpath /apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib -lmpifort -lmpi -ldl -lrt -lpthread -Bdynamic -Bstatic -lifport -lifcoremt -limf -lsvml -Bdynamic -lm -Bstatic -lipgo -lirc -Bdynamic -lpthread -Bstatic -lsvml -Bdynamic -lc -lgcc -lgcc_s -Bstatic -lirc_s -Bdynamic -ldl -lc /apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/crtend.o /lib/../lib64/crtn.o
```
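A typical build with the Intel wrappers looks the same way (file names are illustrative):

```console
$ mpiicc -O2 -o hello_c hello.c
$ mpiifort -O2 -o hello_f hello.f90
```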
Wrappers `mpif90` and `mpif77` provided by Intel MPI are designed for GCC and GFortran. You might be able to compile MPI code with them even with the Intel compilers, but you might run into problems.
Example program:
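For illustration, a minimal MPI "hello world" in C — a sketch using only the standard MPI API:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                /* initialize the MPI library */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* rank of the calling process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                        /* shut down the MPI library */
    return 0;
}
```

Compile it with the wrapper matching your loaded toolchain (`mpicc` or `mpiicc`).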
@@ -119,29 +154,26 @@ The optimal way to run an MPI program depends on its memory requirements, memory
!!! note
    Consider these ways to run an MPI program:

    1. One MPI process per node, 128 threads per process
    2. Two MPI processes per node, 64 threads per process
    3. 128 MPI processes per node, 1 thread per process.
**One MPI** process per node, using 128 threads, is most useful for memory-demanding applications that make good use of processor cache memory and are not memory-bound. This is also a preferred way for communication-intensive applications, as one process per node enjoys full-bandwidth access to the network interface.
**Two MPI** processes per node, using 64 threads each, bound to a processor socket, is most useful for memory bandwidth-bound applications such as BLAS1 or FFT with scalable memory demand. However, note that the two processes will share access to the network interface. The 64 threads and socket binding should ensure maximum memory access bandwidth and minimize communication, migration, and NUMA effect overheads.
!!! note
    Important! Bind every OpenMP thread to a core!
In the previous two cases with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. Avoid this by setting the `KMP_AFFINITY` or `GOMP_CPU_AFFINITY` environment variables.
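For example, on a 128-core node you might pin threads as follows (a sketch; adjust the values to your MPI/OpenMP layout):

```console
$ export KMP_AFFINITY=granularity=fine,compact   # Intel OpenMP runtime
$ export GOMP_CPU_AFFINITY=0-127                 # GNU OpenMP runtime
```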
**128 MPI** processes per node, using 1 thread each bound to a processor core, is most suitable for highly scalable applications with low communication demand.
### Running OpenMPI
OpenMPI 4.1.1 is based on the open-source [OpenMPI][a] project. Read more on [how to run OpenMPI][2].
[1]: ../../modules-matrix.md
[2]: running_openmpi.md

[a]: http://www.open-mpi.org/