Commit 16abbbaa authored by Lukáš Krupčík

Update docs.it4i/software/intel/intel-suite/intel-compilers.md

parent cc2fc1dc
1 merge request!338software
Pipeline #22290 failed
@@ -8,39 +8,52 @@
Intel compilers are available in multiple versions via the `intel` module. The compilers include the icc C and C++ compiler and the ifort Fortran 77/90/95 compiler.
For the current list of installed versions, use:
```console
$ ml av intel/
```
```console
$ ml intel/2020b
$ icc -v
icc version 19.1.3.304 (gcc version 10.2.0 compatibility)
$ ifort -v
ifort version 19.1.3.304
```
## Instructions Vectorization
Intel compilers provide vectorization of the code via the AVX2/AVX-512 instructions and support threading parallelization via OpenMP.
For maximum performance on the Barbora cluster compute nodes, compile your programs with the AVX-512 instructions and enable the vectorization report, which shows where vectorization was applied. We recommend the following compilation options for high performance.
Info: Barbora non-accelerated nodes (cn1-cn192) support AVX-512 instructions.
```console
$ icc -ipo -O3 -xCORE-AVX512 -qopt-report1 -qopt-report-phase=vec myprog.c mysubroutines.c -o myprog.x
```
In this example, we compile the program enabling interprocedural optimizations between source files (`-ipo`), aggressive loop optimizations (`-O3`), and vectorization (`-xCORE-AVX512`).
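To illustrate the kind of loop these options target, here is a minimal sketch of a vectorization candidate. The function name `saxpy` and its signature are hypothetical; the source does not show the contents of `myprog.c`. The `restrict` qualifiers are an assumption that the two arrays never overlap, which removes an aliasing concern that can otherwise prevent vectorization.

```c
#include <stddef.h>

/* Hypothetical example routine: computes y[i] = a * x[i] + y[i].
 * `restrict` is a promise by the caller that x and y do not overlap,
 * so the iterations are independent and the loop can be vectorized. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

When a file containing such a routine is compiled with the options above, the vectorization report should indicate whether the loop was vectorized.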
For maximum performance on the Barbora GPU nodes or Karolina cluster compute nodes, compile your programs with the AVX2 instructions and enable the vectorization report, which shows where vectorization was applied. We recommend the following compilation options for high performance:
```console
$ icc -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec myprog.c mysubroutines.c -o myprog.x
$ ifort -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec myprog.f mysubroutines.f -o myprog.x
```
Warning: The Karolina cluster has AMD CPUs; use the compiler option `-march=core-avx2` there.
In this example, we compile the program enabling interprocedural optimizations between source files (`-ipo`), aggressive loop optimizations (`-O3`), and vectorization (`-xCORE-AVX2`).
The compiler recognizes the omp, simd, vector, and ivdep pragmas for OpenMP parallelization and AVX2 vectorization. Enable the OpenMP parallelization with the `-qopenmp` compiler switch (the older `-openmp` spelling is not accepted by recent compiler versions).
```console
$ icc -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec -qopenmp myprog.c mysubroutines.c -o myprog.x
$ ifort -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec -qopenmp myprog.f mysubroutines.f -o myprog.x
```
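The pragmas mentioned above can be sketched in C as follows. This is an illustrative example only: the function names `dot` and `add_scaled` are hypothetical and do not come from the source. The `omp` pragma takes effect only when the OpenMP switch is enabled, and `ivdep` is an Intel-specific hint that other compilers may ignore; in both cases the code still computes the same result serially.

```c
#include <stddef.h>

/* Hypothetical example: dot product of x and y.
 * With OpenMP enabled, iterations are split across threads and each
 * thread's chunk is vectorized; the reduction clause combines the
 * per-thread partial sums into `sum`. */
double dot(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    #pragma omp parallel for simd reduction(+:sum)
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* Hypothetical example: a[i] += s * b[i].
 * The pointers are not marked `restrict`, so the compiler may assume
 * that a and b could overlap. `ivdep` asks it to ignore such assumed
 * dependencies; the caller must guarantee that a and b do not overlap. */
void add_scaled(size_t n, double *a, const double *b, double s)
{
    #pragma ivdep
    for (size_t i = 0; i < n; ++i) {
        a[i] += s * b[i];
    }
}
```

Using `ivdep` shifts the responsibility for correctness to the programmer, so it is best reserved for loops where the absence of overlap is known from the calling code.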
Read more in the [Intel C++ compiler documentation][a].
## Sandy Bridge/Ivy Bridge/Haswell Binary Compatibility
Salomon compute nodes are equipped with the Haswell-based architecture, while the UV1 SMP compute server has Ivy Bridge CPUs, which are equivalent to Sandy Bridge (differing only in a smaller manufacturing technology). The new processors are backward compatible with the Sandy Bridge nodes, so all programs that run on the Sandy Bridge processors should also run on the new Haswell nodes. To get optimal performance out of the Haswell processors, a program should use the AVX2 instructions available on this processor. This can be done by recompiling the code with the compiler flags that enable these instructions. For the Intel compiler suite, there are two options:
* Using the compiler flag (for both Fortran and C): `-xCORE-AVX2`. This creates a binary with AVX2 instructions, specifically for the Haswell processors. Note that the executable will not run on Sandy Bridge/Ivy Bridge nodes.
* Using the compiler flags (for both Fortran and C): `-xAVX -axCORE-AVX2`. This generates multiple, feature-specific auto-dispatch code paths for Intel® processors if there is a performance benefit. The resulting binary therefore runs on both Sandy Bridge/Ivy Bridge and Haswell processors. At runtime, the code path is selected according to the processor the program is running on. In general, this results in larger binaries.
[a]: https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top.html