diff --git a/docs.it4i/software/intel/intel-xeon-phi-anselm.md b/docs.it4i/software/intel/intel-xeon-phi-anselm.md
index 7b6ed0c3b62a1322711711c2b0b37264f4bbe156..547ce91448d5b740050f71f43bc150765fefbd3f 100644
--- a/docs.it4i/software/intel/intel-xeon-phi-anselm.md
+++ b/docs.it4i/software/intel/intel-xeon-phi-anselm.md
@@ -12,7 +12,7 @@ To get access to a compute node with the Intel Xeon Phi accelerator, use the PBS
 $ qsub -I -q qmic -A NONE-0-0
 ```

-To set up the environment, the intel module has to be loaded:
+To set up the environment, the `intel` module has to be loaded:

 ```console
 $ ml intel
 ```
@@ -97,7 +97,7 @@ $ qsub -I -q qmic -A NONE-0-0
 $ ml intel
 ```

-For debugging purposes, it is also recommended to set the "OFFLOAD_REPORT" environment variable. The value can be set from 0 to 3, where a higher number means more debugging information:
+For debugging purposes, it is also recommended to set the `OFFLOAD_REPORT` environment variable. The value can be set from 0 to 3, where a higher number means more debugging information:

 ```console
 export OFFLOAD_REPORT=3
 ```
@@ -222,7 +222,7 @@ int main()
 }
 ```

-During the compilation, the Intel compiler shows which loops have been vectorized in both the host and the accelerator. This can be enabled with the "-vec-report2" compiler option. To compile and execute the code, run:
+During the compilation, the Intel compiler shows which loops have been vectorized in both the host and the accelerator. This can be enabled with the `-vec-report2` compiler option. To compile and execute the code, run:

 ```console
 $ icc vect-add.c -openmp_report2 -vec-report2 -o vect-add
@@ -234,11 +234,11 @@ Some interesting compiler flags useful not only for code debugging are:

 !!! note
     Debugging
-    openmp_report[0|1|2] - controls the compiler based vectorization diagnostic level
-    vec-report[0|1|2] - controls the OpenMP parallelizer diagnostic level
+    `openmp_report[0|1|2]` - controls the OpenMP parallelizer diagnostic level
+    `vec-report[0|1|2]` - controls the compiler-based vectorization diagnostic level

     Performance optimization
-    xhost - FOR HOST ONLY - to generate AVX (Advanced Vector Extensions) instructions.
+    `xhost` - FOR HOST ONLY - to generate AVX (Advanced Vector Extensions) instructions.

 ## Automatic Offload Using Intel MKL Library

@@ -263,7 +263,7 @@ To get more information about the automatic offload, refer to the "[Using Intel

 ### Automatic Offload Example

-At first, get an interactive PBS session on a node with the MIC accelerator and load the "intel" module that automatically loads the "mkl" module as well.
+At first, get an interactive PBS session on a node with the MIC accelerator and load the `intel` module, which automatically loads the `mkl` module as well.

 ```console
 $ qsub -I -q qmic -A OPEN-0-0 -l select=1:ncpus=16
@@ -374,7 +374,7 @@ $ ml intel
 !!! note
     A particular version of the Intel module is specified. This information is used later to specify the correct library paths.

-To produce a binary compatible with the Intel Xeon Phi architecture, the user has to specify the "-mmic" compiler flag. Two compilation examples are shown below. The first example shows how to compile an OpenMP parallel code "vect-add.c" for the host only:
+To produce a binary compatible with the Intel Xeon Phi architecture, the user has to specify the `-mmic` compiler flag. Two compilation examples are shown below. The first example shows how to compile an OpenMP parallel code `vect-add.c` for the host only:

 ```console
 $ icc -xhost -no-offload -fopenmp vect-add.c -o vect-add-host
@@ -408,7 +408,7 @@ If the code is sequential, it can be executed directly:
 mic0 $ ~/path_to_binary/vect-add-seq-mic
 ```

-If the code is parallelized using OpenMP, a set of additional libraries is required for execution. To locate these libraries a new path has to be added to the LD_LIBRARY_PATH environment variable prior to the execution:
+If the code is parallelized using OpenMP, a set of additional libraries is required for execution. To locate these libraries, a new path has to be added to the `LD_LIBRARY_PATH` environment variable prior to execution:

 ```console
 mic0 $ export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH
@@ -444,7 +444,7 @@ On Anselm, OpenCL is installed only on compute nodes with the MIC accelerator, s
 ml opencl-sdk opencl-rt
 ```

-Always load the "opencl-sdk" (providing devel files like headers) and "opencl-rt" (providing dynamic library libOpenCL.so) modules to compile and link OpenCL code. Load "opencl-rt" for running your compiled code.
+Always load the `opencl-sdk` (providing devel files like headers) and `opencl-rt` (providing the dynamic library `libOpenCL.so`) modules to compile and link OpenCL code. Load `opencl-rt` for running your compiled code.

 There are two basic examples of OpenCL code in the following directory:

@@ -628,8 +628,8 @@ Hello world from process 0 of 4 on host cn207

 ### Coprocessor-Only Model

-There are two ways to execute MPI code on a single coprocessor: 1) launch the program using "**mpirun**" from the
-coprocessor; or 2) launch the task using "**mpiexec.hydra**" from a host.
+There are two ways to execute MPI code on a single coprocessor: 1) launch the program using `mpirun` from the
+coprocessor; or 2) launch the task using `mpiexec.hydra` from a host.

 #### Execution on Coprocessor

@@ -684,7 +684,7 @@ Hello world from process 0 of 4 on host cn207-mic0

 If the MPI program is launched from host instead of the coprocessor, the environmental variables are not set using the ".profile" file. Therefore, the user has to specify library paths from the command line when calling "mpiexec".

-First step is to tell mpiexec that the MPI should be executed on a local accelerator by setting up the "I_MPI_MIC" environment variable:
+The first step is to tell `mpiexec` that the MPI should be executed on a local accelerator by setting up the `I_MPI_MIC` environment variable:

 ```console
 $ export I_MPI_MIC=1
@@ -696,7 +696,7 @@ Now the MPI program can be executed as:
 $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic
 ```

-or using mpirun:
+or using `mpirun`:

 ```console
 $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic
@@ -755,7 +755,7 @@ This output means that the PBS allocated nodes cn204 and cn205, which means that
 - to connect to the accelerator on the first node from the first node: `$ ssh cn204-mic0` or `$ ssh mic0`
 - to connect to the accelerator on the second node from the first node: `$ ssh cn205-mic0`

-At this point, we expect that the correct modules are loaded and the binary is compiled. For parallel execution, the mpiexec.hydra is used. Again, the first step is to tell mpiexec that the MPI can be executed on the MIC accelerators by setting up the "I_MPI_MIC" environment variable:
+At this point, we expect that the correct modules are loaded and the binary is compiled. For parallel execution, `mpiexec.hydra` is used. Again, the first step is to tell `mpiexec` that the MPI can be executed on the MIC accelerators by setting up the `I_MPI_MIC` environment variable:

 ```console
 $ export I_MPI_MIC=1
@@ -772,7 +772,7 @@ $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/
 : -host cn205-mic0 -n 6 ~/mpi-test-mic
 ```

-or using mpirun:
+or using `mpirun`:

 ```console
 $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/
@@ -814,7 +814,7 @@ $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/

 In a symmetric mode MPI programs are executed on both the host computer(s) and the MIC accelerator(s). Since MIC has a different architecture and requires a different binary file produced by the Intel compiler, two different files have to be compiled before the MPI program is executed.

-In the previous section, we have compiled two binary files, one for hosts "**mpi-test**" and one for MIC accelerators "**mpi-test-mic**". These two binaries can be executed at once using mpiexec.hydra:
+In the previous section, we have compiled two binary files, one for hosts "**mpi-test**" and one for MIC accelerators "**mpi-test-mic**". These two binaries can be executed at once using `mpiexec.hydra`:

 ```console
 $ mpiexec.hydra
@@ -837,7 +837,7 @@ The output of the program is:
 Hello world from process 3 of 4 on host cn205-mic0
 ```

-The execution procedure can be simplified by using the mpirun command with the machine file as a parameter. The machine file contains a list of all nodes and accelerators that should be used to execute MPI processes.
+The execution procedure can be simplified by using the `mpirun` command with the machine file as a parameter. The machine file contains a list of all nodes and accelerators that should be used to execute MPI processes.

 An example of a machine file that uses 2 hosts (**cn205** and **cn206**) and 2 accelerators **(cn205-mic0** and **cn206-mic0**) to run 2 MPI processes on each of them:

@@ -849,13 +849,13 @@ $ cat hosts_file_mix
 cn206-mic0:2
 ```

-In addition if a naming convention is set in a way that the name of the binary for host is **"bin_name"** and the name of the binary for the accelerator is **"bin_name-mic"** then by setting up the environment variable **I_MPI_MIC_POSTFIX** to **"-mic"**, the user does not have to specify the names of both binaries. In this case, mpirun needs just the name of the host binary file (i.e. "mpi-test") and uses the suffix to get a name of the binary for accelerator (i..e. "mpi-test-mic").
+In addition, if a naming convention is set in a way that the name of the binary for the host is **"bin_name"** and the name of the binary for the accelerator is **"bin_name-mic"**, then by setting the environment variable `I_MPI_MIC_POSTFIX` to `-mic`, the user does not have to specify the names of both binaries. In this case, `mpirun` needs just the name of the host binary file (i.e. "mpi-test") and uses the suffix to get the name of the binary for the accelerator (i.e. "mpi-test-mic").

 ```console
 $ export I_MPI_MIC_POSTFIX=-mic
 ```

-To run the MPI code using mpirun and the machine file "hosts_file_mix", use:
+To run the MPI code using `mpirun` and the machine file `hosts_file_mix`, use:

 ```console
 $ mpirun
@@ -898,7 +898,7 @@ A set of node-files, which can be used instead of manually creating a new one ev
 - /lscratch/${PBS_JOBID}/nodefile-mic-sn MICs only node-file, using short names
 - /lscratch/${PBS_JOBID}/nodefile-mix-sn Hosts and MICs node-file, using short names

-Each host or accelerator is listed only once per file. User has to specify how many jobs should be executed per node using the `-n` parameter of the mpirun command.
+Each host or accelerator is listed only once per file. The user has to specify how many jobs should be executed per node using the `-n` parameter of the `mpirun` command.

 ## Optimization
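For readers following the symmetric-mode steps that this patch touches, the pieces above (`I_MPI_MIC`, `I_MPI_MIC_POSTFIX`, the machine file, and the MIC library path) can be combined into one launch sequence. The sketch below is only an illustration assuming the library path, the `hosts_file_mix` machine file, and the `mpi-test`/`mpi-test-mic` binaries from the examples above; the full `mpirun` invocation is cut off by the diff context, so this exact command is an assumption rather than the documented one:

```console
# Allow MPI ranks to run on the MIC accelerators (as in the examples above)
$ export I_MPI_MIC=1
# Let mpirun derive the accelerator binary name by appending "-mic" to the host binary name
$ export I_MPI_MIC_POSTFIX=-mic
# Launch 2 processes on each host and each accelerator listed in hosts_file_mix,
# pointing the accelerator ranks at the MIC build of the Intel MPI runtime (path assumed from above)
$ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -machinefile hosts_file_mix -n 8 ~/mpi-test
```

With `I_MPI_MIC_POSTFIX` set, the host ranks would run `~/mpi-test` and the accelerator ranks `~/mpi-test-mic`, matching the naming convention described in the last hunk.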