Update mpi.md

5cb83428 · Jan Siwiec · 3af6df17 · 5cb83428
Commit 5cb83428 authored Mar 24, 2020 by Jan Siwiec
--- a/docs.it4i/software/mpi/mpi.md
+++ b/docs.it4i/software/mpi/mpi.md
@@ -13,7 +13,7 @@ The Salomon cluster provides several implementations of the MPI library:

 MPI libraries are activated via the environment modules.

-Look up section modulefiles/mpi in ml av
+Look up the modulefiles/mpi section in ml av:

 ```console
 $ ml av
@@ -26,7 +26,7 @@ $ ml av
    OpenMPI/1.8.6-GNU-5.1.0-2.25
 ```

-There are default compilers associated with any particular MPI implementation. The defaults may be changed, the MPI libraries may be used in conjunction with any compiler. The defaults are selected via the modules in following way
+There are default compilers associated with any particular MPI implementation. The defaults may be changed; the MPI libraries may be used in conjunction with any compiler. The defaults are selected via the modules in the following way:

 | Module                                   | MPI        | Compiler suite |
 | ---------------------------------------- | ---------- | -------------- |
@@ -39,15 +39,15 @@ Examples:
 $ ml gompi/2015b
 ```

-In this example, we activate the latest OpenMPI with latest GNU compilers (OpenMPI 1.8.6 and GCC 5.1). See more information about toolchains in section [Environment and Modules][1].
+In this example, we activate the latest OpenMPI with the latest GNU compilers (OpenMPI 1.8.6 and GCC 5.1). For more information about toolchains, see the [Environment and Modules][1] section.

-To use OpenMPI with the intel compiler suite, use
+To use OpenMPI with the Intel compiler suite, use:

 ```console
 $ ml iompi/2015.03
 ```

-In this example, the openmpi 1.8.6 using intel compilers is activated. It's used "iompi" toolchain.
+In this example, OpenMPI 1.8.6 with the Intel compilers is activated via the "iompi" toolchain.

 ## Compiling MPI Programs

@@ -66,7 +66,7 @@ $ mpicc
 $ mpiifort
 ```

-Wrappers mpif90, mpif77 that are provided by Intel MPI are designed for gcc and gfortran. You might be able to compile MPI code by them even with Intel compilers, but you might run into problems (for example, native MIC compilation with -mmic does not work with mpif90).
+The mpif90 and mpif77 wrappers provided by Intel MPI are designed for GCC and GFortran. You might be able to compile MPI code with them even when using the Intel compilers, but you might run into problems (for example, native MIC compilation with -mmic does not work with mpif90).

 Example program:

@@ -98,7 +98,7 @@ return 0;
 }
 ```

-Compile the above example with
+Compile the above example with:

 ```console
 $ mpicc helloworld_mpi.c -o helloworld_mpi.x
@@ -109,13 +109,13 @@ $ mpicc helloworld_mpi.c -o helloworld_mpi.x
 The MPI program executable must be compatible with the loaded MPI module.
 Always compile and execute using the very same MPI module.

-It is strongly discouraged to mix MPI implementations. Linking an application with one MPI implementation and running mpirun/mpiexec form other implementation may result in unexpected errors.
+Mixing MPI implementations is strongly discouraged. Linking an application with one MPI implementation and running mpirun/mpiexec from another implementation may result in unexpected errors.

-The MPI program executable must be available within the same path on all nodes. This is automatically fulfilled on the /home and /scratch filesystem. You need to preload the executable, if running on the local scratch /lscratch filesystem.
+The MPI program executable must be available within the same path on all nodes. This is automatically fulfilled on the /home and /scratch filesystems. You need to preload the executable if running on the local scratch /lscratch filesystem.

 ### Ways to Run MPI Programs

-Optimal way to run an MPI program depends on its memory requirements, memory access pattern and communication pattern.
+The optimal way to run an MPI program depends on its memory requirements, memory access pattern and communication pattern.

 !!! note
    Consider these ways to run an MPI program:
@@ -123,20 +123,20 @@ Optimal way to run an MPI program depends on its memory requirements, memory acc
    2. Two MPI processes per node, 12 threads per process
    3. 24 MPI processes per node, 1 thread per process.

-**One MPI** process per node, using 24 threads, is most useful for memory demanding applications, that make good use of processor cache memory and are not memory bound.  This is also a preferred way for communication intensive applications as one process per node enjoys full bandwidth access to the network interface.
+**One MPI** process per node, using 24 threads, is most useful for memory-demanding applications that make good use of processor cache memory and are not memory-bound. This is also a preferred way for communication-intensive applications, as one process per node enjoys full bandwidth access to the network interface.

-**Two MPI** processes per node, using 12 threads each, bound to processor socket is most useful for memory bandwidth bound applications such as BLAS1 or FFT, with scalable memory demand. However, note that the two processes will share access to the network interface. The 12 threads and socket binding should ensure maximum memory access bandwidth and minimize communication, migration and NUMA effect overheads.
+**Two MPI** processes per node, using 12 threads each and bound to a processor socket, are most useful for memory-bandwidth-bound applications such as BLAS1 or FFT with scalable memory demand. However, note that the two processes will share access to the network interface. The 12 threads and socket binding should ensure maximum memory access bandwidth and minimize communication, migration, and NUMA effect overheads.

 !!! note
    Important! Bind every OpenMP thread to a core!

 In the previous two cases with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You want to avoid this by setting the KMP_AFFINITY or GOMP_CPU_AFFINITY environment variables.

-**24 MPI** processes per node, using 1 thread each bound to processor core is most suitable for highly scalable applications with low communication demand.
+**24 MPI** processes per node, each using 1 thread bound to a processor core, are most suitable for highly scalable applications with low communication demand.

 ### Running OpenMPI

-The [OpenMPI 1.8.6][a] is based on OpenMPI. Read more on [how to run OpenMPI][2] based MPI.
+The [OpenMPI 1.8.6][a] is based on OpenMPI. Read more on [how to run OpenMPI][2].

 The Intel MPI may run on the [Intel Xeon Phi][3] accelerators as well. Read more on [how to run Intel MPI on accelerators][3].
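
The three process/thread layouts discussed under "Ways to Run MPI Programs" all divide the 24 cores of a node evenly between MPI processes and OpenMP threads. The following is a minimal sketch of that arithmetic, not part of the documented workflow: the helper name `threads_per_process` is hypothetical, and the affinity values are assumed, commonly used settings for the one-process-per-node case that should be checked against the OpenMP runtime actually in use.

```shell
#!/bin/sh
# Hypothetical helper (not from the original docs): prints the number of
# OpenMP threads per MPI process for a node with the given core count.
# Returns 1 and prints nothing if the cores do not divide evenly.
threads_per_process() {
    cores=$1
    procs=$2
    if [ "$procs" -le 0 ] || [ $((cores % procs)) -ne 0 ]; then
        return 1
    fi
    echo $((cores / procs))
}

CORES_PER_NODE=24   # from the text: 24 cores per node

# The three layouts listed in the note: 1, 2, or 24 MPI processes per node.
for procs in 1 2 24; do
    threads=$(threads_per_process "$CORES_PER_NODE" "$procs")
    echo "$procs MPI process(es) per node -> OMP_NUM_THREADS=$threads"
done

# Example: one MPI process per node using all 24 cores.
OMP_NUM_THREADS=$(threads_per_process "$CORES_PER_NODE" 1)
export OMP_NUM_THREADS

# Assumed affinity settings to keep OpenMP threads from migrating.
# Only the variable matching the OpenMP runtime in use has any effect.
export KMP_AFFINITY=granularity=fine,compact   # Intel OpenMP runtime
export GOMP_CPU_AFFINITY="0-23"                # GNU OpenMP runtime (libgomp)
```

For the two-process layout, the core list for each process would additionally have to match the socket binding chosen by the MPI launcher, which this sketch does not attempt to model.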