    # Running MPICH2
    
    ## MPICH2 program execution
    
The MPICH2 programs use the mpd daemon or an ssh connection to spawn processes; no PBS support is needed. However, a PBS allocation is required to access the compute nodes. On Anselm, **Intel MPI** and **mpich2 1.9** are MPICH2-based MPI implementations.
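
The examples on this page load the impi and mvapich2 modules; their availability may be checked with module avail (a quick sketch, module names as used on this page):

```bash
    $ module avail impi mvapich2
```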
    
    ### Basic usage
    
    
    !!! Note "Note"
	Use mpirun to execute the MPICH2 code.
    
    Example:
    
    
    ```bash
    
        $ qsub -q qexp -l select=4:ncpus=16 -I
        qsub: waiting for job 15210.srv11 to start
        qsub: job 15210.srv11 ready
    
        $ module load impi
    
        $ mpirun -ppn 1 -hostfile $PBS_NODEFILE ./helloworld_mpi.x
        Hello world! from rank 0 of 4 on host cn17
        Hello world! from rank 1 of 4 on host cn108
        Hello world! from rank 2 of 4 on host cn109
        Hello world! from rank 3 of 4 on host cn110
    
    ```
    
In this example, we allocate 4 nodes via the express queue interactively. We set up the Intel MPI environment and interactively run the helloworld_mpi.x program, requesting MPI to spawn 1 process per node.
Note that the executable helloworld_mpi.x must be available at the same path on all nodes. This is automatically fulfilled on the /home and /scratch filesystems.
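
To check which nodes were allocated to the job (and hence where the processes will be spawned), you may inspect the hostfile provided by PBS; the node names below are illustrative:

```bash
    $ cat $PBS_NODEFILE
    cn17.bullx
    cn108.bullx
    cn109.bullx
    cn110.bullx
```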
    
If running on the local scratch /lscratch filesystem, you need to preload the executable:
    
    ```bash
    
        $ pwd
        /lscratch/15210.srv11
        $ mpirun -ppn 1 -hostfile $PBS_NODEFILE cp /home/username/helloworld_mpi.x .
        $ mpirun -ppn 1 -hostfile $PBS_NODEFILE ./helloworld_mpi.x
        Hello world! from rank 0 of 4 on host cn17
        Hello world! from rank 1 of 4 on host cn108
        Hello world! from rank 2 of 4 on host cn109
        Hello world! from rank 3 of 4 on host cn110
    
    ```
    
In this example, we assume the executable helloworld_mpi.x is present in the shared home directory. We run the cp command via mpirun, copying the executable from the shared home to the local scratch. The second mpirun then executes the binary in the /lscratch/15210.srv11 directory on nodes cn17, cn108, cn109 and cn110, one process per node.
    
    !!! Note "Note"
    	MPI process mapping may be controlled by PBS parameters.
    
The mpiprocs and ompthreads parameters allow selecting the number of MPI processes running per node as well as the number of OpenMP threads per MPI process.
    
    ### One MPI process per node
    
    
Follow this example to run one MPI process per node, with 16 threads per process. Note that no options to mpirun are needed.
    
    ```bash
    
        $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=1:ompthreads=16 -I
    
        $ module load mvapich2
    
        $ mpirun ./helloworld_mpi.x
    
    ```
    
In this example, we demonstrate the recommended way to run an MPI application, using 1 MPI process per node with 16 threads per process, on 4 nodes.
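
PBS typically exports OMP_NUM_THREADS according to the ompthreads value; if your environment does not, you may set it explicitly before launching (a minimal sketch matching the 16-thread layout above):

```bash
    $ export OMP_NUM_THREADS=16    # one thread per core of the node

    $ mpirun ./helloworld_mpi.x
```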
    
    ### Two MPI processes per node
    
    
Follow this example to run two MPI processes per node, with 8 threads per process. Note the options to mpirun for mvapich2; no options are needed for impi.
    
    ```bash
    
        $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=2:ompthreads=8 -I
    
        $ module load mvapich2
    
        $ mpirun -bind-to numa ./helloworld_mpi.x
    
    ```
    
In this example, we demonstrate the recommended way to run an MPI application, using 2 MPI processes per node and 8 threads per socket, with each process and its threads bound to a separate processor socket of the node, on 4 nodes.
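
As noted above, no binding option is needed for impi; the same two-processes-per-node layout could be launched as follows (a sketch, reusing the impi module from the first example):

```bash
    $ module load impi

    $ mpirun ./helloworld_mpi.x
```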
    
    ### 16 MPI processes per node
    
    
Follow this example to run 16 MPI processes per node, with 1 thread per process. Note the options to mpirun for mvapich2; no options are needed for impi.
    
    ```bash
    
        $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=16:ompthreads=1 -I
    
        $ module load mvapich2
    
        $ mpirun -bind-to core ./helloworld_mpi.x
    
    ```
    
In this example, we demonstrate the recommended way to run an MPI application, using 16 single-threaded MPI processes per node, with each process bound to a separate processor core, on 4 nodes.
    
    ### OpenMP thread affinity
    
    
!!! Note "Note"
	Important! Bind every OpenMP thread to a core!
    
In the previous two examples with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You might want to avoid this by setting this environment variable for GCC OpenMP:
    
    ```bash
    
        $ export GOMP_CPU_AFFINITY="0-15"
    
    ```
    
    
    or this one for Intel OpenMP:
    
    
    ```bash
    
        $ export KMP_AFFINITY=granularity=fine,compact,1,0
    
    ```
    
As of OpenMP 4.0 (supported by GCC 4.9 and later and Intel 14.0 and later), the following variables may be used for Intel or GCC:
    
    ```bash
    
        $ export OMP_PROC_BIND=true
    
        $ export OMP_PLACES=cores
    ```
    
    ## MPICH2 Process Mapping and Binding
    
    
The mpirun program allows for precise selection of how the MPI processes will be mapped to the computational nodes and how they will bind to particular processor sockets and cores.
    
    
    ### Machinefile
    
    
Process mapping may be controlled by specifying a machinefile input to the mpirun program. Although all implementations of MPI provide means for process mapping and binding, the following examples are valid for impi and mvapich2 only.
    
    
    Example machinefile
    
    
    ```bash
    
        cn110.bullx
        cn109.bullx
        cn108.bullx
        cn17.bullx
        cn108.bullx
    
    ```
    
    
    Use the machinefile to control process placement
    
    
    ```bash
    
    $ mpirun -machinefile machinefile ./helloworld_mpi.x
        Hello world! from rank 0 of 5 on host cn110
        Hello world! from rank 1 of 5 on host cn109
        Hello world! from rank 2 of 5 on host cn108
        Hello world! from rank 3 of 5 on host cn17
        Hello world! from rank 4 of 5 on host cn108
    
    ```
    
In this example, we see that the ranks have been mapped to nodes according to the order in which the nodes appear in the machinefile.
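
A machinefile may also be derived from the PBS-provided hostfile, for example to list each allocated node once while preserving the allocation order (a sketch, assuming repeated node entries in $PBS_NODEFILE are consecutive):

```bash
    $ uniq $PBS_NODEFILE > machinefile    # one entry per allocated node, order preserved

    $ mpirun -machinefile machinefile ./helloworld_mpi.x
```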
    
    
    ### Process Binding
    
    
Intel MPI automatically binds each process and its threads to the corresponding portion of cores on the processor socket of the node; no options are needed. The binding is primarily controlled by environment variables. Read more about MPI process binding on the [Intel website](https://software.intel.com/sites/products/documentation/hpc/ics/impi/41/lin/Reference_Manual/Environment_Variables_Process_Pinning.htm). MPICH2 uses the -bind-to option: use -bind-to core to bind each process to a single core, or -bind-to numa to bind it to an entire socket (NUMA domain).
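
For illustration, Intel MPI pinning may also be adjusted through environment variables such as I_MPI_PIN and I_MPI_PIN_DOMAIN (a sketch only; consult the Intel reference linked above for the authoritative settings):

```bash
    $ export I_MPI_PIN=1                  # enable process pinning
    $ export I_MPI_PIN_DOMAIN=socket      # one pinning domain per processor socket

    $ mpirun ./helloworld_mpi.x
```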
    
    
    ### Bindings verification
    
In all cases, binding and threading may be verified by executing:
    
    
    ```bash
    
    $ mpirun -bind-to numa numactl --show
    $ mpirun -bind-to numa echo $OMP_NUM_THREADS
    
    ```
    
    ## Intel MPI on Xeon Phi
    
    
    
The [MPI section of the Intel Xeon Phi chapter](../intel-xeon-phi/) provides details on how to run Intel MPI code on the Xeon Phi architecture.