GPI-2
=====
## A library that implements the GASPI specification
Introduction
------------
GPI-2 is an API library for asynchronous interprocess, cross-node communication. It provides a flexible, scalable and fault tolerant interface for parallel applications.
The GPI-2 library ([www.gpi-site.com/gpi2/](http://www.gpi-site.com/gpi2/)) implements the GASPI specification (Global Address Space Programming Interface, [www.gaspi.de](http://www.gaspi.de/en/project.html)). GASPI is a Partitioned Global Address Space (PGAS) API. It aims at scalable, flexible and fault tolerant computing in massively parallel environments.
Modules
-------
GPI-2 version 1.0.2 is available on Anselm via the gpi2 module:
```bash
$ module load gpi2
```
The module sets up the environment variables required for linking and running GPI-2 enabled applications. This particular command loads the default module, which is gpi2/1.0.2.
Linking
-------
!!! Note "Note"
    Link with -lGPI2 -libverbs

Load the gpi2 module and link your code against GPI-2 using the **-lGPI2** and **-libverbs** switches. GPI-2 requires the OFED InfiniBand communication library ibverbs.
### Compiling and linking with Intel compilers
```bash
$ module load intel
$ module load gpi2
$ icc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lGPI2 -libverbs
```
### Compiling and linking with GNU compilers
```bash
$ module load gcc
$ module load gpi2
$ gcc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lGPI2 -libverbs
```
Running the GPI-2 codes
-----------------------
!!! Note "Note"
    gaspi_run starts the GPI-2 application

The gaspi_run utility is used to start and run GPI-2 applications:
```bash
$ gaspi_run -m machinefile ./myprog.x
```
A machine file (**machinefile**) with the hostnames of the nodes where the application will run must be provided. The machinefile lists all nodes on which to run, one entry per process, so a node appears once for every process placed on it. This file may be created by hand or derived from the standard $PBS_NODEFILE:
```bash
$ cut -f1 -d"." $PBS_NODEFILE > machinefile
```
machinefile:
```bash
cn79
cn80
```
This machinefile will run 2 GPI-2 processes, one on node cn79 and the other on node cn80.
machinefile:
```bash
cn79
cn79
cn80
cn80
```
This machinefile will run 4 GPI-2 processes, 2 on node cn79 and 2 on node cn80.
!!! Note "Note"
    Use the **mpiprocs** qsub option to control how many GPI-2 processes will run per node.

Example:
```bash
$ qsub -A OPEN-0-0 -q qexp -l select=2:ncpus=16:mpiprocs=16 -I
```
This example will produce a $PBS_NODEFILE with 16 entries per node.
### gaspi_logger
!!! Note "Note"
    gaspi_logger views the output from GPI-2 application ranks

The gaspi_logger utility is used to view the output from all nodes except the master node (rank 0). gaspi_logger is started in another session on the master node, i.e. the node where gaspi_run is executed. The output of the application produced with gaspi_printf() is redirected to the gaspi_logger; output from other I/O routines (e.g. printf) is not.
Example
-------
Following is an example GPI-2 enabled code:
```cpp
#include <GASPI.h>
#include <stdlib.h>

/* Abort if a GASPI call did not return GASPI_SUCCESS. */
void success_or_exit ( const char* file, const int line, const int ec)
{
  if (ec != GASPI_SUCCESS)
  {
    gaspi_printf ("Assertion failed in %s[%i]:%d\n", file, line, ec);
    exit (1);
  }
}

#define ASSERT(ec) success_or_exit (__FILE__, __LINE__, ec);

int main(int argc, char *argv[])
{
  gaspi_rank_t rank, num;

  /* Initialize GPI-2 */
  ASSERT( gaspi_proc_init(GASPI_BLOCK) );

  /* Get rank information */
  ASSERT( gaspi_proc_rank(&rank) );
  ASSERT( gaspi_proc_num(&num) );

  gaspi_printf("Hello from rank %d of %d\n", rank, num);

  /* Terminate */
  ASSERT( gaspi_proc_term(GASPI_BLOCK) );

  return 0;
}
```
Load modules and compile:
```bash
$ module load gcc gpi2
$ gcc helloworld_gpi.c -o helloworld_gpi.x -Wl,-rpath=$LIBRARY_PATH -lGPI2 -libverbs
```
Submit the job and run the GPI-2 application:
```bash
$ qsub -q qexp -l select=2:ncpus=1:mpiprocs=1,place=scatter,walltime=00:05:00 -I
qsub: waiting for job 171247.dm2 to start
qsub: job 171247.dm2 ready
cn79 $ module load gpi2
cn79 $ cut -f1 -d"." $PBS_NODEFILE > machinefile
cn79 $ gaspi_run -m machinefile ./helloworld_gpi.x
Hello from rank 0 of 2
```
At the same time, in another session, you may start the gaspi_logger:
```bash
$ ssh cn79
cn79 $ gaspi_logger
GASPI Logger (v1.1)
[cn80:0] Hello from rank 1 of 2
```
In this example, we compile the helloworld_gpi.c code using the **GNU compiler** (gcc) and link it against the GPI-2 and ibverbs libraries. The library search path is compiled into the executable. For execution, we use the qexp queue with 2 nodes, 1 core each. The GPI-2 module must be loaded on the master compute node (cn79 in this example); gaspi_logger is used from a different session to view the output of the second process.
Anselm Cluster Software
===
## [Modules](../../modules-anselm)
* List of available modules
## [COMSOL](comsol-multiphysics)
* A finite element analysis, solver and simulation software
## [ParaView](paraview)
* An open-source, multi-platform data analysis and visualization application
## [Compilers](compilers)
* Available compilers, including GNU, Intel and UPC compilers
## [NVIDIA CUDA](nvidia-cuda)
* A guide to NVIDIA CUDA programming and GPU usage
## [GPI-2](gpi2)
* A library that implements the GASPI specification
## [OpenFOAM](openfoam)
* A free, open source CFD software package
## [ISV Licenses](isv_licenses)
* A guide to managing Independent Software Vendor licenses
## [Intel Xeon Phi](intel-xeon-phi)
* A guide to Intel Xeon Phi usage
## [Virtualization](kvirtualization)
## [Java](java)
* Java on ANSELM
## [Operating System](operating-system)
* The operating system deployed on ANSELM
## Intel Suite
* The Intel Parallel Studio XE
### [Introduction](intel-suite/introduction)
### [Intel MKL](intel-suite/intel-mkl)
### [Intel Compilers](intel-suite/intel-compilers)
### [Intel IPP](intel-suite/intel-integrated-performance-primitives)
### [Intel TBB](intel-suite/intel-tbb)
### [Intel Debugger](intel-suite/intel-debugger)
## MPI
* Message Passing Interface libraries
### [Introduction](mpi/mpi)
### [MPI4Py (MPI for Python)](mpi/mpi4py-mpi-for-python)
### [Running OpenMPI](mpi/Running_OpenMPI)
### [Running MPICH2](mpi/running-mpich2)
## Numerical Libraries
* Libraries for numerical computations
### [Intel numerical libraries](numerical-libraries/intel-numerical-libraries)
### [PETSc](numerical-libraries/petsc)
### [Trilinos](numerical-libraries/trilinos)
### [FFTW](numerical-libraries/fftw)
### [GSL](numerical-libraries/gsl)
### [MAGMA for Intel Xeon Phi](numerical-libraries/magma-for-intel-xeon-phi)
### [HDF5](numerical-libraries/hdf5)
## Omics Master
### [Diagnostic component (TEAM)](omics-master/diagnostic-component-team)
### [Prioritization component (BiERapp)](omics-master/priorization-component-bierapp)
### [Overview](omics-master/overview)
## Debuggers
* A collection of development tools
### [Valgrind](debuggers/valgrind)
### [PAPI](debuggers/papi)
### [Allinea Forge (DDT, MAP)](debuggers/allinea-ddt)
### [Total View](debuggers/total-view)
### [CUBE](debuggers/cube)
### [Intel VTune Amplifier](debuggers/intel-vtune-amplifier)
### [VNC](debuggers/debuggers)
### [Scalasca](debuggers/scalasca)
### [Score-P](debuggers/score-p)
### [Intel Performance Counter Monitor](debuggers/intel-performance-counter-monitor)
### [Allinea Performance Reports](debuggers/allinea-performance-reports)
### [Vampir](debuggers/vampir)
## Numerical Languages
* Interpreted languages for numerical computations
### [Introduction](numerical-languages/introduction)
### [R](numerical-languages/r)
### [MATLAB 2013-2014](numerical-languages/matlab_1314)
### [MATLAB](numerical-languages/matlab)
### [Octave](numerical-languages/octave)
## Chemistry
* Tools for computational chemistry
### [Molpro](chemistry/molpro)
### [NWChem](chemistry/nwchem)
## ANSYS
* An engineering simulation software
### [Introduction](ansys/ansys)
### [ANSYS CFX](ansys/ansys-cfx)
### [ANSYS LS-DYNA](ansys/ansys-ls-dyna)
### [ANSYS MAPDL](ansys/ansys-mechanical-apdl)
### [LS-DYNA](ansys/ls-dyna)
### [ANSYS Fluent](ansys/ansys-fluent)
Intel Compilers
===============
The Intel compilers, version 13.1.1, are available via the intel module. The compilers include the icc C and C++ compiler and the ifort Fortran 77/90/95 compiler.
```bash
$ module load intel
$ icc -v
$ ifort -v
```
The Intel compilers provide vectorization of the code via the AVX instructions and support threading parallelization via OpenMP.
For maximum performance on the Anselm cluster, compile your programs using the AVX instructions, with reporting of where vectorization was used. We recommend the following compilation options for high performance:
```bash
$ icc -ipo -O3 -vec -xAVX -vec-report1 myprog.c mysubroutines.c -o myprog.x
$ ifort -ipo -O3 -vec -xAVX -vec-report1 myprog.f mysubroutines.f -o myprog.x
```
In this example, we compile the program enabling interprocedural optimizations between source files (-ipo), aggressive loop optimizations (-O3) and vectorization (-vec -xAVX).
The compiler recognizes the omp, simd, vector and ivdep pragmas for OpenMP parallelization and AVX vectorization. Enable OpenMP parallelization with the **-openmp** compiler switch.
```bash
$ icc -ipo -O3 -vec -xAVX -vec-report1 -openmp myprog.c mysubroutines.c -o myprog.x
$ ifort -ipo -O3 -vec -xAVX -vec-report1 -openmp myprog.f mysubroutines.f -o myprog.x
```
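To illustrate these pragmas, here is a minimal sketch, assuming a hypothetical source file vecadd.c: **ivdep** asserts that a loop carries no dependencies, and **omp parallel for** threads a loop. It can be compiled with the recommended options above, e.g. `icc -ipo -O3 -vec -xAVX -vec-report1 -openmp vecadd.c -o vecadd.x`.
```cpp
/* vecadd.c - hypothetical example, not part of the Intel documentation */
#include <stdio.h>

#define N 1000000

static double a[N], b[N], c[N];

int main()
{
    int i;

    /* ivdep asserts the loop carries no dependencies, so the
       compiler is free to vectorize it with AVX instructions. */
    #pragma ivdep
    for (i = 0; i < N; i++) {
        b[i] = i * 0.5;
        c[i] = i * 2.0;
    }

    /* OpenMP work-sharing: iterations are split across threads
       when compiled with the -openmp switch. */
    #pragma omp parallel for
    for (i = 0; i < N; i++)
        a[i] = b[i] + c[i];

    printf("a[100] = %f\n", a[100]);
    return 0;
}
```
With -vec-report1, the compiler reports which of these loops were vectorized.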
Read more at <http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/cpp-lin/index.htm>
Sandy Bridge/Haswell binary compatibility
-----------------------------------------
Anselm nodes are currently equipped with Sandy Bridge CPUs, while Salomon will use the Haswell architecture. The new processors are backward compatible with the Sandy Bridge nodes, so all programs that ran on the Sandy Bridge processors should also run on the new Haswell nodes. To get optimal performance out of the Haswell processors, a program should make use of the special AVX2 instructions of this processor. This can be done by recompiling the code with the compiler flags designated to invoke these instructions. For the Intel compiler suite, there are two ways of doing this:
- Using the compiler flag (both for Fortran and C): -xCORE-AVX2. This will create a binary with AVX2 instructions, specifically for the Haswell processors. Note that the executable will not run on Sandy Bridge nodes.
- Using the compiler flags (both for Fortran and C): -xAVX -axCORE-AVX2. This will generate multiple, feature-specific auto-dispatch code paths for Intel® processors, if there is a performance benefit, so the binary will run on both Sandy Bridge and Haswell processors. At runtime it is decided which path to follow, depending on the processor you are running on. In general this will result in larger binaries.
Intel TBB
=========
Intel Threading Building Blocks
-------------------------------
Intel Threading Building Blocks (Intel TBB) is a library that supports scalable parallel programming using standard ISO C++ code. It does not require special languages or compilers. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner. The tasks are executed by a runtime scheduler and may be offloaded to the [MIC accelerator](../intel-xeon-phi/).
Intel TBB version 4.1 is available on Anselm:
```bash
$ module load tbb
```
The module sets up the environment variables required for linking and running TBB enabled applications.
!!! Note "Note"
    Link the tbb library using -ltbb

Examples
--------
A number of examples demonstrating the use of TBB and its built-in scheduler are available on Anselm in the $TBB_EXAMPLES directory.
```bash
$ module load intel
$ module load tbb
$ cp -a $TBB_EXAMPLES/common $TBB_EXAMPLES/parallel_reduce /tmp/
$ cd /tmp/parallel_reduce/primes
$ icc -O2 -DNDEBUG -o primes.x main.cpp primes.cpp -ltbb
$ ./primes.x
```
In this example, we compile, link and run the primes example, demonstrating the use of parallel task-based reduction in the computation of prime numbers.
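For orientation, the following minimal sketch shows the tbb::parallel_reduce pattern that the primes example builds on; it is a hypothetical parallel vector sum, not code from $TBB_EXAMPLES:
```cpp
/* sum.cpp - hypothetical sketch of tbb::parallel_reduce */
#include <iostream>
#include <vector>
#include "tbb/parallel_reduce.h"
#include "tbb/blocked_range.h"

/* Body class for tbb::parallel_reduce: sums one sub-range of a vector. */
struct SumBody {
    const std::vector<double>& data;
    double sum;

    SumBody(const std::vector<double>& d) : data(d), sum(0.0) {}

    /* Splitting constructor: invoked when the scheduler divides the range
       into tasks. Each split-off body starts with an empty partial sum. */
    SumBody(SumBody& other, tbb::split) : data(other.data), sum(0.0) {}

    /* Accumulate over the sub-range assigned to this task. */
    void operator()(const tbb::blocked_range<size_t>& r) {
        for (size_t i = r.begin(); i != r.end(); ++i)
            sum += data[i];
    }

    /* Merge the partial result of a split-off body back in. */
    void join(SumBody& other) { sum += other.sum; }
};

int main()
{
    std::vector<double> v(1000000, 1.0);
    SumBody body(v);

    /* The scheduler recursively splits the range, runs the bodies as
       tasks on worker threads and joins the partial sums. */
    tbb::parallel_reduce(tbb::blocked_range<size_t>(0, v.size()), body);

    std::cout << "sum = " << body.sum << std::endl;
    return 0;
}
```
Compile and run it the same way as the primes example, e.g. `icc -O2 -DNDEBUG -o sum.x sum.cpp -ltbb`.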
You will need the tbb module loaded to run the tbb enabled executable. This may be avoided by compiling the library search paths into the executable:
```bash
$ icc -O2 -o primes.x main.cpp primes.cpp -Wl,-rpath=$LIBRARY_PATH -ltbb
```
Further reading
---------------
Read more on Intel website, <http://software.intel.com/sites/products/documentation/doclib/tbb_sa/help/index.htm>