From d6dc47d8cc4002b4be17409756799d9dee930773 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pavel=20Gajdu=C5=A1ek?= <gajdusek.pavel@gmail.com>
Date: Tue, 19 Sep 2017 14:10:58 +0200
Subject: [PATCH] intel-suite changed, bad links, to fix

---
 .../software/intel-suite/intel-advisor.md     |  31 ++++
 .../software/intel-suite/intel-compilers.md   |  36 +++++
 .../software/intel-suite/intel-debugger.md    |  73 +++++++++
 .../software/intel-suite/intel-inspector.md   |  39 +++++
 ...intel-integrated-performance-primitives.md |  78 ++++++++++
 docs.it4i/software/intel-suite/intel-mkl.md   | 120 +++++++++++++++
 .../intel-parallel-studio-introduction.md     |  69 +++++++++
 docs.it4i/software/intel-suite/intel-tbb.md   |  40 +++++
 .../intel-trace-analyzer-and-collector.md     |  40 +++++
 .../software/numerical-libraries/fftw.md      |  73 +++++++++
 docs.it4i/software/numerical-libraries/gsl.md | 145 ++++++++++++++++++
 .../software/numerical-libraries/hdf5.md      |  89 +++++++++++
 .../intel-numerical-libraries.md              |  33 ++++
 .../magma-for-intel-xeon-phi.md               |  76 +++++++++
 .../software/numerical-libraries/petsc.md     |  60 ++++++++
 .../software/numerical-libraries/trilinos.md  |  49 ++++++
 mkdocs.yml                                    |  43 +++---
 todelete                                      |  49 ++++++
 18 files changed, 1118 insertions(+), 25 deletions(-)
 create mode 100644 docs.it4i/software/intel-suite/intel-advisor.md
 create mode 100644 docs.it4i/software/intel-suite/intel-compilers.md
 create mode 100644 docs.it4i/software/intel-suite/intel-debugger.md
 create mode 100644 docs.it4i/software/intel-suite/intel-inspector.md
 create mode 100644 docs.it4i/software/intel-suite/intel-integrated-performance-primitives.md
 create mode 100644 docs.it4i/software/intel-suite/intel-mkl.md
 create mode 100644 docs.it4i/software/intel-suite/intel-parallel-studio-introduction.md
 create mode 100644 docs.it4i/software/intel-suite/intel-tbb.md
 create mode 100644 docs.it4i/software/intel-suite/intel-trace-analyzer-and-collector.md
 create mode 100644 docs.it4i/software/numerical-libraries/fftw.md
 create mode 100644 docs.it4i/software/numerical-libraries/gsl.md
 create mode 100644 docs.it4i/software/numerical-libraries/hdf5.md
 create mode 100644 docs.it4i/software/numerical-libraries/intel-numerical-libraries.md
 create mode 100644 docs.it4i/software/numerical-libraries/magma-for-intel-xeon-phi.md
 create mode 100644 docs.it4i/software/numerical-libraries/petsc.md
 create mode 100644 docs.it4i/software/numerical-libraries/trilinos.md

diff --git a/docs.it4i/software/intel-suite/intel-advisor.md b/docs.it4i/software/intel-suite/intel-advisor.md
new file mode 100644
index 000000000..688deda17
--- /dev/null
+++ b/docs.it4i/software/intel-suite/intel-advisor.md
@@ -0,0 +1,31 @@
# Intel Advisor

Intel Advisor is a tool that assists you with vectorization and threading of your code. You can use it to profile your application and identify loops that could benefit from vectorization and/or threading parallelism.

## Installed Versions

The following versions are currently available on Salomon as modules:

2016 Update 2 - Advisor/2016_update2

## Usage

Your program should be compiled with the -g switch to include symbol names. You should compile with -O2 or higher to see code that is already vectorized by the compiler.

Profiling is possible either directly from the GUI, or from the command line.

To profile from the GUI, launch Advisor:

```console
$ advixe-gui
```

Then select menu File -> New -> Project. Choose a directory to save project data to.
After clicking OK, the Project properties window will appear, where you can configure the path to your binary, launch arguments, working directory, etc. After clicking OK, the project is ready.

In the left pane, you can switch between the Vectorization and Threading workflows. Each has several possible steps, which you can execute by clicking the Collect button. Alternatively, you can click on Command Line to see the command required to run the analysis directly from the command line.

## References

1. [Intel® Advisor 2015 Tutorial: Find Where to Add Parallelism - C++ Sample](https://software.intel.com/en-us/intel-advisor-tutorial-vectorization-windows-cplusplus)
1. [Product page](https://software.intel.com/en-us/intel-advisor-xe)
1. [Documentation](https://software.intel.com/en-us/intel-advisor-2016-user-guide-linux)

diff --git a/docs.it4i/software/intel-suite/intel-compilers.md b/docs.it4i/software/intel-suite/intel-compilers.md
new file mode 100644
index 000000000..8e2ee714f
--- /dev/null
+++ b/docs.it4i/software/intel-suite/intel-compilers.md
@@ -0,0 +1,36 @@
# Intel Compilers

Multiple versions of the Intel compilers are available via the module intel. The compilers include the icc C and C++ compiler and the ifort Fortran 77/90/95 compiler.

```console
$ ml intel
$ icc -v
$ ifort -v
```

The Intel compilers provide vectorization of the code via the AVX2 instructions and support threading parallelization via OpenMP.

For maximum performance on the Salomon cluster compute nodes, compile your programs using the AVX2 instructions, with reporting where the vectorization was used. We recommend the following compilation options for high performance:

```console
$ icc -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec myprog.c mysubroutines.c -o myprog.x
$ ifort -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec myprog.f mysubroutines.f -o myprog.x
```

In this example, we compile the program enabling interprocedural optimizations between source files (-ipo), aggressive loop optimizations (-O3) and vectorization (-xCORE-AVX2).

The compiler recognizes the omp, simd, vector and ivdep pragmas for OpenMP parallelization and AVX2 vectorization. Enable the OpenMP parallelization with the **-openmp** compiler switch.

```console
$ icc -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec -openmp myprog.c mysubroutines.c -o myprog.x
$ ifort -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec -openmp myprog.f mysubroutines.f -o myprog.x
```

Read more at <https://software.intel.com/en-us/intel-cplusplus-compiler-16.0-user-and-reference-guide>

## Sandy Bridge/Ivy Bridge/Haswell Binary Compatibility

Anselm nodes are currently equipped with Sandy Bridge CPUs, while Salomon compute nodes are equipped with Haswell-based CPUs. The UV1 SMP compute server has Ivy Bridge CPUs, which are equivalent to Sandy Bridge (only with a smaller manufacturing process). The new processors are backward compatible with the Sandy Bridge nodes, so all programs that ran on the Sandy Bridge processors should also run on the new Haswell nodes. To get optimal performance out of the Haswell processors, a program should make use of the special AVX2 instructions for this processor. One can do this by recompiling codes with the compiler flags designated to invoke these instructions. For the Intel compiler suite, there are two ways of doing this:

* Using the compiler flag (both for Fortran and C): -xCORE-AVX2.
This will create a binary with AVX2 instructions, specifically for the Haswell processors. Note that the executable will not run on Sandy Bridge/Ivy Bridge nodes.
* Using the compiler flags (both for Fortran and C): -xAVX -axCORE-AVX2. This will generate multiple, feature-specific auto-dispatch code paths for Intel® processors, if there is a performance benefit. This binary will therefore run on both Sandy Bridge/Ivy Bridge and Haswell processors. At runtime, it will be decided which path to follow, depending on which processor you are running on. In general, this will result in larger binaries.

diff --git a/docs.it4i/software/intel-suite/intel-debugger.md b/docs.it4i/software/intel-suite/intel-debugger.md
new file mode 100644
index 000000000..9bd08cdcf
--- /dev/null
+++ b/docs.it4i/software/intel-suite/intel-debugger.md
@@ -0,0 +1,73 @@
# Intel Debugger

IDB is no longer available since Intel Parallel Studio 2015.

## Debugging Serial Applications

The Intel debugger version 13.0 is available via the module intel. The debugger works for applications compiled with the C and C++ compiler and the ifort Fortran 77/90/95 compiler. The debugger provides a Java GUI environment. Use [X display](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/) for running the GUI.

```console
$ ml intel
$ ml Java
$ idb
```

The debugger may run in text mode. To debug in text mode, use

```console
$ idbc
```

To debug on the compute nodes, the module intel must be loaded. The GUI on compute nodes may be accessed in the same way as described in [the GUI section](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/).

Example:

```console
$ qsub -q qexp -l select=1:ncpus=24 -X -I # use ncpus=16 for Anselm
  qsub: waiting for job 19654.srv11 to start
  qsub: job 19654.srv11 ready
$ ml intel
$ ml Java
$ icc -O0 -g myprog.c -o myprog.x
$ idb ./myprog.x
```

In this example, we allocate 1 full compute node, compile the program myprog.c with the debugging options -O0 -g, and run the idb debugger interactively on the myprog.x executable. The GUI access is via X11 port forwarding provided by the PBS workload manager.

## Debugging Parallel Applications

The Intel debugger is capable of debugging multithreaded and MPI parallel programs as well.

### Small Number of MPI Ranks

For debugging a small number of MPI ranks, you may execute and debug each rank in a separate xterm terminal (do not forget the [X display](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/)). Using Intel MPI, this may be done in the following way:

```console
$ qsub -q qexp -l select=2:ncpus=24 -X -I
  qsub: waiting for job 19654.srv11 to start
  qsub: job 19655.srv11 ready
$ ml intel
$ mpirun -ppn 1 -hostfile $PBS_NODEFILE --enable-x xterm -e idbc ./mympiprog.x
```

In this example, we allocate 2 full compute nodes, run xterm on each node and start the idb debugger in command line mode, debugging two ranks of the mympiprog.x application. An xterm will pop up for each rank, with the idb prompt ready. The example is not limited to the use of Intel MPI.

### Large Number of MPI Ranks

Run the idb debugger from within the MPI debug option. This will cause the debugger to bind to all ranks and provide aggregated outputs across the ranks, pausing execution automatically just after startup. You may then set breakpoints and step the execution manually.
Using Intel MPI:

```console
$ qsub -q qexp -l select=2:ncpus=24 -X -I
  qsub: waiting for job 19654.srv11 to start
  qsub: job 19655.srv11 ready
$ ml intel
$ mpirun -n 48 -idb ./mympiprog.x
```

### Debugging Multithreaded Application

Run the idb debugger in GUI mode. The menu Parallel contains a number of tools for debugging multiple threads. One of the most useful tools is the **Serialize Execution** tool, which serializes the execution of concurrent threads for easy orientation and identification of concurrency-related bugs.

## Further Information

An exhaustive manual on idb features and usage is published at the Intel website: <https://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/>

diff --git a/docs.it4i/software/intel-suite/intel-inspector.md b/docs.it4i/software/intel-suite/intel-inspector.md
new file mode 100644
index 000000000..bd2989238
--- /dev/null
+++ b/docs.it4i/software/intel-suite/intel-inspector.md
@@ -0,0 +1,39 @@
# Intel Inspector

Intel Inspector is a dynamic memory and threading error checking tool for C/C++/Fortran applications. It can detect issues such as memory leaks, invalid memory references, uninitialized variables, race conditions, deadlocks, etc.

## Installed Versions

The following versions are currently available on Salomon as modules:

2016 Update 1 - Inspector/2016_update1

## Usage

Your program should be compiled with the -g switch to include symbol names. Optimizations can be turned on.

Debugging is possible either directly from the GUI, or from the command line.

### GUI Mode

To debug from the GUI, launch Inspector:

```console
$ inspxe-gui &
```

Then select menu File -> New -> Project. Choose a directory to save project data to. After clicking OK, the Project properties window will appear, where you can configure the path to your binary, launch arguments, working directory, etc. After clicking OK, the project is ready.

In the main pane, you can start a predefined analysis type or define your own. Click Start to start the analysis. Alternatively, you can click on Command Line to see the command line required to run the analysis directly from the command line.

### Batch Mode

The analysis can also be run from the command line in batch mode. Batch mode analysis is run with the command inspxe-cl. To obtain the required parameters, either consult the documentation, or configure the analysis in the GUI and then click the "Command Line" button in the lower right corner to see the respective command line.

Results obtained from batch mode can then be viewed in the GUI by selecting File -> Open -> Result...

## References

1. [Product page](https://software.intel.com/en-us/intel-inspector-xe)
1. [Documentation and Release Notes](https://software.intel.com/en-us/intel-inspector-xe-support/documentation)
1. [Tutorials](https://software.intel.com/en-us/articles/inspectorxe-tutorials)

diff --git a/docs.it4i/software/intel-suite/intel-integrated-performance-primitives.md b/docs.it4i/software/intel-suite/intel-integrated-performance-primitives.md
new file mode 100644
index 000000000..a47233367
--- /dev/null
+++ b/docs.it4i/software/intel-suite/intel-integrated-performance-primitives.md
@@ -0,0 +1,78 @@
# Intel IPP

## Intel Integrated Performance Primitives

Intel Integrated Performance Primitives, version 9.0.1, compiled for AVX2 vector instructions, is available via the module ipp. The IPP is a very rich library of highly optimized algorithmic building blocks for media and data applications.
This includes signal, image and frame processing algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax, as well as cryptographic functions, linear algebra functions and many more.

Check out IPP before implementing your own math functions for data processing; what you need is likely already there.

```console
$ ml ipp
```

The module sets up environment variables required for linking and running IPP-enabled applications.

## IPP Example

```cpp
#include "ipp.h"
#include <stdio.h>
int main(int argc, char* argv[])
{
    const IppLibraryVersion *lib;
    Ipp64u fm;
    IppStatus status;

    status = ippInit();            //IPP initialization with the best optimization layer
    if( status != ippStsNoErr ) {
        printf("IppInit() Error:\n");
        printf("%s\n", ippGetStatusString(status) );
        return -1;
    }

    //Get version info
    lib = ippiGetLibVersion();
    printf("%s %s\n", lib->Name, lib->Version);

    //Get CPU features enabled with selected library level
    fm = ippGetEnabledCpuFeatures();
    printf("SSE    :%c\n", (fm>>1)&1 ? 'Y' : 'N');
    printf("SSE2   :%c\n", (fm>>2)&1 ? 'Y' : 'N');
    printf("SSE3   :%c\n", (fm>>3)&1 ? 'Y' : 'N');
    printf("SSSE3  :%c\n", (fm>>4)&1 ? 'Y' : 'N');
    printf("SSE41  :%c\n", (fm>>6)&1 ? 'Y' : 'N');
    printf("SSE42  :%c\n", (fm>>7)&1 ? 'Y' : 'N');
    printf("AVX    :%c\n", (fm>>8)&1 ? 'Y' : 'N');
    printf("AVX2   :%c\n", (fm>>15)&1 ? 'Y' : 'N');
    printf("----------\n");
    printf("OS Enabled AVX :%c\n", (fm>>9)&1 ? 'Y' : 'N');
    printf("AES            :%c\n", (fm>>10)&1 ? 'Y' : 'N');
    printf("CLMUL          :%c\n", (fm>>11)&1 ? 'Y' : 'N');
    printf("RDRAND         :%c\n", (fm>>13)&1 ? 'Y' : 'N');
    printf("F16C           :%c\n", (fm>>14)&1 ? 'Y' : 'N');

    return 0;
}
```

Compile the above example using any compiler and the ipp module.

```console
$ ml intel
$ ml ipp
$ icc testipp.c -o testipp.x -lippi -lipps -lippcore
```

You will need the ipp module loaded to run the IPP-enabled executable. This may be avoided by compiling the library search paths into the executable:

```console
$ ml intel
$ ml ipp
$ icc testipp.c -o testipp.x -Wl,-rpath=$LIBRARY_PATH -lippi -lipps -lippcore
```

## Code Samples and Documentation

Intel provides a number of [Code Samples for IPP](https://software.intel.com/en-us/articles/code-samples-for-intel-integrated-performance-primitives-library), illustrating the use of IPP.

Read the full documentation on IPP [on the Intel website](http://software.intel.com/sites/products/search/search.php?q=&x=15&y=6&product=ipp&version=7.1&docos=lin), in particular the [IPP Reference manual](http://software.intel.com/sites/products/documentation/doclib/ipp_sa/71/ipp_manual/index.htm).

diff --git a/docs.it4i/software/intel-suite/intel-mkl.md b/docs.it4i/software/intel-suite/intel-mkl.md
new file mode 100644
index 000000000..2053e958b
--- /dev/null
+++ b/docs.it4i/software/intel-suite/intel-mkl.md
@@ -0,0 +1,120 @@
# Intel MKL

## Intel Math Kernel Library

Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, extensively threaded and optimized for maximum performance. Intel MKL provides these basic math kernels:

* BLAS (level 1, 2, and 3) and LAPACK linear algebra routines, offering vector, vector-matrix, and matrix-matrix operations.
* The PARDISO direct sparse solver, an iterative sparse solver, and supporting sparse BLAS (level 1, 2, and 3) routines for solving sparse systems of equations.
* ScaLAPACK distributed processing linear algebra routines for Linux and Windows operating systems, as well as the Basic Linear Algebra Communications Subprograms (BLACS) and the Parallel Basic Linear Algebra Subprograms (PBLAS).
* Fast Fourier transform (FFT) functions in one, two, or three dimensions with support for mixed radices (not limited to sizes that are powers of 2), as well as distributed versions of these functions.
* Vector Math Library (VML) routines for optimized mathematical operations on vectors.
* Vector Statistical Library (VSL) routines, which offer high-performance vectorized random number generators (RNG) for several probability distributions, convolution and correlation routines, and summary statistics functions.
* Data Fitting Library, which provides capabilities for spline-based approximation of functions, derivatives and integrals of functions, and search.
* Extended Eigensolver, a shared memory version of an eigensolver based on the Feast Eigenvalue Solver.

For details, see the [Intel MKL Reference Manual](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mklman/index.htm).

Intel MKL is available on the cluster:

```console
$ ml av imkl
$ ml imkl
```

The module sets up environment variables required for linking and running MKL-enabled applications. The most important variables are $MKLROOT, $CPATH, $LD_LIBRARY_PATH and $MKL_EXAMPLES.

The Intel MKL library may be linked using any compiler. With the Intel compiler, use the -mkl option to link the default threaded MKL.

### Interfaces

The Intel MKL library provides a number of interfaces. The fundamental ones are LP64 and ILP64. The Intel MKL ILP64 libraries use the 64-bit integer type (necessary for indexing large arrays, with more than 2^31 - 1 elements), whereas the LP64 libraries index arrays with the 32-bit integer type.

| Interface | Integer type                                 |
| --------- | -------------------------------------------- |
| LP64      | 32-bit, int, integer(kind=4), MPI_INT        |
| ILP64     | 64-bit, long int, integer(kind=8), MPI_INT64 |

### Linking

Linking Intel MKL libraries may be complex. The Intel [mkl link line advisor](http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor) helps. See also the [examples](intel-mkl/#examples) below.

You will need the mkl module loaded to run the MKL-enabled executable. This may be avoided by compiling the library search paths into the executable. Include the rpath on the compile line:

```console
$ icc .... -Wl,-rpath=$LIBRARY_PATH ...
```

### Threading

An advantage of using the Intel MKL library is that it brings threaded parallelization to applications that are otherwise not parallel.

For this to work, the application must link the threaded MKL library (the default). The number and behaviour of MKL threads may be controlled via the OpenMP environment variables, such as OMP_NUM_THREADS and KMP_AFFINITY. MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS.

```console
$ export OMP_NUM_THREADS=24 # 16 for Anselm
$ export KMP_AFFINITY=granularity=fine,compact,1,0
```

The application will run with 24 threads with affinity optimized for fine grain parallelization.

## Examples

A number of examples demonstrating the use of the Intel MKL library and its linking are available on the clusters, in the $MKL_EXAMPLES directory. In the examples below, we demonstrate linking Intel MKL to Intel and GNU compiled programs for multi-threaded matrix multiplication.

### Working With Examples

```console
$ ml intel
$ ml imkl
$ cp -a $MKL_EXAMPLES/cblas /tmp/
$ cd /tmp/cblas
$ make sointel64 function=cblas_dgemm
```

In this example, we compile, link and run the cblas_dgemm example, demonstrating the use of the MKL example suite installed on the clusters.
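The same cblas_dgemm routine can also be called directly from your own code. The following is a minimal, self-contained sketch (not part of the MKL example suite) that multiplies two 2x2 matrices through the LP64 CBLAS interface:

```cpp
/* Illustrative sketch, not from the MKL example suite */
#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Row-major 2x2 matrices: C = 1.0 * A * B + 0.0 * C */
    double A[4] = {1.0, 2.0, 3.0, 4.0};
    double B[4] = {5.0, 6.0, 7.0, 8.0};
    double C[4] = {0.0, 0.0, 0.0, 0.0};

    /* M = N = K = 2; leading dimensions are 2 for row-major storage */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2);

    printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);
    return 0;
}
```

Such a file can be compiled with the Intel compiler and the -mkl option, as shown in the example below.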
### Example: MKL and Intel Compiler

```console
$ ml intel
$ ml imkl
$ cp -a $MKL_EXAMPLES/cblas /tmp/
$ cd /tmp/cblas
$
$ icc -w source/cblas_dgemmx.c source/common_func.c -mkl -o cblas_dgemmx.x
$ ./cblas_dgemmx.x data/cblas_dgemmx.d
```

In this example, we compile, link and run the cblas_dgemm example, demonstrating the use of MKL with the icc -mkl option. Using the -mkl option is equivalent to:

```console
$ icc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x -I$MKL_INC_DIR -L$MKL_LIB_DIR -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5
```

In this example, we compile and link the cblas_dgemm example, using the LP64 interface to threaded MKL and the Intel OpenMP threads implementation.

### Example: Intel MKL and GNU Compiler

```console
$ ml GCC
$ ml imkl
$ cp -a $MKL_EXAMPLES/cblas /tmp/
$ cd /tmp/cblas
$ gcc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lm
$ ./cblas_dgemmx.x data/cblas_dgemmx.d
```

In this example, we compile, link and run the cblas_dgemm example, using the LP64 interface to threaded MKL and the GNU OpenMP threads implementation.

## MKL and MIC Accelerators

Intel MKL is capable of automatically offloading the computations to the MIC accelerator. See the section [Intel Xeon Phi](../intel-xeon-phi/) for details.

## LAPACKE C Interface

MKL includes the LAPACKE C Interface to LAPACK. Although Intel is the author of LAPACKE, the LAPACKE header files are, for some reason, not present in MKL. We have therefore prepared a LAPACKE module, which includes Intel's LAPACKE headers from the official LAPACK, and which you can use to compile code using the LAPACKE interface against MKL.

## Further Reading

Read more on the [Intel website](http://software.intel.com/en-us/intel-mkl), in particular the [MKL users guide](https://software.intel.com/en-us/intel-mkl/documentation/linux).

diff --git a/docs.it4i/software/intel-suite/intel-parallel-studio-introduction.md b/docs.it4i/software/intel-suite/intel-parallel-studio-introduction.md
new file mode 100644
index 000000000..7b6ba956b
--- /dev/null
+++ b/docs.it4i/software/intel-suite/intel-parallel-studio-introduction.md
@@ -0,0 +1,69 @@
# Intel Parallel Studio

The Salomon cluster provides the following elements of the Intel Parallel Studio XE:

* Intel Compilers
* Intel Debugger
* Intel MKL Library
* Intel Integrated Performance Primitives Library
* Intel Threading Building Blocks Library
* Intel Trace Analyzer and Collector
* Intel Advisor
* Intel Inspector

## Intel Compilers

The Intel compilers are available via the module intel. The compilers include the icc C and C++ compiler and the ifort Fortran 77/90/95 compiler.

```console
$ ml intel
$ icc -v
$ ifort -v
```

Read more at the [Intel Compilers](intel-compilers/) page.

## Intel Debugger

IDB is no longer available since Parallel Studio 2015.

The Intel debugger version 13.0 is available via the module intel. The debugger works for applications compiled with the C and C++ compiler and the ifort Fortran 77/90/95 compiler. The debugger provides a Java GUI environment.

```console
$ ml intel
$ idb
```

Read more at the [Intel Debugger](intel-debugger/) page.

## Intel Math Kernel Library

Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, extensively threaded and optimized for maximum performance.
Intel MKL unites and provides these basic components: BLAS, LAPACK, ScaLAPACK, PARDISO, FFT, VML, VSL, Data Fitting, Feast Eigensolver and many more.

```console
$ ml imkl
```

Read more at the [Intel MKL](intel-mkl/) page.

## Intel Integrated Performance Primitives

Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX, is available via the module ipp. The IPP is a library of highly optimized algorithmic building blocks for media and data applications. This includes signal, image and frame processing algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax and many more.

```console
$ ml ipp
```

Read more at the [Intel IPP](intel-integrated-performance-primitives/) page.

## Intel Threading Building Blocks

Intel Threading Building Blocks (Intel TBB) is a library that supports scalable parallel programming using standard ISO C++ code. It does not require special languages or compilers. It is designed to promote scalable data parallel programming. Additionally, it fully supports nested parallelism, so you can build larger parallel components from smaller parallel components. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner.

```console
$ ml tbb
```

Read more at the [Intel TBB](intel-tbb/) page.

diff --git a/docs.it4i/software/intel-suite/intel-tbb.md b/docs.it4i/software/intel-suite/intel-tbb.md
new file mode 100644
index 000000000..59976aa7e
--- /dev/null
+++ b/docs.it4i/software/intel-suite/intel-tbb.md
@@ -0,0 +1,40 @@
# Intel TBB

## Intel Threading Building Blocks

Intel Threading Building Blocks (Intel TBB) is a library that supports scalable parallel programming using standard ISO C++ code. It does not require special languages or compilers. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner. The tasks are executed by a runtime scheduler and may be offloaded to the [MIC accelerator](../intel-xeon-phi/).

Intel TBB is available on the cluster.

```console
$ ml av tbb
```

The module sets up environment variables required for linking and running TBB-enabled applications.

Link the TBB library using -ltbb.

## Examples

A number of examples demonstrating the use of TBB and its built-in scheduler are available on Anselm, in the $TBB_EXAMPLES directory.

```console
$ ml intel
$ ml tbb
$ cp -a $TBB_EXAMPLES/common $TBB_EXAMPLES/parallel_reduce /tmp/
$ cd /tmp/parallel_reduce/primes
$ icc -O2 -DNDEBUG -o primes.x main.cpp primes.cpp -ltbb
$ ./primes.x
```

In this example, we compile, link and run the primes example, demonstrating the use of parallel task-based reduce in the computation of prime numbers.

You will need the tbb module loaded to run the TBB-enabled executable. This may be avoided by compiling the library search paths into the executable:
```console
$ icc -O2 -o primes.x main.cpp primes.cpp -Wl,-rpath=$LIBRARY_PATH -ltbb
```

## Further Reading

Read more on the Intel website: <http://software.intel.com/sites/products/documentation/doclib/tbb_sa/help/index.htm>

diff --git a/docs.it4i/software/intel-suite/intel-trace-analyzer-and-collector.md b/docs.it4i/software/intel-suite/intel-trace-analyzer-and-collector.md
new file mode 100644
index 000000000..9cae361ca
--- /dev/null
+++ b/docs.it4i/software/intel-suite/intel-trace-analyzer-and-collector.md
@@ -0,0 +1,40 @@
# Intel Trace Analyzer and Collector

Intel Trace Analyzer and Collector (ITAC) is a tool to collect and graphically analyze the behaviour of MPI applications. It helps you to analyze the communication patterns of your application, identify hotspots, perform correctness checking (identify deadlocks, data corruption, etc.), and simulate how your application would run on a different interconnect.

ITAC is an offline analysis tool - first you run your application to collect a trace file, then you can open the trace in a GUI analyzer to view it.

## Installed Version

Version 9.1.2.024 is currently available on Salomon as the module itac/9.1.2.024.

## Collecting Traces

ITAC can collect traces from applications that are using Intel MPI. To generate a trace, simply add the -trace option to your mpirun command:

```console
$ ml itac/9.1.2.024
$ mpirun -trace myapp
```

The trace will be saved in the file myapp.stf in the current directory.

## Viewing Traces

To view and analyze the trace, open the ITAC GUI in a [graphical environment](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/):

```console
$ ml itac/9.1.2.024
$ traceanalyzer
```

The GUI will launch and you can open the produced `*`.stf file.

Please refer to the Intel documentation for the usage of the GUI tool.

## References

1. [Getting Started with Intel® Trace Analyzer and Collector](https://software.intel.com/en-us/get-started-with-itac-for-linux)
1. [Intel® Trace Analyzer and Collector - Documentation](https://software.intel.com/en-us/intel-trace-analyzer)

diff --git a/docs.it4i/software/numerical-libraries/fftw.md b/docs.it4i/software/numerical-libraries/fftw.md
new file mode 100644
index 000000000..7345a8116
--- /dev/null
+++ b/docs.it4i/software/numerical-libraries/fftw.md
@@ -0,0 +1,73 @@
# FFTW

The discrete Fourier transform in one or more dimensions, MPI parallel.

FFTW is a C subroutine library for computing the discrete Fourier transform in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, e.g. the discrete cosine/sine transforms or DCT/DST). The FFTW library allows for MPI parallel, in-place discrete Fourier transforms, with data distributed over a number of nodes.

Two versions, **3.3.3** and **2.1.5**, of FFTW are available on Anselm, each compiled for **Intel MPI** and **OpenMPI** using **intel** and **gnu** compilers.
These are available via modules:

| Version         | Parallelization | module              | linker options                        |
| --------------- | --------------- | ------------------- | ------------------------------------- |
| FFTW3 gcc 3.3.3 | pthread, OpenMP | fftw3/3.3.3-gcc     | -lfftw3, -lfftw3_threads, -lfftw3_omp |
| FFTW3 icc 3.3.3 | pthread, OpenMP | fftw3               | -lfftw3, -lfftw3_threads, -lfftw3_omp |
| FFTW2 gcc 2.1.5 | pthread         | fftw2/2.1.5-gcc     | -lfftw, -lfftw_threads                |
| FFTW2 icc 2.1.5 | pthread         | fftw2               | -lfftw, -lfftw_threads                |
| FFTW3 gcc 3.3.3 | OpenMPI         | fftw-mpi3/3.3.3-gcc | -lfftw3_mpi                           |
| FFTW3 icc 3.3.3 | Intel MPI       | fftw3-mpi           | -lfftw3_mpi                           |
| FFTW2 gcc 2.1.5 | OpenMPI         | fftw2-mpi/2.1.5-gcc | -lfftw_mpi                            |
| FFTW2 gcc 2.1.5 | IntelMPI        | fftw2-mpi/2.1.5-gcc | -lfftw_mpi                            |

```console
$ ml fftw3 **or** ml FFTW
```

The module sets up environment variables required for linking and running FFTW-enabled applications. Make sure that the choice of the FFTW module is consistent with your choice of MPI library. Mixing MPI of different implementations may have unpredictable results.

## Example

```cpp
    #include <fftw3-mpi.h>
    int main(int argc, char **argv)
    {
        const ptrdiff_t N0 = 100, N1 = 1000;
        fftw_plan plan;
        fftw_complex *data;
        ptrdiff_t alloc_local, local_n0, local_0_start, i, j;

        MPI_Init(&argc, &argv);
        fftw_mpi_init();

        /* get local data size and allocate */
        alloc_local = fftw_mpi_local_size_2d(N0, N1, MPI_COMM_WORLD,
                                             &local_n0, &local_0_start);
        data = fftw_alloc_complex(alloc_local);

        /* create plan for in-place forward DFT */
        plan = fftw_mpi_plan_dft_2d(N0, N1, data, data, MPI_COMM_WORLD,
                                    FFTW_FORWARD, FFTW_ESTIMATE);

        /* initialize data */
        for (i = 0; i < local_n0; ++i) for (j = 0; j < N1; ++j)
        {   data[i*N1 + j][0] = i;
            data[i*N1 + j][1] = j; }

        /* compute transforms, in-place, as many times as desired */
        fftw_execute(plan);

        fftw_destroy_plan(plan);

        MPI_Finalize();
    }
```

Load modules and compile:

```console
$ ml intel
$ ml fftw3-mpi
$ mpicc testfftw3mpi.c -o testfftw3mpi.x -Wl,-rpath=$LIBRARY_PATH -lfftw3_mpi
```

Run the example as an [Intel MPI program](../mpi/running-mpich2/).

Read more on FFTW usage on the [FFTW website](http://www.fftw.org/fftw3_doc/).

diff --git a/docs.it4i/software/numerical-libraries/gsl.md b/docs.it4i/software/numerical-libraries/gsl.md
new file mode 100644
index 000000000..3299492dd
--- /dev/null
+++ b/docs.it4i/software/numerical-libraries/gsl.md
@@ -0,0 +1,145 @@
# GSL

The GNU Scientific Library. Provides a wide range of mathematical routines.

## Introduction

The GNU Scientific Library (GSL) provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting. There are over 1000 functions in total. The routines have been written from scratch in C, and present a modern Applications Programming Interface (API) for C programmers, allowing wrappers to be written for very high level languages.

The library covers a wide range of topics in numerical computing.
Routines are available for the following areas:

* Complex Numbers
* Roots of Polynomials
* Special Functions
* Vectors and Matrices
* Permutations
* Combinations
* Sorting
* BLAS Support
* Linear Algebra
* CBLAS Library
* Fast Fourier Transforms
* Eigensystems
* Random Numbers
* Quadrature
* Random Distributions
* Quasi-Random Sequences
* Histograms
* Statistics
* Monte Carlo Integration
* N-Tuples
* Differential Equations
* Simulated Annealing
* Numerical Differentiation
* Interpolation
* Series Acceleration
* Chebyshev Approximations
* Root-Finding
* Discrete Hankel Transforms
* Least-Squares Fitting
* Minimization
* IEEE Floating-Point
* Physical Constants
* Basis Splines
* Wavelets

## Modules

The GSL 1.16 is available on Anselm, compiled for the GNU and Intel compilers. These variants are available via modules:

| Module                | Compiler  |
| --------------------- | --------- |
| gsl/1.16-gcc          | gcc 4.8.6 |
| gsl/1.16-icc(default) | icc       |

```console
$ ml gsl
```

The module sets up environment variables required for linking and running GSL-enabled applications. This particular command loads the default module, which is gsl/1.16-icc.

## Linking

Load an appropriate gsl module. Link using the **-lgsl** switch to link your code against GSL. GSL depends on a CBLAS API to a BLAS library, which must be supplied for linking. The BLAS may be provided, for example, by the MKL library, as well as by the GSL BLAS library (-lgslcblas). Using the MKL is recommended.

### Compiling and Linking With Intel Compilers

```console
$ ml intel
$ ml gsl
$ icc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -mkl -lgsl
```

### Compiling and Linking With GNU Compilers

```console
$ ml gcc
$ ml imkl **or** ml mkl
$ ml gsl/1.16-gcc
$ gcc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lgsl
```

## Example

The following is an example of a discrete wavelet transform implemented with GSL:

```cpp
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <gsl/gsl_sort.h>
    #include <gsl/gsl_wavelet.h>

    int
    main (int argc, char **argv)
    {
      int i, n = 256, nc = 20;
      double *data = malloc (n * sizeof (double));
      double *abscoeff = malloc (n * sizeof (double));
      size_t *p = malloc (n * sizeof (size_t));

      gsl_wavelet *w;
      gsl_wavelet_workspace *work;

      w = gsl_wavelet_alloc (gsl_wavelet_daubechies, 4);
      work = gsl_wavelet_workspace_alloc (n);

      for (i = 0; i < n; i++)
        data[i] = sin (3.141592654 * (double) i / 256.0);

      gsl_wavelet_transform_forward (w, data, 1, n, work);

      for (i = 0; i < n; i++)
        {
          abscoeff[i] = fabs (data[i]);
        }

      gsl_sort_index (p, abscoeff, 1, n);

      for (i = 0; (i + nc) < n; i++)
        data[p[i]] = 0;

      gsl_wavelet_transform_inverse (w, data, 1, n, work);

      for (i = 0; i < n; i++)
        {
          printf ("%g\n", data[i]);
        }

      gsl_wavelet_free (w);
      gsl_wavelet_workspace_free (work);

      free (data);
      free (abscoeff);
      free (p);
      return 0;
    }
```

Load modules and compile:

```console
$ ml intel
$ ml gsl
$ icc dwt.c -o dwt.x -Wl,-rpath=$LIBRARY_PATH -mkl -lgsl
```

In this example, we compile the dwt.c code using the Intel compiler and link it to the MKL and GSL library; note the -mkl and -lgsl options. The library search path is compiled in, so that no modules are necessary to run the code.
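The binary can then be executed directly. Since n = 256 in the source, the program prints 256 reconstructed values to stdout; a quick sanity check might look like this (assuming the dwt.x binary built above):

```console
$ ./dwt.x > dwt.out   # assumes dwt.x was built as shown above
$ wc -l dwt.out
256 dwt.out
```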
diff --git a/docs.it4i/software/numerical-libraries/hdf5.md b/docs.it4i/software/numerical-libraries/hdf5.md
new file mode 100644
index 000000000..13f626264
--- /dev/null
+++ b/docs.it4i/software/numerical-libraries/hdf5.md
@@ -0,0 +1,89 @@
# HDF5

Hierarchical Data Format library. Serial and MPI parallel version.

[HDF5 (Hierarchical Data Format)](http://www.hdfgroup.org/HDF5/) is a general purpose library and file format for storing scientific data. HDF5 can store two primary objects: datasets and groups. A dataset is essentially a multidimensional array of data elements, and a group is a structure for organizing objects in an HDF5 file. Using these two basic objects, one can create and store almost any kind of scientific data structure, such as images, arrays of vectors, and structured and unstructured grids. You can also mix and match them in HDF5 files according to your needs.

Versions **1.8.11** and **1.8.13** of the HDF5 library are available on Anselm, compiled for **Intel MPI** and **OpenMPI** using **intel** and **gnu** compilers. These are available via modules:

| Version               | Parallelization                   | module                     | C linker options      | C++ linker options      | Fortran linker options  |
| --------------------- | --------------------------------- | -------------------------- | --------------------- | ----------------------- | ----------------------- |
| HDF5 icc serial       | pthread                           | hdf5/1.8.11                | $HDF5_INC $HDF5_SHLIB | $HDF5_INC $HDF5_CPP_LIB | $HDF5_INC $HDF5_F90_LIB |
| HDF5 icc parallel MPI | pthread, IntelMPI                 | hdf5-parallel/1.8.11       | $HDF5_INC $HDF5_SHLIB | Not supported           | $HDF5_INC $HDF5_F90_LIB |
| HDF5 icc serial       | pthread                           | hdf5/1.8.13                | $HDF5_INC $HDF5_SHLIB | $HDF5_INC $HDF5_CPP_LIB | $HDF5_INC $HDF5_F90_LIB |
| HDF5 icc parallel MPI | pthread, IntelMPI                 | hdf5-parallel/1.8.13       | $HDF5_INC $HDF5_SHLIB | Not supported           | $HDF5_INC $HDF5_F90_LIB |
| HDF5 gcc parallel MPI | pthread, OpenMPI 1.6.5, gcc 4.8.1 | hdf5-parallel/1.8.11-gcc   | $HDF5_INC $HDF5_SHLIB | Not supported           | $HDF5_INC $HDF5_F90_LIB |
| HDF5 gcc parallel MPI | pthread, OpenMPI 1.6.5, gcc 4.8.1 | hdf5-parallel/1.8.13-gcc   | $HDF5_INC $HDF5_SHLIB | Not supported           | $HDF5_INC $HDF5_F90_LIB |
| HDF5 gcc parallel MPI | pthread, OpenMPI 1.8.1, gcc 4.9.0 | hdf5-parallel/1.8.13-gcc49 | $HDF5_INC $HDF5_SHLIB | Not supported           | $HDF5_INC $HDF5_F90_LIB |

```console
$ ml hdf5-parallel
```

The module sets up environment variables required for linking and running HDF5-enabled applications. Make sure that the choice of the HDF5 module is consistent with your choice of MPI library. Mixing MPI of different implementations may have unpredictable results.

!!! note
    Be aware that the GCC version of **HDF5 1.8.11** has serious performance issues, since it is compiled with the -O0 optimization flag. This version is provided only for testing code compiled by GCC and IS NOT recommended for production computations. For more information, please see: <http://www.hdfgroup.org/ftp/HDF5/prev-releases/ReleaseFiles/release5-1811>

    The GCC versions of **HDF5 1.8.13** are not affected by this issue; they are compiled with -O3 optimizations and are recommended for production computations.

## Example

```cpp
    #include "hdf5.h"
    #define FILE "dset.h5"

    int main() {

       hid_t file_id, dataset_id, dataspace_id; /* identifiers */
       hsize_t dims[2];
       herr_t status;
       int i, j, dset_data[4][6];

       /* Create a new file using default properties. */
       file_id = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

       /* Create the data space for the dataset. */
       dims[0] = 4;
       dims[1] = 6;
       dataspace_id = H5Screate_simple(2, dims, NULL);

       /* Initialize the dataset. */
       for (i = 0; i < 4; i++)
          for (j = 0; j < 6; j++)
             dset_data[i][j] = i * 6 + j + 1;

       /* Create the dataset. */
       dataset_id = H5Dcreate2(file_id, "/dset", H5T_STD_I32BE, dataspace_id,
                               H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

       /* Write the dataset. */
       status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
                         dset_data);

       status = H5Dread(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
                        dset_data);

       /* End access to the dataset and release resources used by it. */
       status = H5Dclose(dataset_id);

       /* Terminate access to the data space. */
       status = H5Sclose(dataspace_id);

       /* Close the file. */
       status = H5Fclose(file_id);
    }
```

Load modules and compile:

```console
$ ml intel
$ ml hdf5-parallel
$ mpicc hdf5test.c -o hdf5test.x -Wl,-rpath=$LIBRARY_PATH $HDF5_INC $HDF5_SHLIB
```

Run the example as an [Intel MPI program](../mpi/running-mpich2/).

For further information, please see the website: <http://www.hdfgroup.org/HDF5/>

diff --git a/docs.it4i/software/numerical-libraries/intel-numerical-libraries.md b/docs.it4i/software/numerical-libraries/intel-numerical-libraries.md
new file mode 100644
index 000000000..5f3834ffa
--- /dev/null
+++ b/docs.it4i/software/numerical-libraries/intel-numerical-libraries.md
@@ -0,0 +1,33 @@
# Intel Numerical Libraries

Intel libraries for high performance in numerical computing.

## Intel Math Kernel Library

Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, extensively threaded and optimized for maximum performance. Intel MKL unites and provides these basic components: BLAS, LAPACK, ScaLAPACK, PARDISO, FFT, VML, VSL, Data Fitting, Feast Eigensolver and many more.

```console
$ ml mkl **or** ml imkl
```

Read more at the [Intel MKL](../intel-suite/intel-mkl/) page.

## Intel Integrated Performance Primitives

Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX, is available via the module ipp. The IPP is a library of highly optimized algorithmic building blocks for media and data applications. This includes signal, image and frame processing algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax and many more.

```console
$ ml ipp
```

Read more at the [Intel IPP](../intel-suite/intel-integrated-performance-primitives/) page.

## Intel Threading Building Blocks

Intel Threading Building Blocks (Intel TBB) is a library that supports scalable parallel programming using standard ISO C++ code. It does not require special languages or compilers. It is designed to promote scalable data parallel programming. Additionally, it fully supports nested parallelism, so you can build larger parallel components from smaller parallel components. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner; a minimal sketch is shown below.

```console
$ ml tbb
```

Read more at the [Intel TBB](../intel-suite/intel-tbb/) page.
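To illustrate the task-based model described above, here is a minimal sketch using tbb::parallel_for (a hypothetical example.cpp, not part of the installed examples; compile with the tbb module loaded, using -std=c++11 and -ltbb):

```cpp
// Hypothetical sketch illustrating TBB's task-based loop parallelism
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
#include <vector>
#include <cstdio>

int main()
{
    std::vector<double> v(1000000, 1.0);

    // We specify the work (a range and a body); the TBB runtime splits the
    // range into tasks and maps them onto worker threads for us.
    tbb::parallel_for(tbb::blocked_range<size_t>(0, v.size()),
        [&](const tbb::blocked_range<size_t> &r) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                v[i] *= 2.0;
        });

    std::printf("v[0] = %g\n", v[0]);
    return 0;
}
```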
diff --git a/docs.it4i/software/numerical-libraries/magma-for-intel-xeon-phi.md b/docs.it4i/software/numerical-libraries/magma-for-intel-xeon-phi.md
new file mode 100644
index 000000000..64c443796
--- /dev/null
+++ b/docs.it4i/software/numerical-libraries/magma-for-intel-xeon-phi.md
@@ -0,0 +1,76 @@
# MAGMA for Intel Xeon Phi

Next generation dense algebra library for heterogeneous systems with accelerators.

## Compiling and Linking With MAGMA

To be able to compile and link code with the MAGMA library, the user has to load the following module:

```console
$ ml magma/1.3.0-mic
```

To make compilation more user-friendly, the module also sets these two environment variables:

!!! note
    MAGMA_INC - contains paths to the MAGMA header files (to be used for the compilation step).

!!! note
    MAGMA_LIBS - contains paths to the MAGMA libraries (to be used for the linking step).

Compilation example:

```console
$ icc -mkl -O3 -DHAVE_MIC -DADD_ -Wall $MAGMA_INC -c testing_dgetrf_mic.cpp -o testing_dgetrf_mic.o
$ icc -mkl -O3 -DHAVE_MIC -DADD_ -Wall -fPIC -Xlinker -zmuldefs -Wall -DNOCHANGE -DHOST testing_dgetrf_mic.o -o testing_dgetrf_mic $MAGMA_LIBS
```

### Running MAGMA Code

The MAGMA implementation for Intel MIC requires a MAGMA server running on the accelerator prior to executing the user application. The server can be started and stopped using the following scripts:

!!! note
    To start the MAGMA server use:
    **$MAGMAROOT/start_magma_server**

!!! note
    To stop the server use:
    **$MAGMAROOT/stop_magma_server**

!!! note
    For a deeper understanding of how the MAGMA server is started, see the following script:
    **$MAGMAROOT/launch_anselm_from_mic.sh**

To test if the MAGMA server runs properly, we can run one of the examples that are part of the MAGMA installation:

```console
[user@cn204 ~]$ $MAGMAROOT/testing/testing_dgetrf_mic
[user@cn204 ~]$ export OMP_NUM_THREADS=16
[user@cn204 ~]$ $MAGMAROOT/testing/testing_dgetrf_mic
  Usage: /apps/libs/magma-mic/magmamic-1.3.0/testing/testing_dgetrf_mic [options] [-h|--help]

    M     N     CPU GFlop/s (sec)   MAGMA GFlop/s (sec)   ||PA-LU||/(||A||*N)
  =========================================================================
   1088  1088   ---   (  ---  )     13.93 (   0.06)     ---
   2112  2112   ---   (  ---  )     77.85 (   0.08)     ---
   3136  3136   ---   (  ---  )    183.21 (   0.11)     ---
   4160  4160   ---   (  ---  )    227.52 (   0.21)     ---
   5184  5184   ---   (  ---  )    258.61 (   0.36)     ---
   6208  6208   ---   (  ---  )    333.12 (   0.48)     ---
   7232  7232   ---   (  ---  )    416.52 (   0.61)     ---
   8256  8256   ---   (  ---  )    446.97 (   0.84)     ---
   9280  9280   ---   (  ---  )    461.15 (   1.16)     ---
  10304 10304   ---   (  ---  )    500.70 (   1.46)     ---
```

!!! hint
    MAGMA contains several benchmarks and examples in `$MAGMAROOT/testing/`

!!! note
    MAGMA relies on the performance of all CPU cores as well as on the performance of the accelerator. Therefore, on Anselm the number of CPU OpenMP threads has to be set to 16 with `export OMP_NUM_THREADS=16`.

See more details at the [MAGMA home page](http://icl.cs.utk.edu/magma/).

## References

[1] MAGMA MIC: Linear Algebra Library for Intel Xeon Phi Coprocessors, Jack Dongarra et al.,
<http://icl.utk.edu/projectsfiles/magma/pubs/24-MAGMA_MIC_03.pdf>

diff --git a/docs.it4i/software/numerical-libraries/petsc.md b/docs.it4i/software/numerical-libraries/petsc.md
new file mode 100644
index 000000000..214e4074a
--- /dev/null
+++ b/docs.it4i/software/numerical-libraries/petsc.md
@@ -0,0 +1,60 @@
# PETSc

PETSc is a suite of building blocks for the scalable solution of scientific and engineering applications modeled by partial differential equations. It supports MPI, shared memory, and GPU through CUDA or OpenCL, as well as hybrid MPI-shared memory or MPI-GPU parallelism.

## Introduction

PETSc (Portable, Extensible Toolkit for Scientific Computation) is a suite of building blocks (data structures and routines) for the scalable solution of scientific and engineering applications modelled by partial differential equations. It allows thinking in terms of high-level objects (matrices) instead of low-level objects (raw arrays). It is written in the C language but can also be called from Fortran, C++, Python and Java codes. It supports MPI, shared memory, and GPUs through CUDA or OpenCL, as well as hybrid MPI-shared memory or MPI-GPU parallelism.

## Resources

* [project webpage](http://www.mcs.anl.gov/petsc/)
* [documentation](http://www.mcs.anl.gov/petsc/documentation/)
    * [PETSc Users Manual (PDF)](http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf)
    * [index of all manual pages](http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/singleindex.html)
* PRACE Video Tutorial [part1](http://www.youtube.com/watch?v=asVaFg1NDqY), [part2](http://www.youtube.com/watch?v=ubp_cSibb9I), [part3](http://www.youtube.com/watch?v=vJAAAQv-aaw), [part4](http://www.youtube.com/watch?v=BKVlqWNh8jY), [part5](http://www.youtube.com/watch?v=iXkbLEBFjlM)

## Modules

You can start using PETSc on Anselm by loading the PETSc module. Module names obey this pattern:

```console
$# ml petsc/version-compiler-mpi-blas-variant, e.g.
$ ml petsc/3.4.4-icc-impi-mkl-opt
```

where `variant` is replaced by one of `{dbg, opt, threads-dbg, threads-opt}`. The `opt` variant is compiled without debugging information (no `-g` option) and with aggressive compiler optimizations (`-O3 -xAVX`). This variant is suitable for performance measurements and production runs. In all other cases, use the debug (`dbg`) variant, because it contains debugging information, performs validations and self-checks, and provides a clear stack trace and message in case of an error. The other two variants, `threads-dbg` and `threads-opt`, are `dbg` and `opt`, respectively, built with [OpenMP and pthreads threading support](https://www.mcs.anl.gov/petsc/miscellaneous/petscthreads.html).

## External Libraries

PETSc needs at least MPI, BLAS and LAPACK. These dependencies are currently satisfied with Intel MPI and Intel MKL in the Anselm `petsc` modules.

PETSc can be linked with a plethora of [external numerical libraries](http://www.mcs.anl.gov/petsc/miscellaneous/external.html), extending PETSc functionality, e.g. direct linear system solvers, preconditioners or partitioners. See below for a list of libraries currently included in the Anselm `petsc` modules.

All these libraries can also be used alone, without PETSc. Their static or shared program libraries are available in
`$PETSC_DIR/$PETSC_ARCH/lib` and header files in `$PETSC_DIR/$PETSC_ARCH/include`. `PETSC_DIR` and `PETSC_ARCH` are environment variables pointing to a specific PETSc instance based on the petsc module loaded.
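For example, a standalone program can be compiled and linked against the loaded PETSc instance using these two variables; the following is a minimal sketch (myapp.c is a hypothetical PETSc source file):

```console
$ ml petsc/3.4.4-icc-impi-mkl-opt
$ mpicc myapp.c -o myapp.x -I$PETSC_DIR/include -I$PETSC_DIR/$PETSC_ARCH/include -L$PETSC_DIR/$PETSC_ARCH/lib -Wl,-rpath=$PETSC_DIR/$PETSC_ARCH/lib -lpetsc
```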
### Libraries Linked to PETSc on Anselm (As of 11 April 2015)

* dense linear algebra
    * [Elemental](http://libelemental.org/)
* sparse linear system solvers
    * [Intel MKL Pardiso](https://software.intel.com/en-us/node/470282)
    * [MUMPS](http://mumps.enseeiht.fr/)
    * [PaStiX](http://pastix.gforge.inria.fr/)
    * [SuiteSparse](http://faculty.cse.tamu.edu/davis/suitesparse.html)
    * [SuperLU](http://crd.lbl.gov/~xiaoye/SuperLU/#superlu)
    * [SuperLU_Dist](http://crd.lbl.gov/~xiaoye/SuperLU/#superlu_dist)
* input/output
    * [ExodusII](http://sourceforge.net/projects/exodusii/)
    * [HDF5](http://www.hdfgroup.org/HDF5/)
    * [NetCDF](http://www.unidata.ucar.edu/software/netcdf/)
* partitioning
    * [Chaco](http://www.cs.sandia.gov/CRF/chac.html)
    * [METIS](http://glaros.dtc.umn.edu/gkhome/metis/metis/overview)
    * [ParMETIS](http://glaros.dtc.umn.edu/gkhome/metis/parmetis/overview)
    * [PT-Scotch](http://www.labri.fr/perso/pelegrin/scotch/)
* preconditioners & multigrid
    * [Hypre](http://www.nersc.gov/users/software/programming-libraries/math-libraries/petsc/)
    * [Trilinos ML](http://trilinos.sandia.gov/packages/ml/)
    * [SPAI - Sparse Approximate Inverse](https://bitbucket.org/petsc/pkg-spai)

diff --git a/docs.it4i/software/numerical-libraries/trilinos.md b/docs.it4i/software/numerical-libraries/trilinos.md
new file mode 100644
index 000000000..36688e989
--- /dev/null
+++ b/docs.it4i/software/numerical-libraries/trilinos.md
@@ -0,0 +1,49 @@
# Trilinos

Packages for large scale scientific and engineering problems. Provides MPI and hybrid parallelization.

## Introduction

Trilinos is a collection of software packages for the numerical solution of large scale scientific and engineering problems. It is based on C++ and features modern object-oriented design. Both serial as well as parallel computations based on MPI and hybrid parallelization are supported within Trilinos packages.

## Installed Packages

The current Trilinos installation on ANSELM contains (among others) the following main packages:

* **Epetra** - core linear algebra package containing classes for manipulation with serial and distributed vectors, matrices, and graphs. Dense linear solvers are supported via an interface to BLAS and LAPACK (Intel MKL on ANSELM). Its extension **EpetraExt** contains e.g. methods for matrix-matrix multiplication.
* **Tpetra** - next-generation linear algebra package. Supports 64-bit indexing and arbitrary data types using C++ templates.
* **Belos** - library of various iterative solvers (CG, block CG, GMRES, block GMRES, etc.).
* **Amesos** - interface to direct sparse solvers.
* **Anasazi** - framework for large-scale eigenvalue algorithms.
* **IFPACK** - distributed algebraic preconditioner (includes e.g. incomplete LU factorization).
* **Teuchos** - common tools package. This package contains classes for memory management, output, performance monitoring, BLAS and LAPACK wrappers, etc.

For the full list of Trilinos packages, descriptions of their capabilities, and user manuals see [http://trilinos.sandia.gov.](http://trilinos.sandia.gov)

## Installed Version

Currently, Trilinos version 11.2.3, compiled with the Intel Compiler, is installed on ANSELM.

## Compiling Against Trilinos

First, load the appropriate module:

```console
$ ml trilinos
```

For the compilation of a CMake-aware project, Trilinos provides the FIND_PACKAGE( Trilinos ) capability, which makes it easy to build against Trilinos, including linking against the correct list of libraries.
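A minimal CMakeLists.txt using this capability might look as follows (a sketch; the target name myapp and the source main.cpp are only illustrative, and it assumes CMake can locate TrilinosConfig.cmake, e.g. via the loaded module's environment):

```cmake
# Locate the Trilinos installation (TrilinosConfig.cmake from the loaded module)
FIND_PACKAGE(Trilinos REQUIRED)

INCLUDE_DIRECTORIES(${Trilinos_INCLUDE_DIRS})
LINK_DIRECTORIES(${Trilinos_LIBRARY_DIRS})

ADD_EXECUTABLE(myapp main.cpp)
TARGET_LINK_LIBRARIES(myapp ${Trilinos_LIBRARIES})
```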
For details, see <http://trilinos.sandia.gov/Finding_Trilinos.txt> + +For compiling using simple makefiles, Trilinos provides Makefile.export system, which allows users to include important Trilinos variables directly into their makefiles. This can be done simply by inserting the following line into the makefile: + +```cpp +include Makefile.export.Trilinos +``` + +or + +```cpp +include Makefile.export.<package> +``` + +if you are interested only in a specific Trilinos package. This will give you access to the variables such as Trilinos_CXX_COMPILER, Trilinos_INCLUDE_DIRS, Trilinos_LIBRARY_DIRS etc. For the detailed description and example makefile see <http://trilinos.sandia.gov/Export_Makefile.txt>. diff --git a/mkdocs.yml b/mkdocs.yml index 5be70b08f..354fdc997 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -93,6 +93,16 @@ pages: - Total View: software/debuggers/total-view.md - Valgrind: software/debuggers/valgrind.md - Vampir: software/debuggers/vampir.md + - 'Intel Suite': + - Introduction: software/intel-suite/intel-parallel-studio-introduction.md + - Intel Advisor: software/intel-suite/intel-advisor.md + - Intel Compilers: software/intel-suite/intel-compilers.md + - Intel Debugger: software/intel-suite/intel-debugger.md + - Intel IPP: software/intel-suite/intel-integrated-performance-primitives.md + - Intel Inspector: software/intel-suite/intel-inspector.md + - Intel MKL: software/intel-suite/intel-mkl.md + - Intel TBB: software/intel-suite/intel-tbb.md + - Intel Trace Analyzer and Collector: software/intel-suite/intel-trace-analyzer-and-collector.md - ISV Licenses: software/isv_licenses.md - Java: software/java.md - 'Machine larning': @@ -110,6 +120,14 @@ pages: - Matlab 2013-2014: software/numerical-languages/matlab_1314.md - Octave: software/numerical-languages/octave.md - OpenCoarrays: software/numerical-languages/opencoarrays.md + - 'Numerical Libraries': + - FFTW: software/numerical-libraries/fftw.md + - GSL: software/numerical-libraries/gsl.md + - HDF5: software/numerical-libraries/hdf5.md + - Intel Numerical Libraries: software/numerical-libraries/intel-numerical-libraries.md + - MAGMA for Intel Xeon Phi: software/numerical-libraries/magma-for-intel-xeon-phi.md + - PETSc: software/numerical-libraries/petsc.md + - Trilinos: software/numerical-libraries/trilinos.md - OpenFOAM: software/openfoam.md - Operating System: software/operating-system.md - 'Tools': @@ -120,38 +138,13 @@ pages: - Available Modules: modules-salomon.md - Available Modules on UV: modules-salomon-uv.md - Compilers: salomon/software/compilers.md - - 'Intel Suite': - - Introduction: salomon/software/intel-suite/intel-parallel-studio-introduction.md - - Intel Advisor: salomon/software/intel-suite/intel-advisor.md - - Intel Compilers: salomon/software/intel-suite/intel-compilers.md - - Intel Debugger: salomon/software/intel-suite/intel-debugger.md - - Intel IPP: salomon/software/intel-suite/intel-integrated-performance-primitives.md - - Intel Inspector: salomon/software/intel-suite/intel-inspector.md - - Intel MKL: salomon/software/intel-suite/intel-mkl.md - - Intel TBB: salomon/software/intel-suite/intel-tbb.md - - Intel Trace Analyzer and Collector: salomon/software/intel-suite/intel-trace-analyzer-and-collector.md - Intel Xeon Phi: salomon/software/intel-xeon-phi.md - ParaView: salomon/software/paraview.md - Anselm Software: - Available Modules: modules-anselm.md - Compilers: anselm/software/compilers.md - GPI-2: anselm/software/gpi2.md - - 'Intel Suite': - - Introduction: 
anselm/software/intel-suite/introduction.md - - Intel Compilers: anselm/software/intel-suite/intel-compilers.md - - Intel Debugger: anselm/software/intel-suite/intel-debugger.md - - Intel IPP: anselm/software/intel-suite/intel-integrated-performance-primitives.md - - Intel MKL: anselm/software/intel-suite/intel-mkl.md - - Intel TBB: anselm/software/intel-suite/intel-tbb.md - Intel Xeon Phi: anselm/software/intel-xeon-phi.md - - 'Numerical Libraries': - - FFTW: anselm/software/numerical-libraries/fftw.md - - GSL: anselm/software/numerical-libraries/gsl.md - - HDF5: anselm/software/numerical-libraries/hdf5.md - - Intel Numerical Libraries: anselm/software/numerical-libraries/intel-numerical-libraries.md - - MAGMA for Intel Xeon Phi: anselm/software/numerical-libraries/magma-for-intel-xeon-phi.md - - PETSc: anselm/software/numerical-libraries/petsc.md - - Trilinos: anselm/software/numerical-libraries/trilinos.md - NVIDIA CUDA: anselm/software/nvidia-cuda.md - 'Omics Master': - Diagnostic Component (TEAM): anselm/software/omics-master/diagnostic-component-team.md diff --git a/todelete b/todelete index ecc2f20e6..67463a2a3 100644 --- a/todelete +++ b/todelete @@ -28,3 +28,52 @@ docs.it4i/salomon/software/numerical-languages/r.md ./docs.it4i/anselm/software/machine-learning/tensorflow.md ./docs.it4i/salomon/software/machine-learning/introduction.md ./docs.it4i/salomon/software/machine-learning/tensorflow.md +./docs.it4i/anselm/software/debuggers +./docs.it4i/anselm/software/debuggers/allinea-ddt.md +./docs.it4i/anselm/software/debuggers/allinea-performance-reports.md +./docs.it4i/anselm/software/debuggers/cube.md +./docs.it4i/anselm/software/debuggers/debuggers.md +./docs.it4i/anselm/software/debuggers/intel-performance-counter-monitor.md +./docs.it4i/anselm/software/debuggers/intel-vtune-amplifier.md +./docs.it4i/anselm/software/debuggers/papi.md +./docs.it4i/anselm/software/debuggers/scalasca.md +./docs.it4i/anselm/software/debuggers/score-p.md +./docs.it4i/anselm/software/debuggers/total-view.md +./docs.it4i/anselm/software/debuggers/valgrind.md +./docs.it4i/anselm/software/debuggers/vampir.md +./docs.it4i/salomon/software/debuggers +./docs.it4i/salomon/software/debuggers/Introduction.md +./docs.it4i/salomon/software/debuggers/aislinn.md +./docs.it4i/salomon/software/debuggers/allinea-ddt.md +./docs.it4i/salomon/software/debuggers/allinea-performance-reports.md +./docs.it4i/salomon/software/debuggers/intel-vtune-amplifier.md +./docs.it4i/salomon/software/debuggers/mympiprog_32p_2014-10-15_16-56.html +./docs.it4i/salomon/software/debuggers/mympiprog_32p_2014-10-15_16-56.txt +./docs.it4i/salomon/software/debuggers/total-view.md +./docs.it4i/salomon/software/debuggers/valgrind.md +./docs.it4i/salomon/software/debuggers/vampir.md +./docs.it4i/anselm/software/numerical-libraries +./docs.it4i/anselm/software/numerical-libraries/fftw.md +./docs.it4i/anselm/software/numerical-libraries/gsl.md +./docs.it4i/anselm/software/numerical-libraries/hdf5.md +./docs.it4i/anselm/software/numerical-libraries/intel-numerical-libraries.md +./docs.it4i/anselm/software/numerical-libraries/magma-for-intel-xeon-phi.md +./docs.it4i/anselm/software/numerical-libraries/petsc.md +./docs.it4i/anselm/software/numerical-libraries/trilinos.md +./docs.it4i/anselm/software/intel-suite +./docs.it4i/anselm/software/intel-suite/intel-compilers.md +./docs.it4i/anselm/software/intel-suite/intel-debugger.md +./docs.it4i/anselm/software/intel-suite/intel-integrated-performance-primitives.md 
+./docs.it4i/anselm/software/intel-suite/intel-mkl.md +./docs.it4i/anselm/software/intel-suite/intel-tbb.md +./docs.it4i/anselm/software/intel-suite/introduction.md +./docs.it4i/salomon/software/intel-suite +./docs.it4i/salomon/software/intel-suite/intel-advisor.md +./docs.it4i/salomon/software/intel-suite/intel-compilers.md +./docs.it4i/salomon/software/intel-suite/intel-debugger.md +./docs.it4i/salomon/software/intel-suite/intel-inspector.md +./docs.it4i/salomon/software/intel-suite/intel-integrated-performance-primitives.md +./docs.it4i/salomon/software/intel-suite/intel-mkl.md +./docs.it4i/salomon/software/intel-suite/intel-parallel-studio-introduction.md +./docs.it4i/salomon/software/intel-suite/intel-tbb.md +./docs.it4i/salomon/software/intel-suite/intel-trace-analyzer-and-collector.md -- GitLab