Intel Inspector
===============
Intel Inspector is a dynamic memory and threading error checking tool
for C/C++/Fortran applications. It can detect issues such as memory
leaks, invalid memory references, uninitialized variables, race
conditions, deadlocks, etc.
Installed versions
------------------
The following versions are currently available on Salomon as modules:
  Version         Module
  --------------- -------------------------
  2016 Update 1   Inspector/2016_update1
Usage
-----
Your program should be compiled with the -g switch to include symbol names.
Optimizations can be turned on.
Debugging is possible either directly from the GUI, or from the command
line.
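For example, a minimal compile sketch (the source file name myprog.c is
illustrative only):
$ module load intel
$ icc -g -O2 myprog.c -o myprog.x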
### GUI mode
To debug from GUI, launch Inspector:
$ inspxe-gui &
Then select File -> New -> Project from the menu. Choose a directory to
save the project data to. After clicking OK, the Project Properties window
will appear, where you can configure the path to your binary, launch
arguments, working directory, etc. After clicking OK, the project is ready.
In the main pane, you can start a predefined analysis type or define
your own. Click Start to start the analysis. Alternatively, you can
click Command Line to see the command line required to run the
analysis directly from the command line.
### Batch mode
Analysis can also be run from the command line in batch mode. Batch mode
analysis is run with the command inspxe-cl.
To obtain the required parameters, either consult the documentation or
configure the analysis in the GUI and then click the "Command Line"
button in the lower right corner to see the respective command line.
Results obtained in batch mode can then be viewed in the GUI by
selecting File -> Open -> Result...
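A minimal batch-mode sketch (mi2 is one of Inspector's predefined memory
error analysis types; the result directory name is illustrative - verify
the exact options with inspxe-cl -help):
$ inspxe-cl -collect mi2 -result-dir ./inspector_result -- ./myprog.x
$ inspxe-cl -report problems -result-dir ./inspector_result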
References
----------
1. [Product page](https://software.intel.com/en-us/intel-inspector-xe)
2. [Documentation and Release Notes](https://software.intel.com/en-us/intel-inspector-xe-support/documentation)
3. [Tutorials](https://software.intel.com/en-us/articles/inspectorxe-tutorials)
Intel IPP
=========
Intel Integrated Performance Primitives
---------------------------------------
Intel Integrated Performance Primitives, version 9.0.1, compiled for
AVX2 vector instructions, is available via the ipp module. IPP is a very
rich library of highly optimized algorithmic building blocks for media
and data applications. This includes signal, image and frame processing
algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough
transform, Sum, MinMax, as well as cryptographic functions, linear
algebra functions and many more.
Check out IPP before implementing your own math functions for data
processing; the functionality you need is likely already there.
$ module load ipp
The module sets up the environment variables required for linking and
running IPP-enabled applications.
IPP example
-----------
#include "ipp.h"
#include <stdio.h>
int main(int argc, char* argv[])
{
const IppLibraryVersion *lib;
Ipp64u fm;
IppStatus status;
status= ippInit(); //IPP initialization with the best optimization layer
if( status != ippStsNoErr ) {
printf("IppInit() Error:n");
printf("%sn", ippGetStatusString(status) );
return -1;
}
//Get version info
lib = ippiGetLibVersion();
printf("%s %sn", lib->Name, lib->Version);
//Get CPU features enabled with selected library level
fm=ippGetEnabledCpuFeatures();
printf("SSE :%cn",(fm>>1)&1?'Y':'N');
printf("SSE2 :%cn",(fm>>2)&1?'Y':'N');
printf("SSE3 :%cn",(fm>>3)&1?'Y':'N');
printf("SSSE3 :%cn",(fm>>4)&1?'Y':'N');
printf("SSE41 :%cn",(fm>>6)&1?'Y':'N');
printf("SSE42 :%cn",(fm>>7)&1?'Y':'N');
printf("AVX :%cn",(fm>>8)&1 ?'Y':'N');
printf("AVX2 :%cn", (fm>>15)&1 ?'Y':'N' );
printf("----------n");
printf("OS Enabled AVX :%cn", (fm>>9)&1 ?'Y':'N');
printf("AES :%cn", (fm>>10)&1?'Y':'N');
printf("CLMUL :%cn", (fm>>11)&1?'Y':'N');
printf("RDRAND :%cn", (fm>>13)&1?'Y':'N');
printf("F16C :%cn", (fm>>14)&1?'Y':'N');
return 0;
}
Compile the above example using any compiler and the ipp module.
$ module load intel
$ module load ipp
$ icc testipp.c -o testipp.x -lippi -lipps -lippcore
You will need the ipp module loaded to run the IPP-enabled executable.
This may be avoided by compiling the library search paths into the
executable:
$ module load intel
$ module load ipp
$ icc testipp.c -o testipp.x -Wl,-rpath=$LIBRARY_PATH -lippi -lipps -lippcore
Code samples and documentation
------------------------------
Intel provides a number of [Code Samples for
IPP](https://software.intel.com/en-us/articles/code-samples-for-intel-integrated-performance-primitives-library),
illustrating the use of IPP.
Read the full documentation on IPP [on the Intel
website](http://software.intel.com/sites/products/search/search.php?q=&x=15&y=6&product=ipp&version=7.1&docos=lin),
in particular the [IPP Reference
Manual](http://software.intel.com/sites/products/documentation/doclib/ipp_sa/71/ipp_manual/index.htm).
Intel MKL
=========
Intel Math Kernel Library
-------------------------
Intel Math Kernel Library (Intel MKL) is a library of math kernel
subroutines, extensively threaded and optimized for maximum performance.
Intel MKL provides these basic math kernels:
- BLAS (level 1, 2, and 3) and LAPACK linear algebra routines,
  offering vector, vector-matrix, and matrix-matrix operations.
- The PARDISO direct sparse solver, an iterative sparse solver,
  and supporting sparse BLAS (level 1, 2, and 3) routines for solving
  sparse systems of equations.
- ScaLAPACK distributed processing linear algebra routines for
  Linux* and Windows* operating systems, as well as the Basic Linear
  Algebra Communications Subprograms (BLACS) and the Parallel Basic
  Linear Algebra Subprograms (PBLAS).
- Fast Fourier transform (FFT) functions in one, two, or three
  dimensions with support for mixed radices (not limited to sizes that
  are powers of 2), as well as distributed versions of these functions.
- Vector Math Library (VML) routines for optimized mathematical
  operations on vectors.
- Vector Statistical Library (VSL) routines, which offer
  high-performance vectorized random number generators (RNG) for
  several probability distributions, convolution and correlation
  routines, and summary statistics functions.
- Data Fitting Library, which provides capabilities for
  spline-based approximation of functions, derivatives and integrals
  of functions, and search.
- Extended Eigensolver, a shared memory version of an eigensolver
  based on the Feast Eigenvalue Solver.
For details see the [Intel MKL Reference
Manual](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mklman/index.htm).
Intel MKL version 11.2.3.187 is available on the cluster
$ module load imkl
The module sets up the environment variables required for linking and
running MKL-enabled applications. The most important variables are
$MKLROOT, $CPATH, $LD_LIBRARY_PATH and $MKL_EXAMPLES.
The Intel MKL library may be linked using any compiler.
With the Intel compiler, use the -mkl option to link the default threaded MKL.
### Interfaces
The Intel MKL library provides a number of interfaces. The fundamental
ones are LP64 and ILP64. The Intel MKL ILP64 libraries use the 64-bit
integer type (necessary for indexing large arrays, with more than
2^31-1 elements), whereas the LP64 libraries index arrays with the
32-bit integer type.

  Interface   Integer type
  ----------- -----------------------------------------------
  LP64        32-bit, int, integer(kind=4), MPI_INT
  ILP64       64-bit, long int, integer(kind=8), MPI_INT64
### Linking
Linking Intel MKL libraries may be complex. The Intel [MKL link line
advisor](http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor)
helps. See also the [examples](intel-mkl.html#examples) below.
You will need the imkl module loaded to run the MKL-enabled executable.
This may be avoided by compiling the library search paths into the
executable. Include rpath on the compile line:
$ icc .... -Wl,-rpath=$LIBRARY_PATH ...
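Putting the interface choice and the link line together, an explicit ILP64
link might look like the following (a sketch only; generate the exact line
with the MKL link line advisor for your compiler and threading choice):
$ icc -DMKL_ILP64 myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm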
### Threading
The advantage of using the Intel MKL library is that it brings threaded
parallelization to applications that are otherwise not parallel.
For this to work, the application must link the threaded MKL library
(the default). The number and behaviour of MKL threads may be controlled via
the OpenMP environment variables, such as OMP_NUM_THREADS and
KMP_AFFINITY. MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS.
$ export OMP_NUM_THREADS=24
$ export KMP_AFFINITY=granularity=fine,compact,1,0
The application will run with 24 threads with affinity optimized for
fine grain parallelization.
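To control the MKL thread count independently of other OpenMP code, the
MKL-specific variable mentioned above may be set explicitly, for example:
$ export MKL_NUM_THREADS=24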
Examples
------------
A number of examples demonstrating the use of the Intel MKL library and
its linking are available on the clusters, in the $MKL_EXAMPLES directory.
In the examples below, we demonstrate linking Intel MKL to an Intel and a
GNU compiled program for multi-threaded matrix multiplication.
### Working with examples
$ module load intel
$ module load imkl
$ cp -a $MKL_EXAMPLES/cblas /tmp/
$ cd /tmp/cblas
$ make sointel64 function=cblas_dgemm
In this example, we compile, link and run the cblas_dgemm example,
demonstrating the use of the MKL example suite installed on the clusters.
### Example: MKL and Intel compiler
$ module load intel
$ module load imkl
$ cp -a $MKL_EXAMPLES/cblas /tmp/
$ cd /tmp/cblas
$
$ icc -w source/cblas_dgemmx.c source/common_func.c -mkl -o cblas_dgemmx.x
$ ./cblas_dgemmx.x data/cblas_dgemmx.d
In this example, we compile, link and run the cblas_dgemm example,
demonstrating use of MKL with icc -mkl option. Using the -mkl option is
equivalent to:
$ icc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x
-I$MKL_INC_DIR -L$MKL_LIB_DIR -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5
In this example, we compile and link the cblas_dgemm example, using
LP64 interface to threaded MKL and Intel OMP threads implementation.
### Example: Intel MKL and GNU compiler
$ module load GCC
$ module load imkl
$ cp -a $MKL_EXAMPLES/cblas /tmp/
$ cd /tmp/cblas
$ gcc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x
-lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lm
$ ./cblas_dgemmx.x data/cblas_dgemmx.d
In this example, we compile, link and run the cblas_dgemm example, using
the LP64 interface to threaded MKL and the GNU OpenMP threads implementation.
MKL and MIC accelerators
------------------------
The Intel MKL is capable of automatically offloading the computations to
the MIC accelerator. See the section [Intel Xeon
Phi](../intel-xeon-phi.html) for details.
LAPACKE C Interface
-------------------
MKL includes the LAPACKE C interface to LAPACK. For some reason, although
Intel is the author of LAPACKE, the LAPACKE header files are not present
in MKL. For this reason, we have prepared a
LAPACKE module, which includes Intel's LAPACKE
headers from the official LAPACK, which you can use to compile code using
the LAPACKE interface against MKL.
Further reading
---------------
Read more on [Intel
website](http://software.intel.com/en-us/intel-mkl), in
particular the [MKL users
guide](https://software.intel.com/en-us/intel-mkl/documentation/linux).
Intel Parallel Studio
=====================
The Salomon cluster provides the following elements of the Intel Parallel
Studio XE:
- Intel Compilers
- Intel Debugger
- Intel MKL Library
- Intel Integrated Performance Primitives Library
- Intel Threading Building Blocks Library
- Intel Trace Analyzer and Collector
- Intel Advisor
- Intel Inspector
Intel compilers
---------------
The Intel compilers version 13.1.3 are available via the module
iccifort/2013.5.192-GCC-4.8.3. The compilers include the icc C and C++
compiler and the ifort Fortran 77/90/95 compiler.
$ module load intel
$ icc -v
$ ifort -v
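For example, an illustrative optimized build (the source file names are
placeholders, not part of the cluster software):
$ icc -O3 -xhost myprog.c -o myprog.x
$ ifort -O3 -xhost myprog.f90 -o myprog.x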
Read more at the [Intel Compilers](intel-compilers.html)
page.
Intel debugger
--------------
IDB is no longer available since Parallel Studio 2015.
The Intel debugger version 13.0 is available via the intel module. The
debugger works for applications compiled with the C and C++ compiler and
the ifort Fortran 77/90/95 compiler. The debugger provides a Java GUI
environment. Use an [X
display](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html)
for running the GUI.
$ module load intel
$ idb
Read more at the [Intel Debugger](intel-debugger.html)
page.
Intel Math Kernel Library
-------------------------
Intel Math Kernel Library (Intel MKL) is a library of math kernel
subroutines, extensively threaded and optimized for maximum performance.
Intel MKL unites and provides these basic components: BLAS, LAPACK,
ScaLapack, PARDISO, FFT, VML, VSL, Data fitting, Feast Eigensolver and
many more.
$ module load imkl
Read more at the [Intel MKL](intel-mkl.html) page.
Intel Integrated Performance Primitives
---------------------------------------
Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX
is available, via module ipp. The IPP is a library of highly optimized
algorithmic building blocks for media and data applications. This
includes signal, image and frame processing algorithms, such as FFT,
FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax and many
more.
$ module load ipp
Read more at the [Intel
IPP](intel-integrated-performance-primitives.html) page.
Intel Threading Building Blocks
-------------------------------
Intel Threading Building Blocks (Intel TBB) is a library that supports
scalable parallel programming using standard ISO C++ code. It does not
require special languages or compilers. It is designed to promote
scalable data parallel programming. Additionally, it fully supports
nested parallelism, so you can build larger parallel components from
smaller parallel components. To use the library, you specify tasks, not
threads, and let the library map tasks onto threads in an efficient
manner.
$ module load tbb
Read more at the [Intel TBB](intel-tbb.html) page.
Intel TBB
=========
Intel Threading Building Blocks
-------------------------------
Intel Threading Building Blocks (Intel TBB) is a library that supports
scalable parallel programming using standard ISO C++ code. It does not
require special languages or compilers. To use the library, you specify
tasks, not threads, and let the library map tasks onto threads in an
efficient manner. The tasks are executed by a runtime scheduler and may
be offloaded to [MIC
accelerator](../intel-xeon-phi.html).
Intel TBB version 4.3.5.187 is available on the cluster.
$ module load tbb
The module sets up the environment variables required for linking and
running TBB-enabled applications.
Link the TBB library using -ltbb.
Examples
--------
A number of examples demonstrating the use of TBB and its built-in
scheduler are available on Anselm, in the $TBB_EXAMPLES directory.
$ module load intel
$ module load tbb
$ cp -a $TBB_EXAMPLES/common $TBB_EXAMPLES/parallel_reduce /tmp/
$ cd /tmp/parallel_reduce/primes
$ icc -O2 -DNDEBUG -o primes.x main.cpp primes.cpp -ltbb
$ ./primes.x
In this example, we compile, link and run the primes example,
demonstrating the use of a parallel task-based reduce in the computation
of prime numbers.
You will need the tbb module loaded to run the TBB-enabled executable.
This may be avoided by compiling the library search paths into the
executable.
$ icc -O2 -o primes.x main.cpp primes.cpp -Wl,-rpath=$LIBRARY_PATH -ltbb
Further reading
---------------
Read more on Intel website,
<http://software.intel.com/sites/products/documentation/doclib/tbb_sa/help/index.htm>
Intel Trace Analyzer and Collector
==================================
Intel Trace Analyzer and Collector (ITAC) is a tool to collect and
graphically analyze the behaviour of MPI applications. It helps you to
analyze communication patterns of your application, identify hotspots,
perform correctness checking (identify deadlocks, data corruption, etc.),
and simulate how your application would run on a different interconnect.
ITAC is an offline analysis tool - first you run your application to
collect a trace file, then you can open the trace in a GUI analyzer to
view it.
Installed version
-----------------
Version 9.1.2.024 is currently available on Salomon as the module
itac/9.1.2.024.
Collecting traces
-----------------
ITAC can collect traces from applications that are using Intel MPI. To
generate a trace, simply add the -trace option to your mpirun command:
$ module load itac/9.1.2.024
$ mpirun -trace myapp
The trace will be saved in file myapp.stf in the current directory.
Viewing traces
--------------
To view and analyze the trace, open the ITAC GUI in a [graphical
environment](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html):
$ module load itac/9.1.2.024
$ traceanalyzer
The GUI will launch and you can open the produced *.stf file.
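The trace file can also be passed directly on the command line (an
assumption based on common ITAC usage; the myapp.stf name comes from the
example above):
$ traceanalyzer myapp.stf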
![](Snmekobrazovky20151204v15.35.12.png)
Please refer to the Intel documentation about the usage of the GUI tool.
References
----------
1. [Getting Started with Intel® Trace Analyzer and
   Collector](https://software.intel.com/en-us/get-started-with-itac-for-linux)
2. Intel® Trace Analyzer and Collector - Documentation
Intel Xeon Phi
==============
A guide to Intel Xeon Phi usage
The Intel Xeon Phi accelerator can be programmed in several modes. The
default mode on the cluster is the offload mode, but all modes described
in this document are supported.
Intel Utilities for Xeon Phi
----------------------------
To get access to a compute node with an Intel Xeon Phi accelerator, use a
PBS interactive session:
$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0
To set up the environment, the module "intel" has to be loaded. Without
specifying the version, the default version is loaded (at the time of
writing, it is 2015b).
$ module load intel
Information about the hardware can be obtained by running
the micinfo program on the host.
$ /usr/bin/micinfo
The output of the "micinfo" utility executed on one of the cluster node
is as follows. (note: to get PCIe related details the command has to be
run with root privileges)
MicInfo Utility Log
Created Mon Aug 17 13:55:59 2015
System Info
HOST OS : Linux
OS Version : 2.6.32-504.16.2.el6.x86_64
Driver Version : 3.4.1-1
MPSS Version : 3.4.1
Host Physical Memory : 131930 MB
Device No: 0, Device Name: mic0
Version
Flash Version : 2.1.02.0390
SMC Firmware Version : 1.16.5078
SMC Boot Loader Version : 1.8.4326
uOS Version : 2.6.38.8+mpss3.4.1
Device Serial Number : ADKC44601414
Board
Vendor ID : 0x8086
Device ID : 0x225c
Subsystem ID : 0x7d95
Coprocessor Stepping ID : 2
PCIe Width : x16
PCIe Speed : 5 GT/s
PCIe Max payload size : 256 bytes
PCIe Max read req size : 512 bytes
Coprocessor Model : 0x01
Coprocessor Model Ext : 0x00
Coprocessor Type : 0x00
Coprocessor Family : 0x0b
Coprocessor Family Ext : 0x00
Coprocessor Stepping : C0
Board SKU : C0PRQ-7120 P/A/X/D
ECC Mode : Enabled
SMC HW Revision : Product 300W Passive CS
Cores
Total No of Active Cores : 61
Voltage : 1007000 uV
Frequency : 1238095 kHz
Thermal
Fan Speed Control : N/A
Fan RPM : N/A
Fan PWM : N/A
Die Temp : 60 C
GDDR
GDDR Vendor : Samsung
GDDR Version : 0x6
GDDR Density : 4096 Mb
GDDR Size : 15872 MB
GDDR Technology : GDDR5
GDDR Speed : 5.500000 GT/s
GDDR Frequency : 2750000 kHz
GDDR Voltage : 1501000 uV
Device No: 1, Device Name: mic1
Version
Flash Version : 2.1.02.0390
SMC Firmware Version : 1.16.5078
SMC Boot Loader Version : 1.8.4326
uOS Version : 2.6.38.8+mpss3.4.1
Device Serial Number : ADKC44500454
Board
Vendor ID : 0x8086
Device ID : 0x225c
Subsystem ID : 0x7d95
Coprocessor Stepping ID : 2
PCIe Width : x16
PCIe Speed : 5 GT/s
PCIe Max payload size : 256 bytes
PCIe Max read req size : 512 bytes
Coprocessor Model : 0x01
Coprocessor Model Ext : 0x00
Coprocessor Type : 0x00
Coprocessor Family : 0x0b
Coprocessor Family Ext : 0x00
Coprocessor Stepping : C0
Board SKU : C0PRQ-7120 P/A/X/D
ECC Mode : Enabled
SMC HW Revision : Product 300W Passive CS
Cores
Total No of Active Cores : 61
Voltage : 998000 uV
Frequency : 1238095 kHz
Thermal
Fan Speed Control : N/A
Fan RPM : N/A
Fan PWM : N/A
Die Temp : 59 C
GDDR
GDDR Vendor : Samsung
GDDR Version : 0x6
GDDR Density : 4096 Mb
GDDR Size : 15872 MB
GDDR Technology : GDDR5
GDDR Speed : 5.500000 GT/s
GDDR Frequency : 2750000 kHz
GDDR Voltage : 1501000 uV
Offload Mode
------------
To compile code for the Intel Xeon Phi, the MPSS stack has to be installed
on the machine where the compilation is executed. Currently the MPSS stack
is only installed on compute nodes equipped with accelerators.
$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0
$ module load intel
For debugging purposes it is also recommended to set the environment
variable "OFFLOAD_REPORT". The value can be set from 0 to 3, where a
higher number means more debugging information.
export OFFLOAD_REPORT=3
A very basic example of code that employs the offload programming
technique is shown in the next listing. Please note that this code is
sequential and utilizes only a single core of the accelerator.
$ vim source-offload.cpp
#include <iostream>

int main(int argc, char* argv[])
{
    const int niter = 100000;
    double result = 0;

    #pragma offload target(mic)
    for (int i = 0; i < niter; ++i) {
        const double t = (i + 0.5) / niter;
        result += 4.0 / (t * t + 1.0);
    }
    result /= niter;
    std::cout << "Pi ~ " << result << '\n';
}
To compile the code using the Intel compiler, run:
$ icc source-offload.cpp -o bin-offload
To execute the code, run the following command on the host:
./bin-offload
### Parallelization in Offload Mode Using OpenMP
One way of parallelizing code for the Xeon Phi is using OpenMP
directives. The following example shows code for parallel vector
addition.
$ vim ./vect-add.c
#include <stdio.h>
#include <stdlib.h>

typedef int T;

#define SIZE 1000

#pragma offload_attribute(push, target(mic))
T in1[SIZE];
T in2[SIZE];
T res[SIZE];
#pragma offload_attribute(pop)

// MIC function to add two vectors
__attribute__((target(mic))) void add_mic(T *a, T *b, T *c, int size) {
    int i = 0;
    #pragma omp parallel for
    for (i = 0; i < size; i++)
        c[i] = a[i] + b[i];
}

// CPU function to add two vectors
void add_cpu(T *a, T *b, T *c, int size) {
    int i;
    for (i = 0; i < size; i++)
        c[i] = a[i] + b[i];
}

// CPU function to generate a vector of random numbers
void random_T(T *a, int size) {
    int i;
    for (i = 0; i < size; i++)
        a[i] = rand() % 10000; // random number between 0 and 9999
}

// CPU function to compare two vectors
int compare(T *a, T *b, T size) {
    int pass = 0;
    int i;
    for (i = 0; i < size; i++) {
        if (a[i] != b[i]) {
            printf("Value mismatch at location %d, values %d and %d\n", i, a[i], b[i]);
            pass = 1;
        }
    }
    if (pass == 0) printf("Test passed\n"); else printf("Test Failed\n");
    return pass;
}

int main()
{
    int i;
    random_T(in1, SIZE);
    random_T(in2, SIZE);

    #pragma offload target(mic) in(in1,in2) inout(res)
    {
        // Parallel loop from main function
        #pragma omp parallel for
        for (i = 0; i < SIZE; i++)
            res[i] = in1[i] + in2[i];

        // or the parallel loop is called inside the function
        add_mic(in1, in2, res, SIZE);
    }

    // Check the results with the CPU implementation
    T res_cpu[SIZE];
    add_cpu(in1, in2, res_cpu, SIZE);
    compare(res, res_cpu, SIZE);
}
During the compilation, the Intel compiler shows which loops have been
vectorized in both the host and the accelerator code. This can be enabled
with the compiler option "-vec-report2". To compile and execute the code run:
$ icc vect-add.c -openmp_report2 -vec-report2 -o vect-add
$ ./vect-add
Some interesting compiler flags useful not only for code debugging are:
Debugging
openmp_report[0|1|2] - controls the OpenMP parallelizer diagnostic
level
vec-report[0|1|2] - controls the compiler-based vectorization diagnostic
level
Performance optimization
xhost - FOR HOST ONLY - to generate AVX (Advanced Vector Extensions)
instructions.
Automatic Offload using Intel MKL Library
-----------------------------------------
Intel MKL includes an Automatic Offload (AO) feature that enables
computationally intensive MKL functions called in user code to benefit
from attached Intel Xeon Phi coprocessors automatically and
transparently.
The behaviour of the automatic offload mode is controlled by functions
called within the program or by environment variables. A complete list of
controls is listed
[here](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_userguide_lnx/GUID-3DC4FC7D-A1E4-423D-9C0C-06AB265FFA86.htm).
The Automatic Offload may be enabled by either an MKL function call
within the code:
mkl_mic_enable();
or by setting an environment variable:
$ export MKL_MIC_ENABLE=1
To get more information about automatic offload, please refer to the
"[Using Intel® MKL Automatic Offload on Intel® Xeon Phi™
Coprocessors](http://software.intel.com/sites/default/files/11MIC42_How_to_Use_MKL_Automatic_Offload_0.pdf)"
white paper or the [Intel MKL
documentation](https://software.intel.com/en-us/articles/intel-math-kernel-library-documentation).
### Automatic offload example #1
The following example shows how to automatically offload an SGEMM (single
precision general matrix multiply) function to the MIC coprocessor.
At first, get an interactive PBS session on a node with a MIC accelerator
and load the "intel" module, which automatically loads the "mkl" module
as well.
$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0
$ module load intel
The code can be copied to a file and compiled without any necessary
modification.
$ vim sgemm-ao-short.c
```
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <stdint.h>

#include "mkl.h"

int main(int argc, char **argv)
{
    float *A, *B, *C;       /* Matrices */
    MKL_INT N = 2560;       /* Matrix dimensions */
    MKL_INT LD = N;         /* Leading dimension */
    int matrix_bytes;       /* Matrix size in bytes */
    int matrix_elements;    /* Matrix size in elements */

    float alpha = 1.0, beta = 1.0;     /* Scaling factors */
    char transa = 'N', transb = 'N';   /* Transposition options */

    int i, j;               /* Counters */

    matrix_elements = N * N;
    matrix_bytes = sizeof(float) * matrix_elements;

    /* Allocate the matrices */
    A = malloc(matrix_bytes); B = malloc(matrix_bytes); C = malloc(matrix_bytes);

    /* Initialize the matrices */
    for (i = 0; i < matrix_elements; i++) {
        A[i] = 1.0; B[i] = 2.0; C[i] = 0.0;
    }

    printf("Computing SGEMM on the host\n");
    sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N);

    printf("Enabling Automatic Offload\n");
    /* Alternatively, set environment variable MKL_MIC_ENABLE=1 */
    mkl_mic_enable();

    int ndevices = mkl_mic_get_device_count(); /* Number of MIC devices */
    printf("Automatic Offload enabled: %d MIC devices present\n", ndevices);

    printf("Computing SGEMM with automatic workdivision\n");
    sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N);

    /* Free the matrix memory */
    free(A); free(B); free(C);

    printf("Done\n");

    return 0;
}
```
Please note: This example is a simplified version of an example from MKL.
The expanded version can be found here:
**$MKL_EXAMPLES/mic_ao/blasc/source/sgemm.c**
To compile the code using the Intel compiler use:
$ icc -mkl sgemm-ao-short.c -o sgemm
For debugging purposes enable the offload report to see more information
about automatic offloading.
$ export OFFLOAD_REPORT=2
The output of the code should look similar to the following listing, where
lines starting with [MKL] are generated by offload reporting:
[user@r31u03n799 ~]$ ./sgemm
Computing SGEMM on the host
Enabling Automatic Offload
Automatic Offload enabled: 2 MIC devices present
Computing SGEMM with automatic workdivision
[MKL] [MIC --] [AO Function] SGEMM
[MKL] [MIC --] [AO SGEMM Workdivision] 0.44 0.28 0.28
[MKL] [MIC 00] [AO SGEMM CPU Time] 0.252427 seconds
[MKL] [MIC 00] [AO SGEMM MIC Time] 0.091001 seconds
[MKL] [MIC 00] [AO SGEMM CPU->MIC Data] 34078720 bytes
[MKL] [MIC 00] [AO SGEMM MIC->CPU Data] 7864320 bytes
[MKL] [MIC 01] [AO SGEMM CPU Time] 0.252427 seconds
[MKL] [MIC 01] [AO SGEMM MIC Time] 0.094758 seconds
[MKL] [MIC 01] [AO SGEMM CPU->MIC Data] 34078720 bytes
[MKL] [MIC 01] [AO SGEMM MIC->CPU Data] 7864320 bytes
Done
### Automatic offload example #2
In this example, we will demonstrate automatic offload control via the
environment variable MKL_MIC_ENABLE. The function DGEMM will be
offloaded.
At first, get an interactive PBS session on a node with a MIC accelerator.
$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0
Once in, we enable the offload and run the Octave software. In Octave,
we generate two large random matrices and multiply them together.
$ export MKL_MIC_ENABLE=1
$ export OFFLOAD_REPORT=2
$ module load Octave/3.8.2-intel-2015b
$ octave -q
octave:1> A=rand(10000);
octave:2> B=rand(10000);
octave:3> C=A*B;
[MKL] [MIC --] [AO Function] DGEMM
[MKL] [MIC --] [AO DGEMM Workdivision] 0.14 0.43 0.43
[MKL] [MIC 00] [AO DGEMM CPU Time] 3.814714 seconds
[MKL] [MIC 00] [AO DGEMM MIC Time] 2.781595 seconds
[MKL] [MIC 00] [AO DGEMM CPU->MIC Data] 1145600000 bytes
[MKL] [MIC 00] [AO DGEMM MIC->CPU Data] 1382400000 bytes
[MKL] [MIC 01] [AO DGEMM CPU Time] 3.814714 seconds
[MKL] [MIC 01] [AO DGEMM MIC Time] 2.843016 seconds
[MKL] [MIC 01] [AO DGEMM CPU->MIC Data] 1145600000 bytes
[MKL] [MIC 01] [AO DGEMM MIC->CPU Data] 1382400000 bytes
octave:4> exit
In the example above we observe that the DGEMM function workload was
split over the CPU, MIC 0 and MIC 1, in the ratio 0.14 : 0.43 : 0.43. The
matrix multiplication was done on the CPU, accelerated by the two Xeon Phi
accelerators.
Native Mode
-----------
In the native mode a program is executed directly on the Intel Xeon Phi
without involvement of the host machine. Similarly to the offload mode,
the code is compiled on the host computer with the Intel compilers.
To compile the code, the user has to be connected to a compute node with
a MIC and load the Intel compilers module. To get an interactive session
on a compute node with an Intel Xeon Phi and load the module, use the
following commands:
$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0
$ module load intel
Please note that particular version of the Intel module is specified.
This information is used later to specify the correct library paths.
To produce a binary compatible with the Intel Xeon Phi architecture, the
user has to specify the "-mmic" compiler flag. Two compilation examples
are shown below. The first example shows how to compile the OpenMP
parallel code "vect-add.c" for the host only:
$ icc -xhost -no-offload -fopenmp vect-add.c -o vect-add-host
To run this code on host, use:
$ ./vect-add-host
The second example shows how to compile the same code for Intel Xeon
Phi:
$ icc -mmic -fopenmp vect-add.c -o vect-add-mic
### Execution of the Program in Native Mode on Intel Xeon Phi
User access to the Intel Xeon Phi is through SSH. Since user
home directories are mounted on the accelerator using NFS, users do not
have to copy binary files or libraries between the host and the accelerator.
Get the path of the MIC-enabled libraries for the currently used Intel
compiler (here icc/2015.3.187-GNU-5.1.0-2.25 was used):
$ echo $MIC_LD_LIBRARY_PATH
/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic
To connect to the accelerator run:
$ ssh mic0
If the code is sequential, it can be executed directly:
mic0 $ ~/path_to_binary/vect-add-seq-mic
If the code is parallelized using OpenMP, a set of additional libraries
is required for execution. To locate these libraries, a new path has to
be added to the LD_LIBRARY_PATH environment variable prior to the
execution:
mic0 $ export LD_LIBRARY_PATH=/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic:$LD_LIBRARY_PATH
Please note that the path exported in the previous example contains the
path to a specific compiler (here the version is 2015.3.187-GNU-5.1.0-2.25).
This version number has to match the version number of the Intel
compiler module that was used to compile the code on the host computer.
For your information, the list of libraries and their location required
for execution of an OpenMP parallel code on the Intel Xeon Phi is:
/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic
libiomp5.so
libimf.so
libsvml.so
libirng.so
libintlc.so.5
Finally, to run the compiled code use:
$ ~/path_to_binary/vect-add-mic
OpenCL
------
OpenCL (Open Computing Language) is an open standard for
general-purpose parallel programming for a diverse mix of multi-core CPUs,
GPU coprocessors, and other parallel processors. OpenCL provides a
flexible execution model and uniform programming environment for
software developers to write portable code for systems running on both
the CPU and graphics processors or accelerators like the Intel® Xeon
Phi.
On Anselm, OpenCL is installed only on compute nodes with a MIC
accelerator, therefore OpenCL code can be compiled only on these nodes.
module load opencl-sdk opencl-rt
>Always load "opencl-sdk" (providing devel files like headers) and
"opencl-rt" (providing dynamic library libOpenCL.so) modules to compile
and link OpenCL code. Load "opencl-rt" for running your compiled code.
>There are two basic examples of OpenCL code in the following
directory:
/apps/intel/opencl-examples/
>First example "CapsBasic" detects OpenCL compatible hardware, here
CPU and MIC, and prints basic information about the capabilities of
it.
/apps/intel/opencl-examples/CapsBasic/capsbasic
To compile and run the example, copy it to your home directory, get
a PBS interactive session on one of the nodes with a MIC and run make for
compilation. The Makefiles are very basic and show how the OpenCL code
can be compiled on Anselm.
$ cp /apps/intel/opencl-examples/CapsBasic/* .
$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0
$ make
The compilation command for this example is:
$ g++ capsbasic.cpp -lOpenCL -o capsbasic -I/apps/intel/opencl/include/
After executing the compiled binary file, the following output should
be displayed.
./capsbasic
Number of available platforms: 1
Platform names:
[0] Intel(R) OpenCL [Selected]
Number of devices available for each type:
CL_DEVICE_TYPE_CPU: 1
CL_DEVICE_TYPE_GPU: 0
CL_DEVICE_TYPE_ACCELERATOR: 1
*** Detailed information for each device ***
CL_DEVICE_TYPE_CPU[0]
CL_DEVICE_NAME: Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz
CL_DEVICE_AVAILABLE: 1
...
CL_DEVICE_TYPE_ACCELERATOR[0]
CL_DEVICE_NAME: Intel(R) Many Integrated Core Acceleration Card
CL_DEVICE_AVAILABLE: 1
...
More information about this example can be found on the Intel website:
<http://software.intel.com/en-us/vcsource/samples/caps-basic/>
The second example that can be found in the
"/apps/intel/opencl-examples" directory is General Matrix
Multiply. You can follow the same procedure to download the example
to your directory and compile it.
$ cp -r /apps/intel/opencl-examples/* .
$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0
$ cd GEMM
$ make
The compilation command for this example is:
$ g++ cmdoptions.cpp gemm.cpp ../common/basic.cpp ../common/cmdparser.cpp ../common/oclobject.cpp -I../common -lOpenCL -o gemm -I/apps/intel/opencl/include/
To see the performance of the Intel Xeon Phi performing the DGEMM, run
the example as follows:
./gemm -d 1
Platforms (1):
[0] Intel(R) OpenCL [Selected]
Devices (2):
[0] Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz
[1] Intel(R) Many Integrated Core Acceleration Card [Selected]
Build program options: "-DT=float -DTILE_SIZE_M=1 -DTILE_GROUP_M=16 -DTILE_SIZE_N=128 -DTILE_GROUP_N=1 -DTILE_SIZE_K=8"
Running gemm_nn kernel with matrix size: 3968x3968
Memory row stride to ensure necessary alignment: 15872 bytes
Size of memory region for one matrix: 62980096 bytes
Using alpha = 0.57599 and beta = 0.872412
...
Host time: 0.292953 sec.
Host perf: 426.635 GFLOPS
Host time: 0.293334 sec.
Host perf: 426.081 GFLOPS
...
Please note: the GNU compiler is used to compile the OpenCL codes for the
Intel MIC. You do not need to load the Intel compiler module.
MPI
---
### Environment setup and compilation
To achieve the best MPI performance, always use the following setup for
Intel MPI on Xeon Phi accelerated nodes:
$ export I_MPI_FABRICS=shm:dapl
$ export I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1u,ofa-v2-scif0,ofa-v2-mcm-1
This ensures that MPI inside a node will use SHMEM communication, between
the HOST and Phi the IB SCIF will be used, and between different nodes or
Phis on different nodes a CCL-Direct proxy will be used.
Please note: Other FABRICS like tcp or ofa may be used (even combined with
shm), but there is a severe loss of performance (by an order of magnitude).
Usage of a single DAPL PROVIDER (e.g.
I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u) will cause failure of
Host<->Phi and/or Phi<->Phi communication.
Usage of the I_MPI_DAPL_PROVIDER_LIST on a non-accelerated node will
cause failure of any MPI communication, since those nodes do not have a
SCIF device and there is no CCL-Direct proxy running.
Again, MPI code for the Intel Xeon Phi has to be compiled on a compute
node with an accelerator and the MPSS software stack installed. To get to
a compute node with an accelerator, use:
$ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0
The only supported implementation of the MPI standard for the Intel Xeon
Phi is Intel MPI. To set up a fully functional development environment, a
combination of the Intel compiler and Intel MPI has to be used. On the
host, load the following modules before compilation:
$ module load intel impi
To compile an MPI code for host use:
$ mpiicc -xhost -o mpi-test mpi-test.c
To compile the same code for Intel Xeon Phi architecture use:
$ mpiicc -mmic -o mpi-test-mic mpi-test.c
Or, if you are using Fortran:
$ mpiifort -mmic -o mpi-test-mic mpi-test.f90
A basic MPI version of the "hello-world" example in the C language, that
can be executed on both the host and the Xeon Phi, is shown below (it can
be directly copied and pasted to a .c file):
```
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    int len;
    char node[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                 /* starts MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* get current process id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* get number of processes */

    MPI_Get_processor_name(node, &len);

    printf("Hello world from process %d of %d on host %s\n", rank, size, node);

    MPI_Finalize();
    return 0;
}
```
### MPI programming models
Intel MPI for the Xeon Phi coprocessors offers different MPI
programming models:
**Host-only model** - all MPI ranks reside on the host. The coprocessors
can be used by using offload pragmas. (Using MPI calls inside offloaded
code is not supported.)
**Coprocessor-only model** - all MPI ranks reside only on the
coprocessors.
**Symmetric model** - the MPI ranks reside on both the host and the
coprocessor. This is the most general MPI case.
### Host-only model
In this case all environment variables are set by the modules,
so to execute the compiled MPI program on a single node, use:
$ mpirun -np 4 ./mpi-test
The output should be similar to:
Hello world from process 1 of 4 on host r38u31n1000
Hello world from process 3 of 4 on host r38u31n1000
Hello world from process 2 of 4 on host r38u31n1000
Hello world from process 0 of 4 on host r38u31n1000
### Coprocessor-only model
There are two ways to execute an MPI code on a single
coprocessor: 1) launch the program using "**mpirun**" from the
coprocessor; or 2) launch the task using "**mpiexec.hydra**" from a
host.
**Execution on coprocessor**
Similarly to the execution of OpenMP programs in native mode, since the
environment modules are not supported on the MIC, the user has to set up
the paths to the Intel MPI libraries and binaries manually. A one-time
setup can be done by creating a "**.profile**" file in the user's home
directory. This file sets up the environment on the MIC automatically
once the user accesses the accelerator through SSH.
At first, get the LD_LIBRARY_PATH for the currently used Intel compiler
and Intel MPI:
$ echo $MIC_LD_LIBRARY_PATH
/apps/all/imkl/11.2.3.187-iimpi-7.3.5-GNU-5.1.0-2.25/mkl/lib/mic:/apps/all/imkl/11.2.3.187-iimpi-7.3.5-GNU-5.1.0-2.25/lib/mic:/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic/
Use it in your ~/.profile:
$ vim ~/.profile
PS1='[\u@\h \W]\$ '
export PATH=/usr/bin:/usr/sbin:/bin:/sbin
#IMPI
export PATH=/apps/all/impi/5.0.3.048-iccifort-2015.3.187-GNU-5.1.0-2.25/mic/bin/:$PATH
#OpenMP (ICC, IFORT), IMKL and IMPI
export LD_LIBRARY_PATH=/apps/all/imkl/11.2.3.187-iimpi-7.3.5-GNU-5.1.0-2.25/mkl/lib/mic:/apps/all/imkl/11.2.3.187-iimpi-7.3.5-GNU-5.1.0-2.25/lib/mic:/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic:$LD_LIBRARY_PATH
Please note:
- this file sets up the environment variables for both the MPI and OpenMP
libraries.
- this file sets up the paths to a particular version of the Intel MPI
library and a particular version of the Intel compiler. These versions
have to match the loaded modules.
To access a MIC accelerator located on the node that the user is currently
connected to, use:
$ ssh mic0
or in case you need to specify a MIC accelerator on a particular node, use:
$ ssh r38u31n1000-mic0
To run the MPI code in parallel on multiple cores of the accelerator,
use:
$ mpirun -np 4 ./mpi-test-mic
The output should be similar to:
Hello world from process 1 of 4 on host r38u31n1000-mic0
Hello world from process 2 of 4 on host r38u31n1000-mic0
Hello world from process 3 of 4 on host r38u31n1000-mic0
Hello world from process 0 of 4 on host r38u31n1000-mic0
**Execution on host**
If the MPI program is launched from the host instead of the coprocessor,
the environment variables are not set using the ".profile" file. Therefore
the user has to specify the library paths from the command line when
calling "mpiexec".
The first step is to tell mpiexec that the MPI should be executed on a
local accelerator by setting up the environment variable "I_MPI_MIC":
$ export I_MPI_MIC=1
Now the MPI program can be executed as:
$ mpiexec.hydra -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH -host mic0 -n 4 ~/mpi-test-mic
or using mpirun:
$ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH -host mic0 -n 4 ~/mpi-test-mic
Please note:
- the full path to the binary has to be specified (here:
"**~/mpi-test-mic**")
- the LD_LIBRARY_PATH has to match the Intel MPI module used to
compile the MPI code
The output should be again similar to:
Hello world from process 1 of 4 on host r38u31n1000-mic0
Hello world from process 2 of 4 on host r38u31n1000-mic0
Hello world from process 3 of 4 on host r38u31n1000-mic0
Hello world from process 0 of 4 on host r38u31n1000-mic0
Please note that the "mpiexec.hydra" requires a file
"**>pmi_proxy**" from Intel MPI library to be copied to the
MIC filesystem. If the file is missing please contact the system
administrators. A simple test to see if the file is present is to
execute:
$ ssh mic0 ls /bin/pmi_proxy
/bin/pmi_proxy
**Execution on host - MPI processes distributed over multiple
accelerators on multiple nodes**
To get access to multiple nodes with MIC accelerators, the user has to
use PBS to allocate the resources. To start an interactive session that
allocates 2 compute nodes = 2 MIC accelerators, run the qsub command with
the following parameters:
$ qsub -I -q qprod -l select=2:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0
$ module load intel impi
This command connects the user through ssh to one of the nodes
immediately. To see the other nodes that have been allocated, use:
$ cat $PBS_NODEFILE
For example:
r38u31n1000.bullx
r38u32n1001.bullx
This output means that PBS allocated nodes r38u31n1000 and
r38u32n1001, which means that the user has direct access to the
"**r38u31n1000-mic0**" and "**r38u32n1001-mic0**"
accelerators.
Please note: At this point the user can connect to any of the
allocated nodes or any of the allocated MIC accelerators using ssh:
- to connect to the second node: **$ ssh r38u32n1001**
- to connect to the accelerator on the first node from the first
node: **$ ssh r38u31n1000-mic0** or **$ ssh mic0**
- to connect to the accelerator on the second node from the first
node: **$ ssh r38u32n1001-mic0**
At this point we expect that the correct modules are loaded and the
binary is compiled. For parallel execution, mpiexec.hydra is used.
Again, the first step is to tell mpiexec that the MPI can be executed on
the MIC accelerators by setting up the environment variable "I_MPI_MIC";
don't forget to have the correct FABRICS and PROVIDER defined.
$ export I_MPI_MIC=1
$ export I_MPI_FABRICS=shm:dapl
$ export I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1u,ofa-v2-scif0,ofa-v2-mcm-1
To launch the MPI program, use:
$ mpiexec.hydra -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH
 -host r38u31n1000-mic0 -n 4 ~/mpi-test-mic
: -host r38u32n1001-mic0 -n 6 ~/mpi-test-mic
or using mpirun:
$ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH
 -host r38u31n1000-mic0 -n 4 ~/mpi-test-mic
: -host r38u32n1001-mic0 -n 6 ~/mpi-test-mic
In this case four MPI processes are executed on the accelerator
r38u31n1000-mic0 and six processes are executed on the accelerator
r38u32n1001-mic0. The sample output (sorted after execution) is:
Hello world from process 0 of 10 on host r38u31n1000-mic0
Hello world from process 1 of 10 on host r38u31n1000-mic0
Hello world from process 2 of 10 on host r38u31n1000-mic0
Hello world from process 3 of 10 on host r38u31n1000-mic0
Hello world from process 4 of 10 on host r38u32n1001-mic0
Hello world from process 5 of 10 on host r38u32n1001-mic0
Hello world from process 6 of 10 on host r38u32n1001-mic0
Hello world from process 7 of 10 on host r38u32n1001-mic0
Hello world from process 8 of 10 on host r38u32n1001-mic0
Hello world from process 9 of 10 on host r38u32n1001-mic0
In the same way, the MPI program can be executed on multiple hosts:
$ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH
-host r38u31n1000 -n 4 ~/mpi-test
: -host r38u32n1001 -n 6 ~/mpi-test
### Symmetric model
In the symmetric mode, MPI programs are executed on both the host
computer(s) and the MIC accelerator(s). Since the MIC has a different
architecture and requires a different binary file produced by the Intel
compiler, two different files have to be compiled before the MPI program
is executed.
In the previous section we have compiled two binary files, one for the
hosts, "**mpi-test**", and one for the MIC accelerators, "**mpi-test-mic**".
These two binaries can be executed at once using mpiexec.hydra:
$ mpirun
-genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH
-host r38u32n1001 -n 2 ~/mpi-test
: -host r38u32n1001-mic0 -n 2 ~/mpi-test-mic
In this example, the -genv parameter (line 2) sets up the required
environment variable for execution. The third line specifies the binary
that is executed on the host (here r38u32n1001) and the last line
specifies the binary that is executed on the accelerator (here
r38u32n1001-mic0).
The output of the program is:
Hello world from process 0 of 4 on host r38u32n1001
Hello world from process 1 of 4 on host r38u32n1001
Hello world from process 2 of 4 on host r38u32n1001-mic0
Hello world from process 3 of 4 on host r38u32n1001-mic0
The execution procedure can be simplified by using the mpirun
command with a machine file as a parameter. The machine file contains a
list of all nodes and accelerators that should be used to execute MPI
processes.
An example of a machine file that uses 2 hosts (r38u32n1001
and r38u33n1002) and 2 accelerators (**r38u32n1001-mic0** and
**r38u33n1002-mic0**) to run 2 MPI processes
on each of them:
$ cat hosts_file_mix
r38u32n1001:2
r38u32n1001-mic0:2
r38u33n1002:2
r38u33n1002-mic0:2
In addition, if a naming convention is set in a way that the name
of the binary for the host is **"bin_name"** and the name of the binary
for the accelerator is **"bin_name-mic"**, then by setting the
environment variable **I_MPI_MIC_POSTFIX** to **"-mic"** the user does
not have to specify the names of both binaries. In this case mpirun needs
just the name of the host binary file (i.e. "mpi-test") and uses the
suffix to get the name of the binary for the accelerator (i.e.
"mpi-test-mic").
$ export I_MPI_MIC_POSTFIX=-mic
To run the MPI code using mpirun and the machine file
"hosts_file_mix", use:
$ mpirun
-genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH
-machinefile hosts_file_mix
~/mpi-test
A possible output of the MPI "hello-world" example executed on two
hosts and two accelerators is:
Hello world from process 0 of 8 on host r38u31n1000
Hello world from process 1 of 8 on host r38u31n1000
Hello world from process 2 of 8 on host r38u31n1000-mic0
Hello world from process 3 of 8 on host r38u31n1000-mic0
Hello world from process 4 of 8 on host r38u32n1001
Hello world from process 5 of 8 on host r38u32n1001
Hello world from process 6 of 8 on host r38u32n1001-mic0
Hello world from process 7 of 8 on host r38u32n1001-mic0
Using the PBS automatically generated node-files
PBS also generates a set of node-files that can be used instead of
manually creating a new one every time. Three node-files are generated:
**Host only node-file:**
- /lscratch/${PBS_JOBID}/nodefile-cn
**MIC only node-file:**
- /lscratch/${PBS_JOBID}/nodefile-mic
**Host and MIC node-file:**
- /lscratch/${PBS_JOBID}/nodefile-mix
Please note that each host or accelerator is listed only once per file.
The user has to specify how many processes should be executed per node
using the "-n" parameter of the mpirun command, as shown in the sketch
below.
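A sketch of such a run using the mixed node-file (the process count and
the reuse of the I_MPI_MIC_POSTFIX convention from above are assumptions
for illustration):
$ export I_MPI_MIC_POSTFIX=-mic
$ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH -machinefile /lscratch/${PBS_JOBID}/nodefile-mix -n 4 ~/mpi-test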
Optimization
------------
For more details about optimization techniques please read Intel
document [Optimization and Performance Tuning for Intel® Xeon Phi™
Coprocessors](http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-1-optimization "http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-1-optimization")
Java
====
Java on the cluster
Java is available on the cluster. Activate Java by loading the Java
module:
$ module load Java
Note that the Java module must be loaded on the compute nodes as well,
in order to run Java on the compute nodes.
Check the Java version and path:
$ java -version
$ which java
With the module loaded, not only the runtime environment (JRE), but also
the development environment (JDK) with the compiler is available.
$ javac -version
$ which javac
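A minimal compile-and-run check (the HelloWorld class below is an
illustrative example, not part of the cluster software):
$ cat > HelloWorld.java <<'EOF'
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello from Java on the cluster");
    }
}
EOF
$ javac HelloWorld.java
$ java HelloWorld
Hello from Java on the cluster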
Java applications may use MPI for interprocess communication, in
conjunction with OpenMPI. Read more
on <http://www.open-mpi.org/faq/?category=java>.
This functionality is currently not supported on the Anselm cluster. In
case you require the Java interface to MPI, please contact [cluster
support](https://support.it4i.cz/rt/).
Running OpenMPI
===============
OpenMPI program execution
-------------------------
OpenMPI programs may be executed only via the PBS Workload manager,
by entering an appropriate queue. On the cluster, **OpenMPI 1.8.6**
is the available OpenMPI-based MPI implementation.
### Basic usage
Use the mpiexec to run the OpenMPI code.
Example:
$ qsub -q qexp -l select=4:ncpus=24 -I
qsub: waiting for job 15210.isrv5 to start
qsub: job 15210.isrv5 ready
$ pwd
/home/username
$ module load OpenMPI
$ mpiexec -pernode ./helloworld_mpi.x
Hello world! from rank 0 of 4 on host r1i0n17
Hello world! from rank 1 of 4 on host r1i0n5
Hello world! from rank 2 of 4 on host r1i0n6
Hello world! from rank 3 of 4 on host r1i0n7
Please be aware that in this example, the option **-pernode** is
used to run only **one task per node**, which is normally unwanted
behaviour (unless you want to run hybrid code with just one MPI and 24
OpenMP tasks per node). In normal MPI programs, **omit the -pernode
option** to run up to 24 MPI tasks per node.
In this example, we allocate 4 nodes via the express queue
interactively. We set up the OpenMPI environment and interactively run
the helloworld_mpi.x program.
Note that the executable
helloworld_mpi.x must be available within the
same path on all nodes. This is automatically fulfilled on the /home and
/scratch filesystems.
You need to preload the executable if running on the local ramdisk /tmp
filesystem:
$ pwd
/tmp/pbs.15210.isrv5
$ mpiexec -pernode --preload-binary ./helloworld_mpi.x
Hello world! from rank 0 of 4 on host r1i0n17
Hello world! from rank 1 of 4 on host r1i0n5
Hello world! from rank 2 of 4 on host r1i0n6
Hello world! from rank 3 of 4 on host r1i0n7
In this example, we assume the executable
helloworld_mpi.x is present on compute node
r1i0n17 on the ramdisk. We call mpiexec with the **--preload-binary**
argument (valid for OpenMPI). The mpiexec will copy the executable from
r1i0n17 to the /tmp/pbs.15210.isrv5
directory on r1i0n5, r1i0n6 and r1i0n7 and execute the program.
MPI process mapping may be controlled by PBS parameters.
The mpiprocs and ompthreads parameters allow for selection of the number
of running MPI processes per node as well as the number of OpenMP threads
per MPI process.
### One MPI process per node
Follow this example to run one MPI process per node, 24 threads per
process.
$ qsub -q qexp -l select=4:ncpus=24:mpiprocs=1:ompthreads=24 -I
$ module load OpenMPI
$ mpiexec --bind-to-none ./helloworld_mpi.x
In this example, we demonstrate the recommended way to run an MPI
application, using 1 MPI process per node and 24 threads per process,
on 4 nodes.
### Two MPI processes per node
Follow this example to run two MPI processes per node, 12 threads per
process. Note the options to mpiexec.
$ qsub -q qexp -l select=4:ncpus=24:mpiprocs=2:ompthreads=12 -I
$ module load OpenMPI
$ mpiexec -bysocket -bind-to-socket ./helloworld_mpi.x
In this example, we demonstrate the recommended way to run an MPI
application, using 2 MPI processes per node and 12 threads per socket,
each process and its threads bound to a separate processor socket of the
node, on 4 nodes.
### 24 MPI processes per node
Follow this example to run 24 MPI processes per node, 1 thread per
process. Note the options to mpiexec.
$ qsub -q qexp -l select=4:ncpus=24:mpiprocs=24:ompthreads=1 -I
$ module load OpenMPI
$ mpiexec -bycore -bind-to-core ./helloworld_mpi.x
In this example, we demonstrate the recommended way to run an MPI
application, using 24 MPI processes per node, single threaded. Each
process is bound to a separate processor core, on 4 nodes.
### OpenMP thread affinity
Important! Bind every OpenMP thread to a core!
In the previous two examples with one or two MPI processes per node, the
operating system might still migrate OpenMP threads between cores. You
might want to avoid this by setting this environment variable for GCC
OpenMP:
$ export GOMP_CPU_AFFINITY="0-23"
or this one for Intel OpenMP:
$ export KMP_AFFINITY=granularity=fine,compact,1,0
As of OpenMP 4.0 (supported by GCC 4.9 and later and Intel 14.0 and
later) the following variables may be used for Intel or GCC:
$ export OMP_PROC_BIND=true
$ export OMP_PLACES=cores
OpenMPI Process Mapping and Binding
-----------------------------------
The mpiexec allows for precise selection of how the MPI processes will
be mapped to the computational nodes and how these processes will bind
to particular processor sockets and cores.
MPI process mapping may be specified by a hostfile or rankfile input to
the mpiexec program. Although all implementations of MPI provide means
for process mapping and binding, the following examples are valid for
OpenMPI only.
### Hostfile
Example hostfile
r1i0n17.smc.salomon.it4i.cz
r1i0n5.smc.salomon.it4i.cz
r1i0n6.smc.salomon.it4i.cz
r1i0n7.smc.salomon.it4i.cz
Use the hostfile to control process placement
$ mpiexec -hostfile hostfile ./helloworld_mpi.x
Hello world! from rank 0 of 4 on host r1i0n17
Hello world! from rank 1 of 4 on host r1i0n5
Hello world! from rank 2 of 4 on host r1i0n6
Hello world! from rank 3 of 4 on host r1i0n7
In this example, we see that the ranks have been mapped on nodes
according to the order in which the nodes appear in the hostfile.
### Rankfile
Exact control of MPI process placement and resource binding is provided
by specifying a rankfile.
Appropriate binding may boost performance of your application.
Example rankfile
rank 0=r1i0n7.smc.salomon.it4i.cz slot=1:0,1
rank 1=r1i0n6.smc.salomon.it4i.cz slot=0:*
rank 2=r1i0n5.smc.salomon.it4i.cz slot=1:1-2
rank 3=r1i0n17.smc.salomon slot=0:1,1:0-2
rank 4=r1i0n6.smc.salomon.it4i.cz slot=0:*,1:*
This rankfile assumes 5 ranks will be running on 4 nodes and provides
exact mapping and binding of the processes to the processor sockets and
cores
Explanation:
rank 0 will be bound to r1i0n7, socket1 core0 and core1
rank 1 will be bound to r1i0n6, socket0, all cores
rank 2 will be bound to r1i0n5, socket1, core1 and core2
rank 3 will be bound to r1i0n17, socket0 core1, socket1 core0, core1, core2
rank 4 will be bound to r1i0n6, all cores on both sockets
$ mpiexec -n 5 -rf rankfile --report-bindings ./helloworld_mpi.x
[r1i0n17:11180] MCW rank 3 bound to socket 0[core 1] socket 1[core 0-2]: [. B . . . . . . . . . .][B B B . . . . . . . . .] (slot list 0:1,1:0-2)
[r1i0n7:09928] MCW rank 0 bound to socket 1[core 0-1]: [. . . . . . . . . . . .][B B . . . . . . . . . .] (slot list 1:0,1)
[r1i0n6:10395] MCW rank 1 bound to socket 0[core 0-7]: [B B B B B B B B B B B B][. . . . . . . . . . . .] (slot list 0:*)
[r1i0n5:10406] MCW rank 2 bound to socket 1[core 1-2]: [. . . . . . . . . . . .][. B B . . . . . . . . .] (slot list 1:1-2)
[r1i0n6:10406] MCW rank 4 bound to socket 0[core 0-7] socket 1[core 0-7]: [B B B B B B B B B B B B][B B B B B B B B B B B B] (slot list 0:*,1:*)
Hello world! from rank 3 of 5 on host r1i0n17
Hello world! from rank 1 of 5 on host r1i0n6
Hello world! from rank 0 of 5 on host r1i0n7
Hello world! from rank 4 of 5 on host r1i0n6
Hello world! from rank 2 of 5 on host r1i0n5
In this example we run 5 MPI processes (5 ranks) on four nodes. The
rankfile defines how the processes will be mapped on the nodes, sockets
and cores. The **--report-bindings** option was used to print out the
actual process location and bindings. Note that ranks 1 and 4 run on the
same node and their core binding overlaps.
It is the user's responsibility to provide the correct number of ranks, sockets
and cores.
### Bindings verification
In all cases, binding and threading may be verified by executing for
example:
$ mpiexec -bysocket -bind-to-socket --report-bindings echo
$ mpiexec -bysocket -bind-to-socket numactl --show
$ mpiexec -bysocket -bind-to-socket echo $OMP_NUM_THREADS
Changes in OpenMPI 1.8
----------------------
Some options have changed in OpenMPI version 1.8.
version 1.6.5        version 1.8.1
-------------------- ---------------------
--bind-to-none       --bind-to none
--bind-to-core       --bind-to core
--bind-to-socket     --bind-to socket
-bysocket            --map-by socket
-bycore              --map-by core
-pernode             --map-by ppr:1:node
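For example, the two-processes-per-node run shown earlier would, with OpenMPI
1.8, look like this (a sketch based directly on the table above):

```
$ qsub -q qexp -l select=4:ncpus=24:mpiprocs=2:ompthreads=12 -I
$ module load OpenMPI
$ mpiexec --map-by socket --bind-to socket ./helloworld_mpi.x
```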
MPI
===
Setting up MPI Environment
--------------------------
The Salomon cluster provides several implementations of the MPI library:
-------------------------------------------------------------------------
MPI Library          Thread support
-------------------- ----------------------------------------------------
**Intel MPI 4.1**    Full thread support up to MPI_THREAD_MULTIPLE

**Intel MPI 5.0**    Full thread support up to MPI_THREAD_MULTIPLE

**OpenMPI 1.8.6**    Full thread support up to MPI_THREAD_MULTIPLE, MPI-3.0 support

SGI MPT 2.12
-------------------------------------------------------------------------
MPI libraries are activated via the environment modules.
Look up section modulefiles/mpi in module avail
$ module avail
------------------------------ /apps/modules/mpi -------------------------------
impi/4.1.1.036-iccifort-2013.5.192
impi/4.1.1.036-iccifort-2013.5.192-GCC-4.8.3
impi/5.0.3.048-iccifort-2015.3.187
impi/5.0.3.048-iccifort-2015.3.187-GNU-5.1.0-2.25
MPT/2.12
OpenMPI/1.8.6-GNU-5.1.0-2.25
There are default compilers associated with any particular MPI
implementation. The defaults may be changed; the MPI libraries may be
used in conjunction with any compiler.
The defaults are selected via the modules in the following way:

--------------------------------------------------------------------------
Module                               MPI               Compiler suite
------------------------------------ ----------------- -------------------
impi-5.0.3.048-iccifort-2015.3.187   Intel MPI 5.0.3

OpenMPI-1.8.6-GNU-5.1.0-2.25         OpenMPI 1.8.6
--------------------------------------------------------------------------
Examples:
$ module load gompi/2015b
In this example, we activate the latest OpenMPI with the latest GNU
compilers (OpenMPI 1.8.6 and GCC 5.1). Please see more information about
toolchains in the [Environment and
Modules](../../environment-and-modules.html) section.
To use OpenMPI with the intel compiler suite, use
$ module load iompi/2015.03
In this example, OpenMPI 1.8.6 with the Intel compilers is activated,
via the "iompi" toolchain.
Compiling MPI Programs
----------------------
After setting up your MPI environment, compile your program using one of
the mpi wrappers
$ mpicc -v
$ mpif77 -v
$ mpif90 -v
When using Intel MPI, use the following MPI wrappers:
$ mpicc
$ mpiifort
The mpif90 and mpif77 wrappers provided by Intel MPI are designed for
gcc and gfortran. You might be able to compile MPI code with them even
with the Intel compilers, but you might run into problems (for example,
native MIC compilation with -mmic does not work with mpif90).
Example program:
// helloworld_mpi.c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {

  int len;
  int rank, size;
  char node[MPI_MAX_PROCESSOR_NAME];

  // Initiate MPI
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  // Get hostname and print
  MPI_Get_processor_name(node, &len);
  printf("Hello world! from rank %d of %d on host %s\n", rank, size, node);

  // Finalize and exit
  MPI_Finalize();

  return 0;
}
Compile the above example with
$ mpicc helloworld_mpi.c -o helloworld_mpi.x
Running MPI Programs
--------------------
The MPI program executable must be compatible with the loaded MPI
module.
Always compile and execute using the very same MPI module.
It is strongly discouraged to mix MPI implementations. Linking an
application with one MPI implementation and running mpirun/mpiexec from
another implementation may result in unexpected errors.
The MPI program executable must be available within the same path on all
nodes. This is automatically fulfilled on the /home and /scratch
filesystems. You need to preload the executable if running on the local
scratch /lscratch filesystem.
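If the executable resides only on the node-local scratch or ramdisk, the
OpenMPI **--preload-binary** option (shown earlier) is one way to distribute
it; a sketch:

```
$ mpiexec --preload-binary ./helloworld_mpi.x
```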
### Ways to run MPI programs
The optimal way to run an MPI program depends on its memory requirements,
memory access pattern and communication pattern.
Consider these ways to run an MPI program:
1. One MPI process per node, 24 threads per process
2. Two MPI processes per node, 12 threads per process
3. 24 MPI processes per node, 1 thread per process.
**One MPI** process per node, using 24 threads, is most useful for
memory-demanding applications that make good use of processor cache
memory and are not memory bound. This is also a preferred way for
communication-intensive applications, as one process per node enjoys full
bandwidth access to the network interface.
**Two MPI** processes per node, using 12 threads each, bound to a
processor socket, is most useful for memory bandwidth bound applications
such as BLAS1 or FFT with scalable memory demand. However, note that
the two processes will share access to the network interface. The 12
threads and socket binding should ensure maximum memory access bandwidth
and minimize communication, migration and NUMA effect overheads.
Important! Bind every OpenMP thread to a core!
In the previous two cases with one or two MPI processes per node, the
operating system might still migrate OpenMP threads between cores. You
want to avoid this by setting the KMP_AFFINITY or GOMP_CPU_AFFINITY
environment variables.
**24 MPI** processes per node, using 1 thread each, bound to a processor
core, is most suitable for highly scalable applications with low
communication demand.
### Running OpenMPI
The [**OpenMPI 1.8.6**](http://www.open-mpi.org/) implementation is
available on the cluster. Read more on [how to run
OpenMPI](Running_OpenMPI.html) based MPI.
The Intel MPI may run on the [Intel Xeon
Phi](../intel-xeon-phi.html) accelerators as well. Read
more on [how to run Intel MPI on
accelerators](../intel-xeon-phi.html).
MPI4Py (MPI for Python)
=======================
OpenMPI interface to Python
Introduction
------------
MPI for Python provides bindings of the Message Passing Interface (MPI)
standard for the Python programming language, allowing any Python
program to exploit multiple processors.
This package is constructed on top of the MPI-1/2 specifications and
provides an object oriented interface which closely follows MPI-2 C++
bindings. It supports point-to-point (sends, receives) and collective
(broadcasts, scatters, gathers) communications of any picklable Python
object, as well as optimized communications of Python object exposing
the single-segment buffer interface (NumPy arrays, builtin
bytes/string/array objects).
On the cluster, MPI4Py is available in standard Python modules.
Modules
-------
MPI4Py is built for OpenMPI. Before you start with MPI4Py you need to
load the Python and OpenMPI modules. You can use a toolchain that loads
Python and OpenMPI at once.
$ module load Python/2.7.9-foss-2015g
Execution
---------
You need to import MPI into your Python program. Include the following
line in the Python script:
from mpi4py import MPI
The MPI4Py enabled Python programs [execute as any other
OpenMPI](Running_OpenMPI.html) code. The simplest way is
to run
$ mpiexec python <script>.py
For example
$ mpiexec python hello_world.py
Examples
--------
### Hello world!
from mpi4py import MPI
comm = MPI.COMM_WORLD
print "Hello! I'm rank %d from %d running in total..." % (comm.rank, comm.size)
comm.Barrier() # wait for everybody to synchronize
### Collective Communication with NumPy arrays
from __future__ import division
from mpi4py import MPI
import numpy as np
comm = MPI.COMM_WORLD
print("-"*78)
print(" Running on %d cores" % comm.size)
print("-"*78)
comm.Barrier()
# Prepare a vector of N=5 elements to be broadcasted...
N = 5
if comm.rank == 0:
A = np.arange(N, dtype=np.float64) # rank 0 has proper data
else:
A = np.empty(N, dtype=np.float64) # all other just an empty array
# Broadcast A from rank 0 to everybody
comm.Bcast( [A, MPI.DOUBLE] )
# Everybody should now have the same...
print "[%02d] %s" % (comm.rank, A)
Execute the above code as:
$ qsub -q qexp -l select=4:ncpus=24:mpiprocs=24:ompthreads=1 -I
$ module load Python/2.7.9-foss-2015g
$ mpiexec --map-by core --bind-to core python hello_world.py
In this example, we run MPI4Py enabled code on 4 nodes, 24 cores per
node (total of 96 processes), each python process is bound to a
different core.
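The same run can also be wrapped in a jobscript for non-interactive execution.
A minimal sketch (assuming hello_world.py resides in the submission
directory):

```
#!/bin/bash
#PBS -q qexp
#PBS -l select=4:ncpus=24:mpiprocs=24:ompthreads=1

# load Python built with OpenMPI (foss toolchain)
module load Python/2.7.9-foss-2015g

# run from the submission directory
cd $PBS_O_WORKDIR

# 96 python processes in total, one per core
mpiexec --map-by core --bind-to core python hello_world.py > output.out
```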
More examples and documentation can be found on [MPI for Python
webpage](https://pythonhosted.org/mpi4py/usrman/index.html).
Numerical languages
===================
Interpreted languages for numerical computations and analysis
Introduction
------------
This section contains a collection of high-level interpreted languages,
primarily intended for numerical computations.
Matlab
------
MATLAB® is a high-level language and interactive environment for
numerical computation, visualization, and programming.
$ module load MATLAB
$ matlab
Read more at the [Matlab
page](matlab.html).
Octave
------
GNU Octave is a high-level interpreted language, primarily intended for
numerical computations. The Octave language is quite similar to Matlab
so that most programs are easily portable.
$ module load Octave
$ octave
Read more at the [Octave page](octave.html).
R
-
The R is an interpreted language and environment for statistical
computing and graphics.
$ module load R
$ R
Read more at the [R page](r.html).
Matlab
======
Introduction
------------
Matlab is available in versions R2015a and R2015b. There are always two
variants of the release:
- Non commercial or so called EDU variant, which can be used for
  common research and educational purposes.
- Commercial or so called COM variant, which can be used also for
  commercial activities. The licenses for the commercial variant are much
  more expensive, so usually the commercial variant has only a subset of
  features compared to the EDU variant.
To load the latest version of Matlab load the module
$ module load MATLAB
The EDU variant is marked as the default. If you need another
version or variant, load the particular version. To obtain the list of
available versions use
$ module avail MATLAB
If you need to use the Matlab GUI to prepare your Matlab programs, you
can use Matlab directly on the login nodes. But for all computations use
Matlab on the compute nodes via PBS Pro scheduler.
If you require the Matlab GUI, please follow the general information
about [running graphical
applications](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html).
The Matlab GUI is quite slow when using the X forwarding built into PBS (qsub
-X), so using X11 display redirection either via SSH or directly by
xauth (please see the "GUI Applications on Compute Nodes over VNC" part
[here](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html))
is recommended.
To run Matlab with GUI, use
$ matlab
To run Matlab in text mode, without the Matlab Desktop GUI environment,
use
$ matlab -nodesktop -nosplash
Plots, images, etc. will still be available.
Running parallel Matlab using Distributed Computing Toolbox / Engine
------------------------------------------------------------------------
Distributed toolbox is available only for the EDU variant
The MPIEXEC mode available in previous versions is no longer available
in MATLAB 2015. Also, the programming interface has changed. Refer
to [Release
Notes](http://www.mathworks.com/help/distcomp/release-notes.html#buanp9e-1).
Delete the previously used file mpiLibConf.m; we have observed crashes when
using Intel MPI.
To use Distributed Computing, you first need to set up a parallel
profile. We have provided the profile for you; you can either import it
in the MATLAB command line:
>> parallel.importProfile('/apps/all/MATLAB/2015b-EDU/SalomonPBSPro.settings')
ans =
SalomonPBSPro
Or in the GUI, go to tab HOME -> Parallel -> Manage Cluster
Profiles..., click Import and navigate to:

/apps/all/MATLAB/2015b-EDU/SalomonPBSPro.settings
With the new mode, MATLAB itself launches the workers via PBS, so you
can either use interactive mode or a batch mode on one node, but the
actual parallel processing will be done in a separate job started by
MATLAB itself. Alternatively, you can use "local" mode to run parallel
code on just a single node.
### Parallel Matlab interactive session
The following example shows how to start an interactive session with support
for the Matlab GUI. For more information about GUI based applications on
the cluster see [this
page](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html).
$ xhost +
$ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=1 -l walltime=00:30:00 -l feature__matlab__MATLAB=1
This qsub command example shows how to run Matlab on a single node.
The second part of the command shows how to request all necessary
licenses. In this case 1 Matlab-EDU license and 48 Distributed Computing
Engines licenses.
Once the access to compute nodes is granted by PBS, the user can load the
following modules and start Matlab:
r1i0n17$ module load MATLAB/2015a-EDU
r1i0n17$ matlab &
### Parallel Matlab batch job in Local mode
To run Matlab in batch mode, write a Matlab script, then write a bash
jobscript and execute it via the qsub command. By default, Matlab will
execute one Matlab worker instance per allocated core.
#!/bin/bash
#PBS -A PROJECT ID
#PBS -q qprod
#PBS -l select=1:ncpus=24:mpiprocs=24:ompthreads=1
# change to shared scratch directory
SCR=/scratch/work/user/$USER/$PBS_JOBID
mkdir -p $SCR ; cd $SCR || exit
# copy input file to scratch
cp $PBS_O_WORKDIR/matlabcode.m .
# load modules
module load MATLAB/2015a-EDU
# execute the calculation
matlab -nodisplay -r matlabcode > output.out
# copy output file to home
cp output.out $PBS_O_WORKDIR/.
This script may be submitted directly to the PBS workload manager via
the qsub command. The inputs and the Matlab script are in the matlabcode.m
file, outputs in the output.out file. Note the missing .m extension in the
matlab -r matlabcode call; **the .m must not be included**. Note
that the **shared /scratch must be used**. Further, it is **important to
include the quit** statement at the end of the matlabcode.m script.
Submit the jobscript using qsub
$ qsub ./jobscript
### Parallel Matlab Local mode program example
The last part of the configuration is done directly in the user Matlab
script before Distributed Computing Toolbox is started.
cluster = parcluster('local')
This script creates the scheduler object "cluster" of type "local" that
starts workers locally.
Please note: every Matlab script that needs to initialize/use the parallel
pool has to create the cluster object prior to calling the
parpool(cluster, ...) function.
The last step is to start the parallel pool with the "cluster" object and the
correct number of workers. We have 24 cores per node, so we start 24 workers.

pool = parpool(cluster,24);

... parallel code ...

delete(pool)
The complete example showing how to use Distributed Computing Toolbox in
local mode is shown here.
cluster = parcluster('local');
cluster

pool = parpool(cluster,24);

n=2000;

W = rand(n,n);
W = distributed(W);
x = (1:n)';
x = distributed(x);
spmd
    [~, name] = system('hostname')

    T = W*x;  % Calculation performed on labs, in parallel.
              % T and W are both codistributed arrays here.
end
T;
whos          % T and W are both distributed arrays here.

% shut down the parallel pool
delete(pool)
quit
You can copy and paste the example in a .m file and execute. Note that
the parpool size should correspond to **total number of cores**
available on allocated nodes.
### Parallel Matlab Batch job using PBS mode (workers spawned in a separate job)
This mode uses the PBS scheduler to launch the parallel pool. It uses the
SalomonPBSPro profile that needs to be imported to Cluster Manager, as
mentioned before. This method uses MATLAB's PBS Scheduler interface -
it spawns the workers in a separate job submitted by MATLAB using qsub.
This is an example of m-script using PBS mode:
cluster = parcluster('SalomonPBSPro');
set(cluster, 'SubmitArguments', '-A OPEN-0-0');
set(cluster, 'ResourceTemplate', '-q qprod -l select=10:ncpus=24');
set(cluster, 'NumWorkers', 240);
pool = parpool(cluster,240);
n=2000;
W = rand(n,n);
W = distributed(W);
x = (1:n)';
x = distributed(x);
spmd
[~, name] = system('hostname')
T = W*x; % Calculation performed on labs, in parallel.
% T and W are both codistributed arrays here.
end
whos % T and W are both distributed arrays here.
% shut down parallel pool
delete(pool)
Note that we first construct a cluster object using the imported
profile, then set some important options, namely : SubmitArguments,
where you need to specify accounting id, and ResourceTemplate, where you
need to specify number of nodes to run the job.
You can start this script using batch mode the same way as in Local mode
example.
### Parallel Matlab Batch with direct launch (workers spawned within the existing job)
This method is a "hack" invented by us to emulate the mpiexec
functionality found in previous MATLAB versions. We leverage the MATLAB
Generic Scheduler interface, but instead of submitting the workers to
PBS, we launch the workers directly within the running job, thus we
avoid the issues with master script and workers running in separate jobs
(issues with license not available, waiting for the worker's job to
spawn etc.)
Please note that this method is experimental.
For this method, you need to use the SalomonDirect profile; import it
[the same way as
SalomonPBSPro](matlab.html#running-parallel-matlab-using-distributed-computing-toolbox---engine).
This is an example of m-script using direct mode:
parallel.importProfile('/apps/all/MATLAB/2015b-EDU/SalomonDirect.settings')
cluster = parcluster('SalomonDirect');
set(cluster, 'NumWorkers', 48);
pool = parpool(cluster, 48);
n=2000;
W = rand(n,n);
W = distributed(W);
x = (1:n)';
x = distributed(x);
spmd
[~, name] = system('hostname')
T = W*x; % Calculation performed on labs, in parallel.
% T and W are both codistributed arrays here.
end
whos % T and W are both distributed arrays here.
% shut down parallel pool
delete(pool)
### Non-interactive Session and Licenses
If you want to run batch jobs with Matlab, be sure to request the
appropriate license features with the PBS Pro scheduler, at least
"-l __feature__matlab__MATLAB=1" for the EDU variant of Matlab. For more
information about how to check the license feature states and how to
request them with PBS Pro, please [look
here](../../../anselm-cluster-documentation/software/isv_licenses.html).
The licensing feature of PBS is currently disabled.
In case of a non-interactive session please read the [following
information](../../../anselm-cluster-documentation/software/isv_licenses.html)
on how to modify the qsub command to test for available licenses prior to
getting the resource allocation.
### Matlab Distributed Computing Engines start up time
Starting Matlab workers is an expensive process that requires certain
amount of time. For your information please see the following table:
compute nodes   number of workers   start-up time [s]
--------------- ------------------- -------------------
16              384                 831
8               192                 807
4               96                  483
2               48                  16
MATLAB on UV2000
-----------------
The UV2000 machine, available in the queue "qfat", can be used for MATLAB
computations. This is an SMP NUMA machine with a large amount of RAM, which
can be beneficial for certain types of MATLAB jobs. CPU cores are
allocated in chunks of 8 for this machine.
You can use MATLAB on UV2000 in two parallel modes:
### Threaded mode
Since this is an SMP machine, you can completely avoid using the Parallel
Toolbox and use only MATLAB's threading. MATLAB will automatically
detect the number of cores you have allocated and will set
maxNumCompThreads accordingly, and certain
operations, such as fft, eig, svd,
etc., will be automatically run in threads. The advantage of this mode is
that you don't need to modify your existing sequential codes.
### Local cluster mode
You can also use the Parallel Toolbox on UV2000. Use the [local cluster
mode](matlab.html#parallel-matlab-batch-job-in-local-mode);
the "SalomonPBSPro" profile will not work.
Octave
======
GNU Octave is a high-level interpreted language, primarily intended for
numerical computations. It provides capabilities for the numerical
solution of linear and nonlinear problems, and for performing other
numerical experiments. It also provides extensive graphics capabilities
for data visualization and manipulation. Octave is normally used through
its interactive command line interface, but it can also be used to write
non-interactive programs. The Octave language is quite similar to Matlab
so that most programs are easily portable. Read more on
<http://www.gnu.org/software/octave/>
Two versions of Octave are available on the cluster, via module

Status       Version        Module
------------ -------------- --------
**Stable**   Octave 3.8.2   Octave
$ module load Octave
Octave on the cluster is linked to the highly optimized MKL mathematical
library. This provides threaded parallelization to many Octave kernels,
notably the linear algebra subroutines. Octave runs these heavy
calculation kernels without any penalty. By default, Octave would
parallelize to 24 threads. You may control the threads by setting the
OMP_NUM_THREADS environment variable.
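For example, to restrict the MKL-backed kernels to 12 threads for the current
shell session (a sketch):

```
$ export OMP_NUM_THREADS=12
```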
To run octave interactively, log in with ssh -X parameter for X11
forwarding. Run octave:
$ octave
To run Octave in batch mode, write an Octave script, then write a bash
jobscript and execute it via the qsub command. By default, Octave will use
24 threads when running MKL kernels.
#!/bin/bash
# change to local scratch directory
mkdir -p /scratch/work/user/$USER/$PBS_JOBID
cd /scratch/work/user/$USER/$PBS_JOBID || exit
# copy input file to scratch
cp $PBS_O_WORKDIR/octcode.m .
# load octave module
module load Octave
# execute the calculation
octave -q --eval octcode > output.out
# copy output file to home
cp output.out $PBS_O_WORKDIR/.
#exit
exit
This script may be submitted directly to the PBS workload manager via
the qsub command. The inputs are in octcode.m file, outputs in
output.out file. See the single node jobscript example in the [Job
execution
section](../../resource-allocation-and-job-execution.html).
The Octave C compiler mkoctfile calls the GNU gcc 4.8.1 for compiling
native C code. This is very useful for running native C subroutines in
the Octave environment.
$ mkoctfile -v
Octave may use MPI for interprocess communication.
This functionality is currently not supported on the cluster. In
case you require the Octave interface to MPI, please contact our
[cluster support](https://support.it4i.cz/rt/).
R
=
Introduction
------------
The R is a language and environment for statistical computing and
graphics. R provides a wide variety of statistical (linear and
nonlinear modelling, classical statistical tests, time-series analysis,
classification, clustering, ...) and graphical techniques, and is highly
extensible.
One of R's strengths is the ease with which well-designed
publication-quality plots can be produced, including mathematical
symbols and formulae where needed. Great care has been taken over the
defaults for the minor design choices in graphics, but the user retains
full control.
Another convenience is the ease with which the C code or third party
libraries may be integrated within R.
Extensive support for parallel computing is available within R.
Read more on <http://www.r-project.org/>,
<http://cran.r-project.org/doc/manuals/r-release/R-lang.html>
Modules
-------
**The R version 3.1.1 is available on the cluster, along with the GUI
interface Rstudio**

Application     Version         Module
--------------- --------------- ---------------------
**R**           R 3.1.1         R/3.1.1-intel-2015b
**Rstudio**     Rstudio 0.97    Rstudio
$ module load R
Execution
---------
The R on the cluster is linked to the highly optimized MKL mathematical
library. This provides threaded parallelization to many R kernels,
notably the linear algebra subroutines. The R runs these heavy
calculation kernels without any penalty. By default, the R would
parallelize to 24 threads. You may control the threads by setting the
OMP_NUM_THREADS environment variable.
### Interactive execution
To run R interactively, using Rstudio GUI, log in with ssh -X parameter
for X11 forwarding. Run rstudio:
$ module load Rstudio
$ rstudio
### Batch execution
To run R in batch mode, write an R script, then write a bash jobscript
and execute via the qsub command. By default, R will use 24 threads when
running MKL kernels.
Example jobscript:
#!/bin/bash
# change to local scratch directory
cd /lscratch/$PBS_JOBID || exit
# copy input file to scratch
cp $PBS_O_WORKDIR/rscript.R .
# load R module
module load R
# execute the calculation
R CMD BATCH rscript.R routput.out
# copy output file to home
cp routput.out $PBS_O_WORKDIR/.
#exit
exit
This script may be submitted directly to the PBS workload manager via
the qsub command. The inputs are in rscript.R file, outputs in
routput.out file. See the single node jobscript example in the [Job
execution
section](../../resource-allocation-and-job-execution/job-submission-and-execution.html).
Parallel R
----------
Parallel execution of R may be achieved in many ways. One approach is
the implied parallelization due to linked libraries or specially enabled
functions, as [described
above](r.html#interactive-execution). In the following
sections, we focus on explicit parallelization, where parallel
constructs are directly stated within the R script.
Package parallel
--------------------
The package parallel provides support for parallel computation,
including by forking (taken from package multicore), by sockets (taken
from package snow) and random-number generation.
The package is activated this way:
$ R
> library(parallel)
More information and examples may be obtained directly by reading the
documentation available in R
> ?parallel
> library(help = "parallel")
> vignette("parallel")
Download the package
[parallel](package-parallel-vignette) vignette.
The forking is the most simple to use. The forking family of functions
provides a parallelized, drop-in replacement for the serial apply() family
of functions.
Forking via package parallel provides functionality similar to OpenMP
construct
#pragma omp parallel for
Only cores of single node can be utilized this way!
Forking example:
library(parallel)
#integrand function
f <- function(i,h) {
x <- h*(i-0.5)
return (4/(1 + x*x))
}
#initialize
size <- detectCores()
while (TRUE)
{
#read number of intervals
cat("Enter the number of intervals: (0 quits) ")
fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp)
if(n<=0) break
#run the calculation
n <- max(n,size)
h <- 1.0/n
i <- seq(1,n);
pi3 <- h*sum(simplify2array(mclapply(i,f,h,mc.cores=size)));
#print results
cat(sprintf("Value of PI %16.14f, diff= %16.14f\n",pi3,pi3-pi))
}
The above example is the classic parallel example for calculating the
number π. Note the **detectCores()** and **mclapply()** functions.
Execute the example as:
$ R --slave --no-save --no-restore -f pi3p.R
Every evaluation of the integrand function runs in parallel in a different
process.
Package Rmpi
------------
The package Rmpi provides an interface (wrapper) to MPI APIs.
It also provides an interactive R slave environment. On the cluster, Rmpi
provides an interface to
[OpenMPI](../mpi-1/Running_OpenMPI.html).
Read more on Rmpi at <http://cran.r-project.org/web/packages/Rmpi/>,
reference manual is available at
<http://cran.r-project.org/web/packages/Rmpi/Rmpi.pdf>
When using package Rmpi, both the OpenMPI and R modules must be loaded
$ module load OpenMPI
$ module load R
Rmpi may be used in three basic ways. The static approach is identical
to executing any other MPI program. In addition, there is the Rslaves
dynamic MPI approach and the mpi.apply approach. In the following
sections, we will use the number π integration example to illustrate all
these concepts.
### static Rmpi
Static Rmpi programs are executed via mpiexec, as any other MPI
programs. Number of processes is static - given at the launch time.
Static Rmpi example:
library(Rmpi)
#integrand function
f <- function(i,h) {
x <- h*(i-0.5)
return (4/(1 + x*x))
}
#initialize
invisible(mpi.comm.dup(0,1))
rank <- mpi.comm.rank()
size <- mpi.comm.size()
n<-0
while (TRUE)
{
#read number of intervals
if (rank==0) {
cat("Enter the number of intervals: (0 quits) ")
fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp)
}
#broadcast the intervals
n <- mpi.bcast(as.integer(n),type=1)
if(n<=0) break
#run the calculation
n <- max(n,size)
h <- 1.0/n
i <- seq(rank+1,n,size);
mypi <- h*sum(sapply(i,f,h));
pi3 <- mpi.reduce(mypi)
#print results
if (rank==0) cat(sprintf("Value of PI %16.14f, diff= %16.14f\n",pi3,pi3-pi))
}
mpi.quit()
The above is the static MPI example for calculating the number π. Note
the **library(Rmpi)** and **mpi.comm.dup()** function calls.
Execute the example as:
$ mpirun R --slave --no-save --no-restore -f pi3.R
### dynamic Rmpi
Dynamic Rmpi programs are executed by calling the R directly. OpenMPI
module must be still loaded. The R slave processes will be spawned by a
function call within the Rmpi program.
Dynamic Rmpi example:
#integrand function
f <- function(i,h) {
x <- h*(i-0.5)
return (4/(1 + x*x))
}
#the worker function
workerpi <- function()
{
#initialize
rank <- mpi.comm.rank()
size <- mpi.comm.size()
n<-0
while (TRUE)
{
#read number of intervals
if (rank==0) {
cat("Enter the number of intervals: (0 quits) ")
fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp)
}
#broadcast the intervals
n <- mpi.bcast(as.integer(n),type=1)
if(n<=0) break
#run the calculation
n <- max(n,size)
h <- 1.0/n
i <- seq(rank+1,n,size);
mypi <- h*sum(sapply(i,f,h));
pi3 <- mpi.reduce(mypi)
#print results
if (rank==0) cat(sprintf("Value of PI %16.14f, diff= %16.14f\n",pi3,pi3-pi))
}
}
#main
library(Rmpi)
cat("Enter the number of slaves: ")
fp<-file("stdin"); ns<-scan(fp,nmax=1); close(fp)
mpi.spawn.Rslaves(nslaves=ns)
mpi.bcast.Robj2slave(f)
mpi.bcast.Robj2slave(workerpi)
mpi.bcast.cmd(workerpi())
workerpi()
mpi.quit()
The above example is the dynamic MPI example for calculating the number
π. Both master and slave processes carry out the calculation. Note the
**mpi.spawn.Rslaves()**, **mpi.bcast.Robj2slave()** and the
**mpi.bcast.cmd()** function calls.
Execute the example as:
$ mpirun -np 1 R --slave --no-save --no-restore -f pi3Rslaves.R
Note that this method uses MPI_Comm_spawn (Dynamic process feature of
MPI-2) to start the slave processes - the master process needs to be
launched with MPI. In general, Dynamic processes are not well supported
among MPI implementations, some issues might arise. Also, environment
variables are not propagated to spawned processes, so they will not see
paths from modules.
### mpi.apply Rmpi
mpi.apply is a specific way of executing Dynamic Rmpi programs.
mpi.apply() family of functions provide MPI parallelized, drop in
replacement for the serial apply() family of functions.
Execution is identical to other dynamic Rmpi programs.
mpi.apply Rmpi example:
#integrand function
f <- function(i,h) {
x <- h*(i-0.5)
return (4/(1 + x*x))
}
#the worker function
workerpi <- function(rank,size,n)
{
#run the calculation
n <- max(n,size)
h <- 1.0/n
i <- seq(rank,n,size);
mypi <- h*sum(sapply(i,f,h));
return(mypi)
}
#main
library(Rmpi)
cat("Enter the number of slaves: ")
fp<-file("stdin"); ns<-scan(fp,nmax=1); close(fp)
mpi.spawn.Rslaves(nslaves=ns)
mpi.bcast.Robj2slave(f)
mpi.bcast.Robj2slave(workerpi)
while (TRUE)
{
#read number of intervals
cat("Enter the number of intervals: (0 quits) ")
fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp)
if(n<=0) break
#run workerpi
i=seq(1,2*ns)
pi3=sum(mpi.parSapply(i,workerpi,2*ns,n))
#print results
cat(sprintf("Value of PI %16.14f, diff= %16.14f\n",pi3,pi3-pi))
}
mpi.quit()
The above is the mpi.apply MPI example for calculating the number π.
Only the slave processes carry out the calculation. Note the
**mpi.parSapply()** function call. The package parallel
[example above](r.html#package-parallel)
may be trivially adapted (for much better performance) to this structure
using the mclapply() in place of mpi.parSapply().
Execute the example as:
$ mpirun -np 1 R --slave --no-save --no-restore -f pi3parSapply.R
Combining parallel and Rmpi
---------------------------
Currently, the two packages can not be combined for hybrid calculations.
Parallel execution
------------------
The R parallel jobs are executed via the PBS queue system exactly as any
other parallel jobs. The user must create an appropriate jobscript and
submit it via the **qsub** command.
Example jobscript for [static Rmpi](r.html#static-rmpi)
parallel R execution, running 1 process per core:
#!/bin/bash
#PBS -q qprod
#PBS -N Rjob
#PBS -l select=100:ncpus=24:mpiprocs=24:ompthreads=1
# change to scratch directory
SCRDIR=/scratch/work/user/$USER/myjob
cd $SCRDIR || exit
# copy input file to scratch
cp $PBS_O_WORKDIR/rscript.R .
# load R and openmpi module
module load R
module load OpenMPI
# execute the calculation
mpiexec -bycore -bind-to-core R --slave --no-save --no-restore -f rscript.R
# copy output file to home
cp routput.out $PBS_O_WORKDIR/.
#exit
exit
For more information about jobscripts and MPI execution refer to the
[Job
submission](../../resource-allocation-and-job-execution/job-submission-and-execution.html)
and general [MPI](../mpi-1.html) sections.
Xeon Phi Offload
----------------
By leveraging MKL, R can accelerate certain computations, most notably
linear algebra operations, on the Xeon Phi accelerator by using Automated
Offload. To use MKL Automated Offload, you need to first set this
environment variable before R execution:
$ export MKL_MIC_ENABLE=1
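For batch jobs, the variable can simply be exported in the jobscript before
the R call; a sketch based on the batch example above:

```
# enable MKL Automatic Offload to the Xeon Phi, then run the R script
export MKL_MIC_ENABLE=1
R CMD BATCH rscript.R routput.out
```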
[Read more about automatic
offload](../intel-xeon-phi.html)
Operating System
================
The operating system deployed on the Salomon cluster
The operating system on Salomon is Linux - CentOS 6.6.
The CentOS Linux distribution is a stable, predictable, manageable
and reproducible platform derived from the sources of Red Hat Enterprise
Linux (RHEL).
CESNET Data Storage
===================
Introduction
------------
Do not use shared filesystems at IT4Innovations as a backup for large
amounts of data or for long-term archiving purposes.
IT4Innovations does not provide storage capacity for data archiving.
Academic staff and students of research institutions in the Czech
Republic can use [CESNET Storage
service](https://du.cesnet.cz/).
The CESNET Storage service can be used for research purposes, mainly by
academic staff and students of research institutions in the Czech
Republic.
Users of the CESNET data storage (DU) can be organizations or
individuals who are either in a current employment
relationship (employees) or a current study relationship (students) with
a legal entity (organization) that meets the “Principles for access to
CESNET Large infrastructure (Access Policy)”.
Users may only use the CESNET data storage for data transfer and storage
associated with activities in science, research, development,
the spread of education, culture and prosperity. For details see
“Acceptable Use Policy CESNET Large Infrastructure (Acceptable Use
Policy, AUP)”.
The service is documented at
<https://du.cesnet.cz/wiki/doku.php/en/start>. For special requirements
please contact the CESNET Storage Department directly via e-mail
[du-support(at)cesnet.cz](mailto:du-support@cesnet.cz).
The procedure to obtain the CESNET access is quick and trouble-free.
(source
[https://du.cesnet.cz/](https://du.cesnet.cz/wiki/doku.php/en/start "CESNET Data Storage"))
CESNET storage access
---------------------
### Understanding Cesnet storage
It is very important to understand the Cesnet storage before uploading
data. Please read
<https://du.cesnet.cz/en/navody/home-migrace-plzen/start> first.
Once registered for CESNET Storage, you may [access the
storage](https://du.cesnet.cz/en/navody/faq/start) in a
number of ways. We recommend the SSHFS and RSYNC methods.
### SSHFS Access
SSHFS: The storage will be mounted like a local hard drive
The SSHFS provides a very convenient way to access the CESNET Storage.
The storage will be mounted onto a local directory, exposing the vast
CESNET Storage as if it was a local removable hard drive. Files can then
be copied in and out in the usual fashion.
First, create the mountpoint
$ mkdir cesnet
Mount the storage. Note that you can choose among the ssh.du1.cesnet.cz
(Plzen), ssh.du2.cesnet.cz (Jihlava), ssh.du3.cesnet.cz (Brno)
Mount tier1_home **(only 5120M !)**:
$ sshfs username@ssh.du1.cesnet.cz:. cesnet/
For easy future access from the cluster, install your public key
$ cp .ssh/id_rsa.pub cesnet/.ssh/authorized_keys
Mount tier1_cache_tape for the Storage VO:
$ sshfs username@ssh.du1.cesnet.cz:/cache_tape/VO_storage/home/username cesnet/
View the archive, copy the files and directories in and out
$ ls cesnet/
$ cp -a mydir cesnet/.
$ cp cesnet/myfile .
Once done, please remember to unmount the storage
$ fusermount -u cesnet
### Rsync access
Rsync provides delta transfer for best performance and can resume
interrupted transfers.
Rsync is a fast and extraordinarily versatile file copying tool. It is
famous for its delta-transfer algorithm, which reduces the amount of
data sent over the network by sending only the differences between the
source files and the existing files in the destination. Rsync is widely
used for backups and mirroring and as an improved copy command for
everyday use.
Rsync finds files that need to be transferred using a "quick check"
algorithm (by default) that looks for files that have changed in size or
in last-modified time. Any changes in the other preserved attributes
(as requested by options) are made on the destination file directly when
the quick check indicates that the file's data does not need to be
updated.
More about Rsync at
<https://du.cesnet.cz/en/navody/rsync/start#pro_bezne_uzivatele>
Transfer large files to/from Cesnet storage, assuming membership in the
Storage VO
$ rsync --progress datafile username@ssh.du1.cesnet.cz:VO_storage-cache_tape/.
$ rsync --progress username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafile .
Transfer large directories to/from Cesnet storage, assuming membership
in the Storage VO
$ rsync --progress -av datafolder username@ssh.du1.cesnet.cz:VO_storage-cache_tape/.
$ rsync --progress -av username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafolder .
Transfer rates of about 28MB/s can be expected.
Storage
=======
Introduction
------------
There are two main shared file systems on the Salomon cluster, the
[HOME](storage.html#home)
and the
[SCRATCH](storage.html#shared-filesystems) filesystems.
All login and compute nodes may access the same data on the shared filesystems.
Compute nodes are also equipped with local (non-shared) scratch, ramdisk
and tmp filesystems.
Policy (in a nutshell)
----------------------
Use [HOME](storage.html#home) for your most valuable data
and programs.
Use [WORK](storage.html#work) for your large project
files
Use [TEMP](storage.html#temp) for large scratch data.
Do not use for [archiving](storage.html#archiving)!
Archiving
-------------
Please don't use shared filesystems as a backup for large amounts of data
or as a means of long-term archiving. The academic staff and students of
research institutions in the Czech Republic can use the [CESNET storage
service](../../anselm-cluster-documentation/storage-1/cesnet-data-storage.html),
which is available via SSHFS.
Shared Filesystems
----------------------
The Salomon computer provides two main shared filesystems, the
[HOME filesystem](storage.html#home-filesystem) and the
[SCRATCH filesystem](storage.html#scratch-filesystem). The
SCRATCH filesystem is partitioned into the [WORK and TEMP
workspaces](storage.html#shared-workspaces). The HOME
filesystem is realized as a tiered NFS disk storage. The SCRATCH
filesystem is realized as a parallel Lustre filesystem. Both shared file
systems are accessible via the Infiniband network. Extended ACLs are
provided on both HOME/SCRATCH filesystems for the purpose of sharing
data with other users using fine-grained control.
### HOME filesystem
The HOME filesystem is realized as a Tiered filesystem, exported via
NFS. The first tier has capacity 100TB, second tier has capacity 400TB.
The filesystem is available on all login and computational nodes. The
Home filesystem hosts the [HOME
workspace](storage.html#home).
### SCRATCH filesystem
The architecture of Lustre on Salomon is composed of two metadata
servers (MDS) and six data/object storage servers (OSS). Accessible
capacity is 1.69 PB, shared among all users. The SCRATCH filesystem
hosts the [WORK and TEMP
workspaces](storage.html#shared-workspaces).
Configuration of the SCRATCH Lustre storage:

- SCRATCH Lustre object storage
    - Disk array SFA12KX
    - 540 4TB SAS 7.2krpm disks
    - 54 OSTs of 10 disks in RAID6 (8+2)
    - 15 hot-spare disks
    - 4x 400GB SSD cache
- SCRATCH Lustre metadata storage
    - Disk array EF3015
    - 12 600GB SAS 15krpm disks
### Understanding the Lustre Filesystems
(source <http://www.nas.nasa.gov>)
A user file on the Lustre filesystem can be divided into multiple chunks
(stripes) and stored across a subset of the object storage targets
(OSTs) (disks). The stripes are distributed among the OSTs in a
round-robin fashion to ensure load balancing.
When a client (a compute node from your job) needs to create or access a
file, the client queries the metadata server (MDS) and the metadata
target (MDT) for the layout and location of the [file's
stripes](http://www.nas.nasa.gov/hecc/support/kb/Lustre_Basics_224.html#striping).
Once the file is opened and the client obtains the striping information,
the MDS is no longer involved in the file I/O process. The client
interacts directly with the object storage servers (OSSes) and OSTs to
perform I/O operations such as locking, disk allocation, storage, and
retrieval.
If multiple clients try to read and write the same part of a file at the
same time, the Lustre distributed lock manager enforces coherency so
that all clients see consistent results.
There is default stripe configuration for Salomon Lustre filesystems.
However, users can set the following stripe parameters for their own
directories or files to get optimum I/O performance:
1. stripe_size: the size of the chunk in bytes; specify with k, m, or
   g to use units of KB, MB, or GB, respectively; the size must be an
   even multiple of 65,536 bytes; default is 1MB for all Salomon Lustre
   filesystems
2. stripe_count: the number of OSTs to stripe across; default is 1 for
   Salomon Lustre filesystems; one can specify -1 to use all OSTs in
   the filesystem.
3. stripe_offset: the index of the OST where the first stripe is to be
   placed; default is -1 which results in random selection; using a
   non-default value is NOT recommended.
Setting stripe size and stripe count correctly for your needs may
significantly impact the I/O performance you experience.
Use the lfs getstripe command to get the stripe parameters. Use the lfs
setstripe command to set the stripe parameters and obtain optimal I/O
performance. The correct stripe setting depends on your needs and file
access patterns.
```
$ lfs getstripe dir|filename
$ lfs setstripe -s stripe_size -c stripe_count -o stripe_offset dir|filename
```
Example:
```
$ lfs getstripe /scratch/work/user/username
/scratch/work/user/username
stripe_count: 1 stripe_size: 1048576 stripe_offset: -1
$ lfs setstripe -c -1 /scratch/work/user/username/
$ lfs getstripe /scratch/work/user/username/
/scratch/work/user/username/
stripe_count:-1 stripe_size: 1048576 stripe_offset: -1
```
In this example, we view the current stripe setting of the
/scratch/work/user/username directory. The stripe count is changed to all
OSTs and verified. All files written to this directory will be striped over
all (54) OSTs.
Use lfs check osts to see the number and status of active OSTs for each
filesystem on Salomon. Learn more by reading the man page:
```
$ lfs check osts
$ man lfs
```
### Hints on Lustre Stripping
Increase the stripe_count for parallel I/O to the same file.
When multiple processes are writing blocks of data to the same file in
parallel, the I/O performance for large files will improve when the
stripe_count is set to a larger value. The stripe count sets the number
of OSTs the file will be written to. By default, the stripe count is set
to 1. While this default setting provides for efficient access of
metadata (for example to support the ls -l command), large files should
use stripe counts of greater than 1. This will increase the aggregate
I/O bandwidth by using multiple OSTs in parallel instead of just one. A
rule of thumb is to use a stripe count approximately equal to the number
of gigabytes in the file.
Another good practice is to make the stripe count be an integral factor
of the number of processes performing the write in parallel, so that you
achieve load balance among the OSTs. For example, set the stripe count
to 16 instead of 15 when you have 64 processes performing the writes.
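As an illustration, striping a (hypothetical) output directory over 16 OSTs
before 64 processes write to it might look like this:

```
# hypothetical directory used for a 64-process parallel write
$ lfs setstripe -c 16 /scratch/work/user/username/parallel_output
$ lfs getstripe /scratch/work/user/username/parallel_output
```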
Using a large stripe size can improve performance when accessing very
large files
Large stripe size allows each client to have exclusive access to its own
part of a file. However, it can be counterproductive in some cases if it
does not match your I/O pattern. The choice of stripe size has no effect
on a single-stripe file.
Read more on
<http://wiki.lustre.org/manual/LustreManual20_HTML/ManagingStripingFreeSpace.html>
Disk usage and quota commands
------------------------------------------
User quotas on the Lustre file systems (SCRATCH) can be checked
and reviewed using the following command:
```
$ lfs quota dir
```
Example for Lustre SCRATCH directory:
```
$ lfs quota /scratch
Disk quotas for user user001 (uid 1234):
Filesystem kbytes quota limit grace files quota limit grace
/scratch 8 0 100000000000 - 3 0 0 -
Disk quotas for group user001 (gid 1234):
Filesystem kbytes quota limit grace files quota limit grace
/scratch 8 0 0 - 3 0 0 -
```
In this example, we view current quota size limit of 100TB and 8KB
currently used by user001.
HOME directory is mounted via NFS, so a different command must be used
to obtain quota information:
$ quota
Example output:
$ quota
Disk quotas for user vop999 (uid 1025):
Filesystem blocks quota limit grace files quota limit grace
home-nfs-ib.salomon.it4i.cz:/home
28 0 250000000 10 0 500000
To have a better understanding of where the space is exactly used, you
can use the following command:
```
$ du -hs dir
```
Example for your HOME directory:
```
$ cd /home
$ du -hs * .[a-zA-z0-9]* | grep -E "[0-9]*G|[0-9]*M" | sort -hr
258M cuda-samples
15M .cache
13M .mozilla
5,5M .eclipse
2,7M .idb_13.0_linux_intel64_app
```
This will list all directories consuming megabytes or gigabytes of space
in your current (in this example HOME) directory. The list is sorted in
descending order from largest to smallest files/directories.
To have a better understanding of the previous commands, you can read the
manpages:
```
$ man lfs
```
```
$ man du
```
Extended Access Control List (ACL)
----------------------------------
Extended ACLs provide another security mechanism besides the standard
POSIX ACLs, which are defined by three entries (for
owner/group/others). Extended ACLs have more than the three basic
entries. In addition, they also contain a mask entry and may contain any
number of named user and named group entries.
ACLs on a Lustre file system work exactly like ACLs on any Linux file
system. They are manipulated with the standard tools in the standard
manner. Below, we create a directory and allow a specific user access.
```
[vop999@login1.salomon ~]$ umask 027
[vop999@login1.salomon ~]$ mkdir test
[vop999@login1.salomon ~]$ ls -ld test
drwxr-x--- 2 vop999 vop999 4096 Nov 5 14:17 test
[vop999@login1.salomon ~]$ getfacl test
# file: test
# owner: vop999
# group: vop999
user::rwx
group::r-x
other::---
[vop999@login1.salomon ~]$ setfacl -m user:johnsm:rwx test
[vop999@login1.salomon ~]$ ls -ld test
drwxrwx---+ 2 vop999 vop999 4096 Nov 5 14:17 test
[vop999@login1.salomon ~]$ getfacl test
# file: test
# owner: vop999
# group: vop999
user::rwx
user:johnsm:rwx
group::r-x
mask::rwx
other::---
```
Default ACL mechanism can be used to replace setuid/setgid permissions
on directories. Setting a default ACL on a directory (-d flag to
setfacl) will cause the ACL permissions to be inherited by any newly
created file or subdirectory within the directory. Refer to this page
for more information on Linux ACL:
[http://www.vanemery.com/Linux/ACL/POSIX_ACL_on_Linux.html ](http://www.vanemery.com/Linux/ACL/POSIX_ACL_on_Linux.html)
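For illustration, a default ACL can be added to the test directory from the
example above so that the (hypothetical) user johnsm also gains access to any
newly created content:

```
$ setfacl -d -m user:johnsm:rwx test   # default entry, inherited by new files/subdirs
$ getfacl test                         # the output now also lists default:... entries
```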
Shared Workspaces
---------------------
### HOME
Users' home directories /home/username reside on the HOME filesystem.
Accessible capacity is 0.5PB, shared among all users. Individual users
are restricted by filesystem usage quotas, set to 250GB per user.
If 250GB should prove insufficient for a particular user, please
contact [support](https://support.it4i.cz/rt);
the quota may be lifted upon request.
The HOME filesystem is intended for preparation, evaluation, processing
and storage of data generated by active Projects.
The HOME should not be used to archive data of past Projects or other
unrelated data.
The files on HOME will not be deleted until end of the [users
lifecycle](../../get-started-with-it4innovations/obtaining-login-credentials/obtaining-login-credentials.html).
The workspace is backed up, such that it can be restored in case of
catastrophic failure resulting in significant data loss. This backup
however is not intended to restore old versions of user data or to
restore (accidentally) deleted files.
HOME workspace
------------- ----------------
Accesspoint   /home/username
Capacity      0.5PB
Throughput    6GB/s
User quota    250GB
Protocol      NFS, 2-Tier
------------- ----------------
### WORK
The WORK workspace resides on the SCRATCH filesystem. Users may create
subdirectories and files in the directories **/scratch/work/user/username**
and **/scratch/work/project/projectid**. The /scratch/work/user/username
directory is private to the user, much like the home directory. The
/scratch/work/project/projectid directory is accessible to all users involved
in the project projectid.
The WORK workspace is intended to store users project data as well as
for high performance access to input and output files. All project data
should be removed once the project is finished. The data on the WORK
workspace are not backed up.
Files on the WORK filesystem are **persistent** (not automatically
deleted) throughout duration of the project.
The WORK workspace is hosted on SCRATCH filesystem. The SCRATCH is
realized as Lustre parallel filesystem and is available from all login
and computational nodes. Default stripe size is 1MB, stripe count is 1.
There are 54 OSTs dedicated for the SCRATCH filesystem.
Setting stripe size and stripe count correctly for your needs may
significantly impact the I/O performance you experience.
WORK workspace
---------------------- ---------------------------------
Accesspoints           /scratch/work/user/username,
                       /scratch/work/project/projectid
Capacity               1.6PB
Throughput             30GB/s
User quota             100TB
Default stripe size    1MB
Default stripe count   1
Number of OSTs         54
Protocol               Lustre
---------------------- ---------------------------------
### TEMP
The TEMP workspace resides on the SCRATCH filesystem. The TEMP workspace
accesspoint is /scratch/temp. Users may freely create subdirectories
and files on the workspace. Accessible capacity is 1.6PB, shared among
all users on TEMP and WORK. Individual users are restricted by
filesystem usage quotas, set to 100TB per user. The purpose of this
quota is to prevent runaway programs from filling the entire filesystem
and denying service to other users. If 100TB should prove
insufficient for a particular user, please contact
[support](https://support.it4i.cz/rt); the quota may be
lifted upon request.
The TEMP workspace is intended for temporary scratch data generated
during the calculation as well as for high performance access to input
and output files. All I/O intensive jobs must use the TEMP workspace as
their working directory.
Users are advised to save the necessary data from the TEMP workspace to
HOME or WORK after the calculations and clean up the scratch files.
Files on the TEMP filesystem that are **not accessed for more than 90
days** will be automatically **deleted**.
The TEMP workspace is hosted on SCRATCH filesystem. The SCRATCH is
realized as Lustre parallel filesystem and is available from all login
and computational nodes. Default stripe size is 1MB, stripe count is 1.
There are 54 OSTs dedicated for the SCRATCH filesystem.
Setting stripe size and stripe count correctly for your needs may
significantly impact the I/O performance you experience.
TEMP workspace
---------------------- ---------------
Accesspoint            /scratch/temp
Capacity               1.6PB
Throughput             30GB/s
User quota             100TB
Default stripe size    1MB
Default stripe count   1
Number of OSTs         54
Protocol               Lustre
---------------------- ---------------
RAM disk
--------
Every computational node is equipped with a filesystem realized in memory,
the so-called RAM disk.
Use RAM disk in case you need really fast access to your data of limited
size during your calculation.
Be very careful, use of RAM disk filesystem is at the expense of
operational memory.
The local RAM disk is mounted as /ramdisk and is accessible to user
at /ramdisk/$PBS_JOBID directory.
The local RAM disk filesystem is intended for temporary scratch data
generated during the calculation as well as for high performance access
to input and output files. Size of RAM disk filesystem is limited. Be
very careful, use of RAM disk filesystem is at the expense of
operational memory. It is not recommended to allocate large amount of
memory and use large amount of data in RAM disk filesystem at the same
time.
The local RAM disk directory /ramdisk/$PBS_JOBID will be deleted
immediately after the calculation ends. Users should take care to save
the output data from within the jobscript.
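A minimal jobscript sketch using the RAM disk (input.dat and output.out are
hypothetical file names):

```
#!/bin/bash
# change to the job's private RAM disk directory
cd /ramdisk/$PBS_JOBID || exit

# copy (hypothetical) input data from the submission directory
cp $PBS_O_WORKDIR/input.dat .

# ... run the calculation here, producing output.out ...

# copy the results back to home before the RAM disk is purged
cp output.out $PBS_O_WORKDIR/.
exit
```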
RAM disk
------------ -------------------------------------------------------
Mountpoint   /ramdisk
Accesspoint  /ramdisk/$PBS_JOBID
Capacity     120 GB
Throughput   over 1.5 GB/s write, over 5 GB/s read, single thread;
             over 10 GB/s write, over 50 GB/s read, 16 threads
User quota   none
------------ -------------------------------------------------------
Summary
----------
| Mountpoint    | Usage                          | Protocol    | Net Capacity | Throughput | Limitations  | Access                  | Services                          |
| ------------- | ------------------------------ | ----------- | ------------ | ---------- | ------------ | ----------------------- | --------------------------------- |
| /home         | home directory                 | NFS, 2-Tier | 0.5 PB       | 6 GB/s     | Quota 250 GB | Compute and login nodes | backed up                         |
| /scratch/work | large project files            | Lustre      | 1.69 PB      | 30 GB/s    | Quota 100 TB | Compute and login nodes | none                              |
| /scratch/temp | job temporary data             | Lustre      | 1.69 PB      | 30 GB/s    | Quota 100 TB | Compute and login nodes | files older than 90 days removed  |
| /ramdisk      | job temporary data, node local | local       | 120 GB       | 90 GB/s    | none         | Compute nodes           | purged after job ends             |
#!/bin/bash
### DOWNLOAD AND CONVERT DOCUMENTATION
# author: kru0052
# version: 0.4
# changes: bug fixes and optimizations
# known bugs: badly formatted tables, broken links to other files, a few leftover HTML elements, formatting glitches...
###
if [ "$1" = "-t" ]; then
# testing new function
fi
if [ "$1" = "-w" ]; then
# download html pages
wget -X pbspro-documentation,changelog,whats-new,portal_css,portal_javascripts,++resource++jquery-ui-themes,anselm-cluster-documentation/icon.jpg -R favicon.ico,pdf.png,logo.png,background.png,application.png,search_icon.png,png.png,sh.png,touch_icon.png,anselm-cluster-documentation/icon.jpg,*js,robots.txt,*xml,RSS,download_icon.png,pdf,*zip,*rar,@@*,anselm-cluster-documentation/icon.jpg.1 --mirror --convert-links --adjust-extension --page-requisites --no-parent https://docs.it4i.cz;
wget --directory-prefix=./docs.it4i.cz/ http://verif.cs.vsb.cz/aislinn/doc/report.png
wget --directory-prefix=./docs.it4i.cz/ https://docs.it4i.cz/anselm-cluster-documentation/software/virtualization/virtualization-job-workflow
wget --directory-prefix=./docs.it4i.cz/ https://docs.it4i.cz/anselm-cluster-documentation/software/omics-master-1/images/fig1.png
wget --directory-prefix=./docs.it4i.cz/ https://docs.it4i.cz/anselm-cluster-documentation/software/omics-master-1/images/fig2.png
wget --directory-prefix=./docs.it4i.cz/ https://docs.it4i.cz/anselm-cluster-documentation/software/omics-master-1/images/fig3.png
wget --directory-prefix=./docs.it4i.cz/ https://docs.it4i.cz/anselm-cluster-documentation/software/omics-master-1/images/fig4.png
wget --directory-prefix=./docs.it4i.cz/ https://docs.it4i.cz/anselm-cluster-documentation/software/omics-master-1/images/fig5.png
wget --directory-prefix=./docs.it4i.cz/ https://docs.it4i.cz/anselm-cluster-documentation/software/omics-master-1/images/fig6.png
fi
if [ "$1" = "-c" ]; then
### convert html to md
# erasing the previous transfer
rm -rf converted;
rm -rf info;
# erasing duplicate files and unwanted files
(while read i;
do
if [ -f "$i" ];
then
echo "$(tput setaf 9)$i deleted";
rm "$i";
fi
done) < ./source/list_rm
counter=1
count=$(find . -name "*.html" -type f | wc -l)
find . -name "*.ht*" |
while read i;
do
# first filtering html
echo "$(tput setaf 12)($counter/$count)$(tput setaf 11)$i";
counter=$((counter+1))
printf "$(tput setaf 15)\t\tfirst filtering html files...\n";
HEAD=$(grep -n -m1 '<h1' "$i" |cut -f1 -d: | tr --delete '\n')
END=$(grep -n -m1 '<!-- <div tal:content=' "$i" |cut -f1 -d: | tr --delete '\n')
LAST=$(wc -l "$i" | cut -f1 -d' ')
DOWN=$((LAST-END+2))
sed '1,'"$((HEAD-1))"'d' "$i" | sed -n -e :a -e '1,'"$DOWN"'!{P;N;D;};N;ba' > "${i%.*}TMP.html"
# converted .html to .md
printf "\t\t.html -> .md\n"
pandoc -f html -t markdown+pipe_tables-grid_tables "${i%.*}TMP.html" -o "${i%.*}.md";
rm "${i%.*}TMP.html";
# second filtering html and css elements...
printf "\t\tsecond filtering html and css elements...\n"
sed -e 's/``` /```/' "${i%.*}.md" | sed -e 's/ //' | sed -e 's/<\/div>//g' | sed '/^<div/d' | sed -e 's/<\/span>//' | sed -e 's/^\*\*//' | sed -e 's/\\//g' | sed -e 's/^: //g' | sed -e 's/^Obsah//g' > "${i%.*}TMP.md";
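# apply global search & replace pairs read from ./source/replace
# (one pattern&replacement pair per line, '&'-separated, escaped for sed)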
while read x ; do
arg1=`echo "$x" | cut -d"&" -f1 | sed 's:[]\[\^\$\.\*\/\"]:\\\\&:g'`;
arg2=`echo "$x" | cut -d"&" -f2 | sed 's:[]\[\^\$\.\*\/\"]:\\\\&:g'`;
sed -e 's/'"$arg1"'/'"$arg2"'/' "${i%.*}TMP.md" > "${i%.*}TMP.TEST.md";
cat "${i%.*}TMP.TEST.md" > "${i%.*}TMP.md";
done < ./source/replace
# repair image...
printf "\t\trepair images...\n"
while read x ; do
arg1=`echo "$x" | cut -d"&" -f1 | sed 's:[]\[\^\$\.\*\/\"]:\\\\&:g'`;
arg2=`echo "$x" | cut -d"&" -f2 | sed 's:[]\[\^\$\.\*\/\"]:\\\\&:g'`;
sed -e 's/'"$arg1"'/'"$arg2"'/' "${i%.*}TMP.md" > "${i%.*}.md";
cat "${i%.*}.md" > "${i%.*}TMP.md";
done < ./source/repairIMG
cat "${i%.*}TMP.md" > "${i%.*}.md";
# delete temporary files
rm "${i%.*}TMP.md";
rm "${i%.*}TMP.TEST.md";
done
# delete empty files
find -type f -size -10c |
while read i;
do
rm "$i";
echo "$(tput setaf 9)$i deleted";
done
### create new folder and move converted files
# create folder info and view all files and folder
mkdir info;
find ./docs.it4i.cz -name "*.png" -type f > ./info/list_image.txt;
find ./docs.it4i.cz -name "*.jpg" -type f >> ./info/list_image.txt;
find ./docs.it4i.cz -name "*.jpeg" -type f >> ./info/list_image.txt;
find ./docs.it4i.cz -name "*.md" -type f> ./info/list_md.txt;
find ./docs.it4i.cz -type d | sort > ./info/list_folder.txt
rm -rf ./converted
mkdir converted;
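# recreate the target directory tree listed in ./source/list_folder under ./converted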
(while read i;
do
mkdir "./converted/$i";
done) < ./source/list_folder
# move md files to new folders
while read a b ; do
cp "$a" "./converted/$b";
done < <(paste ./info/list_md.txt ./source/list_md_mv)
# copy jpg and jpeg to new folders
while read a b ; do
cp "$a" "./converted/$b";
done < <(paste ./info/list_image.txt ./source/list_image_mv.txt)
cp ./docs.it4i.cz/salomon/salomon ./converted/docs.it4i.cz/salomon/salomon
cp ./docs.it4i.cz/salomon/salomon-2 ./converted/docs.it4i.cz/salomon/salomon-2
cp ./converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/fairshare_formula.png ./converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/fairshare_formula.png
cp ./converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/job_sort_formula.png ./converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/job_sort_formula.png
cp ./converted/docs.it4i.cz/salomon/software/debuggers/vtune-amplifier.png ./converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/vtune-amplifier.png
cp ./converted/docs.it4i.cz/salomon/software/debuggers/Snmekobrazovky20160708v12.33.35.png ./converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/Snmekobrazovky20160708v12.33.35.png
cp ./docs.it4i.cz/virtualization-job-workflow ./converted/docs.it4i.cz/anselm-cluster-documentation/software/
fi
./docs.it4i.cz
./docs.it4i.cz/anselm-cluster-documentation
./docs.it4i.cz/anselm-cluster-documentation/accessing-the-cluster
./docs.it4i.cz/anselm-cluster-documentation/accessing-the-cluster/shell-and-data-access
./docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution
./docs.it4i.cz/anselm-cluster-documentation/software
./docs.it4i.cz/anselm-cluster-documentation/software/ansys
./docs.it4i.cz/anselm-cluster-documentation/software/chemistry
./docs.it4i.cz/anselm-cluster-documentation/software/comsol
./docs.it4i.cz/anselm-cluster-documentation/software/debuggers
./docs.it4i.cz/anselm-cluster-documentation/software/intel-suite
./docs.it4i.cz/anselm-cluster-documentation/software/mpi-1
./docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages
./docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries
./docs.it4i.cz/anselm-cluster-documentation/software/omics-master-1
./docs.it4i.cz/anselm-cluster-documentation/storage-1
./docs.it4i.cz/get-started-with-it4innovations
./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters
./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface
./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer
./docs.it4i.cz/get-started-with-it4innovations/obtaining-login-credentials
./docs.it4i.cz/salomon
./docs.it4i.cz/salomon/accessing-the-cluster
./docs.it4i.cz/salomon/hardware-overview-1
./docs.it4i.cz/salomon/network-1
./docs.it4i.cz/salomon/resource-allocation-and-job-execution
./docs.it4i.cz/salomon/software
./docs.it4i.cz/salomon/software/ansys
./docs.it4i.cz/salomon/software/chemistry
./docs.it4i.cz/salomon/software/comsol
./docs.it4i.cz/salomon/software/debuggers
./docs.it4i.cz/salomon/software/intel-suite
./docs.it4i.cz/salomon/software/mpi-1
./docs.it4i.cz/salomon/software/numerical-languages
./docs.it4i.cz/salomon/storage
./docs.it4i.cz/salomon/uv-2000