Commit 4f249622 authored by Lukáš Krupčík

Merge branch 'gajdusek_clean' into 'master'

Gajdusek cleaning

See merge request !161
parents c8119615 44d6e063
Showing 10 additions and 1267 deletions
@@ -39,7 +39,7 @@ ext_links:
image: davidhrbac/docker-mdcheck:latest
allow_failure: true
after_script:
# remove JSON results
- rm *.json
script:
#- find docs.it4i/ -name '*.md' -exec grep --color -l http {} + | xargs awesome_bot -t 10
@@ -63,7 +63,7 @@ mkdocs:
#- apt-get -y install git
# add version to footer
- bash scripts/add_version.sh
# get modules list from clusters
- bash scripts/get_modules.sh
# regenerate modules matrix
- python scripts/modules-matrix.py > docs.it4i/modules-matrix.md
@@ -75,7 +75,7 @@ mkdocs:
# replace broken links in 404.html
- sed -i 's,href="" title=",href="/" title=",g' site/404.html
# compress sitemap
- gzip < site/sitemap.xml > site/sitemap.xml.gz
artifacts:
paths:
- site
@@ -90,11 +90,11 @@ shellcheck:
- find . -name *.sh -not -path "./docs.it4i/*" -not -path "./site/*" -exec shellcheck {} +
deploy to stage:
environment: stage
stage: deploy
image: davidhrbac/docker-mkdocscheck:latest
before_script:
# install ssh-agent
- 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client -y )'
- 'which rsync || ( apt-get update -y && apt-get install rsync -y )'
# run ssh-agent
@@ -117,7 +117,7 @@ deploy to production:
stage: deploy
image: davidhrbac/docker-mkdocscheck:latest
before_script:
# install ssh-agent
- 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client -y )'
- 'which rsync || ( apt-get update -y && apt-get install rsync -y )'
# run ssh-agent
@@ -127,7 +127,7 @@ deploy to production:
# disable host key checking (NOTE: makes you susceptible to man-in-the-middle attacks)
# WARNING: use only in docker container, if you use it with shell you will overwrite your user's ssh config
- mkdir -p ~/.ssh
- echo -e "Host *\n\tStrictHostKeyChecking no\n\n" > ~/.ssh/config
- useradd -lM nginx
script:
- chown nginx:nginx site -R
...
@@ -29,8 +29,8 @@ Mellanox
### Formulas are made with:
-* https://facelessuser.github.io/pymdown-extensions/extensions/arithmatex/
-* https://www.mathjax.org/
+* [https://facelessuser.github.io/pymdown-extensions/extensions/arithmatex/](https://facelessuser.github.io/pymdown-extensions/extensions/arithmatex/)
+* [https://www.mathjax.org/](https://www.mathjax.org/)
You can add formula to page like this:
...
# Resource Allocation and Job Execution
-To run a [job](ob-submission-and-execution/), [computational resources](resources-allocation-policy/) for this particular job must be allocated. This is done via the PBS Pro job workload manager software, which efficiently distributes workloads across the supercomputer. Extensive information about PBS Pro can be found in the [official documentation here](../pbspro/), especially in the PBS Pro User's Guide.
+To run a [job](job-submission-and-execution/), [computational resources](resources-allocation-policy/) for this particular job must be allocated. This is done via the PBS Pro job workload manager software, which efficiently distributes workloads across the supercomputer. Extensive information about PBS Pro can be found in the [official documentation here](../pbspro/), especially in the PBS Pro User's Guide.
## Resources Allocation Policy
...
@@ -108,6 +108,4 @@ Options:
---8<--- "resource_accounting.md"
---8<--- "mathjax.md"
# Molpro
Molpro is a complete system of ab initio programs for molecular electronic structure calculations.
## About Molpro
Molpro is a software package used for accurate ab-initio quantum chemistry calculations. More information can be found at the [official webpage](http://www.molpro.net/).
## License
The Molpro software package is available only to users who have a valid license. Please contact support to enable access to Molpro if you have a valid license appropriate for running on our cluster (e.g. an academic research group license with parallel execution).
To run Molpro, you need to have a valid license token present in $HOME/.molpro/token. You can download the token from the [Molpro website](https://www.molpro.net/licensee/?portal=licensee).
## Installed Version
Version 2010.1, patch level 45, is currently installed on Anselm; it is the parallel version compiled with Intel compilers and Intel MPI.
Compilation parameters are default:
| Parameter | Value |
| ---------------------------------- | ------------ |
| max number of atoms | 200 |
| max number of valence orbitals | 300 |
| max number of basis functions | 4095 |
| max number of states per symmetry | 20 |
| max number of state symmetries | 16 |
| max number of records | 200 |
| max number of primitives | maxbfn x [2] |
## Running
Molpro is compiled for parallel execution using MPI and OpenMP. By default, Molpro reads the number of allocated nodes from PBS and launches a data server on one node. On the remaining allocated nodes, compute processes are launched, one process per node, each with 16 threads. You can modify this behavior by using -n, -t and helper-server options. Please refer to the [Molpro documentation](http://www.molpro.net/info/2010.1/doc/manual/node9.html) for more details.
!!! note
The OpenMP parallelization in Molpro is limited and has been observed to produce limited scaling. We therefore recommend using MPI parallelization only. This can be achieved by passing the option mpiprocs=16:ompthreads=1 to PBS.
You are advised to use the -d option to point to a directory in the [SCRATCH file system](../../storage/storage/). Molpro can produce a large amount of temporary data during its run, and it is important that they are placed in the fast scratch file system.
### Example jobscript
```bash
#!/bin/bash
#PBS -A IT4I-0-0
#PBS -q qprod
#PBS -l select=1:ncpus=16:mpiprocs=16:ompthreads=1
cd $PBS_O_WORKDIR
# load Molpro module
module add molpro
# create a directory in the SCRATCH filesystem
mkdir -p /scratch/$USER/$PBS_JOBID
# copy an example input
cp /apps/chem/molpro/2010.1/molprop_2010_1_Linux_x86_64_i8/examples/caffeine_opt_diis.com .
# run Molpro with default options
molpro -d /scratch/$USER/$PBS_JOBID caffeine_opt_diis.com
# delete scratch directory
rm -rf /scratch/$USER/$PBS_JOBID
```
# NWChem
## Introduction
NWChem aims to provide its users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently, and in their use of available parallel computing resources from high-performance parallel supercomputers to conventional workstation clusters.
[Homepage](http://www.nwchem-sw.org/index.php/Main_Page)
## Installed Versions
The following versions are currently installed:
* 6.1.1, not recommended; problems have been observed with this version
* 6.3-rev2-patch1, the current release with the QMD patch applied, compiled with Intel compilers, MKL and Intel MPI
* 6.3-rev2-patch1-openmpi, same as above, but compiled with OpenMPI and the NWChem-provided BLAS instead of MKL; this version is expected to be slower
* 6.3-rev2-patch1-venus, this version contains only libraries for VENUS interface linking and does not provide a standalone NWChem executable
For a current list of installed versions, execute:
```console
$ ml av nwchem
```
## Running
NWChem is compiled for parallel MPI execution. Normal procedure for MPI jobs applies. Sample jobscript:
```bash
#!/bin/bash
#PBS -A IT4I-0-0
#PBS -q qprod
#PBS -l select=1:ncpus=16
cd $PBS_O_WORKDIR
module add nwchem/6.3-rev2-patch1
mpirun -np 16 nwchem h2o.nw
```
## Options
Please refer to [the documentation](http://www.nwchem-sw.org/index.php/Release62:Top-level) and set the following directives in the input file:
* MEMORY: controls the amount of memory NWChem will use
* SCRATCH_DIR: set this to a directory in the [SCRATCH file system](../../storage/storage/#scratch) (or run the calculation completely in a scratch directory). For certain calculations, it might be advisable to reduce I/O by forcing "direct" mode, e.g. "scf direct"
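For illustration, a minimal input file combining these directives might begin as follows (a sketch only; the memory size and scratch path are placeholders, not recommendations):

```console
$ cat h2o.nw
memory 1000 mb
scratch_dir /scratch/username/123456
scf
  direct
end
```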
# Compilers
## Available Compilers, Including GNU, INTEL, and UPC Compilers
Currently there are several compilers for different programming languages available on the Anselm cluster:
* C/C++
* Fortran 77/90/95
* Unified Parallel C
* Java
* NVIDIA CUDA
The C/C++ and Fortran compilers are divided into two main groups: GNU and Intel.
## Intel Compilers
For information about the usage of Intel Compilers and other Intel products, please read the [Intel Parallel Studio](intel-suite/) page.
## GNU C/C++ and Fortran Compilers
For compatibility reasons, the original (old 4.4.6-4) versions of the GNU compilers are still available as part of the OS. These are accessible in the search path by default.
It is strongly recommended to use the up-to-date version (4.8.1), which comes with the module gcc:
```console
$ ml gcc
$ gcc -v
$ g++ -v
$ gfortran -v
```
With the module loaded, two environment variables are predefined: one for maximum optimizations on the Anselm cluster architecture, and the other for debugging purposes:
```console
$ echo $OPTFLAGS
-O3 -march=corei7-avx
$ echo $DEBUGFLAGS
-O0 -g
```
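These variables can be passed straight to the compiler. For example (myprog.c is a placeholder source file):

```console
$ gcc $OPTFLAGS myprog.c -o myprog.x
$ gcc $DEBUGFLAGS myprog.c -o myprog_dbg.x
```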
For more information about the possibilities of the compilers, please see the man pages.
## Unified Parallel C
UPC is supported by two compiler/runtime implementations:
* GNU - SMP/multi-threading support only
* Berkeley - multi-node support as well as SMP/multi-threading support
### GNU UPC Compiler
To use the GNU UPC compiler and run the compiled binaries, use the module gupc:
```console
$ module add gupc
$ gupc -v
$ g++ -v
```
A simple program to test the compiler:
```console
$ cat count.upc
/* hello.upc - a simple UPC example */
#include <upc.h>
#include <stdio.h>
int main() {
  if (MYTHREAD == 0) {
    printf("Welcome to GNU UPC!!!\n");
  }
  upc_barrier;
  printf(" - Hello from thread %i\n", MYTHREAD);
  return 0;
}
```
To compile the example, use:
```console
$ gupc -o count.upc.x count.upc
```
To run the example with 5 threads, issue:
```console
$ ./count.upc.x -fupc-threads-5
```
For more information see the man pages.
### Berkeley UPC Compiler
To use the Berkeley UPC compiler and runtime environment to run the binaries, use the module bupc:
```console
$ module add bupc
$ upcc -version
```
The "smp" UPC network is used by default. This is a quick and easy way for testing and debugging, but it is limited to one node only.
For production runs, it is recommended to use the native InfiniBand implementation of the UPC network, "ibv". For testing/debugging using multiple nodes, the "mpi" UPC network is recommended.
!!! warning
Selection of the network is done at compile time, not at runtime (as one might expect)!
Example UPC code:
```console
$ cat hello.upc
/* hello.upc - a simple UPC example */
#include <upc.h>
#include <stdio.h>
int main() {
  if (MYTHREAD == 0) {
    printf("Welcome to Berkeley UPC!!!\n");
  }
  upc_barrier;
  printf(" - Hello from thread %i\n", MYTHREAD);
  return 0;
}
```
To compile the example with the "ibv" UPC network use
```console
$ upcc -network=ibv -o hello.upc.x hello.upc
```
To run the example with 5 threads, issue:
```console
$ upcrun -n 5 ./hello.upc.x
```
To run the example on two compute nodes using all 32 cores, with 32 threads, issue:
```console
$ qsub -I -q qprod -A PROJECT_ID -l select=2:ncpus=16
$ module add bupc
$ upcrun -n 32 ./hello.upc.x
```
For more information see the man pages.
## Java
For information on how to use Java (runtime and/or compiler), please read the [Java page](java/).
## NVIDIA CUDA
For information on how to work with NVIDIA CUDA, please read the [NVIDIA CUDA page](nvidia-cuda/).
# COMSOL Multiphysics
## Introduction
[COMSOL](http://www.comsol.com) is a powerful environment for modelling and solving various engineering and scientific problems based on partial differential equations. COMSOL is designed to solve coupled or multiphysics phenomena. For many standard engineering problems, COMSOL provides add-on products such as electrical, mechanical, fluid flow, and chemical applications.
* [Structural Mechanics Module](http://www.comsol.com/structural-mechanics-module),
* [Heat Transfer Module](http://www.comsol.com/heat-transfer-module),
* [CFD Module](http://www.comsol.com/cfd-module),
* [Acoustics Module](http://www.comsol.com/acoustics-module),
* and [many others](http://www.comsol.com/products)
COMSOL also provides an interface for equation-based modelling of partial differential equations.
## Execution
On the Anselm cluster COMSOL is available in the latest stable version. There are two variants of the release:
* **Non commercial** or so-called **EDU variant**, which can be used for research and educational purposes.
* **Commercial** or so-called **COM variant**, which can also be used for commercial activities. The **COM variant** has only a subset of features compared to the **EDU variant**. More about licensing will be posted here soon.
To load the default version of COMSOL, load the module:
```console
$ ml comsol
```
By default, the **EDU variant** will be loaded. If you need another version or variant, load that particular version. To obtain the list of available versions, use:
```console
$ ml av comsol
```
If you need to prepare COMSOL jobs in interactive mode, it is recommended to use COMSOL on the compute nodes via the PBS Pro scheduler. To run the COMSOL Desktop GUI on Windows, it is recommended to use Virtual Network Computing (VNC).
```console
$ xhost +
$ qsub -I -X -A PROJECT_ID -q qprod -l select=1:ncpus=16
$ ml comsol
$ comsol
```
To run COMSOL in batch mode, without the COMSOL Desktop GUI environment, you can utilize the default (comsol.pbs) job script and execute it via the qsub command.
```bash
#!/bin/bash
#PBS -l select=3:ncpus=16
#PBS -q qprod
#PBS -N JOB_NAME
#PBS -A PROJECT_ID
cd /scratch/$USER/ || exit
echo Time is `date`
echo Directory is `pwd`
echo '**PBS_NODEFILE***START*******'
cat $PBS_NODEFILE
echo '**PBS_NODEFILE***END*********'
module load comsol
# module load comsol/43b-COM
ntask=$(wc -l < $PBS_NODEFILE)
comsol -nn ${ntask} batch -configuration /tmp -mpiarg -rmk -mpiarg pbs -tmpdir /scratch/$USER/ -inputfile name_input_f.mph -outputfile name_output_f.mph -batchlog name_log_f.log
```
The working directory has to be created before sending the (comsol.pbs) job script into the queue. The input file (name_input_f.mph) has to be in the working directory, or the full path to the input file has to be specified. The appropriate path to the temp directory of the job has to be set by the command option -tmpdir.
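Putting it together, a batch run might be prepared and submitted like this (a sketch; it assumes the job script above is saved as comsol.pbs and that the input file sits in the current directory):

```console
$ mkdir -p /scratch/$USER
$ cp name_input_f.mph /scratch/$USER/
$ qsub comsol.pbs
```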
## LiveLink for MATLAB
COMSOL is a software package for the numerical solution of partial differential equations. LiveLink for MATLAB allows connection to the COMSOL API (Application Programming Interface) with the benefits of the MATLAB programming language and computing environment.
LiveLink for MATLAB is available in both the **EDU** and **COM** **variants** of the COMSOL release. On Anselm, 1 commercial (**COM**) license and 5 educational (**EDU**) licenses of LiveLink for MATLAB are available (see the [ISV Licenses](isv_licenses/)).
The following example shows how to start a COMSOL model from MATLAB via LiveLink in interactive mode.
```console
$ xhost +
$ qsub -I -X -A PROJECT_ID -q qexp -l select=1:ncpus=16
$ ml matlab
$ ml comsol
$ comsol server matlab
```
The first time you launch LiveLink for MATLAB (client MATLAB/server COMSOL connection), a login and password are requested; this information is not requested again.
To run LiveLink for MATLAB in batch mode with the (comsol_matlab.pbs) job script, you can utilize/modify the following script and execute it via the qsub command.
```bash
#!/bin/bash
#PBS -l select=3:ncpus=16
#PBS -q qprod
#PBS -N JOB_NAME
#PBS -A PROJECT_ID
cd /scratch/$USER || exit
echo Time is `date`
echo Directory is `pwd`
echo '**PBS_NODEFILE***START*******'
cat $PBS_NODEFILE
echo '**PBS_NODEFILE***END*********'
module load matlab
module load comsol/43b-EDU
ntask=$(wc -l < $PBS_NODEFILE)
comsol -nn ${ntask} server -configuration /tmp -mpiarg -rmk -mpiarg pbs -tmpdir /scratch/$USER &
cd /apps/engineering/comsol/comsol43b/mli
matlab -nodesktop -nosplash -r "mphstart; addpath /scratch/$USER; test_job"
```
This example shows how to run LiveLink for MATLAB with the following configuration: 3 nodes and 16 cores per node. The working directory has to be created before submitting the (comsol_matlab.pbs) job script into the queue. The input file (test_job.m) has to be in the working directory, or the full path to the input file has to be specified. The MATLAB command option (-r "mphstart") creates a connection with the COMSOL server using the default port number.
# Allinea Forge (DDT,MAP)
Allinea Forge consists of two tools: the debugger DDT and the profiler MAP.
Allinea DDT is a commercial debugger primarily for debugging parallel MPI or OpenMP programs. It also has support for GPU (CUDA) and Intel Xeon Phi accelerators. DDT provides all the standard debugging features (stack trace, breakpoints, watches, view variables, threads, etc.) for every thread running as part of your program, or for every process, even if these processes are distributed across a cluster using an MPI implementation.
Allinea MAP is a profiler for C/C++/Fortran HPC codes. It is designed for profiling parallel code which uses pthreads, OpenMP or MPI.
## License and Limitations for Anselm Users
On Anselm, users can debug OpenMP or MPI code that runs up to 64 parallel processes. When debugging GPU or Xeon Phi accelerated codes, the limit is 8 accelerators. These limitations mean that:
* 1 user can debug up to 64 processes, or
* 32 users can debug 2 processes each, etc.
In case of debugging on accelerators:
* 1 user can debug on up to 8 accelerators, or
* 8 users can debug on a single accelerator each.
## Compiling Code to Run With DDT
### Modules
Load all necessary modules to compile the code. For example:
```console
$ ml intel
$ ml impi ... or ... module load openmpi/X.X.X-icc
```
Load the Allinea DDT module:
```console
$ ml Forge
```
Compile the code:
```console
$ mpicc -g -O0 -o test_debug test.c
$ mpif90 -g -O0 -o test_debug test.f
```
### Compiler Flags
Before debugging, you need to compile your code with these flags:
!!! note
\* **-g**: Generates extra debugging information usable by GDB. **-g3** includes even more debugging information. This option is available for GNU and Intel C/C++ and Fortran compilers.
\* **-O0**: Suppresses all optimizations.
## Starting a Job With DDT
Be sure to log in with X window forwarding enabled. This could mean using the -X option in ssh:
```console
$ ssh -X username@anselm.it4i.cz
```
Another option is to access the login node using VNC. Please see the detailed information on how to [use the graphical user interface on Anselm](/general/accessing-the-clusters/graphical-user-interface/x-window-system/).
From the login node, an interactive session **with X windows forwarding** (-X option) can be started by the following command:
```console
$ qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=16:mpiprocs=16,walltime=01:00:00
```
Then launch the debugger with the ddt command followed by the name of the executable to debug:
```console
$ ddt test_debug
```
The submission window that appears has a prefilled path to the executable to debug. You can select the number of MPI processes and/or OpenMP threads on which to run, and press Run. Command line arguments to the program can be entered in the "Arguments" box.
![](../../../img/ddt1.png)
To start the debugging directly without the submission window, you can specify the debugging and execution parameters from the command line. For example, the number of MPI processes is set by the option "-np 4". Skipping the dialog is done by the "-start" option. To see the list of "ddt" command line parameters, run "ddt --help".
```console
ddt -start -np 4 ./hello_debug_impi
```
## Documentation
Users can find the original User Guide after loading the DDT module:
```console
$DDTPATH/doc/userguide.pdf
```
[1] Discipline, Magic, Inspiration and Science: Best Practice Debugging with Allinea DDT, Workshop conducted at LLNL by Allinea on May 10, 2013, [link](https://computing.llnl.gov/tutorials/allineaDDT/index.html)
# Allinea Performance Reports
## Introduction
Allinea Performance Reports characterize the performance of HPC application runs. After executing your application through the tool, a synthetic HTML report is generated automatically, containing information about several metrics along with clear behavior statements and hints to help you improve the efficiency of your runs.
Allinea Performance Reports is most useful for profiling MPI programs.
Our license is limited to 64 MPI processes.
## Modules
Allinea Performance Reports version 6.0 is available:
```console
$ ml PerformanceReports/6.0
```
The module sets up the environment variables required for using Allinea Performance Reports.
## Usage
!!! note
Use the perf-report wrapper on your (MPI) program.
Instead of [running your MPI program the usual way](../mpi/), use the perf-report wrapper:
```console
$ perf-report mpirun ./mympiprog.x
```
The MPI program will run as usual. perf-report creates two additional files, in \*.txt and \*.html format, containing the performance report. Note that [demanding MPI codes should be run within the queue system](../../job-submission-and-execution/).
## Example
In this example, we will profile the mympiprog.x MPI program using Allinea Performance Reports. Assume that the code is compiled with Intel compilers and linked against the Intel MPI library:
First, we allocate some nodes via the express queue:
```console
$ qsub -q qexp -l select=2:ncpus=16:mpiprocs=16:ompthreads=1 -I
qsub: waiting for job 262197.dm2 to start
qsub: job 262197.dm2 ready
```
Then we load the modules and run the program the usual way:
```console
$ ml intel impi allinea-perf-report/4.2
$ mpirun ./mympiprog.x
```
Now let's profile the code:
```console
$ perf-report mpirun ./mympiprog.x
```
Performance report files [mympiprog_32p\*.txt](../../../src/mympiprog_32p_2014-10-15_16-56.txt) and [mympiprog_32p\*.html](../../../src/mympiprog_32p_2014-10-15_16-56.html) were created. We can see that the code is very efficient on MPI and is CPU-bound.
# Debuggers and Profilers Summary
## Introduction
We provide state-of-the-art programs and tools to develop, profile and debug HPC codes at IT4Innovations. On these pages, we provide an overview of the profiling and debugging tools available on Anselm at IT4I.
## Intel Debugger
The Intel debugger version 13.0 is available via the intel module. The debugger works for applications compiled with the C and C++ compiler and the ifort Fortran 77/90/95 compiler. The debugger provides a Java GUI environment. Use X display for running the GUI.
```console
$ ml intel
$ idb
```
Read more at the [Intel Debugger](intel-suite/intel-debugger/) page.
## Allinea Forge (DDT/MAP)
Allinea DDT is a commercial debugger primarily for debugging parallel MPI or OpenMP programs. It also has support for GPU (CUDA) and Intel Xeon Phi accelerators. DDT provides all the standard debugging features (stack trace, breakpoints, watches, view variables, threads, etc.) for every thread running as part of your program, or for every process, even if these processes are distributed across a cluster using an MPI implementation.
```console
$ ml Forge
$ forge
```
Read more at the [Allinea DDT](debuggers/allinea-ddt/) page.
## Allinea Performance Reports
Allinea Performance Reports characterize the performance of HPC application runs. After executing your application through the tool, a synthetic HTML report is generated automatically, containing information about several metrics along with clear behavior statements and hints to help you improve the efficiency of your runs. Our license is limited to 64 MPI processes.
```console
$ ml PerformanceReports/6.0
$ perf-report mpirun -n 64 ./my_application argument01 argument02
```
Read more at the [Allinea Performance Reports](debuggers/allinea-performance-reports/) page.
## RogueWave TotalView
TotalView is a source- and machine-level debugger for multi-process, multi-threaded programs. Its wide range of tools provides ways to analyze, organize, and test programs, making it easy to isolate and identify problems in individual threads and processes in programs of great complexity.
```console
$ ml totalview
$ totalview
```
Read more at the [Totalview](debuggers/total-view/) page.
## Vampir Trace Analyzer
Vampir is a GUI trace analyzer for traces in OTF format.
```console
$ ml Vampir/8.5.0
$ vampir
```
Read more at the [Vampir](vampir/) page.
# Intel VTune Amplifier
## Introduction
Intel VTune Amplifier, part of Intel Parallel Studio, is a GUI profiling tool designed for Intel processors. It offers a graphical performance analysis of single-core and multithreaded applications. Highlights of its features:
* Hotspot analysis
* Locks and waits analysis
* Low-level specific counters, such as branch analysis and memory bandwidth
* Power usage analysis - frequency and sleep states.
![screenshot](../../../img/vtune-amplifier.png)
## Usage
To launch the GUI, first load the module:
```console
$ module add VTune/2016_update1
```
and launch the GUI:
```console
$ amplxe-gui
```
!!! note
To profile an application with VTune Amplifier, special kernel modules need to be loaded. The modules are not loaded on Anselm login nodes, thus direct profiling on login nodes is not possible. Use VTune on compute nodes and refer to the documentation on using GUI applications.
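A minimal interactive session for running the GUI on a compute node might look like this (a sketch, assuming X forwarding as described in the documentation on GUI applications):

```console
$ qsub -I -X -q qprod -A PROJECT_ID -l select=1:ncpus=16
$ module add VTune/2016_update1
$ amplxe-gui
```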
The GUI will open in a new window. Click "_New Project..._" to create a new project. After clicking _OK_, a new window with project properties will appear. At "_Application:_", select the path to the binary you want to profile (the binary should be compiled with the -g flag). Some additional options, such as command line arguments, can be selected. At "_Managed code profiling mode:_", select "_Native_" (unless you want to profile managed mode .NET/Mono applications). After clicking _OK_, your project is created.
To run a new analysis, click "_New analysis..._". You will see a list of possible analyses. Some of them will not be possible on the current CPU (e.g. Intel Atom analysis is not possible on a Sandy Bridge CPU); the GUI will show an error box if you select the wrong analysis. For example, select "_Advanced Hotspots_". Clicking _Start_ will start profiling the application.
## Remote Analysis
VTune Amplifier also allows a form of remote analysis. In this mode, data for analysis is collected from the command line without the GUI, and the results are then loaded into the GUI on another machine. This allows profiling without interactive graphical jobs. To perform a remote analysis, launch a GUI somewhere, open the new analysis window and then click the "_Command line_" button in the bottom right corner. It will show the command line needed to perform the selected analysis.
The command line will look like this:
```console
$ /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -collect advanced-hotspots -knob collection-detail=stack-and-callcount -mrte-mode=native -target-duration-type=veryshort -app-working-dir /home/sta545/test -- /home/sta545/test_pgsesv
```
Copy the line to the clipboard; you can then paste it into your job script or command line. After the collection run finishes, open the GUI once again, click the menu button in the upper right corner, and select "_Open > Result..._". The GUI will load the results from the run.
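For example, a job script wrapping such a collection might look like this (a sketch; ./my_application and the analysis type are placeholders to replace with your own binary and the command line generated by the GUI):

```bash
#!/bin/bash
#PBS -q qprod
#PBS -A PROJECT_ID
#PBS -l select=1:ncpus=16

cd $PBS_O_WORKDIR
module add VTune/2016_update1

# collect data without the GUI; open the result in amplxe-gui later
amplxe-cl -collect advanced-hotspots -app-working-dir $PWD -- ./my_application
```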
## Xeon Phi
!!! note
This section is outdated. It will be updated with new information soon.
It is possible to analyze both native and offload Xeon Phi applications. For offload mode, just specify the path to the binary. For native mode, you need to specify the following in the project properties:
Application: ssh
Application parameters: mic0 source ~/.profile && /path/to/your/bin
Note that we include source ~/.profile in the command to set up environment paths, [as described here](../intel-xeon-phi/).
!!! note
If the analysis is interrupted or aborted, further analysis on the card might be impossible and you will get errors like "ERROR connecting to MIC card". In this case, please contact our support to reboot the MIC card.
You may also use remote analysis to collect data from the MIC and then analyze it in the GUI later:
```console
$ amplxe-cl -collect knc-hotspots -no-auto-finalize -- ssh mic0
"export LD_LIBRARY_PATH=/apps/intel/composer_xe_2015.2.164/compiler/lib/mic/:/apps/intel/composer_xe_2015.2.164/mkl/lib/mic/; export KMP_AFFINITY=compact; /tmp/app.mic"
```
## References
1. <https://www.rcac.purdue.edu/tutorials/phi/PerformanceTuningXeonPhi-Tullos.pdf> Performance Tuning for Intel® Xeon Phi™ Coprocessors
# Total View
TotalView is a GUI-based source code multi-process, multi-thread debugger.
## License and Limitations for Anselm Users
On Anselm, users can debug OpenMP or MPI code that runs up to 64 parallel processes. These limitations mean that:
```console
1 user can debug up to 64 processes, or
32 users can debug 2 processes each, etc.
```
Debugging of GPU accelerated codes is also supported.
You can check the status of the licenses here:
```console
$ cat /apps/user/licenses/totalview_features_state.txt
# totalview
# -------------------------------------------------
# FEATURE TOTAL USED AVAIL
# -------------------------------------------------
TotalView_Team 64 0 64
Replay 64 0 64
CUDA 64 0 64
```
## Compiling Code to Run With TotalView
### Modules
Load all necessary modules to compile the code. For example:
```console
$ ml intel    # or: ml foss
```
Load the TotalView module:
```console
$ ml totalview/8.12
```
Compile the code:
```console
$ mpicc -g -O0 -o test_debug test.c
$ mpif90 -g -O0 -o test_debug test.f
```
### Compiler Flags
Before debugging, you need to compile your code with these flags:
!!! note
\* **-g**: Generates extra debugging information usable by GDB. **-g3** includes even more debugging information. This option is available for GNU and Intel C/C++ and Fortran compilers.
\* **-O0**: Suppresses all optimizations.
## Starting a Job With TotalView
Be sure to log in with an X window forwarding enabled. This could mean using the -X in the ssh:
```console
local $ ssh -X username@anselm.it4i.cz
```
Another option is to access the login node using VNC. Please see the detailed information on how to use the graphical user interface on Anselm.
From the login node, an interactive session with X windows forwarding (-X option) can be started by the following command:
```console
$ qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=16:mpiprocs=16,walltime=01:00:00
```
Then launch the debugger with the totalview command followed by the name of the executable to debug.
### Debugging a Serial Code
To debug a serial code, use:
```console
$ totalview test_debug
```
### Debugging a Parallel Code - Option 1
To debug a parallel code compiled with **OpenMPI**, you need to set up your TotalView environment:
!!! hint
To be able to run the parallel debugging procedure from the command line, without stopping the debugger in the mpiexec source code, you have to add the following function to your `~/.tvdrc` file:
```console
proc mpi_auto_run_starter {loaded_id} {
set starter_programs {mpirun mpiexec orterun}
set executable_name [TV::symbol get $loaded_id full_pathname]
set file_component [file tail $executable_name]
if {[lsearch -exact $starter_programs $file_component] != -1} {
puts "*************************************"
puts "Automatically starting $file_component"
puts "*************************************"
dgo
}
}
# Append this function to TotalView's image load callbacks so that
# TotalView runs this program automatically.
dlappend TV::image_load_callbacks mpi_auto_run_starter
```
The source code of this function can also be found in
```console
$ /apps/mpi/openmpi/intel/1.6.5/etc/openmpi-totalview.tcl
```
!!! note
You can also add only the following line to your ~/.tvdrc file instead of the entire function:
**source /apps/mpi/openmpi/intel/1.6.5/etc/openmpi-totalview.tcl**
You need to do this step only once.
Now you can run the parallel debugger using:
```console
$ mpirun -tv -n 5 ./test_debug
```
When the following dialog appears, click "Yes":
![](../../../img/totalview1.png)
At this point, the main TotalView GUI window will appear, and you can insert breakpoints and start debugging:
![](../../../img/totalview2.png)
### Debugging a Parallel Code - Option 2
Another option to start a new parallel debugging session from the command line is to let TotalView execute mpirun by itself. In this case, you have to specify the MPI implementation used to compile the source code.
The following example shows how to start a debugging session with Intel MPI:
```console
$ ml intel
$ ml totalview
$ totalview -mpi "Intel MPI-Hydra" -np 8 ./hello_debug_impi
```
After running the previous command, you will see the same window as shown in the screenshot above.
More information regarding the command line parameters of TotalView can be found in the TotalView Reference Guide, Chapter 7: TotalView Command Syntax.
## Documentation
[1] The [TotalView documentation](http://www.roguewave.com/support/product-documentation/totalview-family.aspx#totalview) web page is a good resource for learning more about some of the advanced TotalView features.
# Vampir
Vampir is a commercial trace analysis and visualization tool. It can work with traces in OTF and OTF2 formats. It does not have the functionality to collect traces; you need to use a trace collection tool (such as [Score-P](score-p/)) first to collect the traces.
![](../../../img/Snmekobrazovky20160708v12.33.35.png)
## Installed Versions
Version 8.5.0 is currently installed as the module Vampir/8.5.0:
```console
$ ml Vampir/8.5.0
$ vampir &
```
## User Manual
You can find the detailed user manual in PDF format at $EBROOTVAMPIR/doc/vampir-manual.pdf.
## References
[1]. <https://www.vampir.eu>
# Intel Compilers
The Intel compilers version 13.1.1 are available via the intel module. The compilers include the icc C and C++ compiler and the ifort Fortran 77/90/95 compiler.
```console
$ ml intel
$ icc -v
$ ifort -v
```
The Intel compilers provide vectorization of the code via the AVX instructions and support threading parallelization via OpenMP.
For maximum performance on the Anselm cluster, compile your programs using the AVX instructions, with reporting where the vectorization was used. We recommend the following compilation options for high performance:
```console
$ icc -ipo -O3 -vec -xAVX -vec-report1 myprog.c mysubroutines.c -o myprog.x
$ ifort -ipo -O3 -vec -xAVX -vec-report1 myprog.f mysubroutines.f -o myprog.x
```
In this example, we compile the program enabling interprocedural optimizations between source files (-ipo), aggressive loop optimizations (-O3) and vectorization (-vec -xAVX).
The compiler recognizes the omp, simd, vector and ivdep pragmas for OpenMP parallelization and AVX vectorization. Enable OpenMP parallelization with the **-openmp** compiler switch.
```console
$ icc -ipo -O3 -vec -xAVX -vec-report1 -openmp myprog.c mysubroutines.c -o myprog.x
$ ifort -ipo -O3 -vec -xAVX -vec-report1 -openmp myprog.f mysubroutines.f -o myprog.x
```
Read more at <http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/cpp-lin/index.htm>
## Sandy Bridge/Haswell Binary Compatibility
Anselm nodes are currently equipped with Sandy Bridge CPUs, while Salomon will use the Haswell architecture. The new processors are backward compatible with the Sandy Bridge nodes, so all programs that ran on the Sandy Bridge processors should also run on the new Haswell nodes. To get optimal performance out of the Haswell processors, a program should make use of the special AVX2 instructions for this processor. One can do this by recompiling codes with the compiler flags designated to invoke these instructions. For the Intel compiler suite, there are two ways of doing this (see the sketch after this list):
* Using the compiler flag (both for Fortran and C): -xCORE-AVX2. This will create a binary with AVX2 instructions, specifically for the Haswell processors. Note that the executable will not run on Sandy Bridge nodes.
* Using the compiler flags (both for Fortran and C): -xAVX -axCORE-AVX2. This will generate multiple, feature-specific auto-dispatch code paths for Intel® processors, if there is a performance benefit. This binary will run on both Sandy Bridge and Haswell processors. At runtime it will be decided which path to follow, depending on which processor you are running on. In general this will result in larger binaries.
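As a sketch, the two approaches look like this on the command line (myprog.c is a placeholder source file):

```console
$ icc -xCORE-AVX2 myprog.c -o myprog.x
$ icc -xAVX -axCORE-AVX2 myprog.c -o myprog.x
```

The first binary runs on Haswell only; the second carries both code paths and selects one at runtime.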
# Intel Debugger
## Debugging Serial Applications
The Intel debugger version 13.0 is available via the intel module. The debugger works for applications compiled with the C and C++ compiler and the ifort Fortran 77/90/95 compiler. The debugger provides a Java GUI environment. Use X display for running the GUI.
```console
$ ml intel
$ idb
```
The debugger may run in text mode. To debug in text mode, use:
```console
$ idbc
```
To debug on the compute nodes, the intel module must be loaded. The GUI on compute nodes may be accessed the same way as described in the GUI section.
Example:
```console
$ qsub -q qexp -l select=1:ncpus=16 -X -I
qsub: waiting for job 19654.srv11 to start
qsub: job 19654.srv11 ready
$ ml intel
$ ml java
$ icc -O0 -g myprog.c -o myprog.x
$ idb ./myprog.x
```
In this example, we allocate 1 full compute node, compile the program myprog.c with the debugging options -O0 -g, and run the idb debugger interactively on the myprog.x executable. The GUI access is via X11 port forwarding provided by the PBS workload manager.
## Debugging Parallel Applications
The Intel debugger is capable of debugging multithreaded and MPI parallel programs as well.
### Small Number of MPI Ranks
For debugging a small number of MPI ranks, you may execute and debug each rank in a separate xterm terminal (do not forget the X display). Using Intel MPI, this may be done in the following way:
```console
$ qsub -q qexp -l select=2:ncpus=16 -X -I
qsub: waiting for job 19654.srv11 to start
qsub: job 19655.srv11 ready
$ ml intel
$ mpirun -ppn 1 -hostfile $PBS_NODEFILE --enable-x xterm -e idbc ./mympiprog.x
```
In this example, we allocate 2 full compute nodes, run xterm on each node, and start the idb debugger in command line mode, debugging two ranks of the mympiprog.x application. The xterm will pop up for each rank, with the idb prompt ready. The example is not limited to the use of Intel MPI.
### Large Number of MPI Ranks
Run the idb debugger from within the MPI debug option. This will cause the debugger to bind to all ranks and provide aggregated outputs across the ranks, pausing execution automatically just after startup. You may then set breakpoints and step the execution manually. Using Intel MPI:
```console
$ qsub -q qexp -l select=2:ncpus=16 -X -I
qsub: waiting for job 19654.srv11 to start
qsub: job 19655.srv11 ready
$ ml intel
$ mpirun -n 32 -idb ./mympiprog.x
```
### Debugging Multithreaded Application
Run the idb debugger in GUI mode. The Parallel menu contains a number of tools for debugging multiple threads. One of the most useful tools is the **Serialize Execution** tool, which serializes the execution of concurrent threads for easy orientation and identification of concurrency-related bugs.
## Further Information
An exhaustive manual on idb features and usage is published at the [Intel website](http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/debugger/user_guide/index.htm).
# Intel IPP
## Intel Integrated Performance Primitives
Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX vector instructions, is available via the ipp module. IPP is a very rich library of highly optimized algorithmic building blocks for media and data applications. This includes signal, image and frame processing algorithms, such as FFT, FIR, convolution, optical flow, Hough transform, sum, MinMax, as well as cryptographic functions, linear algebra functions and many more.
!!! note
Check out IPP before implementing your own math functions for data processing; it is likely already there.
```console
$ ml ipp
```
The module sets up the environment variables required for linking and running IPP-enabled applications.
## IPP Example
```cpp
#include "ipp.h"
#include <stdio.h>
int main(int argc, char* argv[])
{
const IppLibraryVersion *lib;
Ipp64u fm;
IppStatus status;
status= ippInit(); //IPP initialization with the best optimization layer
if( status != ippStsNoErr ) {
printf("IppInit() Error:n");
printf("%sn", ippGetStatusString(status) );
return -1;
}
//Get version info
lib = ippiGetLibVersion();
printf("%s %sn", lib->Name, lib->Version);
//Get CPU features enabled with selected library level
fm=ippGetEnabledCpuFeatures();
printf("SSE :%cn",(fm>1)&1?'Y':'N');
printf("SSE2 :%cn",(fm>2)&1?'Y':'N');
printf("SSE3 :%cn",(fm>3)&1?'Y':'N');
printf("SSSE3 :%cn",(fm>4)&1?'Y':'N');
printf("SSE41 :%cn",(fm>6)&1?'Y':'N');
printf("SSE42 :%cn",(fm>7)&1?'Y':'N');
printf("AVX :%cn",(fm>8)&1 ?'Y':'N');
printf("AVX2 :%cn", (fm>15)&1 ?'Y':'N' );
printf("----------n");
printf("OS Enabled AVX :%cn", (fm>9)&1 ?'Y':'N');
printf("AES :%cn", (fm>10)&1?'Y':'N');
printf("CLMUL :%cn", (fm>11)&1?'Y':'N');
printf("RDRAND :%cn", (fm>13)&1?'Y':'N');
printf("F16C :%cn", (fm>14)&1?'Y':'N');
return 0;
}
```
Compile the above example using any compiler, with the ipp module loaded:
```console
$ ml intel
$ ml ipp
$ icc testipp.c -o testipp.x -lippi -lipps -lippcore
```
You will need the ipp module loaded to run the IPP-enabled executable. This may be avoided by compiling the library search paths into the executable:
```console
$ ml intel
$ ml ipp
$ icc testipp.c -o testipp.x -Wl,-rpath=$LIBRARY_PATH -lippi -lipps -lippcore
```
## Code Samples and Documentation
Intel provides a number of [Code Samples for IPP](https://software.intel.com/en-us/articles/code-samples-for-intel-integrated-performance-primitives-library), illustrating the use of IPP.
Read the full documentation on IPP [on the Intel website,](http://software.intel.com/sites/products/search/search.php?q=&x=15&y=6&product=ipp&version=7.1&docos=lin) in particular the [IPP Reference manual.](http://software.intel.com/sites/products/documentation/doclib/ipp_sa/71/ipp_manual/index.htm)
# Intel MKL
## Intel Math Kernel Library
Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, extensively threaded and optimized for maximum performance. Intel MKL provides these basic math kernels:
* BLAS (level 1, 2, and 3) and LAPACK linear algebra routines, offering vector, vector-matrix, and matrix-matrix operations.
* The PARDISO direct sparse solver, an iterative sparse solver, and supporting sparse BLAS (level 1, 2, and 3) routines for solving sparse systems of equations.
* ScaLAPACK distributed processing linear algebra routines for Linux and Windows operating systems, as well as the Basic Linear Algebra Communications Subprograms (BLACS) and the Parallel Basic Linear Algebra Subprograms (PBLAS).
* Fast Fourier transform (FFT) functions in one, two, or three dimensions with support for mixed radices (not limited to sizes that are powers of 2), as well as distributed versions of these functions.
* Vector Math Library (VML) routines for optimized mathematical operations on vectors.
* Vector Statistical Library (VSL) routines, which offer high-performance vectorized random number generators (RNG) for several probability distributions, convolution and correlation routines, and summary statistics functions.
* Data Fitting Library, which provides capabilities for spline-based approximation of functions, derivatives and integrals of functions, and search.
* Extended Eigensolver, a shared memory version of an eigensolver based on the Feast Eigenvalue Solver.
For details see the [Intel MKL Reference Manual](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mklman/index.htm).
Intel MKL is available on Anselm
```console
$ ml imkl
```
The module sets up the environment variables required for linking and running MKL-enabled applications. The most important variables are $MKLROOT, $MKL_INC_DIR, $MKL_LIB_DIR and $MKL_EXAMPLES.
!!! note
The MKL library may be linked using any compiler. With the Intel compiler, use the -mkl option to link the default threaded MKL.
### Interfaces
The MKL library provides a number of interfaces. The fundamental ones are LP64 and ILP64. The Intel MKL ILP64 libraries use the 64-bit integer type (necessary for indexing large arrays with more than 2^31 - 1 elements), whereas the LP64 libraries index arrays with the 32-bit integer type.
| Interface | Integer type |
| --------- | -------------------------------------------- |
| LP64 | 32-bit, int, integer(kind=4), MPI_INT |
| ILP64 | 64-bit, long int, integer(kind=8), MPI_INT64 |
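For illustration, a program using 64-bit integers might be linked against the ILP64 interface roughly as follows (a sketch; use the link line advisor mentioned below to generate the authoritative line):

```console
$ icc -DMKL_ILP64 myprog.c -o myprog.x -I$MKL_INC_DIR -L$MKL_LIB_DIR -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5
```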
### Linking
Linking MKL libraries may be complex. The Intel [mkl link line advisor](http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor) helps. See also the [examples](intel-mkl/#examples) below.
You will need the mkl module loaded to run the MKL-enabled executable. This may be avoided by compiling the library search paths into the executable. Include rpath on the compile line:
```console
$ icc .... -Wl,-rpath=$LIBRARY_PATH ...
```
### Threading
!!! note
The advantage of using the MKL library is that it brings threaded parallelization to applications that are otherwise not parallel.
For this to work, the application must link the threaded MKL library (the default). The number and behaviour of MKL threads may be controlled via the OpenMP environment variables, such as OMP_NUM_THREADS and KMP_AFFINITY. MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS.
```console
$ export OMP_NUM_THREADS=16
$ export KMP_AFFINITY=granularity=fine,compact,1,0
```
The application will run with 16 threads, with affinity optimized for fine-grain parallelization.
## Examples
A number of examples demonstrating the use of the MKL library and its linking are available on Anselm in the $MKL_EXAMPLES directory. In the examples below, we demonstrate linking MKL to an Intel and a GNU compiled program for multi-threaded matrix multiplication.
### Working With Examples
```console
$ ml intel
$ cp -a $MKL_EXAMPLES/cblas /tmp/
$ cd /tmp/cblas
$ make sointel64 function=cblas_dgemm
```
In this example, we compile, link and run the cblas_dgemm example, demonstrating the use of the MKL example suite installed on Anselm.
### Example: MKL and Intel Compiler
```console
$ ml intel
$ cp -a $MKL_EXAMPLES/cblas /tmp/
$ cd /tmp/cblas
$ icc -w source/cblas_dgemmx.c source/common_func.c -mkl -o cblas_dgemmx.x
$ ./cblas_dgemmx.x data/cblas_dgemmx.d
```
In this example, we compile, link and run the cblas_dgemm example, demonstrating the use of MKL with the icc -mkl option. Using the -mkl option is equivalent to:
```console
$ icc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x -I$MKL_INC_DIR -L$MKL_LIB_DIR -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5
```
In this example, we compile and link the cblas_dgemm example, using the LP64 interface to threaded MKL and the Intel OMP threads implementation.
### Example: MKL and GNU Compiler
```console
$ ml gcc
$ ml imkl
$ cp -a $MKL_EXAMPLES/cblas /tmp/
$ cd /tmp/cblas
$ gcc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lm
$ ./cblas_dgemmx.x data/cblas_dgemmx.d
```
In this example, we compile, link and run the cblas_dgemm example, using the LP64 interface to threaded MKL and the GNU OMP threads implementation.
## MKL and MIC Accelerators
MKL is capable of automatically offloading computations to the MIC accelerator. See the [Intel Xeon Phi](../intel-xeon-phi/) section for details.
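As a quick illustration, automatic offload can typically be enabled via an environment variable before running an MKL-linked binary (a sketch; ./myprog.x is a placeholder, and the Intel Xeon Phi section describes the full procedure):

```console
$ export MKL_MIC_ENABLE=1
$ ./myprog.x
```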
## Further Reading
Read more on the [Intel website](http://software.intel.com/en-us/intel-mkl), in particular the [MKL users guide](https://software.intel.com/en-us/intel-mkl/documentation/linux).
# Intel TBB
## Intel Threading Building Blocks
Intel Threading Building Blocks (Intel TBB) is a library that supports scalable parallel programming using standard ISO C++ code. It does not require special languages or compilers. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner. The tasks are executed by a runtime scheduler and may be offloaded to the [MIC accelerator](../intel-xeon-phi/).
Intel TBB version 4.1 is available on Anselm:
```console
$ ml tbb
```
The module sets up the environment variables required for linking and running TBB-enabled applications.
!!! note
Link the tbb library using -ltbb.
## Examples
A number of examples demonstrating the use of TBB and its built-in scheduler are available on Anselm in the $TBB_EXAMPLES directory.
```console
$ ml intel
$ ml tbb
$ cp -a $TBB_EXAMPLES/common $TBB_EXAMPLES/parallel_reduce /tmp/
$ cd /tmp/parallel_reduce/primes
$ icc -O2 -DNDEBUG -o primes.x main.cpp primes.cpp -ltbb
$ ./primes.x
```
In this example, we compile, link and run the primes example, demonstrating the use of a parallel task-based reduce in the computation of prime numbers.
You will need the tbb module loaded to run the TBB-enabled executable. This may be avoided by compiling the library search paths into the executable.
```console
$ icc -O2 -o primes.x main.cpp primes.cpp -Wl,-rpath=$LIBRARY_PATH -ltbb
```
## Further Reading
Read more on the Intel website, <http://software.intel.com/sites/products/documentation/doclib/tbb_sa/help/index.htm>
# Intel Parallel Studio
The Anselm cluster provides the following elements of the Intel Parallel Studio XE:
* Intel Compilers
* Intel Debugger
* Intel MKL Library
* Intel Integrated Performance Primitives Library
* Intel Threading Building Blocks Library
## Intel Compilers
The Intel compilers version 13.1.3 are available via the intel module. The compilers include the icc C and C++ compiler and the ifort Fortran 77/90/95 compiler.
```console
$ ml intel
$ icc -v
$ ifort -v
```
Read more at the [Intel Compilers](intel-compilers/) page.
## Intel Debugger
The Intel debugger version 13.0 is available via the intel module. The debugger works for applications compiled with the C and C++ compiler and the ifort Fortran 77/90/95 compiler. The debugger provides a Java GUI environment. Use X display for running the GUI.
```console
$ ml intel
$ idb
```
Read more at the [Intel Debugger](intel-debugger/) page.
## Intel Math Kernel Library
Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, extensively threaded and optimized for maximum performance. Intel MKL unites and provides these basic components: BLAS, LAPACK, ScaLAPACK, PARDISO, FFT, VML, VSL, Data Fitting, FEAST Eigensolver and many more.
```console
$ ml imkl
```
Read more at the [Intel MKL](intel-mkl/) page.
## Intel Integrated Performance Primitives
Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX, is available via the ipp module. IPP is a library of highly optimized algorithmic building blocks for media and data applications. This includes signal, image and frame processing algorithms, such as FFT, FIR, convolution, optical flow, Hough transform, sum, MinMax and many more.
```console
$ ml ipp
```
Read more at the [Intel IPP](intel-integrated-performance-primitives/) page.
## Intel Threading Building Blocks
Intel Threading Building Blocks (Intel TBB) is a library that supports scalable parallel programming using standard ISO C++ code. It does not require special languages or compilers. It is designed to promote scalable data parallel programming. Additionally, it fully supports nested parallelism, so you can build larger parallel components from smaller parallel components. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner.
```console
$ ml tbb
```
Read more at the [Intel TBB](intel-tbb/) page.