Commit 2b9f45a7 authored by Lukáš Krupčík's avatar Lukáš Krupčík

* -> *

parent 2d8ba497
Pipeline #1985 passed with stages
in 59 seconds
......@@ -9,9 +9,9 @@ However, executing huge number of jobs via the PBS queue may strain the system.
!!! note
Please follow one of the procedures below, in case you wish to schedule more than 100 jobs at a time.
* Use [Job arrays](capacity-computing/#job-arrays) when running huge number of [multithread](capacity-computing/#shared-jobscript-on-one-node) (bound to one node only) or multinode (multithread across several nodes) jobs
* Use [GNU parallel](capacity-computing/#gnu-parallel) when running single core jobs
* Combine [GNU parallel with Job arrays](capacity-computing/#job-arrays-and-gnu-parallel) when running huge number of single core jobs
* Use [Job arrays](capacity-computing/#job-arrays) when running huge number of [multithread](capacity-computing/#shared-jobscript-on-one-node) (bound to one node only) or multinode (multithread across several nodes) jobs
* Use [GNU parallel](capacity-computing/#gnu-parallel) when running single core jobs
* Combine [GNU parallel with Job arrays](capacity-computing/#job-arrays-and-gnu-parallel) when running huge number of single core jobs
## Policy
......@@ -25,9 +25,9 @@ However, executing huge number of jobs via the PBS queue may strain the system.
A job array is a compact representation of many jobs, called subjobs. The subjobs share the same job script, and have the same values for all attributes and resources, with the following exceptions:
* each subjob has a unique index, $PBS_ARRAY_INDEX
* job Identifiers of subjobs only differ by their indices
* the state of subjobs can differ (R,Q,...etc.)
* each subjob has a unique index, $PBS_ARRAY_INDEX
* job Identifiers of subjobs only differ by their indices
* the state of subjobs can differ (R,Q,...etc.)
All subjobs within a job array have the same scheduling priority and schedule as independent jobs. Entire job array is submitted through a single qsub command and may be managed by qdel, qalter, qhold, qrls and qsig commands as a single job.
......
......@@ -6,46 +6,46 @@ Anselm is cluster of x86-64 Intel based nodes built on Bull Extreme Computing bu
### Compute Nodes Without Accelerator
* 180 nodes
* 2880 cores in total
* two Intel Sandy Bridge E5-2665, 8-core, 2.4GHz processors per node
* 64 GB of physical memory per node
* one 500GB SATA 2,5” 7,2 krpm HDD per node
* bullx B510 blade servers
* cn[1-180]
* 180 nodes
* 2880 cores in total
* two Intel Sandy Bridge E5-2665, 8-core, 2.4GHz processors per node
* 64 GB of physical memory per node
* one 500GB SATA 2,5” 7,2 krpm HDD per node
* bullx B510 blade servers
* cn[1-180]
### Compute Nodes With GPU Accelerator
* 23 nodes
* 368 cores in total
* two Intel Sandy Bridge E5-2470, 8-core, 2.3GHz processors per node
* 96 GB of physical memory per node
* one 500GB SATA 2,5” 7,2 krpm HDD per node
* GPU accelerator 1x NVIDIA Tesla Kepler K20 per node
* bullx B515 blade servers
* cn[181-203]
* 23 nodes
* 368 cores in total
* two Intel Sandy Bridge E5-2470, 8-core, 2.3GHz processors per node
* 96 GB of physical memory per node
* one 500GB SATA 2,5” 7,2 krpm HDD per node
* GPU accelerator 1x NVIDIA Tesla Kepler K20 per node
* bullx B515 blade servers
* cn[181-203]
### Compute Nodes With MIC Accelerator
* 4 nodes
* 64 cores in total
* two Intel Sandy Bridge E5-2470, 8-core, 2.3GHz processors per node
* 96 GB of physical memory per node
* one 500GB SATA 2,5” 7,2 krpm HDD per node
* MIC accelerator 1x Intel Phi 5110P per node
* bullx B515 blade servers
* cn[204-207]
* 4 nodes
* 64 cores in total
* two Intel Sandy Bridge E5-2470, 8-core, 2.3GHz processors per node
* 96 GB of physical memory per node
* one 500GB SATA 2,5” 7,2 krpm HDD per node
* MIC accelerator 1x Intel Phi 5110P per node
* bullx B515 blade servers
* cn[204-207]
### Fat Compute Nodes
* 2 nodes
* 32 cores in total
* 2 Intel Sandy Bridge E5-2665, 8-core, 2.4GHz processors per node
* 512 GB of physical memory per node
* two 300GB SAS 3,5”15krpm HDD (RAID1) per node
* two 100GB SLC SSD per node
* bullx R423-E3 servers
* cn[208-209]
* 2 nodes
* 32 cores in total
* 2 Intel Sandy Bridge E5-2665, 8-core, 2.4GHz processors per node
* 512 GB of physical memory per node
* two 300GB SAS 3,5”15krpm HDD (RAID1) per node
* two 100GB SLC SSD per node
* bullx R423-E3 servers
* cn[208-209]
![](../img/bullxB510.png)
**Figure Anselm bullx B510 servers**
......@@ -65,23 +65,23 @@ Anselm is equipped with Intel Sandy Bridge processors Intel Xeon E5-2665 (nodes
### Intel Sandy Bridge E5-2665 Processor
* eight-core
* speed: 2.4 GHz, up to 3.1 GHz using Turbo Boost Technology
* peak performance: 19.2 GFLOP/s per core
* caches:
* eight-core
* speed: 2.4 GHz, up to 3.1 GHz using Turbo Boost Technology
* peak performance: 19.2 GFLOP/s per core
* caches:
* L2: 256 KB per core
* L3: 20 MB per processor
* memory bandwidth at the level of the processor: 51.2 GB/s
* memory bandwidth at the level of the processor: 51.2 GB/s
### Intel Sandy Bridge E5-2470 Processor
* eight-core
* speed: 2.3 GHz, up to 3.1 GHz using Turbo Boost Technology
* peak performance: 18.4 GFLOP/s per core
* caches:
* eight-core
* speed: 2.3 GHz, up to 3.1 GHz using Turbo Boost Technology
* peak performance: 18.4 GFLOP/s per core
* caches:
* L2: 256 KB per core
* L3: 20 MB per processor
* memory bandwidth at the level of the processor: 38.4 GB/s
* memory bandwidth at the level of the processor: 38.4 GB/s
Nodes equipped with Intel Xeon E5-2665 CPU have set PBS resource attribute cpu_freq = 24, nodes equipped with Intel Xeon E5-2470 CPU have set PBS resource attribute cpu_freq = 23.
......@@ -101,30 +101,30 @@ Intel Turbo Boost Technology is used by default, you can disable it for all nod
### Compute Node Without Accelerator
* 2 sockets
* Memory Controllers are integrated into processors.
* 2 sockets
* Memory Controllers are integrated into processors.
* 8 DDR3 DIMMs per node
* 4 DDR3 DIMMs per CPU
* 1 DDR3 DIMMs per channel
* Data rate support: up to 1600MT/s
* Populated memory: 8 x 8 GB DDR3 DIMM 1600 MHz
* Populated memory: 8 x 8 GB DDR3 DIMM 1600 MHz
### Compute Node With GPU or MIC Accelerator
* 2 sockets
* Memory Controllers are integrated into processors.
* 2 sockets
* Memory Controllers are integrated into processors.
* 6 DDR3 DIMMs per node
* 3 DDR3 DIMMs per CPU
* 1 DDR3 DIMMs per channel
* Data rate support: up to 1600MT/s
* Populated memory: 6 x 16 GB DDR3 DIMM 1600 MHz
* Populated memory: 6 x 16 GB DDR3 DIMM 1600 MHz
### Fat Compute Node
* 2 sockets
* Memory Controllers are integrated into processors.
* 2 sockets
* Memory Controllers are integrated into processors.
* 16 DDR3 DIMMs per node
* 8 DDR3 DIMMs per CPU
* 2 DDR3 DIMMs per channel
* Data rate support: up to 1600MT/s
* Populated memory: 16 x 32 GB DDR3 DIMM 1600 MHz
* Populated memory: 16 x 32 GB DDR3 DIMM 1600 MHz
......@@ -12,10 +12,10 @@ The cluster compute nodes cn[1-207] are organized within 13 chassis.
There are four types of compute nodes:
* 180 compute nodes without the accelerator
* 23 compute nodes with GPU accelerator - equipped with NVIDIA Tesla Kepler K20
* 4 compute nodes with MIC accelerator - equipped with Intel Xeon Phi 5110P
* 2 fat nodes - equipped with 512 GB RAM and two 100 GB SSD drives
* 180 compute nodes without the accelerator
* 23 compute nodes with GPU accelerator - equipped with NVIDIA Tesla Kepler K20
* 4 compute nodes with MIC accelerator - equipped with Intel Xeon Phi 5110P
* 2 fat nodes - equipped with 512 GB RAM and two 100 GB SSD drives
[More about Compute nodes](compute-nodes/).
......
......@@ -28,11 +28,11 @@ The user will need a valid certificate and to be present in the PRACE LDAP (plea
Most of the information needed by PRACE users accessing the Anselm TIER-1 system can be found here:
* [General user's FAQ](http://www.prace-ri.eu/Users-General-FAQs)
* [Certificates FAQ](http://www.prace-ri.eu/Certificates-FAQ)
* [Interactive access using GSISSH](http://www.prace-ri.eu/Interactive-Access-Using-gsissh)
* [Data transfer with GridFTP](http://www.prace-ri.eu/Data-Transfer-with-GridFTP-Details)
* [Data transfer with gtransfer](http://www.prace-ri.eu/Data-Transfer-with-gtransfer)
* [General user's FAQ](http://www.prace-ri.eu/Users-General-FAQs)
* [Certificates FAQ](http://www.prace-ri.eu/Certificates-FAQ)
* [Interactive access using GSISSH](http://www.prace-ri.eu/Interactive-Access-Using-gsissh)
* [Data transfer with GridFTP](http://www.prace-ri.eu/Data-Transfer-with-GridFTP-Details)
* [Data transfer with gtransfer](http://www.prace-ri.eu/Data-Transfer-with-gtransfer)
Before you start to use any of the services don't forget to create a proxy certificate from your certificate:
......
......@@ -6,11 +6,11 @@ To run a [job](../introduction/), [computational resources](../introduction/) fo
The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. [The Fair-share](job-priority/) at Anselm ensures that individual users may consume approximately equal amount of resources per week. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. Following queues are available to Anselm users:
* **qexp**, the Express queue
* **qprod**, the Production queue
* **qlong**, the Long queue, regula
* **qnvidia**, **qmic**, **qfat**, the Dedicated queues
* **qfree**, the Free resource utilization queue
* **qexp**, the Express queue
* **qprod**, the Production queue
* **qlong**, the Long queue, regula
* **qnvidia**, **qmic**, **qfat**, the Dedicated queues
* **qfree**, the Free resource utilization queue
!!! note
Check the queue status at <https://extranet.it4i.cz/anselm/>
......
......@@ -20,11 +20,11 @@ The resources are allocated to the job in a fair-share fashion, subject to const
**The qexp queue is equipped with the nodes not having the very same CPU clock speed.** Should you need the very same CPU speed, you have to select the proper nodes during the PSB job submission.
* **qexp**, the Express queue: This queue is dedicated for testing and running very small jobs. It is not required to specify a project to enter the qexp. There are 2 nodes always reserved for this queue (w/o accelerator), maximum 8 nodes are available via the qexp for a particular user, from a pool of nodes containing Nvidia accelerated nodes (cn181-203), MIC accelerated nodes (cn204-207) and Fat nodes with 512GB RAM (cn208-209). This enables to test and tune also accelerated code or code with higher RAM requirements. The nodes may be allocated on per core basis. No special authorization is required to use it. The maximum runtime in qexp is 1 hour.
* **qprod**, the Production queue: This queue is intended for normal production runs. It is required that active project with nonzero remaining resources is specified to enter the qprod. All nodes may be accessed via the qprod queue, except the reserved ones. 178 nodes without accelerator are included. Full nodes, 16 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprod is 48 hours.
* **qlong**, the Long queue: This queue is intended for long production runs. It is required that active project with nonzero remaining resources is specified to enter the qlong. Only 60 nodes without acceleration may be accessed via the qlong queue. Full nodes, 16 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qlong is 144 hours (three times of the standard qprod time - 3 x 48 h).
* **qnvidia**, qmic, qfat, the Dedicated queues: The queue qnvidia is dedicated to access the Nvidia accelerated nodes, the qmic to access MIC nodes and qfat the Fat nodes. It is required that active project with nonzero remaining resources is specified to enter these queues. 23 nvidia, 4 mic and 2 fat nodes are included. Full nodes, 16 cores per node are allocated. The queues run with very high priority, the jobs will be scheduled before the jobs coming from the qexp queue. An PI needs explicitly ask [support](https://support.it4i.cz/rt/) for authorization to enter the dedicated queues for all users associated to her/his Project.
* **qfree**, The Free resource queue: The queue qfree is intended for utilization of free resources, after a Project exhausted all its allocated computational resources (Does not apply to DD projects by default. DD projects have to request for persmission on qfree after exhaustion of computational resources.). It is required that active project is specified to enter the queue, however no remaining resources are required. Consumed resources will be accounted to the Project. Only 178 nodes without accelerator may be accessed from this queue. Full nodes, 16 cores per node are allocated. The queue runs with very low priority and no special authorization is required to use it. The maximum runtime in qfree is 12 hours.
* **qexp**, the Express queue: This queue is dedicated for testing and running very small jobs. It is not required to specify a project to enter the qexp. There are 2 nodes always reserved for this queue (w/o accelerator), maximum 8 nodes are available via the qexp for a particular user, from a pool of nodes containing Nvidia accelerated nodes (cn181-203), MIC accelerated nodes (cn204-207) and Fat nodes with 512GB RAM (cn208-209). This enables to test and tune also accelerated code or code with higher RAM requirements. The nodes may be allocated on per core basis. No special authorization is required to use it. The maximum runtime in qexp is 1 hour.
* **qprod**, the Production queue: This queue is intended for normal production runs. It is required that active project with nonzero remaining resources is specified to enter the qprod. All nodes may be accessed via the qprod queue, except the reserved ones. 178 nodes without accelerator are included. Full nodes, 16 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprod is 48 hours.
* **qlong**, the Long queue: This queue is intended for long production runs. It is required that active project with nonzero remaining resources is specified to enter the qlong. Only 60 nodes without acceleration may be accessed via the qlong queue. Full nodes, 16 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qlong is 144 hours (three times of the standard qprod time - 3 x 48 h).
* **qnvidia**, qmic, qfat, the Dedicated queues: The queue qnvidia is dedicated to access the Nvidia accelerated nodes, the qmic to access MIC nodes and qfat the Fat nodes. It is required that active project with nonzero remaining resources is specified to enter these queues. 23 nvidia, 4 mic and 2 fat nodes are included. Full nodes, 16 cores per node are allocated. The queues run with very high priority, the jobs will be scheduled before the jobs coming from the qexp queue. An PI needs explicitly ask [support](https://support.it4i.cz/rt/) for authorization to enter the dedicated queues for all users associated to her/his Project.
* **qfree**, The Free resource queue: The queue qfree is intended for utilization of free resources, after a Project exhausted all its allocated computational resources (Does not apply to DD projects by default. DD projects have to request for persmission on qfree after exhaustion of computational resources.). It is required that active project is specified to enter the queue, however no remaining resources are required. Consumed resources will be accounted to the Project. Only 178 nodes without accelerator may be accessed from this queue. Full nodes, 16 cores per node are allocated. The queue runs with very low priority and no special authorization is required to use it. The maximum runtime in qfree is 12 hours.
### Notes
......
......@@ -198,9 +198,9 @@ Now, configure the applications proxy settings to **localhost:6000**. Use port f
## Graphical User Interface
* The [X Window system](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/) is a principal way to get GUI access to the clusters.
* The [Virtual Network Computing](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc/) is a graphical [desktop sharing](http://en.wikipedia.org/wiki/Desktop_sharing) system that uses the [Remote Frame Buffer protocol](http://en.wikipedia.org/wiki/RFB_protocol) to remotely control another [computer](http://en.wikipedia.org/wiki/Computer).
* The [X Window system](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/) is a principal way to get GUI access to the clusters.
* The [Virtual Network Computing](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc/) is a graphical [desktop sharing](http://en.wikipedia.org/wiki/Desktop_sharing) system that uses the [Remote Frame Buffer protocol](http://en.wikipedia.org/wiki/RFB_protocol) to remotely control another [computer](http://en.wikipedia.org/wiki/Computer).
## VPN Access
* Access to IT4Innovations internal resources via [VPN](../get-started-with-it4innovations/accessing-the-clusters/vpn-access/).
* Access to IT4Innovations internal resources via [VPN](../get-started-with-it4innovations/accessing-the-clusters/vpn-access/).
......@@ -12,10 +12,10 @@ NWChem aims to provide its users with computational chemistry tools that are sca
The following versions are currently installed:
* 6.1.1, not recommended, problems have been observed with this version
* 6.3-rev2-patch1, current release with QMD patch applied. Compiled with Intel compilers, MKL and Intel MPI
* 6.3-rev2-patch1-openmpi, same as above, but compiled with OpenMPI and NWChem provided BLAS instead of MKL. This version is expected to be slower
* 6.3-rev2-patch1-venus, this version contains only libraries for VENUS interface linking. Does not provide standalone NWChem executable
* 6.1.1, not recommended, problems have been observed with this version
* 6.3-rev2-patch1, current release with QMD patch applied. Compiled with Intel compilers, MKL and Intel MPI
* 6.3-rev2-patch1-openmpi, same as above, but compiled with OpenMPI and NWChem provided BLAS instead of MKL. This version is expected to be slower
* 6.3-rev2-patch1-venus, this version contains only libraries for VENUS interface linking. Does not provide standalone NWChem executable
For a current list of installed versions, execute:
......@@ -40,5 +40,5 @@ NWChem is compiled for parallel MPI execution. Normal procedure for MPI jobs app
Please refer to [the documentation](http://www.nwchem-sw.org/index.php/Release62:Top-level) and in the input file set the following directives :
* MEMORY : controls the amount of memory NWChem will use
* SCRATCH_DIR : set this to a directory in [SCRATCH file system](../../storage/storage/#scratch) (or run the calculation completely in a scratch directory). For certain calculations, it might be advisable to reduce I/O by forcing "direct" mode, e.g.. "scf direct"
* MEMORY : controls the amount of memory NWChem will use
* SCRATCH_DIR : set this to a directory in [SCRATCH file system](../../storage/storage/#scratch) (or run the calculation completely in a scratch directory). For certain calculations, it might be advisable to reduce I/O by forcing "direct" mode, e.g.. "scf direct"
......@@ -4,11 +4,11 @@
Currently there are several compilers for different programming languages available on the Anselm cluster:
* C/C++
* Fortran 77/90/95
* Unified Parallel C
* Java
* NVIDIA CUDA
* C/C++
* Fortran 77/90/95
* Unified Parallel C
* Java
* NVIDIA CUDA
The C/C++ and Fortran compilers are divided into two main groups GNU and Intel.
......@@ -45,8 +45,8 @@ For more information about the possibilities of the compilers, please see the ma
UPC is supported by two compiler/runtime implementations:
* GNU - SMP/multi-threading support only
* Berkley - multi-node support as well as SMP/multi-threading support
* GNU - SMP/multi-threading support only
* Berkley - multi-node support as well as SMP/multi-threading support
### GNU UPC Compiler
......
......@@ -6,11 +6,11 @@
standard engineering problems COMSOL provides add-on products such as electrical, mechanical, fluid flow, and chemical
applications.
* [Structural Mechanics Module](http://www.comsol.com/structural-mechanics-module),
* [Heat Transfer Module](http://www.comsol.com/heat-transfer-module),
* [CFD Module](http://www.comsol.com/cfd-module),
* [Acoustics Module](http://www.comsol.com/acoustics-module),
* and [many others](http://www.comsol.com/products)
* [Structural Mechanics Module](http://www.comsol.com/structural-mechanics-module),
* [Heat Transfer Module](http://www.comsol.com/heat-transfer-module),
* [CFD Module](http://www.comsol.com/cfd-module),
* [Acoustics Module](http://www.comsol.com/acoustics-module),
* and [many others](http://www.comsol.com/products)
COMSOL also allows an interface support for equation-based modelling of partial differential equations.
......@@ -18,8 +18,8 @@ COMSOL also allows an interface support for equation-based modelling of partial
On the Anselm cluster COMSOL is available in the latest stable version. There are two variants of the release:
* **Non commercial** or so called **EDU variant**, which can be used for research and educational purposes.
* **Commercial** or so called **COM variant**, which can used also for commercial activities. **COM variant** has only subset of features compared to the **EDU variant** available. More about licensing will be posted here soon.
* **Non commercial** or so called **EDU variant**, which can be used for research and educational purposes.
* **Commercial** or so called **COM variant**, which can used also for commercial activities. **COM variant** has only subset of features compared to the **EDU variant** available. More about licensing will be posted here soon.
To load the of COMSOL load the module
......
......@@ -10,13 +10,13 @@ Allinea MAP is a profiler for C/C++/Fortran HPC codes. It is designed for profil
On Anselm users can debug OpenMP or MPI code that runs up to 64 parallel processes. In case of debugging GPU or Xeon Phi accelerated codes the limit is 8 accelerators. These limitation means that:
* 1 user can debug up 64 processes, or
* 32 users can debug 2 processes, etc.
* 1 user can debug up 64 processes, or
* 32 users can debug 2 processes, etc.
In case of debugging on accelerators:
* 1 user can debug on up to 8 accelerators, or
* 8 users can debug on single accelerator.
* 1 user can debug on up to 8 accelerators, or
* 8 users can debug on single accelerator.
## Compiling Code to Run With DDT
......
......@@ -4,9 +4,9 @@
CUBE is a graphical performance report explorer for displaying data from Score-P and Scalasca (and other compatible tools). The name comes from the fact that it displays performance data in a three-dimensions :
* **performance metric**, where a number of metrics are available, such as communication time or cache misses,
* **call path**, which contains the call tree of your program
* **system resource**, which contains system's nodes, processes and threads, depending on the parallel programming model.
* **performance metric**, where a number of metrics are available, such as communication time or cache misses,
* **call path**, which contains the call tree of your program
* **system resource**, which contains system's nodes, processes and threads, depending on the parallel programming model.
Each dimension is organized in a tree, for example the time performance metric is divided into Execution time and Overhead time, call path dimension is organized by files and routines in your source code etc.
......@@ -20,8 +20,8 @@ Each node in the tree is colored by severity (the color scheme is displayed at t
Currently, there are two versions of CUBE 4.2.3 available as [modules](../../environment-and-modules/):
* cube/4.2.3-gcc, compiled with GCC
* cube/4.2.3-icc, compiled with Intel compiler
* cube/4.2.3-gcc, compiled with GCC
* cube/4.2.3-icc, compiled with Intel compiler
## Usage
......
......@@ -4,11 +4,11 @@
Intel VTune Amplifier, part of Intel Parallel studio, is a GUI profiling tool designed for Intel processors. It offers a graphical performance analysis of single core and multithreaded applications. A highlight of the features:
* Hotspot analysis
* Locks and waits analysis
* Low level specific counters, such as branch analysis and memory
* Hotspot analysis
* Locks and waits analysis
* Low level specific counters, such as branch analysis and memory
bandwidth
* Power usage analysis - frequency and sleep states.
* Power usage analysis - frequency and sleep states.
![screenshot](../../../img/vtune-amplifier.png)
......
......@@ -76,15 +76,15 @@ Prints information about the memory architecture of the current CPU.
PAPI provides two kinds of events:
* **Preset events** is a set of predefined common CPU events, standardized across platforms.
* **Native events **is a set of all events supported by the current hardware. This is a larger set of features than preset. For other components than CPU, only native events are usually available.
* **Preset events** is a set of predefined common CPU events, standardized across platforms.
* **Native events **is a set of all events supported by the current hardware. This is a larger set of features than preset. For other components than CPU, only native events are usually available.
To use PAPI in your application, you need to link the appropriate include file.
* papi.h for C
* f77papi.h for Fortran 77
* f90papi.h for Fortran 90
* fpapi.h for Fortran with preprocessor
* papi.h for C
* f77papi.h for Fortran 77
* f90papi.h for Fortran 90
* fpapi.h for Fortran with preprocessor
The include path is automatically added by papi module to $INCLUDE.
......
......@@ -10,8 +10,8 @@ Scalasca supports profiling of MPI, OpenMP and hybrid MPI+OpenMP applications.
There are currently two versions of Scalasca 2.0 [modules](../../environment-and-modules/) installed on Anselm:
* scalasca2/2.0-gcc-openmpi, for usage with [GNU Compiler](../compilers/) and [OpenMPI](../mpi/Running_OpenMPI/),
* scalasca2/2.0-icc-impi, for usage with [Intel Compiler](../compilers.html) and [Intel MPI](../mpi/running-mpich2/).
* scalasca2/2.0-gcc-openmpi, for usage with [GNU Compiler](../compilers/) and [OpenMPI](../mpi/Running_OpenMPI/),
* scalasca2/2.0-icc-impi, for usage with [Intel Compiler](../compilers.html) and [Intel MPI](../mpi/running-mpich2/).
## Usage
......@@ -39,8 +39,8 @@ An example :
Some notable Scalasca options are:
* **-t Enable trace data collection. By default, only summary data are collected.**
* **-e &lt;directory> Specify a directory to save the collected data to. By default, Scalasca saves the data to a directory with prefix scorep\_, followed by name of the executable and launch configuration.**
* **-t Enable trace data collection. By default, only summary data are collected.**
* **-e &lt;directory> Specify a directory to save the collected data to. By default, Scalasca saves the data to a directory with prefix scorep\_, followed by name of the executable and launch configuration.**
!!! note
Scalasca can generate a huge amount of data, especially if tracing is enabled. Please consider saving the data to a [scratch directory](../../storage/storage/).
......
......@@ -10,8 +10,8 @@ Score-P can be used as an instrumentation tool for [Scalasca](scalasca/).
There are currently two versions of Score-P version 1.2.6 [modules](../../environment-and-modules/) installed on Anselm :
* scorep/1.2.3-gcc-openmpi, for usage with [GNU Compiler](../compilers/) and [OpenMPI](../mpi/Running_OpenMPI/)
* scorep/1.2.3-icc-impi, for usage with [Intel Compiler](../compilers.html)> and [Intel MPI](../mpi/running-mpich2/)>.
* scorep/1.2.3-gcc-openmpi, for usage with [GNU Compiler](../compilers/) and [OpenMPI](../mpi/Running_OpenMPI/)
* scorep/1.2.3-icc-impi, for usage with [Intel Compiler](../compilers.html)> and [Intel MPI](../mpi/running-mpich2/)>.
## Instrumentation
......
......@@ -10,19 +10,19 @@ Valgind is an extremely useful tool for debugging memory errors such as [off-by-
The main tools available in Valgrind are :
* **Memcheck**, the original, must used and default tool. Verifies memory access in you program and can detect use of unitialized memory, out of bounds memory access, memory leaks, double free, etc.
* **Massif**, a heap profiler.
* **Hellgrind** and **DRD** can detect race conditions in multi-threaded applications.
* **Cachegrind**, a cache profiler.
* **Callgrind**, a callgraph analyzer.
* For a full list and detailed documentation, please refer to the [official Valgrind documentation](http://valgrind.org/docs/).
* **Memcheck**, the original, must used and default tool. Verifies memory access in you program and can detect use of unitialized memory, out of bounds memory access, memory leaks, double free, etc.
* **Massif**, a heap profiler.
* **Hellgrind** and **DRD** can detect race conditions in multi-threaded applications.
* **Cachegrind**, a cache profiler.
* **Callgrind**, a callgraph analyzer.
* For a full list and detailed documentation, please refer to the [official Valgrind documentation](http://valgrind.org/docs/).
## Installed Versions
There are two versions of Valgrind available on Anselm.
* Version 3.6.0, installed by operating system vendor in /usr/bin/valgrind. This version is available by default, without the need to load any module. This version however does not provide additional MPI support.
* Version 3.9.0 with support for Intel MPI, available in [module](../../environment-and-modules/) valgrind/3.9.0-impi. After loading the module, this version replaces the default valgrind.
* Version 3.6.0, installed by operating system vendor in /usr/bin/valgrind. This version is available by default, without the need to load any module. This version however does not provide additional MPI support.
* Version 3.9.0 with support for Intel MPI, available in [module](../../environment-and-modules/) valgrind/3.9.0-impi. After loading the module, this version replaces the default valgrind.
## Usage
......
......@@ -32,5 +32,5 @@ Read more at <http://software.intel.com/sites/products/documentation/doclib/stdx
Anselm nodes are currently equipped with Sandy Bridge CPUs, while Salomon will use Haswell architecture. >The new processors are backward compatible with the Sandy Bridge nodes, so all programs that ran on the Sandy Bridge processors, should also run on the new Haswell nodes. >To get optimal performance out of the Haswell processors a program should make use of the special AVX2 instructions for this processor. One can do this by recompiling codes with the compiler flags >designated to invoke these instructions. For the Intel compiler suite, there are two ways of doing this:
* Using compiler flag (both for Fortran and C): -xCORE-AVX2. This will create a binary with AVX2 instructions, specifically for the Haswell processors. Note that the executable will not run on Sandy Bridge nodes.
* Using compiler flags (both for Fortran and C): -xAVX -axCORE-AVX2. This will generate multiple, feature specific auto-dispatch code paths for Intel® processors, if there is a performance benefit. So this binary will run both on Sandy Bridge and Haswell processors. During runtime it will be decided which path to follow, dependent on which processor you are running on. In general this will result in larger binaries.
* Using compiler flag (both for Fortran and C): -xCORE-AVX2. This will create a binary with AVX2 instructions, specifically for the Haswell processors. Note that the executable will not run on Sandy Bridge nodes.
* Using compiler flags (both for Fortran and C): -xAVX -axCORE-AVX2. This will generate multiple, feature specific auto-dispatch code paths for Intel® processors, if there is a performance benefit. So this binary will run both on Sandy Bridge and Haswell processors. During runtime it will be decided which path to follow, dependent on which processor you are running on. In general this will result in larger binaries.
......@@ -4,14 +4,14 @@
Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, extensively threaded and optimized for maximum performance. Intel MKL provides these basic math kernels:
* BLAS (level 1, 2, and 3) and LAPACK linear algebra routines, offering vector, vector-matrix, and matrix-matrix operations.
* The PARDISO direct sparse solver, an iterative sparse solver, and supporting sparse BLAS (level 1, 2, and 3) routines for solving sparse systems of equations.
* ScaLAPACK distributed processing linear algebra routines for Linux and Windows operating systems, as well as the Basic Linear Algebra Communications Subprograms (BLACS) and the Parallel Basic Linear Algebra Subprograms (PBLAS).
* Fast Fourier transform (FFT) functions in one, two, or three dimensions with support for mixed radices (not limited to sizes that are powers of 2), as well as distributed versions of these functions.
* Vector Math Library (VML) routines for optimized mathematical operations on vectors.
* Vector Statistical Library (VSL) routines, which offer high-performance vectorized random number generators (RNG) for several probability distributions, convolution and correlation routines, and summary statistics functions.
* Data Fitting Library, which provides capabilities for spline-based approximation of functions, derivatives and integrals of functions, and search.
* Extended Eigensolver, a shared memory version of an eigensolver based on the Feast Eigenvalue Solver.
* BLAS (level 1, 2, and 3) and LAPACK linear algebra routines, offering vector, vector-matrix, and matrix-matrix operations.
* The PARDISO direct sparse solver, an iterative sparse solver, and supporting sparse BLAS (level 1, 2, and 3) routines for solving sparse systems of equations.
* ScaLAPACK distributed processing linear algebra routines for Linux and Windows operating systems, as well as the Basic Linear Algebra Communications Subprograms (BLACS) and the Parallel Basic Linear Algebra Subprograms (PBLAS).
* Fast Fourier transform (FFT) functions in one, two, or three dimensions with support for mixed radices (not limited to sizes that are powers of 2), as well as distributed versions of these functions.
* Vector Math Library (VML) routines for optimized mathematical operations on vectors.
* Vector Statistical Library (VSL) routines, which offer high-performance vectorized random number generators (RNG) for several probability distributions, convolution and correlation routines, and summary statistics functions.
* Data Fitting Library, which provides capabilities for spline-based approximation of functions, derivatives and integrals of functions, and search.
* Extended Eigensolver, a shared memory version of an eigensolver based on the Feast Eigenvalue Solver.
For details see the [Intel MKL Reference Manual](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mklman/index.htm).
......
......@@ -61,21 +61,21 @@ The general format of the name is `feature__APP__FEATURE`.
Names of applications (APP):
* ansys
* comsol
* comsol-edu
* matlab
* matlab-edu
* ansys
* comsol
* comsol-edu
* matlab
* matlab-edu
To get the FEATUREs of a license take a look into the corresponding state file ([see above](isv_licenses/#Licence)), or use:
**Application and List of provided features**
* **ansys** $ grep -v "#" /apps/user/licenses/ansys_features_state.txt | cut -f1 -d' '
* **comsol** $ grep -v "#" /apps/user/licenses/comsol_features_state.txt | cut -f1 -d' '
* **comsol-ed** $ grep -v "#" /apps/user/licenses/comsol-edu_features_state.txt | cut -f1 -d' '
* **matlab** $ grep -v "#" /apps/user/licenses/matlab_features_state.txt | cut -f1 -d' '
* **matlab-edu** $ grep -v "#" /apps/user/licenses/matlab-edu_features_state.txt | cut -f1 -d' '
* **ansys** $ grep -v "#" /apps/user/licenses/ansys_features_state.txt | cut -f1 -d' '
* **comsol** $ grep -v "#" /apps/user/licenses/comsol_features_state.txt | cut -f1 -d' '
* **comsol-ed** $ grep -v "#" /apps/user/licenses/comsol-edu_features_state.txt | cut -f1 -d' '
* **matlab** $ grep -v "#" /apps/user/licenses/matlab_features_state.txt | cut -f1 -d' '
* **matlab-edu** $ grep -v "#" /apps/user/licenses/matlab-edu_features_state.txt | cut -f1 -d' '
Example of PBS Pro resource name, based on APP and FEATURE name:
......
......@@ -6,11 +6,11 @@ Running virtual machines on compute nodes
There are situations when Anselm's environment is not suitable for user needs.
* Application requires different operating system (e.g Windows), application is not available for Linux
* Application requires different versions of base system libraries and tools
* Application requires specific setup (installation, configuration) of complex software stack
* Application requires privileged access to operating system
* ... and combinations of above cases
* Application requires different operating system (e.g Windows), application is not available for Linux
* Application requires different versions of base system libraries and tools
* Application requires specific setup (installation, configuration) of complex software stack
* Application requires privileged access to operating system
* ... and combinations of above cases
We offer solution for these cases - **virtualization**. Anselm's environment gives the possibility to run virtual machines on compute nodes. Users can create their own images of operating system with specific software stack and run instances of these images as virtual machines on compute nodes. Run of virtual machines is provided by standard mechanism of [Resource Allocation and Job Execution](../../resource-allocation-and-job-execution/introduction/).
......@@ -65,13 +65,13 @@ You can either use your existing image or create new image from scratch.
QEMU currently supports these image types or formats:
* raw
* cloop
* cow
* qcow
* qcow2
* vmdk - VMware 3 & 4, or 6 image format, for exchanging images with that product
* vdi - VirtualBox 1.1 compatible image format, for exchanging images with VirtualBox.
* raw
* cloop
* cow
* qcow
* qcow2
* vmdk - VMware 3 & 4, or 6 image format, for exchanging images with that product
* vdi - VirtualBox 1.1 compatible image format, for exchanging images with VirtualBox.
You can convert your existing image using qemu-img convert command. Supported formats of this command are: blkdebug blkverify bochs cloop cow dmg file ftp ftps host_cdrom host_device host_floppy http https nbd parallels qcow qcow2 qed raw sheepdog tftp vdi vhdx vmdk vpc vvfat.
......@@ -97,10 +97,10 @@ Your image should run some kind of operating system startup script. Startup scri
We recommend, that startup script
* maps Job Directory from host (from compute node)
* runs script (we call it "run script") from Job Directory and waits for application's exit
* maps Job Directory from host (from compute node)
* runs script (we call it "run script") from Job Directory and waits for application's exit
* for management purposes if run script does not exist wait for some time period (few minutes)
* shutdowns/quits OS
* shutdowns/quits OS
For Windows operating systems we suggest using Local Group Policy Startup script, for Linux operating systems rc.local, runlevel init script or similar service.
......@@ -338,9 +338,9 @@ Interface tap0 has IP address 192.168.1.1 and network mask 255.255.255.0 (/24).
Redirected ports:
* DNS udp/53->udp/3053, tcp/53->tcp3053
* DHCP udp/67->udp3067
* SMB tcp/139->tcp3139, tcp/445->tcp3445).
* DNS udp/53->udp/3053, tcp/53->tcp3053
* DHCP udp/67->udp3067
* SMB tcp/139->tcp3139, tcp/445->tcp3445).
You can configure IP address of virtual machine statically or dynamically. For dynamic addressing provide your DHCP server on port 3067 of tap0 interface, you can also provide your DNS server on port 3053 of tap0 interface for example:
......
......@@ -4,8 +4,8 @@
Matlab is available in versions R2015a and R2015b. There are always two variants of the release:
* Non commercial or so called EDU variant, which can be used for common research and educational purposes.
* Commercial or so called COM variant, which can used also for commercial activities. The licenses for commercial variant are much more expensive, so usually the commercial variant has only subset of features compared to the EDU available.
* Non commercial or so called EDU variant, which can be used for common research and educational purposes.
* Commercial or so called COM variant, which can used also for commercial activities. The licenses for commercial variant are much more expensive, so usually the commercial variant has only subset of features compared to the EDU available.
To load the latest version of Matlab load the module
......
......@@ -7,8 +7,8 @@
Matlab is available in the latest stable version. There are always two variants of the release:
* Non commercial or so called EDU variant, which can be used for common research and educational purposes.
* Commercial or so called COM variant, which can used also for commercial activities. The licenses for commercial variant are much more expensive, so usually the commercial variant has only subset of features compared to the EDU available.
* Non commercial or so called EDU variant, which can be used for common research and educational purposes.
* Commercial or so called COM variant, which can used also for commercial activities. The licenses for commercial variant are much more expensive, so usually the commercial variant has only subset of features compared to the EDU available.
To load the latest version of Matlab load the module
......
......@@ -90,8 +90,8 @@ In this example, the calculation was automatically divided among the CPU cores a
A version of [native](../intel-xeon-phi/#section-4) Octave is compiled for Xeon Phi accelerators. Some limitations apply for this version:
* Only command line support. GUI, graph plotting etc. is not supported.
* Command history in interactive mode is not supported.
* Only command line support. GUI, graph plotting etc. is not supported.
* Command history in interactive mode is not supported.
Octave is linked with parallel Intel MKL, so it best suited for batch processing of tasks that utilize BLAS, LAPACK and FFT operations. By default, number of threads is set to 120, you can control this with > OMP_NUM_THREADS environment
variable.
......
......@@ -8,11 +8,11 @@ PETSc (Portable, Extensible Toolkit for Scientific Computation) is a suite of bu
## Resources
* [project webpage](http://www.mcs.anl.gov/petsc/)
* [documentation](http://www.mcs.anl.gov/petsc/documentation/)
* [project webpage](http://www.mcs.anl.gov/petsc/)
* [documentation](http://www.mcs.anl.gov/petsc/documentation/)
* [PETSc Users Manual (PDF)](http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf)
* [index of all manual pages](http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/singleindex.html)
* PRACE Video Tutorial [part1](http://www.youtube.com/watch?v=asVaFg1NDqY), [part2](http://www.youtube.com/watch?v=ubp_cSibb9I), [part3](http://www.youtube.com/watch?v=vJAAAQv-aaw), [part4](http://www.youtube.com/watch?v=BKVlqWNh8jY), [part5](http://www.youtube.com/watch?v=iXkbLEBFjlM)
* PRACE Video Tutorial [part1](http://www.youtube.com/watch?v=asVaFg1NDqY), [part2](http://www.youtube.com/watch?v=ubp_cSibb9I), [part3](http://www.youtube.com/watch?v=vJAAAQv-aaw), [part4](http://www.youtube.com/watch?v=BKVlqWNh8jY), [part5](http://www.youtube.com/watch?v=iXkbLEBFjlM)
## Modules
......@@ -36,25 +36,25 @@ All these libraries can be used also alone, without PETSc. Their static or share
### Libraries Linked to PETSc on Anselm (As of 11 April 2015)
* dense linear algebra
* dense linear algebra
* [Elemental](http://libelemental.org/)
* sparse linear system solvers
* sparse linear system solvers
* [Intel MKL Pardiso](https://software.intel.com/en-us/node/470282)
* [MUMPS](http://mumps.enseeiht.fr/)
* [PaStiX](http://pastix.gforge.inria.fr/)
* [SuiteSparse](http://faculty.cse.tamu.edu/davis/suitesparse.html)
* [SuperLU](http://crd.lbl.gov/~xiaoye/SuperLU/#superlu)