Commit b4474c1c authored by Lukáš Krupčík

remove tab

parent 35f4dca9
Pipeline #1983 passed with stages
in 1 minute and 7 seconds
......@@ -7,7 +7,7 @@ In many cases, it is useful to submit huge (>100+) number of computational jobs
However, executing huge number of jobs via the PBS queue may strain the system. This strain may result in slow response to commands, inefficient scheduling and overall degradation of performance and user experience, for all users. For this reason, the number of jobs is **limited to 100 per user, 1000 per job array**
!!! Note
Please follow one of the procedures below, in case you wish to schedule more than 100 jobs at a time.
- Use [Job arrays](capacity-computing/#job-arrays) when running huge number of [multithread](capacity-computing/#shared-jobscript-on-one-node) (bound to one node only) or multinode (multithread across several nodes) jobs
- Use [GNU parallel](capacity-computing/#gnu-parallel) when running single core jobs
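As a hedged illustration of the job-array route listed above, a PBS Pro submission might look like the following sketch; the jobscript name, task range and resource string are illustrative and not taken from this commit.

```bash
# Submit a job array of 900 subjobs (within the 1000-per-array limit), one full
# node per subjob; inside the jobscript each subjob reads $PBS_ARRAY_INDEX to
# pick its own piece of work.
$ qsub -N JOBNAME -J 1-900 -q qprod -l select=1:ncpus=16,walltime=04:00:00 jobscript
```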
......@@ -21,7 +21,7 @@ However, executing huge number of jobs via the PBS queue may strain the system.
## Job Arrays
!!! Note
A huge number of jobs may be easily submitted and managed as a job array.
A job array is a compact representation of many jobs, called subjobs. The subjobs share the same job script, and have the same values for all attributes and resources, with the following exceptions:
......@@ -150,7 +150,7 @@ Read more on job arrays in the [PBSPro Users guide](../../pbspro-documentation/)
## GNU Parallel
!!! Note
Use GNU parallel to run many single core tasks on one node.
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. GNU parallel is most useful in running single core jobs via the queue system on Anselm.
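A minimal sketch of the GNU parallel pattern described above, assuming a module named parallel and a task list file; the program name is hypothetical.

```bash
$ module load parallel
# Run one single-core task per line of tasklist, keeping 16 tasks running at a time.
$ parallel -j 16 ./myprog.x {} :::: tasklist
```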
......
......@@ -24,14 +24,14 @@ fi
```
!!! Note
Do not run commands outputting to standard output (echo, module list, etc.) in .bashrc for non-interactive SSH sessions. It breaks fundamental functionality (scp, PBS) of your account! Consider utilization of SSH session interactivity for such commands as stated in the previous example.
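A hedged sketch of the interactivity guard the note refers to; the exact test used in the (elided) example above may differ.

```bash
# ~/.bashrc sketch: print output and list modules only in interactive shells,
# so that non-interactive SSH sessions (scp, PBS) keep working.
if [ -n "$PS1" ]; then
    echo "Welcome on Anselm"
    module list
fi
```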
### Application Modules
In order to configure your shell for running a particular application on Anselm, we use the Module package interface.
!!! Note
The modules set up the application paths, library paths and environment variables for running a particular application.
We also have a second modules repository. This modules repository is created using a tool called EasyBuild. On the Salomon cluster, all modules will be built by this tool. If you want to use software from this modules repository, please follow the instructions in the section [Application Modules Path Expansion](environment-and-modules/#EasyBuild).
......
......@@ -36,7 +36,7 @@ Usage counts allocated core-hours (`ncpus x walltime`). Usage is decayed, or cut
Jobs queued in queue qexp are not calculated to project's usage.
!!! Note
Calculated usage and fair-share priority can be seen at <https://extranet.it4i.cz/anselm/projects>.
Calculated fair-share priority can be also seen as Resource_List.fairshare attribute of a job.
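For instance, the attribute can be inspected with qstat (the job ID is hypothetical):

```bash
# Show the fair-share priority recorded on a queued or running job.
$ qstat -f 12345.srv11 | grep Resource_List.fairshare
```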
......@@ -65,6 +65,6 @@ The scheduler makes a list of jobs to run in order of execution priority. Schedu
It means, that jobs with lower execution priority can be run before jobs with higher execution priority.
!!! Note
It is **very beneficial to specify the walltime** when submitting jobs.
Specifying more accurate walltime enables better scheduling, better execution times and better resource usage. Jobs with suitable (small) walltime could be backfilled - and overtake job(s) with higher priority.
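For example, a tight walltime estimate is passed at submission time (project, queue and values are illustrative):

```bash
# A short, realistic walltime makes the job a good candidate for backfilling.
$ qsub -A PROJECT_ID -q qprod -l select=4:ncpus=16,walltime=00:30:00 ./myjob.sh
```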
......@@ -9,7 +9,7 @@ All compute and login nodes of Anselm are interconnected by a high-bandwidth, lo
The compute nodes may be accessed via the InfiniBand network using ib0 network interface, in address range 10.2.1.1-209. The MPI may be used to establish native InfiniBand connection among the nodes.
!!! Note
The network provides **2170 MB/s** transfer rates via the TCP connection (single stream) and up to **3600 MB/s** via native InfiniBand protocol.
The Fat tree topology ensures that peak transfer rates are achieved between any two nodes, independent of network traffic exchanged among other nodes concurrently.
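As a hedged example, a compute node can reach another node's ib0 address in the 10.2.1.x range directly (the target address and file names are hypothetical):

```bash
$ ip addr show ib0    # check this node's InfiniBand (ib0) address
# Copy a scratch file to another compute node over the InfiniBand network.
$ scp /lscratch/$PBS_JOBID/data.out 10.2.1.110:/lscratch/$PBS_JOBID/
```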
......
......@@ -13,14 +13,14 @@ The resources are allocated to the job in a fair-share fashion, subject to const
- **qfree**, the Free resource utilization queue
!!! Note
Check the queue status at <https://extranet.it4i.cz/anselm/>
Read more on the [Resource Allocation Policy](resources-allocation-policy/) page.
## Job Submission and Execution
!!! Note
Use the **qsub** command to submit your jobs.
The qsub submits the job into the queue. The qsub command creates a request to the PBS Job manager for allocation of the specified resources. The **smallest allocation unit is an entire node, 16 cores**, with the exception of the qexp queue. The resources will be allocated when available, subject to allocation policies and constraints. **After the resources are allocated, the jobscript or interactive shell is executed on the first of the allocated nodes.**
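For example, an interactive allocation of two full nodes might look like this sketch (project ID and queue are placeholders):

```bash
# Request 2 entire nodes (2 x 16 cores) interactively; the shell opens on the
# first allocated node once the resources become available.
$ qsub -A PROJECT_ID -q qexp -l select=2:ncpus=16 -I
```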
......@@ -29,7 +29,7 @@ Read more on the [Job submission and execution](job-submission-and-execution/) p
## Capacity Computing
!!! Note
Use Job arrays when running a huge number of jobs.
Use GNU Parallel and/or Job arrays when running (many) single core jobs.
......
......@@ -33,7 +33,7 @@ Compilation parameters are default:
Molpro is compiled for parallel execution using MPI and OpenMP. By default, Molpro reads the number of allocated nodes from PBS and launches a data server on one node. On the remaining allocated nodes, compute processes are launched, one process per node, each with 16 threads. You can modify this behavior by using -n, -t and helper-server options. Please refer to the [Molpro documentation](http://www.molpro.net/info/2010.1/doc/manual/node9.html) for more details.
!!! Note
The OpenMP parallelization in Molpro is limited and has been observed to produce limited scaling. We therefore recommend using MPI parallelization only. This can be achieved by passing the option mpiprocs=16:ompthreads=1 to PBS.
You are advised to use the -d option to point to a directory in [SCRATCH file system](../../storage/storage/). Molpro can produce a large amount of temporary data during its run, and it is important that these are placed in the fast scratch file system.
......
......@@ -24,13 +24,13 @@ On the Anselm cluster COMSOL is available in the latest stable version. There ar
To load COMSOL, load the module
```bash
$ module load comsol
```
By default the **EDU variant** will be loaded. If a user needs a different version or variant, load that particular version. To obtain the list of available versions, use
```bash
$ module avail comsol
```
If a user needs to prepare COMSOL jobs in interactive mode, it is recommended to use COMSOL on the compute nodes via the PBS Pro scheduler. In order to run the COMSOL Desktop GUI on Windows, it is recommended to use Virtual Network Computing (VNC).
......
......@@ -21,7 +21,7 @@ The module sets up environment variables, required for using the Allinea Perform
## Usage
!!! Note
Use the perf-report wrapper on your (MPI) program.
Instead of [running your MPI program the usual way](../mpi/), use the perf-report wrapper:
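A hedged sketch of such an invocation (binary name and rank count are hypothetical); the elided example in this file may differ in detail:

```bash
# Wrap the usual MPI launch with perf-report; a .txt and a .html performance
# summary are written next to the binary when the run finishes.
$ perf-report mpirun -n 64 ./mympiprog.x
```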
......
......@@ -28,7 +28,7 @@ Currently, there are two versions of CUBE 4.2.3 available as [modules](../../env
CUBE is a graphical application. Refer to Graphical User Interface documentation for a list of methods to launch graphical applications on Anselm.
!!! Note
Analyzing large data sets can consume a large amount of CPU and RAM. Do not perform large analyses on login nodes.
After loading the appropriate module, simply launch the cube command, or alternatively use the scalasca -examine command to launch the GUI. Note that for Scalasca datasets, if you do not analyze the data with scalasca -examine before opening them with CUBE, not all performance data will be available.
......
......@@ -193,7 +193,7 @@ Can be used as a sensor for ksysguard GUI, which is currently not installed on A
In a similar fashion to PAPI, PCM provides a C++ API to access the performance counter from within your application. Refer to the [Doxygen documentation](http://intel-pcm-api-documentation.github.io/classPCM.html) for details of the API.
!!! Note
Due to security limitations, using PCM API to monitor your applications is currently not possible on Anselm. (The application must be run as root user)
Sample program using the API :
......
......@@ -27,7 +27,7 @@ and launch the GUI :
```
!!! Note
To profile an application with VTune Amplifier, special kernel modules need to be loaded. The modules are not loaded on Anselm login nodes, thus direct profiling on login nodes is not possible. Use VTune on compute nodes and refer to the documentation on using GUI applications.
The GUI will open in a new window. Click on "_New Project..._" to create a new project. After clicking _OK_, a new window with project properties will appear. At "_Application:_", select the path to the binary you want to profile (the binary should be compiled with the -g flag). Some additional options such as command line arguments can be selected. At "_Managed code profiling mode:_" select "_Native_" (unless you want to profile managed mode .NET/Mono applications). After clicking _OK_, your project is created.
......@@ -48,7 +48,7 @@ Copy the line to clipboard and then you can paste it in your jobscript or in com
## Xeon Phi
!!! Note
This section is outdated. It will be updated with new information soon.
It is possible to analyze both native and offload Xeon Phi applications. For offload mode, just specify the path to the binary. For native mode, you need to specify in project properties:
......@@ -59,7 +59,7 @@ Application parameters: mic0 source ~/.profile && /path/to/your/bin
Note that we include source ~/.profile in the command to setup environment paths [as described here](../intel-xeon-phi/).
!!! Note
If the analysis is interrupted or aborted, further analysis on the card might be impossible and you will get errors like "ERROR connecting to MIC card". In this case please contact our support to reboot the MIC card.
You may also use remote analysis to collect data from the MIC and then analyze it in the GUI later :
......
......@@ -191,7 +191,7 @@ Now the compiler won't remove the multiplication loop. (However it is still not
### Intel Xeon Phi
!!! Note
PAPI currently supports only a subset of counters on the Intel Xeon Phi processor compared to Intel Xeon, for example the floating point operations counter is missing.
To use PAPI in [Intel Xeon Phi](../intel-xeon-phi/) native applications, you need to load the module with the "-mic" suffix, for example "papi/5.3.2-mic":
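A hedged sketch of building a native Xeon Phi binary against PAPI; the source file name is hypothetical and the compile flags are assumptions, not taken from the elided example:

```bash
$ module load papi/5.3.2-mic
# Build a native MIC binary and link the PAPI library provided by the module.
$ icc -mmic -o matrix-mic matrix.c -lpapi
```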
......
......@@ -43,7 +43,7 @@ Some notable Scalasca options are:
- **-e &lt;directory> Specify a directory to save the collected data to. By default, Scalasca saves the data to a directory with prefix scorep\_, followed by name of the executable and launch configuration.**
!!! Note
Scalasca can generate a huge amount of data, especially if tracing is enabled. Please consider saving the data to a [scratch directory](../../storage/storage/).
### Analysis of Reports
......
......@@ -121,7 +121,7 @@ The source code of this function can be also found in
```
!!! Note
You can also add only the following line to your ~/.tvdrc file instead of the entire function:
**source /apps/mpi/openmpi/intel/1.6.5/etc/openmpi-totalview.tcl**
You need to do this step only once.
......
......@@ -5,7 +5,7 @@
Intel Integrated Performance Primitives, version 7.1.1, compiled for AVX vector instructions is available, via module ipp. The IPP is a very rich library of highly optimized algorithmic building blocks for media and data applications. This includes signal, image and frame processing algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax, as well as cryptographic functions, linear algebra functions and many more.
!!! Note
Check out IPP before implementing your own math functions for data processing; it is likely already there.
```bash
$ module load ipp
......
......@@ -24,7 +24,7 @@ Intel MKL version 13.5.192 is available on Anselm
The module sets up environment variables, required for linking and running mkl enabled applications. The most important variables are the $MKLROOT, $MKL_INC_DIR, $MKL_LIB_DIR and $MKL_EXAMPLES
!!! Note
The MKL library may be linked using any compiler. With the Intel compiler, use the -mkl option to link the default threaded MKL.
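For example, a hedged link line with the Intel compiler (source file name hypothetical, module names assumed):

```bash
$ module load intel mkl
# -mkl links the threaded MKL by default; -mkl=sequential would select the serial variant.
$ icc -mkl myprog.c -o myprog.x
```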
### Interfaces
......@@ -48,7 +48,7 @@ You will need the mkl module loaded to run the mkl enabled executable. This may
### Threading
!!! Note
An advantage of using the MKL library is that it brings threaded parallelization to applications that are otherwise not parallel.
For this to work, the application must link the threaded MKL library (default). Number and behaviour of MKL threads may be controlled via the OpenMP environment variables, such as OMP_NUM_THREADS and KMP_AFFINITY. MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS
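A hedged sketch of controlling MKL threading through the environment (values and binary name are illustrative):

```bash
# Run the threaded-MKL binary with 16 threads pinned to cores.
$ export OMP_NUM_THREADS=16
$ export KMP_AFFINITY=granularity=fine,compact,1,0
$ ./myprog.x
```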
......
......@@ -14,7 +14,7 @@ Intel TBB version 4.1 is available on Anselm
The module sets up environment variables, required for linking and running tbb enabled applications.
!!! Note
Link the tbb library, using -ltbb
## Examples
......
......@@ -233,7 +233,7 @@ During the compilation Intel compiler shows which loops have been vectorized in
Some interesting compiler flags useful not only for code debugging are:
!!! Note
Debugging
openmp_report[0|1|2] - controls the OpenMP parallelizer diagnostic level
vec-report[0|1|2] - controls the compiler-based vectorization diagnostic level
......@@ -421,7 +421,7 @@ If the code is parallelized using OpenMP a set of additional libraries is requir
For your information the list of libraries and their location required for execution of an OpenMP parallel code on Intel Xeon Phi is:
!!! Note
/apps/intel/composer_xe_2013.5.192/compiler/lib/mic
- libiomp5.so
- libimf.so
......@@ -502,7 +502,7 @@ After executing the complied binary file, following output should be displayed.
```
!!! Note
More information about this example can be found on the Intel website: <http://software.intel.com/en-us/vcsource/samples/caps-basic/>
The second example, which can be found in the "/apps/intel/opencl-examples" directory, is General Matrix Multiply. You can follow the same procedure to download the example to your directory and compile it.
......@@ -604,7 +604,7 @@ An example of basic MPI version of "hello-world" example in C language, that can
Intel MPI for the Xeon Phi coprocessors offers different MPI programming models:
!!! Note
**Host-only model** - all MPI ranks reside on the host. The coprocessors can be used by using offload pragmas. (Using MPI calls inside offloaded code is not supported.)
**Coprocessor-only model** - all MPI ranks reside only on the coprocessors.
......@@ -873,7 +873,7 @@ To run the MPI code using mpirun and the machine file "hosts_file_mix" use:
A possible output of the MPI "hello-world" example executed on two hosts and two accelerators is:
```bash
Hello world from process 0 of 8 on host cn204
Hello world from process 1 of 8 on host cn204
Hello world from process 2 of 8 on host cn204-mic0
Hello world from process 3 of 8 on host cn204-mic0
......@@ -891,7 +891,7 @@ A possible output of the MPI "hello-world" example executed on two hosts and two
PBS also generates a set of node-files that can be used instead of manually creating a new one every time. Three node-files are generated:
!!! Note
**Host only node-file:**
- /lscratch/${PBS_JOBID}/nodefile-cn
**MIC only node-file:**
- /lscratch/${PBS_JOBID}/nodefile-mic
**Host and MIC node-file:**
......
......@@ -11,7 +11,7 @@ If an ISV application was purchased for educational (research) purposes and also
## Overview of the Licenses Usage
!!! Note
The overview is generated every minute and is accessible from web or command line interface.
### Web Interface
......
......@@ -27,7 +27,7 @@ Virtualization has also some drawbacks, it is not so easy to setup efficient sol
Solution described in chapter [HOWTO](virtualization/#howto) is suitable for single node tasks, does not introduce virtual machine clustering.
!!! Note
Please consider virtualization as a last resort solution for your needs.
!!! Warning
Please consult use of virtualization with IT4Innovation's support.
......@@ -39,7 +39,7 @@ For running Windows application (when source code and Linux native application a
IT4Innovations does not provide any licenses for operating systems and software of virtual machines. Users are ( in accordance with [Acceptable use policy document](http://www.it4i.cz/acceptable-use-policy.pdf)) fully responsible for licensing all software running in virtual machines on Anselm. Be aware of complex conditions of licensing software in virtual environments.
!!! Note
Users are responsible for licensing the OS (e.g. MS Windows) and all software running in their virtual machines.
## Howto
......@@ -249,7 +249,7 @@ Run virtual machine using optimized devices, user network back-end with sharing
Thanks to port forwarding you can access the virtual machine via SSH (Linux) or RDP (Windows) by connecting to the IP address of the compute node (and port 2222 for SSH). You must use the VPN network.
!!! Note
Keep in mind that if you use virtio devices, you must have virtio drivers installed on your virtual machine.
### Networking and Data Sharing
......
......@@ -7,7 +7,7 @@ The OpenMPI programs may be executed only via the PBS Workload manager, by enter
### Basic Usage
!!! Note
Use the mpiexec to run the OpenMPI code.
Example:
......@@ -28,7 +28,7 @@ Example:
```
!!! Note
Please be aware that in this example, the directive **-pernode** is used to run only **one task per node**, which is normally an unwanted behaviour (unless you want to run hybrid code with just one MPI and 16 OpenMP tasks per node). In normal MPI programs, **omit the -pernode directive** to run up to 16 MPI tasks on each node.
In this example, we allocate 4 nodes via the express queue interactively. We set up the openmpi environment and interactively run the helloworld_mpi.x program. Note that the executable helloworld_mpi.x must be available within the
same path on all nodes. This is automatically fulfilled on the /home and /scratch filesystem.
......@@ -49,7 +49,7 @@ You need to preload the executable, if running on the local scratch /lscratch fi
In this example, we assume the executable helloworld_mpi.x is present on compute node cn17 on local scratch. We call mpiexec with the **--preload-binary** argument (valid for OpenMPI). The mpiexec will copy the executable from cn17 to the /lscratch/15210.srv11 directory on cn108, cn109 and cn110 and execute the program.
!!! Note
MPI process mapping may be controlled by PBS parameters.
The mpiprocs and ompthreads parameters allow for selection of number of running MPI processes per node as well as number of OpenMP threads per MPI process.
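For example (a hedged sketch; the queue and node count are illustrative):

```bash
# 4 nodes, 1 MPI process per node, 16 OpenMP threads per process.
$ qsub -q qexp -l select=4:ncpus=16:mpiprocs=1:ompthreads=16 -I
```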
......@@ -98,7 +98,7 @@ In this example, we demonstrate recommended way to run an MPI application, using
### OpenMP Thread Affinity
!!! Note
Important! Bind every OpenMP thread to a core!
In the previous two examples with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You might want to avoid this by setting these environment variable for GCC OpenMP:
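A hedged sketch of the GCC OpenMP binding mentioned above (the binary name is hypothetical; the elided example may use different values):

```bash
# Pin GCC OpenMP threads to cores 0-15 before launching the application.
$ export GOMP_CPU_AFFINITY="0-15"
$ mpiexec ./mymultithreaded_mpi.x
```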
......@@ -153,7 +153,7 @@ In this example, we see that ranks have been mapped on nodes according to the or
Exact control of MPI process placement and resource binding is provided by specifying a rankfile
!!! Note
Appropriate binding may boost performance of your application.
Example rankfile
......
......@@ -61,7 +61,7 @@ In this example, the openmpi 1.6.5 using intel compilers is activated
## Compiling MPI Programs
!!! Note
After setting up your MPI environment, compile your program using one of the mpi wrappers
```bash
$ mpicc -v
......@@ -108,7 +108,7 @@ Compile the above example with
## Running MPI Programs
!!! Note
The MPI program executable must be compatible with the loaded MPI module.
Always compile and execute using the very same MPI module.
It is strongly discouraged to mix MPI implementations. Linking an application with one MPI implementation and running mpirun/mpiexec from another implementation may result in unexpected errors.
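A hedged sketch of keeping compilation and execution under the very same module; the module name is an assumption based on the OpenMPI/Intel 1.6.5 installation mentioned elsewhere in this commit:

```bash
# (Inside a PBS allocation) load one MPI implementation, compile with its
# wrapper, and run with its own mpiexec.
$ module load openmpi/1.6.5-icc
$ mpicc helloworld_mpi.c -o helloworld_mpi.x
$ mpiexec ./helloworld_mpi.x
```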
......@@ -120,7 +120,7 @@ The MPI program executable must be available within the same path on all nodes.
Optimal way to run an MPI program depends on its memory requirements, memory access pattern and communication pattern.
!!! Note
Consider these ways to run an MPI program:
1. One MPI process per node, 16 threads per process
2. Two MPI processes per node, 8 threads per process
......@@ -131,7 +131,7 @@ Optimal way to run an MPI program depends on its memory requirements, memory acc
**Two MPI** processes per node, using 8 threads each, bound to processor socket is most useful for memory bandwidth bound applications such as BLAS1 or FFT, with scalable memory demand. However, note that the two processes will share access to the network interface. The 8 threads and socket binding should ensure maximum memory access bandwidth and minimize communication, migration and NUMA effect overheads.
!!! Note
Important! Bind every OpenMP thread to a core!
In the previous two cases with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You want to avoid this by setting the KMP_AFFINITY or GOMP_CPU_AFFINITY environment variables.
......
......@@ -7,7 +7,7 @@ The MPICH2 programs use mpd daemon or ssh connection to spawn processes, no PBS
### Basic Usage
!!! Note
Use the mpirun to execute the MPICH2 code.
Example:
......@@ -44,7 +44,7 @@ You need to preload the executable, if running on the local scratch /lscratch fi
In this example, we assume the executable helloworld_mpi.x is present in the shared home directory. We run the cp command via mpirun, copying the executable from the shared home to local scratch. The second mpirun will execute the binary in the /lscratch/15210.srv11 directory on nodes cn17, cn108, cn109 and cn110, one process per node.
!!! Note
MPI process mapping may be controlled by PBS parameters.
The mpiprocs and ompthreads parameters allow for selection of number of running MPI processes per node as well as number of OpenMP threads per MPI process.
......@@ -93,7 +93,7 @@ In this example, we demonstrate recommended way to run an MPI application, using
### OpenMP Thread Affinity
!!! Note
Important! Bind every OpenMP thread to a core!
In the previous two examples with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You might want to avoid this by setting these environment variable for GCC OpenMP:
......
......@@ -42,7 +42,7 @@ plots, images, etc... will be still available.
## Running Parallel Matlab Using Distributed Computing Toolbox / Engine
!!! Note
The Distributed Computing Toolbox is available only for the EDU variant
The MPIEXEC mode available in previous versions is no longer available in MATLAB 2015. Also, the programming interface has changed. Refer to [Release Notes](http://www.mathworks.com/help/distcomp/release-notes.html#buanp9e-1).
......@@ -65,7 +65,7 @@ Or in the GUI, go to tab HOME -> Parallel -> Manage Cluster Profiles..., click I
With the new mode, MATLAB itself launches the workers via PBS, so you can either use interactive mode or a batch mode on one node, but the actual parallel processing will be done in a separate job started by MATLAB itself. Alternatively, you can use "local" mode to run parallel code on just a single node.
!!! Note
The profile is confusingly named Salomon, but you can use it also on Anselm.
### Parallel Matlab Interactive Session
......
......@@ -3,7 +3,7 @@
## Introduction
!!! Note
This document relates to the old versions R2013 and R2014. For MATLAB 2015, please use [this documentation instead](matlab/).
Matlab is available in the latest stable version. There are always two variants of the release:
......
......@@ -97,7 +97,7 @@ Octave is linked with parallel Intel MKL, so it best suited for batch processing
variable.
!!! Note
Calculations that do not employ parallelism (either by using parallel MKL e.g. via matrix operations, the fork() function, the [parallel package](http://octave.sourceforge.net/parallel/) or another mechanism) will actually run slower than on the host CPU.
To use Octave on a node with Xeon Phi:
......
......@@ -96,7 +96,7 @@ Download the package [parallell](package-parallel-vignette.pdf) vignette.
Forking is the simplest to use. The forking family of functions provides a parallelized, drop-in replacement for the serial apply() family of functions.
!!! Note
Forking via package parallel provides functionality similar to OpenMP construct
omp parallel for
......@@ -147,7 +147,7 @@ Every evaluation of the integrad function runs in parallel on different process.
## Package Rmpi
!!! Note
The package Rmpi provides an interface (wrapper) to MPI APIs.
It also provides an interactive R slave environment. On Anselm, Rmpi provides an interface to [OpenMPI](../mpi-1/Running_OpenMPI/).
......@@ -297,7 +297,7 @@ Execute the example as:
mpi.apply is a specific way of executing Dynamic Rmpi programs.
!!! Note
The mpi.apply() family of functions provides an MPI-parallelized, drop-in replacement for the serial apply() family of functions.
Execution is identical to other dynamic Rmpi programs.
......
......@@ -23,7 +23,7 @@ Versions **1.8.11** and **1.8.13** of HDF5 library are available on Anselm, comp
The module sets up environment variables, required for linking and running HDF5 enabled applications. Make sure that the choice of HDF5 module is consistent with your choice of MPI library. Mixing MPI of different implementations may have unpredictable results.
!!! Note
Be aware that the GCC version of **HDF5 1.8.11** has serious performance issues, since it is compiled with the -O0 optimization flag. This version is provided only for testing of code compiled by GCC and IS NOT recommended for production computations. For more information, please see: <http://www.hdfgroup.org/ftp/HDF5/prev-releases/ReleaseFiles/release5-1811>
The GCC versions of **HDF5 1.8.13** are not affected by the bug; they are compiled with -O3 optimizations and are recommended for production computations.
......
......@@ -13,10 +13,10 @@ To be able to compile and link code with MAGMA library user has to load followin
To make compilation more user friendly, the module also sets these two environment variables:
!!! Note
MAGMA_INC - contains paths to the MAGMA header files (to be used for compilation step)
!!! Note
MAGMA_LIBS - contains paths to MAGMA libraries (to be used for linking step).
Compilation example:
......@@ -31,16 +31,16 @@ Compilation example:
The MAGMA implementation for Intel MIC requires a MAGMA server running on the accelerator prior to executing the user application. The server can be started and stopped using the following scripts:
!!! Note
To start the MAGMA server, use:
**$MAGMAROOT/start_magma_server**
!!! Note
To stop the server, use:
**$MAGMAROOT/stop_magma_server**
!!! Note
For a deeper understanding of how the MAGMA server is started, see the following script:
**$MAGMAROOT/launch_anselm_from_mic.sh**
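A hedged sketch of wrapping a MAGMA-enabled run with these scripts; the application binary is hypothetical.

```bash
# Start the MAGMA server on the accelerator, run the MAGMA-enabled application,
# then stop the server again.
$ $MAGMAROOT/start_magma_server
$ ./my_magma_app.x
$ $MAGMAROOT/stop_magma_server
```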
To test whether the MAGMA server runs properly, we can run one of the examples that are part of the MAGMA installation:
......
......@@ -58,7 +58,7 @@ To create OpenFOAM environment on ANSELM give the commands:
```
!!! Note
Please load the correct module for your requirements: compiler - GCC/ICC, precision - DP/SP.
Create a project directory within the $HOME/OpenFOAM directory named \<USER\>-\<OFversion\> and create a directory named run within it, e.g. by typing:
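A hedged sketch of that directory creation using the usual OpenFOAM environment variables; the exact command in the elided example may differ:

```bash
# Create the <USER>-<OFversion> project directory and its run subdirectory.
$ mkdir -p $HOME/OpenFOAM/$USER-$WM_PROJECT_VERSION/run
```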
......@@ -121,7 +121,7 @@ Run the second case for example external incompressible turbulent flow - case -
First we must run the serial applications blockMesh and decomposePar to prepare the parallel computation.
!!! Note
Create a Bash script test.sh:
```bash
#!/bin/bash
......@@ -146,7 +146,7 @@ Job submission
This job creates a simple block mesh and domain decomposition. Check your decomposition, and submit the parallel computation:
!!! Note
Create a PBS script testParallel.pbs:
```bash
#!/bin/bash
......
......@@ -7,7 +7,7 @@ In many cases, it is useful to submit huge (100+) number of computational jobs i
However, executing huge number of jobs via the PBS queue may strain the system. This strain may result in slow response to commands, inefficient scheduling and overall degradation of performance and user experience, for all users. For this reason, the number of jobs is **limited to 100 per user, 1500 per job array**
!!! Note
Please follow one of the procedures below, in case you wish to schedule more than 100 jobs at a time.
- Use [Job arrays](capacity-computing.md#job-arrays) when running huge number of [multithread](capacity-computing/#shared-jobscript-on-one-node) (bound to one node only) or multinode (multithread across several nodes) jobs
- Use [GNU parallel](capacity-computing/#gnu-parallel) when running single core jobs
......@@ -21,7 +21,7 @@ However, executing huge number of jobs via the PBS queue may strain the system.
## Job Arrays
!!! Note
A huge number of jobs may be easily submitted and managed as a job array.
A job array is a compact representation of many jobs, called subjobs. The subjobs share the same job script, and have the same values for all attributes and resources, with the following exceptions:
......@@ -152,7 +152,7 @@ Read more on job arrays in the [PBSPro Users guide](../../pbspro-documentation/)
## GNU Parallel
!!! Note
Use GNU parallel to run many single core tasks on one node.
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. GNU parallel is most useful in running single core jobs via the queue system on Anselm.
......@@ -224,12 +224,12 @@ In this example, we submit a job of 101 tasks. 24 input files will be processed
## Job Arrays and GNU Parallel
!!! Note
Combine the Job arrays and GNU parallel for best throughput of single core jobs
While job arrays are able to utilize all available computational nodes, the GNU parallel can be used to efficiently run multiple single-core jobs on single node. The two approaches may be combined to utilize all available (current and future) resources to execute single core jobs.
!!! Note
Every subjob in an array runs GNU parallel to utilize all cores on the node
### GNU Parallel, Shared jobscript
......@@ -284,7 +284,7 @@ cp output $PBS_O_WORKDIR/$TASK.out
In this example, the jobscript executes in multiple instances in parallel, on all cores of a computing node. The variable $TASK expands to one of the input filenames from tasklist. We copy the input file to local scratch, execute myprog.x and copy the output file back to the submit directory, under the $TASK.out name. The numtasks file controls how many tasks will be run per subjob. Once a task is finished, a new task starts, until the number of tasks in the numtasks file is reached.
!!! Note
Select subjob walltime and number of tasks per subjob carefully
When deciding these values, think about the following guiding rules:
......
......@@ -24,7 +24,7 @@ fi
```
!!! Note
Do not run commands outputting to standard output (echo, module list, etc.) in .bashrc for non-interactive SSH sessions. It breaks fundamental functionality (scp, PBS) of your account! Take care of SSH session interactivity for such commands, as stated in the previous example.
### Application Modules
......@@ -57,7 +57,7 @@ Application modules on Salomon cluster are built using [EasyBuild](http://hpcuge
```
!!! Note
The modules set up the application paths, library paths and environment variables for running a particular application.
The modules may be loaded, unloaded and switched, according to momentary needs.
......
......@@ -37,7 +37,7 @@ Usage counts allocated core-hours (`ncpus x walltime`). Usage is decayed, or cut
# Jobs Queued in Queue qexp Are Not Calculated to Project's Usage.
!!! Note
Calculated usage and fair-share priority can be seen at <https://extranet.it4i.cz/rsweb/salomon/projects>.
Calculated fair-share priority can be also seen as Resource_List.fairshare attribute of a job.
......@@ -66,7 +66,7 @@ The scheduler makes a list of jobs to run in order of execution priority. Schedu
It means, that jobs with lower execution priority can be run before jobs with higher execution priority.
!!! Note
It is **very beneficial to specify the walltime** when submitting jobs.
Specifying more accurate walltime enables better scheduling, better execution times and better resource usage. Jobs with suitable (small) walltime could be backfilled - and overtake job(s) with higher priority.
......
......@@ -12,7 +12,7 @@ When allocating computational resources for the job, please specify
6. Jobscript or interactive switch
!!! Note
Use the **qsub** command to submit your job to a queue for allocation of the computational resources.
Submit the job using the qsub command:
......@@ -23,7 +23,7 @@ $ qsub -A Project_ID -q queue -l select=x:ncpus=y,walltime=[[hh:]mm:]ss[.ms] job
The qsub submits the job into the queue; in other words, the qsub command creates a request to the PBS Job manager for allocation of the specified resources. The resources will be allocated when available, subject to the above described policies and constraints. **After the resources are allocated, the jobscript or interactive shell is executed on the first of the allocated nodes.**
!!! Note
The PBS statement nodes (qsub -l nodes=nodespec) is not supported on the Salomon cluster.
### Job Submission Examples
......@@ -72,7 +72,7 @@ In this example, we allocate 4 nodes, with 24 cores per node (totalling 96 cores
### UV2000 SMP
!!! Note
14 NUMA nodes available on UV2000
Per NUMA node allocation.
Jobs are isolated by cpusets.
......@@ -109,7 +109,7 @@ $ qsub -m n
### Placement by Name
!!! Note
Not useful for ordinary computing, suitable for node testing/benchmarking and management tasks.
Specific nodes may be selected using PBS resource attribute host (for hostnames):
......@@ -136,7 +136,7 @@ For communication intensive jobs it is possible to set stricter requirement - to
Nodes directly connected to the same InfiniBand switch can communicate most efficiently. Using the same switch prevents hops in the network and provides for unbiased, most efficient network communication. There are 9 nodes directly connected to every InfiniBand switch.
!!! Note