# DeepDock
Adapted from [https://github.com/OptiMaL-PSE-Lab/DeepDock](https://github.com/OptiMaL-PSE-Lab/DeepDock)
Code related to: [O. Mendez-Lucio, M. Ahmad, E.A. del Rio-Chanona, J.K. Wegner, A Geometric Deep Learning Approach to Predict Binding Conformations of Bioactive Molecules, Nature Machine Intelligence volume 3, pages 1033–1039 (2021)](https://rdcu.be/cDy5f)
Open access preprint [available here](https://doi.org/10.26434/chemrxiv.14453106.v1)
## Getting Started
### Main Requirements:
* PyTorch = 1.10.0
* CUDA Toolkit = 11.3
* Python = 3.6.9
* RDKIT = 2019.09.1
### Prerequisites
* create Conda environment
```sh
conda create -n "current" python=3.6.9
```
* activate newly created Conda environment
```sh
conda activate current
```
* install PyTorch 1.10.0 and other dependencies (more info at [https://pytorch.org/get-started/previous-versions/](https://pytorch.org/get-started/previous-versions/))
```sh
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
```
* if not yet available in your system, load CUDA 11.3
```sh
ml load CUDA/11.3.1
```
* install PyTorch scatter, sparse and geometric
```sh
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.10.0+cu113.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-1.10.0+cu113.html
pip install torch-geometric
```
* uninstall PyTorch spline
```sh
pip uninstall torch-spline-conv
```
* install PyMesh (necessary to generate `.ply` files) and ChemBench
```sh
wget --no-check-certificate https://github.com/PyMesh/PyMesh/releases/download/v0.2.0/pymesh2-0.2.0-cp36-cp36m-linux_x86_64.whl
pip install pymesh2-0.2.0-cp36-cp36m-linux_x86_64.whl
git clone https://github.com/shenwanxiang/ChemBench.git
cd ChemBench
pip install -e .
```
* install Trimesh
```sh
conda install -c conda-forge trimesh
```
* if not yet available on your system, load a newer GCC to fix issues with the `libstdc++` version
```sh
module load GCC/9.3.0
```
* install other dependencies
```sh
conda install Biopython
conda install cmake
conda install automake
conda install bison
conda install flex
conda install -c anaconda swig
conda install -c conda-forge apbs
conda install -c conda-forge pdb2pqr
```
* install AmberTools to replace `reduce` 3.34 (used for protonation); the standalone tool is deprecated and no longer available, but it is included in AmberTools
```sh
conda install -c conda-forge ambertools
```
* install requirements.txt
```sh
pip install -r requirements.txt
```
* install RDKIT
```sh
conda install -c conda-forge rdkit=2019.09.1
```
### Installation
1. Clone the repo
```sh
git clone https://github.com/paulo308/deepdock
```
2. Move into the project folder and update submodules
```sh
cd DeepDock
git submodule update --init --recursive
```
3. Install prerequisite packages
```sh
pip install -r requirements.txt
```
4. Install the DeepDock package
```sh
pip install -e .
```
### Configuration
* navigate to the location of your `apbs-pdb2pqr/pdb2pqr` installation and run the Python (2.7) script to link with your current Conda environment. For more information, refer to the Dockerfile (lines 60 to 72)
```sh
cd [your conda environment WORK DIRECTORY]/install/apbs-pdb2pqr/pdb2pqr
python2.7 scons/scons.py install PREFIX="[your conda ENVIRONMENT PATH]/bin/pdb2pqr"
```
* move the `multivalue` file to your Conda environment path
```sh
cp multivalue [your conda environment path]/share/apbs/tools/mesh/multivalue
```
* set up the necessary environment variables pointing to the respective tools
```sh
export MSMS_BIN=[your conda environment path]/bin/msms
export APBS_BIN=[your conda environment path]/bin/apbs
export PDB2PQR_BIN=[your conda environment path]/bin/pdb2pqr/pdb2pqr.py
export MULTIVALUE_BIN=[your conda environment path]/share/apbs/tools/mesh/multivalue
```
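As a quick sanity check before running the preparation scripts, the exported paths can be verified from Python. The helper below is hypothetical and not part of DeepDock; it only checks that each variable points to an existing file.
```python
# Hypothetical check, not part of DeepDock: confirm the exported tool paths exist.
import os

for var in ("MSMS_BIN", "APBS_BIN", "PDB2PQR_BIN", "MULTIVALUE_BIN"):
    path = os.environ.get(var)
    status = "OK" if path and os.path.isfile(path) else "MISSING"
    print(f"{var:16s} {status}: {path}")
```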
## Data
You can get the training and testing data by following the steps below.
1. Move into the project data folder
```sh
cd DeepDock/data
```
2. Use the following line to download the preprocessed data used to train and test the model. This will download two files, one containing PDBbind (2.3 GB) used for training and another containing CASF-2016 (32 MB) used for testing. These two files are enough to run all [examples](https://github.com/OptiMaL-PSE-Lab/DeepDock/blob/main/examples).
```sh
source get_deepdock_data.sh
```
3. In case you want to reproduce all results of the paper, you will need to download the complete CASF-2016 set (~1.5 GB). You can do so with the following command from the data folder.
```sh
source get_CASF_2016.sh
```
## Example Usage
Usage examples can be seen directly in the Jupyter Notebooks included in the repo. We added examples for:
* [Training the model](https://github.com/OptiMaL-PSE-Lab/DeepDock/blob/main/examples/Train_DeepDock.ipynb)
* [Score molecules](https://github.com/OptiMaL-PSE-Lab/DeepDock/blob/main/examples/Score_example.ipynb)
* [Predict binding conformation (docking)](https://github.com/OptiMaL-PSE-Lab/DeepDock/blob/main/examples/Docking_example.ipynb)
# Machine Learning
This section overviews machine learning frameworks and libraries available on the clusters.
## Keras
Keras is an API designed for human beings, not machines. Keras follows best practices for reducing cognitive load: it offers consistent & simple APIs, it minimizes the number of user actions required for common use cases, and it provides clear & actionable error messages. It also has extensive documentation and developer guides. For more information, see the [official website][c].
For the list of available versions, type:
```console
$ ml av Keras
```
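After loading a Keras module, a minimal model definition can serve as a quick functionality check. The sketch below assumes a standalone Keras 2.x installation; the layer sizes are arbitrary and chosen only for illustration.
```python
# Minimal sanity-check model; layer sizes are arbitrary and for illustration only.
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(64, activation="relu", input_shape=(784,)),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```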
## NetKet
NetKet is an open-source project for the development of machine intelligence for many-body quantum systems.
For more information, see the [official website][d] or [GitHub][e].
## TensorFlow
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications. For more information, see the [official website][a].
For more information see the [TensorFlow][1] section.
## Theano
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation. For more information, see the [official webpage][b] (GitHub).
For the list of available versions, type:
```console
$ ml av Theano
```
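After loading a Theano module, a small symbolic expression can be used to verify the installation. The example below is only a sketch: it compiles a simple function and takes its symbolic derivative.
```python
# Define a symbolic expression, compile it, and differentiate it.
import theano
import theano.tensor as T

x = T.dscalar("x")
y = T.dscalar("y")

f = theano.function([x, y], x ** 2 + y)             # f(x, y) = x^2 + y
print(f(3.0, 1.0))                                  # 10.0

g = theano.function([x, y], T.grad(x ** 2 + y, x))  # df/dx = 2x
print(g(3.0, 1.0))                                  # 6.0
```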
[1]: tensorflow.md
[a]: https://www.tensorflow.org/
[b]: https://github.com/Theano/
[c]: https://keras.io/
[d]: http://www.netket.org
[e]: https://github.com/netket
<!---
2021-04-08
It is necessary to load the correct NumPy / SciPy modules along with the Tensorflow one.
Obsolete 2021-03-31
## Todo
Salomon -> Theano/0.9.0-Python-3.6.1 does NOT include several mandatory modules, like NumPy and SciPy
Salomon -> Keras/2.0.5-Theano-1.2.0-Python-3.6.1 loads Theano/0.9.0-Python-3.6.1, meaning it also does not include mandatory libraries
Salomon -> /apps/modules/math/Keras/2.3.0-Tensorflow-1.13.1-Python-3.7.2 works, others miss NumPy or other libraries
What seems to work on Salomon:
for theano:
/apps/modules/python/Theano/1.0.1-Py-3.6, and Keras with this backend
for keras:
/apps/modules/math/Keras/2.3.0-Tensorflow-1.13.1-Python-3.7.2
/apps/modules/python/Keras/2.1.4-Py-3.6-Tensorflow-1.6.0rc0
-->
# NetKet
Open-source project for the development of machine intelligence for many-body quantum systems.
## Introduction
NetKet is a numerical framework written in Python to simulate many-body quantum systems using variational methods. In general, NetKet allows the user to parametrize quantum states using arbitrary functions, be it simple mean-field ansatz, Jastrow, MPS ansatz or convolutional neural networks. Those states can be sampled efficiently in order to estimate observables or other quantities. Stochastic optimization of the energy or a time-evolution are implemented on top of those samplers.
NetKet tries to follow the functional programming paradigm, and is built around jax. While it is possible to run the examples without knowledge of jax, it is recommended that the users get familiar with it if they wish to extend NetKet.
For more information, see the [NetKet documentation][1].
## Running NetKet
Load the `Python/3.8.6-GCC-10.2.0-NetKet` and `intel/2020b` modules.
### Example for Multi-GPU Node
!!! important
Set the visible device in the environment variable before loading jax and NetKet, as NetKet loads jax.
```python
# J1-J2 model
# Version with complex Hamiltonian
#
################################################################################
import os
import sys

# detect MPI rank
from mpi4py import MPI
rank = MPI.COMM_WORLD.Get_rank()

# set only one visible device
os.environ["CUDA_VISIBLE_DEVICES"] = f"{rank}"
# force to use gpu
os.environ["JAX_PLATFORM_NAME"] = "gpu"

import jax
import netket as nk
import numpy as np
import json
import mpi4jax

print("NetKet version: {}".format(nk.__version__))
print("Jax devices: {}".format(jax.devices()))
print("Jax version: {}".format(jax.__version__))
print("MPI utils available: {}".format(nk.utils.mpi.available))

# Parameters
L = 12    # length
J1 = 1.0  # nearest-neighbours exchange interaction
J2 = .50  # next-nearest-neighbours exchange interaction
MSR = 1   # Marshall sign rule (+1/-1)

# ## Hamiltonian
# Hilbert space
g = nk.graph.Chain(L, pbc=True)
hilbert = nk.hilbert.Spin(s=0.5, total_sz=0.0, N=g.n_nodes)
print("Number of graph nodes: {:d}".format(g.n_nodes))
print("Hilbert size: {:d}".format(hilbert.size))

# Pauli matrices
def sigx(i):
    return nk.operator.spin.sigmax(hilbert, i, dtype=np.complex128)

def sigy(i):
    return nk.operator.spin.sigmay(hilbert, i, dtype=np.complex128)

def sigz(i):
    return nk.operator.spin.sigmaz(hilbert, i, dtype=np.complex128)

def heisenberg(i, j, sgn=1):
    """Heisenberg two spin interaction including Marshall sign rule."""
    return sgn * (sigx(i)*sigx(j) + sigy(i)*sigy(j)) + sigz(i)*sigz(j)

# setup local Hamiltonian
Ha = nk.operator.LocalOperator(hilbert, dtype=np.complex128)  # Hamiltonian

# nearest neighbours
for i in range(L - 1):
    Ha += J1 * heisenberg(i, i+1, MSR)
Ha += J1 * heisenberg(L-1, 0, MSR)

# next nearest neighbours
for i in range(L - 2):
    Ha += J2 * heisenberg(i, i+2)
Ha += J2 * ( heisenberg(L-1, 1) + heisenberg(L-2, 0) )

# check Hamiltonian
print("Hamiltonian is hermitian: {}".format(Ha.is_hermitian))
print("Number of local operators: {:d}".format(len(Ha.operators)))
print("Hamiltonian size: {}".format(Ha.to_dense().shape))

# ## Exact diagonalization
ED = nk.exact.lanczos_ed(Ha, compute_eigenvectors=False)
E0 = ED[0]
print("Exact ground state energy: {:.5f}".format(E0))

# ## Restricted Boltzmann Machine
# setup model
model = nk.models.RBM(alpha=1, dtype=np.complex128)
sa = nk.sampler.MetropolisExchange(hilbert, graph=g, d_max=2, n_chains_per_rank=1)
vs = nk.vqs.MCState(sa, model, n_samples=3000)
opt = nk.optimizer.Sgd(learning_rate=0.01)
sr = nk.optimizer.SR(diag_shift=0.01)
gs = nk.VMC(hamiltonian=Ha, optimizer=opt, variational_state=vs, preconditioner=sr)

# run simulations
output_files = "J1J2cplx_L{:d}_J{:.2f}_rbm".format(L, J2)
gs.run(out=output_files, n_iter=3000)

# print energy
Data = json.load(open("{:s}.log".format(output_files)))
E0rbm = np.mean(Data["Energy"]["Mean"]["real"][-500:-1])
print("RBM ground state energy: {:.5f}".format(E0rbm))
```
[1]: https://www.netket.org/docs/getting_started.html#installation-and-requirements
# TensorFlow
TensorFlow (TF) is an open-source software library which can compile tensor operations to execute
very quickly on both CPUs and GPUs. It is often used as a backend for machine learning libraries
and models.
We strongly recommend using `TensorFlow 2.x`. TensorFlow 1 has long been deprecated and will
probably be difficult to run on GPUs on our clusters.
## Installation
For TensorFlow to work with GPUs, you have to use several libraries (CUDA, cuDNN, NCCL etc.)
with versions that are compatible together.
You can load the correct modules with the following command:
```console
$ ml TensorFlow
```
If you want to upgrade the TensorFlow version used in this package or install additional Python
modules, you can simply create a virtual environment and install a different TensorFlow version
inside it:
```console
$ python3 -m venv venv
$ source venv/bin/activate
(venv) $ python3 -m pip install -U setuptools wheel pip
(venv) $ python3 -m pip install tensorflow
```
However, if you use a newer TensorFlow version than the one included in the `TensorFlow` module,
you should make sure that it is still compatible with the CUDA version provided by the module.
You can find the required `CUDA`/`cuDNN` versions for the latest TF
[here](https://www.tensorflow.org/install/pip).
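One way to check which CUDA and cuDNN versions your TensorFlow build was compiled against is to query the build info from Python; the keys shown below are present in GPU builds of TensorFlow 2.x, but may differ between builds.
```python
import tensorflow as tf

print("TensorFlow:", tf.__version__)
# Build info lists the CUDA/cuDNN versions this TensorFlow build was compiled against.
info = tf.sysconfig.get_build_info()
print("Built with CUDA:", info.get("cuda_version"))
print("Built with cuDNN:", info.get("cudnn_version"))
```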
## TensorFlow Example
After loading TensorFlow, you can check its functionality by running the following Python script.
```python
import tensorflow as tf
a = tf.constant([1, 2, 3])
b = tf.constant([2, 4, 6])
c = a + b
print(c.numpy())
```
## Using TensorFlow With GPUs
With TensorFlow, you can leverage either a single GPU or multiple GPUs in a single process, to e.g.
train neural networks much faster.
Loading the available `TensorFlow` module should ensure that the required libraries (CUDA, cuDNN, NCCL) are loaded correctly.
### Selecting GPUs
You can select how many and which (NVIDIA) GPUs will be used by TensorFlow with the
`CUDA_VISIBLE_DEVICES` environment variable.
```console
# Do not use any GPUs
$ CUDA_VISIBLE_DEVICES=-1 python3 my_script.py
# Use a single GPU with ID 0
$ CUDA_VISIBLE_DEVICES=0 python3 my_script.py
# Use multiple GPUs
$ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 my_script.py
```
By default, if you do not specify the environment variable, all available GPUs will be used by
TensorFlow.
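To verify which GPUs your script actually sees, you can list the physical devices from within TensorFlow. The script below is a hypothetical stand-in for `my_script.py` used in the commands above.
```python
# Hypothetical my_script.py: print the GPUs visible to TensorFlow after
# CUDA_VISIBLE_DEVICES has been applied.
import os
import tensorflow as tf

print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES", "<not set>"))
for gpu in tf.config.list_physical_devices("GPU"):
    print("Visible to TensorFlow:", gpu.name)
```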
### Multi-GPU TensorFlow Example
This script uses `keras` and `TensorFlow` to train a simple neural network on the
[MNIST](https://en.wikipedia.org/wiki/MNIST_database) dataset. It assumes that you have
`tensorflow` (2.x), `keras` and `tensorflow_datasets` Python packages installed. The training
is performed on multiple GPUs.
```python
import tensorflow_datasets as tfds
import tensorflow as tf
datasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True)
mnist_train, mnist_test = datasets['train'], datasets['test']
# Use NCCL reduction if NCCL is available, it should be the most efficient strategy
strategy = tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.NcclAllReduce())
# Different reduction strategy, use if NCCL causes errors
# strategy = tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.ReductionToOneDevice())
print('Number of devices: {}'.format(strategy.num_replicas_in_sync))
num_train_examples = info.splits['train'].num_examples
num_test_examples = info.splits['test'].num_examples
BUFFER_SIZE = 10000
BATCH_SIZE_PER_REPLICA = 64
BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync
def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255
    return image, label

train_dataset = mnist_train.map(scale).cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
eval_dataset = mnist_test.map(scale).batch(BATCH_SIZE)

# The following block makes sure that the model will run on multiple GPUs (if they are available)
# Without `strategy.scope()`, the model would only be trained on a single GPU
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10)
    ])

    model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=['accuracy'])

model.fit(train_dataset, epochs=100)
```
!!! note
If using the `NCCL` strategy causes runtime errors, try to run your application with the
environment variable `TF_FORCE_GPU_ALLOW_GROWTH` set to `true`.
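A minimal sketch of applying this workaround from Python (you can equally export the variable in your shell or job script); the variable must be set before TensorFlow initializes the GPUs.
```python
import os

# Must be set before TensorFlow touches the GPUs.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

import tensorflow as tf  # imported only after the variable is set
```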
!!! tip
For real-world multi-GPU training, it might be better to use a dedicated multi-GPU framework such
as [Horovod](https://github.com/horovod/horovod).
<!---
2022-10-14
Add multi-GPU example script.
2021-04-08
It's necessary to load the correct NumPy module along with the Tensorflow one.
2021-03-31
## Notes
As of 2021-03-23, TensorFlow is made available only on the Salomon cluster
Tensorflow-tensorboard/1.5.1-Py-3.6 has not been tested.
-->
# Lmod Environment
Lmod is a modules tool, a modern alternative to the outdated & no longer actively maintained Tcl-based environment modules tool.
Detailed documentation on Lmod is available [here][a].
## Benefits
* significantly more responsive module commands, in particular `ml av`
* easier to use interface
* module files can be written in either the Tcl or Lua syntax (and both types of modules can be mixed together)
## Introduction
Below you will find more details and examples.
| command | equivalent/explanation |
| ------------------------ | ---------------------------------------------------------------- |
| ml | module list |
| ml GCC/6.2.0-2.27 | module load GCC/6.2.0-2.27 |
| ml -GCC/6.2.0-2.27 | module unload GCC/6.2.0-2.27 |
| ml purge | module unload all modules |
| ml av | module avail |
| ml show GCC/6.2.0-2.27 | module show GCC/6.2.0-2.27 |
| ml spider gcc | searches (case-insensitive) for gcc in all available modules |
| ml spider GCC/6.2.0-2.27 | show all information about the module GCC/6.2.0-2.27 |
| ml save mycollection | stores the currently loaded modules to a collection |
| ml restore mycollection | restores a previously stored collection of modules |
## Listing Loaded Modules
To get an overview of the currently loaded modules, use `module list` or `ml` (without specifying extra arguments):
```console
$ ml
Currently Loaded Modules:
1) EasyBuild/3.0.0 (S) 2) lmod/7.2.2
Where:
S: Module is Sticky, requires --force to unload or purge
```
!!! tip
For more details on sticky modules, see the section on [ml purge][1].
## Searching for Available Modules
To get an overview of all available modules, you can use `ml avail` or simply `ml av`:
```console
$ ml av
---------------------------------------- /apps/modules/compiler ----------------------------------------------
GCC/5.2.0 GCCcore/6.2.0 (D) icc/2013.5.192 ifort/2013.5.192 LLVM/3.9.0-intel-2017.00 (D)
... ...
---------------------------------------- /apps/modules/devel -------------------------------------------------
Autoconf/2.69-foss-2015g CMake/3.0.0-intel-2016.01 M4/1.4.17-intel-2016.01 pkg-config/0.27.1-foss-2015g
Autoconf/2.69-foss-2016a CMake/3.3.1-foss-2015g M4/1.4.17-intel-2017.00 pkg-config/0.27.1-intel-2015b
... ...
```
In the current module naming scheme, each module name consists of two parts:
* the part before the first /, corresponding to the software name
* the remainder, corresponding to the software version, the compiler toolchain that was used to install the software, and a possible version suffix
!!! tip
`(D)` indicates that this particular version of the module is the default, but we strongly recommend not relying on this, as the default can change at any point. Usually, the default points to the latest version available.
## Searching for Modules
If you just provide a software name, for example `gcc`, it prints an overview of all available modules for GCC.
```console
$ ml spider gcc
---------------------------------------------------------------------------------
GCC:
---------------------------------------------------------------------------------
Description:
The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Java, and Ada, as well as libraries for these languages (libstdc++, libgcj,...). - Homepage: http://gcc.gnu.org/
Versions:
GCC/4.4.7-system
GCC/4.7.4
GCC/4.8.3
GCC/4.9.2-binutils-2.25
GCC/4.9.2
GCC/4.9.3-binutils-2.25
GCC/4.9.3
GCC/4.9.3-2.25
GCC/5.1.0-binutils-2.25
GCC/5.2.0
GCC/5.3.0-binutils-2.25
GCC/5.3.0-2.25
GCC/5.3.0-2.26
GCC/5.3.1-snapshot-20160419-2.25
GCC/5.4.0-2.26
GCC/6.2.0-2.27
Other possible modules matches:
GCCcore
---------------------------------------------------------------------------------
To find other possible module matches do:
module -r spider '.*GCC.*'
---------------------------------------------------------------------------------
For detailed information about a specific "GCC" module (including how to load the modules) use the module's full name.
For example:
$ module spider GCC/6.2.0-2.27
---------------------------------------------------------------------------------
```
!!! tip
`spider` is case-insensitive.
If you use `spider` on a full module name like `GCC/6.2.0-2.27`, it will tell on which cluster(s) that module is available:
```console
$ module spider GCC/6.2.0-2.27
--------------------------------------------------------------------------------------------------------------
GCC: GCC/6.2.0-2.27
--------------------------------------------------------------------------------------------------------------
Description:
The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Java, and Ada, as well as libraries for these languages (libstdc++, libgcj...). - Homepage: http://gcc.gnu.org/
This module can be loaded directly: ml GCC/6.2.0-2.27
Help:
The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Java, and Ada,
as well as libraries for these languages (libstdc++, libgcj,...). - Homepage: http://gcc.gnu.org/
```
This tells you what the module contains and what the URL to the homepage of the software is.
## Available Modules for a Particular Software Package
To check which modules are available for a particular software package, you can provide the software name to `ml av`.
For example, to check which versions of Git are available:
```console
$ ml av git
-------------------------------------- /apps/modules/tools ----------------------------------------
git/2.8.0-GNU-4.9.3-2.25 git/2.8.0-intel-2017.00 git/2.9.0 git/2.9.2 git/2.11.0 (D)
Where:
D: Default Module
Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
```
!!! tip
The specified software name is case-insensitive.
Lmod does a partial match on the module name, so sometimes you need to mark the end of the software name you are interested in with a trailing `/`:
```console
$ ml av GCC/
------------------------------------------ /apps/modules/compiler -------------------------------------------
GCC/4.4.7-system GCC/4.8.3 GCC/4.9.2 GCC/4.9.3 GCC/5.1.0-binutils-2.25 GCC/5.3.0-binutils-2.25 GCC/5.3.0-2.26 GCC/5.4.0-2.26 GCC/4.7.4 GCC/4.9.2-binutils-2.25 GCC/4.9.3-binutils-2.25 GCC/4.9.3-2.25 GCC/5.2.0 GCC/5.3.0-2.25 GCC/6.2.0-2.27 (D)
Where:
D: Default Module
Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
```
## Inspecting a Module
To see how a module would change the environment, use `ml show`:
```console
$ ml show Python/3.5.2
help([[Python is a programming language that lets you work more quickly and integrate your systems more effectively. - Homepage: http://python.org/]])
whatis("Description: Python is a programming language that lets you work more quickly and integrate your systems more effectively. - Homepage: http://python.org/")
conflict("Python")
load("bzip2/1.0.6")
load("zlib/1.2.8")
load("libreadline/6.3")
load("ncurses/5.9")
load("SQLite/3.8.8.1")
load("Tk/8.6.3")
load("GMP/6.0.0a")
load("XZ/5.2.2")
prepend_path("CPATH","/apps/all/Python/3.5.2/include")
prepend_path("LD_LIBRARY_PATH","/apps/all/Python/3.5.2/lib")
prepend_path("LIBRARY_PATH","/apps/all/Python/3.5.2/lib")
prepend_path("MANPATH","/apps/all/Python/3.5.2/share/man")
prepend_path("PATH","/apps/all/Python/3.5.2/bin")
prepend_path("PKG_CONFIG_PATH","/apps/all/Python/3.5.2/lib/pkgconfig")
setenv("EBROOTPYTHON","/apps/all/Python/3.5.2")
setenv("EBVERSIONPYTHON","3.5.2")
setenv("EBDEVELPYTHON","/apps/all/Python/3.5.2/easybuild/Python-3.5.2-easybuild-devel")
setenv("EBEXTSLISTPYTHON","setuptools-20.1.1,pip-8.0.2,nose-1.3.7")
```
!!! tip
Note that both the direct changes to the environment as well as other modules that will be loaded are shown.
## Loading Modules
!!! warning
Always specify the name **and** the version when loading a module.
Loading a default module in your script (e.g. `$ ml intel`) will lead to inconsistent results if the default module is upgraded.
**IT4Innovations is not responsible for any loss of allocated core- or node-hours resulting from the use of improper modules in your calculations.**
To effectively apply the changes to the environment that are specified by a module, use `ml` and specify the name of the module.
For example, to set up your environment to use Intel:
```console
$ ml intel/2017.00
$ ml
Currently Loaded Modules:
1) GCCcore/5.4.0
2) binutils/2.26-GCCcore-5.4.0 (H)
3) icc/2017.0.098-GCC-5.4.0-2.26
4) ifort/2017.0.098-GCC-5.4.0-2.26
5) iccifort/2017.0.098-GCC-5.4.0-2.26
6) impi/2017.0.098-iccifort-2017.0.098-GCC-5.4.0-2.26
7) iimpi/2017.00-GCC-5.4.0-2.26
8) imkl/2017.0.098-iimpi-2017.00-GCC-5.4.0-2.26
9) intel/2017.00
Where:
H: Hidden Module
```
!!! tip
Note that even though we only loaded a single module, the output of `ml` shows that a whole set of modules was loaded. These are required dependencies for `intel/2017.00`.
## Conflicting Modules
!!! warning
It is important to note that **only modules that are compatible with each other can be loaded together. In particular, modules must be installed either with the same toolchain as the modules that are already loaded, or with a compatible (sub)toolchain**.
For example, once you have loaded one or more modules that were installed with the `intel/2017.00` toolchain, all other modules that you load should have been installed with the same toolchain.
In addition, only **one single version** of each software package can be loaded at a particular time. For example, once you have the `Python/3.5.2-intel-2017.00` module loaded, you cannot load a different version of Python in the same session/job script, neither directly, nor indirectly as a dependency of another module you want to load.
## Unloading Modules
To revert the changes to the environment that were made by a particular module, you can use `ml -<modname>`.
For example:
```console
$ ml
Currently Loaded Modules:
1) EasyBuild/3.0.0 (S) 2) lmod/7.2.2
$ which gcc
/usr/bin/gcc
$ ml GCC/
$ ml
Currently Loaded Modules:
1) EasyBuild/3.0.0 (S) 2) lmod/7.2.2 3) GCCcore/6.2.0 4) binutils/2.27-GCCcore-6.2.0 (H) 5) GCC/6.2.0-2.27
$ which gcc
/apps/all/GCCcore/6.2.0/bin/gcc
$ ml -GCC
$ ml
Currently Loaded Modules:
1) EasyBuild/3.0.0 (S) 2) lmod/7.2.2 3) GCCcore/6.2.0 4) binutils/2.27-GCCcore-6.2.0 (H)
$ which gcc
/usr/bin/gcc
```
## Resetting by Unloading All Modules
To reset your environment back to a clean state, you can use `ml purge` or `ml purge --force`:
```console
$ ml
Currently Loaded Modules:
1) EasyBuild/3.0.0 (S) 2) lmod/7.2.2 3) GCCcore/6.2.0 4) binutils/2.27-GCCcore-6.2.0 (H)
$ ml purge
The following modules were not unloaded:
(Use "module --force purge" to unload all):
1) EasyBuild/3.0.0
$ ml
Currently Loaded Modules:
1) EasyBuild/3.0.0 (S)
$ ml purge --force
$ ml
No modules loaded
```
As such, you should not (re)load the cluster module anymore after running `ml purge`.
## Module Collections
If you have a set of modules that you need to load often, you can save these in a collection (only works with Lmod).
First, load all the modules you need, for example:
```console
$ ml intel/2017.00 Python/3.5.2-intel-2017.00
```
Now store them in a collection using `ml save`:
```console
$ ml save my-collection
```
Later, for example in a job script, you can reload all these modules with `ml restore`:
```console
$ ml restore my-collection
```
With `ml savelist`, you can get a list of all saved collections:
```console
$ ml savelist
Named collection list:
1) my-collection
2) my-test-collection
```
To inspect a collection, use `ml describe`.
To remove a module collection, remove the corresponding entry in `$HOME/.lmod.d`.
[1]: #resetting-by-unloading-all-modules
[a]: http://lmod.readthedocs.io
# New Software Installation Request
If you need to install new software on IT4Innovations' clusters, send your request to [support\[at\]it4i.cz][a].
In the request, provide the following information:
1. Software name **(required)**;
1. Website **(required)**;
1. Type of software (e.g. open-source, commercial, ...) **(required)**;
1. Required software version (specific version or 'latest') **(required)**;
1. Dependencies (both required and optional ones that are required for your use case);
1. Pointer to installation guide **(required)**;
1. Pointer to documentation on how to test installation;
1. Details of license server, hostname and port(s) (if any);
1. Instructions to run test case;
1. Short motivation describing why you want to use this software **(required)**;
1. When would you like to use this software? **(required)**;
1. Toolchain preference (e.g. intel/2020a, ...);
1. Does this software need to be made available to only a particular group of users? **(required)**
- No, installation can be public;
- Yes (specify the group of users).
[a]: mailto:support@it4i.cz
# SLURM and Parallel Processing in Jupyter-Lab via OOD
When working in Jupyter Lab, you can run parallel tasks across multiple nodes, which is beneficial for efficient computation. This guide shows how to run SLURM-backed parallel tasks from a Jupyter notebook using Open OnDemand (OOD).
## Accessing OOD
1. Open [this link to start a new OOD Jupyter Lab session](https://ood-karolina.it4i.cz/pun/sys/dashboard/batch_connect/sys/bc_it4i_jupyter/session_contexts/new).
2. Start a job with 2 nodes and 256 cores.
![](img/ood_jupyter.png)
3. A window with Jupyter Lab will open, where you can continue working.
## Setting Up Jupyter Lab
1. Once Jupyter Lab opens, select **File** → **New Launcher**.
2. Select **Notebook** → **Python 3 (ipykernel)** to create a new notebook.
![](img/jupyter_new.png)
![](img/jupyter_ood_start.png)
## Inserting the Sample Code
Paste the following sample code into your new notebook to run a parallel task using MPI:
```python
import ipyparallel as ipp

def mpi_example():
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    return f"Hello World from rank {comm.Get_rank()}. total ranks={comm.Get_size()}. host={MPI.Get_processor_name()}"

# Request an MPI cluster with 256 engines
with ipp.Cluster(controller_ip="*", engines="mpi", n=256) as rc:
    # Get a broadcast_view on the cluster which is best suited for MPI style computation
    view = rc.broadcast_view()
    # Run the mpi_example function on all engines in parallel
    r = view.apply_sync(mpi_example)
    # Retrieve and print the result from the engines
    print("\n".join(r))
# At this point, the cluster processes have been shutdown
```
![](img/jupyter_run.png)
## Explanation of the Code
* `ipyparallel`: used for parallel computations with multiple processes.
* `mpi4py`: a library for working with MPI (Message Passing Interface), a standard for efficient parallel processing across multiple nodes.
* The function `mpi_example()` returns information about each process, such as its rank, total number of processes, and host name.
* Using `ipp.Cluster`, we request a cluster with 256 engines and run the function on all of them concurrently; a small variation on this pattern is sketched below.
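Building on the same pattern, the following hypothetical variation shows an MPI collective operation run from the notebook: each engine contributes its rank and `allreduce` sums the values across all 256 engines.
```python
import ipyparallel as ipp

def mpi_sum_of_ranks():
    # Each rank contributes its rank number; allreduce sums them across all engines.
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    return comm.allreduce(comm.Get_rank(), op=MPI.SUM)

# Same OOD session as above; n=256 matches the requested cores.
with ipp.Cluster(controller_ip="*", engines="mpi", n=256) as rc:
    view = rc.broadcast_view()
    results = view.apply_sync(mpi_sum_of_ranks)
    # Every engine returns the same global sum: 0 + 1 + ... + 255 = 32640.
    print(results[0])
```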
## Job Output
### After Execution
When the code is executed, you will see output similar to this:
```console
Starting 256 engines with <class 'ipyparallel.cluster.launcher.MPIEngineSetLauncher'>
0%
0/256 [00:00<?, ?engine/s]
```
### After Completing the Calculation
```console
Starting 256 engines with <class 'ipyparallel.cluster.launcher.MPIEngineSetLauncher'>
100%
256/256 [00:27<00:00, 2.64engine/s]
Hello World from rank 0. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 1. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 2. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 3. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 4. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 5. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 6. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 7. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 8. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 9. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 10. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 11. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 12. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 13. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 14. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 15. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 16. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 17. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 18. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 19. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 20. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 21. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 22. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 23. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 24. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 25. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 26. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 27. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 28. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 29. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 30. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 31. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 32. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 33. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 34. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 35. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 36. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 37. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 38. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 39. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 40. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 41. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 42. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 43. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 44. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 45. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 46. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 47. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 48. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 49. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 50. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 51. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 52. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 53. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 54. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 55. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 56. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 57. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 58. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 59. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 60. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 61. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 62. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 63. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 64. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 65. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 66. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 67. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 68. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 69. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 70. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 71. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 72. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 73. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 74. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 75. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 76. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 77. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 78. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 79. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 80. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 81. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 82. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 83. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 84. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 85. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 86. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 87. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 88. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 89. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 90. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 91. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 92. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 93. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 94. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 95. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 96. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 97. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 98. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 99. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 100. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 101. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 102. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 103. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 104. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 105. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 106. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 107. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 108. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 109. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 110. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 111. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 112. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 113. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 114. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 115. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 116. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 117. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 118. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 119. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 120. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 121. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 122. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 123. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 124. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 125. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 126. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 127. total ranks=256. host=cn090.karolina.it4i.cz
Hello World from rank 128. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 129. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 130. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 131. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 132. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 133. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 134. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 135. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 136. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 137. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 138. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 139. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 140. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 141. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 142. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 143. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 144. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 145. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 146. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 147. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 148. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 149. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 150. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 151. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 152. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 153. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 154. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 155. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 156. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 157. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 158. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 159. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 160. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 161. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 162. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 163. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 164. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 165. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 166. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 167. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 168. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 169. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 170. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 171. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 172. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 173. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 174. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 175. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 176. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 177. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 178. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 179. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 180. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 181. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 182. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 183. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 184. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 185. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 186. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 187. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 188. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 189. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 190. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 191. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 192. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 193. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 194. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 195. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 196. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 197. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 198. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 199. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 200. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 201. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 202. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 203. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 204. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 205. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 206. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 207. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 208. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 209. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 210. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 211. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 212. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 213. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 214. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 215. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 216. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 217. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 218. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 219. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 220. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 221. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 222. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 223. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 224. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 225. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 226. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 227. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 228. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 229. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 230. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 231. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 232. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 233. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 234. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 235. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 236. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 237. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 238. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 239. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 240. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 241. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 242. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 243. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 244. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 245. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 246. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 247. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 248. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 249. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 250. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 251. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 252. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 253. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 254. total ranks=256. host=cn091.karolina.it4i.cz
Hello World from rank 255. total ranks=256. host=cn091.karolina.it4i.cz
Stopping engine(s): 1733739848
```
# MPI
## Setting Up MPI Environment
The Karolina cluster provides several implementations of the MPI library:
* OpenMPI
* Intel MPI (impi)
* MPICH
MPI libraries are activated via the environment modules.
!!! note
All OpenMPI modules are configured with `setenv("SLURM_MPI_TYPE", "pmix_v4")`.
Look up the modulefiles/mpi section in `ml av`:
```console
$ ml av
------------------------------------------------------- /apps/modules/mpi -------------------------------------------------------
OpenMPI/3.1.4-GCC-6.3.0-2.27 OpenMPI/4.1.1-GCC-10.2.0
OpenMPI/4.0.3-GCC-9.3.0 OpenMPI/4.1.1-GCC-10.3.0 (D)
OpenMPI/4.0.5-GCC-10.2.0 impi/2017.4.239-iccifort-2017.8.262-GCC-6.3.0-2.27
OpenMPI/4.0.5-gcccuda-2020b impi/2018.4.274-iccifort-2018.5.274-GCC-8.3.0-2.32
OpenMPI/4.0.5-iccifort-2020.4.304 impi/2018.4.274-iccifort-2019.1.144-GCC-8.2.0-2.31.1
OpenMPI/4.0.5-NVHPC-21.2-CUDA-11.2.2 impi/2019.9.304-iccifort-2020.1.217
OpenMPI/4.0.5-NVHPC-21.2-CUDA-11.3.0 impi/2019.9.304-iccifort-2020.4.304
OpenMPI/4.1.1-GCC-10.2.0-Java-1.8.0_221 impi/2021.2.0-intel-compilers-2021.2.0 (D)
MPICH/3.3.2-GCC-10.2.0
```
There are default compilers associated with each MPI implementation. The defaults may be changed, however; the MPI libraries may be used in conjunction with any compiler.
Examples:
```console
$ ml gompi/2020b
```
In this example, we activate OpenMPI with the GNU compilers (OpenMPI 4.0.5 and GCC 10.2.0). For more information about toolchains, see the [Environment and Modules][1] section.
To use OpenMPI with the Intel compiler suite, use:
```console
$ ml iompi/2020b
```
In this example, OpenMPI 4.0.5 built with the Intel compilers 2020.4.304 is activated via the `iompi` toolchain.
## Compiling MPI Programs
After setting up your MPI environment, compile your program using one of the MPI wrappers:
For module `gompi/2020b`
```console
$ mpicc -v
Using built-in specs.
COLLECT_GCC=/apps/all/GCCcore/10.2.0/bin/gcc
COLLECT_LTO_WRAPPER=/apps/all/GCCcore/10.2.0/libexec/gcc/x86_64-pc-linux-gnu/10.2.0/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++,fortran --without-cuda-driver --enable-offload-targets=nvptx-none --enable-lto --enable-checking=release --disable-multilib --enable-shared=yes --enable-static=yes --enable-threads=posix --enable-plugins --enable-gold=default --enable-ld --with-plugin-ld=ld.gold --prefix=/apps/all/GCCcore/10.2.0 --with-local-prefix=/apps/all/GCCcore/10.2.0 --enable-bootstrap --with-isl=/dev/shm/easybuild/build/GCCcore/10.2.0/system-system/gcc-10.2.0/stage2_stuff
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.2.0 (GCC)
$ mpif77 -v
Using built-in specs.
COLLECT_GCC=/apps/all/GCCcore/10.2.0/bin/gfortran
COLLECT_LTO_WRAPPER=/apps/all/GCCcore/10.2.0/libexec/gcc/x86_64-pc-linux-gnu/10.2.0/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++,fortran --without-cuda-driver --enable-offload-targets=nvptx-none --enable-lto --enable-checking=release --disable-multilib --enable-shared=yes --enable-static=yes --enable-threads=posix --enable-plugins --enable-gold=default --enable-ld --with-plugin-ld=ld.gold --prefix=/apps/all/GCCcore/10.2.0 --with-local-prefix=/apps/all/GCCcore/10.2.0 --enable-bootstrap --with-isl=/dev/shm/easybuild/build/GCCcore/10.2.0/system-system/gcc-10.2.0/stage2_stuff
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.2.0 (GCC)
$ mpif90 -v
Using built-in specs.
COLLECT_GCC=/apps/all/GCCcore/10.2.0/bin/gfortran
COLLECT_LTO_WRAPPER=/apps/all/GCCcore/10.2.0/libexec/gcc/x86_64-pc-linux-gnu/10.2.0/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++,fortran --without-cuda-driver --enable-offload-targets=nvptx-none --enable-lto --enable-checking=release --disable-multilib --enable-shared=yes --enable-static=yes --enable-threads=posix --enable-plugins --enable-gold=default --enable-ld --with-plugin-ld=ld.gold --prefix=/apps/all/GCCcore/10.2.0 --with-local-prefix=/apps/all/GCCcore/10.2.0 --enable-bootstrap --with-isl=/dev/shm/easybuild/build/GCCcore/10.2.0/system-system/gcc-10.2.0/stage2_stuff
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.2.0 (GCC)
```
When using Intel MPI, use the following MPI wrappers:
For module `intel/2020b`
```console
$ mpiicc -v
mpiicc for the Intel(R) MPI Library 2019 Update 9 for Linux*
Copyright 2003-2020, Intel Corporation.
icc version 19.1.3.304 (gcc version 10.2.0 compatibility)
ld /lib/../lib64/crt1.o /lib/../lib64/crti.o /apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/crtbegin.o --eh-frame-hdr --build-id -dynamic-linker /lib64/ld-linux-x86-64.so.2 -m elf_x86_64 -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib/release -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib -o a.out -L/apps/all/imkl/2020.4.304-iimpi-2020b/mkl/lib/intel64 -L/apps/all/imkl/2020.4.304-iimpi-2020b/lib/intel64 -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/libfabric/lib -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib/release -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib -L/apps/all/UCX/1.9.0-GCCcore-10.2.0/lib -L/apps/all/numactl/2.0.13-GCCcore-10.2.0/lib -L/apps/all/iccifort/2020.4.304/compilers_and_libraries_2020.4.304/linux/tbb/lib/intel64/gcc4.8 -L/apps/all/binutils/2.35-GCCcore-10.2.0/lib -L/apps/all/zlib/1.2.11-GCCcore-10.2.0/lib -L/apps/all/iccifort/2020.4.304/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin -L/apps/all/UCX/1.9.0-GCCcore-10.2.0/lib/../lib64 -L/apps/all/UCX/1.9.0-GCCcore-10.2.0/lib/../lib64/ -L/apps/all/numactl/2.0.13-GCCcore-10.2.0/lib/../lib64 -L/apps/all/numactl/2.0.13-GCCcore-10.2.0/lib/../lib64/ -L/apps/all/binutils/2.35-GCCcore-10.2.0/lib/../lib64 -L/apps/all/binutils/2.35-GCCcore-10.2.0/lib/../lib64/ -L/apps/all/zlib/1.2.11-GCCcore-10.2.0/lib/../lib64 -L/apps/all/zlib/1.2.11-GCCcore-10.2.0/lib/../lib64/ -L/apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/ -L/apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib64 -L/apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib64/ -L/lib/../lib64 -L/lib/../lib64/ -L/usr/lib/../lib64 -L/usr/lib/../lib64/ -L/apps/all/imkl/2020.4.304-iimpi-2020b/mkl/lib/intel64/ -L/apps/all/imkl/2020.4.304-iimpi-2020b/lib/intel64/ -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/libfabric/lib/ -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib/release/ -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib/ -L/apps/all/UCX/1.9.0-GCCcore-10.2.0/lib64 -L/apps/all/UCX/1.9.0-GCCcore-10.2.0/lib/ -L/apps/all/numactl/2.0.13-GCCcore-10.2.0/lib64 -L/apps/all/numactl/2.0.13-GCCcore-10.2.0/lib/ -L/apps/all/iccifort/2020.4.304/compilers_and_libraries_2020.4.304/linux/tbb/lib/intel64/gcc4.8/ -L/apps/all/binutils/2.35-GCCcore-10.2.0/lib64 -L/apps/all/binutils/2.35-GCCcore-10.2.0/lib/ -L/apps/all/zlib/1.2.11-GCCcore-10.2.0/lib64 -L/apps/all/zlib/1.2.11-GCCcore-10.2.0/lib/ -L/apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../ -L/lib64 -L/lib/ -L/usr/lib64 -L/usr/lib --enable-new-dtags -rpath /apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib/release -rpath /apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib -lmpifort -lmpi -ldl -lrt -lpthread -Bdynamic -Bstatic -limf -lsvml -lirng -Bdynamic -lm -Bstatic -lipgo -ldecimal --as-needed -Bdynamic -lcilkrts -lstdc++ --no-as-needed -lgcc -lgcc_s -Bstatic -lirc -lsvml -Bdynamic -lc -lgcc -lgcc_s -Bstatic -lirc_s -Bdynamic -ldl -lc /apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/crtend.o /lib/../lib64/crtn.o
$ mpiifort -v
mpiifort for the Intel(R) MPI Library 2019 Update 9 for Linux*
Copyright 2003-2020, Intel Corporation.
ifort version 19.1.3.304
ld /lib/../lib64/crt1.o /lib/../lib64/crti.o /apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/crtbegin.o --eh-frame-hdr --build-id -dynamic-linker /lib64/ld-linux-x86-64.so.2 -m elf_x86_64 -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib/release -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib -o a.out /apps/all/iccifort/2020.4.304/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/for_main.o -L/apps/all/imkl/2020.4.304-iimpi-2020b/mkl/lib/intel64 -L/apps/all/imkl/2020.4.304-iimpi-2020b/lib/intel64 -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/libfabric/lib -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib/release -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib -L/apps/all/UCX/1.9.0-GCCcore-10.2.0/lib -L/apps/all/numactl/2.0.13-GCCcore-10.2.0/lib -L/apps/all/iccifort/2020.4.304/compilers_and_libraries_2020.4.304/linux/tbb/lib/intel64/gcc4.8 -L/apps/all/binutils/2.35-GCCcore-10.2.0/lib -L/apps/all/zlib/1.2.11-GCCcore-10.2.0/lib -L/apps/all/iccifort/2020.4.304/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin -L/apps/all/UCX/1.9.0-GCCcore-10.2.0/lib/../lib64 -L/apps/all/UCX/1.9.0-GCCcore-10.2.0/lib/../lib64/ -L/apps/all/numactl/2.0.13-GCCcore-10.2.0/lib/../lib64 -L/apps/all/numactl/2.0.13-GCCcore-10.2.0/lib/../lib64/ -L/apps/all/binutils/2.35-GCCcore-10.2.0/lib/../lib64 -L/apps/all/binutils/2.35-GCCcore-10.2.0/lib/../lib64/ -L/apps/all/zlib/1.2.11-GCCcore-10.2.0/lib/../lib64 -L/apps/all/zlib/1.2.11-GCCcore-10.2.0/lib/../lib64/ -L/apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/ -L/apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib64 -L/apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib64/ -L/lib/../lib64 -L/lib/../lib64/ -L/usr/lib/../lib64 -L/usr/lib/../lib64/ -L/apps/all/imkl/2020.4.304-iimpi-2020b/mkl/lib/intel64/ -L/apps/all/imkl/2020.4.304-iimpi-2020b/lib/intel64/ -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/libfabric/lib/ -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib/release/ -L/apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib/ -L/apps/all/UCX/1.9.0-GCCcore-10.2.0/lib64 -L/apps/all/UCX/1.9.0-GCCcore-10.2.0/lib/ -L/apps/all/numactl/2.0.13-GCCcore-10.2.0/lib64 -L/apps/all/numactl/2.0.13-GCCcore-10.2.0/lib/ -L/apps/all/iccifort/2020.4.304/compilers_and_libraries_2020.4.304/linux/tbb/lib/intel64/gcc4.8/ -L/apps/all/binutils/2.35-GCCcore-10.2.0/lib64 -L/apps/all/binutils/2.35-GCCcore-10.2.0/lib/ -L/apps/all/zlib/1.2.11-GCCcore-10.2.0/lib64 -L/apps/all/zlib/1.2.11-GCCcore-10.2.0/lib/ -L/apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../ -L/lib64 -L/lib/ -L/usr/lib64 -L/usr/lib --enable-new-dtags -rpath /apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib/release -rpath /apps/all/impi/2019.9.304-iccifort-2020.4.304/intel64/lib -lmpifort -lmpi -ldl -lrt -lpthread -Bdynamic -Bstatic -lifport -lifcoremt -limf -lsvml -Bdynamic -lm -Bstatic -lipgo -lirc -Bdynamic -lpthread -Bstatic -lsvml -Bdynamic -lc -lgcc -lgcc_s -Bstatic -lirc_s -Bdynamic -ldl -lc /apps/all/GCCcore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/crtend.o /lib/../lib64/crtn.o
```
The `mpif90` and `mpif77` wrappers provided by Intel MPI are designed for GCC and GFortran. You might be able to compile MPI code with them even with the Intel compilers loaded, but you may run into problems.
Example program:
```cpp
// helloworld_mpi.c
#include <stdio.h>
#include <mpi.h>
int main(int argc, char **argv) {
int len;
int rank, size;
char node[MPI_MAX_PROCESSOR_NAME];
// Initiate MPI
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Comm_size(MPI_COMM_WORLD,&size);
// Get hostname and print
MPI_Get_processor_name(node,&len);
printf("Hello world! from rank %d of %d on host %s\n",rank,size,node);
// Finalize and exit
MPI_Finalize();
return 0;
}
```
Compile the above example with:
```console
$ mpicc helloworld_mpi.c -o helloworld_mpi.x
```
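With an Intel MPI module loaded, the same example can be compiled analogously with the Intel wrapper shown above (a minimal sketch; the output file name is illustrative):
```console
$ mpiicc helloworld_mpi.c -o helloworld_mpi.x
```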
## Running MPI Programs
The MPI program executable must be compatible with the loaded MPI module.
Always compile and execute using the very same MPI module.
It is strongly discouraged to mix MPI implementations. Linking an application with one MPI implementation and running `mpirun`/`mpiexec` from another implementation may result in unexpected errors.
The MPI program executable must be available within the same path on all nodes. This is automatically fulfilled on the /home and /scratch filesystems. You need to preload the executable if running on the local scratch /lscratch filesystem.
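For illustration, a minimal sketch of running the compiled example inside a Slurm allocation; the partition, project ID, and module version are illustrative and mirror examples used elsewhere in this document:
```console
$ salloc -p qcpu -A PROJECT_ID --nodes=2 --ntasks-per-node=128
$ ml OpenMPI/4.1.4-GCC-11.3.0
$ mpirun ./helloworld_mpi.x
```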
### Ways to Run MPI Programs
The optimal way to run an MPI program depends on its memory requirements, memory access pattern and communication pattern.
!!! note
Consider these ways to run an MPI program:
1. One MPI process per node, 128 threads per process
2. Two MPI processes per node, 64 threads per process
3. 128 MPI processes per node, 1 thread per process.
**One MPI** process per node, using 128 threads, is most useful for memory demanding applications that make good use of processor cache memory and are not memory-bound. This is also a preferred way for communication intensive applications as one process per node enjoys full bandwidth access to the network interface.
**Two MPI** processes per node, using 64 threads each, bound to processor socket is most useful for memory bandwidth-bound applications such as BLAS1 or FFT with scalable memory demand. However, note that the two processes will share access to the network interface. The 64 threads and socket binding should ensure maximum memory access bandwidth and minimize communication, migration, and NUMA effect overheads.
!!! note
Important! Bind every OpenMP thread to a core!
In the previous two cases with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You want to avoid this by setting the `KMP_AFFINITY` or `GOMP_CPU_AFFINITY` environment variables.
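For example, a hedged sketch of pinning OpenMP threads before launching a hybrid run; the exact affinity strings depend on your OpenMP runtime and the desired thread layout:
```console
$ export OMP_NUM_THREADS=64
$ export KMP_AFFINITY=granularity=fine,compact   # Intel OpenMP runtime
$ export GOMP_CPU_AFFINITY="0-127"               # GNU OpenMP runtime
```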
**128 MPI** processes per node, using 1 thread each bound to a processor core is most suitable for highly scalable applications with low communication demand.
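The three layouts above can be requested, for instance, with OpenMPI's `mpirun` mapping and binding options (a sketch; `./app.x` stands for your executable):
```console
# 1 MPI process per node, 128 OpenMP threads per process
$ mpirun --map-by ppr:1:node --bind-to none -x OMP_NUM_THREADS=128 ./app.x
# 2 MPI processes per node (one per socket), 64 threads per process
$ mpirun --map-by ppr:1:socket --bind-to socket -x OMP_NUM_THREADS=64 ./app.x
# 128 MPI processes per node, 1 thread per process
$ mpirun --map-by core --bind-to core ./app.x
```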
[1]: ../../modules-matrix.md
[a]: http://www.open-mpi.org/
# MPI4Py (MPI for Python)
## Introduction
MPI for Python provides bindings of the Message Passing Interface (MPI) standard for the Python programming language, allowing any Python program to exploit multiple processors.
This package is constructed on top of the MPI-1/2 specifications and provides an object-oriented interface, which closely follows MPI-2 C++ bindings. It supports point-to-point (sends, receives) and collective (broadcasts, scatters, gathers) communications of any picklable Python object, as well as optimized communications of Python object exposing the single-segment buffer interface (NumPy arrays, builtin bytes/string/array objects).
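A minimal sketch contrasting the two styles; the lowercase `send`/`recv` methods transfer pickled Python objects, while the uppercase `Send`/`Recv` methods use the buffer interface for NumPy arrays (run with at least two MPI processes, e.g. `mpirun -n 2 python example.py`):
```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # pickle-based communication of a generic Python object
    comm.send({'a': 7, 'b': 3.14}, dest=1, tag=11)
    # buffer-based communication of a NumPy array
    comm.Send([np.arange(10, dtype='i'), MPI.INT], dest=1, tag=22)
elif rank == 1:
    obj = comm.recv(source=0, tag=11)
    buf = np.empty(10, dtype='i')
    comm.Recv([buf, MPI.INT], source=0, tag=22)
    print('object:', obj, 'array:', buf)
```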
MPI4Py is available in standard Python modules on the clusters.
## Modules
MPI4Py is built for OpenMPI or Intel MPI. Before you start with MPI4Py, you need to load the mpi4py module.
```console
$ ml av mpi4py
-------------------------------------- /apps/modules/lib ---------------------------------------
mpi4py/3.1.4-gompi-2022b mpi4py/3.1.4-gompi-2023a mpi4py/3.1.5-gompi-2023b (D)
```
## Execution
You need to import MPI to your Python program. Include the following line to the Python script:
```python
from mpi4py import MPI
```
The MPI4Py-enabled Python programs execute as any other OpenMPI code. The simplest way is to run:
```console
$ mpirun python <script>.py
```
For example:
```console
$ mpirun python hello_world.py
```
## Examples
Before running the examples below, obtain an allocation and load the mpi4py module:
```console
$ salloc -p qcpu -A PROJECT_ID --nodes=4 --ntasks-per-node=128 --cpus-per-task=1
$ ml mpi4py/3.1.5-gompi-2023b
```
### Hello World!
```python
#!/usr/bin/env python
"""
Parallel Hello World
"""
from mpi4py import MPI
import sys
size = MPI.COMM_WORLD.Get_size()
rank = MPI.COMM_WORLD.Get_rank()
name = MPI.Get_processor_name()
sys.stdout.write(
"Hello, World! I am process %d of %d on %s.\n"
% (rank, size, name))
```
```console
mpirun python ./hello_world.py
...
Hello, World! I am process 81 of 512 on cn041.karolina.it4i.cz.
Hello, World! I am process 91 of 512 on cn041.karolina.it4i.cz.
Hello, World! I am process 15 of 512 on cn041.karolina.it4i.cz.
Hello, World! I am process 105 of 512 on cn041.karolina.it4i.cz.
Hello, World! I am process 112 of 512 on cn041.karolina.it4i.cz.
Hello, World! I am process 11 of 512 on cn041.karolina.it4i.cz.
Hello, World! I am process 83 of 512 on cn041.karolina.it4i.cz.
Hello, World! I am process 58 of 512 on cn041.karolina.it4i.cz.
Hello, World! I am process 103 of 512 on cn041.karolina.it4i.cz.
Hello, World! I am process 4 of 512 on cn041.karolina.it4i.cz.
Hello, World! I am process 28 of 512 on cn041.karolina.it4i.cz.
```
### Mandelbrot
```python
from mpi4py import MPI
import numpy as np
tic = MPI.Wtime()
x1 = -2.0
x2 = 1.0
y1 = -1.0
y2 = 1.0
w = 150
h = 100
maxit = 127
def mandelbrot(x, y, maxit):
c = x + y*1j
z = 0 + 0j
it = 0
while abs(z) < 2 and it < maxit:
z = z**2 + c
it += 1
return it
comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
rmsg = np.empty(4, dtype='f')
imsg = np.empty(3, dtype='i')
if rank == 0:
rmsg[:] = [x1, x2, y1, y2]
imsg[:] = [w, h, maxit]
comm.Bcast([rmsg, MPI.FLOAT], root=0)
comm.Bcast([imsg, MPI.INT], root=0)
x1, x2, y1, y2 = [float(r) for r in rmsg]
w, h, maxit = [int(i) for i in imsg]
dx = (x2 - x1) / w
dy = (y2 - y1) / h
# number of lines to compute here
N = h // size + (h % size > rank)
N = np.array(N, dtype='i')
# indices of lines to compute here
I = np.arange(rank, h, size, dtype='i')
# compute local lines
C = np.empty([N, w], dtype='i')
for k in np.arange(N):
y = y1 + I[k] * dy
for j in np.arange(w):
x = x1 + j * dx
C[k, j] = mandelbrot(x, y, maxit)
# gather results at root
counts = 0
indices = None
cdata = None
if rank == 0:
counts = np.empty(size, dtype='i')
indices = np.empty(h, dtype='i')
cdata = np.empty([h, w], dtype='i')
comm.Gather(sendbuf=[N, MPI.INT],
recvbuf=[counts, MPI.INT],
root=0)
comm.Gatherv(sendbuf=[I, MPI.INT],
recvbuf=[indices, (counts, None), MPI.INT],
root=0)
comm.Gatherv(sendbuf=[C, MPI.INT],
recvbuf=[cdata, (counts*w, None), MPI.INT],
root=0)
# reconstruct full result at root
if rank == 0:
M = np.zeros([h,w], dtype='i')
M[indices, :] = cdata
toc = MPI.Wtime()
wct = comm.gather(toc-tic, root=0)
if rank == 0:
for task, time in enumerate(wct):
print('wall clock time: %8.2f seconds (task %d)' % (time, task))
def mean(seq): return sum(seq)/len(seq)
print ('all tasks, mean: %8.2f seconds' % mean(wct))
print ('all tasks, min: %8.2f seconds' % min(wct))
print ('all tasks, max: %8.2f seconds' % max(wct))
print ('all tasks, sum: %8.2f seconds' % sum(wct))
# eye candy (requires matplotlib)
if rank == 0:
try:
from matplotlib import pyplot as plt
plt.imshow(M, aspect='equal')
plt.set_cmap('nipy_spectral')
try:
import signal
def action(*args): raise SystemExit
signal.signal(signal.SIGALRM, action)
signal.alarm(2)
except:
pass
plt.show()
except:
pass
MPI.COMM_WORLD.Barrier()
```
```console
mpirun python mandelbrot.py
...
wall clock time: 0.26 seconds (task 505)
wall clock time: 0.25 seconds (task 506)
wall clock time: 0.24 seconds (task 507)
wall clock time: 0.25 seconds (task 508)
wall clock time: 0.25 seconds (task 509)
wall clock time: 0.26 seconds (task 510)
wall clock time: 0.25 seconds (task 511)
all tasks, mean: 0.19 seconds
all tasks, min: 0.00 seconds
all tasks, max: 0.73 seconds
all tasks, sum: 96.82 seconds
```
In this example, we run the MPI4Py-enabled code on 4 nodes, 128 cores per node (a total of 512 processes); each Python process is bound to a different core. More examples and documentation can be found on the [MPI for Python webpage][a].
You can increase the number of processes and watch the wall clock time decrease.
[a]: https://pypi.python.org/pypi/mpi4py
# OpenMPI Sample Applications
Sample MPI applications provided both as a trivial primer to MPI as well as simple tests to ensure that your OpenMPI installation is working properly.
## Examples
There are two MPI examples, each using one of six different MPI interfaces:
### Hello World
```
/*
* Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana
* University Research and Technology
* Corporation. All rights reserved.
* Copyright (c) 2006 Cisco Systems, Inc. All rights reserved.
*
* Sample MPI "hello world" application in C
*/
#include <stdio.h>
#include "mpi.h"
int main(int argc, char* argv[])
{
int rank, size, len;
char version[MPI_MAX_LIBRARY_VERSION_STRING];
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Get_library_version(version, &len);
printf("Hello, world, I am %d of %d, (%s, %d)\n",
rank, size, version, len);
MPI_Finalize();
return 0;
}
```
```
//
// Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana
// University Research and Technology
// Corporation. All rights reserved.
// Copyright (c) 2006 Cisco Systems, Inc. All rights reserved.
//
// Sample MPI "hello world" application in C++
//
// NOTE: The MPI C++ bindings were deprecated in MPI-2.2 and removed
// from the standard in MPI-3. Open MPI still provides C++ MPI
// bindings, but they are no longer built by default (and may be
// removed in a future version of Open MPI). You must
// --enable-mpi-cxx when configuring Open MPI to enable the MPI C++
// bindings.
//
#include "mpi.h"
#include <iostream>
int main(int argc, char **argv)
{
int rank, size, len;
char version[MPI_MAX_LIBRARY_VERSION_STRING];
MPI::Init();
rank = MPI::COMM_WORLD.Get_rank();
size = MPI::COMM_WORLD.Get_size();
MPI_Get_library_version(version, &len);
std::cout << "Hello, world! I am " << rank << " of " << size
<< "(" << version << ", " << len << ")" << std::endl;
MPI::Finalize();
return 0;
}
```
```
C
C Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana
C University Research and Technology
C Corporation. All rights reserved.
C Copyright (c) 2006-2015 Cisco Systems, Inc. All rights reserved.
C $COPYRIGHT$
C
C Sample MPI "hello world" application using the Fortran mpif.h
C bindings.
C
program main
implicit none
include 'mpif.h'
integer ierr, rank, size, len
character(len=MPI_MAX_LIBRARY_VERSION_STRING) version
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
call MPI_GET_LIBRARY_VERSION(version, len, ierr)
write(*, '("Hello, world, I am ", i2, " of ", i2, ": ", a)')
& rank, size, version
call MPI_FINALIZE(ierr)
end
```
```
!
! Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana
! University Research and Technology
! Corporation. All rights reserved.
! Copyright (c) 2004-2005 The Regents of the University of California.
! All rights reserved.
! Copyright (c) 2006-2015 Cisco Systems, Inc. All rights reserved.
! $COPYRIGHT$
!
! Sample MPI "hello world" application using the Fortran mpi module
! bindings.
!
program main
use mpi
implicit none
integer :: ierr, rank, size, len
character(len=MPI_MAX_LIBRARY_VERSION_STRING) :: version
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
call MPI_GET_LIBRARY_VERSION(version, len, ierr)
write(*, '("Hello, world, I am ", i2, " of ", i2, ": ", a)') &
rank, size, version
call MPI_FINALIZE(ierr)
end
```
```
/*
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
/*
* Author of revised version: Franklyn Pinedo
*
* Adapted from Source Code in C of Tutorial/User's Guide for MPI by
* Peter Pacheco.
*/
/*
* Copyright (c) 2011 Cisco Systems, Inc. All rights reserved.
*
*/
import mpi.*;
class Hello {
static public void main(String[] args) throws MPIException {
MPI.Init(args);
int myrank = MPI.COMM_WORLD.getRank();
int size = MPI.COMM_WORLD.getSize() ;
System.out.println("Hello world from rank " + myrank + " of " + size);
MPI.Finalize();
}
}
```
* C: [hello_c.c](../../src/ompi/hello_c.c)
* C++: [hello_cxx.cc](../../src/ompi/hello_cxx.cc)
* Fortran mpif.h: [hello_mpifh.f](../../src/ompi/hello_mpifh.f)
* Fortran use mpi: [hello_usempi.f90](../../src/ompi/hello_usempi.f90)
* Fortran use mpi_f08: [hello_usempif08.f90](../../src/ompi/hello_usempif08.f90)
* Java: [Hello.java](../../src/ompi/Hello.java)
* C shmem.h: [hello_oshmem_c.c](../../src/ompi/hello_oshmem_c.c)
* Fortran shmem.fh: [hello_oshmemfh.f90](../../src/ompi/hello_oshmemfh.f90)
<!-- markdownlint-disable MD001 -->
### Send a Trivial Message Around in a Ring
* C: [ring_c.c](../../src/ompi/ring_c.c)
* C++: [ring_cxx.cc](../../src/ompi/ring_cxx.cc)
* Fortran mpif.h: [ring_mpifh.f](../../src/ompi/ring_mpifh.f)
* Fortran use mpi: [ring_usempi.f90](../../src/ompi/ring_usempi.f90)
* Fortran use mpi_f08: [ring_usempif08.f90](../../src/ompi/ring_usempif08.f90)
* Java: [Ring.java](../../src/ompi/Ring.java)
* C shmem.h: [ring_oshmem_c.c](../../src/ompi/ring_oshmem_c.c)
* Fortran shmem.fh: [ring_oshmemfh.f90](../../src/ompi/ring_oshmemfh.f90)
Additionally, there's one further example application, but this one only uses the MPI C bindings:
<!-- markdownlint-disable MD001 -->
### Test the Connectivity Between All Processes
* C: [connectivity_c.c](../../src/ompi/connectivity_c.c)
## Build Examples
Download [examples](../../src/ompi/ompi.tar.gz).
The Makefile in this directory will build the examples for the supported languages (e.g., if you do not have the Fortran "use mpi" bindings compiled as part of OpenMPI, those examples will be skipped).
The Makefile assumes that the wrapper compilers mpicc, mpic++, and mpifort are in your path.
Although the Makefile is tailored for OpenMPI (e.g., it checks the *ompi_info* command to see if you have support for C++, mpif.h, use mpi, and use mpi_f08 F90), all of the example programs are pure MPI, and therefore not specific to OpenMPI. Hence, you can use a different MPI implementation to compile and run these programs if you wish.
```console
$ tar xvf ompi.tar.gz
ompi/
ompi/ring_usempif08.f90
ompi/hello_c.c
ompi/oshmem_max_reduction.c
...
...
ompi/hello_usempi.f90
$ cd ompi
$ ml OpenMPI/4.1.4-GCC-11.3.0
$ make
mpicc -g hello_c.c -o hello_c
mpicc -g ring_c.c -o ring_c
mpicc -g connectivity_c.c -o connectivity_c
mpicc -g spc_example.c -o spc_example
mpic++ -g hello_cxx.cc -o hello_cxx
mpic++ -g ring_cxx.cc -o ring_cxx
mpifort -g hello_mpifh.f -o hello_mpifh
mpifort -g ring_mpifh.f -o ring_mpifh
mpifort -g hello_usempi.f90 -o hello_usempi
mpifort -g ring_usempi.f90 -o ring_usempi
mpifort -g hello_usempif08.f90 -o hello_usempif08
mpifort -g ring_usempif08.f90 -o ring_usempif08
mpijavac Hello.java
mpijavac Ring.java
shmemcc -g hello_oshmem_c.c -o hello_oshmem
shmemc++ -g hello_oshmem_cxx.cc -o hello_oshmemcxx
shmemcc -g ring_oshmem_c.c -o ring_oshmem
shmemcc -g oshmem_shmalloc.c -o oshmem_shmalloc
shmemcc -g oshmem_circular_shift.c -o oshmem_circular_shift
shmemcc -g oshmem_max_reduction.c -o oshmem_max_reduction
shmemcc -g oshmem_strided_puts.c -o oshmem_strided_puts
shmemcc -g oshmem_symmetric_data.c -o oshmem_symmetric_data
shmemfort -g hello_oshmemfh.f90 -o hello_oshmemfh
shmemfort -g ring_oshmemfh.f90 -o ring_oshmemfh
$ find . -executable -type f
./hello_oshmem
./dtrace/myppriv.sh
./dtrace/partrace.sh
./oshmem_shmalloc
./ring_cxx
./ring_usempi
./hello_mpifh
./hello_cxx
./oshmem_max_reduction
./oshmem_symmetric_data
./oshmem_strided_puts
./hello_usempif08
./ring_usempif08
./spc_example
./hello_oshmemfh
./ring_oshmem
./oshmem_circular_shift
./hello_c
./ring_c
./hello_usempi
./ring_oshmemfh
./connectivity_c
./ring_mpifh
```
# Numerical Languages
Interpreted languages for numerical computations and analysis
## Introduction
This section contains a collection of high-level interpreted languages, primarily intended for numerical computations.
## MATLAB
MATLAB® is a high-level language and interactive environment for numerical computation, visualization, and programming.
```console
$ ml av MATLAB
-------------- /apps/modules/math --------------
MATLAB/2021a
$ ml MATLAB/2021a
$ matlab
```
Read more at the [MATLAB page][1].
## Octave
GNU Octave is a high-level interpreted language, primarily intended for numerical computations. The Octave language is quite similar to MATLAB so that most programs are easily portable.
```console
$ ml av Octave
-------------- /apps/modules/math --------------
Octave/6.3.0-intel-2020b-without-X11
$ ml Octave/6.3.0-intel-2020b-without-X11
$ octave
```
Read more at the [Octave page][2].
## R
R is an interpreted language and environment for statistical computing and graphics.
```console
$ ml av R/
-------------- /apps/modules/math --------------
$ ml R
$ R
```
Read more at the [R page][3].
[1]: matlab.md
[2]: octave.md
[3]: r.md
# MATLAB
## Introduction
MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language
and numeric computing environment developed by MathWorks.
MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms,
creation of user interfaces, and interfacing with programs written in other languages.
## Installed Versions
For the current list of installed versions, use:
```console
$ ml av MATLAB
```
## MATLAB GUI
If you need to use the MATLAB GUI to prepare your MATLAB programs, you can use MATLAB directly on the login nodes.
However, for all computations, use MATLAB on the compute nodes via Slurm workload manager.
If you require the MATLAB GUI, follow the general information about [running graphical applications][1].
The MATLAB GUI is quite slow over the X forwarding built into Slurm (`srun --x11`),
so using X11 display redirection either via SSH or directly by `xauth`
(see the [GUI Applications on Compute Nodes over VNC][1] section) is recommended.
To run MATLAB with GUI, use:
```console
$ matlab
```
To run MATLAB in text mode, without the MATLAB Desktop GUI environment, use:
```console
$ matlab -nodesktop -nosplash
```
Plots, images, etc. will still be available.
## MATLAB Configuration
### Client Configuration
After logging into the cluster, start MATLAB.
On the Home tab, click Parallel > Discover Clusters… to discover the profile.
![](../../img/dis_clluster.png)
Jobs will now default to the cluster rather than to the local machine.
### Job Configuration
Prior to submitting the job, various parameters can be assigned, such as queue, e-mail, walltime, etc.
The following is a partial list of parameters.
See `AdditionalProperties` for the complete list.
Only the `ProjectName` is required.
```
>> % Get a handle to the cluster
>> c = parcluster;
[REQUIRED]
>> % Specify the project name
>> c.AdditionalProperties.ProjectName = 'project-name';
[OPTIONAL]
>> % Specify a constraint
>> c.AdditionalProperties.Constraint = 'feature-name';
>> % Request email notification of job status
>> c.AdditionalProperties.EmailAddress = 'user-id@university.edu';
>> % Specify number of GPUs
>> c.AdditionalProperties.GpusPerNode = 1;
>> c.AdditionalProperties.GpuCard = 'gpu-card';
>> % Specify memory to use, per core (default: 4gb) [FOR GPU NODES]
>> c.AdditionalProperties.MemPerCPU = '6gb';
>> % Specify the partition
>> c.AdditionalProperties.Partition = 'partition-name';
>> % Specify cores per node
>> c.AdditionalProperties.ProcsPerNode = 4;
>> % Specify QoS
>> c.AdditionalProperties.QoS = 'qos-name';
>> % Use reservation
>> c.AdditionalProperties.Reservation = 'reservation-name';
>> % Specify the wall time (e.g., 1 day, 5 hours, 30 minutes)
>> c.AdditionalProperties.WallTime = '1-05:30';
```
Save changes after modifying AdditionalProperties for the above changes to persist between MATLAB sessions.
```
>> c.saveProfile
```
To see the values of the current configuration options, display AdditionalProperties.
```
>> % To view current properties
>> c.AdditionalProperties
```
Unset a value when no longer needed.
```
>> % Turn off email notifications
>> c.AdditionalProperties.EmailAddress = '';
>> c.saveProfile
```
## Running Job
### Interactive Jobs
To run an interactive pool job on the cluster, continue to use `parpool` as before.
```
>> % Get a handle to the cluster
>> c = parcluster;
>> % Open a pool of 64 workers on the cluster
>> pool = c.parpool(64);
```
Rather than running on the local machine, the pool can now run across multiple nodes on the cluster.
```
>> % Run a parfor over 1000 iterations
>> parfor idx = 1:1000
a(idx) = rand;
end
```
Delete the pool when it’s no longer needed.
```
>> % Delete the pool
>> pool.delete
```
### Independent Batch Job
Use the batch command to submit asynchronous jobs to the cluster.
The batch command will return a job object which is used to access the output of the submitted job.
See the MATLAB documentation for more help on batch.
```
>> % Get a handle to the cluster
>> c = parcluster;
>> % Submit job to query where MATLAB is running on the cluster
>> job = c.batch(@pwd, 1, {});
>> % Query job for state
>> job.State
>> % If state is finished, fetch the results
>> job.fetchOutputs{:}
>> % Delete the job after results are no longer needed
>> job.delete
```
To retrieve a list of running or completed jobs, call `parcluster` to return the cluster object.
The cluster object stores an array of jobs that are queued to run, are running, have run, or have failed.
Retrieve and view the list of jobs as shown below.
```
>> c = parcluster;
>> jobs = c.Jobs
>>
>> % Get a handle to the second job in the list
>> job2 = c.Jobs(2);
```
Once the job has been selected, fetch the results as previously done.
`fetchOutputs` is used to retrieve function output arguments; if calling `batch` with a script, use `load` instead.
Data that has been written to files on the cluster needs to be retrieved directly from the file system (e.g., via SFTP).
```
>> % Fetch all results from the second job in the list
>> job2.fetchOutputs{:}
```
### Parallel Batch Job
`batch` can also submit parallel workflows.
Let’s use the following example for a parallel job, which is saved as `parallel_example.m`.
```
function [sim_t, A] = parallel_example(iter)
if nargin==0
iter = 8;
end
disp('Start sim')
t0 = tic;
parfor idx = 1:iter
A(idx) = idx;
pause(2)
idx
end
sim_t = toc(t0);
disp('Sim completed')
save RESULTS A
end
```
This time when using the `batch` command, also specify a MATLAB `Pool` argument.
```
>> % Get a handle to the cluster
>> c = parcluster;
>> % Submit a batch pool job using 4 workers for 16 simulations
>> job = c.batch(@parallel_example, 1, {16}, 'Pool',4);
>> % View current job status
>> job.State
>> % Fetch the results after a finished state is retrieved
>> job.fetchOutputs{:}
ans =
8.8872
```
The job ran in 8.89 seconds using four workers.
Note that these jobs will always request N+1 CPU cores, since one worker is required to manage the batch job and pool of workers.
For example, a job that needs eight workers will request nine CPU cores.
Run the same simulation but increase the Pool size. This time, to retrieve the results later, keep track of the job ID.
!!! note
For some applications, there will be a diminishing return when allocating too many workers,
as the overhead may exceed computation time.
```
>> % Get a handle to the cluster
>> c = parcluster;
>> % Submit a batch pool job using 8 workers for 16 simulations
>> job = c.batch(@parallel_example, 1, {16}, 'Pool',8);
>> % Get the job ID
>> id = job.ID
id =
4
>> % Clear job from workspace (as though MATLAB exited)
>> clear job
```
With a handle to the cluster, the `findJob` method searches for the job with the specified job ID.
```
>> % Get a handle to the cluster
>> c = parcluster;
>> % Find the old job
>> job = c.findJob('ID', 4);
>> % Retrieve the state of the job
>> job.State
ans =
finished
>> % Fetch the results
>> job.fetchOutputs{:};
ans =
4.7270
```
The job now runs in 4.73 seconds using eight workers.
Run the code with a different number of workers to determine the ideal number to use.
Alternatively, to retrieve job results via a graphical user interface, use the Job Monitor (Parallel > Monitor Jobs).
![](../../img/monitor_job.png)
## Helper Functions
| Function | Description |
| --------------------- | ------------------------------------ |
| clusterFeatures | List of cluster features/constraints |
| clusterGpuCards | List of cluster GPU cards |
| clusterPartitionNames | List of cluster partitions |
| willRun | Explain why job is queued |
### Debugging
If a serial job produces an error, call the `getDebugLog` method to view the error log file.
When submitting an independent job, specify the task.
```
>> c.getDebugLog(job.Tasks)
```
For Pool jobs, only specify the job object.
```
>> c.getDebugLog(job)
```
When troubleshooting a job, the cluster admin may request the scheduler ID of the job.
This can be obtained by calling `getTaskSchedulerIDs()`.
```
>> job.getTaskSchedulerIDs()
ans =
25539
```
## Additional Information
For more information about the MATLAB Parallel Computing Toolbox,
see the following resources:
* [Parallel Computing Overview][c]
* [Parallel Computing Documentation][d]
* [Parallel Computing Coding Examples][e]
* [Parallel Computing Tutorials][f]
* [Parallel Computing Videos][g]
* [Parallel Computing Webinars][h]
[1]: ../../general/accessing-the-clusters/graphical-user-interface/vnc.md#gui-applications-on-compute-nodes-over-vnc
[2]: #running-parallel-matlab-using-distributed-computing-toolbox---engine
[3]: ../isv_licenses.md
[4]: #parallel-matlab-batch-job-in-local-mode
[a]: https://www.mathworks.com/help/parallel-computing/release-notes.html
[b]: https://www.e-infra.cz/en
[c]: https://www.mathworks.com/products/parallel-computing.html
[d]: https://www.mathworks.com/help/parallel-computing/index.html
[e]: https://www.mathworks.com/help/parallel-computing/examples.html
[f]: https://www.mathworks.com/videos/series/parallel-and-gpu-computing-tutorials-97719.html
[g]: https://www.mathworks.com/videos/search.html?q=&fq%5B%5D=product:DM&page=1
[h]: https://www.mathworks.com/videos/search.html?q=&fq%5B%5D=product:DM&fq%5B%5D=video-external-category:recwebinar&page=1
# Octave
## Introduction
GNU Octave is a high-level interpreted language, primarily intended for numerical computations. It provides capabilities for the numerical solution of linear and nonlinear problems, and for performing other numerical experiments. It also provides extensive graphics capabilities for data visualization and manipulation. Octave is normally used through its interactive command line interface, but it can also be used to write non-interactive programs. The Octave language is quite similar to Matlab so that most programs are easily portable. Read more [here][a].
For a list of available modules, type:
```console
$ ml av octave
------------------------------- /apps/modules/math -------------------------------
Octave/6.3.0-intel-2020b-without-X11
```
## Modules and Execution
To load the latest version of Octave, load the module:
```console
$ ml Octave
```
Octave on the clusters is linked to a highly optimized MKL mathematical library. This provides threaded parallelization to many Octave kernels, notably the linear algebra subroutines. Octave runs these heavy calculation kernels without any penalty. By default, Octave parallelizes to 128 threads on Karolina. You may control the number of threads by setting the `OMP_NUM_THREADS` environment variable.
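For example, to limit the MKL-threaded kernels to 16 threads before starting Octave (the value is illustrative):
```console
$ export OMP_NUM_THREADS=16
$ octave
```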
To run Octave interactively, log in with the `ssh -X` parameter for X11 forwarding. Run Octave:
```console
$ octave
```
To run Octave in batch mode, write an Octave script, then write a bash jobscript and submit it via the `sbatch` command. By default, Octave will use 128 threads on Karolina when running MKL kernels.
```bash
#!/bin/bash
# change to local scratch directory
DIR=/scratch/project/PROJECT_ID/$SLURM_JOB_ID
mkdir -p "$DIR"
cd "$DIR" || exit
# copy input file to scratch
cp $SLURM_SUBMIT_DIR/octcode.m .
# load octave module
ml Octave/6.3.0-intel-2020b-without-X11
# execute the calculation
octave -q --eval octcode > output.out
# copy output file to home
cp output.out $SLURM_SUBMIT_DIR/.
#exit
exit
```
This script may be submitted directly to Slurm via the `sbatch` command. The inputs are in the octcode.m file, the outputs in the output.out file. See the single node jobscript example in the [Job execution section][1].
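Assuming the jobscript above is saved as `octave_job.sh` (the file name is illustrative), submit it with:
```console
$ sbatch octave_job.sh
```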
The Octave C compiler `mkoctfile` calls the GNU GCC compiler for compiling native C code. This is very useful for running native C subroutines in the Octave environment.
```console
$ mkoctfile -v
mkoctfile, version 6.3.0
```
Octave may use MPI for interprocess communication. This functionality is currently not supported on the clusters. In case you require the Octave interface to MPI, contact [support][b].
[1]: ../../general/job-submission-and-execution.md
[a]: https://www.gnu.org/software/octave/
[b]: https://support.it4i.cz/rt/
[c]: https://octave.sourceforge.net/parallel/
# OpenCoarrays
## Introduction
Coarray Fortran (CAF) is an extension of Fortran language and offers a simple interface for parallel processing and memory sharing.
The advantage is that only small changes are required to convert existing Fortran code to support a robust and potentially efficient parallelism.
A CAF program is interpreted as if it was replicated a number of times and all copies were executed asynchronously.
The number of copies is decided at execution time. Each copy (called *image*) has its own private variables.
The variable syntax of Fortran language is extended with indexes in square brackets (called *co-dimension*) representing a reference to data distributed across images.
By default, CAF uses the Message Passing Interface (MPI) for lower-level communication, so there are some similarities with MPI.
Read more [here][a].
## Coarray Basics
### Indexing of Coarray Images
Indexing of individual images can be shown on the simple *Hello World* program:
```fortran
program hello_world
implicit none
print *, 'Hello world from image ', this_image() , 'of', num_images()
end program hello_world
```
* `num_images()` - returns the number of all images
* `this_image()` - returns the image index - numbered from 1 to `num_images()`
### Co-Dimension Variables Declaration
Coarray variables can be declared with the `codimension[*]` attribute or by adding a trailing index `[*]` after the variable name.
Note that the `*` character always has to be inside the square brackets.
```fortran
integer, codimension[*] :: scalar
integer :: scalar[*]
real, dimension(64), codimension[*] :: vector
real :: vector(64)[*]
```
### Images Synchronization
Because each image runs independently, image synchronization is needed to ensure that all altered data is distributed to all images.
Synchronization can be done across all images or only between selected images. Be aware that selective synchronization can lead to race conditions or deadlocks.
Example program:
```fortran
program synchronization_test
implicit none
integer :: i ! Local variable
integer :: numbers[*] ! Scalar coarray
! Generate a random number on image 1
if (this_image() == 1) then
numbers = floor(rand(1) * 1000)
! Distribute information to other images
do i = 2, num_images()
numbers[i] = numbers
end do
end if
sync all ! Barrier to synchronize all images
print *, 'The random number is', numbers
end program synchronization_test
```
* `sync all` - Synchronize all images between each other
* `sync images(*)` - Synchronize this image with all other images
* `sync images(index)` - Synchronize this image with the image `index`
!!! note
`number` is the local variable while `number[index]` accesses the variable in the specific image.
`number[this_image()]` is the same as `number`.
## Compile and Run
Currently, version 2.9.2 compiled with the OpenMPI 4.0.5 library is installed on the cluster. To load the `OpenCoarrays` module, type:
```console
$ ml OpenCoarrays/2.9.2-gompi-2020b
```
### Compile CAF Program
The preferred method for compiling a CAF program is by invoking the `caf` compiler wrapper.
The above mentioned *Hello World* program can be compiled as follows:
```console
$ caf hello_world.f90 -o hello_world.x
```
!!! warning
Input files with the **.f90** or **.F90** extension are interpreted as *Fortran 90*.
If the input file extension is **.f** or **.F**, the source code will be interpreted as *Fortran 77*.
Another method for compiling is by invoking the `mpif90` compiler wrapper directly:
```console
$ mpif90 hello_world.f90 -o hello_world.x -fcoarray=lib -lcaf_mpi
```
### Run CAF Program
A CAF program can be run by invoking the `cafrun` wrapper or directly by the `mpirun`:
```console
$ cafrun -np 4 ./hello_world.x
Hello world from image 1 of 4
Hello world from image 2 of 4
Hello world from image 3 of 4
Hello world from image 4 of 4
$ mpirun -np 4 ./synchronization_test.x
The random number is 242
The random number is 242
The random number is 242
The random number is 242
```
`-np 4` is the number of images to run. The parameters of `cafrun` and `mpirun` are the same.
[a]: http://www.opencoarrays.org/
# R
## Introduction
R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, etc.) and graphical techniques, and is highly extensible.
One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.
Another convenience is the ease with which the C code or third party libraries may be integrated within R.
Extensive support for parallel computing is available within R.
Read more on [http://www.r-project.org/][a] and [http://cran.r-project.org/doc/manuals/r-release/R-lang.html][b].
## Modules
R version 3.1.1 is available on the cluster, along with the RStudio GUI interface.
| Application | Version | module |
| ----------- | ----------------- | ------------------- |
| **R** | R 3.1.1 | R/3.1.1-intel-2015b |
```console
$ ml R
```
## Execution
R on the cluster is linked to a highly optimized MKL mathematical library. This provides threaded parallelization to many R kernels, notably the linear algebra subroutines. R runs these heavy calculation kernels without any penalty. You may control the number of threads by setting the `OMP_NUM_THREADS` environment variable.
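For example, to restrict the MKL-threaded kernels to 16 threads before starting R (the value is illustrative):
```console
$ export OMP_NUM_THREADS=16
$ R
```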
### Interactive Execution
To run R interactively, using RStudio GUI, log in with the `ssh -X` parameter for X11 forwarding. Run RStudio:
```console
$ ml RStudio
$ rstudio
```
### Batch Execution
To run R in batch mode, write an R script, then write a bash jobscript and execute via the `sbatch` command. By default, R will use 24 threads on Salomon when running MKL kernels.
Example jobscript:
```bash
#!/bin/bash
# change to local scratch directory
DIR=/scratch/project/PROJECT_ID/$SLURM_JOBID
mkdir -p "$DIR"
cd "$DIR" || exit
# copy input file to scratch
cp $SLURM_SUBMIT_DIR/rscript.R .
# load R module
ml R
# execute the calculation
R CMD BATCH rscript.R routput.out
# copy output file to home
cp routput.out $SLURM_SUBMIT_DIR/.
#exit
exit
```
The inputs are in the `rscript.R` file, the outputs in the `routput.out` file.
See the single node jobscript example in the [Job execution section][1].
## Parallel R
Parallel execution of R may be achieved in many ways. One approach is the implied parallelization due to linked libraries or specially enabled functions, as [described above][2]. In the following sections, we focus on explicit parallelization, where parallel constructs are directly stated within the R script.
## Package Parallel
The package parallel provides support for parallel computation, including by forking (taken from package multicore), by sockets (taken from package snow) and random-number generation.
The package is activated this way:
```console
$ R
> library(parallel)
```
More information and examples may be obtained directly by reading the documentation available in R:
```r
> ?parallel
> library(help = "parallel")
> vignette("parallel")
```
Forking is the simplest to use. The forking family of functions provides a parallelized, drop-in replacement for the serial `apply()` family of functions.
!!! warning
Forking via the package parallel provides functionality similar to the OpenMP construct `omp parallel for`.
Only cores of a single node can be utilized this way!
Forking example:
```r
library(parallel)
#integrand function
f <- function(i,h) {
x <- h*(i-0.5)
return (4/(1 + x*x))
}
#initialize
size <- detectCores()
while (TRUE)
{
#read number of intervals
cat("Enter the number of intervals: (0 quits) ")
fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp)
if(n<=0) break
#run the calculation
n <- max(n,size)
h <- 1.0/n
i <- seq(1,n);
pi3 <- h*sum(simplify2array(mclapply(i,f,h,mc.cores=size)));
#print results
cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi))
}
```
The above example is the classic parallel example for calculating the number π. Note the `detectCores()` and `mclapply()` functions. Execute the example as:
```console
$ R --slave --no-save --no-restore -f pi3p.R
```
Every evaluation of the integrand function runs in parallel in a different process.
## Package Rmpi
The Rmpi package provides an interface (wrapper) to MPI APIs.
It also provides an interactive R slave environment. On the cluster, Rmpi provides an interface to [OpenMPI][5].
Read more on Rmpi [here][c], reference manual is available [here][d].
When using the Rmpi package, both the `openmpi` and `R` modules must be loaded:
```console
$ ml OpenMPI
$ ml R
```
Rmpi may be used in three basic ways. The static approach is identical to executing any other MPI program. In addition, there is the Rslaves dynamic MPI approach and the mpi.apply approach. In the following sections, we will use the number π integration example to illustrate all these concepts.
### Static Rmpi
Static Rmpi programs are executed via `mpiexec`, as any other MPI programs. The number of processes is static - given at the launch time.
Static Rmpi example:
```r
library(Rmpi)
#integrand function
f <- function(i,h) {
x <- h*(i-0.5)
return (4/(1 + x*x))
}
#initialize
invisible(mpi.comm.dup(0,1))
rank <- mpi.comm.rank()
size <- mpi.comm.size()
n<-0
while (TRUE)
{
#read number of intervals
if (rank==0) {
cat("Enter the number of intervals: (0 quits) ")
fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp)
}
#broadcast the intervals
n <- mpi.bcast(as.integer(n),type=1)
if(n<=0) break
#run the calculation
n <- max(n,size)
h <- 1.0/n
i <- seq(rank+1,n,size);
mypi <- h*sum(sapply(i,f,h));
pi3 <- mpi.reduce(mypi)
#print results
if (rank==0) cat(sprintf("Value of PI %16.14f, diff= %16.14f\n",pi3,pi3-pi))
}
mpi.quit()
```
The above is the static MPI example for calculating the number π. Note the `library(Rmpi)` and `mpi.comm.dup()` function calls. Execute the example as:
```console
$ mpirun R --slave --no-save --no-restore -f pi3.R
```
### Dynamic Rmpi
Dynamic Rmpi programs are executed by calling R directly. The `OpenMPI` module must still be loaded. The R slave processes will be spawned by a function call within the Rmpi program.
Dynamic Rmpi example:
```r
#integrand function
f <- function(i,h) {
x <- h*(i-0.5)
return (4/(1 + x*x))
}
#the worker function
workerpi <- function()
{
#initialize
rank <- mpi.comm.rank()
size <- mpi.comm.size()
n<-0
while (TRUE)
{
#read number of intervals
if (rank==0) {
cat("Enter the number of intervals: (0 quits) ")
fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp)
}
#broadcast the intervals
n <- mpi.bcast(as.integer(n),type=1)
if(n<=0) break
#run the calculation
n <- max(n,size)
h <- 1.0/n
i <- seq(rank+1,n,size);
mypi <- h*sum(sapply(i,f,h));
pi3 <- mpi.reduce(mypi)
#print results
if (rank==0) cat(sprintf("Value of PI %16.14f, diff= %16.14f\n",pi3,pi3-pi))
}
}
#main
library(Rmpi)
cat("Enter the number of slaves: ")
fp<-file("stdin"); ns<-scan(fp,nmax=1); close(fp)
mpi.spawn.Rslaves(nslaves=ns)
mpi.bcast.Robj2slave(f)
mpi.bcast.Robj2slave(workerpi)
mpi.bcast.cmd(workerpi())
workerpi()
mpi.quit()
```
The above example is the dynamic MPI example for calculating the number π. Both master and slave processes carry out the calculation. Note the `mpi.spawn.Rslaves()`, `mpi.bcast.Robj2slave()`, **and the `mpi.bcast.cmd()`** function calls.
Execute the example as:
```console
$ mpirun -np 1 R --slave --no-save --no-restore -f pi3Rslaves.R
```
Note that this method uses `MPI_Comm_spawn` (Dynamic process feature of MPI-2) to start the slave processes - the master process needs to be launched with MPI. In general, Dynamic processes are not well supported among MPI implementations, some issues might arise. In addition, environment variables are not propagated to spawned processes, so they will not see paths from modules.
### mpi.apply Rmpi
`mpi.apply` is a specific way of executing Dynamic Rmpi programs.
The `mpi.apply()` family of functions provides an MPI-parallelized, drop-in replacement for the serial `apply()` family of functions.
Execution is identical to other dynamic Rmpi programs.
mpi.apply Rmpi example:
```r
#integrand function
f <- function(i,h) {
x <- h*(i-0.5)
return (4/(1 + x*x))
}
#the worker function
workerpi <- function(rank,size,n)
{
#run the calculation
n <- max(n,size)
h <- 1.0/n
i <- seq(rank,n,size);
mypi <- h*sum(sapply(i,f,h));
return(mypi)
}
#main
library(Rmpi)
cat("Enter the number of slaves: ")
fp<-file("stdin"); ns<-scan(fp,nmax=1); close(fp)
mpi.spawn.Rslaves(nslaves=ns)
mpi.bcast.Robj2slave(f)
mpi.bcast.Robj2slave(workerpi)
while (TRUE)
{
#read number of intervals
cat("Enter the number of intervals: (0 quits) ")
fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp)
if(n<=0) break
#run workerpi
i=seq(1,2*ns)
pi3=sum(mpi.parSapply(i,workerpi,2*ns,n))
#print results
cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi))
}
mpi.quit()
```
The above is the mpi.apply MPI example for calculating the number π. Only the slave processes carry out the calculation. Note the `mpi.parSapply()` function call. The package parallel example above may be trivially adapted (for much better performance) to this structure using `mclapply()` in place of `mpi.parSapply()`.
Execute the example as:
```console
$ mpirun -np 1 R --slave --no-save --no-restore -f pi3parSapply.R
```
## Combining Parallel and Rmpi
Currently, the two packages cannot be combined for hybrid calculations.
## Parallel Execution
R parallel jobs are executed via the Slurm partition system exactly as any other parallel jobs. The user must create an appropriate jobscript and submit it via `sbatch`.
An example jobscript for [static Rmpi][4] parallel R execution, running 1 process per core:
```bash
#!/bin/bash
#SBATCH -p qprod
#SBATCH -J Rjob
#SBATCH --nodes=100 --ntasks-per-node=24 --cpus-per-task=1
# change to scratch directory
DIR=/scratch/project/PROJECT_ID/$SLURM_JOBID
mkdir -p "$DIR"
cd "$DIR" || exit
# copy input file to scratch
cp $SLURM_SUBMIT_DIR/rscript.R .
# load R and openmpi module
ml R OpenMPI
# execute the calculation
mpirun --map-by core --bind-to core R --slave --no-save --no-restore -f rscript.R > routput.out
# copy output file to home
cp routput.out $SLURM_SUBMIT_DIR/.
#exit
exit
```
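Assuming the jobscript above is saved as `rjob.sh` (the file name is illustrative), submit it with:
```console
$ sbatch rjob.sh
```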
For more information about jobscripts and MPI execution, refer to the [Job submission][1] and general [MPI][5] sections.
[1]: ../../general/job-submission-and-execution.md
[2]: #interactive-execution
[4]: #static-rmpi
[5]: ../mpi/mpi.md
[a]: http://www.r-project.org/
[b]: http://cran.r-project.org/doc/manuals/r-release/R-lang.html
[c]: http://cran.r-project.org/web/packages/Rmpi/
[d]: http://cran.r-project.org/web/packages/Rmpi/Rmpi.pdf
# FFTW
The discrete Fourier transform in one or more dimensions, MPI parallel
FFTW is a C subroutine library for computing the discrete Fourier transform in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, e.g. the discrete cosine/sine transforms or DCT/DST). The FFTW library allows for an MPI parallel, in-place discrete Fourier transform, with data distributed over a number of nodes.
```console
$ ml av FFTW
---------------------------------------------------- /apps/modules/numlib -----------------------------------------------------
FFTW/3.3.7-gompi-2018a FFTW/3.3.8-gompi-2020a FFTW/3.3.8-gompic-2020b FFTW/3.3.8
FFTW/3.3.8-gompi-2020a-amd FFTW/3.3.8-gompi-2020b FFTW/3.3.8-iccifort-2020.4.304 FFTW/3.3.9-gompi-2021a (D)
```
To load the latest version of FFTW, load the module:
```console
$ ml FFTW
```
The module sets up environment variables, required for linking and running FFTW enabled applications. Make sure that the choice of FFTW module is consistent with your choice of MPI library. Mixing MPI of different implementations may have unpredictable results.
## Example
```cpp
#include <fftw3-mpi.h>
int main(int argc, char **argv)
{
const ptrdiff_t N0 = 100, N1 = 1000;
fftw_plan plan;
fftw_complex *data;
ptrdiff_t alloc_local, local_n0, local_0_start, i, j;
MPI_Init(&argc, &argv);
fftw_mpi_init();
/* get local data size and allocate */
alloc_local = fftw_mpi_local_size_2d(N0, N1, MPI_COMM_WORLD,
&local_n0, &local_0_start);
data = fftw_alloc_complex(alloc_local);
/* create plan for in-place forward DFT */
plan = fftw_mpi_plan_dft_2d(N0, N1, data, data, MPI_COMM_WORLD,
FFTW_FORWARD, FFTW_ESTIMATE);
/* initialize data */
for (i = 0; i < local_n0; ++i) for (j = 0; j < N1; ++j)
{ data[i*N1 + j][0] = i;
data[i*N1 + j][1] = j; }
/* compute transforms, in-place, as many times as desired */
fftw_execute(plan);
fftw_destroy_plan(plan);
MPI_Finalize();
}
```
Load modules and compile:
```console
$ ml intel/2020b FFTW/3.3.8-iccifort-2020.4.304
$ mpicc testfftw3mpi.c -o testfftw3mpi.x -Wl,-rpath=$LIBRARY_PATH -lfftw3_mpi
```
Read more on FFTW usage on the [FFTW website][a].
[a]: http://www.fftw.org/fftw3_doc/