# GSL
The GNU Scientific Library. Provides a wide range of mathematical routines.
## Introduction
The GNU Scientific Library (GSL) provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting. There are over 1000 functions in total. The routines have been written from scratch in C, and present a modern API for C programmers, allowing wrappers to be written for very high level languages.
The library covers a wide range of topics in numerical computing. Routines are available for the following areas:
* Complex Numbers
* Roots of Polynomials
* Special Functions
* Vectors and Matrices
* Permutations
* Combinations
* Sorting
* BLAS Support
* Linear Algebra
* CBLAS Library
* Fast Fourier Transforms
* Eigensystems
* Random Numbers
* Quadrature
* Random Distributions
* Quasi-Random Sequences
* Histograms
* Statistics
* Monte Carlo Integration
* N-Tuples
* Differential Equations
* Simulated Annealing
* Numerical Differentiation
* Interpolation
* Series Acceleration
* Chebyshev Approximations
* Root-Finding
* Discrete Hankel Transforms
* Least-Squares Fitting
* Minimization
* IEEE Floating-Point
* Physical Constants
* Basis Splines
* Wavelets
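As a small illustration of the special functions area listed above (not part of the original documentation), a minimal sketch evaluating a Bessel function might look as follows; the hypothetical file `bessel.c` links the same way as the examples below:

```cpp
#include <stdio.h>
#include <gsl/gsl_sf_bessel.h>

int main(void)
{
    double x = 5.0;
    /* Regular cylindrical Bessel function of zeroth order, J0(x) */
    double y = gsl_sf_bessel_J0(x);
    printf("J0(%g) = %.18e\n", x, y);
    return 0;
}
```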
## Modules
For the list of available gsl modules, use the command:
```console
$ ml av gsl
---------------- /apps/modules/numlib -------------------
GSL/2.5-intel-2017c GSL/2.6-iccifort-2020.1.217 GSL/2.7-GCC-10.3.0 (D)
GSL/2.6-GCC-10.2.0 GSL/2.6-iccifort-2020.4.304
```
## Linking
Load an appropriate `gsl` module. Use the `-lgsl` switch to link your code against GSL. GSL depends on the CBLAS API to a BLAS library, which must be supplied for linking. The BLAS implementation may be provided, for example, by the MKL library, or by GSL's own CBLAS library (`-lgslcblas`). Using MKL is recommended.
### Compiling and Linking With Intel Compilers
```console
$ ml intel/2020b GSL/2.6-iccifort-2020.4.304
$ icc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -mkl -lgsl
```
### Compiling and Linking With GNU Compilers
```console
$ ml GCC/10.2.0 imkl/2020.4.304-iimpi-2020b GSL/2.6-iccifort-2020.4.304
$ gcc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lgsl
```
## Example
Following is an example of a discrete wavelet transform implemented by GSL:
```cpp
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <gsl/gsl_sort.h>
#include <gsl/gsl_wavelet.h>

int
main (int argc, char **argv)
{
  int i, n = 256, nc = 20;
  double *data = malloc (n * sizeof (double));
  double *abscoeff = malloc (n * sizeof (double));
  size_t *p = malloc (n * sizeof (size_t));

  gsl_wavelet *w;
  gsl_wavelet_workspace *work;

  w = gsl_wavelet_alloc (gsl_wavelet_daubechies, 4);
  work = gsl_wavelet_workspace_alloc (n);

  for (i = 0; i < n; i++)
    data[i] = sin (3.141592654 * (double) i / 256.0);

  gsl_wavelet_transform_forward (w, data, 1, n, work);

  for (i = 0; i < n; i++)
    {
      abscoeff[i] = fabs (data[i]);
    }

  gsl_sort_index (p, abscoeff, 1, n);

  /* Keep only the nc largest coefficients, zero out the rest */
  for (i = 0; (i + nc) < n; i++)
    data[p[i]] = 0;

  gsl_wavelet_transform_inverse (w, data, 1, n, work);

  for (i = 0; i < n; i++)
    {
      printf ("%g\n", data[i]);
    }

  gsl_wavelet_free (w);
  gsl_wavelet_workspace_free (work);

  free (data);
  free (abscoeff);
  free (p);
  return 0;
}
```
Load modules and compile:
```console
$ ml intel/2020b GSL/2.6-iccifort-2020.4.304
$ icc dwt.c -o dwt.x -Wl,-rpath=$LIBRARY_PATH -mkl -lgsl
```
In this example, we compile the `dwt.c` code using the Intel compiler and link it against the MKL and GSL libraries; note the `-mkl` and `-lgsl` options. The library search path is compiled in, so that no modules are necessary to run the code.
# HDF5
Hierarchical Data Format library. Serial and MPI parallel version.
[HDF5 (Hierarchical Data Format)][a] is a general purpose library and file format for storing scientific data. HDF5 can store two primary objects: datasets and groups. A dataset is essentially a multidimensional array of data elements, and a group is a structure for organizing objects in an HDF5 file. Using these two basic objects, one can create and store almost any kind of scientific data structure, such as images, arrays of vectors, and structured and unstructured grids. You can also mix and match them in HDF5 files according to your needs.
## Installed Versions
For the current list of installed versions, use:
```console
$ ml av HDF5
----------------------------------------------------- /apps/modules/data ------------------------------------------------------
HDF5/1.10.6-foss-2020b-parallel HDF5/1.10.6-intel-2020a HDF5/1.10.7-gompi-2021a
HDF5/1.10.6-iimpi-2020a HDF5/1.10.6-intel-2020b-parallel HDF5/1.10.7-gompic-2020b
HDF5/1.10.6-intel-2020a-parallel HDF5/1.10.7-gompi-2020b HDF5/1.10.7-iimpi-2020b (D)
```
To load the module, use the `ml` command.
The module sets up environment variables required for linking and running HDF5 enabled applications. Make sure that the choice of the HDF5 module is consistent with your choice of the MPI library. Mixing MPI of different implementations may cause unexpected results.
## Example
```cpp
#include "hdf5.h"
#define FILE "dset.h5"

int main() {

   hid_t   file_id, dataset_id, dataspace_id; /* identifiers */
   hsize_t dims[2];
   herr_t  status;
   int     i, j, dset_data[4][6];

   /* Create a new file using default properties. */
   file_id = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

   /* Create the data space for the dataset. */
   dims[0] = 4;
   dims[1] = 6;
   dataspace_id = H5Screate_simple(2, dims, NULL);

   /* Initialize the dataset. */
   for (i = 0; i < 4; i++)
      for (j = 0; j < 6; j++)
         dset_data[i][j] = i * 6 + j + 1;

   /* Create the dataset. */
   dataset_id = H5Dcreate2(file_id, "/dset", H5T_STD_I32BE, dataspace_id,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

   /* Write the dataset. */
   status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
                     dset_data);

   status = H5Dread(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
                    dset_data);

   /* End access to the dataset and release resources used by it. */
   status = H5Dclose(dataset_id);

   /* Terminate access to the data space. */
   status = H5Sclose(dataspace_id);

   /* Close the file. */
   status = H5Fclose(file_id);
}
```
Load modules and compile:
```console
$ ml intel/2020b HDF5/1.10.6-intel-2020b-parallel
$ mpicc hdf5test.c -o hdf5test.x -Wl,-rpath=$LIBRARY_PATH $HDF5_INC $HDF5_SHLIB
```
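The `-parallel` builds listed above also provide the MPI-IO file driver. As an illustration (not from the original documentation), a minimal sketch of creating a file collectively from all MPI ranks could look like the following; it assumes one of the parallel HDF5 modules and its matching MPI are loaded, it compiles with the same `mpicc` line as above, and the file name is purely illustrative:

```cpp
#include "hdf5.h"
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* File-access property list selecting the MPI-IO driver */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

    /* All ranks create/open the file collectively */
    hid_t file_id = H5Fcreate("pdset.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    H5Pclose(fapl);
    H5Fclose(file_id);
    MPI_Finalize();
    return 0;
}
```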
For further information, see the [website][a].
[a]: http://www.hdfgroup.org/HDF5/
# Intel Numerical Libraries
Intel libraries for high performance in numerical computing.
## Intel Math Kernel Library
Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, extensively threaded and optimized for maximum performance. Intel MKL unites and provides these basic components: BLAS, LAPACK, ScaLapack, PARDISO, FFT, VML, VSL, Data fitting, Feast Eigensolver, and many more.
```console
$ ml av mkl
------------------- /apps/modules/numlib -------------------
imkl/2017.4.239-iimpi-2017c imkl/2020.1.217-iimpi-2020a imkl/2021.2.0-iimpi-2021a (D)
imkl/2018.4.274-iimpi-2018a imkl/2020.4.304-iimpi-2020b (L) mkl/2020.4.304
imkl/2019.1.144-iimpi-2019a imkl/2020.4.304-iompi-2020b
```
!!! info
    `imkl` ... with the Intel toolchain; `mkl` ... with the system toolchain.
For more information, see the [Intel MKL][1] section.
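As a quick illustration (not from the original text), a dot product through MKL's CBLAS interface; this assumes an `imkl` or `mkl` module is loaded and the code is linked with `-mkl` (Intel compilers) or the explicit MKL libraries (GCC), as shown for GSL above:

```cpp
#include <stdio.h>
#include <mkl.h>

int main(void)
{
    double x[3] = {1.0, 2.0, 3.0};
    double y[3] = {4.0, 5.0, 6.0};

    /* BLAS level-1 dot product via the CBLAS interface */
    double d = cblas_ddot(3, x, 1, y, 1);

    printf("dot = %f\n", d);   /* expected: 32.000000 */
    return 0;
}
```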
## Intel Integrated Performance Primitives
Intel Integrated Performance Primitives (Intel IPP) is available via the `ipp` module. IPP is a library of highly optimized algorithmic building blocks for media and data applications. This includes signal, image, and frame processing algorithms, such as FFT, FIR, convolution, optical flow, Hough transform, sum, MinMax, and many more.
```console
$ ml av ipp
------------------- /apps/modules/perf -------------------
ipp/2020.3.304
```
For more information, see the [Intel IPP][2] section.
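For illustration (an addition of ours, not part of the original docs), one such building block is element-wise vector addition; the sketch below assumes the `ipp` module is loaded and the code is linked against the signal-processing and core IPP libraries (typically `-lipps -lippcore`):

```cpp
#include <stdio.h>
#include <ipp.h>

int main(void)
{
    Ipp32f a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    Ipp32f b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    Ipp32f c[4];

    ippInit();                 /* dispatch to the best code path for this CPU */
    ippsAdd_32f(a, b, c, 4);   /* c[i] = a[i] + b[i] */

    for (int i = 0; i < 4; i++)
        printf("%f\n", c[i]);
    return 0;
}
```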
## Intel Threading Building Blocks
Intel Threading Building Blocks (Intel TBB) is a library that supports scalable parallel programming using standard ISO C++ code. It does not require special languages or compilers. It is designed to promote scalable data parallel programming. Additionally, it fully supports nested parallelism, so you can build larger parallel components from smaller parallel components. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner.
```console
$ ml av tbb
------------------- /apps/modules/lib -------------------
tbb/2020.3-GCCcore-10.2.0
```
Read more at the [Intel TBB][3].
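To illustrate the task-based model described above (our sketch, not part of the original text), a loop expressed with `tbb::parallel_for`; it assumes the `tbb` module is loaded and the code is compiled with, for example, `g++ -O2 ... -ltbb`:

```cpp
#include <cstdio>
#include <vector>
#include <tbb/parallel_for.h>

int main()
{
    std::vector<double> v(1000000, 1.0);

    // Express the work as tasks over an index range;
    // TBB maps the tasks onto worker threads.
    tbb::parallel_for(std::size_t(0), v.size(), [&](std::size_t i) {
        v[i] *= 2.0;
    });

    std::printf("v[0] = %f\n", v[0]);
    return 0;
}
```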
## Python Hooks for Intel Math Kernel Library
Python hooks for Intel(R) Math Kernel Library runtime control settings.
```console
$ ml av mkl-service
------------------- /apps/modules/data -------------------
mkl-service/2.3.0-intel-2020b
```
Read more at the [hooks][a].
[1]: ../intel/intel-suite/intel-mkl.md
[2]: ../intel/intel-suite/intel-integrated-performance-primitives.md
[3]: ../intel/intel-suite/intel-tbb.md
[a]: https://github.com/IntelPython/mkl-service
# PETSc
PETSc is a suite of building blocks for the scalable solution of scientific and engineering applications modeled by partial differential equations. It supports MPI, shared memory, and GPU through CUDA or OpenCL, as well as hybrid MPI-shared memory or MPI-GPU parallelism.
## Introduction
PETSc (Portable, Extensible Toolkit for Scientific Computation) is a suite of building blocks (data structures and routines) for the scalable solution of scientific and engineering applications modeled by partial differential equations. It allows thinking in terms of high-level objects (matrices) instead of low-level objects (raw arrays). It is written in C, but can also be called from Fortran, C++, Python, and Java code. It supports MPI, shared memory, and GPUs through CUDA or OpenCL, as well as hybrid MPI-shared memory or MPI-GPU parallelism.
## Resources
* [project webpage][a]
* [documentation][b]
* [PETSc Users Manual (PDF)][c]
* [index of all manual pages][d]
* PRACE Video Tutorial [part1][e], [part2][f], [part3][g], [part4][h], [part5][i]
## Modules
For the current list of installed versions, use:
```console
$ ml av petsc
```
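With a `petsc` module loaded, a minimal program could look like the sketch below (an illustration of ours, not from the original docs; the file name `petsc_hello.c` is hypothetical, and compilation is assumed to use the MPI compiler wrapper with `-lpetsc`):

```cpp
#include <petscvec.h>

int main(int argc, char **argv)
{
    Vec       x;
    PetscReal norm;
    PetscInt  n = 100;

    PetscInitialize(&argc, &argv, NULL, NULL);

    /* Create a parallel vector distributed across all MPI ranks */
    VecCreate(PETSC_COMM_WORLD, &x);
    VecSetSizes(x, PETSC_DECIDE, n);
    VecSetFromOptions(x);

    VecSet(x, 1.0);
    VecNorm(x, NORM_2, &norm);
    PetscPrintf(PETSC_COMM_WORLD, "||x|| = %g\n", (double)norm);

    VecDestroy(&x);
    PetscFinalize();
    return 0;
}
```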
## External Libraries
PETSc needs at least MPI, BLAS, and LAPACK. These dependencies are currently satisfied with Intel MPI and Intel MKL in `petsc` modules.
PETSc can be linked with a plethora of [external numerical libraries][k], extending PETSc functionality, e.g. direct linear system solvers, preconditioners, or partitioners. See below the list of libraries currently included in `petsc` modules.
All these libraries can also be used alone, without PETSc. Their static or shared program libraries are available in
`$PETSC_DIR/$PETSC_ARCH/lib` and header files in `$PETSC_DIR/$PETSC_ARCH/include`. `PETSC_DIR` and `PETSC_ARCH` are environment variables pointing to a specific PETSc instance based on the PETSc module loaded.
* dense linear algebra
* [Elemental][l]
* sparse linear system solvers
* [Intel MKL Pardiso][m]
* [MUMPS][n]
* [PaStiX][o]
* [SuiteSparse][p]
* [SuperLU][q]
* [SuperLU_Dist][r]
* input/output
* [ExodusII][s]
* [HDF5][t]
* [NetCDF][u]
* partitioning
* [Chaco][v]
* [METIS][w]
* [ParMETIS][x]
* [PT-Scotch][y]
* preconditioners & multigrid
* [Hypre][z]
* [SPAI - Sparse Approximate Inverse][aa]
[a]: http://www.mcs.anl.gov/petsc/
[b]: http://www.mcs.anl.gov/petsc/documentation/
[c]: http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf
[d]: http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/singleindex.html
[e]: http://www.youtube.com/watch?v=asVaFg1NDqY
[f]: http://www.youtube.com/watch?v=ubp_cSibb9I
[g]: http://www.youtube.com/watch?v=vJAAAQv-aaw
[h]: http://www.youtube.com/watch?v=BKVlqWNh8jY
[i]: http://www.youtube.com/watch?v=iXkbLEBFjlM
[j]: https://www.mcs.anl.gov/petsc/miscellaneous/petscthreads.html
[k]: http://www.mcs.anl.gov/petsc/miscellaneous/external.html
[l]: http://libelemental.org/
[m]: https://software.intel.com/en-us/node/470282
[n]: http://mumps.enseeiht.fr/
[o]: http://pastix.gforge.inria.fr/
[p]: http://faculty.cse.tamu.edu/davis/suitesparse.html
[q]: http://crd.lbl.gov/~xiaoye/SuperLU/#superlu
[r]: http://crd.lbl.gov/~xiaoye/SuperLU/#superlu_dist
[s]: http://sourceforge.net/projects/exodusii/
[t]: http://www.hdfgroup.org/HDF5/
[u]: http://www.unidata.ucar.edu/software/netcdf/
[v]: http://www.cs.sandia.gov/CRF/chac.html
[w]: http://glaros.dtc.umn.edu/gkhome/metis/metis/overview
[x]: http://glaros.dtc.umn.edu/gkhome/metis/parmetis/overview
[y]: http://www.labri.fr/perso/pelegrin/scotch/
[z]: http://www.nersc.gov/users/software/programming-libraries/math-libraries/petsc/
[aa]: https://bitbucket.org/petsc/pkg-spai
# CUDA Quantum for Python
## What Is CUDA Quantum?
CUDA Quantum streamlines hybrid application development and promotes productivity and scalability in quantum computing. It offers a unified programming model designed for a hybrid setting—that is, CPUs, GPUs, and QPUs working together.
For more information, see the [official documentation][1].
## How to Install Version Without GPU Acceleration
Use (preferably in a conda environment):
```bash
pip install cuda-quantum
```
## How to Install Version With GPU Acceleration Using Conda
Run:
```bash
conda create -y -n cuda-quantum python=3.10 pip
conda install -y -n cuda-quantum -c "nvidia/label/cuda-11.8.0" cuda
conda install -y -n cuda-quantum -c conda-forge mpi4py openmpi cxx-compiler cuquantum
conda env config vars set -n cuda-quantum LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$CONDA_PREFIX/envs/cuda-quantum/lib"
conda env config vars set -n cuda-quantum MPI_PATH=$CONDA_PREFIX/envs/cuda-quantum
conda run -n cuda-quantum pip install cuda-quantum
conda activate cuda-quantum
source $CONDA_PREFIX/lib/python3.10/site-packages/distributed_interfaces/activate_custom_mpi.sh
```
Then configure the MPI:
``` bash
export OMPI_MCA_opal_cuda_support=true OMPI_MCA_btl='^openib'
```
## How to Test Your Installation?
You can test your installation by running the following script:
```python
import cudaq

# Build a trivial one-qubit kernel, apply an X gate, and measure
kernel = cudaq.make_kernel()
qubit = kernel.qalloc()
kernel.x(qubit)
kernel.mz(qubit)

result = cudaq.sample(kernel)
print(result)
```
## Further Questions Regarding the Installation?
See the CUDA Quantum PyPI page at [https://pypi.org/project/cuda-quantum/][2].
## Example QNN
The *qnn_example.py* script loads the FashionMNIST dataset, selects two classes (shirts and pants), and builds a neural network with a quantum layer. The network is then trained on this data and evaluated on the test dataset. You are free to try it on your own. Download the [QNN example][a] and rename it to `qnn_example.py`.
![](../img/cudaq.png)
[1]: https://nvidia.github.io/cuda-quantum/latest/index.html
[2]: https://pypi.org/project/cuda-quantum/
[a]: ../src/qnn_example.txt
# NVIDIA CUDA
## Introduction
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs).
## Installed Versions
For the current list of installed versions, use:
```console
$ ml av CUDA
```
## CUDA Programming
The default programming model for GPU accelerators is NVIDIA CUDA. To set up the environment for CUDA, use:
```console
$ ml CUDA
```
CUDA code can be compiled directly on login nodes. The user does not have to use compute nodes with GPU accelerators for compilation. To compile CUDA source code, use the NVCC compiler:
```console
$ nvcc --version
```
The CUDA Toolkit comes with a large number of examples, which can be a helpful reference to start with. To compile and test these examples, users should copy them to their home directory:
```console
$ cd ~
$ mkdir cuda-samples
$ cp -R /apps/nvidia/cuda/VERSION_CUDA/samples/* ~/cuda-samples/
```
To compile the examples, change directory to the particular example (here the example used is deviceQuery) and run `make` to start the compilation:
```console
$ cd ~/cuda-samples/1_Utilities/deviceQuery
$ make
```
Request an interactive session on the `qgpu` queue and execute the binary file:
```console
$ salloc -p qgpu -A PROJECT_ID
$ ml CUDA
$ ~/cuda-samples/1_Utilities/deviceQuery/deviceQuery
```
The expected output of the deviceQuery example executed on a node with a Tesla K20m is:
```console
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Tesla K20m"
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 3.5
Total amount of global memory: 4800 MBytes (5032706048 bytes)
(13) Multiprocessors x (192) CUDA Cores/MP: 2496 CUDA Cores
GPU Clock rate: 706 MHz (0.71 GHz)
Memory Clock rate: 2600 Mhz
Memory Bus Width: 320-bit
L2 Cache Size: 1310720 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Enabled
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 2 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = Tesla K20m
```
### Code Example
In this section, we provide a basic CUDA based vector addition code example. You can directly copy and paste the code into a file (e.g. `test.cu`) to test it:
```cpp
// test.cu

#define N (2048*2048)
#define THREADS_PER_BLOCK 512

#include <stdio.h>
#include <stdlib.h>

// GPU kernel function to add two vectors
__global__ void add_gpu( int *a, int *b, int *c, int n){
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < n)
        c[index] = a[index] + b[index];
}

// CPU function to add two vectors
void add_cpu (int *a, int *b, int *c, int n) {
    for (int i=0; i < n; i++)
        c[i] = a[i] + b[i];
}

// CPU function to generate a vector of random integers
void random_ints (int *a, int n) {
    for (int i = 0; i < n; i++)
        a[i] = rand() % 10000; // random number between 0 and 9999
}

// CPU function to compare two vectors
int compare_ints( int *a, int *b, int n ){
    int pass = 0;
    for (int i = 0; i < n; i++){
        if (a[i] != b[i]) {
            printf("Value mismatch at location %d, values %d and %d\n", i, a[i], b[i]);
            pass = 1;
        }
    }
    if (pass == 0) printf ("Test passed\n"); else printf ("Test failed\n");
    return pass;
}

int main( void ) {
    int *a, *b, *c;               // host copies of a, b, c
    int *dev_a, *dev_b, *dev_c;   // device copies of a, b, c
    int size = N * sizeof( int ); // we need space for N integers

    // Allocate GPU/device copies of dev_a, dev_b, dev_c
    cudaMalloc( (void**)&dev_a, size );
    cudaMalloc( (void**)&dev_b, size );
    cudaMalloc( (void**)&dev_c, size );

    // Allocate CPU/host copies of a, b, c
    a = (int*)malloc( size );
    b = (int*)malloc( size );
    c = (int*)malloc( size );

    // Fill input vectors with random integer numbers
    random_ints( a, N );
    random_ints( b, N );

    // Copy inputs to device
    cudaMemcpy( dev_a, a, size, cudaMemcpyHostToDevice );
    cudaMemcpy( dev_b, b, size, cudaMemcpyHostToDevice );

    // Launch add_gpu() kernel with blocks and threads
    add_gpu<<< N/THREADS_PER_BLOCK, THREADS_PER_BLOCK >>>( dev_a, dev_b, dev_c, N );

    // Copy device result back to host copy of c
    cudaMemcpy( c, dev_c, size, cudaMemcpyDeviceToHost );

    // Check the results against the CPU implementation
    int *c_h; c_h = (int*)malloc( size );
    add_cpu (a, b, c_h, N);
    compare_ints(c, c_h, N);

    // Clean CPU memory allocations
    free( a ); free( b ); free( c ); free (c_h);

    // Clean GPU memory allocations
    cudaFree( dev_a );
    cudaFree( dev_b );
    cudaFree( dev_c );

    return 0;
}
```
This code can be compiled using the following command:
```console
$ nvcc test.cu -o test_cuda
```
To run the code, request an interactive session to get access to one of the GPU accelerated nodes:
```console
$ salloc -p qgpu -A PROJECT_ID
$ ml cuda
$ ./test_cuda
```
## CUDA Libraries
### cuBLAS
The NVIDIA CUDA Basic Linear Algebra Subroutines (cuBLAS) library is a GPU-accelerated version of the complete standard BLAS library with 152 standard BLAS routines. A basic description of the library together with basic performance comparisons with MKL can be found [here][a].
#### cuBLAS Example: SAXPY
The SAXPY function multiplies the vector x by the scalar alpha and adds it to the vector y, overwriting the latter with the result. A description of the cuBLAS function can be found in the [NVIDIA CUDA documentation][b]. The code can be pasted into a file and compiled without any modification:
```cpp
/* Includes, system */
#include <stdio.h>
#include <stdlib.h>

/* Includes, cuda */
#include <cuda_runtime.h>
#include <cublas_v2.h>

/* Vector size */
#define N (32)

/* Host implementation of a simple version of saxpy */
void saxpy(int n, float alpha, const float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = alpha*x[i] + y[i];
}

/* Main */
int main(int argc, char **argv)
{
    float *h_X, *h_Y, *h_Y_ref;
    float *d_X = 0;
    float *d_Y = 0;
    const float alpha = 1.0f;
    int i;
    cublasHandle_t handle;

    /* Initialize CUBLAS */
    printf("simpleCUBLAS test running..\n");
    cublasCreate(&handle);

    /* Allocate host memory for the vectors */
    h_X = (float *)malloc(N * sizeof(h_X[0]));
    h_Y = (float *)malloc(N * sizeof(h_Y[0]));
    h_Y_ref = (float *)malloc(N * sizeof(h_Y_ref[0]));

    /* Fill the vectors with test data */
    for (i = 0; i < N; i++)
    {
        h_X[i] = rand() / (float)RAND_MAX;
        h_Y[i] = rand() / (float)RAND_MAX;
        h_Y_ref[i] = h_Y[i];
    }

    /* Allocate device memory for the vectors */
    cudaMalloc((void **)&d_X, N * sizeof(d_X[0]));
    cudaMalloc((void **)&d_Y, N * sizeof(d_Y[0]));

    /* Initialize the device vectors with the host vectors */
    cublasSetVector(N, sizeof(h_X[0]), h_X, 1, d_X, 1);
    cublasSetVector(N, sizeof(h_Y[0]), h_Y, 1, d_Y, 1);

    /* Performs operation using plain C code */
    saxpy(N, alpha, h_X, h_Y_ref);

    /* Performs operation using cublas */
    cublasSaxpy(handle, N, &alpha, d_X, 1, d_Y, 1);

    /* Read the result back */
    cublasGetVector(N, sizeof(h_Y[0]), d_Y, 1, h_Y, 1);

    /* Check result against reference */
    for (i = 0; i < N; ++i)
        printf("CPU res = %f \t GPU res = %f \t diff = %f\n", h_Y_ref[i], h_Y[i], h_Y_ref[i] - h_Y[i]);

    /* Memory clean up */
    free(h_X); free(h_Y); free(h_Y_ref);
    cudaFree(d_X); cudaFree(d_Y);

    /* Shutdown */
    cublasDestroy(handle);

    return 0;
}
```
!!! note
    cuBLAS has its own functions for data transfers between CPU and GPU memory:

    - [cublasSetVector][c] - transfers data from CPU to GPU memory
    - [cublasGetVector][d] - transfers data from GPU to CPU memory
To compile the code using the NVCC compiler, the `-lcublas` compiler flag has to be specified:
```console
$ ml cuda
$ nvcc -lcublas test_cublas.cu -o test_cublas_nvcc
```
To compile the same code with GCC:
```console
$ ml cuda
$ gcc -std=c99 test_cublas.c -o test_cublas_gcc -lcublas -lcudart
```
To compile the same code with the Intel compiler:
```console
$ ml cuda
$ ml intel
$ icc -std=c99 test_cublas.c -o test_cublas_icc -lcublas -lcudart
```
[a]: https://developer.nvidia.com/cublas
[b]: http://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-axpy
[c]: http://docs.nvidia.com/cuda/cublas/index.html#cublassetvector
[d]: http://docs.nvidia.com/cuda/cublas/index.html#cublasgetvector
# ROCm HIP
## Introduction
ROCm HIP allows developers to convert [CUDA code][a] to portable C++. The same source code can be compiled to run on NVIDIA or AMD GPUs.
This page documents the use of pre-built Apptainer (previously Singularity) image on Karolina Accelerated nodes (acn).
## Get Into GPU Node
```console
$ salloc -p qgpu -A PROJECT_ID -t 01:00:00
salloc: Granted job allocation 1543777
salloc: Waiting for resource configuration
salloc: Nodes acn41 are ready for job
```
## Installed Versions of Apptainer
For the current list of installed versions, use:
```console
module avail apptainer
# ----------------- /apps/modules/tools ------------------
# apptainer-wrappers/1.0 (A) apptainer/1.1.5
```
Load the required module:
```console
module load apptainer/1.1.5
```
## Launch Apptainer
Run the container:
```console
singularity shell /home/username/rocm/centos7-nvidia-rocm.sif
```
The above gives you the Apptainer shell prompt:
```console
Singularity>
```
## Inside Container
Verify that you have GPUs active and accessible on the given node:
```console
nvidia-smi
```
You should get output similar to:
```console
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.07 Driver Version: 515.65.07 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100-SXM... Off | 00000000:07:00.0 Off | 0 |
| N/A 26C P0 50W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM... Off | 00000000:0B:00.0 Off | 0 |
| N/A 26C P0 51W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-SXM... Off | 00000000:48:00.0 Off | 0 |
| N/A 22C P0 51W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-SXM... Off | 00000000:4C:00.0 Off | 0 |
| N/A 25C P0 52W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 4 NVIDIA A100-SXM... Off | 00000000:88:00.0 Off | 0 |
| N/A 22C P0 51W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 5 NVIDIA A100-SXM... Off | 00000000:8B:00.0 Off | 0 |
| N/A 26C P0 54W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 6 NVIDIA A100-SXM... Off | 00000000:C8:00.0 Off | 0 |
| N/A 25C P0 52W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 7 NVIDIA A100-SXM... Off | 00000000:CB:00.0 Off | 0 |
| N/A 26C P0 51W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
```
### Code Example
In this section, we show a basic code example. You can directly copy and paste the code to test it:
```cpp
// filename : /tmp/sample.cu

#include <stdio.h>
#include <cuda_runtime.h>

#define CHECK(cmd) \
{\
    cudaError_t error = cmd;\
    if (error != cudaSuccess) { \
        fprintf(stderr, "error: '%s'(%d) at %s:%d\n", cudaGetErrorString(error), error,__FILE__, __LINE__); \
        exit(EXIT_FAILURE);\
    }\
}

/*
 * Square each element in the array A and write to array C.
 */
template <typename T>
__global__ void
vector_square(T *C_d, T *A_d, size_t N)
{
    size_t offset = (blockIdx.x * blockDim.x + threadIdx.x);
    size_t stride = blockDim.x * gridDim.x;

    for (size_t i=offset; i<N; i+=stride) {
        C_d[i] = A_d[i] * A_d[i];
    }
}

int main(int argc, char *argv[])
{
    float *A_d, *C_d;
    float *A_h, *C_h;
    size_t N = 1000000;
    size_t Nbytes = N * sizeof(float);

    cudaDeviceProp props;
    CHECK(cudaGetDeviceProperties(&props, 0/*deviceID*/));
    printf ("info: running on device %s\n", props.name);

    printf ("info: allocate host mem (%6.2f MB)\n", 2*Nbytes/1024.0/1024.0);
    A_h = (float*)malloc(Nbytes);
    CHECK(A_h == 0 ? cudaErrorMemoryAllocation : cudaSuccess );
    C_h = (float*)malloc(Nbytes);
    CHECK(C_h == 0 ? cudaErrorMemoryAllocation : cudaSuccess );

    // Fill with Phi + i
    for (size_t i=0; i<N; i++)
    {
        A_h[i] = 1.618f + i;
    }

    printf ("info: allocate device mem (%6.2f MB)\n", 2*Nbytes/1024.0/1024.0);
    CHECK(cudaMalloc(&A_d, Nbytes));
    CHECK(cudaMalloc(&C_d, Nbytes));

    printf ("info: copy Host2Device\n");
    CHECK ( cudaMemcpy(A_d, A_h, Nbytes, cudaMemcpyHostToDevice));

    const unsigned blocks = 512;
    const unsigned threadsPerBlock = 256;
    printf ("info: launch 'vector_square' kernel\n");
    vector_square <<<blocks, threadsPerBlock>>> (C_d, A_d, N);

    printf ("info: copy Device2Host\n");
    CHECK ( cudaMemcpy(C_h, C_d, Nbytes, cudaMemcpyDeviceToHost));

    printf ("info: check result\n");
    for (size_t i=0; i<N; i++) {
        if (C_h[i] != A_h[i] * A_h[i]) {
            CHECK(cudaErrorUnknown);
        }
    }
    printf ("PASSED!\n");
}
```
First convert the CUDA sample code into HIP code:
```console
cd /tmp
/opt/rocm/hip/bin/hipify-perl sample.cu > sample.cpp
```
This code can then be compiled using the following commands:
```console
cd /tmp
export HIP_PLATFORM=$( /opt/rocm/hip/bin/hipconfig --platform )
export HIPCC=/opt/rocm/hip/bin/hipcc
$HIPCC sample.cpp -o sample
```
Running it, you should get the following output:
```console
Singularity> cd /tmp
Singularity> ./sample
info: running on device NVIDIA A100-SXM4-40GB
info: allocate host mem ( 7.63 MB)
info: allocate device mem ( 7.63 MB)
info: copy Host2Device
info: launch 'vector_square' kernel
info: copy Device2Host
info: check result
PASSED!
```
[a]: nvidia-cuda.md
<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:12px;
overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:12px;
font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-lzqt{background-color:#656565;border-color:inherit;color:#ffffff;font-weight:bold;text-align:center;vertical-align:top}
.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top}
.tg .tg-7btt{border-color:inherit;font-weight:bold;text-align:center;vertical-align:top}
</style>
# NVIDIA HPC SDK
The NVIDIA HPC Software Development Kit includes the proven compilers, libraries, and software tools
essential to maximizing developer productivity and the performance and portability of HPC applications.
## Installed Versions
Different versions are available on Karolina, Barbora, and DGX-2.
For the currently installed versions, use the command:
```console
ml av nvhpc
```
## Components
Below is the list of components in the NVIDIA HPC SDK.
<table class="tg">
<thead>
<tr>
<th class="tg-lzqt" colspan="7">Development</th>
<th class="tg-lzqt" colspan="2">Analysis</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tg-7btt">Programming<br>Models</td>
<td class="tg-7btt" colspan="2">Compilers</td>
<td class="tg-7btt">Core<br>Libraries</td>
<td class="tg-7btt" colspan="2">Math<br>Libraries</td>
<td class="tg-7btt">Communication<br>Libraries</td>
<td class="tg-7btt">Profilers</td>
<td class="tg-7btt">Debuggers</td>
</tr>
<tr>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/hpc-sdk/compilers/c++-parallel-algorithms/index.html" target="blank">Standard C++</a> &amp; <a href="https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/index.html" target="">Fortran</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html" target="blank">nvcc</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html" target="blank">nvc</a></td>
<td class="tg-c3ow"><a href="https://nvidia.github.io/libcudacxx/" target="blank">libcu++</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/cublas/index.html#abstract" target="blank">cuBLAS</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/cutensor/index.html" target="blank">cuTENSOR</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html#mpi-use" target="blank">Open MPI</a></td>
<td class="tg-c3ow">Nsight</td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/cuda-gdb/index.html" target="blank">Cuda-gdb</a></td>
</tr>
<tr>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/hpc-sdk/compilers/openacc-gs/index.html" target="blank">OpenACC</a> &amp; <a href="https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html#openmp-use" target="blank">OpenMP</a></td>
<td class="tg-c3ow" colspan="2"><a href="https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html" target="blank">nvc++</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/thrust/" target="blank">Thrust</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/cusparse/index.html#abstract" target="blank">cuSPARSE</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/cusolver/index.html#abstract" target="blank">cuSOLVER</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/nvshmem/" target="blank">NVSHMEM</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/nsight-systems/" target="blank">Systems</a></td>
<td class="tg-c3ow">Host</td>
</tr>
<tr>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html" target="blank">CUDA</a></td>
<td class="tg-c3ow" colspan="2"><a href="https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html" target="blank">nvfortran</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/cub/index.html" target="blank">CUB</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/cufft/index.html#abstract" target="blank">cuFFT</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/curand/index.html" target="blank">cuRAND</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/index.html" target="blank">NCCL</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/nsight-compute/" target="blank">Compute</a></td>
<td class="tg-c3ow">Device</td>
</tr>
</tbody>
</table>
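To illustrate the Standard C++ programming model from the table above (a sketch of ours, not taken from the SDK documentation), the following parallel algorithm can be offloaded to the GPU when compiled with `nvc++ -stdpar=gpu`:

```cpp
#include <algorithm>
#include <execution>
#include <vector>
#include <cstdio>

int main()
{
    std::vector<double> v(1 << 20, 1.0);

    // With nvc++ -stdpar=gpu, parallel standard algorithms such as this
    // std::for_each may be offloaded to the GPU.
    std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                  [](double &x) { x *= 2.0; });

    std::printf("v[0] = %f\n", v[0]);
    return 0;
}
```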
## References
[NVIDIA HPC SDK homepage][1]<br>
[Documentation][2]
[1]: https://developer.nvidia.com/hpc-sdk
[2]: https://docs.nvidia.com/hpc-sdk/index.html
# OpenACC MPI Tutorial
This tutorial is an excerpt from Nvidia's [5× in 5 Hours: Porting a 3D Elastic Wave Simulator to GPUs Using OpenACC][1] tutorial.
All source code for this tutorial can be downloaded as part of this [tarball][2].
`SEISMIC_CPML`, developed by Dimitri Komatitsch and Roland Martin from the University of Pau, France,
is a set of ten open-source Fortran 90 programs.
!!! note
    Before building and running each step,
    make sure that the compiler (`pgfortran`) and MPI wrappers (`mpif90`) are in your path.
## Step 0: Evaluation
Before you start, you should evaluate the code to determine
whether it is worth accelerating.
Using the compiler flag `-Minfo=intensity`, you can see
that the average compute intensity of the various loops is between 2.5 and 2.64.
As a rule, anything below 1.0 is generally not worth accelerating
unless it is part of a larger program.
To build and run the original MPI/OpenMP code on your system, do the following:
```console
cd step0
make build
make run
make verify
```
## Step 1: Adding Setup Code
Because this is an MPI code where each process will use its own GPU,
you need to add some utility code to ensure that happens.
The `setDevice` routine first determines which node the process is on
(via a call to `hostid`) and then gathers the hostids from all other processes.
It then determines how many GPUs are available on the node
and assigns the devices to each process.
Note that in order to maintain portability with the CPU version,
this section of code is guarded by the preprocessor macro `_OPENACC`,
which is defined when the OpenACC directives are enabled in the HPC Fortran compiler
through the use of the `-acc` command-line compiler option.
```code
#ifdef _OPENACC
#
function setDevice(nprocs,myrank)
use iso_c_binding
use openacc
implicit none
include 'mpif.h'
interface
function gethostid() BIND(C)
use iso_c_binding
integer (C_INT) :: gethostid
end function gethostid
end interface
integer :: nprocs, myrank
integer, dimension(nprocs) :: hostids, localprocs
integer :: hostid, ierr, numdev, mydev, i, numlocal
integer :: setDevice
! get the hostids so we can determine what other processes are on this node
hostid = gethostid()
CALL mpi_allgather(hostid,1,MPI_INTEGER,hostids,1,MPI_INTEGER, &
MPI_COMM_WORLD,ierr)
! determine which processors are on this node
numlocal=0
localprocs=0
do i=1,nprocs
if (hostid .eq. hostids(i)) then
localprocs(i)=numlocal
numlocal = numlocal+1
endif
enddo
! get the number of devices on this node
numdev = acc_get_num_devices(ACC_DEVICE_NVIDIA)
if (numdev .lt. 1) then
print *, 'ERROR: There are no devices available on this host. &
ABORTING.', myrank
stop
endif
! print a warning if the number of devices is less then the number
! of processes on this node. Having multiple processes share devices is not
! recommended.
if (numdev .lt. numlocal) then
if (localprocs(myrank+1).eq.1) then
! print the message only once per node
print *, 'WARNING: The number of process is greater then the number &
of GPUs.', myrank
endif
mydev = mod(localprocs(myrank+1),numdev)
else
mydev = localprocs(myrank+1)
endif
call acc_set_device_num(mydev,ACC_DEVICE_NVIDIA)
call acc_init(ACC_DEVICE_NVIDIA)
setDevice = mydev
end function setDevice
#endif
```
To build and run the step1 code on your system do the following:
```console
cd step1
make build
make run
make verify
```
## Step 2: Adding Compute Regions
Next, you add six compute regions around the eight parallel loops.
For example, here's the final reduction loop.
```code
!$acc kernels
do k = kmin,kmax
do j = NPOINTS_PML+1, NY-NPOINTS_PML
do i = NPOINTS_PML+1, NX-NPOINTS_PML
! compute kinetic energy first, defined as 1/2 rho ||v||^2
! in principle we should use rho_half_x_half_y instead of rho for vy
! in order to interpolate density at the right location in the staggered grid
! cell but in a homogeneous medium we can safely ignore it
total_energy_kinetic = total_energy_kinetic + 0.5d0 * rho*( &
vx(i,j,k)**2 + vy(i,j,k)**2 + vz(i,j,k)**2)
! add potential energy, defined as 1/2 epsilon_ij sigma_ij
! in principle we should interpolate the medium parameters at the right location
! in the staggered grid cell but in a homogeneous medium we can safely ignore it
! compute total field from split components
epsilon_xx = ((lambda + 2.d0*mu) * sigmaxx(i,j,k) - lambda * &
sigmayy(i,j,k) - lambda*sigmazz(i,j,k)) / (4.d0 * mu * (lambda + mu))
epsilon_yy = ((lambda + 2.d0*mu) * sigmayy(i,j,k) - lambda * &
sigmaxx(i,j,k) - lambda*sigmazz(i,j,k)) / (4.d0 * mu * (lambda + mu))
epsilon_zz = ((lambda + 2.d0*mu) * sigmazz(i,j,k) - lambda * &
sigmaxx(i,j,k) - lambda*sigmayy(i,j,k)) / (4.d0 * mu * (lambda + mu))
epsilon_xy = sigmaxy(i,j,k) / (2.d0 * mu)
epsilon_xz = sigmaxz(i,j,k) / (2.d0 * mu)
epsilon_yz = sigmayz(i,j,k) / (2.d0 * mu)
total_energy_potential = total_energy_potential + &
0.5d0 * (epsilon_xx * sigmaxx(i,j,k) + epsilon_yy * sigmayy(i,j,k) + &
epsilon_yy * sigmayy(i,j,k)+ 2.d0 * epsilon_xy * sigmaxy(i,j,k) + &
2.d0*epsilon_xz * sigmaxz(i,j,k)+2.d0*epsilon_yz * sigmayz(i,j,k))
enddo
enddo
enddo
!$acc end kernels
```
The `-acc` command line option to the HPC Accelerator Fortran compiler enables OpenACC directives. Note that OpenACC is meant to model a generic class of devices.
Another compiler option you'll want to use during development is `-Minfo`,
which provides feedback on optimizations and transformations performed on your code.
For accelerator-specific information, use the `-Minfo=accel` sub-option.
Examples of feedback messages produced when compiling `SEISMIC_CPML` include:
```console
1113, Generating copyin(vz(11:91,11:631,kmin:kmax))
Generating copyin(vy(11:91,11:631,kmin:kmax))
Generating copyin(vx(11:91,11:631,kmin:kmax))
Generating copyin(sigmaxx(11:91,11:631,kmin:kmax))
Generating copyin(sigmayy(11:91,11:631,kmin:kmax))
Generating copyin(sigmazz(11:91,11:631,kmin:kmax))
Generating copyin(sigmaxy(11:91,11:631,kmin:kmax))
Generating copyin(sigmaxz(11:91,11:631,kmin:kmax))
Generating copyin(sigmayz(11:91,11:631,kmin:kmax))
```
To compute on a GPU, the first step is to move data from host memory to GPU memory.
In the example above, the compiler tells you that it is copying over nine arrays.
Note the `copyin` statements.
These mean that the compiler will only copy the data to the GPU
but not copy it back to the host.
This is because line 1113 corresponds to the start of the reduction loop compute region,
where these arrays are used but never modified.
Data movement clauses:
* `copyin` - the data is copied only to the GPU;
* `copy` - the data is copied to the device at the beginning of the region and copied back at the end of the region;
* `copyout` - the data is only copied back to the host.
The compiler is conservative and only copies the data
that's actually required to perform the necessary computations.
Unfortunately, because the interior sub-arrays are not contiguous in host memory,
the compiler needs to generate multiple data transfers for each array.
```console
1114, Loop is parallelizable
1115, Loop is parallelizable
1116, Loop is parallelizable
Accelerator kernel generated
```
Here the compiler has performed dependence analysis
on the loops at lines 1114, 1115, and 1116 (the reduction loop shown earlier).
It finds that all three loops are parallelizable so it generates an accelerator kernel.
The compiler may attempt to work around dependences that prevent parallelization by interchanging loops (i.e., changing the order) where it is safe to do so. At least one outer or interchanged loop must be parallel for an accelerator kernel to be generated.
How the threads are organized is called the loop schedule.
Below you can see the loop schedule for our reduction loop.
The do loops have been replaced with a three-dimensional gang,
which in turn is composed of a two-dimensional vector section.
```console
1114, !$acc loop gang ! blockidx%y
1115, !$acc loop gang, vector(4) ! blockidx%z threadidx%y
1116, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
```
In CUDA terminology, the gang clause corresponds to a grid dimension
and the vector clause corresponds to a thread block dimension.
So here we have a 3-D array that's being grouped into blocks of 32×4 elements
where a single thread is working on a specific element.
Because the number of gangs is not specified in the loop schedule,
it will be determined dynamically when the kernel is launched.
If the gang clause had a fixed width, such as gang(16),
then each kernel would be written to loop over multiple elements.
With CUDA, programming reductions and managing shared memory can be a fairly difficult task.
In the example below, the compiler has automatically generated optimal code using these features.
```console
1122, Sum reduction generated for total_energy_kinetic
1140, Sum reduction generated for total_energy_potential
```
To build and run the step2 code on your system do the following:
```console
cd step2
make build
make run
make verify
```
## Step 3: Adding Data Regions
!!! tip
    Set the environment variable `PGI_ACC_TIME=1` and run your executable.
    This option prints basic profile information such as the kernel execution time,
    data transfer time, initialization time, the actual launch configuration,
    and total time spent in a compute region.
    Note that the total time is measured from the host and includes time spent executing host code within a region.
To improve performance, you should minimize the amount of time spent transferring data;
this is what the `data` directive is for.
You can use a data region to specify exact points in your program
where data should be copied from host memory to GPU memory, and back again.
Any compute region enclosed within a data region will use the previously copied data,
without the need to copy at the boundaries of the compute region.
A data region can span across host code and multiple compute regions,
and even across subroutine boundaries.
In looking at the arrays in `SEISMIC_CPML`, there are 18 arrays with constant values.
Another 21 are used only within compute regions so are never needed on the host.
Let's start by adding a data region around the outer time step loop.
The final three arrays do need to be copied back to the host to pass their halos.
For those cases, we use the update directive.
```code
!---
!--- beginning of time loop
!---
!$acc data &
!$acc copyin(a_x_half,b_x_half,k_x_half, &
!$acc a_y_half,b_y_half,k_y_half, &
!$acc a_z_half,b_z_half,k_z_half, &
!$acc a_x,a_y,a_z,b_x,b_y,b_z,k_x,k_y,k_z, &
!$acc sigmaxx,sigmaxz,sigmaxy,sigmayy,sigmayz,sigmazz, &
!$acc memory_dvx_dx,memory_dvy_dx,memory_dvz_dx, &
!$acc memory_dvx_dy,memory_dvy_dy,memory_dvz_dy, &
!$acc memory_dvx_dz,memory_dvy_dz,memory_dvz_dz, &
!$acc memory_dsigmaxx_dx, memory_dsigmaxy_dy, &
!$acc memory_dsigmaxz_dz, memory_dsigmaxy_dx, &
!$acc memory_dsigmaxz_dx, memory_dsigmayz_dy, &
!$acc memory_dsigmayy_dy, memory_dsigmayz_dz, &
!$acc memory_dsigmazz_dz)
do it = 1,NSTEP
...
!$acc update host(sigmazz,sigmayz,sigmaxz)
! sigmazz(k+1), left shift
call MPI_SENDRECV(sigmazz(:,:,1),number_of_values,MPI_DOUBLE_PRECISION, &
receiver_left_shift,message_tag,sigmazz(:,:,NZ_LOCAL+1), &
number_of_values,
...
!$acc update device(sigmazz,sigmayz,sigmaxz)
...
! --- end of time loop
enddo
!$acc end data
```
Data regions can be nested, and in fact we used this feature
in the time loop body for the arrays vx, vy and vz as shown below.
While these arrays are copied back and forth at the inner data region boundary,
and so are moved more often than the arrays moved in the outer data region,
they are used across multiple compute regions
instead of being copied at each compute region boundary.
Note that we do not specify any array dimensions in the copy clause.
This instructs the compiler to copy each array in its entirety as a contiguous block,
and eliminates the inefficiency we noted earlier
when interior sub-arrays were being copied in multiple blocks.
```code
!$acc data copy(vx,vy,vz)
... data region spans over 5 compute regions and host code
!$acc kernels
...
!$acc end kernels
!$acc end data
```
To build and run the step3 code on your system do the following:
```console
cd step3
make build
make run
make verify
```
## Step 4: Optimizing Data Transfers
The next step further optimizes the data transfers
by migrating as much of the computation as we can over to the GPU
and moving only the absolute minimum amount of data required.
The first step is to move the start of the outer data region up
so that it occurs earlier in the code, and to put the data initialization loops into compute kernels.
This includes the `vx`, `vy`, and `vz` arrays.
This approach enables you to remove the inner data region used in the previous optimization step.
In the following example code, notice the use of the `create` clause.
This instructs the compiler to allocate space for variables in GPU memory for local use
but to perform no data movement on those variables.
Essentially they are used as scratch variables in GPU memory.
```console
!$acc data &
!$acc copyin(a_x_half,b_x_half,k_x_half, &
!$acc a_y_half,b_y_half,k_y_half, &
!$acc a_z_half,b_z_half,k_z_half, &
!$acc ix_rec,iy_rec, &
!$acc a_x,a_y,a_z,b_x,b_y,b_z,k_x,k_y,k_z), &
!$acc copyout(sisvx,sisvy), &
!$acc create(memory_dvx_dx,memory_dvy_dx,memory_dvz_dx, &
!$acc memory_dvx_dy,memory_dvy_dy,memory_dvz_dy, &
!$acc memory_dvx_dz,memory_dvy_dz,memory_dvz_dz, &
!$acc memory_dsigmaxx_dx, memory_dsigmaxy_dy, &
!$acc memory_dsigmaxz_dz, memory_dsigmaxy_dx, &
!$acc memory_dsigmaxz_dx, memory_dsigmayz_dy, &
!$acc memory_dsigmayy_dy, memory_dsigmayz_dz, &
!$acc memory_dsigmazz_dz, &
!$acc vx,vy,vz,vx1,vy1,vz1,vx2,vy2,vz2, &
!$acc sigmazz1,sigmaxz1,sigmayz1, &
!$acc sigmazz2,sigmaxz2,sigmayz2) &
!$acc copyin(sigmaxx,sigmaxz,sigmaxy,sigmayy,sigmayz,sigmazz)
...
! Initialize vx, vy and vz arrays on the device
!$acc kernels
vx(:,:,:) = ZERO
vy(:,:,:) = ZERO
vz(:,:,:) = ZERO
!$acc end kernels
...
```
One caveat to using data regions is that you must be aware of which copy
(host or device) of the data you are actually using in a given loop or computation.
For example, any update to the copy of a variable in device memory
won't be reflected in the host copy until you explicitly transfer the data
using either an `update` directive or a `copy` clause at a data or compute region boundary.
!!! important
    Unintentional loss of coherence between the host and device copy of a variable is one of the most common causes of validation errors in OpenACC programs.
After making the above change to `SEISMIC_CPML`, the code generated incorrect results. After debugging, it was determined that the section of the time step loop
that initializes boundary conditions was omitted from an OpenACC compute region.
As a result, we were initializing the host copy of the data,
rather than the device copy as intended, which resulted in uninitialized variables in device memory.
The next challenge in optimizing the data transfers related to the handling of the halo regions.
`SEISMIC_CPML` passes halos from six 3-D arrays between MPI processes during the course of the computations.
After some experimentation, we settled on an approach whereby we added six new temporary 2-D arrays to hold the halo data.
Within a compute region we gathered the 2-D halos from the main 3-D arrays
into the new temp arrays, copied the temporaries back to the host in one contiguous block,
passed the halos between MPI processes, and finally copied the exchanged values
back to device memory and scattered the halos back into the 3-D arrays.
While this approach does add to the kernel execution time, it saves a considerable amount of data transfer time.
In the example code below, note that the source code added to support the halo
gathers and transfers is guarded by the preprocessor `_OPENACC` macro
and will only be executed if the code is compiled by an OpenACC-enabled compiler.
```code
#ifdef _OPENACC
#
! Gather the sigma 3D arrays to a 2D slice to allow for faster
! copy from the device to host
!$acc kernels
do i=1,NX
do j=1,NY
vx1(i,j)=vx(i,j,1)
vy1(i,j)=vy(i,j,1)
vz1(i,j)=vz(i,j,NZ_LOCAL)
enddo
enddo
!$acc end kernels
!$acc update host(vx1,vy1,vz1)
! vx(k+1), left shift
call MPI_SENDRECV(vx1(:,:), number_of_values, MPI_DOUBLE_PRECISION, &
receiver_left_shift, message_tag, vx2(:,:), number_of_values, &
MPI_DOUBLE_PRECISION, sender_left_shift, message_tag, MPI_COMM_WORLD,&
message_status, code)
! vy(k+1), left shift
call MPI_SENDRECV(vy1(:,:), number_of_values, MPI_DOUBLE_PRECISION, &
receiver_left_shift,message_tag, vy2(:,:),number_of_values, &
MPI_DOUBLE_PRECISION, sender_left_shift, message_tag, MPI_COMM_WORLD,&
message_status, code)
! vz(k-1), right shift
call MPI_SENDRECV(vz1(:,:), number_of_values, MPI_DOUBLE_PRECISION, &
receiver_right_shift, message_tag, vz2(:,:), number_of_values, &
MPI_DOUBLE_PRECISION, sender_right_shift, message_tag, MPI_COMM_WORLD, &
message_status, code)
!$acc update device(vx2,vy2,vz2)
!$acc kernels
do i=1,NX
do j=1,NY
vx(i,j,NZ_LOCAL+1)=vx2(i,j)
vy(i,j,NZ_LOCAL+1)=vy2(i,j)
vz(i,j,0)=vz2(i,j)
enddo
enddo
!$acc end kernels
#else
```
To build and run the step4 code on your system do the following:
```console
cd step4
make build
make run
make verify
```
## Step 5: Loop Schedule Tuning
The final step is to tune the OpenACC compute region loop schedules
using the gang, worker, and vector clauses.
The default kernel schedules chosen by the NVIDIA OpenACC compiler are usually quite good.
Manual tuning efforts often don't improve timings significantly,
but it's always worthwhile to spend a little time examining
whether you can do better by overriding compiler-generated loop schedules
using explicit loop scheduling clauses.
You can usually tell fairly quickly if the clauses are having an effect.
Note that there is no well-defined method for finding an optimal kernel schedule.
The best advice is to start with the compiler's default schedule and try small adjustments
to see if and how they affect execution time.
The kernel schedule you choose will affect whether and how shared memory is used,
global array accesses, and various types of optimizations.
Typically, it's better to perform gang scheduling of loops with large iteration counts.
```code
!$acc loop gang
do k = k2begin,NZ_LOCAL
kglobal = k + offset_k
!$acc loop worker vector collapse(2)
do j = 2,NY
do i = 2,NX
```
To build and run the step5 code on your system do the following:
```console
cd step5
make build
make run
make verify
```
[1]: https://docs.nvidia.com/hpc-sdk/compilers/openacc-mpi-tutorial/index.html
[2]: https://docs.nvidia.com/hpc-sdk/compilers/openacc-mpi-tutorial/openacc-mpi-tutorial.tar.gz
# ANSYS CFX
[ANSYS CFX][a] is a high-performance, general purpose fluid dynamics program
that has been applied to solve wide-ranging fluid flow problems for over 20 years.
At the heart of ANSYS CFX is its advanced solver technology,
the key to achieving reliable and accurate solutions quickly and robustly.
The modern, highly parallelized solver is the foundation for an abundant choice of physical models
to capture virtually any type of phenomena related to fluid flow.
The solver and its many physical models are wrapped in a modern, intuitive, and flexible GUI and user environment,
with extensive capabilities for customization and automation using session files, scripting and a powerful expression language.
To run ANSYS CFX in batch mode, you can utilize/modify the default `cfx.slurm` script and execute it via the `sbatch` command:
```bash
#!/bin/bash
#SBATCH --nodes=5 # Request 5 nodes
#SBATCH --ntasks-per-node=128 # Request 128 MPI processes per node
#SBATCH --job-name=ANSYS-test # Job name
#SBATCH --partition=qcpu # Partition name
#SBATCH --account=ACCOUNT_ID # Account/project ID
#SBATCH --output=%x-%j.out # Output log file with job name and job ID
#SBATCH --time=04:00:00 # Walltime
#!change the working directory (default is home directory)
#cd <working directory> (working directory must exists)
DIR=/scratch/project/PROJECT_ID/$SLURM_JOB_ID
mkdir -p "$DIR"
cd "$DIR" || exit
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following processors:
echo $SLURM_NODELIST
ml ANSYS/2023R2-intel-2022.12
#### Set number of processors per host listing
procs_per_host=1
#### Create host list
hl=""
for host in $(scontrol show hostname $SLURM_NODELIST)
do
if [ "$hl" = "" ]
then hl="$host:$procs_per_host"
else hl="${hl}:$host:$procs_per_host"
fi
done
echo Machines: $hl
#-def input.def includes the input of the CFX analysis in DEF format
#-P the name of the preferred license feature (aa_r=ANSYS Academic Research, ane3fl=Multiphysics(commercial))
cfx5solve -def input.def -size 4 -size-ni 4x -part-large -start-method "Platform MPI Distributed Parallel" -par-dist $hl -P aa_r
```
SVS FEM recommends utilizing resources by the keywords nodes and ppn.
These keywords allow you to directly address the number of nodes (computers) and cores (ppn) utilized in the job.
In addition, the rest of the code assumes this structure of allocated resources.
A working directory has to be created before sending the Slurm job into the queue.
The input file should be in the working directory or a full path to the input file has to be specified.
The input file has to be defined by a common CFX def file which is attached to the CFX solver via the `-def` parameter.
The **license** should be selected by the `-P` parameter.
Licensed products are: `aa_r` (ANSYS **Academic** Research) and `ane3fl` (ANSYS Multiphysics-**Commercial**).
[a]: http://www.ansys.com/products/fluids/ansys-cfx
# ANSYS Fluent
[ANSYS Fluent][a] software contains the broad physical modeling capabilities needed to model flow,
turbulence, heat transfer, and reactions for industrial applications ranging
from air flow over an aircraft wing to combustion in a furnace, from bubble columns to oil platforms,
from blood flow to semiconductor manufacturing, and from clean room design to wastewater treatment plants.
Special models that give the software the ability to model in-cylinder combustion,
aeroacoustics, turbomachinery, and multiphase systems have served to broaden its reach.
## Common Way to Run Fluent
To run ANSYS Fluent in a batch mode, you can utilize/modify the default `fluent.slurm` script and execute it via the `sbatch` command:
```bash
#!/bin/bash
#SBATCH --nodes=5 # Request 5 nodes
#SBATCH --ntasks-per-node=128 # 128 MPI processes per node
#SBATCH --job-name=ANSYS-test # Job name
#SBATCH --partition=qcpu # Partition name
#SBATCH --account=ACCOUNT_ID # Account/project ID
#SBATCH --output=%x-%j.out # Output log file with job name and job ID
#SBATCH --time=04:00:00 # Walltime
#!change the working directory (default is home directory)
#cd <working directory> (working directory must exists)
DIR=/scratch/project/PROJECT_ID/$SLURM_JOB_ID
mkdir -p "$DIR"
cd "$DIR" || exit
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following processors:
echo $SLURM_NODELIST
#### Load the ANSYS module so that we find the fluent command
ml ANSYS/2023R2-intel-2022.12
# Count the total number of cores allocated
NCORES=$SLURM_NTASKS
fluent 3d -t$NCORES -cnf=$SLURM_NODELIST -g -i fluent.jou
```
[SVS FEM][b] recommends specifying resources with the nodes and ppn keywords.
These allow you to directly define the number of nodes (computers) and cores per node (ppn) used by the job.
In addition, the rest of the script assumes this structure of allocated resources.
A working directory has to be created before sending the job into the queue.
The input file should be in the working directory or a full path to the input file has to be specified.
The input file has to be defined by a common Fluent journal file
which is attached to the Fluent solver via the `-i fluent.jou` parameter.
A journal file with the definition of the input geometry and boundary conditions
and defined process of solution has, for example, the following structure:
```console
/file/read-case aircraft_2m.cas.gz
/solve/init
init
/solve/iterate
10
/file/write-case-dat aircraft_2m-solution
/exit yes
```
The appropriate dimension of the problem has to be set by a parameter (`2d`/`3d`).
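For illustration, a minimal sketch of how the solver dimension is selected on the launch line used in the script above (`2ddp` and `3ddp` select the double-precision 2D and 3D solvers; the core count and journal file name are illustrative):
```console
fluent 2ddp -t$NCORES -cnf=$SLURM_NODELIST -g -i fluent.jou   # 2D, double precision
fluent 3d   -t$NCORES -cnf=$SLURM_NODELIST -g -i fluent.jou   # 3D, single precision
```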
## Fast Way to Run Fluent From Command Line
```console
fluent solver_version [FLUENT_options] -i journal_file -slurm
```
This syntax will start the ANSYS FLUENT job under Slurm using the sbatch command.
When resources are available, Slurm will start the job and return the job ID, usually in the form of `_job_ID.hostname_`.
This job ID can then be used to query, control, or stop the job using standard Slurm commands, such as `squeue` or `scancel`.
The job will be run out of the current working directory and all output will be written to the `fluent.o<job_ID>` file.
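For example, a minimal invocation following the syntax above might look like this (the solver version, core count, and journal file name are illustrative):
```console
fluent 3d -t128 -g -i fluent.jou -slurm
```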
## Running Fluent via User's Config File
If no command line arguments are present, the sample script uses a configuration file called slurm_fluent.conf.
This configuration file should be present in the directory from which the jobs are submitted
(which is also the directory in which the jobs are executed).
The following is an example of what the content of slurm_fluent.conf can be:
```console
input="example_small.flin"
case="Small-1.65m.cas"
fluent_args="3d -pmyrinet"
outfile="fluent_test.out"
mpp="true"
```
The following is an explanation of the parameters:

* `input` is the name of the input file.
* `case` is the name of the .cas file that the input file will utilize.
* `fluent_args` are extra ANSYS FLUENT arguments. As shown in the previous example, you can specify the interconnect by using the `-p interconnect` command. The available interconnects include ethernet (default), Myrinet, InfiniBand, Vendor, Altix, and Crayx. MPI is selected automatically, based on the specified interconnect.
* `outfile` is the name of the file to which the standard output will be sent.
* `mpp="true"` will tell the job script to execute the job across multiple processors.
To run ANSYS Fluent in batch mode with the user's config file, you can utilize/modify the following script and execute it via the `sbatch` command:
```bash
#!/bin/sh
#SBATCH --nodes=2 # Request 2 nodes
#SBATCH --ntasks-per-node=4 # 4 MPI processes per node
#SBATCH --cpus-per-task=128 # 128 CPUs (threads) per MPI process
#SBATCH --job-name=$USE-Fluent-Project # Job name
#SBATCH --partition=qprod # Partition name
#SBATCH --account=XX-YY-ZZ # Account/project ID
#SBATCH --output=%x-%j.out # Output file name with job name and job ID
#SBATCH --time=04:00:00 # Walltime
cd $SLURM_SUBMIT_DIR
#We assume that if they didn't specify arguments then they should use the
#config file
if [ "xx${input}${case}${mpp}${fluent_args}zz" = "xxzz" ]; then
  if [ -f slurm_fluent.conf ]; then
    . slurm_fluent.conf
  else
    printf "No command line arguments specified, "
    printf "and no configuration file found. Exiting\n"
    exit 1
  fi
fi
#Augment the ANSYS FLUENT command line arguments
case "$mpp" in
true)
#MPI job execution scenario
  # Count the allocated nodes; NCPUS is expected to hold the number of cores per node
  num_nodes=$(scontrol show hostname $SLURM_NODELIST | sort -u | wc -l)
  cpus=$(expr $num_nodes \* $NCPUS)
#Default arguments for mpp jobs, these should be changed to suit your
#needs.
fluent_args="-t${cpus} $fluent_args -cnf=$SLURM_NODELIST"
;;
*)
#SMP case
#Default arguments for smp jobs, should be adjusted to suit your
#needs.
fluent_args="-t$NCPUS $fluent_args"
;;
esac
#Default arguments for all jobs
fluent_args="-ssh -g -i $input $fluent_args"
echo "---------- Going to start a fluent job with the following settings:
Input: $input
Case: $case
Output: $outfile
Fluent arguments: $fluent_args"
#run the solver
fluent $fluent_args > $outfile
```
It runs the jobs out of the directory from which they are submitted (SLURM_SUBMIT_DIR).
## Running Fluent in Parallel
Fluent could be run in parallel only under the Academic Research license.
To do this, the ANSYS Academic Research license must be placed before the ANSYS CFD license in user preferences.
To make this change, the anslic_admin utility should be run:
```console
/ansys_inc/shared_les/licensing/lic_admin/anslic_admin
```
The ANSLIC_ADMIN utility will be run:
![](../../../img/Fluent_Licence_1.jpg)
![](../../../img/Fluent_Licence_2.jpg)
![](../../../img/Fluent_Licence_3.jpg)
The ANSYS Academic Research license should be moved up to the top of the list:
![](../../../img/Fluent_Licence_4.jpg)
[a]: http://www.ansys.com/products/fluids/ansys-fluent
[b]: http://www.svsfem.cz
# ANSYS LS-DYNA
[ANSYS LS-DYNA][a] provides convenient and easy-to-use access to the technology-rich,
time-tested explicit solver without the need to contend
with the complex input requirements of this sophisticated program.
Introduced in 1996, ANSYS LS-DYNA capabilities have helped customers in numerous industries
to resolve highly intricate design issues.
ANSYS Mechanical users have been able to take advantage of complex explicit solutions
for a long time utilizing the traditional ANSYS Parametric Design Language (APDL) environment.
These explicit capabilities are available to ANSYS Workbench users as well.
The Workbench platform is a powerful, comprehensive, easy-to-use environment for engineering simulation.
CAD import from all sources, geometry cleanup, automatic meshing, solution,
parametric optimization, result visualization, and comprehensive report generation
are all available within a single fully interactive modern graphical user environment.
To run ANSYS LS-DYNA in batch mode, you can utilize/modify the default `ansysdyna.slurm` script
and execute it via the `sbatch` command:
```bash
#!/bin/bash
#SBATCH --nodes=5 # Request 5 nodes
#SBATCH --ntasks-per-node=128 # Request 128 MPI processes per node
#SBATCH --job-name=ANSYS-test # Job name
#SBATCH --partition=qcpu # Partition name
#SBATCH --account=PROJECT_ID # Account/project ID
#SBATCH --output=%x-%j.out # Output log file with job name and job ID
#SBATCH --time=04:00:00 # Walltime
#!change the working directory (default is home directory)
#cd <working directory>
DIR=/scratch/project/PROJECT_ID/$SLURM_JOB_ID
mkdir -p "$DIR"
cd "$DIR" || exit
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following processors:
echo $SLURM_NODELIST
#! Count the number of allocated nodes
NPROCS=$(scontrol show hostname $SLURM_NODELIST | wc -l)
echo This job has allocated $NPROCS nodes
ml ANSYS/2023R2-intel-2022.12
#### Set number of processors per host listing
procs_per_host=1
#### Create host list
hl=""
for host in $(scontrol show hostname $SLURM_NODELIST)
do
if [ "$hl" = "" ]
then hl="$host:$procs_per_host"
else hl="${hl}:$host:$procs_per_host"
fi
done
echo Machines: $hl
ansys211 -dis -lsdynampp i=input.k -machines $hl
```
[SVS FEM][b] recommends specifying resources with the nodes and ppn keywords.
These allow you to directly define the number of nodes (computers)
and cores per node (ppn) used by the job.
In addition, the rest of the script assumes this structure of allocated resources.
[a]: http://www.ansys.com/products/structures/ansys-ls-dyna
[b]: http://www.svsfem.cz
# ANSYS MAPDL
[ANSYS Multiphysics][a] offers a comprehensive product solution for both multiphysics and single-physics analysis.
The product includes structural, thermal, fluid, and both high- and low-frequency electromagnetic analysis.
The product also contains solutions for both direct and sequentially coupled physics problems
including direct coupled-field elements and the ANSYS multi-field solver.
To run ANSYS MAPDL in batch mode you can utilize/modify the default `mapdl.slurm` script and execute it via the `sbatch` command:
```bash
#!/bin/bash
#SBATCH --nodes=5 # Request 5 nodes
#SBATCH --ntasks-per-node=128 # Request 128 MPI processes per node
#SBATCH --job-name=ANSYS-test # Job name
#SBATCH --partition=qcpu # Partition name
#SBATCH --account=PROJECT_ID # Account/project ID
#SBATCH --output=%x-%j.out # Output log file with job name and job ID
#SBATCH --time=04:00:00 # Walltime
#!change the working directory (default is home directory)
#cd <working directory> (working directory must exists)
DIR=/scratch/project/PROJECT_ID/$SLURM_JOB_ID
mkdir -p "$DIR"
cd "$DIR" || exit
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following processors:
echo $SLURM_NODELIST
ml ANSYS/2023R2-intel-2022.12
#### Set number of processors per host listing
procs_per_host=1
#### Create host list
hl=""
for host in $(scontrol show hostname $SLURM_NODELIST)
do
if [ "$hl" = "" ]
then hl="$host:$procs_per_host"
else hl="${hl}:$host:$procs_per_host"
fi
done
echo Machines: $hl
#-i input.dat includes the input of analysis in APDL format
#-o file.out is output file from ansys where all text outputs will be redirected
#-p the name of license feature (aa_r=ANSYS Academic Research, ane3fl=Multiphysics(commercial), aa_r_dy=Academic AUTODYN)
ansys211 -b -dis -p aa_r -i input.dat -o file.out -machines $hl -dir "$DIR"
```
[SVS FEM][b] recommends specifying resources with the nodes and ppn keywords.
These allow you to directly define the number of nodes (computers) and cores per node (ppn) used by the job.
In addition, the rest of the script assumes this structure of allocated resources.
A working directory has to be created before sending the Slurm job into the queue.
The input file should be in the working directory or a full path to the input file has to be specified.
The input file has to be defined by a common APDL file which is attached to the ANSYS solver via the `-i` parameter.
The **license** should be selected by the `-p` parameter.
Licensed products are the following: `aa_r` (ANSYS **Academic** Research),
`ane3fl` (ANSYS Multiphysics-**Commercial**), and `aa_r_dy` (ANSYS **Academic** AUTODYN).
[a]: http://www.ansys.com/products/multiphysics
[b]: http://www.svsfem.cz
# Overview of ANSYS Products
[SVS FEM][a] as [ANSYS Channel partner][b] for the Czech Republic provided all ANSYS licenses for our clusters and supports all ANSYS Products (Multiphysics, Mechanical, MAPDL, CFX, Fluent, Maxwell, LS-DYNA, etc.) to IT staff and ANSYS users. In case of a problem with ANSYS functionality, contact [hotline@svsfem.cz][c].
We provide commercial as well as academic variants. Academic variants are distinguished by the "**Academic...**" word in the license name or by the two letter preposition "**aa\_**" in the license feature name. The license is selected on the command line or directly in the user's Slurm script (see the individual products).
To load the latest version of any ANSYS product (Mechanical, Fluent, CFX, MAPDL, etc.) load the module:
```console
$ ml ANSYS
```
ANSYS supports interactive mode, but it is not recommended because the clusters are intended for solving extremely demanding tasks.
If you need to work interactively, we recommend configuring the RSM service on the client machine, which allows forwarding the solution to the cluster directly from the client's Workbench project (see ANSYS RSM service).
[a]: http://www.svsfem.cz/
[b]: http://www.ansys.com/
[c]: mailto:hotline@svsfem.cz
# Licensing and Available Versions
## ANSYS License Can Be Used By:
* all persons in the carrying out of the CE IT4Innovations Project (In addition to the primary licensee, which is VSB - Technical University of Ostrava, users are CE IT4Innovations third parties - CE IT4Innovations project partners, particularly the University of Ostrava, the Brno University of Technology - Faculty of Informatics, the Silesian University in Opava, Institute of Geonics AS CR.)
* all persons who have a valid license
* students of the Technical University
## ANSYS Academic Research
This license is intended for science and research, publications, and students' projects (academic license).
## ANSYS COM
This license is intended for science and research, publications, students' projects, and commercial research with no restrictions on commercial use.
## Server / Port
lic-ansys.vsb.cz / 1055 (2325)
![](../../../img/Ansys-lic-admin.jpg)
## Available Versions
* 21.1
```console
$ ml av ANSYS
---------------- /apps/modules/tools -----------------------
ANSYS/21.1-intel-2018a (D)
Where:
D: Default Module
```
# Setting License Preferences
Some ANSYS tools allow you to explicitly specify the usage of academic or commercial licenses on the command line (e.g. `ansys211 -p aa_r` to select the Academic Research license). However, we have observed that not all tools obey this option and choose the commercial license instead.
Therefore, you need to configure the preferred license order with ANSLIC_ADMIN. Follow these steps and move the Academic Research license to the top or bottom of the list accordingly.
Launch the ANSLIC_ADMIN utility in a graphical environment:
```console
$ANSYSLIC_DIR/lic_admin/anslic_admin
```
The ANSLIC_ADMIN utility will be run:
![](../../../img/Fluent_Licence_1.jpg)
![](../../../img/Fluent_Licence_2.jpg)
![](../../../img/Fluent_Licence_3.jpg)
The ANSYS Academic Research license should be moved up to the top or down to the bottom of the list.
![](../../../img/Fluent_Licence_4.jpg)
# Workbench
## Workbench Batch Mode
It is possible to run Workbench scripts in a batch mode.
You need to configure solvers of individual components to run in parallel mode.
Open your project in Workbench.
Then, for example, in *Mechanical*, go to *Tools - Solve Process Settings...*.
![](../../../img/AMsetPar1.png)
Enable the *Distribute Solution* checkbox and enter the number of cores (e.g. 72 to run on two Barbora nodes).
If you want the job to run on more than 1 node, you must also provide a so-called MPI appfile.
In the *Additional Command Line Arguments* input field, enter:
```console
-mpifile /path/to/my/job/mpifile.txt
```
Where `/path/to/my/job` is the directory where your project is saved.
We will create the file `mpifile.txt` programmatically later in the batch script.
For more information, refer to the *ANSYS Mechanical APDL Parallel Processing Guide*.
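For illustration only, an MPI appfile generated by the batch script below might look like the following; the hostnames, process counts, and ANSYS installation path are placeholders:
```console
-h cn001 -np 24 /path/to/ansys/bin/ansysdis -dis
-h cn002 -np 24 /path/to/ansys/bin/ansysdis -dis
```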
Now, save the project and close Workbench.
We will use this script to launch the job:
```bash
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=128
#SBATCH --job-name=test9_mpi_2
#SBATCH --partition=qcpu
#SBATCH --account=ACCOUNT_ID
# change the working directory
DIR=/scratch/project/PROJECT_ID/$SLURM_JOB_ID
mkdir -p "$DIR"
cd "$DIR" || exit
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following nodes:
echo $SLURM_NODELIST
ml ANSYS/2023R2-intel-2022.12
#### Set number of processors per host listing
procs_per_host=24
#### Create MPI appfile
echo -n "" > mpifile.txt
for host in $(scontrol show hostname $SLURM_NODELIST)
do
 echo "-h $host -np $procs_per_host $ANSYS160_DIR/bin/ansysdis161 -dis" >> mpifile.txt
done
#-i input.dat includes the input of analysis in APDL format
#-o file.out is output file from ansys where all text outputs will be redirected
#-p the name of license feature (aa_r=ANSYS Academic Research, ane3fl=Multiphysics(commercial), aa_r_dy=Academic AUTODYN)
# prevent using scsif0 interface on accelerated nodes
export MPI_IC_ORDER="UDAPL"
# spawn remote process using SSH (default is RSH)
export MPI_REMSH="/usr/bin/ssh"
runwb2 -R jou6.wbjn -B -F test9.wbpj
```
The solver settings are saved in the `solvehandlers.xml` file,
which is not located in the project directory.
Verify your solver settings when uploading a project from your local computer.
# Apptainer on IT4Innovations
On our clusters, the Apptainer images of main Linux distributions are prepared.
```console
Barbora Karolina
├── CentOS ├── CentOS
| └── 7 | └── 7
├── Rocky ├── Rocky
| ├── 8 | ├── 8
│ └── 9 │ └── 9
├── Fedora ├── Fedora
│ └── latest │ └── latest
└── Ubuntu └── Ubuntu
└── latest └── latest
```
!!! info
Current information about available Apptainer images can be obtained by the `ml av` command. The images are listed in the `OS` section.
The bootstrap scripts, wrappers, features, etc. are located on [it4i-singularity GitLab page][a].
## IT4Innovations Apptainer Wrappers
For better user experience with Apptainer containers, we prepared several wrappers:
* image-exec
* image-mpi
* image-run
* image-shell
* image-update
The listed wrappers help you use the prepared Apptainer images loaded as modules.
You can easily load an Apptainer image like any other module on the cluster with the `ml OS/version` command.
After the module is loaded for the first time, the prepared image is copied into your home folder and is ready for use.
When you load the module next time, the version of the image is checked and an update (if one exists) is offered.
Then you can update your copy of the image by the `image-update` command.
!!! warning
With an image update, all user changes to the image will be overridden.
The runscript inside the Apptainer image can be run by the `image-run` command.
!!! note " CentOS/7 module only"
This command automatically mounts the `/scratch` and `/apps` storage and invokes the image as writable, so user changes can be made.
Very similar to `image-run` is the `image-exec` command.
The only difference is that `image-exec` runs a user-defined command instead of a runscript.
In this case, the command to be run is specified as a parameter.
Using the interactive shell inside the Apptainer container is very useful for development.
In this interactive shell, you can make any changes to the image you want,
but be aware that you cannot use privileged `sudo` commands directly on the cluster.
To simply invoke interactive shell, use the `image-shell` command.
Another useful feature of the Apptainer is the direct support of OpenMPI.
For proper MPI function, you have to install the same version of OpenMPI inside the image as you use on the cluster.
OpenMPI/4.1.2 is installed in prepared images (CentOS 7, Rocky 8).
The MPI must be started outside the container.
The easiest way to start the MPI is to use the `image-mpi` command.
This command has the same parameters as `mpirun`, so there is no difference between running a normal MPI application
and an MPI application in an Apptainer container.
## Examples
In the examples, we will use prepared Apptainer images.
### Load Image
```console
$ ml CentOS/7
Preparing image CentOS-7_20230116143612.sif
261.20M 100% 412.36MB/s 0:00:00 (xfr#1, to-chk=0/1)
Your image of CentOS/7 is at location: /home/username/.apptainer/images/CentOS-7_20230116143612.sif
```
!!! tip
After the module is loaded for the first time, the prepared image is copied into your home folder to the *.apptainer/images* subfolder.
### Wrappers
**image-exec**
Executes the given command inside the Apptainer image. The container is in this case started, then the command is executed and the container is stopped.
```console
$ ml CentOS/7
$ image-exec cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
```
**image-mpi**
MPI wrapper - see more in the [Examples MPI][1] section.
**image-run**
This command runs the runscript inside the Apptainer image. Note, that the prepared images do not contain a runscript.
**image-shell**
Invokes an interactive shell inside the Apptainer image.
```console
$ ml CentOS/7
$ image-shell
Apptainer>
```
### Update Image
This command is for updating your local Apptainer image copy.
The local copy is overridden in this case.
```console
$ ml CentOS/7
New version of CentOS image was found. (New: CentOS-7_20230116143612.sif Old: CentOS-7_20230115143612.sif)
For updating image use: image-update
Your image of CentOS/7 is at location: /home/username/.apptainer/images/CentOS-7_20230115143612.sif
$ image-update
New version of CentOS image was found. (New: CentOS-7_20230116143612.sif Old: CentOS-7_20230115143612.sif)
Do you want to update local copy? (WARNING all user modification will be deleted) [y/N]: y
Updating image CentOS-7_20230116143612.sif
2.71G 100% 199.49MB/s 0:00:12 (xfer#1, to-check=0/1)
sent 2.71G bytes received 31 bytes 163.98M bytes/sec
total size is 2.71G speedup is 1.00
New version is ready. (/home/username/.apptainer/images/CentOS-7_20230116143612.sif)
```
### MPI
In the following example, we are using a job submitted by the command:
```console
$ salloc -A PROJECT_ID -p qcpu --nodes=2 --ntasks-per-node=128 --time=00:30:00
```
!!! note
    We have seen no major performance impact for a job running in an Apptainer container.
With Apptainer, the MPI usage model is to call `mpirun` from outside the container
and reference the container from your `mpirun` command.
Usage would look like this:
```console
$ mpirun -np 128 apptainer exec container.img /path/to/contained_mpi_prog
```
By calling `mpirun` outside of the container, we solve several very complicated work-flow aspects.
For example, if `mpirun` is called from within the container, it must have a method for spawning processes on remote nodes.
Historically the SSH is used for this, which means that there must be an `sshd` running within the container on the remote nodes
and this `sshd` process must not conflict with the `sshd` running on that host.
It is also possible for the resource manager to launch the job
and (in OpenMPI’s case) the Orted (Open RTE User-Level Daemon) processes on the remote system,
but that then requires resource manager modification and container awareness.
In the end, we do not gain anything by calling `mpirun` from within the container
except for increasing the complexity levels and possibly losing out on some added
performance benefits (e.g. if the container was not built with the same OFED stack as the host).
#### MPI Inside Apptainer Image
```console
$ ml CentOS/7
$ image-shell
Apptainer> mpirun hostname | wc -l
128
```
As you can see in this example, we allocated two nodes, but MPI can use only one node (128 processes) when used inside the Apptainer image.
#### MPI Outside Apptainer Image
```console
$ ml CentOS/7
Your image of CentOS/7 is at location: /home/username/.apptainer/images/CentOS-7_20230116143612.sif
$ image-mpi hostname | wc -l
256
```
In this case, the MPI wrapper behaves like the `mpirun` command.
The `mpirun` command is called outside the container
and the communication between nodes is propagated into the container automatically.
## How to Use Your Own Image on the Cluster?
* Prepare the image on your computer
* Transfer the images to your `/home` directory on the cluster (for example `.apptainer/image`)
```console
local:$ scp container.img login@login2.clustername.it4i.cz:~/.apptainer/image/container.img
```
* Load module Apptainer (`ml apptainer`)
* Use your image
!!! note
If you want to use the Apptainer wrappers with your own images, load the `apptainer-wrappers/1.0` module and set the environment variable `IMAGE_PATH_LOCAL=/path/to/container.img`.
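A minimal sketch of that workflow (the image path is illustrative):
```console
$ ml apptainer-wrappers/1.0
$ export IMAGE_PATH_LOCAL=$HOME/.apptainer/image/container.img
$ image-shell
```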
## How to Edit an IT4Innovations Image?
* Transfer the image to your computer
```console
local:$ scp login@login2.clustername.it4i.cz:/home/username/.apptainer/image/container.img container.img
```
* Modify the image
* Transfer the image from your computer to your `/home` directory on the cluster
```console
local:$ scp container.img login@login2.clustername.it4i.cz:/home/username/.apptainer/image/container.img
```
* Load module Apptainer (`ml apptainer`)
* Use your image
[1]: #mpi
[a]: https://code.it4i.cz/sccs/it4i-singularity
# Generating Container Recipes & Images
EasyBuild has support for generating container recipes that will use EasyBuild to build and install a specified software stack. In addition, EasyBuild can (optionally) leverage the build tool provided by the container software of choice to create container images.
## Generating Container Recipes
To generate container recipes, use `eb --containerize`, or `eb -C` for short.
The resulting container recipe will leverage EasyBuild to build and install the software that corresponds to the easyconfig files that are specified as arguments to the eb command (and all required dependencies, if needed).
!!! note
EasyBuild will refuse to overwrite existing container recipes.
To re-generate an already existing recipe file, use the `--force` command line option.
## Base Container Image
In order to let EasyBuild generate a container recipe, it is required to specify which container image should be used as a base, via the `--container-base` configuration option.
Currently, three types of container base images can be specified (see the example commands after the list):

* **localimage: *path***: the location of an existing container image file
* **docker:*name***: the name of a Docker container image (to be downloaded from [Docker Hub][a])
* **shub:*name***: the name of an Apptainer/Singularity container image (to be downloaded from [Singularity Hub][b])
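For example (a sketch only; `example.eb`, the image path, and the image names are placeholders):
```console
$ eb example.eb -C --container-base localimage:/tmp/example.simg --experimental
$ eb example.eb -C --container-base docker:ubuntu:20.04 --experimental
$ eb example.eb -C --container-base shub:user/image:tag --experimental
```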
## Building Container Images
To instruct EasyBuild to also build a container image from the generated container recipe, use `--container-build-image` (in combination with `-C` or `--containerize`).
EasyBuild will leverage functionality provided by the container software of choice (see Container Image Format below) to build the container image.
For example, in the case of Apptainer/Singularity, EasyBuild will run `sudo /path/to/singularity build` on the generated container recipe.
The container image will be placed in the location specified by the `--containerpath` configuration option (see Location for generated container recipes & images (`--containerpath`)), next to the generated container recipe that was used to build the image.
## Example Usage
In this example, we will use a pre-built base container image located at `/tmp/example.simg` (see also Base container image (`--container-base`)).
To let EasyBuild generate a container recipe for GCC 6.4.0 + binutils 2.28:
```console
eb GCC-6.4.0-2.28.eb --containerize --container-base localimage:/tmp/example.simg --experimental
```
With other configuration options left to default (see the output of `eb --show-config`), this will result in an Apptainer/Singularity container recipe using example.simg as a base image, which will be stored in `$HOME/.local/easybuild/containers`:
```console
$ eb GCC-6.4.0-2.28.eb --containerize --container-base localimage:/tmp/example.simg --experimental
== temporary log file in case of crash /tmp/eb-dLZTNF/easybuild-LPLeG0.log
== Singularity definition file created at /home/example/.local/easybuild/containers/Singularity.GCC-6.4.0-2.28
== Temporary log file(s) /tmp/eb-dLZTNF/easybuild-LPLeG0.log* have been removed.
== Temporary directory /tmp/eb-dLZTNF has been removed.
```
## Example of a Generated Container Recipe
Below is an example of container recipe generated by EasyBuild, using the following command:
```console
eb Python-3.6.4-foss-2018a.eb OpenMPI-2.1.2-GCC-6.4.0-2.28.eb -C --container-base shub:shahzebsiddiqui/eb-singularity:centos-7.4.1708 --experimental
```
It uses the *shahzebsiddiqui/eb-singularity:centos-7.4.1708* base container image that is available from the Apptainer/Singularity hub ([see this webpage][c]).
```
Bootstrap: shub
From: shahzebsiddiqui/eb-singularity:centos-7.4.1708
%post
yum --skip-broken -y install openssl-devel libssl-dev libopenssl-devel
yum --skip-broken -y install libibverbs-dev libibverbs-devel rdma-core-devel
# upgrade easybuild package automatically to latest version
pip install -U easybuild
# change to 'easybuild' user
su - easybuild
eb Python-3.6.4-foss-2018a.eb OpenMPI-2.1.2-GCC-6.4.0-2.28.eb --robot --installpath=/app/ --prefix=/scratch --tmpdir=/scratch/tmp
# exit from 'easybuild' user
exit
# cleanup
rm -rf /scratch/tmp/* /scratch/build /scratch/sources /scratch/ebfiles_repo
%runscript
eval "$@"
%environment
source /etc/profile
module use /app/modules/all
ml Python/3.6.4-foss-2018a OpenMPI/2.1.2-GCC-6.4.0-2.28
%labels
```
!!! note
We also specify the easyconfig file for the OpenMPI component of `foss/2018a` here, because it requires specific OS dependencies to be installed (see the second `yum ... install` line in the generated container recipe).
We intend to let EasyBuild take into account the OS dependencies of the entire software stack automatically in a future update.
The generated container recipe includes `pip install -U easybuild` to ensure that the latest version of EasyBuild is used to build the software in the container image, regardless of whether EasyBuild was already present in the container and which version it was.
In addition, the generated module files will follow the default module-naming scheme (EasyBuildMNS). The modules that correspond to the easyconfig files that were specified on the command line will be loaded automatically; see the statements in the %environment section of the generated container recipe.
## Example of Building Container Image
You can instruct EasyBuild to also build the container image by using `--container-build-image`.
Note that you will need to enter your sudo password (unless you recently executed a sudo command in the same shell session):
```console
$ eb GCC-6.4.0-2.28.eb --containerize --container-base localimage:/tmp/example.simg --container-build-image --experimental
== temporary log file in case of crash /tmp/eb-aYXYC8/easybuild-8uXhvu.log
== Singularity tool found at /usr/bin/singularity
== Singularity version '2.4.6' is 2.4 or higher ... OK
== Singularity definition file created at /home/example/.local/easybuild/containers/Singularity.GCC-6.4.0-2.28
== Running 'sudo /usr/bin/singularity build /home/example/.local/easybuild/containers/GCC-6.4.0-2.28.simg /home/example/.local/easybuild/containers/Singularity.GCC-6.4.0-2.28', you may need to enter your 'sudo' password...
== (streaming) output for command 'sudo /usr/bin/singularity build /home/example/.local/easybuild/containers/GCC-6.4.0-2.28.simg /home/example/.local/easybuild/containers/Singularity.GCC-6.4.0-2.28':
Using container recipe deffile: /home/example/.local/easybuild/containers/Singularity.GCC-6.4.0-2.28
Sanitizing environment
Adding base Singularity environment to container
...
== temporary log file in case of crash /scratch/tmp/eb-WnmCI_/easybuild-GcKyY9.log
== resolving dependencies ...
...
== building and installing GCCcore/6.4.0...
...
== building and installing binutils/2.28-GCCcore-6.4.0...
...
== building and installing GCC/6.4.0-2.28...
...
== COMPLETED: Installation ended successfully
== Results of the build can be found in the log file(s) /app/software/GCC/6.4.0-2.28/easybuild/easybuild-GCC-6.4.0-20180424.084946.log
== Build succeeded for 15 out of 15
...
Building Singularity image...
Singularity container built: /home/example/.local/easybuild/containers/GCC-6.4.0-2.28.simg
Cleaning up...
== Singularity image created at /home/example/.local/easybuild/containers/GCC-6.4.0-2.28.simg
== Temporary log file(s) /tmp/eb-aYXYC8/easybuild-8uXhvu.log* have been removed.
== Temporary directory /tmp/eb-aYXYC8 has been removed.
```
To inspect the container image, you can use `singularity shell` to start a shell session in the container:
```console
$ singularity shell --shell "/bin/bash --norc" $HOME/.local/easybuild/containers/GCC-6.4.0-2.28.simg
Singularity GCC-6.4.0-2.28.simg:~> source /etc/profile
Singularity GCC-6.4.0-2.28.simg:~> module list
Currently Loaded Modules:
1) GCCcore/6.4.0 2) binutils/2.28-GCCcore-6.4.0 3) GCC/6.4.0-2.28
Singularity GCC-6.4.0-2.28.simg:~> which gcc
/app/software/GCCcore/6.4.0/bin/gcc
Singularity GCC-6.4.0-2.28.simg:~> gcc --version
gcc (GCC) 6.4.0
...
```
Or, you can use `singularity exec` to execute a command in the container.
Compare the output of running which gcc and `gcc --version` locally:
```console
$ which gcc
/usr/bin/gcc
$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
...
```
and the output when running the same commands in the container:
```console
$ singularity exec GCC-6.4.0-2.28.simg which gcc
/app/software/GCCcore/6.4.0/bin/gcc
$ singularity exec GCC-6.4.0-2.28.simg gcc --version
gcc (GCC) 6.4.0
...
```
## Configuration
### Location for Generated Container Recipes & Images
To control the location where EasyBuild will put generated container recipes & images, use the `--containerpath` configuration setting. Next to providing this as an option to the eb command, you can also define the `$EASYBUILD_CONTAINERPATH` environment variable or specify containerpath in an EasyBuild configuration file.
The default value for this location is `$HOME/.local/easybuild/containers`, unless the `--prefix` configuration setting was provided, in which case it becomes `<prefix>/containers` (see Overall prefix path (`--prefix`)).
Use `eb --show-full-config | grep containerpath` to determine the currently active setting.
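A minimal sketch of overriding the location via the environment variable (the path is illustrative):
```console
$ export EASYBUILD_CONTAINERPATH=/scratch/project/PROJECT_ID/containers
$ eb --show-full-config | grep containerpath
```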
### Container Image Format
The format for container images that EasyBuild produces via the functionality provided by the container software can be controlled via the `--container-image-format` configuration setting.
For Apptainer/Singularity containers (see Type of container recipe/image to generate (`--container-type`)), three image formats are supported:
* squashfs (default): compressed images using squashfs read-only file system
* ext3: writable image file using ext3 file system
* sandbox: container image in a regular directory
See also official user guide on [Image Mounts format][d] and [Building a Container][e].
### Name for Container Recipe & Image
By default, EasyBuild will use the name of the first easyconfig file (without the .eb suffix) as a name for both the container recipe and the image.
You can specify an alternate name using the `--container-image-name` configuration setting.
The filename of the generated container recipe will be `Singularity.<name>`.
The filename of the container image will be `<name><extension>`, where the value for `<extension>` depends on the image format (see Container image format (`--container-image-format`)):
* `.simg` for squashfs container images
* `.img` for ext3 container images
* empty for sandbox container images (in which case the container image is actually a directory rather than a file)
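For example, the following sketch (reusing the base image from the examples above) would produce a recipe named `Singularity.my-gcc` and, with `--container-build-image` added, an image named `my-gcc.simg`; the name `my-gcc` is illustrative:
```console
$ eb GCC-6.4.0-2.28.eb -C --container-base localimage:/tmp/example.simg --container-image-name my-gcc --experimental
```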
### Temporary Directory for Creating Container Images
The container software that EasyBuild leverages to build container images may be using a temporary directory in a location that does not have sufficient free space.
You can instruct EasyBuild to pass an alternate location via the `--container-tmpdir` configuration setting.
For Apptainer/Singularity, the default is to use `/tmp`, see the [documentation][f]. If `--container-tmpdir` is specified, the `$SINGULARITY_TMPDIR` environment variable will be defined accordingly to let Apptainer/Singularity use that location instead.
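A minimal sketch of redirecting the temporary directory to a location with more free space (the path is illustrative):
```console
$ eb GCC-6.4.0-2.28.eb -C --container-base localimage:/tmp/example.simg --container-build-image --container-tmpdir /scratch/project/PROJECT_ID/tmp --experimental
```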
### Type of Container Recipe/Image to Generate

With the `--container-type` configuration option, you can specify what type of container recipe/image EasyBuild should generate (see the example after the list). Possible values are:
* singularity (default): [Singularity][g] container recipes & images
* docker: [Docker][h] container recipe & images
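For example, a sketch of generating a Docker recipe instead of the default Singularity one (the base image is illustrative):
```console
$ eb GCC-6.4.0-2.28.eb -C --container-base docker:ubuntu:20.04 --container-type docker --experimental
```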
For detailed documentation, see the [webpage][i].
[a]: https://hub.docker.com/
[b]: https://singularity-hub.org/
[c]: https://singularity-hub.org/collections/143
[d]: https://apptainer.org/docs/user/latest/bind_paths_and_mounts.html#image-mounts
[e]: https://apptainer.org/docs/user/latest/build_a_container.html
[f]: https://apptainer.org/docs/user/latest/build_env.html#temporary-folders
[g]: https://singularity.lbl.gov
[h]: https://docs.docker.com/
[i]: http://easybuild.readthedocs.io/en/latest/Containers.html
# EasyBuild
The objective of this tutorial is to show how EasyBuild can be used to ease, automate, and script the build of software on the IT4Innovations clusters. Two use-cases are considered. First, we are going to build a software that is supported by EasyBuild. Then, we will see through a simple example how to add support for a new software in EasyBuild.
The benefit of using EasyBuild for your builds is that it allows automated and reproducible build of software. Once a build has been made, the build script (via the EasyConfig file) or the installed software (via the module file) can be shared with other users.
## Short Introduction
EasyBuild is a tool that allows performing automated and reproducible software compilation and installation.
All builds and installations are performed at user level, so you do not need the admin rights. The software is installed in your home directory (by default in `$HOME/.local/easybuild/software/`) and a module file is generated (by default in `$HOME/.local/easybuild/modules/`) to use the software.
EasyBuild relies on two main concepts:
* Toolchains
* EasyConfig file (our easyconfigs are [here][a])
A detailed documentation is available [here][b].
## Toolchains
A toolchain corresponds to a compiler and a set of libraries, which are commonly used to build a software. The two main toolchains frequently used on the IT4Innovations clusters are the **foss** and **intel**.
* **foss** is based on the GCC compiler and on open-source libraries (OpenMPI, OpenBLAS, etc.).
* **intel** is based on the Intel compiler and on Intel libraries (Intel MPI, Intel Math Kernel Library, etc.).
Additional details are available [here][c].
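To see which toolchains EasyBuild knows about, or to look for easyconfigs built with a particular toolchain version, you can use, for example:
```console
$ eb --list-toolchains        # list all toolchains known to EasyBuild
$ eb -S foss-2022b            # search for easyconfigs using the foss/2022b toolchain
```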
## EasyConfig File
The EasyConfig file is a simple text file that describes the build process of a software. For most software that uses standard procedure (like configure, make, and make install), this file is very simple. Many EasyConfig files are already provided with EasyBuild.
By default, EasyConfig files and generated modules are named using the following convention:
`software-name-software-version-toolchain-name-toolchain-version(-suffix).eb`
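For example, the easyconfig built later in this tutorial, `git-2.38.1-foss-2022b.eb`, follows this convention: software `git`, version `2.38.1`, toolchain `foss` in version `2022b`, with no suffix.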
Additional details are available [here][d].
## EasyBuild on IT4Innovations Clusters
To use EasyBuild on a compute node, load the `EasyBuild` module:
```console
$ ml av easybuild
------------------------------------------ /apps/modules/tools -------------------------------------------
EasyBuild/4.3.3 (S) EasyBuild/4.4.2 (S) EasyBuild/4.5.4 (S) EasyBuild/4.6.2 (S)
EasyBuild/4.3.4 (S) EasyBuild/4.5.0 (S) EasyBuild/4.5.5 (S) EasyBuild/4.7.0 (S,D)
EasyBuild/4.4.0 (S) EasyBuild/4.5.1 (S) EasyBuild/4.6.0 (S)
EasyBuild/4.4.1 (S) EasyBuild/4.5.3 (S) EasyBuild/4.6.1 (S)
Where:
S: Module is Sticky, requires --force to unload or purge
D: Default Module
Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
$ ml EasyBuild
```
The EasyBuild command is `eb`. Check the version you have loaded:
```console
$ eb --version
This is EasyBuild 4.7.0 (framework: 4.7.0, easyblocks: 4.7.0) on host login1.karolina.it4i.cz.
```
To get help on the EasyBuild options, use the `-h` or `-H` option flags:
```console
$ eb -h
Usage: eb [options] easyconfig [...]
Builds software based on easyconfig (or parse a directory). Provide one or more easyconfigs or
directories, use -H or --help more information.
Options:
-h show short help message and exit
-H OUTPUT_FORMAT show full help message and exit
Debug and logging options (configfile section MAIN):
-d Enable debug log mode (default: False)
Basic options:
Basic runtime options for EasyBuild. (configfile section basic)
...
```
## Build Software Using Provided EasyConfig File
### Search for Available Easyconfig
Searching for available easyconfig files can be done using the `--search` (long output) and `-S` (short output) command line options. All easyconfig files available in the robot search path are considered and searching is done case-insensitive.
```console
$ eb -S git
CFGS1=/apps/easybuild/easyconfigs-it4i
CFGS2=/apps/easybuild/easyconfigs-master/easybuild/easyconfigs
CFGS3=/apps/easybuild/easyconfigs-develop/easybuild/easyconfigs
* $CFGS1/.gitignore
* $CFGS1/.gitlab-ci.yml
* $CFGS1/g/git-lfs/git-lfs-1.1.1.eb
* $CFGS1/g/git-lfs/git-lfs-2.11.0.eb
* $CFGS1/g/git-lfs/git-lfs-3.1.2.eb
* $CFGS1/g/git/git-2.19.1.eb
* $CFGS1/g/git/git-2.21.0.eb
* $CFGS1/g/git/git-2.23.0.eb
* $CFGS1/g/git/git-2.25.1.eb
* $CFGS1/g/git/git-2.30.1.eb
* $CFGS1/g/git/git-2.31.1.eb
* $CFGS1/g/git/git-2.32.0-GCCcore-10.3.0-nodocs-test.eb
* $CFGS2/b/BCALM/BCALM-2.2.0-fix-nogit.patch
* $CFGS2/d/dagitty/dagitty-0.2-2-foss-2018b-R-3.5.1.eb
* $CFGS2/e/EMAN2/EMAN2-2.3_fix_broken_githash_regex_replace.patch
* $CFGS2/g/GIMIC/GIMIC-2018.04.20_git.patch
* $CFGS2/g/GitPython/GitPython-2.1.11-foss-2018b-Python-3.6.6.eb
* $CFGS2/g/GitPython/GitPython-2.1.11-intel-2018b-Python-3.6.6.eb
* $CFGS2/g/GitPython/GitPython-2.1.15.eb
* $CFGS2/g/GitPython/GitPython-3.0.3-GCCcore-8.2.0-Python-3.7.2.eb
* $CFGS2/g/GitPython/GitPython-3.1.0-GCCcore-8.3.0-Python-3.7.4.eb
* $CFGS2/g/GitPython/GitPython-3.1.9-GCCcore-9.3.0-Python-3.8.2.eb
* $CFGS2/g/GitPython/GitPython-3.1.14-GCCcore-10.2.0.eb
* $CFGS2/g/GitPython/GitPython-3.1.18-GCCcore-10.3.0.eb
* $CFGS2/g/GitPython/GitPython-3.1.24-GCCcore-11.2.0.eb
* $CFGS2/g/GitPython/GitPython-3.1.27-GCCcore-11.3.0.eb
* $CFGS2/g/gettext/gettext-0.19.8_fix-git-config.patch
* $CFGS2/g/git-extras/git-extras-5.1.0-foss-2016a.eb
...
```
### Get an Overview of Planned Installations
You can do a “dry-run” overview by supplying `-D`/`--dry-run` (typically combined with `--robot`, in the form of `-Dr`):
```console
$ eb git-2.30.1.eb -Dr
== Temporary log file in case of crash /tmp/eb-6vwvor2_/easybuild-vg82aat4.log
Dry run: printing build status of easyconfigs and dependencies
CFGS=/apps/easybuild
* [x] $CFGS/easyconfigs-master/easybuild/easyconfigs/m/M4/M4-1.4.18.eb (module: M4/1.4.18)
* [x] $CFGS/easyconfigs-it4i/a/Autoconf/Autoconf-2.69.eb (module: Autoconf/2.69)
* [ ] $CFGS/easyconfigs-it4i/g/git/git-2.30.1.eb (module: git/2.30.1)
== Temporary log file(s) /tmp/eb-6vwvor2_/easybuild-vg82aat4.log* have been removed.
== Temporary directory /tmp/eb-6vwvor2_ has been removed.
```
### Compile and Install Module
If we try to build *git-2.31.1.eb*, nothing will happen as it is already installed on the cluster. To enable dependency resolution, use the `--robot` command line option (or `-r` for short):
```console
$ eb git-2.31.1.eb -r
== Temporary log file in case of crash /tmp/eb-11d_kpht/easybuild-jmygqpqr.log
== git/2.31.1 is already installed (module found), skipping
== No easyconfigs left to be built.
== Build succeeded for 0 out of 0
== Temporary log file(s) /tmp/eb-11d_kpht/easybuild-jmygqpqr.log* have been removed.
== Temporary directory /tmp/eb-11d_kpht has been removed.
```
Rebuild *git-2.31.1.eb*. Use `eb --rebuild` to rebuild a given easyconfig/module or use `eb --force`/`-f` to force the reinstallation of a given easyconfig/module. The behavior of `--force` is the same as `--rebuild` and `--ignore-osdeps`.
```console
$ eb git-2.31.1.eb -r -f
== Temporary log file in case of crash /tmp/eb-wbzf_rxh/easybuild-umq1_01u.log
== resolving dependencies ...
== processing EasyBuild easyconfig /apps/easybuild/easyconfigs-it4i/g/git/git-2.31.1.eb
== building and installing git/2.31.1...
== fetching files...
== creating build dir, resetting environment...
== ... (took 3 secs)
== unpacking...
== ... (took 9 secs)
== patching...
== preparing...
== configuring...
== ... (took 4 secs)
== building...
== ... (took 4 secs)
== testing...
== installing...
== ... (took 2 secs)
== taking care of extensions...
== restore after iterating...
== postprocessing...
== sanity checking...
== cleaning up...
== ... (took 3 secs)
== creating module...
== permissions...
== packaging...
== COMPLETED: Installation ended successfully (took 30 secs)
== Results of the build can be found in the log file(s)
/home/username/.local/easybuild/software/git/2.31.1/easybuild/easybuild-git-2.31.1-20230315.092001.log
== Build succeeded for 1 out of 1
== Temporary log file(s) /tmp/eb-wbzf_rxh/easybuild-umq1_01u.log* have been removed.
== Temporary directory /tmp/eb-wbzf_rxh has been removed.
```
If we try to build *git-2.30.1.eb*:
```console
$ eb git-2.30.1.eb -r
== Temporary log file in case of crash /tmp/eb-s3t9lwk_/easybuild-cvx5kpna.log
== resolving dependencies ...
== processing EasyBuild easyconfig /apps/easybuild/easyconfigs-it4i/g/git/git-2.30.1.eb
== building and installing git/2.30.1...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== ... (took 10 secs)
== patching...
== preparing...
== configuring...
== ... (took 4 secs)
== building...
== ... (took 4 secs)
== testing...
== installing...
== ... (took 3 secs)
== taking care of extensions...
== restore after iterating...
== postprocessing...
== sanity checking...
== cleaning up...
== ... (took 3 secs)
== creating module...
== permissions...
== packaging...
== COMPLETED: Installation ended successfully (took 29 secs)
== Results of the build can be found in the log file(s)
/home/username/.local/easybuild/software/git/2.30.1/easybuild/easybuild-git-2.30.1-20230315.092117.log
== Build succeeded for 1 out of 1
== Temporary log file(s) /tmp/eb-s3t9lwk_/easybuild-cvx5kpna.log* have been removed.
== Temporary directory /tmp/eb-s3t9lwk_ has been removed.
```
If we want to build *git-2.30.1* but only have the easyconfig *git-2.25.1.eb* at hand, we can change the version with the `--try-software-version=2.30.1` option:
```console
$ eb git-2.25.1.eb -r --try-software-version=2.30.1
== Temporary log file in case of crash /tmp/eb-lw9itci8/easybuild-qzb7j64j.log
== resolving dependencies ...
== processing EasyBuild easyconfig /tmp/eb-lw9itci8/tweaked_easyconfigs/git-2.30.1.eb
== building and installing git/2.30.1...
== fetching files...
== ... (took 4 secs)
== creating build dir, resetting environment...
== unpacking...
== ... (took 9 secs)
== patching...
== preparing...
== configuring...
== ... (took 4 secs)
== building...
== ... (took 4 secs)
== testing...
== installing...
== ... (took 4 secs)
== taking care of extensions...
== restore after iterating...
== postprocessing...
== sanity checking...
== cleaning up...
== ... (took 3 secs)
== creating module...
== permissions...
== packaging...
== COMPLETED: Installation ended successfully (took 33 secs)
== Results of the build can be found in the log file(s)
/home/username/.local/easybuild/software/git/2.30.1/easybuild/easybuild-git-2.30.1-20230315.092313.log
== Build succeeded for 1 out of 1
== Temporary log file(s) /tmp/eb-lw9itci8/easybuild-qzb7j64j.log* have been removed.
== Temporary directory /tmp/eb-lw9itci8 has been removed.
```
### MODULEPATH
To see the newly installed modules, you need to add the path where they were installed to the MODULEPATH. On the cluster, you have to use the `module use` command:
```console
$ module use $HOME/.local/easybuild/modules/all/
```
or modify your `.bash_profile`:
```console
$ cat ~/.bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
module use $HOME/.local/easybuild/modules/all/
PATH=$PATH:$HOME/bin
export PATH
```
## Build Software Using Your Own EasyConfig File
For this example, we create an EasyConfig file to build Git 2.38.1 with the *foss* toolchain. Open your favorite editor and create a file named *git-2.38.1-foss-2022b.eb* with the following content:
```console
$ vim git-2.38.1-foss-2022b.eb
```
```python
easyblock = 'ConfigureMake'
name = 'git'
version = '2.38.1'
homepage = 'https://git-scm.com/'
description = """Git is a free and open source distributed version control system designed
to handle everything from small to very large projects with speed and efficiency."""
toolchain = {'name': 'foss', 'version': '2022b'}
source_urls = ['https://github.com/git/git/archive']
sources = ['v%(version)s.tar.gz']
builddependencies = [
('binutils', '2.39'),
('Autotools', '20220317'),
]
dependencies = [
('cURL', '7.86.0'),
('expat', '2.4.9'),
('gettext', '0.21.1'),
('Perl', '5.36.0'),
]
preconfigopts = 'make configure && '
# Work around git build system bug. If LIBS contains -lpthread, then configure
# will not append -lpthread to LDFLAGS, but Makefile ignores LIBS.
configopts = "--with-perl=${EBROOTPERL}/bin/perl --enable-pthreads='-lpthread'"
postinstallcmds = ['cd contrib/subtree; make install']
sanity_check_paths = {
'files': ['bin/git'],
'dirs': ['libexec/git-core', 'share'],
}
moduleclass = 'tools'
```
This is a simple EasyConfig. Most of the fields are self-descriptive. No build method is explicitly defined, so the standard configure/make/make install approach is used by default.
Let us build Git with this EasyConfig file:
```console
$ eb git-2.38.1-foss-2022b.eb -r
== Temporary log file in case of crash /tmp/eb-2aiq9qr8/easybuild-eb4zenze.log
== resolving dependencies ...
== processing EasyBuild easyconfig /home/username/git-2.38.1-foss-2022b.eb
== building and installing git/2.38.1-foss-2022b...
== fetching files...
== ... (took 3 secs)
== creating build dir, resetting environment...
== unpacking...
== ... (took 11 secs)
== patching...
== preparing...
== ... (took 2 secs)
== configuring...
== ... (took 7 secs)
== building...
== ... (took 7 secs)
== testing...
== installing...
== ... (took 2 secs)
== taking care of extensions...
== restore after iterating...
== postprocessing...
== sanity checking...
== ... (took 1 secs)
== cleaning up...
== ... (took 4 secs)
== creating module...
== ... (took 1 secs)
== permissions...
== packaging...
== COMPLETED: Installation ended successfully (took 41 secs)
== Results of the build can be found in the log file(s)
/home/username/.local/easybuild/software/git/2.38.1-foss-2022b/easybuild/easybuild-git-2.38.1-20230315.0957
22.log
== Build succeeded for 1 out of 1
== Temporary log file(s) /tmp/eb-2aiq9qr8/easybuild-eb4zenze.log* have been removed.
== Temporary directory /tmp/eb-2aiq9qr8 has been removed.
```
We can now check that our version of Git is available via the modules:
```console
$ ml av git
------------------------------- /home/username/.local/easybuild/modules/all -------------------------------
git/2.38.1-foss-2022b
------------------------------------------ /apps/modules/devel -------------------------------------------
libgit2/1.1.0-GCCcore-10.3.0
------------------------------------------ /apps/modules/tools -------------------------------------------
git-lfs/3.1.2 git/2.32.0-GCCcore-10.3.0-nodocs
git/2.28.0-GCCcore-10.2.0-nodocs git/2.33.1-GCCcore-11.2.0-nodocs
git/2.31.1 git/2.36.0-GCCcore-11.3.0-nodocs
git/2.32.0-GCCcore-10.3.0-nodocs-test git/2.38.1-GCCcore-12.2.0-nodocs (D)
Where:
D: Default Module
Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
```
## Advanced EasyBuild Configuration
By creating the `~/.config/easybuild/config.cfg` file, you can easily specify the desired location of your software, CUDA compute capabilities, and other options that you would usually have to specify within your easyconfig or on the command line. To get an overview of all available options, use the `eb --confighelp` command.
You can use our template to set all of the usual EasyBuild variables:
```console
[MAIN]
[basic]
locks-dir=EASYBUILD_ROOT/.locks/
robot=/apps/easybuild/easyconfigs-it4i:/apps/easybuild/easyconfigs-master/easybuild/easyconfigs:/apps/easybuild/easyconfigs-develop/easybuild/easyconfigs
robot-paths=/apps/easybuild/easyconfigs-it4i:/apps/easybuild/easyconfigs-master/easybuild/easyconfigs:/apps/easybuild/easyconfigs-develop/easybuild/easyconfigs
[config]
buildpath=/dev/shm/USER/build
installpath=EASYBUILD_ROOT
installpath-modules=EASYBUILD_ROOT/modules
installpath-software=EASYBUILD_ROOT/all
moduleclasses=python
repository=FileRepository
repositorypath=EASYBUILD_ROOT/file-repository
sourcepath=EASYBUILD_ROOT/sources
[easyconfig]
local-var-naming-check=error
[override]
# 8.0 for Karolina, 7.0 for Barbora
cuda-compute-capabilities=CUDA_CC
detect-loaded-modules=purge
enforce-checksums=True
silence-deprecation-warnings=True
trace=True
```
!!! note
Do not forget to add the path to your modules to MODULEPATH using the `module use` command in your `~/.bashrc` to be able to lookup and use your installed modules.
The template requires you to fill in the `EASYBUILD_ROOT`, `CUDA_CC`, and `USER` variables. `EASYBUILD_ROOT` is the top-level directory which will hold all of your EasyBuild related data. `CUDA_CC` defines the CUDA compute capabilities of the graphics cards, and `USER` should preferably be set to your username.
If you plan on writing more than one or two of your own easyconfigs, it might be useful to set up a custom easyconfig repository. Simply prepend its path to the `robot` and `robot-paths` variables.
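A minimal sketch of filling in the placeholders (the paths, username, and compute capability are illustrative; `8.0` corresponds to Karolina as noted in the template):
```console
$ sed -i 's|EASYBUILD_ROOT|/home/username/easybuild|g; s|CUDA_CC|8.0|g; s|USER|username|g' ~/.config/easybuild/config.cfg
```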
A detailed documentation regarding EasyBuild configuration is available [here][e].
[a]: https://code.it4i.cz/sccs/easyconfigs-it4i
[b]: https://docs.easybuild.io/
[c]: https://github.com/easybuilders/easybuild/wiki/Compiler-toolchains
[d]: https://github.com/easybuilders/easybuild-easyconfigs
[e]: https://docs.easybuild.io/configuration/