# GSL
The GNU Scientific Library. Provides a wide range of mathematical routines.
## Introduction
The GNU Scientific Library (GSL) provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting. There are over 1000 functions in total. The routines have been written from scratch in C, and present a modern API for C programmers, allowing wrappers to be written for very high level languages.
The library covers a wide range of topics in numerical computing. Routines are available for the following areas:
* Complex Numbers
* Roots of Polynomials
* Special Functions
* Vectors and Matrices
* Permutations
* Combinations
* Sorting
* BLAS Support
* Linear Algebra
* CBLAS Library
* Fast Fourier Transforms
* Eigensystems
* Random Numbers
* Quadrature
* Random Distributions
* Quasi-Random Sequences
* Histograms
* Statistics
* Monte Carlo Integration
* N-Tuples
* Differential Equations
* Simulated Annealing
* Numerical Differentiation
* Interpolation
* Series Acceleration
* Chebyshev Approximations
* Root-Finding
* Discrete Hankel Transforms
* Least-Squares Fitting
* Minimization
* IEEE Floating-Point
* Physical Constants
* Basis Splines
* Wavelets
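As a small illustration of the special functions area listed above (not part of the original documentation), a minimal sketch evaluating a Bessel function might look as follows; the hypothetical file `bessel.c` links the same way as the examples below:

```cpp
#include <stdio.h>
#include <gsl/gsl_sf_bessel.h>

int main(void)
{
    double x = 5.0;
    /* Regular cylindrical Bessel function of zeroth order, J0(x) */
    double y = gsl_sf_bessel_J0(x);
    printf("J0(%g) = %.18e\n", x, y);
    return 0;
}
```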
## Modules
For the list of available gsl modules, use the command:
```console
$ ml av gsl
---------------- /apps/modules/numlib -------------------
GSL/2.5-intel-2017c GSL/2.6-iccifort-2020.1.217 GSL/2.7-GCC-10.3.0 (D)
GSL/2.6-GCC-10.2.0 GSL/2.6-iccifort-2020.4.304
```
## Linking
Load an appropriate `gsl` module. Use the `-lgsl` switch to link your code against GSL. GSL depends on the CBLAS API to a BLAS library, which must be supplied for linking. The BLAS implementation may be provided, for example, by the MKL library, or by GSL's own CBLAS library (`-lgslcblas`). Using MKL is recommended.
### Compiling and Linking With Intel Compilers
```console
$ ml intel/2020b GSL/2.6-iccifort-2020.4.304
$ icc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -mkl -lgsl
```
### Compiling and Linking With GNU Compilers
```console
$ ml GCC/10.2.0 imkl/2020.4.304-iimpi-2020b GSL/2.6-iccifort-2020.4.304
$ gcc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lgsl
```
## Example
Following is an example of a discrete wavelet transform implemented by GSL:
```cpp
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <gsl/gsl_sort.h>
#include <gsl/gsl_wavelet.h>

int
main (int argc, char **argv)
{
  int i, n = 256, nc = 20;
  double *data = malloc (n * sizeof (double));
  double *abscoeff = malloc (n * sizeof (double));
  size_t *p = malloc (n * sizeof (size_t));

  gsl_wavelet *w;
  gsl_wavelet_workspace *work;

  w = gsl_wavelet_alloc (gsl_wavelet_daubechies, 4);
  work = gsl_wavelet_workspace_alloc (n);

  for (i = 0; i < n; i++)
    data[i] = sin (3.141592654 * (double) i / 256.0);

  gsl_wavelet_transform_forward (w, data, 1, n, work);

  for (i = 0; i < n; i++)
    {
      abscoeff[i] = fabs (data[i]);
    }

  gsl_sort_index (p, abscoeff, 1, n);

  /* Keep only the nc largest coefficients, zero out the rest */
  for (i = 0; (i + nc) < n; i++)
    data[p[i]] = 0;

  gsl_wavelet_transform_inverse (w, data, 1, n, work);

  for (i = 0; i < n; i++)
    {
      printf ("%g\n", data[i]);
    }

  gsl_wavelet_free (w);
  gsl_wavelet_workspace_free (work);

  free (data);
  free (abscoeff);
  free (p);
  return 0;
}
```
Load modules and compile:
```console
$ ml intel/2020b GSL/2.6-iccifort-2020.4.304
$ icc dwt.c -o dwt.x -Wl,-rpath=$LIBRARY_PATH -mkl -lgsl
```
In this example, we compile the `dwt.c` code using the Intel compiler and link it against the MKL and GSL libraries; note the `-mkl` and `-lgsl` options. The library search path is compiled in, so that no modules are necessary to run the code.
# HDF5
Hierarchical Data Format library. Serial and MPI parallel version.
[HDF5 (Hierarchical Data Format)][a] is a general purpose library and file format for storing scientific data. HDF5 can store two primary objects: datasets and groups. A dataset is essentially a multidimensional array of data elements, and a group is a structure for organizing objects in an HDF5 file. Using these two basic objects, one can create and store almost any kind of scientific data structure, such as images, arrays of vectors, and structured and unstructured grids. You can also mix and match them in HDF5 files according to your needs.
## Installed Versions
For the current list of installed versions, use:
```console
$ ml av HDF5
----------------------------------------------------- /apps/modules/data ------------------------------------------------------
HDF5/1.10.6-foss-2020b-parallel HDF5/1.10.6-intel-2020a HDF5/1.10.7-gompi-2021a
HDF5/1.10.6-iimpi-2020a HDF5/1.10.6-intel-2020b-parallel HDF5/1.10.7-gompic-2020b
HDF5/1.10.6-intel-2020a-parallel HDF5/1.10.7-gompi-2020b HDF5/1.10.7-iimpi-2020b (D)
```
To load the module, use the `ml` command.
The module sets up environment variables required for linking and running HDF5 enabled applications. Make sure that the choice of the HDF5 module is consistent with your choice of the MPI library. Mixing MPI of different implementations may cause unexpected results.
## Example
```cpp
#include "hdf5.h"
#define FILE "dset.h5"

int main() {

   hid_t   file_id, dataset_id, dataspace_id; /* identifiers */
   hsize_t dims[2];
   herr_t  status;
   int     i, j, dset_data[4][6];

   /* Create a new file using default properties. */
   file_id = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

   /* Create the data space for the dataset. */
   dims[0] = 4;
   dims[1] = 6;
   dataspace_id = H5Screate_simple(2, dims, NULL);

   /* Initialize the dataset. */
   for (i = 0; i < 4; i++)
      for (j = 0; j < 6; j++)
         dset_data[i][j] = i * 6 + j + 1;

   /* Create the dataset. */
   dataset_id = H5Dcreate2(file_id, "/dset", H5T_STD_I32BE, dataspace_id,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

   /* Write the dataset. */
   status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
                     dset_data);

   status = H5Dread(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
                    dset_data);

   /* End access to the dataset and release resources used by it. */
   status = H5Dclose(dataset_id);

   /* Terminate access to the data space. */
   status = H5Sclose(dataspace_id);

   /* Close the file. */
   status = H5Fclose(file_id);
}
```
Load modules and compile:
```console
$ ml intel/2020b HDF5/1.10.6-intel-2020b-parallel
$ mpicc hdf5test.c -o hdf5test.x -Wl,-rpath=$LIBRARY_PATH $HDF5_INC $HDF5_SHLIB
```
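The `-parallel` builds listed above also provide the MPI-IO file driver. As an illustration (not from the original documentation), a minimal sketch of creating a file collectively from all MPI ranks could look like the following; it assumes one of the parallel HDF5 modules and its matching MPI are loaded, it compiles with the same `mpicc` line as above, and the file name is purely illustrative:

```cpp
#include "hdf5.h"
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* File-access property list selecting the MPI-IO driver */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

    /* All ranks create/open the file collectively */
    hid_t file_id = H5Fcreate("pdset.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    H5Pclose(fapl);
    H5Fclose(file_id);
    MPI_Finalize();
    return 0;
}
```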
For further information, see the [website][a].
[a]: http://www.hdfgroup.org/HDF5/
# Intel Numerical Libraries
Intel libraries for high performance in numerical computing.
## Intel Math Kernel Library
Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, extensively threaded and optimized for maximum performance. Intel MKL unites and provides these basic components: BLAS, LAPACK, ScaLapack, PARDISO, FFT, VML, VSL, Data fitting, Feast Eigensolver, and many more.
```console
$ ml av mkl
------------------- /apps/modules/numlib -------------------
imkl/2017.4.239-iimpi-2017c imkl/2020.1.217-iimpi-2020a imkl/2021.2.0-iimpi-2021a (D)
imkl/2018.4.274-iimpi-2018a imkl/2020.4.304-iimpi-2020b (L) mkl/2020.4.304
imkl/2019.1.144-iimpi-2019a imkl/2020.4.304-iompi-2020b
```
!!! info
    `imkl` ... with the Intel toolchain; `mkl` ... with the system toolchain.
For more information, see the [Intel MKL][1] section.
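As a quick illustration (not from the original text), a dot product through MKL's CBLAS interface; this assumes an `imkl` or `mkl` module is loaded and the code is linked with `-mkl` (Intel compilers) or the explicit MKL libraries (GCC), as shown for GSL above:

```cpp
#include <stdio.h>
#include <mkl.h>

int main(void)
{
    double x[3] = {1.0, 2.0, 3.0};
    double y[3] = {4.0, 5.0, 6.0};

    /* BLAS level-1 dot product via the CBLAS interface */
    double d = cblas_ddot(3, x, 1, y, 1);

    printf("dot = %f\n", d);   /* expected: 32.000000 */
    return 0;
}
```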
## Intel Integrated Performance Primitives
Intel Integrated Performance Primitives (Intel IPP) is available via the `ipp` module. IPP is a library of highly optimized algorithmic building blocks for media and data applications. This includes signal, image, and frame processing algorithms, such as FFT, FIR, convolution, optical flow, Hough transform, sum, MinMax, and many more.
```console
$ ml av ipp
------------------- /apps/modules/perf -------------------
ipp/2020.3.304
```
For more information, see the [Intel IPP][2] section.
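For illustration (an addition of ours, not part of the original docs), one such building block is element-wise vector addition; the sketch below assumes the `ipp` module is loaded and the code is linked against the signal-processing and core IPP libraries (typically `-lipps -lippcore`):

```cpp
#include <stdio.h>
#include <ipp.h>

int main(void)
{
    Ipp32f a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    Ipp32f b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    Ipp32f c[4];

    ippInit();                 /* dispatch to the best code path for this CPU */
    ippsAdd_32f(a, b, c, 4);   /* c[i] = a[i] + b[i] */

    for (int i = 0; i < 4; i++)
        printf("%f\n", c[i]);
    return 0;
}
```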
## Intel Threading Building Blocks
Intel Threading Building Blocks (Intel TBB) is a library that supports scalable parallel programming using standard ISO C++ code. It does not require special languages or compilers. It is designed to promote scalable data parallel programming. Additionally, it fully supports nested parallelism, so you can build larger parallel components from smaller parallel components. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner.
```console
$ ml av tbb
------------------- /apps/modules/lib -------------------
tbb/2020.3-GCCcore-10.2.0
```
Read more at the [Intel TBB][3].
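To illustrate the task-based model described above (our sketch, not part of the original text), a loop expressed with `tbb::parallel_for`; it assumes the `tbb` module is loaded and the code is compiled with, for example, `g++ -O2 ... -ltbb`:

```cpp
#include <cstdio>
#include <vector>
#include <tbb/parallel_for.h>

int main()
{
    std::vector<double> v(1000000, 1.0);

    // Express the work as tasks over an index range;
    // TBB maps the tasks onto worker threads.
    tbb::parallel_for(std::size_t(0), v.size(), [&](std::size_t i) {
        v[i] *= 2.0;
    });

    std::printf("v[0] = %f\n", v[0]);
    return 0;
}
```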
## Python Hooks for Intel Math Kernel Library
Python hooks for Intel(R) Math Kernel Library runtime control settings.
```console
$ ml av mkl-service
------------------- /apps/modules/data -------------------
mkl-service/2.3.0-intel-2020b
```
Read more at the [hooks][a].
[1]: ../intel/intel-suite/intel-mkl.md
[2]: ../intel/intel-suite/intel-integrated-performance-primitives.md
[3]: ../intel/intel-suite/intel-tbb.md
[a]: https://github.com/IntelPython/mkl-service
# PETSc
PETSc is a suite of building blocks for the scalable solution of scientific and engineering applications modeled by partial differential equations. It supports MPI, shared memory, and GPU through CUDA or OpenCL, as well as hybrid MPI-shared memory or MPI-GPU parallelism.
## Introduction
PETSc (Portable, Extensible Toolkit for Scientific Computation) is a suite of building blocks (data structures and routines) for the scalable solution of scientific and engineering applications modeled by partial differential equations. It allows thinking in terms of high-level objects (matrices) instead of low-level objects (raw arrays). It is written in C, but can also be called from Fortran, C++, Python, and Java code. It supports MPI, shared memory, and GPUs through CUDA or OpenCL, as well as hybrid MPI-shared memory or MPI-GPU parallelism.
## Resources
* [project webpage][a]
* [documentation][b]
* [PETSc Users Manual (PDF)][c]
* [index of all manual pages][d]
* PRACE Video Tutorial [part1][e], [part2][f], [part3][g], [part4][h], [part5][i]
## Modules
For the current list of installed versions, use:
```console
$ ml av petsc
```
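With a `petsc` module loaded, a minimal program could look like the sketch below (an illustration of ours, not from the original docs; the file name `petsc_hello.c` is hypothetical, and compilation is assumed to use the MPI compiler wrapper with `-lpetsc`):

```cpp
#include <petscvec.h>

int main(int argc, char **argv)
{
    Vec       x;
    PetscReal norm;
    PetscInt  n = 100;

    PetscInitialize(&argc, &argv, NULL, NULL);

    /* Create a parallel vector distributed across all MPI ranks */
    VecCreate(PETSC_COMM_WORLD, &x);
    VecSetSizes(x, PETSC_DECIDE, n);
    VecSetFromOptions(x);

    VecSet(x, 1.0);
    VecNorm(x, NORM_2, &norm);
    PetscPrintf(PETSC_COMM_WORLD, "||x|| = %g\n", (double)norm);

    VecDestroy(&x);
    PetscFinalize();
    return 0;
}
```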
## External Libraries
PETSc needs at least MPI, BLAS, and LAPACK. These dependencies are currently satisfied with Intel MPI and Intel MKL in `petsc` modules.
PETSc can be linked with a plethora of [external numerical libraries][k], extending PETSc functionality, e.g. direct linear system solvers, preconditioners, or partitioners. See below the list of libraries currently included in `petsc` modules.
All these libraries can also be used alone, without PETSc. Their static or shared program libraries are available in
`$PETSC_DIR/$PETSC_ARCH/lib` and header files in `$PETSC_DIR/$PETSC_ARCH/include`. `PETSC_DIR` and `PETSC_ARCH` are environment variables pointing to a specific PETSc instance based on the PETSc module loaded.
* dense linear algebra
* [Elemental][l]
* sparse linear system solvers
* [Intel MKL Pardiso][m]
* [MUMPS][n]
* [PaStiX][o]
* [SuiteSparse][p]
* [SuperLU][q]
* [SuperLU_Dist][r]
* input/output
* [ExodusII][s]
* [HDF5][t]
* [NetCDF][u]
* partitioning
* [Chaco][v]
* [METIS][w]
* [ParMETIS][x]
* [PT-Scotch][y]
* preconditioners & multigrid
* [Hypre][z]
* [SPAI - Sparse Approximate Inverse][aa]
[a]: http://www.mcs.anl.gov/petsc/
[b]: http://www.mcs.anl.gov/petsc/documentation/
[c]: http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf
[d]: http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/singleindex.html
[e]: http://www.youtube.com/watch?v=asVaFg1NDqY
[f]: http://www.youtube.com/watch?v=ubp_cSibb9I
[g]: http://www.youtube.com/watch?v=vJAAAQv-aaw
[h]: http://www.youtube.com/watch?v=BKVlqWNh8jY
[i]: http://www.youtube.com/watch?v=iXkbLEBFjlM
[j]: https://www.mcs.anl.gov/petsc/miscellaneous/petscthreads.html
[k]: http://www.mcs.anl.gov/petsc/miscellaneous/external.html
[l]: http://libelemental.org/
[m]: https://software.intel.com/en-us/node/470282
[n]: http://mumps.enseeiht.fr/
[o]: http://pastix.gforge.inria.fr/
[p]: http://faculty.cse.tamu.edu/davis/suitesparse.html
[q]: http://crd.lbl.gov/~xiaoye/SuperLU/#superlu
[r]: http://crd.lbl.gov/~xiaoye/SuperLU/#superlu_dist
[s]: http://sourceforge.net/projects/exodusii/
[t]: http://www.hdfgroup.org/HDF5/
[u]: http://www.unidata.ucar.edu/software/netcdf/
[v]: http://www.cs.sandia.gov/CRF/chac.html
[w]: http://glaros.dtc.umn.edu/gkhome/metis/metis/overview
[x]: http://glaros.dtc.umn.edu/gkhome/metis/parmetis/overview
[y]: http://www.labri.fr/perso/pelegrin/scotch/
[z]: http://www.nersc.gov/users/software/programming-libraries/math-libraries/petsc/
[aa]: https://bitbucket.org/petsc/pkg-spai
# CUDA Quantum for Python
## What Is CUDA Quantum?
CUDA Quantum streamlines hybrid application development and promotes productivity and scalability in quantum computing. It offers a unified programming model designed for a hybrid setting—that is, CPUs, GPUs, and QPUs working together.
For more information, see the [official documentation][1].
## How to Install Version Without GPU Acceleration
Use (preferably in a conda environment):
```bash
pip install cuda-quantum
```
## How to Install Version With GPU Acceleration Using Conda
Run:
```bash
conda create -y -n cuda-quantum python=3.10 pip
conda install -y -n cuda-quantum -c "nvidia/label/cuda-11.8.0" cuda
conda install -y -n cuda-quantum -c conda-forge mpi4py openmpi cxx-compiler cuquantum
conda env config vars set -n cuda-quantum LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$CONDA_PREFIX/envs/cuda-quantum/lib"
conda env config vars set -n cuda-quantum MPI_PATH=$CONDA_PREFIX/envs/cuda-quantum
conda run -n cuda-quantum pip install cuda-quantum
conda activate cuda-quantum
source $CONDA_PREFIX/lib/python3.10/site-packages/distributed_interfaces/activate_custom_mpi.sh
```
Then configure the MPI:
``` bash
export OMPI_MCA_opal_cuda_support=true OMPI_MCA_btl='^openib'
```
## How to Test Your Installation?
You can test your installation by running the following script:
```python
import cudaq

# Build a trivial one-qubit kernel, apply an X gate, and measure
kernel = cudaq.make_kernel()
qubit = kernel.qalloc()
kernel.x(qubit)
kernel.mz(qubit)

result = cudaq.sample(kernel)
print(result)
```
## Further Questions Regarding the Installation?
See the CUDA Quantum PyPI page at [https://pypi.org/project/cuda-quantum/][2].
## Example QNN
The *qnn_example.py* script loads the FashionMNIST dataset, selects two classes (shirts and pants), and builds a neural network with a quantum layer. The network is then trained on this data and evaluated on the test dataset. You are free to try it on your own. Download the [QNN example][a] and rename it to `qnn_example.py`.
![](../img/cudaq.png)
[1]: https://nvidia.github.io/cuda-quantum/latest/index.html
[2]: https://pypi.org/project/cuda-quantum/
[a]: ../src/qnn_example.txt
# NVIDIA CUDA
## Introduction
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs).
## Installed Versions
For the current list of installed versions, use:
```console
$ ml av CUDA
```
## CUDA Programming
The default programming model for GPU accelerators is NVIDIA CUDA. To set up the environment for CUDA, use:
```console
$ ml CUDA
```
CUDA code can be compiled directly on login nodes. The user does not have to use compute nodes with GPU accelerators for compilation. To compile CUDA source code, use the NVCC compiler:
```console
$ nvcc --version
```
The CUDA Toolkit comes with a large number of examples, which can be a helpful reference to start with. To compile and test these examples, users should copy them to their home directory:
```console
$ cd ~
$ mkdir cuda-samples
$ cp -R /apps/nvidia/cuda/VERSION_CUDA/samples/* ~/cuda-samples/
```
To compile the examples, change directory to the particular example (here the example used is deviceQuery) and run `make` to start the compilation:
```console
$ cd ~/cuda-samples/1_Utilities/deviceQuery
$ make
```
Request an interactive session on the `qgpu` queue and execute the binary file:
```console
$ salloc -p qgpu -A PROJECT_ID
$ ml CUDA
$ ~/cuda-samples/1_Utilities/deviceQuery/deviceQuery
```
The expected output of the deviceQuery example executed on a node with a Tesla K20m is:
```console
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Tesla K20m"
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 3.5
Total amount of global memory: 4800 MBytes (5032706048 bytes)
(13) Multiprocessors x (192) CUDA Cores/MP: 2496 CUDA Cores
GPU Clock rate: 706 MHz (0.71 GHz)
Memory Clock rate: 2600 Mhz
Memory Bus Width: 320-bit
L2 Cache Size: 1310720 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Enabled
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 2 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = Tesla K20m
```
### Code Example
In this section, we provide a basic CUDA based vector addition code example. You can directly copy and paste the code into a file (e.g. `test.cu`) to test it:
```cpp
// test.cu

#define N (2048*2048)
#define THREADS_PER_BLOCK 512

#include <stdio.h>
#include <stdlib.h>

// GPU kernel function to add two vectors
__global__ void add_gpu( int *a, int *b, int *c, int n){
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < n)
        c[index] = a[index] + b[index];
}

// CPU function to add two vectors
void add_cpu (int *a, int *b, int *c, int n) {
    for (int i=0; i < n; i++)
        c[i] = a[i] + b[i];
}

// CPU function to generate a vector of random integers
void random_ints (int *a, int n) {
    for (int i = 0; i < n; i++)
        a[i] = rand() % 10000; // random number between 0 and 9999
}

// CPU function to compare two vectors
int compare_ints( int *a, int *b, int n ){
    int pass = 0;
    for (int i = 0; i < n; i++){
        if (a[i] != b[i]) {
            printf("Value mismatch at location %d, values %d and %d\n", i, a[i], b[i]);
            pass = 1;
        }
    }
    if (pass == 0) printf ("Test passed\n"); else printf ("Test failed\n");
    return pass;
}

int main( void ) {
    int *a, *b, *c;               // host copies of a, b, c
    int *dev_a, *dev_b, *dev_c;   // device copies of a, b, c
    int size = N * sizeof( int ); // we need space for N integers

    // Allocate GPU/device copies of dev_a, dev_b, dev_c
    cudaMalloc( (void**)&dev_a, size );
    cudaMalloc( (void**)&dev_b, size );
    cudaMalloc( (void**)&dev_c, size );

    // Allocate CPU/host copies of a, b, c
    a = (int*)malloc( size );
    b = (int*)malloc( size );
    c = (int*)malloc( size );

    // Fill input vectors with random integer numbers
    random_ints( a, N );
    random_ints( b, N );

    // Copy inputs to device
    cudaMemcpy( dev_a, a, size, cudaMemcpyHostToDevice );
    cudaMemcpy( dev_b, b, size, cudaMemcpyHostToDevice );

    // Launch add_gpu() kernel with blocks and threads
    add_gpu<<< N/THREADS_PER_BLOCK, THREADS_PER_BLOCK >>>( dev_a, dev_b, dev_c, N );

    // Copy device result back to host copy of c
    cudaMemcpy( c, dev_c, size, cudaMemcpyDeviceToHost );

    // Check the results against the CPU implementation
    int *c_h; c_h = (int*)malloc( size );
    add_cpu (a, b, c_h, N);
    compare_ints(c, c_h, N);

    // Clean CPU memory allocations
    free( a ); free( b ); free( c ); free (c_h);

    // Clean GPU memory allocations
    cudaFree( dev_a );
    cudaFree( dev_b );
    cudaFree( dev_c );

    return 0;
}
```
This code can be compiled using the following command:
```console
$ nvcc test.cu -o test_cuda
```
To run the code, request an interactive session to get access to one of the GPU accelerated nodes:
```console
$ salloc -p qgpu -A PROJECT_ID
$ ml cuda
$ ./test_cuda
```
## CUDA Libraries
### cuBLAS
The NVIDIA CUDA Basic Linear Algebra Subroutines (cuBLAS) library is a GPU-accelerated version of the complete standard BLAS library with 152 standard BLAS routines. A basic description of the library together with basic performance comparisons with MKL can be found [here][a].
#### cuBLAS Example: SAXPY
The SAXPY function multiplies the vector x by the scalar alpha and adds it to the vector y, overwriting the latter with the result. A description of the cuBLAS function can be found in the [NVIDIA CUDA documentation][b]. The code can be pasted into a file and compiled without any modification:
```cpp
/* Includes, system */
#include <stdio.h>
#include <stdlib.h>

/* Includes, cuda */
#include <cuda_runtime.h>
#include <cublas_v2.h>

/* Vector size */
#define N (32)

/* Host implementation of a simple version of saxpy */
void saxpy(int n, float alpha, const float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = alpha*x[i] + y[i];
}

/* Main */
int main(int argc, char **argv)
{
    float *h_X, *h_Y, *h_Y_ref;
    float *d_X = 0;
    float *d_Y = 0;
    const float alpha = 1.0f;
    int i;
    cublasHandle_t handle;

    /* Initialize CUBLAS */
    printf("simpleCUBLAS test running..\n");
    cublasCreate(&handle);

    /* Allocate host memory for the vectors */
    h_X = (float *)malloc(N * sizeof(h_X[0]));
    h_Y = (float *)malloc(N * sizeof(h_Y[0]));
    h_Y_ref = (float *)malloc(N * sizeof(h_Y_ref[0]));

    /* Fill the vectors with test data */
    for (i = 0; i < N; i++)
    {
        h_X[i] = rand() / (float)RAND_MAX;
        h_Y[i] = rand() / (float)RAND_MAX;
        h_Y_ref[i] = h_Y[i];
    }

    /* Allocate device memory for the vectors */
    cudaMalloc((void **)&d_X, N * sizeof(d_X[0]));
    cudaMalloc((void **)&d_Y, N * sizeof(d_Y[0]));

    /* Initialize the device vectors with the host vectors */
    cublasSetVector(N, sizeof(h_X[0]), h_X, 1, d_X, 1);
    cublasSetVector(N, sizeof(h_Y[0]), h_Y, 1, d_Y, 1);

    /* Performs operation using plain C code */
    saxpy(N, alpha, h_X, h_Y_ref);

    /* Performs operation using cublas */
    cublasSaxpy(handle, N, &alpha, d_X, 1, d_Y, 1);

    /* Read the result back */
    cublasGetVector(N, sizeof(h_Y[0]), d_Y, 1, h_Y, 1);

    /* Check result against reference */
    for (i = 0; i < N; ++i)
        printf("CPU res = %f \t GPU res = %f \t diff = %f\n", h_Y_ref[i], h_Y[i], h_Y_ref[i] - h_Y[i]);

    /* Memory clean up */
    free(h_X); free(h_Y); free(h_Y_ref);
    cudaFree(d_X); cudaFree(d_Y);

    /* Shutdown */
    cublasDestroy(handle);

    return 0;
}
```
!!! note
    cuBLAS has its own functions for data transfers between CPU and GPU memory:

    - [cublasSetVector][c] - transfers data from CPU to GPU memory
    - [cublasGetVector][d] - transfers data from GPU to CPU memory
To compile the code using the NVCC compiler, the `-lcublas` compiler flag has to be specified:
```console
$ ml cuda
$ nvcc -lcublas test_cublas.cu -o test_cublas_nvcc
```
To compile the same code with GCC:
```console
$ ml cuda
$ gcc -std=c99 test_cublas.c -o test_cublas_gcc -lcublas -lcudart
```
To compile the same code with the Intel compiler:
```console
$ ml cuda
$ ml intel
$ icc -std=c99 test_cublas.c -o test_cublas_icc -lcublas -lcudart
```
[a]: https://developer.nvidia.com/cublas
[b]: http://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-axpy
[c]: http://docs.nvidia.com/cuda/cublas/index.html#cublassetvector
[d]: http://docs.nvidia.com/cuda/cublas/index.html#cublasgetvector
# ROCm HIP
## Introduction
ROCm HIP allows developers to convert [CUDA code][a] to portable C++. The same source code can be compiled to run on NVIDIA or AMD GPUs.
This page documents the use of pre-built Apptainer (previously Singularity) image on Karolina Accelerated nodes (acn).
## Get Into GPU Node
```console
$ salloc -p qgpu -A PROJECT_ID -t 01:00:00
salloc: Granted job allocation 1543777
salloc: Waiting for resource configuration
salloc: Nodes acn41 are ready for job
```
## Installed Versions of Apptainer
For the current list of installed versions, use:
```console
module avail apptainer
# ----------------- /apps/modules/tools ------------------
# apptainer-wrappers/1.0 (A) apptainer/1.1.5
```
Load the required module:
```console
module load apptainer/1.1.5
```
## Launch Apptainer
Run the container:
```console
singularity shell /home/username/rocm/centos7-nvidia-rocm.sif
```
The above gives you the Apptainer shell prompt:
```console
Singularity>
```
## Inside Container
Verify that you have GPUs active and accessible on the given node:
```console
nvidia-smi
```
You should get output similar to:
```console
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.07 Driver Version: 515.65.07 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100-SXM... Off | 00000000:07:00.0 Off | 0 |
| N/A 26C P0 50W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM... Off | 00000000:0B:00.0 Off | 0 |
| N/A 26C P0 51W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-SXM... Off | 00000000:48:00.0 Off | 0 |
| N/A 22C P0 51W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-SXM... Off | 00000000:4C:00.0 Off | 0 |
| N/A 25C P0 52W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 4 NVIDIA A100-SXM... Off | 00000000:88:00.0 Off | 0 |
| N/A 22C P0 51W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 5 NVIDIA A100-SXM... Off | 00000000:8B:00.0 Off | 0 |
| N/A 26C P0 54W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 6 NVIDIA A100-SXM... Off | 00000000:C8:00.0 Off | 0 |
| N/A 25C P0 52W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 7 NVIDIA A100-SXM... Off | 00000000:CB:00.0 Off | 0 |
| N/A 26C P0 51W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
```
### Code Example
In this section, we show a basic code example. You can directly copy and paste the code to test it:
```cpp
// filename : /tmp/sample.cu

#include <stdio.h>
#include <cuda_runtime.h>

#define CHECK(cmd) \
{\
    cudaError_t error = cmd;\
    if (error != cudaSuccess) { \
        fprintf(stderr, "error: '%s'(%d) at %s:%d\n", cudaGetErrorString(error), error,__FILE__, __LINE__); \
        exit(EXIT_FAILURE);\
    }\
}

/*
 * Square each element in the array A and write to array C.
 */
template <typename T>
__global__ void
vector_square(T *C_d, T *A_d, size_t N)
{
    size_t offset = (blockIdx.x * blockDim.x + threadIdx.x);
    size_t stride = blockDim.x * gridDim.x;

    for (size_t i=offset; i<N; i+=stride) {
        C_d[i] = A_d[i] * A_d[i];
    }
}

int main(int argc, char *argv[])
{
    float *A_d, *C_d;
    float *A_h, *C_h;
    size_t N = 1000000;
    size_t Nbytes = N * sizeof(float);

    cudaDeviceProp props;
    CHECK(cudaGetDeviceProperties(&props, 0/*deviceID*/));
    printf ("info: running on device %s\n", props.name);

    printf ("info: allocate host mem (%6.2f MB)\n", 2*Nbytes/1024.0/1024.0);
    A_h = (float*)malloc(Nbytes);
    CHECK(A_h == 0 ? cudaErrorMemoryAllocation : cudaSuccess );
    C_h = (float*)malloc(Nbytes);
    CHECK(C_h == 0 ? cudaErrorMemoryAllocation : cudaSuccess );

    // Fill with Phi + i
    for (size_t i=0; i<N; i++)
    {
        A_h[i] = 1.618f + i;
    }

    printf ("info: allocate device mem (%6.2f MB)\n", 2*Nbytes/1024.0/1024.0);
    CHECK(cudaMalloc(&A_d, Nbytes));
    CHECK(cudaMalloc(&C_d, Nbytes));

    printf ("info: copy Host2Device\n");
    CHECK ( cudaMemcpy(A_d, A_h, Nbytes, cudaMemcpyHostToDevice));

    const unsigned blocks = 512;
    const unsigned threadsPerBlock = 256;
    printf ("info: launch 'vector_square' kernel\n");
    vector_square <<<blocks, threadsPerBlock>>> (C_d, A_d, N);

    printf ("info: copy Device2Host\n");
    CHECK ( cudaMemcpy(C_h, C_d, Nbytes, cudaMemcpyDeviceToHost));

    printf ("info: check result\n");
    for (size_t i=0; i<N; i++) {
        if (C_h[i] != A_h[i] * A_h[i]) {
            CHECK(cudaErrorUnknown);
        }
    }
    printf ("PASSED!\n");
}
```
First convert the CUDA sample code into HIP code:
```console
cd /tmp
/opt/rocm/hip/bin/hipify-perl sample.cu > sample.cpp
```
This code can then be compiled using the following commands:
```console
cd /tmp
export HIP_PLATFORM=$( /opt/rocm/hip/bin/hipconfig --platform )
export HIPCC=/opt/rocm/hip/bin/hipcc
$HIPCC sample.cpp -o sample
```
Running it, you should get the following output:
```console
Singularity> cd /tmp
Singularity> ./sample
info: running on device NVIDIA A100-SXM4-40GB
info: allocate host mem ( 7.63 MB)
info: allocate device mem ( 7.63 MB)
info: copy Host2Device
info: launch 'vector_square' kernel
info: copy Device2Host
info: check result
PASSED!
```
[a]: nvidia-cuda.md
<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:12px;
overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:12px;
font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-lzqt{background-color:#656565;border-color:inherit;color:#ffffff;font-weight:bold;text-align:center;vertical-align:top}
.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top}
.tg .tg-7btt{border-color:inherit;font-weight:bold;text-align:center;vertical-align:top}
</style>
# NVIDIA HPC SDK
The NVIDIA HPC Software Development Kit includes the proven compilers, libraries, and software tools
essential to maximizing developer productivity and the performance and portability of HPC applications.
## Installed Versions
Different versions are available on Karolina, Barbora, and DGX-2.
For the currently installed versions, use the command:
```console
ml av nvhpc
```
## Components
Below is the list of components in the NVIDIA HPC SDK.
<table class="tg">
<thead>
<tr>
<th class="tg-lzqt" colspan="7">Development</th>
<th class="tg-lzqt" colspan="2">Analysis</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tg-7btt">Programming<br>Models</td>
<td class="tg-7btt" colspan="2">Compilers</td>
<td class="tg-7btt">Core<br>Libraries</td>
<td class="tg-7btt" colspan="2">Math<br>Libraries</td>
<td class="tg-7btt">Communication<br>Libraries</td>
<td class="tg-7btt">Profilers</td>
<td class="tg-7btt">Debuggers</td>
</tr>
<tr>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/hpc-sdk/compilers/c++-parallel-algorithms/index.html" target="blank">Standard C++</a> &amp; <a href="https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/index.html" target="">Fortran</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html" target="blank">nvcc</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html" target="blank">nvc</a></td>
<td class="tg-c3ow"><a href="https://nvidia.github.io/libcudacxx/" target="blank">libcu++</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/cublas/index.html#abstract" target="blank">cuBLAS</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/cutensor/index.html" target="blank">cuTENSOR</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html#mpi-use" target="blank">Open MPI</a></td>
<td class="tg-c3ow">Nsight</td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/cuda-gdb/index.html" target="blank">Cuda-gdb</a></td>
</tr>
<tr>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/hpc-sdk/compilers/openacc-gs/index.html" target="blank">OpenACC</a> &amp; <a href="https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html#openmp-use" target="blank">OpenMP</a></td>
<td class="tg-c3ow" colspan="2"><a href="https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html" target="blank">nvc++</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/thrust/" target="blank">Thrust</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/cusparse/index.html#abstract" target="blank">cuSPARSE</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/cusolver/index.html#abstract" target="blank">cuSOLVER</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/nvshmem/" target="blank">NVSHMEM</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/nsight-systems/" target="blank">Systems</a></td>
<td class="tg-c3ow">Host</td>
</tr>
<tr>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html" target="blank">CUDA</a></td>
<td class="tg-c3ow" colspan="2"><a href="https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html" target="blank">nvfortran</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/cub/index.html" target="blank">CUB</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/cufft/index.html#abstract" target="blank">cuFFT</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/cuda/curand/index.html" target="blank">cuRAND</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/index.html" target="blank">NCCL</a></td>
<td class="tg-c3ow"><a href="https://docs.nvidia.com/nsight-compute/" target="blank">Compute</a></td>
<td class="tg-c3ow">Device</td>
</tr>
</tbody>
</table>
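To illustrate the Standard C++ programming model from the table above (a sketch of ours, not taken from the SDK documentation), the following parallel algorithm can be offloaded to the GPU when compiled with `nvc++ -stdpar=gpu`:

```cpp
#include <algorithm>
#include <execution>
#include <vector>
#include <cstdio>

int main()
{
    std::vector<double> v(1 << 20, 1.0);

    // With nvc++ -stdpar=gpu, parallel standard algorithms such as this
    // std::for_each may be offloaded to the GPU.
    std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                  [](double &x) { x *= 2.0; });

    std::printf("v[0] = %f\n", v[0]);
    return 0;
}
```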
## References
[NVIDIA HPC SDK homepage][1]<br>
[Documentation][2]
[1]: https://developer.nvidia.com/hpc-sdk
[2]: https://docs.nvidia.com/hpc-sdk/index.html
# OpenACC MPI Tutorial
This tutorial is an excerpt from Nvidia's [5× in 5 Hours: Porting a 3D Elastic Wave Simulator to GPUs Using OpenACC][1] tutorial.
All source code for this tutorial can be downloaded as part of this [tarball][2].
`SEISMIC_CPML`, developed by Dimitri Komatitsch and Roland Martin from the University of Pau, France,
is a set of ten open-source Fortran 90 programs.
!!! note
    Before building and running each step,
    make sure that the compiler (`pgfortran`) and MPI wrappers (`mpif90`) are in your path.
## Step 0: Evaluation
Before you start, you should evaluate the code to determine
whether it is worth accelerating.
Using the compiler flag `-Minfo=intensity`, you can see
that the average compute intensity of the various loops is between 2.5 and 2.64.
As a rule, anything below 1.0 is generally not worth accelerating
unless it is part of a larger program.
To build and run the original MPI/OpenMP code on your system, do the following:
```console
cd step0
make build
make run
make verify
```
## Step 1: Adding Setup Code
Because this is an MPI code where each process will use its own GPU,
you need to add some utility code to ensure that happens.
The `setDevice` routine first determines which node the process is on
(via a call to `hostid`) and then gathers the hostids from all other processes.
It then determines how many GPUs are available on the node
and assigns the devices to each process.
Note that in order to maintain portability with the CPU version,
this section of code is guarded by the preprocessor macro `_OPENACC`,
which is defined when the OpenACC directives are enabled in the HPC Fortran compiler
through the use of the `-acc` command-line compiler option.
```code
#ifdef _OPENACC
#
function setDevice(nprocs,myrank)
use iso_c_binding
use openacc
implicit none
include 'mpif.h'
interface
function gethostid() BIND(C)
use iso_c_binding
integer (C_INT) :: gethostid
end function gethostid
end interface
integer :: nprocs, myrank
integer, dimension(nprocs) :: hostids, localprocs
integer :: hostid, ierr, numdev, mydev, i, numlocal
integer :: setDevice
! get the hostids so we can determine what other processes are on this node
hostid = gethostid()
CALL mpi_allgather(hostid,1,MPI_INTEGER,hostids,1,MPI_INTEGER, &
MPI_COMM_WORLD,ierr)
! determine which processors are on this node
numlocal=0
localprocs=0
do i=1,nprocs
if (hostid .eq. hostids(i)) then
localprocs(i)=numlocal
numlocal = numlocal+1
endif
enddo
! get the number of devices on this node
numdev = acc_get_num_devices(ACC_DEVICE_NVIDIA)
if (numdev .lt. 1) then
print *, 'ERROR: There are no devices available on this host. &
ABORTING.', myrank
stop
endif
! print a warning if the number of devices is less then the number
! of processes on this node. Having multiple processes share devices is not
! recommended.
if (numdev .lt. numlocal) then
if (localprocs(myrank+1).eq.1) then
! print the message only once per node
print *, 'WARNING: The number of process is greater then the number &
of GPUs.', myrank
endif
mydev = mod(localprocs(myrank+1),numdev)
else
mydev = localprocs(myrank+1)
endif
call acc_set_device_num(mydev,ACC_DEVICE_NVIDIA)
call acc_init(ACC_DEVICE_NVIDIA)
setDevice = mydev
end function setDevice
#endif
```
To build and run the step1 code on your system do the following:
```console
cd step1
make build
make run
make verify
```
## Step 2: Adding Compute Regions
Next, you add six compute regions around the eight parallel loops.
For example, here's the final reduction loop.
```code
!$acc kernels
do k = kmin,kmax
do j = NPOINTS_PML+1, NY-NPOINTS_PML
do i = NPOINTS_PML+1, NX-NPOINTS_PML
! compute kinetic energy first, defined as 1/2 rho ||v||^2
! in principle we should use rho_half_x_half_y instead of rho for vy
! in order to interpolate density at the right location in the staggered grid
! cell but in a homogeneous medium we can safely ignore it
total_energy_kinetic = total_energy_kinetic + 0.5d0 * rho*( &
vx(i,j,k)**2 + vy(i,j,k)**2 + vz(i,j,k)**2)
! add potential energy, defined as 1/2 epsilon_ij sigma_ij
! in principle we should interpolate the medium parameters at the right location
! in the staggered grid cell but in a homogeneous medium we can safely ignore it
! compute total field from split components
epsilon_xx = ((lambda + 2.d0*mu) * sigmaxx(i,j,k) - lambda * &
sigmayy(i,j,k) - lambda*sigmazz(i,j,k)) / (4.d0 * mu * (lambda + mu))
epsilon_yy = ((lambda + 2.d0*mu) * sigmayy(i,j,k) - lambda * &
sigmaxx(i,j,k) - lambda*sigmazz(i,j,k)) / (4.d0 * mu * (lambda + mu))
epsilon_zz = ((lambda + 2.d0*mu) * sigmazz(i,j,k) - lambda * &
sigmaxx(i,j,k) - lambda*sigmayy(i,j,k)) / (4.d0 * mu * (lambda + mu))
epsilon_xy = sigmaxy(i,j,k) / (2.d0 * mu)
epsilon_xz = sigmaxz(i,j,k) / (2.d0 * mu)
epsilon_yz = sigmayz(i,j,k) / (2.d0 * mu)
total_energy_potential = total_energy_potential + &
0.5d0 * (epsilon_xx * sigmaxx(i,j,k) + epsilon_yy * sigmayy(i,j,k) + &
epsilon_yy * sigmayy(i,j,k)+ 2.d0 * epsilon_xy * sigmaxy(i,j,k) + &
2.d0*epsilon_xz * sigmaxz(i,j,k)+2.d0*epsilon_yz * sigmayz(i,j,k))
enddo
enddo
enddo
!$acc end kernels
```
The `-acc` command line option to the HPC Accelerator Fortran compiler enables OpenACC directives. Note that OpenACC is meant to model a generic class of devices.
Another compiler option you'll want to use during development is `-Minfo`,
which provides feedback on optimizations and transformations performed on your code.
For accelerator-specific information, use the `-Minfo=accel` sub-option.
Examples of feedback messages produced when compiling `SEISMIC_CPML` include:
```console
1113, Generating copyin(vz(11:91,11:631,kmin:kmax))
Generating copyin(vy(11:91,11:631,kmin:kmax))
Generating copyin(vx(11:91,11:631,kmin:kmax))
Generating copyin(sigmaxx(11:91,11:631,kmin:kmax))
Generating copyin(sigmayy(11:91,11:631,kmin:kmax))
Generating copyin(sigmazz(11:91,11:631,kmin:kmax))
Generating copyin(sigmaxy(11:91,11:631,kmin:kmax))
Generating copyin(sigmaxz(11:91,11:631,kmin:kmax))
Generating copyin(sigmayz(11:91,11:631,kmin:kmax))
```
To compute on a GPU, the first step is to move data from host memory to GPU memory.
In the example above, the compiler tells you that it is copying over nine arrays.
Note the `copyin` statements.
These mean that the compiler will only copy the data to the GPU
but not copy it back to the host.
This is because line 1113 corresponds to the start of the reduction loop compute region,
where these arrays are used but never modified.
Data movement clauses:
* `copyin` - the data is copied only to the GPU;
* `copy` - the data is copied to the device at the beginning of the region and copied back at the end of the region;
* `copyout` - the data is only copied back to the host.
The compiler is conservative and only copies the data
that's actually required to perform the necessary computations.
Unfortunately, because the interior sub-arrays are not contiguous in host memory,
the compiler needs to generate multiple data transfers for each array.
```console
1114, Loop is parallelizable
1115, Loop is parallelizable
1116, Loop is parallelizable
Accelerator kernel generated
```
Here the compiler has performed dependence analysis
on the loops at lines 1114, 1115, and 1116 (the reduction loop shown earlier).
It finds that all three loops are parallelizable so it generates an accelerator kernel.
The compiler may attempt to work around dependences that prevent parallelization by interchanging loops (i.e., changing the order) where it is safe to do so. At least one outer or interchanged loop must be parallel for an accelerator kernel to be generated.
How the threads are organized is called the loop schedule.
Below you can see the loop schedule for our reduction loop.
The do loops have been replaced with a three-dimensional gang,
which in turn is composed of a two-dimensional vector section.
```console
1114, !$acc loop gang ! blockidx%y
1115, !$acc loop gang, vector(4) ! blockidx%z threadidx%y
1116, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
```
In CUDA terminology, the gang clause corresponds to a grid dimension
and the vector clause corresponds to a thread block dimension.
So here we have a 3-D array that's being grouped into blocks of 32×4 elements
where a single thread is working on a specific element.
Because the number of gangs is not specified in the loop schedule,
it will be determined dynamically when the kernel is launched.
If the gang clause had a fixed width, such as gang(16),
then each kernel would be written to loop over multiple elements.
With CUDA, programming reductions and managing shared memory can be a fairly difficult task.
In the example below, the compiler has automatically generated optimal code using these features.
```console
1122, Sum reduction generated for total_energy_kinetic
1140, Sum reduction generated for total_energy_potential
```
To build and run the step2 code on your system do the following:
```console
cd step2
make build
make run
make verify
```
## Step 3: Adding Data Regions
!!! tip
    Set the environment variable `PGI_ACC_TIME=1` and run your executable.
    This option prints basic profile information such as the kernel execution time,
    data transfer time, initialization time, the actual launch configuration,
    and total time spent in a compute region.
    Note that the total time is measured from the host and includes time spent executing host code within a region.
To improve performance, you should minimize the amount of time spent transferring data;
this is what the `data` directive is for.
You can use a data region to specify exact points in your program
where data should be copied from host memory to GPU memory, and back again.
Any compute region enclosed within a data region will use the previously copied data,
without the need to copy at the boundaries of the compute region.
A data region can span across host code and multiple compute regions,
and even across subroutine boundaries.
In looking at the arrays in `SEISMIC_CPML`, there are 18 arrays with constant values.
Another 21 are used only within compute regions so are never needed on the host.
Let's start by adding a data region around the outer time step loop.
The final three arrays do need to be copied back to the host to pass their halos.
For those cases, we use the update directive.
```code
!---
!--- beginning of time loop
!---
!$acc data &
!$acc copyin(a_x_half,b_x_half,k_x_half, &
!$acc a_y_half,b_y_half,k_y_half, &
!$acc a_z_half,b_z_half,k_z_half, &
!$acc a_x,a_y,a_z,b_x,b_y,b_z,k_x,k_y,k_z, &
!$acc sigmaxx,sigmaxz,sigmaxy,sigmayy,sigmayz,sigmazz, &
!$acc memory_dvx_dx,memory_dvy_dx,memory_dvz_dx, &
!$acc memory_dvx_dy,memory_dvy_dy,memory_dvz_dy, &
!$acc memory_dvx_dz,memory_dvy_dz,memory_dvz_dz, &
!$acc memory_dsigmaxx_dx, memory_dsigmaxy_dy, &
!$acc memory_dsigmaxz_dz, memory_dsigmaxy_dx, &
!$acc memory_dsigmaxz_dx, memory_dsigmayz_dy, &
!$acc memory_dsigmayy_dy, memory_dsigmayz_dz, &
!$acc memory_dsigmazz_dz)
do it = 1,NSTEP
...
!$acc update host(sigmazz,sigmayz,sigmaxz)
! sigmazz(k+1), left shift
call MPI_SENDRECV(sigmazz(:,:,1),number_of_values,MPI_DOUBLE_PRECISION, &
receiver_left_shift,message_tag,sigmazz(:,:,NZ_LOCAL+1), &
number_of_values,
...
!$acc update device(sigmazz,sigmayz,sigmaxz)
...
! --- end of time loop
enddo
!$acc end data
```
Data regions can be nested, and in fact we used this feature
in the time loop body for the arrays vx, vy and vz as shown below.
While these arrays are copied back and forth at the inner data region boundary,
and so are moved more often than the arrays moved in the outer data region,
they are used across multiple compute regions
instead of being copied at each compute region boundary.
Note that we do not specify any array dimensions in the copy clause.
This instructs the compiler to copy each array in its entirety as a contiguous block,
and eliminates the inefficiency we noted earlier
when interior sub-arrays were being copied in multiple blocks.
```code
!$acc data copy(vx,vy,vz)
... data region spans over 5 compute regions and host code
!$acc kernels
...
!$acc end kernels
!$acc end data
```
To build and run the step3 code on your system do the following:
```console
cd step3
make build
make run
make verify
```
## Step 4: Optimizing Data Transfers
The next step further optimizes the data transfers
by migrating as much of the computation as we can over to the GPU
and moving only the absolute minimum amount of data required.
The first step is to move the start of the outer data region up
so that it occurs earlier in the code, and to put the data initialization loops into compute kernels.
This includes the `vx`, `vy`, and `vz` arrays.
This approach enables you to remove the inner data region used in the previous optimization step.
In the following example code, notice the use of the `create` clause.
This instructs the compiler to allocate space for variables in GPU memory for local use
but to perform no data movement on those variables.
Essentially they are used as scratch variables in GPU memory.
```console
!$acc data &
!$acc copyin(a_x_half,b_x_half,k_x_half, &
!$acc a_y_half,b_y_half,k_y_half, &
!$acc a_z_half,b_z_half,k_z_half, &
!$acc ix_rec,iy_rec, &
!$acc a_x,a_y,a_z,b_x,b_y,b_z,k_x,k_y,k_z), &
!$acc copyout(sisvx,sisvy), &
!$acc create(memory_dvx_dx,memory_dvy_dx,memory_dvz_dx, &
!$acc memory_dvx_dy,memory_dvy_dy,memory_dvz_dy, &
!$acc memory_dvx_dz,memory_dvy_dz,memory_dvz_dz, &
!$acc memory_dsigmaxx_dx, memory_dsigmaxy_dy, &
!$acc memory_dsigmaxz_dz, memory_dsigmaxy_dx, &
!$acc memory_dsigmaxz_dx, memory_dsigmayz_dy, &
!$acc memory_dsigmayy_dy, memory_dsigmayz_dz, &
!$acc memory_dsigmazz_dz, &
!$acc vx,vy,vz,vx1,vy1,vz1,vx2,vy2,vz2, &
!$acc sigmazz1,sigmaxz1,sigmayz1, &
!$acc sigmazz2,sigmaxz2,sigmayz2) &
!$acc copyin(sigmaxx,sigmaxz,sigmaxy,sigmayy,sigmayz,sigmazz)
...
! Initialize vx, vy and vz arrays on the device
!$acc kernels
vx(:,:,:) = ZERO
vy(:,:,:) = ZERO
vz(:,:,:) = ZERO
!$acc end kernels
...
```
One caveat to using data regions is that you must be aware of which copy
(host or device) of the data you are actually using in a given loop or computation.
For example, any update to the copy of a variable in device memory
won't be reflected in the host copy until you explicitly transfer the data
using either an `update` directive or a `copy` clause at a data or compute region boundary.
!!! important
    Unintentional loss of coherence between the host and device copy of a variable is one of the most common causes of validation errors in OpenACC programs.
After making the above change to `SEISMIC_CPML`, the code generated incorrect results. After debugging, it was determined that the section of the time step loop
that initializes boundary conditions was omitted from an OpenACC compute region.
As a result, we were initializing the host copy of the data,
rather than the device copy as intended, which resulted in uninitialized variables in device memory.
The next challenge in optimizing the data transfers related to the handling of the halo regions.
`SEISMIC_CPML` passes halos from six 3-D arrays between MPI processes during the course of the computations.
After some experimentation, we settled on an approach whereby we added six new temporary 2-D arrays to hold the halo data.
Within a compute region we gathered the 2-D halos from the main 3-D arrays
into the new temp arrays, copied the temporaries back to the host in one contiguous block,
passed the halos between MPI processes, and finally copied the exchanged values
back to device memory and scattered the halos back into the 3-D arrays.
While this approach does add to the kernel execution time, it saves a considerable amount of data transfer time.
In the example code below, note that the source code added to support the halo
gathers and transfers is guarded by the preprocessor `_OPENACC` macro
and will only be executed if the code is compiled by an OpenACC-enabled compiler.
```code
#ifdef _OPENACC
#
! Gather the sigma 3D arrays to a 2D slice to allow for faster
! copy from the device to host
!$acc kernels
do i=1,NX
do j=1,NY
vx1(i,j)=vx(i,j,1)
vy1(i,j)=vy(i,j,1)
vz1(i,j)=vz(i,j,NZ_LOCAL)
enddo
enddo
!$acc end kernels
!$acc update host(vx1,vy1,vz1)
! vx(k+1), left shift
call MPI_SENDRECV(vx1(:,:), number_of_values, MPI_DOUBLE_PRECISION, &
receiver_left_shift, message_tag, vx2(:,:), number_of_values, &
MPI_DOUBLE_PRECISION, sender_left_shift, message_tag, MPI_COMM_WORLD,&
message_status, code)
! vy(k+1), left shift
call MPI_SENDRECV(vy1(:,:), number_of_values, MPI_DOUBLE_PRECISION, &
receiver_left_shift,message_tag, vy2(:,:),number_of_values, &
MPI_DOUBLE_PRECISION, sender_left_shift, message_tag, MPI_COMM_WORLD,&
message_status, code)
! vz(k-1), right shift
call MPI_SENDRECV(vz1(:,:), number_of_values, MPI_DOUBLE_PRECISION, &
receiver_right_shift, message_tag, vz2(:,:), number_of_values, &
MPI_DOUBLE_PRECISION, sender_right_shift, message_tag, MPI_COMM_WORLD, &
message_status, code)
!$acc update device(vx2,vy2,vz2)
!$acc kernels
do i=1,NX
do j=1,NY
vx(i,j,NZ_LOCAL+1)=vx2(i,j)
vy(i,j,NZ_LOCAL+1)=vy2(i,j)
vz(i,j,0)=vz2(i,j)
enddo
enddo
!$acc end kernels
#else
```
To build and run the step4 code on your system do the following:
```console
cd step4
make build
make run
make verify
```
## Step 5: Loop Schedule Tuning
The final step is to tune the OpenACC compute region loop schedules
using the gang, worker, and vector clauses.
The default kernel schedules chosen by the NVIDIA OpenACC compiler are usually quite good.
Manual tuning efforts often don't improve timings significantly,
but it's always worthwhile to spend a little time examining
whether you can do better by overriding compiler-generated loop schedules
using explicit loop scheduling clauses.
You can usually tell fairly quickly if the clauses are having an effect.
Note that there is no well-defined method for finding an optimal kernel schedule.
The best advice is to start with the compiler's default schedule and try small adjustments
to see if and how they affect execution time.
The kernel schedule you choose will affect whether and how shared memory is used,
global array accesses, and various types of optimizations.
Typically, it's better to perform gang scheduling of loops with large iteration counts.
```code
!$acc loop gang
do k = k2begin,NZ_LOCAL
kglobal = k + offset_k
!$acc loop worker vector collapse(2)
do j = 2,NY
do i = 2,NX
```
To build and run the step5 code on your system do the following:
```console
cd step5
make build
make run
make verify
```
[1]: https://docs.nvidia.com/hpc-sdk/compilers/openacc-mpi-tutorial/index.html
[2]: https://docs.nvidia.com/hpc-sdk/compilers/openacc-mpi-tutorial/openacc-mpi-tutorial.tar.gz
# ANSYS CFX
[ANSYS CFX][a] is a high-performance, general purpose fluid dynamics program
that has been applied to solve wide-ranging fluid flow problems for over 20 years.
At the heart of ANSYS CFX is its advanced solver technology,
the key to achieving reliable and accurate solutions quickly and robustly.
The modern, highly parallelized solver is the foundation for an abundant choice of physical models
to capture virtually any type of phenomena related to fluid flow.
The solver and its many physical models are wrapped in a modern, intuitive, and flexible GUI and user environment,
with extensive capabilities for customization and automation using session files, scripting and a powerful expression language.
To run ANSYS CFX in batch mode, you can utilize/modify the default `cfx.slurm` script and execute it via the `sbatch` command:
```bash
#!/bin/bash
#SBATCH --nodes=5 # Request 5 nodes
#SBATCH --ntasks-per-node=128 # Request 128 MPI processes per node
#SBATCH --job-name=ANSYS-test # Job name
#SBATCH --partition=qcpu # Partition name
#SBATCH --account=ACCOUNT_ID # Account/project ID
#SBATCH --output=%x-%j.out # Output log file with job name and job ID
#SBATCH --time=04:00:00 # Walltime
#!change the working directory (default is home directory)
#cd <working directory> (working directory must exists)
DIR=/scratch/project/PROJECT_ID/$SLURM_JOB_ID
mkdir -p "$DIR"
cd "$DIR" || exit
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following processors:
echo $SLURM_NODELIST
ml ANSYS/2023R2-intel-2022.12
#### Set number of processors per host listing
procs_per_host=1
#### Create host list
hl=""
for host in $(scontrol show hostname $SLURM_NODELIST)
do
if [ "$hl" = "" ]
then hl="$host:$procs_per_host"
else hl="${hl}:$host:$procs_per_host"
fi
done
echo Machines: $hl
#-def input.def includes the input of the CFX analysis in DEF format
#-P the name of the preferred license feature (aa_r=ANSYS Academic Research, ane3fl=Multiphysics(commercial))
cfx5solve -def input.def -size 4 -size-ni 4x -part-large -start-method "Platform MPI Distributed Parallel" -par-dist $hl -P aa_r
```
SVS FEM recommends utilizing resources by the keywords nodes and ppn.
These keywords allow you to directly address the number of nodes (computers) and cores (ppn) utilized in the job.
In addition, the rest of the code assumes this structure of allocated resources.
A working directory has to be created before sending the Slurm job into the queue.
The input file should be in the working directory or a full path to the input file has to be specified.
The input file has to be defined by a common CFX def file which is attached to the CFX solver via the `-def` parameter.
The **license** should be selected by the `-P` parameter.
Licensed products are: `aa_r` (ANSYS **Academic** Research) and `ane3fl` (ANSYS Multiphysics-**Commercial**).
[a]: http://www.ansys.com/products/fluids/ansys-cfx
# ANSYS Fluent
[ANSYS Fluent][a] software contains the broad physical modeling capabilities needed to model flow,
turbulence, heat transfer, and reactions for industrial applications ranging
from air flow over an aircraft wing to combustion in a furnace, from bubble columns to oil platforms,
from blood flow to semiconductor manufacturing, and from clean room design to wastewater treatment plants.
Special models that give the software the ability to model in-cylinder combustion,
aeroacoustics, turbomachinery, and multiphase systems have served to broaden its reach.
## Common Way to Run Fluent
To run ANSYS Fluent in a batch mode, you can utilize/modify the default `fluent.slurm` script and execute it via the `sbatch` command:
```bash
#!/bin/bash
#SBATCH --nodes=5 # Request 5 nodes
#SBATCH --ntasks-per-node=128 # 128 MPI processes per node
#SBATCH --job-name=ANSYS-test # Job name
#SBATCH --partition=qcpu # Partition name
#SBATCH --account=ACCOUNT_ID # Account/project ID
#SBATCH --output=%x-%j.out # Output log file with job name and job ID
#SBATCH --time=04:00:00 # Walltime
#!change the working directory (default is home directory)
#cd <working directory> (working directory must exists)
DIR=/scratch/project/PROJECT_ID/$SLURM_JOB_ID
mkdir -p "$DIR"
cd "$DIR" || exit
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following processors:
echo $SLURM_NODELIST
#### Load the ANSYS module so that we find the fluent command
ml ANSYS/2023R2-intel-2022.12
# Count the total number of cores allocated
NCORES=$SLURM_NTASKS
fluent 3d -t$NCORES -cnf=$SLURM_NODELIST -g -i fluent.jou
```
[SVS FEM][b] recommends specifying resources with the nodes and ppn keywords.
These allow you to directly define the number of nodes (computers) and cores per node (ppn) used by the job.
In addition, the rest of the script assumes this structure of allocated resources.
A working directory has to be created before sending the job into the queue.
The input file should be in the working directory or a full path to the input file has to be specified.
The input file has to be defined by a common Fluent journal file
which is attached to the Fluent solver via the `-i fluent.jou` parameter.
A journal file with the definition of the input geometry and boundary conditions
and defined process of solution has, for example, the following structure:
```console
/file/read-case aircraft_2m.cas.gz
/solve/init
init
/solve/iterate
10
/file/write-case-dat aircraft_2m-solution
/exit yes
```
The appropriate dimension of the problem has to be set by a parameter (`2d`/`3d`).
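For illustration, a minimal sketch of how the solver dimension is selected on the launch line used in the script above (`2ddp` and `3ddp` select the double-precision 2D and 3D solvers; the core count and journal file name are illustrative):
```console
fluent 2ddp -t$NCORES -cnf=$SLURM_NODELIST -g -i fluent.jou   # 2D, double precision
fluent 3d   -t$NCORES -cnf=$SLURM_NODELIST -g -i fluent.jou   # 3D, single precision
```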
## Fast Way to Run Fluent From Command Line
```console
fluent solver_version [FLUENT_options] -i journal_file -slurm
```
This syntax will start the ANSYS FLUENT job under Slurm using the sbatch command.
When resources are available, Slurm will start the job and return the job ID, usually in the form of `_job_ID.hostname_`.
This job ID can then be used to query, control, or stop the job using standard Slurm commands, such as `squeue` or `scancel`.
The job will be run out of the current working directory and all output will be written to the `fluent.o<job_ID>` file.
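For example, a minimal invocation following the syntax above might look like this (the solver version, core count, and journal file name are illustrative):
```console
fluent 3d -t128 -g -i fluent.jou -slurm
```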
## Running Fluent via User's Config File
If no command line arguments are present, the sample script uses a configuration file called slurm_fluent.conf.
This configuration file should be present in the directory from which the jobs are submitted
(which is also the directory in which the jobs are executed).
The following is an example of what the content of slurm_fluent.conf can be:
```console
input="example_small.flin"
case="Small-1.65m.cas"
fluent_args="3d -pmyrinet"
outfile="fluent_test.out"
mpp="true"
```
The following is an explanation of the parameters:

* `input` is the name of the input file.
* `case` is the name of the .cas file that the input file will utilize.
* `fluent_args` are extra ANSYS FLUENT arguments. As shown in the previous example, you can specify the interconnect by using the `-p interconnect` command. The available interconnects include ethernet (default), Myrinet, InfiniBand, Vendor, Altix, and Crayx. MPI is selected automatically, based on the specified interconnect.
* `outfile` is the name of the file to which the standard output will be sent.
* `mpp="true"` will tell the job script to execute the job across multiple processors.
To run ANSYS Fluent in batch mode with the user's config file, you can utilize/modify the following script and execute it via the `sbatch` command:
```bash
#!/bin/sh
#SBATCH --nodes=2 # Request 2 nodes
#SBATCH --ntasks-per-node=4 # 4 MPI processes per node
#SBATCH --cpus-per-task=128 # 128 CPUs (threads) per MPI process
#SBATCH --job-name=$USE-Fluent-Project # Job name
#SBATCH --partition=qprod # Partition name
#SBATCH --account=XX-YY-ZZ # Account/project ID
#SBATCH --output=%x-%j.out # Output file name with job name and job ID
#SBATCH --time=04:00:00 # Walltime
cd $SLURM_SUBMIT_DIR
#We assume that if they didn't specify arguments then they should use the
#config file
if [ "xx${input}${case}${mpp}${fluent_args}zz" = "xxzz" ]; then
  if [ -f slurm_fluent.conf ]; then
    . slurm_fluent.conf
  else
    printf "No command line arguments specified, "
    printf "and no configuration file found. Exiting\n"
    exit 1
  fi
fi
#Augment the ANSYS FLUENT command line arguments
case "$mpp" in
true)
#MPI job execution scenario
  # Count the allocated nodes; NCPUS is expected to hold the number of cores per node
  num_nodes=$(scontrol show hostname $SLURM_NODELIST | sort -u | wc -l)
  cpus=$(expr $num_nodes \* $NCPUS)
#Default arguments for mpp jobs, these should be changed to suit your
#needs.
fluent_args="-t${cpus} $fluent_args -cnf=$SLURM_NODELIST"
;;
*)
#SMP case
#Default arguments for smp jobs, should be adjusted to suit your
#needs.
fluent_args="-t$NCPUS $fluent_args"
;;
esac
#Default arguments for all jobs
fluent_args="-ssh -g -i $input $fluent_args"
echo "---------- Going to start a fluent job with the following settings:
Input: $input
Case: $case
Output: $outfile
Fluent arguments: $fluent_args"
#run the solver
fluent $fluent_args > $outfile
```
It runs the jobs out of the directory from which they are submitted (SLURM_SUBMIT_DIR).
## Running Fluent in Parallel
Fluent could be run in parallel only under the Academic Research license.
To do this, the ANSYS Academic Research license must be placed before the ANSYS CFD license in user preferences.
To make this change, the anslic_admin utility should be run:
```console
/ansys_inc/shared_les/licensing/lic_admin/anslic_admin
```
The ANSLIC_ADMIN utility will be run:
![](../../../img/Fluent_Licence_1.jpg)
![](../../../img/Fluent_Licence_2.jpg)
![](../../../img/Fluent_Licence_3.jpg)
The ANSYS Academic Research license should be moved up to the top of the list:
![](../../../img/Fluent_Licence_4.jpg)
[a]: http://www.ansys.com/products/fluids/ansys-fluent
[b]: http://www.svsfem.cz
# ANSYS LS-DYNA
[ANSYS LS-DYNA][a] provides convenient and easy-to-use access to the technology-rich,
time-tested explicit solver without the need to contend
with the complex input requirements of this sophisticated program.
Introduced in 1996, ANSYS LS-DYNA capabilities have helped customers in numerous industries
to resolve highly intricate design issues.
ANSYS Mechanical users have been able to take advantage of complex explicit solutions
for a long time utilizing the traditional ANSYS Parametric Design Language (APDL) environment.
These explicit capabilities are available to ANSYS Workbench users as well.
The Workbench platform is a powerful, comprehensive, easy-to-use environment for engineering simulation.
CAD import from all sources, geometry cleanup, automatic meshing, solution,
parametric optimization, result visualization, and comprehensive report generation
are all available within a single fully interactive modern graphical user environment.
To run ANSYS LS-DYNA in batch mode, you can utilize/modify the default `ansysdyna.slurm` script
and execute it via the `sbatch` command:
```bash
#!/bin/bash
#SBATCH --nodes=5 # Request 5 nodes
#SBATCH --ntasks-per-node=128 # Request 128 MPI processes per node
#SBATCH --job-name=ANSYS-test # Job name
#SBATCH --partition=qcpu # Partition name
#SBATCH --account=PROJECT_ID # Account/project ID
#SBATCH --output=%x-%j.out # Output log file with job name and job ID
#SBATCH --time=04:00:00 # Walltime
#!change the working directory (default is home directory)
#cd <working directory>
DIR=/scratch/project/PROJECT_ID/$SLURM_JOB_ID
mkdir -p "$DIR"
cd "$DIR" || exit
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following processors:
echo $SLURM_NODELIST
#! Count the number of allocated nodes
NPROCS=$(scontrol show hostname $SLURM_NODELIST | wc -l)
echo This job has allocated $NPROCS nodes
ml ANSYS/2023R2-intel-2022.12
#### Set number of processors per host listing
procs_per_host=1
#### Create host list
hl=""
for host in $(scontrol show hostname $SLURM_NODELIST)
do
if [ "$hl" = "" ]
then hl="$host:$procs_per_host"
else hl="${hl}:$host:$procs_per_host"
fi
done
echo Machines: $hl
ansys211 -dis -lsdynampp i=input.k -machines $hl
```
[SVS FEM][b] recommends specifying resources with the nodes and ppn keywords.
These allow you to directly define the number of nodes (computers)
and cores per node (ppn) used by the job.
In addition, the rest of the script assumes this structure of allocated resources.
[a]: http://www.ansys.com/products/structures/ansys-ls-dyna
[b]: http://www.svsfem.cz
# ANSYS MAPDL
[ANSYS Multiphysics][a] offers a comprehensive product solution for both multiphysics and single-physics analysis.
The product includes structural, thermal, fluid, and both high- and low-frequency electromagnetic analysis.
The product also contains solutions for both direct and sequentially coupled physics problems
including direct coupled-field elements and the ANSYS multi-field solver.
To run ANSYS MAPDL in batch mode you can utilize/modify the default `mapdl.slurm` script and execute it via the `sbatch` command:
```bash
#!/bin/bash
#SBATCH --nodes=5 # Request 5 nodes
#SBATCH --ntasks-per-node=128 # Request 128 MPI processes per node
#SBATCH --job-name=ANSYS-test # Job name
#SBATCH --partition=qcpu # Partition name
#SBATCH --account=PROJECT_ID # Account/project ID
#SBATCH --output=%x-%j.out # Output log file with job name and job ID
#SBATCH --time=04:00:00 # Walltime
#!change the working directory (default is home directory)
#cd <working directory> (working directory must exists)
DIR=/scratch/project/PROJECT_ID/$SLURM_JOB_ID
mkdir -p "$DIR"
cd "$DIR" || exit
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following processors:
echo $SLURM_NODELIST
ml ANSYS/2023R2-intel-2022.12
#### Set number of processors per host listing
procs_per_host=1
#### Create host list
hl=""
for host in $(scontrol show hostname $SLURM_NODELIST)
do
if [ "$hl" = "" ]
then hl="$host:$procs_per_host"
else hl="${hl}:$host:$procs_per_host"
fi
done
echo Machines: $hl
#-i input.dat includes the input of analysis in APDL format
#-o file.out is output file from ansys where all text outputs will be redirected
#-p the name of license feature (aa_r=ANSYS Academic Research, ane3fl=Multiphysics(commercial), aa_r_dy=Academic AUTODYN)
ansys211 -b -dis -p aa_r -i input.dat -o file.out -machines $hl -dir "$DIR"
```
[SVS FEM][b] recommends specifying resources with the nodes and ppn keywords.
These allow you to directly define the number of nodes (computers) and cores per node (ppn) used by the job.
In addition, the rest of the script assumes this structure of allocated resources.
A working directory has to be created before sending the Slurm job into the queue.
The input file should be in the working directory or a full path to the input file has to be specified.
The input file has to be defined by a common APDL file which is attached to the ANSYS solver via the `-i` parameter.
The **license** should be selected by the `-p` parameter.
Licensed products are the following: `aa_r` (ANSYS **Academic** Research),
`ane3fl` (ANSYS Multiphysics-**Commercial**), and `aa_r_dy` (ANSYS **Academic** AUTODYN).
[a]: http://www.ansys.com/products/multiphysics
[b]: http://www.svsfem.cz
# Overview of ANSYS Products
[SVS FEM][a] as [ANSYS Channel partner][b] for the Czech Republic provided all ANSYS licenses for our clusters and supports all ANSYS Products (Multiphysics, Mechanical, MAPDL, CFX, Fluent, Maxwell, LS-DYNA, etc.) to IT staff and ANSYS users. In case of a problem with ANSYS functionality, contact [hotline@svsfem.cz][c].
We provide commercial as well as academic variants. Academic variants are distinguished by the "**Academic...**" word in the license name or by the two letter preposition "**aa\_**" in the license feature name. The license is selected on the command line or directly in the user's Slurm script (see the individual products).
To load the latest version of any ANSYS product (Mechanical, Fluent, CFX, MAPDL, etc.) load the module:
```console
$ ml ANSYS
```
ANSYS supports interactive mode, but it is not recommended because the clusters are intended for solving extremely demanding tasks.
If you need to work interactively, we recommend configuring the RSM service on the client machine, which allows forwarding the solution to the cluster directly from the client's Workbench project (see ANSYS RSM service).
[a]: http://www.svsfem.cz/
[b]: http://www.ansys.com/
[c]: mailto:hotline@svsfem.cz
# Licensing and Available Versions
## ANSYS License Can Be Used By:
* all persons in the carrying out of the CE IT4Innovations Project (In addition to the primary licensee, which is VSB - Technical University of Ostrava, users are CE IT4Innovations third parties - CE IT4Innovations project partners, particularly the University of Ostrava, the Brno University of Technology - Faculty of Informatics, the Silesian University in Opava, Institute of Geonics AS CR.)
* all persons who have a valid license
* students of the Technical University
## ANSYS Academic Research
This license is intended for science and research, publications, and students' projects (academic license).
## ANSYS COM
This license is intended for science and research, publications, students' projects, and commercial research with no restrictions on commercial use.
## Server / Port
lic-ansys.vsb.cz / 1055 (2325)
![](../../../img/Ansys-lic-admin.jpg)
## Available Versions
* 21.1
```console
$ ml av ANSYS
---------------- /apps/modules/tools -----------------------
ANSYS/21.1-intel-2018a (D)
Where:
D: Default Module
```
# Setting License Preferences
Some ANSYS tools allow you to explicitly specify the usage of academic or commercial licenses on the command line (e.g. `ansys211 -p aa_r` to select the Academic Research license). However, we have observed that not all tools obey this option and choose the commercial license instead.
Therefore, you need to configure the preferred license order with ANSLIC_ADMIN. Follow these steps and move the Academic Research license to the top or bottom of the list accordingly.
Launch the ANSLIC_ADMIN utility in a graphical environment:
```console
$ANSYSLIC_DIR/lic_admin/anslic_admin
```
The ANSLIC_ADMIN utility will be run:
![](../../../img/Fluent_Licence_1.jpg)
![](../../../img/Fluent_Licence_2.jpg)
![](../../../img/Fluent_Licence_3.jpg)
The ANSYS Academic Research license should be moved up to the top or down to the bottom of the list.
![](../../../img/Fluent_Licence_4.jpg)
# Workbench
## Workbench Batch Mode
It is possible to run Workbench scripts in a batch mode.
You need to configure solvers of individual components to run in parallel mode.
Open your project in Workbench.
Then, for example, in *Mechanical*, go to *Tools - Solve Process Settings...*.
![](../../../img/AMsetPar1.png)
Enable the *Distribute Solution* checkbox and enter the number of cores (e.g. 72 to run on two Barbora nodes).
If you want the job to run on more than 1 node, you must also provide a so-called MPI appfile.
In the *Additional Command Line Arguments* input field, enter:
```console
-mpifile /path/to/my/job/mpifile.txt
```
Where `/path/to/my/job` is the directory where your project is saved.
We will create the file `mpifile.txt` programmatically later in the batch script.
For more information, refer to the *ANSYS Mechanical APDL Parallel Processing Guide*.
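For illustration only, an MPI appfile generated by the batch script below might look like the following; the hostnames, process counts, and ANSYS installation path are placeholders:
```console
-h cn001 -np 24 /path/to/ansys/bin/ansysdis -dis
-h cn002 -np 24 /path/to/ansys/bin/ansysdis -dis
```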
Now, save the project and close Workbench.
We will use this script to launch the job:
```bash
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=128
#SBATCH --job-name=test9_mpi_2
#SBATCH --partition=qcpu
#SBATCH --account=ACCOUNT_ID
# change the working directory
DIR=/scratch/project/PROJECT_ID/$SLURM_JOB_ID
mkdir -p "$DIR"
cd "$DIR" || exit
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following nodes:
echo $SLURM_NODELIST
ml ANSYS/2023R2-intel-2022.12
#### Set number of processors per host listing
procs_per_host=24
#### Create MPI appfile
echo -n "" > mpifile.txt
for host in $(scontrol show hostname $SLURM_NODELIST)
do
 echo "-h $host -np $procs_per_host $ANSYS160_DIR/bin/ansysdis161 -dis" >> mpifile.txt
done
#-i input.dat includes the input of analysis in APDL format
#-o file.out is output file from ansys where all text outputs will be redirected
#-p the name of license feature (aa_r=ANSYS Academic Research, ane3fl=Multiphysics(commercial), aa_r_dy=Academic AUTODYN)
# prevent using scsif0 interface on accelerated nodes
export MPI_IC_ORDER="UDAPL"
# spawn remote process using SSH (default is RSH)
export MPI_REMSH="/usr/bin/ssh"
runwb2 -R jou6.wbjn -B -F test9.wbpj
```
The solver settings are saved in the `solvehandlers.xml` file,
which is not located in the project directory.
Verify your solver settings when uploading a project from your local computer.
# Apptainer on IT4Innovations
On our clusters, the Apptainer images of main Linux distributions are prepared.
```console
Barbora Karolina
├── CentOS ├── CentOS
| └── 7 | └── 7
├── Rocky ├── Rocky
| ├── 8 | ├── 8
│ └── 9 │ └── 9
├── Fedora ├── Fedora
│ └── latest │ └── latest
└── Ubuntu └── Ubuntu
└── latest └── latest
```
!!! info
Current information about available Apptainer images can be obtained by the `ml av` command. The images are listed in the `OS` section.
The bootstrap scripts, wrappers, features, etc. are located on [it4i-singularity GitLab page][a].
## IT4Innovations Apptainer Wrappers
For better user experience with Apptainer containers, we prepared several wrappers:
* image-exec
* image-mpi
* image-run
* image-shell
* image-update
The listed wrappers help you use the prepared Apptainer images loaded as modules.
You can easily load an Apptainer image like any other module on the cluster with the `ml OS/version` command.
After the module is loaded for the first time, the prepared image is copied into your home folder and is ready for use.
When you load the module next time, the version of the image is checked and an update (if one exists) is offered.
Then you can update your copy of the image by the `image-update` command.
!!! warning
With an image update, all user changes to the image will be overridden.
The runscript inside the Apptainer image can be run by the `image-run` command.
!!! note " CentOS/7 module only"
This command automatically mounts the `/scratch` and `/apps` storage and invokes the image as writable, so user changes can be made.
Very similar to `image-run` is the `image-exec` command.
The only difference is that `image-exec` runs a user-defined command instead of a runscript.
In this case, the command to be run is specified as a parameter.
Using the interactive shell inside the Apptainer container is very useful for development.
In this interactive shell, you can make any changes to the image you want,
but be aware that you cannot use privileged `sudo` commands directly on the cluster.
To simply invoke interactive shell, use the `image-shell` command.
Another useful feature of the Apptainer is the direct support of OpenMPI.
For proper MPI function, you have to install the same version of OpenMPI inside the image as you use on the cluster.
OpenMPI/4.1.2 is installed in prepared images (CentOS 7, Rocky 8).
The MPI must be started outside the container.
The easiest way to start the MPI is to use the `image-mpi` command.
This command has the same parameters as `mpirun`, so there is no difference between running a normal MPI application
and an MPI application in an Apptainer container.
## Examples
In the examples, we will use prepared Apptainer images.
### Load Image
```console
$ ml CentOS/7
Preparing image CentOS-7_20230116143612.sif
261.20M 100% 412.36MB/s 0:00:00 (xfr#1, to-chk=0/1)
Your image of CentOS/7 is at location: /home/username/.apptainer/images/CentOS-7_20230116143612.sif
```
!!! tip
After the module is loaded for the first time, the prepared image is copied into your home folder to the *.apptainer/images* subfolder.
### Wrappers
**image-exec**
Executes the given command inside the Apptainer image. The container is in this case started, then the command is executed and the container is stopped.
```console
$ ml CentOS/7
$ image-exec cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
```
**image-mpi**
MPI wrapper - see more in the [Examples MPI][1] section.
**image-run**
This command runs the runscript inside the Apptainer image. Note, that the prepared images do not contain a runscript.
**image-shell**
Invokes an interactive shell inside the Apptainer image.
```console
$ ml CentOS/7
$ image-shell
Apptainer>
```
### Update Image
This command is for updating your local Apptainer image copy.
The local copy is overridden in this case.
```console
$ ml CentOS/7
New version of CentOS image was found. (New: CentOS-7_20230116143612.sif Old: CentOS-7_20230115143612.sif)
For updating image use: image-update
Your image of CentOS/7 is at location: /home/username/.apptainer/images/CentOS-7_20230115143612.sif
$ image-update
New version of CentOS image was found. (New: CentOS-7_20230116143612.sif Old: CentOS-7_20230115143612.sif)
Do you want to update local copy? (WARNING all user modification will be deleted) [y/N]: y
Updating image CentOS-7_20230116143612.sif
2.71G 100% 199.49MB/s 0:00:12 (xfer#1, to-check=0/1)
sent 2.71G bytes received 31 bytes 163.98M bytes/sec
total size is 2.71G speedup is 1.00
New version is ready. (/home/username/.apptainer/images/CentOS-7_20230116143612.sif)
```
### MPI
In the following example, we are using a job submitted by the command:
```console
$ salloc -A PROJECT_ID -p qcpu --nodes=2 --ntasks-per-node=128 --time=00:30:00
```
!!! note
    We have seen no major performance impact for a job running in an Apptainer container.
With Apptainer, the MPI usage model is to call `mpirun` from outside the container
and reference the container from your `mpirun` command.
Usage would look like this:
```console
$ mpirun -np 128 apptainer exec container.img /path/to/contained_mpi_prog
```
By calling `mpirun` outside of the container, we solve several very complicated work-flow aspects.
For example, if `mpirun` is called from within the container, it must have a method for spawning processes on remote nodes.
Historically the SSH is used for this, which means that there must be an `sshd` running within the container on the remote nodes
and this `sshd` process must not conflict with the `sshd` running on that host.
It is also possible for the resource manager to launch the job
and (in OpenMPI’s case) the Orted (Open RTE User-Level Daemon) processes on the remote system,
but that then requires resource manager modification and container awareness.
In the end, we do not gain anything by calling `mpirun` from within the container
except for increasing the complexity levels and possibly losing out on some added
performance benefits (e.g. if the container was not built with the same OFED stack as the host).
#### MPI Inside Apptainer Image
```console
$ ml CentOS/7
$ image-shell
Apptainer> mpirun hostname | wc -l
128
```
As you can see in this example, we allocated two nodes, but MPI can use only one node (128 processes) when used inside the Apptainer image.
#### MPI Outside Apptainer Image
```console
$ ml CentOS/7
Your image of CentOS/7 is at location: /home/username/.apptainer/images/CentOS-7_20230116143612.sif
$ image-mpi hostname | wc -l
256
```
In this case, the MPI wrapper behaves like the `mpirun` command.
The `mpirun` command is called outside the container
and the communication between nodes is propagated into the container automatically.
## How to Use Your Own Image on the Cluster?
* Prepare the image on your computer
* Transfer the images to your `/home` directory on the cluster (for example `.apptainer/image`)
```console
local:$ scp container.img login@login2.clustername.it4i.cz:~/.apptainer/image/container.img
```
* Load module Apptainer (`ml apptainer`)
* Use your image
!!! note
If you want to use the Apptainer wrappers with your own images, load the `apptainer-wrappers/1.0` module and set the environment variable `IMAGE_PATH_LOCAL=/path/to/container.img`.
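A minimal sketch of that workflow (the image path is illustrative):
```console
$ ml apptainer-wrappers/1.0
$ export IMAGE_PATH_LOCAL=$HOME/.apptainer/image/container.img
$ image-shell
```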
## How to Edit an IT4Innovations Image?
* Transfer the image to your computer
```console
local:$ scp login@login2.clustername.it4i.cz:/home/username/.apptainer/image/container.img container.img
```
* Modify the image
* Transfer the image from your computer to your `/home` directory on the cluster
```console
local:$ scp container.img login@login2.clustername.it4i.cz:/home/username/.apptainer/image/container.img
```
* Load module Apptainer (`ml apptainer`)
* Use your image
[1]: #mpi
[a]: https://code.it4i.cz/sccs/it4i-singularity
# Generating Container Recipes & Images
EasyBuild has support for generating container recipes that will use EasyBuild to build and install a specified software stack. In addition, EasyBuild can (optionally) leverage the build tool provided by the container software of choice to create container images.
## Generating Container Recipes
To generate container recipes, use `eb --containerize`, or `eb -C` for short.
The resulting container recipe will leverage EasyBuild to build and install the software that corresponds to the easyconfig files that are specified as arguments to the eb command (and all required dependencies, if needed).
!!! note
EasyBuild will refuse to overwrite existing container recipes.
To re-generate an already existing recipe file, use the `--force` command line option.
## Base Container Image
In order to let EasyBuild generate a container recipe, it is required to specify which container image should be used as a base, via the `--container-base` configuration option.
Currently, three types of container base images can be specified (see the example commands after the list):

* **localimage: *path***: the location of an existing container image file
* **docker:*name***: the name of a Docker container image (to be downloaded from [Docker Hub][a])
* **shub:*name***: the name of an Apptainer/Singularity container image (to be downloaded from [Singularity Hub][b])
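For example (a sketch only; `example.eb`, the image path, and the image names are placeholders):
```console
$ eb example.eb -C --container-base localimage:/tmp/example.simg --experimental
$ eb example.eb -C --container-base docker:ubuntu:20.04 --experimental
$ eb example.eb -C --container-base shub:user/image:tag --experimental
```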
## Building Container Images
To instruct EasyBuild to also build a container image from the generated container recipe, use `--container-build-image` (in combination with `-C` or `--containerize`).
EasyBuild will leverage functionality provided by the container software of choice (see Container Image Format below) to build the container image.
For example, in the case of Apptainer/Singularity, EasyBuild will run `sudo /path/to/singularity build` on the generated container recipe.
The container image will be placed in the location specified by the `--containerpath` configuration option (see Location for generated container recipes & images (`--containerpath`)), next to the generated container recipe that was used to build the image.
## Example Usage
In this example, we will use a pre-built base container image located at `/tmp/example.simg` (see also Base container image (`--container-base`)).
To let EasyBuild generate a container recipe for GCC 6.4.0 + binutils 2.28:
```console
eb GCC-6.4.0-2.28.eb --containerize --container-base localimage:/tmp/example.simg --experimental
```
With other configuration options left to default (see the output of `eb --show-config`), this will result in an Apptainer/Singularity container recipe using example.simg as a base image, which will be stored in `$HOME/.local/easybuild/containers`:
```console
$ eb GCC-6.4.0-2.28.eb --containerize --container-base localimage:/tmp/example.simg --experimental
== temporary log file in case of crash /tmp/eb-dLZTNF/easybuild-LPLeG0.log
== Singularity definition file created at /home/example/.local/easybuild/containers/Singularity.GCC-6.4.0-2.28
== Temporary log file(s) /tmp/eb-dLZTNF/easybuild-LPLeG0.log* have been removed.
== Temporary directory /tmp/eb-dLZTNF has been removed.
```
## Example of a Generated Container Recipe
Below is an example of container recipe generated by EasyBuild, using the following command:
```console
eb Python-3.6.4-foss-2018a.eb OpenMPI-2.1.2-GCC-6.4.0-2.28.eb -C --container-base shub:shahzebsiddiqui/eb-singularity:centos-7.4.1708 --experimental
```
It uses the *shahzebsiddiqui/eb-singularity:centos-7.4.1708* base container image that is available from the Apptainer/Singularity hub ([see this webpage][c]).
```
Bootstrap: shub
From: shahzebsiddiqui/eb-singularity:centos-7.4.1708
%post
yum --skip-broken -y install openssl-devel libssl-dev libopenssl-devel
yum --skip-broken -y install libibverbs-dev libibverbs-devel rdma-core-devel
# upgrade easybuild package automatically to latest version
pip install -U easybuild
# change to 'easybuild' user
su - easybuild
eb Python-3.6.4-foss-2018a.eb OpenMPI-2.1.2-GCC-6.4.0-2.28.eb --robot --installpath=/app/ --prefix=/scratch --tmpdir=/scratch/tmp
# exit from 'easybuild' user
exit
# cleanup
rm -rf /scratch/tmp/* /scratch/build /scratch/sources /scratch/ebfiles_repo
%runscript
eval "$@"
%environment
source /etc/profile
module use /app/modules/all
ml Python/3.6.4-foss-2018a OpenMPI/2.1.2-GCC-6.4.0-2.28
%labels
```
!!! note
We also specify the easyconfig file for the OpenMPI component of `foss/2018a` here, because it requires specific OS dependencies to be installed (see the second `yum ... install` line in the generated container recipe).
We intend to let EasyBuild take into account the OS dependencies of the entire software stack automatically in a future update.
The generated container recipe includes `pip install -U easybuild` to ensure that the latest version of EasyBuild is used to build the software in the container image, regardless of whether EasyBuild was already present in the container and which version it was.
In addition, the generated module files will follow the default module-naming scheme (EasyBuildMNS). The modules that correspond to the easyconfig files that were specified on the command line will be loaded automatically; see the statements in the %environment section of the generated container recipe.
## Example of Building Container Image
You can instruct EasyBuild to also build the container image by using `--container-build-image`.
Note that you will need to enter your sudo password (unless you recently executed a sudo command in the same shell session):
```console
$ eb GCC-6.4.0-2.28.eb --containerize --container-base localimage:/tmp/example.simg --container-build-image --experimental
== temporary log file in case of crash /tmp/eb-aYXYC8/easybuild-8uXhvu.log
== Singularity tool found at /usr/bin/singularity
== Singularity version '2.4.6' is 2.4 or higher ... OK
== Singularity definition file created at /home/example/.local/easybuild/containers/Singularity.GCC-6.4.0-2.28
== Running 'sudo /usr/bin/singularity build /home/example/.local/easybuild/containers/GCC-6.4.0-2.28.simg /home/example/.local/easybuild/containers/Singularity.GCC-6.4.0-2.28', you may need to enter your 'sudo' password...
== (streaming) output for command 'sudo /usr/bin/singularity build /home/example/.local/easybuild/containers/GCC-6.4.0-2.28.simg /home/example/.local/easybuild/containers/Singularity.GCC-6.4.0-2.28':
Using container recipe deffile: /home/example/.local/easybuild/containers/Singularity.GCC-6.4.0-2.28
Sanitizing environment
Adding base Singularity environment to container
...
== temporary log file in case of crash /scratch/tmp/eb-WnmCI_/easybuild-GcKyY9.log
== resolving dependencies ...
...
== building and installing GCCcore/6.4.0...
...
== building and installing binutils/2.28-GCCcore-6.4.0...
...
== building and installing GCC/6.4.0-2.28...
...
== COMPLETED: Installation ended successfully
== Results of the build can be found in the log file(s) /app/software/GCC/6.4.0-2.28/easybuild/easybuild-GCC-6.4.0-20180424.084946.log
== Build succeeded for 15 out of 15
...
Building Singularity image...
Singularity container built: /home/example/.local/easybuild/containers/GCC-6.4.0-2.28.simg
Cleaning up...
== Singularity image created at /home/example/.local/easybuild/containers/GCC-6.4.0-2.28.simg
== Temporary log file(s) /tmp/eb-aYXYC8/easybuild-8uXhvu.log* have been removed.
== Temporary directory /tmp/eb-aYXYC8 has been removed.
```
To inspect the container image, you can use `singularity shell` to start a shell session in the container:
```console
$ singularity shell --shell "/bin/bash --norc" $HOME/.local/easybuild/containers/GCC-6.4.0-2.28.simg
Singularity GCC-6.4.0-2.28.simg:~> source /etc/profile
Singularity GCC-6.4.0-2.28.simg:~> module list
Currently Loaded Modules:
1) GCCcore/6.4.0 2) binutils/2.28-GCCcore-6.4.0 3) GCC/6.4.0-2.28
Singularity GCC-6.4.0-2.28.simg:~> which gcc
/app/software/GCCcore/6.4.0/bin/gcc
Singularity GCC-6.4.0-2.28.simg:~> gcc --version
gcc (GCC) 6.4.0
...
```
Or, you can use `singularity exec` to execute a command in the container.
Compare the output of running which gcc and `gcc --version` locally:
```console
$ which gcc
/usr/bin/gcc
$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
...
```
and the output when running the same commands in the container:
```console
$ singularity exec GCC-6.4.0-2.28.simg which gcc
/app/software/GCCcore/6.4.0/bin/gcc
$ singularity exec GCC-6.4.0-2.28.simg gcc --version
gcc (GCC) 6.4.0
...
```
## Configuration
### Location for Generated Container Recipes & Images
To control the location where EasyBuild will put generated container recipes & images, use the `--containerpath` configuration setting. Next to providing this as an option to the eb command, you can also define the `$EASYBUILD_CONTAINERPATH` environment variable or specify containerpath in an EasyBuild configuration file.
The default value for this location is `$HOME/.local/easybuild/containers`, unless the `--prefix` configuration setting was provided, in which case it becomes `<prefix>/containers` (see Overall prefix path (`--prefix`)).
Use `eb --show-full-config | grep containerpath` to determine the currently active setting.
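A minimal sketch of overriding the location via the environment variable (the path is illustrative):
```console
$ export EASYBUILD_CONTAINERPATH=/scratch/project/PROJECT_ID/containers
$ eb --show-full-config | grep containerpath
```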
### Container Image Format
The format for container images that EasyBuild produces via the functionality provided by the container software can be controlled via the `--container-image-format` configuration setting.
For Apptainer/Singularity containers (see Type of container recipe/image to generate (`--container-type`)), three image formats are supported:
* squashfs (default): compressed images using squashfs read-only file system
* ext3: writable image file using ext3 file system
* sandbox: container image in a regular directory
See also official user guide on [Image Mounts format][d] and [Building a Container][e].
### Name for Container Recipe & Image
By default, EasyBuild will use the name of the first easyconfig file (without the .eb suffix) as a name for both the container recipe and the image.
You can specify an alternate name using the `--container-image-name` configuration setting.
The filename of the generated container recipe will be `Singularity.<name>`.
The filename of the container image will be `<name><extension>`, where the value for `<extension>` depends on the image format (see Container image format (`--container-image-format`)):
* `.simg` for squashfs container images
* `.img` for ext3 container images
* empty for sandbox container images (in which case the container image is actually a directory rather than a file)
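For example, the following sketch (reusing the base image from the examples above) would produce a recipe named `Singularity.my-gcc` and, with `--container-build-image` added, an image named `my-gcc.simg`; the name `my-gcc` is illustrative:
```console
$ eb GCC-6.4.0-2.28.eb -C --container-base localimage:/tmp/example.simg --container-image-name my-gcc --experimental
```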
### Temporary Directory for Creating Container Images
The container software that EasyBuild leverages to build container images may be using a temporary directory in a location that does not have sufficient free space.
You can instruct EasyBuild to pass an alternate location via the `--container-tmpdir` configuration setting.
For Apptainer/Singularity, the default is to use `/tmp`, see the [documentation][f]. If `--container-tmpdir` is specified, the `$SINGULARITY_TMPDIR` environment variable will be defined accordingly to let Apptainer/Singularity use that location instead.
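A minimal sketch of redirecting the temporary directory to a location with more free space (the path is illustrative):
```console
$ eb GCC-6.4.0-2.28.eb -C --container-base localimage:/tmp/example.simg --container-build-image --container-tmpdir /scratch/project/PROJECT_ID/tmp --experimental
```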
### Type of Container Recipe/Image to Generate

With the `--container-type` configuration option, you can specify what type of container recipe/image EasyBuild should generate (see the example after the list). Possible values are:
* singularity (default): [Singularity][g] container recipes & images
* docker: [Docker][h] container recipe & images
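For example, a sketch of generating a Docker recipe instead of the default Singularity one (the base image is illustrative):
```console
$ eb GCC-6.4.0-2.28.eb -C --container-base docker:ubuntu:20.04 --container-type docker --experimental
```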
For detailed documentation, see the [webpage][i].
[a]: https://hub.docker.com/
[b]: https://singularity-hub.org/
[c]: https://singularity-hub.org/collections/143
[d]: https://apptainer.org/docs/user/latest/bind_paths_and_mounts.html#image-mounts
[e]: https://apptainer.org/docs/user/latest/build_a_container.html
[f]: https://apptainer.org/docs/user/latest/build_env.html#temporary-folders
[g]: https://singularity.lbl.gov
[h]: https://docs.docker.com/
[i]: http://easybuild.readthedocs.io/en/latest/Containers.html
# EasyBuild
The objective of this tutorial is to show how EasyBuild can be used to ease, automate, and script the build of software on the IT4Innovations clusters. Two use-cases are considered. First, we are going to build a software that is supported by EasyBuild. Then, we will see through a simple example how to add support for a new software in EasyBuild.
The benefit of using EasyBuild for your builds is that it allows automated and reproducible build of software. Once a build has been made, the build script (via the EasyConfig file) or the installed software (via the module file) can be shared with other users.
## Short Introduction
EasyBuild is a tool that allows performing automated and reproducible software compilation and installation.
All builds and installations are performed at user level, so you do not need the admin rights. The software is installed in your home directory (by default in `$HOME/.local/easybuild/software/`) and a module file is generated (by default in `$HOME/.local/easybuild/modules/`) to use the software.
EasyBuild relies on two main concepts:
* Toolchains
* EasyConfig file (our easyconfigs are [here][a])
A detailed documentation is available [here][b].
## Toolchains
A toolchain corresponds to a compiler and a set of libraries, which are commonly used to build a software. The two main toolchains frequently used on the IT4Innovations clusters are the **foss** and **intel**.
* **foss** is based on the GCC compiler and on open-source libraries (OpenMPI, OpenBLAS, etc.).
* **intel** is based on the Intel compiler and on Intel libraries (Intel MPI, Intel Math Kernel Library, etc.).
Additional details are available [here][c].
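To see which toolchains EasyBuild knows about, or to look for easyconfigs built with a particular toolchain version, you can use, for example:
```console
$ eb --list-toolchains        # list all toolchains known to EasyBuild
$ eb -S foss-2022b            # search for easyconfigs using the foss/2022b toolchain
```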
## EasyConfig File
The EasyConfig file is a simple text file that describes the build process of a software. For most software that uses standard procedure (like configure, make, and make install), this file is very simple. Many EasyConfig files are already provided with EasyBuild.
By default, EasyConfig files and generated modules are named using the following convention:
`software-name-software-version-toolchain-name-toolchain-version(-suffix).eb`
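For example, the easyconfig built later in this tutorial, `git-2.38.1-foss-2022b.eb`, follows this convention: software `git`, version `2.38.1`, toolchain `foss` in version `2022b`, with no suffix.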
Additional details are available [here][d].
## EasyBuild on IT4Innovations Clusters
To use EasyBuild on a compute node, load the `EasyBuild` module:
```console
$ ml av easybuild
------------------------------------------ /apps/modules/tools -------------------------------------------
EasyBuild/4.3.3 (S) EasyBuild/4.4.2 (S) EasyBuild/4.5.4 (S) EasyBuild/4.6.2 (S)
EasyBuild/4.3.4 (S) EasyBuild/4.5.0 (S) EasyBuild/4.5.5 (S) EasyBuild/4.7.0 (S,D)
EasyBuild/4.4.0 (S) EasyBuild/4.5.1 (S) EasyBuild/4.6.0 (S)
EasyBuild/4.4.1 (S) EasyBuild/4.5.3 (S) EasyBuild/4.6.1 (S)
Where:
S: Module is Sticky, requires --force to unload or purge
D: Default Module
Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
$ ml EasyBuild
```
The EasyBuild command is `eb`. Check the version you have loaded:
```console
$ eb --version
This is EasyBuild 4.7.0 (framework: 4.7.0, easyblocks: 4.7.0) on host login1.karolina.it4i.cz.
```
To get help on the EasyBuild options, use the `-h` or `-H` option flags:
```console
$ eb -h
Usage: eb [options] easyconfig [...]
Builds software based on easyconfig (or parse a directory). Provide one or more easyconfigs or
directories, use -H or --help more information.
Options:
-h show short help message and exit
-H OUTPUT_FORMAT show full help message and exit
Debug and logging options (configfile section MAIN):
-d Enable debug log mode (default: False)
Basic options:
Basic runtime options for EasyBuild. (configfile section basic)
...
```
## Build Software Using Provided EasyConfig File
### Search for Available Easyconfig
Searching for available easyconfig files can be done using the `--search` (long output) and `-S` (short output) command line options. All easyconfig files available in the robot search path are considered and searching is done case-insensitive.
```console
$ eb -S git
CFGS1=/apps/easybuild/easyconfigs-it4i
CFGS2=/apps/easybuild/easyconfigs-master/easybuild/easyconfigs
CFGS3=/apps/easybuild/easyconfigs-develop/easybuild/easyconfigs
* $CFGS1/.gitignore
* $CFGS1/.gitlab-ci.yml
* $CFGS1/g/git-lfs/git-lfs-1.1.1.eb
* $CFGS1/g/git-lfs/git-lfs-2.11.0.eb
* $CFGS1/g/git-lfs/git-lfs-3.1.2.eb
* $CFGS1/g/git/git-2.19.1.eb
* $CFGS1/g/git/git-2.21.0.eb
* $CFGS1/g/git/git-2.23.0.eb
* $CFGS1/g/git/git-2.25.1.eb
* $CFGS1/g/git/git-2.30.1.eb
* $CFGS1/g/git/git-2.31.1.eb
* $CFGS1/g/git/git-2.32.0-GCCcore-10.3.0-nodocs-test.eb
* $CFGS2/b/BCALM/BCALM-2.2.0-fix-nogit.patch
* $CFGS2/d/dagitty/dagitty-0.2-2-foss-2018b-R-3.5.1.eb
* $CFGS2/e/EMAN2/EMAN2-2.3_fix_broken_githash_regex_replace.patch
* $CFGS2/g/GIMIC/GIMIC-2018.04.20_git.patch
* $CFGS2/g/GitPython/GitPython-2.1.11-foss-2018b-Python-3.6.6.eb
* $CFGS2/g/GitPython/GitPython-2.1.11-intel-2018b-Python-3.6.6.eb
* $CFGS2/g/GitPython/GitPython-2.1.15.eb
* $CFGS2/g/GitPython/GitPython-3.0.3-GCCcore-8.2.0-Python-3.7.2.eb
* $CFGS2/g/GitPython/GitPython-3.1.0-GCCcore-8.3.0-Python-3.7.4.eb
* $CFGS2/g/GitPython/GitPython-3.1.9-GCCcore-9.3.0-Python-3.8.2.eb
* $CFGS2/g/GitPython/GitPython-3.1.14-GCCcore-10.2.0.eb
* $CFGS2/g/GitPython/GitPython-3.1.18-GCCcore-10.3.0.eb
* $CFGS2/g/GitPython/GitPython-3.1.24-GCCcore-11.2.0.eb
* $CFGS2/g/GitPython/GitPython-3.1.27-GCCcore-11.3.0.eb
* $CFGS2/g/gettext/gettext-0.19.8_fix-git-config.patch
* $CFGS2/g/git-extras/git-extras-5.1.0-foss-2016a.eb
...
```
### Get an Overview of Planned Installations
You can do a “dry-run” overview by supplying `-D`/`--dry-run` (typically combined with `--robot`, in the form of `-Dr`):
```console
$ eb git-2.30.1.eb -Dr
== Temporary log file in case of crash /tmp/eb-6vwvor2_/easybuild-vg82aat4.log
Dry run: printing build status of easyconfigs and dependencies
CFGS=/apps/easybuild
* [x] $CFGS/easyconfigs-master/easybuild/easyconfigs/m/M4/M4-1.4.18.eb (module: M4/1.4.18)
* [x] $CFGS/easyconfigs-it4i/a/Autoconf/Autoconf-2.69.eb (module: Autoconf/2.69)
* [ ] $CFGS/easyconfigs-it4i/g/git/git-2.30.1.eb (module: git/2.30.1)
== Temporary log file(s) /tmp/eb-6vwvor2_/easybuild-vg82aat4.log* have been removed.
== Temporary directory /tmp/eb-6vwvor2_ has been removed.
```
### Compile and Install Module
If we try to build *git-2.31.1.eb*, nothing will happen as it is already installed on the cluster. To enable dependency resolution, use the `--robot` command line option (or `-r` for short):
```console
$ eb git-2.31.1.eb -r
== Temporary log file in case of crash /tmp/eb-11d_kpht/easybuild-jmygqpqr.log
== git/2.31.1 is already installed (module found), skipping
== No easyconfigs left to be built.
== Build succeeded for 0 out of 0
== Temporary log file(s) /tmp/eb-11d_kpht/easybuild-jmygqpqr.log* have been removed.
== Temporary directory /tmp/eb-11d_kpht has been removed.
```
Rebuild *git-2.31.1.eb*. Use `eb --rebuild` to rebuild a given easyconfig/module or use `eb --force`/`-f` to force the reinstallation of a given easyconfig/module. The behavior of `--force` is the same as `--rebuild` and `--ignore-osdeps`.
```console
$ eb git-2.31.1.eb -r -f
== Temporary log file in case of crash /tmp/eb-wbzf_rxh/easybuild-umq1_01u.log
== resolving dependencies ...
== processing EasyBuild easyconfig /apps/easybuild/easyconfigs-it4i/g/git/git-2.31.1.eb
== building and installing git/2.31.1...
== fetching files...
== creating build dir, resetting environment...
== ... (took 3 secs)
== unpacking...
== ... (took 9 secs)
== patching...
== preparing...
== configuring...
== ... (took 4 secs)
== building...
== ... (took 4 secs)
== testing...
== installing...
== ... (took 2 secs)
== taking care of extensions...
== restore after iterating...
== postprocessing...
== sanity checking...
== cleaning up...
== ... (took 3 secs)
== creating module...
== permissions...
== packaging...
== COMPLETED: Installation ended successfully (took 30 secs)
== Results of the build can be found in the log file(s)
/home/username/.local/easybuild/software/git/2.31.1/easybuild/easybuild-git-2.31.1-20230315.092001.log
== Build succeeded for 1 out of 1
== Temporary log file(s) /tmp/eb-wbzf_rxh/easybuild-umq1_01u.log* have been removed.
== Temporary directory /tmp/eb-wbzf_rxh has been removed.
```
If we try to build *git-2.30.1.eb*:
```console
$ eb git-2.30.1.eb -r
== Temporary log file in case of crash /tmp/eb-s3t9lwk_/easybuild-cvx5kpna.log
== resolving dependencies ...
== processing EasyBuild easyconfig /apps/easybuild/easyconfigs-it4i/g/git/git-2.30.1.eb
== building and installing git/2.30.1...
== fetching files...
== creating build dir, resetting environment...
== unpacking...
== ... (took 10 secs)
== patching...
== preparing...
== configuring...
== ... (took 4 secs)
== building...
== ... (took 4 secs)
== testing...
== installing...
== ... (took 3 secs)
== taking care of extensions...
== restore after iterating...
== postprocessing...
== sanity checking...
== cleaning up...
== ... (took 3 secs)
== creating module...
== permissions...
== packaging...
== COMPLETED: Installation ended successfully (took 29 secs)
== Results of the build can be found in the log file(s)
/home/username/.local/easybuild/software/git/2.30.1/easybuild/easybuild-git-2.30.1-20230315.092117.log
== Build succeeded for 1 out of 1
== Temporary log file(s) /tmp/eb-s3t9lwk_/easybuild-cvx5kpna.log* have been removed.
== Temporary directory /tmp/eb-s3t9lwk_ has been removed.
```
If we want to build *git-2.30.1* but only have the easyconfig *git-2.25.1.eb* at hand, we can change the version with the `--try-software-version=2.30.1` option:
```console
$ eb git-2.25.1.eb -r --try-software-version=2.30.1
== Temporary log file in case of crash /tmp/eb-lw9itci8/easybuild-qzb7j64j.log
== resolving dependencies ...
== processing EasyBuild easyconfig /tmp/eb-lw9itci8/tweaked_easyconfigs/git-2.30.1.eb
== building and installing git/2.30.1...
== fetching files...
== ... (took 4 secs)
== creating build dir, resetting environment...
== unpacking...
== ... (took 9 secs)
== patching...
== preparing...
== configuring...
== ... (took 4 secs)
== building...
== ... (took 4 secs)
== testing...
== installing...
== ... (took 4 secs)
== taking care of extensions...
== restore after iterating...
== postprocessing...
== sanity checking...
== cleaning up...
== ... (took 3 secs)
== creating module...
== permissions...
== packaging...
== COMPLETED: Installation ended successfully (took 33 secs)
== Results of the build can be found in the log file(s)
/home/username/.local/easybuild/software/git/2.30.1/easybuild/easybuild-git-2.30.1-20230315.092313.log
== Build succeeded for 1 out of 1
== Temporary log file(s) /tmp/eb-lw9itci8/easybuild-qzb7j64j.log* have been removed.
== Temporary directory /tmp/eb-lw9itci8 has been removed.
```
### MODULEPATH
To see the newly installed modules, you need to add the path where they were installed to the MODULEPATH. On the cluster, you have to use the `module use` command:
```console
$ module use $HOME/.local/easybuild/modules/all/
```
or modify your `.bash_profile`:
```console
$ cat ~/.bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
module use $HOME/.local/easybuild/modules/all/
PATH=$PATH:$HOME/bin
export PATH
```
## Build Software Using Your Own EasyConfig File
For this example, we create an EasyConfig file to build Git 2.38.1 with the *foss* toolchain. Open your favorite editor and create a file named *git-2.38.1-foss-2022b.eb* with the following content:
```console
$ vim git-2.38.1-foss-2022b.eb
```
```python
easyblock = 'ConfigureMake'
name = 'git'
version = '2.38.1'
homepage = 'https://git-scm.com/'
description = """Git is a free and open source distributed version control system designed
to handle everything from small to very large projects with speed and efficiency."""
toolchain = {'name': 'foss', 'version': '2022b'}
source_urls = ['https://github.com/git/git/archive']
sources = ['v%(version)s.tar.gz']
builddependencies = [
('binutils', '2.39'),
('Autotools', '20220317'),
]
dependencies = [
('cURL', '7.86.0'),
('expat', '2.4.9'),
('gettext', '0.21.1'),
('Perl', '5.36.0'),
]
preconfigopts = 'make configure && '
# Work around git build system bug. If LIBS contains -lpthread, then configure
# will not append -lpthread to LDFLAGS, but Makefile ignores LIBS.
configopts = "--with-perl=${EBROOTPERL}/bin/perl --enable-pthreads='-lpthread'"
postinstallcmds = ['cd contrib/subtree; make install']
sanity_check_paths = {
'files': ['bin/git'],
'dirs': ['libexec/git-core', 'share'],
}
moduleclass = 'tools'
```
This is a simple EasyConfig. Most of the fields are self-descriptive. No build method is explicitly defined, so the standard configure/make/make install approach is used by default.
Let us build Git with this EasyConfig file:
```console
$ eb git-2.38.1-foss-2022b.eb -r
== Temporary log file in case of crash /tmp/eb-2aiq9qr8/easybuild-eb4zenze.log
== resolving dependencies ...
== processing EasyBuild easyconfig /home/username/git-2.38.1-foss-2022b.eb
== building and installing git/2.38.1-foss-2022b...
== fetching files...
== ... (took 3 secs)
== creating build dir, resetting environment...
== unpacking...
== ... (took 11 secs)
== patching...
== preparing...
== ... (took 2 secs)
== configuring...
== ... (took 7 secs)
== building...
== ... (took 7 secs)
== testing...
== installing...
== ... (took 2 secs)
== taking care of extensions...
== restore after iterating...
== postprocessing...
== sanity checking...
== ... (took 1 secs)
== cleaning up...
== ... (took 4 secs)
== creating module...
== ... (took 1 secs)
== permissions...
== packaging...
== COMPLETED: Installation ended successfully (took 41 secs)
== Results of the build can be found in the log file(s)
/home/username/.local/easybuild/software/git/2.38.1-foss-2022b/easybuild/easybuild-git-2.38.1-20230315.0957
22.log
== Build succeeded for 1 out of 1
== Temporary log file(s) /tmp/eb-2aiq9qr8/easybuild-eb4zenze.log* have been removed.
== Temporary directory /tmp/eb-2aiq9qr8 has been removed.
```
We can now check that our version of Git is available via the modules:
```console
$ ml av git
------------------------------- /home/username/.local/easybuild/modules/all -------------------------------
git/2.38.1-foss-2022b
------------------------------------------ /apps/modules/devel -------------------------------------------
libgit2/1.1.0-GCCcore-10.3.0
------------------------------------------ /apps/modules/tools -------------------------------------------
git-lfs/3.1.2 git/2.32.0-GCCcore-10.3.0-nodocs
git/2.28.0-GCCcore-10.2.0-nodocs git/2.33.1-GCCcore-11.2.0-nodocs
git/2.31.1 git/2.36.0-GCCcore-11.3.0-nodocs
git/2.32.0-GCCcore-10.3.0-nodocs-test git/2.38.1-GCCcore-12.2.0-nodocs (D)
Where:
D: Default Module
Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
```
## Advanced EasyBuild Configuration
By creating the `~/.config/easybuild/config.cfg` file, you can easily specify the desired location of your software, CUDA compute capabilities, and other options that you would usually have to specify within your easyconfig or on the command line. To get an overview of all available options, use the `eb --confighelp` command.
You can use our template to set all of the usual EasyBuild variables:
```console
[MAIN]
[basic]
locks-dir=EASYBUILD_ROOT/.locks/
robot=/apps/easybuild/easyconfigs-it4i:/apps/easybuild/easyconfigs-master/easybuild/easyconfigs:/apps/easybuild/easyconfigs-develop/easybuild/easyconfigs
robot-paths=/apps/easybuild/easyconfigs-it4i:/apps/easybuild/easyconfigs-master/easybuild/easyconfigs:/apps/easybuild/easyconfigs-develop/easybuild/easyconfigs
[config]
buildpath=/dev/shm/USER/build
installpath=EASYBUILD_ROOT
installpath-modules=EASYBUILD_ROOT/modules
installpath-software=EASYBUILD_ROOT/all
moduleclasses=python
repository=FileRepository
repositorypath=EASYBUILD_ROOT/file-repository
sourcepath=EASYBUILD_ROOT/sources
[easyconfig]
local-var-naming-check=error
[override]
# 8.0 for Karolina, 7.0 for Barbora
cuda-compute-capabilities=CUDA_CC
detect-loaded-modules=purge
enforce-checksums=True
silence-deprecation-warnings=True
trace=True
```
!!! note
Do not forget to add the path to your modules to MODULEPATH using the `module use` command in your `~/.bashrc` to be able to lookup and use your installed modules.
The template requires you to fill in the `EASYBUILD_ROOT`, `CUDA_CC`, and `USER` variables. `EASYBUILD_ROOT` is the top-level directory which will hold all of your EasyBuild related data. `CUDA_CC` defines the CUDA compute capabilities of the graphics cards, and `USER` should preferably be set to your username.
If you plan on writing more than one or two of your own easyconfigs, it might be useful to set up a custom easyconfig repository. Simply prepend its path to the `robot` and `robot-paths` variables.
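A minimal sketch of filling in the placeholders (the paths, username, and compute capability are illustrative; `8.0` corresponds to Karolina as noted in the template):
```console
$ sed -i 's|EASYBUILD_ROOT|/home/username/easybuild|g; s|CUDA_CC|8.0|g; s|USER|username|g' ~/.config/easybuild/config.cfg
```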
A detailed documentation regarding EasyBuild configuration is available [here][e].
[a]: https://code.it4i.cz/sccs/easyconfigs-it4i
[b]: https://docs.easybuild.io/
[c]: https://github.com/easybuilders/easybuild/wiki/Compiler-toolchains
[d]: https://github.com/easybuilders/easybuild-easyconfigs
[e]: https://docs.easybuild.io/configuration/