# Using Xilinx Accelerator Platform
The first step in using the Xilinx accelerators is to initialize the Vitis (compiler) and XRT (runtime) environments:
```console
$ . /tools/Xilinx/Vitis/2023.1/settings64.sh
$ . /opt/xilinx/xrt/setup.sh
```
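To verify that the environment has been initialized, the tools can be queried directly (a quick sanity check; the reported versions depend on the installed release):
```console
$ which v++
$ xbutil --version
```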
## Platform Level Accelerator Management
The current platform can then be examined using `xbutil examine`,
which outputs user-level information about the XRT platform and lists the available devices:
```console
$ xbutil examine
System Configuration
OS Name : Linux
Release : 4.18.0-477.27.1.el8_8.x86_64
Version : #1 SMP Thu Aug 31 10:29:22 EDT 2023
Machine : x86_64
CPU Cores : 64
Memory : 257145 MB
Distribution : Red Hat Enterprise Linux 8.8 (Ootpa)
GLIBC : 2.28
Model : ProLiant XL675d Gen10 Plus
XRT
Version : 2.16.0
Branch : master
Hash : f2524a2fcbbabd969db19abf4d835c24379e390d
Hash Date : 2023-10-11 14:01:19
XOCL : 2.16.0, f2524a2fcbbabd969db19abf4d835c24379e390d
XCLMGMT : 2.16.0, f2524a2fcbbabd969db19abf4d835c24379e390d
Devices present
BDF : Shell Logic UUID Device ID Device Ready*
-------------------------------------------------------------------------------------------------------------------------
[0000:88:00.1] : xilinx_u280_gen3x16_xdma_base_1 283BAB8F-654D-8674-968F-4DA57F7FA5D7 user(inst=132) Yes
[0000:8c:00.1] : xilinx_u280_gen3x16_xdma_base_1 283BAB8F-654D-8674-968F-4DA57F7FA5D7 user(inst=133) Yes
* Devices that are not ready will have reduced functionality when using XRT tools
```
Here, two Xilinx Alveo u280 accelerators (`0000:88:00.1` and `0000:8c:00.1`) are available.
`xbutil` can also be used to query additional information about a specific device using its BDF address:
```console
$ xbutil examine -d "0000:88:00.1"
-------------------------------------------------
[0000:88:00.1] : xilinx_u280_gen3x16_xdma_base_1
-------------------------------------------------
Platform
XSA Name : xilinx_u280_gen3x16_xdma_base_1
Logic UUID : 283BAB8F-654D-8674-968F-4DA57F7FA5D7
FPGA Name :
JTAG ID Code : 0x14b7d093
DDR Size : 0 Bytes
DDR Count : 0
Mig Calibrated : true
P2P Status : disabled
Performance Mode : not supported
P2P IO space required : 64 GB
Clocks
DATA_CLK (Data) : 300 MHz
KERNEL_CLK (Kernel) : 500 MHz
hbm_aclk (System) : 450 MHz
Mac Addresses : 00:0A:35:0E:20:B0
: 00:0A:35:0E:20:B1
Device Status: HEALTHY
Hardware Context ID: 0
Xclbin UUID: 6306D6AE-1D66-AEA7-B15D-446D4ECC53BD
PL Compute Units
Index Name Base Address Usage Status
-------------------------------------------------
0 vadd:vadd_1 0x800000 1 (IDLE)
```
Basic functionality of the device can be checked using `xbutil validate -d <BDF>`:
```console
$ xbutil validate -d "0000:88:00.1"
Validate Device : [0000:88:00.1]
Platform : xilinx_u280_gen3x16_xdma_base_1
SC Version : 4.3.27
Platform ID : 283BAB8F-654D-8674-968F-4DA57F7FA5D7
-------------------------------------------------------------------------------
Test 1 [0000:88:00.1] : aux-connection
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 2 [0000:88:00.1] : pcie-link
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 3 [0000:88:00.1] : sc-version
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 4 [0000:88:00.1] : verify
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 5 [0000:88:00.1] : dma
Details : Buffer size - '16 MB' Memory Tag - 'HBM[0]'
Host -> PCIe -> FPGA write bandwidth = 11988.9 MB/s
Host <- PCIe <- FPGA read bandwidth = 12571.2 MB/s
...
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 6 [0000:88:00.1] : iops
Details : IOPS: 387240(verify)
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 7 [0000:88:00.1] : mem-bw
Details : Throughput (Type: DDR) (Bank count: 2) : 33932.9MB/s
Throughput of Memory Tag: DDR[0] is 16974.1MB/s
Throughput of Memory Tag: DDR[1] is 16974.2MB/s
Throughput (Type: HBM) (Bank count: 1) : 12383.7MB/s
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 8 [0000:88:00.1] : p2p
Test 9 [0000:88:00.1] : vcu
Test 10 [0000:88:00.1] : aie
Test 11 [0000:88:00.1] : ps-aie
Test 12 [0000:88:00.1] : ps-pl-verify
Test 13 [0000:88:00.1] : ps-verify
Test 14 [0000:88:00.1] : ps-iops
```
Finally, the device can be reinitialized using `xbutil reset -d <BDF>`:
```console
$ xbutil reset -d "0000:88:00.1"
Performing 'HOT Reset' on '0000:88:00.1'
Are you sure you wish to proceed? [Y/n]: Y
Successfully reset Device[0000:88:00.1]
```
This can be useful for recovering the device from states such as `HANGING`, as reported by `xbutil examine -d <BDF>`.
## OpenCL Platform Level
The `clinfo` utility can be used to verify that the accelerator is visible to OpenCL:
```console
$ clinfo
Number of platforms: 2
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (3590.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Profile: EMBEDDED_PROFILE
Platform Version: OpenCL 1.0
Platform Name: Xilinx
Platform Vendor: Xilinx
Platform Extensions: cl_khr_icd
<...>
Platform Name: Xilinx
Number of devices: 2
Device Type: CL_DEVICE_TYPE_ACCELERATOR
Vendor ID: 0h
Max compute units: 0
Max work items dimensions: 3
Max work items[0]: 4294967295
Max work items[1]: 4294967295
Max work items[2]: 4294967295
Max work group size: 4294967295
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 0
Max clock frequency: 0Mhz
Address bits: 64
Max memory allocation: 4294967296
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 0
Max size of kernel argument: 2048
Alignment (bits) of base address: 32768
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: No
Round to +ve and infinity: No
IEEE754-2008 fused multiply-add: No
Cache type: None
Cache line size: 64
Cache size: 0
Global memory size: 0
Constant buffer size: 4194304
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 16384
Error correction support: 1
Profiling timer resolution: 1
Device endianess: Little
Available: No
Compiler available: No
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: Yes
Profiling: Yes
Platform ID: 0x16fbae8
Name: xilinx_u280_gen3x16_xdma_base_1
Vendor: Xilinx
Driver version: 1.0
Profile: EMBEDDED_PROFILE
Version: OpenCL 1.0
<...>
```
which shows that both the `Xilinx` platform and the accelerator devices are present.
## Building Applications
To simplify the build process, we define two environment variables: `IT4I_PLATFORM` and `IT4I_BUILD_MODE`.
The first, `IT4I_PLATFORM`, denotes the specific accelerator hardware, such as `Alveo u250` or `Alveo u280`,
and its configuration (stored in `*.xpfm` files).
The list of available platforms can be obtained using the `platforminfo` utility:
```console
$ platforminfo -l
{
"platforms": [
{
"baseName": "xilinx_u280_gen3x16_xdma_1_202211_1",
"version": "202211.1",
"type": "sdaccel",
"dataCenter": "true",
"embedded": "false",
"externalHost": "true",
"serverManaged": "true",
"platformState": "impl",
"usesPR": "true",
"platformFile": "\/opt\/xilinx\/platforms\/xilinx_u280_gen3x16_xdma_1_202211_1\/xilinx_u280_gen3x16_xdma_1_202211_1.xpfm"
},
{
"baseName": "xilinx_u250_gen3x16_xdma_4_1_202210_1",
"version": "202210.1",
"type": "sdaccel",
"dataCenter": "true",
"embedded": "false",
"externalHost": "true",
"serverManaged": "true",
"platformState": "impl",
"usesPR": "true",
"platformFile": "\/opt\/xilinx\/platforms\/xilinx_u250_gen3x16_xdma_4_1_202210_1\/xilinx_u250_gen3x16_xdma_4_1_202210_1.xpfm"
}
]
}
```
Here, `baseName` and potentially `platformFile` are of interest; either can be specified as the value of `IT4I_PLATFORM`.
In this case we have the platforms `xilinx_u280_gen3x16_xdma_1_202211_1` (Alveo u280) and `xilinx_u250_gen3x16_xdma_4_1_202210_1` (Alveo u250).
The `IT4I_BUILD_MODE` variable specifies the build type (`hw`, `hw_emu`, or `sw_emu`):
- `hw` performs full synthesis for the accelerator
- `hw_emu` runs both synthesis and emulation for debugging
- `sw_emu` compiles kernels only for emulation (does not require the accelerator and builds much faster)
For example, to configure the build for the `Alveo u280`, we set:
```console
$ export IT4I_PLATFORM=xilinx_u280_gen3x16_xdma_1_202211_1
```
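Alternatively, the full `platformFile` path reported by `platforminfo` above can be used as the value instead of the base name:
```console
$ export IT4I_PLATFORM=/opt/xilinx/platforms/xilinx_u280_gen3x16_xdma_1_202211_1/xilinx_u280_gen3x16_xdma_1_202211_1.xpfm
```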
### Software Emulation Mode
The software emulation mode is preferable for development, as HLS synthesis is very time consuming. To build the following applications in this mode, we set:
```console
$ export IT4I_BUILD_MODE=sw_emu
```
and run each application with `XCL_EMULATION_MODE` set to `sw_emu`:
```console
$ XCL_EMULATION_MODE=sw_emu <application>
```
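Depending on the XRT version, the emulation run may also expect an emulation configuration file (`emconfig.json`) in the working directory; if it is missing, it can be generated for the selected platform with the Vitis `emconfigutil` tool (an optional step, shown here only as a sketch):
```console
$ emconfigutil --platform $IT4I_PLATFORM --od .
```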
### Hardware Synthesis Mode
!!! note
The HLS of these simple applications **can take up to 2 hours** to finish.
To allow the application to utilize real hardware, we have to synthesize the FPGA design for the accelerator. This is done by repeating the same steps used to build the kernels in emulation mode, but with `IT4I_BUILD_MODE` set to `hw`:
```console
$ export IT4I_BUILD_MODE=hw
```
The host application binary can be reused, but it has to be run without `XCL_EMULATION_MODE`:
```console
$ <application>
```
## Sample Applications
The first two samples illustrate the two main approaches to building FPGA-accelerated applications on the Xilinx platform - **XRT** and **OpenCL**.
The final example combines **HIP** with **XRT** to show the basics necessary to build an application that utilizes both GPU and FPGA accelerators.
### Using HLS and XRT
Applications are typically separated into a host side and an accelerator/kernel side.
The following host-side code should be saved as `host.cpp`:
```c++
/*
# Copyright (C) 2023, Advanced Micro Devices, Inc. All rights reserved.
# SPDX-License-Identifier: X11
*/
#include <iostream>
#include <cstring>
// XRT includes
#include "xrt/xrt_bo.h"
#include <experimental/xrt_xclbin.h>
#include "xrt/xrt_device.h"
#include "xrt/xrt_kernel.h"
#define DATA_SIZE 4096
int main(int argc, char** argv)
{
if(argc != 2)
{
std::cout << "Usage: " << argv[0] << " <XCLBIN File>" << std::endl;
return EXIT_FAILURE;
}
// Read settings
std::string binaryFile = argv[1];
int device_index = 0;
std::cout << "Open the device" << device_index << std::endl;
auto device = xrt::device(device_index);
std::cout << "Load the xclbin " << binaryFile << std::endl;
auto uuid = device.load_xclbin("./vadd.xclbin");
size_t vector_size_bytes = sizeof(int) * DATA_SIZE;
//auto krnl = xrt::kernel(device, uuid, "vadd");
auto krnl = xrt::kernel(device, uuid, "vadd", xrt::kernel::cu_access_mode::exclusive);
std::cout << "Allocate Buffer in Global Memory\n";
auto boIn1 = xrt::bo(device, vector_size_bytes, krnl.group_id(0)); //Match kernel arguments to RTL kernel
auto boIn2 = xrt::bo(device, vector_size_bytes, krnl.group_id(1));
auto boOut = xrt::bo(device, vector_size_bytes, krnl.group_id(2));
// Map the contents of the buffer object into host memory
auto bo0_map = boIn1.map<int*>();
auto bo1_map = boIn2.map<int*>();
auto bo2_map = boOut.map<int*>();
std::fill(bo0_map, bo0_map + DATA_SIZE, 0);
std::fill(bo1_map, bo1_map + DATA_SIZE, 0);
std::fill(bo2_map, bo2_map + DATA_SIZE, 0);
// Create the test data
int bufReference[DATA_SIZE];
for (int i = 0; i < DATA_SIZE; ++i)
{
bo0_map[i] = i;
bo1_map[i] = i;
bufReference[i] = bo0_map[i] + bo1_map[i]; //Generate check data for validation
}
// Synchronize buffer content with device side
std::cout << "synchronize input buffer data to device global memory\n";
boIn1.sync(XCL_BO_SYNC_BO_TO_DEVICE);
boIn2.sync(XCL_BO_SYNC_BO_TO_DEVICE);
std::cout << "Execution of the kernel\n";
auto run = krnl(boIn1, boIn2, boOut, DATA_SIZE); //DATA_SIZE=size
run.wait();
// Get the output;
std::cout << "Get the output data from the device" << std::endl;
boOut.sync(XCL_BO_SYNC_BO_FROM_DEVICE);
// Validate results
if (std::memcmp(bo2_map, bufReference, vector_size_bytes))
throw std::runtime_error("Value read back does not match reference");
std::cout << "TEST PASSED\n";
return 0;
}
```
The host-side code can now be compiled using the GCC toolchain as:
```console
$ g++ host.cpp -I$XILINX_XRT/include -I$XILINX_VIVADO/include -L$XILINX_XRT/lib -lxrt_coreutil -o host
```
The accelerator side (simple vector-add kernel) should be saved as `vadd.cpp`.
```c++
/*
# Copyright (C) 2023, Advanced Micro Devices, Inc. All rights reserved.
# SPDX-License-Identifier: X11
*/
extern "C" {
void vadd(
const unsigned int *in1, // Read-Only Vector 1
const unsigned int *in2, // Read-Only Vector 2
unsigned int *out, // Output Result
int size // Size in integer
)
{
#pragma HLS INTERFACE m_axi port=in1 bundle=aximm1
#pragma HLS INTERFACE m_axi port=in2 bundle=aximm2
#pragma HLS INTERFACE m_axi port=out bundle=aximm1
for(int i = 0; i < size; ++i)
{
out[i] = in1[i] + in2[i];
}
}
}
```
The accelerator-side code is built using Vitis `v++`.
This is a two-step process, which either builds an emulation binary or performs full HLS (depending on the value of the `-t` argument).
The platform (specific accelerator) also has to be specified at this step (both for emulation and full HLS).
```console
$ v++ -c -t $IT4I_BUILD_MODE --platform $IT4I_PLATFORM -k vadd vadd.cpp -o vadd.xo
$ v++ -l -t $IT4I_BUILD_MODE --platform $IT4I_PLATFORM vadd.xo -o vadd.xclbin
```
This process should result in `vadd.xclbin`, which can be loaded by the host-side application.
### Running the Application
With both the host application and the kernel binary at hand, the application can be launched in emulation mode as
```console
$ XCL_EMULATION_MODE=sw_emu ./host vadd.xclbin
```
or on real hardware (having compiled the kernels with `IT4I_BUILD_MODE=hw`):
```console
$ ./host vadd.xclbin
```
### Using HLS and OpenCL
The host-side application code should be saved as `host.cpp`.
This application attempts to find the `Xilinx` OpenCL platform in the system and selects the first device in that platform.
The device is then configured with the provided kernel binary.
Other than that, the only difference from a typical OpenCL vector-add is the use of `enqueueTask(...)` to launch the kernel
(instead of the usual `enqueueNDRangeKernel`).
```c++
#include <iostream>
#include <fstream>
#include <iterator>
#include <vector>
#define CL_HPP_TARGET_OPENCL_VERSION 120
#define CL_HPP_MINIMUM_OPENCL_VERSION 120
#define CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY 1
#define CL_USE_DEPRECATED_OPENCL_1_2_APIS
#include <CL/cl2.hpp>
#include <CL/cl_ext_xilinx.h>
std::vector<unsigned char> read_binary_file(const std::string &filename)
{
std::cout << "INFO: Reading " << filename << std::endl;
std::ifstream file(filename, std::ios::binary);
file.unsetf(std::ios::skipws);
std::streampos file_size;
file.seekg(0, std::ios::end);
file_size = file.tellg();
file.seekg(0, std::ios::beg);
std::vector<unsigned char> data;
data.reserve(file_size);
data.insert(data.begin(),
std::istream_iterator<unsigned char>(file),
std::istream_iterator<unsigned char>());
return data;
}
cl::Device select_device()
{
std::vector<cl::Platform> platforms;
cl::Platform::get(&platforms);
cl::Platform platform;
for(cl::Platform &p: platforms)
{
const std::string name = p.getInfo<CL_PLATFORM_NAME>();
std::cout << "PLATFORM: " << name << std::endl;
if(name == "Xilinx")
{
platform = p;
break;
}
}
if(platform == cl::Platform())
{
std::cout << "Xilinx platform not found!" << std::endl;
exit(EXIT_FAILURE);
}
std::vector<cl::Device> devices;
platform.getDevices(CL_DEVICE_TYPE_ACCELERATOR, &devices);
return devices[0];
}
static const int DATA_SIZE = 1024;
int main(int argc, char *argv[])
{
if(argc != 2)
{
std::cout << "Usage: " << argv[0] << " <XCLBIN File>" << std::endl;
return EXIT_FAILURE;
}
std::string binary_file = argv[1];
std::vector<int> source_a(DATA_SIZE, 10);
std::vector<int> source_b(DATA_SIZE, 32);
auto program_binary = read_binary_file(binary_file);
cl::Program::Binaries bins{{program_binary.data(), program_binary.size()}};
cl::Device device = select_device();
cl::Context context(device, nullptr, nullptr, nullptr);
cl::CommandQueue q(context, device, CL_QUEUE_PROFILING_ENABLE);
cl::Program program(context, {device}, bins, nullptr);
cl::Kernel vadd_kernel = cl::Kernel(program, "vector_add");
cl::Buffer buffer_a(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, source_a.size() * sizeof(int), source_a.data());
cl::Buffer buffer_b(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, source_b.size() * sizeof(int), source_b.data());
cl::Buffer buffer_res(context, CL_MEM_READ_WRITE, source_a.size() * sizeof(int));
int narg = 0;
vadd_kernel.setArg(narg++, buffer_res);
vadd_kernel.setArg(narg++, buffer_a);
vadd_kernel.setArg(narg++, buffer_b);
vadd_kernel.setArg(narg++, DATA_SIZE);
q.enqueueTask(vadd_kernel);
std::vector<int> result(DATA_SIZE, 0);
q.enqueueReadBuffer(buffer_res, CL_TRUE, 0, result.size() * sizeof(int), result.data());
int mismatch_count = 0;
for(size_t i = 0; i < DATA_SIZE; ++i)
{
int host_result = source_a[i] + source_b[i];
if(result[i] != host_result)
{
mismatch_count++;
std::cout << "ERROR: " << result[i] << " != " << host_result << std::endl;
break;
}
}
std::cout << "RESULT: " << (mismatch_count == 0 ? "PASSED" : "FAILED") << std::endl;
return 0;
}
```
The host-side code can now be compiled using the GCC toolchain as:
```console
$ g++ host.cpp -I$XILINX_XRT/include -I$XILINX_VIVADO/include -lOpenCL -o host
```
The accelerator side (simple vector-add kernel) should be saved as `vadd.cl`.
```c++
#define BUFFER_SIZE 256
#define DATA_SIZE 1024
// TRIPCOUNT identifier
__constant uint c_len = DATA_SIZE / BUFFER_SIZE;
__constant uint c_size = BUFFER_SIZE;
__attribute__((reqd_work_group_size(1, 1, 1)))
__kernel void vector_add(__global int* c,
__global const int* a,
__global const int* b,
const int n_elements)
{
int arrayA[BUFFER_SIZE];
int arrayB[BUFFER_SIZE];
__attribute__((xcl_loop_tripcount(c_len, c_len)))
for (int i = 0; i < n_elements; i += BUFFER_SIZE)
{
int size = BUFFER_SIZE;
if(i + size > n_elements)
size = n_elements - i;
__attribute__((xcl_loop_tripcount(c_size, c_size)))
__attribute__((xcl_pipeline_loop(1))) readA:
for(int j = 0; j < size; j++)
arrayA[j] = a[i + j];
__attribute__((xcl_loop_tripcount(c_size, c_size)))
__attribute__((xcl_pipeline_loop(1))) readB:
for(int j = 0; j < size; j++)
arrayB[j] = b[i + j];
__attribute__((xcl_loop_tripcount(c_size, c_size)))
__attribute__((xcl_pipeline_loop(1))) vadd_writeC:
for(int j = 0; j < size; j++)
c[i + j] = arrayA[j] + arrayB[j];
}
}
```
The accelerator-side code is built using Vitis `v++`.
This is a three-step process, which either builds an emulation binary or performs full HLS (depending on the value of the `-t` argument).
The platform (specific accelerator) also has to be specified at this step (both for emulation and full HLS).
```console
$ v++ -c -t $IT4I_BUILD_MODE --platform $IT4I_PLATFORM -k vector_add -o vadd.xo vadd.cl
$ v++ -l -t $IT4I_BUILD_MODE --platform $IT4I_PLATFORM -o vadd.link.xclbin vadd.xo
$ v++ -p vadd.link.xclbin -t $IT4I_BUILD_MODE --platform $IT4I_PLATFORM -o vadd.xclbin
```
This process should result in `vadd.xclbin`, which can be loaded by the host-side application.
### Running the Application
With both the host application and the kernel binary at hand, the application can be launched in emulation mode as
```console
$ XCL_EMULATION_MODE=sw_emu ./host vadd.xclbin
```
or on real hardware (having compiled the kernels with `IT4I_BUILD_MODE=hw`):
```console
$ ./host vadd.xclbin
```
### Hybrid GPU and FPGA Application (HIP+XRT)
This simple 8-bit quantized dot product (`R = sum(X[i]*Y[i])`) example illustrates a basic approach to utilizing both GPU and FPGA accelerators in a single application.
The application takes the simplest approach, where both synchronization and data transfers are handled explicitly by the host.
The HIP toolchain is used to compile the single-source host/GPU code as usual, but it is also linked with the XRT runtime, which allows the host to control the FPGA accelerator.
The FPGA kernels are built separately, as in the previous examples.
The host/GPU HIP code should be saved as `main.hip`:
```c++
#include <iostream>
#include <vector>
#include "xrt/xrt_bo.h"
#include "experimental/xrt_xclbin.h"
#include "xrt/xrt_device.h"
#include "xrt/xrt_kernel.h"
#include "hip/hip_runtime.h"
const size_t DATA_SIZE = 1024;
float compute_reference(const float *srcX, const float *srcY, size_t count);
__global__ void quantize(int8_t *out, const float *in, size_t count)
{
size_t idx = blockIdx.x * blockDim.x + threadIdx.x;
for(size_t i = idx; i < count; i += blockDim.x * gridDim.x)
out[i] = int8_t(in[i] * 127);
}
__global__ void dequantize(float *out, const int16_t *in, size_t count)
{
size_t idx = blockIdx.x * blockDim.x + threadIdx.x;
for(size_t i = idx; i < count; i += blockDim.x * gridDim.x)
out[i] = float(in[i] / float(127*127));
}
int main(int argc, char *argv[])
{
if(argc != 2)
{
std::cout << "Usage: " << argv[0] << " <XCLBIN File>" << std::endl;
return EXIT_FAILURE;
}
// Prepare experiment data
std::vector<float> srcX(DATA_SIZE);
std::vector<float> srcY(DATA_SIZE);
float outR = 0.0f;
for(size_t i = 0; i < DATA_SIZE; ++i)
{
srcX[i] = float(rand()) / float(RAND_MAX);
srcY[i] = float(rand()) / float(RAND_MAX);
outR += srcX[i] * srcY[i];
}
float outR_quant = compute_reference(srcX.data(), srcY.data(), DATA_SIZE);
std::cout << "REFERENCE: " << outR_quant << " (" << outR << ")" << std::endl;
// Initialize XRT (FPGA device), load kernels binary and create kernel object
xrt::device device(0);
std::cout << "Loading xclbin file " << argv[1] << std::endl;
xrt::uuid xclbinId = device.load_xclbin(argv[1]);
xrt::kernel mulKernel(device, xclbinId, "multiply", xrt::kernel::cu_access_mode::exclusive);
// Allocate GPU buffers
float *srcX_gpu, *srcY_gpu, *res_gpu;
int8_t *srcX_gpu_quant, *srcY_gpu_quant;
int16_t *res_gpu_quant;
hipMalloc(&srcX_gpu, DATA_SIZE * sizeof(float));
hipMalloc(&srcY_gpu, DATA_SIZE * sizeof(float));
hipMalloc(&res_gpu, DATA_SIZE * sizeof(float));
hipMalloc(&srcX_gpu_quant, DATA_SIZE * sizeof(int8_t));
hipMalloc(&srcY_gpu_quant, DATA_SIZE * sizeof(int8_t));
hipMalloc(&res_gpu_quant, DATA_SIZE * sizeof(int16_t));
// Allocate FPGA buffers
xrt::bo srcX_fpga_quant(device, DATA_SIZE * sizeof(int8_t), mulKernel.group_id(0));
xrt::bo srcY_fpga_quant(device, DATA_SIZE * sizeof(int8_t), mulKernel.group_id(1));
xrt::bo res_fpga_quant(device, DATA_SIZE * sizeof(int16_t), mulKernel.group_id(2));
// Copy experiment data from HOST to GPU
hipMemcpy(srcX_gpu, srcX.data(), DATA_SIZE * sizeof(float), hipMemcpyHostToDevice);
hipMemcpy(srcY_gpu, srcY.data(), DATA_SIZE * sizeof(float), hipMemcpyHostToDevice);
// Execute quantization kernels on both input vectors
quantize<<<16, 256>>>(srcX_gpu_quant, srcX_gpu, DATA_SIZE);
quantize<<<16, 256>>>(srcY_gpu_quant, srcY_gpu, DATA_SIZE);
// Map FPGA buffers into HOST memory, copy data from GPU to these mapped buffers and synchronize them into FPGA memory
hipMemcpy(srcX_fpga_quant.map<int8_t *>(), srcX_gpu_quant, DATA_SIZE * sizeof(int8_t), hipMemcpyDeviceToHost);
srcX_fpga_quant.sync(XCL_BO_SYNC_BO_TO_DEVICE);
hipMemcpy(srcY_fpga_quant.map<int8_t *>(), srcY_gpu_quant, DATA_SIZE * sizeof(int8_t), hipMemcpyDeviceToHost);
srcY_fpga_quant.sync(XCL_BO_SYNC_BO_TO_DEVICE);
// Execute FPGA kernel (8-bit integer multiplication)
auto kernelRun = mulKernel(res_fpga_quant, srcX_fpga_quant, srcY_fpga_quant, DATA_SIZE);
kernelRun.wait();
// Synchronize output FPGA buffer back to HOST and copy its contents to GPU buffer for dequantization
res_fpga_quant.sync(XCL_BO_SYNC_BO_FROM_DEVICE);
hipMemcpy(res_gpu_quant, res_fpga_quant.map<int16_t *>(), DATA_SIZE * sizeof(int16_t), hipMemcpyHostToDevice);
// Dequantize multiplication result on GPU
dequantize<<<16, 256>>>(res_gpu, res_gpu_quant, DATA_SIZE);
// Copy dequantized results from GPU to HOST
std::vector<float> res(DATA_SIZE);
hipMemcpy(res.data(), res_gpu, DATA_SIZE * sizeof(float), hipMemcpyDeviceToHost);
// Perform simple sum on CPU
float out = 0.0;
for(size_t i = 0; i < DATA_SIZE; ++i)
out += res[i];
std::cout << "RESULT: " << out << std::endl;
hipFree(srcX_gpu);
hipFree(srcY_gpu);
hipFree(res_gpu);
hipFree(srcX_gpu_quant);
hipFree(srcY_gpu_quant);
hipFree(res_gpu_quant);
return 0;
}
float compute_reference(const float *srcX, const float *srcY, size_t count)
{
float out = 0.0f;
for(size_t i = 0; i < count; ++i)
{
int16_t quantX(srcX[i] * 127);
int16_t quantY(srcY[i] * 127);
out += float(int16_t(quantX * quantY) / float(127*127));
}
return out;
}
```
The host/GPU application can be built using HIPCC as:
```console
$ hipcc -I$XILINX_XRT/include -I$XILINX_VIVADO/include -L$XILINX_XRT/lib -lxrt_coreutil main.hip -o host
```
The accelerator side (simple vector-multiply kernel) should be saved as `kernels.cpp`.
```c++
extern "C" {
void multiply(
short *out,
const char *inX,
const char *inY,
int size)
{
#pragma HLS INTERFACE m_axi port=inX bundle=aximm1
#pragma HLS INTERFACE m_axi port=inY bundle=aximm2
#pragma HLS INTERFACE m_axi port=out bundle=aximm1
for(int i = 0; i < size; ++i)
out[i] = short(inX[i]) * short(inY[i]);
}
}
```
Once again, the HLS kernel is built using Vitis `v++` in two steps:
```console
$ v++ -c -t $IT4I_BUILD_MODE --platform $IT4I_PLATFORM -k multiply kernels.cpp -o kernels.xo
$ v++ -l -t $IT4I_BUILD_MODE --platform $IT4I_PLATFORM kernels.xo -o kernels.xclbin
```
### Running the Application
In emulation mode (FPGA emulation; GPU hardware is required), the application can be launched as:
```console
$ XCL_EMULATION_MODE=sw_emu ./host kernels.xclbin
REFERENCE: 256.554 (260.714)
Loading xclbin file ./kernels.xclbin
RESULT: 256.554
```
or, having compiled the kernels with `IT4I_BUILD_MODE=hw`, on real hardware (both FPGA and GPU hardware are required):
```console
$ ./host kernels.xclbin
REFERENCE: 256.554 (260.714)
Loading xclbin file ./kernels.xclbin
RESULT: 256.554
```
## Additional Resources
- [https://xilinx.github.io/Vitis-Tutorials/][1]
- [http://xilinx.github.io/Vitis_Accel_Examples/][2]
[1]: https://xilinx.github.io/Vitis-Tutorials/
[2]: http://xilinx.github.io/Vitis_Accel_Examples/
# Complementary Systems
Complementary systems offer a development environment for users
who need to port and optimize their code and applications
for various hardware architectures and software technologies
that are not available on standard clusters.
## Complementary Systems 1
The first stage of the complementary systems implementation comprises the following partitions:
- compute partition 0 – based on ARM technology - legacy
- compute partition 1 – based on ARM technology - A64FX
- compute partition 2 – based on Intel technologies - Ice Lake, NVDIMMs + Bitware FPGAs
- compute partition 3 – based on AMD technologies - Milan, MI100 GPUs + Xilinx FPGAs
- compute partition 4 – reflecting Edge type of servers
- partition 5 – FPGA synthesis server
![](../img/cs1_1.png)
## Complementary Systems 2
The second stage of the complementary systems implementation comprises the following partitions:
- compute partition 6 - based on ARM technology + CUDA programmable GPGPU accelerators on the Ampere architecture + DPU network processing units
- compute partition 7 - based on IBM Power10 architecture
- compute partition 8 - modern CPU with a very high L3 cache capacity (over 750MB)
- compute partition 9 - virtual GPU accelerated workstations
- compute partition 10 - Sapphire Rapids-HBM server
- compute partition 11 - NVIDIA Grace CPU Superchip
![](../img/cs2_2.png)
## Modules and Architecture Availability
Complementary systems list available modules automatically based on the detected architecture.
However, you can load one of the three modules -- `aarch64`, `avx2`, and `avx512` --
to reload the list of modules available for the respective architecture:
```console
[user@login.cs ~]$ ml architecture/aarch64
aarch64 modules + all modules
[user@login.cs ~]$ ml architecture/avx2
avx2 modules + all modules
[user@login.cs ~]$ ml architecture/avx512
avx512 modules + all modules
```
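After loading an architecture module, the refreshed module list can be inspected with the standard Lmod command (shown here only as a quick check):
```console
[user@login.cs ~]$ ml av
```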
# Complementary System Job Scheduling
## Introduction
[Slurm][1] workload manager is used to allocate and access Complementary systems resources.
## Getting Partition Information
Display partitions/queues
```console
$ sinfo -s
PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST
p00-arm up 1-00:00:00 0/1/0/1 p00-arm01
p01-arm* up 1-00:00:00 0/8/0/8 p01-arm[01-08]
p02-intel up 1-00:00:00 0/2/0/2 p02-intel[01-02]
p03-amd up 1-00:00:00 0/2/0/2 p03-amd[01-02]
p04-edge up 1-00:00:00 0/1/0/1 p04-edge01
p05-synt up 1-00:00:00 0/1/0/1 p05-synt01
p06-arm up 1-00:00:00 0/2/0/2 p06-arm[01-02]
p07-power up 1-00:00:00 0/1/0/1 p07-power01
p08-amd up 1-00:00:00 0/1/0/1 p08-amd01
p10-intel up 1-00:00:00 0/1/0/1 p10-intel01
```
## Getting Job Information
Show jobs
```console
$ squeue --me
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
104 p01-arm interact user R 1:48 2 p01-arm[01-02]
```
Show job details for a specific job
```console
$ scontrol -d show job JOBID
```
Show job details for the currently executing job from within the job session
```console
$ scontrol -d show job $SLURM_JOBID
```
## Running Interactive Jobs
Run an interactive job
```console
$ salloc -A PROJECT-ID -p p01-arm
```
Run an interactive job with X11 forwarding
```console
$ salloc -A PROJECT-ID -p p01-arm --x11
```
!!! warning
    Do not use `srun` to initiate interactive jobs; subsequent `srun` and `mpirun` invocations would block forever.
## Running Batch Jobs
Run a batch job
```console
$ sbatch -A PROJECT-ID -p p01-arm ./script.sh
```
Useful command options (`salloc`, `sbatch`, `srun`); see also the example job script below:
* -n, --ntasks
* -c, --cpus-per-task
* -N, --nodes
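A minimal job script for `sbatch` might look as follows (a sketch only; the account, partition, and commands are placeholders to adapt to your project):
```console
$ cat script.sh
#!/bin/bash
#SBATCH --account=PROJECT-ID
#SBATCH --partition=p01-arm
#SBATCH --nodes=1
#SBATCH --time=01:00:00

set | grep ^SLURM   # print the Slurm environment provided to the job
srun hostname       # run a task on the allocated node(s)
$ sbatch script.sh
```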
## Slurm Job Environment Variables
Slurm provides useful information to the job via environment variables. The environment variables are available on all nodes allocated to the job when accessed via Slurm-supported means (`srun`, compatible `mpirun`).
See all Slurm variables
```
set | grep ^SLURM
```
### Useful Variables
| variable name | description | example |
| ------ | ------ | ------ |
| SLURM_JOB_ID | job id of the executing job| 593 |
| SLURM_JOB_NODELIST | nodes allocated to the job | p03-amd[01-02] |
| SLURM_JOB_NUM_NODES | number of nodes allocated to the job | 2 |
| SLURM_STEP_NODELIST | nodes allocated to the job step | p03-amd01 |
| SLURM_STEP_NUM_NODES | number of nodes allocated to the job step | 1 |
| SLURM_JOB_PARTITION | name of the partition | p03-amd |
| SLURM_SUBMIT_DIR | submit directory | /scratch/project/open-xx-yy/work |
See [Slurm srun documentation][2] for details.
Get job nodelist
```
$ echo $SLURM_JOB_NODELIST
p03-amd[01-02]
```
Expand the nodelist to a list of nodes.
```
$ scontrol show hostnames $SLURM_JOB_NODELIST
p03-amd01
p03-amd02
```
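The expanded list can then be consumed by scripts, for example to iterate over the allocated nodes (a simple sketch based on the nodelist above):
```
$ for host in $(scontrol show hostnames $SLURM_JOB_NODELIST); do echo "node: $host"; done
node: p03-amd01
node: p03-amd02
```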
## Modifying Jobs
```
$ scontrol update JobId=JOBID ATTR=VALUE
```
For example:
```
$ scontrol update JobId=JOBID Comment='The best job ever'
```
## Deleting Jobs
```
$ scancel JOBID
```
## Partitions
| PARTITION | nodes | whole node | cores per node | features |
| --------- | ----- | ---------- | -------------- | -------- |
| p00-arm | 1 | yes | 64 | aarch64,cortex-a72 |
| p01-arm | 8 | yes | 48 | aarch64,a64fx,ib |
| p02-intel | 2 | no | 64 | x86_64,intel,icelake,ib,fpga,bitware,nvdimm |
| p03-amd | 2 | no | 64 | x86_64,amd,milan,ib,gpu,mi100,fpga,xilinx |
| p04-edge | 1 | yes | 16 | x86_64,intel,broadwell,ib |
| p05-synt | 1 | yes | 8 | x86_64,amd,milan,ib,ht |
| p06-arm | 2 | yes | 80 | aarch64,ib |
| p07-power | 1 | yes | 192 | ppc64le,ib |
| p08-amd | 1 | yes | 128 | x86_64,amd,milan-x,ib,ht |
| p10-intel | 1 | yes | 96 | x86_64,intel,sapphire_rapids,ht|
Use the `-t`/`--time` option to specify the job run time limit. The default job time limit is 2 hours; the maximum job time limit is 24 hours.
FIFO scheduling with backfilling is employed.
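For example, to request an interactive allocation with an eight-hour time limit (a sketch reusing the partition and project placeholders above):
```console
$ salloc -A PROJECT-ID -p p01-arm -t 08:00:00
```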
## Partition 00 - ARM (Cortex-A72)
Whole node allocation.
One node:
```console
salloc -A PROJECT-ID -p p00-arm
```
## Partition 01 - ARM (A64FX)
Whole node allocation.
One node:
```console
salloc -A PROJECT-ID -p p01-arm
```
```console
salloc -A PROJECT-ID -p p01-arm -N 1
```
Multiple nodes:
```console
salloc -A PROJECT-ID -p p01-arm -N 8
```
## Partition 02 - Intel (Ice Lake, NVDIMMs + Bitware FPGAs)
FPGAs are treated as resources. See below for more details about resources.
Partial allocation - per FPGA, resource separation is not enforced.
Use only FPGAs allocated to the job!
One FPGA:
```console
salloc -A PROJECT-ID -p p02-intel --gres=fpga
```
Two FPGAs on the same node:
```console
salloc -A PROJECT-ID -p p02-intel --gres=fpga:2
```
All FPGAs:
```console
salloc -A PROJECT-ID -p p02-intel -N 2 --gres=fpga:2
```
## Partition 03 - AMD (Milan, MI100 GPUs + Xilinx FPGAs)
GPUs and FPGAs are treated as resources. See below for more details about resources.
Partial allocation - per GPU and per FPGA, resource separation is not enforced.
Use only GPUs and FPGAs allocated to the job!
One GPU:
```console
salloc -A PROJECT-ID -p p03-amd --gres=gpu
```
Two GPUs on the same node:
```console
salloc -A PROJECT-ID -p p03-amd --gres=gpu:2
```
Four GPUs on the same node:
```console
salloc -A PROJECT-ID -p p03-amd --gres=gpu:4
```
All GPUs:
```console
salloc -A PROJECT-ID -p p03-amd -N 2 --gres=gpu:4
```
One FPGA:
```console
salloc -A PROJECT-ID -p p03-amd --gres=fpga
```
Two FPGAs:
```console
salloc -A PROJECT-ID -p p03-amd --gres=fpga:2
```
All FPGAs:
```console
salloc -A PROJECT-ID -p p03-amd -N 2 --gres=fpga:2
```
One GPU and one FPGA on the same node:
```console
salloc -A PROJECT-ID -p p03-amd --gres=gpu,fpga
```
Four GPUs and two FPGAs on the same node:
```console
salloc -A PROJECT-ID -p p03-amd --gres=gpu:4,fpga:2
```
All GPUs and FPGAs:
```console
salloc -A PROJECT-ID -p p03-amd -N 2 --gres=gpu:4,fpga:2
```
## Partition 04 - Edge Server
Whole node allocation:
```console
salloc -A PROJECT-ID -p p04-edge
```
## Partition 05 - FPGA Synthesis Server
Whole node allocation:
```console
salloc -A PROJECT-ID -p p05-synt
```
## Partition 06 - ARM
Whole node allocation:
```console
salloc -A PROJECT-ID -p p06-arm
```
## Partition 07 - IBM Power
Whole node allocation:
```console
salloc -A PROJECT-ID -p p07-power
```
## Partition 08 - AMD Milan-X
Whole node allocation:
```console
salloc -A PROJECT-ID -p p08-amd
```
## Partition 10 - Intel Sapphire Rapids
Whole node allocation:
```console
salloc -A PROJECT-ID -p p10-intel
```
## Features
Nodes have feature tags assigned to them.
Users can select nodes based on the feature tags using the `--constraint` option.
| Feature | Description |
| ------ | ------ |
| aarch64 | platform |
| x86_64 | platform |
| ppc64le | platform |
| amd | manufacturer |
| intel | manufacturer |
| icelake | processor family |
| broadwell | processor family |
| sapphire_rapids | processor family |
| milan | processor family |
| milan-x | processor family |
| ib | Infiniband |
| gpu | equipped with GPU |
| fpga | equipped with FPGA |
| nvdimm | equipped with NVDIMMs |
| ht | Hyperthreading enabled |
| noht | Hyperthreading disabled |
```
$ sinfo -o '%16N %f'
NODELIST AVAIL_FEATURES
p00-arm01 aarch64,cortex-a72
p01-arm[01-08] aarch64,a64fx,ib
p02-intel01 x86_64,intel,icelake,ib,fpga,bitware,nvdimm,ht
p02-intel02 x86_64,intel,icelake,ib,fpga,bitware,nvdimm,noht
p03-amd02 x86_64,amd,milan,ib,gpu,mi100,fpga,xilinx,noht
p03-amd01 x86_64,amd,milan,ib,gpu,mi100,fpga,xilinx,ht
p04-edge01 x86_64,intel,broadwell,ib,ht
p05-synt01 x86_64,amd,milan,ib,ht
p06-arm[01-02] aarch64,ib
p07-power01 ppc64le,ib
p08-amd01 x86_64,amd,milan-x,ib,ht
p10-intel01 x86_64,intel,sapphire_rapids,ht
```
```
$ salloc -A PROJECT-ID -p p02-intel --constraint noht
```
```
$ scontrol -d show node p02-intel02 | grep ActiveFeatures
ActiveFeatures=x86_64,intel,icelake,ib,fpga,bitware,nvdimm,noht
```
## Resources, GRES
Slurm supports the ability to define and schedule arbitrary resources - Generic RESources (GRES) in Slurm's terminology. We use GRES for scheduling/allocating GPUs and FPGAs.
!!! warning
    Use only allocated GPUs and FPGAs. Resource separation is not enforced. If you use non-allocated resources, you can observe strange behavior and get into trouble.
### Node Resources
Get information about GRES on a node.
```
$ scontrol -d show node p02-intel01 | grep Gres=
Gres=fpga:bitware_520n_mx:2
$ scontrol -d show node p02-intel02 | grep Gres=
Gres=fpga:bitware_520n_mx:2
$ scontrol -d show node p03-amd01 | grep Gres=
Gres=gpu:amd_mi100:4,fpga:xilinx_alveo_u250:2
$ scontrol -d show node p03-amd02 | grep Gres=
Gres=gpu:amd_mi100:4,fpga:xilinx_alveo_u280:2
```
### Request Resources
To allocate the required resources (GPUs or FPGAs), use the `--gres` option of `salloc`/`srun`.
Example: Allocate one FPGA
```
$ salloc -A PROJECT-ID -p p03-amd --gres fpga:1
```
### Find Out Allocated Resources
Information about allocated resources is available in the Slurm job details, in the `JOB_GRES` and `GRES` attributes.
```
$ scontrol -d show job $SLURM_JOBID |grep GRES=
JOB_GRES=fpga:xilinx_alveo_u250:1
Nodes=p03-amd01 CPU_IDs=0-1 Mem=0 GRES=fpga:xilinx_alveo_u250:1(IDX:0)
```
The IDX value in the GRES attribute specifies the index(es) of the FPGA(s) (or GPUs) allocated to the job on the node. In the given example, the allocated resource is `fpga:xilinx_alveo_u250:1(IDX:0)`, so we should use the FPGA with index 0 on node p03-amd01.
### Request Specific Resources
It is possible to allocate specific resources. This is useful for the p03-amd partition, which is equipped with FPGAs of different types.
The GRES entry uses the format `name[[:type]:count]`; in the following example, the name is `fpga`, the type is `xilinx_alveo_u280`, and the count is 2.
```
$ salloc -A PROJECT-ID -p p03-amd --gres=fpga:xilinx_alveo_u280:2
salloc: Granted job allocation XXX
salloc: Waiting for resource configuration
salloc: Nodes p03-amd02 are ready for job
$ scontrol -d show job $SLURM_JOBID | grep -i gres
JOB_GRES=fpga:xilinx_alveo_u280:2
Nodes=p03-amd02 CPU_IDs=0 Mem=0 GRES=fpga:xilinx_alveo_u280(IDX:0-1)
TresPerNode=gres:fpga:xilinx_alveo_u280:2
```
[1]: https://slurm.schedmd.com/
[2]: https://slurm.schedmd.com/srun.html#SECTION_OUTPUT-ENVIRONMENT-VARIABLES
# Complementary Systems Specifications
Below are the technical specifications of individual Complementary systems.
## Partition 0 - ARM (Cortex-A72)
The partition is based on the [ARMv8-A 64-bit][4] architecture.
- Cortex-A72
- ARMv8-A 64-bit
- 2x 32 cores @ 2 GHz
- 255 GB memory
- disk capacity 3.7 TB
- 1x Infiniband FDR 56 Gb/s
## Partition 1 - ARM (A64FX)
The partition is based on the Armv8.2-A architecture
with SVE extension of instruction set and
consists of 8 compute nodes with the following per-node parameters:
- 1x Fujitsu A64FX CPU
- Arm v8.2-A ISA CPU with Scalable Vector Extension (SVE) extension
- 48 cores at 2.0 GHz
- 32 GB of HBM2 memory
- 400 GB SSD (m.2 form factor) – mixed-use type
- 1x Infiniband HDR100 interface
- connected via 16x PCI-e Gen3 slot to the CPU
## Partition 2 - Intel (Ice Lake, NVDIMMs) <!--- + Bitware FPGAs) -->
The partition is based on the Intel Ice Lake x86 architecture.
It contains two servers with Intel NVDIMM memories.
<!--- The key technologies installed are Intel NVDIMM memories. and Intel FPGA accelerators.
The partition contains two servers each with two FPGA accelerators. -->
Each server has the following parameters:
- 2x 3rd Gen Xeon Scalable Processors Intel Xeon Gold 6338 CPU
- 32-cores @ 2.00GHz
- 16x 16GB RAM with ECC
- DDR4-3200
- 1x Infiniband HDR100 interface
- connected to CPU 8x PCI-e Gen4 interface
- 3.2 TB NVMe local storage – mixed use type
<!---
2x FPGA accelerators
Bitware [520N-MX][1]
-->
In addition, the servers have the following parameters:
- Intel server 1 – low NVDIMM memory server with 2304 GB NVDIMM memory
- 16x 128GB NVDIMM persistent memory modules
- Intel server 2 – high NVDIMM memory server with 8448 GB NVDIMM memory
- 16x 512GB NVDIMM persistent memory modules
Software installed on the partition:
FPGA boards support application development using the following design flows:
- OpenCL
- High-Level Synthesis (C/C++) including support for OneAPI
- Verilog and VHDL
## Partition 3 - AMD (Milan, MI100 GPUs + Xilinx FPGAs)
The partition is based on two servers equipped with AMD Milan x86 CPUs,
AMD GPUs and Xilinx FPGAs architectures and represents an alternative
to the Intel-based partition's ecosystem.
Each server has the following parameters:
- 2x AMD Milan 7513 CPU
- 32 cores @ 2.6 GHz
- 16x 16GB RAM with ECC
- DDR4-3200
- 4x AMD GPU accelerators MI 100
- Interconnected with AMD Infinity Fabric™ Link for fast GPU to GPU communication
- 1x 100 GBps Infiniband HDR100
- connected to CPU via 8x PCI-e Gen4 interface
- 3.2 TB NVMe local storage – mixed use
In addition:
- AMD server 1 has 2x FPGA [Xilinx Alveo U250 Data Center Accelerator Card][2]
- AMD server 2 has 2x FPGA [Xilinx Alveo U280 Data Center Accelerator Card][3]
Software installed on the partition:
FPGA boards support application development using the following design flows:
- OpenCL
- High-Level Synthesis (C/C++)
- Verilog and VHDL
- developer tools and libraries for AMD GPUs.
## Partition 4 - Edge Server
The partition provides an overview of the so-called edge computing class of resources,
with solutions powerful enough to provide data analytics capabilities (both CPU and GPU)
in a form factor that does not require a data center to operate.
The partition consists of one edge computing server with the following parameters:
- 1x x86_64 CPU Intel Xeon D-1587
- TDP 65 W,
- 16 cores,
- 435 GFlop/s theoretical max performance in double precision
- 1x CUDA programmable GPU NVIDIA Tesla T4
- TDP 70W
- theoretical performance 8.1 TFlop/s in FP32
- 128 GB RAM
- 1.92TB SSD storage
- connectivity:
- 2x 10 Gbps Ethernet,
- WiFi 802.11 ac,
- LTE connectivity
## Partition 5 - FPGA Synthesis Server
FPGA design tools usually run for several hours up to one day to generate a final bitstream (logic design) of large FPGA chips. These tools are usually sequential, so a dedicated server for this task is part of the system.
This server is used by the development tools needed for the FPGA boards installed in compute partitions 2 and 3.
- AMD EPYC 72F3, 8 cores @ 3.7 GHz nominal frequency
- 8 memory channels with ECC
- 128 GB of DDR4-3200 memory with ECC
- memory is fully populated to maximize memory subsystem performance
- 1x 10Gb Ethernet port used for connection to LAN
- NVMe local storage
- 2x NVMe disks 3.2TB, configured RAID 1
## Partition 6 - ARM + CUDA GPGPU (Ampere) + DPU
This partition is based on the ARM architecture and is equipped with CUDA programmable GPGPU accelerators
based on the Ampere architecture and with DPU network processing units.
The partition consists of two nodes with the following per-node parameters:
- Server Gigabyte G242-P36, Ampere Altra Q80-30 (80c, 3.0GHz)
- 512GB DIMM DDR4, 3200MHz, ECC, CL22
- 2x Micron 7400 PRO 1920GB NVMe M.2 Non-SED Enterprise SSD
- 2x NVIDIA A30 GPU Accelerator
- 2x NVIDIA BlueField-2 E-Series DPU 25GbE Dual-Port SFP56, PCIe Gen4 x16, 16GB DDR + 64, 200Gb Ethernet
- Mellanox ConnectX-5 EN network interface card, 10/25GbE dual-port SFP28, PCIe3.0 x8
- Mellanox ConnectX-6 VPI adapter card, 100Gb/s (HDR100, EDR IB and 100GbE), single-port QSFP56
## Partition 7 - IBM
The IBM Power10 server is a single-node partition with the following parameters:
- Server IBM POWER S1022
- 2x Power10 12-CORE TYPICAL 2.90 TO 4.0 GHZ (MAX) PO
- 512GB DDIMMS, 3200 MHZ, 8GBIT DDR4
- 2x ENTERPRISE 1.6 TB SSD PCIE4 NVME U.2 MOD
- 2x ENTERPRISE 6.4 TB SSD PCIE4 NVME U.2 MOD
- PCIE3 LP 2-PORT 25/10GB NIC&ROCE SR/CU A
## Partition 8 - HPE Proliant
This partition provides a modern CPU with a very large L3 cache.
The goal is to enable users to develop algorithms and libraries
that will efficiently utilize this technology.
The processor is very efficient, for example, for linear algebra on relatively small matrices.
This is a single-node partition with the following parameters:
- Server HPE Proliant DL 385 Gen10 Plus v2 CTO
- 2x AMD EPYC 7773X Milan-X, 64 cores, 2.2GHz, 768 MB L3 cache
- 16x HPE 16GB (1x+16GB) x4 DDR4-3200 Registered Smart Memory Kit
- 2x 3.84TB NVMe RI SFF BC U.3ST MV SSD
- BCM 57412 10GbE 2p SFP+ OCP3 Adptr
- HPE IB HDR100/EN 100Gb 1p QSFP56 Adptr1
- HPE Cray Programming Environment for x86 Systems 2 Seats
## Partition 9 - Virtual GPU Accelerated Workstation
This partition provides users with a remote/virtual workstation running MS Windows OS.
It offers a rich graphical environment with a focus on 3D OpenGL
or RayTracing-based applications with the smallest possible degradation of user experience.
The partition consists of two nodes with the following per-node parameters:
- Server HPE Proliant DL 385 Gen10 Plus v2 CTO
- 2x AMD EPYC 7413, 24 cores, 2.55GHz
- 16x HPE 32GB 2Rx4 PC4-3200AA-R Smart Kit
- 2x 3.84TB NVMe RI SFF BC U.3ST MV SSD
- BCM 57412 10GbE 2p SFP+ OCP3 Adptr
- 2x NVIDIA A40 48GB GPU Accelerator
### Available Software
The following is the list of software available on partition 09:
- Academic VMware Horizon 8 Enterprise Term Edition: 10 Concurrent User Pack for 4 year term license; includes SnS
- 8x NVIDIA RTX Virtual Workstation, per concurrent user, EDU, perpetual license
- 32x NVIDIA RTX Virtual Workstation, per concurrent user, EDU SUMS per year
- 7x Windows Server 2022 Standard - 16 Core License Pack
- 10x Windows Server 2022 - 1 User CAL
- 40x Windows 10/11 Enterprise E3 VDA (Microsoft) per year
- Hardware VMware Horizon management
## Partition 10 - Sapphire Rapids-HBM Server
The primary purpose of this server is to evaluate the impact of the HBM memory integrated on the x86 processor
on the performance of user applications.
This is a new feature previously available only on GPGPU accelerators,
where it provided a significant boost to memory-bound applications.
Users can also compare the impact of the HBM memory with the impact of the large L3 cache
available on the AMD Milan-X processor, also available on the complementary systems.
The server is also equipped with DDR5 memory and enables comparative studies with reference to DDR4-based systems.
- 2x Intel® Xeon® CPU Max 9468 48 cores base 2.1GHz, max 3.5Ghz
- 16x 16GB DDR5 4800Mhz
- 2x Intel D3 S4520 960GB SATA 6Gb/s
- 1x Supermicro Standard LP 2-port 10GbE RJ45, Broadcom BCM57416
## Partition 11 - NVIDIA Grace CPU Superchip
The [NVIDIA Grace CPU Superchip][6] uses the [NVIDIA® NVLink®-C2C][5] technology to deliver 144 Arm® Neoverse V2 cores and 1TB/s of memory bandwidth.
It runs all NVIDIA software stacks and platforms, including NVIDIA RTX™, NVIDIA HPC SDK, NVIDIA AI, and NVIDIA Omniverse™.
- Superchip design with up to 144 Arm Neoverse V2 CPU cores with Scalable Vector Extensions (SVE2)
- World’s first LPDDR5X with error-correcting code (ECC) memory, 1TB/s total bandwidth
- 900GB/s coherent interface, 7X faster than PCIe Gen 5
- NVIDIA Scalable Coherency Fabric with 3.2TB/s of aggregate bisectional bandwidth
- 2X the packaging density of DIMM-based solutions
- 2X the performance per watt of today’s leading CPU
- FP64 Peak of 7.1TFLOPS
[1]: https://www.bittware.com/fpga/520n-mx/
[2]: https://www.xilinx.com/products/boards-and-kits/alveo/u250.html#overview
[3]: https://www.xilinx.com/products/boards-and-kits/alveo/u280.html#overview
[4]: https://developer.arm.com/documentation/100095/0003/
[5]: https://www.nvidia.com/en-us/data-center/nvlink-c2c/
[6]: https://www.nvidia.com/en-us/data-center/grace-cpu-superchip/
# Accessing the DGX-2
## Before You Access
!!! warning
GPUs are single-user devices. GPU memory is not purged between job runs and it can be read (but not written) by any user. Consider the confidentiality of your running jobs.
## How to Access
The DGX-2 machine is integrated into [Barbora cluster][3].
The DGX-2 machine can be accessed from the Barbora login nodes `barbora.it4i.cz` through the Barbora scheduler queue `qdgx` as compute node `cn202`.
## Storage
There are three shared file systems on the DGX-2 system: HOME, SCRATCH (LSCRATCH), and PROJECT.
### HOME
The HOME filesystem is realized as an NFS filesystem. This is a shared home from the [Barbora cluster][1].
### SCRATCH
The SCRATCH is realized on an NVME storage. The SCRATCH filesystem is mounted in the `/scratch` directory.
Accessible capacity is 22TB, shared among all users.
!!! warning
Files on the SCRATCH filesystem that are not accessed for more than 60 days will be automatically deleted.
### PROJECT
The PROJECT data storage is IT4Innovations' central data storage accessible from all clusters.
For more information on accessing PROJECT, its quotas, etc., see the [PROJECT Data Storage][2] section.
[1]: ../../barbora/storage/#home-file-system
[2]: ../../storage/project-storage
[3]: ../../barbora/introduction
# NVIDIA DGX-2
The DGX-2 is a very powerful computational node, featuring high-end x86_64 processors and 16 NVIDIA V100-SXM3 GPUs.
| NVIDIA DGX-2 | |
| --- | --- |
| CPUs | 2 x Intel Xeon Platinum |
| GPUs | 16 x NVIDIA Tesla V100 32GB HBM2 |
| System Memory | Up to 1.5 TB DDR4 |
| GPU Memory | 512 GB HBM2 (16 x 32 GB) |
| Storage | 30 TB NVMe, Up to 60 TB |
| Networking | 8 x Infiniband or 8 x 100 GbE |
| Power | 10 kW |
| Size | 350 lbs |
| GPU Throughput | Tensor: 1920 TFLOPs, FP16: 520 TFLOPs, FP32: 260 TFLOPs, FP64: 130 TFLOPs |
The [DGX-2][a] introduces NVIDIA’s new NVSwitch, enabling 300 GB/s chip-to-chip communication at 12 times the speed of PCIe.
With NVLink2, it enables 16x NVIDIA V100-SXM3 GPUs in a single system, for a total bandwidth going beyond 14 TB/s.
Featuring a pair of Xeon 8168 CPUs, 1.5 TB of memory, and 30 TB of NVMe storage,
the system consumes 10 kW, weighs 163.29 kg, and offers double-precision performance in excess of 130 TF.
The DGX-2 is designed to be a powerful server in its own right.
On the storage side, the DGX-2 comes with 30TB of NVMe-based solid state storage.
For clustering or further inter-system communications, it also offers InfiniBand and 100GigE connectivity, up to eight of them.
Further, the [DGX-2][b] offers a total of ~2 PFLOPs of half precision performance in a single system, when using the tensor cores.
![](../img/dgx1.png)
With the DGX-2, training AlexNET, the network that 'started' the latest machine learning revolution, now takes 18 minutes.
The DGX-2 is able to complete the training process
for FAIRSEQ – a neural network model for language translation – 10x faster than a DGX-1 system,
bringing it down to less than two days total rather than 15 days.
The new NVSwitches mean that the PCIe lanes of the CPUs can be redirected elsewhere, most notably towards storage and networking connectivity.
The topology of the DGX-2 means that all 16 GPUs are able to pool their memory into a unified memory space,
though with the usual tradeoffs involved if going off-chip.
![](../img/dgx2-nvlink.png)
[a]: https://www.nvidia.com/content/dam/en-zz/es_em/Solutions/Data-Center/dgx-2/nvidia-dgx-2-datasheet.pdf
[b]: https://www.youtube.com/embed/OTOGw0BRqK0
# Resource Allocation and Job Execution
To run a job, computational resources of the DGX-2 must be allocated.
The DGX-2 machine is integrated into and accessible through the Barbora cluster; the queue for the DGX-2 machine is called **qdgx**.
When allocating computational resources for the job, specify:
1. your Project ID;
1. a queue for your job - **qdgx**;
1. the maximum time allocated to your calculation (the default is **4 hours**, the maximum is **48 hours**);
1. a jobscript if batch processing is intended.
Submit the job using the `sbatch` (for batch processing) or `salloc` (for interactive session) command:
**Example**
```console
[kru0052@login2.barbora ~]$ salloc -A PROJECT-ID -p qdgx --time=02:00:00
salloc: Granted job allocation 36631
salloc: Waiting for resource configuration
salloc: Nodes cn202 are ready for job
kru0052@cn202:~$ nvidia-smi
Wed Jun 16 07:46:32 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM3... On | 00000000:34:00.0 Off | 0 |
| N/A 32C P0 51W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM3... On | 00000000:36:00.0 Off | 0 |
| N/A 31C P0 48W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM3... On | 00000000:39:00.0 Off | 0 |
| N/A 35C P0 53W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM3... On | 00000000:3B:00.0 Off | 0 |
| N/A 36C P0 53W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM3... On | 00000000:57:00.0 Off | 0 |
| N/A 29C P0 50W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM3... On | 00000000:59:00.0 Off | 0 |
| N/A 35C P0 51W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM3... On | 00000000:5C:00.0 Off | 0 |
| N/A 30C P0 50W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM3... On | 00000000:5E:00.0 Off | 0 |
| N/A 35C P0 53W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 8 Tesla V100-SXM3... On | 00000000:B7:00.0 Off | 0 |
| N/A 30C P0 50W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 9 Tesla V100-SXM3... On | 00000000:B9:00.0 Off | 0 |
| N/A 30C P0 51W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 10 Tesla V100-SXM3... On | 00000000:BC:00.0 Off | 0 |
| N/A 35C P0 51W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 11 Tesla V100-SXM3... On | 00000000:BE:00.0 Off | 0 |
| N/A 35C P0 50W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 12 Tesla V100-SXM3... On | 00000000:E0:00.0 Off | 0 |
| N/A 31C P0 50W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 13 Tesla V100-SXM3... On | 00000000:E2:00.0 Off | 0 |
| N/A 29C P0 51W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 14 Tesla V100-SXM3... On | 00000000:E5:00.0 Off | 0 |
| N/A 34C P0 51W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 15 Tesla V100-SXM3... On | 00000000:E7:00.0 Off | 0 |
| N/A 34C P0 50W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
kru0052@cn202:~$ exit
```
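A batch job can be submitted to the same queue analogously (a sketch; `./jobscript.sh` stands for your own job script):
```console
[kru0052@login2.barbora ~]$ sbatch -A PROJECT-ID -p qdgx --time=02:00:00 ./jobscript.sh
```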
!!! tip
Submit the interactive job using the `salloc` command.
## Job Execution
The DGX-2 machine runs only a bare-bone, minimal operating system. Users are expected to run
**[Apptainer/Singularity][1]** containers in order to enrich the environment according to their needs.
Containers (Docker images) optimized for the DGX-2 may be downloaded from
the [NVIDIA GPU Cloud][2]. Select the container of interest and
copy the `nvcr.io` link from the Pull Command section. This link may be used directly
to download the container via Apptainer/Singularity, see the example below:
### Example - Apptainer/Singularity Run Tensorflow
```console
[kru0052@login2.barbora ~] $ salloc -A PROJECT-ID -p qdgx --time=02:00:00
salloc: Granted job allocation 36633
salloc: Waiting for resource configuration
salloc: Nodes cn202 are ready for job
kru0052@cn202:~$ singularity shell docker://nvcr.io/nvidia/tensorflow:19.02-py3
Singularity tensorflow_19.02-py3.sif:~>
Singularity tensorflow_19.02-py3.sif:~> mpiexec --bind-to socket -np 16 python /opt/tensorflow/nvidia-examples/cnn/resnet.py --layers=18 --precision=fp16 --batch_size=512
PY 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609]
TF 1.13.0-rc0
PY 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609]
TF 1.13.0-rc0
PY 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609]
TF 1.13.0-rc0
PY 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609]
TF 1.13.0-rc0
PY 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609]
TF 1.13.0-rc0
PY 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609]
...
...
...
2019-03-11 08:30:12.263822: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
1 1.0 338.2 6.999 7.291 2.00000
10 10.0 3658.6 5.658 5.950 1.62000
20 20.0 25628.6 2.957 3.258 1.24469
30 30.0 30815.1 0.177 0.494 0.91877
40 40.0 30826.3 0.004 0.330 0.64222
50 50.0 30884.3 0.002 0.327 0.41506
60 60.0 30888.7 0.001 0.325 0.23728
70 70.0 30763.2 0.001 0.324 0.10889
80 80.0 30845.5 0.001 0.324 0.02988
90 90.0 26350.9 0.001 0.324 0.00025
kru0052@cn202:~$ exit
```
**GPU stat**
The GPU load can be determined by the `gpustat` utility.
```console
Every 2,0s: gpustat --color
dgx Mon Mar 11 09:31:00 2019
[0] Tesla V100-SXM3-32GB | 47'C, 96 % | 23660 / 32480 MB | kru0052(23645M)
[1] Tesla V100-SXM3-32GB | 48'C, 96 % | 23660 / 32480 MB | kru0052(23645M)
[2] Tesla V100-SXM3-32GB | 56'C, 97 % | 23660 / 32480 MB | kru0052(23645M)
[3] Tesla V100-SXM3-32GB | 57'C, 97 % | 23660 / 32480 MB | kru0052(23645M)
[4] Tesla V100-SXM3-32GB | 46'C, 97 % | 23660 / 32480 MB | kru0052(23645M)
[5] Tesla V100-SXM3-32GB | 55'C, 96 % | 23660 / 32480 MB | kru0052(23645M)
[6] Tesla V100-SXM3-32GB | 45'C, 96 % | 23660 / 32480 MB | kru0052(23645M)
[7] Tesla V100-SXM3-32GB | 54'C, 97 % | 23660 / 32480 MB | kru0052(23645M)
[8] Tesla V100-SXM3-32GB | 45'C, 96 % | 23660 / 32480 MB | kru0052(23645M)
[9] Tesla V100-SXM3-32GB | 46'C, 95 % | 23660 / 32480 MB | kru0052(23645M)
[10] Tesla V100-SXM3-32GB | 55'C, 96 % | 23660 / 32480 MB | kru0052(23645M)
[11] Tesla V100-SXM3-32GB | 56'C, 96 % | 23660 / 32480 MB | kru0052(23645M)
[12] Tesla V100-SXM3-32GB | 47'C, 95 % | 23660 / 32480 MB | kru0052(23645M)
[13] Tesla V100-SXM3-32GB | 45'C, 96 % | 23660 / 32480 MB | kru0052(23645M)
[14] Tesla V100-SXM3-32GB | 55'C, 96 % | 23660 / 32480 MB | kru0052(23645M)
[15] Tesla V100-SXM3-32GB | 58'C, 95 % | 23660 / 32480 MB | kru0052(23645M)
```
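The continuously refreshing view above can be obtained, for example, by wrapping `gpustat` in `watch` (the 2-second interval matches the header of the output above):
```console
$ watch --color -n 2 gpustat --color
```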
[1]: https://docs.it4i.cz/software/tools/singularity/
[2]: https://ngc.nvidia.com/
# Software Deployment
Software deployment on DGX-2 is based on containers. NVIDIA provides a wide range of prepared Docker containers with a variety of different software. Users can easily download these containers and use them directly on the DGX-2.
The catalog of all container images can be found on [NVIDIA site][a]. Supported software includes:
* TensorFlow
* MATLAB
* GROMACS
* Theano
* Caffe2
* LAMMPS
* ParaView
* ...
## Running Containers on DGX-2
NVIDIA expects Docker to be used as the containerization tool, but Docker is not a suitable solution in a multiuser environment. For this reason, the [Apptainer/Singularity][b] container solution is used.
Singularity can be used similarly to Docker; just change the image URL. For example, the original Docker command `docker run -it nvcr.io/nvidia/theano:18.08` becomes `singularity shell docker://nvcr.io/nvidia/theano:18.08`. More about Apptainer/Singularity [here][1].
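For illustration, the image from the example above can also be pulled first and then opened with Apptainer/Singularity (the tag `18.08` is just the one used in the paragraph above):
```console
$ singularity pull docker://nvcr.io/nvidia/theano:18.08
$ singularity shell docker://nvcr.io/nvidia/theano:18.08
```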
For fast container deployment, all images are cached in the *lscratch* directory after first use. This behavior can be changed via the *SINGULARITY_CACHEDIR* environment variable, but the container start time will then increase significantly.
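For example, to relocate the cache, set the variable before running Singularity (the path below is only illustrative):
```console
$ export SINGULARITY_CACHEDIR=/home/$USER/.singularity/cache
```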
```console
$ ml av Singularity
---------------------------- /apps/modules/tools ----------------------------
Singularity/3.3.0
```
## MPI Modules
```console
$ ml av MPI
---------------------------- /apps/modules/mpi ----------------------------
OpenMPI/2.1.5-GCC-6.3.0-2.27 OpenMPI/3.1.4-GCC-6.3.0-2.27 OpenMPI/4.0.0-GCC-6.3.0-2.27 (D) impi/2017.4.239-iccifort-2017.7.259-GCC-6.3.0-2.27
```
## Compiler Modules
```console
$ ml av gcc
---------------------------- /apps/modules/compiler ----------------------------
GCC/6.3.0-2.27 GCCcore/6.3.0 icc/2017.7.259-GCC-6.3.0-2.27 ifort/2017.7.259-GCC-6.3.0-2.27
```
[1]: ../software/tools/singularity.md
[a]: https://ngc.nvidia.com/catalog/landing
[b]: https://www.sylabs.io/
# What Is the DICE Project?
DICE (Data Infrastructure Capacity for EOSC) is an international project funded by the European Union
that provides cutting-edge data management services and a significant amount of storage resources for the EOSC.
The EOSC (European Open Science Cloud) project provides European researchers, innovators, companies,
and citizens with a federated and open multi-disciplinary environment
where they can publish, find, and re-use data, tools, and services for research, innovation and educational purposes.
For more information, see the official [DICE project][b] and [EOSC project][q] pages.
**IT4Innovations participates in DICE. DICE uses the iRODS software.**
The integrated Rule-Oriented Data System (iRODS) is an open source data management software
used by research organizations and government agencies worldwide.
iRODS is released as a production-level distribution aimed at deployment in mission critical environments.
It virtualizes data storage resources, so users can take control of their data,
regardless of where and on what device the data is stored.
As data volumes grow and data services become more complex,
iRODS is serving an increasingly important role in data management.
For more information, see [the official iRODS page][c].
## How to Put Your Data to Our Server
**Prerequisites:**
First, we need to verify your identity. This is done through the following steps:
1. Sign in with your organization at [B2ACCESS][d]; the page requests a valid personal certificate (e.g. GEANT).
Accounts with a "Low" level of assurance are not granted access to the IT4I zone.
1. Confirm your certificate in the browser:
![](img/B2ACCESS_chrome_eng.jpg)
1. Confirm your certificate in the OS (Windows):
![](img/crypto_v2.jpg)
1. Sign to EUDAT/B2ACCESS:
![](img/eudat_v2.jpg)
1. After successful login to B2Access:
1. **For Non IT4I Users**
Sign in to our [AAI][f] through your B2Access account.
You have to set a new password for iRODS access.
1. **For IT4I Users**
Sign in to our [AAI][f] through your B2Access account and link your B2ACCESS identity with your existing account.
The iRODS password will be the same as your IT4I LDAP password (i.e. code.it4i.cz password).
![](img/aai.jpg)
![](img/aai2.jpg)
![](img/aai3-passwd.jpg)
![](img/irods_linking_link.jpg)
1. Contact [support@it4i.cz][a], so we can create your account at our iRODS server.
1. **Fill in this request on [EOSC-MARKETPLACE][h] (recommended)** or at [EUDAT][l] and specify the requested capacity.
![](img/eosc-marketplace-active.jpg)
![](img/eosc-providers.jpg)
![](img/eudat_request.jpg)
## Access to iRODS Collection From Karolina
Access to an iRODS collection requires access to the Karolina cluster (i.e. an [IT4I account][4]),
since iRODS clients are provided as a module on Karolina (Barbora is in progress).
The `irodsfs` module also creates configuration files for irodsfs and iCommands.
Note that you can change your iRODS password at [aai.it4i.cz][m].
### Mounting Your Collection
```console
ssh some_user@karolina.it4i.cz
ml irodsfs
```
Now you can choose between the Fuse client or iCommands:
#### Fuse
```console
ssh some_user@karolina.it4i.cz
[some_user@login4.karolina ~]$ ml irodsfs
irodsfs configuration file has been created at /home/some_user/.irods/config.yml
iRODS environment file has been created at /home/some_user/.irods/irods_environment.json
to start irodsfs, run: irodsfs -config ~/.irods/config.yml ~/IRODS
to start iCommands, run: iinit
For more information, see https://docs.it4i.cz/dice/
```
To mount your iRODS collection to ~/IRODS, run
```console
[some_user@login4.karolina ~]$ irodsfs -config ~/.irods/config.yml ~/IRODS
time="2022-08-04 08:54:13.222836" level=info msg="Logging to /tmp/irodsfs_cblmq5ab1lsaj31vrv20.log" function=processArguments package=main
Password:
time="2022-08-04 08:54:18.698811" level=info msg="Found FUSE Device. Starting iRODS FUSE Lite." function=parentMain package=main
time="2022-08-04 08:54:18.699080" level=info msg="Running the process in the background mode" function=parentRun package=main
time="2022-08-04 08:54:18.699544" level=info msg="Process id = 27145" function=parentRun package=main
time="2022-08-04 08:54:18.699572" level=info msg="Sending configuration data" function=parentRun package=main
time="2022-08-04 08:54:18.699730" level=info msg="Successfully sent configuration data to background process" function=parentRun package=main
time="2022-08-04 08:54:18.922490" level=info msg="Successfully started background process" function=parentRun package=main
```
To unmount it, run
```console
fusermount -u ~/IRODS
```
You can work with the Fuse mount as with an ordinary directory (`ls`, `cd`, `cp`, `mv`, etc.).
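For example (the file name is only illustrative):
```console
[some_user@login4.karolina ~]$ cp results.tar.gz ~/IRODS/
[some_user@login4.karolina ~]$ ls ~/IRODS
results.tar.gz
```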
#### iCommands
```console
ssh some_user@karolina.it4i.cz
[some_user@login4.karolina ~]$ ml irodsfs
irodsfs configuration file has been created at /home/some_user/.irods/config.yml.
to start irods fs run: irodsfs -config ~/.irods/config.yml ~/IRODS
iCommands environment file has been created at /home/$USER/.irods/irods_environment.json.
to start iCommands run: iinit
[some_user@login4.karolina ~]$ iinit
Enter your current PAM password:
```
```console
[some_user@login4.karolina ~]$ ils
/IT4I/home/some_user:
test.1
test.2
test.3
test.4
```
Use the command `iput` for upload, `iget` for download, or `ihelp` for help.
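A minimal example of uploading and downloading a file (the file and directory names are only illustrative):
```console
[some_user@login4.karolina ~]$ iput results.txt
[some_user@login4.karolina ~]$ iget results.txt ~/downloads/
```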
## Access to iRODS Collection From Other Resource
!!! note
This guide assumes you are uploading your data from your local PC/VM.
Use the password from [AAI][f].
### You Need a Client to Connect to the iRODS Server
There are many iRODS clients, but we recommend the following:
- Cyberduck - Windows/Mac, GUI
- Fuse (irodsfs lite) - Linux, CLI
- iCommands - Linux, CLI.
For access, set your PAM password at [AAI][f].
### Cyberduck
1. Download [Cyberduck][i].
2. Download [connection profile][1] for IT4I iRods server.
3. Left double-click this file to open connection.
![](img/irods-cyberduck.jpg)
### Fuse
!!!note "Linux client only"
This is a Linux client only; basic knowledge of the command line is necessary.
Fuse allows you to work with your iRODS collection like an ordinary directory.
```console
cd ~
wget https://github.com/cyverse/irodsfs/releases/download/v0.7.6/irodsfs_amd64_linux_v0.7.6.tar
tar -xvf ~/irodsfs_amd64_linux_v0.7.6.tar
mkdir ~/IRODS ~/.irods/ && cd "$_" && wget https://docs.it4i.cz/config.yml
wget https://pki.cesnet.cz/_media/certs/chain_geant_ov_rsa_ca_4_full.pem -P ~/.irods/
```
Edit `~/.irods/config.yml` with your username from [AAI][f].
#### Mounting Your Collection
```console
[some_user@local_pc ~]$ ./irodsfs -config ~/.irods/config.yml ~/IRODS
time="2022-07-29 09:51:11.720831" level=info msg="Logging to /tmp/irodsfs_cbhp2rucso0ef0s7dtl0.log" function=processArguments package=main
Password:
time="2022-07-29 09:51:17.691988" level=info msg="Found FUSE Device. Starting iRODS FUSE Lite." function=parentMain package=main
time="2022-07-29 09:51:17.692683" level=info msg="Running the process in the background mode" function=parentRun package=main
time="2022-07-29 09:51:17.693381" level=info msg="Process id = 74772" function=parentRun package=main
time="2022-07-29 09:51:17.693421" level=info msg="Sending configuration data" function=parentRun package=main
time="2022-07-29 09:51:17.693772" level=info msg="Successfully sent configuration data to background process" function=parentRun package=main
time="2022-07-29 09:51:18.008166" level=info msg="Successfully started background process" function=parentRun package=main
```
#### Putting Your Data to iRODS
```console
[some_user@local_pc ~]$ cp test1G.txt ~/IRODS
```
It works as an ordinary file system:
```console
[some_user@local_pc ~]$ ls -la ~/IRODS
total 0
-rwx------ 1 some_user some_user 1073741824 Nov 4 2021 test1G.txt
```
#### Unmounting Your Collection
To stop/unmount your collection, use:
```console
[some_user@local_pc ~]$ fusermount -u ~/IRODS
```
### iCommands
!!!note "Linux client only"
This is a Linux client only; basic knowledge of the command line is necessary.
We recommend CentOS 7; Ubuntu 20 is optional.
#### Steps for Ubuntu 20
```console
LSB_RELEASE="bionic"
wget -qO - https://packages.irods.org/irods-signing-key.asc | sudo apt-key add -
echo "deb [arch=amd64] https://packages.irods.org/apt/ ${LSB_RELEASE} main" \
> | sudo tee /etc/apt/sources.list.d/renci-irods.list
deb [arch=amd64] https://packages.irods.org/apt/ bionic main
sudo apt-get update
apt-cache search irods
wget -c \
http://security.ubuntu.com/ubuntu/pool/main/p/python-urllib3/python-urllib3_1.22-1ubuntu0.18.04.2_all.deb \
http://security.ubuntu.com/ubuntu/pool/main/r/requests/python-requests_2.18.4-2ubuntu0.1_all.deb \
http://security.ubuntu.com/ubuntu/pool/main/o/openssl1.0/libssl1.0.0_1.0.2n-1ubuntu5.10_amd64.deb
sudo apt install \
./python-urllib3_1.22-1ubuntu0.18.04.2_all.deb \
./python-requests_2.18.4-2ubuntu0.1_all.deb \
./libssl1.0.0_1.0.2n-1ubuntu5.10_amd64.deb
sudo rm -rf \
./python-urllib3_1.22-1ubuntu0.18.04.2_all.deb \
./python-requests_2.18.4-2ubuntu0.1_all.deb \
./libssl1.0.0_1.0.2n-1ubuntu5.10_amd64.deb
sudo apt install -y irods-icommands
mkdir ~/.irods/ && cd "$_" && wget https://docs.it4i.cz/irods_environment.json
wget https://pki.cesnet.cz/_media/certs/chain_geant_ov_rsa_ca_4_full.pem -P ~/.irods
sed -i 's,~,'"$HOME"',g' ~/.irods/irods_environment.json
```
#### Steps for CentOS
```console
sudo rpm --import https://packages.irods.org/irods-signing-key.asc
sudo wget -qO - https://packages.irods.org/renci-irods.yum.repo | sudo tee /etc/yum.repos.d/renci-irods.yum.repo
sudo yum install epel-release -y
sudo yum install python-psutil python-jsonschema
sudo yum install irods-icommands
mkdir ~/.irods/ && cd "$_" && wget https://docs.it4i.cz/irods_environment.json
wget https://pki.cesnet.cz/_media/certs/chain_geant_ov_rsa_ca_4_full.pem -P ~/.irods
sed -i 's,~,'"$HOME"',g' ~/.irods/irods_environment.json
```
Edit ***irods_user_name*** in `~/.irods/irods_environment.json` with the username from [AAI][f].
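After editing, the relevant entry should contain your AAI username; for example (the username below is only a placeholder):
```console
[some_user@local_pc ~]$ grep irods_user_name ~/.irods/irods_environment.json
    "irods_user_name": "some_user",
```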
```console
[some_user@local_pc ~]$ pwd
/some_user/.irods
[some_user@local_pc ~]$ ls -la
total 16
drwx------. 2 some_user some_user 136 Sep 29 08:53 .
dr-xr-x---. 6 some_user some_user 206 Sep 29 08:53 ..
-rw-r--r--. 1 some_user some_user 253 Sep 29 08:14 irods_environment.json
```
**How to Start:**
**step 1:**
```console
[some_user@local_pc ~]$ iinit
Enter your current PAM password:
[some_user@local_pc ~]$ ils
/IT4I/home/some_user:
file.jpg
```
**How to put your data to iRODS**
```console
[some_user@local_pc ~]$ iput cesnet.crt
```
```console
[some_user@local_pc ~]$ ils
/IT4I/home/some_user:
cesnet.crt
```
**How to download data**
```console
[some_user@local_pc ~]$ iget cesnet.crt
ls -la ~
-rw-r--r--. 1 some_user some_user 1464 Jul 20 13:44 cesnet.crt
```
For more commands, use the `ihelp` command.
## PID Services
You, as a user, may want to index your datasets and allocate Persistent Identifiers (PIDs) for them. We host a PID system based on hdl-surfsara ([https://it4i-handle.it4i.cz][o]), which is connected to [https://hdl.handle.net][p], and you can create your own PIDs by calling iRODS rules with `irule`.
### How to Create PID
PIDs are created by calling `irule`. You can create the rule file in your `$HOME` or anywhere else,
but you have to specify the path correctly.
Rule files for PID operations always have the `.r` suffix.
This can be done only through iCommands.
Example of a rule that only creates a PID:
```console
user in ~ λ pwd
/home/user
user in ~ λ ils
/IT4I/home/user:
C- /IT4I/home/user/Collection_A
user in ~ λ ls -l | grep pid
-rw-r--r-- 1 user user 249 Sep 30 10:55 create_pid.r
user in ~ λ cat create_pid.r
PID_DO_reg {
EUDATCreatePID(*parent_pid, *source, *ror, *fio, *fixed, *newPID);
writeLine("stdout","PID: *newPID");
}
INPUT *source="/IT4I/home/user/Collection_A",*parent_pid="None",*ror="None",*fio="None",*fixed="true"
OUTPUT ruleExecOut
user in ~ λ irule -F create_pid.r
PID: 21.12149/f3b9b1a5-7b4d-4fff-bfb7-826676f6fe14
```
After creation, your PID is searchable worldwide:
![](img/hdl_net.jpg)
![](img/hdl_pid.jpg)
**More info at [www.eudat.eu][n]**
### Metadata
To add metadata to your collection/dataset, you can use `imeta` from iCommands.
The following shows the metadata of a collection after PID creation:
```console
user in ~ λ imeta ls -C /IT4I/home/user/Collection_A
AVUs defined for collection /IT4I/home/user/Collection_A:
attribute: EUDAT/FIXED_CONTENT
value: True
units:
----
attribute: PID
value: 21.12149/f3b9b1a5-7b4d-4fff-bfb7-826676f6fe14
units:
```
To add any other metadata, you can use:
```console
user in ~ λ imeta add -C /IT4I/home/user/Collection_A EUDAT_B2SHARE_TITLE Some_Title
user in ~ λ imeta ls -C /IT4I/home/user/Collection_A
AVUs defined for collection /IT4I/home/user/Collection_A:
attribute: EUDAT/FIXED_CONTENT
value: True
units:
----
attribute: PID
value: 21.12149/f3b9b1a5-7b4d-4fff-bfb7-826676f6fe14
units:
----
attribute: EUDAT_B2SHARE_TITLE
value: Some_Title
units:
```
[1]: irods.cyberduckprofile
[2]: irods_environment.json
[3]: config.yml
[4]: general/access/account-introduction.md
[a]: mailto:support@it4i.cz
[b]: https://www.dice-eosc.eu/
[c]: https://irods.org/
[d]: https://b2access.eudat.eu/
[f]: https://aai.it4i.cz/realms/IT4i_IRODS/account/#/
[h]: https://marketplace.eosc-portal.eu/services/b2safe/offers
[i]: https://cyberduck.io/download/
[l]: https://www.eudat.eu/contact-support-request?Service=B2SAFE
[m]: https://aai.it4i.cz/
[n]: https://www.eudat.eu/catalogue/b2handle
[o]: https://it4i-handle.it4i.cz
[p]: https://hdl.handle.net
[q]: https://eosc-portal.eu/
# Migration to e-INFRA CZ
## Introduction
IT4Innovations is a part of [e-INFRA CZ][1] - strategic research infrastructure of the Czech Republic, which provides capacities and resources for the transmission, storage, and processing of scientific and research data. In January 2022, IT4I has begun the process of integration of its services.
As a part of the process, a joint e-INFRA CZ user base has been established. This included a migration of eligible IT4I accounts.
## Who Has Been Affected
The migration affected all accounts of users affiliated with an academic organization in the Czech Republic who also have an OPEN-XX-XX project. Affected users have received an email with information about changes in personal data processing.
## Who Has Not Been Affected
Commercial users, training accounts, suppliers, and service accounts were **not** affected by the migration.
## Process
During the process, additional steps have been required for successful migration.
This may have included:
1. e-INFRA CZ registration, if one does not already exist.
2. e-INFRA CZ password reset, if a password has not already been set.
## Steps After Migration
After the migration, you must use your **e-INFRA CZ credentials** to access all IT4I services as well as [e-INFRA CZ services][5].
Successfully migrated accounts tied to e-INFRA CZ can be self-managed at [e-INFRA CZ User profile][4].
!!! tip "Recommendation"
We recommend [verifying your SSH keys][6] for cluster access.
## Troubleshooting
If you have a problem with your account migrated to e-INFRA CZ user base, contact the [CESNET support][7].
If you have questions or a problem with IT4I account (i.e. account not eligible for migration), contact the [IT4I support][2].
[1]: https://www.e-infra.cz/en
[2]: mailto:support@it4i.cz
[3]: https://www.cesnet.cz/?lang=en
[4]: https://profile.e-infra.cz/
[5]: https://www.e-infra.cz/en/services
[6]: https://profile.e-infra.cz/profile/settings/sshKeys
[7]: mailto:support@cesnet.cz
# Environment and Modules
## Shells on Clusters
The table shows which shells are available on the IT4Innovations clusters.
Note that bash is the only supported shell.
| Cluster Name | bash | tcsh | zsh | ksh | dash |
| --------------- | ---- | ---- | --- | --- | ---- |
| Karolina | yes | yes | yes | yes | yes |
| Barbora | yes | yes | yes | yes | no |
| DGX-2 | yes | no | no | no | no |
!!! info
Bash is the default shell. Should you need a different shell, contact [support\[at\]it4i.cz][3].
## Environment Customization
After logging in, you may want to configure the environment. Write your preferred path definitions, aliases, functions, and module loads in the `.bashrc` file:
```console
# ~/.bashrc
# user's compilation path
export MODULEPATH=${MODULEPATH}:/home/$USER/.local/easybuild/modules/all
# User specific aliases and functions
alias sq='squeue --me'
# load the default Intel compiler !!! not recommended !!!
ml intel
# Display information to standard output - only in interactive ssh session
if [ -n "$SSH_TTY" ]
then
ml # Display loaded modules
fi
```
!!! note
Do not run commands outputting to standard output (echo, module list, etc.) in .bashrc for non-interactive SSH sessions. It breaks the fundamental functionality (SCP) of your account. Guard such commands with a check for SSH session interactivity, as shown in the example above.
### Application Modules
In order to configure your shell for running a particular application on clusters, we use a module package interface.
Application modules on clusters are built using [EasyBuild][1]. The modules are divided into the following groups:
```
base: Default module class
bio: Bioinformatics, biology and biomedical
cae: Computer Aided Engineering (incl. CFD)
chem: Chemistry, Computational Chemistry and Quantum Chemistry
compiler: Compilers
data: Data management & processing tools
debugger: Debuggers
devel: Development tools
geo: Earth Sciences
ide: Integrated Development Environments (e.g. editors)
lang: Languages and programming aids
lib: General purpose libraries
math: High-level mathematical software
mpi: MPI stacks
numlib: Numerical Libraries
perf: Performance tools
phys: Physics and physical systems simulations
system: System utilities (e.g. highly depending on system OS and hardware)
toolchain: EasyBuild toolchains
tools: General purpose tools
vis: Visualization, plotting, documentation and typesetting
OS: singularity image
python: python packages
```
!!! note
The modules set up the application paths, library paths and environment variables for running a particular application.
The modules may be loaded, unloaded, and switched according to momentary needs. For details, see [lmod][2].
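A minimal sketch of typical module operations, using an illustrative module name (check `ml av` for the modules and versions actually available on your cluster):
```console
$ ml av OpenMPI                      # list available OpenMPI modules
$ ml OpenMPI/4.0.0-GCC-6.3.0-2.27    # load a specific version
$ ml                                 # show currently loaded modules
$ ml purge                           # unload all modules
```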
[1]: software/tools/easybuild.md
[2]: software/modules/lmod.md
[3]: mailto:support@it4i.cz
# Introduction
This section provides basic information on how to gain access to IT4Innovations Information systems and project membership.
## Account Types
There are two types of accounts at IT4Innovations:
* [**e-INFRA CZ Account**][1]
intended for all persons affiliated with an academic institution from the Czech Republic ([eduID.cz][a]).
* [**IT4I Account**][2]
intended for all persons who are not eligible for an e-INFRA CZ account.
Once you create an account, you can use it only for communication with IT4I support and accessing the SCS information system.
If you want to access IT4I clusters, your account must also be **assigned to a project**.
For more information, see the section:
* [**Get Project Membership**][3]
if you want to become a collaborator on a project, or
* [**Get Project**][4]
if you want to become a project owner.
[1]: ./einfracz-account.md
[2]: ../obtaining-login-credentials/obtaining-login-credentials.md
[3]: ../access/project-access.md
[4]: ../applying-for-resources.md
[a]: https://www.eduid.cz/
# e-INFRA CZ Account
[e-INFRA CZ][1] is a unique research and development e-infrastructure in the Czech Republic,
which provides capacities and resources for the transmission, storage and processing of scientific and research data.
IT4Innovations became a member of e-INFRA CZ in January 2022.
!!! important
Only persons affiliated with an academic institution from the Czech Republic ([eduID.cz][6]) are eligible for an e-INFRA CZ account.
## Request e-INFRA CZ Account
1. Request an account:
1. Go to [https://signup.e-infra.cz/fed/registrar/?vo=IT4Innovations][2]
1. Select a member academic institution you are affiliated with.
1. Fill out the e-INFRA CZ Account information (username, password, and SSH key(s)).
Your account should be created in a few minutes after submitting the request.
Once your e-INFRA CZ account is created, it is propagated into IT4I systems
and can be used to access [SCS portal][3] and [Request Tracker][4].
1. Provide additional information via [IT4I support][a] or email [support\[at\]it4i.cz][b] (**required**, note that without this information, you cannot use IT4I resources):
1. **Full name**
1. **Gender**
1. **Citizenship**
1. **Country of residence**
1. **Organization/affiliation**
1. **Organization/affiliation country**
1. **Organization/affiliation type** (university, company, R&D institution, private/public sector (hospital, police), academy of sciences, etc.)
1. **Job title** (student, PhD student, researcher, research assistant, employee, etc.)
Continue to apply for a project or project membership to access clusters through the [SCS portal][3].
## Logging Into IT4I Services
The table below shows how different IT4I services are accessed:
| Services | Access |
| -------- | ------- |
| Clusters | SSH key |
| IS, RT, web, VPN | e-INFRA CZ login |
| Profile<br>Change&nbsp;password<br>Change&nbsp;SSH&nbsp;key | Academic institution's credentials<br>e-INFRA CZ / eduID |
You can change your [profile settings][5] at any time.
[1]: https://www.e-infra.cz/en
[2]: https://signup.e-infra.cz/fed/registrar/?vo=IT4Innovations
[3]: https://scs.it4i.cz/
[4]: https://support.it4i.cz/
[5]: ../../management/einfracz-profile.md
[6]: https://www.eduid.cz/
[a]: https://support.it4i.cz/rt/
[b]: mailto:support@it4i.cz
# Get Project Membership
!!! note
You need to be named as a collaborator by a Primary Investigator (PI) in order to access and use the clusters.
## Authorization by Web
This is a preferred method if you have an IT4I or e-INFRA CZ account.
Log in to the [IT4I SCS portal][a] and go to the **Authorization Requests** section. Here you can submit your requests for becoming a project member. You will have to wait until the project PI authorizes your request.
## Authorization by Email
An alternative way to become a project member is through a request sent by the project PI via [email][1].
[1]: ../../applying-for-resources/#authorization-by-email-an-alternative-approach
[a]: https://scs.it4i.cz/
# Open OnDemand
[Open OnDemand][1] is an intuitive, innovative, and interactive interface to remote computing resources.
It allows users to access our services from any device and web browser,
resulting in faster and more efficient use of supercomputing resources.
For more information, see the Open OnDemand [documentation][2].
## Access Open OnDemand
To access the OOD service, you must be connected to [IT4I VPN][a].
Then go to [https://ood-karolina.it4i.cz/][3] for Karolina
or [https://ood-barbora.it4i.cz/][4] for Barbora and enter your e-INFRA CZ or IT4I credentials.
From the top menu bar, you can manage your files and jobs, access the cluster's shell
and launch interactive apps on login nodes.
## OOD Apps on IT4I Clusters
!!! note
Barbora OOD offers Mate and XFCE desktops on the login node only. Other applications listed below are exclusive to Karolina OOD.
* Desktops
* Karolina Login Mate
* Karolina Login XFCE
* Gnome Desktop
* GUIs
* Ansys
* Blender
* ParaView
* TorchStudio
* Servers
* Code Server
* Jupyter (+IJulia)
* MATLAB
* TensorBoard
* Simulation
* Code Aster
Depending on the selected application, you can set various properties,
e.g. partition, number of nodes, tasks per node, reservation, etc.
For `qgpu` partitions, you can select the number of GPUs.
![Ansys app in OOD GUI](../../../img/ood-ansys.png)
## Job Composer Tutorial
Under *Jobs > Job Composer*, you can create jobs from several sources.
A simple tutorial will guide you through the process.
To restart the tutorial, click *Help* in the upper right corner.
[1]: https://openondemand.org/
[2]: https://osc.github.io/ood-documentation/latest/
[3]: https://ood-karolina.it4i.cz/
[4]: https://ood-barbora.it4i.cz/
[a]: ../vpn-access.md
# VNC
Virtual Network Computing (VNC) is a graphical desktop-sharing system that uses the Remote Frame Buffer protocol (RFB) to remotely control another computer. It transmits the keyboard and mouse events from one computer to another, relaying the graphical screen updates back in the other direction, over a network.
VNC-based connections are usually faster (require less network bandwidth) than [X11][1] applications forwarded directly through SSH.
The recommended clients are [TightVNC][b] or [TigerVNC][c] (free, open source, available for almost any platform).
## Create VNC Server Password
!!! note
VNC server password should be set before the first login. Use a strong password.
```console
$ vncpasswd
Password:
Verify:
```
## Start VNC Server
!!! note
To access VNC, a remote VNC Server must be started first and a tunnel using SSH port forwarding must be established.
[See below][2] the details on SSH tunnels.
Start by **choosing your display number**.
To choose a free one, check the currently occupied display numbers by listing them with the following command:
```console
$ ps aux | grep Xvnc | sed -rn 's/(\s) .*Xvnc (\:[0-9]+) .*/\1 \2/p'
username :79
username :60
.....
```
As you can see above, displays ":79" and ":60" are already occupied.
Generally, you can choose the display number freely, *except for these occupied numbers*.
Also remember that the display number should be lower than or equal to 99.
Based on these requirements, we have chosen display number 61, as seen in the examples below.
!!! note
Your situation may be different so the choice of your number may differ, as well. **Choose and use your own display number accordingly!**
Start your remote VNC server on the chosen display number (61):
```console
$ vncserver :61 -geometry 1600x900 -depth 16
New 'login2:61 (username)' desktop is login2:61
Starting applications specified in /home/username/.vnc/xstartup
Log file is /home/username/.vnc/login2:61.log
```
Check whether the VNC server is running on the chosen display number (61):
```console
$ vncserver -list
TigerVNC server sessions:
X DISPLAY # PROCESS ID
:61 18437
```
Another way to check it:
```console
$ ps aux | grep Xvnc | sed -rn 's/(\s) .*Xvnc (\:[0-9]+) .*/\1 \2/p'
username :61
username :102
```
!!! note
The VNC server runs on port 59xx, where xx is the display number. To get your port number, simply add 5900 + display number, in our example 5900 + 61 = 5961. Another example for display number 102 is calculation of TCP port 5900 + 102 = 6002, but note that TCP ports above 6000 are often used by X11. **Calculate your own port number and use it instead of 5961 from examples below**.
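For example, you can let the shell do the arithmetic:
```console
$ echo $((5900 + 61))
5961
```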
To access the remote VNC server, you have to create a tunnel between the login node (TCP port 5961) and your local machine (a free TCP port; for simplicity, the very same one) in the next step. See the examples for [Linux/Mac OS][2] and [Windows][3].
!!! note
The tunnel must point to the same login node where you launched the VNC server, e.g. login2. If you use just cluster-name.it4i.cz, the tunnel might point to a different node due to DNS round robin.
## Linux/Mac OS Example of Creating a Tunnel
On your local machine, create the tunnel:
```console
$ ssh -TN -f username@login2.cluster-name.it4i.cz -L 5961:localhost:5961
```
Issue the following command to check that the tunnel is established (note the PID 2022 in the last column; it is required for closing the tunnel):
```console
$ netstat -natp | grep 5961
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 127.0.0.1:5961 0.0.0.0:* LISTEN 2022/ssh
tcp6 0 0 ::1:5961 :::* LISTEN 2022/ssh
```
Or on Mac OS use this command:
```console
$ lsof -n -i4TCP:5961 | grep LISTEN
ssh 75890 sta545 7u IPv4 0xfb062b5c15a56a3b 0t0 TCP 127.0.0.1:5961 (LISTEN)
```
Connect with the VNC client:
```console
$ vncviewer 127.0.0.1:5961
```
In this example, we connect to remote VNC server on port 5961, via the SSH tunnel. The connection is encrypted and secured. The VNC server listening on port 5961 provides screen of 1600x900 pixels.
After you finish your work, you have to close the SSH tunnel, which is still running in the background. Use the following command (PID 2022 in this case; see the netstat command above):
```console
kill 2022
```
!!! note
You can watch the instruction video on how to make a VNC connection between a local Ubuntu desktop and the IT4I cluster [here][e].
## Windows Example of Creating a Tunnel
Start the VNC server using the `vncserver` command described above.
Search for the localhost and port number (in this case 127.0.0.1:5961):
```console
$ netstat -tanp | grep Xvnc
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 127.0.0.1:5961 0.0.0.0:* LISTEN 24031/Xvnc
```
### PuTTY
On the PuTTY Configuration screen, go to _Connection -> SSH -> Tunnels_ to set up the tunnel.
Fill the _Source port_ and _Destination_ fields. **Do not forget to click the _Add_ button**.
![](../../../img/putty-tunnel.png)
### WSL (Bash on Windows)
[Windows Subsystem for Linux][d] is another way to run Linux software in a Windows environment.
At your machine, create the tunnel:
```console
$ ssh username@login2.cluster-name.it4i.cz -L 5961:localhost:5961
```
## Example of Starting VNC Client
Run the VNC client of your choice, select the VNC server 127.0.0.1, port 5961 and connect using the VNC password.
### TigerVNC Viewer
![](../../../img/vncviewer.png)
In this example, we connect to remote the VNC server on port 5961, via the SSH tunnel, using the TigerVNC viewer. The connection is encrypted and secured. The VNC server listening on port 5961 provides a screen of 1600x900 pixels.
### TightVNC Viewer
Use your VNC password to log in using the TightVNC Viewer and start a Gnome session on the login node.
![](../../../img/TightVNC_login.png)
## Gnome Session
After the successful login, you should see the following screen:
![](../../../img/gnome_screen.png)
### Disable Your Gnome Session Screensaver
Open the Screensaver preferences dialog:
![](../../../img/gdmscreensaver.png)
Uncheck both options below the slider:
![](../../../img/gdmdisablescreensaver.png)
### Kill Screensaver if Locked Screen
If the screen gets locked, you have to kill the screensaver. Do not forget to disable the screensaver then.
```console
$ ps aux | grep screen
username 1503 0.0 0.0 103244 892 pts/4 S+ 14:37 0:00 grep screen
username 24316 0.0 0.0 270564 3528 ? Ss 14:12 0:00 gnome-screensaver
[username@login2 .vnc]$ kill 24316
```
## Kill VNC Server After Finished Work
You should kill your VNC server using the command:
```console
$ vncserver -kill :61
Killing Xvnc process ID 7074
Xvnc process ID 7074 already killed
```
or:
```console
$ pkill vnc
```
!!! note
Also, do not forget to terminate the SSH tunnel, if it was used. For details, see the end of [this section][2].
## GUI Applications on Compute Nodes Over VNC
The very same methods as described above may be used to run the GUI applications on compute nodes. However, for maximum performance, follow these steps:
Open a Terminal (_Applications -> System Tools -> Terminal_). Run all the following commands in the terminal.
![](../../../img/gnome-terminal.png)
Allow incoming X11 graphics from the compute nodes at the login node.
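One way to do this, for example, is with the standard `xhost` utility (note that `xhost +` disables X access control entirely, so use it with care):
```console
$ xhost +
```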
Get an interactive session on a compute node (for more detailed info [look here][4]). Forward X11 system using `--x11` option:
```console
$ salloc -A PROJECT_ID -p qcpu --x11
```
Test that the DISPLAY redirection into your VNC session works, by running an X11 application (e.g. XTerm, Intel Advisor, etc.) on the assigned compute node:
```console
$ xterm
```
The example described above:
![](../../../img/node_gui_xwindow.png)
### GUI Over VNC and SSH
For a [better performance][1] an SSH connection can be used.
Open two Terminals (_Applications -> System Tools -> Terminal_) as described before.
Get an interactive session on a compute node (for more detailed info [look here][4]). Forward X11 system using `--x11` option:
```console
$ salloc -A PROJECT_ID -p qcpu --x11
```
In the second terminal connect to the assigned node and run the X11 application
```console
$ ssh -X node_name.barbora.it4i.cz
$ xterm
```
The example described above:
![](../../../img/node_gui_sshx.png)
[b]: http://www.tightvnc.com
[c]: http://sourceforge.net/apps/mediawiki/tigervnc/index.php?title=Main_Page
[d]: http://docs.microsoft.com/en-us/windows/wsl
[e]: https://www.youtube.com/watch?v=b9Ez9UN2uL0
[1]: x-window-system.md
[2]: #linuxmac-os-example-of-creating-a-tunnel
[3]: #windows-example-of-creating-a-tunnel
[4]: ../../job-submission-and-execution.md
# X Window System
The X Window system is a principal way to get GUI access to the clusters. The **X Window System** (commonly known as **X11**, based on its current major version being 11, or shortened to simply **X**, and sometimes informally **X-Windows**) is a computer software system and network protocol that provides a basis for graphical user interfaces (GUIs) and rich input device capability for networked computers.
!!! tip
The X display forwarding must be activated and the X server must be running on the client side.
## X Display
### Linux Example
In order to display the GUI of various software tools, you need to enable the X display forwarding. On Linux and Mac, log in using the `-X` option in the SSH client:
```console
local $ ssh -X username@cluster-name.it4i.cz
```
### PuTTY on Windows
On Windows, use the PuTTY client to enable X11 forwarding. In PuTTY menu, go to _Connection > SSH > X11_ and check the _Enable X11 forwarding_ checkbox before logging in. Then log in as usual.
![](../../../img/cygwinX11forwarding.png)
### WSL (Bash on Windows)
To enable the X display forwarding, log in using the `-X` option in the SSH client:
```console
local $ ssh -X username@cluster-name.it4i.cz
```
!!! tip
If you are getting the "cannot open display" error message, try to export the DISPLAY variable, before attempting to log in:
```console
local $ export DISPLAY=localhost:0.0
```
## X Server
In order to display the GUI of various software tools, you need a running X server on your desktop computer. For Linux users, no action is required as the X server is the default GUI environment on most Linux distributions. Mac and Windows users need to install and run the X server on their workstations.
### X Server on OS X
Mac OS users need to install [XQuartz server][d].
### WSL (Bash on Windows)
To run a Linux GUI on WSL, download, for example, [VcXsrv][a].
1. After installation, run XLaunch and during the initial setup, check the `Disable access control`.
!!! tip
Save the configuration and launch VcXsrv using the `config.xlaunch` file, so you won't have to check the option on every run.
1. Allow VcXsrv in your firewall to communicate on private and public networks.
1. Set the `DISPLAY` environment variable, using the following command:
```console
export DISPLAY="`grep nameserver /etc/resolv.conf | sed 's/nameserver //'`:0"
```
!!! tip
Include the command at the end of the `/etc/bash.bashrc`, so you don't have to run it every time you run WSL.
1. Test the configuration by running `echo $DISPLAY`:
```code
user@nb-user:/$ echo $DISPLAY
172.26.240.1:0
```
### X Server on Windows
There is a variety of X servers available for the Windows environment. The commercial Xwin32 is very stable and feature-rich. The Cygwin environment provides the fully featured open-source XWin X server. For simplicity, we recommend the open-source X server by the [Xming project][e]. For stability and full features, we recommend the [XWin][f] X server by Cygwin.
| How to use Xwin | How to use Xming |
|--- | --- |
| [Install Cygwin][g]. Find and execute XWin.exe to start the X server on Windows desktop computer. | Use Xlaunch to configure Xming. Run Xming to start the X server on a Windows desktop computer. |
## Running GUI Enabled Applications
!!! note
Make sure that X forwarding is activated and the X server is running.
Then launch the application as usual. Use the `&` to run the application in background:
```console
$ ml intel    # note: idb and gvim not installed yet
$ gvim &
```
```console
$ xterm
```
In this example, we activate the Intel programming environment tools and then start the graphical gvim editor.
## GUI Applications on Compute Nodes
Allocate the compute nodes using the `--x11` option on the `salloc` command:
```console
$ salloc -A PROJECT-ID -q qcpu_exp --x11
```
In this example, we allocate one node via the qcpu_exp queue, interactively. We request X11 forwarding with the `--x11` option. It will be possible to run GUI-enabled applications directly on the first compute node.
For **better performance**, log on the allocated compute node via SSH, using the `-X` option.
```console
$ ssh -X cn245
```
In this example, we log on the cn245 compute node, with the X11 forwarding enabled.
## Gnome GUI Environment
The Gnome 2.28 GUI environment is available on the clusters. We recommend using a separate X server window for displaying the Gnome environment.
### Gnome on Linux and OS X
To run the remote Gnome session in a window on a Linux/OS X computer, you need to install Xephyr. The Ubuntu package is
xserver-xephyr; on OS X, it is part of [XQuartz][i]. First, launch Xephyr on the local machine:
```console
local $ Xephyr -ac -screen 1024x768 -br -reset -terminate :1 &
```
This will open a new X window of size 1024x768 at DISPLAY :1. Next, connect via SSH to the cluster with the `DISPLAY` environment variable set and launch a gnome-session:
```console
local $ DISPLAY=:1.0 ssh -XC yourname@cluster-name.it4i.cz -i ~/.ssh/path_to_your_key
... cluster-name MOTD...
yourname@login1.cluster-name.it4i.cz $ gnome-session &
```
On older systems where Xephyr is not available, you may also try Xnest instead of Xephyr. Another option is to launch a new X server in a separate console via:
```console
xinit /usr/bin/ssh -XT -i .ssh/path_to_your_key yourname@cluster-name.it4i.cz gnome-session -- :1 vt12
```
However, this method does not seem to work with recent Linux distributions and you will need to manually source
/etc/profile to properly set environment variables for Slurm.
### Gnome on Windows
Use XLaunch to start the Xming server or run the XWin.exe. Select the "One window" mode.
Log in to the cluster using [PuTTY][2] or [Bash on Windows][3]. On the cluster, run the gnome-session command.
```console
$ gnome-session &
```
This way, we run a remote gnome session on the cluster, displaying it in the local X server.
Use System-Log Out to close the gnome-session.
[1]: #if-no-able-to-forward-x11-using-putty-to-cygwinx
[2]: #putty-on-windows
[3]: #wsl-bash-on-windows
[a]: https://sourceforge.net/projects/vcxsrv/
[d]: https://www.xquartz.org
[e]: http://sourceforge.net/projects/xming/
[f]: http://x.cygwin.com/
[g]: http://x.cygwin.com/
[i]: http://xquartz.macosforge.org/landing/