# Using Xilinx Accelerator Platform
The first step in using the Xilinx accelerators is to initialize the Vitis (compiler) and XRT (runtime) environments:
```console
$ . /tools/Xilinx/Vitis/2023.1/settings64.sh
$ . /opt/xilinx/xrt/setup.sh
```
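To verify that the environment has been initialized, the tools can be queried directly (a quick sanity check; the reported versions depend on the installed release):
```console
$ which v++
$ xbutil --version
```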
## Platform Level Accelerator Management
The current platform can then be examined using `xbutil examine`,
which outputs user-level information about the XRT platform and lists the available devices:
```console
$ xbutil examine
System Configuration
OS Name : Linux
Release : 4.18.0-477.27.1.el8_8.x86_64
Version : #1 SMP Thu Aug 31 10:29:22 EDT 2023
Machine : x86_64
CPU Cores : 64
Memory : 257145 MB
Distribution : Red Hat Enterprise Linux 8.8 (Ootpa)
GLIBC : 2.28
Model : ProLiant XL675d Gen10 Plus
XRT
Version : 2.16.0
Branch : master
Hash : f2524a2fcbbabd969db19abf4d835c24379e390d
Hash Date : 2023-10-11 14:01:19
XOCL : 2.16.0, f2524a2fcbbabd969db19abf4d835c24379e390d
XCLMGMT : 2.16.0, f2524a2fcbbabd969db19abf4d835c24379e390d
Devices present
BDF : Shell Logic UUID Device ID Device Ready*
-------------------------------------------------------------------------------------------------------------------------
[0000:88:00.1] : xilinx_u280_gen3x16_xdma_base_1 283BAB8F-654D-8674-968F-4DA57F7FA5D7 user(inst=132) Yes
[0000:8c:00.1] : xilinx_u280_gen3x16_xdma_base_1 283BAB8F-654D-8674-968F-4DA57F7FA5D7 user(inst=133) Yes
* Devices that are not ready will have reduced functionality when using XRT tools
```
Here, two Xilinx Alveo u280 accelerators (`0000:88:00.1` and `0000:8c:00.1`) are available.
`xbutil` can also be used to query additional information about a specific device using its BDF address:
```console
$ xbutil examine -d "0000:88:00.1"
-------------------------------------------------
[0000:88:00.1] : xilinx_u280_gen3x16_xdma_base_1
-------------------------------------------------
Platform
XSA Name : xilinx_u280_gen3x16_xdma_base_1
Logic UUID : 283BAB8F-654D-8674-968F-4DA57F7FA5D7
FPGA Name :
JTAG ID Code : 0x14b7d093
DDR Size : 0 Bytes
DDR Count : 0
Mig Calibrated : true
P2P Status : disabled
Performance Mode : not supported
P2P IO space required : 64 GB
Clocks
DATA_CLK (Data) : 300 MHz
KERNEL_CLK (Kernel) : 500 MHz
hbm_aclk (System) : 450 MHz
Mac Addresses : 00:0A:35:0E:20:B0
: 00:0A:35:0E:20:B1
Device Status: HEALTHY
Hardware Context ID: 0
Xclbin UUID: 6306D6AE-1D66-AEA7-B15D-446D4ECC53BD
PL Compute Units
Index Name Base Address Usage Status
-------------------------------------------------
0 vadd:vadd_1 0x800000 1 (IDLE)
```
Basic functionality of the device can be checked using `xbutil validate -d <BDF>`:
```console
$ xbutil validate -d "0000:88:00.1"
Validate Device : [0000:88:00.1]
Platform : xilinx_u280_gen3x16_xdma_base_1
SC Version : 4.3.27
Platform ID : 283BAB8F-654D-8674-968F-4DA57F7FA5D7
-------------------------------------------------------------------------------
Test 1 [0000:88:00.1] : aux-connection
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 2 [0000:88:00.1] : pcie-link
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 3 [0000:88:00.1] : sc-version
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 4 [0000:88:00.1] : verify
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 5 [0000:88:00.1] : dma
Details : Buffer size - '16 MB' Memory Tag - 'HBM[0]'
Host -> PCIe -> FPGA write bandwidth = 11988.9 MB/s
Host <- PCIe <- FPGA read bandwidth = 12571.2 MB/s
...
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 6 [0000:88:00.1] : iops
Details : IOPS: 387240(verify)
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 7 [0000:88:00.1] : mem-bw
Details : Throughput (Type: DDR) (Bank count: 2) : 33932.9MB/s
Throughput of Memory Tag: DDR[0] is 16974.1MB/s
Throughput of Memory Tag: DDR[1] is 16974.2MB/s
Throughput (Type: HBM) (Bank count: 1) : 12383.7MB/s
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 8 [0000:88:00.1] : p2p
Test 9 [0000:88:00.1] : vcu
Test 10 [0000:88:00.1] : aie
Test 11 [0000:88:00.1] : ps-aie
Test 12 [0000:88:00.1] : ps-pl-verify
Test 13 [0000:88:00.1] : ps-verify
Test 14 [0000:88:00.1] : ps-iops
```
Finally, the device can be reinitialized using `xbutil reset -d <BDF>`:
```console
$ xbutil reset -d "0000:88:00.1"
Performing 'HOT Reset' on '0000:88:00.1'
Are you sure you wish to proceed? [Y/n]: Y
Successfully reset Device[0000:88:00.1]
```
This can be useful for recovering the device from states such as `HANGING`, as reported by `xbutil examine -d <BDF>`.
## OpenCL Platform Level
The `clinfo` utility can be used to verify that the accelerator is visible to OpenCL:
```console
$ clinfo
Number of platforms: 2
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (3590.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Profile: EMBEDDED_PROFILE
Platform Version: OpenCL 1.0
Platform Name: Xilinx
Platform Vendor: Xilinx
Platform Extensions: cl_khr_icd
<...>
Platform Name: Xilinx
Number of devices: 2
Device Type: CL_DEVICE_TYPE_ACCELERATOR
Vendor ID: 0h
Max compute units: 0
Max work items dimensions: 3
Max work items[0]: 4294967295
Max work items[1]: 4294967295
Max work items[2]: 4294967295
Max work group size: 4294967295
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 0
Max clock frequency: 0Mhz
Address bits: 64
Max memory allocation: 4294967296
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 0
Max size of kernel argument: 2048
Alignment (bits) of base address: 32768
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: No
Round to +ve and infinity: No
IEEE754-2008 fused multiply-add: No
Cache type: None
Cache line size: 64
Cache size: 0
Global memory size: 0
Constant buffer size: 4194304
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 16384
Error correction support: 1
Profiling timer resolution: 1
Device endianess: Little
Available: No
Compiler available: No
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: Yes
Profiling: Yes
Platform ID: 0x16fbae8
Name: xilinx_u280_gen3x16_xdma_base_1
Vendor: Xilinx
Driver version: 1.0
Profile: EMBEDDED_PROFILE
Version: OpenCL 1.0
<...>
```
which shows that both the `Xilinx` platform and the accelerator devices are present.
## Building Applications
To simplify the build process, we define two environment variables: `IT4I_PLATFORM` and `IT4I_BUILD_MODE`.
The first, `IT4I_PLATFORM`, denotes the specific accelerator hardware, such as `Alveo u250` or `Alveo u280`,
and its configuration (stored in `*.xpfm` files).
The list of available platforms can be obtained using the `platforminfo` utility:
```console
$ platforminfo -l
{
"platforms": [
{
"baseName": "xilinx_u280_gen3x16_xdma_1_202211_1",
"version": "202211.1",
"type": "sdaccel",
"dataCenter": "true",
"embedded": "false",
"externalHost": "true",
"serverManaged": "true",
"platformState": "impl",
"usesPR": "true",
"platformFile": "\/opt\/xilinx\/platforms\/xilinx_u280_gen3x16_xdma_1_202211_1\/xilinx_u280_gen3x16_xdma_1_202211_1.xpfm"
},
{
"baseName": "xilinx_u250_gen3x16_xdma_4_1_202210_1",
"version": "202210.1",
"type": "sdaccel",
"dataCenter": "true",
"embedded": "false",
"externalHost": "true",
"serverManaged": "true",
"platformState": "impl",
"usesPR": "true",
"platformFile": "\/opt\/xilinx\/platforms\/xilinx_u250_gen3x16_xdma_4_1_202210_1\/xilinx_u250_gen3x16_xdma_4_1_202210_1.xpfm"
}
]
}
```
Here, `baseName` and potentially `platformFile` are of interest; either can be specified as the value of `IT4I_PLATFORM`.
In this case we have the platforms `xilinx_u280_gen3x16_xdma_1_202211_1` (Alveo u280) and `xilinx_u250_gen3x16_xdma_4_1_202210_1` (Alveo u250).
The `IT4I_BUILD_MODE` variable specifies the build type (`hw`, `hw_emu`, or `sw_emu`):
- `hw` performs full synthesis for the accelerator
- `hw_emu` runs both synthesis and emulation for debugging
- `sw_emu` compiles kernels only for emulation (does not require the accelerator and builds much faster)
For example, to configure the build for the `Alveo u280`, we set:
```console
$ export IT4I_PLATFORM=xilinx_u280_gen3x16_xdma_1_202211_1
```
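Alternatively, the full `platformFile` path reported by `platforminfo` above can be used as the value instead of the base name:
```console
$ export IT4I_PLATFORM=/opt/xilinx/platforms/xilinx_u280_gen3x16_xdma_1_202211_1/xilinx_u280_gen3x16_xdma_1_202211_1.xpfm
```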
### Software Emulation Mode
The software emulation mode is preferable for development, as HLS synthesis is very time consuming. To build the following applications in this mode, we set:
```console
$ export IT4I_BUILD_MODE=sw_emu
```
and run each application with `XCL_EMULATION_MODE` set to `sw_emu`:
```console
$ XCL_EMULATION_MODE=sw_emu <application>
```
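Depending on the XRT version, the emulation run may also expect an emulation configuration file (`emconfig.json`) in the working directory; if it is missing, it can be generated for the selected platform with the Vitis `emconfigutil` tool (an optional step, shown here only as a sketch):
```console
$ emconfigutil --platform $IT4I_PLATFORM --od .
```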
### Hardware Synthesis Mode
!!! note
The HLS of these simple applications **can take up to 2 hours** to finish.
To allow the application to utilize real hardware, we have to synthesize the FPGA design for the accelerator. This is done by repeating the same steps used to build the kernels in emulation mode, but with `IT4I_BUILD_MODE` set to `hw`:
```console
$ export IT4I_BUILD_MODE=hw
```
The host application binary can be reused, but it has to be run without `XCL_EMULATION_MODE`:
```console
$ <application>
```
## Sample Applications
The first two samples illustrate the two main approaches to building FPGA-accelerated applications on the Xilinx platform - **XRT** and **OpenCL**.
The final example combines **HIP** with **XRT** to show the basics necessary to build an application that utilizes both GPU and FPGA accelerators.
### Using HLS and XRT
Applications are typically separated into a host side and an accelerator/kernel side.
The following host-side code should be saved as `host.cpp`:
```c++
/*
# Copyright (C) 2023, Advanced Micro Devices, Inc. All rights reserved.
# SPDX-License-Identifier: X11
*/
#include <iostream>
#include <cstring>
// XRT includes
#include "xrt/xrt_bo.h"
#include <experimental/xrt_xclbin.h>
#include "xrt/xrt_device.h"
#include "xrt/xrt_kernel.h"
#define DATA_SIZE 4096
int main(int argc, char** argv)
{
if(argc != 2)
{
std::cout << "Usage: " << argv[0] << " <XCLBIN File>" << std::endl;
return EXIT_FAILURE;
}
// Read settings
std::string binaryFile = argv[1];
int device_index = 0;
std::cout << "Open the device" << device_index << std::endl;
auto device = xrt::device(device_index);
std::cout << "Load the xclbin " << binaryFile << std::endl;
auto uuid = device.load_xclbin("./vadd.xclbin");
size_t vector_size_bytes = sizeof(int) * DATA_SIZE;
//auto krnl = xrt::kernel(device, uuid, "vadd");
auto krnl = xrt::kernel(device, uuid, "vadd", xrt::kernel::cu_access_mode::exclusive);
std::cout << "Allocate Buffer in Global Memory\n";
auto boIn1 = xrt::bo(device, vector_size_bytes, krnl.group_id(0)); //Match kernel arguments to RTL kernel
auto boIn2 = xrt::bo(device, vector_size_bytes, krnl.group_id(1));
auto boOut = xrt::bo(device, vector_size_bytes, krnl.group_id(2));
// Map the contents of the buffer object into host memory
auto bo0_map = boIn1.map<int*>();
auto bo1_map = boIn2.map<int*>();
auto bo2_map = boOut.map<int*>();
std::fill(bo0_map, bo0_map + DATA_SIZE, 0);
std::fill(bo1_map, bo1_map + DATA_SIZE, 0);
std::fill(bo2_map, bo2_map + DATA_SIZE, 0);
// Create the test data
int bufReference[DATA_SIZE];
for (int i = 0; i < DATA_SIZE; ++i)
{
bo0_map[i] = i;
bo1_map[i] = i;
bufReference[i] = bo0_map[i] + bo1_map[i]; //Generate check data for validation
}
// Synchronize buffer content with device side
std::cout << "synchronize input buffer data to device global memory\n";
boIn1.sync(XCL_BO_SYNC_BO_TO_DEVICE);
boIn2.sync(XCL_BO_SYNC_BO_TO_DEVICE);
std::cout << "Execution of the kernel\n";
auto run = krnl(boIn1, boIn2, boOut, DATA_SIZE); //DATA_SIZE=size
run.wait();
// Get the output;
std::cout << "Get the output data from the device" << std::endl;
boOut.sync(XCL_BO_SYNC_BO_FROM_DEVICE);
// Validate results
if (std::memcmp(bo2_map, bufReference, vector_size_bytes))
throw std::runtime_error("Value read back does not match reference");
std::cout << "TEST PASSED\n";
return 0;
}
```
The host-side code can now be compiled using the GCC toolchain as:
```console
$ g++ host.cpp -I$XILINX_XRT/include -I$XILINX_VIVADO/include -L$XILINX_XRT/lib -lxrt_coreutil -o host
```
The accelerator side (simple vector-add kernel) should be saved as `vadd.cpp`.
```c++
/*
# Copyright (C) 2023, Advanced Micro Devices, Inc. All rights reserved.
# SPDX-License-Identifier: X11
*/
extern "C" {
void vadd(
const unsigned int *in1, // Read-Only Vector 1
const unsigned int *in2, // Read-Only Vector 2
unsigned int *out, // Output Result
int size // Size in integer
)
{
#pragma HLS INTERFACE m_axi port=in1 bundle=aximm1
#pragma HLS INTERFACE m_axi port=in2 bundle=aximm2
#pragma HLS INTERFACE m_axi port=out bundle=aximm1
for(int i = 0; i < size; ++i)
{
out[i] = in1[i] + in2[i];
}
}
}
```
The accelerator-side code is built using Vitis `v++`.
This is a two-step process, which either builds an emulation binary or performs full HLS (depending on the value of the `-t` argument).
The platform (specific accelerator) also has to be specified at this step (both for emulation and full HLS).
```console
$ v++ -c -t $IT4I_BUILD_MODE --platform $IT4I_PLATFORM -k vadd vadd.cpp -o vadd.xo
$ v++ -l -t $IT4I_BUILD_MODE --platform $IT4I_PLATFORM vadd.xo -o vadd.xclbin
```
This process should result in `vadd.xclbin`, which can be loaded by the host-side application.
### Running the Application
With both the host application and the kernel binary at hand, the application can be launched in emulation mode as
```console
$ XCL_EMULATION_MODE=sw_emu ./host vadd.xclbin
```
or on real hardware (having compiled the kernels with `IT4I_BUILD_MODE=hw`):
```console
$ ./host vadd.xclbin
```
### Using HLS and OpenCL
The host-side application code should be saved as `host.cpp`.
This application attempts to find the `Xilinx` OpenCL platform in the system and selects the first device in that platform.
The device is then configured with the provided kernel binary.
Other than that, the only difference from a typical OpenCL vector-add is the use of `enqueueTask(...)` to launch the kernel
(instead of the usual `enqueueNDRangeKernel`).
```c++
#include <iostream>
#include <fstream>
#include <iterator>
#include <vector>
#define CL_HPP_TARGET_OPENCL_VERSION 120
#define CL_HPP_MINIMUM_OPENCL_VERSION 120
#define CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY 1
#define CL_USE_DEPRECATED_OPENCL_1_2_APIS
#include <CL/cl2.hpp>
#include <CL/cl_ext_xilinx.h>
std::vector<unsigned char> read_binary_file(const std::string &filename)
{
std::cout << "INFO: Reading " << filename << std::endl;
std::ifstream file(filename, std::ios::binary);
file.unsetf(std::ios::skipws);
std::streampos file_size;
file.seekg(0, std::ios::end);
file_size = file.tellg();
file.seekg(0, std::ios::beg);
std::vector<unsigned char> data;
data.reserve(file_size);
data.insert(data.begin(),
std::istream_iterator<unsigned char>(file),
std::istream_iterator<unsigned char>());
return data;
}
cl::Device select_device()
{
std::vector<cl::Platform> platforms;
cl::Platform::get(&platforms);
cl::Platform platform;
for(cl::Platform &p: platforms)
{
const std::string name = p.getInfo<CL_PLATFORM_NAME>();
std::cout << "PLATFORM: " << name << std::endl;
if(name == "Xilinx")
{
platform = p;
break;
}
}
if(platform == cl::Platform())
{
std::cout << "Xilinx platform not found!" << std::endl;
exit(EXIT_FAILURE);
}
std::vector<cl::Device> devices;
platform.getDevices(CL_DEVICE_TYPE_ACCELERATOR, &devices);
return devices[0];
}
static const int DATA_SIZE = 1024;
int main(int argc, char *argv[])
{
if(argc != 2)
{
std::cout << "Usage: " << argv[0] << " <XCLBIN File>" << std::endl;
return EXIT_FAILURE;
}
std::string binary_file = argv[1];
std::vector<int> source_a(DATA_SIZE, 10);
std::vector<int> source_b(DATA_SIZE, 32);
auto program_binary = read_binary_file(binary_file);
cl::Program::Binaries bins{{program_binary.data(), program_binary.size()}};
cl::Device device = select_device();
cl::Context context(device, nullptr, nullptr, nullptr);
cl::CommandQueue q(context, device, CL_QUEUE_PROFILING_ENABLE);
cl::Program program(context, {device}, bins, nullptr);
cl::Kernel vadd_kernel = cl::Kernel(program, "vector_add");
cl::Buffer buffer_a(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, source_a.size() * sizeof(int), source_a.data());
cl::Buffer buffer_b(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, source_b.size() * sizeof(int), source_b.data());
cl::Buffer buffer_res(context, CL_MEM_READ_WRITE, source_a.size() * sizeof(int));
int narg = 0;
vadd_kernel.setArg(narg++, buffer_res);
vadd_kernel.setArg(narg++, buffer_a);
vadd_kernel.setArg(narg++, buffer_b);
vadd_kernel.setArg(narg++, DATA_SIZE);
q.enqueueTask(vadd_kernel);
std::vector<int> result(DATA_SIZE, 0);
q.enqueueReadBuffer(buffer_res, CL_TRUE, 0, result.size() * sizeof(int), result.data());
int mismatch_count = 0;
for(size_t i = 0; i < DATA_SIZE; ++i)
{
int host_result = source_a[i] + source_b[i];
if(result[i] != host_result)
{
mismatch_count++;
std::cout << "ERROR: " << result[i] << " != " << host_result << std::endl;
break;
}
}
std::cout << "RESULT: " << (mismatch_count == 0 ? "PASSED" : "FAILED") << std::endl;
return 0;
}
```
The host-side code can now be compiled using the GCC toolchain as:
```console
$ g++ host.cpp -I$XILINX_XRT/include -I$XILINX_VIVADO/include -lOpenCL -o host
```
The accelerator side (simple vector-add kernel) should be saved as `vadd.cl`.
```c++
#define BUFFER_SIZE 256
#define DATA_SIZE 1024
// TRIPCOUNT identifier
__constant uint c_len = DATA_SIZE / BUFFER_SIZE;
__constant uint c_size = BUFFER_SIZE;
__attribute__((reqd_work_group_size(1, 1, 1)))
__kernel void vector_add(__global int* c,
__global const int* a,
__global const int* b,
const int n_elements)
{
int arrayA[BUFFER_SIZE];
int arrayB[BUFFER_SIZE];
__attribute__((xcl_loop_tripcount(c_len, c_len)))
for (int i = 0; i < n_elements; i += BUFFER_SIZE)
{
int size = BUFFER_SIZE;
if(i + size > n_elements)
size = n_elements - i;
__attribute__((xcl_loop_tripcount(c_size, c_size)))
__attribute__((xcl_pipeline_loop(1))) readA:
for(int j = 0; j < size; j++)
arrayA[j] = a[i + j];
__attribute__((xcl_loop_tripcount(c_size, c_size)))
__attribute__((xcl_pipeline_loop(1))) readB:
for(int j = 0; j < size; j++)
arrayB[j] = b[i + j];
__attribute__((xcl_loop_tripcount(c_size, c_size)))
__attribute__((xcl_pipeline_loop(1))) vadd_writeC:
for(int j = 0; j < size; j++)
c[i + j] = arrayA[j] + arrayB[j];
}
}
```
The accelerator-side code is built using Vitis `v++`.
This is a three-step process, which either builds an emulation binary or performs full HLS (depending on the value of the `-t` argument).
The platform (specific accelerator) also has to be specified at this step (both for emulation and full HLS).
```console
$ v++ -c -t $IT4I_BUILD_MODE --platform $IT4I_PLATFORM -k vector_add -o vadd.xo vadd.cl
$ v++ -l -t $IT4I_BUILD_MODE --platform $IT4I_PLATFORM -o vadd.link.xclbin vadd.xo
$ v++ -p vadd.link.xclbin -t $IT4I_BUILD_MODE --platform $IT4I_PLATFORM -o vadd.xclbin
```
This process should result in `vadd.xclbin`, which can be loaded by the host-side application.
### Running the Application
With both the host application and the kernel binary at hand, the application can be launched in emulation mode as
```console
$ XCL_EMULATION_MODE=sw_emu ./host vadd.xclbin
```
or on real hardware (having compiled the kernels with `IT4I_BUILD_MODE=hw`):
```console
$ ./host vadd.xclbin
```
### Hybrid GPU and FPGA Application (HIP+XRT)
This simple 8-bit quantized dot product (`R = sum(X[i]*Y[i])`) example illustrates a basic approach to utilizing both GPU and FPGA accelerators in a single application.
The application takes the simplest approach, where both synchronization and data transfers are handled explicitly by the host.
The HIP toolchain is used to compile the single-source host/GPU code as usual, but it is also linked with the XRT runtime, which allows the host to control the FPGA accelerator.
The FPGA kernels are built separately, as in the previous examples.
The host/GPU HIP code should be saved as `main.hip`:
```c++
#include <iostream>
#include <vector>
#include "xrt/xrt_bo.h"
#include "experimental/xrt_xclbin.h"
#include "xrt/xrt_device.h"
#include "xrt/xrt_kernel.h"
#include "hip/hip_runtime.h"
const size_t DATA_SIZE = 1024;
float compute_reference(const float *srcX, const float *srcY, size_t count);
__global__ void quantize(int8_t *out, const float *in, size_t count)
{
size_t idx = blockIdx.x * blockDim.x + threadIdx.x;
for(size_t i = idx; i < count; i += blockDim.x * gridDim.x)
out[i] = int8_t(in[i] * 127);
}
__global__ void dequantize(float *out, const int16_t *in, size_t count)
{
size_t idx = blockIdx.x * blockDim.x + threadIdx.x;
for(size_t i = idx; i < count; i += blockDim.x * gridDim.x)
out[i] = float(in[i] / float(127*127));
}
int main(int argc, char *argv[])
{
if(argc != 2)
{
std::cout << "Usage: " << argv[0] << " <XCLBIN File>" << std::endl;
return EXIT_FAILURE;
}
// Prepare experiment data
std::vector<float> srcX(DATA_SIZE);
std::vector<float> srcY(DATA_SIZE);
float outR = 0.0f;
for(size_t i = 0; i < DATA_SIZE; ++i)
{
srcX[i] = float(rand()) / float(RAND_MAX);
srcY[i] = float(rand()) / float(RAND_MAX);
outR += srcX[i] * srcY[i];
}
float outR_quant = compute_reference(srcX.data(), srcY.data(), DATA_SIZE);
std::cout << "REFERENCE: " << outR_quant << " (" << outR << ")" << std::endl;
// Initialize XRT (FPGA device), load kernels binary and create kernel object
xrt::device device(0);
std::cout << "Loading xclbin file " << argv[1] << std::endl;
xrt::uuid xclbinId = device.load_xclbin(argv[1]);
xrt::kernel mulKernel(device, xclbinId, "multiply", xrt::kernel::cu_access_mode::exclusive);
// Allocate GPU buffers
float *srcX_gpu, *srcY_gpu, *res_gpu;
int8_t *srcX_gpu_quant, *srcY_gpu_quant;
int16_t *res_gpu_quant;
hipMalloc(&srcX_gpu, DATA_SIZE * sizeof(float));
hipMalloc(&srcY_gpu, DATA_SIZE * sizeof(float));
hipMalloc(&res_gpu, DATA_SIZE * sizeof(float));
hipMalloc(&srcX_gpu_quant, DATA_SIZE * sizeof(int8_t));
hipMalloc(&srcY_gpu_quant, DATA_SIZE * sizeof(int8_t));
hipMalloc(&res_gpu_quant, DATA_SIZE * sizeof(int16_t));
// Allocate FPGA buffers
xrt::bo srcX_fpga_quant(device, DATA_SIZE * sizeof(int8_t), mulKernel.group_id(0));
xrt::bo srcY_fpga_quant(device, DATA_SIZE * sizeof(int8_t), mulKernel.group_id(1));
xrt::bo res_fpga_quant(device, DATA_SIZE * sizeof(int16_t), mulKernel.group_id(2));
// Copy experiment data from HOST to GPU
hipMemcpy(srcX_gpu, srcX.data(), DATA_SIZE * sizeof(float), hipMemcpyHostToDevice);
hipMemcpy(srcY_gpu, srcY.data(), DATA_SIZE * sizeof(float), hipMemcpyHostToDevice);
// Execute quantization kernels on both input vectors
quantize<<<16, 256>>>(srcX_gpu_quant, srcX_gpu, DATA_SIZE);
quantize<<<16, 256>>>(srcY_gpu_quant, srcY_gpu, DATA_SIZE);
// Map FPGA buffers into HOST memory, copy data from GPU to these mapped buffers and synchronize them into FPGA memory
hipMemcpy(srcX_fpga_quant.map<int8_t *>(), srcX_gpu_quant, DATA_SIZE * sizeof(int8_t), hipMemcpyDeviceToHost);
srcX_fpga_quant.sync(XCL_BO_SYNC_BO_TO_DEVICE);
hipMemcpy(srcY_fpga_quant.map<int8_t *>(), srcY_gpu_quant, DATA_SIZE * sizeof(int8_t), hipMemcpyDeviceToHost);
srcY_fpga_quant.sync(XCL_BO_SYNC_BO_TO_DEVICE);
// Execute FPGA kernel (8-bit integer multiplication)
auto kernelRun = mulKernel(res_fpga_quant, srcX_fpga_quant, srcY_fpga_quant, DATA_SIZE);
kernelRun.wait();
// Synchronize output FPGA buffer back to HOST and copy its contents to GPU buffer for dequantization
res_fpga_quant.sync(XCL_BO_SYNC_BO_FROM_DEVICE);
hipMemcpy(res_gpu_quant, res_fpga_quant.map<int16_t *>(), DATA_SIZE * sizeof(int16_t), hipMemcpyHostToDevice);
// Dequantize multiplication result on GPU
dequantize<<<16, 256>>>(res_gpu, res_gpu_quant, DATA_SIZE);
// Copy dequantized results from GPU to HOST
std::vector<float> res(DATA_SIZE);
hipMemcpy(res.data(), res_gpu, DATA_SIZE * sizeof(float), hipMemcpyDeviceToHost);
// Perform simple sum on CPU
float out = 0.0;
for(size_t i = 0; i < DATA_SIZE; ++i)
out += res[i];
std::cout << "RESULT: " << out << std::endl;
hipFree(srcX_gpu);
hipFree(srcY_gpu);
hipFree(res_gpu);
hipFree(srcX_gpu_quant);
hipFree(srcY_gpu_quant);
hipFree(res_gpu_quant);
return 0;
}
float compute_reference(const float *srcX, const float *srcY, size_t count)
{
float out = 0.0f;
for(size_t i = 0; i < count; ++i)
{
int16_t quantX(srcX[i] * 127);
int16_t quantY(srcY[i] * 127);
out += float(int16_t(quantX * quantY) / float(127*127));
}
return out;
}
```
The host/GPU application can be built using HIPCC as:
```console
$ hipcc -I$XILINX_XRT/include -I$XILINX_VIVADO/include -L$XILINX_XRT/lib -lxrt_coreutil main.hip -o host
```
The accelerator side (simple vector-multiply kernel) should be saved as `kernels.cpp`.
```c++
extern "C" {
void multiply(
short *out,
const char *inX,
const char *inY,
int size)
{
#pragma HLS INTERFACE m_axi port=inX bundle=aximm1
#pragma HLS INTERFACE m_axi port=inY bundle=aximm2
#pragma HLS INTERFACE m_axi port=out bundle=aximm1
for(int i = 0; i < size; ++i)
out[i] = short(inX[i]) * short(inY[i]);
}
}
```
Once again, the HLS kernel is built using Vitis `v++` in two steps:
```console
$ v++ -c -t $IT4I_BUILD_MODE --platform $IT4I_PLATFORM -k multiply kernels.cpp -o kernels.xo
$ v++ -l -t $IT4I_BUILD_MODE --platform $IT4I_PLATFORM kernels.xo -o kernels.xclbin
```
### Running the Application
In emulation mode (FPGA emulation; GPU hardware is required), the application can be launched as:
```console
$ XCL_EMULATION_MODE=sw_emu ./host kernels.xclbin
REFERENCE: 256.554 (260.714)
Loading xclbin file ./kernels.xclbin
RESULT: 256.554
```
or, having compiled the kernels with `IT4I_BUILD_MODE=hw`, on real hardware (both FPGA and GPU hardware are required):
```console
$ ./host kernels.xclbin
REFERENCE: 256.554 (260.714)
Loading xclbin file ./kernels.xclbin
RESULT: 256.554
```
## Additional Resources
- [https://xilinx.github.io/Vitis-Tutorials/][1]
- [http://xilinx.github.io/Vitis_Accel_Examples/][2]
[1]: https://xilinx.github.io/Vitis-Tutorials/
[2]: http://xilinx.github.io/Vitis_Accel_Examples/
# Complementary Systems
Complementary systems offer a development environment for users
who need to port and optimize their code and applications
for various hardware architectures and software technologies
that are not available on standard clusters.
## Complementary Systems 1
The first stage of the complementary systems implementation comprises the following partitions:
- compute partition 0 – based on ARM technology - legacy
- compute partition 1 – based on ARM technology - A64FX
- compute partition 2 – based on Intel technologies - Ice Lake, NVDIMMs + Bitware FPGAs
- compute partition 3 – based on AMD technologies - Milan, MI100 GPUs + Xilinx FPGAs
- compute partition 4 – reflecting Edge type of servers
- partition 5 – FPGA synthesis server
![](../img/cs1_1.png)
## Complementary Systems 2
The second stage of the complementary systems implementation comprises the following partitions:
- compute partition 6 - based on ARM technology + CUDA programmable GPGPU accelerators on the Ampere architecture + DPU network processing units
- compute partition 7 - based on IBM Power10 architecture
- compute partition 8 - modern CPU with a very high L3 cache capacity (over 750MB)
- compute partition 9 - virtual GPU accelerated workstations
- compute partition 10 - Sapphire Rapids-HBM server
- compute partition 11 - NVIDIA Grace CPU Superchip
![](../img/cs2_2.png)
## Modules and Architecture Availability
Complementary systems list available modules automatically based on the detected architecture.
However, you can load one of the three modules -- `aarch64`, `avx2`, and `avx512` --
to reload the list of modules available for the respective architecture:
```console
[user@login.cs ~]$ ml architecture/aarch64
aarch64 modules + all modules
[user@login.cs ~]$ ml architecture/avx2
avx2 modules + all modules
[user@login.cs ~]$ ml architecture/avx512
avx512 modules + all modules
```
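After loading an architecture module, the refreshed module list can be inspected with the standard Lmod command (shown here only as a quick check):
```console
[user@login.cs ~]$ ml av
```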
# Complementary System Job Scheduling
## Introduction
[Slurm][1] workload manager is used to allocate and access Complementary systems resources.
## Getting Partition Information
Display partitions/queues
```console
$ sinfo -s
PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST
p00-arm up 1-00:00:00 0/1/0/1 p00-arm01
p01-arm* up 1-00:00:00 0/8/0/8 p01-arm[01-08]
p02-intel up 1-00:00:00 0/2/0/2 p02-intel[01-02]
p03-amd up 1-00:00:00 0/2/0/2 p03-amd[01-02]
p04-edge up 1-00:00:00 0/1/0/1 p04-edge01
p05-synt up 1-00:00:00 0/1/0/1 p05-synt01
p06-arm up 1-00:00:00 0/2/0/2 p06-arm[01-02]
p07-power up 1-00:00:00 0/1/0/1 p07-power01
p08-amd up 1-00:00:00 0/1/0/1 p08-amd01
p10-intel up 1-00:00:00 0/1/0/1 p10-intel01
```
## Getting Job Information
Show jobs
```console
$ squeue --me
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
104 p01-arm interact user R 1:48 2 p01-arm[01-02]
```
Show job details for a specific job
```console
$ scontrol -d show job JOBID
```
Show job details for the currently executing job from within the job session
```console
$ scontrol -d show job $SLURM_JOBID
```
## Running Interactive Jobs
Run an interactive job
```console
$ salloc -A PROJECT-ID -p p01-arm
```
Run an interactive job with X11 forwarding
```console
$ salloc -A PROJECT-ID -p p01-arm --x11
```
!!! warning
    Do not use `srun` to initiate interactive jobs; subsequent `srun` and `mpirun` invocations would block forever.
## Running Batch Jobs
Run a batch job
```console
$ sbatch -A PROJECT-ID -p p01-arm ./script.sh
```
Useful command options (`salloc`, `sbatch`, `srun`); see also the example job script below:
* -n, --ntasks
* -c, --cpus-per-task
* -N, --nodes
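A minimal job script for `sbatch` might look as follows (a sketch only; the account, partition, and commands are placeholders to adapt to your project):
```console
$ cat script.sh
#!/bin/bash
#SBATCH --account=PROJECT-ID
#SBATCH --partition=p01-arm
#SBATCH --nodes=1
#SBATCH --time=01:00:00

set | grep ^SLURM   # print the Slurm environment provided to the job
srun hostname       # run a task on the allocated node(s)
$ sbatch script.sh
```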
## Slurm Job Environment Variables
Slurm provides useful information to the job via environment variables. The environment variables are available on all nodes allocated to the job when accessed via Slurm-supported means (`srun`, compatible `mpirun`).
See all Slurm variables
```
set | grep ^SLURM
```
### Useful Variables
| variable name | description | example |
| ------ | ------ | ------ |
| SLURM_JOB_ID | job id of the executing job| 593 |
| SLURM_JOB_NODELIST | nodes allocated to the job | p03-amd[01-02] |
| SLURM_JOB_NUM_NODES | number of nodes allocated to the job | 2 |
| SLURM_STEP_NODELIST | nodes allocated to the job step | p03-amd01 |
| SLURM_STEP_NUM_NODES | number of nodes allocated to the job step | 1 |
| SLURM_JOB_PARTITION | name of the partition | p03-amd |
| SLURM_SUBMIT_DIR | submit directory | /scratch/project/open-xx-yy/work |
See [Slurm srun documentation][2] for details.
Get job nodelist
```
$ echo $SLURM_JOB_NODELIST
p03-amd[01-02]
```
Expand the nodelist to a list of nodes.
```
$ scontrol show hostnames $SLURM_JOB_NODELIST
p03-amd01
p03-amd02
```
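The expanded list can then be consumed by scripts, for example to iterate over the allocated nodes (a simple sketch based on the nodelist above):
```
$ for host in $(scontrol show hostnames $SLURM_JOB_NODELIST); do echo "node: $host"; done
node: p03-amd01
node: p03-amd02
```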
## Modifying Jobs
```
$ scontrol update JobId=JOBID ATTR=VALUE
```
For example:
```
$ scontrol update JobId=JOBID Comment='The best job ever'
```
## Deleting Jobs
```
$ scancel JOBID
```
## Partitions
| PARTITION | nodes | whole node | cores per node | features |
| --------- | ----- | ---------- | -------------- | -------- |
| p00-arm | 1 | yes | 64 | aarch64,cortex-a72 |
| p01-arm | 8 | yes | 48 | aarch64,a64fx,ib |
| p02-intel | 2 | no | 64 | x86_64,intel,icelake,ib,fpga,bitware,nvdimm |
| p03-amd | 2 | no | 64 | x86_64,amd,milan,ib,gpu,mi100,fpga,xilinx |
| p04-edge | 1 | yes | 16 | x86_64,intel,broadwell,ib |
| p05-synt | 1 | yes | 8 | x86_64,amd,milan,ib,ht |
| p06-arm | 2 | yes | 80 | aarch64,ib |
| p07-power | 1 | yes | 192 | ppc64le,ib |
| p08-amd | 1 | yes | 128 | x86_64,amd,milan-x,ib,ht |
| p10-intel | 1 | yes | 96 | x86_64,intel,sapphire_rapids,ht|
Use the `-t`/`--time` option to specify the job run time limit. The default job time limit is 2 hours; the maximum job time limit is 24 hours.
FIFO scheduling with backfilling is employed.
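For example, to request an interactive allocation with an eight-hour time limit (a sketch reusing the partition and project placeholders above):
```console
$ salloc -A PROJECT-ID -p p01-arm -t 08:00:00
```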
## Partition 00 - ARM (Cortex-A72)
Whole node allocation.
One node:
```console
salloc -A PROJECT-ID -p p00-arm
```
## Partition 01 - ARM (A64FX)
Whole node allocation.
One node:
```console
salloc -A PROJECT-ID -p p01-arm
```
```console
salloc -A PROJECT-ID -p p01-arm -N 1
```
Multiple nodes:
```console
salloc -A PROJECT-ID -p p01-arm -N 8
```
## Partition 02 - Intel (Ice Lake, NVDIMMs + Bitware FPGAs)
FPGAs are treated as resources. See below for more details about resources.
Partial allocation - per FPGA, resource separation is not enforced.
Use only FPGAs allocated to the job!
One FPGA:
```console
salloc -A PROJECT-ID -p p02-intel --gres=fpga
```
Two FPGAs on the same node:
```console
salloc -A PROJECT-ID -p p02-intel --gres=fpga:2
```
All FPGAs:
```console
salloc -A PROJECT-ID -p p02-intel -N 2 --gres=fpga:2
```
## Partition 03 - AMD (Milan, MI100 GPUs + Xilinx FPGAs)
GPUs and FPGAs are treated as resources. See below for more details about resources.
Partial allocation - per GPU and per FPGA, resource separation is not enforced.
Use only GPUs and FPGAs allocated to the job!
One GPU:
```console
salloc -A PROJECT-ID -p p03-amd --gres=gpu
```
Two GPUs on the same node:
```console
salloc -A PROJECT-ID -p p03-amd --gres=gpu:2
```
Four GPUs on the same node:
```console
salloc -A PROJECT-ID -p p03-amd --gres=gpu:4
```
All GPUs:
```console
salloc -A PROJECT-ID -p p03-amd -N 2 --gres=gpu:4
```
One FPGA:
```console
salloc -A PROJECT-ID -p p03-amd --gres=fpga
```
Two FPGAs:
```console
salloc -A PROJECT-ID -p p03-amd --gres=fpga:2
```
All FPGAs:
```console
salloc -A PROJECT-ID -p p03-amd -N 2 --gres=fpga:2
```
One GPU and one FPGA on the same node:
```console
salloc -A PROJECT-ID -p p03-amd --gres=gpu,fpga
```
Four GPUs and two FPGAs on the same node:
```console
salloc -A PROJECT-ID -p p03-amd --gres=gpu:4,fpga:2
```
All GPUs and FPGAs:
```console
salloc -A PROJECT-ID -p p03-amd -N 2 --gres=gpu:4,fpga:2
```
## Partition 04 - Edge Server
Whole node allocation:
```console
salloc -A PROJECT-ID -p p04-edge
```
## Partition 05 - FPGA Synthesis Server
Whole node allocation:
```console
salloc -A PROJECT-ID -p p05-synt
```
## Partition 06 - ARM
Whole node allocation:
```console
salloc -A PROJECT-ID -p p06-arm
```
## Partition 07 - IBM Power
Whole node allocation:
```console
salloc -A PROJECT-ID -p p07-power
```
## Partition 08 - AMD Milan-X
Whole node allocation:
```console
salloc -A PROJECT-ID -p p08-amd
```
## Partition 10 - Intel Sapphire Rapids
Whole node allocation:
```console
salloc -A PROJECT-ID -p p10-intel
```
## Features
Nodes have feature tags assigned to them.
Users can select nodes based on the feature tags using the `--constraint` option.
| Feature | Description |
| ------ | ------ |
| aarch64 | platform |
| x86_64 | platform |
| ppc64le | platform |
| amd | manufacturer |
| intel | manufacturer |
| icelake | processor family |
| broadwell | processor family |
| sapphire_rapids | processor family |
| milan | processor family |
| milan-x | processor family |
| ib | Infiniband |
| gpu | equipped with GPU |
| fpga | equipped with FPGA |
| nvdimm | equipped with NVDIMMs |
| ht | Hyperthreading enabled |
| noht | Hyperthreading disabled |
```
$ sinfo -o '%16N %f'
NODELIST AVAIL_FEATURES
p00-arm01 aarch64,cortex-a72
p01-arm[01-08] aarch64,a64fx,ib
p02-intel01 x86_64,intel,icelake,ib,fpga,bitware,nvdimm,ht
p02-intel02 x86_64,intel,icelake,ib,fpga,bitware,nvdimm,noht
p03-amd02 x86_64,amd,milan,ib,gpu,mi100,fpga,xilinx,noht
p03-amd01 x86_64,amd,milan,ib,gpu,mi100,fpga,xilinx,ht
p04-edge01 x86_64,intel,broadwell,ib,ht
p05-synt01 x86_64,amd,milan,ib,ht
p06-arm[01-02] aarch64,ib
p07-power01 ppc64le,ib
p08-amd01 x86_64,amd,milan-x,ib,ht
p10-intel01 x86_64,intel,sapphire_rapids,ht
```
```
$ salloc -A PROJECT-ID -p p02-intel --constraint noht
```
```
$ scontrol -d show node p02-intel02 | grep ActiveFeatures
ActiveFeatures=x86_64,intel,icelake,ib,fpga,bitware,nvdimm,noht
```
## Resources, GRES
Slurm supports the ability to define and schedule arbitrary resources - Generic RESources (GRES) in Slurm's terminology. We use GRES for scheduling/allocating GPUs and FPGAs.
!!! warning
    Use only allocated GPUs and FPGAs. Resource separation is not enforced. If you use non-allocated resources, you can observe strange behavior and get into trouble.
### Node Resources
Get information about GRES on a node.
```
$ scontrol -d show node p02-intel01 | grep Gres=
Gres=fpga:bitware_520n_mx:2
$ scontrol -d show node p02-intel02 | grep Gres=
Gres=fpga:bitware_520n_mx:2
$ scontrol -d show node p03-amd01 | grep Gres=
Gres=gpu:amd_mi100:4,fpga:xilinx_alveo_u250:2
$ scontrol -d show node p03-amd02 | grep Gres=
Gres=gpu:amd_mi100:4,fpga:xilinx_alveo_u280:2
```
### Request Resources
To allocate the required resources (GPUs or FPGAs), use the `--gres` option of `salloc`/`srun`.
Example: Allocate one FPGA
```
$ salloc -A PROJECT-ID -p p03-amd --gres fpga:1
```
### Find Out Allocated Resources
Information about allocated resources is available in the Slurm job details, in the `JOB_GRES` and `GRES` attributes.
```
$ scontrol -d show job $SLURM_JOBID |grep GRES=
JOB_GRES=fpga:xilinx_alveo_u250:1
Nodes=p03-amd01 CPU_IDs=0-1 Mem=0 GRES=fpga:xilinx_alveo_u250:1(IDX:0)
```
The IDX value in the GRES attribute specifies the index(es) of the FPGA(s) (or GPUs) allocated to the job on the node. In the given example, the allocated resource is `fpga:xilinx_alveo_u250:1(IDX:0)`, so we should use the FPGA with index 0 on node p03-amd01.
### Request Specific Resources
It is possible to allocate specific resources. This is useful for the p03-amd partition, which is equipped with FPGAs of different types.
The GRES entry uses the format `name[[:type]:count]`; in the following example, the name is `fpga`, the type is `xilinx_alveo_u280`, and the count is 2.
```
$ salloc -A PROJECT-ID -p p03-amd --gres=fpga:xilinx_alveo_u280:2
salloc: Granted job allocation XXX
salloc: Waiting for resource configuration
salloc: Nodes p03-amd02 are ready for job
$ scontrol -d show job $SLURM_JOBID | grep -i gres
JOB_GRES=fpga:xilinx_alveo_u280:2
Nodes=p03-amd02 CPU_IDs=0 Mem=0 GRES=fpga:xilinx_alveo_u280(IDX:0-1)
TresPerNode=gres:fpga:xilinx_alveo_u280:2
```
[1]: https://slurm.schedmd.com/
[2]: https://slurm.schedmd.com/srun.html#SECTION_OUTPUT-ENVIRONMENT-VARIABLES
# Complementary Systems Specifications
Below are the technical specifications of individual Complementary systems.
## Partition 0 - ARM (Cortex-A72)
The partition is based on the [ARMv8-A 64-bit][4] architecture.
- Cortex-A72
- ARMv8-A 64-bit
- 2x 32 cores @ 2 GHz
- 255 GB memory
- disk capacity 3.7 TB
- 1x Infiniband FDR 56 Gb/s
## Partition 1 - ARM (A64FX)
The partition is based on the Armv8.2-A architecture
with SVE extension of instruction set and
consists of 8 compute nodes with the following per-node parameters:
- 1x Fujitsu A64FX CPU
- Arm v8.2-A ISA CPU with Scalable Vector Extension (SVE) extension
- 48 cores at 2.0 GHz
- 32 GB of HBM2 memory
- 400 GB SSD (m.2 form factor) – mixed-use type
- 1x Infiniband HDR100 interface
- connected via 16x PCI-e Gen3 slot to the CPU
## Partition 2 - Intel (Ice Lake, NVDIMMs) <!--- + Bitware FPGAs) -->
The partition is based on the Intel Ice Lake x86 architecture.
It contains two servers with Intel NVDIMM memories.
<!--- The key technologies installed are Intel NVDIMM memories. and Intel FPGA accelerators.
The partition contains two servers each with two FPGA accelerators. -->
Each server has the following parameters:
- 2x 3rd Gen Xeon Scalable Processors Intel Xeon Gold 6338 CPU
- 32-cores @ 2.00GHz
- 16x 16GB RAM with ECC
- DDR4-3200
- 1x Infiniband HDR100 interface
- connected to CPU 8x PCI-e Gen4 interface
- 3.2 TB NVMe local storage – mixed use type
<!---
2x FPGA accelerators
Bitware [520N-MX][1]
-->
In addition, the servers have the following parameters:
- Intel server 1 – low NVDIMM memory server with 2304 GB NVDIMM memory
- 16x 128GB NVDIMM persistent memory modules
- Intel server 2 – high NVDIMM memory server with 8448 GB NVDIMM memory
- 16x 512GB NVDIMM persistent memory modules
Software installed on the partition:
FPGA boards support application development using the following design flows:
- OpenCL
- High-Level Synthesis (C/C++) including support for OneAPI
- Verilog and VHDL
## Partition 3 - AMD (Milan, MI100 GPUs + Xilinx FPGAs)
The partition is based on two servers equipped with AMD Milan x86 CPUs,
AMD GPUs and Xilinx FPGAs architectures and represents an alternative
to the Intel-based partition's ecosystem.
Each server has the following parameters:
- 2x AMD Milan 7513 CPU
- 32 cores @ 2.6 GHz
- 16x 16GB RAM with ECC
- DDR4-3200
- 4x AMD GPU accelerators MI 100
- Interconnected with AMD Infinity Fabric™ Link for fast GPU to GPU communication
- 1x 100 GBps Infiniband HDR100
- connected to CPU via 8x PCI-e Gen4 interface
- 3.2 TB NVMe local storage – mixed use
In addition:
- AMD server 1 has 2x FPGA [Xilinx Alveo U250 Data Center Accelerator Card][2]
- AMD server 2 has 2x FPGA [Xilinx Alveo U280 Data Center Accelerator Card][3]
Software installed on the partition:
FPGA boards support application development using the following design flows:
- OpenCL
- High-Level Synthesis (C/C++)
- Verilog and VHDL
- developer tools and libraries for AMD GPUs.
## Partition 4 - Edge Server
The partition provides an overview of the so-called edge computing class of resources,
with solutions powerful enough to provide data analytics capabilities (both CPU and GPU)
in a form factor that does not require a data center to operate.
The partition consists of one edge computing server with the following parameters:
- 1x x86_64 CPU Intel Xeon D-1587
- TDP 65 W,
- 16 cores,
- 435 GFlop/s theoretical max performance in double precision
- 1x CUDA programmable GPU NVIDIA Tesla T4
- TDP 70W
- theoretical performance 8.1 TFlop/s in FP32
- 128 GB RAM
- 1.92TB SSD storage
- connectivity:
- 2x 10 Gbps Ethernet,
- WiFi 802.11 ac,
- LTE connectivity
## Partition 5 - FPGA Synthesis Server
FPGA design tools usually run for several hours up to one day to generate a final bitstream (logic design) of large FPGA chips. These tools are usually sequential, so a dedicated server for this task is part of the system.
This server is used by the development tools needed for the FPGA boards installed in compute partitions 2 and 3.
- AMD EPYC 72F3, 8 cores @ 3.7 GHz nominal frequency
- 8 memory channels with ECC
- 128 GB of DDR4-3200 memory with ECC
- memory is fully populated to maximize memory subsystem performance
- 1x 10Gb Ethernet port used for connection to LAN
- NVMe local storage
- 2x NVMe disks 3.2TB, configured RAID 1
## Partition 6 - ARM + CUDA GPGPU (Ampere) + DPU
This partition is based on the ARM architecture and is equipped with CUDA programmable GPGPU accelerators
based on the Ampere architecture and with DPU network processing units.
The partition consists of two nodes with the following per-node parameters:
- Server Gigabyte G242-P36, Ampere Altra Q80-30 (80c, 3.0GHz)
- 512GB DIMM DDR4, 3200MHz, ECC, CL22
- 2x Micron 7400 PRO 1920GB NVMe M.2 Non-SED Enterprise SSD
- 2x NVIDIA A30 GPU Accelerator
- 2x NVIDIA BlueField-2 E-Series DPU 25GbE Dual-Port SFP56, PCIe Gen4 x16, 16GB DDR + 64, 200Gb Ethernet
- Mellanox ConnectX-5 EN network interface card, 10/25GbE dual-port SFP28, PCIe3.0 x8
- Mellanox ConnectX-6 VPI adapter card, 100Gb/s (HDR100, EDR IB and 100GbE), single-port QSFP56
## Partition 7 - IBM
The IBM Power10 server is a single-node partition with the following parameters:
- Server IBM POWER S1022
- 2x Power10 12-CORE TYPICAL 2.90 TO 4.0 GHZ (MAX) PO
- 512GB DDIMMS, 3200 MHZ, 8GBIT DDR4
- 2x ENTERPRISE 1.6 TB SSD PCIE4 NVME U.2 MOD
- 2x ENTERPRISE 6.4 TB SSD PCIE4 NVME U.2 MOD
- PCIE3 LP 2-PORT 25/10GB NIC&ROCE SR/CU A
## Partition 8 - HPE Proliant
This partition provides a modern CPU with a very large L3 cache.
The goal is to enable users to develop algorithms and libraries
that will efficiently utilize this technology.
The processor is very efficient, for example, for linear algebra on relatively small matrices.
This is a single-node partition with the following parameters:
- Server HPE Proliant DL 385 Gen10 Plus v2 CTO
- 2x AMD EPYC 7773X Milan-X, 64 cores, 2.2GHz, 768 MB L3 cache
- 16x HPE 16GB (1x+16GB) x4 DDR4-3200 Registered Smart Memory Kit
- 2x 3.84TB NVMe RI SFF BC U.3ST MV SSD
- BCM 57412 10GbE 2p SFP+ OCP3 Adptr
- HPE IB HDR100/EN 100Gb 1p QSFP56 Adptr1
- HPE Cray Programming Environment for x86 Systems 2 Seats
## Partition 9 - Virtual GPU Accelerated Workstation
This partition provides users with a remote/virtual workstation running MS Windows OS.
It offers a rich graphical environment with a focus on 3D OpenGL
or RayTracing-based applications with the smallest possible degradation of user experience.
The partition consists of two nodes with the following per-node parameters:
- Server HPE Proliant DL 385 Gen10 Plus v2 CTO
- 2x AMD EPYC 7413, 24 cores, 2.55GHz
- 16x HPE 32GB 2Rx4 PC4-3200AA-R Smart Kit
- 2x 3.84TB NVMe RI SFF BC U.3ST MV SSD
- BCM 57412 10GbE 2p SFP+ OCP3 Adptr
- 2x NVIDIA A40 48GB GPU Accelerator
### Available Software
The following is the list of software available on partition 09:
- Academic VMware Horizon 8 Enterprise Term Edition: 10 Concurrent User Pack for 4 year term license; includes SnS
- 8x NVIDIA RTX Virtual Workstation, per concurrent user, EDU, perpetual license
- 32x NVIDIA RTX Virtual Workstation, per concurrent user, EDU SUMS per year
- 7x Windows Server 2022 Standard - 16 Core License Pack
- 10x Windows Server 2022 - 1 User CAL
- 40x Windows 10/11 Enterprise E3 VDA (Microsoft) per year
- Hardware VMware Horizon management
## Partition 10 - Sapphire Rapids-HBM Server
The primary purpose of this server is to evaluate the impact of the HBM memory integrated on the x86 processor
on the performance of user applications.
This is a new feature previously available only on GPGPU accelerators,
where it provided a significant boost to memory-bound applications.
Users can also compare the impact of the HBM memory with the impact of the large L3 cache
available on the AMD Milan-X processor, also available on the complementary systems.
The server is also equipped with DDR5 memory and enables comparative studies with reference to DDR4-based systems.
- 2x Intel® Xeon® CPU Max 9468 48 cores base 2.1GHz, max 3.5Ghz
- 16x 16GB DDR5 4800Mhz
- 2x Intel D3 S4520 960GB SATA 6Gb/s
- 1x Supermicro Standard LP 2-port 10GbE RJ45, Broadcom BCM57416
## Partition 11 - NVIDIA Grace CPU Superchip
The [NVIDIA Grace CPU Superchip][6] uses the [NVIDIA® NVLink®-C2C][5] technology to deliver 144 Arm® Neoverse V2 cores and 1TB/s of memory bandwidth.
It runs all NVIDIA software stacks and platforms, including NVIDIA RTX™, NVIDIA HPC SDK, NVIDIA AI, and NVIDIA Omniverse™.
- Superchip design with up to 144 Arm Neoverse V2 CPU cores with Scalable Vector Extensions (SVE2)
- World’s first LPDDR5X with error-correcting code (ECC) memory, 1TB/s total bandwidth
- 900GB/s coherent interface, 7X faster than PCIe Gen 5
- NVIDIA Scalable Coherency Fabric with 3.2TB/s of aggregate bisectional bandwidth
- 2X the packaging density of DIMM-based solutions
- 2X the performance per watt of today’s leading CPU
- FP64 Peak of 7.1TFLOPS
[1]: https://www.bittware.com/fpga/520n-mx/
[2]: https://www.xilinx.com/products/boards-and-kits/alveo/u250.html#overview
[3]: https://www.xilinx.com/products/boards-and-kits/alveo/u280.html#overview
[4]: https://developer.arm.com/documentation/100095/0003/
[5]: https://www.nvidia.com/en-us/data-center/nvlink-c2c/
[6]: https://www.nvidia.com/en-us/data-center/grace-cpu-superchip/
# Accessing the DGX-2
## Before You Access
!!! warning
GPUs are single-user devices. GPU memory is not purged between job runs and it can be read (but not written) by any user. Consider the confidentiality of your running jobs.
## How to Access
The DGX-2 machine is integrated into [Barbora cluster][3].
The DGX-2 machine can be accessed from the Barbora login nodes `barbora.it4i.cz` through the Barbora scheduler queue `qdgx` as compute node `cn202`.
## Storage
There are three shared file systems on the DGX-2 system: HOME, SCRATCH (LSCRATCH), and PROJECT.
### HOME
The HOME filesystem is realized as an NFS filesystem. This is a shared home from the [Barbora cluster][1].
### SCRATCH
The SCRATCH is realized on an NVME storage. The SCRATCH filesystem is mounted in the `/scratch` directory.
Accessible capacity is 22TB, shared among all users.
!!! warning
Files on the SCRATCH filesystem that are not accessed for more than 60 days will be automatically deleted.
### PROJECT
The PROJECT data storage is IT4Innovations' central data storage accessible from all clusters.
For more information on accessing PROJECT, its quotas, etc., see the [PROJECT Data Storage][2] section.
[1]: ../../barbora/storage/#home-file-system
[2]: ../../storage/project-storage
[3]: ../../barbora/introduction
# NVIDIA DGX-2
The DGX-2 is a very powerful computational node, featuring high-end x86_64 processors and 16 NVIDIA V100-SXM3 GPUs.
| NVIDIA DGX-2 | |
| --- | --- |
| CPUs | 2 x Intel Xeon Platinum |
| GPUs | 16 x NVIDIA Tesla V100 32GB HBM2 |
| System Memory | Up to 1.5 TB DDR4 |
| GPU Memory | 512 GB HBM2 (16 x 32 GB) |
| Storage | 30 TB NVMe, Up to 60 TB |
| Networking | 8 x Infiniband or 8 x 100 GbE |
| Power | 10 kW |
| Size | 350 lbs |
| GPU Throughput | Tensor: 1920 TFLOPs, FP16: 520 TFLOPs, FP32: 260 TFLOPs, FP64: 130 TFLOPs |
The [DGX-2][a] introduces NVIDIA’s new NVSwitch, enabling 300 GB/s chip-to-chip communication at 12 times the speed of PCIe.
With NVLink2, it enables 16x NVIDIA V100-SXM3 GPUs in a single system, for a total bandwidth going beyond 14 TB/s.
Featuring a pair of Xeon 8168 CPUs, 1.5 TB of memory, and 30 TB of NVMe storage,
the system consumes 10 kW, weighs 163.29 kg, and offers double-precision performance in excess of 130 TF.
The DGX-2 is designed to be a powerful server in its own right.
On the storage side, the DGX-2 comes with 30TB of NVMe-based solid state storage.
For clustering or further inter-system communications, it also offers InfiniBand and 100GigE connectivity, up to eight of them.
Further, the [DGX-2][b] offers a total of ~2 PFLOPs of half precision performance in a single system, when using the tensor cores.
![](../img/dgx1.png)
With the DGX-2, training AlexNET, the network that 'started' the latest machine learning revolution, now takes 18 minutes.
The DGX-2 is able to complete the training process
for FAIRSEQ – a neural network model for language translation – 10x faster than a DGX-1 system,
bringing it down to less than two days total rather than 15 days.
The new NVSwitches mean that the PCIe lanes of the CPUs can be redirected elsewhere, most notably towards storage and networking connectivity.
The topology of the DGX-2 means that all 16 GPUs are able to pool their memory into a unified memory space,
though with the usual tradeoffs involved if going off-chip.
![](../img/dgx2-nvlink.png)
[a]: https://www.nvidia.com/content/dam/en-zz/es_em/Solutions/Data-Center/dgx-2/nvidia-dgx-2-datasheet.pdf
[b]: https://www.youtube.com/embed/OTOGw0BRqK0
# Resource Allocation and Job Execution
To run a job, computational resources of the DGX-2 must be allocated.
The DGX-2 machine is integrated into and accessible through the Barbora cluster; the queue for the DGX-2 machine is called **qdgx**.
When allocating computational resources for the job, specify:
1. your Project ID;
1. a queue for your job - **qdgx**;
1. the maximum time allocated to your calculation (the default is **4 hours**, the maximum is **48 hours**);
1. a jobscript if batch processing is intended.
Submit the job using the `sbatch` (for batch processing) or `salloc` (for interactive session) command:
**Example**
```console
[kru0052@login2.barbora ~]$ salloc -A PROJECT-ID -p qdgx --time=02:00:00
salloc: Granted job allocation 36631
salloc: Waiting for resource configuration
salloc: Nodes cn202 are ready for job
kru0052@cn202:~$ nvidia-smi
Wed Jun 16 07:46:32 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM3... On | 00000000:34:00.0 Off | 0 |
| N/A 32C P0 51W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM3... On | 00000000:36:00.0 Off | 0 |
| N/A 31C P0 48W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM3... On | 00000000:39:00.0 Off | 0 |
| N/A 35C P0 53W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM3... On | 00000000:3B:00.0 Off | 0 |
| N/A 36C P0 53W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM3... On | 00000000:57:00.0 Off | 0 |
| N/A 29C P0 50W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM3... On | 00000000:59:00.0 Off | 0 |
| N/A 35C P0 51W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM3... On | 00000000:5C:00.0 Off | 0 |
| N/A 30C P0 50W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM3... On | 00000000:5E:00.0 Off | 0 |
| N/A 35C P0 53W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 8 Tesla V100-SXM3... On | 00000000:B7:00.0 Off | 0 |
| N/A 30C P0 50W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 9 Tesla V100-SXM3... On | 00000000:B9:00.0 Off | 0 |
| N/A 30C P0 51W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 10 Tesla V100-SXM3... On | 00000000:BC:00.0 Off | 0 |
| N/A 35C P0 51W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 11 Tesla V100-SXM3... On | 00000000:BE:00.0 Off | 0 |
| N/A 35C P0 50W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 12 Tesla V100-SXM3... On | 00000000:E0:00.0 Off | 0 |
| N/A 31C P0 50W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 13 Tesla V100-SXM3... On | 00000000:E2:00.0 Off | 0 |
| N/A 29C P0 51W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 14 Tesla V100-SXM3... On | 00000000:E5:00.0 Off | 0 |
| N/A 34C P0 51W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 15 Tesla V100-SXM3... On | 00000000:E7:00.0 Off | 0 |
| N/A 34C P0 50W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
kru0052@cn202:~$ exit
```
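A batch job can be submitted to the same queue analogously (a sketch; `./jobscript.sh` stands for your own job script):
```console
[kru0052@login2.barbora ~]$ sbatch -A PROJECT-ID -p qdgx --time=02:00:00 ./jobscript.sh
```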
!!! tip
Submit the interactive job using the `salloc` command.
## Job Execution
The DGX-2 machine runs only a bare-bone, minimal operating system. Users are expected to run
**[Apptainer/Singularity][1]** containers in order to enrich the environment according to their needs.
Containers (Docker images) optimized for the DGX-2 may be downloaded from
the [NVIDIA GPU Cloud][2]. Select the container of interest and
copy the `nvcr.io` link from the Pull Command section. This link may be used directly
to download the container via Apptainer/Singularity, see the example below:
### Example - Apptainer/Singularity Run Tensorflow
```console
[kru0052@login2.barbora ~] $ salloc -A PROJECT-ID -p qdgx --time=02:00:00
salloc: Granted job allocation 36633
salloc: Waiting for resource configuration
salloc: Nodes cn202 are ready for job
kru0052@cn202:~$ singularity shell docker://nvcr.io/nvidia/tensorflow:19.02-py3
Singularity tensorflow_19.02-py3.sif:~>
Singularity tensorflow_19.02-py3.sif:~> mpiexec --bind-to socket -np 16 python /opt/tensorflow/nvidia-examples/cnn/resnet.py --layers=18 --precision=fp16 --batch_size=512
PY 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609]
TF 1.13.0-rc0
PY 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609]
TF 1.13.0-rc0
PY 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609]
TF 1.13.0-rc0
PY 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609]
TF 1.13.0-rc0
PY 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609]
TF 1.13.0-rc0
PY 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609]
...
...
...
2019-03-11 08:30:12.263822: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
1 1.0 338.2 6.999 7.291 2.00000
10 10.0 3658.6 5.658 5.950 1.62000
20 20.0 25628.6 2.957 3.258 1.24469
30 30.0 30815.1 0.177 0.494 0.91877
40 40.0 30826.3 0.004 0.330 0.64222
50 50.0 30884.3 0.002 0.327 0.41506
60 60.0 30888.7 0.001 0.325 0.23728
70 70.0 30763.2 0.001 0.324 0.10889
80 80.0 30845.5 0.001 0.324 0.02988
90 90.0 26350.9 0.001 0.324 0.00025
kru0052@cn202:~$ exit
```
**GPU stat**
The GPU load can be determined by the `gpustat` utility.
```console
Every 2,0s: gpustat --color
dgx Mon Mar 11 09:31:00 2019
[0] Tesla V100-SXM3-32GB | 47'C, 96 % | 23660 / 32480 MB | kru0052(23645M)
[1] Tesla V100-SXM3-32GB | 48'C, 96 % | 23660 / 32480 MB | kru0052(23645M)
[2] Tesla V100-SXM3-32GB | 56'C, 97 % | 23660 / 32480 MB | kru0052(23645M)
[3] Tesla V100-SXM3-32GB | 57'C, 97 % | 23660 / 32480 MB | kru0052(23645M)
[4] Tesla V100-SXM3-32GB | 46'C, 97 % | 23660 / 32480 MB | kru0052(23645M)
[5] Tesla V100-SXM3-32GB | 55'C, 96 % | 23660 / 32480 MB | kru0052(23645M)
[6] Tesla V100-SXM3-32GB | 45'C, 96 % | 23660 / 32480 MB | kru0052(23645M)
[7] Tesla V100-SXM3-32GB | 54'C, 97 % | 23660 / 32480 MB | kru0052(23645M)
[8] Tesla V100-SXM3-32GB | 45'C, 96 % | 23660 / 32480 MB | kru0052(23645M)
[9] Tesla V100-SXM3-32GB | 46'C, 95 % | 23660 / 32480 MB | kru0052(23645M)
[10] Tesla V100-SXM3-32GB | 55'C, 96 % | 23660 / 32480 MB | kru0052(23645M)
[11] Tesla V100-SXM3-32GB | 56'C, 96 % | 23660 / 32480 MB | kru0052(23645M)
[12] Tesla V100-SXM3-32GB | 47'C, 95 % | 23660 / 32480 MB | kru0052(23645M)
[13] Tesla V100-SXM3-32GB | 45'C, 96 % | 23660 / 32480 MB | kru0052(23645M)
[14] Tesla V100-SXM3-32GB | 55'C, 96 % | 23660 / 32480 MB | kru0052(23645M)
[15] Tesla V100-SXM3-32GB | 58'C, 95 % | 23660 / 32480 MB | kru0052(23645M)
```
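The continuously refreshing view above can be obtained, for example, by wrapping `gpustat` in `watch` (the 2-second interval matches the header of the output above):
```console
$ watch --color -n 2 gpustat --color
```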
[1]: https://docs.it4i.cz/software/tools/singularity/
[2]: https://ngc.nvidia.com/
# Software Deployment
Software deployment on DGX-2 is based on containers. NVIDIA provides a wide range of prepared Docker containers with a variety of different software. Users can easily download these containers and use them directly on the DGX-2.
The catalog of all container images can be found on [NVIDIA site][a]. Supported software includes:
* TensorFlow
* MATLAB
* GROMACS
* Theano
* Caffe2
* LAMMPS
* ParaView
* ...
## Running Containers on DGX-2
NVIDIA expects Docker to be used as the containerization tool, but Docker is not a suitable solution in a multiuser environment. For this reason, the [Apptainer/Singularity][b] container solution is used.
Singularity can be used similarly to Docker; just change the image URL. For example, the original Docker command `docker run -it nvcr.io/nvidia/theano:18.08` becomes `singularity shell docker://nvcr.io/nvidia/theano:18.08`. More about Apptainer/Singularity [here][1].
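For illustration, the image from the example above can also be pulled first and then opened with Apptainer/Singularity (the tag `18.08` is just the one used in the paragraph above):
```console
$ singularity pull docker://nvcr.io/nvidia/theano:18.08
$ singularity shell docker://nvcr.io/nvidia/theano:18.08
```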
For fast container deployment, all images are cached in the *lscratch* directory after first use. This behavior can be changed via the *SINGULARITY_CACHEDIR* environment variable, but the container start time will then increase significantly.
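For example, to relocate the cache, set the variable before running Singularity (the path below is only illustrative):
```console
$ export SINGULARITY_CACHEDIR=/home/$USER/.singularity/cache
```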
```console
$ ml av Singularity
---------------------------- /apps/modules/tools ----------------------------
Singularity/3.3.0
```
## MPI Modules
```console
$ ml av MPI
---------------------------- /apps/modules/mpi ----------------------------
OpenMPI/2.1.5-GCC-6.3.0-2.27 OpenMPI/3.1.4-GCC-6.3.0-2.27 OpenMPI/4.0.0-GCC-6.3.0-2.27 (D) impi/2017.4.239-iccifort-2017.7.259-GCC-6.3.0-2.27
```
## Compiler Modules
```console
$ ml av gcc
---------------------------- /apps/modules/compiler ----------------------------
GCC/6.3.0-2.27 GCCcore/6.3.0 icc/2017.7.259-GCC-6.3.0-2.27 ifort/2017.7.259-GCC-6.3.0-2.27
```
[1]: ../software/tools/singularity.md
[a]: https://ngc.nvidia.com/catalog/landing
[b]: https://www.sylabs.io/
# What Is the DICE Project?
DICE (Data Infrastructure Capacity for EOSC) is an international project funded by the European Union
that provides cutting-edge data management services and a significant amount of storage resources for the EOSC.
The EOSC (European Open Science Cloud) project provides European researchers, innovators, companies,
and citizens with a federated and open multi-disciplinary environment
where they can publish, find, and re-use data, tools, and services for research, innovation and educational purposes.
For more information, see the official [DICE project][b] and [EOSC project][q] pages.
**IT4Innovations participates in DICE. DICE uses the iRODS software.**
The integrated Rule-Oriented Data System (iRODS) is an open source data management software
used by research organizations and government agencies worldwide.
iRODS is released as a production-level distribution aimed at deployment in mission critical environments.
It virtualizes data storage resources, so users can take control of their data,
regardless of where and on what device the data is stored.
As data volumes grow and data services become more complex,
iRODS is serving an increasingly important role in data management.
For more information, see [the official iRODS page][c].
## How to Put Your Data to Our Server
**Prerequisites:**
First, we need to verify your identity. This is done through the following steps:
1. Sign in with your organization at [B2ACCESS][d]; the page requests a valid personal certificate (e.g. GEANT).
Accounts with a "Low" level of assurance are not granted access to the IT4I zone.
1. Confirm your certificate in the browser:
![](img/B2ACCESS_chrome_eng.jpg)
1. Confirm your certificate in the OS (Windows):
![](img/crypto_v2.jpg)
1. Sign to EUDAT/B2ACCESS:
![](img/eudat_v2.jpg)
1. After successful login to B2Access:
1. **For Non IT4I Users**
Sign in to our [AAI][f] through your B2Access account.
You have to set a new password for iRODS access.
1. **For IT4I Users**
Sign in to our [AAI][f] through your B2Access account and link your B2ACCESS identity with your existing account.
The iRODS password will be the same as your IT4I LDAP password (i.e. code.it4i.cz password).
![](img/aai.jpg)
![](img/aai2.jpg)
![](img/aai3-passwd.jpg)
![](img/irods_linking_link.jpg)
1. Contact [support@it4i.cz][a], so we can create your account at our iRODS server.
1. **Fill in this request on [EOSC-MARKETPLACE][h] (recommended)** or at [EUDAT][l] and specify the requested capacity.
![](img/eosc-marketplace-active.jpg)
![](img/eosc-providers.jpg)
![](img/eudat_request.jpg)
## Access to iRODS Collection From Karolina
Access to an iRODS collection requires access to the Karolina cluster (i.e. an [IT4I account][4]),
since iRODS clients are provided as a module on Karolina (Barbora is in progress).
The `irodsfs` module also creates configuration files for irodsfs and iCommands.
Note that you can change your iRODS password at [aai.it4i.cz][m].
### Mounting Your Collection
```console
ssh some_user@karolina.it4i.cz
ml irodsfs
```
Now you can choose between the Fuse client or iCommands:
#### Fuse
```console
ssh some_user@karolina.it4i.cz
[some_user@login4.karolina ~]$ ml irodsfs
irodsfs configuration file has been created at /home/some_user/.irods/config.yml
iRODS environment file has been created at /home/some_user/.irods/irods_environment.json
to start irodsfs, run: irodsfs -config ~/.irods/config.yml ~/IRODS
to start iCommands, run: iinit
For more information, see https://docs.it4i.cz/dice/
```
To mount your iRODS collection to ~/IRODS, run
```console
[some_user@login4.karolina ~]$ irodsfs -config ~/.irods/config.yml ~/IRODS
time="2022-08-04 08:54:13.222836" level=info msg="Logging to /tmp/irodsfs_cblmq5ab1lsaj31vrv20.log" function=processArguments package=main
Password:
time="2022-08-04 08:54:18.698811" level=info msg="Found FUSE Device. Starting iRODS FUSE Lite." function=parentMain package=main
time="2022-08-04 08:54:18.699080" level=info msg="Running the process in the background mode" function=parentRun package=main
time="2022-08-04 08:54:18.699544" level=info msg="Process id = 27145" function=parentRun package=main
time="2022-08-04 08:54:18.699572" level=info msg="Sending configuration data" function=parentRun package=main
time="2022-08-04 08:54:18.699730" level=info msg="Successfully sent configuration data to background process" function=parentRun package=main
time="2022-08-04 08:54:18.922490" level=info msg="Successfully started background process" function=parentRun package=main
```
To unmount it, run
```console
fusermount -u ~/IRODS
```
You can work with the Fuse mount as with an ordinary directory (`ls`, `cd`, `cp`, `mv`, etc.).
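For example (the file name is only illustrative):
```console
[some_user@login4.karolina ~]$ cp results.tar.gz ~/IRODS/
[some_user@login4.karolina ~]$ ls ~/IRODS
results.tar.gz
```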
#### iCommands
```console
ssh some_user@karolina.it4i.cz
[some_user@login4.karolina ~]$ ml irodsfs
irodsfs configuration file has been created at /home/some_user/.irods/config.yml.
to start irods fs run: irodsfs -config ~/.irods/config.yml ~/IRODS
iCommands environment file has been created at /home/$USER/.irods/irods_environment.json.
to start iCommands run: iinit
[some_user@login4.karolina ~]$ iinit
Enter your current PAM password:
```
```console
[some_user@login4.karolina ~]$ ils
/IT4I/home/some_user:
test.1
test.2
test.3
test.4
```
Use the command `iput` for upload, `iget` for download, or `ihelp` for help.
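A minimal example of uploading and downloading a file (the file and directory names are only illustrative):
```console
[some_user@login4.karolina ~]$ iput results.txt
[some_user@login4.karolina ~]$ iget results.txt ~/downloads/
```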
## Access to iRODS Collection From Other Resource
!!! note
This guide assumes you are uploading your data from your local PC/VM.
Use the password from [AAI][f].
### You Need a Client to Connect to the iRODS Server
There are many iRODS clients, but we recommend the following:
- Cyberduck - Windows/Mac, GUI
- Fuse (irodsfs lite) - Linux, CLI
- iCommands - Linux, CLI.
For access, set your PAM password at [AAI][f].
### Cyberduck
1. Download [Cyberduck][i].
2. Download [connection profile][1] for IT4I iRods server.
3. Left double-click this file to open connection.
![](img/irods-cyberduck.jpg)
### Fuse
!!!note "Linux client only"
This is a Linux client only; basic knowledge of the command line is necessary.
Fuse allows you to work with your iRODS collection like an ordinary directory.
```console
cd ~
wget https://github.com/cyverse/irodsfs/releases/download/v0.7.6/irodsfs_amd64_linux_v0.7.6.tar
tar -xvf ~/irodsfs_amd64_linux_v0.7.6.tar
mkdir ~/IRODS ~/.irods/ && cd "$_" && wget https://docs.it4i.cz/config.yml
wget https://pki.cesnet.cz/_media/certs/chain_geant_ov_rsa_ca_4_full.pem -P ~/.irods/
```
Edit `~/.irods/config.yml` with your username from [AAI][f].
#### Mounting Your Collection
```console
[some_user@local_pc ~]$ ./irodsfs -config ~/.irods/config.yml ~/IRODS
time="2022-07-29 09:51:11.720831" level=info msg="Logging to /tmp/irodsfs_cbhp2rucso0ef0s7dtl0.log" function=processArguments package=main
Password:
time="2022-07-29 09:51:17.691988" level=info msg="Found FUSE Device. Starting iRODS FUSE Lite." function=parentMain package=main
time="2022-07-29 09:51:17.692683" level=info msg="Running the process in the background mode" function=parentRun package=main
time="2022-07-29 09:51:17.693381" level=info msg="Process id = 74772" function=parentRun package=main
time="2022-07-29 09:51:17.693421" level=info msg="Sending configuration data" function=parentRun package=main
time="2022-07-29 09:51:17.693772" level=info msg="Successfully sent configuration data to background process" function=parentRun package=main
time="2022-07-29 09:51:18.008166" level=info msg="Successfully started background process" function=parentRun package=main
```
#### Putting Your Data to iRODS
```console
[some_user@local_pc ~]$ cp test1G.txt ~/IRODS
```
It works as an ordinary file system:
```console
[some_user@local_pc ~]$ ls -la ~/IRODS
total 0
-rwx------ 1 some_user some_user 1073741824 Nov 4 2021 test1G.txt
```
#### Unmounting Your Collection
To stop/unmount your collection, use:
```console
[some_user@local_pc ~]$ fusermount -u ~/IRODS
```
### iCommands
!!!note "Linux client only"
This is a Linux client only; basic knowledge of the command line is necessary.
We recommend CentOS 7; Ubuntu 20 is optional.
#### Steps for Ubuntu 20
```console
LSB_RELEASE="bionic"
wget -qO - https://packages.irods.org/irods-signing-key.asc | sudo apt-key add -
echo "deb [arch=amd64] https://packages.irods.org/apt/ ${LSB_RELEASE} main" \
> | sudo tee /etc/apt/sources.list.d/renci-irods.list
deb [arch=amd64] https://packages.irods.org/apt/ bionic main
sudo apt-get update
apt-cache search irods
wget -c \
http://security.ubuntu.com/ubuntu/pool/main/p/python-urllib3/python-urllib3_1.22-1ubuntu0.18.04.2_all.deb \
http://security.ubuntu.com/ubuntu/pool/main/r/requests/python-requests_2.18.4-2ubuntu0.1_all.deb \
http://security.ubuntu.com/ubuntu/pool/main/o/openssl1.0/libssl1.0.0_1.0.2n-1ubuntu5.10_amd64.deb
sudo apt install \
./python-urllib3_1.22-1ubuntu0.18.04.2_all.deb \
./python-requests_2.18.4-2ubuntu0.1_all.deb \
./libssl1.0.0_1.0.2n-1ubuntu5.10_amd64.deb
sudo rm -rf \
./python-urllib3_1.22-1ubuntu0.18.04.2_all.deb \
./python-requests_2.18.4-2ubuntu0.1_all.deb \
./libssl1.0.0_1.0.2n-1ubuntu5.10_amd64.deb
sudo apt install -y irods-icommands
mkdir ~/.irods/ && cd "$_" && wget https://docs.it4i.cz/irods_environment.json
wget https://pki.cesnet.cz/_media/certs/chain_geant_ov_rsa_ca_4_full.pem -P ~/.irods
sed -i 's,~,'"$HOME"',g' ~/.irods/irods_environment.json
```
#### Steps for CentOS
```console
sudo rpm --import https://packages.irods.org/irods-signing-key.asc
sudo wget -qO - https://packages.irods.org/renci-irods.yum.repo | sudo tee /etc/yum.repos.d/renci-irods.yum.repo
sudo yum install epel-release -y
sudo yum install python-psutil python-jsonschema
sudo yum install irods-icommands
mkdir ~/.irods/ && cd "$_" && wget https://docs.it4i.cz/irods_environment.json
wget https://pki.cesnet.cz/_media/certs/chain_geant_ov_rsa_ca_4_full.pem -P ~/.irods
sed -i 's,~,'"$HOME"',g' ~/.irods/irods_environment.json
```
Edit ***irods_user_name*** in `~/.irods/irods_environment.json` with the username from [AAI][f].
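After editing, the relevant entry should contain your AAI username; for example (the username below is only a placeholder):
```console
[some_user@local_pc ~]$ grep irods_user_name ~/.irods/irods_environment.json
    "irods_user_name": "some_user",
```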
```console
[some_user@local_pc ~]$ pwd
/some_user/.irods
[some_user@local_pc ~]$ ls -la
total 16
drwx------. 2 some_user some_user 136 Sep 29 08:53 .
dr-xr-x---. 6 some_user some_user 206 Sep 29 08:53 ..
-rw-r--r--. 1 some_user some_user 253 Sep 29 08:14 irods_environment.json
```
**How to Start:**
**step 1:**
```console
[some_user@local_pc ~]$ iinit
Enter your current PAM password:
[some_user@local_pc ~]$ ils
/IT4I/home/some_user:
file.jpg
```
**How to put your data to iRODS**
```console
[some_user@local_pc ~]$ iput cesnet.crt
```
```console
[some_user@local_pc ~]$ ils
/IT4I/home/some_user:
cesnet.crt
```
**How to download data**
```console
[some_user@local_pc ~]$ iget cesnet.crt
ls -la ~
-rw-r--r--. 1 some_user some_user 1464 Jul 20 13:44 cesnet.crt
```
For more commands, use the `ihelp` command.
## PID Services
You, as a user, may want to index your datasets and allocate Persistent Identifiers (PIDs) for them. We host a PID system based on hdl-surfsara ([https://it4i-handle.it4i.cz][o]), which is connected to [https://hdl.handle.net][p], and you can create your own PIDs by calling iRODS rules with `irule`.
### How to Create PID
PIDs are created by calling `irule`. You can create the rule file in your `$HOME` or anywhere else,
but you have to specify the path correctly.
Rule files for PID operations always have the `.r` suffix.
This can be done only through iCommands.
Example of a rule that only creates a PID:
```console
user in ~ λ pwd
/home/user
user in ~ λ ils
/IT4I/home/user:
C- /IT4I/home/user/Collection_A
user in ~ λ ls -l | grep pid
-rw-r--r-- 1 user user 249 Sep 30 10:55 create_pid.r
user in ~ λ cat create_pid.r
PID_DO_reg {
EUDATCreatePID(*parent_pid, *source, *ror, *fio, *fixed, *newPID);
writeLine("stdout","PID: *newPID");
}
INPUT *source="/IT4I/home/user/Collection_A",*parent_pid="None",*ror="None",*fio="None",*fixed="true"
OUTPUT ruleExecOut
user in ~ λ irule -F create_pid.r
PID: 21.12149/f3b9b1a5-7b4d-4fff-bfb7-826676f6fe14
```
After creation, your PID is searchable worldwide:
![](img/hdl_net.jpg)
![](img/hdl_pid.jpg)
**More info at [www.eudat.eu][n]**
### Metadata
To add metadata to your collection/dataset, you can use `imeta` from iCommands.
The following shows the metadata of a collection after PID creation:
```console
user in ~ λ imeta ls -C /IT4I/home/user/Collection_A
AVUs defined for collection /IT4I/home/user/Collection_A:
attribute: EUDAT/FIXED_CONTENT
value: True
units:
----
attribute: PID
value: 21.12149/f3b9b1a5-7b4d-4fff-bfb7-826676f6fe14
units:
```
To add any other metadata, you can use:
```console
user in ~ λ imeta add -C /IT4I/home/user/Collection_A EUDAT_B2SHARE_TITLE Some_Title
user in ~ λ imeta ls -C /IT4I/home/user/Collection_A
AVUs defined for collection /IT4I/home/user/Collection_A:
attribute: EUDAT/FIXED_CONTENT
value: True
units:
----
attribute: PID
value: 21.12149/f3b9b1a5-7b4d-4fff-bfb7-826676f6fe14
units:
----
attribute: EUDAT_B2SHARE_TITLE
value: Some_Title
units:
```
[1]: irods.cyberduckprofile
[2]: irods_environment.json
[3]: config.yml
[4]: general/access/account-introduction.md
[a]: mailto:support@it4i.cz
[b]: https://www.dice-eosc.eu/
[c]: https://irods.org/
[d]: https://b2access.eudat.eu/
[f]: https://aai.it4i.cz/realms/IT4i_IRODS/account/#/
[h]: https://marketplace.eosc-portal.eu/services/b2safe/offers
[i]: https://cyberduck.io/download/
[l]: https://www.eudat.eu/contact-support-request?Service=B2SAFE
[m]: https://aai.it4i.cz/
[n]: https://www.eudat.eu/catalogue/b2handle
[o]: https://it4i-handle.it4i.cz
[p]: https://hdl.handle.net
[q]: https://eosc-portal.eu/
# Migration to e-INFRA CZ
## Introduction
IT4Innovations is a part of [e-INFRA CZ][1] - strategic research infrastructure of the Czech Republic, which provides capacities and resources for the transmission, storage, and processing of scientific and research data. In January 2022, IT4I has begun the process of integration of its services.
As a part of the process, a joint e-INFRA CZ user base has been established. This included a migration of eligible IT4I accounts.
## Who Has Been Affected
The migration affected all accounts of users affiliated with an academic organization in the Czech Republic who also have an OPEN-XX-XX project. Affected users have received an email with information about changes in personal data processing.
## Who Has Not Been Affected
Commercial users, training accounts, suppliers, and service accounts were **not** affected by the migration.
## Process
During the process, additional steps have been required for successful migration.
This may have included:
1. e-INFRA CZ registration, if one does not already exist.
2. e-INFRA CZ password reset, if a password has not already been set.
## Steps After Migration
After the migration, you must use your **e-INFRA CZ credentials** to access all IT4I services as well as [e-INFRA CZ services][5].
Successfully migrated accounts tied to e-INFRA CZ can be self-managed at [e-INFRA CZ User profile][4].
!!! tip "Recommendation"
We recommend [verifying your SSH keys][6] for cluster access.
## Troubleshooting
If you have a problem with your account migrated to e-INFRA CZ user base, contact the [CESNET support][7].
If you have questions or a problem with IT4I account (i.e. account not eligible for migration), contact the [IT4I support][2].
[1]: https://www.e-infra.cz/en
[2]: mailto:support@it4i.cz
[3]: https://www.cesnet.cz/?lang=en
[4]: https://profile.e-infra.cz/
[5]: https://www.e-infra.cz/en/services
[6]: https://profile.e-infra.cz/profile/settings/sshKeys
[7]: mailto:support@cesnet.cz
# Environment and Modules
## Shells on Clusters
The table shows which shells are available on the IT4Innovations clusters.
Note that bash is the only supported shell.
| Cluster Name | bash | tcsh | zsh | ksh | dash |
| --------------- | ---- | ---- | --- | --- | ---- |
| Karolina | yes | yes | yes | yes | yes |
| Barbora | yes | yes | yes | yes | no |
| DGX-2 | yes | no | no | no | no |
!!! info
Bash is the default shell. Should you need a different shell, contact [support\[at\]it4i.cz][3].
## Environment Customization
After logging in, you may want to configure the environment. Write your preferred path definitions, aliases, functions, and module loads in the `.bashrc` file:
```console
# ~/.bashrc
# user's compilation path
export MODULEPATH=${MODULEPATH}:/home/$USER/.local/easybuild/modules/all
# User specific aliases and functions
alias sq='squeue --me'
# load the default Intel compiler !!! not recommended !!!
ml intel
# Display information to standard output - only in interactive ssh session
if [ -n "$SSH_TTY" ]
then
ml # Display loaded modules
fi
```
!!! note
Do not run commands outputting to standard output (echo, module list, etc.) in .bashrc for non-interactive SSH sessions. It breaks the fundamental functionality (SCP) of your account. Guard such commands with a check for SSH session interactivity, as shown in the example above.
### Application Modules
In order to configure your shell for running a particular application on clusters, we use a module package interface.
Application modules on clusters are built using [EasyBuild][1]. The modules are divided into the following groups:
```
base: Default module class
bio: Bioinformatics, biology and biomedical
cae: Computer Aided Engineering (incl. CFD)
chem: Chemistry, Computational Chemistry and Quantum Chemistry
compiler: Compilers
data: Data management & processing tools
debugger: Debuggers
devel: Development tools
geo: Earth Sciences
ide: Integrated Development Environments (e.g. editors)
lang: Languages and programming aids
lib: General purpose libraries
math: High-level mathematical software
mpi: MPI stacks
numlib: Numerical Libraries
perf: Performance tools
phys: Physics and physical systems simulations
system: System utilities (e.g. highly depending on system OS and hardware)
toolchain: EasyBuild toolchains
tools: General purpose tools
vis: Visualization, plotting, documentation and typesetting
OS: singularity image
python: python packages
```
!!! note
The modules set up the application paths, library paths and environment variables for running a particular application.
The modules may be loaded, unloaded, and switched according to momentary needs. For details, see [lmod][2].
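A minimal sketch of typical module operations, using an illustrative module name (check `ml av` for the modules and versions actually available on your cluster):
```console
$ ml av OpenMPI                      # list available OpenMPI modules
$ ml OpenMPI/4.0.0-GCC-6.3.0-2.27    # load a specific version
$ ml                                 # show currently loaded modules
$ ml purge                           # unload all modules
```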
[1]: software/tools/easybuild.md
[2]: software/modules/lmod.md
[3]: mailto:support@it4i.cz
# Introduction
This section provides basic information on how to gain access to IT4Innovations Information systems and project membership.
## Account Types
There are two types of accounts at IT4Innovations:
* [**e-INFRA CZ Account**][1]
intended for all persons affiliated with an academic institution from the Czech Republic ([eduID.cz][a]).
* [**IT4I Account**][2]
intended for all persons who are not eligible for an e-INFRA CZ account.
Once you create an account, you can use it only for communication with IT4I support and accessing the SCS information system.
If you want to access IT4I clusters, your account must also be **assigned to a project**.
For more information, see the section:
* [**Get Project Membership**][3]
if you want to become a collaborator on a project, or
* [**Get Project**][4]
if you want to become a project owner.
[1]: ./einfracz-account.md
[2]: ../obtaining-login-credentials/obtaining-login-credentials.md
[3]: ../access/project-access.md
[4]: ../applying-for-resources.md
[a]: https://www.eduid.cz/
# e-INFRA CZ Account
[e-INFRA CZ][1] is a unique research and development e-infrastructure in the Czech Republic,
which provides capacities and resources for the transmission, storage and processing of scientific and research data.
IT4Innovations became a member of e-INFRA CZ in January 2022.
!!! important
Only persons affiliated with an academic institution from the Czech Republic ([eduID.cz][6]) are eligible for an e-INFRA CZ account.
## Request e-INFRA CZ Account
1. Request an account:
1. Go to [https://signup.e-infra.cz/fed/registrar/?vo=IT4Innovations][2]
1. Select a member academic institution you are affiliated with.
1. Fill out the e-INFRA CZ Account information (username, password, and SSH key(s)).
Your account should be created in a few minutes after submitting the request.
Once your e-INFRA CZ account is created, it is propagated into IT4I systems
and can be used to access [SCS portal][3] and [Request Tracker][4].
1. Provide additional information via [IT4I support][a] or email [support\[at\]it4i.cz][b] (**required**, note that without this information, you cannot use IT4I resources):
1. **Full name**
1. **Gender**
1. **Citizenship**
1. **Country of residence**
1. **Organization/affiliation**
1. **Organization/affiliation country**
1. **Organization/affiliation type** (university, company, R&D institution, private/public sector (hospital, police), academy of sciences, etc.)
1. **Job title** (student, PhD student, researcher, research assistant, employee, etc.)
Continue to apply for a project or project membership to access clusters through the [SCS portal][3].
## Logging Into IT4I Services
The table below shows how different IT4I services are accessed:
| Services | Access |
| -------- | ------- |
| Clusters | SSH key |
| IS, RT, web, VPN | e-INFRA CZ login |
| Profile<br>Change&nbsp;password<br>Change&nbsp;SSH&nbsp;key | Academic institution's credentials<br>e-INFRA CZ / eduID |
You can change your [profile settings][5] at any time.
[1]: https://www.e-infra.cz/en
[2]: https://signup.e-infra.cz/fed/registrar/?vo=IT4Innovations
[3]: https://scs.it4i.cz/
[4]: https://support.it4i.cz/
[5]: ../../management/einfracz-profile.md
[6]: https://www.eduid.cz/
[a]: https://support.it4i.cz/rt/
[b]: mailto:support@it4i.cz
# Get Project Membership
!!! note
You need to be named as a collaborator by a Primary Investigator (PI) in order to access and use the clusters.
## Authorization by Web
This is a preferred method if you have an IT4I or e-INFRA CZ account.
Log in to the [IT4I SCS portal][a] and go to the **Authorization Requests** section. Here you can submit your requests for becoming a project member. You will have to wait until the project PI authorizes your request.
## Authorization by Email
An alternative way to become a project member is through a request sent by the project PI via [email][1].
[1]: ../../applying-for-resources/#authorization-by-email-an-alternative-approach
[a]: https://scs.it4i.cz/
# Open OnDemand
[Open OnDemand][1] is an intuitive, innovative, and interactive interface to remote computing resources.
It allows users to access our services from any device and web browser,
resulting in faster and more efficient use of supercomputing resources.
For more information, see the Open OnDemand [documentation][2].
## Access Open OnDemand
To access the OOD service, you must be connected to [IT4I VPN][a].
Then go to [https://ood-karolina.it4i.cz/][3] for Karolina
or [https://ood-barbora.it4i.cz/][4] for Barbora and enter your e-INFRA CZ or IT4I credentials.
From the top menu bar, you can manage your files and jobs, access the cluster's shell
and launch interactive apps on login nodes.
## OOD Apps on IT4I Clusters
!!! note
Barbora OOD offers Mate and XFCE desktops on the login node only. Other applications listed below are exclusive to Karolina OOD.
* Desktops
* Karolina Login Mate
* Karolina Login XFCE
* Gnome Desktop
* GUIs
* Ansys
* Blender
* ParaView
* TorchStudio
* Servers
* Code Server
* Jupyter (+IJulia)
* MATLAB
* TensorBoard
* Simulation
* Code Aster
Depending on the selected application, you can set various properties,
e.g. partition, number of nodes, tasks per node, reservation, etc.
For `qgpu` partitions, you can select the number of GPUs.
![Ansys app in OOD GUI](../../../img/ood-ansys.png)
## Job Composer Tutorial
Under *Jobs > Job Composer*, you can create jobs from several sources.
A simple tutorial will guide you through the process.
To restart the tutorial, click *Help* in the upper right corner.
[1]: https://openondemand.org/
[2]: https://osc.github.io/ood-documentation/latest/
[3]: https://ood-karolina.it4i.cz/
[4]: https://ood-barbora.it4i.cz/
[a]: ../vpn-access.md
# VNC
Virtual Network Computing (VNC) is a graphical desktop-sharing system that uses the Remote Frame Buffer protocol (RFB) to remotely control another computer. It transmits the keyboard and mouse events from one computer to another, relaying the graphical screen updates back in the other direction, over a network.
VNC-based connections are usually faster (require less network bandwidth) than [X11][1] applications forwarded directly through SSH.
The recommended clients are [TightVNC][b] or [TigerVNC][c] (free, open source, available for almost any platform).
## Create VNC Server Password
!!! note
VNC server password should be set before the first login. Use a strong password.
```console
$ vncpasswd
Password:
Verify:
```
## Start VNC Server
!!! note
To access VNC, a remote VNC Server must be started first and a tunnel using SSH port forwarding must be established.
[See below][2] the details on SSH tunnels.
Start by **choosing your display number**.
To choose a free one, check the currently occupied display numbers by listing them with the following command:
```console
$ ps aux | grep Xvnc | sed -rn 's/(\s) .*Xvnc (\:[0-9]+) .*/\1 \2/p'
username :79
username :60
.....
```
As you can see above, displays ":79" and ":60" are already occupied.
Generally, you can choose the display number freely, *except for these occupied numbers*.
Also remember that the display number should be lower than or equal to 99.
Based on these requirements, we have chosen display number 61, as seen in the examples below.
!!! note
Your situation may be different so the choice of your number may differ, as well. **Choose and use your own display number accordingly!**
Start your remote VNC server on the chosen display number (61):
```console
$ vncserver :61 -geometry 1600x900 -depth 16
New 'login2:61 (username)' desktop is login2:61
Starting applications specified in /home/username/.vnc/xstartup
Log file is /home/username/.vnc/login2:61.log
```
Check whether the VNC server is running on the chosen display number (61):
```console
$ vncserver -list
TigerVNC server sessions:
X DISPLAY # PROCESS ID
:61 18437
```
Another way to check it:
```console
$ ps aux | grep Xvnc | sed -rn 's/(\s) .*Xvnc (\:[0-9]+) .*/\1 \2/p'
username :61
username :102
```
!!! note
The VNC server runs on port 59xx, where xx is the display number. To get your port number, simply add 5900 + display number, in our example 5900 + 61 = 5961. Another example for display number 102 is calculation of TCP port 5900 + 102 = 6002, but note that TCP ports above 6000 are often used by X11. **Calculate your own port number and use it instead of 5961 from examples below**.
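For example, you can let the shell do the arithmetic:
```console
$ echo $((5900 + 61))
5961
```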
To access the remote VNC server, you have to create a tunnel between the login node (TCP port 5961) and your local machine (a free TCP port; for simplicity, the very same one) in the next step. See the examples for [Linux/Mac OS][2] and [Windows][3].
!!! note
The tunnel must point to the same login node where you launched the VNC server, e.g. login2. If you use just cluster-name.it4i.cz, the tunnel might point to a different node due to DNS round robin.
## Linux/Mac OS Example of Creating a Tunnel
On your local machine, create the tunnel:
```console
$ ssh -TN -f username@login2.cluster-name.it4i.cz -L 5961:localhost:5961
```
Issue the following command to check that the tunnel is established (note the PID 2022 in the last column; it is required for closing the tunnel):
```console
$ netstat -natp | grep 5961
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 127.0.0.1:5961 0.0.0.0:* LISTEN 2022/ssh
tcp6 0 0 ::1:5961 :::* LISTEN 2022/ssh
```
Or on Mac OS use this command:
```console
$ lsof -n -i4TCP:5961 | grep LISTEN
ssh 75890 sta545 7u IPv4 0xfb062b5c15a56a3b 0t0 TCP 127.0.0.1:5961 (LISTEN)
```
Connect with the VNC client:
```console
$ vncviewer 127.0.0.1:5961
```
In this example, we connect to remote VNC server on port 5961, via the SSH tunnel. The connection is encrypted and secured. The VNC server listening on port 5961 provides screen of 1600x900 pixels.
After you finish your work, you have to close the SSH tunnel, which is still running in the background. Use the following command (PID 2022 in this case; see the netstat command above):
```console
kill 2022
```
!!! note
You can watch the instruction video on how to make a VNC connection between a local Ubuntu desktop and the IT4I cluster [here][e].
## Windows Example of Creating a Tunnel
Start the VNC server using the `vncserver` command described above.
Search for the localhost and port number (in this case 127.0.0.1:5961):
```console
$ netstat -tanp | grep Xvnc
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 127.0.0.1:5961 0.0.0.0:* LISTEN 24031/Xvnc
```
### PuTTY
On the PuTTY Configuration screen, go to _Connection -> SSH -> Tunnels_ to set up the tunnel.
Fill the _Source port_ and _Destination_ fields. **Do not forget to click the _Add_ button**.
![](../../../img/putty-tunnel.png)
### WSL (Bash on Windows)
[Windows Subsystem for Linux][d] is another way to run Linux software in a Windows environment.
At your machine, create the tunnel:
```console
$ ssh username@login2.cluster-name.it4i.cz -L 5961:localhost:5961
```
## Example of Starting VNC Client
Run the VNC client of your choice, select the VNC server 127.0.0.1, port 5961 and connect using the VNC password.
### TigerVNC Viewer
![](../../../img/vncviewer.png)
In this example, we connect to remote the VNC server on port 5961, via the SSH tunnel, using the TigerVNC viewer. The connection is encrypted and secured. The VNC server listening on port 5961 provides a screen of 1600x900 pixels.
### TightVNC Viewer
Use your VNC password to log in using the TightVNC Viewer and start a Gnome session on the login node.
![](../../../img/TightVNC_login.png)
## Gnome Session
After the successful login, you should see the following screen:
![](../../../img/gnome_screen.png)
### Disable Your Gnome Session Screensaver
Open the Screensaver preferences dialog:
![](../../../img/gdmscreensaver.png)
Uncheck both options below the slider:
![](../../../img/gdmdisablescreensaver.png)
### Kill Screensaver if Locked Screen
If the screen gets locked, you have to kill the screensaver. Do not forget to disable the screensaver then.
```console
$ ps aux | grep screen
username 1503 0.0 0.0 103244 892 pts/4 S+ 14:37 0:00 grep screen
username 24316 0.0 0.0 270564 3528 ? Ss 14:12 0:00 gnome-screensaver
[username@login2 .vnc]$ kill 24316
```
## Kill VNC Server After Finished Work
You should kill your VNC server using the command:
```console
$ vncserver -kill :61
Killing Xvnc process ID 7074
Xvnc process ID 7074 already killed
```
or:
```console
$ pkill vnc
```
!!! note
Also, do not forget to terminate the SSH tunnel, if it was used. For details, see the end of [this section][2].
## GUI Applications on Compute Nodes Over VNC
The very same methods as described above may be used to run the GUI applications on compute nodes. However, for maximum performance, follow these steps:
Open a Terminal (_Applications -> System Tools -> Terminal_). Run all the following commands in the terminal.
![](../../../img/gnome-terminal.png)
Allow incoming X11 graphics from the compute nodes at the login node.
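One way to do this, for example, is with the standard `xhost` utility (note that `xhost +` disables X access control entirely, so use it with care):
```console
$ xhost +
```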
Get an interactive session on a compute node (for more detailed info [look here][4]). Forward X11 system using `--x11` option:
```console
$ salloc -A PROJECT_ID -p qcpu --x11
```
Test that the DISPLAY redirection into your VNC session works, by running an X11 application (e.g. XTerm, Intel Advisor, etc.) on the assigned compute node:
```console
$ xterm
```
The example described above:
![](../../../img/node_gui_xwindow.png)
### GUI Over VNC and SSH
For a [better performance][1] an SSH connection can be used.
Open two Terminals (_Applications -> System Tools -> Terminal_) as described before.
Get an interactive session on a compute node (for more detailed info [look here][4]). Forward X11 system using `--x11` option:
```console
$ salloc -A PROJECT_ID -p qcpu --x11
```
In the second terminal connect to the assigned node and run the X11 application
```console
$ ssh -X node_name.barbora.it4i.cz
$ xterm
```
The example described above:
![](../../../img/node_gui_sshx.png)
[b]: http://www.tightvnc.com
[c]: http://sourceforge.net/apps/mediawiki/tigervnc/index.php?title=Main_Page
[d]: http://docs.microsoft.com/en-us/windows/wsl
[e]: https://www.youtube.com/watch?v=b9Ez9UN2uL0
[1]: x-window-system.md
[2]: #linuxmac-os-example-of-creating-a-tunnel
[3]: #windows-example-of-creating-a-tunnel
[4]: ../../job-submission-and-execution.md
# X Window System
The X Window system is a principal way to get GUI access to the clusters. The **X Window System** (commonly known as **X11**, based on its current major version being 11, or shortened to simply **X**, and sometimes informally **X-Windows**) is a computer software system and network protocol that provides a basis for graphical user interfaces (GUIs) and rich input device capability for networked computers.
!!! tip
The X display forwarding must be activated and the X server must be running on the client side.
## X Display
### Linux Example
In order to display the GUI of various software tools, you need to enable the X display forwarding. On Linux and Mac, log in using the `-X` option in the SSH client:
```console
local $ ssh -X username@cluster-name.it4i.cz
```
### PuTTY on Windows
On Windows, use the PuTTY client to enable X11 forwarding. In PuTTY menu, go to _Connection > SSH > X11_ and check the _Enable X11 forwarding_ checkbox before logging in. Then log in as usual.
![](../../../img/cygwinX11forwarding.png)
### WSL (Bash on Windows)
To enable the X display forwarding, log in using the `-X` option in the SSH client:
```console
local $ ssh -X username@cluster-name.it4i.cz
```
!!! tip
If you are getting the "cannot open display" error message, try to export the DISPLAY variable, before attempting to log in:
```console
local $ export DISPLAY=localhost:0.0
```
## X Server
In order to display the GUI of various software tools, you need a running X server on your desktop computer. For Linux users, no action is required as the X server is the default GUI environment on most Linux distributions. Mac and Windows users need to install and run the X server on their workstations.
### X Server on OS X
Mac OS users need to install [XQuartz server][d].
### WSL (Bash on Windows)
To run a Linux GUI on WSL, download, for example, [VcXsrv][a].
1. After installation, run XLaunch and during the initial setup, check the `Disable access control`.
!!! tip
Save the configuration and launch VcXsrv using the `config.xlaunch` file, so you won't have to check the option on every run.
1. Allow VcXsrv in your firewall to communicate on private and public networks.
1. Set the `DISPLAY` environment variable, using the following command:
```console
export DISPLAY="`grep nameserver /etc/resolv.conf | sed 's/nameserver //'`:0"
```
!!! tip
Include the command at the end of the `/etc/bash.bashrc`, so you don't have to run it every time you run WSL.
1. Test the configuration by running `echo $DISPLAY`:
```code
user@nb-user:/$ echo $DISPLAY
172.26.240.1:0
```
### X Server on Windows
There is a variety of X servers available for the Windows environment. The commercial Xwin32 is very stable and feature-rich. The Cygwin environment provides the fully featured open-source XWin X server. For simplicity, we recommend the open-source X server by the [Xming project][e]. For stability and full features, we recommend the [XWin][f] X server by Cygwin.
| How to use Xwin | How to use Xming |
|--- | --- |
| [Install Cygwin][g]. Find and execute XWin.exe to start the X server on Windows desktop computer. | Use Xlaunch to configure Xming. Run Xming to start the X server on a Windows desktop computer. |
## Running GUI Enabled Applications
!!! note
Make sure that X forwarding is activated and the X server is running.
Then launch the application as usual. Use the `&` to run the application in background:
```console
$ ml intel    # note: idb and gvim not installed yet
$ gvim &
```
```console
$ xterm
```
In this example, we activate the Intel programming environment tools and then start the graphical gvim editor.
## GUI Applications on Compute Nodes
Allocate the compute nodes using the `--x11` option on the `salloc` command:
```console
$ salloc -A PROJECT-ID -q qcpu_exp --x11
```
In this example, we allocate one node via the qcpu_exp queue, interactively. We request X11 forwarding with the `--x11` option. It will be possible to run GUI-enabled applications directly on the first compute node.
For **better performance**, log on the allocated compute node via SSH, using the `-X` option.
```console
$ ssh -X cn245
```
In this example, we log on the cn245 compute node, with the X11 forwarding enabled.
## Gnome GUI Environment
The Gnome 2.28 GUI environment is available on the clusters. We recommend using a separate X server window for displaying the Gnome environment.
### Gnome on Linux and OS X
To run the remote Gnome session in a window on a Linux/OS X computer, you need to install Xephyr. The Ubuntu package is
xserver-xephyr; on OS X, it is part of [XQuartz][i]. First, launch Xephyr on the local machine:
```console
local $ Xephyr -ac -screen 1024x768 -br -reset -terminate :1 &
```
This will open a new X window of size 1024x768 at DISPLAY :1. Next, connect via SSH to the cluster with the `DISPLAY` environment variable set and launch a gnome-session:
```console
local $ DISPLAY=:1.0 ssh -XC yourname@cluster-name.it4i.cz -i ~/.ssh/path_to_your_key
... cluster-name MOTD...
yourname@login1.cluster-name.it4i.cz $ gnome-session &
```
On older systems where Xephyr is not available, you may also try Xnest instead of Xephyr. Another option is to launch a new X server in a separate console via:
```console
xinit /usr/bin/ssh -XT -i .ssh/path_to_your_key yourname@cluster-name.it4i.cz gnome-session -- :1 vt12
```
However, this method does not seem to work with recent Linux distributions and you will need to manually source
/etc/profile to properly set environment variables for Slurm.
### Gnome on Windows
Use XLaunch to start the Xming server or run the XWin.exe. Select the "One window" mode.
Log in to the cluster using [PuTTY][2] or [Bash on Windows][3]. On the cluster, run the gnome-session command.
```console
$ gnome-session &
```
This way, we run a remote gnome session on the cluster, displaying it in the local X server.
Use System-Log Out to close the gnome-session.
[1]: #if-no-able-to-forward-x11-using-putty-to-cygwinx
[2]: #putty-on-windows
[3]: #wsl-bash-on-windows
[a]: https://sourceforge.net/projects/vcxsrv/
[d]: https://www.xquartz.org
[e]: http://sourceforge.net/projects/xming/
[f]: http://x.cygwin.com/
[g]: http://x.cygwin.com/
[i]: http://xquartz.macosforge.org/landing/