diff --git a/docs.it4i/cs/amd.md b/docs.it4i/cs/guides/amd.md similarity index 100% rename from docs.it4i/cs/amd.md rename to docs.it4i/cs/guides/amd.md diff --git a/docs.it4i/cs/arm.md b/docs.it4i/cs/guides/arm.md similarity index 100% rename from docs.it4i/cs/arm.md rename to docs.it4i/cs/guides/arm.md diff --git a/docs.it4i/cs/guides/xilinx.md b/docs.it4i/cs/guides/xilinx.md new file mode 100644 index 0000000000000000000000000000000000000000..074b0406b4b3fd10e426e25cef4fa936e488889e --- /dev/null +++ b/docs.it4i/cs/guides/xilinx.md @@ -0,0 +1,652 @@ +# Xilinx Accelerator Platform + +The first step to use Xilinx accelerators is to initialize Vitis (compiler) and XRT (runtime) environments. + +```console +$ . /tools/Xilinx/Vitis/2023.1/settings64.sh +$ . /opt/xilinx/xrt/setup.sh +``` + +## Platform Level Accelerator Management + +This should allow to examine current platform using `xbutil examine`, +which should output user-level information about XRT platform and list available devices + +``` +$ xbutil examine +System Configuration + OS Name : Linux + Release : 4.18.0-477.27.1.el8_8.x86_64 + Version : #1 SMP Thu Aug 31 10:29:22 EDT 2023 + Machine : x86_64 + CPU Cores : 64 + Memory : 257145 MB + Distribution : Red Hat Enterprise Linux 8.8 (Ootpa) + GLIBC : 2.28 + Model : ProLiant XL675d Gen10 Plus + +XRT + Version : 2.16.0 + Branch : master + Hash : f2524a2fcbbabd969db19abf4d835c24379e390d + Hash Date : 2023-10-11 14:01:19 + XOCL : 2.16.0, f2524a2fcbbabd969db19abf4d835c24379e390d + XCLMGMT : 2.16.0, f2524a2fcbbabd969db19abf4d835c24379e390d + +Devices present +BDF : Shell Logic UUID Device ID Device Ready* +------------------------------------------------------------------------------------------------------------------------- +[0000:88:00.1] : xilinx_u280_gen3x16_xdma_base_1 283BAB8F-654D-8674-968F-4DA57F7FA5D7 user(inst=132) Yes +[0000:8c:00.1] : xilinx_u280_gen3x16_xdma_base_1 283BAB8F-654D-8674-968F-4DA57F7FA5D7 user(inst=133) Yes + + +* Devices that are not ready will have reduced functionality when using XRT tools +``` + +Here two Xilinx Alveo u280 accelerators (`0000:88:00.1` and `0000:8c:00.1`) are available. +The `xbutil` can be also used to query additional information about specific device using its BDF address + +```console +$ xbutil examine -d "0000:88:00.1" + +------------------------------------------------- +[0000:88:00.1] : xilinx_u280_gen3x16_xdma_base_1 +------------------------------------------------- +Platform + XSA Name : xilinx_u280_gen3x16_xdma_base_1 + Logic UUID : 283BAB8F-654D-8674-968F-4DA57F7FA5D7 + FPGA Name : + JTAG ID Code : 0x14b7d093 + DDR Size : 0 Bytes + DDR Count : 0 + Mig Calibrated : true + P2P Status : disabled + Performance Mode : not supported + P2P IO space required : 64 GB + +Clocks + DATA_CLK (Data) : 300 MHz + KERNEL_CLK (Kernel) : 500 MHz + hbm_aclk (System) : 450 MHz + +Mac Addresses : 00:0A:35:0E:20:B0 + : 00:0A:35:0E:20:B1 + + Device Status: HEALTHY + Hardware Context ID: 0 + Xclbin UUID: 6306D6AE-1D66-AEA7-B15D-446D4ECC53BD + PL Compute Units + Index Name Base Address Usage Status + ------------------------------------------------- + 0 vadd:vadd_1 0x800000 1 (IDLE) +``` + +Basic functionality of the device can be checked using `xbutil validate -d <BDF>` as + +```console +$ xbutil validate -d "0000:88:00.1" +Validate Device : [0000:88:00.1] + Platform : xilinx_u280_gen3x16_xdma_base_1 + SC Version : 4.3.27 + Platform ID : 283BAB8F-654D-8674-968F-4DA57F7FA5D7 +------------------------------------------------------------------------------- +Test 1 [0000:88:00.1] : aux-connection + Test Status : [PASSED] +------------------------------------------------------------------------------- +Test 2 [0000:88:00.1] : pcie-link + Test Status : [PASSED] +------------------------------------------------------------------------------- +Test 3 [0000:88:00.1] : sc-version + Test Status : [PASSED] +------------------------------------------------------------------------------- +Test 4 [0000:88:00.1] : verify + Test Status : [PASSED] +------------------------------------------------------------------------------- +Test 5 [0000:88:00.1] : dma + Details : Buffer size - '16 MB' Memory Tag - 'HBM[0]' + Host -> PCIe -> FPGA write bandwidth = 11988.9 MB/s + Host <- PCIe <- FPGA read bandwidth = 12571.2 MB/s + ... + Test Status : [PASSED] +------------------------------------------------------------------------------- +Test 6 [0000:88:00.1] : iops + Details : IOPS: 387240(verify) + Test Status : [PASSED] +------------------------------------------------------------------------------- +Test 7 [0000:88:00.1] : mem-bw + Details : Throughput (Type: DDR) (Bank count: 2) : 33932.9MB/s + Throughput of Memory Tag: DDR[0] is 16974.1MB/s + Throughput of Memory Tag: DDR[1] is 16974.2MB/s + Throughput (Type: HBM) (Bank count: 1) : 12383.7MB/s + Test Status : [PASSED] +------------------------------------------------------------------------------- +Test 8 [0000:88:00.1] : p2p +Test 9 [0000:88:00.1] : vcu +Test 10 [0000:88:00.1] : aie +Test 11 [0000:88:00.1] : ps-aie +Test 12 [0000:88:00.1] : ps-pl-verify +Test 13 [0000:88:00.1] : ps-verify +Test 14 [0000:88:00.1] : ps-iops +``` + +Finally, the device can be reinitialized using `xbutil reset -d <BDF>` as + +```console +$ xbutil reset -d "0000:88:00.1" +Performing 'HOT Reset' on '0000:88:00.1' +Are you sure you wish to proceed? [Y/n]: Y +Successfully reset Device[0000:88:00.1] +``` + +This can be useful to recover the device from states such as `HANGING`, reported by `xbutil examine -d <BDF>`. + +## OpenCL Platform Level + +The `clinfo` utility can be used to verify that the accelerator is visible to OpenCL + +```console +$ clinfo +Number of platforms: 2 + Platform Profile: FULL_PROFILE + Platform Version: OpenCL 2.1 AMD-APP (3590.0) + Platform Name: AMD Accelerated Parallel Processing + Platform Vendor: Advanced Micro Devices, Inc. + Platform Extensions: cl_khr_icd cl_amd_event_callback + Platform Profile: EMBEDDED_PROFILE + Platform Version: OpenCL 1.0 + Platform Name: Xilinx + Platform Vendor: Xilinx + Platform Extensions: cl_khr_icd +<...> + Platform Name: Xilinx +Number of devices: 2 + Device Type: CL_DEVICE_TYPE_ACCRLERATOR + Vendor ID: 0h + Max compute units: 0 + Max work items dimensions: 3 + Max work items[0]: 4294967295 + Max work items[1]: 4294967295 + Max work items[2]: 4294967295 + Max work group size: 4294967295 + Preferred vector width char: 1 + Preferred vector width short: 1 + Preferred vector width int: 1 + Preferred vector width long: 1 + Preferred vector width float: 1 + Preferred vector width double: 0 + Max clock frequency: 0Mhz + Address bits: 64 + Max memory allocation: 4294967296 + Image support: Yes + Max number of images read arguments: 128 + Max number of images write arguments: 8 + Max image 2D width: 8192 + Max image 2D height: 8192 + Max image 3D width: 2048 + Max image 3D height: 2048 + Max image 3D depth: 2048 + Max samplers within kernel: 0 + Max size of kernel argument: 2048 + Alignment (bits) of base address: 32768 + Minimum alignment (bytes) for any datatype: 128 + Single precision floating point capability + Denorms: No + Quiet NaNs: Yes + Round to nearest even: Yes + Round to zero: No + Round to +ve and infinity: No + IEEE754-2008 fused multiply-add: No + Cache type: None + Cache line size: 64 + Cache size: 0 + Global memory size: 0 + Constant buffer size: 4194304 + Max number of constant args: 8 + Local memory type: Scratchpad + Local memory size: 16384 + Error correction support: 1 + Profiling timer resolution: 1 + Device endianess: Little + Available: No + Compiler available: No + Execution capabilities: + Execute OpenCL kernels: Yes + Execute native function: No + Queue on Host properties: + Out-of-Order: Yes + Profiling: Yes + Platform ID: 0x16fbae8 + Name: xilinx_u280_gen3x16_xdma_base_1 + Vendor: Xilinx + Driver version: 1.0 + Profile: EMBEDDED_PROFILE + Version: OpenCL 1.0 +<...> +``` + +which shows that both `Xilinx` platform and accelerator devices are present. + +## Building Applications for Emulation + +The two main approaches to building FPGA accelerated applications using Xilinx platform and HLS are **XRT** and **OpenCL**. +The XRT uses dedicated host interface and dialect of C for accelerated kernels, +while OpenCL allows to share most of the host interface with other accelerators. + +To simplify the build process we define two environment variables `IT4I_PLATFORM` and `IT4I_BUILD_MODE`. +The first `IT4I_PLATFORM` denotes specific accelerator hardware such as `Alveo u250` or `Alveo u280` +and its configuration stored in (`*.xpfm` files). +The list of available platforms can be obtained using `platforminfo` utility: + +```console +$ platforminfo -l +{ + "platforms": [ + { + "baseName": "xilinx_u280_gen3x16_xdma_1_202211_1", + "version": "202211.1", + "type": "sdaccel", + "dataCenter": "true", + "embedded": "false", + "externalHost": "true", + "serverManaged": "true", + "platformState": "impl", + "usesPR": "true", + "platformFile": "\/opt\/xilinx\/platforms\/xilinx_u280_gen3x16_xdma_1_202211_1\/xilinx_u280_gen3x16_xdma_1_202211_1.xpfm" + }, + { + "baseName": "xilinx_u250_gen3x16_xdma_4_1_202210_1", + "version": "202210.1", + "type": "sdaccel", + "dataCenter": "true", + "embedded": "false", + "externalHost": "true", + "serverManaged": "true", + "platformState": "impl", + "usesPR": "true", + "platformFile": "\/opt\/xilinx\/platforms\/xilinx_u250_gen3x16_xdma_4_1_202210_1\/xilinx_u250_gen3x16_xdma_4_1_202210_1.xpfm" + } + ] +} +``` + +Here, `baseName` and potentially `platformFile` are of interest and either can be specified as value of `IT4I_PLATFORM`. +In this case we have platform files `xilinx_u280_gen3x16_xdma_1_202211_1` (Alveo u280) and `xilinx_u250_gen3x16_xdma_4_1_202210_1` (Alveo u250). + +The `IT4I_BUILD_MODE` is used to specify build type (`hw`, `hw_emu` and `sw_emu`): + +- `hw` performs full synthesis for the accelerator +- `hw_emu` allows to run both synthesis and emulation for debugging +- `sw_emu` compiles kernels only for emulation (doesn't require accelerator and allows much faster build) + +For example, to configure software emulation mode build for `Alveo u280` we set: + +```console +$ export IT4I_PLATFORM=xilinx_u280_gen3x16_xdma_1_202211_1 +$ export IT4I_BUILD_MODE=sw_emu +``` + +### Using HLS and XRT + +The applications are typically separated into host and accelerator/kernel side. +The following host-side code should be saved as `host.cpp` + +```c++ +/* +# Copyright (C) 2023, Advanced Micro Devices, Inc. All rights reserved. +# SPDX-License-Identifier: X11 +*/ +#include <iostream> +#include <cstring> + +// XRT includes +#include "xrt/xrt_bo.h" +#include <experimental/xrt_xclbin.h> +#include "xrt/xrt_device.h" +#include "xrt/xrt_kernel.h" + +#define DATA_SIZE 4096 + +int main(int argc, char** argv) +{ + if(argc != 2) + { + std::cout << "Usage: " << argv[0] << " <XCLBIN File>" << std::endl; + return EXIT_FAILURE; + } + + // Read settings + std::string binaryFile = argv[1]; + int device_index = 0; + + std::cout << "Open the device" << device_index << std::endl; + auto device = xrt::device(device_index); + std::cout << "Load the xclbin " << binaryFile << std::endl; + auto uuid = device.load_xclbin("./vadd.xclbin"); + + size_t vector_size_bytes = sizeof(int) * DATA_SIZE; + + //auto krnl = xrt::kernel(device, uuid, "vadd"); + auto krnl = xrt::kernel(device, uuid, "vadd", xrt::kernel::cu_access_mode::exclusive); + + std::cout << "Allocate Buffer in Global Memory\n"; + auto boIn1 = xrt::bo(device, vector_size_bytes, krnl.group_id(0)); //Match kernel arguments to RTL kernel + auto boIn2 = xrt::bo(device, vector_size_bytes, krnl.group_id(1)); + auto boOut = xrt::bo(device, vector_size_bytes, krnl.group_id(2)); + + // Map the contents of the buffer object into host memory + auto bo0_map = boIn1.map<int*>(); + auto bo1_map = boIn2.map<int*>(); + auto bo2_map = boOut.map<int*>(); + std::fill(bo0_map, bo0_map + DATA_SIZE, 0); + std::fill(bo1_map, bo1_map + DATA_SIZE, 0); + std::fill(bo2_map, bo2_map + DATA_SIZE, 0); + + // Create the test data + int bufReference[DATA_SIZE]; + for (int i = 0; i < DATA_SIZE; ++i) + { + bo0_map[i] = i; + bo1_map[i] = i; + bufReference[i] = bo0_map[i] + bo1_map[i]; //Generate check data for validation + } + + // Synchronize buffer content with device side + std::cout << "synchronize input buffer data to device global memory\n"; + boIn1.sync(XCL_BO_SYNC_BO_TO_DEVICE); + boIn2.sync(XCL_BO_SYNC_BO_TO_DEVICE); + + std::cout << "Execution of the kernel\n"; + auto run = krnl(boIn1, boIn2, boOut, DATA_SIZE); //DATA_SIZE=size + run.wait(); + + // Get the output; + std::cout << "Get the output data from the device" << std::endl; + boOut.sync(XCL_BO_SYNC_BO_FROM_DEVICE); + + // Validate results + if (std::memcmp(bo2_map, bufReference, vector_size_bytes)) + throw std::runtime_error("Value read back does not match reference"); + + std::cout << "TEST PASSED\n"; + return 0; +} +``` + +The host-side code can now be compiled using GCC toolchain as: + +```console +$ g++ host.cpp -I$XILINX_XRT/include -I$XILINX_VIVADO/include -L$XILINX_XRT/lib -lxrt_coreutil -o host +``` + +The accelerator side (simple vector-add kernel) should be saved as `vadd.cpp`. + +```c++ +/* +# Copyright (C) 2023, Advanced Micro Devices, Inc. All rights reserved. +# SPDX-License-Identifier: X11 +*/ + +extern "C" { + void vadd( + const unsigned int *in1, // Read-Only Vector 1 + const unsigned int *in2, // Read-Only Vector 2 + unsigned int *out, // Output Result + int size // Size in integer + ) + { +#pragma HLS INTERFACE m_axi port=in1 bundle=aximm1 +#pragma HLS INTERFACE m_axi port=in2 bundle=aximm2 +#pragma HLS INTERFACE m_axi port=out bundle=aximm1 + + for(int i = 0; i < size; ++i) + { + out[i] = in1[i] + in2[i]; + } + } +} +``` + +The accelerator-side code is build using Vitis `v++`. +This is two-step process, which either builds emulation binary or performs full HLS (depending on the value of `-t` argument). +The platform (specific accelerator) has to be also specified at this step (both for emulation and full HLS). + +```console +$ v++ -c -t $IT4I_BUILD_MODE --platform $IT4I_PLATFORM -k vadd vadd.cpp -o vadd.xo +$ v++ -l -t $IT4I_BUILD_MODE --platform $IT4I_PLATFORM vadd.xo -o vadd.xclbin +``` + +This process should result in `vadd.xclbin`, which can be loaded by host-side application. + +### Running in Emulation Mode + +With both host application and kernel binary at hand the application (in emulation mode) can be launched as + +```console +$ XCL_EMULATION_MODE=sw_emu ./host vadd.xclbin +``` + +## Using HLS and OpenCL + +The host-side application code should be saved as `host.cpp`. +This application attempts to find `Xilinx` OpenCL platform in the system and selects first device in that platform. +The device is then configured with provided kernel binary. +Other than that the only difference to typical vector-add in OpenCL is use of `enqueueTask(...)` to launch the kernel +(compared to typical `enqueueNDRangeKernel`). + +```c++ +#include <iostream> +#include <fstream> +#include <iterator> +#include <vector> + +#define CL_HPP_TARGET_OPENCL_VERSION 120 +#define CL_HPP_MINIMUM_OPENCL_VERSION 120 +#define CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY 1 +#define CL_USE_DEPRECATED_OPENCL_1_2_APIS + +#include <CL/cl2.hpp> +#include <CL/cl_ext_xilinx.h> + +std::vector<unsigned char> read_binary_file(const std::string &filename) +{ + std::cout << "INFO: Reading " << filename << std::endl; + std::ifstream file(filename, std::ios::binary); + file.unsetf(std::ios::skipws); + + std::streampos file_size; + file.seekg(0, std::ios::end); + file_size = file.tellg(); + file.seekg(0, std::ios::beg); + + std::vector<unsigned char> data; + data.reserve(file_size); + data.insert(data.begin(), + std::istream_iterator<unsigned char>(file), + std::istream_iterator<unsigned char>()); + + return data; +} + +cl::Device select_device() +{ + std::vector<cl::Platform> platforms; + cl::Platform::get(&platforms); + cl::Platform platform; + + for(cl::Platform &p: platforms) + { + const std::string name = p.getInfo<CL_PLATFORM_NAME>(); + std::cout << "PLATFORM: " << name << std::endl; + if(name == "Xilinx") + { + platform = p; + break; + } + } + + if(platform == cl::Platform()) + { + std::cout << "Xilinx platform not found!" << std::endl; + exit(EXIT_FAILURE); + } + + std::vector<cl::Device> devices; + platform.getDevices(CL_DEVICE_TYPE_ACCELERATOR, &devices); + return devices[0]; +} + +static const int DATA_SIZE = 1024; + +int main(int argc, char *argv[]) +{ + if(argc != 2) + { + std::cout << "Usage: " << argv[0] << " <XCLBIN File>" << std::endl; + return EXIT_FAILURE; + } + + std::string binary_file = argv[1]; + + std::vector<int> source_a(DATA_SIZE, 10); + std::vector<int> source_b(DATA_SIZE, 32); + + auto program_binary = read_binary_file(binary_file); + cl::Program::Binaries bins{{program_binary.data(), program_binary.size()}}; + + cl::Device device = select_device(); + cl::Context context(device, nullptr, nullptr, nullptr); + cl::CommandQueue q(context, device, CL_QUEUE_PROFILING_ENABLE); + + cl::Program program(context, {device}, bins, nullptr); + + cl::Kernel vadd_kernel = cl::Kernel(program, "vector_add"); + + cl::Buffer buffer_a(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, source_a.size() * sizeof(int), source_a.data()); + cl::Buffer buffer_b(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, source_b.size() * sizeof(int), source_b.data()); + cl::Buffer buffer_res(context, CL_MEM_READ_WRITE, source_a.size() * sizeof(int)); + + int narg = 0; + vadd_kernel.setArg(narg++, buffer_res); + vadd_kernel.setArg(narg++, buffer_a); + vadd_kernel.setArg(narg++, buffer_b); + vadd_kernel.setArg(narg++, DATA_SIZE); + + q.enqueueTask(vadd_kernel); + + std::vector<int> result(DATA_SIZE, 0); + q.enqueueReadBuffer(buffer_res, CL_TRUE, 0, result.size() * sizeof(int), result.data()); + + int mismatch_count = 0; + for(size_t i = 0; i < DATA_SIZE; ++i) + { + int host_result = source_a[i] + source_b[i]; + if(result[i] != host_result) + { + mismatch_count++; + std::cout << "ERROR: " << result[i] << " != " << host_result << std::endl; + break; + } + } + + std::cout << "RESULT: " << (mismatch_count == 0 ? "PASSED" : "FAILED") << std::endl; + + return 0; +} +``` + +The host-side code can now be compiled using GCC toolchain as: + +```console +$ g++ host.cpp -I$XILINX_XRT/include -I$XILINX_VIVADO/include -lOpenCL -o host +``` + +The accelerator side (simple vector-add kernel) should be saved as `vadd.cl`. + +```c++ +#define BUFFER_SIZE 256 +#define DATA_SIZE 1024 + +// TRIPCOUNT indentifier +__constant uint c_len = DATA_SIZE / BUFFER_SIZE; +__constant uint c_size = BUFFER_SIZE; + +__attribute__((reqd_work_group_size(1, 1, 1))) +__kernel void vector_add(__global int* c, + __global const int* a, + __global const int* b, + const int n_elements) +{ + int arrayA[BUFFER_SIZE]; + int arrayB[BUFFER_SIZE]; + + __attribute__((xcl_loop_tripcount(c_len, c_len))) + for (int i = 0; i < n_elements; i += BUFFER_SIZE) + { + int size = BUFFER_SIZE; + + if(i + size > n_elements) + size = n_elements - i; + + __attribute__((xcl_loop_tripcount(c_size, c_size))) + __attribute__((xcl_pipeline_loop(1))) readA: + for(int j = 0; j < size; j++) + arrayA[j] = a[i + j]; + + __attribute__((xcl_loop_tripcount(c_size, c_size))) + __attribute__((xcl_pipeline_loop(1))) readB: + for(int j = 0; j < size; j++) + arrayB[j] = b[i + j]; + + __attribute__((xcl_loop_tripcount(c_size, c_size))) + __attribute__((xcl_pipeline_loop(1))) vadd_writeC: + for(int j = 0; j < size; j++) + c[i + j] = arrayA[j] + arrayB[j]; + } +} +``` + +The accelerator-side code is build using Vitis `v++`. +This is three-step process, which either builds emulation binary or performs full HLS (depending on the value of `-t` argument). +The platform (specific accelerator) has to be also specified at this step (both for emulation and full HLS). + +```console +$ v++ -c -t $IT4I_BUILD_MODE --platform $IT4I_PLATFORM -k vector_add -o vadd.xo vadd.cl +$ v++ -l -t $IT4I_BUILD_MODE --platform $IT4I_PLATFORM -o vadd.link.xclbin vadd.xo +$ v++ -p vadd.link.xclbin -t $IT4I_BUILD_MODE --platform $IT4I_PLATFORM -o vadd.xclbin +``` + +This process should result in `vadd.xclbin`, which can be loaded by host-side application. + +### Running in Emulation Mode + +With both host application and kernel binary at hand the application (in emulation mode) can be launched as + +```console +$ XCL_EMULATION_MODE=sw_emu ./host vadd.xclbin +``` + +## Building Application for Real HW + +So far we have assumed software emulation (`sw_emu`), however the same steps can be used to build application for real hardware. +To do so we have to rebuild our kernel binaries in `hw` mode by setting + +```console +$ export IT4I_BUILD_MODE=hw +``` + +and the application can be run without setting `XCL_EMULATION_MODE`: + +```console +$ ./host vadd.xclbin +``` + +!!! note + The HLS of these simple applications **can take up to 2 hours** to finish. + +## Additional Resources + +- [https://xilinx.github.io/Vitis-Tutorials/][1] +- [http://xilinx.github.io/Vitis_Accel_Examples/][2] + +[1]: https://xilinx.github.io/Vitis-Tutorials/ +[2]: http://xilinx.github.io/Vitis_Accel_Examples/ \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index 423cabe15f9344286e02cae0a82ae87da85ab083..c9397881315db40c0c8f556510195df92a2683b2 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -155,8 +155,10 @@ nav: - Accessing CS: cs/accessing.md - Specification: cs/specifications.md - Complementary System Job Scheduling: cs/job-scheduling.md - - Using AMD Partition: cs/amd.md - - Using ARM Partition: cs/arm.md + - Guides: + - Using AMD Partition: cs/guides/amd.md + - Using ARM Partition: cs/guides/arm.md + - Xilinx Accelerator Platform: cs/guides/xilinx.md - Archive: - Introduction: archive/archive-intro.md - Anselm: