Below are the technical specifications of the individual Complementary systems.
## Partition 0 - ARM (Cortex-A72)
The partition is based on the [ARMv8-A 64-bit][4] architecture.
- Cortex-A72
- ARMv8-A 64-bit
- 2x 32 cores @ 2 GHz
- 255 GB memory
- disk capacity 3.7 TB
- 1x Infiniband FDR 56 Gb/s
## Partition 1 - ARM (A64FX)
The partition is based on the Armv8.2-A architecture
with the SVE instruction set extension and
consists of 8 compute nodes with the following per-node parameters:
- 1x Fujitsu A64FX CPU
- Armv8.2-A ISA CPU with the Scalable Vector Extension (SVE)
- 48 cores at 2.0 GHz
- 32 GB of HBM2 memory
- 400 GB SSD (M.2 form factor) – mixed-use type
- 1x Infiniband HDR100 interface
- connected to the CPU via a 16x PCI-e Gen3 slot
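The core count and clock above allow a quick sanity check of the node's theoretical peak. A minimal sketch, assuming (not stated in the list) that each A64FX core has two 512-bit SVE FMA pipelines, which gives 32 double-precision FLOPs per core per cycle:

```python
# Theoretical peak double-precision performance of one A64FX node.
# Assumption (not in the spec above): two 512-bit SVE FMA pipelines per core,
# i.e. 2 pipes * 8 DP lanes * 2 FLOPs (fused multiply-add) = 32 FLOPs/cycle.
cores = 48
clock_ghz = 2.0
flops_per_cycle = 2 * (512 // 64) * 2

peak_gflops = cores * clock_ghz * flops_per_cycle
print(f"Peak DP: {peak_gflops / 1000:.3f} TFLOP/s")  # 3.072 TFLOP/s
```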
## Partition 2 - Intel (Ice Lake)
The partition is based on the Intel Ice Lake x86 architecture.
It contains two servers with Intel NVDIMM memories.
<!--- The key technologies installed are Intel NVDIMM memories. and Intel FPGA accelerators.
The partition contains two servers each with two FPGA accelerators. -->
Each server has the following parameters:
- 2x 3rd Gen Xeon Scalable Processors Intel Xeon Gold 6338 CPU
- 32 cores @ 2.0 GHz
- 16x 16GB RAM with ECC
- DDR4-3200
- 1x Infiniband HDR100 interface
- connected to the CPU via an 8x PCI-e Gen4 interface
- 3.2 TB NVMe local storage – mixed use type
In addition, the servers have the following parameters:
- Intel server 1 – low NVDIMM memory server with 2304 GB NVDIMM memory
- 16x 128GB NVDIMM persistent memory modules
- Intel server 2 – high NVDIMM memory server with 8448 GB NVDIMM memory
- 16x 512GB NVDIMM persistent memory modules
Software installed on the partition:
FPGA boards support application development using the following design flows:
- OpenCL
- High-Level Synthesis (C/C++) including support for OneAPI
- Verilog and VHDL
## Partition 3 - AMD (Milan, MI100 GPUs + Xilinx FPGAs)
The partition is based on two servers equipped with AMD Milan x86 CPUs,
AMD GPUs, and Xilinx FPGAs, and represents an alternative
to the Intel-based partition's ecosystem.
Each server has the following parameters:
- 2x AMD Milan 7513 CPU
- 32 cores @ 2.6 GHz
- 16x 16GB RAM with ECC
- DDR4-3200
- 4x AMD GPU accelerators MI 100
- Interconnected with AMD Infinity Fabric™ Link for fast GPU-to-GPU communication
- 1x 100 Gb/s Infiniband HDR100
- connected to the CPU via an 8x PCI-e Gen4 interface
- 3.2 TB NVMe local storage – mixed use
In addition:
- AMD server 1 has 2x FPGA [Xilinx Alveo U250 Data Center Accelerator Card][2]
- AMD server 2 has 2x FPGA [Xilinx Alveo U280 Data Center Accelerator Card][3]
Software installed on the partition:
FPGA boards support application development using the following design flows:
- OpenCL
- High-Level Synthesis (C/C++)
- Verilog and VHDL
- Developer tools and libraries for AMD GPUs
## Partition 4 - Edge Server
The partition provides an overview of the so-called edge computing class of resources,
with solutions powerful enough to provide data analytic capabilities (both CPU and GPU)
in a form factor that does not require a data center to operate.
The partition consists of one edge computing server with the following parameters:
- 1x x86_64 CPU Intel Xeon D-1587
- TDP 65 W,
- 16 cores,
- 435 GFlop/s theoretical max performance in double precision
- 1x CUDA programmable GPU NVIDIA Tesla T4
- TDP 70W
- theoretical performance 8.1 TFlop/s in FP32
- 128 GB RAM
- 1.92TB SSD storage
- connectivity:
- 2x 10 Gbps Ethernet,
- Wi-Fi 802.11ac,
- LTE connectivity
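The two peak figures stated above can be reproduced from per-unit parameters. A minimal sketch, assuming (not listed above) a 1.7 GHz base clock and two 256-bit AVX2 FMA units per core for the Xeon D-1587, and 2560 CUDA cores at a ~1.59 GHz boost clock for the Tesla T4:

```python
# Reproduce the edge node's stated theoretical peaks.
# Assumed figures (not in the list above): Xeon D-1587 base clock 1.7 GHz,
# AVX2 with two 256-bit FMA units -> 16 DP FLOPs/cycle/core; Tesla T4 with
# 2560 CUDA cores at ~1.59 GHz boost, 2 FP32 FLOPs/cycle each (FMA).
cpu_gflops = 16 * 1.7 * 16                 # cores * GHz * FLOPs/cycle
gpu_tflops = 2560 * 1.59e9 * 2 / 1e12      # cores * Hz * FMA FLOPs / 1e12
print(f"CPU FP64 peak: {cpu_gflops:.0f} GFlop/s")  # 435 GFlop/s
print(f"GPU FP32 peak: {gpu_tflops:.1f} TFlop/s")  # 8.1 TFlop/s
```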
## Partition 5 - FPGA Synthesis Server
FPGA design tools usually run from several hours up to a day to generate the final bitstream (logic design) for large FPGA chips. These tools are mostly sequential, so the system includes a dedicated server for this task.
This server runs the development tools needed for the FPGA boards installed in both compute partitions 2 and 3. It has the following parameters:
- AMD EPYC 72F3, 8 cores @ 3.7 GHz nominal frequency
- 8 memory channels with ECC
- 128 GB of DDR4-3200 memory with ECC
- memory is fully populated to maximize memory subsystem performance
- 1x 10Gb Ethernet port used for connection to LAN
- NVMe local storage
- 2x 3.2TB NVMe disks, configured as RAID 1
## Partition 6 - ARM + CUDA GPGPU (Ampere) + DPU
This partition is based on ARM architecture and is equipped with CUDA programmable GPGPU accelerators
based on Ampere architecture and DPU network processing units.
The partition consists of two nodes with the following per-node parameters:
- Server Gigabyte G242-P36, Ampere Altra Q80-30 (80c, 3.0GHz)
- 512GB DIMM DDR4, 3200MHz, ECC, CL22
- 2x Micron 7400 PRO 1920GB NVMe M.2 Non-SED Enterprise SSD
- 2x NVIDIA A30 GPU Accelerator
- 2x NVIDIA BlueField-2 E-Series DPU 25GbE Dual-Port SFP56, PCIe Gen4 x16, 16GB DDR + 64, 200Gb Ethernet
- Mellanox ConnectX-5 EN network interface card, 10/25GbE dual-port SFP28, PCIe3.0 x8
- Mellanox ConnectX-6 VPI adapter card, 100Gb/s (HDR100, EDR IB and 100GbE), single-port QSFP56
## Partition 7 - IBM
The IBM Power10 server is a single-node partition with the following parameters:
- Server IBM POWER S1022
- 2x Power10 12-core processor, typical 2.90 GHz, up to 4.0 GHz (max)
- 512 GB DDIMMs, 3200 MHz, 8 Gbit DDR4
- 2x enterprise 1.6 TB SSD PCIe4 NVMe U.2 module
- 2x enterprise 6.4 TB SSD PCIe4 NVMe U.2 module
- PCIe3 LP 2-port 25/10Gb NIC & RoCE SR/Cu adapter
## Partition 8 - HPE Proliant
This partition provides a modern CPU with a very large L3 cache.
The goal is to enable users to develop algorithms and libraries
that will efficiently utilize this technology.
The processor is very efficient, for example, for linear algebra on relatively small matrices.
This is a single-node partition with the following parameters:
- Server HPE Proliant DL 385 Gen10 Plus v2 CTO
- 2x AMD EPYC 7773X Milan-X, 64 cores, 2.2GHz, 768 MB L3 cache
- 16x HPE 16GB (1x 16GB) x4 DDR4-3200 Registered Smart Memory Kit
- 2x 3.84TB NVMe RI SFF BC U.3ST MV SSD
- BCM 57412 10GbE 2p SFP+ OCP3 Adptr
- HPE IB HDR100/EN 100Gb 1p QSFP56 Adptr
- HPE Cray Programming Environment for x86 Systems 2 Seats
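The 768 MB L3 cache motivates a simple sizing question: how large a dense double-precision matrix multiply fits entirely in one CPU's cache? A rough sketch, assuming all three operands (A, B, C) are kept resident:

```python
import math

# Largest square DGEMM whose three n*n double operands fit in the
# 768 MB L3 cache of one EPYC 7773X (a rough sizing sketch; ignores
# code, stack, and other cache occupants).
l3_bytes = 768 * 2**20
n = math.isqrt(l3_bytes // (3 * 8))  # three n*n matrices of 8-byte doubles
print(n)  # 5792
```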
## Partition 9 - Virtual GPU Accelerated Workstation
This partition provides users with a remote/virtual workstation running MS Windows OS.
It offers a rich graphical environment with a focus on 3D OpenGL
or ray-tracing-based applications with the smallest possible degradation of the user experience.
The partition consists of two nodes with the following per-node parameters:
- Server HPE Proliant DL 385 Gen10 Plus v2 CTO
- 2x AMD EPYC 7413, 24 cores, 2.55GHz
- 16x HPE 32GB 2Rx4 PC4-3200AA-R Smart Kit
- 2x 3.84TB NVMe RI SFF BC U.3ST MV SSD
- BCM 57412 10GbE 2p SFP+ OCP3 Adptr
- 2x NVIDIA A40 48GB GPU Accelerator
### Available Software
The following is the list of software available on partition 9:
- Academic VMware Horizon 8 Enterprise Term Edition: 10 Concurrent User Pack for 4 year term license; includes SnS
- 8x NVIDIA RTX Virtual Workstation, per concurrent user, EDU, perpetual license
- 32x NVIDIA RTX Virtual Workstation, per concurrent user, EDU SUMS per year
- 7x Windows Server 2022 Standard - 16 Core License Pack
- 10x Windows Server 2022 - 1 User CAL
- 40x Windows 10/11 Enterprise E3 VDA (Microsoft) per year
- Hardware VMware Horizon management
## Partition 10 - Sapphire Rapids-HBM Server
The primary purpose of this server is to evaluate the impact of HBM memory on an x86 processor
on the performance of user applications.
Previously, this feature was available only on GPGPU accelerators,
where it provided a significant boost to memory-bound applications.
Users can also compare the impact of HBM memory with the impact of the large L3 cache
available on the AMD Milan-X processor, also part of the Complementary systems.
The server is additionally equipped with DDR5 memory, enabling comparative studies against DDR4-based systems.
- 2x Intel® Xeon® CPU Max 9468, 48 cores, base 2.1 GHz, max 3.5 GHz
- 16x 16GB DDR5 4800 MHz
- 2x Intel D3 S4520 960GB SATA 6Gb/s
- 1x Supermicro Standard LP 2-port 10GbE RJ45, Broadcom BCM57416
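Evaluating the HBM impact typically means measuring effective memory bandwidth on both memory types. A minimal STREAM-like triad sketch (array size and single-shot timing are illustrative, not a tuned benchmark):

```python
import time
import numpy as np

# STREAM-like triad a = b + s*c for a rough effective-bandwidth estimate;
# run it once with data placed in HBM and once in DDR to compare.
n = 20_000_000                      # ~160 MB per array, far larger than cache
b = np.random.rand(n)
c = np.random.rand(n)
s = 3.0

t0 = time.perf_counter()
a = b + s * c                       # reads b and c, writes a
elapsed = time.perf_counter() - t0

moved_gb = 3 * n * 8 / 1e9          # two reads + one write, 8 bytes each
print(f"Effective bandwidth: {moved_gb / elapsed:.1f} GB/s")
```

On real hardware, memory placement would be controlled with a NUMA tool such as `numactl` to bind the arrays to HBM or DDR nodes.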
## Partition 11 - NVIDIA Grace CPU Superchip
The [NVIDIA Grace CPU Superchip][6] uses the [NVIDIA® NVLink®-C2C][5] technology to deliver 144 Arm® Neoverse V2 cores and 1TB/s of memory bandwidth.
It runs all NVIDIA software stacks and platforms, including NVIDIA RTX™, NVIDIA HPC SDK, NVIDIA AI, and NVIDIA Omniverse™.
- Superchip design with up to 144 Arm Neoverse V2 CPU cores with Scalable Vector Extensions (SVE2)
- World’s first LPDDR5X with error-correcting code (ECC) memory, 1TB/s total bandwidth
- 900GB/s coherent interface, 7X faster than PCIe Gen 5
- NVIDIA Scalable Coherency Fabric with 3.2TB/s of aggregate bisectional bandwidth
- 2X the packaging density of DIMM-based solutions
- 2X the performance per watt of today’s leading CPU
- FP64 Peak of 7.1TFLOPS
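The stated 7.1 TFLOPS FP64 peak is consistent with the core count above. A small sketch backing out the implied clock, assuming (not stated above) 16 DP FLOPs per core per cycle, i.e. four 128-bit FMA pipelines per Neoverse V2 core:

```python
# Back out the clock implied by the stated FP64 peak of the Grace Superchip.
# Assumption (not in the spec above): 4 SIMD pipes * 2 DP lanes * 2 FLOPs
# (FMA) = 16 DP FLOPs per core per cycle on Neoverse V2.
peak_flops = 7.1e12
cores = 144
flops_per_cycle = 4 * 2 * 2

clock_ghz = peak_flops / (cores * flops_per_cycle) / 1e9
print(f"Implied clock: {clock_ghz:.2f} GHz")  # ~3.08 GHz
```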
[1]: https://www.bittware.com/fpga/520n-mx/
[2]: https://www.xilinx.com/products/boards-and-kits/alveo/u250.html#overview
[3]: https://www.xilinx.com/products/boards-and-kits/alveo/u280.html#overview
[4]: https://developer.arm.com/documentation/100095/0003/
[5]: https://www.nvidia.com/en-us/data-center/nvlink-c2c/
[6]: https://www.nvidia.com/en-us/data-center/grace-cpu-superchip/