# Accessing Complementary Systems
Complementary systems can be accessed at `login.cs.it4i.cz`
by any user with an active account assigned to an active project.
SSH is required to access the complementary systems.
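For example, a connection from a local terminal could look like the following; `USERNAME` is a placeholder for your own login name:
```console
$ ssh USERNAME@login.cs.it4i.cz
```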
## Data Storage
### Home
The `/home` file system is shared across all complementary systems.
### Scratch
Local `/lscratch` storage is available on the individual nodes.
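A common pattern is to stage data into the node-local scratch at the start of a job and copy results back before the job ends. The fragment below is a minimal sketch only; the `/lscratch` directory layout and cleanup policy are assumptions, and `my_app` is a hypothetical application:
```
# illustrative job-script fragment; the /lscratch layout is an assumption
SCRATCH=/lscratch/$USER/$SLURM_JOB_ID
mkdir -p "$SCRATCH"
cp input.dat "$SCRATCH/" && cd "$SCRATCH"
"$SLURM_SUBMIT_DIR"/my_app input.dat      # hypothetical application
cp results.dat "$SLURM_SUBMIT_DIR/"
```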
### PROJECT
Complementary systems are connected to the [PROJECT storage][1].
[1]: ../storage/project-storage.md
# Complementary Systems
Complementary systems offer a development environment for users
who need to port and optimize their code and applications
for various hardware architectures and software technologies
that are not available on the standard clusters.
The first stage of the complementary systems implementation comprises the following partitions:
- compute partition 0 – based on ARM technology (legacy)
- compute partition 1 – based on ARM technology (A64FX)
- compute partition 2 – based on Intel technologies (Ice Lake, NVDIMMs + Bitware FPGAs)
- compute partition 3 – based on AMD technologies (Milan, MI100 GPUs + Xilinx FPGAs)
- compute partition 4 – edge-type server
- partition 5 – FPGA synthesis server
![](../img/cs1_1.png)
# Complementary System Job Scheduling
## Introduction
The [Slurm][1] workload manager is used to allocate and access the complementary systems' resources.
Display partitions/queues:
```
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
p00-arm up 1-00:00:00 1 idle p00-arm01
p01-arm* up 1-00:00:00 8 idle p01-arm[01-08]
p02-intel up 1-00:00:00 2 idle p02-intel[01-02]
p03-amd up 1-00:00:00 2 idle p03-amd[01-02]
p04-edge up 1-00:00:00 1 idle p04-edge01
p05-synt up 1-00:00:00 1 idle p05-synt01
```
Show jobs:
```
$ squeue --me
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
104 p01-arm interact user R 1:48 2 p01-arm[01-02]
```
Show job details:
```
$ scontrol show job 104
```
Run an interactive job:
```
$ srun -A PROJECT-ID -p p01-arm --pty bash -i
```
Run an interactive job with X11 forwarding:
```
$ srun -A PROJECT-ID -p p01-arm --pty --x11 bash -i
```
Run a batch job:
```
$ sbatch -A PROJECT-ID -p p01-arm ./script.sh
```
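The submitted file is an ordinary shell script. The minimal sketch below uses `#SBATCH` directives as an alternative to the command-line options shown above; the application name is a placeholder:
```
#!/usr/bin/env bash
#SBATCH --account=PROJECT-ID
#SBATCH --partition=p01-arm
#SBATCH --nodes=1
#SBATCH --time=02:00:00

# show which nodes were allocated
echo "Allocated nodes: $SLURM_JOB_NODELIST"

# run the application (placeholder name)
./my_app
```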
Useful command options (`srun`, `sbatch`, `salloc`):
* `-n`, `--ntasks`
* `-c`, `--cpus-per-task`
* `-N`, `--nodes`
| Partition | Nodes | Cores per Node |
| ------ | ------ | ------ |
| p00-arm | 1 | 64 |
| p01-arm | 8 | 48 |
| p02-intel | 2 | 64 |
| p03-amd | 2 | 64 |
| p04-edge | 1 | 16 |
| p05-synt | 1 | 8 |
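For example, the following interactive allocation combines these options to request one p02-intel node with 4 tasks and 8 cores per task (32 of the node's 64 cores); the counts are purely illustrative:
```console
$ srun -A PROJECT-ID -p p02-intel -N 1 -n 4 -c 8 --pty bash -i
```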
Use the `-t`, `--time` option to specify the job time limit. The default time limit is 2 hours, the maximum is 24 hours.
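For example, to request a four-hour limit:
```console
$ sbatch -A PROJECT-ID -p p01-arm -t 04:00:00 ./script.sh
```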
FIFO scheduling with backfilling is employed.
## Partition 00 - ARM (Legacy)
Whole node allocation.
One node:
```console
sbatch -A PROJECT-ID -p p00-arm ./script.sh
```
## Partition 01 - ARM (A64FX)
Whole node allocation.
One node:
```console
sbatch -A PROJECT-ID -p p01-arm ./script.sh
```
```console
sbatch -A PROJECT-ID -p p01-arm -N 1 ./script.sh
```
Multiple nodes:
```console
sbatch -A PROJECT-ID -p p01-arm -N 8 ./script.sh
```
## Partition 02 - Intel (Ice Lake, NVDIMMs + Bitware FPGAs)
Partial allocation per FPGA; resource separation is not enforced.
One FPGA:
```console
sbatch -A PROJECT-ID -p p02-intel --gres=fpga ./script.sh
```
Two FPGAs on the same node:
```console
sbatch -A PROJECT-ID -p p02-intel --gres=fpga:2 ./script.sh
```
All FPGAs:
```console
sbatch -A PROJECT-ID -p p02-intel -N 2 --gres=fpga:2 ./script.sh
```
## Partition 03 - AMD (Milan, MI100 GPUs + Xilinx FPGAs)
Partial allocation per GPU and per FPGA; resource separation is not enforced.
One GPU:
```console
sbatch -A PROJECT-ID -p p03-amd --gres=gpgpu ./script.sh
```
Two GPUs on the same node:
```console
sbatch -A PROJECT-ID -p p03-amd --gres=gpgpu:2 ./script.sh
```
Four GPUs on the same node:
```console
sbatch -A PROJECT-ID -p p03-amd --gres=gpgpu:4 ./script.sh
```
All GPUs:
```console
sbatch -A PROJECT-ID -p p03-amd -N 2 --gres=gpgpu:4 ./script.sh
```
One FPGA:
```console
sbatch -A PROJECT-ID -p p03-amd --gres=fpga ./script.sh
```
Two FPGAs:
```console
sbatch -A PROJECT-ID -p p03-amd --gres=fpga:2 ./script.sh
```
All FPGAs:
```console
sbatch -A PROJECT-ID -p p03-amd -N 2 --gres=fpga:2 ./script.sh
```
One GPU and one FPGA on the same node:
```console
sbatch -A PROJECT-ID -p p03-amd --gres=gpgpu,fpga ./script.sh
```
Four GPUs and two FPGAs on the same node:
```console
sbatch -A PROJECT-ID -p p03-amd --gres=gpgpu:4,fpga:2 ./script.sh
```
All GPUs and FPGAs:
```console
sbatch -A PROJECT-ID -p p03-amd -N 2 --gres=gpgpu:4,fpga:2 ./script.sh
```
## Partition 04 - Edge Server
Whole node allocation:
```console
sbatch -A PROJECT-ID -p p04-edge ./script.sh
```
## Partition 05 - FPGA Synthesis Server
Whole node allocation:
```console
sbatch -A PROJECT-ID -p p05-synt ./script.sh
```
[1]: https://slurm.schedmd.com/
# Complementary Systems Specifications
Below are the technical specifications of individual complementary systems.
## Partition 00 - ARM (Legacy)
The partition is based on the [ARMv8-A 64-bit][4] architecture.
- Cortex-A72
- ARMv8-A 64-bit
- 2x 32 cores @ 2 GHz
- 255 GB memory
- disk capacity 3.7 TB
- 1x Infiniband FDR 56 Gb/s
## Partition 01 - ARM (A64FX)
The partition is based on the Armv8.2-A architecture
with the Scalable Vector Extension (SVE) instruction set extension
and consists of 8 compute nodes with the following per-node parameters:
- 1x Fujitsu A64FX CPU
    - Armv8.2-A ISA with Scalable Vector Extension (SVE)
    - 48 cores at 2.0 GHz
    - 32 GB of HBM2 memory
- 400 GB SSD (M.2 form factor), mixed-use type
- 1x Infiniband HDR100 interface
    - connected to the CPU via a 16x PCI-e Gen3 slot
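A quick way to verify that SVE is exposed to user code is to check the CPU flags on an allocated node; the following should report `sve` on the A64FX nodes:
```console
$ srun -A PROJECT-ID -p p01-arm --pty bash -i
$ grep -o -m1 -w sve /proc/cpuinfo
sve
```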
## Partition 02 - Intel (Ice Lake, NVDIMMs + Bitware FPGAs)
The partition is based on the Intel Ice Lake x86 architecture.
The key technologies installed are Intel NVDIMM memories and Intel FPGA accelerators.
The partition contains two servers, each with two FPGA accelerators.
Each server has the following parameters:
- 2x Intel Xeon Gold 6338 (3rd Gen Xeon Scalable) CPUs
    - 32 cores @ 2.00 GHz
- 16x 16 GB RAM with ECC
    - DDR4-3200
- 1x Infiniband HDR100 interface
    - connected to the CPU via an 8x PCI-e Gen4 interface
- 3.2 TB NVMe local storage, mixed-use type
- 2x FPGA accelerators
    - Bitware [520N-MX][1]
In addition, the servers differ in their NVDIMM configuration:
- Intel server 1 – low NVDIMM memory server with 2304 GB of NVDIMM memory
    - 16x 128 GB NVDIMM persistent memory modules
- Intel server 2 – high NVDIMM memory server with 8448 GB of NVDIMM memory
    - 16x 512 GB NVDIMM persistent memory modules
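Assuming the usual persistent-memory management tools (`ipmctl`, `ndctl`) are available on these servers (not stated on this page), the installed modules and configured namespaces could be inspected with:
```console
$ ipmctl show -dimm     # list the Intel persistent memory modules
$ ndctl list            # list configured namespaces
```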
Software installed on the partition supports application development for the FPGA boards using the following design flows:
- OpenCL
- High-Level Synthesis (C/C++), including support for oneAPI
- Verilog and VHDL
## Partition 03 - AMD (Milan, MI100 GPUs + Xilinx FPGAs)
The partition is based on two servers equipped with AMD Milan x86 CPUs,
AMD GPUs, and Xilinx FPGAs, and represents an alternative
to the ecosystem of the Intel-based partition.
Each server has the following parameters:
- 2x AMD Milan 7513 CPUs
    - 32 cores @ 2.6 GHz
- 16x 16 GB RAM with ECC
    - DDR4-3200
- 4x AMD MI100 GPU accelerators
    - interconnected with AMD Infinity Fabric™ Link for fast GPU-to-GPU communication
- 1x 100 Gbps Infiniband HDR100
    - connected to the CPU via an 8x PCI-e Gen4 interface
- 3.2 TB NVMe local storage, mixed-use type
In addition:
- AMD server 1 has 2x FPGA [Xilinx Alveo U250 Data Center Accelerator Card][2]
- AMD server 2 has 2x FPGA [Xilinx Alveo U280 Data Center Accelerator Card][3]
Software installed on the partition includes developer tools and libraries for AMD GPUs.
The FPGA boards support application development using the following design flows:
- OpenCL
- High-Level Synthesis (C/C++)
- Verilog and VHDL
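Assuming the installed AMD GPU tooling includes the ROCm stack (a reasonable reading of the software listed above, but an assumption), the GPUs on an allocated node can be listed with:
```console
$ rocm-smi              # show the MI100 GPUs, their utilization and memory usage
```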
## Partition 04 - Edge Server
The partition provides an overview of the so-called edge computing class of resources,
i.e. solutions powerful enough to provide data analytic capabilities (both CPU and GPU)
in a form factor that does not require a data center to operate.
The partition consists of one edge computing server with the following parameters:
- 1x x86_64 CPU Intel Xeon D-1587
    - TDP 65 W
    - 16 cores
    - 435 GFlop/s theoretical peak performance in double precision
- 1x CUDA-programmable GPU NVIDIA Tesla T4
    - TDP 70 W
    - 8.1 TFlop/s theoretical peak performance in FP32
- 128 GB RAM
- 1.92 TB SSD storage
- connectivity:
    - 2x 10 Gbps Ethernet
    - WiFi 802.11ac
    - LTE connectivity
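Assuming the standard NVIDIA driver stack is installed alongside the CUDA-programmable T4 (an assumption, though implied by the hardware above), the GPU can be inspected with:
```console
$ nvidia-smi            # show the Tesla T4, driver version and utilization
```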
## Partition 05 - FPGA Synthesis Server
FPGA design tools usually run for several hours up to one day to generate the final bitstream (logic design) for large FPGA chips. These tools are largely sequential, so a dedicated server for this task is part of the system.
This server runs the development tools needed for the FPGA boards installed in compute partitions 2 and 3.
- AMD EPYC 72F3, 8 cores @ 3.7 GHz nominal frequency
- 8 memory channels with ECC
- 128 GB of DDR4-3200 memory with ECC
    - memory fully populated to maximize memory subsystem performance
- 1x 10 Gb Ethernet port for LAN connection
- NVMe local storage
    - 2x 3.2 TB NVMe disks configured as RAID 1
[1]: https://www.bittware.com/fpga/520n-mx/
[2]: https://www.xilinx.com/products/boards-and-kits/alveo/u250.html#overview
[3]: https://www.xilinx.com/products/boards-and-kits/alveo/u280.html#overview
[4]: https://developer.arm.com/documentation/100095/0003/
```
@@ -116,27 +116,32 @@ nav:
  - Visualization Servers: barbora/visualization.md
  - NVIDIA DGX-2:
    - Introduction: dgx2/introduction.md
    - Accessing DGX-2: dgx2/accessing.md
    - Resource Allocation and Job Execution: dgx2/job_execution.md
    - Software deployment: dgx2/software.md
  - Complementary Systems:
    - Introduction: cs/introduction.md
    - Accessing CS: cs/accessing.md
    - Specification: cs/specifications.md
    - Resource Allocation and Job Execution: cs/job-scheduling.md
  - Archive:
    - Introduction: archive/archive-intro.md
    - Anselm:
      - Introduction: anselm/introduction.md
      - Hardware Overview: anselm/hardware-overview.md
      - Compute Nodes: anselm/compute-nodes.md
      - Storage: anselm/storage.md
      - Network: anselm/network.md
    - Salomon:
      - Introduction: salomon/introduction.md
      - Hardware Overview: salomon/hardware-overview.md
      - Compute Nodes: salomon/compute-nodes.md
      - Network:
        - InfiniBand Network: salomon/network.md
        - IB Single-Plane Topology: salomon/ib-single-plane-topology.md
        - 7D Enhanced Hypercube: salomon/7d-enhanced-hypercube.md
      - Storage: salomon/storage.md
      - Visualization Servers: salomon/visualization.md
  - Software:
    - Environment and Modules: environment-and-modules.md
    - Modules:
```