Commit fe6c957a authored by Jan Siwiec

Karolina acceleration partition update

@@ -8,7 +8,7 @@ Standard compute nodes without accelerators (such as GPUs or FPGAs) are based on

* 720 nodes
* 92,160 cores in total
* 2x AMD EPYC™ 7H12, 64-core, 2.6 GHz processors per node
* 256 GB DDR4 3200MT/s of physical memory per node
* 5,324.8 GFLOP/s per compute node
* 1x 100 Gb/s Ethernet

@@ -21,16 +21,15 @@ Standard compute nodes without accelerators (such as GPUs or FPGAs) are based on

Accelerated compute nodes deliver most of the compute power usable for HPC, as well as excellent performance in HPDA and AI workloads, especially in the training phase of deep neural networks.

* 72 nodes
* 9,216 cores in total
* 2x AMD EPYC™ 7763, 64-core, 2.45 GHz processors per node
* 1024 GB DDR4 3200MT/s of physical memory per node
* 8x GPU accelerator NVIDIA A100 per node
* 5,017.6 GFLOP/s per compute node
* 4x 200 Gb/s Ethernet
* 4x 200 Gb/s IB port
* Acn[01-72]

![](img/hpeapollo6500.png)

@@ -41,7 +40,7 @@ Data analytics compute node is oriented on supporting huge memory jobs by implem

* 1x HPE Superdome Flex server
* 768 cores in total
* 32x Intel® Xeon® Platinum, 24-core, 2.9 GHz, 205W
* 24 TB DDR4 2933MT/s of physical memory per node
* 2x 200 Gb/s Ethernet
* 2x 200 Gb/s IB port
* 71.2704 TFLOP/s

@@ -55,7 +54,7 @@ Cloud compute nodes support both the research and operation of the Infrastructur

* 36 nodes
* 4,608 cores in total
* 2x AMD EPYC™ 7H12, 64-core, 2.6 GHz processors per node
* 256 GB DDR4 3200MT/s of physical memory per node
* HPE ProLiant XL225n Gen10 Plus servers
* 5,324.8 GFLOP/s per compute node

@@ -68,15 +67,15 @@ Cloud compute nodes support both the research and operation of the Infrastructur

| Node type                    | Count | Range       | Memory  | Cores          | Queues                    |
| ---------------------------- | ----- | ----------- | ------- | -------------- | ------------------------- |
| Nodes without an accelerator | 720   | Cn[001-720] | 256 GB  | 128 @ 2.6 GHz  | qexp, qprod, qlong, qfree |
| Nodes with a GPU accelerator | 72    | Acn[01-72]  | 1024 GB | 128 @ 2.45 GHz | qnvidia                   |
| Data analytics nodes         | 1     | DAcn1       | 24 TB   | 768 @ 2.9 GHz  | qfat                      |
| Cloud partition              | 36    | CLn[01-36]  | 256 GB  | 128 @ 2.6 GHz  |                           |
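
The per-node peak figures quoted in the lists above follow from cores × clock × FLOPs per cycle. Below is a minimal sketch of that arithmetic, assuming 16 double-precision FLOPs per cycle per core on the AMD EPYC™ nodes (two 256-bit FMA units per core) and 32 on the Cascade Lake Xeons (two AVX-512 FMA units per core); the FLOPs-per-cycle values are assumptions, not stated on this page.

```python
# Sanity check of the per-node theoretical peaks quoted above.
# Assumption: 16 DP FLOPs/cycle/core on the AMD EPYC nodes (AVX2, 2x 256-bit FMA),
#             32 DP FLOPs/cycle/core on the Cascade Lake Xeons (AVX-512, 2x 512-bit FMA).
node_types = {
    # name: (cores per node, base clock in GHz, assumed DP FLOPs per cycle per core)
    "universal / cloud (2x EPYC 7H12)":     (2 * 64,  2.6,  16),
    "accelerated, CPU only (2x EPYC 7763)": (2 * 64,  2.45, 16),
    "data analytics (32x Xeon 8268)":       (32 * 24, 2.9,  32),
}

for name, (cores, clock_ghz, flops_per_cycle) in node_types.items():
    peak_gflops = cores * clock_ghz * flops_per_cycle
    print(f"{name}: {peak_gflops:,.1f} GFLOP/s per node")

# Prints 5,324.8, 5,017.6 and 71,270.4 GFLOP/s (= 71.2704 TFLOP/s),
# matching the figures given in the node lists above.
```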

## Processor Architecture

Karolina is equipped with AMD EPYC™ 7H12 (nodes without accelerators, Cloud partition), AMD EPYC™ 7763 (nodes with accelerators), and Intel Cascade Lake Xeon-SC 8268 (Data analytics partition).

### AMD [EPYC™ 7H12][d]

EPYC™ 7H12 is a 64-bit 64-core x86 server microprocessor designed and introduced by AMD in late 2019. This multi-chip processor, which is based on the Zen 2 microarchitecture, incorporates logic fabricated on TSMC's 7 nm process and I/O fabricated on GlobalFoundries' 14 nm process. The 7H12 has a TDP of 280 W with a base frequency of 2.6 GHz and a boost frequency of up to 3.3 GHz. This processor supports up to two-way SMP and up to 4 TiB of eight-channel DDR4-3200 memory per socket.

@@ -93,22 +92,22 @@ EPYC™ 7H12 is a 64-bit 64-core x86 server microprocessor designed and introduc

* **Process**: 7 nm, 14 nm
* **TDP**: 280 W

### AMD [EPYC™ 7763][e]

EPYC™ 7763 is a 64-bit 64-core x86 server microprocessor designed and introduced by AMD in March 2021. This multi-chip processor, which is based on the Zen 3 microarchitecture, incorporates eight Core Complex Dies fabricated on TSMC's advanced 7 nm process and a large I/O die manufactured by GlobalFoundries. The 7763 has a TDP of 280 W with a base frequency of 2.45 GHz and a boost frequency of up to 3.5 GHz. This processor supports up to two-way SMP and up to 4 TiB of eight-channel DDR4-3200 memory per socket.

* **Family**: EPYC™
* **Cores**: 64
* **Threads**: 128
* **L1I Cache**: 2 MiB, 64x32 KiB, 8-way set associative
* **L1D Cache**: 2 MiB, 64x32 KiB, 8-way set associative, write-back
* **L2 Cache**: 32 MiB, 64x512 KiB, 8-way set associative, write-back
* **L3 Cache**: 256 MiB, 8x32 MiB, 16-way set associative, write-back
* **Instructions**: x86-16, x86-32, x86-64, MMX, EMMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, SSE4a, AVX, AVX2, AES, CLMUL, RdRanD, FMA3, F16C, ABM, BMI1, BMI2, AMD-Vi, AMD-V, SHA, ADX, Real, Protected, SMM, FPU, NX, SMT, SME, TSME, SEV, SenseMI
* **Frequency**: 2.45 GHz
* **Max turbo**: 3.5 GHz
* **Process**: 7 nm
* **TDP**: 280 W

### Intel [Cascade Lake Platinum 8268][f]

@@ -151,6 +150,6 @@ Karolina is equipped with an [NVIDIA A100][g] accelerator.

[c]: https://en.wikichip.org/wiki/x86/avx512vnni
[d]: https://en.wikichip.org/wiki/amd/epyc/7h12
[e]: https://en.wikichip.org/wiki/amd/epyc/7763
[f]: https://en.wikichip.org/wiki/intel/xeon_platinum/8268
[g]: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a100/pdf/a100-80gb-datasheet-update-nvidia-us-1521051-r2-web.pdf

# Hardware Overview

Karolina consists of 829 computational nodes of which 720 are universal compute nodes (**Cn[001-720]**), 72 are NVIDIA A100 accelerated nodes (**Acn[01-72]**), 1 is a data analytics node (**DAcn1**), and 36 are cloud compute nodes (**CLn[01-36]**). Each node is a powerful x86-64 computer, equipped with 128 or 768 cores (64-core AMD EPYC™ 7H12 / 64-core AMD EPYC™ 7763 / 24-core Intel Xeon-SC 8268) and at least 256 GB of RAM.

[User access][5] to Karolina is provided by four login nodes **login[1-4]**. The nodes are interlinked through high-speed InfiniBand and Ethernet networks.

@@ -16,23 +16,23 @@ The parameters are summarized in the following tables:

| Architecture of compute nodes | x86-64 |
| Operating system | Linux |
| **Compute nodes** | |
| Total | 829 |
| Processor cores | 128/768 (2x64 cores/32x24 cores) |
| RAM | min. 256 GB |
| Local disk drive | no |
| Compute network | InfiniBand HDR |
| Universal compute node | 720, Cn[001-720] |
| Accelerated compute nodes | 72, Acn[01-72] |
| Data analytics compute nodes | 1, DAcn1 |
| Cloud compute nodes | 36, CLn[01-36] |
| **In total** | |
| Total theoretical peak performance (Rpeak) | 15.2 PFLOP/s |
| Total amount of RAM | 313 TB |

| Node | Processor | Memory | Accelerator |
| ------------------------ | --------------------------------------- | ------- | ----------- |
| Universal compute node | 2 x AMD Zen 2 EPYC™ 7H12, 2.6 GHz | 256 GB | - |
| Accelerated compute node | 2 x AMD Zen 3 EPYC™ 7763, 2.45 GHz | 1024 GB | NVIDIA A100 |
| Data analytics node | 32 x Intel Xeon-SC 8268, 2.9 GHz | 24 TB | - |
| Cloud compute node | 2 x AMD Zen 2 EPYC™ 7H12, 2.6 GHz | 256 GB | - |
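
As a cross-check, the node and core totals reported above follow from the per-partition counts; a short sketch using only the numbers already listed on this page:

```python
# Cross-check of the totals above from the per-partition node and core counts.
partitions = {
    # name: (nodes, cores per node)
    "universal Cn[001-720]":  (720, 128),  # 2x 64-core EPYC 7H12
    "accelerated Acn[01-72]": (72,  128),  # 2x 64-core EPYC 7763
    "data analytics DAcn1":   (1,   768),  # 32x 24-core Xeon 8268
    "cloud CLn[01-36]":       (36,  128),  # 2x 64-core EPYC 7H12
}

total_nodes = sum(nodes for nodes, _ in partitions.values())
total_cores = sum(nodes * cores for nodes, cores in partitions.values())
print(total_nodes, f"{total_cores:,}")  # 829 nodes, 106,752 cores in total
```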

# Introduction

Karolina is the latest and most powerful supercomputer cluster built for IT4Innovations in Q2 of 2021. The Karolina cluster consists of 829 compute nodes, totaling 106,752 compute cores with 313 TB RAM, giving over 15.2 PFLOP/s of theoretical peak performance, and is ranked among the top 10 most powerful supercomputers in Europe.

Nodes are interconnected through a fully non-blocking fat-tree InfiniBand network and are equipped with AMD Zen 2, AMD Zen 3, and Intel Cascade Lake architecture processors. Seventy-two nodes are also equipped with NVIDIA A100 accelerators. Read more in [Hardware Overview][1].

The cluster runs an operating system compatible with the Red Hat [Linux family][a]. We have installed a wide range of software packages targeted at different scientific domains. These packages are accessible via the [modules environment][2].