Commit a88f36ce authored by Branislav Jansik

Update introduction.md

parent 2186d339
4 merge requests: !368 Update prace.md to document the change from qprace to qprod as the default..., !367 Update prace.md to document the change from qprace to qprod as the default..., !366 Update prace.md to document the change from qprace to qprod as the default..., !323 extended-acls-storage-section
# NVIDIA DGX-2
The [DGX-2][a] introduces NVIDIA’s new NVSwitch, enabling 300 GB/s chip-to-chip communication at 12 times the speed of PCIe.
With NVLink2, it enables sixteen Nvidia V100-SXM3 GPUs in a single system, for a total bandwidth going beyond 14 TB/s.
Featuring a pair of Xeon 8168 CPUs, 1.5 TB of memory, and 30 TB of NVMe storage,
we get a system that consumes 10 kW, weighs 163.29 kg, but offers performance in excess of 130 TFLOPs.
NVIDIA likes to tout that this means it offers a total of ~2 PFLOPs of compute performance in a single system, when using the tensor cores.
<div align="center">
<iframe src="https://www.youtube.com/embed/OTOGw0BRqK0" width="50%" height="195" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div>
![](../img/dgx1.png)
The DGX-2 is a very powerful computational node, featuring high-end x86_64 processors and 16 NVIDIA V100-SXM3 GPUs.
| NVIDIA DGX-2 | |
| --- | --- |
......@@ -26,16 +13,26 @@ NVIDIA likes to tout that this means it offers a total of ~2 PFLOPs of compute p
| Weight | 360 lbs (163.29 kg) |
| GPU Throughput | Tensor: 1920 TFLOPs, FP16: 480 TFLOPs, FP32: 240 TFLOPs, FP64: 120 TFLOPs |
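
Dividing the aggregate figures in the GPU Throughput row by the 16 GPUs gives an approximate per-GPU rate. The sketch below does only that arithmetic, assuming the load is spread evenly across the GPUs; the dictionary and function names are illustrative and not part of any NVIDIA tooling.

```python
# Aggregate throughput from the GPU Throughput row, in TFLOPs.
AGGREGATE_TFLOPS = {"Tensor": 1920, "FP16": 480, "FP32": 240, "FP64": 120}
GPU_COUNT = 16  # number of V100-SXM3 GPUs in one DGX-2


def per_gpu_tflops(aggregate, gpu_count):
    """Split each aggregate figure evenly across the GPUs (an assumption)."""
    return {precision: total / gpu_count for precision, total in aggregate.items()}


per_gpu = per_gpu_tflops(AGGREGATE_TFLOPS, GPU_COUNT)
for precision, tflops in per_gpu.items():
    print(f"{precision}: {tflops:g} TFLOPs per GPU")
```

With the table values this works out to 120, 30, 15 and 7.5 TFLOPs per GPU for Tensor, FP16, FP32 and FP64 respectively.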
![](../img/dgx2.png)
The [DGX-2][a] introduces NVIDIA’s new NVSwitch, enabling 300 GB/s chip-to-chip communication at 12 times the speed of PCIe.
With NVLink2, it enables 16x Nvidia V100-SXM3 GPUs in a single system, for a total bandwidth going beyond 14 TB/s.
Featuring a pair of Xeon 8168 CPUs, 1.5 TB of memory, and 30 TB of NVMe storage,
we get a system that consumes 10 kW, weighs 163.29 kg, but offers double-precision performance in excess of 130 TFLOPs.
The topology of the DGX-2 means that all 16 GPUs are able to pool their memory into a unified memory space,
though with the usual tradeoffs involved if going off-chip.
Further, the DGX-2 offers a total of ~2 PFLOPs of half precision performance in a single system, when using the tensor cores.
<div align="center">
<iframe src="https://www.youtube.com/embed/OTOGw0BRqK0" width="50%" height="195" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div>
![](../img/dgx1.png)
On the DGX-2, AlexNet, the network that 'started' the latest machine learning revolution, now takes 18 minutes to train.
The DGX-2 is able to complete the training process
for FAIRSEQ – a neural network model for language translation – 10x faster than a DGX-1 system,
bringing it down to less than two days total rather than 15 days.
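
As a quick check of the FAIRSEQ figures, the sketch below applies the quoted 10x speedup to the 15-day DGX-1 baseline. It treats the speedup as exactly 10, which is an assumption; the constant and function names are illustrative only.

```python
DGX1_TRAINING_DAYS = 15  # DGX-1 baseline quoted in the text
SPEEDUP = 10             # quoted DGX-2 speedup, assumed to be exact here


def accelerated_days(baseline_days, speedup):
    """Training time after applying a uniform speedup factor."""
    return baseline_days / speedup


dgx2_days = accelerated_days(DGX1_TRAINING_DAYS, SPEEDUP)
print(f"Estimated DGX-2 training time: {dgx2_days} days ({dgx2_days * 24:g} hours)")
```

Under that assumption the estimate is 1.5 days (36 hours), consistent with the "less than two days" claim.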
![](../img/dgx3.png)
......@@ -46,6 +43,8 @@ For clustering or further inter-system communications, it also offers InfiniBand
![](../img/dgx2-nvlink.png){ width=50% }
The new NVSwitches mean that the PCIe lanes of the CPUs can be redirected elsewhere, most notably towards storage and networking connectivity.
The topology of the DGX-2 means that all 16 GPUs are able to pool their memory into a unified memory space,
though with the usual tradeoffs involved if going off-chip.
[a]: https://www.nvidia.com/content/dam/en-zz/es_em/Solutions/Data-Center/dgx-2/nvidia-dgx-2-datasheet.pdf