Commit a88f36ce authored by Branislav Jansik's avatar Branislav Jansik

Update introduction.md

# NVIDIA DGX-2
The [DGX-2][a] introduces NVIDIA’s new NVSwitch, enabling 300 GB/s chip-to-chip communication at 12 times the speed of PCIe.
With NVLink2, it enables sixteen NVIDIA V100-SXM3 GPUs in a single system, for a total bandwidth going beyond 14 TB/s.
Featuring a pair of Xeon 8168 CPUs, 1.5 TB of memory, and 30 TB of NVMe storage,
we get a system that consumes 10 kW, weighs 163.29 kg, but offers double precision performance in excess of 130 TF.
Further, the DGX-2 offers a total of ~2 PFLOPs of half precision performance in a single system, when using the tensor cores.
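As a quick sanity check on the figures above, the quoted metric weight converts to the roughly 360 lb figure found in NVIDIA's spec material (a small Python sketch; the conversion factor is the standard pound definition):

```python
# Sanity-check the quoted system weight: 163.29 kg in pounds.
KG_PER_LB = 0.45359237  # exact kg-per-pound definition
weight_kg = 163.29
weight_lb = weight_kg / KG_PER_LB
print(round(weight_lb))  # 360
```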
<div align="center">
<iframe src="https://www.youtube.com/embed/OTOGw0BRqK0" width="50%" height="195" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div>
![](../img/dgx1.png)
The DGX-2 is a very powerful computational node, featuring high-end x86_64 processors and 16 NVIDIA V100-SXM3 GPUs.
| NVIDIA DGX-2 | |
| --- | --- |
| Weight | 360 lbs (163.29 kg) |
| GPU Throughput | Tensor: 1920 TFLOPs, FP16: 480 TFLOPs, FP32: 240 TFLOPs, FP64: 120 TFLOPs |
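The throughput row is consistent with simply scaling a single V100's numbers by the 16 GPUs in the system; dividing the table's totals by 16 recovers the per-GPU figures (a quick Python check):

```python
# System-level throughput from the table above, in TFLOPs.
system_tflops = {"Tensor": 1920, "FP16": 480, "FP32": 240, "FP64": 120}
n_gpus = 16

# Per-GPU figures implied by the totals.
per_gpu = {k: v / n_gpus for k, v in system_tflops.items()}
print(per_gpu)  # {'Tensor': 120.0, 'FP16': 30.0, 'FP32': 15.0, 'FP64': 7.5}
```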
![](../img/dgx2.png)
With the DGX-2, AlexNet, the network that 'started' the latest machine learning revolution, can now be trained in 18 minutes.
The DGX-2 is able to complete the training process
for FAIRSEQ – a neural network model for language translation – 10x faster than a DGX-1 system,
bringing it down to less than two days total rather than 15 days.
![](../img/dgx3.png)
For clustering or further inter-system communications, it also offers InfiniBand.
![](../img/dgx2-nvlink.png){ width=50% }
The new NVSwitches mean that the PCIe lanes of the CPUs can be redirected elsewhere, most notably towards storage and networking connectivity.
The topology of the DGX-2 means that all 16 GPUs are able to pool their memory into a unified memory space,
though with the usual tradeoffs involved if going off-chip.
[a]: https://www.nvidia.com/content/dam/en-zz/es_em/Solutions/Data-Center/dgx-2/nvidia-dgx-2-datasheet.pdf