The [DGX-2][a] builds upon the [DGX-1][b] in several ways.
It introduces NVIDIA’s new NVSwitch, enabling 300 GB/s chip-to-chip communication at 12 times the speed of PCIe.
With NVLink2, it enables sixteen NVIDIA V100-SXM3 GPUs to be grouped together in a single system, for a total bandwidth going beyond 14 TB/s.
Featuring a pair of Xeon 8168 CPUs, 1.5 TB of memory, and 30 TB of NVMe storage,
we get a system that consumes 10 kW, weighs 163.29 kg, but offers performance in excess of 130 TF.
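For context on the per-GPU number: each V100 exposes six NVLink2 links at 50 GB/s of bidirectional bandwidth apiece, which is where the 300 GB/s chip-to-chip figure comes from.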
NVIDIA likes to tout that this means it offers a total of ~2 PFLOPs of compute performance in a single system, when using the tensor cores.
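That figure is straight multiplication: each V100 is commonly rated at ~125 TFLOPS of Tensor Core throughput, and 16 × 125 TFLOPS = 2,000 TFLOPS, i.e. roughly 2 PFLOPs.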
AlexNet, the network that 'started' the latest machine learning revolution, now takes 18 minutes to train.
The topology of the DGX-2 means that all 16 GPUs are able to pool their memory into a unified memory space,
though with the usual tradeoffs involved if going off-chip.
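To make the pooled-memory point concrete, here is a minimal CUDA sketch (not from the original text) that enables peer-to-peer access between every GPU pair and allocates managed memory addressable from all of them; the buffer size and the absence of error checking are simplifying assumptions:

```c
/* Minimal sketch: pooling GPU memory on a multi-GPU NVSwitch system.
   Assumes a recent CUDA toolkit; error checking omitted for brevity. */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int count = 0;
    cudaGetDeviceCount(&count);  /* 16 on a DGX-2 */

    /* Enable peer access for every ordered pair of GPUs, so each device
       can read and write the others' memory directly over the fabric. */
    for (int i = 0; i < count; ++i) {
        cudaSetDevice(i);
        for (int j = 0; j < count; ++j) {
            int ok = 0;
            if (i != j && cudaDeviceCanAccessPeer(&ok, i, j) == cudaSuccess && ok)
                cudaDeviceEnablePeerAccess(j, 0);
        }
    }

    /* Managed (unified) memory is visible to every GPU; pages migrate on
       demand, which is where the off-chip access tradeoffs show up. */
    float *buf = NULL;
    cudaMallocManaged(&buf, (size_t)1 << 30);  /* 1 GiB, one shared address space */

    printf("%d GPUs sharing one address space\n", count);
    cudaFree(buf);
    return 0;
}
```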
Not unlike the Tesla V100 memory capacity increase, one of NVIDIA’s goals here is to build a system
that can keep in-memory workloads that would be too large for an 8-GPU cluster.
Providing one such example, NVIDIA says that the DGX-2 is able to complete the training process
for FAIRSEQ – a neural network model for language translation – 10x faster than a DGX-1 system,
bringing it down to less than two days total rather than 15.


Otherwise, similar to its DGX-1 counterpart, the DGX-2 is designed to be a powerful server in its own right.
On the storage side, the DGX-2 comes with 30 TB of NVMe-based solid state storage.
For clustering or further inter-system communications, it also offers InfiniBand and 100GigE connectivity, up to eight ports of them.


The new NVSwitches mean that the PCIe lanes of the CPUs can be redirected elsewhere, most notably towards storage and networking connectivity.