Update introduction.md

a88f36ce · Branislav Jansik · 2186d339 · a88f36ce
Commit a88f36ce authored 6 years ago by Branislav Jansik
--- a/docs.it4i/dgx2/introduction.md
+++ b/docs.it4i/dgx2/introduction.md
 # NVIDIA DGX-2
-
-The [DGX-2][a] introduces NVIDIA’s new NVSwitch, enabling 300 GB/s chip-to-chip communication at 12 times the speed of PCIe.
-
-With NVLink2, it enables sixteen Nvidia V100-SXM3 GPUs in a single system, for a total bandwidth going beyond 14 TB/s.
-Featuring pair of Xeon 8168 CPUs, 1.5 TB of memory, and 30 TB of NVMe storage,
-we get a system that consumes 10 kW, weighs 163.29 kg, but offers perfomance in excess of 130TF.
-
-NVIDIA likes to tout that this means it offers a total of ~2 PFLOPs of compute performance in a single system, when using the tensor cores.
-
-<div align="center">
-  <iframe src="https://www.youtube.com/embed/OTOGw0BRqK0" width="50%" height="195" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
-</div>
-
-![](../img/dgx1.png)
+The DGX-2 is a high-performance computational node featuring high-end x86_64 processors and 16 NVIDIA V100-SXM3 GPUs.

 | NVIDIA DGX-2  | |
 | --- | --- |
@@ -26,16 +13,26 @@ NVIDIA likes to tout that this means it offers a total of ~2 PFLOPs of compute p
 | Weight | 350 lbs |
 | GPU Throughput | Tensor: 1920 TFLOPs, FP16: 480 TFLOPs, FP32: 240 TFLOPs, FP64: 120 TFLOPs |

-![](../img/dgx2.png)
+The [DGX-2][a] introduces NVIDIA’s new NVSwitch, enabling 300 GB/s chip-to-chip communication at 12 times the speed of PCIe.

-AlexNET, the network that 'started' the latest machine learning revolution, now takes 18 minutes
+Using NVLink2, it connects 16 NVIDIA V100-SXM3 GPUs in a single system, for a total bandwidth exceeding 14 TB/s.
+Featuring a pair of Xeon 8168 CPUs, 1.5 TB of memory, and 30 TB of NVMe storage,
+the system consumes 10 kW and weighs 163.29 kg, but offers double-precision performance in excess of 130 TFLOPs.

-The topology of the DGX-2 means that all 16 GPUs are able to pool their memory into a unified memory space,
-though with the usual tradeoffs involved if going off-chip.
+Further, when using the tensor cores, the DGX-2 offers a total of ~2 PFLOPs of half-precision performance in a single system.
+
+<div align="center">
+  <iframe src="https://www.youtube.com/embed/OTOGw0BRqK0" width="50%" height="195" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+</div>
+
+![](../img/dgx1.png)
+
+
+With the DGX-2, training AlexNet, the network that 'started' the latest machine learning revolution, now takes 18 minutes.

 The DGX-2 is able to complete the training process
 for FAIRSEQ – a neural network model for language translation – 10x faster than a DGX-1 system,
-bringing it down to less than two days total rather than 15.
+bringing it down to less than two days total rather than 15 days.

 ![](../img/dgx3.png)

@@ -46,6 +43,8 @@ For clustering or further inter-system communications, it also offers InfiniBand
 ![](../img/dgx2-nvlink.png){ width=50% }

 The new NVSwitches mean that the PCIe lanes of the CPUs can be redirected elsewhere, most notably towards storage and networking connectivity.
+The topology of the DGX-2 allows all 16 GPUs to pool their memory into a unified memory space,
+though with the usual trade-offs (higher latency, lower bandwidth) when accessing memory off-chip.

 [a]: https://www.nvidia.com/content/dam/en-zz/es_em/Solutions/Data-Center/dgx-2/nvidia-dgx-2-datasheet.pdf