Merge branch 'gaussian-gpu-support' into 'master'

Update gaussian.md

See merge request !290

c502be19 · Jan Siwiec · 464b4395 · e3bffe42 · c502be19
Commit c502be19 authored 5 years ago by Jan Siwiec
--- a/docs.it4i/software/chemistry/gaussian.md
+++ b/docs.it4i/software/chemistry/gaussian.md
@@ -16,7 +16,7 @@ not in direct or indirect competition with the Gaussian Inc. company and have a
 The license includes GPU support and Linda parallel environment for Gaussian multi-node parallel execution.

 !!! note
-    You need to be a member of the **gaussian group**. Contact support@it4i.cz in order to get included in the gaussian group.
+    You need to be a member of the **gaussian group**. Contact [support\[at\]it4i.cz][b] in order to get included in the gaussian group.

 Check your group membership:

@@ -27,16 +27,19 @@ uid=1000(user) gid=1000(user) groups=1000(user),1234(open-0-0),7310(gaussian)

 ## Installed Version

-Gaussian is available on Salomon, Barbora, and DGX-2 systems in the latest version Gaussian 16 rev. c0.
+Gaussian is available on Anselm, Salomon, Barbora, and DGX-2 systems in the latest version Gaussian 16 rev. c0.

-| Module                                | CPU support | GPU support  | Parallelization | Note                |
-|--------------------------------------|-------------|--------------|-----------------|---------------------|
-| Gaussian/16_rev_c0-binary            | AVX2        | Yes          | SMP             | Binary distribution |
-| Gaussian/16_rev_c0-binary-Linda      | AVX2        | Yes          | SMP + Linda     | Binary distribution |
-| Gaussian/16_rev_c0-CascadeLake       | AVX-512     | No           | SMP             | IT4I compiled       |
-| Gaussian/16_rev_c0-CascadeLake-Linda | AVX-512     | No           | SMP + Linda     | IT4I compiled       |
+| Module                                | CPU support | GPU support  | Parallelization | Note               | Anselm  | Barbora | Salomon | DGX-2 |
+|--------------------------------------|-------------|--------------|-----------------|---------------------|---------|---------|---------|-------|
+| Gaussian/16_rev_c0-binary            | AVX2        | Yes          | SMP             | Binary distribution | No      | Yes     | Yes     | Yes   |
+| Gaussian/16_rev_c0-binary-Linda      | AVX2        | Yes          | SMP + Linda     | Binary distribution | No      | Yes     | Yes     | No    |
+| Gaussian/16_rev_c0-CascadeLake       | AVX-512     | No           | SMP             | IT4I compiled       | No      | Yes     | No      | No    |
+| Gaussian/16_rev_c0-CascadeLake-Linda | AVX-512     | No           | SMP + Linda     | IT4I compiled       | No      | Yes     | No      | No    |
+| Gaussian/16_rev_c0-GPU-Linda         | AVX-512     | Yes          | SMP + Linda     | IT4I compiled       | No      | Yes     | No      | No    |
+| Gaussian/16_rev_c0-GPU               | AVX-512     | Yes          | SMP             | IT4I compiled       | No      | No      | No      | Yes   |
+| Gaussian/16_rev_c0-Linda             | AVX         | No           | SMP + Linda     | IT4I compiled       | Yes     | No      | No      | No    |

-Speedup may be observed on Barbora supercomputer when using the `CascadeLake` module compared to the `binary` module.
+Speedup may be observed on Barbora and DGX-2 systems when using the `CascadeLake` and `GPU` modules compared to the `binary` module.

 ## Running

@@ -44,7 +47,7 @@ Gaussian is compiled for single node parallel execution as well as multi-node pa
 GPU support for V100 cards is available on Barbora and DGX-2.

 !!! note
-    By default, the execution is single-core, single node, without GPU acceleration.
+    By default, the execution is single-core, single-node, and without GPU acceleration.

 ### Shared-Memory Multiprocessor Parallel Execution (Single Node)

@@ -71,7 +74,7 @@ $ ml Gaussian/16_rev_c0-binary-Linda

 The network parallelization environment is **Linda**.
 In the input file Link0 header section, set the CPU cores (24 for Salomon, 36 for Barbora, 48 for DGX-2) and memory amount.
-Include the **%UseSSH keyword**, as well. This enables Linda to spawn parallel workers.
+Include the `%UseSSH` keyword, as well. This enables Linda to spawn parallel workers.

 ```bash
 %CPU=0-35
@@ -81,8 +84,8 @@ Include the **%UseSSH keyword**, as well. This enables Linda to spawn parallel w

 The number and placement of Linda workers may be controlled by %LindaWorkers keyword or by
 GAUSS_WDEF environment variable. When running multi-node job via the PBS batch queue, loading
-the Linda-enabled module **automaticaly sets the `GAUSS_WDEF` variable** to the correct node-list, using one worker per node.
-In combination with the %CPU keyword, this enables a full scale multi-node execution.
+the Linda-enabled module **automatically sets the `GAUSS_WDEF` variable** to the correct node-list, using one worker per node.
+In combination with the %CPU keyword, this enables a full-scale multi-node execution.

 ```bash
 $ echo $GAUSS_WDEF
@@ -100,10 +103,16 @@ Load Linda-enabled binary module
 $ ml Gaussian/16_rev_c0-binary-Linda
 ```

+Or GPU-enabled module
+
+```bash
+$ ml Gaussian/16_rev_c0-GPU-Linda
+```
+
 In the input file Link0 header section, set the CPU cores (24 for Barbora GPU nodes) and memory.
-To enable GPU acceleration, set the **%GPUCPU** keyword. This keyword activates GPU accelerators and dedicates CPU cores to drive the GPU accelerators.
-On Barbora GPU nodes, we activate GPUs 0-3 and assign cores 0,2,12,14 (two from each CPU socket) to drive the accelerators.
-If multi node computation is intended, Include the **%UseSSH keyword**, as well. This enables Linda to spawn parallel workers.
+To enable GPU acceleration, set the `%GPUCPU` keyword. This keyword activates GPU accelerators and dedicates CPU cores to drive the GPU accelerators.
+On Barbora GPU nodes, we activate GPUs 0-3 and assign cores 0, 2, 12, 14 (two from each CPU socket) to drive the accelerators.
+If multi-node computation is intended, include the `%UseSSH` keyword, as well. This enables Linda to spawn parallel workers.

 ```bash
 %CPU=0-23
@@ -112,14 +121,20 @@ If multi node computation is intended, Include the **%UseSSH keyword**, as well.
 %UseSSH
 ```

-GPU accelerated caclulations on the **DGX-2** are supported
-with Gaussian binary module.
+GPU accelerated calculations on the **DGX-2** are supported
+with Gaussian binary module

 ```bash
 $ ml Gaussian/16_rev_c0-binary
 ```

-In the input file Link0 header section, modify  the %CPU keyword to 48 cores and the %GPUCPU keyword to 16 GPU accelerators. Omit the Linda.
+Or IT4I-compiled module
+
+```bash
+$ ml Gaussian/16_rev_c0-GPU
+```
+
+In the input file Link0 header section, modify the `%CPU` keyword to 48 cores and the `%GPUCPU` keyword to 16 GPU accelerators. Omit the Linda.

 ```bash
 %CPU=0-47
@@ -149,8 +164,6 @@ WATER H20 optimization
 R 0.96
 A 109.471221

-
-
 ```

 ### Run Gaussian (All Modes)
@@ -225,4 +238,5 @@ exit
 [1]: ../../salomon/storage.md

 [a]: https://gaussian.com/gaussian16/
+[b]: mailto:support@it4i.cz
 [c]: https://gaussian.com/man/
\ No newline at end of file
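
The availability table added in this change can be read as a lookup from system and execution mode to a module name. The sketch below encodes that lookup as a small shell helper. The function name `pick_gaussian_module` and the mode labels (`smp`, `linda`, `gpu`, `gpu-linda`) are hypothetical and not part of the IT4I documentation. Preferring the IT4I-compiled builds where the table lists them is also an assumption, based on the speedup remark in the diff.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the IT4I docs): map a system and an
# execution mode to one of the module names listed in the table above.
# Usage: pick_gaussian_module <anselm|barbora|salomon|dgx-2> <smp|linda|gpu|gpu-linda>
pick_gaussian_module() {
    local system="$1" mode="$2"
    case "${system}:${mode}" in
        # Anselm: only the AVX, Linda-enabled build is listed.
        anselm:linda)      echo "Gaussian/16_rev_c0-Linda" ;;
        # Barbora: assume the IT4I-compiled builds are preferred where listed.
        barbora:smp)       echo "Gaussian/16_rev_c0-CascadeLake" ;;
        barbora:linda)     echo "Gaussian/16_rev_c0-CascadeLake-Linda" ;;
        barbora:gpu)       echo "Gaussian/16_rev_c0-binary" ;;
        barbora:gpu-linda) echo "Gaussian/16_rev_c0-GPU-Linda" ;;
        # Salomon: only the binary distribution is listed.
        salomon:smp)       echo "Gaussian/16_rev_c0-binary" ;;
        salomon:linda)     echo "Gaussian/16_rev_c0-binary-Linda" ;;
        # DGX-2: single node, so no Linda variants are listed.
        dgx-2:smp)         echo "Gaussian/16_rev_c0-binary" ;;
        dgx-2:gpu)         echo "Gaussian/16_rev_c0-GPU" ;;
        *)
            echo "no module listed for ${system}/${mode}" >&2
            return 1
            ;;
    esac
}
```

On a cluster, the result could be passed straight to the module loader, for example `ml "$(pick_gaussian_module barbora gpu-linda)"`. Combinations that the table marks as unavailable return a non-zero status instead of guessing a module.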