Commit 37559e72 authored by Lukáš Krupčík

Merge branch 'dgx' into 'master'

Dgx

See merge request !249
parents 3fa40c83 6db16c37
Pipeline #6550 passed with stages in 3 minutes and 58 seconds
......@@ -7,32 +7,4 @@
## Shell Access
The DGX-2 can be accessed by SSH protocol via login node ldgx at the address `ldgx.it4i.cz`. [VPN][1] connection is required in order to connect to ldgx.
```console
_ ___ _____ ____ ___ _ ____ ______ __ ____
| \ | \ \ / /_ _| _ \_ _| / \ | _ \ / ___\ \/ / |___ \
| \| |\ \ / / | || | | | | / _ \ | | | | | _ \ /_____ __) |
| |\ | \ V / | || |_| | | / ___ \ | |_| | |_| |/ \_____/ __/
|_| \_| \_/ |___|____/___/_/ \_\ |____/ \____/_/\_\ |_____|
...running on Ubuntu 18.04 (DGX-2)
[kru0052@ldgx ~]$
```
### Authentication
Authentication is available by private key only.
!!! info
Should you need access to the DGX-2 machine, request it at support@it4i.cz.
### Data Transfer
Data in and out of the system may be transferred by the SCP protocol.
!!! warning
The /HOME directory on ldgx is not the same as the /HOME directory on dgx. The /SCRATCH storage is shared between the login node and the DGX-2 machine.
[1]: ../../general/accessing-the-clusters/vpn-access/
\ No newline at end of file
The DGX-2 machine can be accessed via the SSH protocol through the login nodes at the address `loginX.salomon.it4i.cz`.
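As a rough illustration of the address pattern above, the sketch below builds the SSH command for one login node. The helper name `dgx_login_cmd` is hypothetical, and the default node number 4 is only taken from the `login4.salomon` prompts in the examples; which values of `X` are valid is not stated here.

```shell
# Hypothetical helper (not part of the cluster tooling): prints the SSH
# command for a given user and Salomon login node number.
dgx_login_cmd() {
    dgx_user="$1"
    dgx_node="${2:-4}"   # assumption: node 4, as seen in the example prompts
    printf 'ssh %s@login%s.salomon.it4i.cz\n' "$dgx_user" "$dgx_node"
}

# Print the command instead of running it, so it can be inspected first.
dgx_login_cmd kru0052 4
# prints: ssh kru0052@login4.salomon.it4i.cz
```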
......@@ -27,12 +27,17 @@ When allocating computational resources for the job, specify:
!!! note
Right now, the DGX-2 is divided into 16 computational nodes. Every node contains 6 CPUs (3 physical cores + 3 HT cores) and 1 GPU.
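The node layout in the note above implies simple arithmetic for a `select=N` request. The sketch below works it out, assuming that resources scale linearly with the number of selected nodes (6 CPUs and 1 GPU each); the function name `dgx_resources` is made up for illustration.

```shell
# Hypothetical helper: estimates what a select=N request yields, assuming
# each of the 16 nodes contributes 6 CPUs (3 physical + 3 HT) and 1 GPU.
dgx_resources() {
    dgx_nodes="$1"
    # Accept only whole numbers without leading zeros.
    case "$dgx_nodes" in
        ''|*[!0-9]*|0*)
            echo "select must be a whole number between 1 and 16" >&2
            return 1 ;;
    esac
    if [ "$dgx_nodes" -gt 16 ]; then
        echo "select must be a whole number between 1 and 16" >&2
        return 1
    fi
    echo "select=${dgx_nodes}: ${dgx_nodes} GPU(s), $((dgx_nodes * 6)) CPUs ($((dgx_nodes * 3)) physical cores + $((dgx_nodes * 3)) HT cores)"
}

dgx_resources 4
# prints: select=4: 4 GPU(s), 24 CPUs (12 physical cores + 12 HT cores)
```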
!!! info
You can access the DGX PBS scheduler by loading the "DGX-2" module.
Submit the job using the `qsub` command:
**Example for 1 GPU**
```console
[kru0052@ldgx ~]$ qsub -q qdgx -l select=1 -l walltime=04:00:00 -I
[kru0052@login4.salomon ~]$ ml DGX-2
PBS 18.1.3 for DGX-2 machine
[kru0052@login4.salomon ~]$ qsub -q qdgx -l select=1 -l walltime=04:00:00 -I
qsub: waiting for job 257.ldgx to start
qsub: job 257.ldgx ready
......@@ -47,12 +52,18 @@ Thu Mar 14 07:46:01 2019
| 0 Tesla V100-SXM3... On | 00000000:57:00.0 Off | 0 |
| N/A 29C P0 50W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
kru0052@dgx:~$ exit
[kru0052@login4.salomon ~]$ ml purge
PBS 13.1.1 for cluster Salomon
[kru0052@login4.salomon ~]$
```
**Example for 4 GPUs**
```console
[kru0052@ldgx ~]$ qsub -q qdgx -l select=4 -l walltime=04:00:00 -I
[kru0052@login4.salomon ~]$ ml DGX-2
PBS 18.1.3 for DGX-2 machine
[kru0052@login4.salomon ~]$ qsub -q qdgx -l select=4 -l walltime=04:00:00 -I
qsub: waiting for job 256.ldgx to start
qsub: job 256.ldgx ready
......@@ -76,12 +87,18 @@ Thu Mar 14 07:45:29 2019
| 3 Tesla V100-SXM3... On | 00000000:5E:00.0 Off | 0 |
| N/A 35C P0 53W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
kru0052@dgx:~$ exit
[kru0052@login4.salomon ~]$ ml purge
PBS 13.1.1 for cluster Salomon
[kru0052@login4.salomon ~]$
```
**Example for 16 GPUs (all DGX-2)**
```console
[kru0052@ldgx ~]$ qsub -q qdgx -l select=16 -l walltime=04:00:00 -I
[kru0052@login4.salomon ~]$ ml DGX-2
PBS 18.1.3 for DGX-2 machine
[kru0052@login4.salomon ~]$ qsub -q qdgx -l select=16 -l walltime=04:00:00 -I
qsub: waiting for job 258.ldgx to start
qsub: job 258.ldgx ready
......@@ -141,6 +158,10 @@ Thu Mar 14 07:46:32 2019
| 15 Tesla V100-SXM3... On | 00000000:E7:00.0 Off | 0 |
| N/A 34C P0 50W / 350W | 0MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
kru0052@dgx:~$ exit
[kru0052@login4.salomon ~]$ ml purge
PBS 13.1.1 for cluster Salomon
[kru0052@login4.salomon ~]$
```
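The three examples above follow the same pattern: load the `DGX-2` module, submit an interactive job with `qsub`, then run `ml purge` after leaving the job. The sketch below only prints that sequence for a given GPU count so it can be reviewed before typing it on a login node. The function name `dgx_session_plan` is hypothetical, and the 04:00:00 default walltime is simply copied from the examples.

```shell
# Hypothetical helper: prints the interactive-session command sequence
# used in the examples above. Nothing is executed on the scheduler.
dgx_session_plan() {
    dgx_gpus="${1:-1}"
    dgx_walltime="${2:-04:00:00}"   # assumption: default copied from the examples
    echo "ml DGX-2"
    echo "qsub -q qdgx -l select=${dgx_gpus} -l walltime=${dgx_walltime} -I"
    echo "ml purge"
}

dgx_session_plan 4
```

Printing the commands rather than running them keeps the sketch free of side effects; the `ml purge` line belongs after the interactive job has ended.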
!!! tip
......@@ -156,6 +177,8 @@ The jobscript is a user made script controlling a sequence of commands for execu
#### Example - Singularity Run Tensorflow
```console
[kru0052@login4.salomon ~]$ ml DGX-2
PBS 18.1.3 for DGX-2 machine
$ qsub -q qdgx -l select=16 -l walltime=01:00:00 -I
qsub: waiting for job 96.ldgx to start
qsub: job 96.ldgx ready
......@@ -194,6 +217,10 @@ PY 3.5.2 (default, Nov 12 2018, 13:43:14)
70 70.0 30763.2 0.001 0.324 0.10889
80 80.0 30845.5 0.001 0.324 0.02988
90 90.0 26350.9 0.001 0.324 0.00025
kru0052@dgx:~$ exit
[kru0052@login4.salomon ~]$ ml purge
PBS 13.1.1 for cluster Salomon
[kru0052@login4.salomon ~]$
```
**GPU stat**
......