Commit 60e17082 authored by Roman Sliva

docs.it4i/dgx2 - slurm

parent 8d973f1e
Merge request !440: PBS eradication
Pipeline #34177 passed with warnings
@@ -7,7 +7,8 @@
## How to Access
-The DGX-2 machine can be accessed through the scheduler from Barbora login nodes `barbora.it4i.cz` as a compute node cn202.
+The DGX-2 machine is integrated into [Barbora cluster][3].
+The DGX-2 machine can be accessed from Barbora login nodes `barbora.it4i.cz` through the Barbora scheduler queue qdgx as a compute node cn202.
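For illustration, a minimal access sketch (the username and project ID are placeholders; it assumes the standard SSH login followed by an interactive allocation in the qdgx queue, as documented below):

```console
$ ssh user@barbora.it4i.cz
[user@login2.barbora ~]$ salloc -A PROJECT-ID -p qdgx --time=01:00:00
```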
## Storage
@@ -32,3 +33,4 @@ For more information on accessing PROJECT, its quotas, etc., see the [PROJECT Da
[1]: ../../barbora/storage/#home-file-system
[2]: ../../storage/project-storage
+[3]: ../../barbora/introduction
@@ -2,38 +2,24 @@
To run a job, computational resources of DGX-2 must be allocated.
## Resources Allocation Policy
The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue. The queue provides prioritized and exclusive access to computational resources.
The queue for the DGX-2 machine is called **qdgx**.
!!! note
    The qdgx queue is configured to run one job and accept one job in a queue per user with the maximum walltime of a job being **48** hours.
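The configured limits and the current state of the queue can be checked from a login node with standard Slurm queries; a minimal sketch, assuming the default Slurm client tools:

```console
$ sinfo -p qdgx                                   # partition state and node list
$ scontrol show partition qdgx | grep -i maxtime  # configured walltime limit
$ squeue -p qdgx -u $USER                         # your jobs in the qdgx queue
```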
## Job Submission and Execution
-The `qsub` submits the job into the queue. The command creates a request to the PBS Job manager for allocation of specified resources. The resources will be allocated when available, subject to allocation policies and constraints. After the resources are allocated, the jobscript or interactive shell is executed on the allocated node.
### Job Submission
+The DGX-2 machine is integrated into and accessible through the Barbora cluster; the queue for the DGX-2 machine is called **qdgx**.
When allocating computational resources for the job, specify:
-1. a queue for your job (the default is **qdgx**);
-1. the maximum wall time allocated to your calculation (default is **4 hour**, maximum is **48 hour**);
-1. a jobscript or interactive switch.
-!!! info
-    You can access the DGX PBS scheduler by loading the "DGX-2" module.
+1. your Project ID;
+1. a queue for your job - **qdgx**;
+1. the maximum time allocated to your calculation (default is **4 hours**, maximum is **48 hours**);
+1. a jobscript if batch processing is intended.
-Submit the job using the `qsub` command:
+Submit the job using the `sbatch` (for batch processing) or `salloc` (for interactive session) command:
**Example**
```console
-[kru0052@login2.barbora ~]$ qsub -q qdgx -l walltime=02:00:00 -I
-qsub: waiting for job 258.dgx to start
-qsub: job 258.dgx ready
+[kru0052@login2.barbora ~]$ salloc -A PROJECT-ID -p qdgx --time=02:00:00
+salloc: Granted job allocation 36631
+salloc: Waiting for resource configuration
+salloc: Nodes cn202 are ready for job
kru0052@cn202:~$ nvidia-smi
Wed Jun 16 07:46:32 2021
@@ -95,7 +81,7 @@ kru0052@cn202:~$ exit
```
!!! tip
-    Submit the interactive job using the `qsub -I ...` command.
+    Submit the interactive job using the `salloc` command.
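For batch processing, a jobscript submitted with `sbatch` follows the same pattern; a minimal sketch (the script name, project ID, and the command inside the script are illustrative placeholders):

```console
[kru0052@login2.barbora ~]$ cat job.sh
#!/bin/bash
#SBATCH --account=PROJECT-ID
#SBATCH --partition=qdgx
#SBATCH --time=04:00:00
nvidia-smi
[kru0052@login2.barbora ~]$ sbatch job.sh
```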
### Job Execution
@@ -110,9 +96,10 @@ to download the container via Apptainer/Singularity, see the example below:
#### Example - Apptainer/Singularity Run Tensorflow
```console
-[kru0052@login2.barbora ~]$ qsub -q qdgx -l walltime=01:00:00 -I
-qsub: waiting for job 96.dgx to start
-qsub: job 96.dgx ready
+[kru0052@login2.barbora ~]$ salloc -A PROJECT-ID -p qdgx --time=02:00:00
+salloc: Granted job allocation 36633
+salloc: Waiting for resource configuration
+salloc: Nodes cn202 are ready for job
kru0052@cn202:~$ singularity shell docker://nvcr.io/nvidia/tensorflow:19.02-py3
Singularity tensorflow_19.02-py3.sif:~>
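Singularity tensorflow_19.02-py3.sif:~> # Illustrative check (assumed usage, not from the original example): confirm TensorFlow in the container sees the DGX-2 GPUs
Singularity tensorflow_19.02-py3.sif:~> python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
```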