Commit 13ca779c authored by Jan Siwiec

Qfree update

parent eadccda7
Merge request !406: Qfree update
@@ -21,6 +21,8 @@ Anselm
IT4I
IT4Innovations
PBS
vnode
vnodes
Salomon
TurboVNC
VNC
@@ -812,3 +814,5 @@ PROJECT
e-INFRA
e-INFRA CZ
DICE
qgpu
qcpu
@@ -4,16 +4,7 @@ To run a [job][1], computational resources for this particular job must be alloc
## Resources Allocation Policy
Resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and the resources available to the project. [The Fair-share][3] ensures that individual users consume an approximately equal amount of resources per week. The resources are accessible via queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. The following queues are the most important:
* **qexp** - Express queue
* **qprod** - Production queue
* **qlong** - Long queue
* **qmpp** - Massively parallel queue
* **qnvidia**, **qfat** - Dedicated queues
* **qcpu_biz**, **qgpu_biz** - Queues for commercial users
* **qcpu_eurohpc**, **qgpu_eurohpc** - Queues for EuroHPC users
* **qfree** - Free resource utilization queue
!!! note
    See the queue status for [Karolina][a] or [Barbora][c].
@@ -38,7 +29,13 @@ Use GNU Parallel and/or Job arrays when running (many) single core jobs.
In many cases, it is useful to submit a huge (100+) number of computational jobs into the PBS queue system. A huge number of (small) jobs is one of the most effective ways to execute parallel calculations, achieving the best runtime, throughput, and computer utilization. In this chapter, we discuss the recommended way to run huge numbers of jobs, including **ways to run huge numbers of single core jobs**.
Read more on the [Capacity Computing][6] page.
## Vnode Allocation
The `qgpu` queue on Karolina takes advantage of the division of nodes into vnodes. An accelerated node equipped with two 64-core processors and eight GPU cards is treated as eight vnodes, each containing 16 CPU cores and 1 GPU card. Vnodes can be allocated to jobs individually: through a precise definition of the resource list at job submission, you may allocate a varying number of resources/GPU cards according to your needs.
Read more on the [Vnode Allocation][7] page.
[1]: ../index.md#terminology-frequently-used-on-these-pages
[2]: ../pbspro.md
@@ -46,6 +43,7 @@ Read more on the [Capacity Computing][6] page.
[4]: resources-allocation-policy.md
[5]: job-submission-and-execution.md
[6]: capacity-computing.md
[7]: vnode-allocation.md
[a]: https://extranet.it4i.cz/rsweb/karolina/queues
[b]: https://www.altair.com/pbs-works/
# Allocation of vnodes on qgpu
## Introduction
The `qgpu` queue on Karolina takes advantage of the division of nodes into vnodes.
An accelerated node equipped with two 64-core processors and eight GPU cards is treated as eight vnodes,
each containing 16 CPU cores and 1 GPU card.
Vnodes can be allocated to jobs individually:
through a precise definition of the resource list at job submission,
you may allocate a varying number of resources/GPU cards according to your needs.
!!! important "Vnodes and Security"
    Division of nodes into vnodes was implemented to be as secure as possible, but it is still a "multi-user mode",
    which means that if two users allocate a portion of the same node, they can see each other's running processes.
    If this solution is inconvenient for you, consider allocating a whole node.
## Selection Statement and Chunks
Requested resources are specified using a selection statement:
```
-l select=[<N>:]<chunk>[+[<N>:]<chunk> ...]
```
`N` specifies the number of chunks; if not specified then `N = 1`.<br>
`chunk` declares the value of each resource in a set of resources which are to be allocated as a unit to a job.
* `chunk` is seen by the MPI as one node.
* Multiple chunks are then seen as multiple nodes.
* Maximum chunk size is equal to the size of a full physical node (8 GPU cards, 128 CPU cores).
The default chunk for the `qgpu` queue is configured to contain 1 GPU card and 16 CPU cores, i.e. `ncpus=16:ngpus=1`.
* `ncpus` specifies the number of CPU cores
* `ngpus` specifies the number of GPU cards
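As an illustration, the parts of a selection statement can be composed in plain shell before passing them to `qsub`; the chunk values below are examples only, not required site defaults:

```shell
# Sketch: compose a -l select string for the qgpu queue.
CHUNKS=2    # N: number of chunks
NCPUS=16    # CPU cores per chunk
NGPUS=1     # GPU cards per chunk
SELECT="select=${CHUNKS}:ncpus=${NCPUS}:ngpus=${NGPUS}"
echo "-l ${SELECT}"
```

Passing the resulting string to `qsub` via `-l` would request two default-sized chunks, i.e. two vnodes.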
### Allocating Single GPU
A single GPU can be allocated in an interactive session using:
```console
qsub -q qgpu -A OPEN-00-00 -l select=1 -I
```
or simply
```console
qsub -q qgpu -A OPEN-00-00 -I
```
In this case, the `ngpus` parameter is optional, since it defaults to `1`.
You can verify your allocation either in PBS using the `qstat` command,
or by listing the allocated GPU cards in the `CUDA_VISIBLE_DEVICES` variable:
```console
$ qstat -F json -f $PBS_JOBID | grep exec_vnode
"exec_vnode":"(acn53[0]:ncpus=16:ngpus=1)"
$ echo $CUDA_VISIBLE_DEVICES
GPU-8772c06c-0e5e-9f87-8a41-30f1a70baa00
```
The output shows that you have been allocated the vnode `acn53[0]`.
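Since `CUDA_VISIBLE_DEVICES` holds a comma-separated list of GPU UUIDs, the number of allocated cards can be counted directly; the UUIDs below are placeholders for illustration:

```shell
# Sketch: count allocated GPUs from CUDA_VISIBLE_DEVICES.
# The two UUIDs are made-up placeholders, not real device IDs.
CUDA_VISIBLE_DEVICES="GPU-00000000-0000-0000-0000-000000000001,GPU-00000000-0000-0000-0000-000000000002"
NGPUS_SEEN=$(echo "${CUDA_VISIBLE_DEVICES}" | tr ',' '\n' | wc -l | tr -d ' ')
echo "${NGPUS_SEEN}"
```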
### Allocating Single Accelerated Node
!!! tip "Security tip"
    Allocating a whole node prevents other users from seeing your running processes.
A single accelerated node can be allocated in an interactive session using:
```console
qsub -q qgpu -A OPEN-00-00 -l select=8 -I
```
Setting `select=8` automatically allocates a whole accelerated node and sets `mpiprocs` accordingly.
So for `N` full nodes, set `select` to `N x 8`.
However, note that it may take some time before your job is executed
if the required number of full nodes is not available.
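The `N x 8` rule is simple arithmetic; a minimal sketch (the project ID is a placeholder):

```shell
# Sketch: one full accelerated node = 8 vnodes, so select = nodes * 8.
NODES=3
SELECT=$(( NODES * 8 ))
echo "qsub -q qgpu -A OPEN-00-00 -l select=${SELECT} -I"
```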
### Allocating Multiple GPUs
!!! important "Security risk"
    If two users allocate a portion of the same node, they can see each other's running processes.
    When required for security reasons, consider allocating a whole node.
Again, the following examples use only the selection statement, so no additional settings are required.
```console
qsub -q qgpu -A OPEN-00-00 -l select=2 -I
```
In this example, two chunks will be allocated on the same node, if possible.
```console
qsub -q qgpu -A OPEN-00-00 -l select=16 -I
```
This example allocates two whole accelerated nodes.
Multiple vnodes within the same chunk can be allocated using the `ngpus` parameter.
For example, to allocate 2 vnodes in interactive mode, run:
```console
qsub -q qgpu -A OPEN-00-00 -l select=1:ngpus=2:mpiprocs=2 -I
```
Remember to **set the number of `mpiprocs` equal to that of `ngpus`** to spawn the corresponding number of MPI processes.
To verify the correctness:
```console
$ qstat -F json -f $PBS_JOBID | grep exec_vnode
"exec_vnode":"(acn53[0]:ncpus=16:ngpus=1+acn53[1]:ncpus=16:ngpus=1)"
$ echo $CUDA_VISIBLE_DEVICES | tr ',' '\n'
GPU-8772c06c-0e5e-9f87-8a41-30f1a70baa00
GPU-5e88c15c-e331-a1e4-c80c-ceb3f49c300e
```
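To keep `mpiprocs` in lockstep with `ngpus`, it can help to derive both from a single variable when scripting submissions; a minimal sketch:

```shell
# Sketch: derive mpiprocs from ngpus so one MPI rank is spawned per GPU card.
NGPUS=2
MPIPROCS=${NGPUS}
RESOURCES="select=1:ngpus=${NGPUS}:mpiprocs=${MPIPROCS}"
echo "-l ${RESOURCES}"
```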
The number of chunks to allocate is specified in the `select` parameter.
For example, to allocate 2 chunks, each with 4 GPUs, run
```console
qsub -q qgpu -A OPEN-00-00 -l select=2:ngpus=4:mpiprocs=4 -I
```
To verify the correctness:
```console
$ cat > print-cuda-devices.sh <<EOF
#!/bin/bash
echo \$CUDA_VISIBLE_DEVICES
EOF
$ chmod +x print-cuda-devices.sh
$ ml OpenMPI/4.1.4-GCC-11.3.0
$ mpirun ./print-cuda-devices.sh | tr ',' '\n' | sort | uniq
GPU-0910c544-aef7-eab8-f49e-f90d4d9b7560
GPU-1422a1c6-15b4-7b23-dd58-af3a233cda51
GPU-3dbf6187-9833-b50b-b536-a83e18688cff
GPU-3dd0ae4b-e196-7c77-146d-ae16368152d0
GPU-93edfee0-4cfa-3f82-18a1-1e5f93e614b9
GPU-9c8143a6-274d-d9fc-e793-a7833adde729
GPU-ad06ab8b-99cd-e1eb-6f40-d0f9694601c0
GPU-dc0bc3d6-e300-a80a-79d9-3e5373cb84c9
```
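The eight distinct UUIDs printed above are consistent with the requested resources: 2 chunks times 4 GPUs each. As a quick sanity check:

```shell
# Sketch: total GPU cards = chunks * GPUs per chunk (values from the example above).
CHUNKS=2
GPUS_PER_CHUNK=4
TOTAL_GPUS=$(( CHUNKS * GPUS_PER_CHUNK ))
echo "${TOTAL_GPUS}"
```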
@@ -77,6 +77,7 @@ nav:
- Job Priority: general/job-priority.md
- Job Submission and Execution: general/job-submission-and-execution.md
- Capacity Computing: general/capacity-computing.md
- Vnode Allocation: general/vnode-allocation.md
- Migrating from SLURM: general/slurmtopbs.md
- Technical Information:
- SSH Keys: