Commit 13ca779c authored by Jan Siwiec

Qfree update

parent eadccda7
1 merge request: !406 Qfree update
@@ -21,6 +21,8 @@ Anselm
IT4I
IT4Innovations
PBS
vnode
vnodes
Salomon
TurboVNC
VNC
@@ -812,3 +814,5 @@ PROJECT
e-INFRA
e-INFRA CZ
DICE
qgpu
qcpu
@@ -4,16 +4,7 @@ To run a [job][1], computational resources for this particular job must be alloc
## Resources Allocation Policy
Resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. [The Fair-share][3] ensures that individual users may consume an approximately equal amount of resources per week. The resources are accessible via queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources.
!!! note
    See the queue status for [Karolina][a] or [Barbora][c].
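Queue configuration and current load can also be checked directly from a login node with the standard PBS commands (an illustrative sketch; the exact output format depends on the cluster):

```console
$ qstat -q          # summary of all queues: limits and numbers of queued/running jobs
$ qstat -Qf qcpu    # full attributes of a single queue, e.g. qcpu
```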
@@ -38,7 +29,13 @@ Use GNU Parallel and/or Job arrays when running (many) single core jobs.
In many cases, it is useful to submit a huge (100+) number of computational jobs into the PBS queue system. A huge number of (small) jobs is one of the most effective ways to execute parallel calculations, achieving the best runtime, throughput, and computer utilization. In this chapter, we discuss the recommended way to run huge numbers of jobs, including **ways to run huge numbers of single core jobs**.
Read more on the [Capacity Computing][6] page.
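As an illustrative sketch only (the recommended patterns are described on the Capacity Computing page; the project ID `OPEN-00-00` and `jobscript.sh` are placeholders), a PBS job array submits many subjobs from a single `qsub` call:

```console
$ qsub -q qcpu -A OPEN-00-00 -J 1-100 -l select=1:ncpus=128 -l walltime=02:00:00 jobscript.sh
# each of the 100 subjobs runs jobscript.sh and can read its own index from $PBS_ARRAY_INDEX
```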
## Vnode Allocation
The `qgpu` queue on Karolina takes advantage of the division of nodes into vnodes. An accelerated node equipped with two 64-core processors and eight GPU cards is treated as eight vnodes, each containing 16 CPU cores and 1 GPU card. Vnodes can be allocated to jobs individually: by defining the resource list precisely at job submission, you may allocate a varying number of resources/GPU cards according to your needs.
Read more on the [Vnode Allocation][7] page.
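For instance (an illustrative sketch; `OPEN-00-00` is the placeholder project ID used throughout this documentation), three vnodes, i.e. 3 GPU cards and 48 cores, could be requested interactively as:

```console
$ qsub -q qgpu -A OPEN-00-00 -l select=3 -I
```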
[1]: ../index.md#terminology-frequently-used-on-these-pages
[2]: ../pbspro.md
@@ -46,6 +43,7 @@ Read more on the [Capacity Computing][6] page.
[4]: resources-allocation-policy.md
[5]: job-submission-and-execution.md
[6]: capacity-computing.md
[7]: vnode-allocation.md
[a]: https://extranet.it4i.cz/rsweb/karolina/queues
[b]: https://www.altair.com/pbs-works/
@@ -2,74 +2,100 @@
## Job Queue Policies
Resources are allocated to jobs in a fair-share fashion,
subject to constraints set by the queue and the resources available to the project.
The fair-share system ensures that individual users may consume approximately equal amounts of resources per week.
Detailed information can be found in the [Job scheduling][1] section.

Resources are accessible via several queues for queueing the jobs.
Queues provide prioritized and exclusive access to the computational resources.
!!! important "Queues update"
    We are introducing updated queues.
    These have the same parameters as the legacy queues but are divided based on resource type (`qcpu_` for non-accelerated nodes and `qgpu_` for accelerated nodes).<br><br>
    Note that on Karolina's `qgpu` queue, **you can now allocate 1/8 of the node - 1 GPU and 16 cores**. For more information, see [Allocation of vnodes on qgpu][4].<br><br>
    We have also added completely new queues `qcpu_preempt` and `qgpu_preempt`. For more information, see the table below.
### New Queues
| <div style="width:86px">Queue</div>| Description |
| -------------------------------- | ----------- |
| `qcpu` | Production queue for non-accelerated nodes intended for standard production runs. Requires an active project with nonzero remaining resources. Full nodes are allocated. Identical to `qprod`. |
| `qgpu` | Dedicated queue for accessing the NVIDIA accelerated nodes. Requires an active project with nonzero remaining resources. It utilizes 8x NVIDIA A100 with 320GB HBM2 memory per node. The PI needs to explicitly ask support for authorization to enter the queue for all users associated with their project. **On Karolina, you can allocate 1/8 of the node - 1 GPU and 16 cores**. For more information, see [Allocation of vnodes on qgpu][4]. |
| `qcpu_biz`<br>`qgpu_biz` | Commercial queues, slightly higher priority. |
| `qcpu_eurohpc`<br>`qgpu_eurohpc` | EuroHPC queues, slightly higher priority, **Karolina only**. |
| `qcpu_exp`<br>`qgpu_exp` | Express queues for testing and running very small jobs. Doesn't require a project. There are 2 nodes always reserved (w/o accelerators), max 8 nodes available per user. The nodes may be allocated on a per core basis. It is configured to run one job and accept five jobs in a queue per user. |
| `qcpu_free`<br>`qgpu_free` | Intended for utilization of free resources, after a project has exhausted all its allocated resources. Note that the queue is **not free of charge**. [Normal accounting][2] applies. (Does not apply to DD projects by default; DD projects have to request permission after exhaustion of computational resources.) Consumed resources will be accounted to the Project. Access to the queue is removed if consumed resources exceed 150% of the allocation. Full nodes are allocated. |
| `qcpu_long`<br>`qgpu_long` | Queues for long production runs. Require an active project with nonzero remaining resources. Only 200 nodes without acceleration may be accessed. Full nodes are allocated. |
| `qcpu_preempt`<br>`qgpu_preempt` | Free queues with the lowest priority (LP). The queues require a project with allocation of the respective resource type. There is no limit on resource overdraft. Jobs are killed if other jobs with a higher priority (HP) request the nodes and there are no other nodes available. LP jobs are automatically re-queued once HP jobs finish, so **make sure your jobs are re-runnable**. |
| `qdgx` | Queue for DGX-2, accessible from Barbora. |
| `qfat` | Queue for fat node, PI must request authorization to enter the queue for all users associated to their project. |
| `qviz` | Visualization queue intended for pre-/post-processing using OpenGL accelerated graphics. Each user gets 8 cores of a CPU allocated (approx. 64 GB of RAM and 1/8 of the GPU capacity, the default "chunk"). If more GPU power or RAM is required, it is recommended to allocate more chunks (with 8 cores each), up to one whole node per user. This is currently also the maximum allowed allocation per one user. One hour of work is allocated by default; the user may ask for 2 hours maximum. |
### Legacy Queues
Legacy queues stay in production until the end of 2022.
| Legacy queue | Replaced by |
| ------------ | ------------------------- |
| `qexp` | `qcpu_exp` & `qgpu_exp` |
| `qprod` | `qcpu` |
| `qlong` | `qcpu_long` & `qgpu_long` |
| `qnvidia` | `qgpu`<br>Note that unlike the new queue, the legacy queue allocates only full nodes. |
| `qfree` | `qcpu_free` & `qgpu_free` |
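For illustration (a sketch with a placeholder project ID and resource request, not an excerpt from the documentation), migrating a submission from a legacy queue to its replacement is typically just a change of the `-q` argument:

```console
# legacy submission on Karolina
$ qsub -q qprod -A OPEN-00-00 -l select=2:ncpus=128 -l walltime=24:00:00 ./jobscript.sh

# equivalent submission to the updated queue
$ qsub -q qcpu -A OPEN-00-00 -l select=2:ncpus=128 -l walltime=24:00:00 ./jobscript.sh
```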
The following table provides the queue partitioning per cluster overview:
### Karolina

| Queue            | Active project | Project resources    | Nodes                                                                 | Min ncpus        | Priority | Authorization | Walltime (default/max) |
| ---------------- | -------------- | -------------------- | --------------------------------------------------------------------- | ---------------- | -------- | ------------- | ---------------------- |
| **qcpu**         | yes            | > 0                  | 756 nodes                                                             | 128              | 0        | no            | 24 / 48h               |
| **qcpu_biz**     | yes            | > 0                  | 756 nodes                                                             | 128              | 50       | no            | 24 / 48h               |
| **qcpu_eurohpc** | yes            | > 0                  | 756 nodes                                                             | 128              | 50       | no            | 24 / 48h               |
| **qcpu_exp**     | yes            | none required        | 756 nodes<br>max 2 nodes per user                                     | 128              | 150      | no            | 1 / 1h                 |
| **qcpu_free**    | yes            | < 150% of allocation | 756 nodes<br>max 4 nodes per job                                      | 128              | -100     | no            | 12 / 12h               |
| **qcpu_long**    | yes            | > 0                  | 200 nodes<br>max 20 nodes per job, only non-accelerated nodes allowed | 128              | 0        | no            | 72 / 144h              |
| **qcpu_preempt** | yes            | > 0                  | 756 nodes<br>max 4 nodes per job                                      | 128              | -200     | no            | 12 / 12h               |
| **qgpu**         | yes            | > 0                  | 72 nodes                                                              | 16 cpus<br>1 gpu | 0        | yes           | 24 / 48h               |
| **qgpu_biz**     | yes            | > 0                  | 70 nodes                                                              | 128              | 50       | yes           | 24 / 48h               |
| **qgpu_eurohpc** | yes            | > 0                  | 70 nodes                                                              | 128              | 50       | yes           | 24 / 48h               |
| **qgpu_exp**     | yes            | none required        | 4 nodes<br>max 1 node per job                                         | 16 cpus<br>1 gpu | 0        | no            | 1 / 1h                 |
| **qgpu_free**    | yes            | < 150% of allocation | 46 nodes<br>max 2 nodes per job                                       | 16 cpus<br>1 gpu | -100     | no            | 12 / 12h               |
| **qgpu_preempt** | yes            | > 0                  | 72 nodes<br>max 2 nodes per job                                       | 16 cpus<br>1 gpu | -200     | no            | 12 / 12h               |
| **qviz**         | yes            | none required        | 2 nodes (with NVIDIA® Quadro RTX™ 6000)                               | 8                | 0        | no            | 1 / 8h                 |
| **qfat**         | yes            | > 0                  | 1 (sdf1)                                                              | 24               | 0        | yes           | 24 / 48h               |
| **Legacy Queues** |
| **qfree**        | yes            | < 150% of allocation | 756 nodes<br>max 4 nodes per job                                      | 128              | -100     | no            | 12 / 12h               |
| **qexp**         | no             | none required        | 756 nodes<br>max 2 nodes per job                                      | 128              | 150      | no            | 1 / 1h                 |
| **qprod**        | yes            | > 0                  | 756 nodes                                                             | 128              | 0        | no            | 24 / 48h               |
| **qlong**        | yes            | > 0                  | 200 nodes<br>max 20 nodes per job, only non-accelerated nodes allowed | 128              | 0        | no            | 72 / 144h              |
| **qnvidia**      | yes            | > 0                  | 72 nodes                                                              | 128              | 0        | yes           | 24 / 48h               |
### Barbora

| Queue            | Active project | Project resources    | Nodes                                            | Min ncpus | Priority | Authorization | Walltime (default/max) |
| ---------------- | -------------- | -------------------- | ------------------------------------------------ | --------- | -------- | ------------- | ---------------------- |
| **qcpu**         | yes            | > 0                  | 190 nodes                                        | 36        | 0        | no            | 24 / 48h               |
| **qcpu_biz**     | yes            | > 0                  | 190 nodes                                        | 36        | 50       | no            | 24 / 48h               |
| **qcpu_exp**     | yes            | none required        | 16 nodes                                         | 36        | 150      | no            | 1 / 1h                 |
| **qcpu_free**    | yes            | < 150% of allocation | 124 nodes<br>max 4 nodes per job                 | 36        | -100     | no            | 12 / 18h               |
| **qcpu_long**    | yes            | > 0                  | 60 nodes<br>max 20 nodes per job                 | 36        | 0        | no            | 72 / 144h              |
| **qcpu_preempt** | yes            | > 0                  | 190 nodes<br>max 4 nodes per job                 | 36        | -200     | no            | 12 / 12h               |
| **qgpu**         | yes            | > 0                  | 8 nodes                                          | 24        | 0        | yes           | 24 / 48h               |
| **qgpu_biz**     | yes            | > 0                  | 8 nodes                                          | 24        | 50       | yes           | 24 / 48h               |
| **qgpu_exp**     | yes            | none required        | 4 nodes<br>max 1 node per job                    | 24        | 0        | no            | 1 / 1h                 |
| **qgpu_free**    | yes            | < 150% of allocation | 5 nodes<br>max 2 nodes per job                   | 24        | -100     | no            | 12 / 18h               |
| **qgpu_preempt** | yes            | > 0                  | 4 nodes<br>max 2 nodes per job                   | 24        | -200     | no            | 12 / 12h               |
| **qdgx**         | yes            | > 0                  | cn202                                            | 96        | 0        | yes           | 4 / 48h                |
| **qviz**         | yes            | none required        | 2 nodes with NVIDIA Quadro P6000                 | 4         | 0        | no            | 1 / 8h                 |
| **qfat**         | yes            | > 0                  | 1 fat node                                       | 128       | 0        | yes           | 24 / 48h               |
| **Legacy Queues** |
| **qexp**         | no             | none required        | 16 nodes<br>max 4 nodes per job                  | 36        | 150      | no            | 1 / 1h                 |
| **qprod**        | yes            | > 0                  | 190 nodes w/o accelerator                        | 36        | 0        | no            | 24 / 48h               |
| **qlong**        | yes            | > 0                  | 60 nodes w/o accelerator<br>max 20 nodes per job | 36        | 0        | no            | 72 / 144h              |
| **qnvidia**      | yes            | > 0                  | 8 NVIDIA nodes                                   | 24        | 0        | yes           | 24 / 48h               |
| **qfree**        | yes            | < 150% of allocation | 192 w/o accelerator<br>max 32 nodes per job      | 36        | -100     | no            | 12 / 12h               |
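For illustration (a sketch with a placeholder project ID and job script), a standard production run on Barbora's non-accelerated nodes could be submitted as:

```console
$ qsub -q qcpu -A OPEN-00-00 -l select=4:ncpus=36 -l walltime=24:00:00 ./jobscript.sh
```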
## Queue Notes
@@ -105,92 +131,9 @@ Options:
--get-reservations Print reservations
--get-reservations-details
Print reservations details
--get-nodes Print nodes of PBS complex
--get-nodeset Print nodeset of PBS complex
--get-nodes-details Print nodes details
--get-vnodes Print vnodes of PBS complex
--get-vnodeset Print vnodes nodeset of PBS complex
--get-vnodes-details Print vnodes details
--get-jobs Print jobs
--get-jobs-details Print jobs details
--get-job-nodes Print job nodes
--get-job-nodeset Print job nodeset
--get-job-vnodes Print job vnodes
--get-job-vnodeset Print job vnodes nodeset
--get-jobs-check-params
Print jobid, job state, session_id, user, nodes
--get-users Print users of jobs
--get-allocated-nodes
Print nodes allocated by jobs
--get-allocated-nodeset
Print nodeset allocated by jobs
--get-allocated-vnodes
Print vnodes allocated by jobs
--get-allocated-vnodeset
Print vnodes nodeset allocated by jobs
--get-node-users Print node users
--get-node-jobs Print node jobs
--get-node-ncpus Print number of cpus per node
--get-node-naccelerators
Print number of accelerators per node
--get-node-allocated-ncpus
Print number of allocated cpus per node
--get-node-allocated-naccelerators
Print number of allocated accelerators per node
--get-node-qlist Print node qlist
--get-node-ibswitch Print node ibswitch
--get-vnode-users Print vnode users
--get-vnode-jobs Print vnode jobs
--get-vnode-ncpus Print number of cpus per vnode
--get-vnode-naccelerators
Print number of naccelerators per vnode
--get-vnode-allocated-ncpus
Print number of allocated cpus per vnode
--get-vnode-allocated-naccelerators
Print number of allocated accelerators per vnode
--get-vnode-qlist Print vnode qlist
--get-vnode-ibswitch Print vnode ibswitch
--get-user-nodes Print user nodes
--get-user-nodeset Print user nodeset
--get-user-vnodes Print user vnodes
--get-user-vnodeset Print user vnodes nodeset
--get-user-jobs Print user jobs
--get-user-job-count Print number of jobs per user
--get-user-node-count
Print number of allocated nodes per user
--get-user-vnode-count
Print number of allocated vnodes per user
--get-user-ncpus Print number of allocated ncpus per user
--get-qlist-nodes Print qlist nodes
--get-qlist-nodeset Print qlist nodeset
--get-qlist-vnodes Print qlist vnodes
--get-qlist-vnodeset Print qlist vnodes nodeset
--get-ibswitch-nodes Print ibswitch nodes
--get-ibswitch-nodeset
Print ibswitch nodeset
--get-ibswitch-vnodes
Print ibswitch vnodes
--get-ibswitch-vnodeset
Print ibswitch vnodes nodeset
--last-job Print expected time of last running job
--summary Print summary
--get-node-ncpu-chart
Obsolete. Print chart of allocated ncpus per node
--server=SERVER Use given PBS server
--state=STATE Only for given job state
--jobid=JOBID Only for given job ID
--user=USER Only for given user
--node=NODE Only for given node
--vnode=VNODE Only for given vnode
--nodestate=NODESTATE
Only for given node state (affects only --get-node*
--get-vnode* --get-qlist-* --get-ibswitch-* actions)
--incl-finished Include finished jobs
--walltime-exceeded-used-walltime
Job walltime exceeded - resources_used.walltime
--walltime-exceeded-real-runtime
Job walltime exceeded - real runtime
--backend-sqlite Use SQLite backend - experimental
```
---8<--- "resource_accounting.md"
@@ -200,6 +143,7 @@ Options:
[1]: job-priority.md
[2]: #resource-accounting-policy
[3]: job-submission-and-execution.md
[4]: ./vnode-allocation.md
[a]: https://support.it4i.cz/rt/
[c]: https://extranet.it4i.cz/rsweb
# Allocation of vnodes on qgpu
## Introduction
The `qgpu` queue on Karolina takes advantage of the division of nodes into vnodes.
An accelerated node equipped with two 64-core processors and eight GPU cards is treated as eight vnodes,
each containing 16 CPU cores and 1 GPU card.
Vnodes can be allocated to jobs individually:
by defining the resource list precisely at job submission,
you may allocate a varying number of resources/GPU cards according to your needs.
!!! important "Vnodes and Security"
Division of nodes into vnodes was implemented to be as secure as possible, but it is still a "multi-user mode",
which means that if two users allocate a portion of the same node, they can see each other's running processes.
If this solution is inconvenient for you, consider allocating a whole node.
## Selection Statement and Chunks
Requested resources are specified using a selection statement:
```
-l select=[<N>:]<chunk>[+[<N>:]<chunk> ...]
```
`N` specifies the number of chunks; if not specified then `N = 1`.<br>
`chunk` declares the value of each resource in a set of resources which are to be allocated as a unit to a job.
* `chunk` is seen by MPI as one node.
* Multiple chunks are then seen as multiple nodes.
* Maximum chunk size is equal to the size of a full physical node (8 GPU cards, 128 cores).

The default chunk for the `qgpu` queue is configured to contain 1 GPU card and 16 CPU cores, i.e. `ncpus=16:ngpus=1`.
* `ncpus` specifies number of CPU cores
* `ngpus` specifies number of GPU cards
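For example (an illustrative sketch; `OPEN-00-00` is the placeholder project ID used on this page), spelling the default chunk out explicitly, the following requests two chunks of 16 cores and 1 GPU each, which is equivalent to `-l select=2`:

```console
$ qsub -q qgpu -A OPEN-00-00 -l select=2:ncpus=16:ngpus=1 -I
```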
### Allocating Single GPU
A single GPU can be allocated in an interactive session using
```console
qsub -q qgpu -A OPEN-00-00 -l select=1 -I
```
or simply
```console
qsub -q qgpu -A OPEN-00-00 -I
```
In this case, the `ngpus` parameter is optional, since it defaults to `1`.
You can verify your allocation either in PBS using the `qstat` command,
or by checking the number of allocated GPU cards in the `CUDA_VISIBLE_DEVICES` variable:
```console
$ qstat -F json -f $PBS_JOBID | grep exec_vnode
"exec_vnode":"(acn53[0]:ncpus=16:ngpus=1)"
$ echo $CUDA_VISIBLE_DEVICES
GPU-8772c06c-0e5e-9f87-8a41-30f1a70baa00
```
The output shows that you have been allocated the vnode `acn53[0]`.
### Allocating Single Accelerated Node
!!! tip "Security tip"
Allocating a whole node prevents other users from seeing your running processes.
A single accelerated node can be allocated in an interactive session using
```console
qsub -q qgpu -A OPEN-00-00 -l select=8 -I
```
Setting `select=8` automatically allocates a whole accelerated node and sets `mpiprocs`.
So for `N` full nodes, set `select` to `N x 8`.
However, note that it may take some time before your jobs are executed
if the required number of full nodes isn't available.
### Allocating Multiple GPUs
!!! important "Security risk"
If two users allocate a portion of the same node, they can see each other's running processes.
When required for security reasons, consider allocating a whole node.
Again, the following examples use only the selection statement, so no additional setting is required.
```console
qsub -q qgpu -A OPEN-00-00 -l select=2 -I
```
In this example two chunks will be allocated on the same node, if possible.
```console
qsub -q qgpu -A OPEN-00-00 -l select=16 -I
```
This example allocates two whole accelerated nodes.
Multiple vnodes within the same chunk can be allocated using the `ngpus` parameter.
For example, to allocate 2 vnodes in an interactive mode, run
```console
qsub -q qgpu -A OPEN-00-00 -l select=1:ngpus=2:mpiprocs=2 -I
```
Remember to **set the number of `mpiprocs` equal to that of `ngpus`** to spawn a corresponding number of MPI processes.
To verify the correctness:
```console
$ qstat -F json -f $PBS_JOBID | grep exec_vnode
"exec_vnode":"(acn53[0]:ncpus=16:ngpus=1+acn53[1]:ncpus=16:ngpus=1)"
$ echo $CUDA_VISIBLE_DEVICES | tr ',' '\n'
GPU-8772c06c-0e5e-9f87-8a41-30f1a70baa00
GPU-5e88c15c-e331-a1e4-c80c-ceb3f49c300e
```
The number of chunks to allocate is specified in the `select` parameter.
For example, to allocate 2 chunks, each with 4 GPUs, run
```console
qsub -q qgpu -A OPEN-00-00 -l select=2:ngpus=4:mpiprocs=4 -I
```
To verify the correctness:
```console
$ cat > print-cuda-devices.sh <<EOF
#!/bin/bash
echo \$CUDA_VISIBLE_DEVICES
EOF
$ chmod +x print-cuda-devices.sh
$ ml OpenMPI/4.1.4-GCC-11.3.0
$ mpirun ./print-cuda-devices.sh | tr ',' '\n' | sort | uniq
GPU-0910c544-aef7-eab8-f49e-f90d4d9b7560
GPU-1422a1c6-15b4-7b23-dd58-af3a233cda51
GPU-3dbf6187-9833-b50b-b536-a83e18688cff
GPU-3dd0ae4b-e196-7c77-146d-ae16368152d0
GPU-93edfee0-4cfa-3f82-18a1-1e5f93e614b9
GPU-9c8143a6-274d-d9fc-e793-a7833adde729
GPU-ad06ab8b-99cd-e1eb-6f40-d0f9694601c0
GPU-dc0bc3d6-e300-a80a-79d9-3e5373cb84c9
```
@@ -77,6 +77,7 @@ nav:
- Job Priority: general/job-priority.md
- Job Submission and Execution: general/job-submission-and-execution.md
- Capacity Computing: general/capacity-computing.md
- Vnode Allocation: general/vnode-allocation.md
- Migrating from SLURM: general/slurmtopbs.md
- Technical Information:
- SSH Keys: