Commit 8b725317 authored by Jan Siwiec

Update resources-allocation-policy.md

parent 0ce0b5c0
* **qexp**, Express queue: This queue is dedicated to testing and running very small jobs. Specifying a project is not required to enter qexp. Two nodes (without accelerators) are always reserved for this queue, and a maximum of 8 nodes is available to a single user via qexp. Nodes may be allocated on a per-core basis. No special authorization is required to use the queue. The maximum runtime in qexp is 1 hour.
* **qprod**, Production queue: This queue is intended for normal production runs. An active project with nonzero remaining resources must be specified to enter qprod. All nodes may be accessed via the qprod queue. Full nodes, 128 cores per node, are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprod is 48 hours.
* **qlong**, Long queue: This queue is intended for long production runs. An active project with nonzero remaining resources must be specified to enter qlong. Only 200 nodes without accelerators may be accessed via the qlong queue. Full nodes, 128 cores per node, are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qlong is 144 hours (three times the qprod maximum, 3 x 48 h).
* **qnvidia**, Dedicated queue: This queue is dedicated to accessing the NVIDIA-accelerated nodes. An active project with nonzero remaining resources must be specified to enter this queue. Each node has 8x NVIDIA A100 GPUs with a combined 320 GB of HBM2 memory. Full nodes, 128 cores per node, are allocated. The queue runs with very high priority. The PI needs to explicitly ask [support][a] for authorization to enter the queue for all users associated with their project.
* **qfat**, HPE Superdome Flex queue: This queue is dedicated to accessing the fat HPE Superdome Flex machine. The machine (sdf1) has 768 Intel® Xeon® Platinum cores at 2.9 GHz and 24 TB of RAM. The PI needs to explicitly ask [support][a] for authorization to enter the queue for all users associated with their project.
* **qfree**, Free resource queue: This queue is intended for utilizing free resources after a project has exhausted all of its allocated computational resources. (This does not apply to DD projects by default; DD projects must request permission to use qfree after exhausting their computational resources.) An active project must be specified to enter the queue. Consumed resources are accounted to the project. Access to qfree is automatically removed if consumed resources exceed 120% of the resources allocated to the project. Only 756 nodes without accelerators may be accessed from this queue. Full nodes, 128 cores per node, are allocated. The queue runs with very low priority and no special authorization is required to use it. The maximum runtime in qfree is 12 hours.
* **qviz**, Visualization queue: This queue is intended for pre-/post-processing using OpenGL-accelerated graphics. Currently, each user accessing the node is allocated 8 CPU cores, approximately 64 GB of RAM, and 1/8 of the GPU capacity (the default "chunk"). If more GPU power or RAM is required, it is recommended to allocate more chunks (8 cores each), up to one whole node per user, so that all 64 cores, 256 GB of RAM, and the whole GPU are used exclusively. One whole node is currently also the maximum allowed allocation per user. One hour of walltime is allocated by default; a maximum of 2 hours may be requested.
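The qviz chunk arithmetic can be sketched in Python as follows. This is only an illustration under stated assumptions. The function name `qviz_allocation` is hypothetical. Requests are assumed to be rounded up to whole 8-core chunks. Requests larger than one node per user are assumed to be rejected rather than trimmed. RAM is left out of the calculation.

```python
import math

# Figures taken from the qviz description above.
CORES_PER_CHUNK = 8
CORES_PER_NODE = 64
MAX_CHUNKS = CORES_PER_NODE // CORES_PER_CHUNK  # 8 chunks make one whole node


def qviz_allocation(requested_cores: int) -> tuple[int, int, float]:
    """Return (chunks, cores, gpu_fraction) for a hypothetical qviz request.

    Assumptions (not confirmed scheduler behaviour): requests are rounded
    up to whole 8-core chunks, and anything above one node is rejected.
    """
    if requested_cores < 1:
        raise ValueError("at least one core must be requested")
    chunks = math.ceil(requested_cores / CORES_PER_CHUNK)
    if chunks > MAX_CHUNKS:
        raise ValueError("qviz allows at most one whole node (64 cores) per user")
    cores = chunks * CORES_PER_CHUNK
    gpu_fraction = chunks / MAX_CHUNKS  # each chunk carries 1/8 of the GPU
    return chunks, cores, gpu_fraction


print(qviz_allocation(8))   # (1, 8, 0.125): the default chunk
print(qviz_allocation(20))  # (3, 24, 0.375): rounded up to 3 chunks
print(qviz_allocation(64))  # (8, 64, 1.0): one whole node
```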
### Barbora
* **qexp**, Express queue: This queue is dedicated to testing and running very small jobs. Specifying a project is not required to enter qexp. Two nodes (without accelerators) are always reserved for this queue, and a maximum of 8 nodes is available to a single user via qexp. Nodes may be allocated on a per-core basis. No special authorization is required to use the queue. The maximum runtime in qexp is 1 hour.
* **qprod**, Production queue: This queue is intended for normal production runs. An active project with nonzero remaining resources must be specified to enter qprod. All nodes except the reserved ones may be accessed via the qprod queue; 187 nodes without accelerators are included. Full nodes, 36 cores per node, are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprod is 48 hours.
* **qlong**, Long queue: This queue is intended for long production runs. An active project with nonzero remaining resources must be specified to enter qlong. Only 20 nodes without accelerators may be accessed via the qlong queue. Full nodes, 36 cores per node, are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qlong is 144 hours (three times the qprod maximum, 3 x 48 h).
* **qnvidia**, **qfat**, Dedicated queues: The qnvidia queue is dedicated to accessing the NVIDIA-accelerated nodes, and the qfat queue to accessing the fat node. An active project with nonzero remaining resources must be specified to enter these queues. Included are 8 NVIDIA-accelerated nodes (4 NVIDIA cards per node) and 1 fat node. Full nodes, 24 cores per node, are allocated. The queues run with very high priority. The PI needs to explicitly ask [support][a] for authorization to enter the dedicated queues for all users associated with their project.
* **qfree**, Free resource queue: This queue is intended for utilizing free resources after a project has exhausted all of its allocated computational resources. (This does not apply to DD projects by default; DD projects must request permission to use qfree after exhausting their computational resources.) An active project must be specified to enter the queue. Consumed resources are accounted to the project. Access to qfree is automatically removed if consumed resources exceed 120% of the resources allocated to the project. Only 189 nodes without accelerators may be accessed from this queue. Full nodes, 36 cores per node, are allocated. The queue runs with very low priority and no special authorization is required to use it. The maximum runtime in qfree is 12 hours.
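The 120% qfree rule reduces to a single comparison. The sketch below assumes the following. Consumption and allocation are given in the same unit; core-hours are only an example. Consuming exactly 120% does not yet count as "exceeding". The function name `qfree_access_retained` is hypothetical and not part of any IT4I tool.

```python
# Threshold from the qfree description above: access is removed once
# consumed resources exceed 120 % of the project's allocation.
QFREE_LIMIT_PERCENT = 120


def qfree_access_retained(consumed: float, allocated: float) -> bool:
    """Return True while a project's consumption is within the qfree limit.

    `consumed` and `allocated` must use the same (assumed) unit, such as
    core-hours. Scaling by 100 keeps integer inputs exact.
    """
    if allocated <= 0:
        raise ValueError("the project allocation must be positive")
    return consumed * 100 <= allocated * QFREE_LIMIT_PERCENT


print(qfree_access_retained(1150, 1000))  # True: 115 % of the allocation
print(qfree_access_retained(1200, 1000))  # True: exactly 120 % is not exceeding it
print(qfree_access_retained(1201, 1000))  # False: above 120 %
```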
## Queue Notes
The job wallclock time defaults to **half the maximum time** (see the table above). Longer wallclock time limits can be [set manually, see examples][3].
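The default-walltime rule is sketched below. It uses only the maximum runtimes stated in the queue descriptions above. qnvidia and qfat are left out because no maximum is given for them here. The dictionary and function names are hypothetical, and the HH:MM:SS rendering follows the usual PBS walltime notation.

```python
from datetime import timedelta

# Maximum runtimes (hours) as stated in the queue lists above.
MAX_WALLTIME_HOURS = {
    "qexp": 1,
    "qprod": 48,
    "qlong": 144,
    "qfree": 12,
    "qviz": 2,  # listed under Karolina above
}


def default_walltime(queue: str) -> timedelta:
    """Default job walltime: half of the queue's maximum runtime."""
    return timedelta(hours=MAX_WALLTIME_HOURS[queue]) / 2


def format_walltime(td: timedelta) -> str:
    """Render a timedelta as HH:MM:SS (hours may exceed 24)."""
    total = int(td.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


for queue in MAX_WALLTIME_HOURS:
    print(queue, format_walltime(default_walltime(queue)))
```

For example, this yields 00:30:00 for qexp and 72:00:00 for qlong. The qviz result (01:00:00) matches the one-hour default stated for that queue.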
Jobs that exceed the reserved wallclock time (Req'd Time) are killed automatically. The wallclock time limit of a queued job (state Q) can be changed using the `qalter` command; however, it cannot be changed for a running job (state R).
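The state rule for `qalter` can be expressed as a small guard that builds the command only for a queued job. The `-l walltime=` resource syntax is standard PBS. The helper name, the job ID, and the new walltime below are hypothetical placeholders. Job states other than Q and R are deliberately treated as out of scope.

```python
import shlex


def walltime_change_command(job_id: str, state: str, new_walltime: str) -> list[str]:
    """Build a `qalter` argument list that changes a job's walltime.

    Only queued jobs (state "Q") are accepted; running jobs (state "R")
    are rejected, as described above. Other states are not handled here.
    """
    if state == "R":
        raise ValueError("walltime cannot be changed for a running job (state R)")
    if state != "Q":
        raise ValueError(f"unsupported job state {state!r}; expected 'Q'")
    return ["qalter", "-l", f"walltime={new_walltime}", job_id]


cmd = walltime_change_command("123456.example-server", "Q", "36:00:00")
print(shlex.join(cmd))  # qalter -l walltime=36:00:00 123456.example-server
```

On a login node, the resulting list could be passed to `subprocess.run(cmd, check=True)`. That step is omitted here because it needs a real PBS job ID.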
You can check the current queue configuration on rsweb: [Barbora][b].
## Queue Status
!!! tip
[3]: job-submission-and-execution.md
[a]: https://support.it4i.cz/rt/
[b]: https://extranet.it4i.cz/rsweb/barbora/queues
[c]: https://extranet.it4i.cz/rsweb
[d]: https://extranet.it4i.cz/rsweb