# Resources Allocation Policy

## Job Queue Policies

The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and the resources available to the Project. The fair-share system of Anselm ensures that individual users may consume approximately equal amounts of resources per week. Detailed information can be found in the [Job scheduling][1] section. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. The following table provides the queue partitioning overview:

!!! note
    Check the queue status [here][c].

| queue         | active project | project resources    | nodes                        | min ncpus | priority | authorization | walltime |
| ------------- | -------------- | -------------------- | ---------------------------- | --------- | -------- | ------------- | -------- |
| qexp          | no             | none required        | 209 nodes                    | 1         | 150      | no            | 1 h      |
| qprod         | yes            | > 0                  | 180 nodes w/o accelerator    | 16        | 0        | no            | 24/48 h  |
| qlong         | yes            | > 0                  | 180 nodes w/o accelerator    | 16        | 0        | no            | 72/144 h |
| qnvidia, qmic | yes            | > 0                  | 23 nvidia nodes, 4 mic nodes | 16        | 200      | yes           | 24/48 h  |
| qfat          | yes            | > 0                  | 2 fat nodes                  | 16        | 200      | yes           | 24/144 h |
| qfree         | yes            | < 120% of allocation | 180 w/o accelerator          | 16        | -1024    | no            | 12 h     |

!!! note
    **The qfree queue is not free of charge**. [Normal accounting][2] applies.
    However, it allows for utilization of free resources, once a project has exhausted all of its allocated computational resources. This does not apply to Director's Discretion projects (DD projects) by default. Usage of qfree after exhaustion of a DD project's computational resources is allowed upon request for this queue.

**The qexp queue is equipped with nodes which do not all have exactly the same CPU clock speed.** Should you need nodes with exactly the same CPU speed, you have to select the proper nodes during the PBS job submission.

* **qexp**, the Express queue: This queue is dedicated to testing and running very small jobs. It is not required to specify a project to enter the qexp. There are always 2 nodes reserved for this queue (w/o accelerators); a maximum of 8 nodes are available via the qexp for a particular user, from a pool of nodes containing Nvidia accelerated nodes (cn181-203), MIC accelerated nodes (cn204-207), and Fat nodes with 512 GB of RAM (cn208-209). This enables us to test and tune accelerated code and code with higher RAM requirements. The nodes may be allocated on a per-core basis. No special authorization is required to use qexp. The maximum runtime in qexp is 1 hour.
* **qprod**, the Production queue: This queue is intended for normal production runs. It is required that an active project with nonzero remaining resources is specified to enter the qprod. All nodes may be accessed via the qprod queue, except the reserved ones. 178 nodes without accelerators are included. Full nodes, 16 cores per node, are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprod is 48 hours.
* **qlong**, the Long queue: This queue is intended for long production runs. It is required that an active project with nonzero remaining resources is specified to enter the qlong. Only 60 nodes without acceleration may be accessed via the qlong queue. Full nodes, 16 cores per node, are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qlong is 144 hours (three times that of the standard qprod limit, 3 x 48 h).
* **qnvidia, qmic, qfat**, the Dedicated queues: The queue qnvidia is dedicated to accessing the Nvidia accelerated nodes, qmic to accessing the MIC nodes, and qfat to accessing the Fat nodes. It is required that an active project with nonzero remaining resources is specified to enter these queues. 23 nvidia, 4 mic, and 2 fat nodes are included. Full nodes, 16 cores per node, are allocated. The queues run with very high priority; their jobs will be scheduled ahead of jobs coming from the qexp queue. A PI needs to explicitly ask [support][a] for authorization to enter the dedicated queues for all users associated with her/his project.
* **qfree**, the Free resource queue: The queue qfree is intended for utilization of free resources, after a project has exhausted all of its allocated computational resources (this does not apply to DD projects by default; DD projects have to request permission to use qfree after exhaustion of their computational resources). It is required that an active project is specified to enter the queue. Consumed resources will be accounted to the Project. Access to the qfree queue is automatically removed if consumed resources exceed 120% of the resources allocated to the Project. Only 180 nodes without accelerators may be accessed from this queue. Full nodes, 16 cores per node, are allocated. The queue runs with very low priority and no special authorization is required to use it. The maximum runtime in qfree is 12 hours.
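To make the queue policies above concrete, a job may be submitted to a specific queue with `qsub`. A minimal sketch, in which the project ID `OPEN-0-0`, the resource selection, and the script name `./myjob` are illustrative placeholders:

```console
$ qsub -q qprod -A OPEN-0-0 -l select=4:ncpus=16 -l walltime=24:00:00 ./myjob
```

Here `-q` selects the queue, `-A` names the active project to which the consumed resources will be accounted, and full nodes with 16 cores each are requested, as the qprod policy requires. Detailed submission examples are given in [Job submission and execution][3].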
## Queue Notes

The job wall clock time defaults to **half the maximum time**, see the table above. Longer wall time limits can be [set manually, see examples][3].

Jobs that exceed the reserved wall clock time (Req'd Time) get killed automatically. The wall clock time limit can be changed for queued jobs (state Q) using the qalter command; however, it cannot be changed for a running job (state R).

Anselm users may check the current queue configuration [here][b].

## Queue Status

!!! tip
    Check the status of jobs, queues and compute nodes [here][c].
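The qalter adjustment described under Queue Notes might look like this; a sketch, where the job ID `123456` and the new limit are illustrative placeholders:

```console
$ qalter -l walltime=48:00:00 123456
```

This succeeds only while the job is still queued (state Q), and only up to the maximum walltime of the job's queue.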
![rspbs web interface](../img/rsweb.png)

Display the queue status on Anselm:

```console
$ qstat -q
```

The PBS allocation overview may also be obtained using the rspbs command:

```console
$ rspbs
Usage: rspbs [options]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  --get-node-ncpu-chart
                        Print chart of allocated ncpus per node
  --summary             Print summary
  --get-server-details  Print server
  --get-queues          Print queues
  --get-queues-details  Print queues details
  --get-reservations    Print reservations
  --get-reservations-details
                        Print reservations details
  --get-nodes           Print nodes of PBS complex
  --get-nodeset         Print nodeset of PBS complex
  --get-nodes-details   Print nodes details
  --get-jobs            Print jobs
  --get-jobs-details    Print jobs details
  --get-jobs-check-params
                        Print jobid, job state, session_id, user, nodes
  --get-users           Print users of jobs
  --get-allocated-nodes
                        Print allocated nodes of jobs
  --get-allocated-nodeset
                        Print allocated nodeset of jobs
  --get-node-users      Print node users
  --get-node-jobs       Print node jobs
  --get-node-ncpus      Print number of ncpus per node
  --get-node-allocated-ncpus
                        Print number of allocated ncpus per node
  --get-node-qlist      Print node qlist
  --get-node-ibswitch   Print node ibswitch
  --get-user-nodes      Print user nodes
  --get-user-nodeset    Print user nodeset
  --get-user-jobs       Print user jobs
  --get-user-jobc       Print number of jobs per user
  --get-user-nodec      Print number of allocated nodes per user
  --get-user-ncpus      Print number of allocated ncpus per user
  --get-qlist-nodes     Print qlist nodes
  --get-qlist-nodeset   Print qlist nodeset
  --get-ibswitch-nodes  Print ibswitch nodes
  --get-ibswitch-nodeset
                        Print ibswitch nodeset
  --state=STATE         Only for given job state
  --jobid=JOBID         Only for given job ID
  --user=USER           Only for given user
  --node=NODE           Only for given node
  --nodestate=NODESTATE
                        Only for given node state (affects only --get-node*
                        --get-qlist-* --get-ibswitch-* actions)
  --incl-finished       Include finished jobs
```

---8<--- "resource_accounting.md"

---8<--- "mathjax.md"

[1]: job-priority.md
[2]: #resources-accounting-policy
[3]: job-submission-and-execution.md

[a]: https://support.it4i.cz/rt/
[b]: https://extranet.it4i.cz/rsweb/anselm/queues
[c]: https://extranet.it4i.cz/rsweb/anselm/