# HyperQueue
HyperQueue lets you build a computation plan consisting of a large number of tasks and then execute it transparently over a system like Slurm/PBS.
It dynamically groups tasks into Slurm jobs and distributes them to fully utilize allocated nodes.
You thus do not have to manually aggregate your tasks into Slurm jobs.
Find more about HyperQueue in its [documentation][a].

## Features
* **Transparent task execution on top of a Slurm/PBS cluster**
* Automatic task distribution amongst jobs, nodes, and cores
* Automatic submission of PBS/Slurm jobs
* Work-stealing scheduler
* NUMA-aware, core planning, task priorities, task arrays
* Nodes and tasks may be added/removed on the fly
* Low overhead per task (~100μs)
* Handles hundreds of nodes and millions of tasks
* Output streaming avoids creating many files on network filesystems
* Single binary, no installation, depends only on *libc*
* No elevated privileges required
## Installation
* On Barbora and Karolina, you can simply load the HyperQueue module:
```console
$ ml HyperQueue
```
* If you want to install/compile HyperQueue manually, follow the steps on the [official webpage][b].
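After loading the module (or installing the binary), you can verify that HyperQueue is available by printing its version:
```console
$ hq --version
```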
## Usage
### Starting the Server
To use HyperQueue, you first have to start the HyperQueue server. It is a long-lived process that
should run on a login node. You can start it with the following command:
```console
$ hq server start
```
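To verify that the server is up, you can query it for information; `hq server info` prints details about the running server:
```console
$ hq server info
```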
### Submitting Computation
Once the HyperQueue server is running, you can submit jobs to it. Here are a few examples of job submissions.
You can find more information in the [documentation][1].
* Submit a simple job (command `echo 'Hello world'` in this case)
```console
$ hq submit echo 'Hello world'
```
* Submit a job with 10000 tasks
```console
$ hq submit --array 1-10000 my-script.sh
```
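HyperQueue passes each task its ID in the `HQ_TASK_ID` environment variable, which `my-script.sh` can use to pick its own piece of work. Below is a minimal sketch; the `inputs`/`outputs` file layout and the `process` binary are illustrative assumptions, not part of HyperQueue:
```bash
#!/bin/bash
# HyperQueue exports HQ_TASK_ID (1-10000 for the array above).
# The file layout and the `process` binary are hypothetical.
INPUT="inputs/input-${HQ_TASK_ID}.txt"
OUTPUT="outputs/output-${HQ_TASK_ID}.txt"
./process "${INPUT}" > "${OUTPUT}"
```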
Once you start some jobs, you can observe their status using the following commands:
```console
# Display status of a single job
$ hq job <job-id>
# Display status of all jobs
$ hq jobs
```
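If you need to block until a job finishes, for example at the end of a submission script, HyperQueue also provides a wait command (check `hq job --help` for the exact syntax in your version):
```console
$ hq job wait <job-id>
```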
!!! important
    Before the jobs can start executing, you have to provide HyperQueue with some computational resources.
### Providing Computational Resources
Before HyperQueue can execute your jobs, it needs to have access to some computational resources.
You can provide these by starting HyperQueue *workers* which connect to the server and execute your jobs.
The workers should run on compute nodes, so they should be started inside Slurm jobs.
HyperQueue can automatically submit Slurm jobs with workers on your behalf. This system is called
[automatic allocation][c]. After the server is started, you can add a new automatic allocation
queue using the `hq alloc add` command:
```console
$ hq alloc add slurm -- -A<PROJECT-ID> -p qcpu_exp
```

After you run this command, HQ will automatically start submitting Slurm jobs on your behalf
whenever there are tasks waiting to be computed.
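You can inspect the automatic allocation queues that you have created with:
```console
$ hq alloc list
```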
* **Manually start Slurm jobs with HQ workers**

With the following command, you can submit a Slurm job that will start a single HQ worker which
will connect to the running HyperQueue server:
```console
$ salloc <salloc-params> -- /bin/bash -l -c "$(which hq) worker start"
```
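However you start the workers, you can verify that they have connected to the server by listing them:
```console
$ hq worker list
```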
!!! tip
    For debugging purposes, you can also start a worker e.g. on a login node, simply by running
    `$ hq worker start`. Do not use such a worker for any long-running computations, though!
## Architecture
HyperQueue has a simple architecture:
the user submits jobs to the server, which schedules them onto a set of workers running on compute nodes.

[1]: https://it4innovations.github.io/hyperqueue/stable/jobs/jobs/
[a]: https://it4innovations.github.io/hyperqueue/stable/
[b]: https://it4innovations.github.io/hyperqueue/stable/installation/
[c]: https://it4innovations.github.io/hyperqueue/stable/deployment/allocation/