documented hyperqueue
Compare changes
- Jan Siwiec authored
+ 26
− 142
@@ -10,8 +10,7 @@ However, executing a huge number of jobs via the PBS queue may strain the system
@@ -151,162 +150,47 @@ $ qstat -u $USER -tJ
In this example, tasks from the tasklist are executed via the GNU parallel. The jobscript executes multiple instances of itself in parallel, on all cores of the node. Once an instace of the jobscript is finished, a new instance starts until all entries in the tasklist are processed. Currently processed entries of the joblist may be retrieved via the `$1` variable. The `$TASK` variable expands to one of the input filenames from the tasklist. We copy the input file to the local scratch memory, execute myprog.x, and copy the output file back to the submit directory under the $TASK.out name.
HyperQueue lets you build a computation plan consisting of a large amount of tasks and then execute it transparently over a system like SLURM/PBS. It dynamically groups jobs into SLURM/PBS jobs and distributes them to fully utilize allocated notes. You thus do not have to manually aggregate your tasks into SLURM/PBS jobs. See the [project repository][a].
A combined approach, very similar to job arrays, can be taken. A job array is submitted to the queuing system. The subjobs run GNU parallel. The GNU parallel shell executes multiple instances of the jobscript using all of the cores on the node. The instances execute different work, controlled by the `$PBS_JOB_ARRAY` and the `$PARALLEL_SEQ` variables.
In this example, the jobscript executes in multiple instances in parallel, on all cores of a computing node. The `$TASK` variable expands to one of the input filenames from the tasklist. We copy the input file to local scratch memory, execute myprog.x and copy the output file back to the submit directory, under the $TASK.out name. The numtasks file controls how many tasks will be run per subjob. Once a task is finished, a new task starts, until the number of tasks in the numtasks file is reached.
@@ -321,10 +205,10 @@ $ cat README