In many cases, it is useful to submit a huge (>100) number of computational jobs into the PBS queue system. A huge number of (small) jobs is one of the most effective ways to execute embarrassingly parallel calculations, achieving the best runtime, throughput, and computer utilization.
However, executing a huge number of jobs via the PBS queue may strain the system. This strain may result in slow response to commands, inefficient scheduling, and overall degradation of performance and user experience for all users. For this reason, the number of jobs is **limited to 100 jobs per user, 4,000 jobs and subjobs per user, 1,500 subjobs per job array**.
!!! note
    Follow one of the procedures below if you wish to schedule more than 100 jobs at a time.
* Use [Job arrays][1] when running a huge number of [multithread][2] (bound to one node only) or multinode (multithread across several nodes) jobs.
* Use [GNU parallel][3] when running single core jobs.
* Combine [GNU parallel with Job arrays][4] when running a huge number of single core jobs.
## Policy
1. A user is allowed to submit at most 100 jobs. Each job may be [a job array][1].
1. The array size is at most 1,000 subjobs.
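For example, the following hypothetical submission counts as a single job with 900 subjobs, within both limits:

```console
$ qsub -J 1-900 jobscript
```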
## Job Arrays
...

Display status information for all user's subjobs.

```console
$ qstat -u $USER -tJ
```
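For orientation only, a minimal job array jobscript might look like the sketch below; the `tasklist` file and the `myprog` binary are illustrative assumptions, not part of the original examples:

```bash
#!/usr/bin/env bash
#PBS -A PROJECT_ID
#PBS -q qprod
#PBS -l select=1:ncpus=16,walltime=02:00:00

# Select this subjob's input line from a tasklist file (one task per line).
cd "$PBS_O_WORKDIR" || exit 1
TASK=$(sed -n "${PBS_ARRAY_INDEX}p" tasklist)

# Run the (hypothetical) program on the selected input.
./myprog "$TASK"
```

Submitted with `qsub -J 1-900 jobscript`, this would produce 900 subjobs, each picking a different line of `tasklist` via `$PBS_ARRAY_INDEX`.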
For more information on job arrays, see the [PBSPro Users guide][6].
## GNU Parallel
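As a rough sketch only (the `worker.sh` script and the `tasklist` file of input names are assumptions for illustration), a single-node jobscript driving single core tasks with GNU parallel might look like:

```bash
#!/usr/bin/env bash
#PBS -A PROJECT_ID
#PBS -q qprod
#PBS -l select=1:ncpus=16,walltime=02:00:00

# Run one single core task per line of tasklist, 16 at a time (one per core).
cd "$PBS_O_WORKDIR" || exit 1
parallel -j 16 './worker.sh {}' < tasklist
```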
...

In this example, the jobscript executes in multiple instances in parallel, on all...
When deciding these values, keep in mind the following guiding rules:
1. Let n = N/16. The inequality (n + 1) \* T < W should hold, where N is the number of tasks per subjob, T is the expected single task walltime, and W is the subjob walltime (see the worked example after this list). A short subjob walltime improves scheduling and job throughput.
1. The number of tasks should be a multiple of 16.
1. These rules are valid only when all tasks have similar task walltimes T.
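As an illustrative check of rule 1 (the numbers here are chosen for demonstration, not taken from the original text): with N = 32 tasks per subjob, n = 32/16 = 2; if each task takes about T = 30 minutes, the walltime must satisfy (2 + 1) \* 30 min = 90 min < W, so a 2-hour subjob walltime is a safe choice.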
In this example, we submit a job array of 31 subjobs. Note the -J 1-992:**32**; the step must be the same as the number written to the numtasks file. Each subjob will run on one full node and process 16 input files in parallel, 32 in total per subjob. Every subjob is assumed to complete in less than 2 hours.
!!! hint
    Use #PBS directives at the beginning of the jobscript file and do not forget to set your valid PROJECT_ID and desired queue.
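A minimal sketch of how such a subjob could map its slice of the task list onto GNU parallel follows; the tasklist and numtasks file names and the worker script are illustrative assumptions (the complete jobscript ships in the examples archive below):

```bash
#!/usr/bin/env bash
#PBS -A PROJECT_ID
#PBS -q qprod
#PBS -l select=1:ncpus=16,walltime=02:00:00
#PBS -J 1-992:32

cd "$PBS_O_WORKDIR" || exit 1

# Each subjob owns NUMTASKS consecutive lines of tasklist, starting at its
# array index; the :32 step in -J must equal the value in the numtasks file.
NUMTASKS=$(cat numtasks)
seq $PBS_ARRAY_INDEX $((PBS_ARRAY_INDEX + NUMTASKS - 1)) | \
  parallel -j 16 'TASK=$(sed -n {}p tasklist); ./worker.sh "$TASK"'
```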
...
Download the examples in [capacity.zip][9], illustrating the ways listed above to run a huge number of jobs. We recommend trying out the examples before using this approach for production jobs.
Unzip the archive in an empty directory on the cluster and follow the instructions in the README file.
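For instance (directory and paths are hypothetical):

```console
$ mkdir capacity_example && cd capacity_example
$ unzip ../capacity.zip
$ cat README
```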