In this example, the submit directory holds the 900 input files, the executable myprog.x, and the jobscript file. As input for each run, we take the filename of an input file from the created tasklist file. We copy the input file to the scratch directory (/scratch/work/user/$USER/$PBS_JOBID), execute the myprog.x, and copy the output file back to the submit directory, under the $TASK.out name. The myprog.x runs on one node only and must use threads to run in parallel. Be aware that if myprog.x **is not multithreaded**, then all the **jobs are run as single-thread programs in a sequential** manner. Due to the allocation of the whole node, the **accounted time is equal to the usage of the whole node**, while only 1/24 of the node is actually used!
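A minimal sketch of such a jobscript is shown below; the queue name, walltime, and the use of sed to pick the task from the tasklist are illustrative assumptions, not necessarily the exact script shipped with the examples:

```bash
#!/bin/bash
#PBS -A PROJECT_ID
#PBS -q qprod
#PBS -l select=1:ncpus=24,walltime=02:00:00

# pick this subjob's input filename from the tasklist, using the array index
TASK=$(sed -n "${PBS_ARRAY_INDEX}p" $PBS_O_WORKDIR/tasklist)

# stage the input file to the scratch directory
SCRDIR=/scratch/work/user/$USER/$PBS_JOBID
mkdir -p $SCRDIR && cd $SCRDIR
cp $PBS_O_WORKDIR/$TASK input

# run the (multithreaded) program and copy the output back
$PBS_O_WORKDIR/myprog.x < input > output
cp output $PBS_O_WORKDIR/$TASK.out
```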
If a huge number of parallel multicore jobs (in the sense of multinode, multithreaded, e.g. MPI-enabled jobs) needs to be run, then the job array approach should also be used. The main difference compared to the previous single-node example is that the local scratch must not be used (it is not shared between nodes) and MPI or another technique for parallel multinode execution has to be used properly.
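A sketch of such a multinode subjob script, assuming two nodes per subjob and an MPI-enabled myprog.x; the module name and the convention that myprog.x takes input and output filenames as arguments are assumptions for illustration:

```bash
#!/bin/bash
#PBS -A PROJECT_ID
#PBS -q qprod
#PBS -l select=2:ncpus=24,walltime=02:00:00

TASK=$(sed -n "${PBS_ARRAY_INDEX}p" $PBS_O_WORKDIR/tasklist)

# use the shared scratch filesystem, as the local scratch is not visible from the other nodes
SCRDIR=/scratch/work/user/$USER/$PBS_JOBID
mkdir -p $SCRDIR && cd $SCRDIR
cp $PBS_O_WORKDIR/$TASK input

# start MPI ranks across all allocated nodes; myprog.x is assumed
# to read the first argument and write its result to the second
module load OpenMPI
mpirun $PBS_O_WORKDIR/myprog.x input output
cp output $PBS_O_WORKDIR/$TASK.out
```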
Read more on job arrays in the [PBSPro Users guide](../pbspro/).
!!! note
    Use GNU parallel to run many single core tasks on one node.
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each line of the input. GNU parallel is most useful for running single core jobs via the queue system on the cluster.
For more information and examples see the parallel man page:
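```console
$ man parallel
```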
In this example, tasks from the tasklist are executed via GNU parallel. The jobscript executes multiple instances of itself in parallel, on all cores of the node. Once an instance of the jobscript finishes, a new instance starts, until all entries in the tasklist are processed. The currently processed entry of the tasklist may be retrieved via the $1 variable. The variable $TASK expands to one of the input filenames from the tasklist. We copy the input file to the scratch directory, execute the myprog.x, and copy the output file back to the submit directory, under the $TASK.out name.
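A sketch of this self-replicating jobscript, assuming a 24-core node; the queue, walltime, module name, and paths are illustrative assumptions:

```bash
#!/bin/bash
#PBS -A PROJECT_ID
#PBS -q qprod
#PBS -l select=1:ncpus=24,walltime=02:00:00

# on the first invocation, re-execute this script via GNU parallel,
# once per tasklist entry, 24 instances at a time (one per core)
[ -z "$PARALLEL_SEQ" ] && { module add parallel ; exec parallel -a $PBS_O_WORKDIR/tasklist $0 ; }

# each parallel instance receives one tasklist entry as $1
TASK=$1
SCRDIR=/scratch/work/user/$USER/$PBS_JOBID/$PARALLEL_SEQ
mkdir -p $SCRDIR && cd $SCRDIR
cp $PBS_O_WORKDIR/$TASK input

# run the single core program and copy the output back
$PBS_O_WORKDIR/myprog.x < input > output
cp output $PBS_O_WORKDIR/$TASK.out
```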
### Submit the Job
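For the GNU parallel variant, the jobscript is submitted as an ordinary (non-array) job; the job name and the returned job ID below are illustrative:

```console
$ qsub -N JOBNAME jobscript
12345.dm2
```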
A combined approach, very similar to job arrays, can be taken: a job array is submitted, and each subjob uses GNU parallel to utilize all cores on its node.
Example:
Assume we have 960 input files with names beginning with "file" (e.g. file001, ..., file960). Assume we would like to use each of these input files with the program executable myprog.x, each as a separate single core job. We call these single core jobs tasks.
First, we create a tasklist file, listing all tasks - all input files in our example:
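For example, assuming the input files reside in the submit directory and nothing else matches the pattern:

```console
$ ls file* > tasklist
```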
In this example, the jobscript executes in multiple instances in parallel, on all cores of a computing node. The variable $TASK expands to one of the input filenames from the tasklist. We copy the input file to the scratch directory, execute the myprog.x, and copy the output file back to the submit directory, under the $TASK.out name. The numtasks file controls how many tasks will be run per subjob. Once a task is finished, a new task starts, until the number of tasks in the numtasks file is reached.
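A sketch of the combined jobscript follows. It assumes the numtasks file contains 48 lines (e.g. created with `seq 48 > numtasks`); the queue, walltime, and the exact indexing scheme are illustrative assumptions consistent with the submission example below:

```bash
#!/bin/bash
#PBS -A PROJECT_ID
#PBS -q qprod
#PBS -l select=1:ncpus=24,walltime=02:00:00

# on the first invocation, re-execute this script via GNU parallel,
# once per line of the numtasks file (24 instances at a time on a 24-core node)
[ -z "$PARALLEL_SEQ" ] && { module add parallel ; exec parallel -a $PBS_O_WORKDIR/numtasks $0 ; }

# combine the job array index and the GNU parallel sequence number
# to pick this task's input filename from the tasklist
IDX=$(($PBS_ARRAY_INDEX + $PARALLEL_SEQ - 1))
TASK=$(sed -n "${IDX}p" $PBS_O_WORKDIR/tasklist)
[ -z "$TASK" ] && exit   # past the end of the tasklist, nothing to do

# stage input to scratch, run, and copy the result back
SCRDIR=/scratch/work/user/$USER/$PBS_JOBID/$PARALLEL_SEQ
mkdir -p $SCRDIR && cd $SCRDIR
cp $PBS_O_WORKDIR/$TASK input

$PBS_O_WORKDIR/myprog.x < input > output
cp output $PBS_O_WORKDIR/$TASK.out
```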
!!! note
    Select subjob walltime and number of tasks per subjob carefully.
When deciding on these values, think about the following guiding rules:
### Submit the Job Array (-J)
To submit the job array, use the qsub -J command. The 960-task job from the [example above](capacity-computing/#combined_example) may be submitted like this:
```console
$ qsub -N JOBNAME -J 1-960:48 jobscript
12345[].dm2
```
In this example, we submit a job array of 20 subjobs. Note the -J 1-960:**48**; the step of 48 must be the same as the number of tasks set in the numtasks file. Each subjob will run on a full node, processing 24 input files in parallel and 48 input files in total per subjob. Every subjob is assumed to complete in less than 2 hours.
!!! note
    Use #PBS directives at the beginning of the jobscript file, and don't forget to set your valid PROJECT_ID and desired queue.
Download the examples in [capacity.zip](capacity.zip), illustrating the above listed ways to run a huge number of jobs. We recommend trying out the examples before using this approach for production jobs.
Unzip the archive in an empty directory on the cluster and follow the instructions in the README file.
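For example (the directory name is illustrative):

```console
$ mkdir capacity_test && cd capacity_test
$ unzip ../capacity.zip
$ cat README
```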