Commit f1c79e67 authored by Jan Siwiec

Update job-submission-and-execution.md

parent edd9973a
When allocating computational resources for the job, specify:
1. your Project ID
1. a Jobscript or interactive switch
!!! note
Use the **qsub** command to submit your job to a queue for allocation of computational resources.
Submit the job using the `qsub` command:
```console
$ qsub -A Project_ID -q queue -l select=x:ncpus=y,walltime=[[hh:]mm:]ss[.ms] jobscript
```
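For example, a job requesting 4 nodes of 16 cores each for 3 hours could be submitted as follows (the project ID, queue, and node size are illustrative):

```console
$ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16,walltime=03:00:00 ./myjob
```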
### Salomon - Intel Xeon Phi Co-Processors
To allocate a node with a Xeon Phi co-processor, the user needs to specify this in the select statement. Currently, only allocation of whole nodes with both Phi cards as the smallest chunk is supported. A standard PBSPro approach through the `accelerator`, `naccelerators`, and `accelerator_model` attributes is used. The `accelerator_model` attribute can be omitted, since only one accelerator type/model is available on Salomon.
The absence of a specialized queue for accessing the nodes with cards means that the Phi cards can be utilized in any queue, including qexp for testing/experiments, qlong for longer jobs, qfree after the project resources have been spent, etc. The Phi cards are thus also available to PRACE users. There is no need to ask for permission to utilize the Phi cards in project proposals.
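A sketch of allocating one node together with both Phi cards follows; the queue, project ID, and accelerator model value are assumptions based on the attributes described above:

```console
$ qsub -A OPEN-0-0 -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120
```

In the `qstat -f` output of such a job, the allocated card appears as a separate vnode, e.g. `exec_vnode = (r21u05n581-mic0:naccelerators=1:ncpus=0)`.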
On the UV2000, allocation is done per NUMA node, and jobs are isolated by cpusets.
The UV2000 (node uv1) offers 3TB of RAM and 104 cores, distributed over 13 NUMA nodes. A NUMA node packs 8 cores and approx. 247GB of RAM (with the exception of node 11, which has only 123GB of RAM). In PBS, the UV2000 provides 13 chunks, one chunk per NUMA node (see [Resource allocation policy][2]). Jobs on the UV2000 are isolated from each other by cpusets, so that a job by one user may not utilize CPU or memory allocated to a job by another user. Full chunks are always allocated; a job may only use the resources of the NUMA nodes allocated to it.
```console
$ qsub -A OPEN-0-0 -q qfat -l select=13 ./myjob
```

In this example, we allocate all 13 NUMA-node chunks, i.e. the whole UV2000 node.
### Useful Tricks
All qsub options may be [saved directly into the jobscript][1]. In such a case, no options to qsub are needed.
```console
$ qsub ./myjob
```
### Anselm - Placement by IB Switch
Groups of computational nodes are connected to chassis integrated Infiniband switches. These switches form the leaf switch layer of the [Infiniband network][3] fat tree topology. Nodes sharing the leaf switch can communicate most efficiently. Sharing the same switch prevents hops in the network and facilitates unbiased, highly efficient network communication.
Nodes sharing the same switch may be selected via the PBS resource attribute `ibswitch`. Values of this attribute are `iswXX`, where `XX` is the switch number. The node-switch mapping can be seen in the [Hardware Overview][4] section.
We recommend allocating compute nodes to a single switch when the best possible computational network performance is required to run the job efficiently:
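For instance, four 16-core nodes sharing one switch might be requested like this (a sketch; the switch name isw20 is an assumption):

```console
$ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16:ibswitch=isw20 ./myjob
```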
Nodes directly connected to the same InfiniBand switch can communicate most efficiently.
!!! note
We recommend allocating compute nodes of a single switch when the best possible computational network performance is required to run the job efficiently.
Nodes directly connected to one InfiniBand switch can be allocated using node grouping on the PBS resource attribute `switch`.
In this example, we request all 9 nodes directly connected to the same switch using node grouping placement.
```console
$ qsub -A OPEN-0-0 -q qprod -l select=9:ncpus=24 -l place=group=switch ./myjob
```
!!! note
Not useful for ordinary computing, suitable for testing and management tasks.
Nodes directly connected to a specific InfiniBand switch can be selected using the PBS resource attribute `switch`.
In this example, we request all 9 nodes directly connected to the r4i1s0sw1 switch.
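A sketch of the corresponding command (assuming 24-core nodes):

```console
$ qsub -A OPEN-0-0 -q qprod -l select=9:ncpus=24:switch=r4i1s0sw1 ./myjob
```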
Nodes located in the same dimension group may be allocated using node grouping on one of the PBS resource attributes listed below.

| Hypercube dimension | Node grouping attribute | Number of nodes |
| ------------------- | ----------------------- | --------------- |
| 6D                  | ehc_6d                  | 432,576         |
| 7D                  | ehc_7d                  | all             |
In this example, we allocate 16 nodes in the same [hypercube dimension][5] 1 group.
```console
$ qsub -A OPEN-0-0 -q qprod -l select=16:ncpus=24 -l place=group=ehc_1d -I
```
## Job Management
!!! note
Check status of your jobs using the `qstat` and `check-pbs-jobs` commands
```console
$ qstat -a
```
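The `check-pbs-jobs` command can also print the standard output of a running job. A sketch follows; the exact options and the sample output lines are assumptions:

```console
$ check-pbs-jobs --jobid 35141.dm2 --print-job-out
Run loop 1
Run loop 2
Run loop 3
```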
In this example, we see the actual output (some iteration loops) of the job 35141.dm2.
!!! note
Manage your queued or running jobs, using the `qhold`, `qrls`, `qdel`, `qsig`, or `qalter` commands
You may release your allocation at any time, using the `qdel` command
```console
$ qdel 12345.srv11
```
You may kill a running job by force, using the `qsig` command
```console
$ qsig -s 9 12345.srv11
```
Learn more by reading the PBS man page
```console
$ man pbs_professional
```
!!! note
Prepare the jobscript to run batch jobs in the PBS queue system
The jobscript is a user-made script controlling the sequence of commands for executing the calculation. It is often written in bash, though other scripting languages may be used as well. The jobscript is supplied to the PBS `qsub` command as an argument and is executed by the PBS Professional workload manager.
!!! note
The jobscript or interactive shell is executed on the first of the allocated nodes.
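A sketch of such an interactive session (the job ID, node size, and prompt output are illustrative):

```console
$ qsub -A OPEN-0-0 -q qexp -l select=4:ncpus=16 -I
qsub: waiting for job 123456.srv11 to start
qsub: job 123456.srv11 ready

$ pwd
/home/username
```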
In this example, 4 nodes were allocated interactively for 1 hour via the qexp queue. The interactive shell is executed in the home directory.
!!! note
All nodes within the allocation may be accessed via SSH. Unallocated nodes are not accessible to the user.
The allocated nodes are accessible via SSH from login nodes. The nodes may access each other via SSH as well.
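For example, within the job the allocated nodes can be listed from the PBS node file and then reached over SSH (node names are illustrative):

```console
$ sort -u $PBS_NODEFILE
cn17.bullx
cn108.bullx
cn109.bullx
cn110.bullx
$ ssh cn108
```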
### Example Jobscript for MPI Calculation
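A sketch of such a jobscript follows; the module name and the exact paths are assumptions drawn from the description below:

```bash
#!/bin/bash

# create and enter a scratch directory for the job, exit on failure
SCRDIR=/scratch/$USER/myjob
mkdir -p $SCRDIR
cd $SCRDIR || exit

# copy the input file and the executable from the submission directory ($PBS_O_WORKDIR)
cp $PBS_O_WORKDIR/input .
cp $PBS_O_WORKDIR/mympiprog.x .

# load an MPI module (module name assumed)
ml OpenMPI

# execute the MPI program; one process per node is assumed to be set via the mpiprocs qsub option
mpirun ./mympiprog.x

# copy the output file back to the submission directory
cp output $PBS_O_WORKDIR/.

exit
```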
In this example, a directory in /home holds the input file input and the mympiprog.x executable. We create the myjob directory on the /scratch filesystem, copy input and executable files from the /home directory where the qsub was invoked ($PBS_O_WORKDIR) to /scratch, execute the MPI program mympiprog.x and copy the output file back to the /home directory. mympiprog.x is executed as one process per node, on all allocated nodes.
!!! note
Consider preloading inputs and executables onto [shared scratch][6] memory before the calculation starts.
In some cases, it may be impractical to copy the inputs to the scratch memory and the outputs to the home directory. This is especially true when very large input and output files are expected, or when the files should be reused by a subsequent calculation. In such cases, it is the users' responsibility to preload the input files on shared /scratch memory before the job submission, and retrieve the outputs manually after all calculations are finished.
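For instance, inputs might be preloaded before submission roughly like this (the paths and file names follow the examples in this section):

```console
$ mkdir -p /scratch/$USER/myjob
$ cp input mympiprog.x /scratch/$USER/myjob/
$ qsub ./myjob
```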
!!! note
Store the qsub options within the jobscript. Use the `mpiprocs` and `ompthreads` qsub options to control the MPI job execution.
### Example Jobscript for MPI Calculation With Preloaded Inputs
```bash
# execute the MPI program (the qsub resource options are stored in the jobscript header)
mpirun ./mympiprog.x
exit
```
In this example, input and executable files are assumed to be preloaded manually in the /scratch/$USER/myjob directory. Note the `mpiprocs` and `ompthreads` qsub options controlling the behavior of the MPI execution. mympiprog.x is executed as one process per node, on all 100 allocated nodes. If mympiprog.x implements OpenMP threads, it will run 16 threads per node.
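For reference, the corresponding qsub options stored in the jobscript header might look roughly like this (a sketch; the project, queue, and walltime values are illustrative):

```bash
#PBS -A OPEN-0-0
#PBS -q qprod
#PBS -l select=100:ncpus=16:mpiprocs=1:ompthreads=16
#PBS -l walltime=12:00:00
```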
More information can be found in the [Running OpenMPI][7] and [Running MPICH2][8] sections.
### Example Jobscript for Single Node Calculation
!!! note
The local scratch directory is often useful for single node jobs. Local scratch memory will be deleted immediately after the job ends.
Example jobscript for single node calculation, using [local scratch][6] memory on the node:
```bash
#!/bin/bash
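
# change to the node's local scratch directory (path assumed), exit on failure
cd /lscratch/$PBS_JOBID || exit

# copy the input file and the executable from the submission directory (file names assumed)
cp $PBS_O_WORKDIR/input .
cp $PBS_O_WORKDIR/myprog.x .

# run the single-node calculation
./myprog.x

# copy the output file back to the submission directory
cp output $PBS_O_WORKDIR/.

exit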
```

In this example, a directory in /home holds the input file input and the executable. The jobscript copies them to the node's local scratch directory, runs the calculation there, and copies the output file back to the /home directory.
### Other Jobscript Examples
Further jobscript examples may be found in the software section and the [Capacity computing][9] section.
[1]: #example-jobscript-for-mpi-calculation-with-preloaded-inputs
[2]: resources-allocation-policy.md
[3]: ../anselm/network.md
[4]: ../anselm/hardware-overview.md
[5]: ../salomon/7d-enhanced-hypercube
[6]: ../anselm/storage.md
[7]: ../software/mpi/running_openmpi.md
[8]: ../software/mpi/running-mpich2.md
[9]: capacity-computing.md