Commit 42dcecdb authored by David Hrbáč

Links OK

parent 7c627d6f
However, executing a huge number of jobs via the PBS queue may strain the system.
!!! note
    Please follow one of the procedures below if you wish to schedule more than 100 jobs at a time.
* Use [Job arrays][1] when running a huge number of [multithread][2] (bound to one node only) or multinode (multithreaded across several nodes) jobs
* Use [GNU parallel][3] when running single-core jobs
* Combine [GNU parallel with Job arrays][4] when running a huge number of single-core jobs
## Policy
1. A user is allowed to submit at most 100 jobs. Each job may be [a job array][1].
1. The array size is at most 1500 subjobs.
## Job Arrays
### Submit the Job Array
To submit the job array, use the qsub -J command. The 900 jobs of the [example above][5] may be submitted like this:
```console
$ qsub -N JOBNAME -J 1-900 jobscript
```

Display status information for all of the user's subjobs:

```console
$ qstat -u $USER -tJ
```
Read more on job arrays in the [PBSPro Users guide][6].
## GNU Parallel
In this example, tasks from the tasklist are executed via GNU parallel.
### Submit the Job
To submit the job, use the qsub command. The 101 tasks' job of the [example above][7] may be submitted like this:
```console
$ qsub -N JOBNAME jobscript
```
### Submit the Job Array (-J)
To submit the job array, use the qsub -J command. The 960 tasks' job of the [example above][8] may be submitted like this:
```console
$ qsub -N JOBNAME -J 1-960:48 jobscript
```

In this example, we submit a job array of 20 subjobs. Note the `-J 1-960:48`: the stride of 48 means a subjob starts at every 48th index of the range 1-960, giving 20 subjobs.
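The subjob and stride arithmetic can be sanity-checked in plain shell. This is a dry-run sketch only; in a real subjob, PBS itself sets `PBS_ARRAY_INDEX`, and the demo value below is an assumption for illustration:

```bash
# -J 1-960:48 starts subjobs with PBS_ARRAY_INDEX = 1, 49, 97, ..., 913,
# i.e. 20 subjobs, each covering a block of 48 consecutive tasks.
PBS_ARRAY_INDEX=49                 # demo value; set by PBS in a real subjob
TASKS_PER_SUBJOB=48
FIRST=$PBS_ARRAY_INDEX
LAST=$(( FIRST + TASKS_PER_SUBJOB - 1 ))
NSUBJOBS=$(( (960 - 1) / TASKS_PER_SUBJOB + 1 ))
echo "subjobs: $NSUBJOBS; this subjob covers tasks $FIRST-$LAST"
```

Each subjob then feeds its 48-task slice of the tasklist to GNU parallel, which is how 960 tasks fit into a 20-subjob array.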
## Examples
Download the examples in [capacity.zip][9], illustrating the above listed ways to run a huge number of jobs. We recommend trying out the examples before using this approach for production jobs.

Unzip the archive in an empty directory on the cluster and follow the instructions in the README file.
```console
$ unzip capacity.zip
$ cd capacity
$ cat README
```
[1]: #job-arrays
[2]: #shared-jobscript-on-one-node
[3]: #gnu-parallel
[4]: #job-arrays-and-gnu-parallel
[5]: #array_example
[6]: ../pbspro.md
[7]: #gp_example
[8]: #combined_example
[9]: capacity.zip
Salomon is a cluster of x86-64 Intel-based nodes. The cluster contains two types of compute nodes of the same processor type and memory size.

Compute nodes with a MIC accelerator **contain two Intel Xeon Phi 7120P accelerators**.
[More about][1] the schematic representation of the Salomon cluster compute nodes' IB topology.
### Compute Nodes Without Accelerator
MIC Accelerator Intel Xeon Phi 7120P Processor
* 16 GDDR5 DIMMs per node
* 8 GDDR5 DIMMs per CPU
* 2 GDDR5 DIMMs per channel
[1]: ib-single-plane-topology.md
The Salomon cluster consists of 1008 computational nodes, of which 576 are regular compute nodes and 432 are accelerated nodes. Each node is a powerful x86-64 computer, equipped with 24 cores (two twelve-core Intel Xeon processors) and 128 GB of RAM. The nodes are interlinked by high speed InfiniBand and Ethernet networks. All nodes share 0.5 PB of /home NFS disk storage for user files. Users may use a DDN Lustre shared storage with a capacity of 1.69 PB, which is available for scratch project data. User access to the Salomon cluster is provided by four login nodes.
[More about][1] the schematic representation of the Salomon cluster compute nodes' IB topology.
![Salomon](../img/salomon-2.jpg)
The parameters are summarized in the following tables:

| **In general** | |
| -------------- | - |
| Primary purpose | High Performance Computing |
| Architecture of compute nodes | x86-64 |
| Operating system | CentOS 6.x Linux |
| [**Compute nodes**][2] | |
| Totally | 1008 |
| Processor | 2 x Intel Xeon E5-2680v3, 2.5 GHz, 12 cores |
| RAM | 128GB, 5.3 GB per core, DDR4@2133 MHz |
| Node type | Count | Processor | Cores | Memory | Accelerator |
| --------- | ----- | --------- | ----- | ------ | ----------- |
| w/o accelerator | 576 | 2 x Intel Xeon E5-2680v3, 2.5 GHz | 24 | 128 GB | - |
| MIC accelerated | 432 | 2 x Intel Xeon E5-2680v3, 2.5 GHz | 24 | 128 GB | 2 x Intel Xeon Phi 7120P, 61 cores, 16 GB RAM |
For more details refer to the [Compute nodes][2].
## Remote Visualization Nodes
For large memory computations, a special SMP/NUMA SGI UV 2000 server is available:

| Node | Count | Processor | Cores | Memory | Extra |
| ---- | ----- | --------- | ----- | ------ | ----- |
| UV2000 | 1 | 14 x Intel Xeon E5-4627v2, 3.3 GHz, 8 cores | 112 | 3328 GB DDR3@1866 MHz | 2 x 400GB local SSD, 1x NVIDIA GM200 (GeForce GTX TITAN X), 12 GB RAM |
![](../img/uv-2000.jpeg)
[1]: ib-single-plane-topology.md
[2]: compute-nodes.md
Each color in each physical IRU represents one dual-switch ASIC switch.
[IB single-plane topology - ICEX Mcell.pdf][1]
![IB single-plane topology - ICEX Mcell.pdf](../img/IBsingleplanetopologyICEXMcellsmall.png)
## IB Single-Plane Topology - Accelerated Nodes
Each of the 3 interconnected D racks is equivalent to one half of an M-Cell rack. The 18 D racks with MIC accelerated nodes [r21-r38] are equivalent to 3 M-Cell racks, as shown in the [7D Enhanced Hypercube][2] diagram.
As shown in the [IB Topology][3] diagram:
* Racks 21, 22, 23, 24, 25, 26 are equivalent to one M-Cell rack.
* Racks 27, 28, 29, 30, 31, 32 are equivalent to one M-Cell rack.
* Racks 33, 34, 35, 36, 37, 38 are equivalent to one M-Cell rack.
[IB single-plane topology - Accelerated nodes.pdf][4]
![IB single-plane topology - Accelerated nodes.pdf](../img/IBsingleplanetopologyAcceleratednodessmall.png)
[1]: ../src/IB_single-plane_topology_-_ICEX_Mcell.pdf
[2]: 7d-enhanced-hypercube.md
[3]: 7d-enhanced-hypercube.md#ib-topology
[4]: ../src/IB_single-plane_topology_-_Accelerated_nodes.pdf
# Introduction
Welcome to the Salomon supercomputer cluster. The Salomon cluster consists of 1008 compute nodes, totalling 24192 compute cores, with 129 TB RAM, and giving over 2 Pflop/s theoretical peak performance. Each node is a powerful x86-64 computer, equipped with 24 cores and at least 128 GB of RAM. Nodes are interconnected through a 7D Enhanced hypercube InfiniBand network and are equipped with Intel Xeon E5-2680v3 processors. The Salomon cluster consists of 576 nodes without accelerators and 432 nodes equipped with Intel Xeon Phi MIC accelerators. Read more in [Hardware Overview][1].
The cluster runs with a [CentOS Linux][a] operating system, which is compatible with the RedHat [Linux family][b].
## Water-Cooled Compute Nodes With MIC Accelerators
![](../img/salomon-3.jpeg)
![](../img/salomon-4.jpeg)
[1]: hardware-overview.md
[a]: http://www.bull.com/bullx-logiciels/systeme-exploitation.html
[b]: http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg
Queue priority is the priority of the queue in which a job is queued before execution.
Queue priority has the biggest impact on job execution priority. The execution priority of jobs in higher priority queues is always greater than that of jobs in lower priority queues. Other job properties used to determine job execution priority (fair-share priority, eligible time) cannot compete with queue priority.
Queue priorities can be seen [here][a].
### Fair-Share Priority
Usage counts allocated core-hours (`ncpus x walltime`). Usage is decayed, or cut in half periodically.
## Jobs Queued in Queue qexp Are Not Counted Toward the Project's Usage
!!! note
    Calculated usage and fair-share priority can be seen [here][b].
The calculated fair-share priority can also be seen as the Resource_List.fairshare attribute of a job.
Specifying a more accurate walltime enables better scheduling and better execution times.
### Job Placement
Job [placement can be controlled by flags during submission][1].
---8<--- "mathjax.md"
[1]: job-submission-and-execution.md#job_placement
[a]: https://extranet.it4i.cz/rsweb/salomon/queues
[b]: https://extranet.it4i.cz/rsweb/salomon/projects
Per NUMA node allocation.
Jobs are isolated by cpusets.
The UV2000 (node uv1) offers 3 TB of RAM and 104 cores, distributed over 13 NUMA nodes. A NUMA node packs 8 cores and approx. 247 GB of RAM (with one exception: node 11 has only 123 GB of RAM). In PBS, the UV2000 provides 13 chunks, one chunk per NUMA node (see [Resource allocation policy][1]). Jobs on the UV2000 are isolated from each other by cpusets, so that a job of one user may not utilize CPU or memory allocated to a job of another user. Full chunks are always allocated; a job may only use resources of the NUMA nodes allocated to it.
```console
$ qsub -A OPEN-0-0 -q qfat -l select=13 ./myjob
```

In this example, we allocate 2000 GB of memory and 16 cores on the UV2000, for 48 hours.
### Useful Tricks
All qsub options may be [saved directly into the jobscript][2]. In such a case, no options to qsub are needed.
```console
$ qsub ./myjob
```

In this example, we allocate nodes r24u35n680 and r24u36n681, all 24 cores per node.
### Placement by Network Location
Network location of the allocated nodes in the [InfiniBand network][3] influences the efficiency of network communication between the nodes of a job. Nodes on the same InfiniBand switch communicate faster, with lower latency, than distant nodes. To improve the communication efficiency of jobs, the PBS scheduler on Salomon is configured to allocate nodes, from the currently available resources, which are as close as possible in the network topology.

For communication-intensive jobs, it is possible to set a stricter requirement: to require nodes directly connected to the same InfiniBand switch, or nodes located in the same dimension group of the InfiniBand network.
Nodes located in the same dimension group may be allocated using node grouping:

| Hypercube dimension | Node group | Number of nodes |
| ------------------- | ---------- | --------------- |
| 6D | ehc_6d | 432,576 |
| 7D | ehc_7d | all |
In this example, we allocate 16 nodes in the same [hypercube dimension][4] 1 group.
```console
$ qsub -A OPEN-0-0 -q qprod -l select=16:ncpus=24 -l place=group=ehc_1d -I
```
In this example, some directory on /home holds the input file input and the executable mympiprog.x. We create a directory myjob on the /scratch filesystem, copy the input and executable files from the /home directory where qsub was invoked ($PBS_O_WORKDIR) to /scratch, execute the MPI program mympiprog.x, and copy the output file back to the /home directory. The mympiprog.x is executed as one process per node, on all allocated nodes.
!!! note
    Consider preloading inputs and executables onto [shared scratch][5] before the calculation starts.
In some cases, it may be impractical to copy the inputs to scratch and the outputs to home. This is especially true when very large input and output files are expected, or when the files should be reused by a subsequent calculation. In such a case, it is the user's responsibility to preload the input files on shared /scratch before the job submission, and to retrieve the outputs manually after all calculations are finished.
!!! note
    The local scratch directory is often useful for single node jobs. Local scratch is deleted immediately after the job ends. Be very careful: use of the RAM disk filesystem is at the expense of operational memory.
Example jobscript for single node calculation, using [local scratch][5] on the node:
```bash
#!/bin/bash
exit
```
In this example, some directory on /home holds the input file input and the executable myprog.x. We copy the input and executable files from the home directory where qsub was invoked ($PBS_O_WORKDIR) to the local scratch /lscratch/$PBS_JOBID, execute myprog.x, and copy the output file back to the /home directory. The myprog.x runs on one node only, and may use threads.
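That workflow can be sketched as a jobscript body. This is a minimal, dry-runnable sketch, not the cluster's official example: the names input and myprog.x come from the text above, while the fallback branch (temporary directories and a demo program, used when PBS variables are unset) is purely illustrative:

```bash
#!/bin/bash
# Sketch of the single-node local-scratch workflow described above.
# On the cluster, PBS sets PBS_JOBID and PBS_O_WORKDIR and provides /lscratch;
# outside PBS, we substitute temp dirs and a demo program so it can be dry-run.
SUBMITDIR=${PBS_O_WORKDIR:-$(mktemp -d)}
if [ -n "${PBS_JOBID:-}" ]; then
    SCRDIR=/lscratch/$PBS_JOBID
else
    SCRDIR=$(mktemp -d)
    # demo stand-ins for the real input file and executable
    printf 'hello\n' > "$SUBMITDIR/input"
    printf '#!/bin/sh\ntr a-z A-Z\n' > "$SUBMITDIR/myprog.x"
    chmod +x "$SUBMITDIR/myprog.x"
fi

cd "$SCRDIR" || exit 1
cp "$SUBMITDIR/input" "$SUBMITDIR/myprog.x" .   # preload input + executable
./myprog.x < input > output                     # run on local scratch
cp output "$SUBMITDIR/"                         # retrieve the result
```

On the cluster, the fallback branch never triggers and the script does exactly what the paragraph above describes: copy in, compute, copy out. Anything left in /lscratch/$PBS_JOBID is deleted when the job ends.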
[1]: resources-allocation-policy.md
[2]: #example-jobscript-for-mpi-calculation-with-preloaded-inputs
[3]: network.md
[4]: 7d-enhanced-hypercube.md
[5]: storage.md
# Network
All compute and login nodes of Salomon are interconnected by 7D Enhanced hypercube [InfiniBand][a] network and by Gigabit [Ethernet][b] network. Only [InfiniBand][c] network may be used to transfer user data.
## InfiniBand Network
All compute and login nodes of Salomon are interconnected by the 7D Enhanced hypercube [InfiniBand][a] network (56 Gbps). The network topology is a [7D Enhanced hypercube][1].
Read more about the schematic representation of the Salomon cluster [IB single-plane topology][2] ([hypercube dimension][1]).
The compute nodes may be accessed via the InfiniBand network using the ib0 network interface, in the address range 10.17.0.0 (mask 255.255.224.0). MPI may be used to establish native InfiniBand connections among the nodes.
```console
$ ip addr show ib0
inet 10.17.35.19....
....
```
[1]: 7d-enhanced-hypercube.md
[2]: ib-single-plane-topology.md
[a]: http://en.wikipedia.org/wiki/InfiniBand
[b]: http://en.wikipedia.org/wiki/Ethernet
[c]: http://en.wikipedia.org/wiki/InfiniBand