diff --git a/docs.it4i/salomon/capacity-computing.md b/docs.it4i/salomon/capacity-computing.md index f7be485d1731627aebf9a050a636e68c618cb79c..05a300ba86a17ae7b60686e2d79066523955fa9f 100644 --- a/docs.it4i/salomon/capacity-computing.md +++ b/docs.it4i/salomon/capacity-computing.md @@ -9,13 +9,13 @@ However, executing huge number of jobs via the PBS queue may strain the system. !!! note Please follow one of the procedures below, in case you wish to schedule more than 100 jobs at a time. -* Use [Job arrays](#job-arrays) when running huge number of [multithread](#shared-jobscript-on-one-node) (bound to one node only) or multinode (multithread across several nodes) jobs -* Use [GNU parallel](#gnu-parallel) when running single core jobs -* Combine [GNU parallel with Job arrays](#job-arrays-and-gnu-parallel) when running huge number of single core jobs +* Use [Job arrays][1] when running huge number of [multithread][2] (bound to one node only) or multinode (multithread across several nodes) jobs +* Use [GNU parallel][3] when running single core jobs +* Combine [GNU parallel with Job arrays][4] when running huge number of single core jobs ## Policy -1. A user is allowed to submit at most 100 jobs. Each job may be [a job array](#job-arrays). +1. A user is allowed to submit at most 100 jobs. Each job may be [a job array][1]. 1. The array size is at most 1500 subjobs. ## Job Arrays @@ -76,7 +76,7 @@ If huge number of parallel multicore (in means of multinode multithread, e. g. M ### Submit the Job Array -To submit the job array, use the qsub -J command. The 900 jobs of the [example above](#array_example) may be submitted like this: +To submit the job array, use the qsub -J command. The 900 jobs of the [example above][5] may be submitted like this: ```console $ qsub -N JOBNAME -J 1-900 jobscript @@ -147,7 +147,7 @@ Display status information for all user's subjobs. $ qstat -u $USER -tJ ``` -Read more on job arrays in the [PBSPro Users guide](software/pbspro/). +Read more on job arrays in the [PBSPro Users guide][6]. ## GNU Parallel @@ -209,7 +209,7 @@ In this example, tasks from tasklist are executed via the GNU parallel. The jobs ### Submit the Job -To submit the job, use the qsub command. The 101 tasks' job of the [example above](#gp_example) may be submitted like this: +To submit the job, use the qsub command. The 101 tasks' job of the [example above][7] may be submitted like this: ```console $ qsub -N JOBNAME jobscript @@ -294,7 +294,7 @@ When deciding this values, think about following guiding rules : ### Submit the Job Array (-J) -To submit the job array, use the qsub -J command. The 960 tasks' job of the [example above](#combined_example) may be submitted like this: +To submit the job array, use the qsub -J command. The 960 tasks' job of the [example above][8] may be submitted like this: ```console $ qsub -N JOBNAME -J 1-960:48 jobscript @@ -308,7 +308,7 @@ In this example, we submit a job array of 20 subjobs. Note the -J 1-960:48, thi ## Examples -Download the examples in [capacity.zip](capacity.zip), illustrating the above listed ways to run huge number of jobs. We recommend to try out the examples, before using this for running production jobs. +Download the examples in [capacity.zip][9], illustrating the above listed ways to run huge number of jobs. We recommend to try out the examples, before using this for running production jobs. 
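For orientation, the following is a minimal sketch of a job-array jobscript in the spirit of the sections above. It is an illustration only, not the exact script shipped in capacity.zip; the `tasklist` file and the `myprog.x` executable are assumed names. Each subjob picks one line of the tasklist via `$PBS_ARRAY_INDEX` and processes it:

```bash
#!/bin/bash
# Illustrative job-array jobscript sketch; tasklist and myprog.x are assumed names,
# not the exact files distributed in capacity.zip.

# change to the directory the job was submitted from
cd "$PBS_O_WORKDIR" || exit 1

# select the input line belonging to this subjob
TASK=$(sed -n "${PBS_ARRAY_INDEX}p" tasklist)

# run the (single-node, possibly multithreaded) program on the selected input
./myprog.x "$TASK" > "${TASK}.out"
```

Submitted with `qsub -J 1-900 jobscript` as described above, each of the 900 subjobs then processes a different line of the tasklist.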
Unzip the archive in an empty directory on the cluster and follow the instructions in the README file @@ -317,3 +317,13 @@ $ unzip capacity.zip $ cd capacity $ cat README ``` + +[1]: #job-arrays +[2]: #shared-jobscript-on-one-node +[3]: #gnu-parallel +[4]: #job-arrays-and-gnu-parallel +[5]: #array_example +[6]: ../pbspro.md +[7]: #gp_example +[8]: #combined_example +[9]: capacity.zip diff --git a/docs.it4i/salomon/compute-nodes.md b/docs.it4i/salomon/compute-nodes.md index 8eae726d9d194feb4f32f0d035230f24f86b354e..0703ec985bdf9cac7f62aaaea2986bf707a65556 100644 --- a/docs.it4i/salomon/compute-nodes.md +++ b/docs.it4i/salomon/compute-nodes.md @@ -5,7 +5,7 @@ Salomon is cluster of x86-64 Intel based nodes. The cluster contains two types of compute nodes of the same processor type and memory size. Compute nodes with MIC accelerator **contains two Intel Xeon Phi 7120P accelerators.** -[More about schematic representation of the Salomon cluster compute nodes IB topology](salomon/ib-single-plane-topology/). +[More about][1] schematic representation of the Salomon cluster compute nodes IB topology. ### Compute Nodes Without Accelerator @@ -105,3 +105,5 @@ MIC Accelerator Intel Xeon Phi 7120P Processor * 16 GDDR5 DIMMs per node * 8 GDDR5 DIMMs per CPU * 2 GDDR5 DIMMs per channel + +[1]: ib-single-plane-topology.md diff --git a/docs.it4i/salomon/hardware-overview.md b/docs.it4i/salomon/hardware-overview.md index 59c2c42ff0e1785bf3978dc32afd780eb2a52485..3eab396980dadf090215541b95b3be00d73a0b47 100644 --- a/docs.it4i/salomon/hardware-overview.md +++ b/docs.it4i/salomon/hardware-overview.md @@ -4,7 +4,7 @@ The Salomon cluster consists of 1008 computational nodes of which 576 are regular compute nodes and 432 accelerated nodes. Each node is a powerful x86-64 computer, equipped with 24 cores (two twelve-core Intel Xeon processors) and 128 GB RAM. The nodes are interlinked by high speed InfiniBand and Ethernet networks. All nodes share 0.5 PB /home NFS disk storage to store the user files. Users may use a DDN Lustre shared storage with capacity of 1.69 PB which is available for the scratch project data. The user access to the Salomon cluster is provided by four login nodes. -[More about schematic representation of the Salomon cluster compute nodes IB topology](salomon/ib-single-plane-topology/). +[More about][1] schematic representation of the Salomon cluster compute nodes IB topology.  @@ -17,7 +17,7 @@ The parameters are summarized in the following tables: | Primary purpose | High Performance Computing | | Architecture of compute nodes | x86-64 | | Operating system | CentOS 6.x Linux | -| [**Compute nodes**](salomon/compute-nodes/) | | +| [**Compute nodes**][2] | | | Totally | 1008 | | Processor | 2 x Intel Xeon E5-2680v3, 2.5 GHz, 12 cores | | RAM | 128GB, 5.3 GB per core, DDR4@2133 MHz | @@ -36,7 +36,7 @@ The parameters are summarized in the following tables: | w/o accelerator | 576 | 2 x Intel Xeon E5-2680v3, 2.5 GHz | 24 | 128 GB | - | | MIC accelerated | 432 | 2 x Intel Xeon E5-2680v3, 2.5 GHz | 24 | 128 GB | 2 x Intel Xeon Phi 7120P, 61 cores, 16 GB RAM | -For more details refer to the [Compute nodes](salomon/compute-nodes/). +For more details refer to the [Compute nodes][2]. 
## Remote Visualization Nodes @@ -55,3 +55,6 @@ For large memory computations a special SMP/NUMA SGI UV 2000 server is available | UV2000 | 1 | 14 x Intel Xeon E5-4627v2, 3.3 GHz, 8 cores | 112 | 3328 GB DDR3@1866 MHz | 2 x 400GB local SSD, 1x NVIDIA GM200 (GeForce GTX TITAN X), 12 GB RAM |  + +[1]: ib-single-plane-topology.md +[2]: compute-nodes.md diff --git a/docs.it4i/salomon/ib-single-plane-topology.md b/docs.it4i/salomon/ib-single-plane-topology.md index ab945605ff0deda9eadc72d15a758dc036e7cdc0..7f02a1b5b9d7b73dd9263ece70ffd3242d51db50 100644 --- a/docs.it4i/salomon/ib-single-plane-topology.md +++ b/docs.it4i/salomon/ib-single-plane-topology.md @@ -12,20 +12,25 @@ The SGI ICE X IB Premium Blade provides the first level of interconnection via d Each color in each physical IRU represents one dual-switch ASIC switch. -[IB single-plane topology - ICEX Mcell.pdf](../src/IB single-plane topology - ICEX Mcell.pdf) +[IB single-plane topology - ICEX Mcell.pdf][1]  ## IB Single-Plane Topology - Accelerated Nodes -Each of the 3 inter-connected D racks are equivalent to one half of M-Cell rack. 18 x D rack with MIC accelerated nodes [r21-r38] are equivalent to 3 M-Cell racks as shown in a diagram [7D Enhanced Hypercube](salomon/7d-enhanced-hypercube/). +Each of the 3 inter-connected D racks is equivalent to one half of an M-Cell rack. 18 x D rack with MIC accelerated nodes [r21-r38] are equivalent to 3 M-Cell racks as shown in the diagram [7D Enhanced Hypercube][2]. -As shown in a diagram [IB Topology](salomon/7d-enhanced-hypercube/#ib-topology) +As shown in the diagram [IB Topology][3]: * Racks 21, 22, 23, 24, 25, 26 are equivalent to one M-Cell rack. * Racks 27, 28, 29, 30, 31, 32 are equivalent to one M-Cell rack. * Racks 33, 34, 35, 36, 37, 38 are equivalent to one M-Cell rack. -[IB single-plane topology - Accelerated nodes.pdf](../src/IB single-plane topology - Accelerated nodes.pdf) +[IB single-plane topology - Accelerated nodes.pdf][4]  + +[1]: ../src/IB_single-plane_topology_-_ICEX_Mcell.pdf +[2]: 7d-enhanced-hypercube.md +[3]: 7d-enhanced-hypercube.md#ib-topology +[4]: ../src/IB_single-plane_topology_-_Accelerated_nodes.pdf diff --git a/docs.it4i/salomon/introduction.md b/docs.it4i/salomon/introduction.md index 90625624abbac158466071bd601d69f988c262d5..f7b1a06c58532116ef25107726aefd8ec5ae9fe6 100644 --- a/docs.it4i/salomon/introduction.md +++ b/docs.it4i/salomon/introduction.md @@ -1,8 +1,8 @@ # Introduction -Welcome to Salomon supercomputer cluster. The Salomon cluster consists of 1008 compute nodes, totalling 24192 compute cores with 129 TB RAM and giving over 2 Pflop/s theoretical peak performance. Each node is a powerful x86-64 computer, equipped with 24 cores, and at least 128 GB RAM. Nodes are interconnected through a 7D Enhanced hypercube InfiniBand network and are equipped with Intel Xeon E5-2680v3 processors. The Salomon cluster consists of 576 nodes without accelerators, and 432 nodes equipped with Intel Xeon Phi MIC accelerators. Read more in [Hardware Overview](salomon/hardware-overview/). +Welcome to the Salomon supercomputer cluster. The Salomon cluster consists of 1008 compute nodes, totalling 24192 compute cores with 129 TB RAM and giving over 2 Pflop/s theoretical peak performance. Each node is a powerful x86-64 computer, equipped with 24 cores, and at least 128 GB RAM. Nodes are interconnected through a 7D Enhanced hypercube InfiniBand network and are equipped with Intel Xeon E5-2680v3 processors. 
The Salomon cluster consists of 576 nodes without accelerators, and 432 nodes equipped with Intel Xeon Phi MIC accelerators. Read more in [Hardware Overview][1]. -The cluster runs with a [CentOS Linux](http://www.bull.com/bullx-logiciels/systeme-exploitation.html) operating system, which is compatible with the RedHat [Linux family.](http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg) +The cluster runs with a [CentOS Linux][a] operating system, which is compatible with the RedHat [Linux family][b]. ## Water-Cooled Compute Nodes With MIC Accelerators @@ -15,3 +15,8 @@ The cluster runs with a [CentOS Linux](http://www.bull.com/bullx-logiciels/syste   + +[1]: hardware-overview.md + +[a]: http://www.bull.com/bullx-logiciels/systeme-exploitation.html +[b]: http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg diff --git a/docs.it4i/salomon/job-priority.md b/docs.it4i/salomon/job-priority.md index e4515f3d976991ccc6df6d4e8dfa96b5fedc66fd..c1c6f6642df07cda0e6fa8449c8efaae997593a7 100644 --- a/docs.it4i/salomon/job-priority.md +++ b/docs.it4i/salomon/job-priority.md @@ -16,7 +16,7 @@ Queue priority is priority of queue where job is queued before execution. Queue priority has the biggest impact on job execution priority. Execution priority of jobs in higher priority queues is always greater than execution priority of jobs in lower priority queues. Other properties of job used for determining job execution priority (fair-share priority, eligible time) cannot compete with queue priority. -Queue priorities can be seen at [https://extranet.it4i.cz/rsweb/salomon/queues](https://extranet.it4i.cz/rsweb/salomon/queues) +Queue priorities can be seen [here][a]. ### Fair-Share Priority @@ -37,7 +37,7 @@ Usage counts allocated core-hours (`ncpus x walltime`). Usage is decayed, or cut ## Jobs Queued in Queue qexp Are Not Calculated to Project's Usage. !!! note - Calculated usage and fair-share priority can be seen at <https://extranet.it4i.cz/rsweb/salomon/projects>. + Calculated usage and fair-share priority can be seen [here][b]. Calculated fair-share priority can be also seen as Resource_List.fairshare attribute of a job. @@ -72,6 +72,11 @@ Specifying more accurate walltime enables better scheduling, better execution ti ### Job Placement -Job [placement can be controlled by flags during submission](salomon/job-submission-and-execution/#job_placement). +Job [placement can be controlled by flags during submission][1]. ---8<--- "mathjax.md" + +[1]: job-submission-and-execution.md#job_placement + +[a]: https://extranet.it4i.cz/rsweb/salomon/queues +[b]: https://extranet.it4i.cz/rsweb/salomon/projects diff --git a/docs.it4i/salomon/job-submission-and-execution.md b/docs.it4i/salomon/job-submission-and-execution.md index ee87ddcf223ca42c426aabf932a49b677dccc79d..b0c6e9f02f55fdf3acee50c43de4bbd5cb85a8e9 100644 --- a/docs.it4i/salomon/job-submission-and-execution.md +++ b/docs.it4i/salomon/job-submission-and-execution.md @@ -102,7 +102,7 @@ exec_vnode = (r21u05n581-mic0:naccelerators=1:ncpus=0) Per NUMA node allocation. Jobs are isolated by cpusets. -The UV2000 (node uv1) offers 3TB of RAM and 104 cores, distributed in 13 NUMA nodes. A NUMA node packs 8 cores and approx. 247GB RAM (with exception, node 11 has only 123GB RAM). In the PBS the UV2000 provides 13 chunks, a chunk per NUMA node (see [Resource allocation policy](salomon/resources-allocation-policy/)). 
The jobs on UV2000 are isolated from each other by cpusets, so that a job by one user may not utilize CPU or memory allocated to a job by other user. Always, full chunks are allocated, a job may only use resources of the NUMA nodes allocated to itself. +The UV2000 (node uv1) offers 3TB of RAM and 104 cores, distributed in 13 NUMA nodes. A NUMA node packs 8 cores and approx. 247GB RAM (with exception, node 11 has only 123GB RAM). In the PBS the UV2000 provides 13 chunks, a chunk per NUMA node (see [Resource allocation policy][1]). The jobs on UV2000 are isolated from each other by cpusets, so that a job by one user may not utilize CPU or memory allocated to a job by other user. Always, full chunks are allocated, a job may only use resources of the NUMA nodes allocated to itself. ```console $ qsub -A OPEN-0-0 -q qfat -l select=13 ./myjob ``` @@ -130,7 +130,7 @@ In this example, we allocate 2000GB of memory and 16 cores on the UV2000 for 48 ### Useful Tricks -All qsub options may be [saved directly into the jobscript](#example-jobscript-for-mpi-calculation-with-preloaded-inputs). In such a case, no options to qsub are needed. +All qsub options may be [saved directly into the jobscript][2]. In such a case, no options to qsub are needed. ```console $ qsub ./myjob ``` @@ -165,7 +165,7 @@ In this example, we allocate nodes r24u35n680 and r24u36n681, all 24 cores per n ### Placement by Network Location -Network location of allocated nodes in the [InifiBand network](salomon/network/) influences efficiency of network communication between nodes of job. Nodes on the same InifiBand switch communicate faster with lower latency than distant nodes. To improve communication efficiency of jobs, PBS scheduler on Salomon is configured to allocate nodes - from currently available resources - which are as close as possible in the network topology. +Network location of allocated nodes in the [InfiniBand network][3] influences efficiency of network communication between nodes of job. Nodes on the same InfiniBand switch communicate faster with lower latency than distant nodes. To improve communication efficiency of jobs, PBS scheduler on Salomon is configured to allocate nodes - from currently available resources - which are as close as possible in the network topology. For communication intensive jobs it is possible to set stricter requirement - to require nodes directly connected to the same InifiBand switch or to require nodes located in the same dimension group of the InifiBand network. @@ -238,7 +238,7 @@ Nodes located in the same dimension group may be allocated using node grouping o | 6D | ehc_6d | 432,576 | | 7D | ehc_7d | all | -In this example, we allocate 16 nodes in the same [hypercube dimension](salomon/7d-enhanced-hypercube/) 1 group. +In this example, we allocate 16 nodes in the same [hypercube dimension][4] 1 group. ```console $ qsub -A OPEN-0-0 -q qprod -l select=16:ncpus=24 -l place=group=ehc_1d -I ``` @@ -475,7 +475,7 @@ exit In this example, some directory on the /home holds the input file input and executable mympiprog.x . We create a directory myjob on the /scratch filesystem, copy input and executable files from the /home directory where the qsub was invoked ($PBS_O_WORKDIR) to /scratch, execute the MPI programm mympiprog.x and copy the output file back to the /home directory. The mympiprog.x is executed as one process per node, on all allocated nodes. !!! note - Consider preloading inputs and executables onto [shared scratch](storage/) before the calculation starts. 
+ Consider preloading inputs and executables onto [shared scratch][5] before the calculation starts. In some cases, it may be impractical to copy the inputs to scratch and outputs to home. This is especially true when very large input and output files are expected, or when the files should be reused by a subsequent calculation. In such a case, it is users responsibility to preload the input files on shared /scratch before the job submission and retrieve the outputs manually, after all calculations are finished. @@ -516,7 +516,7 @@ HTML commented section #2 (examples need to be reworked) !!! note Local scratch directory is often useful for single node jobs. Local scratch will be deleted immediately after the job ends. Be very careful, use of RAM disk filesystem is at the expense of operational memory. -Example jobscript for single node calculation, using [local scratch](salomon/storage/) on the node: +Example jobscript for single node calculation, using [local scratch][5] on the node: ```bash #!/bin/bash @@ -539,3 +539,9 @@ exit ``` In this example, some directory on the home holds the input file input and executable myprog.x . We copy input and executable files from the home directory where the qsub was invoked ($PBS_O_WORKDIR) to local scratch /lscratch/$PBS_JOBID, execute the myprog.x and copy the output file back to the /home directory. The myprog.x runs on one node only and may use threads. + +[1]: resources-allocation-policy.md +[2]: #example-jobscript-for-mpi-calculation-with-preloaded-inputs +[3]: network.md +[4]: 7d-enhanced-hypercube.md +[5]: storage.md diff --git a/docs.it4i/salomon/network.md b/docs.it4i/salomon/network.md index 252fe034af4e765d36da76c6e4898264cf4cfd8b..1b3f26785af5dcae9d11cae1cca31327ce99067f 100644 --- a/docs.it4i/salomon/network.md +++ b/docs.it4i/salomon/network.md @@ -1,14 +1,12 @@ # Network -All compute and login nodes of Salomon are interconnected by 7D Enhanced hypercube [InfiniBand](http://en.wikipedia.org/wiki/InfiniBand) network and by Gigabit [Ethernet](http://en.wikipedia.org/wiki/Ethernet) -network. Only [InfiniBand](http://en.wikipedia.org/wiki/InfiniBand) network may be used to transfer user data. +All compute and login nodes of Salomon are interconnected by 7D Enhanced hypercube [InfiniBand][a] network and by Gigabit [Ethernet][b] network. Only [InfiniBand][c] network may be used to transfer user data. ## InfiniBand Network -All compute and login nodes of Salomon are interconnected by 7D Enhanced hypercube [Infiniband](http://en.wikipedia.org/wiki/InfiniBand) network (56 Gbps). The network topology is a [7D Enhanced hypercube](salomon/7d-enhanced-hypercube/). +All compute and login nodes of Salomon are interconnected by 7D Enhanced hypercube [InfiniBand][a] network (56 Gbps). The network topology is a [7D Enhanced hypercube][1]. -Read more about schematic representation of the Salomon cluster [IB single-plain topology](salomon/ib-single-plane-topology/) -([hypercube dimension](salomon/7d-enhanced-hypercube/)). +Read more about the schematic representation of the Salomon cluster [IB single-plane topology][2] ([hypercube dimension][1]). The compute nodes may be accessed via the Infiniband network using ib0 network interface, in address range 10.17.0.0 (mask 255.255.224.0). The MPI may be used to establish native Infiniband connection among the nodes. @@ -47,3 +45,9 @@ $ ip addr show ib0 inet 10.17.35.19.... .... 
``` +[1]: 7d-enhanced-hypercube.md +[2]: ib-single-plane-topology.md + +[a]: http://en.wikipedia.org/wiki/InfiniBand +[b]: http://en.wikipedia.org/wiki/Ethernet +[c]: http://en.wikipedia.org/wiki/InfiniBand diff --git a/docs.it4i/src/IB single-plane topology - Accelerated nodes.pdf b/docs.it4i/src/IB_single-plane_topology_-_Accelerated_nodes.pdf similarity index 100% rename from docs.it4i/src/IB single-plane topology - Accelerated nodes.pdf rename to docs.it4i/src/IB_single-plane_topology_-_Accelerated_nodes.pdf diff --git a/docs.it4i/src/IB single-plane topology - ICEX Mcell.pdf b/docs.it4i/src/IB_single-plane_topology_-_ICEX_Mcell.pdf similarity index 100% rename from docs.it4i/src/IB single-plane topology - ICEX Mcell.pdf rename to docs.it4i/src/IB_single-plane_topology_-_ICEX_Mcell.pdf
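As an illustration of the ib0 addressing described in network.md above, the following sketch prints the InfiniBand (ib0) address of every node allocated to a running job. It is not a script from the documentation; it assumes a standard PBS environment (`$PBS_NODEFILE`) and ssh access between the allocated nodes:

```bash
#!/bin/bash
# Illustrative sketch: list the ib0 address of each node allocated to the current
# job; addresses fall into the 10.17.0.0 (mask 255.255.224.0) range mentioned above.

# iterate over the unique hostnames of the allocated nodes
for node in $(sort -u "$PBS_NODEFILE"); do
    # query the IPv4 address assigned to the ib0 interface on that node
    addr=$(ssh "$node" ip -4 -o addr show ib0 | awk '{print $4}')
    echo "$node $addr"
done
```

MPI uses the InfiniBand fabric natively; the sketch is only a way to inspect the ib0 addressing, e.g. before moving data between the allocated nodes over the InfiniBand network.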