Executing a huge number of jobs via the PBS queue may strain the system.
!!! note
    Please follow one of the procedures below if you wish to schedule more than 100 jobs at a time.
* Use [Job arrays][1] when running a huge number of [multithreaded][2] jobs (bound to a single node) or multinode jobs (multithreaded across several nodes)
* Use [GNU parallel][3] when running single-core jobs
* Combine [GNU parallel with Job arrays][4] when running a huge number of single-core jobs
## Policy
1. A user is allowed to submit at most 100 jobs. Each job may be [a job array][1].
1. The array size is at most 1000 subjobs.
## Job Arrays
If running a huge number of parallel multicore jobs (i.e., multithreaded across one or several nodes), a job array is the recommended approach.
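The 900-task example jobscript referenced below is not included in this excerpt. As a stand-in, here is a minimal sketch, assuming a `tasklist` file with one input per line and an illustrative `myprog.x` binary; the project ID, queue, and walltime are placeholders, and each subjob uses its `$PBS_ARRAY_INDEX` to pick its own line:

```bash
#!/bin/bash
#PBS -A PROJECT_ID
#PBS -q qprod
#PBS -l select=1:ncpus=16
#PBS -l walltime=02:00:00

# change to the directory the job was submitted from
cd "$PBS_O_WORKDIR" || exit 1

# each subjob processes the tasklist line matching its (1-based) array index
TASK=$(sed -n "${PBS_ARRAY_INDEX}p" tasklist)

# run the illustrative program on that task's input
./myprog.x "$TASK"
```

With 900 lines in the tasklist, submitting this script as a `-J 1-900` array (shown below) runs one task per subjob.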
### Submit the Job Array
To submit the job array, use the qsub -J command. The 900 jobs of the [example above][5] may be submitted like this:
```console
$ qsub -N JOBNAME -J 1-900 jobscript
```
Display status information for all of the user's subjobs:

```console
$ qstat -u $USER -tJ
```
Read more on job arrays in the [PBSPro Users guide][6].
## GNU Parallel
Tasks from the tasklist are executed via GNU parallel.
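The full example jobscript is elided from this excerpt. A minimal single-node sketch, assuming a `tasklist` file, an illustrative `myprog.x` binary, placeholder project/queue/walltime values, and that GNU parallel is provided as an environment module; `parallel -j 16` keeps all 16 cores of the node busy with one single-core task each:

```bash
#!/bin/bash
#PBS -A PROJECT_ID
#PBS -q qprod
#PBS -l select=1:ncpus=16
#PBS -l walltime=02:00:00

# assumption: GNU parallel is available as a module on the cluster
module load parallel

cd "$PBS_O_WORKDIR" || exit 1

# run one single-core task per tasklist line, 16 tasks at a time
parallel -j 16 ./myprog.x {} < tasklist
```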
### Submit the Job
To submit the job, use the qsub command. The 101-task job of the [example above][7] may be submitted as follows:
```console
$ qsub -N JOBNAME jobscript
```
## Job Arrays and GNU Parallel
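The combined jobscript referenced below is elided from this excerpt. A minimal sketch of its indexing logic, assuming a 992-line `tasklist`, an illustrative `myprog.x`, and placeholder project/queue/walltime values; with the `-J 1-992:32` submission shown below, `$PBS_ARRAY_INDEX` takes the values 1, 33, 65, ..., 961, so each subjob owns a 32-line slice of the tasklist and feeds it to GNU parallel:

```bash
#!/bin/bash
#PBS -A PROJECT_ID
#PBS -q qprod
#PBS -l select=1:ncpus=16
#PBS -l walltime=02:00:00

# assumption: GNU parallel is available as a module on the cluster
module load parallel

cd "$PBS_O_WORKDIR" || exit 1

# this subjob's slice: 32 consecutive tasklist lines starting at PBS_ARRAY_INDEX
IDX_FROM=$PBS_ARRAY_INDEX
IDX_TO=$(( PBS_ARRAY_INDEX + 31 ))

# run the slice through GNU parallel, 16 single-core tasks at a time
sed -n "${IDX_FROM},${IDX_TO}p" tasklist | parallel -j 16 ./myprog.x {}
```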
### Submit the Job Array (-J)
To submit the job array, use the qsub -J command. The 992-task job of the [example above][8] may be submitted like this:
```console
$ qsub -N JOBNAME -J 1-992:32 jobscript
```
In this example, we submit a job array of 31 subjobs. Note the -J 1-992:**32**; the index step of 32 means each subjob is assigned a block of 32 consecutive tasks, so the 992 tasks are covered by 31 subjobs.
## Examples
Download the examples in [capacity.zip][9], illustrating the ways listed above to run a huge number of jobs. We recommend trying out the examples before using them for production jobs.
Unzip the archive in an empty directory on Anselm and follow the instructions in the README file:
```console
$ unzip capacity.zip
$ cat README
```
[1]: ./#job-arrays
[2]: ./#shared-jobscript-on-one-node
[3]: ./#gnu-parallel
[4]: ./#job-arrays-and-gnu-parallel
[5]: ./#array_example
[6]: ../pbspro.md
[7]: ./#gp_example
[8]: ./#combined_example
[9]: ./capacity.zip
## Node Configuration
Anselm is a cluster of x86-64 Intel-based nodes built with Bull Extreme Computing bullx technology. The cluster contains four types of compute nodes.
### Compute Nodes Without Accelerators
### Compute Node Summary
| Node type                    | Count | Range       | Memory | Cores        | Queues                                    |
| ---------------------------- | ----- | ----------- | ------ | ------------ | ----------------------------------------- |
| Nodes without an accelerator | 180   | cn[1-180]   | 64 GB  | 16 @ 2.4 GHz | qexp, qprod, qlong, qfree, qprace, qatlas |
| Nodes with a GPU accelerator | 23    | cn[181-203] | 96 GB  | 16 @ 2.3 GHz | qnvidia, qexp                             |
The Anselm cluster consists of 209 computational nodes named cn[1-209], of which 180 are regular compute nodes, 23 are GPU Kepler K20 accelerated nodes, 4 are MIC Xeon Phi 5110P accelerated nodes, and 2 are fat nodes. Each node is a powerful x86-64 computer, equipped with 16 cores (two eight-core Intel Sandy Bridge processors), at least 64 GB of RAM, and a local hard drive. User access to the Anselm cluster is provided by two login nodes, login[1,2]. The nodes are interlinked through high-speed InfiniBand and Ethernet networks. All nodes share a 320 TB /home disk for storage of user files. The 146 TB shared /scratch storage is available for scratch data.
The fat nodes are equipped with a large amount (512 GB) of memory. The virtualization infrastructure provides resources to run long-term servers and services in virtual mode. Fat nodes and virtual servers may access 45 TB of dedicated block storage. Accelerated nodes, fat nodes, and virtualization infrastructure are available [upon request][a] from a PI.
Schematic representation of the Anselm cluster. Each box represents a node (computer) or storage capacity:
There are four types of compute nodes:

* 180 compute nodes without an accelerator
* 23 compute nodes with a GPU accelerator - an NVIDIA Kepler K20
* 4 compute nodes with a MIC accelerator - an Intel Xeon Phi 5110P
* 2 fat nodes - equipped with 512 GB of RAM and two 100 GB SSD drives
[More about Compute nodes][1].
GPU- and MIC-accelerated nodes are available upon request; see the [Resources Allocation Policy][2].
All of these nodes are interconnected through fast InfiniBand and Ethernet networks. [More about the Network][3].
Every chassis provides an InfiniBand switch, marked **isw**, connecting all nodes in the chassis, as well as connecting the chassis to the upper-level switches.
All of the nodes share a 320 TB /home disk for storage of user files. The 146 TB shared /scratch storage is available for scratch data. These file systems are provided by the Lustre parallel file system. There is also local disk storage available on all compute nodes in /lscratch. [More about Storage][4].
User access to the Anselm cluster is provided by two login nodes, login1 and login2, and the data mover node dm1. [More about accessing the cluster][5].
The parameters are summarized in the following tables:
| Primary purpose | High Performance Computing |
| Architecture of compute nodes | x86-64 |
| Operating system | Linux (CentOS) |
| [**Compute nodes**][1] | |
| Total | 209 |
| Processor cores | 16 (2 x 8 cores) |
| RAM | min. 64 GB, min. 4 GB per core |
| MIC accelerated | 2 x Intel Sandy Bridge E5-2470, 2.3 GHz | 96 GB | Intel Xeon Phi 5110P |
| Fat compute node | 2 x Intel Sandy Bridge E5-2665, 2.4 GHz | 512 GB | - |
For more details refer to [Compute nodes][1], [Storage][4], and [Network][3].
[1]: ./compute-nodes.md
[2]: ./resources-allocation-policy.md
[3]: ./network.md
[4]: ./storage.md
[5]: ./shell-and-data-access.md
[a]: https://support.it4i.cz/rt
# Introduction
Welcome to the Anselm supercomputer cluster. The Anselm cluster consists of 209 compute nodes, totalling 3344 compute cores with 15 TB RAM, giving over 94 TFLOP/s theoretical peak performance. Each node is a powerful x86-64 computer, equipped with 16 cores, at least 64 GB of RAM, and a 500 GB hard disk drive. Nodes are interconnected through a fully non-blocking fat-tree InfiniBand network, and are equipped with Intel Sandy Bridge processors. A few nodes are also equipped with NVIDIA Kepler GPU or Intel Xeon Phi MIC accelerators. Read more in [Hardware Overview][1].
The cluster runs an operating system compatible with the RedHat [Linux family][a]. We have installed a wide range of software packages targeted at different scientific domains. These packages are accessible via the [modules environment][2].
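For illustration only, a typical session might look like the following; the module name is a placeholder, and the actual names and versions depend on what is installed:

```console
$ module avail             # list the software packages installed on the cluster
$ module load SOME_MODULE  # load a package (placeholder name) into the current shell
$ module list              # show the modules currently loaded
```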
The user data shared file-system (HOME, 320 TB) and job data shared file-system (SCRATCH, 146 TB) are available to users.
The PBS Professional workload manager provides [computing resource allocation and job execution][3].
Read more on how to [apply for resources][4], [obtain login credentials][5], and [access the cluster][6].
[1]: ./hardware-overview.md
[2]: ../environment-and-modules.md
[3]: ./resources-allocation-policy.md
[4]: ../general/applying-for-resources.md
[5]: ../general/obtaining-login-credentials/obtaining-login-credentials.md
[6]: ./shell-and-data-access.md
[a]: http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg