Commit faf64a14 authored by David Hrbáč

Merge branch 'dgx' into 'master'

Dgx

See merge request !240
parents 22727d7a ed5b910c
Pipeline #6197 passed with stages in 1 minute and 27 seconds
......
@@ -60,9 +60,9 @@ The parameters are summarized in the following tables:
For more details refer to [Compute nodes][1], [Storage][4], and [Network][3].
[1]: compute-nodes.md
[2]: ../general/resources-allocation-policy.md
[3]: network.md
[4]: storage.md
[5]: ../general/shell-and-data-access.md
[a]: https://support.it4i.cz/rt
# Introduction
Welcome to the Anselm supercomputer cluster. The Anselm cluster consists of 209 compute nodes, totaling 3344 compute cores with 15 TB RAM, giving over 94 TFLOP/s theoretical peak performance. Each node is a powerful x86-64 computer, equipped with 16 cores, at least 64 GB of RAM, and a 500 GB hard disk drive. Nodes are interconnected through a fully non-blocking fat-tree InfiniBand network, and are equipped with Intel Sandy Bridge processors. A few nodes are also equipped with NVIDIA Kepler GPU or Intel Xeon Phi MIC accelerators. Read more in [Hardware Overview][1].
The cluster runs with an operating system which is compatible with the RedHat [Linux family][a]. We have installed a wide range of software packages targeted at different scientific domains. These packages are accessible via the [modules environment][2].
@@ -12,9 +12,9 @@ Read more on how to [apply for resources][4], [obtain login credentials][5] and
[1]: hardware-overview.md
[2]: ../environment-and-modules.md
[3]: ../general/resources-allocation-policy.md
[4]: ../general/applying-for-resources.md
[5]: ../general/obtaining-login-credentials/obtaining-login-credentials.md
[6]: ../general/shell-and-data-access.md
[a]: http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg
# Resources Allocation Policy
## Job Queue Policies
The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and the resources available to the Project. The Fair-share system of Anselm ensures that individual users may consume approximately equal amounts of resources per week. Detailed information can be found in the [Job scheduling][1] section. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. The following table provides the queue partitioning overview:
!!! note
Check the queue status at <https://extranet.it4i.cz/rsweb/anselm/>
| queue | active project | project resources | nodes | min ncpus | priority | authorization | walltime |
| ------------------- | -------------- | -------------------- | ---------------------------------------------------- | --------- | -------- | ------------- | -------- |
| qexp | no | none required | 209 nodes | 1 | 150 | no | 1 h |
| qprod | yes | > 0 | 180 nodes w/o accelerator | 16 | 0 | no | 24/48 h |
| qlong | yes | > 0 | 180 nodes w/o accelerator | 16 | 0 | no | 72/144 h |
| qnvidia, qmic | yes | > 0 | 23 nvidia nodes, 4 mic nodes | 16 | 200 | yes | 24/48 h |
| qfat | yes | > 0 | 2 fat nodes | 16 | 200 | yes | 24/144 h |
| qfree | yes | < 120% of allocation | 180 w/o accelerator | 16 | -1024 | no | 12 h |
!!! note
**The qfree queue is not free of charge**. [Normal accounting][2] applies. However, it allows for utilization of free resources, once a project has exhausted all its allocated computational resources. This does not apply to Director's Discretion projects (DD projects) by default. Usage of qfree after exhaustion of DD projects' computational resources is allowed after request for this queue.
**The qexp queue is equipped with nodes which do not have exactly the same CPU clock speed.** Should you need the nodes to have exactly the same CPU speed, you have to select the proper nodes during the PBS job submission.
* **qexp**, the Express queue: This queue is dedicated to testing and running very small jobs. It is not required to specify a project to enter the qexp. There are always 2 nodes reserved for this queue (w/o accelerators), and a maximum of 8 nodes are available via the qexp for a particular user, from a pool of nodes containing Nvidia accelerated nodes (cn181-203), MIC accelerated nodes (cn204-207) and Fat nodes with 512 GB of RAM (cn208-209). This enables us to test and tune accelerated code and code with higher RAM requirements. The nodes may be allocated on a per core basis. No special authorization is required to use qexp. The maximum runtime in qexp is 1 hour.
* **qprod**, the Production queue: This queue is intended for normal production runs. It is required that an active project with nonzero remaining resources is specified to enter the qprod. All nodes may be accessed via the qprod queue, except the reserved ones. 178 nodes without accelerators are included. Full nodes, 16 cores per node, are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprod is 48 hours.
* **qlong**, the Long queue: This queue is intended for long production runs. It is required that an active project with nonzero remaining resources is specified to enter the qlong. Only 60 nodes without acceleration may be accessed via the qlong queue. Full nodes, 16 cores per node, are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qlong is 144 hours (three times that of the standard qprod time - 3 x 48 h).
* **qnvidia**, **qmic**, **qfat**, the Dedicated queues: The queue qnvidia is dedicated to accessing the Nvidia accelerated nodes, the qmic to accessing MIC nodes, and qfat the Fat nodes. It is required that an active project with nonzero remaining resources is specified to enter these queues. 23 nvidia, 4 mic, and 2 fat nodes are included. Full nodes, 16 cores per node, are allocated. The queues run with very high priority; the jobs will be scheduled before the jobs coming from the qexp queue. A PI needs to explicitly ask [support][a] for authorization to enter the dedicated queues for all users associated with her/his project.
* **qfree**, the Free resource queue: The queue qfree is intended for utilization of free resources, after a project has exhausted all of its allocated computational resources (does not apply to DD projects by default; DD projects have to request permission to use qfree after exhaustion of computational resources). It is required that an active project is specified to enter the queue. Consumed resources will be accounted to the Project. Access to the qfree queue is automatically removed if consumed resources exceed 120% of the resources allocated to the Project. Only 180 nodes without accelerators may be accessed from this queue. Full nodes, 16 cores per node, are allocated. The queue runs with very low priority and no special authorization is required to use it. The maximum runtime in qfree is 12 hours.
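For illustration, entering one of these queues at submission time might look like this (a sketch; PROJECT_ID, the node count, the walltime, and the jobscript name are placeholders):
```console
$ qsub -A PROJECT_ID -q qprod -l select=2:ncpus=16,walltime=24:00:00 ./myjob.sh
```
This requests two full 16-core nodes for 24 hours, within the qprod limits listed above.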
## Queue Notes
The job wall clock time defaults to **half the maximum time**, see the table above. Longer wall time limits can be [set manually, see examples][3].
Jobs that exceed the reserved wall clock time (Req'd Time) get killed automatically. The wall clock time limit can be changed for queuing jobs (state Q) using the qalter command, however it cannot be changed for a running job (state R).
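For instance, adjusting the limit of a still-queued job could look like this (the job ID is a placeholder):
```console
$ qalter -l walltime=12:00:00 12345.dm2
```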
Anselm users may check the current queue configuration [here][b].
## Queue Status
!!! tip
Check the status of jobs, queues and compute nodes [here][c].
![rspbs web interface](../img/rsweb.png)
Display the queue status on Anselm:
```console
$ qstat -q
```
The PBS allocation overview may be obtained also using the rspbs command:
```console
$ rspbs
Usage: rspbs [options]
Options:
--version show program's version number and exit
-h, --help show this help message and exit
--get-node-ncpu-chart
Print chart of allocated ncpus per node
--summary Print summary
--get-server-details Print server
--get-queues Print queues
--get-queues-details Print queues details
--get-reservations Print reservations
--get-reservations-details
Print reservations details
--get-nodes Print nodes of PBS complex
--get-nodeset Print nodeset of PBS complex
--get-nodes-details Print nodes details
--get-jobs Print jobs
--get-jobs-details Print jobs details
--get-jobs-check-params
Print jobid, job state, session_id, user, nodes
--get-users Print users of jobs
--get-allocated-nodes
Print allocated nodes of jobs
--get-allocated-nodeset
Print allocated nodeset of jobs
--get-node-users Print node users
--get-node-jobs Print node jobs
--get-node-ncpus Print number of ncpus per node
--get-node-allocated-ncpus
Print number of allocated ncpus per node
--get-node-qlist Print node qlist
--get-node-ibswitch Print node ibswitch
--get-user-nodes Print user nodes
--get-user-nodeset Print user nodeset
--get-user-jobs Print user jobs
--get-user-jobc Print number of jobs per user
--get-user-nodec Print number of allocated nodes per user
--get-user-ncpus Print number of allocated ncpus per user
--get-qlist-nodes Print qlist nodes
--get-qlist-nodeset Print qlist nodeset
--get-ibswitch-nodes Print ibswitch nodes
--get-ibswitch-nodeset
Print ibswitch nodeset
--state=STATE Only for given job state
--jobid=JOBID Only for given job ID
--user=USER Only for given user
--node=NODE Only for given node
--nodestate=NODESTATE
Only for given node state (affects only --get-node*
--get-qlist-* --get-ibswitch-* actions)
--incl-finished Include finished jobs
```
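For example, the listing and filtering options above can be combined; the username below is a placeholder:
```console
$ rspbs --summary
$ rspbs --get-user-jobs --user=username
$ rspbs --get-allocated-nodes --user=username
```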
---8<--- "resource_accounting.md"
---8<--- "mathjax.md"
[1]: job-priority.md
[2]: #resources-accounting-policy
[3]: job-submission-and-execution.md
[a]: https://support.it4i.cz/rt/
[b]: https://extranet.it4i.cz/rsweb/anselm/queues
[c]: https://extranet.it4i.cz/rsweb/anselm/
@@ -4,7 +4,7 @@ There are two main shared file systems on Anselm cluster, the [HOME][1] and [SCR
## Archiving
Don't use the shared filesystems as a backup for large amounts of data or as a long-term archiving means. The academic staff and students of research institutions in the Czech Republic can use the [CESNET storage service][3], which is available via SSHFS.
## Shared Filesystems
@@ -333,7 +333,7 @@ The procedure to obtain the CESNET access is quick and trouble-free.
### Understanding CESNET Storage
!!! note
It is very important to understand the CESNET storage before uploading data. [Read][i] first.
Once registered for CESNET Storage, you may [access the storage][j] in a number of ways. We recommend the SSHFS and RSYNC methods.
......
# NVIDIA DGX-2
[DGX-2][a] builds upon [DGX-1][b] in several ways. It introduces NVIDIA's new NVSwitch, enabling 300 GB/s chip-to-chip communication at 12 times the speed of PCIe.
With NVLink2, it enables sixteen GPUs to be grouped together in a single system, for a total bandwidth going beyond 14 TB/s. Add a pair of Xeon CPUs, 1.5 TB of memory, and 30 TB of NVMe storage, and the result is a system that consumes 10 kW and weighs 163.29 kg, but offers easily double the performance of the DGX-1.
NVIDIA likes to tout that this means it offers a total of ~2 PFLOPS of compute performance in a single system when using the tensor cores.
<div align="center">
<iframe src="https://www.youtube.com/embed/OTOGw0BRqK0" width="50%" height="195" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div>
![](../img/dgx1.png)
| NVIDIA DGX-2 | |
| --- | --- |
| CPUs | 2 x Intel Xeon Platinum |
| GPUs | 16 x NVIDIA Tesla V100 32GB HBM2 |
| System Memory | Up to 1.5 TB DDR4 |
| GPU Memory | 512 GB HBM2 (16 x 32 GB) |
| Storage | 30 TB NVMe, Up to 60 TB |
| Networking | 8 x InfiniBand or 8 x 100 GbE |
| Power | 10 kW |
| Weight | 350 lbs |
| GPU Throughput | Tensor: 1920 TFLOPs, FP16: 480 TFLOPs, FP32: 240 TFLOPs, FP64: 120 TFLOPs |
![](../img/dgx2.png)
AlexNet, the network that 'started' the latest machine learning revolution, now takes 18 minutes to train.
The topology of the DGX-2 means that all 16 GPUs are able to pool their memory into a unified memory space, though with the usual tradeoffs involved if going off-chip.
Not unlike the Tesla V100 memory capacity increase then, one of NVIDIA’s goals here is to build a system that can keep in-memory workloads that would be too large for an 8 GPU cluster. Providing one such example, NVIDIA is saying that the DGX-2 is able to complete the training process for FAIRSEQ – a neural network model for language translation – 10x faster than a DGX-1 system, bringing it down to less than two days total rather than 15.
![](../img/dgx3.png)
Otherwise, similar to its DGX-1 counterpart, the DGX-2 is designed to be a powerful server in its own right. On the storage side the DGX-2 comes with 30TB of NVMe-based solid state storage. And for clustering or further inter-system communications, it also offers InfiniBand and 100GigE connectivity, up to eight of them.
![](../img/dgx4.png)
The new NVSwitches mean that the PCIe lanes of the CPUs can be redirected elsewhere, most notably towards storage and networking connectivity.
[a]: https://www.nvidia.com/content/dam/en-zz/es_em/Solutions/Data-Center/dgx-2/nvidia-dgx-2-datasheet.pdf
[b]: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/dgx-1/dgx-1-ai-supercomputer-datasheet-v4.pdf
@@ -42,7 +42,7 @@ Also remember that display number should be less or equal 99.
Based on this **we have chosen display number 61** for us, so this is the number you will see in the examples below.
!!! note
Your situation may be different, so your choice of display number may also be different. **Choose and use your own display number accordingly!**
Start your VNC server on the chosen display number (61):
@@ -76,7 +76,7 @@ username :102
```
!!! note
The VNC server runs on port 59xx, where xx is the display number. So, you get your port number simply as 5900 + display number; in our example 5900 + 61 = 5961. Another example: for display number 102 the TCP port is 5900 + 102 = 6002, but be aware that TCP ports above 6000 are often used by X11. **Calculate your own port number and use it instead of 5961 from the examples below!**
To access the VNC server you have to create a tunnel between the login node using TCP port 5961 and your machine using a free TCP port (for simplicity the very same) in the next step. See examples for [Linux/Mac OS][2] and [Windows][3].
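A minimal sketch of such a tunnel from a Linux/Mac OS client follows (username is a placeholder; substitute your own port for 5961):
```console
local $ ssh -TN -f -L 5961:localhost:5961 username@anselm.it4i.cz
```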
@@ -260,4 +260,4 @@ Example described above:
[1]: x-window-system.md
[2]: #linuxmac-os-example-of-creating-a-tunnel
[3]: #windows-example-of-creating-a-tunnel
[4]: ../../job-submission-and-execution.md
@@ -93,7 +93,7 @@ local $ ssh-keygen -C 'username@organization.example.com' -f additional_key
```
!!! note
Enter a **strong passphrase** to secure your private key.
You can insert an additional public key into the authorized_keys file for authentication with your own private key. Additional records in the authorized_keys file must be delimited by a new line. Users are not advised to remove the default public key from the authorized_keys file.
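For illustration, once the new public key has been copied to the cluster, appending it might look like this (a sketch; additional_key.pub is the file generated in the ssh-keygen example above):
```console
$ cat additional_key.pub >> ~/.ssh/authorized_keys
```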
......
@@ -7,7 +7,7 @@ In many cases, it is useful to submit a huge (>100+) number of computational job
However, executing a huge number of jobs via the PBS queue may strain the system. This strain may result in slow response to commands, inefficient scheduling, and overall degradation of performance and user experience, for all users. For this reason, the number of jobs is **limited to 100 per user, 1000 per job array**.
!!! note
Follow one of the procedures below in case you wish to schedule more than 100 jobs at a time.
* Use [Job arrays][1] when running a huge number of [multithread][2] (bound to one node only) or multinode (multithread across several nodes) jobs
* Use [GNU parallel][3] when running single core jobs
@@ -45,7 +45,7 @@ First, we create a tasklist file (or subjobs list), listing all tasks (subjobs)
$ find . -name 'file*' > tasklist
```
Then we create the jobscript for the Anselm cluster:
```bash
#!/bin/bash
@@ -70,6 +70,31 @@ cp $PBS_O_WORKDIR/$TASK input ; cp $PBS_O_WORKDIR/myprog.x .
cp output $PBS_O_WORKDIR/$TASK.out
```
Then we create jobscript for Salomon cluster:
```bash
#!/bin/bash
#PBS -A PROJECT_ID
#PBS -q qprod
#PBS -l select=1:ncpus=24,walltime=02:00:00
# change to scratch directory
SCR=/scratch/work/user/$USER/$PBS_JOBID
mkdir -p $SCR ; cd $SCR || exit
# get individual tasks from tasklist with index from PBS JOB ARRAY
TASK=$(sed -n "${PBS_ARRAY_INDEX}p" $PBS_O_WORKDIR/tasklist)
# copy input file and executable to scratch
cp $PBS_O_WORKDIR/$TASK input ; cp $PBS_O_WORKDIR/myprog.x .
# execute the calculation
./myprog.x < input > output
# copy output file to submit directory
cp output $PBS_O_WORKDIR/$TASK.out
```
In this example, the submit directory holds the 900 input files, the executable myprog.x, and the jobscript file. As an input for each run, we take the filename of the input file from the created tasklist file. We copy the input file to the local scratch memory /lscratch/$PBS_JOBID, execute the myprog.x and copy the output file back to the submit directory, under the $TASK.out name. The myprog.x runs on one node only and must use threads to run in parallel. Be aware that if the myprog.x **is not multithreaded**, then all the **jobs are run as single thread programs in a sequential** manner. Due to the allocation of the whole node, the accounted time is equal to the usage of the whole node, while using only 1/16 of the node!
If running a huge number of parallel multicore (in the sense of multinode, multithreaded, e.g. MPI enabled) jobs is needed, then a job array approach should be used. The main difference compared to the previous examples using one node is that the local scratch memory should not be used (as it is not shared between nodes) and MPI or other techniques for parallel multinode processing have to be used properly.
@@ -78,11 +103,20 @@ If running a huge number of parallel multicore (in means of multinode multithrea
To submit the job array, use the qsub -J command. The 900 jobs of the [example above][5] may be submitted like this:
#### Anselm
```console
$ qsub -N JOBNAME -J 1-900 jobscript
12345[].dm2
```
#### Salomon
```console
$ qsub -N JOBNAME -J 1-900 jobscript
506493[].isrv5
```
In this example, we submit a job array of 900 subjobs. Each subjob will run on one full node and is assumed to take less than 2 hours (note the #PBS directives in the beginning of the jobscript file, don't forget to set your valid PROJECT_ID and desired queue).
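The progress of the array can then be checked with qstat; the job ID below is the one returned in the Anselm example above:
```console
$ qstat -a 12345[].dm2
```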
Sometimes for testing purposes, you may need to submit a one-element only array. This is not allowed by PBSPro, but there's a workaround:
@@ -152,7 +186,7 @@ Read more on job arrays in the [PBSPro Users guide][6].
!!! note
Use GNU parallel to run many single core tasks on one node.
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. GNU parallel is most useful when running single core jobs via the queue systems.
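As a minimal sketch of the pattern only (not the full cluster jobscript; myprog.x and tasklist are the names used in the job array example above), GNU parallel can run one task per core, taking its arguments from the tasklist file:
```console
$ parallel "./myprog.x < {} > {}.out" :::: tasklist
```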
For more information and examples see the parallel man page:
@@ -308,7 +342,7 @@ In this example, we submit a job array of 31 subjobs. Note the -J 1-992:**32**,
Download the examples in [capacity.zip][9], illustrating the above listed ways to run a huge number of jobs. We recommend trying out the examples before using this for running production jobs.
Unzip the archive in an empty directory on the cluster and follow the instructions in the README file.
```console
$ unzip capacity.zip
......
@@ -56,7 +56,7 @@ Job execution priority (job sort formula) is calculated as:
### Job Backfilling
The scheduler uses job backfilling.
Backfilling means fitting smaller jobs around the higher-priority jobs that the scheduler is going to run next, in such a way that the higher-priority jobs are not delayed. Backfilling allows us to keep resources from becoming idle when the top job (the job with the highest execution priority) cannot run.
@@ -71,5 +71,11 @@ Specifying more accurate walltime enables better scheduling, better execution ti
---8<--- "mathjax.md"
### Job Placement
Job [placement can be controlled by flags during submission][1].
[1]: job-submission-and-execution.md#job_placement
[a]: https://extranet.it4i.cz/rsweb/anselm/queues
[b]: https://extranet.it4i.cz/rsweb/anselm/projects
@@ -40,9 +40,9 @@ Read more on [Capacity computing][6] page.
[1]: #terminology-frequently-used-on-these-pages
[2]: ../pbspro.md
[3]: job-priority.md#fair-share-priority
[4]: resources-allocation-policy.md
[5]: job-submission-and-execution.md
[6]: capacity-computing.md
[a]: https://extranet.it4i.cz/rsweb/salomon/queues
@@ -2,10 +2,34 @@
## Job Queue Policies
The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and the resources available to the Project. The Fair-share system of Anselm ensures that individual users may consume approximately equal amounts of resources per week. Detailed information can be found in the [Job scheduling][1] section. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. The following table provides the queue partitioning overview:
!!! note
Check the queue status [here][z].
### Anselm
| queue | active project | project resources | nodes | min ncpus | priority | authorization | walltime |
| ------------------- | -------------- | -------------------- | ---------------------------------------------------- | --------- | -------- | ------------- | -------- |
| qexp | no | none required | 209 nodes | 1 | 150 | no | 1 h |
| qprod | yes | > 0 | 180 nodes w/o accelerator | 16 | 0 | no | 24/48 h |
| qlong | yes | > 0 | 180 nodes w/o accelerator | 16 | 0 | no | 72/144 h |
| qnvidia | yes | > 0 | 23 nvidia nodes | 16 | 200 | yes | 24/48 h |
| qfat | yes | > 0 | 2 fat nodes | 16 | 200 | yes | 24/144 h |
| qfree | yes | < 120% of allocation | 180 w/o accelerator | 16 | -1024 | no | 12 h |
!!! note
**The qfree queue is not free of charge**. [Normal accounting][2] applies. However, it allows for utilization of free resources, once a project has exhausted all its allocated computational resources. This does not apply to Director's Discretion projects (DD projects) by default. Usage of qfree after exhaustion of DD projects' computational resources is allowed after request for this queue.
**The qexp queue is equipped with nodes which do not have exactly the same CPU clock speed.** Should you need the nodes to have exactly the same CPU speed, you have to select the proper nodes during the PBS job submission.
* **qexp**, the Express queue: This queue is dedicated to testing and running very small jobs. It is not required to specify a project to enter the qexp. There are always 2 nodes reserved for this queue (w/o accelerators), and a maximum of 8 nodes are available via the qexp for a particular user, from a pool of nodes containing Nvidia accelerated nodes (cn181-203), MIC accelerated nodes (cn204-207) and Fat nodes with 512 GB of RAM (cn208-209). This enables us to test and tune accelerated code and code with higher RAM requirements. The nodes may be allocated on a per core basis. No special authorization is required to use qexp. The maximum runtime in qexp is 1 hour.
* **qprod**, the Production queue: This queue is intended for normal production runs. It is required that an active project with nonzero remaining resources is specified to enter the qprod. All nodes may be accessed via the qprod queue, except the reserved ones. 178 nodes without accelerators are included. Full nodes, 16 cores per node, are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprod is 48 hours.
* **qlong**, the Long queue: This queue is intended for long production runs. It is required that an active project with nonzero remaining resources is specified to enter the qlong. Only 60 nodes without acceleration may be accessed via the qlong queue. Full nodes, 16 cores per node, are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qlong is 144 hours (three times that of the standard qprod time - 3 x 48 h).
* **qnvidia**, **qmic**, **qfat**, the Dedicated queues: The queue qnvidia is dedicated to accessing the Nvidia accelerated nodes, the qmic to accessing MIC nodes, and qfat the Fat nodes. It is required that an active project with nonzero remaining resources is specified to enter these queues. 23 nvidia, 4 mic, and 2 fat nodes are included. Full nodes, 16 cores per node, are allocated. The queues run with very high priority; the jobs will be scheduled before the jobs coming from the qexp queue. A PI needs to explicitly ask [support][a] for authorization to enter the dedicated queues for all users associated with her/his project.
* **qfree**, the Free resource queue: The queue qfree is intended for utilization of free resources, after a project has exhausted all of its allocated computational resources (does not apply to DD projects by default; DD projects have to request permission to use qfree after exhaustion of computational resources). It is required that an active project is specified to enter the queue. Consumed resources will be accounted to the Project. Access to the qfree queue is automatically removed if consumed resources exceed 120% of the resources allocated to the Project. Only 180 nodes without accelerators may be accessed from this queue. Full nodes, 16 cores per node, are allocated. The queue runs with very low priority and no special authorization is required to use it. The maximum runtime in qfree is 12 hours.
### Salomon
| queue | active project | project resources | nodes | min ncpus | priority | authorization | walltime |
| ------------------------------- | -------------- | -------------------- | ------------------------------------------------------------- | --------- | -------- | ------------- | --------- |
@@ -35,26 +59,26 @@ The resources are allocated to the job in a fair-share fashion, subject to const
## Queue Notes
The job wall clock time defaults to **half the maximum time**, see the table above. Longer wall time limits can be [set manually, see examples][3].
Jobs that exceed the reserved wall clock time (Req'd Time) get killed automatically. The wall clock time limit can be changed for queuing jobs (state Q) using the qalter command, however it cannot be changed for a running job (state R).
Anselm users may check the current queue configuration [here][b].
## Queue Status
!!! tip
Check the status of jobs, queues and compute nodes [here][c].
![rspbs web interface](../img/rsweb.png)
Display the queue status on Anselm:
```console
$ qstat -q
```
The PBS allocation overview may be obtained also using the rspbs command:
```console
$ rspbs
@@ -63,6 +87,9 @@ Usage: rspbs [options]
Options:
--version show program's version number and exit
-h, --help show this help message and exit
--get-node-ncpu-chart
Print chart of allocated ncpus per node
--summary Print summary
--get-server-details Print server
--get-queues Print queues
--get-queues-details Print queues details
@@ -99,10 +126,6 @@ Options:
--get-ibswitch-nodes Print ibswitch nodes
--get-ibswitch-nodeset
Print ibswitch nodeset
--summary Print summary
--get-node-ncpu-chart
Obsolete. Print chart of allocated ncpus per node
--server=SERVER Use given PBS server
--state=STATE Only for given job state
--jobid=JOBID Only for given job ID
--user=USER Only for given user
@@ -118,8 +141,9 @@ Options:
---8<--- "mathjax.md"
[1]: job-priority.md
[2]: #resources-accounting-policy
[3]: job-submission-and-execution.md
[a]: https://support.it4i.cz/rt/
[b]: https://extranet.it4i.cz/rsweb/anselm/queues
[c]: https://extranet.it4i.cz/rsweb/anselm/
docs.it4i/img/favicon.ico (image replaced: 32.2 KB → 2.72 KB)
docs.it4i/img/fig1.png (image replaced: 154 KB)
# Documentation
Welcome to the IT4Innovations documentation pages. The IT4Innovations national supercomputing center operates the supercomputers [Anselm][2] and [Salomon][1]. The supercomputers are [available][4] to the academic community within the Czech Republic and Europe, and the industrial community worldwide. The purpose of these pages is to provide comprehensive documentation of the hardware, software and usage of the computers.
## How to Read the Documentation
@@ -14,7 +11,7 @@ Welcome to the IT4Innovations documentation pages. The IT4Innovations national s
## Getting Help and Support
!!! note
Contact [support\[at\]it4i.cz][a] for help and support regarding the cluster technology at IT4Innovations. Use **Czech**, **Slovak** or **English** language for communication with us. Follow the status of your request to IT4Innovations [here][b]. The IT4Innovations support team will use best efforts to resolve requests within thirty days.
Use your IT4Innovations username and password to log in to the [support][b] portal.
@@ -34,7 +31,7 @@ In many cases, you will run your own code on the cluster. In order to fully expl
* **node:** a computer, interconnected via a network to other computers - Computational nodes are powerful computers, designed for, and dedicated to executing demanding scientific computations.
* **core:** a processor core, a unit of processor, executing computations
* **core-hour:** also normalized core-hour, NCH. A metric of computer utilization, [see definition][5].
* **job:** a calculation running on the supercomputer - the job allocates and utilizes the resources of the supercomputer for certain time.
* **HPC:** High Performance Computing
* **HPC (computational) resources:** corehours, storage capacity, software licences
@@ -71,9 +68,8 @@ By doing so, you can save other readers from frustration and help us improve.
[1]: salomon/introduction.md
[2]: anselm/introduction.md
[4]: general/applying-for-resources.md
[5]: general/resources-allocation-policy.md#normalized-core-hours-nch
[a]: mailto:support@it4i.cz
[b]: http://support.it4i.cz/rt
......
# Available Modules
## Compiler
| Module | Description |
| ------ | ----------- |
| [icc](http://software.intel.com/en-us/intel-compilers/) | Intel C and C++ compilers |
## Data
| Module | Description |
| ------ | ----------- |
| [HDF5](http://www.hdfgroup.org/HDF5/) | HDF5 is a unique technology suite that makes possible the management of extremely large and complex data collections. |
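A quick usage sketch, assuming the Lmod `ml` shorthand used on the clusters (the exact module versions differ, so list them first):
```console
$ ml av icc      # list the available icc versions
$ ml icc HDF5    # load the Intel compilers and HDF5
$ ml             # show currently loaded modules
```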