Commit ffcc5c62 authored by Pavel Gajdušek

Merge remote-tracking branch 'origin/master' into gajdusek_clean

Conflicts:
	mkdocs.yml
parents de451d7b c8119615
# Resources Allocation Policy
## Resources Allocation Policy
## Introduction
### Job queue policies
The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. The Fair-share at Anselm ensures that individual users may consume an approximately equal amount of resources per week. Detailed information is available in the [Job scheduling](job-priority/) section. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. The following table provides the queue partitioning overview:
......@@ -27,7 +27,7 @@ The resources are allocated to the job in a fair-share fashion, subject to const
* **qnvidia**, qmic, qfat, the Dedicated queues: The queue qnvidia is dedicated to accessing the Nvidia accelerated nodes, the qmic to accessing MIC nodes and qfat the Fat nodes. An active project with nonzero remaining resources must be specified to enter these queues. 23 nvidia, 4 mic and 2 fat nodes are included. Full nodes, 16 cores per node, are allocated. The queues run with very high priority, the jobs will be scheduled before the jobs coming from the qexp queue. A PI needs to explicitly ask [support](https://support.it4i.cz/rt/) for authorization to enter the dedicated queues for all users associated with her/his Project.
* **qfree**, The Free resource queue: The queue qfree is intended for utilization of free resources, after a Project has exhausted all its allocated computational resources (this does not apply to DD projects by default; DD projects have to request permission to use qfree after exhaustion of computational resources). An active project must be specified to enter the queue, however no remaining resources are required. Consumed resources will be accounted to the Project. Only 178 nodes without accelerator may be accessed from this queue. Full nodes, 16 cores per node, are allocated. The queue runs with very low priority and no special authorization is required to use it. The maximum runtime in qfree is 12 hours.
### Notes
### Queue notes
The job wall clock time defaults to **half the maximum time**, see table above. Longer wall time limits can be [set manually, see examples](job-submission-and-execution/).
......@@ -35,7 +35,7 @@ Jobs that exceed the reserved wall clock time (Req'd Time) get killed automatica
Anselm users may check current queue configuration at <https://extranet.it4i.cz/anselm/queues>.
### Queue Status
### Queue status
!!! tip
Check the status of jobs, queues and compute nodes at <https://extranet.it4i.cz/anselm/>
......@@ -106,24 +106,8 @@ Options:
--incl-finished Include finished jobs
```
## Resources Accounting Policy
---8<--- "resource_accounting.md"
### Core-Hours
The resources that are currently subject to accounting are the core-hours. The core-hours are accounted on the wall clock basis. The accounting runs whenever the computational cores are allocated or blocked via the PBS Pro workload manager (the qsub command), regardless of whether the cores are actually used for any calculation. 1 core-hour is defined as 1 processor core allocated for 1 hour of wall clock time. Allocating a full node (16 cores) for 1 hour accounts to 16 core-hours. See example in the [Job submission and execution](job-submission-and-execution/) section.
---8<--- "mathjax.md"
### Check Consumed Resources
!!! note
The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients>
User may check at any time, how many core-hours have been consumed by himself/herself and his/her projects. The command is available on clusters' login nodes.
```console
$ it4ifree
Password:
PID Total Used ...by me Free
-------- ------- ------ -------- -------
OPEN-0-0 1500000 400644 225265 1099356
DD-13-1 10000 2606 2606 7394
```
......@@ -15,7 +15,11 @@ In order to display graphical user interface GUI of various software tools, you
## X Display Forwarding on Windows
On Windows use the PuTTY client to enable X11 forwarding. In PuTTY menu, go to Connection-SSH-X11, mark the Enable X11 forwarding checkbox before logging in. Then log in as usual.
On Windows use the PuTTY client to enable X11 forwarding. In PuTTY menu, go to Connection-SSH-X11, mark the Enable X11 forwarding checkbox before logging in.
![](../../../img/cygwinX11forwarding.png)
Then log in as usual.
To verify the forwarding, type
......@@ -130,27 +134,3 @@ In this way, we run remote gnome session on the cluster, displaying it in the lo
Use System-Log Out to close the gnome-session
### If Not Able to Forward X11 Using PuTTY to CygwinX
```console
[usename@login1.anselm ~]$ gnome-session &
[1] 23691
[usename@login1.anselm ~]$ PuTTY X11 proxy: unable to connect to forwarded X server: Network error: Connection refused
PuTTY X11 proxy: unable to connect to forwarded X server: Network error: Connection refused
(gnome-session:23691): WARNING **: Cannot open display:**
```
1. Locate and modify the Cygwin shortcut that uses [startxwin](http://x.cygwin.com/docs/man1/startxwin.1.html)
locate
C:\cygwin64\bin\XWin.exe
change it
to
_C:\cygwin64\bin\XWin.exe -listen tcp_
![XWin-listen-tcp.png](../../../img/XWinlistentcp.png "XWin-listen-tcp.png")
1. Check PuTTY settings:
Enable X11 forwarding
![](../../../img/cygwinX11forwarding.png)
# Resource Allocation and Job Execution
To run a [job](/#terminology-frequently-used-on-these-pages), [computational resources](/salomon/resources-allocation-policy/#resource-accounting-policy) for this particular job must be allocated. This is done via the PBS Pro job workload manager software, which distributes workloads across the supercomputer. Extensive information about PBS Pro can be found in the [PBS Pro User's Guide](/pbspro).
## Resources Allocation Policy
The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. [The Fair-share](/salomon/job-priority/#fair-share-priority) ensures that individual users may consume an approximately equal amount of resources per week. The resources are accessible via queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. The following queues are the most important:
* **qexp**, the Express queue
* **qprod**, the Production queue
* **qlong**, the Long queue
* **qmpp**, the Massively parallel queue
* **qnvidia**, **qmic**, **qfat**, the Dedicated queues
* **qfree**, the Free resource utilization queue
!!! note
Check the queue status at <https://extranet.it4i.cz/>
Read more on the [Resource Allocation Policy](/salomon/resources-allocation-policy) page.
## Job Submission and Execution
!!! note
Use the **qsub** command to submit your jobs.
The qsub command submits the job into the queue. It creates a request to the PBS Job manager for allocation of the specified resources. The **smallest allocation unit is an entire node, 16 cores**, with the exception of the qexp queue. The resources will be allocated when available, subject to allocation policies and constraints. **After the resources are allocated, the jobscript or interactive shell is executed on the first of the allocated nodes.**
Read more on the [Job submission and execution](/salomon/job-submission-and-execution) page.
## Capacity Computing
!!! note
Use Job arrays when running huge number of jobs.
Use GNU Parallel and/or Job arrays when running (many) single core jobs.
In many cases, it is useful to submit a huge number (100+) of computational jobs into the PBS queue system. A huge number of (small) jobs is one of the most effective ways to execute embarrassingly parallel calculations, achieving the best runtime, throughput and computer utilization. In this chapter, we discuss the recommended way to run a huge number of jobs, including **ways to run a huge number of single core jobs**.
Read more on [Capacity computing](/salomon/capacity-computing) page.
\ No newline at end of file
......@@ -31,7 +31,7 @@ In many cases, you will run your own code on the cluster. In order to fully expl
* **node:** a computer, interconnected by network to other computers - Computational nodes are powerful computers, designed and dedicated for executing demanding scientific computations.
* **core:** processor core, a unit of processor, executing computations
* **corehours:** wall clock hours of processor core time - Each node is equipped with **X** processor cores, provides **X** corehours per 1 wall clock hour.
* **core-hour:** also normalized core-hour, NCH. A metric of computer utilization, [see definition](salomon/resources-allocation-policy/#normalized-core-hours-nch).
* **job:** a calculation running on the supercomputer - The job allocates and utilizes resources of the supercomputer for certain time.
* **HPC:** High Performance Computing
* **HPC (computational) resources:** corehours, storage capacity, software licences
......@@ -59,4 +59,7 @@ local $
## Errata
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in the text or the code we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this documentation. If you find any errata, please report them by visiting <http://support.it4i.cz/rt>, creating a new ticket, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website.
Although we have taken every care to ensure the accuracy of the content, mistakes do happen.
If you find an inconsistency or error, please report it by visiting <http://support.it4i.cz/rt>, creating a new ticket, and entering the details.
By doing so, you can save other readers from frustration and help us improve.
We will fix the problem as soon as possible.
# Resources Allocation Policy
## Resources Allocation Policy
### Job queue policies
The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. The fair-share at Anselm ensures that individual users may consume an approximately equal amount of resources per week. Detailed information is available in the [Job scheduling](job-priority/) section. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. The following table provides the queue partitioning overview:
......@@ -16,7 +18,7 @@ The resources are allocated to the job in a fair-share fashion, subject to const
| **qviz** Visualization queue | yes | none required | 2 (with NVIDIA Quadro K5000) | 4 | 150 | no | 1 / 8h |
!!! note
**The qfree queue is not free of charge**. [Normal accounting](resources-allocation-policy/#resources-accounting-policy) applies. However, it allows for utilization of free resources, once a Project exhausted all its allocated computational resources. This does not apply for Directors Discreation's projects (DD projects) by default. Usage of qfree after exhaustion of DD projects computational resources is allowed after request for this queue.
**The qfree queue is not free of charge**. [Normal accounting](#resource-accounting-policy) applies. However, it allows for utilization of free resources, once a Project has exhausted all its allocated computational resources. This does not apply to Director's Discretion projects (DD projects) by default, but may be allowed upon request.
* **qexp**, the Express queue: This queue is dedicated for testing and running very small jobs. It is not required to specify a project to enter the qexp. There are 2 nodes always reserved for this queue (w/o accelerator), maximum 8 nodes are available via the qexp for a particular user. The nodes may be allocated on per core basis. No special authorization is required to use it. The maximum runtime in qexp is 1 hour.
* **qprod**, the Production queue: This queue is intended for normal production runs. It is required that active project with nonzero remaining resources is specified to enter the qprod. All nodes may be accessed via the qprod queue, however only 86 per job. Full nodes, 24 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprod is 48 hours.
......@@ -29,15 +31,15 @@ The resources are allocated to the job in a fair-share fashion, subject to const
!!! note
To access a node with a Xeon Phi co-processor, the user needs to specify that in the [job submission select statement](job-submission-and-execution/).
## Notes
### Queue notes
The job wall clock time defaults to **half the maximum time**, see table above. Longer wall time limits can be [set manually, see examples](job-submission-and-execution/).
The job wall-clock time defaults to **half the maximum time**, see table above. Longer wall time limits can be [set manually, see examples](job-submission-and-execution/).
Jobs that exceed the reserved wall clock time (Req'd Time) get killed automatically. Wall clock time limit can be changed for queuing jobs (state Q) using the qalter command, however can not be changed for a running job (state R).
Jobs that exceed the reserved wall-clock time (Req'd Time) get killed automatically. The wall-clock time limit can be changed for queuing jobs (state Q) using the qalter command, but it cannot be changed for a running job (state R).
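
As a rough illustration of the two rules above, the sketch below computes the default wall-clock time as half of a queue's maximum and checks whether a limit can still be changed. It assumes the maxima quoted in the queue descriptions on this page (qexp 1 hour, qprod 48 hours); the function and dictionary names are hypothetical and not part of PBS Pro.

```python
from datetime import timedelta

# Maximum runtimes quoted in the queue descriptions above (assumed current).
MAX_WALLTIME = {
    "qexp": timedelta(hours=1),
    "qprod": timedelta(hours=48),
}


def default_walltime(queue):
    """Default wall-clock time: half of the queue maximum."""
    return MAX_WALLTIME[queue] / 2


def walltime_can_be_changed(job_state):
    """Only queued jobs (state Q) can be altered; running jobs (state R) cannot."""
    return job_state == "Q"


if __name__ == "__main__":
    print(default_walltime("qprod"))  # 1 day, 0:00:00
    print(default_walltime("qexp"))   # 0:30:00
```

Under these assumptions, a qprod job submitted without an explicit limit would be given 24 hours, so longer runs need the wall time set manually at submission.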
Salomon users may check current queue configuration at <https://extranet.it4i.cz/rsweb/salomon/queues>.
## Queue Status
### Queue Status
!!! note
Check the status of jobs, queues and compute nodes at [https://extranet.it4i.cz/rsweb/salomon/](https://extranet.it4i.cz/rsweb/salomon)
......@@ -109,24 +111,8 @@ Options:
--incl-finished Include finished jobs
```
## Resources Accounting Policy
### Core-Hours
---8<--- "resource_accounting.md"
The resources that are currently subject to accounting are the core-hours. The core-hours are accounted on the wall clock basis. The accounting runs whenever the computational cores are allocated or blocked via the PBS Pro workload manager (the qsub command), regardless of whether the cores are actually used for any calculation. 1 core-hour is defined as 1 processor core allocated for 1 hour of wall clock time. Allocating a full node (24 cores) for 1 hour accounts to 24 core-hours. See example in the [Job submission and execution](job-submission-and-execution/) section.
### Check Consumed Resources
!!! note
The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients>
User may check at any time, how many core-hours have been consumed by himself/herself and his/her projects. The command is available on clusters' login nodes.
```console
$ it4ifree
Password:
PID Total Used ...by me Free
-------- ------- ------ -------- -------
OPEN-0-0 1500000 400644 225265 1099356
DD-13-1 10000 2606 2606 7394
```
---8<--- "mathjax.md"
# Clp
## Introduction
Clp (Coin-or linear programming) is an open-source linear programming solver written in C++. It is primarily meant to be used as a callable library, but a basic, stand-alone executable version is also available.
Clp ([projects.coin-or.org/Clp](https://projects.coin-or.org/Clp)) is a part of the COIN-OR (The Computational Infrastructure for Operations Research) project ([projects.coin-or.org/](https://projects.coin-or.org/)).
## Modules
Clp, version 1.16.10 is available on Salomon via module Clp:
```console
$ ml Clp
```
The module sets up environment variables required for linking and running applications using Clp. This particular command loads the default module Clp/1.16.10-intel-2017a, Intel module intel/2017a and other related modules.
## Compiling and linking
!!! note
Link with -lClp
Load the Clp module. Link using -lClp switch to link your code against Clp.
```console
$ ml Clp
$ icc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lClp
```
## Example
An example of a Clp-enabled application follows. In this example, the library solves a linear programming problem loaded from a file.
```cpp
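#include "coin/ClpSimplex.hpp"

int main (int argc, const char *argv[])
{
    ClpSimplex model;
    int status;
    // Read the problem from the MPS file given as the first argument,
    // or fall back to the sample file shipped with the module.
    if (argc<2)
        status=model.readMps("/apps/all/Clp/1.16.10-intel-2017a/lib/p0033.mps");
    else
        status=model.readMps(argv[1]);
    // A zero status means the file was read without errors;
    // only then solve using the primal simplex method.
    if (!status) {
        model.primal();
    }
    return 0;
}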
```
### Load modules and compile:
```console
$ ml Clp
$ icc lp.c -o lp.x -Wl,-rpath=$LIBRARY_PATH -lClp
```
In this example, the lp.c code is compiled using the Intel compiler and linked with Clp. To run the code, the Intel module has to be loaded.
# LMGC90
## Introduction
LMGC90 is a free and open source software dedicated to multiple physics simulation of discrete material and structures.
More details on the capabilities of LMGC90 are available [here][Welcome].
## Modules
The LMGC90, version 2017.rc1 is available on Salomon via module `LMGC90`:
```sh
$ ml LMGC90
```
The module sets up environment variables and loads some other modules, required for running LMGC90 python scripts. This particular command loads the default module, which is `LMGC90/2017.rc1-GCC-6.3.0-2.27`, and modules:
```console
GCCcore/6.3.0
binutils/2.27-GCCcore-6.3.0
GCC/6.3.0-2.27
bzip2/1.0.6
zlib/1.2.8
ncurses/6.0
libreadline/6.3
Tcl/8.6.3
SQLite/3.8.8.1
Python/2.7.9
```
## Running generic example
LMGC90 software main API is a Python module. It comes with a pre-processor written in Python. There are several examples that you can copy from the `examples` directory which is in `/apps/all/LMGC90/2017.rc1-GCC-6.3.0-2.27` folder. Follow the next steps to run one of them.
First choose an example and open a terminal in the directory of the copied example.
### Generation
For more information on the pre-processor, open the file [docs/pre_lmgc/index.html][pre_lmgc] in a web browser.
To run an example, if there is no `DATBOX` directory or it is empty, run the Python generation script, which is usually called `gen_sample.py`, with the command:
```console
$ python gen_sample.py
```
You should now have a `DATBOX` directory containing all needed `.DAT` and `.INI` files.
### Computation
Now run the command script usually called `command.py`:
```console
$ python command.py
```
For more information on the structure of command scripts, open the file [docs/chipy/index.html][chipy] in a web browser.
Once the computation is done, you should get the `OUTBOX` directory containing ASCII output files, and a `DISPLAY` directory with output files readable by ParaView.
### Postprocessing and Visualization
The ASCII files in the `POSTPRO` directory result from the commands in the `DATBOX/POSTPRO.DAT` file. For more information on how to use these features, read the document [manuals/LMGC90_Postpro.pdf][LMGC90_Postpro.pdf].
The files inside the `DISPLAY` directory can be visualized with ParaView. It is advised to read the `.pvd` files, which ensure time consistency. The different output files are:
- tacts: contactors of rigid objects
- rigids: center of mass of rigid objects
- inter: interactions
- mecafe: mechanical mesh
- therfe: thermal mesh
- porofe: porous mechanical mesh
- multife: multi-phasic fluid in porous media mesh
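
Before opening ParaView, it can help to check which `.pvd` files a run actually produced. The following is a minimal sketch using only the Python standard library; it assumes the `DISPLAY` directory sits in the current example directory, and the helper name `list_pvd_files` is hypothetical, not part of LMGC90.

```python
import glob
import os


def list_pvd_files(display_dir="DISPLAY"):
    """Return the sorted names of the .pvd files found in display_dir."""
    pattern = os.path.join(display_dir, "*.pvd")
    # glob returns an empty list when the directory does not exist.
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


if __name__ == "__main__":
    for name in list_pvd_files():
        print(name)
```

The sketch relies only on `glob` and `os`, so it should also run under the Python 2.7 interpreter loaded with the module.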
[Welcome]: <http://www.lmgc.univ-montp2.fr/~dubois/LMGC90/Web/Welcome_!.html>
[pre_lmgc]: <http://www.lmgc.univ-montp2.fr/%7Edubois/LMGC90/UserDoc/pre/index.html>
[chipy]: <http://www.lmgc.univ-montp2.fr/%7Edubois/LMGC90/UserDoc/chipy/index.html>
[LMGC90_Postpro.pdf]: <https://git-xen.lmgc.univ-montp2.fr/lmgc90/lmgc90_user/blob/2017.rc1/manuals/LMGC90_Postpro.pdf>
......@@ -324,8 +324,8 @@ The local RAM disk file system is intended for temporary scratch data generated
## Summary
| Mountpoint | Usage | Protocol | Net | Capacity | Throughput | Limitations | Access |
| ------------- | ------------------------------ | ----------- | ------- | -------- | ------------ | ----------------------- | --------------------------- |
| Mountpoint | Usage | Protocol | Net Capacity| Throughput | Limitations | Access | Service |
| ------------- | ------------------------------ | ----------- | ------- | -------- | ------------ | ----------------------- | --------------------------- |
| /home | home directory | NFS, 2-Tier | 0.5 PB | 6 GB/s | Quota 250GB | Compute and login nodes | backed up |
| /scratch/work | large project files | Lustre | 1.69 PB | 30 GB/s | Quota | Compute and login nodes | none |
| /scratch/temp | job temporary data | Lustre | 1.69 PB | 30 GB/s | Quota 100 TB | Compute and login nodes | files older 90 days removed |
......
## Resource Accounting Policy
### Wall-clock Core-Hours WCH
The wall-clock core-hours (WCH) are the basic metric of computer utilization time.
1 wall-clock core-hour is defined as 1 processor core allocated for 1 hour of wall-clock time. Allocating a full node (16 cores Anselm, 24 cores Salomon)
for 1 hour amounts to 16 wall-clock core-hours (Anselm) or 24 wall-clock core-hours (Salomon).
### Normalized Core-Hours NCH
The resources subject to accounting are the normalized core-hours (NCH).
The normalized core-hours are obtained from WCH by applying a normalization factor:
$$
NCH = F*WCH
$$
All jobs are accounted in normalized core-hours, using factor F valid at the time of the execution:
| System | F | Validity |
| ------------------------------- | - | -------- |
| Salomon | 1.00 | 2017-09-11 to 2018-06-01 |
| Anselm | 0.65 | 2017-09-11 to 2018-06-01 |
The accounting runs whenever the computational cores are allocated via the PBS Pro workload manager (the qsub command), regardless of whether
the cores are actually used for any calculation.
!!! note
**The allocations are requested/granted in normalized core-hours NCH.**
!!! warning
Whenever the term core-hour is used in this documentation, we mean the normalized core-hour, NCH.
The normalized core-hours were introduced to treat systems of different age on equal footing.
The normalized core-hour is an accounting tool to discount the legacy systems. The F factors valid before 2017-09-11 are all 1.0.
In the future, the factors F will be updated as new systems are installed. The factors are expected to only decrease over time.
See examples in the [Job submission and execution](job-submission-and-execution/) section.
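
The normalization above can be sketched in a few lines of Python. This is a minimal illustration, assuming the factors F from the table (valid 2017-09-11 to 2018-06-01) and the cores-per-node counts quoted in this section; the function and dictionary names are hypothetical and not part of any IT4I tool.

```python
# Illustrative only: the names below are hypothetical, not part of it4ifree.
# Cores per node and factors F are taken from this section
# (F valid from 2017-09-11 to 2018-06-01).
CORES_PER_NODE = {"Anselm": 16, "Salomon": 24}
NORMALIZATION_FACTOR = {"Anselm": 0.65, "Salomon": 1.00}


def wall_clock_core_hours(system, nodes, hours):
    """WCH: allocated cores multiplied by wall-clock hours."""
    return CORES_PER_NODE[system] * nodes * hours


def normalized_core_hours(system, nodes, hours):
    """NCH = F * WCH, the quantity that is accounted."""
    return NORMALIZATION_FACTOR[system] * wall_clock_core_hours(system, nodes, hours)


if __name__ == "__main__":
    for system in ("Anselm", "Salomon"):
        wch = wall_clock_core_hours(system, nodes=1, hours=1)
        nch = normalized_core_hours(system, nodes=1, hours=1)
        print(f"{system}: {wch} WCH -> {nch:.2f} NCH")
```

Under these assumptions, one full Anselm node allocated for one hour amounts to 16 WCH and 10.40 NCH, while one Salomon node amounts to 24 WCH and 24.00 NCH.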
### Consumed Resources
Check how many core-hours have been consumed. The command it4ifree is available on cluster login nodes.
```console
$ it4ifree
Projects I am participating in
==============================
PID Days left Total Used WCHs Used NCHs WCHs by me NCHs by me Free
---------- ----------- ------- ----------- ----------- ------------ ------------ -------
OPEN-XX-XX 323 0 5169947 5169947 50001 50001 1292555
Projects I am Primarily Investigating
=====================================
PID Login Used WCHs Used NCHs
---------- ---------- ----------- -----------
OPEN-XX-XX user1 376670 376670
user2 4793277 4793277
Legend
======
WCH = Wall-clock Core Hour
NCH = Normalized Core Hour
```
The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients>
\ No newline at end of file