Commit b48cf4a1 authored by Pavel Jirásek's avatar Pavel Jirásek

Merge branch 'remark' of gitlab.it4i.cz:it4i-admins/docs.it4i into remark

parents 2c6e2585 ed3a6164
Pipeline #1947 passed with stages
in 1 minute and 21 seconds
@@ -70,7 +70,7 @@ cp $PBS_O_WORKDIR/$TASK input ; cp $PBS_O_WORKDIR/myprog.x .
cp output $PBS_O_WORKDIR/$TASK.out
```
-In this example, the submit directory holds the 900 input files, executable myprog.x and the jobscript file. As input for each run, we take the filename of input file from created tasklist file. We copy the input file to local scratch /lscratch/$PBS_JOBID, execute the myprog.x and copy the output file back to >the submit directory, under the $TASK.out name. The myprog.x runs on one node only and must use threads to run in parallel. Be aware, that if the myprog.x **is not multithreaded**, then all the **jobs are run as single thread programs in sequential** manner. Due to allocation of the whole node, the accounted time is equal to the usage of whole node\*\*, while using only 1/16 of the node!
+In this example, the submit directory holds the 900 input files, the executable myprog.x and the jobscript file. As input for each run, we take the filename of the input file from the created tasklist file. We copy the input file to the local scratch /lscratch/$PBS_JOBID, execute myprog.x and copy the output file back to the submit directory under the $TASK.out name. The myprog.x runs on one node only and must use threads to run in parallel. Be aware that if myprog.x **is not multithreaded**, then all the **jobs are run as single-thread programs in a sequential** manner. Due to the allocation of the whole node, the accounted time is equal to the usage of the whole node, while using only 1/16 of the node!
If a huge number of parallel multicore jobs (multinode and multithreaded, e.g. MPI-enabled) needs to be run, the job array approach should still be used. The main difference, compared to the previous example using one node, is that the local scratch must not be used (as it is not shared between nodes) and MPI or another technique for parallel multinode runs has to be used properly.
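The per-task loop described above can be sketched as a small script that runs anywhere, with no PBS needed; the `jobdemo` directory, the file names and the `tr` command standing in for `myprog.x` are all illustrative:

```shell
# Hedged sketch of the tasklist-driven loop (not the cluster jobscript).
# On the cluster the loop would copy each input to /lscratch/$PBS_JOBID
# and run myprog.x; here `tr` stands in for the program.
mkdir -p jobdemo
printf 'data-a\n' > jobdemo/task1.input
printf 'data-b\n' > jobdemo/task2.input

# build the tasklist of input files, one filename per line
ls jobdemo/*.input > jobdemo/tasklist

# process each task: read the filename, run the "program", keep $TASK.out
while read -r TASK; do
  tr 'a-z' 'A-Z' < "$TASK" > "$TASK.out"   # stand-in for myprog.x
done < jobdemo/tasklist
```

The same structure carries over to the jobscript: only the copy to local scratch and the actual `myprog.x` invocation differ.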
@@ -285,7 +285,7 @@ In this example, the jobscript executes in multiple instances in parallel, on al
When deciding these values, consider the following guiding rules:
-1. Let n=N/16. Inequality (n+1) \* T < W should hold. The N is number of tasks per subjob, T is expected single task walltime and W is subjob walltime. Short subjob walltime improves scheduling and job throughput.
+1. Let n=N/16. The inequality (n+1) \* T < W should hold, where N is the number of tasks per subjob, T is the expected single-task walltime and W is the subjob walltime. A short subjob walltime improves scheduling and job throughput.
2. The number of tasks should be a multiple of 16.
3. These rules are valid only when all tasks have similar task walltimes T.
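Rule 1 can be checked with a quick calculation; the numbers below are illustrative, not recommendations:

```shell
# Hedged worked example of rule 1 -- all numbers are illustrative.
N=96                 # tasks per subjob (a multiple of 16, per rule 2)
T=0.5                # expected single-task walltime, hours
W=4.0                # chosen subjob walltime, hours

n=$((N / 16))        # tasks each core processes sequentially

# does (n + 1) * T < W hold?
fits=$(awk -v n="$n" -v T="$T" -v W="$W" \
    'BEGIN { if ((n + 1) * T < W) print "yes"; else print "no" }')
echo "n=$n, (n+1)*T < W: $fits"
```

Here n = 6 and (n+1) \* T = 3.5 hours, which fits within the 4-hour subjob walltime.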
......
@@ -75,9 +75,6 @@ PrgEnv-gnu sets up the GNU development environment in conjunction with the bullx
PrgEnv-intel sets up the INTEL development environment in conjunction with the Intel MPI library
How to use modules, with examples:
<tty-player controls src=/src/anselm/modules_anselm.ttyrec></tty-player>
### Application Modules Path Expansion
All application modules on the Salomon cluster (and beyond) will be built using the tool [EasyBuild](http://hpcugent.github.io/easybuild/ "EasyBuild"). If you want to use applications that are already built with EasyBuild, you have to modify your MODULEPATH environment variable.
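Adjusting MODULEPATH might look like the sketch below; the EasyBuild module root shown is an assumed path, check the cluster documentation for the actual one:

```shell
# Prepend a (hypothetical) EasyBuild module tree to MODULEPATH.
# /apps/easybuild/modules/all is an assumption, not the documented path.
EB_MODULES=/apps/easybuild/modules/all
export MODULEPATH="$EB_MODULES${MODULEPATH:+:$MODULEPATH}"
echo "$MODULEPATH"
```

After this, `module avail` would also list the EasyBuild-provided modules.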
......
@@ -29,8 +29,8 @@ Fair-share priority is calculated as
![](../img/fairshare_formula.png)
where MAX_FAIRSHARE has value 1E6,
-usage_Project_ is cumulated usage by all members of selected project,
-usage_Total_ is total usage by all users, by all projects.
+usage<sub>Project</sub> is cumulated usage by all members of selected project,
+usage<sub>Total</sub> is total usage by all users, by all projects.
Usage counts allocated core-hours (`ncpus x walltime`). Usage is decayed, or cut in half, periodically at an interval of 168 hours (one week).
Jobs queued in the queue qexp are not counted into the project's usage.
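The half-life decay can be illustrated with a small helper; this is a sketch of the decay arithmetic only, the complete priority formula is the one shown in the image above:

```shell
# Usage is halved every 168 hours, i.e. an exponential decay with a
# 168-hour half-life: decayed = usage * 0.5 ^ (hours_elapsed / 168)
decayed_usage() {  # args: core_hours hours_elapsed
  awk -v u="$1" -v t="$2" 'BEGIN { printf "%.1f\n", u * 0.5 ^ (t / 168) }'
}

decayed_usage 1000 168    # one week later  -> 500.0
decayed_usage 1000 336    # two weeks later -> 250.0
```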
@@ -40,7 +40,7 @@ Jobs queued in queue qexp are not calculated to project's usage.
The calculated fair-share priority can also be seen as the Resource_List.fairshare attribute of a job.
-\###Eligible time
+### Eligible time
Eligible time is the amount (in seconds) of eligible time a job has accrued while waiting to run. Jobs with higher eligible time gain higher priority.
......
@@ -156,7 +156,7 @@ Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
16547.srv11 user2 qprod job3x 13516 2 32 -- 48:00 R 00:58
```
-In this example user1 and user2 are running jobs named job1, job2 and job3x. The jobs job1 and job2 are using 4 nodes, 16 cores per node each. The job1 already runs for 38 hours and 25 minutes, job2 for 17 hours 44 minutes. The job1 already consumed 64_38.41 = 2458.6 core hours. The job3x already consumed 0.96_32 = 30.93 core hours. These consumed core hours will be accounted on the respective project accounts, regardless of whether the allocated cores were actually used for computations.
+In this example, user1 and user2 are running jobs named job1, job2 and job3x. The jobs job1 and job2 are using 4 nodes, 16 cores per node each. The job1 has already run for 38 hours and 25 minutes, job2 for 17 hours 44 minutes. The job1 has already consumed `64 x 38.41 = 2458.6` core-hours; the job3x has consumed `0.96 x 32 = 30.93` core-hours. These consumed core-hours will be accounted on the respective project accounts, regardless of whether the allocated cores were actually used for computations.
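The accounting arithmetic for job1 can be verified directly; note that 38.41 is the 38:25 walltime expressed in (rounded) decimal hours:

```shell
# job1: 4 nodes x 16 cores = 64 cores, walltime 38:25 = 38 + 25/60 hours
core_hours=$(awk 'BEGIN { printf "%.1f", 64 * (38 + 25 / 60) }')
echo "$core_hours"   # -> 2458.7 core-hours (the listing rounds to 2458.6)
```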
Check the status of your jobs using the check-pbs-jobs command. It checks for the presence of the user's PBS job processes on the execution hosts, displays load and processes, displays the job's standard and error output, and can continuously display (tail -f) the job's standard or error output.
......
@@ -41,7 +41,8 @@ Please [follow the documentation](shell-and-data-access/).
To have the OpenGL acceleration, **24 bit color depth must be used**. Otherwise only the geometry (desktop size) definition is needed.
-_At first VNC server run you need to define a password._
+!!! Hint
+    At first VNC server run you need to define a password.
This example defines desktop with dimensions 1200x700 pixels and 24 bit color depth.
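With those values, starting the server would look like this (standard `vncserver` options; `:1` is the display number):

```bash
$ vncserver :1 -geometry 1200x700 -depth 24
```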
@@ -97,7 +98,7 @@ $ ssh login2.anselm.it4i.cz -L 5901:localhost:5901
```
x-window-system/
-_If you use Windows and Putty, please refer to port forwarding setup in the documentation:_
+If you use Windows and Putty, please refer to port forwarding setup in the documentation:
[x-window-and-vnc#section-12](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/)
#### 7. If you don't have Turbo VNC installed on your workstation
@@ -112,15 +113,15 @@ Mind that you should connect through the SSH tunneled port. In this example it i
$ vncviewer localhost:5901
```
-_If you use Windows version of TurboVNC Viewer, just run the Viewer and use address **localhost:5901**._
+If you use the Windows version of TurboVNC Viewer, just run the Viewer and use the address **localhost:5901**.
#### 9. Proceed to the chapter "Access the visualization node"
-_Now you should have working TurboVNC session connected to your workstation._
+Now you should have a working TurboVNC session connected to your workstation.
#### 10. After you end your visualization session
-_Don't forget to correctly shutdown your own VNC server on the login node!_
+Don't forget to correctly shut down your own VNC server on the login node!
```bash
$ vncserver -kill :1
@@ -135,13 +136,16 @@ qviz**. The queue has following properties:
| ---------------------------- | -------------- | ----------------- | ----- | --------- | -------- | ------------- | ---------------- |
| **qviz** Visualization queue | yes | none required | 2 | 4 | 150 | no | 1 hour / 8 hours |
-Currently when accessing the node, each user gets 4 cores of a CPU allocated, thus approximately 16 GB of RAM and 1/4 of the GPU capacity. _If more GPU power or RAM is required, it is recommended to allocate one whole node per user, so that all 16 cores, whole RAM and whole GPU is exclusive. This is currently also the maximum allowed allocation per one user. One hour of work is allocated by default, the user may ask for 2 hours maximum._
+Currently, when accessing the node, each user gets 4 cores of a CPU allocated, thus approximately 16 GB of RAM and 1/4 of the GPU capacity.
+!!! Note
+    If more GPU power or RAM is required, it is recommended to allocate one whole node per user, so that all 16 cores, the whole RAM and the whole GPU are exclusive. This is currently also the maximum allowed allocation per user. One hour of work is allocated by default; the user may ask for 2 hours maximum.
To access the visualization node, follow these steps:
#### 1. In your VNC session, open a terminal and allocate a node using PBSPro qsub command
-_This step is necessary to allow you to proceed with next steps._
+This step is necessary to allow you to proceed with the next steps.
```bash
$ qsub -I -q qviz -A PROJECT_ID
@@ -153,7 +157,7 @@ In this example the default values for CPU cores and usage time are used.
$ qsub -I -q qviz -A PROJECT_ID -l select=1:ncpus=16 -l walltime=02:00:00
```
-_Substitute **PROJECT_ID** with the assigned project identification string._
+Substitute **PROJECT_ID** with the assigned project identification string.
In this example a whole node for 2 hours is requested.
......
@@ -22,7 +22,7 @@ The resources are allocated to the job in a fair-share fashion, subject to const
- **qexp**, the Express queue: This queue is dedicated to testing and running very small jobs. It is not required to specify a project to enter the qexp. There are 2 nodes always reserved for this queue (w/o accelerator); a maximum of 8 nodes is available via the qexp for a particular user, from a pool of nodes containing Nvidia accelerated nodes (cn181-203), MIC accelerated nodes (cn204-207) and Fat nodes with 512GB RAM (cn208-209). This also enables testing and tuning of accelerated code or code with higher RAM requirements. The nodes may be allocated on a per-core basis. No special authorization is required to use it. The maximum runtime in qexp is 1 hour.
- **qprod**, the Production queue: This queue is intended for normal production runs. It is required that an active project with nonzero remaining resources is specified to enter the qprod. All nodes may be accessed via the qprod queue, except the reserved ones. 178 nodes without accelerator are included. Full nodes, 16 cores per node, are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprod is 48 hours.
-- **qlong**, the Long queue: This queue is intended for long production runs. It is required that active project with nonzero remaining resources is specified to enter the qlong. Only 60 nodes without acceleration may be accessed via the qlong queue. Full nodes, 16 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qlong is 144 hours (three times of the standard qprod time - 3 \* 48 h).
+- **qlong**, the Long queue: This queue is intended for long production runs. It is required that active project with nonzero remaining resources is specified to enter the qlong. Only 60 nodes without acceleration may be accessed via the qlong queue. Full nodes, 16 cores per node are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qlong is 144 hours (three times of the standard qprod time - 3 x 48 h).
- **qnvidia**, qmic, qfat, the Dedicated queues: The queue qnvidia is dedicated to accessing the Nvidia accelerated nodes, the qmic to accessing MIC nodes and qfat the Fat nodes. It is required that an active project with nonzero remaining resources is specified to enter these queues. 23 nvidia, 4 mic and 2 fat nodes are included. Full nodes, 16 cores per node, are allocated. The queues run with very high priority; the jobs will be scheduled before the jobs coming from the qexp queue. A PI needs to explicitly ask [support](https://support.it4i.cz/rt/) for authorization to enter the dedicated queues for all users associated with her/his Project.
- **qfree**, the Free resource queue: The queue qfree is intended for utilization of free resources, after a Project has exhausted all its allocated computational resources (this does not apply to DD projects by default; DD projects have to request permission to use qfree after exhaustion of computational resources). It is required that an active project is specified to enter the queue; however, no remaining resources are required. Consumed resources will be accounted to the Project. Only 178 nodes without accelerator may be accessed from this queue. Full nodes, 16 cores per node, are allocated. The queue runs with very low priority and no special authorization is required to use it. The maximum runtime in qfree is 12 hours.
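For illustration, a qprod submission under these rules might look as follows; PROJECT_ID, the resource selection and the jobscript name are placeholders:

```bash
$ qsub -q qprod -A PROJECT_ID -l select=4:ncpus=16 -l walltime=24:00:00 ./jobscript
```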
......
@@ -54,7 +54,7 @@ Now, compile it with Intel compiler:
Now, let's run it with Valgrind. The syntax is:
-valgrind [valgrind options] <your program binary> [your program options]
+valgrind [valgrind options] < your program binary > [your program options]
If no Valgrind options are specified, Valgrind defaults to running Memcheck tool. Please refer to the Valgrind documentation for a full description of command line options.
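For example, invoking the default Memcheck tool explicitly (the binary name is a placeholder):

```bash
$ valgrind --tool=memcheck ./myprog.x
```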
......
@@ -6,7 +6,7 @@ Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, e
- BLAS (level 1, 2, and 3) and LAPACK linear algebra routines, offering vector, vector-matrix, and matrix-matrix operations.
- The PARDISO direct sparse solver, an iterative sparse solver, and supporting sparse BLAS (level 1, 2, and 3) routines for solving sparse systems of equations.
-- ScaLAPACK distributed processing linear algebra routines for Linux_ and Windows_ operating systems, as well as the Basic Linear Algebra Communications Subprograms (BLACS) and the Parallel Basic Linear Algebra Subprograms (PBLAS).
+- ScaLAPACK distributed processing linear algebra routines for Linux and Windows operating systems, as well as the Basic Linear Algebra Communications Subprograms (BLACS) and the Parallel Basic Linear Algebra Subprograms (PBLAS).
- Fast Fourier transform (FFT) functions in one, two, or three dimensions with support for mixed radices (not limited to sizes that are powers of 2), as well as distributed versions of these functions.
- Vector Math Library (VML) routines for optimized mathematical operations on vectors.
- Vector Statistical Library (VSL) routines, which offer high-performance vectorized random number generators (RNG) for several probability distributions, convolution and correlation routines, and summary statistics functions.
......
@@ -2,16 +2,15 @@
The Salomon cluster provides the following elements of the Intel Parallel Studio XE
-|Intel Parallel Studio XE|
-\| -------------------------------------------------\|
-|Intel Compilers|
-|Intel Debugger|
-|Intel MKL Library|
-|Intel Integrated Performance Primitives Library|
-|Intel Threading Building Blocks Library|
-|Intel Trace Analyzer and Collector|
-|Intel Advisor|
-|Intel Inspector|
+Intel Parallel Studio XE
+* Intel Compilers
+* Intel Debugger
+* Intel MKL Library
+* Intel Integrated Performance Primitives Library
+* Intel Threading Building Blocks Library
+* Intel Trace Analyzer and Collector
+* Intel Advisor
+* Intel Inspector
## Intel compilers
......