Commit e2559ac9 authored by David Hrbáč

Merge branch 'virtual_environment' into 'master'

Virtual environment, upgrade MKdocs, upgrade Material design

See merge request !219
parents adc7f60e eac42505
Pipeline #5246 failed with stages in 3 minutes and 31 seconds
site/
scripts/*.csv
venv/
......@@ -2,10 +2,15 @@ stages:
- test
- build
- deploy
- after_test
variables:
PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
docs:
stage: test
image: davidhrbac/docker-mdcheck:latest
allow_failure: true
script:
- mdl -r ~MD024,~MD013,~MD033,~MD014,~MD026,~MD037,~MD036,~MD010,~MD029 *.md docs.it4i # BUGS
......@@ -36,15 +41,39 @@ ext_links:
only:
- master
404s:
stage: after_test
image: davidhrbac/docker-mkdocscheck:latest
script:
- wget -V
- echo https://docs.it4i.cz/devel/$CI_BUILD_REF_NAME/
- wget --spider -e robots=off -o wget.log -r -p https://docs.it4i.cz/devel/$CI_BUILD_REF_NAME/
after_script:
- sed -n '/^Found .* broken links.$/,$p' wget.log
mkdocs:
stage: build
image: davidhrbac/docker-mkdocscheck:latest
cache:
paths:
- .cache/pip
- venv/
before_script:
- python -V # Print out python version for debugging
- pip install virtualenv
- virtualenv venv
- source venv/bin/activate
- pip install -r requirements.txt
script:
- mkdocs -V
# add version to footer
- bash scripts/add_version.sh
# get modules list from clusters
- bash scripts/get_modules.sh
# generate site_url
- (if [ "${CI_BUILD_REF_NAME}" != 'hrb3' ]; then sed -i "s/\(site_url.*$\)/\1devel\/$CI_BUILD_REF_NAME\//" mkdocs.yml;fi);
# generate URL for code link
- sed -i "s/master/$CI_BUILD_REF_NAME/g" material-new/partials/toc.html
# regenerate modules matrix
- python scripts/modules-matrix.py > docs.it4i/modules-matrix.md
- python scripts/modules-json.py > docs.it4i/modules-matrix.json
......
# User documentation
This is project contain IT4Innovations user documentation source.
This project contains IT4Innovations user documentation source.
## Development
### Install
```console
$ sudo apt install libpython-dev
$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
```
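With the virtual environment active, the site can be previewed locally before pushing changes. A minimal sketch, assuming MkDocs is installed via requirements.txt:

```console
$ source venv/bin/activate
$ mkdocs serve
```

MkDocs then serves the site on http://127.0.0.1:8000 and rebuilds it on every file change.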
## Environments
......
......@@ -9,13 +9,13 @@ However, executing a huge number of jobs via the PBS queue may strain the system
!!! note
Please follow one of the procedures below if you wish to schedule more than 100 jobs at a time.
* Use [Job arrays](capacity-computing/#job-arrays) when running a huge number of [multithread](capacity-computing/#shared-jobscript-on-one-node) (bound to one node only) or multinode (multithread across several nodes) jobs
* Use [GNU parallel](capacity-computing/#gnu-parallel) when running single core jobs
* Combine [GNU parallel with Job arrays](capacity-computing/#job-arrays-and-gnu-parallel) when running huge number of single core jobs
* Use [Job arrays][1] when running a huge number of [multithread][2] (bound to one node only) or multinode (multithread across several nodes) jobs; a minimal jobscript sketch follows below
* Use [GNU parallel][3] when running single core jobs
* Combine [GNU parallel with Job arrays][4] when running a huge number of single core jobs
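For illustration, a minimal sketch of a job-array jobscript; the tasklist file and the myprog.x executable are hypothetical placeholders:

```bash
#!/bin/bash
#PBS -q qprod
#PBS -l select=1:ncpus=16

# each subjob reads its own task, i.e. one line of the (hypothetical) tasklist file
TASK=$(sed -n "${PBS_ARRAY_INDEX}p" "$PBS_O_WORKDIR/tasklist")

cd "$PBS_O_WORKDIR"
./myprog.x "$TASK"
```

Every subjob runs the same script; only the value of PBS_ARRAY_INDEX differs.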
## Policy
1. A user is allowed to submit at most 100 jobs. Each job may be [a job array](capacity-computing/#job-arrays).
1. A user is allowed to submit at most 100 jobs. Each job may be [a job array][1].
1. The array size is at most 1000 subjobs.
## Job Arrays
......@@ -76,7 +76,7 @@ If running a huge number of parallel multicore (in means of multinode multithrea
### Submit the Job Array
To submit the job array, use the qsub -J command. The 900 jobs of the [example above](capacity-computing/#array_example) may be submitted like this:
To submit the job array, use the qsub -J command. The 900 jobs of the [example above][5] may be submitted like this:
```console
$ qsub -N JOBNAME -J 1-900 jobscript
......@@ -145,7 +145,7 @@ Display status information for all user's subjobs.
$ qstat -u $USER -tJ
```
Read more on job arrays in the [PBSPro Users guide](../pbspro/).
Read more on job arrays in the [PBSPro Users guide][6].
## GNU Parallel
......@@ -207,7 +207,7 @@ In this example, tasks from the tasklist are executed via the GNU parallel. The
### Submit the Job
To submit the job, use the qsub command. The 101 task job of the [example above](capacity-computing/#gp_example) may be submitted as follows:
To submit the job, use the qsub command. The 101 task job of the [example above][7] may be submitted as follows:
```console
$ qsub -N JOBNAME jobscript
......@@ -292,7 +292,7 @@ When deciding this values, keep in mind the following guiding rules:
### Submit the Job Array (-J)
To submit the job array, use the qsub -J command. The 992 task job of the [example above](capacity-computing/#combined_example) may be submitted like this:
To submit the job array, use the qsub -J command. The 992 task job of the [example above][8] may be submitted like this:
```console
$ qsub -N JOBNAME -J 1-992:32 jobscript
......@@ -306,7 +306,7 @@ In this example, we submit a job array of 31 subjobs. Note the -J 1-992:**32**,
## Examples
Download the examples in [capacity.zip](capacity.zip), illustrating the above listed ways to run a huge number of jobs. We recommend trying out the examples before using this for running production jobs.
Download the examples in [capacity.zip][9], illustrating the above listed ways to run a huge number of jobs. We recommend trying out the examples before using them for running production jobs.
Unzip the archive in an empty directory on Anselm and follow the instructions in the README file.
......@@ -314,3 +314,13 @@ Unzip the archive in an empty directory on Anselm and follow the instructions in
$ unzip capacity.zip
$ cat README
```
[1]: #job-arrays
[2]: #shared-jobscript-on-one-node
[3]: #gnu-parallel
[4]: #job-arrays-and-gnu-parallel
[5]: #array_example
[6]: ../pbspro.md
[7]: #gp_example
[8]: #combined_example
[9]: capacity.zip
......@@ -2,7 +2,7 @@
## Node Configuration
Anselm is cluster of x86-64 Intel based nodes built with Bull Extreme Computing bullx technology. The cluster contains four types of compute nodes.
Anselm is a cluster of x86-64 Intel based nodes built with Bull Extreme Computing bullx technology. The cluster contains four types of compute nodes.
### Compute Nodes Without Accelerators
......@@ -52,7 +52,7 @@ Anselm is cluster of x86-64 Intel based nodes built with Bull Extreme Computing
### Compute Node Summary
| Node type | Count | Range | Memory | Cores | [Access](resources-allocation-policy/) |
| Node type | Count | Range | Memory | Cores | Queues |
| ---------------------------- | ----- | ----------- | ------ | ----------- | -------------------------------------- |
| Nodes without an accelerator | 180 | cn[1-180] | 64GB | 16 @ 2.4GHz | qexp, qprod, qlong, qfree, qprace, qatlas |
| Nodes with a GPU accelerator | 23 | cn[181-203] | 96GB | 16 @ 2.3GHz | qnvidia, qexp |
......
......@@ -2,7 +2,7 @@
The Anselm cluster consists of 209 computational nodes named cn[1-209] of which 180 are regular compute nodes, 23 are GPU Kepler K20 accelerated nodes, 4 are MIC Xeon Phi 5110P accelerated nodes, and 2 are fat nodes. Each node is a powerful x86-64 computer, equipped with 16 cores (two eight-core Intel Sandy Bridge processors), at least 64 GB of RAM, and a local hard drive. User access to the Anselm cluster is provided by two login nodes login[1,2]. The nodes are interlinked through high speed InfiniBand and Ethernet networks. All nodes share a 320 TB /home disk for storage of user files. The 146 TB shared /scratch storage is available for scratch data.
The Fat nodes are equipped with a large amount (512 GB) of memory. Virtualization infrastructure provides resources to run long term servers and services in virtual mode. Fat nodes and virtual servers may access 45 TB of dedicated block storage. Accelerated nodes, fat nodes, and virtualization infrastructure are available [upon request](https://support.it4i.cz/rt) from a PI.
The Fat nodes are equipped with a large amount (512 GB) of memory. Virtualization infrastructure provides resources to run long term servers and services in virtual mode. Fat nodes and virtual servers may access 45 TB of dedicated block storage. Accelerated nodes, fat nodes, and virtualization infrastructure are available [upon request][a] from a PI.
Schematic representation of the Anselm cluster. Each box represents a node (computer) or storage capacity:
......@@ -17,16 +17,16 @@ There are four types of compute nodes:
* 4 compute nodes with a MIC accelerator - an Intel Xeon Phi 5110P
* 2 fat nodes - equipped with 512 GB of RAM and two 100 GB SSD drives
[More about Compute nodes](compute-nodes/).
[More about Compute nodes][1].
GPU and accelerated nodes are available upon request, see the [Resources Allocation Policy](resources-allocation-policy/).
GPU and accelerated nodes are available upon request, see the [Resources Allocation Policy][2].
All of these nodes are interconnected through fast InfiniBand and Ethernet networks. [More about the Network](network/).
All of these nodes are interconnected through fast InfiniBand and Ethernet networks. [More about the Network][3].
Every chassis provides an InfiniBand switch, marked **isw**, connecting all nodes in the chassis, as well as connecting the chassis to the upper level switches.
All of the nodes share a 360 TB /home disk for storage of user files. The 146 TB shared /scratch storage is available for scratch data. These file systems are provided by the Lustre parallel file system. There is also local disk storage available on all compute nodes in /lscratch. [More about Storage](storage/).
All of the nodes share a 320 TB /home disk for storage of user files. The 146 TB shared /scratch storage is available for scratch data. These file systems are provided by the Lustre parallel file system. There is also local disk storage available on all compute nodes in /lscratch. [More about Storage][4].
User access to the Anselm cluster is provided by two login nodes login1, login2, and data mover node dm1. [More about accessing the cluster.](shell-and-data-access/)
User access to the Anselm cluster is provided by two login nodes login1, login2, and the data mover node dm1. [More about accessing the cluster][5].
The parameters are summarized in the following tables:
......@@ -35,7 +35,7 @@ The parameters are summarized in the following tables:
| Primary purpose | High Performance Computing |
| Architecture of compute nodes | x86-64 |
| Operating system | Linux (CentOS) |
| [**Compute nodes**](compute-nodes/) | |
| [**Compute nodes**][1] | |
| Total | 209 |
| Processor cores | 16 (2 x 8 cores) |
| RAM | min. 64 GB, min. 4 GB per core |
......@@ -57,4 +57,12 @@ The parameters are summarized in the following tables:
| MIC accelerated | 2 x Intel Sandy Bridge E5-2470, 2.3 GHz | 96 GB | Intel Xeon Phi 5110P |
| Fat compute node | 2 x Intel Sandy Bridge E5-2665, 2.4 GHz | 512 GB | - |
For more details refer to [Compute nodes](compute-nodes/), [Storage](storage/), and [Network](network/).
For more details refer to [Compute nodes][1], [Storage][4], and [Network][3].
[1]: compute-nodes.md
[2]: resources-allocation-policy.md
[3]: network.md
[4]: storage.md
[5]: shell-and-data-access.md
[a]: https://support.it4i.cz/rt
# Introduction
Welcome to Anselm supercomputer cluster. The Anselm cluster consists of 209 compute nodes, totalling 3344 compute cores with 15 TB RAM, giving over 94 TFLOP/s theoretical peak performance. Each node is a powerful x86-64 computer, equipped with 16 cores, at least 64 GB of RAM, and a 500 GB hard disk drive. Nodes are interconnected through a fully non-blocking fat-tree InfiniBand network, and are equipped with Intel Sandy Bridge processors. A few nodes are also equipped with NVIDIA Kepler GPU or Intel Xeon Phi MIC accelerators. Read more in [Hardware Overview](hardware-overview/).
Welcome to the Anselm supercomputer cluster. The Anselm cluster consists of 209 compute nodes, totalling 3344 compute cores with 15 TB RAM, giving over 94 TFLOP/s theoretical peak performance. Each node is a powerful x86-64 computer, equipped with 16 cores, at least 64 GB of RAM, and a 500 GB hard disk drive. Nodes are interconnected through a fully non-blocking fat-tree InfiniBand network, and are equipped with Intel Sandy Bridge processors. A few nodes are also equipped with NVIDIA Kepler GPU or Intel Xeon Phi MIC accelerators. Read more in [Hardware Overview][1].
The cluster runs with an [operating system](software/operating-system/) which is compatible with the RedHat [Linux family.](http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg) We have installed a wide range of software packages targeted at different scientific domains. These packages are accessible via the [modules environment](environment-and-modules/).
The cluster runs an operating system compatible with the RedHat [Linux family][a]. We have installed a wide range of software packages targeted at different scientific domains. These packages are accessible via the [modules environment][2].
The user data shared file-system (HOME, 320 TB) and job data shared file-system (SCRATCH, 146 TB) are available to users.
The PBS Professional workload manager provides [computing resources allocations and job execution](resources-allocation-policy/).
The PBS Professional workload manager provides [computing resources allocations and job execution][3].
Read more on how to [apply for resources](../general/applying-for-resources/), [obtain login credentials](../general/obtaining-login-credentials/obtaining-login-credentials/) and [access the cluster](shell-and-data-access/).
Read more on how to [apply for resources][4], [obtain login credentials][5], and [access the cluster][6].
[1]: hardware-overview.md
[2]: ../environment-and-modules.md
[3]: resources-allocation-policy.md
[4]: ../general/applying-for-resources.md
[5]: ../general/obtaining-login-credentials/obtaining-login-credentials.md
[6]: shell-and-data-access.md
[a]: http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg
......@@ -16,7 +16,7 @@ Queue priority is the priority of the queue in which the job is waiting prior to
Queue priority has the biggest impact on job execution priority. The execution priority of jobs in higher priority queues is always greater than the execution priority of jobs in lower priority queues. Other properties of jobs used for determining the job execution priority (fair-share priority, eligible time) cannot compete with queue priority.
Queue priorities can be seen at <https://extranet.it4i.cz/anselm/queues>
Queue priorities can be seen [here][a].
### Fair-Share Priority
......@@ -36,7 +36,7 @@ Usage counts allocated core-hours (`ncpus x walltime`). Usage decays, halving at
Jobs queued in the queue qexp are not used to calculate the project's usage.
!!! note
Calculated usage and fair-share priority can be seen at <https://extranet.it4i.cz/anselm/projects>.
Calculated usage and fair-share priority can be seen [here][b].
Calculated fair-share priority can also be seen in the Resource_List.fairshare attribute of a job.
......@@ -70,3 +70,6 @@ This means that jobs with lower execution priority can be run before jobs with h
Specifying a more accurate walltime enables better scheduling, better execution times, and better resource usage. Jobs with a suitable (small) walltime can be backfilled and may overtake job(s) with a higher priority.
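For example, a job that needs only two hours can state so at submission time; the project name and resource values below are illustrative:

```console
$ qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=16 -l walltime=02:00:00 ./myjob
```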
---8<--- "mathjax.md"
[a]: https://extranet.it4i.cz/anselm/queues
[b]: https://extranet.it4i.cz/anselm/projects
......@@ -51,7 +51,7 @@ $ qsub -A OPEN-0-0 -q qfree -l select=10:ncpus=16 ./myjob
In this example, we allocate 10 nodes, 16 cores per node, for 12 hours. We allocate these resources via the qfree queue. It is not required that the project OPEN-0-0 has any available resources left. Consumed resources are still accounted for. The jobscript myjob will be executed on the first node in the allocation.
All qsub options may be [saved directly into the jobscript](#example-jobscript-for-mpi-calculation-with-preloaded-inputs). In such cases, it is not necessary to specify any options for qsub.
All qsub options may be [saved directly into the jobscript][1]. In such cases, it is not necessary to specify any options for qsub.
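For instance, the options from the example above might be embedded as #PBS directives; a sketch, with the project name and resources taken from that example:

```bash
#!/bin/bash
#PBS -A OPEN-0-0
#PBS -q qfree
#PBS -l select=10:ncpus=16

# the actual computation follows here
```

The job is then submitted without repeating any options: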
```console
$ qsub ./myjob
......@@ -92,9 +92,9 @@ In this example, we allocate 4 nodes, 16 cores per node, selecting only the node
### Placement by IB Switch
Groups of computational nodes are connected to chassis integrated Infiniband switches. These switches form the leaf switch layer of the [Infiniband network](network/) fat tree topology. Nodes sharing the leaf switch can communicate most efficiently. Sharing the same switch prevents hops in the network and facilitates unbiased, highly efficient network communication.
Groups of computational nodes are connected to chassis-integrated InfiniBand switches. These switches form the leaf switch layer of the [InfiniBand network][2] fat tree topology. Nodes sharing the leaf switch can communicate most efficiently. Sharing the same switch prevents hops in the network and facilitates unbiased, highly efficient network communication.
Nodes sharing the same switch may be selected via the PBS resource attribute ibswitch. Values of this attribute are iswXX, where XX is the switch number. The node-switch mapping can be seen in the [Hardware Overview](hardware-overview/) section.
Nodes sharing the same switch may be selected via the PBS resource attribute ibswitch. Values of this attribute are iswXX, where XX is the switch number. The node-switch mapping can be seen in the [Hardware Overview][3] section.
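A sketch of such a selection; the switch name isw20 is a hypothetical example:

```console
$ qsub -q qprod -l select=4:ncpus=16:ibswitch=isw20 ./myjob
```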
We recommend allocating compute nodes to a single switch when best possible computational network performance is required to run the job efficiently:
......@@ -339,7 +339,7 @@ exit
In this example, a directory in /home holds the input file input and the executable mympiprog.x. We create the directory myjob on the /scratch filesystem, copy the input and executable files from the /home directory where the qsub was invoked ($PBS_O_WORKDIR) to /scratch, execute the MPI program mympiprog.x, and copy the output file back to the /home directory. mympiprog.x is executed as one process per node, on all allocated nodes.
!!! note
Consider preloading inputs and executables onto [shared scratch](storage/) memory before the calculation starts.
Consider preloading inputs and executables onto [shared scratch][4] memory before the calculation starts.
In some cases, it may be impractical to copy the inputs to the scratch memory and the outputs to the home directory. This is especially true when very large input and output files are expected, or when the files should be reused by a subsequent calculation. In such cases, it is the users' responsibility to preload the input files on shared /scratch memory before the job submission, and retrieve the outputs manually after all calculations are finished.
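A sketch of such manual preloading, using the /scratch/$USER/myjob directory referenced in the example below:

```console
$ mkdir -p /scratch/$USER/myjob
$ cp input mympiprog.x /scratch/$USER/myjob/
```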
......@@ -373,15 +373,14 @@ exit
In this example, input and executable files are assumed to be preloaded manually in the /scratch/$USER/myjob directory. Note the **mpiprocs** and **ompthreads** qsub options controlling the behavior of the MPI execution. mympiprog.x is executed as one process per node, on all 100 allocated nodes. If mympiprog.x implements OpenMP threads, it will run 16 threads per node.
More information can be found in the [Running OpenMPI](../software/mpi/Running_OpenMPI/) and [Running MPICH2](../software/mpi/running-mpich2/)
sections.
More information can be found in the [Running OpenMPI][5] and [Running MPICH2][6] sections.
### Example Jobscript for Single Node Calculation
!!! note
The local scratch directory is often useful for single node jobs. Local scratch memory will be deleted immediately after the job ends.
Example jobscript for single node calculation, using [local scratch](storage/) memory on the node:
Example jobscript for single node calculation, using [local scratch][4] memory on the node:
```bash
#!/bin/bash
......@@ -407,4 +406,12 @@ In this example, a directory in /home holds the input file input and executable
### Other Jobscript Examples
Further jobscript examples may be found in the software section and the [Capacity computing](capacity-computing/) section.
Further jobscript examples may be found in the software section and the [Capacity computing][7] section.
[1]: #example-jobscript-for-mpi-calculation-with-preloaded-inputs
[2]: network.md
[3]: hardware-overview.md
[4]: storage.md
[5]: ../software/mpi/running_openmpi.md
[6]: ../software/mpi/running-mpich2.md
[7]: capacity-computing.md
# Network
All of the compute and login nodes of Anselm are interconnected through an [InfiniBand](http://en.wikipedia.org/wiki/InfiniBand) QDR network and a Gigabit [Ethernet](http://en.wikipedia.org/wiki/Ethernet) network. Both networks may be used to transfer user data.
All of the compute and login nodes of Anselm are interconnected through an [InfiniBand][a] QDR network and a Gigabit [Ethernet][b] network. Both networks may be used to transfer user data.
## InfiniBand Network
All of the compute and login nodes of Anselm are interconnected through a high-bandwidth, low-latency [InfiniBand](http://en.wikipedia.org/wiki/InfiniBand) QDR network (IB 4 x QDR, 40 Gbps). The network topology is a fully non-blocking fat-tree.
All of the compute and login nodes of Anselm are interconnected through a high-bandwidth, low-latency [InfiniBand][a] QDR network (IB 4 x QDR, 40 Gbps). The network topology is a fully non-blocking fat-tree.
The compute nodes may be accessed via the InfiniBand network using the ib0 network interface, in the address range 10.2.1.1-209. MPI may be used to establish a native InfiniBand connection among the nodes.
......@@ -19,6 +19,8 @@ The compute nodes may be accessed via the regular Gigabit Ethernet network inter
## Example
In this example, we access the node cn110 through the InfiniBand network via the ib0 interface, then from cn110 to cn108 through the Ethernet network.
```console
$ qsub -q qexp -l select=4:ncpus=16 -N Name0 ./myjob
$ qstat -n -u username
......@@ -32,4 +34,5 @@ $ ssh 10.2.1.110
$ ssh 10.1.1.108
```
In this example, we access the node cn110 through the InfiniBand network via the ib0 interface, then from cn110 to cn108 through the Ethernet network.
[a]: http://en.wikipedia.org/wiki/InfiniBand
[b]: http://en.wikipedia.org/wiki/Ethernet
# Remote Visualization Service
## Introduction
The goal of this service is to provide users with GPU accelerated use of OpenGL applications, especially for pre- and post-processing work, where not only GPU performance is needed but also fast access to the shared file systems of the cluster and a reasonable amount of RAM.
The service is based on integration of the open source tools VirtualGL and TurboVNC together with the cluster's job scheduler PBS Professional.
Currently there are two dedicated compute nodes for this service with the following configuration for each node:
| [**Visualization node configuration**](compute-nodes/) | |
| ------------------------------------------------------ | --------------------------------------- |
| CPU | 2 x Intel Sandy Bridge E5-2670, 2.6 GHz |
| Processor cores | 16 (2 x 8 cores) |
| RAM | 64 GB, min. 4 GB per core |
| GPU | NVIDIA Quadro 4000, 2 GB RAM |
| Local disk drive | yes - 500 GB |
| Compute network | InfiniBand QDR |
## Schematic Overview
![rem_vis_scheme](../img/scheme.png "rem_vis_scheme")
![rem_vis_legend](../img/legend.png "rem_vis_legend")
## How to Use the Service
### Setup and Start Your Own TurboVNC Server
TurboVNC is designed and implemented for cooperation with VirtualGL and is available for free for all major platforms. For more information and download, please refer to: <http://sourceforge.net/projects/turbovnc/>
**Always use TurboVNC on both sides** (server and client) and **do not mix TurboVNC with other VNC implementations** (TightVNC, TigerVNC, ...), as the VNC protocol implementations may slightly differ and diminish your user experience by introducing picture artifacts, etc.
The procedure is:
#### 1. Connect to a Login Node
Please [follow the documentation](shell-and-data-access/).
#### 2. Run Your Own Instance of TurboVNC Server
To have OpenGL acceleration, **24 bit color depth must be used**. Otherwise only the geometry (desktop size) definition is needed.
!!! hint
The first time the VNC server is run you need to define a password.
This example defines a desktop with the dimensions of 1200x700 pixels and 24 bit color depth.
```console
$ module load turbovnc/1.2.2
$ vncserver -geometry 1200x700 -depth 24
Desktop 'TurboVNC: login2:1 (username)' started on display login2:1
Starting applications specified in /home/username/.vnc/xstartup.turbovnc
Log file is /home/username/.vnc/login2:1.log
```
#### 3. Remember Which Display Number Your VNC Server Runs On (You Will Need It in the Future to Stop the Server)
```console
$ vncserver -list
TurboVNC server sessions:
X DISPLAY # PROCESS ID
:1 23269
```
In this example the VNC server runs on display **:1**.
#### 4. Remember the Exact Login Node Where Your VNC Server Runs
```console
$ uname -n
login2
```
In this example the VNC server runs on **login2**.
#### 5. Remember on Which TCP Port Your Own VNC Server Is Running
To get the port, you have to look into the log file of your VNC server.
```console
$ grep -E "VNC.*port" /home/username/.vnc/login2:1.log
20/02/2015 14:46:41 Listening for VNC connections on TCP port 5901
```
In this example the VNC server listens on TCP port **5901** (by convention, 5900 plus the display number).
#### 6. Connect to the Login Node Where Your VNC Server Runs With SSH to Tunnel Your VNC Session
Tunnel the TCP port on which your VNC server is listening.
```console
$ ssh login2.anselm.it4i.cz -L 5901:localhost:5901
```
If you use Windows and PuTTY, please refer to the port forwarding setup in the documentation:
[x-window-and-vnc#section-12](../general/accessing-the-clusters/graphical-user-interface/x-window-system/)
#### 7. If You Don't Have TurboVNC Installed on Your Workstation
Get it from: <http://sourceforge.net/projects/turbovnc/>
#### 8. Run TurboVNC Viewer From Your Workstation
Mind that you should connect through the SSH-tunneled port. In this example it is 5901 on your workstation (localhost).
```console
$ vncviewer localhost:5901
```
If you use the Windows version of TurboVNC Viewer, just run the Viewer and use the address **localhost:5901**.
#### 9. Proceed to the Chapter "Access the Visualization Node"
Now you should have a working TurboVNC session connected to your workstation.
#### 10. After You End Your Visualization Session
Don't forget to correctly shutdown your own VNC server on the login node!
```console
$ vncserver -kill :1
```
### Access the Visualization Node
**To access the node, use the dedicated PBS Professional scheduler queue qviz**. The queue has the following properties:
| queue | active project | project resources | nodes | min ncpus | priority | authorization | walltime |
| ---------------------------- | -------------- | ----------------- | ----- | --------- | -------- | ------------- | ---------------- |
| **qviz** Visualization queue | yes | none required | 2 | 4 | 150 | no | 1 hour / 8 hours |
Currently when accessing the node, each user gets 4 cores of a CPU allocated, thus approximately 16 GB of RAM and 1/4 of the GPU capacity.
!!! note
If more GPU power or RAM is required, it is recommended to allocate one whole node per user, so that all 16 cores, the whole RAM, and the whole GPU are exclusive. This is currently also the maximum allocation allowed per user. One hour of work is allocated by default; the user may ask for 2 hours maximum.
To access the visualization node, follow these steps:
#### 1. In Your VNC Session, Open a Terminal and Allocate a Node Using the PBSPro qsub Command
This step is necessary to allow you to proceed with the next steps.
```console
$ qsub -I -q qviz -A PROJECT_ID
```
In this example the default values for CPU cores and usage time are used.
```console
$ qsub -I -q qviz -A PROJECT_ID -l select=1:ncpus=16 -l walltime=02:00:00
```
In this example a whole node is requested for 2 hours.

Substitute **PROJECT_ID** with the assigned project identification string.
If there are free resources for your request, you will have a shell running on an assigned node. Please remember the name of the node.
```console
$ uname -n
srv8
```
In this example the visualization session was assigned to node **srv8**.
#### 2. In Your VNC Session Open Another Terminal (Keep the One With Interactive PBSPro Job Open)
Set up the VirtualGL connection to the node that PBSPro allocated for our job.
```console
$ vglconnect srv8
```
You will be connected through the created VirtualGL tunnel to the visualization node, where you will have a shell.
#### 3. Load the VirtualGL Module
```console
$ module load virtualgl/2.4
```
#### 4. Run Your Desired OpenGL Accelerated Application Using the VirtualGL Script "vglrun"
```console
$ vglrun glxgears
```
If you want to run an OpenGL application which is available through modules, you need to first load the respective module. E.g., to run the **Mentat** OpenGL application from the **MARC** software package, use:
```console
$ module load marc/2013.1
$ vglrun mentat
```
#### 5. After You End Your Work With the OpenGL Application
Just log out from the visualization node, exit both open terminals, and end your VNC server session as described above.
## Tips and Tricks
If you want to increase the responsiveness of the visualization, please adjust your TurboVNC client settings in this way:
![rem_vis_settings](../img/turbovncclientsetting.png "rem_vis_settings")
To get an idea of how the settings affect the resulting picture quality, three levels of "JPEG image quality" are demonstrated:
**JPEG image quality = 30**
![rem_vis_q3](../img/quality3.png "rem_vis_q3")
**JPEG image quality = 15**
![rem_vis_q2](../img/quality2.png "rem_vis_q2")
**JPEG image quality = 10**
![rem_vis_q1](../img/quality1.png "rem_vis_q1")
# Resource Allocation and Job Execution
To run a [job](job-submission-and-execution/), [computational resources](resources-allocation-policy/) for this particular job must be allocated. This is done via the PBS Pro job workload manager software, which efficiently distributes workloads across the supercomputer. Extensive information about PBS Pro can be found in the [official documentation](../pbspro/), especially in the PBS Pro User's Guide.
## Resource Allocation Policy
The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. [The Fair-share](job-priority/) system of Anselm ensures that individual users may consume approximately equal amounts of resources per week. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. The following queues are available to Anselm users:
* **qexp**, the Express queue
* **qprod**, the Production queue
* **qlong**, the Long queue
* **qnvidia**, **qmic**, **qfat**, the Dedicated queues
* **qfree**, the Free resource utilization queue
!!! note
Check the queue status at <https://extranet.it4i.cz/anselm/>
Read more on the [Resource Allocation Policy](resources-allocation-policy/) page.
## Job Submission and Execution
!!! note
Use the **qsub** command to submit your jobs.
The qsub command submits the job into the queue, creating a request to the PBS Job manager for allocation of the specified resources. The **smallest allocation unit is an entire node, 16 cores**, with the exception of the qexp queue. The resources will be allocated when available, subject to allocation policies and constraints. **After the resources are allocated, the jobscript or interactive shell is executed on the first of the allocated nodes.**
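For example, a minimal submission allocating one full node; the project name is illustrative:

```console
$ qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=16 ./myjob
```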
Read more on the [Job submission and execution](job-submission-and-execution/) page.
## Capacity Computing
!!! note
Use Job arrays when running a huge number of jobs.
Use GNU Parallel and/or Job arrays when running (many) single core jobs.
In many cases, it is useful to submit a huge (100+) number of computational jobs into the PBS queue system. A huge number of (small) jobs is one of the most effective ways to execute embarrassingly parallel calculations, achieving the best runtime, throughput, and computer utilization. In this chapter, we discuss the recommended way to run a huge number of jobs, including **ways to run a huge number of single core jobs**.
Read more on the [Capacity computing](capacity-computing/) page.
......@@ -2,7 +2,7 @@
## Job Queue Policies
The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and the resources available to the Project. The Fair-share system of Anselm ensures that individual users may consume approximately equal amounts of resources per week. Detailed information can be found in the [Job scheduling](job-priority/) section. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. The following table provides the queue partitioning overview:
The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and the resources available to the Project. The Fair-share system of Anselm ensures that individual users may consume approximately equal amounts of resources per week. Detailed information can be found in the [Job scheduling][1] section. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. The following table provides the queue partitioning overview:
!!! note
Check the queue status at <https://extranet.it4i.cz/anselm/>
......@@ -17,28 +17,28 @@ The resources are allocated to the job in a fair-share fashion, subject to const
| qfree | yes | < 120% of allocation | 180 w/o accelerator | 16 | -1024 | no | 12 h |
!!! note
**The qfree queue is not free of charge**. [Normal accounting](#resources-accounting-policy) applies. However, it allows for utilization of free resources, once a project has exhausted all its allocated computational resources. This does not apply to Director's Discretion projects (DD projects) by default. Usage of qfree after exhaustion of DD projects' computational resources is allowed after request for this queue.
**The qfree queue is not free of charge**. [Normal accounting][2] applies. However, it allows for utilization of free resources, once a project has exhausted all its allocated computational resources. This does not apply to Director's Discretion projects (DD projects) by default. Usage of qfree after exhaustion of DD projects' computational resources is allowed after request for this queue.
**The qexp queue is equipped with nodes which do not have exactly the same CPU clock speed.** Should you need the nodes to have exactly the same CPU speed, you have to select the proper nodes during the PBS job submission.
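A sketch of selecting nodes by clock speed; the cpu_freq resource attribute and its value are assumptions, not confirmed by this page:

```console
# request nodes whose (assumed) cpu_freq attribute corresponds to 2.4 GHz
$ qsub -q qexp -l select=2:ncpus=16:cpu_freq=24 -I
```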
* **qexp**, the Express queue: This queue is dedicated to testing and running very small jobs. It is not required to specify a project to enter the qexp. There are always 2 nodes reserved for this queue (w/o accelerators), and a maximum of 8 nodes are available via the qexp for a particular user, from a pool of nodes containing Nvidia accelerated nodes (cn181-203), MIC accelerated nodes (cn204-207), and Fat nodes with 512GB of RAM (cn208-209). This enables us to test and tune accelerated code and code with higher RAM requirements. The nodes may be allocated on a per core basis. No special authorization is required to use qexp. The maximum runtime in qexp is 1 hour.
* **qprod**, the Production queue: This queue is intended for normal production runs. It is required that an active project with nonzero remaining resources is specified to enter the qprod. All nodes may be accessed via the qprod queue, except the reserved ones. 178 nodes without accelerators are included. Full nodes, 16 cores per node, are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprod is 48 hours.
* **qlong**, the Long queue: This queue is intended for long production runs. It is required that an active project with nonzero remaining resources is specified to enter the qlong. Only 60 nodes without acceleration may be accessed via the qlong queue. Full nodes, 16 cores per node, are allocated. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qlong is 144 hours (three times that of the standard qprod time - 3 x 48 h).
* **qnvidia**, qmic, qfat, the Dedicated queues: The queue qnvidia is dedicated to accessing the Nvidia accelerated nodes, the qmic to accessing MIC nodes and qfat the Fat nodes. It is required that an active project with nonzero remaining resources is specified to enter these queues. 23 nvidia, 4 mic, and 2 fat nodes are included. Full nodes, 16 cores per node, are allocated. The queues run with very high priority, the jobs will be scheduled before the jobs coming from the qexp queue. An PI needs to explicitly ask [support](https://support.it4i.cz/rt/) for authorization to enter the dedicated queues for all users associated with her/his project.
* **qnvidia**, qmic, qfat, the Dedicated queues: The queue qnvidia is dedicated to accessing the Nvidia accelerated nodes, the qmic to accessing MIC nodes, and qfat the Fat nodes. It is required that an active project with nonzero remaining resources is specified to enter these queues. 23 nvidia, 4 mic, and 2 fat nodes are included. Full nodes, 16 cores per node, are allocated. The queues run with very high priority; the jobs will be scheduled before the jobs coming from the qexp queue. A PI needs to explicitly ask [support][a] for authorization to enter the dedicated queues for all users associated with her/his project.
* **qfree**, the Free resource queue: The queue qfree is intended for utilization of free resources, after a project has exhausted all of its allocated computational resources (this does not apply to DD projects by default; DD projects have to request permission to use qfree after exhaustion of computational resources). It is required that an active project is specified to enter the queue. Consumed resources will be accounted to the Project. Access to the qfree queue is automatically removed if consumed resources exceed 120% of the resources allocated to the Project. Only 180 nodes without accelerators may be accessed from this queue. Full nodes, 16 cores per node, are allocated. The queue runs with very low priority and no special authorization is required to use it. The maximum runtime in qfree is 12 hours.
## Queue Notes
The job wall clock time defaults to **half the maximum time**; see the table above. Longer wall time limits can be [set manually, see examples](job-submission-and-execution/).
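A sketch of such a manual setting, using the qprod maximum of 48 hours; the project name is illustrative:

```console
$ qsub -A OPEN-0-0 -q qprod -l select=2:ncpus=16 -l walltime=48:00:00 ./myjob
```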