# Open OnDemand
[Open OnDemand][1] is an intuitive, innovative, and interactive interface to remote computing resources.
It allows users to access our services from any device and web browser,
resulting in faster and more efficient use of supercomputing resources.
For more information, see the Open OnDemand [documentation][2].
## Access Open OnDemand
To access the OOD service, you must be connected to [IT4I VPN][a].
Then go to [https://ood-karolina.it4i.cz/][3] for Karolina
or [https://ood-barbora.it4i.cz/][4] for Barbora and enter your e-INFRA CZ or IT4I credentials.
From the top menu bar, you can manage your files and jobs, access the cluster's shell
and launch interactive apps on login nodes.
## OOD Apps on IT4I Clusters
!!! note
Barbora OOD offers the Mate and XFCE desktops on the login node only. The other applications listed below are exclusive to Karolina OOD.
* Desktops
* Karolina Login Mate
* Karolina Login XFCE
* Gnome Desktop
* GUIs
* Ansys
* Blender
* ParaView
* TorchStudio
* Servers
* Code Server
* Jupyter (+IJulia)
* MATLAB
* TensorBoard
* Simulation
* Code Aster
Depending on the selected application, you can set various properties;
e.g. partition, number of nodes, tasks per node, reservation, etc.
For `qgpu` partitions, you can select the number of GPUs.
![Ansys app in OOD GUI](../../../img/ood-ansys.png)
## Job Composer Tutorial
Under *Jobs > Job Composer*, you can create jobs from several sources.
A simple tutorial will guide you through the process.
To restart the tutorial, click *Help* in the upper right corner.
[1]: https://openondemand.org/
[2]: https://osc.github.io/ood-documentation/latest/
[3]: https://ood-karolina.it4i.cz/
[4]: https://ood-barbora.it4i.cz/
[a]: ../vpn-access.md
# VNC
Virtual Network Computing (VNC) is a graphical desktop-sharing system that uses the Remote Frame Buffer protocol (RFB) to remotely control another computer. It transmits the keyboard and mouse events from one computer to another, relaying the graphical screen updates back in the other direction, over a network.
VNC-based connections are usually faster (require less network bandwidth) than [X11][1] applications forwarded directly through SSH.
The recommended clients are [TightVNC][b] or [TigerVNC][c] (free, open source, available for almost any platform).
## Create VNC Server Password
!!! note
The VNC server password should be set before the first login. Use a strong password.
```console
$ vncpasswd
Password:
Verify:
```
## Start VNC Server
!!! note
To access VNC, a remote VNC Server must be started first and a tunnel using SSH port forwarding must be established.
[See below][2] for details on SSH tunnels.
Start by **choosing your display number**.
To choose a free one, check the currently occupied display numbers; list them using the command:
```console
$ ps aux | grep Xvnc | sed -rn 's/(\S+) .*Xvnc (\:[0-9]+) .*/\1 \2/p'
username :79
username :60
.....
```
As you can see above, displays ":79" and ":60" are already occupied.
Generally, you can choose any display number freely, *except the occupied ones*.
Also remember that the display number must be lower than or equal to 99.
Based on this requirement, we have chosen display number 61, as seen in the examples below.
!!! note
Your situation may be different so the choice of your number may differ, as well. **Choose and use your own display number accordingly!**
Start your remote VNC server on the chosen display number (61):
```console
$ vncserver :61 -geometry 1600x900 -depth 16
New 'login2:61 (username)' desktop is login2:61
Starting applications specified in /home/username/.vnc/xstartup
Log file is /home/username/.vnc/login2:61.log
```
Check whether the VNC server is running on the chosen display number (61):
```console
$ vncserver -list
TigerVNC server sessions:
X DISPLAY # PROCESS ID
:61 18437
```
Another way to check it:
```console
$ ps aux | grep Xvnc | sed -rn 's/(\S+) .*Xvnc (\:[0-9]+) .*/\1 \2/p'
username :61
username :102
```
!!! note
The VNC server runs on port 59xx, where xx is the display number. To get your port number, simply add 5900 + display number; in our example, 5900 + 61 = 5961. For display number 102, the TCP port is 5900 + 102 = 6002, but note that TCP ports above 6000 are often used by X11. **Calculate your own port number and use it instead of 5961 in the examples below.**
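You can let the shell do the arithmetic for you:
```console
$ echo $((5900 + 61))
5961
```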
To access the remote VNC server, in the next step you have to create a tunnel between the login node (TCP port 5961) and a free TCP port on your local machine (for simplicity, the very same number). See the examples for [Linux/Mac OS][2] and [Windows][3].
!!! note
The tunnel must point to the same login node where you launched the VNC server, e.g. login2. If you use just cluster-name.it4i.cz, the tunnel might point to a different node due to DNS round robin.
## Linux/Mac OS Example of Creating a Tunnel
On your local machine, create the tunnel:
```console
$ ssh -TN -f username@login2.cluster-name.it4i.cz -L 5961:localhost:5961
```
Issue the following command to check the tunnel is established (note the PID 2022 in the last column, it is required for closing the tunnel):
```console
$ netstat -natp | grep 5961
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 127.0.0.1:5961 0.0.0.0:* LISTEN 2022/ssh
tcp6 0 0 ::1:5961 :::* LISTEN 2022/ssh
```
Or on Mac OS use this command:
```console
$ lsof -n -i4TCP:5961 | grep LISTEN
ssh 75890 sta545 7u IPv4 0xfb062b5c15a56a3b 0t0 TCP 127.0.0.1:5961 (LISTEN)
```
Connect with the VNC client:
```console
$ vncviewer 127.0.0.1:5961
```
In this example, we connect to the remote VNC server on port 5961 via the SSH tunnel. The connection is encrypted and secured. The VNC server listening on port 5961 provides a screen of 1600x900 pixels.
After you finish your work, close the SSH tunnel, which is still running in the background. Use the following command (PID 2022 in this case; see the netstat command above):
```console
$ kill 2022
```
!!! note
You can watch the instruction video on how to make a VNC connection between a local Ubuntu desktop and the IT4I cluster [here][e].
## Windows Example of Creating a Tunnel
Start the VNC server using the `vncserver` command described above.
Search for the localhost and port number (in this case 127.0.0.1:5961):
```console
$ netstat -tanp | grep Xvnc
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 127.0.0.1:5961 0.0.0.0:* LISTEN 24031/Xvnc
```
### PuTTY
On the PuTTY Configuration screen, go to _Connection -> SSH -> Tunnels_ to set up the tunnel.
Fill the _Source port_ and _Destination_ fields. **Do not forget to click the _Add_ button**.
![](../../../img/putty-tunnel.png)
### WSL (Bash on Windows)
[Windows Subsystem for Linux][d] is another way to run Linux software in a Windows environment.
At your machine, create the tunnel:
```console
$ ssh username@login2.cluster-name.it4i.cz -L 5961:localhost:5961
```
## Example of Starting VNC Client
Run the VNC client of your choice, select the VNC server 127.0.0.1, port 5961 and connect using the VNC password.
### TigerVNC Viewer
![](../../../img/vncviewer.png)
In this example, we connect to the remote VNC server on port 5961 via the SSH tunnel, using the TigerVNC viewer. The connection is encrypted and secured. The VNC server listening on port 5961 provides a screen of 1600x900 pixels.
### TightVNC Viewer
Use your VNC password to log in using the TightVNC Viewer and start a Gnome session on the login node.
![](../../../img/TightVNC_login.png)
## Gnome Session
After the successful login, you should see the following screen:
![](../../../img/gnome_screen.png)
### Disable Your Gnome Session Screensaver
Open the Screensaver preferences dialog:
![](../../../img/gdmscreensaver.png)
Uncheck both options below the slider:
![](../../../img/gdmdisablescreensaver.png)
### Kill Screensaver if Screen Gets Locked
If the screen gets locked, you have to kill the screensaver. Do not forget to disable the screensaver afterwards.
```console
$ ps aux | grep screen
username 1503 0.0 0.0 103244 892 pts/4 S+ 14:37 0:00 grep screen
username 24316 0.0 0.0 270564 3528 ? Ss 14:12 0:00 gnome-screensaver
[username@login2 .vnc]$ kill 24316
```
## Kill VNC Server After Finished Work
You should kill your VNC server using the command:
```console
$ vncserver -kill :61
Killing Xvnc process ID 7074
Xvnc process ID 7074 already killed
```
or:
```console
$ pkill vnc
```
!!! note
Also, do not forget to terminate the SSH tunnel, if it was used. For details, see the end of [this section][2].
## GUI Applications on Compute Nodes Over VNC
The very same methods as described above may be used to run the GUI applications on compute nodes. However, for maximum performance, follow these steps:
Open a Terminal (_Applications -> System Tools -> Terminal_). Run all the following commands in the terminal.
![](../../../img/gnome-terminal.png)
Allow incoming X11 graphics from the compute nodes at the login node:
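A common way to do this is the `xhost` utility (note that `xhost +` disables X access control for all hosts; a more restrictive form such as `xhost +node_name` may be preferred):
```console
$ xhost +
```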
Get an interactive session on a compute node (for more detailed info [look here][4]). Forward X11 system using `--x11` option:
```console
$ salloc -A PROJECT_ID -p qcpu --x11
```
Test that the DISPLAY redirection into your VNC session works by running an X11 application (e.g. XTerm, Intel Advisor, etc.) on the assigned compute node:
```console
$ xterm
```
The example described above:
![](../../../img/node_gui_xwindow.png)
### GUI Over VNC and SSH
For [better performance][1], an SSH connection can be used.
Open two Terminals (_Applications -> System Tools -> Terminal_) as described before.
Get an interactive session on a compute node (for more detailed info [look here][4]). Forward X11 system using `--x11` option:
```console
$ salloc -A PROJECT_ID -p qcpu --x11
```
In the second terminal, connect to the assigned node and run the X11 application:
```console
$ ssh -X node_name.barbora.it4i.cz
$ xterm
```
The example described above:
![](../../../img/node_gui_sshx.png)
[b]: http://www.tightvnc.com
[c]: http://sourceforge.net/apps/mediawiki/tigervnc/index.php?title=Main_Page
[d]: http://docs.microsoft.com/en-us/windows/wsl
[e]: https://www.youtube.com/watch?v=b9Ez9UN2uL0
[1]: x-window-system.md
[2]: #linuxmac-os-example-of-creating-a-tunnel
[3]: #windows-example-of-creating-a-tunnel
[4]: ../../job-submission-and-execution.md
# X Window System
The X Window system is a principal way to get GUI access to the clusters. The **X Window System** (commonly known as **X11**, based on its current major version being 11, or shortened to simply **X**, and sometimes informally **X-Windows**) is a computer software system and network protocol that provides a basis for graphical user interfaces (GUIs) and rich input device capability for networked computers.
!!! tip
The X display forwarding must be activated and the X server must be running on the client side.
## X Display
### Linux Example
In order to display the GUI of various software tools, you need to enable the X display forwarding. On Linux and Mac, log in using the `-X` option in the SSH client:
```console
local $ ssh -X username@cluster-name.it4i.cz
```
### PuTTY on Windows
On Windows, use the PuTTY client to enable X11 forwarding. In PuTTY menu, go to _Connection > SSH > X11_ and check the _Enable X11 forwarding_ checkbox before logging in. Then log in as usual.
![](../../../img/cygwinX11forwarding.png)
### WSL (Bash on Windows)
To enable the X display forwarding, log in using the `-X` option in the SSH client:
```console
local $ ssh -X username@cluster-name.it4i.cz
```
!!! tip
If you are getting the "cannot open display" error message, try to export the DISPLAY variable before attempting to log in:
```console
local $ export DISPLAY=localhost:0.0
```
## X Server
In order to display the GUI of various software tools, you need a running X server on your desktop computer. For Linux users, no action is required as the X server is the default GUI environment on most Linux distributions. Mac and Windows users need to install and run the X server on their workstations.
### X Server on OS X
Mac OS users need to install [XQuartz server][d].
### WSL (Bash on Windows)
To run a Linux GUI on WSL, download, for example, [VcXsrv][a].
1. After installation, run XLaunch and during the initial setup, check the `Disable access control` option.
!!! tip
Save the configuration and launch VcXsrv using the `config.xlaunch` file, so you won't have to check the option on every run.
1. Allow VcXsrv in your firewall to communicate on private and public networks.
1. Set the `DISPLAY` environment variable, using the following command:
```console
export DISPLAY="`grep nameserver /etc/resolv.conf | sed 's/nameserver //'`:0"
```
!!! tip
Include the command at the end of the `/etc/bash.bashrc`, so you don't have to run it every time you run WSL.
1. Test the configuration by running `echo $DISPLAY`:
```console
user@nb-user:/$ echo $DISPLAY
172.26.240.1:0
```
### X Server on Windows
There is a variety of X servers available for the Windows environment. The commercial Xwin32 is very stable and feature-rich. The Cygwin environment provides a fully featured open-source XWin X server. For simplicity, we recommend the open-source X server by the [Xming project][e]. For stability and full features, we recommend the [XWin][f] X server by Cygwin.
| How to use Xwin | How to use Xming |
|--- | --- |
| [Install Cygwin][g]. Find and execute XWin.exe to start the X server on Windows desktop computer. | Use Xlaunch to configure Xming. Run Xming to start the X server on a Windows desktop computer. |
## Running GUI Enabled Applications
!!! note
Make sure that X forwarding is activated and the X server is running.
Then launch the application as usual. Use the `&` to run the application in background:
```console
$ ml intel
$ gvim &
```
```console
$ xterm
```
In this example, we activate the Intel programming environment tools and then start the graphical gvim editor.
## GUI Applications on Compute Nodes
Allocate the compute nodes using the `--x11` option on the `salloc` command:
```console
$ salloc -A PROJECT-ID -p qcpu_exp --x11
```
In this example, we allocate one node via the qcpu_exp partition, interactively. We request X11 forwarding with the `--x11` option. It will be possible to run GUI enabled applications directly on the first compute node.
For **better performance**, log on the allocated compute node via SSH, using the `-X` option.
```console
$ ssh -X cn245
```
In this example, we log on the cn245 compute node, with the X11 forwarding enabled.
## Gnome GUI Environment
The Gnome 2.28 GUI environment is available on the clusters. We recommend using a separate X server window for displaying the Gnome environment.
### Gnome on Linux and OS X
To run a remote Gnome session in a window on a Linux/OS X computer, you need to install Xephyr. The Ubuntu package is
xserver-xephyr; on OS X, it is part of [XQuartz][i]. First, launch Xephyr on the local machine:
```console
local $ Xephyr -ac -screen 1024x768 -br -reset -terminate :1 &
```
This will open a new X window of size 1024x768 at DISPLAY :1. Next, connect via SSH to the cluster with the `DISPLAY` environment variable set and launch a gnome-session:
```console
local $ DISPLAY=:1.0 ssh -XC yourname@cluster-name.it4i.cz -i ~/.ssh/path_to_your_key
... cluster-name MOTD...
yourname@login1.cluster-name.it4i.cz $ gnome-session &
```
On older systems where Xephyr is not available, you may also try Xnest instead of Xephyr. Another option is to launch a new X server in a separate console via:
```console
xinit /usr/bin/ssh -XT -i .ssh/path_to_your_key yourname@cluster-name.it4i.cz gnome-session -- :1 vt12
```
However, this method does not seem to work with recent Linux distributions, and you will need to manually source
/etc/profile to properly set the environment variables for Slurm.
### Gnome on Windows
Use XLaunch to start the Xming server or run the XWin.exe. Select the "One window" mode.
Log in to the cluster using [PuTTY][2] or [Bash on Windows][3]. On the cluster, run the gnome-session command.
```console
$ gnome-session &
```
This way, we run a remote gnome session on the cluster, displaying it in the local X server.
Use System-Log Out to close the gnome-session.
[2]: #putty-on-windows
[3]: #wsl-bash-on-windows
[a]: https://sourceforge.net/projects/vcxsrv/
[d]: https://www.xquartz.org
[e]: http://sourceforge.net/projects/xming/
[f]: http://x.cygwin.com/
[g]: http://x.cygwin.com/
[i]: http://xquartz.macosforge.org/landing/
# Xorg
## Introduction
!!! note
Available only for Karolina accelerated nodes acn[01-72] and visualization servers viz[1-2].
Some applications (e.g. ParaView, EnSight, Blender, OVITO) require not only visualization but also computational resources, such as multiple cores or multiple graphics accelerators. For the processing of demanding tasks, more operating memory and more memory on the graphics card are also required. These requirements are met by all accelerated nodes on the Karolina cluster, which are equipped with eight graphics cards with 40 GB of GPU memory each and 1 TB of CPU memory. To run properly, such applications require a running Xorg server and an installed VirtualGL environment.
## Xorg
[Xorg][a] is a free and open source implementation of the X Window System display server maintained by the X.Org Foundation. Client-side implementations of the protocol are available, for example, in the form of Xlib and XCB. While Xorg usually supports 2D hardware acceleration, 3D hardware acceleration is often missing. With hardware 3D acceleration, 3D rendering uses the graphics processor on the graphics card instead of taking up valuable CPU resources when rendering 3D images. Without this 3D acceleration, the processor is forced to draw everything itself using the [Mesa][c] software rendering libraries, which takes up quite a bit of computing power. The VirtualGL package solves these problems.
## VirtualGL
[VirtualGL][b] is an open source software package that redirects 3D rendering commands from Linux OpenGL applications to 3D accelerator hardware in a dedicated server and sends the rendered output to a client located elsewhere on the network. On the server side, VirtualGL consists of a library that handles the redirection and a wrapper that instructs applications to use the library. Clients can connect to the server either using a remote X11 connection or using an X11 proxy such as a VNC server. In the case of an X11 connection, some VirtualGL software is also required on the client side to receive the rendered graphical output separately from the X11 stream. In the case of VNC connections, no specific client-side software is needed other than the VNC client itself. VirtualGL works seamlessly with [headless][d] NVIDIA GPUs (Ampere, Tesla).
## Running ParaView With GUI and Interactive Job on Karolina
1. Run [VNC environment][1]
1. Run terminal in VNC session:
```console
[loginX.karolina]$ gnome-terminal
```
1. Run an interactive job in the gnome terminal:
```console
[loginX.karolina]$ salloc -A PROJECT-ID -p qgpu --x11 --comment use:xorg=true
```
1. Run the Xorg server:
```console
[acnX.karolina]$ Xorg :0 &
```
1. Load VirtualGL:
```console
[acnX.karolina]$ ml VirtualGL
```
1. Find number of DISPLAY:
```console
[acnX.karolina]$ echo $DISPLAY
localhost:XX.0 (e.g. localhost:50.0)
```
1. Load ParaView:
```console
[acnX.karolina]$ ml ParaView
```
1. Run ParaView:
```console
[acnX.karolina]$ DISPLAY=:XX vglrun paraview
```
!!! note
It is not necessary to run Xorg from the command line on the visualization servers viz[1-2]. Xorg runs without interruption and is started when the visualization server boots.<br> Another option is to use [vglclient][2] for visualization server.
## Running Blender (Eevee) on the Background Without GUI and Without Interactive Job on Karolina
1. Download and extract Blender and Eevee scene:
```console
[loginX.karolina]$ wget https://ftp.nluug.nl/pub/graphics/blender/release/Blender2.93/blender-2.93.6-linux-x64.tar.xz ; tar -xvf blender-2.93.6-linux-x64.tar.xz ; wget https://download.blender.org/demo/eevee/mr_elephant/mr_elephant.blend
```
1. Create a running script (its content is shown in full after this list):
```console
[loginX.karolina]$ echo 'Xorg :0 &' > run_eevee.sh ; echo 'cd' $PWD >> run_eevee.sh ; echo 'DISPLAY=:0 ./blender-2.93.6-linux-x64/blender --factory-startup --enable-autoexec -noaudio --background ./mr_elephant.blend --render-output ./#### --render-frame 0' >> run_eevee.sh ; chmod +x run_eevee.sh
```
1. Run job from terminal:
```console
[loginX.karolina]$ sbatch -A PROJECT-ID -q qcpu --comment use:xorg=true ./run_eevee.sh
```
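For reference, the generated `run_eevee.sh` should look similar to this, with `$PWD` expanded to the directory where you extracted Blender (a sketch; if `sbatch` rejects the script, add a `#!/bin/bash` shebang as the first line):
```bash
Xorg :0 &
cd /path/to/your/working/directory
DISPLAY=:0 ./blender-2.93.6-linux-x64/blender --factory-startup --enable-autoexec -noaudio --background ./mr_elephant.blend --render-output ./#### --render-frame 0
```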
[1]: ./vnc.md
[2]: ../../../software/viz/vgl.md
[a]: https://www.x.org/wiki/
[b]: https://en.wikipedia.org/wiki/VirtualGL
[c]: https://docs.mesa3d.org/index.html
[d]: https://virtualgl.org/Documentation/HeadlessNV
# PuTTY (Windows)
## Windows PuTTY Installer
We recommend downloading "**A Windows installer for everything except PuTTYtel**", which includes **Pageant** (SSH authentication agent) and **PuTTYgen** (PuTTY key generator) and is available [here][a].
!!! note
"Pageant" is optional.
"Change Password for Existing Private Key" is optional.
## PuTTY - How to Connect to the IT4Innovations Cluster
* Run PuTTY
* Fill in the _Host Name_ and _Saved Sessions_ fields with the login address, then browse to the _Connection - SSH - Auth_ menu. The _Host Name_ input may be in the format **"username@clustername.it4i.cz"** so that you do not have to type your login each time. In this example, replace the word `cluster` in the `cluster.it4i.cz` address with the name of the cluster to which you want to connect.
![](../../../img/PuTTY_host_cluster.png)
* Category - Connection - SSH - Auth:
Select Attempt authentication using Pageant.
Select Allow agent forwarding.
Browse and select your private key file.
![](../../../img/PuTTY_keyV.png)
* Return to Session page and Save selected configuration with _Save_ button.
![](../../../img/PuTTY_save_cluster.png)
* Now you can log in using _Open_ button.
![](../../../img/PuTTY_open_cluster.png)
* Enter your username if the _Host Name_ input is not in the format "username@cluster.it4i.cz".
* Enter passphrase for selected private key file if Pageant **SSH authentication agent is not used.**
## Other PuTTY Settings
* Category - Window - Translation - Remote character set: select **UTF-8**.
* Category - Terminal - Features: select **Disable application keypad mode** (enables the numpad).
* Save your configuration in _Session - Basic options for your PuTTY session_ with the _Save_ button.
## Pageant SSH Agent
Pageant holds your private key in memory without needing to retype a passphrase on every login.
* Run Pageant.
* On Pageant Key List press _Add key_ and select your private key (id_rsa.ppk).
* Enter your passphrase.
* Now you have your private key in memory without needing to retype a passphrase on every login.
![](../../../img/PageantV.png)
## PuTTY Key Generator
PuTTYgen is the PuTTY key generator. You can load in an existing private key and change your passphrase or generate a new public/private key pair.
### Change Password for Existing Private Key
You can change the password of your SSH key with "PuTTY Key Generator". Make sure to back up the key.
* Load your private key file with _Load_ button.
* Enter your current passphrase.
* Change key passphrase.
* Confirm key passphrase.
* Save your private key with the _Save private key_ button.
![](../../../img/PuttyKeygeneratorV.png)
### Generate a New Public/Private Key
You can generate an additional public/private key pair and insert the public key into the `authorized_keys` file for authentication with your own private key.
* Start with _Generate_ button.
![](../../../img/PuttyKeygenerator_001V.png)
* Generate some randomness.
![](../../../img/PuttyKeygenerator_002V.png)
* Wait.
![](../../../img/PuttyKeygenerator_003V.png)
* Enter a comment for your key using the 'username@organization.example.com' format.
Enter a key passphrase, confirm it, and save your new private key in the _ppk_ format.
![](../../../img/PuttyKeygenerator_004V.png)
* Save the public key with the _Save public key_ button.
You can copy the public key out of the ‘Public key for pasting into the authorized_keys file’ box.
![](../../../img/PuttyKeygenerator_005V.png)
* Export the private key in the OpenSSH format "id_rsa" using _Conversions - Export OpenSSH key_.
![](../../../img/PuttyKeygenerator_006V.png)
## Managing Your SSH Key
To manage your SSH key for authentication to clusters, see the [SSH Key Management][3] section.
[3]: ./ssh-key-management.md
[a]: http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
# SSH
Secure Shell (SSH) is a cryptographic network protocol for operating network services securely over an unsecured network.
SSH uses public-private key pair for authentication, allowing users to log in without having to specify a password. The public key is placed on all computers that must allow access to the owner of the matching private key (the private key must be kept **secret**).
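For example, to log in to a cluster with a specific private key:
```console
$ ssh -i ~/.ssh/id_ed25519 username@cluster-name.it4i.cz
```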
## Private Key
!!! note
The path to a private key is usually /home/username/.ssh/
A private key file in the `id_rsa` or `*.ppk` format is present on the local side and used, for example, in the Pageant SSH agent (for Windows users). The private key should always be kept in a safe place.
### Example of RSA Private Key Format
```console
-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEAqbo7jokygnBpG2wYa5NB45ns6+UKTNLMLHF0BO3zmRtKEElE
aGqXfbYwvXlcuRb2d9/Y5dVpCZHV0kbY3NhtVOcEIe+1ROaiU9BEsUAhMNEvgiLV
gSql4QvRO4BWPlM8+WAWXDp3oeoBh8glXyuh9teb8yq98fv1r1peYGRrW3/s4V+q
O1SQ0XY2T7rWCYRLIP6rTMXArTI35v3WU513mn7nm1fJ7oN0QgVH5b0W9V1Kyc4l
9vILHeMXxvz+i/5jTEfLOJpiRGYZYcaYrE4dIiHPl3IlbV7hlkK23Xb1US8QJr5G
ADxp1VTkHjY+mKagEfxl1hQIb42JLHhKMEGqNQIDAQABAoIBAQCkypPuxZjL+vai
UGa5dAWiRZ46P2yrwHPKpvEdpCdDPbLAc1K/CtdBkHZsUPxNHVV6eFWweW99giIY
Av+mFWC58X8asBHQ7xkmxW0cqAZRzpkRAl9IBS9/fKjO28Fgy/p+suOi8oWbKIgJ
3LMkX0nnT9oz1AkOfTNC6Tv+3SE7eTj1RPcMjur4W1Cd1N3EljLszdVk4tLxlXBS
yl9NzVnJJbJR4t01l45VfFECgYEAno1WJSB/SwdZvS9GkfhvmZd3r4vyV9Bmo3dn
XZAh8HRW13imOnpklDR4FRe98D9A7V3yh9h60Co4oAUd6N+Oc68/qnv/8O9efA+M
/neI9ANYFo8F0+yFCp4Duj7zPV3aWlN/pd8TNzLqecqh10uZNMy8rAjCxybeZjWd
DyhgywXhAoGBAN3BCazNefYpLbpBQzwes+f2oStvwOYKDqySWsYVXeVgUI+OWTVZ
eZ26Y86E8MQO+q0TIxpwou+TEaUgOSqCX40Q37rGSl9K+rjnboJBYNCmwVp9bfyj
kCLL/3g57nTSqhgHNa1xwemePvgNdn6FZteA8sXiCg5ZzaISqWAffek5AoGBAMPw
V/vwQ96C8E3l1cH5cUbmBCCcfXM2GLv74bb1V3SvCiAKgOrZ8gEgUiQ0+TfcbAbe
7MM20vRNQjaLTBpai/BTbmqM1Q+r1KNjq8k5bfTdAoGANgzlNM9omM10rd9WagL5
yuJcal/03p048mtB4OI4Xr5ZJISHze8fK4jQ5veUT9Vu2Fy/w6QMsuRf+qWeCXR5
RPC2H0JzkS+2uZp8BOHk1iDPqbxWXJE9I57CxBV9C/tfzo2IhtOOcuJ4LY+sw+y/
ocKpJbdLTWrTLdqLHwicdn8OxeWot1mOukyK2l0UeDkY6H5pYPtHTpAZvRBd7ETL
Zs2RP3KFFvho6aIDGrY0wee740/jWotx7fbxxKwPyDRsbH3+1Wx/eX2RND4OGdkH
gejJEzpk/7y/P/hCad7bSDdHZwO+Z03HIRC0E8yQz+JYatrqckaRCtd7cXryTmTR
FbvLJmECgYBDpfno2CzcFJCTdNBZFi34oJRiDb+HdESXepk58PcNcgK3R8PXf+au
OqDBtZIuFv9U1WAg0gzGwt/0Y9u2c8m0nXziUS6AePxy5sBHs7g9C9WeZRz/nCWK
+cHIm7XOwBEzDKz5f9eBqRGipm0skDZNKl8X/5QMTT5K3Eci2n+lTw==
-----END RSA PRIVATE KEY-----
```
### Example of Ed25519 Private Key Format
```console
PuTTY-User-Key-File-3: ssh-ed25519
Encryption: aes256-cbc
Comment: eddsa-key-20240910
Public-Lines: 2
AAAAC3NzaC1lZDI1NTE5AAAAIBKNwqaWU260wueN00nBGRwIqeOedRedtS0T7QVn
h0i2
Key-Derivation: Argon2id
Argon2-Memory: 8192
Argon2-Passes: 21
Argon2-Parallelism: 1
Argon2-Salt: bb64fc32b368aa16d6e8159c8d921f63
Private-Lines: 1
+7StvvEmCMchEy1tUyIMLfGTZBk7dgGUpJEJzNl82qmNZD1TmQOqNmCRiK84P/TL
Private-MAC: dc3f83cef42026a2038f28e96f87367d762e72265621d82e2fe124634ec3c905
```
## Public Key
A public key file in the `*.pub` format is present on the remote side and allows access to the owner of the matching private key.
### Example of RSA Public Key Format
```console
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCpujuOiTKCcGkbbBhrk0Hjmezr5QpM0swscXQE7fOZG0oQSURoapd9tjC9eVy5FvZ339jl1WkJkdXSRtjc2G1U5wQh77VE5qJT0ESxQCEw0S+CItWBKqXhC9E7gFY+UyP5YBZcOneh6gGHyCVfK6H215vzKr3x+/WvWl5gZGtbf+zhX6o4RJDRdjZPutYJhEsg/qtMxcCtMjfm/dZTnXeafuebV8nug3RCBUflvRb1XUrJuiX28gsd4xfG/P6L/mNMR8s4kmJEZhlhxpj8Th0iIc+XciVtXuGWQrbddcVRLxAmvkYAPGnVVOQeNj69pqAR/GXaFAhvjYkseEowQao1 username@organization.example.com
```
### Example of Ed25519 Public Key Format
```console
---- BEGIN SSH2 PUBLIC KEY ----
Comment: "eddsa-key-20240910"
AAAAC3NzaC1lZDI1NTE5AAAAIBKNwqaWU260wueN00nBGRwIqeOedRedtS0T7QVn
h0i2
---- END SSH2 PUBLIC KEY ----
```
## SSH Key Management
You can manage your own SSH key for authentication to clusters:
* [e-INFRA CZ account][3]
* [IT4I account][4]
[3]: ../../management/einfracz-profile.md
[4]: ../../management/it4i-profile.md
# OpenSSH Keys (UNIX)
## Creating Your Own Key
To generate a new keypair of your public and private key, use the `ssh-keygen` tool:
```console
local $ ssh-keygen -t ed25519 -C 'username@organization.example.com' -f additional_key
```
!!! note
Enter a **strong** **passphrase** for securing your private key.
With the `-f` option used above, the private key is saved to the `additional_key` file and the public key to `additional_key.pub`.
Without `-f`, Ed25519 keys are saved by default to `~/.ssh/id_ed25519` and `~/.ssh/id_ed25519.pub` (`id_rsa` and `id_rsa.pub` for RSA keys).
## Adding SSH Key to Linux System SSH Agent
1. Start the SSH agent if it is not already running:
```
eval "$(ssh-agent -s)"
```
1. Add the key to the SSH agent:
```
ssh-add ~/.ssh/name_of_your_ssh_key_file
```
1. Verify that the key was added to the SSH agent:
```
ssh-add -l
```
## Managing Your SSH Key
To manage your SSH key for authentication to clusters, see the [SSH Key Management][1] section.
[1]: ./ssh-key-management.md
# Tmux
[Tmux][1] is an open-source terminal multiplexer which allows multiple terminal sessions to be accessed simultaneously in a single window. Tmux allows you to switch easily between several programs in one terminal, detach them (they keep running in the background) and reattach them to a different terminal.
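A typical workflow for a long-running task looks like this (`mysession` and the script name are illustrative):
```console
$ tmux new -s mysession        # create a new named session
$ ./long_running_task.sh       # start your work, then detach with Ctrl+b d
$ tmux ls                      # list running sessions
$ tmux attach -t mysession     # reattach to the session later
```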
Note that [GNU Screen][2] is not supported, but if you prefer it, you can install it in your `/home` folder:
```console
wget https://ftp.gnu.org/gnu/screen/screen-4.9.0.tar.gz
tar xf screen-4.9.0.tar.gz && rm screen-4.9.0.tar.gz
cd screen-4.9.0
./autogen.sh
./configure --prefix=$HOME/.local/screen
make
make install
mkdir $HOME/.local/screen/etc
cp etc/etcscreenrc $HOME/.local/screen/etc/screenrc
echo "export PATH=\$HOME/.local/screen/bin:\$PATH" >> $HOME/.bashrc
cd ../ && rm -rf screen-4.9.0
```
[1]: https://github.com/tmux/tmux/wiki
[2]: https://www.gnu.org/software/screen/
# VPN Access
## Accessing IT4Innovations Internal Resources via VPN
To access IT4Innovations' resources and licenses, it is necessary to connect to its local network via VPN.
IT4Innovations uses the FortiClient VPN software.
For the list of supported operating systems, see the [FortiClient Administration Guide][a].
!!! Note "Realms"
If you are a member of a partner organization, we may ask you to use a so-called realm in your VPN connection. In the Remote Gateway field, include the realm path after the IP address or hostname. For example, for the realm `excellent`, the field would read as follows: `reconnect.it4i.cz:443/excellent`.
## VPN Client Download
* Windows: Download the **FortiClient VPN-only** app from the [official page][g] (Microsoft Store app is not recommended).
* Mac: Download the **FortiClient VPN** app from the [Apple Store][d].
* Linux: Download the [FortiClient][e] or [OpenFortiVPN][f] app.
## Working With Windows/Mac VPN Client
!!! Tip "Instructional video for Mac"
See [the instructional video][h] on how to download the VPN client and connect to the IT4I VPN on Mac.
Before the first login, you must configure the VPN.
In the New VPN Connection section, provide the name of your VPN connection and the following settings:
Name | Value
:-------------------|:------------------
VPN | SSL-VPN
Remote Gateway | reconnect.it4i.cz
Port | 443
Client Certificate | None
Optionally, you can describe the VPN connection and select Save Login under Authentication.
![](../../img/fc_vpn_web_login_2_1.png)
Save the settings, enter your login credentials and click Connect.
![](../../img/fc_vpn_web_login_3_1.png)
## Linux Client
The connection will work with the following settings:
Name | Value
:------------|:----------------------
VPN-Server | reconnect.it4i.cz
VPN-Port | 443
Set-Routes | Enabled
Set-DNS | Enabled
DNS Servers | 10.5.8.11, 10.5.8.22
Linux VPN clients need to run under root.
[OpenFortiGUI][c] uses sudo by default; be sure that your user is allowed to use sudo.
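For example, a connection via OpenFortiVPN can be established from the command line (a sketch; replace `username` with your IT4I login, and note that OpenFortiVPN's `--realm` option may be needed if you were assigned a realm):
```console
$ sudo openfortivpn reconnect.it4i.cz:443 -u username
```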
[a]: http://docs.fortinet.com/document/forticlient/latest/administration-guide/646779/installation-requirements
[c]: https://github.com/theinvisible/openfortigui
[d]: https://apps.apple.com/cz/app/forticlient-vpn/id1475674905?l=cs
[e]: https://www.fortinet.com/support/product-downloads/linux
[f]: https://github.com/adrienverge/openfortivpn
[g]: https://www.fortinet.com/support/product-downloads#vpn
[h]: https://www.youtube.com/watch?v=xGcROEreop8
# Get Project
The computational resources of IT4I are allocated by the Allocation Committee via several [allocation mechanisms][a] to a project investigated by a Primary Investigator. By allocating the computational resources, the Allocation Committee authorizes the PI to access and use the clusters. The PI may decide to authorize a number of their collaborators to access and use the clusters to consume the resources allocated to their Project. These collaborators will be associated with the Project. The figure below depicts the authorization chain:
![](../img/Authorization_chain.png)
**Allocation Mechanisms:**
* Academic researchers may apply via Open Access Competitions.
* Commercial and non-commercial institutions may also apply via the Director's Discretion.
In all cases, IT4Innovations’ access mechanisms are aimed at distributing computational resources while taking into account the development and application of supercomputing methods and their benefits and usefulness for society. The applicants are expected to submit a proposal. In the proposal, the applicants **apply for a particular amount of core-hours** of computational resources. The requested core-hours should be substantiated by scientific excellence of the proposal, its computational maturity and expected impacts. The allocation decision is based on the scientific, technical, and economic evaluation of the proposal.
## Becoming Primary Investigator
Once you create an account, log in to the [IT4I SCS portal][e] and apply for a project.
You will be informed by IT4I about the Allocation Committee decision.
Once approved by the Allocation Committee, you become the Primary Investigator (PI) for the project
and are authorized to use the clusters and any allocated resources as well as authorize collaborators for your project.
### Authorize Collaborators for Your Project
As a PI, you can approve or deny users' requests to join your project. There are two methods of authorizing collaborators:
#### Authorization by Web
This is a preferred method if you have an IT4I or e-INFRA CZ account.
Log in to the [IT4I SCS portal][e] using your credentials and go to the **Authorization Requests** section.
Here you can authorize collaborators for your project.
#### Authorization by Email (An Alternative Approach)
In order to authorize a Collaborator to utilize the allocated resources, the PI should contact the [IT4I support][f] (email: [support\[at\]it4i.cz][g]) and provide the following information:
1. Identify their project by project ID.
1. Provide a list of people, including themselves, who are authorized to use the resources allocated to the project. The list must include the full name, email, and affiliation. If collaborators' login access already exists in the IT4I systems, provide their usernames as well.
1. Include "Authorization to IT4Innovations" into the subject line.
!!! warning
Should the above information be provided by email, the email **must be** digitally signed. Read more on [digital signatures][2].
Example (you may use Czech or Slovak for communication with us, except for the subject line, which must be in English):
```console
Subject: Authorization to IT4Innovations
Dear support,
Please include my collaborators to project OPEN-0-0.
John Smith, john.smith@myemail.com, Department of Chemistry, MIT, US
Jonas Johansson, jjohansson@otheremail.se, Department of Physics, RIT, Sweden
Luisa Fibonacci, lf@emailitalia.it, Department of Mathematics, National Research Council, Italy
Thank you,
PI
(Digitally signed)
```
!!! note
Web-based email interfaces cannot be used for secure communication; an external application, such as Thunderbird or Outlook, must be used. This way, your new credentials will be visible only in applications that have access to your certificate.
[2]: https://docs.it4i.cz/general/obtaining-login-credentials/obtaining-login-credentials/#certificates-for-digital-signatures
[a]: https://www.it4i.cz/en/for-users/computing-resources-allocation
[e]: https://scs.it4i.cz
[f]: https://support.it4i.cz/rt/
[g]: mailto:support@it4i.cz
# Acceptable Use Policy
![Acceptable Use Policy document](../general/AUP-final.pdf){ type=application/pdf style="min-height:100vh;width:100%" }
---
hide:
- toc
---
# Barbora Partitions
!!! important
Active [project membership][1] is required to run jobs.
Below is the list of partitions available on the Barbora cluster:
| Partition | Project resources | Nodes | Min ncpus | Priority | Authorization | Walltime (def/max) |
| ---------------- | -------------------- | -------------------------- | --------- | -------- | ------------- | ------------------ |
| **qcpu** | > 0 | 190 | 36 | 2 | no | 24 / 48h |
| **qcpu_biz** | > 0 | 190 | 36 | 3 | no | 24 / 48h |
| **qcpu_exp** | < 150% of allocation | 16 | 36 | 4 | no | 1 / 1h |
| **qcpu_free** | < 150% of allocation | 124<br>max 4 per job | 36 | 1 | no | 12 / 18h |
| **qcpu_long** | > 0 | 60<br>max 20 per job | 36 | 2 | no | 72 / 144h |
| **qcpu_preempt** | active Barbora<br>CPU alloc. | 190<br>max 4 per job | 36 | 0 | no | 12 / 12h |
| **qgpu** | > 0 | 8 | 24 | 2 | yes | 24 / 48h |
| **qgpu_biz** | > 0 | 8 | 24 | 3 | yes | 24 / 48h |
| **qgpu_exp** | < 150% of allocation | 4<br>max 1 per job | 24 | 4 | no | 1 / 1h |
| **qgpu_free** | < 150% of allocation | 5<br>max 2 per job | 24 | 1 | no | 12 / 18h |
| **qgpu_preempt** | active Barbora<br>GPU alloc. | 4<br>max 2 per job | 24 | 0 | no | 12 / 12h |
| **qdgx** | > 0 | cn202 | 96 | 2 | yes | 4 / 48h |
| **qviz** | > 0 | 2 with NVIDIA Quadro P6000 | 4 | 2 | no | 1 / 8h |
| **qfat** | > 0 | 1 fat node | 128 | 2 | yes | 24 / 48h |
[1]: access/project-access.md
# Capacity Computing
## Introduction
In many cases, it is useful to submit a huge number of computational jobs into the Slurm queue system.
A huge number of (small) jobs is one of the most effective ways to execute embarrassingly parallel calculations,
achieving the best runtime, throughput, and computer utilization. This is called **Capacity Computing**.
However, executing a huge number of jobs via the Slurm queue may strain the system. This strain may
result in slow response to commands, inefficient scheduling, and overall degradation of performance
and user experience for all users.
We **recommend** using [**Job arrays**][1] or [**HyperQueue**][2] to execute many jobs.
There are two primary scenarios:
1. Number of jobs < 1500, **and** the jobs are able to utilize one or more **full** nodes:
Use [**Job arrays**][1].
A job array allows you to submit and control up to 1500 jobs (tasks) in one packet. Several job arrays may be submitted.
2. Number of jobs >> 1500, **or** the jobs only utilize a **few cores/accelerators** each:
Use [**HyperQueue**][2].
HyperQueue can help efficiently load balance a very large number of jobs (tasks) amongst available computing nodes.
HyperQueue may be also used if you have dependencies among the jobs.
[1]: job-arrays.md
[2]: hyperqueue.md
# Energy Saving
IT4Innovations has implemented a set of energy saving measures on the supercomputing clusters. The measures are selected to minimize the performance impact and achieve significant cost, energy, and carbon footprint reduction effect.
The energy saving measures are effective as of **1 February 2023**.
## Karolina
### Measures
The following CPU core and GPU streaming multiprocessor frequency limits are implemented on the Karolina supercomputer:
|Measure | Value |
|---------------------------------------------------------|---------|
|Compute nodes **cn[001-720]**<br> CPU core frequency limit | 2.100 GHz |
|Accelerated compute nodes **acn[001-72]**<br> CPU core frequency limit | 2.600 GHz |
|Accelerated compute nodes **acn[001-72]**<br> GPU SMs frequency limit | 1.290 GHz |
### Performance Impact
The performance impact depends on the [arithmetic intensity][1] of the executed workload.
The [arithmetic intensity][2] is a measure of floating-point operations (FLOPs) performed by a given code (or code section) relative to the amount of memory accesses (bytes) required to support those operations. It is defined as a FLOP-per-byte ratio (F/B). Arithmetic intensity is a characteristic of the computational algorithm.
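For example, a double-precision `daxpy` kernel (y ← αx + y) performs 2 FLOPs per vector element while moving 24 bytes (loading x and y, storing y), giving an arithmetic intensity of 2/24 ≈ 0.08 F/B; such a kernel is strongly memory bound.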
In general, the processor frequency [capping][3] has low performance impact for memory bound computations (arithmetic intensity below the [ridge point][2]). For processor bound computations (arithmetic intensity above the [ridge point][2]), the impact is proportional to the frequency reduction.
On Karolina, a runtime increase of **up to 16%** is [observed][4] for arithmetically intensive CPU workloads and of **up to 10%** for intensive GPU workloads. **No slowdown** is [observed][4] for memory bound workloads.
### Energy Efficiency
The energy efficiency in floating point operations per energy unit is increased by **up to 30%** for both the CPU and GPU workloads. The efficiency depends on the arithmetic intensity; however, energy savings are always achieved.
## Barbora
None implemented yet.
## NVIDIA DGX-2
None implemented yet.
## Complementary Systems
None implemented yet.
[1]: https://en.wikipedia.org/wiki/Roofline_model
[2]: https://dl.acm.org/doi/10.1145/1498765.1498785
[3]: https://slovnik.seznam.cz/preklad/anglicky_cesky/capping
[4]: Energy_saving_Karolina.pdf
# Satisfaction and Feedback
IT4Innovations National Supercomputing Center is interested in [user satisfaction and feedback][1]. It allows us to prioritize and focus on the most pressing issues. With the help of user feedback, we strive to provide a smooth and productive environment, where computational tasks may be solved without distraction or annoyance.
## Feedback Form
Please provide us with feedback regarding your satisfaction with our services using [the online form][1]. Set the values and comment on the individual aspects of our services.
We prefer that you enter [**new inputs 3 times a year**][1].
You may view your [feedback history][2] any time.
You are welcome to modify your most recent input.
The form inquires about:
- Resource allocation and access
- Computing environment
- Added value services
You may set the satisfaction score on a **scale of 1 to 5** as well as leave **text comments**.
The score is interpreted as follows:
|Value | Interpretation |
|-----|---|
| 1-2 | Values below 3 indicate a level of dissatisfaction; improvements or other actions are desirable. The values are interpreted as a measure of how deep the dissatisfaction is.|
| 3 | Value 3 indicates a degree of satisfaction. Users are reasonably happy with the environment and services and do not require changes, although there still might be room for improvements. |
| 4-5 | Values above 3 indicate a level of exceptional appreciation and satisfaction; the values are interpreted as a measure of how rewarding the experience is. |
## Feedback Automation
In order to obtain ample feedback data without forcing our users
to spend efforts in filling out the feedback form, we implement automatic data collection.
The automation works as follows:
If the last feedback entry is older than 4 months, a new feedback entry is created as a copy of the last entry.
The new entry is modified in this way:
- score values greater than 3 are decremented by one;
- score values lower than 3 are incremented by one;
- score values equal to 3 are preserved;
- text fields are set blank.
Once a new feedback is created, users are notified by email and invited to [modify the feedback entry][2] as they see fit.
**Rationale:** Feedback automation takes away some effort from a group of moderately satisfied users,
while prompting the users to express satisfaction/dissatisfaction.
We assume that moderately satisfied users (satisfaction value 3) do not require changes to the environment
and tend to remain moderately satisfied in time.
Further, we assume that satisfied users (values 4-5) develop in time towards moderately satisfied (value 3)
by getting accustomed to the provided standards.
The dissatisfied users (values 1-2) also develop towards moderately satisfied due to
gradual improvements implemented by the IT4I.
## Request Tracker Feedback
Please use the [user satisfaction and feedback][1] form to provide your overall view.
For acute, pressing issues and immediate contact, reach out to support via the [Request tracker portal][3] or the [support\[at\]it4i.cz][4] email.
Express your satisfaction with the solution of an individual [Request tracker][3] ticket by selecting **Feedback** menu on the ticket form.
## Evaluation
The user feedback is evaluated 4 times a year, in the end of March, June, September, and December.
We consider the text comments, as well as evaluate the score average, distribution and trends.
This is done in summary as well as per individual category.
[1]: https://scs.it4i.cz/feedbacks/new
[2]: https://scs.it4i.cz/feedbacks/
[3]: https://support.it4i.cz/rt
[4]: mailto:support@it4i.cz
# HyperQueue
HyperQueue lets you build a computation plan consisting of a large number of tasks and then execute it transparently over a system like Slurm/PBS.
It dynamically groups tasks into Slurm jobs and distributes them to fully utilize allocated nodes.
You thus do not have to manually aggregate your tasks into Slurm jobs.
Find more about HyperQueue in its [documentation][a].
![](../img/hq-idea-s.png)
## Features
* **Transparent task execution on top of a Slurm/PBS cluster**
* Automatic task distribution amongst jobs, nodes, and cores
* Automatic submission of PBS/Slurm jobs
* **Dynamic load balancing across jobs**
* Work-stealing scheduler
* NUMA-aware, core planning, task priorities, task arrays
* Nodes and tasks may be added/removed on the fly
* **Scalable**
* Low overhead per task (~100μs)
* Handles hundreds of nodes and millions of tasks
* Output streaming avoids creating many files on network filesystems
* **Easy deployment**
* Single binary, no installation, depends only on *libc*
* No elevated privileges required
## Installation
* On Barbora and Karolina, you can simply load the HyperQueue module:
```console
$ ml HyperQueue
```
* If you want to install/compile HyperQueue manually, follow the steps on the [official webpage][b].
## Usage
### Starting the Server
To use HyperQueue, you first have to start the HyperQueue server. It is a long-lived process that
is supposed to be running on a login node. You can start it with the following command:
```console
$ hq server start
```
### Submitting Computation
Once the HyperQueue server is running, you can submit jobs into it. Here are a few examples of job submissions.
You can find more information in the [documentation][1].
* Submit a simple job (command `echo 'Hello world'` in this case)
```console
$ hq submit echo 'Hello world'
```
* Submit a job with 10000 tasks
```console
$ hq submit --array 1-10000 my-script.sh
```
Once you start some jobs, you can observe their status using the following commands:
```console
# Display status of a single job
$ hq job <job-id>
# Display status of all jobs
$ hq jobs
```
!!! important
Before the jobs can start executing, you have to provide HyperQueue with some computational resources.
### Providing Computational Resources
Before HyperQueue can execute your jobs, it needs to have access to some computational resources.
You can provide these by starting HyperQueue *workers* which connect to the server and execute your jobs.
The workers should run on computing nodes, therefore they should be started inside Slurm jobs.
There are two ways of providing computational resources.
* **Allocate Slurm jobs automatically**
HyperQueue can automatically submit Slurm jobs with workers on your behalf. This system is called
[automatic allocation][c]. After the server is started, you can add a new automatic allocation
queue using the `hq alloc add` command:
```console
$ hq alloc add slurm -- -A<PROJECT-ID> -p qcpu_exp
```
After you run this command, HQ will automatically start submitting Slurm jobs on your behalf
once some HQ jobs are submitted; an example of inspecting the created allocation queues follows this list.
* **Manually start Slurm jobs with HQ workers**
With the following command, you can submit a Slurm job that will start a single HQ worker which
will connect to a running HQ server.
```console
$ salloc <salloc-params> -- /bin/bash -l -c "$(which hq) worker start"
```
!!! tip
For debugging purposes, you can also start the worker e.g. on a login node, simply by running
`$ hq worker start`. Do not use such worker for any long-running computations though!
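Once an automatic allocation queue has been added, you can list the configured queues and check their state:
```console
$ hq alloc list
```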
## Architecture
Here you can see the architecture of HyperQueue.
The user submits jobs into the server which schedules them onto a set of workers running on compute nodes.
![](../img/hq-architecture.png)
[1]: https://it4innovations.github.io/hyperqueue/stable/jobs/jobs/
[a]: https://it4innovations.github.io/hyperqueue/stable/
[b]: https://it4innovations.github.io/hyperqueue/stable/installation/
[c]: https://it4innovations.github.io/hyperqueue/stable/deployment/allocation/
# Job Arrays
A job array is a compact representation of many jobs called tasks. Tasks share the same job script, and have the same values for all attributes and resources, with the following exceptions:
* each task has a unique index, `$SLURM_ARRAY_TASK_ID`
* job identifiers of tasks differ only by their indices
* the state of tasks can differ
All tasks within a job array have the same scheduling priority and are scheduled as independent jobs. An entire job array is submitted through a single `sbatch` command and may be managed by the `squeue`, `scancel`, and `scontrol` commands as a single job.
## Shared Jobscript
All tasks in a job array use the very same single jobscript. Each task runs its own instance of the jobscript. The instances execute different work controlled by the `$SLURM_ARRAY_TASK_ID` variable.
Example:
Assume we have 900 input files with names beginning with "file" (e.g. file001, ..., file900). We would like to use each of these input files with the myprog.x program executable,
each as a separate, single-node job running 128 threads.
First, we create a `tasklist` file, listing all tasks - all input files in our example:
```console
$ find . -name 'file*' > tasklist
```
Then we create a jobscript:
```bash
#!/bin/bash
#SBATCH -p qcpu
#SBATCH -A PROJECT-ID
#SBATCH --nodes 1 --ntasks-per-node 1 --cpus-per-task 128
#SBATCH -t 02:00:00
#SBATCH -o /dev/null
# change to scratch directory
SCRDIR=/scratch/project/$SLURM_JOB_ACCOUNT/$SLURM_JOB_USER/$SLURM_JOB_ID
mkdir -p $SCRDIR
cd $SCRDIR || exit
# get individual tasks from tasklist with index from SLURM JOB ARRAY
TASK=$(sed -n "${SLURM_ARRAY_TASK_ID}p" $SLURM_SUBMIT_DIR/tasklist)
# copy input file and executable to scratch
cp $SLURM_SUBMIT_DIR/$TASK input
cp $SLURM_SUBMIT_DIR/myprog.x .
# execute the calculation
./myprog.x < input > output
# copy output file to submit directory
cp output $SLURM_SUBMIT_DIR/$TASK.out
```
In this example, the submit directory contains the 900 input files, the myprog.x executable,
and the jobscript file. As an input for each run, we take the filename of the input file from the created
tasklist file. We copy the input file to a scratch directory `/scratch/project/$SLURM_JOB_ACCOUNT/$SLURM_JOB_USER/$SLURM_JOB_ID`,
execute the myprog.x and copy the output file back to the submit directory, under the `$TASK.out` name. The myprog.x executable runs on one node only and must use threads to run in parallel.
Be aware that if myprog.x **is not multithreaded or multi-process (MPI)**, then all the **jobs run as single-thread programs, wasting node resources**.
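If your program is parallelized with OpenMP, a common pattern (assuming `myprog.x` honors the standard OpenMP environment variables) is to derive the thread count from the Slurm allocation before the run:
```bash
# use all cores allocated to the task for OpenMP threads
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./myprog.x < input > output
```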
## Submitting Job Array
To submit the job array, use the `sbatch --array` command. The 900 jobs of the [example above][2] may be submitted like this:
```console
$ sbatch -J JOBNAME --array 1-900 ./jobscript
```
In this example, we submit a job array of 900 tasks. Each task will run on one full node and is assumed to take less than 2 hours (note the #SBATCH directives at the beginning of the jobscript file; do not forget to set your valid PROJECT-ID and desired partition).
## Managing Job Array
Check the status of the job array using the `squeue --me` command, or alternatively `squeue --me --array`.
```console
$ squeue --me --long
JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
2499924_[1-900] qcpu myarray user PENDING 0:00 02:00:00 1 (Resources)
```
Check the status of the tasks using the `squeue` command.
```console
$ squeue -j 2499924 --long
JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
2499924_1 qcpu myarray user PENDING 0:00 02:00:00 1 (Resources)
. . . . . . . . . . .
. . . . . . . . . . .
2499924_900 qcpu myarray user PENDING 0:00 02:00:00 1 (Resources)
```
Delete the entire job array. Running tasks will be killed, queued tasks will be deleted.
```console
$ scancel 2499924
```
For more information on job arrays, see the [SLURM guide][1].
[1]: https://slurm.schedmd.com/job_array.html
[2]: #shared-jobscript
# Job Scheduling
## Job Priority
The scheduler gives each job a priority and then uses this job priority to select which job(s) to run.
Job priority is determined by these job properties (in order of importance):
1. queue priority
1. fair-share priority
1. job age/eligible time
### Queue Priority
Queue priority is the priority of the queue in which the job is waiting prior to execution.
Queue priority has the biggest impact on job priority. The priority of jobs in higher priority queues is always greater than the priority of jobs in lower priority queues. Other properties of jobs used for determining the job priority (fair-share priority, eligible time) cannot compete with queue priority.
Queue priorities can be seen [here][a].
### Fair-Share Priority
Fair-share priority is calculated based on recent usage of resources. Fair-share priority is calculated per project, i.e. all members of a project share the same fair-share priority. Projects with higher recent usage have a lower fair-share priority than projects with lower or no recent usage.
Fair-share priority is used for ranking jobs with equal queue priority.
Usage decays, halving at intervals of 7 days.
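For example, with the 7-day half-life, usage recorded $t$ days ago contributes to the fair-share calculation with a weight of roughly $0.5^{t/7}$: usage from a week ago counts half, usage from two weeks ago one quarter, and so on.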
### Job Age/Eligible Time
The job age factor represents the length of time a job has been sitting in the queue and eligible to run.
Job age has the least impact on priority.
### Formula
Job priority is calculated as:
---8<--- "job_sort_formula.md"
### Job Backfilling
The scheduler uses job backfilling.
Backfilling means fitting smaller jobs around the higher-priority jobs that the scheduler is going to run next, in such a way that the higher-priority jobs are not delayed. Backfilling allows us to keep resources from becoming idle when the top job (the job with the highest priority) cannot run.
The scheduler makes a list of jobs to run in order of priority. The scheduler looks for smaller jobs that can fit into the usage gaps around the highest-priority jobs in the list. The scheduler looks in the prioritized list of jobs and chooses the highest-priority smaller jobs that fit. Filler jobs are run only if they will not delay the start time of top jobs.
This means that jobs with lower priority can be run before jobs with higher priority.
!!! note
It is **very beneficial to specify the timelimit** when submitting jobs.
Specifying a more accurate timelimit enables better scheduling, shorter wait times, and better resource usage. Jobs with a suitable (short) timelimit can be backfilled and overtake job(s) with a higher priority.
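For example, request only as much walltime as the job realistically needs:
```console
$ sbatch -A PROJECT-ID -p qcpu -t 02:00:00 ./jobscript
```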
---8<--- "mathjax.md"
## Technical Details
Priorities are set using Slurm's [Multifactor Priority Plugin][1]. Current settings are as follows:
```
$ grep ^Priority /etc/slurm/slurm.conf
PriorityFlags=DEPTH_OBLIVIOUS
PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0
PriorityMaxAge=14-0
PriorityWeightAge=100000
PriorityWeightFairshare=10000000
PriorityWeightPartition=1000000000
```
## Inspecting Job Priority
One can inspect job priority using the `sprio` command. Job priority is in the PRIORITY field, and it comprises the PARTITION, FAIRSHARE, and AGE priorities.
```
$ sprio -l -j 894782
JOBID PARTITION USER ACCOUNT PRIORITY SITE AGE ASSOC FAIRSHARE JOBSIZE PARTITION QOSNAME QOS NICE TRES
894782 qgpu user1 service 300026688 0 17 0 26671 0 300000000 normal 0 0
```
[1]: https://slurm.schedmd.com/priority_multifactor.html
[a]: https://extranet.it4i.cz/rsweb/karolina/queues