...

Commits (142)
 ... ... @@ -2,6 +2,7 @@ stages: - test - build - deploy - after_test variables: PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip" ... ... @@ -40,6 +41,15 @@ ext_links: only: - master 404s: stage: after_test image: davidhrbac/docker-mkdocscheck:latest script: - wget -V - echo https://docs.it4i.cz/devel/$CI_BUILD_REF_NAME/ - wget --spider -e robots=off -o wget.log -r -p https://docs.it4i.cz/devel/$CI_BUILD_REF_NAME/ - cat wget.log | awk '/^Found [0-9]+ broken links.$/,/FINISHED/ { rc=-1; print $0 }; END { exit rc }' mkdocs: stage: build image: davidhrbac/docker-mkdocscheck:latest ... ... @@ -59,9 +69,10 @@ mkdocs: - bash scripts/add_version.sh # get modules list from clusters - bash scripts/get_modules.sh #generate site_url - sed "s/$$site_url.*$$/\1devel\/$CI_BUILD_REF_NAME\//" mkdocs.yml | head - (if [ "${CI_BUILD_REF_NAME}" != 'hrb3' ]; then sed -i "s/$$site_url.*$$/\1devel\/$CI_BUILD_REF_NAME\//" mkdocs.yml;fi); # generate site_url - (if [ "${CI_BUILD_REF_NAME}" != 'master' ]; then sed -i "s/$$site_url.*$$/\1devel\/$CI_BUILD_REF_NAME\//" mkdocs.yml;fi); # generate ULT for code link - sed -i "s/master/$CI_BUILD_REF_NAME/g" material/partials/toc.html # regenerate modules matrix - python scripts/modules-matrix.py > docs.it4i/modules-matrix.md - python scripts/modules-json.py > docs.it4i/modules-matrix.json ... ...  ... ... @@ -9,13 +9,13 @@ However, executing a huge number of jobs via the PBS queue may strain the system !!! note Please follow one of the procedures below, in case you wish to schedule more than 100 jobs at a time. 
* Use [Job arrays](/anselm/capacity-computing/#job-arrays) when running a huge number of [multithread](anselm/capacity-computing/#shared-jobscript-on-one-node) (bound to one node only) or multinode (multithread across several nodes) jobs * Use [GNU parallel](/anselm/capacity-computing/#gnu-parallel) when running single core jobs * Combine [GNU parallel with Job arrays](/anselm/capacity-computing/#job-arrays-and-gnu-parallel) when running huge number of single core jobs * Use [Job arrays][1] when running a huge number of [multithread][2] (bound to one node only) or multinode (multithread across several nodes) jobs * Use [GNU parallel][3] when running single core jobs * Combine [GNU parallel with Job arrays][4] when running huge number of single core jobs ## Policy 1. A user is allowed to submit at most 100 jobs. Each job may be [a job array](/anselm/capacity-computing/#job-arrays). 1. A user is allowed to submit at most 100 jobs. Each job may be [a job array][1]. 1. The array size is at most 1000 subjobs. ## Job Arrays ... ... @@ -76,7 +76,7 @@ If running a huge number of parallel multicore (in means of multinode multithrea ### Submit the Job Array To submit the job array, use the qsub -J command. The 900 jobs of the [example above](/anselm/capacity-computing/#array_example) may be submitted like this: To submit the job array, use the qsub -J command. The 900 jobs of the [example above][5] may be submitted like this: console$ qsub -N JOBNAME -J 1-900 jobscript ... ... @@ -145,7 +145,7 @@ Display status information for all user's subjobs. $qstat -u$USER -tJ  Read more on job arrays in the [PBSPro Users guide](pbspro/). Read more on job arrays in the [PBSPro Users guide][6]. ## GNU Parallel ... ... @@ -207,7 +207,7 @@ In this example, tasks from the tasklist are executed via the GNU parallel. The ### Submit the Job To submit the job, use the qsub command. 
The 101 task job of the [example above](/anselm/capacity-computing/#gp_example) may be submitted as follows: To submit the job, use the qsub command. The 101 task job of the [example above][7] may be submitted as follows: console $qsub -N JOBNAME jobscript ... ... @@ -292,7 +292,7 @@ When deciding this values, keep in mind the following guiding rules: ### Submit the Job Array (-J) To submit the job array, use the qsub -J command. The 992 task job of the [example above](/anselm/capacity-computing/#combined_example) may be submitted like this: To submit the job array, use the qsub -J command. The 992 task job of the [example above][8] may be submitted like this: console$ qsub -N JOBNAME -J 1-992:32 jobscript ... ... @@ -306,7 +306,7 @@ In this example, we submit a job array of 31 subjobs. Note the -J 1-992:**32**, ## Examples Download the examples in [capacity.zip](capacity.zip), illustrating the above listed ways to run a huge number of jobs. We recommend trying out the examples before using this for running production jobs. Download the examples in [capacity.zip][9], illustrating the above listed ways to run a huge number of jobs. We recommend trying out the examples before using this for running production jobs. Unzip the archive in an empty directory on Anselm and follow the instructions in the README file ... ... @@ -314,3 +314,13 @@ Unzip the archive in an empty directory on Anselm and follow the instructions in $unzip capacity.zip$ cat README  [1]: #job-arrays [2]: #shared-jobscript-on-one-node [3]: #gnu-parallel [4]: #job-arrays-and-gnu-parallel [5]: #array_example [6]: ../pbspro.md [7]: #gp_example [8]: #combined_example [9]: capacity.zip
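The index arithmetic behind the combined `qsub -N JOBNAME -J 1-992:32 jobscript` submission can be sketched in a few lines of shell. This is a hedged illustration only: the `tasklist` below is a generated stand-in (one task per line, as in the combined example), and `IDX` stands in for the `PBS_ARRAY_INDEX` variable PBS Pro sets inside each subjob.

```shell
# -J 1-992:32 produces subjob indices 1, 33, 65, ..., 961 -- 31 subjobs,
# each responsible for a chunk of 32 tasks:
seq 1 32 992 | wc -l            # prints the number of subjobs: 31

# Stand-in tasklist with 992 tasks, one per line:
seq 1 992 > tasklist

# Each subjob selects its own 32-task slice of the tasklist;
# IDX stands in for PBS_ARRAY_INDEX:
IDX=33
sed -n "${IDX},$((IDX + 31))p" tasklist | wc -l   # 32 tasks for this subjob
```

The step (`:32`) and the per-subjob slice width must agree, otherwise tasks are skipped or run twice.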
 ... ... @@ -2,7 +2,7 @@ ## Node Configuration Anselm is cluster of x86-64 Intel based nodes built with Bull Extreme Computing bullx technology. The cluster contains four types of compute nodes. Anselm is a cluster of x86-64 Intel based nodes built with Bull Extreme Computing bullx technology. The cluster contains four types of compute nodes. ### Compute Nodes Without Accelerators ... ... @@ -52,7 +52,7 @@ Anselm is cluster of x86-64 Intel based nodes built with Bull Extreme Computing ### Compute Node Summary | Node type | Count | Range | Memory | Cores | [Access](/general/resources-allocation-policy/) | | Node type | Count | Range | Memory | Cores | Queues | | ---------------------------- | ----- | ----------- | ------ | ----------- | -------------------------------------- | | Nodes without an accelerator | 180 | cn[1-180] | 64GB | 16 @ 2.4GHz | qexp, qprod, qlong, qfree, qprace, qatlas | | Nodes with a GPU accelerator | 23 | cn[181-203] | 96GB | 16 @ 2.3GHz | qnvidia, qexp | ... ...
 ... ... @@ -16,7 +16,7 @@ Queue priority is the priority of the queue in which the job is waiting prior to Queue priority has the biggest impact on job execution priority. The execution priority of jobs in higher priority queues is always greater than the execution priority of jobs in lower priority queues. Other properties of jobs used for determining the job execution priority (fair-share priority, eligible time) cannot compete with queue priority. Queue priorities can be seen at [https://extranet.it4i.cz/anselm/queues](https://extranet.it4i.cz/anselm/queues) Queue priorities can be seen [here][a]. ### Fair-Share Priority ... ... @@ -36,7 +36,7 @@ Usage counts allocated core-hours (ncpus x walltime). Usage decays, halving at Jobs queued in the queue qexp are not used to calculate the project's usage. !!! note Calculated usage and fair-share priority can be seen at [https://extranet.it4i.cz/anselm/projects](https://extranet.it4i.cz/anselm/projects). Calculated usage and fair-share priority can be seen [here][b]. Calculated fair-share priority can be also be seen in the Resource_List.fairshare attribute of a job. ... ... @@ -70,3 +70,6 @@ This means that jobs with lower execution priority can be run before jobs with h Specifying more accurate walltime enables better scheduling, better execution times, and better resource usage. Jobs with suitable (small) walltime can be backfilled - and overtake job(s) with a higher priority. ---8<--- "mathjax.md" [a]: https://extranet.it4i.cz/rsweb/anselm/queues [b]: https://extranet.it4i.cz/rsweb/anselm/projects
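The usage decay described above (recorded core-hours halving over a fixed period) follows the form usage(t) = usage(0) * 2^(-t/T). A minimal sketch in `awk` — note that the values `u0=1000`, `t=336`, and the half-life `T=168` are placeholder numbers chosen only for the demonstration; use the half-life stated in the text above, not this assumed value:

```shell
# Decayed usage after t hours, given half-life T hours.
# All three numbers are illustrative placeholders.
awk -v u0=1000 -v t=336 -v T=168 'BEGIN { printf "%.1f\n", u0 * 2^(-t/T) }'
```

With two half-lives elapsed (t = 2T), the recorded usage drops to one quarter of its original value, which is why recent jobs dominate a project's fair-share priority.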
 ... ... @@ -51,7 +51,7 @@ $qsub -A OPEN-0-0 -q qfree -l select=10:ncpus=16 ./myjob In this example, we allocate 10 nodes, 16 cores per node, for 12 hours. We allocate these resources via the qfree queue. It is not required that the project OPEN-0-0 has any available resources left. Consumed resources are still accounted for. The jobscript myjob will be executed on the first node in the allocation. All qsub options may be [saved directly into the jobscript](#example-jobscript-for-mpi-calculation-with-preloaded-inputs). In such cases, it is not necessary to specify any options for qsub. All qsub options may be [saved directly into the jobscript][1]. In such cases, it is not necessary to specify any options for qsub. console$ qsub ./myjob ... ... @@ -92,9 +92,9 @@ In this example, we allocate 4 nodes, 16 cores per node, selecting only the node ### Placement by IB Switch Groups of computational nodes are connected to chassis integrated Infiniband switches. These switches form the leaf switch layer of the [Infiniband network](/anselm/network/) fat tree topology. Nodes sharing the leaf switch can communicate most efficiently. Sharing the same switch prevents hops in the network and facilitates unbiased, highly efficient network communication. Groups of computational nodes are connected to chassis integrated Infiniband switches. These switches form the leaf switch layer of the [Infiniband network][2] fat tree topology. Nodes sharing the leaf switch can communicate most efficiently. Sharing the same switch prevents hops in the network and facilitates unbiased, highly efficient network communication. Nodes sharing the same switch may be selected via the PBS resource attribute ibswitch. Values of this attribute are iswXX, where XX is the switch number. The node-switch mapping can be seen in the [Hardware Overview](/anselm/hardware-overview/) section. Nodes sharing the same switch may be selected via the PBS resource attribute ibswitch. 
Values of this attribute are iswXX, where XX is the switch number. The node-switch mapping can be seen in the [Hardware Overview][3] section. We recommend allocating compute nodes to a single switch when best possible computational network performance is required to run the job efficiently: ... ... @@ -339,7 +339,7 @@ exit In this example, a directory in /home holds the input file input and executable mympiprog.x . We create the directory myjob on the /scratch filesystem, copy input and executable files from the /home directory where the qsub was invoked ($PBS_O_WORKDIR) to /scratch, execute the MPI program mympiprog.x and copy the output file back to the /home directory. mympiprog.x is executed as one process per node, on all allocated nodes. !!! note Consider preloading inputs and executables onto [shared scratch](storage/) memory before the calculation starts. Consider preloading inputs and executables onto [shared scratch][4] memory before the calculation starts. In some cases, it may be impractical to copy the inputs to the scratch memory and the outputs to the home directory. This is especially true when very large input and output files are expected, or when the files should be reused by a subsequent calculation. In such cases, it is the users' responsibility to preload the input files on shared /scratch memory before the job submission, and retrieve the outputs manually after all calculations are finished. ... ... @@ -373,15 +373,14 @@ exit In this example, input and executable files are assumed to be preloaded manually in the /scratch/$USER/myjob directory. Note the **mpiprocs** and **ompthreads** qsub options controlling the behavior of the MPI execution. mympiprog.x is executed as one process per node, on all 100 allocated nodes. If mympiprog.x implements OpenMP threads, it will run 16 threads per node. More information can be found in the [Running OpenMPI](/software/mpi/Running_OpenMPI/) and [Running MPICH2](software/mpi/running-mpich2/) sections. 
More information can be found in the [Running OpenMPI][5] and [Running MPICH2][6] sections. ### Example Jobscript for Single Node Calculation !!! note The local scratch directory is often useful for single node jobs. Local scratch memory will be deleted immediately after the job ends. Example jobscript for single node calculation, using [local scratch](/anselm/storage/) memory on the node: Example jobscript for single node calculation, using [local scratch][4] memory on the node: bash #!/bin/bash ... ... @@ -407,4 +406,12 @@ In this example, a directory in /home holds the input file input and executable ### Other Jobscript Examples Further jobscript examples may be found in the software section and the [Capacity computing](/anselm/capacity-computing/) section. Further jobscript examples may be found in the software section and the [Capacity computing][7] section. [1]: #example-jobscript-for-mpi-calculation-with-preloaded-inputs [2]: network.md [3]: hardware-overview.md [4]: storage.md [5]: ../software/mpi/running_openmpi.md [6]: ../software/mpi/running-mpich2.md [7]: capacity-computing.md
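The statement that qsub options may be saved directly into the jobscript can be sketched with standard PBS Pro `#PBS` directives. This is a hypothetical jobscript assembled from values used in the examples above (project OPEN-0-0, queue qprod, `mympiprog.x`); the walltime value is an illustrative assumption:

```shell
# Hypothetical jobscript "myjob" carrying its qsub options as #PBS directives,
# so it can be submitted with a bare `qsub ./myjob`:
cat > myjob <<'EOF'
#!/bin/bash
#PBS -A OPEN-0-0
#PBS -q qprod
#PBS -l select=4:ncpus=16
#PBS -l walltime=03:00:00
cd "$PBS_O_WORKDIR" || exit 1
./mympiprog.x
EOF
grep -c '^#PBS' myjob   # 4 directives found
```

Options given on the qsub command line override the corresponding `#PBS` directives in the script.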
 # Network All of the compute and login nodes of Anselm are interconnected through an [InfiniBand](http://en.wikipedia.org/wiki/InfiniBand) QDR network and a Gigabit [Ethernet](http://en.wikipedia.org/wiki/Ethernet) network. Both networks may be used to transfer user data. All of the compute and login nodes of Anselm are interconnected through an [InfiniBand][a] QDR network and a Gigabit [Ethernet][b] network. Both networks may be used to transfer user data. ## InfiniBand Network All of the compute and login nodes of Anselm are interconnected through a high-bandwidth, low-latency [InfiniBand](http://en.wikipedia.org/wiki/InfiniBand) QDR network (IB 4 x QDR, 40 Gbps). The network topology is a fully non-blocking fat-tree. All of the compute and login nodes of Anselm are interconnected through a high-bandwidth, low-latency [InfiniBand][a] QDR network (IB 4 x QDR, 40 Gbps). The network topology is a fully non-blocking fat-tree. The compute nodes may be accessed via the InfiniBand network using ib0 network interface, in address range 10.2.1.1-209. The MPI may be used to establish native InfiniBand connection among the nodes. ... ... @@ -19,6 +19,8 @@ The compute nodes may be accessed via the regular Gigabit Ethernet network inter ## Example In this example, we access the node cn110 through the InfiniBand network via the ib0 interface, then from cn110 to cn108 through the Ethernet network. console $qsub -q qexp -l select=4:ncpus=16 -N Name0 ./myjob$ qstat -n -u username ... ... @@ -32,4 +34,5 @@ $ssh 10.2.1.110$ ssh 10.1.1.108  In this example, we access the node cn110 through the InfiniBand network via the ib0 interface, then from cn110 to cn108 through the Ethernet network. [a]: http://en.wikipedia.org/wiki/InfiniBand [b]: http://en.wikipedia.org/wiki/Ethernet
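The address scheme used in the example above can be sketched as a simple mapping: compute node cnXXX is reachable as 10.2.1.XXX over the InfiniBand ib0 interface and, by analogy with the Ethernet hop in the example, as 10.1.1.XXX over Gigabit Ethernet.

```shell
# Node-number to address mapping, assuming the cnXXX -> 10.2.1.XXX (ib0)
# and 10.1.1.XXX (eth0) scheme shown in the example above:
node=110
echo "cn${node}: ib0=10.2.1.${node} eth=10.1.1.${node}"
```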
 ... ... @@ -10,7 +10,7 @@ The Anselm cluster is accessed by SSH protocol via login nodes login1 and login2 | login1.anselm.it4i.cz | 22 | ssh | login1 | | login2.anselm.it4i.cz | 22 | ssh | login2 | Authentication is by [private key](../../general/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys/) Authentication is available by [private key][1] only. !!! note Please verify SSH fingerprints during the first logon. They are identical on all login nodes: ... ... @@ -39,7 +39,7 @@ If you see a warning message "UNPROTECTED PRIVATE KEY FILE!", use this command t $chmod 600 /path/to/id_rsa  On **Windows**, use [PuTTY ssh client](../general/accessing-the-clusters/shell-access-and-data-transfer/putty.md). On **Windows**, use [PuTTY ssh client][2]. After logging in, you will see the command prompt: ... ... @@ -61,11 +61,11 @@ Last login: Tue Jul 9 15:57:38 2013 from your-host.example.com Example to the cluster login: !!! note The environment is **not** shared between login nodes, except for [shared filesystems](storage/#shared-filesystems). The environment is **not** shared between login nodes, except for [shared filesystems][3]. ## Data Transfer Data in and out of the system may be transferred by the [scp](http://en.wikipedia.org/wiki/Secure_copy) and sftp protocols. (Not available yet). In the case that large volumes of data are transferred, use the dedicated data mover node dm1.anselm.it4i.cz for increased performance. Data in and out of the system may be transferred by the [scp][a] and sftp protocols. (Not available yet). In the case that large volumes of data are transferred, use the dedicated data mover node dm1.anselm.it4i.cz for increased performance. | Address | Port | Protocol | | --------------------- | ---- | --------- | ... ... 
@@ -73,7 +73,7 @@ Data in and out of the system may be transferred by the [scp](http://en.wikipedi | login1.anselm.it4i.cz | 22 | scp | | login2.anselm.it4i.cz | 22 | scp | Authentication is by [private key](../general/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md) Authentication is by [private key][1] only. !!! note Data transfer rates of up to **160MB/s** can be achieved with scp or sftp. ... ... @@ -101,7 +101,7 @@ or$ sftp -o IdentityFile=/path/to/id_rsa username@anselm.it4i.cz  A very convenient way to transfer files in and out of Anselm is via the fuse filesystem [sshfs](http://linux.die.net/man/1/sshfs) A very convenient way to transfer files in and out of Anselm is via the fuse filesystem [sshfs][b]. console $sshfs -o IdentityFile=/path/to/id_rsa username@anselm.it4i.cz:. mountpoint ... ... @@ -117,9 +117,9 @@$ man scp $man sshfs  On Windows, use the [WinSCP client](http://winscp.net/eng/download.php) to transfer the data. The [win-sshfs client](http://code.google.com/p/win-sshfs/) provides a way to mount the Anselm filesystems directly as an external disc. On Windows, use the [WinSCP client][c] to transfer the data. The [win-sshfs client][d] provides a way to mount the Anselm filesystems directly as an external disc. More information about the shared file systems is available [here](access/storage/). More information about the shared file systems is available [here][4]. ## Connection Restrictions ... ... @@ -169,15 +169,15 @@$ ssh -L 6000:localhost:1234 remote.host.com Remote port forwarding from compute nodes allows applications running on the compute nodes to access hosts outside the Anselm Cluster. First, establish the remote port forwarding from the login node, as [described above](#port-forwarding-from-login-nodes). First, establish the remote port forwarding from the login node, as [described above][5]. Second, invoke port forwarding from the compute node to the login node.
Insert the following line into your jobscript or interactive shell; Second, invoke port forwarding from the compute node to the login node. Insert the following line into your jobscript or interactive shell: console $ssh -TN -f -L 6000:localhost:6000 login1  In this example, we assume that port forwarding from login1:6000 to remote.host.com:1234 has been established beforehand. By accessing localhost:6000, an application running on a compute node will see the response of remote.host.com:1234 In this example, we assume that port forwarding from login1:6000 to remote.host.com:1234 has been established beforehand. By accessing localhost:6000, an application running on a compute node will see the response of remote.host.com:1234. ### Using Proxy Servers ... ... @@ -192,21 +192,39 @@ To establish a local proxy server on your workstation, install and run SOCKS pro$ ssh -D 1080 localhost  On Windows, install and run the free, open source [Sock Puppet](http://sockspuppet.com/) server. On Windows, install and run the free, open source [Sock Puppet][e] server. Once the proxy server is running, establish ssh port forwarding from Anselm to the proxy server, port 1080, exactly as [described above](#port-forwarding-from-login-nodes): Once the proxy server is running, establish ssh port forwarding from Anselm to the proxy server, port 1080, exactly as [described above][5]: console $ssh -R 6000:localhost:1080 anselm.it4i.cz  Now, configure the applications proxy settings to **localhost:6000**. Use port forwarding to access the [proxy server from compute nodes](#port-forwarding-from-compute-nodes) as well. Now, configure the applications proxy settings to **localhost:6000**. Use port forwarding to access the [proxy server from compute nodes][5] as well. ## Graphical User Interface * The [X Window system](/general/accessing-the-clusters/graphical-user-interface/x-window-system/) is the principal way to get GUI access to the clusters. 
* [Virtual Network Computing](/general/accessing-the-clusters/graphical-user-interface/vnc/) is a graphical [desktop sharing](http://en.wikipedia.org/wiki/Desktop_sharing) system that uses the [Remote Frame Buffer protocol](http://en.wikipedia.org/wiki/RFB_protocol) to remotely control another [computer](http://en.wikipedia.org/wiki/Computer). * The [X Window system][6] is the principal way to get GUI access to the clusters. * [Virtual Network Computing][7] is a graphical [desktop sharing][f] system that uses the [Remote Frame Buffer protocol][g] to remotely control another [computer][h]. ## VPN Access * Access IT4Innovations internal resources via [VPN](/general/accessing-the-clusters/vpn-access/). * Access IT4Innovations internal resources via [VPN][8]. [1]: ../general/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md [2]: ../general/accessing-the-clusters/shell-access-and-data-transfer/putty.md [3]: storage.md#shared-filesystems [4]: storage.md [5]: #port-forwarding-from-login-nodes [6]: ../general/accessing-the-clusters/graphical-user-interface/x-window-system.md [7]: ../general/accessing-the-clusters/graphical-user-interface/vnc.md [8]: ../general/accessing-the-clusters/vpn-access.md [a]: http://en.wikipedia.org/wiki/Secure_copy [b]: http://linux.die.net/man/1/sshfs [c]: http://winscp.net/eng/download.php [d]: http://code.google.com/p/win-sshfs/ [e]: http://sockspuppet.com/ [f]: http://en.wikipedia.org/wiki/Desktop_sharing [g]: http://en.wikipedia.org/wiki/RFB_protocol [h]: http://en.wikipedia.org/wiki/Computer  ... ... @@ -197,11 +197,11 @@$ ./test.cuda ### cuBLAS The NVIDIA CUDA Basic Linear Algebra Subroutines (cuBLAS) library is a GPU-accelerated version of the complete standard BLAS library with 152 standard BLAS routines. A basic description of the library together with basic performance comparisons with MKL can be found [here](https://developer.nvidia.com/cublas "Nvidia cuBLAS"). 
The NVIDIA CUDA Basic Linear Algebra Subroutines (cuBLAS) library is a GPU-accelerated version of the complete standard BLAS library with 152 standard BLAS routines. A basic description of the library together with basic performance comparisons with MKL can be found [here][a]. #### cuBLAS Example: SAXPY The SAXPY function multiplies the vector x by the scalar alpha, and adds it to the vector y, overwriting the latest vector with the result. A description of the cuBLAS function can be found in [NVIDIA CUDA documentation](http://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-axpy "Nvidia CUDA documentation "). Code can be pasted in the file and compiled without any modification. The SAXPY function multiplies the vector x by the scalar alpha, and adds it to the vector y, overwriting the latest vector with the result. A description of the cuBLAS function can be found in [NVIDIA CUDA documentation][b]. Code can be pasted in the file and compiled without any modification. cpp /* Includes, system */ ... ... @@ -283,8 +283,8 @@ int main(int argc, char **argv) !!! note cuBLAS has its own function for data transfers between CPU and GPU memory: - [cublasSetVector](http://docs.nvidia.com/cuda/cublas/index.html#cublassetvector) - transfers data from CPU to GPU memory - [cublasGetVector](http://docs.nvidia.com/cuda/cublas/index.html#cublasgetvector) - transfers data from GPU to CPU memory - [cublasSetVector][c] - transfers data from CPU to GPU memory - [cublasGetVector][d] - transfers data from GPU to CPU memory To compile the code using the NVCC compiler a "-lcublas" compiler flag has to be specified: ... ... 
@@ -307,3 +307,8 @@ $ml cuda$ ml intel $icc -std=c99 test_cublas.c -o test_cublas_icc -lcublas -lcudart  [a]: https://developer.nvidia.com/cublas [b]: http://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-axpy [c]: http://docs.nvidia.com/cuda/cublas/index.html#cublassetvector [d]: http://docs.nvidia.com/cuda/cublas/index.html#cublasgetvector This diff is collapsed.  ... ... @@ -30,7 +30,7 @@ fi In order to configure your shell for running particular application on clusters we use Module package interface. Application modules on clusters are built using [EasyBuild](/software/tools/easybuild/). The modules are divided into the following structure: Application modules on clusters are built using [EasyBuild][1]. The modules are divided into the following structure:  base: Default module class ... ... @@ -61,4 +61,7 @@ Application modules on clusters are built using [EasyBuild](/software/tools/easy !!! note The modules set up the application paths, library paths and environment variables for running particular application. The modules may be loaded, unloaded and switched, according to momentary needs. For details see [here](/software/modules/lmod/). The modules may be loaded, unloaded and switched, according to momentary needs. For details see [lmod][2]. [1]: software/tools/easybuild.md [2]: software/modules/lmod.md  # VNC The **Virtual Network Computing** (**VNC**) is a graphical [desktop sharing](http://en.wikipedia.org/wiki/Desktop_sharing "Desktop sharing") system that uses the [Remote Frame Buffer protocol (RFB)](http://en.wikipedia.org/wiki/RFB_protocol "RFB protocol") to remotely control another [computer](http://en.wikipedia.org/wiki/Computer "Computer"). 
It transmits the [keyboard](http://en.wikipedia.org/wiki/Computer_keyboard "Computer keyboard") and [mouse](http://en.wikipedia.org/wiki/Computer_mouse "Computer mouse") events from one computer to another, relaying the graphical [screen](http://en.wikipedia.org/wiki/Computer_screen "Computer screen") updates back in the other direction, over a [network](http://en.wikipedia.org/wiki/Computer_network "Computer network"). The **Virtual Network Computing** (**VNC**) is a graphical [desktop sharing][a] system that uses the [Remote Frame Buffer protocol (RFB)][b] to remotely control another [computer][c]. It transmits the [keyboard][d] and [mouse][e] events from one computer to another, relaying the graphical [screen][f] updates back in the other direction, over a [network][g]. VNC-based connections are usually faster (require less network bandwidth) than [X11](/general/accessing-the-clusters/graphical-user-interface/x-window-system) applications forwarded directly through ssh. VNC-based connections are usually faster (require less network bandwidth) than [X11][1] applications forwarded directly through ssh. The recommended clients are [TightVNC](http://www.tightvnc.com) or [TigerVNC](http://sourceforge.net/apps/mediawiki/tigervnc/index.php?title=Main_Page) (free, open source, available for almost any platform). The recommended clients are [TightVNC][h] or [TigerVNC][i] (free, open source, available for almost any platform). In this chapter we show how to create an underlying ssh tunnel from your client machine to one of our login nodes. Then, how to start your own vnc server on our login node and finally how to connect to your vnc server via the encrypted ssh tunnel. ... ... @@ -24,7 +24,7 @@ Verify: !!! note To access VNC a local vncserver must be started first and also a tunnel using SSH port forwarding must be established. [See below](#linuxmac-os-example-of-creating-a-tunnel) for the details on SSH tunnels.
You should start by **choosing your display number**. To choose free one, you should check currently occupied display numbers - list them using command: ... ... @@ -78,7 +78,7 @@ username :102 !!! note The VNC server runs on port 59xx, where xx is the display number. So, you get your port number simply as 5900 + display number, in our example 5900 + 61 = 5961. Another example for display number 102 is calculation of TCP port 5900 + 102 = 6002 but be aware, that TCP ports above 6000 are often used by X11. **Please, calculate your own port number and use it instead of 5961 from examples below!** To access the VNC server you have to create a tunnel between the login node using TCP port 5961 and your machine using a free TCP port (for simplicity the very same) in next step. See examples for [Linux/Mac OS](#linuxmac-os-example-of-creating-a-tunnel) and [Windows](#windows-example-of-creating-a-tunnel). To access the VNC server you have to create a tunnel between the login node using TCP port 5961 and your machine using a free TCP port (for simplicity the very same) in next step. See examples for [Linux/Mac OS][2] and [Windows][3]. !!! note The tunnel must point to the same login node where you launched the VNC server, eg. login2. If you use just cluster-name.it4i.cz, the tunnel might point to a different node due to DNS round robin. ... ... @@ -145,7 +145,7 @@ Fill the Source port and Destination fields. **Do not forget to click the Add bu ### WSL (Bash on Windows) [Windows Subsystem for Linux](http://docs.microsoft.com/en-us/windows/wsl) is another way to run Linux software in a Windows environment. [Windows Subsystem for Linux][j] is another way to run Linux software in a Windows environment. At your machine, create the tunnel: ... ... @@ -214,7 +214,7 @@ Or this way:  !!! note Do not forget to terminate also SSH tunnel, if it was used. Look on end of [this section](#linuxmac-os-example-of-creating-a-tunnel) for the details. 
Do not forget to terminate also SSH tunnel, if it was used. Look on end of [this section][2] for the details. ## GUI Applications on Compute Nodes Over VNC ... ... @@ -230,7 +230,7 @@ Allow incoming X11 graphics from the compute nodes at the login node:$ xhost +  Get an interactive session on a compute node (for more detailed info [look here](/anselm/job-submission-and-execution/)). Use the **-v DISPLAY** option to propagate the DISPLAY on the compute node. In this example, we want a complete node (16 cores in this example) from the production queue: Get an interactive session on a compute node (for more detailed info [look here][4]). Use the **-v DISPLAY** option to propagate the DISPLAY on the compute node. In this example, we want a complete node (16 cores in this example) from the production queue: console $qsub -I -v DISPLAY=$(uname -n):$(echo$DISPLAY | cut -d ':' -f 2) -A PROJECT_ID -q qprod -l select=1:ncpus=16 ... ... @@ -245,3 +245,19 @@ $xterm Example described above: ![](../../../img/gnome-compute-nodes-over-vnc.png) [a]: http://en.wikipedia.org/wiki/Desktop_sharing [b]: http://en.wikipedia.org/wiki/RFB_protocol [c]: http://en.wikipedia.org/wiki/Computer [d]: http://en.wikipedia.org/wiki/Computer_keyboard [e]: http://en.wikipedia.org/wiki/Computer_mouse [f]: http://en.wikipedia.org/wiki/Computer_screen [g]: http://en.wikipedia.org/wiki/Computer_network [h]: http://www.tightvnc.com [i]: http://sourceforge.net/apps/mediawiki/tigervnc/index.php?title=Main_Page [j]: http://docs.microsoft.com/en-us/windows/wsl [1]: x-window-system.md [2]: #linuxmac-os-example-of-creating-a-tunnel [3]: #windows-example-of-creating-a-tunnel [4]: ../../../anselm/job-submission-and-execution.md  # X Window System The X Window system is a principal way to get GUI access to the clusters. 
The **X Window System** (commonly known as **X11**, based on its current major version being 11, or shortened to simply **X**, and sometimes informally **X-Windows**) is a computer software system and network [protocol](http://en.wikipedia.org/wiki/Protocol_%28computing%29 "Protocol (computing)") that provides a basis for [graphical user interfaces](http://en.wikipedia.org/wiki/Graphical_user_interface "Graphical user interface") (GUIs) and rich input device capability for [networked computers](http://en.wikipedia.org/wiki/Computer_network "Computer network"). The X Window system is a principal way to get GUI access to the clusters. The **X Window System** (commonly known as **X11**, based on its current major version being 11, or shortened to simply **X**, and sometimes informally **X-Windows**) is a computer software system and network [protocol][a] that provides a basis for [graphical user interfaces][b] (GUIs) and rich input device capability for [networked computers][c]. !!! tip The X display forwarding must be activated and the X server running on client side ... ... @@ -60,18 +60,17 @@ In order to display graphical user interface GUI of various software tools, you ### X Server on OS X Mac OS users need to install [XQuartz server](https://www.xquartz.org). Mac OS users need to install [XQuartz server][d]. ### X Server on Windows There is a variety of X servers available for the Windows environment. The commercial Xwin32 is very stable and feature-rich. The Cygwin environment provides a fully featured open-source XWin X server. For simplicity, we recommend open-source X server by the [Xming project](http://sourceforge.net/projects/xming/). For stability and full features we recommend the [XWin](http://x.cygwin.com/) X server by Cygwin There is a variety of X servers available for the Windows environment. The commercial Xwin32 is very stable and feature-rich. The Cygwin environment provides a fully featured open-source XWin X server.
For simplicity, we recommend the open-source X server from the [Xming project][e]. For stability and full features, we recommend the [XWin][f] X server by Cygwin. | How to use Xwin | How to use Xming | |--- | --- | | [Install Cygwin](http://x.cygwin.com/) Find and execute XWin.exe to start the X server on the Windows desktop computer. [If not able to forward X11 using PuTTY to CygwinX](#if-no-able-to-forward-x11-using-putty-to-cygwinx) | Use XLaunch to configure Xming. Run Xming to start the X server on the Windows desktop computer. | | [Install Cygwin][g]. Find and execute XWin.exe to start the X server on the Windows desktop computer. [If not able to forward X11 using PuTTY to CygwinX][1] | Use XLaunch to configure Xming. Run Xming to start the X server on the Windows desktop computer. | Read more on [http://www.math.umn.edu/systems_guide/putty_xwin32.html](http://www.math.umn.edu/systems_guide/putty_xwin32.shtml) Read more [here][h]. ## Running GUI Enabled Applications ... ... @@ -116,7 +115,7 @@ The Gnome 2.28 GUI environment is available on the clusters. We recommend to use ### Gnome on Linux and OS X To run the remote Gnome session in a window on a Linux/OS X computer, you need to install Xephyr. The Ubuntu package is xserver-xephyr; on OS X it is part of [XQuartz](http://xquartz.macosforge.org/landing/). First, launch Xephyr on the local machine: xserver-xephyr; on OS X it is part of [XQuartz][i]. First, launch Xephyr on the local machine: console local$ Xephyr -ac -screen 1024x768 -br -reset -terminate :1 & ... ... @@ -143,7 +142,7 @@ However this method does not seem to work with recent Linux distributions and yo Use XLaunch to start the Xming server or run XWin.exe. Select the "One window" mode. Log in to the cluster, using [PuTTY](#putty-on-windows) or [Bash on Windows](#wsl-bash-on-windows). On the cluster, run the gnome-session command. Log in to the cluster, using [PuTTY][2] or [Bash on Windows][3]. On the cluster, run the gnome-session command. console $ gnome-session & ... ...
@@ -153,3 +152,16 @@ In this way, we run remote gnome session on the cluster, displaying it in the lo Use System-Log Out to close the gnome-session [1]: #if-no-able-to-forward-x11-using-putty-to-cygwinx [2]: #putty-on-windows [3]: #wsl-bash-on-windows [a]: http://en.wikipedia.org/wiki/Protocol_%28computing%29 [b]: http://en.wikipedia.org/wiki/Graphical_user_interface [c]: http://en.wikipedia.org/wiki/Computer_network [d]: https://www.xquartz.org [e]: http://sourceforge.net/projects/xming/ [f]: http://x.cygwin.com/ [g]: http://x.cygwin.com/ [h]: http://www.math.umn.edu/systems_guide/putty_xwin32.shtml [i]: http://xquartz.macosforge.org/landing/  ... ... @@ -2,10 +2,10 @@ ## Windows PuTTY Installer We recommend you download "**A Windows installer for everything except PuTTYtel**" with **Pageant** (SSH authentication agent) and **PuTTYgen** (PuTTY key generator), which is available [here](http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html). We recommend you download "**A Windows installer for everything except PuTTYtel**" with **Pageant** (SSH authentication agent) and **PuTTYgen** (PuTTY key generator), which is available [here][a]. !!! note After installation you can proceed directly to private key authentication using ["Putty"](#putty). After installation you can proceed directly to private key authentication using ["Putty"][1]. "Change Password for Existing Private Key" is optional. ... ... @@ -23,7 +23,7 @@ We recommend you download "**A Windows installer for everything except PuTTYt * Category - Connection - SSH - Auth: Select Attempt authentication using Pageant. Select Allow agent forwarding. Browse and select your [private key](ssh-keys/) file. Browse and select your [private key][2] file. ![](../../../img/PuTTY_keyV.png) ... ...
@@ -36,7 +36,7 @@ We recommend you download "**A Windows installer for everything except PuTTYt ![](../../../img/PuTTY_open_Salomon.png) * Enter your username if the _Host Name_ input is not in the format "username@salomon.it4i.cz". * Enter the passphrase for the selected [private key](/general/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys/) file if the Pageant **SSH authentication agent is not used.** * Enter the passphrase for the selected [private key][2] file if the Pageant **SSH authentication agent is not used.** ## Another PuTTY Settings ... ... @@ -63,7 +63,7 @@ PuTTYgen is the PuTTY key generator. You can load in an existing private key and You can change the password of your SSH key with the "PuTTY Key Generator". Make sure to back up the key. * Load your [private key](/general/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys/) file with the _Load_ button. * Load your [private key][2] file with the _Load_ button. * Enter your current passphrase. * Change key passphrase. * Confirm key passphrase. ... ... @@ -104,4 +104,9 @@ You can generate an additional public/private key pair and insert public key int ![](../../../img/PuttyKeygenerator_006V.png) * Now you can insert an additional public key into the authorized_keys file for authentication with your own private key. You must log in using the SSH key received after registration. Then proceed to [How to add your own key](/general/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys/). You must log in using the SSH key received after registration. Then proceed to [How to add your own key][2]. [1]: #putty [2]: ssh-keys.md [a]: http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html  ... ... @@ -15,7 +15,7 @@ It is impossible to connect to VPN from other operating systems.
## VPN Client Installation You can install VPN client from web interface after successful login with [IT4I credentials](/general/obtaining-login-credentials/obtaining-login-credentials/#login-credentials) on address [https://vpn.it4i.cz/user](https://vpn.it4i.cz/user) You can install VPN client from web interface after successful login with [IT4I credentials][1] [here][a]. ![](../../img/vpn_web_login.png) ... ... @@ -43,7 +43,7 @@ After successful download of installation file, you have to execute this executa You can use graphical user interface or command line interface to run VPN client on all supported operating systems. We suggest using GUI. Before the first login to VPN, you have to fill URL **[https://vpn.it4i.cz/user](https://vpn.it4i.cz/user)** into the text field. Before the first login to VPN, you have to fill URL **[https://vpn.it4i.cz/user][a]** into the text field. ![](../../img/vpn_contacting_https_cluster.png) ... ... @@ -72,3 +72,8 @@ After a successful logon, you can see a green circle with a tick mark on the loc ![](../../img/vpn_successfull_connection.png) For disconnecting, right-click on the AnyConnect client icon in the system tray and select **VPN Disconnect**. [1]: ../../general/obtaining-login-credentials/obtaining-login-credentials.md#login-credentials [a]: https://vpn.it4i.cz/user  # Applying for Resources Computational resources may be allocated by any of the following [Computing resources allocation](http://www.it4i.cz/computing-resources-allocation/?lang=en) mechanisms. Computational resources may be allocated by any of the following [Computing resources allocation][a] mechanisms. Academic researchers can apply for computational resources via [Open Access Competitions](http://www.it4i.cz/open-access-competition/?lang=en&lang=en). Academic researchers can apply for computational resources via [Open Access Competitions][b]. 
Anyone is welcome to apply via the [Directors Discretion](http://www.it4i.cz/obtaining-computational-resources-through-directors-discretion/?lang=en&lang=en). Anyone is welcome to apply via the [Directors Discretion][c]. Foreign (mostly European) users can obtain computational resources via the [PRACE (DECI) program](http://www.prace-ri.eu/DECI-Projects). Foreign (mostly European) users can obtain computational resources via the [PRACE (DECI) program][d]. In all cases, IT4Innovations’ access mechanisms are aimed at distributing computational resources while taking into account the development and application of supercomputing methods and their benefits and usefulness for society. The applicants are expected to submit a proposal. In the proposal, the applicants **apply for a particular amount of core-hours** of computational resources. The requested core-hours should be substantiated by the scientific excellence of the proposal, its computational maturity and expected impacts. Proposals undergo a scientific, technical and economic evaluation. The allocation decisions are based on this evaluation.
More information at [Computing resources allocation][a] and [Obtaining Login Credentials][1] page. [1]: obtaining-login-credentials/obtaining-login-credentials.md [a]: http://www.it4i.cz/computing-resources-allocation/?lang=en [b]: http://www.it4i.cz/open-access-competition/?lang=en&lang=en [c]: http://www.it4i.cz/obtaining-computational-resources-through-directors-discretion/?lang=en&lang=en [d]: http://www.prace-ri.eu/DECI-Projects  ... ... @@ -17,11 +17,11 @@ However, users need only manage User and CA certificates. Note that your user ce ## Q: Which X.509 Certificates Are Recognised by IT4Innovations? [The Certificates for Digital Signatures](#the-certificates-for-digital-signatures). [The Certificates for Digital Signatures][1]. ## Q: How Do I Get a User Certificate That Can Be Used With IT4Innovations? To get a certificate, you must make a request to your local, IGTF approved, Certificate Authority (CA). Usually you then must visit, in person, your nearest Registration Authority (RA) to verify your affiliation and identity (photo identification is required). Usually, you will then be emailed details on how to retrieve your certificate, although procedures can vary between CAs. If you are in Europe, you can locate [your trusted CA](https://www.eugridpma.org/members/worldmap/). To get a certificate, you must make a request to your local, IGTF approved, Certificate Authority (CA). Usually you then must visit, in person, your nearest Registration Authority (RA) to verify your affiliation and identity (photo identification is required). Usually, you will then be emailed details on how to retrieve your certificate, although procedures can vary between CAs. If you are in Europe, you can locate [your trusted CA][a]. In some countries certificates can also be retrieved using the TERENA Certificate Service, see the FAQ below for the link. ... ... @@ -31,7 +31,7 @@ Yes, provided that the CA which provides this service is also a member of IGTF. 
## Q: Does IT4Innovations Support the TERENA Certificate Service? Yes, IT4Innovations supports TERENA eScience personal certificates. For more information, visit [TCS - Trusted Certificate Service](https://tcs-escience-portal.terena.org/), where you can also find out if your organisation/country can use this service. Yes, IT4Innovations supports TERENA eScience personal certificates. For more information, visit [TCS - Trusted Certificate Service][b], where you can also find out if your organisation/country can use this service. ## Q: What Format Should My Certificate Take? ... ... @@ -51,7 +51,7 @@ To convert your Certificate from p12 to JKS, IT4Innovations recommends using the Certification Authority (CA) certificates are used to verify the link between your user certificate and the authority which issued it. They are also used to verify the link between the host certificate of an IT4Innovations server and the CA which issued that certificate. In essence, they establish a chain of trust between you and the target server. Thus, for some grid services, users must have a copy of all the CA certificates. To assist users, SURFsara (a member of PRACE) provides a complete and up-to-date bundle of all the CA certificates that any PRACE user (or IT4Innovations grid services user) will require. Bundles of certificates, in either p12, PEM or JKS formats, are [available here](https://winnetou.surfsara.nl/prace/certs/). To assist users, SURFsara (a member of PRACE) provides a complete and up-to-date bundle of all the CA certificates that any PRACE user (or IT4Innovations grid services user) will require. Bundles of certificates, in either p12, PEM or JKS formats, are [available here][c]. It is worth noting that gsissh-term and DART automatically update their CA certificates from this SURFsara website. In other cases, if you receive a warning that a server’s certificate cannot be validated (not trusted), then update your CA certificates via the SURFsara website.
If this fails, then contact the IT4Innovations helpdesk. ... ... @@ -61,7 +61,7 @@ Lastly, if you need the CA certificates for a personal Globus 5 installation, th myproxy-get-trustroots -s myproxy-prace.lrz.de  If you run this command as ’root’, then it will install the certificates into /etc/grid-security/certificates. If you run this not as ’root’, then the certificates will be installed into$HOME/.globus/certificates. For Globus, you can download the globuscerts.tar.gz packet [available here](https://winnetou.surfsara.nl/prace/certs/). If you run this command as ’root’, then it will install the certificates into /etc/grid-security/certificates. If you run this not as ’root’, then the certificates will be installed into $HOME/.globus/certificates. For Globus, you can download the globuscerts.tar.gz packet [available here][c]. ## Q: What Is a DN and How Do I Find Mine? ... ... @@ -104,7 +104,7 @@ To check your certificate (e.g., DN, validity, issuer, public key algorithm, etc openssl x509 -in usercert.pem -text -noout  To download openssl if not pre-installed, see [here](https://www.openssl.org/source/). On Macintosh Mac OS X computers openssl is already pre-installed and can be used immediately. To download openssl if not pre-installed, see [here][d]. On Macintosh Mac OS X computers openssl is already pre-installed and can be used immediately. ## Q: How Do I Create and Then Manage a Keystore? ... ... @@ -126,7 +126,7 @@ You also can import CA certificates into your java keystore with the tool, e.g.: where$mydomain.crt is the certificate of a trusted signing authority (CA) and $mydomain is the alias name that you give to the entry. More information on the tool can be found [here](http://docs.oracle.com/javase/7/docs/technotes/tools/solaris/keytool.html) More information on the tool can be found [here][e]. ## Q: How Do I Use My Certificate to Access the Different Grid Services? ... ... 
@@ -134,7 +134,7 @@ Most grid services require the use of your certificate; however, the format of y If employing the PRACE version of GSISSH-term (also a Java Web Start Application), you may use either the PEM or p12 formats. Note that this service automatically installs up-to-date PRACE CA certificates. If the grid service is UNICORE, then you bind your certificate, in either the p12 format or JKS, to UNICORE during the installation of the client on your local machine. For more information visit [UNICORE6 in PRACE](http://www.prace-ri.eu/UNICORE6-in-PRACE) If the grid service is UNICORE, then you bind your certificate, in either the p12 format or JKS, to UNICORE during the installation of the client on your local machine. For more information visit [UNICORE6 in PRACE][f]. If the grid service is part of Globus, such as GSI-SSH, GriFTP or GRAM5, then the certificates can be in either p12 or PEM format and must reside in the "$HOME/.globus" directory for Linux and Mac users or %HOMEPATH%.globus for Windows users. (Windows users will have to use the DOS command ’cmd’ to create a directory which starts with a ’.’). Further, user certificates should be named either "usercred.p12" or "usercert.pem" and "userkey.pem", and the CA certificates must be kept in a pre-specified directory as follows. For Linux and Mac users, this directory is either $HOME/.globus/certificates or /etc/grid-security/certificates. For Windows users, this directory is %HOMEPATH%.globuscertificates. (If you are using GSISSH-Term from prace-ri.eu then you do not have to create the .globus directory nor install CA certificates to use this tool alone). ... ... @@ -152,12 +152,23 @@ A proxy certificate is a short-lived certificate which may be employed by UNICOR ## Q: What Is the MyProxy Service? 
[The MyProxy Service](http://grid.ncsa.illinois.edu/myproxy/) can be employed by gsissh-term and Globus tools, and is an online repository that allows users to store long-lived proxy certificates remotely, which can then be retrieved for use at a later date. Each proxy is protected by a password provided by the user at the time of storage. This is beneficial to Globus users, as they do not have to carry their private keys and certificates when travelling; nor do users have to install private keys and certificates on possibly insecure computers. [The MyProxy Service][g] can be employed by gsissh-term and Globus tools, and is an online repository that allows users to store long-lived proxy certificates remotely, which can then be retrieved for use at a later date. Each proxy is protected by a password provided by the user at the time of storage. This is beneficial to Globus users, as they do not have to carry their private keys and certificates when travelling; nor do users have to install private keys and certificates on possibly insecure computers. ## Q: Someone May Have Copied or Had Access to the Private Key of My Certificate Either in a Separate File or in the Browser. What Should I Do? Please ask the CA that issued your certificate to revoke this certificate and to supply you with a new one. In addition, report this to IT4Innovations by contacting [the support team](https://support.it4i.cz/rt). Please ask the CA that issued your certificate to revoke this certificate and to supply you with a new one. In addition, report this to IT4Innovations by contacting [the support team][h]. ## Q: My Certificate Expired. What Should I Do? In order to still be able to communicate with us, you have to request a new certificate from your Certificate Authority (CA). There is no need to explicitly send us any information about your new certificate if the new one has the same Distinguished Name (DN) as the old one.
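The certificate handling this FAQ discusses — inspecting a certificate with `openssl x509` and converting between the PEM and p12 formats — can be sketched as follows. This is an illustrative sketch only: it uses a throwaway self-signed certificate instead of a real CA-issued one, and the file names (`usercert.pem`, `userkey.pem`, `usercred.p12`) merely follow the naming convention mentioned in the Globus answer above.

```shell
# Create a throwaway self-signed certificate to demonstrate on
# (a real user certificate comes from an IGTF-approved CA):
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj "/CN=demo-user" -keyout userkey.pem -out usercert.pem 2>/dev/null

# Inspect the certificate (DN, validity dates), as in the FAQ above:
openssl x509 -in usercert.pem -noout -subject -dates

# Bundle the PEM key and certificate into a single p12 file:
openssl pkcs12 -export -in usercert.pem -inkey userkey.pem \
    -out usercred.p12 -passout pass:demo-password

# Extract the certificate back out of the p12 bundle:
openssl pkcs12 -in usercred.p12 -passin pass:demo-password \
    -nokeys -clcerts 2>/dev/null | openssl x509 -noout -subject
```

For the p12-to-JKS conversion mentioned earlier, the Java `keytool -importkeystore` utility can read such a p12 bundle directly.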
[1]: #the-certificates-for-digital-signatures [a]: https://www.eugridpma.org/members/worldmap/ [b]: https://tcs-escience-portal.terena.org/ [c]: https://winnetou.surfsara.nl/prace/certs/ [d]: https://www.openssl.org/source/ [e]: http://docs.oracle.com/javase/7/docs/technotes/tools/solaris/keytool.html [f]: http://www.prace-ri.eu/UNICORE6-in-PRACE [g]: http://grid.ncsa.illinois.edu/myproxy/ [h]: https://support.it4i.cz/rt  # Resource Allocation and Job Execution To run a [job](/#terminology-frequently-used-on-these-pages), [computational resources](/salomon/resources-allocation-policy#resource-accounting-policy) for this particular job must be allocated. This is done via the PBS Pro job workload manager software, which distributes workloads across the supercomputer. Extensive information about PBS Pro can be found in the [PBS Pro User's Guide](/pbspro). To run a [job][1], computational resources for this particular job must be allocated. This is done via the PBS Pro job workload manager software, which distributes workloads across the supercomputer. Extensive information about PBS Pro can be found in the [PBS Pro User's Guide][2]. ## Resources Allocation Policy The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. [The Fair-share](/salomon/job-priority#fair-share-priority) ensures that individual users may consume an approximately equal amount of resources per week. The resources are accessible via queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. The following queues are the most important: The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. [The Fair-share][3] ensures that individual users may consume an approximately equal amount of resources per week.
The resources are accessible via queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. The following queues are the most important: * **qexp**, the Express queue * **qprod**, the Production queue ... ... @@ -14,9 +14,9 @@ The resources are allocated to the job in a fair-share fashion, subject to const * **qfree**, the Free resource utilization queue !!! note Check the queue status at [https://extranet.it4i.cz/](https://extranet.it4i.cz/) Check the queue status [here][a]. Read more on the [Resource Allocation Policy](/salomon/resources-allocation-policy) page. Read more on the [Resource Allocation Policy][4] page. ## Job Submission and Execution ... ... @@ -25,7 +25,7 @@ Read more on the [Resource Allocation Policy](/salomon/resources-allocation-polic The qsub command submits the job into the queue. It creates a request to the PBS Job manager for allocation of the specified resources. The **smallest allocation unit is an entire node, 16 cores**, with the exception of the qexp queue. The resources will be allocated when available, subject to allocation policies and constraints. **After the resources are allocated, the jobscript or interactive shell is executed on the first of the allocated nodes.** Read more on the [Job submission and execution](/salomon/job-submission-and-execution) page. Read more on the [Job submission and execution][5] page. ## Capacity Computing ... ... @@ -36,4 +36,13 @@ Use GNU Parallel and/or Job arrays when running (many) single core jobs. In many cases, it is useful to submit a huge (100+) number of computational jobs into the PBS queue system. A huge number of (small) jobs is one of the most effective ways to execute embarrassingly parallel calculations, achieving the best runtime, throughput and computer utilization. In this chapter, we discuss the recommended way to run a huge number of jobs, including **ways to run a huge number of single core jobs**.
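The job-array pattern referred to above can be sketched as follows. PBS Pro sets `PBS_ARRAY_INDEX` to the subjob number inside each subjob; here it is simulated so the snippet runs standalone, and the `tasklist` file name is illustrative:

```shell
# A tasklist with one task per line (one subjob processes one line):
printf 'task_a\ntask_b\ntask_c\n' > tasklist

# PBS Pro sets PBS_ARRAY_INDEX to the subjob number (1..N);
# we simulate subjob 2 here for illustration:
PBS_ARRAY_INDEX=2

# Each subjob picks the single task it is responsible for:
TASK=$(sed -n "${PBS_ARRAY_INDEX}p" tasklist)
echo "subjob ${PBS_ARRAY_INDEX} processes ${TASK}"
# prints: subjob 2 processes task_b
```

Submitted as a job array (e.g. `qsub -J 1-3 jobscript`), each subjob would run this selection with its own index, so the 3 tasks are processed independently.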
Read more on [Capacity computing](/salomon/capacity-computing) page. Read more on [Capacity computing][6] page. [1]: #terminology-frequently-used-on-these-pages [2]: ../pbspro.md [3]: ../salomon/job-priority.md#fair-share-priority [4]: ../salomon/resources-allocation-policy.md [5]: ../salomon/job-submission-and-execution.md [6]: ../salomon/capacity-computing.md [a]: https://extranet.it4i.cz/rsweb/salomon/queues  # Documentation Welcome to the IT4Innovations documentation pages. The IT4Innovations national supercomputing center operates the supercomputers [Salomon](/salomon/introduction/) and [Anselm](/anselm/introduction/). The supercomputers are [available](/general/applying-for-resources/) to the academic community within the Czech Republic and Europe, and the industrial community worldwide. The purpose of these pages is to provide comprehensive documentation of the hardware, software and usage of the computers. !!! Warning There's a planned Salomon upgrade. Make sure to read the [details][upgrade]. Welcome to the IT4Innovations documentation pages. The IT4Innovations national supercomputing center operates the supercomputers [Salomon][1] and [Anselm][2]. The supercomputers are [available][3] to the academic community within the Czech Republic and Europe, and the industrial community worldwide. The purpose of these pages is to provide comprehensive documentation of the hardware, software and usage of the computers. ## How to Read the Documentation ... ... @@ -11,27 +14,27 @@ Welcome to the IT4Innovations documentation pages. The IT4Innovations national s ## Getting Help and Support !!! note Contact [support$at$it4i.cz](mailto:support@it4i.cz) for help and support regarding the cluster technology at IT4Innovations. Please use **Czech**, **Slovak** or **English** language for communication with us. Follow the status of your request to IT4Innovations at [support.it4i.cz/rt](http://support.it4i.cz/rt). 
The IT4Innovations support team will use best efforts to resolve requests within thirty days. Contact [support$at$it4i.cz][a] for help and support regarding the cluster technology at IT4Innovations. Please use **Czech**, **Slovak** or **English** language for communication with us. Follow the status of your request to IT4Innovations [here][b]. The IT4Innovations support team will use best efforts to resolve requests within thirty days. Use your IT4Innovations username and password to log in to the [support](http://support.it4i.cz/) portal. Use your IT4Innovations username and password to log in to the [support][b] portal. ## Required Proficiency !!! note You need basic proficiency in Linux environments. In order to use the system for your calculations, you need basic proficiency in Linux environments. To gain this proficiency we recommend you read the [introduction to Linux](http://www.tldp.org/LDP/intro-linux/html/) operating system environments, and install a Linux distribution on your personal computer. A good choice might be the [CentOS](http://www.centos.org/) distribution, as it is similar to systems on the clusters at IT4Innovations. It's easy to install and use. In fact, any Linux distribution would do. In order to use the system for your calculations, you need basic proficiency in Linux environments. To gain this proficiency we recommend you read the [introduction to Linux][c] operating system environments, and install a Linux distribution on your personal computer. A good choice might be the [CentOS][d] distribution, as it is similar to systems on the clusters at IT4Innovations. It's easy to install and use. In fact, any Linux distribution would do. !!! note Learn how to parallelize your code! In many cases, you will run your own code on the cluster. In order to fully exploit the cluster, you will need to carefully consider how to utilize all the cores available on the node and how to use multiple nodes at the same time. 
You need to **parallelize** your code. Proficiency in MPI, OpenMP, CUDA, UPC or GPI2 programming may be gained via [training provided by IT4Innovations](http://prace.it4i.cz). In many cases, you will run your own code on the cluster. In order to fully exploit the cluster, you will need to carefully consider how to utilize all the cores available on the node and how to use multiple nodes at the same time. You need to **parallelize** your code. Proficiency in MPI, OpenMP, CUDA, UPC or GPI2 programming may be gained via [training provided by IT4Innovations][e]. ## Terminology Frequently Used on These Pages * **node:** a computer, interconnected via a network to other computers - Computational nodes are powerful computers, designed for, and dedicated to, executing demanding scientific computations. * **core:** a processor core, a unit of a processor, executing computations * **core-hour:** also normalized core-hour, NCH. A metric of computer utilization, [see definition](/salomon/resources-allocation-policy/#normalized-core-hours-nch). * **core-hour:** also normalized core-hour, NCH. A metric of computer utilization, [see definition][4]. * **job:** a calculation running on the supercomputer - the job allocates and utilizes the resources of the supercomputer for a certain time. * **HPC:** High Performance Computing * **HPC (computational) resources:** core-hours, storage capacity, software licences ... ... @@ -60,8 +63,20 @@ local$
[1]: salomon/introduction.md [2]: anselm/introduction.md [3]: general/applying-for-resources.md [4]: salomon/resources-allocation-policy.md#normalized-core-hours-nch [upgrade]: salomon-upgrade.md [a]: mailto:support@it4i.cz [b]: http://support.it4i.cz/rt [c]: http://www.tldp.org/LDP/intro-linux/html/ [d]: http://www.centos.org/ [e]: http://prace.it4i.cz
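The **core-hour** metric from the terminology above is, in essence, cores × wall-clock hours, scaled by a machine-specific normalization factor. The calculation can be illustrated as follows; the factor 1.0 is only a placeholder, the real per-machine values are given in the linked definition:

```shell
CORES=16        # e.g. one full node
WALLTIME_H=24   # wall-clock hours the job ran
FACTOR=1.0      # machine-specific normalization factor (placeholder value)

# normalized core-hours = cores * hours * factor
awk -v c="$CORES" -v h="$WALLTIME_H" -v f="$FACTOR" \
    'BEGIN { printf "%.0f normalized core-hours\n", c * h * f }'
# prints: 384 normalized core-hours
```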
... ... @@ -24,7 +24,8 @@ Install development packages (gcc, g++, make, automake, autoconf, bison, flex, p $ qsub ... -l mic_devel=true  Available on Salomon Perrin nodes. !!! Warning Available on Salomon Perrin nodes. ## Global RAM Disk ... ... @@ -34,7 +35,8 @@ Create global shared file system consisting of RAM disks of allocated nodes. Fil $ qsub ... -l global_ramdisk=true  Available on Salomon nodes. !!! Warning Available on Salomon nodes only. ## Virtualization Network ... ... @@ -44,7 +46,7 @@ Configure network for virtualization, create interconnect for fast communication $ qsub ... -l virt_network=true  [See Tap Interconnect](/software/tools/virtualization/#tap-interconnect) [See Tap Interconnect][1] ## x86 Adapt Support ... ... @@ -54,9 +56,11 @@ Load kernel module, that allows changing/toggling system parameters stored in MS $ qsub ... -l x86_adapt=true  Hazardous, it causes CPU frequency disruption. !!! Danger Hazardous, it causes CPU frequency disruption. Available on Salomon nodes. !!! Warning Available on Salomon nodes only. ## Disabling Intel Turbo Boost on CPU ... ... @@ -70,7 +74,8 @@ $ qsub ... -l cpu_turbo_boost=false ## Offlining CPU Cores Not available. !!! Info Not available now. To offline N CPU cores ... ... @@ -86,16 +91,18 @@ $ qsub ... -l cpu_offline_cores=PATTERN where PATTERN is a list of core numbers to offline, separated by the character 'c', e.g. "5c11c16c23c" Hazardous, it causes Lustre threads disruption. !!! Danger Hazardous, it causes Lustre threads disruption. ## Setting Intel Hyper Threading on CPU Not available, requires changed BIOS settings. Intel Hyper Threading is disabled by default. To enable Intel Hyper Threading on the allocated nodes' CPUs Intel Hyper Threading is disabled by default. To enable Intel Hyper Threading on the allocated nodes' CPUs: console $ qsub ... -l cpu_hyper_threading=true  !!! Warning Available on Salomon nodes only. [1]: software/tools/virtualization.md#tap-interconnect
... ... @@ -6,13 +6,19 @@ | ------ | ----------- | | [icc](http://software.intel.com/en-us/intel-compilers/) | Intel C and C++ compilers | ## Data | Module | Description | | ------ | ----------- | | [HDF5](http://www.hdfgroup.org/HDF5/) | HDF5 is a unique technology suite that makes possible the management of extremely large and complex data collections. | ## Devel | Module | Description | | ------ | ----------- | | devel_environment | | | M4 | | | ncurses | | | [devel_environment](https://docs.it4i.cz/software/mic/mic_environment) | Devel environment for intel xeon phi GCC 5.1.1 Python 2.7.12 Perl 5.14.2 CMake 2.8.7 Make 3.82 ncurses 5.9 ... | | [M4](http://www.gnu.org/software/m4/m4.html) | GNU M4 is an implementation of the traditional Unix macro processor. It is mostly SVR4 compatible although it has some extensions (for example, handling more than 9 positional parameters to macros). GNU M4 also has built-in functions for including files, running shell commands, doing arithmetic, etc. | | [ncurses](http://www.gnu.org/software/ncurses/) | The Ncurses (new curses) library is a free software emulation of curses in System V Release 4.0, and more. It uses Terminfo format, supports pads and color and multiple highlights and forms characters and function-key mapping, and has all the other SYSV-curses enhancements over BSD Curses. | ## Lang ... ... @@ -33,6 +39,7 @@ | Module | Description | | ------ | ----------- | | GMP | | | [Octave](http://www.gnu.org/software/octave/) | GNU Octave is a high-level interpreted language, primarily intended for numerical computations. | ## Mpi ... ...
@@ -41,23 +48,32 @@ | ------ | ----------- | | [impi](http://software.intel.com/en-us/intel-mpi-library/) | Intel MPI Library, compatible with MPICH ABI | ## Numlib | Module | Description | | ------ | ----------- | | [imkl](http://software.intel.com/en-us/intel-mkl/) | Intel Math Kernel Library is a library of highly optimized, extensively threaded math routines for science, engineering, and financial applications that require maximum performance. Core math functions include BLAS, LAPACK, ScaLAPACK, Sparse Solvers, Fast Fourier Transforms, Vector Math, and more. | ## Toolchain | Module | Description | | ------ | ----------- | | [iccifort](http://software.intel.com/en-us/intel-cluster-toolkit-compiler/) | Intel C, C++ & Fortran compilers | | [ifort](http://software.intel.com/en-us/intel-compilers/) | Intel Fortran compiler | | [iimpi](http://software.intel.com/en-us/intel-cluster-toolkit-compiler/) | Intel C/C++ and Fortran compilers, alongside Intel MPI. | | [intel](http://software.intel.com/en-us/intel-cluster-toolkit-compiler/) | Compiler toolchain including Intel compilers, Intel MPI and Intel Math Kernel Library (MKL). | ## Tools | Module | Description | | ------ | ----------- | | bzip2 | | | cURL | | | [bzip2](http://www.bzip.org/) | bzip2 is a freely available, patent free, high-quality data compressor. It typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), whilst being around twice as fast at compression and six times faster at decompression. | | [cURL](http://curl.haxx.se) | libcurl is a free and easy-to-use client-side URL transfer library | | [expat](http://expat.sourceforge.net/) | Expat is an XML parser library written in C. 
It is a stream-oriented parser in which an application registers handlers for things the parser might find in the XML document (like start tags) | | OpenSSL | | ## Vis | Module | Description | | ------ | ----------- | | gettext | | | [gettext](http://www.gnu.org/software/gettext/) | GNU gettext' is an important step for the GNU Translation Project, as it is an asset on which we may build many other steps. This package offers to programmers, translators, and even users, a well integrated set of tools and documentation |  ... ... @@ -61,6 +61,7 @@ | [pkg-config](http://www.freedesktop.org/wiki/Software/pkg-config/) | pkg-config is a helper tool used when compiling applications and libraries. It helps you insert the correct compiler options on the command line so an application can use gcc -o test test.c pkg-config --libs --cflags glib-2.0 for instance, rather than hard-coding values on where to find glib (or other libraries). | | [Qt](http://qt-project.org/) | Qt is a comprehensive cross-platform C++ application framework. | | [Qt5](http://qt.io/) | Qt is a comprehensive cross-platform C++ application framework. | | [sparsehash](https://github.com/sparsehash/sparsehash) | An extremely memory-efficient hash_map implementation. 2 bits/entry overhead! The SparseHash library contains several hash-map implementations, including implementations that optimize for space or speed. | | [SQLite](http://www.sqlite.org/) | SQLite: SQL Database Engine in a C Library | | [SWIG](http://www.swig.org/) | SWIG is a software development tool that connects programs written in C and C++ with a variety of high-level programming languages. | | [xorg-macros](http://cgit.freedesktop.org/xorg/util/macros) | X.org macros utilities. | ... ... 
````diff
@@ -140,7 +141,7 @@
 | Module | Description |
 | ------ | ----------- |
-| CUDA | |
+| [CUDA](https://developer.nvidia.com/cuda-toolkit) | CUDA (formerly Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA and implemented by the graphics processing units (GPUs) that they produce. CUDA gives developers access to the virtual instruction set and memory of the parallel computational elements in CUDA GPUs. |
 | [hwloc](http://www.open-mpi.org/projects/hwloc/) | The Portable Hardware Locality (hwloc) software package provides a portable abstraction (across OS, versions, architectures, ...) of the hierarchical topology of modern architectures, including NUMA memory nodes, sockets, shared caches, cores and simultaneous multithreading. It also gathers various system attributes such as cache and memory information as well as the locality of I/O devices such as network interfaces, InfiniBand HCAs or GPUs. It primarily aims at helping applications with gathering information about modern computing hardware so as to exploit it accordingly and efficiently. |
 | [libpciaccess](http://cgit.freedesktop.org/xorg/lib/libpciaccess/) | Generic PCI access library. |
@@ -148,13 +149,14 @@
 | Module | Description |
 | ------ | ----------- |
-| [foss]((none)) | GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK. |
+| foss | GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK. |
 | [GNU](http://www.gnu.org/software/) | Compiler-only toolchain with GCC and binutils. |
-| [gompi]((none)) | GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support. |
+| gompi | GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support. |
 | [iccifort](http://software.intel.com/en-us/intel-cluster-toolkit-compiler/) | Intel C, C++ and Fortran compilers |
 | [iimpi](http://software.intel.com/en-us/intel-cluster-toolkit-compiler/) | Intel C/C++ and Fortran compilers, alongside Intel MPI. |
 | [intel](http://software.intel.com/en-us/intel-cluster-toolkit-compiler/) | Intel Cluster Toolkit Compiler Edition provides Intel C/C++ and Fortran compilers, Intel MPI & Intel MKL. |
 | [PRACE](http://www.prace-ri.eu/PRACE-Common-Production) | The PRACE Common Production Environment (PCPE) is a set of software tools and libraries that are planned to be available on all PRACE execution sites. The PCPE also defines a set of environment variables that try to make compilation on all sites as homogeneous and simple as possible. |
 | [Py](https://www.python.org) | Python 2.7 toolchain |
 
 ## Tools
@@ -162,6 +164,7 @@
 | ------ | ----------- |
 | [Bash](http://www.gnu.org/software/bash) | Bash is an sh-compatible command language interpreter that executes commands read from the standard input or from a file. Bash also incorporates useful features from the Korn and C shells (ksh and csh). |
 | [binutils](http://directory.fsf.org/project/binutils/) | binutils: GNU binary utilities |
+| [BLCR](http://crd.lbl.gov/departments/computer-science/CLaSS/research/BLCR/) | Future Technologies Group researchers are developing a hybrid kernel/user implementation of checkpoint/restart. Their goal is to provide a robust, production quality implementation that checkpoints a wide range of applications, without requiring changes to be made to application code. This work focuses on checkpointing parallel applications that communicate through MPI, and on compatibility with the software suite produced by the SciDAC Scalable Systems Software ISIC. |
 | [bzip2](http://www.bzip.org/) | bzip2 is a freely available, patent free, high-quality data compressor. It typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), whilst being around twice as fast at compression and six times faster at decompression. |
 | [cURL](http://curl.haxx.se) | libcurl is a free and easy-to-use client-side URL transfer library, supporting DICT, FILE, FTP, FTPS, Gopher, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMTP, SMTPS, Telnet and TFTP. libcurl supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form based upload, proxies, cookies, user+password authentication (Basic, Digest, NTLM, Negotiate, Kerberos), file transfer resume, http proxy tunneling and more. |
 | [DMTCP](http://dmtcp.sourceforge.net/index.html) | DMTCP (Distributed MultiThreaded Checkpointing) transparently checkpoints a single-host or distributed computation in user-space -- with no modifications to user code or to the O/S. |
@@ -171,6 +174,7 @@
 | [gzip](http://www.gnu.org/software/gzip/) | gzip (GNU zip) is a popular data compression program as a replacement for compress |
 | MATLAB | |
 | [Mercurial](http://mercurial.selenic.com/) | Mercurial is a free, distributed source control management tool. It efficiently handles projects of any size and offers an easy and intuitive interface. |
+| [moreutils](https://joeyh.name/code/moreutils/) | Moreutils is a growing collection of the unix tools that nobody thought to write long ago when unix was young. |
 | [numactl](http://oss.sgi.com/projects/libnuma/) | The numactl program allows you to run your application program on specific cpu's and memory nodes. It does this by supplying a NUMA memory policy to the operating system before running your program. The libnuma library provides convenient ways for you to add NUMA memory policies into your own program. |
 | pigz | |
 | [QEMU](http://wiki.qemu.org/Main_Page) | QEMU is a generic and open source machine emulator and virtualizer. |
````
````diff
-* ![pdf](img/pdf.png)[PBS Pro Programmer's Guide](http://www.pbsworks.com/pdfs/PBSProgramGuide13.0.pdf)
-* ![pdf](img/pdf.png)[PBS Pro Quick Start Guide](http://www.pbsworks.com/pdfs/PBSQuickStartGuide13.0.pdf)
-* ![pdf](img/pdf.png)[PBS Pro Reference Guide](http://www.pbsworks.com/pdfs/PBSReferenceGuide13.0.pdf)
-* ![pdf](img/pdf.png)[PBS Pro User's Guide](http://www.pbsworks.com/pdfs/PBSUserGuide13.0.pdf)
+* ![pdf](img/pdf.png)[PBS Pro Programmer's Guide][1]
+* ![pdf](img/pdf.png)[PBS Pro Quick Start Guide][2]
+* ![pdf](img/pdf.png)[PBS Pro Reference Guide][3]
+* ![pdf](img/pdf.png)[PBS Pro User's Guide][4]
+
+[1]: http://www.pbsworks.com/pdfs/PBSProgramGuide13.0.pdf
+[2]: http://www.pbsworks.com/pdfs/PBSQuickStartGuide13.0.pdf
+[3]: http://www.pbsworks.com/pdfs/PBSReferenceGuide13.0.pdf
+[4]: http://www.pbsworks.com/pdfs/PBSUserGuide13.0.pdf
````

A planned upgrade of Salomon is scheduled from 2018-12-04 to 2018-12-05.

!!! Warning
    This upgrade will introduce a lot of changes with respect to production and user experience.

!!! Hint
    You might **need** to **recompile** your binaries.

The Salomon operating system will be upgraded to the latest CentOS 7.6. After the upgrade, we will be able to support the latest software versions and keep the cluster secure with upstream releases.

Major changes are:

* the kernel will be upgraded to 3.10 (2.6.32 now)
* glibc will be upgraded to 2.17 (2.12 now)
* software modules/binaries should be recompiled or deleted

## Discontinued Modules

A new tag has been introduced. Modules tagged with **C6** might be malfunctioning. These modules might be recompiled during the transition period. Keep support@it4i.cz informed about malfunctioning modules.

```console
$ ml av intel/

--------------------------- /apps/modules/toolchain ----------------------------
```
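Whether a binary is likely to need recompiling can be estimated by comparing the glibc it was built against with the post-upgrade version. A minimal sketch, assuming GNU `sort -V` (version sort) is available; the 2.12 and 2.17 values are the ones listed above, and the variables are purely illustrative:

```shell
# Hypothetical check: does a binary built against the old glibc (2.12)
# predate the glibc shipped after the upgrade (2.17)?
built_against="2.12"
after_upgrade="2.17"

# sort -V orders version strings numerically, so the first line is the older one.
oldest=$(printf '%s\n%s\n' "$built_against" "$after_upgrade" | sort -V | head -n1)

if [ "$oldest" = "$built_against" ] && [ "$built_against" != "$after_upgrade" ]; then
  echo "binary predates glibc $after_upgrade: recompile recommended"
else
  echo "binary already matches the upgraded glibc"
fi
```

The same `sort -V` comparison works for the kernel versions (2.6.32 vs 3.10) mentioned in the list of major changes.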