From 6680f297fcdd72fea5bc803c62862ef8cbfb5a5e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Luk=C3=A1=C5=A1=20Krup=C4=8D=C3=ADk?= <lukas.krupcik@vsb.cz> Date: Fri, 26 Aug 2016 13:22:44 +0200 Subject: [PATCH] salomon docs.it4i.cz/salomon/resource-allocation-and-job-execution --- docs.it4i/salomon/accessing-the-cluster.md | 143 ------- .../accessing-the-cluster.md | 114 ++++++ .../outgoing-connections.md | 110 ++---- .../accessing-the-cluster/vpn-access.md | 72 +--- docs.it4i/salomon/environment-and-modules.md | 107 ++--- .../hardware-overview-1/hardware-overview.md | 89 ----- docs.it4i/salomon/hardware-overview.md | 60 +++ docs.it4i/salomon/introduction.md | 25 +- .../network-1/7d-enhanced-hypercube.md | 37 -- .../7D_Enhanced_hypercube.png | Bin .../salomon/network/7d-enhanced-hypercube.md | 32 ++ ...ngleplanetopologyAcceleratednodessmall.png | Bin .../IBsingleplanetopologyICEXMcellsmall.png | Bin .../Salomon_IB_topology.png | Bin .../ib-single-plane-topology.md | 34 +- .../salomon/{network-1 => network}/network.md | 53 +-- docs.it4i/salomon/prace.md | 367 ++++++------------ .../capacity-computing.md | 320 +++++---------- .../introduction.md | 57 +-- .../{hardware-overview-1 => }/uv-2000.jpeg | Bin 20 files changed, 566 insertions(+), 1054 deletions(-) delete mode 100644 docs.it4i/salomon/accessing-the-cluster.md create mode 100644 docs.it4i/salomon/accessing-the-cluster/accessing-the-cluster.md delete mode 100644 docs.it4i/salomon/hardware-overview-1/hardware-overview.md create mode 100644 docs.it4i/salomon/hardware-overview.md delete mode 100644 docs.it4i/salomon/network-1/7d-enhanced-hypercube.md rename docs.it4i/salomon/{network-1 => network}/7D_Enhanced_hypercube.png (100%) create mode 100644 docs.it4i/salomon/network/7d-enhanced-hypercube.md rename docs.it4i/salomon/{network-1 => network}/IBsingleplanetopologyAcceleratednodessmall.png (100%) rename docs.it4i/salomon/{network-1 => network}/IBsingleplanetopologyICEXMcellsmall.png (100%) rename docs.it4i/salomon/{network-1 => network}/Salomon_IB_topology.png (100%) rename docs.it4i/salomon/{network-1 => network}/ib-single-plane-topology.md (53%) rename docs.it4i/salomon/{network-1 => network}/network.md (54%) rename docs.it4i/salomon/{hardware-overview-1 => }/uv-2000.jpeg (100%) diff --git a/docs.it4i/salomon/accessing-the-cluster.md b/docs.it4i/salomon/accessing-the-cluster.md deleted file mode 100644 index 88e1d77ab..000000000 --- a/docs.it4i/salomon/accessing-the-cluster.md +++ /dev/null @@ -1,143 +0,0 @@ -Shell access and data transfer -============================== - - - -Interactive Login ------------------ - -The Salomon cluster is accessed by SSH protocol via login nodes login1, -login2, login3 and login4 at address salomon.it4i.cz. The login nodes -may be addressed specifically, by prepending the login node name to the -address. - -The alias >salomon.it4i.cz is currently not available through VPN -connection. Please use loginX.salomon.it4i.cz when connected to -VPN. - - |Login address|Port|Protocol|Login node| - |---|---|---|---| - |salomon.it4i.cz|22|ssh|round-robin DNS record for login[1-4]| - |login1.salomon.it4i.cz|22|ssh|login1| - |login1.salomon.it4i.cz|22|ssh|login1| - |login1.salomon.it4i.cz|22|ssh|login1| - |login1.salomon.it4i.cz|22|ssh|login1| - -The authentication is by the [private -key](../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.html) - -Please verify SSH fingerprints during the first logon. 
They are -identical on all login nodes: -f6:28:98:e4:f9:b2:a6:8f:f2:f4:2d:0a:09:67:69:80 (DSA) -70:01:c9:9a:5d:88:91:c7:1b:c0:84:d1:fa:4e:83:5c (RSA) - - - -Private key (`id_rsa/id_rsa.ppk` ): `600 (-rw-------)`s authentication: - -On **Linux** or **Mac**, use - -` -local $ ssh -i /path/to/id_rsa username@salomon.it4i.cz -` - -If you see warning message "UNPROTECTED PRIVATE KEY FILE!", use this -command to set lower permissions to private key file. - -` -local $ chmod 600 /path/to/id_rsa -` - -On **Windows**, use [PuTTY ssh -client](../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/putty/putty.html). - -After logging in, you will see the command prompt: - -                    _____      _                            -                   / ____|    | |                           -                  | (___  __ _| | ___ _ __ ___  ___ _ __  -                   ___ / _` | |/ _ | '_ ` _ / _ | '_ -                   ____) | (_| | | (_) | | | | | | (_) | | | | -                  |_____/ __,_|_|___/|_| |_| |_|___/|_| |_| -                  - -                        http://www.it4i.cz/?lang=en - - Last login: Tue Jul 9 15:57:38 2013 from your-host.example.com - [username@login2.salomon ~]$ - -The environment is **not** shared between login nodes, except for -[shared filesystems](storage/storage.html). - -Data Transfer -------------- - -Data in and out of the system may be transferred by the -[scp](http://en.wikipedia.org/wiki/Secure_copy) and sftp -protocols. - -In case large volumes of data are transferred, use dedicated data mover -nodes cedge[1-3].salomon.it4i.cz for increased performance. - - - -HTML commented section #1 (removed cedge servers from the table) - - Address |Port|Protocol| - ----------------------- |---|---|------------ - salomon.it4i.cz 22 scp, sftp - login1.salomon.it4i.cz 22 scp, sftp - login2.salomon.it4i.cz 22 scp, sftp - login3.salomon.it4i.cz 22 scp, sftp - login4.salomon.it4i.cz 22 scp, sftp - - The authentication is by the [private -key](../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.html) - -HTML commented section #2 (ssh transfer performance data need to be -verified) - -On linux or Mac, use scp or sftp client to transfer the data to Salomon: - -` -local $ scp -i /path/to/id_rsa my-local-file username@salomon.it4i.cz:directory/file -` - -` -local $ scp -i /path/to/id_rsa -r my-local-dir username@salomon.it4i.cz:directory -` - - or - -` -local $ sftp -o IdentityFile=/path/to/id_rsa username@salomon.it4i.cz -` - -Very convenient way to transfer files in and out of the Salomon computer -is via the fuse filesystem -[sshfs](http://linux.die.net/man/1/sshfs) - -` -local $ sshfs -o IdentityFile=/path/to/id_rsa username@salomon.it4i.cz:. mountpoint -` - -Using sshfs, the users Salomon home directory will be mounted on your -local computer, just like an external disk. - -Learn more on ssh, scp and sshfs by reading the manpages - -` -$ man ssh -$ man scp -$ man sshfs -` - -On Windows, use [WinSCP -client](http://winscp.net/eng/download.php) to transfer -the data. The [win-sshfs -client](http://code.google.com/p/win-sshfs/) provides a -way to mount the Salomon filesystems directly as an external disc. - -More information about the shared file systems is available -[here](storage/storage.html). 
- diff --git a/docs.it4i/salomon/accessing-the-cluster/accessing-the-cluster.md b/docs.it4i/salomon/accessing-the-cluster/accessing-the-cluster.md new file mode 100644 index 000000000..db5adec33 --- /dev/null +++ b/docs.it4i/salomon/accessing-the-cluster/accessing-the-cluster.md @@ -0,0 +1,114 @@ +Shell access and data transfer +============================== + +Interactive Login +----------------- +The Salomon cluster is accessed by SSH protocol via login nodes login1, login2, login3 and login4 at address salomon.it4i.cz. The login nodes may be addressed specifically, by prepending the login node name to the address. + +>The alias >salomon.it4i.cz is currently not available through VPN connection. Please use loginX.salomon.it4i.cz when connected to VPN. + + |Login address|Port|Protocol|Login node| + |---|---|---|---| + |salomon.it4i.cz|22|ssh|round-robin DNS record for login[1-4]| + |login1.salomon.it4i.cz|22|ssh|login1| + |login1.salomon.it4i.cz|22|ssh|login1| + |login1.salomon.it4i.cz|22|ssh|login1| + |login1.salomon.it4i.cz|22|ssh|login1| + +The authentication is by the [private key](../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.html) + +>Please verify SSH fingerprints during the first logon. They are identical on all login nodes: +f6:28:98:e4:f9:b2:a6:8f:f2:f4:2d:0a:09:67:69:80 (DSA) +70:01:c9:9a:5d:88:91:c7:1b:c0:84:d1:fa:4e:83:5c (RSA) + +Private key authentication: + +On **Linux** or **Mac**, use + +```bash +local $ ssh -i /path/to/id_rsa username@salomon.it4i.cz +``` + +If you see warning message "UNPROTECTED PRIVATE KEY FILE!", use this command to set lower permissions to private key file. + +```bash +local $ chmod 600 /path/to/id_rsa +``` + +On **Windows**, use [PuTTY ssh client](../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/putty/putty.html). + +After logging in, you will see the command prompt: + +```bash + _____ _ + / ____| | | + | (___ __ _| | ___ _ __ ___ ___ _ __ + \___ \ / _` | |/ _ \| '_ ` _ \ / _ \| '_ \ + ____) | (_| | | (_) | | | | | | (_) | | | | + |_____/ \__,_|_|\___/|_| |_| |_|\___/|_| |_| + + + http://www.it4i.cz/?lang=en + + +Last login: Tue Jul 9 15:57:38 2013 from your-host.example.com +[username@login2.salomon ~]$ +``` + +>The environment is **not** shared between login nodes, except for [shared filesystems](storage/storage.html). + +Data Transfer +------------- +Data in and out of the system may be transferred by the [scp](http://en.wikipedia.org/wiki/Secure_copy) and sftp protocols. + +In case large volumes of data are transferred, use dedicated data mover nodes cedge[1-3].salomon.it4i.cz for increased performance. 
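+
+As a minimal sketch, a large transfer via a data mover node could look like the following (this assumes the cedge nodes accept the same key-based scp access as the login nodes; verify the port and protocol before relying on it):
+
+```bash
+# assumption: cedge nodes accept key-based scp on the default SSH port
+local $ scp -i /path/to/id_rsa my-large-file username@cedge1.salomon.it4i.cz:directory/
+```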
+ +HTML commented section #1 (removed cedge servers from the table) + + |Address|Port|Protocol| + |---|---| + |salomon.it4i.cz|22|scp, sftp| + |login1.salomon.it4i.cz|22|scp, sftp| + |login2.salomon.it4i.cz|22|scp, sftp| + |login3.salomon.it4i.cz|22|scp, sftp| + |login4.salomon.it4i.cz|22|scp, sftp| + +The authentication is by the [private key](../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.html) + +HTML commented section #2 (ssh transfer performance data need to be verified) + +On linux or Mac, use scp or sftp client to transfer the data to Salomon: + +```bash +local $ scp -i /path/to/id_rsa my-local-file username@salomon.it4i.cz:directory/file +``` + +```bash +local $ scp -i /path/to/id_rsa -r my-local-dir username@salomon.it4i.cz:directory +``` + +or + +```bash +local $ sftp -o IdentityFile=/path/to/id_rsa username@salomon.it4i.cz +``` + +Very convenient way to transfer files in and out of the Salomon computer is via the fuse filesystem [sshfs](http://linux.die.net/man/1/sshfs) + +```bash +local $ sshfs -o IdentityFile=/path/to/id_rsa username@salomon.it4i.cz:. mountpoint +``` + +Using sshfs, the users Salomon home directory will be mounted on your local computer, just like an external disk. + +Learn more on ssh, scp and sshfs by reading the manpages + +```bash +$ man ssh +$ man scp +$ man sshfs +``` + +On Windows, use [WinSCP client](http://winscp.net/eng/download.php) to transfer the data. The [win-sshfs client](http://code.google.com/p/win-sshfs/) provides a way to mount the Salomon filesystems directly as an external disc. + +More information about the shared file systems is available [here](storage/storage.html). \ No newline at end of file diff --git a/docs.it4i/salomon/accessing-the-cluster/outgoing-connections.md b/docs.it4i/salomon/accessing-the-cluster/outgoing-connections.md index 252d36b8f..43128e14b 100644 --- a/docs.it4i/salomon/accessing-the-cluster/outgoing-connections.md +++ b/docs.it4i/salomon/accessing-the-cluster/outgoing-connections.md @@ -1,118 +1,80 @@ -Outgoing connections +Outgoing connections ==================== - - Connection restrictions ----------------------- +Outgoing connections, from Salomon Cluster login nodes to the outside world, are restricted to following ports: -Outgoing connections, from Salomon Cluster login nodes to the outside -world, are restricted to following ports: - - |Port|Protocol| - |---|---| - |22|ssh| - |80|http| - |443|https| - |9418|git| +|Port|Protocol| +|---|---| +|22|ssh| +|80|http| +|443|https| +|9418|git| -Please use **ssh port forwarding** and proxy servers to connect from -Salomon to all other remote ports. +>Please use **ssh port forwarding** and proxy servers to connect from Salomon to all other remote ports. -Outgoing connections, from Salomon Cluster compute nodes are restricted -to the internal network. Direct connections form compute nodes to -outside world are cut. +Outgoing connections, from Salomon Cluster compute nodes are restricted to the internal network. Direct connections form compute nodes to outside world are cut. Port forwarding --------------- ### Port forwarding from login nodes -Port forwarding allows an application running on Salomon to connect to -arbitrary remote host and port. +>Port forwarding allows an application running on Salomon to connect to arbitrary remote host and port. -It works by tunneling the connection from Salomon back to users -workstation and forwarding from the workstation to the remote host. 
+It works by tunneling the connection from Salomon back to users workstation and forwarding from the workstation to the remote host. -Pick some unused port on Salomon login node (for example 6000) and -establish the port forwarding: +Pick some unused port on Salomon login node (for example 6000) and establish the port forwarding: -` +```bash local $ ssh -R 6000:remote.host.com:1234 salomon.it4i.cz -` +``` -In this example, we establish port forwarding between port 6000 on -Salomon and port 1234 on the remote.host.com. By accessing -localhost:6000 on Salomon, an application will see response of -remote.host.com:1234. The traffic will run via users local workstation. +In this example, we establish port forwarding between port 6000 on Salomon and port 1234 on the remote.host.com. By accessing localhost:6000 on Salomon, an application will see response of remote.host.com:1234. The traffic will run via users local workstation. -Port forwarding may be done **using PuTTY** as well. On the PuTTY -Configuration screen, load your Salomon configuration first. Then go to -Connection->SSH->Tunnels to set up the port forwarding. Click -Remote radio button. Insert 6000 to Source port textbox. Insert -remote.host.com:1234. Click Add button, then Open. +Port forwarding may be done **using PuTTY** as well. On the PuTTY Configuration screen, load your Salomon configuration first. Then go to Connection->SSH->Tunnels to set up the port forwarding. Click Remote radio button. Insert 6000 to Source port textbox. Insert remote.host.com:1234. Click Add button, then Open. -Port forwarding may be established directly to the remote host. However, -this requires that user has ssh access to remote.host.com +Port forwarding may be established directly to the remote host. However, this requires that user has ssh access to remote.host.com -` +```bash $ ssh -L 6000:localhost:1234 remote.host.com -` +``` Note: Port number 6000 is chosen as an example only. Pick any free port. ### Port forwarding from compute nodes -Remote port forwarding from compute nodes allows applications running on -the compute nodes to access hosts outside Salomon Cluster. +Remote port forwarding from compute nodes allows applications running on the compute nodes to access hosts outside Salomon Cluster. -First, establish the remote port forwarding form the login node, as -[described -above](outgoing-connections.html#port-forwarding-from-login-nodes). +First, establish the remote port forwarding form the login node, as [described above](outgoing-connections.html#port-forwarding-from-login-nodes). -Second, invoke port forwarding from the compute node to the login node. -Insert following line into your jobscript or interactive shell +Second, invoke port forwarding from the compute node to the login node. Insert following line into your jobscript or interactive shell -` +```bash $ ssh -TN -f -L 6000:localhost:6000 login1 -` +``` -In this example, we assume that port forwarding from login1:6000 to -remote.host.com:1234 has been established beforehand. By accessing -localhost:6000, an application running on a compute node will see -response of remote.host.com:1234 +In this example, we assume that port forwarding from login1:6000 to remote.host.com:1234 has been established beforehand. By accessing localhost:6000, an application running on a compute node will see response of remote.host.com:1234 ### Using proxy servers -Port forwarding is static, each single port is mapped to a particular -port on remote host. 
Connection to other remote host, requires new -forward. +Port forwarding is static, each single port is mapped to a particular port on remote host. Connection to other remote host, requires new forward. -Applications with inbuilt proxy support, experience unlimited access to -remote hosts, via single proxy server. +>Applications with inbuilt proxy support, experience unlimited access to remote hosts, via single proxy server. -To establish local proxy server on your workstation, install and run -SOCKS proxy server software. On Linux, sshd demon provides the -functionality. To establish SOCKS proxy server listening on port 1080 -run: +To establish local proxy server on your workstation, install and run SOCKS proxy server software. On Linux, sshd demon provides the functionality. To establish SOCKS proxy server listening on port 1080 run: -` +```bash local $ ssh -D 1080 localhost -` +``` -On Windows, install and run the free, open source [Sock -Puppet](http://sockspuppet.com/) server. +On Windows, install and run the free, open source [Sock Puppet](http://sockspuppet.com/) server. -Once the proxy server is running, establish ssh port forwarding from -Salomon to the proxy server, port 1080, exactly as [described -above](outgoing-connections.html#port-forwarding-from-login-nodes). +Once the proxy server is running, establish ssh port forwarding from Salomon to the proxy server, port 1080, exactly as [described above](outgoing-connections.html#port-forwarding-from-login-nodes). -` +```bash local $ ssh -R 6000:localhost:1080 salomon.it4i.cz -` - -Now, configure the applications proxy settings to **localhost:6000**. -Use port forwarding to access the [proxy server from compute -nodes](outgoing-connections.html#port-forwarding-from-compute-nodes) -as well . +``` +Now, configure the applications proxy settings to **localhost:6000**. Use port forwarding to access the [proxy server from compute nodes](outgoing-connections.html#port-forwarding-from-compute-nodes) as well . \ No newline at end of file diff --git a/docs.it4i/salomon/accessing-the-cluster/vpn-access.md b/docs.it4i/salomon/accessing-the-cluster/vpn-access.md index 7ac87bb0c..0f8411cc9 100644 --- a/docs.it4i/salomon/accessing-the-cluster/vpn-access.md +++ b/docs.it4i/salomon/accessing-the-cluster/vpn-access.md @@ -1,50 +1,38 @@ -VPN Access +VPN Access ========== - - Accessing IT4Innovations internal resources via VPN --------------------------------------------------- -For using resources and licenses which are located at IT4Innovations -local network, it is necessary to VPN connect to this network. -We use Cisco AnyConnect Secure Mobility Client, which is supported on -the following operating systems: +For using resources and licenses which are located at IT4Innovations local network, it is necessary to VPN connect to this network. We use Cisco AnyConnect Secure Mobility Client, which is supported on the following operating systems: -- >Windows XP -- >Windows Vista -- >Windows 7 -- >Windows 8 -- >Linux -- >MacOS +- Windows XP +- Windows Vista +- Windows 7 +- Windows 8 +- Linux +- MacOS It is impossible to connect to VPN from other operating systems. 
VPN client installation ------------------------------------ -You can install VPN client from web interface after successful login -with LDAP credentials on address <https://vpn.it4i.cz/user> +You can install VPN client from web interface after successful login with LDAP credentials on address <https://vpn.it4i.cz/user>  -According to the Java settings after login, the client either -automatically installs, or downloads installation file for your -operating system. It is necessary to allow start of installation tool -for automatic installation. +According to the Java settings after login, the client either automatically installs, or downloads installation file for your operating system. It is necessary to allow start of installation tool for automatic installation.  -   -After successful installation, VPN connection will be established and -you can use available resources from IT4I network. +After successful installation, VPN connection will be established and you can use available resources from IT4I network.  -If your Java setting doesn't allow automatic installation, you can -download installation file and install VPN client manually. +If your Java setting doesn't allow automatic installation, you can download installation file and install VPN client manually.  @@ -52,57 +40,39 @@ After you click on the link, download of installation file will start.  -After successful download of installation file, you have to execute this -tool with administrator's rights and install VPN client manually. +After successful download of installation file, you have to execute this tool with administrator's rights and install VPN client manually. Working with VPN client ----------------------- -You can use graphical user interface or command line interface to run -VPN client on all supported operating systems. We suggest using GUI. +You can use graphical user interface or command line interface to run VPN client on all supported operating systems. We suggest using GUI. -Before the first login to VPN, you have to fill -URL **https://vpn.it4i.cz/user** into the text field. +Before the first login to VPN, you have to fill URL **[https://vpn.it4i.cz/user](https://vpn.it4i.cz/user)** into the text field. - Contacting  -After you click on the Connect button, you must fill your login -credentials. +After you click on the Connect button, you must fill your login credentials. - Contacting  -After a successful login, the client will minimize to the system tray. -If everything works, you can see a lock in the Cisco tray icon. +After a successful login, the client will minimize to the system tray. If everything works, you can see a lock in the Cisco tray icon. -[  -If you right-click on this icon, you will see a context menu in which -you can control the VPN connection. +If you right-click on this icon, you will see a context menu in which you can control the VPN connection. -[  -When you connect to the VPN for the first time, the client downloads the -profile and creates a new item "IT4I cluster" in the connection list. -For subsequent connections, it is not necessary to re-enter the URL -address, but just select the corresponding item. +When you connect to the VPN for the first time, the client downloads the profile and creates a new item "IT4I cluster" in the connection list. For subsequent connections, it is not necessary to re-enter the URL address, but just select the corresponding item. - Contacting  Then AnyConnect automatically proceeds like in the case of first logon.  
-After a successful logon, you can see a green circle with a tick mark on -the lock icon. +After a successful logon, you can see a green circle with a tick mark on the lock icon. - Succesfull  -For disconnecting, right-click on the AnyConnect client icon in the -system tray and select **VPN Disconnect**. - +For disconnecting, right-click on the AnyConnect client icon in the system tray and select **VPN Disconnect**. \ No newline at end of file diff --git a/docs.it4i/salomon/environment-and-modules.md b/docs.it4i/salomon/environment-and-modules.md index e9da01143..e47c9130c 100644 --- a/docs.it4i/salomon/environment-and-modules.md +++ b/docs.it4i/salomon/environment-and-modules.md @@ -1,15 +1,11 @@ -Environment and Modules +Environment and Modules ======================= - - ### Environment Customization -After logging in, you may want to configure the environment. Write your -preferred path definitions, aliases, functions and module loads in the -.bashrc file +After logging in, you may want to configure the environment. Write your preferred path definitions, aliases, functions and module loads in the .bashrc file -` +```bash # ./bashrc # Source global definitions @@ -26,25 +22,17 @@ if [ -n "$SSH_TTY" ] then module list # Display loaded modules fi -` +``` -Do not run commands outputing to standard output (echo, module list, -etc) in .bashrc for non-interactive SSH sessions. It breaks fundamental -functionality (scp, PBS) of your account! Take care for SSH session -interactivity for such commands as - stated in the previous example. -in the previous example. +>Do not run commands outputing to standard output (echo, module list, etc) in .bashrc for non-interactive SSH sessions. It breaks fundamental functionality (scp, PBS) of your account! Take care for SSH session interactivity for such commands as stated in the previous example. ### Application Modules -In order to configure your shell for running particular application on -Salomon we use Module package interface. +In order to configure your shell for running particular application on Salomon we use Module package interface. -Application modules on Salomon cluster are built using -[EasyBuild](http://hpcugent.github.io/easybuild/ "EasyBuild"). The -modules are divided into the following structure: +Application modules on Salomon cluster are built using [EasyBuild](http://hpcugent.github.io/easybuild/ "EasyBuild"). The modules are divided into the following structure: -` +```bash base: Default module class bio: Bioinformatics, biology and biomedical cae: Computer Aided Engineering (incl. CFD) @@ -66,86 +54,61 @@ modules are divided into the following structure: toolchain: EasyBuild toolchains tools: General purpose tools vis: Visualization, plotting, documentation and typesetting -` +``` -The modules set up the application paths, library paths and environment -variables for running particular application. +>The modules set up the application paths, library paths and environment variables for running particular application. -The modules may be loaded, unloaded and switched, according to momentary -needs. +The modules may be loaded, unloaded and switched, according to momentary needs. 
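+
+For example, an already loaded module can be exchanged for another version in a single step with `module swap` (the module names below are purely illustrative):
+
+```bash
+# illustrative names only; check "module avail" for the versions actually installed
+$ module swap OpenMPI OpenMPI/<other-version>
+```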
To check available modules use -` +```bash $ module avail -` +``` To load a module, for example the OpenMPI module use -` +```bash $ module load OpenMPI -` +``` -loading the OpenMPI module will set up paths and environment variables -of your active shell such that you are ready to run the OpenMPI software +loading the OpenMPI module will set up paths and environment variables of your active shell such that you are ready to run the OpenMPI software To check loaded modules use -` +```bash $ module list -` +``` - To unload a module, for example the OpenMPI module use +To unload a module, for example the OpenMPI module use -` +```bash $ module unload OpenMPI -` +``` Learn more on modules by reading the module man page -` +```bash $ man module -` +``` ### EasyBuild Toolchains -As we wrote earlier, we are using EasyBuild for automatised software -installation and module creation. - -EasyBuild employs so-called **compiler toolchains** or, -simply toolchains for short, which are a major concept in handling the -build and installation processes. - -A typical toolchain consists of one or more compilers, usually put -together with some libraries for specific functionality, e.g., for using -an MPI stack for distributed computing, or which provide optimized -routines for commonly used math operations, e.g., the well-known -BLAS/LAPACK APIs for linear algebra routines. - -For each software package being built, the toolchain to be used must be -specified in some way. - -The EasyBuild framework prepares the build environment for the different -toolchain components, by loading their respective modules and defining -environment variables to specify compiler commands (e.g., -via `$F90`), compiler and linker options (e.g., -via `$CFLAGS` and `$LDFLAGS`{.docutils .literal}), -the list of library names to supply to the linker (via `$LIBS`{.docutils -.literal}), etc. This enables making easyblocks -largely toolchain-agnostic since they can simply rely on these -environment variables; that is, unless they need to be aware of, for -example, the particular compiler being used to determine the build -configuration options. - -Recent releases of EasyBuild include out-of-the-box toolchain support -for: +As we wrote earlier, we are using EasyBuild for automatised software installation and module creation. + +EasyBuild employs so-called **compiler toolchains** or, simply toolchains for short, which are a major concept in handling the build and installation processes. + +A typical toolchain consists of one or more compilers, usually put together with some libraries for specific functionality, e.g., for using an MPI stack for distributed computing, or which provide optimized routines for commonly used math operations, e.g., the well-known BLAS/LAPACK APIs for linear algebra routines. + +For each software package being built, the toolchain to be used must be specified in some way. + +The EasyBuild framework prepares the build environment for the different toolchain components, by loading their respective modules and defining environment variables to specify compiler commands (e.g., via `$F90`), compiler and linker options (e.g., via `$CFLAGS` and `$LDFLAGS`), the list of library names to supply to the linker (via `$LIBS`), etc. This enables making easyblocks largely toolchain-agnostic since they can simply rely on these environment variables; that is, unless they need to be aware of, for example, the particular compiler being used to determine the build configuration options. 
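+
+A quick way to see this in practice is to load a toolchain module and inspect the variables it defines (the module name and the exact set of variables shown are illustrative and depend on the toolchains installed):
+
+```bash
+# "intel" is used here only as an example toolchain module name
+$ module load intel
+$ env | grep -E '^(CC|CXX|F90|FC|CFLAGS|FFLAGS|LDFLAGS|LIBS)='
+```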
+ +Recent releases of EasyBuild include out-of-the-box toolchain support for: - various compilers, including GCC, Intel, Clang, CUDA - common MPI libraries, such as Intel MPI, MPICH, MVAPICH2, OpenMPI -- various numerical libraries, including ATLAS, Intel MKL, OpenBLAS, - ScalaPACK, FFTW - - +- various numerical libraries, including ATLAS, Intel MKL, OpenBLAS, ScalaPACK, FFTW On Salomon, we have currently following toolchains installed: diff --git a/docs.it4i/salomon/hardware-overview-1/hardware-overview.md b/docs.it4i/salomon/hardware-overview-1/hardware-overview.md deleted file mode 100644 index 58d4f81c6..000000000 --- a/docs.it4i/salomon/hardware-overview-1/hardware-overview.md +++ /dev/null @@ -1,89 +0,0 @@ -Hardware Overview -================= - - - -Introduction ------------- - -The Salomon cluster consists of 1008 computational nodes of which 576 -are regular compute nodes and 432 accelerated nodes. Each node is a - powerful x86-64 computer, equipped -with 24 cores (two twelve-core Intel Xeon processors) and 128GB RAM. The -nodes are interlinked by high speed InfiniBand and Ethernet networks. -All nodes share 0.5PB /home NFS disk storage to store the user files. -Users may use a DDN Lustre shared storage with capacity of 1.69 PB which -is available for the scratch project data. The user access to the -Salomon cluster is provided by four login nodes. - -[More about schematic representation of the Salomon cluster compute -nodes IB -topology](../network-1/ib-single-plane-topology.html). - - - -The parameters are summarized in the following tables: - -General information -------------------- - -In general** -Primary purpose -High Performance Computing -Architecture of compute nodes -x86-64 -Operating system -CentOS 6.7 Linux -[**Compute nodes**](../compute-nodes.html) -Totally -1008 -Processor -2x Intel Xeon E5-2680v3, 2.5GHz, 12cores -RAM -128GB, 5.3GB per core, DDR4@2133 MHz -Local disk drive -no -Compute network / Topology -InfiniBand FDR56 / 7D Enhanced hypercube -w/o accelerator -576 -MIC accelerated -432 -In total** -Total theoretical peak performance (Rpeak) -2011 Tflop/s -Total amount of RAM -129.024 TB -Compute nodes -------------- - - |Node|Count|Processor|Cores|Memory|Accelerator| - ----------------- - |---|---|------------------------ ------- -------- -------------------------------------------- - |w/o accelerator|576|2x Intel Xeon E5-2680v3, 2.5GHz|24|128GB|-| - |MIC accelerated|432|2x Intel Xeon E5-2680v3, 2.5GHz|24|128GB|2x Intel Xeon Phi 7120P, 61cores, 16GB RAM| - -For more details please refer to the [Compute -nodes](../compute-nodes.html). 
- -Remote visualization nodes --------------------------- - -For remote visualization two nodes with NICE DCV software are available -each configured: - - |Node|Count|Processor|Cores|Memory|GPU Accelerator| - --------------- - |---|---|----------------------- ------- -------- ------------------------------ - |visualization|2|2x Intel Xeon E5-2695v3, 2.3GHz|28|512GB|NVIDIA QUADRO K5000, 4GB RAM| - -SGI UV 2000 ------------ - -For large memory computations a special SMP/NUMA SGI UV 2000 server is -available: - - |Node |Count |Processor |Cores<th align="left">Memory<th align="left">Extra HW | - | --- | --- | - |UV2000 |1 |14x Intel Xeon E5-4627v2, 3.3GHz, 8cores |112 |3328GB DDR3@1866MHz |2x 400GB local SSD1x NVIDIA GM200(GeForce GTX TITAN X),12GB RAM\ | - - - diff --git a/docs.it4i/salomon/hardware-overview.md b/docs.it4i/salomon/hardware-overview.md new file mode 100644 index 000000000..555dbcf5f --- /dev/null +++ b/docs.it4i/salomon/hardware-overview.md @@ -0,0 +1,60 @@ +Hardware Overview +================= + +Introduction +------------ +The Salomon cluster consists of 1008 computational nodes of which 576 are regular compute nodes and 432 accelerated nodes. Each node is a powerful x86-64 computer, equipped with 24 cores (two twelve-core Intel Xeon processors) and 128GB RAM. The nodes are interlinked by high speed InfiniBand and Ethernet networks. All nodes share 0.5PB /home NFS disk storage to store the user files. Users may use a DDN Lustre shared storage with capacity of 1.69 PB which is available for the scratch project data. The user access to the Salomon cluster is provided by four login nodes. + +[More about schematic representation of the Salomon cluster compute nodes IB topology](../network/ib-single-plane-topology.md). + + + +The parameters are summarized in the following tables: + +General information +------------------- + +|**In general**|| +|---|---| +|Primary purpose|High Performance Computing| +|Architecture of compute nodes|x86-64| +|Operating system|CentOS 6.7 Linux| +|[**Compute nodes**](../compute-nodes.md)|| +|Totally|1008| +|Processor|2x Intel Xeon E5-2680v3, 2.5GHz, 12cores| +|RAM|128GB, 5.3GB per core, DDR4@2133 MHz| +|Local disk drive|no| +|Compute network / Topology|InfiniBand FDR56 / 7D Enhanced hypercube| +|w/o accelerator|576| +|MIC accelerated|432| +|**In total**|| +|Total theoretical peak performance (Rpeak)|2011 Tflop/s| +|Total amount of RAM|129.024 TB| + +Compute nodes +------------- + +|Node|Count|Processor|Cores|Memory|Accelerator| +|---|---| +|w/o accelerator|576|2x Intel Xeon E5-2680v3, 2.5GHz|24|128GB|-| +|MIC accelerated|432|2x Intel Xeon E5-2680v3, 2.5GHz|24|128GB|2x Intel Xeon Phi 7120P, 61cores, 16GB RAM| + +For more details please refer to the [Compute nodes](../compute-nodes.md). 
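+
+The processor, core count and memory listed above can be cross-checked directly on an allocated node with standard Linux tools (a quick sanity check, not an authoritative inventory):
+
+```bash
+$ grep -m1 'model name' /proc/cpuinfo   # processor model
+$ grep -c ^processor /proc/cpuinfo      # logical core count
+$ free -g | grep Mem                    # installed memory in GB
+```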
+ +Remote visualization nodes +-------------------------- +For remote visualization two nodes with NICE DCV software are available each configured: + +|Node|Count|Processor|Cores|Memory|GPU Accelerator| +|---|---| +|visualization|2|2x Intel Xeon E5-2695v3, 2.3GHz|28|512GB|NVIDIA QUADRO K5000, 4GB RAM| + +SGI UV 2000 +----------- +For large memory computations a special SMP/NUMA SGI UV 2000 server is available: + +|Node |Count |Processor |Cores|Memory|Extra HW | +| --- | --- | +|UV2000 |1 |14x Intel Xeon E5-4627v2, 3.3GHz, 8cores |112 |3328GB DDR3@1866MHz |2x 400GB local SSD1x NVIDIA GM200(GeForce GTX TITAN X),12GB RAM\ | + + \ No newline at end of file diff --git a/docs.it4i/salomon/introduction.md b/docs.it4i/salomon/introduction.md index 7996d0664..9eb6ee3fa 100644 --- a/docs.it4i/salomon/introduction.md +++ b/docs.it4i/salomon/introduction.md @@ -1,23 +1,9 @@ -Introduction +Introduction ============ -Welcome to Salomon supercomputer cluster. The Salomon cluster consists -of 1008 compute nodes, totaling 24192 compute cores with 129TB RAM and -giving over 2 Pflop/s theoretical peak performance. Each node is a -powerful x86-64 computer, equipped with 24 -cores, at least 128GB RAM. Nodes are interconnected by 7D Enhanced -hypercube Infiniband network and equipped with Intel Xeon E5-2680v3 -processors. The Salomon cluster consists of 576 nodes without -accelerators and 432 nodes equipped with Intel Xeon Phi MIC -accelerators. Read more in [Hardware -Overview](hardware-overview-1/hardware-overview.html). - -The cluster runs CentOS Linux [ -](http://www.bull.com/bullx-logiciels/systeme-exploitation.html) -operating system, which is compatible with -the RedHat [ -Linux -family.](http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg) +Welcome to Salomon supercomputer cluster. The Salomon cluster consists of 1008 compute nodes, totaling 24192 compute cores with 129TB RAM and giving over 2 Pflop/s theoretical peak performance. Each node is a powerful x86-64 computer, equipped with 24 cores, at least 128GB RAM. Nodes are interconnected by 7D Enhanced hypercube Infiniband network and equipped with Intel Xeon E5-2680v3 processors. The Salomon cluster consists of 576 nodes without accelerators and 432 nodes equipped with Intel Xeon Phi MIC accelerators. Read more in [Hardware Overview](hardware-overview-1/hardware-overview.html). + +The cluster runs [CentOS Linux](http://www.bull.com/bullx-logiciels/systeme-exploitation.html) operating system, which is compatible with the RedHat [ Linux family.](http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg) **Water-cooled Compute Nodes With MIC Accelerator** @@ -29,5 +15,4 @@ family.](http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_T  - - + \ No newline at end of file diff --git a/docs.it4i/salomon/network-1/7d-enhanced-hypercube.md b/docs.it4i/salomon/network-1/7d-enhanced-hypercube.md deleted file mode 100644 index 633115bd6..000000000 --- a/docs.it4i/salomon/network-1/7d-enhanced-hypercube.md +++ /dev/null @@ -1,37 +0,0 @@ -7D Enhanced Hypercube -===================== - -[More about Job submission - Placement by IB switch / Hypercube -dimension.](../resource-allocation-and-job-execution/job-submission-and-execution.html) - -Nodes may be selected via the PBS resource attribute ehc_[1-7]d . 
- - |Hypercube|dimension| - --------------- |---|---|--------------------------------- - |1D|ehc_1d| - |2D|ehc_2d| - |3D|ehc_3d| - |4D|ehc_4d| - |5D|ehc_5d| - |6D|ehc_6d| - |7D|ehc_7d| - -[Schematic representation of the Salomon cluster IB single-plain -topology represents hypercube -dimension 0](ib-single-plane-topology.html). - -### 7D Enhanced Hypercube {#d-enhanced-hypercube} - - - - - - |Node type|Count|Short name|Long name|Rack| - -------------------------------------- - |---|---|-------- -------------------------- ------- - |M-Cell compute nodes w/o accelerator|576|cns1 -cns576|r1i0n0 - r4i7n17|1-4| - |compute nodes MIC accelerated|432|cns577 - cns1008|r21u01n577 - r37u31n1008|21-38| - -###  IB Topology - - - diff --git a/docs.it4i/salomon/network-1/7D_Enhanced_hypercube.png b/docs.it4i/salomon/network/7D_Enhanced_hypercube.png similarity index 100% rename from docs.it4i/salomon/network-1/7D_Enhanced_hypercube.png rename to docs.it4i/salomon/network/7D_Enhanced_hypercube.png diff --git a/docs.it4i/salomon/network/7d-enhanced-hypercube.md b/docs.it4i/salomon/network/7d-enhanced-hypercube.md new file mode 100644 index 000000000..6d2a2eb5b --- /dev/null +++ b/docs.it4i/salomon/network/7d-enhanced-hypercube.md @@ -0,0 +1,32 @@ +7D Enhanced Hypercube +===================== + +[More about Job submission - Placement by IB switch / Hypercube dimension.](../resource-allocation-and-job-execution/job-submission-and-execution.md) + +Nodes may be selected via the PBS resource attribute ehc_[1-7]d . + +|Hypercube|dimension| +|---|---| +|1D|ehc_1d| +|2D|ehc_2d| +|3D|ehc_3d| +|4D|ehc_4d| +|5D|ehc_5d| +|6D|ehc_6d| +|7D|ehc_7d| + +[Schematic representation of the Salomon cluster IB single-plain topology represents hypercube dimension 0](ib-single-plane-topology.md). 
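+
+As an illustration, the attributes above can be combined with PBS node grouping to keep an allocation within a given hypercube dimension (the exact qsub options shown are an assumption; see the job submission page linked above for the authoritative form):
+
+```bash
+# illustrative only: keep all 4 nodes within one 1-D hypercube group
+$ qsub -A PROJECT_ID -q qprod -l select=4:ncpus=24 -l place=group=ehc_1d ./myjob
+```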
+ +### 7D Enhanced Hypercube {#d-enhanced-hypercube} + + + +|Node type|Count|Short name|Long name|Rack| +|---|---| +|M-Cell compute nodes w/o accelerator|576|cns1 -cns576|r1i0n0 - r4i7n17|1-4| +|compute nodes MIC accelerated|432|cns577 - cns1008|r21u01n577 - r37u31n1008|21-38| + +###  IB Topology + + + diff --git a/docs.it4i/salomon/network-1/IBsingleplanetopologyAcceleratednodessmall.png b/docs.it4i/salomon/network/IBsingleplanetopologyAcceleratednodessmall.png similarity index 100% rename from docs.it4i/salomon/network-1/IBsingleplanetopologyAcceleratednodessmall.png rename to docs.it4i/salomon/network/IBsingleplanetopologyAcceleratednodessmall.png diff --git a/docs.it4i/salomon/network-1/IBsingleplanetopologyICEXMcellsmall.png b/docs.it4i/salomon/network/IBsingleplanetopologyICEXMcellsmall.png similarity index 100% rename from docs.it4i/salomon/network-1/IBsingleplanetopologyICEXMcellsmall.png rename to docs.it4i/salomon/network/IBsingleplanetopologyICEXMcellsmall.png diff --git a/docs.it4i/salomon/network-1/Salomon_IB_topology.png b/docs.it4i/salomon/network/Salomon_IB_topology.png similarity index 100% rename from docs.it4i/salomon/network-1/Salomon_IB_topology.png rename to docs.it4i/salomon/network/Salomon_IB_topology.png diff --git a/docs.it4i/salomon/network-1/ib-single-plane-topology.md b/docs.it4i/salomon/network/ib-single-plane-topology.md similarity index 53% rename from docs.it4i/salomon/network-1/ib-single-plane-topology.md rename to docs.it4i/salomon/network/ib-single-plane-topology.md index 70bd60ea1..c7d5a9ee8 100644 --- a/docs.it4i/salomon/network-1/ib-single-plane-topology.md +++ b/docs.it4i/salomon/network/ib-single-plane-topology.md @@ -1,22 +1,13 @@ -IB single-plane topology +IB single-plane topology ======================== - +A complete M-Cell assembly consists of four compute racks. Each rack contains 4x physical IRUs - Independent rack units. Using one dual socket node per one blade slot leads to 8 logical IRUs. Each rack contains 4x2 SGI ICE X IB Premium Blades. -A complete M-Cell assembly consists of four compute racks. Each rack -contains 4x physical IRUs - Independent rack units. Using one dual -socket node per one blade slot leads to 8 logical IRUs. Each rack -contains 4x2 SGI ICE X IB Premium Blades. +The SGI ICE X IB Premium Blade provides the first level of interconnection via dual 36-port Mellanox FDR InfiniBand ASIC switch with connections as follows: -The SGI ICE X IB Premium Blade provides the first level of -interconnection via dual 36-port Mellanox FDR InfiniBand ASIC switch -with connections as follows: - -- 9 ports from each switch chip connect to the unified backplane, to - connect the 18 compute node slots +- 9 ports from each switch chip connect to the unified backplane, to connect the 18 compute node slots - 3 ports on each chip provide connectivity between the chips -- 24 ports from each switch chip connect to the external bulkhead, for - a total of 48 +- 24 ports from each switch chip connect to the external bulkhead, for a total of 48 ###IB single-plane topology - ICEX Mcell @@ -24,23 +15,14 @@ Each colour in each physical IRU represents one dual-switch ASIC switch.  - - ### IB single-plane topology - Accelerated nodes -Each of the 3 inter-connected D racks are equivalent to one half of -Mcell rack. 18x D rack with MIC accelerated nodes [r21-r38] are -equivalent to 3 Mcell racks as shown in a diagram [7D Enhanced -Hypercube](7d-enhanced-hypercube.html). +Each of the 3 inter-connected D racks are equivalent to one half of Mcell rack. 
18x D rack with MIC accelerated nodes [r21-r38] are equivalent to 3 Mcell racks as shown in a diagram [7D Enhanced Hypercube](7d-enhanced-hypercube.md). -As shown in a diagram : +As shown in a diagram : - Racks 21, 22, 23, 24, 25, 26 are equivalent to one Mcell rack. - Racks 27, 28, 29, 30, 31, 32 are equivalent to one Mcell rack. - Racks 33, 34, 35, 36, 37, 38 are equivalent to one Mcell rack. - - - - + \ No newline at end of file diff --git a/docs.it4i/salomon/network-1/network.md b/docs.it4i/salomon/network/network.md similarity index 54% rename from docs.it4i/salomon/network-1/network.md rename to docs.it4i/salomon/network/network.md index 79187d02c..afe7789ef 100644 --- a/docs.it4i/salomon/network-1/network.md +++ b/docs.it4i/salomon/network/network.md @@ -1,45 +1,24 @@ -Network +Network ======= - - -All compute and login nodes of Salomon are interconnected by 7D Enhanced -hypercube -[Infiniband](http://en.wikipedia.org/wiki/InfiniBand) -network and by Gigabit -[Ethernet](http://en.wikipedia.org/wiki/Ethernet) -network. Only -[Infiniband](http://en.wikipedia.org/wiki/InfiniBand) -network may be used to transfer user data. +All compute and login nodes of Salomon are interconnected by 7D Enhanced hypercube [Infiniband](http://en.wikipedia.org/wiki/InfiniBand) network and by Gigabit [Ethernet](http://en.wikipedia.org/wiki/Ethernet) +network. Only [Infiniband](http://en.wikipedia.org/wiki/InfiniBand) network may be used to transfer user data. Infiniband Network ------------------ +All compute and login nodes of Salomon are interconnected by 7D Enhanced hypercube [Infiniband](http://en.wikipedia.org/wiki/InfiniBand) network (56 Gbps). The network topology is a [7D Enhanced hypercube](7d-enhanced-hypercube.md). -All compute and login nodes of Salomon are interconnected by 7D Enhanced -hypercube -[Infiniband](http://en.wikipedia.org/wiki/InfiniBand) -network (56 Gbps). The network topology is a [7D Enhanced -hypercube](7d-enhanced-hypercube.html). - -Read more about schematic representation of the Salomon cluster [IB -single-plain topology](ib-single-plane-topology.html) -([hypercube dimension](7d-enhanced-hypercube.html) -0).[>](IB%20single-plane%20topology%20-%20Accelerated%20nodes.pdf/view.html) +Read more about schematic representation of the Salomon cluster [IB single-plain topology](ib-single-plane-topology.md) +([hypercube dimension](7d-enhanced-hypercube.md) 0). -The compute nodes may be accessed via the Infiniband network using ib0 -network interface, in address range 10.17.0.0 (mask 255.255.224.0). The -MPI may be used to establish native Infiniband connection among the -nodes. +The compute nodes may be accessed via the Infiniband network using ib0 network interface, in address range 10.17.0.0 (mask 255.255.224.0). The MPI may be used to establish native Infiniband connection among the nodes. -The network provides **2170MB/s** transfer rates via the TCP connection -(single stream) and up to **3600MB/s** via native Infiniband protocol. - - +The network provides **2170MB/s** transfer rates via the TCP connection (single stream) and up to **3600MB/s** via native Infiniband protocol. 
Example ------- -` +```bash $ qsub -q qexp -l select=4:ncpus=16 -N Name0 ./myjob $ qstat -n -u username Req'd Req'd Elap @@ -47,19 +26,18 @@ Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time --------------- -------- -- |---|---| ------ --- --- ------ ----- - ----- 15209.isrv5 username qexp Name0 5530 4 96 -- 01:00 R 00:00 r4i1n0/0*24+r4i1n1/0*24+r4i1n2/0*24+r4i1n3/0*24 -` +``` -In this example, we access the node r4i1n0 by Infiniband network via the -ib0 interface. +In this example, we access the node r4i1n0 by Infiniband network via the ib0 interface. -` +```bash $ ssh 10.17.35.19 -` +``` In this example, we get information of the Infiniband network. -` +```bash $ ifconfig .... inet addr:10.17.35.19.... @@ -70,5 +48,4 @@ $ ip addr show ib0 .... inet 10.17.35.19.... .... -` - +``` \ No newline at end of file diff --git a/docs.it4i/salomon/prace.md b/docs.it4i/salomon/prace.md index daee86213..825758924 100644 --- a/docs.it4i/salomon/prace.md +++ b/docs.it4i/salomon/prace.md @@ -1,110 +1,61 @@ -PRACE User Support +PRACE User Support ================== - - Intro ----- +PRACE users coming to Salomon as to TIER-1 system offered through the DECI calls are in general treated as standard users and so most of the general documentation applies to them as well. This section shows the main differences for quicker orientation, but often uses references to the original documentation. PRACE users who don't undergo the full procedure (including signing the IT4I AuP on top of the PRACE AuP) will not have a password and thus access to some services intended for regular users. This can lower their comfort, but otherwise they should be able to use the TIER-1 system as intended. Please see the [Obtaining Login Credentials section](../get-started-with-it4innovations/obtaining-login-credentials/obtaining-login-credentials.html), if the same level of access is required. -PRACE users coming to Salomon as to TIER-1 system offered through the -DECI calls are in general treated as standard users and so most of the -general documentation applies to them as well. This section shows the -main differences for quicker orientation, but often uses references to -the original documentation. PRACE users who don't undergo the full -procedure (including signing the IT4I AuP on top of the PRACE AuP) will -not have a password and thus access to some services intended for -regular users. This can lower their comfort, but otherwise they should -be able to use the TIER-1 system as intended. Please see the [Obtaining -Login Credentials -section](../get-started-with-it4innovations/obtaining-login-credentials/obtaining-login-credentials.html), -if the same level of access is required. - -All general [PRACE User -Documentation](http://www.prace-ri.eu/user-documentation/) -should be read before continuing reading the local documentation here. - -[]()Help and Support ------------------------- +All general [PRACE User Documentation](http://www.prace-ri.eu/user-documentation/) should be read before continuing reading the local documentation here. -If you have any troubles, need information, request support or want to -install additional software, please use [PRACE +Help and Support +------------------------ +If you have any troubles, need information, request support or want to install additional software, please use [PRACE Helpdesk](http://www.prace-ri.eu/helpdesk-guide264/). -Information about the local services are provided in the [introduction -of general user documentation](introduction.html). 
-Please keep in mind, that standard PRACE accounts don't have a password -to access the web interface of the local (IT4Innovations) request -tracker and thus a new ticket should be created by sending an e-mail to -support[at]it4i.cz. +Information about the local services are provided in the [introduction of general user documentation](introduction.html). Please keep in mind, that standard PRACE accounts don't have a password to access the web interface of the local (IT4Innovations) request tracker and thus a new ticket should be created by sending an e-mail to support[at]it4i.cz. Obtaining Login Credentials --------------------------- +In general PRACE users already have a PRACE account setup through their HOMESITE (institution from their country) as a result of rewarded PRACE project proposal. This includes signed PRACE AuP, generated and registered certificates, etc. -In general PRACE users already have a PRACE account setup through their -HOMESITE (institution from their country) as a result of rewarded PRACE -project proposal. This includes signed PRACE AuP, generated and -registered certificates, etc. - -If there's a special need a PRACE user can get a standard (local) -account at IT4Innovations. To get an account on the Salomon cluster, the -user needs to obtain the login credentials. The procedure is the same as -for general users of the cluster, so please see the corresponding -[section of the general documentation -here](../get-started-with-it4innovations/obtaining-login-credentials.html). +If there's a special need a PRACE user can get a standard (local) account at IT4Innovations. To get an account on the Salomon cluster, the user needs to obtain the login credentials. The procedure is the same as for general users of the cluster, so please see the corresponding [section of the general documentation here](../get-started-with-it4innovations/obtaining-login-credentials.html). Accessing the cluster --------------------- ### Access with GSI-SSH -For all PRACE users the method for interactive access (login) and data -transfer based on grid services from Globus Toolkit (GSI SSH and -GridFTP) is supported. +For all PRACE users the method for interactive access (login) and data transfer based on grid services from Globus Toolkit (GSI SSH and GridFTP) is supported. -The user will need a valid certificate and to be present in the PRACE -LDAP (please contact your HOME SITE or the primary investigator of your -project for LDAP account creation). +The user will need a valid certificate and to be present in the PRACE LDAP (please contact your HOME SITE or the primary investigator of your project for LDAP account creation). 
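+
+Before the first login it may be useful to check which certificate you hold and when it expires; assuming the Globus Toolkit client tools are installed on your local machine, this can be done with:
+
+```bash
+# prints the subject, issuer and validity period of your user certificate
+$ grid-cert-info
+```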
-Most of the information needed by PRACE users accessing the Salomon -TIER-1 system can be found here: +Most of the information needed by PRACE users accessing the Salomon TIER-1 system can be found here: -- [General user's - FAQ](http://www.prace-ri.eu/Users-General-FAQs) -- [Certificates - FAQ](http://www.prace-ri.eu/Certificates-FAQ) -- [Interactive access using - GSISSH](http://www.prace-ri.eu/Interactive-Access-Using-gsissh) -- [Data transfer with - GridFTP](http://www.prace-ri.eu/Data-Transfer-with-GridFTP-Details) -- [Data transfer with - gtransfer](http://www.prace-ri.eu/Data-Transfer-with-gtransfer) +- [General user's FAQ](http://www.prace-ri.eu/Users-General-FAQs) +- [Certificates FAQ](http://www.prace-ri.eu/Certificates-FAQ) +- [Interactive access using GSISSH](http://www.prace-ri.eu/Interactive-Access-Using-gsissh) +- [Data transfer with GridFTP](http://www.prace-ri.eu/Data-Transfer-with-GridFTP-Details) +- [Data transfer with gtransfer](http://www.prace-ri.eu/Data-Transfer-with-gtransfer) - - -Before you start to use any of the services don't forget to create a -proxy certificate from your certificate: +Before you start to use any of the services don't forget to create a proxy certificate from your certificate: +```bash $ grid-proxy-init +``` -To check whether your proxy certificate is still valid (by default it's -valid 12 hours), use: +To check whether your proxy certificate is still valid (by default it's valid 12 hours), use: +```bash $ grid-proxy-info +``` - - -To access Salomon cluster, two login nodes running GSI SSH service are -available. The service is available from public Internet as well as from -the internal PRACE network (accessible only from other PRACE partners). +To access Salomon cluster, two login nodes running GSI SSH service are available. The service is available from public Internet as well as from the internal PRACE network (accessible only from other PRACE partners). -***Access from PRACE network:** +**Access from PRACE network:** -It is recommended to use the single DNS name -salomon-prace.it4i.cz which is distributed -between the two login nodes. If needed, user can login directly to one -of the login nodes. The addresses are: +It is recommended to use the single DNS name salomon-prace.it4i.cz which is distributed between the two login nodes. If needed, user can login directly to one of the login nodes. The addresses are: - |Login address|Port|Protocol|Login node| + |Login address|Port|Protocol|Login node| |---|---| |salomon-prace.it4i.cz|2222|gsissh|login1, login2, login3 or login4| |login1-prace.salomon.it4i.cz|2222|gsissh|login1| @@ -112,274 +63,210 @@ of the login nodes. The addresses are: |login3-prace.salomon.it4i.cz|2222|gsissh|login3| |login4-prace.salomon.it4i.cz|2222|gsissh|login4| - - +```bash $ gsissh -p 2222 salomon-prace.it4i.cz +``` -When logging from other PRACE system, the prace_service script can be -used: +When logging from other PRACE system, the prace_service script can be used: +```bash $ gsissh `prace_service -i -s salomon` +``` - +**Access from public Internet:** -***Access from public Internet:** +It is recommended to use the single DNS name salomon.it4i.cz which is distributed between the two login nodes. If needed, user can login directly to one of the login nodes. The addresses are: -It is recommended to use the single DNS name -salomon.it4i.cz which is distributed between -the two login nodes. If needed, user can login directly to one of the -login nodes. 
The addresses are: - - |Login address|Port|Protocol|Login node| - |---|---| - |salomon.it4i.cz|2222|gsissh|login1, login2, login3 or login4| - |login1.salomon.it4i.cz|2222|gsissh|login1| - |login2-prace.salomon.it4i.cz|2222|gsissh|login2| - |login3-prace.salomon.it4i.cz|2222|gsissh|login3| - |login4-prace.salomon.it4i.cz|2222|gsissh|login4| +|Login address|Port|Protocol|Login node| +|---|---| +|salomon.it4i.cz|2222|gsissh|login1, login2, login3 or login4| +|login1.salomon.it4i.cz|2222|gsissh|login1| +|login2-prace.salomon.it4i.cz|2222|gsissh|login2| +|login3-prace.salomon.it4i.cz|2222|gsissh|login3| +|login4-prace.salomon.it4i.cz|2222|gsissh|login4| +```bash $ gsissh -p 2222 salomon.it4i.cz +``` -When logging from other PRACE system, the -prace_service script can be used: +When logging from other PRACE system, the prace_service script can be used: +```bash $ gsissh `prace_service -e -s salomon` +``` - - -Although the preferred and recommended file transfer mechanism is [using -GridFTP](prace.html#file-transfers), the GSI SSH -implementation on Salomon supports also SCP, so for small files transfer -gsiscp can be used: +Although the preferred and recommended file transfer mechanism is [using GridFTP](prace.html#file-transfers), the GSI SSH +implementation on Salomon supports also SCP, so for small files transfer gsiscp can be used: +```bash $ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ salomon.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ - $ gsiscp -P 2222 salomon.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ + $ gsiscp -P 2222 salomon.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ $ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ salomon-prace.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ - $ gsiscp -P 2222 salomon-prace.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ + $ gsiscp -P 2222 salomon-prace.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ +``` ### Access to X11 applications (VNC) -If the user needs to run X11 based graphical application and does not -have a X11 server, the applications can be run using VNC service. If the -user is using regular SSH based access, please see the [section in -general -documentation](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html). +If the user needs to run X11 based graphical application and does not have a X11 server, the applications can be run using VNC service. If the user is using regular SSH based access, please see the [section in general documentation](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html). -If the user uses GSI SSH based access, then the procedure is similar to -the SSH based access ([look -here](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html)), -only the port forwarding must be done using GSI SSH: +If the user uses GSI SSH based access, then the procedure is similar to the SSH based access ([look here](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html)), only the port forwarding must be done using GSI SSH: +```bash $ gsissh -p 2222 salomon.it4i.cz -L 5961:localhost:5961 +``` ### Access with SSH -After successful obtainment of login credentials for the local -IT4Innovations account, the PRACE users can access the cluster as -regular users using SSH. 
For more information please see the [section in -general -documentation](accessing-the-cluster/shell-and-data-access/shell-and-data-access.html). +After successful obtainment of login credentials for the local IT4Innovations account, the PRACE users can access the cluster as regular users using SSH. For more information please see the [section in general documentation](accessing-the-cluster/shell-and-data-access/shell-and-data-access.html). File transfers ------------------ +PRACE users can use the same transfer mechanisms as regular users (if they've undergone the full registration procedure). For information about this, please see [the section in the general documentation](accessing-the-cluster/shell-and-data-access/shell-and-data-access.html). -PRACE users can use the same transfer mechanisms as regular users (if -they've undergone the full registration procedure). For information -about this, please see [the section in the general -documentation](accessing-the-cluster/shell-and-data-access/shell-and-data-access.html). - -Apart from the standard mechanisms, for PRACE users to transfer data -to/from Salomon cluster, a GridFTP server running Globus Toolkit GridFTP -service is available. The service is available from public Internet as -well as from the internal PRACE network (accessible only from other -PRACE partners). +Apart from the standard mechanisms, for PRACE users to transfer data to/from Salomon cluster, a GridFTP server running Globus Toolkit GridFTP service is available. The service is available from public Internet as well as from the internal PRACE network (accessible only from other PRACE partners). -There's one control server and three backend servers for striping and/or -backup in case one of them would fail. +There's one control server and three backend servers for striping and/or backup in case one of them would fail. 
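Before opening a GSI SSH session or starting a GridFTP transfer, it can be worth checking that the proxy certificate is still valid. A minimal check, sketched here with the same Globus command-line tools introduced above (the one-hour threshold is only an illustrative choice):

```bash
# Renew the proxy only when it is not valid for at least one more hour
$ grid-proxy-info -exists -valid 1:00 || grid-proxy-init
```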
-***Access from PRACE network:** +**Access from PRACE network:** - |Login address|Port|Node role| - |---|---| - |gridftp-prace.salomon.it4i.cz|2812|Front end /control server| - |lgw1-prace.salomon.it4i.cz|2813|Backend / data mover server| - |lgw2-prace.salomon.it4i.cz|2813|Backend / data mover server| - |lgw3-prace.salomon.it4i.cz|2813|Backend / data mover server| +|Login address|Port|Node role| +|---|---| +|gridftp-prace.salomon.it4i.cz|2812|Front end /control server| +|lgw1-prace.salomon.it4i.cz|2813|Backend / data mover server| +|lgw2-prace.salomon.it4i.cz|2813|Backend / data mover server| +|lgw3-prace.salomon.it4i.cz|2813|Backend / data mover server| -Copy files **to** Salomon by running the following commands on your -local machine: +Copy files **to** Salomon by running the following commands on your local machine: +```bash $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp-prace.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ +``` Or by using prace_service script: +```bash $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -i -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ +``` Copy files **from** Salomon: +```bash $ globus-url-copy gsiftp://gridftp-prace.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ +``` Or by using prace_service script: +```bash $ globus-url-copy gsiftp://`prace_service -i -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ +``` - - -***Access from public Internet:** +**Access from public Internet:** - |Login address|Port|Node role| - |---|---| - |gridftp.salomon.it4i.cz|2812|Front end /control server| - |lgw1.salomon.it4i.cz|2813|Backend / data mover server| - |lgw2.salomon.it4i.cz|2813|Backend / data mover server| - |lgw3.salomon.it4i.cz|2813|Backend / data mover server| +|Login address|Port|Node role| +|---|---| +|gridftp.salomon.it4i.cz|2812|Front end /control server| +|lgw1.salomon.it4i.cz|2813|Backend / data mover server| +|lgw2.salomon.it4i.cz|2813|Backend / data mover server| +|lgw3.salomon.it4i.cz|2813|Backend / data mover server| -Copy files **to** Salomon by running the following commands on your -local machine: +Copy files **to** Salomon by running the following commands on your local machine: +```bash $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ +``` Or by using prace_service script: +```bash $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -e -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ +``` Copy files **from** Salomon: +```bash $ globus-url-copy gsiftp://gridftp.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ +``` Or by using prace_service script: +```bash $ globus-url-copy gsiftp://`prace_service -e -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ - - +``` Generally both shared file systems are available through GridFTP: - |File system mount point|Filesystem|Comment| - |---|---| - |/home|Lustre|Default HOME directories of users in format /home/prace/login/| - |/scratch|Lustre|Shared SCRATCH mounted on the whole cluster| +|File system mount point|Filesystem|Comment| +|---|---| +|/home|Lustre|Default HOME directories of users in format /home/prace/login/| +|/scratch|Lustre|Shared 
SCRATCH mounted on the whole cluster|

-More information about the shared file systems is available
-[here](storage.html).
+More information about the shared file systems is available [here](storage.html).

-Please note, that for PRACE users a "prace" directory is used also on
-the SCRATCH file system.
+Please note that for PRACE users a "prace" directory is also used on the SCRATCH file system.

- |Data type|Default path|
- |---|---|
- |large project files|/scratch/work/user/prace/login/|
- |large scratch/temporary data|/scratch/temp/|
+|Data type|Default path|
+|---|---|
+|large project files|/scratch/work/user/prace/login/|
+|large scratch/temporary data|/scratch/temp/|

Usage of the cluster
--------------------
+There are some limitations for PRACE users when using the cluster. By default PRACE users aren't allowed to access special queues in the PBS Pro to have high priority or exclusive access to some special equipment like accelerated nodes and high memory (fat) nodes. There may also be restrictions on obtaining a working license for the commercial software installed on the cluster, mostly because of the license agreement or an insufficient number of licenses.

-There are some limitations for PRACE user when using the cluster. By
-default PRACE users aren't allowed to access special queues in the PBS
-Pro to have high priority or exclusive access to some special equipment
-like accelerated nodes and high memory (fat) nodes. There may be also
-restrictions obtaining a working license for the commercial software
-installed on the cluster, mostly because of the license agreement or
-because of insufficient amount of licenses.
-
-For production runs always use scratch file systems. The available file
-systems are described [here](storage/storage.html).
+For production runs always use the scratch file systems. The available file systems are described [here](storage/storage.html).

### Software, Modules and PRACE Common Production Environment

-All system wide installed software on the cluster is made available to
-the users via the modules. The information about the environment and
-modules usage is in this [section of general
-documentation](environment-and-modules.html).
+All system-wide installed software on the cluster is made available to the users via the modules. The information about the environment and module usage is in this [section of general documentation](environment-and-modules.html).

-PRACE users can use the "prace" module to use the [PRACE Common
-Production
-Environment](http://www.prace-ri.eu/PRACE-common-production).
+PRACE users can use the "prace" module to use the [PRACE Common Production Environment](http://www.prace-ri.eu/PRACE-common-production).

+```bash
 $ module load prace
-
-
+```

### Resource Allocation and Job Execution

-General information about the resource allocation, job queuing and job
-execution is in this [section of general
-documentation](resource-allocation-and-job-execution/introduction.html).
+General information about the resource allocation, job queuing and job execution is in this [section of general documentation](resource-allocation-and-job-execution/introduction.html).

-For PRACE users, the default production run queue is "qprace". PRACE
-users can also use two other queues "qexp" and "qfree".
+For PRACE users, the default production run queue is "qprace". PRACE users can also use the two other queues, "qexp" and "qfree".

 |queue|Active project|Project resources|Nodes|priority|authorization|walltime |
 |---|---|
- |**qexp** \|no|none required|32 nodes, max 8 per user|150|no|1 / 1h|
- \
+ |**qexp** Express queue|no|none required|32 nodes, max 8 per user|150|no|1 / 1h|
+ |**qprace** Production queue|yes|>0|1006 nodes, max 86 per job|0|no|24 / 48h|
+ |**qfree** Free resource queue|yes|none required|752 nodes, max 86 per job|-1024|no|12 / 12h|

- gt; 0 >1006 nodes, max 86 per job 0 no 24 / 48h> 0 >1006 nodes, max 86 per job 0 no 24 / 48h
- \

- |**qfree** \|yes|none required|752 nodes, max 86 per job|-1024|no|12 / 12h|
- \

-qprace**, the PRACE \***: This queue is intended for
-normal production runs. It is required that active project with nonzero
-remaining resources is specified to enter the qprace. The queue runs
-with medium priority and no special authorization is required to use it.
-The maximum runtime in qprace is 48 hours. If the job needs longer time,
-it must use checkpoint/restart functionality.
+**qprace**, the PRACE production queue: This queue is intended for normal production runs. It is required that an active project with nonzero remaining resources is specified to enter the qprace. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprace is 48 hours. If the job needs longer time, it must use checkpoint/restart functionality.

### Accounting & Quota

-The resources that are currently subject to accounting are the core
-hours. The core hours are accounted on the wall clock basis. The
-accounting runs whenever the computational cores are allocated or
-blocked via the PBS Pro workload manager (the qsub command), regardless
-of whether the cores are actually used for any calculation. See [example
-in the general
-documentation](resource-allocation-and-job-execution/resources-allocation-policy.html).
-
-PRACE users should check their project accounting using the [PRACE
-Accounting Tool
-(DART)](http://www.prace-ri.eu/accounting-report-tool/).
-
-Users who have undergone the full local registration procedure
-(including signing the IT4Innovations Acceptable Use Policy) and who
-have received local password may check at any time, how many core-hours
-have been consumed by themselves and their projects using the command
-"it4ifree". Please note that you need to know your user password to use
-the command and that the displayed core hours are "system core hours"
-which differ from PRACE "standardized core hours".
-
-The **it4ifree** command is a part of it4i.portal.clients package,
-located here:
-<https://pypi.python.org/pypi/it4i.portal.clients>
+The resources that are currently subject to accounting are the core hours. The core hours are accounted on a wall clock basis. The accounting runs whenever the computational cores are allocated or blocked via the PBS Pro workload manager (the qsub command), regardless of whether the cores are actually used for any calculation. See the [example in the general documentation](resource-allocation-and-job-execution/resources-allocation-policy.html).
+
+PRACE users should check their project accounting using the [PRACE Accounting Tool (DART)](http://www.prace-ri.eu/accounting-report-tool/).
+
+Users who have undergone the full local registration procedure (including signing the IT4Innovations Acceptable Use Policy) and who have received a local password may check at any time how many core-hours have been consumed by themselves and their projects using the command "it4ifree".
Please note that you need to know your user password to use the command and that the displayed core hours are "system core hours" which differ from PRACE "standardized core hours". + +>The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients> +```bash $ it4ifree Password:     PID  Total Used ...by me Free   -------- ------- ------ -------- -------   OPEN-0-0 1500000 400644  225265 1099356   DD-13-1   10000 2606 2606 7394 +``` - - -By default file system quota is applied. To check the current status of -the quota (separate for HOME and SCRATCH) use +By default file system quota is applied. To check the current status of the quota (separate for HOME and SCRATCH) use +```bash $ quota $ lfs quota -u USER_LOGIN /scratch +``` -If the quota is insufficient, please contact the -[support](prace.html#help-and-support) and request an -increase. - - - - - +If the quota is insufficient, please contact the [support](prace.html#help-and-support) and request an increase. \ No newline at end of file diff --git a/docs.it4i/salomon/resource-allocation-and-job-execution/capacity-computing.md b/docs.it4i/salomon/resource-allocation-and-job-execution/capacity-computing.md index 1282e33da..88a557d22 100644 --- a/docs.it4i/salomon/resource-allocation-and-job-execution/capacity-computing.md +++ b/docs.it4i/salomon/resource-allocation-and-job-execution/capacity-computing.md @@ -1,85 +1,52 @@ -Capacity computing +Capacity computing ================== - - Introduction ------------ +In many cases, it is useful to submit huge (100+) number of computational jobs into the PBS queue system. Huge number of (small) jobs is one of the most effective ways to execute embarrassingly parallel calculations, achieving best runtime, throughput and computer utilization. + +However, executing huge number of jobs via the PBS queue may strain the system. This strain may result in slow response to commands, inefficient scheduling and overall degradation of performance and user experience, for all users. For this reason, the number of jobs is **limited to 100 per user, 1500 per job array** -In many cases, it is useful to submit huge (>100+) number of -computational jobs into the PBS queue system. Huge number of (small) -jobs is one of the most effective ways to execute embarrassingly -parallel calculations, achieving best runtime, throughput and computer -utilization. - -However, executing huge number of jobs via the PBS queue may strain the -system. This strain may result in slow response to commands, inefficient -scheduling and overall degradation of performance and user experience, -for all users. For this reason, the number of jobs is **limited to 100 -per user, 1500 per job array** - -Please follow one of the procedures below, in case you wish to schedule -more than >100 jobs at a time. - -- Use [Job arrays](capacity-computing.html#job-arrays) - when running huge number of - [multithread](capacity-computing.html#shared-jobscript-on-one-node) - (bound to one node only) or multinode (multithread across - several nodes) jobs -- Use [GNU - parallel](capacity-computing.html#gnu-parallel) when - running single core jobs -- Combine[GNU parallel with Job - arrays](capacity-computing.html#combining-job-arrays-and-gnu-parallel) - when running huge number of single core jobs +>Please follow one of the procedures below, in case you wish to schedule more than 100 jobs at a time. 
+ +- Use [Job arrays](capacity-computing.md#job-arrays) when running huge number of [multithread](capacity-computing.md#shared-jobscript-on-one-node) (bound to one node only) or multinode (multithread across several nodes) jobs +- Use [GNU parallel](capacity-computing.md#gnu-parallel) when running single core jobs +- Combine[GNU parallel with Job arrays](capacity-computing.md#combining-job-arrays-and-gnu-parallel) when running huge number of single core jobs Policy ------ - -1. A user is allowed to submit at most 100 jobs. Each job may be [a job - array](capacity-computing.html#job-arrays). +1. A user is allowed to submit at most 100 jobs. Each job may be [a job array](capacity-computing.md#job-arrays). 2. The array size is at most 1000 subjobs. Job arrays -------------- +>Huge number of jobs may be easily submitted and managed as a job array. -Huge number of jobs may be easily submitted and managed as a job array. - -A job array is a compact representation of many jobs, called subjobs. -The subjobs share the same job script, and have the same values for all -attributes and resources, with the following exceptions: +A job array is a compact representation of many jobs, called subjobs. The subjobs share the same job script, and have the same values for all attributes and resources, with the following exceptions: - each subjob has a unique index, $PBS_ARRAY_INDEX - job Identifiers of subjobs only differ by their indices - the state of subjobs can differ (R,Q,...etc.) -All subjobs within a job array have the same scheduling priority and -schedule as independent jobs. -Entire job array is submitted through a single qsub command and may be -managed by qdel, qalter, qhold, qrls and qsig commands as a single job. +All subjobs within a job array have the same scheduling priority and schedule as independent jobs. Entire job array is submitted through a single qsub command and may be managed by qdel, qalter, qhold, qrls and qsig commands as a single job. ### Shared jobscript -All subjobs in job array use the very same, single jobscript. Each -subjob runs its own instance of the jobscript. The instances execute -different work controlled by $PBS_ARRAY_INDEX variable. +All subjobs in job array use the very same, single jobscript. Each subjob runs its own instance of the jobscript. The instances execute different work controlled by $PBS_ARRAY_INDEX variable. Example: -Assume we have 900 input files with name beginning with "file" (e. g. -file001, ..., file900). Assume we would like to use each of these input -files with program executable myprog.x, each as a separate job. +Assume we have 900 input files with name beginning with "file" (e. g. file001, ..., file900). Assume we would like to use each of these input files with program executable myprog.x, each as a separate job. -First, we create a tasklist file (or subjobs list), listing all tasks -(subjobs) - all input files in our example: +First, we create a tasklist file (or subjobs list), listing all tasks (subjobs) - all input files in our example: -` +```bash $ find . 
-name 'file*' > tasklist -` +``` Then we create jobscript: -` +```bash #!/bin/bash #PBS -A PROJECT_ID #PBS -q qprod @@ -90,9 +57,9 @@ SCR=/scratch/work/user/$USER/$PBS_JOBID mkdir -p $SCR ; cd $SCR || exit # get individual tasks from tasklist with index from PBS JOB ARRAY -TASK=$(sed -n "${PBS_ARRAY_INDEX}p" $PBS_O_WORKDIR/tasklist) +TASK=$(sed -n "${PBS_ARRAY_INDEX}p" $PBS_O_WORKDIR/tasklist) -# copy input file and executable to scratch +# copy input file and executable to scratch cp $PBS_O_WORKDIR/$TASK input ; cp $PBS_O_WORKDIR/myprog.x . # execute the calculation @@ -100,58 +67,36 @@ cp $PBS_O_WORKDIR/$TASK input ; cp $PBS_O_WORKDIR/myprog.x . # copy output file to submit directory cp output $PBS_O_WORKDIR/$TASK.out -` - -In this example, the submit directory holds the 900 input files, -executable myprog.x and the jobscript file. As input for each run, we -take the filename of input file from created tasklist file. We copy the -input file to scratch /scratch/work/user/$USER/$PBS_JOBID, execute -the myprog.x and copy the output file back to >the submit -directory, under the $TASK.out name. The myprog.x runs on one -node only and must use threads to run in parallel. Be aware, that if the -myprog.x **is not multithreaded**, then all the **jobs are run as single -thread programs in sequential** manner. Due to allocation of the whole -node, the **accounted time is equal to the usage of whole node**, while -using only 1/24 of the node! - -If huge number of parallel multicore (in means of multinode multithread, -e. g. MPI enabled) jobs is needed to run, then a job array approach -should also be used. The main difference compared to previous example -using one node is that the local scratch should not be used (as it's not -shared between nodes) and MPI or other technique for parallel multinode -run has to be used properly. +``` + +In this example, the submit directory holds the 900 input files, executable myprog.x and the jobscript file. As input for each run, we take the filename of input file from created tasklist file. We copy the input file to scratch /scratch/work/user/$USER/$PBS_JOBID, execute the myprog.x and copy the output file back to the submit directory, under the $TASK.out name. The myprog.x runs on one node only and must use threads to run in parallel. Be aware, that if the myprog.x **is not multithreaded**, then all the **jobs are run as single thread programs in sequential** manner. Due to allocation of the whole node, the **accounted time is equal to the usage of whole node**, while using only 1/24 of the node! + +If huge number of parallel multicore (in means of multinode multithread, e. g. MPI enabled) jobs is needed to run, then a job array approach should also be used. The main difference compared to previous example using one node is that the local scratch should not be used (as it's not shared between nodes) and MPI or other technique for parallel multinode run has to be used properly. ### Submit the job array -To submit the job array, use the qsub -J command. The 900 jobs of the -[example above](capacity-computing.html#array_example) may -be submitted like this: +To submit the job array, use the qsub -J command. The 900 jobs of the [example above](capacity-computing.html#array_example) may be submitted like this: -` +```bash $ qsub -N JOBNAME -J 1-900 jobscript 506493[].isrv5 -` +``` -In this example, we submit a job array of 900 subjobs. 
Each subjob will -run on full node and is assumed to take less than 2 hours (please note -the #PBS directives in the beginning of the jobscript file, dont' -forget to set your valid PROJECT_ID and desired queue). +In this example, we submit a job array of 900 subjobs. Each subjob will run on full node and is assumed to take less than 2 hours (please note the #PBS directives in the beginning of the jobscript file, dont' forget to set your valid PROJECT_ID and desired queue). -Sometimes for testing purposes, you may need to submit only one-element -array. This is not allowed by PBSPro, but there's a workaround: +Sometimes for testing purposes, you may need to submit only one-element array. This is not allowed by PBSPro, but there's a workaround: -` +```bash $ qsub -N JOBNAME -J 9-10:2 jobscript -` +``` -This will only choose the lower index (9 in this example) for -submitting/running your job. +This will only choose the lower index (9 in this example) for submitting/running your job. ### Manage the job array Check status of the job array by the qstat command. -` +```bash $ qstat -a 506493[].isrv5 isrv5: @@ -159,13 +104,13 @@ isrv5: Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time --------------- -------- -- |---|---| ------ --- --- ------ ----- - ----- 12345[].dm2 user2 qprod xx 13516 1 24 -- 00:50 B 00:02 -` +``` The status B means that some subjobs are already running. Check status of the first 100 subjobs by the qstat command. -` +```bash $ qstat -a 12345[1-100].isrv5 isrv5: @@ -177,80 +122,68 @@ Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time 12345[3].isrv5 user2 qprod xx 13516 1 24 -- 00:50 R 00:01 12345[4].isrv5 user2 qprod xx 13516 1 24 -- 00:50 Q -- . . . . . . . . . . . - , . . . . . . . . . . + , . . . . . . . . . . 12345[100].isrv5 user2 qprod xx 13516 1 24 -- 00:50 Q -- -` +``` -Delete the entire job array. Running subjobs will be killed, queueing -subjobs will be deleted. +Delete the entire job array. Running subjobs will be killed, queueing subjobs will be deleted. -` +```bash $ qdel 12345[].isrv5 -` +``` Deleting large job arrays may take a while. Display status information for all user's jobs, job arrays, and subjobs. -` +```bash $ qstat -u $USER -t -` +``` Display status information for all user's subjobs. -` +```bash $ qstat -u $USER -tJ -` +``` -Read more on job arrays in the [PBSPro Users -guide](../../pbspro-documentation.html). +Read more on job arrays in the [PBSPro Users guide](../../pbspro-documentation.html). GNU parallel ---------------- +>Use GNU parallel to run many single core tasks on one node. -Use GNU parallel to run many single core tasks on one node. - -GNU parallel is a shell tool for executing jobs in parallel using one or -more computers. A job can be a single command or a small script that has -to be run for each of the lines in the input. GNU parallel is most -useful in running single core jobs via the queue system on Anselm. +GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. GNU parallel is most useful in running single core jobs via the queue system on Anselm. For more information and examples see the parallel man page: -` +```bash $ module add parallel $ man parallel -` +``` ### GNU parallel jobscript -The GNU parallel shell executes multiple instances of the jobscript -using all cores on the node. The instances execute different work, -controlled by the $PARALLEL_SEQ variable. 
+The GNU parallel shell executes multiple instances of the jobscript using all cores on the node. The instances execute different work, controlled by the $PARALLEL_SEQ variable. Example: -Assume we have 101 input files with name beginning with "file" (e. g. -file001, ..., file101). Assume we would like to use each of these input -files with program executable myprog.x, each as a separate single core -job. We call these single core jobs tasks. +Assume we have 101 input files with name beginning with "file" (e. g. file001, ..., file101). Assume we would like to use each of these input files with program executable myprog.x, each as a separate single core job. We call these single core jobs tasks. -First, we create a tasklist file, listing all tasks - all input files in -our example: +First, we create a tasklist file, listing all tasks - all input files in our example: -` +```bash $ find . -name 'file*' > tasklist -` +``` Then we create jobscript: -` +```bash #!/bin/bash #PBS -A PROJECT_ID #PBS -q qprod #PBS -l select=1:ncpus=24,walltime=02:00:00 -[ -z "$PARALLEL_SEQ" ] && +[ -z "$PARALLEL_SEQ" ] && { module add parallel ; exec parallel -a $PBS_O_WORKDIR/tasklist $0 ; } # change to local scratch directory @@ -258,98 +191,70 @@ SCR=/scratch/work/user/$USER/$PBS_JOBID/$PARALLEL_SEQ mkdir -p $SCR ; cd $SCR || exit # get individual task from tasklist -TASK=$1  +TASK=$1 -# copy input file and executable to scratch -cp $PBS_O_WORKDIR/$TASK input +# copy input file and executable to scratch +cp $PBS_O_WORKDIR/$TASK input # execute the calculation cat input > output # copy output file to submit directory cp output $PBS_O_WORKDIR/$TASK.out -` - -In this example, tasks from tasklist are executed via the GNU -parallel. The jobscript executes multiple instances of itself in -parallel, on all cores of the node. Once an instace of jobscript is -finished, new instance starts until all entries in tasklist are -processed. Currently processed entry of the joblist may be retrieved via -$1 variable. Variable $TASK expands to one of the input filenames from -tasklist. We copy the input file to local scratch, execute the myprog.x -and copy the output file back to the submit directory, under the -$TASK.out name. +``` + +In this example, tasks from tasklist are executed via the GNU parallel. The jobscript executes multiple instances of itself in parallel, on all cores of the node. Once an instace of jobscript is finished, new instance starts until all entries in tasklist are processed. Currently processed entry of the joblist may be retrieved via $1 variable. Variable $TASK expands to one of the input filenames from tasklist. We copy the input file to local scratch, execute the myprog.x and copy the output file back to the submit directory, under the $TASK.out name. ### Submit the job -To submit the job, use the qsub command. The 101 tasks' job of the -[example above](capacity-computing.html#gp_example) may be -submitted like this: +To submit the job, use the qsub command. The 101 tasks' job of the [example above](capacity-computing.html#gp_example) may be submitted like this: -` +```bash $ qsub -N JOBNAME jobscript 12345.dm2 -` +``` -In this example, we submit a job of 101 tasks. 24 input files will be -processed in parallel. The 101 tasks on 24 cores are assumed to -complete in less than 2 hours. +In this example, we submit a job of 101 tasks. 24 input files will be processed in parallel. The 101 tasks on 24 cores are assumed to complete in less than 2 hours. 
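A quick sanity check after the job finishes (an illustrative addition, not part of the original example): every task should have left one output file in the submit directory, so the count should match the number of tasks.

```bash
# 101 input files were processed, so 101 output files are expected
$ ls file*.out | wc -l
101
```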
-Please note the #PBS directives in the beginning of the jobscript file, -dont' forget to set your valid PROJECT_ID and desired queue. +Please note the #PBS directives in the beginning of the jobscript file, dont' forget to set your valid PROJECT_ID and desired queue. Job arrays and GNU parallel ------------------------------- +>Combine the Job arrays and GNU parallel for best throughput of single core jobs -Combine the Job arrays and GNU parallel for best throughput of single -core jobs - -While job arrays are able to utilize all available computational nodes, -the GNU parallel can be used to efficiently run multiple single-core -jobs on single node. The two approaches may be combined to utilize all -available (current and future) resources to execute single core jobs. +While job arrays are able to utilize all available computational nodes, the GNU parallel can be used to efficiently run multiple single-core jobs on single node. The two approaches may be combined to utilize all available (current and future) resources to execute single core jobs. -Every subjob in an array runs GNU parallel to utilize all cores on the -node +>Every subjob in an array runs GNU parallel to utilize all cores on the node ### GNU parallel, shared jobscript -Combined approach, very similar to job arrays, can be taken. Job array -is submitted to the queuing system. The subjobs run GNU parallel. The -GNU parallel shell executes multiple instances of the jobscript using -all cores on the node. The instances execute different work, controlled -by the $PBS_JOB_ARRAY and $PARALLEL_SEQ variables. +Combined approach, very similar to job arrays, can be taken. Job array is submitted to the queuing system. The subjobs run GNU parallel. The GNU parallel shell executes multiple instances of the jobscript using all cores on the node. The instances execute different work, controlled by the $PBS_JOB_ARRAY and $PARALLEL_SEQ variables. Example: -Assume we have 992 input files with name beginning with "file" (e. g. -file001, ..., file992). Assume we would like to use each of these input -files with program executable myprog.x, each as a separate single core -job. We call these single core jobs tasks. +Assume we have 992 input files with name beginning with "file" (e. g. file001, ..., file992). Assume we would like to use each of these input files with program executable myprog.x, each as a separate single core job. We call these single core jobs tasks. -First, we create a tasklist file, listing all tasks - all input files in -our example: +First, we create a tasklist file, listing all tasks - all input files in our example: -` +```bash $ find . 
-name 'file*' > tasklist -` +``` -Next we create a file, controlling how many tasks will be executed in -one subjob +Next we create a file, controlling how many tasks will be executed in one subjob -` +```bash $ seq 32 > numtasks -` +``` Then we create jobscript: -` +```bash #!/bin/bash #PBS -A PROJECT_ID #PBS -q qprod #PBS -l select=1:ncpus=24,walltime=02:00:00 -[ -z "$PARALLEL_SEQ" ] && +[ -z "$PARALLEL_SEQ" ] && { module add parallel ; exec parallel -a $PBS_O_WORKDIR/numtasks $0 ; } # change to local scratch directory @@ -361,74 +266,47 @@ IDX=$(($PBS_ARRAY_INDEX + $PARALLEL_SEQ - 1)) TASK=$(sed -n "${IDX}p" $PBS_O_WORKDIR/tasklist) [ -z "$TASK" ] && exit -# copy input file and executable to scratch -cp $PBS_O_WORKDIR/$TASK input +# copy input file and executable to scratch +cp $PBS_O_WORKDIR/$TASK input # execute the calculation cat input > output # copy output file to submit directory cp output $PBS_O_WORKDIR/$TASK.out -` +``` -In this example, the jobscript executes in multiple instances in -parallel, on all cores of a computing node. Variable $TASK expands to -one of the input filenames from tasklist. We copy the input file to -local scratch, execute the myprog.x and copy the output file back to the -submit directory, under the $TASK.out name. The numtasks file controls -how many tasks will be run per subjob. Once an task is finished, new -task starts, until the number of tasks in numtasks file is reached. +In this example, the jobscript executes in multiple instances in parallel, on all cores of a computing node. Variable $TASK expands to one of the input filenames from tasklist. We copy the input file to local scratch, execute the myprog.x and copy the output file back to the submit directory, under the $TASK.out name. The numtasks file controls how many tasks will be run per subjob. Once an task is finished, new task starts, until the number of tasks in numtasks file is reached. -Select subjob walltime and number of tasks per subjob carefully +>Select subjob walltime and number of tasks per subjob carefully - When deciding this values, think about following guiding rules : +When deciding this values, think about following guiding rules : -1. Let n=N/24. Inequality (n+1) * T < W should hold. The N is - number of tasks per subjob, T is expected single task walltime and W - is subjob walltime. Short subjob walltime improves scheduling and - job throughput. +1. Let n=N/24. Inequality (n+1) * T < W should hold. The N is number of tasks per subjob, T is expected single task walltime and W is subjob walltime. Short subjob walltime improves scheduling and job throughput. 2. Number of tasks should be modulo 24. -3. These rules are valid only when all tasks have similar task - walltimes T. +3. These rules are valid only when all tasks have similar task walltimes T. ### Submit the job array -To submit the job array, use the qsub -J command. The 992 tasks' job of -the [example -above](capacity-computing.html#combined_example) may be -submitted like this: +To submit the job array, use the qsub -J command. The 992 tasks' job of the [example above](capacity-computing.html#combined_example) may be submitted like this: -` +```bash $ qsub -N JOBNAME -J 1-992:32 jobscript 12345[].dm2 -` +``` -In this example, we submit a job array of 31 subjobs. Note the -J -1-992:**48**, this must be the same as the number sent to numtasks file. -Each subjob will run on full node and process 24 input files in -parallel, 48 in total per subjob. Every subjob is assumed to complete -in less than 2 hours. 
+In this example, we submit a job array of 31 subjobs. Note the -J 1-992:**32**; this must be the same as the number written to the numtasks file. Each subjob will run on a full node and process 24 input files in parallel, 32 in total per subjob. Every subjob is assumed to complete in less than 2 hours.

-Please note the #PBS directives in the beginning of the jobscript file,
-dont' forget to set your valid PROJECT_ID and desired queue.
+Please note the #PBS directives at the beginning of the jobscript file; don't forget to set your valid PROJECT_ID and desired queue.

Examples
--------
+Download the examples in [capacity.zip](capacity-computing-example), illustrating the above listed ways to run a huge number of jobs. We recommend trying out the examples before using this approach for running production jobs.

-Download the examples in
-[capacity.zip](capacity-computing-example),
-illustrating the above listed ways to run huge number of jobs. We
-recommend to try out the examples, before using this for running
-production jobs.
+Unzip the archive in an empty directory on Salomon and follow the instructions in the README file.

-Unzip the archive in an empty directory on Anselm and follow the
-instructions in the README file
-
-`
+```bash
 $ unzip capacity.zip
 $ cd capacity
 $ cat README
-`
-
-
-
+```
\ No newline at end of file
diff --git a/docs.it4i/salomon/resource-allocation-and-job-execution/introduction.md b/docs.it4i/salomon/resource-allocation-and-job-execution/introduction.md
index 6ef6f4cfa..7f07b0b6c 100644
--- a/docs.it4i/salomon/resource-allocation-and-job-execution/introduction.md
+++ b/docs.it4i/salomon/resource-allocation-and-job-execution/introduction.md
@@ -1,56 +1,27 @@
-Resource Allocation and Job Execution
+Resource Allocation and Job Execution
 =====================================
-
-
-To run a [job](job-submission-and-execution.html),
-[computational
-resources](resources-allocation-policy.html) for this
-particular job must be allocated. This is done via the PBS Pro job
-workload manager software, which efficiently distributes workloads
-across the supercomputer. Extensive informations about PBS Pro can be
-found in the [official documentation
-here](../../pbspro-documentation.html), especially in
-the [PBS Pro User's
-Guide](https://docs.it4i.cz/pbspro-documentation/pbspro-users-guide).
+To run a [job](job-submission-and-execution.html), [computational resources](resources-allocation-policy.html) for this particular job must be allocated. This is done via the PBS Pro job workload manager software, which efficiently distributes workloads across the supercomputer. Extensive information about PBS Pro can be found in the [official documentation here](../../pbspro-documentation.html), especially in the [PBS Pro User's Guide](https://docs.it4i.cz/pbspro-documentation/pbspro-users-guide).

Resources Allocation Policy
---------------------------
+The resources are allocated to the job in a fairshare fashion, subject to constraints set by the queue and resources available to the Project. [The Fairshare](job-priority.html) at Salomon ensures that individual users may consume an approximately equal amount of resources per week. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. The following queues are available to Salomon users:

-The resources are allocated to the job in a fairshare fashion, subject
-to constraints set by the queue and resources available to the Project.
-[The Fairshare](job-priority.html) at Salomon ensures
-that individual users may consume approximately equal amount of
-resources per week. The resources are accessible via several queues for
-queueing the jobs. The queues provide prioritized and exclusive access
-to the computational resources. Following queues are available to Anselm
-users:
-
-- **qexp**, the \
-- **qprod**, the \***
-- **qlong**, the Long queue
-- **qmpp**, the Massively parallel queue
-- **qfat**, the queue to access SMP UV2000 machine
-- **qfree,** the Free resource utilization queue
+- **qexp**, the Express queue
+- **qprod**, the Production queue
+- **qlong**, the Long queue
+- **qmpp**, the Massively parallel queue
+- **qfat**, the queue to access SMP UV2000 machine
+- **qfree**, the Free resource utilization queue

-Check the queue status at <https://extranet.it4i.cz/rsweb/salomon/>
+>Check the queue status at <https://extranet.it4i.cz/rsweb/salomon/>

-Read more on the [Resource Allocation
-Policy](resources-allocation-policy.html) page.
+Read more on the [Resource Allocation Policy](resources-allocation-policy.html) page.

Job submission and execution
----------------------------
+>Use the **qsub** command to submit your jobs.

-Use the **qsub** command to submit your jobs.
-
-The qsub submits the job into the queue. The qsub command creates a
-request to the PBS Job manager for allocation of specified resources.
-The **smallest allocation unit is entire node, 24 cores**, with
-exception of the qexp queue. The resources will be allocated when
-available, subject to allocation policies and constraints. **After the
-resources are allocated the jobscript or interactive shell is executed
-on first of the allocated nodes.**
-
-Read more on the [Job submission and
-execution](job-submission-and-execution.html) page.
+The qsub command submits the job into the queue and creates a request to the PBS Job manager for allocation of the specified resources. The **smallest allocation unit is an entire node, 24 cores**, with the exception of the qexp queue. The resources will be allocated when available, subject to allocation policies and constraints. **After the resources are allocated, the jobscript or interactive shell is executed on the first of the allocated nodes.**
+Read more on the [Job submission and execution](job-submission-and-execution.html) page.
\ No newline at end of file
diff --git a/docs.it4i/salomon/hardware-overview-1/uv-2000.jpeg b/docs.it4i/salomon/uv-2000.jpeg
similarity index 100%
rename from docs.it4i/salomon/hardware-overview-1/uv-2000.jpeg
rename to docs.it4i/salomon/uv-2000.jpeg
-- GitLab
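For quick reference, a minimal allocation request consistent with the rules described in the introduction above might look like the following sketch (illustrative only; PROJECT_ID is a placeholder for a valid active project, and qprod is used as an example queue):

```bash
# Request one full node (24 cores, the smallest allocation unit outside qexp) for an interactive session
$ qsub -A PROJECT_ID -q qprod -l select=1:ncpus=24,walltime=01:00:00 -I
```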