diff --git a/README.md b/README.md index b40b4dca8747b6099ab29f12854e88b14195a608..3dd8adae260a59941b48dfc49ebb90ca13644406 100644 --- a/README.md +++ b/README.md @@ -27,3 +27,10 @@ Konverze html do md ```bash html_md.sh -c ``` + +TestovánĂ + +```bash +html_md.sh -t +html_md.sh -t1 +``` diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/accessing-the-cluster/outgoing-connections.md b/converted/docs.it4i.cz/anselm-cluster-documentation/accessing-the-cluster/outgoing-connections.md index b18fe9a0209b1677e3ece6eee4f42fd83eac02dd..e1a4a38dc2823aeee3bb07fc9c1d0baafa431f67 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/accessing-the-cluster/outgoing-connections.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/accessing-the-cluster/outgoing-connections.md @@ -3,7 +3,7 @@ Outgoing connections - + Connection restrictions ----------------------- @@ -11,12 +11,12 @@ Connection restrictions Outgoing connections, from Anselm Cluster login nodes to the outside world, are restricted to following ports: - Port Protocol - ------ ---------- - 22 ssh - 80 http - 443 https - 9418 git +Port Protocol +------ ---------- +22 ssh +80 http +443 https +9418 git Please use **ssh port forwarding** and proxy servers to connect from Anselm to all other remote ports. @@ -28,7 +28,7 @@ outside world are cut. Port forwarding --------------- -### []()Port forwarding from login nodes +### Port forwarding from login nodes Port forwarding allows an application running on Anselm to connect to arbitrary remote host and port. @@ -39,7 +39,7 @@ workstation and forwarding from the workstation to the remote host. Pick some unused port on Anselm login node (for example 6000) and establish the port forwarding: -``` +``` local $ ssh -R 6000:remote.host.com:1234 anselm.it4i.cz ``` @@ -57,13 +57,13 @@ remote.host.com:1234. Click Add button, then Open. Port forwarding may be established directly to the remote host. However, this requires that user has ssh access to remote.host.com -``` +``` $ ssh -L 6000:localhost:1234 remote.host.com ``` Note: Port number 6000 is chosen as an example only. Pick any free port. -### []()Port forwarding from compute nodes +### Port forwarding from compute nodes Remote port forwarding from compute nodes allows applications running on the compute nodes to access hosts outside Anselm Cluster. @@ -75,7 +75,7 @@ above](outgoing-connections.html#port-forwarding-from-login-nodes). Second, invoke port forwarding from the compute node to the login node. Insert following line into your jobscript or interactive shell -``` +``` $ ssh -TN -f -L 6000:localhost:6000 login1 ``` @@ -98,7 +98,7 @@ SOCKS proxy server software. On Linux, sshd demon provides the functionality. To establish SOCKS proxy server listening on port 1080 run: -``` +``` local $ ssh -D 1080 localhost ``` @@ -109,7 +109,7 @@ Once the proxy server is running, establish ssh port forwarding from Anselm to the proxy server, port 1080, exactly as [described above](outgoing-connections.html#port-forwarding-from-login-nodes). 
-``` +``` local $ ssh -R 6000:localhost:1080 anselm.it4i.cz ``` diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/accessing-the-cluster/shell-and-data-access/shell-and-data-access.md b/converted/docs.it4i.cz/anselm-cluster-documentation/accessing-the-cluster/shell-and-data-access/shell-and-data-access.md index 6ca703c9f35d55f50c729b4eb20318f4ebd70d67..65d1c36ad2d0cd897ffd1c16e0956f7bd0d3b304 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/accessing-the-cluster/shell-and-data-access/shell-and-data-access.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/accessing-the-cluster/shell-and-data-access/shell-and-data-access.md @@ -3,7 +3,7 @@ Shell access and data transfer - + Interactive Login ----------------- @@ -12,19 +12,19 @@ The Anselm cluster is accessed by SSH protocol via login nodes login1 and login2 at address anselm.it4i.cz. The login nodes may be addressed specifically, by prepending the login node name to the address. - Login address Port Protocol Login node - ----------------------- ------ ---------- ---------------------------------------------- - anselm.it4i.cz 22 ssh round-robin DNS record for login1 and login2 - login1.anselm.it4i.cz 22 ssh login1 - login2.anselm.it4i.cz 22 ssh login2 +Login address Port Protocol Login node +----------------------- ------ ---------- ---------------------------------------------- +anselm.it4i.cz 22 ssh round-robin DNS record for login1 and login2 +login1.anselm.it4i.cz 22 ssh login1 +login2.anselm.it4i.cz 22 ssh login2 The authentication is by the [private key](../../../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.html) Please verify SSH fingerprints during the first logon. They are -identical on all login nodes:<span class="monospace"> +identical on all login nodes: 29:b3:f4:64:b0:73:f5:6f:a7:85:0f:e0:0d:be:76:bf (DSA) -d4:6f:5c:18:f4:3f:70:ef:bc:fc:cc:2b:fd:13:36:b7 (RSA)</span> +d4:6f:5c:18:f4:3f:70:ef:bc:fc:cc:2b:fd:13:36:b7 (RSA)  @@ -32,14 +32,14 @@ Private keys authentication: On **Linux** or **Mac**, use -``` +``` local $ ssh -i /path/to/id_rsa username@anselm.it4i.cz ``` If you see warning message "UNPROTECTED PRIVATE KEY FILE!", use this command to set lower permissions to private key file. -``` +``` local $ chmod 600 /path/to/id_rsa ``` @@ -48,19 +48,19 @@ client](../../../get-started-with-it4innovations/accessing-the-clusters/shell-ac After logging in, you will see the command prompt: - _ - / | | - / _ __ ___ ___| |_ __ ___ - / / | '_ / __|/ _ | '_ ` _ - / ____ | | | __ __/ | | | | | | - /_/ __| |_|___/___|_|_| |_| |_| + _ + / | | + / _ __ ___ ___| |_ __ ___ + / / | '_ / __|/ _ | '_ ` _ + / ____ | | | __ __/ | | | | | | + /_/ __| |_|___/___|_|_| |_| |_| -                        http://www.it4i.cz/?lang=en +                        http://www.it4i.cz/?lang=en - Last login: Tue Jul 9 15:57:38 2013 from your-host.example.com - [username@login2.anselm ~]$ + Last login: Tue Jul 9 15:57:38 2013 from your-host.example.com + [username@login2.anselm ~]$ The environment is **not** shared between login nodes, except for [shared filesystems](../storage-1.html#section-1). @@ -70,16 +70,16 @@ Data Transfer Data in and out of the system may be transferred by the [scp](http://en.wikipedia.org/wiki/Secure_copy) and sftp -protocols. <span class="discreet">(Not available yet.) In case large +protocols. class="discreet">(Not available yet.) 
In case large volumes of data are transferred, use dedicated data mover node -dm1.anselm.it4i.cz for increased performance.</span> +dm1.anselm.it4i.cz for increased performance. - Address Port Protocol - -------------------------------------------------- ---------------------------------- ----------------------------------------- - anselm.it4i.cz 22 scp, sftp - login1.anselm.it4i.cz 22 scp, sftp - login2.anselm.it4i.cz 22 scp, sftp - <span class="discreet">dm1.anselm.it4i.cz</span> <span class="discreet">22</span> <span class="discreet">scp, sftp</span> +Address Port Protocol +-------------------------------------------------- ---------------------------------- ----------------------------------------- +anselm.it4i.cz 22 scp, sftp +login1.anselm.it4i.cz 22 scp, sftp +login2.anselm.it4i.cz 22 scp, sftp + class="discreet">dm1.anselm.it4i.cz class="discreet">22 <span class="discreet">scp, sftp</span>  The authentication is by the [private key](../../../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.html) @@ -97,17 +97,17 @@ network provider. On linux or Mac, use scp or sftp client to transfer the data to Anselm: -``` +``` local $ scp -i /path/to/id_rsa my-local-file username@anselm.it4i.cz:directory/file ``` -``` +``` local $ scp -i /path/to/id_rsa -r my-local-dir username@anselm.it4i.cz:directory ``` > or -``` +``` local $ sftp -o IdentityFile=/path/to/id_rsa username@anselm.it4i.cz ``` @@ -115,7 +115,7 @@ Very convenient way to transfer files in and out of the Anselm computer is via the fuse filesystem [sshfs](http://linux.die.net/man/1/sshfs) -``` +``` local $ sshfs -o IdentityFile=/path/to/id_rsa username@anselm.it4i.cz:. mountpoint ``` @@ -124,7 +124,7 @@ local computer, just like an external disk. Learn more on ssh, scp and sshfs by reading the manpages -``` +``` $ man ssh $ man scp $ man sshfs diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/accessing-the-cluster/vpn-access.md b/converted/docs.it4i.cz/anselm-cluster-documentation/accessing-the-cluster/vpn-access.md index cfc7e40c07a39c21432acbfe70d29f167cec9c03..8f9b46013862cdf3290aa48ff3c924a1d8916974 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/accessing-the-cluster/vpn-access.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/accessing-the-cluster/vpn-access.md @@ -3,12 +3,12 @@ VPN Access - + Accessing IT4Innovations internal resources via VPN --------------------------------------------------- -**Failed to initialize connection subsystem Win 8.1 - 02-10-15 MS +Failed to initialize connection subsystem Win 8.1 - 02-10-15 MS patch** Workaround can be found at [https://docs.it4i.cz/vpn-connection-fail-in-win-8.1](../../vpn-connection-fail-in-win-8.1.html) @@ -20,16 +20,16 @@ local network, it is necessary to VPN connect to this network. We use Cisco AnyConnect Secure Mobility Client, which is supported on the following operating systems: -- <span>Windows XP</span> -- <span>Windows Vista</span> -- <span>Windows 7</span> -- <span>Windows 8</span> -- <span>Linux</span> -- <span>MacOS</span> +- >Windows XP +- >Windows Vista +- >Windows 7 +- >Windows 8 +- >Linux +- >MacOS It is impossible to connect to VPN from other operating systems. 
-<span>VPN client installation</span> +>VPN client installation ------------------------------------ You can install VPN client from web interface after successful login diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/compute-nodes.md b/converted/docs.it4i.cz/anselm-cluster-documentation/compute-nodes.md index 5f8fe6273238b3e16196742cf6090ca48af2ad80..ecae0fd750eef8f45a0efec6338f165bbbd64cef 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/compute-nodes.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/compute-nodes.md @@ -3,7 +3,7 @@ Compute Nodes - + Nodes Configuration ------------------- @@ -12,198 +12,198 @@ Anselm is cluster of x86-64 Intel based nodes built on Bull Extreme Computing bullx technology. The cluster contains four types of compute nodes.**** -### **Compute Nodes Without Accelerator** +###Compute Nodes Without Accelerator** -- <div class="itemizedlist"> +- <div class="itemizedlist"> - 180 nodes + 180 nodes - + -- <div class="itemizedlist"> +- <div class="itemizedlist"> - 2880 cores in total + 2880 cores in total - + -- <div class="itemizedlist"> +- <div class="itemizedlist"> - two Intel Sandy Bridge E5-2665, 8-core, 2.4GHz processors per node + two Intel Sandy Bridge E5-2665, 8-core, 2.4GHz processors per node - + -- <div class="itemizedlist"> +- <div class="itemizedlist"> - 64 GB of physical memory per node + 64 GB of physical memory per node - + -- one 500GB SATA 2,5” 7,2 krpm HDD per node -- <div class="itemizedlist"> +- one 500GB SATA 2,5” 7,2 krpm HDD per node +- <div class="itemizedlist"> - bullx B510 blade servers + bullx B510 blade servers - + -- <div class="itemizedlist"> +- <div class="itemizedlist"> - cn[1-180] + cn[1-180] - + -### **Compute Nodes With GPU Accelerator** +###Compute Nodes With GPU Accelerator** -- <div class="itemizedlist"> +- <div class="itemizedlist"> - 23 nodes + 23 nodes - + -- <div class="itemizedlist"> +- <div class="itemizedlist"> - 368 cores in total + 368 cores in total - + -- <div class="itemizedlist"> +- <div class="itemizedlist"> - two Intel Sandy Bridge E5-2470, 8-core, 2.3GHz processors per node + two Intel Sandy Bridge E5-2470, 8-core, 2.3GHz processors per node - + -- <div class="itemizedlist"> +- <div class="itemizedlist"> - 96 GB of physical memory per node + 96 GB of physical memory per node - + -- one 500GB SATA 2,5” 7,2 krpm HDD per node -- <div class="itemizedlist"> +- one 500GB SATA 2,5” 7,2 krpm HDD per node +- <div class="itemizedlist"> - GPU accelerator 1x NVIDIA Tesla Kepler K20 per node + GPU accelerator 1x NVIDIA Tesla Kepler K20 per node - + -- <div class="itemizedlist"> +- <div class="itemizedlist"> - bullx B515 blade servers + bullx B515 blade servers - + -- <div class="itemizedlist"> +- <div class="itemizedlist"> - cn[181-203] + cn[181-203] - + -### **Compute Nodes With MIC Accelerator** +###Compute Nodes With MIC Accelerator** -- <div class="itemizedlist"> +- <div class="itemizedlist"> - 4 nodes + 4 nodes - + -- <div class="itemizedlist"> +- <div class="itemizedlist"> - 64 cores in total + 64 cores in total - + -- <div class="itemizedlist"> +- <div class="itemizedlist"> - two Intel Sandy Bridge E5-2470, 8-core, 2.3GHz processors per node + two Intel Sandy Bridge E5-2470, 8-core, 2.3GHz processors per node - + -- <div class="itemizedlist"> +- <div class="itemizedlist"> - 96 GB of physical memory per node + 96 GB of physical memory per node - + -- one 500GB SATA 2,5” 7,2 krpm HDD per node -- <div class="itemizedlist"> +- one 500GB SATA 2,5” 7,2 krpm HDD per node +- <div 
class="itemizedlist"> - MIC accelerator 1x Intel Phi 5110P per node + MIC accelerator 1x Intel Phi 5110P per node - + -- <div class="itemizedlist"> +- <div class="itemizedlist"> - bullx B515 blade servers + bullx B515 blade servers - + -- <div class="itemizedlist"> +- <div class="itemizedlist"> - cn[204-207] + cn[204-207] - + -### **Fat Compute Nodes** +###Fat Compute Nodes** -- <div> +- <div> - 2 nodes + 2 nodes - + -- <div> +- <div> - 32 cores in total + 32 cores in total - + -- <div> +- <div> - 2 Intel Sandy Bridge E5-2665, 8-core, 2.4GHz processors per node + 2 Intel Sandy Bridge E5-2665, 8-core, 2.4GHz processors per node - + -- <div> +- <div> - 512 GB of physical memory per node + 512 GB of physical memory per node - + -- two 300GB SAS 3,5”15krpm HDD (RAID1) per node -- <div> +- two 300GB SAS 3,5”15krpm HDD (RAID1) per node +- <div> - two 100GB SLC SSD per node + two 100GB SLC SSD per node - + -- <div> +- <div> - bullx R423-E3 servers + bullx R423-E3 servers - + -- <div> +- <div> - cn[208-209] + cn[208-209] - +  -**** +** -****Figure Anselm bullx B510 servers**** +**Figure Anselm bullx B510 servers**** ### Compute Nodes Summary******** - Node type Count Range Memory Cores [Access](resource-allocation-and-job-execution/resources-allocation-policy.html) - ---------------------------- ------- --------------- -------- ------------- -------------------------------------------------------------------------------------------------- - Nodes without accelerator 180 cn[1-180] 64GB 16 @ 2.4Ghz qexp, qprod, qlong, qfree - Nodes with GPU accelerator 23 cn[181-203] 96GB 16 @ 2.3Ghz qgpu, qprod - Nodes with MIC accelerator 4 cn[204-207] 96GB 16 @ 2.3GHz qmic, qprod - Fat compute nodes 2 cn[208-209] 512GB 16 @ 2.4GHz qfat, qprod +Node type Count Range Memory Cores [Access](resource-allocation-and-job-execution/resources-allocation-policy.html) +---------------------------- ------- --------------- -------- ------------- -------------------------------------------------------------------------------------------------- +Nodes without accelerator 180 cn[1-180] 64GB 16 @ 2.4Ghz qexp, qprod, qlong, qfree +Nodes with GPU accelerator 23 cn[181-203] 96GB 16 @ 2.3Ghz qgpu, qprod +Nodes with MIC accelerator 4 cn[204-207] 96GB 16 @ 2.3GHz qmic, qprod +Fat compute nodes 2 cn[208-209] 512GB 16 @ 2.4GHz qfat, qprod @@ -218,34 +218,34 @@ with accelerator). Processors support Advanced Vector Extensions (AVX) 256-bit instruction set. 
### Intel Sandy Bridge E5-2665 Processor -- eight-core -- speed: 2.4 GHz, up to 3.1 GHz using Turbo Boost Technology -- peak performance: <span class="emphasis">19.2 Gflop/s</span> per - core -- caches: - <div class="itemizedlist"> +- eight-core +- speed: 2.4 GHz, up to 3.1 GHz using Turbo Boost Technology +- peak performance: class="emphasis">19.2 Gflop/s per + core +- caches: + <div class="itemizedlist"> - - L2: 256 KB per core - - L3: 20 MB per processor + - L2: 256 KB per core + - L3: 20 MB per processor - + -- memory bandwidth at the level of the processor: 51.2 GB/s +- memory bandwidth at the level of the processor: 51.2 GB/s ### Intel Sandy Bridge E5-2470 Processor -- eight-core -- speed: 2.3 GHz, up to 3.1 GHz using Turbo Boost Technology -- peak performance: <span class="emphasis">18.4 Gflop/s</span> per - core -- caches: - <div class="itemizedlist"> +- eight-core +- speed: 2.3 GHz, up to 3.1 GHz using Turbo Boost Technology +- peak performance: class="emphasis">18.4 Gflop/s per + core +- caches: + <div class="itemizedlist"> - - L2: 256 KB per core - - L3: 20 MB per processor + - L2: 256 KB per core + - L3: 20 MB per processor - + -- memory bandwidth at the level of the processor: 38.4 GB/s +- memory bandwidth at the level of the processor: 38.4 GB/s @@ -257,7 +257,7 @@ have set PBS resource attribute cpu_freq = 23. -``` +``` $ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16:cpu_freq=24 -I ``` @@ -269,52 +269,52 @@ all nodes of job by using resource attribute cpu_turbo_boost. - $ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16 -l cpu_turbo_boost=0 -I + $ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16 -l cpu_turbo_boost=0 -I Memory Architecture ------------------- ### Compute Node Without Accelerator -- 2 sockets -- Memory Controllers are integrated into processors. - <div class="itemizedlist"> +- 2 sockets +- Memory Controllers are integrated into processors. + <div class="itemizedlist"> - - 8 DDR3 DIMMS per node - - 4 DDR3 DIMMS per CPU - - 1 DDR3 DIMMS per channel - - Data rate support: up to 1600MT/s + - 8 DDR3 DIMMS per node + - 4 DDR3 DIMMS per CPU + - 1 DDR3 DIMMS per channel + - Data rate support: up to 1600MT/s - + -- Populated memory: 8x 8GB DDR3 DIMM 1600Mhz +- Populated memory: 8x 8GB DDR3 DIMM 1600Mhz ### Compute Node With GPU or MIC Accelerator -- 2 sockets -- Memory Controllers are integrated into processors. - <div class="itemizedlist"> +- 2 sockets +- Memory Controllers are integrated into processors. + <div class="itemizedlist"> - - 6 DDR3 DIMMS per node - - 3 DDR3 DIMMS per CPU - - 1 DDR3 DIMMS per channel - - Data rate support: up to 1600MT/s + - 6 DDR3 DIMMS per node + - 3 DDR3 DIMMS per CPU + - 1 DDR3 DIMMS per channel + - Data rate support: up to 1600MT/s - + -- Populated memory: 6x 16GB DDR3 DIMM 1600Mhz +- Populated memory: 6x 16GB DDR3 DIMM 1600Mhz ### Fat Compute Node -- 2 sockets -- Memory Controllers are integrated into processors. - <div class="itemizedlist"> +- 2 sockets +- Memory Controllers are integrated into processors. 
+ <div class="itemizedlist"> - - 16 DDR3 DIMMS per node - - 8 DDR3 DIMMS per CPU - - 2 DDR3 DIMMS per channel - - Data rate support: up to 1600MT/s + - 16 DDR3 DIMMS per node + - 8 DDR3 DIMMS per CPU + - 2 DDR3 DIMMS per channel + - Data rate support: up to 1600MT/s - + -- Populated memory: 16x 32GB DDR3 DIMM 1600Mhz +- Populated memory: 16x 32GB DDR3 DIMM 1600Mhz diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/69842481-634a-484e-90cd-d65e0ddca1e8.jpeg b/converted/docs.it4i.cz/anselm-cluster-documentation/downloadfilesuccessfull.jpeg similarity index 100% rename from converted/docs.it4i.cz/anselm-cluster-documentation/69842481-634a-484e-90cd-d65e0ddca1e8.jpeg rename to converted/docs.it4i.cz/anselm-cluster-documentation/downloadfilesuccessfull.jpeg diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/environment-and-modules.md b/converted/docs.it4i.cz/anselm-cluster-documentation/environment-and-modules.md index a4e9c7abc5686caf3f59d29b5c353c369d46cd00..a301f19292f5262080771d8cc2e93c5f1138dee9 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/environment-and-modules.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/environment-and-modules.md @@ -3,7 +3,7 @@ Environment and Modules - + ### Environment Customization @@ -11,12 +11,12 @@ After logging in, you may want to configure the environment. Write your preferred path definitions, aliases, functions and module loads in the .bashrc file -``` +``` # ./bashrc # Source global definitions if [ -f /etc/bashrc ]; then - . /etc/bashrc + . /etc/bashrc fi # User specific aliases and functions @@ -33,9 +33,9 @@ fi Do not run commands outputing to standard output (echo, module list, etc) in .bashrc for non-interactive SSH sessions. It breaks fundamental functionality (scp, PBS) of your account! Take care for SSH session -interactivity for such commands as <span id="result_box" -class="short_text"><span class="hps alt-edited">stated</span> <span -class="hps">in the previous example.</span></span> +interactivity for such commands as id="result_box" +class="short_text"> class="hps alt-edited">stated +class="hps">in the previous example. ### Application Modules @@ -57,13 +57,13 @@ needs. To check available modules use -``` +``` $ module avail ``` To load a module, for example the octave module use -``` +``` $ module load octave ``` @@ -72,19 +72,19 @@ your active shell such that you are ready to run the octave software To check loaded modules use -``` +``` $ module list ```  To unload a module, for example the octave module use -``` +``` $ module unload octave ``` Learn more on modules by reading the module man page -``` +``` $ man module ``` @@ -96,7 +96,7 @@ the bullx MPI library PrgEnv-intel sets up the INTEL development environment in conjunction with the Intel MPI library -### []()Application Modules Path Expansion +### Application Modules Path Expansion All application modules on Salomon cluster (and further) will be build using tool called @@ -105,7 +105,7 @@ In case that you want to use some applications that are build by EasyBuild already, you have to modify your MODULEPATH environment variable. 
-``` +``` export MODULEPATH=$MODULEPATH:/apps/easybuild/modules/all/ ``` diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/4d6e7cb7-9aa7-419c-9583-6dfd92b2c015.jpeg b/converted/docs.it4i.cz/anselm-cluster-documentation/executionaccess.jpeg similarity index 100% rename from converted/docs.it4i.cz/anselm-cluster-documentation/4d6e7cb7-9aa7-419c-9583-6dfd92b2c015.jpeg rename to converted/docs.it4i.cz/anselm-cluster-documentation/executionaccess.jpeg diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/bed3998c-4b82-4b40-83bd-c3528dde2425.jpeg b/converted/docs.it4i.cz/anselm-cluster-documentation/executionaccess2.jpeg similarity index 100% rename from converted/docs.it4i.cz/anselm-cluster-documentation/bed3998c-4b82-4b40-83bd-c3528dde2425.jpeg rename to converted/docs.it4i.cz/anselm-cluster-documentation/executionaccess2.jpeg diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/hardware-overview.md b/converted/docs.it4i.cz/anselm-cluster-documentation/hardware-overview.md index e996eddb1f9f55a75a64afce653c90d8900b0af9..50ae62cbd9a4b5377a0fe58552b5cb0445d2f79a 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/hardware-overview.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/hardware-overview.md @@ -3,12 +3,12 @@ Hardware Overview - + The Anselm cluster consists of 209 computational nodes named cn[1-209] of which 180 are regular compute nodes, 23 GPU Kepler K20 accelerated nodes, 4 MIC Xeon Phi 5110 accelerated nodes and 2 fat nodes. Each node -is a <span class="WYSIWYG_LINK">powerful</span> x86-64 computer, +is a class="WYSIWYG_LINK">powerful x86-64 computer, equipped with 16 cores (two eight-core Intel Sandy Bridge processors), at least 64GB RAM, and local hard drive. The user access to the Anselm cluster is provided by two login nodes login[1,2]. The nodes are @@ -30,22 +30,22 @@ node (computer) or storage capacity: User-oriented infrastructure Storage Management infrastructure - -------- - login1 - login2 - dm1 - -------- +-------- +login1 +login2 +dm1 +-------- + +Rack 01, Switch isw5 -**Rack 01, Switch isw5 -** - -------------- -------------- -------------- -------------- -------------- - cn186 cn187 cn188 cn189 - cn181 cn182 cn183 cn184 cn185 - -------------- -------------- -------------- -------------- -------------- +-------------- -------------- -------------- -------------- -------------- +cn186 cn187 cn188 cn189 +cn181 cn182 cn183 cn184 cn185 +-------------- -------------- -------------- -------------- -------------- + +Rack 01, Switch isw4 -**Rack 01, Switch isw4 -** cn29 cn30 @@ -99,8 +99,8 @@ Srv node Srv node Srv node ... 
-**Rack 01, Switch isw0 -** +Rack 01, Switch isw0 + cn11 cn12 @@ -120,8 +120,8 @@ cn7 cn8 cn9 cn10 -**Rack 02, Switch isw10 -** +Rack 02, Switch isw10 + cn73 cn74 @@ -136,8 +136,8 @@ cn191 cn192 cn205 cn206 -**Rack 02, Switch isw9 -** +Rack 02, Switch isw9 + cn65 cn66 @@ -157,8 +157,8 @@ cn61 cn62 cn63 cn64 -**Rack 02, Switch isw6 -** +Rack 02, Switch isw6 + cn47 cn48 @@ -178,8 +178,8 @@ cn43 cn44 cn45 cn46 -**Rack 03, Switch isw15 -** +Rack 03, Switch isw15 + cn193 cn194 @@ -195,8 +195,8 @@ cn123 cn124 cn125 cn126 -**Rack 03, Switch isw14 -** +Rack 03, Switch isw14 + cn109 cn110 @@ -216,8 +216,8 @@ cn105 cn106 cn107 cn108 -**Rack 03, Switch isw11 -** +Rack 03, Switch isw11 + cn91 cn92 @@ -237,8 +237,8 @@ cn87 cn88 cn89 cn90 -**Rack 04, Switch isw20 -** +Rack 04, Switch isw20 + cn173 cn174 @@ -258,8 +258,8 @@ cn169 cn170 cn171 cn172 -**Rack 04, **Switch** isw19 -** +Rack 04, **Switch** isw19 + cn155 cn156 @@ -279,8 +279,8 @@ cn151 cn152 cn153 cn154 -**Rack 04, Switch isw16 -** +Rack 04, Switch isw16 + cn137 cn138 @@ -300,19 +300,19 @@ cn133 cn134 cn135 cn136 -**Rack 05, Switch isw21 -** +Rack 05, Switch isw21 + - -------------- -------------- -------------- -------------- -------------- - cn201 cn202 cn203 cn204 - cn196 cn197 cn198 cn199 cn200 - -------------- -------------- -------------- -------------- -------------- +-------------- -------------- -------------- -------------- -------------- +cn201 cn202 cn203 cn204 +cn196 cn197 cn198 cn199 cn200 +-------------- -------------- -------------- -------------- -------------- - ---------------- - Fat node cn208 - Fat node cn209 - ... - ---------------- +---------------- +Fat node cn208 +Fat node cn209 +... +---------------- @@ -320,12 +320,12 @@ The cluster compute nodes cn[1-207] are organized within 13 chassis. There are four types of compute nodes: -- 180 compute nodes without the accelerator -- 23 compute nodes with GPU accelerator - equipped with NVIDIA Tesla - Kepler K20 -- 4 compute nodes with MIC accelerator - equipped with Intel Xeon Phi - 5110P -- 2 fat nodes - equipped with 512GB RAM and two 100GB SSD drives +- 180 compute nodes without the accelerator +- 23 compute nodes with GPU accelerator - equipped with NVIDIA Tesla + Kepler K20 +- 4 compute nodes with MIC accelerator - equipped with Intel Xeon Phi + 5110P +- 2 fat nodes - equipped with 512GB RAM and two 100GB SSD drives [More about Compute nodes](compute-nodes.html). @@ -333,10 +333,10 @@ GPU and accelerated nodes are available upon request, see the [Resources Allocation Policy](resource-allocation-and-job-execution/resources-allocation-policy.html). -All these nodes are interconnected by fast <span -class="WYSIWYG_LINK">InfiniBand <span class="WYSIWYG_LINK">QDR</span> -network</span> and Ethernet network. [More about the <span -class="WYSIWYG_LINK">Network</span>](network.html). +All these nodes are interconnected by fast +class="WYSIWYG_LINK">InfiniBand class="WYSIWYG_LINK">QDR +network and Ethernet network. [More about the +class="WYSIWYG_LINK">Network](network.html). Every chassis provides Infiniband switch, marked **isw**, connecting all nodes in the chassis, as well as connecting the chassis to the upper level switches. @@ -345,8 +345,8 @@ All nodes share 360TB /home disk storage to store user files. The 146TB shared /scratch storage is available for the scratch data. These file systems are provided by Lustre parallel file system. There is also local disk storage available on all compute nodes /lscratch. 
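From a compute node, the mounted capacities can be checked with a plain df query; this is only a sketch and assumes the /home, /scratch and /lscratch mount points named above:

```
# show size, used and available space of the shared and local scratch filesystems
$ df -h /home /scratch /lscratch
```
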
[More about -<span -class="WYSIWYG_LINK">Storage</span>](storage.html). + +class="WYSIWYG_LINK">Storage](storage.html). The user access to the Anselm cluster is provided by two login nodes login1, login2, and data mover node dm1. [More about accessing @@ -354,7 +354,7 @@ cluster.](accessing-the-cluster.html)  The parameters are summarized in the following tables: -**In general** +In general** Primary purpose High Performance Computing Architecture of compute nodes @@ -380,19 +380,19 @@ MIC accelerated 4, cn[204-207] Fat compute nodes 2, cn[208-209] -**In total** +In total** Total theoretical peak performance (Rpeak) 94 Tflop/s Total max. LINPACK performance (Rmax) 73 Tflop/s Total amount of RAM 15.136 TB - Node Processor Memory Accelerator - ------------------ --------------------------------------- -------- ---------------------- - w/o accelerator 2x Intel Sandy Bridge E5-2665, 2.4GHz 64GB - - GPU accelerated 2x Intel Sandy Bridge E5-2470, 2.3GHz 96GB NVIDIA Kepler K20 - MIC accelerated 2x Intel Sandy Bridge E5-2470, 2.3GHz 96GB Intel Xeon Phi P5110 - Fat compute node 2x Intel Sandy Bridge E5-2665, 2.4GHz 512GB - +Node Processor Memory Accelerator +------------------ --------------------------------------- -------- ---------------------- +w/o accelerator 2x Intel Sandy Bridge E5-2665, 2.4GHz 64GB - +GPU accelerated 2x Intel Sandy Bridge E5-2470, 2.3GHz 96GB NVIDIA Kepler K20 +MIC accelerated 2x Intel Sandy Bridge E5-2470, 2.3GHz 96GB Intel Xeon Phi P5110 +Fat compute node 2x Intel Sandy Bridge E5-2665, 2.4GHz 512GB -  For more details please refer to the [Compute nodes](compute-nodes.html), diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/202d14e9-e2e1-450b-a584-e78c018d6b6a.jpeg b/converted/docs.it4i.cz/anselm-cluster-documentation/instalationfile.jpeg similarity index 100% rename from converted/docs.it4i.cz/anselm-cluster-documentation/202d14e9-e2e1-450b-a584-e78c018d6b6a.jpeg rename to converted/docs.it4i.cz/anselm-cluster-documentation/instalationfile.jpeg diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/introduction.md b/converted/docs.it4i.cz/anselm-cluster-documentation/introduction.md index 5be69738b59ac40ab44c1e5aaf11616b9b774647..cb25cce81b9eed12427ad908695327c84b5f6cec 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/introduction.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/introduction.md @@ -3,24 +3,24 @@ Introduction - + Welcome to Anselm supercomputer cluster. The Anselm cluster consists of 209 compute nodes, totaling 3344 compute cores with 15TB RAM and giving -over 94 Tflop/s theoretical peak performance. Each node is a <span -class="WYSIWYG_LINK">powerful</span> x86-64 computer, equipped with 16 +over 94 Tflop/s theoretical peak performance. Each node is a +class="WYSIWYG_LINK">powerful x86-64 computer, equipped with 16 cores, at least 64GB RAM, and 500GB harddrive. Nodes are interconnected by fully non-blocking fat-tree Infiniband network and equipped with Intel Sandy Bridge processors. A few nodes are also equipped with NVIDIA Kepler GPU or Intel Xeon Phi MIC accelerators. Read more in [Hardware Overview](hardware-overview.html). 
-The cluster runs bullx Linux [<span -class="WYSIWYG_LINK"></span>](http://www.bull.com/bullx-logiciels/systeme-exploitation.html)[operating +The cluster runs bullx Linux [ +class="WYSIWYG_LINK">](http://www.bull.com/bullx-logiciels/systeme-exploitation.html)[operating system](software/operating-system.html), which is -compatible with the <span class="WYSIWYG_LINK">RedHat</span> [<span +compatible with the class="WYSIWYG_LINK">RedHat [ class="WYSIWYG_LINK">Linux -family.</span>](http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg) +family.](http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg) We have installed a wide range of [software](software.1.html) packages targeted at different scientific domains. These packages are accessible via the diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/5498e1ba-2242-4b9c-a799-0377a73f779e.jpeg b/converted/docs.it4i.cz/anselm-cluster-documentation/java_detection.jpeg similarity index 100% rename from converted/docs.it4i.cz/anselm-cluster-documentation/5498e1ba-2242-4b9c-a799-0377a73f779e.jpeg rename to converted/docs.it4i.cz/anselm-cluster-documentation/java_detection.jpeg diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/30271119-b392-4db9-a212-309fb41925d6.jpeg b/converted/docs.it4i.cz/anselm-cluster-documentation/login.jpeg similarity index 100% rename from converted/docs.it4i.cz/anselm-cluster-documentation/30271119-b392-4db9-a212-309fb41925d6.jpeg rename to converted/docs.it4i.cz/anselm-cluster-documentation/login.jpeg diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/a6fd5f3f-bce4-45c9-85e1-8d93c6395eee.jpeg b/converted/docs.it4i.cz/anselm-cluster-documentation/loginwithprofile.jpeg similarity index 100% rename from converted/docs.it4i.cz/anselm-cluster-documentation/a6fd5f3f-bce4-45c9-85e1-8d93c6395eee.jpeg rename to converted/docs.it4i.cz/anselm-cluster-documentation/loginwithprofile.jpeg diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/network.md b/converted/docs.it4i.cz/anselm-cluster-documentation/network.md index b18fe77e1bf6c2a978ab319e8f4d8b9c781b8fe8..f212cbea7da35aab0d3411dd809a9be887b72768 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/network.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/network.md @@ -3,7 +3,7 @@ Network - + All compute and login nodes of Anselm are interconnected by [Infiniband](http://en.wikipedia.org/wiki/InfiniBand) @@ -42,14 +42,14 @@ The network provides **114MB/s** transfer rates via the TCP connection. 
Example ------- -``` +``` $ qsub -q qexp -l select=4:ncpus=16 -N Name0 ./myjob $ qstat -n -u username - Req'd Req'd Elap -Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time + Req'd Req'd Elap +Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time --------------- -------- -------- ---------- ------ --- --- ------ ----- - ----- -15209.srv11 username qexp Name0 5530 4 64 -- 01:00 R 00:00 - cn17/0*16+cn108/0*16+cn109/0*16+cn110/0*16 +15209.srv11 username qexp Name0 5530 4 64 -- 01:00 R 00:00 + cn17/0*16+cn108/0*16+cn109/0*16+cn110/0*16 $ ssh 10.2.1.110 $ ssh 10.1.1.108 diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/prace.md b/converted/docs.it4i.cz/anselm-cluster-documentation/prace.md index a93c53a7af6feaa9afc033905d39635bacb943e0..7e6fe2800e4641edcc40c1def7102ee7005abf4d 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/prace.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/prace.md @@ -3,7 +3,7 @@ PRACE User Support - + Intro ----- @@ -25,7 +25,7 @@ All general [PRACE User Documentation](http://www.prace-ri.eu/user-documentation/) should be read before continuing reading the local documentation here. -[]()Help and Support +Help and Support -------------------- If you have any troubles, need information, request support or want to @@ -70,28 +70,28 @@ project for LDAP account creation). Most of the information needed by PRACE users accessing the Anselm TIER-1 system can be found here: -- [General user's - FAQ](http://www.prace-ri.eu/Users-General-FAQs) -- [Certificates - FAQ](http://www.prace-ri.eu/Certificates-FAQ) -- [Interactive access using - GSISSH](http://www.prace-ri.eu/Interactive-Access-Using-gsissh) -- [Data transfer with - GridFTP](http://www.prace-ri.eu/Data-Transfer-with-GridFTP-Details) -- [Data transfer with - gtransfer](http://www.prace-ri.eu/Data-Transfer-with-gtransfer) +- [General user's + FAQ](http://www.prace-ri.eu/Users-General-FAQs) +- [Certificates + FAQ](http://www.prace-ri.eu/Certificates-FAQ) +- [Interactive access using + GSISSH](http://www.prace-ri.eu/Interactive-Access-Using-gsissh) +- [Data transfer with + GridFTP](http://www.prace-ri.eu/Data-Transfer-with-GridFTP-Details) +- [Data transfer with + gtransfer](http://www.prace-ri.eu/Data-Transfer-with-gtransfer)  Before you start to use any of the services don't forget to create a proxy certificate from your certificate: - $ grid-proxy-init + $ grid-proxy-init To check whether your proxy certificate is still valid (by default it's valid 12 hours), use: - $ grid-proxy-info + $ grid-proxy-info  @@ -99,49 +99,49 @@ To access Anselm cluster, two login nodes running GSI SSH service are available. The service is available from public Internet as well as from the internal PRACE network (accessible only from other PRACE partners). -**Access from PRACE network:** +Access from PRACE network:** -It is recommended to use the single DNS name <span -class="monospace">anselm-prace.it4i.cz</span> which is distributed +It is recommended to use the single DNS name +anselm-prace.it4i.cz which is distributed between the two login nodes. If needed, user can login directly to one of the login nodes. 
The addresses are: - Login address Port Protocol Login node - ----------------------------- ------ ---------- ------------------ - anselm-prace.it4i.cz 2222 gsissh login1 or login2 - login1-prace.anselm.it4i.cz 2222 gsissh login1 - login2-prace.anselm.it4i.cz 2222 gsissh login2 +Login address Port Protocol Login node +----------------------------- ------ ---------- ------------------ +anselm-prace.it4i.cz 2222 gsissh login1 or login2 +login1-prace.anselm.it4i.cz 2222 gsissh login1 +login2-prace.anselm.it4i.cz 2222 gsissh login2  - $ gsissh -p 2222 anselm-prace.it4i.cz + $ gsissh -p 2222 anselm-prace.it4i.cz When logging from other PRACE system, the prace_service script can be used: - $ gsissh `prace_service -i -s anselm` + $ gsissh `prace_service -i -s anselm`  -**Access from public Internet:** +Access from public Internet:** -It is recommended to use the single DNS name <span -class="monospace">anselm.it4i.cz</span> which is distributed between the +It is recommended to use the single DNS name +anselm.it4i.cz which is distributed between the two login nodes. If needed, user can login directly to one of the login nodes. The addresses are: - Login address Port Protocol Login node - ----------------------- ------ ---------- ------------------ - anselm.it4i.cz 2222 gsissh login1 or login2 - login1.anselm.it4i.cz 2222 gsissh login1 - login2.anselm.it4i.cz 2222 gsissh login2 +Login address Port Protocol Login node +----------------------- ------ ---------- ------------------ +anselm.it4i.cz 2222 gsissh login1 or login2 +login1.anselm.it4i.cz 2222 gsissh login1 +login2.anselm.it4i.cz 2222 gsissh login2 - $ gsissh -p 2222 anselm.it4i.cz + $ gsissh -p 2222 anselm.it4i.cz -When logging from other PRACE system, the <span -class="monospace">prace_service</span> script can be used: +When logging from other PRACE system, the +prace_service script can be used: - $ gsissh `prace_service -e -s anselm` + $ gsissh `prace_service -e -s anselm`  @@ -150,13 +150,13 @@ GridFTP](prace.html#file-transfers), the GSI SSH implementation on Anselm supports also SCP, so for small files transfer gsiscp can be used: - $ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ anselm.it4i.cz:_ANSELM_PATH_TO_YOUR_FILE_ + $ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ anselm.it4i.cz:_ANSELM_PATH_TO_YOUR_FILE_ - $ gsiscp -P 2222 anselm.it4i.cz:_ANSELM_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ + $ gsiscp -P 2222 anselm.it4i.cz:_ANSELM_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ - $ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ anselm-prace.it4i.cz:_ANSELM_PATH_TO_YOUR_FILE_ + $ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ anselm-prace.it4i.cz:_ANSELM_PATH_TO_YOUR_FILE_ - $ gsiscp -P 2222 anselm-prace.it4i.cz:_ANSELM_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ + $ gsiscp -P 2222 anselm-prace.it4i.cz:_ANSELM_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ ### Access to X11 applications (VNC) @@ -171,7 +171,7 @@ the SSH based access ([look here](https://docs.it4i.cz/anselm-cluster-documentation/resolveuid/11e53ad0d2fd4c5187537f4baeedff33)), only the port forwarding must be done using GSI SSH: - $ gsissh -p 2222 anselm.it4i.cz -L 5961:localhost:5961 + $ gsissh -p 2222 anselm.it4i.cz -L 5961:localhost:5961 ### Access with SSH @@ -181,7 +181,7 @@ regular users using SSH. For more information please see the [section in general documentation](https://docs.it4i.cz/anselm-cluster-documentation/resolveuid/5d3d6f3d873a42e584cbf4365c4e251b). 
-[]()File transfers +File transfers ------------------ PRACE users can use the same transfer mechanisms as regular users (if @@ -198,68 +198,68 @@ PRACE partners). There's one control server and three backend servers for striping and/or backup in case one of them would fail. -**Access from PRACE network:** +Access from PRACE network:** - Login address Port Node role - ------------------------------ ------ ----------------------------- - gridftp-prace.anselm.it4i.cz 2812 Front end /control server - login1-prace.anselm.it4i.cz 2813 Backend / data mover server - login2-prace.anselm.it4i.cz 2813 Backend / data mover server - dm1-prace.anselm.it4i.cz 2813 Backend / data mover server +Login address Port Node role +------------------------------ ------ ----------------------------- +gridftp-prace.anselm.it4i.cz 2812 Front end /control server +login1-prace.anselm.it4i.cz 2813 Backend / data mover server +login2-prace.anselm.it4i.cz 2813 Backend / data mover server +dm1-prace.anselm.it4i.cz 2813 Backend / data mover server Copy files **to** Anselm by running the following commands on your local machine: - $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp-prace.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ + $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp-prace.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ -Or by using <span class="monospace">prace_service</span> script: +Or by using prace_service script: - $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -i -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ + $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -i -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ Copy files **from** Anselm: - $ globus-url-copy gsiftp://gridftp-prace.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ + $ globus-url-copy gsiftp://gridftp-prace.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ -Or by using <span class="monospace">prace_service</span> script: +Or by using prace_service script: - $ globus-url-copy gsiftp://`prace_service -i -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ + $ globus-url-copy gsiftp://`prace_service -i -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_  -**Access from public Internet:** +Access from public Internet:** - Login address Port Node role - ------------------------ ------ ----------------------------- - gridftp.anselm.it4i.cz 2812 Front end /control server - login1.anselm.it4i.cz 2813 Backend / data mover server - login2.anselm.it4i.cz 2813 Backend / data mover server - dm1.anselm.it4i.cz 2813 Backend / data mover server +Login address Port Node role +------------------------ ------ ----------------------------- +gridftp.anselm.it4i.cz 2812 Front end /control server +login1.anselm.it4i.cz 2813 Backend / data mover server +login2.anselm.it4i.cz 2813 Backend / data mover server +dm1.anselm.it4i.cz 2813 Backend / data mover server Copy files **to** Anselm by running the following commands on your local machine: - $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ + $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ 
gsiftp://gridftp.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ -Or by using <span class="monospace">prace_service</span> script: +Or by using prace_service script: - $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -e -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ + $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -e -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ Copy files **from** Anselm: - $ globus-url-copy gsiftp://gridftp.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ + $ globus-url-copy gsiftp://gridftp.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ -Or by using <span class="monospace">prace_service</span> script: +Or by using prace_service script: - $ globus-url-copy gsiftp://`prace_service -e -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ + $ globus-url-copy gsiftp://`prace_service -e -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_  Generally both shared file systems are available through GridFTP: - File system mount point Filesystem Comment - ------------------------- ------------ ---------------------------------------------------------------- - /home Lustre Default HOME directories of users in format /home/prace/login/ - /scratch Lustre Shared SCRATCH mounted on the whole cluster +File system mount point Filesystem Comment +------------------------- ------------ ---------------------------------------------------------------- +/home Lustre Default HOME directories of users in format /home/prace/login/ +/scratch Lustre Shared SCRATCH mounted on the whole cluster More information about the shared file systems is available [here](storage.html). @@ -290,7 +290,7 @@ PRACE users can use the "prace" module to use the [PRACE Common Production Environment](http://www.prace-ri.eu/PRACE-common-production). - $ module load prace + $ module load prace  @@ -303,22 +303,22 @@ documentation](resource-allocation-and-job-execution/introduction.html). For PRACE users, the default production run queue is "qprace". PRACE users can also use two other queues "qexp" and "qfree". 
- ------------------------------------------------------------------------------------------------------------------------- - queue Active project Project resources Nodes priority authorization walltime - default/max - --------------------- ---------------- ------------------- --------------------- ---------- --------------- ------------- - **qexp** no none required 2 reserved, high no 1 / 1h - Express queue 8 total +------------------------------------------------------------------------------------------------------------------------- +queue Active project Project resources Nodes priority authorization walltime + default/max +--------------------- ---------------- ------------------- --------------------- ---------- --------------- ------------- +qexp** no none required 2 reserved, high no 1 / 1h +Express queue 8 total - **qprace** yes > 0 178 w/o accelerator medium no 24 / 48h - Production queue - +qprace** yes > 0 178 w/o accelerator medium no 24 / 48h +Production queue + - **qfree** yes none required 178 w/o accelerator very low no 12 / 12h - Free resource queue - ------------------------------------------------------------------------------------------------------------------------- +qfree** yes none required 178 w/o accelerator very low no 12 / 12h +Free resource queue +------------------------------------------------------------------------------------------------------------------------- -**qprace**, the PRACE Production queue****: This queue is intended for +qprace**, the PRACE Production queue****: This queue is intended for normal production runs. It is required that active project with nonzero remaining resources is specified to enter the qprace. The queue runs with medium priority and no special authorization is required to use it. @@ -351,20 +351,20 @@ The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients> - $ it4ifree - Password: -     PID  Total Used ...by me Free -   -------- ------- ------ -------- ------- -   OPEN-0-0 1500000 400644  225265 1099356 -   DD-13-1   10000 2606 2606 7394 + $ it4ifree + Password: +     PID  Total Used ...by me Free +   -------- ------- ------ -------- ------- +   OPEN-0-0 1500000 400644  225265 1099356 +   DD-13-1   10000 2606 2606 7394  By default file system quota is applied. To check the current status of the quota use - $ lfs quota -u USER_LOGIN /home - $ lfs quota -u USER_LOGIN /scratch + $ lfs quota -u USER_LOGIN /home + $ lfs quota -u USER_LOGIN /scratch If the quota is insufficient, please contact the [support](prace.html#help-and-support) and request an diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/remote-visualization.md b/converted/docs.it4i.cz/anselm-cluster-documentation/remote-visualization.md index 9871e2aab91188bd4ba65f89412433b7f84723e9..8ca4bf3e7e8b75a5d4c9050cfd32512654a03a96 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/remote-visualization.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/remote-visualization.md @@ -45,7 +45,7 @@ TurboVNC is designed and implemented for cooperation with VirtualGL and available for free for all major platforms. For more information and download, please refer to: <http://sourceforge.net/projects/turbovnc/> -**Always use TurboVNC on both sides** (server and client) **don't mix +Always use TurboVNC on both sides** (server and client) **don't mix TurboVNC and other VNC implementations** (TightVNC, TigerVNC, ...) 
as the VNC protocol implementation may slightly differ and diminish your user experience by introducing picture artifacts, etc. @@ -67,7 +67,7 @@ Otherwise only the geometry (desktop size) definition is needed. This example defines desktop with dimensions 1200x700 pixels and 24 bit color depth. -``` +``` $ module load turbovnc/1.2.2 $ vncserver -geometry 1200x700 -depth 24 @@ -79,7 +79,7 @@ Log file is /home/username/.vnc/login2:1.log #### 3. Remember which display number your VNC server runs (you will need it in the future to stop the server). {#3-remember-which-display-number-your-vnc-server-runs-you-will-need-it-in-the-future-to-stop-the-server} -``` +``` $ vncserver -list TurboVNC server sessions: @@ -92,7 +92,7 @@ In this example the VNC server runs on display **:1**. #### 4. Remember the exact login node, where your VNC server runs. {#4-remember-the-exact-login-node-where-your-vnc-server-runs} -``` +``` $ uname -n login2 ``` @@ -103,7 +103,7 @@ In this example the VNC server runs on **login2**. To get the port you have to look to the log file of your VNC server. -``` +``` $ grep -E "VNC.*port" /home/username/.vnc/login2:1.log 20/02/2015 14:46:41 Listening for VNC connections on TCP port 5901 ``` @@ -114,12 +114,12 @@ In this example the VNC server listens on TCP port **5901**. Tunnel the TCP port on which your VNC server is listenning. -``` +``` $ ssh login2.anselm.it4i.cz -L 5901:localhost:5901 ``` *If you use Windows and Putty, please refer to port forwarding setup -<span class="internal-link">in the documentation</span>:* + in the documentation:* [https://docs.it4i.cz/anselm-cluster-documentation/accessing-the-cluster/x-window-and-vnc#section-12](accessing-the-cluster/x-window-and-vnc.html#section-12) #### 7. If you don't have Turbo VNC installed on your workstation. {#7-if-you-don-t-have-turbo-vnc-installed-on-your-workstation} @@ -131,7 +131,7 @@ Get it from: <http://sourceforge.net/projects/turbovnc/> Mind that you should connect through the SSH tunneled port. In this example it is 5901 on your workstation (localhost). -``` +``` $ vncviewer localhost:5901 ``` @@ -148,7 +148,7 @@ workstation.* *Don't forget to correctly shutdown your own VNC server on the login node!* -``` +``` $ vncserver -kill :1 ``` @@ -156,7 +156,7 @@ Access the visualization node ----------------------------- To access the node use a dedicated PBS Professional scheduler queue -**qviz**. The queue has following properties: +qviz**. The queue has following properties: <table> <colgroup> @@ -189,7 +189,7 @@ default/max</th> <td align="left">none required</td> <td align="left">2</td> <td align="left">4</td> -<td align="left"><span><em>150</em></span></td> +<td align="left">><em>150</em></td> <td align="left">no</td> <td align="left">1 hour / 2 hours</td> </tr> @@ -210,14 +210,14 @@ To access the visualization node, follow these steps: *This step is necessary to allow you to proceed with next steps.* -``` +``` $ qsub -I -q qviz -A PROJECT_ID ``` In this example the default values for CPU cores and usage time are used. -``` +``` $ qsub -I -q qviz -A PROJECT_ID -l select=1:ncpus=16 -l walltime=02:00:00 ``` @@ -229,7 +229,7 @@ In this example a whole node for 2 hours is requested. If there are free resources for your request, you will have a shell running on an assigned node. Please remember the name of the node. -``` +``` $ uname -n srv8 ``` @@ -241,7 +241,7 @@ In this example the visualization session was assigned to node **srv8**. Setup the VirtualGL connection to the node, which PBSPro allocated for your job. 
-``` +``` $ vglconnect srv8 ``` @@ -250,13 +250,13 @@ node, where you will have a shell. #### 3. Load the VirtualGL module. {#3-load-the-virtualgl-module} -``` +``` $ module load virtualgl/2.4 ``` #### 4. Run your desired OpenGL accelerated application using VirtualGL script "vglrun". {#4-run-your-desired-opengl-accelerated-application-using-virtualgl-script-vglrun} -``` +``` $ vglrun glxgears ``` @@ -265,7 +265,7 @@ available through modules, you need at first load the respective module. E. g. to run the **Mentat** OpenGL application from **MARC** software package use: -``` +``` $ module load marc/2013.1 $ vglrun mentat ``` diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/capacity-computing.md b/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/capacity-computing.md index bd2495e0d31583bb7c89b5558830964d19088e1e..5ccce37c5c3721a9ed72a592f9e906b46dde435a 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/capacity-computing.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/capacity-computing.md @@ -3,12 +3,12 @@ Capacity computing - + Introduction ------------ -In many cases, it is useful to submit huge (<span>100+</span>) number of +In many cases, it is useful to submit huge (>100+) number of computational jobs into the PBS queue system. Huge number of (small) jobs is one of the most effective ways to execute embarrassingly parallel calculations, achieving best runtime, throughput and computer @@ -21,28 +21,28 @@ for all users. For this reason, the number of jobs is **limited to 100 per user, 1000 per job array** Please follow one of the procedures below, in case you wish to schedule -more than <span>100</span> jobs at a time. - -- Use [Job arrays](capacity-computing.html#job-arrays) - when running huge number of - [multithread](capacity-computing.html#shared-jobscript-on-one-node) - (bound to one node only) or multinode (multithread across - several nodes) jobs -- Use [GNU - parallel](capacity-computing.html#gnu-parallel) when - running single core jobs -- Combine[GNU parallel with Job - arrays](capacity-computing.html#combining-job-arrays-and-gnu-parallel) - when running huge number of single core jobs +more than >100 jobs at a time. + +- Use [Job arrays](capacity-computing.html#job-arrays) + when running huge number of + [multithread](capacity-computing.html#shared-jobscript-on-one-node) + (bound to one node only) or multinode (multithread across + several nodes) jobs +- Use [GNU + parallel](capacity-computing.html#gnu-parallel) when + running single core jobs +- Combine[GNU parallel with Job + arrays](capacity-computing.html#combining-job-arrays-and-gnu-parallel) + when running huge number of single core jobs Policy ------ -1. A user is allowed to submit at most 100 jobs. Each job may be [a job - array](capacity-computing.html#job-arrays). -2. The array size is at most 1000 subjobs. +1.A user is allowed to submit at most 100 jobs. Each job may be [a job + array](capacity-computing.html#job-arrays). +2.The array size is at most 1000 subjobs. -[]()Job arrays +Job arrays -------------- Huge number of jobs may be easily submitted and managed as a job array. @@ -51,22 +51,22 @@ A job array is a compact representation of many jobs, called subjobs. 
The subjobs share the same job script, and have the same values for all attributes and resources, with the following exceptions: -- each subjob has a unique index, $PBS_ARRAY_INDEX -- job Identifiers of subjobs only differ by their indices -- the state of subjobs can differ (R,Q,...etc.) +- each subjob has a unique index, $PBS_ARRAY_INDEX +- job Identifiers of subjobs only differ by their indices +- the state of subjobs can differ (R,Q,...etc.) All subjobs within a job array have the same scheduling priority and schedule as independent jobs. Entire job array is submitted through a single qsub command and may be managed by qdel, qalter, qhold, qrls and qsig commands as a single job. -### []()Shared jobscript +### Shared jobscript All subjobs in job array use the very same, single jobscript. Each subjob runs its own instance of the jobscript. The instances execute different work controlled by $PBS_ARRAY_INDEX variable. -[]()Example: +Example: Assume we have 900 input files with name beginning with "file" (e. g. file001, ..., file900). Assume we would like to use each of these input @@ -75,13 +75,13 @@ files with program executable myprog.x, each as a separate job. First, we create a tasklist file (or subjobs list), listing all tasks (subjobs) - all input files in our example: -``` +``` $ find . -name 'file*' > tasklist ``` Then we create jobscript: -``` +``` #!/bin/bash #PBS -A PROJECT_ID #PBS -q qprod @@ -92,7 +92,7 @@ SCR=/lscratch/$PBS_JOBID mkdir -p $SCR ; cd $SCR || exit # get individual tasks from tasklist with index from PBS JOB ARRAY -TASK=$(sed -n "${PBS_ARRAY_INDEX}p" $PBS_O_WORKDIR/tasklist) +TASK=$(sed -n "${PBS_ARRAY_INDEX}p" $PBS_O_WORKDIR/tasklist) # copy input file and executable to scratch cp $PBS_O_WORKDIR/$TASK input ; cp $PBS_O_WORKDIR/myprog.x . @@ -108,12 +108,12 @@ In this example, the submit directory holds the 900 input files, executable myprog.x and the jobscript file. As input for each run, we take the filename of input file from created tasklist file. We copy the input file to local scratch /lscratch/$PBS_JOBID, execute the myprog.x -and copy the output file back to <span>the submit directory</span>, +and copy the output file back to >the submit directory, under the $TASK.out name. The myprog.x runs on one node only and must use threads to run in parallel. Be aware, that if the myprog.x **is not multithreaded**, then all the **jobs are run as single thread programs in sequential** manner. Due to allocation of the whole node, the -**accounted time is equal to the usage of whole node**, while using only +accounted time is equal to the usage of whole node**, while using only 1/16 of the node! If huge number of parallel multicore (in means of multinode multithread, @@ -129,7 +129,7 @@ To submit the job array, use the qsub -J command. The 900 jobs of the [example above](capacity-computing.html#array_example) may be submitted like this: -``` +``` $ qsub -N JOBNAME -J 1-900 jobscript 12345[].dm2 ``` @@ -142,7 +142,7 @@ forget to set your valid PROJECT_ID and desired queue). Sometimes for testing purposes, you may need to submit only one-element array. This is not allowed by PBSPro, but there's a workaround: -``` +``` $ qsub -N JOBNAME -J 9-10:2 jobscript ``` @@ -153,40 +153,40 @@ submitting/running your job. Check status of the job array by the qstat command. 
-``` +``` $ qstat -a 12345[].dm2 dm2: - Req'd Req'd Elap -Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time + Req'd Req'd Elap +Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time --------------- -------- -------- ---------- ------ --- --- ------ ----- - ----- -12345[].dm2 user2 qprod xx 13516 1 16 -- 00:50 B 00:02 +12345[].dm2 user2 qprod xx 13516 1 16 -- 00:50 B 00:02 ``` The status B means that some subjobs are already running. Check status of the first 100 subjobs by the qstat command. -``` +``` $ qstat -a 12345[1-100].dm2 dm2: - Req'd Req'd Elap -Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time + Req'd Req'd Elap +Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time --------------- -------- -------- ---------- ------ --- --- ------ ----- - ----- -12345[1].dm2 user2 qprod xx 13516 1 16 -- 00:50 R 00:02 -12345[2].dm2 user2 qprod xx 13516 1 16 -- 00:50 R 00:02 -12345[3].dm2 user2 qprod xx 13516 1 16 -- 00:50 R 00:01 -12345[4].dm2 user2 qprod xx 13516 1 16 -- 00:50 Q -- - . . . . . . . . . . . - , . . . . . . . . . . -12345[100].dm2 user2 qprod xx 13516 1 16 -- 00:50 Q -- +12345[1].dm2 user2 qprod xx 13516 1 16 -- 00:50 R 00:02 +12345[2].dm2 user2 qprod xx 13516 1 16 -- 00:50 R 00:02 +12345[3].dm2 user2 qprod xx 13516 1 16 -- 00:50 R 00:01 +12345[4].dm2 user2 qprod xx 13516 1 16 -- 00:50 Q -- + . . . . . . . . . . . + , . . . . . . . . . . +12345[100].dm2user2 qprod xx 13516 1 16 -- 00:50 Q -- ``` Delete the entire job array. Running subjobs will be killed, queueing subjobs will be deleted. -``` +``` $ qdel 12345[].dm2 ``` @@ -194,20 +194,20 @@ Deleting large job arrays may take a while. Display status information for all user's jobs, job arrays, and subjobs. -``` +``` $ qstat -u $USER -t ``` Display status information for all user's subjobs. -``` +``` $ qstat -u $USER -tJ ``` Read more on job arrays in the [PBSPro Users guide](../../pbspro-documentation.html). -[]()GNU parallel +GNU parallel ---------------- Use GNU parallel to run many single core tasks on one node. @@ -219,7 +219,7 @@ useful in running single core jobs via the queue system on Anselm. For more information and examples see the parallel man page: -``` +``` $ module add parallel $ man parallel ``` @@ -230,7 +230,7 @@ The GNU parallel shell executes multiple instances of the jobscript using all cores on the node. The instances execute different work, controlled by the $PARALLEL_SEQ variable. -[]()Example: +Example: Assume we have 101 input files with name beginning with "file" (e. g. file001, ..., file101). Assume we would like to use each of these input @@ -240,13 +240,13 @@ job. We call these single core jobs tasks. First, we create a tasklist file, listing all tasks - all input files in our example: -``` +``` $ find . -name 'file*' > tasklist ``` Then we create jobscript: -``` +``` #!/bin/bash #PBS -A PROJECT_ID #PBS -q qprod @@ -288,7 +288,7 @@ To submit the job, use the qsub command. The 101 tasks' job of the [example above](capacity-computing.html#gp_example) may be submitted like this: -``` +``` $ qsub -N JOBNAME jobscript 12345.dm2 ``` @@ -300,7 +300,7 @@ complete in less than 2 hours. Please note the #PBS directives in the beginning of the jobscript file, dont' forget to set your valid PROJECT_ID and desired queue. 
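Only the #PBS header of this jobscript is reproduced above, so the per-task commands are not visible here. Purely as an illustrative sketch (not the documented script), the body could drive the tasklist through GNU parallel, reusing the myprog.x executable and the /lscratch layout from the job-array example; the exact invocation below is an assumption:

```
#!/bin/bash
#PBS -A PROJECT_ID
#PBS -q qprod
#PBS -l select=1:ncpus=16,walltime=02:00:00

# load GNU parallel
module add parallel

# run 16 tasks at a time (one per core); {} expands to one line of the
# tasklist, $PARALLEL_SEQ numbers the task and keeps scratch dirs apart
cd $PBS_O_WORKDIR
cat tasklist | parallel -j 16 '
  SCR=/lscratch/$PBS_JOBID/$PARALLEL_SEQ ; mkdir -p $SCR ; cd $SCR
  cp $PBS_O_WORKDIR/{} input ; cp $PBS_O_WORKDIR/myprog.x .
  ./myprog.x < input > output
  cp output $PBS_O_WORKDIR/{}.out
'
```

In this sketch each line of the tasklist becomes one single core task, in line with the shared jobscript description above.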
-[]()Job arrays and GNU parallel +Job arrays and GNU parallel ------------------------------- Combine the Job arrays and GNU parallel for best throughput of single @@ -322,7 +322,7 @@ GNU parallel shell executes multiple instances of the jobscript using all cores on the node. The instances execute different work, controlled by the $PBS_JOB_ARRAY and $PARALLEL_SEQ variables. -[]()Example: +Example: Assume we have 992 input files with name beginning with "file" (e. g. file001, ..., file992). Assume we would like to use each of these input @@ -332,20 +332,20 @@ job. We call these single core jobs tasks. First, we create a tasklist file, listing all tasks - all input files in our example: -``` +``` $ find . -name 'file*' > tasklist ``` Next we create a file, controlling how many tasks will be executed in one subjob -``` +``` $ seq 32 > numtasks ``` Then we create jobscript: -``` +``` #!/bin/bash #PBS -A PROJECT_ID #PBS -q qprod @@ -385,13 +385,13 @@ Select subjob walltime and number of tasks per subjob carefully  When deciding this values, think about following guiding rules : -1. Let n=N/16. Inequality (n+1) * T < W should hold. The N is - number of tasks per subjob, T is expected single task walltime and W - is subjob walltime. Short subjob walltime improves scheduling and - job throughput. -2. Number of tasks should be modulo 16. -3. These rules are valid only when all tasks have similar task - walltimes T. +1.Let n=N/16. Inequality (n+1) * T < W should hold. The N is + number of tasks per subjob, T is expected single task walltime and W + is subjob walltime. Short subjob walltime improves scheduling and + job throughput. +2.Number of tasks should be modulo 16. +3.These rules are valid only when all tasks have similar task + walltimes T. ### Submit the job array @@ -400,7 +400,7 @@ the [example above](capacity-computing.html#combined_example) may be submitted like this: -``` +``` $ qsub -N JOBNAME -J 1-992:32 jobscript 12345[].dm2 ``` @@ -426,7 +426,7 @@ production jobs. Unzip the archive in an empty directory on Anselm and follow the instructions in the README file -``` +``` $ unzip capacity.zip $ cat README ``` diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/introduction.md b/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/introduction.md index bde0cbf9be729d4cfe36c733753f35cefe93c855..06d093ce6f266d317868007e2c45e51e92cfb5a9 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/introduction.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/introduction.md @@ -3,7 +3,7 @@ Resource Allocation and Job Execution - + To run a [job](../introduction.html), [computational resources](../introduction.html) for this particular job @@ -27,11 +27,11 @@ queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. 
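In practice, an allocation request names one of the queues below together with a project ID, node count and wall time. A minimal sketch (illustrative values only; the full syntax is described in the Job submission and execution section):

```
$ qsub -A PROJECT_ID -q qprod -l select=2:ncpus=16,walltime=04:00:00 ./myjob
```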
Following queues are available to Anselm users: -- **qexp**, the Express queue -- **qprod**, the Production queue**** -- **qlong**, the Long queue, regula -- **qnvidia, qmic, qfat**, the Dedicated queues -- **qfree,** the Free resource utilization queue +- **qexp**, the Express queue +- **qprod**, the Production queue**** +- **qlong**, the Long queue, regula +- **qnvidia, qmic, qfat**, the Dedicated queues +- **qfree,** the Free resource utilization queue Check the queue status at <https://extranet.it4i.cz/anselm/> @@ -60,7 +60,7 @@ Capacity computing Use Job arrays when running huge number of jobs. Use GNU Parallel and/or Job arrays when running (many) single core jobs. -In many cases, it is useful to submit huge (<span>100+</span>) number of +In many cases, it is useful to submit huge (>100+) number of computational jobs into the PBS queue system. Huge number of (small) jobs is one of the most effective ways to execute embarrassingly parallel calculations, achieving best runtime, throughput and computer diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/job-priority.md b/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/job-priority.md index 5750dbfc67906bdd97579d087e98f18d9c1b4183..e5a5d60ef558b5c3c85aa639cf01c60d74a93021 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/job-priority.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/job-priority.md @@ -10,9 +10,9 @@ execution priority to select which job(s) to run. Job execution priority on Anselm is determined by these job properties (in order of importance): -1. queue priority -2. fairshare priority -3. eligible time +1.queue priority +2.fairshare priority +3.eligible time ### Queue priority @@ -48,27 +48,27 @@ Usage counts allocated corehours (ncpus*walltime). Usage is decayed, or cut in half periodically, at the interval 168 hours (one week). Jobs queued in queue qexp are not calculated to project's usage. -<span>Calculated usage and fairshare priority can be seen at -<https://extranet.it4i.cz/anselm/projects>.</span> +>Calculated usage and fairshare priority can be seen at +<https://extranet.it4i.cz/anselm/projects>. -<span> -<span>Calculated fairshare priority can be also seen as -Resource_List.fairshare attribute of a job.</span> -</span> +> +>Calculated fairshare priority can be also seen as +Resource_List.fairshare attribute of a job. -### <span>Eligible time</span> + +### >Eligible time Eligible time is amount (in seconds) of eligible time job accrued while waiting to run. Jobs with higher eligible time gains higher -pri<span><span></span></span>ority. +pri>>ority. Eligible time has the least impact on execution priority. Eligible time is used for sorting jobs with equal queue priority and fairshare -priority. It is very, very difficult for <span>eligible time</span> to +priority. It is very, very difficult for >eligible time to compete with fairshare priority. -<span><span>Eligible time can be seen as eligible_time attribute of -job.</span></span> +>>Eligible time can be seen as eligible_time attribute of +job. ### Formula @@ -78,7 +78,7 @@ Job execution priority (job sort formula) is calculated as: ### Job backfilling -<span>Anselm cluster uses job backfilling.</span> +>Anselm cluster uses job backfilling. 
Backfilling means fitting smaller jobs around the higher-priority jobs that the scheduler is going to run next, in such a way that the diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/job-submission-and-execution.md b/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/job-submission-and-execution.md index 9954980e40e1e8eb13eb2e0d25287a3bf64eb072..9bb39151d678be66ad23bf47b835ac79ca393762 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/job-submission-and-execution.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/job-submission-and-execution.md @@ -3,27 +3,27 @@ Job submission and execution - + Job Submission -------------- When allocating computational resources for the job, please specify -1. suitable queue for your job (default is qprod) -2. number of computational nodes required -3. number of cores per node required -4. maximum wall time allocated to your calculation, note that jobs - exceeding maximum wall time will be killed -5. Project ID -6. Jobscript or interactive switch +1.suitable queue for your job (default is qprod) +2.number of computational nodes required +3.number of cores per node required +4.maximum wall time allocated to your calculation, note that jobs + exceeding maximum wall time will be killed +5.Project ID +6.Jobscript or interactive switch Use the **qsub** command to submit your job to a queue for allocation of the computational resources. Submit the job using the qsub command: -``` +``` $ qsub -A Project_ID -q queue -l select=x:ncpus=y,walltime=[[hh:]mm:]ss[.ms] jobscript ``` @@ -36,7 +36,7 @@ on first of the allocated nodes.** ### Job Submission Examples -``` +``` $ qsub -A OPEN-0-0 -q qprod -l select=64:ncpus=16,walltime=03:00:00 ./myjob ``` @@ -47,7 +47,7 @@ myjob will be executed on the first node in the allocation.  -``` +``` $ qsub -q qexp -l select=4:ncpus=16 -I ``` @@ -57,7 +57,7 @@ available interactively  -``` +``` $ qsub -A OPEN-0-0 -q qnvidia -l select=10:ncpus=16 ./myjob ``` @@ -67,7 +67,7 @@ Jobscript myjob will be executed on the first node in the allocation.  -``` +``` $ qsub -A OPEN-0-0 -q qfree -l select=10:ncpus=16 ./myjob ``` @@ -83,7 +83,7 @@ All qsub options may be [saved directly into the jobscript](job-submission-and-execution.html#PBSsaved). In such a case, no options to qsub are needed. -``` +``` $ qsub ./myjob ``` @@ -92,7 +92,7 @@ $ qsub ./myjob By default, the PBS batch system sends an e-mail only when the job is aborted. Disabling mail events completely can be done like this: -``` +``` $ qsub -m n ``` @@ -103,7 +103,7 @@ Advanced job placement Specific nodes may be allocated via the PBS -``` +``` qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=16:host=cn171+1:ncpus=16:host=cn172 -I ``` @@ -117,17 +117,17 @@ interactively. Nodes equipped with Intel Xeon E5-2665 CPU have base clock frequency 2.4GHz, nodes equipped with Intel Xeon E5-2470 CPU have base frequency 2.3 GHz (see section Compute Nodes for details). Nodes may be selected -via the PBS resource attribute <span -class="highlightedSearchTerm">cpu_freq</span> . +via the PBS resource attribute +class="highlightedSearchTerm">cpu_freq . - CPU Type base freq. Nodes cpu_freq attribute - -------------------- ------------ ---------------------------- --------------------- - Intel Xeon E5-2665 2.4GHz cn[1-180], cn[208-209] 24 - Intel Xeon E5-2470 2.3GHz cn[181-207] 23 +CPU Type base freq. 
Nodes cpu_freq attribute +-------------------- ------------ ---------------------------- --------------------- +Intel Xeon E5-2665 2.4GHz cn[1-180], cn[208-209] 24 +Intel Xeon E5-2470 2.3GHz cn[181-207] 23  -``` +``` $ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16:cpu_freq=24 -I ``` @@ -143,8 +143,8 @@ with Intel Xeon E5-2665 CPU. Groups of computational nodes are connected to chassis integrated Infiniband switches. These switches form the leaf switch layer of the -[Infiniband network](../network.html) <span -class="internal-link">fat</span> tree topology. Nodes sharing the leaf +[Infiniband network](../network.html) +fat tree topology. Nodes sharing the leaf switch can communicate most efficiently. Sharing the same switch prevents hops in the network and provides for unbiased, most efficient network communication. @@ -160,7 +160,7 @@ efficiently: - qsub -A OPEN-0-0 -q qprod -l select=18:ncpus=16:ibswitch=isw11 ./myjob + qsub -A OPEN-0-0 -q qprod -l select=18:ncpus=16:ibswitch=isw11 ./myjob In this example, we request all the 18 nodes sharing the isw11 switch @@ -179,7 +179,7 @@ keeping the default. If necessary (such as in case of benchmarking) you can disable the Turbo for all nodes of the job by using the PBS resource attribute cpu_turbo_boost - $ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16 -l cpu_turbo_boost=0 -I + $ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16 -l cpu_turbo_boost=0 -I More about the Intel Turbo Boost in the TurboBoost section @@ -190,10 +190,10 @@ very special and demanding MPI program. We request Turbo off, 2 full chassis of compute nodes (nodes sharing the same IB switches) for 30 minutes: - $ qsub -A OPEN-0-0 -q qprod - -l select=18:ncpus=16:ibswitch=isw10:mpiprocs=1:ompthreads=16+18:ncpus=16:ibswitch=isw20:mpiprocs=16:ompthreads=1 - -l cpu_turbo_boost=0,walltime=00:30:00 - -N Benchmark ./mybenchmark + $ qsub -A OPEN-0-0 -q qprod + -l select=18:ncpus=16:ibswitch=isw10:mpiprocs=1:ompthreads=16+18:ncpus=16:ibswitch=isw20:mpiprocs=16:ompthreads=1 + -l cpu_turbo_boost=0,walltime=00:30:00 + -N Benchmark ./mybenchmark The MPI processes will be distributed differently on the nodes connected to the two switches. On the isw10 nodes, we will run 1 MPI process per @@ -209,25 +209,25 @@ Job Management Check status of your jobs using the **qstat** and **check-pbs-jobs** commands -``` +``` $ qstat -a $ qstat -a -u username $ qstat -an -u username $ qstat -f 12345.srv11 ``` -[]()Example: +Example: -``` +``` $ qstat -a srv11: - Req'd Req'd Elap -Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time + Req'd Req'd Elap +Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time --------------- -------- -------- ---------- ------ --- --- ------ ----- - ----- -16287.srv11 user1 qlong job1 6183 4 64 -- 144:0 R 38:25 -16468.srv11 user1 qlong job2 8060 4 64 -- 144:0 R 17:44 -16547.srv11 user2 qprod job3x 13516 2 32 -- 48:00 R 00:58 +16287.srv11 user1 qlong job1 6183 4 64 -- 144:0 R 38:25 +16468.srv11 user1 qlong job2 8060 4 64 -- 144:0 R 17:44 +16547.srv11 user2 qprod job3x 13516 2 32 -- 48:00 R 00:58 ``` In this example user1 and user2 are running jobs named job1, job2 and @@ -243,7 +243,7 @@ of user's PBS jobs' processes on execution hosts. Display load, processes. Display job standard and error output. Continuously display (tail -f) job standard or error output. 
-``` +``` $ check-pbs-jobs --check-all $ check-pbs-jobs --print-load --print-processes $ check-pbs-jobs --print-job-out --print-job-err @@ -255,7 +255,7 @@ $ check-pbs-jobs --jobid JOBID --tailf-job-out Examples: -``` +``` $ check-pbs-jobs --check-all JOB 35141.dm2, session_id 71995, user user2, nodes cn164,cn165 Check session id: OK @@ -267,16 +267,16 @@ cn165: No process In this example we see that job 35141.dm2 currently runs no process on allocated node cn165, which may indicate an execution error. -``` +``` $ check-pbs-jobs --print-load --print-processes JOB 35141.dm2, session_id 71995, user user2, nodes cn164,cn165 Print load cn164: LOAD: 16.01, 16.01, 16.00 -cn165: LOAD: 0.01, 0.00, 0.01 +cn165: LOAD:0.01, 0.00, 0.01 Print processes - %CPU CMD -cn164: 0.0 -bash -cn164: 0.0 /bin/bash /var/spool/PBS/mom_priv/jobs/35141.dm2.SC + %CPU CMD +cn164:0.0 -bash +cn164:0.0 /bin/bash /var/spool/PBS/mom_priv/jobs/35141.dm2.SC cn164: 99.7 run-task ... ``` @@ -285,11 +285,11 @@ In this example we see that job 35141.dm2 currently runs process run-task on node cn164, using one thread only, while node cn165 is empty, which may indicate an execution error. -``` +``` $ check-pbs-jobs --jobid 35141.dm2 --print-job-out JOB 35141.dm2, session_id 71995, user user2, nodes cn164,cn165 Print job standard output: -======================== Job start ========================== +======================== Job start========================== Started at   : Fri Aug 30 02:47:53 CEST 2013 Script name  : script Run loop 1 @@ -301,23 +301,23 @@ In this example, we see actual output (some iteration loops) of the job 35141.dm2 Manage your queued or running jobs, using the **qhold**, **qrls**, -**qdel,** **qsig** or **qalter** commands +qdel,** **qsig** or **qalter** commands You may release your allocation at any time, using qdel command -``` +``` $ qdel 12345.srv11 ``` You may kill a running job by force, using qsig command -``` +``` $ qsig -s 9 12345.srv11 ``` Learn more by reading the pbs man page -``` +``` $ man pbs_professional ``` @@ -337,16 +337,16 @@ manager. The jobscript or interactive shell is executed on first of the allocated nodes. -``` +``` $ qsub -q qexp -l select=4:ncpus=16 -N Name0 ./myjob $ qstat -n -u username srv11: - Req'd Req'd Elap -Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time + Req'd Req'd Elap +Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time --------------- -------- -------- ---------- ------ --- --- ------ ----- - ----- -15209.srv11 username qexp Name0 5530 4 64 -- 01:00 R 00:00 - cn17/0*16+cn108/0*16+cn109/0*16+cn110/0*16 +15209.srv11 username qexp Name0 5530 4 64 -- 01:00 R 00:00 + cn17/0*16+cn108/0*16+cn109/0*16+cn110/0*16 ```  In this example, the nodes cn17, cn108, cn109 and cn110 were allocated @@ -357,7 +357,7 @@ use as well. The jobscript or interactive shell is by default executed in home directory -``` +``` $ qsub -q qexp -l select=4:ncpus=16 -I qsub: waiting for job 15210.srv11 to start qsub: job 15210.srv11 ready @@ -379,7 +379,7 @@ Calculations on allocated nodes may be executed remotely via the MPI, ssh, pdsh or clush. You may find out which nodes belong to the allocation by reading the $PBS_NODEFILE file -``` +``` qsub -q qexp -l select=4:ncpus=16 -I qsub: waiting for job 15210.srv11 to start qsub: job 15210.srv11 ready @@ -413,7 +413,7 @@ The recommended way to run production jobs is to change to /scratch directory early in the jobscript, copy all inputs to /scratch, execute the calculations and copy outputs to home directory. 
-``` +``` #!/bin/bash # change to scratch directory, exit on failure @@ -456,14 +456,14 @@ subsequent calculation. In such a case, it is users responsibility to preload the input files on shared /scratch before the job submission and retrieve the outputs manually, after all calculations are finished. -[]()Store the qsub options within the jobscript. +Store the qsub options within the jobscript. Use **mpiprocs** and **ompthreads** qsub options to control the MPI job execution. Example jobscript for an MPI job with preloaded inputs and executables, options for qsub are stored within the script : -``` +``` #!/bin/bash #PBS -q qprod #PBS -N MYJOB @@ -488,7 +488,7 @@ exit In this example, input and executable files are assumed preloaded manually in /scratch/$USER/myjob directory. Note the **mpiprocs** and -**ompthreads** qsub options, controlling behavior of the MPI execution. +ompthreads** qsub options, controlling behavior of the MPI execution. The mympiprog.x is executed as one process per node, on all 100 allocated nodes. If mympiprog.x implements OpenMP threads, it will run 16 threads per node. @@ -498,7 +498,7 @@ OpenMPI](../software/mpi-1/Running_OpenMPI.html) and [Running MPICH2](../software/mpi-1/running-mpich2.html) sections. -### Example Jobscript for Single Node Calculation[]() +### Example Jobscript for Single Node Calculation Local scratch directory is often useful for single node jobs. Local scratch will be deleted immediately after the job ends. @@ -506,7 +506,7 @@ scratch will be deleted immediately after the job ends. Example jobscript for single node calculation, using [local scratch](../storage.html) on the node: -``` +``` #!/bin/bash # change to local scratch directory diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/resources-allocation-policy.md b/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/resources-allocation-policy.md index 1e2bfd00355f1021de4cd7c191704c5ec94b34ab..23c8050fa39ffce846574d24cfcbcf7fcec28807 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/resources-allocation-policy.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/resources-allocation-policy.md @@ -3,7 +3,7 @@ Resources Allocation Policy - + Resources Allocation Policy --------------------------- @@ -51,7 +51,7 @@ Express queue</td> <td align="left">2 reserved, 31 total<br /> including MIC, GPU and FAT nodes</td> <td align="left">1</td> -<td align="left"><span><em>150</em></span></td> +<td align="left">><em>150</em></td> <td align="left">no</td> <td align="left">1h</td> </tr> @@ -62,7 +62,7 @@ Production queue</td> <br /> </td> <td align="left">> 0</td> -<td align="left"><p><span><em>178 nodes w/o accelerator</em></span><br /> +<td align="left"><p>><em>178 nodes w/o accelerator</em><br /> <br /> </p></td> <td align="left">16</td> @@ -90,7 +90,7 @@ Dedicated queues</p></td> 4 total qmic<br /> 2 total qfat</td> <td align="left">16</td> -<td align="left"><span><em>200</em></span></td> +<td align="left">><em>200</em></td> <td align="left">yes</td> <td align="left">24/48h</td> </tr> @@ -108,7 +108,7 @@ Free resource queue</td> </tbody> </table> -**The qfree queue is not free of charge**. [Normal +The qfree queue is not free of charge**. [Normal accounting](resources-allocation-policy.html#resources-accounting-policy) applies. 
However, it allows for utilization of free resources, once a Project exhausted all its allocated computational resources. This does @@ -116,65 +116,65 @@ not apply for Directors Discreation's projects (DD projects) by default. Usage of qfree after exhaustion of DD projects computational resources is allowed after request for this queue. -**The qexp queue is equipped with the nodes not having the very same CPU +The qexp queue is equipped with the nodes not having the very same CPU clock speed.** Should you need the very same CPU speed, you have to select the proper nodes during the PSB job submission. -**** - -- **qexp**, the Express queue: This queue is dedicated for testing and - running very small jobs. It is not required to specify a project to - enter the qexp. <span>*<span>There are 2 nodes always reserved for - this queue (w/o accelerator), maximum 8 nodes are available via the - qexp for a particular user, from a pool of nodes containing - **Nvidia** accelerated nodes (cn181-203), **MIC** accelerated - nodes (cn204-207) and **Fat** nodes with 512GB RAM (cn208-209). This - enables to test and tune also accelerated code or code with higher - RAM requirements</span>.*</span> The nodes may be allocated on per - core basis. No special authorization is required to use it. The - maximum runtime in qexp is 1 hour. -- **qprod**, the Production queue****: This queue is intended for - normal production runs. It is required that active project with - nonzero remaining resources is specified to enter the qprod. All - nodes may be accessed via the qprod queue, except the reserved ones. - <span>*<span>178 nodes without accelerator are - included</span>.*</span> Full nodes, 16 cores per node - are allocated. The queue runs with medium priority and no special - authorization is required to use it. The maximum runtime in qprod is - 48 hours. -- **qlong**, the Long queue****: This queue is intended for long - production runs. It is required that active project with nonzero - remaining resources is specified to enter the qlong. Only 60 nodes - without acceleration may be accessed via the qlong queue. Full - nodes, 16 cores per node are allocated. The queue runs with medium - priority and no special authorization is required to use it.<span> - *The maximum runtime in qlong is 144 hours (three times of the - standard qprod time - 3 * 48 h).*</span> -- **qnvidia, qmic, qfat**, the Dedicated queues****: The queue qnvidia - is dedicated to access the Nvidia accelerated nodes, the qmic to - access MIC nodes and qfat the Fat nodes. It is required that active - project with nonzero remaining resources is specified to enter - these queues. 23 nvidia, 4 mic and 2 fat nodes are included. Full - nodes, 16 cores per node are allocated. The queues run with<span> - *very high priority*</span>, the jobs will be scheduled before the - jobs coming from the<span> *qexp* </span>queue. An PI<span> *needs - explicitly* </span>ask - [support](https://support.it4i.cz/rt/) for - authorization to enter the dedicated queues for all users associated - to her/his Project. -- **qfree**, The Free resource queue****: The queue qfree is intended - for utilization of free resources, after a Project exhausted all its - allocated computational resources (Does not apply to DD projects - by default. DD projects have to request for persmission on qfree - after exhaustion of computational resources.). It is required that - active project is specified to enter the queue, however no remaining - resources are required. 
Consumed resources will be accounted to - the Project. Only 178 nodes without accelerator may be accessed from - this queue. Full nodes, 16 cores per node are allocated. The queue - runs with very low priority and no special authorization is required - to use it. The maximum runtime in qfree is 12 hours. +** + +- **qexp**, the Express queue: This queue is dedicated for testing and + running very small jobs. It is not required to specify a project to + enter the qexp. >*>There are 2 nodes always reserved for + this queue (w/o accelerator), maximum 8 nodes are available via the + qexp for a particular user, from a pool of nodes containing + **Nvidia** accelerated nodes (cn181-203), **MIC** accelerated + nodes (cn204-207) and **Fat** nodes with 512GB RAM (cn208-209). This + enables to test and tune also accelerated code or code with higher + RAM requirements.* The nodes may be allocated on per + core basis. No special authorization is required to use it. The + maximum runtime in qexp is 1 hour. +- **qprod**, the Production queue****: This queue is intended for + normal production runs. It is required that active project with + nonzero remaining resources is specified to enter the qprod. All + nodes may be accessed via the qprod queue, except the reserved ones. + >*>178 nodes without accelerator are + included.* Full nodes, 16 cores per node + are allocated. The queue runs with medium priority and no special + authorization is required to use it. The maximum runtime in qprod is + 48 hours. +- **qlong**, the Long queue****: This queue is intended for long + production runs. It is required that active project with nonzero + remaining resources is specified to enter the qlong. Only 60 nodes + without acceleration may be accessed via the qlong queue. Full + nodes, 16 cores per node are allocated. The queue runs with medium + priority and no special authorization is required to use it.> + *The maximum runtime in qlong is 144 hours (three times of the + standard qprod time - 3 * 48 h).* +- **qnvidia, qmic, qfat**, the Dedicated queues****: The queue qnvidia + is dedicated to access the Nvidia accelerated nodes, the qmic to + access MIC nodes and qfat the Fat nodes. It is required that active + project with nonzero remaining resources is specified to enter + these queues. 23 nvidia, 4 mic and 2 fat nodes are included. Full + nodes, 16 cores per node are allocated. The queues run with> + *very high priority*, the jobs will be scheduled before the + jobs coming from the> *qexp* queue. An PI> *needs + explicitly* ask + [support](https://support.it4i.cz/rt/) for + authorization to enter the dedicated queues for all users associated + to her/his Project. +- **qfree**, The Free resource queue****: The queue qfree is intended + for utilization of free resources, after a Project exhausted all its + allocated computational resources (Does not apply to DD projects + by default. DD projects have to request for persmission on qfree + after exhaustion of computational resources.). It is required that + active project is specified to enter the queue, however no remaining + resources are required. Consumed resources will be accounted to + the Project. Only 178 nodes without accelerator may be accessed from + this queue. Full nodes, 16 cores per node are allocated. The queue + runs with very low priority and no special authorization is required + to use it. The maximum runtime in qfree is 12 hours. ### Notes ** -** + The job wall clock time defaults to **half the maximum time**, see table above. 
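A different wall time, up to the queue limit, may be requested explicitly at submission, using the same qsub syntax as in the job submission examples (values below are illustrative):

```
$ qsub -A PROJECT_ID -q qprod -l select=4:ncpus=16,walltime=24:00:00 ./myjob
```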
Longer wall time limits can be [set manually, see @@ -197,14 +197,14 @@ Check the status of jobs, queues and compute nodes at Display the queue status on Anselm: -``` +``` $ qstat -q ``` The PBS allocation overview may be obtained also using the rspbs command. -``` +``` $ rspbs Usage: rspbs [options] @@ -260,7 +260,7 @@ Options:  --incl-finished      Include finished jobs ``` -[]()Resources Accounting Policy +Resources Accounting Policy ------------------------------- ### The Core-Hour @@ -285,13 +285,13 @@ User may check at any time, how many core-hours have been consumed by himself/herself and his/her projects. The command is available on clusters' login nodes. -``` +``` $ it4ifree Password: -    PID  Total Used ...by me Free +    PID  Total Used ...by me Free   -------- ------- ------ -------- -------   OPEN-0-0 1500000 400644  225265 1099356 -  DD-13-1   10000 2606 2606 7394 +  DD-13-1   10000 2606 2606 7394 ```  diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys.md index b8d91d70ebb3f2b44c858215c70ac5e30ad3bea5..c34ea3094c159950bec97fc5c0b6a136f192e483 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys.md @@ -1,7 +1,7 @@ Overview of ANSYS Products ========================== -**[SVS FEM](http://www.svsfem.cz/)** as ***[ANSYS +[SVS FEM](http://www.svsfem.cz/)** as ***[ANSYS Channel partner](http://www.ansys.com/)*** for Czech Republic provided all ANSYS licenses for ANSELM cluster and supports of all ANSYS Products (Multiphysics, Mechanical, MAPDL, CFX, Fluent, @@ -13,15 +13,15 @@ Anselm provides as commercial as academic variants. Academic variants are distinguished by "**Academic...**" word in the name of  license or by two letter preposition "**aa_**" in the license feature name. Change of license is realized on command line respectively directly in user's -pbs file (see individual products). [<span id="result_box" -class="short_text"><span class="hps">More</span> <span class="hps">about -licensing</span> <span -class="hps">here</span></span>](ansys/licensing.html) +pbs file (see individual products). [ id="result_box" +class="short_text"> class="hps">More class="hps">about +licensing +class="hps">here](ansys/licensing.html) To load the latest version of any ANSYS product (Mechanical, Fluent, CFX, MAPDL,...) load the module: - $ module load ansys + $ module load ansys ANSYS supports interactive regime, but due to assumed solution of extremely difficult tasks it is not recommended. diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys/ansys-cfx.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys/ansys-cfx.md index e12ff157564f5d83c1218134cba586923268ff63..515285d8ed20b3b74b75c3b1e1a570e231573efd 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys/ansys-cfx.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys/ansys-cfx.md @@ -15,10 +15,10 @@ environment, with extensive capabilities for customization and automation using session files, scripting and a powerful expression language. -<span>To run ANSYS CFX in batch mode you can utilize/modify the default -cfx.pbs script and execute it via the qsub command.</span> +>To run ANSYS CFX in batch mode you can utilize/modify the default +cfx.pbs script and execute it via the qsub command. 
-``` +``` #!/bin/bash #PBS -l nodes=2:ppn=16 #PBS -q qprod @@ -71,17 +71,17 @@ assumes such structure of allocated resources. Working directory has to be created before sending pbs job into the queue. Input file should be in working directory or full path to input -file has to be specified. <span>Input file has to be defined by common +file has to be specified. >Input file has to be defined by common CFX def file which is attached to the cfx solver via parameter --def</span> +-def -**License** should be selected by parameter -P (Big letter **P**). +License** should be selected by parameter -P (Big letter **P**). Licensed products are the following: aa_r (ANSYS **Academic** Research), ane3fl (ANSYS Multiphysics)-**Commercial.** -[<span id="result_box" class="short_text"><span class="hps">More</span> -<span class="hps">about licensing</span> <span -class="hps">here</span></span>](licensing.html) +[ id="result_box" class="short_text"> class="hps">More + class="hps">about licensing +class="hps">here](licensing.html)  diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys/ansys-fluent.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys/ansys-fluent.md index e43a8033c561f08c767efbfee7a909de077b08a5..1db2e6eb0be2a514fa91a85848c6b7b44d2b5581 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys/ansys-fluent.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys/ansys-fluent.md @@ -12,13 +12,13 @@ treatment plants. Special models that give the software the ability to model in-cylinder combustion, aeroacoustics, turbomachinery, and multiphase systems have served to broaden its reach. -<span>1. Common way to run Fluent over pbs file</span> +>1. Common way to run Fluent over pbs file ------------------------------------------------------ -<span>To run ANSYS Fluent in batch mode you can utilize/modify the -default fluent.pbs script and execute it via the qsub command.</span> +>To run ANSYS Fluent in batch mode you can utilize/modify the +default fluent.pbs script and execute it via the qsub command. -``` +``` #!/bin/bash #PBS -S /bin/bash #PBS -l nodes=2:ppn=16 @@ -68,50 +68,50 @@ Journal file with definition of the input geometry and boundary conditions and defined process of solution has e.g. the following structure: - /file/read-case aircraft_2m.cas.gz - /solve/init - init - /solve/iterate - 10 - /file/write-case-dat aircraft_2m-solution - /exit yes + /file/read-case aircraft_2m.cas.gz + /solve/init + init + /solve/iterate + 10 + /file/write-case-dat aircraft_2m-solution + /exit yes -<span>The appropriate dimension of the problem has to be set by -parameter (2d/3d). </span> +>The appropriate dimension of the problem has to be set by +parameter (2d/3d). -<span>2. Fast way to run Fluent from command line</span> +>2. Fast way to run Fluent from command line -------------------------------------------------------- -``` +``` fluent solver_version [FLUENT_options] -i journal_file -pbs ``` This syntax will start the ANSYS FLUENT job under PBS Professional using -the <span class="monospace">qsub</span> command in a batch manner. When +the qsub command in a batch manner. When resources are available, PBS Professional will start the job and return -a job ID, usually in the form of <span -class="emphasis">*job_ID.hostname*</span>. This job ID can then be used +a job ID, usually in the form of +class="emphasis">*job_ID.hostname*. 
This job ID can then be used to query, control, or stop the job using standard PBS Professional -commands, such as <span class="monospace">qstat</span> or <span -class="monospace">qdel</span>. The job will be run out of the current -working directory, and all output will be written to the file <span -class="monospace">fluent.o</span><span> </span><span -class="emphasis">*job_ID*</span>.     +commands, such as qstat or +qdel. The job will be run out of the current +working directory, and all output will be written to the file +fluent.o> +class="emphasis">*job_ID*.     3. Running Fluent via user's config file ---------------------------------------- -The sample script uses a configuration file called <span -class="monospace">pbs_fluent.conf</span>  if no command line arguments +The sample script uses a configuration file called +pbs_fluent.conf  if no command line arguments are present. This configuration file should be present in the directory from which the jobs are submitted (which is also the directory in which the jobs are executed). The following is an example of what the content -of <span class="monospace">pbs_fluent.conf</span> can be: +of pbs_fluent.conf can be: -``` +``` input="example_small.flin" case="Small-1.65m.cas" fluent_args="3d -pmyrinet" @@ -123,37 +123,37 @@ The following is an explanation of the parameters: -<span><span class="monospace">input</span> is the name of the input -file.</span> +> input is the name of the input +file. -<span class="monospace">case</span> is the name of the <span -class="monospace">.cas</span> file that the input file will utilize. + case is the name of the +.cas file that the input file will utilize. -<span class="monospace">fluent_args</span> are extra ANSYS FLUENT + fluent_args are extra ANSYS FLUENT arguments. As shown in the previous example, you can specify the -interconnect by using the <span class="monospace">-p</span> interconnect -command. The available interconnects include <span -class="monospace">ethernet</span> (the default), <span -class="monospace">myrinet</span>,<span class="monospace"> -infiniband</span>, <span class="monospace">vendor</span>, <span -class="monospace">altix</span><span>,</span> and <span -class="monospace">crayx</span>. The MPI is selected automatically, based +interconnect by using the -p interconnect +command. The available interconnects include +ethernet (the default), +myrinet, class="monospace"> +infiniband, vendor, +altix>, and +crayx. The MPI is selected automatically, based on the specified interconnect. -<span class="monospace">outfile</span> is the name of the file to which + outfile is the name of the file to which the standard output will be sent. -<span class="monospace">mpp="true"</span> will tell the job script to + mpp="true" will tell the job script to execute the job across multiple processors.         -<span>To run ANSYS Fluent in batch mode with user's config file you can +>To run ANSYS Fluent in batch mode with user's config file you can utilize/modify the following script and execute it via the qsub -command.</span> +command. -``` +``` #!/bin/sh #PBS -l nodes=2:ppn=4 #PBS -1 qprod @@ -164,30 +164,30 @@ command.</span> #We assume that if they didn’t specify arguments then they should use the #config file if [ "xx${input}${case}${mpp}${fluent_args}zz" = "xxzz" ]; then - if [ -f pbs_fluent.conf ]; then - . pbs_fluent.conf - else - printf "No command line arguments specified, " - printf "and no configuration file found. Exiting n" - fi + if [ -f pbs_fluent.conf ]; then + . 
pbs_fluent.conf + else + printf "No command line arguments specified, " + printf "and no configuration file found. Exiting n" + fi fi #Augment the ANSYS FLUENT command line arguments case "$mpp" in - true) - #MPI job execution scenario - num_nodes=â€cat $PBS_NODEFILE | sort -u | wc -l†- cpus=â€expr $num_nodes * $NCPUS†- #Default arguments for mpp jobs, these should be changed to suit your - #needs. - fluent_args="-t$ $fluent_args -cnf=$PBS_NODEFILE" - ;; - *) - #SMP case - #Default arguments for smp jobs, should be adjusted to suit your - #needs. - fluent_args="-t$NCPUS $fluent_args" - ;; + true) + #MPI job execution scenario + num_nodes=â€cat $PBS_NODEFILE | sort -u | wc -l†+ cpus=â€expr $num_nodes * $NCPUS†+ #Default arguments for mpp jobs, these should be changed to suit your + #needs. + fluent_args="-t$ $fluent_args -cnf=$PBS_NODEFILE" + ;; + *) + #SMP case + #Default arguments for smp jobs, should be adjusted to suit your + #needs. + fluent_args="-t$NCPUS $fluent_args" + ;; esac #Default arguments for all jobs fluent_args="-ssh -g -i $input $fluent_args" @@ -199,13 +199,13 @@ command.</span> Fluent arguments: $fluent_args" #run the solver - /ansys_inc/v145/fluent/bin/fluent $fluent_args > $outfile + /ansys_inc/v145/fluent/bin/fluent $fluent_args> $outfile ``` -<span>It runs the jobs out of the directory from which they are -submitted (PBS_O_WORKDIR).</span> +>It runs the jobs out of the directory from which they are +submitted (PBS_O_WORKDIR). 4. Running Fluent in parralel ----------------------------- @@ -215,7 +215,7 @@ do so this ANSYS Academic Research license must be placed before ANSYS CFD license in user preferences. To make this change anslic_admin utility should be run -``` +``` /ansys_inc/shared_les/licensing/lic_admin/anslic_admin ``` diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys/ansys-ls-dyna.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys/ansys-ls-dyna.md index 4f27959d582152e335b164ca112c37c1289a1075..48cba5f35b8c3c80e04cefb9d3f7b4c6e567f150 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys/ansys-ls-dyna.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys/ansys-ls-dyna.md @@ -8,21 +8,21 @@ technology-rich, time-tested explicit solver without the need to contend with the complex input requirements of this sophisticated program. Introduced in 1996, ANSYS LS-DYNA capabilities have helped customers in numerous industries to resolve highly intricate design -issues. <span>ANSYS Mechanical users have been able take advantage of +issues. >ANSYS Mechanical users have been able take advantage of complex explicit solutions for a long time utilizing the traditional -ANSYS Parametric Design Language (APDL) environment. <span>These +ANSYS Parametric Design Language (APDL) environment. >These explicit capabilities are available to ANSYS Workbench users as well. The Workbench platform is a powerful, comprehensive, easy-to-use environment for engineering simulation. CAD import from all sources, geometry cleanup, automatic meshing, solution, parametric optimization, result visualization and comprehensive report generation are all available within a single fully interactive modern graphical user -environment.</span></span> +environment. 
-<span>To run ANSYS LS-DYNA in batch mode you can utilize/modify the -default ansysdyna.pbs script and execute it via the qsub command.</span> +>To run ANSYS LS-DYNA in batch mode you can utilize/modify the +default ansysdyna.pbs script and execute it via the qsub command. -``` +``` #!/bin/bash #PBS -l nodes=2:ppn=16 #PBS -q qprod @@ -68,21 +68,21 @@ echo Machines: $hl /ansys_inc/v145/ansys/bin/ansys145 -dis -lsdynampp i=input.k -machines $hl ``` -<span>Header of the pbs file (above) is common and description can be -find on </span>[this -site](../../resource-allocation-and-job-execution/job-submission-and-execution.html)<span>. +>Header of the pbs file (above) is common and description can be +find on [this +site](../../resource-allocation-and-job-execution/job-submission-and-execution.html)>. [SVS FEM](http://www.svsfem.cz) recommends to utilize sources by keywords: nodes, ppn. These keywords allows to address directly the number of nodes (computers) and cores (ppn) which will be utilized in the job. Also the rest of code assumes such structure of -allocated resources.</span> +allocated resources. Working directory has to be created before sending pbs job into the queue. Input file should be in working directory or full path to input file has to be specified. Input file has to be defined by common LS-DYNA .**k** file which is attached to the ansys solver via parameter i= -<span><span> </span></span> +>> diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys/ansys-mechanical-apdl.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys/ansys-mechanical-apdl.md index 285f909a4c9da3a74b1393c27f7edbc0a2d26866..1c0aab2a5b77a5fbde4338214c73b5b4ccd8cd4a 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys/ansys-mechanical-apdl.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys/ansys-mechanical-apdl.md @@ -1,19 +1,19 @@ ANSYS MAPDL =========== -<span>**[ANSYS +>**[ANSYS Multiphysics](http://www.ansys.com/Products/Simulation+Technology/Structural+Mechanics/ANSYS+Multiphysics)** software offers a comprehensive product solution for both multiphysics and single-physics analysis. The product includes structural, thermal, fluid and both high- and low-frequency electromagnetic analysis. The product also contains solutions for both direct and sequentially coupled physics problems including direct coupled-field elements and the ANSYS -multi-field solver.</span> +multi-field solver. -<span>To run ANSYS MAPDL in batch mode you can utilize/modify the -default mapdl.pbs script and execute it via the qsub command.</span> +>To run ANSYS MAPDL in batch mode you can utilize/modify the +default mapdl.pbs script and execute it via the qsub command. -``` +``` #!/bin/bash #PBS -l nodes=2:ppn=16 #PBS -q qprod @@ -71,14 +71,14 @@ queue. Input file should be in working directory or full path to input file has to be specified. Input file has to be defined by common APDL file which is attached to the ansys solver via parameter -i -**License** should be selected by parameter -p. Licensed products are +License** should be selected by parameter -p. 
Licensed products are the following: aa_r (ANSYS **Academic** Research), ane3fl (ANSYS Multiphysics)-**Commercial**, aa_r_dy (ANSYS **Academic** -AUTODYN)<span> -[<span id="result_box" class="short_text"><span class="hps">More</span> -<span class="hps">about licensing</span> <span -class="hps">here</span></span>](licensing.html) -</span> +AUTODYN)> +[ id="result_box" class="short_text"> class="hps">More + class="hps">about licensing +class="hps">here](licensing.html) + diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys/ls-dyna.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys/ls-dyna.md index 51443a473bb79b90ee2661e542ae38ab92af9337..3591982a5c40125a683b4ecba68b777ed8e40e88 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys/ls-dyna.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/ansys/ls-dyna.md @@ -8,7 +8,7 @@ analysis capability, a wide range of constitutive models to simulate a whole range of engineering materials (steels, composites, foams, concrete, etc.), error-checking features and the high scalability have enabled users worldwide to solve successfully many complex -problems. <span>Additionally LS-DYNA is extensively used to simulate +problems. >Additionally LS-DYNA is extensively used to simulate impacts on structures from drop tests, underwater shock, explosions or high-velocity impacts. Explosive forming, process engineering, accident reconstruction, vehicle dynamics, thermal brake disc analysis or nuclear @@ -16,16 +16,16 @@ safety are further areas in the broad range of possible applications. In leading-edge research LS-DYNA is used to investigate the behaviour of materials like composites, ceramics, concrete, or wood. Moreover, it is used in biomechanics, human modelling, molecular structures, casting, -forging, or virtual testing.</span> +forging, or virtual testing. -<span>Anselm provides **1 commercial license of LS-DYNA without HPC** -support now. </span> +>Anselm provides **1 commercial license of LS-DYNA without HPC** +support now. -<span><span>To run LS-DYNA in batch mode you can utilize/modify the +>>To run LS-DYNA in batch mode you can utilize/modify the default lsdyna.pbs script and execute it via the qsub -command.</span></span> +command. -``` +``` #!/bin/bash #PBS -l nodes=1:ppn=16 #PBS -q qprod @@ -61,7 +61,7 @@ allocated resources. Working directory has to be created before sending pbs job into the queue. Input file should be in working directory or full path to input file has to be specified. Input file has to be defined by common LS-DYNA -**.k** file which is attached to the LS-DYNA solver via parameter i= +.k** file which is attached to the LS-DYNA solver via parameter i= diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/chemistry/molpro.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/chemistry/molpro.md index 878debb79c104b713c3a611596a1fb352fcf1171..7cc4223a88c2ed9e9a5afdb08a43a04aa3334580 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/chemistry/molpro.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/chemistry/molpro.md @@ -15,11 +15,11 @@ License Molpro software package is available only to users that have a valid license. Please contact support to enable access to Molpro if you have a -valid license appropriate for running on our cluster (eg. <span>academic -research group licence, parallel execution).</span> +valid license appropriate for running on our cluster (eg. 
>academic +research group licence, parallel execution). -<span>To run Molpro, you need to have a valid license token present in -"<span class="monospace">$HOME/.molpro/token"</span></span>. You can +>To run Molpro, you need to have a valid license token present in +" $HOME/.molpro/token". You can download the token from [Molpro website](https://www.molpro.net/licensee/?portal=licensee). @@ -31,15 +31,15 @@ parallel version compiled with Intel compilers and Intel MPI. Compilation parameters are default : - Parameter Value - ------------------------------------------------- ----------------------------- - <span>max number of atoms</span> 200 - <span>max number of valence orbitals</span> 300 - <span>max number of basis functions</span> 4095 - <span>max number of states per symmmetry</span> 20 - <span>max number of state symmetries</span> 16 - <span>max number of records</span> 200 - <span>max number of primitives</span> <span>maxbfn x [2]</span> +Parameter Value +------------------------------------------------- ----------------------------- +>max number of atoms 200 +>max number of valence orbitals 300 +>max number of basis functions 4095 +>max number of states per symmmetry 20 +>max number of state symmetries 16 +>max number of records 200 +>max number of primitives >maxbfn x [2]  @@ -57,8 +57,8 @@ for more details. The OpenMP parallelization in Molpro is limited and has been observed to produce limited scaling. We therefore recommend to use MPI -parallelization only. This can be achieved by passing option <span -class="monospace">mpiprocs=16:ompthreads=1</span> to PBS. +parallelization only. This can be achieved by passing option +mpiprocs=16:ompthreads=1 to PBS. You are advised to use the -d option to point to a directory in [SCRATCH filesystem](../../storage.html). Molpro can produce a @@ -67,25 +67,25 @@ these are placed in the fast scratch filesystem. ### Example jobscript - #PBS -A IT4I-0-0 - #PBS -q qprod - #PBS -l select=1:ncpus=16:mpiprocs=16:ompthreads=1 + #PBS -A IT4I-0-0 + #PBS -q qprod + #PBS -l select=1:ncpus=16:mpiprocs=16:ompthreads=1 - cd $PBS_O_WORKDIR + cd $PBS_O_WORKDIR - # load Molpro module - module add molpro + # load Molpro module + module add molpro - # create a directory in the SCRATCH filesystem - mkdir -p /scratch/$USER/$PBS_JOBID + # create a directory in the SCRATCH filesystem + mkdir -p /scratch/$USER/$PBS_JOBID - # copy an example input - cp /apps/chem/molpro/2010.1/molprop_2010_1_Linux_x86_64_i8/examples/caffeine_opt_diis.com . + # copy an example input + cp /apps/chem/molpro/2010.1/molprop_2010_1_Linux_x86_64_i8/examples/caffeine_opt_diis.com . 
- # run Molpro with default options - molpro -d /scratch/$USER/$PBS_JOBID caffeine_opt_diis.com + # run Molpro with default options + molpro -d /scratch/$USER/$PBS_JOBID caffeine_opt_diis.com - # delete scratch directory - rm -rf /scratch/$USER/$PBS_JOBID + # delete scratch directory + rm -rf /scratch/$USER/$PBS_JOBID diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/chemistry/nwchem.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/chemistry/nwchem.md index 588e90ee25961d3e84e24e7f85e5ba8d0dcd5eaa..9ee1499e52d32ba15f616a75d9f090b051d96b38 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/chemistry/nwchem.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/chemistry/nwchem.md @@ -2,14 +2,14 @@ NWChem ====== High-Performance Computational Chemistry -<span>Introduction</span> +>Introduction ------------------------- -<span>NWChem aims to provide its users with computational chemistry +>NWChem aims to provide its users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently, and in their use of available parallel computing resources from high-performance parallel -supercomputers to conventional workstation clusters.</span> +supercomputers to conventional workstation clusters. [Homepage](http://www.nwchem-sw.org/index.php/Main_Page) @@ -18,23 +18,23 @@ Installed versions The following versions are currently installed : -- 6.1.1, not recommended, problems have been observed with this - version +- 6.1.1, not recommended, problems have been observed with this + version -- 6.3-rev2-patch1, current release with QMD patch applied. Compiled - with Intel compilers, MKL and Intel MPI +- 6.3-rev2-patch1, current release with QMD patch applied. Compiled + with Intel compilers, MKL and Intel MPI -- 6.3-rev2-patch1-openmpi, same as above, but compiled with OpenMPI - and NWChem provided BLAS instead of MKL. This version is expected to - be slower +- 6.3-rev2-patch1-openmpi, same as above, but compiled with OpenMPI + and NWChem provided BLAS instead of MKL. This version is expected to + be slower -- 6.3-rev2-patch1-venus, this version contains only libraries for - VENUS interface linking. Does not provide standalone NWChem - executable +- 6.3-rev2-patch1-venus, this version contains only libraries for + VENUS interface linking. Does not provide standalone NWChem + executable For a current list of installed versions, execute : - module avail nwchem + module avail nwchem Running ------- @@ -42,25 +42,25 @@ Running NWChem is compiled for parallel MPI execution. Normal procedure for MPI jobs applies. Sample jobscript : - #PBS -A IT4I-0-0 - #PBS -q qprod - #PBS -l select=1:ncpus=16 + #PBS -A IT4I-0-0 + #PBS -q qprod + #PBS -l select=1:ncpus=16 - module add nwchem/6.3-rev2-patch1 - mpirun -np 16 nwchem h2o.nw + module add nwchem/6.3-rev2-patch1 + mpirun -np 16 nwchem h2o.nw -<span>Options</span> +>Options -------------------- Please refer to [the documentation](http://www.nwchem-sw.org/index.php/Release62:Top-level) and in the input file set the following directives : -- <span>MEMORY : controls the amount of memory NWChem will use</span> -- <span>SCRATCH_DIR : set this to a directory in [SCRATCH - filesystem](../../storage.html#scratch) (or run the - calculation completely in a scratch directory). For certain - calculations, it might be advisable to reduce I/O by forcing - "direct" mode, eg. 
"scf direct"</span> +- >MEMORY : controls the amount of memory NWChem will use +- >SCRATCH_DIR : set this to a directory in [SCRATCH + filesystem](../../storage.html#scratch) (or run the + calculation completely in a scratch directory). For certain + calculations, it might be advisable to reduce I/O by forcing + "direct" mode, eg. "scf direct" diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/compilers.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/compilers.md index 7f2c5a86ee563f4e9de4348b007f0289312dfe02..5a862a79ce8c7d925a7f6b5ec532165ceb01cab1 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/compilers.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/compilers.md @@ -4,16 +4,16 @@ Compilers Available compilers, including GNU, INTEL and UPC compilers - + Currently there are several compilers for different programming languages available on the Anselm cluster: -- C/C++ -- Fortran 77/90/95 -- Unified Parallel C -- Java -- nVidia CUDA +- C/C++ +- Fortran 77/90/95 +- Unified Parallel C +- Java +- nVidia CUDA  @@ -37,20 +37,20 @@ accessible in the search path by default. It is strongly recommended to use the up to date version (4.8.1) which comes with the module gcc: - $ module load gcc - $ gcc -v - $ g++ -v - $ gfortran -v + $ module load gcc + $ gcc -v + $ g++ -v + $ gfortran -v With the module loaded two environment variables are predefined. One for maximum optimizations on the Anselm cluster architecture, and the other for debugging purposes: - $ echo $OPTFLAGS - -O3 -march=corei7-avx + $ echo $OPTFLAGS + -O3 -march=corei7-avx - $ echo $DEBUGFLAGS - -O0 -g + $ echo $DEBUGFLAGS + -O0 -g For more informations about the possibilities of the compilers, please see the man pages. @@ -60,42 +60,42 @@ Unified Parallel C UPC is supported by two compiler/runtime implementations: -- GNU - SMP/multi-threading support only -- Berkley - multi-node support as well as SMP/multi-threading support +- GNU - SMP/multi-threading support only +- Berkley - multi-node support as well as SMP/multi-threading support ### GNU UPC Compiler To use the GNU UPC compiler and run the compiled binaries use the module gupc - $ module add gupc - $ gupc -v - $ g++ -v + $ module add gupc + $ gupc -v + $ g++ -v Simple program to test the compiler - $ cat count.upc + $ cat count.upc - /* hello.upc - a simple UPC example */ - #include <upc.h> - #include <stdio.h> + /* hello.upc - a simple UPC example */ + #include <upc.h> + #include <stdio.h> - int main() { -  if (MYTHREAD == 0) { -    printf("Welcome to GNU UPC!!!n"); -  } -  upc_barrier; -  printf(" - Hello from thread %in", MYTHREAD); -  return 0; - } + int main() { +  if (MYTHREAD == 0) { +    printf("Welcome to GNU UPC!!!n"); +  } +  upc_barrier; +  printf(" - Hello from thread %in", MYTHREAD); +  return 0; + } To compile the example use - $ gupc -o count.upc.x count.upc + $ gupc -o count.upc.x count.upc To run the example with 5 threads issue - $ ./count.upc.x -fupc-threads-5 + $ ./count.upc.x -fupc-threads-5 For more informations see the man pages. @@ -104,8 +104,8 @@ For more informations see the man pages. To use the Berkley UPC compiler and runtime environment to run the binaries use the module bupc - $ module add bupc - $ upcc -version + $ module add bupc + $ upcc -version As default UPC network the "smp" is used. This is very quick and easy way for testing/debugging, but limited to one node only. @@ -113,40 +113,40 @@ way for testing/debugging, but limited to one node only. 
For production runs, it is recommended to use the native Infiband implementation of UPC network "ibv". For testing/debugging using multiple nodes, the "mpi" UPC network is recommended. Please note, that -**the selection of the network is done at the compile time** and not at +the selection of the network is done at the compile time** and not at runtime (as expected)! Example UPC code: - $ cat hello.upc + $ cat hello.upc - /* hello.upc - a simple UPC example */ - #include <upc.h> - #include <stdio.h> + /* hello.upc - a simple UPC example */ + #include <upc.h> + #include <stdio.h> - int main() { -  if (MYTHREAD == 0) { -    printf("Welcome to Berkeley UPC!!!n"); -  } -  upc_barrier; -  printf(" - Hello from thread %in", MYTHREAD); -  return 0; - } + int main() { +  if (MYTHREAD == 0) { +    printf("Welcome to Berkeley UPC!!!n"); +  } +  upc_barrier; +  printf(" - Hello from thread %in", MYTHREAD); +  return 0; + } To compile the example with the "ibv" UPC network use - $ upcc -network=ibv -o hello.upc.x hello.upc + $ upcc -network=ibv -o hello.upc.x hello.upc To run the example with 5 threads issue - $ upcrun -n 5 ./hello.upc.x + $ upcrun -n 5 ./hello.upc.x To run the example on two compute nodes using all 32 cores, with 32 threads, issue - $ qsub -I -q qprod -A PROJECT_ID -l select=2:ncpus=16 - $ module add bupc - $ upcrun -n 32 ./hello.upc.x + $ qsub -I -q qprod -A PROJECT_ID -l select=2:ncpus=16 + $ module add bupc + $ upcrun -n 32 ./hello.upc.x  For more informations see the man pages. diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/comsol/comsol-multiphysics.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/comsol/comsol-multiphysics.md index e6729427d177c76793a73299e3581d5ceab0699a..297236be2542569b2cdc2415872e3895d6d0b3a6 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/comsol/comsol-multiphysics.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/comsol/comsol-multiphysics.md @@ -3,103 +3,103 @@ COMSOL Multiphysics® - -<span><span>Introduction -</span></span> + +>>Introduction + ------------------------- -<span><span>[COMSOL](http://www.comsol.com)</span></span><span><span> +>>[COMSOL](http://www.comsol.com)<span><span> is a powerful environment for modelling and solving various engineering and scientific problems based on partial differential equations. COMSOL is designed to solve coupled or multiphysics phenomena. For many standard engineering problems COMSOL provides add-on products such as electrical, mechanical, fluid flow, and chemical -applications.</span></span> +applications. 
-- <span><span>[Structural Mechanics - Module](http://www.comsol.com/structural-mechanics-module), - </span></span> +- >>[Structural Mechanics + Module](http://www.comsol.com/structural-mechanics-module), + -- <span><span>[Heat Transfer - Module](http://www.comsol.com/heat-transfer-module), - </span></span> +- >>[Heat Transfer + Module](http://www.comsol.com/heat-transfer-module), + -- <span><span>[CFD - Module](http://www.comsol.com/cfd-module), - </span></span> +- >>[CFD + Module](http://www.comsol.com/cfd-module), + -- <span><span>[Acoustics - Module](http://www.comsol.com/acoustics-module), - </span></span> +- >>[Acoustics + Module](http://www.comsol.com/acoustics-module), + -- <span><span>and [many - others](http://www.comsol.com/products)</span></span> +- >>and [many + others](http://www.comsol.com/products) -<span><span>COMSOL also allows an -</span></span><span><span><span><span>interface support for +>>COMSOL also allows an +>><span><span>interface support for equation-based modelling of -</span></span></span></span><span><span>partial differential -equations.</span></span> +</span></span>>>partial differential +equations. + +>>Execution -<span><span>Execution -</span></span> ---------------------- -<span><span>On the Anselm cluster COMSOL is available in the latest -stable version. There are two variants of the release:</span></span> - -- <span><span>**Non commercial**</span></span><span><span> or so - called </span></span><span><span>**EDU - variant**</span></span><span><span>, which can be used for research - and educational purposes.</span></span> - -- <span><span>**Commercial**</span></span><span><span> or so called - </span></span><span><span>**COM variant**</span></span><span><span>, - which can used also for commercial activities. - </span></span><span><span>**COM variant**</span></span><span><span> - has only subset of features compared to the - </span></span><span><span>**EDU - variant**</span></span><span><span> available. <span - class="internal-link"><span id="result_box" class="short_text"><span - class="hps">More</span> <span class="hps">about - licensing</span> will be posted <span class="hps">here - soon</span>.</span></span> - </span></span> - -<span><span>To load the of COMSOL load the module</span></span> - -``` +>>On the Anselm cluster COMSOL is available in the latest +stable version. There are two variants of the release: + +- >>**Non commercial**<span><span> or so + called >>**EDU + variant**>>, which can be used for research + and educational purposes. + +- >>**Commercial**<span><span> or so called + >>**COM variant**</span></span><span><span>, + which can used also for commercial activities. + >>**COM variant**</span></span><span><span> + has only subset of features compared to the + >>**EDU + variant**>> available. <span + id="result_box" class="short_text"> + class="hps">More class="hps">about + licensing will be posted class="hps">here + soon.</span> + + +>>To load the of COMSOL load the module + +``` $ module load comsol ``` -<span><span>By default the </span></span><span><span>**EDU -variant**</span></span><span><span> will be loaded. If user needs other +>>By default the <span><span>**EDU +variant**>> will be loaded. If user needs other version or variant, load the particular version. 
To obtain the list of -available versions use</span></span> +available versions use -``` +``` $ module avail comsol ``` -<span><span>If user needs to prepare COMSOL jobs in the interactive mode +>>If user needs to prepare COMSOL jobs in the interactive mode it is recommend to use COMSOL on the compute nodes via PBS Pro scheduler. In order run the COMSOL Desktop GUI on Windows is recommended to use the [Virtual Network Computing -(VNC)](https://docs.it4i.cz/anselm-cluster-documentation/software/comsol/resolveuid/11e53ad0d2fd4c5187537f4baeedff33).</span></span> +(VNC)](https://docs.it4i.cz/anselm-cluster-documentation/software/comsol/resolveuid/11e53ad0d2fd4c5187537f4baeedff33). -``` +``` $ xhost + $ qsub -I -X -A PROJECT_ID -q qprod -l select=1:ncpus=16 $ module load comsol $ comsol ``` -<span><span>To run COMSOL in batch mode, without the COMSOL Desktop GUI +>>To run COMSOL in batch mode, without the COMSOL Desktop GUI environment, user can utilized the default (comsol.pbs) job script and -execute it via the qsub command.</span></span> +execute it via the qsub command. -``` +``` #!/bin/bash #PBS -l select=3:ncpus=16 #PBS -q qprod @@ -124,37 +124,37 @@ ntask=$(wc -l $PBS_NODEFILE) comsol -nn $ batch -configuration /tmp –mpiarg –rmk –mpiarg pbs -tmpdir /scratch/$USER/ -inputfile name_input_f.mph -outputfile name_output_f.mph -batchlog name_log_f.log ``` -<span><span>Working directory has to be created before sending the +>>Working directory has to be created before sending the (comsol.pbs) job script into the queue. Input file (name_input_f.mph) has to be in working directory or full path to input file has to be specified. The appropriate path to the temp directory of the job has to -be set by command option (-tmpdir).</span></span> +be set by command option (-tmpdir). LiveLink™* *for MATLAB^®^ ------------------------- -<span><span>COMSOL is the software package for the numerical solution of +>>COMSOL is the software package for the numerical solution of the partial differential equations. LiveLink for MATLAB allows connection to the -COMSOL</span></span><span><span>^<span><span><span><span><span><span><span>**®**</span></span></span></span></span></span></span>^</span></span><span><span> +COMSOL>>^<span><span><span><span><span><span><span>**®**</span></span></span></span></span></span></span>^</span></span><span><span> API (Application Programming Interface) with the benefits of the programming language and computing environment of the MATLAB. -</span></span> -<span><span>LiveLink for MATLAB is available in both -</span></span><span><span>**EDU**</span></span><span><span> and -</span></span><span><span>**COM**</span></span><span><span> -</span></span><span><span>**variant**</span></span><span><span> of the + +>>LiveLink for MATLAB is available in both +>>**EDU**</span></span><span><span> and +>>**COM**</span></span><span><span> +>>**variant**</span></span><span><span> of the COMSOL release. On Anselm 1 commercial -(</span></span><span><span>**COM**</span></span><span><span>) license +(>>**COM**</span></span><span><span>) license and the 5 educational -(</span></span><span><span>**EDU**</span></span><span><span>) licenses +(>>**EDU**</span></span><span><span>) licenses of LiveLink for MATLAB (please see the [ISV Licenses](../isv_licenses.html)) are available. Following example shows how to start COMSOL model from MATLAB via -LiveLink in the interactive mode.</span></span> +LiveLink in the interactive mode. 
-``` +``` $ xhost + $ qsub -I -X -A PROJECT_ID -q qexp -l select=1:ncpus=16 $ module load matlab @@ -162,15 +162,15 @@ $ module load comsol $ comsol server matlab ``` -<span><span>At the first time to launch the LiveLink for MATLAB +>>At the first time to launch the LiveLink for MATLAB (client-MATLAB/server-COMSOL connection) the login and password is -requested and this information is not requested again.</span></span> +requested and this information is not requested again. -<span><span>To run LiveLink for MATLAB in batch mode with +>>To run LiveLink for MATLAB in batch mode with (comsol_matlab.pbs) job script you can utilize/modify the following -script and execute it via the qsub command.</span></span> +script and execute it via the qsub command. -``` +``` #!/bin/bash #PBS -l select=3:ncpus=16 #PBS -q qprod diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers.md index 70a4827518254f6dd6b26e725b572db1eec68fdc..50587938502afd4807ec7efd0a114397d819e82f 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers.md @@ -3,7 +3,7 @@ Debuggers and profilers summary - + Introduction ------------ @@ -23,8 +23,8 @@ environment. Use [X display](https://docs.it4i.cz/anselm-cluster-documentation/software/debuggers/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) for running the GUI. - $ module load intel - $ idb + $ module load intel + $ idb Read more at the [Intel Debugger](intel-suite/intel-debugger.html) page. @@ -40,8 +40,8 @@ every thread running as part of your program, or for every process - even if these processes are distributed across a cluster using an MPI implementation. - $ module load Forge - $ forge + $ module load Forge + $ forge Read more at the [Allinea DDT](debuggers/allinea-ddt.html) page. @@ -56,8 +56,8 @@ about several metrics along with clear behavior statements and hints to help you improve the efficiency of your runs. Our license is limited to 64 MPI processes. - $ module load PerformanceReports/6.0 - $ perf-report mpirun -n 64 ./my_application argument01 argument02 + $ module load PerformanceReports/6.0 + $ perf-report mpirun -n 64 ./my_application argument01 argument02 Read more at the [Allinea Performance Reports](debuggers/allinea-performance-reports.html) @@ -72,8 +72,8 @@ analyze, organize, and test programs, making it easy to isolate and identify problems in individual threads and processes in programs of great complexity. - $ module load totalview - $ totalview + $ module load totalview + $ totalview Read more at the [Totalview](debuggers/total-view.html) page. @@ -83,8 +83,8 @@ Vampir trace analyzer Vampir is a GUI trace analyzer for traces in OTF format. - $ module load Vampir/8.5.0 - $ vampir + $ module load Vampir/8.5.0 + $ vampir Read more at the [Vampir](../../salomon/software/debuggers/vampir.html) page. 
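
All of the tools above are usually driven from a PBS jobscript rather than interactively. As a rough sketch only (PROJECT_ID, the node count and my_application are placeholders, not values taken from this page), a batch run under Allinea Performance Reports could look like this:

```
#!/bin/bash
#PBS -A PROJECT_ID
#PBS -q qprod
#PBS -l select=4:ncpus=16:mpiprocs=16:ompthreads=1

# run from the submission directory
cd $PBS_O_WORKDIR

# load the compiler/MPI stack the code was built with, plus the profiler
module load intel impi PerformanceReports/6.0

# 4 nodes x 16 ranks = 64 MPI processes, the maximum covered by the license
perf-report mpirun ./my_application argument01 argument02
```

The same load-the-module-and-launch pattern applies to the other tools summarized on this page; see their individual pages for the exact invocation.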
diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/allinea-ddt.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/allinea-ddt.md index 1d1fbd6ce666cf6602604529857a55aed58f47ad..d909877d1cfb77e240d7a9c9122998748fafadb2 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/allinea-ddt.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/allinea-ddt.md @@ -3,7 +3,7 @@ Allinea Forge (DDT,MAP) - + Allinea Forge consist of two tools - debugger DDT and profiler MAP. @@ -25,13 +25,13 @@ On Anselm users can debug OpenMP or MPI code that runs up to 64 parallel processes. In case of debugging GPU or Xeon Phi accelerated codes the limit is 8 accelerators. These limitation means that: -- 1 user can debug up 64 processes, or -- 32 users can debug 2 processes, etc. +- 1 user can debug up 64 processes, or +- 32 users can debug 2 processes, etc. In case of debugging on accelerators: -- 1 user can debug on up to 8 accelerators, or -- 8 users can debug on single accelerator. +- 1 user can debug on up to 8 accelerators, or +- 8 users can debug on single accelerator. Compiling Code to run with DDT ------------------------------ @@ -40,16 +40,16 @@ Compiling Code to run with DDT Load all necessary modules to compile the code. For example: - $ module load intel - $ module load impi ... or ... module load openmpi/X.X.X-icc + $ module load intel + $ module load impi ... or ... module load openmpi/X.X.X-icc Load the Allinea DDT module: - $ module load Forge + $ module load Forge Compile the code: -``` +``` $ mpicc -g -O0 -o test_debug test.c $ mpif90 -g -O0 -o test_debug test.f @@ -61,43 +61,43 @@ $ mpif90 -g -O0 -o test_debug test.f Before debugging, you need to compile your code with theses flags: -**-g** : Generates extra debugging information usable by GDB. -g3 +-g** : Generates extra debugging information usable by GDB. -g3 includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers. -**-O0** : Suppress all optimizations. +-O0** : Suppress all optimizations.  Starting a Job with DDT ----------------------- -Be sure to log in with an <span class="internal-link">X window -forwarding</span> enabled. This could mean using the -X in the ssh:  +Be sure to log in with an X window +forwarding enabled. This could mean using the -X in the ssh:  - $ ssh -X username@anselm.it4i.cz + $ ssh -X username@anselm.it4i.cz Other options is to access login node using VNC. Please see the detailed information on how to [use graphic user interface on -Anselm](https://docs.it4i.cz/anselm-cluster-documentation/software/debuggers/resolveuid/11e53ad0d2fd4c5187537f4baeedff33)<span -class="internal-link"></span>. +Anselm](https://docs.it4i.cz/anselm-cluster-documentation/software/debuggers/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) +. From the login node an interactive session **with X windows forwarding** (-X option) can be started by following command: - $ qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=16:mpiprocs=16,walltime=01:00:00 + $ qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=16:mpiprocs=16,walltime=01:00:00 Then launch the debugger with the ddt command followed by the name of the executable to debug: - $ ddt test_debug + $ ddt test_debug -A<span style="text-align: start; "> submission window that appears have +A submission window that appears have a prefilled path to the executable to debug. 
You can select the number of MPI processors and/or OpenMP threads on which to run and press run. -Command line arguments to a program can be entered to the</span> -"Arguments<span class="Apple-converted-space">" </span><span -style="text-align: start; ">box.</span> +Command line arguments to a program can be entered to the +"Arguments " +box.  @@ -107,7 +107,7 @@ For example the number of MPI processes is set by option "-np 4". Skipping the dialog is done by "-start" option. To see the list of the "ddt" command line parameters, run "ddt --help".  - ddt -start -np 4 ./hello_debug_impi + ddt -start -np 4 ./hello_debug_impi  @@ -116,7 +116,7 @@ Documentation Users can find original User Guide after loading the DDT module: - $DDTPATH/doc/userguide.pdf + $DDTPATH/doc/userguide.pdf  diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md index 5ea3709fa01f289c90fd9f982f33f583c90baef3..16c147569aa4858f839491c5f22fd593a216ea86 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/allinea-performance-reports.md @@ -4,7 +4,7 @@ Allinea Performance Reports quick application profiling - + Introduction ------------ @@ -25,7 +25,7 @@ Modules Allinea Performance Reports version 6.0 is available - $ module load PerformanceReports/6.0 + $ module load PerformanceReports/6.0 The module sets up environment variables, required for using the Allinea Performance Reports. This particular command loads the default module, @@ -39,7 +39,7 @@ Use the the perf-report wrapper on your (MPI) program. Instead of [running your MPI program the usual way](../mpi-1.html), use the the perf report wrapper: - $ perf-report mpirun ./mympiprog.x + $ perf-report mpirun ./mympiprog.x The mpi program will run as usual. The perf-report creates two additional files, in *.txt and *.html format, containing the @@ -56,18 +56,18 @@ compilers and linked against intel MPI library: First, we allocate some nodes via the express queue: - $ qsub -q qexp -l select=2:ncpus=16:mpiprocs=16:ompthreads=1 -I - qsub: waiting for job 262197.dm2 to start - qsub: job 262197.dm2 ready + $ qsub -q qexp -l select=2:ncpus=16:mpiprocs=16:ompthreads=1 -I + qsub: waiting for job 262197.dm2 to start + qsub: job 262197.dm2 ready Then we load the modules and run the program the usual way: - $ module load intel impi allinea-perf-report/4.2 - $ mpirun ./mympiprog.x + $ module load intel impi allinea-perf-report/4.2 + $ mpirun ./mympiprog.x Now lets profile the code: - $ perf-report mpirun ./mympiprog.x + $ perf-report mpirun ./mympiprog.x Performance report files [mympiprog_32p*.txt](mympiprog_32p_2014-10-15_16-56.txt) diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/cube.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/cube.md index 4511d90935940dde142b79550477a1ec09ff2c6c..1b8957ed751873683a506f73dc51ece8beffa64b 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/cube.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/cube.md @@ -8,11 +8,11 @@ CUBE is a graphical performance report explorer for displaying data from Score-P and Scalasca (and other compatible tools). 
The name comes from the fact that it displays performance data in a three-dimensions : -- **performance metric**, where a number of metrics are available, - such as communication time or cache misses, -- **call path**, which contains the call tree of your program -- s**ystem resource**, which contains system's nodes, processes and - threads, depending on the parallel programming model. +- **performance metric**, where a number of metrics are available, + such as communication time or cache misses, +- **call path**, which contains the call tree of your program +- s**ystem resource**, which contains system's nodes, processes and + threads, depending on the parallel programming model. Each dimension is organized in a tree, for example the time performance @@ -41,11 +41,11 @@ Installed versions Currently, there are two versions of CUBE 4.2.3 available as [modules](../../environment-and-modules.html) : -- <span class="s1"><span class="monospace">cube/4.2.3-gcc</span>, - compiled with GCC</span> +- class="s1"> cube/4.2.3-gcc, + compiled with GCC -- <span class="s1"><span class="monospace">cube/4.2.3-icc</span>, - compiled with Intel compiler</span> +- class="s1"> cube/4.2.3-icc, + compiled with Intel compiler Usage ----- @@ -57,16 +57,16 @@ for a list of methods to launch graphical applications on Anselm. Analyzing large data sets can consume large amount of CPU and RAM. Do not perform large analysis on login nodes. -After loading the apropriate module, simply launch <span -class="monospace">cube</span> command, or alternatively you can use -<span class="monospace">scalasca -examine</span> command to launch the +After loading the apropriate module, simply launch +cube command, or alternatively you can use + scalasca -examine command to launch the GUI. Note that for Scalasca datasets, if you do not analyze the data -with <span><span class="monospace">scalasca --examine </span></span>before to opening them with CUBE, not all +with > scalasca +-examine before to opening them with CUBE, not all performance data will be available. - <span>References</span> + >References -1. <http://www.scalasca.org/software/cube-4.x/download.html> +1.<http://www.scalasca.org/software/cube-4.x/download.html> diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/intel-performance-counter-monitor.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/intel-performance-counter-monitor.md index 8e1e391a1cb8d25a1a80cf6b5d4ddd9456923a2a..cdd497466fdc33757e5cec1be21a5aef06077ece 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/intel-performance-counter-monitor.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/intel-performance-counter-monitor.md @@ -5,19 +5,19 @@ Introduction ------------ Intel PCM (Performance Counter Monitor) is a tool to monitor performance -hardware counters on Intel<span>®</span> processors, similar to +hardware counters on Intel>® processors, similar to [PAPI](papi.html). The difference between PCM and PAPI is that PCM supports only Intel hardware, but PCM can monitor also -uncore metrics, like memory controllers and <span>QuickPath Interconnect -links.</span> +uncore metrics, like memory controllers and >QuickPath Interconnect +links. -<span>Installed version</span> +>Installed version ------------------------------ Currently installed version 2.6. 
To load the [module](../../environment-and-modules.html), issue : - $ module load intelpcm + $ module load intelpcm Command line tools ------------------ @@ -29,7 +29,7 @@ PCM provides a set of tools to monitor system/or application. Measures memory bandwidth of your application or the whole system. Usage: - $ pcm-memory.x <delay>|[external_program parameters] + $ pcm-memory.x <delay>|[external_program parameters] Specify either a delay of updates in seconds or an external program to monitor. If you get an error about PMU in use, respond "y" and relaunch @@ -37,34 +37,34 @@ the program. Sample output: - ---------------------------------------||--------------------------------------- - -- Socket 0 --||-- Socket 1 -- - ---------------------------------------||--------------------------------------- - ---------------------------------------||--------------------------------------- - ---------------------------------------||--------------------------------------- - -- Memory Performance Monitoring --||-- Memory Performance Monitoring -- - ---------------------------------------||--------------------------------------- - -- Mem Ch 0: Reads (MB/s): 2.44 --||-- Mem Ch 0: Reads (MB/s): 0.26 -- - -- Writes(MB/s): 2.16 --||-- Writes(MB/s): 0.08 -- - -- Mem Ch 1: Reads (MB/s): 0.35 --||-- Mem Ch 1: Reads (MB/s): 0.78 -- - -- Writes(MB/s): 0.13 --||-- Writes(MB/s): 0.65 -- - -- Mem Ch 2: Reads (MB/s): 0.32 --||-- Mem Ch 2: Reads (MB/s): 0.21 -- - -- Writes(MB/s): 0.12 --||-- Writes(MB/s): 0.07 -- - -- Mem Ch 3: Reads (MB/s): 0.36 --||-- Mem Ch 3: Reads (MB/s): 0.20 -- - -- Writes(MB/s): 0.13 --||-- Writes(MB/s): 0.07 -- - -- NODE0 Mem Read (MB/s): 3.47 --||-- NODE1 Mem Read (MB/s): 1.45 -- - -- NODE0 Mem Write (MB/s): 2.55 --||-- NODE1 Mem Write (MB/s): 0.88 -- - -- NODE0 P. Write (T/s) : 31506 --||-- NODE1 P. Write (T/s): 9099 -- - -- NODE0 Memory (MB/s): 6.02 --||-- NODE1 Memory (MB/s): 2.33 -- - ---------------------------------------||--------------------------------------- - -- System Read Throughput(MB/s): 4.93 -- - -- System Write Throughput(MB/s): 3.43 -- - -- System Memory Throughput(MB/s): 8.35 -- - ---------------------------------------||--------------------------------------- + ---------------------------------------||--------------------------------------- + -- Socket 0 --||-- Socket 1 -- + ---------------------------------------||--------------------------------------- + ---------------------------------------||--------------------------------------- + ---------------------------------------||--------------------------------------- + -- Memory Performance Monitoring --||-- Memory Performance Monitoring -- + ---------------------------------------||--------------------------------------- + -- Mem Ch 0: Reads (MB/s): 2.44 --||-- Mem Ch 0: Reads (MB/s): 0.26 -- + -- Writes(MB/s): 2.16 --||-- Writes(MB/s): 0.08 -- + -- Mem Ch 1: Reads (MB/s): 0.35 --||-- Mem Ch 1: Reads (MB/s): 0.78 -- + -- Writes(MB/s): 0.13 --||-- Writes(MB/s): 0.65 -- + -- Mem Ch 2: Reads (MB/s): 0.32 --||-- Mem Ch 2: Reads (MB/s): 0.21 -- + -- Writes(MB/s): 0.12 --||-- Writes(MB/s): 0.07 -- + -- Mem Ch 3: Reads (MB/s): 0.36 --||-- Mem Ch 3: Reads (MB/s): 0.20 -- + -- Writes(MB/s): 0.13 --||-- Writes(MB/s): 0.07 -- + -- NODE0 Mem Read (MB/s): 3.47 --||-- NODE1 Mem Read (MB/s): 1.45 -- + -- NODE0 Mem Write (MB/s): 2.55 --||-- NODE1 Mem Write (MB/s): 0.88 -- + -- NODE0 P. Write (T/s) : 31506 --||-- NODE1 P. 
Write (T/s): 9099 -- + -- NODE0 Memory (MB/s): 6.02 --||-- NODE1 Memory (MB/s): 2.33 -- + ---------------------------------------||--------------------------------------- + -- System Read Throughput(MB/s): 4.93 -- + -- System Write Throughput(MB/s): 3.43 -- + -- System Memory Throughput(MB/s): 8.35 -- + ---------------------------------------||--------------------------------------- ### pcm-msr -Command <span class="monospace">pcm-msr.x</span> can be used to +Command pcm-msr.x can be used to read/write model specific registers of the CPU. ### pcm-numa @@ -73,129 +73,129 @@ NUMA monitoring utility does not work on Anselm. ### pcm-pcie -Can be used to monitor PCI Express bandwith. Usage: <span -class="monospace">pcm-pcie.x <delay></span> +Can be used to monitor PCI Express bandwith. Usage: +pcm-pcie.x <delay> ### pcm-power Displays energy usage and thermal headroom for CPU and DRAM sockets. -Usage: <span> </span><span class="monospace">pcm-power.x <delay> | -<external program></span> +Usage: > pcm-power.x <delay> | +<external program> ### pcm This command provides an overview of performance counters and memory -usage. <span>Usage: </span><span> </span><span class="monospace">pcm.x -<delay> | <external program></span> +usage. >Usage: > <span pcm.x +<delay> | <external program> Sample output : - $ pcm.x ./matrix - - Intel(r) Performance Counter Monitor V2.6 (2013-11-04 13:43:31 +0100 ID=db05e43) - - Copyright (c) 2009-2013 Intel Corporation - - Number of physical cores: 16 - Number of logical cores: 16 - Threads (logical cores) per physical core: 1 - Num sockets: 2 - Core PMU (perfmon) version: 3 - Number of core PMU generic (programmable) counters: 8 - Width of generic (programmable) counters: 48 bits - Number of core PMU fixed counters: 3 - Width of fixed counters: 48 bits - Nominal core frequency: 2400000000 Hz - Package thermal spec power: 115 Watt; Package minimum power: 51 Watt; Package maximum power: 180 Watt; - Socket 0: 1 memory controllers detected with total number of 4 channels. 2 QPI ports detected. - Socket 1: 1 memory controllers detected with total number of 4 channels. 2 QPI ports detected. 
- Number of PCM instances: 2 - Max QPI link speed: 16.0 GBytes/second (8.0 GT/second) - - Detected Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz "Intel(r) microarchitecture codename Sandy Bridge-EP/Jaketown" - - Executing "./matrix" command: - - Exit code: 0 - - - EXEC : instructions per nominal CPU cycle - IPC : instructions per CPU cycle - FREQ : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost) - AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state' (includes Intel Turbo Boost) - L3MISS: L3 cache misses - L2MISS: L2 cache misses (including other core's L2 cache *hits*) - L3HIT : L3 cache hit ratio (0.00-1.00) - L2HIT : L2 cache hit ratio (0.00-1.00) - L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in some cases could be >1.0 due to a higher memory latency - L2CLK : ratio of CPU cycles lost due to missing L2 cache but still hitting L3 cache (0.00-1.00) - READ : bytes read from memory controller (in GBytes) - WRITE : bytes written to memory controller (in GBytes) - TEMP : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature - - - Core (SKT) | EXEC | IPC | FREQ | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK | READ | WRITE | TEMP - - 0 0 0.00 0.64 0.01 0.80 5592 11 K 0.49 0.13 0.32 0.06 N/A N/A 67 - 1 0 0.00 0.18 0.00 0.69 3086 5552 0.44 0.07 0.48 0.08 N/A N/A 68 - 2 0 0.00 0.23 0.00 0.81 300 562 0.47 0.06 0.43 0.08 N/A N/A 67 - 3 0 0.00 0.21 0.00 0.99 437 862 0.49 0.06 0.44 0.09 N/A N/A 73 - 4 0 0.00 0.23 0.00 0.93 293 559 0.48 0.07 0.42 0.09 N/A N/A 73 - 5 0 0.00 0.21 0.00 1.00 423 849 0.50 0.06 0.43 0.10 N/A N/A 69 - 6 0 0.00 0.23 0.00 0.94 285 558 0.49 0.06 0.41 0.09 N/A N/A 71 - 7 0 0.00 0.18 0.00 0.81 674 1130 0.40 0.05 0.53 0.08 N/A N/A 65 - 8 1 0.00 0.47 0.01 1.26 6371 13 K 0.51 0.35 0.31 0.07 N/A N/A 64 - 9 1 2.30 1.80 1.28 1.29 179 K 15 M 0.99 0.59 0.04 0.71 N/A N/A 60 - 10 1 0.00 0.22 0.00 1.26 315 570 0.45 0.06 0.43 0.08 N/A N/A 67 - 11 1 0.00 0.23 0.00 0.74 321 579 0.45 0.05 0.45 0.07 N/A N/A 66 - 12 1 0.00 0.22 0.00 1.25 305 570 0.46 0.05 0.42 0.07 N/A N/A 68 - 13 1 0.00 0.22 0.00 1.26 336 581 0.42 0.04 0.44 0.06 N/A N/A 69 - 14 1 0.00 0.22 0.00 1.25 314 565 0.44 0.06 0.43 0.07 N/A N/A 69 - 15 1 0.00 0.29 0.00 1.19 2815 6926 0.59 0.39 0.29 0.08 N/A N/A 69 - ------------------------------------------------------------------------------------------------------------------- - SKT 0 0.00 0.46 0.00 0.79 11 K 21 K 0.47 0.10 0.38 0.07 0.00 0.00 65 - SKT 1 0.29 1.79 0.16 1.29 190 K 15 M 0.99 0.59 0.05 0.70 0.01 0.01 61 - ------------------------------------------------------------------------------------------------------------------- - TOTAL * 0.14 1.78 0.08 1.28 201 K 15 M 0.99 0.59 0.05 0.70 0.01 0.01 N/A - - Instructions retired: 1345 M ; Active cycles: 755 M ; Time (TSC): 582 Mticks ; C0 (active,non-halted) core residency: 6.30 % - - C1 core residency: 0.14 %; C3 core residency: 0.20 %; C6 core residency: 0.00 %; C7 core residency: 93.36 %; - C2 package residency: 48.81 %; C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %; - - PHYSICAL CORE IPC : 1.78 => corresponds to 44.50 % utilization for cores in active state - Instructions per nominal CPU cycle: 0.14 => corresponds to 3.60 % core utilization over time interval - - Intel(r) QPI data traffic estimation in bytes (data 
traffic coming to CPU/socket through QPI links): - - QPI0 QPI1 | QPI0 QPI1 - ---------------------------------------------------------------------------------------------- - SKT 0 0 0 | 0% 0% - SKT 1 0 0 | 0% 0% - ---------------------------------------------------------------------------------------------- - Total QPI incoming data traffic: 0 QPI data traffic/Memory controller traffic: 0.00 - - Intel(r) QPI traffic estimation in bytes (data and non-data traffic outgoing from CPU/socket through QPI links): - - QPI0 QPI1 | QPI0 QPI1 - ---------------------------------------------------------------------------------------------- - SKT 0 0 0 | 0% 0% - SKT 1 0 0 | 0% 0% - ---------------------------------------------------------------------------------------------- - Total QPI outgoing data and non-data traffic: 0 - - ---------------------------------------------------------------------------------------------- - SKT 0 package consumed 4.06 Joules - SKT 1 package consumed 9.40 Joules - ---------------------------------------------------------------------------------------------- - TOTAL: 13.46 Joules - - ---------------------------------------------------------------------------------------------- - SKT 0 DIMMs consumed 4.18 Joules - SKT 1 DIMMs consumed 4.28 Joules - ---------------------------------------------------------------------------------------------- - TOTAL: 8.47 Joules - Cleaning up + $ pcm.x ./matrix + + Intel(r) Performance Counter Monitor V2.6 (2013-11-04 13:43:31 +0100 ID=db05e43) + + Copyright (c) 2009-2013 Intel Corporation + + Number of physical cores: 16 + Number of logical cores: 16 + Threads (logical cores) per physical core: 1 + Num sockets: 2 + Core PMU (perfmon) version: 3 + Number of core PMU generic (programmable) counters: 8 + Width of generic (programmable) counters: 48 bits + Number of core PMU fixed counters: 3 + Width of fixed counters: 48 bits + Nominal core frequency: 2400000000 Hz + Package thermal spec power: 115 Watt; Package minimum power: 51 Watt; Package maximum power: 180 Watt; + Socket 0: 1 memory controllers detected with total number of 4 channels. 2 QPI ports detected. + Socket 1: 1 memory controllers detected with total number of 4 channels. 2 QPI ports detected. 
+ Number of PCM instances: 2 + Max QPI link speed: 16.0 GBytes/second (8.0 GT/second) + + Detected Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz "Intel(r) microarchitecture codename Sandy Bridge-EP/Jaketown" + + Executing "./matrix" command: + + Exit code: 0 + + + EXEC : instructions per nominal CPU cycle + IPC : instructions per CPU cycle + FREQ : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost) + AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state' (includes Intel Turbo Boost) + L3MISS: L3 cache misses + L2MISS: L2 cache misses (including other core's L2 cache *hits*) + L3HIT : L3 cache hit ratio (0.00-1.00) + L2HIT : L2 cache hit ratio (0.00-1.00) + L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in some cases could be >1.0 due to a higher memory latency + L2CLK : ratio of CPU cycles lost due to missing L2 cache but still hitting L3 cache (0.00-1.00) + READ : bytes read from memory controller (in GBytes) + WRITE : bytes written to memory controller (in GBytes) + TEMP : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature + + + Core (SKT) | EXEC | IPC | FREQ | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK | READ | WRITE | TEMP + + 0 0 0.00 0.64 0.01 0.80 5592 11 K 0.49 0.13 0.32 0.06 N/A N/A 67 + 1 0 0.00 0.18 0.00 0.69 3086 5552 0.44 0.07 0.48 0.08 N/A N/A 68 + 2 0 0.00 0.23 0.00 0.81 300 562 0.47 0.06 0.43 0.08 N/A N/A 67 + 3 0 0.00 0.21 0.00 0.99 437 862 0.49 0.06 0.44 0.09 N/A N/A 73 + 4 0 0.00 0.23 0.00 0.93 293 559 0.48 0.07 0.42 0.09 N/A N/A 73 + 5 0 0.00 0.21 0.00 1.00 423 849 0.50 0.06 0.43 0.10 N/A N/A 69 + 6 0 0.00 0.23 0.00 0.94 285 558 0.49 0.06 0.41 0.09 N/A N/A 71 + 7 0 0.00 0.18 0.00 0.81 674 1130 0.40 0.05 0.53 0.08 N/A N/A 65 + 8 1 0.00 0.47 0.01 1.26 6371 13 K 0.51 0.35 0.31 0.07 N/A N/A 64 + 9 1 2.30 1.80 1.28 1.29 179 K 15 M 0.99 0.59 0.04 0.71 N/A N/A 60 + 10 1 0.00 0.22 0.00 1.26 315 570 0.45 0.06 0.43 0.08 N/A N/A 67 + 11 1 0.00 0.23 0.00 0.74 321 579 0.45 0.05 0.45 0.07 N/A N/A 66 + 12 1 0.00 0.22 0.00 1.25 305 570 0.46 0.05 0.42 0.07 N/A N/A 68 + 13 1 0.00 0.22 0.00 1.26 336 581 0.42 0.04 0.44 0.06 N/A N/A 69 + 14 1 0.00 0.22 0.00 1.25 314 565 0.44 0.06 0.43 0.07 N/A N/A 69 + 15 1 0.00 0.29 0.00 1.19 2815 6926 0.59 0.39 0.29 0.08 N/A N/A 69 + ------------------------------------------------------------------------------------------------------------------- + SKT 0 0.00 0.46 0.00 0.79 11 K 21 K 0.47 0.10 0.38 0.07 0.00 0.00 65 + SKT 1 0.29 1.79 0.16 1.29 190 K 15 M 0.99 0.59 0.05 0.70 0.01 0.01 61 + ------------------------------------------------------------------------------------------------------------------- + TOTAL * 0.14 1.78 0.08 1.28 201 K 15 M 0.99 0.59 0.05 0.70 0.01 0.01 N/A + + Instructions retired: 1345 M ; Active cycles: 755 M ; Time (TSC): 582 Mticks ; C0 (active,non-halted) core residency: 6.30 % + + C1 core residency: 0.14 %; C3 core residency: 0.20 %; C6 core residency: 0.00 %; C7 core residency: 93.36 %; + C2 package residency: 48.81 %; C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %; + + PHYSICAL CORE IPC : 1.78 => corresponds to 44.50 % utilization for cores in active state + Instructions per nominal CPU cycle: 0.14 => corresponds to 3.60 % core utilization over time interval + + Intel(r) QPI data traffic estimation in bytes (data 
traffic coming to CPU/socket through QPI links): + + QPI0 QPI1 | QPI0 QPI1 + ---------------------------------------------------------------------------------------------- + SKT 0 0 0 | 0% 0% + SKT 1 0 0 | 0% 0% + ---------------------------------------------------------------------------------------------- + Total QPI incoming data traffic: 0 QPI data traffic/Memory controller traffic: 0.00 + + Intel(r) QPI traffic estimation in bytes (data and non-data traffic outgoing from CPU/socket through QPI links): + + QPI0 QPI1 | QPI0 QPI1 + ---------------------------------------------------------------------------------------------- + SKT 0 0 0 | 0% 0% + SKT 1 0 0 | 0% 0% + ---------------------------------------------------------------------------------------------- + Total QPI outgoing data and non-data traffic: 0 + + ---------------------------------------------------------------------------------------------- + SKT 0 package consumed 4.06 Joules + SKT 1 package consumed 9.40 Joules + ---------------------------------------------------------------------------------------------- + TOTAL: 13.46 Joules + + ---------------------------------------------------------------------------------------------- + SKT 0 DIMMs consumed 4.18 Joules + SKT 1 DIMMs consumed 4.28 Joules + ---------------------------------------------------------------------------------------------- + TOTAL: 8.47 Joules + Cleaning up  @@ -218,85 +218,85 @@ root user) Sample program using the API : - #include <stdlib.h> - #include <stdio.h> - #include "cpucounters.h" + #include <stdlib.h> + #include <stdio.h> + #include "cpucounters.h" - #define SIZE 1000 + #define SIZE 1000 - using namespace std; + using namespace std; - int main(int argc, char **argv) { - float matrixa[SIZE][SIZE], matrixb[SIZE][SIZE], mresult[SIZE][SIZE]; - float real_time, proc_time, mflops; - long long flpins; - int retval; - int i,j,k; + int main(int argc, char **argv) { + float matrixa[SIZE][SIZE], matrixb[SIZE][SIZE], mresult[SIZE][SIZE]; + float real_time, proc_time, mflops; + long long flpins; + int retval; + int i,j,k; - PCM * m = PCM::getInstance(); + PCM * m = PCM::getInstance(); - if (m->program() != PCM::Success) return 1; + if (m->program() != PCM::Success) return 1; - SystemCounterState before_sstate = getSystemCounterState(); + SystemCounterState before_sstate = getSystemCounterState(); - /* Initialize the Matrix arrays */ - for ( i=0; i<SIZE*SIZE; i++ ){ - mresult[0][i] = 0.0; - matrixa[0][i] = matrixb[0][i] = rand()*(float)1.1; } + /* Initialize the Matrix arrays */ + for ( i=0; i<SIZE*SIZE; i++ ){ + mresult[0][i] = 0.0; + matrixa[0][i] = matrixb[0][i] = rand()*(float)1.1; } - /* A naive Matrix-Matrix multiplication */ - for (i=0;i<SIZE;i++) - for(j=0;j<SIZE;j++) - for(k=0;k<SIZE;k++) - mresult[i][j]=mresult[i][j] + matrixa[i][k]*matrixb[k][j]; + /* A naive Matrix-Matrix multiplication */ + for (i=0;i<SIZE;i++) + for(j=0;j<SIZE;j++) + for(k=0;k<SIZE;k++) + mresult[i][j]=mresult[i][j] + matrixa[i][k]*matrixb[k][j]; - SystemCounterState after_sstate = getSystemCounterState(); + SystemCounterState after_sstate = getSystemCounterState(); - cout << "Instructions per clock:" << getIPC(before_sstate,after_sstate) - << "L3 cache hit ratio:" << getL3CacheHitRatio(before_sstate,after_sstate) - << "Bytes read:" << getBytesReadFromMC(before_sstate,after_sstate); + cout << "Instructions per clock:" << getIPC(before_sstate,after_sstate) + << "L3 cache hit ratio:" << getL3CacheHitRatio(before_sstate,after_sstate) + << "Bytes read:" << 
getBytesReadFromMC(before_sstate,after_sstate); - for (i=0; i<SIZE;i++) - for (j=0; j<SIZE; j++) - if (mresult[i][j] == -1) printf("x"); + for (i=0; i<SIZE;i++) + for (j=0; j<SIZE; j++) + if (mresult[i][j] == -1) printf("x"); - return 0; - } + return 0; + } Compile it with : - $ icc matrix.cpp -o matrix -lpthread -lpcm + $ icc matrix.cpp -o matrix -lpthread -lpcm Sample output : - $ ./matrix - Number of physical cores: 16 - Number of logical cores: 16 - Threads (logical cores) per physical core: 1 - Num sockets: 2 - Core PMU (perfmon) version: 3 - Number of core PMU generic (programmable) counters: 8 - Width of generic (programmable) counters: 48 bits - Number of core PMU fixed counters: 3 - Width of fixed counters: 48 bits - Nominal core frequency: 2400000000 Hz - Package thermal spec power: 115 Watt; Package minimum power: 51 Watt; Package maximum power: 180 Watt; - Socket 0: 1 memory controllers detected with total number of 4 channels. 2 QPI ports detected. - Socket 1: 1 memory controllers detected with total number of 4 channels. 2 QPI ports detected. - Number of PCM instances: 2 - Max QPI link speed: 16.0 GBytes/second (8.0 GT/second) - Instructions per clock:1.7 - L3 cache hit ratio:1.0 - Bytes read:12513408 + $ ./matrix + Number of physical cores: 16 + Number of logical cores: 16 + Threads (logical cores) per physical core: 1 + Num sockets: 2 + Core PMU (perfmon) version: 3 + Number of core PMU generic (programmable) counters: 8 + Width of generic (programmable) counters: 48 bits + Number of core PMU fixed counters: 3 + Width of fixed counters: 48 bits + Nominal core frequency: 2400000000 Hz + Package thermal spec power: 115 Watt; Package minimum power: 51 Watt; Package maximum power: 180 Watt; + Socket 0: 1 memory controllers detected with total number of 4 channels. 2 QPI ports detected. + Socket 1: 1 memory controllers detected with total number of 4 channels. 2 QPI ports detected. + Number of PCM instances: 2 + Max QPI link speed: 16.0 GBytes/second (8.0 GT/second) + Instructions per clock:1.7 + L3 cache hit ratio:1.0 + Bytes read:12513408 References ---------- -1. <https://software.intel.com/en-us/articles/intel-performance-counter-monitor-a-better-way-to-measure-cpu-utilization> -2. <https://software.intel.com/sites/default/files/m/3/2/2/xeon-e5-2600-uncore-guide.pdf> Intel® - Xeon® Processor E5-2600 Product Family Uncore Performance - Monitoring Guide. -3. <http://intel-pcm-api-documentation.github.io/classPCM.html> API - Documentation +1.<https://software.intel.com/en-us/articles/intel-performance-counter-monitor-a-better-way-to-measure-cpu-utilization> +2.<https://software.intel.com/sites/default/files/m/3/2/2/xeon-e5-2600-uncore-guide.pdf> Intel® + Xeon® Processor E5-2600 Product Family Uncore Performance + Monitoring Guide. 
+3.<http://intel-pcm-api-documentation.github.io/classPCM.html> API + Documentation diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/intel-vtune-amplifier.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/intel-vtune-amplifier.md index 488513eabd887ca213f2ca65923ca7365365dccf..26ab8794f660f011a54079b13b6aca3e7028f383 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/intel-vtune-amplifier.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/intel-vtune-amplifier.md @@ -3,21 +3,21 @@ Intel VTune Amplifier - + Introduction ------------ -Intel*® *VTune™ <span>Amplifier, part of Intel Parallel studio, is a GUI +Intel*® *VTune™ >Amplifier, part of Intel Parallel studio, is a GUI profiling tool designed for Intel processors. It offers a graphical performance analysis of single core and multithreaded applications. A -highlight of the features:</span> +highlight of the features: -- Hotspot analysis -- Locks and waits analysis -- Low level specific counters, such as branch analysis and memory - bandwidth -- Power usage analysis - frequency and sleep states. +- Hotspot analysis +- Locks and waits analysis +- Low level specific counters, such as branch analysis and memory + bandwidth +- Power usage analysis - frequency and sleep states.  @@ -28,26 +28,26 @@ Usage To launch the GUI, first load the module: - $ module add VTune/2016_update1 + $ module add VTune/2016_update1 -<span class="s1">and launch the GUI :</span> + class="s1">and launch the GUI : - $ amplxe-gui + $ amplxe-gui -<span>To profile an application with VTune Amplifier, special kernel +>To profile an application with VTune Amplifier, special kernel modules need to be loaded. The modules are not loaded on Anselm login nodes, thus direct profiling on login nodes is not possible. Use VTune on compute nodes and refer to the documentation on [using GUI -applications](https://docs.it4i.cz/anselm-cluster-documentation/software/debuggers/resolveuid/11e53ad0d2fd4c5187537f4baeedff33).</span> +applications](https://docs.it4i.cz/anselm-cluster-documentation/software/debuggers/resolveuid/11e53ad0d2fd4c5187537f4baeedff33). -<span>The GUI will open in new window. Click on "*New Project...*" to +>The GUI will open in new window. Click on "*New Project...*" to create a new project. After clicking *OK*, a new window with project properties will appear.  At "*Application:*", select the bath to your binary you want to profile (the binary should be compiled with -g flag). Some additional options such as command line arguments can be selected. At "*Managed code profiling mode:*" select "*Native*" (unless you want to profile managed mode .NET/Mono applications). After clicking *OK*, -your project is created.</span> +your project is created. To run a new analysis, click "*New analysis...*". You will see a list of possible analysis. Some of them will not be possible on the current CPU @@ -69,7 +69,7 @@ the command line needed to perform the selected analysis. 
The command line will look like this: - /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -collect advanced-hotspots -knob collection-detail=stack-and-callcount -mrte-mode=native -target-duration-type=veryshort -app-working-dir /home/sta545/test -- /home/sta545/test_pgsesv + /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -collect advanced-hotspots -knob collection-detail=stack-and-callcount -mrte-mode=native -target-duration-type=veryshort -app-working-dir /home/sta545/test -- /home/sta545/test_pgsesv Copy the line to clipboard and then you can paste it in your jobscript or in command line. After the collection is run, open the GUI once @@ -85,13 +85,13 @@ It is possible to analyze both native and offload Xeon Phi applications. For offload mode, just specify the path to the binary. For native mode, you need to specify in project properties: -Application: <span class="monospace">ssh</span> +Application: ssh -Application parameters: <span class="monospace">mic0 source ~/.profile -&& /path/to/your/bin</span> +Application parameters: mic0 source ~/.profile +&& /path/to/your/bin -Note that we include <span class="monospace">source ~/.profile -</span>in the command to setup environment paths [as described +Note that we include source ~/.profile +in the command to setup environment paths [as described here](../intel-xeon-phi.html). If the analysis is interrupted or aborted, further analysis on the card @@ -102,11 +102,11 @@ card. You may also use remote analysis to collect data from the MIC and then analyze it in the GUI later : - $ amplxe-cl -collect knc-hotspots -no-auto-finalize -- ssh mic0 - "export LD_LIBRARY_PATH=/apps/intel/composer_xe_2015.2.164/compiler/lib/mic/:/apps/intel/composer_xe_2015.2.164/mkl/lib/mic/; export KMP_AFFINITY=compact; /tmp/app.mic" + $ amplxe-cl -collect knc-hotspots -no-auto-finalize -- ssh mic0 + "export LD_LIBRARY_PATH=/apps/intel/composer_xe_2015.2.164/compiler/lib/mic/:/apps/intel/composer_xe_2015.2.164/mkl/lib/mic/; export KMP_AFFINITY=compact; /tmp/app.mic" References ---------- -1. <span><https://www.rcac.purdue.edu/tutorials/phi/PerformanceTuningXeonPhi-Tullos.pdf> Performance - Tuning for Intel® Xeon Phi™ Coprocessors</span> +1.><https://www.rcac.purdue.edu/tutorials/phi/PerformanceTuningXeonPhi-Tullos.pdf> Performance + Tuning for Intel® Xeon Phi™ Coprocessors diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/papi.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/papi.md index 06e249181f627f8a2010b1dbd3db76abcd08744b..385f2c266ce5cb877d6da2e5d35ffa9066617518 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/papi.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/papi.md @@ -3,41 +3,41 @@ PAPI - + Introduction ------------ -<span dir="auto">Performance Application Programming -Interface </span><span>(PAPI)  is a portable interface to access + dir="auto">Performance Application Programming +Interface >(PAPI)  is a portable interface to access hardware performance counters (such as instruction counts and cache misses) found in most modern architectures. With the new component framework, PAPI is not limited only to CPU counters, but offers also -components for CUDA, network, Infiniband etc.</span> +components for CUDA, network, Infiniband etc. 
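
Which of these components are actually available depends on how the installed PAPI was built. One way to check, assuming the papi module described below provides the standard PAPI utilities, is:

```
$ module load papi
$ papi_component_avail
```

Components reported as disabled were not built into this installation, so their events cannot be counted.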
-<span>PAPI provides two levels of interface - a simpler, high level -interface and more detailed low level interface.</span> +>PAPI provides two levels of interface - a simpler, high level +interface and more detailed low level interface. -<span>PAPI can be used with parallel as well as serial programs.</span> +>PAPI can be used with parallel as well as serial programs. Usage ----- To use PAPI, load -[module](../../environment-and-modules.html) <span -class="monospace">papi</span> : +[module](../../environment-and-modules.html) +papi : - $ module load papi + $ module load papi -This will load the default version. Execute <span -class="monospace">module avail papi</span> for a list of installed +This will load the default version. Execute +module avail papi for a list of installed versions. Utilites -------- -The <span class="monospace">bin</span> directory of PAPI (which is -automatically added to <span class="monospace">$PATH</span> upon +The bin directory of PAPI (which is +automatically added to $PATH upon loading the module) contains various utilites. ### papi_avail @@ -46,72 +46,72 @@ Prints which preset events are available on the current CPU. The third column indicated whether the preset event is available on the current CPU. - $ papi_avail - Available events and hardware information. - -------------------------------------------------------------------------------- - PAPI Version : 5.3.2.0 - Vendor string and code : GenuineIntel (1) - Model string and code : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (45) - CPU Revision : 7.000000 - CPUID Info : Family: 6 Model: 45 Stepping: 7 - CPU Max Megahertz : 2601 - CPU Min Megahertz : 1200 - Hdw Threads per core : 1 - Cores per Socket : 8 - Sockets : 2 - NUMA Nodes : 2 - CPUs per Node : 8 - Total CPUs : 16 - Running in a VM : no - Number Hardware Counters : 11 - Max Multiplex Counters : 32 - -------------------------------------------------------------------------------- - Name Code Avail Deriv Description (Note) - PAPI_L1_DCM 0x80000000 Yes No Level 1 data cache misses - PAPI_L1_ICM 0x80000001 Yes No Level 1 instruction cache misses - PAPI_L2_DCM 0x80000002 Yes Yes Level 2 data cache misses - PAPI_L2_ICM 0x80000003 Yes No Level 2 instruction cache misses - PAPI_L3_DCM 0x80000004 No No Level 3 data cache misses - PAPI_L3_ICM 0x80000005 No No Level 3 instruction cache misses - PAPI_L1_TCM 0x80000006 Yes Yes Level 1 cache misses - PAPI_L2_TCM 0x80000007 Yes No Level 2 cache misses - PAPI_L3_TCM 0x80000008 Yes No Level 3 cache misses - .... + $ papi_avail + Available events and hardware information. 
+ -------------------------------------------------------------------------------- + PAPI Version : 5.3.2.0 + Vendor string and code : GenuineIntel (1) + Model string and code : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (45) + CPU Revision : 7.000000 + CPUID Info : Family: 6 Model: 45 Stepping: 7 + CPU Max Megahertz : 2601 + CPU Min Megahertz : 1200 + Hdw Threads per core : 1 + Cores per Socket : 8 + Sockets : 2 + NUMA Nodes : 2 + CPUs per Node : 8 + Total CPUs : 16 + Running in a VM : no + Number Hardware Counters : 11 + Max Multiplex Counters : 32 + -------------------------------------------------------------------------------- + Name Code Avail Deriv Description (Note) + PAPI_L1_DCM 0x80000000 Yes No Level 1 data cache misses + PAPI_L1_ICM 0x80000001 Yes No Level 1 instruction cache misses + PAPI_L2_DCM 0x80000002 Yes Yes Level 2 data cache misses + PAPI_L2_ICM 0x80000003 Yes No Level 2 instruction cache misses + PAPI_L3_DCM 0x80000004 No No Level 3 data cache misses + PAPI_L3_ICM 0x80000005 No No Level 3 instruction cache misses + PAPI_L1_TCM 0x80000006 Yes Yes Level 1 cache misses + PAPI_L2_TCM 0x80000007 Yes No Level 2 cache misses + PAPI_L3_TCM 0x80000008 Yes No Level 3 cache misses + .... ### papi_native_avail -<span>Prints which native events are available on the current -CPU.</span> +>Prints which native events are available on the current +CPU. -### <span class="s1">papi_cost</span> +### class="s1">papi_cost -<span>Measures the cost (in cycles) of basic PAPI operations.</span> +>Measures the cost (in cycles) of basic PAPI operations. -### <span>papi_mem_info</span> +### >papi_mem_info -<span>Prints information about the memory architecture of the current -CPU.</span> +>Prints information about the memory architecture of the current +CPU. PAPI API -------- PAPI provides two kinds of events: -- **Preset events** is a set of predefined common CPU events, - <span>standardized</span> across platforms. -- **Native events **is a set of all events supported by the - current hardware. This is a larger set of features than preset. For - other components than CPU, only native events are usually available. +- **Preset events** is a set of predefined common CPU events, + >standardized across platforms. +- **Native events **is a set of all events supported by the + current hardware. This is a larger set of features than preset. For + other components than CPU, only native events are usually available. To use PAPI in your application, you need to link the appropriate include file. -- <span class="monospace">papi.h</span> for C -- <span class="monospace">f77papi.h</span> for Fortran 77 -- <span class="monospace">f90papi.h</span> for Fortran 90 -- <span class="monospace">fpapi.h</span> for Fortran with preprocessor +- papi.h for C +- f77papi.h for Fortran 77 +- f90papi.h for Fortran 90 +- fpapi.h for Fortran with preprocessor -The include path is automatically added by papi module to <span -class="monospace">$INCLUDE</span>. +The include path is automatically added by papi module to +$INCLUDE. 
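
As an illustration of how the pieces fit together at build time (the source file names are placeholders; the explicit -I flag should only be needed if your compiler does not pick the header path up from the module environment by itself):

```
$ module load papi

# C source that does #include "papi.h"
$ gcc -I$INCLUDE my_papi_prog.c -o my_papi_prog -lpapi

# Fortran 90 source that includes f90papi.h
$ gfortran -I$INCLUDE my_papi_prog.f90 -o my_papi_prog_f -lpapi
```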
@@ -144,60 +144,60 @@ Example The following example prints MFLOPS rate of a naive matrix-matrix multiplication : - #include <stdlib.h> - #include <stdio.h> - #include "papi.h" - #define SIZE 1000 - - int main(int argc, char **argv) { - float matrixa[SIZE][SIZE], matrixb[SIZE][SIZE], mresult[SIZE][SIZE]; - float real_time, proc_time, mflops; - long long flpins; - int retval; - int i,j,k; - - /* Initialize the Matrix arrays */ - for ( i=0; i<SIZE*SIZE; i++ ){ - mresult[0][i] = 0.0; - matrixa[0][i] = matrixb[0][i] = rand()*(float)1.1; - } -  - /* Setup PAPI library and begin collecting data from the counters */ - if((retval=PAPI_flops( &real_time, &proc_time, &flpins, &mflops))<PAPI_OK) - printf("Error!"); - - /* A naive Matrix-Matrix multiplication */ - for (i=0;i<SIZE;i++) - for(j=0;j<SIZE;j++) - for(k=0;k<SIZE;k++) - mresult[i][j]=mresult[i][j] + matrixa[i][k]*matrixb[k][j]; - - /* Collect the data into the variables passed in */ - if((retval=PAPI_flops( &real_time, &proc_time, &flpins, &mflops))<PAPI_OK) - printf("Error!"); - - printf("Real_time:t%fnProc_time:t%fnTotal flpins:t%lldnMFLOPS:tt%fn", real_time, proc_time, flpins, mflops); - PAPI_shutdown(); - return 0; - } + #include <stdlib.h> + #include <stdio.h> + #include "papi.h" + #define SIZE 1000 + + int main(int argc, char **argv) { + float matrixa[SIZE][SIZE], matrixb[SIZE][SIZE], mresult[SIZE][SIZE]; + float real_time, proc_time, mflops; + long long flpins; + int retval; + int i,j,k; + + /* Initialize the Matrix arrays */ + for ( i=0; i<SIZE*SIZE; i++ ){ + mresult[0][i] = 0.0; + matrixa[0][i] = matrixb[0][i] = rand()*(float)1.1; + } +  + /* Setup PAPI library and begin collecting data from the counters */ + if((retval=PAPI_flops( &real_time, &proc_time, &flpins, &mflops))<PAPI_OK) + printf("Error!"); + + /* A naive Matrix-Matrix multiplication */ + for (i=0;i<SIZE;i++) + for(j=0;j<SIZE;j++) + for(k=0;k<SIZE;k++) + mresult[i][j]=mresult[i][j] + matrixa[i][k]*matrixb[k][j]; + + /* Collect the data into the variables passed in */ + if((retval=PAPI_flops( &real_time, &proc_time, &flpins, &mflops))<PAPI_OK) + printf("Error!"); + + printf("Real_time:t%fnProc_time:t%fnTotal flpins:t%lldnMFLOPS:tt%fn", real_time, proc_time, flpins, mflops); + PAPI_shutdown(); + return 0; + }  Now compile and run the example : - $ gcc matrix.c -o matrix -lpapi - $ ./matrix - Real_time: 8.852785 - Proc_time: 8.850000 - Total flpins: 6012390908 - MFLOPS: 679.366211 + $ gcc matrix.c -o matrix -lpapi + $ ./matrix + Real_time: 8.852785 + Proc_time: 8.850000 + Total flpins: 6012390908 + MFLOPS: 679.366211 Let's try with optimizations enabled : - $ gcc -O3 matrix.c -o matrix -lpapi - $ ./matrix - Real_time: 0.000020 - Proc_time: 0.000000 - Total flpins: 6 - MFLOPS: inf + $ gcc -O3 matrix.c -o matrix -lpapi + $ ./matrix + Real_time: 0.000020 + Proc_time: 0.000000 + Total flpins: 6 + MFLOPS: inf Now we see a seemingly strange result - the multiplication took no time and only 6 floating point instructions were issued. This is because the @@ -206,20 +206,20 @@ as the result is actually not used anywhere in the program. We can fix this by adding some "dummy" code at the end of the Matrix-Matrix multiplication routine : - for (i=0; i<SIZE;i++) - for (j=0; j<SIZE; j++) - if (mresult[i][j] == -1.0) printf("x"); + for (i=0; i<SIZE;i++) + for (j=0; j<SIZE; j++) + if (mresult[i][j] == -1.0) printf("x"); Now the compiler won't remove the multiplication loop. (However it is still not that smart to see that the result won't ever be negative). 
Now run the code again: - $ gcc -O3 matrix.c -o matrix -lpapi - $ ./matrix - Real_time: 8.795956 - Proc_time: 8.790000 - Total flpins: 18700983160 - MFLOPS: 2127.529297 + $ gcc -O3 matrix.c -o matrix -lpapi + $ ./matrix + Real_time: 8.795956 + Proc_time: 8.790000 + Total flpins: 18700983160 + MFLOPS: 2127.529297 ### Intel Xeon Phi @@ -229,43 +229,43 @@ operations counter is missing. To use PAPI in [Intel Xeon Phi](../intel-xeon-phi.html) native applications, you -need to load module with "<span class="monospace">-mic</span>" suffix, -for example "<span class="monospace">papi/5.3.2-mic</span>" : +need to load module with " -mic" suffix, +for example " papi/5.3.2-mic" : - $ module load papi/5.3.2-mic + $ module load papi/5.3.2-mic Then, compile your application in the following way: - $ module load intel - $ icc -mmic -Wl,-rpath,/apps/intel/composer_xe_2013.5.192/compiler/lib/mic matrix-mic.c -o matrix-mic -lpapi -lpfm + $ module load intel + $ icc -mmic -Wl,-rpath,/apps/intel/composer_xe_2013.5.192/compiler/lib/mic matrix-mic.c -o matrix-mic -lpapi -lpfm -To execute the application on MIC, you need to manually set <span -class="monospace">LD_LIBRARY_PATH :</span> +To execute the application on MIC, you need to manually set +LD_LIBRARY_PATH : - $ qsub -q qmic -A NONE-0-0 -I - $ ssh mic0 - $ export LD_LIBRARY_PATH=/apps/tools/papi/5.4.0-mic/lib/ - $ ./matrix-mic + $ qsub -q qmic -A NONE-0-0 -I + $ ssh mic0 + $ export LD_LIBRARY_PATH=/apps/tools/papi/5.4.0-mic/lib/ + $ ./matrix-mic -Alternatively, you can link PAPI statically (<span -class="monospace">-static</span> flag), then <span -class="monospace">LD_LIBRARY_PATH</span> does not need to be set. +Alternatively, you can link PAPI statically ( +-static flag), then +LD_LIBRARY_PATH does not need to be set. You can also execute the PAPI tools on MIC : - $ /apps/tools/papi/5.4.0-mic/bin/papi_native_avail + $ /apps/tools/papi/5.4.0-mic/bin/papi_native_avail To use PAPI in offload mode, you need to provide both host and MIC versions of PAPI: - $ module load papi/5.4.0 - $ icc matrix-offload.c -o matrix-offload -offload-option,mic,compiler,"-L$PAPI_HOME-mic/lib -lpapi" -lpapi + $ module load papi/5.4.0 + $ icc matrix-offload.c -o matrix-offload -offload-option,mic,compiler,"-L$PAPI_HOME-mic/lib -lpapi" -lpapi References ---------- -1. <http://icl.cs.utk.edu/papi/> Main project page -2. <http://icl.cs.utk.edu/projects/papi/wiki/Main_Page> Wiki -3. 
<http://icl.cs.utk.edu/papi/docs/> API Documentation
+1. <http://icl.cs.utk.edu/papi/> Main project page
+2. <http://icl.cs.utk.edu/projects/papi/wiki/Main_Page> Wiki
+3. <http://icl.cs.utk.edu/papi/docs/> API Documentation
 
 
diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/scalasca.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/scalasca.md
index 19c86697b7ff098e5873e7df828157d94fa0612b..3fff2cf0633fabe011e55750d93fddf33b904966 100644
--- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/scalasca.md
+++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/scalasca.md
@@ -1,7 +1,7 @@
 Scalasca
 ========
 
-<span>Introduction</span>
+Introduction
 -------------------------
 
 [Scalasca](http://www.scalasca.org/) is a software tool
@@ -21,45 +21,45 @@ There are currently two versions of Scalasca 2.0
 [modules](../../environment-and-modules.html) installed on
 Anselm:
 
-- <span class="s1"><span
-  class="monospace">scalasca2/2.0-gcc-openmpi</span>, for usage with
-  [GNU Compiler](../compilers.html) and
-  [OpenMPI](../mpi-1/Running_OpenMPI.html),</span>
+- scalasca2/2.0-gcc-openmpi, for usage with
+  [GNU Compiler](../compilers.html) and
+  [OpenMPI](../mpi-1/Running_OpenMPI.html),
 
-- <span class="s1"><span
-  class="monospace">scalasca2/2.0-icc-impi</span>, for usage with
-  [Intel Compiler](../compilers.html) and [Intel
-  MPI](../mpi-1/running-mpich2.html).</span>
+- scalasca2/2.0-icc-impi, for usage with
+  [Intel Compiler](../compilers.html) and [Intel
+  MPI](../mpi-1/running-mpich2.html).
 
 Usage
 -----
 
 Profiling a parallel application with Scalasca consists of three steps:
 
-1. Instrumentation, compiling the application such way, that the
-   profiling data can be generated.
-2. Runtime measurement, running the application with the Scalasca
-   profiler to collect performance data.
-3. Analysis of reports
+1. Instrumentation, compiling the application in such a way that the
+   profiling data can be generated.
+2. Runtime measurement, running the application with the Scalasca
+   profiler to collect performance data.
+3. Analysis of reports
 
 ### Instrumentation
 
-Instrumentation via "<span class="monospace">scalasca
--instrument</span>" is discouraged. Use [Score-P
+Instrumentation via "scalasca -instrument" is discouraged. Use [Score-P
 instrumentation](score-p.html).
 
 ### Runtime measurement
 
 After the application is instrumented, runtime measurement can be
-performed with the "<span class="monospace">scalasca -analyze</span>"
+performed with the "scalasca -analyze"
 command. The syntax is :
 
-<span class="monospace">scalasca -analyze [scalasca options]
-[launcher] [launcher options] [program] [program options]</span>
+    scalasca -analyze [scalasca options] [launcher] [launcher options] [program] [program options]
 
 An example :
 
-    $ scalasca -analyze mpirun -np 4 ./mympiprogram
+     $ scalasca -analyze mpirun -np 4 ./mympiprogram
 
 Some notable Scalasca options are:
 
@@ -67,10 +67,10 @@ Some notable Scalsca options are:
 collected.
 -e <directory> Specify a directory to save the collected data to. By
 default, Scalasca saves the data to a directory with
-prefix <span>scorep_, followed by name of the executable and launch
-configuration.</span>
-<span>
-</span>
+prefix scorep_, followed by the name of the executable and launch
+configuration.
+
 Scalasca can generate a huge amount of data, especially if tracing is
 enabled. Please consider saving the data to a [scratch
 directory](../../storage.html).
@@ -84,13 +84,13 @@ tool is launched.
 
 To launch the analysis, run :
 
-``` 
+```
scalasca -examine [options] <experiment_directory>
```
 
 If you do not wish to launch the GUI tool, use the "-s" option :
 
-``` 
+```
scalasca -examine -s <experiment_directory>
```
 
@@ -104,6 +104,6 @@ GUI viewer.
 References
 ----------
 
-1. <http://www.scalasca.org/>
+1. <http://www.scalasca.org/>
 
 
diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/score-p.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/score-p.md
index b4cd5497a1a992f899849353d20d6fd719a8884c..15482da4c33f86faf39d5330f7fe7abb1c5385b3 100644
--- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/score-p.md
+++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/score-p.md
@@ -19,24 +19,24 @@ There are currently two versions of Score-P version 1.2.6
 [modules](../../environment-and-modules.html) installed on
 Anselm :
 
-- <span class="s1">scorep/1.2.3-gcc-openmpi<span>, for usage
-  with </span>[GNU
-  Compiler](../compilers.html)<span> and </span>[OpenMPI](../mpi-1/Running_OpenMPI.html)<span>,</span></span>
+- scorep/1.2.3-gcc-openmpi, for usage
+  with [GNU
+  Compiler](../compilers.html) and [OpenMPI](../mpi-1/Running_OpenMPI.html),
 
-- <span class="s1">scorep/1.2.3-icc-impi<span>, for usage
-  with </span>[Intel
-  Compiler](../compilers.html)<span> and </span>[Intel
-  MPI](../mpi-1/running-mpich2.html)<span>.</span></span>
+- scorep/1.2.3-icc-impi, for usage
+  with [Intel
+  Compiler](../compilers.html) and [Intel
+  MPI](../mpi-1/running-mpich2.html).
 
 Instrumentation
 ---------------
 
-<span>There are three ways to instrument your parallel applications in
-order to enable performance data collection :</span>
+There are three ways to instrument your parallel applications in
+order to enable performance data collection:
 
-1. <span>Automated instrumentation using compiler</span>
-2. <span>Manual instrumentation using API calls</span>
-3. <span>Manual instrumentation using directives</span>
+1. Automated instrumentation using the compiler
+2. Manual instrumentation using API calls
+3. Manual instrumentation using directives
 
 ### Automated instrumentation
 
@@ -45,11 +45,11 @@ every routine entry and exit using compiler hooks, and will intercept
 MPI calls and OpenMP regions. This method might, however, produce a
 large amount of data. If you want to focus on profiling specific
 regions of your code, consider using the manual instrumentation methods.
-To use automated instrumentation, simply prepend <span
-class="monospace">scorep</span> to your compilation command. For
+To use automated instrumentation, simply prepend scorep to your
+compilation command. For
 example, replace :
 
-``` 
+```
$ mpif90 -c foo.f90
$ mpif90 -c bar.f90
$ mpif90 -o myapp foo.o bar.o
```
 
 with :
 
-``` 
-$ scorep mpif90 -c foo.f90
-$ scorep mpif90 -c bar.f90
-$ scorep mpif90 -o myapp foo.o bar.o
+```
+$ scorep mpif90 -c foo.f90
+$ scorep mpif90 -c bar.f90
+$ scorep mpif90 -o myapp foo.o bar.o
 ```
 
 Usually your program is compiled using a Makefile or similar script, so
-it advisable to add the <span class="monospace">scorep</span> command to
-your definition of variables <span class="monospace">CC</span>, <span
-class="monospace">CXX</span>, <span class="monospace">FCC</span> etc.
+it is advisable to add the scorep command to
+your definition of the variables CC,
+CXX, FCC, etc.
-It is important that <span class="monospace">scorep</span> is prepended +It is important that scorep is prepended also to the linking command, in order to link with Score-P instrumentation libraries. -### <span>Manual instrumentation using API calls</span> +### >Manual instrumentation using API calls -To use this kind of instrumentation, use <span -class="monospace">scorep</span> with switch <span -class="monospace">--user</span>. You will then mark regions to be +To use this kind of instrumentation, use +scorep with switch +--user. You will then mark regions to be instrumented by inserting API calls. An example in C/C++ : - #include <scorep/SCOREP_User.h> - void foo() - { - SCOREP_USER_REGION_DEFINE( my_region_handle ) - // more declarations - SCOREP_USER_REGION_BEGIN( my_region_handle, "foo", SCOREP_USER_REGION_TYPE_COMMON ) - // do something - SCOREP_USER_REGION_END( my_region_handle ) - } + #include <scorep/SCOREP_User.h> + void foo() + { + SCOREP_USER_REGION_DEFINE( my_region_handle ) + // more declarations + SCOREP_USER_REGION_BEGIN( my_region_handle, "foo", SCOREP_USER_REGION_TYPE_COMMON ) + // do something + SCOREP_USER_REGION_END( my_region_handle ) + }  and Fortran : - #include "scorep/SCOREP_User.inc" - subroutine foo - SCOREP_USER_REGION_DEFINE( my_region_handle ) - ! more declarations - SCOREP_USER_REGION_BEGIN( my_region_handle, "foo", SCOREP_USER_REGION_TYPE_COMMON ) - ! do something - SCOREP_USER_REGION_END( my_region_handle ) - end subroutine foo + #include "scorep/SCOREP_User.inc" + subroutine foo + SCOREP_USER_REGION_DEFINE( my_region_handle ) + ! more declarations + SCOREP_USER_REGION_BEGIN( my_region_handle, "foo", SCOREP_USER_REGION_TYPE_COMMON ) + ! do something + SCOREP_USER_REGION_END( my_region_handle ) + end subroutine foo Please refer to the [documentation for description of the API](https://silc.zih.tu-dresden.de/scorep-current/pdf/scorep.pdf). -### <span>Manual instrumentation using directives</span> +### >Manual instrumentation using directives This method uses POMP2 directives to mark regions to be instrumented. To -use this method, use command <span class="monospace">scorep ---pomp.</span> +use this method, use command scorep +--pomp. Example directives in C/C++ : - void foo(...) - { - /* declarations */ - #pragma pomp inst begin(foo) - ... - if (<condition>) - { - #pragma pomp inst altend(foo) - return; - } - ... - #pragma pomp inst end(foo) - } - -<span>and in Fortran :</span> - - subroutine foo(...) - !declarations - !POMP$ INST BEGIN(foo) - ... - if (<condition>) then - !POMP$ INST ALTEND(foo) - return - end if - ... - !POMP$ INST END(foo) - end subroutine foo + void foo(...) + { + /* declarations */ + #pragma pomp inst begin(foo) + ... + if (<condition>) + { + #pragma pomp inst altend(foo) + return; + } + ... + #pragma pomp inst end(foo) + } + +>and in Fortran : + + subroutine foo(...) + !declarations + !POMP$ INST BEGIN(foo) + ... + if (<condition>) then + !POMP$ INST ALTEND(foo) + return + end if + ... + !POMP$ INST END(foo) + end subroutine foo The directives are ignored if the program is compiled without Score-P. 
Again, please refer to the diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/summary.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/summary.md index fc017fcdd5ab9dcca129c4bb4c837cb51a6eb729..b60cde8ff14bcce56f9f8f1c2ed8fa29489d5c25 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/summary.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/summary.md @@ -3,7 +3,7 @@ Debuggers and profilers summary - + Introduction ------------ @@ -23,8 +23,8 @@ environment. Use [X display](https://docs.it4i.cz/anselm-cluster-documentation/software/debuggers/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) for running the GUI. - $ module load intel - $ idb + $ module load intel + $ idb Read more at the [Intel Debugger](../intel-suite/intel-debugger.html) page. @@ -40,8 +40,8 @@ every thread running as part of your program, or for every process - even if these processes are distributed across a cluster using an MPI implementation. - $ module load Forge - $ forge + $ module load Forge + $ forge Read more at the [Allinea DDT](allinea-ddt.html) page. @@ -55,8 +55,8 @@ about several metrics along with clear behavior statements and hints to help you improve the efficiency of your runs. Our license is limited to 64 MPI processes. - $ module load PerformanceReports/6.0 - $ perf-report mpirun -n 64 ./my_application argument01 argument02 + $ module load PerformanceReports/6.0 + $ perf-report mpirun -n 64 ./my_application argument01 argument02 Read more at the [Allinea Performance Reports](allinea-performance-reports.html) page. @@ -70,8 +70,8 @@ analyze, organize, and test programs, making it easy to isolate and identify problems in individual threads and processes in programs of great complexity. - $ module load totalview - $ totalview + $ module load totalview + $ totalview Read more at the [Totalview](total-view.html) page. @@ -80,8 +80,8 @@ Vampir trace analyzer Vampir is a GUI trace analyzer for traces in OTF format. - $ module load Vampir/8.5.0 - $ vampir + $ module load Vampir/8.5.0 + $ vampir Read more at the [Vampir](../../../salomon/software/debuggers/vampir.html) page. diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/total-view.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/total-view.md index 8361a39a1c9dc08522c209990301632b30b603d3..1474446d0f068975d7144bd1ac25d6dc9abbfad7 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/total-view.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/total-view.md @@ -16,15 +16,15 @@ Debugging of GPU accelerated codes is also supported. 
You can check the status of the licenses here: - cat /apps/user/licenses/totalview_features_state.txt + cat /apps/user/licenses/totalview_features_state.txt - # totalview - # ------------------------------------------------- - # FEATURE                      TOTAL  USED AVAIL - # ------------------------------------------------- - TotalView_Team                    64     0    64 - Replay                            64     0    64 - CUDA                              64     0    64 + # totalview + # ------------------------------------------------- + # FEATURE                      TOTAL  USED AVAIL + # ------------------------------------------------- + TotalView_Team                    64     0    64 + Replay                            64     0    64 + CUDA                              64     0    64 Compiling Code to run with TotalView ------------------------------------ @@ -33,29 +33,29 @@ Compiling Code to run with TotalView Load all necessary modules to compile the code. For example: - module load intel + module load intel - module load impi  ... or ... module load openmpi/X.X.X-icc + module load impi  ... or ... module load openmpi/X.X.X-icc Load the TotalView module: - module load totalview/8.12 + module load totalview/8.12 Compile the code: - mpicc -g -O0 -o test_debug test.c + mpicc -g -O0 -o test_debug test.c - mpif90 -g -O0 -o test_debug test.f + mpif90 -g -O0 -o test_debug test.f ### Compiler flags Before debugging, you need to compile your code with theses flags: -**-g** : Generates extra debugging information usable by GDB. -g3 +-g** : Generates extra debugging information usable by GDB. -g3 includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers. -**-O0** : Suppress all optimizations. +-O0** : Suppress all optimizations. Starting a Job with TotalView ----------------------------- @@ -63,7 +63,7 @@ Starting a Job with TotalView Be sure to log in with an X window forwarding enabled. This could mean using the -X in the ssh: - ssh -X username@anselm.it4i.cz + ssh -X username@anselm.it4i.cz Other options is to access login node using VNC. Please see the detailed information on how to use graphic user interface on Anselm @@ -72,7 +72,7 @@ information on how to use graphic user interface on Anselm From the login node an interactive session with X windows forwarding (-X option) can be started by following command: - qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=16:mpiprocs=16,walltime=01:00:00 + qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=16:mpiprocs=16,walltime=01:00:00 Then launch the debugger with the totalview command followed by the name of the executable to debug. @@ -81,49 +81,49 @@ of the executable to debug. 
To debug a serial code use: - totalview test_debug + totalview test_debug ### Debugging a parallel code - option 1 -To debug a parallel code compiled with <span>**OpenMPI**</span> you need +To debug a parallel code compiled with >**OpenMPI** you need to setup your TotalView environment: -**Please note:** To be able to run parallel debugging procedure from the +Please note:** To be able to run parallel debugging procedure from the command line without stopping the debugger in the mpiexec source code you have to add the following function to your **~/.tvdrc** file: - proc mpi_auto_run_starter { -    set starter_programs -    set executable_name [TV::symbol get $loaded_id full_pathname] -    set file_component [file tail $executable_name] + proc mpi_auto_run_starter { +    set starter_programs +    set executable_name [TV::symbol get $loaded_id full_pathname] +    set file_component [file tail $executable_name] -    if {[lsearch -exact $starter_programs $file_component] != -1} { -        puts "**************************************" -        puts "Automatically starting $file_component" -        puts "**************************************" -        dgo -    } - } +    if {[lsearch -exact $starter_programs $file_component] != -1} { +        puts "**************************************" +        puts "Automatically starting $file_component" +        puts "**************************************" +        dgo +    } + } - # Append this function to TotalView's image load callbacks so that - # TotalView run this program automatically. + # Append this function to TotalView's image load callbacks so that + # TotalView run this program automatically. - dlappend TV::image_load_callbacks mpi_auto_run_starter + dlappend TV::image_load_callbacks mpi_auto_run_starter The source code of this function can be also found in - /apps/mpi/openmpi/intel/1.6.5/etc/openmpi-totalview.tcl + /apps/mpi/openmpi/intel/1.6.5/etc/openmpi-totalview.tcl You can also add only following line to you ~/.tvdrc file instead of the entire function: -**source /apps/mpi/openmpi/intel/1.6.5/etc/openmpi-totalview.tcl** +source /apps/mpi/openmpi/intel/1.6.5/etc/openmpi-totalview.tcl** You need to do this step only once. Now you can run the parallel debugger using: - mpirun -tv -n 5 ./test_debug + mpirun -tv -n 5 ./test_debug When following dialog appears click on "Yes" @@ -143,9 +143,9 @@ to specify a MPI implementation used to compile the source code. The following example shows how to start debugging session with Intel MPI: - module load intel/13.5.192 impi/4.1.1.036 totalview/8/13 + module load intel/13.5.192 impi/4.1.1.036 totalview/8/13 - totalview -mpi "Intel MPI-Hydra" -np 8 ./hello_debug_impi + totalview -mpi "Intel MPI-Hydra" -np 8 ./hello_debug_impi After running previous command you will see the same window as shown in the screenshot above. diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/valgrind.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/valgrind.md index b1aa57fc54b1f17728c7f917a0a56b5eaa4d6de5..d76f3cd3e72ce9fb37ace70b8fec131984150e4e 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/valgrind.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/valgrind.md @@ -19,271 +19,271 @@ Valgrind run 5-100 times slower. The main tools available in Valgrind are : -- **Memcheck**, the original, must used and default tool. 
Verifies - memory access in you program and can detect use of unitialized - memory, out of bounds memory access, memory leaks, double free, etc. -- **Massif**, a heap profiler. -- **Hellgrind** and **DRD** can detect race conditions in - multi-threaded applications. -- **Cachegrind**, a cache profiler. -- **Callgrind**, a callgraph analyzer. -- For a full list and detailed documentation, please refer to the - [official Valgrind - documentation](http://valgrind.org/docs/). +- **Memcheck**, the original, must used and default tool. Verifies + memory access in you program and can detect use of unitialized + memory, out of bounds memory access, memory leaks, double free, etc. +- **Massif**, a heap profiler. +- **Hellgrind** and **DRD** can detect race conditions in + multi-threaded applications. +- **Cachegrind**, a cache profiler. +- **Callgrind**, a callgraph analyzer. +- For a full list and detailed documentation, please refer to the + [official Valgrind + documentation](http://valgrind.org/docs/). Installed versions ------------------ There are two versions of Valgrind available on Anselm. -- <span>Version 3.6.0, installed by operating system vendor - in </span><span class="monospace">/usr/bin/valgrind. - </span><span>This version is available by default, without the need - to load any module. This version however does not provide additional - MPI support.</span> -- <span><span>Version 3.9.0 with support for Intel MPI, available in - [module](../../environment-and-modules.html) </span></span><span - class="monospace">valgrind/3.9.0-impi. </span>After loading the - module, this version replaces the default valgrind. +- >Version 3.6.0, installed by operating system vendor + in /usr/bin/valgrind. + >This version is available by default, without the need + to load any module. This version however does not provide additional + MPI support. +- >>Version 3.9.0 with support for Intel MPI, available in + [module](../../environment-and-modules.html) + valgrind/3.9.0-impi. After loading the + module, this version replaces the default valgrind. Usage ----- Compile the application which you want to debug as usual. It is -advisable to add compilation flags <span class="monospace">-g </span>(to +advisable to add compilation flags -g (to add debugging information to the binary so that you will see original -source code lines in the output) and <span class="monospace">-O0</span> +source code lines in the output) and -O0 (to disable compiler optimizations). For example, lets look at this C code, which has two problems : - #include <stdlib.h> + #include <stdlib.h> - void f(void) - { - int* x = malloc(10 * sizeof(int)); - x[10] = 0; // problem 1: heap block overrun - } // problem 2: memory leak -- x not freed + void f(void) + { + int* x = malloc(10 * sizeof(int)); + x[10] = 0; // problem 1: heap block overrun + } // problem 2: memory leak -- x not freed - int main(void) - { - f(); - return 0; - } + int main(void) + { + f(); + return 0; + } Now, compile it with Intel compiler : - $ module add intel - $ icc -g valgrind-example.c -o valgrind-example + $ module add intel + $ icc -g valgrind-example.c -o valgrind-example Now, lets run it with Valgrind. The syntax is : -<span class="monospace">valgrind [valgrind options] <your program -binary> [your program options]</span> + valgrind [valgrind options] <your program +binary> [your program options] If no Valgrind options are specified, Valgrind defaults to running Memcheck tool. 
Please refer to the Valgrind documentation for a full description of command line options. - $ valgrind ./valgrind-example - ==12652== Memcheck, a memory error detector - ==12652== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. - ==12652== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info - ==12652== Command: ./valgrind-example - ==12652== - ==12652== Invalid write of size 4 - ==12652== at 0x40053E: f (valgrind-example.c:6) - ==12652== by 0x40054E: main (valgrind-example.c:11) - ==12652== Address 0x5861068 is 0 bytes after a block of size 40 alloc'd - ==12652== at 0x4C27AAA: malloc (vg_replace_malloc.c:291) - ==12652== by 0x400528: f (valgrind-example.c:5) - ==12652== by 0x40054E: main (valgrind-example.c:11) - ==12652== - ==12652== - ==12652== HEAP SUMMARY: - ==12652== in use at exit: 40 bytes in 1 blocks - ==12652== total heap usage: 1 allocs, 0 frees, 40 bytes allocated - ==12652== - ==12652== LEAK SUMMARY: - ==12652== definitely lost: 40 bytes in 1 blocks - ==12652== indirectly lost: 0 bytes in 0 blocks - ==12652== possibly lost: 0 bytes in 0 blocks - ==12652== still reachable: 0 bytes in 0 blocks - ==12652== suppressed: 0 bytes in 0 blocks - ==12652== Rerun with --leak-check=full to see details of leaked memory - ==12652== - ==12652== For counts of detected and suppressed errors, rerun with: -v - ==12652== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 6 from 6) + $ valgrind ./valgrind-example + ==12652== Memcheck, a memory error detector + ==12652== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. + ==12652== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info + ==12652== Command: ./valgrind-example + ==12652== + ==12652== Invalid write of size 4 + ==12652== at 0x40053E: f (valgrind-example.c:6) + ==12652== by 0x40054E: main (valgrind-example.c:11) + ==12652== Address 0x5861068 is 0 bytes after a block of size 40 alloc'd + ==12652== at 0x4C27AAA: malloc (vg_replace_malloc.c:291) + ==12652== by 0x400528: f (valgrind-example.c:5) + ==12652== by 0x40054E: main (valgrind-example.c:11) + ==12652== + ==12652== + ==12652== HEAP SUMMARY: + ==12652== in use at exit: 40 bytes in 1 blocks + ==12652== total heap usage: 1 allocs, 0 frees, 40 bytes allocated + ==12652== + ==12652== LEAK SUMMARY: + ==12652== definitely lost: 40 bytes in 1 blocks + ==12652== indirectly lost: 0 bytes in 0 blocks + ==12652== possibly lost: 0 bytes in 0 blocks + ==12652== still reachable: 0 bytes in 0 blocks + ==12652== suppressed: 0 bytes in 0 blocks + ==12652== Rerun with --leak-check=full to see details of leaked memory + ==12652== + ==12652== For counts of detected and suppressed errors, rerun with: -v + ==12652== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 6 from 6) In the output we can see that Valgrind has detected both errors - the off-by-one memory access at line 5 and a memory leak of 40 bytes. If we want a detailed analysis of the memory leak, we need to run Valgrind -with <span class="monospace">--leak-check=full</span> option : - - $ valgrind --leak-check=full ./valgrind-example - ==23856== Memcheck, a memory error detector - ==23856== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al. 
- ==23856== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info - ==23856== Command: ./valgrind-example - ==23856== - ==23856== Invalid write of size 4 - ==23856== at 0x40067E: f (valgrind-example.c:6) - ==23856== by 0x40068E: main (valgrind-example.c:11) - ==23856== Address 0x66e7068 is 0 bytes after a block of size 40 alloc'd - ==23856== at 0x4C26FDE: malloc (vg_replace_malloc.c:236) - ==23856== by 0x400668: f (valgrind-example.c:5) - ==23856== by 0x40068E: main (valgrind-example.c:11) - ==23856== - ==23856== - ==23856== HEAP SUMMARY: - ==23856== in use at exit: 40 bytes in 1 blocks - ==23856== total heap usage: 1 allocs, 0 frees, 40 bytes allocated - ==23856== - ==23856== 40 bytes in 1 blocks are definitely lost in loss record 1 of 1 - ==23856== at 0x4C26FDE: malloc (vg_replace_malloc.c:236) - ==23856== by 0x400668: f (valgrind-example.c:5) - ==23856== by 0x40068E: main (valgrind-example.c:11) - ==23856== - ==23856== LEAK SUMMARY: - ==23856== definitely lost: 40 bytes in 1 blocks - ==23856== indirectly lost: 0 bytes in 0 blocks - ==23856== possibly lost: 0 bytes in 0 blocks - ==23856== still reachable: 0 bytes in 0 blocks - ==23856== suppressed: 0 bytes in 0 blocks - ==23856== - ==23856== For counts of detected and suppressed errors, rerun with: -v - ==23856== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 6 from 6) - -Now we can see that the memory leak is due to the <span -class="monospace">malloc()</span> at line 6. - -<span>Usage with MPI</span> +with --leak-check=full option : + + $ valgrind --leak-check=full ./valgrind-example + ==23856== Memcheck, a memory error detector + ==23856== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al. + ==23856== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info + ==23856== Command: ./valgrind-example + ==23856== + ==23856== Invalid write of size 4 + ==23856== at 0x40067E: f (valgrind-example.c:6) + ==23856== by 0x40068E: main (valgrind-example.c:11) + ==23856== Address 0x66e7068 is 0 bytes after a block of size 40 alloc'd + ==23856== at 0x4C26FDE: malloc (vg_replace_malloc.c:236) + ==23856== by 0x400668: f (valgrind-example.c:5) + ==23856== by 0x40068E: main (valgrind-example.c:11) + ==23856== + ==23856== + ==23856== HEAP SUMMARY: + ==23856== in use at exit: 40 bytes in 1 blocks + ==23856== total heap usage: 1 allocs, 0 frees, 40 bytes allocated + ==23856== + ==23856== 40 bytes in 1 blocks are definitely lost in loss record 1 of 1 + ==23856== at 0x4C26FDE: malloc (vg_replace_malloc.c:236) + ==23856== by 0x400668: f (valgrind-example.c:5) + ==23856== by 0x40068E: main (valgrind-example.c:11) + ==23856== + ==23856== LEAK SUMMARY: + ==23856== definitely lost: 40 bytes in 1 blocks + ==23856== indirectly lost: 0 bytes in 0 blocks + ==23856== possibly lost: 0 bytes in 0 blocks + ==23856== still reachable: 0 bytes in 0 blocks + ==23856== suppressed: 0 bytes in 0 blocks + ==23856== + ==23856== For counts of detected and suppressed errors, rerun with: -v + ==23856== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 6 from 6) + +Now we can see that the memory leak is due to the +malloc() at line 6. + +>Usage with MPI --------------------------- Although Valgrind is not primarily a parallel debugger, it can be used to debug parallel applications as well. When launching your parallel applications, prepend the valgrind command. 
For example : - $ mpirun -np 4 valgrind myapplication + $ mpirun -np 4 valgrind myapplication The default version without MPI support will however report a large number of false errors in the MPI library, such as : - ==30166== Conditional jump or move depends on uninitialised value(s) - ==30166== at 0x4C287E8: strlen (mc_replace_strmem.c:282) - ==30166== by 0x55443BD: I_MPI_Processor_model_number (init_interface.c:427) - ==30166== by 0x55439E0: I_MPI_Processor_arch_code (init_interface.c:171) - ==30166== by 0x558D5AE: MPID_nem_impi_init_shm_configuration (mpid_nem_impi_extensions.c:1091) - ==30166== by 0x5598F4C: MPID_nem_init_ckpt (mpid_nem_init.c:566) - ==30166== by 0x5598B65: MPID_nem_init (mpid_nem_init.c:489) - ==30166== by 0x539BD75: MPIDI_CH3_Init (ch3_init.c:64) - ==30166== by 0x5578743: MPID_Init (mpid_init.c:193) - ==30166== by 0x554650A: MPIR_Init_thread (initthread.c:539) - ==30166== by 0x553369F: PMPI_Init (init.c:195) - ==30166== by 0x4008BD: main (valgrind-example-mpi.c:18) + ==30166== Conditional jump or move depends on uninitialised value(s) + ==30166== at 0x4C287E8: strlen (mc_replace_strmem.c:282) + ==30166== by 0x55443BD: I_MPI_Processor_model_number (init_interface.c:427) + ==30166== by 0x55439E0: I_MPI_Processor_arch_code (init_interface.c:171) + ==30166== by 0x558D5AE: MPID_nem_impi_init_shm_configuration (mpid_nem_impi_extensions.c:1091) + ==30166== by 0x5598F4C: MPID_nem_init_ckpt (mpid_nem_init.c:566) + ==30166== by 0x5598B65: MPID_nem_init (mpid_nem_init.c:489) + ==30166== by 0x539BD75: MPIDI_CH3_Init (ch3_init.c:64) + ==30166== by 0x5578743: MPID_Init (mpid_init.c:193) + ==30166== by 0x554650A: MPIR_Init_thread (initthread.c:539) + ==30166== by 0x553369F: PMPI_Init (init.c:195) + ==30166== by 0x4008BD: main (valgrind-example-mpi.c:18) so it is better to use the MPI-enabled valgrind from module. The MPI -version requires library <span -class="monospace">/apps/tools/valgrind/3.9.0/impi/lib/valgrind/libmpiwrap-amd64-linux.so</span>, -which must be included in the<span class="monospace"> LD_PRELOAD -</span>environment variable. +version requires library +/apps/tools/valgrind/3.9.0/impi/lib/valgrind/libmpiwrap-amd64-linux.so, +which must be included in the LD_PRELOAD +environment variable. Lets look at this MPI example : - #include <stdlib.h> - #include <mpi.h> + #include <stdlib.h> + #include <mpi.h> - int main(int argc, char *argv[]) - { -      int *data = malloc(sizeof(int)*99); + int main(int argc, char *argv[]) + { +      int *data = malloc(sizeof(int)*99); -      MPI_Init(&argc, &argv); -     MPI_Bcast(data, 100, MPI_INT, 0, MPI_COMM_WORLD); -      MPI_Finalize(); +      MPI_Init(&argc, &argv); +     MPI_Bcast(data, 100, MPI_INT, 0, MPI_COMM_WORLD); +      MPI_Finalize(); -        return 0; - } +        return 0; + } There are two errors - use of uninitialized memory and invalid length of the buffer. 
Lets debug it with valgrind : - $ module add intel impi - $ mpicc -g valgrind-example-mpi.c -o valgrind-example-mpi - $ module add valgrind/3.9.0-impi - $ mpirun -np 2 -env LD_PRELOAD /apps/tools/valgrind/3.9.0/impi/lib/valgrind/libmpiwrap-amd64-linux.so valgrind ./valgrind-example-mpi + $ module add intel impi + $ mpicc -g valgrind-example-mpi.c -o valgrind-example-mpi + $ module add valgrind/3.9.0-impi + $ mpirun -np 2 -env LD_PRELOAD /apps/tools/valgrind/3.9.0/impi/lib/valgrind/libmpiwrap-amd64-linux.so valgrind ./valgrind-example-mpi Prints this output : (note that there is output printed for every launched MPI process) - ==31318== Memcheck, a memory error detector - ==31318== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. - ==31318== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info - ==31318== Command: ./valgrind-example-mpi - ==31318== - ==31319== Memcheck, a memory error detector - ==31319== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. - ==31319== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info - ==31319== Command: ./valgrind-example-mpi - ==31319== - valgrind MPI wrappers 31319: Active for pid 31319 - valgrind MPI wrappers 31319: Try MPIWRAP_DEBUG=help for possible options - valgrind MPI wrappers 31318: Active for pid 31318 - valgrind MPI wrappers 31318: Try MPIWRAP_DEBUG=help for possible options - ==31319== Unaddressable byte(s) found during client check request - ==31319== at 0x4E35974: check_mem_is_addressable_untyped (libmpiwrap.c:960) - ==31319== by 0x4E5D0FE: PMPI_Bcast (libmpiwrap.c:908) - ==31319== by 0x400911: main (valgrind-example-mpi.c:20) - ==31319== Address 0x69291cc is 0 bytes after a block of size 396 alloc'd - ==31319== at 0x4C27AAA: malloc (vg_replace_malloc.c:291) - ==31319== by 0x4007BC: main (valgrind-example-mpi.c:8) - ==31319== - ==31318== Uninitialised byte(s) found during client check request - ==31318== at 0x4E3591D: check_mem_is_defined_untyped (libmpiwrap.c:952) - ==31318== by 0x4E5D06D: PMPI_Bcast (libmpiwrap.c:908) - ==31318== by 0x400911: main (valgrind-example-mpi.c:20) - ==31318== Address 0x6929040 is 0 bytes inside a block of size 396 alloc'd - ==31318== at 0x4C27AAA: malloc (vg_replace_malloc.c:291) - ==31318== by 0x4007BC: main (valgrind-example-mpi.c:8) - ==31318== - ==31318== Unaddressable byte(s) found during client check request - ==31318== at 0x4E3591D: check_mem_is_defined_untyped (libmpiwrap.c:952) - ==31318== by 0x4E5D06D: PMPI_Bcast (libmpiwrap.c:908) - ==31318== by 0x400911: main (valgrind-example-mpi.c:20) - ==31318== Address 0x69291cc is 0 bytes after a block of size 396 alloc'd - ==31318== at 0x4C27AAA: malloc (vg_replace_malloc.c:291) - ==31318== by 0x4007BC: main (valgrind-example-mpi.c:8) - ==31318== - ==31318== - ==31318== HEAP SUMMARY: - ==31318== in use at exit: 3,172 bytes in 67 blocks - ==31318== total heap usage: 191 allocs, 124 frees, 81,203 bytes allocated - ==31318== - ==31319== - ==31319== HEAP SUMMARY: - ==31319== in use at exit: 3,172 bytes in 67 blocks - ==31319== total heap usage: 175 allocs, 108 frees, 48,435 bytes allocated - ==31319== - ==31318== LEAK SUMMARY: - ==31318== definitely lost: 408 bytes in 3 blocks - ==31318== indirectly lost: 256 bytes in 1 blocks - ==31318== possibly lost: 0 bytes in 0 blocks - ==31318== still reachable: 2,508 bytes in 63 blocks - ==31318== suppressed: 0 bytes in 0 blocks - ==31318== Rerun with --leak-check=full to see details of leaked memory - ==31318== - ==31318== For counts of detected and suppressed errors, 
rerun with: -v - ==31318== Use --track-origins=yes to see where uninitialised values come from - ==31318== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 4 from 4) - ==31319== LEAK SUMMARY: - ==31319== definitely lost: 408 bytes in 3 blocks - ==31319== indirectly lost: 256 bytes in 1 blocks - ==31319== possibly lost: 0 bytes in 0 blocks - ==31319== still reachable: 2,508 bytes in 63 blocks - ==31319== suppressed: 0 bytes in 0 blocks - ==31319== Rerun with --leak-check=full to see details of leaked memory - ==31319== - ==31319== For counts of detected and suppressed errors, rerun with: -v - ==31319== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4) + ==31318== Memcheck, a memory error detector + ==31318== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. + ==31318== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info + ==31318== Command: ./valgrind-example-mpi + ==31318== + ==31319== Memcheck, a memory error detector + ==31319== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. + ==31319== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info + ==31319== Command: ./valgrind-example-mpi + ==31319== + valgrind MPI wrappers 31319: Active for pid 31319 + valgrind MPI wrappers 31319: Try MPIWRAP_DEBUG=help for possible options + valgrind MPI wrappers 31318: Active for pid 31318 + valgrind MPI wrappers 31318: Try MPIWRAP_DEBUG=help for possible options + ==31319== Unaddressable byte(s) found during client check request + ==31319== at 0x4E35974: check_mem_is_addressable_untyped (libmpiwrap.c:960) + ==31319== by 0x4E5D0FE: PMPI_Bcast (libmpiwrap.c:908) + ==31319== by 0x400911: main (valgrind-example-mpi.c:20) + ==31319== Address 0x69291cc is 0 bytes after a block of size 396 alloc'd + ==31319== at 0x4C27AAA: malloc (vg_replace_malloc.c:291) + ==31319== by 0x4007BC: main (valgrind-example-mpi.c:8) + ==31319== + ==31318== Uninitialised byte(s) found during client check request + ==31318== at 0x4E3591D: check_mem_is_defined_untyped (libmpiwrap.c:952) + ==31318== by 0x4E5D06D: PMPI_Bcast (libmpiwrap.c:908) + ==31318== by 0x400911: main (valgrind-example-mpi.c:20) + ==31318== Address 0x6929040 is 0 bytes inside a block of size 396 alloc'd + ==31318== at 0x4C27AAA: malloc (vg_replace_malloc.c:291) + ==31318== by 0x4007BC: main (valgrind-example-mpi.c:8) + ==31318== + ==31318== Unaddressable byte(s) found during client check request + ==31318== at 0x4E3591D: check_mem_is_defined_untyped (libmpiwrap.c:952) + ==31318== by 0x4E5D06D: PMPI_Bcast (libmpiwrap.c:908) + ==31318== by 0x400911: main (valgrind-example-mpi.c:20) + ==31318== Address 0x69291cc is 0 bytes after a block of size 396 alloc'd + ==31318== at 0x4C27AAA: malloc (vg_replace_malloc.c:291) + ==31318== by 0x4007BC: main (valgrind-example-mpi.c:8) + ==31318== + ==31318== + ==31318== HEAP SUMMARY: + ==31318== in use at exit: 3,172 bytes in 67 blocks + ==31318== total heap usage: 191 allocs, 124 frees, 81,203 bytes allocated + ==31318== + ==31319== + ==31319== HEAP SUMMARY: + ==31319== in use at exit: 3,172 bytes in 67 blocks + ==31319== total heap usage: 175 allocs, 108 frees, 48,435 bytes allocated + ==31319== + ==31318== LEAK SUMMARY: + ==31318== definitely lost: 408 bytes in 3 blocks + ==31318== indirectly lost: 256 bytes in 1 blocks + ==31318== possibly lost: 0 bytes in 0 blocks + ==31318== still reachable: 2,508 bytes in 63 blocks + ==31318== suppressed: 0 bytes in 0 blocks + ==31318== Rerun with --leak-check=full to see details of leaked memory + ==31318== + ==31318== 
For counts of detected and suppressed errors, rerun with: -v + ==31318== Use --track-origins=yes to see where uninitialised values come from + ==31318== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 4 from 4) + ==31319== LEAK SUMMARY: + ==31319== definitely lost: 408 bytes in 3 blocks + ==31319== indirectly lost: 256 bytes in 1 blocks + ==31319== possibly lost: 0 bytes in 0 blocks + ==31319== still reachable: 2,508 bytes in 63 blocks + ==31319== suppressed: 0 bytes in 0 blocks + ==31319== Rerun with --leak-check=full to see details of leaked memory + ==31319== + ==31319== For counts of detected and suppressed errors, rerun with: -v + ==31319== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4) We can see that Valgrind has reported use of unitialised memory on the master process (which reads the array to be broadcasted) and use of diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/vampir.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/vampir.md index 1d7d268a519355a6308618db897055c087a5ef72..d6d3243302b19239d16c2e1fa5ce1152a906c732 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/vampir.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/vampir.md @@ -14,21 +14,21 @@ first to collect the traces. Installed versions ------------------ -Version 8.5.0 is currently installed as module <span -class="monospace">Vampir/8.5.0</span> : +Version 8.5.0 is currently installed as module +Vampir/8.5.0 : - $ module load Vampir/8.5.0 - $ vampir & + $ module load Vampir/8.5.0 + $ vampir & User manual ----------- -You can find the detailed user manual in PDF format in <span -class="monospace">$EBROOTVAMPIR/doc/vampir-manual.pdf</span> +You can find the detailed user manual in PDF format in +$EBROOTVAMPIR/doc/vampir-manual.pdf References ---------- -1. <https://www.vampir.eu> +1.<https://www.vampir.eu> diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/gpi2.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/gpi2.md index 8d25e7ec3ba1473a6b95be366164a0922828df06..954fbca278eb85a473dac2c93d6b307fd625d323 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/gpi2.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/gpi2.md @@ -4,7 +4,7 @@ GPI-2 A library that implements the GASPI specification - + Introduction ------------ @@ -19,16 +19,16 @@ The GPI-2 library implements the GASPI specification (Global Address Space Programming Interface, [www.gaspi.de](http://www.gaspi.de/en/project.html)). -<span>GASPI is a Partitioned Global Address Space (PGAS) API. It aims at +>GASPI is a Partitioned Global Address Space (PGAS) API. It aims at scalable, flexible and failure tolerant computing in massively parallel -environments.</span> +environments. Modules ------- The GPI-2, version 1.0.2 is available on Anselm via module gpi2: - $ module load gpi2 + $ module load gpi2 The module sets up environment variables, required for linking and running GPI-2 enabled applications. This particular command loads the @@ -45,48 +45,48 @@ infinband communication library ibverbs. 
### Compiling and linking with Intel compilers - $ module load intel - $ module load gpi2 - $ icc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lGPI2 -libverbs + $ module load intel + $ module load gpi2 + $ icc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lGPI2 -libverbs ### Compiling and linking with GNU compilers - $ module load gcc - $ module load gpi2 - $ gcc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lGPI2 -libverbs + $ module load gcc + $ module load gpi2 + $ gcc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lGPI2 -libverbs Running the GPI-2 codes ----------------------- -<span>gaspi_run</span> +>gaspi_run gaspi_run starts the GPI-2 application The gaspi_run utility is used to start and run GPI-2 applications: - $ gaspi_run -m machinefile ./myprog.x + $ gaspi_run -m machinefile ./myprog.x A machine file (**machinefile**) with the hostnames of nodes where the application will run, must be provided. The**** machinefile lists all nodes on which to run, one entry per node per process. This file may be hand created or obtained from standard $PBS_NODEFILE: - $ cut -f1 -d"." $PBS_NODEFILE > machinefile + $ cut -f1 -d"." $PBS_NODEFILE > machinefile machinefile: - cn79 - cn80 + cn79 + cn80 This machinefile will run 2 GPI-2 processes, one on node cn79 other on node cn80. machinefle: - cn79 - cn79 - cn80 - cn80 + cn79 + cn79 + cn80 + cn80 This machinefile will run 4 GPI-2 processes, 2 on node cn79 o 2 on node cn80. @@ -96,7 +96,7 @@ node Example: - $ qsub -A OPEN-0-0 -q qexp -l select=2:ncpus=16:mpiprocs=16 -I + $ qsub -A OPEN-0-0 -q qexp -l select=2:ncpus=16:mpiprocs=16 -I This example will produce $PBS_NODEFILE with 16 entries per node. @@ -116,63 +116,63 @@ Example Following is an example GPI-2 enabled code: - #include <GASPI.h> - #include <stdlib.h> - - void success_or_exit ( const char* file, const int line, const int ec) - { - if (ec != GASPI_SUCCESS) - { - gaspi_printf ("Assertion failed in %s[%i]:%dn", file, line, ec); - exit (1); - } - } - - #define ASSERT(ec) success_or_exit (__FILE__, __LINE__, ec); - - int main(int argc, char *argv[]) - { - gaspi_rank_t rank, num; - gaspi_return_t ret; - - /* Initialize GPI-2 */ - ASSERT( gaspi_proc_init(GASPI_BLOCK) ); - - /* Get ranks information */ - ASSERT( gaspi_proc_rank(&rank) ); - ASSERT( gaspi_proc_num(&num) ); - - gaspi_printf("Hello from rank %d of %dn", - rank, num); - - /* Terminate */ - ASSERT( gaspi_proc_term(GASPI_BLOCK) ); - - return 0; - } + #include <GASPI.h> + #include <stdlib.h> + + void success_or_exit ( const char* file, const int line, const int ec) + { + if (ec != GASPI_SUCCESS) + { + gaspi_printf ("Assertion failed in %s[%i]:%dn", file, line, ec); + exit (1); + } + } + + #define ASSERT(ec) success_or_exit (__FILE__, __LINE__, ec); + + int main(int argc, char *argv[]) + { + gaspi_rank_t rank, num; + gaspi_return_t ret; + + /* Initialize GPI-2 */ + ASSERT( gaspi_proc_init(GASPI_BLOCK) ); + + /* Get ranks information */ + ASSERT( gaspi_proc_rank(&rank) ); + ASSERT( gaspi_proc_num(&num) ); + + gaspi_printf("Hello from rank %d of %dn", + rank, num); + + /* Terminate */ + ASSERT( gaspi_proc_term(GASPI_BLOCK) ); + + return 0; + } Load modules and compile: - $ module load gcc gpi2 - $ gcc helloworld_gpi.c -o helloworld_gpi.x -Wl,-rpath=$LIBRARY_PATH -lGPI2 -libverbs + $ module load gcc gpi2 + $ gcc helloworld_gpi.c -o helloworld_gpi.x -Wl,-rpath=$LIBRARY_PATH -lGPI2 -libverbs Submit the job and run the GPI-2 application - $ qsub -q qexp -l select=2:ncpus=1:mpiprocs=1,place=scatter,walltime=00:05:00 -I - qsub: waiting for 
job 171247.dm2 to start - qsub: job 171247.dm2 ready + $ qsub -q qexp -l select=2:ncpus=1:mpiprocs=1,place=scatter,walltime=00:05:00 -I + qsub: waiting for job 171247.dm2 to start + qsub: job 171247.dm2 ready - cn79 $ module load gpi2 - cn79 $ cut -f1 -d"." $PBS_NODEFILE > machinefile - cn79 $ gaspi_run -m machinefile ./helloworld_gpi.x - Hello from rank 0 of 2 + cn79 $ module load gpi2 + cn79 $ cut -f1 -d"." $PBS_NODEFILE > machinefile + cn79 $ gaspi_run -m machinefile ./helloworld_gpi.x + Hello from rank 0 of 2 At the same time, in another session, you may start the gaspi logger: - $ ssh cn79 - cn79 $ gaspi_logger - GASPI Logger (v1.1) - [cn80:0] Hello from rank 1 of 2 + $ ssh cn79 + cn79 $ gaspi_logger + GASPI Logger (v1.1) + [cn80:0] Hello from rank 1 of 2 In this example, we compile the helloworld_gpi.c code using the **gnu compiler** (gcc) and link it to the GPI-2 and ibverbs library. The diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite.md index 20f127eb3686961a8521ca4786e69736c1d7a650..048a009a495954b6784cd7c3910f85adc24102de 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite.md @@ -3,18 +3,18 @@ Intel Parallel Studio - + The Anselm cluster provides following elements of the Intel Parallel Studio XE - Intel Parallel Studio XE - ------------------------------------------------- - Intel Compilers - Intel Debugger - Intel MKL Library - Intel Integrated Performance Primitives Library - Intel Threading Building Blocks Library +Intel Parallel Studio XE +------------------------------------------------- +Intel Compilers +Intel Debugger +Intel MKL Library +Intel Integrated Performance Primitives Library +Intel Threading Building Blocks Library Intel compilers --------------- @@ -23,9 +23,9 @@ The Intel compilers version 13.1.3 are available, via module intel. The compilers include the icc C and C++ compiler and the ifort fortran 77/90/95 compiler. - $ module load intel - $ icc -v - $ ifort -v + $ module load intel + $ icc -v + $ ifort -v Read more at the [Intel Compilers](intel-suite/intel-compilers.html) page. @@ -40,8 +40,8 @@ environment. Use [X display](https://docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) for running the GUI. - $ module load intel - $ idb + $ module load intel + $ idb Read more at the [Intel Debugger](intel-suite/intel-debugger.html) page. @@ -55,7 +55,7 @@ Intel MKL unites and provides these basic components: BLAS, LAPACK, ScaLapack, PARDISO, FFT, VML, VSL, Data fitting, Feast Eigensolver and many more. - $ module load mkl + $ module load mkl Read more at the [Intel MKL](intel-suite/intel-mkl.html) page. @@ -70,7 +70,7 @@ includes signal, image and frame processing algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax and many more. - $ module load ipp + $ module load ipp Read more at the [Intel IPP](intel-suite/intel-integrated-performance-primitives.html) @@ -88,7 +88,7 @@ smaller parallel components. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner. - $ module load tbb + $ module load tbb Read more at the [Intel TBB](intel-suite/intel-tbb.html) page. 
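+
+As a quick illustration of calling one of the components listed above from
+C, the following minimal sketch multiplies two 2x2 matrices through the MKL
+CBLAS interface. It is only a sketch: it assumes the intel and mkl modules
+are loaded and that the program is linked against MKL (for example via the
+Intel compiler's -mkl convenience flag); consult the MKL link line advisor
+for the exact libraries on your system.
+
+    #include <stdio.h>
+    #include "mkl.h"
+
+    int main(void) {
+        int n = 2;
+        double a[4] = {1.0, 2.0, 3.0, 4.0};   /* 2x2 matrices, row-major */
+        double b[4] = {5.0, 6.0, 7.0, 8.0};
+        double c[4] = {0.0, 0.0, 0.0, 0.0};
+
+        /* C = 1.0 * A * B + 0.0 * C */
+        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
+                    n, n, n, 1.0, a, n, b, n, 0.0, c, n);
+
+        /* expected result: 19 22 / 43 50 */
+        printf("%g %g\n%g %g\n", c[0], c[1], c[2], c[3]);
+        return 0;
+    }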
diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-compilers.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-compilers.md index 5c6ac2b9300419153f4e43e89a7e6340cc162695..92023bac7b1e1feda0fee4b4650a3185e9297a9a 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-compilers.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-compilers.md @@ -3,15 +3,15 @@ Intel Compilers - + The Intel compilers version 13.1.1 are available, via module intel. The compilers include the icc C and C++ compiler and the ifort fortran 77/90/95 compiler. - $ module load intel - $ icc -v - $ ifort -v + $ module load intel + $ icc -v + $ ifort -v The intel compilers provide for vectorization of the code, via the AVX instructions and support threading parallelization via OpenMP @@ -20,8 +20,8 @@ For maximum performance on the Anselm cluster, compile your programs using the AVX instructions, with reporting where the vectorization was used. We recommend following compilation options for high performance - $ icc -ipo -O3 -vec -xAVX -vec-report1 myprog.c mysubroutines.c -o myprog.x - $ ifort -ipo -O3 -vec -xAVX -vec-report1 myprog.f mysubroutines.f -o myprog.x + $ icc -ipo -O3 -vec -xAVX -vec-report1 myprog.c mysubroutines.c -o myprog.x + $ ifort -ipo -O3 -vec -xAVX -vec-report1 myprog.f mysubroutines.f -o myprog.x In this example, we compile the program enabling interprocedural optimizations between source files (-ipo), aggresive loop optimizations @@ -31,8 +31,8 @@ The compiler recognizes the omp, simd, vector and ivdep pragmas for OpenMP parallelization and AVX vectorization. Enable the OpenMP parallelization by the **-openmp** compiler switch. - $ icc -ipo -O3 -vec -xAVX -vec-report1 -openmp myprog.c mysubroutines.c -o myprog.x - $ ifort -ipo -O3 -vec -xAVX -vec-report1 -openmp myprog.f mysubroutines.f -o myprog.x + $ icc -ipo -O3 -vec -xAVX -vec-report1 -openmp myprog.c mysubroutines.c -o myprog.x + $ ifort -ipo -O3 -vec -xAVX -vec-report1 -openmp myprog.f mysubroutines.f -o myprog.x Read more at <http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/cpp-lin/index.htm> @@ -41,29 +41,29 @@ Sandy Bridge/Haswell binary compatibility ----------------------------------------- Anselm nodes are currently equipped with Sandy Bridge CPUs, while -Salomon will use Haswell architecture. <span>The new processors are +Salomon will use Haswell architecture. >The new processors are backward compatible with the Sandy Bridge nodes, so all programs that ran on the Sandy Bridge processors, should also run on the new Haswell -nodes. </span><span>To get optimal performance out of the Haswell -processors a program should make use of the special </span><span>AVX2 +nodes. >To get optimal performance out of the Haswell +processors a program should make use of the special >AVX2 instructions for this processor. One can do this by recompiling codes -with the compiler flags </span><span>designated to invoke these +with the compiler flags >designated to invoke these instructions. For the Intel compiler suite, there are two ways of -doing </span><span>this:</span> - -- <span>Using compiler flag (both for Fortran and C): <span - class="monospace">-xCORE-AVX2</span>. This will create a - binary </span><span class="s1">with AVX2 instructions, specifically - for the Haswell processors. 
Note that the - executable </span><span>will not run on Sandy Bridge nodes.</span> -- <span>Using compiler flags (both for Fortran and C): <span - class="monospace">-xAVX -axCORE-AVX2</span>. This - will </span><span>generate multiple, feature specific auto-dispatch - code paths for Intel® processors, if there is </span><span>a - performance benefit. So this binary will run both on Sandy Bridge - and Haswell </span><span>processors. During runtime it will be - decided which path to follow, dependent on - which </span><span>processor you are running on. In general this - will result in larger binaries.</span> +doing >this: + +- >Using compiler flag (both for Fortran and C): + -xCORE-AVX2. This will create a + binary class="s1">with AVX2 instructions, specifically + for the Haswell processors. Note that the + executable >will not run on Sandy Bridge nodes. +- >Using compiler flags (both for Fortran and C): + -xAVX -axCORE-AVX2. This + will >generate multiple, feature specific auto-dispatch + code paths for Intel® processors, if there is >a + performance benefit. So this binary will run both on Sandy Bridge + and Haswell >processors. During runtime it will be + decided which path to follow, dependent on + which >processor you are running on. In general this + will result in larger binaries. diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-debugger.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-debugger.md index e5fa269a5fc6c60273c061047f8bf742b90300df..3b220c7c046373441fb15fcd655e1df4f1e77966 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-debugger.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-debugger.md @@ -3,7 +3,7 @@ Intel Debugger - + Debugging serial applications ----------------------------- @@ -15,12 +15,12 @@ environment. Use [X display](https://docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) for running the GUI. - $ module load intel - $ idb + $ module load intel + $ idb The debugger may run in text mode. To debug in text mode, use - $ idbc + $ idbc To debug on the compute nodes, module intel must be loaded. The GUI on compute nodes may be accessed using the same way as in [the @@ -29,14 +29,14 @@ section](https://docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/ Example: - $ qsub -q qexp -l select=1:ncpus=16 -X -I - qsub: waiting for job 19654.srv11 to start - qsub: job 19654.srv11 ready + $ qsub -q qexp -l select=1:ncpus=16 -X -I + qsub: waiting for job 19654.srv11 to start + qsub: job 19654.srv11 ready - $ module load intel - $ module load java - $ icc -O0 -g myprog.c -o myprog.x - $ idb ./myprog.x + $ module load intel + $ module load java + $ icc -O0 -g myprog.c -o myprog.x + $ idb ./myprog.x In this example, we allocate 1 full compute node, compile program myprog.c with debugging options -O0 -g and run the idb debugger @@ -56,12 +56,12 @@ rank in separate xterm terminal (do not forget the [X display](https://docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/resolveuid/11e53ad0d2fd4c5187537f4baeedff33)). 
Using Intel MPI, this may be done in following way: - $ qsub -q qexp -l select=2:ncpus=16 -X -I - qsub: waiting for job 19654.srv11 to start - qsub: job 19655.srv11 ready + $ qsub -q qexp -l select=2:ncpus=16 -X -I + qsub: waiting for job 19654.srv11 to start + qsub: job 19655.srv11 ready - $ module load intel impi - $ mpirun -ppn 1 -hostfile $PBS_NODEFILE --enable-x xterm -e idbc ./mympiprog.x + $ module load intel impi + $ mpirun -ppn 1 -hostfile $PBS_NODEFILE --enable-x xterm -e idbc ./mympiprog.x In this example, we allocate 2 full compute node, run xterm on each node and start idb debugger in command line mode, debugging two ranks of @@ -75,12 +75,12 @@ the debugger to bind to all ranks and provide aggregated outputs across the ranks, pausing execution automatically just after startup. You may then set break points and step the execution manually. Using Intel MPI: - $ qsub -q qexp -l select=2:ncpus=16 -X -I - qsub: waiting for job 19654.srv11 to start - qsub: job 19655.srv11 ready + $ qsub -q qexp -l select=2:ncpus=16 -X -I + qsub: waiting for job 19654.srv11 to start + qsub: job 19655.srv11 ready - $ module load intel impi - $ mpirun -n 32 -idb ./mympiprog.x + $ module load intel impi + $ mpirun -n 32 -idb ./mympiprog.x ### Debugging multithreaded application diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md index 743c5ec1b99815708599a5de7163d2ffdfa42bfe..0945105c13f75f85b49718e28fe65ab79759b2df 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-integrated-performance-primitives.md @@ -3,7 +3,7 @@ Intel IPP - + Intel Integrated Performance Primitives --------------------------------------- @@ -19,7 +19,7 @@ algebra functions and many more. Check out IPP before implementing own math functions for data processing, it is likely already there. - $ module load ipp + $ module load ipp The module sets up environment variables, required for linking and running ipp enabled applications. @@ -27,60 +27,60 @@ running ipp enabled applications. 
IPP example ----------- - #include "ipp.h" - #include <stdio.h> - int main(int argc, char* argv[]) - { - const IppLibraryVersion *lib; - Ipp64u fm; - IppStatus status; - - status= ippInit(); //IPP initialization with the best optimization layer - if( status != ippStsNoErr ) { - printf("IppInit() Error:n"); - printf("%sn", ippGetStatusString(status) ); - return -1; - } - - //Get version info - lib = ippiGetLibVersion(); - printf("%s %sn", lib->Name, lib->Version); - - //Get CPU features enabled with selected library level - fm=ippGetEnabledCpuFeatures(); - printf("SSE :%cn",(fm>>1)&1?'Y':'N'); - printf("SSE2 :%cn",(fm>>2)&1?'Y':'N'); - printf("SSE3 :%cn",(fm>>3)&1?'Y':'N'); - printf("SSSE3 :%cn",(fm>>4)&1?'Y':'N'); - printf("SSE41 :%cn",(fm>>6)&1?'Y':'N'); - printf("SSE42 :%cn",(fm>>7)&1?'Y':'N'); - printf("AVX :%cn",(fm>>8)&1 ?'Y':'N'); - printf("AVX2 :%cn", (fm>>15)&1 ?'Y':'N' ); - printf("----------n"); - printf("OS Enabled AVX :%cn", (fm>>9)&1 ?'Y':'N'); - printf("AES :%cn", (fm>>10)&1?'Y':'N'); - printf("CLMUL :%cn", (fm>>11)&1?'Y':'N'); - printf("RDRAND :%cn", (fm>>13)&1?'Y':'N'); - printf("F16C :%cn", (fm>>14)&1?'Y':'N'); - - return 0; - } + #include "ipp.h" + #include <stdio.h> + int main(int argc, char* argv[]) + { + const IppLibraryVersion *lib; + Ipp64u fm; + IppStatus status; + + status= ippInit(); //IPP initialization with the best optimization layer + if( status != ippStsNoErr ) { + printf("IppInit() Error:n"); + printf("%sn", ippGetStatusString(status) ); + return -1; + } + + //Get version info + lib = ippiGetLibVersion(); + printf("%s %sn", lib->Name, lib->Version); + + //Get CPU features enabled with selected library level + fm=ippGetEnabledCpuFeatures(); + printf("SSE :%cn",(fm>>1)&1?'Y':'N'); + printf("SSE2 :%cn",(fm>>2)&1?'Y':'N'); + printf("SSE3 :%cn",(fm>>3)&1?'Y':'N'); + printf("SSSE3 :%cn",(fm>>4)&1?'Y':'N'); + printf("SSE41 :%cn",(fm>>6)&1?'Y':'N'); + printf("SSE42 :%cn",(fm>>7)&1?'Y':'N'); + printf("AVX :%cn",(fm>>8)&1 ?'Y':'N'); + printf("AVX2 :%cn", (fm>>15)&1 ?'Y':'N' ); + printf("----------n"); + printf("OS Enabled AVX :%cn", (fm>>9)&1 ?'Y':'N'); + printf("AES :%cn", (fm>>10)&1?'Y':'N'); + printf("CLMUL :%cn", (fm>>11)&1?'Y':'N'); + printf("RDRAND :%cn", (fm>>13)&1?'Y':'N'); + printf("F16C :%cn", (fm>>14)&1?'Y':'N'); + + return 0; + }  Compile above example, using any compiler and the ipp module. - $ module load intel - $ module load ipp + $ module load intel + $ module load ipp - $ icc testipp.c -o testipp.x -lippi -lipps -lippcore + $ icc testipp.c -o testipp.x -lippi -lipps -lippcore You will need the ipp module loaded to run the ipp enabled executable. 
This may be avoided, by compiling library search paths into the executable - $ module load intel - $ module load ipp + $ module load intel + $ module load ipp - $ icc testipp.c -o testipp.x -Wl,-rpath=$LIBRARY_PATH -lippi -lipps -lippcore + $ icc testipp.c -o testipp.x -Wl,-rpath=$LIBRARY_PATH -lippi -lipps -lippcore Code samples and documentation ------------------------------ diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-mkl.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-mkl.md index e731e0871086d15aa5230118a938e892eae6a55d..c599d5d2969e5a5f57f8da4f1e47023dd48b5f43 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-mkl.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-mkl.md @@ -3,7 +3,7 @@ Intel MKL - + Intel Math Kernel Library @@ -13,59 +13,59 @@ Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, extensively threaded and optimized for maximum performance. Intel MKL provides these basic math kernels: -[]() -- <div id="d4841e18"> - +- <div id="d4841e18"> - []()BLAS (level 1, 2, and 3) and LAPACK linear algebra routines, - offering vector, vector-matrix, and matrix-matrix operations. -- <div id="d4841e21"> + - + BLAS (level 1, 2, and 3) and LAPACK linear algebra routines, + offering vector, vector-matrix, and matrix-matrix operations. +- <div id="d4841e21"> + + - []()The PARDISO direct sparse solver, an iterative sparse solver, - and supporting sparse BLAS (level 1, 2, and 3) routines for solving - sparse systems of equations. -- <div id="d4841e24"> + The PARDISO direct sparse solver, an iterative sparse solver, + and supporting sparse BLAS (level 1, 2, and 3) routines for solving + sparse systems of equations. +- <div id="d4841e24"> - + - []()ScaLAPACK distributed processing linear algebra routines for - Linux* and Windows* operating systems, as well as the Basic Linear - Algebra Communications Subprograms (BLACS) and the Parallel Basic - Linear Algebra Subprograms (PBLAS). -- <div id="d4841e27"> + ScaLAPACK distributed processing linear algebra routines for + Linux* and Windows* operating systems, as well as the Basic Linear + Algebra Communications Subprograms (BLACS) and the Parallel Basic + Linear Algebra Subprograms (PBLAS). +- <div id="d4841e27"> - + - []()Fast Fourier transform (FFT) functions in one, two, or three - dimensions with support for mixed radices (not limited to sizes that - are powers of 2), as well as distributed versions of - these functions. -- <div id="d4841e30"> + Fast Fourier transform (FFT) functions in one, two, or three + dimensions with support for mixed radices (not limited to sizes that + are powers of 2), as well as distributed versions of + these functions. +- <div id="d4841e30"> - + - []()Vector Math Library (VML) routines for optimized mathematical - operations on vectors. -- <div id="d4841e34"> + Vector Math Library (VML) routines for optimized mathematical + operations on vectors. +- <div id="d4841e34"> - + - []()Vector Statistical Library (VSL) routines, which offer - high-performance vectorized random number generators (RNG) for - several probability distributions, convolution and correlation - routines, and summary statistics functions. 
-- <div id="d4841e37"> + Vector Statistical Library (VSL) routines, which offer + high-performance vectorized random number generators (RNG) for + several probability distributions, convolution and correlation + routines, and summary statistics functions. +- <div id="d4841e37"> - + - []()Data Fitting Library, which provides capabilities for - spline-based approximation of functions, derivatives and integrals - of functions, and search. -- Extended Eigensolver, a shared memory version of an eigensolver - based on the Feast Eigenvalue Solver. + Data Fitting Library, which provides capabilities for + spline-based approximation of functions, derivatives and integrals + of functions, and search. +- Extended Eigensolver, a shared memory version of an eigensolver + based on the Feast Eigenvalue Solver. @@ -74,7 +74,7 @@ Manual](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/ Intel MKL version 13.5.192 is available on Anselm - $ module load mkl + $ module load mkl The module sets up environment variables, required for linking and running mkl enabled applications. The most important variables are the @@ -91,10 +91,10 @@ type (necessary for indexing large arrays, with more than 2^31^-1 elements), whereas the LP64 libraries index arrays with the 32-bit integer type. - Interface Integer type - ----------- ----------------------------------------------- - LP64 32-bit, int, integer(kind=4), MPI_INT - ILP64 64-bit, long int, integer(kind=8), MPI_INT64 +Interface Integer type +----------- ----------------------------------------------- +LP64 32-bit, int, integer(kind=4), MPI_INT +ILP64 64-bit, long int, integer(kind=8), MPI_INT64 ### Linking @@ -106,7 +106,7 @@ You will need the mkl module loaded to run the mkl enabled executable. This may be avoided, by compiling library search paths into the executable. Include rpath on the compile line: - $ icc .... -Wl,-rpath=$LIBRARY_PATH ... + $ icc .... -Wl,-rpath=$LIBRARY_PATH ... ### Threading @@ -118,13 +118,13 @@ For this to work, the application must link the threaded MKL library OpenMP environment variables, such as OMP_NUM_THREADS and KMP_AFFINITY. MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS - $ export OMP_NUM_THREADS=16 - $ export KMP_AFFINITY=granularity=fine,compact,1,0 + $ export OMP_NUM_THREADS=16 + $ export KMP_AFFINITY=granularity=fine,compact,1,0 The application will run with 16 threads with affinity optimized for fine grain parallelization. -[]()Examples +Examples ------------ Number of examples, demonstrating use of the MKL library and its linking @@ -134,47 +134,47 @@ program for multi-threaded matrix multiplication. ### Working with examples - $ module load intel - $ module load mkl - $ cp -a $MKL_EXAMPLES/cblas /tmp/ - $ cd /tmp/cblas + $ module load intel + $ module load mkl + $ cp -a $MKL_EXAMPLES/cblas /tmp/ + $ cd /tmp/cblas - $ make sointel64 function=cblas_dgemm + $ make sointel64 function=cblas_dgemm In this example, we compile, link and run the cblas_dgemm example, demonstrating use of MKL example suite installed on Anselm. 
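For orientation, cblas_dgemm is the standard CBLAS interface to double precision matrix-matrix multiplication. A minimal stand-alone call, shown here only as a sketch (a hypothetical source file, not taken from $MKL_EXAMPLES), uses the LP64 interface with 32-bit MKL_INT indices discussed above:

    #include <cstdio>
    #include <vector>
    #include "mkl.h"

    int main() {
        const MKL_INT n = 1000;               // LP64 interface: 32-bit integer indexing
        std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);

        // C = 1.0*A*B + 0.0*C; the call is threaded inside MKL and honours
        // MKL_NUM_THREADS / OMP_NUM_THREADS as described above.
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, 1.0, &A[0], n, &B[0], n, 0.0, &C[0], n);

        std::printf("C[0] = %f\n", C[0]);     // 2000.0 for these inputs
        return 0;
    }

Such a file would be compiled and linked like the examples below, either with the icc -mkl shortcut or with the explicit -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 link line.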
### Example: MKL and Intel compiler - $ module load intel - $ module load mkl - $ cp -a $MKL_EXAMPLES/cblas /tmp/ - $ cd /tmp/cblas - $ - $ icc -w source/cblas_dgemmx.c source/common_func.c -mkl -o cblas_dgemmx.x - $ ./cblas_dgemmx.x data/cblas_dgemmx.d + $ module load intel + $ module load mkl + $ cp -a $MKL_EXAMPLES/cblas /tmp/ + $ cd /tmp/cblas + $ + $ icc -w source/cblas_dgemmx.c source/common_func.c -mkl -o cblas_dgemmx.x + $ ./cblas_dgemmx.x data/cblas_dgemmx.d In this example, we compile, link and run the cblas_dgemm example, demonstrating use of MKL with icc -mkl option. Using the -mkl option is equivalent to: - $ icc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x - -I$MKL_INC_DIR -L$MKL_LIB_DIR -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 + $ icc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x + -I$MKL_INC_DIR -L$MKL_LIB_DIR -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 In this example, we compile and link the cblas_dgemm example, using LP64 interface to threaded MKL and Intel OMP threads implementation. ### Example: MKL and GNU compiler - $ module load gcc - $ module load mkl - $ cp -a $MKL_EXAMPLES/cblas /tmp/ - $ cd /tmp/cblas - - $ gcc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x - -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lm + $ module load gcc + $ module load mkl + $ cp -a $MKL_EXAMPLES/cblas /tmp/ + $ cd /tmp/cblas + + $ gcc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x + -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lm - $ ./cblas_dgemmx.x data/cblas_dgemmx.d + $ ./cblas_dgemmx.x data/cblas_dgemmx.d In this example, we compile, link and run the cblas_dgemm example, using LP64 interface to threaded MKL and gnu OMP threads implementation. diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-parallel-studio-introduction.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-parallel-studio-introduction.md index 1d4309ba0985631ad1df421cb1ecf59d9a1464c1..d4a7b981037e8786a840f42381a49eb1a69cc35b 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-parallel-studio-introduction.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-parallel-studio-introduction.md @@ -3,18 +3,18 @@ Intel Parallel Studio - + The Anselm cluster provides following elements of the Intel Parallel Studio XE - Intel Parallel Studio XE - ------------------------------------------------- - Intel Compilers - Intel Debugger - Intel MKL Library - Intel Integrated Performance Primitives Library - Intel Threading Building Blocks Library +Intel Parallel Studio XE +------------------------------------------------- +Intel Compilers +Intel Debugger +Intel MKL Library +Intel Integrated Performance Primitives Library +Intel Threading Building Blocks Library Intel compilers --------------- @@ -23,9 +23,9 @@ The Intel compilers version 13.1.3 are available, via module intel. The compilers include the icc C and C++ compiler and the ifort fortran 77/90/95 compiler. - $ module load intel - $ icc -v - $ ifort -v + $ module load intel + $ icc -v + $ ifort -v Read more at the [Intel Compilers](intel-compilers.html) page. @@ -40,8 +40,8 @@ environment. Use [X display](https://docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) for running the GUI. 
- $ module load intel - $ idb + $ module load intel + $ idb Read more at the [Intel Debugger](intel-debugger.html) page. @@ -55,7 +55,7 @@ Intel MKL unites and provides these basic components: BLAS, LAPACK, ScaLapack, PARDISO, FFT, VML, VSL, Data fitting, Feast Eigensolver and many more. - $ module load mkl + $ module load mkl Read more at the [Intel MKL](intel-mkl.html) page. @@ -69,7 +69,7 @@ includes signal, image and frame processing algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax and many more. - $ module load ipp + $ module load ipp Read more at the [Intel IPP](intel-integrated-performance-primitives.html) page. @@ -86,7 +86,7 @@ smaller parallel components. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner. - $ module load tbb + $ module load tbb Read more at the [Intel TBB](intel-tbb.html) page. diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-tbb.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-tbb.md index 2672984a6a6a56fddd4e74f396fcf755ef2d4367..c9dde88efb33bdc82cc2602d7aa8dcd8c4cf0f31 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-tbb.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-suite/intel-tbb.md @@ -3,7 +3,7 @@ Intel TBB - + Intel Threading Building Blocks ------------------------------- @@ -18,7 +18,7 @@ accelerator](../intel-xeon-phi.html). Intel TBB version 4.1 is available on Anselm - $ module load tbb + $ module load tbb The module sets up environment variables, required for linking and running tbb enabled applications. @@ -31,12 +31,12 @@ Examples Number of examples, demonstrating use of TBB and its built-in scheduler is available on Anselm, in the $TBB_EXAMPLES directory. - $ module load intel - $ module load tbb - $ cp -a $TBB_EXAMPLES/common $TBB_EXAMPLES/parallel_reduce /tmp/ - $ cd /tmp/parallel_reduce/primes - $ icc -O2 -DNDEBUG -o primes.x main.cpp primes.cpp -ltbb - $ ./primes.x + $ module load intel + $ module load tbb + $ cp -a $TBB_EXAMPLES/common $TBB_EXAMPLES/parallel_reduce /tmp/ + $ cd /tmp/parallel_reduce/primes + $ icc -O2 -DNDEBUG -o primes.x main.cpp primes.cpp -ltbb + $ ./primes.x In this example, we compile, link and run the primes example, demonstrating use of parallel task-based reduce in computation of prime @@ -46,7 +46,7 @@ You will need the tbb module loaded to run the tbb enabled executable. This may be avoided, by compiling library search paths into the executable. - $ icc -O2 -o primes.x main.cpp primes.cpp -Wl,-rpath=$LIBRARY_PATH -ltbb + $ icc -O2 -o primes.x main.cpp primes.cpp -Wl,-rpath=$LIBRARY_PATH -ltbb Further reading --------------- diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-xeon-phi.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-xeon-phi.md index 485f6a98add3336f4a9e0336b64b5e2d09c0ad31..f82effb17a7d696edea35e67f18badf413d5123f 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-xeon-phi.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/intel-xeon-phi.md @@ -4,7 +4,7 @@ Intel Xeon Phi A guide to Intel Xeon Phi usage - + Intel Xeon Phi can be programmed in several modes. 
The default mode on Anselm is offload mode, but all modes described in this document are @@ -16,81 +16,81 @@ Intel Utilities for Xeon Phi To get access to a compute node with Intel Xeon Phi accelerator, use the PBS interactive session - $ qsub -I -q qmic -A NONE-0-0 + $ qsub -I -q qmic -A NONE-0-0 To set up the environment module "Intel" has to be loaded - $ module load intel/13.5.192 + $ module load intel/13.5.192 Information about the hardware can be obtained by running the micinfo program on the host. - $ /usr/bin/micinfo + $ /usr/bin/micinfo The output of the "micinfo" utility executed on one of the Anselm node is as follows. (note: to get PCIe related details the command has to be run with root privileges) - MicInfo Utility Log - - Created Mon Jul 22 00:23:50 2013 - - -        System Info -                HOST OS                : Linux -                OS Version             : 2.6.32-279.5.2.bl6.Bull.33.x86_64 -                Driver Version         : 6720-15 -                MPSS Version           : 2.1.6720-15 -                Host Physical Memory   : 98843 MB - - Device No: 0, Device Name: mic0 - -        Version -                Flash Version           : 2.1.03.0386 -                SMC Firmware Version    : 1.15.4830 -                SMC Boot Loader Version : 1.8.4326 -                uOS Version             : 2.6.38.8-g2593b11 -                Device Serial Number    : ADKC30102482 - -        Board -                Vendor ID               : 0x8086 -                Device ID               : 0x2250 -                Subsystem ID            : 0x2500 -                Coprocessor Stepping ID : 3 -                PCIe Width              : x16 -                PCIe Speed              : 5 GT/s -                PCIe Max payload size   : 256 bytes -                PCIe Max read req size  : 512 bytes -                Coprocessor Model       : 0x01 -                Coprocessor Model Ext   : 0x00 -                Coprocessor Type        : 0x00 -                Coprocessor Family      : 0x0b -                Coprocessor Family Ext  : 0x00 -                Coprocessor Stepping    : B1 -                Board SKU               : B1PRQ-5110P/5120D -                ECC Mode                : Enabled -                SMC HW Revision         : Product 225W Passive CS - -        Cores -                Total No of Active Cores : 60 -                Voltage                 : 1032000 uV -                Frequency               : 1052631 kHz - -        Thermal -                Fan Speed Control       : N/A -                Fan RPM                 : N/A -                Fan PWM                 : N/A -                Die Temp                : 49 C - -        GDDR -                GDDR Vendor             : Elpida -                GDDR Version            : 0x1 -                GDDR Density            : 2048 Mb -                GDDR Size               : 7936 MB -                GDDR Technology         : GDDR5 -                GDDR Speed              : 5.000000 GT/s -                GDDR Frequency          : 2500000 kHz -                GDDR Voltage            : 1501000 uV + MicInfo Utility Log + + Created Mon Jul 22 00:23:50 2013 + + +        System Info +                HOST OS                : Linux +                OS Version             : 2.6.32-279.5.2.bl6.Bull.33.x86_64 +                Driver Version         : 6720-15 +                MPSS Version           : 2.1.6720-15 +                Host Physical Memory   : 98843 MB + + Device No: 0, Device Name: mic0 + +        Version +                Flash 
Version           : 2.1.03.0386 +                SMC Firmware Version    : 1.15.4830 +                SMC Boot Loader Version : 1.8.4326 +                uOS Version             : 2.6.38.8-g2593b11 +                Device Serial Number    : ADKC30102482 + +        Board +                Vendor ID               : 0x8086 +                Device ID               : 0x2250 +                Subsystem ID            : 0x2500 +                Coprocessor Stepping ID : 3 +                PCIe Width              : x16 +                PCIe Speed              : 5 GT/s +                PCIe Max payload size   : 256 bytes +                PCIe Max read req size  : 512 bytes +                Coprocessor Model       : 0x01 +                Coprocessor Model Ext   : 0x00 +                Coprocessor Type        : 0x00 +                Coprocessor Family      : 0x0b +                Coprocessor Family Ext  : 0x00 +                Coprocessor Stepping    : B1 +                Board SKU               : B1PRQ-5110P/5120D +                ECC Mode                : Enabled +                SMC HW Revision         : Product 225W Passive CS + +        Cores +                Total No of Active Cores : 60 +                Voltage                 : 1032000 uV +                Frequency               : 1052631 kHz + +        Thermal +                Fan Speed Control       : N/A +                Fan RPM                 : N/A +                Fan PWM                 : N/A +                Die Temp                : 49 C + +        GDDR +                GDDR Vendor             : Elpida +                GDDR Version            : 0x1 +                GDDR Density            : 2048 Mb +                GDDR Size               : 7936 MB +                GDDR Technology         : GDDR5 +                GDDR Speed              : 5.000000 GT/s +                GDDR Frequency          : 2500000 kHz +                GDDR Voltage            : 1501000 uV Offload Mode ------------ @@ -99,44 +99,44 @@ To compile a code for Intel Xeon Phi a MPSS stack has to be installed on the machine where compilation is executed. Currently the MPSS stack is only installed on compute nodes equipped with accelerators. - $ qsub -I -q qmic -A NONE-0-0 - $ module load intel/13.5.192 + $ qsub -I -q qmic -A NONE-0-0 + $ module load intel/13.5.192 For debugging purposes it is also recommended to set environment variable "OFFLOAD_REPORT". Value can be set from 0 to 3, where higher number means more debugging information. - export OFFLOAD_REPORT=3 + export OFFLOAD_REPORT=3 A very basic example of code that employs offload programming technique is shown in the next listing. Please note that this code is sequential and utilizes only single core of the accelerator. 
- $ vim source-offload.cpp + $ vim source-offload.cpp - #include <iostream> + #include <iostream> - int main(int argc, char* argv[]) - { -    const int niter = 100000; -    double result = 0; + int main(int argc, char* argv[]) + { +    const int niter = 100000; +    double result = 0; -  #pragma offload target(mic) -    for (int i = 0; i < niter; ++i) { -        const double t = (i + 0.5) / niter; -        result += 4.0 / (t * t + 1.0); -    } -    result /= niter; -    std::cout << "Pi ~ " << result << 'n'; - } +  #pragma offload target(mic) +    for (int i = 0; i < niter; ++i) { +        const double t = (i + 0.5) / niter; +        result += 4.0 / (t * t + 1.0); +    } +    result /= niter; +    std::cout << "Pi ~ " << result << 'n'; + } To compile a code using Intel compiler run - $ icc source-offload.cpp -o bin-offload + $ icc source-offload.cpp -o bin-offload To execute the code, run the following command on the host - ./bin-offload + ./bin-offload ### Parallelization in Offload Mode Using OpenMP @@ -144,91 +144,91 @@ One way of paralelization a code for Xeon Phi is using OpenMP directives. The following example shows code for parallel vector addition. - $ vim ./vect-add + $ vim ./vect-add - #include <stdio.h> + #include <stdio.h> - typedef int T; + typedef int T; - #define SIZE 1000 + #define SIZE 1000 - #pragma offload_attribute(push, target(mic)) - T in1[SIZE]; - T in2[SIZE]; - T res[SIZE]; - #pragma offload_attribute(pop) + #pragma offload_attribute(push, target(mic)) + T in1[SIZE]; + T in2[SIZE]; + T res[SIZE]; + #pragma offload_attribute(pop) - // MIC function to add two vectors - __attribute__((target(mic))) add_mic(T *a, T *b, T *c, int size) { -  int i = 0; -  #pragma omp parallel for -    for (i = 0; i < size; i++) -      c[i] = a[i] + b[i]; - } + // MIC function to add two vectors + __attribute__((target(mic))) add_mic(T *a, T *b, T *c, int size) { +  int i = 0; +  #pragma omp parallel for +    for (i = 0; i < size; i++) +      c[i] = a[i] + b[i]; + } - // CPU function to add two vectors - void add_cpu (T *a, T *b, T *c, int size) { -  int i; -  for (i = 0; i < size; i++) -    c[i] = a[i] + b[i]; - } + // CPU function to add two vectors + void add_cpu (T *a, T *b, T *c, int size) { +  int i; +  for (i = 0; i < size; i++) +    c[i] = a[i] + b[i]; + } - // CPU function to generate a vector of random numbers - void random_T (T *a, int size) { -  int i; -  for (i = 0; i < size; i++) -    a[i] = rand() % 10000; // random number between 0 and 9999 - } + // CPU function to generate a vector of random numbers + void random_T (T *a, int size) { +  int i; +  for (i = 0; i < size; i++) +    a[i] = rand() % 10000; // random number between 0 and 9999 + } - // CPU function to compare two vectors - int compare(T *a, T *b, T size ){ -  int pass = 0; -  int i; -  for (i = 0; i < size; i++){ -    if (a[i] != b[i]) { -      printf("Value mismatch at location %d, values %d and %dn",i, a[i], b[i]); -      pass = 1; -    } -  } -  if (pass == 0) printf ("Test passedn"); else printf ("Test Failedn"); -  return pass; - } + // CPU function to compare two vectors + int compare(T *a, T *b, T size ){ +  int pass = 0; +  int i; +  for (i = 0; i < size; i++){ +    if (a[i] != b[i]) { +      printf("Value mismatch at location %d, values %d and %dn",i, a[i], b[i]); +      pass = 1; +    } +  } +  if (pass == 0) printf ("Test passedn"); else printf ("Test Failedn"); +  return pass; + } - int main() - { -  int i; -  random_T(in1, SIZE); -  random_T(in2, SIZE); + int main() + { +  int i; +  random_T(in1, 
SIZE); +  random_T(in2, SIZE); -  #pragma offload target(mic) in(in1,in2) inout(res) -  { +  #pragma offload target(mic) in(in1,in2) inout(res) +  { -    // Parallel loop from main function -    #pragma omp parallel for -    for (i=0; i<SIZE; i++) -      res[i] = in1[i] + in2[i]; +    // Parallel loop from main function +    #pragma omp parallel for +    for (i=0; i<SIZE; i++) +      res[i] = in1[i] + in2[i]; -    // or parallel loop is called inside the function -    add_mic(in1, in2, res, SIZE); +    // or parallel loop is called inside the function +    add_mic(in1, in2, res, SIZE); -  } +  } -  //Check the results with CPU implementation -  T res_cpu[SIZE]; -  add_cpu(in1, in2, res_cpu, SIZE); -  compare(res, res_cpu, SIZE); +  //Check the results with CPU implementation +  T res_cpu[SIZE]; +  add_cpu(in1, in2, res_cpu, SIZE); +  compare(res, res_cpu, SIZE); - } + } During the compilation Intel compiler shows which loops have been vectorized in both host and accelerator. This can be enabled with compiler option "-vec-report2". To compile and execute the code run - $ icc vect-add.c -openmp_report2 -vec-report2 -o vect-add + $ icc vect-add.c -openmp_report2 -vec-report2 -o vect-add - $ ./vect-add + $ ./vect-add Some interesting compiler flags useful not only for code debugging are: @@ -252,119 +252,119 @@ transparently. Behavioral of automatic offload mode is controlled by functions called within the program or by environmental variables. Complete list of -controls is listed [<span -class="external-link">here</span>](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_userguide_lnx/GUID-3DC4FC7D-A1E4-423D-9C0C-06AB265FFA86.htm). +controls is listed [ +class="external-link">here](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_userguide_lnx/GUID-3DC4FC7D-A1E4-423D-9C0C-06AB265FFA86.htm). The Automatic Offload may be enabled by either an MKL function call within the code: - mkl_mic_enable(); + mkl_mic_enable(); or by setting environment variable - $ export MKL_MIC_ENABLE=1 + $ export MKL_MIC_ENABLE=1 To get more information about automatic offload please refer to "[Using Intel® MKL Automatic Offload on Intel ® Xeon Phi™ Coprocessors](http://software.intel.com/sites/default/files/11MIC42_How_to_Use_MKL_Automatic_Offload_0.pdf)" -white paper or [<span class="external-link">Intel MKL -documentation</span>](https://software.intel.com/en-us/articles/intel-math-kernel-library-documentation). +white paper or [ class="external-link">Intel MKL +documentation](https://software.intel.com/en-us/articles/intel-math-kernel-library-documentation). ### Automatic offload example At first get an interactive PBS session on a node with MIC accelerator and load "intel" module that automatically loads "mkl" module as well. - $ qsub -I -q qmic -A OPEN-0-0 -l select=1:ncpus=16 - $ module load intel + $ qsub -I -q qmic -A OPEN-0-0 -l select=1:ncpus=16 + $ module load intel Following example show how to automatically offload an SGEMM (single -precision - g<span dir="auto">eneral matrix multiply</span>) function to +precision - g dir="auto">eneral matrix multiply) function to MIC coprocessor. The code can be copied to a file and compiled without any necessary modification. 
- $ vim sgemm-ao-short.c + $ vim sgemm-ao-short.c - #include <stdio.h> - #include <stdlib.h> - #include <malloc.h> - #include <stdint.h> + #include <stdio.h> + #include <stdlib.h> + #include <malloc.h> + #include <stdint.h> - #include "mkl.h" + #include "mkl.h" - int main(int argc, char **argv) - { -        float *A, *B, *C; /* Matrices */ + int main(int argc, char **argv) + { +        float *A, *B, *C; /* Matrices */ -        MKL_INT N = 2560; /* Matrix dimensions */ -        MKL_INT LD = N; /* Leading dimension */ -        int matrix_bytes; /* Matrix size in bytes */ -        int matrix_elements; /* Matrix size in elements */ +        MKL_INT N = 2560; /* Matrix dimensions */ +        MKL_INT LD = N; /* Leading dimension */ +        int matrix_bytes; /* Matrix size in bytes */ +        int matrix_elements; /* Matrix size in elements */ -        float alpha = 1.0, beta = 1.0; /* Scaling factors */ -        char transa = 'N', transb = 'N'; /* Transposition options */ +        float alpha = 1.0, beta = 1.0; /* Scaling factors */ +        char transa = 'N', transb = 'N'; /* Transposition options */ -        int i, j; /* Counters */ +        int i, j; /* Counters */ -        matrix_elements = N * N; -        matrix_bytes = sizeof(float) * matrix_elements; +        matrix_elements = N * N; +        matrix_bytes = sizeof(float) * matrix_elements; -        /* Allocate the matrices */ -        A = malloc(matrix_bytes); B = malloc(matrix_bytes); C = malloc(matrix_bytes); +        /* Allocate the matrices */ +        A = malloc(matrix_bytes); B = malloc(matrix_bytes); C = malloc(matrix_bytes); -        /* Initialize the matrices */ -        for (i = 0; i < matrix_elements; i++) { -                A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; -        } +        /* Initialize the matrices */ +        for (i = 0; i < matrix_elements; i++) { +                A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; +        } -        printf("Computing SGEMM on the hostn"); -        sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N); +        printf("Computing SGEMM on the hostn"); +        sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N); -        printf("Enabling Automatic Offloadn"); -        /* Alternatively, set environment variable MKL_MIC_ENABLE=1 */ -        mkl_mic_enable(); -        -        int ndevices = mkl_mic_get_device_count(); /* Number of MIC devices */ -        printf("Automatic Offload enabled: %d MIC devices presentn",  ndevices); +        printf("Enabling Automatic Offloadn"); +        /* Alternatively, set environment variable MKL_MIC_ENABLE=1 */ +        mkl_mic_enable(); +        +        int ndevices = mkl_mic_get_device_count(); /* Number of MIC devices */ +        printf("Automatic Offload enabled: %d MIC devices presentn",  ndevices); -        printf("Computing SGEMM with automatic workdivisionn"); -        sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N); +        printf("Computing SGEMM with automatic workdivisionn"); +        sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N); -        /* Free the matrix memory */ -        free(A); free(B); free(C); +        /* Free the matrix memory */ +        free(A); free(B); free(C); -        printf("Donen"); +        printf("Donen"); -    return 0; - } +    return 0; + } Please note: This example is simplified version of an example from MKL. 
The expanded version can be found here: -**$MKL_EXAMPLES/mic_ao/blasc/source/sgemm.c** +$MKL_EXAMPLES/mic_ao/blasc/source/sgemm.c** To compile a code using Intel compiler use: - $ icc -mkl sgemm-ao-short.c -o sgemm + $ icc -mkl sgemm-ao-short.c -o sgemm For debugging purposes enable the offload report to see more information about automatic offloading. - $ export OFFLOAD_REPORT=2 + $ export OFFLOAD_REPORT=2 The output of a code should look similar to following listing, where lines starting with [MKL] are generated by offload reporting: - Computing SGEMM on the host - Enabling Automatic Offload - Automatic Offload enabled: 1 MIC devices present - Computing SGEMM with automatic workdivision - [MKL] [MIC --] [AO Function]   SGEMM - [MKL] [MIC --] [AO SGEMM Workdivision] 0.00 1.00 - [MKL] [MIC 00] [AO SGEMM CPU Time]     0.463351 seconds - [MKL] [MIC 00] [AO SGEMM MIC Time]     0.179608 seconds - [MKL] [MIC 00] [AO SGEMM CPU->MIC Data] 52428800 bytes - [MKL] [MIC 00] [AO SGEMM MIC->CPU Data] 26214400 bytes - Done + Computing SGEMM on the host + Enabling Automatic Offload + Automatic Offload enabled: 1 MIC devices present + Computing SGEMM with automatic workdivision + [MKL] [MIC --] [AO Function]   SGEMM + [MKL] [MIC --] [AO SGEMM Workdivision] 0.00 1.00 + [MKL] [MIC 00] [AO SGEMM CPU Time]     0.463351 seconds + [MKL] [MIC 00] [AO SGEMM MIC Time]     0.179608 seconds + [MKL] [MIC 00] [AO SGEMM CPU->MIC Data] 52428800 bytes + [MKL] [MIC 00] [AO SGEMM MIC->CPU Data] 26214400 bytes + Done  @@ -379,9 +379,9 @@ To compile a code user has to be connected to a compute with MIC and load Intel compilers module. To get an interactive session on a compute node with an Intel Xeon Phi and load the module use following commands: - $ qsub -I -q qmic -A NONE-0-0 + $ qsub -I -q qmic -A NONE-0-0 - $ module load intel/13.5.192 + $ module load intel/13.5.192 Please note that particular version of the Intel module is specified. This information is used later to specify the correct library paths. @@ -391,16 +391,16 @@ to specify "-mmic" compiler flag. Two compilation examples are shown below. The first example shows how to compile OpenMP parallel code "vect-add.c" for host only: - $ icc -xhost -no-offload -fopenmp vect-add.c -o vect-add-host + $ icc -xhost -no-offload -fopenmp vect-add.c -o vect-add-host To run this code on host, use: - $ ./vect-add-host + $ ./vect-add-host The second example shows how to compile the same code for Intel Xeon Phi: - $ icc -mmic -fopenmp vect-add.c -o vect-add-mic + $ icc -mmic -fopenmp vect-add.c -o vect-add-mic ### Execution of the Program in Native Mode on Intel Xeon Phi @@ -411,18 +411,18 @@ have to copy binary files or libraries between the host and accelerator. To connect to the accelerator run: - $ ssh mic0 + $ ssh mic0 If the code is sequential, it can be executed directly: - mic0 $ ~/path_to_binary/vect-add-seq-mic + mic0 $ ~/path_to_binary/vect-add-seq-mic If the code is parallelized using OpenMP a set of additional libraries is required for execution. To locate these libraries new path has to be added to the LD_LIBRARY_PATH environment variable prior to the execution: - mic0 $ export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH + mic0 $ export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH Please note that the path exported in the previous example contains path to a specific compiler (here the version is 5.192). 
This version number @@ -430,142 +430,142 @@ has to match with the version number of the Intel compiler module that was used to compile the code on the host computer. For your information the list of libraries and their location required -for execution of an OpenMP parallel code on Intel Xeon Phi is:<span -class="discreet visualHighlight"></span> +for execution of an OpenMP parallel code on Intel Xeon Phi is: +class="discreet visualHighlight"> -<span>/apps/intel/composer_xe_2013.5.192/compiler/lib/mic +>/apps/intel/composer_xe_2013.5.192/compiler/lib/mic libiomp5.so libimf.so libsvml.so libirng.so libintlc.so.5 -</span> -<span>Finally, to run the compiled code use: </span> - $ ~/path_to_binary/vect-add-mic +>Finally, to run the compiled code use: + + $ ~/path_to_binary/vect-add-mic -<span>OpenCL</span> +>OpenCL ------------------- -<span>OpenCL (Open Computing Language) is an open standard for +>OpenCL (Open Computing Language) is an open standard for general-purpose parallel programming for diverse mix of multi-core CPUs, GPU coprocessors, and other parallel processors. OpenCL provides a flexible execution model and uniform programming environment for software developers to write portable code for systems running on both the CPU and graphics processors or accelerators like the Intel® Xeon -Phi.</span> +Phi. -<span>On Anselm OpenCL is installed only on compute nodes with MIC +>On Anselm OpenCL is installed only on compute nodes with MIC accelerator, therefore OpenCL code can be compiled only on these nodes. -</span> - module load opencl-sdk opencl-rt -<span>Always load "opencl-sdk" (providing devel files like headers) and + module load opencl-sdk opencl-rt + +>Always load "opencl-sdk" (providing devel files like headers) and "opencl-rt" (providing dynamic library libOpenCL.so) modules to compile and link OpenCL code. Load "opencl-rt" for running your compiled code. -</span> -<span>There are two basic examples of OpenCL code in the following -directory: </span> - /apps/intel/opencl-examples/ +>There are two basic examples of OpenCL code in the following +directory: + + /apps/intel/opencl-examples/ -<span>First example "CapsBasic" detects OpenCL compatible hardware, here +>First example "CapsBasic" detects OpenCL compatible hardware, here CPU and MIC, and prints basic information about the capabilities of it. -</span> - /apps/intel/opencl-examples/CapsBasic/capsbasic -<span>To compile and run the example copy it to your home directory, get + /apps/intel/opencl-examples/CapsBasic/capsbasic + +>To compile and run the example copy it to your home directory, get a PBS interactive session on of the nodes with MIC and run make for compilation. Make files are very basic and shows how the OpenCL code can -be compiled on Anselm. </span> +be compiled on Anselm. - $ cp /apps/intel/opencl-examples/CapsBasic/* . - $ qsub -I -q qmic -A NONE-0-0 - $ make + $ cp /apps/intel/opencl-examples/CapsBasic/* . + $ qsub -I -q qmic -A NONE-0-0 + $ make -<span>The compilation command for this example is: </span> +>The compilation command for this example is: - $ g++ capsbasic.cpp -lOpenCL -o capsbasic -I/apps/intel/opencl/include/ + $ g++ capsbasic.cpp -lOpenCL -o capsbasic -I/apps/intel/opencl/include/ -<span>After executing the complied binary file, following output should +>After executing the complied binary file, following output should be displayed. 
-</span> - ./capsbasic - Number of available platforms: 1 - Platform names: -    [0] Intel(R) OpenCL [Selected] - Number of devices available for each type: -    CL_DEVICE_TYPE_CPU: 1 -    CL_DEVICE_TYPE_GPU: 0 -    CL_DEVICE_TYPE_ACCELERATOR: 1 + ./capsbasic + + Number of available platforms: 1 + Platform names: +    [0] Intel(R) OpenCL [Selected] + Number of devices available for each type: +    CL_DEVICE_TYPE_CPU: 1 +    CL_DEVICE_TYPE_GPU: 0 +    CL_DEVICE_TYPE_ACCELERATOR: 1 - *** Detailed information for each device *** + *** Detailed information for each device *** - CL_DEVICE_TYPE_CPU[0] -    CL_DEVICE_NAME:       Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz -    CL_DEVICE_AVAILABLE: 1 + CL_DEVICE_TYPE_CPU[0] +    CL_DEVICE_NAME:       Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz +    CL_DEVICE_AVAILABLE: 1 - ... + ... - CL_DEVICE_TYPE_ACCELERATOR[0] -    CL_DEVICE_NAME: Intel(R) Many Integrated Core Acceleration Card -    CL_DEVICE_AVAILABLE: 1 + CL_DEVICE_TYPE_ACCELERATOR[0] +    CL_DEVICE_NAME: Intel(R) Many Integrated Core Acceleration Card +    CL_DEVICE_AVAILABLE: 1 - ... + ... -<span>More information about this example can be found on Intel website: +>More information about this example can be found on Intel website: <http://software.intel.com/en-us/vcsource/samples/caps-basic/> -</span> -<span>The second example that can be found in -"/apps/intel/opencl-examples" </span><span>directory is General Matrix + +>The second example that can be found in +"/apps/intel/opencl-examples" >directory is General Matrix Multiply. You can follow the the same procedure to download the example to your directory and compile it. -</span> - - $ cp -r /apps/intel/opencl-examples/* . - $ qsub -I -q qmic -A NONE-0-0 - $ cd GEMM - $ make - -<span>The compilation command for this example is: </span> - - $ g++ cmdoptions.cpp gemm.cpp ../common/basic.cpp ../common/cmdparser.cpp ../common/oclobject.cpp -I../common -lOpenCL -o gemm -I/apps/intel/opencl/include/ - -<span>To see the performance of Intel Xeon Phi performing the DGEMM run -the example as follows: </span> - - ./gemm -d 1 - Platforms (1): - [0] Intel(R) OpenCL [Selected] - Devices (2): - [0] Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz - [1] Intel(R) Many Integrated Core Acceleration Card [Selected] - Build program options: "-DT=float -DTILE_SIZE_M=1 -DTILE_GROUP_M=16 -DTILE_SIZE_N=128 -DTILE_GROUP_N=1 -DTILE_SIZE_K=8" - Running gemm_nn kernel with matrix size: 3968x3968 - Memory row stride to ensure necessary alignment: 15872 bytes - Size of memory region for one matrix: 62980096 bytes - Using alpha = 0.57599 and beta = 0.872412 - ... - Host time: 0.292953 sec. - Host perf: 426.635 GFLOPS - Host time: 0.293334 sec. - Host perf: 426.081 GFLOPS - ... - -<span>Please note: GNU compiler is used to compile the OpenCL codes for + + + $ cp -r /apps/intel/opencl-examples/* . 
+ $ qsub -I -q qmic -A NONE-0-0 + $ cd GEMM + $ make + +>The compilation command for this example is: + + $ g++ cmdoptions.cpp gemm.cpp ../common/basic.cpp ../common/cmdparser.cpp ../common/oclobject.cpp -I../common -lOpenCL -o gemm -I/apps/intel/opencl/include/ + +>To see the performance of Intel Xeon Phi performing the DGEMM run +the example as follows: + + ./gemm -d 1 + Platforms (1): + [0] Intel(R) OpenCL [Selected] + Devices (2): + [0] Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz + [1] Intel(R) Many Integrated Core Acceleration Card [Selected] + Build program options: "-DT=float -DTILE_SIZE_M=1 -DTILE_GROUP_M=16 -DTILE_SIZE_N=128 -DTILE_GROUP_N=1 -DTILE_SIZE_K=8" + Running gemm_nn kernel with matrix size: 3968x3968 + Memory row stride to ensure necessary alignment: 15872 bytes + Size of memory region for one matrix: 62980096 bytes + Using alpha = 0.57599 and beta = 0.872412 + ... + Host time: 0.292953 sec. + Host perf: 426.635 GFLOPS + Host time: 0.293334 sec. + Host perf: 426.081 GFLOPS + ... + +>Please note: GNU compiler is used to compile the OpenCL codes for Intel MIC. You do not need to load Intel compiler module. -</span> -<span>MPI </span> + +>MPI ----------------- ### Environment setup and compilation @@ -574,89 +574,89 @@ Again an MPI code for Intel Xeon Phi has to be compiled on a compute node with accelerator and MPSS software stack installed. To get to a compute node with accelerator use: - $ qsub -I -q qmic -A NONE-0-0 + $ qsub -I -q qmic -A NONE-0-0 The only supported implementation of MPI standard for Intel Xeon Phi is Intel MPI. To setup a fully functional development environment a combination of Intel compiler and Intel MPI has to be used. On a host load following modules before compilation: - $ module load intel/13.5.192 impi/4.1.1.036 + $ module load intel/13.5.192 impi/4.1.1.036 To compile an MPI code for host use: - $ mpiicc -xhost -o mpi-test mpi-test.c + $ mpiicc -xhost -o mpi-test mpi-test.c To compile the same code for Intel Xeon Phi architecture use: - $ mpiicc -mmic -o mpi-test-mic mpi-test.c + $ mpiicc -mmic -o mpi-test-mic mpi-test.c An example of basic MPI version of "hello-world" example in C language, that can be executed on both host and Xeon Phi is (can be directly copy and pasted to a .c file) - #include <stdio.h> - #include <mpi.h> + #include <stdio.h> + #include <mpi.h> - int main (argc, argv) -     int argc; -     char *argv[]; - { -  int rank, size; + int main (argc, argv) +     int argc; +     char *argv[]; + { +  int rank, size; -  int len; -  char node[MPI_MAX_PROCESSOR_NAME]; +  int len; +  char node[MPI_MAX_PROCESSOR_NAME]; -  MPI_Init (&argc, &argv);     /* starts MPI */ -  MPI_Comm_rank (MPI_COMM_WORLD, &rank);       /* get current process id */ -  MPI_Comm_size (MPI_COMM_WORLD, &size);       /* get number of processes */ +  MPI_Init (&argc, &argv);     /* starts MPI */ +  MPI_Comm_rank (MPI_COMM_WORLD, &rank);       /* get current process id */ +  MPI_Comm_size (MPI_COMM_WORLD, &size);       /* get number of processes */ -  MPI_Get_processor_name(node,&len); +  MPI_Get_processor_name(node,&len); -  printf( "Hello world from process %d of %d on host %s n", rank, size, node ); -  MPI_Finalize(); -  return 0; - } +  printf( "Hello world from process %d of %d on host %s n", rank, size, node ); +  MPI_Finalize(); +  return 0; + } ### MPI programming models -<span>Intel MPI for the Xeon Phi coprocessors offers different MPI -programming models:</span> +>Intel MPI for the Xeon Phi coprocessors offers different MPI +programming models: -**Host-only 
model** - all MPI ranks reside on the host. The coprocessors +Host-only model** - all MPI ranks reside on the host. The coprocessors can be used by using offload pragmas. (Using MPI calls inside offloaded code is not supported.)** Coprocessor-only model** - all MPI ranks reside only on the coprocessors. -**Symmetric model** - the MPI ranks reside on both the host and the +Symmetric model** - the MPI ranks reside on both the host and the coprocessor. Most general MPI case. -### <span>Host-only model</span> +### >Host-only model -<span></span>In this case all environment variables are set by modules, +>In this case all environment variables are set by modules, so to execute the compiled MPI program on a single node, use: - $ mpirun -np 4 ./mpi-test + $ mpirun -np 4 ./mpi-test The output should be similar to: - Hello world from process 1 of 4 on host cn207 - Hello world from process 3 of 4 on host cn207 - Hello world from process 2 of 4 on host cn207 - Hello world from process 0 of 4 on host cn207 + Hello world from process 1 of 4 on host cn207 + Hello world from process 3 of 4 on host cn207 + Hello world from process 2 of 4 on host cn207 + Hello world from process 0 of 4 on host cn207 ### Coprocessor-only model -<span>There are two ways how to execute an MPI code on a single +>There are two ways how to execute an MPI code on a single coprocessor: 1.) lunch the program using "**mpirun**" from the coprocessor; or 2.) lunch the task using "**mpiexec.hydra**" from a host. -</span> -**Execution on coprocessor** + +Execution on coprocessor** Similarly to execution of OpenMP programs in native mode, since the environmental module are not supported on MIC, user has to setup paths @@ -665,17 +665,17 @@ by creating a "**.profile**" file in user's home directory. This file sets up the environment on the MIC automatically once user access to the accelerator through the SSH. - $ vim ~/.profile + $ vim ~/.profile - PS1='[u@h W]$ ' - export PATH=/usr/bin:/usr/sbin:/bin:/sbin + PS1='[u@h W]$ ' + export PATH=/usr/bin:/usr/sbin:/bin:/sbin - #OpenMP - export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH + #OpenMP + export LD_LIBRARY_PATH=/apps/intel/composer_xe_2013.5.192/compiler/lib/mic:$LD_LIBRARY_PATH - #Intel MPI - export LD_LIBRARY_PATH=/apps/intel/impi/4.1.1.036/mic/lib/:$LD_LIBRARY_PATH - export PATH=/apps/intel/impi/4.1.1.036/mic/bin/:$PATH + #Intel MPI + export LD_LIBRARY_PATH=/apps/intel/impi/4.1.1.036/mic/lib/:$LD_LIBRARY_PATH + export PATH=/apps/intel/impi/4.1.1.036/mic/bin/:$PATH Please note:  - this file sets up both environmental variable for both MPI and OpenMP @@ -687,28 +687,28 @@ to match with loaded modules. 
To access a MIC accelerator located on a node that user is currently connected to, use: - $ ssh mic0 + $ ssh mic0 or in case you need specify a MIC accelerator on a particular node, use: - $ ssh cn207-mic0 + $ ssh cn207-mic0 To run the MPI code in parallel on multiple core of the accelerator, use: - $ mpirun -np 4 ./mpi-test-mic + $ mpirun -np 4 ./mpi-test-mic The output should be similar to: - Hello world from process 1 of 4 on host cn207-mic0 - Hello world from process 2 of 4 on host cn207-mic0 - Hello world from process 3 of 4 on host cn207-mic0 - Hello world from process 0 of 4 on host cn207-mic0 + Hello world from process 1 of 4 on host cn207-mic0 + Hello world from process 2 of 4 on host cn207-mic0 + Hello world from process 3 of 4 on host cn207-mic0 + Hello world from process 0 of 4 on host cn207-mic0 -** -** -**Execution on host** + + +Execution on host** If the MPI program is launched from host instead of the coprocessor, the environmental variables are not set using the ".profile" file. Therefore @@ -718,178 +718,178 @@ user has to specify library paths from the command line when calling First step is to tell mpiexec that the MPI should be executed on a local accelerator by setting up the environmental variable "I_MPI_MIC" - $ export I_MPI_MIC=1 + $ export I_MPI_MIC=1 Now the MPI program can be executed as: - $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic + $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic or using mpirun - $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic + $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ -host mic0 -n 4 ~/mpi-test-mic Please note:  - the full path to the binary has to specified (here: -"**<span>~/mpi-test-mic</span>**") +"**>~/mpi-test-mic**")  - the LD_LIBRARY_PATH has to match with Intel MPI module used to compile the MPI code The output should be again similar to: - Hello world from process 1 of 4 on host cn207-mic0 - Hello world from process 2 of 4 on host cn207-mic0 - Hello world from process 3 of 4 on host cn207-mic0 - Hello world from process 0 of 4 on host cn207-mic0 + Hello world from process 1 of 4 on host cn207-mic0 + Hello world from process 2 of 4 on host cn207-mic0 + Hello world from process 3 of 4 on host cn207-mic0 + Hello world from process 0 of 4 on host cn207-mic0 Please note that the "mpiexec.hydra" requires a file -"**<span>pmi_proxy</span>**" from Intel MPI library to be copied to the +"**>pmi_proxy**" from Intel MPI library to be copied to the MIC filesystem. If the file is missing please contact the system administrators. A simple test to see if the file is present is to execute: -   $ ssh mic0 ls /bin/pmi_proxy -  /bin/pmi_proxy +   $ ssh mic0 ls /bin/pmi_proxy +  /bin/pmi_proxy + -** -** -**Execution on host - MPI processes distributed over multiple + +Execution on host - MPI processes distributed over multiple accelerators on multiple nodes** -<span>To get access to multiple nodes with MIC accelerator, user has to +>To get access to multiple nodes with MIC accelerator, user has to use PBS to allocate the resources. 
To start interactive session, that allocates 2 compute nodes = 2 MIC accelerators run qsub command with -following parameters: </span> +following parameters: - $ qsub -I -q qmic -A NONE-0-0 -l select=2:ncpus=16 + $ qsub -I -q qmic -A NONE-0-0 -l select=2:ncpus=16 - $ module load intel/13.5.192 impi/4.1.1.036 + $ module load intel/13.5.192 impi/4.1.1.036 -<span>This command connects user through ssh to one of the nodes +>This command connects user through ssh to one of the nodes immediately. To see the other nodes that have been allocated use: -</span> - $ cat $PBS_NODEFILE -<span>For example: </span> + $ cat $PBS_NODEFILE + +>For example: - cn204.bullx - cn205.bullx + cn204.bullx + cn205.bullx -<span>This output means that the PBS allocated nodes cn204 and cn205, +>This output means that the PBS allocated nodes cn204 and cn205, which means that user has direct access to "**cn204-mic0**" and -"**cn-205-mic0**" accelerators.</span> +"**cn-205-mic0**" accelerators. -<span>Please note: At this point user can connect to any of the +>Please note: At this point user can connect to any of the allocated nodes or any of the allocated MIC accelerators using ssh: -- to connect to the second node : **<span class="monospace">$ ssh -cn205</span>** -<span>- to connect to the accelerator on the first node from the first -node: <span class="monospace">**$ ssh cn204-mic0**</span></span> or -**<span class="monospace">$ ssh mic0</span>** -**-** to connect to the accelerator on the second node from the first -node: <span class="monospace">**$ ssh cn205-mic0**</span> -</span> - -<span>At this point we expect that correct modules are loaded and binary -is compiled. For parallel execution the mpiexec.hydra is used.</span> +- to connect to the second node : ** $ ssh +cn205** +>- to connect to the accelerator on the first node from the first +node: **$ ssh cn204-mic0** or + $ ssh mic0** +-** to connect to the accelerator on the second node from the first +node: **$ ssh cn205-mic0** + + +>At this point we expect that correct modules are loaded and binary +is compiled. For parallel execution the mpiexec.hydra is used. Again the first step is to tell mpiexec that the MPI can be executed on MIC accelerators by setting up the environmental variable "I_MPI_MIC" - $ export I_MPI_MIC=1 + $ export I_MPI_MIC=1 -<span>The launch the MPI program use:</span> +>The launch the MPI program use: - $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ - -genv I_MPI_FABRICS_LIST tcp -  -genv I_MPI_FABRICS shm:tcp -  -genv I_MPI_TCP_NETMASK=10.1.0.0/16 - -host cn204-mic0 -n 4 ~/mpi-test-mic - : -host cn205-mic0 -n 6 ~/mpi-test-mic + $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ + -genv I_MPI_FABRICS_LIST tcp +  -genv I_MPI_FABRICS shm:tcp +  -genv I_MPI_TCP_NETMASK=10.1.0.0/16 + -host cn204-mic0 -n 4 ~/mpi-test-mic + : -host cn205-mic0 -n 6 ~/mpi-test-mic or using mpirun: - $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ - -genv I_MPI_FABRICS_LIST tcp -  -genv I_MPI_FABRICS shm:tcp -  -genv I_MPI_TCP_NETMASK=10.1.0.0/16 - -host cn204-mic0 -n 4 ~/mpi-test-mic - : -host cn205-mic0 -n 6 ~/mpi-test-mic + $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ + -genv I_MPI_FABRICS_LIST tcp +  -genv I_MPI_FABRICS shm:tcp +  -genv I_MPI_TCP_NETMASK=10.1.0.0/16 + -host cn204-mic0 -n 4 ~/mpi-test-mic + : -host cn205-mic0 -n 6 ~/mpi-test-mic In this case four MPI processes are executed on accelerator cn204-mic and six processes are executed on accelerator cn205-mic0. 
The sample output (sorted after execution) is:

-    Hello world from process 0 of 10 on host cn204-mic0
-    Hello world from process 1 of 10 on host cn204-mic0
-    Hello world from process 2 of 10 on host cn204-mic0
-    Hello world from process 3 of 10 on host cn204-mic0
-    Hello world from process 4 of 10 on host cn205-mic0
-    Hello world from process 5 of 10 on host cn205-mic0
-    Hello world from process 6 of 10 on host cn205-mic0
-    Hello world from process 7 of 10 on host cn205-mic0
-    Hello world from process 8 of 10 on host cn205-mic0
-    Hello world from process 9 of 10 on host cn205-mic0
+  Hello world from process 0 of 10 on host cn204-mic0
+  Hello world from process 1 of 10 on host cn204-mic0
+  Hello world from process 2 of 10 on host cn204-mic0
+  Hello world from process 3 of 10 on host cn204-mic0
+  Hello world from process 4 of 10 on host cn205-mic0
+  Hello world from process 5 of 10 on host cn205-mic0
+  Hello world from process 6 of 10 on host cn205-mic0
+  Hello world from process 7 of 10 on host cn205-mic0
+  Hello world from process 8 of 10 on host cn205-mic0
+  Hello world from process 9 of 10 on host cn205-mic0

In the same way, the MPI program can be executed on multiple hosts:

-    $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/
-     -genv I_MPI_FABRICS_LIST tcp
-     -genv I_MPI_FABRICS shm:tcp
-     -genv I_MPI_TCP_NETMASK=10.1.0.0/16
-     -host cn204 -n 4 ~/mpi-test
-    : -host cn205 -n 6 ~/mpi-test
+  $ mpiexec.hydra -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ \
+  -genv I_MPI_FABRICS_LIST tcp \
+  -genv I_MPI_FABRICS shm:tcp \
+  -genv I_MPI_TCP_NETMASK=10.1.0.0/16 \
+  -host cn204 -n 4 ~/mpi-test \
+  : -host cn205 -n 6 ~/mpi-test

-### <span>Symmetric model </span>
+### Symmetric model

-<span>In a symmetric mode MPI programs are executed on both host
+In a symmetric mode, MPI programs are executed on both the host
computer(s) and MIC accelerator(s). Since the MIC has a different
architecture and requires a different binary produced by the Intel
compiler, two different binaries have to be compiled before the MPI program is
-executed. </span>
+executed.

-<span>In the previous section we have compiled two binary files, one for
+In the previous section we have compiled two binary files, one for
hosts "**mpi-test**" and one for MIC accelerators "**mpi-test-mic**".
These two binaries can be executed at once using mpiexec.hydra:
-</span>

-    $ mpiexec.hydra
-    -genv I_MPI_FABRICS_LIST tcp
-    -genv I_MPI_FABRICS shm:tcp
-     -genv I_MPI_TCP_NETMASK=10.1.0.0/16
-    -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/
-    -host cn205 -n 2 ~/mpi-test
-    : -host cn205-mic0 -n 2 ~/mpi-test-mic
+
+  $ mpiexec.hydra \
+  -genv I_MPI_FABRICS_LIST tcp \
+  -genv I_MPI_FABRICS shm:tcp \
+  -genv I_MPI_TCP_NETMASK=10.1.0.0/16 \
+  -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ \
+  -host cn205 -n 2 ~/mpi-test \
+  : -host cn205-mic0 -n 2 ~/mpi-test-mic

In this example the first parameters (lines 2 to 5) set up the required
environment variables for execution. The -host cn205 line specifies the
binary that is executed on the host (here cn205) and the last line specifies
the binary that is executed on the accelerator (here cn205-mic0). 
-<span>The output of the program is: </span>
+The output of the program is:

-    Hello world from process 0 of 4 on host cn205
-    Hello world from process 1 of 4 on host cn205
-    Hello world from process 2 of 4 on host cn205-mic0
-    Hello world from process 3 of 4 on host cn205-mic0
+  Hello world from process 0 of 4 on host cn205
+  Hello world from process 1 of 4 on host cn205
+  Hello world from process 2 of 4 on host cn205-mic0
+  Hello world from process 3 of 4 on host cn205-mic0

-<span>The execution procedure can be simplified by using the mpirun
+The execution procedure can be simplified by using the mpirun
command with a machine file as a parameter. The machine file contains a list
of all nodes and accelerators that should be used to execute MPI processes.
-</span>

-<span>An example of a machine file that uses 2 <span>hosts (**cn205**
+
+An example of a machine file that uses 2 hosts (**cn205**
and **cn206**) and 2 accelerators (**cn205-mic0** and **cn206-mic0**) to
-run 2 MPI processes on each</span> of them:
-</span>
+run 2 MPI processes on each of them:
+

-    $ cat hosts_file_mix
-    cn205:2
-    cn205-mic0:2
-    cn206:2
-    cn206-mic0:2
+  $ cat hosts_file_mix
+  cn205:2
+  cn205-mic0:2
+  cn206:2
+  cn206-mic0:2

-<span>In addition if a naming convention is set in a way that the name
+In addition, if a naming convention is set in a way that the name
of the binary for the host is **"bin_name"** and the name of the binary for
the accelerator is **"bin_name-mic"**, then by setting the
environment variable **I_MPI_MIC_POSTFIX** to **"-mic"** the user does not
@@ -897,50 +897,50 @@ have to specify the names of both binaries. In this case mpirun needs
just the name of the host binary file (i.e. "mpi-test") and uses the
suffix to get the name of the binary for the accelerator (i.e.
"mpi-test-mic"). 
-</span> - $ export I_MPI_MIC_POSTFIX=-mic - <span>To run the MPI code using mpirun and the machine file + $ export I_MPI_MIC_POSTFIX=-mic + + >To run the MPI code using mpirun and the machine file "hosts_file_mix" use: -</span> - - $ mpirun - -genv I_MPI_FABRICS shm:tcp - -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ - -genv I_MPI_FABRICS_LIST tcp -  -genv I_MPI_FABRICS shm:tcp -  -genv I_MPI_TCP_NETMASK=10.1.0.0/16 - -machinefile hosts_file_mix - ~/mpi-test - -<span>A possible output of the MPI "hello-world" example executed on two + + + $ mpirun + -genv I_MPI_FABRICS shm:tcp + -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ + -genv I_MPI_FABRICS_LIST tcp +  -genv I_MPI_FABRICS shm:tcp +  -genv I_MPI_TCP_NETMASK=10.1.0.0/16 + -machinefile hosts_file_mix + ~/mpi-test + +>A possible output of the MPI "hello-world" example executed on two hosts and two accelerators is: -</span> - Hello world from process 0 of 8 on host cn204 - Hello world from process 1 of 8 on host cn204 - Hello world from process 2 of 8 on host cn204-mic0 - Hello world from process 3 of 8 on host cn204-mic0 - Hello world from process 4 of 8 on host cn205 - Hello world from process 5 of 8 on host cn205 - Hello world from process 6 of 8 on host cn205-mic0 - Hello world from process 7 of 8 on host cn205-mic0 + + Hello world from process 0 of 8 on host cn204 + Hello world from process 1 of 8 on host cn204 + Hello world from process 2 of 8 on host cn204-mic0 + Hello world from process 3 of 8 on host cn204-mic0 + Hello world from process 4 of 8 on host cn205 + Hello world from process 5 of 8 on host cn205 + Hello world from process 6 of 8 on host cn205-mic0 + Hello world from process 7 of 8 on host cn205-mic0 Please note: At this point the MPI communication between MIC accelerators on different nodes uses 1Gb Ethernet only. -**Using the PBS automatically generated node-files -** +Using the PBS automatically generated node-files + PBS also generates a set of node-files that can be used instead of manually creating a new one every time. Three node-files are genereated: -**Host only node-file:** +Host only node-file:**  - /lscratch/$/nodefile-cn -**MIC only node-file**: +MIC only node-file**:  - /lscratch/$/nodefile-mic -**Host and MIC node-file**: +Host and MIC node-file**:  - /lscratch/$/nodefile-mix Please note each host or accelerator is listed only per files. User has @@ -952,8 +952,8 @@ Optimization For more details about optimization techniques please read Intel document [Optimization and Performance Tuning for Intel® Xeon Phi™ -Coprocessors](http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-1-optimization "http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-1-optimization"){.external -.text}. +Coprocessors](http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-1-optimization "http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-1-optimization") +.  
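The PBS generated node-files listed above can be passed to mpirun directly instead of a hand written hosts_file_mix. The following is only a sketch: the /lscratch/$PBS_JOBID path is an assumption (the variable name was lost in the conversion above), the library path has to match the loaded Intel MPI module, and the number of processes per node may still need to be given explicitly.

    # assumption: the node-files are generated in /lscratch/$PBS_JOBID
    $ export I_MPI_MIC_POSTFIX=-mic
    $ mpirun -genv LD_LIBRARY_PATH /apps/intel/impi/4.1.1.036/mic/lib/ \
      -genv I_MPI_FABRICS shm:tcp \
      -genv I_MPI_TCP_NETMASK=10.1.0.0/16 \
      -machinefile /lscratch/$PBS_JOBID/nodefile-mix \
      ~/mpi-test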
diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/isv_licenses.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/isv_licenses.md index bc415273d0369792f82cc63521902dd456817b16..e3bc7b71e23e9984e6db0d57308e74e2562ae110 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/isv_licenses.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/isv_licenses.md @@ -4,7 +4,7 @@ ISV Licenses A guide to managing Independent Software Vendor licences - + On Anselm cluster there are also installed commercial software applications, also known as ISV (Independent Software Vendor), which are @@ -40,15 +40,15 @@ number of free license features For each license there is a unique text file, which provides the information about the name, number of available (purchased/licensed), number of used and number of free license features. The text files are -accessible from the Anselm command prompt.[]() +accessible from the Anselm command prompt. - Product File with license state Note - ------------ ----------------------------------------------------- --------------------- - ansys /apps/user/licenses/ansys_features_state.txt Commercial - comsol /apps/user/licenses/comsol_features_state.txt Commercial - comsol-edu /apps/user/licenses/comsol-edu_features_state.txt Non-commercial only - matlab /apps/user/licenses/matlab_features_state.txt Commercial - matlab-edu /apps/user/licenses/matlab-edu_features_state.txt Non-commercial only +Product File with license state Note +------------ ----------------------------------------------------- --------------------- +ansys /apps/user/licenses/ansys_features_state.txt Commercial +comsol /apps/user/licenses/comsol_features_state.txt Commercial +comsol-edu /apps/user/licenses/comsol-edu_features_state.txt Non-commercial only +matlab /apps/user/licenses/matlab_features_state.txt Commercial +matlab-edu /apps/user/licenses/matlab-edu_features_state.txt Non-commercial only The file has a header which serves as a legend. All the info in the legend starts with a hash (#) so it can be easily filtered when parsing @@ -56,22 +56,22 @@ the file via a script. Example of the Commercial Matlab license state: - $ cat /apps/user/licenses/matlab_features_state.txt - # matlab - # ------------------------------------------------- - # FEATURE TOTAL USED AVAIL - # ------------------------------------------------- - MATLAB 1 1 0 - SIMULINK 1 0 1 - Curve_Fitting_Toolbox 1 0 1 - Signal_Blocks 1 0 1 - GADS_Toolbox 1 0 1 - Image_Toolbox 1 0 1 - Compiler 1 0 1 - Neural_Network_Toolbox 1 0 1 - Optimization_Toolbox 1 0 1 - Signal_Toolbox 1 0 1 - Statistics_Toolbox 1 0 1 + $ cat /apps/user/licenses/matlab_features_state.txt + # matlab + # ------------------------------------------------- + # FEATURE TOTAL USED AVAIL + # ------------------------------------------------- + MATLAB 1 1 0 + SIMULINK 1 0 1 + Curve_Fitting_Toolbox 1 0 1 + Signal_Blocks 1 0 1 + GADS_Toolbox 1 0 1 + Image_Toolbox 1 0 1 + Compiler 1 0 1 + Neural_Network_Toolbox 1 0 1 + Optimization_Toolbox 1 0 1 + Signal_Toolbox 1 0 1 + Statistics_Toolbox 1 0 1 License tracking in PBS Pro scheduler and users usage ----------------------------------------------------- @@ -79,21 +79,21 @@ License tracking in PBS Pro scheduler and users usage Each feature of each license is accounted and checked by the scheduler of PBS Pro. If you ask for certain licences, the scheduler won't start the job until the asked licenses are free (available). 
This prevents to -crash batch jobs, just because of <span id="result_box" -class="short_text"><span class="hps">unavailability</span></span> of the +crash batch jobs, just because of id="result_box" +class="short_text"> class="hps">unavailability of the needed licenses. The general format of the name is: -**feature__APP__FEATURE** +feature__APP__FEATURE** Names of applications (APP): -- ansys -- comsol -- comsol-edu -- matlab -- matlab-edu +- ansys +- comsol +- comsol-edu +- matlab +- matlab-edu  @@ -207,7 +207,7 @@ Example of PBS Pro resource name, based on APP and FEATURE name: </tbody> </table> -**Be aware, that the resource names in PBS Pro are CASE SENSITIVE!** +Be aware, that the resource names in PBS Pro are CASE SENSITIVE!** ### Example of qsub statement @@ -215,7 +215,7 @@ Run an interactive PBS job with 1 Matlab EDU license, 1 Distributed Computing Toolbox and 32 Distributed Computing Engines (running on 32 cores): - $ qsub -I -q qprod -A PROJECT_ID -l select=2:ncpus=16 -l feature__matlab-edu__MATLAB=1 -l feature__matlab-edu__Distrib_Computing_Toolbox=1 -l feature__matlab-edu__MATLAB_Distrib_Comp_Engine=32 + $ qsub -I -q qprod -A PROJECT_ID -l select=2:ncpus=16 -l feature__matlab-edu__MATLAB=1 -l feature__matlab-edu__Distrib_Computing_Toolbox=1 -l feature__matlab-edu__MATLAB_Distrib_Comp_Engine=32 The license is used and accounted only with the real usage of the product. So in this example, the general Matlab is used after Matlab is diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/java.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/java.md index 9910b24fd07042b164bea0f5f87a971dee51f52d..6bf87d7f92753e8ebdc89be81bde7e474c533c6c 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/java.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/java.md @@ -4,26 +4,26 @@ Java Java on ANSELM - + Java is available on Anselm cluster. Activate java by loading the java module - $ module load java + $ module load java Note that the java module must be loaded on the compute nodes as well, in order to run java on compute nodes. Check for java version and path - $ java -version - $ which java + $ java -version + $ which java With the module loaded, not only the runtime environment (JRE), but also the development environment (JDK) with the compiler is available. - $ javac -version - $ which javac + $ javac -version + $ which javac Java applications may use MPI for interprocess communication, in conjunction with OpenMPI. Read more diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/kvirtualization.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/kvirtualization.md index 1ff0abeaa366812b5189450bfb24168404b5a8ed..0eec6016681350d406141600d390b1945e7246ef 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/kvirtualization.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/kvirtualization.md @@ -4,7 +4,7 @@ Virtualization Running virtual machines on compute nodes - + Introduction ------------ @@ -12,14 +12,14 @@ Introduction There are situations when Anselm's environment is not suitable for user needs. 
-- Application requires different operating system (e.g Windows), - application is not available for Linux -- Application requires different versions of base system libraries and - tools -- Application requires specific setup (installation, configuration) of - complex software stack -- Application requires privileged access to operating system -- ... and combinations of above cases +- Application requires different operating system (e.g Windows), + application is not available for Linux +- Application requires different versions of base system libraries and + tools +- Application requires specific setup (installation, configuration) of + complex software stack +- Application requires privileged access to operating system +- ... and combinations of above cases  We offer solution for these cases - **virtualization**. Anselm's environment gives the possibility to run virtual machines on compute @@ -50,8 +50,8 @@ Virtualization has also some drawbacks, it is not so easy to setup efficient solution. Solution described in chapter -[HOWTO](virtualization.html#howto)<span -class="anchor-link"> is suitable for </span>single node tasks, does not +[HOWTO](virtualization.html#howto) +class="anchor-link"> is suitable for single node tasks, does not introduce virtual machine clustering. Please consider virtualization as last resort solution for your needs. @@ -67,8 +67,8 @@ Licensing --------- IT4Innovations does not provide any licenses for operating systems and -software of virtual machines. Users are (<span id="result_box"><span -class="hps">in accordance with</span></span> [Acceptable use policy +software of virtual machines. Users are ( id="result_box"> +class="hps">in accordance with [Acceptable use policy document](http://www.it4i.cz/acceptable-use-policy.pdf)) fully responsible for licensing all software running in virtual machines on Anselm. Be aware of complex conditions of licensing software in @@ -77,10 +77,10 @@ virtual environments. Users are responsible for licensing OS e.g. MS Windows and all software running in their virtual machines. -[]() HOWTO + HOWTO ---------- -### []()Virtual Machine Job Workflow +### Virtual Machine Job Workflow We propose this job workflow: @@ -102,11 +102,11 @@ concurrent jobs. ### Procedure -1. Prepare image of your virtual machine -2. Optimize image of your virtual machine for Anselm's virtualization -3. Modify your image for running jobs -4. Create job script for executing virtual machine -5. Run jobs +1.Prepare image of your virtual machine +2.Optimize image of your virtual machine for Anselm's virtualization +3.Modify your image for running jobs +4.Create job script for executing virtual machine +5.Run jobs ### Prepare image of your virtual machine @@ -114,15 +114,15 @@ You can either use your existing image or create new image from scratch. QEMU currently supports these image types or formats: -- raw -- cloop -- cow -- qcow -- qcow2 -- vmdk - VMware 3 & 4, or 6 image format, for exchanging images with - that product -- vdi - VirtualBox 1.1 compatible image format, for exchanging images - with VirtualBox. +- raw +- cloop +- cow +- qcow +- qcow2 +- vmdk - VMware 3 & 4, or 6 image format, for exchanging images with + that product +- vdi - VirtualBox 1.1 compatible image format, for exchanging images + with VirtualBox. You can convert your existing image using qemu-img convert command. 
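For illustration, converting an existing VirtualBox disk into the qcow2 format could look like this; the file names are examples only:

    # convert a VirtualBox vdi image to qcow2 (file names are examples only)
    $ qemu-img convert -f vdi -O qcow2 my-machine.vdi win.img

    # check the resulting image
    $ qemu-img info win.img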
Supported formats of this command are: blkdebug blkverify bochs cloop @@ -143,16 +143,16 @@ information see [Virtio Linux](http://www.linux-kvm.org/page/Virtio) and [Virtio Windows](http://www.linux-kvm.org/page/WindowsGuestDrivers/Download_Drivers). -Disable all <span id="result_box" class="short_text"><span -class="hps trans-target-highlight">unnecessary</span></span> services +Disable all id="result_box" class="short_text"> +class="hps trans-target-highlight">unnecessary services and tasks. Restrict all unnecessary operating system operations. -Remove all <span id="result_box" class="short_text"><span +Remove all id="result_box" class="short_text"> class="hps trans-target-highlight">unnecessary software and -files.</span></span> +files. -<span id="result_box" class="short_text"><span -class="hps trans-target-highlight"></span></span>Remove all paging + id="result_box" class="short_text"> +class="hps trans-target-highlight">Remove all paging space, swap files, partitions, etc. Shrink your image. (It is recommended to zero all free space and @@ -169,8 +169,8 @@ We recommend, that startup script maps Job Directory from host (from compute node) runs script (we call it "run script") from Job Directory and waits for application's exit -- for management purposes if run script does not exist wait for some - time period (few minutes) +- for management purposes if run script does not exist wait for some + time period (few minutes) shutdowns/quits OS For Windows operating systems we suggest using Local Group Policy @@ -179,42 +179,42 @@ script or similar service. Example startup script for Windows virtual machine: - @echo off - set LOG=c:startup.log - set MAPDRIVE=z: - set SCRIPT=%MAPDRIVE%run.bat - set TIMEOUT=300 + @echo off + set LOG=c:startup.log + set MAPDRIVE=z: + set SCRIPT=%MAPDRIVE%run.bat + set TIMEOUT=300 - echo %DATE% %TIME% Running startup script>%LOG% + echo %DATE% %TIME% Running startup script>%LOG% - rem Mount share - echo %DATE% %TIME% Mounting shared drive>>%LOG% - net use z: 10.0.2.4qemu >>%LOG% 2>&1 - dir z: >>%LOG% 2>&1 - echo. >>%LOG% + rem Mount share + echo %DATE% %TIME% Mounting shared drive>>%LOG% + net use z: 10.0.2.4qemu >>%LOG% 2>&1 + dir z: >>%LOG% 2>&1 + echo. >>%LOG% - if exist %MAPDRIVE% ( -  echo %DATE% %TIME% The drive "%MAPDRIVE%" exists>>%LOG% + if exist %MAPDRIVE% ( +  echo %DATE% %TIME% The drive "%MAPDRIVE%" exists>>%LOG% -  if exist %SCRIPT% ( -    echo %DATE% %TIME% The script file "%SCRIPT%"exists>>%LOG% -    echo %DATE% %TIME% Running script %SCRIPT%>>%LOG% -    set TIMEOUT=0 -    call %SCRIPT% -  ) else ( -    echo %DATE% %TIME% The script file "%SCRIPT%"does not exist>>%LOG% -  ) +  if exist %SCRIPT% ( +    echo %DATE% %TIME% The script file "%SCRIPT%"exists>>%LOG% +    echo %DATE% %TIME% Running script %SCRIPT%>>%LOG% +    set TIMEOUT=0 +    call %SCRIPT% +  ) else ( +    echo %DATE% %TIME% The script file "%SCRIPT%"does not exist>>%LOG% +  ) - ) else ( -  echo %DATE% %TIME% The drive "%MAPDRIVE%" does not exist>>%LOG% - ) - echo. >>%LOG% + ) else ( +  echo %DATE% %TIME% The drive "%MAPDRIVE%" does not exist>>%LOG% + ) + echo. >>%LOG% - timeout /T %TIMEOUT% + timeout /T %TIMEOUT% - echo %DATE% %TIME% Shut down>>%LOG% - shutdown /s /t 0 + echo %DATE% %TIME% Shut down>>%LOG% + shutdown /s /t 0 Example startup script maps shared job script as drive z: and looks for run script called run.bat. If run script is found it is run else wait @@ -222,53 +222,53 @@ for 5 minutes, then shutdown virtual machine. 
### Create job script for executing virtual machine -Create job script according recommended <span id="result_box" -class="short_text"><span -class="hps trans-target-highlight"></span></span>[Virtual Machine Job +Create job script according recommended id="result_box" +class="short_text"> +class="hps trans-target-highlight">[Virtual Machine Job Workflow](virtualization.html#virtual-machine-job-workflow). Example job for Windows virtual machine: - #/bin/sh - - JOB_DIR=/scratch/$USER/win/$ - - #Virtual machine settings - VM_IMAGE=~/work/img/win.img - VM_MEMORY=49152 - VM_SMP=16 - - # Prepare job dir - mkdir -p $ && cd $ || exit 1 - ln -s ~/work/win . - ln -s /scratch/$USER/data . - ln -s ~/work/win/script/run/run-appl.bat run.bat - - # Run virtual machine - export TMPDIR=/lscratch/$ - module add qemu - qemu-system-x86_64 -  -enable-kvm -  -cpu host -  -smp $ -  -m $ -  -vga std -  -localtime -  -usb -usbdevice tablet -  -device virtio-net-pci,netdev=net0 -  -netdev user,id=net0,smb=$,hostfwd=tcp::3389-:3389 -  -drive file=$,media=disk,if=virtio -  -snapshot -  -nographic + #/bin/sh + + JOB_DIR=/scratch/$USER/win/$ + + #Virtual machine settings + VM_IMAGE=~/work/img/win.img + VM_MEMORY=49152 + VM_SMP=16 + + # Prepare job dir + mkdir -p $ && cd $ || exit 1 + ln -s ~/work/win . + ln -s /scratch/$USER/data . + ln -s ~/work/win/script/run/run-appl.bat run.bat + + # Run virtual machine + export TMPDIR=/lscratch/$ + module add qemu + qemu-system-x86_64 +  -enable-kvm +  -cpu host +  -smp $ +  -m $ +  -vga std +  -localtime +  -usb -usbdevice tablet +  -device virtio-net-pci,netdev=net0 +  -netdev user,id=net0,smb=$,hostfwd=tcp::3389-:3389 +  -drive file=$,media=disk,if=virtio +  -snapshot +  -nographic Job script links application data (win), input data (data) and run script (run.bat) into job directory and runs virtual machine. Example run script (run.bat) for Windows virtual machine: - z: - cd winappl - call application.bat z:data z:output + z: + cd winappl + call application.bat z:data z:output Run script runs application from shared job directory (mapped as drive z:), process input data (z:data) from job directory and store output @@ -287,17 +287,17 @@ work on login nodes. Load QEMU environment module: - $ module add qemu + $ module add qemu Get help - $ man qemu + $ man qemu Run virtual machine (simple) - $ qemu-system-x86_64 -hda linux.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -vnc :0 + $ qemu-system-x86_64 -hda linux.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -vnc :0 - $ qemu-system-x86_64 -hda win.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -localtime -usb -usbdevice tablet -vnc :0 + $ qemu-system-x86_64 -hda win.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -localtime -usb -usbdevice tablet -vnc :0 You can access virtual machine by VNC viewer (option -vnc) connecting to IP address of compute node. For VNC you must use [VPN @@ -305,16 +305,16 @@ network](../../accessing-the-cluster/vpn-access.html). 
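If you are building a virtual machine from scratch instead of reusing an existing image, create an empty disk image first; a minimal sketch, with the format and size chosen arbitrarily here:

    # create an empty 40 GB qcow2 disk for the guest installation
    $ qemu-img create -f qcow2 linux.img 40G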
Install virtual machine from iso file - $ qemu-system-x86_64 -hda linux.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -cdrom linux-install.iso -boot d -vnc :0 + $ qemu-system-x86_64 -hda linux.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -cdrom linux-install.iso -boot d -vnc :0 - $ qemu-system-x86_64 -hda win.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -localtime -usb -usbdevice tablet -cdrom win-install.iso -boot d -vnc :0 + $ qemu-system-x86_64 -hda win.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -localtime -usb -usbdevice tablet -cdrom win-install.iso -boot d -vnc :0 Run virtual machine using optimized devices, user network backend with sharing and port forwarding, in snapshot mode - $ qemu-system-x86_64 -drive file=linux.img,media=disk,if=virtio -enable-kvm -cpu host -smp 16 -m 32768 -vga std -device virtio-net-pci,netdev=net0 -netdev user,id=net0,smb=/scratch/$USER/tmp,hostfwd=tcp::2222-:22 -vnc :0 -snapshot + $ qemu-system-x86_64 -drive file=linux.img,media=disk,if=virtio -enable-kvm -cpu host -smp 16 -m 32768 -vga std -device virtio-net-pci,netdev=net0 -netdev user,id=net0,smb=/scratch/$USER/tmp,hostfwd=tcp::2222-:22 -vnc :0 -snapshot - $ qemu-system-x86_64 -drive file=win.img,media=disk,if=virtio -enable-kvm -cpu host -smp 16 -m 32768 -vga std -localtime -usb -usbdevice tablet -device virtio-net-pci,netdev=net0 -netdev user,id=net0,smb=/scratch/$USER/tmp,hostfwd=tcp::3389-:3389 -vnc :0 -snapshot + $ qemu-system-x86_64 -drive file=win.img,media=disk,if=virtio -enable-kvm -cpu host -smp 16 -m 32768 -vga std -localtime -usb -usbdevice tablet -device virtio-net-pci,netdev=net0 -netdev user,id=net0,smb=/scratch/$USER/tmp,hostfwd=tcp::3389-:3389 -vnc :0 -snapshot Thanks to port forwarding you can access virtual machine via SSH (Linux) or RDP (Windows) connecting to IP address of compute node (and port 2222 @@ -338,22 +338,22 @@ have access to Anselm's network via NAT on compute node (host). Simple network setup - $ qemu-system-x86_64 ... -net nic -net user + $ qemu-system-x86_64 ... -net nic -net user (It is default when no -net options are given.) Simple network setup with sharing and port forwarding (obsolete but simpler syntax, lower performance) - $ qemu-system-x86_64 ... -net nic -net user,smb=/scratch/$USER/tmp,hostfwd=tcp::3389-:3389 + $ qemu-system-x86_64 ... -net nic -net user,smb=/scratch/$USER/tmp,hostfwd=tcp::3389-:3389 Optimized network setup with sharing and port forwarding - $ qemu-system-x86_64 ... -device virtio-net-pci,netdev=net0 -netdev user,id=net0,smb=/scratch/$USER/tmp,hostfwd=tcp::2222-:22 + $ qemu-system-x86_64 ... -device virtio-net-pci,netdev=net0 -netdev user,id=net0,smb=/scratch/$USER/tmp,hostfwd=tcp::2222-:22 ### Advanced networking -**Internet access** +Internet access** Sometime your virtual machine needs access to internet (install software, updates, software activation, etc). We suggest solution using @@ -364,35 +364,35 @@ performance, the worst performance of all described solutions. Load VDE enabled QEMU environment module (unload standard QEMU module first if necessary). - $ module add qemu/2.1.2-vde2 + $ module add qemu/2.1.2-vde2 Create virtual network switch. - $ vde_switch -sock /tmp/sw0 -mgmt /tmp/sw0.mgmt -daemon + $ vde_switch -sock /tmp/sw0 -mgmt /tmp/sw0.mgmt -daemon Run SLIRP daemon over SSH tunnel on login node and connect it to virtual network switch. 
- $ dpipe vde_plug /tmp/sw0 = ssh login1 $VDE2_DIR/bin/slirpvde -s - --dhcp & + $ dpipe vde_plug /tmp/sw0 = ssh login1 $VDE2_DIR/bin/slirpvde -s - --dhcp & Run qemu using vde network backend, connect to created virtual switch. Basic setup (obsolete syntax) - $ qemu-system-x86_64 ... -net nic -net vde,sock=/tmp/sw0 + $ qemu-system-x86_64 ... -net nic -net vde,sock=/tmp/sw0 Setup using virtio device (obsolete syntax) - $ qemu-system-x86_64 ... -net nic,model=virtio -net vde,sock=/tmp/sw0 + $ qemu-system-x86_64 ... -net nic,model=virtio -net vde,sock=/tmp/sw0 Optimized setup - $ qemu-system-x86_64 ... -device virtio-net-pci,netdev=net0 -netdev vde,id=net0,sock=/tmp/sw0 + $ qemu-system-x86_64 ... -device virtio-net-pci,netdev=net0 -netdev vde,id=net0,sock=/tmp/sw0 + + -** -** -**TAP interconnect** +TAP interconnect** Both user and vde network backend have low performance. For fast interconnect (10Gbps and more) of compute node (host) and virtual @@ -404,8 +404,8 @@ networking, so you should provide your services if you need them. Run qemu with TAP network backend: - $ qemu-system-x86_64 ... -device virtio-net-pci,netdev=net1 - -netdev tap,id=net1,ifname=tap0,script=no,downscript=no + $ qemu-system-x86_64 ... -device virtio-net-pci,netdev=net1 + -netdev tap,id=net1,ifname=tap0,script=no,downscript=no Interface tap0 has IP address 192.168.1.1 and network mask 255.255.255.0 (/24). In virtual machine use IP address from range @@ -415,56 +415,56 @@ non-privileged user can provide services on these ports. Redirected ports: -- DNS udp/53->udp/3053, tcp/53->tcp3053 -- DHCP udp/67->udp3067 -- SMB tcp/139->tcp3139, tcp/445->tcp3445). +- DNS udp/53->udp/3053, tcp/53->tcp3053 +- DHCP udp/67->udp3067 +- SMB tcp/139->tcp3139, tcp/445->tcp3445). You can configure IP address of virtual machine statically or dynamically. For dynamic addressing provide your DHCP server on port 3067 of tap0 interface, you can also provide your DNS server on port 3053 of tap0 interface for example: - $ dnsmasq --interface tap0 --bind-interfaces -p 3053 --dhcp-alternate-port=3067,68 --dhcp-range=192.168.1.15,192.168.1.32 --dhcp-leasefile=/tmp/dhcp.leasefile + $ dnsmasq --interface tap0 --bind-interfaces -p 3053 --dhcp-alternate-port=3067,68 --dhcp-range=192.168.1.15,192.168.1.32 --dhcp-leasefile=/tmp/dhcp.leasefile You can also provide your SMB services (on ports 3139, 3445) to obtain high performance data sharing. 
Example smb.conf (not optimized) - [global] - socket address=192.168.1.1 - smb ports = 3445 3139 - - private dir=/tmp/qemu-smb - pid directory=/tmp/qemu-smb - lock directory=/tmp/qemu-smb - state directory=/tmp/qemu-smb - ncalrpc dir=/tmp/qemu-smb/ncalrpc - log file=/tmp/qemu-smb/log.smbd - smb passwd file=/tmp/qemu-smb/smbpasswd - security = user - map to guest = Bad User - unix extensions = no - load printers = no - printing = bsd - printcap name = /dev/null - disable spoolss = yes - log level = 1 - guest account = USER - [qemu] - path=/scratch/USER/tmp - read only=no - guest ok=yes - writable=yes - follow symlinks=yes - wide links=yes - force user=USER + [global] + socket address=192.168.1.1 + smb ports = 3445 3139 + + private dir=/tmp/qemu-smb + pid directory=/tmp/qemu-smb + lock directory=/tmp/qemu-smb + state directory=/tmp/qemu-smb + ncalrpc dir=/tmp/qemu-smb/ncalrpc + log file=/tmp/qemu-smb/log.smbd + smb passwd file=/tmp/qemu-smb/smbpasswd + security = user + map to guest = Bad User + unix extensions = no + load printers = no + printing = bsd + printcap name = /dev/null + disable spoolss = yes + log level = 1 + guest account = USER + [qemu] + path=/scratch/USER/tmp + read only=no + guest ok=yes + writable=yes + follow symlinks=yes + wide links=yes + force user=USER (Replace USER with your login name.) Run SMB services - smbd -s /tmp/qemu-smb/smb.conf + smbd -s /tmp/qemu-smb/smb.conf  @@ -472,20 +472,20 @@ Virtual machine can of course have more than one network interface controller, virtual machine can use more than one network backend. So, you can combine for example use network backend and TAP interconnect. -### []()Snapshot mode +### Snapshot mode In snapshot mode image is not written, changes are written to temporary file (and discarded after virtual machine exits). **It is strongly recommended mode for running your jobs.** Set TMPDIR environment variable to local scratch directory for placement temporary files. - $ export TMPDIR=/lscratch/$ - $ qemu-system-x86_64 ... -snapshot + $ export TMPDIR=/lscratch/$ + $ qemu-system-x86_64 ... -snapshot ### Windows guests For Windows guests we recommend these options, life will be easier: - $ qemu-system-x86_64 ... -localtime -usb -usbdevice tablet + $ qemu-system-x86_64 ... -localtime -usb -usbdevice tablet diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/mpi-1/Running_OpenMPI.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/mpi-1/Running_OpenMPI.md index e6a572e9fd9b44bb340fb765009397eb66e8c802..c8ae48589fae957c6f96ebcd50fc26e12385f44e 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/mpi-1/Running_OpenMPI.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/mpi-1/Running_OpenMPI.md @@ -3,7 +3,7 @@ Running OpenMPI - + OpenMPI program execution ------------------------- @@ -18,19 +18,19 @@ Use the mpiexec to run the OpenMPI code. Example: - $ qsub -q qexp -l select=4:ncpus=16 -I - qsub: waiting for job 15210.srv11 to start - qsub: job 15210.srv11 ready + $ qsub -q qexp -l select=4:ncpus=16 -I + qsub: waiting for job 15210.srv11 to start + qsub: job 15210.srv11 ready - $ pwd - /home/username + $ pwd + /home/username - $ module load openmpi - $ mpiexec -pernode ./helloworld_mpi.x - Hello world! from rank 0 of 4 on host cn17 - Hello world! from rank 1 of 4 on host cn108 - Hello world! from rank 2 of 4 on host cn109 - Hello world! from rank 3 of 4 on host cn110 + $ module load openmpi + $ mpiexec -pernode ./helloworld_mpi.x + Hello world! 
from rank 0 of 4 on host cn17 + Hello world! from rank 1 of 4 on host cn108 + Hello world! from rank 2 of 4 on host cn109 + Hello world! from rank 3 of 4 on host cn110 Please be aware, that in this example, the directive **-pernode** is used to run only **one task per node**, which is normally an unwanted @@ -41,29 +41,29 @@ directive** to run up to 16 MPI tasks per each node. In this example, we allocate 4 nodes via the express queue interactively. We set up the openmpi environment and interactively run the helloworld_mpi.x program. -Note that the executable <span -class="monospace">helloworld_mpi.x</span> must be available within the +Note that the executable +helloworld_mpi.x must be available within the same path on all nodes. This is automatically fulfilled on the /home and /scratch filesystem. You need to preload the executable, if running on the local scratch /lscratch filesystem - $ pwd - /lscratch/15210.srv11 + $ pwd + /lscratch/15210.srv11 - $ mpiexec -pernode --preload-binary ./helloworld_mpi.x - Hello world! from rank 0 of 4 on host cn17 - Hello world! from rank 1 of 4 on host cn108 - Hello world! from rank 2 of 4 on host cn109 - Hello world! from rank 3 of 4 on host cn110 + $ mpiexec -pernode --preload-binary ./helloworld_mpi.x + Hello world! from rank 0 of 4 on host cn17 + Hello world! from rank 1 of 4 on host cn108 + Hello world! from rank 2 of 4 on host cn109 + Hello world! from rank 3 of 4 on host cn110 -In this example, we assume the executable <span -class="monospace">helloworld_mpi.x</span> is present on compute node +In this example, we assume the executable +helloworld_mpi.x is present on compute node cn17 on local scratch. We call the mpiexec whith the -**--preload-binary** argument (valid for openmpi). The mpiexec will copy -the executable from cn17 to the <span -class="monospace">/lscratch/15210.srv11</span> directory on cn108, cn109 +--preload-binary** argument (valid for openmpi). The mpiexec will copy +the executable from cn17 to the +/lscratch/15210.srv11 directory on cn108, cn109 and cn110 and execute the program. MPI process mapping may be controlled by PBS parameters. @@ -77,11 +77,11 @@ MPI process. Follow this example to run one MPI process per node, 16 threads per process. - $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=1:ompthreads=16 -I + $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=1:ompthreads=16 -I - $ module load openmpi + $ module load openmpi - $ mpiexec --bind-to-none ./helloworld_mpi.x + $ mpiexec --bind-to-none ./helloworld_mpi.x In this example, we demonstrate recommended way to run an MPI application, using 1 MPI processes per node and 16 threads per socket, @@ -92,11 +92,11 @@ on 4 nodes. Follow this example to run two MPI processes per node, 8 threads per process. Note the options to mpiexec. - $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=2:ompthreads=8 -I + $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=2:ompthreads=8 -I - $ module load openmpi + $ module load openmpi - $ mpiexec -bysocket -bind-to-socket ./helloworld_mpi.x + $ mpiexec -bysocket -bind-to-socket ./helloworld_mpi.x In this example, we demonstrate recommended way to run an MPI application, using 2 MPI processes per node and 8 threads per socket, @@ -108,11 +108,11 @@ node, on 4 nodes Follow this example to run 16 MPI processes per node, 1 thread per process. Note the options to mpiexec. 
- $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=16:ompthreads=1 -I + $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=16:ompthreads=1 -I - $ module load openmpi + $ module load openmpi - $ mpiexec -bycore -bind-to-core ./helloworld_mpi.x + $ mpiexec -bycore -bind-to-core ./helloworld_mpi.x In this example, we demonstrate recommended way to run an MPI application, using 16 MPI processes per node, single threaded. Each @@ -127,19 +127,19 @@ operating system might still migrate OpenMP threads between cores. You might want to avoid this by setting these environment variable for GCC OpenMP: - $ export GOMP_CPU_AFFINITY="0-15" + $ export GOMP_CPU_AFFINITY="0-15" or this one for Intel OpenMP: - $ export KMP_AFFINITY=granularity=fine,compact,1,0 + $ export KMP_AFFINITY=granularity=fine,compact,1,0 As of OpenMP 4.0 (supported by GCC 4.9 and later and Intel 14.0 and later) the following variables may be used for Intel or GCC: - $ export OMP_PROC_BIND=true - $ export OMP_PLACES=cores + $ export OMP_PROC_BIND=true + $ export OMP_PLACES=cores -<span>OpenMPI Process Mapping and Binding</span> +>OpenMPI Process Mapping and Binding ------------------------------------------------ The mpiexec allows for precise selection of how the MPI processes will @@ -155,18 +155,18 @@ openmpi only. Example hostfile - cn110.bullx - cn109.bullx - cn108.bullx - cn17.bullx + cn110.bullx + cn109.bullx + cn108.bullx + cn17.bullx Use the hostfile to control process placement - $ mpiexec -hostfile hostfile ./helloworld_mpi.x - Hello world! from rank 0 of 4 on host cn110 - Hello world! from rank 1 of 4 on host cn109 - Hello world! from rank 2 of 4 on host cn108 - Hello world! from rank 3 of 4 on host cn17 + $ mpiexec -hostfile hostfile ./helloworld_mpi.x + Hello world! from rank 0 of 4 on host cn110 + Hello world! from rank 1 of 4 on host cn109 + Hello world! from rank 2 of 4 on host cn108 + Hello world! from rank 3 of 4 on host cn17 In this example, we see that ranks have been mapped on nodes according to the order in which nodes show in the hostfile @@ -180,11 +180,11 @@ Appropriate binding may boost performance of your application. Example rankfile - rank 0=cn110.bullx slot=1:0,1 - rank 1=cn109.bullx slot=0:* - rank 2=cn108.bullx slot=1:1-2 - rank 3=cn17.bullx slot=0:1,1:0-2 - rank 4=cn109.bullx slot=0:*,1:* + rank 0=cn110.bullx slot=1:0,1 + rank 1=cn109.bullx slot=0:* + rank 2=cn108.bullx slot=1:1-2 + rank 3=cn17.bullx slot=0:1,1:0-2 + rank 4=cn109.bullx slot=0:*,1:* This rankfile assumes 5 ranks will be running on 4 nodes and provides exact mapping and binding of the processes to the processor sockets and @@ -198,17 +198,17 @@ rank 3 will be bounded to cn17, socket0 core1, socket1 core0, core1, core2 rank 4 will be bounded to cn109, all cores on both sockets - $ mpiexec -n 5 -rf rankfile --report-bindings ./helloworld_mpi.x - [cn17:11180] MCW rank 3 bound to socket 0[core 1] socket 1[core 0-2]: [. B . . . . . .][B B B . . . . .] (slot list 0:1,1:0-2) - [cn110:09928] MCW rank 0 bound to socket 1[core 0-1]: [. . . . . . . .][B B . . . . . .] (slot list 1:0,1) - [cn109:10395] MCW rank 1 bound to socket 0[core 0-7]: [B B B B B B B B][. . . . . . . .] (slot list 0:*) - [cn108:10406] MCW rank 2 bound to socket 1[core 1-2]: [. . . . . . . .][. B B . . . . .] (slot list 1:1-2) - [cn109:10406] MCW rank 4 bound to socket 0[core 0-7] socket 1[core 0-7]: [B B B B B B B B][B B B B B B B B] (slot list 0:*,1:*) - Hello world! from rank 3 of 5 on host cn17 - Hello world! from rank 1 of 5 on host cn109 - Hello world! 
from rank 0 of 5 on host cn110 - Hello world! from rank 4 of 5 on host cn109 - Hello world! from rank 2 of 5 on host cn108 + $ mpiexec -n 5 -rf rankfile --report-bindings ./helloworld_mpi.x + [cn17:11180] MCW rank 3 bound to socket 0[core 1] socket 1[core 0-2]: [. B . . . . . .][B B B . . . . .] (slot list 0:1,1:0-2) + [cn110:09928] MCW rank 0 bound to socket 1[core 0-1]: [. . . . . . . .][B B . . . . . .] (slot list 1:0,1) + [cn109:10395] MCW rank 1 bound to socket 0[core 0-7]: [B B B B B B B B][. . . . . . . .] (slot list 0:*) + [cn108:10406] MCW rank 2 bound to socket 1[core 1-2]: [. . . . . . . .][. B B . . . . .] (slot list 1:1-2) + [cn109:10406] MCW rank 4 bound to socket 0[core 0-7] socket 1[core 0-7]: [B B B B B B B B][B B B B B B B B] (slot list 0:*,1:*) + Hello world! from rank 3 of 5 on host cn17 + Hello world! from rank 1 of 5 on host cn109 + Hello world! from rank 0 of 5 on host cn110 + Hello world! from rank 4 of 5 on host cn109 + Hello world! from rank 2 of 5 on host cn108 In this example we run 5 MPI processes (5 ranks) on four nodes. The rankfile defines how the processes will be mapped on the nodes, sockets @@ -224,9 +224,9 @@ and cores. In all cases, binding and threading may be verified by executing for example: - $ mpiexec -bysocket -bind-to-socket --report-bindings echo - $ mpiexec -bysocket -bind-to-socket numactl --show - $ mpiexec -bysocket -bind-to-socket echo $OMP_NUM_THREADS + $ mpiexec -bysocket -bind-to-socket --report-bindings echo + $ mpiexec -bysocket -bind-to-socket numactl --show + $ mpiexec -bysocket -bind-to-socket echo $OMP_NUM_THREADS Changes in OpenMPI 1.8 ---------------------- @@ -267,7 +267,7 @@ Some options have changed in OpenMPI version 1.8. </tr> <tr class="even"> <td align="left">-pernode</td> -<td align="left"><p><span class="s1">--map-by ppr:1:node</span></p></td> +<td align="left"><p> class="s1">--map-by ppr:1:node</p></td> </tr> </tbody> </table> diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/mpi-1/mpi.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/mpi-1/mpi.md index 83fdb81cf9bc4a64ca4e9a2c9825dad100343a5f..826a565810cd9d30536ee4ceed8dbe8b1346fe95 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/mpi-1/mpi.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/mpi-1/mpi.md @@ -3,7 +3,7 @@ MPI - + Setting up MPI Environment -------------------------- @@ -49,43 +49,43 @@ MPI libraries are activated via the environment modules. Look up section modulefiles/mpi in module avail - $ module avail - ------------------------- /opt/modules/modulefiles/mpi ------------------------- - bullxmpi/bullxmpi-1.2.4.1 mvapich2/1.9-icc - impi/4.0.3.008 openmpi/1.6.5-gcc(default) - impi/4.1.0.024 openmpi/1.6.5-gcc46 - impi/4.1.0.030 openmpi/1.6.5-icc - impi/4.1.1.036(default) openmpi/1.8.1-gcc - openmpi/1.8.1-gcc46 - mvapich2/1.9-gcc(default) openmpi/1.8.1-gcc49 - mvapich2/1.9-gcc46 openmpi/1.8.1-icc + $ module avail + ------------------------- /opt/modules/modulefiles/mpi ------------------------- + bullxmpi/bullxmpi-1.2.4.1 mvapich2/1.9-icc + impi/4.0.3.008 openmpi/1.6.5-gcc(default) + impi/4.1.0.024 openmpi/1.6.5-gcc46 + impi/4.1.0.030 openmpi/1.6.5-icc + impi/4.1.1.036(default) openmpi/1.8.1-gcc + openmpi/1.8.1-gcc46 + mvapich2/1.9-gcc(default) openmpi/1.8.1-gcc49 + mvapich2/1.9-gcc46 openmpi/1.8.1-icc There are default compilers associated with any particular MPI implementation. The defaults may be changed, the MPI libraries may be used in conjunction with any compiler. 
The defaults are selected via the modules in following way - Module MPI Compiler suite - -------------- ------------------ -------------------------------------------------------------------------------- - PrgEnv-gnu bullxmpi-1.2.4.1 bullx GNU 4.4.6 - PrgEnv-intel Intel MPI 4.1.1 Intel 13.1.1 - bullxmpi bullxmpi-1.2.4.1 none, select via module - impi Intel MPI 4.1.1 none, select via module - openmpi OpenMPI 1.6.5 GNU compilers 4.8.1, GNU compilers 4.4.6, Intel Compilers - openmpi OpenMPI 1.8.1 GNU compilers 4.8.1, GNU compilers 4.4.6, GNU compilers 4.9.0, Intel Compilers - mvapich2 MPICH2 1.9 GNU compilers 4.8.1, GNU compilers 4.4.6, Intel Compilers +Module MPI Compiler suite +-------------- ------------------ -------------------------------------------------------------------------------- +PrgEnv-gnu bullxmpi-1.2.4.1 bullx GNU 4.4.6 +PrgEnv-intel Intel MPI 4.1.1 Intel 13.1.1 +bullxmpi bullxmpi-1.2.4.1 none, select via module +impi Intel MPI 4.1.1 none, select via module +openmpi OpenMPI 1.6.5 GNU compilers 4.8.1, GNU compilers 4.4.6, Intel Compilers +openmpi OpenMPI 1.8.1 GNU compilers 4.8.1, GNU compilers 4.4.6, GNU compilers 4.9.0, Intel Compilers +mvapich2 MPICH2 1.9 GNU compilers 4.8.1, GNU compilers 4.4.6, Intel Compilers Examples: - $ module load openmpi + $ module load openmpi In this example, we activate the latest openmpi with latest GNU compilers To use openmpi with the intel compiler suite, use - $ module load intel - $ module load openmpi/1.6.5-icc + $ module load intel + $ module load openmpi/1.6.5-icc In this example, the openmpi 1.6.5 using intel compilers is activated @@ -95,41 +95,41 @@ Compiling MPI Programs After setting up your MPI environment, compile your program using one of the mpi wrappers - $ mpicc -v - $ mpif77 -v - $ mpif90 -v + $ mpicc -v + $ mpif77 -v + $ mpif90 -v Example program: - // helloworld_mpi.c - #include <stdio.h> + // helloworld_mpi.c + #include <stdio.h> - #include<mpi.h> + #include<mpi.h> - int main(int argc, char **argv) { + int main(int argc, char **argv) { - int len; - int rank, size; - char node[MPI_MAX_PROCESSOR_NAME]; + int len; + int rank, size; + char node[MPI_MAX_PROCESSOR_NAME]; - // Initiate MPI - MPI_Init(&argc, &argv); - MPI_Comm_rank(MPI_COMM_WORLD,&rank); - MPI_Comm_size(MPI_COMM_WORLD,&size); + // Initiate MPI + MPI_Init(&argc, &argv); + MPI_Comm_rank(MPI_COMM_WORLD,&rank); + MPI_Comm_size(MPI_COMM_WORLD,&size); - // Get hostame and print - MPI_Get_processor_name(node,&len); - printf("Hello world! from rank %d of %d on host %sn",rank,size,node); + // Get hostame and print + MPI_Get_processor_name(node,&len); + printf("Hello world! from rank %d of %d on host %sn",rank,size,node); - // Finalize and exit - MPI_Finalize(); + // Finalize and exit + MPI_Finalize(); - return 0; - } + return 0; + } Compile the above example with - $ mpicc helloworld_mpi.c -o helloworld_mpi.x + $ mpicc helloworld_mpi.c -o helloworld_mpi.x Running MPI Programs -------------------- @@ -157,13 +157,13 @@ Consider these ways to run an MPI program: 2. Two MPI processes per node, 8 threads per process 3. 16 MPI processes per node, 1 thread per process. -**One MPI** process per node, using 16 threads, is most useful for +One MPI** process per node, using 16 threads, is most useful for memory demanding applications, that make good use of processor cache memory and are not memory bound. This is also a preferred way for communication intensive applications as one process per node enjoys full bandwidth access to the network interface. 
-**Two MPI** processes per node, using 8 threads each, bound to processor +Two MPI** processes per node, using 8 threads each, bound to processor socket is most useful for memory bandwidth bound applications such as BLAS1 or FFT, with scalable memory demand. However, note that the two processes will share access to the network interface. The 8 threads and @@ -177,7 +177,7 @@ operating system might still migrate OpenMP threads between cores. You want to avoid this by setting the KMP_AFFINITY or GOMP_CPU_AFFINITY environment variables. -**16 MPI** processes per node, using 1 thread each bound to processor +16 MPI** processes per node, using 1 thread each bound to processor core is most suitable for highly scalable applications with low communication demand. diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/mpi-1/mpi4py-mpi-for-python.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/mpi-1/mpi4py-mpi-for-python.md index 9fe98f9543cef0a1c005d404d0a6e1168aeaf0e0..725ccf169440d30d429b1cd9cf7df5808fcdc69a 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/mpi-1/mpi4py-mpi-for-python.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/mpi-1/mpi4py-mpi-for-python.md @@ -4,7 +4,7 @@ MPI4Py (MPI for Python) OpenMPI interface to Python - + Introduction ------------ @@ -29,8 +29,8 @@ Modules MPI4Py is build for OpenMPI. Before you start with MPI4Py you need to load Python and OpenMPI modules. - $ module load python - $ module load openmpi + $ module load python + $ module load openmpi Execution --------- @@ -38,65 +38,65 @@ Execution You need to import MPI to your python program. Include the following line to the python script: - from mpi4py import MPI + from mpi4py import MPI The MPI4Py enabled python programs [execute as any other OpenMPI](Running_OpenMPI.html) code.The simpliest way is to run - $ mpiexec python <script>.py + $ mpiexec python <script>.py -<span>For example</span> +>For example - $ mpiexec python hello_world.py + $ mpiexec python hello_world.py Examples -------- ### Hello world! - from mpi4py import MPI + from mpi4py import MPI - comm = MPI.COMM_WORLD + comm = MPI.COMM_WORLD - print "Hello! I'm rank %d from %d running in total..." % (comm.rank, comm.size) + print "Hello! I'm rank %d from %d running in total..." % (comm.rank, comm.size) - comm.Barrier()  # wait for everybody to synchronize + comm.Barrier()  # wait for everybody to synchronize -### <span>Collective Communication with NumPy arrays</span> +### >Collective Communication with NumPy arrays - from mpi4py import MPI - from __future__ import division - import numpy as np + from mpi4py import MPI + from __future__ import division + import numpy as np - comm = MPI.COMM_WORLD + comm = MPI.COMM_WORLD - print("-"*78) - print(" Running on %d cores" % comm.size) - print("-"*78) + print("-"*78) + print(" Running on %d cores" % comm.size) + print("-"*78) - comm.Barrier() + comm.Barrier() - # Prepare a vector of N=5 elements to be broadcasted... - N = 5 - if comm.rank == 0: -   A = np.arange(N, dtype=np.float64)   # rank 0 has proper data - else: -   A = np.empty(N, dtype=np.float64)   # all other just an empty array + # Prepare a vector of N=5 elements to be broadcasted... 
+ N = 5 + if comm.rank == 0: +   A = np.arange(N, dtype=np.float64)   # rank 0 has proper data + else: +   A = np.empty(N, dtype=np.float64)   # all other just an empty array - # Broadcast A from rank 0 to everybody - comm.Bcast( [A, MPI.DOUBLE] ) + # Broadcast A from rank 0 to everybody + comm.Bcast( [A, MPI.DOUBLE] ) - # Everybody should now have the same... - print "[%02d] %s" % (comm.rank, A) + # Everybody should now have the same... + print "[%02d] %s" % (comm.rank, A) Execute the above code as: - $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=16:ompthreads=1 -I + $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=16:ompthreads=1 -I - $ module load python openmpi + $ module load python openmpi - $ mpiexec -bycore -bind-to-core python hello_world.py + $ mpiexec -bycore -bind-to-core python hello_world.py In this example, we run MPI4Py enabled code on 4 nodes, 16 cores per node (total of 64 processes), each python process is bound to a diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/mpi-1/running-mpich2.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/mpi-1/running-mpich2.md index cc828bcf5916f72f46a2a2251cf6f93d76fa627e..b14053ba6089d4b072795d088f3c969dc0b87d26 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/mpi-1/running-mpich2.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/mpi-1/running-mpich2.md @@ -3,7 +3,7 @@ Running MPICH2 - + MPICH2 program execution ------------------------ @@ -19,17 +19,17 @@ Use the mpirun to execute the MPICH2 code. Example: - $ qsub -q qexp -l select=4:ncpus=16 -I - qsub: waiting for job 15210.srv11 to start - qsub: job 15210.srv11 ready + $ qsub -q qexp -l select=4:ncpus=16 -I + qsub: waiting for job 15210.srv11 to start + qsub: job 15210.srv11 ready - $ module load impi + $ module load impi - $ mpirun -ppn 1 -hostfile $PBS_NODEFILE ./helloworld_mpi.x - Hello world! from rank 0 of 4 on host cn17 - Hello world! from rank 1 of 4 on host cn108 - Hello world! from rank 2 of 4 on host cn109 - Hello world! from rank 3 of 4 on host cn110 + $ mpirun -ppn 1 -hostfile $PBS_NODEFILE ./helloworld_mpi.x + Hello world! from rank 0 of 4 on host cn17 + Hello world! from rank 1 of 4 on host cn108 + Hello world! from rank 2 of 4 on host cn109 + Hello world! from rank 3 of 4 on host cn110 In this example, we allocate 4 nodes via the express queue interactively. We set up the intel MPI environment and interactively run @@ -42,14 +42,14 @@ same path on all nodes. This is automatically fulfilled on the /home and You need to preload the executable, if running on the local scratch /lscratch filesystem - $ pwd - /lscratch/15210.srv11 - $ mpirun -ppn 1 -hostfile $PBS_NODEFILE cp /home/username/helloworld_mpi.x . - $ mpirun -ppn 1 -hostfile $PBS_NODEFILE ./helloworld_mpi.x - Hello world! from rank 0 of 4 on host cn17 - Hello world! from rank 1 of 4 on host cn108 - Hello world! from rank 2 of 4 on host cn109 - Hello world! from rank 3 of 4 on host cn110 + $ pwd + /lscratch/15210.srv11 + $ mpirun -ppn 1 -hostfile $PBS_NODEFILE cp /home/username/helloworld_mpi.x . + $ mpirun -ppn 1 -hostfile $PBS_NODEFILE ./helloworld_mpi.x + Hello world! from rank 0 of 4 on host cn17 + Hello world! from rank 1 of 4 on host cn108 + Hello world! from rank 2 of 4 on host cn109 + Hello world! from rank 3 of 4 on host cn110 In this example, we assume the executable helloworld_mpi.x is present on shared home directory. We run the cp command via mpirun, copying the @@ -68,11 +68,11 @@ MPI process. 
Follow this example to run one MPI process per node, 16 threads per process. Note that no options to mpirun are needed - $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=1:ompthreads=16 -I + $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=1:ompthreads=16 -I - $ module load mvapich2 + $ module load mvapich2 - $ mpirun ./helloworld_mpi.x + $ mpirun ./helloworld_mpi.x In this example, we demonstrate recommended way to run an MPI application, using 1 MPI processes per node and 16 threads per socket, @@ -84,11 +84,11 @@ Follow this example to run two MPI processes per node, 8 threads per process. Note the options to mpirun for mvapich2. No options are needed for impi. - $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=2:ompthreads=8 -I + $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=2:ompthreads=8 -I - $ module load mvapich2 + $ module load mvapich2 - $ mpirun -bind-to numa ./helloworld_mpi.x + $ mpirun -bind-to numa ./helloworld_mpi.x In this example, we demonstrate recommended way to run an MPI application, using 2 MPI processes per node and 8 threads per socket, @@ -101,11 +101,11 @@ Follow this example to run 16 MPI processes per node, 1 thread per process. Note the options to mpirun for mvapich2. No options are needed for impi. - $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=16:ompthreads=1 -I + $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=16:ompthreads=1 -I - $ module load mvapich2 + $ module load mvapich2 - $ mpirun -bind-to core ./helloworld_mpi.x + $ mpirun -bind-to core ./helloworld_mpi.x In this example, we demonstrate recommended way to run an MPI application, using 16 MPI processes per node, single threaded. Each @@ -120,17 +120,17 @@ operating system might still migrate OpenMP threads between cores. You might want to avoid this by setting these environment variable for GCC OpenMP: - $ export GOMP_CPU_AFFINITY="0-15" + $ export GOMP_CPU_AFFINITY="0-15" or this one for Intel OpenMP: - $ export KMP_AFFINITY=granularity=fine,compact,1,0 + $ export KMP_AFFINITY=granularity=fine,compact,1,0 As of OpenMP 4.0 (supported by GCC 4.9 and later and Intel 14.0 and later) the following variables may be used for Intel or GCC: - $ export OMP_PROC_BIND=true - $ export OMP_PLACES=cores + $ export OMP_PROC_BIND=true + $ export OMP_PLACES=cores  @@ -150,20 +150,20 @@ and mvapich2 only. Example machinefile - cn110.bullx - cn109.bullx - cn108.bullx - cn17.bullx - cn108.bullx + cn110.bullx + cn109.bullx + cn108.bullx + cn17.bullx + cn108.bullx Use the machinefile to control process placement - $ mpirun -machinefile machinefile helloworld_mpi.x - Hello world! from rank 0 of 5 on host cn110 - Hello world! from rank 1 of 5 on host cn109 - Hello world! from rank 2 of 5 on host cn108 - Hello world! from rank 3 of 5 on host cn17 - Hello world! from rank 4 of 5 on host cn108 + $ mpirun -machinefile machinefile helloworld_mpi.x + Hello world! from rank 0 of 5 on host cn110 + Hello world! from rank 1 of 5 on host cn109 + Hello world! from rank 2 of 5 on host cn108 + Hello world! from rank 3 of 5 on host cn17 + Hello world! from rank 4 of 5 on host cn108 In this example, we see that ranks have been mapped on nodes according to the order in which nodes show in the machinefile @@ -182,8 +182,8 @@ to bind the process on single core or entire socket. 
In all cases, binding and threading may be verified by executing - $ mpirun -bindto numa numactl --show - $ mpirun -bindto numa echo $OMP_NUM_THREADS + $ mpirun -bindto numa numactl --show + $ mpirun -bindto numa echo $OMP_NUM_THREADS Intel MPI on Xeon Phi --------------------- diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/copy_of_matlab.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/copy_of_matlab.md index f2db3e2e2e148315563973997b5cf8087d1f5601..d7b9b61c25b1fe1df6388249cd5a95ce99135228 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/copy_of_matlab.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/copy_of_matlab.md @@ -3,7 +3,7 @@ Matlab - + Introduction ------------ @@ -11,24 +11,24 @@ Introduction Matlab is available in versions R2015a and R2015b. There are always two variants of the release: -- Non commercial or so called EDU variant, which can be used for - common research and educational purposes. -- Commercial or so called COM variant, which can used also for - commercial activities. The licenses for commercial variant are much - more expensive, so usually the commercial variant has only subset of - features compared to the EDU available. +- Non commercial or so called EDU variant, which can be used for + common research and educational purposes. +- Commercial or so called COM variant, which can used also for + commercial activities. The licenses for commercial variant are much + more expensive, so usually the commercial variant has only subset of + features compared to the EDU available.  To load the latest version of Matlab load the module - $ module load MATLAB + $ module load MATLAB By default the EDU variant is marked as default. If you need other version or variant, load the particular version. To obtain the list of available versions use - $ module avail MATLAB + $ module avail MATLAB If you need to use the Matlab GUI to prepare your Matlab programs, you can use Matlab directly on the login nodes. But for all computations use @@ -46,16 +46,16 @@ is recommended. To run Matlab with GUI, use - $ matlab + $ matlab To run Matlab in text mode, without the Matlab Desktop GUI environment, use - $ matlab -nodesktop -nosplash + $ matlab -nodesktop -nosplash plots, images, etc... will be still available. -[]()Running parallel Matlab using Distributed Computing Toolbox / Engine +Running parallel Matlab using Distributed Computing Toolbox / Engine ------------------------------------------------------------------------ Distributed toolbox is available only for the EDU variant @@ -72,11 +72,11 @@ To use Distributed Computing, you first need to setup a parallel profile. We have provided the profile for you, you can either import it in MATLAB command line: - >> parallel.importProfile('/apps/all/MATLAB/2015a-EDU/SalomonPBSPro.settings') + >> parallel.importProfile('/apps/all/MATLAB/2015a-EDU/SalomonPBSPro.settings') - ans = + ans = - SalomonPBSPro + SalomonPBSPro Or in the GUI, go to tab HOME -> Parallel -> Manage Cluster Profiles..., click Import and navigate to : @@ -99,9 +99,9 @@ for Matlab GUI. For more information about GUI based applications on Anselm see [this page](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html). 
- $ xhost + - $ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=1 -l walltime=00:30:00 - -l feature__matlab__MATLAB=1 + $ xhost + + $ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=1 -l walltime=00:30:00 + -l feature__matlab__MATLAB=1 This qsub command example shows how to run Matlab on a single node. @@ -112,35 +112,35 @@ Engines licenses. Once the access to compute nodes is granted by PBS, user can load following modules and start Matlab: - r1i0n17$ module load MATLAB/2015b-EDU - r1i0n17$ matlab & + r1i0n17$ module load MATLAB/2015b-EDU + r1i0n17$ matlab & -### []()Parallel Matlab batch job in Local mode +### Parallel Matlab batch job in Local mode To run matlab in batch mode, write an matlab script, then write a bash jobscript and execute via the qsub command. By default, matlab will execute one matlab worker instance per allocated core. - #!/bin/bash - #PBS -A PROJECT ID - #PBS -q qprod - #PBS -l select=1:ncpus=16:mpiprocs=16:ompthreads=1 + #!/bin/bash + #PBS -A PROJECT ID + #PBS -q qprod + #PBS -l select=1:ncpus=16:mpiprocs=16:ompthreads=1 - # change to shared scratch directory - SCR=/scratch/work/user/$USER/$PBS_JOBID - mkdir -p $SCR ; cd $SCR || exit + # change to shared scratch directory + SCR=/scratch/work/user/$USER/$PBS_JOBID + mkdir -p $SCR ; cd $SCR || exit - # copy input file to scratch - cp $PBS_O_WORKDIR/matlabcode.m . + # copy input file to scratch + cp $PBS_O_WORKDIR/matlabcode.m . - # load modules - module load MATLAB/2015a-EDU + # load modules + module load MATLAB/2015a-EDU - # execute the calculation - matlab -nodisplay -r matlabcode > output.out + # execute the calculation + matlab -nodisplay -r matlabcode > output.out - # copy output file to home - cp output.out $PBS_O_WORKDIR/. + # copy output file to home + cp output.out $PBS_O_WORKDIR/. This script may be submitted directly to the PBS workload manager via the qsub command. The inputs and matlab script are in matlabcode.m @@ -151,14 +151,14 @@ include quit** statement at the end of the matlabcode.m script. Submit the jobscript using qsub - $ qsub ./jobscript + $ qsub ./jobscript ### Parallel Matlab Local mode program example The last part of the configuration is done directly in the user Matlab script before Distributed Computing Toolbox is started. - cluster = parcluster('local') + cluster = parcluster('local') This script creates scheduler object "cluster" of type "local" that starts workers locally. @@ -170,39 +170,39 @@ function. The last step is to start matlabpool with "cluster" object and correct number of workers. We have 24 cores per node, so we start 24 workers. - parpool(cluster,16); - - - ... parallel code ... + parpool(cluster,16); + + + ... parallel code ... + - - parpool close + parpool close The complete example showing how to use Distributed Computing Toolbox in local mode is shown here. - cluster = parcluster('local'); - cluster + cluster = parcluster('local'); + cluster - parpool(cluster,24); + parpool(cluster,24); - n=2000; + n=2000; - W = rand(n,n); - W = distributed(W); - x = (1:n)'; - x = distributed(x); - spmd - [~, name] = system('hostname') -    -    T = W*x; % Calculation performed on labs, in parallel. -             % T and W are both codistributed arrays here. - end - T; - whos        % T and W are both distributed arrays here. 
+ W = rand(n,n); + W = distributed(W); + x = (1:n)'; + x = distributed(x); + spmd + [~, name] = system('hostname') +    +    T = W*x; % Calculation performed on labs, in parallel. +             % T and W are both codistributed arrays here. + end + T; + whos        % T and W are both distributed arrays here. - parpool close - quit + parpool close + quit You can copy and paste the example in a .m file and execute. Note that the parpool size should correspond to **total number of cores** @@ -217,29 +217,29 @@ it spawns the workers in a separate job submitted by MATLAB using qsub. This is an example of m-script using PBS mode: - cluster = parcluster('SalomonPBSPro'); - set(cluster, 'SubmitArguments', '-A OPEN-0-0'); - set(cluster, 'ResourceTemplate', '-q qprod -l select=10:ncpus=16'); - set(cluster, 'NumWorkers', 160); + cluster = parcluster('SalomonPBSPro'); + set(cluster, 'SubmitArguments', '-A OPEN-0-0'); + set(cluster, 'ResourceTemplate', '-q qprod -l select=10:ncpus=16'); + set(cluster, 'NumWorkers', 160); - pool = parpool(cluster, 160); + pool = parpool(cluster, 160); - n=2000; + n=2000; - W = rand(n,n); - W = distributed(W); - x = (1:n)'; - x = distributed(x); - spmd - [~, name] = system('hostname') + W = rand(n,n); + W = distributed(W); + x = (1:n)'; + x = distributed(x); + spmd + [~, name] = system('hostname') - T = W*x; % Calculation performed on labs, in parallel. - % T and W are both codistributed arrays here. - end - whos % T and W are both distributed arrays here. + T = W*x; % Calculation performed on labs, in parallel. + % T and W are both codistributed arrays here. + end + whos % T and W are both distributed arrays here. - % shut down parallel pool - delete(pool) + % shut down parallel pool + delete(pool) Note that we first construct a cluster object using the imported profile, then set some important options, namely : SubmitArguments, @@ -267,28 +267,28 @@ SalomonPBSPro](copy_of_matlab.html#running-parallel-matlab-using-distributed-com This is an example of m-script using direct mode: - parallel.importProfile('/apps/all/MATLAB/2015a-EDU/SalomonDirect.settings') - cluster = parcluster('SalomonDirect'); - set(cluster, 'NumWorkers', 48); + parallel.importProfile('/apps/all/MATLAB/2015a-EDU/SalomonDirect.settings') + cluster = parcluster('SalomonDirect'); + set(cluster, 'NumWorkers', 48); - pool = parpool(cluster, 48); + pool = parpool(cluster, 48); - n=2000; + n=2000; - W = rand(n,n); - W = distributed(W); - x = (1:n)'; - x = distributed(x); - spmd - [~, name] = system('hostname') + W = rand(n,n); + W = distributed(W); + x = (1:n)'; + x = distributed(x); + spmd + [~, name] = system('hostname') - T = W*x; % Calculation performed on labs, in parallel. - % T and W are both codistributed arrays here. - end - whos % T and W are both distributed arrays here. + T = W*x; % Calculation performed on labs, in parallel. + % T and W are both codistributed arrays here. + end + whos % T and W are both distributed arrays here. - % shut down parallel pool - delete(pool) + % shut down parallel pool + delete(pool) ### Non-interactive Session and Licenses @@ -309,12 +309,12 @@ allocation. Starting Matlab workers is an expensive process that requires certain amount of time. 
For your information please see the following table: - compute nodes number of workers start-up time[s] - --------------- ------------------- -------------------- - 16 384 831 - 8 192 807 - 4 96 483 - 2 48 16 +compute nodes number of workers start-up time[s] +--------------- ------------------- -------------------- +16 384 831 +8 192 807 +4 96 483 +2 48 16 MATLAB on UV2000 ----------------- @@ -330,13 +330,13 @@ You can use MATLAB on UV2000 in two parallel modes : Since this is a SMP machine, you can completely avoid using Parallel Toolbox and use only MATLAB's threading. MATLAB will automatically -detect the number of cores you have allocated and will set <span -class="monospace">maxNumCompThreads </span>accordingly and certain -operations, such as <span class="monospace">fft, , eig, svd</span>, +detect the number of cores you have allocated and will set +maxNumCompThreads accordingly and certain +operations, such as fft, , eig, svd, etc. will be automatically run in threads. The advantage of this mode is -that you don't need to modify your existing sequential codes.<span -class="monospace"> -</span> +that you don't need to modify your existing sequential codes. + + ### Local cluster mode diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/introduction.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/introduction.md index 3c03b655cf1e6e88b3bd0e557804adbc819b15c9..320a56635a477f6c52b04e6a966ae1dcf7a20ceb 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/introduction.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/introduction.md @@ -4,7 +4,7 @@ Numerical languages Interpreted languages for numerical computations and analysis - + Introduction ------------ @@ -18,10 +18,10 @@ Matlab MATLAB^®^ is a high-level language and interactive environment for numerical computation, visualization, and programming. - $ module load MATLAB/2015b-EDU - $ matlab + $ module load MATLAB/2015b-EDU + $ matlab -Read more at the [Matlab<span class="internal-link"></span> +Read more at the [Matlab page](matlab.html). Octave @@ -31,8 +31,8 @@ GNU Octave is a high-level interpreted language, primarily intended for numerical computations. The Octave language is quite similar to Matlab so that most programs are easily portable. - $ module load Octave - $ octave + $ module load Octave + $ octave Read more at the [Octave page](octave.html). @@ -42,8 +42,8 @@ R The R is an interpreted language and environment for statistical computing and graphics. - $ module load R - $ R + $ module load R + $ R Read more at the [R page](r.html). diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/matlab.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/matlab.md index 513fdcd2f92e79a2767d8dabbe48fc92ff2af251..00799872748ab1d2803ab4e7e9a9b45f0fb4dffd 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/matlab.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/matlab.md @@ -3,7 +3,7 @@ Matlab 2013-2014 - + Introduction ------------ @@ -15,24 +15,24 @@ instead](copy_of_matlab.html). Matlab is available in the latest stable version. There are always two variants of the release: -- Non commercial or so called EDU variant, which can be used for - common research and educational purposes. 
-- Commercial or so called COM variant, which can used also for - commercial activities. The licenses for commercial variant are much - more expensive, so usually the commercial variant has only subset of - features compared to the EDU available. +- Non commercial or so called EDU variant, which can be used for + common research and educational purposes. +- Commercial or so called COM variant, which can used also for + commercial activities. The licenses for commercial variant are much + more expensive, so usually the commercial variant has only subset of + features compared to the EDU available.  To load the latest version of Matlab load the module - $ module load matlab + $ module load matlab By default the EDU variant is marked as default. If you need other version or variant, load the particular version. To obtain the list of available versions use - $ module avail matlab + $ module avail matlab If you need to use the Matlab GUI to prepare your Matlab programs, you can use Matlab directly on the login nodes. But for all computations use @@ -50,12 +50,12 @@ is recommended. To run Matlab with GUI, use - $ matlab + $ matlab To run Matlab in text mode, without the Matlab Desktop GUI environment, use - $ matlab -nodesktop -nosplash + $ matlab -nodesktop -nosplash plots, images, etc... will be still available. @@ -79,20 +79,20 @@ system MPI user has to override default Matlab setting by creating new configuration file in its home directory. The path and file name has to be exactly the same as in the following listing: - $ vim ~/matlab/mpiLibConf.m + $ vim ~/matlab/mpiLibConf.m - function [lib, extras] = mpiLibConf - %MATLAB MPI Library overloading for Infiniband Networks + function [lib, extras] = mpiLibConf + %MATLAB MPI Library overloading for Infiniband Networks - mpich = '/opt/intel/impi/4.1.1.036/lib64/'; + mpich = '/opt/intel/impi/4.1.1.036/lib64/'; - disp('Using Intel MPI 4.1.1.036 over Infiniband') + disp('Using Intel MPI 4.1.1.036 over Infiniband') - lib = strcat(mpich, 'libmpich.so'); - mpl = strcat(mpich, 'libmpl.so'); - opa = strcat(mpich, 'libopa.so'); + lib = strcat(mpich, 'libmpich.so'); + mpl = strcat(mpich, 'libmpl.so'); + opa = strcat(mpich, 'libopa.so'); - extras = ; + extras = ; System MPI library allows Matlab to communicate through 40Gbps Infiniband QDR interconnect instead of slower 1Gb ethernet network. @@ -110,9 +110,9 @@ for Matlab GUI. For more information about GUI based applications on Anselm see [this page](https://docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/resolveuid/11e53ad0d2fd4c5187537f4baeedff33). - $ xhost + - $ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=4:ncpus=16:mpiprocs=16 -l walltime=00:30:00 - -l feature__matlab__MATLAB=1 + $ xhost + + $ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=4:ncpus=16:mpiprocs=16 -l walltime=00:30:00 + -l feature__matlab__MATLAB=1 This qsub command example shows how to run Matlab with 32 workers in following configuration: 2 nodes (use all 16 cores per node) and 16 @@ -127,9 +127,9 @@ Engines licenses. 
Once the access to compute nodes is granted by PBS, user can load following modules and start Matlab: - cn79$ module load matlab/R2013a-EDU - cn79$ module load impi/4.1.1.036 - cn79$ matlab & + cn79$ module load matlab/R2013a-EDU + cn79$ module load impi/4.1.1.036 + cn79$ matlab & ### Parallel Matlab batch job @@ -137,27 +137,27 @@ To run matlab in batch mode, write an matlab script, then write a bash jobscript and execute via the qsub command. By default, matlab will execute one matlab worker instance per allocated core. - #!/bin/bash - #PBS -A PROJECT ID - #PBS -q qprod - #PBS -l select=2:ncpus=16:mpiprocs=16:ompthreads=1 + #!/bin/bash + #PBS -A PROJECT ID + #PBS -q qprod + #PBS -l select=2:ncpus=16:mpiprocs=16:ompthreads=1 - # change to shared scratch directory - SCR=/scratch/$USER/$PBS_JOBID - mkdir -p $SCR ; cd $SCR || exit + # change to shared scratch directory + SCR=/scratch/$USER/$PBS_JOBID + mkdir -p $SCR ; cd $SCR || exit - # copy input file to scratch - cp $PBS_O_WORKDIR/matlabcode.m . + # copy input file to scratch + cp $PBS_O_WORKDIR/matlabcode.m . - # load modules - module load matlab/R2013a-EDU - module load impi/4.1.1.036 + # load modules + module load matlab/R2013a-EDU + module load impi/4.1.1.036 - # execute the calculation - matlab -nodisplay -r matlabcode > output.out + # execute the calculation + matlab -nodisplay -r matlabcode > output.out - # copy output file to home - cp output.out $PBS_O_WORKDIR/. + # copy output file to home + cp output.out $PBS_O_WORKDIR/. This script may be submitted directly to the PBS workload manager via the qsub command. The inputs and matlab script are in matlabcode.m @@ -168,16 +168,16 @@ include quit** statement at the end of the matlabcode.m script. Submit the jobscript using qsub - $ qsub ./jobscript + $ qsub ./jobscript ### Parallel Matlab program example The last part of the configuration is done directly in the user Matlab script before Distributed Computing Toolbox is started. - sched = findResource('scheduler', 'type', 'mpiexec'); - set(sched, 'MpiexecFileName', '/apps/intel/impi/4.1.1/bin/mpirun'); - set(sched, 'EnvironmentSetMethod', 'setenv'); + sched = findResource('scheduler', 'type', 'mpiexec'); + set(sched, 'MpiexecFileName', '/apps/intel/impi/4.1.1/bin/mpirun'); + set(sched, 'EnvironmentSetMethod', 'setenv'); This script creates scheduler object "sched" of type "mpiexec" that starts workers using mpirun tool. To use correct version of mpirun, the @@ -192,42 +192,42 @@ The last step is to start matlabpool with "sched" object and correct number of workers. In this case qsub asked for total number of 32 cores, therefore the number of workers is also set to 32. - matlabpool(sched,32); - - - ... parallel code ... + matlabpool(sched,32); + + + ... parallel code ... + - - matlabpool close + matlabpool close The complete example showing how to use Distributed Computing Toolbox is show here. 
- sched = findResource('scheduler', 'type', 'mpiexec'); - set(sched, 'MpiexecFileName', '/apps/intel/impi/4.1.1/bin/mpirun') - set(sched, 'EnvironmentSetMethod', 'setenv') - set(sched, 'SubmitArguments', '') - sched + sched = findResource('scheduler', 'type', 'mpiexec'); + set(sched, 'MpiexecFileName', '/apps/intel/impi/4.1.1/bin/mpirun') + set(sched, 'EnvironmentSetMethod', 'setenv') + set(sched, 'SubmitArguments', '') + sched - matlabpool(sched,32); + matlabpool(sched,32); - n=2000; + n=2000; - W = rand(n,n); - W = distributed(W); - x = (1:n)'; - x = distributed(x); - spmd - [~, name] = system('hostname') -    -    T = W*x; % Calculation performed on labs, in parallel. -             % T and W are both codistributed arrays here. - end - T; - whos        % T and W are both distributed arrays here. + W = rand(n,n); + W = distributed(W); + x = (1:n)'; + x = distributed(x); + spmd + [~, name] = system('hostname') +    +    T = W*x; % Calculation performed on labs, in parallel. +             % T and W are both codistributed arrays here. + end + T; + whos        % T and W are both distributed arrays here. - matlabpool close - quit + matlabpool close + quit You can copy and paste the example in a .m file and execute. Note that the matlabpool size should correspond to **total number of cores** @@ -252,12 +252,12 @@ allocation. Starting Matlab workers is an expensive process that requires certain amount of time. For your information please see the following table: - compute nodes number of workers start-up time[s] - --------------- ------------------- -------------------- - 16 256 1008 - 8 128 534 - 4 64 333 - 2 32 210 +compute nodes number of workers start-up time[s] +--------------- ------------------- -------------------- +16 256 1008 +8 128 534 +4 64 333 +2 32 210  diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/octave.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/octave.md index 384badbe779394660da6b360e859b9c30882c514..04e38a45c03a925763de6e5ecddd87671cc96fc4 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/octave.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/octave.md @@ -3,7 +3,7 @@ Octave - + Introduction ------------ @@ -18,19 +18,19 @@ non-interactive programs. The Octave language is quite similar to Matlab so that most programs are easily portable. Read more on <http://www.gnu.org/software/octave/>**** -** -**Two versions of octave are available on Anselm, via module - Version module - ----------------------------------------------------------- --------------------------- - Octave 3.8.2, compiled with GCC and Multithreaded MKL Octave/3.8.2-gimkl-2.11.5 - Octave 4.0.1, compiled with GCC and Multithreaded MKL Octave/4.0.1-gimkl-2.11.5 - Octave 4.0.0, compiled with <span>GCC and OpenBLAS</span> Octave/4.0.0-foss-2015g +Two versions of octave are available on Anselm, via module + +Version module +----------------------------------------------------------- --------------------------- +Octave 3.8.2, compiled with GCC and Multithreaded MKL Octave/3.8.2-gimkl-2.11.5 +Octave 4.0.1, compiled with GCC and Multithreaded MKL Octave/4.0.1-gimkl-2.11.5 +Octave 4.0.0, compiled with >GCC and OpenBLAS Octave/4.0.0-foss-2015g  Modules and execution ---------------------- - $ module load Octave + $ module load Octave The octave on Anselm is linked to highly optimized MKL mathematical library. 
This provides threaded parallelization to many octave kernels, @@ -42,31 +42,31 @@ OMP_NUM_THREADS environment variable. To run octave interactively, log in with ssh -X parameter for X11 forwarding. Run octave: - $ octave + $ octave To run octave in batch mode, write an octave script, then write a bash jobscript and execute via the qsub command. By default, octave will use 16 threads when running MKL kernels. - #!/bin/bash + #!/bin/bash - # change to local scratch directory - cd /lscratch/$PBS_JOBID || exit + # change to local scratch directory + cd /lscratch/$PBS_JOBID || exit - # copy input file to scratch - cp $PBS_O_WORKDIR/octcode.m . + # copy input file to scratch + cp $PBS_O_WORKDIR/octcode.m . - # load octave module - module load octave + # load octave module + module load octave - # execute the calculation - octave -q --eval octcode > output.out + # execute the calculation + octave -q --eval octcode > output.out - # copy output file to home - cp output.out $PBS_O_WORKDIR/. + # copy output file to home + cp output.out $PBS_O_WORKDIR/. - #exit - exit + #exit + exit This script may be submitted directly to the PBS workload manager via the qsub command. The inputs are in octcode.m file, outputs in @@ -78,7 +78,7 @@ The octave c compiler mkoctfile calls the GNU gcc 4.8.1 for compiling native c code. This is very useful for running native c subroutines in octave environment. - $ mkoctfile -v + $ mkoctfile -v Octave may use MPI for interprocess communication This functionality is currently not supported on Anselm cluster. In case @@ -101,19 +101,19 @@ library](../intel-xeon-phi.html#section-3) Example - $ export OFFLOAD_REPORT=2 - $ export MKL_MIC_ENABLE=1 - $ module load octave - $ octave -q - octave:1> A=rand(10000); B=rand(10000); - octave:2> tic; C=A*B; toc - [MKL] [MIC --] [AO Function]   DGEMM - [MKL] [MIC --] [AO DGEMM Workdivision]   0.32 0.68 - [MKL] [MIC 00] [AO DGEMM CPU Time]   2.896003 seconds - [MKL] [MIC 00] [AO DGEMM MIC Time]   1.967384 seconds - [MKL] [MIC 00] [AO DGEMM CPU->MIC Data]   1347200000 bytes - [MKL] [MIC 00] [AO DGEMM MIC->CPU Data]   2188800000 bytes - Elapsed time is 2.93701 seconds. + $ export OFFLOAD_REPORT=2 + $ export MKL_MIC_ENABLE=1 + $ module load octave + $ octave -q + octave:1> A=rand(10000); B=rand(10000); + octave:2> tic; C=A*B; toc + [MKL] [MIC --] [AO Function]   DGEMM + [MKL] [MIC --] [AO DGEMM Workdivision]   0.32 0.68 + [MKL] [MIC 00] [AO DGEMM CPU Time]   2.896003 seconds + [MKL] [MIC 00] [AO DGEMM MIC Time]   1.967384 seconds + [MKL] [MIC 00] [AO DGEMM CPU->MIC Data]   1347200000 bytes + [MKL] [MIC 00] [AO DGEMM MIC->CPU Data]   2188800000 bytes + Elapsed time is 2.93701 seconds. In this example, the calculation was automatically divided among the CPU cores and the Xeon Phi MIC accelerator, reducing the total runtime from @@ -125,28 +125,28 @@ A version of [native](../intel-xeon-phi.html#section-4) Octave is compiled for Xeon Phi accelerators. Some limitations apply for this version: -- Only command line support. GUI, graph plotting etc. is - not supported. -- Command history in interactive mode is not supported. +- Only command line support. GUI, graph plotting etc. is + not supported. +- Command history in interactive mode is not supported. Octave is linked with parallel Intel MKL, so it best suited for batch processing of tasks that utilize BLAS, LAPACK and FFT operations. By default, number of threads is set to 120, you can control this -with <span><span class="monospace">OMP_NUM_THREADS</span> environment -variable. 
</span> +with > OMP_NUM_THREADS environment +variable. Calculations that do not employ parallelism (either by using parallel -MKL eg. via matrix operations, <span class="monospace">fork()</span> +MKL eg. via matrix operations, fork() function, [parallel package](http://octave.sourceforge.net/parallel/) or other mechanism) will actually run slower than on host CPU. -<span>To use Octave on a node with Xeon Phi:</span> +>To use Octave on a node with Xeon Phi: - $ ssh mic0 # login to the MIC card - $ source /apps/tools/octave/3.8.2-mic/bin/octave-env.sh # set up environment variables - $ octave -q /apps/tools/octave/3.8.2-mic/example/test0.m # run an example + $ ssh mic0 # login to the MIC card + $ source /apps/tools/octave/3.8.2-mic/bin/octave-env.sh # set up environment variables + $ octave -q /apps/tools/octave/3.8.2-mic/example/test0.m # run an example -<span> </span> +> diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/r.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/r.md index 0b552a03a7f1b4acc1baa19af90232648bcc09cd..bde86cc5d5a081443efba4b2c6f50969b6a3d0d4 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/r.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/r.md @@ -3,7 +3,7 @@ R - + Introduction ------------ @@ -33,20 +33,20 @@ Read more on <http://www.r-project.org/>, Modules ------- -****The R version 3.0.1 is available on Anselm, along with GUI interface +**The R version 3.0.1 is available on Anselm, along with GUI interface Rstudio - Application Version module - ------------- -------------- --------- - **R** R 3.0.1 R - **Rstudio** Rstudio 0.97 Rstudio +Application Version module +------------- -------------- --------- +R** R 3.0.1 R +Rstudio** Rstudio 0.97 Rstudio - $ module load R + $ module load R Execution --------- -[]()The R on Anselm is linked to highly optimized MKL mathematical +The R on Anselm is linked to highly optimized MKL mathematical library. This provides threaded parallelization to many R kernels, notably the linear algebra subroutines. The R runs these heavy calculation kernels without any penalty. By default, the R would @@ -58,8 +58,8 @@ OMP_NUM_THREADS environment variable. To run R interactively, using Rstudio GUI, log in with ssh -X parameter for X11 forwarding. Run rstudio: - $ module load Rstudio - $ rstudio + $ module load Rstudio + $ rstudio ### Batch execution @@ -69,25 +69,25 @@ running MKL kernels. Example jobscript: - #!/bin/bash + #!/bin/bash - # change to local scratch directory - cd /lscratch/$PBS_JOBID || exit + # change to local scratch directory + cd /lscratch/$PBS_JOBID || exit - # copy input file to scratch - cp $PBS_O_WORKDIR/rscript.R . + # copy input file to scratch + cp $PBS_O_WORKDIR/rscript.R . - # load R module - module load R + # load R module + module load R - # execute the calculation - R CMD BATCH rscript.R routput.out + # execute the calculation + R CMD BATCH rscript.R routput.out - # copy output file to home - cp routput.out $PBS_O_WORKDIR/. + # copy output file to home + cp routput.out $PBS_O_WORKDIR/. - #exit - exit + #exit + exit This script may be submitted directly to the PBS workload manager via the qsub command. The inputs are in rscript.R file, outputs in @@ -105,7 +105,7 @@ above](r.html#interactive-execution). In the following sections, we focus on explicit parallelization, where parallel constructs are directly stated within the R script. 
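The implicit MKL threading described above stays in effect when the explicit approaches below are used. When every allocated core already runs its own R process, it is usually advisable to cap the implicit threading in the jobscript before starting R; a minimal sketch, assuming the batch layout shown above (the value 1 is an illustrative choice, not a site requirement):

    # cap the implicit MKL/OpenMP threading to one thread per R process
    export OMP_NUM_THREADS=1

    # then start R as before
    R CMD BATCH rscript.R routput.out
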
-[]()Package parallel +Package parallel -------------------- The package parallel provides support for parallel computation, @@ -114,15 +114,15 @@ from package snow) and random-number generation. The package is activated this way: - $ R - > library(parallel) + $ R + > library(parallel) More information and examples may be obtained directly by reading the documentation available in R - > ?parallel - > library(help = "parallel") - > vignette("parallel") + > ?parallel + > library(help = "parallel") + > vignette("parallel") Download the package [parallell](package-parallel-vignette) vignette. @@ -139,41 +139,41 @@ Only cores of single node can be utilized this way! Forking example: - library(parallel) + library(parallel) - #integrand function - f <- function(i,h) { - x <- h*(i-0.5) - return (4/(1 + x*x)) - } + #integrand function + f <- function(i,h) { + x <- h*(i-0.5) + return (4/(1 + x*x)) + } - #initialize - size <- detectCores() + #initialize + size <- detectCores() - while (TRUE) - { - #read number of intervals - cat("Enter the number of intervals: (0 quits) ") - fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp) + while (TRUE) + { + #read number of intervals + cat("Enter the number of intervals: (0 quits) ") + fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp) - if(n<=0) break + if(n<=0) break - #run the calculation - n <- max(n,size) - h <- 1.0/n + #run the calculation + n <- max(n,size) + h <- 1.0/n - i <- seq(1,n); - pi3 <- h*sum(simplify2array(mclapply(i,f,h,mc.cores=size))); + i <- seq(1,n); + pi3 <- h*sum(simplify2array(mclapply(i,f,h,mc.cores=size))); - #print results - cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) - } + #print results + cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) + } The above example is the classic parallel example for calculating the number Ď€. Note the **detectCores()** and **mclapply()** functions. Execute the example as: - $ R --slave --no-save --no-restore -f pi3p.R + $ R --slave --no-save --no-restore -f pi3p.R Every evaluation of the integrad function runs in parallel on different process. @@ -193,8 +193,8 @@ reference manual is available at When using package Rmpi, both openmpi and R modules must be loaded - $ module load openmpi - $ module load R + $ module load openmpi + $ module load R Rmpi may be used in three basic ways. The static approach is identical to executing any other MPI programm. In addition, there is Rslaves @@ -202,60 +202,60 @@ dynamic MPI approach and the mpi.apply approach. In the following section, we will use the number Ď€ integration example, to illustrate all these concepts. -### []()static Rmpi +### static Rmpi Static Rmpi programs are executed via mpiexec, as any other MPI programs. Number of processes is static - given at the launch time. 
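Under PBS, the process count is therefore given by the allocation and the hostfile that PBS hands to mpiexec, as in the example jobscript later on this page; a minimal interactive sketch, assuming the pi3.R example below (the select statement is only an illustrative allocation):

    $ qsub -q qexp -l select=2:ncpus=16:mpiprocs=16:ompthreads=1 -I

    $ module load openmpi
    $ module load R

    $ mpiexec R --slave --no-save --no-restore -f pi3.R
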
Static Rmpi example: - library(Rmpi) - - #integrand function - f <- function(i,h) { - x <- h*(i-0.5) - return (4/(1 + x*x)) + library(Rmpi) + + #integrand function + f <- function(i,h) { + x <- h*(i-0.5) + return (4/(1 + x*x)) + } + + #initialize + invisible(mpi.comm.dup(0,1)) + rank <- mpi.comm.rank() + size <- mpi.comm.size() + n<-0 + + while (TRUE) + { + #read number of intervals + if (rank==0) { + cat("Enter the number of intervals: (0 quits) ") + fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp) } - #initialize - invisible(mpi.comm.dup(0,1)) - rank <- mpi.comm.rank() - size <- mpi.comm.size() - n<-0 + #broadcat the intervals + n <- mpi.bcast(as.integer(n),type=1) - while (TRUE) - { - #read number of intervals - if (rank==0) { - cat("Enter the number of intervals: (0 quits) ") - fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp) - } + if(n<=0) break - #broadcat the intervals - n <- mpi.bcast(as.integer(n),type=1) + #run the calculation + n <- max(n,size) + h <- 1.0/n - if(n<=0) break + i <- seq(rank+1,n,size); + mypi <- h*sum(sapply(i,f,h)); - #run the calculation - n <- max(n,size) - h <- 1.0/n + pi3 <- mpi.reduce(mypi) - i <- seq(rank+1,n,size); - mypi <- h*sum(sapply(i,f,h)); - - pi3 <- mpi.reduce(mypi) - - #print results - if (rank==0) cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) - } + #print results + if (rank==0) cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) + } - mpi.quit() + mpi.quit() The above is the static MPI example for calculating the number Ď€. Note the **library(Rmpi)** and **mpi.comm.dup()** function calls. Execute the example as: - $ mpiexec R --slave --no-save --no-restore -f pi3.R + $ mpiexec R --slave --no-save --no-restore -f pi3.R ### dynamic Rmpi @@ -265,70 +265,70 @@ function call within the Rmpi program. 
Dynamic Rmpi example: - #integrand function - f <- function(i,h) { - x <- h*(i-0.5) - return (4/(1 + x*x)) + #integrand function + f <- function(i,h) { + x <- h*(i-0.5) + return (4/(1 + x*x)) + } + + #the worker function + workerpi <- function() + { + #initialize + rank <- mpi.comm.rank() + size <- mpi.comm.size() + n<-0 + + while (TRUE) + { + #read number of intervals + if (rank==0) { + cat("Enter the number of intervals: (0 quits) ") + fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp) } - #the worker function - workerpi <- function() - { - #initialize - rank <- mpi.comm.rank() - size <- mpi.comm.size() - n<-0 + #broadcat the intervals + n <- mpi.bcast(as.integer(n),type=1) - while (TRUE) - { - #read number of intervals - if (rank==0) { - cat("Enter the number of intervals: (0 quits) ") - fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp) - } + if(n<=0) break - #broadcat the intervals - n <- mpi.bcast(as.integer(n),type=1) + #run the calculation + n <- max(n,size) + h <- 1.0/n - if(n<=0) break + i <- seq(rank+1,n,size); + mypi <- h*sum(sapply(i,f,h)); - #run the calculation - n <- max(n,size) - h <- 1.0/n + pi3 <- mpi.reduce(mypi) - i <- seq(rank+1,n,size); - mypi <- h*sum(sapply(i,f,h)); + #print results + if (rank==0) cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) + } + } - pi3 <- mpi.reduce(mypi) + #main + library(Rmpi) - #print results - if (rank==0) cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) - } - } - - #main - library(Rmpi) - - cat("Enter the number of slaves: ") - fp<-file("stdin"); ns<-scan(fp,nmax=1); close(fp) + cat("Enter the number of slaves: ") + fp<-file("stdin"); ns<-scan(fp,nmax=1); close(fp) - mpi.spawn.Rslaves(nslaves=ns) - mpi.bcast.Robj2slave(f) - mpi.bcast.Robj2slave(workerpi) + mpi.spawn.Rslaves(nslaves=ns) + mpi.bcast.Robj2slave(f) + mpi.bcast.Robj2slave(workerpi) - mpi.bcast.cmd(workerpi()) - workerpi() + mpi.bcast.cmd(workerpi()) + workerpi() - mpi.quit() + mpi.quit() The above example is the dynamic MPI example for calculating the number Ď€. Both master and slave processes carry out the calculation. Note the -**mpi.spawn.Rslaves(), mpi.bcast.Robj2slave()** and the -**mpi.bcast.cmd()** function calls. +mpi.spawn.Rslaves(), mpi.bcast.Robj2slave()** and the +mpi.bcast.cmd()** function calls. Execute the example as: - $ R --slave --no-save --no-restore -f pi3Rslaves.R + $ R --slave --no-save --no-restore -f pi3Rslaves.R ### mpi.apply Rmpi @@ -341,63 +341,63 @@ Execution is identical to other dynamic Rmpi programs. 
mpi.apply Rmpi example: - #integrand function - f <- function(i,h) { - x <- h*(i-0.5) - return (4/(1 + x*x)) - } + #integrand function + f <- function(i,h) { + x <- h*(i-0.5) + return (4/(1 + x*x)) + } - #the worker function - workerpi <- function(rank,size,n) - { - #run the calculation - n <- max(n,size) - h <- 1.0/n + #the worker function + workerpi <- function(rank,size,n) + { + #run the calculation + n <- max(n,size) + h <- 1.0/n - i <- seq(rank,n,size); - mypi <- h*sum(sapply(i,f,h)); + i <- seq(rank,n,size); + mypi <- h*sum(sapply(i,f,h)); - return(mypi) - } + return(mypi) + } - #main - library(Rmpi) + #main + library(Rmpi) - cat("Enter the number of slaves: ") - fp<-file("stdin"); ns<-scan(fp,nmax=1); close(fp) + cat("Enter the number of slaves: ") + fp<-file("stdin"); ns<-scan(fp,nmax=1); close(fp) - mpi.spawn.Rslaves(nslaves=ns) - mpi.bcast.Robj2slave(f) - mpi.bcast.Robj2slave(workerpi) + mpi.spawn.Rslaves(nslaves=ns) + mpi.bcast.Robj2slave(f) + mpi.bcast.Robj2slave(workerpi) - while (TRUE) - { - #read number of intervals - cat("Enter the number of intervals: (0 quits) ") - fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp) - if(n<=0) break + while (TRUE) + { + #read number of intervals + cat("Enter the number of intervals: (0 quits) ") + fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp) + if(n<=0) break - #run workerpi - i=seq(1,2*ns) - pi3=sum(mpi.parSapply(i,workerpi,2*ns,n)) + #run workerpi + i=seq(1,2*ns) + pi3=sum(mpi.parSapply(i,workerpi,2*ns,n)) - #print results - cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) - } + #print results + cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) + } - mpi.quit() + mpi.quit() The above is the mpi.apply MPI example for calculating the number Ď€. Only the slave processes carry out the calculation. Note the -**mpi.parSapply(), ** function call. The package <span +mpi.parSapply(), ** function call. The package class="anchor-link">parallel -[example](r.html#package-parallel)</span>[above](r.html#package-parallel) +[example](r.html#package-parallel)[above](r.html#package-parallel) may be trivially adapted (for much better performance) to this structure using the mclapply() in place of mpi.parSapply(). Execute the example as: - $ R --slave --no-save --no-restore -f pi3parSapply.R + $ R --slave --no-save --no-restore -f pi3parSapply.R Combining parallel and Rmpi --------------------------- @@ -414,30 +414,30 @@ submit via the **qsub** Example jobscript for [static Rmpi](r.html#static-rmpi) parallel R execution, running 1 process per core: - #!/bin/bash - #PBS -q qprod - #PBS -N Rjob - #PBS -l select=100:ncpus=16:mpiprocs=16:ompthreads=1 + #!/bin/bash + #PBS -q qprod + #PBS -N Rjob + #PBS -l select=100:ncpus=16:mpiprocs=16:ompthreads=1 - # change to scratch directory - SCRDIR=/scratch/$USER/myjob - cd $SCRDIR || exit + # change to scratch directory + SCRDIR=/scratch/$USER/myjob + cd $SCRDIR || exit - # copy input file to scratch - cp $PBS_O_WORKDIR/rscript.R . + # copy input file to scratch + cp $PBS_O_WORKDIR/rscript.R . - # load R and openmpi module - module load R - module load openmpi + # load R and openmpi module + module load R + module load openmpi - # execute the calculation - mpiexec -bycore -bind-to-core R --slave --no-save --no-restore -f rscript.R + # execute the calculation + mpiexec -bycore -bind-to-core R --slave --no-save --no-restore -f rscript.R - # copy output file to home - cp routput.out $PBS_O_WORKDIR/. + # copy output file to home + cp routput.out $PBS_O_WORKDIR/. 
- #exit - exit + #exit + exit For more information about jobscripts and MPI execution refer to the [Job diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/fftw.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/fftw.md index ac19c671d835596c4cf2850d81878095a46813cc..c35d9412de21656fb52ef417775e4550f7d8c97f 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/fftw.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/fftw.md @@ -4,7 +4,7 @@ FFTW The discrete Fourier transform in one or more dimensions, MPI parallel - +  @@ -15,10 +15,10 @@ discrete cosine/sine transforms or DCT/DST). The FFTW library allows for MPI parallel, in-place discrete Fourier transform, with data distributed over number of nodes.**** -** -**Two versions, **3.3.3** and **2.1.5** of FFTW are available on Anselm, + +Two versions, **3.3.3** and **2.1.5** of FFTW are available on Anselm, each compiled for **Intel MPI** and **OpenMPI** using **intel** and -**gnu** compilers. These are available via modules: +gnu** compilers. These are available via modules: <table> <colgroup> @@ -97,7 +97,7 @@ each compiled for **Intel MPI** and **OpenMPI** using **intel** and </tbody> </table> - $ module load fftw3 + $ module load fftw3 The module sets up environment variables, required for linking and running fftw enabled applications. Make sure that the choice of fftw @@ -107,48 +107,48 @@ different implementations may have unpredictable results. Example ------- - #include <fftw3-mpi.h> - int main(int argc, char **argv) - { -    const ptrdiff_t N0 = 100, N1 = 1000; -    fftw_plan plan; -    fftw_complex *data; -    ptrdiff_t alloc_local, local_n0, local_0_start, i, j; + #include <fftw3-mpi.h> + int main(int argc, char **argv) + { +    const ptrdiff_t N0 = 100, N1 = 1000; +    fftw_plan plan; +    fftw_complex *data; +    ptrdiff_t alloc_local, local_n0, local_0_start, i, j; -    MPI_Init(&argc, &argv); -    fftw_mpi_init(); +    MPI_Init(&argc, &argv); +    fftw_mpi_init(); -    /* get local data size and allocate */ -    alloc_local = fftw_mpi_local_size_2d(N0, N1, MPI_COMM_WORLD, -                                         &local_n0, &local_0_start); -    data = fftw_alloc_complex(alloc_local); +    /* get local data size and allocate */ +    alloc_local = fftw_mpi_local_size_2d(N0, N1, MPI_COMM_WORLD, +                                         &local_n0, &local_0_start); +    data = fftw_alloc_complex(alloc_local); -    /* create plan for in-place forward DFT */ -    plan = fftw_mpi_plan_dft_2d(N0, N1, data, data, MPI_COMM_WORLD, -                                FFTW_FORWARD, FFTW_ESTIMATE); +    /* create plan for in-place forward DFT */ +    plan = fftw_mpi_plan_dft_2d(N0, N1, data, data, MPI_COMM_WORLD, +                                FFTW_FORWARD, FFTW_ESTIMATE); -    /* initialize data */ -    for (i = 0; i < local_n0; ++i) for (j = 0; j < N1; ++j) -    {  data[i*N1 + j][0] = i; -        data[i*N1 + j][1] = j; } +    /* initialize data */ +    for (i = 0; i < local_n0; ++i) for (j = 0; j < N1; ++j) +    {  data[i*N1 + j][0] = i; +        data[i*N1 + j][1] = j; } -    /* compute transforms, in-place, as many times as desired */ -    fftw_execute(plan); +    /* compute transforms, in-place, as many times as desired */ +    fftw_execute(plan); -    fftw_destroy_plan(plan); +    fftw_destroy_plan(plan); -    MPI_Finalize(); - } +    MPI_Finalize(); + } Load modules and compile: - $ 
module load impi intel - $ module load fftw3-mpi + $ module load impi intel + $ module load fftw3-mpi - $ mpicc testfftw3mpi.c -o testfftw3mpi.x -Wl,-rpath=$LIBRARY_PATH -lfftw3_mpi + $ mpicc testfftw3mpi.c -o testfftw3mpi.x -Wl,-rpath=$LIBRARY_PATH -lfftw3_mpi -<span class="internal-link">Run the example as [Intel MPI -program](../mpi-1/running-mpich2.html)</span>. + Run the example as [Intel MPI +program](../mpi-1/running-mpich2.html). Read more on FFTW usage on the [FFTW website.](http://www.fftw.org/fftw3_doc/) diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/gsl.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/gsl.md index 18847a38b34f42ff775df06d876e9a743d4f47a7..00ea8e8779ce195d586afb128ad1cce2d8a76ea9 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/gsl.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/gsl.md @@ -5,7 +5,7 @@ The GNU Scientific Library. Provides a wide range of mathematical routines. - + Introduction ------------ @@ -20,43 +20,43 @@ wrappers to be written for very high level languages. The library covers a wide range of topics in numerical computing. Routines are available for the following areas: - ------------------------ ------------------------ ------------------------ - Complex Numbers Roots of Polynomials +------------------------ ------------------------ ------------------------ + Complex Numbers Roots of Polynomials - Special Functions Vectors and Matrices + Special Functions Vectors and Matrices - Permutations Combinations + Permutations Combinations - Sorting BLAS Support + Sorting BLAS Support - Linear Algebra CBLAS Library + Linear Algebra CBLAS Library - Fast Fourier Transforms Eigensystems + Fast Fourier Transforms Eigensystems - Random Numbers Quadrature + Random Numbers Quadrature - Random Distributions Quasi-Random Sequences + Random Distributions Quasi-Random Sequences - Histograms Statistics + Histograms Statistics - Monte Carlo Integration N-Tuples + Monte Carlo Integration N-Tuples - Differential Equations Simulated Annealing + Differential Equations Simulated Annealing - Numerical Interpolation - Differentiation + Numerical Interpolation + Differentiation - Series Acceleration Chebyshev Approximations + Series Acceleration Chebyshev Approximations - Root-Finding Discrete Hankel - Transforms + Root-Finding Discrete Hankel + Transforms - Least-Squares Fitting Minimization + Least-Squares Fitting Minimization - IEEE Floating-Point Physical Constants + IEEE Floating-Point Physical Constants - Basis Splines Wavelets - ------------------------ ------------------------ ------------------------ + Basis Splines Wavelets +------------------------ ------------------------ ------------------------ Modules ------- @@ -64,12 +64,12 @@ Modules The GSL 1.16 is available on Anselm, compiled for GNU and Intel compiler. These variants are available via modules: - Module Compiler - ----------------------- ----------- - gsl/1.16-gcc gcc 4.8.6 - gsl/1.16-icc(default) icc +Module Compiler +----------------------- ----------- +gsl/1.16-gcc gcc 4.8.6 +gsl/1.16-icc(default) icc -  $ module load gsl +  $ module load gsl The module sets up environment variables, required for linking and running GSL enabled applications. This particular command loads the @@ -86,16 +86,16 @@ Using the MKL is recommended. 
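To see exactly which include and library paths a given gsl module exports before composing the link line, the module itself can be inspected; a quick check, assuming the standard module tooling used throughout this documentation:

    $ module show gsl
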
### Compiling and linking with Intel compilers - $ module load intel - $ module load gsl - $ icc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -mkl -lgsl + $ module load intel + $ module load gsl + $ icc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -mkl -lgsl ### Compiling and linking with GNU compilers - $ module load gcc - $ module load mkl - $ module load gsl/1.16-gcc - $ gcc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lgsl + $ module load gcc + $ module load mkl + $ module load gsl/1.16-gcc + $ gcc myprog.c -o myprog.x -Wl,-rpath=$LIBRARY_PATH -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lgsl Example ------- @@ -103,60 +103,60 @@ Example Following is an example of discrete wavelet transform implemented by GSL: - #include <stdio.h> - #include <math.h> - #include <gsl/gsl_sort.h> - #include <gsl/gsl_wavelet.h> - - int - main (int argc, char **argv) - { -  int i, n = 256, nc = 20; -  double *data = malloc (n * sizeof (double)); -  double *abscoeff = malloc (n * sizeof (double)); -  size_t *p = malloc (n * sizeof (size_t)); - -  gsl_wavelet *w; -  gsl_wavelet_workspace *work; - -  w = gsl_wavelet_alloc (gsl_wavelet_daubechies, 4); -  work = gsl_wavelet_workspace_alloc (n); - -  for (i=0; i<n; i++) -  data[i] = sin (3.141592654*(double)i/256.0); - -  gsl_wavelet_transform_forward (w, data, 1, n, work); - -  for (i = 0; i < n; i++) -    { -      abscoeff[i] = fabs (data[i]); -    } -  -  gsl_sort_index (p, abscoeff, 1, n); -  -  for (i = 0; (i + nc) < n; i++) -    data[p[i]] = 0; -  -  gsl_wavelet_transform_inverse (w, data, 1, n, work); -  -  for (i = 0; i < n; i++) -    { -      printf ("%gn", data[i]); -    } -  -  gsl_wavelet_free (w); -  gsl_wavelet_workspace_free (work); - -  free (data); -  free (abscoeff); -  free (p); -  return 0; - } + #include <stdio.h> + #include <math.h> + #include <gsl/gsl_sort.h> + #include <gsl/gsl_wavelet.h> + + int + main (int argc, char **argv) + { +  int i, n = 256, nc = 20; +  double *data = malloc (n * sizeof (double)); +  double *abscoeff = malloc (n * sizeof (double)); +  size_t *p = malloc (n * sizeof (size_t)); + +  gsl_wavelet *w; +  gsl_wavelet_workspace *work; + +  w = gsl_wavelet_alloc (gsl_wavelet_daubechies, 4); +  work = gsl_wavelet_workspace_alloc (n); + +  for (i=0; i<n; i++) +  data[i] = sin (3.141592654*(double)i/256.0); + +  gsl_wavelet_transform_forward (w, data, 1, n, work); + +  for (i = 0; i < n; i++) +    { +      abscoeff[i] = fabs (data[i]); +    } +  +  gsl_sort_index (p, abscoeff, 1, n); +  +  for (i = 0; (i + nc) < n; i++) +    data[p[i]] = 0; +  +  gsl_wavelet_transform_inverse (w, data, 1, n, work); +  +  for (i = 0; i < n; i++) +    { +      printf ("%gn", data[i]); +    } +  +  gsl_wavelet_free (w); +  gsl_wavelet_workspace_free (work); + +  free (data); +  free (abscoeff); +  free (p); +  return 0; + } Load modules and compile: - $ module load intel gsl - icc dwt.c -o dwt.x -Wl,-rpath=$LIBRARY_PATH -mkl -lgsl + $ module load intel gsl + icc dwt.c -o dwt.x -Wl,-rpath=$LIBRARY_PATH -mkl -lgsl In this example, we compile the dwt.c code using the Intel compiler and link it to the MKL and GSL library, note the -mkl and -lgsl options. 
The diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/hdf5.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/hdf5.md index 54bf5a49f6592a35f5ca0841eff79cffe68630bf..45983dad867fed66af00b70b3a62f17ea9230762 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/hdf5.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/hdf5.md @@ -16,7 +16,7 @@ according to your needs. Versions **1.8.11** and **1.8.13** of HDF5 library are available on Anselm, compiled for **Intel MPI** and **OpenMPI** using **intel** and -**gnu** compilers. These are available via modules: +gnu** compilers. These are available via modules: <table style="width:100%;"> <colgroup> @@ -99,7 +99,7 @@ Anselm, compiled for **Intel MPI** and **OpenMPI** using **intel** and  - $ module load hdf5-parallel + $ module load hdf5-parallel The module sets up environment variables, required for linking and running HDF5 enabled applications. Make sure that the choice of HDF5 @@ -119,60 +119,60 @@ computations. Example ------- - #include "hdf5.h" - #define FILE "dset.h5" + #include "hdf5.h" + #define FILE "dset.h5" - int main() { + int main() { - hid_t file_id, dataset_id, dataspace_id; /* identifiers */ - hsize_t dims[2]; - herr_t status; - int i, j, dset_data[4][6]; + hid_t file_id, dataset_id, dataspace_id; /* identifiers */ + hsize_t dims[2]; + herr_t status; + int i, j, dset_data[4][6]; - /* Create a new file using default properties. */ - file_id = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT); + /* Create a new file using default properties. */ + file_id = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT); - /* Create the data space for the dataset. */ - dims[0] = 4; - dims[1] = 6; - dataspace_id = H5Screate_simple(2, dims, NULL); + /* Create the data space for the dataset. */ + dims[0] = 4; + dims[1] = 6; + dataspace_id = H5Screate_simple(2, dims, NULL); - /* Initialize the dataset. */ - for (i = 0; i < 4; i++) - for (j = 0; j < 6; j++) - dset_data[i][j] = i * 6 + j + 1; + /* Initialize the dataset. */ + for (i = 0; i < 4; i++) + for (j = 0; j < 6; j++) + dset_data[i][j] = i * 6 + j + 1; - /* Create the dataset. */ - dataset_id = H5Dcreate2(file_id, "/dset", H5T_STD_I32BE, dataspace_id, - H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT); + /* Create the dataset. */ + dataset_id = H5Dcreate2(file_id, "/dset", H5T_STD_I32BE, dataspace_id, + H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT); - /* Write the dataset. */ - status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, - dset_data); + /* Write the dataset. */ + status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, + dset_data); - status = H5Dread(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, - dset_data); + status = H5Dread(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, + dset_data); - /* End access to the dataset and release resources used by it. */ - status = H5Dclose(dataset_id); + /* End access to the dataset and release resources used by it. */ + status = H5Dclose(dataset_id); - /* Terminate access to the data space. */ - status = H5Sclose(dataspace_id); + /* Terminate access to the data space. */ + status = H5Sclose(dataspace_id); - /* Close the file. */ - status = H5Fclose(file_id); - } + /* Close the file. 
*/ + status = H5Fclose(file_id); + } Load modules and compile: - $ module load intel impi - $ module load hdf5-parallel + $ module load intel impi + $ module load hdf5-parallel - $ mpicc hdf5test.c -o hdf5test.x -Wl,-rpath=$LIBRARY_PATH $HDF5_INC $HDF5_SHLIB + $ mpicc hdf5test.c -o hdf5test.x -Wl,-rpath=$LIBRARY_PATH $HDF5_INC $HDF5_SHLIB -<span class="internal-link">Run the example as [Intel MPI -program](../anselm-cluster-documentation/software/mpi-1/running-mpich2.html)</span>. + Run the example as [Intel MPI +program](../anselm-cluster-documentation/software/mpi-1/running-mpich2.html). For further informations, please see the website: <http://www.hdfgroup.org/HDF5/> @@ -183,10 +183,10 @@ For further informations, please see the website:  -<span -class="smarterwiki-popup-bubble smarterwiki-popup-bubble-active smarterwiki-popup-bubble-flipped"><span -class="smarterwiki-popup-bubble-tip"></span><span -class="smarterwiki-popup-bubble-body"><span -class="smarterwiki-popup-bubble-links-container"><span -class="smarterwiki-popup-bubble-links"><span -class="smarterwiki-popup-bubble-links-row">[{.smarterwiki-popup-bubble-link-favicon}](http://maps.google.com/maps?q=HDF5%20icc%20serial%09pthread%09hdf5%2F1.8.13%09%24HDF5_INC%20%24HDF5_SHLIB%09%24HDF5_INC%20%24HDF5_CPP_LIB%09%24HDF5_INC%20%24HDF5_F90_LIB%0A%0AHDF5%20icc%20parallel%20MPI%0A%09pthread%2C%20IntelMPI%09hdf5-parallel%2F1.8.13%09%24HDF5_INC%20%24HDF5_SHLIB%09Not%20supported%09%24HDF5_INC%20%24HDF5_F90_LIB "Search Google Maps"){.smarterwiki-popup-bubble-link}[{.smarterwiki-popup-bubble-link-favicon}](http://www.google.com/search?q=HDF5%20icc%20serial%09pthread%09hdf5%2F1.8.13%09%24HDF5_INC%20%24HDF5_SHLIB%09%24HDF5_INC%20%24HDF5_CPP_LIB%09%24HDF5_INC%20%24HDF5_F90_LIB%0A%0AHDF5%20icc%20parallel%20MPI%0A%09pthread%2C%20IntelMPI%09hdf5-parallel%2F1.8.13%09%24HDF5_INC%20%24HDF5_SHLIB%09Not%20supported%09%24HDF5_INC%20%24HDF5_F90_LIB "Search Google"){.smarterwiki-popup-bubble-link}[](http://www.google.com/search?hl=com&btnI=I'm+Feeling+Lucky&q=HDF5%20icc%20serial%09pthread%09hdf5%2F1.8.13%09%24HDF5_INC%20%24HDF5_SHLIB%09%24HDF5_INC%20%24HDF5_CPP_LIB%09%24HDF5_INC%20%24HDF5_F90_LIB%0A%0AHDF5%20icc%20parallel%20MPI%0A%09pthread%2C%20IntelMPI%09hdf5-parallel%2F1.8.13%09%24HDF5_INC%20%24HDF5_SHLIB%09Not%20supported%09%24HDF5_INC%20%24HDF5_F90_LIB+wikipedia "Search Wikipedia"){.smarterwiki-popup-bubble-link}</span></span></span></span></span> + +class="smarterwiki-popup-bubble smarterwiki-popup-bubble-active smarterwiki-popup-bubble-flipped"> +class="smarterwiki-popup-bubble-tip"> +class="smarterwiki-popup-bubble-body"> +class="smarterwiki-popup-bubble-links-container"> +class="smarterwiki-popup-bubble-links"> +class="smarterwiki-popup-bubble-links-row">[{.smarterwiki-popup-bubble-link-favicon}](http://maps.google.com/maps?q=HDF5%20icc%20serial%09pthread%09hdf5%2F1.8.13%09%24HDF5_INC%20%24HDF5_SHLIB%09%24HDF5_INC%20%24HDF5_CPP_LIB%09%24HDF5_INC%20%24HDF5_F90_LIB%0A%0AHDF5%20icc%20parallel%20MPI%0A%09pthread%2C%20IntelMPI%09hdf5-parallel%2F1.8.13%09%24HDF5_INC%20%24HDF5_SHLIB%09Not%20supported%09%24HDF5_INC%20%24HDF5_F90_LIB "Search Google 
Maps"){.smarterwiki-popup-bubble-link}[{.smarterwiki-popup-bubble-link-favicon}](http://www.google.com/search?q=HDF5%20icc%20serial%09pthread%09hdf5%2F1.8.13%09%24HDF5_INC%20%24HDF5_SHLIB%09%24HDF5_INC%20%24HDF5_CPP_LIB%09%24HDF5_INC%20%24HDF5_F90_LIB%0A%0AHDF5%20icc%20parallel%20MPI%0A%09pthread%2C%20IntelMPI%09hdf5-parallel%2F1.8.13%09%24HDF5_INC%20%24HDF5_SHLIB%09Not%20supported%09%24HDF5_INC%20%24HDF5_F90_LIB "Search Google"){.smarterwiki-popup-bubble-link}[](http://www.google.com/search?hl=com&btnI=I'm+Feeling+Lucky&q=HDF5%20icc%20serial%09pthread%09hdf5%2F1.8.13%09%24HDF5_INC%20%24HDF5_SHLIB%09%24HDF5_INC%20%24HDF5_CPP_LIB%09%24HDF5_INC%20%24HDF5_F90_LIB%0A%0AHDF5%20icc%20parallel%20MPI%0A%09pthread%2C%20IntelMPI%09hdf5-parallel%2F1.8.13%09%24HDF5_INC%20%24HDF5_SHLIB%09Not%20supported%09%24HDF5_INC%20%24HDF5_F90_LIB+wikipedia "Search Wikipedia"){.smarterwiki-popup-bubble-link}</span></span></span> diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/intel-numerical-libraries.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/intel-numerical-libraries.md index 946850e12ac6247fabacba22fa6b4e489f59a247..2304f2fd17fe219c72ae5157348c0ded0e64cb9d 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/intel-numerical-libraries.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/intel-numerical-libraries.md @@ -4,7 +4,7 @@ Intel numerical libraries Intel libraries for high performance in numerical computing - + Intel Math Kernel Library ------------------------- @@ -15,7 +15,7 @@ Intel MKL unites and provides these basic components: BLAS, LAPACK, ScaLapack, PARDISO, FFT, VML, VSL, Data fitting, Feast Eigensolver and many more. - $ module load mkl + $ module load mkl Read more at the [Intel MKL](../intel-suite/intel-mkl.html) page. @@ -30,7 +30,7 @@ includes signal, image and frame processing algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax and many more. - $ module load ipp + $ module load ipp Read more at the [Intel IPP](../intel-suite/intel-integrated-performance-primitives.html) @@ -48,7 +48,7 @@ smaller parallel components. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner. - $ module load tbb + $ module load tbb Read more at the [Intel TBB](../intel-suite/intel-tbb.html) page. diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md index 77bac5a4ee730bd606c119879d805d844840dabf..7fa0a444c23e6c29bef8b872bc843483cc5b153e 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/magma-for-intel-xeon-phi.md @@ -8,7 +8,7 @@ accelerators To be able to compile and link code with MAGMA library user has to load following module: - $ module load magma/1.3.0-mic + $ module load magma/1.3.0-mic To make compilation more user friendly module also sets these two environment variables: @@ -21,9 +21,9 @@ step).  
Compilation example: - $ icc -mkl -O3 -DHAVE_MIC -DADD_ -Wall $MAGMA_INC -c testing_dgetrf_mic.cpp -o testing_dgetrf_mic.o + $ icc -mkl -O3 -DHAVE_MIC -DADD_ -Wall $MAGMA_INC -c testing_dgetrf_mic.cpp -o testing_dgetrf_mic.o - $ icc -mkl -O3 -DHAVE_MIC -DADD_ -Wall -fPIC -Xlinker -zmuldefs -Wall -DNOCHANGE -DHOST testing_dgetrf_mic.o -o testing_dgetrf_mic $MAGMA_LIBS + $ icc -mkl -O3 -DHAVE_MIC -DADD_ -Wall -fPIC -Xlinker -zmuldefs -Wall -DNOCHANGE -DHOST testing_dgetrf_mic.o -o testing_dgetrf_mic $MAGMA_LIBS  @@ -34,43 +34,43 @@ accelerator prior to executing the user application. The server can be started and stopped using following scripts: To start MAGMA server use: -**$MAGMAROOT/start_magma_server** +$MAGMAROOT/start_magma_server** To stop the server use: -**$MAGMAROOT/stop_magma_server** +$MAGMAROOT/stop_magma_server** For deeper understanding how the MAGMA server is started, see the following script: -**$MAGMAROOT/launch_anselm_from_mic.sh** +$MAGMAROOT/launch_anselm_from_mic.sh** To test if the MAGMA server runs properly we can run one of examples that are part of the MAGMA installation: - [user@cn204 ~]$ $MAGMAROOT/testing/testing_dgetrf_mic + [user@cn204 ~]$ $MAGMAROOT/testing/testing_dgetrf_mic - [user@cn204 ~]$ export OMP_NUM_THREADS=16 + [user@cn204 ~]$ export OMP_NUM_THREADS=16 - [lriha@cn204 ~]$ $MAGMAROOT/testing/testing_dgetrf_mic - Usage: /apps/libs/magma-mic/magmamic-1.3.0/testing/testing_dgetrf_mic [options] [-h|--help] + [lriha@cn204 ~]$ $MAGMAROOT/testing/testing_dgetrf_mic + Usage: /apps/libs/magma-mic/magmamic-1.3.0/testing/testing_dgetrf_mic [options] [-h|--help] -  M    N    CPU GFlop/s (sec)  MAGMA GFlop/s (sec)  ||PA-LU||/(||A||*N) - ========================================================================= -  1088 1088    ---  ( --- )    13.93 (  0.06)    --- -  2112 2112    ---  ( --- )    77.85 (  0.08)    --- -  3136 3136    ---  ( --- )   183.21 (  0.11)    --- -  4160 4160    ---  ( --- )   227.52 (  0.21)    --- -  5184 5184    ---  ( --- )   258.61 (  0.36)    --- -  6208 6208    ---  ( --- )   333.12 (  0.48)    --- -  7232 7232    ---  ( --- )   416.52 (  0.61)    --- -  8256 8256    ---  ( --- )   446.97 (  0.84)    --- -  9280 9280    ---  ( --- )   461.15 (  1.16)    --- - 10304 10304    ---  ( --- )   500.70 (  1.46)    --- +  M    N    CPU GFlop/s (sec)  MAGMA GFlop/s (sec)  ||PA-LU||/(||A||*N) + ========================================================================= +  1088 1088    ---  ( --- )    13.93 (  0.06)    --- +  2112 2112    ---  ( --- )    77.85 (  0.08)    --- +  3136 3136    ---  ( --- )   183.21 (  0.11)    --- +  4160 4160    ---  ( --- )   227.52 (  0.21)    --- +  5184 5184    ---  ( --- )   258.61 (  0.36)    --- +  6208 6208    ---  ( --- )   333.12 (  0.48)    --- +  7232 7232    ---  ( --- )   416.52 (  0.61)    --- +  8256 8256    ---  ( --- )   446.97 (  0.84)    --- +  9280 9280    ---  ( --- )   461.15 (  1.16)    --- + 10304 10304    ---  ( --- )   500.70 (  1.46)    ---  Please note: MAGMA contains several benchmarks and examples that can be found in: -**$MAGMAROOT/testing/** +$MAGMAROOT/testing/** MAGMA relies on the performance of all CPU cores as well as on the performance of the accelerator. 
Therefore on Anselm number of CPU OpenMP diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/petsc.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/petsc.md index 2441fd2bb02fa04fb473cd9b8f87974fe835f099..ebb0f189e6c78b162df95c989ae84f5f8715069d 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/petsc.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/petsc.md @@ -7,7 +7,7 @@ equations. It supports MPI, shared memory, and GPUs through CUDA or OpenCL, as well as hybrid MPI-shared memory or MPI-GPU parallelism. - + Introduction ------------ @@ -24,18 +24,18 @@ as well as hybrid MPI-shared memory or MPI-GPU parallelism. Resources --------- -- [project webpage](http://www.mcs.anl.gov/petsc/) -- [documentation](http://www.mcs.anl.gov/petsc/documentation/) - - [PETSc Users - Manual (PDF)](http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf) - - [index of all manual - pages](http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/singleindex.html) -- PRACE Video Tutorial [part - 1](http://www.youtube.com/watch?v=asVaFg1NDqY), [part - 2](http://www.youtube.com/watch?v=ubp_cSibb9I), [part - 3](http://www.youtube.com/watch?v=vJAAAQv-aaw), [part - 4](http://www.youtube.com/watch?v=BKVlqWNh8jY), [part - 5](http://www.youtube.com/watch?v=iXkbLEBFjlM) +- [project webpage](http://www.mcs.anl.gov/petsc/) +- [documentation](http://www.mcs.anl.gov/petsc/documentation/) + - [PETSc Users + Manual (PDF)](http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf) + - [index of all manual + pages](http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/singleindex.html) +- PRACE Video Tutorial [part + 1](http://www.youtube.com/watch?v=asVaFg1NDqY), [part + 2](http://www.youtube.com/watch?v=ubp_cSibb9I), [part + 3](http://www.youtube.com/watch?v=vJAAAQv-aaw), [part + 4](http://www.youtube.com/watch?v=BKVlqWNh8jY), [part + 5](http://www.youtube.com/watch?v=iXkbLEBFjlM) Modules ------- @@ -43,8 +43,8 @@ Modules You can start using PETSc on Anselm by loading the PETSc module. Module names obey this pattern: - # module load petsc/version-compiler-mpi-blas-variant, e.g. - module load petsc/3.4.4-icc-impi-mkl-opt + # module load petsc/version-compiler-mpi-blas-variant, e.g. + module load petsc/3.4.4-icc-impi-mkl-opt where `variant` is replaced by one of ``. The `opt` variant is compiled @@ -80,27 +80,27 @@ petsc module loaded. 
### Libraries linked to PETSc on Anselm (as of 11 April 2015) -- dense linear algebra - - [Elemental](http://libelemental.org/) -- sparse linear system solvers - - [Intel MKL - Pardiso](https://software.intel.com/en-us/node/470282) - - [MUMPS](http://mumps.enseeiht.fr/) - - [PaStiX](http://pastix.gforge.inria.fr/) - - [SuiteSparse](http://faculty.cse.tamu.edu/davis/suitesparse.html) - - [SuperLU](http://crd.lbl.gov/~xiaoye/SuperLU/#superlu) - - [SuperLU_Dist](http://crd.lbl.gov/~xiaoye/SuperLU/#superlu_dist) -- input/output - - [ExodusII](http://sourceforge.net/projects/exodusii/) - - [HDF5](http://www.hdfgroup.org/HDF5/) - - [NetCDF](http://www.unidata.ucar.edu/software/netcdf/) -- partitioning - - [Chaco](http://www.cs.sandia.gov/CRF/chac.html) - - [METIS](http://glaros.dtc.umn.edu/gkhome/metis/metis/overview) - - [ParMETIS](http://glaros.dtc.umn.edu/gkhome/metis/parmetis/overview) - - [PT-Scotch](http://www.labri.fr/perso/pelegrin/scotch/) -- preconditioners & multigrid - - [Hypre](http://acts.nersc.gov/hypre/) - - [Trilinos ML](http://trilinos.sandia.gov/packages/ml/) - - [SPAI - Sparse Approximate - Inverse](https://bitbucket.org/petsc/pkg-spai) +- dense linear algebra + - [Elemental](http://libelemental.org/) +- sparse linear system solvers + - [Intel MKL + Pardiso](https://software.intel.com/en-us/node/470282) + - [MUMPS](http://mumps.enseeiht.fr/) + - [PaStiX](http://pastix.gforge.inria.fr/) + - [SuiteSparse](http://faculty.cse.tamu.edu/davis/suitesparse.html) + - [SuperLU](http://crd.lbl.gov/~xiaoye/SuperLU/#superlu) + - [SuperLU_Dist](http://crd.lbl.gov/~xiaoye/SuperLU/#superlu_dist) +- input/output + - [ExodusII](http://sourceforge.net/projects/exodusii/) + - [HDF5](http://www.hdfgroup.org/HDF5/) + - [NetCDF](http://www.unidata.ucar.edu/software/netcdf/) +- partitioning + - [Chaco](http://www.cs.sandia.gov/CRF/chac.html) + - [METIS](http://glaros.dtc.umn.edu/gkhome/metis/metis/overview) + - [ParMETIS](http://glaros.dtc.umn.edu/gkhome/metis/parmetis/overview) + - [PT-Scotch](http://www.labri.fr/perso/pelegrin/scotch/) +- preconditioners & multigrid + - [Hypre](http://acts.nersc.gov/hypre/) + - [Trilinos ML](http://trilinos.sandia.gov/packages/ml/) + - [SPAI - Sparse Approximate + Inverse](https://bitbucket.org/petsc/pkg-spai) diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/trilinos.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/trilinos.md index ce966e34b8099af11cef51ddbbbd344184ea5e24..63d2afd82a00f0a778215f139cca41a9d35feae7 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/trilinos.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/numerical-libraries/trilinos.md @@ -16,22 +16,22 @@ supported within Trilinos packages. Current Trilinos installation on ANSELM contains (among others) the following main packages -- **Epetra** - core linear algebra package containing classes for - manipulation with serial and distributed vectors, matrices, - and graphs. Dense linear solvers are supported via interface to BLAS - and LAPACK (Intel MKL on ANSELM). Its extension **EpetraExt** - contains e.g. methods for matrix-matrix multiplication. -- **Tpetra** - next-generation linear algebra package. Supports 64bit - indexing and arbitrary data type using C++ templates. -- **Belos** - library of various iterative solvers (CG, block CG, - GMRES, block GMRES etc.). -- **Amesos** - interface to direct sparse solvers. 
-- **Anasazi** - framework for large-scale eigenvalue algorithms. -- **IFPACK** - distributed algebraic preconditioner (includes e.g. - incomplete LU factorization) -- **Teuchos** - common tools packages. This package contains classes - for memory management, output, performance monitoring, BLAS and - LAPACK wrappers etc. +- **Epetra** - core linear algebra package containing classes for + manipulation with serial and distributed vectors, matrices, + and graphs. Dense linear solvers are supported via interface to BLAS + and LAPACK (Intel MKL on ANSELM). Its extension **EpetraExt** + contains e.g. methods for matrix-matrix multiplication. +- **Tpetra** - next-generation linear algebra package. Supports 64bit + indexing and arbitrary data type using C++ templates. +- **Belos** - library of various iterative solvers (CG, block CG, + GMRES, block GMRES etc.). +- **Amesos** - interface to direct sparse solvers. +- **Anasazi** - framework for large-scale eigenvalue algorithms. +- **IFPACK** - distributed algebraic preconditioner (includes e.g. + incomplete LU factorization) +- **Teuchos** - common tools packages. This package contains classes + for memory management, output, performance monitoring, BLAS and + LAPACK wrappers etc. For the full list of Trilinos packages, descriptions of their capabilities, and user manuals see @@ -46,7 +46,7 @@ installed on ANSELM. First, load the appropriate module: - $ module load trilinos + $ module load trilinos For the compilation of CMake-aware project, Trilinos provides the FIND_PACKAGE( Trilinos ) capability, which makes it easy to build @@ -59,11 +59,11 @@ system, which allows users to include important Trilinos variables directly into their makefiles. This can be done simply by inserting the following line into the makefile: - include Makefile.export.Trilinos + include Makefile.export.Trilinos or - include Makefile.export.<package> + include Makefile.export.<package> if you are interested only in a specific Trilinos package. This will give you access to the variables such as Trilinos_CXX_COMPILER, diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/nvidia-cuda.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/nvidia-cuda.md index 11c93502f21121e1958062d9784c93dc1ed2e309..b185b86cd6bae5f7a315f011b3c262e86b8402c9 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/nvidia-cuda.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/nvidia-cuda.md @@ -4,7 +4,7 @@ nVidia CUDA A guide to nVidia CUDA programming and GPU usage - + CUDA Programming on Anselm -------------------------- @@ -12,185 +12,185 @@ CUDA Programming on Anselm The default programming model for GPU accelerators on Anselm is Nvidia CUDA. To set up the environment for CUDA use - $ module load cuda + $ module load cuda If the user code is hybrid and uses both CUDA and MPI, the MPI environment has to be set up as well. One way to do this is to use the PrgEnv-gnu module, which sets up correct combination of GNU compiler and MPI library. - $ module load PrgEnv-gnu + $ module load PrgEnv-gnu CUDA code can be compiled directly on login1 or login2 nodes. User does not have to use compute nodes with GPU accelerator for compilation. To compile a CUDA source code, use nvcc compiler. - $ nvcc --version + $ nvcc --version -<span>CUDA Toolkit comes with large number of examples, that can be +>CUDA Toolkit comes with large number of examples, that can be helpful to start with. 
To compile and test these examples user should -copy them to its home directory </span> +copy them to its home directory - $ cd ~ - $ mkdir cuda-samples - $ cp -R /apps/nvidia/cuda/6.5.14/samples/* ~/cuda-samples/ + $ cd ~ + $ mkdir cuda-samples + $ cp -R /apps/nvidia/cuda/6.5.14/samples/* ~/cuda-samples/ To compile an examples, change directory to the particular example (here the example used is deviceQuery) and run "make" to start the compilation - $ cd ~/cuda-samples/1_Utilities/deviceQuery - $ make + $ cd ~/cuda-samples/1_Utilities/deviceQuery + $ make To run the code user can use PBS interactive session to get access to a node from qnvidia queue (note: use your project name with parameter -A in the qsub command) and execute the binary file - $ qsub -I -q qnvidia -A OPEN-0-0 - $ module load cuda - $ ~/cuda-samples/1_Utilities/deviceQuery/deviceQuery + $ qsub -I -q qnvidia -A OPEN-0-0 + $ module load cuda + $ ~/cuda-samples/1_Utilities/deviceQuery/deviceQuery Expected output of the deviceQuery example executed on a node with Tesla K20m is - CUDA Device Query (Runtime API) version (CUDART static linking) - - Detected 1 CUDA Capable device(s) - - Device 0: "Tesla K20m" - CUDA Driver Version / Runtime Version 5.0 / 5.0 - CUDA Capability Major/Minor version number: 3.5 - Total amount of global memory: 4800 MBytes (5032706048 bytes) - (13) Multiprocessors x (192) CUDA Cores/MP: 2496 CUDA Cores - GPU Clock rate: 706 MHz (0.71 GHz) - Memory Clock rate: 2600 Mhz - Memory Bus Width: 320-bit - L2 Cache Size: 1310720 bytes - Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096) - Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048 - Total amount of constant memory: 65536 bytes - Total amount of shared memory per block: 49152 bytes - Total number of registers available per block: 65536 - Warp size: 32 - Maximum number of threads per multiprocessor: 2048 - Maximum number of threads per block: 1024 - Maximum sizes of each dimension of a block: 1024 x 1024 x 64 - Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535 - Maximum memory pitch: 2147483647 bytes - Texture alignment: 512 bytes - Concurrent copy and kernel execution: Yes with 2 copy engine(s) - Run time limit on kernels: No - Integrated GPU sharing Host Memory: No - Support host page-locked memory mapping: Yes - Alignment requirement for Surfaces: Yes - Device has ECC support: Enabled - Device supports Unified Addressing (UVA): Yes - Device PCI Bus ID / PCI location ID: 2 / 0 - Compute Mode: - < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) > - deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = Tesla K20m + CUDA Device Query (Runtime API) version (CUDART static linking) + + Detected 1 CUDA Capable device(s) + + Device 0: "Tesla K20m" + CUDA Driver Version / Runtime Version 5.0 / 5.0 + CUDA Capability Major/Minor version number: 3.5 + Total amount of global memory: 4800 MBytes (5032706048 bytes) + (13) Multiprocessors x (192) CUDA Cores/MP: 2496 CUDA Cores + GPU Clock rate: 706 MHz (0.71 GHz) + Memory Clock rate: 2600 Mhz + Memory Bus Width: 320-bit + L2 Cache Size: 1310720 bytes + Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096) + Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048 + Total amount of constant memory: 65536 bytes + Total amount of shared memory per block: 49152 bytes + Total number of 
registers available per block: 65536 + Warp size: 32 + Maximum number of threads per multiprocessor: 2048 + Maximum number of threads per block: 1024 + Maximum sizes of each dimension of a block: 1024 x 1024 x 64 + Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535 + Maximum memory pitch: 2147483647 bytes + Texture alignment: 512 bytes + Concurrent copy and kernel execution: Yes with 2 copy engine(s) + Run time limit on kernels: No + Integrated GPU sharing Host Memory: No + Support host page-locked memory mapping: Yes + Alignment requirement for Surfaces: Yes + Device has ECC support: Enabled + Device supports Unified Addressing (UVA): Yes + Device PCI Bus ID / PCI location ID: 2 / 0 + Compute Mode: + < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) > + deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = Tesla K20m ### Code example In this section we provide a basic CUDA based vector addition code example. You can directly copy and paste the code to test it. - $ vim test.cu - - #define N (2048*2048) - #define THREADS_PER_BLOCK 512 - - #include <stdio.h> - #include <stdlib.h> - - // GPU kernel function to add two vectors - __global__ void add_gpu( int *a, int *b, int *c, int n){ -  int index = threadIdx.x + blockIdx.x * blockDim.x; -  if (index < n) -  c[index] = a[index] + b[index]; - } - - // CPU function to add two vectors - void add_cpu (int *a, int *b, int *c, int n) { -  for (int i=0; i < n; i++) - c[i] = a[i] + b[i]; - } - - // CPU function to generate a vector of random integers - void random_ints (int *a, int n) { -  for (int i = 0; i < n; i++) -  a[i] = rand() % 10000; // random number between 0 and 9999 - } - - // CPU function to compare two vectors - int compare_ints( int *a, int *b, int n ){ -  int pass = 0; -  for (int i = 0; i < N; i++){ -  if (a[i] != b[i]) { -  printf("Value mismatch at location %d, values %d and %dn",i, a[i], b[i]); -  pass = 1; -  } -  } -  if (pass == 0) printf ("Test passedn"); else printf ("Test Failedn"); -  return pass; - } - - - int main( void ) { -  -  int *a, *b, *c; // host copies of a, b, c -  int *dev_a, *dev_b, *dev_c; // device copies of a, b, c -  int size = N * sizeof( int ); // we need space for N integers - -  // Allocate GPU/device copies of dev_a, dev_b, dev_c -  cudaMalloc( (void**)&dev_a, size ); -  cudaMalloc( (void**)&dev_b, size ); -  cudaMalloc( (void**)&dev_c, size ); - -  // Allocate CPU/host copies of a, b, c -  a = (int*)malloc( size ); -  b = (int*)malloc( size ); -  c = (int*)malloc( size ); -  -  // Fill input vectors with random integer numbers -  random_ints( a, N ); -  random_ints( b, N ); - -  // copy inputs to device -  cudaMemcpy( dev_a, a, size, cudaMemcpyHostToDevice ); -  cudaMemcpy( dev_b, b, size, cudaMemcpyHostToDevice ); - -  // launch add_gpu() kernel with blocks and threads -  add_gpu<<< N/THREADS_PER_BLOCK, THREADS_PER_BLOCK >>>( dev_a, dev_b, dev_c, N ); - -  // copy device result back to host copy of c -  cudaMemcpy( c, dev_c, size, cudaMemcpyDeviceToHost ); - -  //Check the results with CPU implementation -  int *c_h; c_h = (int*)malloc( size ); -  add_cpu (a, b, c_h, N); -  compare_ints(c, c_h, N); - -  // Clean CPU memory allocations -  free( a ); free( b ); free( c ); free (c_h); - -  // Clean GPU memory allocations -  cudaFree( dev_a ); -  cudaFree( dev_b ); -  cudaFree( dev_c ); - -  return 0; - } + $ vim test.cu + + #define N (2048*2048) + #define THREADS_PER_BLOCK 512 + + #include 
<stdio.h> + #include <stdlib.h> + + // GPU kernel function to add two vectors + __global__ void add_gpu( int *a, int *b, int *c, int n){ +  int index = threadIdx.x + blockIdx.x * blockDim.x; +  if (index < n) +  c[index] = a[index] + b[index]; + } + + // CPU function to add two vectors + void add_cpu (int *a, int *b, int *c, int n) { +  for (int i=0; i < n; i++) + c[i] = a[i] + b[i]; + } + + // CPU function to generate a vector of random integers + void random_ints (int *a, int n) { +  for (int i = 0; i < n; i++) +  a[i] = rand() % 10000; // random number between 0 and 9999 + } + + // CPU function to compare two vectors + int compare_ints( int *a, int *b, int n ){ +  int pass = 0; +  for (int i = 0; i < N; i++){ +  if (a[i] != b[i]) { +  printf("Value mismatch at location %d, values %d and %dn",i, a[i], b[i]); +  pass = 1; +  } +  } +  if (pass == 0) printf ("Test passedn"); else printf ("Test Failedn"); +  return pass; + } + + + int main( void ) { +  +  int *a, *b, *c; // host copies of a, b, c +  int *dev_a, *dev_b, *dev_c; // device copies of a, b, c +  int size = N * sizeof( int ); // we need space for N integers + +  // Allocate GPU/device copies of dev_a, dev_b, dev_c +  cudaMalloc( (void**)&dev_a, size ); +  cudaMalloc( (void**)&dev_b, size ); +  cudaMalloc( (void**)&dev_c, size ); + +  // Allocate CPU/host copies of a, b, c +  a = (int*)malloc( size ); +  b = (int*)malloc( size ); +  c = (int*)malloc( size ); +  +  // Fill input vectors with random integer numbers +  random_ints( a, N ); +  random_ints( b, N ); + +  // copy inputs to device +  cudaMemcpy( dev_a, a, size, cudaMemcpyHostToDevice ); +  cudaMemcpy( dev_b, b, size, cudaMemcpyHostToDevice ); + +  // launch add_gpu() kernel with blocks and threads +  add_gpu<<< N/THREADS_PER_BLOCK, THREADS_PER_BLOCK >>>( dev_a, dev_b, dev_c, N ); + +  // copy device result back to host copy of c +  cudaMemcpy( c, dev_c, size, cudaMemcpyDeviceToHost ); + +  //Check the results with CPU implementation +  int *c_h; c_h = (int*)malloc( size ); +  add_cpu (a, b, c_h, N); +  compare_ints(c, c_h, N); + +  // Clean CPU memory allocations +  free( a ); free( b ); free( c ); free (c_h); + +  // Clean GPU memory allocations +  cudaFree( dev_a ); +  cudaFree( dev_b ); +  cudaFree( dev_c ); + +  return 0; + } This code can be compiled using following command - $ nvcc test.cu -o test_cuda + $ nvcc test.cu -o test_cuda To run the code use interactive PBS session to get access to one of the GPU accelerated nodes - $ qsub -I -q qnvidia -A OPEN-0-0 - $ module load cuda - $ ./test.cuda + $ qsub -I -q qnvidia -A OPEN-0-0 + $ module load cuda + $ ./test.cuda CUDA Libraries -------------- @@ -203,7 +203,7 @@ standard BLAS routines. Basic description of the library together with basic performance comparison with MKL can be found [here](https://developer.nvidia.com/cublas "Nvidia cuBLAS"). -**CuBLAS example: SAXPY** +CuBLAS example: SAXPY** SAXPY function multiplies the vector x by the scalar alpha and adds it to the vector y overwriting the latest vector with the result. The @@ -211,81 +211,81 @@ description of the cuBLAS function can be found in [NVIDIA CUDA documentation](http://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-axpy "Nvidia CUDA documentation "). Code can be pasted in the file and compiled without any modification. 
- /* Includes, system */ - #include <stdio.h> - #include <stdlib.h> + /* Includes, system */ + #include <stdio.h> + #include <stdlib.h> - /* Includes, cuda */ - #include <cuda_runtime.h> - #include <cublas_v2.h> + /* Includes, cuda */ + #include <cuda_runtime.h> + #include <cublas_v2.h> - /* Vector size */ - #define N (32) + /* Vector size */ + #define N (32) - /* Host implementation of a simple version of saxpi */ - void saxpy(int n, float alpha, const float *x, float *y) - { -    for (int i = 0; i < n; ++i) -    y[i] = alpha*x[i] + y[i]; - } + /* Host implementation of a simple version of saxpi */ + void saxpy(int n, float alpha, const float *x, float *y) + { +    for (int i = 0; i < n; ++i) +    y[i] = alpha*x[i] + y[i]; + } - /* Main */ - int main(int argc, char **argv) - { -    float *h_X, *h_Y, *h_Y_ref; -    float *d_X = 0; -    float *d_Y = 0; + /* Main */ + int main(int argc, char **argv) + { +    float *h_X, *h_Y, *h_Y_ref; +    float *d_X = 0; +    float *d_Y = 0; -    const float alpha = 1.0f; -    int i; +    const float alpha = 1.0f; +    int i; -    cublasHandle_t handle; +    cublasHandle_t handle; -    /* Initialize CUBLAS */ -    printf("simpleCUBLAS test running..n"); -    cublasCreate(&handle); +    /* Initialize CUBLAS */ +    printf("simpleCUBLAS test running..n"); +    cublasCreate(&handle); -    /* Allocate host memory for the matrices */ -    h_X = (float *)malloc(N * sizeof(h_X[0])); -    h_Y = (float *)malloc(N * sizeof(h_Y[0])); -    h_Y_ref = (float *)malloc(N * sizeof(h_Y_ref[0])); +    /* Allocate host memory for the matrices */ +    h_X = (float *)malloc(N * sizeof(h_X[0])); +    h_Y = (float *)malloc(N * sizeof(h_Y[0])); +    h_Y_ref = (float *)malloc(N * sizeof(h_Y_ref[0])); -    /* Fill the matrices with test data */ -    for (i = 0; i < N; i++) -    { -        h_X[i] = rand() / (float)RAND_MAX; -        h_Y[i] = rand() / (float)RAND_MAX; -        h_Y_ref[i] = h_Y[i]; -    } +    /* Fill the matrices with test data */ +    for (i = 0; i < N; i++) +    { +        h_X[i] = rand() / (float)RAND_MAX; +        h_Y[i] = rand() / (float)RAND_MAX; +        h_Y_ref[i] = h_Y[i]; +    } -    /* Allocate device memory for the matrices */ -    cudaMalloc((void **)&d_X, N * sizeof(d_X[0])); -    cudaMalloc((void **)&d_Y, N * sizeof(d_Y[0])); +    /* Allocate device memory for the matrices */ +    cudaMalloc((void **)&d_X, N * sizeof(d_X[0])); +    cudaMalloc((void **)&d_Y, N * sizeof(d_Y[0])); -    /* Initialize the device matrices with the host matrices */ -    cublasSetVector(N, sizeof(h_X[0]), h_X, 1, d_X, 1); -    cublasSetVector(N, sizeof(h_Y[0]), h_Y, 1, d_Y, 1); +    /* Initialize the device matrices with the host matrices */ +    cublasSetVector(N, sizeof(h_X[0]), h_X, 1, d_X, 1); +    cublasSetVector(N, sizeof(h_Y[0]), h_Y, 1, d_Y, 1); -    /* Performs operation using plain C code */ -    saxpy(N, alpha, h_X, h_Y_ref); +    /* Performs operation using plain C code */ +    saxpy(N, alpha, h_X, h_Y_ref); -    /* Performs operation using cublas */ -    cublasSaxpy(handle, N, &alpha, d_X, 1, d_Y, 1); +    /* Performs operation using cublas */ +    cublasSaxpy(handle, N, &alpha, d_X, 1, d_Y, 1); -    /* Read the result back */ -    cublasGetVector(N, sizeof(h_Y[0]), d_Y, 1, h_Y, 1); +    /* Read the result back */ +    cublasGetVector(N, sizeof(h_Y[0]), d_Y, 1, h_Y, 1); -    /* Check result against reference */ -    for (i = 0; i < N; ++i) -        printf("CPU res = %f t GPU res = %f t diff = %f n", h_Y_ref[i], h_Y[i], h_Y_ref[i] - h_Y[i]); +    /* Check result 
against reference */ +    for (i = 0; i < N; ++i) +        printf("CPU res = %f t GPU res = %f t diff = %f n", h_Y_ref[i], h_Y[i], h_Y_ref[i] - h_Y[i]); -    /* Memory clean up */ -    free(h_X); free(h_Y); free(h_Y_ref); -    cudaFree(d_X); cudaFree(d_Y); +    /* Memory clean up */ +    free(h_X); free(h_Y); free(h_Y_ref); +    cudaFree(d_X); cudaFree(d_Y); -    /* Shutdown */ -    cublasDestroy(handle); - } +    /* Shutdown */ +    cublasDestroy(handle); + }  Please note: cuBLAS has its own function for data transfers between CPU and GPU memory: @@ -299,18 +299,18 @@ and GPU memory:  To compile the code using NVCC compiler a "-lcublas" compiler flag has to be specified: - $ module load cuda - $ nvcc -lcublas test_cublas.cu -o test_cublas_nvcc + $ module load cuda + $ nvcc -lcublas test_cublas.cu -o test_cublas_nvcc To compile the same code with GCC: - $ module load cuda - $ gcc -std=c99 test_cublas.c -o test_cublas_icc -lcublas -lcudart + $ module load cuda + $ gcc -std=c99 test_cublas.c -o test_cublas_icc -lcublas -lcudart To compile the same code with Intel compiler: - $ module load cuda intel - $ icc -std=c99 test_cublas.c -o test_cublas_icc -lcublas -lcudart + $ module load cuda intel + $ icc -std=c99 test_cublas.c -o test_cublas_icc -lcublas -lcudart diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/omics-master-1/diagnostic-component-team.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/omics-master-1/diagnostic-component-team.md index 227075b0037e273fccca040011303fab047c028b..c5fd0a79c8db47644b28dcffff812507a62ea979 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/omics-master-1/diagnostic-component-team.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/omics-master-1/diagnostic-component-team.md @@ -3,7 +3,7 @@ Diagnostic component (TEAM) - + ### Access @@ -21,7 +21,7 @@ disease-associated variants. When no diagnostic mutation is found, the file can be sent to the disease-causing gene discovery tool to see wheter new disease associated variants can be found. -TEAM <span>(27)</span> is an intuitive and easy-to-use web tool that +TEAM >(27) is an intuitive and easy-to-use web tool that fills the gap between the predicted mutations and the final diagnostic in targeted enrichment sequencing analysis. The tool searches for known diagnostic mutations, corresponding to a disease panel, among the @@ -43,7 +43,7 @@ increases.](images/fig5.png.1 "fig5.png")  -***Figure 5. ****Interface of the application. Panels for defining +*Figure 5. ****Interface of the application. Panels for defining targeted regions of interest can be set up by just drag and drop known disease genes or disease definitions from the lists. 
Thus, virtual panels can be interactively improved as the knowledge of the disease diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/omics-master-1/overview.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/omics-master-1/overview.md index d55a09eebb938606bc280c634952f33d0a024900..3b08c53001785bc601285d0cb0574fe7c12d1612 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/omics-master-1/overview.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/omics-master-1/overview.md @@ -4,7 +4,7 @@ Overview The human NGS data processing solution - + Introduction ------------ @@ -30,7 +30,7 @@ component where they can be analysed directly by the user that produced them, depending of the experimental design carried out.](images/fig1.png "Fig 1") -***Figure 1.*** *OMICS MASTER solution overview. Data is produced in the +*Figure 1.*** *OMICS MASTER solution overview. Data is produced in the external labs and comes to IT4I (represented by the blue dashed line). The data pre-processor converts raw data into a list of variants and annotations for each sequenced patient. These lists files together with @@ -38,7 +38,7 @@ primary and secondary (alignment) data files are stored in IT4I sequence DB and uploaded to the discovery (candidate prioritization) or diagnostic component where they can be analyzed directly by the user that produced them, depending of the experimental design carried -out*.<span style="text-align: left; "> </span> +out*. style="text-align: left; "> Typical genomics pipelines are composed by several components that need to be launched manually. The advantage of OMICS MASTER pipeline is that @@ -79,28 +79,28 @@ the future. #### Quality control, preprocessing and statistics for FASTQ -**Component:**<span> Hpg-Fastq & FastQC.</span> +Component:**> Hpg-Fastq & FastQC. These steps are carried out over the original FASTQ file with optimized scripts and includes the following steps: sequence cleansing, estimation of base quality scores, elimination of duplicates and statistics. -**Input:** FASTQ file. +Input:** FASTQ file. -**Output:** FASTQ file plus an HTML file containing statistics on the +Output:** FASTQ file plus an HTML file containing statistics on the data. -**FASTQ format -**<span>It represents the nucleotide sequence and its corresponding +FASTQ format +>It represents the nucleotide sequence and its corresponding quality scores. -</span>** -***Figure 2.****** FASTQ file.*** -** + +*Figure 2.****** FASTQ file.*** + #### Mapping -**Component:** Hpg-aligner.**** +Component:** Hpg-aligner.**** Sequence reads are mapped over the human reference genome. SOLiD reads are not covered by this solution; they should be mapped with specific @@ -113,27 +113,27 @@ most common scenarios). This proposal provides a simple and fast solution that maps almost all the reads, even those containing a high number of mismatches or indels. -**Input:** FASTQ file. +Input:** FASTQ file. -**Output:** Aligned file in BAM format.**** +Output:** Aligned file in BAM format.**** -**Sequence Alignment/Map (SAM)** +Sequence Alignment/Map (SAM)** -<span>It is a human readable tab-delimited format in which each read and +>It is a human readable tab-delimited format in which each read and its alignment is represented on a single line. The format can represent unmapped reads, reads that are mapped to unique locations, and reads that are mapped to multiple locations. 
-</span><span>The SAM format </span>^(1)^<span> consists of one header +>The SAM format ^(1)^> consists of one header section and one alignment section. The lines in the header section start with character â€@’, and lines in the alignment section do not. All lines are TAB delimited. -<span>In SAM, each alignment line has 11 mandatory fields and a variable +>In SAM, each alignment line has 11 mandatory fields and a variable number of optional fields. The mandatory fields are briefly described in Table 1. They must be present but their value can be a -</span></span>â€*’<span><span> or a zero (depending on the field) if the -corresponding information is unavailable.</span>  </span> +â€*’>> or a zero (depending on the field) if the +corresponding information is unavailable.  <table> @@ -208,10 +208,10 @@ corresponding information is unavailable.</span>  </span> -<span> +> -</span>***Table 1.*** *Mandatory fields in the SAM format. -<span> +*Table 1.*** *Mandatory fields in the SAM format. +> The standard CIGAR description of pairwise alignment defines three operations: â€M’ for match/mismatch, â€I’ for insertion compared with the reference and â€D’ for deletion. The extended CIGAR proposed in SAM added @@ -235,7 +235,7 @@ The hard clipping operation H indicates that the clipped sequence is not present in the sequence field. The NM tag gives the number of mismatches. Read r004 is aligned across an intron, indicated by the N operation.](images/fig3.png "fig3.png") -*** +* Figure 3.*** *SAM format file. The â€@SQ’ line in the header section gives the order of reference sequences. Notably, r001 is the name of a read pair. According to FLAG 163 (=1+2+32+128), the read mapped to @@ -250,32 +250,32 @@ r003 map to position 9, and the first five to position 29 on the reverse strand. The hard clipping operation H indicates that the clipped sequence is not present in the sequence field. The NM tag gives the number of mismatches. Read r004 is aligned across an intron, indicated -by the N operation.*</span>* +by the N operation.** -**Binary Alignment/Map (BAM)** +Binary Alignment/Map (BAM)** -<span>BAM is the binary representation of SAM and keeps exactly the same +>BAM is the binary representation of SAM and keeps exactly the same information as SAM. BAM uses lossless compression to reduce the size of the data by about 75% and provides an indexing system that allows reads that overlap a region of the genome to be retrieved and rapidly -traversed. </span> +traversed. #### Quality control, preprocessing and statistics for BAM -**Component:** Hpg-Fastq & FastQC. Some features: +Component:** Hpg-Fastq & FastQC. Some features: -- Quality control: % reads with N errors, % reads with multiple - mappings, strand bias, paired-end insert, ... -- Filtering: by number of errors, number of hits, … - - Comparator: stats, intersection, ... +- Quality control: % reads with N errors, % reads with multiple + mappings, strand bias, paired-end insert, ... +- Filtering: by number of errors, number of hits, … + - Comparator: stats, intersection, ... -**Input:** BAM file. +Input:** BAM file. -**Output:** BAM file plus an HTML file containing statistics. +Output:** BAM file plus an HTML file containing statistics. #### Variant Calling -**Component:** GATK. +Component:** GATK. Identification of single nucleotide variants and indels on the alignments is performed using the Genome Analysis Toolkit (GATK). GATK @@ -284,22 +284,22 @@ high-throughput sequencing data. 
The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. -**Input:** BAM +Input:** BAM -**Output:** VCF +Output:** VCF -**Variant Call Format (VCF)** +Variant Call Format (VCF)** -<span>VCF </span>^(3)^<span> is a standardized format for storing the +>VCF ^(3)^> is a standardized format for storing the most prevalent types of sequence variation, including SNPs, indels and larger structural variants, together with rich annotations. The format was developed with the primary intention to represent human genetic -variation, but its use is not restricted </span><span>to diploid genomes +variation, but its use is not restricted >to diploid genomes and can be used in different contexts as well. Its flexibility and user extensibility allows representation of a wide variety of genomic variation with respect to a single reference sequence. -</span>A VCF file consists of a header section and a data section. The +A VCF file consists of a header section and a data section. The header contains an arbitrary number of metainformation lines, each starting with characters â€##’, and a TAB delimited field definition line, starting with a single â€#’ character. The meta-information header @@ -348,8 +348,8 @@ reference bases replaced by the haplotype in the ALT column. The coordinate refers to the first reference base. (g) Users are advised to use simplest representation possible and lowest coordinate in cases where the position is ambiguous.](images/fig4.png) -** -Figure 4.**<span> (a) Example of valid VCF. The header lines + +Figure 4.**> (a) Example of valid VCF. The header lines ##fileformat and #CHROM are mandatory, the rest is optional but strongly recommended. Each line of the body describes variants present in the sampled population at one genomic position or region. All @@ -368,11 +368,11 @@ deletion, replacement, and a large deletion. The REF columns shows the reference bases replaced by the haplotype in the ALT column. The coordinate refers to the first reference base. (g) Users are advised to use simplest representation possible and lowest coordinate in cases -where the position is ambiguous.</span> +where the position is ambiguous. -### <span>Annotating</span> +### >Annotating -**Component:** HPG-Variant +Component:** HPG-Variant The functional consequences of every variant found are then annotated using the HPG-Variant software, which extracts from CellBase**,** the @@ -393,9 +393,9 @@ conventional reports beyond the coding regions and expands the knowledge on the contribution of non-coding or synonymous variants to the phenotype studied. -**Input:** VCF +Input:** VCF -**Output:** The output of this step is the Variant Calling Format (VCF) +Output:** The output of this step is the Variant Calling Format (VCF) file, which contains changes with respect to the reference genome with the corresponding QC and functional annotations. @@ -404,24 +404,24 @@ the corresponding QC and functional annotations. CellBase^(5)^ is a relational database integrates biological information from different sources and includes: -**Core features:** +Core features:** We took genome sequences, genes, transcripts, exons, cytobands or cross -references (xrefs) identifiers (IDs) <span>from Ensembl -</span>^(6)^<span>. Protein information including sequences, xrefs or +references (xrefs) identifiers (IDs) >from Ensembl +^(6)^>. 
Protein information including sequences, xrefs or protein features (natural variants, mutagenesis sites, post-translational modifications, etc.) were imported from UniProt -</span>^(7)^<span>.</span> +^(7)^>. -**Regulatory:** +Regulatory:** CellBase imports miRNA from miRBase ^(8)^; curated and non-curated miRNA -targets from miRecords ^(9)^, <span>miRTarBase </span>^(10)^<span>, -TargetScan</span>^(11)^<span> and microRNA.org </span>^(12)^<span> and +targets from miRecords ^(9)^, >miRTarBase ^(10)^>, +TargetScan^(11)^> and microRNA.org ^(12)^> and CpG islands and conserved regions from the UCSC database -</span>^(13)^<span>.</span><span> </span> +^(13)^>.> </span> -**Functional annotation** +Functional annotation** OBO Foundry ^(14)^ develops many biomedical ontologies that are implemented in OBO format. We designed a SQL schema to store these OBO @@ -429,26 +429,26 @@ ontologies and >30 ontologies were imported. OBO ontology term annotations were taken from Ensembl ^(6)^. InterPro ^(15)^ annotations were also imported. -**Variation** +Variation** CellBase includes SNPs from dbSNP ^(16)^; SNP population frequencies from HapMap ^(17)^, 1000 genomes project ^(18)^ and Ensembl ^(6)^; phenotypically annotated SNPs were imported from NHRI GWAS Catalog -^(19)^,^ ^<span>HGMD </span>^(20)^<span>, Open Access GWAS Database -</span>^(21)^<span>, UniProt </span>^(7)^<span> and OMIM -</span>^(22)^<span>; mutations from COSMIC </span>^(23)^<span> and +^(19)^,^ ^>HGMD ^(20)^>, Open Access GWAS Database +^(21)^>, UniProt ^(7)^> and OMIM +^(22)^>; mutations from COSMIC ^(23)^> and structural variations from Ensembl -</span>^(6)^<span>.</span><span> </span> +^(6)^>.> </span> -**Systems biology** +Systems biology** We also import systems biology information like interactome information -from IntAct ^(24)^. Reactome ^(25)^<span> stores pathway and interaction -information in BioPAX </span>^(26)^<span> format. BioPAX data exchange -format </span><span>enables the integration of diverse pathway +from IntAct ^(24)^. Reactome ^(25)^> stores pathway and interaction +information in BioPAX ^(26)^> format. BioPAX data exchange +format >enables the integration of diverse pathway resources. We successfully solved the problem of storing data released in BioPAX format into a SQL relational schema, which allowed us -importing Reactome in CellBase.</span> +importing Reactome in CellBase. 
### [Diagnostic component (TEAM)](diagnostic-component-team.html) @@ -460,42 +460,42 @@ importing Reactome in CellBase.</span> Usage ----- -First of all, we should load <span class="monospace">ngsPipeline</span> +First of all, we should load ngsPipeline module: - $ module load ngsPipeline - -This command will load <span class="monospace">python/2.7.5</span> -module and all the required modules (<span -class="monospace">hpg-aligner</span>, <span -class="monospace">gatk</span>, etc) - -<span> If we launch ngsPipeline with â€-h’, we will get the usage -help: </span> - - $ ngsPipeline -h - Usage: ngsPipeline.py [-h] -i INPUT -o OUTPUT -p PED --project PROJECT --queue -            QUEUE [--stages-path STAGES_PATH] [--email EMAIL] - [--prefix PREFIX] [-s START] [-e END] --log - - Python pipeline - - optional arguments: -  -h, --help       show this help message and exit -  -i INPUT, --input INPUT -  -o OUTPUT, --output OUTPUT -             Output Data directory -  -p PED, --ped PED   Ped file with all individuals -  --project PROJECT   Project Id -  --queue QUEUE     Queue Id -  --stages-path STAGES_PATH -             Custom Stages path -  --email EMAIL     Email -  --prefix PREFIX    Prefix name for Queue Jobs name -  -s START, --start START -             Initial stage -  -e END, --end END   Final stage -  --log         Log to file + $ module load ngsPipeline + +This command will load python/2.7.5 +module and all the required modules ( +hpg-aligner, +gatk, etc) + +> If we launch ngsPipeline with â€-h’, we will get the usage +help: + + $ ngsPipeline -h + Usage: ngsPipeline.py [-h] -i INPUT -o OUTPUT -p PED --project PROJECT --queue +            QUEUE [--stages-path STAGES_PATH] [--email EMAIL] + [--prefix PREFIX] [-s START] [-e END] --log + + Python pipeline + + optional arguments: +  -h, --help       show this help message and exit +  -i INPUT, --input INPUT +  -o OUTPUT, --output OUTPUT +             Output Data directory +  -p PED, --ped PED   Ped file with all individuals +  --project PROJECT   Project Id +  --queue QUEUE     Queue Id +  --stages-path STAGES_PATH +             Custom Stages path +  --email EMAIL     Email +  --prefix PREFIX    Prefix name for Queue Jobs name +  -s START, --start START +             Initial stage +  -e END, --end END   Final stage +  --log         Log to file  @@ -531,52 +531,52 @@ end the pipeline in a specific stage we must use -e.      *--log*. Using log argument NGSpipeline will prompt all the logs to this file. -<span>    </span>*--project*<span>. Project ID of your supercomputer -allocation. </span> +>    *--project*>. Project ID of your supercomputer +allocation. -<span>    *--queue*. +>    *--queue*. [Queue](../../resource-allocation-and-job-execution/introduction.html) -to run the jobs in.</span> +to run the jobs in. - <span>Input, output and ped arguments are mandatory. If the output -folder does not exist, the pipeline will create it.</span> + >Input, output and ped arguments are mandatory. If the output +folder does not exist, the pipeline will create it. 
-<span>Examples</span> +>Examples --------------------- This is an example usage of NGSpipeline: -We have a folder with the following structure in <span><span -class="monospace">/apps/bio/omics/1.0/sample_data/</span> </span><span>:</span> +We have a folder with the following structure in > +/apps/bio/omics/1.0/sample_data/ >:</span> - /apps/bio/omics/1.0/sample_data - └── data - ├── file.ped - ├── sample1 - │  ├── sample1_1.fq - │  └── sample1_2.fq - └── sample2 - ├── sample2_1.fq - └── sample2_2.fq + /apps/bio/omics/1.0/sample_data + └── data + ├── file.ped + ├── sample1 + │  ├── sample1_1.fq + │  └── sample1_2.fq + └── sample2 + ├── sample2_1.fq + └── sample2_2.fq -The ped file (<span class="monospace">file.ped</span>) contains the -following info:<span> </span> +The ped file ( file.ped) contains the +following info:> - #family_ID sample_ID parental_ID maternal_ID sex phenotype - FAM sample_A 0 0 1 1 - FAM sample_B 0 0 2 2 + #family_ID sample_ID parental_ID maternal_ID sex phenotype + FAM sample_A 0 0 1 1 + FAM sample_B 0 0 2 2 Now, lets load the NGSPipeline module and copy the sample data to a [scratch directory](../../storage.html) : - $ module load ngsPipeline - $ mkdir -p /scratch/$USER/omics/results - $ cp -r /apps/bio/omics/1.0/sample_data /scratch/$USER/omics/ + $ module load ngsPipeline + $ mkdir -p /scratch/$USER/omics/results + $ cp -r /apps/bio/omics/1.0/sample_data /scratch/$USER/omics/ Now, we can launch the pipeline (replace OPEN-0-0 with your Project ID) : - $ ngsPipeline -i /scratch/$USER/omics/sample_data/data -o /scratch/$USER/omics/results -p /scratch/$USER/omics/sample_data/data/file.ped --project OPEN-0-0 --queue qprod + $ ngsPipeline -i /scratch/$USER/omics/sample_data/data -o /scratch/$USER/omics/results -p /scratch/$USER/omics/sample_data/data/file.ped --project OPEN-0-0 --queue qprod This command submits the processing [jobs to the queue](../../resource-allocation-and-job-execution/job-submission-and-execution.html). @@ -584,54 +584,54 @@ queue](../../resource-allocation-and-job-execution/job-submission-and-execution. If we want to re-launch the pipeline from stage 4 until stage 20 we should use the next command: - $ ngsPipeline -i /scratch/$USER/omics/sample_data/data -o /scratch/$USER/omics/results -p /scratch/$USER/omics/sample_data/data/file.ped -s 4 -e 20 --project OPEN-0-0 --queue qprod + $ ngsPipeline -i /scratch/$USER/omics/sample_data/data -o /scratch/$USER/omics/results -p /scratch/$USER/omics/sample_data/data/file.ped -s 4 -e 20 --project OPEN-0-0 --queue qprod -<span>Details on the pipeline</span> +>Details on the pipeline ------------------------------------ -<span>The pipeline calls the following tools:</span> - -- <span>[fastqc](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), - a<span> quality control tool for high throughput - sequence data.</span></span> -- <span>[gatk](https://www.broadinstitute.org/gatk/), <span>The - Genome Analysis Toolkit or GATK is a software package developed at - the Broad Institute to analyze high-throughput sequencing data. The - toolkit offers a wide variety of tools, with a primary focus on - variant discovery and genotyping as well as strong emphasis on data - quality assurance. 
Its robust architecture, powerful processing - engine and high-performance computing features make it capable of - taking on projects of any size.</span></span> -- <span>[hpg-aligner](http://wiki.opencb.org/projects/hpg/doku.php?id=aligner:downloads), <span>HPG - Aligner has been designed to align short and long reads with high - sensitivity, therefore any number of mismatches or indels - are allowed. HPG Aligner implements and combines two well known - algorithms: </span>*Burrows-Wheeler Transform*<span> (BWT) to - speed-up mapping high-quality reads, - and </span>*Smith-Waterman*<span> (SW) to increase sensitivity when - reads cannot be mapped using BWT.</span></span> -- <span><span><span>[hpg-fastq](http://docs.bioinfo.cipf.es/projects/fastqhpc/wiki), <span> a - quality control tool for high throughput - sequence data.</span></span></span></span> -- <span><span><span><span>[hpg-variant](http://wiki.opencb.org/projects/hpg/doku.php?id=variant:downloads), <span>The - HPG Variant suite is an ambitious project aimed to provide a - complete suite of tools to work with genomic variation data, from - VCF tools to variant profiling or genomic statistics. It is being - implemented using High Performance Computing technologies to provide - the best performance possible.</span></span></span></span></span> -- <span><span><span><span>[picard](http://picard.sourceforge.net/), <span>Picard - comprises Java-based command-line utilities that manipulate SAM - files, and a Java API (HTSJDK) for creating new programs that read - and write SAM files. Both SAM text format and SAM binary (BAM) - format are supported.</span></span></span></span></span> -- <span><span><span><span>[samtools](http://samtools.sourceforge.net/samtools-c.shtml), <span>SAM - Tools provide various utilities for manipulating alignments in the - SAM format, including sorting, merging, indexing and generating - alignments in a - per-position format.</span></span></span></span></span> -- <span><span><span><span><span>[snpEff](http://snpeff.sourceforge.net/), <span>Genetic - variant annotation and effect - prediction toolbox.</span></span></span></span></span></span> +>The pipeline calls the following tools: + +- >[fastqc](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), + a> quality control tool for high throughput + sequence data. +- >[gatk](https://www.broadinstitute.org/gatk/), >The + Genome Analysis Toolkit or GATK is a software package developed at + the Broad Institute to analyze high-throughput sequencing data. The + toolkit offers a wide variety of tools, with a primary focus on + variant discovery and genotyping as well as strong emphasis on data + quality assurance. Its robust architecture, powerful processing + engine and high-performance computing features make it capable of + taking on projects of any size. +- >[hpg-aligner](http://wiki.opencb.org/projects/hpg/doku.php?id=aligner:downloads), >HPG + Aligner has been designed to align short and long reads with high + sensitivity, therefore any number of mismatches or indels + are allowed. HPG Aligner implements and combines two well known + algorithms: *Burrows-Wheeler Transform*> (BWT) to + speed-up mapping high-quality reads, + and *Smith-Waterman*> (SW) to increase sensitivity when + reads cannot be mapped using BWT. 
+- >><span>[hpg-fastq](http://docs.bioinfo.cipf.es/projects/fastqhpc/wiki), <span> a + quality control tool for high throughput + sequence data.</span></span> +- >><span><span>[hpg-variant](http://wiki.opencb.org/projects/hpg/doku.php?id=variant:downloads), <span>The + HPG Variant suite is an ambitious project aimed to provide a + complete suite of tools to work with genomic variation data, from + VCF tools to variant profiling or genomic statistics. It is being + implemented using High Performance Computing technologies to provide + the best performance possible.</span></span></span> +- >><span><span>[picard](http://picard.sourceforge.net/), <span>Picard + comprises Java-based command-line utilities that manipulate SAM + files, and a Java API (HTSJDK) for creating new programs that read + and write SAM files. Both SAM text format and SAM binary (BAM) + format are supported.</span></span></span> +- >><span><span>[samtools](http://samtools.sourceforge.net/samtools-c.shtml), <span>SAM + Tools provide various utilities for manipulating alignments in the + SAM format, including sorting, merging, indexing and generating + alignments in a + per-position format.</span></span></span> +- >><span><span><span>[snpEff](http://snpeff.sourceforge.net/), <span>Genetic + variant annotation and effect + prediction toolbox.</span></span></span></span> This listing show which tools are used in each step of the pipeline : @@ -639,37 +639,37 @@ This listing show which tools are used in each step of the pipeline : <!-- --> -- <span>stage-00: fastqc</span> -- <span>stage-01: hpg_fastq</span> -- <span>stage-02: fastqc</span> -- <span>stage-03: hpg_aligner and samtools</span> -- <span>stage-04: samtools</span> -- <span>stage-05: samtools</span> -- <span>stage-06: fastqc</span> -- <span>stage-07: picard</span> -- <span>stage-08: fastqc</span> -- <span>stage-09: picard</span> -- <span>stage-10: gatk</span> -- <span>stage-11: gatk</span> -- <span>stage-12: gatk</span> -- <span>stage-13: gatk</span> -- <span>stage-14: gatk</span> -- <span>stage-15: gatk</span> -- <span>stage-16: samtools</span> -- <span>stage-17: samtools</span> -- <span>stage-18: fastqc</span> -- <span>stage-19: gatk</span> -- <span>stage-20: gatk</span> -- <span>stage-21: gatk</span> -- <span>stage-22: gatk</span> -- <span>stage-23: gatk</span> -- <span>stage-24: hpg-variant</span> -- <span>stage-25: hpg-variant</span> -- <span>stage-26: snpEff</span> -- <span>stage-27: snpEff</span> -- <span>stage-28: hpg-variant</span> - -<span>Interpretation</span> +- >stage-00: fastqc +- >stage-01: hpg_fastq +- >stage-02: fastqc +- >stage-03: hpg_aligner and samtools +- >stage-04: samtools +- >stage-05: samtools +- >stage-06: fastqc +- >stage-07: picard +- >stage-08: fastqc +- >stage-09: picard +- >stage-10: gatk +- >stage-11: gatk +- >stage-12: gatk +- >stage-13: gatk +- >stage-14: gatk +- >stage-15: gatk +- >stage-16: samtools +- >stage-17: samtools +- >stage-18: fastqc +- >stage-19: gatk +- >stage-20: gatk +- >stage-21: gatk +- >stage-22: gatk +- >stage-23: gatk +- >stage-24: hpg-variant +- >stage-25: hpg-variant +- >stage-26: snpEff +- >stage-27: snpEff +- >stage-28: hpg-variant + +>Interpretation --------------------------- The output folder contains all the subfolders with the intermediate @@ -680,12 +680,12 @@ file button. It is important to note here that the entire management of the VCF file is local: no patient’s sequence data is sent over the Internet thus avoiding any problem of data privacy or confidentiality. 
-<span></span> +starts.](images/fig7.png "fig7.png") -***Figure 7***. *TEAM upload panel.* *Once the file has been uploaded, a +*Figure 7***. *TEAM upload panel.* *Once the file has been uploaded, a panel must be chosen from the Panel **** list. Then, pressing the Run button the diagnostic process starts.* @@ -709,7 +709,7 @@ associated to an already existing disease term (action E). Disease terms can be removed by simply dragging themback (action H).](images/fig7x.png "fig7x.png") -***Figure 7.*** *The panel manager. The elements used to define a panel +*Figure 7.*** *The panel manager. The elements used to define a panel are (**A**) disease terms, (**B**) diagnostic mutations and (**C**) genes. Arrows represent actions that can be taken in the panel manager. Panels can be defined by using the known mutations and genes of a @@ -718,7 +718,7 @@ Diagnostic** box (action **D**). This action, in addition to defining the diseases in the **Primary Diagnostic** box, automatically adds the corresponding genes to the **Genes** box. The panels can be customized by adding new genes (action **F**) or removing undesired genes (action -**G**). New disease mutations can be added independently or associated +G**). New disease mutations can be added independently or associated to an already existing disease term (action **E**). Disease terms can be removed by simply dragging them back (action **H**).* @@ -729,7 +729,7 @@ BierApp by using the following form: job as well as a description. ](images/fig8.png "fig8.png")* -****Figure 8.*** *BierApp VCF upload panel. It is recommended to choose +**Figure 8.*** *BierApp VCF upload panel. It is recommended to choose a name for the job as well as a description.** Each prioritization (â€job’) has three associated screens that facilitate @@ -739,16 +739,16 @@ number and types of variants found and its distribution according to consequence types. The second screen, in the â€Variants and effect’ tab, is the actual filtering tool, and the third one, the â€Genome view’ tab, offers a representation of the selected variants within the genomic -context provided by an embedded version of <span>the Genome Maps Tool -</span>^(30)^<span>.</span> +context provided by an embedded version of >the Genome Maps Tool +^(30)^>. -**** -*****Figure 9.*** *This picture shows all the information associated to +***Figure 9.*** *This picture shows all the information associated to the variants. If a variant has an associated phenotype we could see it in the last column. In this case, the variant 7:132481242 C>T is associated to the phenotype: large intestine tumor.*** @@ -756,147 +756,147 @@ associated to the phenotype: large intestine tumor.*** * * -<span> -</span> +> + -<span>References</span> +>References ----------------------- -1. <span class="discreet">Heng Li, Bob Handsaker, Alec Wysoker, Tim - Fennell, Jue Ruan, Nils Homer, Gabor Marth5, Goncalo Abecasis6, - Richard Durbin and 1000 Genome Project Data Processing Subgroup: The - Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, - 25: 2078-2079.</span> -2. <span class="discreet"><span>McKenna A, Hanna M, Banks E, Sivachenko - A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, - Daly M, DePristo MA: The Genome Analysis Toolkit: a MapReduce - framework for analyzing next-generation DNA sequencing data. - </span>*Genome Res* <span>2010, 20:1297-1303.</span></span> -3. <span class="discreet">Petr Danecek, Adam Auton, Goncalo Abecasis, - Cornelis A. Albers, Eric Banks, Mark A. DePristo, Robert E. 
- Handsaker, Gerton Lunter, Gabor T. Marth, Stephen T. Sherry, Gilean - McVean, Richard Durbin, and 1000 Genomes Project Analysis Group. The - variant call format and VCFtools. Bioinformatics 2011, - 27: 2156-2158.</span> -4. <span class="discreet">Medina I, De Maria A, Bleda M, Salavert F, - Alonso R, Gonzalez CY, Dopazo J: VARIANT: Command Line, Web service - and Web interface for fast and accurate functional characterization - of variants found by Next-Generation Sequencing. Nucleic Acids Res - 2012, 40:W54-58.</span> -5. <span class="discreet">Bleda M, Tarraga J, de Maria A, Salavert F, - Garcia-Alonso L, Celma M, Martin A, Dopazo J, Medina I: CellBase, a - comprehensive collection of RESTful web services for retrieving - relevant biological information from heterogeneous sources. Nucleic - Acids Res 2012, 40:W609-614.</span> -6. <span class="discreet">Flicek,P., Amode,M.R., Barrell,D., Beal,K., - Brent,S., Carvalho-Silva,D., Clapham,P., Coates,G., - Fairley,S., Fitzgerald,S. et al. (2012) Ensembl 2012. Nucleic Acids - Res., 40, D84–D90.</span> -7. <span class="discreet">UniProt Consortium. (2012) Reorganizing the - protein space at the Universal Protein Resource (UniProt). Nucleic - Acids Res., 40, D71–D75.</span> -8. <span class="discreet">Kozomara,A. and Griffiths-Jones,S. (2011) - miRBase: integrating microRNA annotation and deep-sequencing data. - Nucleic Acids Res., 39, D152–D157.</span> -9. <span class="discreet">Xiao,F., Zuo,Z., Cai,G., Kang,S., Gao,X. - and Li,T. (2009) miRecords: an integrated resource for - microRNA-target interactions. Nucleic Acids Res., - 37, D105–D110.</span> -10. <span class="discreet">Hsu,S.D., Lin,F.M., Wu,W.Y., Liang,C., - Huang,W.C., Chan,W.L., Tsai,W.T., Chen,G.Z., Lee,C.J., Chiu,C.M. - et al. (2011) miRTarBase: a database curates experimentally - validated microRNA-target interactions. Nucleic Acids Res., - 39, D163–D169.</span> -11. <span class="discreet">Friedman,R.C., Farh,K.K., Burge,C.B. - and Bartel,D.P. (2009) Most mammalian mRNAs are conserved targets - of microRNAs. Genome Res., 19, 92–105.</span> -12. <span class="discreet">Betel,D., Wilson,M., Gabow,A., Marks,D.S. - and Sander,C. (2008) The microRNA.org resource: targets - and expression. Nucleic Acids Res., 36, D149–D153.</span> -13. <span class="discreet">Dreszer,T.R., Karolchik,D., Zweig,A.S., - Hinrichs,A.S., Raney,B.J., Kuhn,R.M., Meyer,L.R., Wong,M., - Sloan,C.A., Rosenbloom,K.R. et al. (2012) The UCSC genome browser - database: extensions and updates 2011. Nucleic Acids Res., - 40, D918–D923.</span> -14. <span class="discreet">Smith,B., Ashburner,M., Rosse,C., Bard,J., - Bug,W., Ceusters,W., Goldberg,L.J., Eilbeck,K., - Ireland,A., Mungall,C.J. et al. (2007) The OBO Foundry: coordinated - evolution of ontologies to support biomedical data integration. Nat. - Biotechnol., 25, 1251–1255.</span> -15. <span class="discreet">Hunter,S., Jones,P., Mitchell,A., - Apweiler,R., Attwood,T.K.,Bateman,A., Bernard,T., Binns,D., - Bork,P., Burge,S. et al. (2012) InterPro in 2011: new developments - in the family and domain prediction database. Nucleic Acids Res., - 40, D306–D312.</span> -16. <span class="discreet">Sherry,S.T., Ward,M.H., Kholodov,M., - Baker,J., Phan,L., Smigielski,E.M. and Sirotkin,K. (2001) dbSNP: the - NCBI database of genetic variation. Nucleic Acids Res., - 29, 308–311.</span> -17. <span class="discreet">Altshuler,D.M., Gibbs,R.A., Peltonen,L., - Dermitzakis,E., Schaffner,S.F., Yu,F., Bonnen,P.E., de Bakker,P.I., - Deloukas,P., Gabriel,S.B. et al. 
(2010) Integrating common and rare - genetic variation in diverse human populations. Nature, - 467, 52–58.</span> -18. <span class="discreet">1000 Genomes Project Consortium. (2010) A map - of human genome variation from population-scale sequencing. Nature, - 467, 1061–1073.</span> -19. <span class="discreet">Hindorff,L.A., Sethupathy,P., Junkins,H.A., - Ramos,E.M., Mehta,J.P., Collins,F.S. and Manolio,T.A. (2009) - Potential etiologic and functional implications of genome-wide - association loci for human diseases and traits. Proc. Natl Acad. - Sci. USA, 106, 9362–9367.</span> -20. <span class="discreet">Stenson,P.D., Ball,E.V., Mort,M., - Phillips,A.D., Shiel,J.A., Thomas,N.S., Abeysinghe,S., Krawczak,M. - and Cooper,D.N. (2003) Human gene mutation database (HGMD): - 2003 update. Hum. Mutat., 21, 577–581.</span> -21. <span class="discreet">Johnson,A.D. and O’Donnell,C.J. (2009) An - open access database of genome-wide association results. BMC Med. - Genet, 10, 6.</span> -22. <span class="discreet">McKusick,V. (1998) A Catalog of Human Genes - and Genetic Disorders, 12th edn. John Hopkins University - Press,Baltimore, MD.</span> -23. <span class="discreet">Forbes,S.A., Bindal,N., Bamford,S., Cole,C., - Kok,C.Y., Beare,D., Jia,M., Shepherd,R., Leung,K., Menzies,A. et al. - (2011) COSMIC: mining complete cancer genomes in the catalogue of - somatic mutations in cancer. Nucleic Acids Res., - 39, D945–D950.</span> -24. <span class="discreet">Kerrien,S., Aranda,B., Breuza,L., Bridge,A., - Broackes-Carter,F., Chen,C., Duesbury,M., Dumousseau,M., - Feuermann,M., Hinz,U. et al. (2012) The Intact molecular interaction - database in 2012. Nucleic Acids Res., 40, D841–D846.</span> -25. <span class="discreet">Croft,D., O’Kelly,G., Wu,G., Haw,R., - Gillespie,M., Matthews,L., Caudy,M., Garapati,P., - Gopinath,G., Jassal,B. et al. (2011) Reactome: a database of - reactions, pathways and biological processes. Nucleic Acids Res., - 39, D691–D697.</span> -26. <span class="discreet">Demir,E., Cary,M.P., Paley,S., Fukuda,K., - Lemer,C., Vastrik,I.,Wu,G., D’Eustachio,P., Schaefer,C., Luciano,J. - et al. (2010) The BioPAX community standard for pathway - data sharing. Nature Biotechnol., 28, 935–942.</span> -27. <span class="discreet">Alemán Z, GarcĂa-GarcĂa F, Medina I, Dopazo J - (2014): A web tool for the design and management of panels of genes - for targeted enrichment and massive sequencing for - clinical applications. Nucleic Acids Res 42: W83-7.</span> -28. <span class="discreet">[Alemán - A](http://www.ncbi.nlm.nih.gov/pubmed?term=Alem%C3%A1n%20A%5BAuthor%5D&cauthor=true&cauthor_uid=24803668)<span>, </span>[Garcia-Garcia - F](http://www.ncbi.nlm.nih.gov/pubmed?term=Garcia-Garcia%20F%5BAuthor%5D&cauthor=true&cauthor_uid=24803668)<span>, </span>[Salavert - F](http://www.ncbi.nlm.nih.gov/pubmed?term=Salavert%20F%5BAuthor%5D&cauthor=true&cauthor_uid=24803668)<span>, </span>[Medina - I](http://www.ncbi.nlm.nih.gov/pubmed?term=Medina%20I%5BAuthor%5D&cauthor=true&cauthor_uid=24803668)<span>, </span>[Dopazo - J](http://www.ncbi.nlm.nih.gov/pubmed?term=Dopazo%20J%5BAuthor%5D&cauthor=true&cauthor_uid=24803668)<span> (2014). - A web-based interactive framework to assist in the prioritization of - disease candidate genes in whole-exome sequencing studies. - </span>[Nucleic - Acids Res.](http://www.ncbi.nlm.nih.gov/pubmed/?term=BiERapp "Nucleic acids research.")<span>42 :W88-93.</span></span> -29. <span class="discreet">Landrum,M.J., Lee,J.M., Riley,G.R., Jang,W., - Rubinstein,W.S., Church,D.M. and Maglott,D.R. 
(2014) ClinVar: public - archive of relationships among sequence variation and - human phenotype. Nucleic Acids Res., 42, D980–D985.</span> -30. <span class="discreet">Medina I, Salavert F, Sanchez R, de Maria A, - Alonso R, Escobar P, Bleda M, Dopazo J: Genome Maps, a new - generation genome browser. Nucleic Acids Res 2013, 41:W41-46.</span> +1. class="discreet">Heng Li, Bob Handsaker, Alec Wysoker, Tim + Fennell, Jue Ruan, Nils Homer, Gabor Marth5, Goncalo Abecasis6, + Richard Durbin and 1000 Genome Project Data Processing Subgroup: The + Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, + 25: 2078-2079. +2. class="discreet">>McKenna A, Hanna M, Banks E, Sivachenko + A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, + Daly M, DePristo MA: The Genome Analysis Toolkit: a MapReduce + framework for analyzing next-generation DNA sequencing data. + *Genome Res* >2010, 20:1297-1303.</span> +3. class="discreet">Petr Danecek, Adam Auton, Goncalo Abecasis, + Cornelis A. Albers, Eric Banks, Mark A. DePristo, Robert E. + Handsaker, Gerton Lunter, Gabor T. Marth, Stephen T. Sherry, Gilean + McVean, Richard Durbin, and 1000 Genomes Project Analysis Group. The + variant call format and VCFtools. Bioinformatics 2011, + 27: 2156-2158. +4. class="discreet">Medina I, De Maria A, Bleda M, Salavert F, + Alonso R, Gonzalez CY, Dopazo J: VARIANT: Command Line, Web service + and Web interface for fast and accurate functional characterization + of variants found by Next-Generation Sequencing. Nucleic Acids Res + 2012, 40:W54-58. +5. class="discreet">Bleda M, Tarraga J, de Maria A, Salavert F, + Garcia-Alonso L, Celma M, Martin A, Dopazo J, Medina I: CellBase, a + comprehensive collection of RESTful web services for retrieving + relevant biological information from heterogeneous sources. Nucleic + Acids Res 2012, 40:W609-614. +6. class="discreet">Flicek,P., Amode,M.R., Barrell,D., Beal,K., + Brent,S., Carvalho-Silva,D., Clapham,P., Coates,G., + Fairley,S., Fitzgerald,S. et al. (2012) Ensembl 2012. Nucleic Acids + Res., 40, D84–D90. +7. class="discreet">UniProt Consortium. (2012) Reorganizing the + protein space at the Universal Protein Resource (UniProt). Nucleic + Acids Res., 40, D71–D75. +8. class="discreet">Kozomara,A. and Griffiths-Jones,S. (2011) + miRBase: integrating microRNA annotation and deep-sequencing data. + Nucleic Acids Res., 39, D152–D157. +9. class="discreet">Xiao,F., Zuo,Z., Cai,G., Kang,S., Gao,X. + and Li,T. (2009) miRecords: an integrated resource for + microRNA-target interactions. Nucleic Acids Res., + 37, D105–D110. +10. class="discreet">Hsu,S.D., Lin,F.M., Wu,W.Y., Liang,C., + Huang,W.C., Chan,W.L., Tsai,W.T., Chen,G.Z., Lee,C.J., Chiu,C.M. + et al. (2011) miRTarBase: a database curates experimentally + validated microRNA-target interactions. Nucleic Acids Res., + 39, D163–D169. +11. class="discreet">Friedman,R.C., Farh,K.K., Burge,C.B. + and Bartel,D.P. (2009) Most mammalian mRNAs are conserved targets + of microRNAs. Genome Res., 19, 92–105. +12. class="discreet">Betel,D., Wilson,M., Gabow,A., Marks,D.S. + and Sander,C. (2008) The microRNA.org resource: targets + and expression. Nucleic Acids Res., 36, D149–D153. +13. class="discreet">Dreszer,T.R., Karolchik,D., Zweig,A.S., + Hinrichs,A.S., Raney,B.J., Kuhn,R.M., Meyer,L.R., Wong,M., + Sloan,C.A., Rosenbloom,K.R. et al. (2012) The UCSC genome browser + database: extensions and updates 2011. Nucleic Acids Res., + 40, D918–D923. +14. 
class="discreet">Smith,B., Ashburner,M., Rosse,C., Bard,J., + Bug,W., Ceusters,W., Goldberg,L.J., Eilbeck,K., + Ireland,A., Mungall,C.J. et al. (2007) The OBO Foundry: coordinated + evolution of ontologies to support biomedical data integration. Nat. + Biotechnol., 25, 1251–1255. +15. class="discreet">Hunter,S., Jones,P., Mitchell,A., + Apweiler,R., Attwood,T.K.,Bateman,A., Bernard,T., Binns,D., + Bork,P., Burge,S. et al. (2012) InterPro in 2011: new developments + in the family and domain prediction database. Nucleic Acids Res., + 40, D306–D312. +16. class="discreet">Sherry,S.T., Ward,M.H., Kholodov,M., + Baker,J., Phan,L., Smigielski,E.M. and Sirotkin,K. (2001) dbSNP: the + NCBI database of genetic variation. Nucleic Acids Res., + 29, 308–311. +17. class="discreet">Altshuler,D.M., Gibbs,R.A., Peltonen,L., + Dermitzakis,E., Schaffner,S.F., Yu,F., Bonnen,P.E., de Bakker,P.I., + Deloukas,P., Gabriel,S.B. et al. (2010) Integrating common and rare + genetic variation in diverse human populations. Nature, + 467, 52–58. +18. class="discreet">1000 Genomes Project Consortium. (2010) A map + of human genome variation from population-scale sequencing. Nature, + 467, 1061–1073. +19. class="discreet">Hindorff,L.A., Sethupathy,P., Junkins,H.A., + Ramos,E.M., Mehta,J.P., Collins,F.S. and Manolio,T.A. (2009) + Potential etiologic and functional implications of genome-wide + association loci for human diseases and traits. Proc. Natl Acad. + Sci. USA, 106, 9362–9367. +20. class="discreet">Stenson,P.D., Ball,E.V., Mort,M., + Phillips,A.D., Shiel,J.A., Thomas,N.S., Abeysinghe,S., Krawczak,M. + and Cooper,D.N. (2003) Human gene mutation database (HGMD): + 2003 update. Hum. Mutat., 21, 577–581. +21. class="discreet">Johnson,A.D. and O’Donnell,C.J. (2009) An + open access database of genome-wide association results. BMC Med. + Genet, 10, 6. +22. class="discreet">McKusick,V. (1998) A Catalog of Human Genes + and Genetic Disorders, 12th edn. John Hopkins University + Press,Baltimore, MD. +23. class="discreet">Forbes,S.A., Bindal,N., Bamford,S., Cole,C., + Kok,C.Y., Beare,D., Jia,M., Shepherd,R., Leung,K., Menzies,A. et al. + (2011) COSMIC: mining complete cancer genomes in the catalogue of + somatic mutations in cancer. Nucleic Acids Res., + 39, D945–D950. +24. class="discreet">Kerrien,S., Aranda,B., Breuza,L., Bridge,A., + Broackes-Carter,F., Chen,C., Duesbury,M., Dumousseau,M., + Feuermann,M., Hinz,U. et al. (2012) The Intact molecular interaction + database in 2012. Nucleic Acids Res., 40, D841–D846. +25. class="discreet">Croft,D., O’Kelly,G., Wu,G., Haw,R., + Gillespie,M., Matthews,L., Caudy,M., Garapati,P., + Gopinath,G., Jassal,B. et al. (2011) Reactome: a database of + reactions, pathways and biological processes. Nucleic Acids Res., + 39, D691–D697. +26. class="discreet">Demir,E., Cary,M.P., Paley,S., Fukuda,K., + Lemer,C., Vastrik,I.,Wu,G., D’Eustachio,P., Schaefer,C., Luciano,J. + et al. (2010) The BioPAX community standard for pathway + data sharing. Nature Biotechnol., 28, 935–942. +27. class="discreet">Alemán Z, GarcĂa-GarcĂa F, Medina I, Dopazo J + (2014): A web tool for the design and management of panels of genes + for targeted enrichment and massive sequencing for + clinical applications. Nucleic Acids Res 42: W83-7. +28. 
class="discreet">[Alemán + A](http://www.ncbi.nlm.nih.gov/pubmed?term=Alem%C3%A1n%20A%5BAuthor%5D&cauthor=true&cauthor_uid=24803668)>, [Garcia-Garcia + F](http://www.ncbi.nlm.nih.gov/pubmed?term=Garcia-Garcia%20F%5BAuthor%5D&cauthor=true&cauthor_uid=24803668)>, [Salavert + F](http://www.ncbi.nlm.nih.gov/pubmed?term=Salavert%20F%5BAuthor%5D&cauthor=true&cauthor_uid=24803668)>, [Medina + I](http://www.ncbi.nlm.nih.gov/pubmed?term=Medina%20I%5BAuthor%5D&cauthor=true&cauthor_uid=24803668)>, [Dopazo + J](http://www.ncbi.nlm.nih.gov/pubmed?term=Dopazo%20J%5BAuthor%5D&cauthor=true&cauthor_uid=24803668)> (2014). + A web-based interactive framework to assist in the prioritization of + disease candidate genes in whole-exome sequencing studies. + [Nucleic + Acids Res.](http://www.ncbi.nlm.nih.gov/pubmed/?term=BiERapp "Nucleic acids research.")>42 :W88-93. +29. class="discreet">Landrum,M.J., Lee,J.M., Riley,G.R., Jang,W., + Rubinstein,W.S., Church,D.M. and Maglott,D.R. (2014) ClinVar: public + archive of relationships among sequence variation and + human phenotype. Nucleic Acids Res., 42, D980–D985. +30. class="discreet">Medina I, Salavert F, Sanchez R, de Maria A, + Alonso R, Escobar P, Bleda M, Dopazo J: Genome Maps, a new + generation genome browser. Nucleic Acids Res 2013, 41:W41-46.  -<span> -</span> +> + diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/omics-master-1/priorization-component-bierapp.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/omics-master-1/priorization-component-bierapp.md index fe9b742bc55f13f1856e23e997831d7ac5862666..0474925e999d50a5e290e951806fa42b14892098 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/omics-master-1/priorization-component-bierapp.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/omics-master-1/priorization-component-bierapp.md @@ -10,13 +10,13 @@ BiERApp is available at the following address The address is accessible only via [VPN. ](../../accessing-the-cluster/vpn-access.html) -### <span>BiERApp</span> +### >BiERApp -### <span>This tool is aimed to discover new disease genes or variants by studying affected families or cases and controls. It carries out a filtering process to sequentially remove: (i) variants which are not no compatible with the disease because are not expected to have impact on the protein function; (ii) variants that exist at frequencies incompatible with the disease; (iii) variants that do not segregate with the disease. The result is a reduced set of disease gene candidates that should be further validated experimentally.</span> +### >This tool is aimed to discover new disease genes or variants by studying affected families or cases and controls. It carries out a filtering process to sequentially remove: (i) variants which are not no compatible with the disease because are not expected to have impact on the protein function; (ii) variants that exist at frequencies incompatible with the disease; (iii) variants that do not segregate with the disease. The result is a reduced set of disease gene candidates that should be further validated experimentally. -BiERapp <span>(28)</span> efficiently helps in the identification of +BiERapp >(28) efficiently helps in the identification of causative variants in family and sporadic genetic diseases. The program reads lists of predicted variants (nucleotide substitutions and indels) in affected individuals or tumor samples and controls. In family @@ -37,10 +37,10 @@ filters available. 
 The tool includes a genomic viewer (Genome Maps 30) that enables the
 representation of the variants in the corresponding genomic
 coordinates.](images/fig6.png.1 "fig6.png")
 
-***Figure 6***. *Web interface to the prioritization tool.* *This
+***Figure 6***. *Web interface to the prioritization tool.* *This
 figure* *shows the interface of the web tool for candidate gene
 prioritization with the filters available. The tool includes a genomic
-viewer (Genome Maps <span>30</span>) that enables the representation of
+viewer (Genome Maps 30) that enables the representation of
 the variants in the corresponding genomic coordinates.*
 
 
diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/openfoam.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/openfoam.md
index 781c001d753fd6cf44ef0028c7cb6581cae080d8..e458aebc392201979ba1a46b071c3e52e276f379 100644
--- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/openfoam.md
+++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/openfoam.md
@@ -4,9 +4,9 @@ OpenFOAM
 
 A free, open source CFD software package
 
-
-**Introduction**
+
+Introduction
 ----------------
 
 OpenFOAM is a free, open source CFD software package developed
@@ -18,7 +18,7 @@ academic organisations.
 
 Homepage: <http://www.openfoam.com/>
 
-### **Installed version**
+### Installed version
 
 Currently, several version compiled by GCC/ICC compilers in
 single/double precision with several version of openmpi are available on
@@ -26,79 +26,79 @@ Anselm.
 
 For example syntax of available OpenFOAM module is:
 
-<span>< openfoam/2.2.1-icc-openmpi1.6.5-DP ></span>
+< openfoam/2.2.1-icc-openmpi1.6.5-DP >
 
-this means openfoam version <span>2.2.1</span> compiled by
-<span>ICC</span> compiler with <span>openmpi1.6.5</span> in<span> double
-precision</span>.
+this means openfoam version 2.2.1 compiled by the
+ICC compiler with openmpi1.6.5 in double
+precision.
Naming convection of the installed versions is following: -<span>  -openfoam/<</span><span>VERSION</span><span>>-<</span><span>COMPILER</span><span>>-<</span><span>openmpiVERSION</span><span>>-<</span><span>PRECISION</span><span>></span> +>  +openfoam/<>VERSION>>-<</span><span>COMPILER</span><span>>-<</span><span>openmpiVERSION</span><span>>-<</span><span>PRECISION</span><span>></span> -- <span><</span><span>VERSION</span><span>> - version of - openfoam</span> -- <span><<span>COMPILER</span>> - version of used - compiler</span> -- <span><<span>openmpiVERSION</span>> - version of used - openmpi/impi</span> -- <span><<span>PRECISION</span>> - DP/</span><span>SP – - double/single precision</span> +- ><>VERSION<span>> - version of + openfoam +- ><>COMPILER> - version of used + compiler +- ><>openmpiVERSION> - version of used + openmpi/impi +- ><>PRECISION> - DP/<span>SP – + double/single precision -### **Available OpenFOAM modules** +###Available OpenFOAM modules** To check available modules use - $ module avail + $ module avail In /opt/modules/modulefiles/engineering you can see installed engineering softwares: - ------------------------------------ /opt/modules/modulefiles/engineering ------------------------------------------------------------- - ansys/14.5.x              matlab/R2013a-COM                               openfoam/2.2.1-icc-impi4.1.1.036-DP - comsol/43b-COM            matlab/R2013a-EDU                               openfoam/2.2.1-icc-openmpi1.6.5-DP - comsol/43b-EDU            openfoam/2.2.1-gcc481-openmpi1.6.5-DP           paraview/4.0.1-gcc481-bullxmpi1.2.4.1-osmesa10.0 - lsdyna/7.x.x              openfoam/2.2.1-gcc481-openmpi1.6.5-SP + ------------------------------------ /opt/modules/modulefiles/engineering ------------------------------------------------------------- + ansys/14.5.x              matlab/R2013a-COM                               openfoam/2.2.1-icc-impi4.1.1.036-DP + comsol/43b-COM            matlab/R2013a-EDU                               openfoam/2.2.1-icc-openmpi1.6.5-DP + comsol/43b-EDU            openfoam/2.2.1-gcc481-openmpi1.6.5-DP           paraview/4.0.1-gcc481-bullxmpi1.2.4.1-osmesa10.0 + lsdyna/7.x.x              openfoam/2.2.1-gcc481-openmpi1.6.5-SP For information how to use modules please [look here](../environment-and-modules.html "Environment and Modules "). -**Getting Started** +Getting Started** ------------------- To create OpenFOAM environment on ANSELM give the commands: - $ module load openfoam/2.2.1-icc-openmpi1.6.5-DP + $ module load openfoam/2.2.1-icc-openmpi1.6.5-DP - $ source $FOAM_BASHRC + $ source $FOAM_BASHRC Pleas load correct module with your requirements “compiler - GCC/ICC, precision - DP/SP”. Create a project directory within the $HOME/OpenFOAM directory -named <span><USER></span>-<OFversion> and create a directory +named ><USER>-<OFversion> and create a directory named run within it, e.g. by typing: - $ mkdir -p $FOAM_RUN + $ mkdir -p $FOAM_RUN Project directory is now available by typing: - $ cd /home/<USER>/OpenFOAM/<USER>-<OFversion>/run + $ cd /home/<USER>/OpenFOAM/<USER>-<OFversion>/run <OFversion> - for example <2.2.1> or - $ cd $FOAM_RUN + $ cd $FOAM_RUN @@ -106,209 +106,209 @@ Copy the tutorial examples directory in the OpenFOAM distribution to the run directory: - $ cp -r $FOAM_TUTORIALS $FOAM_RUN + $ cp -r $FOAM_TUTORIALS $FOAM_RUN Now you can run the first case for example incompressible laminar flow in a cavity. 
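To try the cavity case interactively before wrapping it in a batch job,
the two solver steps can be run directly (a minimal sketch, assuming the
OpenFOAM environment from the steps above is loaded and the tutorials
have been copied into $FOAM_RUN):

```
# run the lid-driven cavity tutorial case directly
$ cd $FOAM_RUN/tutorials/incompressible/icoFoam/cavity
$ blockMesh   # generate the mesh from constant/polyMesh/blockMeshDict
$ icoFoam     # run the incompressible laminar solver
```

blockMesh generates the mesh and icoFoam runs the solver; the jobscripts
below wrap the same two steps for batch execution.
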
-**Running Serial Applications** +Running Serial Applications** ------------------------------- -<span>Create a Bash script </span><span>test.sh</span> -<span></span> +>Create a Bash script >test.sh +> -<span> </span> - #!/bin/bash - module load openfoam/2.2.1-icc-openmpi1.6.5-DP - source $FOAM_BASHRC +> + #!/bin/bash + module load openfoam/2.2.1-icc-openmpi1.6.5-DP + source $FOAM_BASHRC - # source to run functions - . $WM_PROJECT_DIR/bin/tools/RunFunctions + # source to run functions + . $WM_PROJECT_DIR/bin/tools/RunFunctions - cd $FOAM_RUN/tutorials/incompressible/icoFoam/cavity + cd $FOAM_RUN/tutorials/incompressible/icoFoam/cavity - runApplication blockMesh - runApplication icoFoam + runApplication blockMesh + runApplication icoFoam -<span> </span> +> -<span> </span> +> -<span>Job submission</span> +>Job submission -<span> </span> - $ qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=16,walltime=03:00:00 test.sh +> + $ qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=16,walltime=03:00:00 test.sh -<span> </span> +> -<span> </span>For information about job submission please [look +> For information about job submission please [look here](../resource-allocation-and-job-execution/job-submission-and-execution.html "Job submission"). -**<span>Running applications in parallel</span>** +>Running applications in parallel** ------------------------------------------------- -<span>Run the second case for example external incompressible turbulent -flow - case - motorBike.</span> -<span>First we must run serial application bockMesh and decomposePar for -preparation of parallel computation.</span> -<span> -</span> +>Run the second case for example external incompressible turbulent +flow - case - motorBike. +>First we must run serial application bockMesh and decomposePar for +preparation of parallel computation. +> + -<span>Create a Bash scrip test.sh:</span> +>Create a Bash scrip test.sh: -<span> </span> - #!/bin/bash - module load openfoam/2.2.1-icc-openmpi1.6.5-DP - source $FOAM_BASHRC +> + #!/bin/bash + module load openfoam/2.2.1-icc-openmpi1.6.5-DP + source $FOAM_BASHRC - # source to run functions - . $WM_PROJECT_DIR/bin/tools/RunFunctions + # source to run functions + . $WM_PROJECT_DIR/bin/tools/RunFunctions - cd $FOAM_RUN/tutorials/incompressible/simpleFoam/motorBike + cd $FOAM_RUN/tutorials/incompressible/simpleFoam/motorBike - runApplication blockMesh - runApplication decomposePar + runApplication blockMesh + runApplication decomposePar -<span> -</span> +> -<span> </span> +> -<span>Job submission</span> +>Job submission -<span> </span> - $ qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=16,walltime=03:00:00 test.sh -<span> </span> +> + $ qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=16,walltime=03:00:00 test.sh +> -<span><span>This job create simple block mesh and domain decomposition. -Check your decomposition, and submit parallel computation:</span></span> -<span><span>Create a PBS script<span> -testParallel.pbs</span>:</span></span> +>>This job create simple block mesh and domain decomposition. 
+Check your decomposition, and submit parallel computation: +>>Create a PBS script<span> +testParallel.pbs:</span> -<span> </span> - #!/bin/bash - #PBS -N motorBike - #PBS -l select=2:ncpus=16 - #PBS -l walltime=01:00:00 - #PBS -q qprod - #PBS -A OPEN-0-0 - module load openfoam/2.2.1-icc-openmpi1.6.5-DP - source $FOAM_BASHRC +> + #!/bin/bash + #PBS -N motorBike + #PBS -l select=2:ncpus=16 + #PBS -l walltime=01:00:00 + #PBS -q qprod + #PBS -A OPEN-0-0 - cd $FOAM_RUN/tutorials/incompressible/simpleFoam/motorBike + module load openfoam/2.2.1-icc-openmpi1.6.5-DP + source $FOAM_BASHRC - nproc = 32 + cd $FOAM_RUN/tutorials/incompressible/simpleFoam/motorBike - mpirun -hostfile $ -np $nproc snappyHexMesh -overwrite -parallel | tee snappyHexMesh.log + nproc = 32 - mpirun -hostfile $ -np $nproc potentialFoam -noFunctionObject-writep -parallel | tee potentialFoam.log + mpirun -hostfile $ -np $nproc snappyHexMesh -overwrite -parallel | tee snappyHexMesh.log - mpirun -hostfile $ -np $nproc simpleFoam -parallel | tee simpleFoam.log + mpirun -hostfile $ -np $nproc potentialFoam -noFunctionObject-writep -parallel | tee potentialFoam.log -<span> </span> + mpirun -hostfile $ -np $nproc simpleFoam -parallel | tee simpleFoam.log +> -<span>nproc – number of subdomains</span> -<span>Job submission</span> +>nproc – number of subdomains +>Job submission -<span> </span> - $ qsub testParallel.pbs -<span> </span> +> + $ qsub testParallel.pbs +> -**<span>Compile your own solver</span>** + +>Compile your own solver** ---------------------------------------- -<span>Initialize OpenFOAM environment before compiling your solver -</span> +>Initialize OpenFOAM environment before compiling your solver + -<span> </span> - $ module load openfoam/2.2.1-icc-openmpi1.6.5-DP - $ source $FOAM_BASHRC - $ cd $FOAM_RUN/ +> + $ module load openfoam/2.2.1-icc-openmpi1.6.5-DP + $ source $FOAM_BASHRC + $ cd $FOAM_RUN/ -<span>Create directory applications/solvers in user directory</span> -<span> </span> - $ mkdir -p applications/solvers - $ cd applications/solvers +>Create directory applications/solvers in user directory +> + $ mkdir -p applications/solvers + $ cd applications/solvers -<span> </span> +> -<span>Copy icoFoam solver’s source files</span> +>Copy icoFoam solver’s source files -<span> </span> - $ cp -r $FOAM_SOLVERS/incompressible/icoFoam/ My_icoFoam - $ cd My_icoFoam +> + $ cp -r $FOAM_SOLVERS/incompressible/icoFoam/ My_icoFoam + $ cd My_icoFoam -<span>Rename icoFoam.C to My_icoFOAM.C</span> +>Rename icoFoam.C to My_icoFOAM.C -<span> </span> - $ mv icoFoam.C My_icoFoam.C +> + $ mv icoFoam.C My_icoFoam.C -<span> </span> +> -<span>Edit <span>*files*</span> file in *Make* directory:</span> +>Edit >*files* file in *Make* directory: -<span> </span> - icoFoam.C - EXE = $(FOAM_APPBIN)/icoFoam +> + icoFoam.C + EXE = $(FOAM_APPBIN)/icoFoam -<span>and change to:</span> +>and change to: - My_icoFoam.C - EXE = $(FOAM_USER_APPBIN)/My_icoFoam + My_icoFoam.C + EXE = $(FOAM_USER_APPBIN)/My_icoFoam -<span></span> +> -<span>In directory My_icoFoam give the compilation command:</span> +>In directory My_icoFoam give the compilation command: -<span> </span> - $ wmake +> + $ wmake @@ -316,9 +316,9 @@ testParallel.pbs</span>:</span></span>  -** Have a fun with OpenFOAM :)** + Have a fun with OpenFOAM :)** - <span id="__caret"><span id="__caret"></span></span> + id="__caret"> id="__caret">  diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/operating-system.md 
b/converted/docs.it4i.cz/anselm-cluster-documentation/software/operating-system.md index d5b1b2ae7861b9474e73ea0924974c0b938d454a..34467f1dad4fad0e346f323db9f71275d3fa5606 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/operating-system.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/operating-system.md @@ -4,7 +4,7 @@ Operating System The operating system, deployed on ANSELM - + The operating system on Anselm is Linux - bullx Linux Server release 6.3. diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/paraview.md b/converted/docs.it4i.cz/anselm-cluster-documentation/software/paraview.md index 78ddf47ffd2d29d45e01bc2463b9ec52bc90c302..757e07cfb4f61d0653bf4e3a3735bea8dd309133 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/software/paraview.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/software/paraview.md @@ -5,12 +5,12 @@ An open-source, multi-platform data analysis and visualization application - + Introduction ------------ -**ParaView** is an open-source, multi-platform data analysis and +ParaView** is an open-source, multi-platform data analysis and visualization application. ParaView users can quickly build visualizations to analyze their data using qualitative and quantitative techniques. The data exploration can be done interactively in 3D or @@ -37,15 +37,15 @@ ParaView server is launched on compute nodes by the user, and client is launched on your desktop PC to control and view the visualization. Download ParaView client application for your OS here : <http://paraview.org/paraview/resources/software.php>. Important : -**your version must match the version number installed on Anselm** ! +your version must match the version number installed on Anselm** ! (currently v4.0.1) ### Launching server To launch the server, you must first allocate compute nodes, for example -:<span> </span> +:> - $ qsub -I -q qprod -A OPEN-0-0 -l select=2 + $ qsub -I -q qprod -A OPEN-0-0 -l select=2 to launch an interactive session on 2 nodes. Refer to [Resource Allocation and Job @@ -54,15 +54,15 @@ for details. After the interactive session is opened, load the ParaView module : - $ module add paraview + $ module add paraview Now launch the parallel server, with number of nodes times 16 processes : - $ mpirun -np 32 pvserver --use-offscreen-rendering - Waiting for client... - Connection URL: cs://cn77:11111 - Accepting connection(s): cn77:11111 + $ mpirun -np 32 pvserver --use-offscreen-rendering + Waiting for client... + Connection URL: cs://cn77:11111 + Accepting connection(s): cn77:11111  Note the that the server is listening on compute node cn77 in this case, we shall use this information later. @@ -75,15 +75,15 @@ number on your PC to be forwarded to ParaView server, for example 12345. If your PC is running Linux, use this command to estabilish a SSH tunnel : - ssh -TN -L 12345:cn77:11111 username@anselm.it4i.cz + ssh -TN -L 12345:cn77:11111 username@anselm.it4i.cz -replace <span class="monospace">username</span> with your login and cn77 +replace username with your login and cn77 with the name of compute node your ParaView server is running on (see previous step). If you use PuTTY on Windows, load Anselm connection -configuration, t<span>hen go to Connection-></span><span -class="highlightedSearchTerm">SSH</span><span>->Tunnels to set up the +configuration, t>hen go to Connection-> +class="highlightedSearchTerm">SSH>->Tunnels to set up the port forwarding. Click Remote radio button. 
Insert 12345 to Source port -textbox. Insert cn77:11111. Click Add button, then Open. </span>[Read +textbox. Insert cn77:11111. Click Add button, then Open. [Read more about port forwarding.](https://docs.it4i.cz/anselm-cluster-documentation/software/resolveuid/11e53ad0d2fd4c5187537f4baeedff33) @@ -103,7 +103,7 @@ click Connect to connect to the ParaView server. In your terminal where you have interactive session with ParaView server launched, you should see : - Client connected. + Client connected. You can now use Parallel ParaView. diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/storage-1/cesnet-data-storage.md b/converted/docs.it4i.cz/anselm-cluster-documentation/storage-1/cesnet-data-storage.md index 2f65b5a9b91aa4b44fc2ac836760cc8497b53a3d..a5dfb47922cded184c2b07602830603efd4d70c4 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/storage-1/cesnet-data-storage.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/storage-1/cesnet-data-storage.md @@ -3,7 +3,7 @@ CESNET Data Storage - + Introduction ------------ @@ -66,31 +66,31 @@ than copied in and out in a usual fashion. First, create the mountpoint - $ mkdir cesnet + $ mkdir cesnet Mount the storage. Note that you can choose among the ssh.du1.cesnet.cz (Plzen), ssh.du2.cesnet.cz (Jihlava), ssh.du3.cesnet.cz (Brno) Mount tier1_home **(only 5120M !)**: - $ sshfs username@ssh.du1.cesnet.cz:. cesnet/ + $ sshfs username@ssh.du1.cesnet.cz:. cesnet/ For easy future access from Anselm, install your public key - $ cp .ssh/id_rsa.pub cesnet/.ssh/authorized_keys + $ cp .ssh/id_rsa.pub cesnet/.ssh/authorized_keys Mount tier1_cache_tape for the Storage VO: - $ sshfs username@ssh.du1.cesnet.cz:/cache_tape/VO_storage/home/username cesnet/ + $ sshfs username@ssh.du1.cesnet.cz:/cache_tape/VO_storage/home/username cesnet/ View the archive, copy the files and directories in and out - $ ls cesnet/ - $ cp -a mydir cesnet/. - $ cp cesnet/myfile . + $ ls cesnet/ + $ cp -a mydir cesnet/. + $ cp cesnet/myfile . Once done, please remember to unmount the storage - $ fusermount -u cesnet + $ fusermount -u cesnet ### Rsync access @@ -117,13 +117,13 @@ More about Rsync at Transfer large files to/from Cesnet storage, assuming membership in the Storage VO - $ rsync --progress datafile username@ssh.du1.cesnet.cz:VO_storage-cache_tape/. - $ rsync --progress username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafile . + $ rsync --progress datafile username@ssh.du1.cesnet.cz:VO_storage-cache_tape/. + $ rsync --progress username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafile . Transfer large directories to/from Cesnet storage, assuming membership in the Storage VO - $ rsync --progress -av datafolder username@ssh.du1.cesnet.cz:VO_storage-cache_tape/. - $ rsync --progress -av username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafolder . + $ rsync --progress -av datafolder username@ssh.du1.cesnet.cz:VO_storage-cache_tape/. + $ rsync --progress -av username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafolder . Transfer rates of about 28MB/s can be expected. 
diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/storage-1/storage.md b/converted/docs.it4i.cz/anselm-cluster-documentation/storage-1/storage.md index f5cbffc4bcc3b8a227b17a9b2a9df04b261d7c3f..f8ee9e6a4e95930030fd5abe918afbc1a75e3def 100644 --- a/converted/docs.it4i.cz/anselm-cluster-documentation/storage-1/storage.md +++ b/converted/docs.it4i.cz/anselm-cluster-documentation/storage-1/storage.md @@ -3,7 +3,7 @@ Storage - + There are two main shared file systems on Anselm cluster, the [HOME](../storage.html#home) and @@ -40,15 +40,15 @@ A user file on the Lustre filesystem can be divided into multiple chunks (OSTs) (disks). The stripes are distributed among the OSTs in a round-robin fashion to ensure load balancing. -When a client (a <span class="glossaryItem">compute <span -class="glossaryItem">node</span></span> from your job) needs to create -or access a file, the client queries the metadata server (<span -class="glossaryItem">MDS</span>) and the metadata target (<span -class="glossaryItem">MDT</span>) for the layout and location of the +When a client (a class="glossaryItem">compute +class="glossaryItem">node from your job) needs to create +or access a file, the client queries the metadata server ( +class="glossaryItem">MDS) and the metadata target ( +class="glossaryItem">MDT) for the layout and location of the [file's stripes](http://www.nas.nasa.gov/hecc/support/kb/Lustre_Basics_224.html#striping). Once the file is opened and the client obtains the striping information, -the <span class="glossaryItem">MDS</span> is no longer involved in the +the class="glossaryItem">MDS is no longer involved in the file I/O process. The client interacts directly with the object storage servers (OSSes) and OSTs to perform I/O operations such as locking, disk allocation, storage, and retrieval. @@ -61,17 +61,17 @@ There is default stripe configuration for Anselm Lustre filesystems. However, users can set the following stripe parameters for their own directories or files to get optimum I/O performance: -1. stripe_size: the size of the chunk in bytes; specify with k, m, or - g to use units of KB, MB, or GB, respectively; the size must be an - even multiple of 65,536 bytes; default is 1MB for all Anselm Lustre - filesystems -2. stripe_count the number of OSTs to stripe across; default is 1 for - Anselm Lustre filesystems one can specify -1 to use all OSTs in - the filesystem. -3. stripe_offset The index of the <span - class="glossaryItem">OST</span> where the first stripe is to be - placed; default is -1 which results in random selection; using a - non-default value is NOT recommended. +1.stripe_size: the size of the chunk in bytes; specify with k, m, or + g to use units of KB, MB, or GB, respectively; the size must be an + even multiple of 65,536 bytes; default is 1MB for all Anselm Lustre + filesystems +2.stripe_count the number of OSTs to stripe across; default is 1 for + Anselm Lustre filesystems one can specify -1 to use all OSTs in + the filesystem. +3.stripe_offset The index of the + class="glossaryItem">OST where the first stripe is to be + placed; default is -1 which results in random selection; using a + non-default value is NOT recommended.  @@ -83,22 +83,22 @@ setstripe command for setting the stripe parameters to get optimal I/O performance The correct stripe setting depends on your needs and file access patterns. 
-``` +``` $ lfs getstripe dir|filename $ lfs setstripe -s stripe_size -c stripe_count -o stripe_offset dir|filename ``` Example: -``` +``` $ lfs getstripe /scratch/username/ /scratch/username/ -stripe_count: 1 stripe_size: 1048576 stripe_offset: -1 +stripe_count: 1 stripe_size: 1048576 stripe_offset: -1 $ lfs setstripe -c -1 /scratch/username/ $ lfs getstripe /scratch/username/ /scratch/username/ -stripe_count: 10 stripe_size: 1048576 stripe_offset: -1 +stripe_count:10 stripe_size: 1048576 stripe_offset: -1 ``` In this example, we view current stripe setting of the @@ -109,7 +109,7 @@ and verified. All files written to this directory will be striped over Use lfs check OSTs to see the number and status of active OSTs for each filesystem on Anselm. Learn more by reading the man page -``` +``` $ lfs check osts $ man lfs ``` @@ -152,51 +152,51 @@ servers (MDS) and four data/object storage servers (OSS). Two object storage servers are used for file system HOME and another two object storage servers are used for file system SCRATCH. -<span class="emphasis">Configuration of the storages</span> + class="emphasis">Configuration of the storages -- <span class="emphasis">HOME Lustre object storage</span> - <div class="itemizedlist"> +- class="emphasis">HOME Lustre object storage + <div class="itemizedlist"> - - One disk array NetApp E5400 - - 22 OSTs - - 227 2TB NL-SAS 7.2krpm disks - - 22 groups of 10 disks in RAID6 (8+2) - - 7 hot-spare disks + - One disk array NetApp E5400 + - 22 OSTs + - 227 2TB NL-SAS 7.2krpm disks + - 22 groups of 10 disks in RAID6 (8+2) + - 7 hot-spare disks - + -- <span class="emphasis">SCRATCH Lustre object storage</span> - <div class="itemizedlist"> +- class="emphasis">SCRATCH Lustre object storage + <div class="itemizedlist"> - - Two disk arrays NetApp E5400 - - 10 OSTs - - 106 2TB NL-SAS 7.2krpm disks - - 10 groups of 10 disks in RAID6 (8+2) - - 6 hot-spare disks + - Two disk arrays NetApp E5400 + - 10 OSTs + - 106 2TB NL-SAS 7.2krpm disks + - 10 groups of 10 disks in RAID6 (8+2) + - 6 hot-spare disks - + -- <span class="emphasis">Lustre metadata storage</span> - <div class="itemizedlist"> +- class="emphasis">Lustre metadata storage + <div class="itemizedlist"> - - One disk array NetApp E2600 - - 12 300GB SAS 15krpm disks - - 2 groups of 5 disks in RAID5 - - 2 hot-spare disks + - One disk array NetApp E2600 + - 12 300GB SAS 15krpm disks + - 2 groups of 5 disks in RAID5 + - 2 hot-spare disks - + -### []()[]()HOME +###HOME The HOME filesystem is mounted in directory /home. Users home directories /home/username reside on this filesystem. Accessible capacity is 320TB, shared among all users. Individual users are -restricted by filesystem usage quotas, set to 250GB per user. <span>If +restricted by filesystem usage quotas, set to 250GB per user. >If 250GB should prove as insufficient for particular user, please -contact</span> [support](https://support.it4i.cz/rt), +contact [support](https://support.it4i.cz/rt), the quota may be lifted upon request. The HOME filesystem is intended for preparation, evaluation, processing @@ -236,17 +236,17 @@ Default stripe count 1 Number of OSTs 22 -### []()[]()SCRATCH +###SCRATCH The SCRATCH filesystem is mounted in directory /scratch. Users may freely create subdirectories and files on the filesystem. Accessible capacity is 146TB, shared among all users. Individual users are restricted by filesystem usage quotas, set to 100TB per user. 
The purpose of this quota is to prevent runaway programs from filling the -entire filesystem and deny service to other users. <span>If 100TB should +entire filesystem and deny service to other users. >If 100TB should prove as insufficient for particular user, please contact [support](https://support.it4i.cz/rt), the quota may be -lifted upon request. </span> +lifted upon request. The Scratch filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input @@ -283,25 +283,25 @@ Default stripe count 1 Number of OSTs 10 -### <span>Disk usage and quota commands</span> +### >Disk usage and quota commands -<span>User quotas on the file systems can be checked and reviewed using -following command:</span> +>User quotas on the file systems can be checked and reviewed using +following command: -``` +``` $ lfs quota dir ``` Example for Lustre HOME directory: -``` +``` $ lfs quota /home Disk quotas for user user001 (uid 1234): - Filesystem kbytes quota limit grace files quota limit grace - /home 300096 0 250000000 - 2102 0 500000 - + Filesystem kbytes quota limit grace files quota limit grace + /home 300096 0 250000000 - 2102 0 500000 - Disk quotas for group user001 (gid 1234): - Filesystem kbytes quota limit grace files quota limit grace - /home 300096 0 0 - 2102 0 0 - + Filesystem kbytes quota limit grace files quota limit grace + /home 300096 0 0 - 2102 0 0 - ``` In this example, we view current quota size limit of 250GB and 300MB @@ -309,11 +309,11 @@ currently used by user001. Example for Lustre SCRATCH directory: -``` +``` $ lfs quota /scratch Disk quotas for user user001 (uid 1234): - Filesystem kbytes quota limit grace files quota limit grace -  /scratch    8    0 100000000000    -    3    0    0    - + Filesystem kbytes quota limit grace files quota limit grace +  /scratch    8    0 100000000000    -    3    0    0    - Disk quotas for group user001 (gid 1234): Filesystem kbytes quota limit grace files quota limit grace /scratch    8    0    0    -    3    0    0    - @@ -327,20 +327,20 @@ currently used by user001. To have a better understanding of where the space is exactly used, you can use following command to find out. -``` +``` $ du -hs dir ``` Example for your HOME directory: -``` +``` $ cd /home $ du -hs * .[a-zA-z0-9]* | grep -E "[0-9]*G|[0-9]*M" | sort -hr -258M cuda-samples -15M .cache -13M .mozilla -5,5M .eclipse -2,7M .idb_13.0_linux_intel64_app +258M cuda-samples +15M .cache +13M .mozilla +5,5M .eclipse +2,7M .idb_13.0_linux_intel64_app ``` This will list all directories which are having MegaBytes or GigaBytes @@ -349,14 +349,14 @@ is sorted in descending order from largest to smallest files/directories. -<span>To have a better understanding of previous commands, you can read -manpages.</span> +>To have a better understanding of previous commands, you can read +manpages. -``` +``` $ man lfs ``` -``` +``` $ man du ``` @@ -372,7 +372,7 @@ ACLs on a Lustre file system work exactly like ACLs on any Linux file system. They are manipulated with the standard tools in the standard manner. Below, we create a directory and allow a specific user access. -``` +``` [vop999@login1.anselm ~]$ umask 027 [vop999@login1.anselm ~]$ mkdir test [vop999@login1.anselm ~]$ ls -ld test @@ -417,7 +417,7 @@ Every computational node is equipped with 330GB local scratch disk. Use local scratch in case you need to access large amount of small files during your calculation. 
-[]()The local scratch disk is mounted as /lscratch and is accessible to +The local scratch disk is mounted as /lscratch and is accessible to user at /lscratch/$PBS_JOBID directory. The local scratch filesystem is intended for temporary scratch data @@ -453,7 +453,7 @@ size during your calculation. Be very careful, use of RAM disk filesystem is at the expense of operational memory. -[]()The local RAM disk is mounted as /ramdisk and is accessible to user +The local RAM disk is mounted as /ramdisk and is accessible to user at /ramdisk/$PBS_JOBID directory. The local RAM disk filesystem is intended for temporary scratch data @@ -470,9 +470,9 @@ the output data from within the jobscript. RAM disk Mountpoint -<span class="monospace">/ramdisk</span> + /ramdisk Accesspoint -<span class="monospace">/ramdisk/$PBS_JOBID</span> + /ramdisk/$PBS_JOBID Capacity 60GB at compute nodes without accelerator @@ -493,17 +493,17 @@ Each node is equipped with local /tmp directory of few GB capacity. The files in /tmp directory are automatically purged. -**Summary -** +Summary + ---------- - Mountpoint Usage Protocol Net Capacity Throughput Limitations Access Services - ------------------------------------------ --------------------------- ---------- ---------------- ------------ ------------- ------------------------- ----------------------------- - <span class="monospace">/home</span> home directory Lustre 320 TiB 2 GB/s Quota 250GB Compute and login nodes backed up - <span class="monospace">/scratch</span> cluster shared jobs' data Lustre 146 TiB 6 GB/s Quota 100TB Compute and login nodes files older 90 days removed - <span class="monospace">/lscratch</span> node local jobs' data local 330 GB 100 MB/s none Compute nodes purged after job ends - <span class="monospace">/ramdisk</span> node local jobs' data local 60, 90, 500 GB 5-50 GB/s none Compute nodes purged after job ends - <span class="monospace">/tmp</span> local temporary files local 100 MB/s none Compute and login nodes auto purged +Mountpoint Usage Protocol Net Capacity Throughput Limitations Access Services +------------------------------------------ --------------------------- ---------- ---------------- ------------ ------------- ------------------------- ----------------------------- + /home home directory Lustre 320 TiB 2 GB/s Quota 250GB Compute and login nodes backed up + /scratch cluster shared jobs' data Lustre 146 TiB 6 GB/s Quota 100TB Compute and login nodes files older 90 days removed + /lscratch node local jobs' data local 330 GB 100 MB/s none Compute nodes purged after job ends + /ramdisk node local jobs' data local 60, 90, 500 GB 5-50 GB/s none Compute nodes purged after job ends + /tmp local temporary files local 100 MB/s none Compute and login nodes auto purged  diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/c6d69ffe-da75-4cb6-972a-0cf4c686b6e1.jpeg b/converted/docs.it4i.cz/anselm-cluster-documentation/successfullinstalation.jpeg similarity index 100% rename from converted/docs.it4i.cz/anselm-cluster-documentation/c6d69ffe-da75-4cb6-972a-0cf4c686b6e1.jpeg rename to converted/docs.it4i.cz/anselm-cluster-documentation/successfullinstalation.jpeg diff --git a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/cygwin-and-x11-forwarding.md b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/cygwin-and-x11-forwarding.md index e652cf79cf02d49b996fb0c12c72d1b78a2624ec..234b2cc0b0b62de3146151d5116105b8471a6c7a 100644 --- 
a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/cygwin-and-x11-forwarding.md +++ b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/cygwin-and-x11-forwarding.md @@ -1,44 +1,44 @@ Cygwin and X11 forwarding ========================= -### If <span style="text-align: left; float: none; ">no able to forward X11 using PuTTY to CygwinX</span> +### If no able to forward X11 using PuTTY to CygwinX -``` +``` [usename@login1.anselm ~]$ gnome-session & [1] 23691 [usename@login1.anselm ~]$ PuTTY X11 proxy: unable to connect to forwarded X server: Network error: Connection refused PuTTY X11 proxy: unable to connect to forwarded X server: Network error: Connection refused -** (gnome-session:23691): WARNING **: Cannot open display: + (gnome-session:23691): WARNING **: Cannot open display: ``` -<span style="text-align: left; float: none; "> </span> - -1. <span style="text-align: left; float: none; ">Locate and modify - <span style="text-align: left; float: none; ">Cygwin shortcut that - uses<span - class="Apple-converted-space"> </span></span>[startxwin](http://x.cygwin.com/docs/man1/startxwin.1.html) - locate - C:cygwin64binXWin.exe - <span style="text-align: left; float: none; "><span - style="text-align: left; float: none; "><span - style="text-align: left; float: none; ">change it - to</span></span></span> - C:*cygwin64binXWin.exe -listen tcp* - -  - </span> - <span style="text-align: left; float: none; "><span - style="text-align: left; float: none; "></span></span> -2. <span style="text-align: left; float: none; "><span - style="text-align: left; float: none; ">Check Putty settings: - <span style="text-align: left; float: none; ">Enable X11 - forwarding</span><span style="text-align: left; float: none; "><span - style="text-align: left; float: none; "></span><span - class="Apple-converted-space"> - </span></span> - [](cygwin-and-x11-forwarding.html) - </span></span> + + +1. Locate and modify + Cygwin shortcut that + uses +  [startxwin](http://x.cygwin.com/docs/man1/startxwin.1.html) + locate + C:cygwin64binXWin.exe + + + change it + + C:*cygwin64binXWin.exe -listen tcp* + +  + + + +2. 
+ Check Putty settings: + Enable X11 + forwarding + + + +  +  diff --git a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/0f5b58e3-253c-4f87-a3b2-16f75cbf090f.png b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/cygwinX11forwarding.png similarity index 100% rename from converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/0f5b58e3-253c-4f87-a3b2-16f75cbf090f.png rename to converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/cygwinX11forwarding.png diff --git a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/8e80a92f-f691-4d92-8e62-344128dcc00b.png b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/gdmscreensaver.png similarity index 100% rename from converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/8e80a92f-f691-4d92-8e62-344128dcc00b.png rename to converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/gdmscreensaver.png diff --git a/converted/docs.it4i.cz/salomon/7758b792-24eb-48dc-bf72-618cda100fda.png b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/gnome_screen.png similarity index 100% rename from converted/docs.it4i.cz/salomon/7758b792-24eb-48dc-bf72-618cda100fda.png rename to converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/gnome_screen.png diff --git a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/graphical-user-interface.md b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/graphical-user-interface.md index 7f4f4a334cb3f93e163e5dfd479ed916b8a37db7..e51a82cddb0ea5efdd5f6abca11fdee84745000a 100644 --- a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/graphical-user-interface.md +++ b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/graphical-user-interface.md @@ -3,7 +3,7 @@ Graphical User Interface - + X Window System --------------- @@ -17,16 +17,16 @@ System**](x-window-system/x-window-and-vnc.html). VNC --- -The **Virtual Network Computing** (**VNC**) is a graphical <span +The **Virtual Network Computing** (**VNC**) is a graphical class="link-external">[desktop -sharing](http://en.wikipedia.org/wiki/Desktop_sharing "Desktop sharing")</span> -system that uses the <span class="link-external">[Remote Frame Buffer +sharing](http://en.wikipedia.org/wiki/Desktop_sharing "Desktop sharing") +system that uses the class="link-external">[Remote Frame Buffer protocol -(RFB)](http://en.wikipedia.org/wiki/RFB_protocol "RFB protocol")</span> -to remotely control another <span -class="link-external">[computer](http://en.wikipedia.org/wiki/Computer "Computer")</span>. +(RFB)](http://en.wikipedia.org/wiki/RFB_protocol "RFB protocol") +to remotely control another +class="link-external">[computer](http://en.wikipedia.org/wiki/Computer "Computer"). Read more about configuring -**[VNC](../../../salomon/accessing-the-cluster/graphical-user-interface/vnc.html)**. +[VNC](../../../salomon/accessing-the-cluster/graphical-user-interface/vnc.html)**. 
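The VNC guide that follows walks through each step in detail; as a compact sketch of the whole workflow, run the first command on the login node and the other two on your workstation (display :61, port 5961 and node login2 are only the example values used in that guide -- pick any free display of your own):

```
[username@login2 ~]$ vncserver :61 -geometry 1600x900 -depth 16        # start Xvnc on display :61 (TCP port 5900+61)
local $ ssh -TN -f username@login2.cluster-name.it4i.cz -L 5961:localhost:5961   # tunnel the VNC port to your workstation
local $ vncviewer 127.0.0.1:5961                                       # connect through the tunnel using your VNC password
```

Remember to kill both the Xvnc server and the ssh tunnel when you are done, as described in the sections below.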
diff --git a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc.md b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc.md index bea668234e17b946c2363a5dc035332ff6ebedc0..4c14334193b53458451663a8848123bdb7117f16 100644 --- a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc.md +++ b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc.md @@ -3,7 +3,7 @@ VNC - + The **Virtual Network Computing** (**VNC**) is a graphical [desktop sharing](http://en.wikipedia.org/wiki/Desktop_sharing "Desktop sharing") @@ -18,7 +18,7 @@ and events from one computer to another, relaying the graphical [screen](http://en.wikipedia.org/wiki/Computer_screen "Computer screen") updates back in the other direction, over a -[network](http://en.wikipedia.org/wiki/Computer_network "Computer network").^[<span>[</span>1<span>]</span>](http://en.wikipedia.org/wiki/Virtual_Network_Computing#cite_note-1)^ +[network](http://en.wikipedia.org/wiki/Computer_network "Computer network").(http://en.wikipedia.org/wiki/Virtual_Network_Computing#cite_note-1)^ The recommended clients are [TightVNC](http://www.tightvnc.com) or @@ -31,7 +31,7 @@ Create VNC password Local VNC password should be set before the first login. Do use a strong password. -``` +``` [username@login2 ~]$ vncpasswd Password: Verify: @@ -46,10 +46,10 @@ using SSH port forwarding must be established. for the details on SSH tunnels. In this example we use port 61. You can find ports which are already occupied. Here you can see that -ports "<span class="pln">/usr/bin/Xvnc :79"</span> and "<span -class="pln">/usr/bin/Xvnc :60" are occupied.</span> +ports "/usr/bin/Xvnc :79" and " +/usr/bin/Xvnc :60" are occupied. -``` +``` [username@login2 ~]$ ps aux | grep Xvnc username   5971 0.0 0.0 201072 92564 ?       SN  Sep22  4:19 /usr/bin/Xvnc :79 -desktop login2:79 (username) -auth /home/gre196/.Xauthority -geometry 1024x768 -rfbwait 30000 -rfbauth /home/username/.vnc/passwd -rfbport 5979 -fp catalogue:/etc/X11/fontpath.d -pn username   10296 0.0 0.0 131772 21076 pts/29  SN  13:01  0:01 /usr/bin/Xvnc :60 -desktop login2:61 (username) -auth /home/username/.Xauthority -geometry 1600x900 -depth 16 -rfbwait 30000 -rfbauth /home/jir13/.vnc/passwd -rfbport 5960 -fp catalogue:/etc/X11/fontpath.d -pn @@ -58,7 +58,7 @@ username   10296 0.0 0.0 131772 21076 pts/29  SN  13:01  0:01 / Choose free port e.g. 61 and start your VNC server: -``` +``` [username@login2 ~]$ vncserver :61 -geometry 1600x900 -depth 16 New 'login2:1 (username)' desktop is login2:1 @@ -69,7 +69,7 @@ Log file is /home/username/.vnc/login2:1.log Check if VNC server is started on the port (in this example 61): -``` +``` [username@login2 .vnc]$ vncserver -list TigerVNC server sessions: @@ -78,9 +78,9 @@ X DISPLAY #    PROCESS ID :61             18437 ``` -Another command:<span class="pln"></span> +Another command: -``` +``` [username@login2 .vnc]$  ps aux | grep Xvnc username   10296 0.0 0.0 131772 21076 pts/29  SN  13:01  0:01 /usr/bin/Xvnc :61 -desktop login2:61 (username) -auth /home/jir13/.Xauthority -geometry 1600x900 -depth 16 -rfbwait 30000 -rfbauth /home/username/.vnc/passwd -rfbport 5961 -fp catalogue:/etc/X11/fontpath.d -pn @@ -94,11 +94,11 @@ The tunnel must point to the same login node where you launched the VNC server, eg. login2. 
If you use just cluster-name.it4i.cz, the tunnel might point to a different node due to DNS round robin. -### []()[]()Linux/Mac OS example of creating a tunnel +###Linux/Mac OS example of creating a tunnel At your machine, create the tunnel: -``` +``` local $ ssh -TN -f username@login2.cluster-name.it4i.cz -L 5961:localhost:5961 ``` @@ -106,7 +106,7 @@ Issue the following command to check the tunnel is established (please note the PID 2022 in the last column, you'll need it for closing the tunnel): -``` +``` local $ netstat -natp | grep 5961 (Not all processes could be identified, non-owned process info  will not be shown, you would have to be root to see it all.) @@ -116,14 +116,14 @@ tcp6      0     0 ::1:5961               :::*   Or on Mac OS use this command: -``` +``` local-mac $ lsof -n -i4TCP:5961 | grep LISTEN ssh 75890 sta545 7u IPv4 0xfb062b5c15a56a3b 0t0 TCP 127.0.0.1:5961 (LISTEN) ``` Connect with the VNC client: -``` +``` local $ vncviewer 127.0.0.1:5961 ``` @@ -135,7 +135,7 @@ You have to destroy the SSH tunnel which is still running at the background after you finish the work. Use the following command (PID 2022 in this case, see the netstat command above): -``` +``` kill 2022 ``` @@ -147,9 +147,9 @@ Start vncserver using command vncserver described above. Search for the localhost and port number (in this case 127.0.0.1:5961).** -** -``` + +``` [username@login2 .vnc]$ netstat -tanp | grep Xvnc (Not all processes could be identified, non-owned process info  will not be shown, you would have to be root to see it all.) @@ -162,16 +162,16 @@ to set up the tunnel. Fill the Source port and Destination fields. **Do not forget to click the Add button**. -[](putty-tunnel.png) + + Run the VNC client of your choice, select VNC server 127.0.0.1, port 5961 and connect using VNC password. ### Example of starting TigerVNC viewer - + + In this example, we connect to VNC server on port 5961, via the ssh tunnel, using TigerVNC viewer. The connection is encrypted and secured. @@ -183,35 +183,35 @@ pixels. Use your VNC password to log using TightVNC Viewer and start a Gnome Session on the login node. -[****](TightVNC_login.png) + Gnome session ------------- You should see after the successful login. -[](https://docs.it4i.cz/get-started-with-it4innovations/gnome_screen.jpg) -### **Disable your Gnome session screensaver -** + + +###Disable your Gnome session screensaver + Open Screensaver preferences dialog: -[](../../../../salomon/gnome_screen.jpg.1) + + Uncheck both options below the slider: -[](gdmdisablescreensaver.png) + + ### Kill screensaver if locked screen If the screen gets locked you have to kill the screensaver. Do not to forget to disable the screensaver then. -``` +``` [username@login2 .vnc]$ ps aux | grep screen username    1503 0.0 0.0 103244  892 pts/4   S+  14:37  0:00 grep screen username    24316 0.0 0.0 270564 3528 ?       Ss  14:12  0:00 gnome-screensaver @@ -223,7 +223,7 @@ username    24316 0.0 0.0 270564 3528 ?       
Ss  14:12 You should kill your VNC server using command: -``` +``` [username@login2 .vnc]$ vncserver -kill :61 Killing Xvnc process ID 7074 Xvnc process ID 7074 already killed @@ -231,7 +231,7 @@ Xvnc process ID 7074 already killed Or this way: -``` +``` [username@login2 .vnc]$ pkill vnc ``` @@ -241,16 +241,16 @@ GUI applications on compute nodes over VNC The very [same methods as described above](https://docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-and-vnc#gui-applications-on-compute-nodes), may be used to run the GUI applications on compute nodes. However, for -**maximum performance**, proceed following these steps: +maximum performance**, proceed following these steps: Open a Terminal (Applications -> System Tools -> Terminal). Run all the next commands in the terminal. -[](gnome-terminal.png) + Allow incoming X11 graphics from the compute nodes at the login node: -``` +``` $ xhost + ``` @@ -261,19 +261,19 @@ Use the **-v DISPLAY** option to propagate the DISPLAY on the compute node. In this example, we want a complete node (24 cores in this example) from the production queue: -``` +``` $ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A PROJECT_ID -q qprod -l select=1:ncpus=24 ``` Test that the DISPLAY redirection into your VNC session works, by running a X11 application (e. g. XTerm) on the assigned compute node: -``` +``` $ xterm ``` Example described above: -[](gnome-compute-nodes-over-vnc.png) + diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/bb4cedff-4cb6-402b-ac79-039186fe5df3.png b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vncviewer.png similarity index 100% rename from converted/docs.it4i.cz/anselm-cluster-documentation/bb4cedff-4cb6-402b-ac79-039186fe5df3.png rename to converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vncviewer.png diff --git a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/vpnuiV.png b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vpnuiV.png similarity index 100% rename from converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/vpnuiV.png rename to converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vpnuiV.png diff --git a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system.md b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system.md index 7762da1c72ebb5285e1362ed5ba29c3b5849454a..f3bf43d32c515b1f0a046c79adfaba0fb247d291 100644 --- a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system.md +++ b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system.md @@ -3,7 +3,7 @@ X Window System - + The X Window system is a principal way to get GUI access to the clusters. The **X Window System** (commonly known as **X11**, based on @@ -25,7 +25,7 @@ In order to display graphical user interface GUI of various software tools, you need to enable the X display forwarding. 
On Linux and Mac, log in using the -X option tho ssh client: -``` +``` local $ ssh -X username@cluster-name.it4i.cz ``` @@ -37,13 +37,13 @@ checkbox before logging in. Then log in as usual. To verify the forwarding, type -``` +``` $ echo $DISPLAY ``` if you receive something like -``` +``` localhost:10.0 ``` @@ -88,7 +88,7 @@ stability and full features we recommend the <td align="left"><p><a href="http://x.cygwin.com/" class="external-link">Install Cygwin</a></p> <p>Find and execute XWin.exe<br /> to start the X server on Windows desktop computer.</p> -<p><a href="x-window-system/cygwin-and-x11-forwarding.html" class="internal-link">If no able to forward X11 using PuTTY to CygwinX</a></p></td> +<p><a href="x-window-system/cygwin-and-x11-forwarding.html" If no able to forward X11 using PuTTY to CygwinX</a></p></td> <td align="left"><p>Use Xlaunch to configure the Xming.</p> <p>Run Xming<br /> to start the X server on Windows desktop computer.</p></td> @@ -106,23 +106,23 @@ Make sure that X forwarding is activated and the X server is running. Then launch the application as usual. Use the & to run the application in background. -``` +``` $ module load intel (idb and gvim not installed yet) $ gvim & ``` -``` +``` $ xterm ``` In this example, we activate the intel programing environment tools, then start the graphical gvim editor. -### []()GUI Applications on Compute Nodes +### GUI Applications on Compute Nodes Allocate the compute nodes using -X option on the qsub command -``` +``` $ qsub -q qexp -l select=2:ncpus=24 -X -I ``` @@ -130,10 +130,10 @@ In this example, we allocate 2 nodes via qexp queue, interactively. We request X11 forwarding with the -X option. It will be possible to run the GUI enabled applications directly on the first compute node. -**Better performance** is obtained by logging on the allocated compute +Better performance** is obtained by logging on the allocated compute node via ssh, using the -X option. -``` +``` $ ssh -X r24u35n680 ``` @@ -151,38 +151,38 @@ environment. ### Gnome on Linux and OS X To run the remote Gnome session in a window on Linux/OS X computer, you -need to install Xephyr. Ubuntu package is <span -class="monospace">xserver-xephyr</span>, on OS X it is part of +need to install Xephyr. Ubuntu package is +xserver-xephyr, on OS X it is part of [XQuartz](http://xquartz.macosforge.org/landing/). First, launch Xephyr on local machine: -``` +``` local $ Xephyr -ac -screen 1024x768 -br -reset -terminate :1 & ``` This will open a new X window with size 1024x768 at DISPLAY :1. Next, ssh to the cluster with DISPLAY environment variable set and launch -<span class="monospace">gnome-session</span> + gnome-session - local $ DISPLAY=:1.0 ssh -XC yourname@cluster-name.it4i.cz -i ~/.ssh/path_to_your_key - ... cluster-name MOTD... - yourname@login1.cluster-namen.it4i.cz $ gnome-session & + local $ DISPLAY=:1.0 ssh -XC yourname@cluster-name.it4i.cz -i ~/.ssh/path_to_your_key + ... cluster-name MOTD... + yourname@login1.cluster-namen.it4i.cz $ gnome-session & On older systems where Xephyr is not available, you may also try Xnest instead of Xephyr. 
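A minimal sketch of that Xnest fallback (assuming the Xnest binary is installed; it accepts broadly the same options as Xephyr):

```
local $ Xnest :1 -geometry 1024x768 &
local $ DISPLAY=:1.0 ssh -XC yourname@cluster-name.it4i.cz -i ~/.ssh/path_to_your_key
yourname@login1.cluster-name.it4i.cz $ gnome-session &
```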
Another option is to launch a new X server in a separate console, via: -``` +``` xinit /usr/bin/ssh -XT -i .ssh/path_to_your_key yourname@cluster-namen.it4i.cz gnome-session -- :1 vt12 ``` However this method does not seem to work with recent Linux -distributions and you will need to manually source <span -class="monospace">/etc/profile</span> to properly set environment +distributions and you will need to manually source +/etc/profile to properly set environment variables for PBS. -### Gnome on Windows** -** +### Gnome on Windows + Use Xlaunch to start the Xming server or run the XWin.exe. Select the ''One window" mode. @@ -190,7 +190,7 @@ Use Xlaunch to start the Xming server or run the XWin.exe. Select the Log in to the cluster, using PuTTY. On the cluster, run the gnome-session command. -``` +``` $ gnome-session & ``` diff --git a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/pageant.md b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/pageant.md index d6e3c3b8dc36814ddfbfa237411d0299c9975285..d6a8cf8081544656ca5ed20ed4d41b109a14e534 100644 --- a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/pageant.md +++ b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/pageant.md @@ -3,18 +3,18 @@ Pageant SSH agent - + Pageant holds your private key in memory without needing to retype a passphrase on every login. -- Run Pageant. -- On Pageant Key List press *Add key* and select your private - key (id_rsa.ppk). -- Enter your passphrase. -- Now you have your private key in memory without needing to retype a - passphrase on every login. - - [](PageantV.png) +- Run Pageant. +- On Pageant Key List press *Add key* and select your private + key (id_rsa.ppk). +- Enter your passphrase. +- Now you have your private key in memory without needing to retype a + passphrase on every login. + +   diff --git a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/putty.md b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/putty.md index 31f03e8a30ef0775c7cd9bd915a8d1b57a8baa61..03e17904173f4d295377ea8d3bc7d644fb185ae7 100644 --- a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/putty.md +++ b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/putty.md @@ -3,21 +3,21 @@ PuTTY - -PuTTY -<span class="Apple-converted-space"> </span>before we start SSH connection ssh-connection style="text-align: start; "} + +PuTTY - before we start SSH connection ssh-connection style="text-align: start; "} --------------------------------------------------------------------------------- ### Windows PuTTY Installer We recommned you to download "**A Windows installer for everything except PuTTYtel**" with ***Pageant*** (SSH authentication agent) and -**PuTTYgen** (PuTTY key generator) which is available +PuTTYgen** (PuTTY key generator) which is available [here](http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html). -<span class="internal-link">After installation you can proceed directly + After installation you can proceed directly to private keys authentication using -["Putty"](putty.html#putty).</span> +["Putty"](putty.html#putty). 
"Change Password for Existing Private Key" is optional. "Generate a New Public/Private key pair" is intended for users without Public/Private key in the initial email containing login credentials. @@ -36,57 +36,57 @@ if needed. without needing to retype a passphrase on every login. We recommend its usage. -[]()PuTTY - how to connect to the IT4Innovations cluster +PuTTY - how to connect to the IT4Innovations cluster -------------------------------------------------------- -- Run PuTTY -- Enter Host name and Save session fields with [Login - address](../../../../salomon/accessing-the-cluster/shell-and-data-access/shell-and-data-access.html) - and browse Connection - > SSH -> Auth menu. - The *Host Name* input may be in the format - **"username@clustername.it4i.cz"** so you don't have to type your - login each time. - In this example we will connect to the Salomon cluster using -  **"salomon.it4i.cz"**. - - [](PuTTY_host_Salomon.png) +- Run PuTTY +- Enter Host name and Save session fields with [Login + address](../../../../salomon/accessing-the-cluster/shell-and-data-access/shell-and-data-access.html) + and browse Connection - > SSH -> Auth menu. + The *Host Name* input may be in the format + **"username@clustername.it4i.cz"** so you don't have to type your + login each time. + In this example we will connect to the Salomon cluster using +  **"salomon.it4i.cz"**. + +   -- Category -> Connection - > SSH -> Auth: - Select Attempt authentication using Pageant. - Select Allow agent forwarding. - Browse and select your [private - key](../ssh-keys.html) file. - - [](PuTTY_keyV.png) - -- Return to Session page and Save selected configuration with *Save* - button. - - [](PuTTY_save_Salomon.png) - -- Now you can log in using *Open* button. - - [](PuTTY_open_Salomon.png) - -- Enter your username if the *Host Name* input is not in the format - "username@salomon.it4i.cz". - -- Enter passphrase for selected [private - key](../ssh-keys.html) file if Pageant ****SSH - authentication agent is not used. - +- Category -> Connection - > SSH -> Auth: + Select Attempt authentication using Pageant. + Select Allow agent forwarding. + Browse and select your [private + key](../ssh-keys.html) file. + +  + +- Return to Session page and Save selected configuration with *Save* + button. + +  + +- Now you can log in using *Open* button. + +  + +- Enter your username if the *Host Name* input is not in the format + "username@salomon.it4i.cz". + +- Enter passphrase for selected [private + key](../ssh-keys.html) file if Pageant ****SSH + authentication agent is not used. + Another PuTTY Settings ---------------------- -- Category -> Windows -> Translation -> Remote character set - and select **UTF-8**. +- Category -> Windows -> Translation -> Remote character set + and select **UTF-8**. <!-- --> -- Category -> Terminal -> Features and select **Disable - application keypad mode** (enable numpad) -- Save your configuration on Session page in to Default Settings with - *Save* button . +- Category -> Terminal -> Features and select **Disable + application keypad mode** (enable numpad) +- Save your configuration on Session page in to Default Settings with + *Save* button . 
diff --git a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/puttygen.md b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/puttygen.md index 628cc640cf44ece3e7850b65442fb52fc77f66fc..8bc7590212421c6ee323e47d87b16b02a89c518f 100644 --- a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/puttygen.md +++ b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/puttygen.md @@ -3,7 +3,7 @@ PuTTY key generator - + PuTTYgen is the PuTTY key generator. You can load in an existing private key and change your passphrase or generate a new public/private key @@ -14,14 +14,14 @@ pair. You can change the password of your SSH key with "PuTTY Key Generator". Make sure to backup the key. -- Load your [private key](../ssh-keys.html) file with - *Load* button. -- Enter your current passphrase. -- Change key passphrase. -- Confirm key passphrase. -- Save your private key with *Save private key* button. - - [](PuttyKeygeneratorV.png) +- Load your [private key](../ssh-keys.html) file with + *Load* button. +- Enter your current passphrase. +- Change key passphrase. +- Confirm key passphrase. +- Save your private key with *Save private key* button. + +   @@ -29,43 +29,43 @@ Make sure to backup the key. You can generate an additional public/private key pair and insert public key into authorized_keys file for authentication with your own private -<span>key. </span> +>key. -- Start with *Generate* button. - - [](PuttyKeygenerator_001V.png) - -- Generate some randomness. - - [](PuttyKeygenerator_002V.png) - -- Wait. - - [](20150312_143443.png) - -- Enter a *comment* for your key using format - 'username@organization.example.com'. - Enter key passphrase. - Confirm key passphrase. - Save your new private key `in "*.ppk" `format with *Save private - key* button. - - [](PuttyKeygenerator_004V.png) - -- Save the public key with *Save public key* button. - You can copy public key out of the â€Public key for pasting into - authorized_keys file’ box. - - [](PuttyKeygenerator_005V.png) - -- Export private key in OpenSSH format "id_rsa" using Conversion - -> Export OpenSSH key - - [](PuttyKeygenerator_006V.png) - -- Now you can insert additional public key into authorized_keys file - for authentication with your own private <span>key. - You must log in using ssh key received after registration. Then - proceed to [How to add your own - key](../ssh-keys.html). - </span> +- Start with *Generate* button. + +  + +- Generate some randomness. + +  + +- Wait. + +  + +- Enter a *comment* for your key using format + 'username@organization.example.com'. + Enter key passphrase. + Confirm key passphrase. + Save your new private key `in "*.ppk" `format with *Save private + key* button. + +  + +- Save the public key with *Save public key* button. + You can copy public key out of the â€Public key for pasting into + authorized_keys file’ box. + +  + +- Export private key in OpenSSH format "id_rsa" using Conversion + -> Export OpenSSH key + +  + +- Now you can insert additional public key into authorized_keys file + for authentication with your own private >key. + You must log in using ssh key received after registration. Then + proceed to [How to add your own + key](../ssh-keys.html). 
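The last step above (Conversion -> Export OpenSSH key) can also be done without the GUI; a sketch, assuming the command-line build of PuTTYgen (shipped on Linux in the putty-tools package) is installed:

```
local $ puttygen id_rsa.ppk -O private-openssh -o id_rsa
local $ chmod 600 id_rsa
```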
+ diff --git a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md index e3ad6a6fe4c47c82baf1f5efb92c81cf6a3f59a9..5e87ffbd430137384a21b506e0d29108983536b2 100644 --- a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md +++ b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.md @@ -3,64 +3,64 @@ SSH keys - -<span id="Key_management" class="mw-headline">Key management</span> + + ------------------------------------------------------------------- After logging in, you can see .ssh/ directory with SSH keys and authorized_keys file: - $ cd /home/username/ - $ ls -la .ssh/ - total 24 - drwx------ 2 username username 4096 May 13 15:12 . - drwxr-x---22 username username 4096 May 13 07:22 .. - -rw-r--r-- 1 username username 392 May 21 2014 authorized_keys - -rw------- 1 username username 1675 May 21 2014 id_rsa - -rw------- 1 username username 1460 May 21 2014 id_rsa.ppk - -rw-r--r-- 1 username username 392 May 21 2014 id_rsa.pub - -<span class="visualHighlight"></span>Please note that private keys in + $ cd /home/username/ + $ ls -la .ssh/ + total 24 + drwx------ 2 username username 4096 May 13 15:12 . + drwxr-x---22 username username 4096 May 13 07:22 .. + -rw-r--r-- 1 username username 392 May 21 2014 authorized_keys + -rw------- 1 username username 1675 May 21 2014 id_rsa + -rw------- 1 username username 1460 May 21 2014 id_rsa.ppk + -rw-r--r-- 1 username username 392 May 21 2014 id_rsa.pub + + class="visualHighlight">Please note that private keys in .ssh directory are without passphrase and allow you to connect within the cluster. 
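A quick way to verify this (a sketch, assuming you are currently logged in on login2): hopping to another login node should complete without any password or passphrase prompt.

```
[username@login2 ~]$ ssh login1 hostname       # prints the remote hostname; no password prompt should appear
```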
### Access privileges on .ssh folder -- `.ssh`<span - style="text-align: left; float: none; "><span - class="Apple-converted-space"> </span>directory:</span> <span - style="text-align: left; float: none; ">700 (drwx------)</span> -- <span - style="text-align: left; float: none; ">Authorized_keys, <span - style="text-align: left; float: none; "><span - style="text-align: left; float: none; "><span - class="Apple-converted-space">known_hosts</span></span></span> and - <span style="text-align: left; float: none; ">public key <span - style="text-align: left; float: none; ">(</span>`.pub`<span - style="text-align: left; float: none; "><span - class="Apple-converted-space"> </span>file): <span - class="Apple-converted-space"> - </span></span>`644 (-rw-r--r--)`</span></span> -- <span style="text-align: left; float: none; "><span - style="text-align: left; float: none; ">``<span - class="Apple-converted-space"> <span - style="text-align: left; float: none; ">Private key - (</span>`id_rsa/id_rsa.ppk`<span - style="text-align: left; float: none; ">): <span - class="Apple-converted-space"></span></span>`600 (-rw-------)` - </span></span> - </span> +- `.ssh` + +  directory: + 700 (drwx------) +- + Authorized_keys, + + + known_hosts</span> and + public key + (`.pub` + +  file): + + `644 (-rw-r--r--)`</span></span> +- + `` + + Private key + (`id_rsa/id_rsa.ppk` + ): + `600 (-rw-------)` + + <!-- --> - cd /home/username/ - chmod 700 .ssh/ - chmod 644 .ssh/authorized_keys - chmod 644 .ssh/id_rsa.pub - chmod 644 .ssh/known_hosts - chmod 600 .ssh/id_rsa - chmod 600 .ssh/id_rsa.ppk + cd /home/username/ + chmod 700 .ssh/ + chmod 644 .ssh/authorized_keys + chmod 644 .ssh/id_rsa.pub + chmod 644 .ssh/known_hosts + chmod 600 .ssh/id_rsa + chmod 600 .ssh/id_rsa.ppk Private key ----------- @@ -68,75 +68,75 @@ Private key The path to a private key is usually /home/username/.ssh/ Private key file in "id_rsa" or `"*.ppk" `format is used to -authenticate with the servers. <span -style="text-align: start; float: none; ">Private key is present locally -on local side and used for example in SSH agent [Pageant (for Windows -users)](putty/PageantV.png). 
The private key should -always be kept in a safe place.</span> - -<span style="text-align: start; float: none; ">An example of private key -format:</span> - - -----BEGIN RSA PRIVATE KEY----- - MIIEpAIBAAKCAQEAqbo7jokygnBpG2wYa5NB45ns6+UKTNLMLHF0BO3zmRtKEElE - aGqXfbYwvXlcuRb2d9/Y5dVpCZHV0kbY3NhtVOcEIe+1ROaiU9BEsUAhMNEvgiLV - gSql4QvRO4BWPlM8+WAWXDp3oeoBh8glXyuh9teb8yq98fv1r1peYGRrW3/s4V+q - O1SQ0XY2T7rWCYRLIP6rTMXArTI35v3WU513mn7nm1fJ7oN0QgVH5b0W9V1Kyc4l - 9vILHeMXxvz+i/5jTEfLOJpiRGYZYcaYrE4dIiHPl3IlbV7hlkK23Xb1US8QJr5G - ADxp1VTkHjY+mKagEfxl1hQIb42JLHhKMEGqNQIDAQABAoIBAQCkypPuxZjL+vai - UGa5dAWiRZ46P2yrwHPKpvEdpCdDPbLAc1K/CtdBkHZsUPxNHVV6eFWweW99giIY - Av+mFWC58X8asBHQ7xkmxW0cqAZRzpkRAl9IBS9/fKjO28Fgy/p+suOi8oWbKIgJ - 3LMkX0nnT9oz1AkOfTNC6Tv+3SE7eTj1RPcMjur4W1Cd1N3EljLszdVk4tLxlXBS - yl9NzVnJJbJR4t01l45VfFECgYEAno1WJSB/SwdZvS9GkfhvmZd3r4vyV9Bmo3dn - XZAh8HRW13imOnpklDR4FRe98D9A7V3yh9h60Co4oAUd6N+Oc68/qnv/8O9efA+M - /neI9ANYFo8F0+yFCp4Duj7zPV3aWlN/pd8TNzLqecqh10uZNMy8rAjCxybeZjWd - DyhgywXhAoGBAN3BCazNefYpLbpBQzwes+f2oStvwOYKDqySWsYVXeVgUI+OWTVZ - eZ26Y86E8MQO+q0TIxpwou+TEaUgOSqCX40Q37rGSl9K+rjnboJBYNCmwVp9bfyj - kCLL/3g57nTSqhgHNa1xwemePvgNdn6FZteA8sXiCg5ZzaISqWAffek5AoGBAMPw - V/vwQ96C8E3l1cH5cUbmBCCcfXM2GLv74bb1V3SvCiAKgOrZ8gEgUiQ0+TfcbAbe - 7MM20vRNQjaLTBpai/BTbmqM1Q+r1KNjq8k5bfTdAoGANgzlNM9omM10rd9WagL5 - yuJcal/03p048mtB4OI4Xr5ZJISHze8fK4jQ5veUT9Vu2Fy/w6QMsuRf+qWeCXR5 - RPC2H0JzkS+2uZp8BOHk1iDPqbxWXJE9I57CxBV9C/tfzo2IhtOOcuJ4LY+sw+y/ - ocKpJbdLTWrTLdqLHwicdn8OxeWot1mOukyK2l0UeDkY6H5pYPtHTpAZvRBd7ETL - Zs2RP3KFFvho6aIDGrY0wee740/jWotx7fbxxKwPyDRsbH3+1Wx/eX2RND4OGdkH - gejJEzpk/7y/P/hCad7bSDdHZwO+Z03HIRC0E8yQz+JYatrqckaRCtd7cXryTmTR - FbvLJmECgYBDpfno2CzcFJCTdNBZFi34oJRiDb+HdESXepk58PcNcgK3R8PXf+au - OqDBtZIuFv9U1WAg0gzGwt/0Y9u2c8m0nXziUS6AePxy5sBHs7g9C9WeZRz/nCWK - +cHIm7XOwBEzDKz5f9eBqRGipm0skDZNKl8X/5QMTT5K3Eci2n+lTw== - -----END RSA PRIVATE KEY----- +authenticate with the servers. +Private key is present locally +on local side and used for example in SSH agent +. The private key should +always be kept in a safe place. 
+ + An example of private key +format: + + -----BEGIN RSA PRIVATE KEY----- + MIIEpAIBAAKCAQEAqbo7jokygnBpG2wYa5NB45ns6+UKTNLMLHF0BO3zmRtKEElE + aGqXfbYwvXlcuRb2d9/Y5dVpCZHV0kbY3NhtVOcEIe+1ROaiU9BEsUAhMNEvgiLV + gSql4QvRO4BWPlM8+WAWXDp3oeoBh8glXyuh9teb8yq98fv1r1peYGRrW3/s4V+q + O1SQ0XY2T7rWCYRLIP6rTMXArTI35v3WU513mn7nm1fJ7oN0QgVH5b0W9V1Kyc4l + 9vILHeMXxvz+i/5jTEfLOJpiRGYZYcaYrE4dIiHPl3IlbV7hlkK23Xb1US8QJr5G + ADxp1VTkHjY+mKagEfxl1hQIb42JLHhKMEGqNQIDAQABAoIBAQCkypPuxZjL+vai + UGa5dAWiRZ46P2yrwHPKpvEdpCdDPbLAc1K/CtdBkHZsUPxNHVV6eFWweW99giIY + Av+mFWC58X8asBHQ7xkmxW0cqAZRzpkRAl9IBS9/fKjO28Fgy/p+suOi8oWbKIgJ + 3LMkX0nnT9oz1AkOfTNC6Tv+3SE7eTj1RPcMjur4W1Cd1N3EljLszdVk4tLxlXBS + yl9NzVnJJbJR4t01l45VfFECgYEAno1WJSB/SwdZvS9GkfhvmZd3r4vyV9Bmo3dn + XZAh8HRW13imOnpklDR4FRe98D9A7V3yh9h60Co4oAUd6N+Oc68/qnv/8O9efA+M + /neI9ANYFo8F0+yFCp4Duj7zPV3aWlN/pd8TNzLqecqh10uZNMy8rAjCxybeZjWd + DyhgywXhAoGBAN3BCazNefYpLbpBQzwes+f2oStvwOYKDqySWsYVXeVgUI+OWTVZ + eZ26Y86E8MQO+q0TIxpwou+TEaUgOSqCX40Q37rGSl9K+rjnboJBYNCmwVp9bfyj + kCLL/3g57nTSqhgHNa1xwemePvgNdn6FZteA8sXiCg5ZzaISqWAffek5AoGBAMPw + V/vwQ96C8E3l1cH5cUbmBCCcfXM2GLv74bb1V3SvCiAKgOrZ8gEgUiQ0+TfcbAbe + 7MM20vRNQjaLTBpai/BTbmqM1Q+r1KNjq8k5bfTdAoGANgzlNM9omM10rd9WagL5 + yuJcal/03p048mtB4OI4Xr5ZJISHze8fK4jQ5veUT9Vu2Fy/w6QMsuRf+qWeCXR5 + RPC2H0JzkS+2uZp8BOHk1iDPqbxWXJE9I57CxBV9C/tfzo2IhtOOcuJ4LY+sw+y/ + ocKpJbdLTWrTLdqLHwicdn8OxeWot1mOukyK2l0UeDkY6H5pYPtHTpAZvRBd7ETL + Zs2RP3KFFvho6aIDGrY0wee740/jWotx7fbxxKwPyDRsbH3+1Wx/eX2RND4OGdkH + gejJEzpk/7y/P/hCad7bSDdHZwO+Z03HIRC0E8yQz+JYatrqckaRCtd7cXryTmTR + FbvLJmECgYBDpfno2CzcFJCTdNBZFi34oJRiDb+HdESXepk58PcNcgK3R8PXf+au + OqDBtZIuFv9U1WAg0gzGwt/0Y9u2c8m0nXziUS6AePxy5sBHs7g9C9WeZRz/nCWK + +cHIm7XOwBEzDKz5f9eBqRGipm0skDZNKl8X/5QMTT5K3Eci2n+lTw== + -----END RSA PRIVATE KEY----- Public key ---------- -Public key file in "*.pub" format is used to <span -style="text-align: start; float: none; ">verify a<span -class="Apple-converted-space"> </span></span><span -style="text-align: start; ">digital signature</span>. Public <span -style="text-align: start; float: none; ">key is present on the remote -side and <span style="text-align: start; float: none; ">allows access to -the owner of the matching private key</span>.</span> +Public key file in "*.pub" format is used to +verify a + +digital signature. Public +key is present on the remote +side and allows access to +the owner of the matching private key. 
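If the *.pub file is ever missing, the public key can be re-derived from the private key; a minimal sketch using OpenSSH's ssh-keygen (the path is the example one used throughout this page):

```
$ ssh-keygen -y -f ~/.ssh/id_rsa > ~/.ssh/id_rsa.pub
```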
-<span style="text-align: start; float: none; ">An example of public key -format:</span> + An example of public key +format: - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCpujuOiTKCcGkbbBhrk0Hjmezr5QpM0swscXQE7fOZG0oQSURoapd9tjC9eVy5FvZ339jl1WkJkdXSRtjc2G1U5wQh77VE5qJT0ESxQCEw0S+CItWBKqXhC9E7gFY+UyP5YBZcOneh6gGHyCVfK6H215vzKr3x+/WvWl5gZGtbf+zhX6o4RJDRdjZPutYJhEsg/qtMxcCtMjfm/dZTnXeafuebV8nug3RCBUflvRb1XUrJuiX28gsd4xfG/P6L/mNMR8s4kmJEZhlhxpj8Th0iIc+XciVtXuGWQrbddcVRLxAmvkYAPGnVVOQeNj69pqAR/GXaFAhvjYkseEowQao1 username@organization.example.com + ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCpujuOiTKCcGkbbBhrk0Hjmezr5QpM0swscXQE7fOZG0oQSURoapd9tjC9eVy5FvZ339jl1WkJkdXSRtjc2G1U5wQh77VE5qJT0ESxQCEw0S+CItWBKqXhC9E7gFY+UyP5YBZcOneh6gGHyCVfK6H215vzKr3x+/WvWl5gZGtbf+zhX6o4RJDRdjZPutYJhEsg/qtMxcCtMjfm/dZTnXeafuebV8nug3RCBUflvRb1XUrJuiX28gsd4xfG/P6L/mNMR8s4kmJEZhlhxpj8Th0iIc+XciVtXuGWQrbddcVRLxAmvkYAPGnVVOQeNj69pqAR/GXaFAhvjYkseEowQao1 username@organization.example.com ### How to add your own key First, generate a new keypair of your public and private key: - local $ ssh-keygen -C 'username@organization.example.com' -f additional_key + local $ ssh-keygen -C 'username@organization.example.com' -f additional_key Please, enter **strong** **passphrase** for securing your private key. You can insert additional public key into authorized_keys file for -authentication with your own private <span>key. Additional records in -authorized_keys file must be delimited by new line</span>. Users are +authentication with your own private >key. Additional records in +authorized_keys file must be delimited by new line. Users are not advised to remove the default public key from authorized_keys file. Example: - $ cat additional_key.pub >> ~/.ssh/authorized_keys + $ cat additional_key.pub >> ~/.ssh/authorized_keys In this example, we add an additional public key, stored in file additional_key.pub into the authorized_keys. Next time we log in, we diff --git a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/vpn-connection-fail-in-win-8.1.md b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/vpn-connection-fail-in-win-8.1.md index 335f333d97ee55796469e7769ec872abbbae20c1..604bdf27ee2135e02b164baef79b514f0e58b446 100644 --- a/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/vpn-connection-fail-in-win-8.1.md +++ b/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/vpn-connection-fail-in-win-8.1.md @@ -9,18 +9,18 @@ will also impact WIndows 7 users with IE11. Windows Server 2008/2012 are also impacted by this defect, but neither is a supported OS for AnyConnect. -**Workaround:** +Workaround:** -- Close the Cisco AnyConnect Window and the taskbar mini-icon -- Right click vpnui.exe in the 'Cisco AnyConnect Secure Mobility - Client' folder. (C:Program Files (x86)CiscoCisco AnyConnect - Secure Mobility -- Client) -- Click on the 'Run compatibility troubleshooter' button -- Choose 'Try recommended settings' -- The wizard suggests Windows 8 compatibility. -- Click 'Test Program'. This will open the program. -- Close +- Close the Cisco AnyConnect Window and the taskbar mini-icon +- Right click vpnui.exe in the 'Cisco AnyConnect Secure Mobility + Client' folder. 
(C:Program Files (x86)CiscoCisco AnyConnect + Secure Mobility +- Client) +- Click on the 'Run compatibility troubleshooter' button +- Choose 'Try recommended settings' +- The wizard suggests Windows 8 compatibility. +- Click 'Test Program'. This will open the program. +- Close  diff --git a/converted/docs.it4i.cz/get-started-with-it4innovations/applying-for-resources.md b/converted/docs.it4i.cz/get-started-with-it4innovations/applying-for-resources.md index 2e9159d4e34ec27e9ab46c6d221fc6f7b560ebf8..bc922be6eeb7c554a5be15ca228eba3beae436f1 100644 --- a/converted/docs.it4i.cz/get-started-with-it4innovations/applying-for-resources.md +++ b/converted/docs.it4i.cz/get-started-with-it4innovations/applying-for-resources.md @@ -3,7 +3,7 @@ Applying for Resources - + Computational resources may be allocated by any of the following [Computing resources @@ -29,8 +29,8 @@ proposal. In the proposal, the applicants ***apply for a particular amount of core-hours*** of computational resources. The requested core-hours should be substantiated by scientific excellence of the proposal, its computational maturity and expected impacts. -<span>Proposals do undergo a scientific, technical and economic -evaluation.</span> The allocation decisions are based on this +>Proposals do undergo a scientific, technical and economic +evaluation. The allocation decisions are based on this evaluation. More information at [Computing resources allocation](http://www.it4i.cz/computing-resources-allocation/?lang=en) and [Obtaining Login diff --git a/converted/docs.it4i.cz/get-started-with-it4innovations/introduction.md b/converted/docs.it4i.cz/get-started-with-it4innovations/introduction.md index 25dac76a5dcceea843891d5837f453fbb5fb7710..faec5ef5ff7604cdee2614a1a3f131ad1edcb9dd 100644 --- a/converted/docs.it4i.cz/get-started-with-it4innovations/introduction.md +++ b/converted/docs.it4i.cz/get-started-with-it4innovations/introduction.md @@ -3,27 +3,27 @@ Documentation - + Welcome to IT4Innovations documentation pages. The IT4Innovations national supercomputing center operates supercomputers [Salomon](../salomon.html) and -[Anselm](../anselm.html). The supercomputers are [<span -class="external-link">available</span>](applying-for-resources.html) +[Anselm](../anselm.html). The supercomputers are [ +class="external-link">available](applying-for-resources.html) to academic community within the Czech Republic and Europe and industrial community worldwide. The purpose of these pages is to provide a comprehensive documentation on hardware, software and usage of the computers. -<span class="link-external"><span class="WYSIWYG_LINK">How to read the documentation</span></span> + class="link-external"> class="WYSIWYG_LINK">How to read the documentation -------------------------------------------------------------------------------------------------- -1. Read the list in the left column. Select the subject of interest. - Alternatively, use the Search box in the upper right corner. -2. Read the CONTENTS in the upper right corner. -3. Scan for all the yellow bulb call-outs on the page. -4. Read the details if still more information is needed. **Look for - examples** illustrating the concepts. +1.Read the list in the left column. Select the subject of interest. + Alternatively, use the Search box in the upper right corner. +2.Read the CONTENTS in the upper right corner. +3.Scan for all the yellow bulb call-outs on the page. +4.Read the details if still more information is needed. **Look for + examples** illustrating the concepts.  
@@ -31,8 +31,8 @@ The call-out.  Focus on the call-outs before reading full details.  -- Read the [Changelog](changelog.html) to keep up - to date. +- Read the [Changelog](changelog.html) to keep up + to date. Getting Help and Support ------------------------ @@ -59,11 +59,11 @@ You need basic proficiency in Linux environment. In order to use the system for your calculations, you need basic proficiency in Linux environment. To gain the proficiency, we recommend -you reading the [<span class="WYSIWYG_LINK">introduction to -Linux</span>](http://www.tldp.org/LDP/intro-linux/html/) +you reading the [ class="WYSIWYG_LINK">introduction to +Linux](http://www.tldp.org/LDP/intro-linux/html/) operating system environment and installing a Linux distribution on your -personal computer. A good choice might be the [<span -class="WYSIWYG_LINK">Fedora</span>](http://fedoraproject.org/) +personal computer. A good choice might be the [ +class="WYSIWYG_LINK">Fedora](http://fedoraproject.org/) distribution, as it is similar to systems on the clusters at IT4Innovations. It's easy to install and use. In fact, any distribution would do. @@ -85,32 +85,32 @@ IT4Innovations.](http://prace.it4i.cz) Terminology Frequently Used on These Pages ------------------------------------------ -- **node:** a computer, interconnected by network to other computers - - Computational nodes are powerful computers, designed and dedicated - for executing demanding scientific computations. -- **core:** processor core, a unit of processor, executing - computations -- **corehours:** wall clock hours of processor core time - Each node - is equipped with **X** processor cores, provides **X** corehours per - 1 wall clock hour. -- **[]()job:** a calculation running on the supercomputer - The job - allocates and utilizes resources of the supercomputer for - certain time. -- **HPC:** High Performance Computing -- []()**HPC (computational) resources:** corehours, storage capacity, - software licences -- **code:** a program -- **primary investigator (PI):** a person responsible for execution of - computational project and utilization of computational resources - allocated to that project -- **collaborator:** a person participating on execution of - computational project and utilization of computational resources - allocated to that project -- **[]()project:** a computational project under investigation by the - PI - The project is identified by the project ID. The computational - resources are allocated and charged per project. -- **[]()jobscript:** a script to be executed by the PBS Professional - workload manager +- **node:** a computer, interconnected by network to other computers - + Computational nodes are powerful computers, designed and dedicated + for executing demanding scientific computations. +- **core:** processor core, a unit of processor, executing + computations +- **corehours:** wall clock hours of processor core time - Each node + is equipped with **X** processor cores, provides **X** corehours per + 1 wall clock hour. +- **job:** a calculation running on the supercomputer - The job + allocates and utilizes resources of the supercomputer for + certain time. 
+- **HPC:** High Performance Computing +- **HPC (computational) resources:** corehours, storage capacity, + software licences +- **code:** a program +- **primary investigator (PI):** a person responsible for execution of + computational project and utilization of computational resources + allocated to that project +- **collaborator:** a person participating on execution of + computational project and utilization of computational resources + allocated to that project +- **project:** a computational project under investigation by the + PI - The project is identified by the project ID. The computational + resources are allocated and charged per project. +- **jobscript:** a script to be executed by the PBS Professional + workload manager Conventions ----------- @@ -120,13 +120,13 @@ examples. We use the following conventions:  Cluster command prompt -``` +``` $ ``` Your local linux host command prompt -``` +``` local $ ``` diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/Authorization_chain.png b/converted/docs.it4i.cz/get-started-with-it4innovations/obtaining-login-credentials/Authorization_chain.png similarity index 100% rename from converted/docs.it4i.cz/anselm-cluster-documentation/Authorization_chain.png rename to converted/docs.it4i.cz/get-started-with-it4innovations/obtaining-login-credentials/Authorization_chain.png diff --git a/converted/docs.it4i.cz/get-started-with-it4innovations/obtaining-login-credentials/certificates-faq.md b/converted/docs.it4i.cz/get-started-with-it4innovations/obtaining-login-credentials/certificates-faq.md index b1d4ad3e6f5cf5ee2adf1ebe7dd6529fd0c81f77..e3b321a6442faf5ccdf4a23ac18b054ecd555871 100644 --- a/converted/docs.it4i.cz/get-started-with-it4innovations/obtaining-login-credentials/certificates-faq.md +++ b/converted/docs.it4i.cz/get-started-with-it4innovations/obtaining-login-credentials/certificates-faq.md @@ -4,7 +4,7 @@ Certificates FAQ FAQ about certificates in general - + Q: What are certificates? @@ -18,25 +18,25 @@ only one password is required. There are different kinds of certificates, each with a different scope of use. We mention here: -- User (Private) certificates +- User (Private) certificates <!-- --> -- Certificate Authority (CA) certificates +- Certificate Authority (CA) certificates <!-- --> -- Host certificates +- Host certificates <!-- --> -- Service certificates +- Service certificates  However, users need only manage User and CA certificates. Note that your user certificate is protected by an associated private key, and this -**private key must never be disclosed**. +private key must never be disclosed**. Q: Which X.509 certificates are recognised by IT4Innovations? ------------------------------------------------------------- @@ -137,7 +137,7 @@ Lastly, if you need the CA certificates for a personal Globus 5 installation, then you can install the CA certificates from a MyProxy server with the following command. - myproxy-get-trustroots -s myproxy-prace.lrz.de + myproxy-get-trustroots -s myproxy-prace.lrz.de If you run this command as ’root’, then it will install the certificates into /etc/grid-security/certificates. If you run this not as ’root’, @@ -170,25 +170,25 @@ The following examples are for Unix/Linux operating systems only. 
To convert from PEM to p12, enter the following command: - openssl pkcs12 -export -in usercert.pem -inkey userkey.pem -out - username.p12 + openssl pkcs12 -export -in usercert.pem -inkey userkey.pem -out + username.p12 To convert from p12 to PEM, type the following *four* commands: - openssl pkcs12 -in username.p12 -out usercert.pem -clcerts -nokeys - openssl pkcs12 -in username.p12 -out userkey.pem -nocerts - chmod 444 usercert.pem - chmod 400 userkey.pem + openssl pkcs12 -in username.p12 -out usercert.pem -clcerts -nokeys + openssl pkcs12 -in username.p12 -out userkey.pem -nocerts + chmod 444 usercert.pem + chmod 400 userkey.pem To check your Distinguished Name (DN), enter the following command: - openssl x509 -in usercert.pem -noout -subject -nameopt - RFC2253 + openssl x509 -in usercert.pem -noout -subject -nameopt + RFC2253 To check your certificate (e.g., DN, validity, issuer, public key algorithm, etc.), enter the following command: - openssl x509 -in usercert.pem -text -noout + openssl x509 -in usercert.pem -text -noout To download openssl for both Linux and Windows, please visit <http://www.openssl.org/related/binaries.html>. On Macintosh Mac OS X @@ -202,9 +202,9 @@ manage keystores, which themselves are stores of keys and certificates. For example if you want to convert your pkcs12 formatted key pair into a java keystore you can use the following command. - keytool -importkeystore -srckeystore $my_p12_cert -destkeystore - $my_keystore -srcstoretype pkcs12 -deststoretype jks -alias - $my_nickname -destalias $my_nickname + keytool -importkeystore -srckeystore $my_p12_cert -destkeystore + $my_keystore -srcstoretype pkcs12 -deststoretype jks -alias + $my_nickname -destalias $my_nickname where $my_p12_cert is the name of your p12 (pkcs12) certificate, $my_keystore is the name that you give to your new java keystore and @@ -214,7 +214,7 @@ is used also for the new keystore. You also can import CA certificates into your java keystore with the tool, e.g.: - keytool -import -trustcacerts -alias $mydomain -file $mydomain.crt -keystore $my_keystore + keytool -import -trustcacerts -alias $mydomain -file $mydomain.crt -keystore $my_keystore where $mydomain.crt is the certificate of a trusted signing authority (CA) and $mydomain is the alias name that you give to the entry. @@ -261,7 +261,7 @@ by first choosing the "Preferences" window. For Windows, this is Tools->Options. For Linux, this is Edit->Preferences. For Mac, this is Firefox->Preferences. Then, choose the "Advanced" button; followed by the "Encryption" tab. Then, choose the "Certificates" panel; -select the option []()"Select one automatically" if you have only one +select the option "Select one automatically" if you have only one certificate, or "Ask me every time" if you have more then one. Then click on the "View Certificates" button to open the "Certificate Manager" window. 
You can then select the "Your Certificates" tab and diff --git a/converted/docs.it4i.cz/get-started-with-it4innovations/obtaining-login-credentials/obtaining-login-credentials.md b/converted/docs.it4i.cz/get-started-with-it4innovations/obtaining-login-credentials/obtaining-login-credentials.md index 752bd57eb78712a263767517a5883e424fc23b27..9f3f8d85678030e096fa34f772dc567cb7d33235 100644 --- a/converted/docs.it4i.cz/get-started-with-it4innovations/obtaining-login-credentials/obtaining-login-credentials.md +++ b/converted/docs.it4i.cz/get-started-with-it4innovations/obtaining-login-credentials/obtaining-login-credentials.md @@ -3,7 +3,7 @@ Obtaining Login Credentials - + Obtaining Authorization ----------------------- @@ -18,9 +18,9 @@ allocated to her/his Project. These collaborators will be associated to the Project. The Figure below is depicting the authorization chain: - + + + @@ -37,7 +37,7 @@ paperwork is required. All IT4I employees may contact the Head of Supercomputing Services in order to obtain **free access to the clusters**. -### []()Authorization of PI by Allocation Committee +### Authorization of PI by Allocation Committee The PI is authorized to use the clusters by the allocation decision issued by the Allocation Committee.The PI will be informed by IT4I about @@ -52,10 +52,10 @@ Log in to the [IT4I Extranet portal](https://extranet.it4i.cz) using IT4I credentials and go to the **Projects** section. -- **Users:** Please, submit your requests for becoming a - project member. -- **Primary Investigators:** Please, approve or deny users' requests - in the same section. +- **Users:** Please, submit your requests for becoming a + project member. +- **Primary Investigators:** Please, approve or deny users' requests + in the same section. ### Authorization by e-mail (an alternative approach) @@ -65,36 +65,36 @@ support](https://support.it4i.cz/rt/) (E-mail: [support [at] it4i.cz](mailto:support%20%5Bat%5D%20it4i.cz)) and provide following information: -1. Identify your project by project ID -2. Provide list of people, including himself, who are authorized to use - the resources allocated to the project. The list must include full - name, e-mail and affiliation. Provide usernames as well, if - collaborator login access already exists on the IT4I systems. -3. Include "Authorization to IT4Innovations" into the subject line. +1.Identify your project by project ID +2.Provide list of people, including himself, who are authorized to use + the resources allocated to the project. The list must include full + name, e-mail and affiliation. Provide usernames as well, if + collaborator login access already exists on the IT4I systems. +3.Include "Authorization to IT4Innovations" into the subject line. Example (except the subject line which must be in English, you may use Czech or Slovak language for communication with us): - Subject: Authorization to IT4Innovations + Subject: Authorization to IT4Innovations - Dear support, + Dear support, - Please include my collaborators to project OPEN-0-0. + Please include my collaborators to project OPEN-0-0. 
- John Smith, john.smith@myemail.com, Department of Chemistry, MIT, US - Jonas Johansson, jjohansson@otheremail.se, Department of Physics, Royal Institute of Technology, Sweden - Luisa Fibonacci, lf@emailitalia.it, Department of Mathematics, National Research Council, Italy + John Smith, john.smith@myemail.com, Department of Chemistry, MIT, US + Jonas Johansson, jjohansson@otheremail.se, Department of Physics, Royal Institute of Technology, Sweden + Luisa Fibonacci, lf@emailitalia.it, Department of Mathematics, National Research Council, Italy - Thank you, - PI - (Digitally signed) + Thank you, + PI + (Digitally signed) Should the above information be provided by e-mail, the e-mail **must be** digitally signed. Read more on [digital signatures](obtaining-login-credentials.html#the-certificates-for-digital-signatures) below. -[]()The Login Credentials +The Login Credentials ------------------------- Once authorized by PI, every person (PI or Collaborator) wishing to @@ -103,36 +103,36 @@ support](https://support.it4i.cz/rt/) (E-mail: [support [at] it4i.cz](mailto:support%20%5Bat%5D%20it4i.cz)) providing following information: -1. Project ID -2. Full name and affiliation -3. Statement that you have read and accepted the [Acceptable use policy - document](http://www.it4i.cz/acceptable-use-policy.pdf) (AUP). -4. Attach the AUP file. -5. Your preferred username, max 8 characters long. The preferred - username must associate your surname and name or be otherwise - derived from it. Only alphanumeric sequences, dash and underscore - signs are allowed. -6. In case you choose [Alternative way to personal - certificate](obtaining-login-credentials.html#alternative-way-of-getting-personal-certificate), - a **scan of photo ID** (personal ID or passport or driver license) - is required +1.Project ID +2.Full name and affiliation +3.Statement that you have read and accepted the [Acceptable use policy + document](http://www.it4i.cz/acceptable-use-policy.pdf) (AUP). +4.Attach the AUP file. +5.Your preferred username, max 8 characters long. The preferred + username must associate your surname and name or be otherwise + derived from it. Only alphanumeric sequences, dash and underscore + signs are allowed. +6.In case you choose [Alternative way to personal + certificate](obtaining-login-credentials.html#alternative-way-of-getting-personal-certificate), + a **scan of photo ID** (personal ID or passport or driver license) + is required Example (except the subject line which must be in English, you may use Czech or Slovak language for communication with us): - Subject: Access to IT4Innovations + Subject: Access to IT4Innovations - Dear support, + Dear support, - Please open the user account for me and attach the account to OPEN-0-0 - Name and affiliation: John Smith, john.smith@myemail.com, Department of Chemistry, MIT, US - I have read and accept the Acceptable use policy document (attached) + Please open the user account for me and attach the account to OPEN-0-0 + Name and affiliation: John Smith, john.smith@myemail.com, Department of Chemistry, MIT, US + I have read and accept the Acceptable use policy document (attached) - Preferred username: johnsm + Preferred username: johnsm - Thank you, - John Smith - (Digitally signed) + Thank you, + John Smith + (Digitally signed) Should the above information be provided by e-mail, the e-mail **must be** digitally signed. To sign an e-mail, you need digital certificate. @@ -150,15 +150,15 @@ the IT4I systems. 
We accept certificates issued by any widely respected certification authority. -**For various reasons we do not accept PGP keys.** Please, use only +For various reasons we do not accept PGP keys.** Please, use only X.509 PKI certificates for communication with us. You will receive your personal login credentials by protected e-mail. The login credentials include: -1. username -2. ssh private key and private key passphrase -3. system password +1.username +2.ssh private key and private key passphrase +3.system password The clusters are accessed by the [private key](../accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.html) @@ -170,7 +170,7 @@ listed on <http://support.it4i.cz/>. On Linux, use -``` +``` local $ ssh-keygen -f id_rsa -p ``` @@ -182,7 +182,7 @@ Generator](../accessing-the-clusters/shell-access-and-data-transfer/putty/puttyg Change password in your user profile at <https://extranet.it4i.cz/user/> -[]()The Certificates for Digital Signatures +The Certificates for Digital Signatures ------------------------------------------- We accept personal certificates issued by any widely respected @@ -198,16 +198,16 @@ electronic contact with Czech authorities is accepted as well. Certificate generation process is well-described here: -- [How to generate a personal TCS certificate in Mozilla Firefox web - browser - (in Czech)](http://idoc.vsb.cz/xwiki/wiki/infra/view/uzivatel/moz-cert-gen) +- [How to generate a personal TCS certificate in Mozilla Firefox web + browser + (in Czech)](http://idoc.vsb.cz/xwiki/wiki/infra/view/uzivatel/moz-cert-gen)  -A FAQ about certificates can be found here: <span>[Certificates -FAQ](certificates-faq.html).</span> +A FAQ about certificates can be found here: >[Certificates +FAQ](certificates-faq.html). -[]()Alternative Way to Personal Certificate +Alternative Way to Personal Certificate ------------------------------------------- Follow these steps **only** if you can not obtain your certificate in a @@ -216,41 +216,41 @@ In case you choose this procedure, please attach a **scan of photo ID** (personal ID or passport or drivers license) when applying for [login credentials](obtaining-login-credentials.html#the-login-credentials). -1. Go to <https://www.cacert.org/>. - - If there's a security warning, just acknowledge it. - -2. Click *Join*. -3. Fill in the form and submit it by the *Next* button. - - Type in the e-mail address which you use for communication - with us. - - Don't forget your chosen *Pass Phrase*. - -4. You will receive an e-mail verification link. Follow it. -5. After verifying, go to the CAcert's homepage and login using - *Password Login*. -6. Go to *Client Certificates* -> *New*. -7. Tick *Add* for your e-mail address and click the *Next* button. -8. Click the *Create Certificate Request* button. -9. You'll be redirected to a page from where you can download/install - your certificate. - - Simultaneously you'll get an e-mail with a link to - the certificate. +1.Go to <https://www.cacert.org/>. + - If there's a security warning, just acknowledge it. + +2.Click *Join*. +3.Fill in the form and submit it by the *Next* button. + - Type in the e-mail address which you use for communication + with us. + - Don't forget your chosen *Pass Phrase*. + +4.You will receive an e-mail verification link. Follow it. +5.After verifying, go to the CAcert's homepage and login using + *Password Login*. +6.Go to *Client Certificates* -> *New*. +7.Tick *Add* for your e-mail address and click the *Next* button. 
+8.Click the *Create Certificate Request* button. +9.You'll be redirected to a page from where you can download/install + your certificate. + - Simultaneously you'll get an e-mail with a link to + the certificate. Installation of the Certificate Into Your Mail Client ----------------------------------------------------- The procedure is similar to the following guides: -- MS Outlook 2010 - - [How to Remove, Import, and Export Digital - Certificates](http://support.microsoft.com/kb/179380) - - [Importing a PKCS #12 certificate - (in Czech)](http://idoc.vsb.cz/xwiki/wiki/infra/view/uzivatel/outl-cert-imp) -- Mozilla Thudnerbird - - [Installing an SMIME - certificate](http://kb.mozillazine.org/Installing_an_SMIME_certificate) - - [Importing a PKCS #12 certificate - (in Czech)](http://idoc.vsb.cz/xwiki/wiki/infra/view/uzivatel/moz-cert-imp) +- MS Outlook 2010 + - [How to Remove, Import, and Export Digital + Certificates](http://support.microsoft.com/kb/179380) + - [Importing a PKCS #12 certificate + (in Czech)](http://idoc.vsb.cz/xwiki/wiki/infra/view/uzivatel/outl-cert-imp) +- Mozilla Thudnerbird + - [Installing an SMIME + certificate](http://kb.mozillazine.org/Installing_an_SMIME_certificate) + - [Importing a PKCS #12 certificate + (in Czech)](http://idoc.vsb.cz/xwiki/wiki/infra/view/uzivatel/moz-cert-imp) End of User Account Lifecycle ----------------------------- @@ -264,9 +264,9 @@ were attached expires. User will get 3 automatically generated warning e-mail messages of the pending removal:. -- First message will be sent 3 months before the removal -- Second message will be sent 1 month before the removal -- Third message will be sent 1 week before the removal. +- First message will be sent 3 months before the removal +- Second message will be sent 1 month before the removal +- Third message will be sent 1 week before the removal. The messages will inform about the projected removal date and will challenge the user to migrate her/his data diff --git a/converted/docs.it4i.cz/index.md b/converted/docs.it4i.cz/index.md index 69233f4f7b50a9a36a093630789398ab1e8644f1..42a6d692007d1cfb7b6b16f4e8a404bb925fc62a 100644 --- a/converted/docs.it4i.cz/index.md +++ b/converted/docs.it4i.cz/index.md @@ -3,27 +3,27 @@ Documentation - + Welcome to IT4Innovations documentation pages. The IT4Innovations national supercomputing center operates supercomputers [Salomon](salomon.html) and -[Anselm](anselm.html). The supercomputers are [<span -class="external-link">available</span>](get-started-with-it4innovations/applying-for-resources.html) +[Anselm](anselm.html). The supercomputers are [ +class="external-link">available](get-started-with-it4innovations/applying-for-resources.html) to academic community within the Czech Republic and Europe and industrial community worldwide. The purpose of these pages is to provide a comprehensive documentation on hardware, software and usage of the computers. -<span class="link-external"><span class="WYSIWYG_LINK">How to read the documentation</span></span> + class="link-external"> class="WYSIWYG_LINK">How to read the documentation -------------------------------------------------------------------------------------------------- -1. Read the list in the left column. Select the subject of interest. - Alternatively, use the Search box in the upper right corner. -2. Read the CONTENTS in the upper right corner. -3. Scan for all the yellow bulb call-outs on the page. -4. Read the details if still more information is needed. **Look for - examples** illustrating the concepts. 
+1.Read the list in the left column. Select the subject of interest. + Alternatively, use the Search box in the upper right corner. +2.Read the CONTENTS in the upper right corner. +3.Scan for all the yellow bulb call-outs on the page. +4.Read the details if still more information is needed. **Look for + examples** illustrating the concepts.  @@ -31,9 +31,9 @@ The call-out.  Focus on the call-outs before reading full details.  -- Read the - [Changelog](get-started-with-it4innovations/changelog.html) - to keep up to date. +- Read the + [Changelog](get-started-with-it4innovations/changelog.html) + to keep up to date. Getting Help and Support ------------------------ @@ -60,11 +60,11 @@ You need basic proficiency in Linux environment. In order to use the system for your calculations, you need basic proficiency in Linux environment. To gain the proficiency, we recommend -you reading the [<span class="WYSIWYG_LINK">introduction to -Linux</span>](http://www.tldp.org/LDP/intro-linux/html/) +you reading the [ class="WYSIWYG_LINK">introduction to +Linux](http://www.tldp.org/LDP/intro-linux/html/) operating system environment and installing a Linux distribution on your -personal computer. A good choice might be the [<span -class="WYSIWYG_LINK">Fedora</span>](http://fedoraproject.org/) +personal computer. A good choice might be the [ +class="WYSIWYG_LINK">Fedora](http://fedoraproject.org/) distribution, as it is similar to systems on the clusters at IT4Innovations. It's easy to install and use. In fact, any distribution would do. @@ -86,32 +86,32 @@ IT4Innovations.](http://prace.it4i.cz) Terminology Frequently Used on These Pages ------------------------------------------ -- **node:** a computer, interconnected by network to other computers - - Computational nodes are powerful computers, designed and dedicated - for executing demanding scientific computations. -- **core:** processor core, a unit of processor, executing - computations -- **corehours:** wall clock hours of processor core time - Each node - is equipped with **X** processor cores, provides **X** corehours per - 1 wall clock hour. -- **[]()job:** a calculation running on the supercomputer - The job - allocates and utilizes resources of the supercomputer for - certain time. -- **HPC:** High Performance Computing -- []()**HPC (computational) resources:** corehours, storage capacity, - software licences -- **code:** a program -- **primary investigator (PI):** a person responsible for execution of - computational project and utilization of computational resources - allocated to that project -- **collaborator:** a person participating on execution of - computational project and utilization of computational resources - allocated to that project -- **[]()project:** a computational project under investigation by the - PI - The project is identified by the project ID. The computational - resources are allocated and charged per project. -- **[]()jobscript:** a script to be executed by the PBS Professional - workload manager +- **node:** a computer, interconnected by network to other computers - + Computational nodes are powerful computers, designed and dedicated + for executing demanding scientific computations. +- **core:** processor core, a unit of processor, executing + computations +- **corehours:** wall clock hours of processor core time - Each node + is equipped with **X** processor cores, provides **X** corehours per + 1 wall clock hour. 
+- **job:** a calculation running on the supercomputer - The job + allocates and utilizes resources of the supercomputer for + certain time. +- **HPC:** High Performance Computing +- **HPC (computational) resources:** corehours, storage capacity, + software licences +- **code:** a program +- **primary investigator (PI):** a person responsible for execution of + computational project and utilization of computational resources + allocated to that project +- **collaborator:** a person participating on execution of + computational project and utilization of computational resources + allocated to that project +- **project:** a computational project under investigation by the + PI - The project is identified by the project ID. The computational + resources are allocated and charged per project. +- **jobscript:** a script to be executed by the PBS Professional + workload manager Conventions ----------- @@ -121,13 +121,13 @@ examples. We use the following conventions:  Cluster command prompt -``` +``` $ ``` Your local linux host command prompt -``` +``` local $ ``` diff --git a/converted/docs.it4i.cz/salomon/accessing-the-cluster.md b/converted/docs.it4i.cz/salomon/accessing-the-cluster.md index a1d96c6260f1184143ef3f1c37e1236795050767..a6b63b5c58da6b955765fe557dac221881277e22 100644 --- a/converted/docs.it4i.cz/salomon/accessing-the-cluster.md +++ b/converted/docs.it4i.cz/salomon/accessing-the-cluster.md @@ -3,7 +3,7 @@ Shell access and data transfer - + Interactive Login ----------------- @@ -13,25 +13,25 @@ login2, login3 and login4 at address salomon.it4i.cz. The login nodes may be addressed specifically, by prepending the login node name to the address. -The alias <span>salomon.it4i.cz is currently not available through VPN +The alias >salomon.it4i.cz is currently not available through VPN connection. Please use loginX.salomon.it4i.cz when connected to -VPN.</span> +VPN. - Login address Port Protocol Login node - ------------------------ ------ ---------- ----------------------------------------- - salomon.it4i.cz 22 ssh round-robin DNS record for login[1-4] - login1.salomon.it4i.cz 22 ssh login1 - login2.salomon.it4i.cz 22 ssh login2 - login3.salomon.it4i.cz 22 ssh login3 - login4.salomon.it4i.cz 22 ssh login4 +Login address Port Protocol Login node +------------------------ ------ ---------- ----------------------------------------- +salomon.it4i.cz 22 ssh round-robin DNS record for login[1-4] +login1.salomon.it4i.cz 22 ssh login1 +login2.salomon.it4i.cz 22 ssh login2 +login3.salomon.it4i.cz 22 ssh login3 +login4.salomon.it4i.cz 22 ssh login4 The authentication is by the [private key](../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.html) Please verify SSH fingerprints during the first logon. They are -identical on all login nodes:<span class="monospace"> +identical on all login nodes: f6:28:98:e4:f9:b2:a6:8f:f2:f4:2d:0a:09:67:69:80 (DSA) -70:01:c9:9a:5d:88:91:c7:1b:c0:84:d1:fa:4e:83:5c (RSA)</span> +70:01:c9:9a:5d:88:91:c7:1b:c0:84:d1:fa:4e:83:5c (RSA)  @@ -39,14 +39,14 @@ Private keys authentication: On **Linux** or **Mac**, use -``` +``` local $ ssh -i /path/to/id_rsa username@salomon.it4i.cz ``` If you see warning message "UNPROTECTED PRIVATE KEY FILE!", use this command to set lower permissions to private key file. 
-``` +``` local $ chmod 600 /path/to/id_rsa ``` @@ -55,19 +55,19 @@ client](../get-started-with-it4innovations/accessing-the-clusters/shell-access-a After logging in, you will see the command prompt: -                    _____      _                            -                   / ____|    | |                           -                  | (___  __ _| | ___ _ __ ___  ___ _ __  -                   ___ / _` | |/ _ | '_ ` _ / _ | '_ -                   ____) | (_| | | (_) | | | | | | (_) | | | | -                  |_____/ __,_|_|___/|_| |_| |_|___/|_| |_| -                  +                    _____      _                            +                   / ____|    | |                           +                  | (___  __ _| | ___ _ __ ___  ___ _ __  +                   ___ / _` | |/ _ | '_ ` _ / _ | '_ +                   ____) | (_| | | (_) | | | | | | (_) | | | | +                  |_____/ __,_|_|___/|_| |_| |_|___/|_| |_| +                  -                        http://www.it4i.cz/?lang=en +                        http://www.it4i.cz/?lang=en - Last login: Tue Jul 9 15:57:38 2013 from your-host.example.com - [username@login2.salomon ~]$ + Last login: Tue Jul 9 15:57:38 2013 from your-host.example.com + [username@login2.salomon ~]$ The environment is **not** shared between login nodes, except for [shared filesystems](storage/storage.html). @@ -86,13 +86,13 @@ nodes cedge[1-3].salomon.it4i.cz for increased performance. HTML commented section #1 (removed cedge servers from the table) - Address Port Protocol - ------------------------------------------------------ ------ ----------------------------------------- - salomon.it4i.cz 22 scp, sftp - login1.salomon.it4i.cz 22 scp, sftp - login2.salomon.it4i.cz 22 scp, sftp - <span class="discreet"></span>login3.salomon.it4i.cz 22 <span class="discreet"></span>scp, sftp - login4.salomon.it4i.cz 22 scp, sftp +Address Port Protocol +------------------------------------------------------ ------ ----------------------------------------- +salomon.it4i.cz 22 scp, sftp +login1.salomon.it4i.cz 22 scp, sftp +login2.salomon.it4i.cz 22 scp, sftp + class="discreet">login3.salomon.it4i.cz 22 class="discreet">scp, sftp +login4.salomon.it4i.cz 22 scp, sftp  The authentication is by the [private key](../get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.html) @@ -102,17 +102,17 @@ verified) On linux or Mac, use scp or sftp client to transfer the data to Salomon: -``` +``` local $ scp -i /path/to/id_rsa my-local-file username@salomon.it4i.cz:directory/file ``` -``` +``` local $ scp -i /path/to/id_rsa -r my-local-dir username@salomon.it4i.cz:directory ``` > or -``` +``` local $ sftp -o IdentityFile=/path/to/id_rsa username@salomon.it4i.cz ``` @@ -120,7 +120,7 @@ Very convenient way to transfer files in and out of the Salomon computer is via the fuse filesystem [sshfs](http://linux.die.net/man/1/sshfs) -``` +``` local $ sshfs -o IdentityFile=/path/to/id_rsa username@salomon.it4i.cz:. mountpoint ``` @@ -129,13 +129,13 @@ local computer, just like an external disk. Learn more on ssh, scp and sshfs by reading the manpages -``` +``` $ man ssh $ man scp $ man sshfs ``` -<span>On Windows</span>, use [WinSCP +>On Windows, use [WinSCP client](http://winscp.net/eng/download.php) to transfer the data. 
The [win-sshfs client](http://code.google.com/p/win-sshfs/) provides a diff --git a/converted/docs.it4i.cz/salomon/copy_of_vpn_web_install_3.png b/converted/docs.it4i.cz/salomon/accessing-the-cluster/copy_of_vpn_web_install_3.png similarity index 100% rename from converted/docs.it4i.cz/salomon/copy_of_vpn_web_install_3.png rename to converted/docs.it4i.cz/salomon/accessing-the-cluster/copy_of_vpn_web_install_3.png diff --git a/converted/docs.it4i.cz/salomon/accessing-the-cluster/outgoing-connections.md b/converted/docs.it4i.cz/salomon/accessing-the-cluster/outgoing-connections.md index 7da426aab8686ca0eafcdb8fcf8aa02b460617d5..4ba8f15efa9635820cca99b64aed3d522bd74eb4 100644 --- a/converted/docs.it4i.cz/salomon/accessing-the-cluster/outgoing-connections.md +++ b/converted/docs.it4i.cz/salomon/accessing-the-cluster/outgoing-connections.md @@ -3,7 +3,7 @@ Outgoing connections - + Connection restrictions ----------------------- @@ -11,12 +11,12 @@ Connection restrictions Outgoing connections, from Salomon Cluster login nodes to the outside world, are restricted to following ports: - Port Protocol - ------ ---------- - 22 ssh - 80 http - 443 https - 9418 git +Port Protocol +------ ---------- +22 ssh +80 http +443 https +9418 git Please use **ssh port forwarding** and proxy servers to connect from Salomon to all other remote ports. @@ -28,7 +28,7 @@ outside world are cut. Port forwarding --------------- -### []()Port forwarding from login nodes +### Port forwarding from login nodes Port forwarding allows an application running on Salomon to connect to arbitrary remote host and port. @@ -39,7 +39,7 @@ workstation and forwarding from the workstation to the remote host. Pick some unused port on Salomon login node (for example 6000) and establish the port forwarding: -``` +``` local $ ssh -R 6000:remote.host.com:1234 salomon.it4i.cz ``` @@ -57,13 +57,13 @@ remote.host.com:1234. Click Add button, then Open. Port forwarding may be established directly to the remote host. However, this requires that user has ssh access to remote.host.com -``` +``` $ ssh -L 6000:localhost:1234 remote.host.com ``` Note: Port number 6000 is chosen as an example only. Pick any free port. -### []()Port forwarding from compute nodes +### Port forwarding from compute nodes Remote port forwarding from compute nodes allows applications running on the compute nodes to access hosts outside Salomon Cluster. @@ -75,7 +75,7 @@ above](outgoing-connections.html#port-forwarding-from-login-nodes). Second, invoke port forwarding from the compute node to the login node. Insert following line into your jobscript or interactive shell -``` +``` $ ssh -TN -f -L 6000:localhost:6000 login1 ``` @@ -98,7 +98,7 @@ SOCKS proxy server software. On Linux, sshd demon provides the functionality. To establish SOCKS proxy server listening on port 1080 run: -``` +``` local $ ssh -D 1080 localhost ``` @@ -109,7 +109,7 @@ Once the proxy server is running, establish ssh port forwarding from Salomon to the proxy server, port 1080, exactly as [described above](outgoing-connections.html#port-forwarding-from-login-nodes). 
-``` +``` local $ ssh -R 6000:localhost:1080 salomon.it4i.cz ``` diff --git a/converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn-access.md b/converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn-access.md index 6093d132d4fe17d200c907c41233d1a427bc78c8..7d6b7500f509702efec187a547d3af27498830cd 100644 --- a/converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn-access.md +++ b/converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn-access.md @@ -3,7 +3,7 @@ VPN Access - + Accessing IT4Innovations internal resources via VPN --------------------------------------------------- @@ -13,52 +13,52 @@ local network, it is necessary to VPN connect to this network. We use Cisco AnyConnect Secure Mobility Client, which is supported on the following operating systems: -- <span>Windows XP</span> -- <span>Windows Vista</span> -- <span>Windows 7</span> -- <span>Windows 8</span> -- <span>Linux</span> -- <span>MacOS</span> +- >Windows XP +- >Windows Vista +- >Windows 7 +- >Windows 8 +- >Linux +- >MacOS It is impossible to connect to VPN from other operating systems. -<span>VPN client installation</span> +>VPN client installation ------------------------------------ You can install VPN client from web interface after successful login with LDAP credentials on address <https://vpn.it4i.cz/user> -[](../vpn_web_login.png) + + According to the Java settings after login, the client either automatically installs, or downloads installation file for your operating system. It is necessary to allow start of installation tool for automatic installation. -[](../vpn_web_login_2.png) -[](../vpn_web_install_2.png)[](../copy_of_vpn_web_install_3.png) + + + +Install](https://docs.it4i.cz/salomon/vpn_web_install_2.png/@@images/c2baba93-824b-418d-b548-a73af8030320.png "VPN Install")](../vpn_web_install_2.png) + After successful installation, VPN connection will be established and you can use available resources from IT4I network. -[](../vpn_web_install_4.png) + + If your Java setting doesn't allow automatic installation, you can download installation file and install VPN client manually. -[](../vpn_web_download.png) + + After you click on the link, download of installation file will start. -[](../vpn_web_download_2.png) + + After successful download of installation file, you have to execute this tool with administrator's rights and install VPN client manually. @@ -69,50 +69,50 @@ Working with VPN client You can use graphical user interface or command line interface to run VPN client on all supported operating systems. We suggest using GUI. - + Before the first login to VPN, you have to fill URL **https://vpn.it4i.cz/user** into the text field. -[](../vpn_contacting_https_cluster.png) + Contacting + After you click on the Connect button, you must fill your login credentials. -[](../vpn_contacting_https.png) + Contacting + After a successful login, the client will minimize to the system tray. If everything works, you can see a lock in the Cisco tray icon. -[](../../anselm-cluster-documentation/anyconnecticon.jpg) + + If you right-click on this icon, you will see a context menu in which you can control the VPN connection. -[](../../anselm-cluster-documentation/anyconnectcontextmenu.jpg) + + When you connect to the VPN for the first time, the client downloads the profile and creates a new item "IT4I cluster" in the connection list. For subsequent connections, it is not necessary to re-enter the URL address, but just select the corresponding item. 
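As noted above, the client can also be driven from a command line instead of the GUI. The sketch below assumes a Linux installation; the location of the AnyConnect `vpn` binary is an assumption typical of default installs and may differ on your system, and the GUI remains the suggested way.

```
# Sketch (Linux): the path /opt/cisco/anyconnect/bin/vpn is an assumed default install location
local $ /opt/cisco/anyconnect/bin/vpn connect vpn.it4i.cz
local $ /opt/cisco/anyconnect/bin/vpn disconnect
```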
-[](../vpn_contacting.png) + Contacting + Then AnyConnect automatically proceeds like in the case of first logon. -[](../vpn_login.png) + + After a successful logon, you can see a green circle with a tick mark on the lock icon. -[](../vpn_successfull_connection.png) + Succesfull + For disconnecting, right-click on the AnyConnect client icon in the system tray and select **VPN Disconnect**. diff --git a/converted/docs.it4i.cz/salomon/vpn_contacting.png b/converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_contacting.png similarity index 100% rename from converted/docs.it4i.cz/salomon/vpn_contacting.png rename to converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_contacting.png diff --git a/converted/docs.it4i.cz/salomon/vpn_contacting_https.png b/converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_contacting_https.png similarity index 100% rename from converted/docs.it4i.cz/salomon/vpn_contacting_https.png rename to converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_contacting_https.png diff --git a/converted/docs.it4i.cz/salomon/vpn_contacting_https_cluster.png b/converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_contacting_https_cluster.png similarity index 100% rename from converted/docs.it4i.cz/salomon/vpn_contacting_https_cluster.png rename to converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_contacting_https_cluster.png diff --git a/converted/docs.it4i.cz/salomon/vpn_login.png b/converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_login.png similarity index 100% rename from converted/docs.it4i.cz/salomon/vpn_login.png rename to converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_login.png diff --git a/converted/docs.it4i.cz/salomon/vpn_successfull_connection.png b/converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_successfull_connection.png similarity index 100% rename from converted/docs.it4i.cz/salomon/vpn_successfull_connection.png rename to converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_successfull_connection.png diff --git a/converted/docs.it4i.cz/salomon/vpn_web_download.png b/converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_web_download.png similarity index 100% rename from converted/docs.it4i.cz/salomon/vpn_web_download.png rename to converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_web_download.png diff --git a/converted/docs.it4i.cz/salomon/vpn_web_download_2.png b/converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_web_download_2.png similarity index 100% rename from converted/docs.it4i.cz/salomon/vpn_web_download_2.png rename to converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_web_download_2.png diff --git a/converted/docs.it4i.cz/salomon/vpn_web_install_2.png b/converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_web_install_2.png similarity index 100% rename from converted/docs.it4i.cz/salomon/vpn_web_install_2.png rename to converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_web_install_2.png diff --git a/converted/docs.it4i.cz/salomon/vpn_web_install_4.png b/converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_web_install_4.png similarity index 100% rename from converted/docs.it4i.cz/salomon/vpn_web_install_4.png rename to converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_web_install_4.png diff --git a/converted/docs.it4i.cz/salomon/vpn_web_login.png b/converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_web_login.png similarity index 100% rename from converted/docs.it4i.cz/salomon/vpn_web_login.png rename to 
converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_web_login.png diff --git a/converted/docs.it4i.cz/salomon/vpn_web_login_2.png b/converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_web_login_2.png similarity index 100% rename from converted/docs.it4i.cz/salomon/vpn_web_login_2.png rename to converted/docs.it4i.cz/salomon/accessing-the-cluster/vpn_web_login_2.png diff --git a/converted/docs.it4i.cz/salomon/environment-and-modules.md b/converted/docs.it4i.cz/salomon/environment-and-modules.md index 957d43f2ce4959e0ce05c3c0b6847dbcca4c8dde..53ff50906134f85f629407e964cea76a8bf2eeaa 100644 --- a/converted/docs.it4i.cz/salomon/environment-and-modules.md +++ b/converted/docs.it4i.cz/salomon/environment-and-modules.md @@ -3,7 +3,7 @@ Environment and Modules - + ### Environment Customization @@ -11,12 +11,12 @@ After logging in, you may want to configure the environment. Write your preferred path definitions, aliases, functions and module loads in the .bashrc file -``` +``` # ./bashrc # Source global definitions if [ -f /etc/bashrc ]; then - . /etc/bashrc + . /etc/bashrc fi # User specific aliases and functions @@ -33,9 +33,9 @@ fi Do not run commands outputing to standard output (echo, module list, etc) in .bashrc for non-interactive SSH sessions. It breaks fundamental functionality (scp, PBS) of your account! Take care for SSH session -interactivity for such commands as <span id="result_box" -class="short_text"><span class="hps alt-edited">stated</span> <span -class="hps">in the previous example.</span></span> +interactivity for such commands as id="result_box" +class="short_text"> class="hps alt-edited">stated +class="hps">in the previous example. ### Application Modules @@ -46,7 +46,7 @@ Application modules on Salomon cluster are built using [EasyBuild](http://hpcugent.github.io/easybuild/ "EasyBuild"). The modules are divided into the following structure: -``` +``` base: Default module class bio: Bioinformatics, biology and biomedical cae: Computer Aided Engineering (incl. CFD) @@ -78,13 +78,13 @@ needs. To check available modules use -``` +``` $ module avail ``` To load a module, for example the OpenMPI module use -``` +``` $ module load OpenMPI ``` @@ -93,19 +93,19 @@ of your active shell such that you are ready to run the OpenMPI software To check loaded modules use -``` +``` $ module list ```  To unload a module, for example the OpenMPI module use -``` +``` $ module unload OpenMPI ``` Learn more on modules by reading the module man page -``` +``` $ man module ``` @@ -142,23 +142,23 @@ configuration options. 
Recent releases of EasyBuild include out-of-the-box toolchain support for: -- various compilers, including GCC, Intel, Clang, CUDA -- common MPI libraries, such as Intel MPI, MPICH, MVAPICH2, OpenMPI -- various numerical libraries, including ATLAS, Intel MKL, OpenBLAS, - ScalaPACK, FFTW +- various compilers, including GCC, Intel, Clang, CUDA +- common MPI libraries, such as Intel MPI, MPICH, MVAPICH2, OpenMPI +- various numerical libraries, including ATLAS, Intel MKL, OpenBLAS, + ScalaPACK, FFTW  On Salomon, we have currently following toolchains installed: - Toolchain Module(s) - -------------------- ------------------------------------------------ - GCC GCC - ictce icc, ifort, imkl, impi - intel GCC, icc, ifort, imkl, impi - gompi GCC, OpenMPI - goolf BLACS, FFTW, GCC, OpenBLAS, OpenMPI, ScaLAPACK - <span>iompi</span> OpenMPI, icc, ifort - iccifort icc, ifort +Toolchain Module(s) +-------------------- ------------------------------------------------ +GCC GCC +ictce icc, ifort, imkl, impi +intel GCC, icc, ifort, imkl, impi +gompi GCC, OpenMPI +goolf BLACS, FFTW, GCC, OpenBLAS, OpenMPI, ScaLAPACK +>iompi OpenMPI, icc, ifort +iccifort icc, ifort diff --git a/converted/docs.it4i.cz/salomon/hardware-overview-1/hardware-overview.md b/converted/docs.it4i.cz/salomon/hardware-overview-1/hardware-overview.md index 9833fbd46985a198fcf700ea18bc8a498546650f..14fed67250cfc55391a5e05b3a9fdba0dc2325e0 100644 --- a/converted/docs.it4i.cz/salomon/hardware-overview-1/hardware-overview.md +++ b/converted/docs.it4i.cz/salomon/hardware-overview-1/hardware-overview.md @@ -3,14 +3,14 @@ Hardware Overview - + Introduction ------------ The Salomon cluster consists of 1008 computational nodes of which 576 are regular compute nodes and 432 accelerated nodes. Each node is a -<span class="WYSIWYG_LINK">powerful</span> x86-64 computer, equipped + class="WYSIWYG_LINK">powerful x86-64 computer, equipped with 24 cores (two twelve-core Intel Xeon processors) and 128GB RAM. The nodes are interlinked by high speed InfiniBand and Ethernet networks. All nodes share 0.5PB /home NFS disk storage to store the user files. @@ -20,8 +20,8 @@ Salomon cluster is provided by four login nodes. [More about schematic representation of the Salomon cluster compute nodes IB -topology](../network-1/ib-single-plane-topology.html).<span -class="internal-link"></span> +topology](../network-1/ib-single-plane-topology.html). 
+ [](../salomon-2) @@ -30,7 +30,7 @@ The parameters are summarized in the following tables: General information ------------------- -**In general** +In general** Primary purpose High Performance Computing Architecture of compute nodes @@ -52,18 +52,18 @@ w/o accelerator 576 MIC accelerated 432 -**In total** +In total** Total theoretical peak performance (Rpeak) 2011 Tflop/s Total amount of RAM -<span>129.024 TB</span> +>129.024 TB Compute nodes ------------- - Node Count Processor Cores Memory Accelerator - ----------------- ------- ---------------------------------- ------- -------- -------------------------------------------- - w/o accelerator 576 2x Intel Xeon E5-2680v3, 2.5GHz 24 128GB - - MIC accelerated 432 2x Intel Xeon E5-2680v3, 2.5GHz 24 128GB 2x Intel Xeon Phi 7120P, 61cores, 16GB RAM +Node Count Processor Cores Memory Accelerator +----------------- ------- ---------------------------------- ------- -------- -------------------------------------------- +w/o accelerator 576 2x Intel Xeon E5-2680v3, 2.5GHz 24 128GB - +MIC accelerated 432 2x Intel Xeon E5-2680v3, 2.5GHz 24 128GB 2x Intel Xeon Phi 7120P, 61cores, 16GB RAM For more details please refer to the [Compute nodes](../compute-nodes.html). @@ -74,9 +74,9 @@ Remote visualization nodes For remote visualization two nodes with NICE DCV software are available each configured: - Node Count Processor Cores Memory GPU Accelerator - --------------- ------- --------------------------------- ------- -------- ------------------------------ - visualization 2 2x Intel Xeon E5-2695v3, 2.3GHz 28 512GB NVIDIA QUADRO K5000, 4GB RAM +Node Count Processor Cores Memory GPU Accelerator +--------------- ------- --------------------------------- ------- -------- ------------------------------ +visualization 2 2x Intel Xeon E5-2695v3, 2.3GHz 28 512GB NVIDIA QUADRO K5000, 4GB RAM SGI UV 2000 ----------- diff --git a/converted/docs.it4i.cz/salomon/index.md b/converted/docs.it4i.cz/salomon/index.md index f5d09b4ffdbc3b008dd6ccba28e5a0df0397455f..0f83c38f8a48c519c0f8cfef4aa37062f10cce69 100644 --- a/converted/docs.it4i.cz/salomon/index.md +++ b/converted/docs.it4i.cz/salomon/index.md @@ -3,8 +3,8 @@ Introduction Welcome to Salomon supercomputer cluster. The Salomon cluster consists of 1008 compute nodes, totaling 24192 compute cores with 129TB RAM and -giving over 2 Pflop/s theoretical peak performance. Each node is a <span -class="WYSIWYG_LINK">powerful</span> x86-64 computer, equipped with 24 +giving over 2 Pflop/s theoretical peak performance. Each node is a +class="WYSIWYG_LINK">powerful x86-64 computer, equipped with 24 cores, at least 128GB RAM. Nodes are interconnected by 7D Enhanced hypercube Infiniband network and equipped with Intel Xeon E5-2680v3 processors. The Salomon cluster consists of 576 nodes without @@ -12,22 +12,22 @@ accelerators and 432 nodes equipped with Intel Xeon Phi MIC accelerators. Read more in [Hardware Overview](hardware-overview-1/hardware-overview.html). 
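For illustration, the per-node core count and memory quoted above can be checked with standard Linux tools once logged in; this is a generic sketch, not a cluster-specific procedure.

```
# Sketch: show processor model, logical CPU count and installed memory on the current node
$ lscpu | grep -E 'Model name|^CPU\(s\)'
$ free -g | head -2
```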
-The cluster runs CentOS Linux [<span -class="WYSIWYG_LINK"></span>](http://www.bull.com/bullx-logiciels/systeme-exploitation.html)<span -class="internal-link">operating system</span>, which is compatible with -the <span class="WYSIWYG_LINK">RedHat</span> [<span +The cluster runs CentOS Linux [ +class="WYSIWYG_LINK">](http://www.bull.com/bullx-logiciels/systeme-exploitation.html) +operating system, which is compatible with +the class="WYSIWYG_LINK">RedHat [ class="WYSIWYG_LINK">Linux -family.</span>](http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg) +family.](http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg) **Water-cooled Compute Nodes With MIC Accelerator** -[](salomon) + - + **Tape Library T950B** - +![]](salomon-3.jpeg) - + diff --git a/converted/docs.it4i.cz/salomon/introduction.md b/converted/docs.it4i.cz/salomon/introduction.md index f5d09b4ffdbc3b008dd6ccba28e5a0df0397455f..0f83c38f8a48c519c0f8cfef4aa37062f10cce69 100644 --- a/converted/docs.it4i.cz/salomon/introduction.md +++ b/converted/docs.it4i.cz/salomon/introduction.md @@ -3,8 +3,8 @@ Introduction Welcome to Salomon supercomputer cluster. The Salomon cluster consists of 1008 compute nodes, totaling 24192 compute cores with 129TB RAM and -giving over 2 Pflop/s theoretical peak performance. Each node is a <span -class="WYSIWYG_LINK">powerful</span> x86-64 computer, equipped with 24 +giving over 2 Pflop/s theoretical peak performance. Each node is a +class="WYSIWYG_LINK">powerful x86-64 computer, equipped with 24 cores, at least 128GB RAM. Nodes are interconnected by 7D Enhanced hypercube Infiniband network and equipped with Intel Xeon E5-2680v3 processors. The Salomon cluster consists of 576 nodes without @@ -12,22 +12,22 @@ accelerators and 432 nodes equipped with Intel Xeon Phi MIC accelerators. Read more in [Hardware Overview](hardware-overview-1/hardware-overview.html). -The cluster runs CentOS Linux [<span -class="WYSIWYG_LINK"></span>](http://www.bull.com/bullx-logiciels/systeme-exploitation.html)<span -class="internal-link">operating system</span>, which is compatible with -the <span class="WYSIWYG_LINK">RedHat</span> [<span +The cluster runs CentOS Linux [ +class="WYSIWYG_LINK">](http://www.bull.com/bullx-logiciels/systeme-exploitation.html) +operating system, which is compatible with +the class="WYSIWYG_LINK">RedHat [ class="WYSIWYG_LINK">Linux -family.</span>](http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg) +family.](http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg) **Water-cooled Compute Nodes With MIC Accelerator** -[](salomon) + - + **Tape Library T950B** - +![]](salomon-3.jpeg) - + diff --git a/converted/docs.it4i.cz/salomon/network-1/7d-enhanced-hypercube.md b/converted/docs.it4i.cz/salomon/network-1/7d-enhanced-hypercube.md index e35b1fd457ac69517ccade3ec1ba1122af42db38..9487e731a7cb90eafc4c412f36bb33b424bb6d13 100644 --- a/converted/docs.it4i.cz/salomon/network-1/7d-enhanced-hypercube.md +++ b/converted/docs.it4i.cz/salomon/network-1/7d-enhanced-hypercube.md @@ -6,35 +6,35 @@ dimension.](../resource-allocation-and-job-execution/job-submission-and-executio Nodes may be selected via the PBS resource attribute ehc_[1-7]d . 
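For illustration, a node group sharing one hypercube dimension can be requested through a PBS placement group. This is a minimal sketch only, with the qprod queue and project ID OPEN-0-0 as placeholders; the job submission and execution page linked above remains the authoritative reference.

```
# Sketch: request 4 nodes (24 cores each), all placed within a single 1D hypercube group
$ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=24 -l place=group=ehc_1d ./myjob
```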
- Hypercube dimension <span class="pun">node_group_key</span> - --------------------- ------------------------------------------- - 1D ehc_1d - 2D ehc_2d - 3D ehc_3d - 4D ehc_4d - 5D ehc_5d - 6D ehc_6d - 7D ehc_7d +Hypercube dimension +--------------------- ------------------------------------------- +1D ehc_1d +2D ehc_2d +3D ehc_3d +4D ehc_4d +5D ehc_5d +6D ehc_6d +7D ehc_7d [Schematic representation of the Salomon cluster IB single-plain -topology represents <span class="internal-link">hypercube -dimension</span> 0](ib-single-plane-topology.html). +topology represents hypercube +dimension 0](ib-single-plane-topology.html). ### 7D Enhanced Hypercube -[](7D_Enhanced_hypercube.png) +  - Node type Count Short name Long name Rack - -------------------------------------- ------- ------------------ -------------------------- ------- - M-Cell compute nodes w/o accelerator 576 cns1 -cns576 r1i0n0 - r4i7n17 1-4 - compute nodes MIC accelerated 432 cns577 - cns1008 r21u01n577 - r37u31n1008 21-38 +Node type Count Short name Long name Rack +-------------------------------------- ------- ------------------ -------------------------- ------- +M-Cell compute nodes w/o accelerator 576 cns1 -cns576 r1i0n0 - r4i7n17 1-4 +compute nodes MIC accelerated 432 cns577 - cns1008 r21u01n577 - r37u31n1008 21-38 ###  IB Topology - [](Salomon_IB_topology.png) + + diff --git a/converted/docs.it4i.cz/salomon/network-1/ib-single-plane-topology.md b/converted/docs.it4i.cz/salomon/network-1/ib-single-plane-topology.md index d5733c512eea7970eceda9b26c5fbea02d13cffe..7e585bd38a3bfa06be87f38807f6fdd3bc8080d9 100644 --- a/converted/docs.it4i.cz/salomon/network-1/ib-single-plane-topology.md +++ b/converted/docs.it4i.cz/salomon/network-1/ib-single-plane-topology.md @@ -3,7 +3,7 @@ IB single-plane topology - + A complete M-Cell assembly consists of four compute racks. Each rack contains 4x physical IRUs - Independent rack units. Using one dual @@ -14,13 +14,13 @@ The SGI ICE X IB Premium Blade provides the first level of interconnection via dual 36-port Mellanox FDR InfiniBand ASIC switch with connections as follows: -- 9 ports from each switch chip connect to the unified backplane, to - connect the 18 compute node slots -- 3 ports on each chip provide connectivity between the chips -- 24 ports from each switch chip connect to the external bulkhead, for - a total of 48 +- 9 ports from each switch chip connect to the unified backplane, to + connect the 18 compute node slots +- 3 ports on each chip provide connectivity between the chips +- 24 ports from each switch chip connect to the external bulkhead, for + a total of 48 -### **IB single-plane topology - ICEX Mcell** +###IB single-plane topology - ICEX Mcell** Each colour in each physical IRU represents one dual-switch ASIC switch. @@ -38,9 +38,9 @@ Hypercube](7d-enhanced-hypercube.html). As shown in a diagram [IB Topology](Salomon_IB_topology.png): -- Racks 21, 22, 23, 24, 25, 26 are equivalent to one Mcell rack. -- Racks 27, 28, 29, 30, 31, 32 are equivalent to one Mcell rack. -- Racks 33, 34, 35, 36, 37, 38 are equivalent to one Mcell rack. +- Racks 21, 22, 23, 24, 25, 26 are equivalent to one Mcell rack. +- Racks 27, 28, 29, 30, 31, 32 are equivalent to one Mcell rack. +- Racks 33, 34, 35, 36, 37, 38 are equivalent to one Mcell rack. 
[](https://docs.it4i.cz/salomon/network-1/ib-single-plane-topology/IB%20single-plane%20topology%20-%20Accelerated%20nodes.pdf) diff --git a/converted/docs.it4i.cz/salomon/network-1/network.md b/converted/docs.it4i.cz/salomon/network-1/network.md index 33912ca6878779375e567e22fe02bae48c02c8f2..dc1e8ac5e874112fc46f81bbc545b20b593841de 100644 --- a/converted/docs.it4i.cz/salomon/network-1/network.md +++ b/converted/docs.it4i.cz/salomon/network-1/network.md @@ -3,7 +3,7 @@ Network - + All compute and login nodes of Salomon are interconnected by 7D Enhanced hypercube @@ -26,10 +26,10 @@ hypercube](7d-enhanced-hypercube.html). Read more about schematic representation of the Salomon cluster [IB single-plain topology](ib-single-plane-topology.html) ([hypercube dimension](7d-enhanced-hypercube.html) -0).[<span></span>](IB%20single-plane%20topology%20-%20Accelerated%20nodes.pdf/view.html){.state-missing-value -.contenttype-file} +0).[>](IB%20single-plane%20topology%20-%20Accelerated%20nodes.pdf/view.html){.state-missing-value + + -- - The compute nodes may be accessed via the Infiniband network using ib0 network interface, in address range 10.17.0.0 (mask 255.255.224.0). The @@ -44,27 +44,27 @@ The network provides **2170MB/s** transfer rates via the TCP connection Example ------- -``` +``` $ qsub -q qexp -l select=4:ncpus=16 -N Name0 ./myjob $ qstat -n -u username - Req'd Req'd Elap -Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time + Req'd Req'd Elap +Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time --------------- -------- -------- ---------- ------ --- --- ------ ----- - ----- -15209.isrv5 username qexp Name0 5530 4 96 -- 01:00 R 00:00 - r4i1n0/0*24+r4i1n1/0*24+r4i1n2/0*24+r4i1n3/0*24 +15209.isrv5 username qexp Name0 5530 4 96 -- 01:00 R 00:00 + r4i1n0/0*24+r4i1n1/0*24+r4i1n2/0*24+r4i1n3/0*24 ``` In this example, we access the node r4i1n0 by Infiniband network via the ib0 interface. -``` +``` $ ssh 10.17.35.19 ``` -In this example, we <span style="text-align: start; float: none; ">get -information of the Infiniband network.</span> +In this example, we get +information of the Infiniband network. -``` +``` $ ifconfig .... inet addr:10.17.35.19.... diff --git a/converted/docs.it4i.cz/salomon/prace.md b/converted/docs.it4i.cz/salomon/prace.md index 1270e9a129460c96c0bc33ae80adeb76b744703d..ecbac35e56165c3973a87bb5153e771a05275ac7 100644 --- a/converted/docs.it4i.cz/salomon/prace.md +++ b/converted/docs.it4i.cz/salomon/prace.md @@ -3,7 +3,7 @@ PRACE User Support - + Intro ----- @@ -25,7 +25,7 @@ All general [PRACE User Documentation](http://www.prace-ri.eu/user-documentation/) should be read before continuing reading the local documentation here. -[]()[]()Help and Support +[]()Help and Support ------------------------ If you have any troubles, need information, request support or want to @@ -70,28 +70,28 @@ project for LDAP account creation). 
Most of the information needed by PRACE users accessing the Salomon TIER-1 system can be found here: -- [General user's - FAQ](http://www.prace-ri.eu/Users-General-FAQs) -- [Certificates - FAQ](http://www.prace-ri.eu/Certificates-FAQ) -- [Interactive access using - GSISSH](http://www.prace-ri.eu/Interactive-Access-Using-gsissh) -- [Data transfer with - GridFTP](http://www.prace-ri.eu/Data-Transfer-with-GridFTP-Details) -- [Data transfer with - gtransfer](http://www.prace-ri.eu/Data-Transfer-with-gtransfer) +- [General user's + FAQ](http://www.prace-ri.eu/Users-General-FAQs) +- [Certificates + FAQ](http://www.prace-ri.eu/Certificates-FAQ) +- [Interactive access using + GSISSH](http://www.prace-ri.eu/Interactive-Access-Using-gsissh) +- [Data transfer with + GridFTP](http://www.prace-ri.eu/Data-Transfer-with-GridFTP-Details) +- [Data transfer with + gtransfer](http://www.prace-ri.eu/Data-Transfer-with-gtransfer)  Before you start to use any of the services don't forget to create a proxy certificate from your certificate: - $ grid-proxy-init + $ grid-proxy-init To check whether your proxy certificate is still valid (by default it's valid 12 hours), use: - $ grid-proxy-info + $ grid-proxy-info  @@ -99,53 +99,53 @@ To access Salomon cluster, two login nodes running GSI SSH service are available. The service is available from public Internet as well as from the internal PRACE network (accessible only from other PRACE partners). -**Access from PRACE network:** +Access from PRACE network:** -It is recommended to use the single DNS name <span -class="monospace">salomon-prace.it4i.cz</span> which is distributed +It is recommended to use the single DNS name +salomon-prace.it4i.cz which is distributed between the two login nodes. If needed, user can login directly to one of the login nodes. The addresses are: - Login address Port Protocol Login node - ------------------------------ ------ ---------- ---------------------------------- - salomon-prace.it4i.cz 2222 gsissh login1, login2, login3 or login4 - login1-prace.salomon.it4i.cz 2222 gsissh login1 - login2-prace.salomon.it4i.cz 2222 gsissh login2 - login3-prace.salomon.it4i.cz 2222 gsissh login3 - login4-prace.salomon.it4i.cz 2222 gsissh login4 +Login address Port Protocol Login node +------------------------------ ------ ---------- ---------------------------------- +salomon-prace.it4i.cz 2222 gsissh login1, login2, login3 or login4 +login1-prace.salomon.it4i.cz 2222 gsissh login1 +login2-prace.salomon.it4i.cz 2222 gsissh login2 +login3-prace.salomon.it4i.cz 2222 gsissh login3 +login4-prace.salomon.it4i.cz 2222 gsissh login4  - $ gsissh -p 2222 salomon-prace.it4i.cz + $ gsissh -p 2222 salomon-prace.it4i.cz When logging from other PRACE system, the prace_service script can be used: - $ gsissh `prace_service -i -s salomon` + $ gsissh `prace_service -i -s salomon`  -**Access from public Internet:** +Access from public Internet:** -It is recommended to use the single DNS name <span -class="monospace">salomon.it4i.cz</span> which is distributed between +It is recommended to use the single DNS name +salomon.it4i.cz which is distributed between the two login nodes. If needed, user can login directly to one of the login nodes. 
The addresses are: - Login address Port Protocol Login node - ------------------------ ------ ---------- ---------------------------------- - salomon.it4i.cz 2222 gsissh login1, login2, login3 or login4 - login1.salomon.it4i.cz 2222 gsissh login1 - login2.salomon.it4i.cz 2222 gsissh login2 - login3.salomon.it4i.cz 2222 gsissh login3 - login4.salomon.it4i.cz 2222 gsissh login4 +Login address Port Protocol Login node +------------------------ ------ ---------- ---------------------------------- +salomon.it4i.cz 2222 gsissh login1, login2, login3 or login4 +login1.salomon.it4i.cz 2222 gsissh login1 +login2.salomon.it4i.cz 2222 gsissh login2 +login3.salomon.it4i.cz 2222 gsissh login3 +login4.salomon.it4i.cz 2222 gsissh login4 - $ gsissh -p 2222 salomon.it4i.cz + $ gsissh -p 2222 salomon.it4i.cz -When logging from other PRACE system, the <span -class="monospace">prace_service</span> script can be used: +When logging from other PRACE system, the +prace_service script can be used: - $ gsissh `prace_service -e -s salomon` + $ gsissh `prace_service -e -s salomon`  @@ -154,13 +154,13 @@ GridFTP](prace.html#file-transfers), the GSI SSH implementation on Salomon supports also SCP, so for small files transfer gsiscp can be used: - $ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ salomon.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ + $ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ salomon.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ - $ gsiscp -P 2222 salomon.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ + $ gsiscp -P 2222 salomon.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ - $ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ salomon-prace.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ + $ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ salomon-prace.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ - $ gsiscp -P 2222 salomon-prace.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ + $ gsiscp -P 2222 salomon-prace.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_ ### Access to X11 applications (VNC) @@ -175,7 +175,7 @@ the SSH based access ([look here](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html)), only the port forwarding must be done using GSI SSH: - $ gsissh -p 2222 salomon.it4i.cz -L 5961:localhost:5961 + $ gsissh -p 2222 salomon.it4i.cz -L 5961:localhost:5961 ### Access with SSH @@ -185,7 +185,7 @@ regular users using SSH. For more information please see the [section in general documentation](accessing-the-cluster/shell-and-data-access/shell-and-data-access.html). -[]()File transfers +File transfers ------------------ PRACE users can use the same transfer mechanisms as regular users (if @@ -202,68 +202,68 @@ PRACE partners). There's one control server and three backend servers for striping and/or backup in case one of them would fail. 
-**Access from PRACE network:** +Access from PRACE network:** - Login address Port Node role - ------------------------------- ------ ----------------------------- - gridftp-prace.salomon.it4i.cz 2812 Front end /control server - lgw1-prace.salomon.it4i.cz 2813 Backend / data mover server - lgw2-prace.salomon.it4i.cz 2813 Backend / data mover server - lgw3-prace.salomon.it4i.cz 2813 Backend / data mover server +Login address Port Node role +------------------------------- ------ ----------------------------- +gridftp-prace.salomon.it4i.cz 2812 Front end /control server +lgw1-prace.salomon.it4i.cz 2813 Backend / data mover server +lgw2-prace.salomon.it4i.cz 2813 Backend / data mover server +lgw3-prace.salomon.it4i.cz 2813 Backend / data mover server Copy files **to** Salomon by running the following commands on your local machine: - $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp-prace.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ + $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp-prace.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ -Or by using <span class="monospace">prace_service</span> script: +Or by using prace_service script: - $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -i -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ + $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -i -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ Copy files **from** Salomon: - $ globus-url-copy gsiftp://gridftp-prace.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ + $ globus-url-copy gsiftp://gridftp-prace.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ -Or by using <span class="monospace">prace_service</span> script: +Or by using prace_service script: - $ globus-url-copy gsiftp://`prace_service -i -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ + $ globus-url-copy gsiftp://`prace_service -i -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_  -**Access from public Internet:** +Access from public Internet:** - Login address Port Node role - ------------------------- ------ ----------------------------- - gridftp.salomon.it4i.cz 2812 Front end /control server - lgw1.salomon.it4i.cz 2813 Backend / data mover server - lgw2.salomon.it4i.cz 2813 Backend / data mover server - lgw3.salomon.it4i.cz 2813 Backend / data mover server +Login address Port Node role +------------------------- ------ ----------------------------- +gridftp.salomon.it4i.cz 2812 Front end /control server +lgw1.salomon.it4i.cz 2813 Backend / data mover server +lgw2.salomon.it4i.cz 2813 Backend / data mover server +lgw3.salomon.it4i.cz 2813 Backend / data mover server Copy files **to** Salomon by running the following commands on your local machine: - $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ + $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ -Or by using <span class="monospace">prace_service</span> script: +Or by using prace_service script: - $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -e -f 
salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ + $ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -e -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ Copy files **from** Salomon: - $ globus-url-copy gsiftp://gridftp.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ + $ globus-url-copy gsiftp://gridftp.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ -Or by using <span class="monospace">prace_service</span> script: +Or by using prace_service script: - $ globus-url-copy gsiftp://`prace_service -e -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_ + $ globus-url-copy gsiftp://`prace_service -e -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_  Generally both shared file systems are available through GridFTP: - File system mount point Filesystem Comment - ------------------------- ------------ ---------------------------------------------------------------- - /home Lustre Default HOME directories of users in format /home/prace/login/ - /scratch Lustre Shared SCRATCH mounted on the whole cluster +File system mount point Filesystem Comment +------------------------- ------------ ---------------------------------------------------------------- +/home Lustre Default HOME directories of users in format /home/prace/login/ +/scratch Lustre Shared SCRATCH mounted on the whole cluster More information about the shared file systems is available [here](storage.html). @@ -271,10 +271,10 @@ More information about the shared file systems is available Please note, that for PRACE users a "prace" directory is used also on the SCRATCH file system. - Data type Default path - ------------------------------ --------------------------------- - large project files /scratch/work/user/prace/login/ - large scratch/temporary data /scratch/temp/ +Data type Default path +------------------------------ --------------------------------- +large project files /scratch/work/user/prace/login/ +large scratch/temporary data /scratch/temp/ Usage of the cluster -------------------- @@ -301,7 +301,7 @@ PRACE users can use the "prace" module to use the [PRACE Common Production Environment](http://www.prace-ri.eu/PRACE-common-production). - $ module load prace + $ module load prace  @@ -314,22 +314,22 @@ documentation](resource-allocation-and-job-execution/introduction.html). For PRACE users, the default production run queue is "qprace". PRACE users can also use two other queues "qexp" and "qfree". 
- --------------------------------------------------------------------------------------------------------------------------------------------- - queue Active project Project resources Nodes priority authorization walltime - default/max - --------------------- ---------------- ------------------- ----------------------------------------- ---------- --------------- ------------- - **qexp** no none required 32 nodes, max 8 per user 150 no 1 / 1h - Express queue +--------------------------------------------------------------------------------------------------------------------------------------------- +queue Active project Project resources Nodes priority authorization walltime + default/max +--------------------- ---------------- ------------------- ----------------------------------------- ---------- --------------- ------------- +qexp** no none required 32 nodes, max 8 per user 150 no 1 / 1h +Express queue - **qprace** yes > 0 <span>1006 nodes, max 86 per job</span> 0 no 24 / 48h - Production queue - +qprace** yes > 0 >1006 nodes, max 86 per job 0 no 24 / 48h +Production queue + - **qfree** yes none required 752 nodes, max 86 per job -1024 no 12 / 12h - Free resource queue - --------------------------------------------------------------------------------------------------------------------------------------------- +qfree** yes none required 752 nodes, max 86 per job -1024 no 12 / 12h +Free resource queue +--------------------------------------------------------------------------------------------------------------------------------------------- -**qprace**, the PRACE Production queue****: This queue is intended for +qprace**, the PRACE Production queue****: This queue is intended for normal production runs. It is required that active project with nonzero remaining resources is specified to enter the qprace. The queue runs with medium priority and no special authorization is required to use it. @@ -362,20 +362,20 @@ The **it4ifree** command is a part of it4i.portal.clients package, located here: <https://pypi.python.org/pypi/it4i.portal.clients> - $ it4ifree - Password: -     PID  Total Used ...by me Free -   -------- ------- ------ -------- ------- -   OPEN-0-0 1500000 400644  225265 1099356 -   DD-13-1   10000 2606 2606 7394 + $ it4ifree + Password: +     PID  Total Used ...by me Free +   -------- ------- ------ -------- ------- +   OPEN-0-0 1500000 400644  225265 1099356 +   DD-13-1   10000 2606 2606 7394  By default file system quota is applied. To check the current status of the quota (separate for HOME and SCRATCH) use - $ quota - $ lfs quota -u USER_LOGIN /scratch + $ quota + $ lfs quota -u USER_LOGIN /scratch If the quota is insufficient, please contact the [support](prace.html#help-and-support) and request an diff --git a/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/capacity-computing.md b/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/capacity-computing.md index b6a6b3b89914c55a9eff3f987e60f6e965258e27..924cfbeff5667a28479254233eb057bf24f3393c 100644 --- a/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/capacity-computing.md +++ b/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/capacity-computing.md @@ -3,12 +3,12 @@ Capacity computing - + Introduction ------------ -In many cases, it is useful to submit huge (<span>100+</span>) number of +In many cases, it is useful to submit huge (>100+) number of computational jobs into the PBS queue system. 
Huge number of (small) jobs is one of the most effective ways to execute embarrassingly parallel calculations, achieving best runtime, throughput and computer @@ -21,28 +21,28 @@ for all users. For this reason, the number of jobs is **limited to 100 per user, 1500 per job array** Please follow one of the procedures below, in case you wish to schedule -more than <span>100</span> jobs at a time. - -- Use [Job arrays](capacity-computing.html#job-arrays) - when running huge number of - [multithread](capacity-computing.html#shared-jobscript-on-one-node) - (bound to one node only) or multinode (multithread across - several nodes) jobs -- Use [GNU - parallel](capacity-computing.html#gnu-parallel) when - running single core jobs -- Combine[GNU parallel with Job - arrays](capacity-computing.html#combining-job-arrays-and-gnu-parallel) - when running huge number of single core jobs +more than >100 jobs at a time. + +- Use [Job arrays](capacity-computing.html#job-arrays) + when running huge number of + [multithread](capacity-computing.html#shared-jobscript-on-one-node) + (bound to one node only) or multinode (multithread across + several nodes) jobs +- Use [GNU + parallel](capacity-computing.html#gnu-parallel) when + running single core jobs +- Combine[GNU parallel with Job + arrays](capacity-computing.html#combining-job-arrays-and-gnu-parallel) + when running huge number of single core jobs Policy ------ -1. A user is allowed to submit at most 100 jobs. Each job may be [a job - array](capacity-computing.html#job-arrays). -2. The array size is at most 1000 subjobs. +1.A user is allowed to submit at most 100 jobs. Each job may be [a job + array](capacity-computing.html#job-arrays). +2.The array size is at most 1000 subjobs. -[]()Job arrays +Job arrays -------------- Huge number of jobs may be easily submitted and managed as a job array. @@ -51,22 +51,22 @@ A job array is a compact representation of many jobs, called subjobs. The subjobs share the same job script, and have the same values for all attributes and resources, with the following exceptions: -- each subjob has a unique index, $PBS_ARRAY_INDEX -- job Identifiers of subjobs only differ by their indices -- the state of subjobs can differ (R,Q,...etc.) +- each subjob has a unique index, $PBS_ARRAY_INDEX +- job Identifiers of subjobs only differ by their indices +- the state of subjobs can differ (R,Q,...etc.) All subjobs within a job array have the same scheduling priority and schedule as independent jobs. Entire job array is submitted through a single qsub command and may be managed by qdel, qalter, qhold, qrls and qsig commands as a single job. -### []()Shared jobscript +### Shared jobscript All subjobs in job array use the very same, single jobscript. Each subjob runs its own instance of the jobscript. The instances execute different work controlled by $PBS_ARRAY_INDEX variable. -[]()Example: +Example: Assume we have 900 input files with name beginning with "file" (e. g. file001, ..., file900). Assume we would like to use each of these input @@ -75,13 +75,13 @@ files with program executable myprog.x, each as a separate job. First, we create a tasklist file (or subjobs list), listing all tasks (subjobs) - all input files in our example: -``` +``` $ find . 
-name 'file*' > tasklist ``` Then we create jobscript: -``` +``` #!/bin/bash #PBS -A PROJECT_ID #PBS -q qprod @@ -92,7 +92,7 @@ SCR=/scratch/work/user/$USER/$PBS_JOBID mkdir -p $SCR ; cd $SCR || exit # get individual tasks from tasklist with index from PBS JOB ARRAY -TASK=$(sed -n "${PBS_ARRAY_INDEX}p" $PBS_O_WORKDIR/tasklist) +TASK=$(sed -n "${PBS_ARRAY_INDEX}p" $PBS_O_WORKDIR/tasklist) # copy input file and executable to scratch cp $PBS_O_WORKDIR/$TASK input ; cp $PBS_O_WORKDIR/myprog.x . @@ -108,8 +108,8 @@ In this example, the submit directory holds the 900 input files, executable myprog.x and the jobscript file. As input for each run, we take the filename of input file from created tasklist file. We copy the input file to scratch /scratch/work/user/$USER/$PBS_JOBID, execute -the myprog.x and copy the output file back to <span>the submit -directory</span>, under the $TASK.out name. The myprog.x runs on one +the myprog.x and copy the output file back to >the submit +directory, under the $TASK.out name. The myprog.x runs on one node only and must use threads to run in parallel. Be aware, that if the myprog.x **is not multithreaded**, then all the **jobs are run as single thread programs in sequential** manner. Due to allocation of the whole @@ -129,7 +129,7 @@ To submit the job array, use the qsub -J command. The 900 jobs of the [example above](capacity-computing.html#array_example) may be submitted like this: -``` +``` $ qsub -N JOBNAME -J 1-900 jobscript 506493[].isrv5 ``` @@ -142,7 +142,7 @@ forget to set your valid PROJECT_ID and desired queue). Sometimes for testing purposes, you may need to submit only one-element array. This is not allowed by PBSPro, but there's a workaround: -``` +``` $ qsub -N JOBNAME -J 9-10:2 jobscript ``` @@ -153,40 +153,40 @@ submitting/running your job. Check status of the job array by the qstat command. -``` +``` $ qstat -a 506493[].isrv5 isrv5: - Req'd Req'd Elap -Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time + Req'd Req'd Elap +Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time --------------- -------- -------- ---------- ------ --- --- ------ ----- - ----- -12345[].dm2 user2 qprod xx 13516 1 24 -- 00:50 B 00:02 +12345[].dm2 user2 qprod xx 13516 1 24 -- 00:50 B 00:02 ``` The status B means that some subjobs are already running. Check status of the first 100 subjobs by the qstat command. -``` +``` $ qstat -a 12345[1-100].isrv5 isrv5: - Req'd Req'd Elap -Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time + Req'd Req'd Elap +Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time --------------- -------- -------- ---------- ------ --- --- ------ ----- - ----- -12345[1].isrv5 user2 qprod xx 13516 1 24 -- 00:50 R 00:02 -12345[2].isrv5 user2 qprod xx 13516 1 24 -- 00:50 R 00:02 -12345[3].isrv5 user2 qprod xx 13516 1 24 -- 00:50 R 00:01 -12345[4].isrv5 user2 qprod xx 13516 1 24 -- 00:50 Q -- - . . . . . . . . . . . - , . . . . . . . . . . -12345[100].isrv5 user2 qprod xx 13516 1 24 -- 00:50 Q -- +12345[1].isrv5 user2 qprod xx 13516 1 24 -- 00:50 R 00:02 +12345[2].isrv5 user2 qprod xx 13516 1 24 -- 00:50 R 00:02 +12345[3].isrv5 user2 qprod xx 13516 1 24 -- 00:50 R 00:01 +12345[4].isrv5 user2 qprod xx 13516 1 24 -- 00:50 Q -- + . . . . . . . . . . . + , . . . . . . . . . . +12345[100].isrv5user2 qprod xx 13516 1 24 -- 00:50 Q -- ``` Delete the entire job array. Running subjobs will be killed, queueing subjobs will be deleted. 
-``` +``` $ qdel 12345[].isrv5 ``` @@ -194,20 +194,20 @@ Deleting large job arrays may take a while. Display status information for all user's jobs, job arrays, and subjobs. -``` +``` $ qstat -u $USER -t ``` Display status information for all user's subjobs. -``` +``` $ qstat -u $USER -tJ ``` Read more on job arrays in the [PBSPro Users guide](../../pbspro-documentation.html). -[]()GNU parallel +GNU parallel ---------------- Use GNU parallel to run many single core tasks on one node. @@ -219,7 +219,7 @@ useful in running single core jobs via the queue system on Anselm. For more information and examples see the parallel man page: -``` +``` $ module add parallel $ man parallel ``` @@ -230,7 +230,7 @@ The GNU parallel shell executes multiple instances of the jobscript using all cores on the node. The instances execute different work, controlled by the $PARALLEL_SEQ variable. -[]()Example: +Example: Assume we have 101 input files with name beginning with "file" (e. g. file001, ..., file101). Assume we would like to use each of these input @@ -240,13 +240,13 @@ job. We call these single core jobs tasks. First, we create a tasklist file, listing all tasks - all input files in our example: -``` +``` $ find . -name 'file*' > tasklist ``` Then we create jobscript: -``` +``` #!/bin/bash #PBS -A PROJECT_ID #PBS -q qprod @@ -288,7 +288,7 @@ To submit the job, use the qsub command. The 101 tasks' job of the [example above](capacity-computing.html#gp_example) may be submitted like this: -``` +``` $ qsub -N JOBNAME jobscript 12345.dm2 ``` @@ -300,7 +300,7 @@ complete in less than 2 hours. Please note the #PBS directives in the beginning of the jobscript file, dont' forget to set your valid PROJECT_ID and desired queue. -[]()Job arrays and GNU parallel +Job arrays and GNU parallel ------------------------------- Combine the Job arrays and GNU parallel for best throughput of single @@ -322,7 +322,7 @@ GNU parallel shell executes multiple instances of the jobscript using all cores on the node. The instances execute different work, controlled by the $PBS_JOB_ARRAY and $PARALLEL_SEQ variables. -[]()Example: +Example: Assume we have 992 input files with name beginning with "file" (e. g. file001, ..., file992). Assume we would like to use each of these input @@ -332,20 +332,20 @@ job. We call these single core jobs tasks. First, we create a tasklist file, listing all tasks - all input files in our example: -``` +``` $ find . -name 'file*' > tasklist ``` Next we create a file, controlling how many tasks will be executed in one subjob -``` +``` $ seq 32 > numtasks ``` Then we create jobscript: -``` +``` #!/bin/bash #PBS -A PROJECT_ID #PBS -q qprod @@ -385,13 +385,13 @@ Select subjob walltime and number of tasks per subjob carefully  When deciding this values, think about following guiding rules : -1. Let n=N/24. Inequality (n+1) * T < W should hold. The N is - number of tasks per subjob, T is expected single task walltime and W - is subjob walltime. Short subjob walltime improves scheduling and - job throughput. -2. Number of tasks should be modulo 24. -3. These rules are valid only when all tasks have similar task - walltimes T. +1.Let n=N/24. Inequality (n+1) * T < W should hold. The N is + number of tasks per subjob, T is expected single task walltime and W + is subjob walltime. Short subjob walltime improves scheduling and + job throughput. +2.Number of tasks should be modulo 24. +3.These rules are valid only when all tasks have similar task + walltimes T. 
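For example (an illustrative calculation only, using the rules above): with N=48 tasks per subjob and an expected single-task walltime T=0.5 hour, n=N/24=2, so the subjob walltime W must satisfy (2+1)*0.5 h = 1.5 h < W. Requesting walltime=02:00:00 would satisfy the inequality while keeping the subjobs reasonably short.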
### Submit the job array @@ -400,7 +400,7 @@ the [example above](capacity-computing.html#combined_example) may be submitted like this: -``` +``` $ qsub -N JOBNAME -J 1-992:32 jobscript 12345[].dm2 ``` @@ -426,7 +426,7 @@ production jobs. Unzip the archive in an empty directory on Anselm and follow the instructions in the README file -``` +``` $ unzip capacity.zip $ cd capacity $ cat README diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/fairshare_formula.png b/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/fairshare_formula.png similarity index 100% rename from converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/fairshare_formula.png rename to converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/fairshare_formula.png diff --git a/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/introduction.md b/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/introduction.md index 4332b2fd9c3f428d36cc40e2f7f18509214283f7..3fb197f639f8b26b1e2bc024b2ca65fc3539a28d 100644 --- a/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/introduction.md +++ b/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/introduction.md @@ -3,7 +3,7 @@ Resource Allocation and Job Execution - + To run a [job](job-submission-and-execution.html), [computational @@ -28,12 +28,12 @@ queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. Following queues are available to Anselm users: -- **qexp**, the Express queue -- **qprod**, the Production queue**** -- **qlong**, the Long queue -- **qmpp**, the Massively parallel queue -- **qfat**, the queue to access SMP UV2000 machine -- **qfree,** the Free resource utilization queue +- **qexp**, the Express queue +- **qprod**, the Production queue**** +- **qlong**, the Long queue +- **qmpp**, the Massively parallel queue +- **qfat**, the queue to access SMP UV2000 machine +- **qfree,** the Free resource utilization queue Check the queue status at <https://extranet.it4i.cz/rsweb/salomon/> diff --git a/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/job-priority.md b/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/job-priority.md index 31f9f1b5afd9294f187621d64e40d2b0311d5fbb..0a0caaecc83507be4af55df289bc2311c6803754 100644 --- a/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/job-priority.md +++ b/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/job-priority.md @@ -10,9 +10,9 @@ execution priority to select which job(s) to run. Job execution priority is determined by these job properties (in order of importance): -1. queue priority -2. fairshare priority -3. eligible time +1.queue priority +2.fairshare priority +3.eligible time ### Queue priority @@ -39,7 +39,7 @@ Fairshare priority is used for ranking jobs with equal queue priority. Fairshare priority is calculated as - + where MAX_FAIRSHARE has value 1E6, usage~Project~ is cumulated usage by all members of selected project, @@ -49,37 +49,37 @@ Usage counts allocated corehours (ncpus*walltime). Usage is decayed, or cut in half periodically, at the interval 168 hours (one week). Jobs queued in queue qexp are not calculated to project's usage. 
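The formula itself is shown only as an image (fairshare_formula.png) in the original page; from the definitions above (MAX_FAIRSHARE = 1E6, the project's cumulated usage and the total usage), it presumably takes the form

    fairshare_priority = MAX_FAIRSHARE * (1 - usage_Project / usage_Total)

so that a project which has consumed a larger share of the total usage receives a lower fairshare priority.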
-<span>Calculated usage and fairshare priority can be seen at -<https://extranet.it4i.cz/rsweb/salomon/projects>.</span> +>Calculated usage and fairshare priority can be seen at +<https://extranet.it4i.cz/rsweb/salomon/projects>. -<span> -<span>Calculated fairshare priority can be also seen as -Resource_List.fairshare attribute of a job.</span> -</span> +> +>Calculated fairshare priority can be also seen as +Resource_List.fairshare attribute of a job. -### <span>Eligible time</span> + +### >Eligible time Eligible time is amount (in seconds) of eligible time job accrued while waiting to run. Jobs with higher eligible time gains higher -pri<span><span></span></span>ority. +pri>>ority. Eligible time has the least impact on execution priority. Eligible time is used for sorting jobs with equal queue priority and fairshare -priority. It is very, very difficult for <span>eligible time</span> to +priority. It is very, very difficult for >eligible time to compete with fairshare priority. -<span><span>Eligible time can be seen as eligible_time attribute of -job.</span></span> +>>Eligible time can be seen as eligible_time attribute of +job. ### Formula Job execution priority (job sort formula) is calculated as: - + ### Job backfilling -<span>The scheduler uses job backfilling.</span> +>The scheduler uses job backfilling. Backfilling means fitting smaller jobs around the higher-priority jobs that the scheduler is going to run next, in such a way that the diff --git a/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/job-submission-and-execution.md b/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/job-submission-and-execution.md index 792c2c27faee00c5e1666812b2a4ab2a484debef..fb8a5a1b01efe259017eff72d1de77a3d853251d 100644 --- a/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/job-submission-and-execution.md +++ b/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/job-submission-and-execution.md @@ -3,27 +3,27 @@ Job submission and execution - + Job Submission -------------- When allocating computational resources for the job, please specify -1. suitable queue for your job (default is qprod) -2. number of computational nodes required -3. number of cores per node required -4. maximum wall time allocated to your calculation, note that jobs - exceeding maximum wall time will be killed -5. Project ID -6. Jobscript or interactive switch +1.suitable queue for your job (default is qprod) +2.number of computational nodes required +3.number of cores per node required +4.maximum wall time allocated to your calculation, note that jobs + exceeding maximum wall time will be killed +5.Project ID +6.Jobscript or interactive switch Use the **qsub** command to submit your job to a queue for allocation of the computational resources. Submit the job using the qsub command: -``` +``` $ qsub -A Project_ID -q queue -l select=x:ncpus=y,walltime=[[hh:]mm:]ss[.ms] jobscript ``` @@ -36,11 +36,11 @@ on first of the allocated nodes.** PBS statement nodes (qsub -l nodes=nodespec) is not supported on Salomon cluster.** -** + ### Job Submission Examples -``` +``` $ qsub -A OPEN-0-0 -q qprod -l select=64:ncpus=24,walltime=03:00:00 ./myjob ``` @@ -51,7 +51,7 @@ myjob will be executed on the first node in the allocation.  -``` +``` $ qsub -q qexp -l select=4:ncpus=24 -I ``` @@ -61,7 +61,7 @@ available interactively  -``` +``` $ qsub -A OPEN-0-0 -q qlong -l select=10:ncpus=24 ./myjob ``` @@ -71,7 +71,7 @@ executed on the first node in the allocation.  
-``` +``` $ qsub -A OPEN-0-0 -q qfree -l select=10:ncpus=24 ./myjob ``` @@ -97,16 +97,16 @@ resources have been spent, etc. The Phi cards are thus also available to PRACE users. There's no need to ask for permission to utilize the Phi cards in project proposals. -``` -$ qsub -A OPEN-0-0 -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 ./myjob +``` +$ qsub-A OPEN-0-0 -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 ./myjob ``` In this example, we allocate 1 node, with 24 cores, with 2 Xeon Phi 7120p cards, running batch job ./myjob. The default time for qprod is used, e. g. 24 hours. -``` -$ qsub -A OPEN-0-0 -I -q qlong -l select=4:ncpus=24:accelerator=True:naccelerators=2 -l walltime=56:00:00 -I +``` +$ qsub-A OPEN-0-0 -I -q qlong -l select=4:ncpus=24:accelerator=True:naccelerators=2 -l walltime=56:00:00 -I ``` In this example, we allocate 4 nodes, with 24 cores per node (totalling @@ -130,7 +130,7 @@ user may not utilize CPU or memory allocated to a job by other user. Always, full chunks are allocated, a job may only use resources of the NUMA nodes allocated to itself. -``` +```  $ qsub -A OPEN-0-0 -q qfat -l select=14 ./myjob ``` @@ -138,7 +138,7 @@ In this example, we allocate all 14 NUMA nodes (corresponds to 14 chunks), 112 cores of the SGI UV2000 node for 72 hours. Jobscript myjob will be executed on the node uv1. -``` +``` $ qsub -A OPEN-0-0 -q qfat -l select=1:mem=2000GB ./myjob ``` @@ -152,7 +152,7 @@ All qsub options may be [saved directly into the jobscript](job-submission-and-execution.html#PBSsaved). In such a case, no options to qsub are needed. -``` +``` $ qsub ./myjob ``` @@ -161,24 +161,24 @@ $ qsub ./myjob By default, the PBS batch system sends an e-mail only when the job is aborted. Disabling mail events completely can be done like this: -``` +``` $ qsub -m n ``` -[]()Advanced job placement +Advanced job placement -------------------------- ### Placement by name Specific nodes may be allocated via the PBS -``` +``` qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=24:host=r24u35n680+1:ncpus=24:host=r24u36n681 -I ``` Or using short names -``` +``` qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=24:host=cns680+1:ncpus=24:host=cns681 -I ``` @@ -191,19 +191,19 @@ available interactively. Nodes may be selected via the PBS resource attribute ehc_[1-7]d . - Hypercube dimension <span class="pun">node_group_key</span> - --------------------- ------------------------------------------- - 1D ehc_1d - 2D ehc_2d - 3D ehc_3d - 4D ehc_4d - 5D ehc_5d - 6D ehc_6d - 7D ehc_7d +Hypercube dimension +--------------------- ------------------------------------------- +1D ehc_1d +2D ehc_2d +3D ehc_3d +4D ehc_4d +5D ehc_5d +6D ehc_6d +7D ehc_7d  -``` +``` $ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=24 -l place=group=ehc_1d -I ``` @@ -232,7 +232,7 @@ Infiniband switch list: -``` +``` $ qmgr -c "print node @a" | grep switch set node r4i1n11 resources_available.switch = r4i1s0sw1 set node r2i0n0 resources_available.switch = r2i0s0sw1 @@ -245,7 +245,7 @@ List of all nodes per Infiniband switch: -``` +``` $ qmgr -c "print node @a" | grep r36sw3 set node r36u31n964 resources_available.switch = r36sw3 set node r36u32n965 resources_available.switch = r36sw3 @@ -267,14 +267,14 @@ efficiently: -``` +``` $ qsub -A OPEN-0-0 -q qprod -l select=9:ncpus=24:switch=r4i1s0sw1 ./myjob ``` In this example, we request all the 9 nodes sharing the r4i1s0sw1 switch for 24 hours. 
-``` +``` $ qsub -A OPEN-0-0 -q qprod -l select=9:ncpus=24 -l place=group=switch ./myjob ``` In this example, we request 9 nodes placed on the same switch using node @@ -289,25 +289,25 @@ Job Management Check status of your jobs using the **qstat** and **check-pbs-jobs** commands -``` +``` $ qstat -a $ qstat -a -u username $ qstat -an -u username $ qstat -f 12345.isrv5 ``` -[]()Example: +Example: -``` +``` $ qstat -a srv11: - Req'd Req'd Elap -Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time + Req'd Req'd Elap +Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time --------------- -------- -------- ---------- ------ --- --- ------ ----- - ----- -16287.isrv5 user1 qlong job1 6183 4 64 -- 144:0 R 38:25 -16468.isrv5 user1 qlong job2 8060 4 64 -- 144:0 R 17:44 -16547.isrv5 user2 qprod job3x 13516 2 32 -- 48:00 R 00:58 +16287.isrv5 user1 qlong job1 6183 4 64 -- 144:0 R 38:25 +16468.isrv5 user1 qlong job2 8060 4 64 -- 144:0 R 17:44 +16547.isrv5 user2 qprod job3x 13516 2 32 -- 48:00 R 00:58 ``` In this example user1 and user2 are running jobs named job1, job2 and @@ -323,7 +323,7 @@ of user's PBS jobs' processes on execution hosts. Display load, processes. Display job standard and error output. Continuously display (tail -f) job standard or error output. -``` +``` $ check-pbs-jobs --check-all $ check-pbs-jobs --print-load --print-processes $ check-pbs-jobs --print-job-out --print-job-err @@ -333,7 +333,7 @@ $ check-pbs-jobs --jobid JOBID --tailf-job-out Examples: -``` +``` $ check-pbs-jobs --check-all JOB 35141.dm2, session_id 71995, user user2, nodes r3i6n2,r3i6n3 Check session id: OK @@ -345,16 +345,16 @@ r3i6n3: No process In this example we see that job 35141.dm2 currently runs no process on allocated node r3i6n2, which may indicate an execution error. -``` +``` $ check-pbs-jobs --print-load --print-processes JOB 35141.dm2, session_id 71995, user user2, nodes r3i6n2,r3i6n3 Print load r3i6n2: LOAD: 16.01, 16.01, 16.00 -r3i6n3: LOAD: 0.01, 0.00, 0.01 +r3i6n3: LOAD:0.01, 0.00, 0.01 Print processes - %CPU CMD -r3i6n2: 0.0 -bash -r3i6n2: 0.0 /bin/bash /var/spool/PBS/mom_priv/jobs/35141.dm2.SC + %CPU CMD +r3i6n2:0.0 -bash +r3i6n2:0.0 /bin/bash /var/spool/PBS/mom_priv/jobs/35141.dm2.SC r3i6n2: 99.7 run-task ... ``` @@ -363,11 +363,11 @@ In this example we see that job 35141.dm2 currently runs process run-task on node r3i6n2, using one thread only, while node r3i6n3 is empty, which may indicate an execution error. -``` +``` $ check-pbs-jobs --jobid 35141.dm2 --print-job-out JOB 35141.dm2, session_id 71995, user user2, nodes r3i6n2,r3i6n3 Print job standard output: -======================== Job start ========================== +======================== Job start========================== Started at   : Fri Aug 30 02:47:53 CEST 2013 Script name  : script Run loop 1 @@ -379,23 +379,23 @@ In this example, we see actual output (some iteration loops) of the job 35141.dm2 Manage your queued or running jobs, using the **qhold**, **qrls**, -**qdel,** **qsig** or **qalter** commands +qdel,** **qsig** or **qalter** commands You may release your allocation at any time, using qdel command -``` +``` $ qdel 12345.isrv5 ``` You may kill a running job by force, using qsig command -``` +``` $ qsig -s 9 12345.isrv5 ``` Learn more by reading the pbs man page -``` +``` $ man pbs_professional ``` @@ -415,16 +415,16 @@ manager. The jobscript or interactive shell is executed on first of the allocated nodes. 
-``` +``` $ qsub -q qexp -l select=4:ncpus=24 -N Name0 ./myjob $ qstat -n -u username isrv5: - Req'd Req'd Elap -Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time + Req'd Req'd Elap +Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time --------------- -------- -------- ---------- ------ --- --- ------ ----- - ----- -15209.isrv5 username qexp Name0 5530 4 96 -- 01:00 R 00:00 - r21u01n577/0*24+r21u02n578/0*24+r21u03n579/0*24+r21u04n580/0*24 +15209.isrv5 username qexp Name0 5530 4 96 -- 01:00 R 00:00 + r21u01n577/0*24+r21u02n578/0*24+r21u03n579/0*24+r21u04n580/0*24 ```  In this example, the nodes r21u01n577, r21u02n578, r21u03n579, @@ -435,7 +435,7 @@ nodes r21u02n578, r21u03n579, r21u04n580 are available for use as well. The jobscript or interactive shell is by default executed in home directory -``` +``` $ qsub -q qexp -l select=4:ncpus=24 -I qsub: waiting for job 15210.isrv5 to start qsub: job 15210.isrv5 ready @@ -457,7 +457,7 @@ Calculations on allocated nodes may be executed remotely via the MPI, ssh, pdsh or clush. You may find out which nodes belong to the allocation by reading the $PBS_NODEFILE file -``` +``` qsub -q qexp -l select=2:ncpus=24 -I qsub: waiting for job 15210.isrv5 to start qsub: job 15210.isrv5 ready @@ -492,7 +492,7 @@ The recommended way to run production jobs is to change to /scratch directory early in the jobscript, copy all inputs to /scratch, execute the calculations and copy outputs to home directory. -``` +``` #!/bin/bash # change to scratch directory, exit on failure @@ -535,14 +535,14 @@ subsequent calculation. In such a case, it is users responsibility to preload the input files on shared /scratch before the job submission and retrieve the outputs manually, after all calculations are finished. -[]()Store the qsub options within the jobscript. +Store the qsub options within the jobscript. Use **mpiprocs** and **ompthreads** qsub options to control the MPI job execution. Example jobscript for an MPI job with preloaded inputs and executables, options for qsub are stored within the script : -``` +``` #!/bin/bash #PBS -q qprod #PBS -N MYJOB @@ -567,14 +567,14 @@ exit In this example, input and executable files are assumed preloaded manually in /scratch/$USER/myjob directory. Note the **mpiprocs** and -**ompthreads** qsub options, controlling behavior of the MPI execution. +ompthreads** qsub options, controlling behavior of the MPI execution. The mympiprog.x is executed as one process per node, on all 100 allocated nodes. If mympiprog.x implements OpenMP threads, it will run 24 threads per node. HTML commented section #2 (examples need to be reworked) -### Example Jobscript for Single Node Calculation[]() +### Example Jobscript for Single Node Calculation Local scratch directory is often useful for single node jobs. Local scratch will be deleted immediately after the job ends. @@ -584,7 +584,7 @@ operational memory. 
Example jobscript for single node calculation, using [local scratch](../storage.html) on the node: -``` +``` #!/bin/bash # change to local scratch directory diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/job_sort_formula.png b/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/job_sort_formula.png similarity index 100% rename from converted/docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/job_sort_formula.png rename to converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/job_sort_formula.png diff --git a/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/resources-allocation-policy.md b/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/resources-allocation-policy.md index 36ff83d28e0fdd010ed325ab683499f30a910eb8..d75ca58423cd5c23a793da977493b2031f096e00 100644 --- a/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/resources-allocation-policy.md +++ b/converted/docs.it4i.cz/salomon/resource-allocation-and-job-execution/resources-allocation-policy.md @@ -3,7 +3,7 @@ Resources Allocation Policy - + Resources Allocation Policy --------------------------- @@ -51,7 +51,7 @@ Express queue</td> <td align="left">none required</td> <td align="left">32 nodes, max 8 per user</td> <td align="left">24</td> -<td align="left"><span>150</span></td> +<td align="left">>150</td> <td align="left">no</td> <td align="left">1 / 1h</td> </tr> @@ -62,7 +62,7 @@ Production queue</td> <br /> </td> <td align="left">> 0</td> -<td align="left"><p><span>1006 nodes, max 86 per job</span></p></td> +<td align="left"><p>>1006 nodes, max 86 per job</p></td> <td align="left">24</td> <td align="left">0</td> <td align="left">no</td> @@ -128,7 +128,7 @@ Free resource queue</td>  -**The qfree queue is not free of charge**. [Normal +The qfree queue is not free of charge**. [Normal accounting](resources-allocation-policy.html#resources-accounting-policy) applies. However, it allows for utilization of free resources, once a Project exhausted all its allocated computational resources. This does @@ -138,67 +138,67 @@ is allowed after request for this queue.  -- **qexp**, the Express queue: This queue is dedicated for testing and - running very small jobs. It is not required to specify a project to - enter the qexp. <span>*<span>There are 2 nodes always reserved for - this queue (w/o accelerator), maximum 8 nodes are available via the - qexp for a particular user. </span>*</span>The nodes may be - allocated on per core basis. No special authorization is required to - use it. The maximum runtime in qexp is 1 hour. -- **qprod**, the Production queue****: This queue is intended for - normal production runs. It is required that active project with - nonzero remaining resources is specified to enter the qprod. All - nodes may be accessed via the qprod queue, however only 86 per job. - ** Full nodes, 24 cores per node are allocated. The queue runs with - medium priority and no special authorization is required to use it. - The maximum runtime in qprod is 48 hours. -- **qlong**, the Long queue****: This queue is intended for long - production runs. It is required that active project with nonzero - remaining resources is specified to enter the qlong. Only 336 nodes - without acceleration may be accessed via the qlong queue. Full - nodes, 24 cores per node are allocated. 
The queue runs with medium - priority and no special authorization is required to use it.<span> - *The maximum runtime in qlong is 144 hours (three times of the - standard qprod time - 3 * 48 h)*</span> -- <span>****qmpp**, the massively parallel queue. This queue is - intended for massively parallel runs. It is required that active - project with nonzero remaining resources is specified to enter - the qmpp. All nodes may be accessed via the qmpp queue. ** Full - nodes, 24 cores per node are allocated. The queue runs with medium - priority and no special authorization is required to use it. The - maximum runtime in qmpp is 4 hours. An PI<span> *needs explicitly* - </span>ask [support](https://support.it4i.cz/rt/) - for authorization to enter the queue for all users associated to - her/his Project. - </span> -- <span>**</span>**qfat**, the UV2000 queue. This queue is dedicated - to access the fat SGI UV2000 SMP machine. The machine (uv1) has 112 - Intel IvyBridge cores at 3.3GHz and 3.25TB RAM. An PI<span> *needs - explicitly* </span>ask - [support](https://support.it4i.cz/rt/) for - authorization to enter the queue for all users associated to her/his - Project.**** -- **qfree**, the Free resource queue****: The queue qfree is intended - for utilization of free resources, after a Project exhausted all its - allocated computational resources (Does not apply to DD projects - by default. DD projects have to request for persmission on qfree - after exhaustion of computational resources.). It is required that - active project is specified to enter the queue, however no remaining - resources are required. Consumed resources will be accounted to - the Project. Only 178 nodes without accelerator may be accessed from - this queue. Full nodes, 24 cores per node are allocated. The queue - runs with very low priority and no special authorization is required - to use it. The maximum runtime in qfree is 12 hours. -- **qviz**, the Visualization queue****: Intended for - pre-/post-processing using OpenGL accelerated graphics. Currently - when accessing the node, each user gets 4 cores of a CPU allocated, - thus approximately 73 GB of RAM and 1/7 of the GPU capacity - (default "chunk"). *If more GPU power or RAM is required, it is - recommended to allocate more chunks (with 4 cores each) up to one - whole node per user, so that all 28 cores, 512 GB RAM and whole GPU - is exclusive. This is currently also the maximum allowed allocation - per one user. One hour of work is allocated by default, the user may - ask for 2 hours maximum.* +- **qexp**, the Express queue: This queue is dedicated for testing and + running very small jobs. It is not required to specify a project to + enter the qexp. >*>There are 2 nodes always reserved for + this queue (w/o accelerator), maximum 8 nodes are available via the + qexp for a particular user. *The nodes may be + allocated on per core basis. No special authorization is required to + use it. The maximum runtime in qexp is 1 hour. +- **qprod**, the Production queue****: This queue is intended for + normal production runs. It is required that active project with + nonzero remaining resources is specified to enter the qprod. All + nodes may be accessed via the qprod queue, however only 86 per job. + ** Full nodes, 24 cores per node are allocated. The queue runs with + medium priority and no special authorization is required to use it. + The maximum runtime in qprod is 48 hours. +- **qlong**, the Long queue****: This queue is intended for long + production runs. 
It is required that active project with nonzero
+    remaining resources is specified to enter the qlong. Only 336 nodes
+    without acceleration may be accessed via the qlong queue. Full
+    nodes, 24 cores per node, are allocated. The queue runs with medium
+    priority and no special authorization is required to use it.
+    *The maximum runtime in qlong is 144 hours (three times the
+    standard qprod time - 3 * 48 h).*
+-   **qmpp**, the Massively parallel queue: This queue is
+    intended for massively parallel runs. It is required that an active
+    project with nonzero remaining resources is specified to enter
+    the qmpp. All nodes may be accessed via the qmpp queue. Full
+    nodes, 24 cores per node, are allocated. The queue runs with medium
+    priority and no special authorization is required to use it. The
+    maximum runtime in qmpp is 4 hours. A PI *needs to explicitly*
+    ask [support](https://support.it4i.cz/rt/)
+    for authorization to enter the queue for all users associated to
+    her/his Project.
+
+-   **qfat**, the UV2000 queue: This queue is dedicated
+    to accessing the fat SGI UV2000 SMP machine. The machine (uv1) has 112
+    Intel IvyBridge cores at 3.3GHz and 3.25TB RAM. A PI *needs to
+    explicitly* ask
+    [support](https://support.it4i.cz/rt/) for
+    authorization to enter the queue for all users associated to her/his
+    Project.
+-   **qfree**, the Free resource queue: The queue qfree is intended
+    for utilization of free resources, after a Project has exhausted all its
+    allocated computational resources (this does not apply to DD projects
+    by default; DD projects have to request permission to use qfree
+    after exhaustion of their computational resources). It is required that an
+    active project is specified to enter the queue, however no remaining
+    resources are required. Consumed resources will be accounted to
+    the Project. Only 178 nodes without accelerator may be accessed from
+    this queue. Full nodes, 24 cores per node, are allocated. The queue
+    runs with very low priority and no special authorization is required
+    to use it. The maximum runtime in qfree is 12 hours.
+-   **qviz**, the Visualization queue: Intended for
+    pre-/post-processing using OpenGL accelerated graphics. Currently,
+    when accessing the node, each user gets 4 cores of a CPU allocated,
+    thus approximately 73 GB of RAM and 1/7 of the GPU capacity
+    (the default "chunk"). *If more GPU power or RAM is required, it is
+    recommended to allocate more chunks (with 4 cores each), up to one
+    whole node per user, so that all 28 cores, 512 GB RAM and the whole GPU
+    are exclusive. This is currently also the maximum allowed allocation
+    per one user. One hour of work is allocated by default; the user may
+    ask for 2 hours maximum.*
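As an illustration of the per-core allocation mentioned for qexp above, a small interactive test session might be requested as follows (a sketch only; the core count should be adapted, and no project ID is needed for qexp):

```
$ qsub -q qexp -l select=1:ncpus=8 -I
```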
-``` +``` $ rspbs Usage: rspbs [options] @@ -298,7 +298,7 @@ Options:  --incl-finished      Include finished jobs ``` -[]()Resources Accounting Policy +Resources Accounting Policy ------------------------------- ### The Core-Hour @@ -323,13 +323,13 @@ User may check at any time, how many core-hours have been consumed by himself/herself and his/her projects. The command is available on clusters' login nodes. -``` +``` $ it4ifree Password: -    PID  Total Used ...by me Free +    PID  Total Used ...by me Free   -------- ------- ------ -------- -------   OPEN-0-0 1500000 400644  225265 1099356 -  DD-13-1   10000 2606 2606 7394 +  DD-13-1   10000 2606 2606 7394 ```  diff --git a/converted/docs.it4i.cz/salomon/ba2c321e-1554-4826-b6ec-3c68d370cd9f.jpeg b/converted/docs.it4i.cz/salomon/salomon-1.jpeg similarity index 100% rename from converted/docs.it4i.cz/salomon/ba2c321e-1554-4826-b6ec-3c68d370cd9f.jpeg rename to converted/docs.it4i.cz/salomon/salomon-1.jpeg diff --git a/converted/docs.it4i.cz/salomon/d2a6de55-62fc-454f-adda-a6a25e3f44dd.jpeg b/converted/docs.it4i.cz/salomon/salomon-3.jpeg similarity index 100% rename from converted/docs.it4i.cz/salomon/d2a6de55-62fc-454f-adda-a6a25e3f44dd.jpeg rename to converted/docs.it4i.cz/salomon/salomon-3.jpeg diff --git a/converted/docs.it4i.cz/salomon/82997462-cd88-49eb-aad5-71d77903d903.jpeg b/converted/docs.it4i.cz/salomon/salomon-4.jpeg similarity index 100% rename from converted/docs.it4i.cz/salomon/82997462-cd88-49eb-aad5-71d77903d903.jpeg rename to converted/docs.it4i.cz/salomon/salomon-4.jpeg diff --git a/converted/docs.it4i.cz/salomon/c1109cbb-9bf4-4f0a-8b0f-a1e464fed0c4.jpeg b/converted/docs.it4i.cz/salomon/sgi-c1104-gp1.jpeg similarity index 100% rename from converted/docs.it4i.cz/salomon/c1109cbb-9bf4-4f0a-8b0f-a1e464fed0c4.jpeg rename to converted/docs.it4i.cz/salomon/sgi-c1104-gp1.jpeg diff --git a/converted/docs.it4i.cz/salomon/software/ansys/a34a45cc-9385-4f05-b12e-efadf1bd93bb.png b/converted/docs.it4i.cz/salomon/software/ansys/AMsetPar1.png similarity index 100% rename from converted/docs.it4i.cz/salomon/software/ansys/a34a45cc-9385-4f05-b12e-efadf1bd93bb.png rename to converted/docs.it4i.cz/salomon/software/ansys/AMsetPar1.png diff --git a/converted/docs.it4i.cz/salomon/software/ansys/ansys-cfx.md b/converted/docs.it4i.cz/salomon/software/ansys/ansys-cfx.md index 908277472e72fb79008c5b87645d811316f96453..e6d4672e1e07a5bddfc0d891c0fe1a9859aeb704 100644 --- a/converted/docs.it4i.cz/salomon/software/ansys/ansys-cfx.md +++ b/converted/docs.it4i.cz/salomon/software/ansys/ansys-cfx.md @@ -15,51 +15,51 @@ environment, with extensive capabilities for customization and automation using session files, scripting and a powerful expression language. -<span>To run ANSYS CFX in batch mode you can utilize/modify the default -cfx.pbs script and execute it via the qsub command.</span> - - #!/bin/bash - #PBS -l nodes=2:ppn=24 - #PBS -q qprod - #PBS -N $USER-CFX-Project - #PBS -A OPEN-0-0 - - #! 
Mail to user when job terminate or abort - #PBS -m ae - - #!change the working directory (default is home directory) - #cd <working directory> (working directory must exists) - WORK_DIR="/scratch/work/user/$USER" - cd $WORK_DIR - - echo Running on host `hostname` - echo Time is `date` - echo Directory is `pwd` - echo This jobs runs on the following processors: - echo `cat $PBS_NODEFILE` - - module load ANSYS - - #### Set number of processors per host listing - procs_per_host=24 - #### Create host list - hl="" - for host in `cat $PBS_NODEFILE` - do - if [ "$hl" = "" ] - then hl="$host:$procs_per_host" - else hl="$:$host:$procs_per_host" - fi - done - - echo Machines: $hl - - # prevent ANSYS from attempting to use scif0 interface - export MPI_IC_ORDER="UDAPL" - - #-dev input.def includes the input of CFX analysis in DEF format - #-P the name of prefered license feature (aa_r=ANSYS Academic Research, ane3fl=Multiphysics(commercial)) - cfx5solve -def input.def -size 4 -size-ni 4x -part-large -start-method "Platform MPI Distributed Parallel" -par-dist $hl -P aa_r +>To run ANSYS CFX in batch mode you can utilize/modify the default +cfx.pbs script and execute it via the qsub command. + + #!/bin/bash + #PBS -l nodes=2:ppn=24 + #PBS -q qprod + #PBS -N $USER-CFX-Project + #PBS -A OPEN-0-0 + + #! Mail to user when job terminate or abort + #PBS -m ae + + #!change the working directory (default is home directory) + #cd <working directory> (working directory must exists) + WORK_DIR="/scratch/work/user/$USER" + cd $WORK_DIR + + echo Running on host `hostname` + echo Time is `date` + echo Directory is `pwd` + echo This jobs runs on the following processors: + echo `cat $PBS_NODEFILE` + + module load ANSYS + + #### Set number of processors per host listing + procs_per_host=24 + #### Create host list + hl="" + for host in `cat $PBS_NODEFILE` + do + if [ "$hl" = "" ] + then hl="$host:$procs_per_host" + else hl="$:$host:$procs_per_host" + fi + done + + echo Machines: $hl + + # prevent ANSYS from attempting to use scif0 interface + export MPI_IC_ORDER="UDAPL" + + #-dev input.def includes the input of CFX analysis in DEF format + #-P the name of prefered license feature (aa_r=ANSYS Academic Research, ane3fl=Multiphysics(commercial)) + cfx5solve -def input.def -size 4 -size-ni 4x -part-large -start-method "Platform MPI Distributed Parallel" -par-dist $hl -P aa_r Header of the pbs file (above) is common and description can be find [this @@ -71,15 +71,15 @@ assumes such structure of allocated resources. Working directory has to be created before sending pbs job into the queue. Input file should be in working directory or full path to input -file has to be specified. <span>Input file has to be defined by common +file has to be specified. >Input file has to be defined by common CFX def file which is attached to the cfx solver via parameter --def</span> +-def -**License** should be selected by parameter -P (Big letter **P**). +License** should be selected by parameter -P (Big letter **P**). Licensed products are the following: aa_r (ANSYS **Academic** Research), ane3fl (ANSYS Multiphysics)-**Commercial.** -<span>[More about licensing here](licensing.html)</span> +>[More about licensing here](licensing.html)  We have observed that the -P settings does not always work. 
Please set your [license diff --git a/converted/docs.it4i.cz/salomon/software/ansys/ansys-fluent.md b/converted/docs.it4i.cz/salomon/software/ansys/ansys-fluent.md index cfb0172936aabc5c143d792d605c86e99f660d70..dec9fca12ccf802eedb06f590801761b99d2a1f5 100644 --- a/converted/docs.it4i.cz/salomon/software/ansys/ansys-fluent.md +++ b/converted/docs.it4i.cz/salomon/software/ansys/ansys-fluent.md @@ -12,40 +12,40 @@ treatment plants. Special models that give the software the ability to model in-cylinder combustion, aeroacoustics, turbomachinery, and multiphase systems have served to broaden its reach. -<span>1. Common way to run Fluent over pbs file</span> +>1. Common way to run Fluent over pbs file ------------------------------------------------------ -<span>To run ANSYS Fluent in batch mode you can utilize/modify the -default fluent.pbs script and execute it via the qsub command.</span> +>To run ANSYS Fluent in batch mode you can utilize/modify the +default fluent.pbs script and execute it via the qsub command. - #!/bin/bash - #PBS -S /bin/bash - #PBS -l nodes=2:ppn=24 - #PBS -q qprod - #PBS -N Fluent-Project - #PBS -A OPEN-0-0 + #!/bin/bash + #PBS -S /bin/bash + #PBS -l nodes=2:ppn=24 + #PBS -q qprod + #PBS -N Fluent-Project + #PBS -A OPEN-0-0 - #! Mail to user when job terminate or abort - #PBS -m ae + #! Mail to user when job terminate or abort + #PBS -m ae - #!change the working directory (default is home directory) - #cd <working directory> (working directory must exists) - WORK_DIR="/scratch/work/user/$USER" - cd $WORK_DIR + #!change the working directory (default is home directory) + #cd <working directory> (working directory must exists) + WORK_DIR="/scratch/work/user/$USER" + cd $WORK_DIR - echo Running on host `hostname` - echo Time is `date` - echo Directory is `pwd` - echo This jobs runs on the following processors: - echo `cat $PBS_NODEFILE` + echo Running on host `hostname` + echo Time is `date` + echo Directory is `pwd` + echo This jobs runs on the following processors: + echo `cat $PBS_NODEFILE` - #### Load ansys module so that we find the cfx5solve command - module load ANSYS + #### Load ansys module so that we find the cfx5solve command + module load ANSYS - # Use following line to specify MPI for message-passing instead - NCORES=`wc -l $PBS_NODEFILE |awk '{print $1}'` + # Use following line to specify MPI for message-passing instead + NCORES=`wc -l $PBS_NODEFILE |awk '{print $1}'` - /apps/cae/ANSYS/16.1/v161/fluent/bin/fluent 3d -t$NCORES -cnf=$PBS_NODEFILE -g -i fluent.jou + /apps/cae/ANSYS/16.1/v161/fluent/bin/fluent 3d -t$NCORES -cnf=$PBS_NODEFILE -g -i fluent.jou Header of the pbs file (above) is common and description can be find on [this @@ -66,90 +66,90 @@ Journal file with definition of the input geometry and boundary conditions and defined process of solution has e.g. the following structure: - /file/read-case aircraft_2m.cas.gz - /solve/init - init - /solve/iterate - 10 - /file/write-case-dat aircraft_2m-solution - /exit yes + /file/read-case aircraft_2m.cas.gz + /solve/init + init + /solve/iterate + 10 + /file/write-case-dat aircraft_2m-solution + /exit yes -<span>The appropriate dimension of the problem has to be set by -parameter (2d/3d). </span> +>The appropriate dimension of the problem has to be set by +parameter (2d/3d). -<span>2. Fast way to run Fluent from command line</span> +>2. 
Fast way to run Fluent from command line -------------------------------------------------------- - fluent solver_version [FLUENT_options] -i journal_file -pbs + fluent solver_version [FLUENT_options] -i journal_file -pbs This syntax will start the ANSYS FLUENT job under PBS Professional using -the <span class="monospace">qsub</span> command in a batch manner. When +the qsub command in a batch manner. When resources are available, PBS Professional will start the job and return -a job ID, usually in the form of <span -class="emphasis">*job_ID.hostname*</span>. This job ID can then be used +a job ID, usually in the form of +class="emphasis">*job_ID.hostname*. This job ID can then be used to query, control, or stop the job using standard PBS Professional -commands, such as <span class="monospace">qstat</span> or <span -class="monospace">qdel</span>. The job will be run out of the current -working directory, and all output will be written to the file <span -class="monospace">fluent.o</span><span> </span><span -class="emphasis">*job_ID*</span>.     +commands, such as qstat or +qdel. The job will be run out of the current +working directory, and all output will be written to the file +fluent.o> +class="emphasis">*job_ID*.     3. Running Fluent via user's config file ---------------------------------------- -The sample script uses a configuration file called <span -class="monospace">pbs_fluent.conf</span>  if no command line arguments +The sample script uses a configuration file called +pbs_fluent.conf  if no command line arguments are present. This configuration file should be present in the directory from which the jobs are submitted (which is also the directory in which the jobs are executed). The following is an example of what the content -of <span class="monospace">pbs_fluent.conf</span> can be: - -``` - input="example_small.flin" - case="Small-1.65m.cas" - fluent_args="3d -pmyrinet" - outfile="fluent_test.out" - mpp="true" +of pbs_fluent.conf can be: + +``` +input="example_small.flin" +case="Small-1.65m.cas" +fluent_args="3d -pmyrinet" +outfile="fluent_test.out" +mpp="true" ``` The following is an explanation of the parameters: -<span><span class="monospace">input</span> is the name of the input -file.</span> +> input is the name of the input +file. -<span class="monospace">case</span> is the name of the <span -class="monospace">.cas</span> file that the input file will utilize. + case is the name of the +.cas file that the input file will utilize. -<span class="monospace">fluent_args</span> are extra ANSYS FLUENT + fluent_args are extra ANSYS FLUENT arguments. As shown in the previous example, you can specify the -interconnect by using the <span class="monospace">-p</span> interconnect -command. The available interconnects include <span -class="monospace">ethernet</span> (the default), <span -class="monospace">myrinet</span>,<span class="monospace"> -infiniband</span>, <span class="monospace">vendor</span>, <span -class="monospace">altix</span><span>,</span> and <span -class="monospace">crayx</span>. The MPI is selected automatically, based +interconnect by using the -p interconnect +command. The available interconnects include +ethernet (the default), +myrinet, class="monospace"> +infiniband, vendor, +altix>, and +crayx. The MPI is selected automatically, based on the specified interconnect. -<span class="monospace">outfile</span> is the name of the file to which + outfile is the name of the file to which the standard output will be sent. 
-<span class="monospace">mpp="true"</span> will tell the job script to + mpp="true" will tell the job script to execute the job across multiple processors.         -<span>To run ANSYS Fluent in batch mode with user's config file you can +>To run ANSYS Fluent in batch mode with user's config file you can utilize/modify the following script and execute it via the qsub -command.</span> +command. -``` +``` #!/bin/sh #PBS -l nodes=2:ppn=24 #PBS -1 qprod @@ -160,30 +160,30 @@ command.</span> #We assume that if they didn’t specify arguments then they should use the #config file if [ "xx${input}${case}${mpp}${fluent_args}zz" = "xxzz" ]; then - if [ -f pbs_fluent.conf ]; then - . pbs_fluent.conf - else - printf "No command line arguments specified, " - printf "and no configuration file found. Exiting n" - fi + if [ -f pbs_fluent.conf ]; then + . pbs_fluent.conf + else + printf "No command line arguments specified, " + printf "and no configuration file found. Exiting n" + fi fi #Augment the ANSYS FLUENT command line arguments case "$mpp" in - true) - #MPI job execution scenario - num_nodes=â€cat $PBS_NODEFILE | sort -u | wc -l†- cpus=â€expr $num_nodes * $NCPUS†- #Default arguments for mpp jobs, these should be changed to suit your - #needs. - fluent_args="-t$ $fluent_args -cnf=$PBS_NODEFILE" - ;; - *) - #SMP case - #Default arguments for smp jobs, should be adjusted to suit your - #needs. - fluent_args="-t$NCPUS $fluent_args" - ;; + true) + #MPI job execution scenario + num_nodes=â€cat $PBS_NODEFILE | sort -u | wc -l†+ cpus=â€expr $num_nodes * $NCPUS†+ #Default arguments for mpp jobs, these should be changed to suit your + #needs. + fluent_args="-t$ $fluent_args -cnf=$PBS_NODEFILE" + ;; + *) + #SMP case + #Default arguments for smp jobs, should be adjusted to suit your + #needs. + fluent_args="-t$NCPUS $fluent_args" + ;; esac #Default arguments for all jobs fluent_args="-ssh -g -i $input $fluent_args" @@ -200,13 +200,13 @@ command.</span> -<span>It runs the jobs out of the directory from which they are -submitted (PBS_O_WORKDIR).</span> +>It runs the jobs out of the directory from which they are +submitted (PBS_O_WORKDIR). 4. Running Fluent in parralel ----------------------------- -[]()Fluent could be run in parallel only under Academic Research +Fluent could be run in parallel only under Academic Research license. To do so this ANSYS Academic Research license must be placed before ANSYS CFD license in user preferences. To make this change [anslic_admin utility should be diff --git a/converted/docs.it4i.cz/salomon/software/ansys/ansys-ls-dyna.md b/converted/docs.it4i.cz/salomon/software/ansys/ansys-ls-dyna.md index e1e59d32e960917d52e941d23949f91ae32db6fb..7b94cf848b907b642a25b99c2ebcee6c0792348d 100644 --- a/converted/docs.it4i.cz/salomon/software/ansys/ansys-ls-dyna.md +++ b/converted/docs.it4i.cz/salomon/software/ansys/ansys-ls-dyna.md @@ -8,69 +8,69 @@ technology-rich, time-tested explicit solver without the need to contend with the complex input requirements of this sophisticated program. Introduced in 1996, ANSYS LS-DYNA capabilities have helped customers in numerous industries to resolve highly intricate design -issues. <span>ANSYS Mechanical users have been able take advantage of +issues. >ANSYS Mechanical users have been able take advantage of complex explicit solutions for a long time utilizing the traditional -ANSYS Parametric Design Language (APDL) environment. <span>These +ANSYS Parametric Design Language (APDL) environment. 
>These explicit capabilities are available to ANSYS Workbench users as well. The Workbench platform is a powerful, comprehensive, easy-to-use environment for engineering simulation. CAD import from all sources, geometry cleanup, automatic meshing, solution, parametric optimization, result visualization and comprehensive report generation are all available within a single fully interactive modern graphical user -environment.</span></span> - -<span>To run ANSYS LS-DYNA in batch mode you can utilize/modify the -default ansysdyna.pbs script and execute it via the qsub command.</span> - - #!/bin/bash - #PBS -l nodes=2:ppn=24 - #PBS -q qprod - #PBS -N DYNA-Project - #PBS -A OPEN-0-0 - - #! Mail to user when job terminate or abort - #PBS -m ae - - #!change the working directory (default is home directory) - #cd <working directory> - WORK_DIR="/scratch/work/user/$USER" - cd $WORK_DIR - - echo Running on host `hostname` - echo Time is `date` - echo Directory is `pwd` - echo This jobs runs on the following processors: - echo `cat $PBS_NODEFILE` - - module load ANSYS - - #### Set number of processors per node - procs_per_host=24 - #### Create host list - hl="" - for host in `cat $PBS_NODEFILE` - do - if [ "$hl" = "" ] - then hl="$host:$procs_per_host" - else hl="$:$host:$procs_per_host" - fi - done - - echo Machines: $hl - - # prevent ANSYS from attempting to use scif0 interface - export MPI_IC_ORDER="UDAPL" - - lsdyna161 -dis -usessh -machines "$hl" i=input.k - -<span>Header of the pbs file (above) is common and description can be -find </span><span> on [this +environment. + +>To run ANSYS LS-DYNA in batch mode you can utilize/modify the +default ansysdyna.pbs script and execute it via the qsub command. + + #!/bin/bash + #PBS -l nodes=2:ppn=24 + #PBS -q qprod + #PBS -N DYNA-Project + #PBS -A OPEN-0-0 + + #! Mail to user when job terminate or abort + #PBS -m ae + + #!change the working directory (default is home directory) + #cd <working directory> + WORK_DIR="/scratch/work/user/$USER" + cd $WORK_DIR + + echo Running on host `hostname` + echo Time is `date` + echo Directory is `pwd` + echo This jobs runs on the following processors: + echo `cat $PBS_NODEFILE` + + module load ANSYS + + #### Set number of processors per node + procs_per_host=24 + #### Create host list + hl="" + for host in `cat $PBS_NODEFILE` + do + if [ "$hl" = "" ] + then hl="$host:$procs_per_host" + else hl="$:$host:$procs_per_host" + fi + done + + echo Machines: $hl + + # prevent ANSYS from attempting to use scif0 interface + export MPI_IC_ORDER="UDAPL" + + lsdyna161 -dis -usessh -machines "$hl" i=input.k + +>Header of the pbs file (above) is common and description can be +find > on [this site](../../resource-allocation-and-job-execution/job-submission-and-execution.html). [SVS FEM](http://www.svsfem.cz) recommends to utilize sources by keywords: nodes, ppn. These keywords allows to address directly the number of nodes (computers) and cores (ppn) which will be utilized in the job. Also the rest of code assumes such structure of -allocated resources.</span> +allocated resources. Working directory has to be created before sending pbs job into the queue. Input file should be in working directory or full path to input @@ -82,8 +82,8 @@ fail to run on nodes with Xeon Phi accelerator (it will use the virtual interface of Phi cards instead of the real InfiniBand interface and MPI will fail. 
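
For orientation, the submission workflow described above might look like this in practice (a minimal sketch; `ansysdyna.pbs` and `input.k` are the names used in the example above, and the scratch path should be adjusted to your project):

```
# create the scratch working directory used by the jobscript
mkdir -p /scratch/work/user/$USER

# place the LS-DYNA input deck where the jobscript expects it
cp input.k /scratch/work/user/$USER/

# submit the (modified) jobscript to PBS
qsub ansysdyna.pbs
```
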
-<span><span> -</span></span> +>> + diff --git a/converted/docs.it4i.cz/salomon/software/ansys/ansys-mechanical-apdl.md b/converted/docs.it4i.cz/salomon/software/ansys/ansys-mechanical-apdl.md index 691b8e710823da60c216db5270677c0aa3f51938..7aac5b95c4eac9a6d3397a6b6f4ebf10e2db5eb4 100644 --- a/converted/docs.it4i.cz/salomon/software/ansys/ansys-mechanical-apdl.md +++ b/converted/docs.it4i.cz/salomon/software/ansys/ansys-mechanical-apdl.md @@ -1,61 +1,61 @@ ANSYS MAPDL =========== -<span>**[ANSYS +>**[ANSYS Multiphysics](http://www.ansys.com/Products/Simulation+Technology/Structural+Mechanics/ANSYS+Multiphysics)** software offers a comprehensive product solution for both multiphysics and single-physics analysis. The product includes structural, thermal, fluid and both high- and low-frequency electromagnetic analysis. The product also contains solutions for both direct and sequentially coupled physics problems including direct coupled-field elements and the ANSYS -multi-field solver.</span> - -<span>To run ANSYS MAPDL in batch mode you can utilize/modify the -default mapdl.pbs script and execute it via the qsub command.</span> - - #!/bin/bash - #PBS -l nodes=2:ppn=24 - #PBS -q qprod - #PBS -N ANSYS-Project - #PBS -A OPEN-0-0 - - #! Mail to user when job terminate or abort - #PBS -m ae - - #!change the working directory (default is home directory) - #cd <working directory> (working directory must exists) - WORK_DIR="/scratch/work/user/$USER" - cd $WORK_DIR - - echo Running on host `hostname` - echo Time is `date` - echo Directory is `pwd` - echo This jobs runs on the following processors: - echo `cat $PBS_NODEFILE` - - module load ANSYS/16.1 - - #### Set number of processors per host listing - procs_per_host=24 - #### Create host list - hl="" - for host in `cat $PBS_NODEFILE` - do - if [ "$hl" = "" ] - then hl="$host:$procs_per_host" - else hl="$:$host:$procs_per_host" - fi - done - - echo Machines: $hl - - # prevent ANSYS from attempting to use scif0 interface - export MPI_IC_ORDER="UDAPL" - - #-i input.dat includes the input of analysis in APDL format - #-o file.out is output file from ansys where all text outputs will be redirected - #-p the name of license feature (aa_r=ANSYS Academic Research, ane3fl=Multiphysics(commercial), aa_r_dy=Academic AUTODYN) - ansys161 -b -dis -usessh -p aa_r -i input.dat -o file.out -machines "$hl" -dir $WORK_DIR +multi-field solver. + +>To run ANSYS MAPDL in batch mode you can utilize/modify the +default mapdl.pbs script and execute it via the qsub command. + + #!/bin/bash + #PBS -l nodes=2:ppn=24 + #PBS -q qprod + #PBS -N ANSYS-Project + #PBS -A OPEN-0-0 + + #! 
Mail to user when job terminate or abort + #PBS -m ae + + #!change the working directory (default is home directory) + #cd <working directory> (working directory must exists) + WORK_DIR="/scratch/work/user/$USER" + cd $WORK_DIR + + echo Running on host `hostname` + echo Time is `date` + echo Directory is `pwd` + echo This jobs runs on the following processors: + echo `cat $PBS_NODEFILE` + + module load ANSYS/16.1 + + #### Set number of processors per host listing + procs_per_host=24 + #### Create host list + hl="" + for host in `cat $PBS_NODEFILE` + do + if [ "$hl" = "" ] + then hl="$host:$procs_per_host" + else hl="$:$host:$procs_per_host" + fi + done + + echo Machines: $hl + + # prevent ANSYS from attempting to use scif0 interface + export MPI_IC_ORDER="UDAPL" + + #-i input.dat includes the input of analysis in APDL format + #-o file.out is output file from ansys where all text outputs will be redirected + #-p the name of license feature (aa_r=ANSYS Academic Research, ane3fl=Multiphysics(commercial), aa_r_dy=Academic AUTODYN) + ansys161 -b -dis -usessh -p aa_r -i input.dat -o file.out -machines "$hl" -dir $WORK_DIR Header of the PBS file (above) is common and description can be find on [this @@ -71,12 +71,12 @@ queue. Input file should be in working directory or full path to input file has to be specified. Input file has to be defined by common APDL file which is attached to the ansys solver via parameter -i -**License** should be selected by parameter -p. Licensed products are +License** should be selected by parameter -p. Licensed products are the following: aa_r (ANSYS **Academic** Research), ane3fl (ANSYS Multiphysics)-**Commercial**, aa_r_dy (ANSYS **Academic** -AUTODYN)<span> +AUTODYN)> [More about licensing here](licensing.html) -</span> + diff --git a/converted/docs.it4i.cz/salomon/software/ansys/ansys-products-mechanical-fluent-cfx-mapdl.md b/converted/docs.it4i.cz/salomon/software/ansys/ansys-products-mechanical-fluent-cfx-mapdl.md index 5d04e44a694c6616e4b5b57e85fe3f52f1d5194e..4e3c1f7353d4c75b43d55988b22735e04d3ccdeb 100644 --- a/converted/docs.it4i.cz/salomon/software/ansys/ansys-products-mechanical-fluent-cfx-mapdl.md +++ b/converted/docs.it4i.cz/salomon/software/ansys/ansys-products-mechanical-fluent-cfx-mapdl.md @@ -1,7 +1,7 @@ Overview of ANSYS Products ========================== -**[SVS FEM](http://www.svsfem.cz/)** as ***[ANSYS +[SVS FEM](http://www.svsfem.cz/)** as ***[ANSYS Channel partner](http://www.ansys.com/)*** for Czech Republic provided all ANSYS licenses for all clusters and supports of all ANSYS Products (Multiphysics, Mechanical, MAPDL, CFX, Fluent, @@ -19,7 +19,7 @@ licensing here](licensing.html) To load the latest version of any ANSYS product (Mechanical, Fluent, CFX, MAPDL,...) load the module: - $ module load ANSYS + $ module load ANSYS ANSYS supports interactive regime, but due to assumed solution of extremely difficult tasks it is not recommended. 
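
If a specific release is needed instead of the default, the usual module commands apply (a short sketch; the exact version strings depend on what is currently installed, `ANSYS/16.1` being the build used in the MAPDL example above):

```
# list the ANSYS builds available on the cluster
module avail ANSYS

# load a particular version instead of the default
module load ANSYS/16.1
```
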
diff --git a/converted/docs.it4i.cz/salomon/software/ansys/ansys.md b/converted/docs.it4i.cz/salomon/software/ansys/ansys.md
index d664bf055da459d6ff7e7e89cbdf4fa82b9e290d..6101ad911d4f90bf715b1540e23c2fdd6ded9612 100644
--- a/converted/docs.it4i.cz/salomon/software/ansys/ansys.md
+++ b/converted/docs.it4i.cz/salomon/software/ansys/ansys.md
@@ -1,7 +1,7 @@
Overview of ANSYS Products
==========================

-**[SVS FEM](http://www.svsfem.cz/)** as ***[ANSYS
+**[SVS FEM](http://www.svsfem.cz/)** as ***[ANSYS
Channel partner](http://www.ansys.com/)*** for Czech Republic provided
all ANSYS licenses for all clusters and supports of all ANSYS Products
(Multiphysics, Mechanical, MAPDL, CFX, Fluent,
@@ -19,7 +19,7 @@ licensing here](ansys/licensing.html)
To load the latest version of any ANSYS product (Mechanical, Fluent,
CFX, MAPDL,...) load the module:

-    $ module load ANSYS
+    $ module load ANSYS

ANSYS supports interactive regime, but due to assumed solution of
extremely difficult tasks it is not recommended.
diff --git a/converted/docs.it4i.cz/salomon/software/ansys/licensing.md b/converted/docs.it4i.cz/salomon/software/ansys/licensing.md
index 409633e18684c961bb1772d4285a073d0e0afc09..4e7cdebe4649ad9b68a996478b8450365caed786 100644
--- a/converted/docs.it4i.cz/salomon/software/ansys/licensing.md
+++ b/converted/docs.it4i.cz/salomon/software/ansys/licensing.md
@@ -4,17 +4,17 @@ Licensing and Available Versions
ANSYS licence can be used by:
-----------------------------

-- all persons in the carrying out of the CE IT4Innovations Project (In
-  addition to the primary licensee, which is VSB - Technical
-  University of Ostrava, users are CE IT4Innovations third parties -
-  CE IT4Innovations project partners, particularly the University of
-  Ostrava, the Brno University of Technology - Faculty of Informatics,
-  the Silesian University in Opava, Institute of Geonics AS CR.)
-- <span id="result_box" class="short_text"><span class="hps">all
-  persons</span> <span class="hps">who have a valid</span> <span
-  class="hps">license</span></span>
-- <span id="result_box" class="short_text"><span class="hps">students
-  of</span> <span class="hps">the Technical University</span></span>
+- all persons in the carrying out of the CE IT4Innovations Project (In
+  addition to the primary licensee, which is VSB - Technical
+  University of Ostrava, users are CE IT4Innovations third parties -
+  CE IT4Innovations project partners, particularly the University of
+  Ostrava, the Brno University of Technology - Faculty of Informatics,
+  the Silesian University in Opava, Institute of Geonics AS CR.)
+- all persons
+  who have a valid
+  license
+- students of
+  the Technical University

ANSYS Academic Research
-----------------------
@@ -32,8 +32,8 @@ restrictions.
Available Versions ------------------ -- 16.1 -- 17.0 +- 16.1 +- 17.0 License Preferences ------------------- diff --git a/converted/docs.it4i.cz/salomon/software/ansys/setting-license-preferences.md b/converted/docs.it4i.cz/salomon/software/ansys/setting-license-preferences.md index 6335b793cd556f940e16225f1c299bf4135c217e..5338e55ced9d7217de8952066c724173bfa7ecc9 100644 --- a/converted/docs.it4i.cz/salomon/software/ansys/setting-license-preferences.md +++ b/converted/docs.it4i.cz/salomon/software/ansys/setting-license-preferences.md @@ -2,8 +2,8 @@ Setting license preferences =========================== Some ANSYS tools allow you to explicitly specify usage of academic or -commercial licenses in the command line (eg. <span -class="monospace">ansys161 -p aa_r</span> to select Academic Research +commercial licenses in the command line (eg. +ansys161 -p aa_r to select Academic Research license). However, we have observed that not all tools obey this option and choose commercial license. @@ -13,7 +13,7 @@ or bottom of the list accordingly. Launch the ANSLIC_ADMIN utility in a graphical environment: - $ANSYSLIC_DIR/lic_admin/anslic_admin + $ANSYSLIC_DIR/lic_admin/anslic_admin ANSLIC_ADMIN Utility will be run diff --git a/converted/docs.it4i.cz/salomon/software/ansys/workbench.md b/converted/docs.it4i.cz/salomon/software/ansys/workbench.md index 59c64a4b6eebb616dd00c1850378744f79b493cc..dfa1cb805f42d7e9a7ab7972dc579330b5d459c7 100644 --- a/converted/docs.it4i.cz/salomon/software/ansys/workbench.md +++ b/converted/docs.it4i.cz/salomon/software/ansys/workbench.md @@ -17,7 +17,7 @@ run on two Salomon nodes). If you want the job to run on more then 1 node, you must also provide a so called MPI appfile. In the Additional Command Line Arguments input field, enter : - -mpifile /path/to/my/job/mpifile.txt + -mpifile /path/to/my/job/mpifile.txt Where /path/to/my/job is the directory where your project is saved. We will create the file mpifile.txt programatically later in the batch @@ -27,46 +27,46 @@ Processing* *Guide*. Now, save the project and close Workbench. 
We will use this script to launch the job: - #!/bin/bash - #PBS -l select=2:ncpus=24 - #PBS -q qprod - #PBS -N test9_mpi_2 - #PBS -A OPEN-0-0 - - # Mail to user when job terminate or abort - #PBS -m a - - # change the working directory - WORK_DIR="$PBS_O_WORKDIR" - cd $WORK_DIR - - echo Running on host `hostname` - echo Time is `date` - echo Directory is `pwd` - echo This jobs runs on the following nodes: - echo `cat $PBS_NODEFILE` - - module load ANSYS - - #### Set number of processors per host listing - procs_per_host=24 - #### Create MPI appfile - echo -n "" > mpifile.txt - for host in `cat $PBS_NODEFILE` - do - echo "-h $host -np $procs_per_host $ANSYS160_DIR/bin/ansysdis161 -dis" >> mpifile.txt - done - - #-i input.dat includes the input of analysis in APDL format - #-o file.out is output file from ansys where all text outputs will be redirected - #-p the name of license feature (aa_r=ANSYS Academic Research, ane3fl=Multiphysics(commercial), aa_r_dy=Academic AUTODYN) - - # prevent using scsif0 interface on accelerated nodes - export MPI_IC_ORDER="UDAPL" - # spawn remote process using SSH (default is RSH) - export MPI_REMSH="/usr/bin/ssh" - - runwb2 -R jou6.wbjn -B -F test9.wbpj + #!/bin/bash + #PBS -l select=2:ncpus=24 + #PBS -q qprod + #PBS -N test9_mpi_2 + #PBS -A OPEN-0-0 + + # Mail to user when job terminate or abort + #PBS -m a + + # change the working directory + WORK_DIR="$PBS_O_WORKDIR" + cd $WORK_DIR + + echo Running on host `hostname` + echo Time is `date` + echo Directory is `pwd` + echo This jobs runs on the following nodes: + echo `cat $PBS_NODEFILE` + + module load ANSYS + + #### Set number of processors per host listing + procs_per_host=24 + #### Create MPI appfile + echo -n "" > mpifile.txt + for host in `cat $PBS_NODEFILE` + do + echo "-h $host -np $procs_per_host $ANSYS160_DIR/bin/ansysdis161 -dis" >> mpifile.txt + done + + #-i input.dat includes the input of analysis in APDL format + #-o file.out is output file from ansys where all text outputs will be redirected + #-p the name of license feature (aa_r=ANSYS Academic Research, ane3fl=Multiphysics(commercial), aa_r_dy=Academic AUTODYN) + + # prevent using scsif0 interface on accelerated nodes + export MPI_IC_ORDER="UDAPL" + # spawn remote process using SSH (default is RSH) + export MPI_REMSH="/usr/bin/ssh" + + runwb2 -R jou6.wbjn -B -F test9.wbpj The solver settings are saved in file solvehandlers.xml, which is not located in the project directory. Verify your solved settings when diff --git a/converted/docs.it4i.cz/salomon/software/chemistry/molpro.md b/converted/docs.it4i.cz/salomon/software/chemistry/molpro.md index 9b1db148e3d6d5dcfe486710d609a5a0c48a93f6..844869e13be3d2aa1eefeb34de18ac59cd1daaab 100644 --- a/converted/docs.it4i.cz/salomon/software/chemistry/molpro.md +++ b/converted/docs.it4i.cz/salomon/software/chemistry/molpro.md @@ -15,11 +15,11 @@ License Molpro software package is available only to users that have a valid license. Please contact support to enable access to Molpro if you have a -valid license appropriate for running on our cluster (eg. <span>academic -research group licence, parallel execution).</span> +valid license appropriate for running on our cluster (eg. >academic +research group licence, parallel execution). -<span>To run Molpro, you need to have a valid license token present in -"<span class="monospace">$HOME/.molpro/token"</span></span>. You can +>To run Molpro, you need to have a valid license token present in +" $HOME/.molpro/token". 
You can download the token from [Molpro website](https://www.molpro.net/licensee/?portal=licensee). @@ -31,15 +31,15 @@ parallel version compiled with Intel compilers and Intel MPI. Compilation parameters are default : - Parameter Value - ------------------------------------------------- ----------------------------- - <span>max number of atoms</span> 200 - <span>max number of valence orbitals</span> 300 - <span>max number of basis functions</span> 4095 - <span>max number of states per symmmetry</span> 20 - <span>max number of state symmetries</span> 16 - <span>max number of records</span> 200 - <span>max number of primitives</span> <span>maxbfn x [2]</span> +Parameter Value +------------------------------------------------- ----------------------------- +>max number of atoms 200 +>max number of valence orbitals 300 +>max number of basis functions 4095 +>max number of states per symmmetry 20 +>max number of state symmetries 16 +>max number of records 200 +>max number of primitives >maxbfn x [2]  @@ -57,8 +57,8 @@ for more details. The OpenMP parallelization in Molpro is limited and has been observed to produce limited scaling. We therefore recommend to use MPI -parallelization only. This can be achieved by passing option <span -class="monospace">mpiprocs=24:ompthreads=1</span> to PBS. +parallelization only. This can be achieved by passing option +mpiprocs=24:ompthreads=1 to PBS. You are advised to use the -d option to point to a directory in [SCRATCH filesystem](../../storage.html). Molpro can produce a @@ -67,26 +67,26 @@ these are placed in the fast scratch filesystem. ### Example jobscript - #PBS -A IT4I-0-0 - #PBS -q qprod - #PBS -l select=1:ncpus=24:mpiprocs=24:ompthreads=1 + #PBS -A IT4I-0-0 + #PBS -q qprod + #PBS -l select=1:ncpus=24:mpiprocs=24:ompthreads=1 - cd $PBS_O_WORKDIR + cd $PBS_O_WORKDIR - # load Molpro module - module add Molpro/2010.1-patch-57-intel2015b + # load Molpro module + module add Molpro/2010.1-patch-57-intel2015b - # create a directory in the SCRATCH filesystem - mkdir -p /scratch/work/user/$USER/$PBS_JOBID + # create a directory in the SCRATCH filesystem + mkdir -p /scratch/work/user/$USER/$PBS_JOBID - # copy an example input - cp /apps/all/Molpro/2010.1-patch57/molprop_2010_1_Linux_x86_64_i8/examples/caffeine_opt_diis.com . + # copy an example input + cp /apps/all/Molpro/2010.1-patch57/molprop_2010_1_Linux_x86_64_i8/examples/caffeine_opt_diis.com . 
- # run Molpro with default options - molpro -d /scratch/work/user/$USER/$PBS_JOBID caffeine_opt_diis.com + # run Molpro with default options + molpro -d /scratch/work/user/$USER/$PBS_JOBID caffeine_opt_diis.com - # delete scratch directory - rm -rf /scratch/work/user/$USER/$PBS_JOBID + # delete scratch directory + rm -rf /scratch/work/user/$USER/$PBS_JOBID diff --git a/converted/docs.it4i.cz/salomon/software/chemistry/nwchem.md b/converted/docs.it4i.cz/salomon/software/chemistry/nwchem.md index c37310924cb489476b64cf644b6be74d9821511e..18d279ce916ab9625d37d7bd86332fb9bdf5e448 100644 --- a/converted/docs.it4i.cz/salomon/software/chemistry/nwchem.md +++ b/converted/docs.it4i.cz/salomon/software/chemistry/nwchem.md @@ -2,14 +2,14 @@ NWChem ====== High-Performance Computational Chemistry -<span>Introduction</span> +>Introduction ------------------------- -<span>NWChem aims to provide its users with computational chemistry +>NWChem aims to provide its users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently, and in their use of available parallel computing resources from high-performance parallel -supercomputers to conventional workstation clusters.</span> +supercomputers to conventional workstation clusters. [Homepage](http://www.nwchem-sw.org/index.php/Main_Page) @@ -18,16 +18,16 @@ Installed versions The following versions are currently installed : -- <span>NWChem/6.3.revision2-2013-10-17-Python-2.7.8, current release. - Compiled with Intel compilers, MKL and Intel MPI</span> +- >NWChem/6.3.revision2-2013-10-17-Python-2.7.8, current release. + Compiled with Intel compilers, MKL and Intel MPI -  +  -- <span>NWChem/6.5.revision26243-intel-2015b-2014-09-10-Python-2.7.8</span> +- >NWChem/6.5.revision26243-intel-2015b-2014-09-10-Python-2.7.8 For a current list of installed versions, execute : - module avail NWChem + module avail NWChem The recommend to use version 6.5. Version 6.3 fails on Salomon nodes with accelerator, because it attempts to communicate over scif0 @@ -41,27 +41,27 @@ Running NWChem is compiled for parallel MPI execution. Normal procedure for MPI jobs applies. Sample jobscript : - #PBS -A IT4I-0-0 - #PBS -q qprod - #PBS -l select=1:ncpus=24:mpiprocs=24 + #PBS -A IT4I-0-0 + #PBS -q qprod + #PBS -l select=1:ncpus=24:mpiprocs=24 - cd $PBS_O_WORKDIR - module add NWChem/6.5.revision26243-intel-2015b-2014-09-10-Python-2.7.8 - mpirun nwchem h2o.nw + cd $PBS_O_WORKDIR + module add NWChem/6.5.revision26243-intel-2015b-2014-09-10-Python-2.7.8 + mpirun nwchem h2o.nw -<span>Options</span> +>Options -------------------- Please refer to [the documentation](http://www.nwchem-sw.org/index.php/Release62:Top-level) and in the input file set the following directives : -- <span>MEMORY : controls the amount of memory NWChem will use</span> -- <span>SCRATCH_DIR : set this to a directory in [SCRATCH - filesystem](../../storage.html) (or run the - calculation completely in a scratch directory). For certain - calculations, it might be advisable to reduce I/O by forcing - "direct" mode, eg. "scf direct"</span> +- >MEMORY : controls the amount of memory NWChem will use +- >SCRATCH_DIR : set this to a directory in [SCRATCH + filesystem](../../storage.html) (or run the + calculation completely in a scratch directory). For certain + calculations, it might be advisable to reduce I/O by forcing + "direct" mode, eg. 
"scf direct" diff --git a/converted/docs.it4i.cz/salomon/software/chemistry/phono3py.md b/converted/docs.it4i.cz/salomon/software/chemistry/phono3py.md index bf08e525601e4da94b98138331bcd5528a89bf15..4aeb425767613523985a9de4324d3d011cc6f944 100644 --- a/converted/docs.it4i.cz/salomon/software/chemistry/phono3py.md +++ b/converted/docs.it4i.cz/salomon/software/chemistry/phono3py.md @@ -2,7 +2,7 @@ Phono3py ======== - +  Introduction ------------- @@ -17,7 +17,7 @@ http://atztogo.github.io/phono3py/index.html Load the phono3py/0.9.14-ictce-7.3.5-Python-2.7.9 module -``` +``` $ module load phono3py/0.9.14-ictce-7.3.5-Python-2.7.9 ``` @@ -31,7 +31,7 @@ using the diamond structure of silicon stored in [POSCAR](phono3py-input/poscar-si) (the same form as in VASP) using single displacement calculations within supercell. -``` +``` $ cat POSCAR  Si   1.0 @@ -53,16 +53,16 @@ Direct ### Generating displacement using 2x2x2 supercell for both second and third order force constants -``` +``` $ phono3py -d --dim="2 2 2" -c POSCAR ``` -<span class="n">111 displacements is created stored in <span -class="n">disp_fc3.yaml</span>, and the structure input files with this + 111 displacements is created stored in +disp_fc3.yaml, and the structure input files with this displacements are POSCAR-00XXX, where the XXX=111. -</span> -``` + +``` disp_fc3.yaml POSCAR-00008 POSCAR-00017 POSCAR-00026 POSCAR-00035 POSCAR-00044 POSCAR-00053 POSCAR-00062 POSCAR-00071 POSCAR-00080 POSCAR-00089 POSCAR-00098 POSCAR-00107 POSCAR        POSCAR-00009 POSCAR-00018 POSCAR-00027 POSCAR-00036 POSCAR-00045 POSCAR-00054 POSCAR-00063 POSCAR-00072 POSCAR-00081 POSCAR-00090 POSCAR-00099 POSCAR-00108 POSCAR-00001  POSCAR-00010 POSCAR-00019 POSCAR-00028 POSCAR-00037 POSCAR-00046 POSCAR-00055 POSCAR-00064 POSCAR-00073 POSCAR-00082 POSCAR-00091 POSCAR-00100 POSCAR-00109 @@ -74,7 +74,7 @@ POSCAR-00006  POSCAR-00015 POSCAR-00024 POSCAR-00033 POSCAR-00042 POS POSCAR-00007  POSCAR-00016 POSCAR-00025 POSCAR-00034 POSCAR-00043 POSCAR-00052 POSCAR-00061 POSCAR-00070 POSCAR-00079 POSCAR-00088 POSCAR-00097 POSCAR-00106 ``` -<span class="n"> For each displacement the forces needs to be + For each displacement the forces needs to be calculated, i.e. in form of the output file of VASP (vasprun.xml). For a single VASP calculations one needs [KPOINTS](phono3py-input/KPOINTS), @@ -84,9 +84,9 @@ single VASP calculations one needs generated by [prepare.sh](phono3py-input/prepare.sh) script. Then each of the single 111 calculations is submitted [run.sh](phono3py-input/run.sh) by -[submit.sh](phono3py-input/submit.sh).</span> +[submit.sh](phono3py-input/submit.sh). 
-``` +``` $./prepare.sh $ls disp-00001 disp-00009 disp-00017 disp-00025 disp-00033 disp-00041 disp-00049 disp-00057 disp-00065 disp-00073 disp-00081 disp-00089 disp-00097 disp-00105    INCAR @@ -99,30 +99,30 @@ disp-00007 disp-00015 disp-00023 disp-00031 disp-00039 disp-00047 di disp-00008 disp-00016 disp-00024 disp-00032 disp-00040 disp-00048 disp-00056 disp-00064 disp-00072 disp-00080 disp-00088 disp-00096 disp-00104 disp_fc3.yaml ``` -<span class="n">Taylor your run.sh script to fit into your project and + Taylor your run.sh script to fit into your project and other needs and submit all 111 calculations using submit.sh -script</span> +script -``` +``` $ ./submit.sh ``` -<span class="n">Collecting results and post-processing with phono3py</span> + Collecting results and post-processing with phono3py --------------------------------------------------------------------------- -<span class="n">Once all jobs are finished and vasprun.xml is created in -each disp-XXXXX directory the collection is done by </span> + Once all jobs are finished and vasprun.xml is created in +each disp-XXXXX directory the collection is done by -``` +``` $ phono3py --cf3 disp-{00001..00111}/vasprun.xml ``` -<span class="n"><span class="n">and + and `disp_fc2.yaml, FORCES_FC2`, `FORCES_FC3`{.docutils -.literal}</span> and disp_fc3.yaml should appear and put into the hdf -format by </span> +.literal} and disp_fc3.yaml should appear and put into the hdf +format by -``` +``` $ phono3py --dim="2 2 2" -c POSCAR ``` @@ -131,17 +131,17 @@ resulting in `fc2.hdf5` and `fc3.hdf5`{.docutils ### Thermal conductivity -<span class="pre">The phonon lifetime calculations takes some time, + The phonon lifetime calculations takes some time, however is independent on grid points, so could be splitted: -</span> -``` + +``` $ phono3py --fc3 --fc2 --dim="2 2 2" --mesh="9 9 9" --sigma 0.1 --wgp ``` -### <span class="n">Inspecting ir_grid_points.yaml</span> +### Inspecting ir_grid_points.yaml -``` +``` $ grep grid_point ir_grid_points.yaml num_reduced_ir_grid_points: 35 ir_grid_points: # [address, weight] @@ -185,21 +185,21 @@ ir_grid_points: # [address, weight] one finds which grid points needed to be calculated, for instance using following -``` +``` $ phono3py --fc3 --fc2 --dim="2 2 2" --mesh="9 9 9" -c POSCAR --sigma 0.1 --br --write-gamma --gp="0 1 2 ``` -<span class="n">one calculates grid points 0, 1, 2. To automize one can + one calculates grid points 0, 1, 2. 
To automize one can use for instance scripts to submit 5 points in series, see -[gofree-cond1.sh](phono3py-input/gofree-cond1.sh)</span> +[gofree-cond1.sh](phono3py-input/gofree-cond1.sh) -``` +``` $ qsub gofree-cond1.sh ``` -<span class="n">Finally the thermal conductivity result is produced by -grouping single conductivity per grid calculations using </span> + Finally the thermal conductivity result is produced by +grouping single conductivity per grid calculations using -``` +``` $ phono3py --fc3 --fc2 --dim="2 2 2" --mesh="9 9 9" --br --read_gamma ``` diff --git a/converted/docs.it4i.cz/salomon/software/compilers.md b/converted/docs.it4i.cz/salomon/software/compilers.md index 93eff8b158b2c0af8f82dcaa1824bcf2ca4a533d..294b3784c272924bdfda1204df3304a761cc3fcb 100644 --- a/converted/docs.it4i.cz/salomon/software/compilers.md +++ b/converted/docs.it4i.cz/salomon/software/compilers.md @@ -4,27 +4,27 @@ Compilers Available compilers, including GNU, INTEL and UPC compilers - + There are several compilers for different programming languages available on the cluster: -- C/C++ -- Fortran 77/90/95/HPF -- Unified Parallel C -- Java +- C/C++ +- Fortran 77/90/95/HPF +- Unified Parallel C +- Java The C/C++ and Fortran compilers are provided by: Opensource: -- GNU GCC -- Clang/LLVM +- GNU GCC +- Clang/LLVM Commercial licenses: -- Intel -- PGI +- Intel +- PGI Intel Compilers --------------- @@ -38,22 +38,22 @@ PGI Compilers The Portland Group Cluster Development Kit (PGI CDK) is available. - $ module load PGI - $ pgcc -v - $ pgc++ -v - $ pgf77 -v - $ pgf90 -v - $ pgf95 -v - $ pghpf -v + $ module load PGI + $ pgcc -v + $ pgc++ -v + $ pgf77 -v + $ pgf90 -v + $ pgf95 -v + $ pghpf -v The PGI CDK also incudes tools for debugging and profiling. PGDBG OpenMP/MPI debugger and PGPROF OpenMP/MPI profiler are available - $ module load PGI - $ module load Java - $ pgdbg & - $ pgprof & + $ module load PGI + $ module load Java + $ pgdbg & + $ pgprof & For more information, see the [PGI page](http://www.pgroup.com/products/pgicdk.htm). @@ -68,20 +68,20 @@ accessible in the search path by default. It is strongly recommended to use the up to date version which comes with the module GCC: - $ module load GCC - $ gcc -v - $ g++ -v - $ gfortran -v + $ module load GCC + $ gcc -v + $ g++ -v + $ gfortran -v With the module loaded two environment variables are predefined. One for maximum optimizations on the cluster's architecture, and the other for debugging purposes: - $ echo $OPTFLAGS - -O3 -march=native + $ echo $OPTFLAGS + -O3 -march=native - $ echo $DEBUGFLAGS - -O0 -g + $ echo $DEBUGFLAGS + -O0 -g For more information about the possibilities of the compilers, please see the man pages. 
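
For example, the predefined variables can be passed straight to the compiler (a small illustration; `hello.c` stands for any source file of your own):

```
# optimized build using the cluster-tuned flags
gcc $OPTFLAGS -o hello hello.c

# debug build
gcc $DEBUGFLAGS -o hello_dbg hello.c
```
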
@@ -91,42 +91,42 @@ Unified Parallel C UPC is supported by two compiler/runtime implementations: -- GNU - SMP/multi-threading support only -- Berkley - multi-node support as well as SMP/multi-threading support +- GNU - SMP/multi-threading support only +- Berkley - multi-node support as well as SMP/multi-threading support ### GNU UPC Compiler To use the GNU UPC compiler and run the compiled binaries use the module gupc - $ module add gupc - $ gupc -v - $ g++ -v + $ module add gupc + $ gupc -v + $ g++ -v Simple program to test the compiler - $ cat count.upc + $ cat count.upc - /* hello.upc - a simple UPC example */ - #include <upc.h> - #include <stdio.h> + /* hello.upc - a simple UPC example */ + #include <upc.h> + #include <stdio.h> - int main() { -  if (MYTHREAD == 0) { -    printf("Welcome to GNU UPC!!!n"); -  } -  upc_barrier; -  printf(" - Hello from thread %in", MYTHREAD); -  return 0; - } + int main() { +  if (MYTHREAD == 0) { +    printf("Welcome to GNU UPC!!!n"); +  } +  upc_barrier; +  printf(" - Hello from thread %in", MYTHREAD); +  return 0; + } To compile the example use - $ gupc -o count.upc.x count.upc + $ gupc -o count.upc.x count.upc To run the example with 5 threads issue - $ ./count.upc.x -fupc-threads-5 + $ ./count.upc.x -fupc-threads-5 For more informations see the man pages. @@ -135,8 +135,8 @@ For more informations see the man pages. To use the Berkley UPC compiler and runtime environment to run the binaries use the module bupc - $ module add bupc - $ upcc -version + $ module add bupc + $ upcc -version As default UPC network the "smp" is used. This is very quick and easy way for testing/debugging, but limited to one node only. @@ -144,40 +144,40 @@ way for testing/debugging, but limited to one node only. For production runs, it is recommended to use the native Infiband implementation of UPC network "ibv". For testing/debugging using multiple nodes, the "mpi" UPC network is recommended. Please note, that -**the selection of the network is done at the compile time** and not at +the selection of the network is done at the compile time** and not at runtime (as expected)! Example UPC code: - $ cat hello.upc + $ cat hello.upc - /* hello.upc - a simple UPC example */ - #include <upc.h> - #include <stdio.h> + /* hello.upc - a simple UPC example */ + #include <upc.h> + #include <stdio.h> - int main() { -  if (MYTHREAD == 0) { -    printf("Welcome to Berkeley UPC!!!n"); -  } -  upc_barrier; -  printf(" - Hello from thread %in", MYTHREAD); -  return 0; - } + int main() { +  if (MYTHREAD == 0) { +    printf("Welcome to Berkeley UPC!!!n"); +  } +  upc_barrier; +  printf(" - Hello from thread %in", MYTHREAD); +  return 0; + } To compile the example with the "ibv" UPC network use - $ upcc -network=ibv -o hello.upc.x hello.upc + $ upcc -network=ibv -o hello.upc.x hello.upc To run the example with 5 threads issue - $ upcrun -n 5 ./hello.upc.x + $ upcrun -n 5 ./hello.upc.x To run the example on two compute nodes using all 48 cores, with 48 threads, issue - $ qsub -I -q qprod -A PROJECT_ID -l select=2:ncpus=24 - $ module add bupc - $ upcrun -n 48 ./hello.upc.x + $ qsub -I -q qprod -A PROJECT_ID -l select=2:ncpus=24 + $ module add bupc + $ upcrun -n 48 ./hello.upc.x  For more informations see the man pages. 
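
As noted above, the "mpi" UPC network is the recommended way to test or debug across several nodes before switching to "ibv" for production runs; the compile step mirrors the "ibv" example (a sketch reusing the same `hello.upc` source):

```
# compile against the MPI-based UPC network for multi-node testing/debugging
upcc -network=mpi -o hello.upc.x hello.upc

# run with 48 threads inside the two-node interactive allocation shown above
upcrun -n 48 ./hello.upc.x
```
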
diff --git a/converted/docs.it4i.cz/salomon/software/comsol/comsol-multiphysics.md b/converted/docs.it4i.cz/salomon/software/comsol/comsol-multiphysics.md index 692b590436bded3466d9597a9e664b0ef09f9b0d..b444af48f2a132562f9df7ca5d271767e1349c8d 100644 --- a/converted/docs.it4i.cz/salomon/software/comsol/comsol-multiphysics.md +++ b/converted/docs.it4i.cz/salomon/software/comsol/comsol-multiphysics.md @@ -3,103 +3,103 @@ COMSOL Multiphysics® - -<span><span>Introduction -</span></span> + +>>Introduction + ------------------------- -<span><span>[COMSOL](http://www.comsol.com)</span></span><span><span> +>>[COMSOL](http://www.comsol.com)<span><span> is a powerful environment for modelling and solving various engineering and scientific problems based on partial differential equations. COMSOL is designed to solve coupled or multiphysics phenomena. For many standard engineering problems COMSOL provides add-on products such as electrical, mechanical, fluid flow, and chemical -applications.</span></span> +applications. -- <span><span>[Structural Mechanics - Module](http://www.comsol.com/structural-mechanics-module), - </span></span> +- >>[Structural Mechanics + Module](http://www.comsol.com/structural-mechanics-module), + -- <span><span>[Heat Transfer - Module](http://www.comsol.com/heat-transfer-module), - </span></span> +- >>[Heat Transfer + Module](http://www.comsol.com/heat-transfer-module), + -- <span><span>[CFD - Module](http://www.comsol.com/cfd-module), - </span></span> +- >>[CFD + Module](http://www.comsol.com/cfd-module), + -- <span><span>[Acoustics - Module](http://www.comsol.com/acoustics-module), - </span></span> +- >>[Acoustics + Module](http://www.comsol.com/acoustics-module), + -- <span><span>and [many - others](http://www.comsol.com/products)</span></span> +- >>and [many + others](http://www.comsol.com/products) -<span><span>COMSOL also allows an -</span></span><span><span><span><span>interface support for +>>COMSOL also allows an +>><span><span>interface support for equation-based modelling of -</span></span></span></span><span><span>partial differential -equations.</span></span> +</span></span>>>partial differential +equations. + +>>Execution -<span><span>Execution -</span></span> ---------------------- -<span><span>On the clusters COMSOL is available in the latest stable -version. There are two variants of the release:</span></span> - -- <span><span>**Non commercial**</span></span><span><span> or so - called </span></span><span><span>**EDU - variant**</span></span><span><span>, which can be used for research - and educational purposes.</span></span> - -- <span><span>**Commercial**</span></span><span><span> or so called - </span></span><span><span>**COM variant**</span></span><span><span>, - which can used also for commercial activities. - </span></span><span><span>**COM variant**</span></span><span><span> - has only subset of features compared to the - </span></span><span><span>**EDU - variant**</span></span><span><span> available. <span - class="internal-link"><span id="result_box" class="short_text"><span - class="hps">More</span> <span class="hps">about - licensing</span> will be posted <span class="hps">here - soon</span>.</span></span> - </span></span> - -<span><span>To load the of COMSOL load the module</span></span> - -``` +>>On the clusters COMSOL is available in the latest stable +version. There are two variants of the release: + +- >>**Non commercial**<span><span> or so + called >>**EDU + variant**>>, which can be used for research + and educational purposes. 
+ +- >>**Commercial**<span><span> or so called + >>**COM variant**</span></span><span><span>, + which can used also for commercial activities. + >>**COM variant**</span></span><span><span> + has only subset of features compared to the + >>**EDU + variant**>> available. <span + id="result_box" class="short_text"> + class="hps">More class="hps">about + licensing will be posted class="hps">here + soon.</span> + + +>>To load the of COMSOL load the module + +``` $ module load COMSOL/51-EDU ``` -<span><span>By default the </span></span><span><span>**EDU -variant**</span></span><span><span> will be loaded. If user needs other +>>By default the <span><span>**EDU +variant**>> will be loaded. If user needs other version or variant, load the particular version. To obtain the list of -available versions use</span></span> +available versions use -``` +``` $ module avail COMSOL ``` -<span><span>If user needs to prepare COMSOL jobs in the interactive mode +>>If user needs to prepare COMSOL jobs in the interactive mode it is recommend to use COMSOL on the compute nodes via PBS Pro scheduler. In order run the COMSOL Desktop GUI on Windows is recommended to use the [Virtual Network Computing -(VNC)](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html).</span></span> +(VNC)](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html). -``` +``` $ xhost + $ qsub -I -X -A PROJECT_ID -q qprod -l select=1:ppn=24 $ module load COMSOL $ comsol ``` -<span><span>To run COMSOL in batch mode, without the COMSOL Desktop GUI +>>To run COMSOL in batch mode, without the COMSOL Desktop GUI environment, user can utilized the default (comsol.pbs) job script and -execute it via the qsub command.</span></span> +execute it via the qsub command. -``` +``` #!/bin/bash #PBS -l select=3:ppn=24 #PBS -q qprod @@ -124,37 +124,37 @@ ntask=$(wc -l $PBS_NODEFILE) comsol -nn $ batch -configuration /tmp –mpiarg –rmk –mpiarg pbs -tmpdir /scratch/$USER/ -inputfile name_input_f.mph -outputfile name_output_f.mph -batchlog name_log_f.log ``` -<span><span>Working directory has to be created before sending the +>>Working directory has to be created before sending the (comsol.pbs) job script into the queue. Input file (name_input_f.mph) has to be in working directory or full path to input file has to be specified. The appropriate path to the temp directory of the job has to -be set by command option (-tmpdir).</span></span> +be set by command option (-tmpdir). LiveLink™* *for MATLAB^®^ ------------------------- -<span><span>COMSOL is the software package for the numerical solution of +>>COMSOL is the software package for the numerical solution of the partial differential equations. LiveLink for MATLAB allows connection to the -COMSOL</span></span><span><span>^<span><span><span><span><span><span><span>**®**</span></span></span></span></span></span></span>^</span></span><span><span> +COMSOL>>^<span><span><span><span><span><span><span>**®**</span></span></span></span></span></span></span>^</span></span><span><span> API (Application Programming Interface) with the benefits of the programming language and computing environment of the MATLAB. 
-</span></span> -<span><span>LiveLink for MATLAB is available in both -</span></span><span><span>**EDU**</span></span><span><span> and -</span></span><span><span>**COM**</span></span><span><span> -</span></span><span><span>**variant**</span></span><span><span> of the + +>>LiveLink for MATLAB is available in both +>>**EDU**</span></span><span><span> and +>>**COM**</span></span><span><span> +>>**variant**</span></span><span><span> of the COMSOL release. On the clusters 1 commercial -(</span></span><span><span>**COM**</span></span><span><span>) license +(>>**COM**</span></span><span><span>) license and the 5 educational -(</span></span><span><span>**EDU**</span></span><span><span>) licenses +(>>**EDU**</span></span><span><span>) licenses of LiveLink for MATLAB (please see the [ISV Licenses](../isv_licenses.html)) are available. Following example shows how to start COMSOL model from MATLAB via -LiveLink in the interactive mode.</span></span> +LiveLink in the interactive mode. -``` +``` $ xhost + $ qsub -I -X -A PROJECT_ID -q qexp -l select=1:ppn=24 $ module load MATLAB @@ -162,15 +162,15 @@ $ module load COMSOL $ comsol server MATLAB ``` -<span><span>At the first time to launch the LiveLink for MATLAB +>>At the first time to launch the LiveLink for MATLAB (client-MATLAB/server-COMSOL connection) the login and password is -requested and this information is not requested again.</span></span> +requested and this information is not requested again. -<span><span>To run LiveLink for MATLAB in batch mode with +>>To run LiveLink for MATLAB in batch mode with (comsol_matlab.pbs) job script you can utilize/modify the following -script and execute it via the qsub command.</span></span> +script and execute it via the qsub command. -``` +``` #!/bin/bash #PBS -l select=3:ppn=24 #PBS -q qprod diff --git a/converted/docs.it4i.cz/salomon/software/comsol/licensing-and-available-versions.md b/converted/docs.it4i.cz/salomon/software/comsol/licensing-and-available-versions.md index b5b4951999864f907738cab5b8ec3556f598c650..c12b4071016e0b938bfa6268be33ded0d9860364 100644 --- a/converted/docs.it4i.cz/salomon/software/comsol/licensing-and-available-versions.md +++ b/converted/docs.it4i.cz/salomon/software/comsol/licensing-and-available-versions.md @@ -4,17 +4,17 @@ Licensing and Available Versions Comsol licence can be used by: ------------------------------ -- all persons in the carrying out of the CE IT4Innovations Project (In - addition to the primary licensee, which is VSB - Technical - University of Ostrava, users are CE IT4Innovations third parties - - CE IT4Innovations project partners, particularly the University of - Ostrava, the Brno University of Technology - Faculty of Informatics, - the Silesian University in Opava, Institute of Geonics AS CR.) -- <span id="result_box" class="short_text"><span class="hps">all - persons</span> <span class="hps">who have a valid</span> <span - class="hps">license</span></span> -- <span id="result_box" class="short_text"><span class="hps">students - of</span> <span class="hps">the Technical University</span></span> +- all persons in the carrying out of the CE IT4Innovations Project (In + addition to the primary licensee, which is VSB - Technical + University of Ostrava, users are CE IT4Innovations third parties - + CE IT4Innovations project partners, particularly the University of + Ostrava, the Brno University of Technology - Faculty of Informatics, + the Silesian University in Opava, Institute of Geonics AS CR.) 
+- id="result_box" class="short_text"> class="hps">all + persons class="hps">who have a valid + class="hps">license +- id="result_box" class="short_text"> class="hps">students + of class="hps">the Technical University</span> Comsol EDU Network Licence -------------------------- @@ -27,13 +27,13 @@ Comsol COM Network Licence The licence intended to be used for science and research, publications, students’ projects, commercial research with no commercial use -restrictions. <span id="result_box"><span class="hps">E</span><span -class="hps">nables</span> <span class="hps">the solution</span> <span -class="hps">of at least</span> <span class="hps">one job</span> <span -class="hps">by one user</span> <span class="hps">in one</span> <span -class="hps">program start.</span></span> +restrictions. id="result_box"> class="hps">E<span +class="hps">nables class="hps">the solution +class="hps">of at least class="hps">one job +class="hps">by one user class="hps">in one +class="hps">program start. Available Versions ------------------ -- ver. 51 +- ver. 51 diff --git a/converted/docs.it4i.cz/salomon/software/debuggers.md b/converted/docs.it4i.cz/salomon/software/debuggers.md index 67defc2e2cefdc7e3226b6d3e876b10c67953a32..6ff12150489d2616f140cf0b59f654e00ab3e85e 100644 --- a/converted/docs.it4i.cz/salomon/software/debuggers.md +++ b/converted/docs.it4i.cz/salomon/software/debuggers.md @@ -3,7 +3,7 @@ Debuggers and profilers summary - + Introduction ------------ @@ -25,8 +25,8 @@ environment. Use [X display](../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html) for running the GUI. - $ module load intel - $ idb + $ module load intel + $ idb Read more at the [Intel Debugger](intel-suite/intel-debugger.html) page. @@ -42,8 +42,8 @@ every thread running as part of your program, or for every process - even if these processes are distributed across a cluster using an MPI implementation. - $ module load Forge - $ forge + $ module load Forge + $ forge Read more at the [Allinea DDT](debuggers/allinea-ddt.html) page. @@ -58,8 +58,8 @@ about several metrics along with clear behavior statements and hints to help you improve the efficiency of your runs. Our license is limited to 64 MPI processes. - $ module load PerformanceReports/6.0 - $ perf-report mpirun -n 64 ./my_application argument01 argument02 + $ module load PerformanceReports/6.0 + $ perf-report mpirun -n 64 ./my_application argument01 argument02 Read more at the [Allinea Performance Reports](debuggers/allinea-performance-reports.html) @@ -74,8 +74,8 @@ analyze, organize, and test programs, making it easy to isolate and identify problems in individual threads and processes in programs of great complexity. - $ module load TotalView/8.15.4-6-linux-x86-64 - $ totalview + $ module load TotalView/8.15.4-6-linux-x86-64 + $ totalview Read more at the [Totalview](debuggers/total-view.html) page. @@ -85,8 +85,8 @@ Vampir trace analyzer Vampir is a GUI trace analyzer for traces in OTF format. - $ module load Vampir/8.5.0 - $ vampir + $ module load Vampir/8.5.0 + $ vampir Read more at the [Vampir](debuggers/vampir.html) page. 
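
The one-line `perf-report` invocation above also works inside an ordinary batch job; a minimal sketch follows (the project ID, core counts and `./my_application` are placeholders to adapt):

```
#!/bin/bash
#PBS -q qprod
#PBS -l select=1:ncpus=24:mpiprocs=24
#PBS -A PROJECT_ID

cd $PBS_O_WORKDIR
module load PerformanceReports/6.0

# wrap the normal mpirun line; the summary is written next to the job output
# (typically as .txt and .html files)
perf-report mpirun -n 24 ./my_application argument01 argument02
```
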
diff --git a/converted/docs.it4i.cz/salomon/software/debuggers/3550e4ae-2eab-4571-8387-11a112dd6ca8.png b/converted/docs.it4i.cz/salomon/software/debuggers/Snmekobrazovky20160211v14.27.45.png
similarity index 100%
rename from converted/docs.it4i.cz/salomon/software/debuggers/3550e4ae-2eab-4571-8387-11a112dd6ca8.png
rename to converted/docs.it4i.cz/salomon/software/debuggers/Snmekobrazovky20160211v14.27.45.png
diff --git a/converted/docs.it4i.cz/salomon/software/debuggers/42d90ce5-8468-4edb-94bb-4009853d9f65.png b/converted/docs.it4i.cz/salomon/software/debuggers/Snmekobrazovky20160708v12.33.35.png
similarity index 100%
rename from converted/docs.it4i.cz/salomon/software/debuggers/42d90ce5-8468-4edb-94bb-4009853d9f65.png
rename to converted/docs.it4i.cz/salomon/software/debuggers/Snmekobrazovky20160708v12.33.35.png
diff --git a/converted/docs.it4i.cz/salomon/software/debuggers/aislinn.md b/converted/docs.it4i.cz/salomon/software/debuggers/aislinn.md
index 9f4f0e6349129a527def874de10c17e72edd25b3..fa876d1ca4a9ba0057df1a4f96efed267c2d13b3 100644
--- a/converted/docs.it4i.cz/salomon/software/debuggers/aislinn.md
+++ b/converted/docs.it4i.cz/salomon/software/debuggers/aislinn.md
@@ -1,15 +1,15 @@
Aislinn
=======

-- Aislinn is a dynamic verifier for MPI programs. For a fixed input it
-  covers all possible runs with respect to nondeterminism introduced
-  by MPI. It allows to detect bugs (for sure) that occurs very rare in
-  normal runs.
-- Aislinn detects problems like invalid memory accesses, deadlocks,
-  misuse of MPI, and resource leaks.
-- Aislinn is open-source software; you can use it without any
-  licensing limitations.
-- Web page of the project: <http://verif.cs.vsb.cz/aislinn/>
+- Aislinn is a dynamic verifier for MPI programs. For a fixed input it
+  covers all possible runs with respect to nondeterminism introduced
+  by MPI. This allows it to reliably detect bugs that occur only very
+  rarely in normal runs.
+- Aislinn detects problems like invalid memory accesses, deadlocks,
+  misuse of MPI, and resource leaks.
+- Aislinn is open-source software; you can use it without any
+  licensing limitations.
+- Web page of the project: <http://verif.cs.vsb.cz/aislinn/>

Note

@@ -22,36 +22,36 @@ problems, please contact the author: <stanislav.bohm@vsb.cz>.
Let us have the following program that contains a bug that is not manifested in all runs: -``` +``` #include <mpi.h> #include <stdlib.h> int main(int argc, char **argv) { - int rank; - - MPI_Init(&argc, &argv); - MPI_Comm_rank(MPI_COMM_WORLD, &rank); - - if (rank == 0) { - int *mem1 = (int*) malloc(sizeof(int) * 2); - int *mem2 = (int*) malloc(sizeof(int) * 3); - int data; - MPI_Recv(&data, 1, MPI_INT, MPI_ANY_SOURCE, 1, - MPI_COMM_WORLD, MPI_STATUS_IGNORE); - mem1[data] = 10; // <---------- Possible invalid memory write - MPI_Recv(&data, 1, MPI_INT, MPI_ANY_SOURCE, 1, - MPI_COMM_WORLD, MPI_STATUS_IGNORE); - mem2[data] = 10; - free(mem1); - free(mem2); - } - - if (rank == 1 || rank == 2) { - MPI_Send(&rank, 1, MPI_INT, 0, 1, MPI_COMM_WORLD); - } - - MPI_Finalize(); - return 0; + int rank; + + MPI_Init(&argc, &argv); + MPI_Comm_rank(MPI_COMM_WORLD, &rank); + + if (rank == 0) { + int *mem1 = (int*) malloc(sizeof(int) * 2); + int *mem2 = (int*) malloc(sizeof(int) * 3); + int data; + MPI_Recv(&data, 1, MPI_INT, MPI_ANY_SOURCE, 1, + MPI_COMM_WORLD, MPI_STATUS_IGNORE); + mem1[data] = 10; // <---------- Possible invalid memory write + MPI_Recv(&data, 1, MPI_INT, MPI_ANY_SOURCE, 1, + MPI_COMM_WORLD, MPI_STATUS_IGNORE); + mem2[data] = 10; + free(mem1); + free(mem2); + } + + if (rank == 1 || rank == 2) { + MPI_Send(&rank, 1, MPI_INT, 0, 1, MPI_COMM_WORLD); + } + + MPI_Finalize(); + return 0; } ``` @@ -63,7 +63,7 @@ memory write occurs at line 16. To verify this program by Aislinn, we first load Aislinn itself: -``` +``` $ module load aislinn ``` @@ -73,7 +73,7 @@ Now we compile the program by Aislinn implementation of MPI. There are has to be recompiled; non-MPI parts may remain untouched. Let us assume that our program is in `test.cpp`. -``` +``` $ mpicc -g test.cpp -o test ``` @@ -86,7 +86,7 @@ Now we run the Aislinn itself. The argument `-p 3` specifies that we want to verify our program for the case of three MPI processes -``` +``` $ aislinn -p 3 ./test ==AN== INFO: Aislinn v0.3.0 ==AN== INFO: Found error 'Invalid write' @@ -103,48 +103,48 @@ At the beginning of the report there are some basic summaries of the verification. In the second part (depicted in the following picture), the error is described. - + It shows us: -> - Error occurs in process 0 in test.cpp on line 16. -> - Stdout and stderr streams are empty. (The program does not -> write anything). -> - The last part shows MPI calls for each process that occurs in the -> invalid run. The more detailed information about each call can be -> obtained by mouse cursor. +> - Error occurs in process 0 in test.cpp on line 16. +> - Stdout and stderr streams are empty. (The program does not +> write anything). +> - The last part shows MPI calls for each process that occurs in the +> invalid run. The more detailed information about each call can be +> obtained by mouse cursor. ### Limitations Since the verification is a non-trivial process there are some of limitations. -- The verified process has to terminate in all runs, i.e. we cannot - answer the halting problem. -- The verification is a computationally and memory demanding process. - We put an effort to make it efficient and it is an important point - for further research. However covering all runs will be always more - demanding than techniques that examines only a single run. The good - practise is to start with small instances and when it is feasible, - make them bigger. The Aislinn is good to find bugs that are hard to - find because they occur very rarely (only in a rare scheduling). 
- Such bugs often do not need big instances. -- Aislinn expects that your program is a "standard MPI" program, i.e. - processes communicate only through MPI, the verified program does - not interacts with the system in some unusual ways (e.g. - opening sockets). +- The verified process has to terminate in all runs, i.e. we cannot + answer the halting problem. +- The verification is a computationally and memory demanding process. + We put an effort to make it efficient and it is an important point + for further research. However covering all runs will be always more + demanding than techniques that examines only a single run. The good + practise is to start with small instances and when it is feasible, + make them bigger. The Aislinn is good to find bugs that are hard to + find because they occur very rarely (only in a rare scheduling). + Such bugs often do not need big instances. +- Aislinn expects that your program is a "standard MPI" program, i.e. + processes communicate only through MPI, the verified program does + not interacts with the system in some unusual ways (e.g. + opening sockets). There are also some limitations bounded to the current version and they will be removed in the future: -- All files containing MPI calls have to be recompiled by MPI - implementation provided by Aislinn. The files that does not contain - MPI calls, they do not have to recompiled. Aislinn MPI - implementation supports many commonly used calls from MPI-2 and - MPI-3 related to point-to-point communication, collective - communication, and communicator management. Unfortunately, MPI-IO - and one-side communication is not implemented yet. -- Each MPI can use only one thread (if you use OpenMP, set - `OMP_NUM_THREADS` to 1). -- There are some limitations for using files, but if the program just - reads inputs and writes results, it is ok. +- All files containing MPI calls have to be recompiled by MPI + implementation provided by Aislinn. The files that does not contain + MPI calls, they do not have to recompiled. Aislinn MPI + implementation supports many commonly used calls from MPI-2 and + MPI-3 related to point-to-point communication, collective + communication, and communicator management. Unfortunately, MPI-IO + and one-side communication is not implemented yet. +- Each MPI can use only one thread (if you use OpenMP, set + `OMP_NUM_THREADS` to 1). +- There are some limitations for using files, but if the program just + reads inputs and writes results, it is ok. diff --git a/converted/docs.it4i.cz/salomon/software/debuggers/allinea-ddt.md b/converted/docs.it4i.cz/salomon/software/debuggers/allinea-ddt.md index e34630d9f7c06517629dd674a1d9c3ff1bb5af2b..014d2e7dcd9d95ca5884809e853da44b8415ed8b 100644 --- a/converted/docs.it4i.cz/salomon/software/debuggers/allinea-ddt.md +++ b/converted/docs.it4i.cz/salomon/software/debuggers/allinea-ddt.md @@ -3,7 +3,7 @@ Allinea Forge (DDT,MAP) - + Allinea Forge consist of two tools - debugger DDT and profiler MAP. @@ -25,13 +25,13 @@ On the clusters users can debug OpenMP or MPI code that runs up to 64 parallel processes. In case of debugging GPU or Xeon Phi accelerated codes the limit is 8 accelerators. These limitation means that: -- 1 user can debug up 64 processes, or -- 32 users can debug 2 processes, etc. +- 1 user can debug up 64 processes, or +- 32 users can debug 2 processes, etc. In case of debugging on accelerators: -- 1 user can debug on up to 8 accelerators, or -- 8 users can debug on single accelerator. 
+- 1 user can debug on up to 8 accelerators, or +- 8 users can debug on single accelerator. Compiling Code to run with Forge -------------------------------- @@ -40,16 +40,16 @@ Compiling Code to run with Forge Load all necessary modules to compile the code. For example: - $ module load intel - $ module load impi ... or ... module load OpenMPI + $ module load intel + $ module load impi ... or ... module load OpenMPI Load the Allinea DDT module: - $ module load Forge + $ module load Forge Compile the code: -``` +``` $ mpicc -g -O0 -o test_debug test.c $ mpif90 -g -O0 -o test_debug test.f @@ -59,54 +59,54 @@ $ mpif90 -g -O0 -o test_debug test.f Before debugging, you need to compile your code with theses flags: -**-g** : Generates extra debugging information usable by GDB. -g3 +-g** : Generates extra debugging information usable by GDB. -g3 includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers. -**-O0** : Suppress all optimizations. +-O0** : Suppress all optimizations.  Direct starting a Job with Forge -------------------------------- -Be sure to log in with an [<span class="internal-link">X window -forwarding</span> +Be sure to log in with an [ X window +forwarding enabled](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html). This could mean using the -X in the ssh:  - $ ssh -X username@clustername.it4i.cz + $ ssh -X username@clustername.it4i.cz Other options is to access login node using VNC. Please see the detailed -information on <span class="internal-link">[how to <span -class="internal-link">use graphic user interface on the -clusters</span>](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html)</span><span -class="internal-link"></span>. +information on [how to +use graphic user interface on the +clusters](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html) +. From the login node an interactive session **with X windows forwarding** (-X option) can be started by following command: - $ qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=24:mpiprocs=24,walltime=01:00:00 + $ qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=24:mpiprocs=24,walltime=01:00:00 Then launch the debugger with the ddt command followed by the name of the executable to debug: - $ ddt test_debug + $ ddt test_debug Forge now has common GUI for both DDT and MAP. In interactive mode, you -can launch Forge using <span class="monospace">forge, ddt or map, -</span>the latter two will just launch forge and swith to the respective +can launch Forge using forge, ddt or map, +the latter two will just launch forge and swith to the respective tab in the common GUI. -A<span style="text-align: start; "> submission window that appears have +A submission window that appears have a prefilled path to the executable to debug. You can select the number of MPI processors and/or OpenMP threads on which to run and press run. -Command line arguments to a program can be entered to the</span> -"Arguments<span class="Apple-converted-space">" </span><span -style="text-align: start; ">box.</span> +Command line arguments to a program can be entered to the +"Arguments " +box. -[{.image-inline width="451" -height="513"}](ddt1.png) + + To start the debugging directly without the submission window, user can specify the debugging and execution parameters from the command line. 
@@ -114,7 +114,7 @@ For example the number of MPI processes is set by option "-np 4". Skipping the dialog is done by "-start" option. To see the list of the "ddt" command line parameters, run "ddt --help".  - ddt -start -np 4 ./hello_debug_impi + ddt -start -np 4 ./hello_debug_impi All of the above text also applies for MAP, just replace ddt command with map. @@ -131,14 +131,14 @@ To use Reverse connect, use a jobscript that you would normally use to launch your application, just prepend ddt/map --connect to your application: - map --connect mpirun -np 24 ./mpi-test - ddt --connect mpirun -np 24 ./mpi-test + map --connect mpirun -np 24 ./mpi-test + ddt --connect mpirun -np 24 ./mpi-test Launch Forge GUI on login node and submit the job using qsub. When the job starts running, Forge will ask you to accept the connection: - + + After accepting the request, you can start remote profiling/debugging. @@ -153,9 +153,9 @@ Xeon Phi programs. It is recommended to set the following environment values on the offload host: - export MYO_WATCHDOG_MONITOR=-1 # To make sure the host process isn't killed when we enter a debugging session - export AMPLXE_COI_DEBUG_SUPPORT=true # To make sure that debugging symbols are accessible on the host and the card - unset OFFLOAD_MAIN # To make sure allinea DDT can attach to offloaded codes + export MYO_WATCHDOG_MONITOR=-1 # To make sure the host process isn't killed when we enter a debugging session + export AMPLXE_COI_DEBUG_SUPPORT=true # To make sure that debugging symbols are accessible on the host and the card + unset OFFLOAD_MAIN # To make sure allinea DDT can attach to offloaded codes Then use one of the above mentioned methods to launch Forge. (Reverse connect also works.) @@ -166,26 +166,26 @@ Native mode programs can be profiled/debugged using the remote launch feature. First, you need to create a script that will setup the environment on the Phi card. An example: - #!/bin/bash - # adjust PATH and LD_LIBRARY_PATH according to the toolchain/libraries your app is using. - export PATH=/apps/all/impi/5.0.3.048-iccifort-2015.3.187/mic/bin:$PATH - export LD_LIBRARY_PATH=/apps/all/impi/5.0.3.048-iccifort-2015.3.187/mic/lib:/apps/all/ifort/2015.3.187/lib/mic:/apps/all/icc/2015.3.187/lib/mic:$LD_LIBRARY_PATH - export MIC_OMP_NUM_THREADS=60 - export MYO_WATCHDOG_MONITOR=-1 - export AMPLXE_COI_DEBUG_SUPPORT=true - unset OFFLOAD_MAIN - export I_MPI_MIC=1 - -Save the script in eg.<span class="monospace"> ~/remote-mic.sh. -</span>Now, start an interactive graphical session on a node with + #!/bin/bash + # adjust PATH and LD_LIBRARY_PATH according to the toolchain/libraries your app is using. + export PATH=/apps/all/impi/5.0.3.048-iccifort-2015.3.187/mic/bin:$PATH + export LD_LIBRARY_PATH=/apps/all/impi/5.0.3.048-iccifort-2015.3.187/mic/lib:/apps/all/ifort/2015.3.187/lib/mic:/apps/all/icc/2015.3.187/lib/mic:$LD_LIBRARY_PATH + export MIC_OMP_NUM_THREADS=60 + export MYO_WATCHDOG_MONITOR=-1 + export AMPLXE_COI_DEBUG_SUPPORT=true + unset OFFLOAD_MAIN + export I_MPI_MIC=1 + +Save the script in eg. ~/remote-mic.sh. +Now, start an interactive graphical session on a node with accelerator: - $ qsub â€IX â€q qexp â€l select=1:ncpus=24:accelerator=True + $ qsub â€IX â€q qexp â€l select=1:ncpus=24:accelerator=True Launch Forge : - $ module load Forge - $ forge& + $ module load Forge + $ forge& Now click on the remote launch drop-down list, select "Configure..." 
and Add a new remote connection with the following parameters: @@ -209,7 +209,7 @@ Documentation Users can find original User Guide after loading the Forge module: - $EBROOTFORGE/doc/userguide-forge.pdf + $EBROOTFORGE/doc/userguide-forge.pdf  diff --git a/converted/docs.it4i.cz/salomon/software/debuggers/allinea-performance-reports.md b/converted/docs.it4i.cz/salomon/software/debuggers/allinea-performance-reports.md index c4bee9336b08b4dbe56ca6af93fa03fd4cf315fe..390af2892bf7c650fe8fbae991ccfcbc9abd616f 100644 --- a/converted/docs.it4i.cz/salomon/software/debuggers/allinea-performance-reports.md +++ b/converted/docs.it4i.cz/salomon/software/debuggers/allinea-performance-reports.md @@ -4,7 +4,7 @@ Allinea Performance Reports quick application profiling - + Introduction ------------ @@ -25,7 +25,7 @@ Modules Allinea Performance Reports version 6.0 is available - $ module load PerformanceReports/6.0 + $ module load PerformanceReports/6.0 The module sets up environment variables, required for using the Allinea Performance Reports. @@ -38,13 +38,13 @@ Use the the perf-report wrapper on your (MPI) program. Instead of [running your MPI program the usual way](../mpi-1.html), use the the perf report wrapper: - $ perf-report mpirun ./mympiprog.x + $ perf-report mpirun ./mympiprog.x The mpi program will run as usual. The perf-report creates two additional files, in *.txt and *.html format, containing the -performance report. Note that <span class="internal-link">demanding MPI -codes should be run within </span>[<span class="internal-link">the queue -system</span>](../../resource-allocation-and-job-execution/job-submission-and-execution.html). +performance report. Note that demanding MPI +codes should be run within [ the queue +system](../../resource-allocation-and-job-execution/job-submission-and-execution.html). Example ------- @@ -55,18 +55,18 @@ compilers and linked against intel MPI library: First, we allocate some nodes via the express queue: - $ qsub -q qexp -l select=2:ppn=24:mpiprocs=24:ompthreads=1 -I - qsub: waiting for job 262197.dm2 to start - qsub: job 262197.dm2 ready + $ qsub -q qexp -l select=2:ppn=24:mpiprocs=24:ompthreads=1 -I + qsub: waiting for job 262197.dm2 to start + qsub: job 262197.dm2 ready Then we load the modules and run the program the usual way: - $ module load intel impi PerfReports/6.0 - $ mpirun ./mympiprog.x + $ module load intel impi PerfReports/6.0 + $ mpirun ./mympiprog.x Now lets profile the code: - $ perf-report mpirun ./mympiprog.x + $ perf-report mpirun ./mympiprog.x Performance report files [mympiprog_32p*.txt](mympiprog_32p_2014-10-15_16-56.txt) diff --git a/converted/docs.it4i.cz/salomon/software/debuggers/intel-vtune-amplifier.md b/converted/docs.it4i.cz/salomon/software/debuggers/intel-vtune-amplifier.md index deee29179ee6d019750307f4405944803af576f0..51dbe43813b0a583fd3396a45eacb0351eb0d5fe 100644 --- a/converted/docs.it4i.cz/salomon/software/debuggers/intel-vtune-amplifier.md +++ b/converted/docs.it4i.cz/salomon/software/debuggers/intel-vtune-amplifier.md @@ -3,64 +3,64 @@ Intel VTune Amplifier XE - + Introduction ------------ -Intel*® *VTune™ <span>Amplifier, part of Intel Parallel studio, is a GUI +Intel*® *VTune™ >Amplifier, part of Intel Parallel studio, is a GUI profiling tool designed for Intel processors. It offers a graphical performance analysis of single core and multithreaded applications. 
A
-highlight of the features:</span>
+highlight of the features:

-- Hotspot analysis
-- Locks and waits analysis
-- Low level specific counters, such as branch analysis and memory
- bandwidth
-- Power usage analysis - frequency and sleep states.
+- Hotspot analysis
+- Locks and waits analysis
+- Low level specific counters, such as branch analysis and memory
+ bandwidth
+- Power usage analysis - frequency and sleep states.

-[](vtune-amplifier)
+

Usage
-----

-<span>To profile an application with VTune Amplifier, special kernel
+To profile an application with VTune Amplifier, special kernel
modules need to be loaded. The modules are not loaded on the login
nodes, thus direct profiling on login nodes is not possible. By default,
the kernel modules are not loaded on the compute nodes either. In order
to have the modules loaded, you need to specify the vtune=version PBS
resource at job submit. The version is the same as for the *environment
module*. For
-example to use <span
-class="monospace">VTune/2016_update1</span>:</span>
+example, to use
+VTune/2016_update1:

- $ qsub -q qexp -A OPEN-0-0 -I -l select=1,vtune=2016_update1
+ $ qsub -q qexp -A OPEN-0-0 -I -l select=1,vtune=2016_update1

After that, you can verify that the modules sep*, pax and vtsspp are
present in the kernel:

- $ lsmod | grep -e sep -e pax -e vtsspp
- vtsspp 362000 0
- sep3_15 546657 0
- pax 4312 0
+ $ lsmod | grep -e sep -e pax -e vtsspp
+ vtsspp 362000 0
+ sep3_15 546657 0
+ pax 4312 0

To launch the GUI, first load the module:

- $ module add VTune/2016_update1
+ $ module add VTune/2016_update1

-<span class="s1">and launch the GUI :</span>
+and launch the GUI:

- $ amplxe-gui
+ $ amplxe-gui

-<span>The GUI will open in new window. Click on "*New Project...*" to
+The GUI will open in a new window. Click on "*New Project...*" to
create a new project. After clicking *OK*, a new window with project
properties will appear. At "*Application:*", select the path to the
binary you want to profile (the binary should be compiled with the -g
flag). Some additional options such as command line arguments can be
selected. At "*Managed code profiling mode:*" select "*Native*" (unless
you want to profile managed mode .NET/Mono applications). After clicking
*OK*,
-your project is created.</span>
+your project is created.

To run a new analysis, click "*New analysis...*". You will see a list of
possible analyses. Some of them will not be possible on the current CPU
@@ -82,7 +82,7 @@
the command line needed to perform the selected analysis.

The command line will look like this:

- /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -collect advanced-hotspots -app-working-dir /home/sta545/tmp -- /home/sta545/tmp/sgemm
+ /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -collect advanced-hotspots -app-working-dir /home/sta545/tmp -- /home/sta545/tmp/sgemm

Copy the line to the clipboard; you can then paste it into your
jobscript or run it on the command line.
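For instance, a minimal batch jobscript wrapping the command line above might look roughly as follows. This is only a sketch: the queue, project ID, working directory and binary are placeholders taken from the examples above, and the vtune= resource value must match the VTune version you intend to use.

    #!/bin/bash
    #PBS -q qexp
    #PBS -A OPEN-0-0
    #PBS -l select=1,vtune=2016_update1

    # load the same VTune version that was requested via the vtune= resource
    module add VTune/2016_update1

    # collection command copied from the GUI ("Command line..." button)
    /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -collect advanced-hotspots -app-working-dir /home/sta545/tmp -- /home/sta545/tmp/sgemm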
After the collection is run, open the GUI once @@ -120,11 +120,11 @@ analyze it in the GUI later : Native launch: - $ /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -target-system mic-native:0 -collect advanced-hotspots -- /home/sta545/tmp/vect-add-mic + $ /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -target-system mic-native:0 -collect advanced-hotspots -- /home/sta545/tmp/vect-add-mic Host launch: - $ /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -target-system mic-host-launch:0 -collect advanced-hotspots -- /home/sta545/tmp/sgemm + $ /apps/all/VTune/2016_update1/vtune_amplifier_xe_2016.1.1.434111/bin64/amplxe-cl -target-system mic-host-launch:0 -collect advanced-hotspots -- /home/sta545/tmp/sgemm You can obtain this command line by pressing the "Command line..." button on Analysis Type screen. @@ -132,10 +132,10 @@ button on Analysis Type screen. References ---------- -1. <span><https://www.rcac.purdue.edu/tutorials/phi/PerformanceTuningXeonPhi-Tullos.pdf> Performance - Tuning for Intel® Xeon Phi™ Coprocessors</span> -2. <span><https://software.intel.com/en-us/intel-vtune-amplifier-xe-support/documentation> <span>Intel® - VTune™ Amplifier Support</span></span> -3. <span><span><https://software.intel.com/en-us/amplifier_help_linux> Linux - user guide</span></span> +1.><https://www.rcac.purdue.edu/tutorials/phi/PerformanceTuningXeonPhi-Tullos.pdf> Performance + Tuning for Intel® Xeon Phi™ Coprocessors +2.><https://software.intel.com/en-us/intel-vtune-amplifier-xe-support/documentation> >Intel® + VTune™ Amplifier Support +3.>><https://software.intel.com/en-us/amplifier_help_linux> Linux + user guide diff --git a/converted/docs.it4i.cz/salomon/software/debuggers/summary.md b/converted/docs.it4i.cz/salomon/software/debuggers/summary.md index f367995a76aca51d82430c58223c2498ba5da869..0739ba35eb0bd0a8ff9947f43b0ee7f94f627f70 100644 --- a/converted/docs.it4i.cz/salomon/software/debuggers/summary.md +++ b/converted/docs.it4i.cz/salomon/software/debuggers/summary.md @@ -3,7 +3,7 @@ Debuggers and profilers summary - + Introduction ------------ @@ -25,8 +25,8 @@ environment. Use [X display](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html) for running the GUI. - $ module load intel - $ idb + $ module load intel + $ idb Read more at the [Intel Debugger](../intel-suite/intel-debugger.html) page. @@ -42,8 +42,8 @@ every thread running as part of your program, or for every process - even if these processes are distributed across a cluster using an MPI implementation. - $ module load Forge - $ forge + $ module load Forge + $ forge Read more at the [Allinea DDT](allinea-ddt.html) page. @@ -57,8 +57,8 @@ about several metrics along with clear behavior statements and hints to help you improve the efficiency of your runs. Our license is limited to 64 MPI processes. - $ module load PerformanceReports/6.0 - $ perf-report mpirun -n 64 ./my_application argument01 argument02 + $ module load PerformanceReports/6.0 + $ perf-report mpirun -n 64 ./my_application argument01 argument02 Read more at the [Allinea Performance Reports](allinea-performance-reports.html) page. @@ -72,8 +72,8 @@ analyze, organize, and test programs, making it easy to isolate and identify problems in individual threads and processes in programs of great complexity. 
- $ module load TotalView/8.15.4-6-linux-x86-64 - $ totalview + $ module load TotalView/8.15.4-6-linux-x86-64 + $ totalview Read more at the [Totalview](total-view.html) page. @@ -82,8 +82,8 @@ Vampir trace analyzer Vampir is a GUI trace analyzer for traces in OTF format. - $ module load Vampir/8.5.0 - $ vampir + $ module load Vampir/8.5.0 + $ vampir Read more at the [Vampir](vampir.html) page. diff --git a/converted/docs.it4i.cz/salomon/software/debuggers/total-view.md b/converted/docs.it4i.cz/salomon/software/debuggers/total-view.md index b39e006c720f9809952ded41d147bc16ce9f01d2..5502fa7cf920f6baa1711bfee75d00c3264d59c7 100644 --- a/converted/docs.it4i.cz/salomon/software/debuggers/total-view.md +++ b/converted/docs.it4i.cz/salomon/software/debuggers/total-view.md @@ -24,29 +24,29 @@ Compiling Code to run with TotalView Load all necessary modules to compile the code. For example: - module load intel + module load intel - module load impi  ... or ... module load OpenMPI/X.X.X-icc + module load impi  ... or ... module load OpenMPI/X.X.X-icc Load the TotalView module: - module load TotalView/8.15.4-6-linux-x86-64 + module load TotalView/8.15.4-6-linux-x86-64 Compile the code: - mpicc -g -O0 -o test_debug test.c + mpicc -g -O0 -o test_debug test.c - mpif90 -g -O0 -o test_debug test.f + mpif90 -g -O0 -o test_debug test.f ### Compiler flags Before debugging, you need to compile your code with theses flags: -**-g** : Generates extra debugging information usable by GDB. -g3 +-g** : Generates extra debugging information usable by GDB. -g3 includes even more debugging information. This option is available for GNU and INTEL C/C++ and Fortran compilers. -**-O0** : Suppress all optimizations. +-O0** : Suppress all optimizations. Starting a Job with TotalView ----------------------------- @@ -54,7 +54,7 @@ Starting a Job with TotalView Be sure to log in with an X window forwarding enabled. This could mean using the -X in the ssh: - ssh -X username@salomon.it4i.cz + ssh -X username@salomon.it4i.cz Other options is to access login node using VNC. Please see the detailed information on how to use graphic user interface on Anselm @@ -63,7 +63,7 @@ information on how to use graphic user interface on Anselm From the login node an interactive session with X windows forwarding (-X option) can be started by following command: - qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=24:mpiprocs=24,walltime=01:00:00 + qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=24:mpiprocs=24,walltime=01:00:00 Then launch the debugger with the totalview command followed by the name of the executable to debug. @@ -72,59 +72,59 @@ of the executable to debug. 
To debug a serial code use: - totalview test_debug + totalview test_debug ### Debugging a parallel code - option 1 -To debug a parallel code compiled with <span>**OpenMPI**</span> you need +To debug a parallel code compiled with >**OpenMPI** you need to setup your TotalView environment: -**Please note:** To be able to run parallel debugging procedure from the +Please note:** To be able to run parallel debugging procedure from the command line without stopping the debugger in the mpiexec source code you have to add the following function to your **~/.tvdrc** file: - proc mpi_auto_run_starter { -    set starter_programs -    set executable_name [TV::symbol get $loaded_id full_pathname] -    set file_component [file tail $executable_name] + proc mpi_auto_run_starter { +    set starter_programs +    set executable_name [TV::symbol get $loaded_id full_pathname] +    set file_component [file tail $executable_name] -    if {[lsearch -exact $starter_programs $file_component] != -1} { -        puts "**************************************" -        puts "Automatically starting $file_component" -        puts "**************************************" -        dgo -    } - } +    if {[lsearch -exact $starter_programs $file_component] != -1} { +        puts "**************************************" +        puts "Automatically starting $file_component" +        puts "**************************************" +        dgo +    } + } - # Append this function to TotalView's image load callbacks so that - # TotalView run this program automatically. + # Append this function to TotalView's image load callbacks so that + # TotalView run this program automatically. - dlappend TV::image_load_callbacks mpi_auto_run_starter + dlappend TV::image_load_callbacks mpi_auto_run_starter The source code of this function can be also found in - /apps/all/OpenMPI/1.10.1-GNU-4.9.3-2.25/etc/openmpi-totalview.tcl + /apps/all/OpenMPI/1.10.1-GNU-4.9.3-2.25/etc/openmpi-totalview.tcl You can also add only following line to you ~/.tvdrc file instead of the entire function: -**source /apps/all/OpenMPI/1.10.1-GNU-4.9.3-2.25/etc/openmpi-totalview.tcl** +source /apps/all/OpenMPI/1.10.1-GNU-4.9.3-2.25/etc/openmpi-totalview.tcl** You need to do this step only once. See also [OpenMPI FAQ entry](https://www.open-mpi.org/faq/?category=running#run-with-tv) Now you can run the parallel debugger using: - mpirun -tv -n 5 ./test_debug + mpirun -tv -n 5 ./test_debug When following dialog appears click on "Yes" -[](totalview1.png) + At this point the main TotalView GUI window will appear and you can insert the breakpoints and start debugging: -[](totalview2.png) + ### Debugging a parallel code - option 2 @@ -135,9 +135,9 @@ to specify a MPI implementation used to compile the source code. The following example shows how to start debugging session with Intel MPI: - module load intel/2015b-intel-2015b impi/5.0.3.048-iccifort-2015.3.187-GNU-5.1.0-2.25 TotalView/8.15.4-6-linux-x86-64 + module load intel/2015b-intel-2015b impi/5.0.3.048-iccifort-2015.3.187-GNU-5.1.0-2.25 TotalView/8.15.4-6-linux-x86-64 - totalview -mpi "Intel MPI-Hydra" -np 8 ./hello_debug_impi + totalview -mpi "Intel MPI-Hydra" -np 8 ./hello_debug_impi After running previous command you will see the same window as shown in the screenshot above. 
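To recap option 1 above (an OpenMPI build), the individual steps combine into a session along the following lines. This is a sketch only: the OpenMPI module name is an assumption inferred from the installation path mentioned above and may differ on the cluster.

    # one-time setup: register the mpiexec auto-run starter (see option 1)
    echo 'source /apps/all/OpenMPI/1.10.1-GNU-4.9.3-2.25/etc/openmpi-totalview.tcl' >> ~/.tvdrc

    # interactive session with X forwarding
    qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=24:mpiprocs=24,walltime=01:00:00

    # inside the job: load the toolchain (module name assumed) and TotalView, then run
    module load OpenMPI/1.10.1-GNU-4.9.3-2.25 TotalView/8.15.4-6-linux-x86-64
    mpirun -tv -n 5 ./test_debug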
diff --git a/converted/docs.it4i.cz/salomon/software/debuggers/valgrind.md b/converted/docs.it4i.cz/salomon/software/debuggers/valgrind.md index 8781ae49abf3324bb960cec7ac31d479ca5d90ef..281244908b180a0e2cc66691a5fc71fe386550fd 100644 --- a/converted/docs.it4i.cz/salomon/software/debuggers/valgrind.md +++ b/converted/docs.it4i.cz/salomon/software/debuggers/valgrind.md @@ -19,279 +19,279 @@ Valgrind run 5-100 times slower. The main tools available in Valgrind are : -- **Memcheck**, the original, must used and default tool. Verifies - memory access in you program and can detect use of unitialized - memory, out of bounds memory access, memory leaks, double free, etc. -- **Massif**, a heap profiler. -- **Hellgrind** and **DRD** can detect race conditions in - multi-threaded applications. -- **Cachegrind**, a cache profiler. -- **Callgrind**, a callgraph analyzer. -- For a full list and detailed documentation, please refer to the - [official Valgrind - documentation](http://valgrind.org/docs/). +- **Memcheck**, the original, must used and default tool. Verifies + memory access in you program and can detect use of unitialized + memory, out of bounds memory access, memory leaks, double free, etc. +- **Massif**, a heap profiler. +- **Hellgrind** and **DRD** can detect race conditions in + multi-threaded applications. +- **Cachegrind**, a cache profiler. +- **Callgrind**, a callgraph analyzer. +- For a full list and detailed documentation, please refer to the + [official Valgrind + documentation](http://valgrind.org/docs/). Installed versions ------------------ There are two versions of Valgrind available on the cluster. -- <span>Version 3.8.1, installed by operating system vendor - in </span><span class="monospace">/usr/bin/valgrind. - </span><span>This version is available by default, without the need - to load any module. This version however does not provide additional - MPI support. Also, it does not support AVX2 instructions, - **debugging of an AVX2-enabled executable with this version will - fail**</span> -- <span><span>Version 3.11.0 built by ICC with support for Intel MPI, - available in - [module](../../environment-and-modules.html) </span></span><span - class="monospace">Valgrind/3.11.0-intel-2015b. </span>After loading - the module, this version replaces the default valgrind. -- Version 3.11.0 built by GCC with support for Open MPI, module <span - class="monospace">Valgrind/3.11.0-foss-2015b</span> +- >Version 3.8.1, installed by operating system vendor + in /usr/bin/valgrind. + >This version is available by default, without the need + to load any module. This version however does not provide additional + MPI support. Also, it does not support AVX2 instructions, + **debugging of an AVX2-enabled executable with this version will + fail** +- >>Version 3.11.0 built by ICC with support for Intel MPI, + available in + [module](../../environment-and-modules.html) + Valgrind/3.11.0-intel-2015b. After loading + the module, this version replaces the default valgrind. +- Version 3.11.0 built by GCC with support for Open MPI, module + Valgrind/3.11.0-foss-2015b Usage ----- Compile the application which you want to debug as usual. It is -advisable to add compilation flags <span class="monospace">-g </span>(to +advisable to add compilation flags -g (to add debugging information to the binary so that you will see original -source code lines in the output) and <span class="monospace">-O0</span> +source code lines in the output) and -O0 (to disable compiler optimizations). 
For example, lets look at this C code, which has two problems : - #include <stdlib.h> + #include <stdlib.h> - void f(void) - { - int* x = malloc(10 * sizeof(int)); - x[10] = 0; // problem 1: heap block overrun - } // problem 2: memory leak -- x not freed + void f(void) + { + int* x = malloc(10 * sizeof(int)); + x[10] = 0; // problem 1: heap block overrun + } // problem 2: memory leak -- x not freed - int main(void) - { - f(); - return 0; - } + int main(void) + { + f(); + return 0; + } Now, compile it with Intel compiler : - $ module add intel - $ icc -g valgrind-example.c -o valgrind-example + $ module add intel + $ icc -g valgrind-example.c -o valgrind-example Now, lets run it with Valgrind. The syntax is : -<span class="monospace">valgrind [valgrind options] <your program -binary> [your program options]</span> + valgrind [valgrind options] <your program +binary> [your program options] If no Valgrind options are specified, Valgrind defaults to running Memcheck tool. Please refer to the Valgrind documentation for a full description of command line options. - $ valgrind ./valgrind-example - ==12652== Memcheck, a memory error detector - ==12652== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. - ==12652== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info - ==12652== Command: ./valgrind-example - ==12652== - ==12652== Invalid write of size 4 - ==12652== at 0x40053E: f (valgrind-example.c:6) - ==12652== by 0x40054E: main (valgrind-example.c:11) - ==12652== Address 0x5861068 is 0 bytes after a block of size 40 alloc'd - ==12652== at 0x4C27AAA: malloc (vg_replace_malloc.c:291) - ==12652== by 0x400528: f (valgrind-example.c:5) - ==12652== by 0x40054E: main (valgrind-example.c:11) - ==12652== - ==12652== - ==12652== HEAP SUMMARY: - ==12652== in use at exit: 40 bytes in 1 blocks - ==12652== total heap usage: 1 allocs, 0 frees, 40 bytes allocated - ==12652== - ==12652== LEAK SUMMARY: - ==12652== definitely lost: 40 bytes in 1 blocks - ==12652== indirectly lost: 0 bytes in 0 blocks - ==12652== possibly lost: 0 bytes in 0 blocks - ==12652== still reachable: 0 bytes in 0 blocks - ==12652== suppressed: 0 bytes in 0 blocks - ==12652== Rerun with --leak-check=full to see details of leaked memory - ==12652== - ==12652== For counts of detected and suppressed errors, rerun with: -v - ==12652== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 6 from 6) + $ valgrind ./valgrind-example + ==12652== Memcheck, a memory error detector + ==12652== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. 
+ ==12652== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info + ==12652== Command: ./valgrind-example + ==12652== + ==12652== Invalid write of size 4 + ==12652== at 0x40053E: f (valgrind-example.c:6) + ==12652== by 0x40054E: main (valgrind-example.c:11) + ==12652== Address 0x5861068 is 0 bytes after a block of size 40 alloc'd + ==12652== at 0x4C27AAA: malloc (vg_replace_malloc.c:291) + ==12652== by 0x400528: f (valgrind-example.c:5) + ==12652== by 0x40054E: main (valgrind-example.c:11) + ==12652== + ==12652== + ==12652== HEAP SUMMARY: + ==12652== in use at exit: 40 bytes in 1 blocks + ==12652== total heap usage: 1 allocs, 0 frees, 40 bytes allocated + ==12652== + ==12652== LEAK SUMMARY: + ==12652== definitely lost: 40 bytes in 1 blocks + ==12652== indirectly lost: 0 bytes in 0 blocks + ==12652== possibly lost: 0 bytes in 0 blocks + ==12652== still reachable: 0 bytes in 0 blocks + ==12652== suppressed: 0 bytes in 0 blocks + ==12652== Rerun with --leak-check=full to see details of leaked memory + ==12652== + ==12652== For counts of detected and suppressed errors, rerun with: -v + ==12652== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 6 from 6) In the output we can see that Valgrind has detected both errors - the off-by-one memory access at line 5 and a memory leak of 40 bytes. If we want a detailed analysis of the memory leak, we need to run Valgrind -with <span class="monospace">--leak-check=full</span> option : - - $ valgrind --leak-check=full ./valgrind-example - ==23856== Memcheck, a memory error detector - ==23856== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al. - ==23856== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info - ==23856== Command: ./valgrind-example - ==23856== - ==23856== Invalid write of size 4 - ==23856== at 0x40067E: f (valgrind-example.c:6) - ==23856== by 0x40068E: main (valgrind-example.c:11) - ==23856== Address 0x66e7068 is 0 bytes after a block of size 40 alloc'd - ==23856== at 0x4C26FDE: malloc (vg_replace_malloc.c:236) - ==23856== by 0x400668: f (valgrind-example.c:5) - ==23856== by 0x40068E: main (valgrind-example.c:11) - ==23856== - ==23856== - ==23856== HEAP SUMMARY: - ==23856== in use at exit: 40 bytes in 1 blocks - ==23856== total heap usage: 1 allocs, 0 frees, 40 bytes allocated - ==23856== - ==23856== 40 bytes in 1 blocks are definitely lost in loss record 1 of 1 - ==23856== at 0x4C26FDE: malloc (vg_replace_malloc.c:236) - ==23856== by 0x400668: f (valgrind-example.c:5) - ==23856== by 0x40068E: main (valgrind-example.c:11) - ==23856== - ==23856== LEAK SUMMARY: - ==23856== definitely lost: 40 bytes in 1 blocks - ==23856== indirectly lost: 0 bytes in 0 blocks - ==23856== possibly lost: 0 bytes in 0 blocks - ==23856== still reachable: 0 bytes in 0 blocks - ==23856== suppressed: 0 bytes in 0 blocks - ==23856== - ==23856== For counts of detected and suppressed errors, rerun with: -v - ==23856== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 6 from 6) - -Now we can see that the memory leak is due to the <span -class="monospace">malloc()</span> at line 6. - -<span>Usage with MPI</span> +with --leak-check=full option : + + $ valgrind --leak-check=full ./valgrind-example + ==23856== Memcheck, a memory error detector + ==23856== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al. 
+ ==23856== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info + ==23856== Command: ./valgrind-example + ==23856== + ==23856== Invalid write of size 4 + ==23856== at 0x40067E: f (valgrind-example.c:6) + ==23856== by 0x40068E: main (valgrind-example.c:11) + ==23856== Address 0x66e7068 is 0 bytes after a block of size 40 alloc'd + ==23856== at 0x4C26FDE: malloc (vg_replace_malloc.c:236) + ==23856== by 0x400668: f (valgrind-example.c:5) + ==23856== by 0x40068E: main (valgrind-example.c:11) + ==23856== + ==23856== + ==23856== HEAP SUMMARY: + ==23856== in use at exit: 40 bytes in 1 blocks + ==23856== total heap usage: 1 allocs, 0 frees, 40 bytes allocated + ==23856== + ==23856== 40 bytes in 1 blocks are definitely lost in loss record 1 of 1 + ==23856== at 0x4C26FDE: malloc (vg_replace_malloc.c:236) + ==23856== by 0x400668: f (valgrind-example.c:5) + ==23856== by 0x40068E: main (valgrind-example.c:11) + ==23856== + ==23856== LEAK SUMMARY: + ==23856== definitely lost: 40 bytes in 1 blocks + ==23856== indirectly lost: 0 bytes in 0 blocks + ==23856== possibly lost: 0 bytes in 0 blocks + ==23856== still reachable: 0 bytes in 0 blocks + ==23856== suppressed: 0 bytes in 0 blocks + ==23856== + ==23856== For counts of detected and suppressed errors, rerun with: -v + ==23856== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 6 from 6) + +Now we can see that the memory leak is due to the +malloc() at line 6. + +>Usage with MPI --------------------------- Although Valgrind is not primarily a parallel debugger, it can be used to debug parallel applications as well. When launching your parallel applications, prepend the valgrind command. For example : - $ mpirun -np 4 valgrind myapplication + $ mpirun -np 4 valgrind myapplication The default version without MPI support will however report a large number of false errors in the MPI library, such as : - ==30166== Conditional jump or move depends on uninitialised value(s) - ==30166== at 0x4C287E8: strlen (mc_replace_strmem.c:282) - ==30166== by 0x55443BD: I_MPI_Processor_model_number (init_interface.c:427) - ==30166== by 0x55439E0: I_MPI_Processor_arch_code (init_interface.c:171) - ==30166== by 0x558D5AE: MPID_nem_impi_init_shm_configuration (mpid_nem_impi_extensions.c:1091) - ==30166== by 0x5598F4C: MPID_nem_init_ckpt (mpid_nem_init.c:566) - ==30166== by 0x5598B65: MPID_nem_init (mpid_nem_init.c:489) - ==30166== by 0x539BD75: MPIDI_CH3_Init (ch3_init.c:64) - ==30166== by 0x5578743: MPID_Init (mpid_init.c:193) - ==30166== by 0x554650A: MPIR_Init_thread (initthread.c:539) - ==30166== by 0x553369F: PMPI_Init (init.c:195) - ==30166== by 0x4008BD: main (valgrind-example-mpi.c:18) + ==30166== Conditional jump or move depends on uninitialised value(s) + ==30166== at 0x4C287E8: strlen (mc_replace_strmem.c:282) + ==30166== by 0x55443BD: I_MPI_Processor_model_number (init_interface.c:427) + ==30166== by 0x55439E0: I_MPI_Processor_arch_code (init_interface.c:171) + ==30166== by 0x558D5AE: MPID_nem_impi_init_shm_configuration (mpid_nem_impi_extensions.c:1091) + ==30166== by 0x5598F4C: MPID_nem_init_ckpt (mpid_nem_init.c:566) + ==30166== by 0x5598B65: MPID_nem_init (mpid_nem_init.c:489) + ==30166== by 0x539BD75: MPIDI_CH3_Init (ch3_init.c:64) + ==30166== by 0x5578743: MPID_Init (mpid_init.c:193) + ==30166== by 0x554650A: MPIR_Init_thread (initthread.c:539) + ==30166== by 0x553369F: PMPI_Init (init.c:195) + ==30166== by 0x4008BD: main (valgrind-example-mpi.c:18) so it is better to use the MPI-enabled valgrind from module. 
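Before launching, it may be worth confirming that the module build (and not the system 3.8.1 installation) is the valgrind actually found on the PATH:

    $ module add Valgrind/3.11.0-intel-2015b
    $ which valgrind       # should point into the module installation, not /usr/bin/valgrind
    $ valgrind --version   # should report valgrind-3.11.0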
The MPI versions requires library : -<span -class="monospace">$EBROOTVALGRIND/lib/valgrind/libmpiwrap-amd64-linux.so</span> -which must be included in the<span class="monospace"> LD_PRELOAD -</span>environment variable. +$EBROOTVALGRIND/lib/valgrind/libmpiwrap-amd64-linux.so + +which must be included in the LD_PRELOAD +environment variable. Lets look at this MPI example : - #include <stdlib.h> - #include <mpi.h> + #include <stdlib.h> + #include <mpi.h> - int main(int argc, char *argv[]) - { -      int *data = malloc(sizeof(int)*99); + int main(int argc, char *argv[]) + { +      int *data = malloc(sizeof(int)*99); -      MPI_Init(&argc, &argv); -     MPI_Bcast(data, 100, MPI_INT, 0, MPI_COMM_WORLD); -      MPI_Finalize(); +      MPI_Init(&argc, &argv); +     MPI_Bcast(data, 100, MPI_INT, 0, MPI_COMM_WORLD); +      MPI_Finalize(); -        return 0; - } +        return 0; + } There are two errors - use of uninitialized memory and invalid length of the buffer. Lets debug it with valgrind : - $ module add intel impi - $ mpiicc -g valgrind-example-mpi.c -o valgrind-example-mpi - $ module add Valgrind/3.11.0-intel-2015b - $ mpirun -np 2 -env LD_PRELOAD $EBROOTVALGRIND/lib/valgrind/libmpiwrap-amd64-linux.so valgrind ./valgrind-example-mpi + $ module add intel impi + $ mpiicc -g valgrind-example-mpi.c -o valgrind-example-mpi + $ module add Valgrind/3.11.0-intel-2015b + $ mpirun -np 2 -env LD_PRELOAD $EBROOTVALGRIND/lib/valgrind/libmpiwrap-amd64-linux.so valgrind ./valgrind-example-mpi Prints this output : (note that there is output printed for every launched MPI process) - ==31318== Memcheck, a memory error detector - ==31318== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. - ==31318== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info - ==31318== Command: ./valgrind-example-mpi - ==31318== - ==31319== Memcheck, a memory error detector - ==31319== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. 
- ==31319== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info - ==31319== Command: ./valgrind-example-mpi - ==31319== - valgrind MPI wrappers 31319: Active for pid 31319 - valgrind MPI wrappers 31319: Try MPIWRAP_DEBUG=help for possible options - valgrind MPI wrappers 31318: Active for pid 31318 - valgrind MPI wrappers 31318: Try MPIWRAP_DEBUG=help for possible options - ==31319== Unaddressable byte(s) found during client check request - ==31319== at 0x4E35974: check_mem_is_addressable_untyped (libmpiwrap.c:960) - ==31319== by 0x4E5D0FE: PMPI_Bcast (libmpiwrap.c:908) - ==31319== by 0x400911: main (valgrind-example-mpi.c:20) - ==31319== Address 0x69291cc is 0 bytes after a block of size 396 alloc'd - ==31319== at 0x4C27AAA: malloc (vg_replace_malloc.c:291) - ==31319== by 0x4007BC: main (valgrind-example-mpi.c:8) - ==31319== - ==31318== Uninitialised byte(s) found during client check request - ==31318== at 0x4E3591D: check_mem_is_defined_untyped (libmpiwrap.c:952) - ==31318== by 0x4E5D06D: PMPI_Bcast (libmpiwrap.c:908) - ==31318== by 0x400911: main (valgrind-example-mpi.c:20) - ==31318== Address 0x6929040 is 0 bytes inside a block of size 396 alloc'd - ==31318== at 0x4C27AAA: malloc (vg_replace_malloc.c:291) - ==31318== by 0x4007BC: main (valgrind-example-mpi.c:8) - ==31318== - ==31318== Unaddressable byte(s) found during client check request - ==31318== at 0x4E3591D: check_mem_is_defined_untyped (libmpiwrap.c:952) - ==31318== by 0x4E5D06D: PMPI_Bcast (libmpiwrap.c:908) - ==31318== by 0x400911: main (valgrind-example-mpi.c:20) - ==31318== Address 0x69291cc is 0 bytes after a block of size 396 alloc'd - ==31318== at 0x4C27AAA: malloc (vg_replace_malloc.c:291) - ==31318== by 0x4007BC: main (valgrind-example-mpi.c:8) - ==31318== - ==31318== - ==31318== HEAP SUMMARY: - ==31318== in use at exit: 3,172 bytes in 67 blocks - ==31318== total heap usage: 191 allocs, 124 frees, 81,203 bytes allocated - ==31318== - ==31319== - ==31319== HEAP SUMMARY: - ==31319== in use at exit: 3,172 bytes in 67 blocks - ==31319== total heap usage: 175 allocs, 108 frees, 48,435 bytes allocated - ==31319== - ==31318== LEAK SUMMARY: - ==31318== definitely lost: 408 bytes in 3 blocks - ==31318== indirectly lost: 256 bytes in 1 blocks - ==31318== possibly lost: 0 bytes in 0 blocks - ==31318== still reachable: 2,508 bytes in 63 blocks - ==31318== suppressed: 0 bytes in 0 blocks - ==31318== Rerun with --leak-check=full to see details of leaked memory - ==31318== - ==31318== For counts of detected and suppressed errors, rerun with: -v - ==31318== Use --track-origins=yes to see where uninitialised values come from - ==31318== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 4 from 4) - ==31319== LEAK SUMMARY: - ==31319== definitely lost: 408 bytes in 3 blocks - ==31319== indirectly lost: 256 bytes in 1 blocks - ==31319== possibly lost: 0 bytes in 0 blocks - ==31319== still reachable: 2,508 bytes in 63 blocks - ==31319== suppressed: 0 bytes in 0 blocks - ==31319== Rerun with --leak-check=full to see details of leaked memory - ==31319== - ==31319== For counts of detected and suppressed errors, rerun with: -v - ==31319== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4) + ==31318== Memcheck, a memory error detector + ==31318== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. 
+ ==31318== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info + ==31318== Command: ./valgrind-example-mpi + ==31318== + ==31319== Memcheck, a memory error detector + ==31319== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. + ==31319== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info + ==31319== Command: ./valgrind-example-mpi + ==31319== + valgrind MPI wrappers 31319: Active for pid 31319 + valgrind MPI wrappers 31319: Try MPIWRAP_DEBUG=help for possible options + valgrind MPI wrappers 31318: Active for pid 31318 + valgrind MPI wrappers 31318: Try MPIWRAP_DEBUG=help for possible options + ==31319== Unaddressable byte(s) found during client check request + ==31319== at 0x4E35974: check_mem_is_addressable_untyped (libmpiwrap.c:960) + ==31319== by 0x4E5D0FE: PMPI_Bcast (libmpiwrap.c:908) + ==31319== by 0x400911: main (valgrind-example-mpi.c:20) + ==31319== Address 0x69291cc is 0 bytes after a block of size 396 alloc'd + ==31319== at 0x4C27AAA: malloc (vg_replace_malloc.c:291) + ==31319== by 0x4007BC: main (valgrind-example-mpi.c:8) + ==31319== + ==31318== Uninitialised byte(s) found during client check request + ==31318== at 0x4E3591D: check_mem_is_defined_untyped (libmpiwrap.c:952) + ==31318== by 0x4E5D06D: PMPI_Bcast (libmpiwrap.c:908) + ==31318== by 0x400911: main (valgrind-example-mpi.c:20) + ==31318== Address 0x6929040 is 0 bytes inside a block of size 396 alloc'd + ==31318== at 0x4C27AAA: malloc (vg_replace_malloc.c:291) + ==31318== by 0x4007BC: main (valgrind-example-mpi.c:8) + ==31318== + ==31318== Unaddressable byte(s) found during client check request + ==31318== at 0x4E3591D: check_mem_is_defined_untyped (libmpiwrap.c:952) + ==31318== by 0x4E5D06D: PMPI_Bcast (libmpiwrap.c:908) + ==31318== by 0x400911: main (valgrind-example-mpi.c:20) + ==31318== Address 0x69291cc is 0 bytes after a block of size 396 alloc'd + ==31318== at 0x4C27AAA: malloc (vg_replace_malloc.c:291) + ==31318== by 0x4007BC: main (valgrind-example-mpi.c:8) + ==31318== + ==31318== + ==31318== HEAP SUMMARY: + ==31318== in use at exit: 3,172 bytes in 67 blocks + ==31318== total heap usage: 191 allocs, 124 frees, 81,203 bytes allocated + ==31318== + ==31319== + ==31319== HEAP SUMMARY: + ==31319== in use at exit: 3,172 bytes in 67 blocks + ==31319== total heap usage: 175 allocs, 108 frees, 48,435 bytes allocated + ==31319== + ==31318== LEAK SUMMARY: + ==31318== definitely lost: 408 bytes in 3 blocks + ==31318== indirectly lost: 256 bytes in 1 blocks + ==31318== possibly lost: 0 bytes in 0 blocks + ==31318== still reachable: 2,508 bytes in 63 blocks + ==31318== suppressed: 0 bytes in 0 blocks + ==31318== Rerun with --leak-check=full to see details of leaked memory + ==31318== + ==31318== For counts of detected and suppressed errors, rerun with: -v + ==31318== Use --track-origins=yes to see where uninitialised values come from + ==31318== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 4 from 4) + ==31319== LEAK SUMMARY: + ==31319== definitely lost: 408 bytes in 3 blocks + ==31319== indirectly lost: 256 bytes in 1 blocks + ==31319== possibly lost: 0 bytes in 0 blocks + ==31319== still reachable: 2,508 bytes in 63 blocks + ==31319== suppressed: 0 bytes in 0 blocks + ==31319== Rerun with --leak-check=full to see details of leaked memory + ==31319== + ==31319== For counts of detected and suppressed errors, rerun with: -v + ==31319== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4) We can see that Valgrind has reported use of unitialised memory on the master 
process (which reads the array to be broadcasted) and use of diff --git a/converted/docs.it4i.cz/salomon/software/debuggers/vampir.md b/converted/docs.it4i.cz/salomon/software/debuggers/vampir.md index 8ac45ee1c4d2d39df84649f39a959e0170a8fdd6..4fb7ad88a4b64c6abdebc1014cb936c1f5f4aefd 100644 --- a/converted/docs.it4i.cz/salomon/software/debuggers/vampir.md +++ b/converted/docs.it4i.cz/salomon/software/debuggers/vampir.md @@ -7,28 +7,28 @@ functionality to collect traces, you need to use a trace collection tool (such as [Score-P](score-p.html)) first to collect the traces. - + ---------------------------------------------------------------------------------------------------------------------------------------------- Installed versions ------------------ -Version 8.5.0 is currently installed as module <span -class="monospace">Vampir/8.5.0</span> : +Version 8.5.0 is currently installed as module +Vampir/8.5.0 : - $ module load Vampir/8.5.0 - $ vampir & + $ module load Vampir/8.5.0 + $ vampir & User manual ----------- -You can find the detailed user manual in PDF format in <span -class="monospace">$EBROOTVAMPIR/doc/vampir-manual.pdf</span> +You can find the detailed user manual in PDF format in +$EBROOTVAMPIR/doc/vampir-manual.pdf References ---------- -1. <https://www.vampir.eu> +1.<https://www.vampir.eu> diff --git a/converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/3d4533af-8ce5-4aed-9bac-09fbbcd2650a.png b/converted/docs.it4i.cz/salomon/software/debuggers/vtune-amplifier.png similarity index 100% rename from converted/docs.it4i.cz/anselm-cluster-documentation/software/debuggers/3d4533af-8ce5-4aed-9bac-09fbbcd2650a.png rename to converted/docs.it4i.cz/salomon/software/debuggers/vtune-amplifier.png diff --git a/converted/docs.it4i.cz/salomon/software/intel-suite/fb3b3ac2-a88f-4e55-a25e-23f1da2200cb.png b/converted/docs.it4i.cz/salomon/software/intel-suite/Snmekobrazovky20151204v15.35.12.png similarity index 100% rename from converted/docs.it4i.cz/salomon/software/intel-suite/fb3b3ac2-a88f-4e55-a25e-23f1da2200cb.png rename to converted/docs.it4i.cz/salomon/software/intel-suite/Snmekobrazovky20151204v15.35.12.png diff --git a/converted/docs.it4i.cz/salomon/software/intel-suite/intel-advisor.md b/converted/docs.it4i.cz/salomon/software/intel-suite/intel-advisor.md index c0e3f55bdc7e107366a498d8042dafa4e9ed1928..12a07b716878a7d98fb7be22c56e0a5e7415fd9f 100644 --- a/converted/docs.it4i.cz/salomon/software/intel-suite/intel-advisor.md +++ b/converted/docs.it4i.cz/salomon/software/intel-suite/intel-advisor.md @@ -11,10 +11,10 @@ Installed versions The following versions are currently available on Salomon as modules: - --------------- ----------------------- - **Version** **Module** - 2016 Update 2 Advisor/2016_update2 - --------------- ----------------------- +--------------- ----------------------- +Version** **Module** +2016 Update 2 Advisor/2016_update2 +--------------- ----------------------- Usage ----- @@ -28,7 +28,7 @@ line. To profile from GUI, launch Advisor: - $ advixe-gui + $ advixe-gui Then select menu File -> New -> Project. Choose a directory to save project data to. After clicking OK, Project properties window will @@ -44,11 +44,11 @@ command line. References ---------- -1. [Intel® Advisor 2015 Tutorial: Find Where to Add Parallelism - C++ - Sample](https://software.intel.com/en-us/advisorxe_2015_tut_lin_c) -2. [Product - page](https://software.intel.com/en-us/intel-advisor-xe) -3. 
[Documentation](https://software.intel.com/en-us/intel-advisor-2016-user-guide-linux) +1.[Intel® Advisor 2015 Tutorial: Find Where to Add Parallelism - C++ + Sample](https://software.intel.com/en-us/advisorxe_2015_tut_lin_c) +2.[Product + page](https://software.intel.com/en-us/intel-advisor-xe) +3.[Documentation](https://software.intel.com/en-us/intel-advisor-2016-user-guide-linux)  diff --git a/converted/docs.it4i.cz/salomon/software/intel-suite/intel-compilers.md b/converted/docs.it4i.cz/salomon/software/intel-suite/intel-compilers.md index 39bbbf195e07d4219b2e3db6e34028b75172feba..6dd1fa07a77fee58fe0bc8fe768fa11681b53209 100644 --- a/converted/docs.it4i.cz/salomon/software/intel-suite/intel-compilers.md +++ b/converted/docs.it4i.cz/salomon/software/intel-suite/intel-compilers.md @@ -3,15 +3,15 @@ Intel Compilers - + The Intel compilers in multiple versions are available, via module intel. The compilers include the icc C and C++ compiler and the ifort fortran 77/90/95 compiler. - $ module load intel - $ icc -v - $ ifort -v + $ module load intel + $ icc -v + $ ifort -v The intel compilers provide for vectorization of the code, via the AVX2 instructions and support threading parallelization via OpenMP @@ -21,8 +21,8 @@ your programs using the AVX2 instructions, with reporting where the vectorization was used. We recommend following compilation options for high performance - $ icc -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec myprog.c mysubroutines.c -o myprog.x - $ ifort -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec myprog.f mysubroutines.f -o myprog.x + $ icc -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec myprog.c mysubroutines.c -o myprog.x + $ ifort -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec myprog.f mysubroutines.f -o myprog.x In this example, we compile the program enabling interprocedural optimizations between source files (-ipo), aggresive loop optimizations @@ -32,8 +32,8 @@ The compiler recognizes the omp, simd, vector and ivdep pragmas for OpenMP parallelization and AVX2 vectorization. Enable the OpenMP parallelization by the **-openmp** compiler switch. - $ icc -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec -openmp myprog.c mysubroutines.c -o myprog.x - $ ifort -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec -openmp myprog.f mysubroutines.f -o myprog.x + $ icc -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec -openmp myprog.c mysubroutines.c -o myprog.x + $ ifort -ipo -O3 -xCORE-AVX2 -qopt-report1 -qopt-report-phase=vec -openmp myprog.f mysubroutines.f -o myprog.x Read more at <https://software.intel.com/en-us/intel-cplusplus-compiler-16.0-user-and-reference-guide> @@ -44,32 +44,32 @@ Sandy Bridge/Ivy Bridge/Haswell binary compatibility Anselm nodes are currently equipped with Sandy Bridge CPUs, while Salomon compute nodes are equipped with Haswell based architecture. The UV1 SMP compute server has Ivy Bridge CPUs, which are equivalent to -Sandy Bridge (only smaller manufacturing technology). <span>The new +Sandy Bridge (only smaller manufacturing technology). >The new processors are backward compatible with the Sandy Bridge nodes, so all programs that ran on the Sandy Bridge processors, should also run on the -new Haswell nodes. </span><span>To get optimal performance out of the +new Haswell nodes. >To get optimal performance out of the Haswell processors a program should make use of the -special </span><span>AVX2 instructions for this processor. 
One can do +special >AVX2 instructions for this processor. One can do this by recompiling codes with the compiler -flags </span><span>designated to invoke these instructions. For the +flags >designated to invoke these instructions. For the Intel compiler suite, there are two ways of -doing </span><span>this:</span> - -- <span>Using compiler flag (both for Fortran and C): <span - class="monospace">-xCORE-AVX2</span>. This will create a - binary </span><span class="s1">with AVX2 instructions, specifically - for the Haswell processors. Note that the - executable </span><span>will not run on Sandy Bridge/Ivy - Bridge nodes.</span> -- <span>Using compiler flags (both for Fortran and C): <span - class="monospace">-xAVX -axCORE-AVX2</span>. This - will </span><span>generate multiple, feature specific auto-dispatch - code paths for Intel® processors, if there is </span><span>a - performance benefit. So this binary will run both on Sandy - Bridge/Ivy Bridge and Haswell </span><span>processors. During - runtime it will be decided which path to follow, dependent on - which </span><span>processor you are running on. In general this - will result in larger binaries.</span> +doing >this: + +- >Using compiler flag (both for Fortran and C): + -xCORE-AVX2. This will create a + binary class="s1">with AVX2 instructions, specifically + for the Haswell processors. Note that the + executable >will not run on Sandy Bridge/Ivy + Bridge nodes. +- >Using compiler flags (both for Fortran and C): + -xAVX -axCORE-AVX2. This + will >generate multiple, feature specific auto-dispatch + code paths for Intel® processors, if there is >a + performance benefit. So this binary will run both on Sandy + Bridge/Ivy Bridge and Haswell >processors. During + runtime it will be decided which path to follow, dependent on + which >processor you are running on. In general this + will result in larger binaries. diff --git a/converted/docs.it4i.cz/salomon/software/intel-suite/intel-debugger.md b/converted/docs.it4i.cz/salomon/software/intel-suite/intel-debugger.md index 273d0931a7e150cb056b4445093f73e17660afb5..6ffaab545926b4d888555054de75ec6e0711e22c 100644 --- a/converted/docs.it4i.cz/salomon/software/intel-suite/intel-debugger.md +++ b/converted/docs.it4i.cz/salomon/software/intel-suite/intel-debugger.md @@ -3,7 +3,7 @@ Intel Debugger - + IDB is no longer available since Intel Parallel Studio 2015 @@ -17,13 +17,13 @@ environment. Use [X display](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html) for running the GUI. - $ module load intel/2014.06 - $ module load Java - $ idb + $ module load intel/2014.06 + $ module load Java + $ idb The debugger may run in text mode. To debug in text mode, use - $ idbc + $ idbc To debug on the compute nodes, module intel must be loaded. 
The GUI on compute nodes may be accessed using the same way as in [the @@ -32,14 +32,14 @@ section](../../../get-started-with-it4innovations/accessing-the-clusters/graphic Example: - $ qsub -q qexp -l select=1:ncpus=24 -X -I - qsub: waiting for job 19654.srv11 to start - qsub: job 19654.srv11 ready + $ qsub -q qexp -l select=1:ncpus=24 -X -I + qsub: waiting for job 19654.srv11 to start + qsub: job 19654.srv11 ready - $ module load intel - $ module load Java - $ icc -O0 -g myprog.c -o myprog.x - $ idb ./myprog.x + $ module load intel + $ module load Java + $ icc -O0 -g myprog.c -o myprog.x + $ idb ./myprog.x In this example, we allocate 1 full compute node, compile program myprog.c with debugging options -O0 -g and run the idb debugger @@ -59,12 +59,12 @@ rank in separate xterm terminal (do not forget the [X display](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html)). Using Intel MPI, this may be done in following way: - $ qsub -q qexp -l select=2:ncpus=24 -X -I - qsub: waiting for job 19654.srv11 to start - qsub: job 19655.srv11 ready + $ qsub -q qexp -l select=2:ncpus=24 -X -I + qsub: waiting for job 19654.srv11 to start + qsub: job 19655.srv11 ready - $ module load intel impi - $ mpirun -ppn 1 -hostfile $PBS_NODEFILE --enable-x xterm -e idbc ./mympiprog.x + $ module load intel impi + $ mpirun -ppn 1 -hostfile $PBS_NODEFILE --enable-x xterm -e idbc ./mympiprog.x In this example, we allocate 2 full compute node, run xterm on each node and start idb debugger in command line mode, debugging two ranks of @@ -78,12 +78,12 @@ the debugger to bind to all ranks and provide aggregated outputs across the ranks, pausing execution automatically just after startup. You may then set break points and step the execution manually. Using Intel MPI: - $ qsub -q qexp -l select=2:ncpus=24 -X -I - qsub: waiting for job 19654.srv11 to start - qsub: job 19655.srv11 ready + $ qsub -q qexp -l select=2:ncpus=24 -X -I + qsub: waiting for job 19654.srv11 to start + qsub: job 19655.srv11 ready - $ module load intel impi - $ mpirun -n 48 -idb ./mympiprog.x + $ module load intel impi + $ mpirun -n 48 -idb ./mympiprog.x ### Debugging multithreaded application diff --git a/converted/docs.it4i.cz/salomon/software/intel-suite/intel-inspector.md b/converted/docs.it4i.cz/salomon/software/intel-suite/intel-inspector.md index 3bb46a5401e36b9777229277828e7cfa5c6bebf0..dce7067681a702f079b72943dfdd01d21a3c78fa 100644 --- a/converted/docs.it4i.cz/salomon/software/intel-suite/intel-inspector.md +++ b/converted/docs.it4i.cz/salomon/software/intel-suite/intel-inspector.md @@ -11,10 +11,10 @@ Installed versions The following versions are currently available on Salomon as modules: - --------------- ------------------------- - **Version** **Module** - 2016 Update 1 Inspector/2016_update1 - --------------- ------------------------- +--------------- ------------------------- +Version** **Module** +2016 Update 1 Inspector/2016_update1 +--------------- ------------------------- Usage ----- @@ -29,7 +29,7 @@ line. To debug from GUI, launch Inspector: - $ inspxe-gui & + $ inspxe-gui & Then select menu File -> New -> Project. Choose a directory to save project data to. After clicking OK, Project properties window will @@ -44,7 +44,7 @@ analysis directly from command line. ### Batch mode Analysis can be also run from command line in batch mode. Batch mode -analysis is run with command <span class="monospace">inspxe-cl</span>. +analysis is run with command inspxe-cl. 
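For illustration only, a batch-mode memory-error analysis might be launched roughly like this; the analysis-type code and options below are assumptions and should be cross-checked against the documentation or the command line generated by the GUI (see the next paragraph):

    $ module add Inspector/2016_update1

    # 'mi1' (memory-error analysis) is an assumed analysis-type code - verify it in the docs or GUI
    $ inspxe-cl -collect mi1 -result-dir ./insp_result -- ./myapplication

    # print the detected problems from the stored result
    $ inspxe-cl -report problems -result-dir ./insp_result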
To obtain the required parameters, either consult the documentation or you can configure the analysis in the GUI and then click "Command Line" button in the lower right corner to the respective command line. @@ -55,11 +55,11 @@ selecting File -> Open -> Result... References ---------- -1. [Product - page](https://software.intel.com/en-us/intel-inspector-xe) -2. [Documentation and Release - Notes](https://software.intel.com/en-us/intel-inspector-xe-support/documentation) -3. [Tutorials](https://software.intel.com/en-us/articles/inspectorxe-tutorials) +1.[Product + page](https://software.intel.com/en-us/intel-inspector-xe) +2.[Documentation and Release + Notes](https://software.intel.com/en-us/intel-inspector-xe-support/documentation) +3.[Tutorials](https://software.intel.com/en-us/articles/inspectorxe-tutorials) diff --git a/converted/docs.it4i.cz/salomon/software/intel-suite/intel-integrated-performance-primitives.md b/converted/docs.it4i.cz/salomon/software/intel-suite/intel-integrated-performance-primitives.md index f3d26a63be5327ea7774d65733e04c7069a782bd..e0b1bf13ba4b0fa90837c56f4e401a3e095df99d 100644 --- a/converted/docs.it4i.cz/salomon/software/intel-suite/intel-integrated-performance-primitives.md +++ b/converted/docs.it4i.cz/salomon/software/intel-suite/intel-integrated-performance-primitives.md @@ -3,7 +3,7 @@ Intel IPP - + Intel Integrated Performance Primitives --------------------------------------- @@ -19,7 +19,7 @@ algebra functions and many more. Check out IPP before implementing own math functions for data processing, it is likely already there. - $ module load ipp + $ module load ipp The module sets up environment variables, required for linking and running ipp enabled applications. @@ -27,60 +27,60 @@ running ipp enabled applications. 
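If you want to see which variables the ipp module defines, you can
inspect it with the standard environment-modules command (shown here
only as a generic sketch, the exact output depends on the installed
module):

    $ module show ipp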
IPP example ----------- - #include "ipp.h" - #include <stdio.h> - int main(int argc, char* argv[]) - { - const IppLibraryVersion *lib; - Ipp64u fm; - IppStatus status; - - status= ippInit(); //IPP initialization with the best optimization layer - if( status != ippStsNoErr ) { - printf("IppInit() Error:n"); - printf("%sn", ippGetStatusString(status) ); - return -1; - } - - //Get version info - lib = ippiGetLibVersion(); - printf("%s %sn", lib->Name, lib->Version); - - //Get CPU features enabled with selected library level - fm=ippGetEnabledCpuFeatures(); - printf("SSE :%cn",(fm>>1)&1?'Y':'N'); - printf("SSE2 :%cn",(fm>>2)&1?'Y':'N'); - printf("SSE3 :%cn",(fm>>3)&1?'Y':'N'); - printf("SSSE3 :%cn",(fm>>4)&1?'Y':'N'); - printf("SSE41 :%cn",(fm>>6)&1?'Y':'N'); - printf("SSE42 :%cn",(fm>>7)&1?'Y':'N'); - printf("AVX :%cn",(fm>>8)&1 ?'Y':'N'); - printf("AVX2 :%cn", (fm>>15)&1 ?'Y':'N' ); - printf("----------n"); - printf("OS Enabled AVX :%cn", (fm>>9)&1 ?'Y':'N'); - printf("AES :%cn", (fm>>10)&1?'Y':'N'); - printf("CLMUL :%cn", (fm>>11)&1?'Y':'N'); - printf("RDRAND :%cn", (fm>>13)&1?'Y':'N'); - printf("F16C :%cn", (fm>>14)&1?'Y':'N'); - - return 0; - } + #include "ipp.h" + #include <stdio.h> + int main(int argc, char* argv[]) + { + const IppLibraryVersion *lib; + Ipp64u fm; + IppStatus status; + + status= ippInit(); //IPP initialization with the best optimization layer + if( status != ippStsNoErr ) { + printf("IppInit() Error:n"); + printf("%sn", ippGetStatusString(status) ); + return -1; + } + + //Get version info + lib = ippiGetLibVersion(); + printf("%s %sn", lib->Name, lib->Version); + + //Get CPU features enabled with selected library level + fm=ippGetEnabledCpuFeatures(); + printf("SSE :%cn",(fm>>1)&1?'Y':'N'); + printf("SSE2 :%cn",(fm>>2)&1?'Y':'N'); + printf("SSE3 :%cn",(fm>>3)&1?'Y':'N'); + printf("SSSE3 :%cn",(fm>>4)&1?'Y':'N'); + printf("SSE41 :%cn",(fm>>6)&1?'Y':'N'); + printf("SSE42 :%cn",(fm>>7)&1?'Y':'N'); + printf("AVX :%cn",(fm>>8)&1 ?'Y':'N'); + printf("AVX2 :%cn", (fm>>15)&1 ?'Y':'N' ); + printf("----------n"); + printf("OS Enabled AVX :%cn", (fm>>9)&1 ?'Y':'N'); + printf("AES :%cn", (fm>>10)&1?'Y':'N'); + printf("CLMUL :%cn", (fm>>11)&1?'Y':'N'); + printf("RDRAND :%cn", (fm>>13)&1?'Y':'N'); + printf("F16C :%cn", (fm>>14)&1?'Y':'N'); + + return 0; + }  Compile above example, using any compiler and the ipp module. - $ module load intel - $ module load ipp + $ module load intel + $ module load ipp - $ icc testipp.c -o testipp.x -lippi -lipps -lippcore + $ icc testipp.c -o testipp.x -lippi -lipps -lippcore You will need the ipp module loaded to run the ipp enabled executable. 
This may be avoided, by compiling library search paths into the executable - $ module load intel - $ module load ipp + $ module load intel + $ module load ipp - $ icc testipp.c -o testipp.x -Wl,-rpath=$LIBRARY_PATH -lippi -lipps -lippcore + $ icc testipp.c -o testipp.x -Wl,-rpath=$LIBRARY_PATH -lippi -lipps -lippcore Code samples and documentation ------------------------------ diff --git a/converted/docs.it4i.cz/salomon/software/intel-suite/intel-mkl.md b/converted/docs.it4i.cz/salomon/software/intel-suite/intel-mkl.md index e196a69fcd26239240be524c28bf4cce35cbc80d..69deedba6b0e9c44e08b576f45ba4a796a325ced 100644 --- a/converted/docs.it4i.cz/salomon/software/intel-suite/intel-mkl.md +++ b/converted/docs.it4i.cz/salomon/software/intel-suite/intel-mkl.md @@ -3,7 +3,7 @@ Intel MKL - + Intel Math Kernel Library @@ -13,59 +13,59 @@ Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, extensively threaded and optimized for maximum performance. Intel MKL provides these basic math kernels: -[]() -- <div id="d4841e18"> - +- <div id="d4841e18"> - []()BLAS (level 1, 2, and 3) and LAPACK linear algebra routines, - offering vector, vector-matrix, and matrix-matrix operations. -- <div id="d4841e21"> + - + BLAS (level 1, 2, and 3) and LAPACK linear algebra routines, + offering vector, vector-matrix, and matrix-matrix operations. +- <div id="d4841e21"> + + - []()The PARDISO direct sparse solver, an iterative sparse solver, - and supporting sparse BLAS (level 1, 2, and 3) routines for solving - sparse systems of equations. -- <div id="d4841e24"> + The PARDISO direct sparse solver, an iterative sparse solver, + and supporting sparse BLAS (level 1, 2, and 3) routines for solving + sparse systems of equations. +- <div id="d4841e24"> - + - []()ScaLAPACK distributed processing linear algebra routines for - Linux* and Windows* operating systems, as well as the Basic Linear - Algebra Communications Subprograms (BLACS) and the Parallel Basic - Linear Algebra Subprograms (PBLAS). -- <div id="d4841e27"> + ScaLAPACK distributed processing linear algebra routines for + Linux* and Windows* operating systems, as well as the Basic Linear + Algebra Communications Subprograms (BLACS) and the Parallel Basic + Linear Algebra Subprograms (PBLAS). +- <div id="d4841e27"> - + - []()Fast Fourier transform (FFT) functions in one, two, or three - dimensions with support for mixed radices (not limited to sizes that - are powers of 2), as well as distributed versions of - these functions. -- <div id="d4841e30"> + Fast Fourier transform (FFT) functions in one, two, or three + dimensions with support for mixed radices (not limited to sizes that + are powers of 2), as well as distributed versions of + these functions. +- <div id="d4841e30"> - + - []()Vector Math Library (VML) routines for optimized mathematical - operations on vectors. -- <div id="d4841e34"> + Vector Math Library (VML) routines for optimized mathematical + operations on vectors. +- <div id="d4841e34"> - + - []()Vector Statistical Library (VSL) routines, which offer - high-performance vectorized random number generators (RNG) for - several probability distributions, convolution and correlation - routines, and summary statistics functions. -- <div id="d4841e37"> + Vector Statistical Library (VSL) routines, which offer + high-performance vectorized random number generators (RNG) for + several probability distributions, convolution and correlation + routines, and summary statistics functions. 
+- <div id="d4841e37"> - + - []()Data Fitting Library, which provides capabilities for - spline-based approximation of functions, derivatives and integrals - of functions, and search. -- Extended Eigensolver, a shared memory version of an eigensolver - based on the Feast Eigenvalue Solver. + Data Fitting Library, which provides capabilities for + spline-based approximation of functions, derivatives and integrals + of functions, and search. +- Extended Eigensolver, a shared memory version of an eigensolver + based on the Feast Eigenvalue Solver. @@ -74,7 +74,7 @@ Manual](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/ Intel MKL version 11.2.3.187 is available on the cluster - $ module load imkl + $ module load imkl The module sets up environment variables, required for linking and running mkl enabled applications. The most important variables are the @@ -91,10 +91,10 @@ integer type (necessary for indexing large arrays, with more than 2^31^-1 elements), whereas the LP64 libraries index arrays with the 32-bit integer type. - Interface Integer type - ----------- ----------------------------------------------- - LP64 32-bit, int, integer(kind=4), MPI_INT - ILP64 64-bit, long int, integer(kind=8), MPI_INT64 +Interface Integer type +----------- ----------------------------------------------- +LP64 32-bit, int, integer(kind=4), MPI_INT +ILP64 64-bit, long int, integer(kind=8), MPI_INT64 ### Linking @@ -106,7 +106,7 @@ You will need the mkl module loaded to run the mkl enabled executable. This may be avoided, by compiling library search paths into the executable. Include rpath on the compile line: - $ icc .... -Wl,-rpath=$LIBRARY_PATH ... + $ icc .... -Wl,-rpath=$LIBRARY_PATH ... ### Threading @@ -118,13 +118,13 @@ For this to work, the application must link the threaded MKL library OpenMP environment variables, such as OMP_NUM_THREADS and KMP_AFFINITY. MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS - $ export OMP_NUM_THREADS=24 - $ export KMP_AFFINITY=granularity=fine,compact,1,0 + $ export OMP_NUM_THREADS=24 + $ export KMP_AFFINITY=granularity=fine,compact,1,0 The application will run with 24 threads with affinity optimized for fine grain parallelization. -[]()Examples +Examples ------------ Number of examples, demonstrating use of the Intel MKL library and its @@ -134,47 +134,47 @@ compiled program for multi-threaded matrix multiplication. ### Working with examples - $ module load intel - $ module load imkl - $ cp -a $MKL_EXAMPLES/cblas /tmp/ - $ cd /tmp/cblas + $ module load intel + $ module load imkl + $ cp -a $MKL_EXAMPLES/cblas /tmp/ + $ cd /tmp/cblas - $ make sointel64 function=cblas_dgemm + $ make sointel64 function=cblas_dgemm In this example, we compile, link and run the cblas_dgemm example, demonstrating use of MKL example suite installed on clusters. ### Example: MKL and Intel compiler - $ module load intel - $ module load imkl - $ cp -a $MKL_EXAMPLES/cblas /tmp/ - $ cd /tmp/cblas - $ - $ icc -w source/cblas_dgemmx.c source/common_func.c -mkl -o cblas_dgemmx.x - $ ./cblas_dgemmx.x data/cblas_dgemmx.d + $ module load intel + $ module load imkl + $ cp -a $MKL_EXAMPLES/cblas /tmp/ + $ cd /tmp/cblas + $ + $ icc -w source/cblas_dgemmx.c source/common_func.c -mkl -o cblas_dgemmx.x + $ ./cblas_dgemmx.x data/cblas_dgemmx.d In this example, we compile, link and run the cblas_dgemm example, demonstrating use of MKL with icc -mkl option. 
Using the -mkl option is equivalent to: - $ icc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x - -I$MKL_INC_DIR -L$MKL_LIB_DIR -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 + $ icc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x + -I$MKL_INC_DIR -L$MKL_LIB_DIR -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 In this example, we compile and link the cblas_dgemm example, using LP64 interface to threaded MKL and Intel OMP threads implementation. ### Example: Intel MKL and GNU compiler - $ module load GCC - $ module load imkl - $ cp -a $MKL_EXAMPLES/cblas /tmp/ - $ cd /tmp/cblas - - $ gcc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x - -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lm + $ module load GCC + $ module load imkl + $ cp -a $MKL_EXAMPLES/cblas /tmp/ + $ cd /tmp/cblas + + $ gcc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x + -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lm - $ ./cblas_dgemmx.x data/cblas_dgemmx.d + $ ./cblas_dgemmx.x data/cblas_dgemmx.d In this example, we compile, link and run the cblas_dgemm example, using LP64 interface to threaded MKL and gnu OMP threads implementation. @@ -191,8 +191,8 @@ LAPACKE C Interface MKL includes LAPACKE C Interface to LAPACK. For some reason, although Intel is the author of LAPACKE, the LAPACKE header files are not present -in MKL. For this reason, we have prepared <span -class="monospace">LAPACKE</span> module, which includes Intel's LAPACKE +in MKL. For this reason, we have prepared +LAPACKE module, which includes Intel's LAPACKE headers from official LAPACK, which you can use to compile code using LAPACKE interface against MKL. diff --git a/converted/docs.it4i.cz/salomon/software/intel-suite/intel-parallel-studio-introduction.md b/converted/docs.it4i.cz/salomon/software/intel-suite/intel-parallel-studio-introduction.md index 91639197a31a97e7f8ca3ef7fc4a3b8d2fc023e1..adeb405a8af2bc8357f6b2f87fe32faa2bab4717 100644 --- a/converted/docs.it4i.cz/salomon/software/intel-suite/intel-parallel-studio-introduction.md +++ b/converted/docs.it4i.cz/salomon/software/intel-suite/intel-parallel-studio-introduction.md @@ -3,21 +3,21 @@ Intel Parallel Studio - + The Salomon cluster provides following elements of the Intel Parallel Studio XE - Intel Parallel Studio XE - ------------------------------------------------- - Intel Compilers - Intel Debugger - Intel MKL Library - Intel Integrated Performance Primitives Library - Intel Threading Building Blocks Library - Intel Trace Analyzer and Collector - Intel Advisor - Intel Inspector +Intel Parallel Studio XE +------------------------------------------------- +Intel Compilers +Intel Debugger +Intel MKL Library +Intel Integrated Performance Primitives Library +Intel Threading Building Blocks Library +Intel Trace Analyzer and Collector +Intel Advisor +Intel Inspector Intel compilers --------------- @@ -26,9 +26,9 @@ The Intel compilers version 131.3 are available, via module iccifort/2013.5.192-GCC-4.8.3. The compilers include the icc C and C++ compiler and the ifort fortran 77/90/95 compiler. - $ module load intel - $ icc -v - $ ifort -v + $ module load intel + $ icc -v + $ ifort -v Read more at the [Intel Compilers](intel-compilers.html) page. @@ -45,8 +45,8 @@ environment. Use [X display](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html) for running the GUI. 
- $ module load intel - $ idb + $ module load intel + $ idb Read more at the [Intel Debugger](intel-debugger.html) page. @@ -60,7 +60,7 @@ Intel MKL unites and provides these basic components: BLAS, LAPACK, ScaLapack, PARDISO, FFT, VML, VSL, Data fitting, Feast Eigensolver and many more. - $ module load imkl + $ module load imkl Read more at the [Intel MKL](intel-mkl.html) page. @@ -74,7 +74,7 @@ includes signal, image and frame processing algorithms, such as FFT, FIR, Convolution, Optical Flow, Hough transform, Sum, MinMax and many more. - $ module load ipp + $ module load ipp Read more at the [Intel IPP](intel-integrated-performance-primitives.html) page. @@ -91,7 +91,7 @@ smaller parallel components. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner. - $ module load tbb + $ module load tbb Read more at the [Intel TBB](intel-tbb.html) page. diff --git a/converted/docs.it4i.cz/salomon/software/intel-suite/intel-tbb.md b/converted/docs.it4i.cz/salomon/software/intel-suite/intel-tbb.md index 5669a5f84e53b0fb3528c5441ce8939adabbc44d..bdc4e31e726b15ae2d8bbf2df5edbbec45a21744 100644 --- a/converted/docs.it4i.cz/salomon/software/intel-suite/intel-tbb.md +++ b/converted/docs.it4i.cz/salomon/software/intel-suite/intel-tbb.md @@ -3,7 +3,7 @@ Intel TBB - + Intel Threading Building Blocks ------------------------------- @@ -18,7 +18,7 @@ accelerator](../intel-xeon-phi.html). Intel TBB version 4.3.5.187 is available on the cluster. - $ module load tbb + $ module load tbb The module sets up environment variables, required for linking and running tbb enabled applications. @@ -31,12 +31,12 @@ Examples Number of examples, demonstrating use of TBB and its built-in scheduler is available on Anselm, in the $TBB_EXAMPLES directory. - $ module load intel - $ module load tbb - $ cp -a $TBB_EXAMPLES/common $TBB_EXAMPLES/parallel_reduce /tmp/ - $ cd /tmp/parallel_reduce/primes - $ icc -O2 -DNDEBUG -o primes.x main.cpp primes.cpp -ltbb - $ ./primes.x + $ module load intel + $ module load tbb + $ cp -a $TBB_EXAMPLES/common $TBB_EXAMPLES/parallel_reduce /tmp/ + $ cd /tmp/parallel_reduce/primes + $ icc -O2 -DNDEBUG -o primes.x main.cpp primes.cpp -ltbb + $ ./primes.x In this example, we compile, link and run the primes example, demonstrating use of parallel task-based reduce in computation of prime @@ -46,7 +46,7 @@ You will need the tbb module loaded to run the tbb enabled executable. This may be avoided, by compiling library search paths into the executable. - $ icc -O2 -o primes.x main.cpp primes.cpp -Wl,-rpath=$LIBRARY_PATH -ltbb + $ icc -O2 -o primes.x main.cpp primes.cpp -Wl,-rpath=$LIBRARY_PATH -ltbb Further reading --------------- diff --git a/converted/docs.it4i.cz/salomon/software/intel-suite/intel-trace-analyzer-and-collector.md b/converted/docs.it4i.cz/salomon/software/intel-suite/intel-trace-analyzer-and-collector.md index 59703656e7a420b13bd8b3b38165623371fa814f..9fbc281c05262b9fc7af4d66f8141bc5579a58d4 100644 --- a/converted/docs.it4i.cz/salomon/software/intel-suite/intel-trace-analyzer-and-collector.md +++ b/converted/docs.it4i.cz/salomon/software/intel-suite/intel-trace-analyzer-and-collector.md @@ -14,8 +14,8 @@ view it. 
Installed version ----------------- -Currently on Salomon is version 9.1.2.024 available as module <span -class="monospace">itac/9.1.2.024</span> +Currently on Salomon is version 9.1.2.024 available as module +itac/9.1.2.024 Collecting traces ----------------- @@ -23,8 +23,8 @@ Collecting traces ITAC can collect traces from applications that are using Intel MPI. To generate a trace, simply add -trace option to your mpirun command : - $ module load itac/9.1.2.024 - $ mpirun -trace myapp + $ module load itac/9.1.2.024 + $ mpirun -trace myapp The trace will be saved in file myapp.stf in the current directory. @@ -35,8 +35,8 @@ To view and analyze the trace, open ITAC GUI in a [graphical environment](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html) : - $ module load itac/9.1.2.024 - $ traceanalyzer + $ module load itac/9.1.2.024 + $ traceanalyzer The GUI will launch and you can open the produced *.stf file. @@ -48,10 +48,10 @@ Please refer to Intel documenation about usage of the GUI tool. References ---------- -1. [Getting Started with Intel® Trace Analyzer and - Collector](https://software.intel.com/en-us/get-started-with-itac-for-linux) -2. [Intel® Trace Analyzer and Collector - - Documentation](http://Intel®%20Trace%20Analyzer%20and%20Collector%20-%20Documentation) +1.[Getting Started with Intel® Trace Analyzer and + Collector](https://software.intel.com/en-us/get-started-with-itac-for-linux) +2.[Intel® Trace Analyzer and Collector - + Documentation](http://Intel®%20Trace%20Analyzer%20and%20Collector%20-%20Documentation) diff --git a/converted/docs.it4i.cz/salomon/software/intel-xeon-phi.md b/converted/docs.it4i.cz/salomon/software/intel-xeon-phi.md index 57b85d4b6c6c6cbca3895787e8dba6798b4727d0..dcdaa2091784a97cd88522d2c02dee6d63546032 100644 --- a/converted/docs.it4i.cz/salomon/software/intel-xeon-phi.md +++ b/converted/docs.it4i.cz/salomon/software/intel-xeon-phi.md @@ -4,7 +4,7 @@ Intel Xeon Phi A guide to Intel Xeon Phi usage - + Intel Xeon Phi accelerator can be programmed in several modes. The default mode on the cluster is offload mode, but all modes described in @@ -16,131 +16,131 @@ Intel Utilities for Xeon Phi To get access to a compute node with Intel Xeon Phi accelerator, use the PBS interactive session - $ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 + $ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 To set up the environment module "intel" has to be loaded, without specifying the version, default version is loaded (at time of writing this, it's 2015b) - $ module load intel + $ module load intel Information about the hardware can be obtained by running the micinfo program on the host. - $ /usr/bin/micinfo + $ /usr/bin/micinfo The output of the "micinfo" utility executed on one of the cluster node is as follows. 
(note: to get PCIe related details the command has to be run with root privileges) - MicInfo Utility Log - Created Mon Aug 17 13:55:59 2015 - - - System Info - HOST OS : Linux - OS Version : 2.6.32-504.16.2.el6.x86_64 - Driver Version : 3.4.1-1 - MPSS Version : 3.4.1 - Host Physical Memory : 131930 MB - - Device No: 0, Device Name: mic0 - - Version - Flash Version : 2.1.02.0390 - SMC Firmware Version : 1.16.5078 - SMC Boot Loader Version : 1.8.4326 - uOS Version : 2.6.38.8+mpss3.4.1 - Device Serial Number : ADKC44601414 - - Board - Vendor ID : 0x8086 - Device ID : 0x225c - Subsystem ID : 0x7d95 - Coprocessor Stepping ID : 2 - PCIe Width : x16 - PCIe Speed : 5 GT/s - PCIe Max payload size : 256 bytes - PCIe Max read req size : 512 bytes - Coprocessor Model : 0x01 - Coprocessor Model Ext : 0x00 - Coprocessor Type : 0x00 - Coprocessor Family : 0x0b - Coprocessor Family Ext : 0x00 - Coprocessor Stepping : C0 - Board SKU : C0PRQ-7120 P/A/X/D - ECC Mode : Enabled - SMC HW Revision : Product 300W Passive CS - - Cores - Total No of Active Cores : 61 - Voltage : 1007000 uV - Frequency : 1238095 kHz - - Thermal - Fan Speed Control : N/A - Fan RPM : N/A - Fan PWM : N/A - Die Temp : 60 C - - GDDR - GDDR Vendor : Samsung - GDDR Version : 0x6 - GDDR Density : 4096 Mb - GDDR Size : 15872 MB - GDDR Technology : GDDR5 - GDDR Speed : 5.500000 GT/s - GDDR Frequency : 2750000 kHz - GDDR Voltage : 1501000 uV - - Device No: 1, Device Name: mic1 - - Version - Flash Version : 2.1.02.0390 - SMC Firmware Version : 1.16.5078 - SMC Boot Loader Version : 1.8.4326 - uOS Version : 2.6.38.8+mpss3.4.1 - Device Serial Number : ADKC44500454 - - Board - Vendor ID : 0x8086 - Device ID : 0x225c - Subsystem ID : 0x7d95 - Coprocessor Stepping ID : 2 - PCIe Width : x16 - PCIe Speed : 5 GT/s - PCIe Max payload size : 256 bytes - PCIe Max read req size : 512 bytes - Coprocessor Model : 0x01 - Coprocessor Model Ext : 0x00 - Coprocessor Type : 0x00 - Coprocessor Family : 0x0b - Coprocessor Family Ext : 0x00 - Coprocessor Stepping : C0 - Board SKU : C0PRQ-7120 P/A/X/D - ECC Mode : Enabled - SMC HW Revision : Product 300W Passive CS - - Cores - Total No of Active Cores : 61 - Voltage : 998000 uV - Frequency : 1238095 kHz - - Thermal - Fan Speed Control : N/A - Fan RPM : N/A - Fan PWM : N/A - Die Temp : 59 C - - GDDR - GDDR Vendor : Samsung - GDDR Version : 0x6 - GDDR Density : 4096 Mb - GDDR Size : 15872 MB - GDDR Technology : GDDR5 - GDDR Speed : 5.500000 GT/s - GDDR Frequency : 2750000 kHz - GDDR Voltage : 1501000 uV + MicInfo Utility Log + Created Mon Aug 17 13:55:59 2015 + + + System Info + HOST OS : Linux + OS Version : 2.6.32-504.16.2.el6.x86_64 + Driver Version : 3.4.1-1 + MPSS Version : 3.4.1 + Host Physical Memory : 131930 MB + + Device No: 0, Device Name: mic0 + + Version + Flash Version : 2.1.02.0390 + SMC Firmware Version : 1.16.5078 + SMC Boot Loader Version : 1.8.4326 + uOS Version : 2.6.38.8+mpss3.4.1 + Device Serial Number : ADKC44601414 + + Board + Vendor ID : 0x8086 + Device ID : 0x225c + Subsystem ID : 0x7d95 + Coprocessor Stepping ID : 2 + PCIe Width : x16 + PCIe Speed : 5 GT/s + PCIe Max payload size : 256 bytes + PCIe Max read req size : 512 bytes + Coprocessor Model : 0x01 + Coprocessor Model Ext : 0x00 + Coprocessor Type : 0x00 + Coprocessor Family : 0x0b + Coprocessor Family Ext : 0x00 + Coprocessor Stepping : C0 + Board SKU : C0PRQ-7120 P/A/X/D + ECC Mode : Enabled + SMC HW Revision : Product 300W Passive CS + + Cores + Total No of Active Cores : 61 + Voltage : 1007000 uV + Frequency : 1238095 kHz + + Thermal 
+ Fan Speed Control : N/A + Fan RPM : N/A + Fan PWM : N/A + Die Temp : 60 C + + GDDR + GDDR Vendor : Samsung + GDDR Version : 0x6 + GDDR Density : 4096 Mb + GDDR Size : 15872 MB + GDDR Technology : GDDR5 + GDDR Speed : 5.500000 GT/s + GDDR Frequency : 2750000 kHz + GDDR Voltage : 1501000 uV + + Device No: 1, Device Name: mic1 + + Version + Flash Version : 2.1.02.0390 + SMC Firmware Version : 1.16.5078 + SMC Boot Loader Version : 1.8.4326 + uOS Version : 2.6.38.8+mpss3.4.1 + Device Serial Number : ADKC44500454 + + Board + Vendor ID : 0x8086 + Device ID : 0x225c + Subsystem ID : 0x7d95 + Coprocessor Stepping ID : 2 + PCIe Width : x16 + PCIe Speed : 5 GT/s + PCIe Max payload size : 256 bytes + PCIe Max read req size : 512 bytes + Coprocessor Model : 0x01 + Coprocessor Model Ext : 0x00 + Coprocessor Type : 0x00 + Coprocessor Family : 0x0b + Coprocessor Family Ext : 0x00 + Coprocessor Stepping : C0 + Board SKU : C0PRQ-7120 P/A/X/D + ECC Mode : Enabled + SMC HW Revision : Product 300W Passive CS + + Cores + Total No of Active Cores : 61 + Voltage : 998000 uV + Frequency : 1238095 kHz + + Thermal + Fan Speed Control : N/A + Fan RPM : N/A + Fan PWM : N/A + Die Temp : 59 C + + GDDR + GDDR Vendor : Samsung + GDDR Version : 0x6 + GDDR Density : 4096 Mb + GDDR Size : 15872 MB + GDDR Technology : GDDR5 + GDDR Speed : 5.500000 GT/s + GDDR Frequency : 2750000 kHz + GDDR Voltage : 1501000 uV Offload Mode ------------ @@ -149,44 +149,44 @@ To compile a code for Intel Xeon Phi a MPSS stack has to be installed on the machine where compilation is executed. Currently the MPSS stack is only installed on compute nodes equipped with accelerators. - $ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 - $ module load intel + $ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 + $ module load intel For debugging purposes it is also recommended to set environment variable "OFFLOAD_REPORT". Value can be set from 0 to 3, where higher number means more debugging information. - export OFFLOAD_REPORT=3 + export OFFLOAD_REPORT=3 A very basic example of code that employs offload programming technique is shown in the next listing. Please note that this code is sequential and utilizes only single core of the accelerator. - $ vim source-offload.cpp + $ vim source-offload.cpp - #include <iostream> + #include <iostream> - int main(int argc, char* argv[]) - { -    const int niter = 100000; -    double result = 0; + int main(int argc, char* argv[]) + { +    const int niter = 100000; +    double result = 0; -  #pragma offload target(mic) -    for (int i = 0; i < niter; ++i) { -        const double t = (i + 0.5) / niter; -        result += 4.0 / (t * t + 1.0); -    } -    result /= niter; -    std::cout << "Pi ~ " << result << 'n'; - } +  #pragma offload target(mic) +    for (int i = 0; i < niter; ++i) { +        const double t = (i + 0.5) / niter; +        result += 4.0 / (t * t + 1.0); +    } +    result /= niter; +    std::cout << "Pi ~ " << result << 'n'; + } To compile a code using Intel compiler run - $ icc source-offload.cpp -o bin-offload + $ icc source-offload.cpp -o bin-offload To execute the code, run the following command on the host - ./bin-offload + ./bin-offload ### Parallelization in Offload Mode Using OpenMP @@ -194,91 +194,91 @@ One way of paralelization a code for Xeon Phi is using OpenMP directives. The following example shows code for parallel vector addition. 
- $ vim ./vect-add + $ vim ./vect-add - #include <stdio.h> + #include <stdio.h> - typedef int T; + typedef int T; - #define SIZE 1000 + #define SIZE 1000 - #pragma offload_attribute(push, target(mic)) - T in1[SIZE]; - T in2[SIZE]; - T res[SIZE]; - #pragma offload_attribute(pop) + #pragma offload_attribute(push, target(mic)) + T in1[SIZE]; + T in2[SIZE]; + T res[SIZE]; + #pragma offload_attribute(pop) - // MIC function to add two vectors - __attribute__((target(mic))) add_mic(T *a, T *b, T *c, int size) { -  int i = 0; -  #pragma omp parallel for -    for (i = 0; i < size; i++) -      c[i] = a[i] + b[i]; - } + // MIC function to add two vectors + __attribute__((target(mic))) add_mic(T *a, T *b, T *c, int size) { +  int i = 0; +  #pragma omp parallel for +    for (i = 0; i < size; i++) +      c[i] = a[i] + b[i]; + } - // CPU function to add two vectors - void add_cpu (T *a, T *b, T *c, int size) { -  int i; -  for (i = 0; i < size; i++) -    c[i] = a[i] + b[i]; - } + // CPU function to add two vectors + void add_cpu (T *a, T *b, T *c, int size) { +  int i; +  for (i = 0; i < size; i++) +    c[i] = a[i] + b[i]; + } - // CPU function to generate a vector of random numbers - void random_T (T *a, int size) { -  int i; -  for (i = 0; i < size; i++) -    a[i] = rand() % 10000; // random number between 0 and 9999 - } + // CPU function to generate a vector of random numbers + void random_T (T *a, int size) { +  int i; +  for (i = 0; i < size; i++) +    a[i] = rand() % 10000; // random number between 0 and 9999 + } - // CPU function to compare two vectors - int compare(T *a, T *b, T size ){ -  int pass = 0; -  int i; -  for (i = 0; i < size; i++){ -    if (a[i] != b[i]) { -      printf("Value mismatch at location %d, values %d and %dn",i, a[i], b[i]); -      pass = 1; -    } -  } -  if (pass == 0) printf ("Test passedn"); else printf ("Test Failedn"); -  return pass; - } + // CPU function to compare two vectors + int compare(T *a, T *b, T size ){ +  int pass = 0; +  int i; +  for (i = 0; i < size; i++){ +    if (a[i] != b[i]) { +      printf("Value mismatch at location %d, values %d and %dn",i, a[i], b[i]); +      pass = 1; +    } +  } +  if (pass == 0) printf ("Test passedn"); else printf ("Test Failedn"); +  return pass; + } - int main() - { -  int i; -  random_T(in1, SIZE); -  random_T(in2, SIZE); + int main() + { +  int i; +  random_T(in1, SIZE); +  random_T(in2, SIZE); -  #pragma offload target(mic) in(in1,in2) inout(res) -  { +  #pragma offload target(mic) in(in1,in2) inout(res) +  { -    // Parallel loop from main function -    #pragma omp parallel for -    for (i=0; i<SIZE; i++) -      res[i] = in1[i] + in2[i]; +    // Parallel loop from main function +    #pragma omp parallel for +    for (i=0; i<SIZE; i++) +      res[i] = in1[i] + in2[i]; -    // or parallel loop is called inside the function -    add_mic(in1, in2, res, SIZE); +    // or parallel loop is called inside the function +    add_mic(in1, in2, res, SIZE); -  } +  } -  //Check the results with CPU implementation -  T res_cpu[SIZE]; -  add_cpu(in1, in2, res_cpu, SIZE); -  compare(res, res_cpu, SIZE); +  //Check the results with CPU implementation +  T res_cpu[SIZE]; +  add_cpu(in1, in2, res_cpu, SIZE); +  compare(res, res_cpu, SIZE); - } + } During the compilation Intel compiler shows which loops have been vectorized in both host and accelerator. This can be enabled with compiler option "-vec-report2". 
To compile and execute the code, run

-    $ icc vect-add.c -openmp_report2 -vec-report2 -o vect-add

-    $ ./vect-add
+    $ icc vect-add.c -openmp_report2 -vec-report2 -o vect-add
+
+    $ ./vect-add

Some interesting compiler flags useful not only for code debugging are:
@@ -302,42 +302,42 @@ transparently.

Behaviour of automatic offload mode is controlled by functions called
within the program or by environmental variables. Complete list of
-controls is listed [<span
-class="external-link">here</span>](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_userguide_lnx/GUID-3DC4FC7D-A1E4-423D-9C0C-06AB265FFA86.htm).
+controls is listed
+[here](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_userguide_lnx/GUID-3DC4FC7D-A1E4-423D-9C0C-06AB265FFA86.htm).

The Automatic Offload may be enabled by either an MKL function call
within the code:

-    mkl_mic_enable();
+    mkl_mic_enable();

or by setting an environment variable

-    $ export MKL_MIC_ENABLE=1
+    $ export MKL_MIC_ENABLE=1

To get more information about automatic offload please refer to "[Using
Intel® MKL Automatic Offload on Intel ® Xeon Phi™
Coprocessors](http://software.intel.com/sites/default/files/11MIC42_How_to_Use_MKL_Automatic_Offload_0.pdf)"
-white paper or [<span class="external-link">Intel MKL
-documentation</span>](https://software.intel.com/en-us/articles/intel-math-kernel-library-documentation).
+white paper or [Intel MKL
+documentation](https://software.intel.com/en-us/articles/intel-math-kernel-library-documentation).

### Automatic offload example #1

The following example shows how to automatically offload an SGEMM (single
-precision - g<span dir="auto">eneral matrix multiply</span>) function to
+precision - general matrix multiply) function to the
MIC coprocessor. At first get an interactive PBS session on a node with
MIC accelerator and load "intel" module that automatically loads "mkl"
module as well.

-    $ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0
-    $ module load intel
+    $ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0
+    $ module load intel

The code can be copied to a file and compiled without any necessary
modification.

-    $ vim sgemm-ao-short.c
+    $ vim sgemm-ao-short.c

-```
+```
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
@@ -394,47 +394,47 @@ int main(int argc, char **argv)

Please note: This example is a simplified version of an example from MKL.
The expanded version can be found here:
-**$MKL_EXAMPLES/mic_ao/blasc/source/sgemm.c**
+**$MKL_EXAMPLES/mic_ao/blasc/source/sgemm.c**

To compile the code using the Intel compiler, use:

-    $ icc -mkl sgemm-ao-short.c -o sgemm
+    $ icc -mkl sgemm-ao-short.c -o sgemm

For debugging purposes enable the offload report to see more
information about automatic offloading.
- $ export OFFLOAD_REPORT=2 + $ export OFFLOAD_REPORT=2 The output of a code should look similar to following listing, where lines starting with [MKL] are generated by offload reporting: - [user@r31u03n799 ~]$ ./sgemm - Computing SGEMM on the host - Enabling Automatic Offload - Automatic Offload enabled: 2 MIC devices present - Computing SGEMM with automatic workdivision - [MKL] [MIC --] [AO Function]   SGEMM - [MKL] [MIC --] [AO SGEMM Workdivision]   0.44 0.28 0.28 - [MKL] [MIC 00] [AO SGEMM CPU Time]   0.252427 seconds - [MKL] [MIC 00] [AO SGEMM MIC Time]   0.091001 seconds - [MKL] [MIC 00] [AO SGEMM CPU->MIC Data]   34078720 bytes - [MKL] [MIC 00] [AO SGEMM MIC->CPU Data]   7864320 bytes - [MKL] [MIC 01] [AO SGEMM CPU Time]   0.252427 seconds - [MKL] [MIC 01] [AO SGEMM MIC Time]   0.094758 seconds - [MKL] [MIC 01] [AO SGEMM CPU->MIC Data]   34078720 bytes - [MKL] [MIC 01] [AO SGEMM MIC->CPU Data]   7864320 bytes - Done + [user@r31u03n799 ~]$ ./sgemm + Computing SGEMM on the host + Enabling Automatic Offload + Automatic Offload enabled: 2 MIC devices present + Computing SGEMM with automatic workdivision + [MKL] [MIC --] [AO Function]   SGEMM + [MKL] [MIC --] [AO SGEMM Workdivision]   0.44 0.28 0.28 + [MKL] [MIC 00] [AO SGEMM CPU Time]   0.252427 seconds + [MKL] [MIC 00] [AO SGEMM MIC Time]   0.091001 seconds + [MKL] [MIC 00] [AO SGEMM CPU->MIC Data]   34078720 bytes + [MKL] [MIC 00] [AO SGEMM MIC->CPU Data]   7864320 bytes + [MKL] [MIC 01] [AO SGEMM CPU Time]   0.252427 seconds + [MKL] [MIC 01] [AO SGEMM MIC Time]   0.094758 seconds + [MKL] [MIC 01] [AO SGEMM CPU->MIC Data]   34078720 bytes + [MKL] [MIC 01] [AO SGEMM MIC->CPU Data]   7864320 bytes + Done Behavioral of automatic offload mode is controlled by functions called within the program or by environmental variables. Complete list of -controls is listed [<span -class="external-link">here</span>](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_userguide_lnx/GUID-3DC4FC7D-A1E4-423D-9C0C-06AB265FFA86.htm). +controls is listed [ +class="external-link">here](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_userguide_lnx/GUID-3DC4FC7D-A1E4-423D-9C0C-06AB265FFA86.htm). To get more information about automatic offload please refer to "[Using Intel® MKL Automatic Offload on Intel ® Xeon Phi™ Coprocessors](http://software.intel.com/sites/default/files/11MIC42_How_to_Use_MKL_Automatic_Offload_0.pdf)" -white paper or [<span class="external-link">Intel MKL -documentation</span>](https://software.intel.com/en-us/articles/intel-math-kernel-library-documentation). +white paper or [ class="external-link">Intel MKL +documentation](https://software.intel.com/en-us/articles/intel-math-kernel-library-documentation). ### Automatic offload example #2 @@ -444,30 +444,30 @@ offloaded. At first get an interactive PBS session on a node with MIC accelerator. - $ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 + $ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 Once in, we enable the offload and run the Octave software. In octave, we generate two large random matrices and let them multiply together. 
- $ export MKL_MIC_ENABLE=1 - $ export OFFLOAD_REPORT=2 - $ module load Octave/3.8.2-intel-2015b - - $ octave -q - octave:1> A=rand(10000); - octave:2> B=rand(10000); - octave:3> C=A*B; - [MKL] [MIC --] [AO Function]   DGEMM - [MKL] [MIC --] [AO DGEMM Workdivision]   0.14 0.43 0.43 - [MKL] [MIC 00] [AO DGEMM CPU Time]   3.814714 seconds - [MKL] [MIC 00] [AO DGEMM MIC Time]   2.781595 seconds - [MKL] [MIC 00] [AO DGEMM CPU->MIC Data]   1145600000 bytes - [MKL] [MIC 00] [AO DGEMM MIC->CPU Data]   1382400000 bytes - [MKL] [MIC 01] [AO DGEMM CPU Time]   3.814714 seconds - [MKL] [MIC 01] [AO DGEMM MIC Time]   2.843016 seconds - [MKL] [MIC 01] [AO DGEMM CPU->MIC Data]   1145600000 bytes - [MKL] [MIC 01] [AO DGEMM MIC->CPU Data]   1382400000 bytes - octave:4> exit + $ export MKL_MIC_ENABLE=1 + $ export OFFLOAD_REPORT=2 + $ module load Octave/3.8.2-intel-2015b + + $ octave -q + octave:1> A=rand(10000); + octave:2> B=rand(10000); + octave:3> C=A*B; + [MKL] [MIC --] [AO Function]   DGEMM + [MKL] [MIC --] [AO DGEMM Workdivision]   0.14 0.43 0.43 + [MKL] [MIC 00] [AO DGEMM CPU Time]   3.814714 seconds + [MKL] [MIC 00] [AO DGEMM MIC Time]   2.781595 seconds + [MKL] [MIC 00] [AO DGEMM CPU->MIC Data]   1145600000 bytes + [MKL] [MIC 00] [AO DGEMM MIC->CPU Data]   1382400000 bytes + [MKL] [MIC 01] [AO DGEMM CPU Time]   3.814714 seconds + [MKL] [MIC 01] [AO DGEMM MIC Time]   2.843016 seconds + [MKL] [MIC 01] [AO DGEMM CPU->MIC Data]   1145600000 bytes + [MKL] [MIC 01] [AO DGEMM MIC->CPU Data]   1382400000 bytes + octave:4> exit On the example above we observe, that the DGEMM function workload was split over CPU, MIC 0 and MIC 1, in the ratio 0.14 0.43 0.43. The matrix @@ -485,9 +485,9 @@ To compile a code user has to be connected to a compute with MIC and load Intel compilers module. To get an interactive session on a compute node with an Intel Xeon Phi and load the module use following commands: - $ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 + $ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 - $ module load intel + $ module load intel Please note that particular version of the Intel module is specified. This information is used later to specify the correct library paths. @@ -497,16 +497,16 @@ to specify "-mmic" compiler flag. Two compilation examples are shown below. The first example shows how to compile OpenMP parallel code "vect-add.c" for host only: - $ icc -xhost -no-offload -fopenmp vect-add.c -o vect-add-host + $ icc -xhost -no-offload -fopenmp vect-add.c -o vect-add-host To run this code on host, use: - $ ./vect-add-host + $ ./vect-add-host The second example shows how to compile the same code for Intel Xeon Phi: - $ icc -mmic -fopenmp vect-add.c -o vect-add-mic + $ icc -mmic -fopenmp vect-add.c -o vect-add-mic ### Execution of the Program in Native Mode on Intel Xeon Phi @@ -518,23 +518,23 @@ have to copy binary files or libraries between the host and accelerator. 
Get the PATH of MIC enabled libraries for the currently used Intel
Compiler (here icc/2015.3.187-GNU-5.1.0-2.25 was used):

-    $ echo $MIC_LD_LIBRARY_PATH
-    /apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic
+    $ echo $MIC_LD_LIBRARY_PATH
+    /apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic

To connect to the accelerator run:

-    $ ssh mic0
+    $ ssh mic0

If the code is sequential, it can be executed directly:

-    mic0 $ ~/path_to_binary/vect-add-seq-mic
+    mic0 $ ~/path_to_binary/vect-add-seq-mic

If the code is parallelized using OpenMP, a set of additional libraries
is required for execution. To locate these libraries, a new path has to
be added to the LD_LIBRARY_PATH environment variable prior to the
execution:

-    mic0 $ export LD_LIBRARY_PATH=/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic:$LD_LIBRARY_PATH
+    mic0 $ export LD_LIBRARY_PATH=/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic:$LD_LIBRARY_PATH

Please note that the path exported in the previous example contains the
path to a specific compiler (here the version is 2015.3.187-GNU-5.1.0-2.25).
@@ -542,142 +542,142 @@ This version number has to match with the version number of the Intel
compiler module that was used to compile the code on the host computer.

For your information the list of libraries and their location required
-for execution of an OpenMP parallel code on Intel Xeon Phi is:<span
-class="discreet visualHighlight"></span>
+for execution of an OpenMP parallel code on Intel Xeon Phi is:

-<span>/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic
+/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic

libiomp5.so
libimf.so
libsvml.so
libirng.so
libintlc.so.5
-</span>
-<span>Finally, to run the compiled code use: </span>

-    $ ~/path_to_binary/vect-add-mic
+Finally, to run the compiled code use:
+
+    $ ~/path_to_binary/vect-add-mic

-<span>OpenCL</span>
+OpenCL
-------------------

-<span>OpenCL (Open Computing Language) is an open standard for
+OpenCL (Open Computing Language) is an open standard for
general-purpose parallel programming for a diverse mix of multi-core CPUs,
GPU coprocessors, and other parallel processors. OpenCL provides a
flexible execution model and uniform programming environment for
software developers to write portable code for systems running on both
the CPU and graphics processors or accelerators like the Intel® Xeon
-Phi.</span>
+Phi.

-<span>On Anselm OpenCL is installed only on compute nodes with MIC
+On Anselm, OpenCL is installed only on compute nodes with MIC
accelerator, therefore OpenCL code can be compiled only on these nodes.
-</span>
-    module load opencl-sdk opencl-rt

-<span>Always load "opencl-sdk" (providing devel files like headers) and
+    module load opencl-sdk opencl-rt
+
+Always load "opencl-sdk" (providing devel files like headers) and
"opencl-rt" (providing dynamic library libOpenCL.so) modules to compile
and link OpenCL code. Load "opencl-rt" for running your compiled code.
-</span> -<span>There are two basic examples of OpenCL code in the following -directory: </span> - /apps/intel/opencl-examples/ +>There are two basic examples of OpenCL code in the following +directory: + + /apps/intel/opencl-examples/ -<span>First example "CapsBasic" detects OpenCL compatible hardware, here +>First example "CapsBasic" detects OpenCL compatible hardware, here CPU and MIC, and prints basic information about the capabilities of it. -</span> - /apps/intel/opencl-examples/CapsBasic/capsbasic -<span>To compile and run the example copy it to your home directory, get + /apps/intel/opencl-examples/CapsBasic/capsbasic + +>To compile and run the example copy it to your home directory, get a PBS interactive session on of the nodes with MIC and run make for compilation. Make files are very basic and shows how the OpenCL code can -be compiled on Anselm. </span> +be compiled on Anselm. - $ cp /apps/intel/opencl-examples/CapsBasic/* . - $ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 - $ make + $ cp /apps/intel/opencl-examples/CapsBasic/* . + $ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 + $ make -<span>The compilation command for this example is: </span> +>The compilation command for this example is: - $ g++ capsbasic.cpp -lOpenCL -o capsbasic -I/apps/intel/opencl/include/ + $ g++ capsbasic.cpp -lOpenCL -o capsbasic -I/apps/intel/opencl/include/ -<span>After executing the complied binary file, following output should +>After executing the complied binary file, following output should be displayed. -</span> - ./capsbasic - Number of available platforms: 1 - Platform names: -    [0] Intel(R) OpenCL [Selected] - Number of devices available for each type: -    CL_DEVICE_TYPE_CPU: 1 -    CL_DEVICE_TYPE_GPU: 0 -    CL_DEVICE_TYPE_ACCELERATOR: 1 + ./capsbasic - *** Detailed information for each device *** + Number of available platforms: 1 + Platform names: +    [0] Intel(R) OpenCL [Selected] + Number of devices available for each type: +    CL_DEVICE_TYPE_CPU: 1 +    CL_DEVICE_TYPE_GPU: 0 +    CL_DEVICE_TYPE_ACCELERATOR: 1 - CL_DEVICE_TYPE_CPU[0] -    CL_DEVICE_NAME:       Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz -    CL_DEVICE_AVAILABLE: 1 + *** Detailed information for each device *** - ... + CL_DEVICE_TYPE_CPU[0] +    CL_DEVICE_NAME:       Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz +    CL_DEVICE_AVAILABLE: 1 - CL_DEVICE_TYPE_ACCELERATOR[0] -    CL_DEVICE_NAME: Intel(R) Many Integrated Core Acceleration Card -    CL_DEVICE_AVAILABLE: 1 + ... - ... + CL_DEVICE_TYPE_ACCELERATOR[0] +    CL_DEVICE_NAME: Intel(R) Many Integrated Core Acceleration Card +    CL_DEVICE_AVAILABLE: 1 -<span>More information about this example can be found on Intel website: + ... + +>More information about this example can be found on Intel website: <http://software.intel.com/en-us/vcsource/samples/caps-basic/> -</span> -<span>The second example that can be found in -"/apps/intel/opencl-examples" </span><span>directory is General Matrix + +>The second example that can be found in +"/apps/intel/opencl-examples" >directory is General Matrix Multiply. You can follow the the same procedure to download the example to your directory and compile it. -</span> - - $ cp -r /apps/intel/opencl-examples/* . 
- $ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 - $ cd GEMM - $ make - -<span>The compilation command for this example is: </span> - - $ g++ cmdoptions.cpp gemm.cpp ../common/basic.cpp ../common/cmdparser.cpp ../common/oclobject.cpp -I../common -lOpenCL -o gemm -I/apps/intel/opencl/include/ - -<span>To see the performance of Intel Xeon Phi performing the DGEMM run -the example as follows: </span> - - ./gemm -d 1 - Platforms (1): - [0] Intel(R) OpenCL [Selected] - Devices (2): - [0] Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz - [1] Intel(R) Many Integrated Core Acceleration Card [Selected] - Build program options: "-DT=float -DTILE_SIZE_M=1 -DTILE_GROUP_M=16 -DTILE_SIZE_N=128 -DTILE_GROUP_N=1 -DTILE_SIZE_K=8" - Running gemm_nn kernel with matrix size: 3968x3968 - Memory row stride to ensure necessary alignment: 15872 bytes - Size of memory region for one matrix: 62980096 bytes - Using alpha = 0.57599 and beta = 0.872412 - ... - Host time: 0.292953 sec. - Host perf: 426.635 GFLOPS - Host time: 0.293334 sec. - Host perf: 426.081 GFLOPS - ... - -<span>Please note: GNU compiler is used to compile the OpenCL codes for + + + $ cp -r /apps/intel/opencl-examples/* . + $ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 + $ cd GEMM + $ make + +>The compilation command for this example is: + + $ g++ cmdoptions.cpp gemm.cpp ../common/basic.cpp ../common/cmdparser.cpp ../common/oclobject.cpp -I../common -lOpenCL -o gemm -I/apps/intel/opencl/include/ + +>To see the performance of Intel Xeon Phi performing the DGEMM run +the example as follows: + + ./gemm -d 1 + Platforms (1): + [0] Intel(R) OpenCL [Selected] + Devices (2): + [0] Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz + [1] Intel(R) Many Integrated Core Acceleration Card [Selected] + Build program options: "-DT=float -DTILE_SIZE_M=1 -DTILE_GROUP_M=16 -DTILE_SIZE_N=128 -DTILE_GROUP_N=1 -DTILE_SIZE_K=8" + Running gemm_nn kernel with matrix size: 3968x3968 + Memory row stride to ensure necessary alignment: 15872 bytes + Size of memory region for one matrix: 62980096 bytes + Using alpha = 0.57599 and beta = 0.872412 + ... + Host time: 0.292953 sec. + Host perf: 426.635 GFLOPS + Host time: 0.293334 sec. + Host perf: 426.081 GFLOPS + ... + +>Please note: GNU compiler is used to compile the OpenCL codes for Intel MIC. You do not need to load Intel compiler module. -</span> -<span>MPI</span> + +>MPI ---------------- ### Environment setup and compilation @@ -685,8 +685,8 @@ Intel MIC. You do not need to load Intel compiler module. To achieve best MPI performance always use following setup for Intel MPI on Xeon Phi accelerated nodes: - $ export I_MPI_FABRICS=shm:dapl - $ export I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1u,ofa-v2-scif0,ofa-v2-mcm-1 + $ export I_MPI_FABRICS=shm:dapl + $ export I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1u,ofa-v2-scif0,ofa-v2-mcm-1 This ensures, that MPI inside node will use SHMEM communication, between HOST and Phi the IB SCIF will be used and between different nodes or @@ -705,32 +705,32 @@ Again an MPI code for Intel Xeon Phi has to be compiled on a compute node with accelerator and MPSS software stack installed. 
To get to a compute node with accelerator use: - $ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 + $ qsub -I -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 The only supported implementation of MPI standard for Intel Xeon Phi is Intel MPI. To setup a fully functional development environment a combination of Intel compiler and Intel MPI has to be used. On a host load following modules before compilation: - $ module load intel impi + $ module load intel impi To compile an MPI code for host use: - $ mpiicc -xhost -o mpi-test mpi-test.c + $ mpiicc -xhost -o mpi-test mpi-test.c To compile the same code for Intel Xeon Phi architecture use: - $ mpiicc -mmic -o mpi-test-mic mpi-test.c + $ mpiicc -mmic -o mpi-test-mic mpi-test.c Or, if you are using Fortran : - $ mpiifort -mmic -o mpi-test-mic mpi-test.f90 + $ mpiifort -mmic -o mpi-test-mic mpi-test.f90 An example of basic MPI version of "hello-world" example in C language, that can be executed on both host and Xeon Phi is (can be directly copy and pasted to a .c file) -``` +``` #include <stdio.h> #include <mpi.h> @@ -758,42 +758,42 @@ int main (argc, argv) ### MPI programming models -<span>Intel MPI for the Xeon Phi coprocessors offers different MPI -programming models:</span> +>Intel MPI for the Xeon Phi coprocessors offers different MPI +programming models: -**Host-only model** - all MPI ranks reside on the host. The coprocessors +Host-only model** - all MPI ranks reside on the host. The coprocessors can be used by using offload pragmas. (Using MPI calls inside offloaded code is not supported.)** Coprocessor-only model** - all MPI ranks reside only on the coprocessors. -**Symmetric model** - the MPI ranks reside on both the host and the +Symmetric model** - the MPI ranks reside on both the host and the coprocessor. Most general MPI case. -### <span>Host-only model</span> +### >Host-only model -<span></span>In this case all environment variables are set by modules, +>In this case all environment variables are set by modules, so to execute the compiled MPI program on a single node, use: - $ mpirun -np 4 ./mpi-test + $ mpirun -np 4 ./mpi-test The output should be similar to: - Hello world from process 1 of 4 on host r38u31n1000 - Hello world from process 3 of 4 on host r38u31n1000 - Hello world from process 2 of 4 on host r38u31n1000 - Hello world from process 0 of 4 on host r38u31n1000 + Hello world from process 1 of 4 on host r38u31n1000 + Hello world from process 3 of 4 on host r38u31n1000 + Hello world from process 2 of 4 on host r38u31n1000 + Hello world from process 0 of 4 on host r38u31n1000 ### Coprocessor-only model -<span>There are two ways how to execute an MPI code on a single +>There are two ways how to execute an MPI code on a single coprocessor: 1.) lunch the program using "**mpirun**" from the coprocessor; or 2.) lunch the task using "**mpiexec.hydra**" from a host. -</span> -**Execution on coprocessor** + +Execution on coprocessor** Similarly to execution of OpenMP programs in native mode, since the environmental module are not supported on MIC, user has to setup paths @@ -805,21 +805,21 @@ accelerator through the SSH. 
At first get the LD_LIBRARY_PATH for currenty used Intel Compiler and Intel MPI: - $ echo $MIC_LD_LIBRARY_PATH - /apps/all/imkl/11.2.3.187-iimpi-7.3.5-GNU-5.1.0-2.25/mkl/lib/mic:/apps/all/imkl/11.2.3.187-iimpi-7.3.5-GNU-5.1.0-2.25/lib/mic:/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic/ + $ echo $MIC_LD_LIBRARY_PATH + /apps/all/imkl/11.2.3.187-iimpi-7.3.5-GNU-5.1.0-2.25/mkl/lib/mic:/apps/all/imkl/11.2.3.187-iimpi-7.3.5-GNU-5.1.0-2.25/lib/mic:/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic/ Use it in your ~/.profile: - $ vim ~/.profile + $ vim ~/.profile - PS1='[u@h W]$ ' - export PATH=/usr/bin:/usr/sbin:/bin:/sbin + PS1='[u@h W]$ ' + export PATH=/usr/bin:/usr/sbin:/bin:/sbin - #IMPI - export PATH=/apps/all/impi/5.0.3.048-iccifort-2015.3.187-GNU-5.1.0-2.25/mic/bin/:$PATH + #IMPI + export PATH=/apps/all/impi/5.0.3.048-iccifort-2015.3.187-GNU-5.1.0-2.25/mic/bin/:$PATH - #OpenMP (ICC, IFORT), IMKL and IMPI - export LD_LIBRARY_PATH=/apps/all/imkl/11.2.3.187-iimpi-7.3.5-GNU-5.1.0-2.25/mkl/lib/mic:/apps/all/imkl/11.2.3.187-iimpi-7.3.5-GNU-5.1.0-2.25/lib/mic:/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic:$LD_LIBRARY_PATH + #OpenMP (ICC, IFORT), IMKL and IMPI + export LD_LIBRARY_PATH=/apps/all/imkl/11.2.3.187-iimpi-7.3.5-GNU-5.1.0-2.25/mkl/lib/mic:/apps/all/imkl/11.2.3.187-iimpi-7.3.5-GNU-5.1.0-2.25/lib/mic:/apps/all/icc/2015.3.187-GNU-5.1.0-2.25/composer_xe_2015.3.187/compiler/lib/mic:$LD_LIBRARY_PATH Please note:  - this file sets up both environmental variable for both MPI and OpenMP @@ -831,27 +831,27 @@ to match with loaded modules. To access a MIC accelerator located on a node that user is currently connected to, use: - $ ssh mic0 + $ ssh mic0 or in case you need specify a MIC accelerator on a particular node, use: - $ ssh r38u31n1000-mic0 + $ ssh r38u31n1000-mic0 To run the MPI code in parallel on multiple core of the accelerator, use: - $ mpirun -np 4 ./mpi-test-mic + $ mpirun -np 4 ./mpi-test-mic The output should be similar to: - Hello world from process 1 of 4 on host r38u31n1000-mic0 - Hello world from process 2 of 4 on host r38u31n1000-mic0 - Hello world from process 3 of 4 on host r38u31n1000-mic0 - Hello world from process 0 of 4 on host r38u31n1000-mic0 + Hello world from process 1 of 4 on host r38u31n1000-mic0 + Hello world from process 2 of 4 on host r38u31n1000-mic0 + Hello world from process 3 of 4 on host r38u31n1000-mic0 + Hello world from process 0 of 4 on host r38u31n1000-mic0 -** ** + ** -**Execution on host** +Execution on host** If the MPI program is launched from host instead of the coprocessor, the environmental variables are not set using the ".profile" file. 
Therefore @@ -861,172 +861,172 @@ user has to specify library paths from the command line when calling First step is to tell mpiexec that the MPI should be executed on a local accelerator by setting up the environmental variable "I_MPI_MIC" - $ export I_MPI_MIC=1 + $ export I_MPI_MIC=1 Now the MPI program can be executed as: - $ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH -host mic0 -n 4 ~/mpi-test-mic + $ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH -host mic0 -n 4 ~/mpi-test-mic or using mpirun - $ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH -host mic0 -n 4 ~/mpi-test-mic + $ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH -host mic0 -n 4 ~/mpi-test-mic Please note:  - the full path to the binary has to specified (here: -"**<span>~/mpi-test-mic</span>**") +"**>~/mpi-test-mic**")  - the LD_LIBRARY_PATH has to match with Intel MPI module used to compile the MPI code The output should be again similar to: - Hello world from process 1 of 4 on host r38u31n1000-mic0 - Hello world from process 2 of 4 on host r38u31n1000-mic0 - Hello world from process 3 of 4 on host r38u31n1000-mic0 - Hello world from process 0 of 4 on host r38u31n1000-mic0 + Hello world from process 1 of 4 on host r38u31n1000-mic0 + Hello world from process 2 of 4 on host r38u31n1000-mic0 + Hello world from process 3 of 4 on host r38u31n1000-mic0 + Hello world from process 0 of 4 on host r38u31n1000-mic0 Please note that the "mpiexec.hydra" requires a file -"**<span>pmi_proxy</span>**" from Intel MPI library to be copied to the +"**>pmi_proxy**" from Intel MPI library to be copied to the MIC filesystem. If the file is missing please contact the system administrators. A simple test to see if the file is present is to execute: -   $ ssh mic0 ls /bin/pmi_proxy -  /bin/pmi_proxy +   $ ssh mic0 ls /bin/pmi_proxy +  /bin/pmi_proxy -** ** + ** -**Execution on host - MPI processes distributed over multiple +Execution on host - MPI processes distributed over multiple accelerators on multiple nodes** -<span>To get access to multiple nodes with MIC accelerator, user has to +>To get access to multiple nodes with MIC accelerator, user has to use PBS to allocate the resources. To start interactive session, that allocates 2 compute nodes = 2 MIC accelerators run qsub command with -following parameters: </span> +following parameters: - $ qsub -I -q qprod -l select=2:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 + $ qsub -I -q qprod -l select=2:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 -A NONE-0-0 - $ module load intel impi + $ module load intel impi -<span>This command connects user through ssh to one of the nodes +>This command connects user through ssh to one of the nodes immediately. To see the other nodes that have been allocated use: -</span> - $ cat $PBS_NODEFILE -<span>For example: </span> + $ cat $PBS_NODEFILE + +>For example: - r38u31n1000.bullx - r38u32n1001.bullx + r38u31n1000.bullx + r38u32n1001.bullx -<span>This output means that the PBS allocated nodes r38u31n1000 and +>This output means that the PBS allocated nodes r38u31n1000 and r38u32n1001, which means that user has direct access to -"**r38u31n1000-mic0**" and "**<span>r38u32n1001</span>-mic0**" -accelerators.</span> +"**r38u31n1000-mic0**" and "**>r38u32n1001-mic0**" +accelerators. 
-<span>Please note: At this point user can connect to any of the +>Please note: At this point user can connect to any of the allocated nodes or any of the allocated MIC accelerators using ssh: -- to connect to the second node : **<span class="monospace">$ -ssh <span>r38u32n1001</span></span>** -<span>- to connect to the accelerator on the first node from the first -node: <span class="monospace">**$ ssh -<span>r38u31n1000</span>-mic0**</span></span> or **<span -class="monospace">$ ssh mic0</span>** -**-** to connect to the accelerator on the second node from the first -node: <span class="monospace">**$ ssh -<span>r38u32n1001</span>-mic0**</span> -</span> - -<span>At this point we expect that correct modules are loaded and binary -is compiled. For parallel execution the mpiexec.hydra is used.</span> +- to connect to the second node : ** $ +ssh >r38u32n1001** +>- to connect to the accelerator on the first node from the first +node: **$ ssh +>r38u31n1000-mic0**</span> or ** +$ ssh mic0** +-** to connect to the accelerator on the second node from the first +node: **$ ssh +>r38u32n1001-mic0** + + +>At this point we expect that correct modules are loaded and binary +is compiled. For parallel execution the mpiexec.hydra is used. Again the first step is to tell mpiexec that the MPI can be executed on MIC accelerators by setting up the environmental variable "I_MPI_MIC", don't forget to have correct FABRIC and PROVIDER defined. - $ export I_MPI_MIC=1 - $ export I_MPI_FABRICS=shm:dapl - $ export I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1u,ofa-v2-scif0,ofa-v2-mcm-1 + $ export I_MPI_MIC=1 + $ export I_MPI_FABRICS=shm:dapl + $ export I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1u,ofa-v2-scif0,ofa-v2-mcm-1 -<span>The launch the MPI program use:</span> +>The launch the MPI program use: - $ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH - -host r38u31n1000-mic0 -n 4 ~/mpi-test-mic - : -host r38u32n1001-mic0 -n 6 ~/mpi-test-mic + $ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH + -host r38u31n1000-mic0 -n 4 ~/mpi-test-mic + : -host r38u32n1001-mic0 -n 6 ~/mpi-test-mic or using mpirun: - $ mpirun -genv LD_LIBRARY_PATH - -host r38u31n1000-mic0 -n 4 ~/mpi-test-mic - : -host r38u32n1001-mic0 -n 6 ~/mpi-test-mic + $ mpirun -genv LD_LIBRARY_PATH + -host r38u31n1000-mic0 -n 4 ~/mpi-test-mic + : -host r38u32n1001-mic0 -n 6 ~/mpi-test-mic In this case four MPI processes are executed on accelerator r38u31n1000-mic and six processes are executed on accelerator r38u32n1001-mic0. 
The sample output (sorted after execution) is: - Hello world from process 0 of 10 on host r38u31n1000-mic0 - Hello world from process 1 of 10 on host r38u31n1000-mic0 - Hello world from process 2 of 10 on host r38u31n1000-mic0 - Hello world from process 3 of 10 on host r38u31n1000-mic0 - Hello world from process 4 of 10 on host r38u32n1001-mic0 - Hello world from process 5 of 10 on host r38u32n1001-mic0 - Hello world from process 6 of 10 on host r38u32n1001-mic0 - Hello world from process 7 of 10 on host r38u32n1001-mic0 - Hello world from process 8 of 10 on host r38u32n1001-mic0 - Hello world from process 9 of 10 on host r38u32n1001-mic0 + Hello world from process 0 of 10 on host r38u31n1000-mic0 + Hello world from process 1 of 10 on host r38u31n1000-mic0 + Hello world from process 2 of 10 on host r38u31n1000-mic0 + Hello world from process 3 of 10 on host r38u31n1000-mic0 + Hello world from process 4 of 10 on host r38u32n1001-mic0 + Hello world from process 5 of 10 on host r38u32n1001-mic0 + Hello world from process 6 of 10 on host r38u32n1001-mic0 + Hello world from process 7 of 10 on host r38u32n1001-mic0 + Hello world from process 8 of 10 on host r38u32n1001-mic0 + Hello world from process 9 of 10 on host r38u32n1001-mic0 The same way MPI program can be executed on multiple hosts: - $ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH - -host r38u31n1000 -n 4 ~/mpi-test - : -host r38u32n1001 -n 6 ~/mpi-test + $ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH + -host r38u31n1000 -n 4 ~/mpi-test + : -host r38u32n1001 -n 6 ~/mpi-test -### <span>Symmetric model </span> +### >Symmetric model -<span>In a symmetric mode MPI programs are executed on both host +>In a symmetric mode MPI programs are executed on both host computer(s) and MIC accelerator(s). Since MIC has a different architecture and requires different binary file produced by the Intel compiler two different files has to be compiled before MPI program is -executed. </span> +executed. -<span>In the previous section we have compiled two binary files, one for +>In the previous section we have compiled two binary files, one for hosts "**mpi-test**" and one for MIC accelerators "**mpi-test-mic**". These two binaries can be executed at once using mpiexec.hydra: -</span> - $ mpirun - -genv $MIC_LD_LIBRARY_PATH - -host r38u32n1001 -n 2 ~/mpi-test - : -host r38u32n1001-mic0 -n 2 ~/mpi-test-mic + + $ mpirun + -genv $MIC_LD_LIBRARY_PATH + -host r38u32n1001 -n 2 ~/mpi-test + : -host r38u32n1001-mic0 -n 2 ~/mpi-test-mic In this example the first two parameters (line 2 and 3) sets up required environment variables for execution. The third line specifies binary that is executed on host (here r38u32n1001) and the last line specifies the binary that is execute on the accelerator (here r38u32n1001-mic0). -<span>The output of the program is: </span> +>The output of the program is: - Hello world from process 0 of 4 on host r38u32n1001 - Hello world from process 1 of 4 on host r38u32n1001 - Hello world from process 2 of 4 on host r38u32n1001-mic0 - Hello world from process 3 of 4 on host r38u32n1001-mic0 + Hello world from process 0 of 4 on host r38u32n1001 + Hello world from process 1 of 4 on host r38u32n1001 + Hello world from process 2 of 4 on host r38u32n1001-mic0 + Hello world from process 3 of 4 on host r38u32n1001-mic0 -<span>The execution procedure can be simplified by using the mpirun +>The execution procedure can be simplified by using the mpirun command with the machine file a a parameter. 
Machine file contains list of all nodes and accelerators that should used to execute MPI processes. -</span> -<span>An example of a machine file that uses 2 <span>hosts (r38u32n1001 + +>An example of a machine file that uses 2 >hosts (r38u32n1001 and r38u33n1002) and 2 accelerators **(r38u32n1001-mic0** and -**<span><span>r38u33n1002</span></span>-mic0**) to run 2 MPI processes -on each</span> of them: -</span> +>>r38u33n1002-mic0**) to run 2 MPI processes +on each of them: - $ cat hosts_file_mix - r38u32n1001:2 - r38u32n1001-mic0:2 - r38u33n1002:2 - r38u33n1002-mic0:2 -<span>In addition if a naming convention is set in a way that the name + $ cat hosts_file_mix + r38u32n1001:2 + r38u32n1001-mic0:2 + r38u33n1002:2 + r38u33n1002-mic0:2 + +>In addition if a naming convention is set in a way that the name of the binary for host is **"bin_name"** and the name of the binary for the accelerator is **"bin_name-mic"** then by setting up the environment variable **I_MPI_MIC_POSTFIX** to **"-mic"** user do not @@ -1034,43 +1034,43 @@ have to specify the names of booth binaries. In this case mpirun needs just the name of the host binary file (i.e. "mpi-test") and uses the suffix to get a name of the binary for accelerator (i..e. "mpi-test-mic"). -</span> - $ export I_MPI_MIC_POSTFIX=-mic - <span>To run the MPI code using mpirun and the machine file + $ export I_MPI_MIC_POSTFIX=-mic + + >To run the MPI code using mpirun and the machine file "hosts_file_mix" use: -</span> - $ mpirun - -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH - -machinefile hosts_file_mix - ~/mpi-test -<span>A possible output of the MPI "hello-world" example executed on two + $ mpirun + -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH + -machinefile hosts_file_mix + ~/mpi-test + +>A possible output of the MPI "hello-world" example executed on two hosts and two accelerators is: -</span> - Hello world from process 0 of 8 on host r38u31n1000 - Hello world from process 1 of 8 on host r38u31n1000 - Hello world from process 2 of 8 on host r38u31n1000-mic0 - Hello world from process 3 of 8 on host r38u31n1000-mic0 - Hello world from process 4 of 8 on host r38u32n1001 - Hello world from process 5 of 8 on host r38u32n1001 - Hello world from process 6 of 8 on host r38u32n1001-mic0 - Hello world from process 7 of 8 on host r38u32n1001-mic0 -**Using the PBS automatically generated node-files -** + Hello world from process 0 of 8 on host r38u31n1000 + Hello world from process 1 of 8 on host r38u31n1000 + Hello world from process 2 of 8 on host r38u31n1000-mic0 + Hello world from process 3 of 8 on host r38u31n1000-mic0 + Hello world from process 4 of 8 on host r38u32n1001 + Hello world from process 5 of 8 on host r38u32n1001 + Hello world from process 6 of 8 on host r38u32n1001-mic0 + Hello world from process 7 of 8 on host r38u32n1001-mic0 + +Using the PBS automatically generated node-files + PBS also generates a set of node-files that can be used instead of manually creating a new one every time. Three node-files are genereated: -**Host only node-file:** +Host only node-file:**  - /lscratch/$/nodefile-cn -**MIC only node-file**: +MIC only node-file**:  - /lscratch/$/nodefile-mic -**Host and MIC node-file**: +Host and MIC node-file**:  - /lscratch/$/nodefile-mix Please note each host or accelerator is listed only per files. 
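A generated node-file can be passed to mpirun in place of a hand-written machine file. A sketch only: the truncated paths above are assumed here to expand to /lscratch/$PBS_JOBID/, and I_MPI_MIC_POSTFIX is assumed to be set as described above so that the accelerator binary is found.

    $ mpirun -genv LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH -machinefile /lscratch/$PBS_JOBID/nodefile-mix ~/mpi-test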
User has @@ -1082,5 +1082,5 @@ Optimization For more details about optimization techniques please read Intel document [Optimization and Performance Tuning for Intel® Xeon Phi™ -Coprocessors](http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-1-optimization "http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-1-optimization"){.external -.text}. +Coprocessors](http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-1-optimization "http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-1-optimization") +. diff --git a/converted/docs.it4i.cz/salomon/software/java.md b/converted/docs.it4i.cz/salomon/software/java.md index 149db1d329cd89085988d242bcf842d3f146346d..2701936ebc099c312e5594c243f241a4d80b5dac 100644 --- a/converted/docs.it4i.cz/salomon/software/java.md +++ b/converted/docs.it4i.cz/salomon/software/java.md @@ -4,26 +4,26 @@ Java Java on the cluster - + Java is available on the cluster. Activate java by loading the Java module - $ module load Java + $ module load Java Note that the Java module must be loaded on the compute nodes as well, in order to run java on compute nodes. Check for java version and path - $ java -version - $ which java + $ java -version + $ which java With the module loaded, not only the runtime environment (JRE), but also the development environment (JDK) with the compiler is available. - $ javac -version - $ which javac + $ javac -version + $ which javac Java applications may use MPI for interprocess communication, in conjunction with OpenMPI. Read more diff --git a/converted/docs.it4i.cz/salomon/software/mpi-1/Running_OpenMPI.md b/converted/docs.it4i.cz/salomon/software/mpi-1/Running_OpenMPI.md index 3d15d224fd6c8897e34cee68765a64e88385384d..504db7f2b0415a3ba8ff8468c4232663c0004563 100644 --- a/converted/docs.it4i.cz/salomon/software/mpi-1/Running_OpenMPI.md +++ b/converted/docs.it4i.cz/salomon/software/mpi-1/Running_OpenMPI.md @@ -3,7 +3,7 @@ Running OpenMPI - + OpenMPI program execution ------------------------- @@ -18,19 +18,19 @@ Use the mpiexec to run the OpenMPI code. Example: - $ qsub -q qexp -l select=4:ncpus=24 -I - qsub: waiting for job 15210.isrv5 to start - qsub: job 15210.isrv5 ready + $ qsub -q qexp -l select=4:ncpus=24 -I + qsub: waiting for job 15210.isrv5 to start + qsub: job 15210.isrv5 ready - $ pwd - /home/username + $ pwd + /home/username - $ module load OpenMPI - $ mpiexec -pernode ./helloworld_mpi.x - Hello world! from rank 0 of 4 on host r1i0n17 - Hello world! from rank 1 of 4 on host r1i0n5 - Hello world! from rank 2 of 4 on host r1i0n6 - Hello world! from rank 3 of 4 on host r1i0n7 + $ module load OpenMPI + $ mpiexec -pernode ./helloworld_mpi.x + Hello world! from rank 0 of 4 on host r1i0n17 + Hello world! from rank 1 of 4 on host r1i0n5 + Hello world! from rank 2 of 4 on host r1i0n6 + Hello world! from rank 3 of 4 on host r1i0n7 Please be aware, that in this example, the directive **-pernode** is used to run only **one task per node**, which is normally an unwanted @@ -41,28 +41,28 @@ directive** to run up to 24 MPI tasks per each node. In this example, we allocate 4 nodes via the express queue interactively. We set up the openmpi environment and interactively run the helloworld_mpi.x program. 
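To fill each node completely instead of running a single task per node, request 24 MPI processes per node with the mpiprocs directive at allocation time. A minimal sketch, following the same pattern as the examples further below:

    $ qsub -q qexp -l select=4:ncpus=24:mpiprocs=24:ompthreads=1 -I
    $ module load OpenMPI
    $ mpiexec ./helloworld_mpi.x

With this allocation, mpiexec starts 24 ranks per node, 96 in total.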
-Note that the executable <span -class="monospace">helloworld_mpi.x</span> must be available within the +Note that the executable +helloworld_mpi.x must be available within the same path on all nodes. This is automatically fulfilled on the /home and /scratch filesystem. You need to preload the executable, if running on the local ramdisk /tmp filesystem - $ pwd - /tmp/pbs.15210.isrv5 + $ pwd + /tmp/pbs.15210.isrv5 - $ mpiexec -pernode --preload-binary ./helloworld_mpi.x - Hello world! from rank 0 of 4 on host r1i0n17 - Hello world! from rank 1 of 4 on host r1i0n5 - Hello world! from rank 2 of 4 on host r1i0n6 - Hello world! from rank 3 of 4 on host r1i0n7 + $ mpiexec -pernode --preload-binary ./helloworld_mpi.x + Hello world! from rank 0 of 4 on host r1i0n17 + Hello world! from rank 1 of 4 on host r1i0n5 + Hello world! from rank 2 of 4 on host r1i0n6 + Hello world! from rank 3 of 4 on host r1i0n7 -In this example, we assume the executable <span -class="monospace">helloworld_mpi.x</span> is present on compute node +In this example, we assume the executable +helloworld_mpi.x is present on compute node r1i0n17 on ramdisk. We call the mpiexec whith the **--preload-binary** argument (valid for openmpi). The mpiexec will copy the executable from -r1i0n17 to the <span class="monospace">/tmp/pbs.15210.isrv5</span> +r1i0n17 to the /tmp/pbs.15210.isrv5 directory on r1i0n5, r1i0n6 and r1i0n7 and execute the program. MPI process mapping may be controlled by PBS parameters. @@ -76,11 +76,11 @@ MPI process. Follow this example to run one MPI process per node, 24 threads per process. - $ qsub -q qexp -l select=4:ncpus=24:mpiprocs=1:ompthreads=24 -I + $ qsub -q qexp -l select=4:ncpus=24:mpiprocs=1:ompthreads=24 -I - $ module load OpenMPI + $ module load OpenMPI - $ mpiexec --bind-to-none ./helloworld_mpi.x + $ mpiexec --bind-to-none ./helloworld_mpi.x In this example, we demonstrate recommended way to run an MPI application, using 1 MPI processes per node and 24 threads per socket, @@ -91,11 +91,11 @@ on 4 nodes. Follow this example to run two MPI processes per node, 8 threads per process. Note the options to mpiexec. - $ qsub -q qexp -l select=4:ncpus=24:mpiprocs=2:ompthreads=12 -I + $ qsub -q qexp -l select=4:ncpus=24:mpiprocs=2:ompthreads=12 -I - $ module load OpenMPI + $ module load OpenMPI - $ mpiexec -bysocket -bind-to-socket ./helloworld_mpi.x + $ mpiexec -bysocket -bind-to-socket ./helloworld_mpi.x In this example, we demonstrate recommended way to run an MPI application, using 2 MPI processes per node and 12 threads per socket, @@ -107,11 +107,11 @@ node, on 4 nodes Follow this example to run 24 MPI processes per node, 1 thread per process. Note the options to mpiexec. - $ qsub -q qexp -l select=4:ncpus=24:mpiprocs=24:ompthreads=1 -I + $ qsub -q qexp -l select=4:ncpus=24:mpiprocs=24:ompthreads=1 -I - $ module load OpenMPI + $ module load OpenMPI - $ mpiexec -bycore -bind-to-core ./helloworld_mpi.x + $ mpiexec -bycore -bind-to-core ./helloworld_mpi.x In this example, we demonstrate recommended way to run an MPI application, using 24 MPI processes per node, single threaded. Each @@ -126,19 +126,19 @@ operating system might still migrate OpenMP threads between cores. 
You might want to avoid this by setting these environment variable for GCC OpenMP: - $ export GOMP_CPU_AFFINITY="0-23" + $ export GOMP_CPU_AFFINITY="0-23" or this one for Intel OpenMP: - $ export KMP_AFFINITY=granularity=fine,compact,1,0 + $ export KMP_AFFINITY=granularity=fine,compact,1,0 As of OpenMP 4.0 (supported by GCC 4.9 and later and Intel 14.0 and later) the following variables may be used for Intel or GCC: - $ export OMP_PROC_BIND=true - $ export OMP_PLACES=cores + $ export OMP_PROC_BIND=true + $ export OMP_PLACES=cores -<span>OpenMPI Process Mapping and Binding</span> +>OpenMPI Process Mapping and Binding ------------------------------------------------ The mpiexec allows for precise selection of how the MPI processes will @@ -154,18 +154,18 @@ openmpi only. Example hostfile - r1i0n17.smc.salomon.it4i.cz - r1i0n5.smc.salomon.it4i.cz - r1i0n6.smc.salomon.it4i.cz - r1i0n7.smc.salomon.it4i.cz + r1i0n17.smc.salomon.it4i.cz + r1i0n5.smc.salomon.it4i.cz + r1i0n6.smc.salomon.it4i.cz + r1i0n7.smc.salomon.it4i.cz Use the hostfile to control process placement - $ mpiexec -hostfile hostfile ./helloworld_mpi.x - Hello world! from rank 0 of 4 on host r1i0n17 - Hello world! from rank 1 of 4 on host r1i0n5 - Hello world! from rank 2 of 4 on host r1i0n6 - Hello world! from rank 3 of 4 on host r1i0n7 + $ mpiexec -hostfile hostfile ./helloworld_mpi.x + Hello world! from rank 0 of 4 on host r1i0n17 + Hello world! from rank 1 of 4 on host r1i0n5 + Hello world! from rank 2 of 4 on host r1i0n6 + Hello world! from rank 3 of 4 on host r1i0n7 In this example, we see that ranks have been mapped on nodes according to the order in which nodes show in the hostfile @@ -179,11 +179,11 @@ Appropriate binding may boost performance of your application. Example rankfile - rank 0=r1i0n7.smc.salomon.it4i.cz slot=1:0,1 - rank 1=r1i0n6.smc.salomon.it4i.cz slot=0:* - rank 2=r1i0n5.smc.salomon.it4i.cz slot=1:1-2 - rank 3=r1i0n17.smc.salomon slot=0:1,1:0-2 - rank 4=r1i0n6.smc.salomon.it4i.cz slot=0:*,1:* + rank 0=r1i0n7.smc.salomon.it4i.cz slot=1:0,1 + rank 1=r1i0n6.smc.salomon.it4i.cz slot=0:* + rank 2=r1i0n5.smc.salomon.it4i.cz slot=1:1-2 + rank 3=r1i0n17.smc.salomon slot=0:1,1:0-2 + rank 4=r1i0n6.smc.salomon.it4i.cz slot=0:*,1:* This rankfile assumes 5 ranks will be running on 4 nodes and provides exact mapping and binding of the processes to the processor sockets and @@ -197,17 +197,17 @@ rank 3 will be bounded to r1i0n17, socket0 core1, socket1 core0, core1, core2 rank 4 will be bounded to r1i0n6, all cores on both sockets - $ mpiexec -n 5 -rf rankfile --report-bindings ./helloworld_mpi.x - [r1i0n17:11180] MCW rank 3 bound to socket 0[core 1] socket 1[core 0-2]: [. B . . . . . . . . . .][B B B . . . . . . . . .] (slot list 0:1,1:0-2) - [r1i0n7:09928] MCW rank 0 bound to socket 1[core 0-1]: [. . . . . . . . . . . .][B B . . . . . . . . . .] (slot list 1:0,1) - [r1i0n6:10395] MCW rank 1 bound to socket 0[core 0-7]: [B B B B B B B B B B B B][. . . . . . . . . . . .] (slot list 0:*) - [r1i0n5:10406] MCW rank 2 bound to socket 1[core 1-2]: [. . . . . . . . . . . .][. B B . . . . . . . . .] (slot list 1:1-2) - [r1i0n6:10406] MCW rank 4 bound to socket 0[core 0-7] socket 1[core 0-7]: [B B B B B B B B B B B B][B B B B B B B B B B B B] (slot list 0:*,1:*) - Hello world! from rank 3 of 5 on host r1i0n17 - Hello world! from rank 1 of 5 on host r1i0n6 - Hello world! from rank 0 of 5 on host r1i0n7 - Hello world! from rank 4 of 5 on host r1i0n6 - Hello world! 
from rank 2 of 5 on host r1i0n5 + $ mpiexec -n 5 -rf rankfile --report-bindings ./helloworld_mpi.x + [r1i0n17:11180] MCW rank 3 bound to socket 0[core 1] socket 1[core 0-2]: [. B . . . . . . . . . .][B B B . . . . . . . . .] (slot list 0:1,1:0-2) + [r1i0n7:09928] MCW rank 0 bound to socket 1[core 0-1]: [. . . . . . . . . . . .][B B . . . . . . . . . .] (slot list 1:0,1) + [r1i0n6:10395] MCW rank 1 bound to socket 0[core 0-7]: [B B B B B B B B B B B B][. . . . . . . . . . . .] (slot list 0:*) + [r1i0n5:10406] MCW rank 2 bound to socket 1[core 1-2]: [. . . . . . . . . . . .][. B B . . . . . . . . .] (slot list 1:1-2) + [r1i0n6:10406] MCW rank 4 bound to socket 0[core 0-7] socket 1[core 0-7]: [B B B B B B B B B B B B][B B B B B B B B B B B B] (slot list 0:*,1:*) + Hello world! from rank 3 of 5 on host r1i0n17 + Hello world! from rank 1 of 5 on host r1i0n6 + Hello world! from rank 0 of 5 on host r1i0n7 + Hello world! from rank 4 of 5 on host r1i0n6 + Hello world! from rank 2 of 5 on host r1i0n5 In this example we run 5 MPI processes (5 ranks) on four nodes. The rankfile defines how the processes will be mapped on the nodes, sockets @@ -223,9 +223,9 @@ and cores. In all cases, binding and threading may be verified by executing for example: - $ mpiexec -bysocket -bind-to-socket --report-bindings echo - $ mpiexec -bysocket -bind-to-socket numactl --show - $ mpiexec -bysocket -bind-to-socket echo $OMP_NUM_THREADS + $ mpiexec -bysocket -bind-to-socket --report-bindings echo + $ mpiexec -bysocket -bind-to-socket numactl --show + $ mpiexec -bysocket -bind-to-socket echo $OMP_NUM_THREADS Changes in OpenMPI 1.8 ---------------------- @@ -266,7 +266,7 @@ Some options have changed in OpenMPI version 1.8. </tr> <tr class="even"> <td align="left">-pernode</td> -<td align="left"><p><span class="s1">--map-by ppr:1:node</span></p></td> +<td align="left"><p> class="s1">--map-by ppr:1:node</p></td> </tr> </tbody> </table> diff --git a/converted/docs.it4i.cz/salomon/software/mpi-1/mpi.md b/converted/docs.it4i.cz/salomon/software/mpi-1/mpi.md index 95609cf30be74e2500580b33f654c2c109966bb4..22a1f72ed72340f755062d1e662b73fc943abbc0 100644 --- a/converted/docs.it4i.cz/salomon/software/mpi-1/mpi.md +++ b/converted/docs.it4i.cz/salomon/software/mpi-1/mpi.md @@ -3,60 +3,60 @@ MPI - + Setting up MPI Environment -------------------------- The Salomon cluster provides several implementations of the MPI library: - ------------------------------------------------------------------------- - MPI Library Thread support - ------------------------------------ ------------------------------------ - **Intel MPI 4.1** Full thread support up to - MPI_THREAD_MULTIPLE +------------------------------------------------------------------------- +MPI Library Thread support +------------------------------------ ------------------------------------ +Intel MPI 4.1** Full thread support up to + MPI_THREAD_MULTIPLE - **Intel MPI 5.0** Full thread support up to - MPI_THREAD_MULTIPLE +Intel MPI 5.0** Full thread support up to + MPI_THREAD_MULTIPLE - OpenMPI 1.8.6 Full thread support up to - MPI_THREAD_MULTIPLE, MPI-3.0 - support +OpenMPI 1.8.6 Full thread support up to + MPI_THREAD_MULTIPLE, MPI-3.0 + support - SGI MPT 2.12 - ------------------------------------------------------------------------- +SGI MPT 2.12 +------------------------------------------------------------------------- MPI libraries are activated via the environment modules. 
Look up section modulefiles/mpi in module avail - $ module avail - ------------------------------ /apps/modules/mpi ------------------------------- - impi/4.1.1.036-iccifort-2013.5.192 - impi/4.1.1.036-iccifort-2013.5.192-GCC-4.8.3 - impi/5.0.3.048-iccifort-2015.3.187 - impi/5.0.3.048-iccifort-2015.3.187-GNU-5.1.0-2.25 - MPT/2.12 - OpenMPI/1.8.6-GNU-5.1.0-2.25 + $ module avail + ------------------------------ /apps/modules/mpi ------------------------------- + impi/4.1.1.036-iccifort-2013.5.192 + impi/4.1.1.036-iccifort-2013.5.192-GCC-4.8.3 + impi/5.0.3.048-iccifort-2015.3.187 + impi/5.0.3.048-iccifort-2015.3.187-GNU-5.1.0-2.25 + MPT/2.12 + OpenMPI/1.8.6-GNU-5.1.0-2.25 There are default compilers associated with any particular MPI implementation. The defaults may be changed, the MPI libraries may be used in conjunction with any compiler. The defaults are selected via the modules in following way - -------------------------------------------------------------------------- - Module MPI Compiler suite - ------------------------ ------------------------ ------------------------ - impi-5.0.3.048-iccifort- Intel MPI 5.0.3 - 2015.3.187 +-------------------------------------------------------------------------- +Module MPI Compiler suite +------------------------ ------------------------ ------------------------ +impi-5.0.3.048-iccifort- Intel MPI 5.0.3 +2015.3.187 - OpenMP-1.8.6-GNU-5.1.0-2 OpenMPI 1.8.6 - .25 - -------------------------------------------------------------------------- +OpenMP-1.8.6-GNU-5.1.0-2 OpenMPI 1.8.6 +.25 +-------------------------------------------------------------------------- Examples: - $ module load gompi/2015b + $ module load gompi/2015b In this example, we activate the latest OpenMPI with latest GNU compilers (OpenMPI 1.8.6 and GCC 5.1). Please see more information about @@ -65,7 +65,7 @@ Modules](../../environment-and-modules.html) . To use OpenMPI with the intel compiler suite, use - $ module load iompi/2015.03 + $ module load iompi/2015.03 In this example, the openmpi 1.8.6 using intel compilers is activated. It's used "iompi" toolchain. @@ -76,14 +76,14 @@ Compiling MPI Programs After setting up your MPI environment, compile your program using one of the mpi wrappers - $ mpicc -v - $ mpif77 -v - $ mpif90 -v + $ mpicc -v + $ mpif77 -v + $ mpif90 -v When using Intel MPI, use the following MPI wrappers: - $ mpicc - $ mpiifort + $ mpicc + $ mpiifort Wrappers mpif90, mpif77 that are provided by Intel MPI are designed for gcc and gfortran. You might be able to compile MPI code by them even @@ -92,35 +92,35 @@ native MIC compilation with -mmic does not work with mpif90). Example program: - // helloworld_mpi.c - #include <stdio.h> + // helloworld_mpi.c + #include <stdio.h> - #include<mpi.h> + #include<mpi.h> - int main(int argc, char **argv) { + int main(int argc, char **argv) { - int len; - int rank, size; - char node[MPI_MAX_PROCESSOR_NAME]; + int len; + int rank, size; + char node[MPI_MAX_PROCESSOR_NAME]; - // Initiate MPI - MPI_Init(&argc, &argv); - MPI_Comm_rank(MPI_COMM_WORLD,&rank); - MPI_Comm_size(MPI_COMM_WORLD,&size); + // Initiate MPI + MPI_Init(&argc, &argv); + MPI_Comm_rank(MPI_COMM_WORLD,&rank); + MPI_Comm_size(MPI_COMM_WORLD,&size); - // Get hostame and print - MPI_Get_processor_name(node,&len); - printf("Hello world! from rank %d of %d on host %sn",rank,size,node); + // Get hostame and print + MPI_Get_processor_name(node,&len); + printf("Hello world! 
from rank %d of %d on host %sn",rank,size,node); - // Finalize and exit - MPI_Finalize(); + // Finalize and exit + MPI_Finalize(); - return 0; - } + return 0; + } Compile the above example with - $ mpicc helloworld_mpi.c -o helloworld_mpi.x + $ mpicc helloworld_mpi.c -o helloworld_mpi.x Running MPI Programs -------------------- @@ -148,13 +148,13 @@ Consider these ways to run an MPI program: 2. Two MPI processes per node, 12 threads per process 3. 24 MPI processes per node, 1 thread per process. -**One MPI** process per node, using 24 threads, is most useful for +One MPI** process per node, using 24 threads, is most useful for memory demanding applications, that make good use of processor cache memory and are not memory bound. This is also a preferred way for communication intensive applications as one process per node enjoys full bandwidth access to the network interface. -**Two MPI** processes per node, using 12 threads each, bound to +Two MPI** processes per node, using 12 threads each, bound to processor socket is most useful for memory bandwidth bound applications such as BLAS1 or FFT, with scalable memory demand. However, note that the two processes will share access to the network interface. The 12 @@ -168,7 +168,7 @@ operating system might still migrate OpenMP threads between cores. You want to avoid this by setting the KMP_AFFINITY or GOMP_CPU_AFFINITY environment variables. -**24 MPI** processes per node, using 1 thread each bound to processor +24 MPI** processes per node, using 1 thread each bound to processor core is most suitable for highly scalable applications with low communication demand. diff --git a/converted/docs.it4i.cz/salomon/software/mpi-1/mpi4py-mpi-for-python.md b/converted/docs.it4i.cz/salomon/software/mpi-1/mpi4py-mpi-for-python.md index 86056f58d87fe0760ae20d9fdd02ae6136783e65..fddc6f7d275e4ce7d56828271d4442eae5b0eb4f 100644 --- a/converted/docs.it4i.cz/salomon/software/mpi-1/mpi4py-mpi-for-python.md +++ b/converted/docs.it4i.cz/salomon/software/mpi-1/mpi4py-mpi-for-python.md @@ -4,7 +4,7 @@ MPI4Py (MPI for Python) OpenMPI interface to Python - + Introduction ------------ @@ -30,7 +30,7 @@ MPI4Py is build for OpenMPI. Before you start with MPI4Py you need to load Python and OpenMPI modules. You can use toolchain, that loads Python and OpenMPI at once. - $ module load Python/2.7.9-foss-2015g + $ module load Python/2.7.9-foss-2015g Execution --------- @@ -38,65 +38,65 @@ Execution You need to import MPI to your python program. Include the following line to the python script: - from mpi4py import MPI + from mpi4py import MPI The MPI4Py enabled python programs [execute as any other OpenMPI](Running_OpenMPI.html) code.The simpliest way is to run - $ mpiexec python <script>.py + $ mpiexec python <script>.py -<span>For example</span> +>For example - $ mpiexec python hello_world.py + $ mpiexec python hello_world.py Examples -------- ### Hello world! - from mpi4py import MPI + from mpi4py import MPI - comm = MPI.COMM_WORLD + comm = MPI.COMM_WORLD - print "Hello! I'm rank %d from %d running in total..." % (comm.rank, comm.size) + print "Hello! I'm rank %d from %d running in total..." 
% (comm.rank, comm.size) - comm.Barrier()  # wait for everybody to synchronize + comm.Barrier()  # wait for everybody to synchronize -### <span>Collective Communication with NumPy arrays</span> +### >Collective Communication with NumPy arrays - from __future__ import division - from mpi4py import MPI - import numpy as np + from __future__ import division + from mpi4py import MPI + import numpy as np - comm = MPI.COMM_WORLD + comm = MPI.COMM_WORLD - print("-"*78) - print(" Running on %d cores" % comm.size) - print("-"*78) + print("-"*78) + print(" Running on %d cores" % comm.size) + print("-"*78) - comm.Barrier() + comm.Barrier() - # Prepare a vector of N=5 elements to be broadcasted... - N = 5 - if comm.rank == 0: -   A = np.arange(N, dtype=np.float64)   # rank 0 has proper data - else: -   A = np.empty(N, dtype=np.float64)   # all other just an empty array + # Prepare a vector of N=5 elements to be broadcasted... + N = 5 + if comm.rank == 0: +   A = np.arange(N, dtype=np.float64)   # rank 0 has proper data + else: +   A = np.empty(N, dtype=np.float64)   # all other just an empty array - # Broadcast A from rank 0 to everybody - comm.Bcast( [A, MPI.DOUBLE] ) + # Broadcast A from rank 0 to everybody + comm.Bcast( [A, MPI.DOUBLE] ) - # Everybody should now have the same... - print "[%02d] %s" % (comm.rank, A) + # Everybody should now have the same... + print "[%02d] %s" % (comm.rank, A) Execute the above code as: - $ qsub -q qexp -l select=4:ncpus=24:mpiprocs=24:ompthreads=1 -I + $ qsub -q qexp -l select=4:ncpus=24:mpiprocs=24:ompthreads=1 -I - $ module load Python/2.7.9-foss-2015g + $ module load Python/2.7.9-foss-2015g - $ mpiexec --map-by core --bind-to core python hello_world.py + $ mpiexec --map-by core --bind-to core python hello_world.py In this example, we run MPI4Py enabled code on 4 nodes, 24 cores per node (total of 96 processes), each python process is bound to a diff --git a/converted/docs.it4i.cz/salomon/software/numerical-languages/introduction.md b/converted/docs.it4i.cz/salomon/software/numerical-languages/introduction.md index 165a0da35c1f6a04935a252434ad3dc616ad9805..5dba41d7cb27d6e096013c34ef931f1fb81cf666 100644 --- a/converted/docs.it4i.cz/salomon/software/numerical-languages/introduction.md +++ b/converted/docs.it4i.cz/salomon/software/numerical-languages/introduction.md @@ -4,7 +4,7 @@ Numerical languages Interpreted languages for numerical computations and analysis - + Introduction ------------ @@ -18,10 +18,10 @@ Matlab MATLAB^®^ is a high-level language and interactive environment for numerical computation, visualization, and programming. - $ module load MATLAB - $ matlab + $ module load MATLAB + $ matlab -Read more at the [Matlab<span class="internal-link"></span> +Read more at the [Matlab page](matlab.html). Octave @@ -31,8 +31,8 @@ GNU Octave is a high-level interpreted language, primarily intended for numerical computations. The Octave language is quite similar to Matlab so that most programs are easily portable. - $ module load Octave - $ octave + $ module load Octave + $ octave Read more at the [Octave page](octave.html). @@ -42,8 +42,8 @@ R The R is an interpreted language and environment for statistical computing and graphics. - $ module load R - $ R + $ module load R + $ R Read more at the [R page](r.html). 
diff --git a/converted/docs.it4i.cz/salomon/software/numerical-languages/matlab.md b/converted/docs.it4i.cz/salomon/software/numerical-languages/matlab.md index 32e0f3e37a963de7736613f570bcbff427820435..0e2275e041b2fa63ebd5c4ac4c75beab900280f7 100644 --- a/converted/docs.it4i.cz/salomon/software/numerical-languages/matlab.md +++ b/converted/docs.it4i.cz/salomon/software/numerical-languages/matlab.md @@ -3,7 +3,7 @@ Matlab - + Introduction ------------ @@ -11,24 +11,24 @@ Introduction Matlab is available in versions R2015a and R2015b. There are always two variants of the release: -- Non commercial or so called EDU variant, which can be used for - common research and educational purposes. -- Commercial or so called COM variant, which can used also for - commercial activities. The licenses for commercial variant are much - more expensive, so usually the commercial variant has only subset of - features compared to the EDU available. +- Non commercial or so called EDU variant, which can be used for + common research and educational purposes. +- Commercial or so called COM variant, which can used also for + commercial activities. The licenses for commercial variant are much + more expensive, so usually the commercial variant has only subset of + features compared to the EDU available.  To load the latest version of Matlab load the module - $ module load MATLAB + $ module load MATLAB By default the EDU variant is marked as default. If you need other version or variant, load the particular version. To obtain the list of available versions use - $ module avail MATLAB + $ module avail MATLAB If you need to use the Matlab GUI to prepare your Matlab programs, you can use Matlab directly on the login nodes. But for all computations use @@ -46,16 +46,16 @@ is recommended. To run Matlab with GUI, use - $ matlab + $ matlab To run Matlab in text mode, without the Matlab Desktop GUI environment, use - $ matlab -nodesktop -nosplash + $ matlab -nodesktop -nosplash plots, images, etc... will be still available. -[]()Running parallel Matlab using Distributed Computing Toolbox / Engine +Running parallel Matlab using Distributed Computing Toolbox / Engine ------------------------------------------------------------------------ Distributed toolbox is available only for the EDU variant @@ -72,11 +72,11 @@ To use Distributed Computing, you first need to setup a parallel profile. We have provided the profile for you, you can either import it in MATLAB command line: - >> parallel.importProfile('/apps/all/MATLAB/2015b-EDU/SalomonPBSPro.settings') + >> parallel.importProfile('/apps/all/MATLAB/2015b-EDU/SalomonPBSPro.settings') - ans = + ans = - SalomonPBSPro + SalomonPBSPro Or in the GUI, go to tab HOME -> Parallel -> Manage Cluster Profiles..., click Import and navigate to : @@ -96,9 +96,9 @@ for Matlab GUI. For more information about GUI based applications on Anselm see [this page](../../../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc.html). - $ xhost + - $ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=1 -l walltime=00:30:00 - -l feature__matlab__MATLAB=1 + $ xhost + + $ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=1 -l walltime=00:30:00 + -l feature__matlab__MATLAB=1 This qsub command example shows how to run Matlab on a single node. @@ -109,35 +109,35 @@ Engines licenses. 
Once the access to compute nodes is granted by PBS, user can load following modules and start Matlab: - r1i0n17$ module load MATLAB/2015a-EDU - r1i0n17$ matlab & + r1i0n17$ module load MATLAB/2015a-EDU + r1i0n17$ matlab & -### []()Parallel Matlab batch job in Local mode +### Parallel Matlab batch job in Local mode To run matlab in batch mode, write an matlab script, then write a bash jobscript and execute via the qsub command. By default, matlab will execute one matlab worker instance per allocated core. - #!/bin/bash - #PBS -A PROJECT ID - #PBS -q qprod - #PBS -l select=1:ncpus=24:mpiprocs=24:ompthreads=1 + #!/bin/bash + #PBS -A PROJECT ID + #PBS -q qprod + #PBS -l select=1:ncpus=24:mpiprocs=24:ompthreads=1 - # change to shared scratch directory - SCR=/scratch/work/user/$USER/$PBS_JOBID - mkdir -p $SCR ; cd $SCR || exit + # change to shared scratch directory + SCR=/scratch/work/user/$USER/$PBS_JOBID + mkdir -p $SCR ; cd $SCR || exit - # copy input file to scratch - cp $PBS_O_WORKDIR/matlabcode.m . + # copy input file to scratch + cp $PBS_O_WORKDIR/matlabcode.m . - # load modules - module load MATLAB/2015a-EDU + # load modules + module load MATLAB/2015a-EDU - # execute the calculation - matlab -nodisplay -r matlabcode > output.out + # execute the calculation + matlab -nodisplay -r matlabcode > output.out - # copy output file to home - cp output.out $PBS_O_WORKDIR/. + # copy output file to home + cp output.out $PBS_O_WORKDIR/. This script may be submitted directly to the PBS workload manager via the qsub command. The inputs and matlab script are in matlabcode.m @@ -148,14 +148,14 @@ include quit** statement at the end of the matlabcode.m script. Submit the jobscript using qsub - $ qsub ./jobscript + $ qsub ./jobscript ### Parallel Matlab Local mode program example The last part of the configuration is done directly in the user Matlab script before Distributed Computing Toolbox is started. - cluster = parcluster('local') + cluster = parcluster('local') This script creates scheduler object "cluster" of type "local" that starts workers locally. @@ -167,39 +167,39 @@ function. The last step is to start matlabpool with "cluster" object and correct number of workers. We have 24 cores per node, so we start 24 workers. - parpool(cluster,24); - - - ... parallel code ... + parpool(cluster,24); + + + ... parallel code ... + - - parpool close + parpool close The complete example showing how to use Distributed Computing Toolbox in local mode is shown here. - cluster = parcluster('local'); - cluster + cluster = parcluster('local'); + cluster - parpool(cluster,24); + parpool(cluster,24); - n=2000; + n=2000; - W = rand(n,n); - W = distributed(W); - x = (1:n)'; - x = distributed(x); - spmd - [~, name] = system('hostname') -    -    T = W*x; % Calculation performed on labs, in parallel. -             % T and W are both codistributed arrays here. - end - T; - whos        % T and W are both distributed arrays here. + W = rand(n,n); + W = distributed(W); + x = (1:n)'; + x = distributed(x); + spmd + [~, name] = system('hostname') +    +    T = W*x; % Calculation performed on labs, in parallel. +             % T and W are both codistributed arrays here. + end + T; + whos        % T and W are both distributed arrays here. - parpool close - quit + parpool close + quit You can copy and paste the example in a .m file and execute. Note that the parpool size should correspond to **total number of cores** @@ -214,29 +214,29 @@ it spawns the workers in a separate job submitted by MATLAB using qsub. 
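While the pool starts up, the worker job submitted by MATLAB should appear in the queue alongside your own job; a quick way to check (assuming the standard PBS client tools are available on the node):

    $ qstat -u $USER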
This is an example of m-script using PBS mode: - cluster = parcluster('SalomonPBSPro'); - set(cluster, 'SubmitArguments', '-A OPEN-0-0'); - set(cluster, 'ResourceTemplate', '-q qprod -l select=10:ncpus=24'); - set(cluster, 'NumWorkers', 240); + cluster = parcluster('SalomonPBSPro'); + set(cluster, 'SubmitArguments', '-A OPEN-0-0'); + set(cluster, 'ResourceTemplate', '-q qprod -l select=10:ncpus=24'); + set(cluster, 'NumWorkers', 240); - pool = parpool(cluster,240); + pool = parpool(cluster,240); - n=2000; + n=2000; - W = rand(n,n); - W = distributed(W); - x = (1:n)'; - x = distributed(x); - spmd - [~, name] = system('hostname') + W = rand(n,n); + W = distributed(W); + x = (1:n)'; + x = distributed(x); + spmd + [~, name] = system('hostname') - T = W*x; % Calculation performed on labs, in parallel. - % T and W are both codistributed arrays here. - end - whos % T and W are both distributed arrays here. + T = W*x; % Calculation performed on labs, in parallel. + % T and W are both codistributed arrays here. + end + whos % T and W are both distributed arrays here. - % shut down parallel pool - delete(pool) + % shut down parallel pool + delete(pool) Note that we first construct a cluster object using the imported profile, then set some important options, namely : SubmitArguments, @@ -264,28 +264,28 @@ SalomonPBSPro](matlab.html#running-parallel-matlab-using-distributed-computing-t This is an example of m-script using direct mode: - parallel.importProfile('/apps/all/MATLAB/2015b-EDU/SalomonDirect.settings') - cluster = parcluster('SalomonDirect'); - set(cluster, 'NumWorkers', 48); + parallel.importProfile('/apps/all/MATLAB/2015b-EDU/SalomonDirect.settings') + cluster = parcluster('SalomonDirect'); + set(cluster, 'NumWorkers', 48); - pool = parpool(cluster, 48); + pool = parpool(cluster, 48); - n=2000; + n=2000; - W = rand(n,n); - W = distributed(W); - x = (1:n)'; - x = distributed(x); - spmd - [~, name] = system('hostname') + W = rand(n,n); + W = distributed(W); + x = (1:n)'; + x = distributed(x); + spmd + [~, name] = system('hostname') - T = W*x; % Calculation performed on labs, in parallel. - % T and W are both codistributed arrays here. - end - whos % T and W are both distributed arrays here. + T = W*x; % Calculation performed on labs, in parallel. + % T and W are both codistributed arrays here. + end + whos % T and W are both distributed arrays here. - % shut down parallel pool - delete(pool) + % shut down parallel pool + delete(pool) ### Non-interactive Session and Licenses @@ -308,12 +308,12 @@ getting the resource allocation. Starting Matlab workers is an expensive process that requires certain amount of time. For your information please see the following table: - compute nodes number of workers start-up time[s] - --------------- ------------------- -------------------- - 16 384 831 - 8 192 807 - 4 96 483 - 2 48 16 +compute nodes number of workers start-up time[s] +--------------- ------------------- -------------------- +16 384 831 +8 192 807 +4 96 483 +2 48 16 MATLAB on UV2000 ----------------- @@ -329,13 +329,13 @@ You can use MATLAB on UV2000 in two parallel modes : Since this is a SMP machine, you can completely avoid using Parallel Toolbox and use only MATLAB's threading. 
MATLAB will automatically -detect the number of cores you have allocated and will set <span -class="monospace">maxNumCompThreads </span>accordingly and certain -operations, such as <span class="monospace">fft, , eig, svd</span>, +detect the number of cores you have allocated and will set +maxNumCompThreads accordingly and certain +operations, such as fft, , eig, svd, etc. will be automatically run in threads. The advantage of this mode is -that you don't need to modify your existing sequential codes.<span -class="monospace"> -</span> +that you don't need to modify your existing sequential codes. + + ### Local cluster mode diff --git a/converted/docs.it4i.cz/salomon/software/numerical-languages/octave.md b/converted/docs.it4i.cz/salomon/software/numerical-languages/octave.md index 729eebd783b8e15042c4c1c2b6a1dc023ab4615a..22465fc8dc9b74d0fcfe394e5f5b72ad7b00716f 100644 --- a/converted/docs.it4i.cz/salomon/software/numerical-languages/octave.md +++ b/converted/docs.it4i.cz/salomon/software/numerical-languages/octave.md @@ -3,7 +3,7 @@ Octave - + GNU Octave is a high-level interpreted language, primarily intended for numerical computations. It provides capabilities for the numerical @@ -15,16 +15,16 @@ non-interactive programs. The Octave language is quite similar to Matlab so that most programs are easily portable. Read more on <http://www.gnu.org/software/octave/>**** -** -**Two versions of octave are available on the cluster, via module - Status Version module - ------------ -------------- -------- - **Stable** Octave 3.8.2 Octave +Two versions of octave are available on the cluster, via module + +Status Version module +------------ -------------- -------- +Stable** Octave 3.8.2 Octave  - $ module load Octave + $ module load Octave The octave on the cluster is linked to highly optimized MKL mathematical library. This provides threaded parallelization to many octave kernels, @@ -36,32 +36,32 @@ OMP_NUM_THREADS environment variable. To run octave interactively, log in with ssh -X parameter for X11 forwarding. Run octave: - $ octave + $ octave To run octave in batch mode, write an octave script, then write a bash jobscript and execute via the qsub command. By default, octave will use 16 threads when running MKL kernels. - #!/bin/bash + #!/bin/bash - # change to local scratch directory - mkdir -p /scratch/work/user/$USER/$PBS_JOBID - cd /scratch/work/user/$USER/$PBS_JOBID || exit + # change to local scratch directory + mkdir -p /scratch/work/user/$USER/$PBS_JOBID + cd /scratch/work/user/$USER/$PBS_JOBID || exit - # copy input file to scratch - cp $PBS_O_WORKDIR/octcode.m . + # copy input file to scratch + cp $PBS_O_WORKDIR/octcode.m . - # load octave module - module load Octave + # load octave module + module load Octave - # execute the calculation - octave -q --eval octcode > output.out + # execute the calculation + octave -q --eval octcode > output.out - # copy output file to home - cp output.out $PBS_O_WORKDIR/. + # copy output file to home + cp output.out $PBS_O_WORKDIR/. - #exit - exit + #exit + exit This script may be submitted directly to the PBS workload manager via the qsub command. 
The inputs are in octcode.m file, outputs in diff --git a/converted/docs.it4i.cz/salomon/software/numerical-languages/r.md b/converted/docs.it4i.cz/salomon/software/numerical-languages/r.md index f1ac55755baeed1239f8e90f8fb9561711f8e12a..f75613e70d239aa24ed31ecfe8c9741031f95b1a 100644 --- a/converted/docs.it4i.cz/salomon/software/numerical-languages/r.md +++ b/converted/docs.it4i.cz/salomon/software/numerical-languages/r.md @@ -3,7 +3,7 @@ R - + Introduction ------------ @@ -33,20 +33,20 @@ Read more on <http://www.r-project.org/>, Modules ------- -****The R version 3.1.1 is available on the cluster, along with GUI +**The R version 3.1.1 is available on the cluster, along with GUI interface Rstudio - Application Version module - ------------- -------------- --------------------- - **R** R 3.1.1 R/3.1.1-intel-2015b - **Rstudio** Rstudio 0.97 Rstudio +Application Version module +------------- -------------- --------------------- +R** R 3.1.1 R/3.1.1-intel-2015b +Rstudio** Rstudio 0.97 Rstudio - $ module load R + $ module load R Execution --------- -[]()The R on Anselm is linked to highly optimized MKL mathematical +The R on Anselm is linked to highly optimized MKL mathematical library. This provides threaded parallelization to many R kernels, notably the linear algebra subroutines. The R runs these heavy calculation kernels without any penalty. By default, the R would @@ -58,8 +58,8 @@ OMP_NUM_THREADS environment variable. To run R interactively, using Rstudio GUI, log in with ssh -X parameter for X11 forwarding. Run rstudio: - $ module load Rstudio - $ rstudio + $ module load Rstudio + $ rstudio ### Batch execution @@ -69,25 +69,25 @@ running MKL kernels. Example jobscript: - #!/bin/bash + #!/bin/bash - # change to local scratch directory - cd /lscratch/$PBS_JOBID || exit + # change to local scratch directory + cd /lscratch/$PBS_JOBID || exit - # copy input file to scratch - cp $PBS_O_WORKDIR/rscript.R . + # copy input file to scratch + cp $PBS_O_WORKDIR/rscript.R . - # load R module - module load R + # load R module + module load R - # execute the calculation - R CMD BATCH rscript.R routput.out + # execute the calculation + R CMD BATCH rscript.R routput.out - # copy output file to home - cp routput.out $PBS_O_WORKDIR/. + # copy output file to home + cp routput.out $PBS_O_WORKDIR/. - #exit - exit + #exit + exit This script may be submitted directly to the PBS workload manager via the qsub command. The inputs are in rscript.R file, outputs in @@ -105,7 +105,7 @@ above](r.html#interactive-execution). In the following sections, we focus on explicit parallelization, where parallel constructs are directly stated within the R script. -[]()Package parallel +Package parallel -------------------- The package parallel provides support for parallel computation, @@ -114,15 +114,15 @@ from package snow) and random-number generation. The package is activated this way: - $ R - > library(parallel) + $ R + > library(parallel) More information and examples may be obtained directly by reading the documentation available in R - > ?parallel - > library(help = "parallel") - > vignette("parallel") + > ?parallel + > library(help = "parallel") + > vignette("parallel") Download the package [parallell](package-parallel-vignette) vignette. @@ -139,41 +139,41 @@ Only cores of single node can be utilized this way! 
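Because forking cannot span nodes, allocate a single whole node for such a job. A minimal interactive allocation sketch (the queue and PROJECT_ID are placeholders to be replaced by your own values):

    $ qsub -I -q qprod -A PROJECT_ID -l select=1:ncpus=24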
Forking example: - library(parallel) + library(parallel) - #integrand function - f <- function(i,h) { - x <- h*(i-0.5) - return (4/(1 + x*x)) - } + #integrand function + f <- function(i,h) { + x <- h*(i-0.5) + return (4/(1 + x*x)) + } - #initialize - size <- detectCores() + #initialize + size <- detectCores() - while (TRUE) - { - #read number of intervals - cat("Enter the number of intervals: (0 quits) ") - fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp) + while (TRUE) + { + #read number of intervals + cat("Enter the number of intervals: (0 quits) ") + fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp) - if(n<=0) break + if(n<=0) break - #run the calculation - n <- max(n,size) - h <- 1.0/n + #run the calculation + n <- max(n,size) + h <- 1.0/n - i <- seq(1,n); - pi3 <- h*sum(simplify2array(mclapply(i,f,h,mc.cores=size))); + i <- seq(1,n); + pi3 <- h*sum(simplify2array(mclapply(i,f,h,mc.cores=size))); - #print results - cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) - } + #print results + cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) + } The above example is the classic parallel example for calculating the number Ď€. Note the **detectCores()** and **mclapply()** functions. Execute the example as: - $ R --slave --no-save --no-restore -f pi3p.R + $ R --slave --no-save --no-restore -f pi3p.R Every evaluation of the integrad function runs in parallel on different process. @@ -193,8 +193,8 @@ reference manual is available at When using package Rmpi, both openmpi and R modules must be loaded - $ module load OpenMPI - $ module load R + $ module load OpenMPI + $ module load R Rmpi may be used in three basic ways. The static approach is identical to executing any other MPI programm. In addition, there is Rslaves @@ -202,60 +202,60 @@ dynamic MPI approach and the mpi.apply approach. In the following section, we will use the number Ď€ integration example, to illustrate all these concepts. -### []()static Rmpi +### static Rmpi Static Rmpi programs are executed via mpiexec, as any other MPI programs. Number of processes is static - given at the launch time. 
Static Rmpi example: - library(Rmpi) - - #integrand function - f <- function(i,h) { - x <- h*(i-0.5) - return (4/(1 + x*x)) + library(Rmpi) + + #integrand function + f <- function(i,h) { + x <- h*(i-0.5) + return (4/(1 + x*x)) + } + + #initialize + invisible(mpi.comm.dup(0,1)) + rank <- mpi.comm.rank() + size <- mpi.comm.size() + n<-0 + + while (TRUE) + { + #read number of intervals + if (rank==0) { + cat("Enter the number of intervals: (0 quits) ") + fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp) } - #initialize - invisible(mpi.comm.dup(0,1)) - rank <- mpi.comm.rank() - size <- mpi.comm.size() - n<-0 + #broadcat the intervals + n <- mpi.bcast(as.integer(n),type=1) - while (TRUE) - { - #read number of intervals - if (rank==0) { - cat("Enter the number of intervals: (0 quits) ") - fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp) - } + if(n<=0) break - #broadcat the intervals - n <- mpi.bcast(as.integer(n),type=1) + #run the calculation + n <- max(n,size) + h <- 1.0/n - if(n<=0) break + i <- seq(rank+1,n,size); + mypi <- h*sum(sapply(i,f,h)); - #run the calculation - n <- max(n,size) - h <- 1.0/n + pi3 <- mpi.reduce(mypi) - i <- seq(rank+1,n,size); - mypi <- h*sum(sapply(i,f,h)); - - pi3 <- mpi.reduce(mypi) - - #print results - if (rank==0) cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) - } + #print results + if (rank==0) cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) + } - mpi.quit() + mpi.quit() The above is the static MPI example for calculating the number Ď€. Note the **library(Rmpi)** and **mpi.comm.dup()** function calls. Execute the example as: - $ mpirun R --slave --no-save --no-restore -f pi3.R + $ mpirun R --slave --no-save --no-restore -f pi3.R ### dynamic Rmpi @@ -265,70 +265,70 @@ function call within the Rmpi program. 
Dynamic Rmpi example: - #integrand function - f <- function(i,h) { - x <- h*(i-0.5) - return (4/(1 + x*x)) + #integrand function + f <- function(i,h) { + x <- h*(i-0.5) + return (4/(1 + x*x)) + } + + #the worker function + workerpi <- function() + { + #initialize + rank <- mpi.comm.rank() + size <- mpi.comm.size() + n<-0 + + while (TRUE) + { + #read number of intervals + if (rank==0) { + cat("Enter the number of intervals: (0 quits) ") + fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp) } - #the worker function - workerpi <- function() - { - #initialize - rank <- mpi.comm.rank() - size <- mpi.comm.size() - n<-0 + #broadcat the intervals + n <- mpi.bcast(as.integer(n),type=1) - while (TRUE) - { - #read number of intervals - if (rank==0) { - cat("Enter the number of intervals: (0 quits) ") - fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp) - } + if(n<=0) break - #broadcat the intervals - n <- mpi.bcast(as.integer(n),type=1) + #run the calculation + n <- max(n,size) + h <- 1.0/n - if(n<=0) break + i <- seq(rank+1,n,size); + mypi <- h*sum(sapply(i,f,h)); - #run the calculation - n <- max(n,size) - h <- 1.0/n + pi3 <- mpi.reduce(mypi) - i <- seq(rank+1,n,size); - mypi <- h*sum(sapply(i,f,h)); + #print results + if (rank==0) cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) + } + } - pi3 <- mpi.reduce(mypi) + #main + library(Rmpi) - #print results - if (rank==0) cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) - } - } - - #main - library(Rmpi) - - cat("Enter the number of slaves: ") - fp<-file("stdin"); ns<-scan(fp,nmax=1); close(fp) + cat("Enter the number of slaves: ") + fp<-file("stdin"); ns<-scan(fp,nmax=1); close(fp) - mpi.spawn.Rslaves(nslaves=ns) - mpi.bcast.Robj2slave(f) - mpi.bcast.Robj2slave(workerpi) + mpi.spawn.Rslaves(nslaves=ns) + mpi.bcast.Robj2slave(f) + mpi.bcast.Robj2slave(workerpi) - mpi.bcast.cmd(workerpi()) - workerpi() + mpi.bcast.cmd(workerpi()) + workerpi() - mpi.quit() + mpi.quit() The above example is the dynamic MPI example for calculating the number Ď€. Both master and slave processes carry out the calculation. Note the -**mpi.spawn.Rslaves(), mpi.bcast.Robj2slave()** and the -**mpi.bcast.cmd()** function calls. +mpi.spawn.Rslaves(), mpi.bcast.Robj2slave()** and the +mpi.bcast.cmd()** function calls. Execute the example as: - $ mpirun -np 1 R --slave --no-save --no-restore -f pi3Rslaves.R + $ mpirun -np 1 R --slave --no-save --no-restore -f pi3Rslaves.R Note that this method uses MPI_Comm_spawn (Dynamic process feature of MPI-2) to start the slave processes - the master process needs to be @@ -347,63 +347,63 @@ Execution is identical to other dynamic Rmpi programs. 
mpi.apply Rmpi example: - #integrand function - f <- function(i,h) { - x <- h*(i-0.5) - return (4/(1 + x*x)) - } + #integrand function + f <- function(i,h) { + x <- h*(i-0.5) + return (4/(1 + x*x)) + } - #the worker function - workerpi <- function(rank,size,n) - { - #run the calculation - n <- max(n,size) - h <- 1.0/n + #the worker function + workerpi <- function(rank,size,n) + { + #run the calculation + n <- max(n,size) + h <- 1.0/n - i <- seq(rank,n,size); - mypi <- h*sum(sapply(i,f,h)); + i <- seq(rank,n,size); + mypi <- h*sum(sapply(i,f,h)); - return(mypi) - } + return(mypi) + } - #main - library(Rmpi) + #main + library(Rmpi) - cat("Enter the number of slaves: ") - fp<-file("stdin"); ns<-scan(fp,nmax=1); close(fp) + cat("Enter the number of slaves: ") + fp<-file("stdin"); ns<-scan(fp,nmax=1); close(fp) - mpi.spawn.Rslaves(nslaves=ns) - mpi.bcast.Robj2slave(f) - mpi.bcast.Robj2slave(workerpi) + mpi.spawn.Rslaves(nslaves=ns) + mpi.bcast.Robj2slave(f) + mpi.bcast.Robj2slave(workerpi) - while (TRUE) - { - #read number of intervals - cat("Enter the number of intervals: (0 quits) ") - fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp) - if(n<=0) break + while (TRUE) + { + #read number of intervals + cat("Enter the number of intervals: (0 quits) ") + fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp) + if(n<=0) break - #run workerpi - i=seq(1,2*ns) - pi3=sum(mpi.parSapply(i,workerpi,2*ns,n)) + #run workerpi + i=seq(1,2*ns) + pi3=sum(mpi.parSapply(i,workerpi,2*ns,n)) - #print results - cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) - } + #print results + cat(sprintf("Value of PI %16.14f, diff= %16.14fn",pi3,pi3-pi)) + } - mpi.quit() + mpi.quit() The above is the mpi.apply MPI example for calculating the number Ď€. Only the slave processes carry out the calculation. Note the -**mpi.parSapply(), ** function call. The package <span +mpi.parSapply(), ** function call. The package class="anchor-link">parallel -[example](r.html#package-parallel)</span>[above](r.html#package-parallel) +[example](r.html#package-parallel)[above](r.html#package-parallel) may be trivially adapted (for much better performance) to this structure using the mclapply() in place of mpi.parSapply(). Execute the example as: - $ mpirun -np 1 R --slave --no-save --no-restore -f pi3parSapply.R + $ mpirun -np 1 R --slave --no-save --no-restore -f pi3parSapply.R Combining parallel and Rmpi --------------------------- @@ -420,30 +420,30 @@ submit via the **qsub** Example jobscript for [static Rmpi](r.html#static-rmpi) parallel R execution, running 1 process per core: - #!/bin/bash - #PBS -q qprod - #PBS -N Rjob - #PBS -l select=100:ncpus=24:mpiprocs=24:ompthreads=1 + #!/bin/bash + #PBS -q qprod + #PBS -N Rjob + #PBS -l select=100:ncpus=24:mpiprocs=24:ompthreads=1 - # change to scratch directory - SCRDIR=/scratch/work/user/$USER/myjob - cd $SCRDIR || exit + # change to scratch directory + SCRDIR=/scratch/work/user/$USER/myjob + cd $SCRDIR || exit - # copy input file to scratch - cp $PBS_O_WORKDIR/rscript.R . + # copy input file to scratch + cp $PBS_O_WORKDIR/rscript.R . - # load R and openmpi module - module load R - module load OpenMPI + # load R and openmpi module + module load R + module load OpenMPI - # execute the calculation - mpiexec -bycore -bind-to-core R --slave --no-save --no-restore -f rscript.R + # execute the calculation + mpiexec -bycore -bind-to-core R --slave --no-save --no-restore -f rscript.R - # copy output file to home - cp routput.out $PBS_O_WORKDIR/. 
+ # copy output file to home + cp routput.out $PBS_O_WORKDIR/. - #exit - exit + #exit + exit For more information about jobscripts and MPI execution refer to the [Job @@ -460,7 +460,7 @@ linear algebra operations on the Xeon Phi accelerator by using Automated Offload. To use MKL Automated Offload, you need to first set this environment variable before R execution : - $ export MKL_MIC_ENABLE=1 + $ export MKL_MIC_ENABLE=1 [Read more about automatic offload](../intel-xeon-phi.html) diff --git a/converted/docs.it4i.cz/salomon/software/operating-system.md b/converted/docs.it4i.cz/salomon/software/operating-system.md index e1356212ff7701988f3cea747f6c2ca983dbac81..12519fd4a000dbf83ef13e7293a62d25d7150cca 100644 --- a/converted/docs.it4i.cz/salomon/software/operating-system.md +++ b/converted/docs.it4i.cz/salomon/software/operating-system.md @@ -4,13 +4,13 @@ Operating System The operating system, deployed on Salomon cluster - + The operating system on Salomon is Linux - CentOS 6.6. -<span>The CentOS Linux distribution is a stable, predictable, manageable +>The CentOS Linux distribution is a stable, predictable, manageable and reproducible platform derived from the sources of Red Hat Enterprise -Linux (RHEL).</span> +Linux (RHEL). diff --git a/converted/docs.it4i.cz/salomon/storage/cesnet-data-storage.md b/converted/docs.it4i.cz/salomon/storage/cesnet-data-storage.md index 2f65b5a9b91aa4b44fc2ac836760cc8497b53a3d..a5dfb47922cded184c2b07602830603efd4d70c4 100644 --- a/converted/docs.it4i.cz/salomon/storage/cesnet-data-storage.md +++ b/converted/docs.it4i.cz/salomon/storage/cesnet-data-storage.md @@ -3,7 +3,7 @@ CESNET Data Storage - + Introduction ------------ @@ -66,31 +66,31 @@ than copied in and out in a usual fashion. First, create the mountpoint - $ mkdir cesnet + $ mkdir cesnet Mount the storage. Note that you can choose among the ssh.du1.cesnet.cz (Plzen), ssh.du2.cesnet.cz (Jihlava), ssh.du3.cesnet.cz (Brno) Mount tier1_home **(only 5120M !)**: - $ sshfs username@ssh.du1.cesnet.cz:. cesnet/ + $ sshfs username@ssh.du1.cesnet.cz:. cesnet/ For easy future access from Anselm, install your public key - $ cp .ssh/id_rsa.pub cesnet/.ssh/authorized_keys + $ cp .ssh/id_rsa.pub cesnet/.ssh/authorized_keys Mount tier1_cache_tape for the Storage VO: - $ sshfs username@ssh.du1.cesnet.cz:/cache_tape/VO_storage/home/username cesnet/ + $ sshfs username@ssh.du1.cesnet.cz:/cache_tape/VO_storage/home/username cesnet/ View the archive, copy the files and directories in and out - $ ls cesnet/ - $ cp -a mydir cesnet/. - $ cp cesnet/myfile . + $ ls cesnet/ + $ cp -a mydir cesnet/. + $ cp cesnet/myfile . Once done, please remember to unmount the storage - $ fusermount -u cesnet + $ fusermount -u cesnet ### Rsync access @@ -117,13 +117,13 @@ More about Rsync at Transfer large files to/from Cesnet storage, assuming membership in the Storage VO - $ rsync --progress datafile username@ssh.du1.cesnet.cz:VO_storage-cache_tape/. - $ rsync --progress username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafile . + $ rsync --progress datafile username@ssh.du1.cesnet.cz:VO_storage-cache_tape/. + $ rsync --progress username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafile . Transfer large directories to/from Cesnet storage, assuming membership in the Storage VO - $ rsync --progress -av datafolder username@ssh.du1.cesnet.cz:VO_storage-cache_tape/. - $ rsync --progress -av username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafolder . + $ rsync --progress -av datafolder username@ssh.du1.cesnet.cz:VO_storage-cache_tape/. 
+ $ rsync --progress -av username@ssh.du1.cesnet.cz:VO_storage-cache_tape/datafolder . Transfer rates of about 28MB/s can be expected. diff --git a/converted/docs.it4i.cz/salomon/storage/storage.md b/converted/docs.it4i.cz/salomon/storage/storage.md index 7b85a45db114de07de9236a1071890cec38eaef7..d0a381ed39b91a740d2d24315d3124bf4bbe7e27 100644 --- a/converted/docs.it4i.cz/salomon/storage/storage.md +++ b/converted/docs.it4i.cz/salomon/storage/storage.md @@ -3,16 +3,16 @@ Storage - + Introduction ------------ -There are two main shared file systems on Salomon cluster, the [<span -class="anchor-link"><span -class="anchor-link">HOME</span></span>](storage.html#home) -and [<span class="anchor-link"><span -class="anchor-link">SCRATCH</span></span>](storage.html#shared-filesystems). +There are two main shared file systems on Salomon cluster, the [ +class="anchor-link"> +class="anchor-link">HOME](storage.html#home) +and [ class="anchor-link"> +class="anchor-link">SCRATCH](storage.html#shared-filesystems). All login and compute nodes may access same data on shared filesystems. Compute nodes are also equipped with local (non-shared) scratch, ramdisk and tmp filesystems. @@ -28,7 +28,7 @@ Use [TEMP](storage.html#temp) for large scratch data. Do not use for [archiving](storage.html#archiving)! -[]()Archiving +Archiving ------------- Please don't use shared filesystems as a backup for large amount of data @@ -37,12 +37,12 @@ institutions in the Czech Republic can use [CESNET storage service](../../anselm-cluster-documentation/storage-1/cesnet-data-storage.html), which is available via SSHFS. -[]()Shared Filesystems +Shared Filesystems ---------------------- -Salomon computer provides two main shared filesystems, the [<span +Salomon computer provides two main shared filesystems, the [ class="anchor-link">HOME -filesystem</span>](storage.html#home-filesystem) and the +filesystem](storage.html#home-filesystem) and the [SCRATCH filesystem](storage.html#scratch-filesystem). The SCRATCH filesystem is partitioned to [WORK and TEMP workspaces](storage.html#shared-workspaces). The HOME @@ -52,7 +52,7 @@ systems are accessible via the Infiniband network. Extended ACLs are provided on both HOME/SCRATCH filesystems for the purpose of sharing data with other users using fine-grained control. -### []()[]()HOME filesystem +###HOME filesystem The HOME filesystem is realized as a Tiered filesystem, exported via NFS. The first tier has capacity 100TB, second tier has capacity 400TB. @@ -60,7 +60,7 @@ The filesystem is available on all login and computational nodes. The Home filesystem hosts the [HOME workspace](storage.html#home). -### []()[]()SCRATCH filesystem +###SCRATCH filesystem The architecture of Lustre on Salomon is composed of two metadata servers (MDS) and six data/object storage servers (OSS). Accessible @@ -68,29 +68,29 @@ capacity is 1.69 PB, shared among all users. The SCRATCH filesystem hosts the [WORK and TEMP workspaces](storage.html#shared-workspaces). 
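As a quick sketch (the /scratch mount point is taken from the examples further below; lfs df is standard Lustre tooling and any output would be illustrative only), the capacity and its distribution over the individual OSTs may be inspected from any login or compute node:

```
# free and used space on the SCRATCH filesystem, broken down per OST
$ lfs df -h /scratch

# number and status of active OSTs (see also the striping notes below)
$ lfs check osts
```
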
-<span class="listitem">Configuration of the SCRATCH Lustre storage -</span> + class="listitem">Configuration of the SCRATCH Lustre storage -<span class="emphasis"></span><span class="emphasis"></span> -- <span class="emphasis">SCRATCH Lustre object storage</span> - <div class="itemizedlist"> - - Disk array SFA12KX - - 540 4TB SAS 7.2krpm disks - - 54 OSTs of 10 disks in RAID6 (8+2) - - 15 hot-spare disks - - 4x 400GB SSD cache + class="emphasis"> class="emphasis"> +- class="emphasis">SCRATCH Lustre object storage + <div class="itemizedlist"> - + - Disk array SFA12KX + - 540 4TB SAS 7.2krpm disks + - 54 OSTs of 10 disks in RAID6 (8+2) + - 15 hot-spare disks + - 4x 400GB SSD cache + + -- <span class="emphasis">SCRATCH Lustre metadata storage</span> - <div class="itemizedlist"> +- class="emphasis">SCRATCH Lustre metadata storage + <div class="itemizedlist"> - - Disk array EF3015 - - 12 600GB SAS 15krpm disks + - Disk array EF3015 + - 12 600GB SAS 15krpm disks - + @@ -103,15 +103,15 @@ A user file on the Lustre filesystem can be divided into multiple chunks (OSTs) (disks). The stripes are distributed among the OSTs in a round-robin fashion to ensure load balancing. -When a client (a <span class="glossaryItem">compute <span -class="glossaryItem">node</span></span> from your job) needs to create -or access a file, the client queries the metadata server (<span -class="glossaryItem">MDS</span>) and the metadata target (<span -class="glossaryItem">MDT</span>) for the layout and location of the +When a client (a class="glossaryItem">compute +class="glossaryItem">node from your job) needs to create +or access a file, the client queries the metadata server ( +class="glossaryItem">MDS) and the metadata target ( +class="glossaryItem">MDT) for the layout and location of the [file's stripes](http://www.nas.nasa.gov/hecc/support/kb/Lustre_Basics_224.html#striping). Once the file is opened and the client obtains the striping information, -the <span class="glossaryItem">MDS</span> is no longer involved in the +the class="glossaryItem">MDS is no longer involved in the file I/O process. The client interacts directly with the object storage servers (OSSes) and OSTs to perform I/O operations such as locking, disk allocation, storage, and retrieval. @@ -124,17 +124,17 @@ There is default stripe configuration for Salomon Lustre filesystems. However, users can set the following stripe parameters for their own directories or files to get optimum I/O performance: -1. stripe_size: the size of the chunk in bytes; specify with k, m, or - g to use units of KB, MB, or GB, respectively; the size must be an - even multiple of 65,536 bytes; default is 1MB for all Salomon Lustre - filesystems -2. stripe_count the number of OSTs to stripe across; default is 1 for - Salomon Lustre filesystems one can specify -1 to use all OSTs in - the filesystem. -3. stripe_offset The index of the <span - class="glossaryItem">OST</span> where the first stripe is to be - placed; default is -1 which results in random selection; using a - non-default value is NOT recommended. +1.stripe_size: the size of the chunk in bytes; specify with k, m, or + g to use units of KB, MB, or GB, respectively; the size must be an + even multiple of 65,536 bytes; default is 1MB for all Salomon Lustre + filesystems +2.stripe_count the number of OSTs to stripe across; default is 1 for + Salomon Lustre filesystems one can specify -1 to use all OSTs in + the filesystem. 
+3.stripe_offset The index of the + class="glossaryItem">OST where the first stripe is to be + placed; default is -1 which results in random selection; using a + non-default value is NOT recommended.  @@ -146,22 +146,22 @@ setstripe command for setting the stripe parameters to get optimal I/O performance The correct stripe setting depends on your needs and file access patterns. -``` +``` $ lfs getstripe dir|filename $ lfs setstripe -s stripe_size -c stripe_count -o stripe_offset dir|filename ``` Example: -``` +``` $ lfs getstripe /scratch/work/user/username /scratch/work/user/username -stripe_count: 1 stripe_size: 1048576 stripe_offset: -1 +stripe_count: 1 stripe_size: 1048576 stripe_offset: -1 $ lfs setstripe -c -1 /scratch/work/user/username/ $ lfs getstripe /scratch/work/user/username/ /scratch/work/user/username/ -stripe_count: -1 stripe_size: 1048576 stripe_offset: -1 +stripe_count:-1 stripe_size: 1048576 stripe_offset: -1 ``` In this example, we view current stripe setting of the @@ -172,7 +172,7 @@ all (54) OSTs Use lfs check OSTs to see the number and status of active OSTs for each filesystem on Salomon. Learn more by reading the man page -``` +``` $ lfs check osts $ man lfs ``` @@ -208,23 +208,23 @@ on a single-stripe file. Read more on <http://wiki.lustre.org/manual/LustreManual20_HTML/ManagingStripingFreeSpace.html> -<span>Disk usage and quota commands</span> +>Disk usage and quota commands ------------------------------------------ -<span>User quotas on the Lustre file systems (SCRATCH) can be checked -and reviewed using following command:</span> +>User quotas on the Lustre file systems (SCRATCH) can be checked +and reviewed using following command: -``` +``` $ lfs quota dir ``` Example for Lustre SCRATCH directory: -``` +``` $ lfs quota /scratch Disk quotas for user user001 (uid 1234): - Filesystem kbytes quota limit grace files quota limit grace -  /scratch    8    0 100000000000    -    3    0    0    - + Filesystem kbytes quota limit grace files quota limit grace +  /scratch    8    0 100000000000    -    3    0    0    - Disk quotas for group user001 (gid 1234): Filesystem kbytes quota limit grace files quota limit grace /scratch    8    0    0    -    3    0    0    - @@ -236,33 +236,33 @@ currently used by user001. HOME directory is mounted via NFS, so a different command must be used to obtain quota information: -  $ quota +  $ quota Example output: - $ quota - Disk quotas for user vop999 (uid 1025): - Filesystem blocks quota limit grace files quota limit grace - home-nfs-ib.salomon.it4i.cz:/home - 28 0 250000000 10 0 500000 + $ quota + Disk quotas for user vop999 (uid 1025): + Filesystem blocks quota limit grace files quota limit grace + home-nfs-ib.salomon.it4i.cz:/home + 28 0 250000000 10 0 500000 To have a better understanding of where the space is exactly used, you can use following command to find out. -``` +``` $ du -hs dir ``` Example for your HOME directory: -``` +``` $ cd /home $ du -hs * .[a-zA-z0-9]* | grep -E "[0-9]*G|[0-9]*M" | sort -hr -258M cuda-samples -15M .cache -13M .mozilla -5,5M .eclipse -2,7M .idb_13.0_linux_intel64_app +258M cuda-samples +15M .cache +13M .mozilla +5,5M .eclipse +2,7M .idb_13.0_linux_intel64_app ``` This will list all directories which are having MegaBytes or GigaBytes @@ -271,14 +271,14 @@ is sorted in descending order from largest to smallest files/directories. -<span>To have a better understanding of previous commands, you can read -manpages.</span> +>To have a better understanding of previous commands, you can read +manpages. 
-``` +``` $ man lfs ``` -``` +``` $ man du ``` @@ -295,7 +295,7 @@ ACLs on a Lustre file system work exactly like ACLs on any Linux file system. They are manipulated with the standard tools in the standard manner. Below, we create a directory and allow a specific user access. -``` +``` [vop999@login1.salomon ~]$ umask 027 [vop999@login1.salomon ~]$ mkdir test [vop999@login1.salomon ~]$ ls -ld test @@ -331,18 +331,18 @@ for more information on Linux ACL: [http://www.vanemery.com/Linux/ACL/POSIX_ACL_on_Linux.html ](http://www.vanemery.com/Linux/ACL/POSIX_ACL_on_Linux.html) -[]()Shared Workspaces +Shared Workspaces --------------------- -### []()[]()HOME +###HOME Users home directories /home/username reside on HOME filesystem. Accessible capacity is 0.5PB, shared among all users. Individual users are restricted by filesystem usage quotas, set to 250GB per user. -<span>If 250GB should prove as insufficient for particular user, please -contact</span> [support](https://support.it4i.cz/rt), +>If 250GB should prove as insufficient for particular user, please +contact [support](https://support.it4i.cz/rt), the quota may be lifted upon request. The HOME filesystem is intended for preparation, evaluation, processing @@ -371,14 +371,14 @@ User quota 250GB Protocol NFS, 2-Tier -### []()WORK +### WORK The WORK workspace resides on SCRATCH filesystem. Users may create subdirectories and files in directories **/scratch/work/user/username** and **/scratch/work/project/projectid. **The /scratch/work/user/username is private to user, much like the home directory. The /scratch/work/project/projectid is accessible to all users involved in -project projectid. <span></span> +project projectid. > The WORK workspace is intended to store users project data as well as for high performance access to input and output files. All project data @@ -414,7 +414,7 @@ Number of OSTs 54 Protocol Lustre -### []()TEMP +### TEMP The TEMP workspace resides on SCRATCH filesystem. The TEMP workspace accesspoint is /scratch/temp. Users may freely create subdirectories @@ -422,10 +422,10 @@ and files on the workspace. Accessible capacity is 1.6P, shared among all users on TEMP and WORK. Individual users are restricted by filesystem usage quotas, set to 100TB per user. The purpose of this quota is to prevent runaway programs from filling the entire filesystem -and deny service to other users. <span>If 100TB should prove as +and deny service to other users. >If 100TB should prove as insufficient for particular user, please contact [support](https://support.it4i.cz/rt), the quota may be -lifted upon request. </span> +lifted upon request. The TEMP workspace is intended for temporary scratch data generated during the calculation as well as for high performance access to input @@ -476,7 +476,7 @@ size during your calculation. Be very careful, use of RAM disk filesystem is at the expense of operational memory. -[]()The local RAM disk is mounted as /ramdisk and is accessible to user +The local RAM disk is mounted as /ramdisk and is accessible to user at /ramdisk/$PBS_JOBID directory. The local RAM disk filesystem is intended for temporary scratch data @@ -493,9 +493,9 @@ the output data from within the jobscript. 
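A minimal jobscript fragment illustrating that pattern (the input/output file names and the program are placeholders; only the /ramdisk/$PBS_JOBID path comes from this section):

```
# stage input data onto the per-job RAM disk (purged when the job ends)
cd /ramdisk/$PBS_JOBID
cp $PBS_O_WORKDIR/input.dat .

# run the calculation against the node-local RAM disk (placeholder program)
./myprog input.dat > output.dat

# copy results back to a shared filesystem before the job finishes
cp output.dat $PBS_O_WORKDIR/.
```
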
RAM disk Mountpoint -<span class="monospace">/ramdisk</span> + /ramdisk Accesspoint -<span class="monospace">/ramdisk/$PBS_JOBID</span> + /ramdisk/$PBS_JOBID Capacity 120 GB Throughput @@ -507,22 +507,22 @@ none  -**Summary -** +Summary + ---------- - ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- - Mountpoint Usage Protocol Net Capacity Throughput Limitations Access Services - ---------------------------------------------- -------------------------------- ------------- -------------- ------------ ------------- ------------------------- ----------------------------- - <span class="monospace">/home</span> home directory NFS, 2-Tier 0.5 PB 6 GB/s Quota 250GB Compute and login nodes backed up +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +Mountpoint Usage Protocol Net Capacity Throughput Limitations Access Services +---------------------------------------------- -------------------------------- ------------- -------------- ------------ ------------- ------------------------- ----------------------------- + /home home directory NFS, 2-Tier 0.5 PB 6 GB/s Quota 250GB Compute and login nodes backed up - <span class="monospace">/scratch/work</span> large project files Lustre 1.69 PB 30 GB/s Quota Compute and login nodes none - 1TB + /scratch/work large project files Lustre 1.69 PB 30 GB/s Quota Compute and login nodes none + 1TB - <span class="monospace">/scratch/temp</span> job temporary data Lustre 1.69 PB 30 GB/s Quota 100TB Compute and login nodes files older 90 days removed + /scratch/temp job temporary data Lustre 1.69 PB 30 GB/s Quota 100TB Compute and login nodes files older 90 days removed - <span class="monospace">/ramdisk</span> job temporary data, node local local 120GB 90 GB/s none Compute nodes purged after job ends - ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + /ramdisk job temporary data, node local local 120GB 90 GB/s none Compute nodes purged after job ends +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  diff --git a/converted/docs.it4i.cz/salomon/04ce7514-8d27-4cdb-bf0f-45d875c75df0.jpeg b/converted/docs.it4i.cz/salomon/uv-2000.jpeg similarity index 100% rename from converted/docs.it4i.cz/salomon/04ce7514-8d27-4cdb-bf0f-45d875c75df0.jpeg rename to converted/docs.it4i.cz/salomon/uv-2000.jpeg diff --git a/html_md.sh b/html_md.sh index 55f329c6ec0ff2b0585f09baf2a2066f4d88dd15..b4bf49f851488668faef5194e058863a60ba0002 100755 --- a/html_md.sh +++ b/html_md.sh @@ -2,9 +2,9 @@ ### DOWNLOAD AND CONVERT DOCUMENTATION # autor: kru0052 -# version: 1.1 -# change: converted files moved to new directory with images, deleted witch for info (-i) and deleting files (-d) -# bugs: bad formatting tables, bad links for images and other files, stayed a few html elements +# version: 1.3 +# change: repair images links +# bugs: bad formatting tables, bad links for other files, stayed a few html elements ### if [ "$1" = "-t" ]; then @@ -12,13 +12,127 @@ if [ "$1" = "-t" ]; then echo "Testing..." 
+ #grep ./docs.it4i.cz/salomon/software/ansys/Fluent_Licence_2.jpg ./converted/ -R + + (while read i; + do + if [ -f "$i" ]; + then + #echo "$i"; + test=$(grep "$i" ./converted/ -R) + if [ ! "$test" ] + then + continue + else + echo "$test" >> test.txt + + fi + + fi + + done) < ./info/list_image.txt + + fi + +if [ "$1" = "-t1" ]; then + # testing new function + + echo "Testing 1..." + + rm -rf ./converted + + # exists file for move? + if [ -f ./info/list_md.txt ]; + then + mkdir converted; + (while read i; + do + mkdir "./converted/$i"; + done) < ./source/list_folder.txt + + # move md files to new folders + while read a b ; do + cp "$a" "./converted/$b"; + + done < <(paste ./info/list_md.txt ./source/list_md_mv.txt) + # copy jpg and jpeg to new folders + + #cat "${i%.*}TMP.md" > "${i%.*}.md"; + + while read a b ; do cp "$a" "./converted/$b"; done < <(paste ./info/list_image.txt ./source/list_image_mv.txt) + + while read a ; do + + echo "$a"; + + sed -e 's/``` /```/' "./converted/$a" | sed -e 's/ //' | sed -e 's/<span class="pln">//' | sed -e 's/<span//' | sed -e 's/class="pln">//' | sed -e 's/<\/span>//' | sed -e 's/^\*\*//' | sed -e 's/\^\[>\[1<span>\]<\/span>\]//' > "./converted/${a%.*}TMP.md"; + + while read x ; do + arg1=`echo "$x" | cut -d"&" -f1 | sed 's:[]\[\^\$\.\*\/\"]:\\\\&:g'`; + arg2=`echo $x | cut -d"&" -f2 | sed 's:[]\[\^\$\.\*\/\"]:\\\\&:g'`; + #echo "$arg1"; + #echo ">$arg2"; + + sed -e 's/'"$arg1"'/'"$arg2"'/' "./converted/${a%.*}TMP.md" > "./converted/${a%TMP.*}.TEST.md"; + cat "./converted/${a%TMP.*}.TEST.md" > "./converted/${a%.*}TMP.md"; + done < ./source/replace.txt + cat "./converted/${a%.*}TMP.md" > "./converted/${a%.*}.md"; + rm "./converted/${a%.*}TMP.md"; + rm "./converted/${a%TMP.*}.TEST.md"; + done <./source/list_md_mv.txt + + else + echo "list_md.txt not exists!!!!!" + fi + + +fi + +if [ "$1" = "-t2" ]; then + # testing new function + + + # sed -e 's/[/' + #^[>[1<span>]</span>] + + # sed -e 's/'"$arg1"'/'"$arg2"'/g' + + + echo "Testing 2 ..." 
+ + sed -e 's/``` /```/' /home/kru0052/Desktop/docs.it4i_new/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc.md | sed -e 's/ //' | sed -e 's/^ **//'| sed -e 's/^R/R /' | sed -e 's/^=/===/' | sed -e 's/<span class="pln">//' | sed -e 's/<span//' | sed -e 's/class="pln">//' | sed -e 's/<\/span>//' | sed -e 's/### \[\]()\[\]()/###/' | sed -e 's/^\*\*//' | sed -e 's/\^\[>\[1<span>\]<\/span>\]//' > /home/kru0052/Desktop/docs.it4i_new/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc2.md + + + while read a ; do + arg1=`echo "$a" | cut -d"&" -f1 | sed 's:[]\[\^\$\.\*\/\"]:\\\\&:g'`; + arg2=`echo $a | cut -d"&" -f2 | sed 's:[]\[\^\$\.\*\/\"]:\\\\&:g'`; + echo "$arg1"; + echo "$arg2"; + + sed -e 's/'"$arg1"'/'"$arg2"'/' /home/kru0052/Desktop/docs.it4i_new/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc2.md > /home/kru0052/Desktop/docs.it4i_new/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc3.md; + cat /home/kru0052/Desktop/docs.it4i_new/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc3.md > /home/kru0052/Desktop/docs.it4i_new/converted/docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vnc2.md; + + done < ./source/replace_in.txt + + + +fi + if [ "$1" = "-w" ]; then # download html pages wget -X pbspro-documentation,changelog,whats-new,portal_css,portal_javascripts,++resource++jquery-ui-themes,anselm-cluster-documentation/icon.jpg -R favicon.ico,pdf.png,logo.png,background.png,application.png,search_icon.png,png.png,sh.png,touch_icon.png,anselm-cluster-documentation/icon.jpg,*js,robots.txt,*xml,RSS,download_icon.png,pdf,*zip,*rar,@@*,anselm-cluster-documentation/icon.jpg.1 --mirror --convert-links --adjust-extension --page-requisites --no-parent https://docs.it4i.cz; + + wget http://verif.cs.vsb.cz/aislinn/doc/report.png + mv report.png ./converted/salomon/software/debuggers/ + fi if [ "$1" = "-c" ]; then ### convert html to md + # erasing the previous transfer + rm -rf converted; + rm -rf info; + find . 
-name "*.ht*" | while read i; do @@ -83,15 +197,11 @@ if [ "$1" = "-c" ]; then done) < ./source/list_rm.txt ### create new folder and move converted files - # erasing the previous transfer - rm -rf converted; - rm -rf info; - # create folder info and view all files and folder mkdir info; - find ./docs.it4i.cz -name "*.png" -type f> ./info/list_png.txt; - find ./docs.it4i.cz -name "*.jpg" -type f> ./info/list_jpg.txt; - find ./docs.it4i.cz -name "*.jpeg" -type f>> ./info/list_jpg.txt; + find ./docs.it4i.cz -name "*.png" -type f > ./info/list_image.txt; + find ./docs.it4i.cz -name "*.jpg" -type f >> ./info/list_image.txt; + find ./docs.it4i.cz -name "*.jpeg" -type f >> ./info/list_image.txt; find ./docs.it4i.cz -name "*.md" -type f> ./info/list_md.txt; find ./docs.it4i.cz -type d | sort > ./info/list_folder.txt @@ -105,11 +215,9 @@ if [ "$1" = "-c" ]; then done) < ./source/list_folder.txt # move md files to new folders - while read a b ; do mv "$a" "./converted/$b"; done < <(paste ./info/list_md.txt ./source/list_md_mv.txt) + while read a b ; do cp "$a" "./converted/$b"; done < <(paste ./info/list_md.txt ./source/list_md_mv.txt) # copy jpg and jpeg to new folders - while read a b ; do cp "$a" "./converted/$b"; done < <(paste ./info/list_jpg.txt ./source/list_jpg_mv.txt) - # copy png files to new folders - while read a b ; do cp "$a" "./converted/$b"; done < <(paste ./info/list_png.txt ./source/list_png_mv.txt) + while read a b ; do cp "$a" "./converted/$b"; done < <(paste ./info/list_image.txt ./source/list_image_mv.txt) else echo "list_md.txt not exists!!!!!" fi diff --git a/source/list_png_mv.txt b/source/list_image_mv.txt similarity index 52% rename from source/list_png_mv.txt rename to source/list_image_mv.txt index bead13a74ab4b7f0a1a2d5a6bd6cd797104a514d..3447b7aa4d9d2531c52637e92bd1810182621ece 100644 --- a/source/list_png_mv.txt +++ b/source/list_image_mv.txt @@ -1,10 +1,10 @@ ./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/TightVNC_login.png ./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/putty-tunnel.png ./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/gnome-terminal.png -./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/8e80a92f-f691-4d92-8e62-344128dcc00b.png +./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/gdmscreensaver.png ./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/gnome-compute-nodes-over-vnc.png ./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/gdmdisablescreensaver.png -./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/0f5b58e3-253c-4f87-a3b2-16f75cbf090f.png +./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/cygwinX11forwarding.png ./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/XWinlistentcp.png ./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/PuttyKeygenerator_004V.png ./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/20150312_143443.png @@ -18,47 +18,74 @@ ./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/PuTTY_open_Salomon.png 
./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/PuttyKeygenerator_002V.png ./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/PuttyKeygenerator_006V.png -./docs.it4i.cz/salomon/copy_of_vpn_web_install_3.png -./docs.it4i.cz/salomon/vpn_contacting.png +./docs.it4i.cz/salomon/accessing-the-cluster/copy_of_vpn_web_install_3.png +./docs.it4i.cz/salomon/accessing-the-cluster/vpn_contacting.png ./docs.it4i.cz/salomon/resource-allocation-and-job-execution/rswebsalomon.png -./docs.it4i.cz/salomon/vpn_successfull_connection.png -./docs.it4i.cz/salomon/vpn_web_install_2.png -./docs.it4i.cz/salomon/vpn_web_login_2.png -./docs.it4i.cz/salomon/7758b792-24eb-48dc-bf72-618cda100fda.png +./docs.it4i.cz/salomon/accessing-the-cluster/vpn_successfull_connection.png +./docs.it4i.cz/salomon/accessing-the-cluster/vpn_web_install_2.png +./docs.it4i.cz/salomon/accessing-the-cluster/vpn_web_login_2.png +./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/gnome_screen.png ./docs.it4i.cz/salomon/network-1/IBsingleplanetopologyAcceleratednodessmall.png ./docs.it4i.cz/salomon/network-1/IBsingleplanetopologyICEXMcellsmall.png ./docs.it4i.cz/salomon/network-1/Salomon_IB_topology.png ./docs.it4i.cz/salomon/network-1/7D_Enhanced_hypercube.png -./docs.it4i.cz/salomon/vpn_web_login.png -./docs.it4i.cz/salomon/vpn_login.png +./docs.it4i.cz/salomon/accessing-the-cluster/vpn_web_login.png +./docs.it4i.cz/salomon/accessing-the-cluster/vpn_login.png ./docs.it4i.cz/salomon/software/debuggers/totalview2.png -./docs.it4i.cz/salomon/software/debuggers/3550e4ae-2eab-4571-8387-11a112dd6ca8.png +./docs.it4i.cz/salomon/software/debuggers/Snmekobrazovky20160211v14.27.45.png ./docs.it4i.cz/salomon/software/debuggers/ddt1.png ./docs.it4i.cz/salomon/software/debuggers/totalview1.png -./docs.it4i.cz/salomon/software/debuggers/42d90ce5-8468-4edb-94bb-4009853d9f65.png -./docs.it4i.cz/salomon/software/intel-suite/fb3b3ac2-a88f-4e55-a25e-23f1da2200cb.png -./docs.it4i.cz/salomon/software/ansys/a34a45cc-9385-4f05-b12e-efadf1bd93bb.png -./docs.it4i.cz/salomon/vpn_contacting_https_cluster.png -./docs.it4i.cz/salomon/vpn_web_download.png -./docs.it4i.cz/salomon/vpn_web_download_2.png -./docs.it4i.cz/salomon/vpn_contacting_https.png -./docs.it4i.cz/salomon/vpn_web_install_4.png -./docs.it4i.cz/anselm-cluster-documentation/bb4cedff-4cb6-402b-ac79-039186fe5df3.png -./docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/job_sort_formula.png -./docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/fairshare_formula.png +./docs.it4i.cz/salomon/software/debuggers/Snmekobrazovky20160708v12.33.35.png +./docs.it4i.cz/salomon/software/intel-suite/Snmekobrazovky20151204v15.35.12.png +./docs.it4i.cz/salomon/software/ansys/AMsetPar1.png +./docs.it4i.cz/salomon/accessing-the-cluster/vpn_contacting_https_cluster.png +./docs.it4i.cz/salomon/accessing-the-cluster/vpn_web_download.png +./docs.it4i.cz/salomon/accessing-the-cluster/vpn_web_download_2.png +./docs.it4i.cz/salomon/accessing-the-cluster/vpn_contacting_https.png +./docs.it4i.cz/salomon/accessing-the-cluster/vpn_web_install_4.png +./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vncviewer.png +./docs.it4i.cz/salomon/resource-allocation-and-job-execution/job_sort_formula.png +./docs.it4i.cz/salomon/resource-allocation-and-job-execution/fairshare_formula.png 
./docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/rsweb.png ./docs.it4i.cz/anselm-cluster-documentation/quality2.png ./docs.it4i.cz/anselm-cluster-documentation/turbovncclientsetting.png -./docs.it4i.cz/anselm-cluster-documentation/Authorization_chain.png +./docs.it4i.cz/get-started-with-it4innovations/obtaining-login-credentials/Authorization_chain.png ./docs.it4i.cz/anselm-cluster-documentation/scheme.png ./docs.it4i.cz/anselm-cluster-documentation/quality3.png ./docs.it4i.cz/anselm-cluster-documentation/legend.png ./docs.it4i.cz/anselm-cluster-documentation/bullxB510.png -./docs.it4i.cz/anselm-cluster-documentation/software/debuggers/3d4533af-8ce5-4aed-9bac-09fbbcd2650a.png +./docs.it4i.cz/salomon/software/debuggers/vtune-amplifier.png ./docs.it4i.cz/anselm-cluster-documentation/software/debuggers/totalview2.png ./docs.it4i.cz/anselm-cluster-documentation/software/debuggers/Snmekobrazovky20141204v12.56.36.png ./docs.it4i.cz/anselm-cluster-documentation/software/debuggers/ddt1.png ./docs.it4i.cz/anselm-cluster-documentation/software/debuggers/totalview1.png ./docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages/Matlab.png ./docs.it4i.cz/anselm-cluster-documentation/quality1.png -./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/vpnuiV.png +./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/vpnuiV.png +./docs.it4i.cz/salomon/software/ansys/Fluent_Licence_2.jpg +./docs.it4i.cz/salomon/software/ansys/Fluent_Licence_4.jpg +./docs.it4i.cz/salomon/software/ansys/Fluent_Licence_1.jpg +./docs.it4i.cz/salomon/software/ansys/Fluent_Licence_3.jpg +./docs.it4i.cz/anselm-cluster-documentation/Anselmprofile.jpg +./docs.it4i.cz/anselm-cluster-documentation/anyconnecticon.jpg +./docs.it4i.cz/anselm-cluster-documentation/anyconnectcontextmenu.jpg +./docs.it4i.cz/anselm-cluster-documentation/logingui.jpg +./docs.it4i.cz/anselm-cluster-documentation/software/ansys/Fluent_Licence_2.jpg +./docs.it4i.cz/anselm-cluster-documentation/software/ansys/Fluent_Licence_4.jpg +./docs.it4i.cz/anselm-cluster-documentation/software/ansys/Fluent_Licence_1.jpg +./docs.it4i.cz/anselm-cluster-documentation/software/ansys/Fluent_Licence_3.jpg +./docs.it4i.cz/anselm-cluster-documentation/firstrun.jpg +./docs.it4i.cz/anselm-cluster-documentation/successfullconnection.jpg +./docs.it4i.cz/salomon/sgi-c1104-gp1.jpeg +./docs.it4i.cz/salomon/salomon-1.jpeg +./docs.it4i.cz/salomon/uv-2000.jpeg +./docs.it4i.cz/salomon/salomon-3.jpeg +./docs.it4i.cz/salomon/salomon-4.jpeg +./docs.it4i.cz/anselm-cluster-documentation/loginwithprofile.jpeg +./docs.it4i.cz/anselm-cluster-documentation/instalationfile.jpeg +./docs.it4i.cz/anselm-cluster-documentation/successfullinstalation.jpeg +./docs.it4i.cz/anselm-cluster-documentation/java_detection.jpeg +./docs.it4i.cz/anselm-cluster-documentation/executionaccess.jpeg +./docs.it4i.cz/anselm-cluster-documentation/downloadfilesuccessfull.jpeg +./docs.it4i.cz/anselm-cluster-documentation/executionaccess2.jpeg +./docs.it4i.cz/anselm-cluster-documentation/login.jpeg diff --git a/source/list_jpg_mv.txt b/source/list_jpg_mv.txt deleted file mode 100644 index a86c3ad3db6943b9dd3f62d97c046524a38c22c8..0000000000000000000000000000000000000000 --- a/source/list_jpg_mv.txt +++ /dev/null @@ -1,27 +0,0 @@ -./docs.it4i.cz/salomon/software/ansys/Fluent_Licence_2.jpg -./docs.it4i.cz/salomon/software/ansys/Fluent_Licence_4.jpg 
-./docs.it4i.cz/salomon/software/ansys/Fluent_Licence_1.jpg -./docs.it4i.cz/salomon/software/ansys/Fluent_Licence_3.jpg -./docs.it4i.cz/anselm-cluster-documentation/Anselmprofile.jpg -./docs.it4i.cz/anselm-cluster-documentation/anyconnecticon.jpg -./docs.it4i.cz/anselm-cluster-documentation/anyconnectcontextmenu.jpg -./docs.it4i.cz/anselm-cluster-documentation/logingui.jpg -./docs.it4i.cz/anselm-cluster-documentation/software/ansys/Fluent_Licence_2.jpg -./docs.it4i.cz/anselm-cluster-documentation/software/ansys/Fluent_Licence_4.jpg -./docs.it4i.cz/anselm-cluster-documentation/software/ansys/Fluent_Licence_1.jpg -./docs.it4i.cz/anselm-cluster-documentation/software/ansys/Fluent_Licence_3.jpg -./docs.it4i.cz/anselm-cluster-documentation/firstrun.jpg -./docs.it4i.cz/anselm-cluster-documentation/successfullconnection.jpg -./docs.it4i.cz/salomon/c1109cbb-9bf4-4f0a-8b0f-a1e464fed0c4.jpeg -./docs.it4i.cz/salomon/ba2c321e-1554-4826-b6ec-3c68d370cd9f.jpeg -./docs.it4i.cz/salomon/04ce7514-8d27-4cdb-bf0f-45d875c75df0.jpeg -./docs.it4i.cz/salomon/d2a6de55-62fc-454f-adda-a6a25e3f44dd.jpeg -./docs.it4i.cz/salomon/82997462-cd88-49eb-aad5-71d77903d903.jpeg -./docs.it4i.cz/anselm-cluster-documentation/a6fd5f3f-bce4-45c9-85e1-8d93c6395eee.jpeg -./docs.it4i.cz/anselm-cluster-documentation/202d14e9-e2e1-450b-a584-e78c018d6b6a.jpeg -./docs.it4i.cz/anselm-cluster-documentation/c6d69ffe-da75-4cb6-972a-0cf4c686b6e1.jpeg -./docs.it4i.cz/anselm-cluster-documentation/5498e1ba-2242-4b9c-a799-0377a73f779e.jpeg -./docs.it4i.cz/anselm-cluster-documentation/4d6e7cb7-9aa7-419c-9583-6dfd92b2c015.jpeg -./docs.it4i.cz/anselm-cluster-documentation/69842481-634a-484e-90cd-d65e0ddca1e8.jpeg -./docs.it4i.cz/anselm-cluster-documentation/bed3998c-4b82-4b40-83bd-c3528dde2425.jpeg -./docs.it4i.cz/anselm-cluster-documentation/30271119-b392-4db9-a212-309fb41925d6.jpeg diff --git a/source/list_rm.txt b/source/list_rm.txt index 023765227417e537248fc43300d481c0684672e7..fe72b7929478137df0ab4e981c063c7379401a62 100644 --- a/source/list_rm.txt +++ b/source/list_rm.txt @@ -34,3 +34,5 @@ ./docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/putty.1.md ./docs.it4i.cz/robots.txt ./docs.it4i.cz/anselm-cluster-documentation/icon.jpg +./docs.it4i.cz/salomon/software/numerical-languages.md +./docs.it4i.cz/anselm-cluster-documentation/software/numerical-languages.md diff --git a/source/replace.txt b/source/replace.txt new file mode 100644 index 0000000000000000000000000000000000000000..074b420e9a27ada00a9adaac7d4ce731c94b7499 --- /dev/null +++ b/source/replace.txt @@ -0,0 +1,113 @@ +[](putty-tunnel.png)& +& +[****](TightVNC_login.png)& +[](https://docs.it4i.cz/get-started-with-it4innovations/gnome_screen.jpg)& +[](../../../../salomon/gnome_screen.jpg.1)& +[](gdmdisablescreensaver.png)& +[](gnome-terminal.png)& +[](gnome-compute-nodes-over-vnc.png)& +### **&### +### []()[]()&### + [](PuttyKeygeneratorV.png)& + [](PuttyKeygenerator_001V.png)& + [](PuttyKeygenerator_002V.png)& + [](20150312_143443.png)& + [](PuttyKeygenerator_004V.png)& + [](PuttyKeygenerator_005V.png)& + [](PuttyKeygenerator_006V.png)& + **[]()&** + []()& + & + [](cygwin-and-x11-forwarding.html)& +### Gnome on Windows**&### Gnome on Windows +### []()&### + id="Key_management" class="mw-headline">Key management& +style="text-align: start; float: none; ">& +class="Apple-converted-space">& +style="text-align: start; ">& +[Pageant (for Windows& +users)](putty/PageantV.png)& +PuTTY - class="Apple-converted-space"> before we start SSH 
connection ssh-connection style="text-align: start; "}& + [](PuTTY_host_Salomon.png)& + [](PuTTY_keyV.png)& + [](PuTTY_save_Salomon.png)& + [](PuTTY_open_Salomon.png)& + [](PageantV.png)& +& +Water-cooled Compute Nodes With MIC Accelerator**&**Water-cooled Compute Nodes With MIC Accelerator** +[](salomon)& +& +Tape Library T950B**&**Tape Library T950B** +&![]](salomon-3.jpeg) +& +& +& +& +class="pun">node_group_key& +class="internal-link">& +[](7D_Enhanced_hypercube.png)& + []&![] +{.state-missing-value +.contenttype-file}& +- - & +[](../vpn_web_login.png)& +Install](https://docs.it4i.cz/salomon/vpn_web_login_2.png/@@images/be923364-0175-4099-a363-79229b88e252.png "VPN Install")](../vpn_web_login_2.png)& +Install](https://docs.it4i.cz/salomon/vpn_web_install_2.png/@@images/c2baba93-824b-418d-b548-a73af8030320.png "VPN Install")](../vpn_web_install_2.png)[ +Install](https://docs.it4i.cz/salomon/vpn_web_install_4.png/@@images/4cc26b3b-399d-413b-9a6c-82ec47899585.png "VPN Install")](../vpn_web_install_4.png)& +Install](https://docs.it4i.cz/salomon/vpn_web_download.png/@@images/06a88cce-5f51-42d3-8f0a-f615a245beef.png "VPN Install")](../vpn_web_download.png)& +Install](https://docs.it4i.cz/salomon/vpn_web_download_2.png/@@images/3358d2ce-fe4d-447b-9e6c-b82285f9796e.png "VPN Install")](../vpn_web_download_2.png)& +& +Install](https://docs.it4i.cz/salomon/copy_of_vpn_web_install_3.png/@@images/9c34e8ad-64b1-4e1d-af3a-13c7a18fbca4.png "VPN Install")](../copy_of_vpn_web_install_3.png)& +[](../vpn_contacting_https_cluster.png)& +Cluster](https://docs.it4i.cz/salomon/vpn_contacting_https.png/@@images/ff365499-d07c-4baf-abb8-ce3e15559210.png "VPN Contacting Cluster")](../vpn_contacting_https.png)& +[](../../anselm-cluster-documentation/anyconnecticon.jpg)& +[](../../anselm-cluster-documentation/anyconnectcontextmenu.jpg)&& +Cluster](https://docs.it4i.cz/salomon/vpn_contacting.png/@@images/9ccabccf-581a-476a-8c24-ce9842c3e657.png "VPN Contacting Cluster")](../vpn_contacting.png)&& +login](https://docs.it4i.cz/salomon/vpn_login.png/@@images/5102f29d-93cf-4cfd-8f55-c99c18f196ea.png "VPN login")](../vpn_login.png)&& +Connection](https://docs.it4i.cz/salomon/vpn_successfull_connection.png/@@images/45537053-a47f-48b2-aacd-3b519d6770e6.png "VPN Succesfull Connection")](../vpn_successfull_connection.png)&& +[](vtune-amplifier)& +<span& +class="internal-link">& +</span>& +[{.image-inline width="451"& +height="513"}](ddt1.png)& +& +& +& +[](totalview1.png)& +[](totalview2.png)& +class="monospace">& +{.external& +.text}& + class="n">& +class="n">& +<span class="n">& + class="pre">& +### class="n">&### + style="text-align: left; float: none; ">Â & +1. style="text-align: left; float: none; ">Locate and modify&1. Locate and modify + style="text-align: left; float: none; ">& + style="text-align: left; float: none; ">& + style="text-align: left; float: none; ">& + style="text-align: left; float: none; ">change it&change it to + to</span>& + style="text-align: left; float: none; ">& + style="text-align: left; float: none; ">& +2. style="text-align: left; float: none; ">&2. Check Putty settings: + style="text-align: left; float: none; ">Check Putty settings:& + style="text-align: left; float: none; ">Enable X11&Enable X11 + forwarding style="text-align: left; float: none; ">& + style="text-align: left; float: none; ">&