Hardware Overview
=================
The Anselm cluster consists of 209 computational nodes named cn[1-209], of which 180 are regular compute nodes, 23 are GPU Kepler K20 accelerated nodes, 4 are MIC Xeon Phi 5110 accelerated nodes and 2 are fat nodes. Each node is a powerful x86-64 computer, equipped with 16 cores (two eight-core Intel Sandy Bridge processors), at least 64GB RAM, and a local hard drive. User access to the Anselm cluster is provided through two login nodes, login[1,2]. The nodes are interlinked by high-speed InfiniBand and Ethernet networks. All nodes share 320TB /home disk storage to store user files. The 146TB shared /scratch storage is available for scratch data.
The fat nodes are equipped with a large amount (512GB) of memory. The virtualization infrastructure provides resources to run long-term servers and services in virtual mode. Fat nodes and virtual servers may access 45 TB of dedicated block storage. Accelerated nodes, fat nodes, and the virtualization infrastructure are available [upon request](https://support.it4i.cz/rt) made by a PI.
Schematic representation of the Anselm cluster. Each box represents a
node (computer) or storage capacity:
[Schematic: login nodes login1 and login2 and the data mover node dm1 (user-oriented infrastructure); compute nodes cn1-cn207 housed in Racks 01-05 and grouped under per-chassis InfiniBand switches isw0-isw21; Lustre file systems /home (320TB) and /scratch (146TB) (storage); management nodes, 45 TB block storage and virtualization infrastructure servers (management infrastructure); fat nodes cn208 and cn209.]
The cluster compute nodes cn[1-207] are organized within 13 chassis.
There are four types of compute nodes:
- 180 compute nodes without an accelerator
- 23 compute nodes with a GPU accelerator - equipped with NVIDIA Tesla Kepler K20
- 4 compute nodes with a MIC accelerator - equipped with Intel Xeon Phi 5110P
- 2 fat nodes - equipped with 512GB RAM and two 100GB SSD drives
[More about Compute nodes](compute-nodes.html).
GPU and accelerated nodes are available upon request, see the [Resources
Allocation
Policy](resource-allocation-and-job-execution/resources-allocation-policy.html).
All these nodes are interconnected by a fast InfiniBand QDR network and an Ethernet network. [More about the Network](network.html).
Every chassis provides an InfiniBand switch, marked **isw**, connecting all nodes in the chassis, as well as connecting the chassis to the upper level switches.
All nodes share 320TB /home disk storage to store user files. The 146TB shared /scratch storage is available for scratch data. These file systems are provided by the Lustre parallel file system. There is also local disk storage /lscratch available on all compute nodes. [More about Storage](storage.html).
User access to the Anselm cluster is provided through two login nodes, login1 and login2, and the data mover node dm1. [More about accessing the cluster](accessing-the-cluster.html).
The parameters are summarized in the following tables:

**In general**

| Parameter                     | Value                      |
| ----------------------------- | -------------------------- |
| Primary purpose               | High Performance Computing |
| Architecture of compute nodes | x86-64                     |
| Operating system              | Linux                      |

[**Compute nodes**](compute-nodes.html)

| Parameter         | Value                                        |
| ----------------- | -------------------------------------------- |
| Total             | 209                                          |
| Processor cores   | 16 (2x8 cores)                               |
| RAM               | min. 64 GB, min. 4 GB per core               |
| Local disk drive  | yes - usually 500 GB                         |
| Compute network   | InfiniBand QDR, fully non-blocking, fat-tree |
| w/o accelerator   | 180, cn[1-180]                               |
| GPU accelerated   | 23, cn[181-203]                              |
| MIC accelerated   | 4, cn[204-207]                               |
| Fat compute nodes | 2, cn[208-209]                               |

**In total**

| Parameter                                  | Value      |
| ------------------------------------------ | ---------- |
| Total theoretical peak performance (Rpeak) | 94 Tflop/s |
| Total max. LINPACK performance (Rmax)      | 73 Tflop/s |
| Total amount of RAM                        | 15.136 TB  |
| Node             | Processor                             | Memory | Accelerator          |
| ---------------- | ------------------------------------- | ------ | -------------------- |
| w/o accelerator  | 2x Intel Sandy Bridge E5-2665, 2.4GHz | 64GB   | -                    |
| GPU accelerated  | 2x Intel Sandy Bridge E5-2470, 2.3GHz | 96GB   | NVIDIA Kepler K20    |
| MIC accelerated  | 2x Intel Sandy Bridge E5-2470, 2.3GHz | 96GB   | Intel Xeon Phi 5110P |
| Fat compute node | 2x Intel Sandy Bridge E5-2665, 2.4GHz | 512GB  | -                    |
For more details please refer to the [Compute
nodes](compute-nodes.html),
[Storage](storage.html), and
[Network](network.html).
Introduction
============
Welcome to the Anselm supercomputer cluster. The Anselm cluster consists of 209 compute nodes, totaling 3344 compute cores with 15TB RAM, giving over 94 Tflop/s theoretical peak performance. Each node is a powerful x86-64 computer, equipped with 16 cores, at least 64GB RAM, and a 500GB hard drive. Nodes are interconnected by a fully non-blocking fat-tree InfiniBand network and equipped with Intel Sandy Bridge processors. A few nodes are also equipped with NVIDIA Kepler GPU or Intel Xeon Phi MIC accelerators. Read more in the [Hardware Overview](hardware-overview.html).
The cluster runs the bullx Linux [operating system](software/operating-system.html), which is compatible with the RedHat [Linux family](http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg).
We have installed a wide range of
[software](software.1.html) packages targeted at
different scientific domains. These packages are accessible via the
[modules environment](environment-and-modules.html).
A user data shared file system (HOME, 320TB) and a job data shared file system (SCRATCH, 146TB) are available to users.
The PBS Professional workload manager provides [computing resources allocations and job execution](resource-allocation-and-job-execution.html).
Read more on how to [apply for resources](../get-started-with-it4innovations/applying-for-resources.html), [obtain login credentials](../get-started-with-it4innovations/obtaining-login-credentials.html), and [access the cluster](accessing-the-cluster.html).
Network
=======
All compute and login nodes of Anselm are interconnected by an [InfiniBand](http://en.wikipedia.org/wiki/InfiniBand) QDR network and by a Gigabit [Ethernet](http://en.wikipedia.org/wiki/Ethernet) network. Both networks may be used to transfer user data.
Infiniband Network
------------------
All compute and login nodes of Anselm are interconnected by a high-bandwidth, low-latency [InfiniBand](http://en.wikipedia.org/wiki/InfiniBand) QDR network (IB 4x QDR, 40 Gbps). The network topology is a fully non-blocking fat-tree.
The compute nodes may be accessed via the InfiniBand network using the ib0 network interface, with addresses in the range 10.2.1.1-209. MPI may be used to establish native InfiniBand connections among the nodes.
The network provides **2170MB/s** transfer rates via TCP connections (single stream) and up to **3600MB/s** via the native InfiniBand protocol. The fat-tree topology ensures that peak transfer rates are achieved between any two nodes, independent of network traffic exchanged among other nodes concurrently.
Ethernet Network
----------------
The compute nodes may be accessed via the regular Gigabit Ethernet network interface eth0, with addresses in the range 10.1.1.1-209, or by using the aliases cn1-cn209.
The network provides **114MB/s** transfer rates via TCP connections.
Example
-------
```
$ qsub -q qexp -l select=4:ncpus=16 -N Name0 ./myjob
$ qstat -n -u username
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
15209.srv11 username qexp Name0 5530 4 64 -- 01:00 R 00:00
cn17/0*16+cn108/0*16+cn109/0*16+cn110/0*16
$ ssh 10.2.1.110
$ ssh 10.1.1.108
```
In this example, we access the node cn110 over the InfiniBand network via the ib0 interface, then go from cn110 to cn108 over the Ethernet network.
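As the addressing scheme suggests, the last octet of the address corresponds to the node number on both networks. For instance, cn110 may be reached at either of these addresses; a quick reachability check from within the cluster:

```
$ ping -c 1 10.2.1.110   # cn110 over InfiniBand (ib0)
$ ping -c 1 10.1.1.110   # cn110 over Ethernet (eth0)
```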
PRACE User Support
==================
Intro
-----
PRACE users coming to Anselm as a TIER-1 system offered through the DECI calls are in general treated as standard users, so most of the general documentation applies to them as well. This section shows the main differences for quicker orientation, but often refers to the original documentation. PRACE users who don't undergo the full procedure (including signing the IT4I AuP on top of the PRACE AuP) will not have a password and thus no access to some services intended for regular users. This may lower their comfort, but otherwise they should be able to use the TIER-1 system as intended. Please see the [Obtaining Login Credentials section](../get-started-with-it4innovations/obtaining-login-credentials/obtaining-login-credentials.html) if the same level of access is required.
All general [PRACE User Documentation](http://www.prace-ri.eu/user-documentation/) should be read before continuing with the local documentation here.
Help and Support
--------------------
If you have any trouble, need information, want to request support or want to install additional software, please use the [PRACE Helpdesk](http://www.prace-ri.eu/helpdesk-guide264/).
Information about the local services is provided in the [introduction of the general user documentation](introduction.html). Please keep in mind that standard PRACE accounts don't have a password for the web interface of the local (IT4Innovations) request tracker, so a new ticket should be created by sending an e-mail to support[at]it4i.cz.
Obtaining Login Credentials
---------------------------
In general, PRACE users already have a PRACE account set up through their HOMESITE (an institution from their country) as a result of an awarded PRACE project proposal. This includes a signed PRACE AuP, generated and registered certificates, etc.
If there is a special need, a PRACE user can get a standard (local) account at IT4Innovations. To get an account on the Anselm cluster, the user needs to obtain the login credentials. The procedure is the same as for general users of the cluster, so please see the corresponding [section of the general documentation here](../get-started-with-it4innovations/obtaining-login-credentials.html).
Accessing the cluster
---------------------
### Access with GSI-SSH
For all PRACE users, the method for interactive access (login) and data transfer based on grid services from the Globus Toolkit (GSI SSH and GridFTP) is supported.
The user will need a valid certificate and an entry in the PRACE LDAP (please contact your HOME SITE or the primary investigator of your project for LDAP account creation).
Most of the information needed by PRACE users accessing the Anselm
TIER-1 system can be found here:
- [General user's
FAQ](http://www.prace-ri.eu/Users-General-FAQs)
- [Certificates
FAQ](http://www.prace-ri.eu/Certificates-FAQ)
- [Interactive access using
GSISSH](http://www.prace-ri.eu/Interactive-Access-Using-gsissh)
- [Data transfer with
GridFTP](http://www.prace-ri.eu/Data-Transfer-with-GridFTP-Details)
- [Data transfer with
gtransfer](http://www.prace-ri.eu/Data-Transfer-with-gtransfer)
Before you start using any of the services, don't forget to create a proxy certificate from your certificate:
$ grid-proxy-init
To check whether your proxy certificate is still valid (by default it is valid for 12 hours), use:
$ grid-proxy-info
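In scripts, it may be convenient to renew the proxy automatically shortly before it expires. A minimal sketch using the -timeleft option of grid-proxy-info, which prints the remaining validity in seconds:

```
# renew the proxy certificate when less than one hour of validity remains
if [ "$(grid-proxy-info -timeleft)" -lt 3600 ]; then
    grid-proxy-init
fi
```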
To access the Anselm cluster, two login nodes running the GSI SSH service are available. The service is available from the public Internet as well as from the internal PRACE network (accessible only from other PRACE partners).
**Access from PRACE network:**
It is recommended to use the single DNS name anselm-prace.it4i.cz, which is distributed between the two login nodes. If needed, users can log in directly to one of the login nodes. The addresses are:

| Login address               | Port | Protocol | Login node       |
| --------------------------- | ---- | -------- | ---------------- |
| anselm-prace.it4i.cz        | 2222 | gsissh   | login1 or login2 |
| login1-prace.anselm.it4i.cz | 2222 | gsissh   | login1           |
| login2-prace.anselm.it4i.cz | 2222 | gsissh   | login2           |
$ gsissh -p 2222 anselm-prace.it4i.cz
When logging in from another PRACE system, the prace_service script can be used:
$ gsissh `prace_service -i -s anselm`
**Access from public Internet:**
It is recommended to use the single DNS name anselm.it4i.cz, which is distributed between the two login nodes. If needed, users can log in directly to one of the login nodes. The addresses are:

| Login address         | Port | Protocol | Login node       |
| --------------------- | ---- | -------- | ---------------- |
| anselm.it4i.cz        | 2222 | gsissh   | login1 or login2 |
| login1.anselm.it4i.cz | 2222 | gsissh   | login1           |
| login2.anselm.it4i.cz | 2222 | gsissh   | login2           |
$ gsissh -p 2222 anselm.it4i.cz
When logging in from another PRACE system, the prace_service script can be used:
$ gsissh `prace_service -e -s anselm`
Although the preferred and recommended file transfer mechanism is [using GridFTP](prace.html#file-transfers), the GSI SSH implementation on Anselm also supports SCP, so gsiscp can be used for small file transfers:
$ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ anselm.it4i.cz:_ANSELM_PATH_TO_YOUR_FILE_
$ gsiscp -P 2222 anselm.it4i.cz:_ANSELM_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_
$ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ anselm-prace.it4i.cz:_ANSELM_PATH_TO_YOUR_FILE_
$ gsiscp -P 2222 anselm-prace.it4i.cz:_ANSELM_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_
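For illustration, a hypothetical transfer of a local archive results.tar.gz into a PRACE user's home directory (the file name is made up; substitute your own account):

```
$ gsiscp -P 2222 results.tar.gz anselm.it4i.cz:/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/results.tar.gz
```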
### Access to X11 applications (VNC)
If the user needs to run X11-based graphical applications and does not have an X11 server, the applications can be run using the VNC service. If the user is using regular SSH-based access, please see the [section in the general documentation](https://docs.it4i.cz/anselm-cluster-documentation/resolveuid/11e53ad0d2fd4c5187537f4baeedff33).
If the user uses GSI SSH-based access, then the procedure is similar to the SSH-based access ([look here](https://docs.it4i.cz/anselm-cluster-documentation/resolveuid/11e53ad0d2fd4c5187537f4baeedff33)), only the port forwarding must be done using GSI SSH:
$ gsissh -p 2222 anselm.it4i.cz -L 5961:localhost:5961
### Access with SSH
After successfully obtaining login credentials for a local IT4Innovations account, PRACE users can access the cluster as regular users using SSH. For more information, please see the [section in the general documentation](https://docs.it4i.cz/anselm-cluster-documentation/resolveuid/5d3d6f3d873a42e584cbf4365c4e251b).
File transfers
------------------
PRACE users can use the same transfer mechanisms as regular users (if they've undergone the full registration procedure). For information about this, please see [the section in the general documentation](https://docs.it4i.cz/anselm-cluster-documentation/resolveuid/5d3d6f3d873a42e584cbf4365c4e251b).
Apart from the standard mechanisms, a GridFTP server running the Globus Toolkit GridFTP service is available to PRACE users for transferring data to/from the Anselm cluster. The service is available from the public Internet as well as from the internal PRACE network (accessible only from other PRACE partners).
There is one control server and three backend servers for striping and/or backup in case one of them fails.
**Access from PRACE network:**
| Login address                | Port | Node role                   |
| ---------------------------- | ---- | --------------------------- |
| gridftp-prace.anselm.it4i.cz | 2812 | Front end / control server  |
| login1-prace.anselm.it4i.cz  | 2813 | Backend / data mover server |
| login2-prace.anselm.it4i.cz  | 2813 | Backend / data mover server |
| dm1-prace.anselm.it4i.cz     | 2813 | Backend / data mover server |
Copy files **to** Anselm by running the following commands on your local
machine:
$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp-prace.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_
Or by using the prace_service script:
$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -i -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_
Copy files **from** Anselm:
$ globus-url-copy gsiftp://gridftp-prace.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_
Or by using the prace_service script:
$ globus-url-copy gsiftp://`prace_service -i -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_
**Access from public Internet:**
| Login address          | Port | Node role                   |
| ---------------------- | ---- | --------------------------- |
| gridftp.anselm.it4i.cz | 2812 | Front end / control server  |
| login1.anselm.it4i.cz  | 2813 | Backend / data mover server |
| login2.anselm.it4i.cz  | 2813 | Backend / data mover server |
| dm1.anselm.it4i.cz     | 2813 | Backend / data mover server |
Copy files **to** Anselm by running the following commands on your local
machine:
$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_
Or by using the prace_service script:
$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -e -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_
Copy files **from** Anselm:
$ globus-url-copy gsiftp://gridftp.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_
Or by using the prace_service script:
$ globus-url-copy gsiftp://`prace_service -e -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_
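For large files, globus-url-copy can also open several parallel TCP streams within one transfer, which may improve throughput; a sketch using the -p option (number of parallel streams):

```
$ globus-url-copy -p 4 file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_
```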
Generally both shared file systems are available through GridFTP:
| File system mount point | Filesystem | Comment                                                        |
| ----------------------- | ---------- | -------------------------------------------------------------- |
| /home                   | Lustre     | Default HOME directories of users in format /home/prace/login/ |
| /scratch                | Lustre     | Shared SCRATCH mounted on the whole cluster                    |
More information about the shared file systems is available
[here](storage.html).
Usage of the cluster
--------------------
There are some limitations for PRACE users when using the cluster. By default, PRACE users aren't allowed to access the special PBS Pro queues that provide high priority or exclusive access to special equipment such as accelerated nodes and high-memory (fat) nodes. There may also be restrictions on obtaining a working license for the commercial software installed on the cluster, mostly because of license agreements or an insufficient number of licenses.
For production runs always use scratch file systems, either the global
shared or the local ones. The available file systems are described
[here](hardware-overview.html).
### Software, Modules and PRACE Common Production Environment
All software installed system-wide on the cluster is made available to users via modules. Information about the environment and modules usage is in this [section of the general documentation](environment-and-modules.html).
PRACE users can load the "prace" module to set up the [PRACE Common Production Environment](http://www.prace-ri.eu/PRACE-common-production).
$ module load prace
### Resource Allocation and Job Execution
General information about the resource allocation, job queuing and job
execution is in this [section of general
documentation](resource-allocation-and-job-execution/introduction.html).
For PRACE users, the default production run queue is "qprace". PRACE users can also use two other queues, "qexp" and "qfree".
| queue                         | Active project | Project resources | Nodes               | priority | authorization | walltime default/max |
| ----------------------------- | -------------- | ----------------- | ------------------- | -------- | ------------- | -------------------- |
| **qexp** Express queue        | no             | none required     | 2 reserved, 8 total | high     | no            | 1 / 1h               |
| **qprace** Production queue   | yes            | > 0               | 178 w/o accelerator | medium   | no            | 24 / 48h             |
| **qfree** Free resource queue | yes            | none required     | 178 w/o accelerator | very low | no            | 12 / 12h             |
**qprace**, the PRACE Production queue: This queue is intended for normal production runs. It is required that an active project with nonzero remaining resources is specified to enter qprace. The queue runs with medium priority and no special authorization is required to use it. The maximum runtime in qprace is 48 hours; if a job needs more time, it must use checkpoint/restart functionality.
### Accounting & Quota
The resources that are currently subject to accounting are the core hours. Core hours are accounted on a wall-clock basis. Accounting runs whenever the computational cores are allocated or blocked via the PBS Pro workload manager (the qsub command), regardless of whether the cores are actually used for any calculation. See the [example in the general documentation](resource-allocation-and-job-execution/resources-allocation-policy.html).
PRACE users should check their project accounting using the [PRACE Accounting Tool (DART)](http://www.prace-ri.eu/accounting-report-tool/).
Users who have undergone the full local registration procedure (including signing the IT4Innovations Acceptable Use Policy) and who have received a local password may check at any time how many core-hours they and their projects have consumed, using the command "it4ifree". Please note that you need to know your user password to use the command, and that the displayed core hours are "system core hours", which differ from PRACE "standardized core hours".
The **it4ifree** command is part of the it4i.portal.clients package, located here:
<https://pypi.python.org/pypi/it4i.portal.clients>
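The package may be installed from PyPI, for example into your user environment (assuming pip is available):

```
$ pip install --user it4i.portal.clients
```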
$ it4ifree
Password:
PID Total Used ...by me Free
-------- ------- ------ -------- -------
OPEN-0-0 1500000 400644 225265 1099356
DD-13-1 10000 2606 2606 7394
By default, a file system quota is applied. To check the current status of your quota, use:
$ lfs quota -u USER_LOGIN /home
$ lfs quota -u USER_LOGIN /scratch
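A minimal sketch checking both shared Lustre file systems in one go:

```
# report quota usage for the current user on both shared file systems
for fs in /home /scratch; do
    lfs quota -u $USER $fs
done
```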
If the quota is insufficient, please contact the
[support](prace.html#help-and-support) and request an
increase.
Remote visualization service
============================
Introduction
------------
The goal of this service is to provide the users a GPU accelerated use
of OpenGL applications, especially for pre- and post- processing work,
where not only the GPU performance is needed but also fast access to the
shared file systems of the cluster and a reasonable amount of RAM.
The service is based on integration of open source tools VirtualGL and
TurboVNC together with the cluster's job scheduler PBS Professional.
Currently two compute nodes are dedicated for this service with
following configuration for each node:
[**Visualization node configuration**](compute-nodes.html)

| Parameter        | Value                                 |
| ---------------- | ------------------------------------- |
| CPU              | 2x Intel Sandy Bridge E5-2670, 2.6GHz |
| Processor cores  | 16 (2x8 cores)                        |
| RAM              | 64 GB, min. 4 GB per core             |
| GPU              | NVIDIA Quadro 4000, 2GB RAM           |
| Local disk drive | yes - 500 GB                          |
| Compute network  | InfiniBand QDR                        |
Schematic overview
------------------
![rem_vis_scheme](scheme.png "rem_vis_scheme")
![rem_vis_legend](legend.png "rem_vis_legend")
How to use the service
----------------------
### Setup and start your own TurboVNC server.
TurboVNC is designed and implemented for cooperation with VirtualGL and is available for free for all major platforms. For more information and downloads, please refer to <http://sourceforge.net/projects/turbovnc/>.
**Always use TurboVNC on both sides** (server and client) and **don't mix TurboVNC with other VNC implementations** (TightVNC, TigerVNC, ...), as the VNC protocol implementations may differ slightly and diminish your user experience by introducing picture artifacts, etc.
The procedure is:
#### 1. Connect to a login node. {#1-connect-to-a-login-node}
Please [follow the
documentation](https://docs.it4i.cz/anselm-cluster-documentation/resolveuid/5d3d6f3d873a42e584cbf4365c4e251b).
#### 2. Run your own instance of TurboVNC server. {#2-run-your-own-instance-of-turbovnc-server}
To have OpenGL acceleration, a **24 bit color depth must be used**. Otherwise only the geometry (desktop size) definition is needed.
*At the first VNC server run, you need to define a password.*
This example defines a desktop with dimensions of 1200x700 pixels and 24 bit color depth.
```
$ module load turbovnc/1.2.2
$ vncserver -geometry 1200x700 -depth 24
Desktop 'TurboVNClogin2:1 (username)' started on display login2:1
Starting applications specified in /home/username/.vnc/xstartup.turbovnc
Log file is /home/username/.vnc/login2:1.log
```
#### 3. Remember which display number your VNC server runs (you will need it in the future to stop the server). {#3-remember-which-display-number-your-vnc-server-runs-you-will-need-it-in-the-future-to-stop-the-server}
```
$ vncserver -list
TurboVNC server sessions
X DISPLAY # PROCESS ID
:1 23269
```
In this example the VNC server runs on display **:1**.
#### 4. Remember the exact login node, where your VNC server runs. {#4-remember-the-exact-login-node-where-your-vnc-server-runs}
```
$ uname -n
login2
```
In this example the VNC server runs on **login2**.
#### 5. Remember on which TCP port your own VNC server is running. {#5-remember-on-which-tcp-port-your-own-vnc-server-is-running}
To get the port, you have to look into the log file of your VNC server.
```
$ grep -E "VNC.*port" /home/username/.vnc/login2:1.log
20/02/2015 14:46:41 Listening for VNC connections on TCP port 5901
```
In this example the VNC server listens on TCP port **5901**.
#### 6. Connect to the login node where your VNC server runs with SSH to tunnel your VNC session. {#6-connect-to-the-login-node-where-your-vnc-server-runs-with-ssh-to-tunnel-your-vnc-session}
Tunnel the TCP port on which your VNC server is listening.
```
$ ssh login2.anselm.it4i.cz -L 5901:localhost:5901
```
*If you use Windows and Putty, please refer to the port forwarding setup in the documentation:*
[https://docs.it4i.cz/anselm-cluster-documentation/accessing-the-cluster/x-window-and-vnc#section-12](accessing-the-cluster/x-window-and-vnc.html#section-12)
#### 7. If you don't have TurboVNC installed on your workstation. {#7-if-you-don-t-have-turbo-vnc-installed-on-your-workstation}
Get it from <http://sourceforge.net/projects/turbovnc/>.
#### 8. Run TurboVNC Viewer from your workstation. {#8-run-turbovnc-viewer-from-your-workstation}
Mind that you should connect through the SSH tunneled port. In this
example it is 5901 on your workstation (localhost).
```
$ vncviewer localhost:5901
```
*If you use the Windows version of TurboVNC Viewer, just run the Viewer and use the address **localhost:5901**.*
#### 9. Proceed to the chapter "Access the visualization node." {#9-proceed-to-the-chapter-access-the-visualization-node}
*Now you should have a working TurboVNC session connected to your workstation.*
#### 10. After you end your visualization session. {#10-after-you-end-your-visualization-session}
*Don't forget to correctly shut down your own VNC server on the login node!*
```
$ vncserver -kill :1
```
Access the visualization node
-----------------------------
To access the node, use the dedicated PBS Professional scheduler queue **qviz**. The queue has the following properties:
| queue                        | active project | project resources | nodes | min ncpus* | priority | authorization | walltime default/max |
| ---------------------------- | -------------- | ----------------- | ----- | ---------- | -------- | ------------- | -------------------- |
| **qviz** Visualization queue | yes            | none required     | 2     | 4          | 150      | no            | 1 hour / 2 hours     |
Currently, when accessing the node, each user gets 4 cores of a CPU allocated, thus approximately 16 GB of RAM and 1/4 of the GPU capacity.
*If more GPU power or RAM is required, it is recommended to allocate one whole node per user, so that all 16 cores, the whole RAM and the whole GPU are exclusive. This is currently also the maximum allowed allocation per user. One hour of work is allocated by default; the user may ask for 2 hours maximum.*
To access the visualization node, follow these steps:
#### 1. In your VNC session, open a terminal and allocate a node using PBSPro qsub command. {#1-in-your-vnc-session-open-a-terminal-and-allocate-a-node-using-pbspro-qsub-command}
*This step is necessary to allow you to proceed with the next steps.*
```
$ qsub -I -q qviz -A PROJECT_ID
```
In this example the default values for CPU cores and usage time are
used.
```
$ qsub -I -q qviz -A PROJECT_ID -l select=1:ncpus=16 -l walltime=02:00:00
```
*Substitute **PROJECT_ID** with the assigned project identification
string.*
In this example a whole node for 2 hours is requested.
If there are free resources for your request, you will have a shell
running on an assigned node. Please remember the name of the node.
```
$ uname -n
srv8
```
In this example the visualization session was assigned to node **srv8**.
#### 2. In your VNC session open another terminal (keep the one with interactive PBSPro job open). {#2-in-your-vnc-session-open-another-terminal-keep-the-one-with-interactive-pbspro-job-open}
Set up the VirtualGL connection to the node which PBSPro allocated for your job.
```
$ vglconnect srv8
```
You will be connected through the newly created VirtualGL tunnel to the visualization node, where you will get a shell.
#### 3. Load the VirtualGL module. {#3-load-the-virtualgl-module}
```
$ module load virtualgl/2.4
```
#### 4. Run your desired OpenGL accelerated application using VirtualGL script "vglrun". {#4-run-your-desired-opengl-accelerated-application-using-virtualgl-script-vglrun}
```
$ vglrun glxgears
```
Please note that if you want to run an OpenGL application which is available through modules, you need to load the respective module first. E.g., to run the **Mentat** OpenGL application from the **MARC** software package, use:
```
$ module load marc/2013.1
$ vglrun mentat
```
#### 5. After you end your work with the OpenGL application. {#5-after-you-end-your-work-with-the-opengl-application}
Just log out from the visualization node, exit both open terminals, and end your VNC server session as described above.
Tips and Tricks
---------------
If you want to increase the responsiveness of the visualization, please adjust your TurboVNC client settings in this way:
![rem_vis_settings](turbovncclientsetting.png "rem_vis_settings")
To get an idea of how the settings affect the resulting picture quality, three levels of "JPEG image quality" are demonstrated:
1. JPEG image quality = 30
![rem_vis_q3](quality3.png "rem_vis_q3")
2. JPEG image quality = 15
![rem_vis_q2](quality2.png "rem_vis_q2")
3. JPEG image quality = 10
![rem_vis_q1](quality1.png "rem_vis_q1")
Resource Allocation and Job Execution
=====================================
To run a [job](introduction.html), [computational resources](introduction.html) for this particular job must be allocated. This is done via the PBS Pro job workload manager software, which efficiently distributes workloads across the supercomputer. Extensive information about PBS Pro can be found in the [official documentation here](../pbspro-documentation.html), especially in the [PBS Pro User's Guide](../pbspro-documentation/pbspro-users-guide.1).
Resources Allocation Policy
---------------------------
The resources are allocated to the job in a fairshare fashion, subject
to constraints set by the queue and resources available to the Project.
[The
Fairshare](resource-allocation-and-job-execution/job-priority.html)
at Anselm ensures that individual users may consume approximately equal
amount of resources per week. The resources are accessible via several
queues for queueing the jobs. The queues provide prioritized and
exclusive access to the computational resources. Following queues are
available to Anselm users:
- **qexp**, the Express queue
- **qprod**, the Production queue
- **qlong**, the Long queue
- **qnvidia, qmic, qfat**, the Dedicated queues
- **qfree**, the Free resource utilization queue
Check the queue status at <https://extranet.it4i.cz/anselm/>
Read more on the [Resource Allocation
Policy](resource-allocation-and-job-execution/resources-allocation-policy.html)
page.
Job submission and execution
----------------------------
Use the **qsub** command to submit your jobs.
The qsub command submits the job into the queue and creates a request to the PBS Job manager for the allocation of the specified resources. The **smallest allocation unit is an entire node, 16 cores**, with the exception of the qexp queue. The resources will be allocated when available, subject to allocation policies and constraints. **After the resources are allocated, the jobscript or interactive shell is executed on the first of the allocated nodes.**
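For instance, a jobscript may be submitted to the production queue, asking for two full nodes for four hours, like this (PROJECT_ID and the script name are placeholders):

```
$ qsub -A PROJECT_ID -q qprod -l select=2:ncpus=16,walltime=04:00:00 ./myjob
```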
Read more on the [Job submission and
execution](resource-allocation-and-job-execution/job-submission-and-execution.html)
page.
Capacity computing
------------------
Use Job arrays when running a huge number of jobs.
Use GNU Parallel and/or Job arrays when running (many) single core jobs.
In many cases, it is useful to submit a huge number (100+) of computational jobs into the PBS queue system. A huge number of (small) jobs is one of the most effective ways to execute embarrassingly parallel calculations, achieving the best runtime, throughput and computer utilization. In this chapter, we discuss the recommended way to run a huge number of jobs, including **ways to run a huge number of single core jobs**.
Read more on [Capacity
computing](resource-allocation-and-job-execution/capacity-computing.html)
page.
Capacity computing
==================
Introduction
------------
In many cases, it is useful to submit a huge number (100+) of computational jobs into the PBS queue system. A huge number of (small) jobs is one of the most effective ways to execute embarrassingly parallel calculations, achieving the best runtime, throughput and computer utilization.
However, executing a huge number of jobs via the PBS queue may strain the system. This strain may result in slow responses to commands, inefficient scheduling and overall degradation of performance and user experience for all users. For this reason, the number of jobs is **limited to 100 per user, 1000 per job array**.
Please follow one of the procedures below in case you wish to schedule more than 100 jobs at a time.
- Use [Job arrays](capacity-computing.html#job-arrays) when running a huge number of [multithread](capacity-computing.html#shared-jobscript-on-one-node) (bound to one node only) or multinode (multithread across several nodes) jobs
- Use [GNU parallel](capacity-computing.html#gnu-parallel) when running single core jobs
- Combine [GNU parallel with Job arrays](capacity-computing.html#combining-job-arrays-and-gnu-parallel) when running a huge number of single core jobs
Policy
------
1. A user is allowed to submit at most 100 jobs. Each job may be [a job
array](capacity-computing.html#job-arrays).
2. The array size is at most 1000 subjobs.
Job arrays
--------------
A huge number of jobs may be easily submitted and managed as a job array. A job array is a compact representation of many jobs, called subjobs. The subjobs share the same job script, and have the same values for all attributes and resources, with the following exceptions:
- each subjob has a unique index, $PBS_ARRAY_INDEX
- job Identifiers of subjobs only differ by their indices
- the state of subjobs can differ (R,Q,...etc.)
All subjobs within a job array have the same scheduling priority and schedule as independent jobs. The entire job array is submitted through a single qsub command and may be managed by the qdel, qalter, qhold, qrls and qsig commands as a single job.
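For example, an entire array may be held and later released just like a single job, using the array job ID format shown in the examples below:

```
$ qhold 12345[].dm2   # hold all queued subjobs of the array
$ qrls 12345[].dm2    # release them again
```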
### Shared jobscript
All subjobs in a job array use the very same, single jobscript. Each subjob runs its own instance of the jobscript. The instances execute different work controlled by the $PBS_ARRAY_INDEX variable.
Example:
Assume we have 900 input files with names beginning with "file" (e.g. file001, ..., file900). Assume we would like to use each of these input files with the program executable myprog.x, each as a separate job.
First, we create a tasklist file (or subjobs list), listing all tasks (subjobs) - all input files in our example:
```
$ find . -name 'file*' > tasklist
```
Then we create the jobscript:
```
#!/bin/bash
#PBS -A PROJECT_ID
#PBS -q qprod
#PBS -l select=1:ncpus=16,walltime=02:00:00
# change to local scratch directory
SCR=/lscratch/$PBS_JOBID
mkdir -p $SCR ; cd $SCR || exit
# get individual tasks from tasklist with index from PBS JOB ARRAY
TASK=$(sed -n "${PBS_ARRAY_INDEX}p" $PBS_O_WORKDIR/tasklist)
# copy input file and executable to scratch
cp $PBS_O_WORKDIR/$TASK input ; cp $PBS_O_WORKDIR/myprog.x .
# execute the calculation
./myprog.x < input > output
# copy output file to submit directory
cp output $PBS_O_WORKDIR/$TASK.out
```
In this example, the submit directory holds the 900 input files, the executable myprog.x and the jobscript file. As input for each run, we take the filename of an input file from the created tasklist file. We copy the input file to the local scratch /lscratch/$PBS_JOBID, execute myprog.x and copy the output file back to the submit directory, under the $TASK.out name. The myprog.x runs on one node only and must use threads to run in parallel. Be aware that if myprog.x **is not multithreaded**, then all the **jobs are run as single-thread programs in a sequential** manner. Due to the allocation of the whole node, the **accounted time is equal to the usage of the whole node**, while using only 1/16 of the node!
If a huge number of parallel multicore jobs (in the sense of multinode multithread, e.g. MPI-enabled jobs) needs to be run, then the job array approach should also be used. The main difference compared to the previous example using one node is that the local scratch must not be used (as it is not shared between nodes) and MPI or another technique for parallel multinode runs has to be used properly; a sketch of such a jobscript follows below.
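A minimal sketch of a multinode subjob script under these assumptions; mympiprog.x and the openmpi module name are hypothetical, and the shared /scratch file system replaces the node-local /lscratch:

```
#!/bin/bash
#PBS -A PROJECT_ID
#PBS -q qprod
#PBS -l select=4:ncpus=16,walltime=02:00:00
# work in the shared scratch, visible from all allocated nodes
SCR=/scratch/$USER/$PBS_JOBID
mkdir -p $SCR ; cd $SCR || exit
# get individual tasks from tasklist with index from PBS JOB ARRAY
TASK=$(sed -n "${PBS_ARRAY_INDEX}p" $PBS_O_WORKDIR/tasklist)
# copy input file and executable to the shared scratch
cp $PBS_O_WORKDIR/$TASK input ; cp $PBS_O_WORKDIR/mympiprog.x .
# run the MPI program across all allocated nodes
# (module name is an assumption; use the MPI stack of your choice)
module load openmpi
mpirun ./mympiprog.x < input > output
# copy output file to submit directory
cp output $PBS_O_WORKDIR/$TASK.out
```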
### Submit the job array
To submit the job array, use the qsub -J command. The 900 jobs of the
[example above](capacity-computing.html#array_example) may
be submitted like this:
```
$ qsub -N JOBNAME -J 1-900 jobscript
12345[].dm2
```
In this example, we submit a job array of 900 subjobs. Each subjob will run on a full node and is assumed to take less than 2 hours (please note the #PBS directives at the beginning of the jobscript file, and don't forget to set your valid PROJECT_ID and desired queue).
Sometimes for testing purposes, you may need to submit a one-element array only. This is not allowed by PBSPro, but there's a workaround:
```
$ qsub -N JOBNAME -J 9-10:2 jobscript
```
This will only choose the lower index (9 in this example) for
submitting/running your job.
### Manage the job array
Check status of the job array by the qstat command.
```
$ qstat -a 12345[].dm2
dm2:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
12345[].dm2 user2 qprod xx 13516 1 16 -- 00:50 B 00:02
```
The status B means that some subjobs are already running.
Check status of the first 100 subjobs by the qstat command.
```
$ qstat -a 12345[1-100].dm2
dm2:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
12345[1].dm2 user2 qprod xx 13516 1 16 -- 00:50 R 00:02
12345[2].dm2 user2 qprod xx 13516 1 16 -- 00:50 R 00:02
12345[3].dm2 user2 qprod xx 13516 1 16 -- 00:50 R 00:01
12345[4].dm2 user2 qprod xx 13516 1 16 -- 00:50 Q --
. . . . . . . . . . .
, . . . . . . . . . .
12345[100].dm2 user2 qprod xx 13516 1 16 -- 00:50 Q --
```
Delete the entire job array. Running subjobs will be killed, queued subjobs will be deleted.
```
$ qdel 12345[].dm2
```
Deleting large job arrays may take a while.
Display status information for all user's jobs, job arrays, and subjobs.
```
$ qstat -u $USER -t
```
Display status information for all user's subjobs.
```
$ qstat -u $USER -tJ
```
Read more on job arrays in the [PBSPro Users
guide](../../pbspro-documentation.html).
GNU parallel
----------------
Use GNU parallel to run many single core tasks on one node.
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. GNU parallel is most useful for running single core jobs via the queue system on Anselm.
For more information and examples see the parallel man page:
```
$ module add parallel
$ man parallel
```
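As a quick illustration of how parallel consumes a task list (the `::::` syntax reads arguments from a file), the following prints each entry of the tasklist used in the example below, running one instance per available core:

```
$ parallel echo Processing task {} :::: tasklist
```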
### GNU parallel jobscript
The GNU parallel shell executes multiple instances of the jobscript
using all cores on the node. The instances execute different work,
controlled by the $PARALLEL_SEQ variable.
Example:
Assume we have 101 input files with names beginning with "file" (e.g. file001, ..., file101). Assume we would like to use each of these input files with the program executable myprog.x, each as a separate single core job. We call these single core jobs tasks.
First, we create a tasklist file, listing all tasks - all input files in our example:
```
$ find . -name 'file*' > tasklist
```
Then we create the jobscript:
```
#!/bin/bash
#PBS -A PROJECT_ID
#PBS -q qprod
#PBS -l select=1:ncpus=16,walltime=02:00:00
[ -z "$PARALLEL_SEQ" ] &&
{ module add parallel ; exec parallel -a $PBS_O_WORKDIR/tasklist $0 ; }
# change to local scratch directory
SCR=/lscratch/$PBS_JOBID/$PARALLEL_SEQ
mkdir -p $SCR ; cd $SCR || exit
# get individual task from tasklist
TASK=$1
# copy input file and executable to scratch
cp $PBS_O_WORKDIR/$TASK input
# execute the calculation (cat stands in for the real myprog.x here)
cat input > output
# copy output file to submit directory
cp output $PBS_O_WORKDIR/$TASK.out
```
In this example, tasks from the tasklist are executed via GNU parallel. The jobscript executes multiple instances of itself in parallel, on all cores of the node. Once an instance of the jobscript finishes, a new instance starts, until all entries in the tasklist are processed. The currently processed entry of the tasklist may be retrieved via the $1 variable. The variable $TASK expands to one of the input filenames from the tasklist. We copy the input file to the local scratch, execute myprog.x (represented by cat in the example jobscript) and copy the output file back to the submit directory, under the $TASK.out name.
### Submit the job
To submit the job, use the qsub command. The 101-task job of the [example above](capacity-computing.html#gp_example) may be submitted like this:
```
$ qsub -N JOBNAME jobscript
12345.dm2
```
In this example, we submit a job of 101 tasks. 16 input files will be processed in parallel. The 101 tasks on 16 cores are assumed to complete in less than 2 hours.
Please note the #PBS directives at the beginning of the jobscript file, and don't forget to set your valid PROJECT_ID and desired queue.
Job arrays and GNU parallel
-------------------------------
Combine the job arrays and GNU parallel for the best throughput of single core jobs.
While job arrays are able to utilize all available computational nodes, GNU parallel can be used to efficiently run multiple single-core jobs on a single node. The two approaches may be combined to utilize all available (current and future) resources to execute single core jobs. Every subjob in an array runs GNU parallel to utilize all cores on the node.
### GNU parallel, shared jobscript
Combined approach, very similar to job arrays, can be taken. Job array
is submitted to the queuing system. The subjobs run GNU parallel. The
GNU parallel shell executes multiple instances of the jobscript using
all cores on the node. The instances execute different work, controlled
by the $PBS_JOB_ARRAY and $PARALLEL_SEQ variables.
Example:
Assume we have 992 input files with names beginning with "file" (e.g. file001, ..., file992). Assume we would like to use each of these input files with the program executable myprog.x, each as a separate single core job. We call these single core jobs tasks.
First, we create a tasklist file, listing all tasks - all input files in our example:
```
$ find . -name 'file*' > tasklist
```
Next we create a file controlling how many tasks will be executed in one subjob:
```
$ seq 32 > numtasks
```
Then we create the jobscript:
```
#!/bin/bash
#PBS -A PROJECT_ID
#PBS -q qprod
#PBS -l select=1:ncpus=16,walltime=02:00:00
[ -z "$PARALLEL_SEQ" ] &&
{ module add parallel ; exec parallel -a $PBS_O_WORKDIR/numtasks $0 ; }
# change to local scratch directory
SCR=/lscratch/$PBS_JOBID/$PARALLEL_SEQ
mkdir -p $SCR ; cd $SCR || exit
# get individual task from tasklist with index from PBS JOB ARRAY and index from parallel
IDX=$(($PBS_ARRAY_INDEX + $PARALLEL_SEQ - 1))
TASK=$(sed -n "${IDX}p" $PBS_O_WORKDIR/tasklist)
[ -z "$TASK" ] && exit
# copy input file and executable to scratch
cp $PBS_O_WORKDIR/$TASK input
# execute the calculation (cat stands in for the real myprog.x here)
cat input > output
# copy output file to submit directory
cp output $PBS_O_WORKDIR/$TASK.out
```
In this example, the jobscript executes in multiple instances in parallel, on all cores of a computing node. The variable $TASK expands to one of the input filenames from the tasklist. We copy the input file to the local scratch, execute myprog.x (represented by cat in the example) and copy the output file back to the submit directory, under the $TASK.out name. The numtasks file controls how many tasks will be run per subjob. Once a task is finished, a new task starts, until the number of tasks in the numtasks file is reached.
Select the subjob walltime and the number of tasks per subjob carefully.
When deciding on these values, think about the following guiding rules:
1. Let n=N/16, where N is the number of tasks per subjob, T is the expected single-task walltime and W is the subjob walltime. The inequality (n+1) * T < W should hold. For example, with N=32 tasks per subjob (n=2) and T=30 minutes, W should exceed (2+1) * 30 = 90 minutes. A short subjob walltime improves scheduling and job throughput.
2. The number of tasks should be a multiple of 16.
3. These rules are valid only when all tasks have similar task walltimes T.
### Submit the job array
To submit the job array, use the qsub -J command. The 992-task job of the [example above](capacity-computing.html#combined_example) may be submitted like this:
```
$ qsub -N JOBNAME -J 1-992:32 jobscript
12345[].dm2
```
In this example, we submit a job array of 31 subjobs. Note the -J 1-992:**32**; the step must be the same as the number of tasks in the numtasks file. Each subjob will run on a full node and process 16 input files in parallel, 32 in total per subjob. Every subjob is assumed to complete in less than 2 hours.
Please note the #PBS directives at the beginning of the jobscript file, and don't forget to set your valid PROJECT_ID and desired queue.
Examples
--------
Download the examples in [capacity.zip](capacity-computing-examples), illustrating the above listed ways to run a huge number of jobs. We recommend trying out the examples before using this approach for running production jobs.
Unzip the archive in an empty directory on Anselm and follow the instructions in the README file.
```
$ unzip capacity.zip
$ cat README
```