@@ -41,7 +41,7 @@ Assume we have 900 input files with name beginning with "file" (e. g. file001, .
First, we create a tasklist file (or subjobs list), listing all tasks (subjobs) - all input files in our example:
```bash
```console
$ find . -name 'file*' > tasklist
```
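Each line of this tasklist then maps onto one subjob. A minimal sketch of a matching array jobscript (the program name ./myprog.x and the output naming are placeholders, not part of the documented example):

```bash
#!/bin/bash
# hypothetical array jobscript: every subjob handles one line of tasklist
cd "$PBS_O_WORKDIR"
# PBS Pro exports PBS_ARRAY_INDEX as the subjob index (1..900 here)
TASK=$(sed -n "${PBS_ARRAY_INDEX}p" tasklist)
./myprog.x < "$TASK" > "$TASK.out"
```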
...
...
@@ -78,7 +78,7 @@ If huge number of parallel multicore (in means of multinode multithread, e. g. M
To submit the job array, use the qsub -J command. The 900 jobs of the [example above](capacity-computing/#array_example) may be submitted like this:
```bash
```console
$ qsub -N JOBNAME -J 1-900 jobscript
12345[].dm2
```
...
...
@@ -87,7 +87,7 @@ In this example, we submit a job array of 900 subjobs. Each subjob will run on f
Sometimes, for testing purposes, you may need to submit a one-element array only. This is not allowed by PBSPro, but there is a workaround:
```bash
```console
$ qsub -N JOBNAME -J 9-10:2 jobscript
```
...
...
@@ -97,7 +97,7 @@ This will only choose the lower index (9 in this example) for submitting/running
Check status of the job array by the qstat command.
```bash
```console
$ qstat -a 12345[].dm2
dm2:
...
...
@@ -110,7 +110,7 @@ Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
The status B means that some subjobs are already running.
Check status of the first 100 subjobs by the qstat command.
```bash
```console
$ qstat -a 12345[1-100].dm2
dm2:
...
...
@@ -128,20 +128,20 @@ Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
Delete the entire job array. Running subjobs will be killed, queued subjobs will be deleted.
```bash
```console
$ qdel 12345[].dm2
```
Deleting large job arrays may take a while.
Display status information for all of the user's jobs, job arrays, and subjobs.
```bash
```console
$ qstat -u $USER -t
```
Display status information for all of the user's subjobs.
```bash
```console
$ qstat -u $USER -tJ
```
...
...
@@ -156,7 +156,7 @@ GNU parallel is a shell tool for executing jobs in parallel using one or more co
For more information and examples see the parallel man page:
```bash
```console
$ module add parallel
$ man parallel
```
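As a quick illustration of what GNU parallel does (./myprog.x is a placeholder program), it maps a command over every line of an input file while keeping a fixed number of tasks running at once:

```console
$ module add parallel
$ parallel -j 16 './myprog.x {} > {}.out' :::: tasklist
```

Here -j 16 limits the run to 16 concurrent tasks and {} expands to one line of tasklist.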
...
...
@@ -171,7 +171,7 @@ Assume we have 101 input files with name beginning with "file" (e. g. file001, .
First, we create a tasklist file, listing all tasks - all input files in our example:
```bash
```console
$ find . -name 'file*' > tasklist
```
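A condensed sketch of a single-node jobscript working through this tasklist with GNU parallel (placeholder program name; this is a sketch, not the documented jobscript, which is elided from this hunk):

```bash
#!/bin/bash
# hypothetical jobscript: run all tasks from tasklist, 16 at a time, on one node
module add parallel
cd "$PBS_O_WORKDIR"
# {} expands to a line of tasklist, {/} to its basename
parallel -j 16 './myprog.x < {} > {/}.out' :::: tasklist
```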
...
...
@@ -209,7 +209,7 @@ In this example, tasks from tasklist are executed via the GNU parallel. The jobs
To submit the job, use the qsub command. The 101 tasks' job of the [example above](capacity-computing/#gp_example) may be submitted like this:
```bash
```console
$ qsub -N JOBNAME jobscript
12345.dm2
```
...
...
@@ -239,13 +239,13 @@ Assume we have 992 input files with name beginning with "file" (e. g. file001, .
First, we create a tasklist file, listing all tasks - all input files in our example:
```bash
```console
$ find . -name 'file*' > tasklist
```
Next we create a file controlling how many tasks will be executed in one subjob:
```bash
```console
$ seq 32 > numtasks
```
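A condensed sketch of how the combined jobscript can map the array index and the GNU parallel slot onto one global task number (placeholder program name; the documented jobscript is elided from this hunk):

```bash
#!/bin/bash
# hypothetical combined jobscript (sketch)
# first invocation: re-run this script via GNU parallel, once per line of numtasks
[ -z "$PARALLEL_SEQ" ] && { module add parallel; exec parallel -a "$PBS_O_WORKDIR/numtasks" "$0"; }

cd "$PBS_O_WORKDIR"
# PBS_ARRAY_INDEX steps by 32 (1, 33, 65, ...); PARALLEL_SEQ counts 1..32 inside the subjob
IDX=$(( PBS_ARRAY_INDEX + PARALLEL_SEQ - 1 ))
TASK=$(sed -n "${IDX}p" tasklist)
[ -z "$TASK" ] && exit 0          # past the last input file
./myprog.x < "$TASK" > "$TASK.out"
```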
...
...
@@ -294,7 +294,7 @@ When deciding this values, think about following guiding rules:
To submit the job array, use the qsub -J command. The 992 tasks' job of the [example above](capacity-computing/#combined_example) may be submitted like this:
```bash
```console
$ qsub -N JOBNAME -J 1-992:32 jobscript
12345[].dm2
```
...
...
@@ -310,7 +310,7 @@ Download the examples in [capacity.zip](capacity.zip), illustrating the above li
Unzip the archive in an empty directory on Anselm and follow the instructions in the README file.
@@ -85,7 +85,7 @@ Anselm is equipped with Intel Sandy Bridge processors Intel Xeon E5-2665 (nodes
Nodes equipped with the Intel Xeon E5-2665 CPU have the PBS resource attribute cpu_freq set to 24; nodes equipped with the Intel Xeon E5-2470 CPU have cpu_freq set to 23.
```bash
```console
$ qsub -A OPEN-0-0 -q qprod -lselect=4:ncpus=16:cpu_freq=24 -I
```
...
...
@@ -93,8 +93,8 @@ In this example, we allocate 4 nodes, 16 cores at 2.4GHhz per node.
Intel Turbo Boost Technology is used by default; you can disable it for all nodes of the job by using the resource attribute cpu_turbo_boost.
```bash
$ qsub -A OPEN-0-0 -q qprod -lselect=4:ncpus=16 -lcpu_turbo_boost=0 -I
```console
$ qsub -A OPEN-0-0 -q qprod -lselect=4:ncpus=16 -lcpu_turbo_boost=0 -I
@@ -39,14 +39,14 @@ The modules may be loaded, unloaded and switched, according to momentary needs.
To check available modules use
```bash
$ module avail
```console
$ module avail **or** ml av
```
To load a module, for example the octave module, use
```bash
$ module load octave
```console
$ module load octave **or** ml octave
```
Loading the octave module will set up paths and environment variables of your active shell such that you are ready to run the octave software.
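One quick way to verify the effect (assuming the octave module prepends its binary directory to PATH):

```console
$ module load octave
$ which octave
$ octave --version
```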
...
...
@@ -54,18 +54,18 @@ loading the octave module will set up paths and environment variables of your ac
To check loaded modules use
```bash
$ module list
$ module list **or** ml
```
To unload a module, for example the octave module, use
```bash
$ module unload octave
```console
$ module unload octave **or** ml -octave
```
Learn more about modules by reading the module man page:
```bash
```console
$ man module
```
...
...
@@ -79,7 +79,7 @@ PrgEnv-intel sets up the INTEL development environment in conjunction with the I
All application modules on the Salomon cluster (and further) will be built using a tool called [EasyBuild](http://hpcugent.github.io/easybuild/ "EasyBuild"). If you want to use applications that are already built by EasyBuild, you have to modify your MODULEPATH environment variable.
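For illustration only (the module tree location below is a placeholder, not the actual path on the cluster):

```console
$ export MODULEPATH=$MODULEPATH:/path/to/easybuild/modules/all
$ module avail
```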
@@ -16,7 +16,7 @@ When allocating computational resources for the job, please specify
Submit the job using the qsub command:
```bash
```console
$ qsub -A Project_ID -q queue -lselect=x:ncpus=y,walltime=[[hh:]mm:]ss[.ms] jobscript
```
...
...
@@ -24,25 +24,25 @@ The qsub submits the job into the queue, in another words the qsub command creat
### Job Submission Examples
```bash
```console
$ qsub -A OPEN-0-0 -q qprod -lselect=64:ncpus=16,walltime=03:00:00 ./myjob
```
In this example, we allocate 64 nodes, 16 cores per node, for 3 hours. We allocate these resources via the qprod queue; consumed resources will be accounted to the project identified by Project ID OPEN-0-0. The jobscript myjob will be executed on the first node in the allocation.
```bash
```console
$ qsub -q qexp -lselect=4:ncpus=16 -I
```
In this example, we allocate 4 nodes, 16 cores per node, for 1 hour. We allocate these resources via the qexp queue. The resources will be available interactively.
```bash
```console
$ qsub -A OPEN-0-0 -q qnvidia -lselect=10:ncpus=16 ./myjob
```
In this example, we allocate 10 NVIDIA-accelerated nodes, 16 cores per node, for 24 hours. We allocate these resources via the qnvidia queue. The jobscript myjob will be executed on the first node in the allocation.
```bash
```console
$ qsub -A OPEN-0-0 -q qfree -lselect=10:ncpus=16 ./myjob
```
...
...
@@ -50,13 +50,13 @@ In this example, we allocate 10 nodes, 16 cores per node, for 12 hours. We alloc
All qsub options may be [saved directly into the jobscript](#example-jobscript-for-mpi-calculation-with-preloaded-inputs). In such a case, no options to qsub are needed.
```bash
```console
$ qsub ./myjob
```
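For instance, the qprod example above could carry its options as PBS directives in the jobscript header, so a plain qsub ./myjob is enough:

```bash
#!/bin/bash
#PBS -A OPEN-0-0
#PBS -q qprod
#PBS -l select=64:ncpus=16,walltime=03:00:00

# ... rest of the jobscript
```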
By default, the PBS batch system sends an e-mail only when the job is aborted. Disabling mail events completely can be done like this:
```bash
```console
$ qsub -m n
```
...
...
@@ -66,8 +66,8 @@ $ qsub -m n
Specific nodes may be allocated via the PBS
```bash
qsub -A OPEN-0-0 -q qprod -lselect=1:ncpus=16:host=cn171+1:ncpus=16:host=cn172 -I
```console
$ qsub -A OPEN-0-0 -q qprod -lselect=1:ncpus=16:host=cn171+1:ncpus=16:host=cn172 -I
```
In this example, we allocate nodes cn171 and cn172, all 16 cores per node, for 24 hours. Consumed resources will be accounted to the Project identified by Project ID OPEN-0-0. The resources will be available interactively.
...
...
@@ -81,7 +81,7 @@ Nodes equipped with Intel Xeon E5-2665 CPU have base clock frequency 2.4GHz, nod
$ qsub -A OPEN-0-0 -q qprod -lselect=4:ncpus=16:cpu_freq=24 -I
```
...
...
@@ -95,8 +95,8 @@ Nodes sharing the same switch may be selected via the PBS resource attribute ibs
We recommend allocating compute nodes of a single switch when the best possible computational network performance is required to run the job efficiently:
```bash
qsub -A OPEN-0-0 -q qprod -lselect=18:ncpus=16:ibswitch=isw11 ./myjob
```console
$ qsub -A OPEN-0-0 -q qprod -lselect=18:ncpus=16:ibswitch=isw11 ./myjob
```
In this example, we request all the 18 nodes sharing the isw11 switch for 24 hours. Full chassis will be allocated.
...
...
@@ -109,8 +109,8 @@ Intel Turbo Boost Technology is on by default. We strongly recommend keeping the
If necessary (such as in the case of benchmarking), you can disable the Turbo for all nodes of the job by using the PBS resource attribute cpu_turbo_boost:
```bash
$ qsub -A OPEN-0-0 -q qprod -lselect=4:ncpus=16 -lcpu_turbo_boost=0 -I
```console
$ qsub -A OPEN-0-0 -q qprod -lselect=4:ncpus=16 -lcpu_turbo_boost=0 -I
```
More about Intel Turbo Boost can be found in the TurboBoost section.
...
...
@@ -119,8 +119,8 @@ More about the Intel Turbo Boost in the TurboBoost section
In the following example, we select an allocation for benchmarking a very special and demanding MPI program. We request Turbo off, 2 full chassis of compute nodes (nodes sharing the same IB switches) for 30 minutes:
@@ -135,7 +135,7 @@ Although this example is somewhat artificial, it demonstrates the flexibility of
!!! note
Check status of your jobs using the **qstat** and **check-pbs-jobs** commands
```bash
```console
$ qstat -a
$ qstat -a -u username
$ qstat -an -u username
...
...
@@ -144,7 +144,7 @@ $ qstat -f 12345.srv11
Example:
```bash
```console
$ qstat -a
srv11:
...
...
@@ -160,19 +160,17 @@ In this example user1 and user2 are running jobs named job1, job2 and job3x. The
Check the status of your jobs using the check-pbs-jobs command. It checks the presence of the user's PBS job processes on the execution hosts, displays load and processes, displays the job's standard and error output, and can continuously display (tail -f) the job's standard or error output.
JOB 35141.dm2, session_id 71995, user user2, nodes cn164,cn165
Check session id: OK
...
...
@@ -183,7 +181,7 @@ cn165: No process
In this example we see that job 35141.dm2 currently runs no process on allocated node cn165, which may indicate an execution error.
```bash
```console
$ check-pbs-jobs --print-load --print-processes
JOB 35141.dm2, session_id 71995, user user2, nodes cn164,cn165
Print load
...
...
@@ -199,7 +197,7 @@ cn164: 99.7 run-task
In this example we see that job 35141.dm2 currently runs process run-task on node cn164, using one thread only, while node cn165 is empty, which may indicate an execution error.
```bash
```console
$ check-pbs-jobs --jobid 35141.dm2 --print-job-out
JOB 35141.dm2, session_id 71995, user user2, nodes cn164,cn165
Print job standard output:
...
...
@@ -218,19 +216,19 @@ In this example, we see actual output (some iteration loops) of the job 35141.dm
You may release your allocation at any time, using the qdel command:
```bash
```console
$ qdel 12345.srv11
```
You may kill a running job by force, using the qsig command:
```bash
```console
$ qsig -s 9 12345.srv11
```
Learn more by reading the pbs man page:
```bash
```console
$ man pbs_professional
```
...
...
@@ -246,7 +244,7 @@ The Jobscript is a user made script, controlling sequence of commands for execut
!!! note
The jobscript or interactive shell is executed on the first of the allocated nodes.
@@ -262,7 +260,7 @@ In this example, the nodes cn17, cn108, cn109 and cn110 were allocated for 1 hou
The jobscript or interactive shell is by default executed in the home directory
```bash
```console
$ qsub -q qexp -lselect=4:ncpus=16 -I
qsub: waiting for job 15210.srv11 to start
qsub: job 15210.srv11 ready
...
...
@@ -280,7 +278,7 @@ The allocated nodes are accessible via ssh from login nodes. The nodes may acces
Calculations on allocated nodes may be executed remotely via MPI, ssh, pdsh, or clush. You may find out which nodes belong to the allocation by reading the $PBS_NODEFILE file.
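For example (pdsh is shown as one possible tool; its -w ^file form reads the target hosts from a file):

```console
$ cat $PBS_NODEFILE
$ pdsh -w ^$PBS_NODEFILE hostname
```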
@@ -36,14 +36,14 @@ Most of the information needed by PRACE users accessing the Anselm TIER-1 system
Before you start to use any of the services, don't forget to create a proxy certificate from your certificate:
```bash
$ grid-proxy-init
```console
$ grid-proxy-init
```
To check whether your proxy certificate is still valid (by default it's valid for 12 hours), use:
```bash
$ grid-proxy-info
```console
$ grid-proxy-info
```
To access the Anselm cluster, two login nodes running the GSI SSH service are available. The service is available from the public Internet as well as from the internal PRACE network (accessible only from other PRACE partners).
...
...
@@ -58,14 +58,14 @@ It is recommended to use the single DNS name anselm-prace.it4i.cz which is distr
When logging in from another PRACE system, the prace_service script can be used:
```bash
$ gsissh `prace_service -e -s anselm`
```console
$ gsissh `prace_service -e -s anselm`
```
Although the preferred and recommended file transfer mechanism is [using GridFTP](prace/#file-transfers), the GSI SSH implementation on Anselm also supports SCP, so gsiscp can be used for transferring small files: