Commit 9bbfd123 authored by Lukáš Krupčík

salomon done

parent 5b89a84d
Pipeline #2111 passed with stages in 1 minute and 3 seconds
@@ -41,7 +41,7 @@ Assume we have 900 input files with name beginning with "file" (e. g. file001, .
First, we create a tasklist file (or subjobs list), listing all tasks (subjobs) - all input files in our example:
-```bash
+```console
$ find . -name 'file*' > tasklist
```
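Each subjob of the array can pick its own line of `tasklist` through the `PBS_ARRAY_INDEX` variable that PBS Pro sets for every subjob. The following is a minimal sketch that runs outside PBS, assuming hypothetical dummy inputs `file001`..`file003` in a scratch directory; the helper `pick_task` is hypothetical, and the index falls back to 1 when `PBS_ARRAY_INDEX` is unset.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print line N (1-based, like PBS_ARRAY_INDEX) of a tasklist file.
pick_task() {
    local tasklist=$1 index=$2
    sed -n "${index}p" "$tasklist"
}

# Demo in a throwaway directory with hypothetical inputs file001..file003.
workdir=$(mktemp -d)
cd "$workdir"
touch file001 file002 file003
# sort is added only to make the line order reproducible.
find . -name 'file*' | sort > tasklist

# Outside PBS the variable is unset, so fall back to the first subjob.
index=${PBS_ARRAY_INDEX:-1}
TASK=$(pick_task tasklist "$index")
echo "subjob $index would process $TASK"
```

In a real subjob the same `sed -n "${PBS_ARRAY_INDEX}p"` lookup would read the tasklist from the submission directory (`$PBS_O_WORKDIR`).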
@@ -78,7 +78,7 @@ If huge number of parallel multicore (in means of multinode multithread, e. g. M
To submit the job array, use the qsub -J command. The 900 jobs of the [example above](capacity-computing/#array_example) may be submitted like this:
-```bash
+```console
$ qsub -N JOBNAME -J 1-900 jobscript
506493[].isrv5
```
@@ -87,7 +87,7 @@ In this example, we submit a job array of 900 subjobs. Each subjob will run on f
Sometimes for testing purposes, you may need to submit only a one-element array. This is not allowed by PBSPro, but there's a workaround:
-```bash
+```console
$ qsub -N JOBNAME -J 9-10:2 jobscript
```
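The `-J 9-10:2` workaround works because a `start-end:step` range is expanded like an arithmetic sequence: starting at 9 with step 2, the next index (11) already exceeds 10, so only index 9 is generated. The hedged sketch below mimics that expansion with `seq`; it only illustrates the arithmetic and does not call PBS, and the helper name `array_indices` is hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Expand a PBS-style "start-end[:step]" range into individual indices.
# Illustration only; PBS Pro performs the real expansion itself.
array_indices() {
    local range=$1 start end step
    start=${range%%-*}          # text before the first "-"
    end=${range#*-}             # text after the first "-"
    step=1
    if [[ $end == *:* ]]; then  # optional ":step" suffix
        step=${end#*:}
        end=${end%%:*}
    fi
    seq "$start" "$step" "$end"
}

array_indices 9-10:2    # prints only 9
array_indices 1-5       # prints 1 through 5
```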
@@ -97,7 +97,7 @@ This will only choose the lower index (9 in this example) for submitting/running
Check the status of the job array with the qstat command.
-```bash
+```console
$ qstat -a 506493[].isrv5
isrv5:
@@ -111,7 +111,7 @@ The status B means that some subjobs are already running.
Check the status of the first 100 subjobs with the qstat command.
-```bash
+```console
$ qstat -a 12345[1-100].isrv5
isrv5:
@@ -129,7 +129,7 @@ Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
Delete the entire job array. Running subjobs will be killed, queueing subjobs will be deleted.
-```bash
+```console
$ qdel 12345[].isrv5
```
@@ -137,13 +137,13 @@ Deleting large job arrays may take a while.
Display status information for all of the user's jobs, job arrays, and subjobs.
-```bash
+```console
$ qstat -u $USER -t
```
Display status information for all of the user's subjobs.
-```bash
+```console
$ qstat -u $USER -tJ
```
@@ -158,7 +158,7 @@ GNU parallel is a shell tool for executing jobs in parallel using one or more co
For more information and examples see the parallel man page:
-```bash
+```console
$ module add parallel
$ man parallel
```
@@ -173,7 +173,7 @@ Assume we have 101 input files with name beginning with "file" (e. g. file001, .
First, we create a tasklist file, listing all tasks - all input files in our example:
-```bash
+```console
$ find . -name 'file*' > tasklist
```
@@ -211,7 +211,7 @@ In this example, tasks from tasklist are executed via the GNU parallel. The jobs
To submit the job, use the qsub command. The job with 101 tasks from the [example above](capacity-computing/#gp_example) may be submitted like this:
-```bash
+```console
$ qsub -N JOBNAME jobscript
12345.dm2
```
@@ -241,13 +241,13 @@ Assume we have 992 input files with name beginning with "file" (e. g. file001, .
First, we create a tasklist file, listing all tasks - all input files in our example:
-```bash
+```console
$ find . -name 'file*' > tasklist
```
Next, we create a file controlling how many tasks will be executed in one subjob:
-```bash
+```console
$ seq 32 > numtasks
```
@@ -296,7 +296,7 @@ When deciding these values, think about the following guiding rules:
To submit the job array, use the qsub -J command. The job with 992 tasks from the [example above](capacity-computing/#combined_example) may be submitted like this:
-```bash
+```console
$ qsub -N JOBNAME -J 1-992:32 jobscript
12345[].dm2
```
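With `-J 1-992:32` and 32 lines in `numtasks`, each subjob handles a block of 32 consecutive tasks starting at its own `PBS_ARRAY_INDEX`. The hedged sketch below checks that arithmetic without PBS, assuming that task `k` of a subjob (1-based, e.g. GNU parallel's `$PARALLEL_SEQ`) maps to tasklist line `PBS_ARRAY_INDEX + k - 1`; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

total_tasks=992        # lines in tasklist
tasks_per_subjob=32    # lines in numtasks (seq 32)

# Subjob indices generated by "-J 1-992:32": 1, 33, 65, ...
subjob_indices() {
    seq 1 "$tasks_per_subjob" "$total_tasks"
}

# Tasklist lines handled by the subjob whose PBS_ARRAY_INDEX is $1,
# assuming line = PBS_ARRAY_INDEX + k - 1 for task k = 1..tasks_per_subjob.
subjob_lines() {
    local array_index=$1 k
    for k in $(seq "$tasks_per_subjob"); do
        echo $(( array_index + k - 1 ))
    done
}

echo "number of subjobs: $(subjob_indices | wc -l)"
echo "last subjob index: $(subjob_indices | tail -n 1)"
echo "subjob 33 handles lines: $(subjob_lines 33 | tr '\n' ' ')"
```

With these values there are 31 subjobs, the last one starting at index 961; since 31 × 32 = 992, every task is covered exactly once.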
@@ -312,7 +312,7 @@ Download the examples in [capacity.zip](capacity.zip), illustrating the above li
Unzip the archive in an empty directory on Anselm and follow the instructions in the README file:
-```bash
+```console
$ unzip capacity.zip
$ cd capacity
$ cat README
......
@@ -4,7 +4,7 @@
After logging in, you may want to configure the environment. Write your preferred path definitions, aliases, functions and module loads in the .bashrc file:
-```bash
+```console
# ./bashrc
# Source global definitions
@@ -32,7 +32,7 @@ In order to configure your shell for running particular application on Salomon w
Application modules on Salomon cluster are built using [EasyBuild](http://hpcugent.github.io/easybuild/ "EasyBuild"). The modules are divided into the following structure:
-```bash
+```console
base: Default module class
bio: Bioinformatics, biology and biomedical
cae: Computer Aided Engineering (incl. CFD)
@@ -63,33 +63,33 @@ The modules may be loaded, unloaded and switched, according to momentary needs.
To check available modules, use:
-```bash
-$ module avail
+```console
+$ module avail **or** ml av
```
To load a module, for example the Open MPI module, use:
-```bash
-$ module load OpenMPI
+```console
+$ module load OpenMPI **or** ml OpenMPI
```
Loading the Open MPI module will set up paths and environment variables of your active shell such that you are ready to run the Open MPI software.
To check loaded modules, use:
-```bash
-$ module list
+```console
+$ module list **or** ml
```
To unload a module, for example the Open MPI module, use:
-```bash
-$ module unload OpenMPI
+```console
+$ module unload OpenMPI **or** ml -OpenMPI
```
Learn more about modules by reading the module man page:
-```bash
+```console
$ man module
```
......
@@ -16,7 +16,7 @@ When allocating computational resources for the job, please specify
Submit the job using the qsub command:
-```bash
+```console
$ qsub -A Project_ID -q queue -l select=x:ncpus=y,walltime=[[hh:]mm:]ss[.ms] jobscript
```
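The walltime uses the `[[hh:]mm:]ss[.ms]` format, so `walltime=03:00:00` and `walltime=10800` both request three hours. The hedged sketch below converts such a value to seconds, which can help when comparing a request against a queue's time limit; `walltime_to_seconds` is a hypothetical helper, and it simply drops an optional `.ms` fraction.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Convert a PBS-style walltime "[[hh:]mm:]ss[.ms]" to whole seconds.
walltime_to_seconds() {
    local value=${1%%.*}    # drop an optional ".ms" fraction
    local IFS=:             # split the remaining fields on ":"
    local -a parts
    read -r -a parts <<< "$value"
    local seconds=0 part
    for part in "${parts[@]}"; do
        # Each field is worth 60 times the one after it;
        # 10# forces base 10 so that "08" is not read as octal.
        seconds=$(( seconds * 60 + 10#$part ))
    done
    echo "$seconds"
}

walltime_to_seconds 03:00:00    # 10800
walltime_to_seconds 56:00:00    # 201600
walltime_to_seconds 90          # 90
```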
@@ -27,25 +27,25 @@ The qsub submits the job into the queue, in other words the qsub command creat
### Job Submission Examples
-```bash
+```console
$ qsub -A OPEN-0-0 -q qprod -l select=64:ncpus=24,walltime=03:00:00 ./myjob
```
In this example, we allocate 64 nodes, 24 cores per node, for 3 hours. We allocate these resources via the qprod queue, consumed resources will be accounted to the Project identified by Project ID OPEN-0-0. Jobscript myjob will be executed on the first node in the allocation.
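The number of cores in a request is the chunk count times `ncpus` per chunk, summed over all `+`-separated chunk specifications: `select=64:ncpus=24` asks for 64 × 24 = 1536 cores. The hedged sketch below does this bookkeeping for a select string; `total_cores` is a hypothetical helper that ignores every resource except `ncpus` and counts a chunk without `ncpus` as 1 core, which may differ from the defaults PBS actually applies.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Sum the cores requested by a PBS select statement, e.g. "64:ncpus=24"
# or "1:ncpus=24:host=a+1:ncpus=24:host=b". Chunk specs are separated
# by "+" and each starts with a chunk count.
total_cores() {
    local spec=$1 chunk field count ncpus total=0
    local -a chunks fields
    IFS='+' read -r -a chunks <<< "$spec"
    for chunk in "${chunks[@]}"; do
        IFS=':' read -r -a fields <<< "$chunk"
        count=${fields[0]}
        ncpus=1    # assumption: a chunk without ncpus counts as 1 core
        for field in "${fields[@]:1}"; do
            if [[ $field == ncpus=* ]]; then
                ncpus=${field#ncpus=}
            fi
        done
        total=$(( total + count * ncpus ))
    done
    echo "$total"
}

total_cores "64:ncpus=24"                                              # 1536
total_cores "1:ncpus=24:host=r24u35n680+1:ncpus=24:host=r24u36n681"    # 48
```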
-```bash
+```console
$ qsub -q qexp -l select=4:ncpus=24 -I
```
In this example, we allocate 4 nodes, 24 cores per node, for 1 hour. We allocate these resources via the qexp queue. The resources will be available interactively.
-```bash
+```console
$ qsub -A OPEN-0-0 -q qlong -l select=10:ncpus=24 ./myjob
```
In this example, we allocate 10 nodes, 24 cores per node, for 72 hours. We allocate these resources via the qlong queue. Jobscript myjob will be executed on the first node in the allocation.
-```bash
+```console
$ qsub -A OPEN-0-0 -q qfree -l select=10:ncpus=24 ./myjob
```
@@ -57,13 +57,13 @@ To allocate a node with Xeon Phi co-processor, user needs to specify that in sel
The absence of a specialized queue for accessing the nodes with cards means that the Phi cards can be utilized in any queue, including qexp for testing/experiments, qlong for longer jobs, qfree after the project resources have been spent, etc. The Phi cards are thus also available to PRACE users. There's no need to ask for permission to utilize the Phi cards in project proposals.
-```bash
+```console
$ qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=24:accelerator=True:naccelerators=2:accelerator_model=phi7120 ./myjob
```
In this example, we allocate 1 node, with 24 cores, with 2 Xeon Phi 7120p cards, running batch job ./myjob. The default time for qprod is used, i.e. 24 hours.
-```bash
+```console
$ qsub -A OPEN-0-0 -I -q qlong -l select=4:ncpus=24:accelerator=True:naccelerators=2 -l walltime=56:00:00
```
@@ -78,13 +78,13 @@ In this example, we allocate 4 nodes, with 24 cores per node (totalling 96 cores
The UV2000 (node uv1) offers 3328GB of RAM and 112 cores, distributed in 14 NUMA nodes. A NUMA node packs 8 cores and approx. 236GB RAM. In PBS, the UV2000 provides 14 chunks, one chunk per NUMA node (see [Resource allocation policy](resources-allocation-policy/)). The jobs on UV2000 are isolated from each other by cpusets, so that a job by one user may not utilize CPU or memory allocated to a job by another user. Only full chunks are allocated; a job may use only the resources of the NUMA nodes allocated to it.
-```bash
+```console
$ qsub -A OPEN-0-0 -q qfat -l select=14 ./myjob
```
In this example, we allocate all 14 NUMA nodes (corresponds to 14 chunks), 112 cores of the SGI UV2000 node for 72 hours. Jobscript myjob will be executed on the node uv1.
-```bash
+```console
$ qsub -A OPEN-0-0 -q qfat -l select=1:mem=2000GB ./myjob
```
@@ -94,13 +94,13 @@ In this example, we allocate 2000GB of memory on the UV2000 for 72 hours. By req
All qsub options may be [saved directly into the jobscript](#example-jobscript-for-mpi-calculation-with-preloaded-inputs). In such a case, no options to qsub are needed.
-```bash
+```console
$ qsub ./myjob
```
By default, the PBS batch system sends an e-mail only when the job is aborted. Disabling mail events completely can be done like this:
-```bash
+```console
$ qsub -m n
```
@@ -113,13 +113,13 @@ $ qsub -m n
Specific nodes may be selected using PBS resource attribute host (for hostnames):
-```bash
+```console
qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=24:host=r24u35n680+1:ncpus=24:host=r24u36n681 -I
```
Specific nodes may be selected using PBS resource attribute cname (for short names in cns[0-1]+ format):
-```bash
+```console
qsub -A OPEN-0-0 -q qprod -l select=1:ncpus=24:host=cns680+1:ncpus=24:host=cns681 -I
```
@@ -142,7 +142,7 @@ Nodes directly connected to the one InfiniBand switch can be allocated using node
In this example, we request all 9 nodes directly connected to the same switch using node grouping placement.
-```bash
+```console
$ qsub -A OPEN-0-0 -q qprod -l select=9:ncpus=24 -l place=group=switch ./myjob
```
@@ -155,13 +155,13 @@ Nodes directly connected to the specific InfiniBand switch can be selected using
In this example, we request all 9 nodes directly connected to the r4i1s0sw1 switch.
-```bash
+```console
$ qsub -A OPEN-0-0 -q qprod -l select=9:ncpus=24:switch=r4i1s0sw1 ./myjob
```
List of all InfiniBand switches:
-```bash
+```console
$ qmgr -c 'print node @a' | grep switch | awk '{print $6}' | sort -u
r1i0s0sw0
r1i0s0sw1
@@ -169,12 +169,11 @@ r1i1s0sw0
r1i1s0sw1
r1i2s0sw0
...
-...
```
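The field numbers in `awk '{print $6}'` and `awk '{print $3}'` rely on `qmgr -c 'print node ...'` printing each attribute as a `set node <name> resources_available.<attr> = <value>` line, so field 3 is the node name and field 6 the value. The sketch below replays both pipelines on a few hypothetical sample lines (node and switch names borrowed from the listings in this page) so the field positions can be checked without access to the PBS server; the sample text is an assumption, not captured qmgr output.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample in the "set node <name> <attribute> = <value>" layout.
sample_qmgr_output() {
    cat <<'EOF'
set node r36u31n964 resources_available.switch = r36sw3
set node r36u32n965 resources_available.switch = r36sw3
set node r1i0n0 resources_available.switch = r1i0s0sw0
set node r1i0n1 resources_available.switch = r1i0s0sw0
set node r1i0n0 resources_available.ncpus = 24
EOF
}

# Field 6 is the attribute value: list the distinct switches.
sample_qmgr_output | grep switch | awk '{print $6}' | sort -u

# Field 3 is the node name: list the nodes attached to one switch.
sample_qmgr_output | grep 'switch = r36sw3' | awk '{print $3}' | sort
```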
List of all nodes directly connected to the specific InfiniBand switch:
-```bash
+```console
$ qmgr -c 'p n @d' | grep 'switch = r36sw3' | awk '{print $3}' | sort
r36u31n964
r36u32n965
@@ -203,7 +202,7 @@ Nodes located in the same dimension group may be allocated using node grouping o
In this example, we allocate 16 nodes in the same [hypercube dimension](7d-enhanced-hypercube/) 1 group.
-```bash
+```console
$ qsub -A OPEN-0-0 -q qprod -l select=16:ncpus=24 -l place=group=ehc_1d -I
```
@@ -211,7 +210,7 @@ For better understanding:
List of all groups in dimension 1:
-```bash
+```console
$ qmgr -c 'p n @d' | grep ehc_1d | awk '{print $6}' | sort |uniq -c
18 r1i0
18 r1i1
@@ -222,7 +221,7 @@ $ qmgr -c 'p n @d' | grep ehc_1d | awk '{print $6}' | sort |uniq -c
List of all nodes in a specific dimension 1 group:
-```bash
+```console
$ qmgr -c 'p n @d' | grep 'ehc_1d = r1i0' | awk '{print $3}' | sort
r1i0n0
r1i0n1
@@ -236,7 +235,7 @@ r1i0n11
!!! note
    Check the status of your jobs using the **qstat** and **check-pbs-jobs** commands.
-```bash
+```console
$ qstat -a
$ qstat -a -u username
$ qstat -an -u username
@@ -245,7 +244,7 @@ $ qstat -f 12345.isrv5
Example:
-```bash
+```console
$ qstat -a
srv11:
@@ -261,7 +260,7 @@ In this example user1 and user2 are running jobs named job1, job2 and job3x. The
Check the status of your jobs using the check-pbs-jobs command. It checks the presence of the user's PBS job processes on the execution hosts, displays load and processes, displays job standard and error output, and can continuously display (tail -f) job standard or error output.
-```bash
+```console
$ check-pbs-jobs --check-all
$ check-pbs-jobs --print-load --print-processes
$ check-pbs-jobs --print-job-out --print-job-err
@@ -271,7 +270,7 @@ $ check-pbs-jobs --jobid JOBID --tailf-job-out
Examples:
-```bash
+```console
$ check-pbs-jobs --check-all
JOB 35141.dm2, session_id 71995, user user2, nodes r3i6n2,r3i6n3
Check session id: OK
@@ -282,7 +281,7 @@ r3i6n3: No process
In this example we see that job 35141.dm2 currently runs no process on allocated node r3i6n2, which may indicate an execution error.
-```bash
+```console
$ check-pbs-jobs --print-load --print-processes
JOB 35141.dm2, session_id 71995, user user2, nodes r3i6n2,r3i6n3
Print load
@@ -298,7 +297,7 @@ r3i6n2: 99.7 run-task
In this example we see that job 35141.dm2 currently runs process run-task on node r3i6n2, using one thread only, while node r3i6n3 is empty, which may indicate an execution error.
-```bash
+```console
$ check-pbs-jobs --jobid 35141.dm2 --print-job-out
JOB 35141.dm2, session_id 71995, user user2, nodes r3i6n2,r3i6n3
Print job standard output:
@@ -317,19 +316,19 @@ In this example, we see actual output (some iteration loops) of the job 35141.dm
You may release your allocation at any time using the qdel command:
-```bash
+```console
$ qdel 12345.isrv5
```
You may kill a running job by force using the qsig command:
-```bash
+```console
$ qsig -s 9 12345.isrv5
```
Learn more by reading the PBS man page:
-```bash
+```console
$ man pbs_professional
```
@@ -345,7 +344,7 @@ The Jobscript is a user made script, controlling sequence of commands for execut
!!! note
    The jobscript or interactive shell is executed on the first of the allocated nodes.
-```bash
+```console
$ qsub -q qexp -l select=4:ncpus=24 -N Name0 ./myjob
$ qstat -n -u username
@@ -362,7 +361,7 @@ In this example, the nodes r21u01n577, r21u02n578, r21u03n579, r21u04n580 were a
!!! note
    The jobscript or interactive shell is by default executed in the home directory.
-```bash
+```console
$ qsub -q qexp -l select=4:ncpus=24 -I
qsub: waiting for job 15210.isrv5 to start
qsub: job 15210.isrv5 ready
@@ -380,7 +379,7 @@ The allocated nodes are accessible via ssh from login nodes. The nodes may acces
Calculations on allocated nodes may be executed remotely via MPI, ssh, pdsh or clush. You may find out which nodes belong to the allocation by reading the $PBS_NODEFILE file:
-```bash
+```console
qsub -q qexp -l select=2:ncpus=24 -I
qsub: waiting for job 15210.isrv5 to start
qsub: job 15210.isrv5 ready
......
@@ -16,7 +16,7 @@ The network provides **2170MB/s** transfer rates via the TCP connection (single
## Example
-```bash
+```console
$ qsub -q qexp -l select=4:ncpus=16 -N Name0 ./myjob
$ qstat -n -u username
Req'd Req'd Elap
@@ -28,14 +28,14 @@ Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
In this example, we access the node r4i1n0 over the InfiniBand network via the ib0 interface.
-```bash
+```console
$ ssh 10.17.35.19
```
In this example, we get information about the InfiniBand network.
-```bash
+```console
$ ifconfig
....
inet addr:10.17.35.19....
......
@@ -36,14 +36,14 @@ Most of the information needed by PRACE users accessing the Salomon TIER-1 syste
Before you start to use any of the services, don't forget to create a proxy certificate from your certificate:
-```bash
-$ grid-proxy-init
+```console
+$ grid-proxy-init
```
To check whether your proxy certificate is still valid (by default it is valid for 12 hours), use:
-```bash
-$ grid-proxy-info
+```console
+$ grid-proxy-info
```
To access Salomon cluster, two login nodes running GSI SSH service are available. The service is available from public Internet as well as from the internal PRACE network (accessible only from other PRACE partners).
@@ -60,14 +60,14 @@ It is recommended to use the single DNS name salomon-prace.it4i.cz which is dist
| login3-prace.salomon.it4i.cz | 2222 | gsissh | login3 |
| login4-prace.salomon.it4i.cz | 2222 | gsissh | login4 |
-```bash
-$ gsissh -p 2222 salomon-prace.it4i.cz
+```console
+$ gsissh -p 2222 salomon-prace.it4i.cz
```
When logging in from another PRACE system, the prace_service script can be used:
-```bash
-$ gsissh `prace_service -i -s salomon`
+```console
+$ gsissh `prace_service -i -s salomon`
```
#### Access From Public Internet:
@@ -82,27 +82,24 @@ It is recommended to use the single DNS name salomon.it4i.cz which is distribute
| login3-prace.salomon.it4i.cz | 2222 | gsissh | login3 |
| login4-prace.salomon.it4i.cz | 2222 | gsissh | login4 |
-```bash
-$ gsissh -p 2222 salomon.it4i.cz
+```console
+$ gsissh -p 2222 salomon.it4i.cz
```
When logging in from another PRACE system, the prace_service script can be used:
-```bash
-$ gsissh `prace_service -e -s salomon`
+```console
+$ gsissh `prace_service -e -s salomon`
```
Although the preferred and recommended file transfer mechanism is [using GridFTP](prace/#file-transfers), the GSI SSH implementation on Salomon also supports SCP, so gsiscp can be used for transferring small files:
-```bash
-$ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ salomon.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_
-$ gsiscp -P 2222 salomon.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_
-$ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ salomon-prace.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_
-$ gsiscp -P 2222 salomon-prace.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_
+```console
+$ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ salomon.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_
+$ gsiscp -P 2222 salomon.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_
+$ gsiscp -P 2222 _LOCAL_PATH_TO_YOUR_FILE_ salomon-prace.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_
+$ gsiscp -P 2222 salomon-prace.it4i.cz:_SALOMON_PATH_TO_YOUR_FILE_ _LOCAL_PATH_TO_YOUR_FILE_
```
### Access to X11 Applications (VNC)
@@ -111,8 +108,8 @@ If the user needs to run X11 based graphical application and does not have a X11
If the user uses GSI SSH based access, then the procedure is similar to the SSH based access ([look here](../general/accessing-the-clusters/graphical-user-interface/x-window-system/)), only the port forwarding must be done using GSI SSH:
-```bash
-$ gsissh -p 2222 salomon.it4i.cz -L 5961:localhost:5961
+```console
+$ gsissh -p 2222 salomon.it4i.cz -L 5961:localhost:5961
```
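The port in `-L 5961:localhost:5961` follows the usual VNC convention that display `:N` listens on TCP port `5900 + N`, so 5961 corresponds to display `:61`. The hedged sketch below builds the forwarding command for an arbitrary display number; the display value is an assumption (use the one your VNC server actually reports), the helper names are hypothetical, and the script only prints the command instead of running it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# VNC display :N listens on TCP port 5900 + N.
vnc_port() {
    echo $(( 5900 + $1 ))
}

# Print (do not run) the GSI SSH command forwarding a local port to that display.
forward_command() {
    local display=$1 port
    port=$(vnc_port "$display")
    echo "gsissh -p 2222 salomon.it4i.cz -L ${port}:localhost:${port}"
}

forward_command 61
```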
### Access With SSH
@@ -138,26 +135,26 @@ There's one control server and three backend servers for striping and/or backup
Copy files **to** Salomon by running the following commands on your local machine:
-```bash
-$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp-prace.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_
+```console
+$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp-prace.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_
```
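A `file://` URL carries an absolute path, so a local file `/home/me/data.tar` becomes `file:///home/me/data.tar` (three slashes). The hedged sketch below resolves a possibly relative local path before building the source URL for the upload command above; `_YOUR_ACCOUNT_ON_SALOMON_` is kept as the placeholder used on this page, the helper names and the example file are hypothetical, and the script only prints the command.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Turn an existing (possibly relative) local path into a file:// URL.
file_url() {
    local path=$1 dir
    dir=$(cd "$(dirname "$path")" && pwd)    # absolute directory
    echo "file://${dir}/$(basename "$path")"
}

# Build (but do not run) the upload command shown above.
upload_command() {
    local src=$1 dest=$2
    echo "globus-url-copy $(file_url "$src") gsiftp://gridftp-prace.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/${dest}"
}

# Demo with a hypothetical file in a scratch directory.
cd "$(mktemp -d)"
touch data.tar
upload_command ./data.tar data.tar
```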
Or by using prace_service script:
-```bash
-$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -i -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_
+```console
+$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -i -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_
```
Copy files **from** Salomon:
-```bash
-$ globus-url-copy gsiftp://gridftp-prace.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_
+```console
+$ globus-url-copy gsiftp://gridftp-prace.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_
```
Or by using prace_service script:
-```bash
-$ globus-url-copy gsiftp://`prace_service -i -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_
+```console
+$ globus-url-copy gsiftp://`prace_service -i -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_
```
### Access From Public Internet
@@ -171,26 +168,26 @@ Or by using prace_service script:
Copy files **to** Salomon by running the following commands on your local machine:
-```bash
-$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_
+```console
+$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_
```
Or by using prace_service script:
-```bash
-$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -e -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_
+```console
+$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -e -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_
```
Copy files **from** Salomon:
-```bash
-$ globus-url-copy gsiftp://gridftp.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_
+```console
+$ globus-url-copy gsiftp://gridftp.salomon.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_
```
Or by using prace_service script:
-```bash
-$ globus-url-copy gsiftp://`prace_service -e -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_
+```console
+$ globus-url-copy gsiftp://`prace_service -e -f salomon`/home/prace/_YOUR_ACCOUNT_ON_SALOMON_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_
```
Generally both shared file systems are available through GridFTP:
@@ -222,8 +219,8 @@ All system wide installed software on the cluster is made available to the users
PRACE users can use the "prace" module to use the [PRACE Common Production Environment](http://www.prace-ri.eu/prace-common-production-environment/).
-```bash
-$ module load prace
+```console
+$ module load prace
```