Commit 2172f360 authored by Lukáš Krupčík

update

parent e4dc6645
Pipeline #1882 passed with stages
in 1 minute and 7 seconds
@@ -7,11 +7,11 @@ In many cases, it is useful to submit huge (>100+) number of computational jobs
However, executing a huge number of jobs via the PBS queue may strain the system. This strain may result in slow response to commands, inefficient scheduling and overall degradation of performance and user experience for all users. For this reason, the number of jobs is **limited to 100 per user, 1000 per job array**.
!!! Note
    Please follow one of the procedures below if you wish to schedule more than 100 jobs at a time.
* Use [Job arrays](capacity-computing/#job-arrays) when running a huge number of [multithread](capacity-computing/#shared-jobscript-on-one-node) (bound to one node only) or multinode (multithread across several nodes) jobs
* Use [GNU parallel](capacity-computing/#gnu-parallel) when running single core jobs
* Combine [GNU parallel with Job arrays](capacity-computing/#job-arrays-and-gnu-parallel) when running a huge number of single core jobs
## Policy
@@ -21,13 +21,13 @@ However, executing huge number of jobs via the PBS queue may strain the system.
## Job Arrays
!!! Note
    A huge number of jobs may easily be submitted and managed as a job array.
A job array is a compact representation of many jobs, called subjobs. The subjobs share the same job script, and have the same values for all attributes and resources, with the following exceptions:
* each subjob has a unique index, $PBS_ARRAY_INDEX
* job identifiers of subjobs only differ by their indices
* the state of subjobs can differ (R, Q, etc.)
All subjobs within a job array have the same scheduling priority and schedule as independent jobs. The entire job array is submitted through a single qsub command and may be managed by the qdel, qalter, qhold, qrls and qsig commands as a single job.
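For orientation, a minimal sketch of submitting such an array (JOBNAME and jobscript are placeholders; the 1-900 range matches the 900-task example below, and the returned identifier has the same form as the 12345[].dm2 used in the listings further down):

```bash
$ qsub -N JOBNAME -J 1-900 jobscript
12345[].dm2
```

The -J option defines the range of subjob indices; each subjob then finds its own index in $PBS_ARRAY_INDEX.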
@@ -39,7 +39,7 @@ Example:
Assume we have 900 input files with names beginning with "file" (e.g. file001, ..., file900), and that we would like to use each of these input files with the program executable myprog.x, each as a separate job.
First, we create a tasklist file (or subjobs list), listing all tasks (subjobs) - all input files in our example:
```bash
$ find . -name 'file*' > tasklist
@@ -103,8 +103,8 @@ $ qstat -a 12345[].dm2
dm2:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
12345[].dm2     user2    qprod    xx          13516   1  16    --  00:50 B 00:02
```
The status B means that some subjobs are already running.
@@ -116,14 +116,14 @@ $ qstat -a 12345[1-100].dm2
dm2:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
12345[1].dm2    user2    qprod    xx          13516   1  16    --  00:50 R 00:02
12345[2].dm2    user2    qprod    xx          13516   1  16    --  00:50 R 00:02
12345[3].dm2    user2    qprod    xx          13516   1  16    --  00:50 R 00:01
12345[4].dm2    user2    qprod    xx          13516   1  16    --  00:50 Q   --
     .             .        .        .          .     .   .     .    .   .   .
     .             .        .        .          .     .   .     .    .   .   .
12345[100].dm2  user2    qprod    xx          13516   1  16    --  00:50 Q   --
```
Delete the entire job array. Running subjobs will be killed, queued subjobs will be deleted.
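A minimal sketch, assuming the array identifier 12345[].dm2 from the listings above:

```bash
$ qdel 12345[].dm2
```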
@@ -150,7 +150,7 @@ Read more on job arrays in the [PBSPro Users guide](../../pbspro-documentation/)
## GNU Parallel
!!! Note
    Use GNU parallel to run many single core tasks on one node.
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. GNU parallel is most useful for running single core jobs via the queue system on Anselm.
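As a quick illustration of the tool itself (a sketch only; tasklist and myprog.x are the example names used elsewhere on this page, and the availability of a parallel module is an assumption):

```bash
$ module load parallel
$ parallel -j 16 ./myprog.x {} < tasklist
```

This runs ./myprog.x once for every line of tasklist, keeping 16 tasks running at a time on the local node.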
@@ -169,7 +169,7 @@ Example:
Assume we have 101 input files with names beginning with "file" (e.g. file001, ..., file101), and that we would like to use each of these input files with the program executable myprog.x, each as a separate single core job. We call these single core jobs tasks.
First, we create a tasklist file, listing all tasks - all input files in our example:
```bash
$ find . -name 'file*' > tasklist
@@ -237,7 +237,7 @@ Example:
Assume we have 992 input files with names beginning with "file" (e.g. file001, ..., file992), and that we would like to use each of these input files with the program executable myprog.x, each as a separate single core job. We call these single core jobs tasks.
First, we create a tasklist file, listing all tasks - all input files in our example:
```bash
$ find . -name 'file*' > tasklist
@@ -265,7 +265,7 @@ SCR=/lscratch/$PBS_JOBID/$PARALLEL_SEQ
mkdir -p $SCR ; cd $SCR || exit
# get individual task from tasklist with index from PBS JOB ARRAY and index from GNU parallel
IDX=$(($PBS_ARRAY_INDEX + $PARALLEL_SEQ - 1))
TASK=$(sed -n "${IDX}p" $PBS_O_WORKDIR/tasklist)
[ -z "$TASK" ] && exit
......
@@ -6,46 +6,46 @@ Anselm is cluster of x86-64 Intel based nodes built on Bull Extreme Computing bu
### Compute Nodes Without Accelerator
* 180 nodes
* 2880 cores in total
* two Intel Sandy Bridge E5-2665, 8-core, 2.4 GHz processors per node
* 64 GB of physical memory per node
* one 500 GB SATA 2.5" 7.2 krpm HDD per node
* bullx B510 blade servers
* cn[1-180]
### Compute Nodes With GPU Accelerator
* 23 nodes
* 368 cores in total
* two Intel Sandy Bridge E5-2470, 8-core, 2.3 GHz processors per node
* 96 GB of physical memory per node
* one 500 GB SATA 2.5" 7.2 krpm HDD per node
* GPU accelerator 1x NVIDIA Tesla Kepler K20 per node
* bullx B515 blade servers
* cn[181-203]
### Compute Nodes With MIC Accelerator
* 4 nodes
* 64 cores in total
* two Intel Sandy Bridge E5-2470, 8-core, 2.3 GHz processors per node
* 96 GB of physical memory per node
* one 500 GB SATA 2.5" 7.2 krpm HDD per node
* MIC accelerator 1x Intel Phi 5110P per node
* bullx B515 blade servers
* cn[204-207]
### Fat Compute Nodes
* 2 nodes
* 32 cores in total
* 2 Intel Sandy Bridge E5-2665, 8-core, 2.4 GHz processors per node
* 512 GB of physical memory per node
* two 300 GB SAS 3.5" 15 krpm HDD (RAID1) per node
* two 100 GB SLC SSD per node
* bullx R423-E3 servers
* cn[208-209]
![](../img/bullxB510.png)
**Figure Anselm bullx B510 servers**
@@ -53,7 +53,7 @@ Anselm is cluster of x86-64 Intel based nodes built on Bull Extreme Computing bu
### Compute Nodes Summary
| Node type                  | Count | Range       | Memory | Cores        | [Access](resources-allocation-policy/) |
| -------------------------- | ----- | ----------- | ------ | ------------ | --------------------------------------- |
| Nodes without accelerator  | 180   | cn[1-180]   | 64 GB  | 16 @ 2.4 GHz | qexp, qprod, qlong, qfree                |
| Nodes with GPU accelerator | 23    | cn[181-203] | 96 GB  | 16 @ 2.3 GHz | qgpu, qprod                              |
| Nodes with MIC accelerator | 4     | cn[204-207] | 96 GB  | 16 @ 2.3 GHz | qmic, qprod                              |
@@ -65,23 +65,23 @@ Anselm is equipped with Intel Sandy Bridge processors Intel Xeon E5-2665 (nodes
### Intel Sandy Bridge E5-2665 Processor
* eight-core
* speed: 2.4 GHz, up to 3.1 GHz using Turbo Boost Technology
* peak performance: 19.2 GFLOP/s per core
* caches:
  * L2: 256 KB per core
  * L3: 20 MB per processor
* memory bandwidth at the level of the processor: 51.2 GB/s
### Intel Sandy Bridge E5-2470 Processor
* eight-core
* speed: 2.3 GHz, up to 3.1 GHz using Turbo Boost Technology
* peak performance: 18.4 GFLOP/s per core (see the derivation below)
* caches:
  * L2: 256 KB per core
  * L3: 20 MB per processor
* memory bandwidth at the level of the processor: 38.4 GB/s
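The per-core peak figures quoted for both processors are simply the clock frequency multiplied by the 8 double-precision floating-point operations a Sandy Bridge core can issue per cycle with AVX (one 4-wide add plus one 4-wide multiply); a back-of-the-envelope check:

```
8 FLOP/cycle x 2.4 GHz = 19.2 GFLOP/s per core  (E5-2665; 153.6 GFLOP/s per 8-core processor)
8 FLOP/cycle x 2.3 GHz = 18.4 GFLOP/s per core  (E5-2470; 147.2 GFLOP/s per 8-core processor)
```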
Nodes equipped with the Intel Xeon E5-2665 CPU have the PBS resource attribute cpu_freq set to 24; nodes equipped with the Intel Xeon E5-2470 CPU have cpu_freq set to 23.
@@ -101,30 +101,30 @@ Intel Turbo Boost Technology is used by default, you can disable it for all nod
### Compute Node Without Accelerator
* 2 sockets
* Memory Controllers are integrated into processors.
  * 8 DDR3 DIMMs per node
  * 4 DDR3 DIMMs per CPU
  * 1 DDR3 DIMM per channel
  * Data rate support: up to 1600 MT/s
* Populated memory: 8 x 8 GB DDR3 DIMM 1600 MHz
### Compute Node With GPU or MIC Accelerator
* 2 sockets
* Memory Controllers are integrated into processors.
  * 6 DDR3 DIMMs per node
  * 3 DDR3 DIMMs per CPU
  * 1 DDR3 DIMM per channel
  * Data rate support: up to 1600 MT/s
* Populated memory: 6 x 16 GB DDR3 DIMM 1600 MHz
### Fat Compute Node
* 2 sockets
* Memory Controllers are integrated into processors.
  * 16 DDR3 DIMMs per node
  * 8 DDR3 DIMMs per CPU
  * 2 DDR3 DIMMs per channel
  * Data rate support: up to 1600 MT/s
* Populated memory: 16 x 32 GB DDR3 DIMM 1600 MHz
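To see how the memory configuration described above is presented on an allocated node, something like the following can be used (a sketch; the presence of numactl on the compute nodes is an assumption):

```bash
$ numactl --hardware   # NUMA nodes, per-socket memory sizes and distances
$ free -g              # total memory on the node in GB
```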
@@ -16,7 +16,7 @@ fi
alias qs='qstat -a'
module load PrgEnv-gnu
# Display information to standard output - only in interactive ssh session
if [ -n "$SSH_TTY" ]
then
module list # Display loaded modules
@@ -24,14 +24,14 @@ fi
```
!!! Note
    Do not run commands outputting to standard output (echo, module list, etc.) in .bashrc for non-interactive SSH sessions. It breaks fundamental functionality (scp, PBS) of your account! Consider utilizing SSH session interactivity for such commands, as shown in the previous example.
### Application Modules
In order to configure your shell for running a particular application on Anselm, we use the Module package interface, as illustrated below.
!!! Note
    The modules set up the application paths, library paths and environment variables for running a particular application.
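A minimal sketch of the usual workflow (PrgEnv-gnu is the module already loaded in the .bashrc example above; other module names will differ per application):

```bash
$ module avail                # list the modules available on Anselm
$ module load PrgEnv-gnu      # set up paths and variables for the GNU toolchain
$ module list                 # show the currently loaded modules
$ module unload PrgEnv-gnu    # remove the module from the environment
```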
We also have a second modules repository, created using a tool called EasyBuild. On the Salomon cluster, all modules will be built by this tool. If you want to use software from this modules repository, please follow the instructions in the section [Application Modules Path Expansion](environment-and-modules/#EasyBuild).
......
@@ -12,10 +12,10 @@ The cluster compute nodes cn[1-207] are organized within 13 chassis.
There are four types of compute nodes:
* 180 compute nodes without an accelerator
* 23 compute nodes with GPU accelerator - equipped with NVIDIA Tesla Kepler K20
* 4 compute nodes with MIC accelerator - equipped with Intel Xeon Phi 5110P
* 2 fat nodes - equipped with 512 GB RAM and two 100 GB SSD drives
[More about Compute nodes](compute-nodes/).
@@ -31,7 +31,7 @@ The user access to the Anselm cluster is provided by two login nodes login1, log
The parameters are summarized in the following tables:
| **In general**                |                                              |
| ----------------------------- | -------------------------------------------- |
| Primary purpose               | High Performance Computing                   |
| Architecture of compute nodes | x86-64                                       |
| Operating system              | Linux                                        |
@@ -39,7 +39,7 @@ The parameters are summarized in the following tables:
| Total            | 209                                          |
| Processor cores  | 16 (2 x 8 cores)                             |
| RAM              | min. 64 GB, min. 4 GB per core               |
| Local disk drive | yes - usually 500 GB                         |
| Compute network  | InfiniBand QDR, fully non-blocking, fat-tree |
| w/o accelerator  | 180, cn[1-180]                               |
| GPU accelerated  | 23, cn[181-203]                              |
@@ -51,10 +51,10 @@ The parameters are summarized in the following tables:
| Total amount of RAM | 15.136 TB |
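For a quick cross-check against the node counts listed earlier: 180 x 64 GB + 23 x 96 GB + 4 x 96 GB + 2 x 512 GB = 15136 GB, i.e. 15.136 TB.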
| Node             | Processor                               | Memory | Accelerator          |
| ---------------- | --------------------------------------- | ------ | -------------------- |
| w/o accelerator  | 2 x Intel Sandy Bridge E5-2665, 2.4 GHz | 64 GB  | -                    |
| GPU accelerated  | 2 x Intel Sandy Bridge E5-2470, 2.3 GHz | 96 GB  | NVIDIA Kepler K20    |
| MIC accelerated  | 2 x Intel Sandy Bridge E5-2470, 2.3 GHz | 96 GB  | Intel Xeon Phi 5110P |
| Fat compute node | 2 x Intel Sandy Bridge E5-2665, 2.4 GHz | 512 GB | -                    |
For more details please refer to the [Compute nodes](compute-nodes/), [Storage](storage/), and [Network](network/).
@@ -36,7 +36,7 @@ Usage counts allocated core-hours (`ncpus x walltime`). Usage is decayed, or cut
Jobs queued in the qexp queue are not counted toward the project's usage.
!!! Note
    Calculated usage and fair-share priority can be seen at <https://extranet.it4i.cz/anselm/projects>.
Calculated fair-share priority can also be seen as the Resource_List.fairshare attribute of a job.
@@ -65,6 +65,6 @@ The scheduler makes a list of jobs to run in order of execution priority. Schedu
This means that jobs with lower execution priority can be run before jobs with higher execution priority.
!!! Note
    It is **very beneficial to specify the walltime** when submitting jobs.
Specifying a more accurate walltime enables better scheduling, better execution times and better resource usage. Jobs with a suitable (small) walltime may be backfilled - and overtake job(s) with higher priority.
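For example, a sketch of a submission with an explicit walltime (the project ID, node count and walltime value are placeholders):

```bash
$ qsub -A PROJECT-ID -q qprod -l select=2:ncpus=16 -l walltime=01:30:00 ./myjob
```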
@@ -77,7 +77,7 @@ In this example, we allocate nodes cn171 and cn172, all 16 cores per node, for 2
Nodes equipped with the Intel Xeon E5-2665 CPU have a base clock frequency of 2.4 GHz; nodes equipped with the Intel Xeon E5-2470 CPU have a base frequency of 2.3 GHz (see the Compute Nodes section for details). Nodes may be selected via the PBS resource attribute cpu_freq, as in the example after the table below.
| CPU Type           | base freq. | Nodes                  | cpu_freq attribute |
| ------------------ | ---------- | ---------------------- | ------------------ |
| Intel Xeon E5-2665 | 2.4 GHz    | cn[1-180], cn[208-209] | 24                 |
| Intel Xeon E5-2470 | 2.3 GHz    | cn[181-207]            | 23                 |
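A sketch of selecting the 2.4 GHz nodes through this attribute (the project ID is a placeholder; -I requests an interactive job):

```bash
$ qsub -A PROJECT-ID -q qprod -l select=4:ncpus=16:cpu_freq=24 -I
```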
@@ -150,10 +150,10 @@ $ qstat -a
srv11:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
16287.srv11     user1    qlong    job1         6183   4  64    --  144:0 R 38:25
16468.srv11     user1    qlong    job2         8060   4  64    --  144:0 R 17:44
16547.srv11     user2    qprod    job3x       13516   2  32    --  48:00 R 00:58
```
In this example, user1 and user2 are running jobs named job1, job2 and job3x. The jobs job1 and job2 use 4 nodes, 16 cores per node each. Job1 has already been running for 38 hours and 25 minutes, job2 for 17 hours and 44 minutes. Job1 has already consumed `64 x 38.42 ≈ 2459` core hours, and job3x `32 x 0.97 ≈ 31` core hours. These consumed core hours will be accounted on the respective project accounts, regardless of whether the allocated cores were actually used for computations.
@@ -253,8 +253,8 @@ $ qstat -n -u username
srv11:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
15209.srv11     username qexp     Name0        5530   4  64    --  01:00 R 00:00
   cn17/0*16+cn108/0*16+cn109/0*16+cn110/0*16
```
......
@@ -9,7 +9,7 @@ All compute and login nodes of Anselm are interconnected by a high-bandwidth, lo
The compute nodes may be accessed via the InfiniBand network using the ib0 network interface, in the address range 10.2.1.1-209. MPI may be used to establish a native InfiniBand connection among the nodes.
!!! Note
    The network provides **2170 MB/s** transfer rates via the TCP connection (single stream) and up to **3600 MB/s** via native InfiniBand protocol.
The Fat tree topology ensures that peak transfer rates are achieved between any two nodes, independent of network traffic exchanged among other nodes concurrently.
@@ -24,8 +24,8 @@ $ qsub -q qexp -l select=4:ncpus=16 -N Name0 ./myjob
$ qstat -n -u username
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
15209.srv11     username qexp     Name0        5530   4  64    --  01:00 R 00:00
   cn17/0*16+cn108/0*16+cn109/0*16+cn110/0*16
$ ssh 10.2.1.110
......
@@ -28,11 +28,11 @@ The user will need a valid certificate and to be present in the PRACE LDAP (plea
Most of the information needed by PRACE users accessing the Anselm TIER-1 system can be found here:
* [General user's FAQ](http://www.prace-ri.eu/Users-General-FAQs)
* [Certificates FAQ](http://www.prace-ri.eu/Certificates-FAQ)
* [Interactive access using GSISSH](http://www.prace-ri.eu/Interactive-Access-Using-gsissh)
* [Data transfer with GridFTP](http://www.prace-ri.eu/Data-Transfer-with-GridFTP-Details)
* [Data transfer with gtransfer](http://www.prace-ri.eu/Data-Transfer-with-gtransfer)
Before you start to use any of