Commit 899a6f1d authored by Lukáš Krupčík


parent d018d740
Pipeline #1956 passed with stages in 1 minute
......@@ -101,10 +101,10 @@ Check status of the job array by the qstat command.
$ qstat -a 12345[].dm2
dm2:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -- |---|---| ------ --- --- ------ ----- - -----
12345[].dm2 user2 qprod xx 13516 1 16 -- 00:50 B 00:02
```
The status B means that some subjobs are already running.
......@@ -114,16 +114,16 @@ Check status of the first 100 subjobs by the qstat command.
$ qstat -a 12345[1-100].dm2
dm2:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -- |---|---| ------ --- --- ------ ----- - -----
12345[1].dm2 user2 qprod xx 13516 1 16 -- 00:50 R 00:02
12345[2].dm2 user2 qprod xx 13516 1 16 -- 00:50 R 00:02
12345[3].dm2 user2 qprod xx 13516 1 16 -- 00:50 R 00:01
12345[4].dm2 user2 qprod xx 13516 1 16 -- 00:50 Q --
. . . . . . . . . . .
. . . . . . . . . . .
12345[100].dm2 user2 qprod xx 13516 1 16 -- 00:50 Q --
```
Delete the entire job array. Running subjobs will be killed, queued subjobs will be deleted.
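For example, with the job array ID from the listing above (a sketch; your actual job ID will differ):

```bash
$ qdel 12345[].dm2
```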
......@@ -152,7 +152,7 @@ Read more on job arrays in the [PBSPro Users guide](../../pbspro-documentation/)
!!! note
Use GNU parallel to run many single core tasks on one node.
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. GNU parallel is most useful in running single core jobs via the queue system on Anselm.
For more information and examples see the parallel man page:
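A minimal sketch, assuming GNU parallel is provided as an environment module:

```bash
$ module add parallel
$ man parallel
```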
......@@ -197,7 +197,7 @@ TASK=$1
cp $PBS_O_WORKDIR/$TASK input
# execute the calculation
cat input > output
# copy output file to submit directory
cp output $PBS_O_WORKDIR/$TASK.out
......@@ -214,7 +214,7 @@ $ qsub -N JOBNAME jobscript
12345.dm2
```
In this example, we submit a job of 101 tasks. 16 input files will be processed in parallel. The 101 tasks on 16 cores are assumed to complete in less than 2 hours.
!!! hint
Use #PBS directives at the beginning of the jobscript file, and don't forget to set your valid PROJECT_ID and desired queue.
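A minimal sketch of such a jobscript header; PROJECT_ID, the queue and the resource values are placeholders to be adjusted:

```bash
#!/bin/bash
#PBS -A PROJECT_ID
#PBS -q qprod
#PBS -l select=1:ncpus=16
#PBS -l walltime=02:00:00
```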
......@@ -279,10 +279,10 @@ cat input > output
cp output $PBS_O_WORKDIR/$TASK.out
```
In this example, the jobscript executes in multiple instances in parallel, on all cores of a computing node. The variable $TASK expands to one of the input filenames from tasklist. We copy the input file to local scratch, execute myprog.x, and copy the output file back to the submit directory, under the $TASK.out name. The numtasks file controls how many tasks will be run per subjob. Once a task is finished, a new task starts, until the number of tasks in the numtasks file is reached.
!!! note
Select subjob walltime and number of tasks per subjob carefully
When deciding on these values, think about the following guiding rules:
......
......@@ -28,7 +28,7 @@ fi
### Application Modules
In order to configure your shell for running a particular application on Anselm, we use the Module package interface.
!!! note
The modules set up the application paths, library paths and environment variables for running a particular application.
......@@ -43,7 +43,7 @@ To check available modules use
$ module avail
```
To load a module, for example the octave module, use
```bash
$ module load octave
......
# Hardware Overview
The Anselm cluster consists of 209 computational nodes named cn[1-209], of which 180 are regular compute nodes, 23 are GPU Kepler K20 accelerated nodes, 4 are MIC Xeon Phi 5110P accelerated nodes and 2 are fat nodes. Each node is a powerful x86-64 computer, equipped with 16 cores (two eight-core Intel Sandy Bridge processors), at least 64 GB RAM, and a local hard drive. User access to the Anselm cluster is provided by two login nodes login[1,2]. The nodes are interlinked by high-speed InfiniBand and Ethernet networks. All nodes share 320 TB /home disk storage to store the user files. The 146 TB shared /scratch storage is available for the scratch data.
The fat nodes are equipped with a large amount (512 GB) of memory. Virtualization infrastructure provides resources to run long-term servers and services in virtual mode. Fat nodes and virtual servers may access 45 TB of dedicated block storage. Accelerated nodes, fat nodes, and virtualization infrastructure are available [upon request](https://support.it4i.cz/rt) made by a PI.
......
......@@ -2,7 +2,7 @@
Welcome to Anselm supercomputer cluster. The Anselm cluster consists of 209 compute nodes, totaling 3344 compute cores with 15 TB RAM and giving over 94 TFLOP/s theoretical peak performance. Each node is a powerful x86-64 computer, equipped with 16 cores, at least 64 GB RAM, and 500 GB hard disk drive. Nodes are interconnected by fully non-blocking fat-tree InfiniBand network and equipped with Intel Sandy Bridge processors. A few nodes are also equipped with NVIDIA Kepler GPU or Intel Xeon Phi MIC accelerators. Read more in [Hardware Overview](hardware-overview/).
The cluster runs an [operating system](software/operating-system/) which is compatible with the RedHat [Linux family](http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg). We have installed a wide range of software packages targeted at different scientific domains. These packages are accessible via the [modules environment](environment-and-modules/).
User data shared file-system (HOME, 320 TB) and job data shared file-system (SCRATCH, 146 TB) are available to users.
......
......@@ -40,13 +40,13 @@ In this example, we allocate 4 nodes, 16 cores per node, for 1 hour. We allocate
$ qsub -A OPEN-0-0 -q qnvidia -l select=10:ncpus=16 ./myjob
```
In this example, we allocate 10 NVIDIA accelerated nodes, 16 cores per node, for 24 hours. We allocate these resources via the qnvidia queue. Jobscript myjob will be executed on the first node in the allocation.
```bash
$ qsub -A OPEN-0-0 -q qfree -l select=10:ncpus=16 ./myjob
```
In this example, we allocate 10 nodes, 16 cores per node, for 12 hours. We allocate these resources via the qfree queue. It is not required that the project OPEN-0-0 has any available resources left. Consumed resources are still accounted for. Jobscript myjob will be executed on the first node in the allocation.
All qsub options may be [saved directly into the jobscript](job-submission-and-execution/#PBSsaved). In such a case, no options to qsub are needed.
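For instance, a jobscript carrying its own options might start as sketched below and can then be submitted with a plain qsub (the values are illustrative):

```bash
$ head -4 ./myjob
#!/bin/bash
#PBS -A OPEN-0-0
#PBS -q qprod
#PBS -l select=10:ncpus=16

$ qsub ./myjob
```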
......@@ -126,7 +126,7 @@ In the following example, we select an allocation for benchmarking a very specia
-N Benchmark ./mybenchmark
```
The MPI processes will be distributed differently on the nodes connected to the two switches. On the isw10 nodes, we will run 1 MPI process per node with 16 threads per process; on the isw20 nodes, we will run 16 plain MPI processes.
Although this example is somewhat artificial, it demonstrates the flexibility of the qsub command options.
......@@ -148,12 +148,12 @@ Example:
$ qstat -a
srv11:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -- |---|---| ------ --- --- ------ ----- - -----
16287.srv11 user1 qlong job1 6183 4 64 -- 144:0 R 38:25
16468.srv11 user1 qlong job2 8060 4 64 -- 144:0 R 17:44
16547.srv11 user2 qprod job3x 13516 2 32 -- 48:00 R 00:58
```
In this example user1 and user2 are running jobs named job1, job2 and job3x. The jobs job1 and job2 are using 4 nodes, 16 cores per node each. The job1 already runs for 38 hours and 25 minutes, job2 for 17 hours 44 minutes. The job1 already consumed `64 x 38.41 = 2458.6` core hours. The job3x already consumed `0.96 x 32 = 30.93` core hours. These consumed core hours will be accounted on the respective project accounts, regardless of whether the allocated cores were actually used for computations.
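The quoted core-hour figures follow from cores multiplied by elapsed hours; a quick way to reproduce them on the command line (a sketch):

```bash
$ echo "64 * (38 + 25/60)" | bc -l    # job1: 64 cores for 38h 25m
$ echo "32 * (58/60)" | bc -l         # job3x: 32 cores for 58m
```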
......@@ -251,10 +251,10 @@ $ qsub -q qexp -l select=4:ncpus=16 -N Name0 ./myjob
$ qstat -n -u username
srv11:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -- |---|---| ------ --- --- ------ ----- - -----
15209.srv11 username qexp Name0 5530 4 64 -- 01:00 R 00:00
cn17/0*16+cn108/0*16+cn109/0*16+cn110/0*16
```
......
......@@ -22,10 +22,10 @@ The compute nodes may be accessed via the regular Gigabit Ethernet network inter
```bash
$ qsub -q qexp -l select=4:ncpus=16 -N Name0 ./myjob
$ qstat -n -u username
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -- |---|---| ------ --- --- ------ ----- - -----
15209.srv11 username qexp Name0 5530 4 64 -- 01:00 R 00:00
cn17/0*16+cn108/0*16+cn109/0*16+cn110/0*16
$ ssh 10.2.1.110
......
......@@ -137,7 +137,7 @@ Copy files **to** Anselm by running the following commands on your local machine
$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp-prace.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_
```
Or by using the prace_service script:
```bash
$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -i -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_
......@@ -149,7 +149,7 @@ Copy files **from** Anselm:
$ globus-url-copy gsiftp://gridftp-prace.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_
```
Or by using the prace_service script:
```bash
$ globus-url-copy gsiftp://`prace_service -i -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_
......@@ -170,7 +170,7 @@ Copy files **to** Anselm by running the following commands on your local machine
$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://gridftp.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_
```
Or by using the prace_service script:
```bash
$ globus-url-copy file://_LOCAL_PATH_TO_YOUR_FILE_ gsiftp://`prace_service -e -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_
......@@ -182,7 +182,7 @@ Copy files **from** Anselm:
$ globus-url-copy gsiftp://gridftp.anselm.it4i.cz:2812/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_
```
Or by using the prace_service script:
```bash
$ globus-url-copy gsiftp://`prace_service -e -f anselm`/home/prace/_YOUR_ACCOUNT_ON_ANSELM_/_PATH_TO_YOUR_FILE_ file://_LOCAL_PATH_TO_YOUR_FILE_
......
......@@ -98,7 +98,7 @@ $ ssh login2.anselm.it4i.cz -L 5901:localhost:5901
```
x-window-system/
If you use Windows and PuTTY, please refer to the port forwarding setup in the documentation:
[x-window-and-vnc#section-12](../get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/)
#### 7. If You Don't Have Turbo VNC Installed on Your Workstation
......
......@@ -59,9 +59,9 @@ Options:
--get-node-ncpu-chart
Print chart of allocated ncpus per node
--summary Print summary
--get-server-details Print server
--get-queues Print queues
--get-queues-details Print queues details
--get-reservations Print reservations
--get-reservations-details
Print reservations details
......@@ -92,7 +92,7 @@ Options:
--get-user-ncpus Print number of allocated ncpus per user
--get-qlist-nodes Print qlist nodes
--get-qlist-nodeset Print qlist nodeset
--get-ibswitch-nodes Print ibswitch nodes
--get-ibswitch-nodeset
Print ibswitch nodeset
--state=STATE Only for given job state
......
......@@ -47,7 +47,7 @@ After logging in, you will see the command prompt:
http://www.it4i.cz/?lang=en
Last login: Tue Jul 9 15:57:38 2013 from your-host.example.com
[username@login2.anselm ~]$
```
......@@ -194,7 +194,7 @@ Once the proxy server is running, establish ssh port forwarding from Anselm to t
local $ ssh -R 6000:localhost:1080 anselm.it4i.cz
```
Now, configure the application's proxy settings to **localhost:6000**. Use port forwarding to access the [proxy server from compute nodes](#port-forwarding-from-compute-nodes) as well.
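A sketch of reaching the proxy from a compute node by forwarding the port once more (login1 stands for any login node):

```bash
$ ssh -TN -f -L 6000:localhost:6000 login1
```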
## Graphical User Interface
......
......@@ -62,11 +62,11 @@ The appropriate dimension of the problem has to be set by parameter (2d/3d).
fluent solver_version [FLUENT_options] -i journal_file -pbs
```
This syntax will start the ANSYS FLUENT job under PBS Professional using the qsub command in a batch manner. When resources are available, PBS Professional will start the job and return a job ID, usually in the form of _job_ID.hostname_. This job ID can then be used to query, control, or stop the job using standard PBS Professional commands, such as qstat or qdel. The job will be run out of the current working directory, and all output will be written to the file fluent.o _job_ID_.
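A hypothetical invocation following this syntax; the 3d solver choice and the journal file name are placeholders:

```bash
$ fluent 3d -i fluent.jou -pbs
```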
## Running Fluent via User's Config File
The sample script uses a configuration file called pbs_fluent.conf if no command line arguments are present. This configuration file should be present in the directory from which the jobs are submitted (which is also the directory in which the jobs are executed). The following is an example of what the content of pbs_fluent.conf can be:
```bash
input="example_small.flin"
......
# ANSYS LS-DYNA
**[ANSYS LS-DYNA](http://www.ansys.com/products/structures/ansys-ls-dyna)** software provides convenient and easy-to-use access to the technology-rich, time-tested explicit solver without the need to contend with the complex input requirements of this sophisticated program. Introduced in 1996, ANSYS LS-DYNA capabilities have helped customers in numerous industries to resolve highly intricate design issues. ANSYS Mechanical users have been able to take advantage of complex explicit solutions for a long time utilizing the traditional ANSYS Parametric Design Language (APDL) environment. These explicit capabilities are available to ANSYS Workbench users as well. The Workbench platform is a powerful, comprehensive, easy-to-use environment for engineering simulation. CAD import from all sources, geometry cleanup, automatic meshing, solution, parametric optimization, result visualization and comprehensive report generation are all available within a single, fully interactive, modern graphical user environment.
To run ANSYS LS-DYNA in batch mode you can utilize/modify the default ansysdyna.pbs script and execute it via the qsub command.
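For example, assuming the script has been copied into the working directory and adjusted:

```bash
$ qsub ansysdyna.pbs
```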
......
......@@ -2,7 +2,7 @@
**[SVS FEM](http://www.svsfem.cz/)**, as the **[ANSYS Channel partner](http://www.ansys.com/)** for the Czech Republic, provided all ANSYS licenses for the ANSELM cluster and supports all ANSYS Products (Multiphysics, Mechanical, MAPDL, CFX, Fluent, Maxwell, LS-DYNA...) for IT staff and ANSYS users. If you encounter a problem with ANSYS functionality, please contact [hotline@svsfem.cz](mailto:hotline@svsfem.cz?subject=Ostrava%20-%20ANSELM).
Anselm provides commercial as well as academic variants. Academic variants are distinguished by the word "**Academic...**" in the license name or by the two-letter prefix "**aa\_**" in the license feature name. The license can be changed on the command line or directly in the user's PBS file (see individual products). [More about licensing here](ansys/licensing/)
To load the latest version of any ANSYS product (Mechanical, Fluent, CFX, MAPDL,...) load the module:
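A sketch; the exact module name can be checked first with module avail:

```bash
$ module avail ansys
$ module load ansys
```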
......
......@@ -33,7 +33,7 @@ Compilation parameters are default:
Molpro is compiled for parallel execution using MPI and OpenMP. By default, Molpro reads the number of allocated nodes from PBS and launches a data server on one node. On the remaining allocated nodes, compute processes are launched, one process per node, each with 16 threads. You can modify this behavior by using -n, -t and helper-server options. Please refer to the [Molpro documentation](http://www.molpro.net/info/2010.1/doc/manual/node9.html) for more details.
!!! note
The OpenMP parallelization in Molpro is limited and has been observed to produce limited scaling. We therefore recommend to use MPI parallelization only. This can be achieved by passing option mpiprocs=16:ompthreads=1 to PBS.
You are advised to use the -d option to point to a directory in [SCRATCH file system](../../storage/storage/). Molpro can produce a large amount of temporary data during its run, and it is important that these are placed in the fast scratch file system.
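A hypothetical run line reflecting this recommendation; the input name and the scratch path are placeholders:

```bash
$ molpro -d /scratch/$USER/$PBS_JOBID input.inp
```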
......
......@@ -18,7 +18,7 @@ For information about the usage of Intel Compilers and other Intel products, ple
## GNU C/C++ and Fortran Compilers
For compatibility reasons, the original (old 4.4.6-4) versions of the GNU compilers are still available as part of the OS. These are accessible in the search path by default.
It is strongly recommended to use the up-to-date version (4.8.1), which comes with the module gcc:
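A sketch of loading and verifying it:

```bash
$ module load gcc
$ gcc --version
```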
......
......@@ -49,7 +49,7 @@ To run COMSOL in batch mode, without the COMSOL Desktop GUI environment, user ca
#PBS -l select=3:ncpus=16
#PBS -q qprod
#PBS -N JOB_NAME
#PBS -A PROJECT_ID
cd /scratch/$USER/ || exit
......@@ -95,7 +95,7 @@ To run LiveLink for MATLAB in batch mode with (comsol_matlab.pbs) job script you
#PBS -l select=3:ncpus=16
#PBS -q qprod
#PBS -N JOB_NAME
#PBS -A PROJECT_ID
cd /scratch/$USER || exit
......
......@@ -53,7 +53,7 @@ Before debugging, you need to compile your code with these flags:
## Starting a Job With DDT
Be sure to log in with X window forwarding enabled. This could mean using the -X option with ssh:
```bash
$ ssh -X username@anselm.it4i.cz
......
......@@ -30,7 +30,7 @@ CUBE is a graphical application. Refer to Graphical User Interface documentation
!!! note
Analyzing large data sets can consume a large amount of CPU and RAM. Do not perform large analyses on login nodes.
After loading the appropriate module, simply launch the cube command, or alternatively you can use the scalasca -examine command to launch the GUI. Note that for Scalasca datasets, if you do not analyze the data with scalasca -examine before opening them with CUBE, not all performance data will be available.
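For a Scalasca experiment, this could look like the following; the experiment directory name is hypothetical:

```bash
$ scalasca -examine scorep_myapp_16_sum
```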
References
1\. <http://www.scalasca.org/software/cube-4.x/download.html>
......@@ -57,7 +57,7 @@ Sample output:
### Pcm-Msr
Command pcm-msr.x can be used to read/write model specific registers of the CPU.
### Pcm-Numa
......
......@@ -56,7 +56,7 @@ Application: ssh
Application parameters: mic0 source ~/.profile && /path/to/your/bin
Note that we include source ~/.profile in the command to set up environment paths [as described here](../intel-xeon-phi/).
!!! note
If the analysis is interrupted or aborted, further analysis on the card might be impossible and you will get errors like "ERROR connecting to MIC card". In this case please contact our support to reboot the MIC card.
......
......@@ -20,7 +20,7 @@ This will load the default version. Execute module avail papi for a list of inst
## Utilities
The bin directory of PAPI (which is automatically added to $PATH upon loading the module) contains various utilities.
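For instance, after loading the module the utilities, including the papi_avail tool described below, are directly on $PATH (a sketch):

```bash
$ module load papi
$ papi_avail
```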
### Papi_avail
......
......@@ -34,14 +34,14 @@ $ mpif90 -o myapp foo.o bar.o
with:
```bash
$ scorep mpif90 -c foo.f90
$ scorep mpif90 -c bar.f90
$ scorep mpif90 -o myapp foo.o bar.o
```
Usually your program is compiled using a Makefile or a similar script, so it is advisable to add the scorep command to your definitions of the variables CC, CXX, FCC etc.
It is important that scorep is also prepended to the linking command, in order to link with the Score-P instrumentation libraries.
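A minimal sketch of such a Makefile fragment; the compiler wrappers shown are illustrative:

```bash
# Makefile fragment: prepend scorep to both the compile and the link commands
CC  = scorep mpicc
CXX = scorep mpicxx
FC  = scorep mpif90
```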
### Manual Instrumentation Using API Calls
......@@ -78,7 +78,7 @@ Please refer to the [documentation for description of the API](https://silc.zih.
### Manual Instrumentation Using Directives
This method uses POMP2 directives to mark regions to be instrumented. To use this method, use the command scorep --pomp.
Example directives in C/C++ :
......
......@@ -20,7 +20,7 @@ You can check the status of the licenses here:
# totalview
# -------------------------------------------------
# FEATURE TOTAL USED AVAIL
# -------------------------------------------------
TotalView_Team 64 0 64
Replay 64 0 64
......
......@@ -156,7 +156,7 @@ The default version without MPI support will however report a large number of fa
==30166== by 0x4008BD: main (valgrind-example-mpi.c:18)
```
so it is better to use the MPI-enabled valgrind from the module. The MPI version requires the library /apps/tools/valgrind/3.9.0/impi/lib/valgrind/libmpiwrap-amd64-linux.so, which must be included in the LD_PRELOAD environment variable.
Let's look at this MPI example:
......
......@@ -11,7 +11,7 @@ Intel Math Kernel Library (Intel MKL) is a library of math kernel subroutines, e
* Vector Math Library (VML) routines for optimized mathematical operations on vectors.
* Vector Statistical Library (VSL) routines, which offer high-performance vectorized random number generators (RNG) for several probability distributions, convolution and correlation routines, and summary statistics functions.
* Data Fitting Library, which provides capabilities for spline-based approximation of functions, derivatives and integrals of functions, and search.
* Extended Eigensolver, a shared memory version of an eigensolver based on the Feast Eigenvalue Solver.
For details see the [Intel MKL Reference Manual](http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mklman/index.htm).
......@@ -39,7 +39,7 @@ The MKL library provides a number of interfaces. The fundamental ones are the LP64
Linking MKL libraries may be complex. Intel [mkl link line advisor](http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor) helps. See also [examples](intel-mkl/#examples) below.
You will need the mkl module loaded to run the MKL-enabled executable. This may be avoided by compiling the library search paths into the executable. Include rpath on the compile line:
```bash
$ icc .... -Wl,-rpath=$LIBRARY_PATH ...
......@@ -74,7 +74,7 @@ Number of examples, demonstrating use of the MKL library and its linking is avai
$ make sointel64 function=cblas_dgemm
```
In this example, we compile, link and run the cblas_dgemm example, demonstrating use of the MKL example suite installed on Anselm.
### Example: MKL and Intel Compiler
......@@ -88,14 +88,14 @@ In this example, we compile, link and run the cblas_dgemm example, demonstratin
$ ./cblas_dgemmx.x data/cblas_dgemmx.d
```
In this example, we compile, link and run the cblas_dgemm example, demonstrating use of MKL with the icc -mkl option. Using the -mkl option is equivalent to:
```bash
$ icc -w source/cblas_dgemmx.c source/common_func.c -o cblas_dgemmx.x
-I$MKL_INC_DIR -L$MKL_LIB_DIR -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5
```
In this example, we compile and link the cblas_dgemm example, using LP64 interface to threaded MKL and Intel OMP threads implementation.
### Example: MKL and GNU Compiler
......@@ -111,7 +111,7 @@ In this example, we compile and link the cblas_dgemm example, using LP64 interf
$ ./cblas_dgemmx.x data/cblas_dgemmx.d
```
In this example, we compile, link and run the cblas_dgemm example, using LP64 interface to threaded MKL and gnu OMP threads implementation.
## MKL and MIC Accelerators
......
......@@ -38,7 +38,7 @@ Example of the Commercial Matlab license state:
$ cat /apps/user/licenses/matlab_features_state.txt
# matlab
# -------------------------------------------------
# FEATURE TOTAL USED AVAIL
# -------------------------------------------------
MATLAB 1 1 0
SIMULINK 1 0 1
......
......@@ -198,7 +198,7 @@ Example run script (run.bat) for Windows virtual machine:
call application.bat z:data z:output
```
The run script runs the application from the shared job directory (mapped as drive z:), processes input data (z:data) from the job directory, and stores output to the job directory (z:output).
### Run Jobs
......
......@@ -19,7 +19,7 @@ Look up section modulefiles/mpi in module avail
```bash
$ module avail
------------------------- /opt/modules/modulefiles/mpi -------------------------
bullxmpi/bullxmpi-1.2.4.1 mvapich2/1.9-icc
impi/4.0.3.008 openmpi/1.6.5-gcc(default)
impi/4.1.0.024 openmpi/1.6.5-gcc46
impi/4.1.0.030 openmpi/1.6.5-icc
......
......@@ -41,7 +41,7 @@ You need to preload the executable, if running on the local scratch /lscratch fi
Hello world! from rank 3 of 4 on host cn110
```
In this example, we assume the executable helloworld_mpi.x is present in the shared home directory. We run the cp command via mpirun, copying the executable from the shared home to local scratch. The second mpirun will execute the binary in the /lscratch/15210.srv11 directory on nodes cn17, cn108, cn109 and cn110, one process per node.
!!! note
MPI process mapping may be controlled by PBS parameters.
......
......@@ -274,7 +274,7 @@ You can use MATLAB on UV2000 in two parallel modes:
### Threaded Mode
Since this is an SMP machine, you can completely avoid using Parallel Toolbox and use only MATLAB's threading. MATLAB will automatically detect the number of cores you have allocated and will set maxNumCompThreads accordingly, and certain operations, such as fft, eig, svd, etc., will be automatically run in threads. The advantage of this mode is that you don't need to modify your existing sequential codes.
### Local Cluster Mode
......
......@@ -72,7 +72,7 @@ extras = {};
The system MPI library allows MATLAB to communicate through the 40 Gbit/s InfiniBand QDR interconnect instead of the slower 1 Gbit Ethernet network.
!!! note
The path to the MPI library in "mpiLibConf.m" has to match the version of the loaded Intel MPI module. In this example, version 4.1.1.036 of Intel MPI is used by MATLAB, and therefore the module impi/4.1.1.036 has to be loaded prior to starting MATLAB.
### Parallel Matlab Interactive Session
......
......@@ -70,7 +70,7 @@ This script may be submitted directly to the PBS workload manager via the qsub c
## Parallel R
Parallel execution of R may be achieved in many ways. One approach is the implied parallelization due to linked libraries or specially enabled functions, as [described above](r/#interactive-execution). In the following sections, we focus on explicit parallelization, where parallel constructs are directly stated within the R script.
## Package Parallel
......@@ -375,7 +375,7 @@ Example jobscript for [static Rmpi](r/#static-rmpi) parallel R execution, runnin
#PBS -N Rjob
#PBS -l select=100:ncpus=16:mpiprocs=16:ompthreads=1
# change to scratch directory
SCRDIR=/scratch/$USER/myjob
cd $SCRDIR || exit
......
......@@ -2,7 +2,7 @@
The discrete Fourier transform in one or more dimensions, MPI parallel
FFTW is a C subroutine library for computing the discrete Fourier transform in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, e.g. the discrete cosine/sine transforms or DCT/DST). The FFTW library allows for MPI parallel, in-place discrete Fourier transform, with data distributed over number of nodes.
Two versions, **3.3.3** and **2.1.5** of FFTW are available on Anselm, each compiled for **Intel MPI** and **OpenMPI** using **intel** and **gnu** compilers. These are available via modules:
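A quick way to list the available builds (a sketch):

```bash
$ module avail fftw
```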
......
......@@ -23,7 +23,7 @@ Compilation example:
```bash
$ icc -mkl -O3 -DHAVE_MIC -DADD_ -Wall $MAGMA_INC -c testing_dgetrf_mic.cpp -o testing_dgetrf_mic.o
$ icc -mkl -O3 -DHAVE_MIC -DADD_ -Wall -fPIC -Xlinker -zmuldefs -Wall -DNOCHANGE -DHOST testing_dgetrf_mic.o -o testing_dgetrf_mic $MAGMA_LIBS
```
### Running MAGMA Code
......@@ -54,15 +54,15 @@ To test if the MAGMA server runs properly we can run one of examples that are pa
M N CPU GFlop/s (sec) MAGMA GFlop/s (sec) ||PA-LU||/(||A||*N)
=========================================================================
1088 1088 --- ( --- ) 13.93 ( 0.06) ---
2112 2112 --- ( --- ) 77.85 ( 0.08) ---
3136 3136 --- ( --- ) 183.21 ( 0.11) ---
4160 4160 --- ( --- ) 227.52 ( 0.21) ---
5184 5184 --- ( --- ) 258.61 ( 0.36) ---
6208 6208 --- ( --- ) 333.12 ( 0.48) ---
7232 7232 --- ( --- ) 416.52 ( 0.61) ---
8256 8256 --- ( --- ) 446.97 ( 0.84) ---
9280 9280 --- ( --- ) 461.15 ( 1.16) ---
10304 10304 --- ( --- ) 500.70 ( 1.46) ---
```
......
......@@ -10,7 +10,7 @@ PETSc (Portable, Extensible Toolkit for Scientific Computation) is a suite of bu
* [project webpage](http://www.mcs.anl.gov/petsc/)
* [documentation](http://www.mcs.anl.gov/petsc/documentation/)
* [PETSc Users Manual (PDF)](http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf)
* [index of all manual pages](http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/singleindex.html)
* PRACE Video Tutorial [part1](http://www.youtube.com/watch?v=asVaFg1NDqY), [part2](http://www.youtube.com/watch?v=ubp_cSibb9I), [part3](http://www.youtube.com/watch?v=vJAAAQv-aaw), [part4](http://www.youtube.com/watch?v=BKVlqWNh8jY), [part5](http://www.youtube.com/watch?v=iXkbLEBFjlM)
......
......@@ -53,7 +53,7 @@ Because a direct connection is not allowed to compute nodes on Anselm, you must
ssh -TN -L 12345:cn77:11111 username@anselm.it4i.cz
```
Replace username with your login and cn77 with the name of the compute node your ParaView server is running on (see the previous step). If you use PuTTY on Windows, load the Anselm connection configuration, then go to Connection -> SSH -> Tunnels to set up the port forwarding. Click the Remote radio button. Insert 12345 into the Source port textbox. Insert cn77:11111. Click the Add button, then Open.
Now launch the ParaView client installed on your desktop PC. Select File -> Connect..., click Add Server. Fill in the following:
......
......@@ -23,7 +23,7 @@ If multiple clients try to read and write the same part of a file at the same ti
There is a default stripe configuration for Anselm Lustre filesystems. However, users can set the following stripe parameters for their own directories or files to get optimum I/O performance (see the sketch after this list):
1. stripe_size: the size of the chunk in bytes; specify with k, m, or g to use units of KB, MB, or GB, respectively; the size must be an even multiple of 65,536 bytes; default is 1MB for all Anselm Lustre filesystems
2. stripe_count: the number of OSTs to stripe across; default is 1 for Anselm Lustre filesystems; one can specify -1 to use all OSTs in the filesystem.
3. stripe_offset: the index of the OST where the first stripe is to be placed; default is -1, which results in random selection; using a non-default value is NOT recommended.
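These parameters are set with the lfs setstripe command; a sketch for a hypothetical directory (flag spelling may differ between Lustre versions):

```bash
$ lfs setstripe -s 1m -c 4 /scratch/$USER/mydir   # 1 MB stripes across 4 OSTs
$ lfs getstripe /scratch/$USER/mydir              # verify the new layout
```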
!!! note
......@@ -76,7 +76,7 @@ Read more on <http://doc.lustre.org/lustre_manual.xhtml#managingstripingfreespac
### Lustre on Anselm
The architecture of Lustre on Anselm is composed of two metadata servers (MDS) and four data/object storage servers (OSS). Two object storage servers are used for file system HOME and another two object storage servers are used for file system SCRATCH.
Configuration of the storages
......@@ -132,7 +132,7 @@ Default stripe size is 1MB, stripe count is 1. There are 22 OSTs dedicated for t
The SCRATCH filesystem is mounted in directory /scratch. Users may freely create subdirectories and files on the filesystem. Accessible capacity is 146 TB, shared among all users. Individual users are restricted by filesystem usage quotas, set to 100 TB per user. The purpose of this quota is to prevent runaway programs from filling the entire filesystem and denying service to other users. If 100 TB should prove insufficient for a particular user, please contact [support](https://support.it4i.cz/rt); the quota may be lifted upon request.
!!! note
The Scratch filesystem is intended for temporary scratch data generated during the calculation as well as for high performance access to input and output files. All I/O intensive jobs must use the SCRATCH filesystem as their working directory.
Users are advised to save the necessary data from the SCRATCH filesystem to the HOME filesystem after the calculations and clean up the scratch files.
......@@ -166,11 +166,11 @@ Example for Lustre HOME directory:
```bash
$ lfs quota /home
Disk quotas for user user001 (uid 1234):
Filesystem kbytes quota limit grace files quota limit grace
/home 300096 0 250000000 - 2102 0 500000 -
Disk quotas for group user001 (gid 1234):
Filesystem kbytes quota limit grace files quota limit grace
/home 300096 0 0 - 2102 0 0 -
```
In this example, we view the current quota limit of 250 GB, with 300 MB currently used by user001.
......@@ -180,7 +180,7 @@ Example for Lustre SCRATCH directory:
```bash
$ lfs quota /scratch
Disk quotas for user user001 (uid 1234):
Filesystem kbytes quota limit grace files quota limit grace
<