Commit a2457b15 authored by Pavel Jirásek's avatar Pavel Jirásek

Merge branch 'remark' of gitlab.it4i.cz:it4i-admins/docs.it4i into remark

parents 35b03a28 8d2eb094
@@ -6,7 +6,7 @@ CUBE is a graphical performance report explorer for displaying data from Score-P
- **performance metric**, where a number of metrics are available, such as communication time or cache misses,
- **call path**, which contains the call tree of your program
-- s**ystem resource**, which contains system's nodes, processes and threads, depending on the parallel programming model.
+- **system resource**, which contains system's nodes, processes and threads, depending on the parallel programming model.
Each dimension is organized in a tree, for example the time performance metric is divided into Execution time and Overhead time, call path dimension is organized by files and routines in your source code etc.
@@ -27,7 +27,7 @@ Currently, there are two versions of CUBE 4.2.3 available as [modules](../../env
CUBE is a graphical application. Refer to the Graphical User Interface documentation for a list of methods to launch graphical applications on Anselm.
-!!! Note "Note"
+!!! Note
Analyzing large data sets can consume a large amount of CPU and RAM. Do not perform large analyses on login nodes.
After loading the appropriate module, simply launch the cube command, or alternatively use the scalasca -examine command to launch the GUI. Note that for Scalasca datasets, if you do not analyze the data with scalasca -examine before opening them with CUBE, not all performance data will be available.
......
@@ -69,11 +69,11 @@ Can be used to monitor PCI Express bandwidth. Usage: pcm-pcie.x <delay>
### pcm-power
-Displays energy usage and thermal headroom for CPU and DRAM sockets. Usage: pcm-power.x <delay> \| <external program>
+Displays energy usage and thermal headroom for CPU and DRAM sockets. Usage: `pcm-power.x <delay> | <external program>`
### pcm
-This command provides an overview of performance counters and memory usage. Usage: pcm.x &lt;delay> \| &lt;external program>
+This command provides an overview of performance counters and memory usage. Usage: `pcm.x <delay> | <external program>`
Sample output:
@@ -192,7 +192,7 @@ Can be used as a sensor for ksysguard GUI, which is currently not installed on A
In a similar fashion to PAPI, PCM provides a C++ API to access the performance counters from within your application. Refer to the [Doxygen documentation](http://intel-pcm-api-documentation.github.io/classPCM.html) for details of the API.
-!!! Note "Note"
+!!! Note
Due to security limitations, using the PCM API to monitor your applications is currently not possible on Anselm. (The application must be run as the root user.)
Sample program using the API:
......
@@ -2,7 +2,7 @@
## Introduction
-Intel_® _VTune Amplifier, part of Intel Parallel studio, is a GUI profiling tool designed for Intel processors. It offers a graphical performance analysis of single core and multithreaded applications. A highlight of the features:
+Intel VTune Amplifier, part of Intel Parallel Studio, is a GUI profiling tool designed for Intel processors. It offers graphical performance analysis of single-core and multithreaded applications. A highlight of the features:
- Hotspot analysis
- Locks and waits analysis
@@ -26,7 +26,7 @@ and launch the GUI :
$ amplxe-gui
```
-!!! Note "Note"
+!!! Note
To profile an application with VTune Amplifier, special kernel modules need to be loaded. The modules are not loaded on Anselm login nodes, thus direct profiling on login nodes is not possible. Use VTune on compute nodes and refer to the documentation on using GUI applications.
The GUI will open in a new window. Click on "_New Project..._" to create a new project. After clicking _OK_, a new window with project properties will appear. At "_Application:_", select the path to the binary you want to profile (the binary should be compiled with the -g flag). Some additional options such as command line arguments can be selected. At "_Managed code profiling mode:_" select "_Native_" (unless you want to profile managed-mode .NET/Mono applications). After clicking _OK_, your project is created.
@@ -47,7 +47,7 @@ Copy the line to clipboard and then you can paste it in your jobscript or in com
## Xeon Phi
-!!! Note "Note"
+!!! Note
This section is outdated. It will be updated with new information soon.
It is possible to analyze both native and offload Xeon Phi applications. For offload mode, just specify the path to the binary. For native mode, you need to specify in project properties:
@@ -58,7 +58,7 @@ Application parameters: mic0 source ~/.profile && /path/to/your/bin
Note that we include source ~/.profile in the command to set up environment paths [as described here](../intel-xeon-phi/).
-!!! Note "Note"
+!!! Note
If the analysis is interrupted or aborted, further analysis on the card might be impossible and you will get errors like "ERROR connecting to MIC card". In this case, please contact our support to reboot the MIC card.
You may also use remote analysis to collect data from the MIC and then analyze it in the GUI later:
......
@@ -68,7 +68,7 @@ Prints which native events are available on the current CPU.
Measures the cost (in cycles) of basic PAPI operations.
-\###papi_mem_info
+### papi_mem_info
Prints information about the memory architecture of the current CPU.
......
@@ -27,9 +27,9 @@ Instrumentation via " scalasca -instrument" is discouraged. Use [Score-P instrum
### Runtime measurement
-After the application is instrumented, runtime measurement can be performed with the " scalasca -analyze" command. The syntax is:
+After the application is instrumented, runtime measurement can be performed with the `scalasca -analyze` command. The syntax is:
-scalasca -analyze [scalasca options][launcher] [launcher options][program] [program options]
+`scalasca -analyze [scalasca options][launcher] [launcher options][program] [program options]`
An example:
@@ -39,10 +39,10 @@ An example :
Some notable Scalasca options are:
-**-t Enable trace data collection. By default, only summary data are collected.**
-**-e &lt;directory> Specify a directory to save the collected data to. By default, Scalasca saves the data to a directory with prefix scorep\_, followed by name of the executable and launch configuration.**
+- **-t Enable trace data collection. By default, only summary data are collected.**
+- **-e <directory> Specify a directory to save the collected data to. By default, Scalasca saves the data to a directory with prefix scorep\_, followed by the name of the executable and launch configuration.**
-!!! Note "Note"
+!!! Note
Scalasca can generate a huge amount of data, especially if tracing is enabled. Please consider saving the data to a [scratch directory](../../storage/storage/).
### Analysis of reports
......
@@ -287,7 +287,7 @@ In this example, the jobscript executes in multiple instances in parallel, on al
When deciding these values, think about the following guiding rules:
-1. Let n=N/24. Inequality (n+1) \* T &lt; W should hold. The N is number of tasks per subjob, T is expected single task walltime and W is subjob walltime. Short subjob walltime improves scheduling and job throughput.
+1. Let n = N / 24. The inequality (n + 1) x T < W should hold, where N is the number of tasks per subjob, T is the expected single-task walltime and W is the subjob walltime. A short subjob walltime improves scheduling and job throughput.
2. The number of tasks should be a multiple of 24.
3. These rules are valid only when all tasks have similar task walltimes T.
......
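The walltime rule above can be checked with plain shell arithmetic. A minimal sketch; the numbers (48 tasks, 1-hour task walltime, 4-hour subjob walltime) are illustrative assumptions, not values from this page:

```shell
#!/bin/bash
# Illustrative check of the subjob walltime rule (n + 1) * T < W.
# All three values below are assumptions chosen for the example.
N=48        # tasks per subjob (a multiple of 24, per rule 2)
T=3600      # expected single-task walltime in seconds
W=14400     # requested subjob walltime in seconds

n=$(( N / 24 ))
if [ $(( (n + 1) * T )) -lt "$W" ]; then
    echo "walltime OK: (n+1)*T = $(( (n + 1) * T )) s < W = $W s"
else
    echo "walltime too short: increase W or reduce N"
fi
```

With these numbers, n = 2 and (n + 1) * T = 10800 s, which is below the 14400 s subjob walltime, so the rule holds.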
@@ -7,7 +7,7 @@ Compute nodes with MIC accelerator **contains two Intel Xeon Phi 7120P accelerat
[More about schematic representation of the Salomon cluster compute nodes IB topology](ib-single-plane-topology/).
-\###Compute Nodes Without Accelerator
+### Compute Nodes Without Accelerator
- codename "grafton"
- 576 nodes
@@ -17,7 +17,7 @@ Compute nodes with MIC accelerator **contains two Intel Xeon Phi 7120P accelerat
![cn_m_cell](../img/cn_m_cell)
-\###Compute Nodes With MIC Accelerator
+### Compute Nodes With MIC Accelerator
- codename "perrin"
- 432 nodes
......
@@ -5,8 +5,6 @@ software contains the broad physical modeling capabilities needed to model flow,
1. Common way to run Fluent over pbs file
-* * *
To run ANSYS Fluent in batch mode you can utilize/modify the default fluent.pbs script and execute it via the qsub command.
```bash
@@ -60,8 +58,6 @@ The appropriate dimension of the problem has to be set by parameter (2d/3d).
2. Fast way to run Fluent from command line
-* * *
```bash
fluent solver_version [FLUENT_options] -i journal_file -pbs
```
@@ -70,8 +66,6 @@ This syntax will start the ANSYS FLUENT job under PBS Professional using the qs
3. Running Fluent via user's config file
-* * *
The sample script uses a configuration file called pbs_fluent.conf if no command line arguments are present. This configuration file should be present in the directory from which the jobs are submitted (which is also the directory in which the jobs are executed). The following is an example of what the content of pbs_fluent.conf can be:
```bash
@@ -149,8 +143,6 @@ It runs the jobs out of the directory from which they are submitted (PBS_O_WORKD
4. Running Fluent in parallel
-* * *
Fluent can be run in parallel only under an Academic Research license. To do so, the ANSYS Academic Research license must be placed before the ANSYS CFD license in user preferences. To make this change, the anslic_admin utility should be run:
```bash
......
@@ -12,7 +12,7 @@ Enable Distribute Solution checkbox and enter number of cores (eg. 48 to run on
-mpifile /path/to/my/job/mpifile.txt
```
-Where /path/to/my/job is the directory where your project is saved. We will create the file mpifile.txt programatically later in the batch script. For more information, refer to _ANSYS Mechanical APDL Parallel Processing_ _Guide_.
+Where /path/to/my/job is the directory where your project is saved. We will create the file mpifile.txt programmatically later in the batch script. For more information, refer to the \*ANSYS Mechanical APDL Parallel Processing\* \*Guide\*.
Now, save the project and close Workbench. We will use this script to launch the job:
......
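The batch script itself is elided from this excerpt. Purely as a hedged sketch of the kind of step it might contain, a per-host core-count file can be derived from the PBS nodefile along these lines; the `host:cores` output format and the stand-in node names are assumptions for illustration, not the documented ANSYS mpifile format:

```shell
#!/bin/bash
# Hypothetical sketch: derive a per-host core-count file from the PBS nodefile.
# PBS_NODEFILE lists one line per allocated core; we simulate it here.
NODEFILE="${PBS_NODEFILE:-nodes.txt}"
printf 'cn001\ncn001\ncn002\ncn002\n' > nodes.txt   # stand-in node list

# Count occurrences of each host and emit "host:cores" (format is an assumption).
sort "$NODEFILE" | uniq -c | awk '{ print $2 ":" $1 }' > mpifile.txt
cat mpifile.txt
```

For the simulated node list above, this produces one line per host with its core count (cn001:2, cn002:2); in a real job, PBS_NODEFILE supplies the allocated hosts.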
@@ -181,10 +181,10 @@ To run the example on two compute nodes using all 48 cores, with 48 threads, iss
For more information see the man pages.
-\##Java
+## Java
For information on how to use Java (runtime and/or compiler), please read the [Java page](java/).
-\##NVIDIA CUDA
+## NVIDIA CUDA
For information on how to work with NVIDIA CUDA, please read the [NVIDIA CUDA page](../../anselm-cluster-documentation/software/nvidia-cuda/).