diff --git a/docs.it4i/src/mympiprog_32p_2014-10-15_16-56.html b/docs.it4i/src/mympiprog_32p_2014-10-15_16-56.html
new file mode 100644
index 0000000000000000000000000000000000000000..ce60070a9ee25a91973a577fd048d88f31d4680e
--- /dev/null
+++ b/docs.it4i/src/mympiprog_32p_2014-10-15_16-56.html
@@ -0,0 +1,610 @@
+mympiprog.x - Performance Report
+Executable:   mympiprog.x
+Resources:    32 processes, 2 nodes
+Machine:      cn182
+Start time:   Wed Oct 15 16:56:23 2014
+Total time:   7 seconds (0 minutes)
+Full path:    /home/user
+Notes:
+
+Summary: mympiprog.x is CPU-bound in this configuration
+The total wallclock time was spent as follows:
+
+CPU:  88.6%   Time spent running application code. High values are usually good.
+              This is high; check the CPU performance section for optimization advice.
+MPI:  11.4%   Time spent in MPI calls. High values are usually bad.
+              This is very low; this code may benefit from increasing the process count.
+I/O:   0.0%   Time spent in filesystem I/O. High values are usually bad.
+              This is negligible; there's no need to investigate I/O performance.
+
+This application run was CPU-bound. A breakdown of this time and advice for investigating further is in the CPU section below.
+As very little time is spent in MPI calls, this code may also benefit from running at larger scales.
+
+CPU
+A breakdown of how the 88.6% total CPU time was spent:
+Scalar numeric ops:   50.0%
+Vector numeric ops:   50.0%
+Memory accesses:       0.0%
+Other:                 0.0%
+The per-core performance is arithmetic-bound. Try to increase the amount of time spent in vectorized instructions by analyzing the compiler's vectorization reports.
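As a hedged illustration (not part of the generated report), the sketch below shows the kind of loop this advice targets; the file name and function are assumptions, but the options shown are real ways to request vectorization reports (GCC: -fopt-info-vec / -fopt-info-vec-missed, Intel compilers: -qopt-report=2).

/* Hypothetical example, not taken from mympiprog.x: a unit-stride loop
 * that typically vectorizes at -O3.  Build with, e.g.,
 *   gcc -O3 -fopt-info-vec-missed axpy.c   (reports loops that failed to vectorize)
 *   icc -O3 -qopt-report=2 axpy.c          (writes an optimization report)
 * and compare the reported loops against the application's hot loops.      */
#include <stdio.h>

static void axpy(int n, double a, const double *x, double *y)
{
    for (int i = 0; i < n; i++)   /* contiguous access, no loop-carried dependence */
        y[i] += a * x[i];
}

int main(void)
{
    double x[1000], y[1000];
    for (int i = 0; i < 1000; i++) { x[i] = (double)i; y[i] = 1.0; }
    axpy(1000, 2.0, x, y);
    printf("y[999] = %f\n", y[999]);   /* prints 1999.000000 */
    return 0;
}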
+MPI
+Of the 11.4% total time spent in MPI calls:
+Time in collective calls:                100.0%
+Time in point-to-point calls:              0.0%
+Effective process collective rate:         1.65e+02 bytes/s
+Effective process point-to-point rate:     0.00e+00 bytes/s
+Most of the time is spent in collective calls with a very low transfer rate. This suggests load imbalance is causing synchronization overhead; use an MPI profiler to investigate further.
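One way to follow up on this advice, sketched below under the assumption that the dominant collective is something like an MPI_Allreduce (the report does not say which call it is): timing an explicit barrier immediately before the collective separates time spent waiting for slower ranks (imbalance) from time spent in the data exchange itself. An MPI profiler, as suggested above, gives the same breakdown with less effort.

/* Hypothetical sketch, not code from mympiprog.x: separating wait time from
 * transfer time around a collective.  The barrier absorbs load imbalance,
 * so the remaining MPI_Allreduce time approximates pure communication.    */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = (double)rank, global = 0.0;

    double t0 = MPI_Wtime();
    MPI_Barrier(MPI_COMM_WORLD);               /* time spent here ~ imbalance  */
    double t1 = MPI_Wtime();
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double t2 = MPI_Wtime();                   /* t2 - t1 ~ communication cost */

    printf("rank %d: wait %.6f s, allreduce %.6f s\n", rank, t1 - t0, t2 - t1);
    MPI_Finalize();
    return 0;
}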
+I/O
+A breakdown of how the 0.0% total I/O time was spent:
+Time in reads:                    0.0%
+Time in writes:                   0.0%
+Effective process read rate:      0.00e+00 bytes/s
+Effective process write rate:     0.00e+00 bytes/s
+No time is spent in I/O operations. There's nothing to optimize here!
+Memory
+Per-process memory usage may also affect scaling:
+Mean process memory usage:    2.33e+07 bytes
+Peak process memory usage:    2.35e+07 bytes
+Peak node memory usage:       2.8%
+The peak node memory usage is very low. You may be able to reduce the amount of allocation time used by running with fewer MPI processes and more data on each process.
diff --git a/docs.it4i/src/mympiprog_32p_2014-10-15_16-56.txt b/docs.it4i/src/mympiprog_32p_2014-10-15_16-56.txt
new file mode 100644
index 0000000000000000000000000000000000000000..de8449179640fd943a9f007f9eda084b11f2a455
--- /dev/null
+++ b/docs.it4i/src/mympiprog_32p_2014-10-15_16-56.txt
@@ -0,0 +1,50 @@
+Executable: mympiprog.x
+Resources: 32 processes, 2 nodes
+Machine: cn182
+Started on: Wed Oct 15 16:56:23 2014
+Total time: 7 seconds (0 minutes)
+Full path: /home/user
+Notes:
+
+Summary: mympiprog.x is CPU-bound in this configuration
+CPU: 88.6% |========|
+MPI: 11.4% ||
+I/O:  0.0% |
+This application run was CPU-bound. A breakdown of this time and advice for investigating further is found in the CPU section below.
+As very little time is spent in MPI calls, this code may also benefit from running at larger scales.
+
+CPU:
+A breakdown of how the 88.6% total CPU time was spent:
+Scalar numeric ops: 50.0% |====|
+Vector numeric ops: 50.0% |====|
+Memory accesses:     0.0% |
+Other:               0.0% |
+The per-core performance is arithmetic-bound. Try to increase the amount of time spent in vectorized instructions by analyzing the compiler's vectorization reports.
+
+MPI:
+A breakdown of how the 11.4% total MPI time was spent:
+Time in collective calls:     100.0% |=========|
+Time in point-to-point calls:   0.0% |
+Effective collective rate:      1.65e+02 bytes/s
+Effective point-to-point rate:  0.00e+00 bytes/s
+Most of the time is spent in collective calls with a very low transfer rate. This suggests load imbalance is causing synchronization overhead; use an MPI profiler to investigate further.
+
+I/O:
+A breakdown of how the 0.0% total I/O time was spent:
+Time in reads:        0.0% |
+Time in writes:       0.0% |
+Effective read rate:  0.00e+00 bytes/s
+Effective write rate: 0.00e+00 bytes/s
+No time is spent in I/O operations. There's nothing to optimize here!
+
+Memory:
+Per-process memory usage may also affect scaling:
+Mean process memory usage: 2.33e+07 bytes
+Peak process memory usage: 2.35e+07 bytes
+Peak node memory usage:     2.8% |
+The peak node memory usage is very low. You may be able to reduce the amount of allocation time used by running with fewer MPI processes and more data on each process.