diff --git a/docs.it4i/general/karolina-slurm.md b/docs.it4i/general/karolina-slurm.md
index 4fe93e07966fd1dafcb7e771f262b4638fb93d1c..43ceb0782f76cfcb7f3d10f1911646a2a058bfbe 100644
--- a/docs.it4i/general/karolina-slurm.md
+++ b/docs.it4i/general/karolina-slurm.md
@@ -43,6 +43,14 @@ On Karolina cluster
 
 Division of nodes means that if two users allocate a portion of the same node, they can see each other's running processes.
 If this solution is inconvenient for you, consider allocating a whole node.
+
+IT4I clusters are monitored for resource utilization.
+One of the monitoring daemons is using registers to collect performance
+monitoring counters (PMC), which users may need when analysing performance
+of the executed application (perf or [Score-P][10] profiling tools).
+To deactivate the daemon and release the respective registers, set the job feature
+during allocation, as specified [here][9].
+
 ## Using CPU Queues
 
 Access [standard compute nodes][4].
@@ -173,3 +181,5 @@ $ salloc -A PROJECT-ID -p qviz --exclusive
 [6]: /karolina/compute-nodes/#data-analytics-compute-node
 [7]: /karolina/visualization/
 [8]: ./karolina-partitions.md
+[9]: /job-features.md/#cluster-monitoring
+[10]: /software/debuggers/score-p/
\ No newline at end of file
diff --git a/docs.it4i/job-features.md b/docs.it4i/job-features.md
index 4bf46fe9909aec8602675e36a61f4e7fb572c5c5..ede779c794bc08e6709d499d09b323287569c62c 100644
--- a/docs.it4i/job-features.md
+++ b/docs.it4i/job-features.md
@@ -115,6 +115,21 @@ $ salloc ... --comment "use:msr=version_string"
 !!! Warning
     Available on Barbora nodes only.
 
+!!! Warning
+    It is recommended to combine with setting the feature `mon-flops=off`.
+
+## Cluster Monitoring
+
+Disable monitoring of certain registers which are used to collect performance
+monitoring counters (PMC) values such as CPU FLOPs or Memory Bandwidth:
+
+```console
+$ salloc ... --comment "use:mon-flops=off"
+```
+
+!!! Warning
+    Available on Karolina nodes only.
+
 ## HDEEM Support
 
 Load the HDEEM software stack. The [High Definition Energy Efficiency Monitoring][b] (HDEEM) library is a software interface used to measure power consumption of HPC clusters with bullx blades.