From a465a41c6e6905f3f47b8e6398fedf2ce3082578 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Petr=20Pt=C3=A1=C4=8Dek?= <petr.ptacek@vsb.cz>
Date: Thu, 20 Mar 2025 12:38:32 +0100
Subject: [PATCH] ADD, added mon-flops job feature description

---
 docs.it4i/general/karolina-slurm.md | 10 ++++++++++
 docs.it4i/job-features.md           | 15 +++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/docs.it4i/general/karolina-slurm.md b/docs.it4i/general/karolina-slurm.md
index 4fe93e079..43ceb0782 100644
--- a/docs.it4i/general/karolina-slurm.md
+++ b/docs.it4i/general/karolina-slurm.md
@@ -43,6 +43,14 @@ On Karolina cluster
 Division of nodes means that if two users allocate a portion of the same node, they can see each other's running processes.
 If this solution is inconvenient for you, consider allocating a whole node.
 
+
+IT4I clusters are monitored for resource utilization.
+One of the monitoring daemons uses hardware registers to collect performance
+monitoring counters (PMC), which you may need when analysing the performance
+of your application (e.g. with the perf or [Score-P][10] profiling tools).
+To deactivate the daemon and release these registers, set the `mon-flops=off`
+job feature during allocation, as described [here][9].
+
 ## Using CPU Queues
 
 Access [standard compute nodes][4].
@@ -173,3 +181,5 @@ $ salloc -A PROJECT-ID -p qviz --exclusive
 [6]: /karolina/compute-nodes/#data-analytics-compute-node
 [7]: /karolina/visualization/
 [8]: ./karolina-partitions.md
+[9]: /job-features/#cluster-monitoring
+[10]: /software/debuggers/score-p/
\ No newline at end of file
diff --git a/docs.it4i/job-features.md b/docs.it4i/job-features.md
index 4bf46fe99..ede779c79 100644
--- a/docs.it4i/job-features.md
+++ b/docs.it4i/job-features.md
@@ -115,6 +115,21 @@ $ salloc ... --comment "use:msr=version_string"
 !!! Warning
     Available on Barbora nodes only.
 
+!!! Warning
+    It is recommended to combine this feature with `mon-flops=off`.
+
+## Cluster Monitoring
+
+Disable the monitoring daemon that uses hardware registers to collect performance
+monitoring counter (PMC) values, such as CPU FLOPs or memory bandwidth:
+
+```console
+$ salloc ... --comment "use:mon-flops=off"
+```
+
+!!! Warning
+    Available on Karolina nodes only.
+
 ## HDEEM Support
 
 Load the HDEEM software stack. The [High Definition Energy Efficiency Monitoring][b] (HDEEM) library is a software interface used to measure power consumption of HPC clusters with bullx blades.
-- GitLab