Skip to content
Snippets Groups Projects
Commit b017bfe2 authored by Ondrej Vysocky's avatar Ondrej Vysocky
Browse files

NEW A64FX energy measurement example code

parents
No related branches found
No related tags found
No related merge requests found
LICENSE 0 → 100644
A64FX energy measurement example application license (BSD-3)
Copyright (c) 2023, IT4Innovations National Supercomputing Center, Ostrava, Czech Republic
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Makefile 0 → 100644
CXX?=g++
ifeq ($(CXX),FCC)
OMP_FLAG=-Kopenmp
else
OMP_FLAG=-fopenmp
endif
all:
${CXX} -o getEnergy ${OMP_FLAG} a64fx_energy.cpp
Fujitsu A64FX CPU example energy consumption measurement C++ code using perf to read performance counters.
We advise to use [MERIC runtime system](https://code.it4i.cz/vys0053/meric/-/tree/dev), a complex solution providing (not only) energy consumption measurement.
--------------------------------------------------------------------------------
# Notes #
--------------------------------------------------------------------------------
- Current implementation works well for 48-cores versions of A64FX CPU, for 26-cores version it is neccessary to set correct units of monitored performance counters in `PERFEVENT::init()` method. See [source code](https://code.it4i.cz/vys0053/meric/-/blob/master/a64fx_energy.cpp#L125).
- A64FX CPU provides performance counters monitoring power consumption of 3 power domains. The counter updates every cycle of core, L2 cache, or CMG local memory respectively. When the specific hardware is idle, the counter does not increment.
--------------------------------------------------------------------------------
# Acknowledgement #
--------------------------------------------------------------------------------
Implemented at [IT4Innovations National Supercomputing Center](https://www.it4i.cz/) under [BSD-3 license](https://code.it4i.cz/energy-efficiency/a64fx-energy-measurement/blob/master/LICENSE) for [H2020 EUPEX project](https://eupex.eu/).
European Pilot for Exascale (EUPEX) project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 101033975. The JU receives support from the European Union's Horizon 2020 research and innovation programme and France, Germany, Italy, Greece, United Kingdom, Czech Republic, Croatia.
/**
* A64FX energy measurement C++ code using perf to read performance counters
*
* Implemented for H2020 EUPEX project (https://eupex.eu/)
*
* 10.11.2022
* Ondrej Vysocky
* IT4Innovations national supercomputing center, Czech Republic
**/
#include <linux/perf_event.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <climits>
#include <cstring>
#include <vector>
#include <map>
#include <iostream>
#include "work.h" //do_work()
// From PAPI src https://bitbucket.org/icl/papi/src/6d0f9d3dfb45a2f4262e9f63963e34cd121cd3d1/src/components/perf_event/perf_helpers.h#lines-5
/* In case headers aren't new enough to have __NR_perf_event_open */
#ifndef __NR_perf_event_open
#ifdef __powerpc__
#define __NR_perf_event_open 319
#elif defined(__x86_64__)
#define __NR_perf_event_open 298
#elif defined(__i386__)
#define __NR_perf_event_open 336
#elif defined(__arm__) || defined(__aarch64__)
#define __NR_perf_event_open 364
#endif
#endif
/**
* EA_CORE = energy consumption per cycle of core, unit 8nJ for A64FX (2.2/2.0/1.8 GHz, 48 cores), and 9nJ for A64FX (2.6 GHz, 24 cores)
* EA_L2 = energy consumption per cycle of L2 cache, unit 32nJ for A64FX (2.2/2.0/1.8 GHz, 48 cores), and 36nJ for A64FX (2.6 GHz, 24 cores).
* EA_MEMORY = energy consumption per cycle of CMG local memory, unit 256nJ
**/
namespace PERFEVENT_COUNTERS {
const std::string EA_CORE = "r01e0";
const std::string EA_L2 = "r03e0";
const std::string EA_MEMORY = "r03e8";
// a label to display
static std::map<const std::string, std::string> counterLabel = {
{ EA_CORE, "CORE [J]" },
{ EA_L2, "L2 [J]" },
{ EA_MEMORY, "MEM [J]" }
};
} // PERFEVENT_COUNTERS namespace
namespace PERFEVENT {
std::map<std::string, unsigned long long int> counters;
unsigned long long int COUNTERMAX = LLONG_MAX;
std::map<std::string, double> units;
/**
* open a performance counter descriptor
**/
long openCounter(perf_event_attr attr)
{
//(struct perf_event_attr, pid, cpu, group_fd, flags);
long fd = syscall(__NR_perf_event_open, &attr, syscall(__NR_gettid), -1, -1, 0);
/*
pid == 0 and cpu == -1
This measures the calling process/thread on any CPU.
pid == 0 and cpu >= 0
This measures the calling process/thread only when running on the specified CPU.
pid > 0 and cpu == -1
This measures the specified process/thread on any CPU.
pid > 0 and cpu >= 0
This measures the specified process/thread only when running on the specified CPU.
pid == -1 and cpu >= 0
This measures all processes/threads on the specified CPU. This requires CAP_SYS_ADMIN capability or a /proc/sys/kernel/perf_event_paranoid value of less than 1.
pid == -1 and cpu == -1
This setting is invalid and will return an error.
*/
if (fd < 0)
{
//int err = errno;
std::cerr << "PREFEVENT error while opening new fd\n";
}
return fd;
}
/**
* activates selected A64FX performance counters counting
**/
void init()
{
auto setCounter = [&] (__u32 type, __u64 config)
{
struct perf_event_attr attr;
memset(&attr, 0, sizeof(struct perf_event_attr));
attr.size = sizeof(struct perf_event_attr);
attr.inherit = 1;
attr.type = type;
attr.config = config;
return openCounter(attr);
};
units[PERFEVENT_COUNTERS::EA_CORE] = 8e-9;
units[PERFEVENT_COUNTERS::EA_L2] = 32e-9;
/* //TODO automatic identification of CPU type
// units of 26-cores A64FX CPU
units[PERFEVENT_COUNTERS::EA_CORE] = 9e-9;
units[PERFEVENT_COUNTERS::EA_L2] = 36e-9;
*/
units[PERFEVENT_COUNTERS::EA_MEMORY] = 256e-9;
counters[PERFEVENT_COUNTERS::EA_CORE] = setCounter(PERF_TYPE_RAW, 0x01e0);
counters[PERFEVENT_COUNTERS::EA_L2] = setCounter(PERF_TYPE_RAW, 0x03e0);
counters[PERFEVENT_COUNTERS::EA_MEMORY] = setCounter(PERF_TYPE_RAW, 0x03e8);
for(std::map<std::string, unsigned long long int>::iterator iterator = counters.begin(); iterator != counters.end(); iterator++)
{
if (ioctl(iterator->second, PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP) != 0)
{
std::cerr << "PREFEVENT error when reseting counter\n";
}
if (ioctl(iterator->second, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP) != 0)
{
std::cerr << "PREFEVENT error when starting counting\n";
}
}
}
/**
* close opened handlers
**/
void close()
{
for(std::map<std::string, unsigned long long int>::iterator iterator = counters.begin(); iterator != counters.end(); iterator++)
{
if (ioctl(iterator->second, PERF_EVENT_IOC_DISABLE, 0) != 0)
{
std::cerr << "PREFEVENT error when stoping counting\n";
}
}
for(std::map<std::string, unsigned long long int>::iterator iterator = counters.begin(); iterator != counters.end(); iterator++)
{
if (::close(iterator->second) != 0)
{
std::cerr << "PREFEVENT error when closing file descriptor\n";
}
}
}
/**
* Read performance counters listed in the input std::map, and fills the map with these values.
* Counter value set to zero in case of an error.
**/
void getCounterValues(std::map<std::string, unsigned long long int> & record)
{
for(std::map<std::string, unsigned long long int>::iterator iterator = counters.begin(); iterator != counters.end(); iterator++)
{
unsigned long long int result = 0;
if ((read(iterator->second, &result, sizeof(unsigned long long int))) != sizeof(unsigned long long int))
{
//int errn = errno;
std::cerr << "PREFEVENT error. Failed to read counter "<< iterator->first << std::endl;
record[iterator->first] = 0;
}
else
{
record[iterator->first] = result;
}
}
}
/**
* From two values of a counter produces a final value (applies unit)
* Handles one counter overflow
**/
unsigned long long getResultValue(unsigned long long int startValue, unsigned long long int stopValue, double runtime, std::string counter, std::string regionName)
{
if (stopValue < startValue)
{
std::cerr << "PERFEVENT OVERFLOW, MAX VALUE: " << COUNTERMAX <<" " <<counter << '\n'
<< "PERFEVENT REPAIR " << startValue << " .. " << stopValue << " = " << COUNTERMAX- startValue + stopValue << std::endl;
return (COUNTERMAX - startValue + stopValue) * units[counter];
}
else
{
return (stopValue - startValue) * units[counter];
}
}
/**
* compose a list of performance counters to trace
**/
void initMap(std::map<std::string, unsigned long long int> & dict)
{
dict[PERFEVENT_COUNTERS::EA_CORE] = 0;
dict[PERFEVENT_COUNTERS::EA_L2] = 0;
dict[PERFEVENT_COUNTERS::EA_MEMORY] = 0;
}
} // PERFEVENT namespace
////////////////////////////////////////////////////////////////////////////////
int main ()
{
#ifndef TEST
std::map<std::string, unsigned long long int> counters;
std::vector<std::map<std::string, unsigned long long int>> records;
PERFEVENT::init();
PERFEVENT::initMap(counters);
PERFEVENT::getCounterValues(counters);
for(std::map<std::string, unsigned long long int>::iterator iterator = counters.begin(); iterator != counters.end(); iterator++)
{
std::cout << "START " << PERFEVENT_COUNTERS::counterLabel[iterator->first] << ": " << iterator->second << '\n';
}
records.push_back(counters);
#endif
//do_some_work() -- cannot be sleep!! A64FX does not accumulate power samples if the HW is idle
#pragma omp parallel
do_work(5.0);
#ifndef TEST
PERFEVENT::getCounterValues(counters);
for(std::map<std::string, unsigned long long int>::iterator iterator = counters.begin(); iterator != counters.end(); iterator++)
{
std::cout << "STOP " << PERFEVENT_COUNTERS::counterLabel[iterator->first] << ": " << iterator->second << '\n';
}
records.push_back(counters);
PERFEVENT::close();
unsigned long long sum = 0;
for(std::map<std::string, unsigned long long int>::iterator iterator = counters.begin(); iterator != counters.end(); iterator++)
{
unsigned long long value = PERFEVENT::getResultValue(records[0][iterator->first], records[1][iterator->first], 0.0, iterator->first, "");
sum += value;
std::cout << PERFEVENT_COUNTERS::counterLabel[iterator->first] <<": "<< value << '\n';
}
std::cout << "SUM Energy consumption [J]: " << sum << '\n';
#endif
return 0;
}
/*******************************************************************************
$ export OMP_NUM_THREADS=48
$ ./energy
CORE [J]: 480
L2 [J]: 406
MEM [J]: 421
SUM Energy consumption [J]: 1307
$ perf stat -e cpu-cycles -e r01e0 -e r03e0 -e r03e8 ./energy_test
Performance counter stats for './energy_test':
432,821,395,996 cpu-cycles
59,970,613,497 r01e0
12,684,023,414 r03e0
1,643,952,943 r03e8
5.075817498 seconds time elapsed
240.306538000 seconds user
0.159771000 seconds sys
# 59970613497*0.000000008 + 12684023414*0.000000032 + 1643952943*0.000000256 = 1306.505610632 [J]
*******************************************************************************/
work.h 0 → 100644
#include <sys/time.h>
#define REPS 100000
//converts struct timespec to double
double timespec2double(timespec time)
{
return time.tv_sec + (double)time.tv_nsec/1e9;
}
//performs computational workload for a specific period of time
//double timeout - time in seconds
void do_work(double timeout)
{
int i, l = 1;
double exe_time = 0;
struct timespec startTime, endTime;
clock_gettime(CLOCK_MONOTONIC, &startTime);
double start = timespec2double(startTime);
while(1)
{
for(i = 0; i < REPS; i++)
{
l += l * i;
}
clock_gettime(CLOCK_MONOTONIC, &endTime);
exe_time = timespec2double(endTime) - start;
if(exe_time > timeout)
{
break;
}
}
}
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment