Computer Standards & Interfaces 97 (2026) 104120




Energy consumption assessment in embedded AI: Metrological
improvements of benchmarks for edge devices
Andrea Apicella b , Pasquale Arpaia a ,∗, Luigi Capobianco d , Francesco Caputo a ,
Antonella Cioffi d , Antonio Esposito a , Francesco Isgrò a , Rosanna Manzo c ,
Nicola Moccaldi a , Danilo Pau e , Ettore Toscano d
a
  Dipartimento di Ingegneria Elettrica e delle Tecnologie dell’Informazione, Università degli Studi di Napoli Federico II, Naples, Italy
b
  Dipartimento di Ingegneria dell’Informazione ed Elettrica e Matematica applicata (DIEM), Università degli Studi di Salerno, Fisciano, Italy
c
  Dipartimento di Sanità Pubblica e Medicina Preventiva, Università degli Studi di Napoli Federico II, Naples, Italy
d
  Software Design Center, STMicroelectronics, Marcianise, Italy
e System Research and Applications, STMicroelectronics, Agrate Brianza, Italy


ARTICLE INFO

Keywords:
Energy assessment
Embedded AI
Tiny-ML
Uncertainty analysis
Edge device benchmark

ABSTRACT

This manuscript proposes a new method to improve the MLCommons protocol for measuring power consumption
on Microcontroller Units (MCUs) when running edge Artificial Intelligence (AI). In particular, the proposed
approach (i) selectively measures the power consumption attributable to the inferences (namely, the predictions
performed by Artificial Neural Networks, ANNs), preventing the impact of other operations, (ii) accurately
identifies the time window for acquiring the current samples thanks to the simultaneous measurement of
power consumption and inference duration, and (iii) precisely synchronizes the measurement windows and the
inferences. The method is validated on three use cases: (i) Rockchip RV1106, a neural MCU that implements
ANNs in hardware through a dedicated neural processing unit, (ii) STM32 H7, and (iii) STM32 U5,
high-performance and ultra-low-power general-purpose microcontrollers, respectively. The proposed method
returns higher power consumption for the two devices with respect to the MLCommons approach. This result
is compatible with an improvement of selectivity and accuracy. Furthermore, the method reduces measurement
uncertainty on the Rockchip RV1106 and STM32 boards by factors of 6 and 12, respectively.


1. Introduction                                                                                   (MCUs), widely used in IoT, this is particularly true. Many IoT applica-
                                                                                                  tions, such as autonomous driving [6], demand low-latency responses
    The rapid expansion of Internet of Things (IoT) devices has ushered                           to be effectively reactive. Moreover, several IoT devices often operate
in a new era of connected intelligence at the edge, where data process-                           under very limited power sources. Promising energy-efficient strategies
ing, low latency, and real-time decision making can take place directly                           aim to minimize consumption. For instance, index modulation [7,8] is
at the edge [1]. These IoT devices cover a variety of applications, from                          a transmission technique that conveys additional information through
smart home sensors [2], to industrial automation [3], and health mon-                             the indices of available resources such as antennas, subcarriers, or
itoring systems [4], where low latency responses and energy efficiency                            time slots, and it can significantly reduce energy usage while maintain-
are essential.                                                                                    ing data throughput. Nevertheless, even with advanced optimization
    Extending computation to more peripheral network nodes enhances                               strategies, the repetitive and frequent processing required by many ap-
all key aspects of edge computing, including energy efficiency, carbon                            plications can rapidly deplete power resources, thereby limiting device
footprint reduction, security, latency, privacy, offline functionality, and
                                                                                                  lifetime.
data management costs [5]. However, deploying intelligence at the
                                                                                                      In recent years, Machine Learning (ML) methods [9], particularly
end nodes requires careful consideration of the IoT devices' inherent
                                                                                                  Artificial Neural Networks (ANNs), have been increasingly deployed on
limitations, such as memory and computational resources impacting
                                                                                                  IoT devices to enhance localized data processing capabilities and reduce
time performances, and energy constraints. For Microcontroller Units


    ∗ Corresponding author.
     E-mail addresses: andapicella@unisa.it (A. Apicella), pasquale.arpaia@unina.it (P. Arpaia), luigi.capobianco@st.com (L. Capobianco),
francesco.caputo3@unina.it (F. Caputo), antonella.cioffi@st.com (A. Cioffi), antonio.esposito9@unina.it (A. Esposito), francesco.isgro@unina.it (F. Isgrò),
rosanna.manzo@unina.it (R. Manzo), nicola.moccaldi@unina.it (N. Moccaldi), danilo.pau@st.com (D. Pau), ettore.toscano@st.com (E. Toscano).

https://doi.org/10.1016/j.csi.2025.104120
Received 10 January 2025; Received in revised form 2 September 2025; Accepted 21 December 2025
Available online 22 December 2025
0920-5489/© 2025 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


dependency on cloud infrastructures [10,11]. It is common to refer to
these devices as tiny devices [12] and embedded ML as tiny machine
learning or tiny ML [5].
    Consequently, assessing the inference time provided by the IoT
hardware for a specific ANN model is crucial to ensure that the em-
bedded system can satisfy real-time processing requirements. In this
context, inference refers to the process of an ANN generating outputs
based on its trained model parameters and given inputs.
    Therefore, tailored energy consumption metrics are essential to
ensure the alignment between the ANN implementation and the en-
ergy constraints of the targeted IoT application. To this aim, Neural
MCUs are new edge devices embedding ANN accelerators, specifically
designed to manage the trade-off between reliability, latency, cost,
and power consumption [13]. Therefore, adopting standardized metrics
and procedures is essential for assessing the actual performance gains
achieved by neural MCUs in the context of embedded AI. Although
several frameworks and tools have been proposed to facilitate the
benchmarking of tinyML models [14–16], no standardized metrics and
procedures are currently defined.
                                                                                  Fig. 1. Energy measurement set up proposed by MLPerf Tiny Benchmark [17,
    Among the proposed benchmarking protocols, MLPerf Tiny Bench-                 19]. The DUT is powered by the Energy Monitor. The IO manager serves as
mark (MLPTB) [17] is developed by the MLCommons Association,                      an electrical-isolation proxy.
the largest and most authoritative community aimed at improving
the industrialization standardization process of machine learning [18].
MLPTB provides protocols and AI components, namely datasets and
                                                                                  functionalities: (i) sending a trigger signal, (ii) enabling UART commu-
pre-trained ML models. These can act as metrological references when
                                                                                  nication, (iii) generating and feeding random input data to the ANN,
implemented on different hardware to assess their performance such
                                                                                  (iv) performing inferences, and (v) printing the prediction results. The
as the inference time and the power consumption under real-world
                                                                                  software includes a graphical user interface that can be run on the Host
conditions. However, the MLPTB protocols exhibit some metrological
                                                                                  Computer, allowing the initiation of the measurement and monitoring
weaknesses: (i) both the assessment of time performance and energy                   of input data. It is important to emphasize that in phase (iii) random
                                                                                  of input data. It is important to emphasize that in phase (iii) random
consumption is realized without measurement uncertainty computa-
                                                                                  data are generated to feed the ANN. This operation, however, does
tion, (ii) the energy consumption analysis is performed based on an
                                                                                  not reflect real-world applications, where the network processes sensor
approximate estimate of the average inference duration, and (iii) the
                                                                                  data in real time. Although not an intrinsic part of ANN inference,
impact on consumption caused by inferences is not isolated with respect
                                                                                  MLPTB includes this step in the performance and energy measurements.
to other processes.
                                                                                  Throughout this paper, phase (iii) is explicitly distinguished from phase
    In this paper, a new method is proposed and validated to improve
                                                                                  (iv) (i.e., inference) and is referred to as the pre-inference phase.
MLPTB protocols to measure power consumption in MCUs running
ANNs, in a rigorous metrological framework. Specifically, in Section 2                The energy per inference (𝐸𝑖𝑛𝑓 ) is calculated using latency infor-
the MLPTB framework is reported, then the proposed method is pre-                 mation determined in the Performance phase. Specifically, the IPS is
sented in Section 3. Experiments and results are reported in Section 4            determined by taking the median value across five experiments. In each
and discussed in Section 5.                                                       experiment, input data is provided for a duration of at least 10 s, and
                                                                                  the number of inferences is recorded via a direct connection between
2. Background                                                                     the Host Computer and the DUT. Given the IPS, 𝐸𝑖𝑛𝑓 is computed as:
                                                                                  $$E_{inf} = \frac{I_m \times V_n}{\tau \times IPS} \qquad (1)$$
     Several frameworks and tools have been introduced to support
the benchmarking of tinyML models [14–16]. Among the available                    where 𝑉𝑛 is the nominal voltage, 𝐼𝑚 is the current averaged over the
benchmarking protocols, the MLPerf Tiny Benchmark (MLPTB) [17],                   fixed period 𝜏.
developed by the MLCommons Association [18], emerges as a key
initiative.
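
The MLPTB energy computation recalled in this section can be illustrated with a minimal Python sketch. It follows the description of the MLPTB method as the ratio between the total energy measured in a time window and the number of inferences, with the number of inferences estimated as 𝜏 × IPS and the IPS taken as the median over the Performance-mode experiments. Reading the numerator of Eq. (1) as the energy accumulated over 𝜏 is an assumption of this sketch, and the function name, argument names, and numerical values are hypothetical, not taken from the MLPTB software.

```python
from statistics import median


def mlptb_energy_per_inference(current_samples_a, sample_rate_hz, v_nominal, ips_runs):
    """Energy per inference (J): window energy divided by the estimated inference count."""
    if not current_samples_a or not ips_runs:
        raise ValueError("need at least one current sample and one IPS value")
    ips = median(ips_runs)                               # median over the Performance-mode runs
    tau = len(current_samples_a) / sample_rate_hz        # window length (s)
    i_mean = sum(current_samples_a) / len(current_samples_a)  # I_m, mean current over tau (A)
    window_energy = v_nominal * i_mean * tau             # assumed numerator: energy over tau (J)
    n_inferences = tau * ips                             # estimated, not counted
    return window_energy / n_inferences


# Hypothetical values for illustration only (not measurements from the paper)
samples = [0.010] * 1000                                 # 10 mA for 10 s at 100 Sa/s
e_inf = mlptb_energy_per_inference(samples, 100.0, 3.3, [48.0, 50.0, 50.0, 51.0, 52.0])
print(f"{e_inf * 1e6:.1f} uJ per inference")
```

The sketch makes the weakness discussed in this paper visible: the inference count is an estimate derived from a separate latency measurement, and every sample in the window contributes, whether or not an inference was running.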
                                                                                  3. Proposed method
     MLPTB proposes two modalities of assessment: (i) Performance and
(ii) Energy. The former measures Latency (inferences per second — IPS)
and accuracy (percentage of correct predictions to all predictions ratio)             The MLCommons pre-inference phase generates random numbers as
through a direct USB connection between a Device Under Test (DUT)                 input to the ANN in order to perform inference (in addition to memory
and a host computer, while the latter measures energy (micro-joules               operations needed to provide the input to the network). However,
per inference). In the remainder of this section, the energy                      random number generation is hardly reproducible across different devices
configuration mode is detailed, as it represents the central focus of this study. In   under test, since both the libraries and the hardware resources available
the energy configuration mode (Fig. 1), an Energy Monitor is proposed             on the microcontrollers for random number generation vary. In con-
to supply power to the DUT while measuring the current consumption.               trast, the proposed work selectively excludes the pre-inference phase
An Input/Output Manager is introduced to interface the Host Computer              from the performance and energy measurements, ensuring greater re-
with the DUT and serving as an electrical-isolation proxy. Furthermore,           producibility while also providing a closer adherence to the actual
MLPTB requires level shifters to adapt the power supply in input to the           operation of the device in real-world scenarios. In the following of this
DUT (not reported in Fig. 1 to simplify the schematic as they are not             section, the proposed method is described. In paragraph 3.1 the circuit
essential to the discussion).                                                     solution for the joint measurement of time and energy consumption
     In addition to defining assessment procedures, MLPTB provides                is described. In paragraph 3.2 the expected impact of the method on
some firmware and software [19] for ML tasks on DUT. In particular,               selectivity, accuracy, and uncertainty during the energy measurement
the provided firmware to be loaded onto the DUT ensures the following             is highlighted.



                                                                                inference. Furthermore, it is assumed with a non-negligible degree of
                                                                                approximation that the inferences are executed consecutively by the
                                                                                MCU, disregarding the impact of inter-inference operations that are
                                                                                still present. Finally, the delays in the transmission of the command for
                                                                                starting the measurement have a further impact on the accuracy, albeit
                                                                                to a very small extent. Specifically, this refers to the time taken by the
                                                                                CPU on the DUT to generate the trigger signal and by the Measurement
                                                                                Board to handle the interrupt triggered at its input pin (see Fig. 3).
                                                                                     In the proposed method, limiting the observation to a single in-
                                                                                ference at a time eliminates the approximation inherent in MLPTB,
                                                                                where the inference duration is estimated through the average of
                                                                                multiple successive inferences executed within a known time window.
                                                                                Specifically, the proposed method allows the exclusion of all energy
                                                                                contributions unrelated to the inference itself (e.g., data transfer op-
                                                                                erations to memory during the pre-inference phase). However, in the
                                                                                proposed method, the repetition of the measurement for each inference
                                                                                amplifies the impact of inaccuracies caused by the delay in transmitting
                                                                                the status signal. In contrast, the MLPTB approach mitigates this effect
Fig. 2. Proposed energy measurement setup. The Host Computer powers the
                                                                                because the delay only occurs at the start of the measurement for
DUT and an ammeter is connected in series along the power line on the DUT       multiple inferences. To address this issue, the inference duration (𝛥𝑡)
(e.g. a MCU).                                                                   measurement is also performed. In the firmware for the DUT, the
                                                                                onboard counter is read immediately before and after the inference
                                                                                execution. The measured 𝛥𝑡 is used to appropriately resize the current sample
                                                                                vector acquired while the inference status signal is active. The current
3.1. Circuit diagram and measurement procedure
                                                                                sample vector is trimmed at both ends by a number of elements (𝑁𝑡𝑟𝑖𝑚 ),
                                                                                calculated as follows:
    The proposed method utilizes an ammeter that does not require
powering the DUT to measure the absorbed current. The ammeter is                $$N_{trim} = \frac{f_c}{2}\left(\frac{N_{cs}}{f_c} - \Delta t\right) \qquad (2)$$
connected in series to the microprocessor on the MCU powered by the
Host Computer through the USB port (Fig. 2). This approach allows               where 𝑓𝑐 is the sampling frequency of the Ammeter, 𝑁𝑐𝑠 is the number
the Host Computer to perform both latency and energy measurements               of current samples acquired when the inference status signal is high,
simultaneously. Indeed, the firmware provided by MLPTB enables the              and 𝛥𝑡 is the inference duration.
DUT to update the Host Computer on the number of completed infer-
ences through the USB connection. Instead of computing the energy               3.3. Uncertainty improvements
per inference as the ratio between the total energy measured in a
specific time window and the number of inferences (MLPTB method),                   Two distinct phases should be addressed in the evaluation of un-
the proposed method computes the energy for each inference without              certainty: (i) the inference time measurement, and (ii) the energy
considering the impact of pre-inference phase. This is obtained by              consumption assessment. In particular, an important source of un-
modifying the firmware provided by MLPTB: the trigger is replaced by            certainty in MLPTB is due to the counting of inferences during the
a logic signal (inference status) that goes high during an ongoing                IPS measurement affecting inference time measurement and,
inference and returns low otherwise. The inference status signal output from     consequently, also the energy consumption assessment. More specifically, the
the device under test is sampled by the Measurement Board (ammeter)             measurement window is not an integer multiple of the inference period,
in parallel with the current (Fig. 3.a). Two vectors of synchronously           therefore, there is no synchronization between the end of the last
sampled data (current and inference status signal) are sent to the Host         inference and the end of the measurement window. This contribution
Computer. The current samples are processed, and the energy consump-            can be modeled by a uniform random variable whose domain is equal
tion is calculated only when the inference status samples indicate a            to the central value inference duration 𝛥𝑡𝑚 , with a standard deviation
high logic signal. Additionally, before and after each inference, the DUT       𝜎1𝑐𝑜𝑛𝑡 computed as:
reads the values of the Clock and Reset Management Unit (CRMU) and
                                                                                $$\sigma_{1cont} = u_{t1} = \frac{\Delta t_m}{2\sqrt{3}} \qquad (3)$$
transmits them to the Host Computer to determine the duration of the
inference. Finally, the software on the Host Computer computes the
mean value of 𝑁 inferences with associated uncertainty. In this work,           The uncertainty of the MLPTB method is assessed by assuming the
𝑁 is set to 100. Similar to the MLPTB, the proposed firmware runs as            median inference duration approximately equal to the mean. Differ-
the sole program on the MCU, with fully sequential execution and no             ently, in the proposed method the counting uncertainty is determined
concurrency or interrupts. Furthermore, in the proposed method, the             by the fact that the inference duration is not an integer multiple of
inference status signal is set high immediately after the pre-inference         the counter period (𝑇𝑐 ). Again, the random variable with uniform
phase, and the CRMU is queried right before the inference execution.            probability distribution effectively describes this aspect. The standard
As soon as the inference completes, the CRMU is queried again, and              deviation 𝜎2𝑐𝑜𝑛𝑡 is computed as:
finally the inference status is set low to signal the ammeter that the
inference has finished. In Fig. 4, a flowchart describing the customized        $$\sigma_{2cont} = u_{t2} = \frac{T_c}{2\sqrt{3}} \qquad (4)$$
firmware behavior is reported.
                                                                                Assuming that 𝛥𝑡𝑚 ≫ 𝑇𝑐 , it follows 𝑢𝑡1 ≫ 𝑢𝑡2 and the proposed method
3.2. Accuracy improvements                                                      improves the measurement uncertainty due to counting.
                                                                                    Then there is the uncertainty due to the variability of the duration
   In the MLPTB, the number of inferences during the measurement                time of the processes between the inferences (pre-inference phase). The
time in energy mode is calculated using the IPS obtained from the               proposed method is not affected by this source of uncertainty because
previous latency measurement. This approach introduces accuracy is-             it excludes from the energy measurement all the processes outside
sues because an estimator is used instead of the actual time of each            the inference. Finally, both methods are exposed to the uncertainty



Fig. 3. Comparison between the block diagram of the proposed method (a) and the MLCommons-Tiny approach (b) for energy consumption measurement. The
added blocks and signals are reported in red. In the proposed method, the Device Under Test stops the power consumption computation after each inference.
Differently, in the MLCommons-Tiny approach, the Host Computer stops the acquisition of current samples after a fixed time window, without distinguishing
between pre-inference and inference phases. Furthermore, it computes the energy consumption (μJ per inference) based on the Inference per Second measured
exploiting the Performance mode (see Section 2). The Counter and the Time Calculator blocks are used for the measurement of the duration of each inference,
while an Inference Status ADC minimizes the latency between the inference start and current sample consideration. (For interpretation of the references to color
in this figure legend, the reader is referred to the web version of this article.)


                                                                                          according to the following formula [20]:
                                                                                          $$u_c = \sqrt{u_A^2 + u_{B_1}^2 + u_{B_2}^2 + \cdots + u_{B_K}^2}. \qquad (5)$$
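
As a numerical illustration of Eqs. (3), (4), and (5), the following Python sketch computes the standard deviation of a uniform distribution of given width and combines one type A contribution with several type B contributions by root-sum-of-squares. All numerical values are hypothetical; in particular, taking the counter period as the inverse of a 280 MHz clock is an assumption made only for this example, and the helper names are not part of any published software.

```python
import math


def uniform_std(width):
    """Standard deviation of a uniform distribution with the given full width."""
    return width / (2.0 * math.sqrt(3.0))


def combined_standard_uncertainty(u_a, u_b_list):
    """Root-sum-of-squares of one type A and K type B contributions, as in Eq. (5)."""
    return math.sqrt(u_a ** 2 + sum(u ** 2 for u in u_b_list))


# Hypothetical values for illustration only
dt_m = 20e-3                 # central inference duration (s)
t_c = 1.0 / 280e6            # counter period (s), assuming the counter ticks at 280 MHz

u_t1 = uniform_std(dt_m)     # counting uncertainty of the windowed approach, Eq. (3)
u_t2 = uniform_std(t_c)      # counting uncertainty of the per-inference approach, Eq. (4)

# Combine a hypothetical type A term with two type B terms
u_c = combined_standard_uncertainty(3e-6, [u_t2, 4e-6])
print(f"u_t1 = {u_t1:.3e} s, u_t2 = {u_t2:.3e} s, u_c = {u_c:.3e} s")
```

With an inference duration of milliseconds and a counter period of nanoseconds, the counting term of Eq. (4) is several orders of magnitude smaller than that of Eq. (3), so it contributes negligibly to the combined uncertainty in this example.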


                                                                                          4. Experiments and results

                                                                                             In this section, a comparison between the application of the pro-
                                                                                          posed and MLPTB methods is presented. In paragraph 4.1 the ex-
                                                                                          perimental procedure is described. The DUTs and the ammeter are
                                                                                          presented in paragraph 4.2. Results are reported in paragraph 4.3.

                                                                                          4.1. Experimental procedure

                                                                                             The MLPTB method was implemented using two different circuit
                                                                                          configurations for measuring inference duration and energy per infer-
                                                                                          ence, as described in [17]. Instead, in the proposed method the two
                                                                                          measures were realized with the same circuital solution shown in Fig. 2.
                                                                                          The Firmware used for MLPTB measurement was modified to allow the
                                                                                          measurement of the single inference as described in the paragraph 3.1.
                                                                                          The four MLPerf benchmarks were retained: (i) Anomaly Detection, (ii)
                                                                                          Keyword Spotting, (iii) Image Classification, (iv) Visual Wake Words.
                                                                                          Each benchmark targets a specific use case and specifies a dataset, a
                                                                                          model, and a quality target [17].
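
The host-side processing of the single-inference measurement recalled above (paragraph 3.1), together with the trimming rule of Eq. (2), can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the software used in the experiments: the function and argument names are hypothetical, the trim count is rounded to the nearest integer and clamped at zero, the supply voltage is treated as a known constant, and the energy is obtained by rectangular integration of the retained samples.

```python
def trim_count(f_c, n_cs, delta_t):
    """Samples to drop at each end, from Eq. (2): (f_c / 2) * (n_cs / f_c - delta_t)."""
    n_trim = round(0.5 * f_c * (n_cs / f_c - delta_t))  # rounding is an assumption
    return max(n_trim, 0)                                # clamping is an assumption


def energy_per_inference(current_a, status, durations_s, f_c, v_supply):
    """Return one energy value (J) per inference from synchronously sampled vectors."""
    # Locate the runs of samples where the inference status signal is high.
    runs, start = [], None
    for k, s in enumerate(list(status) + [0]):           # sentinel closes a trailing run
        if s and start is None:
            start = k
        elif not s and start is not None:
            runs.append((start, k))
            start = None
    # Trim each run to the counter-measured duration, then integrate.
    energies = []
    for (a, b), delta_t in zip(runs, durations_s):
        n_trim = trim_count(f_c, b - a, delta_t)
        kept = current_a[a + n_trim:b - n_trim]
        energies.append(v_supply * sum(kept) / f_c)
    return energies


# Hypothetical data: 1 kSa/s, two inferences of 10 ms each, 20 mA while inferring
status = [0, 0] + [1] * 12 + [0, 0, 0] + [1] * 10 + [0]
current = [0.020 if s else 0.005 for s in status]
energies = energy_per_inference(current, status, [0.010, 0.010], 1000.0, 3.3)
print([f"{e * 1e6:.1f} uJ" for e in energies])
```

In this example the first status window is two samples longer than the measured duration, so one sample is discarded at each end, while the second window already matches the duration and is kept whole.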

                                                                                          4.2. Experimental setup

                                                                                              Both methods are applied to three different MCUs: STMicroelectronics
                                                                                          STM32-H7 (Clock Frequency = 280 MHz), STMicroelectronics
                                                                                          STM32-U5 (Clock Frequency = 160 MHz), and Rockchip RV1106 (Clock
Fig. 4. Flow chart of the proposed Firmware. The pre-inference phase (in red)             Frequency = 1200 MHz). The STM32H7 and the STM32U5 are general-
is excluded from both time (CRMU timestamp read) and energy assessment                    purpose microcontrollers, the former designed for high-performance
(‘‘Inference Status’’ digital signal setting and unsetting). (For interpretation of       applications and the latter for ultra-low-power operation, both
the references to color in this figure legend, the reader is referred to the web          produced by STMicroelectronics. These devices do not have any
version of this article.)                                                                 dedicated Neural Processing Unit (NPU) hardware for ANN computation,
                                                                                          so this computation is performed by firmware that
                                                                                          runs on the main Central Processing Unit (CPU). The firmware is
of the stability of the DUT (jitter) and ammeter precision, as well                       automatically deployed using ST EdgeAI Core Technology and compiled through
as to the uncertainty of the signal transmission times between the                        STMCubeIDE [21] compiler implementing all needed tools to convert,
devices involved in the measurement process. For the calculation of                       optimize, and implement ANN models on the DUT.
the measurement uncertainty, the combined standard uncertainty 𝑢𝑐 is                          The evaluation boards of the STMicroelectronics Nucleo-STM32H7
adopted, where the contribution from the type A evaluation (𝑢𝐴 ) is                       with STM32H7 microcontroller and B-U585I-IOT02A Discovery Kit
integrated with the 𝐾 contributions from the type B evaluations (𝑢𝐵𝑘 ),                   with STM32U5 microcontroller were chosen for the experimental setup



Fig. 5. Hardware components used in the experiments: (a) H7 board with STM32H7 MCU, (b) Luckfox Pico Pro Max with Rockchip RV1106 SoC, (c) B-U585I-IOT02A Discovery Kit with STM32U5 MCU, and (d) Power Profiler Kit II ammeter.


(Figs. 5(a), 5(c)). They include a connector in series with the MCU's power supply line, allowing an ammeter to be inserted to assess the power consumption of the DUT under operating conditions.
    The RV1106 is a System on Chip (SoC) produced by Rockchip Electronics. This device has dedicated NPU hardware, so ANN models are computed in hardware and the software only has to place the necessary data in a dedicated memory area. While the STM32 microcontrollers operate without an operating system, the RV1106 requires one given its CPU architecture. Ubuntu 22.04 RT [22] was therefore installed to minimize execution timing uncertainties.
    The software is deployed using the RKNN Toolkit compiler, which provides all the tools needed to convert, optimize, and implement ANN models on the device. The evaluation board with the Rockchip RV1106 chosen for the experimental setup is the Luckfox Pico Pro Max (Fig. 5(b)). The ammeter is inserted between the USB-C main supply and the SoC's power supply line in order to assess the power consumption of the device under operating conditions.
    The measurement board used for the power assessment is the Power Profiler Kit II (PPKII) produced by Nordic Semiconductor (Fig. 5(d)). This device is composed of an ammeter and an 8-bit digital sampler synchronized to the same time base. It can operate in two different modes, which affect only the ammeter component:

     • Source Meter: In this mode, the internal ammeter is linked to a power supply generator that can be used to power the DUT. This mode was adopted for the MLPTB implementation.
     • Ammeter Mode: In this mode, the instrument works as a pure ammeter and the DUT can be powered externally. This mode was adopted for the proposed method.

    For both modes, the device was metrologically characterized under operating conditions of 20–30 °C (the same conditions used for all experiments), exhibiting an uncertainty of less than 2%.

4.3. Results

    For the proposed method, a characterization of the CRMU query latency was carried out on all devices. A modified version of the same firmware used for the energy consumption assessment was employed. Specifically, an additional CRMU query was appended directly after the preceding one, making it consecutive to the two already present. The CRMU query latency was measured as the difference between the counter values returned by two consecutive CRMU readings. On each board, 30 experiments were performed, each providing two latency values. For each board, the mean value and type A uncertainty were computed. In the worst case, namely the Rockchip, the latency was found to be 7 ± 4 CPU clock cycles (2 ± 1 for the other two boards), which corresponds to only a few nanoseconds. Tables 1, 2, and 3 present the results of inference duration ($\Delta t$) assessments conducted using both the MLPTB and the proposed methods. The results are reported for the Rockchip RV1106, STM32H7, and STM32U5, respectively, for the different ANN models. Concerning uncertainty computation, the MLPTB method does not provide strategies for calculating measurement uncertainty; in this work, it was computed from the sole contribution of inference counting (Eq. (2)). In the proposed method, since the Clock and Reset Management Unit (CRMU) of the MCUs is employed for inference time measurement, the type A uncertainty is combined with type B contributions arising from counting uncertainty, system clock stability (jitter), and the response time required by the CRMU to be queried and to return a value. For all the considered microcontrollers, the type B contribution was found to be dominated by the counting uncertainty, computed using formula (4), and equal to 289 ns. The jitter contribution is at least three orders of magnitude smaller at room temperature (between 20 °C and 30 °C) [23–25]. Similarly, the uncertainty related to the CRMU response time, characterized in this work for all three microcontrollers, was found to be equal to 1 CPU clock cycle. In the worst case, i.e., considering the STM32U5 device with the lowest CPU clock frequency, this contribution was on the order of nanoseconds. Therefore, the overall evaluated uncertainty corresponds to the joint contribution of type A and type B, with the latter coinciding with the counting uncertainty, according to:

$u_t = \sqrt{u_A^2 + u_B^2}$                                                        (6)

    To propagate the measurement uncertainty of $\Delta t$ to the energy per inference ($E_{inf}$) measurement, a constant power $P$ is assumed during the inference time, obtaining the following propagation formula:

$E_{inf} = P\,\Delta t \Rightarrow u_e = P\,u_t$                                                    (7)

where $u_e$ is the energy per inference measurement uncertainty. With respect to the energy consumption estimation, an additional uncertainty source arises from the measuring instrument, i.e., the ammeter employed. For both methods, an instrumental uncertainty of 2% was considered, after a metrological characterization performed under operating conditions at room temperature (between 20 °C and 30 °C). The



Table 1
Comparison of central value (𝑚𝑡) and uncertainty^a (𝑢𝑡) of inference duration (expressed in ms) assessed by the MLCommons (MLPTB) and proposed methods on the Rockchip RV1106 for different neural models.
 Method      Visual Wake Words      Image Classification   Keyword Spotting       Anomaly Detection
             𝑚𝑡        𝑢𝑡            𝑚𝑡        𝑢𝑡            𝑚𝑡        𝑢𝑡            𝑚𝑡        𝑢𝑡
 Proposed    0.820     0.006         0.415     0.012         0.400     0.008         0.558     0.033
 MLPTB       0.815     0.235         0.414     0.120         0.371     0.107         0.350     0.101
^a In MLPTB, the counting uncertainty was taken into account.
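
As a rough cross-check of the CRMU figures quoted in Section 4.3, clock cycles can be converted into time using the clock frequencies stated for the RV1106 (1200 MHz) and the STM32U5 (160 MHz). The sketch below is illustrative only; the function `cycles_to_ns` and the constant names are hypothetical and are not part of the authors' firmware.

```python
def cycles_to_ns(cycles: float, clock_hz: float) -> float:
    """Convert a count of CPU clock cycles into nanoseconds."""
    return cycles / clock_hz * 1e9


# Clock frequencies as stated in the text for the two devices.
RV1106_CLOCK_HZ = 1200e6
STM32U5_CLOCK_HZ = 160e6

# Worst-case mean CRMU query latency (7 cycles, measured on the RV1106).
rv1106_latency_ns = cycles_to_ns(7, RV1106_CLOCK_HZ)

# CRMU response-time uncertainty (1 cycle) on the slowest clock (STM32U5).
stm32u5_response_ns = cycles_to_ns(1, STM32U5_CLOCK_HZ)

# Counting uncertainty reported in the text, for comparison.
COUNTING_UNCERTAINTY_NS = 289.0

print(f"RV1106 query latency:  {rv1106_latency_ns:.2f} ns")
print(f"STM32U5 response time: {stm32u5_response_ns:.2f} ns")
print(f"Counting / response:   {COUNTING_UNCERTAINTY_NS / stm32u5_response_ns:.1f}x")
```

Both converted values are around 6 ns, a few percent of the 289 ns counting uncertainty, which is in line with the statement that the counting term dominates the type B contribution.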


Table 2
Comparison of central value (𝑚𝑡) and uncertainty^a (𝑢𝑡) of inference duration (expressed in ms) assessed by the MLCommons (MLPTB) and proposed methods on the STM32H7 microcontroller for different neural models.
 Method      Visual Wake Words      Image Classification   Keyword Spotting       Anomaly Detection
             𝑚𝑡        𝑢𝑡            𝑚𝑡        𝑢𝑡            𝑚𝑡        𝑢𝑡            𝑚𝑡        𝑢𝑡
 Proposed    29.656    0.003         49.941    0.001         14.860    0.001         1.690     0.002
 MLPTB       29.600    8.545         51.900    14.982        15.400    4.446         1.800     0.520
^a In MLPTB, the counting uncertainty was taken into account.


Table 3
Comparison of central value (𝑚𝑡) and uncertainty^a (𝑢𝑡) of inference duration (expressed in ms) assessed by the MLCommons (MLPTB) and proposed methods on the STM32U5 microcontroller for different neural models.
 Method      Visual Wake Words      Image Classification   Keyword Spotting       Anomaly Detection
             𝑚𝑡        𝑢𝑡            𝑚𝑡        𝑢𝑡            𝑚𝑡        𝑢𝑡            𝑚𝑡        𝑢𝑡
 Proposed    78.447    0.002         133.280   0.002         48.060    0.001         4.910     0.002
 MLPTB       71.600    20.669        128.200   37.008        38.600    11.143        4.800     1.386
^a In MLPTB, the counting uncertainty was taken into account.
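
The MLPTB uncertainties tabulated for inference duration are numerically consistent with dividing each central value by √12, the standard-uncertainty factor of a rectangular distribution. Eq. (2) is not reproduced in this part of the paper, so reading it as $m_t/\sqrt{12}$ is an assumption made here; the sketch below only checks that reading against the MLPTB row of Table 3, and all names in it are hypothetical.

```python
import math

# MLPTB row of Table 3 (STM32U5), in ms: model -> (m_t, tabulated u_t).
TABLE3_MLPTB_MS = {
    "Visual Wake Words": (71.600, 20.669),
    "Image Classification": (128.200, 37.008),
    "Keyword Spotting": (38.600, 11.143),
    "Anomaly Detection": (4.800, 1.386),
}


def rectangular_uncertainty(full_width: float) -> float:
    """Standard uncertainty of a rectangular distribution of given full width."""
    return full_width / math.sqrt(12)


# Assumption: the counting uncertainty scales with the central value itself.
recomputed_ms = {
    model: rectangular_uncertainty(m_t)
    for model, (m_t, _) in TABLE3_MLPTB_MS.items()
}

for model, (m_t, u_tab) in TABLE3_MLPTB_MS.items():
    print(f"{model:22s} tabulated {u_tab:7.3f}  recomputed {recomputed_ms[model]:7.3f}")
```

Under this assumption the relative uncertainty of MLPTB is fixed at about 29% of the measured duration, whereas the proposed method's uncertainty does not scale with the inference time.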


Table 4
Comparison of central value (𝑚𝑡) and uncertainty^a (𝑢𝑒) of energy (expressed in μJ) assessed by the MLCommons (MLPTB) and proposed methods on the Rockchip RV1106 for different neural models.
 Method      Visual Wake Words      Image Classification   Keyword Spotting       Anomaly Detection
             𝑚𝑡        𝑢𝑒            𝑚𝑡        𝑢𝑒            𝑚𝑡        𝑢𝑒            𝑚𝑡        𝑢𝑒
 Proposed    380       13            193       15            165       9             222       11
 MLPTB       373       108           183       53            159       46            148       43
^a In MLPTB, the counting uncertainty was propagated into the energy measurements.
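
The average difference between the two methods on the RV1106 can be recomputed from the central values in Table 4. The sketch assumes the average is an unweighted mean of the per-model relative differences, which the text does not state explicitly; the names used are hypothetical.

```python
# Energy per inference on the Rockchip RV1106 (Table 4), in microjoules:
# model -> (proposed method, MLPTB).
TABLE4_ENERGY_UJ = {
    "Visual Wake Words": (380, 373),
    "Image Classification": (193, 183),
    "Keyword Spotting": (165, 159),
    "Anomaly Detection": (222, 148),
}


def relative_increase(proposed: float, mlptb: float) -> float:
    """Relative difference of the proposed value with respect to MLPTB."""
    return (proposed - mlptb) / mlptb


increases = {
    model: relative_increase(proposed, mlptb)
    for model, (proposed, mlptb) in TABLE4_ENERGY_UJ.items()
}

# Unweighted mean over the four models, expressed as a percentage.
mean_increase_pct = 100 * sum(increases.values()) / len(increases)

for model, value in increases.items():
    print(f"{model:22s} {100 * value:5.1f} %")
print(f"Mean increase: {mean_increase_pct:.1f} %")
```

With this reading, the mean is driven largely by Anomaly Detection (+50%), while the other three models differ by roughly 2% to 5%.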


Table 5
Comparison of central value (𝑚𝑡) and uncertainty^a (𝑢𝑒) of energy (expressed in μJ) assessed by the MLCommons (MLPTB) and proposed methods on the STM32H7 microcontroller for different neural models.
 Method      Visual Wake Words      Image Classification   Keyword Spotting       Anomaly Detection
             𝑚𝑡        𝑢𝑒            𝑚𝑡        𝑢𝑒            𝑚𝑡        𝑢𝑒            𝑚𝑡        𝑢𝑒
 Proposed    4386      88            7536      151           2202      44            236       6
 MLPTB       3699      1068          6311      1822          1870      540           221       64
^a In MLPTB, the counting uncertainty was propagated into the energy measurements.
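
The propagation chain of Eqs. (6) to (8) can be illustrated with the STM32H7 Visual Wake Words entries of Tables 2 and 5. The sketch assumes that the constant power $P$ is estimated as the tabulated energy divided by the tabulated duration, and that the 2% instrumental uncertainty is taken relative to the energy central value; both are assumptions made here for illustration, as are all variable and function names.

```python
import math


def combine_in_quadrature(*contributions: float) -> float:
    """Root-sum-of-squares combination of independent standard uncertainties."""
    return math.sqrt(sum(u * u for u in contributions))


# Proposed method, STM32H7, Visual Wake Words.
energy_uj = 4386.0     # central energy value, Table 5
duration_ms = 29.656   # central inference duration, Table 2
u_t_ms = 0.003         # duration uncertainty, Table 2 (type A and B combined)

# Assumed constant power during inference (uJ / ms = mW).
power_mw = energy_uj / duration_ms

# Time uncertainty propagated to energy through E = P * dt.
u_tp_uj = power_mw * u_t_ms

# Instrumental (ammeter) contribution, assumed to be 2% of the energy.
u_s_uj = 0.02 * energy_uj

# Final energy uncertainty: both contributions combined in quadrature.
u_e_uj = combine_in_quadrature(u_tp_uj, u_s_uj)

print(f"P    = {power_mw:.1f} mW")
print(f"u_tp = {u_tp_uj:.2f} uJ")
print(f"u_s  = {u_s_uj:.2f} uJ")
print(f"u_e  = {u_e_uj:.2f} uJ")
```

Under these assumptions the instrumental term dominates by more than two orders of magnitude, and the rounded result agrees with the 88 μJ reported for this entry in Table 5.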


final uncertainty was thus obtained by applying the following formula:

$u_e = \sqrt{u_{t_p}^2 + u_s^2}$                                                                     (8)

where $u_{t_p}$ denotes the inference time measurement uncertainty $u_t$ propagated through the functional relation used for energy computation (see formula (7)), and $u_s$ represents the instrumental uncertainty of the ammeter. For all tested devices, the measurement uncertainty obtained with the proposed method is very low compared to that of the MLPTB method.
    Tables 4, 5, and 6 compare the energy per inference assessed by the MLPTB and proposed methods for the three DUTs. On the Rockchip RV1106, the proposed method measures an inference energy value that is, on average, 15% higher than that obtained with MLPTB, while improving the uncertainty by a factor of 6. On the STM32H7, the assessed inference energy is 16% higher on average, while the uncertainty improves by a factor of 12. Notably, the inference energy assessment on the STM32U5 shows contrasting trends: for two networks, the measured consumption is higher with the proposed method, while for the other two networks it is higher with MLCommons. Regarding the uncertainty, the proposed method reduces it by a factor of 12.

5. Discussion

    The contrasting trends from the energy assessment on the STM32U5 provide an opportunity to discuss the relationship between the two methods in terms of metrological accuracy. The MLCommons method extracts a central Inference Per Second value based on five experiments, whereas our method computes a central value as the mean over 100 acquisitions. Given the large uncertainty of the MLPTB method and the limited number of experiments, the calculated central value is unlikely to be a reliable estimator of the true value of the measured quantity [26]. The comparison of mean values obtained with the two methods is limited by the large difference in their associated uncertainties. The less precise method exhibits an uncertainty up to two orders



Fig. 6. Temporal diagram of current values acquired from MCU during ANN operations. Orange traces represent (a) the inference status signal in the proposed
method and (b) the trigger signal in the MLPTB method. The windows used for energy consumption estimation are highlighted in light blue. Specifically, the
proposed method (a) considers only the current samples acquired during each neural network inference phase, whereas the MLPTB method (b) also includes the
energy contribution of pre-inference phases (light yellow window). (For interpretation of the references to color in this figure legend, the reader is referred to
the web version of this article.)


Fig. 7. Comparison between the proposed method (orange) and MLPTB (green) in energy per inference assessment on the Rockchip RV1106, for the different models provided by MLCommons. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


Table 6
Comparison of central value (𝑚𝑡) and uncertainty^a (𝑢𝑒) of energy (expressed in μJ) assessed by the MLCommons (MLPTB) and proposed methods on the STM32U5 microcontroller for different neural models.
 Method      Visual Wake Words      Image Classification   Keyword Spotting       Anomaly Detection
             𝑚𝑡        𝑢𝑒            𝑚𝑡        𝑢𝑒            𝑚𝑡        𝑢𝑒            𝑚𝑡        𝑢𝑒
 Proposed    2362      47            3249      65            1184      27            116       3
 MLPTB       1921      556           3384      980           1004      291           121       35
^a In MLPTB, the counting uncertainty was propagated into the energy measurements.
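
The two observations made about the STM32U5, namely the contrasting trends in the central values and the reduction in uncertainty, can be recomputed from Table 6. The sketch assumes the improvement factor is an unweighted mean of the per-model uncertainty ratios, which is one plausible reading rather than the authors' stated procedure; all names are hypothetical.

```python
# Table 6 (STM32U5), energy in microjoules:
# model -> {method: (central value, uncertainty)}.
TABLE6_UJ = {
    "Visual Wake Words": {"proposed": (2362, 47), "mlptb": (1921, 556)},
    "Image Classification": {"proposed": (3249, 65), "mlptb": (3384, 980)},
    "Keyword Spotting": {"proposed": (1184, 27), "mlptb": (1004, 291)},
    "Anomaly Detection": {"proposed": (116, 3), "mlptb": (121, 35)},
}

# Ratio of MLPTB uncertainty to proposed-method uncertainty, per model.
uncertainty_ratio = {
    model: row["mlptb"][1] / row["proposed"][1]
    for model, row in TABLE6_UJ.items()
}
mean_ratio = sum(uncertainty_ratio.values()) / len(uncertainty_ratio)

# Models for which the proposed method reports the higher central value.
higher_with_proposed = sorted(
    model
    for model, row in TABLE6_UJ.items()
    if row["proposed"][0] > row["mlptb"][0]
)

for model, ratio in uncertainty_ratio.items():
    print(f"{model:22s} uncertainty ratio {ratio:5.1f}")
print(f"Mean ratio: {mean_ratio:.1f}")
print("Higher with proposed method:", higher_with_proposed)
```

The split is two models against two, and for every model the difference between the central values is smaller than the MLPTB uncertainty, which is why the text treats the comparison of means as inconclusive.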


of magnitude higher than the other, so direct statistical comparisons of the means carry little significance. Observed differences may therefore primarily reflect the inherent variability of the less accurate method rather than genuine differences in the measured phenomenon. However, it is important to note that the proposed method provides greater selectivity by excluding the pre-inference phase (characterized by low energy consumption) from the calculation (Fig. 6). This prevents underestimation of the actual energy consumption, which may occur when using the MLPTB method.
    Finally, Figs. 7, 8, and 9 present the histograms of the energy per inference assessed with the two methods on the Rockchip RV1106, STM32H7, and STM32U5, respectively. The orange bars (proposed



Fig. 8. Comparison between the proposed method (orange) and MLPTB (green) in energy per inference assessment on the STM32H7, for the different models provided by MLCommons. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


Fig. 9. Comparison between the proposed method (orange) and MLPTB (green) in energy per inference assessment on the STM32U5, for the different models provided by MLCommons. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


method) are generally higher than the green bars (MLPTB). However, comparing the mean values measured by the two methods is challenging due to the large uncertainty intervals (error bars) associated with MLPTB. Nevertheless, the differences in error bar lengths confirm the improved precision of the proposed method.
    The metrological improvements introduced in this work have direct consequences for the practical adoption of embedded AI. First, more accurate and reproducible energy assessments enhance the reliability of benchmarking, enabling fair comparisons among devices and supporting informed selection of hardware for battery-powered applications, where autonomy is a critical design constraint. Second, the improved accuracy in energy characterization facilitates more precise sizing of power supply components, which is essential for ensuring efficiency, stability, and cost-effectiveness in embedded deployments. Finally, the refined timing characterization allows designers to better estimate inference latency, a key parameter for real-time and safety-critical applications.

6. Conclusions

    A new method for assessing the power consumption of edge devices such as MCUs running ANNs is presented, with metrological improvements over the MLPerf Tiny Benchmark. Unlike MLPTB, the proposed method calculates the duration and energy consumption of each individual inference performed by the Device Under Test. Through an appropriate circuit and firmware design, the method measures only the energy consumed by the inference, excluding other operations from the computation. This approach not only enhances the selectivity and accuracy of the measurement process but also reduces measurement uncertainty. Instead of counting the number of inferences over a fixed interval, as MLPTB does, the proposed method counts the number of ticks from the counter of the DUT during a single inference execution. On the NPU-equipped device, the proposed method improves measurement uncertainty by a factor of 6. For the two general-purpose microcontrollers (high-performance and ultra-low-power), the measurement uncertainty improves by a factor of 12.



CRediT authorship contribution statement

    Andrea Apicella: Writing – review & editing, Methodology, Conceptualization. Pasquale Arpaia: Writing – review & editing, Methodology, Conceptualization. Luigi Capobianco: Writing – review & editing, Methodology, Conceptualization. Francesco Caputo: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Antonella Cioffi: Writing – review & editing, Methodology, Conceptualization. Antonio Esposito: Writing – review & editing, Methodology, Conceptualization. Francesco Isgrò: Writing – review & editing, Methodology, Conceptualization. Rosanna Manzo: Writing – review & editing, Methodology, Conceptualization. Nicola Moccaldi: Writing – review & editing, Methodology, Conceptualization. Danilo Pau: Writing – review & editing, Methodology, Conceptualization. Ettore Toscano: Writing – review & editing, Methodology, Conceptualization.

Declaration of competing interest

    The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

    This work was carried out within the DHEAL-COM project (ID: PNC-E3-2022-23683267 PNC – HLS – DH; CUP: E63C22003790001), which was financially supported by the Italian Ministry of Health through the Complementary National Plan (CNP) to the PNRR. This publication reflects only the authors’ view and the Italian Ministry of Health is not responsible for any use that may be made of the information it contains.

Data availability

    Data will be made available on request.

References

 [1] R. Chataut, A. Phoummalayvane, R. Akl, Unleashing the power of IoT: A comprehensive review of IoT applications and future prospects in healthcare, agriculture, smart homes, smart cities, and industry 4.0, Sensors 23 (16) (2023) 7194.
 [2] Q. Ma, H. Tan, T. Zhou, Mutual authentication scheme for smart devices in IoT-enabled smart home systems, Comput. Stand. Interfaces 86 (2023) 103743.
 [3] C.-W. Shih, C.-H. Wang, Integrating wireless sensor networks with statistical quality control to develop a cold chain system in food industries, Comput. Stand. Interfaces 45 (2016) 62–78.
 [4] S.B. Baker, W. Xiang, I. Atkinson, Internet of things for smart healthcare: Technologies, challenges, and opportunities, IEEE Access 5 (2017) 26521–26544.
 [5] Y. Abadade, A. Temouden, H. Bamoumen, N. Benamar, Y. Chtouki, A.S. Hafid, A comprehensive survey on tinyml, IEEE Access (2023).
 [6] M. Cunneen, M. Mullins, F. Murphy, Autonomous vehicles and embedded artificial intelligence: The challenges of framing machine driving decisions, Appl. Artif. Intell. 33 (8) (2019) 706–731.
 [7] J. Li, S. Dang, M. Wen, Q. Li, Y. Chen, Y. Huang, W. Shang, Index modulation multiple access for 6G communications: Principles, applications, and challenges, IEEE Netw. 37 (1) (2023) 52–60.
 [8] M. Wen, B. Zheng, K.J. Kim, M. Di Renzo, T.A. Tsiftsis, K.-C. Chen, N. Al-Dhahir, A survey on spatial modulation in emerging wireless systems: Research progresses and applications, IEEE J. Sel. Areas Commun. 37 (9) (2019) 1949–1972.
 [9] M.I. Jordan, T.M. Mitchell, Machine learning: Trends, perspectives, and prospects, Science 349 (6245) (2015) 255–260.
[10] S. Mishra, J. Manda, Improving real-time analytics through the internet of things and data processing at the network edge, J. AI Assist. Sci. Discov. 4 (1) (2024) 184–206.
[11] M. De Donno, K. Tange, N. Dragoni, Foundations and evolution of modern computing paradigms: Cloud, IoT, edge, and fog, IEEE Access 7 (2019) 150936–150948.
[12] D.P. Pau, P.K. Ambrose, F.M. Aymone, A quantitative review of automated neural search and on-device learning for tiny devices, Chips 2 (2) (2023) 130–141.
[13] C.-T. Lin, P.X. Huang, J. Oh, D. Wang, M. Seok, iMCU: A 102-𝜇J, 61-ms digital in-memory computing-based microcontroller unit for edge TinyML, in: 2023 IEEE Custom Integrated Circuits Conference, CICC, IEEE, 2023, pp. 1–2.
[14] S. Gal-On, M. Levy, Exploring coremark a benchmark maximizing simplicity and efficacy, Embed. Microprocess. Benchmark Consortium (2012).
[15] P. Torelli, M. Bangale, Measuring Inference Performance of Machine-Learning Frameworks on Edge-Class Devices with the Mlmark Benchmark, Technical Report, 2021, Available Online: https://www.eembc.org/techlit/articles/MLMARK-WHITEPAPERFINAL-1.pdf. (Accessed on 5 April 2021).
[16] B. Sudharsan, S. Salerno, D.-D. Nguyen, M. Yahya, A. Wahid, P. Yadav, J.G. Breslin, M.I. Ali, Tinyml benchmark: Executing fully connected neural networks on commodity microcontrollers, in: 2021 IEEE 7th World Forum on Internet of Things, WF-IoT, IEEE, 2021, pp. 883–884.
[17] C. Banbury, V.J. Reddi, P. Torelli, J. Holleman, N. Jeffries, C. Kiraly, P. Montino, D. Kanter, S. Ahmed, D. Pau, et al., Mlperf tiny benchmark, 2021, arXiv preprint arXiv:2106.07597.
[18] MLCommons, 2024, URL: https://mlcommons.org/benchmarks/inference-tiny/.
[19] Performance mode vs. Energy mode, 2022, URL: https://github.com/eembc/energyrunner?tab=readme-ov-file#performance-mode-vs-energy-mode.
[20] B.N. Taylor, C.E. Kuyatt, Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results, NIST Technical Note 1297, National Institute of Standards and Technology (NIST), Gaithersburg, MD, 2020, http://dx.doi.org/10.6028/NIST.TN.1297-2020.
[21] STMCubeIDE, 2022, URL: https://stm32ai.st.com/stm32-cube-ai/.
[22] Ubuntu 12 RT, 2012, Real-time variant of Ubuntu 12, Canonical Ltd. https://ubuntu.com/real-time.
[23] STMicroelectronics, STM32H753xI - 32-bit Arm® Cortex®-M7 480MHz MCUs, 2MB flash, 1MB RAM, 46 com. and Analog Interfaces, Crypto - Datasheet - Production Data, Datasheet DS12117 Rev 9, STMicroelectronics, 2023, p. 358, URL: https://www.st.com/resource/en/datasheet/stm32h753vi.pdf. (Accessed 21 August 2025).
[24] STMicroelectronics, STM32U575xx - Ultra-low-power Arm® Cortex®-M33 32-bit MCU+TrustZone®+FPU, 240 DMIPS, up to 2 MB Flash memory, 786 KB SRAM - Datasheet - production data, Datasheet DS13737 Rev 10, STMicroelectronics, 2024, p. 346, URL: https://www.st.com/resource/en/datasheet/stm32u575ag.pdf. (Accessed 21 August 2025).
[25] UEC Electronics, AR4236–AR4237 Luckfox Pico Pro/Max Datasheet, Datasheet, UEC Electronics, 2024, URL: https://uelectronics.com/wp-content/uploads/2024/07/AR4236-AR4237-Luckfox-Pico-Pro-Max-Datasheet.pdf. (Accessed 21 August 2025).
[26] I. BIPM, I. IFCC, I. ISO, O. IUPAP, Evaluation of measurement data - guide to the expression of uncertainty in measurement, JCGM 100: 2008 GUM 1995 with minor corrections, Jt. Comm. Guides Metrol. 98 (2008).

