
AN4777

Application note

How to optimize power consumption on STM32 MCUs

Introduction
This application note applies to the X-CUBE-REF-PM expansion package for STM32Cube, which includes power mode
examples for STM32G0 series, STM32L0 series, STM32L1 series and STM32L4 series microcontrollers.
Low power consumption is the biggest advantage of the low-power STM32 microcontrollers. The firmware example related to this
application note provides helpful hints on achieving the datasheet levels of power consumption and a simple framework to ease
further experimentation with different configurations.
The low-power STM32 microcontrollers have a rich variety of configuration options regarding the flash memory interface.
While the STM32G0 is not labeled as a low-power series, its feature set is similar, and it is a small device with low power consumption.
This application note showcases the different settings under various test conditions and provides guidelines for optimizing
power efficiency. It focuses particularly on the influence of the memory subsystem settings on execution efficiency. This
subject is covered at the same level of detail as in the product datasheets.

Reference documents
The reference documents are available on the STMicroelectronics website, www.st.com:
• Ultra-low-power STM32L0x3 advanced Arm®-based 32-bit MCUs reference manual (RM0367)
• STM32L100xx, STM32L151xx, STM32L152xx and STM32L162xx advanced Arm®-based 32-bit MCUs reference manual
(RM0038)
• STM32L4x6 advanced Arm®-based 32-bit MCUs reference manual (RM0411)
• STM32G0x1 advanced Arm®-based 32-bit MCUs reference manual (RM0444)

AN4777 - Rev 4 - June 2023 www.st.com

For further information contact your local STMicroelectronics sales office.

1 General information

This document applies to Arm®-based devices.

Note: Arm is a registered trademark of Arm Limited (or its subsidiaries) in the US and/or elsewhere.


2 Definitions

Table 1. List of acronyms

Term   Description

NV     Nonvolatile (memory), also referred to as flash memory
HSI    High-speed internal clock
SPI    Serial peripheral interface bus
MCU    Microcontroller
CPU    Central processing unit (part of the MCU)
NVIC   Nested vectored interrupt controller
DMA    Direct memory access
RM     Reference manual
SWD    Serial wire debug interface


3 System architecture

The memory interface manages the read and write accesses from the core/bus matrix towards the nonvolatile
memory. This holds for both the instruction and data access.
For configuring the nonvolatile memory read access during the program execution, the configuration flags are
accessible in the access control register.
The latency serves the purpose of reducing the rate at which the NVM is read. An extra wait cycle must be
enabled for a system clock higher than 16 MHz for the highest voltage regulator range. For lower core voltages,
this threshold frequency goes lower.
To compensate for this bandwidth deficiency, a prefetch can be configured. The memory controller then attempts to
have the next instruction ready before the core requests it.
The STM32L1 flash memory interface can use a 64-bit read access internally to serve the core with
data and instructions at close to its own pace. The extra 32 bits are used by the prefetch to load the next instruction
and provide it to the core immediately when needed.
The STM32L0 flash memory interface does not have the 64-bit wide bus, but the memory controller is capable of
data preread. This simple buffer is similar to the prefetch, but works also for data.
The STM32L4 flash memory interface has a full 64-bit wide (plus 8-bit ECC) connection to the bus matrix, shared
between data and instruction. The flash memory interface incorporates an ART Accelerator, a prefetch
mechanism and a cache designed to minimize the effect of memory latency. The flash memory interface is then
capable of transferring data and instruction simultaneously, under the condition that they are ready in the cache.
The STM32G0 flash memory interface features prefetch and instruction cache, though smaller than on the L4. No
cache is available for data read. It handles one or two banks of flash memory very similar to the situation found in
the STM32L4. Native word width is 64-bit plus 8-bit of ECC.
All the performance improvements resulting from the memory interface settings come at the cost of increased
power consumption. Access with no latency, no preread, no cache, and no prefetch is used in the low-power
mode. The following section sheds light on the kind of tradeoffs that the performance improvements represent.


4 Low-power modes

The bulk of this application note and its main focus are the run modes and efficiency of the code execution. This
is the main added value of this application note over the information covered in the datasheets.
For the sake of completeness, however, the low-power modes must be mentioned. These are the states in which
the CPU core cannot execute any code and only a selected subset of peripherals is active.
The following table compares the low-power modes between the MCU series:

Table 2. Low-power mode brief comparison

Sleep modes
  STM32L0 series/STM32L1 series: Either main or low-power regulator, flash memory clock off with low-power sleep
  STM32L4 series: Low-power regulator on, main regulator configurable, flash memory clock configurable
  STM32G0 series: Either main or low-power regulator, flash memory state in low-power mode configurable
Stop modes
  STM32L0 series/STM32L1 series: Single stop mode
  STM32L4 series: Stop0, Stop1, and Stop2 steps
  STM32G0 series: Stop0 and Stop1
Standby
  STM32L0 series/STM32L1 series: Available
  STM32L4 series: Available and also special shutdown mode implemented
  STM32G0 series: Available and shutdown mode as well

All necessary details about listed low-power modes are in the reference manual and datasheets.


5 Operation modes

The following operation modes are used to assess the impact of the memory interface settings on the
performance and power consumption. All the measurements have been done using VCC = 3.3 V and the voltage
regulator range 1. Both the speed and the consumption would be lower using lower regulator ranges, but they scale roughly
linearly relative to the range 1 measurements. For example, with the voltage regulator range 3 and the system clock
speed at 2 MHz (from MSI), both the power consumption and the performance would be roughly 10 times lower
for all the measured configurations. There is therefore no point in repeating the
measurements for all the configuration combinations.

5.1 STM32G0 device options

Up to two wait states may be configured on the STM32G0 series. Operation with zero latency is permitted up to
24 MHz in main regulator range 1 and to 8 MHz in range 2.

Table 3. The options in voltage regulator range 1

Frequency [MHz]        ≤24   ≤24   ≤48   ≤48   ≤48   ≤48   ≤64   ≤64   ≤64   ≤64
Latency                  0     0     1     1     1     1     2     2     2     2
Instruction cache        0     1     0     0     1     1     0     0     1     1
Prefetch                 0     0     0     1     0     1     0     1     0     1

While it is possible to enable the prefetch regardless of the latency setting, it makes no sense when the number of wait states
is zero. In range 2 the system clock is capped at 16 MHz, which is achieved with 1 wait state. For more details,
see chapter 3.3.4 in RM0444.
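
The frequency thresholds quoted in this section can be captured in a small helper. The following is a minimal sketch in C that assumes only the limits stated above (range 1: zero wait states up to 24 MHz, one up to 48 MHz, two up to 64 MHz; range 2: zero wait states up to 8 MHz, one up to 16 MHz). The names g0_vrange_t and g0_min_wait_states are hypothetical and are not part of X-CUBE-REF-PM or of any ST library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not part of any ST library. */
typedef enum { G0_RANGE_1 = 1, G0_RANGE_2 = 2 } g0_vrange_t;

/* Returns the minimum number of flash wait states for the given system
 * clock, or -1 if the clock exceeds the limit quoted for that range. */
int g0_min_wait_states(g0_vrange_t range, uint32_t sysclk_hz)
{
    assert(range == G0_RANGE_1 || range == G0_RANGE_2);

    if (range == G0_RANGE_1) {
        if (sysclk_hz <= 24000000u) return 0;
        if (sysclk_hz <= 48000000u) return 1;
        if (sysclk_hz <= 64000000u) return 2;
        return -1;
    }

    /* Range 2: zero latency up to 8 MHz, system clock capped at 16 MHz. */
    if (sysclk_hz <= 8000000u)  return 0;
    if (sysclk_hz <= 16000000u) return 1;
    return -1;
}
```

Returning -1 instead of silently clamping makes an out-of-envelope clock request visible to the caller before any register is touched.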

5.2 STM32L1 series device options

Table 4 lists a short summary of the device options. For a detailed description, refer to the read interface section of
the RM0038 reference manual.

Table 4. Configurations available on STM32L1 series devices with regulator range 1

Frequency [MHz]      <16   <16   <16   <16   >16   >16
Latency                0     0     1     1     1     1
64-bit                 0     1     1     1     1     1
Prefetch               0     0     0     1     0     1

The table of valid configurations demonstrates the following simple rules:
• Wait states are inevitable when exceeding 16 MHz.
• When the latency is set to 1, the 64-bit access is mandatory.
• The prefetch is impossible without the 64-bit access.
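
The three rules above translate directly into a validity check. The sketch below is a plain C model of the options in Table 4, not a register-level driver: the structure l1_flash_cfg_t and the function l1_flash_cfg_is_valid are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the STM32L1 read-interface options of Table 4. */
typedef struct {
    uint32_t sysclk_hz;  /* system clock frequency in Hz */
    unsigned latency;    /* flash wait states: 0 or 1 */
    bool     acc64;      /* 64-bit read access enabled */
    bool     prefetch;   /* prefetch enabled */
} l1_flash_cfg_t;

/* Returns true when the combination respects the three rules listed above
 * (regulator range 1 assumed). */
bool l1_flash_cfg_is_valid(const l1_flash_cfg_t *cfg)
{
    assert(cfg != NULL);

    if (cfg->latency > 1u) return false;
    /* Rule 1: a wait state is required above 16 MHz. */
    if (cfg->sysclk_hz > 16000000u && cfg->latency == 0u) return false;
    /* Rule 2: latency 1 requires the 64-bit access. */
    if (cfg->latency == 1u && !cfg->acc64) return false;
    /* Rule 3: prefetch requires the 64-bit access. */
    if (cfg->prefetch && !cfg->acc64) return false;
    return true;
}
```

A check of this kind could run before a configuration is applied, so that an invalid combination is rejected rather than written to the hardware.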

5.3 STM32L0 series device options

Table 5 lists a short summary of the device options. For a detailed description, refer to the section on reading the NVM in
the RM0367 reference manual.


Table 5. Configurations available on STM32L0 series devices with regulator range 1

Frequency [MHz]     <16  <16  <16  <16  <16  <16  <16  <16  >16  >16  >16  >16  >16
Latency               0    1    0    1    1    1    0    1    1    1    1    1    1
Preread               0    0    1    1    1    0    X    X    0    1    1    0    X
Prefetch              0    0    0    0    1    1    X    X    0    0    1    1    X
Buffer disable        0    0    0    0    0    0    1    1    0    0    0    0    1

The table of valid configurations demonstrates the following simple rules:
• The latency cannot be zero with clock speeds exceeding 16 MHz.
• When the buffer is disabled, the preread and prefetch settings have no effect (marked X in the table).
• Prefetch and preread configure the usage of the six words in the internal buffer, not their total amount.
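
As an illustration of the first two rules, the following C sketch models the STM32L0 options of Table 5. It treats the X entries as "don't care" by clearing preread and prefetch in the effective configuration whenever the buffer is disabled. The type l0_flash_cfg_t and both function names are hypothetical, and the model says nothing about how the hardware register itself behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the STM32L0 read-interface options of Table 5. */
typedef struct {
    uint32_t sysclk_hz;        /* system clock frequency in Hz */
    unsigned latency;          /* flash wait states: 0 or 1 */
    bool     preread;          /* data preread enabled */
    bool     prefetch;         /* instruction prefetch enabled */
    bool     buffer_disabled;  /* internal buffer switched off */
} l0_flash_cfg_t;

/* Rule 1: the latency cannot be zero above 16 MHz (regulator range 1). */
bool l0_flash_cfg_is_valid(const l0_flash_cfg_t *cfg)
{
    assert(cfg != NULL);
    if (cfg->latency > 1u) return false;
    return !(cfg->sysclk_hz > 16000000u && cfg->latency == 0u);
}

/* Rule 2: with the buffer disabled, preread and prefetch are "don't care";
 * the returned copy reports them as off. */
l0_flash_cfg_t l0_flash_cfg_effective(l0_flash_cfg_t cfg)
{
    if (cfg.buffer_disabled) {
        cfg.preread = false;
        cfg.prefetch = false;
    }
    return cfg;
}
```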


5.4 STM32L4 Series device options

Table 6 lists a short summary of the device options. For a detailed description, refer to the section on reading the NVM in
the RM0411 reference manual.

Table 6. Device option summary

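Frequency [MHz]       <16  <16  <16  <16  >16  >16  >16  >16  >16  >16  >16  >16
Latency                 0    0    0    0   >1   >1   >1   >1   >1   >1   >1   >1
Data cache              0    0    1    1    0    0    0    1    0    1    1    1
Instruction cache       0    1    0    1    0    0    1    0    1    1    0    1
Prefetch                0    0    0    0    0    1    0    0    1    0    1    1

The <16 MHz columns are given at VCORE range 1, the >16 MHz columns at VCORE range 2.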

The prefetch, data cache and instruction cache settings are independent of each other. Each of these three
features can be enabled or disabled independently of the frequency or any other setting. However, some settings
make less sense than others. In particular, the prefetch is not recommended with zero wait states.
The settings are simple only when the voltage regulator settings are disregarded, because the read access latency
strongly depends on them. For example, at 16 MHz the latency of a Flash read is 1 CPU cycle in range 1,
but it increases to 3 CPU cycles in range 2 at the same core frequency.
For more details, see the Read access latency section in the RM0411 reference manual.

5.5 Execution from a volatile memory

The intuitive way to avoid the flash memory speed issues would be to use the RAM for selected portions of code.
There are several reasons not to do that.
1. The RAM is a scarce resource on small devices.
2. Most of the data is likely to be placed in the RAM, so executing code from the RAM eliminates the advantage of the
Harvard architecture (separate data and instruction buses) in the STM32L1 and STM32L4 series.
3. To switch off the flash memory and conserve more energy, the interrupt table and interrupt handlers
also need to be in the RAM.
In a typical microcontroller application, the overall energy budget of the RAM execution is roughly the
same as that of execution at the 32 MHz system clock with the flash memory latency set. This means that if the
flash memory can run without the latency enabled, it is the better option most of the time. In other words, the RAM
execution tends to be about 30% slower than the execution of the same code from the flash memory, and the
current consumption does not decrease by more than the same 30%.
Note: When the decision is taken to use the RAM for code storage, the address on which the code is stored within
RAM may play a significant role in the power consumption figures. This note is not only relevant for STM32L4
series. Because the principle behind this behavior cannot be generally described for every configuration and use
case, it is best to figure out the optimal placement by experimenting with the application during development,
especially if the product features several separate sections of RAM with different properties.


6 Reproducing the measurements to get datasheet values

The STM32Cube Expansion Package (X-CUBE-REF-PM) related to this application note is intended for use with
cheap and widespread STM32 Nucleo application boards. With some effort, the examples can be adapted for
other hardware. This description refers to Nucleo boards.

6.1 Hardware and prerequisites

For simplicity's sake, the examples use the VCP UART embedded in the STLINK for the UI and controls. Only
a single USB cable is used to power and control the tested Nucleo board. A terminal emulator program is
necessary on the PC to connect to the virtual COM port. Tera Term 4.84 was used in the testing.
The Nucleo board is not equipped with any power sensing capability. However, it is equipped with a JP6 jumper
that can be replaced with an ammeter or any other current sensing device. For details, refer to the STM32
Nucleo-64 boards (MB1136) user manual (UM1724).
The X-NUCLEO-LPM01 energy monitor used in the development of this example is an ideal choice of current
and energy monitoring device for the Nucleo board.
For measurements involving the HSE bypass, a clock source is necessary. Nucleo boards, however, provide an
option to use the 8 MHz MCO from the STLINK as the HSE clock. Some solder bridges may need to be modified for this.
See the particular Nucleo user manual for details.

6.2 Example operation

Configure the terminal emulator application to 9600 Bd, 8-bit, no parity.
Once the firmware is loaded and executed, the terminal displays the following screen:

Figure 1. Terminal screen

All the controls are implemented as number key press inputs, with choices listed on the bottom of the screen. The
choices are not available at all times.
The control firmware deliberately tries to hide settings that are not applicable. For example, when a low-power
mode is selected, the executed code selection is hidden as not relevant.
Enter the number corresponding to the available choices (selections 1-5).


In case of another selection, the terminal asks for a new value. Once the choice is made, updated settings are
listed. For example, when the low-power run mode is selected, the oscillator settings are adjusted to produce a
compatible system clock.
To execute a test, first set the power mode: it determines the available system clock settings and the test
availability. For low-power mode the active peripherals may be selected, for run modes the executed code may be
selected.
The firmware tries to limit the access to some of the setting combinations that would obviously lead to failure.
However, especially when using the HSE clock source, it is still possible to leave the operating conditions
envelope defined in the datasheets. Correct operation is then not guaranteed.
To start the test execution, enter ‘6’ in the root menu. In case of failure, the firmware activates the on-board LED.

6.3 Test configurations explained

The example firmware may be built with several different options.

Table 7. The example build options

EXTI_BUTTON
  Active: Blue button on the Nucleo board aborts most of the tests, returning into the root menu, retaining settings.
  May cause additional current consumption.
  Not active: Reset button on the Nucleo board is used to return into the root menu. Settings are however reset to
  default values.
FINITE_LOOP
  Active: Relevant computational tests are limited to LOOP_COUNT cycles. Measuring the time to complete
  the task is used to compute the execution efficiency.
  Not active: Tests run until aborted by reset, power off, debugger or EXTI (list depending on other options).
DEBUG_ON
  Active: Debug interface is active during the test. Useful to review the settings and check the functionality.
  Not active: Debug interface is in high-Z during the test. Only this code must ever be used for measurement.

The default setting, with all three define switches not active, is the configuration that allows the user to obtain
the datasheet values.
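
To make the effect of a build switch concrete, the sketch below shows one way a compile-time define such as FINITE_LOOP could select between a bounded and an endless test loop. It is not taken from the X-CUBE-REF-PM sources: the functions test_iteration and run_test, the counter, and the LOOP_COUNT value of 50000 are assumptions made for illustration only.

```c
#include <stdint.h>

#define FINITE_LOOP           /* remove this line to run until aborted */
#define LOOP_COUNT 50000u     /* hypothetical iteration count */

/* Stand-in for one pass of the benchmark code (hypothetical). */
static volatile uint32_t work_counter;
static void test_iteration(void) { work_counter++; }

/* Runs the test and returns the number of completed iterations.
 * Without FINITE_LOOP the function never returns, matching the
 * "run until aborted" behavior described in Table 7. */
uint32_t run_test(void)
{
    work_counter = 0u;
#ifdef FINITE_LOOP
    for (uint32_t i = 0u; i < LOOP_COUNT; i++) {
        test_iteration();
    }
#else
    for (;;) {
        test_iteration();
    }
#endif
    return work_counter;
}
```

With the bounded variant, the time taken by run_test can be measured and related to the number of iterations, which is the basis of the execution efficiency figures.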
The datasheet includes the power consumption measurements for several different executed codes:
Dhrystone, CoreMark, Fibonacci, and a while(1) loop. CoreMark is not included in the published example code
for licensing reasons. Instead, the example includes two additional test codes: the “Reduced code” and the
“Memory read stress test” focus directly on the memory interface settings and their influence on the
execution efficiency.
These tests focused on the flash memory interface efficiency are not present in the datasheet. The results of their
execution are analyzed in the following sections.


7 Power consumption and performance comparison using STM32L1 Series devices

To assess the performance of the MCU with different memory controller settings, several benchmark tests have
been used. All the tests have been executed on a NUCLEO-L152RE board using all the available memory
interface settings, listed in Section 5.2 STM32L1 series device options. All the tests have been executed both in
standalone and in parallel with a DMA transfer, constantly reading from the program NV memory. The DMA
channel was directed to the SPI output configured to the highest available speed (fPCLK/2) and low priority.
Three clock configurations have been used in the measurements. One with the plain 16 MHz HSI clock as the
system clock and no latency set, another with the same clock but the Flash latency configured (Flash memory
running effectively on lower clock) and the third with the PLL set to produce the 32 MHz system clock.
All the measurements are taken on a single sample of NUCLEO-L152RE board at ambient temperature. The
values provided are an arithmetic mean from several measurements.

7.1 STM32L1 Dhrystone benchmark

Although the Dhrystone benchmark is often deemed outdated, it is still somewhat representative of many
microcontroller applications.

Table 8. Dhrystone results with no background transfer

Frequency [MHz]                    16     16     16     16     16     32     32
Latency                             0      0      0      1      1      1      1
64-bit                              0      1      1      1      1      1      1
Prefetch                            0      0      1      0      1      0      1
Timing for 50000 cycles [s]      2.57   2.57   2.57   3.05   2.86   1.52   1.46
Average current [mA]             5.75   5.78   6.11   5.13   5.62  10.42  11.08
Energy [mJ]                     48.77  49.02  51.82  51.63  53.04  52.27  53.38


Figure 2. Dhrystone results with no background transfer
[Plot: current I [mA] versus time [s], one trace per configuration of Table 8]

Table 9. Dhrystone results with DMA simultaneously reading data from the Flash memory

Frequency [MHz]                    16     16     16     16     16     32     32
Latency                             0      0      0      1      1      1      1
64-bit                              0      1      1      1      1      1      1
Prefetch                            0      0      1      0      1      0      1
Timing for 50000 cycles [s]      2.72   2.68   2.68   3.28   3.09   1.64   1.55
Average current [mA]             6.17   6.25   6.58   5.50   5.99  11.24  11.68
Energy [mJ]                     55.38  55.28  58.19  59.53  61.08  60.83  59.74


Figure 3. Dhrystone results with DMA simultaneously reading data from the Flash memory
[Plot: current I [mA] versus time [s], one trace per configuration of Table 9]

Configuring a 64-bit access or a prefetch makes a very small difference at a low clock speed where the latency
can be avoided. Conversely, setting the latency may lead to a lower power consumption in situations where
the speed is not critical. At higher speeds the efficiency of the prefetch is situational: it gives the best
performance, but the gain in speed may be lower than the consumption increase.
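
The energy values in Table 8 and Table 9 are consistent with the product of the supply voltage, the average current and the execution time, using the 3.3 V supply stated in Section 5. The following C sketch reproduces that calculation; the names VCC_VOLTS, energy_mj and percent_change are introduced here for illustration and do not come from the example firmware.

```c
#include <assert.h>

#define VCC_VOLTS 3.3   /* supply voltage used for the measurements (Section 5) */

/* Energy in millijoules from the average current in mA and the duration
 * in seconds: V * mA * s = mJ. */
double energy_mj(double current_ma, double time_s)
{
    assert(current_ma >= 0.0 && time_s >= 0.0);
    return VCC_VOLTS * current_ma * time_s;
}

/* Relative change from a to b, in percent (a must be non-zero). */
double percent_change(double a, double b)
{
    assert(a != 0.0);
    return (b - a) / a * 100.0;
}
```

Applied to the two 32 MHz columns of Table 8, energy_mj(10.42, 1.52) gives about 52.27 mJ without prefetch and energy_mj(11.08, 1.46) gives about 53.38 mJ with prefetch. The run is roughly 4% shorter, but the current is roughly 6% higher, so the total energy rises slightly.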

7.2 32-bit instruction code

A stress test consists of executing 12 aligned 32-bit instructions manipulating data in registers in a loop of 500000
cycles. The code with a higher ratio of 32-bit instructions is more likely to find a bottleneck in the memory interface
than a typical Thumb code with prevalent 16-bit instructions.

Table 10. 32-bit code result with no background transfer

Frequency [MHz]                    16     16     16     16     16     32     32
Latency                             0      0      0      1      1      1      1
64-bit                              0      1      1      1      1      1      1
Prefetch                            0      0      1      0      1      0      1
Timing for 500000 cycles [s]      0.9    0.9    0.9   1.06  0.964   0.59  0.497
Average current [mA]             5.25   5.41   5.63   4.82   5.11   9.09   9.78
Energy [mJ]                     15.59  16.07  16.72  16.86  16.26  17.70  16.04


Figure 4. 32-bit code result with no background transfer
[Plot: current I [mA] versus time [s], one trace per configuration of Table 10]

Table 11. 32-bit code result with DMA simultaneously reading data from the Flash memory

Frequency [MHz]                    16     16     16     16     16     32     32
Latency                             0      0      0      1      1      1      1
64-bit                              0      1      1      1      1      1      1
Prefetch                            0      0      1      0      1      0      1
Timing for 500000 cycles [s]    0.956  0.921  0.916   1.22   1.02   0.64   0.54
Average current [mA]             5.85   5.96   6.18   5.20   5.67   9.83  10.66
Energy [mJ]                     18.46  18.11  18.68  20.94  19.09  20.76  19.00


Figure 5. 32-bit code result with DMA simultaneously reading data from the Flash memory
[Plot: current I [mA] versus time [s], one trace per configuration of Table 11]

The findings are in line with the expectations: a code with high share of 32-bit instructions benefits a lot from the
prefetch once the memory latency is in place. But with zero latency the extra bandwidth is likely to be useless.

7.3 STM32L1 memory read stress test

This stress test consists of executing 20 LDR instructions fetching data from the program NV memory to the CPU
core registers in a loop of 500000 cycles. This way, not only are the instructions fetched from the memory, but
another read access is generated during the instruction execution, again creating a choke point at the memory
interface. Fetching of the subsequent instruction is then likely to be delayed. The code simulates a case where a
heavy load of literal pools (string constants), for example predefined messages, is read from the nonvolatile
memory very often.
Note: Memory reading by LDM instructions is not used, as it does not demonstrate the limits of the memory interface,
only those of the memory itself.
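
The access pattern of this test can be approximated in portable C as shown below. This is only a sketch and not the code used in the example firmware, which is described as a sequence of 20 LDR instructions. Here a constant table stands in for data stored in the program memory, and volatile reads keep the compiler from removing the loads. The names flash_words and memory_read_stress are hypothetical, and the placement of the table in flash memory is an assumption about the target build setup.

```c
#include <stddef.h>
#include <stdint.h>

#define WORDS_PER_LOOP 20u

/* Constant table; on an STM32 target a const object like this is normally
 * placed in flash memory by the linker (assumption about the build setup). */
static const uint32_t flash_words[WORDS_PER_LOOP] = {
     1,  2,  3,  4,  5,  6,  7,  8,  9, 10,
    11, 12, 13, 14, 15, 16, 17, 18, 19, 20
};

/* Performs `loops` passes of 20 single-word reads and returns their sum.
 * The described test unrolls the 20 loads; an inner loop is used here for
 * brevity, which adds branch overhead that the original avoids. */
uint32_t memory_read_stress(uint32_t loops)
{
    const volatile uint32_t *src = flash_words;
    uint32_t sum = 0u;

    for (uint32_t i = 0u; i < loops; i++) {
        for (size_t j = 0u; j < WORDS_PER_LOOP; j++) {
            sum += src[j];   /* one data read per element */
        }
    }
    return sum;
}
```

Returning the sum gives the reads an observable result, so the loop body is not optimized away even at higher optimization levels.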

Table 12. Literal pool with no additional data read from the Flash memory

Frequency [MHz]                    16     16     16     16     16     32     32
Latency                             0      0      0      1      1      1      1
64-bit                              0      1      1      1      1      1      1
Prefetch                            0      0      1      0      1      0      1
Timing for 500000 cycles [s]     3.66   2.73   2.72   3.38   3.32   1.69   1.66
Average current [mA]             5.44   5.58   6.12   4.85   5.33   9.78  10.73
Energy [mJ]                     65.70  50.27  54.93  54.10  58.40  54.54  58.78


Figure 6. Literal pool reading with no additional data read from the Flash memory
[Plot: current I [mA] versus time [s], one trace per configuration of Table 12]

Table 13. Literal pool reading with DMA simultaneously reading the Flash memory

Frequency [MHz]                    16     16     16     16     16     32     32
Latency                             0      0      0      1      1      1      1
64-bit                              0      1      1      1      1      1      1
Prefetch                            0      0      1      0      1      0      1
Timing for 500000 cycles [s]     3.98   2.94   2.94   3.92   3.88   1.97   1.96
Average current [mA]             6.04   6.26   6.73   5.40   5.72  10.62  11.59
Energy [mJ]                     79.33  60.73  65.29  69.85  73.24  69.04  74.96


Figure 7. Literal pool reading with DMA simultaneously reading data from the Flash memory
[Plot: current I [mA] versus time [s], one trace per configuration of Table 13]

As expected, when the workload consists mostly of data reads, the effect of the prefetch is lower, but the 64-bit memory access
makes a significant difference even with zero memory latency.


8 Power consumption and performance comparison using STM32L0 series devices

The Cortex-M0+ core is much simpler compared to the Cortex-M3 used in the STM32L1 Series. The 32-bit
instruction benchmark is dropped as the Thumb-2 instruction set support in the M0+ core is very limited and an
extensive usage of 32-bit code is not realistic with a code compiled for the STM32L0 Series.
The remaining tests have been executed on a NUCLEO-L073RZ board using all the available memory interface
settings, listed in Section 5.3 STM32L0 series device options. All the tests have been executed both standalone
and in parallel with a DMA transfer constantly reading from the program NV memory. The DMA channel was
directed to the SPI output configured to the highest available speed (fPCLK/2), but low priority.
Two clock configurations have been used in the measurements. One with the plain 16 MHz HSI clock as the
system clock and no latency set, the other with the PLL set to produce the 32 MHz system clock and of course
the Flash memory latency set to 1.
All the measurements are taken on a single sample of NUCLEO-L073RZ board at ambient temperature. The
values provided are an arithmetic mean from several measurements.

8.1 STM32L0 Dhrystone benchmark

The Dhrystone code is executed and the task consists of processing 50000 cycles of the test code.

Table 14. Dhrystone with no additional data read from the flash memory

Frequency [MHz]             16      16      16      16      16      32      32      32      32      32
Latency                      0       0       0       0       0       1       1       1       1       1
Prefetch                     1       0       0       1       0       1       0       0       1       0
Preread                      1       1       0       0       0       1       1       0       0       0
Disabled buffer              0       0       1       0       0       0       0       1       0       0
Time [ms]                 3769    3766    3771    3769    3769    2139    2667    2720    2130    2667
Average current [mA]      4.32    4.42    4.54    4.40    4.39    8.14    7.52    7.52    8.04    7.43
Energy [mJ]              53.73   54.93   56.49   54.72   54.60   57.46   66.20   67.49   56.51   65.40


Figure 8. Dhrystone with no additional data read from the flash memory
[Plot: current I [mA] versus time [ms], one trace per configuration of Table 14]

Table 15. Dhrystone with DMA simultaneously reading data from the flash memory

Frequency [MHz]             16      16      16      16      16      32      32      32      32      32
Latency                      0       0       0       0       0       1       1       1       1       1
Prefetch                     1       0       0       1       0       1       0       0       1       0
Preread                      1       1       0       0       0       1       1       0       0       0
Disabled buffer              0       0       1       0       0       0       0       1       0       0
Time [ms]                 3903    3901    3906    3906    3904    2377    2853    2956    2334    2843
Average current [mA]      4.69    4.77    4.87    4.68    4.59    8.58    8.21    8.15    8.66    7.80
Energy [mJ]              69.40   61.41   62.77   60.32   59.13   67.29   77.31   79.31   66.70   73.17


Figure 9. Dhrystone with DMA simultaneously reading data from the flash memory
[Plot: current I [mA] versus time [ms], one trace per configuration of Table 15]

This example clearly shows that the internal six-word buffer improves the energy efficiency even if it is not well
utilized, as in the case of zero latency. The best option is to keep it on, but to disable the prefetch and preread.
In the configuration with the latency enabled, the prefetch is probably worth using. The preread is
not used by the DMA channel and does not represent an improvement in this particular scenario.


8.2 STM32L0 memory read stress test

This stress test consists of executing 20 LDR instructions fetching data from the program NV memory to the CPU core
registers in a loop of 500000 cycles. This way, not only are the instructions fetched from the memory, but another
read access is generated during the instruction execution, again creating a choke point at the memory interface.
Fetching of the subsequent instruction is then likely to be delayed. The code simulates a case where a heavy load of
literal pools, for example predefined messages, is read from the nonvolatile memory very often.
Note: Memory reading by LDM instructions is not used, as it does not demonstrate the limits of the memory interface,
only those of the memory itself.

Table 16. Literal pool with no additional data read from the Flash memory

Frequency [MHz]             16      16      16      16      16      32      32      32      32      32
Latency                      0       0       0       0       0       1       1       1       1       1
Prefetch                     1       0       0       1       0       1       0       0       1       0
Pre-read                     1       1       0       0       0       1       1       0       0       0
Disabled buffer              0       0       1       0       0       0       0       1       0       0
Time [ms]               2402.5  2401.5    2403    2403  2399.5    2009  2058.5    2091    1817    1819
Average current [mA]       3.4    3.42    3.36    3.14    3.19    6.03    6.05    5.94    5.83    5.73
Energy [mJ]              26.95   27.10   26.64   24.89   25.25   39.97   41.09   40.98   34.95   34.39

Figure 10. Literal pool with no additional data read from the Flash memory
[Plot: current I [mA] versus time [ms], one trace per configuration of Table 16]


Table 17. Literal pool with DMA simultaneously reading data from the Flash memory

Frequency [MHz]             16      16      16      16      16      32      32      32      32      32
Latency                      0       0       0       0       0       1       1       1       1       1
Prefetch                     1       0       0       1       0       1       0       0       1       0
Pre-read                     1       1       0       0       0       1       1       0       0       0
Disabled buffer              0       0       1       0       0       0       0       1       0       0
Time [ms]               2533.5  2533.5  4854.5    4587    4591  2292.5    2301    2420    2299  2302.5
Average current [mA]      3.86    3.86    3.38    3.32    3.29    7.42    7.39    7.34    7.25    7.18
Energy [mJ]              32.27   32.27   54.15   50.26   49.84   56.13   56.11   58.62   55.00   54.56

Figure 11. Literal pool with DMA simultaneously reading data from the Flash memory
[Plot: current I [mA] versus time [ms], one trace per configuration of Table 17]

This example finally demonstrates the advantage of the pre-read setting. It can greatly improve the efficiency
when more than one stream of data is read from the Flash memory and there is no latency. Unsurprisingly, the prefetch is not
useful when dealing mostly with data. Again, it is a good idea to keep the buffer enabled. The
only reason to disable the buffer is if the timing needs to be more deterministic, whatever the efficiency cost may
be.


9 Power consumption and performance comparison using STM32L4 Series devices

The STM32L4 Series devices are based on the Cortex-M4 core, connected to a 32-bit multilayer AHB bus matrix. The matrix connects up to six masters and eight slaves and supports concurrent operations as long as the bus masters access different bus slaves.
The tests were executed on a NUCLEO-L476RG board using all the available memory interface settings listed in Table 6. Device option summary. The results of execution with a concurrent DMA transfer are not included for the STM32L4 Series: the impact of the DMA on timing is minimal and the added current consumption is approximately the same regardless of the Flash interface configuration, so these results add no information.
One set of tests was executed only with VCORE range 1, to provide a comparison with the other series featured in this overview and to assess the impact of the prefetch and caches.
Another set of measurements was executed using different latency, frequency and voltage regulator settings, to assess the energy needed for different operations in a battery-powered application.
All the measurements were taken on a single sample of the NUCLEO-L476RG board at ambient temperature. The values provided are the arithmetic mean of several measurements.

9.1 Influence of prefetch and cache with zero Flash latency

One fact must be clarified before more measurement results are presented. Neither the prefetch nor the caches have any influence on the execution speed when the Flash memory is available with zero latency. Their impact on the power consumption, however, may be significant.
The prefetch actively tries to read the following instruction from the Flash memory, and the energy used to read it is wasted if a branch is taken. A correct instruction prefetch brings no timing advantage either, as the instruction is available from the Flash memory within one clock cycle anyway. It is therefore recommended to disable the prefetch when the latency is zero. The measured input current difference is 10% in the case of Dhrystone.
The caches, on the contrary, tend to conserve energy when they are activated. Both the instruction and the data cache are likely to replace an access to the Flash memory with an access to the cache, which needs significantly less current. The tests have shown that enabling the caches lowers the power consumption by 20%.
With both contributors combined, the STM32L476G in the worst configuration of the Flash interface runs at a significantly higher current consumption than with the optimal settings (both at 16 MHz, latency 0, VCORE range 1).
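
As a rough sketch of the rule described in this section, the helper below builds a flash access control value with both caches enabled and with the prefetch allowed only when at least one wait state is configured. The L4_ACR_* names and l4_acr_low_power are hypothetical, and the bit positions are assumed for the STM32L4 FLASH_ACR register; check them against the device reference manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed STM32L4 FLASH_ACR fields; verify against the reference manual. */
#define L4_ACR_LATENCY_MASK 0x7u        /* wait states, bits 2:0 */
#define L4_ACR_PRFTEN       (1u << 8)   /* prefetch enable */
#define L4_ACR_ICEN         (1u << 9)   /* instruction cache enable */
#define L4_ACR_DCEN         (1u << 10)  /* data cache enable */

/* Caches always on; prefetch only if requested and useful (latency > 0). */
uint32_t l4_acr_low_power(uint32_t wait_states, bool want_prefetch)
{
    uint32_t acr = (wait_states & L4_ACR_LATENCY_MASK) | L4_ACR_ICEN | L4_ACR_DCEN;
    if (want_prefetch && (wait_states > 0u)) {
        acr |= L4_ACR_PRFTEN;
    }
    return acr;
}
```

With zero wait states the prefetch request is ignored, which reflects the recommendation to disable the prefetch when the latency is zero.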

9.2 STM32L4 dhrystone benchmark

With zero latency, the Dhrystone code is executed only with the optimal settings, because the timing is the same for every setting. The task consists of processing 50000 cycles of the test loop, using the HSI or the HSI-sourced PLL as the clock source.

Table 18. Dhrystone test using core voltage range 1 and HSI clock

Frequency [MHz]        16      32      32      32      32      32      32      32      32
Latency                0       1       1       1       1       1       1       1       1
D-cache                1       0       0       0       1       1       1       1       0
I-cache                1       0       0       1       1       1       0       0       1
Prefetch               0       0       1       1       1       0       0       1       0
Time [ms]              2561    1552    1473    1313    1281    1283    1498    1430    1310
Average current [mA]   3.12    6.55    6.61    5.87    5.9     5.65    6.56    6.6     5.71
Energy [mJ]            24.80   31.51   30.19   23.89   23.42   22.48   30.45   29.25   23.19


Figure 12. Dhrystone test plot of energy needed for execution

[Plot: average current I [mA] versus execution time [ms], one point per configuration, each labeled with the energy needed in mJ.]

This example shows that the prefetch can improve the performance, most visibly when the instruction cache is disabled (1552 ms without prefetch versus 1473 ms with it, both caches off). Once the instruction cache is enabled, however, the prefetch brings no significant additional advantage for the Dhrystone test code. The prefetch complements the caches and helps in code sections with few loops, where the caches cannot help.
With the optimal configuration of the Flash interface identified, the next question is how the cache behaves at different core clock speeds. A higher clock speed requires a higher latency, forcing the core to wait for a read access to the Flash memory whenever the instruction or data is not available in the ART cache. The core still consumes energy while it waits for the memory, which reduces the overall efficiency.
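
The link between clock speed and latency can be sketched as a lookup of the minimum number of wait states. The frequency limits below are assumptions recalled for the STM32L47x devices and must be checked against the reference manual; the function name l4_flash_wait_states is hypothetical. The limits are consistent with Table 18, where 16 MHz runs with latency 0 and 32 MHz with latency 1 in range 1.

```c
#include <stdint.h>

/* Assumed upper HCLK limits in MHz for 0, 1, 2, ... wait states. */
static const uint32_t range1_limit_mhz[] = {16u, 32u, 48u, 64u, 80u};
static const uint32_t range2_limit_mhz[] = {6u, 12u, 18u, 26u};

/* Return the minimum number of wait states for the given VCORE range
 * (1 or 2), or -1 if the frequency is not supported in that range. */
int l4_flash_wait_states(int vcore_range, uint32_t hclk_mhz)
{
    const uint32_t *limits;
    int count;

    if (vcore_range == 1) {
        limits = range1_limit_mhz;
        count = 5;
    } else if (vcore_range == 2) {
        limits = range2_limit_mhz;
        count = 4;
    } else {
        return -1;
    }
    for (int ws = 0; ws < count; ws++) {
        if (hclk_mhz <= limits[ws]) {
            return ws;
        }
    }
    return -1;
}
```

Returning -1 instead of clamping keeps an unsupported combination, such as 32 MHz in range 2 under these assumed limits, from being configured silently.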


Figure 13. Energy cost of the dhrystone test loop

[Plot: energy E [mJ] versus core frequency f [MHz]. Curves: Range 2 with ART disabled, Range 2 with ART enabled, Range 1 with ART disabled, Range 1 with ART enabled.]

In Figure 13. Energy cost of the dhrystone test loop, the same test loop of 50000 Dhrystone tests is executed with different clock settings, using either the MSI directly or, for 64 MHz and 80 MHz, the PLL with the MSI as its source clock. The additional power consumption of the PLL causes a slight drop in the efficiency, visible on the chart.
Otherwise the chart shows that, at least for the Dhrystone test, which includes a lot of loops, the ART accelerator cache allows the execution efficiency of the MCU to improve as the core clock increases. This is a remarkable feature.

9.3 STM32L4 memory read stress test

The stress test consists of executing 20 LDR instructions, which fetch data from the nonvolatile program memory into CPU core registers, in a loop of 100000 cycles. This test mainly demonstrates the power of the data cache in such situations.
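
The shape of such a test loop can be approximated in C as shown below. This is only a sketch: the actual test uses 20 LDR instructions, whereas here a constant table and a volatile pointer are used so that the compiler cannot optimize the loads away. The names literals and literal_read_stress are hypothetical, and placing the table in Flash memory depends on the linker script.

```c
#include <stdint.h>

#define LITERAL_COUNT 20u

/* Constants that a typical linker script places in Flash memory (.rodata). */
static const uint32_t literals[LITERAL_COUNT] = {
    1u, 2u, 3u, 4u, 5u, 6u, 7u, 8u, 9u, 10u,
    11u, 12u, 13u, 14u, 15u, 16u, 17u, 18u, 19u, 20u
};

/* Read all constants once per cycle; the volatile pointer forces a
 * real load for every access. The sum only gives the loads a use. */
uint32_t literal_read_stress(uint32_t cycles)
{
    const volatile uint32_t *src = literals;
    uint32_t sum = 0u;

    for (uint32_t c = 0u; c < cycles; c++) {
        for (uint32_t i = 0u; i < LITERAL_COUNT; i++) {
            sum += src[i];
        }
    }
    return sum;
}
```

Calling literal_read_stress(100000u) performs the same number of data reads as the test described above, although the instruction mix differs from a hand-written LDR sequence.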

Table 19. Literal measurements

Frequency [MHz]        16      32      32      32      32      32      32      32      32
Latency                0       1       1       1       1       1       1       1       1
D-cache                1       0       0       0       1       1       1       1       0
I-cache                1       0       0       1       1       1       0       0       1
Prefetch               0       0       1       1       1       0       0       1       0
Time [ms]              570     344.5   344     340.2   284.9   284.3   288.1   288.7   340.2
Average current [mA]   3.10    6.75    6.77    6.49    6.19    6.09    6.9     6.88    6.45
Energy [mJ]            5.49    7.21    7.22    6.84    5.47    5.37    6.16    6.16    6.80


Figure 14. Literal pool chart plot of energy efficiency

[Plot: average current I [mA] versus execution time [ms], one point per configuration, each labeled with the energy needed in mJ.]

In the data literal pool loop, the data cache significantly improves the execution speed, while the instruction cache mainly affects the power consumption: in Table 19, the configurations with the instruction cache enabled draw less current than their counterparts with it disabled. What is not visible from the plot is that the efficiency improvement grows slowly over several hundred iterations before reaching a maximum.


10 Power consumption and performance measurements on STM32G0 series device

The STM32G0 shares some power saving features with the low-power series. The STM32G0B1RE, the device used in the measurements, has 512 kB of dual-bank flash memory.
ES0548 and ES0549 describe a bug that compromises the prefetch advantage of this device. When the boundary between the two banks is crossed, the prefetch may fail to present the intended instruction, resulting in a possible hard fault. There is no workaround, so it is recommended to disable the prefetch.
Architecturally, the STM32G0 has the same Cortex-M0+ CPU core as the STM32L0 series, but its nonvolatile memory arrangement is closer to that of the STM32L4, with a smaller cache.
The measurements presented in this document were performed on the Nucleo-G0B1RE board without modifications.

10.1 STM32G0 Dhrystone benchmark

All measurements are made using the 8 MHz HSE supplied by the STLINK. This arrangement is slightly less
efficient than HSI at 16 MHz because it needs a PLL to achieve this clock speed. At higher clock rates, where a
PLL is needed anyway, the consumption is slightly lower because the oscillator is externalized. In this case, it is a
fair comparison of efficiency between different clock speed configurations.
The benefit of both cache and prefetch depends on the code being executed. More loops emphasize the benefit
of cache, while branching negates the benefit of prefetch. In general, accessing flash memory consumes more
power than accessing RAM. If the cache hits, some energy is saved. If the prefetch fails, energy is wasted.
The following table shows how flash memory interface settings affect both performance and efficiency.

Table 20. Flash memory interface settings

Frequency [MHz]        16      32      32      32      32      64      64      64      64
Latency                0       1       1       1       1       2       2       2       2
Cache                  0       1       1       0       0       1       1       0       0
Prefetch               0       0       1       1       0       0       1       1       0
Time [s]               2.06    1.17    1.09    1.19    1.3     0.66    0.595   0.693   0.789
Average current [mA]   2.56    4.21    4.57    4.84    4.39    7.72    8.4     8.56    7.7
Energy [mJ]            17.67   16.89   17.05   19.73   19.57   17.38   17.08   20.23   20.56

The benchmark shows the advantage of both cache and prefetch. As latency increases, they keep the CPU busy
and efficient. But while the cache hits save energy, the prefetch costs energy even if the instruction is not used.
In some cases, such as the Dhrystone running with 1 wait state, prefetching improves performance but decreases
the overall power efficiency.
Other methods of assessing performance have been used, with results that differ in absolute terms or even in the
order of configurations in terms of efficiency. However, the overall trend is broadly the same, suggesting that both
prefetch and cache benefits increase as clock speed (and latency) increases, with cache improving more on the
efficiency side, and prefetch providing the greatest benefit at peak performance.
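
The trade-off between performance and efficiency can be quantified directly from Table 20. The helpers below are a minimal sketch with hypothetical names (percent_change, faster_but_less_efficient); they only compare two measured runs and make no assumption about the hardware.

```c
/* Relative change of `value` against `baseline`, in percent.
 * A negative result means lower (a shorter run, or less energy). */
double percent_change(double baseline, double value)
{
    return (value - baseline) / baseline * 100.0;
}

/* Return 1 when an option shortens the run but costs more energy. */
int faster_but_less_efficient(double t_base, double e_base,
                              double t_opt, double e_opt)
{
    return (t_opt < t_base) && (e_opt > e_base);
}
```

With the 32 MHz, cache-enabled columns of Table 20 (1.17 s and 16.89 mJ without prefetch, 1.09 s and 17.05 mJ with it), the run is about 6.8% shorter but needs about 0.9% more energy. At 64 MHz with the cache enabled, the prefetch improves both figures.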


11 General observations on power consumption optimization

The general rule to minimize the power consumption is to perform the task in the shortest possible time, at the lowest possible operating frequency, and with the clock enabled to as small a part of the silicon as possible.
In other words, the goal is to optimize for execution speed first and then find an optimal balance between the execution time and the clock frequency. The speed optimization is mostly a matter of compiler choice. Where possible, build the reference projects with different development tools and observe the difference in power consumption and execution speed.
Even the best compiler can benefit from some tricks applicable to most C source code:
1. Where possible, use variables whose size corresponds to the CPU register size (32 bits).
2. Use macros instead of simple functions to save on function call overhead.
3. Learn to use keywords like static, restrict, register, inline.
4. Most compilers can be guided using various “#pragma” statements for more optimized results. Check which pragmas are available in your development toolchain.
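
Some of the tricks listed above can be illustrated with a short, hypothetical example (scale_q8 and scale_buffer are not part of the firmware package). It uses 32-bit variables throughout, a static inline function as a type-checked alternative to a macro, and restrict to tell the compiler that the two buffers do not overlap. Whether these hints change the generated code depends on the toolchain and the optimization level.

```c
#include <stdint.h>

/* static inline: the compiler can remove the call overhead while
 * keeping the type checking that a macro would not provide. */
static inline uint32_t scale_q8(uint32_t sample, uint32_t gain_q8)
{
    return (sample * gain_q8) >> 8;  /* gain in Q8 format: 256 means 1.0 */
}

/* restrict: dst and src are promised not to overlap, which lets the
 * compiler schedule loads and stores more freely. The 32-bit counter
 * matches the register size of the Cortex-M cores. */
void scale_buffer(uint32_t *restrict dst, const uint32_t *restrict src,
                  uint32_t count, uint32_t gain_q8)
{
    for (uint32_t i = 0u; i < count; i++) {
        dst[i] = scale_q8(src[i], gain_q8);
    }
}
```

Measuring the execution time and current of such a routine with and without each hint is the only reliable way to confirm a benefit on a given toolchain.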
The memory placement also influences the power consumption. Some microcontrollers embed more than one type of volatile memory, and some types may need a little more energy than others.


12 Conclusion

Each low-power STM32 microcontroller series requires a slightly different approach to optimize the energy
efficiency.
Putting the product in low-power mode during the idle period is best practice, but the wake up time must always
be considered. The peripherals left active in low-power mode to trigger the wake up have an impact on the power
consumption. This is detailed in the datasheet and can be checked using the firmware examples.
Another set of optimization challenges relates to the Run mode and the code execution.
The measured results provide guidance on whether or not to enable the different memory interface settings. Features like the prefetch, which improve the benchmark result, also lead to a higher power consumption, and the overall efficiency depends on the task processed by the microcontroller.
When the flash memory needs no wait states, there is no significant benefit in tweaking the settings. Doing so makes sense only if the flash memory contains frequently used literal pools (predefined data constants) or if the cache access leads to lower energy consumption.
With the flash memory latency in place, the flash interface must be set up carefully, as the performance difference
between the optimal and default configuration may be significant. It is definitely possible to activate some flash
interface settings only temporarily for particular operations and disable them afterwards.
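
Activating a setting only for the duration of one operation can be sketched as a save, modify and restore sequence. The code below is an illustration under stated assumptions: run_with_flash_bits, probe_task and struct probe are hypothetical names, the register is passed as a pointer so that the sketch also runs on a host with an ordinary variable, and the pre-read bit position is assumed for the STM32L0 and should be verified in RM0367.

```c
#include <stdint.h>

#define ACR_PRE_READ (1u << 6)  /* assumed STM32L0 pre-read bit */

typedef void (*task_fn)(void *context);

/* Set the given bits in *reg, run the task, then restore the value the
 * register had before. Changes the task makes to *reg are discarded. */
void run_with_flash_bits(volatile uint32_t *reg, uint32_t bits,
                         task_fn task, void *context)
{
    const uint32_t saved = *reg;

    *reg = saved | bits;
    task(context);
    *reg = saved;
}

/* Example task: records the register value it observes while running. */
struct probe {
    volatile uint32_t *reg;
    uint32_t seen;
};

void probe_task(void *context)
{
    struct probe *p = context;
    p->seen = *p->reg;
}
```

On a target, the pointer would refer to the flash access control register and the task would be the data-intensive operation, for example a table lookup pass over constants stored in the flash memory.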
It is demonstrated that the erratum present on the dual-bank STM32G0 devices does impact the top performance, but affects the efficiency less.


Revision history

Table 21. Document revision history

Date          Revision   Changes
19-Jan-2016   1          Initial release.
24-Oct-2016   2          Updated cover adding STM32L4 Series.
                         Updated Section 3 System architecture adding STM32L4 memory interface description.
                         Added Section 5.4 STM32L4 Series device options.
                         Added Power consumption and performance comparison using STM32L4 Series devices.
                         Updated Section 12 Conclusion.
21-Aug-2019   3          Added Section 1 General information.
                         Added Section 4 Low-power modes.
                         Added Section 6 Reproducing the measurements to get datasheet values.
                         Updated Section 11 General observations on power consumption optimization.
28-Jun-2023   4          Added new sections:
                         • Section 5.1 STM32G0 device options
                         • Section 10 Power consumption and performance measurements on STM32G0 series device


Contents
1 General information
2 Definitions
3 System architecture
4 Low-power modes
5 Operation modes
5.1 STM32G0 device options
5.2 STM32L1 Series device options
5.3 STM32L0 Series device options
5.4 STM32L4 Series device options
5.5 Execution from a volatile memory
6 Reproducing the measurements to get datasheet values
6.1 Hardware and prerequisites
6.2 Example operation
6.3 Test configurations explained
7 Power consumption and performance comparison using STM32L1 Series devices
7.1 STM32L1 dhrystone benchmark
7.2 32-bit instruction code
7.3 STM32L1 memory read stress test
8 Power consumption and performance comparison using STM32L0 Series devices
8.1 STM32L0 dhrystone benchmark
8.2 STM32L0 memory read stress test
9 Power consumption and performance comparison using STM32L4 Series devices
9.1 Influence of prefetch and cache with zero Flash latency
9.2 STM32L4 dhrystone benchmark
9.3 STM32L4 memory read stress test
10 Power consumption and performance measurements on STM32G0 series device
10.1 STM32G0 Dhrystone benchmark
11 General observations on power consumption optimization
12 Conclusion
Revision history
Contents
List of tables
List of figures


List of tables
Table 1. List of acronyms
Table 2. Low-power mode brief comparison
Table 3. The options in voltage regulator range 1
Table 4. Configurations available on STM32L1 series devices with regulator range 1
Table 5. Configurations available on STM32L0 series devices with regulator range 1
Table 6. Device option summary
Table 7. The example build options
Table 8. Dhrystone results with no background transfer
Table 9. Dhrystone results with DMA simultaneously reading data from the Flash memory
Table 10. 32-bit code result with no background transfer
Table 11. 32-bit code result with DMA simultaneously reading data from the Flash memory
Table 12. Literal pool with no additional data read from the Flash memory
Table 13. Literal pool reading with DMA simultaneously reading the Flash memory
Table 14. Dhrystone with no additional data read from the flash memory
Table 15. Dhrystone with DMA simultaneously reading data from the flash memory
Table 16. Literal pool with no additional data read from the Flash memory
Table 17. Literal pool with DMA simultaneously reading data from the Flash memory
Table 18. Dhrystone test using core voltage range 1 and HSI clock
Table 19. Literal measurements
Table 20. Flash memory interface settings
Table 21. Document revision history


List of figures
Figure 1. Terminal screen
Figure 2. Dhrystone results with no background transfer
Figure 3. Dhrystone results with DMA simultaneously reading data from the Flash memory
Figure 4. 32-bit code result with no background transfer
Figure 5. 32-bit code result with DMA simultaneously reading data from the Flash memory
Figure 6. Literal pool reading with no additional data read from the Flash memory
Figure 7. Literal pool reading with DMA simultaneously reading data from the Flash memory
Figure 8. Dhrystone with no additional data read from the flash memory
Figure 9. Dhrystone with DMA simultaneously reading data from the flash memory
Figure 10. Literal pool with no additional data read from the Flash memory
Figure 11. Literal pool with DMA simultaneously reading data from the Flash memory
Figure 12. Dhrystone test plot of energy needed for execution
Figure 13. Energy cost of the dhrystone test loop
Figure 14. Literal pool chart plot of energy efficiency


IMPORTANT NOTICE – READ CAREFULLY

STMicroelectronics NV and its subsidiaries (“ST”) reserve the right to make changes, corrections, enhancements, modifications, and improvements to ST
products and/or to this document at any time without notice. Purchasers should obtain the latest relevant information on ST products before placing orders. ST
products are sold pursuant to ST’s terms and conditions of sale in place at the time of order acknowledgment.
Purchasers are solely responsible for the choice, selection, and use of ST products and ST assumes no liability for application assistance or the design of
purchasers’ products.
No license, express or implied, to any intellectual property right is granted by ST herein.
Resale of ST products with provisions different from the information set forth herein shall void any warranty granted by ST for such product.
ST and the ST logo are trademarks of ST. For additional information about ST trademarks, refer to www.st.com/trademarks. All other product or service names
are the property of their respective owners.
Information in this document supersedes and replaces information previously supplied in any prior versions of this document.
© 2023 STMicroelectronics – All rights reserved

88 pages
Arduino - PinMapping2560
100% (1)
Arduino - PinMapping2560
6 pages
Intel Overclocking Guide
No ratings yet
Intel Overclocking Guide
36 pages
Microcontrollers 8051 MSP430 Notes For IV Sem ECE Students
100% (2)
Microcontrollers 8051 MSP430 Notes For IV Sem ECE Students
61 pages
Download Complete Embedded System Design : Embedded Systems, Foundations of Cyber-Physical Systems, and the Internet of Things Marwedel PDF for All Chapters
100% (1)
Download Complete Embedded System Design : Embedded Systems, Foundations of Cyber-Physical Systems, and the Internet of Things Marwedel PDF for All Chapters
43 pages
VOS Startup and Shutdown A System (r282-03)
No ratings yet
VOS Startup and Shutdown A System (r282-03)
198 pages
Information Technology For Business
No ratings yet
Information Technology For Business
43 pages
Intel® Driver & Support Assistant - Detailed Report
No ratings yet
Intel® Driver & Support Assistant - Detailed Report
5 pages
PIC16 (L) F722A/723A: 28-Pin Flash Microcontrollers With XLP Technology
No ratings yet
PIC16 (L) F722A/723A: 28-Pin Flash Microcontrollers With XLP Technology
266 pages
Netdata Debian Linux
No ratings yet
Netdata Debian Linux
52 pages
A 8 Bit Alu
100% (2)
A 8 Bit Alu
2 pages
Sharjeel Zaidi Microprocessor
No ratings yet
Sharjeel Zaidi Microprocessor
23 pages
Verdeyen Laser Electronics Solutions
No ratings yet
Verdeyen Laser Electronics Solutions
27 pages
Computer Hardware & Software Notes-3
No ratings yet
Computer Hardware & Software Notes-3
47 pages
Embedded Systems Lab Lab 5 Introduction To Microcontrollers (PIC16F84A)
100% (1)
Embedded Systems Lab Lab 5 Introduction To Microcontrollers (PIC16F84A)
3 pages
Computer Processors
No ratings yet
Computer Processors
10 pages
Uni_of_Nottingham_2025 Data Scientist Programme Guide (3)
No ratings yet
Uni_of_Nottingham_2025 Data Scientist Programme Guide (3)
15 pages
Advanced Computer Architecture 1 1
No ratings yet
Advanced Computer Architecture 1 1
118 pages
D.L. QUESTIONS
No ratings yet
D.L. QUESTIONS
6 pages
Pentium 4 Cache Presentation
No ratings yet
Pentium 4 Cache Presentation
20 pages
8085 Pin Diagram
No ratings yet
8085 Pin Diagram
48 pages