01 Shore Arm
Chris Shore
Training Manager
ARM Ltd
Cambridge, UK
[email protected]
Abstract—The design of real-time embedded systems involves a constant trade-off between meeting real-time design goals and operating within power and energy budgets. This paper explores how the architectural features of ARM® Cortex®-M microcontrollers can be used to maximize power efficiency while ensuring that real-time goals are met.

Keywords—interrupts; microcontroller; real-time; power efficiency.

I. INTRODUCTION

A “real-time” system is one which has to react to external events within certain time constraints. Wikipedia says:

“A system is real-time if the total correctness of an operation depends not only on its logical correctness but also upon the time in which it is performed.”

We can talk about varying degrees of real-time behavior but the message is still the same – if the system does not meet its real-time design goals, it is not a functionally correct system.

So, if an event occurs – for example, the user pressing a key, the power system registering a voltage drop, or the arrival of a message from some other system, all of which will have different reaction times associated with them – the system must do something. Most importantly, it must do that something within a certain time. If it does not, it has failed as a system and is not fit for purpose.

Designing systems to meet real-time constraints is hard; fixing them when those constraints are not met is also hard. It is a complex job of prioritizing the events so that those with hard and short deadlines are handled first, with great urgency, and those with softer, longer deadlines are handled later. In the end, if the system does not have enough processing power to accomplish all its goals, it will need completely redesigning and it may be impossible to fix it without changing to more powerful hardware.

So, assuming that we have a system which is powerful enough to meet all its real-time design constraints, we have a secondary problem: making the system as efficient as possible. If we have sufficient processing power, it is relatively trivial to conceive and implement a design which meets all time-related goals, but such a system might be extremely inefficient when it comes to energy consumption. The requirement to minimize energy consumption introduces another set of goals and constraints. It requires us to make use of sleep modes and low power modes in the hardware, to use the most efficient methods for switching from one event to another, and to maximize “idle time” in the design to make opportunities to use low power modes.

The battle is between meeting the time constraints while using the least amount of energy in doing so. This is a constant battle and one which cannot be forgotten. The simplest and most straightforward design is unlikely to be the most energy-efficient one. Likewise, the most energy-efficient design will most likely not meet all of its real-time goals.

II. A SIMPLE REAL-TIME SYSTEM

So, let us imagine you are building a simple system. How do you organize your software architecture so that all these competing inputs are dealt with? Figure 1 shows an example set of inputs which you might have in a typical system.
www.embedded-world.eu
Then we have an ADC, converting some external voltage into something we can measure and record. How quickly this needs servicing depends on how often the measurements are made. Perhaps the measurements are made every 100ms, or perhaps only once every minute. The real-time constraint is very different in each case.

There are other inputs too, each with different characteristics and different real-time requirements. The first task is to establish the real-time requirements for each input separately.

A. Superloop

The easiest thing to do with the software is simply to write a single loop which cycles around all of the input sources in turn and checks whether they require action. This is very easy to write in software but it is also very simplistic and potentially inefficient. It services all of the input events with the same priority, in the same order, and to some extent the speed at which the loop must cycle round is dictated by the shortest, the tightest, of the real-time deadlines in the system. So, if the clock needs servicing once every millisecond, the loop will check all the other inputs once every millisecond too - whether they need some action or not. Of course, most of the time, most of them will not need any action, so checking them is wasted effort. In a real-time system, that is wasted time and in an embedded, battery-powered system, that is wasted energy.

main()
{
    Init();
    while (1)
    {
        if (StuffToDo)
        {
            DoStuff();
        }
    }
}

So, all in all, this is not a very good solution but it is a very easy one. The principal method we have available to improve on it is interrupts.

B. Interrupts

Here is a system which uses only interrupts to drive everything. We assume that each input event is associated with an interrupt input to the processor which will fire when action is required. Each input event is then serviced when its interrupt fires and, most importantly, ONLY when its interrupt fires. There is no need to waste time and energy checking each input all the time because we know that we will receive an interrupt when something needs doing. So we just wait for the interrupts to roll in and there is nothing else to do.

Our main program is simply an empty infinite loop which sleeps all the time waiting for interrupts to happen. All of the code is in the interrupt handlers.

main()
{
    Init();
    while (1)
    {
        Sleep();
    }
}

interrupt()
{
    DoStuff();
}

You may have heard that it is good practice to keep interrupt service routines short. This is a potential disadvantage of processing everything in interrupt handlers. If one particular event requires a lot of time-consuming processing, then this can hold up the other events and that may break some of the real-time goals in your system design. So, one option is to divide the processing between the interrupt handlers, which we often call the “background”, and the main application, which we might call the “foreground”. Then we can divide the processing for each event into two pieces: the part which needs to be carried out urgently as soon as the interrupt is triggered; and a second part which can wait until there is nothing else pending and which can be executed in our foreground main loop.

Here is how we might do this¹.

main()
{
    Init();
    while (1)
    {
        Sleep();
        if (StuffPending)
        {
            DoForegroundStuff();
        }
    }
}

interrupt()
{
    DoStuff();
    PendForegroundStuff();
}

¹ It is worth noting that, if you are using an RTOS, you can build any of these architectures using standard RTOS features. The main loop simply becomes a task, or a set of tasks, and interrupt handlers are installed as normal.

C. Polling or Interrupts?

In general, an interrupt-based solution will exhibit better real-time performance than a polling solution but the comparison becomes less clear when considering energy-efficiency.

This is because interrupts have an overhead associated with them – servicing an interrupt involves some overhead to do with saving system context, putting stuff on the stack and so on. If your main loop is able to make very effective use of the system’s sleep capability, then perhaps a polled solution might be more energy efficient than an interrupt-based solution.

Cortex-M microcontrollers incorporate a specific set of hardware optimizations which make it possible to implement very efficient interrupt-based systems. When these are taken into consideration, an interrupt-based solution will almost always be by far the more efficient.

Before I introduce you to these features, first of all let us revise how interrupts work.

III. INTERRUPTS-101

When an interrupt occurs, the system saves the current context (a subset of the main register bank plus some status information) and jumps to a dedicated piece of software called an interrupt handler. This software handles the event and carries out whatever processing is necessary. When this is complete, the system restores the context and returns to the interrupted application. In the case of our simplest system, that would simply return to the main loop where it would immediately go to sleep if there were no other interrupts needing action.²

² In some systems, the saving and restoring of context could be very simple, just switching a few registers; in others, it might involve putting a large number of registers on the stack. It could be quite time-consuming.

Recall also the concept of interrupt latency. This is defined as the length of time it takes from the moment the interrupt event occurs to the execution of the first instruction of the interrupt handler. Obviously we want it to be as short as possible but it depends on many things, including how long it takes to save context, how many other events are waiting to be processed at the time, whether the current routine can be interrupted or whether the new interrupt must wait until it has completed, whether the system is currently in some low power state and how long it will take to wake up, and so on. We will look at ways of minimizing and managing this latency.

A. Interrupt Timing

Figure 2 is a simple diagram showing the timing around the execution of a single interrupt. We assume that interrupts are enabled and the current task can be interrupted immediately, so neither of those makes any contribution to latency. The interrupt is detected and actioned as soon as it occurs.

You can see that there is something which we call “entry latency” - the time it takes the system to interrupt what it is doing, save context and begin executing the interrupt handler. On a Cortex-M microcontroller, this is handled completely in hardware and on a Cortex-M3 it takes 12 cycles.

Then, at the end, you can see that there is some time to wrap up and finish off the interrupt. We call that “exit latency”. On a Cortex-M microcontroller this is also handled completely by the hardware and on a Cortex-M3 it takes 10 cycles.

Clearly entry latency determines how quickly we can action any particular input event. Exit latency has an effect on that too. If you have more than one interrupt, then the exit latency is part of what determines how quickly you can service the next event.

Figure 2 - Interrupt Latency

Figure 3 shows two interrupts. Assuming no pre-emption, something we will get on to shortly, the second must wait for the first to complete, so the total latency for the second event
actually includes the exit latency from the first. Actually, it is worse than that. If the second event cannot pre-empt or interrupt the first before it has finished, then the potential maximum latency for the second event is made up of three things:

1) the exit latency from the first interrupt (10 cycles on a Cortex-M3);
2) the entry latency for the second interrupt (12 cycles on a Cortex-M3);
3) the entire processing time of the interrupt handler for the first event.

As more interrupts queue up behind one another, the latencies get longer and longer. You can see that the latency for the fourth interrupt here is the sum of the total processing time of all the others. This is unlikely to be acceptable in the majority of real-time systems and a mechanism is required for getting round this.

Figure 6 - Pre-emption
IV. DEFINING A PRIORITY SCHEME

A. Simple Priority

In a Cortex-M microcontroller, each interrupt can be assigned a priority, configured by registers in the interrupt controller. The architecture allows up to 8 bits of priority, so you could have 256 priority levels as shown in Figure 10. But you only have that many levels if the chip designer has actually implemented all of the bits.³

³ This is one of the elements of the processor which the chip designer can configure in order to reduce the gate count of the core. Implementing fewer bits of priority reduces the complexity of the interrupt controller quite considerably and reduces the overall size and power consumption of the processor. So, a chip designer will usually implement fewer than 8 bits.

Figure 10 - A simple priority scheme

Figure 11 shows an example in which only 3 bits are implemented, giving 8 priority levels. The interrupt controller architecture is defined so that a lower priority value actually gives an interrupt a higher relative priority - an interrupt configured with priority level 3 is a higher priority interrupt than an interrupt with priority level 8, and so on.⁴

⁴ Note that any unimplemented bits are always omitted from the bottom end of the priority value and they simply read as zero. This is useful to know when you are programming a device which has fewer than 8 bits in the priority registers. In Figure 10, priorities start from zero and increase in steps of one; in Figure 11, priority starts from 0 and increases in steps of 0x20, or 32 in decimal. So the priorities are 0, 32, 64, 96 and so on.

Figure 11 - Restricted priority levels

B. Sub-Priority

The priority register can be split, under software control, into two fields, called “pre-empt priority” and “sub-priority”. Figure 12 shows an example where all 8 priority bits are implemented and the register is split into a 3-bit pre-empt priority field and a 5-bit sub-priority field. That allows 8 levels of priority which control pre-emption and then 32 levels of sub-priority.

Figure 12 - Pre-emption priority and sub-priority

Figure 13 shows another example where not all the bits are implemented – in this case we have 6 bits of priority which are split into a 2-bit pre-empt priority and a 4-bit sub-priority.

Figure 13 - Reduced number of priority levels

The key thing to realize here is that only the pre-empt priority affects pre-emption, so we can group interrupts together in such a way that only certain interrupts can actually interrupt others. This gives us flexibility to manage latency while keeping control of the amount of pre-emption which goes on. And since pre-emption, as we have seen, has an energy cost, it is important to eliminate unnecessary pre-emption.

Next we will look at an example of using these two priority fields to configure interrupts into groups.

C. Interrupt Groups

In Figure 14 we have grouped our interrupts into four sets. The top two are timers and are in groups on their own - the top one is assumed to be the highest priority interrupt in the system. It will pre-empt anything else and will have the lowest and most consistent latency.

The next one down is another timer but we assume it is not quite so important so it goes into the next level down. It will pre-empt everything except the main OS timer.

Then we group together two interrupts associated with the UART, one for receive and one for transmit. We make reception more important than transmission as we do not want to risk losing data.⁵

⁵ If the two events occur simultaneously, it would be important to read the incoming data out (before it is potentially over-written by further incoming data) before writing outgoing data.

All the rest of the interrupts are grouped at the lowest level in the system. They cannot pre-empt each other, nor can they pre-empt any of the higher priority events. Their latency will be the longest in the system but these events are such that this does not matter. We do not need to expend energy getting to them more quickly.

But notice that we can actually give them a priority relative to each other. This is a priority which only applies within this group but it is important. This sub-priority determines the order in which they will be handled if more than one of them occurs simultaneously. So, if we finish processing all the higher priority events and there are a number of these lower ones pending, then they will be serviced in the order determined by the sub-priority. This mechanism is very powerful and allows us to manage the relative latency of all events at the same priority. In this case, since we want to respond promptly to the user pressing a key, we specify that the Keypad interrupt should be handled first, then the ADC, then others.
Figure 14 - An example priority scheme

What happens when you go to sleep depends on the processor you are using. Cortex-M microcontrollers have a pretty standard set of power saving modes which are illustrated in Figure 15.

You can see that they range from Power Off on the left, to Active on the right. There are various steps in between and it is important to understand that exactly what they do is determined by the designer of the particular chip you are using. The difference between Sleep mode and Deep Sleep mode, for instance, depends on the configuration of power domains within the chip and which particular components the chip designer decided should be powered down at this transition.⁶
The system simply wakes up automatically when the next interrupt happens.

The ARM C compiler has an intrinsic function, __WFI(), which can be used directly from C to avoid having to write any assembler.

main()
{
    Init();
    while (1)
    {
        __WFI();
    }
}

interrupt()
{
    DoStuff();
}

Most RTOSs will provide an API for the sleep function. If one is available, it should be used in preference to calling WFI directly, as the RTOS may need to carry out some internal configuration actions before activating a low power state.

VI. EVERYTHING IN INTERRUPTS

A. An Interrupt-Only System

In a system which does everything in interrupts, you might do something like this.

main()
{
    Init();
    Set SLEEPONEXIT;
    while (1)
    {
        __WFI();
        // will never get here
    }
}

interrupt()
{
    DoStuff();
}

This system will never actually execute its main loop - it just sleeps. Everything is handled in interrupts and the system automatically enters a low power state when there are no more interrupts pending. You would observe a power consumption profile like the one shown in Figure 16.

Figure 16 - Example power consumption profile

Once the system is initialized, it is active only when interrupts are being processed and it sleeps automatically whenever there are no interrupts pending. Setting up a system like this is really easy to do and results in a very energy-efficient design.

Because the SLEEPONEXIT bit is set, the system sleeps automatically at the end of the last interrupt. Since the event which will wake it up will be another interrupt, it is able to leave the context on the stack. When the next interrupt is triggered and the system wakes up, it does not need to save context and can jump immediately to the interrupt handler. This takes only a few cycles and avoids all the time and energy cost of saving and restoring context.

As we have seen, these features are very easy to use from a software point of view.

But it gets better!

B. Managing Latency

Recall from Figure 6 how to maintain the low latency requirements of a particular interrupt by giving it a higher priority. The penalty we pay for that is that the time to complete the low priority interrupt is extended - in this case, by the entire time it takes to process the high priority interrupt.

That may be a problem and to fix it we need to split the processing of the high priority interrupt into two parts: a short part which needs to be carried out urgently, followed by a longer part which can wait a little.

One method of doing this is to set a flag in the interrupt handler and then drop back into the main, or foreground, loop when interrupts have completed. This would be done by unsetting the SLEEPONEXIT bit so that the system would not sleep when the last interrupt handler exits. Instead, the system would fall back into the main loop and carry out any processing necessary before setting SLEEPONEXIT and going to sleep again to wait for the next event.

The disadvantage of this method is that it requires continual management of the SLEEPONEXIT bit. It also requires frequent entry to the foreground task and we have remarked that there is a penalty for doing that – the system must restore context again (taking all the registers off the stack) and then save context again (putting all the registers back on the stack) when returning to the interrupt context. If the foreground processing lasts some while, then we would do this potentially many times, and the potential energy cost is actually quite high.

Cortex-M microcontrollers provide a feature which allows us to do this while remaining in the interrupt context.

C. The PendSV Exception

The PendSV exception is often used by operating systems to handle context switches but we can use it here to defer the non-urgent part of a low latency interrupt’s processing without having to drop into the foreground.

PendSV is an architecturally-defined interrupt and is typically configured with the lowest priority in the system. So, in Figure 17, it is added to the list we had before with a pre-empt priority of 15, the lowest available in a system with only 4 bits of priority. Because it has the lowest priority in the system, it never pre-empts anything but gets handled when all other interrupts have been processed and no others are pending. We can trigger it when we need to under software control.

No context switch is required, other than a tail-chain from the final interrupt into the PendSV handler. When the PendSV handler completes, the system automatically goes back to sleep.
Figure 17 - Priority scheme including PendSV

Here are some details about PendSV.

1) It is an internal exception - that means it is built in to the system.
2) It is imprecise – that means that it does not occur when you trigger it but happens at some later time.
3) It can be triggered under software control by setting a bit in the Interrupt Control and State Register.
4) It can be triggered by any interrupt handler.

And because we will configure it with the lowest priority in the system, it gets taken only when all other interrupts have completed.

The effect can be seen in Figure 18.

Figure 18 - Using PendSV

- IRQ1 wakes the system up and starts to execute ISR1.
- ISR2 still pre-empts ISR1 so the low latency requirement of IRQ2 is satisfied, but ISR2 only does what it needs to do urgently, sets PendSV and then exits, allowing ISR1 to finish earlier.
- Once ISR1 has finished, assuming nothing else is pending, the processor automatically tail-chains straight into the PendSV handler. It then completes the processing of IRQ2.

VII. CONCLUSIONS

For the vast majority of systems, a properly implemented interrupt-based solution is more efficient and exhibits better real-time characteristics than a polling solution - certainly on a Cortex-M microcontroller which has all these interrupt system optimizations.

Polling creates potential problems with managing latency and power; interrupts, except in the simplest systems, always have some small overhead in saving and restoring context. But this can be managed and minimized in a way which polling does not allow.

The features of the Cortex-M microcontroller architecture which make this possible are:

1) a hardware-supported interrupt entry and exit sequence;
2) tail-chaining support in hardware which minimizes the time and energy cost of switching from one interrupt to the next;
3) a configurable priority scheme which separates pre-emption priority and sub-priority;
4) the ability to enter a low power state automatically when there are no interrupts pending;
5) low power states which can be entered and exited with almost no penalty at all;
6) the PendSV mechanism which allows us to split an interrupt handler into an urgent part and a part which can safely be deferred.

How best to use these features depends, as usual, on each individual system, its real-time requirements and its energy constraints.

- For those interrupts for which latency is most important, prioritize them carefully to allow pre-emption where necessary.
- For those which take a long time to process, consider making use of the PendSV feature.
- To minimize energy use, maximize tail-chaining - in order to do that, group as many interrupts as possible at the same pre-empt priority and use sub-priority to control the order in which they are processed.

As ever, there is a spectrum of possibilities. It is always worth doing some modelling and, if possible, some measurements on a real prototype system to determine which approach is best for your system.