30 Pitfalls in Rtos
DAVID B. STEWART
30 Pitfalls for Real-Time Software Developers, Part 1
The path to successful real-time software development is strewn with pitfalls along the way that can trap the
unwary programmer. This month and next the author guides you past 30 of them.
Novices and experts alike, whether in a university or corporation, repeat the same mistakes over and over again when developing real-time software. I have observed this while reviewing and grading code in academic projects, and as a consultant involved in numerous design and code reviews for industry.

Most real-time software developers are not even aware that their favorite methods can be problematic. Quite often, experts are self-taught; hence they tend to have the same bad habits as when they first began, usually because they never witnessed better ways of programming their embedded systems. These experts then train novices, who subsequently acquire the same bad habits. The purpose of this article is to improve your awareness of common problems, and to provide a start towards eliminating mistakes and thus creating software that is both more reliable and easier to maintain.

This list first began as the 10 most common pitfalls, but there were just so many common mistakes and problems that the list grew. It expanded through 15 and 25, to its present number. This month, I’ll present problems 30 through 16; the rest I’ll lay out for you next month.

For each problem, I present the misconception or source of the problem. Then I offer possible solutions or alternatives that can help minimize or eliminate the mistakes. If you’re not familiar with the details or terminology of the alternate solutions, then a quick library or Web search should yield additional literature on the topic. While there is usually agreement about most items being mistakes, some of the mistakes listed and the corresponding proposed solutions may be controversial. In such cases, simply
highlighting that there is a disagreement as to what is the best way to alleviate these problems encourages designers to compare their methods to other approaches, and to reconsider if their methods are provably better.

Correcting just one of these mistakes within a project can lead to weeks or months of savings in manpower (especially during the maintenance phase of a software life cycle) or can result in a significant increase in quality and robustness of the application. If multiple mistakes are common and they are all fixed, potential company savings or additional profits can be in the thousands or millions of dollars. Thus I encourage you to review your current methods and policies, compare them to each of the reported mistakes and the proposed alternatives, and decide for yourself if potential savings exist for your company or project. Even if there are no direct savings, consider the potential for improved quality and robustness at no extra cost by modifying some of your current practices.

Here now are the first 15 of the 30 most common mistakes; problems that are higher on the list (where #30 is lowest and #1 is highest on the list) are either more common and/or have the most impact on quality, development time, and software maintenance. Naturally, the order represents my opinion. It’s not so important that one mistake is listed higher on the list than another. What is important is that both are listed, thus both may be significant in your specific environment.

#30 “My problem is different.”
Many designers and programmers refuse to listen to the experiences of others, claiming that their applications are different, and of course, much more complicated. Designers should be more open-minded about the similarities in their work. Even what seems like the most different applications are probably nearly identical when you consider the nuts and bolts of the real-time infrastructure.

For example, communications engineers will claim their applications have no similarities to systems designed by control engineers because of the high volume of data and the need for special processors such as digital signal processors (DSPs). In response, ask “What is different in the LCD display software in a cellular phone vs. one in a temperature controller? Are they really different?”

Comparing control and communication systems side-by-side, both are characterized by modules that have inputs and outputs, with a function that maps the input to the output. A 256 x 256 image processed by a DSP algorithm might not be that different from graphical code for an LCD dot matrix display of size 320 x 200. Furthermore, both use hardware with limited memory and processing power relative to the size of the application; both require development of software on a platform distinct from the target, and many of the issues in developing software for a DSP also apply to developing software for a microcontroller.

The timing and volume of data are different. But if the system is designed correctly, these are just variables in equations. Methods to analyze resources such as memory and processing time are the same: both may require similar real-time scheduling, and both may also have high-speed interrupt handlers that can cause priority inversion.

Perhaps if control systems and communication systems are similar, so are two different control applications or two different communication systems. Every application is unique, but more often than not the procedure to specify, design, and build the software is the same. Embedded software designers should learn as much as possible from the experiences of others and not shrug off experience just because it was acquired in a different application area.

#29 Tools choice driven by marketing hype, not by evaluation of technical needs
Software tools for embedded systems are often purchased based on the flashiness of the marketing, because a lot of other people are using them, or because of a feature that sounds appealing but really does not make a difference.

Flashiness. Just because one tool has a prettier graphical user interface than another does not make it better. It’s important to consider the technical capabilities of each, relative to the needs of the application being built.

Number of users. Buying software from a vendor just because it’s the biggest does not mean it’s the best. Along with pitches that more people are using the software are probably hidden true stories that more people are paying for more than they really need, or that more people have unused versions of the tools sitting on the shelf after discovering the tools were not suited to their needs.

Promises of compatibility. Managers are especially influenced by a product because of promises of compatibility. So what if software is 100% POSIX-compliant? What is its relevance? Is there a plan to change the operating system? Suppose there is a change to another POSIX-compliant operating system: what is there to gain? Absolutely nothing, unless “extensions” are used. But if such extensions are used, compatibility is lost, hence
the benefits are no longer there. Standards such as POSIX have not been proven to even be good for real-time systems, let alone the best. Therefore, don’t assume that the product is better because of that promise. Portability and reusability can only be achieved if all the designers follow proven software engineering strategies for developing component-based software [1, 2].

When selecting tools, consider the needs of the application first; then investigate the dozens (or hundreds) of options available from a technical perspective, as they relate specifically to the application requirements. The best tools for a particular design or application are not necessarily the most popular.

#28 Large if-then-else and case statements
It’s not uncommon to see large if-else statements or case statements in embedded code. These are problematic from three perspectives:

• Such statements are extremely difficult to debug, because code ends up having so many different paths. If statements are nested, it becomes even more complicated.
• The difference between best-case and worst-case execution time becomes significant. This leads to either under-utilizing the CPU, or the possibility of timing errors when the longest path is taken.
• The difficulty of structured code coverage testing grows exponentially with the number of branches, so branches should be minimized.

Computational methods can often provide an equivalent answer. Performing Boolean algebra, implementing a finite state machine as a jump table, or using lookup tables are alternatives that can reduce a 100-line if-else statement to less than 10 lines of code.

Here is a trivial example of converting an if statement to Boolean algebra:

    if (x == 1)
        x = 0;
    else
        x = 1;

Instead, a Boolean algebra computation would be the following:

    x = !x;   // x = NOT x; can also use x = 1 - x

Despite the simplicity, many programmers still toggle a Boolean value with the if statement above.

#27 Delays implemented as empty loops
Real-time software often uses delays to ensure that data sent or received over an I/O port has time to propagate. These delays are frequently implemented by putting a few no-ops or empty loops (assuming volatile is used if the compiler performs optimizations). If this code is used on a different processor, or even the same processor running at a different rate (for example, a 25MHz vs. 33MHz CPU), the code may stop working on the faster processor. This is especially something to avoid, since it results in the kind of timing problem that is extremely difficult to track down and solve, because the symptoms of the problem are sporadic.

Instead, use a mechanism based on a timer. Some RTOSes provide these functions, but if not, one can still easily be built. Following are two possibilities to build a custom delay(int usec) function.

Most count-down timers allow the software to read a register to obtain the current count-down value. A system variable can be saved to store the rate of the timer, in units such as microseconds per tick. Suppose the value is 2µs per tick, and a delay of 10µs is required: the delay function busy-waits for five timer ticks. Suppose a different speed processor is used: the timer ticks are still the same. Or if the timer frequency changes, then the system variable would change, and the number of ticks to busy-wait would change, but the delay time would remain the same.

If the timer doesn’t support reading intermediate count-down values, an alternative is to profile the speed of the processor during initialization. Execute an empty loop continuously and count how often it occurs between two timer interrupts. Since the frequency of the timer interrupt is known, a value for the number of microseconds per iteration can be computed. This value is then used to dynamically determine how many iterations of the loop to perform for a specified delay time. In our custom RTOS with this implementation, the delay function was accurate within 10% of the desired time for any processor with which we tested it, without ever having to change the code.

#26 Interactive and incomplete test programs
Many embedded designers create a series of test programs, each program testing a separate feature. Test programs need to be executed one at a time, and in some cases require the user to provide input (say, through a keypad or switch) and observe the output response. The problem with this method is that programmers tend only to test what they are changing. Since there are often interactions between unrelated code due to the sharing of resources, every time a change is made, the entire system should undergo testing.

To accomplish this, avoid interactive test programs. Create a single test program that goes through as much self-testing as possible, so that any time even the smallest change is made, a complete test can easily and quickly be performed.

Unfortunately, this is more easily said than done. Some testing, especially of I/O devices, can only be done interactively. Nevertheless, the principle of automated testing should be at the forefront of any attempt to create test software, and not a side-thought with test code written only on an as-needed basis.
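As a rough illustration of the single self-checking test program recommended under #26, the sketch below registers every test in one table and runs them all without user input. The unit under test (`sat_add_u8`) and the test names are hypothetical, chosen only to keep the example self-contained; a real project would register its own modules.

```c
#include <stddef.h>
#include <stdio.h>

/* One self-checking test: returns 0 on success, nonzero on failure. */
typedef int (*test_fn)(void);

struct test_case {
    const char *name;
    test_fn     run;
};

/* Hypothetical unit under test: 8-bit addition that saturates at 255. */
static unsigned char sat_add_u8(unsigned char a, unsigned char b)
{
    unsigned int sum = (unsigned int)a + b;
    return (unsigned char)(sum > 255u ? 255u : sum);
}

static int test_sat_add_normal(void)
{
    return sat_add_u8(10, 20) == 30 ? 0 : 1;
}

static int test_sat_add_saturate(void)
{
    return sat_add_u8(200, 100) == 255 ? 0 : 1;
}

/* Every test is listed here, so a single call exercises all of them. */
static const struct test_case all_tests[] = {
    { "sat_add_normal",   test_sat_add_normal },
    { "sat_add_saturate", test_sat_add_saturate },
};

/* Runs every registered test and returns the number of failures. */
int run_all_tests(void)
{
    int failures = 0;

    for (size_t i = 0; i < sizeof all_tests / sizeof all_tests[0]; i++) {
        int rc = all_tests[i].run();

        printf("%-20s %s\n", all_tests[i].name, rc == 0 ? "PASS" : "FAIL");
        if (rc != 0)
            failures++;
    }
    return failures;
}
```

Calling `run_all_tests()` after every change, however small, gives the complete retest the text asks for; tests that truly need an operator would stay outside this table.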
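The first delay technique described under #27 (reading the count-down register) might look like the following sketch. It assumes a free-running 16-bit count-down timer that wraps from 0 to 0xFFFF. The `timer_read_fn` hook and the simulated timer are hypothetical stand-ins for a real hardware register, and the parameter list differs from the article's `delay(int usec)` only so the timer rate can be passed in explicitly.

```c
#include <stdint.h>

/* Hypothetical hook returning the current value of a free-running
 * 16-bit count-down timer (assumed to wrap from 0 to 0xFFFF). */
typedef uint16_t (*timer_read_fn)(void);

/* Number of timer ticks covering `usec` microseconds, rounded up.
 * usec_per_tick must be nonzero. */
uint32_t usec_to_ticks(uint32_t usec, uint32_t usec_per_tick)
{
    return (usec + usec_per_tick - 1u) / usec_per_tick;
}

/* Busy-wait until the timer has counted down the required ticks.
 * Resolution is one tick, independent of the CPU clock speed. */
void delay_usec(uint32_t usec, uint32_t usec_per_tick, timer_read_fn read_timer)
{
    uint32_t remaining = usec_to_ticks(usec, usec_per_tick);
    uint16_t last = read_timer();

    while (remaining > 0u) {
        uint16_t now = read_timer();
        /* Count-down timer: elapsed = last - now, modulo 2^16 for wraparound. */
        uint16_t elapsed = (uint16_t)(last - now);

        remaining = (elapsed >= remaining) ? 0u : remaining - elapsed;
        last = now;
    }
}

/* Simulated timer for trying the code on a host PC: drops one tick per read. */
uint16_t sim_count = 2u;
unsigned sim_reads = 0u;

uint16_t sim_timer_read(void)
{
    sim_reads++;
    return sim_count--;
}
```

With 2µs per tick, `delay_usec(10, 2, read_fn)` waits for five ticks, matching the worked example in the text. If the timer frequency changes, only the `usec_per_tick` value changes.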
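For the lookup-table alternative named under #28, here is a minimal sketch of a table-driven state machine. The states and events are invented for illustration. The point is only that one array index replaces a nest of if-else or case branches, so every path costs the same and there are no branches to cover.

```c
/* Hypothetical states and events for a small controller. */
enum state { ST_IDLE, ST_RUNNING, ST_FAULT, ST_COUNT };
enum event { EV_START, EV_STOP, EV_ERROR, EV_RESET, EV_COUNT };

/* next_state[current][event]: the whole transition logic as data. */
static const unsigned char next_state[ST_COUNT][EV_COUNT] = {
    /*               START       STOP      ERROR     RESET      */
    /* IDLE    */ { ST_RUNNING, ST_IDLE,  ST_FAULT, ST_IDLE    },
    /* RUNNING */ { ST_RUNNING, ST_IDLE,  ST_FAULT, ST_RUNNING },
    /* FAULT   */ { ST_FAULT,   ST_FAULT, ST_FAULT, ST_IDLE    },
};

/* One transition: a single table lookup with constant execution time.
 * The caller must pass values below ST_COUNT and EV_COUNT. */
enum state fsm_step(enum state s, enum event e)
{
    return (enum state)next_state[s][e];
}
```

A jump table works the same way, with function pointers in the table instead of state numbers.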
#25 Reusing code not designed for reuse
Code that is not designed for reuse will not be in the form of an abstract data type or object. The code may have interdependencies with other code, such that if all of it is taken, there is more code than needed. If only part is taken, it must be thoroughly dissected, which increases the risk of unknowingly cutting out something that is needed, or unexpectedly changing the functionality. If code isn’t designed for reuse, it’s better to analyze what the existing code does, then redesign and re-implement the code as well-structured reusable software components. From there on, the code can be reused. Rewriting this module will take less time than the development and debugging time needed to reuse the original code.

A common misconception is that because software is defined in separate modules, it is naturally reusable. This is a separate mistake on its own, related to creating software with too many dependencies. See more details in mistake #18.

#24 Generalizations based on a single architecture
Embedded software designers may have the need to develop software that is intended to run on a variety of processors and platforms. In such a case, it’s not uncommon for the programmer to begin writing software for one of the platforms, but generalize anything and everything in preparation for porting the code at a later time.

Unfortunately, doing so usually causes more harm than good. The design will tend to over-generalize items that are very similar on very different architectures, while not generalizing some items that are different, but that the designer did not foresee as different.

A better strategy is to design and develop the code simultaneously on multiple architectures, generalizing only those parts that are different in the different architectures. Intentionally choose three or four processors that are very different (for example, from different manufacturers and using different architectures).

#23 One big loop
When real-time software is designed as a single big loop, we have no flexibility to modify the execution time of various parts of the code independently. Few real-time systems need to operate everything at the same rate. If the CPU is overloaded, one of the methods to reduce utilization is to selectively slow down only the less critical parts of the code. This approach works, however, only if the multitasking features of an RTOS are used, or the code was developed based on a flexible custom or commercial real-time executive.

#22 No analysis of hardware peculiarities before starting software design
How long does it take to add two eight-bit numbers? What about two 16-bit or 32-bit numbers? What about two floats? What if an eight-bit number is added to a float? A software designer who cannot answer these questions off the top of his or her head for the target processor isn’t adequately prepared to design and code real-time software.

Here are sample answers to the above measurements for a 6MHz Z180 (in microseconds): 7, 12, 28, 137, and 308. Note that float plus byte takes more than twice as long as float plus float (308 vs. 137), due to the long conversion time from byte to float. Such anomalies are often the source of code that overloads the processor.

In another example, a special purpose floating-point accelerator did floating-point addition/multiplication 10 times faster than a 33MHz 68882, but sin() and cos() took the same amount of time. This is because the 68882 has the trigonometric functions built into its hardware, while the floating-point accelerator did those particular functions in software.

When code is implemented for a real-time system, being aware of the timing implications of every single line of code is important. Understand the capabilities and limitations of the target processor(s), and redesign an application that makes excessive use of slow instructions. For example, for the Z180, doing everything in float is better than having only some variables float and lots of mixed-type arithmetic.

#21 Over-designing the system
If the processor and memory utilization are less than 90% on average and less than 100% peak, then the system has probably been over-designed. Writing programs for a processor with more than enough resources is a luxury for a software developer. In some cases, however, this luxury is so costly that it can make the difference between a profit and bankruptcy! Contributing towards minimizing the price and power consumption of an embedded system is a software engineer’s duty. If the CPU is only 45% utilized, you can use a processor that operates at half the speed instead, thus saving as much as four times the power and possibly one or more dollars per processor.

If the product is mass-produced, saving $1 on the processor could save a million dollars over the production span of the item. If the product is battery-powered, it will allow the battery to last much longer, thus increasing the marketing appeal of the product. As an extreme example of power consumption of computers, consider a laptop. Most have less than three hours of power when using a heavy battery. A watch, however, has a lightweight, cheap battery that can last three years. Although software isn’t usually associated with power consumption, it does have a major role.

Fast processors and more memory than necessary tend to also lead to laziness in thinking about the design. Start embedded development with slower processors with less memory, and move up to the next level of processor only on an as-needed basis. Software that uses hardware more efficiently is more likely to evolve from this approach than from later trying to cut corners to bring down the cost of the system.

#20 Fine-grain optimizing during first implementation
The converse to problem #21 is also a common mistake. Some programmers foresee anomalies (some are real, some are mythical). An example of a mythical anomaly is that multiplication takes much longer than addition. Many designers would implement 3*x as x+x+x. On many embedded processors, however, multiplication is less than twice as long as addition, so x+x+x would be slower than 3*x.

A programmer who foresees all the anomalies may implement the first version of the code in an unreadable manner so as to optimize the code; this is before knowing if optimization is even needed. As a general rule, don’t perform fine-grained optimizations during implementation. Only optimize segments of code later if it proves necessary to get better performance. If optimization is unnecessary, then keep the more readable code. If the CPU is overloaded, it’s nice to know that a variety of places remain in the code where simple, straightforward optimizations can be performed quickly.

#19 “It’s just a glitch.”
Some programmers use the same workarounds over and over again because the system has a glitch. A programmer’s typical response is that it always executes well if the workaround is used.

Unfortunately, the same errors that force a workaround are likely to resurrect themselves later in a different form. Anytime there is any “glitch,” it means something is wrong! Make sure appropriate steps are taken to understand the problem. A workaround may be valuable to ensure that a product is shipped on time, but immediately after the deadline, take a bit of extra time to identify the problem, to ensure it does not show up again, such as during the next big demo.

#18 Too many inter-module dependencies
The dependencies between modules in a good software design can be drawn as a tree, as shown in Figure 1a. A dependency diagram consists of nodes and arrows, such that each node represents a module (such as one source code file), and the arrows show dependencies between that node and other modules. Modules on the bottom-most row are not dependent on any other software module. To maximize software reusability, arrows should always point downwards, and not upwards or bidirectionally. For example, module abc depends on module def if it has a #include "def.h" in the code, or an extern declaration in the file abc.c to a variable or function defined in module def.c.

FIGURE 1 Examples of dependency graphs, with and without cycles. An objective in developing good software is to decompose code into modules to minimize or eliminate circular dependencies. a) Dependency graph with no cycles. This is desirable. b) Dependency graph with cycle between ghi and jkl. c) Dependency graph with many circular dependencies, including a major circular dependency.

The dependency graph is a valuable software engineering aid. Given such a diagram, it’s easy to identify what parts of the software can be reused, create a strategy for incremental testing of modules, and develop a method to limit error propagation through the entire system.

Each circular dependency (a cycle in the graph) reduces the ability to reuse the software module. Testing can only occur for the combined set of dependent modules, and errors will be difficult to isolate to a single module. If the graph has too many
cycles, or a major cycle exists where a module at the bottom-most level of the graph is dependent on the top-most module, then not a single module is reusable.

Figures 1b and 1c both include circular dependencies. If a circular dependency is inevitable, Figure 1b is much preferred over Figure 1c, since in 1b reusing some of the modules is still possible. The restriction in Figure 1b is that modules pqr and xyz can only be reused together. In Figure 1c, however, reusing any subset of modules isn’t possible, as too many dependencies exist between modules. Furthermore, a major circular dependency exists, where module xyz, which should not be dependent on anything because it is at the bottom of the graph, is dependent on abc. Only one such major cycle is required to make the entire application non-reusable. Unfortunately, most existing applications are more similar to Figure 1c than to Figure 1a or Figure 1b, hence the difficulty in reusing software from existing applications.

To best use dependency graphs to analyze the reusability and maintainability of software, write code that makes it easy to generate the graph. That is, all extern declarations for exported variables and functions in a module xxx should be placed in file xxx.h. In module yyy, simply looking at what files are #include’d allows determination of that module’s dependencies. If this convention is not followed, and an extern declaration is embedded in yyy.c instead of #include-ing the appropriate file, then the dependency graph will be erroneous and an attempt to reuse code that appears to be independent of the other module will be difficult.

#17 “I don’t have time to take a break.”
Many programmers struggle nonstop for hours on a problem, only to hit dead end after dead end. They continue because they face a deadline. Many hours could be saved if the person simply took a break after not making any progress for an hour. Relax, take a walk around a lake, go for a beer, take a nap, anything.

With a clear mind that results from a bit of mental relaxation, analyzing what is happening is much easier, and you can more quickly converge to a solution. A two-hour break, even with a deadline looming, might save a day of work. A 10-minute coffee break away from the computer can sometimes save an hour of work.

#16 Using message passing as primary inter-process communication
When software is developed as functional blocks, the first thought is to implement inputs and outputs as messages. Although this works well in non-real-time environments, such as distributed networking, it’s problematic in a real-time system.

Three major problems arise when using message passing in a real-time system:

• Message passing requires synchronization, a primary source of unpredictability to real-time scheduling. Functional blocks end up executing synchronously, and thus analysis of the system’s timing is difficult, if not impossible.
• In systems with bi-directional communication between processes or any kind of feedback loop, deadlock is a possibility.
• Message passing incurs significantly more overhead as compared to shared memory. While messages may be required for communication across networks and serial lines, it’s often inefficient when random access to the data is possible, as is the case for interprocess communication on a single processor.

State-based communication is preferred in embedded systems to provide higher assurability. A state-based system uses structured shared memory, such that communication has less overhead. The most recent data is always available to a process when the process needs it. Steenstrup and Arbib developed the port-automaton theory to formally prove that a stable and reliable control system can be created by only reading the most recent data [3]. Costly blocking is eliminated by creating local copies of shared data, to ensure that every process has mutually exclusive access to the information it needs [2]. Using states instead of messages also provides robustness when messages may be lost or when code does not all execute at the same rate, and implementing with shared memory generates less operating system overhead.

Converting control systems from message-based communication to state-based communication is generally straightforward. For example, an intelligent train control system has independent control of every brake to maximize train handling. To minimize stopping distance when coming to a full stop, all the brakes on the train must be applied together. The I/O logic for each brake is handled by a separate process; the control module must inform each brake module to turn on the brakes. When using a message-based system, the controlling unit sends a message, “apply brake,” to every brake process. This approach has high communication overhead, potential loss of messages if tasks execute at different frequencies, nondeterministic blocking, a separate copy of the message for every process, and the possibility of deadlock. Due to the dependencies among processes, it creates a real-time system that is difficult to analyze and is not suitable for reconfigurable systems. In contrast, in a state-based communication mechanism, each brake module executes periodically and monitors the brake variable to update the state of its own brake I/O. For example, instead of the “apply brake” message, revise the state of the brake variable so that it says, “the brake should be on.” Since processes are periodic, a schedulability analysis is easier. Processes only need to bind to a single element in the
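The brake example under #16 can be sketched in a few lines of C. This is a single-threaded illustration only: the structure and function names are hypothetical, and a real system would need the copy-in step to be atomic or briefly protected when the shared state is larger than one word.

```c
#include <stdbool.h>

#define NUM_BRAKES 4   /* hypothetical number of brake modules */

/* Shared state table: the controller writes it, brake modules only read it. */
struct shared_state {
    volatile bool brake_should_be_on;
};

/* Per-brake module data, including its local copy of the shared state. */
struct brake_module {
    bool local_brake_cmd;   /* local copy taken at the start of each cycle */
    bool actuator_on;       /* stands in for the brake I/O */
};

/* Controller: revises the state instead of sending one message per brake. */
void controller_set_brake(struct shared_state *s, bool on)
{
    s->brake_should_be_on = on;
}

/* One periodic cycle of a brake module: copy the state in, then act on the copy. */
void brake_cycle(struct brake_module *b, const struct shared_state *s)
{
    b->local_brake_cmd = s->brake_should_be_on;  /* only the most recent value matters */
    b->actuator_on = b->local_brake_cmd;
}
```

Because each module simply samples the variable on its own period, a module that runs slower than the controller cannot lose a message; it just sees the latest state on its next cycle.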
References
1. D.B. Stewart, “Designing Software Components for Real-Time Applications,” in Proceedings of Embedded Systems Conference, San Jose, CA, September 1999.
2. D.B. Stewart, R.A. Volpe, and P.K. Khosla, “Design of dynamically reconfigurable real-time software using port-based objects,” IEEE Trans. on Software Engineering, v. 23, n. 12, Dec. 1997.
3. M. Steenstrup, M. Arbib, and E.G. Manes, “Port Automata and the Algebra of Concurrent Processes,” Journal of Computer and System Sciences, v. 27, n. 1, pp. 29-50, Jan. 1983.