Embedded Realtime Systems Programming
Sriram V Iyer
Pankaj Gupta
Philips Semiconductors Design
Competence Centre, Bangalore
Tata McGraw-Hill
ISBN 0-07-048284-5
Preface

As we sit down to write this preface, we are filled with thoughts about this book, which took over 20 months to take shape.
The embedded industry is now in its adolescent phase: too young to be called mature, yet too mature to be called nascent. It is currently experiencing its 'growing pains'. The industry, once restricted to a very small community, is beginning to embrace more and more architects and developers into its fold.
When we started out in this industry after college, a sense of mystery took over. We were familiar with programming from our college days: we had done courses on programming (typically C/C++/Java), data structures, algorithms, etc. as part of our curriculum. Programming in an embedded-realtime scenario gave us a feeling of deja vu of our previous programming experiences, though the new experiences were definitely different.
As we patiently took notes, conversed with senior colleagues and sifted through a mammoth amount of information on the web, things started becoming clear. Finally, we could comprehend the rules that differentiate embedded programming from normal desktop/application programming. This by no means implies that we have reached the nirvana of embedded-realtime programming. We have just crossed the first few basic steps, which we would like to share with you.
Around this time, we were asked by Mr. K.M. Jayanth, the manager of the department we work for, to compile our thoughts, experiences and the information we had gathered into a primer on embedded-realtime software for our department. We ventured into this activity without knowing what we were stepping into. As we started, we saw that the project was indeed huge. After numerous sleepless nights, gallons of caffeine and countless pizzas, we finally constrained ourselves to writing some 100-odd pages for that primer. We clearly felt 'hungry' for a lot more, but had to restrict the content to this level for it to remain a primer.
At the same time, we received a mail from Deepa (Manager, Professional Publishing, TMH) asking whether we could write a book on C++, after she had seen some of our C++ articles* on ITspace.com. We decided that a book on programming embedded-realtime systems would be more appropriate for us than a book on C++.
This book is our collective wisdom, distilled over hours of design, development, integration and debugging experience. The content was also shaped by interactions with many of our colleagues and by intelligent conversations during the series of seminars on embedded-realtime systems that we conduct in our department.
*Sriram V Iyer was the community guide/moderator for the C++ section on ITspace.com
Audience
The audience of this book ranges from novice developers who are curious to know more about embedded systems, to practicing embedded engineers who want to know more about the topics covered. It could also be used as an undergraduate text for CS and EE courses in embedded-realtime systems.
The book is not intended to replace courses in microprocessors, algorithms or programming, but it can definitely complement these courses.
Though this book talks a lot about software interactions with hardware, its focus is restricted to software. Though some insights are provided in some chapters, this book definitely does not address embedded hardware design.
Acknowledgements
The authors would like to express their heartfelt gratitude to Philips for providing a congenial and nurturing environment for our thoughts on the subject. We especially owe a lot to Jayanth, Department Manager, DSS, for providing constant support and feedback during the entire duration of this big exercise. We also thank Dr Antonio, Director, PS DCC-B, Philips, for finding time out of his busy schedule to review the manuscript and write the beautiful foreword to the book.
We would also like to say 'thanks, buddies' to our teams in Philips, who constantly pulled our legs and created the perfect atmosphere for this creative endeavour. ☺
Sriram V Iyer
Pankaj Gupta
Foreword
In times when the pace of change was slow, the variety of products and services small, the channels of communication and distribution less explosive, and consumer needs less sophisticated, engineering could enjoy prolonged periods of relative stability. The times when the customer could be held almost constant while the other variables were optimised have long gone.
Now, human beings live in times of choice. They are continuously bombarded with pur-
chasing alternatives in every aspect of their lives. They demand more and more from their
purchases and their suppliers. The markets are fragmented and their products can be tailored by
design, programmability, service and variety. In the world of high technology such as
Semiconductors, there is an analogy that can explain this process: behind the proliferation of
electronic components, infiltrating our communication systems, entertainment centers, trans-
port, homes, there are thousands of integrated circuits that are produced in high volume up to
the last layer, which in turn is designed by the customer to add the final touch of personality
needed for their specific products. Radical customisation has dramatically shortened time-to-market and time-to-money. This exemplifies the remaking of our means of production to accommodate our ever-changing social and personal needs.
Programming embedded and realtime systems is no exception to this rule; it too benefits from the flexibility the field has to offer. It emphasises a best-practice approach, but allows the customisation necessary to shorten time-to-market.
One of the functions of this book, and perhaps the most important one, is to open up the logic
of applying the appropriate fundamentals on embedded software and realtime systems, so that
everyone in the software industry can participate in the understanding of best practices. If prudence rather than brilliance is to be our guiding principle, then many fundamentals are far better than a series of sophisticated but isolated experiences. If embedded software is going to be the driving force of the semiconductor industry, as most semiconductor organisations debate and insist it is, then its fundamentals must be accessible to all players, and not, as is sometimes the case, reserved for a select few who think that software engineering is just a part of the system approach.
Finally, I would like to suggest that this work covers basics of engineering that will endure despite the fast-paced, ever-changing competitive world. This book will keep making you think about engineering basics, about the way you think about engineering, and about the fundamentals of software engineering.
I am confident the reader will enjoy Embedded-Realtime Systems Programming and find it a useful experience.
Introduction
Main Entry: em·bed·ded
Pronunciation: im-be-ded
Type: adjective
: being a constituent within a similar surrounding.
Hardly convincing, isn't it? That is why we were motivated to write this book. Embedded systems seem to touch our lives every day, almost wherever we go. However, they still remain shrouded in mystery, far from the normal world, seemingly unravelled only by elderly professors with flowing beards ☺.
No, embedded systems are not confined to these hallowed places alone. This section shall endeavor to introduce the reader to the common real-world applications of embedded systems. It is our attempt to transform the reader's knowledge to an extent that (s)he looks at ordinary appliances at home or at work in a totally different light. Yes, we are talking of really ordinary appliances that have embedded systems inside them in some form or the other. We will then look at the unique challenges that lie in the path of engineers who create such systems, and how they differ from normal systems. So, fasten your seat belts, we are about to take off!
Chapter 1
Introduction to Embedded Realtime Systems
These are the days when terms like embedded, ubiquitous and pervasive computing are becoming more and more popular in the world of programming. Embedded realtime programming was once looked upon as a niche skill that most programmers could keep away from, but not any more. The focus is now on smart and intelligent devices. The personal computer (PC)/workstation is moving away from the focal point of the computing/programming industry.
We are flooded with embedded systems that seem to be everywhere (ubiquitous) and inconspicuous. Ideally, these systems should communicate with each other (distributed) to achieve the feel of one complete system.
Before we delve further, let us define what an embedded system actually is. An embedded system is often defined as "a microprocessor-based system that does not look like a computer".*
If we look around, we will realise that there are a lot of devices with limited intelli-
gence. Let us consider the good old washing machine. The main purpose of a washing
machine is to wash clothes. But the modern world has extended it to include extra func-
tions and give more control thereby optimising the actual process of washing clothes.
Present day washing machines come complete with sensors, which maintain optimum
water temperature, cloth dependent spin-speed, number of spins, etc. They take care of
filling water, heating it to a particular temperature, mixing the optimum amount of
detergent, soaking the clothes in water for just the right time, the soft tumble for
extracting dirt, the aggressive tumble for removing stains and excess detergent from
clothes, and finally the spin-dry. All this happens with minimum user intervention. The
user may just have to select what kind of clothes are being put inside the machine and
possibly how dirty they are!
This is not magic. All this is possible because somebody hit upon a brilliant idea that
we can use a small microprocessor to automate a lot of the dreary process of washing.
Since a microprocessor cannot function in isolation, it needs inputs from sensors and other controlling devices so as to sense what is going on around it and then "decide" what actions need to be performed, which parts of the system have to run and in what order. The sensors detect that the quantity of water inside the machine is at a certain level and indicate this to the processor. The processor computes the quantity of water necessary for the amount of clothes and the user settings. It then generates a control signal to stop the flow of water into the machine and switch on the heater.
The temperature detector keeps giving indications about the current temperature inside
the washing machine compartment. At the optimum temperature for the kind of clothes
to be washed, the processor generates a control signal to stop the heater. Then it gives
a signal to start the soft tumble action to soak the clothes properly in water and mix the
detergent. The processor will keep a watch on the amount of time the soft tumble action
is going on. At the optimum time, it will stop the soft tumble action and start the aggres-
sive tumble action to fight the stains. So, we can see that washing machine is an
example of an embedded system. As illustrated, the seemingly simple task of washing
clothes is a big exercise for the processor!
As embedded systems started progressing, they started becoming more and more
complex. Additionally, new attributes that got added to these systems were smart and
intelligent. Not only were the embedded devices able to do their jobs but also were able
to do them smartly. What exactly do we mean by intelligence? Intelligence is one of those terms that still cannot be defined in a single concrete way (if it were indeed definable, we would have a laptop typing these pages on its own!). We can define a smart device as a device with the following attributes:
❑ Computational Power All these devices have some amount of computing power.
This could be provided by a very simple 8-bit controller or a high-end 64-bit
microprocessor.
❑ Memory The next requirement is memory. These devices possess some amount
of memory that can be used by the processor and also some to remember user
data and preferences.
❑ Realtime All the devices have to respond to user/environmental inputs within
a specified period of time.
❑ Communication The device must be able to receive inputs given by other
devices in the environment, process it and provide some tangible output to the
other devices or users.
❑ Dynamic decisions The system should be able to change its next course of activ-
ity based on the change of input from its sensors or surroundings.
❑ Limited operating system (OS) support for programming Application programs for
PCs/workstations are launched from the operating system. The tasks like schedul-
ing, memory management, hardware abstractions and input/output from/ to
peripherals are delegated to the OS. In embedded systems, the OS is a part of the application code, and the application closely co-ordinates with the OS to support a majority of the features that a desktop OS may provide.
❑ Limited secondary memory Many embedded systems do not boot from a hard disk. (A cell-phone with a hard disk? ☺) They depend on other types of non-volatile memory like read only memory (ROM) and flash memory instead of secondary memory devices like floppy disks or hard disks. Since we do not talk about gigabytes of flash (systems with 16 MB of flash are considered premium), our code and data sizes must be small.
❑ Limited random access memory (RAM) Since embedded systems inherently operate with restrictions on resources, we do not usually have concepts like swapping, virtual memory, etc. in typical embedded systems. And, while programming for embedded systems, we must be very careful about memory leaks, because these programs tend to run forever. For example, a television goes to standby mode when it is switched off (unless the power is switched off). Some components take rest, while the program still runs, anticipating commands from the remote commander. If the program ended when the television went to standby mode, the television could not be switched on again, because there would be no entity in the television listening to the remote commander. These programs that run in embedded systems tend to run forever, and even a single byte leaked in some path of execution will eventually bring the system to a grinding halt.
❑ Interaction with hardware This is the caveat. This factor singularly differentiates embedded programming from conventional application programming.
Usually all embedded systems have a lot in common in terms of their components and
their requirements. The following subsections introduce some of these requirements
and components.
The processing logic that used to be "hardwired" in a chip or other electrical circuits has grown exponentially and is so complex nowadays that many functionalities are simply unimaginable without software. The usual practice is to hardwire 'mature' features in hardware and use software to implement evolving features.
An embedded system can also take inputs from its environment. For example, con-
sider a music system with preset options such as theatre, hall, rock, etc. A user can
change the acoustic effect based on his requirements. In this case input is received from
the user (Fig. 1.2).
1.2.2 Memory
Memory is a very precious resource and is always found wanting in many embedded
systems (including human systems ☺).
It is indeed true that memory is becoming cheaper nowadays. But, in these days
of intense price wars, every resource must be handled with extreme care. And, in
many systems, some space has to be allocated for future expansion. Also, we cannot afford expansion slots, as in a PC, for embedded systems, due to cost constraints, embedded-hardware design constraints and form-factor* restrictions. So, memory should be handled very carefully.
These constraints on memory become self-evident on looking at good embedded system designs and their software. Algorithms that use a huge amount of memory, or that copy huge data structures, are avoided unless absolutely necessary.
Much embedded system software uses memory directly instead of through high-level abstractions. Many RTOSes, however, do provide routines that hide the complexity of memory management.
Many embedded systems do not carry hard disk or floppy disk drives with them.
The usage of secondary storage is not possible in most embedded systems. So, these sys-
tems usually have some ROM and nonvolatile RAM where the code and user prefer-
ences are stored. (Various memories that are used and the programming techniques to
handle memory efficiently are discussed separately in Chapter 4).
We have to remember that most of these programs do not terminate (when was the last time you "rebooted" your refrigerator?) and tend to run forever. In the case of mission-critical systems, when an emergency strikes or some irrecoverable error occurs, embedded systems implement what are called watchdog timers, which simply reset the system.
1.2.3 Realtime
We can define a system as a collection of subsystems or components that respond to inputs from the user, the environment or the system itself (e.g. timers). Typically, there is a time lapse between the instant at which the input is given and the instant at which the system responds. In any system, it is quite natural to expect some response within a specific time interval. But there are systems where very strict (not necessarily short) deadlines have to be met. These systems are called realtime systems. They are characterised by the well-known one-liner:
“A late answer is a wrong answer”.
*The size/form and shape of the appliance (or device). People will not buy a cell phone as big as a
dumbbell just because it can be enhanced with more features in the future owing to its expansion slots.
❑ Hard realtime systems A realtime system where missing a deadline could cause
drastic results that could lead to loss of life and/or property is called a hard real-
time system.
Examples are aircraft, biomedical instruments (like pacemakers), nuclear reactors, etc. For example, fighter jets have to respond to the pilot's commands immediately.
❑ Soft realtime systems A realtime system where a few missed deadlines may not
cause any significant inconvenience to the user is known as a soft realtime system.
Examples are televisions and multimedia streaming over the Internet (where the loss of some packets can be tolerated).
There is widespread confusion between 'hard and fast' realtime systems and 'soft and slow' realtime systems.
The realtime systems can also be classified as fast and slow systems based on the time
deadlines they operate with. Again, this is a very subjective definition. Typically, any
system that works with subsecond response times can be classified as a ‘fast’ realtime
system. The other systems that can take a second or more time to respond can be clas-
sified as ‘slow’ realtime systems.
Soft realtime systems can be fast: a high-speed router that works with nanosecond deadlines can lose a few packets without loss of life or limb. Similarly, hard realtime systems can be slow: the cadmium rods inside a nuclear reactor need not be pulled out at lightning speed.
Closely associated with the concept of realtime is the concept of determinism. This is
also a very important concept that differentiates realtime programming from normal
application programming.
We have seen that a realtime system is one that behaves predictably: it responds within a particular amount of time. The time interval between the instant at which the input occurs and the instant at which the output occurs should be 'deterministic', or predictable. This does not necessarily require that the system be fast. It only requires that the system always respond within a known period of time.
For example, Java, though highly acclaimed for the portability that would make it ideal for embedded software running on various types of platforms, was found not ideally suited for realtime systems, because some of its calls are not deterministic.
Java implements the concept of automatic garbage collection.* This means that the pro-
grammer need not remember to free any of his memory. The logic for collection of
unused memory is built into the runtime. But this could cause problems.
Whenever memory for objects is allocated from the heap, if there is not sufficient memory, the garbage collection algorithm runs until enough memory is reclaimed. This can stop our program from executing for an unknown period of time. As we will see later, this is a major issue in realtime systems, where every function call must be deterministic.
❑ Cost Cost is often the major driving factor behind many embedded systems.
This requires the designer to be extremely conscious about memory, peripherals,
etc. This factor plays a key role in high-volume products. But some highly specific applications, like avionics, can afford to be expensive.
❑ Reliability Some products require a 99.999% uptime. Typical examples are
routers, bridges, power systems, etc. But some may not require this kind of
reliability (It is OK if your bread gets a bit overdone once in a while in
*Garbage collection is an oft-misused term. In many languages, the programmer has to take care of
his/her ‘garbage’, i.e. the memory she/he no longer uses. She/he has to remember to free the memory
manually. In some languages like Java, there is an option of the language-runtime (Java Virtual
Machine—JVM) taking care of freeing unused memory. This is called ‘automatic garbage collection’.
We have to remember that garbage collection must be done—either manually by the programmer or
automatically by the language runtime.
your microwave). Reliability may require the designer to opt for some level of
redundancy.* This could make the system more expensive.
❑ Lifetime Products that have a longer lifetime must be built with robust and
proven components.
❑ Power consumption This is becoming an important area of research in itself.
With growing number of mobile instruments, power consumption has become a
major concern. The problem was first encountered while designing laptops. The
laptop seemed to siphon off the power in no time. Similarly, many of today's devices are mobile, like the cellular phone and the PDA, to quote a popular few. These devices are designed so that power consumption is reduced to the minimum. Some of the popular tactics used include shutting down those peripherals which are not immediately required. These tactics are highly dependent on soft-
ware. This factor has reached new dimensions with new processors being designed
such that some of their parts can be shut down whenever not required. This
requires a shift to a new era of programming with more dimensions being added
to embedded software programming. The programmer for mobile devices is
becoming increasingly aware of the power-saving features in his programming
platform (peripherals and processors).
These are some of the soft factors that drive the design of embedded systems and their software.
Let us see some of the typical embedded systems that surround us.
*Having duplicate peripherals that can be used when the main peripheral fails is called redundancy. The duplicate peripheral is not always in use; it is used only when the main device fails.
There could be any level of redundancy. In highly critical systems more levels of redundancy can be
provided. If the swapping between the failed main device and the redundant device can occur
without a power-down or restart, then the device is said to be ‘hot-swappable’. It is called
‘cold-swappable’ otherwise.
environment (hall, theatre, open-air, etc). These features are not hardwired in chips but
are usually taken care of by the software that goes with these systems. The processors
are typically 8-bit microprocessors for handling user inputs and the display. Additionally, they have a high-end 16-bit/32-bit DSP and/or MPEG2/MPEG4 decoders for decoding the input stream for the various supported media. The RAM for
these kinds of systems can vary a lot from 64KB to a few MB depending on how com-
plex the system is.
On the realtime front, the media should be read and decoded, and the stream must be sent to the speakers/video output at a predefined rate. Based on the media, the requirements for this data throughput may vary. Imagine a Bluetooth™ network that takes care
of playing your favourite music as you enter the house (by contacting your PDA). This
system needs to interact with its components as well as other devices in realtime so that
the desired functionality (playing of favourite music) is achieved.
On the flashing of the card, as detected by the magnetic sensor, the card identifier is looked up in the access control list. If the card has the access permit, the LEDs on the unit flash and the door is unlocked for entry. Otherwise, the system can emit a sound or display that access is not permitted. The unit should just look up the access table and respond to the user.
However, this should happen sufficiently fast, typically in less than a second. We cannot allow even a few seconds' lapse, because the user may assume that his access was not permitted or that the system is dysfunctional. The lists can be stored on a central server where the lookup can be done. In this case, the authentication unit may not need to store all the lists in its memory. Or, it can store the list only for the location whose access it controls.
This is left entirely to the discretion of the designer of the system. The memory required
for this system will depend on the method opted for its design. These units are
connected with each other, usually using some kind of Ethernet connection.
Assembly language was the lingua franca for programming embedded systems till recently. Nowadays there are many languages to program them: C, C++, Ada, Forth and … Java, together with its new avatar, J2ME. Embedded software is coming of age and fast catching up with application software. The presence of tools to model the software in UML and SDL is sufficient indication of the maturity of embedded software programming.
But the majority of software for embedded systems is still written in C. A recent survey indicates that approximately 45% of embedded software is still being developed in C. C++ is also increasing its presence in embedded systems. C++ is based on C, and helps the programmer pace his transition to the OO methodology and reap the benefits of such an approach.
C is very close to assembly programming and allows very easy access to the underlying hardware. A huge number of high-quality compilers and debugging tools are available for C. Though C++ is theoretically as efficient as C, some of its compilers are buggy owing to the huge size of the language, and may create a buggy executable in some situations. C can definitely claim to have more mature compilers than C++. And in C++, some features do cause a lot of code bloat. There is actually an ongoing effort to identify a subset of C++ that can be used in embedded systems. This subset is called Embedded C++.*
In this book, we concentrate on C and we use C++ wherever applicable. The myths
that surround C++ and implications of using it in embedded systems are discussed in
Appendix A.
In this book, we will explore what an embedded system is, various types of embedded
systems, techniques to program them, and major concepts that are required to be mas-
tered for efficient design and implementation of embedded system software. We will
also take a peek into the process of developing efficient embedded software and its
potential pitfalls.
*For more information look into the site for Embedded C++ - http://www.caravan.net/ec2plus/.
However, the book is NOT
■ An ASIC developer’s guide
■ Guide for board design/board layout
■ Detailed guide to semiconductor /digital techniques
■ User guide to any embedded programming language
This book aims to explore the basic concepts that underlie an embedded system and its software.
In Part I, the basic functionalities of an embedded system are defined. This part shall
be the springboard for the rest of the book, by giving useful insights into the importance
and organisation of rest of the sections.
A microprocessor or a microcontroller has loads of features. In Part II, we discuss
concepts like interrupts, hardware timers, memory types and its management.
Embedded software has many components that can be found across systems. For example, state machines/statecharts are used in many communication protocols. Task-based design is also one of the ubiquitous programming practices. Part III describes some of the common design paradigms that a programmer can immediately benefit from.
Software is complete only with its corresponding engineering practices, and some software engineering issues are dealt with in Part IV. Estimation, requirements gathering, architecture definition, design, implementation and testing of embedded systems should be familiar to embedded programmers; this helps in improving the overall quality of the system. All the case studies and examples discussed in the earlier chapters are used to create a complete case study to help a programmer understand the complete Software Development Lifecycle (SDLC).
The content of the book in all the chapters is based on the Learning Pyramid™ as
indicated in ancient Indian texts (Vedas) (Fig. 1.3).
Note
Watchdog Timers: A watchdog timer is a mechanism to monitor the activity (rather, the inactivity) in a system. A watchdog timer periodically checks for a predefined activity. (It could be looking for a specific value to be updated periodically in a particular memory location.) If the watchdog senses that the value has not been updated as expected, it concludes that the system is in an irrecoverable state (either the code is stuck in a loop, or it has crashed, etc.) and resets the system.
The third stage is assimilation. This consists of insights and examples that give a deep understanding of the system and the interrelations between various information groups.
The final stage is application. This stage includes real case studies that help the reader exercise all that he has assimilated: the use of knowledge, after comprehension and assimilation, to produce an experience by applying it to solve a problem in real life.
Thus the Learning Pyramid™ helps in the complete coverage of any subject, here specifically embedded realtime systems and their software.
In this chapter, we learnt about typical characteristics and features of embedded sys-
tems. Embedded systems can be found all around us — in washing machines, in music
systems, remote controls and the like. Embedded systems have been increasing in com-
plexity and intelligence constantly. It is a challenge to balance the strict restriction on
memory and size of embedded systems with more computational power. In addition,
programming for embedded devices has its own problems such as limited OS support,
lack of standard I/O devices, limited memory and interaction with hardware devices in
realtime. C has replaced assembly as the most commonly used language for programming
embedded systems because of its ease of programming and compact code generation.
Other languages such as C++ and Java are catching up too.
Embedded nitty-gritty
Chapter 2 gives an insight into the build process
associated with embedded systems, and we understand
why and how it is different from a traditional
compilation process. Memory is one of the most
important parts of an embedded system. So it
always pays for an embedded programmer to know
something about its organisation, access methods
and the associated circuitry in order to get a feel of
things. We take a look at them in Chapters 3 and 4.
Also, most embedded systems interact with the
external world using the ‘interrupt’ mechanism.
Chapter
2
The process of translating the code that is written by humans to the code that is under-
standable by the microprocessor is called the build process (Fig. 2.1).
The steps that are involved in transforming the source code (the code created by the
programmer) to the final executable format are listed below:
i. Preprocessing
ii. Compiling
iii. Linking
iv. Locating
2.1 PREPROCESSING
This is the first step in the build process. The whole of the build process is accomplished
by a set of executables (not just cc/bcc32). Preprocessing is accomplished using an exe-
cutable usually called the cpp (C Pre-Processor) in the case of Unix* and cpp32.exe/
cpp.exe in the case of Windows™ machines.
The pre-processor is automatically invoked by ‘cc’ (or cl or bcc32 or gcc, etc.) and its
output is a temporary file, which is deleted at the end of the compilation process.
*When we mean Unix it is usually *nix (The ‘*’ symbol is to represent a wild card and not to swear at
Unix). Unix in this book, is Unix and its clones (HP-UX, Solaris, Linux and our favourite–AIX).
The Build Process for Embedded System 23
The preprocessor does the following jobs:
i. Strips the comments
ii. Expands include files
iii. Expands MACROs and replaces symbolic constants (simply put, the
#defines ☺)
Including files is a great mechanism for maintaining code modularity, thought up years
ago. Though other mechanisms have evolved since, they are based on similar concepts,
and we still cannot do away with include files. Let us look at a sample header file and
see what is done while including header files.
P2P
Can we include .c (source) files?
// foo.h
#ifndef MY_FOO_H
#define MY_FOO_H
/*
Preprocessor does not bother about typedefs. typedef is a
compiler feature. So, note the semicolon at the end.
*/
typedef int MY_INT;
/*
PI is a symbolic constant. We can use PI wherever we
can use 3.14159
*/
#define PI 3.14159
/*
SQR is a macro. Though preprocessor is a text replacement
program, macros can be used effectively to improve our
coding. (though they have a tendency to introduce bugs.)
*/
#define SQR(x) (x)*(x)
void myFoo ( MY_INT );
#endif
Listing 2.1: foo.h
// foo.c
#include "foo.h"

int main ( void )
{
MY_INT i = SQR(PI); /* uses both the macro and the constant */
myFoo(i);
return 0;
}
Now, let us look at foo.c after it has been preprocessed. (Please note that during the
normal compilation process, this file is not generated. This file is a temporary file that
gets deleted after the compiler uses it. However, preprocessors can be used directly and
have options to dump the preprocessed file to the screen or redirect it to a file. We
can use the corresponding preprocessor available on our platform to generate the pre-
processed file.) We used the preprocessor cpp32.exe from the freely available Borland
C/C++ command-line tools provided by Borland™. The comments shown are inserted
only to improve readability and will not be present in the actual output given to the
compiler (i.e. in the temporary file created and given to the compiler).
Note
To create this preprocessed output, we used the C Preprocessor (cpp32.exe) provided by Borland
free command line tools.
cpp32 -Ic:\Borland\bcc55\Include foo.c
After preprocessing, foo.c will look like the listing above. Note that the header file foo.h
has been expanded (the foo.c lines are in bold). Look at lines #5 and #9 of foo.c:
the preprocessor has expanded the macro SQR and replaced PI.
It should be noted that the compiler sees the .c file only after preprocessing. So, the
compiler cannot tell whether we manually typed in 3.14159 or the preprocessor replaced
PI with 3.14159. The compiler/linker usually creates entries in the final executable to
add debugging information. Since the compiler cannot identify preprocessor symbols
(because they have been removed already when it comes to the compiler), it cannot add
any debug information about preprocessor symbols in the executable. So, debugging
preprocessor macros is extremely difficult (if not impossible).
2.2 COMPILING
This is one of the most important steps in the build process where the object code is
created from the source code. The name for this step has become synonymous with the
entire build process.
In the compiling parlance, the code that the user creates is called the source code and
the output of the compiler is called object code.
In this step, the code written by the user is converted to machine understandable
code. The steps involved in compilation process can be split as
i. Parsing
ii. Object code generation
In the parsing step, the compiler parses the source file to validate the use of variables
and checks that the language grammar/semantics are not violated. The parsing step also
makes a note of the external variables that are used from other modules and the vari-
ables exported to other modules.
In the next step, the object code is generated from the corresponding high-level lan-
guage statements. Some compilers choose to create an intermediate assembly file and
invoke the assembler to create object code from the assembly listing produced.
The object code created cannot be executed yet. One of the important points to be
observed is that the compiler works on a single source file at a time. In compiler
parlance, each source file is called a ‘translation unit ’ (TU). An object file (.o or .obj) is
created for every translation unit. A TU is typically the preprocessed file produced by
the preprocessor.
P2P
Can a .obj file be created for a .h file?
There exists a one-to-one relation between every source file and object file. (At least,
every compiler that we know of replaces the .c extension with a .o/.obj extension for
the object file name.)
Note that it is really not necessary for a compiler to produce object code that is
executable on the same workstation on which it runs. The two are logically inde-
pendent. A compiler can also produce object code that can be executed on a different
processor. (The catch here is that the built code cannot be executed immediately on the
same workstation in the absence of emulators.)
Before venturing into the details of cross compiling, we have to know briefly how
software development is carried out for embedded systems.
The embedded software is finally executed on boards like the one shown in Fig. 2.2.
A board has a processor (ARM/MIPS/x86), some RAM, ROM, Flash and some
interconnect devices like Ethernet, UARTs etc. They usually don’t have any display or
keyboard attached. So, it is not possible to do any development of software for the
board on the board. If you have worked on an 8085 board, then you would remember
that the programs used to be typed in directly using a small numeric keyboard directly
in machine language.
So, wouldn’t it be nice if we could build the code for the board in the comfort of a devel-
opment environment on a PC/workstation, but execute the code on the target? A cross
compiler helps us do exactly this. Though a cross compiler runs on the host, like a
Unix/Windows workstation, the object code it produces cannot usually be executed on
the same workstation.
Definition
A cross compiler is defined as a compiler that produces object code for the processor in the
target rather than the host in which the compiler is executing.*
Analysis:
Compiler — The cross compiler is also a compiler because it converts the source
code to object code.
Processor in the target — The targets are usually boards/platforms that do not have a
display and keyboard. So, the compilation is done on a PC/Unix workstation. But
the object code produced by the cross compiler executes on the target and not on
the host.
*Not to mention that both the host and target processors can be the same, in which case the executable can
be run on the host also.
2.4 LINKING
The process of compilation ends after creating object files for every source file (trans-
lation unit). We still do not have a single executable. The object files though in machine
understandable format, are incomplete. Some of the incomplete parts could be:
a. References to external variables : Consider the case when a project consists of two
files. t2.c defines a global integer variable foo, and t1.c refers to it by an external
reference. Since the compiler works on only a single translation unit at a time,
while compiling t1.c the compiler can only assume the existence of foo and does
not know its exact location. So, the compiler adds the name foo to the list of
‘imported’ symbols while compiling t1.c, whereas it adds it to the list of ‘exported’
symbols while compiling t2.c. It is the duty of the linker to resolve these external
symbols (now does the linker error ‘unresolved external XXXX’ mean more to
you?). This is not limited to variables. The linker also links the references to
functions that are defined in other source files and libraries.
// t1.c
extern int foo; /* imported */
// ...

// t2.c
int foo; /* exported */
// ...
b. No binding to real address : Again, due to the nature of the compiler (working
with one file at a time), the addresses of different variables in different files will be
assigned relative to the particular file. When all the object files are linked togeth-
er to a single executable file, the address assigned to all these variables must be
unique. The linker takes care of assigning correct addresses to these variables.
Code in a file may also refer to a function in some other file. The linker fills these
addresses of the functions. Then, the code for a particular function may be in a
library. The linker will search the library and link appropriate code with the
application.
2.5 LOCATING
The output of the linker is a single executable. The linker would have resolved all the
external references among the object files, linked up the code from library, etc. Still, the
executable is not ready to run on the target!
The final step (at last) is called ‘locating’. This step is unheard of among developers
who work at the abstraction levels of COM, .NET, etc. Though it is done for every
executable, it is of high importance to embedded systems because it requires a lot of
input from the programmer.
This step finally adds the target specific information into the executable. This was not
required in the case of application programs because the OS takes care of most of the
locating issues. Typically, the locating is done at load time in many operating systems.
This is not the case in embedded systems, where locating must be done because devel-
opment is done on hardware platforms unique to each project.
A dedicated OS on the target that loads the executable to the memory is unheard of
in embedded systems. The programmer must explicitly provide the hardware specific
information of location of ROM, RAM, etc. and their sizes.
This information can be provided either as a command line option or inside linker
scripts. The linker (ld) in GNU GCC compiler collection has a sophisticated linker
script language that can be used to finely control the final executable. ARM linker
provided with the ARM developer suite (ADS 2.2) also has a good support for
these scripts.
The locator may be available as a separate program or be bundled together with the
linker. Please refer to the toolset documentation on specific information for controlling
the linker/locator.
By this time we are fairly comfortable with the various steps in the build process. The
entire build process is not done by a single huge monolithic executable. It is instead
accomplished by a set of executables. For instance, we saw that the preprocessing in our
case was done by cpp32.exe. But, who takes care of invoking the right component at
the right time?
The answer to this question is: the compiler driver.
The compiler driver is one of the least known terms in the developer commu-
nity. We usually think all the work of building is done by cc/bcc32/cl. But this is not
true. cc is simply a compiler driver that takes care of invoking the right component at
the right time and passing the correct inputs to them.
So far, we were only talking about providing various inputs to the linker/compiler, etc.
At the same time, we can request the linker to provide some output too (other than the
executable, obviously ☺).
At this stage, before continuing, the reader may require a small deviation to look at
section 2.10 in order to understand the concept of segments. At this stage of build pro-
cess, we do not know the exact location /address of (initialised) global variables, various
functions, etc. We also do not know the size of each function (in bytes), the amount of
debug information, etc. These are decided by the linker and known only to the linker.
To know these details, we can request the linker to generate MAP files.
The generated MAP file is huge, but some of its contents are shown in Fig. 2.4 for
explanatory purposes. (The map file was generated using the Microsoft cl compiler
(32-bit C/C++ Standard Compiler Version 13.00.9466 for 80x86).)
From the excerpt above, we can see that the .text segment (containing the executable
code) has segment number ‘0001’. The data segments (the initialised and un-initialised
data) belong to segment 3 (0003). The C program has 3 symbols described in the map
file. The first two are functions (foo and main) that belong to the text segment
(indicated by the symbol ‘f’ in the MAP file).
We had two global variables. As indicated above, only the initialised variable is
assigned an address in the map file. The symbol for the uninitialised variable
myvar_un_init is not present since it has been moved to the bss section.
Note
A MAP file can be generated using the –M option provided for the Borland compiler.
bcc32 –Ic:\Borland\bcc55\Include –Lc:\Borland\bcc55\Lib –M foo.c
We know very well that all global/static variables are initialised to zero. If in doubt,
execute the following program and check it out!
#include <stdio.h>
int a;
int main(void)
{
int b;
static int c;
printf("%d %d %d\n", a, b, c);
return 0;
}
The output of the program when compiled and run in our PC was:
0 575 0
The 575 is a garbage value and can change based on the memory location the
variable finds itself in. The point to note is that the variables a and c are initialised to
zero even before main() is entered. This is done during the initialisation of the bss
segment. This is sufficient proof that main() is not the first function that gets
executed.
Now, embedded systems are becoming complex and have various types of RAM.
Designers can choose to load different parts of program to different types of RAM (say
SRAM, SDRAM) during the execution of the program to meet performance and cost
requirements.
For this we need to instruct the linker to place various modules in various locations
in the memory. So, we need to create linker scripts.
A sample linker script is given below:
ROM 0x08000000 0x2000 ; “Region” “Base Address” “Maximum Size”
{
ROM 0x08000000 ; “Region” “Base Address”
{
foo.o (+RO)
}
RAM 0x0600000 ; “Region” “Base Address”
{
foo.o ( +RW )
foo.o ( +ZI )
}
}
Here, we are placing the read-only content (the .text section) in ROM and the rest in
RAM during execution. RO and RW stand for Read Only and Read Write respectively.
ZI is called the Zero Init section, which corresponds to BSS.
In the startup script, the memory for ZI section must be allocated and the contents
must be zeroed.
The syntax of these files is linker specific. The format given above is based on the
ARM™ linker (provided with the ARM developer suite). ‘ld’ of the GNU GCC compiler
collection also supports linker scripts.
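For comparison, a minimal GNU ld script expressing a similar placement might look like the sketch below (the region names, origins and lengths are illustrative assumptions, not taken from any particular board):

```
MEMORY
{
    rom (rx)  : ORIGIN = 0x08000000, LENGTH = 8K
    ram (rwx) : ORIGIN = 0x06000000, LENGTH = 64K
}

SECTIONS
{
    .text : { *(.text) } > rom          /* RO: code placed in ROM            */
    .data : { *(.data) } > ram AT> rom  /* RW: runs in RAM, load image in ROM */
    .bss  : { *(.bss)  } > ram          /* ZI: allocated and zeroed in RAM   */
}
```

As with the ARM scatter file, the startup code is responsible for copying the .data image from ROM to RAM and zeroing .bss before main() runs.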
In order to finish the discussion related to compilation, let us briefly touch upon how
this code is executed. We will take this process further when we describe memory types
and we will explain available tools in Chapter 11.
As we mentioned before, the typical process of code development for embedded
systems happens on a host machine. This host machine may have an integrated
development environment and a cross compiler in order to produce machine code
understandable by the embedded hardware. Figure 2.6 illustrates this concept. The
developer works on the host system and “loads” the machine code on the target hard-
ware memory (usually Flash or some kind of ROM— we will discuss this in Chapter 3)
using a serial link or Ethernet connection.
P2P
Why are host and target called so?
As we will see in Chapter 11, Fig. 2.6 is a simplified version of the actual setup; usu-
ally there are more tools available to help the developer.
Fig. 2.6 Development of embedded platforms
int g_count = 10;
int g_total;

int main ( void )
{
int i;
for (i = 0; i < g_count; ++i)
g_total += i;
return 0;
}
The above program does nothing useful, but is taken for demonstration of the various
program segments. The program code resides in the text segment. The program has some
data (the variables) and some code (the for loop) that operates on the data.
The data can be classified as:
i. Global Variables
■ Initialised
■ Uninitialised
ii. Local Variables
Note
Global variables, global variables with file scope (global variables defined with the static qualifier) and
static variables defined inside a function scope all have static storage. Programmers often get carried
away by this ‘static’ notation. All the above three kinds of variables have space allocated at
compile/link time. So, their addresses are constant, i.e. static, throughout the lifetime of the program.
This is unlike local variables, which get created on the stack and have varying addresses for every
invocation. (In fact, to enable recursion, we require each instance of a local variable of the same
function to have a different address.)
int a[100] = {0, 1, 2};
This means that a[0] = 0, a[1] = 1, a[2] = 2 and the rest, a[3] to a[99],
are initialised to 0.
But the danger of the above initialisation is that, for the sake of initialising 3 members
of the array, we have added the space needed for 100 integers to the size of the final
executable. If the integer size on your machine is 4 bytes, then the code size increases
by 400 bytes. That is quite a lot of memory in embedded systems.
Sometimes, global variables are initialised to zero for the sake of clarity.
int i = 0;
int j = 0;
int main(void)
{
// . . .
}
But this initialisation to zero is superfluous and is not required; as a result of the ini-
tialisation, the final code size will increase by a few bytes. Uninitialised global variables
are automatically initialised to zero. The main objective of this section is to bring
home the point that initialised global variables are part of the data segment (and take
space in the final executable).
*Henceforth, in this chapter, whenever a reference is made to global variables, it is also applicable to
static variables unless explicitly mentioned otherwise.
2.10.3 BSS segment
This is the place for uninitialised global variables. Adding uninitialised global variables
does not increase the code size. A note of the size of the bss segment is kept in the exe-
cutable. During the loading of the executable, the size for the bss segment is allocated
at runtime. Then, the memory allocated for the BSS segment is zeroed. (Now we know
how global variables are initialised to zero).
Tips
If you have global integer/array variables initialized to zero, then, you may very well remove the
initializers to save object code size.
// bssarray.c
#include <stdio.h>

int a[100];

int main ( void )
{
return 0;
}

// dataarray.c
#include <stdio.h>

int a[100] = {0, 1, 2};

int main ( void )
{
return 0;
}
In our system, when we compiled the above programs with the Borland compiler,
we got the following output:
We can see that the size varies by approximately 400 bytes (the size of 100 integers on
our machine). (The same code, when compiled with the Microsoft cl compiler, produced
the following output. Here, the difference is exactly 400 bytes as expected. This is not a
benchmark test to compare two compilers. We just want to show that different compil-
ers can produce different outputs and that around 400 bytes get added to the code
size when we have an initialised array instead of an uninitialised one.)
The standards (C (C9X) and C++ (ISO 14882)) say that the BSS segment must be filled
with zeroes. But some types may not have a logical value of zero for an all-zero bit
pattern and can have their own representation. (For example, for a global floating-point
number whose bytes are all initialised to zero, the value it takes is decided by the
compiler implementers. But, in most cases, we can assume that the value is zero.)
Function call
Usually a function is called with some arguments. The function performs some opera-
tion based on the arguments and may return a value to the function that called it. The
arguments that need to be passed to the function occupy memory.
Note
The class of functions that do not take any argument, but return a value, are called ‘generators’ or
‘sources’, e.g. rand() in the standard library, which returns a random value. The class of functions
that take input but do not return any value are known as ‘sinks’. Sinks are not to be confused with
procedures, which are used to perform a specific operation based on their arguments.
While designing systems we must be careful to avoid sources and sinks, since they are not natural
in systems.
Computation
A function typically uses local variables to store intermediate values. These local vari-
ables are created during runtime and require memory.
The mechanism that takes care of these dynamic memory requirements is the stack,
which actually forms part of the C runtime. The data structure that is used
to handle the above memory requirements is called a stack frame.
Consider the following code:
// ...
void foobar ( int b )
{
// ...
}

void foo ( int a )
{
foobar(a);
}

int main ( )
{
int a = 0;
foo(a);
// ...
}
main() calls foo(), which in turn calls foobar(). Considering that no other functions
are called in between, the stack would look like this:
Note
This is just a sample representation of stack frame. For the exact structure (in Unix), we can refer to
frame.h file.
We can now see that both passing arguments and returning a value take space. So,
whenever a big structure needs to be passed to a function, it is advisable to pass a
pointer to the structure rather than the structure itself, because the structure takes a lot
of space on the stack. Moreover, time is also spent in copying the big structure into the
memory allocated for the local variable. Similarly, while returning a structure, it is
preferable to pass back a pointer rather than the entire structure. We just have to make
sure that we don’t pass the pointer of a structure created in local memory (since that
will create a dangling pointer).
Tips
Whenever you pass a structure/class as an argument or return a structure from a function, it is always
preferable to pass their pointers (or references (in C++)) rather than the actual structures themselves.
This will save a lot of stack space.
Note
The order in which the arguments are pushed into the stack gives rise to an interesting
situation. Consider the following function call:
foo(a, b, c); // Let a, b, c be 3 integers – 4 bytes each
Assembly language code for pushing the arguments into the stack can be:
push a into stack
push b into stack
push c into stack
So, at the called side (i.e. inside foo), the arguments are retrieved as
pop c from stack
pop b from stack
pop a from stack
Note that these are in reverse order of the arguments that are pushed because stack
is a LIFO (last in first out) structure.
So far, it seems good. But now, we can question ourselves, “Why push ‘a ’ in the stack
first? Why not c? ”
The answer is: either way is OK so long as both the caller and callee agree to the con-
vention. The two types of conventions are called ‘Pascal’ calling convention and ‘C’
calling convention, respectively.
In the pascal calling convention, the leftmost argument is pushed first (in this case, a)
and the rightmost last (c, in this case). So, c is the first argument popped by the callee.
In C, the rightmost argument is pushed first and the leftmost (i.e., the first argument)
last into the stack. This enables C to handle function with variable number of arguments
(The classic example being printf).
The first argument can provide data on the arguments that follow it. In the case of the
classic printf, the first argument tells about the sizes of the arguments that follow (%d =>
integer – 4 bytes (on 32-bit machines), %c => character – 1 byte, etc.).
Consider
printf (“%d %c”, a, c);
printf (“%c %c %c %c %c”, v, w, x, y, z);
In both cases, 5 bytes are pushed into the stack. These five bytes can be seen as
5 characters, an integer and a character, two short integers and a character and so on.
To identify the exact arguments passed, printf takes help from the first argument,
which is standardised as a string. For example, in the format string, if printf sees a %d, it
infers that the data in the byte stream is an integer and pops 4 bytes. If it sees a %c, it
infers that the byte on the stack is a character and pops only one byte. This is possible
because the first argument — the format string — is pushed last. So, functions with the
C calling convention can have a variable number of arguments while those with the
pascal calling convention cannot.
Usually, it is possible for the programmer to control the way the arguments are
pushed by using compiler specific options.
int __cdecl foo(int a, float b);
int pascal foobar (char a, int b);
Windows programmers would be familiar with int pascal WinMain(). On Intel
architectures, it was found that functions using the pascal convention took less space. So,
Windows programmers use the PASCAL prefix for functions that do not take a variable
number of arguments.
The build process converts a human understandable code to a machine executable for-
mat. The build process consists of preprocessing, compiling, linking and locating.
Preprocessor strips comments, expands include files and macros. Compilation parses
the pre-processed code and generates object code for a particular file. Embedded
systems usually use a special type of compilation called cross compilation since the code
is supposed to execute on a target platform different from the host. After compilation,
the linker is called to resolve external references and perform binding to real address-
es. The final step in the build process is locating, which means giving details of posi-
tioning and size of RAM/ROM and segments. The steps of build process are taken care
of by a compiler driver that resolves all arguments and ensures invoking of the right
component at the right time.
A program consists of four segments: data, bss, stack and text. Data segment stores
initialised global variables. Bss segment stores uninitialised global variables. The stack
segment stores parameters and data while the program is executing. Its most important
job is to keep track of arguments, local variables and return addresses when functions
are called. Text segment stores the actual code.
Types of Memory
Most embedded systems use memories in various ways. Memory is required inside
embedded systems for a variety of reasons: to store the executable code, to load the
instructions to be executed as well as the associated data, to save important information
that can be changed by the user during sessions. We will introduce these causes of
memory usage inside embedded systems in this section. Then we will deal with the
techniques used to satisfy these requirements inside embedded systems.
Let us begin this chapter by giving an introduction to the types of memory used by
embedded systems. As embedded system engineers, it is always advantageous to peek
into this domain so that we can appreciate the usage of different types of memory. We
can understand the type of memory, which should be used for a specific activity. Also,
this gives us an understanding of the capabilities of these memories and the pros and
cons of using a specific type of memory. We will discuss ROM and its kinds,
RAM and its types, as well as flash memory. We will necessarily keep this dis-
cussion short. The interested reader is advised to consult the exhaustive literature avail-
able on memory.
Basically embedded systems require memory to store the following classes of data:
Data related to executable code in machine instruction format: This is usually burnt*
while the device is being manufactured. This kind of data is typically not changed dur-
ing the lifetime of the device. This kind of data requires a write-protected or read only
memory — that is, once filled, this memory will not be changed during the lifetime of
the product.
Data storing the current context of execution: This data is usually the variables being
used by programs and their stacks. This memory is very volatile. Since the variables
and stacks make sense only when a program is executing, it is expected that the con-
tents of this memory can be lost when power is turned off. However, it is also expect-
ed that this kind of memory is fast to access for reading as well as writing because the
realtime behaviour of a device will also be governed by this factor. This type of memory
should be fast, erasable and volatile — or, in technical jargon, random access memory.
(We will see later that this name is something of a misnomer.) Embedded systems use
different types of random access memory based on the amount of data used, the cost of
the device and the requirements on speed.
P2P
Does access to RAM need to be deterministic?
Configuration data: This data relates to configuration of the device. For example, in
DECT* phones we can store the phonebook in the form of name-telephone number
pairs. Now this phonebook is expected to remain intact if the phone is switched off and
back on again. However, it is also expected that entries can be added, deleted and
changed over and over again. This kind of memory should be capable of being altered.
It should not be volatile so that the data is not lost at switch off. It is a sort of mixed type
of the earlier memories. This memory does not have very stringent requirements on
speed though. So this memory should be non-volatile and changeable.
A memory access is the procedure used to read from and write into memory. Each
access involves the memory controller generating the correct signals to specify which
memory location needs to be accessed, based on the data accessed by the program.
The data then shows up on the data bus connected to the processor or any other
device that requested it.
Memory chips are organised as rows and columns of data. For example, a 16-Mbit chip
can be accessed as a 4M×4 block. This means that there are 4M (4,194,304) addresses
with 4 bits each; so there are 4,194,304 different memory locations —sometimes called
cells— each of which contains 4 bits of data. 4,194,304 is equal to 2^22, which means
22 bits are required to uniquely address that number of memory locations. Thus, in
theory, 22 address lines are required.
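The arithmetic above generalises: the number of address lines needed is the smallest k with 2^k covering the number of cells. A minimal sketch in C (the function name is ours, not from the text):

```c
#include <assert.h>

/* Smallest number of address lines k such that 2^k covers n locations. */
unsigned address_lines(unsigned long n)
{
    unsigned k = 0;
    while ((1UL << k) < n)
        k++;
    return k;
}
```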
*DECT—Digitally Enhanced Cordless Telephone.
Types of Memory 53
However, in practice, memory chips do not have this many address lines. They
are instead logically organised as a “square” of rows and columns— sometimes called
wordlines and bitlines respectively. The low-order 11 bits are considered the “row”
and the high-order 11 bits the “column”. First the row address is sent to the chip,
and then the column address. For example, let’s suppose that we want to access
memory location 3,780,514 in this chip. This corresponds to a binary address of
“1110011010111110100010”. First, “11110100010” would be sent to select the “row”, and
then “11100110101” would be sent to select the column. This combination selects the
unique location of memory address 3,780,514. The selected cell then sends
its data out over the data bus via the data interface.
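The row/column split described above amounts to plain bit operations (an illustrative sketch; the helper names are ours, with the book's 11 row bits):

```c
#include <assert.h>

#define ROW_BITS 11  /* the low-order bits form the "row" in this example */

/* Split a 22-bit cell address into its row and column parts. */
unsigned row_of(unsigned long addr)    { return (unsigned)(addr & ((1UL << ROW_BITS) - 1)); }
unsigned column_of(unsigned long addr) { return (unsigned)(addr >> ROW_BITS); }
```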
Figure 3.1 shows at a conceptual level, an example of memory access with an 8*8 row
and column grid. Note that the grid does not have to be square, and in fact in real life
it’s usually a rectangle where the number of rows is less than the number of columns.
P2P
What is the advantage of using a rectangular address grid?
This is analogous to how a particular cell on a spreadsheet is selected and set: row #34,
say, and then look at column “J” to find cell “J34”. Similarly, for example, how do chess
connoisseurs track the moves made by Viswanathan Anand yesterday in New York?
Elementary, my dear Watson: just label the chessboard rows 1 to 8 and columns a to
h. Now, all moves can be represented by a series of digit–letter combinations.
Let us now get back to the world of memory chips ☺. If we apply common sense to
this theory, we can argue that designing memory chips in this manner is both more
complex and slower than just putting one address pin on the chip for each address line
required to uniquely address the chip—why not just put 22 address pins on the chip?
The answer may not surprise many people: it finally comes down to cost. Especially since
so many embedded systems do not have hard realtime constraints, we can live
with a few memory-access delays if that makes the system simpler and cheaper. By using
the row/column method, it is possible to greatly reduce the number of pins on the
DRAM chip (we will explain how this “D” came before RAM very soon ☺). Here, 11
address pins are required instead of 22. However, it should be noted that additional
signalling is required so that the memory chip and the accessing device always stay syn-
chronised about what they are expecting. This signalling pin is usually called a strobe or
chip select. One thing is for sure: everything else remaining constant, having to send
the address in two “chunks” slows down the addressing process, but by keeping the chip
smaller and with fewer inputs we gain in terms of power consumption and space
(because of the reduction in the number of pins). The reduction in power consumption
further leads to an increase in the speed of the chip, partially offsetting the loss in access speed.
Figure 3.2 shows a typical memory chip with 8 address lines and two data lines.
With the aid of Figs. 3.3 and 3.4 respectively, let us trace the steps for read and write
operations through this chip.
For writing:
i. The address of the cell to be written to is placed on the address pins via the
address bus: this is done by first asserting RAS and putting the appropriate row
number on the address bus, followed by asserting CAS and putting the appropriate
column number on the address bus.
ii. The Write Enable pin is set: the bit that needs to be stored in the chip is sent on
the Data In pin via the data bus.
iii. Chip select is activated to select the memory chip: when all these operations
are performed simultaneously, the bit on the Din pin is written inside the chip at the
address specified by the address bus.
In actual practice, memory is accessed at least a byte at a time, and not a bit at a time.
This is accomplished by stacking each such chip into blocks of eight and combining the
bit-data streams from these eight chips. When these chips need to be addressed, the
Chip select is enabled on all of them, and the same address is specified on all address
lines. Depending on the capacity of the data bus, each such block can again be stacked
to make mega-blocks that can service data in multiples of a byte.
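The stacking of eight one-bit chips can be pictured as follows (a conceptual sketch; `bits[i]` stands for the Dout of chip i, all chips being driven with the same address):

```c
#include <assert.h>

/* Combine the one-bit outputs of eight chips into a byte: chip i supplies bit i. */
unsigned char assemble_byte(const unsigned char bits[8])
{
    unsigned char b = 0;
    for (int i = 0; i < 8; i++)
        b |= (unsigned char)((bits[i] & 1u) << i);
    return b;
}
```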
Definition
The amount of time that it takes for the memory to produce the data required, from the start of
the access until when the valid data is available for use, is called the memory’s access time,
abbreviated tAC (See Fig. 3.4).
Access time is normally measured in nanoseconds (ns). Memory available today nor-
mally has access time ranging from 5 to 70 nanoseconds.
Figure 3.5 identifies the different types of memory. Let us look at each below.
3.2.1 RAM
Random access memory (RAM) is a read-write memory. RAM is considered “ran-
dom access” because any memory location can be accessed directly instead of via a
sequential operation from the beginning of the memory. Being random access does not
define this kind of memory completely, but it is sufficient to distinguish it from its
opposite, serial access memory (SAM). SAM stores data as a series of memory cells that
can only be accessed sequentially (like a cassette tape): data is searched from the
beginning of the memory until it is found or the end of memory is reached. SAM works very well for
memory buffers, where the data is normally stored in the order in which it will be used
(a good example is the texture buffer memory on a video card). A random access
memory on the other hand can directly address particular portions of memory. This
makes it fast and expensive as compared to SAM.
RAM is the place in embedded systems where the program, its stack and the
data in current use are usually kept so that the processor can reach them quickly. At the
beginning of execution (switch-on of the system), these initial values are loaded into RAM.
RAM is used since it is much faster to read from and write to than its distant cousin, the ROM.
However, the data in RAM is volatile and stays there only as long as the system is pow-
ered up. When the system is turned off, RAM loses its data. When the system is
switched on again, the binary image is again loaded from ROM and all stack and data
are initialised afresh.
Tips
Who loads this program and data into RAM? It is the job of your startup code, which must know the
amount of data and the space it requires in RAM.
so the DRAM controller just keeps on reading periodically from each cell. For that rea-
son there is a refresh timer in the figure. The DRAM controller takes care of
scheduling the refreshes and making sure that they don’t interfere with regular reads
and writes generated by the processor or some other device. The DRAM controller period-
ically sweeps through all of the rows by cycling RAS repeatedly and placing a series of
row addresses on the address bus. The upside of a DRAM cell is that, since it is so simple,
it is small in size and less expensive. The downside is that all this refreshing takes time
and slows down the memory, particularly as compared to its sister—the SRAM.
Note
Earlier in the chapter, we had pointed out that a DRAM chip is usually in the form of a rectangle
instead of being a square. Now is the time to explain this fact. Since DRAM uses RAS to periodically
sweep through the entire RAM area, the fewer rows the chip has, the less time it takes to refresh
them all. Consequently, DRAM makers design DRAMs with fewer rows than columns, thus
resulting in a rectangular layout.
As DRAMs became more sophisticated, it became common to put this refresh cir-
cuit from the system board directly onto the DRAM chip itself. When this is done, from
the outside, it appears that the memory is behaving statically since it does not require
any refresh circuit from outside. In reality, however, it is still a DRAM since each mem-
ory cell is being constantly refreshed on the chip; only the position of the source of
refresh operation has changed. When the refresh circuit is integrated with the DRAM
chip, the device is called a Pseudostatic DRAM.
DRAM is of two kinds: asynchronous and synchronous. An asynchronous DRAM
is free to start its operations irrespective of the clock; however, this requires some
co-ordination time so that the devices on either side can judge when the signal levels
on the pins have changed. SDRAM, or synchronous dynamic RAM, is so called
because this memory marches in step with the system clock instead of allowing itself the
asynchronous freedom to respond at its own pace and on its own schedule. SDRAM
“locks” (synchronises) memory access to the CPU clock. This way we get faster data
transfer: while one portion of data is transported to the CPU, another may be being pre-
pared for transfer. Additionally, the chip stays on the row containing the requested bit and
moves rapidly through the columns, reading each bit as it goes. The idea is that most
of the time, the data asked for from the device will be in consecutive locations inside
memory. To understand why this helps, let us take a look at Fig. 3.7, where we show an asyn-
chronous operation of reading two bits.
For each operation, synchronisation has to be maintained between RAS, CAS, etc.
In Fig. 3.8, the corresponding operation for SDRAM has been illustrated. Data starts to
be read from contiguous memory locations after the row and column have been
specified; each clock tick initiates a read operation from the next column.
SDRAM typically has an access time of only 6–12 ns. Another variant of SDRAM is
called DDR SDRAM, or double data rate SDRAM. It is a newer, clock-
doubled version of SDRAM, which is replacing plain SDRAM nowadays.
SRAM
SRAM or static RAM is so called because it retains any information stored in it, as long
as power is maintained. The data just sits there, calmly awaiting retrieval by the system
command. Upon receiving an order to over-write the data or to provide some data
being retained, the SRAM is very fast to respond. That’s one of its endearing qualities.
SRAM uses a completely different mechanism to store information. An SRAM
cell is usually made up of a flip-flop comprising four to six tran-
sistors, arranged in a configuration that traps either a binary 1 or a binary 0 between
them until that value is either overwritten with a new value or the power goes out. This
configuration never needs refreshing as long as power is applied. This makes SRAM
much faster in response time than DRAM and very power efficient. SRAM can be
made with access times as short as 4 ns. However, because each cell uses more transistors,
an SRAM cell takes more space than a DRAM cell. This means that a chip can-
not hold as many cells as a DRAM chip of the same size, which makes SRAM more
expensive than DRAM. SRAM is normally used in places that require a very quick response
time, such as caches.
3.2.2 ROM
ROM, or read only memory, as the name suggests, is a memory from which we can
only read. This means that ROM cannot be written over and over again. This memo-
ry retains its contents even when the power is switched off; hence it is
used to store anything that needs to survive the system being switched off and
on again. What kind of information can this be? It is usually the actual program that
will be executed on the embedded system. Since this memory does not get erased at
switch-off, it is also called nonvolatile memory.*
Because of the way it stores information (as we will see soon), ROM is much slower
compared to RAM, typically having double the access time of RAM or more. However,
*One notion that often confuses people is that ROM is the “opposite” of RAM because RAM is read-
write and ROM is read-only: since RAM stands for “random access memory”, ROM, they conclude,
must not be random access. This is not true. ROM is also a random access memory: inside ROM, any
location can be read in any order; it is just not writeable. RAM got its name because the primitive
read-write memories introduced in the beginning were sequential and did not allow random access.
The name stays with RAM even though it is no longer distinctive. ☺
it is expected that ROM need not be accessed as frequently as RAM, so this limitation
can be lived with. This, combined with the fact that ROM is considerably cheaper than
RAM per byte, gives it a definite advantage.
Definition
RAM is often used to shadow parameters stored in EEPROM (RAM is mapped to ROM’s
memory space) to improve performance. This technique is called ‘ROM shadowing’.
While the purpose of a ROM is that its contents cannot be changed, there are times
when being able to change the contents can be very useful. Sometimes it is desirable
that the memory remains read-only for all normal circumstances and it should be
possible to over-write it by specially defined processes. For example, in a mobile phone,
it will be worthwhile to store a specific type of ringer tone into such memory that cannot
be erased when the phone is switched off. However, it should also be possible to update
this tone from time to time.
Similarly, the user settings inside a washing machine need to be nonvolatile across
switch-off. However, it should be possible to choose different settings for dif-
ferent clothes (cotton, wool…) and modes of operation (spin dry, double wash…).
Hence, there exist a lot of ROM variants that can be changed under certain circum-
stances; these can be thought of as “writeable nonvolatile memory”. The following
sections describe the different types of ROMs with a description of their relative
modifiability.
Regular ROM
A regular ROM is constructed from hardwired logic, encoded in the silicon itself, much
the way that a processor is. It is designed to perform a specific function and cannot be
changed. This is inflexible, and so regular ROMs are generally used only for programs
that are static (not changing often) and mass-produced. ROM is analogous to a com-
mercial software CD-ROM that can be purchased in a store.
While DRAM uses a transistor–capacitor combination to store data, a ROM uses the
presence or absence of a diode at each row–column junction to determine whether a 1
or a 0 has been stored at the location. When the ROM chips are “burned”, the cells
where a 1 has to be stored get a connected diode; the cells where a 0 has to be stored
are left unconnected. The diodes at these intersections allow current to flow in
only one direction and, like all diodes, need a voltage above the forward threshold (of
the order of 600 mV) before they conduct. To determine whether a cell
has a 0 or a 1, a voltage above 600 mV is applied to a column while keeping the row
grounded. If the diode is connected, the current will be conducted to the ground. Using
this method, the status of each cell can be read.
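The read method can be modelled as a lookup in a diode-presence matrix (purely conceptual; the 4×4 bit pattern below is invented for illustration):

```c
#include <assert.h>

/* 1 = a diode was fabricated at this row-column junction (cell stores 1). */
static const unsigned char diode[4][4] = {
    {1, 0, 1, 0},
    {0, 1, 0, 0},
    {1, 1, 1, 1},
    {0, 0, 0, 1},
};

/* Drive the column above 600 mV and ground the row: current flows only
   where a diode exists, which is read as a 1. */
int rom_read(int row, int col)
{
    return diode[row][col];
}
```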
Obviously, since a 1 in a cell is indicated by the physical presence of a diode, this
kind of memory cannot be changed and reused. Once the ROM is manufactured, it can
only be read. And if there are some bugs in the values, well, unfortunately, the whole
chip has to be thrown away. This makes the chip design process long and cumbersome. On
the upside, ROM chips are very cheap to mass-produce, have high reliability over
a long duration and consume very little power.
3.2.3 Flash
Flash memory is similar to EEPROM in design. The difference is that it can be erased
and reprogrammed in blocks instead of one byte at a time. In-circuit wiring is used
to apply electric charge to an entire chip or to specific sections called blocks, each
usually of size 512 bytes. Being light, compact and energy-efficient, typical uses of
FLASH are in CompactFlash, SmartMedia and Memory Stick cards (most often found in digital
cameras) and PCMCIA type I and type II memory cards (used as solid-state disks in
laptops). The original intended usage of FLASH memory was to replace mass storage
devices like disk drives and tapes. Flash memory in the form of a card or stick is very
versatile and can be used across devices if a standard file system is used to represent
the data inside it. This is the concept of so-called linear flash.
There is another kind of FLASH called ATA flash. An ATA flash memory mod-
ule interfaces with the rest of the system using the de facto “AT Attachment” standard.
The FLASH gives the illusion that it is made up of sectors like a hard disk, and the
same APIs can be used as for accessing a disk drive. The main advantages of ATA flash,
from the embedded system developer’s perspective, are flexibility and interchange-
ability with hard disks. While linear flash modules aren’t 100% interchangeable
between devices, ATA flash overcomes this limitation by using a standard AT interface
for accessing it. ATA flash can be accessed using an operating system’s standard disk
access code and the same file system APIs. This aids cross-compatibility.
For example, a flash memory card inside a digital camera uses a format to store
data that is compatible with the way a PC Card stores it. Hence, the card can just be
inserted into a PC Card slot and read directly by the computer. Not only does this
promote cross-compatibility, it aids debugging as well, since the limitations of an
embedded system (lack of screen and input device) are easily surmounted.
There are additional advantages. The built-in file system is robust enough to perform
some housekeeping tasks. For example, it can detect areas of memory that are defective
and forbid read-write access to these regions. It can also maintain a mechanism of
virtual sectors that point to physical sectors in such a way that read and write accesses
are spread evenly across the chip, thus preventing heavy usage and the associated wear
of any particular portion of the chip. As expected, everything in this world comes with
a price ☺: all these features make ATA flash more expensive and power-hungry.
Because of speed limitations, flash memories incorporate built-in SRAM buffers, dupli-
cating the contents of a block of memory from the flash array for fast access.
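The virtual-sector idea can be sketched as a remapping table (a hypothetical sketch with invented names; real flash translation layers are far more involved):

```c
#include <assert.h>

#define SECTORS 8

/* Virtual-to-physical sector map: the file system always addresses
   virtual sectors; this table decides where each one really lives. */
static int v2p[SECTORS] = {0, 1, 2, 3, 4, 5, 6, 7};

int physical_of(int vsec) { return v2p[vsec]; }

/* When a physical sector wears out, move the virtual sector to a fresh one. */
void remap(int vsec, int fresh_psec) { v2p[vsec] = fresh_psec; }
```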
Memory is used inside embedded systems to store executable code, to hold the
instructions being executed together with their data, and to store information that can
change between sessions (for example, user preferences). This gives rise to different
types of memory inside embedded systems. Memory chips are usually arranged in the
form of rectangles as rows and columns of data. Typically, memory chips use the same
address lines for the row and column addresses, reducing the number of lines significantly.
The data is read or written based on the RAS and CAS pins.
There are various classes of memory based on their usage. Random access mem-
ory can be accessed in a non-serial way and loses its contents if the power
supply is disconnected. It is of two types: dynamic RAM needs constant refreshing,
typically many times a second, to prevent it from forgetting its data, while static RAM
does not need refreshing to retain its contents. Read-only memory is a
form of ‘not-easily-erasable’ memory. A regular ROM is programmed once at
manufacture and its contents cannot be changed afterwards. The programmable ROM
(PROM) can be programmed once by the user through electric current. The EPROM
can have its contents erased and reprogrammed any number of times. EEPROM is similar to
EPROM in operation, except that its contents can be erased under software control
through electric charge. Flash is a type of EEPROM that can be reprogrammed in
blocks instead of one byte at a time. Flash memory has enhanced the cross-compatibility of
memory across embedded devices through the use of a standard AT interface.
Memory Management in
Embedded Realtime Systems
Memory is a very precious resource in embedded systems. As we saw in the chap-
ter ‘Introduction to embedded/realtime systems’, price is one of the critical factors in
designing embedded systems, especially while targeting high-volume products. We have
seen a rapid fall in memory prices for desktop systems, where current configurations
have 256 to 512 MB of RAM as a minimum. But in embedded systems, even 8 Mbit
of Flash and a few MB of RAM can be considered a pre-
mium. Desktop systems also have the luxury of huge hard disks, which can be
used as supplementary virtual memory. Few embedded systems have hard disks, so the
concept of virtual memory is absent in most embedded systems. Since memory
is scarce, it should be managed properly.
Before jumping into memory management schemes for embedded systems, let us see
how memory was managed before.
machine language translations. (Gurus were those people who knew the op-codes for
50–60 assembly instructions.)
When variables were absent, there was no point in talking about arrays. An array, by def-
inition, is a “finite set of homogeneous data stored in contiguous memory locations, each location
accessed by the array variable and its index/indices”.
With this definition in mind, we used to allocate contiguous memory locations on
paper in our memory map and then calculate array locations.
Static allocation
This covers global variables and variables defined with the ‘static’ qualifier.
The addresses of these locations are fixed at compile/link time and
do not vary while the program is executing.*1
Automatic allocation
This kind of allocation is done at runtime for variables defined in every function (or
scope). The allocation of memory for local variables happens automatically during the
execution of the program (by the C runtime*2); hence it is christened ‘automatic’ alloca-
tion. This is also the reason for the existence of a keyword called ‘auto’ (ever heard of it?) in C
that is almost never used.
The keyword ‘auto’ is a qualifier for a variable defined in a scope local to a function.
In the listing below, memory for the variables a, b and c is allocated automatically;
the keyword auto is used to explicitly qualify variable c for automatic allocation.
Memory for all these variables is allocated on the runtime stack (as
explained in the chapter “Build process for embedded systems”).
void foo ( void )
{
    int a;
    int b;
    auto int c;   /* explicitly qualified as automatic */
    // . . .
}
Dynamic allocation
The third way to allocate memory for variables is to demand some amount of memory
from a component called the heap. In the earlier two cases, the compiler is responsible
for allocating memory for the variables and, so, it is also its duty to take care of de-allocating
that memory.
The user can use variables that are statically or automatically allocated without both-
ering much about memory leaks. He should worry, however, if he uses a large number of
initialised static variables, since that will cause the size of the binary image to increase.
The first two methods of allocation are explained in Chapter 2. This chapter deals
with third kind of allocation (dynamic allocation) and the runtime data structures.
C programmers will be familiar with good old malloc and free (and C++
programmers with their counterparts, new and delete). However, there is a huge variety of heap man-
agement functions available, especially in the embedded/realtime space. These heap
management functions can be classified as
■ Variable buffer size routines
■ Fixed buffer size routines
Before going deep into these functions, we need to know the typical implementations
of these memory allocation functions and their effects on the realtime behaviour of our
program.
Note
Some of the popular myths in the common programming community are listed below:
i. The size of memory allocated by malloc is exactly equal to the size requested
ii. There is no harm in freeing a memory twice
iii. Malloc/Free request and return memory from the OS
iv. Malloc/Free do not take much time (i.e. they are pretty fast)
T* p = (T*) malloc ( sizeof(T) );
Here, T can be any type (a predefined type such as int, or a user-defined type like a
struct).
*Programming idioms are constructs (like the one described here) that have been used so repeatedly that
programmers use them even without consciously thinking about them.
Novice programmers generally feel that malloc allocates space exactly equal to the
size of type T. To understand why this is usually not the case, we must understand how
malloc works.
Malloc (and other allocation functions) requests a block of memory during pro-
gram startup. This memory can then be used by malloc to provide memory to its callers.
This memory is organised by malloc and is termed the ‘heap’.
We should understand that the entire memory of the heap is not available to the pro-
gram. Some part of the total memory is required to maintain the heap itself. So, the total
memory of the heap can be divided into
■ the memory required to maintain the heap
■ the memory available to the program
Why do we need memory to maintain the heap? To answer the question let us
remind ourselves of how malloc and free are used.
T* p = (T*) malloc ( sizeof(T) );
free ( p );
We see that we pass the amount of memory required to malloc, but not to free.
This information is stored somewhere else in the heap. The space used to store the
size associated with each pointer is said to be used for maintenance of the heap.
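A toy allocator shows why free() needs no size argument: the size can be tucked into a header just before the returned pointer (an illustrative sketch only, with invented names; real heaps are far more elaborate):

```c
#include <assert.h>
#include <stddef.h>

static size_t pool[256];   /* toy heap, already size_t-aligned */
static size_t brk_ = 0;    /* next free index into pool, in size_t units */

/* Store the requested size in a one-word header; return the bytes after it. */
void *toy_malloc(size_t n)
{
    size_t *h = &pool[brk_];
    *h = n;  /* header: what a free() would later need to know */
    brk_ += 1 + (n + sizeof(size_t) - 1) / sizeof(size_t);
    return (void *)(h + 1);
}

/* What a free(p) could recover without being told the size. */
size_t toy_size(void *p)
{
    return *((size_t *)p - 1);
}
```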
Also, the available memory in the heap cannot be viewed in units of one byte, but
as chunks of blocks with unit sizes of 16, 32, etc. bytes (usually powers of 2).
The concepts discussed above are illustrated in the example below:
Suppose we have 2K (2 × 1024 = 2048 bytes) of memory, and we analyse two
cases where the memory is split into 16-byte and 32-byte blocks. (We should remind ourselves
that the entire 2K may not be available to the program; but for the sake of
keeping this discussion simple, we consider the entire memory as available to
programs.)
Memory can be acquired or released only in the units into which the memory is divid-
ed (i.e. 16 or 32 bytes), as shown in Fig. 4.1.
Fig. 4.1: For a 70-byte request, 3 × 32-byte blocks (96 bytes) are allocated with 32-byte units, and 5 × 16-byte blocks (80 bytes) with 16-byte units.
Here we can see that for a 70-byte request, the memory allocated is 96 bytes and 80 bytes
respectively for unit sizes of 32 and 16. We may infer that by reducing the
unit size we can reduce the wastage of memory, the ideal case being a unit size of 1. To
see why this is not done, we can look at some implementations of the heap. In some
RTOSes, the first few blocks of a memory chunk are reserved for maintaining the
data regarding the allocation of blocks (Fig. 4.4).
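The rounding in the 70-byte example is just “round up to a multiple of the unit size” (the helper name is ours; the unit must be a power of two):

```c
#include <assert.h>

/* Round a request up to a whole number of allocation units.
   unit must be a power of two (16, 32, ...). */
unsigned long alloc_size(unsigned long request, unsigned long unit)
{
    return (request + unit - 1) & ~(unit - 1);
}
```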
The amount of memory required to maintain this data is inversely proportional to the
unit size. If the unit size is small, then the number of units (total memory/unit size) will
be high, and hence the memory needed to maintain the data regarding those units
becomes high (thus reducing the total available memory).
Thus, by decreasing the unit size, we may actually waste more memory than we
think we are saving. This can be treated as a classical maxima/minima prob-
lem in calculus.
In desktop implementations of heap managers, the problem of deter-
mining and assigning unit sizes for heaps usually does not arise. But with an RTOS, the burden of
determining unit sizes rests with the programmer.
Fig. 4.4: The total memory of the heap is split into the memory required for maintenance and the memory available to the program.
Note
If you need to determine the unit size for the heap, don’t make random guesses to suit the situation.
Instead, study and prepare a table of all the structures used in the program that require
dynamic memory management, and then choose a unit size based on that table. I know this is tough.
But, who said embedded programming was easy? ☺
Now I know that you’ll swear that malloc will not give you the exact number of
bytes you requested. But still, before we move on to clear the next myth, let us see
how memory allocation works.
Usually the heap manager keeps a list called the “free list”, which is a list of the blocks
of memory that are free. These free blocks can be arranged in various orders:
■ Decreasing sizes of memory (largest free block first)
■ Increasing sizes of memory (smallest free block first)
■ Memory location (physically adjacent blocks together)
In the first case, malloc works really fast. When memory is requested, it checks whether
the request can be satisfied with the memory available in the first block. If so,
the requested memory is allocated; else, there is no point traversing the list, because
later blocks will only be smaller. This is called the first fit mechanism.
In the second case, where the smallest block comes first, the heap manager will
traverse the list until a block large enough to satisfy the request is found. This is
called the ‘best fit’ allocation. It is definitely slower than the first fit approach.
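The two search policies can be sketched over an array of free-block sizes (a sketch of the policies only; a real heap walks linked blocks, not an array):

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the first block large enough, or -1 if none fits. */
int first_fit(const size_t sizes[], int n, size_t want)
{
    for (int i = 0; i < n; i++)
        if (sizes[i] >= want)
            return i;
    return -1;
}

/* Return the index of the smallest block that still fits, or -1. */
int best_fit(const size_t sizes[], int n, size_t want)
{
    int best = -1;
    for (int i = 0; i < n; i++)
        if (sizes[i] >= want && (best < 0 || sizes[i] < sizes[best]))
            best = i;
    return best;
}
```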
The advantage of the third arrangement (where the blocks are arranged in an order
that is physically adjacent) is that adjacent free blocks can be easily identified and
merged (or in memory management parlance, ‘coalesced ’).
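Coalescing is easy to express when the free list is kept sorted by address (a sketch with a hypothetical node layout; unlinked nodes are simply abandoned here):

```c
#include <assert.h>
#include <stddef.h>

/* Free-list node: blocks are kept sorted by starting address. */
typedef struct Block {
    size_t start, size;
    struct Block *next;
} Block;

/* Merge every pair of physically adjacent free blocks. */
void coalesce(Block *head)
{
    while (head && head->next) {
        if (head->start + head->size == head->next->start) {
            head->size += head->next->size; /* absorb the neighbour */
            head->next = head->next->next;  /* and unlink it */
        } else {
            head = head->next;
        }
    }
}
```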
Now, we can tackle the next myth: “There is no harm in freeing a memory twice ”. Freeing
memory twice can happen when multiple copies of a pointer are stored in different
places, for example when the pointer is saved in lists or passed to functions.
Sometimes a pointer ends up being freed in more than one place.
This usually results in unpredictable behaviour. Heap functions are entitled to val-
idate the pointer being freed and to raise software exceptions. This usu-
ally happens in desktop applications.
A more dangerous situation arises when the memory behind a freed pointer has already
been reallocated for some other purpose. Freeing it again through the stale reference
releases memory that now belongs to the new allocation. In this case, the behaviour of the
system will be inexplicable. So, freeing a pointer twice is definitely not safe.
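One common defence is to null the pointer immediately after freeing it; since free(NULL) is defined to do nothing, an accidental second free of the same variable becomes harmless (the macro name is ours):

```c
#include <stdlib.h>

/* Free p and null it, so a later SAFE_FREE(p) is a harmless no-op. */
#define SAFE_FREE(p) do { free(p); (p) = NULL; } while (0)
```

Note that this does not protect against a second copy of the pointer held elsewhere; only the nulled copy is safe.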
In Chapter 2, we discussed the startup code that gets linked with the main
application. One of the tasks of the startup code is to initialise the heap so that the appli-
cation can start using it as soon as it comes up. In desktop-based applications,
a chunk of memory is requested from the OS during startup by the heap man-
ager. Any requests by the application to allocate and free memory are then handled by the
heap manager from this chunk. So, the OS has no role to play in heap management
during the execution of the program (except in cases where the heap may be extended
by the OS when heap space runs out). This should clear the third myth— “Malloc/ Free
request and return memory from the OS ”.
The fourth myth — “Malloc/ Free do not take much time (i.e. they are really fast) ” is a myth
that could affect realtime programmers. We have seen that the heap is organised as lists and
some search occurs whenever memory is requested or even freed (to merge adjacent
blocks). The duration of this search is unpredictable. The problem with this delay is that
it makes the system non-deterministic, and non-determinism is the greatest enemy in
guaranteeing the realtime responses of a system.
To really understand the realtime issues and why they arise, we have to appreciate
two processes that are associated with any heap:
■ Heap Fragmentation
■ Heap Compaction
Heap fragmentation: As the name indicates, heap fragmentation is a condition where the
available memory is scattered throughout the heap in such a way that it is not possible
to service a request for memory even if the collectively available memory (scattered
throughout the heap) is more than the memory requested. The condition is illustrated
in Fig. 4.6.
In the hypothetical condition illustrated in Fig. 4.6, if a request for 35K is made, it
cannot be serviced (even though the total available memory is 45K) because the heap
is fragmented and no single block large enough to satisfy the request is available.
Fragmentation can be classified as:
i. Internal fragmentation
ii. External fragmentation
[Fig. 4.6: A fragmented heap. Free blocks of 10K, 5K and 30K bytes (45K in all) are
separated by allocated blocks, so no single free block can satisfy a 35K request.]
Heap compaction: We have read before that the heap can be arranged/structured in var-
ious ways. One popular approach is that the heap manager maintains a list of available
memory locations (known as free list).
If a request is made and no sufficiently large block is available, a process known
as heap compaction is carried out. In this case, the entire list of free locations is
scanned and adjacent ones are merged (coalesced) to see if the request can then be
serviced.
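The merging step can be sketched with a toy free list (illustrative only; real heap managers keep richer metadata). Each node records the start address and size of a free block, the list is kept sorted by address, and one pass coalesces any pair of adjacent blocks:

```c
#include <stdlib.h>

/* Toy free-list node: 'addr' and 'size' describe one free block.
   The list is assumed to be kept sorted by address. */
typedef struct FreeBlock
{
    size_t addr;
    size_t size;
    struct FreeBlock *next;
} FreeBlock;

/* One pass of coalescing: merge a block into its predecessor
   whenever the two are adjacent in memory. */
void coalesce(FreeBlock *head)
{
    while (head != NULL && head->next != NULL)
    {
        FreeBlock *nxt = head->next;
        if (head->addr + head->size == nxt->addr)
        {
            head->size += nxt->size;   /* absorb the neighbour */
            head->next  = nxt->next;
            free(nxt);                 /* drop the merged node */
        }
        else
        {
            head = head->next;
        }
    }
}
```

The pass is linear in the length of the free list, which is exactly why its duration, and hence the allocator's response time, depends on how fragmented the heap has become.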
The problem here is that compaction can take an unknown period of time,
depending on the fragmentation level of the heap. This makes the system non-
deterministic. Long compaction times can wreak havoc when realtime deadlines
have to be met; think of a flight controller task that spends 20 seconds in heap
compaction during the landing of a flight.
There are various ways of working around this problem. The heap manager can be
made to run as a separate thread, carrying out compaction in instalments. This still
introduces some level of non-determinism into the system.
Another alternative is to compact whenever a pointer is freed. But this could again
require adjacent blocks to be merged and the list restored, which once more
introduces non-determinism.
It should now be appreciated as to why we dwelt so long on this concept. The rea-
sons are unique to embedded realtime systems:
i. Realtime
ii. Run forever
Realtime: In realtime applications, missed deadlines can cause anything from minor
discomfort to huge loss of life and property. In the case of desktop applications,
users are usually tolerant when the system appears to have crashed or 'hung'. But
embedded realtime systems may be used by people who have no introduction to
computing (e.g. your vegetable vendor). If consumers find that their
Memory Management in Embedded Realtime Systems 81
mobile phone is not responding the way they want it to, they will return it, and no
amount of explaining that it was a rare case of a missed realtime deadline will help.
So, timeliness is a very important criterion for realtime systems. A designer /devel-
oper targeting realtime systems must know that these issues exist and tackle them.
Run forever: Unlike desktop applications that exit after some time*, embedded systems
tend to run forever. (When did you last "reboot" your television or mobile phone?)
Memory is a finite resource, and a leak of even a single byte per operation will
eventually bring the system to a grinding halt, though it may have gigabytes of
memory available. So, memory issues must be tackled carefully in embedded systems.
Solution of dynamic memory problems: It might now seem to you that with so many per-
ils lurking around, it is safer not to use dynamic memory management at all and sim-
ply allocate all memory statically or in stacks (local variables) during runtime. These
approaches are not without their disadvantages:
Statically allocating memory for all the variables may increase the code size and
hence the RAM size of the final system.
A lot of memory in the system is wasted if all the local variables are statically
allocated, leading to a condition where memory is badly utilised: some variables
require storage only for a short period of time, yet they occupy memory
permanently.
It is not always possible to predict the memory requirements that may arise during
runtime. Since memory is expensive, we cannot have oversized arrays eating it up.
So, we have to live with dynamic memory management.
But, there are some schemes that offer deterministic or even constant time operations
for allocation and freeing of memory.
Pools enforce realtime behaviour in the system because they take almost no time
to allocate and free memory. This is because memory is not actually allocated and
freed: the pointers simply stop pointing to the memory segments, which remain
allocated from the system's point of view.
*Daemons and services that execute on servers are exceptions to this. But they are not usually realtime.
The pools divide the available memory space into linked lists of memory segments.
Hence, the dirty mini-fragments we discussed above are not created, and no
complicated algorithms to optimise the memory space are required: a first fit
algorithm is all that is needed at allocation time. Memory that is freed is not
fragmented further, so it is instantly available for the next allocation.
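A minimal fixed-size pool might look like the following sketch (the chunk size, chunk count and function names are our assumptions; real RTOS pool APIs differ). All the chunks are carved out of one static array and threaded onto a free list at startup, so allocation and freeing are single pointer operations:

```c
#include <stddef.h>

#define CHUNK_SIZE  32   /* payload bytes per chunk (assumed) */
#define NUM_CHUNKS  16   /* chunks in the pool (assumed) */

/* Each free chunk doubles as a free-list node. */
typedef union Chunk
{
    union Chunk  *next;
    unsigned char payload[CHUNK_SIZE];
} Chunk;

static Chunk  pool[NUM_CHUNKS];
static Chunk *free_list = NULL;

/* Link every chunk onto the free list (done once at startup). */
void pool_init(void)
{
    int i;
    free_list = NULL;
    for (i = NUM_CHUNKS - 1; i >= 0; --i)
    {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* O(1): pop the head of the free list (NULL when exhausted). */
void *pool_alloc(void)
{
    Chunk *c = free_list;
    if (c != NULL)
        free_list = c->next;
    return c;
}

/* O(1): push the chunk back; memory is never returned to the OS. */
void pool_free(void *p)
{
    Chunk *c = (Chunk *)p;
    c->next = free_list;
    free_list = c;
}
```

Both operations are a constant number of pointer moves, which is what gives the pool its deterministic timing.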
On the downside, pools have the following disadvantages:
■ At startup time, a lot of effort and time is spent in initialising the pool: allo-
cating chunks of memory of different sizes and building the linked lists.
However, a realtime system usually requires the least realtime behaviour at
startup. A mobile phone, for instance, is expected to respond to an incoming
call only some 30 seconds after startup.
■ A prudent decision has to be made in defining the size of memory chunks inside
the pool. Usually, as we discussed earlier, more than one list, each of a different
chunk size, is available.
The size of chunks inside these lists is usually configurable in most RTOSes.
The sizes should be decided based on the typical memory usage characteristics
of the system.
■ All said and done, pools may still introduce fragmentation and wastage of mem-
ory. If only 20 byte chunks are available and our application requires 10 bytes,
it will still be given a pointer to a 20 byte chunk, and 10 bytes of that chunk
will be wasted.
But this does not make the memory space dirty, and it is a small price to pay for
the elegance and simplicity of the approach.
Some of the most common problems associated with dynamic memory usage are
memory leak and dangling pointers. This section will address memory leaks, its causes
and possible solutions. The next section will take a look at dangling pointers.
Definition
Memory leak is an amount of memory that has been rendered useless and inaccessible to the
system.
What are memory leaks: Memory leaks are among the most notorious and
destructive problems related to dynamic memory usage. How can a part of
memory become inaccessible to the system? And what is the significance of the term
"memory leak"? Take a look at Listing 4.1. We use the malloc and free function calls
to illustrate the concept; the same problem exists for a pool-based system too.
#include <malloc.h>
#include <stdio.h>
/*
This program will allocate some memory. Then the memory will
not be freed by the program. Hence the memory will remain
allocated but it will not be used by the program. The rest
of the system will not be able to use the memory anyway. This
condition is called memory leak because the total amount of
memory available in the system decreases.
*/
int main(void)
{
    char* lp_data = (char*) malloc(100); /* 100 bytes taken from the heap */

    /* ... lp_data is used here, but free(lp_data) is never called,
       so the 100 bytes are lost to the rest of the system ... */

    return 0;
}
Consider an analogy. Your car normally needs 20 litres of petrol to get to the
destination but, this time around, you become stranded mid-way with no clue as to
why the car did not cover the appropriate number of kilometres. The reason: the
petrol tank of the car had a leak. Similar is the situation in software.
#include <malloc.h>
/*
This program will allocate some memory. Then the memory
will not be freed by the program. Hence the memory will
remain allocated but it will not be used by the program.
The rest of the system will not be able to use the memory
anyway. This condition is called memory leak because the
total amount of memory available in the system decreases.
*/
typedef unsigned char  u8;    // u8, u16 are user defined types
typedef unsigned short u16;

/* Following is the structure defined as an example */
typedef struct MyStruct
{
    u8  id;
    u16 number;
    struct MyStruct* next;    // the struct tag is needed here
} t_MyStruct;

#define SUCCESS    0
#define NUM_BLOCKS 10

void fillData(t_MyStruct* lp_node);   /* fills the data members */

int main(void)
{
    while (1)
    {
        t_MyStruct* lp_pointer = (t_MyStruct*) malloc(sizeof(t_MyStruct));
        if (lp_pointer != NULL)
        {
            t_MyStruct* lp_temp_pointer = lp_pointer;
            u8 vl_index;
            for (vl_index = 0; vl_index < NUM_BLOCKS; vl_index++)
            {
                lp_temp_pointer->next = (t_MyStruct*) malloc(sizeof(t_MyStruct));
                /*
                Code that does some manipulation and computation
                */
                fillData(lp_temp_pointer);
                lp_temp_pointer = lp_temp_pointer->next;
            } // for
        } // if
        free(lp_pointer);
        return (SUCCESS);
        /*
        A memory leak has been created here since all blocks of
        data pointed to by next have not been freed.
        */
    } // while
} // main
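For completeness, the leak above can be plugged by walking the chain and releasing every node, not just the head. A sketch (freeList and its node-count return value are our own additions, with the types declared locally so the fragment stands alone):

```c
#include <stdlib.h>   /* the book's listings use <malloc.h> */

typedef unsigned char  u8;
typedef unsigned short u16;

typedef struct MyStruct
{
    u8  id;
    u16 number;
    struct MyStruct *next;
} t_MyStruct;

/* Free every node reachable from lp_pointer and return the number
   of blocks released, so the caller can verify nothing was leaked. */
int freeList(t_MyStruct *lp_pointer)
{
    int vl_count = 0;
    while (lp_pointer != NULL)
    {
        t_MyStruct *lp_next = lp_pointer->next;  /* save before freeing */
        free(lp_pointer);
        lp_pointer = lp_next;
        ++vl_count;
    }
    return vl_count;
}
```

Note that the next pointer must be saved before free() is called; reading it afterwards would itself be a dangling-pointer access.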
Since a portion of the code keeps taking chunks of memory and never returns them,
at some point in time there will not be enough memory to conduct normal business.
Memory leaks are not easy to detect, even though tools are available in the market
to detect and fix them. Due to the nature of the problem, the bug is mysterious and
usually difficult to trace. You may get statements like the following from the customer:
■ System was running fine in the field, though it appeared it was becoming
slower. Finally on the 53rd day, all of a sudden it crashed.
■ When the user tried to browse a 10 MB file, the system crashed midway.
■ To our pleasant surprise, this time, the system did not crash on the 53rd day. It
crashed only on the 62nd day.
All this usually points in the same direction. It will be futile to look at the millions of
lines of trace statements of the last 52 days because the problem will not be visible
there. The problem is that in some corner of the software, somebody is allocating
memory and it is somehow not getting freed. Hence, over a period of time, the system
becomes helpless. A question that comes to mind is why the system took more than
53 days to crash. Well, we know that embedded systems are event-driven machines.
An embedded system probably will not do much unless some external or internal
event happens and introduces some activity inside the system. And, if the leak is
inside a particular portion of software that gets executed whenever event E happens,
how soon the system stops functioning properly depends entirely on the number of
times event E happens. A second point to be mentioned in this regard is that the
memory leak may go undetected for some time inside the system. In our previous
listing, for example, if the program keeps inadvertently accessing the next blocks
via the lp_pointer->next pointer, there may be no problem until the block pointed
to by lp_pointer is reassigned to some other part of the system.
Typical causes of memory leaks inside realtime systems: Memory leaks usually arise
because programmers are unsure, in a given section of code, whether a particular
pointer to a memory block can be freed, as it is not clearly known whether the block
is required any longer.
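One home-grown aid (a sketch of our own, not a substitute for commercial leak detectors) is to route every allocation through counting wrappers; a non-zero count at a point where all memory should have been returned is a leak:

```c
#include <stdlib.h>

/* Number of blocks currently outstanding (allocated but not freed). */
static long g_alloc_count = 0;

void *dbg_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        ++g_alloc_count;
    return p;
}

void dbg_free(void *p)
{
    if (p != NULL)
    {
        free(p);
        --g_alloc_count;
    }
}

/* A non-zero return at a quiescent point means something leaked. */
long dbg_outstanding(void)
{
    return g_alloc_count;
}
```

Printing dbg_outstanding() periodically from a low-priority task turns the 53rd-day mystery into a counter that can be watched climbing from day one.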
Dangling pointers: The problem of dangling pointers often arises when there is more
than one pointer to a specific block. If the first entity owns the memory and wants to
free it, it must first consider whether any other pointers point at that location. If
any do, and the first entity frees the memory, those other pointers become dangling
pointers — pointers that point to space that is no longer valid. When a dangling
pointer is used, you may happen to get the correct data, but eventually the memory
will be reused (via another call to malloc()), leading to unpleasant interactions between
the dangling pointer and the new owner of that piece of memory. Consider the
following code listing for a demonstration of a dangling pointer.
#include <malloc.h>
u8* createMem(void)        /* u8: the book's unsigned char typedef */
{
    u8  vl_buffer[10];     /* local array on createMem's stack */
    u8* lp_data = vl_buffer;
    u8  vl_index;
    for (vl_index = 0; vl_index < 10; vl_index++)
        lp_data[vl_index] = vl_index;
    return lp_data;        /* address inside a dead stack frame */
}
In this listing, lp_data and vl_index are local variables inside the function. Since
lp_data does not point to the heap but into the stack of the function createMem,
the memory location it points to is no longer valid once the function returns. If the
returned value is assigned to a pointer in the calling function, that pointer becomes
a dangling pointer.
Tips
A leak occurs when you fail to free something; a dangling pointer occurs when you free something
that was not yet ready to be freed.
Memory leaks and dangling pointers are similar to race conditions in a number of
ways. The misbehaviour they cause may occur far from where the bug was caused. As
a result, these problems are difficult to resolve by stepping through the code with a
debugger. For both memory leaks and race conditions, code inspections sometimes
catch these problems more quickly than any technical solution.
Adding debug code to generate output is often a better alternative than a source code
debugger, but in the case of race conditions, it could alter the behaviour enough to
disguise the problem. With memory leaks, adding debug code can change the shape of
the memory layout, which means that dangling pointer bugs may exhibit different
behaviour. Another drawback is that if the debugging code consumes memory, you
may run out of RAM sooner in the debug version than you would in the production
version. Still, a leak is a leak and should remain detectable regardless of these side
effects of the debug code.
This section tries to document some of the most common pointer-related bugs we
have observed during our tryst with embedded systems. Through it, we can arrive at
some general guidelines for troubleshooting pointer-related bugs. The bottom line,
however, is: better safe than sorry. ☺
■ While debugging, it helps to document everything inside the program: all argu-
ments passed, environment variables, etc. In this way, reliable logs can be gener-
ated and tracked based on these standard values.
Many times, we can put two and two together by looking at a list of such
parameter/output pairs. The best way to do this is to maintain a testing report
log and run tests from a batch file that contains enough comments to map the
input values to the erroneous output values.
■ During implementation, it is advisable to introduce trace statements inside the
code. It generally helps if a trace statement is present at the top of each function
and inside each 'impossible' branch of a switch statement.
These trace statements can help significantly during the debugging process,
since they pick up the trail of the program like Sherlock Holmes! We just need
to backtrack from the final executed statement in order to hook the bug.
■ Debuggers are very helpful in solving pointer-related problems. They provide
tools to watch the values of memory pointed by these pointers. Certain values
should immediately raise suspicion.
For example, if you see a pointer with a small negative value (e.g., FFFFFE
hex), it is possible that the pointer has either been corrupted or was never
initialised in the program.
Interrupts are to embedded systems what pointers are to C/C++ programmers. Most
embedded programmers shudder when they hear about interrupts. Interrupts are
inarguably among the components of embedded systems programming that are
toughest to comprehend, because they form the boundary between hardware and
software. Inputs from the external world usually enter the software realm via
interrupts. This chapter explains the basic concepts around interrupts and the ways
to program them.
Every embedded system typically takes input from its environment or its user. The
interrupt mechanism is one of the common ways to interact with the user.
Consider a situation in which the microprocessor has to process inputs from three
peripheral devices:
[Figure: a microprocessor (µP) taking inputs from three peripheral devices D1, D2 and D3]
One approach is polling, which keeps the microprocessor always busy: it is either
polling for input or processing a polled input. This method has more cons than pros,
with one definite drawback of appearing wicked to the microprocessor for forcing it
to work forever.
The other mechanism is interrupting. In this mechanism, the device informs, i.e.
interrupts, the microprocessor whenever it has some input for the processor. The
clear advantage of this method over polling is that the processor is free to do other
work (running other applications ☺) when there is no input from these devices.
Definition
An interrupt is a way to asynchronously request the processor to perform a desired service.
Analysis:
Asynchronous: Because the interrupt might come from another device that is not
clocked by the system clock, it may arrive independent of the microprocessor clock.
(However, interrupts are presented to the microprocessor only synchronously: the
processing is synchronous to the system clock.)
Request: Interrupts (except the one called the non-maskable interrupt (NMI)) are
requests. The processor is free to defer servicing an interrupt when it is working on
higher priority code (for example, the service routine of a higher priority interrupt).
Perform Desired Service: Nobody likes to be interrupted for fun (especially for others’) and
this applies to microprocessors too. A device interrupts the microprocessor to request it
to perform a defined service or to pass on some data to the processor.
Interrupts and ISRs 95
One evident disadvantage of polling is that it consumes a lot of processing power. And,
since all the devices are polled in a sequential manner, there is no explicit priority
mechanism in the polling method. Let us consider the loop described in Listing 5.2.
while (1) {
if (input_from_D1)
Process_D1();
if (input_from_D2)
Process_D2();
if (input_from_D3)
Process_D3();
}
Suppose D1 and D3 have inputs for the microprocessor (i.e. they require the
service of the processor) and the processor has begun work on the input from D1.
Input from D3 then has to wait for the processor to finish its work with D1 and
query D2 before it is attended to. Even if device D3 has a high priority, the
processor cannot be made to process D3 before it finishes its work with D1 and its
query of D2.
What if D3 is an input port whose buffer could overflow if not attended to within a
specific time? We may be able to fine-tune the loop (and the functions that service D1
and D2) in such a way that D3 is attended to before it overflows.
A simple change to the main loop is described in Listing 5.3.
while (1) {
if (input_from_D1)
Process_D1();
if (input_from_D3)
Process_D3();
if (input_from_D2)
Process_D2();
if (input_from_D3)
Process_D3();
}
In this case, if the processing of D1 and D2 completes within the overflow time of D3,
device D3 is safe. Sometimes we may be required to finely tune/optimise the functions
that process inputs from D1 and D2 so that the input from D3 does not overflow.
So far so good; this may now seem to work without any problem. But what if another
device is added, or the priority of a device changes due to a change in requirements
(the only requirements that do not change are those of dead software)? Then the
situation becomes very awkward. All the fine-tuning is lost and we have to embark on
another attempt to adjust the code and the priorities. The coding for this case reduces
to mere jugglery. Even after a second tuning, we might have to wait anxiously, fingers
crossed, for the next change in requirements. Simply put, this solution is not elegant.
The area of interrupts and servicing them is filled with many terms. This jargon soup is
explained below.
There are many types of interrupts. The general nomenclature observed is
■ Hardware interrupts
■ Software interrupts
[Figure: interrupts classified into hardware and software interrupts]
The interrupts can be classified on the basis of the source of the interrupt.
The interrupts can also be classified based on their temporal relationship with the
system clock of the processor:
■ Synchronous interrupts: If the source of the interrupt is aligned exactly in
phase with the system clock (the clock used by the microprocessor), then the inter-
rupt source is said to be synchronous (Fig. 5.3). E.g., a timer service that uses the
system clock.
■ Asynchronous interrupts: If the interrupt source is not in phase with the sys-
tem clock it is termed as asynchronous (Fig. 5.4).
This might seem to contradict the definition of an interrupt stated at the beginning of
the chapter, but that definition covers the hardware interrupts that arise from external
devices. Synchronous interrupts occur because of software interrupts or devices that
are driven by the system clock, and they typically form a small subset of the interrupts
in a typical embedded system.
*1Programmers who worked in the DOS age will immediately remember the famous INT 33H and INT
21H, some of the oft-used software interrupts for the mouse and other general purposes.
*2C++/Java programmers should not confuse this with the exception handling mechanism available
in these languages.
[Figs 5.3 and 5.4: timing diagrams of an interrupt in phase with the system clock (synchronous) and one out of phase with it (asynchronous)]
Interrupt latency is defined as the time that elapses from the instant an interrupt is
raised to the execution of the first instruction of the interrupt service routine.
The greater the latency, the greater the response time, and the user will feel that the
system is not responsive enough. We will use this term in the context of interrupt
handlers very soon.
5.5 RE-ENTRANCY
[Fig. 5.5: Tasks A and B each calling the same function foo()]
In Fig. 5.5, foo() is executed in the context of A and B separately, i.e. in their own
stacks. This means that the local variables defined by the function foo() and the
arguments passed to function foo() are allocated in the stacks of respective tasks. This is
illustrated in Fig. 5.6.
[Fig. 5.6: separate copies of foo's arguments and local variables on the stacks of tasks A and B]
The typical layout in memory is illustrated in the Fig. 5.7. From this picture it should
be clear that changing of local variable ‘a’ of foo() called in the context of task A does
NOT affect local variable ‘a’ of foo called in the context of task B. (since they are in dif-
ferent memory locations independent of each other).
[Fig. 5.7: typical memory layout, showing each task's stack and the area where global variables are allocated]
int a;
int foo( int b )
{
a = 1;
/* Rest of processing */
}
Listing 5.5: Global variable in a task
In other words, when foo() accesses global/static data,*1 clashes can occur between
instances of foo() called by the two tasks.
Peterson's solution for mutual exclusion between two tasks that share a common
resource is given below. (Don't worry if you don't recognise it; the algorithm belongs
to the theory of operating systems and is given here just to explain re-entrancy.)
// ...
turn = pid;   // process id
while ( turn == pid && require[other] == true )
    ;
Listing 5.6: Peterson’s Solution
The comparison turn == pid in the above code may seem unnecessary because the
variable turn is not changed between the assignment and the comparison (i.e. turn
never appears as the lvalue of any expression in between).
The comparison may indeed look irrelevant in a von Neumann*2 model of
computing, which is a sequential execution model.
*1Static variables declared within a function are IDENTICAL to global variables in terms of storage,
except that access is limited to the function that defines them. Otherwise they are stored along with
other global variables.
*2Von Neumann can be considered the father of the existing computing model that is widely used, in
which the CPU executes the instructions of a program sequentially. If you can pardon us for adding to
the confusion, many concurrent programs are abstractions over the von Neumann model, where the OS
provides virtual concurrency over a single processor that executes instructions sequentially.
But we should note that in this solution, turn is a global variable. The task that runs
the above code might have been pre-empted after executing the statement turn = pid.
The higher priority task may now change the value of turn. So, when the execution
returns to the first task, the value of turn would have changed. Now the validation
turn == pid immediately after the assignment turn = pid makes sense.
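Filling the fragment out into a complete two-task sketch (the require array and turn are named after the listing; the helper names and the enter/leave pairing are our assumptions):

```c
#include <stdbool.h>

static volatile bool require[2] = { false, false };  /* interest flags */
static volatile int  turn = 0;                       /* who yields */

/* True while task `pid` must keep spinning before entering. */
static bool must_wait(int pid)
{
    int other = 1 - pid;
    return (turn == pid && require[other] == true);
}

void enter_region(int pid)
{
    require[pid] = true;   /* announce interest */
    turn = pid;            /* volunteer to wait if the other is interested */
    while (must_wait(pid))
        ;                  /* busy-wait */
}

void leave_region(int pid)
{
    require[pid] = false;  /* no longer interested */
}
```

With no contention, enter_region() falls straight through; when both tasks try at once, the one whose write to turn lands last is the one that waits, which is why the seemingly redundant turn == pid test matters.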
The use of shared global variables amidst multiple tasks without any synchronisation
may lead to race conditions.*1
So, we can conclude that a function that uses only its local variables and does not use
any variable with static storage (this includes global variables and local variables with
a static storage specification) is safe to call any number of times. Such code is called
're-entrant' code.*2 It should be observed that any piece of code (not necessarily a
function) can be classified as re-entrant.
Summarising, a code can be called re-entrant if and only if:
■ It uses only local variables (i.e. the variables allocated on the stack)
■ It does not use any variable that has static storage (global variables and local
variables that are specified with static storage)
■ It calls only functions that are themselves re-entrant (i.e. these rules must be
applied to all the functions that the code calls).
■ If global/static variables are used at all, they are accessed in a mutually exclu-
sive way. The mechanism to create mutually exclusive access to such variables
is described in the chapter on realtime operating systems.
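To make the rules concrete, here is a small illustration of our own: the first function is not re-entrant because it keeps its running total in static storage, while the second touches only its stack and may safely be called from any number of tasks or ISRs:

```c
/* NON-re-entrant: the running total survives across calls in static
   storage, so two tasks calling this concurrently would clash. */
int accumulate(int x)
{
    static int total = 0;   /* shared between all callers */
    total += x;
    return total;
}

/* Re-entrant: every variable lives on the caller's stack, so any
   number of tasks (or an ISR) may execute it simultaneously. */
int add(int a, int b)
{
    int sum = a + b;        /* purely local state */
    return sum;
}
```

Note that add() always returns the same answer for the same inputs, whereas accumulate() gives a different answer on every call because of its hidden state.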
Note
Interrupt Latencies
After fixing some major bugs, we once decided to analyse the performance of the system. We attached a
logic analyzer to measure the interrupt latencies. To our astonishment, we found that the latencies were
much larger than we had anticipated, and this was heavily affecting the throughput of the system. So
we set out to find the problem. We had two sets of memories: one internal to the processor and one
external. When the code was built and loaded (we had used the scatter loading technique described in
Chapter 2), some code was placed in the internal RAM and some in the external. Along the path in
which the interrupt was serviced, some of the code was internal and some external, and we noticed that
the jumps between these locations caused most of the performance penalty. So we changed the loader
file to put all the code needed by the ISR in internal RAM, avoiding the expensive jumps. After this the
latency was reduced to the desired value.
Microprocessors and microcontrollers usually have only a few interrupt pins.* This
number is typically two to four (with one allocated for the NMI (non-maskable
interrupt)) and is normally smaller than the number of devices that the processor
must interface with.
So, in a situation where multiple devices need to interrupt the microprocessor, a
piece of hardware called a "programmable interrupt controller" is used.
Fig. 5.8 Multiple devices D1…Dn connected to a single INTR pin using an interrupt controller
*This is a typical situation. Chips designed for specific applications may have a larger number of
interrupt pins.
Figure 5.8 describes the position of the interrupt controller. Some people tend to
imagine that the controller is a highly intelligent* piece of hardware. (Look ma! This
device arbitrates the interrupts raised by many devices.) In its simplest avatar, the
controller can be as dumb as an 8-3 encoder. A case of this simple 8-3 encoder used
to connect multiple devices is shown below. (Old-timers will be reminded of their
happy times with the 8085.)
Fig. 5.9 An 8-3 encoder used to connect 8 devices to a microprocessor with 3 interrupt pins
right to spend its time in a responsible manner). This can be compared to us
programmers switching the phone to 'Don't disturb' mode when we are doing some
very important work.*
So, when the microprocessor is working on a higher priority interrupt, lower priori-
ty interrupts are masked.
Debug Tips
Sometimes it might seem that the system does not respond after the execution of some interrupt service
routines. In case of processors we can check if the interrupts are re-enabled in the ISR. If not the inter-
rupts will be permanently disabled and the system would appear non-responsive. The system would be
up once we re-enable the interrupts in the ISR.
*1It could be an external device connected to the microprocessor or could be a device integrated with-
in the microprocessor as in case of SoCs (System on Chips).
*2It is strictly not necessary that execution returns to the same task that was executing when the inter-
rupt was raised. This scenario is explained more in the RTOS chapter.
[Flowchart: interrupt raised → disable interrupts → save context → … → re-enable interrupts → restore context → stop]
These are the generic steps that are executed while processing an interrupt. The
steps are not rigid; for example, steps (vi) and (vii) can be interchanged. The process
of handling interrupts is also highly processor specific: in some processors (e.g. the
8085) we have to explicitly disable interrupts in an ISR, while in others the
interrupts are disabled automatically once an interrupt is raised.
Saving of context
We have seen that an interrupt can be raised when a task is executing. The processor
now jumps to the corresponding interrupt service routine and returns back to the task
once it is finished. In a single processor system (i.e. a system with a single CPU/micro-
processor), the path of the instruction pointer (or PC — program counter) is given by
the following diagram:
Fig. 5.11 An ISR interrupting a task
We all know that many calculations (and variables) use registers. The 8085, for
example, has eight registers and all additions can be performed only with register A
(the accumulator). Say the task that was being executed before the interrupt was
raised (indicated by 'Task #n' in the figure) stored some value in the accumulator
(or, for that matter, in any register). Now the interrupt occurs and the ISR is
executed. The ISR may need some registers for its own computations, so if it uses
register A for some addition, the previous value stored in register A is lost. When
Task #n resumes, the data it had stored in its registers would have been corrupted.
This is not acceptable.
So, before the ISR uses any of the registers, it usually saves the entire set of
registers on the system stack, and on completion of the ISR the saved context is
restored. Now Task #n can continue without its data being corrupted. Usually,
microprocessors provide a single instruction with which the entire context can be
saved/restored, instead of saving/restoring the register contents one after another.
Context saving in modern processors
Many modern processors (RISC and CISC) have register banks. This means that
there are a few sets of registers. The processor operates in a few modes (say Normal,
Supervisor, etc.) and each mode has a register bank, i.e. its own set of registers. A
task that operates in normal mode uses its own set of registers; when it is interrupted,
the processor switches to another mode that has its own registers. Because of this
there is no corruption of registers, since the normal task and the ISR operate on
their own sets of registers.
The visible advantage of this method is that the time taken to save and restore the
context is saved. This increases the speed at which the interrupt is processed.
The disadvantage is that the microprocessor requires a set of registers for each mode.
Tips
A microprocessor may bank only a subset of the registers instead of the entire set. In this case we can speed up interrupt handling by using only the registers that are banked. A good optimising compiler can assign variables to the registers that are banked in a particular mode.
The transition from a single-tasked environment (like DOS) to a multitasked environment is complicated, because it takes time for a programmer to get used to the effects of concurrency in the system. The effects are worse in a system where interrupts are enabled.
For example, in an 8085 system, we could write a delay routine based on the number of clock cycles required by an instruction (typically NOP) and loop a particular number of times.
;Delay Routine
      MVI A, 0xD4   ;some number based on the delay required
LOOP: NOP
      DCR A         ;decrement register A
      JNZ LOOP      ;jump back to LOOP if A is not zero
      RET           ;return
In the delay routine above, we know the number of cycles required by the loop
(the NOP, DCR A and JNZ instructions). We also know the clock speed at which
110 Embedded Realtime Systems Programming
the microprocessor executes. So, we can determine delay based on the required
delay value.
Loop value = (Required delay time)/(Time for one loop)
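This calculation can be done offline. A minimal C sketch of the arithmetic follows; the cycle count per loop pass and the clock frequency are illustrative assumptions, not exact 8085 timings:

```c
/* Cycles for one pass of the NOP/DCR/JNZ loop and the clock rate
   are hypothetical figures; substitute the values from the
   processor's datasheet. */
#define CYCLES_PER_LOOP 18u             /* NOP + DCR A + JNZ (taken) */
#define CLOCK_HZ        3000000u        /* assumed 3 MHz clock      */

/* Compute the loop counter for a desired delay in microseconds. */
static unsigned long long loop_value(unsigned long long delay_us)
{
    unsigned long long ns_per_loop =
        (CYCLES_PER_LOOP * 1000000000ull) / CLOCK_HZ;   /* 6000 ns */
    return (delay_us * 1000ull) / ns_per_loop;
}
```

With these assumed numbers, a 600 microsecond delay needs a loop count of 100.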
This worked fine in single-threaded applications, as on the 8085.* But it will not work in any multitasked environment, or in an environment that supports interrupts, because these delay routines assume that the CPU time is always available to them.
Fig. 5.12 Tasks deprived of CPU time because of preemption by other tasks
As illustrated in the above figure, no task can have all the CPU time to itself; it could be pre-empted by other tasks. So, carefully calculated loops like the one shown above will not work. We will typically need help from the hardware and the OS. Many OSes (and RTOSes) provide APIs to start and stop timers without bothering about these issues of pre-emption by other tasks/processes.
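On a POSIX system, for instance, the standard nanosleep call hands the wait over to the kernel instead of burning CPU cycles; a sketch:

```c
#include <time.h>

/* Delay using the OS timer services instead of a calibrated busy
   loop.  The kernel handles pre-emption; the delay is a minimum
   guarantee, not an exact figure. */
static int delay_ms(long ms)
{
    struct timespec ts;
    ts.tv_sec  = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    return nanosleep(&ts, NULL);    /* returns 0 on success */
}
```

An RTOS would offer its own equivalent (the API name varies from one RTOS to another), but the principle is the same: ask the OS, don't count cycles.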
We can see that the advantage of the simple non-nested interrupt handler lies in its simplicity of implementation. It is also very easy to debug, since it cannot be pre-empted by other interrupts.
But this simplicity comes at a cost: high interrupt latency. If we take another careful look at Fig. 5.12, we will notice that during the execution of a simple non-nested ISR, the interrupts are disabled till the completion of the ISR. During this period, the
*We should remember this could work only when interrupts were disabled.
microprocessor will be effectively isolated from the environment, and the system does not respond to any other interrupt. This implies that disabling interrupts for a particular period of time increases the interrupt latency of other interrupts (including another instance of the same interrupt) by the same amount. Such an addition to the already existing system latency may not be acceptable in all cases.
In many cases (as we will see later) disabling the interrupts for such a long period is
not required at all.
The comfort of coding and debugging a non-nested ISR comes with a heavy price tag: increased interrupt latency. As indicated earlier, an increase in interrupt latency beyond a certain threshold may not be acceptable (i.e. it will not satisfy the system requirements).
Warning
If you find that most of the code in an ISR is non-re-entrant, this technique does not save much time (especially when the ISR is long), and we still have a problem: it means that the system uses many global/static variables. We should try to redesign the system in such a way that we minimise the number of global variables.
(Flow of the handler: Interrupt raised → Disable interrupts → Save context → Execute the re-entrant part of the ISR → Restore context → Return)
The first question that pops up in the mind of a novice embedded programmer is: how does the processor know that an interrupt occurred? Surprise! The answer is that the processor polls for it. But in this case the processor polls the interrupt pins during every system cycle, for a fraction of the cycle. This is different from the device polling that we discussed earlier: here, no reading of memory is required, and it does not involve reading the registers of the devices.
Programming Tips
If the Interrupt Vector Table (IVT) can hold ten entries, of which only four are used, people usually do not fill the rest of the entries, or fill them with NULL. This doesn't solve the problem, and could actually worsen it: the program will crash once it gets a spurious interrupt, and we will not be able to track the problem (unless we use some kind of realtime trace program, or a trace program like Introspect, or gdb of the GNU compiler collection). To make things simpler, we can have a single handler for all spurious interrupts, say, UnexpectedInterruptHandler, and place a breakpoint on this handler. The execution would stop when a spurious interrupt is detected, and we would know where it occurred.
Nowadays, the processors can be configured in such a way that the interrupt is
edge/level sensitive.
So, by watching the INTR pins, the microprocessor knows if an interrupt occurred.
The next question is “How does the processor know which ISR to execute when the
interrupt is raised?”. The answer depends on the processor.
If the controller is a simple one, such as an 8051, the address to which the jump is to be made is hardwired.
5.8.2 Vectoring
A more prevalent and widely used scheme is called 'vectoring'. In vectoring, the device/interrupt controller asserts the interrupt (INTR) pin of the microprocessor and waits for the 'interrupt acknowledgement' (INTA). On receiving the INTA, the device places an 8-bit or 16-bit datum on the data bus of the processor. Each device (interrupt source) is assigned a unique number, and this datum is used to branch to the appropriate ISR. This array/vector of interrupt handlers is known as the 'interrupt vector table' (IVT).
The steps can be summarised as follows:
i. The device or the interrupt controller asserts the interrupt pin (INTR).
ii. The processor detects the interrupt and acknowledges it (INTA).
iii. The device places 8/16 bit data in the data bus.
iv. The CPU sends EOI (end of interrupt) to the device/interrupt controller.
v. The CPU branches to the ISR corresponding to the data.
Then, based on the ISR type (non-nested/nested), the flow continues as described in Fig. 5.10 or Fig. 5.13 respectively.
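The dispatch through an IVT can be pictured in C as an array of function pointers. The sketch below is a simulation for illustration: in a real system the table lives at a fixed address and the branch is performed by hardware, and the handler and vector numbers here are invented:

```c
#define NUM_VECTORS 10

typedef void (*isr_t)(void);

static int last_vector = -1;                 /* for demonstration only */

/* Catch-all for vectors that are not really in use. */
static void UnexpectedInterruptHandler(void) { last_vector = -2; }
static void TimerISR(void)                   { last_vector = 3;  }

static isr_t ivt[NUM_VECTORS];

/* Every slot points somewhere valid; unused slots go to the
   catch-all handler instead of being left as NULL. */
static void ivt_init(void)
{
    for (int i = 0; i < NUM_VECTORS; ++i)
        ivt[i] = UnexpectedInterruptHandler;
    ivt[3] = TimerISR;                       /* vector 3 is actually used */
}

/* The hardware reads the vector number from the data bus and
   branches; here we simulate that dispatch in software. */
static void dispatch(int vector)
{
    ivt[vector]();
}
```

Note how the initialisation fills even the unused slots, anticipating the advice on spurious interrupts given later in this chapter.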
Keyword – Volatile
The keyword volatile is used to instruct the compiler not to 'aggressively' optimise the code involving a particular variable. For any compiler (C++, C, FORTRAN, etc.), the optimiser plays a very important role in final object code generation.
These are just a hint of some of the optimisations a compiler usually does. There is a lot more to it. (Probably some compiler writer can send in an article on object code optimisations; it would be very interesting!)
In certain situations it is desirable not to have optimisations. In such cases, where variables should not be optimised, the keyword volatile is used. Some of the applications of this keyword are in
■ Multithreading/Multitasking
■ Hardware-related programs, such as memory mapped input/output
Each of these is explained below.
Here, we use turn = pid and immediately check if turn == pid, because a context switch (the OS may schedule another process) might have occurred after turn = pid, and some other process that shares the variable turn might have changed its value. But the compiler may see this validation of turn as redundant and may evaluate the condition as always true. In that case the solution fails. So, we instruct the compiler not to optimise code based on the variable turn. For this, we declare turn as
volatile int turn;
This makes the compiler wary of turn, and it will not perform optimisations based on turn.
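Another classic use of volatile, relevant to this chapter, is a flag shared between an ISR and the main loop. Without volatile, the compiler may keep the flag in a register and never re-read it, so the loop never sees the ISR's update. A sketch (the ISR here is simulated by an ordinary function call, for illustration only):

```c
/* Set by the (real or, here, simulated) ISR, polled by the main
   loop.  'volatile' forces a fresh read from memory on every
   iteration of the polling loop. */
static volatile int data_ready = 0;

static void rx_isr(void)            /* simulated ISR */
{
    data_ready = 1;
}

/* Poll the flag up to max_polls times; return 1 if data arrived. */
static int wait_for_data(int max_polls)
{
    while (max_polls-- > 0) {
        if (data_ready)             /* re-read from memory each time */
            return 1;
    }
    return 0;                       /* timed out */
}
```

Without the volatile qualifier, an optimising compiler would be entitled to hoist the read of data_ready out of the loop.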
Memory mapped input/output: In memory mapped input/output (i/o) we map a device to a particular memory location. Writing into and reading from that memory location then become synonymous with sending input to, and obtaining output from, the device. For example, de-referencing a pointer can mean reading a byte (or a word of predefined size) from the device (or port). In Listing 8-3,
// Listing 8-3
/* read first byte/word : equivalent
of char c = *((char*) (0xFF4E321A)) in C */
char c = * ( reinterpret_cast<char*>(0xFF4E321A) );
/* read second byte/word - compiler may feel
this is redundant */
c = *(reinterpret_cast<char*>(0xFF4E321A));
if ( c == 0x4C ) { /* if second byte/word equals 0x4C */
doSomething();
}
the compiler may feel that the second read is redundant and may remove that code. So the if statement will actually compare the first byte with the predefined value, and not the second byte. Again, to turn off the optimisations, we declare c as
volatile char c;
There was another interesting incident when the use of volatile turned out to be useful. When one of my friends (a beginner) was using an IDE, he ran into problems while debugging: the 'watch' window displayed something like 'Unable to display optimised variable'. I asked him to recompile with the DEBUG switch on, but he could not immediately locate that option in his IDE. I then asked him to put volatile before the variables he wanted to 'watch', and it worked fine! So, some keywords find applications the language designers would not even have thought of!
The first option was difficult, since the higher end processor was (obviously) more expensive and did not fit into our budget. So, it was decided that we should check all the ISRs that were causing the problems.
Fortunately, we could identify the problem soon. It was not in the transmission part, as one would immediately expect, but on the reception side. One of the main reception tasks expected a packet in a specific format. But the format of the packet received over the air was different (since it was just raw information). The ISR tried to format the received data into the required format before passing* it on to the task responsible for reception. This took valuable time during a packet burst.
(Figure: the Rx ISR posts packets to a queue, from which the Rx task reads them.)
Rx_ISR()
{
    retrieve pointer from register;
    enable hardware to get the next packet;
    format the raw data into the expected packet format;
    post the packet to the Rx task queue;
}
*To pass the data from the ISR to the task, IPC (Inter Process Communication) mechanisms were used. These are covered in detail in the RTOS (RealTime Operating Systems) chapter.
Of these, the first two steps and the last were very small. The third step took the longest time, since it had to peek into some of the fields of the header to format the received data. Actually, it was a mistake that the ISR was designed this way. Anyway, better late than never: we changed the lower interface of the reception task in such a way that it could accept raw information. The ISR would just receive the data and pass on the pointer to the task.
The lower interface of the receive task now looks like the illustration below:
(Figure: the Rx task now accepts raw data directly from the ISR queue.)
Now, the ISR just received the data and posted it to the task and the system could
handle the bursts easily.
This is again an appropriate time to discuss the difference between normal desktop programming and realtime programs. In the above example we saw that the program was logically correct, yet the necessary functionality was not achieved, because for the correctness of realtime programs we have to add one more dimension: time. This makes the life of a realtime programmer really interesting.
The pseudocode of the new ISR is given below:
Rx_ISR()
{
    retrieve pointer from register;
    enable hardware to get the next packet;
    post the pointer to the Rx task queue;
}
Note that the step that was used to format the data is removed (and the functionality
is added to the receive task).
P2P
This solves the latency problem. But there exist more problems in this area. One important topic not discussed above is the size of the buffers. The question that arises in one's mind is: "How much memory should I reserve to buffer the packets before they get processed completely?" Since the packets are processed by the task while the ISR queues them rapidly during a burst, the buffer required at the receiving side must be carefully calculated. This is a classic example of an application of "queuing theory".
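As a rough worst-case sketch (a back-of-the-envelope estimate, not a substitute for proper queuing analysis, and the parameter names are invented for illustration): if a burst of n packets arrives one every t_arrival microseconds, while the task needs t_service > t_arrival microseconds to consume each, the backlog at the end of the burst is roughly the arrivals minus what could be drained meanwhile:

```c
/* Worst-case backlog during a burst of n packets arriving every
   t_arrival us, each taking t_service us to process
   (t_service > t_arrival).  Integer arithmetic rounds the drained
   count down, which errs on the safe side. */
static unsigned buffers_needed(unsigned n, unsigned t_arrival,
                               unsigned t_service)
{
    /* packets the task manages to drain while the burst arrives */
    unsigned drained = (n * t_arrival) / t_service;
    return n - drained;
}
```

For example, a burst of 100 packets arriving every 1 ms, each needing 4 ms of processing, leaves a backlog of 75 packets to buffer.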
Note
An atomic operation is one that completes without being pre-empted. The above example shows that even a simple operation like i++ need not be atomic. So never assume anything about the output produced by the compiler: either make sure that the operation is indeed atomic by checking the assembly output, or protect it by disabling interrupts. While disabling interrupts, we must remember that doing so increases the latency of the system.
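The lost update can be reproduced by spelling out the load/modify/store sequence the compiler may emit for i++, and interleaving an "interrupt" (simulated here by a plain function call) between the load and the store:

```c
static volatile int counter = 0;

static void isr(void)               /* the interrupting increment */
{
    counter++;
}

/* i++ as the CPU actually performs it: load, add, store.  If the
   ISR fires between the load and the store, the ISR's update is
   overwritten and lost. */
static void unsafe_increment_with_interrupt(void)
{
    int tmp = counter;              /* load                       */
    isr();                          /* interrupt strikes here     */
    tmp = tmp + 1;                  /* add                        */
    counter = tmp;                  /* store: clobbers ISR's work */
}
```

Two increments happened, but the counter advances by only one: exactly the race the note warns about.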
There could be many issues while dealing with interrupts. This section describes
various scenarios that could arise due to incorrect handling of interrupts and ways to
identify and solve the issues.
A golden rule of debugging ISRs is: "Don't write complex ISRs that need to be debugged." ISRs are asynchronous, so we may never know when they occur during the execution of the program. Another problem is that ISRs usually cannot be debugged using breakpoints. In the case of networks, we might get the next packet while we step through the code, and the program will fail. In systems using robots and huge servomotors, we can damage the machinery if we stop it abruptly. So, the best way to write ISRs is to keep them as simple as possible.
In the case of some processors, interrupts are disabled at power-up. In these cases, interrupts must be explicitly enabled in the startup code (or boot-loader) of the system before any can be received.
So, we must check the processor/board documentation on enabling interrupts and
make sure that interrupts are enabled when the software begins executing.
The other reason could be that one (or more) ISRs have a bug. Interrupts are disabled when an ISR begins execution, and the ISR should re-enable them explicitly before it returns. If it does not, the interrupts that occur later will not be serviced. (These kinds of errors can easily be caught in code review sessions; if not, they consume much more debugging effort later.)
This situation can also occur because of the configuration of the interrupt controller/interrupt pin. In the beginning of the chapter, various kinds of interrupts were discussed, one of them being the "edge/level sensitive" ones. Nowadays all these pins are configurable, but we still need to configure them right.
Let us consider the following situation: the interrupt pin is configured as level sensitive, and the INTR line is normally low. When an interrupt is raised, the line should stay high for a sufficiently long period for the microprocessor to recognise the interrupt. If the time period 't' for which the interrupt signal is asserted is less than the time required by the processor to recognise the interrupt, the processor will not service it.
At the other extreme, the interrupt is configured as edge sensitive, and the processor recognises spurious interrupts caused by even mild electrical disturbances.
The usual practice among novice programmers is to fill only the IVT entries that correspond to the interrupts used by the system; the rest are left unfilled. This would cause the processor to jump to undefined locations when a spurious interrupt occurs. An equally dangerous practice is filling all the unused IVT entries with NULL (0), which causes unexplained behaviour when a spurious interrupt arrives. So, during the development period (before release), we should ideally write a handler for unexpected interrupts and place a breakpoint over it. This will help us identify a spurious interrupt when it occurs.
A typical handler is given in Listing 5.9.
void UnexpectedInterruptHandler(void)
{
#ifdef DEBUG
DB_PRINT("Unexpected Interrupt");
#endif
}
If the code is to be shipped for field-testing, it is advisable to add a print statement that will write to flash (or some other nonvolatile memory). We can use this trace to identify occurrences of these interrupts.
; A normal procedure:
; Save Context
;
; Restore Context
RET ; return from procedure

; An ISR:
; Save Context
;
; Restore Context
IRET ; return from ISR
In all but extremely small systems, C can be used to code the ISRs. Many compilers, especially those targeted at embedded systems, provide a keyword (like INTERRUPT, __interrupt, etc.) to specify that a function is an ISR. The compiler then takes care of inserting the correct return statement. Nowadays C compilers optimise so well that, in all but the very smallest systems (where no C compiler is available), it is preferable to use the C compiler to generate code instead of writing assembly.
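A sketch of such an ISR in C follows. The keyword spelling varies from compiler to compiler; here a placeholder macro stands in for it so the fragment compiles anywhere, which also means the context save/restore a real compiler would emit is absent in this simulation:

```c
/* Placeholder: a real embedded compiler defines its own keyword
   (e.g. __interrupt, or a #pragma) that makes it emit the context
   save/restore and an IRET-style return around the function body. */
#define __interrupt

static volatile unsigned tick_count = 0;

/* With a genuine compiler keyword, no manual PUSH/POP or IRET is
   needed; the compiler inserts them around this body. */
__interrupt void timer_isr(void)
{
    tick_count++;       /* keep the ISR short and simple */
}
```

The hardware would invoke the handler through the IVT; here it can only be exercised by calling it like an ordinary function.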
(If you don’t follow the instructions given above after lengthy warnings, you may
very well deserve the bug ☺. So make sure that all the handlers for the spurious
interrupts are filled).
The problem could be in ‘memory writes ’ that are triggered by the interrupts. Check
all the writes that happen in the period during which the variable changes
unpredictably.
We remember a nightmare of a debugging experience during the development of a wireless networking protocol. We kept receiving a packet that was not accepted by the system, because it failed some validation. What was puzzling was that the packet sent was a valid packet, and the same packet was received, i.e. there were no transmission/reception errors (in wireless protocols there can be errors here too).
The validation routine was checked again and found to be perfect. After some time, we narrowed down the area where the validation was failing: it was the length of a particular field. The length of the field received was compared with a value stored in a global variable, and it was here that the validation kept failing. We found that the length against which the received length was validated kept changing mysteriously. Once we found this, we put breakpoints at all the locations where this length could be changed (i.e. wherever gu16CorrectLength appeared on the left hand side of an expression, as an lvalue). But it was of no use, since the value did not change at any of those breakpoints.
So we removed all the breakpoints and set up a 'watchpoint'. A watchpoint breaks the execution of a program not when the execution reaches a particular line, but when a condition becomes true; we can use it to stop the execution when a write occurs at a memory location. We set a watchpoint to break when a write occurred on gu16CorrectLength. And finally, the program stopped during a memcpy (memory copy) in the receive ISR. The culprit was found.
The receive ISR was given a pointer to where it could write the received data. The ISR was supposed to receive a small packet (an ACK) after receiving the main packet. The transmitter had correctly transmitted the packet, but had transmitted the ACK in an incorrect format: the ACK packet was longer than expected, so it overshot its length and wrote over the global variable space.
(Figure: the pointer used by the receive ISR points to a 20-byte buffer that sits just before the location of the global variables.)
One should remember that RTOSes are not UNIX: they don't dump core whenever an illegal memory write occurs; we can usually write anywhere we want. Actually, some other variables had changed too, but they were not observed, because they were not in use in the receive path.
So, when we had checked the received packet, it was fine. But before the received packet could be processed, the ACK was received, and it corrupted the global variable space. Once the length of the ACK was corrected, the problem vanished. And the person in charge of the board was extremely happy, because he had been afraid that the board was dysfunctional.
Polling and interrupts are the two most common methods of accessing and processing peripheral devices. Polling in a sequential way is not an effective way of handling these devices. Instead, interrupts let the processor attend to the devices effectively, while still performing other jobs when the devices are not ready.
Interrupts can be classified in a variety of ways: hardware and software interrupts, peri-
odic and aperiodic, synchronous and asynchronous. When a lot of interrupts are
expected, it is wise to use an interrupt controller.
Interrupt Service Routines are executed when an interrupt is raised. They can be nested as well as non-nested. When an interrupt is raised, the current context of the task needs to be saved before control is given to the ISR.
Non-nested interrupts tend to have higher interrupt latency if proper care is not taken in their design. ISRs should be designed with care so that they are short, simple, efficient and re-entrant. Re-entrancy can be achieved by using only local variables, guarding global variables through mutual exclusion, and calling re-entrant functions.
Introduction to
Realtime Theory
This chapter is one of the prime motivating factors behind the writing of this book. Books on embedded programming shy away from realtime theory, at best describing the APIs of some RTOS. Books on realtime theory revel in mathematics without a thought for practical applications, much to the agony of the average engineer/programmer. We honestly believe programming is fun only when the theory behind it is clear. This chapter introduces some aspects of realtime theory (the topic deserves an entire book). After this, the reader is encouraged to study other material, which we hope will be easier to comprehend with this introduction.
As described in the introduction of the book, in realtime systems, providing the result within a deadline is as important as providing the correct answer. An oft-quoted saying in realtime theory is, "A late answer is a wrong answer." This can be compared to a quiz programme: a late answer is usually not accepted (and your chance may be passed on to your competitor). Sometimes the deadlines are very small, i.e. the system must respond rapidly. But there are instances where the response can be slow, yet the deadlines are critical. So, based on these characteristics, a realtime system can be classified as illustrated below.
This is a very important classification since it is common to find people interpreting
a fast realtime system as a hard realtime system.
A hard realtime system is one where missing of a deadline can cause a great loss to
life and property. Aeroplane/Space navigation systems and nuclear power plants are
some examples of this kind of system.
A soft realtime system is one where the system is resilient to missing a few deadlines.
(Classification figure: realtime systems classified along two axes, criticality (hard/soft) and speed (fast/slow).)
Examples are DVD players and music systems. The user usually tolerates an occasional glitch.
If we carefully observe the two definitions of hard and soft realtime systems, they do not include any notion of the speed with which the system must respond. They simply describe the criticality of meeting the deadlines. If the system has to meet deadlines within a few microseconds (to a few milliseconds), then it is categorised as a fast realtime system.
If you are watching video over a broadband network, your system is most probably receiving data at a few Mbps, so this is a fast realtime system. But it is not a hard realtime system, because a rare miss in audio/video is tolerated and does not make you lose life or property (unless you were watching world cup soccer ☺). Similarly, hard realtime systems need not be fast.
It is this timeliness factor that distinguishes realtime software from normal application software targeted at desktop computers. In realtime software, we may have to ascertain that all the deadlines are met before deploying the software. In desktop software, ensuring correctness is usually sufficient; not so in the case of realtime systems. So, while developing realtime software, we should do 'performance analysis' during the design phase.
Definition
Scheduling theory deals with the schedulability of concurrent tasks with deadlines and priorities.
Introduction to Realtime Theory 133
Analysis:
Schedulability: The processing power of a given CPU is fixed. So, if various tasks are present in the system, can all the tasks, even in the worst case, be scheduled in such a way that all the deadlines are met? This is called schedulability.
Concurrent tasks: If the tasks are sequential (this is known as batch processing), there is no need for complex performance analysis, since a task can begin only after all its predecessors have completed. Scheduling theory here, therefore, deals with the scheduling of concurrent tasks.
Deadlines: All (concurrent) tasks have a deadline to meet.
Priority: Different tasks, though they run concurrently are differentiated based on
their priorities.
The theory of scheduling is vast and has been around for a long time (the 1960s and 1970s). It has matured over the years. The only sad part of the story is that its application is usually limited to academic circles; mostly it stays outside the reach of the common engineer. But the theory has wide practical implications and is immensely useful in the mathematical validation of systems. 'Rate Monotonic Scheduling' is chosen for discussion in this chapter because of its popularity and wide variety of applications.
As indicated earlier, this is one of the most popular and widely used scheduling mechanisms. It has also been implemented in Ada* 9X. The theory began its rapid growth with the paper on rate monotonic scheduling published by Liu and Layland in the Journal of the ACM in 1973.
We have all studied monotonic sequences, i.e. sequences of numbers arranged in either an increasing or a decreasing manner.
So, the numbers
2, 5, 12, 27, 99
form a monotonic sequence but the following series does not.
3, 5, 8, 6, 10, 9
*Ada is a programming language that has inbuilt support for tasks and scheduling of tasks.
6.2.1 Definition
Rate monotonic scheduling is a way of scheduling in which higher priorities are assigned to tasks with shorter periods, i.e. the tasks are arranged in a monotonic series of their periods. For example, if we have 4 tasks:
Task Period Priority
1 100 2
2 250 4
3 150 3
4 80 1
Now the tasks are arranged in increasing order of their periods and priority is
assigned.
Here Task #4 has the highest priority and Task #2 the lowest.
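The assignment above can be mechanised: rank each task by its period, shortest first. A small sketch (the struct layout is invented for illustration; priority 1 is the highest, as in the table):

```c
struct task {
    int id;
    int period;     /* Ti */
    int priority;   /* 1 = highest, assigned below */
};

/* Assign rate monotonic priorities: the shorter the period, the
   higher the priority.  A task's rank is 1 plus the number of
   tasks with strictly shorter periods. */
static void assign_rm_priorities(struct task *t, int n)
{
    for (int i = 0; i < n; ++i) {
        int rank = 1;
        for (int j = 0; j < n; ++j)
            if (t[j].period < t[i].period)
                ++rank;             /* every shorter period outranks us */
        t[i].priority = rank;
    }
}
```

Running this over the four tasks of the table reproduces the priorities shown there: task 4 (period 80) gets priority 1, task 2 (period 250) gets priority 4.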
Before proceeding to intricacies of RMS, let us make some assumptions:
6.2.2 Assumptions
Let Ti be the period of the periodic task i.
Let Ci be the CPU time required by task i in each period.
Let Di be the deadline of task i. Initially, Di is assumed to be equal to Ti (i.e. the task should complete before the end of its period).
Now, let us define a useful term called the utilisation ratio (Ui):
Ui = Ci / Ti
Ui is defined as the ratio of the execution time Ci of task i to its period Ti. Obviously, the acceptable limit of Ui is 1.
Individual utilisation
For all i: Ci ≤ Ti
This is the first necessary condition. If a periodic task takes more time to complete than its own period, it cannot be scheduled (even if it is the only task in the system). If this is the case, then the processing power of the processor can be increased to reduce Ci, the algorithm can be improved or changed, or some part of the task can be moved to hardware.
Total utilisation
∑ (i = 1 to n) Ci / Ti ≤ 1
This rule states that the sum of all utilisation ratios cannot exceed 1 (the previous result said only that individual utilisations cannot exceed 1). If ∑Ui = 1, the CPU is 100% loaded and cannot take up any extra work. So, the sum of all utilisation ratios must be less than or equal to 1. Again, this is a necessary test and not a sufficiency test.
Now, let us discuss one more important concept, deadlines, before discussing the next conditions. A deadline is the point in time, measured from the arrival of an event, by which a task needs to complete its work. For now, let us assume that the deadline of a task is equal to its period, i.e. a task must complete its work before its next period begins.
Fig. 6.1 Period and deadline (an event occurs periodically at t = 0, T, 2T, 3T, 4T)
In the above figure, we see that an event occurs periodically with period T. The deadline is the time within which the response must be given, in our case the time by which the task must be accomplished. The deadline and the period of a task are two independent entities. For the sake of the following discussion, let us assume that Deadline = Period.
Consider the following set of tasks (the values are inferred from the discussion below; C and T are in the same time units):
Task  Period (T)  Execution time (C)
1     50          10
2     100         20
3     200         50
We have to make sure that the tasks are listed (as above) in the order of their rate monotonic priority. Let us draw a diagram that depicts the state of the system when all the tasks are started at the same instant (t = 0).
(Timeline figure: Task1, Task2 and Task3 being scheduled from t = 0.)
In a pre-emptive system, the ready task with the highest priority is active at any point of time. (There are other kinds of systems, where time-sharing and fairness scheduling policies are adopted*.)
*More later in this chapter.
Let us see how these three different tasks are scheduled in the system. Since RMS uses strictly pre-emptive scheduling, only the task with the highest priority is executed by the system at any time instant. We also assume the worst case situation: all the tasks are started simultaneously (at t = 0). Since task #1 (with T = 50) has the highest RMS priority, it is always executed whenever its period begins.
When task #1 finishes at t = 10, task #2 begins its execution. From the diagram, we see that task #1 gets scheduled exactly at t = 50, 100, etc., taking 10 units of CPU time each period.
Though all the tasks are ready to execute at t = 0, only the highest priority task is executed. After it completes, the task with the next higher priority is executed (task #2 with T = 100), which executes from t = 10 to t = 30 on the timeline.
Now, after task #2 completes, task #3 can execute. Task #3 requires 50 units of time to finish its job. It starts at t = 30 and would complete by t = 80 if it were uninterrupted. But at t = 50, the highest priority task, task #1, gets ready to execute. So task #3 gets pre-empted at t = 50. This is called a scheduling point. Task #1 completes at t = 60. Since task #2 is not ready to execute now, task #3 can continue, and it completes by t = 90. So, all the three tasks get scheduled.
Therefore, for any task #j to complete its execution, all the higher priority tasks together cannot take more than Tj − Cj of the time within the period Tj.
Please take a breath and read the above conclusion again before going to the explanation below. Any task j takes Cj time to complete. So, the time it can spare* for higher priority tasks is Tj − Cj.
(Figure: within a period Tj, task j starts after a phase delay P and then executes for Cj.)
Consider a task with period Tj. The task takes time Cj to execute. The execution of task j may not coincide with the beginning of the period unless it is the highest priority task; so there will be a phase delay P before the task starts executing.
For task j to complete, the phase delay can be at most Tj − Cj; otherwise it cannot complete before its next period starts.
Similarly, even if the phase delay P is less than (Tj − Cj), if higher priority tasks pre-empt task j for more than Tj − Cj in total, then task j cannot complete.
Consider two tasks i, j with Ti > Tj. Hence, the priority of task j is greater than that of
task i. Task j will pre-empt task i at most ⌈Ti / Tj⌉ times. (In our previous example, during
the period of task #3, task #1 is scheduled ⌈200/50⌉ = 4 times.)
So, the time taken by task j from task i is ⌈Ti / Tj⌉ * Cj. Summing over all the tasks j
of higher priority than task i:

    ∑ j=1..i−1  ⌈Ti / Tj⌉ * Cj ≤ (Ti − Ci)
To explain this, consider the previous diagram. Let us consider task T3 (Period =
200). T1 pre-empts it ⌈200 / 50⌉ = 4 times during the period of T3. So, the time
taken by task T1 during the period of T3 is:
⌈200 / 50⌉ * 10 = 40
Similarly, T2 pre-empts it ⌈200 / 100⌉ = 2 times, taking:
⌈200 / 100⌉ * 20 = 40
So, out of 200 units for T3 , 40 + 40 = 80 units have already been taken by higher
priority tasks. So, the time left for T3 = 200 – 80 = 120.
The time needed for T3 is 50 units, which can be comfortably accommodated in the
available 120 units. So, the set of tasks T1 , T2 and T3 are schedulable.
Introduction to Realtime Theory 139
This is the third sufficient, but not a necessary condition for schedulability.
Note
This is a pessimistic test because we take the ceiling operator (e.g. ⌈3 / 2⌉ = 2) while doing this calcu-
lation. It must be noted that a higher priority task, in this case T1, may not pre-empt T3 four times as
indicated in the calculations. In the example, T1 pre-empts T3 only once. This test does not consider the
execution time of T3; it only takes into consideration the period of T3. So, this is a pessimistic test.
After these three conditions are passed, we can move to the next test, which is also a
pessimistic test. The theorem on which the test is based is called the “Utilisation Bound
Theorem”.
Definition
Consider a set of n independent periodic tasks. They are schedulable (under the priority assign-
ment of RMS) if
    ∑ i=1..n  Ci / Ti ≤ n (2^(1/n) − 1)
Independent — Because for RMS, we assume that the tasks do not interact (neither syn-
chronisation nor communication) with each other.
Periodic — The tasks are periodic, i.e. they have a certain frequency. They are not
sporadic or event triggered.
Schedulable — All the tasks can meet their deadlines.
Since the proof is not as simple as the ones described before, it is beyond the scope
of this book.*
The expression on the RHS of the equation above may look pretty complicated. At
first glance, one may feel that evaluating this expression is time consuming.
*We feel that engineers need to remember the basic theory behind any topic and, most importantly,
remember the results rather than spend huge efforts in proving them. We are not suggesting that one
should blindly apply the results, but one shouldn't be engrossed only in the academics.
140 Embedded Realtime Systems Programming
If we carefully look at the expression, we can see that, for a given number of tasks n,
the RHS is a constant. So, we can make a lookup table for various values of n.
Whenever we are required to do performance analysis, we can create a table like
the one below:
[Table: for each task i, the columns are Ci, Ti, Ui = Ci/Ti, the cumulative utilisation Ui (cum), and the bound UB(i)]
The data we have at the start is usually Ci and Ti only. The fourth column is Ui which
defines the utilisation ratio of the task to the available CPU time. Ui (cum) as the name
suggests is the cumulative utilisation of the tasks so far. For e.g., if we consider only the
two tasks, the total utilisation is 0.667. Ui (cum) should not exceed the limit set by UB the-
orem. In this case, we see that by addition of the third task, Ui (cum) becomes greater
than UB(i). So, these three tasks are not schedulable according to UB theorem.
We should remember that UBT is also a pessimistic test. (Actually, the set of three
tasks is indeed schedulable.) We may have to perform more tests (like the Response
Time test) to ascertain if the set of tasks is schedulable. Readers are encouraged to delve
deeper into realtime theory. (Some recommended books are given in the Bibliography
section at the end.)
Real time systems can be classified in two ways: first, based on their speed (fast and
slow); second, based on the criticality of their deadlines (hard and soft). There can even
be an overlap between these two classifications. Real time theory deals with assessing
the schedulability of multiple concurrent tasks with given deadlines and priorities
in such a way that all deadlines are strictly met. In this chapter, we used a simplified
version of Rate Monotonic Analysis in order to arrive at a mathematical formula for
their schedulability. The utilisation bound theorem, in its simplest form, gives a formu-
la of schedulability for independent periodic tasks.
7.1 INTRODUCTION
The term realtime operating system (RTOS) usually triggers an image of a highly
complicated OS in the minds of a programmer who is new to embedded systems. It is
usually felt that only experts can program with an RTOS. This is totally untrue. In real-
ity, an RTOS is usually not such a complex piece of software (in comparison to some
of the mammoth-sized OSes* currently available). Though current RTOSes provide a
huge variety of features, a basic RTOS is just small enough to provide some scheduling,
memory management and a decent level of hardware abstraction.
A basic RTOS typically provides:
■ Scheduling
■ Intertask communication/synchronisation
■ Memory Management
■ Timers
■ Support for ISRs
*Nowadays, the embedded (not necessarily realtime) OSes are as big and complex as their desktop
counterparts. For example, Windows XP is available as an ‘embedded’ version.
Many good RTOSes also provide support for protocols like TCP/IP and some appli-
cations like telnet, tftp etc.
In a desktop development environment, a programmer opens his IDE or favourite
editor and types in his code. Then, he builds it using his compiler and executes
his program. The point to be noted is that the OS is already running and it ‘loads’ the
executable program. The program then makes use of the OS services. The OS takes
care of scheduling it (sometimes hanging in the process ☺). When the program com-
pletes, it exits. The OS can run other programs even while our program is running.
The important point to observe is that programs have a definite exit point and pro-
grams that contain infinite loops are considered bad. The following diagram (Fig. 7.1)
describes the steps during the lifetime of a desktop OS:
[Fig. 7.1: Lifetime of a desktop OS — Power-up → OS Initialisation → OS Ready → Load Program #1, #2 … → Program #1 exits, #2 exits … #4 starts]
[Figure: Startup of an embedded system — Power-up → Decompression/Bootloading → Hardware Init → BSP/RTOS Initialisation → Application Code Begins]
*With even Windows XP getting embedded, the barriers are slowly decreasing.
In this case, we can see that there is no OS during the startup. There is only a single
executable that contains:
1. OS Code
2. Board Support Package (BSP)
3. Application code
BSP is a part of the OS code in the sense that it is used by the OS to talk to the different
hardware on the board. We should remember that the software that we write runs on
the target board.
The OS vendor has no idea of how the board is organised. He does not know the
components of the board in most cases. (For e.g., which Ethernet controller? Which
UART controller? etc.)
In some cases, the software is developed over some standard boards that can be
bought off the shelf. If the board is quite common, then the OS vendors may provide
BSP for those boards. But, if the board is custom built, we have to write a BSP for the
board.
The OS needs to interact with the hardware on the board. For e.g., if we have a TCP/IP
stack (provided by an OS vendor) integrated with our software, it must ultimately talk to
the Ethernet controller on our board to put the data on the networking medium.
The TCP/IP package has no clue about the Ethernet controller used (say 3Com or Intel),
its location, or ways to program it. Similarly, if there is a debug agent that runs as a
component in the OS and wants to use the serial port, then we should provide the
details and the driver for the UART* controller used in our board.
*Universal Asynchronous Receiver/Transmitter.
Realtime Operating Systems 147
Note
BSP is a component that is used to provide board/hardware specific details to the OS, for the OS to
provide hardware abstractions to the tasks that use its services.
We should understand that the BSP is highly specific to both the board and the RTOS for
which the BSP is written.
The BSP/startup code will be the first to be executed at the start of the system. The BSP
code usually does the following:
■ Initialisation of processor (Usually, processors can operate in various modes. BSP
code initialises the mode of operation of the processor. Then, it sets various
parameters required by the processor.)
■ Memory initialisation
■ Clock setup
■ Setting up of various components such as cache
Note
BSP engineers usually assume that the hardware is correct. But these days, as hardware complexity and
the pressures on engineers grow by the day, hardware gets as buggy as software. (We are not beaming
with pride when we say this. Some day we'll be able to produce cleaner implementations of hardware
and software.) So, it is a good idea to search for the errata/known bugs section in the hardware
documentation rather than waste an inordinate amount of time searching for the bug in the software.
Now, let us see the various components of an RTOS and how these functionalities
are achieved.
7.4.1 Tasks
The concept of a task is fundamental to understanding an RTOS. A task can be defined
as an atomic unit of execution that can be scheduled by the RTOS to use the system
resources.
Analysis
‘Atomic Unit’ — A task is considered as an atomic unit because, any other entity small-
er than a task cannot compete for system resources.
‘Scheduled ’ — In a system, there could be many tasks competing for system resources.
The duty of the RTOS is to schedule the tasks such that their requirements, and thereby
the system objectives, are met.*
‘System Resources’ — All tasks compete for resources like CPU, memory, input /output
devices etc.
A task first needs to be ‘created ’. A task is usually characterised by the following
parameters: (sample parameters are given in brackets)
1. A task name (“TxTask”)
2. Priority (100)
3. Stack size (0x4000)
4. OS specific options
*What this means is that, some task can be blocked for a long period of time and another can use the
same resource for a long time. There is no question of fairness as long as the system objective is met.
In an aeroplane, a task that controls the air-conditioning system is less important than a task that con-
trols the flight.
These parameters are used to create a task. A typical call might look like the one in
Listing 7.1.
// (generic pseudo-call; the actual signature is OS specific)
result = task_create( "TxTask", 100, 0x4000, options );
if (result == OS_SUCCESS) {
// task successfully created…
}
Listing 7.1: Task creation
Now, the task can be considered to be in an embryonic state (Dormant). It still does
not have the code to execute. But by now, a task control block (TCB) would have been
allocated by the RTOS.
A task can be in one of the following four states:
1. Dormant
2. Ready
3. Running
4. Blocked
❑ Dormant When the task is created, but not yet added to the RTOS for scheduling.
❑ Ready The task is ready to run, but cannot do so currently because a higher
priority task is being executed.
❑ Running The task is currently using the CPU.
❑ Blocked The task is waiting for some resource/input.
[Fig. 7.4: Task state diagram showing transitions between Dormant, Ready, Running and Blocked]
The stages of a task and its transitions are illustrated in Fig. 7.4.
i. When a task is created, it is in a ‘Dormant’ state
ii. When it is added to the RTOS for scheduling, it usually arrives in the ready
state. But, if it is the highest priority task, it could begin executing right away.
iii. When a task is running and if another higher priority task becomes ready, the
task that is running is pre-empted and the highest priority task is scheduled for
execution.
iv. During the course of execution of a task, it may require a resource or input. In
this case, if the resource/input is not immediately available, the task gets blocked.
Note
What happens if a lower priority task creates a higher priority task? The creator gets pre-empted
immediately. So, the root task that creates all the tasks should typically have the highest priority; if it
must create any task with a priority higher than its own, it should create that task last.
Other transitions
There are transition stages in the execution of the tasks. Some of them are as follows:
■ Ready to running : For a particular task, if all the higher priority tasks are blocked,
then the task is scheduled for execution. It then changes its state from ready to
running.
■ Blocked to ready : When a higher priority task releases a resource required by a
lower priority task, then the lower priority task cannot begin execution. The high-
er priority task will continue to run. But, the state of the lower priority task will
change from ‘blocked’ to ‘ready’.
■ Blocked to running : Sometimes, it could happen that a higher priority task is
blocked on some resource/input. When that resource is available again, then the
task begins execution, pre-empting the lower priority task that was executing.
P2P
Impossible transition: Of the transitions considered above, the transition “Ready to Blocked” is not
possible. A task can block itself only when it is running. So, this transition is not possible.
An idle task has no system calls… in fact no code except an infinite loop. The
idle task in an RTOS is the task with the lowest priority. And many RTOSes reserve a
few lowest and highest priorities for themselves. For e.g., if an RTOS can provide
256 priorities, it may reserve the lowest 10 and the highest 10, leaving the user with 236
priorities in the range (10 – 245).
CPU loading
Though an idle task does nothing, we can use it to determine the CPU loading — the
average utilisation ratio of the CPU. This can be done by making the idle task write
the system clock into some memory location whenever it gets scheduled. So, an idle task
need not be ‘idle ’ after all.
P2P
Is an ISR also a task?
The answer is no. A task is a standalone executable entity. An ISR is a routine that is called by the
system in response to an interrupt event. (However, some newer RTOSes model ISRs as high priority
threads that are schedulable by the OS kernel.)
Pros:
Once the priorities are set properly, we can rest assured that only the important things
are handled first.
Cons:
It is possible that one or more of the lower priority tasks do not get to execute at all. So,
to avoid this, a proper analysis should be done in the design phase.
Note
There could be a slight deviation in the implementation of this scheduling algorithm. In almost all sys-
tems, ISRs will have highest priority irrespective of the priorities assigned to the tasks. This is usually
necessary too. So, this deviation from the normal behaviour is acceptable.
Time slicing
In this kind of scheduling policy, the CPU time is shared among all the tasks. Each
task gets a fraction of the CPU time. There is no notion of priority here. This scheduling is
also known as round robin scheduling. It is rarely used in its original form.
However, it can be used in conjunction with pre-emptive scheduling: in a pre-
emptive system, if two or more tasks have the same priority, we can make the scheduler
use time slicing for those tasks.
Pros:
i. No need for complex analysis of system
ii. This kind of kernel is relatively easy to implement
iii. The pre-emption time of a task is deterministic i.e. if a task is pre-empted, we
will know exactly the time after which the task will be scheduled (if the
number of tasks in the system do not vary with time)
Cons:
This is a very rigid scheduling policy (and that is exactly what it is meant to be — there
is no notion of priority).
Fairness scheduling
In this kind of scheduling, every task is given an opportunity to execute. Unlike pre-
emptive scheduling, in which a lower priority task may never get an opportunity to
execute, here every task will be given a ‘fair ’ chance. Though some kind of priority
mechanism could be incorporated, it is not strict: the priority of a task that has not
executed for some period will gradually be increased by the RTOS, so it will finally
get a chance to execute.
This scheduling policy is complex (how to vary priority of tasks in such a way as to
achieve fairness?). And, it does not fit right in realtime systems.
This kind of scheduling is widely available in desktop OSes. (We still listen to music
while compiling our programs.)
Pros:
i. Every task will get an opportunity to execute
Cons:
i. Introduces nondeterminism into the system
This situation is different because there is a shared region between the roads. So,
traffic on the two roads needs explicit synchronisation. Another point to be observed is
that a traffic signal is required only at the region of intersection. There is no need for this
synchronisation either before or after this region.
For e.g., consider that two tasks want to share a printer. Let task A want to print the
sequence
1 2 3
and task B the sequence
A B C
If these tasks are scheduled in a round robin (time slicing) method, then the printout
appearing on the paper could be
1 2 A B 3 C (or any other junk)
It is also possible that the output is perfect because, the two tasks had sufficient time
to queue their requests.
But, since such an error situation could arise, and can cause undesirable output from
the system, this problem should be addressed. The immediate solution that occurs to
mind is that one of the tasks can acquire the printer resource, use it and then release it.
To implement this solution, we need to use a mutex — a short and very common
name for “Mutual Exclusion”.
As the name indicates, it is a mechanism to exclude other tasks from using a resource
when a specific task has acquired it.
For e.g., task A can be coded as:
// Task A code
// . . .
mutex_acquire( printer_mutex );
print( 1 );
print( 2 );
print( 3 );
mutex_release( printer_mutex );
// Task B code
// . . .
mutex_acquire( printer_mutex );
print( 'A' );
print( 'B' );
print( 'C' );
mutex_release( printer_mutex );
At any point of time, if both the tasks want to use the printer, they first try to acquire
the mutex. Since we are considering only a single-processor model, the task which
makes the first attempt will acquire it.
Let us consider a case where task A has acquired the printer_mutex. (Refer
Listing 7.5.)
// Task A code
// . . .
mutex_acquire( printer_mutex );
print( 1 );
// <-- Task A is pre-empted here
print( 2 );
Listing 7.5: Task A pre-emption
Let us now consider that the task B has a higher priority and it gets scheduled after
print( 1 ). And now, let task B also want to print something. It will now try to acquire
the printer_mutex. But it cannot, since task A has already acquired the mutex.
// Task B code
// . . .
mutex_acquire( printer_mutex );   // <-- Task B blocks here
print( 'A' );
print( 'B' );
Task B will now be blocked. (It is not necessary that task B block; a task can also
choose a non-blocking acquire. Here we say B is blocked on the resource.)
Since task B is blocked, task A gets to resume again and completes its printing. It then
releases the mutex. Now, task B can resume and continue with its printing.
[Fig. 7.7 Task dynamics: task A acquires the mutex; task B blocks on the mutex; task A resumes, finishes printing and releases the mutex; task B then runs]
We should remember that since task B is a higher priority task, the execution would
shift to task B immediately after task A releases the mutex.
Consider the following code of task A:
// . . .
print ( 3 );
mutex_release ( printer_mutex );
my_foo(); // some other function called from task A
Listing 7.7: Mutex release by task A
In a truly pre-emptive system, the execution will be transferred to task B immedi-
ately after the execution of mutex_release. The call my_foo() will be executed only
when task A is scheduled again.
Task A                          Task B
mutex_acquire
// . . .
                                // . . .
                                mutex_acquire // (gets blocked)
print ( 3 );
mutex_release
                                print ('A');
                                // . . .
                                mutex_release
// . . .
myFoo();
The above figure describes how execution is transferred between tasks A and B dur-
ing a sample run.
Pseudocode for the entire program is given in Listing 7.8.
#include <my_os.h>
int main()
{
// . . .
task_create ( TaskA );
task_create ( TaskB );
// . . .
}
Listing 7.8: Source file for Task A and B with mutex sharing
Mutexes are also required when two tasks share data using global variables.
Let us consider a case where two tasks are writing into contiguous memory locations
and another task uses the values produced by the two tasks. In concurrent program-
ming parlance, the two tasks that generate the values are called ‘producers ’ and the
task that uses them is called the ‘consumer ’.
[Figure: two producers P1 and P2 writing through a pointer into a buffer read by the consumer C]
“This is the second time I have written you, and I don't blame you for not answering me, because I
kind of sounded crazy, but it is a fact that we have a tradition in our family of ice cream for dessert
after dinner each night. But the kind of ice-cream varies so, every night, after we've eaten, the whole
family votes on which kind of ice-cream we should have and I drive down to the store to get it. It's
also a fact that I recently purchased a new Pontiac and since then my trips to the store have created a
problem.”
“You see, every time I buy vanilla ice-cream, when I start back from the store my car won't start. If I
get any other kind of ice-cream, the car starts just fine. I want you to know I'm serious about this ques-
tion, no matter how silly it sounds: 'What is there about a Pontiac that makes it not start when I get
vanilla ice-cream, and easy to start whenever I get any other kind?'”
The Pontiac President was understandably skeptical about the letter, but sent an engineer to check it
out anyway. The latter was surprised to be greeted by a successful, obviously well-educated man in a
fine neighbourhood. He had arranged to meet the man just after dinner time, so the two hopped into
the car and drove to the ice-cream store. It was vanilla ice-cream that night and, sure enough, after
they came back to the car, it wouldn't start.
The engineer returned for three more nights. The first night, the man got chocolate. The car started.
The second night, he got strawberry. The car started. The third night he ordered vanilla. The car failed
to start.
Now the engineer, being a logical man, refused to believe that this man's car was allergic to vanilla
ice-cream. He arranged, therefore, to continue his visits for as long as it took to solve the problem.
And toward this end he began to take notes: he jotted down all sorts of data, time of day, type of gas
used, time to drive back and forth, etc.
In a short time, he had a clue: the man took less time to buy vanilla than any other flavour. Why? The
answer was in the layout of the store.
Vanilla, being the most popular flavour, was in a separate case at the front of the store for quick pick-
up. All the other flavours were kept in the back of the store at a different counter where it took consid-
erably longer to find the flavour and get checked out.
Now the question for the engineer was why the car wouldn't start when it took less time. Once time
became the problem — not the vanilla ice-cream — the engineer quickly came up with the answer:
vapour lock. It was happening every night, but the extra time taken to get the other flavours allowed
the engine to cool down sufficiently to start. When the man got vanilla, the engine was still too hot for
the vapour lock to dissipate.
To avoid this problem, shared global variables must be used only with
synchronisation.
There is another interesting solution called the Peterson’s solution described in the
sidebar on keyword volatile.
Priority inversion is one of the issues that must be addressed during the analysis and
design of realtime systems.
We discussed that, in a pre-emptive system, at any point of time, only the task with
the highest priority executes. But if, for some reason, a higher priority task is
blocked because of some lower priority task, then a ‘Priority Inversion ’ is said to have
occurred.
It can happen in two ways:
❑ Bounded priority inversion
❑ Unbounded priority inversion
Let us discuss them one-by-one.
[Fig. 7.10 Bounded priority inversion: TB acquires the mutex and is pre-empted; TA blocks on the mutex; TB completes its critical section and releases the mutex; TA then acquires the mutex and continues]
Initially, let TA be executing and after sometime, TA gets blocked and TB scheduled.
Now, let TB acquire a mutex corresponding to a resource shared between TA and TB.
After sometime, before TB gets to finish its critical section code, TA gets scheduled (since
TA’s priority is higher).
After sometime, TA tries to acquire the mutex for the resource shared between TA and
TB . But, it cannot acquire the mutex because, it has already been acquired by TB .
Because of this TA is blocked. Now, TB runs till it completes its critical section code and
releases the mutex. Once the mutex is released, TA begins execution.
Here, we see that TA gets blocked for a period because the lower priority task TB
holds a shared resource. So, in this case a priority inversion is said to have
occurred.
Now, let us look at the question, “How long is TA blocked?” The answer is: in the worst
case, TA will be blocked for a period equal to the critical section of TB (i.e. if TB is pre-
empted immediately after acquiring the mutex).
// Task B Code
mutex_acquire( my_mutex );   // <-- TB is pre-empted here
// Critical Section code
mutex_release( my_mutex );
Here we see that the period for which the priority inversion occurs is ‘bounded ’. The
worst case is that the priority inversion lasts for a period equal to TB’s complete
critical section. So, this is called ‘bounded priority inversion’.
In summary, a ‘bounded priority inversion’ is said to occur when a higher priority
task is blocked for a deterministic period of time within a limit (bound).
*A system with any number of tasks > 2 can be used. To make the illustration of unbounded priority
inversion easier, a system with 3 tasks is considered.
Let the three tasks in the system be Ta, Tb, Tc in decreasing order of priority (Ta has
the highest priority).
[Fig. 7.12 Unbounded priority inversion: Tc acquires the mutex and is pre-empted; Tb gets scheduled; Ta blocks on the mutex acquired by Tc; only after Tb completes can Tc release the mutex, letting Ta continue]
Initially, let us assume that the highest priority task (Ta ) is running and gets blocked
(Refer Figure 7.12). Now Tc starts running. (Assuming Tb is also blocked because of some
reason). The task Tc acquires the mutex for the resource shared between Ta and Tc and
enters the critical region.
Now, Tc gets pre-empted by Tb, which gets pre-empted in turn by task Ta.
After sometime, Ta tries to acquire the mutex for the shared resource. But Tc had
already taken the mutex. Once Ta gets blocked, Tb starts running. Now, Tc remains
pre-empted and cannot release the mutex.
Unlike the previous case, we cannot say how long it will be before the lower priori-
ty task releases the resource needed by higher priority task.
We’ll have to wait for the intermediate priority task(s) to complete before the lower
priority task will release the resource. So, this is called an ‘unbounded priority
inversion’.
oldPriority = task_getPriority();  // returns the priority of the current task
newPriority = R1_PRIORITY;         // ceiling priority associated with resource R1
task_setPriority ( newPriority );  // boost own priority to the ceiling
// Now, use the resource
task_setPriority ( oldPriority );  // restore own priority
Listing 7.9: PCP without OS support
Say, if T3 wants to use R1, it sets its priority to 1 and then accesses the resource. Now,
task T1 cannot pre-empt T3 because T3’s priority has been raised to 1. After using the
resource, the task restores its own priority.
The advantage of this scheme is that it does not require any explicit support from the
RTOS. Another significant change is that no mutex/semaphore is required; we
just use the priority changing mechanism provided by the RTOS.
However, many RTOSes help by adding support for PCP in their mutexes. Whenever
a mutex is created, it is associated with a corresponding priority. Any task that takes the
mutex will take up the priority associated with the mutex. The greatest advantage of this
is that the boosting and restoring of priority is automated.
This method has the following disadvantages:
i. This method is manual i.e. the priority is associated with the resource
manually. So, in case of large systems, maintaining priorities associated with
resources can be error prone.
ii. Manual set/reset of priorities: In its original form (i.e. without mutexes), if tasks
do not reset their priority after using the resource, it could cause havoc. In
this aspect, using the PCP provided by the RTOS with mutexes is a preferable
way of using PCP.
iii. Time-slicing not allowed: While using PCP, we have to adopt only a strict
pre-emptive scheduling policy. PCP will fail if we mix pre-emptive and
time-slicing.
It is now up to the designers to use either or none or both of the protocols appropri-
ately for their system after weighing the benefits of all the schemes.
Mars Rover
The concepts like priority inversion and preventive measures described earlier are not
limited to academic research. They can and should be applied to solve practical issues.
The importance of such analyses, once believed to be a cosmetic addition to the analysis of sys-
tems, came to light during the Mars Rover failure. Mars Rover was a project by NASA
to explore the surface of Mars. It consisted of two major blocks: the landing software
and the land mission software. The landing software was very critical because any fault in
it would make the rover crash on the Mars surface. The land mission software was
used by the controller to analyse the environment on Mars, collect data and transmit
it back to earth.
The landing of the Rover was perfect. But, later, during the execution of the land mis-
sion software, the system started resetting itself mysteriously. So, data could not be col-
lected and sent to earth.
Usually, there are two versions of any software: the debug and release versions. The
debug versions are used during the test phases of the software. The executable in this
case is usually big because it contains debug information also. After testing, when the
software is declared bug-free (!), a release version is made. This version is much leaner
than the debug version, but contains no information for the programmer if the system fails.
Fortunately, the land mission software in the Rover was the debug version. The
RTOS used was VxWorks™, which offered features like saving the collection of events
till the system was reset. After a long analysis the bug was found. (Actually, it was very
hard to reproduce the bug on earth. The engineers had left for the day, but one
engineer stayed back and managed to reproduce the problem.)
There was an information bus, which was to be used by a high priority task. Another
low priority task also required the bus. (The bus was a shared resource.) Hence, a mutex
was used to implement task synchronisation. Whenever the reset occurred, the lower
priority task had acquired the mutex and was later pre-empted by the higher priority
task. But the higher priority task could not use the bus because the mutex was held
by the lower priority task. In between, an intermediate priority task used to run,
pre-empting the lower priority task. So, the higher priority task could not get the mutex.
Meanwhile, the system had a watchdog timer. As indicated before, a watchdog timer
is used to reset a system if it ‘hangs ’ for some time. Here, the watchdog timer noticed that
the high priority task could not access the bus for a long time and hence reset the entire
system.
The mutexes, when created, had been created as 'plain vanilla' mutexes. So, it was decided to enable the priority-inheritance feature of the mutex. Then, remotely, using one of the debug support features of VxWorks, the priority-inheritance flag was set to true instead of false when the mutexes were created, and the problem never happened again.
So, realtime theory finds practical uses too.
Fig. 7.13 A Reservoir with 4 Pumps
A task can request a number of tokens (up to the maximum number of tokens) from the semaphore. When the requested number of tokens is available, the token count of the semaphore is decremented and the tokens are given to the task. If the requested number of tokens exceeds the number of available tokens, the task blocks. (Some RTOS' can provide the options of blocking, non-blocking or blocking with a timeout.) Once some other task releases enough tokens, the blocked task is released.
Types of Semaphores: If multiple tasks are blocked on a semaphore, and some tokens are released, which task must acquire the tokens? The answer is that the RTOS usually provides options to configure this.
There could be a FIFO (First In First Out) arrangement, where the task that blocked first on the semaphore is given the tokens. If the semaphore is created with the priority option set, the tokens are acquired by the blocked task with the highest priority.
Illustration: Consider a system with 3 tasks, A, B and C, where A has the highest priority and C the lowest. Let us also assume it is the system that tries to control the 4 pumps in the reservoir. Let task B first acquire 3 pumps (tokens).
semaphore_acquire( pump_semaphore, 3 ); // Task B
Only one token is now left. Let task C then request 2 tokens.
semaphore_acquire( pump_semaphore, 2 ); // Task C
Since the required number of tokens is not available, task C will block. (As indicated with mutexes, a task can also choose not to block or to block with a timeout.) Let us assume task A also requires 3 tokens, and blocks as well.
semaphore_acquire( pump_semaphore, 3 ); // Task A
Let task B execute, complete its critical section and release its three tokens.
semaphore_release( pump_semaphore, 3 ); // Task B
Now, if the pump semaphore was a FIFO semaphore, then task C will acquire its two tokens, since it had blocked first on the semaphore. But, if the pump semaphore was a priority semaphore, then task A would have acquired its 3 tokens, since it has a priority higher than that of C.
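The counting-semaphore behaviour described above can be sketched in C on top of a mutex and a condition variable. The names (`csem_init`, `csem_acquire`, etc.) are ours, not any particular RTOS API, and a real FIFO or priority semaphore would additionally need an ordered wait queue, which is omitted here:

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             tokens;   /* currently available tokens */
} csem_t;

void csem_init(csem_t *s, int initial_tokens)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->tokens = initial_tokens;
}

/* Block until 'n' tokens are available, then take them. */
void csem_acquire(csem_t *s, int n)
{
    pthread_mutex_lock(&s->lock);
    while (s->tokens < n)               /* not enough tokens: block */
        pthread_cond_wait(&s->cond, &s->lock);
    s->tokens -= n;
    pthread_mutex_unlock(&s->lock);
}

/* Non-blocking variant: returns 0 on success, -1 if it would block. */
int csem_tryacquire(csem_t *s, int n)
{
    int ok;
    pthread_mutex_lock(&s->lock);
    ok = (s->tokens >= n);
    if (ok)
        s->tokens -= n;
    pthread_mutex_unlock(&s->lock);
    return ok ? 0 : -1;
}

void csem_release(csem_t *s, int n)
{
    pthread_mutex_lock(&s->lock);
    s->tokens += n;
    pthread_cond_broadcast(&s->cond);   /* wake all waiters to re-check */
    pthread_mutex_unlock(&s->lock);
}
```

With four pump tokens, task B's `csem_acquire(&pumps, 3)` leaves one token, so task C's request for two tokens blocks until B releases. Note that `pthread_cond_wait` wakes waiters in an unspecified order, so this sketch by itself guarantees neither FIFO nor priority ordering.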
Binary Semaphores: We know that a semaphore can take on any non-negative integral value at any point of time. If a semaphore is created with maximum value = 1, then its value will toggle between 0 and 1.
Hence, such a semaphore is called a binary semaphore. A binary semaphore has properties very similar to a mutex. Such a semaphore can be used where no explicit mutex feature is provided by the RTOS.
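With POSIX semaphores, for instance, a binary semaphore can guard a critical section much like a mutex: initialise the count to 1, wait on entry, post on exit. A minimal sketch (the `shared_counter` functions are our illustrative example, not a standard API):

```c
#include <assert.h>
#include <semaphore.h>

static sem_t bin_sem;          /* binary semaphore used as a mutex */
static int shared_counter = 0; /* the shared resource */

void shared_counter_init(void)
{
    /* pshared = 0 (threads of one process), initial value = 1 */
    sem_init(&bin_sem, 0, 1);
}

void shared_counter_increment(void)
{
    sem_wait(&bin_sem);        /* acquire: value goes 1 -> 0 */
    shared_counter++;          /* critical section */
    sem_post(&bin_sem);        /* release: value goes 0 -> 1 */
}

int shared_counter_get(void)
{
    int v;
    sem_wait(&bin_sem);
    v = shared_counter;
    sem_post(&bin_sem);
    return v;
}
```

One caveat when substituting a binary semaphore for a mutex: a semaphore has no notion of an owner, so it cannot offer ownership-based features such as priority inheritance.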
The story of semaphores
Dijkstra was once thinking hard about task synchronisation, about tasks synchronising before entering their critical sections.* There was a railway station in front of his house, from where he could see lots of trains waiting for their signals (obviously, there were more trains than there were platforms). He saw that signals were used to control the movement of trains.
Fig. 7.15 Train can proceed
Fig. 7.16 Train should wait
Now, Dijkstra could see that the station/platforms were similar to shared resources and the trains were similar to tasks. The trains wanted to use the platforms just as the tasks wanted to use the shared resources. The one thing missing in the RTOS was the signal pole. So, he added it to the software system. He first used the term 'seinpaal' (signal-pole in Dutch). When applied specifically in the parlance of trains, it becomes 'semafoor'* in Dutch.
Nowadays we use terms like acquiring and releasing semaphores (semaphore_acquire, semaphore_release), but older documentation (and even some newer ones) will have APIs like semaphore_p, semaphore_v (or P(s), V(s)).
Almost every OS book will define P and V operations on a semaphore. The letter 'P' is taken from the Dutch word 'proberen', which means 'to try' (to acquire the semaphore). The letter 'V' is taken from the Dutch word 'vrijgeven', which means 'to release' (the semaphore).
We guess that the complex Dutch words were reduced to one-letter acronyms by distraught non-Dutch programmers, to avoid typing complete Dutch words.
We hope you are not surprised to suddenly come across theory on how an RTOS works, midway through the chapter. We had earlier discussed the differences between a desktop OS and an RTOS. There is no separate OS running that can be used to load and execute programs; there is only a single executable file that includes both the application code and the OS code.
The working of an RTOS is a mystery for novice programmers. In this section we discuss how scheduling works in an RTOS.
As discussed in the 'Build Process' chapter, any program can be split into two components:
i. Code
ii. Data
Code is non-modifiable, i.e. the contents of the text section do not change, while the contents of the data section of the program keep changing as the system runs. Initially, the entire application is stored in the ROM/Flash.
*Truly, no prizes will be given for guessing that 'semaphore' is derived from 'semafoor'.
After a program is set up and executing, the lower memory sections are usually required by the RTOS/hardware for board-specific software and interrupt vector table entries. This is followed by the code section. Since the code is non-modifiable, we can choose to keep it in ROM (provided it is not compressed and the slower access time of ROM is not a problem).
Each task then requires a stack to create its local variables and store the arguments
passed to functions. Each task is associated with its own stack of memory.
[Figure: memory map with a per-task stack region, T1 Stack and T2 Stack]
Note that the stack grows downwards. In basic RTOS', there is no memory protection. Unix programmers will be familiar with core dumps whenever there is an illegal memory access; an RTOS does not have such features.* In some cases, the stack of a task can grow beyond its limits. For example, if the stack size of T1 is insufficient, the stack may overflow into the T2 stack area. This will corrupt the values stored by task T2 in its stack. We will never know that this has happened until the system does something interestingly out of the normal. ☺
So far, we saw how tasks come to life and execute. The next part is to know how the
OS works. First, let us consider a strictly pre-emptive system. (And, let us restrict
the discussion only to a uniprocessor system). Let us also consider that some task is
running.
*Statutory Warning: In computing parlance it is true that 'If it works, it is obsolete'. We recently saw a demo of an RTOS with all these features, so this statement can become antiquated. (We are bored of saying 'usually does not have' and 'typically does not happen' to avoid the prying eyes of our critics.)
174 Embedded Realtime Systems Programming
The OS is scheduled only when a task executes a system call. System calls are the RTOS APIs, linked with the application, for task creation, semaphore/mutex operations, etc.
In a strictly pre-emptive system, a task can change its state (running, ready, blocked) only by executing a system call. The only way a running task can block itself is by executing a system call, typically a semaphore/mutex operation.
For example, a task could block if it requests a semaphore or a mutex that is not available. If a low priority task releases a semaphore on which a higher priority task is blocked, then the higher priority task becomes ready and the lower priority task is pre-empted.
Conceptually, a semaphore release implementation could be
semaphore_release( /* arguments */ )
{
    release_tokens_if_someoneelse_is_blocking();
    invoke_scheduler();
}
Fig. 7.18: RTOS Scheduler invoked from a System Call
The invoke_scheduler function should conceptually be a part of every system call. This call should check if any other task needs to be scheduled, by updating and checking the state of all tasks. If some other task needs to be scheduled, the scheduler will save the context (registers) of the running task to its TCB, restore the context of the task that needs to be scheduled from its corresponding TCB, and schedule the new task.
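The decision the scheduler makes can be sketched as follows. This is a toy, strictly priority-based scheduler; the `tcb_t` layout, the task table and the omitted context-switch primitive are our inventions, not any specific RTOS:

```c
#include <assert.h>

#define MAX_TASKS 8

typedef enum { TASK_READY, TASK_RUNNING, TASK_BLOCKED } task_state_t;

typedef struct {
    task_state_t state;
    int          priority;      /* larger number = higher priority */
    /* saved register context would live here in a real TCB */
} tcb_t;

tcb_t task_table[MAX_TASKS];
int   current_task = -1;

/* Pick the highest-priority runnable task. Conceptually called at the
 * end of every system call (and from the tick ISR, if time slicing
 * is enabled). Returns the index of the task now running. */
int invoke_scheduler(void)
{
    int best = -1;
    for (int i = 0; i < MAX_TASKS; i++) {
        task_state_t st = task_table[i].state;
        if ((st == TASK_READY || st == TASK_RUNNING) &&
            (best < 0 || task_table[i].priority > task_table[best].priority))
            best = i;
    }
    if (best >= 0 && best != current_task) {
        /* context switch: save the registers of the current task to
         * its TCB, restore those of 'best' from its TCB (omitted) */
        if (current_task >= 0 && task_table[current_task].state == TASK_RUNNING)
            task_table[current_task].state = TASK_READY;
        task_table[best].state = TASK_RUNNING;
        current_task = best;
    }
    return current_task;
}
```

If a semaphore release makes a blocked high priority task READY, the next call to `invoke_scheduler` switches to it, exactly as described above.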
Another way a task gets scheduled is through an interrupt, when the corresponding ISR executes a system call. For example, if some data arrives in the system and the ISR posts the data (using a system call) to a higher priority task, then, after the ISR completes, the higher priority task will be scheduled.
The important points to be noted are as follows:
■ Task switching does not happen immediately after the ISR executes a system
call. It happens only after the ISR completes.
■ Once the ISR completes, it need not return to the task that was running when
the ISR was triggered.
[Figure: tasks TA and TB. A system call executed in the ISR invokes the scheduler; after the ISR completes, task TB runs instead of TA]
Another way of scheduling is time slicing. Here, the OS initialises an ISR called the 'tick' ISR that gets executed whenever a tick (a fixed period of time) occurs. The scheduler could be invoked in the tick ISR. Usually, an RTOS will initialise a tick routine anyway for implementing RTOS timers; this tick ISR can also be used for scheduling.
So far, we have seen how different tasks synchronise to access shared resources. This
section describes how these tasks communicate. Two mechanisms that are provided by
various RTOS’ are
❑ Message Queues
❑ Signals/Events
In addition, we will discuss the following mechanisms that can be used in specific situations (not recommended normally!):
❑ Function calls
❑ Accessing of variables
Message queues are used to pass data, while signals /events are used to signal
other tasks.
Message queues act as buffers between two tasks. A task need not consume data passed by another task immediately, i.e. the processing need not be synchronous. (Whenever synchronous processing is required, it is better to use a function call mechanism.)
To use a message queue, the first step is to create one. Creation of a queue returns a queue ID. So, if any task wants to post a message to another task, it should use that task's queue ID.
qid = queue_create ( "MyQueue",       // Some queue name
                     QUEUE_OPTIONS ); // OS Specific Options
Each queue can usually be configured for fixed-size or variable-size entries. Most RTOS' will provide, at the least, fixed-size entries.
Though the queue post operation varies across RTOS', the idea is usually to pass a pointer to some data. This pointer will point to a structure previously agreed upon between the tasks that use the queue.
Say there are two tasks that communicate using queues. The transmitting task wants to send a message my_message to the receiving task.
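The shared declarations could, for example, look like this (the exact fields and the message ID value are illustrative; what matters is that both tasks agree on them):

```c
/* Shared declarations, agreed upon by the two tasks. */

#define RX_MESSAGE 1            /* a message ID used on this queue */

typedef struct {
    int   msg_id;               /* identifies which message follows */
    void *pMessage;             /* pointer to the actual payload */
} queue_message;

typedef struct {                /* the payload the tasks exchange */
    int  a;
    char c;
} my_message;
```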
The above declarations are common to both the transmitting and the receiving task. So, they are put in a common header file that is included by the files implementing the two tasks.
// Sender side
queue_message* qmsg;
my_message*    rmsg;

qmsg = malloc( sizeof(queue_message) );
rmsg = malloc( sizeof(my_message) );

rmsg->a = 2;
rmsg->c = 'r';

qmsg->msg_id   = RX_MESSAGE;
qmsg->pMessage = (void*) rmsg;

queue_send( qid, qmsg ); // post to the receiver's queue
                         // (API name varies across RTOS')

Listing 7.11: Code for the transmitting side
// Receiver side
queue_message* qmsg;
my_message*    rmsg = NULL;

qmsg = queue_receive( qid ); // block till a message arrives
                             // (API name varies across RTOS')

if (qmsg->msg_id == RX_MESSAGE)
{
    rmsg = (my_message *) qmsg->pMessage;
    // . . . process the message
}

free( qmsg );
free( rmsg );
Listing 7.12: Code for the receiving side
On the transmitting side, we can see that the transmitting task allocates memory for the queue message and for the message to be transmitted. It then passes the pointer to my_message in the pMessage field of the queue message structure.
On the receiving side, the queue message pointer is received and the receiver extracts the message from the pMessage field based on the message ID. It should be noted that there could be many kinds of messages passed between two tasks, and the message ID is one of the common ways to distinguish between them.
The usual programming practice is to use structures to pass data between two tasks even if the data is a single character, because if we want to pass more data in the future, we can add it to the structure and the interface (i.e. a pointer to the structure) will remain the same.
Some RTOS' provide operations to insert a message (say, an important one) at the top of the queue.
Using queues
It is a good practice to associate a queue with a task. The task can then be addressed through its unique queue ID. This is illustrated in the following diagram:
[Figure: tasks T1, T2 and T3, each with its associated queue Q1, Q2 and Q3]
Let us assume Q1 is empty and T1 blocks on it. Meanwhile, if some message is posted to Q2, it will remain unattended till some message is posted to Q1. Unix programmers are used to features like the select call, by which a process can block on multiple sources at once. There are no such mechanisms in most RTOS'. So, it is best to avoid these kinds of constructs.
We burnt our fingers once during one of our early projects…
We were to design and implement a task as a part of a networking protocol subsystem. This task was supposed to receive packets from a lower layer (let's call it LL) and pass them to an upper layer (UL) after some processing. We could also receive packets from the upper layer to be transmitted to the lower layer.
[Figure: task T1 between the upper layer UL and the lower layer LL, shown once with two queues QUL and QLL, and once with a single queue Q1]
But, ideally speaking, the design could have been such that there are two tasks doing the interfacing between the two layers. It is better to assign actions that could be concurrent to different tasks. However, it is up to the discretion of the system designer, based on the system and software constraints.
7.9.2 Events
Events, as mentioned earlier, are also known as signals. They cannot be used to pass data between tasks, but can be used to signal the occurrence of some activity to another task. Events are supported by many RTOS'. An event is encoded as a bit in an integer. Let us see how events can be used. Consider a machine where an integer is 32 bits long.
Events can only be used between tasks that have mutually agreed upon using events.
Otherwise, miscommunication could occur.
B5 B4 B3 B2 B1 B0
 0  1  0  0  0  1
The last 6 bits of an event flag are shown in the above diagram (Fig. 7.25).
Let us consider two tasks T1 and T2 that want to communicate using events. The first step is that they have to agree upon the events that they will use to communicate. The events are integers with only one of their bits set to 1.
The values that can be used as events are therefore numbers that can be expressed as 2^n, where n = 0, 1, 2, ... (as restricted by the machine word size). Let one of the tasks be a producer task that signals that data has arrived, and let the consumer task indicate that it has completed its processing. So, the two events that can be agreed upon can be
#define MY_PROJECT_DATA_READY (0x01 << 0)
#define MY_PROJECT_PROCESSING_COMPLETE (0x01 << 1)
Moreover, if the consumer is overloaded, it can ask the producer to wait before producing more data. And, the producer can indicate that it has sent all the data it wants to send:
#define MY_PROJECT_WAIT (0x01 << 2)
#define MY_PROJECT_DATA_COMPLETE (0x01 << 3)
The values assigned to the above four symbolic constants are 1, 2, 4 and 8 respectively.
Now, the consumer task has to wait for the event
event_receive( MY_PROJECT_DATA_READY | MY_PROJECT_DATA_COMPLETE,
ANY_EVENT, // Some OS specific flag
&received_events);
In the above example, the first argument specifies the events that the consumer task must wait on. In this case, it waits for the signals DATA_READY and DATA_COMPLETE. The second argument is an OS-specific flag; in some OS' it is possible to block on the occurrence of any one of the events, or on the occurrence of all of them. The third argument is an output parameter in which the events that have arrived are returned in bit-encoded form; it is used to find out which event has occurred. Typical usage will be
if (received_events & MY_PROJECT_DATA_READY) {
// Means data is ready. So, process the data
}
else if (received_events & MY_PROJECT_DATA_COMPLETE) {
// Data reception is complete. Cleanup or do something else
}
Note the use of bitwise ORing (the | operator) for combining events and bitwise ANDing (the & operator) for checking individual events.*
*Check Chapter 10 for implementation aspects in embedded systems.
Similarly, some RTOS' also give an option to block with a time-out, to avoid conditions where the system hangs if the necessary event does not occur. The time-out is passed as an argument to the event_receive call.
To send an event, the event (or events) to be sent and the ID of the destination task are required:
event_send (MY_PROJECT_DATA_READY, consumer_task_id);
We should note that, unlike queue messages, events do not stack up. If an event is sent to a task and, before the task processes it, the same event is sent again, the second signal can be considered 'lost', because the OS will not remember that the event was sent twice.
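This 'lossy' behaviour follows directly from events being bits in a flag word: sending is a bitwise OR, so a second send of an already-pending event changes nothing. A sketch (these `_model` functions imitate the event_send/event_receive calls above for a single pending-events word; they are ours, not an RTOS API):

```c
#include <assert.h>

#define MY_PROJECT_DATA_READY          (0x01 << 0)
#define MY_PROJECT_PROCESSING_COMPLETE (0x01 << 1)

static unsigned int pending_events = 0;   /* per-task event flag word */

/* Sending an event just ORs a bit in: duplicates are absorbed. */
void event_send_model(unsigned int events)
{
    pending_events |= events;
}

/* Receiving returns the pending bits that match, and clears them. */
unsigned int event_receive_model(unsigned int wanted)
{
    unsigned int got = pending_events & wanted;
    pending_events &= ~got;
    return got;
}
```

Sending MY_PROJECT_DATA_READY twice before the consumer runs leaves exactly one bit set, so the consumer sees a single arrival.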
7.10 TIMERS
Almost every RTOS will provide features to use a timer. We have already seen that we cannot use delay loops to implement timers in an environment that supports pre-emptive multitasking. So, we have to use either the hardware timers that may be available, or the abstractions of timers provided by the RTOS.
Various RTOS' provide various kinds of APIs to use timers, but some of the concepts remain the same irrespective of the OS:
■ Tick ISR — This is an ISR that gets executed at every tick of the clock. The resolution of timers that an RTOS can provide is decided by the clock that is used for the ticks and the tick ISR.
■ Starting of Timer — There is usually an API provided to start a timer. The
arguments are usually the duration of the timer (may be in micro/milli seconds or
in counts of ticks of the system clock) and the function to be called at the expiry
of the timer. For example:
tLED = timer_start ( 1000, // 1000 milliseconds
ToggleLED ); // Call a function to toggle
// ON/OFF an LED
// (void ToggleLED(void); )
APIs that start a timer usually return an ID for the timer, which can be used to either cancel the timer or query its current status.
The above call will invoke the function ToggleLED() after the expiry of 1000 milliseconds.
The RTOS that you may be using may provide a different interface. Some may post an
event or a message in the queue after the expiry of a timer.
Cancelling (Stopping) a Timer: There are usually APIs that can be used to stop a timer before it expires. This is useful when we are waiting for an input but do not want to be stuck if the input does not arrive or is lost. In such cases, we start a timer before we request the input, and we cancel the timer on arrival of the input. If the input does not arrive, we handle the situation in the timer expiry function. A timer ID (like tLED in the code above) is required to cancel a timer. This is one of the common patterns used when we interact with the external environment for inputs/synchronisation.
timer_cancel ( tLED ); // Stop/Cancel the LED timer
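The request-then-cancel pattern, and the importance of checking the result of a cancel, can be modelled as follows. The `_model` functions imitate the hypothetical timer_start/timer_cancel API above with a simple table of armed timers; a real RTOS would arm hardware and fire the callback from the tick ISR:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TIMERS 4

typedef void (*timer_fn)(void);

static timer_fn timer_table[MAX_TIMERS];  /* NULL = slot free/cancelled */

/* Model of timer_start: remember the expiry callback, return an ID. */
int timer_start_model(int duration_ms, timer_fn on_expiry)
{
    (void)duration_ms;                    /* a real RTOS would arm a clock */
    for (int id = 0; id < MAX_TIMERS; id++) {
        if (timer_table[id] == NULL) {
            timer_table[id] = on_expiry;
            return id;
        }
    }
    return -1;
}

/* Model of timer_cancel: 0 if cancelled, -1 if already expired/invalid. */
int timer_cancel_model(int id)
{
    if (id < 0 || id >= MAX_TIMERS || timer_table[id] == NULL)
        return -1;                        /* too late: check this result! */
    timer_table[id] = NULL;
    return 0;
}

/* Called (conceptually by the tick ISR) when the duration elapses. */
void timer_expire_model(int id)
{
    if (timer_table[id] != NULL) {
        timer_fn fn = timer_table[id];
        timer_table[id] = NULL;
        fn();
    }
}

static int timeouts = 0;
void on_timeout(void) { timeouts++; }     /* would report a lost input */
```

The usage is: start a timer, request the input, and on arrival cancel the timer and check the cancel's return value, since a cancel that races with expiry fails exactly as in the war story below.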
Warning
Strange situations can occur when the input arrives and, at the same time, the timer also expires. So, we might need to handle these kinds of issues too. I ran into a similar situation recently.
Positive acknowledgement is a technique in wireless networks, where a packet transmitted over the medium is considered successfully delivered only if an acknowledgment for the transmitted packet is received. If the acknowledgment (ACK) is not received, we may have to indicate that the packet was not transmitted successfully. I had a system in which the results of both successful and unsuccessful transmissions were communicated to a higher level entity. I suddenly realised that I was getting more results (successful/unsuccessful) than the packets I actually had to transmit. I then realised that, due to some complex processing in the ISR, even after the ACK was received, the timer that had been started to bound the wait for the ACK was not cancelled immediately. So, the timeout indicated an unsuccessful transmission and the receive ISR indicated a successful transmission for the same packet. (The receive ISR cancelled the timer AFTER it had expired and, unfortunately, the result of the cancellation was not checked.) So, I was getting more results than packets, and hence the transmit state machine was going awry! The problem was solved by increasing the timeout and optimising the reception ISR.
Requirement Engineering
8.1 INTRODUCTION
The first two activities trigger the rest of the events. For example, it is perfectly possible to embark on the testing activity (at least its planning) as soon as a requirements document has been created. This chapter will look briefly at the activities and issues related to the gathering and specification of requirements.
Hence the basic steps of requirements, architecture and design deserve a lot of effort before jumping into the world of coding. As a matter of fact, a rule of thumb in the embedded industry is that complete product development of an embedded system is 40% design, 40% testing and only 20% coding. Secondly, it is advisable to catch bugs creeping into the system as early as possible. Data shows that bugs caught in the specification phase are the cheapest to fix; they become more and more expensive and effort-intensive as the product goes through the later stages of its lifecycle. It is advisable to spend some time at the beginning of product development to understand and
document the requirements and design of the system before embarking on the implementation of the product. Implementation has its own issues, and they are in no way trivial. However, if we do not follow this school of thought, we will burden the implementation stage with issues that could easily have been solved at the requirements and design stages. In later sections, we will analyse the issues that affect different stages of the development lifecycle and the cost of such issues. One thing, however, has been observed over a long period of time:
P2P
The quality, clarity and completeness of the specification of a product play a major role in driving the eventual quality of the product being developed.
Figure 8.1 illustrates the increase in the complexity and cost of bugs in different stages
of the development.
As we saw in the previous section, the statement of need is the first stage of conceptualisation. It may include, at a very high level, the functions the product is expected to perform and the stimulus-response behaviour of the system, in plain English. Sometimes this document is also referred to as the 'feature list'. Alternatively,
Requirement elicitation
Elicitation means gathering of requirements from the customer (the customer is defined in Section 8.3). This means listening to the customer, sending appropriate stimuli to the customer so that accurate responses are received, understanding the needs of the customer beyond the ones being stated, asking meaningful questions in order to arrive at better details, summarising the points under discussion from time to time in order to erase any misconceptions or communication gaps, etc. In that sense, the requirement elicitation procedure is more of an art than a science, since it goes into the realm of non-technical aspects of engineering.
The most popular methods of requirement elicitation are interviews, brainstorming
sessions, questionnaires and use cases. Many rounds and combinations of these methods are usually used to elicit requirements.
Requirement analysis
Requirement analysis involves estimation of the cost based on requirement elicitation
process and classifying them into categories like: mandatory, optional and good to
have. The visualised solution during the requirement elicitation process may then be
scaled down into a workable solution, based on real life constraints.
Requirement analysis may identify dependencies among requirements, assumptions to be made during development and any reuse possible from an existing product. Based on real life problems associated with insufficient resources, insufficient time, change in requirements, imperfect communication and lack of proper financial support, a workable solution is then arrived at. (See Fig. 8.2).
Requirement specification
Now comes the time when the requirements are written down in black and white, based on the elicitation and analysis phases of requirement gathering. If the previous two steps have been followed properly, chances are that all requirements have at least been captured. What remains is the step of converting these requirements into a human readable and understandable form.
IEEE Std 830-1993 provides a description of a good requirement definition. By and large, a requirement definition needs to take care of the following issues:
Function: The actual function of the product, in the form of stimulus-response pairs or use cases. For example, when a mobile station is paged by the network, it shall respond with a paging response message: this is a function in the requirement definition of a mobile phone.
Such requirements may be provided by standards bodies and/or by the business requirements of the product and organisation. Business requirements may define the product strategy based on its vision and scope, and may explain where the product fits in the market.
Interfaces: All the external interfaces and the environment in which the product is expected to perform. The interfaces and environment may include software, hardware and human beings. The mobile phone shall provide the CLEAR key to the user in order to kill an editor or cancel an outgoing call: this is an interface requirement for the mobile phone.
Performance: The realtime characteristics and timing constraints of the product. The mobile phone, after getting switched on, shall be capable of responding to a paging within 30 seconds, under ideal radio conditions: this is a realtime requirement for a mobile phone.
Non-functional requirements: These requirements are not associated directly with the product behaviour, but more with overall characteristics such as maintainability, scalability, availability, portability, testability, size, security, etc. The ROM size shall not exceed 5 MB: this is a quantitative non-functional requirement.
Quality requirements: These are related to aspects of product development, such as the development environment (use of a structured language vs object-oriented design), the available budget and resources, etc.
Requirement inspection
After the requirements are ready, it is time to inspect them, to verify all the components listed in the specification. A body constituted of affected parties and neutral third persons should carry out the inspection. The purpose of inspection is to check that —
❑ Requirement definition is complete
❑ It is error free and clear
❑ It is understood well and agreeable to all parties affected by it
An example of a precisely stated requirement:
The software shall compute aircraft position within the following accuracies:
■ + or − 50 ft in the horizontal plane
■ + or − 20 ft in the vertical plane
Let us take the example of a card verifier that controls the access of visitors to a building through their cards. This system will have a small sensor to detect the card and read the bar code that identifies it. The system will then search for the code in a list of codes that are authorised for access to the building. If the code matches, it will generate a signal to release the lock on the gate. Otherwise, it will possibly show a red light on the front monitor and a user-friendly message such as "Card not authorised". If an unauthorised card is put inside the slot more than a maximum number of times, the system will generate an alarm. Figure 8.3 shows an illustration of such a system.
[Fig. 8.3: card verification system, with a sensor and barcode reader at the card insertion slot, ROM, a display, and the gate-lock release]
The system has three external interfaces. One interface is with the card that is entered into the slot. Another is with the lock of the gate. The third is the user interface that displays messages.
The previous two paragraphs can be taken as the statement of need from the customer. As is usually the case, this statement of need is vague and very general from an engineering point of view. It does not give insight into a lot of specifics that may be very important to the engineering community. It is silent about the non-functional requirements of the system: its reliability, testability, ease of future development, etc.
[Figure: actors 'User' and 'System administrator'; use cases 'Verify card', 'Display message', 'Release lock' and 'Raise alarm'; data store 'Authorised cards']
Fig. 8.4 Top level use case for card verification system
As shown in Fig. 8.4, this system has a very simple use case diagram. The users of the system are the human users, or more specifically the card that is inserted into the system. The inputs to the system are a card-in event and a card-out event. The card-in event causes the system to read the barcode number of the card. The card-out event is an indication that another card can potentially be inserted very soon, so the system should get ready for the next input. The outputs of this system are the unlocking of the gate, a user-friendly message, or a small alarm and red light.
This system is a realtime system, since the outputs of the system (door lock release and red light) have to be generated in a 'reasonable' amount of time. The amount of time that is reasonable is anybody's guess and is not generally specified; an accuracy of a few milliseconds is sufficient. Hence, this system is not a hard realtime system. It is a soft realtime system, since a delay of a few milliseconds will not cause damage to life and limb. The specification for this system will finally give an approximate best and worst case timing behaviour of the system. The timing requirements of the system can then be derived from the functional requirements given by the customer and these best and worst case scenarios.
Once the use-case diagrams have identified the external behaviour of the system at a higher level and listed the kinds of interactions between the system and the external world, it is time to examine the different events in terms of their time relationships. One of the most commonly used tools for this job is the message sequence chart (MSC). As the name suggests, message sequence charts are basically a representation of the components of the system, a stimulus, and the other events and message exchanges, in the time domain. MSCs give a feeling for the impact of different events on a system, as also on the state of the system.
For example, MSCs can be made for situations such as an authorised card, a card that has not been inserted properly, and a card that has been left inside the reader. These three situations will possibly elicit similar responses as a whole, but may be associated with additional actions as well. An additional action may be a beep sound to indicate that the visitor has left the card inside the machine.
MSCs operate in the realm of system design, and within the specifications. They usually throw light on the different ways in which a system can be broken up and developed. However, a requirement definition in terms of its use-cases and the MSCs defining its break-up are closely linked and drive each other at the beginning of the project. The requirement definition drives the MSC generation. In turn, any missing requirements can easily be detected based on the additional insight received with the help of MSCs and realtime behaviour modelling of the system's components. We will describe the MSCs for this system in the next chapter. However, a basic introduction in this chapter would not be out of place.
brainstorm on all the possible events the system may have to manage. This effectively presents new requirements for the system that the customer may not have thought of previously. Thus a greater degree of control over the development cycle is achieved, since the customer requirements are met by the efforts of the implementation team. Any communication gaps left because of insufficient specification at the early stage of conceptualisation of the system are filled here.
The following list defines the components of a good requirement document for embed-
ded systems.
❑ Author (list): This comprises the author(s) of the document. One author should
be identified as the owner of the document; (s)he updates the document and main-
tains it throughout the duration of the project.
❑ Technical control or review team: This identifies the architects who review the
document for the sole intention of checking the technical suitability of the
document.
❑ Distribution list: This is a list of teams or people who get affected by the require-
ment document.
❑ Status of the document: Status refers to whether the document is a draft, or under
review, or released.
❑ Project Name: The name identifies the collection of activities for a particular
goal.
206 Embedded Realtime Systems Programming
❑ Date: The date mentions the last time the document was changed.
❑ Version: It identifies the evolution.
❑ History: It lists the changes that happened in the past with an overview of their
impact and the author of these changes.
❑ Abbreviation list: It lists the abbreviations used in the document. It may give a
reference to another document.
❑ Definitions: It gives an introduction to the terms used in the document. They
may be standard or particular to the document.
❑ References: Any related information can be found in the list of references.
❑ Perspective of the system: Defines the external interfaces of the system and the
kind of inputs that are expected in the system.
❑ Functional requirements: Defines what the system is expected to do based on the
different inputs or change of internal conditions.
❑ Performance requirements: If the system is expected to respond in a particular
time or if it has timing constraints for some processing, they are identified here.
❑ Use-cases: This is becoming an increasingly popular way to capture require-
ments.
❑ MSCs: To relate the system with the external world in the time domain.
❑ Other issues: related to testing, future maintenance, change control, constraints
with the environment, etc. to keep track of future activities.
Notice in the requirement document that the system has been seen from a black box
perspective. The requirements document has tried to create stimulus-behaviour pairs in
different situations only. No attempt has been made to think in terms of how (or
whether) it is possible to implement this specific kind of behaviour. When requirements
change, they have an impact on this relationship between this input-output system. This
specific change in input-output pair results in change in only specific portions of design.
The bottom line of this approach is to quantify the system in terms of stimulus and
actions. It is also possible to create specifications for parts of the system such as card
reader and database search, if being developed by different teams. In that case, a spec-
ification will exist for each such part or component of the system.
Requirement Engineering 207
To summarise, a good requirement document for an embedded system should have the
following characteristics.
constraint may not have been explicitly stated in the functional requirement document,
since it may have been assumed as ‘understood’. On the other hand, a card verification
system may have a constraint on the available memory, since it has to store a huge
number of authorised card codes for comparison. It is possible that these constraints
are not quantifiable; however, they can be stated in relative terms. For example, the
card verification system is expected to complete its processing of a card in the time
between the current card being taken out and the next card being inserted into the
slot. Since only humans are expected to interact with the system, and there are
physical limits on the speed with which this procedure can be performed, this
constraint gives a fairly good idea of the processing speed required for each request.
If the system has a bearing on the behaviour of other embedded systems it interacts
with, or if it assumes a particular interface with some systems, the requirements docu-
ment is the best place to identify and detail them.
When specifications are available, it is time to define the architecture and design of the
system. The architecture of a system identifies its components and the interfaces
between them, in both a static and a dynamic way. Each component will have a design,
potentially organised hierarchically. The design will determine factors such as what
data structures to use, how to distribute functionality according to priority and order
of calling, whether to use messages or mailboxes for communication, any imported or
exported APIs, etc.
9.1 GENERAL
P2P
The process of developing hardware and software simultaneously, with issues delegated
frequently between the hardware and the software side, is called co-design.
The following factors play a pivotal role in deciding the overall architecture of
the system.
Cost of developing the hardware : It typically takes more money to develop a piece of
hardware dedicated to a specific task, and it is more difficult later on to change that
hardware in response to a request for a change in the behaviour of the system.
However, once hardware has been developed, it is usually free from bugs and less
prone to introduce problems into the rest of the system.* Also, anything executed in
hardware is more efficient, and saves memory space that can then be used by software
applications. When the embedded system is being budgeted, the cost of developing
different hardware plays a key role in defining the break-up of functions between
software and hardware.
Change in behaviour or requirements of system : In a system that is expected to change
frequently, implementation in hardware may not be a prudent choice, for two reasons:
■ Firstly, hardware typically takes more time to develop.
■ Secondly, facilities to develop specific pieces of hardware may not be available
at all places, which involves additional delays.
When the software and hardware teams sit at different physical locations, significant
delays in hardware can retard the overall integration of hardware and software.
Complexity of the hardware: The architect has to take into account the complexity of
the existing hardware and the cost of maintaining it in the future. In the embedded
world, complexity is usually avoided at all costs, since it makes future changes very
difficult. Hardware is usually reserved for only those jobs that are very expensive (in
terms of time or complexity) to perform in software.
Timing constraints: In a lot of embedded systems, there are specific timing constraints.
For example, mobile phones have to listen to information broadcast over the air peri-
odically. This period is measured in microseconds. For example, every 577 microsec-
onds, a mobile station may have to tune to a particular frequency and time slot, send
62 bits of data to a particular base station, and then do some other tasks. Now, this accu-
racy of 577 microseconds cannot usually be guaranteed by software. These timing con-
straints require that some hardware circuitry is working in close co-ordination with the
*But, nowadays, as mentioned earlier, hardware too gets buggy. So the embedded engineer must be
careful enough to identify issues with hardware before spending inappropriate time in finding elusive
bugs in software.
Architecture and Design of an Embedded System 213
clock of the system, ensuring that the timing requirements for executing such actions
are met. This also means that meeting this strict timing constraint is an important
activity of the system, and other jobs executing inside the system may be pre-empted
when the situation arises.
Specific requirements of the system : Sometimes systems have specific requirements that
force some actions to be performed in hardware. For example, a mobile phone
performing access for the establishment of a call needs to use the “slotted Aloha”*
mechanism for channel access. This requires generation of a random number. If the
first attempt is not successful because of a collision, another random number should
be generated after a random duration. All these random numbers should be
completely independent of one another. Now, generating such a series of random
numbers in such close succession is very difficult (to say the least) in software. It is
much easier to implement in hardware (e.g. by using the random noise received at the
receiver of the mobile phone), and this also guarantees that the numbers so generated
are genuinely random in value, unlike software, which can produce only pseudo-
random values.
Synchronisation needs with the external world : An embedded system needs to interact
with the external world, and the system and the external entity need to follow the same
clock so that they can understand each other at all times. So, before any transmission
takes place, the two systems decide who is the master for the communication between
them; the slave then synchronises its clock with that of the master. Usually, such
requirements exist for embedded systems used in communication protocols. For
example, when a wireless device is switched on, it needs to synchronise itself to the
beacon of a base station in order to get timing and synchronisation information.
Different beacons may have different clocks.
Change of configuration : Any system with the ability and requirement to change its
configuration parameters should have some software to interact with the external
environment, use the parameters, and possibly store them. A change of configuration
usually introduces a different path of execution inside the software, so configuration
handling is best done in software. As we saw earlier, these parameters may reside in
EEPROM or FLASH.
*Slotted Aloha is a contention-based protocol used when a single channel for communication exists
and there are many data sources. Any data source transmits some information and waits for a reply. In
case of a clash from another data source, a random delay is introduced before the next transmission
and this process continues till contention is resolved.
All the above-mentioned requirements are fairly common across the embedded
systems domain; what differs is the relative importance of each driving factor. So, it is
advisable to look at the architecture styles prevalent in the software engineering
community and compare them with respect to their focus and area of usage. This lets
us identify our system with tested systems: we can take the advantages of several
architecture styles and build a composite architecture for the unique mix of driving
factors of our system. This section, together with the next, provides information
gathered from the varied experience of embedded system architects.
There exist basically four broad architecture styles:
❑ Data flow
❑ Data centric
❑ Virtual machine
❑ Call and Return
[Figure: data-flow and data-centric architecture styles, showing components A, B and C exchanging data around a central store]
then used to simulate it in order to develop or test the other components. As is
evident, this style provides portability easily. In embedded systems parlance, this
architecture style is especially useful, since in many cases the underlying hardware is
either not available, or it is simpler or cheaper to develop and test the software against
a simulated version. Usually, the software for embedded systems is tested on a host to
find the major implementation problems. In such a case, a virtual machine
architecture is useful in providing the relevant abstraction for the development and
testing environment.
[Figure: call-and-return architecture style, with Main calling foo1 to Foo6; arrows show the flow of control]
P2P
In real practice, however, systems are implemented through a mix of the architecture styles
mentioned above.
In order to arrive at suitable architectures, designers can use the experience and
expertise embodied in architectures developed previously in similar situations. This
well-proven experience in software development can be used to create software with
specific properties. In fact, while designing a system, more often than not designers
try to relate the properties of the new system to some system they designed in the
past, and then reuse or tailor it so that it applies in the current context. This is sound
and advisable for two reasons. One, a working system designed earlier for a similar
problem gives more confidence that the new design will work. Two, it saves time if an
idea, model or implementation can be reused.
Such an idea is an architecture pattern. An intuitive definition for architecture pat-
terns can be given as follows:
Tips
An architecture pattern can by no means simply be lifted and copied into the current
system. That may be possible if the systems are very similar and deal with similar
driving forces but, generally, an architecture pattern is associated with a context: for
example, a distributed system, an interactive system, an embedded system, or a
combination of these. Second, the architecture pattern provides a way of solving a
common, recurring problem in that context. For example, CORBA is an architecture
pattern for distributed object-based systems.
This section shall describe some patterns available for embedded systems. These pat-
terns are discussed below together with their context.
Router pattern
The router pattern creates independent components such that communication among
them is transparent to the sender and the receiver. The destination component for
each message is bound dynamically inside the router component. All components
indicate to the router the messages they want to receive; thereafter, a component
sends all its messages to the router, and the router forwards each message to its
destination component. If the message was a request primitive, the router takes care
of sending the corresponding confirm or reject primitive back to the sender
component once the request has reached the destination. This pattern is especially
useful when the components have to be independent of each other, and when
components have instances. When components have instances, each instance can
communicate with its corresponding instance in another component in a simple way,
based on the routing table maintained by the router. When instances are destroyed,
the router is updated.
An example of this pattern inside dual-mode 3G phones is useful here. Dual-mode 3G
phones should be capable of connecting to a 3G network, as well as to the older 2.5G
GPRS networks, depending on availability. So a link layer can be developed for each
technology, and a router decides which layer is active based on the registration
performed by that component. The higher-layer components thus become independent
of which process and component they need to send their messages to.
Microkernel pattern
The microkernel pattern is used to create a basic set of essential services, plus a
mechanism to develop additional applications and extensions independently on top
of the kernel core. In a way, this pattern is very similar to the Unix kernel mechanism:
the shell hides the applications from the kernel specifics and vice versa. The kernel is
generic and need not be modified for any application. Applications receive a standard
interface from the shell and use the services of the shell to perform their jobs. This
suits architectures where slightly different applications based on a core set of
operations need to be developed, or where the life span of applications is not long
and they need to be enhanced without touching the core of the kernel.
[Figure: microkernel pattern, with applications layered on a shell, which is layered on the microkernel]
Client-server pattern
The client-server pattern is used when one component needs to access a service from
another component. Usually these components exist on physically separate nodes,
though conceptually they may co-exist as well. The server provides a particular service
and listens for requests coming from clients. In this way, the implementation of the
service becomes independent of the request. The server and client only need to
establish the protocol with which they communicate. The client must know the
physical address of the server in order to connect to it. Also, the server and the client
should use a common protocol mechanism in order to understand each other, and
they should possibly implement recovery and acknowledgement procedures. The
sequence of events in the client-server pattern is as follows:
■ The server starts its service and waits for requests in a predefined format at a
well-known logical location, such as a port.
■ The client sends a request to the server using the server's physical address and
port number, known beforehand.
■ The request is routed to the server.
■ The server sends a response back to the client, using the physical address of the
client mentioned in the request message.
■ The client may send more requests, or the server may send more data.
The client and server are independent of each other, so this pattern aids
implementation-independence, portability and scalability. However, there are two
serious drawbacks. First, the client must know the physical address of the server
before the communication. If the server changes its location, or another server
provides the same service, changes are required on the client side to make alternate
arrangements.
[Figure: a client connecting directly to Server1 (1.1.1.1) and Server2 (2.2.2.2), addressing each request to the server's physical address]
Second, the client and server should use a mutually agreed protocol. This means that
if a client wants to connect to servers A and B that use different protocols for
communication, the client needs to implement both protocols in order to use their
services.
[Figure: a proxy between the client and Server1/Server2: the client requests a service from the proxy, which resolves the servers' physical addresses (1.1.1.1, 2.2.2.2)]
The card verification system introduced in the previous chapter can first be understood
by drawing MSCs that relate the components of the system and their interactions. The
system has four basic physical components: card slot, door lock, card verifier and
screen.
The MSC in Fig. 9.7 introduces a correct path of execution, in which the card is
authorised and is inserted into the slot properly.
This case is known as the ‘Happy Path’, i.e. a situation where nothing unanticipated
has happened. This is the path that is expected to execute most often. But a robust
software design must also anticipate error conditions. Two such conditions that can
be thought of are
i. Insertion of an unauthorised card
ii. Removing the card too quickly
In each of these cases, the user is alerted of the error.
[Figure 9.7: Happy-path MSC: CardInserted( ), CARD_IN( ), IsCardAuthorized( ), OpenLock( ), DisplayWelcome( ), CardRemoved( ), CloseLock( ), ClearDisplay( )]
The following MSC (Fig. 9.8) describes the situation where an unauthorised card is
inserted.
The MSC in Fig. 9.9 covers the scenario where the card has not been kept in the
slot for the minimum duration.
[Figure 9.8: Unauthorised card MSC (User, CardSlot, Screen): CardInserted( ), CARD_IN( ), IsCardAuthorized( ), DisplayAuthenticationFailure( )]
[Figure 9.9: Card removed too quickly: CardInserted( ), CardRemoved( ), DisplayCardNotRead( )]
[Figure 9.10: Development stages for the card reader: functional requirements → requirements for CardReader → architecture for CardReader → designs for ReadCard, CompCard and ManageDoor → testing for CardReader]
Figure 9.10 illustrates how the various stages of development of this system are
connected to one another. Figure 9.11 gives a possible architecture for this system.
As we can see in the diagram, the interfaces have been identified from the point of
view of implementation. Once this arbitration is achieved, each of these tasks can be
designed independently and in parallel, and integrated after implementation. The
figure shows some of the major types of interface prevalent in embedded systems:
interrupt, message, function call and callback.
[Figure 9.11: A possible architecture: CardDriver software, ReadCard, CompCard (holding the list of authorised cards), ManageDoor and DoorDriver processes, connected by functional, interrupt, message and driver-callback interfaces]
*1A mobile phone accesses the network over the air. For this it needs to get synchronised to the base
station in time so that the information being broadcast is accessible and readable.
*2The coverage area of the base station to which the mobile station is listening governs an area of service.
of its data related to authorised cards built in. This data is not expected to change
while the system is being accessed. The data has its own limitations: it is potentially
huge and, secondly, what matters is the time in which a new card number is searched
for in this list, since that governs the availability of the card verification system for the
next user. In the case of the mobile phone, the requirement is to read the data
completely and correctly as it occurs on the air interface, and then take actions based
on this data in realtime before processor time is allocated to other tasks. Therefore,
the module receiving this data needs to be given high priority, so that the data can be
read at the proper times and no trivial actions halt this operation. The modules that
act on this data then need to run uninterrupted (for example, by application modules).
9.5.5 Decomposition
A typical embedded system is first divided into layers and/or tasks. Sometimes these
layers are defined by governing standards bodies; this is the case when the embedded
system is supposed to interwork with systems from other vendors. Bluetooth, Wireless
LAN and other wireless mobile phone systems are examples of such standard systems.
These layers perform specific actions and are usually based on the ISO OSI
mechanism for defining a protocol stack (see Fig. 9.12).
[Figure 9.12: The ISO OSI protocol stack: Application, Presentation, Session, Transport, Network, Data Link and Physical layers, with request and confirm primitives exchanged between layers]
*FISU: Fill-In Signalling Unit messages. In SS7, this heartbeat mechanism is used at the MTP layer:
all nodes send this empty message to each other whenever nothing else is to be sent.
and node C provide two routes to the same destination node D. So at any given time,
node A can decide that the health of node B is not particularly good, change the
routing to node C, and thereby help node B recover from congestion.
In wireless systems, the heart-beat mechanism is used for a different purpose. Since
resources on the air are very precious, they cannot be allocated to all users all the
time. The air interface, moreover, is particularly prone to bad radio conditions and
potential drops in channel quality. Hence, if a heart-beat is not received from a
particular user, the resources assigned to that user are released and can then be
reallocated to other users. The heart-beat and associated procedures are typically part
of the lower layers of the protocol stack. Once the quality of the link is guaranteed,
the higher protocol layers use more sophisticated methods, such as selective
retransmissions, or retransmission after a particular number of packets inside a huge
stream of data (including ARQ and selective repeat).
[Figure 9.13: Node B congested; node A reroutes traffic to node D via node C, the alternate path]
Waterfall model
The waterfall model is a stage-based approach to implementing a system. First a
requirement is taken, then a design is produced for the requirement and reviewed,
and finally implementation begins, followed by testing of the module. Figure 9.14
shows the stages of the waterfall model.
[Figure 9.14: Waterfall model: requirement, design, coding and unit testing, integration, testing; correspondence arrows link requirement with testing, and design with integration]
Iterative approach
In the iterative approach, on the other hand, the module is looked at from all
perspectives from the beginning. So we have use-cases that specify requirements and
architecture, class diagrams that throw light on the design, and so on. In this way, it
is possible to analyse the system at a much deeper level early on. Since most problems
in the field occur due to unforeseen errors during design and unfortunate cross-overs
of events, the UML approach seems better at countering this threat. However,
proponents of the waterfall model argue that UML-based design sometimes becomes
too detailed too early. As is usually the case, a combination of both approaches should
generate better results: the waterfall model can be used while keeping the UML
advantages of thinking in terms of use-cases, class diagrams, etc. to visualise and
represent parts of the system.
Factor                               Weight
ROM                                  Medium
RAM                                  Medium
Maintainability                      High
Ability to evolve or get extended    High
Portability                          Low
Effort                               Medium
Efficiency                           High
Table 9.1: Weighted factors for architecture evaluation
Now we can look at three different ways of breaking up the tasks of the system.
❑ Approach 1, in which we have a separate task for everything: three tasks, one
for housekeeping during idle mode, one for data transfer during the connection,
and one for measuring quality of service and then initiating the connection
process for sending a report to the other node.
❑ Approach 2, in which there is only one task handling everything.
❑ Approach 3, in which there is one task for handling quality-of-service issues,
and one more task that takes care of everything else.
All these approaches may be appropriate in the light of the factors listed above.
some modules may have to be optimised with regard to their usage of memory. This
is also required to judge the optimum allocation of stack space to different tasks. When
the RTOS starts execution, all stack space is initialised to some constant value, say
0x55. During its execution, the realtime system uses varying amounts of stack space.
After many trial runs, the unused space that still contains 0x55 can be freed for other
use.
❑ Author (list): Author(s) of the document. As for the requirements document, one
author should be identified as the owner of the document; s/he updates and
maintains it throughout the duration of the project.
❑ Technical control or review team: Identifies the team that reviews the document
for the sole intention of checking its technical suitability. This review focuses
especially on the effect of external interfaces on the module.
❑ Distribution list: List of teams or people who get affected by the design docu-
ment. These possibly will be other modules that have interface with this module.
❑ Status of the document: Whether it is a draft, or under review, or released.
❑ Project Name: This identifies the collection of activities for a particular goal.
❑ Date: mentions the last time the document was changed.
❑ Version: identifies the evolution.
❑ History: lists the changes that happened in the past with an overview of their
impact and the author of these changes.
❑ Abbreviation list: lists the abbreviations used in the document. This may give a
reference to the architecture document or the requirements document.
❑ Definitions: Introduction to the terms used in the document. They may be stan-
dard or particular to the document. This may give reference to requirement doc-
ument or architecture document.
❑ References: Any related information can be found in the list of references.
❑ Perspective of the system: Defines the external interfaces of the system and the
kind of inputs that are expected in the system.
❑ Data structures: Based on the interface and expected behaviour of the system,
the set of data structures are defined. These data structures are the core of infor-
mation processing for the module. It is advised to provide this section with
attributes like initial values of variables, their scope, conditions under which they
will change, etc.
❑ State diagram: The high level state diagram of the system identifies the static de-
sign of the system. This gives all pairs of stimulus-response expected in the system.
The architecture of a system is like its blueprint. It concerns itself with identifying the
components of the system and the interactions between them, and with managing
trade-offs based on priorities. Each of the components can then be individually
designed, possibly independently of the others. Many architecture styles are available,
each prioritising some concerns over others. As architecture has evolved over the
years, the architect community has documented architectural patterns that are known
to solve typical problems occurring in architecture definition. The design of the
system depends heavily on its priorities. This chapter gave examples of the different
choices available to the designer based on the kind of system being developed. In the
end, the design document should be traceable from the requirements and should
clearly define the static and dynamic behaviour of the system through SDL/UML,
and handle possible erroneous/cross-over cases.
Implementation Aspects
in Embedded Systems
10.1 INTRODUCTION
Now this is one topic that all embedded engineers love. The smell of wires and the
sound of data inside them make many a heart flutter with joy ☺. Implementation has
its own history. In the not-so-old world, assembly language ruled the roost in
embedded systems. With the advent of efficient C compilers, it became possible to get
the power of assembly language while retaining the ease of coding and understanding
of a high-level language. C came as a boon for programmers, especially in the
embedded world, and has since its introduction remained the queen (or is it king? ☺)
of that world. Object-oriented methods and languages have been making heavy
inroads into the embedded world of late; however, even today most programmers feel
comfortable and secure with C. Hence, this chapter has been written with C in mind,
even though most of it is valid for any similar high-level language.
Ideally speaking, implementation deserves a complete book by itself. However, it
would not be out of place to document in this chapter some tips on implementation
aspects of embedded systems. These situations are usually encountered in the daily
life of an embedded engineer. The chapter should be read from the perspective of
good and bad practices during implementation. Like the proverbial stitch in time,
good programming habits save a lot of time later.
10.2 READABILITY
The code written by an engineer should be readable and understandable by the rest of
the community. In this direction, the following are helpful.
10.2.2 Comments
Comments are very useful for the readability of code. Comments should be written in
sentence form, with correct spelling, grammar and punctuation (although the
terminating period is not critical). Good code comments should strive to tell the
reader why, as opposed to what, the code is doing.
Bad comment example:
/* Assign the value of 4 to the variable x. */
x = 4;
Good comment example:
/* Loop four times: once for each corner of the rectangle. */
x = 4;
Here is another example with a clear, simple and useful comment. (The port
initialisation routine shown is illustrative.)
/*
All ports except the MAX_PORT port are initialised,
since the last port is reserved for use by the debugger.
*/
int lv_index ;
for(lv_index = 0; lv_index < MAX_PORT; lv_index ++)
{
InitialisePort(lv_index);
}
Block comments at the beginning of a branch are especially valuable. When a block
comment at the start of an else block states what the block as a whole is supposed
to do, it becomes immediately clear if the author has missed a condition (say, a check
against MIN_AREA). These block comments should be written at the very beginning of
the development of the code, so that at first only the conditional statements and the
block comments exist. This skeleton is then filled in later. This helps in organising the
code better and leaves fewer loopholes in the code.
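A minimal sketch of this skeleton style (the MIN_AREA/MAX_AREA limits and the validation routine are illustrative, not taken from the text): the block comment at the head of the else branch makes the required minimum check hard to forget.

```c
#include <assert.h>

#define MIN_AREA 10
#define MAX_AREA 1000

/* Returns 1 if the area is acceptable, 0 otherwise. */
static int ValidateArea(int area)
{
    if (area > MAX_AREA)
    {
        /* Reject: larger than the maximum supported size. */
        return 0;
    }
    else
    {
        /*
           Accept only areas between MIN_AREA and MAX_AREA.
           Writing this block comment first makes it obvious
           that the MIN_AREA check below must not be omitted.
        */
        if (area < MIN_AREA)
            return 0;
        return 1;
    }
}
```

Had the skeleton contained only the block comments, a reviewer could still spot a missing MIN_AREA check before any detailed code was written.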
Negatively named booleans make conditions hard to read. Suppose a flag
TempNotGreater is true when the temperature does not exceed a threshold. To execute
code for the opposite case, we would write the following fragment:
if ( ! TempNotGreater )
{
/* Temp > threshold code here */
}
The predicate inside the if statement is hard to understand. We have to say to our-
selves, "OK. TempNotGreater is false, so the negative of the negative of the condition means
the temperature is higher." Instead, it is clearer to name the boolean positively, as
TempGreater:
if ( TempGreater )
{
/* Temp > threshold code here */
}
There is no reason to calculate such a value by hand when the compiler can calculate
it, possibly more accurately than we can. Instead, consider this definition:
#define PI 3.141592653
#define TWO_PI (2.0*PI)
It is easier to understand the intent and it is more accurate. Similarly, say we are
defining the centre of a rectangle; consider these two possible definitions:
#define RECT_LEFT 100
#define RECT_RIGHT 200

#define RECT_MIDDLE 150
versus
#define RECT_MIDDLE ((RECT_LEFT + RECT_RIGHT) / 2)
The latter definition helps us understand the meaning of RECT_MIDDLE. Also, the
latter definition enables us to change RECT_LEFT and RECT_RIGHT without explicit-
ly recalculating RECT_MIDDLE.
In order to aid future maintenance of the code, some points are noteworthy.
Consider a function that checks a string against a fixed pattern:
int Mycompare1( char * testarray )
{
int return_status;
if ( strcmp(testarray, "TEST" ) == 0 )
{
return_status = 1;
}
else
{
return_status = 0;
}
return return_status;
}
The result of the expression enclosed within an if statement must be either true or
false. Thus the code can be simplified:
int Mycompare2( char * testarray )
{
int return_status = (strcmp(testarray, "TEST" ) == 0);
return return_status;
}
10.4 PERFORMANCE
Though compilers perform a lot of optimisation, the way we write code can also affect
the performance of code.
for(index = MIN; index < MAX; index += INCREMENT)
{
/* x = (index * 8) */
/* y = (index * 11) */
x = index << 3;
y = (index << 3) + (index << 1) + index;
}
Division by a power of two can be implemented using the right shift operator, and
modulus by a power of two can be obtained by performing a binary AND (&) with the
same number less one, as follows:
for(index = MIN; index < MAX; index += INCREMENT)
{
/* x = (index % 8) */
/* y = (index % 32) */
x = index & 7;
y = index & 31;
}
Usually bitshifting is faster than multiplication, division and modulus. The comments
inside the code are useful to describe the result to fellow developers.
Consider a structure that stores eight two-valued flags using one byte each:
typedef struct wasteful_struct_
{
u8 one;
u8 two;
u8 three;
u8 four;
u8 five;
u8 six;
u8 seven;
u8 eight;
} wasteful_struct;

wasteful_struct wasted_array[MAX_NUM];
If the flags above are used to store some status values or similar two-valued
information, we can save seven bytes per structure by allotting a bit to each flag (eight
bytes come down to one). We can use more bits for flags that have more possible
values too. Since we are wasting memory in the structure definition itself, any array
like wasted_array above created through wasteful_struct will replicate this wastage.
Consider the following definition instead:
typedef struct bit_field_struct_
{
u8 one:1;
u8 two:1;
u8 three:1;
u8 four:1;
u8 five:1;
u8 six:1;
u8 seven:1;
u8 eight:1;
} bit_field_struct;

bit_field_struct packed_array[MAX_NUM];
The MAP file provided by the linker (or some tools provided by the RTOS ven-
dors (e.g. pSOS Awareness)) can be used to find out the location where the stack
of a particular task begins. We must remember that the stack grows downwards.

[Figure: two task stacks. The stack of task T1 begins at 0x1000 and grows
downwards; the stack of task T2 begins at 0x0600 and grows downwards.]

❑ We can set watchpoints where the task stack ends (e.g. at 0x0601 in the above
picture). The program execution will stop whenever the stack grows to its limit.
❑ Another way is to fill the entire stack area with a pattern, e.g. 0xDEAD or sim-
ilar, and let the program run for some time. If we then examine a memory dump of
the stack region, the part still holding the pattern tells us how much of the stack went
unused. Based on this, the stack size can be either increased or decreased.
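The fill-pattern idea (sometimes called "stack painting") can be sketched with an ordinary array standing in for a task stack. The 0xDEAD pattern comes from the text; the size and function names are illustrative:

```c
#include <stdint.h>

#define STACK_WORDS 256
static uint16_t stack_area[STACK_WORDS];

/* Paint the whole stack area with a known pattern before the task starts. */
static void paint_stack(void)
{
    int i;
    for (i = 0; i < STACK_WORDS; i++)
        stack_area[i] = 0xDEAD;
}

/* The stack grows downwards, i.e. from high indices towards index 0,
   so the untouched region still holding the pattern starts at index 0. */
static int unused_words(void)
{
    int count = 0;
    while (count < STACK_WORDS && stack_area[count] == 0xDEAD)
        count++;
    return count;
}
```

After the task has run for a while, STACK_WORDS minus unused_words() is the stack high-water mark, and the configured stack size can be adjusted accordingly.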
10.5.4 Endianness
Endianness refers to the representation of multibyte variables inside embedded system
as low-order first (called little endian), or high-order first (called big endian).
Care should be taken to maintain the endianness throughout. An endianness differ-
ence can cause problems if the processor tries to read binary data written in the oppo-
site format from a shared memory location or file.
Consider the following example on a little endian processor:
short x = 1 ;
short z = 0 ;
short y = 0xFE;
On a little endian architecture the memory will look like the following:
Memory address    Contents
0x1000            01 00 00 00
0x1004            FE 00 00 00
Notice that the value is stored contrary to how humans tend to read the two bytes.
So, the programmer should be a little careful when accessing the contents of this mem-
ory directly through a pointer since it may result in an erroneous interpretation of mem-
ory.
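A common runtime check of the host's endianness (a standard idiom, not from the text) inspects the byte stored at the lowest address of a multibyte value:

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little endian host, 0 on a big endian one. */
static int is_little_endian(void)
{
    uint32_t value = 0x01020304;
    uint8_t first_byte;

    /* Copy the byte stored at the lowest address of 'value'. */
    memcpy(&first_byte, &value, 1);
    return first_byte == 0x04;   /* least significant byte first? */
}
```

Code that shares binary data across processors can consult such a check, or, better, always convert to one agreed byte order before writing multibyte values.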
10.5.5 Compiler-optimiser
Compilation is followed by an optimisation step in which redundant and useless code
is removed. Normally, this works to the benefit of the programmer. For example, con-
sider the fragment in Listing 10.1:
{
int a = MAX_VAL ;

if ( a == MAX_VAL )
{
/* ... */
}
}
Here "a" has been defined and assigned the value MAX_VAL. The program com-
pares it against the same value in the next statement. This is a perfect candidate for
optimisation, and most optimisers will remove the comparison a == MAX_VAL.
Makes sense. The problem comes if "a" is a shared variable that can potentially be
changed by another task. After "a" is assigned MAX_VAL above, an unfortunate task
switch may happen and the other task may change its value. Now the comparison is
no longer redundant, and removing it changes the behaviour of the program.
{
volatile int a = MAX_VAL ;

if ( a == MAX_VAL )
{
/* ... */
}
}
In such cases, we can use the volatile keyword (see Listing 10.2). This keyword tells
the optimiser that accesses to the variable must not be optimised away: the variable is
"volatile", and optimising it can create undesired results.
Another place where we should restrain the optimiser is when we are reading from
memory whose contents change on their own, for example memory-mapped I/O. We
may be constantly reading the same memory location through a pointer, possibly in a
loop. Most optimisers will skip re-reading the memory and return the contents read
the first time. Declaring the pointer volatile solves this problem.
Callback systems
Callback systems are the forefathers of today’s application frameworks. They work on
the principle of “inversion of control ” i.e. “we will call you, don’t call us ”.
In this kind of system, we implement a few functions as specified by the callback system
and these functions are called whenever required. A classic example of a callback (as
opposed to a full callback system) is the comparison function we pass to the C library
quicksort routine, qsort().
One real world example could be when you implement a Network Interface Card
(NIC). The NIC usually implements the MAC (Media Access Control) part of the Data
Link layer and the PHY (Physical) layer of the OSI stack. This should integrate with the
LLC (Logical Link Control) already available in the host.
For example, if the host wants to transmit a packet using the NIC, what does it do?
The NIC driver implements a transmit callback whose prototype is specified by the
LLC, and the LLC calls this function whenever it wants to transmit a packet. This
makes the software in the driver independent of the actual conditions under which the
function will be called by the application. It is important to note that even though we
implement the callback function, we don't call it directly. It is called as and when
required by the system (or the application framework).
State machines
State machines seem to be ubiquitous today and their applications are wide and varied.
It might seem strange, but state machines are not a new concept in computer science:
they are spin-offs of finite state automata theory.
This theory has revolutionised the field of compilers. It is interesting to observe that
early compilers did not allow nesting of expressions beyond a certain level because
they could not parse deeper expressions. Compilers grew tremendously more powerful
after the introduction of automata theory.
State machines rule the protocol world. Almost every protocol in tele-
communications and networking fields uses state machines. The extensive use of state
machines has led to creation of a language called SDL (specification and description
language). SDL extends the functionality of state machines and provides a lot of addi-
tional features.
In some operating systems there is direct support for implementing state machines.
This is OS specific. But for other RTOS’ we have to implement our own state machines.
In this section we will explore ways of implementing them.
[Fig. 10.2: State diagram of the networking protocol. From NOT_CONNECTED, the
P2P_CONNECT signal leads to CONNECTED_P2P and the CS_CONNECT signal leads
to CONNECTED_CS; a DISCONNECT signal from either connected state returns to
NOT_CONNECTED, while TX and RX events are handled within the connected states.]
At any point of time, the protocol could be in any of the following states:
i. NOT_CONNECTED
ii. CONNECTED_P2P
iii. CONNECTED_CS
The state transitions occur due to events that could be triggered by the environment
or could be triggered internally. (e.g. timeouts)
State transitions could also occur because of setting/resetting of a flag. This is illus-
trated by the following example.
[Figure: heater state machine. The run_heater state transitions to the idle state when
the condition current_room_temp == set_temp becomes true.]
This is explicitly noted here because many people either associate the action with
the transition or assume that it happens after the transition. This ambiguity can be
removed by thinking of the transition as the final step of the action.
Now, we will explore two ways of implementing state machines. The two ways
described here are:
i. Using switch-case construct
ii. Using function pointers
Implementing state machines using the switch-case construct: Here, we create various states
and signals using enum or #define and use a switch case construct to implement a
state machine.
Let us consider the case of the networking protocol illustrated in Fig. 10.2.
The three states can be defined in a header file (StateMachine.h) as:
typedef enum ProtocolState_ {
PS_NOT_CONNECTED,
PS_CONNECTED_P2P,
PS_CONNECTED_CS
} ProtocolState;
The PS_ prefix is an acronym for ProtocolState. It can be replaced by a project-
specific prefix.
Let us assume that these signals are appropriately defined in StateMachine.h after
sufficient precautions for multiple inclusion.
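For illustration, the signals of Fig. 10.2 might be defined in StateMachine.h like this (the guard macro name is an assumption; any multiple-inclusion guard will do):

```c
#ifndef STATEMACHINE_H
#define STATEMACHINE_H

/* Signals that drive the protocol state machine of Fig. 10.2 */
typedef enum ProtocolSignal_ {
    P2P_CONNECT,
    CS_CONNECT,
    DISCONNECT,
    TX,
    RX
} ProtocolSignal;

#endif /* STATEMACHINE_H */
```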
There are two ways of looking at this state machine implementation:
i. In every state, various signals are handled
ii. Each signal behaves differently in different states
We will choose the first option that is widely used because of many reasons like
extensibility and reusability.
/* static global variable that stores the current state
of the state machine */
static UINT16 u16State = PS_NOT_CONNECTED;

switch (u16State)
{
case PS_NOT_CONNECTED:
{
switch ( signal )
{
case P2P_CONNECT:
{
/* take appropriate action … */
}
break;
case CS_CONNECT:
{
/* take appropriate action … */
}
break;
case DISCONNECT:
{
/* … */
}
break;
} /* end of switch(signal) */
}
break;
case PS_CONNECTED_P2P:
{
/* … */
}
break;
case PS_CONNECTED_CS:
{
/* … */
}
break;
default:
{
DB_PRINT ("Unknown State");
}
break;
} /* end of switch(u16State) */
Pros:
Very simple to implement.
Cons:
If each case becomes big, the handler must be made into a separate function.
Otherwise, its readability goes down drastically.
Implementing state machines using function pointers: Another way of implementing state
machines is by using function pointers and state/signal matrix.
The state/signal matrix for the above protocol example maps every (state, signal)
pair to a handler function; for instance, the TX signal in the CONNECTED_P2P state
would map to a handler such as OnTxRequestinConnectedP2P( ).
It is important to note that not all the signals are handled in all the states. It is better
to fill a common error function into all the unexpected (signal, state) slots rather than
leaving them undefined. Otherwise, we will never know that a spurious signal
occurred, and our code may crash unexpectedly.
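A minimal sketch of the matrix-driven dispatch follows; the states, signals and handler names here are simplified stand-ins for the protocol of Fig. 10.2:

```c
#include <assert.h>

typedef enum { ST_NOT_CONNECTED, ST_CONNECTED_P2P, NUM_STATES } State;
typedef enum { SIG_P2P_CONNECT, SIG_DISCONNECT, NUM_SIGNALS } Signal;

typedef void (*Handler)(void);

static State current_state = ST_NOT_CONNECTED;
static int spurious_signals = 0;

static void OnP2PConnect(void) { current_state = ST_CONNECTED_P2P; }
static void OnDisconnect(void) { current_state = ST_NOT_CONNECTED; }
static void OnUnexpected(void) { spurious_signals++; }  /* common error slot */

/* state/signal matrix: every slot is filled, none left undefined */
static const Handler matrix[NUM_STATES][NUM_SIGNALS] = {
    /* ST_NOT_CONNECTED */ { OnP2PConnect, OnUnexpected },
    /* ST_CONNECTED_P2P */ { OnUnexpected, OnDisconnect },
};

static void Dispatch(Signal sig)
{
    matrix[current_state][sig]();
}
```

Dispatch(SIG_P2P_CONNECT) moves the machine into ST_CONNECTED_P2P; a second SIG_P2P_CONNECT in that state lands in the common error handler instead of crashing.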
Pros:
This system is easy to debug.
Cons:
As the states and the signals grow, the matrix may become sparse and may occu-
py much more space than necessary.
The designers/implementers should weigh the pros and cons of both the methods
before choosing one over the other.
Sound programming practices are like the proverbial stitch in time. Coding guidelines
are helpful in creating consistently high-quality code across the team. Proper care should
be taken to make the code readable, robust, maintainable and efficient. Macros are very
helpful in creating well-written code; however, the programmer should use them prop-
erly, else they have the potential to introduce bugs.
Because of the event-driven nature of realtime systems, they are usually implemented
using state machines.
Estimation Modelling in
Embedded System
11.1 INTRODUCTION
An embedded project is usually executed by a team. This team will have a goal (in
terms of fulfilling the functional requirements from the customer). So, it becomes inher-
ently important to take stock before beginning the project and try to create a picture of
cost and time required.
The first step in this direction is to try to calculate the amount of effort required to
convert the set of functional requirements into working code, unit test it and integrate
it. This procedure is called effort estimation. This chapter provides an introduction to this
procedure. We try to focus on the factors that make estimation inherently difficult and
inaccurate. We will highlight the reasons for keeping the estimates iterative and up
to date.
Though this chapter introduces estimation as an integral activity of software
development, the information in this chapter is far from being exhaustive. The
discussion in this chapter has been intentionally kept brief and introductory. The reader
is advised to look at references given at appropriate places inside the chapter for more
information.
Suppose you are invited to your friend’s place for a dinner party. Your friend gives you
a map of the part of town he resides in, writes down his postal address and telephone
number and leaves. You are supposed to be there at 8 PM. It is a nice evening. You take
a good bath, wear cologne and set off — cool breeze in the hair and not a worry in the
world. Not a worry till you realise that you do not know what time to start in order to
reach there at 8.
If you are in a situation like this, what thoughts will come to your mind? You may
start thinking, “well, let me make a smart guess based on the facts I have at hand”. You
may take into account the traffic patterns in that part of the city at this time of the day,
you may ask other people or your friend to give you a feeling of the time it takes to
reach there. You may like to consider the condition of your car before adjusting the
duration suggested by your friend; if you own a Porsche, for example, you may want to
shorten it ☺. This complete process of arriving at a time duration based on historical
data and expert advice is, to use the jargon, called estimation. By its very definition,
estimation is not accurate. This is because
we have not taken into consideration what is actually happening on ground for arriving
at this estimate, or what events will have to be considered in future to arrive at an
accurate figure. We have taken only past experience into account. And past experience
is based on past events, which may not be valid now. So, if you are not Nostradamus,
it will be difficult for you to predict future events and take them into account ☺.
What we saw in the last section applies perfectly to software. Suppose your customer
comes to you with a project and a set of requirements. You need to estimate these
requirements along three factors, namely:
❑ how much effort is required, i.e. man days;
❑ how many people and resources are required for execution, i.e. cost;
❑ what will be the basis of acceptance of the project by the customer, i.e. quality.
Further, clients often ask for an estimate without giving an accurate description of what
they want. (E.g. "I would like a vending machine program. How much will that cost?")
This drastically undermines the ability to determine how long a task will take. Usually,
programmers heavily underestimate the time required, sometimes in order to get the
job and sometimes from inexperience. And, as we all know, in the programmer's world
everything can be done in a few weeks ☺. Also, in practical situations, if I am in
competition with other contractors bidding for the same project and the client is look-
ing for the best bargain, I am bound to lose the bid if my estimate is larger than the
others'. It is a trade-off: a high estimate loses business and a low estimate is risky.
[Fig. 11.1: The cone of uncertainty. At the feasibility stage an estimate may be off by
a factor of 4 in either direction (from 0.25x to 4x of the final value); the range narrows
through the requirements, design, code and delivery stages.]
When a project is being estimated, usually the required information is not completely
available and a lot of ambiguities exist.
Estimation is performed in a number of steps. At any point of time, the estimator has
to perform these steps in order to arrive at the final estimate for the deliverable. This
section will provide an introduction to these steps. The next sections will describe how
to perform these steps.
Accuracy means how close the estimate is to the actual figure. For example, if the
final effort is 100 staff hours, an estimate of 90 hours is more accurate than 80 hours.
Precision defines the level of uncertainty for that estimate. For example, 90 plus or
minus 40 is less precise than 80 plus or minus 20.
As we saw in Fig. 11.1, the uncertainty cone is very wide at the beginning of
the project. Hence the accuracy and precision are not very high at this time, and so
cannot be relied upon heavily. The estimation of size should invariably be refined
(made precise and more accurate) during the entire duration of the project, periodically,
or in different stages (requirements to design to implementation, etc.), or at the
update of requirements, etc. It is often easier for an estimator to propose an estimate
in SLOC. This is because the final product will be measured in this unit, and because
embedded engineers typically have long experience of working with C. The unit
of measurement may be convenient; however, it is very difficult to predict size
in terms of lines of code when not a single line has been written. First, it heavily
depends on the coding styles of individuals. Second, it depends on the language being
used for implementation. FORTRAN is known to need fewer SLOC for mathematical
operations as compared to C++, but fails utterly when it comes to performing file
operations!
Hence, if the organisation has very strict programming guidelines as proposed in the
previous chapters and they are followed religiously, this basic inconsistency may be
nullified. Otherwise, these factors may bring an inherent inaccuracy into the estimation
process.
[Figure: the estimation process. Gather requirements, estimate size, estimate effort,
estimate schedule and estimate cost; review the estimate and analyse the estimation
process, drawing on the previous-projects database and the available resources, and
review the estimate at regular intervals.]
The estimator should keep the availability of staff in mind and the level of expertise
available in executing similar projects in the past. The amount of code reuse possible
also has a bearing on total estimated effort. The size of code that has to be written for
adapting the code for reuse definitely brings in overheads. In addition, organisational
factors like attrition, holiday, etc. and effort for risk management create an impact on
the estimated schedule. All these little estimates are themselves inaccurate and they
tend to add on to the inherent inaccuracy of an estimation process.
Expert opinion
One of the first methods that comes to mind in case of estimation of a particular
software is to get the expert’s opinion. The expert would have experienced a similar sys-
tem in the past and would have come across similar code. He or she is best equipped
to perform an extrapolation from the past and apply it to the new estimation. However,
this needs to be taken with a pinch of salt. First, such an estimate usually does not have
a quantitative basis, so it is difficult to review it in a quantitative way. Second, estimates
differ from expert to expert, both in the perception of the complexity of the problem
and in the problems and risks foreseen for different possible implementations, so it is
very difficult to come to a conclusion based on such subjective analysis. Third, no two
experiences are the same. Even if an expert has developed a similar project in the past,
it is next to impossible that the new software will be the same; in fact, very few parts of
the code will be similar to the old one. Hence, the estimate must mostly be based on
the factors affecting the current project.
When a lot of money and resources are involved, we would certainly like to use some-
thing more than just intuition! Please note that this estimation can be done in any unit,
be it SLOC, FP or something else: the choice of unit does not really affect the estimation
process or the associated difficulty of accuracy and precision.
Strengths of this method:
❑ The domain expert is the best person to estimate.
Historical data
Another way of doing it is to use historical data of similar projects and apply it to the
current activity. This will give close results if historical data is available in the organisa-
tion in the first place. Secondly, this data should correspond to the kind of system being
developed. An embedded system is doomed for disaster if data from a GUI system is
taken for estimation. Thirdly, factors under which the previous system was developed
should be taken into consideration.
Delphi technique
In order to arrive at an analysis, certain parametric or algorithmic models can
be used. The most famous among these models is the Delphi technique. This
technique uses a group of experts who are asked to estimate the software individually
and in isolation. The coordinator then averages the experts' opinions without a group
discussion. A variation of this technique, called wideband Delphi, works as
follows:
i. Co-ordinator provides each expert with an estimation sheet.
ii. Each expert fills out the form individually.
iii. Co-ordinator collates all estimates and marks the points where the estimates
differ widely.
iv. Co-ordinator calls a group meeting; where the experts discuss these points and
understand from one another the basis of arriving at the estimation figures.
v. Based on the discussion, experts review and submit the estimates again.
vi. The co-ordinator and the experts go through iterations in order to arrive at a
consensus.
Weaknesses:
❑ Since experts do the estimates based on their experience, there is no way to
judge the accuracy of an estimate.
❑ An estimate arrived at by group consensus may be more accurate; however, it
may take a lot of time. An individual expert's estimate is quick but not
quantifiable.
❑ The estimate is not always exactly repeatable.
❑ Arriving at a consensus of different estimates may be difficult. There is no
clarity on whether to find the average or median of these different estimates.
There are two ways to refine this approach. If a person is asked to provide an esti-
mate for, say, a card verification system, he will most probably reply, "Mm, maybe
2000 SLOC". This is possible because, while giving off-the-cuff estimates, the system
is usually viewed in its entirety, and many times the complexity involved in the
interfaces among the components is not taken into account. This is called the
top-down method of estimation.
Estimation Modelling in Embedded System 273
The estimate can be arrived at in a different way if the expert tries to break the sys-
tem into its components to get a feel of the internal behaviour of the system, basing
this on the knowledge and experience of working on similar systems. This is called
the bottom-up method of estimation.
The expert can then provide an estimate based on the two approaches. Conceptually,
both top-down and bottom-up approaches usually arrive at the same results; the choice
depends on the ease of use for the expert.
Strengths:
❑ System level focus.
❑ Takes overall internal and external factors, like system integration, into
account.
❑ Puts emphasis on the coupling or interaction among the system constituents.
Weaknesses:
❑ Can be less accurate because of lack of detail at the level of constituent blocks.
❑ No focus on the complexities of individual components of the system. So their
estimation may not prove to be accurate.
Bottom up estimating
In this technique, the cost of each software component is estimated and these costs are
then added to arrive at an estimated cost for the overall product. As can easily be seen,
this approach provides the estimator with what the top-down approach lacks: detail
about the individual components.
Strengths:
❑ All components are individually estimated, so there is a better estimation
basis.
❑ Interaction between the components can be better estimated once the
individual components are understood well.
Weaknesses:
❑ May overlook overall common effort, such as system integration and the
configuration management associated with software development.
Based on the experience of the software industry, there are some lessons to be learned
regarding estimation and the associated planning.
Change and addition in requirements: The possibility of new requirements arising
during the development of the software, and the associated changes, need to be kept
in mind while doing estimation. COCOMO II [1] handles this well.
Estimation inaccuracy: By its very definition, estimation is inaccurate. It needs to be
refined at each stage of development.
Management of the project: Even with accurate estimation, a mismanaged project can
land itself in trouble with respect to effort and schedule. Hence, wastage should be
eliminated at all costs.
Don't over-promise: Human limits should never be reached while planning. When all
estimates are pointing otherwise, it is better not to over-promise. It finally boils down
to the choice between doing too many things wrong and doing a few things right.
Beware of excessive multitasking: Humans cannot perform many jobs at the same
time. While planning a project, it is vital to watch for this tendency on the part of
software engineers: when the number of jobs a software engineer performs
simultaneously increases, the quality of the deliverables invariably suffers.
Precision is not possible: As we saw early in the chapter, estimation should be done
in ranges owing to the inherent lack of precision in estimation. These ranges become
narrower and narrower as the project proceeds until at delivery time when they con-
verge on the actual delivered size of the product.
Use several methods in parallel: It is advisable to use several estimation techniques in
parallel, verifying the resulting figures against one another.
Proper planning: While making a schedule, it is imperative to remember that people
work only so many days a week and so many weeks in a year, and that their availability
is not evenly spread across the whole year. People take holidays, and there are other
distractions such as travel, recreation and organisational indulgences. So the schedule
needs to take these factors into account.
Proper review: It is advisable not to be in a hurry while doing estimation. It is always
good to sit down and review the estimates made in the first round. People are known
to be overly optimistic when they want to make estimates. Hence, some time should
definitely be spent in revising the analysis and factors affecting the estimations.
Who should do the estimation? Two sets of people are indispensable for doing esti-
mates. First, the veterans of estimation: these people have battled hard and know the
general issues affecting a project. The second are the developers themselves.
Developers usually have a strong focus on technical details, which helps in highlighting
problems in implementation.
Ask as many questions as possible: When doing estimates, we should feel free to ask
questions. These questions will narrow down the assumptions we have made and
hence lead us towards a more focused and less hazy analysis.

P2P: Involvement of developers during estimation brings in a feeling of ownership
that garners commitment to the project.
Store project data: Organisations that keep a database of the projects they completed
in the past can reuse this knowledge when planning the next projects. This is true even
when the past projects were different, involved different factors, or were performed
under different conditions. Past records give us a hint of achievable targets, and a
baseline can be arrived at for the new project. It may vary slightly depending on the
extent of changes; however, data collected over a period of time can be really accurate.
Estimation of size, effort and schedule are the prerequisites for a good project plan.
Estimation of size can be done on the basis of complexity of the code and other factors
such as coupling between components, and can be expressed as SLOC or Function
points. This size can be converted into effort based on the empirical formulae available
for different projects from a study conducted by Boehm.
All said and done, estimation is a team activity, and all members of a team need to
be involved during initial estimation, review and planning.
12.1 INTRODUCTION
As is true with all systems, embedded systems need to be validated before they are
shipped for delivery. Testing of embedded systems is done at various levels: first the
developers perform unit testing of the lowest-level code, then module-level testing to
test a group of units forming a logical entity, and finally system testing to validate the
entire system.
Another way of looking at testing is the kind of errors being looked for, that is, the scope
of testing. It is interesting to note how embedded system validation differs from
validation of applications. Also, the kind of challenges that exist in validating software
written for embedded systems are quite interesting. This chapter begins with this
discussion followed by a brief introduction of the different kinds of testing performed
on embedded systems, the validation tools available, testing strategies and tactics, and
some troubleshooting tips. Finally, we will look into some of the most famous
embedded system failures in history, together with the reasons behind them.
In general, any software needs to be tested using the following three basic steps:
i. Create an input for the software in some form.
ii. Receive an output and compare it with the expected output.
iii. Repeat it till it can be safely concluded that the major paths of execution and
decision have been tested.
We need to keep in mind that for reasonably big software, whether application or
embedded, it is highly difficult, if not impossible, to test each and every part of it.
That is why the last item in the above list states that the test should be concluded after
all major paths have been checked. What constitutes a major path needs to be defined
before test execution.
Let us then list out what makes testing of software so very difficult.
❑ Sensitivity to errors: Software is inherently sensitive to errors. A nearly correct
value is no better than a completely wrong value, and the percentage of software that
executed correctly does not matter if the output is wrong. What varies is only the
amount of time spent in debugging, depending on how easily the error can be tracked.
Even then, a completely wrong answer is usually easier to track down than an error
that occurs only intermittently, or one that gives a slightly wrong answer, because
both of the latter tend to originate in some obscure portion of the code.
❑ Complexity: All said and done, software is complex. Given the instruction set
of a language, there is a seemingly infinite number of ways in which the instructions
can be combined to arrive at working code.
❑ Testing issues: Even a simple program cannot be tested exhaustively against all
combinations of possible inputs and all possible outputs. Testing every execution
path and decision statement with all possible values is a really hard job. Together
with the real-life constraints of aggressive schedules and limited resources, this
makes exhaustive testing a losing battle. So, at the end of the day, what is important
is to realise that software shipped for delivery can never be claimed to have zero
defects. We can only convey a feeling for the quality of the software based on the
defects found, and on how the number of defects has gone down over a period of
time. And most importantly:
P2P
Testing can only unearth the presence of bugs; it cannot lay any claim on the absence of them.
Armed with the background of Section 12.3, it is now worthwhile to see
how best all these problems can be minimised. At a broad level, testing of
embedded systems can be divided into two categories based on the platform.
Embedded systems are among the very few kinds of systems that are developed on one
kind of platform (generally Unix, though often Windows too) but need to be cross-
compiled for another. The first platform is called the host and the second the
target. The host is used for development because of the limitations of editing and
compiling code on the target system. So, testing needs to be performed at both the
host and the target level.
Incomplete software
Testing on target hardware means that the complete software should be available for
validation. This means that target testing can be performed only after the entire
development has finished. This is a major limitation, since target testing cannot check
the quality and validity of individual components of the embedded software. It is also
more expensive: for each small bug in a piece of software (one that may have nothing
to do with being “realtime”), the complete software needs to be executed, and the long
chase of the mysterious bug begins. It is much more cost- and time-effective to catch
such bugs on the host.
Incomplete hardware
Not only does the software need to be complete, but the hardware should also be up and
running before actual target testing can be done. It is a project planning issue, but in
most cases hardware and software cannot be delivered on exactly the same day, which
means that at least one of the teams has to wait. While the hardware is getting ready, it
makes more sense to test the software, in a limited way, on the host.
Regression
Because the target runs in realtime and uses the complete environment, it is much
harder to reproduce a bug on the target platform later in order to check for regression.
Once a bug has been detected, it can be simulated easily on the host and effectively
added to the standard test suite to contain regression.
Incomplete testing
Target testing cannot exercise all portions of the code. The reason is simple: a good part
of embedded systems code deals with catching exceptions triggered by rare failures,
and much of the rest depends on the realtime characteristics of input signals and on
what “state” of processing the system was in at the time. So it is not possible to claim
that all code has been tested on the target. It is much easier to simulate such situations,
and test the code, on the host.
No realtime
All problems that relate to the realtime behaviour of the system are very difficult to test
on the host. Even though simulators are used, they help only to a limited extent,
since they impose some kind of overlay on the existing software, thus changing the
realtime behaviour of the code.
Validation and Debugging of Embedded Systems 283
[Figure: Host and target test setups. On the host system, the hardware-independent code is driven by test scripts or the keyboard, with logs or a display standing in for real hardware. On the target system, the same hardware-independent code runs against the real hardware — sensors and output devices connected to the outside world.]
There is a part of code inside the system that is completely hardware independent,
and a piece of code that depends on the hardware. The idea is to replace this hardware
dependent code with a test setup that simulates this environment. The simulation test
code needs to be written for ISRs, timing interrupts, direct access to memory and
devices, other modules, etc. Usually, such test code is called a “stub”.
Testers can then spend the remaining time performing manual testing over and above
the well-defined tests in the automated suite. Chances are that the manual testing also
unearths bugs. So automated testing does not replace manual testing; rather, the
advantages of manual and automated testing create a synergy that makes the testing
process much more effective.
Let us look at it in a different way. Assume that the project has planned for 20
days of testing effort. If 80% of the test plan can be automated, only about 4 days are
spent running the automated tests, and at the end of it the result is the same as would
have been achieved by 20 days of purely manual testing. If manual testing is then
continued for the remaining 16 days, any additional bug found is one that would never
have been found had automated testing not been available to free up that time.
To conclude:
We usually cannot find new bugs by performing automated test-
ing but we can reduce testing effort tremendously, while still
ensuring quality of the product by performing a judicious mix of
manual and automated testing.
Functional test suites and the source code are independent of each other. The test
suites can be developed as soon as the requirements for a system are ready, and much
of the work can proceed in parallel with development of the source code. And because
functional testing looks at the system from the requirements point of view, it is well
suited to testing nonfunctional requirements too. These features allow it to find
problems that are not detectable by regression testing or code-based testing.
Once host testing is finished target testing can commence. As we have noticed before,
it is a bit more difficult to test on the target platform because of the limited visibility of
what goes on inside the target, in realtime.
[Figure: Target testing with a ROM emulator. A cross-compiler on the host builds the software of the embedded system together with a small debugger; the image is loaded over a serial line into a ROM emulator plugged into the memory socket of the target system, while debugger software on the host controls the session.]
As is evident, the source program needs to be cross-compiled together with the small
debugger software and then loaded on the target.
12.5.5 JTAG
JTAG is a hardware tool that can control and observe the boundary pins of a device,
for verification of their operation, via software control. A special tool had to be created
for this purpose because of the proliferation in the number of pins in a given area on a
chip. A JTAG ( Joint Test Action Group) consortium exists that caters to the
requirements and standardisation of this testing procedure. The IEEE 1149.1 standard,
known as the IEEE Standard Test Access Port and Boundary-Scan Architecture,
provides the complete details of this procedure.
For a boundary scan to be possible, the device should be compliant with JTAG, which
means that the processor provides what is known as a JTAG port. A cable connects the
host to the JTAG port on the target system, and software on the host controls the target
microprocessor through it.
Validation activity is necessary in order to find problems in the software. We will see
more programming guidelines in later chapters, but here are some tips specific to
programming embedded systems:
❑ KISS: Write simple code that can be easily understood and changed by a third
person if required. Efficiency is a major factor in embedded systems, but it tends
to bring in complicated, heavily optimised code; a balance needs to be maintained
between the two. In other words, Keep It Short and Simple.
❑ Build tracing mechanisms into the code early enough that a compile switch
or a similar method can enable them. Traces can vary from recording the execution
flow of functions down to the values of variables.
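One minimal way to build in such tracing is a macro that compiles away completely unless a switch is defined. This is a sketch; the switch name `TRACE_ENABLED` is an assumption, not something the book prescribes:

```cpp
#include <cstdio>

#ifdef TRACE_ENABLED
#define TRACE(...) std::printf(__VA_ARGS__)
#else
#define TRACE(...) ((void)0)   /* expands to nothing: zero cost when off */
#endif

// Example instrumented function: traces entry and exit when enabled.
int process(int x) {
    TRACE("process: enter, x=%d\n", x);
    int result = x * 2;
    TRACE("process: exit, result=%d\n", result);
    return result;
}
```

Because the disabled form expands to `((void)0)`, shipping builds pay neither code-size nor runtime cost for the trace points.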
❑ Document your interfaces carefully: Any interface used by your module should be
well documented. Any assumptions in this regard can prove detrimental.
❑ Do not assume anything about realtime behaviour: While designing the system, it
is advisable not to make any assumptions like:
This message will always be received in this state since it normally takes 5 msec
for response to travel from task B to task A.
After sending a request to task B , task A has sufficient time to process internal
conditions. So, this function need not be optimised.
Such assumptions can prove costly.
❑ Create a contingency plan: The software should be designed with a significant
margin from its deadlines. If, during validation, the software is found to be working
perilously close to its deadlines, Murphy’s law is bound to haunt you in the field.
This section provides case studies of some famous faults in embedded systems,
together with an analysis of the causes behind the failures.
❑ Therac-25: Therac-25 was a medical linear accelerator. It overdosed six radiation
therapy patients over a two-year period, leading to the deaths of three of them.
The causes that were found related to a total lack of a formal software product life
cycle, insufficient time allocated for testing, little documentation, and an ad hoc
approach to the implementation and testing of the software.
❑ Ariane-5: Ariane-5 was a $500 million rocket designed to launch satellites. On
its maiden flight in 1996 it flew for a little more than 40 seconds before self-destructing.
The cause was found to be a software error in code tracking launch data that was not
even relevant to the flight at the moment the error occurred; it triggered a chain of
events that finally resulted in self-destruction.
In software improperly reused from Ariane-4, a 64-bit floating-point datum — the
horizontal velocity component — was forcibly converted into a 16-bit signed integer,
causing an overflow. Incidentally, this value was relevant to the system only while it
was on the launch platform and was meaningless some 30 seconds after launch; the
overflow nevertheless brought the guidance system down, and the vehicle self-destructed.
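The failure mode is easy to reproduce in miniature: a 64-bit floating-point value is forced into a 16-bit signed integer that cannot hold it. A guarded sketch (names are illustrative, not from the Ariane code):

```cpp
#include <cstdint>

// A checked version of the conversion Ariane-5 performed unchecked.
// Returns false instead of overflowing when the value does not fit
// into a 16-bit signed integer.
bool to_int16_checked(double v, std::int16_t &out) {
    if (v < -32768.0 || v > 32767.0)
        return false;                       // would overflow: report it
    out = static_cast<std::int16_t>(v);
    return true;
}
```

The Ariane code omitted this check on the assumption, valid on Ariane-4, that the value could never grow that large; Ariane-5's faster trajectory broke the assumption.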
❑ Mars Mission in 1998: The Mars orbiter was supposed to orbit Mars as the
first interplanetary weather satellite. However, it lost communication with NASA,
having either entered the atmosphere too sharply and been destroyed, or ended up
in a wrong orbit.
Cause: failure to approach the orbit at the right angle, because of an inconsistency
in the units of measure used by two separate modules developed by separate
software groups.
❑ Mars Lander 1999: The Mars Lander was supposed to land on the surface of the
planet and perform experiments for 90 days. Communication was lost after it entered
the atmosphere.
Cause: spurious signals generated when the lander’s legs were deployed during
descent gave an indication that it had landed before it actually had done so, causing
it to crash into the surface of Mars.
15.1 INTRODUCTION
C++ has always been thought of as a ‘big’ and ‘lavish’ language, loaded with tons of
features, consuming space and slowing down execution. This led to the perception
that C++ cannot fit into ‘small’ systems — an idea reinforced by older compilers that
produced 2 MB executables for a small ‘Hello world’ program. But the language is now
much more mature, and a large number of very good compilers are available. It is time
to give serious thought to C++ for powering embedded/realtime systems.
This paper is divided into four parts, as follows:
i. The myths regarding C++.
ii. C++, OO, component-based development, ORBs, …
iii. A perspective on Embedded C++.
iv. Transition to C++.
15.2 PURPOSE
C++ was mainly thought of as a language that was ‘distant’ from embedded systems.
The paper challenges that notion by showing the effectiveness of the language.
Myths regarding C++ exist because of a lack of knowledge of “what happens under
the hood”. The basic constraints on any language are code size and speed of execution.
Many factors contribute to these constraints. From the C++ point of view, the
general misconceptions and their reasons are described in this section.
Inline functions are another powerful C++ concept, often blamed for code bloat
even though the bloat comes from misuse. Inline functions provide the understandability
and safety of normal functions, and appropriate use of them improves both size
and speed.
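A small illustration of the safety argument: the inline function below is typically as cheap as the macro, but it is type-checked and evaluates each argument exactly once, while the macro may evaluate an argument twice. (The names are illustrative.)

```cpp
// Macro version: each argument may be evaluated more than once.
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

// Inline version: usually the same generated code, but type-checked,
// and each argument is evaluated exactly once.
inline int max_inline(int a, int b) { return a > b ? a : b; }

// Helper with a visible side effect, to expose double evaluation.
inline int bump(int &counter) { ++counter; return counter; }
```

Passing `bump(c)` to `MAX_MACRO` increments the counter twice; passing it to `max_inline` increments it once — the kind of surprise that gives macros a bad name and inline functions their value.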
The single-inheritance mechanism of C++ is essentially “struct-inside-struct” in
C. But in C this has to be implemented manually, which leads to the introduction of
more and more functions.
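The equivalence can be seen side by side. This is a sketch with illustrative type names: the C version embeds the “base” struct by hand, which is exactly the layout the C++ compiler produces for single inheritance:

```cpp
// C style: the "base" is embedded manually as the first member.
struct BaseC    { int id; };
struct DerivedC { struct BaseC base; int extra; };

// C++ style: the compiler performs the same embedding for us.
struct BaseCpp    { int id; };
struct DerivedCpp : BaseCpp { int extra; };
```

For plain data like this, the two derived types have the same memory layout; C++ simply spares the programmer the bookkeeping (and the extra accessor functions the C version tends to accumulate).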
#include <stdlib.h>

#define MAX_CLS 16

struct myStruct { int *i; };

void fn(int j)
{
    struct myStruct mySt;
    mySt.i = (int *)malloc(MAX_CLS * sizeof(int));
    /* ... use mySt.i ... */
    free(mySt.i); /* usually programmers fail to do this */
}

Listing 15.3: Constructors and destructors
Appendix A 301
The constructors and destructors in C++ automate the process of acquiring and
releasing resources (not just memory). The technique of having a constructor acquire a
resource and a destructor release it is phrased as ‘resource acquisition is initialisation ’.
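A sketch of the idiom (the `IntBuffer` class is hypothetical): the constructor acquires, the destructor releases, so every exit path of the function frees the memory that Listing 15.3 relied on the programmer to free by hand:

```cpp
#include <cstdlib>

// RAII wrapper: the constructor acquires the buffer, the destructor
// releases it on every exit path, normal or early.
class IntBuffer {
public:
    explicit IntBuffer(std::size_t n)
        : p_(static_cast<int*>(std::malloc(n * sizeof(int)))) {}
    ~IntBuffer() { std::free(p_); }
    int* get() { return p_; }
private:
    int* p_;
    IntBuffer(const IntBuffer&);             // non-copyable (pre-C++11 style)
    IntBuffer& operator=(const IntBuffer&);
};

// The user of the resource never calls free(): the destructor does.
int use_buffer() {
    IntBuffer buf(16);        // resource acquisition is initialisation
    buf.get()[0] = 42;
    return buf.get()[0];
}   // ~IntBuffer() runs here automatically
```

(For brevity the sketch omits a null check on `malloc`; production code would verify the allocation.)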
class Header {
public:
    virtual int length ( void ) = 0;
    // …
};

class ShortHeader : public Header {   /* concrete header type (representative name) */
public:
    virtual int length ( void );
    // …
private:
    char* szBuf;
    // …
};

class LongHeader : public Header {    /* another concrete header type (representative name) */
public:
    virtual int length ( void );
    // …
private:
    char* szBuf;
    // …
};
More than just elegance, simplicity and improved readability, this reduces the
header-processing code to roughly 1/n of its size, where n is the number of types of
packets (in this case).
15.3.6 Maintainability
Our vProcessHeader will work even if a new class of packets with a new type of head-
er is introduced.
15.3.7 Reusability
Reusability in C was achieved using libraries (following the Structured Analysis and
Design paradigm). Though quite suitable for smaller projects, these approaches cannot
be used to construct large software systems: any change in an interface broke the build,
and this was the main cause of the fragility of large systems built this way.
Consider code used to write on a display terminal in C; call it
vDisplayOnScreen (char* szText ).
Now, if we want to add a background colour, we need to rewrite
vDisplayOnScreen (char* szText) as vDisplayOnScreen (char* szText, enColor color ).
This obviously breaks the build: ripples are caused throughout the project and the
code has to be recompiled. Let us see how we can do this in C++.
class DisplayWriter {
public:
virtual void display ( );
// …
private:
string text;
};
Say we now want to add a background colour. We can write a class such as
BGDisplayWriter:

class BGDisplayWriter : public DisplayWriter {
public:
    virtual void display ( );
    // …
private:
    enColor color;
};
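A runnable sketch of the same idea: callers written against the base class are untouched when `BGDisplayWriter` is added. For observability, `display` here returns the rendered string instead of driving a real screen, and the rendering format is purely illustrative:

```cpp
#include <string>

class DisplayWriter {
public:
    explicit DisplayWriter(const std::string& t) : text(t) {}
    virtual ~DisplayWriter() {}
    virtual std::string display() const { return text; }
protected:
    std::string text;
};

// Added later; no existing caller of DisplayWriter needs to change.
class BGDisplayWriter : public DisplayWriter {
public:
    BGDisplayWriter(const std::string& t, const std::string& color)
        : DisplayWriter(t), bgColor(color) {}
    virtual std::string display() const {
        return "[" + bgColor + "]" + text;   // prepend the background colour
    }
private:
    std::string bgColor;
};

// A caller written against the base interface: it works unmodified
// with every present and future DisplayWriter subclass.
std::string render(const DisplayWriter& w) { return w.display(); }
```

Unlike the C version, adding the colour feature neither changes an existing signature nor forces recompilation of code that only knows about `DisplayWriter`.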
The Embedded C++ specification is a subset of C++ that aims to reduce the code size
and improve the speed of C++ code by excluding certain features of C++ that cause
code and time overheads.
Zero overhead rule in the design of C++: what you don’t use, you don’t pay for. This
simply means that language features you do not use should cause no overheads, in run
time or code size. Most good compilers go a long way towards implementing this, but
RTTI, exception handling, etc. inevitably cause some increase in code size. Most
compilers, however, give you an option of disabling these features when you don’t use
them.
What is embedded C++?: Embedded C++ is a scaled down version of ANSI C++,
with the following features removed:
❑ Multiple Inheritance/Virtual base classes
❑ Run Time Type Identification (RTTI)
❑ Templates
❑ Exception Handling
❑ Namespaces
❑ New style casts (static_cast, dynamic_cast, const_cast, reinterpret_cast )
Exceptions and RTTI are among the major features that cause quite some code bloat
without the user being aware of it. In theory, a good implementation of templates by the
compiler causes no code bloat. The ‘standard template library ’ (STL) is one great reason
to shift to C++, but by removing templates this advantage is nullified. Nowadays, good
C++ compilers also offer an option of EC++ with templates (e.g. the Green Hills™ C++
compiler comes with a dialect called ETC++, which is EC++ with template support).
Even though namespaces and the new-style casts do not cause any increase in code size,
they were not included in EC++ because they were relatively new features at the time.
Using EC++ causes quite some reduction in the object code produced from C++,
especially in large software systems. As compilers mature, the overheads will also
reduce considerably. The EC++ standardisation was done by the Embedded C++
Technical Committee (http://www.caravan.net/ec2plus).
The language constructs are such that appropriate usage provides good space and
time savings, but careless usage may cause problems. So, the following points should
be taken into account during migration to C++.
15.5.2 Templates
A class template is rather like a macro expanding to an entire class. Older compilers
expanded a template into a class every time it was encountered, which had a
devastating effect on code size. But newer compilers and linkers find the duplicates and
produce at most one expansion of a given template for a given parameter class.
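For instance, with a modern toolchain the single template below yields at most one `int` instantiation and one `double` instantiation, however many call sites or translation units use it (the function itself is just an illustrative example):

```cpp
// One template; the linker keeps at most one expansion per parameter
// type, no matter how many places call it.
template <typename T>
T clamp_to(T v, T lo, T hi) {
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}
```

So the code-size cost is proportional to the number of distinct parameter types actually used, not to the number of uses — which is why the old “templates bloat the image” objection no longer holds with good tools.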
15.5.3 Exceptions
Exceptions are abnormal-condition handlers that really help the programmer handle
such conditions, and they also facilitate the prevention of resource leakage. Support
for exceptions results in a small performance penalty for each function call: the compiler
records information to ensure that destructor calls are made when an exception is
thrown. Usage-wise, though, exceptions are worth this cost if used properly.
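A sketch of why that bookkeeping exists, with illustrative names: when `configure` throws, the destructor of its local object still runs during stack unwinding, so the resource is not leaked:

```cpp
#include <stdexcept>

static int open_count = 0;   // visible side effect standing in for a resource

class Port {
public:
    Port()  { ++open_count; }    // constructor acquires
    ~Port() { --open_count; }    // destructor releases, even while unwinding
};

void configure(bool fail) {
    Port p;                      // local resource
    if (fail)
        throw std::runtime_error("configuration failed");
}   // ~Port() runs on both the normal and the throwing path
```

The per-call penalty mentioned above pays for exactly this guarantee: the compiler must know, at every point, which destructors to run if an exception passes through.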
15.6 CONCLUSION
We see that C++ has come a long way since the first Cfront compiler for C++ was
written by Bjarne Stroustrup at AT&T Bell (then) Labs. The compilers now do a
fantastic job of producing code with very little overhead. It took 20 years for C to enter
embedded systems, because it too was thought to be a language too huge for embedded
systems; now C is the most widely used language for programming them. Through
this paper we have stated the facts regarding using C++ in embedded/realtime systems,
and we see that the benefits clearly outweigh the drawbacks. Let us not delay the use
of C++ in embedded systems for another 20 years. The language is mature enough now,
and it is the right time to shift to C++ to reap the benefits of OO-based software
development.
1. The card verification system was introduced in Chapter 1. The requirements of the
system have been defined in Chapter 8.
Create a requirements document for the system.
Design all components shown in Figure 8.3.
Write the pseudocode for these components and simulate this device.
2. I want to automate my house in the following way: my mobile phone sends a
message to my PC at home, which controls all household devices. So, when I leave my
place of work in the evening, I just send a message to my PC. My PC then switches
on my geyser so that I have hot water ready for a bath. My coffee maker starts
preparing coffee, and the music system plays the kind of music I like when I enter
my house. Also, the refrigerator detects if any groceries are missing and sends me
a message via the PC.
Create a simulation of this intelligent network that satisfies the above
requirements. Make suitable assumptions wherever required, giving appropriate
explanations.
3. Assuming the following code is run on a system that allocates two bytes for an
unsigned int and one byte for an unsigned char, calculate the data segment of the
following code:

#define TRUE 1
#define FALSE 0

unsigned int my_int;
unsigned int another_int = 0;
unsigned int third_int = 1;

int main(void)
{
    unsigned int local_int = 3;
    unsigned char local_char;
    int i;
    my_int = local_int;
    …
}
4. The following code uses a function called GetBits that works as follows:
It takes as arguments an offset byte off_byte, an offset bit off_bit and a number of bits num.
It operates on a global array buff of length 24 bytes.
It returns a long int whose least significant bits are filled with num bits of buff, starting
from position off_byte, off_bit.
The code takes an input buffer stream, starts decoding bits, and fills the structure
global_struct based on the values of some parameters. What different problems/bugs
can you spot in the code? Assume char occupies 1 byte and unsigned long 4 bytes.
typedef struct
{
char ncc;
char bcc;
char power;
Exercises 313
char chn;
char c1;
char beacon;
} global_struct ;
int decode_struct( )
{
off_byte = 4 ;
off_bit = 2 ;
if(GetBits(off_byte, off_bit, 1) == 1)
{
off_byte += 21 ;
off_bit = 0 ;
}
else
{
off_bit ++;
}
if(GetBits(off_byte, off_bit, 3) == 0x2)
{
off_bit += 3 ;
global_struct.ncc = GetBits(off_byte, off_bit, 3);
global_struct.bcc = GetBits(off_byte, off_bit, 4);
global_struct.power = GetBits(off_byte, off_bit, 6);
}
else if(GetBits(off_byte, off_bit, 3) == 0x1)
{
global_struct.chn = GetBits(off_byte, off_bit, 3);
global_struct.c1 = GetBits(off_byte, off_bit, 4);
global_struct.beacon = GetBits(off_byte, off_bit, 9);
}
}
5. In the previous question, is it possible to optimise the usage of RAM? What about
ROM? For achieving the same functionality, how is it possible to write more efficient
code?
6. As we saw in the chapter, memory pools are an efficient way of removing the
non-realtime effects of dynamic memory allocation. The total amount of memory
that can be allocated inside a system is 20 KB. It is desired to create memory
pools of sizes 100 bytes, 1 KB and 2 KB. Create a best-fit pool allocation and
de-allocation mechanism, taking care of any external interrupts entering the system.
7. Write a driver to take care of a DMA channel of size 1024 bytes. The DMA channel
generates an interrupt when the buffer is half full as well as when it is full. The
data should be stored in a circular buffer, and pointers should be used to keep track
of new and old data.
9. Two devices A and B are connected by a UART that can support 9.6 kbaud. Device
A takes input from a user or a stored file and starts transmission of data. The speed
of this transmission may be more than 9.6 kbaud. Design a protocol between these
devices such that data is not lost mid-way.
10. Many operating systems now provide “plug-and-play” functionality; for example,
it is possible to plug another device into a running system. The same functionality is
also found in ad-hoc networks of small devices such as wireless LANs and
Bluetooth™. How do these devices detect each other? Would polling be suitable
here?
11. Design a queuing mailbox mechanism between two tasks inside a system. It should
provide the following features:
a. Two levels of priority of messages, based on different function calls
b. Ability to look inside the queue to check for a particular message
c. Saving a message instead of consuming it, which means that the message
goes to the end of the queue
d. Sending messages to multiple recipients
3G 219
accuracy 267, 268
Address
  bus 53
  pins 55
ADS 30
Allocation of local variable in a stack 100
API 211, 236
Architecture
  definition 211
  pattern 218
    distributed 220
    broker 222
    client server 220
    proxy 221
    layered pattern 218
    micro-kernel 219
    router pattern 218
  style 214
    call and return 216
    data centric 215
    data flow 214
    virtual machine 215
ARQ, selective repeat 230
Array 70
Attributes of embedded systems 4, 8
  communication 5, 11
  Computational power 4, 7
  Dynamic decisions 5
  Memory 5, 11
  Realtime 5, 6, 9
Auto 71
Bar code reader 201
Binding 29
bitfields 248
bit-shifting 246
Board support package 32, 146
  need 146
Broker 222
Buffer size allocation 71
  Fixed 72
  Variable 72
build process 21, 22
byte stuffing 247
C runtime 71
callback 236, 252
Card verifier 201, 222
Chip Select (CS) 54, 55
  CAS 55, 56
  RAS 55, 56
COCOMO 274
Co-design 211
318 Index