Embedded Systems: Lecture Notes
(IARE-R16)
INTRODUCTION
This chapter introduces the reader to the world of embedded systems. Almost everything we see
around us today is electronic; the days when nearly everything was manual are gone. Even the
food we eat is cooked with the assistance of a microchip (the oven), and the ease with which we
wash our clothes is due to the washing machine. This world of electronic devices is built on
embedded systems. In this chapter we will cover the basics of embedded systems, starting from
the definition.
The hardware & mechanical components consist of all the physically visible parts
that are used for input, output, etc.
An embedded system will always have a chip (either a microprocessor or a
microcontroller) that holds the code or software which drives the system.
HISTORY OF EMBEDDED SYSTEM
The first recognised embedded system is the Apollo Guidance
Computer (AGC), developed at the MIT Instrumentation Laboratory.
The AGC was designed with 4K words of ROM & 256 words of RAM.
The clock frequency of the first microchip used in the AGC was
1.024 MHz.
The computing unit of the AGC used 11 instructions and 16-bit word logic.
It used 5000 ICs.
The UI of the AGC is known as the DSKY (display/keyboard), which resembles a
calculator-type keypad with an array of numerals.
The first mass-produced embedded system was the guidance computer for the
Minuteman-I missile in 1961.
In 1971 Intel introduced the world's first microprocessor chip, the 4004,
which was designed for use in business calculators produced by the
Japanese company Busicom.
On generation
1. First generation(1G):
Built around 8-bit µp & µc.
Simple hardware circuit & firmware.
Examples: Digital telephone keypads.
2. Second generation(2G):
Built around 16-bit µp & 8-bit µc.
They are more complex & powerful than 1G µp & µc.
Examples: SCADA systems
3. Third generation(3G):
Built around 32-bit µp & 16-bit µc.
Concepts like Digital Signal Processors (DSPs),
Application Specific Integrated Circuits(ASICs) evolved.
Examples: Robotics, Media, etc.
4. Fourth generation:
Built around 64-bit µp & 32-bit µc.
The concept of System on Chips (SoC), Multicore
Processors evolved.
Highly complex & very powerful.
Examples: Smart Phones.
On complexity and performance
1. Small-scale:
Simple in hardware & firmware requirement.
Built around low performance & low cost 8 or 16 bit µp/µc.
May or may not contain an operating system.
Examples: Electronic toys.
2. Medium-scale:
Slightly complex in hardware & firmware requirement.
Built around medium performance & low cost 16 or 32 bit
µp/µc.
Usually contain an operating system.
Examples: Industrial machines.
3. Large-scale:
Highly complex hardware & firmware.
Built around 32 or 64 bit RISC µp/µc or PLDs or Multicore
Processors.
Response is time-critical.
Examples: Mission critical applications.
On deterministic behavior
This classification applies to real-time systems: task execution behavior may be
deterministic (hard real-time) or non-deterministic (soft real-time).
On triggering
Embedded systems which are "Reactive" in nature can
be based on triggering.
Reactive systems can be:
Event triggered
Time triggered
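The two triggering styles can be sketched in C. This is a minimal illustration, not part of the notes: the function names and the 10 ms tick period are assumptions.

```c
#include <stdbool.h>

#define TICK_PERIOD_MS 10 /* assumed period for the time-triggered task */

/* Time-triggered: the task becomes due whenever the elapsed time since its
 * last run reaches the fixed period, regardless of external events. */
bool time_triggered_due(unsigned now_ms, unsigned last_run_ms) {
    return (now_ms - last_run_ms) >= TICK_PERIOD_MS;
}

/* Event-triggered: the task becomes due only when an event flag has been set,
 * typically inside an interrupt service routine (e.g. on a button press). */
bool event_triggered_due(bool event_flag) {
    return event_flag;
}
```

A time-triggered system polls on a fixed schedule, while an event-triggered system stays idle until something happens; the choice affects both responsiveness and power consumption.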
What is an embedded computer system? Loosely defined, it is any device that includes a
programmable computer but is not itself intended to be a general-purpose computer. Thus, a PC
is not itself an embedded computing system, although PCs are often used to build embedded
computing systems. But a fax machine or a clock built from a microprocessor is an embedded
computing system.
This means that embedded computing system design is a useful skill for many types of
product design. Automobiles, cell phones, and even household appliances make extensive use of
microprocessors. Designers in many fields must be able to identify where microprocessors can
be used, design a hardware platform with I/O devices that can support the required tasks, and
implement software that performs the required processing.
Embedding Computers
Computers have been embedded into applications since the earliest days of computing.
One example is the Whirlwind, a computer designed at MIT in the late 1940s and early 1950s.
Whirlwind was also the first computer designed to support real-time operation and was
originally conceived as a mechanism for controlling an aircraft simulator. Even though it was
extremely large physically compared to today's computers (e.g., it contained over 4,000 vacuum
tubes), its complete design from components to system was attuned to the needs of real-time
embedded computing. The utility of computers in replacing mechanical or human controllers
was evident from the very beginning of the computer era—for example, computers were
proposed to control chemical processes in the late 1940s [Sto95].
A microprocessor is a single-chip CPU. Very large scale integration (VLSI) technology has
allowed us to put a complete CPU on a single chip since the 1970s, but
those CPUs were very simple. The first microprocessor, the Intel 4004, was designed for an
embedded application, namely, a calculator. The calculator was not a general-purpose
computer—it merely provided basic arithmetic functions.
However, the ability to write programs to perform math rather than having to design
digital circuits to perform operations like trigonometric functions was critical to the successful
design of the calculator. Automobile designers started making use of the microprocessor soon
after single-chip CPUs became available. The most important and sophisticated use of
microprocessors in automobiles was to control the engine: determining when spark plugs fire,
controlling the fuel/air mixture, and so on. There was a trend toward electronics in automobiles
in general—electronic devices could be used to replace the mechanical distributor. But the big
push toward microprocessor-based engine control came from two nearly simultaneous
developments:
The oil shock of the 1970s caused consumers to place much higher value on fuel
economy, and fears of pollution resulted in laws restricting automobile engine emissions. The
combination of low fuel consumption and low emissions is very difficult to achieve; to meet
these goals without compromising engine performance, automobile manufacturers turned to
sophisticated control algorithms that could be implemented only with microprocessors.
There are many household uses of microprocessors. The typical microwave oven has at
least one microprocessor to control oven operation. Many houses have advanced thermostat
systems, which change the temperature level at various times during the day. The modern camera
is a prime example of the powerful features that can be added under microprocessor control.
A programmable CPU was used rather than a hardwired unit for two reasons: First, it
made the system easier to design and debug; and second, it allowed the possibility of upgrades
and using the CPU for other purposes. A high-end automobile may have 100 microprocessors,
but even inexpensive cars today use 40 microprocessors. Some of these microprocessors do very
simple things such as detect whether seat belts are in use. Others control critical functions such
as the ignition and braking systems. Application Example describes some of the microprocessors
used in the BMW 850i.
Application Example
BMW 850i brake and stability control system
The BMW 850i was introduced with a sophisticated system for controlling the wheels of
the car. An antilock brake system (ABS) reduces skidding by pumping the brakes. An automatic
stability control (ASC_T) system intervenes with the engine during maneuvering to improve the
car's stability. These systems actively control critical systems of the car; as control systems, they
require inputs from and output to the automobile.
Let's first look at the ABS. The purpose of an ABS is to temporarily release the brake on
a wheel when it rotates too slowly—when a wheel stops turning, the car starts skidding and
becomes hard to control. It sits between the hydraulic pump, which provides power to the brakes,
and the brakes themselves as seen in the following diagram. This hookup allows the ABS system
to modulate the brakes in order to keep the wheels from locking. The ABS system uses sensors
on each wheel to measure the speed of the wheel.
The wheel speeds are used by the ABS system to determine how to vary the hydraulic
fluid pressure to prevent the wheels from skidding. The ASC_T system's job is to control the
engine power and the brake to improve the car's stability during maneuvers. The ASC_T
controls four different systems: throttle, ignition timing, differential brake, and (on automatic
transmission cars) gear shifting. The ASC_T can be turned off by the driver, which can be
important when operating with tire snow chains. The ABS and ASC_T must clearly
communicate because the ASC_T interacts with the brake system. Since the ABS was
introduced several years earlier than the ASC_T, it was important to be able to interface the
ASC_T to the existing ABS module, as well as to other existing electronic modules. The engine and
control management units include the electronically controlled throttle, digital engine
management, and electronic transmission control. The ASC_T control unit has two
microprocessors on two printed circuit boards, one of which concentrates on logic-relevant
components and the other on performance-specific components.
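The wheel-speed logic described above can be sketched in C. This is a deliberately simplified illustration of the idea, not BMW's algorithm: the lock threshold and the function names are assumptions.

```c
#define NUM_WHEELS 4
#define LOCK_THRESHOLD_NUM 7   /* wheel counts as "locking" below 7/10 */
#define LOCK_THRESHOLD_DEN 10  /* of the average wheel speed            */

/* Returns 1 if the brake on wheel `i` should be temporarily released,
 * i.e. its measured speed has fallen well below the average of all wheels. */
int abs_should_release(const unsigned speed[NUM_WHEELS], int i) {
    unsigned sum = 0;
    for (int w = 0; w < NUM_WHEELS; w++)
        sum += speed[w];
    unsigned avg = sum / NUM_WHEELS;
    /* Compare speed[i] against (7/10) * avg using integer arithmetic only. */
    return speed[i] * LOCK_THRESHOLD_DEN < avg * LOCK_THRESHOLD_NUM;
}
```

A real controller would run this decision on every wheel in a tight periodic loop, using the per-wheel sensors mentioned above as inputs and the hydraulic modulator as output.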
THE EMBEDDED SYSTEM DESIGN PROCESS
This section provides an overview of the embedded system design process aimed at two
objectives. First, it will give us an introduction to the various steps in embedded system design
before we delve into them in more detail. Second, it will allow us to consider the design
methodology itself. A design methodology is important for three reasons. First, it allows us to
keep a scorecard on a design to ensure that we have done everything we need to do, such as
optimizing performance or performing functional tests. Second, it allows us to develop
computer-aided design tools.
Developing a single program that takes in a concept for an embedded system and emits a
completed design would be a daunting task, but by first breaking the process into manageable
steps, we can work on automating (or at least semi automating) the steps one at a time. Third, a
design methodology makes it much easier for members of a design team to communicate. By
defining the overall process, team members can more easily understand what they are supposed
to do, what they should receive from other team members at certain times, and what they are to
hand off when they complete their assigned steps. Since most embedded systems are designed by
teams, coordination is perhaps the most important role of a well-defined design methodology.
Figure summarizes the major steps in the embedded system design process.
In this top–down view, we start with the system requirements. In the next step,
specification, we create a more detailed description of what we want. But the specification states
only how the system behaves, not how it is built. The details of the system's internals begin to
take shape when we develop the architecture, which gives the system structure in terms of large
components. Once we know the components we need, we can design those components,
including both software modules and any specialized hardware we need. Based on those
components, we can finally build a complete system.
In this section we will consider design from the top–down—we will begin with the most
abstract description of the system and conclude with concrete details. The alternative is a
bottom–up view in which we start with components to build a system. Bottom–up design steps
are shown in the figure as dashed-line arrows.
We need bottom–up design because we do not have perfect insight into how later stages
of the design process will turn out. Decisions at one stage of design are based upon estimates of
what will happen later: How fast can we make a particular function run? How much memory will
we need? How much system bus capacity do we need? If our estimates are inadequate, we may
have to backtrack and amend our original decisions to take the new facts into account. In general,
the less experience we have with the design of similar systems, the more we will have to rely on
bottom-up design information to help us refine the system. But the steps in the design process are
only one axis along which we can view embedded system design. We also need to consider the
major goals of the design:
■ manufacturing cost;
■ performance (both overall speed and deadlines); and
■ power consumption.
We must also consider the tasks we need to perform at every step in the design process. At each
step in the design, we add detail:
■ We must analyze the design at each step to determine how we can meet the
specifications.
■ We must then refine the design to add detail.
■ And we must verify the design to ensure that it still meets all system goals,
such as cost, speed, and so on.
Requirements
Clearly, before we design a system, we must know what we are designing. The initial
stages of the design process capture this information for use in creating the architecture and
components. We generally proceed in two phases: first, we gather an informal description from
the customers, known as requirements; second, we refine the requirements into a specification that
contains enough information to begin designing the system architecture.
Separating out requirements analysis and specification is often necessary because of the
large gap between what the customers can describe about the system they want and what the
architects need to design the system. Consumers of embedded systems are usually not
themselves embedded system designers or even product designers.
Their understanding of the system is based on how they envision users' interactions with
the system. They may have unrealistic expectations as to what can be done within their budgets;
and they may also express their desires in a language very different from system architects'
jargon. Capturing a consistent set of requirements from the customer and then massaging those
requirements into a more formal specification is a structured way to manage the process of
translating from the consumer's language to the designer's.
Requirements may be functional or nonfunctional. We must of course capture the basic
functions of the embedded system, but functional description is often not sufficient. Typical
nonfunctional requirements include:
■ Performance: The speed of the system is often a major consideration both for the
usability of the system and for its ultimate cost. As we have noted, performance may be a
combination of soft performance metrics such as approximate time to perform a
user-level function and hard deadlines by which a particular operation must be completed.
■ Cost: The target cost or purchase price for the system is almost always a consideration.
Cost typically has two major components: manufacturing cost includes the cost of
components and assembly; nonrecurring engineering (NRE) costs include the personnel
and other costs of designing the system.
■ Physical size and weight: The physical aspects of the final system can vary greatly
depending upon the application. An industrial control system for an assembly line may be
designed to fit into a standard-size rack with no strict limitations on weight. A handheld
device typically has tight requirements on both size and weight that can ripple through
the entire system design.
■ Power consumption: Power, of course, is important in battery-powered systems and is
often important in other applications as well. Power can be specified in the requirements
stage in terms of battery life—the customer is unlikely to be able to describe the
allowable wattage.
■ Name: This is simple but helpful. Giving a name to the project not only simplifies
talking about it to other people but can also crystallize the purpose of the machine.
■ Purpose: This should be a brief one- or two-line description of what the system is
supposed to do. If you can't describe the essence of your system in one or two lines,
chances are that you don't understand it well enough.
■ Inputs and outputs: These two entries are more complex than they seem. The inputs
and outputs to the system encompass a wealth of detail: — Types of data: Analog
electronic signals? Digital data? Mechanical inputs? — Data characteristics: Periodically
arriving data, such as digital audio samples? Occasional user inputs? How many bits per
data element? — Types of I/O devices: Buttons? Analog/digital converters? Video
displays?
■ Functions: This is a more detailed description of what the system does. A good way to
approach this is to work from the inputs to the outputs: When the system receives an
input, what does it do? How do user interface inputs affect these functions? How do
different functions interact?
■ Performance: Many embedded computing systems spend at least some time controlling
physical devices or processing data coming from the physical world. In most of these
cases, the computations must be performed within a certain time frame. It is essential that
the performance requirements be identified early since they must be carefully measured
during implementation to ensure that the system works properly.
■ Manufacturing cost: This includes primarily the cost of the hardware components.
Even if you don't know exactly how much you can afford to spend on system
components, you should have some idea of the eventual cost range. Cost has a substantial
influence on architecture: A machine that is meant to sell at $10 most likely has a very
different internal structure than a $100 system.
■ Power: Similarly, you may have only a rough idea of how much power the system can
consume, but a little information can go a long way. Typically, the most important
decision is whether the machine will be battery powered or plugged into the wall.
Battery-powered machines must be much more careful about how they spend energy.
■ Physical size and weight: You should give some indication of the physical size of the
system to help guide certain architectural decisions. A desktop machine has much more
flexibility in the components used than, for example, a lapel mounted voice recorder.
A more thorough requirements analysis for a large system might use a form similar to Figure as a
summary of the longer requirements document. After an introductory section containing this
form, a longer requirements document could include details on each of the items mentioned in
the introduction. For example, each individual feature described in the introduction in a single
sentence may be described in detail in a section of the specification.
After writing the requirements, you should check them for internal consistency: Did you forget
to assign a function to an input or output? Did you consider all the modes in which you want the
system to operate? Did you place an unrealistic number of features into a battery-powered,
low-cost machine? To practice the capture of system requirements, Example creates the requirements
for a GPS moving map system.
Example
Requirements analysis of a GPS moving map
The moving map is a handheld device that displays for the user a map of the terrain around the
user's current position; the map display changes as the user and the map device change position.
The moving map obtains its position from the GPS, a satellite-based navigation system. The
moving map display might look something like the following figure.
What requirements might we have for our GPS moving map? Here is an initial list:
■ Functionality: This system is designed for highway driving and similar uses, not
nautical or aviation uses that require more specialized databases and functions. The
system should show major roads and other landmarks available in standard topographic
databases.
■ User interface: The screen should have at least 400 × 600 pixel resolution. The device
should be controlled by no more than three buttons. A menu system should pop up on the
screen when buttons are pressed to allow the user to make selections to control the
system.
■ Performance: The map should scroll smoothly. Upon power-up, a display should take
no more than one second to appear, and the system should be able to verify its position
and display the current map within 15 s.
■ Cost: The selling cost (street price) of the unit should be no more than $100.
■ Physical size and weight: The device should fit comfortably in the palm of the hand.
■ Power consumption: The device should run for at least eight hours on four AA
batteries.
Note that many of these requirements are not specified in engineering units—for
example, physical size is measured relative to a hand, not in centimeters. Although these
requirements must ultimately be translated into something that can be used by the designers,
keeping a record of what the customer wants can help to resolve questions about the
specification that may crop up later during design. Based on this discussion, let's write a
requirements chart for our moving map system:
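The chart itself appears as a figure; as a cross-check, the listed requirements can also be captured as a C structure so the numbers are easy to verify. Only values stated in the list above are used, and the field names are our own.

```c
/* Requirements chart for the GPS moving map, as a checkable data structure.
 * Values come from the requirements list above; field names are illustrative. */
struct requirements {
    const char *name;
    const char *purpose;
    int display_width_px, display_height_px; /* at least 400 x 600            */
    int max_buttons;                         /* no more than three buttons    */
    int startup_display_s;                   /* display appears within 1 s    */
    int position_fix_s;                      /* position verified within 15 s */
    int max_street_price_usd;                /* street price at most $100     */
    int battery_life_h;                      /* >= 8 h on four AA batteries   */
};

static const struct requirements gps_moving_map = {
    "GPS moving map",
    "Moving map for highway driving and similar uses",
    400, 600,
    3,
    1,
    15,
    100,
    8,
};
```

Writing requirements down in a machine-readable form like this makes the internal-consistency check described earlier easy to automate.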
Specification
The specification is more precise—it serves as the contract between the customer and the
architects. As such, the specification must be carefully written so that it accurately reflects the
customer's requirements and does so in a way that can be clearly followed during design.
Specification is probably the least familiar phase of this methodology for neophyte designers, but
it is essential to creating working systems with a minimum of designer effort.
Designers who lack a clear idea of what they want to build when they begin typically
make faulty assumptions early in the process that aren't obvious until they have a working
system. At that point, the only solution is to take the machine apart, throw away some of it, and
start again. The specification should be understandable enough so that someone can verify that it
meets system requirements and overall expectations of the customer. It should also be
unambiguous enough that designers know what they need to build.
Designers can run into several different types of problems caused by unclear
specifications. If the behavior of some feature in a particular situation is unclear from the
specification, the designer may implement the wrong functionality. If global characteristics of
the specification are wrong or incomplete, the overall system architecture derived from the
specification may be inadequate to meet the needs of implementation.
A specification of the GPS system would include several components:
■ Data received from the GPS satellite constellation.
■ Map data.
■ User interface.
■ Operations that must be performed to satisfy customer requests.
■ Background actions required to keep the system running, such as operating the GPS
receiver.
UML, a language for describing specifications, will be introduced later and we will use it to
write a specification. We will practice writing specifications in each chapter as we work through
example system designs. We will also study specification techniques in more detail later.
Architecture Design
The specification does not say how the system does things, only what the system does.
Describing how the system implements those functions is the purpose of the architecture. The
architecture is a plan for the overall structure of the system that will be used later to design the
components that make up the architecture. The creation of the architecture is the first phase of
what many designers think of as design. To understand what an architectural description is, let's
look at a sample architecture for the moving map of the example above. The figure shows a sample
system architecture in the form of a block diagram that shows major operations and data flows among
them.
This block diagram is still quite abstract—we have not yet specified which operations
will be performed by software running on a CPU, what will be done by special-purpose
hardware, and so on. The diagram does, however, go a long way toward describing how to
implement the functions described in the specification. We clearly see, for example, that we need
to search the topographic database and to render (i.e., draw) the results for the display. We have
chosen to separate those functions so that we can potentially do them in parallel—performing
rendering separately from searching the database may help us update the screen more fluidly.
Only after we have designed an initial architecture that is not biased toward too many
implementation details should we refine that system block diagram into two block diagrams: one
for hardware and another for software. These two more refined block diagrams are shown in
Figure 1.4. The hardware block diagram clearly shows that we have one central CPU surrounded
by memory and I/O devices. In particular, we have chosen to use two memories: a frame buffer
for the pixels to be displayed and a separate program/data memory for general use by the CPU.
The software block diagram fairly closely follows the system block diagram, but we have added
a timer to control when we read the buttons on the user interface and render data onto the screen.
To have a truly complete architectural description, we require more detail, such as where units in
the software block diagram will be executed in the hardware block diagram and when operations
will be performed in time. Architectural descriptions must be designed to satisfy both functional
and nonfunctional requirements. Not only must all the required functions be present, but we must
meet cost, speed, power, and other nonfunctional constraints.
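The timer-driven software structure described above can be sketched in C. The unit names are placeholders for the blocks in the software block diagram; here each stub simply records that it ran, so the control flow is visible and checkable.

```c
#include <string.h>

static char log_buf[64]; /* records the order in which the units run */

static void record(const char *step) { strcat(log_buf, step); }

/* Stubs standing in for the software block diagram's units. */
static void read_buttons(void)    { record("B"); } /* user interface input  */
static void update_position(void) { record("P"); } /* GPS receiver unit     */
static void search_database(void) { record("S"); } /* topographic DB search */
static void render_frame(void)    { record("R"); } /* draw to frame buffer  */

/* One iteration of the timer-driven loop: the units run, in order, only when
 * the periodic timer has fired. */
void moving_map_step(int timer_fired) {
    if (!timer_fired)
        return;
    read_buttons();
    update_position();
    search_database();
    render_frame();
}
```

The separation of searching and rendering into distinct units is what allows them, in a refined design, to run in parallel as the text suggests.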
Starting out with a system architecture and refining that to hardware and software
architectures is one good way to ensure that we meet all specifications: We can concentrate on
the functional elements in the system block diagram, and then consider the nonfunctional
constraints when creating the hardware and software architectures. How do we know that our
hardware and software architectures in fact meet constraints on speed, cost, and so on? We must
somehow be able to estimate the properties of the components of the block diagrams, such as the
search and rendering functions in the moving map system.
Accurate estimation derives in part from experience, both general design experience and
particular experience with similar systems. However, we can sometimes create simplified models
to help us make more accurate estimates. Sound estimates of all nonfunctional constraints
during the architecture phase are crucial, since decisions based on bad data will show up during
the final phases of design, indicating that we did not, in fact, meet the specification.
Designing Hardware and Software Components
The architectural description tells us what components we need. The component design
effort builds those components in conformance to the architecture and specification. The
components will in general include both hardware—FPGAs, boards, and so on—and software
modules. Some of the components will be ready-made. The CPU, for example, will be a standard
component in almost all cases, as will memory chips and many other components. In the moving
map, the GPS receiver is a good example of a specialized component that will nonetheless be a
predesigned, standard component.
We can also make use of standard software modules. One good example is the
topographic database. Standard topographic databases exist, and you probably want to use
standard routines to access the database—not only is the data in a predefined format, but it is
highly compressed to save storage. Using standard software for these access functions not only
saves us design time, but it may give us a faster implementation for specialized functions such as
the data decompression phase. You will have to design some components yourself. Even if you
are using only standard integrated circuits, you may have to design the printed circuit board that
connects them. You will probably have to do a lot of custom programming as well.
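As an illustration of such an access routine, here is a hypothetical decompression function for a map tile. The storage format (simple run-length encoding) and all names are assumptions for illustration; real topographic databases use far more elaborate compression.

```c
#include <stddef.h>

/* Decode run-length pairs (count, value) from `rle` into `out`.
 * Returns the number of bytes written; never writes past `out_cap`.
 * The rest of the system calls this and never sees the compressed form. */
size_t tile_decompress(const unsigned char *rle, size_t rle_len,
                       unsigned char *out, size_t out_cap) {
    size_t n = 0;
    for (size_t i = 0; i + 1 < rle_len; i += 2) {
        unsigned char count = rle[i], value = rle[i + 1];
        for (unsigned char c = 0; c < count && n < out_cap; c++)
            out[n++] = value;
    }
    return n;
}
```

Hiding the compressed format behind one access function is exactly what lets a standard, tuned implementation be dropped in without touching the rest of the design.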
When creating these embedded software modules, you must of course make use of your
expertise to ensure that the system runs properly in real time and that it does not take up more
memory space than is allowed. The power consumption of the moving map software example is
particularly important. You may need to be very careful about how you read and write memory
to minimize power—for example, since memory accesses are a major source of power
consumption, memory transactions must be carefully planned to avoid reading the same data
several times.
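The point about avoiding repeated reads can be made concrete in C. The read counter below is for demonstration only; on real hardware the savings show up as fewer memory transactions and therefore lower power.

```c
static unsigned long mem_reads; /* counts simulated memory transactions */

static int read_mem(const int *p) { mem_reads++; return *p; }

/* Wasteful: reads each array element from memory twice. */
int sum_sq_wasteful(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += read_mem(&a[i]) * read_mem(&a[i]);
    return s;
}

/* Careful: reads each element once and reuses the cached local value. */
int sum_sq_careful(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) {
        int v = read_mem(&a[i]); /* one memory transaction per element */
        s += v * v;
    }
    return s;
}
```

Both functions compute the same sum of squares; the careful version simply halves the number of memory reads by keeping the value in a local variable.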
System Integration
Only after the components are built do we have the satisfaction of putting them together
and seeing a working system. Of course, this phase usually consists of a lot more than just
plugging everything together and standing back. Bugs are typically found during system
integration, and good planning can help us find the bugs quickly. By building up the system in
phases and running properly chosen tests, we can often find bugs more easily. If we debug only a
few modules at a time, we are more likely to uncover the simple bugs and be able to easily
recognize them.
Only by fixing the simple bugs early will we be able to uncover the more complex or
obscure bugs that can be identified only by giving the system a hard workout. We need to ensure
during the architectural and component design phases that we make it as easy as possible to
assemble the system in phases and test functions relatively independently.
System integration is difficult because it usually uncovers problems. It is often hard to observe
the system in sufficient detail to determine exactly what is wrong— the debugging facilities for
embedded systems are usually much more limited than what you would find on desktop systems.
As a result, determining why things do not work correctly and how they can be fixed is a
challenge in itself. Careful attention to inserting appropriate debugging facilities during design
can help ease system integration problems, but the nature of embedded computing means that
this phase will always be a challenge.
UML was designed to be useful at many levels of abstraction in the design process. UML
is useful because it encourages design by successive refinement and progressively adding detail
to the design, rather than rethinking the design at each new level of abstraction. UML is an
object-oriented modeling language. We will see precisely what we mean by an object in just a
moment, but object-oriented design emphasizes two concepts of importance:
■ It encourages the design to be described as a number of interacting objects, rather than a few
large monolithic blocks of code.
■ At least some of those objects will correspond to real pieces of software or hardware in the
system. We can also use UML to model the outside world that interacts with our system, in
which case the objects may correspond to people or other machines. It is sometimes important to
implement something we think of at a high level as a single object using several distinct pieces
of code or to otherwise break up the object correspondence in the implementation. However,
thinking of the design in terms of actual objects helps us understand the natural structure of the
system. Object-oriented (often abbreviated OO) specification can be seen in two complementary
ways:
■ Object-oriented specification allows a system to be described in a way that closely models
real-world objects and their interactions.
■ Object-oriented specification provides a basic set of primitives that can be used to describe
systems with particular attributes, irrespective of the relationships of those systems' components
to real-world objects. Both views are useful. At a minimum, object-oriented specification is a set
of linguistic mechanisms. In many cases, it is useful to describe a system in terms of real-world
analogs. However, performance, cost, and so on may dictate that we change the specification to
be different in some ways from the real-world elements we are trying to model and implement.
In this case, the object-oriented specification mechanisms are still useful. What is the
relationship between an object-oriented specification and an object oriented programming
language (such as C++)? A specification language may not be executable. But both object-
oriented specification and programming languages provide similar basic methods for structuring
large systems.
The Unified Modeling Language (UML) is a large language, and covering all of it is beyond the
scope of this book. In this section, we introduce only a few basic concepts. In later chapters, as
we need a few more UML concepts, we introduce them alongside the basic modeling elements
introduced here. Because UML is so rich, there are many graphical elements
in a UML diagram. It is important to be careful to use the correct drawing to describe
something—for instance, UML distinguishes between arrows with open and filled-in
arrowheads, and solid and broken lines. As you become more familiar with the language, uses of
the graphical primitives will become more natural to you. We also won‘t take a strict object-
oriented approach. We may not always use objects for certain elements of a design—in some
cases, such as when taking particular aspects of the implementation into account, it may make
sense to use another design style. However, object-oriented design is widely applicable, and no
designer can consider himself or herself design literate without understanding it.
Structural Description
By structural description, we mean the basic components of the system; we will learn
how to describe how these components act in the next section. The principal component of an
object-oriented design is, naturally enough, the object. An object includes a set of attributes that
define its internal state. When implemented in a programming language, these attributes usually
become variables or constants held in a data structure.
In some cases, we will add the type of the attribute after the attribute name for clarity, but
we do not always have to specify a type for an attribute. An object describing a display (such as a
CRT screen) is shown in UML notation in Figure. The text in the folded-corner page icon is a
note; it does not correspond to an object in the system and only serves as a comment. The
attribute is, in this case, an array of pixels that holds the contents of the display.
The object is identified in two ways: It has a unique name, and it is a member of a class.
The name is underlined to show that this is a description of an object and not of a class. A class
is a form of type definition—all objects derived from the same class have the same
characteristics, although their attributes may have different values. A class defines the attributes
that an object may have. It also defines the operations that determine how the object interacts
with the rest of the world. In a programming language, the operations would become pieces of
code used to manipulate the object.
The UML description of the Display class is shown in Figure. The class has the name that
we saw used in the d1 object since d1 is an instance of class Display. The Display class defines
the pixels attribute seen in the object; remember that when we instantiate the class as an object,
that object will have its own memory so that different objects of the same class have their own values
for the attributes. Other classes can examine and modify class attributes; if we have to do
something more complex than use the attribute directly, we define a behavior to perform that
function.
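As a concrete sketch, the Display object and class described here might look like this in C++. The 640×480 resolution, the pixel type and the operation names are illustrative assumptions, since the figures give only the pixels attribute:

```cpp
#include <array>
#include <cstdint>

// Illustrative sketch of the Display class from the UML figure.
// Resolution, pixel type and operation names are assumptions.
class Display {
public:
    // Operations define the interface to the object.
    void set_pixel(int x, int y, std::uint8_t value) {
        pixels[y * WIDTH + x] = value;
    }
    std::uint8_t get_pixel(int x, int y) const {
        return pixels[y * WIDTH + x];
    }

private:
    static const int WIDTH = 640;
    static const int HEIGHT = 480;
    // The attribute holds the internal state of the object.
    std::array<std::uint8_t, WIDTH * HEIGHT> pixels{};
};

// d1 is an instance (object) of class Display, as in the object diagram.
Display d1;
```

Note that code outside the class can read or modify the display only through set_pixel and get_pixel; the pixels attribute itself is private, which is exactly the interface/implementation separation discussed next.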
A class defines both the interface for a particular type of object and that object‘s
implementation. When we use an object, we do not directly manipulate its attributes—we can
only read or modify the object‘s state through the operations that define the interface to the
object. (The implementation includes both the attributes and whatever code is used to implement
the operations.) As long as we do not change the behavior of the object seen at the interface, we
can change the implementation as much as we want. This lets us improve the system by, for
example, speeding up an operation or reducing the amount of memory required without requiring
changes to anything else that uses the object.
Clearly, the choice of an interface is a very important decision in object-oriented design. The
proper interface must provide ways to access the object‘s state (since we cannot directly see the
attributes) as well as ways to update the state. We need to make the object‘s interface general
enough so that we can make full use of its capabilities. However, excessive generality often
makes the object large and slow. Big, complex interfaces also make the class definition difficult
for designers to understand and use properly. There are several types of relationships that can
exist between objects and classes:
■ Association occurs between objects that communicate with each other but have no
ownership relationship between them.
■ Aggregation describes a complex object made of smaller objects.
■ Composition is a type of aggregation in which the owner does not allow access to the
component objects.
■ Generalization allows us to define one class in terms of another.
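These four relationships have natural C++ counterparts. The classes below are invented purely for illustration:

```cpp
#include <vector>

class Logger;

class Motor { };
class Sensor { };

// Generalization: StepperMotor is defined in terms of Motor.
class StepperMotor : public Motor { };

class Robot {
    // Composition: the Robot owns its drive Motor outright and
    // gives no outside access to the component object.
    Motor drive;

public:
    // Aggregation: the Robot is made of Sensors that exist
    // independently and can be examined from outside.
    std::vector<Sensor*> sensors;

    // Association: the Robot communicates with a Logger that it
    // does not own in any way.
    void report(Logger& log) { (void)log; /* send status to log */ }
};
```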
The elements of a UML class or object do not necessarily directly correspond to statements in a
programming language—if the UML is intended to describe something more abstract than a
program, there may be a significant gap between the contents of the UML and a program
implementing it. The attributes of an object do not necessarily reflect variables in the object. An
attribute is some value that reflects the current state of the object. In the program
implementation, that value could be computed from some other internal variables. The behaviors
of the object would, in a higher-level specification, reflect the basic things that can be done with
an object. Implementing all these features may require breaking up a behavior into several
smaller behaviors—for example, initializing the object before you start to change its internal
state.
Derived Classes
Unified Modeling Language, like most object-oriented languages, allows us to define one class
in terms of another. An example is shown in Figure, where we derive two particular types of
displays. The first, BW_display, describes a black-and-white display. This does not require us to
add new attributes or operations, but we can specialize both to work on one-bit pixels. The
second, Color_map_display, uses a graphic device known as a color map to allow the user to
select from a large number of
available colors even with a small number of bits per pixel. This class defines a color_map
attribute that determines how pixel values are mapped onto display colors. A derived class
inherits all the attributes and operations from its base class. In this case, Display is the base class
for the two derived classes. A derived class is defined to include all the attributes of its base
class.
This relation is transitive—if Display were derived from another class, both BW_display and
Color_map_display would inherit all the attributes and operations of Display’s base class as
well. Inheritance has two purposes. It of course allows us to succinctly describe one class that
shares some characteristics with another class. Even more important, it captures those
relationships between classes and documents them. If we ever need to change any of the classes,
knowledge of the class structure helps us determine the reach of changes—for example, should
the change affect only Color_map_display objects or should it change all Display objects?
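A hedged C++ rendering of the two derived displays follows; the one-bit specialization and the color-map size are assumptions. BW_display inherits the pixels attribute without adding anything, while Color_map_display adds the color_map attribute:

```cpp
#include <array>
#include <cstdint>

// Base class: all displays share the pixels attribute.
class Display {
protected:
    std::array<std::uint8_t, 640 * 480> pixels{};
};

// BW_display specializes the base class for one-bit pixels;
// it inherits pixels and adds no new attributes or operations.
class BW_display : public Display {
};

// Color_map_display adds a color_map attribute that translates
// small pixel values into a larger space of display colors.
// An 8-bit pixel index and 32-bit RGB entries are assumed.
class Color_map_display : public Display {
    std::array<std::uint32_t, 256> color_map{};

public:
    void set_map(std::uint8_t index, std::uint32_t rgb) {
        color_map[index] = rgb;
    }
    std::uint32_t lookup(std::uint8_t index) const {
        return color_map[index];
    }
};
```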
Typically, we find that we use a certain combination of elements in an object or class many
times. We can give these patterns names, which are called stereotypes in UML. A stereotype
name is written in the form <<signal>>. Figure shows a stereotype for a signal, which is a
communication mechanism.
Behavioral Description
We have to specify the behavior of the system as well as its structure. One way to specify the
behavior of an operation is a state machine. Figure shows UML states; the transition between
two states is shown by a skeleton arrow. These state machines will not rely on the operation of a
clock, as in hardware; rather, changes from one state to another are triggered by the occurrence
of events.
An event is some type of action. The event may originate outside the system, such as a user
pressing a button. It may also originate inside, such as when one routine finishes its computation
and passes the result on to another routine. We will concentrate on the following three types of
events defined by UML, as illustrated in Figure.
Let‘s consider a simple state machine specification to understand the semantics of UML
state machines. A state machine for an operation of the display is shown in Figure. The start and
stop states are special states that help us to organize the flow of the state machine. The states in
the state machine represent different conceptual operations.
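An event-triggered state machine of this kind maps naturally onto a current-state variable plus a transition function called for each event. The state and event names below are invented for illustration; the point is the structure, with clockless transitions fired only by events:

```cpp
// Sketch of an event-triggered state machine. Transitions occur
// only when an event arrives, never on a clock edge.
// State and event names are illustrative assumptions.
enum class State { Start, Write_region, Copy_old_bitmap, Stop };
enum class Event { write_cmd, region_written, copy_done };

class OperationFSM {
    State state = State::Start;

public:
    State current() const { return state; }

    // The transition function: the next state depends on the
    // current state and the event received.
    void on_event(Event e) {
        switch (state) {
        case State::Start:
            if (e == Event::write_cmd) state = State::Write_region;
            break;
        case State::Write_region:
            // Conditional transition based on the event received.
            if (e == Event::region_written) state = State::Copy_old_bitmap;
            break;
        case State::Copy_old_bitmap:
            if (e == Event::copy_done) state = State::Stop;
            break;
        case State::Stop:
            break;  // terminal state
        }
    }
};
```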
In some cases, we take conditional transitions out of states based on inputs or the results
of some computation done in the state. In other cases, we make an unconditional transition to the
next state. Both the unconditional and conditional transitions make use of the call event. Splitting
a complex operation into several states helps document the required steps, much as subroutines
can be used to structure code. It is sometimes useful to show the sequence of operations over
time, particularly when several objects are involved.
In this case, we can create a sequence diagram, like the one for a mouse click scenario
shown in Figure. A sequence diagram is somewhat similar to a hardware timing diagram,
although the time flows vertically in a sequence diagram, whereas time typically flows
horizontally in a timing diagram. The sequence diagram is designed to show a particular scenario
or choice of events—it is not convenient for showing a number of mutually exclusive
possibilities. In this case, the sequence shows what happens when a mouse click is on the menu
region. Processing includes three objects shown at the top of the diagram. Extending below each
object is its lifeline, a dashed line that shows how long the object is alive. In this case, all the
objects remain alive for the entire sequence, but in other cases objects may be created or
destroyed during processing. The boxes
along the lifelines show the focus of control in the sequence, that is, when the object is actively
processing. In this case, the mouse object is active only long enough to create the mouse_click
event. The display object remains in play longer; it in turn uses call events to invoke the menu
object twice: once to determine which menu item was selected and again to actually execute the
menu call. The find_region( ) call is internal to the display object, so it does not appear as an
event in the diagram.
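The mouse-click scenario in the sequence diagram can be traced in code as a chain of calls between the objects. Every method name and the toy selection rules below are assumptions based on the scenario, not taken from any actual library:

```cpp
#include <string>

// The menu object: queried twice by the display, as in the diagram.
class Menu {
public:
    std::string which_menu_item(int x, int y) {
        (void)x;
        return (y < 10) ? "open" : "close";  // toy selection rule
    }
    bool call_menu_item(const std::string& item) {
        last_called = item;                  // "execute" the item
        return true;
    }
    std::string last_called;
};

class Display {
    Menu& menu;

public:
    explicit Display(Menu& m) : menu(m) {}

    // find_region() is internal to the display object, so it does
    // not appear as an event in the sequence diagram.
    bool find_region(int x, int y) { (void)y; return x < 100; }

    // The mouse_click event from the mouse object lands here.
    void mouse_click(int x, int y) {
        if (find_region(x, y)) {
            // Two call events to the menu object, as in the diagram:
            std::string item = menu.which_menu_item(x, y);
            menu.call_menu_item(item);
        }
    }
};
```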
Automatic Chocolate Vending Machine (ACVM)
_ Keypad on the top of the machine.
_ LCD display unit on the top of the machine. It displays menus, text entered into the ACVM,
pictograms, welcome, thank-you and other messages, graphic interactions with the machine, and
the time and date.
_ Delivery slot so that the child can collect the chocolate, and coins if refunded.
_ Internet connection port so that the owner can know the status of ACVM sales remotely.
Smart Card
Smart card – a plastic card of ISO standard dimensions, 85.60 mm × 53.98 mm × 0.80 mm.
_ Embedded system on a card.
_ SoC (System-On-Chip).
_ ISO recommended standards are ISO7816 (1 to 4) for host-machine contact based
cards and ISO14443 (Part A or B) for the contact-less cards.
_ Silicon chip is just a few mm in size and is concealed in-between the layers. Its very
small size protects the card from bending
Embedded hardware components
_ Microcontroller or ASIP (Application Specific Instruction Set Processor)
_ RAM for temporary variables and stack
_ ROM for application codes and RTOS codes for scheduling the tasks
_ EEPROM for storing user data, user address, user identification codes, card number and expiry
date
_ Timer and Interrupt controller
_ A ~16 MHz carrier frequency generating circuit and an Amplitude Shift Keying (ASK) modulation circuit
_ Interfacing circuit for the I/Os
_ Charge pump
ROM
_ Fabrication key, personalization key and utilization lock
_ RTOS and application codes, which use only logical addresses
Embedded Software
_ Boot-up, Initialisation and OS programs
_ Smart card secure file system
_ Connection establishment and termination
_ Communication with host
_ Cryptography
_ Host authentication
_ Card authentication
_ Additional parameters or recent new data sent by the host (for example, present balance left).
Smart Card OS Special features
_ Protected environment.
_ Every method, class and run-time library should be scalable.
_ The generated code size should be optimal.
_ Memory usage should not exceed 64 kB.
_ Limited use of certain data types: multidimensional arrays, 64-bit long integers and floating
point
Digital Camera
A typical Camera
_ 4 M pixel/6 M pixel still images, clear visual display (ClearVid) CMOS sensor, 7 cm wide
LCD photo display screen, enhanced imaging processor, double anti-blur solution and high-speed
processing engine, 10X optical and 20X digital zooms
_ Records high-definition video clips. It therefore has a speaker and microphone(s) for high-quality
recorded sound.
_ Audio/video Out Port for connecting to a TV/DVD player.
Arrangements
_ Keys on the camera.
_ Shutter, lens and charge coupled device (CCD) array sensors
_ Good resolution photo quality LCD display unit
_ Displays text such as image title, shooting date and time, and serial number. It displays
messages and the GUI menu when the user interacts with the camera.
_ Self-timer lamp for flash.
Internal units
_ Internal memory flash to store OS and embedded software and limited number of image files
_ Flash memory stick of 2 GB or more for large storage.
_ Universal Serial Bus (USB), Bluetooth and serial COM port for connecting it to computer,
mobile and printer. LCD screen to display frame view.
_ Saved images are displayed using the navigation keys.
_ Light from the frame falls on the CCD array which, through an ADC, transmits the bits for
each pixel in each row of the frame, together with the dark-area pixels of each row that are used
for offset correction of the CCD-signalled light intensities.
_ The CCD bits of each pixel in each row and column are offset-corrected by the CCD signal
processor (CCDSP).
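The per-row offset correction can be sketched as follows: the dark-area (optically masked) pixels of a row estimate the sensor's black level, which is then subtracted from each active pixel of that row. The array sizes, the averaging scheme and the clamping are assumptions:

```cpp
#include <algorithm>
#include <vector>

// Sketch of CCDSP-style offset correction. For each row, the
// average of the dark-area pixels gives that row's offset, which
// is subtracted from every active pixel and clamped at black.
std::vector<int> correct_row(const std::vector<int>& active,
                             const std::vector<int>& dark) {
    long sum = 0;
    for (int d : dark) sum += d;
    int offset = dark.empty()
                     ? 0
                     : static_cast<int>(sum / static_cast<long>(dark.size()));

    std::vector<int> out;
    out.reserve(active.size());
    for (int p : active)
        out.push_back(std::max(0, p - offset));  // clamp at black level
    return out;
}
```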
Embedded systems possess certain specific characteristics that are unique to each embedded
system.
4. Distributed
5. Small size and weight
6. Power concerns
7. Single-functioned
8. Complex functionality
9. Tightly-constrained
10. Safety-critical
Each embedded system is designed to perform a certain set of functions, and it is developed
in such a manner as to do those intended functions only.
It cannot be used for any other purpose.
Ex – The embedded control unit of a microwave oven cannot be replaced with an AC's
embedded control unit, because the two units are specifically designed to perform different
specific tasks.
Example – E.S which are mission critical like flight control systems, Antilock Brake
Systems (ABS) etc are Real Time systems.
5. Small size and weight: –
Product aesthetics (size, weight, shape, style, etc.) are an important factor in choosing a
product.
It is more convenient to handle a compact device than a bulky one.
6. Power Concerns:-
Power management is another important factor that needs to be considered in
designing embedded systems.
E.S should be designed in such a way as to minimize the heat dissipation by the
system.
8. Complex functionality: -
We have to run sophisticated algorithms or multiple algorithms in some applications.
9. Tightly-constrained:-
Low cost, low power, small, fast, etc
10. Safety-critical:-
Must not endanger human life and the environment
1. Response :-
It is the measure of quickness of the system.
It tells how fast the system tracks changes in its input variables. Most embedded systems
demand a fast response, which should be almost real time.
Ex – Flight control application.
2. Throughput :-
It deals with the efficiency of a system.
It can be defined as the rate of production or operation of a defined process over a stated
period of time.
The rates can be expressed in terms of products, batches produced or any other meaningful
measurements.
Ex – In case of card reader throughput means how many transactions the reader can perform
in a minute or in an hour or in a day.
3. Reliability :-
It is a measure of how much we can rely upon the proper functioning of the system.
• Mean Time Between Failure (MTBF) and Mean Time To Repair (MTTR) are the terms
used in determining system reliability.
• MTBF gives the expected time between two successive failures; MTTR specifies how long the
system is allowed to be out of order following a failure.
• For an embedded system with critical application needs, MTTR should be of the order of minutes.
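MTBF and MTTR combine into the standard steady-state availability formula, Availability = MTBF / (MTBF + MTTR). The formula is not stated in the text above but is the conventional way the two figures are related:

```cpp
// Steady-state availability from MTBF and MTTR (same time unit
// for both, e.g. hours). This is the conventional reliability
// formula; the sample numbers below are invented for illustration.
double availability(double mtbf, double mttr) {
    return mtbf / (mtbf + mttr);
}
```

For example, an MTBF of 10,000 hours with an MTTR of 1 hour gives an availability of roughly 99.99%.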
4. Maintainability:-
• It deals with support and maintenance to the end user or client in case of technical issues
and product failure or on the basis of a routine system checkup.
• A more reliable system means a system with less corrective maintainability requirements
and vice versa.
5. Security:-
• Confidentiality, Integrity and availability are the three major measures of information
security.
• Confidentiality deals with protection of data and application from unauthorized
disclosure.
• Integrity deals with the protection of data and application from unauthorized
modification.
• Availability deals with ensuring that data and application services remain accessible to
authorized users whenever they are needed.
6. Safety :-
Safety deals with the possible damages that can happen to the operator, public and the
environment due to the breakdown of an Embedded System.
The breakdown of an embedded system may occur due to a hardware failure or a firmware
failure.
Safety analysis is a must in product engineering to evaluate the anticipated damages and
determine the best course of action to bring down the consequences of damage to an
acceptable level.
1. Testability & Debug-ability :-
• Testability deals with how easily one can test the design, application and by which means
it can be done.
• For an E.S testability is applicable to both the embedded hardware and firmware.
• Embedded hardware testing ensures that the peripherals and total hardware functions in
the desired manner, whereas firmware testing ensures that the firmware is functioning in the
expected way.
• Debugging is done at two levels:
• 1. Hardware level: It is used for finding the issues created by hardware problems.
• 2. Software level: It is employed for finding the errors created by the flaws in the software.
2. Evolvability :-
• For an embedded system evolvability refers to the ease with which the embedded product
can be modified to take advantage of new firmware or hardware technologies.
3. Portability:-
• 'Porting' represents the migration of embedded firmware written for one target processor
to a different target processor.
4. Time-to-prototype and market :-
• It is the time elapsed between the conceptualization of a product and the time at which the
product is ready for selling.
• The commercial embedded product market is highly competitive, and time to market the
product is a critical factor in the success of a commercial embedded product.
• There may be multiple players in embedded industry who develop products of the same
category (like mobile phone).
5. Per-unit cost and revenue :-
• Cost is a factor which is closely monitored by both the end user and the product manufacturer.
• Any failure to position the cost of a commercial product at a nominal rate may lead to the
failure of the product in the market.
• Proper market study and cost benefit analysis should be carried out before taking a
decision on the per-unit cost of the embedded product.
• The ultimate aim of the product is to generate marginal profit so the budget and total cost
should be properly balanced to provide a marginal profit.
2. Behavioral Description:
We have to specify the behavior of the system as well as its structure. One way to
specify the behavior of an operation is a state machine.
These state machines will not rely on the operation of a clock, as in hardware;
rather, changes from one state to another are triggered by the occurrence
of events.
An event is some type of action. The event may originate outside the system, such as a user
pressing a button. It may also originate inside, such as when one routine finishes its computation
and passes the result on to another routine. We will concentrate on the following three types of
events defined by UML, as illustrated in Figure 1.8 c):
■ A signal is an asynchronous occurrence. It is defined in UML by an object that is labeled as
a <<signal>>. The object in the diagram serves as a declaration of the event's existence. Because
it is an object, a signal may have parameters that are passed to the signal's receiver.
■ A call event follows the model of a procedure call in a programming language.
■ A time-out event causes the machine to leave a state after a certain amount of time. The
label tm(time-value) on the edge gives the amount of time after which the transition occurs. A
time-out is generally implemented with an external timer. This notation simplifies the
specification and allows us to defer implementation details about the time-out mechanism.
In order to learn how to use UML to model systems, we will specify a simple system, a model
train controller, which is illustrated in Figure 1.2. The user sends messages to the train with a
control box attached to the tracks.
The control box may have familiar controls such as a throttle, emergency stop button, and so on.
Since the train receives its electrical power from the two rails of the track, the control box can
send signals to the train over the tracks by modulating the power supply voltage. As shown in the
figure, the control panel sends packets over the tracks to the receiver on the train.
The train includes analog electronics to sense the bits being transmitted and a control system to
set the train motor‘s speed and direction based on those commands.
Each packet includes an address so that the console can control several trains on the same track;
the packet also includes an error correction code (ECC) to guard against transmission errors.
This is a one-way communication system: the model train cannot send commands back to the
user.
We start by analyzing the requirements for the train control system. We will base our system on
a real standard developed for model trains. We then develop two specifications: a simple, high-
level specification and then a more detailed specification.
Requirements
We will develop our system using a widely used standard for model train control. We could
develop our own train control system from scratch, but basing our system upon a standard has
several advantages in this case: It reduces the amount of work we have to do and it allows us to
use a wide variety of existing trains and other pieces of equipment.
DCC
The Digital Command Control (DCC) standard was created by the National Model Railroad
Association to support interoperable digitally controlled model trains.
Hobbyists started building homebrew digital control systems in the 1970s and Marklin developed
its own digital control system in the 1980s. DCC was created to provide a standard that could be
built by any manufacturer so that hobbyists could mix and match components from multiple
vendors.
The DCC standard is given in two documents:
Standard S-9.1, the DCC Electrical Standard, defines how bits are encoded on the rails for
transmission.
Standard S-9.2, the DCC Communication Standard, defines the packets that carry information.
Any DCC-conforming device must meet these specifications. DCC also provides several
recommended practices. These are not strictly required but they provide some hints to
manufacturers and users as to how to best use DCC.
The DCC standard does not specify many aspects of a DCC train system. It doesn‘t define the
control panel, the type of microprocessor used, the programming language to be used, or many
other aspects of a real model train system.
The standard concentrates on those aspects of system design that are necessary for
interoperability. Over-standardization, or specifying elements that do not really need to be
standardized, only makes the standard less attractive and harder to implement.
The Electrical Standard deals with voltages and currents on the track. While the electrical
engineering aspects of this part of the specification are beyond the scope of the book, we will
briefly discuss the data encoding here.
The standard must be carefully designed because the main function of the track is to carry power
to the locomotives. The signal encoding system should not interfere with power transmission
either to DCC or non-DCC locomotives. A key requirement is that the data signal should not
change the DC value of the rails.
The data signal swings between two voltages around the power supply voltage. As shown in
Figure 1.3, bits are encoded in the time between transitions, not by voltage levels. A 0 is at least
100 µs while a 1 is nominally 58 µs.
The durations of the high (above nominal voltage) and low (below nominal voltage) parts of a
bit are equal to keep the DC value constant. The specification also gives the allowable variations
in bit times that a conforming DCC receiver must be able to tolerate.
The standard also describes other electrical properties of the system, such as allowable transition
times for signals.
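A DCC receiver recovers bits by measuring the time between rail transitions. The sketch below classifies one half-bit from such a measurement; the acceptance windows are simplified, and the exact tolerances a conforming decoder must accept are defined in Standard S-9.1:

```cpp
// Classify one DCC half-bit from the time between two rail
// transitions, in microseconds. A 1 is nominally 58 us and a 0
// at least 100 us; the windows below are simplified, see NMRA
// Standard S-9.1 for the exact tolerances a decoder must accept.
enum class Bit { One, Zero, Invalid };

Bit classify_half_bit(unsigned micros) {
    if (micros >= 52 && micros <= 64)  // around the nominal 58 us
        return Bit::One;
    if (micros >= 90)                  // long interval: a 0 half-bit
        return Bit::Zero;
    return Bit::Invalid;               // out of tolerance
}
```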
The DCC Communication Standard describes how bits are combined into packets and the
meaning of some important packets.
Some packet types are left undefined in the standard but typical uses are given in Recommended
Practices documents. We can write the basic packet format as a regular expression:
PSA(sD)+E
where P is the preamble, S is the packet start bit, A is the address data byte, s is a data byte start
bit, and E is the packet end bit. D is a data byte, which includes eight bits. A data byte may
contain an address, instruction, data, or error correction information.
A baseline packet is the minimum packet that must be accepted by all DCC implementations.
More complex packets are given in a Recommended Practice document.
A baseline packet has three data bytes: an address data byte that gives the intended receiver of
the packet; the instruction data byte provides a basic instruction; and an error correction data
byte is used to detect and correct transmission errors.
The instruction data byte carries several pieces of information. Bits 0–3 provide a 4-bit speed
value. Bit 4 is an additional speed bit, which is interpreted as the least significant speed bit. Bit
5 gives direction, with 1 for forward and 0 for reverse. Bits 6–7 are set at 01 to indicate that this
instruction provides speed and direction.
The error correction data byte is the bitwise exclusive OR of the address and instruction data
bytes.
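The three data bytes described above can be assembled in code. This is a minimal sketch following the field layout given in the text; the struct and function names are invented for illustration.

```c
/* Build the three data bytes of a baseline DCC speed/direction packet.
   Layout per the description above: bits 0-3 hold the upper four speed
   bits, bit 4 the least significant speed bit, bit 5 the direction
   (1 = forward), and bits 6-7 are fixed at 01. Names are illustrative. */
typedef struct {
    unsigned char address;     /* intended receiver of the packet */
    unsigned char instruction; /* speed and direction */
    unsigned char error;       /* XOR of address and instruction bytes */
} dcc_baseline_packet;

dcc_baseline_packet make_speed_packet(unsigned char address,
                                      unsigned int speed, /* 5-bit value */
                                      int forward)
{
    dcc_baseline_packet p;
    p.address = address;
    p.instruction = (unsigned char)(0x40           /* bits 6-7 = 01 */
                  | (forward ? 0x20 : 0)           /* bit 5: direction */
                  | ((speed & 1) << 4)             /* bit 4: LSB of speed */
                  | ((speed >> 1) & 0x0f));        /* bits 0-3 */
    p.error = p.address ^ p.instruction;
    return p;
}
```

The error byte falls out directly as the bitwise exclusive OR of the other two bytes, so a receiver can validate a packet with two XORs.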
The standard says that the command unit should send packets frequently since a packet may be
corrupted. Packets should be separated by at least 5 ms.
Conceptual Specification
Digital Command Control specifies some important aspects of the system, particularly those that
allow equipment to interoperate. But DCC deliberately does not specify everything about a
model train control system. We need to round out our specification with details that complement
the DCC spec.
A conceptual specification allows us to understand the system a little better. We will use the
experience gained by writing the conceptual specification to help us write a detailed specification
to be given to a system architect. This specification does not correspond to what any commercial
DCC controllers do, but it is simple enough to allow us to cover some basic concepts in system
design.
A train control system turns commands into packets. A command comes from the command
unit while a packet is transmitted over the rails.
Commands and packets may not be generated in a 1-to-1 ratio. In fact, the DCC standard says
that command units should resend packets in case a packet is dropped during transmission.
We now need to model the train control system itself. There are clearly two major subsystems:
the command unit and the train-board component as shown in Figure 1.4. Each of these
subsystems has its own internal structure.
The basic relationship between them is illustrated in Figure 1.5. This figure shows a
UML collaboration diagram; we could have used another type of figure, such as a class or
object diagram, but we wanted to emphasize the transmit/receive relationship between these
major subsystems. The command unit and receiver are each represented by objects; the
command unit sends a sequence of packets to the train's receiver, as illustrated by the arrow.
The notation on the arrow provides both the type of message sent and its sequence in a flow of
messages; since the console sends all the messages, we have numbered the arrow's messages as
1..n. Those messages are of course carried over the track.
Since the track is not a computer component and is purely passive, it does not appear in the
diagram. However, it would be perfectly legitimate to model the track in the collaboration
diagram, and in some situations it may be wise to model such nontraditional components in the
specification diagrams. For example, if we are worried about what happens when the track
breaks, modeling the tracks would help us identify failure modes and possible recovery
mechanisms.
Let's break down the command unit and receiver into their major components. The console
needs to perform three functions: read the state of the front panel on the command unit, format
messages, and transmit messages. The train receiver must also perform three major functions:
receive the message, interpret the message (taking into account the current speed, inertia setting,
etc.), and actually control the motor. In this case, let's use a class diagram to represent the design;
we could also use an object diagram if we wished. The UML class diagram is shown in Figure
1.6. It shows the console class using three classes, one for each of its major components. These
classes must define some behaviors, but for the moment we will concentrate on the basic
characteristics of these classes:
The Console class describes the command unit's front panel, which contains the analog knobs
and hardware to interface to the digital parts of the system.
The Formatter class includes behaviors that know how to read the panel knobs and creates a bit
stream for the required message.
The Transmitter class interfaces to analog electronics to send the message along the track.
There will be one instance of the Console class and one instance of each of the component
classes, as shown by the numeric values at each end of the relationship links. We have also
shown some special classes that represent analog components, ending the name of each with an
asterisk:
Knobs* describes the actual analog knobs, buttons, and levers on the control panel.
Sender* describes the analog electronics that send bits along the track.
Likewise, the Train makes use of three other classes that define its components:
The Receiver class knows how to turn the analog signals on the track into digital form.
The Controller class includes behaviors that interpret the commands and figures out how to
control the motor.
The Motor interface class defines how to generate the analog signals required to control the
motor. We define two classes to represent analog components:
Detector* detects analog signals on the track and converts them into digital form.
Pulser* turns digital commands into the analog signals required to control the motor speed.
UNIT II
INTRODUCTION TO EMBEDDED C AND APPLICATIONS
BASIC C DATA TYPES
Let's start by looking at how ARM compilers handle the basic C data types. We will see that some of these types
are more efficient to use for local variables than others. There are also differences between the addressing modes
available when loading and storing data of each type.
ARM processors have 32-bit registers and 32-bit data processing operations. The ARM architecture is a RISC
load/store architecture. In other words you must load values from memory into registers before acting on them.
There are no arithmetic or logical instructions that manipulate values in memory directly.
Early versions of the ARM architecture (ARMv1 to ARMv3) provided hardware support for loading and
storing unsigned 8-bit and unsigned or signed 32-bit values.
These architectures were used on processors prior to the ARM7TDMI. Table 5.1 shows the
load/store instruction classes available by ARM architecture.
In Table 5.1, loads that act on 8- or 16-bit values extend the value to 32 bits before writing to an ARM
register. Unsigned values are zero-extended, and signed values sign-extended. This means that the
cast of a loaded value to an int type does not cost extra instructions. Similarly, a store of an 8- or 16-bit
value selects the lowest 8 or 16 bits of the register. The cast of an int to a smaller type does not cost extra
instructions on a store.
The ARMv4 architecture and above support signed 8-bit and 16-bit loads and stores directly,
through new instructions. Since these instructions are a later addition, they do not support as many
addressing modes as the pre-ARMv4 instructions. (See Section 3.3 for details of the different
addressing modes.) We will see the effect of this in the example checksum_v3 in Section 5.2.1.
Finally, ARMv5 adds instruction support for 64-bit loads and stores. This is available in ARM9E and
later cores.
Prior to ARMv4, ARM processors were not good at handling signed 8-bit or any 16-bit values.
Therefore ARM C compilers define char to be an unsigned 8-bit value, rather than a signed 8-bit
value as is typical in many other compilers.
Compilers armcc and gcc use the data type mappings in Table 5.2 for an ARM target. The
exceptional case for type char is worth noting, as it can cause problems when you are porting code
from another processor architecture. A common example is using a char type variable i as a loop
counter, with loop continuation condition i >= 0. As i is unsigned for the ARM compilers, the loop will
never terminate. Fortunately armcc produces a warning in this situation: unsigned comparison with 0.
Compilers also provide an override switch to make char signed. For example, the command line
option -fsigned-char will make char signed on gcc. The command line option -zc will have the same
effect with armcc.
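The non-terminating loop described above can be demonstrated on any host by using an explicitly unsigned 8-bit counter to stand in for char on an ARM target. The function name and the safety cap are illustrative additions.

```c
/* Demonstrates the char loop-counter pitfall described above. On ARM
   compilers, plain char is unsigned, so i >= 0 is always true and the
   loop never exits; a safety cap is added here so the demonstration
   terminates. */
static int count_down_iterations(void)
{
    unsigned char i = 3;   /* stands in for char on an ARM target */
    int n = 0;
    while (i >= 0)         /* always true for an unsigned type */
    {
        i--;               /* wraps from 0 back to 255 */
        if (++n >= 1000)
            break;         /* without this cap, the loop runs forever */
    }
    return n;
}
```

The loop always hits the cap because decrementing an unsigned 8-bit value past zero wraps to 255 instead of going negative.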
For the rest of this book we assume that you are using an ARMv4
processor or above. This includes ARM7TDMI and all later
processors.
C data type   Implementation
char          unsigned 8-bit byte
short         signed 16-bit halfword
int           signed 32-bit word
long          signed 32-bit word
long long     signed 64-bit double word
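The C source for checksum_v1 is not reproduced in these notes; from the assembly that follows and the surrounding discussion, it is presumably along these lines, with the loop counter declared as a char.

```c
/* Reconstruction of checksum_v1 (the original C source is omitted in
   these notes). The loop counter is a char, which ARM compilers treat
   as an unsigned 8-bit type, forcing an extra AND in the loop. */
int checksum_v1(int *data)
{
    char i;
    int sum = 0;

    for (i = 0; i < 64; i++)
    {
        sum += data[i];
    }
    return sum;
}
```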
checksum_v1
        MOV  r2,r0              ; r2 = data
        MOV  r0,#0              ; sum = 0
        MOV  r1,#0              ; i = 0
checksum_v1_loop
        LDR  r3,[r2,r1,LSL #2]  ; r3 = data[i]
        ADD  r1,r1,#1           ; r1 = i+1
        AND  r1,r1,#0xff        ; i = (char)r1
        CMP  r1,#0x40           ; compare i, 64
        ADD  r0,r3,r0           ; sum += r3
        BCC  checksum_v1_loop   ; if (i<64) goto loop
        MOV  pc,r14             ; return sum
Now compare this to the compiler output where instead we declare i as an unsigned int.
checksum_v2
        MOV  r2,r0              ; r2 = data
        MOV  r0,#0              ; sum = 0
        MOV  r1,#0              ; i = 0
checksum_v2_loop
        LDR  r3,[r2,r1,LSL #2]  ; r3 = data[i]
        ADD  r1,r1,#1           ; r1++
        CMP  r1,#0x40           ; compare i, 64
        ADD  r0,r3,r0           ; sum += r3
        BCC  checksum_v2_loop   ; if (i<64) goto loop
        MOV  pc,r14             ; return sum
In the first case, the compiler inserts an extra AND instruction to reduce i to the range 0 to
255 before the comparison with 64. This instruction disappears in the second case.
Next, suppose the data packet contains 16-bit values and we need a 16-bit checksum. It is tempting to
write the following C code:

short checksum_v3(short *data)
{
    unsigned int i;
    short sum = 0;

    for (i = 0; i < 64; i++)
    {
        sum = (short)(sum + data[i]);
    }
    return sum;
}
You may wonder why the for loop body doesn't contain the code

sum += data[i];

With armcc this code will produce a warning if you enable implicit narrowing cast warnings using the
compiler switch -W+n. The expression sum + data[i] is an int and so can
only be assigned to a short using an (implicit or explicit) narrowing cast. As you can see
in the following assembly output, the compiler must insert extra instructions to implement the narrowing
cast:
checksum_v3
        MOV  r2,r0              ; r2 = data
        MOV  r0,#0              ; sum = 0
        MOV  r1,#0              ; i = 0
checksum_v3_loop
        ADD  r3,r2,r1,LSL #1    ; r3 = &data[i]
        LDRH r3,[r3,#0]         ; r3 = data[i]
        ADD  r1,r1,#1           ; i++
        CMP  r1,#0x40           ; compare i, 64
        ADD  r0,r3,r0           ; r0 = sum + r3
        MOV  r0,r0,LSL #16
        MOV  r0,r0,ASR #16      ; sum = (short)r0
        BCC  checksum_v3_loop   ; if (i<64) goto loop
        MOV  pc,r14             ; return sum
The loop is now three instructions longer than the loop for example checksum_v2 earlier!
There are two reasons for the extra instructions:
■ The LDRH instruction does not allow for a shifted address offset as the LDR
instruction did in checksum_v2. Therefore the first ADD in the loop calculates the address of item i in the
array. The LDRH loads from an address with no offset. LDRH has fewer addressing
modes than LDR as it was a later addition to the ARM instruction set. (See Table 5.1.)
■ The cast reducing sum + data[i] to a short requires two MOV instructions. The
compiler shifts left by 16 and then right by 16 to implement a 16-bit sign extend. The shift right is a sign-
extending shift, so it replicates the sign bit to fill the upper 16 bits.
We can avoid the second problem by using an int type variable to hold the partial sum. We
only reduce the sum to a short type at the function exit.
However, the first problem is a new issue. We can solve it by accessing the array by
incrementing the pointer data rather than using an index as in data[i]. This is efficient regardless of
array type size or element size. All ARM load and store instructions have a postincrement
addressing mode.
Example:
The checksum_v4 code fixes all the problems we have discussed in this section. It
uses int type local variables to avoid unnecessary casts. It increments the pointer
data instead of using an index offset data[i].
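The checksum_v4 listing itself is omitted in these notes; based on the description above, it is presumably along these lines.

```c
/* Reconstruction of checksum_v4 (the original listing is omitted in
   these notes): an int local avoids narrowing casts inside the loop,
   and the pointer is incremented so LDRH can use postincrement
   addressing. Only the return value is narrowed to short. */
short checksum_v4(short *data)
{
    unsigned int i;
    int sum = 0;

    for (i = 0; i < 64; i++)
    {
        sum += *(data++);
    }
    return (short)sum;
}
```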
The compiler is still performing one cast to a 16-bit range, on the function return. You could
remove this also by returning an int result, as discussed in Section 5.2.2.
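The add_v1 listing that follows was compiled from a short-argument function whose C source is omitted in these notes; from the armcc output it is presumably of this shape.

```c
/* Reconstructed source for add_v1 (omitted in these notes). armcc
   assumes the short arguments already fit in 16 bits, so it only
   narrows the result, as the listing below shows. */
short add_v1(short a, short b)
{
    return (short)(a + (b >> 1));
}
```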
add_v1
ADD r0,r0,r1,ASR #1 ; r0 = (int)a + ((int)b>>1)
MOV r0,r0,LSL #16
MOV r0,r0,ASR #16 ; r0 = (short)r0
MOV pc,r14 ; return r0
The gcc compiler we used is more cautious and makes no assumptions about the range of argument values. This version of the
compiler reduces the input arguments to the range of a short in both the caller and the callee. It also casts the return value to
a short type. Here is the compiled code for add_v1:
add_v1_gcc
        MOV  r0,r0,LSL #16
        MOV  r1,r1,LSL #16
        MOV  r1,r1,ASR #17      ; r1 = (int)b>>1
        ADD  r1,r1,r0,ASR #16   ; r1 += (int)a
        MOV  r1,r1,LSL #16
        MOV  r0,r1,ASR #16      ; r0 = (short)r1
        MOV  pc,lr              ; return r0
Notice that the compiler adds one to the sum before shifting right if the sum is negative. In other words, it
replaces x/2 by the statement:

(x < 0) ? ((x + 1) >> 1) : (x >> 1)

It must do this because x is signed. In C on an ARM target, a divide by two is not a right shift if x is negative. For example, −3 >> 1 = −2, but −3/2 = −1.
Division rounds towards zero, but arithmetic right shift rounds towards −∞.
It is more efficient to use unsigned types for divisions. The compiler converts unsigned power-of-two divisions directly to right
shifts. For general divisions, the divide routine in the C library is faster for unsigned types. See Section 5.10 for discussion on
avoiding divisions completely.
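The rounding difference above is easy to check directly. Note that right shifting a negative value is implementation-defined in C; on ARM compilers (and most others) it is an arithmetic, sign-extending shift, which is what these helper functions assume.

```c
/* Division by two rounds towards zero; arithmetic shift right rounds
   towards minus infinity. The two only differ for negative values. */
int div_by_two(int x) { return x / 2; }
int asr_by_one(int x) { return x >> 1; } /* implementation-defined for
                                            negative x; arithmetic shift
                                            on ARM compilers */
```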
■ For local variables held in registers, don't use a char or short type unless 8-bit or 16-bit modular
arithmetic is necessary. Use the signed or unsigned int types instead. Unsigned types are faster when you use divisions.
■ For array entries and global variables held in main memory, use the type with the smallest size
possible to hold the required data. This saves memory footprint. The ARMv4 architecture is efficient at loading and storing all
data widths provided you traverse arrays by incrementing the array pointer. Avoid using offsets from the base of the array with
short type arrays, as LDRH does not support this.
■ Use explicit casts when reading array entries or global variables into local variables, or writing local
variables out to array entries. The casts make it clear that for fast operation you are taking a narrow width type stored in memory and
expanding it to a wider type in the registers. Switch on implicit narrowing cast warnings in the compiler to detect implicit casts.
■ Avoid implicit or explicit narrowing casts in expressions because they usually cost extra cycles. Casts on
loads or stores are usually free because the load or store instruction performs the cast for you.
■ Avoid char and short types for function arguments or return values. Instead use the int type even if the
range of the parameter is smaller. This prevents the compiler from performing unnecessary casts.
C LOOPING STRUCTURES
This section looks at the most efficient ways to code for and while loops on the ARM. We start by looking at loops with a fixed
number of iterations and then move on to loops with a variable number of iterations. Finally, we look at loop unrolling.
Here is the last version of the 64-word packet checksum routine we studied in Section 5.2.
This shows how the compiler treats a loop with incrementing count i++.
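The C source for this version is omitted in these notes; based on the description, it presumably reads as follows.

```c
/* Reconstruction of checksum_v5 (the C source is omitted in these
   notes): an incrementing unsigned loop counter with a fixed count. */
int checksum_v5(int *data)
{
    unsigned int i;
    int sum = 0;

    for (i = 0; i < 64; i++)
    {
        sum += *(data++);
    }
    return sum;
}
```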
This compiles to
checksum_v5
        MOV  r2,r0              ; r2 = data
        MOV  r0,#0              ; sum = 0
        MOV  r1,#0              ; i = 0
checksum_v5_loop
        LDR  r3,[r2],#4         ; r3 = *(data++)
        ADD  r1,r1,#1           ; i++
        CMP  r1,#0x40           ; compare i, 64
        ADD  r0,r3,r0           ; sum += r3
        BCC  checksum_v5_loop   ; if (i<64) goto loop
        MOV  pc,r14             ; return sum
It takes three instructions to implement the for loop structure here: an add to increment i, a compare to check i against 64, and a conditional branch. This is not efficient. On the ARM, a loop should only use two instructions:
■ A subtract to decrement the loop counter, which also sets the condition code flags on the result
■ A conditional branch instruction
The key point is that the loop counter should count down to zero rather than counting up to some arbitrary limit. Then
the comparison with zero is free since the result is stored in the condition flags. Since we are no longer using i as an array
index, there is no problem in counting down rather than up.
EXAMPLE 2
This example shows the improvement if we switch to a decrementing loop rather than an
incrementing loop.
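The C source for the decrementing version is omitted in these notes; from the assembly below it presumably reads as follows.

```c
/* Reconstruction of checksum_v6 (the C source is omitted in these
   notes): the counter runs down to zero, so the compare with the
   termination value is free. */
int checksum_v6(int *data)
{
    unsigned int i;
    int sum = 0;

    for (i = 64; i != 0; i--)
    {
        sum += *(data++);
    }
    return sum;
}
```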
This compiles to
checksum_v6
        MOV  r2,r0              ; r2 = data
        MOV  r0,#0              ; sum = 0
        MOV  r1,#0x40           ; i = 64
checksum_v6_loop
        LDR  r3,[r2],#4         ; r3 = *(data++)
        SUBS r1,r1,#1           ; i-- and set flags
        ADD  r0,r3,r0           ; sum += r3
        BNE  checksum_v6_loop   ; if (i!=0) goto loop
        MOV  pc,r14             ; return sum
The SUBS and BNE instructions implement the loop. Our checksum example now has the minimum number of four
instructions per loop. This is much better than six for checksum_v1 and eight for checksum_v3.
For an unsigned loop counter i we can use either of the loop continuation conditions i != 0 or i > 0. As i can't be negative, they
are the same condition. For a signed loop counter, it is tempting to use the condition i > 0 to continue the loop. You might
expect the compiler to generate the following two instructions to implement the loop:

        SUBS r1,r1,#1           ; compare i with 1, i = i-1
        BGT  loop               ; if (i+1>1) goto loop

In fact, the compiler will generate

        SUB  r1,r1,#1           ; i--
        CMP  r1,#0              ; compare i with 0
        BGT  loop               ; if (i>0) goto loop

The compiler is not being inefficient. It must be careful about the case when i = -0x80000000, because the two
sections of code generate different answers in this case. For the first piece of code the SUBS instruction compares i
with 1 and then decrements i. Since -0x80000000 < 1, the loop terminates. For the second piece of code, we
decrement i and then compare with 0. Modulo arithmetic means that i now has the value
+0x7fffffff, which is greater than zero. Thus the loop continues for many iterations.
Of course, in practice, i rarely takes the value -0x80000000. The compiler can't usually determine this, especially
if the loop starts with a variable number of iterations (see Section 5.3.2).
Therefore you should use the termination condition i != 0 for signed or unsigned loop counters. It saves one
instruction over the condition i > 0 for signed i.
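The variable-length version whose assembly follows is omitted in these notes; it presumably reads as follows.

```c
/* Reconstruction of checksum_v7 (the C source is omitted in these
   notes): the packet length N is now a parameter, counted down to
   zero. A for loop must first test whether N is zero on entry. */
int checksum_v7(int *data, unsigned int N)
{
    int sum = 0;

    for (; N != 0; N--)
    {
        sum += *(data++);
    }
    return sum;
}
```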
This compiles to
checksum_v7
        MOV  r2,#0              ; sum = 0
        CMP  r1,#0              ; compare N, 0
        BEQ  checksum_v7_end    ; if (N==0) goto end
checksum_v7_loop
        LDR  r3,[r0],#4         ; r3 = *(data++)
        SUBS r1,r1,#1           ; N-- and set flags
        ADD  r2,r3,r2           ; sum += r3
        BNE  checksum_v7_loop   ; if (N!=0) goto loop
checksum_v7_end
        MOV  r0,r2              ; r0 = sum
        MOV  pc,r14             ; return r0
Notice that the compiler checks that N is nonzero on entry to the function. Often this check is unnecessary since you know
that the array won't be empty. In this case a do-while loop gives better performance and code density than a for loop.
EXAMPLE 3 This example shows how to use a do-while loop to remove the test for N being zero that occurs in a
for loop.
int checksum_v8(int *data, unsigned int N)
{
    int sum = 0;

    do
    {
        sum += *(data++);
    } while (--N != 0);
    return sum;
}
Compare this with the output for checksum_v7 to see the two-cycle saving. ■
LOOP UNROLLING
We saw in Section 5.3.1 that each loop iteration costs two instructions in addition to the body of the loop: a subtract to
decrement the loop count and a conditional branch.
We call these instructions the loop overhead. On ARM7 or ARM9 processors the subtract takes one cycle and
the branch three cycles, giving an overhead of four cycles per loop.
You can save some of these cycles by unrolling a loop: repeating the loop body several times, and reducing the
number of loop iterations by the same proportion. For example, let's unroll our packet checksum example four
times.
EXAMPLE 4 The following code unrolls our packet checksum loop by four times. We assume that the number
of words in the packet N is a multiple of four.
int checksum_v9(int *data, unsigned int N)
{
    int sum = 0;

    do
    {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        N -= 4;
    } while (N != 0);
    return sum;
}
This compiles to
checksum_v9
        MOV  r2,#0              ; sum = 0
checksum_v9_loop
        LDR  r3,[r0],#4         ; r3 = *(data++)
        SUBS r1,r1,#4           ; N -= 4 and set flags
        ADD  r2,r3,r2           ; sum += r3
        LDR  r3,[r0],#4         ; r3 = *(data++)
        ADD  r2,r3,r2           ; sum += r3
        LDR  r3,[r0],#4         ; r3 = *(data++)
        ADD  r2,r3,r2           ; sum += r3
        LDR  r3,[r0],#4         ; r3 = *(data++)
        ADD  r2,r3,r2           ; sum += r3
        BNE  checksum_v9_loop   ; if (N!=0) goto loop
        MOV  r0,r2              ; r0 = sum
        MOV  pc,r14             ; return r0
We have reduced the loop overhead from 4N cycles to (4N)/4 = N cycles. On the ARM7TDMI, this accelerates the loop from 8
cycles per accumulate to 20/4 = 5 cycles per accumulate, nearly doubling the speed! For the ARM9TDMI, which has a faster
load instruction, the benefit is even higher. ■
There are two questions you need to ask when unrolling a loop:
■ How many times should I unroll the loop?
■ What if the number of loop iterations is not a multiple of the unroll amount?
To start with the first question, only unroll loops that are important for the overall performance of the application. Otherwise
unrolling will increase the code size with little performance benefit. Unrolling may even reduce performance by evicting more
important code from the cache.
Suppose the loop is important, for example, 30% of the entire application. Suppose you unroll the loop until it is 0.5 KB in code size
(128 instructions). Then the loop overhead is at most 4 cycles compared to a loop body of around 128 cycles. The loop overhead
cost is 4/128, roughly 3%. Recalling that the loop is 30% of the entire application, overall the loop overhead is only 1%. Unrolling
the code further gains little extra performance but has a significant impact on the cache contents. It is usually not worth unrolling
further when the gain is less than 1%.
For the second question, try to arrange it so that array sizes are multiples of your unroll amount. If this isn't possible, then you must add
extra code to take care of the leftover cases. This increases the code size a little but keeps the performance high.
EXAMPLE 5 This example handles the checksum of any size of data packet using a loop that has been unrolled
four times.
The second for loop handles the remaining cases when N is not a multiple of four. Note that both N/4 and N&3 can be zero,
so we can't use do-while loops.
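The example source itself is omitted in these notes; given the description of the two for loops, it presumably reads as follows.

```c
/* Reconstruction of the any-size unrolled checksum (the C source is
   omitted in these notes). The first loop handles four words per
   iteration; the second mops up the N&3 leftover words. */
int checksum_v10(int *data, unsigned int N)
{
    unsigned int i;
    int sum = 0;

    for (i = N / 4; i != 0; i--)     /* may run zero times */
    {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
    }
    for (i = N & 3; i != 0; i--)     /* may run zero times */
    {
        sum += *(data++);
    }
    return sum;
}
```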
■ Use loops that count down to zero. Then the compiler does not need to allocate a register to hold
the termination value, and the comparison with zero is free.
■ Use unsigned loop counters by default and the continuation condition i != 0 rather than
i > 0. This will ensure that the loop overhead is only two instructions.
■ Use do-while loops rather than for loops when you know the loop will iterate at least once. This saves
the compiler checking to see if the loop count is zero.
■ Unroll important loops to reduce the loop overhead. Do not overunroll. If the loop overhead is small
as a proportion of the total, then unrolling will increase code size and hurt the performance of the cache.
■ Try to arrange that the number of elements in arrays are multiples of four or eight. You can then unroll
loops easily by two, four, or eight times without worrying about the leftover array elements.
First let's look at the number of processor registers the ARM C compilers have available for allocating variables.
Table 5.3 shows the standard register names and usage when following the ARM-Thumb procedure call standard
(ATPCS), which is used in code generated by C compilers.
Table 5.3 C compiler register usage.

Register  Alternate names  ATPCS register usage
r0        a1               Argument registers. These hold the first four function
r1        a2               arguments on a function call and the return value on a
r2        a3               function return. A function may corrupt these registers and
r3        a4               use them as general scratch registers within the function.
r4        v1               General variable registers. The function must preserve the
r5        v2               callee values of these registers.
r6        v3
r7        v4
r8        v5
r9        v6, sb           General variable register. The function must preserve the
                           callee value of this register except when compiling for
                           read-write position independence (RWPI). Then r9 holds the
                           static base address, the address of the read-write data.
r10       v7, sl           General variable register. The function must preserve the
                           callee value of this register except when compiling with
                           stack limit checking. Then r10 holds the stack limit address.
r11       v8, fp           General variable register. The function must preserve the
                           callee value of this register except when compiling using a
                           frame pointer. Only old versions of armcc use a frame pointer.
r12       ip               A general scratch register that the function can corrupt. It
                           is useful as a scratch register for function veneers or other
                           intraprocedure call requirements.
r13       sp               The stack pointer, pointing to the full descending stack.
r14       lr               The link register. On a function call this holds the return
                           address.
r15       pc               The program counter.
Provided the compiler is not using software stack checking or a frame pointer, then the C compiler can use registers r0 to
r12 and r14 to hold variables. It must save the callee values of r4 to r11 and r14 on the stack if using these registers.
In theory, the C compiler can assign 14 variables to registers without spillage. In practice, some compilers use a fixed register such
as r12 for intermediate scratch working and do not assign variables to this register. Also, complex expressions require
intermediate working registers to evaluate. Therefore, to ensure good assignment to registers, you should try to limit the
internal loop of functions to using at most 12 local variables.
If the compiler does need to swap out variables, then it chooses which variables to swap out based on frequency of use. A variable
used inside a loop counts multiple times. You can guide the compiler as to which variables are important by ensuring these variables are used
within the innermost loop.
The register keyword in C hints that a compiler should allocate the given variable to a register. However, different compilers treat
this keyword in different ways, and different architectures have a different number of available registers (for example, Thumb and ARM).
Therefore we recommend that you avoid using register and rely on the compiler's normal register allocation routine.
■ Try to limit the number of local variables in the internal loop of functions to 12. The compiler should
be able to allocate these to ARM registers.
■ You can guide the compiler as to which variables are important by ensuring these variables are
used within the innermost loop.
FUNCTION CALLS
The ARM Procedure Call Standard (APCS) defines how to pass function arguments and return values in ARM registers.
The more recent ARM-Thumb Procedure Call Standard (ATPCS) covers ARM and Thumb interworking as well.
The first four integer arguments are passed in the first four ARM registers: r0, r1, r2, and r3. Subsequent integer arguments are
placed on the full descending stack, ascending in memory as in Figure 5.1. Function return integer values are passed in r0.
This description covers only integer or pointer arguments. Two-word arguments such as long long or double are passed in a pair of
consecutive argument registers and returned in r0, r1. The compiler may pass structures in registers or by reference according to
command line compiler options.
The first point to note about the procedure call standard is the four-register rule. Functions with four or fewer
arguments are far more efficient to call than functions with five or more arguments. For functions with four or fewer
arguments, the compiler can pass all the arguments in registers. For functions with more arguments, both the caller and
callee must access the stack for some arguments. Note that for C++ the first argument to an object method is the this pointer.
This argument is implicit and additional to the explicit arguments.
If your C function needs more than four arguments, or your C++ method more than three explicit arguments, then it
is almost always more efficient to use structures. Group related arguments into structures, and pass a structure pointer rather
than multiple arguments. Which arguments are related will depend on the structure of your software.
Figure 5.1 ATPCS argument passing. The first four arguments are passed in registers; later arguments go on the stack:

sp + 16   Argument 8
sp + 12   Argument 7
sp + 8    Argument 6
sp + 4    Argument 5
sp        Argument 4
r3        Argument 3
r2        Argument 2
r1        Argument 1
r0        Argument 0
char *queue_bytes_v1(
char *Q_start, /* Queue buffer start address */
char *Q_end, /* Queue buffer end address */
char *Q_ptr, /* Current queue pointer position */
char *data, /* Data to insert into the queue */
unsigned int N) /* Number of bytes to insert */
{
do
{
*(Q_ptr++) = *(data++);
if (Q_ptr == Q_end)
{
Q_ptr = Q_start;
}
} while (--N);
return Q_ptr;
}
This compiles to
queue_bytes_v1
        STR   r14,[r13,#-4]!    ; save lr on the stack
        LDR   r12,[r13,#4]      ; r12 = N
queue_v1_loop
        LDRB  r14,[r3],#1       ; r14 = *(data++)
        STRB  r14,[r2],#1       ; *(Q_ptr++) = r14
        CMP   r2,r1             ; if (Q_ptr == Q_end)
        MOVEQ r2,r0             ;   {Q_ptr = Q_start;}
        SUBS  r12,r12,#1        ; --N and set flags
        BNE   queue_v1_loop     ; if (N!=0) goto loop
        MOV   r0,r2             ; r0 = Q_ptr
        LDR   pc,[r13],#4       ; return r0
Compare this with a more structured approach using three function arguments.
EXAMPLE The following code creates a Queue structure and passes this to the function to reduce the number
of function arguments.
typedef struct {
    char *Q_start; /* Queue buffer start address */
    char *Q_end;   /* Queue buffer end address */
    char *Q_ptr;   /* Current queue pointer position */
} Queue;
void queue_bytes_v2(Queue *queue, char *data, unsigned int N)
{
    char *Q_ptr = queue->Q_ptr;
    char *Q_end = queue->Q_end;

    do
    {
        *(Q_ptr++) = *(data++);
        if (Q_ptr == Q_end)
        {
            Q_ptr = queue->Q_start;
        }
    } while (--N);
    queue->Q_ptr = Q_ptr;
}
This compiles to
queue_bytes_v2
        STR   r14,[r13,#-4]!    ; save lr on the stack
        LDR   r3,[r0,#8]        ; r3 = queue->Q_ptr
        LDR   r14,[r0,#4]       ; r14 = queue->Q_end
queue_v2_loop
        LDRB  r12,[r1],#1       ; r12 = *(data++)
        STRB  r12,[r3],#1       ; *(Q_ptr++) = r12
        CMP   r3,r14            ; if (Q_ptr == Q_end)
        LDREQ r3,[r0,#0]        ;   Q_ptr = queue->Q_start
        SUBS  r2,r2,#1          ; --N and set flags
        BNE   queue_v2_loop     ; if (N!=0) goto loop
        STR   r3,[r0,#8]        ; queue->Q_ptr = r3
        LDR   pc,[r13],#4       ; return
The queue_bytes_v2 is one instruction longer than queue_bytes_v1, but it is in fact more efficient overall. The second
version has only three function arguments rather than five. Each call to the function requires only three register setups. This
compares with four register setups, a stack push, and a stack pull for the first version. There is a net saving of two
instructions in function call overhead. There are likely further savings in the callee function, as it only needs to assign a
single register to the Queue structure pointer, rather than three registers in the nonstructured case.
There are other ways of reducing function call overhead if your function is very small and corrupts few registers (uses few
local variables). Put the C function in the same C file as the functions that will call it. The C compiler then knows the code
generated for the callee function and can make optimizations in the caller function:
■ The caller function need not preserve registers that it can see the callee doesn't corrupt. Therefore the
caller function need not save all the ATPCS corruptible registers.
■ If the callee function is very small, then the compiler can inline the code in the caller function. This
removes the function call overhead completely.
EXAMPLE
The function uint_to_hex converts a 32-bit unsigned integer into an array of eight hexadecimal digits. It uses a helper
function nybble_to_hex, which converts a digit d in the range 0 to 15 to a hexadecimal digit.

char nybble_to_hex(unsigned int d)
{
    if (d <= 9)
    {
        return d + '0';
    }
    return d - 10 + 'A';
}

void uint_to_hex(char *out, unsigned int in)
{
    unsigned int i;

    for (i = 8; i != 0; i--)
    {
        in = (in << 4) | (in >> 28); /* rotate in left by four bits */
        *(out++) = nybble_to_hex(in & 15);
    }
}
When we compile this, we see that uint_to_hex doesn't call nybble_to_hex at all! In the following compiled code, the
compiler has inlined the nybble_to_hex code. This is more efficient than generating a function call.
uint_to_hex
        MOV   r3,#8             ; i = 8
uint_to_hex_loop
        MOV   r1,r1,ROR #28     ; in = (in<<4)|(in>>28)
        AND   r2,r1,#0xf        ; r2 = in & 15
        CMP   r2,#0xa           ; if (r2>=10)
        ADDCS r2,r2,#0x37       ;   r2 += 'A'-10
        ADDCC r2,r2,#0x30       ; else r2 += '0'
        STRB  r2,[r0],#1        ; *(out++) = r2
        SUBS  r3,r3,#1          ; i-- and set flags
        BNE   uint_to_hex_loop  ; if (i!=0) goto loop
        MOV   pc,r14            ; return
The compiler will only inline small functions. You can ask the compiler to inline a function using the inline
keyword, although this keyword is only a hint and the compiler may ignore it (see Section 5.12 for more on
inline functions). Inlining large functions can lead to big increases in code size without much performance
improvement.
POINTER ALIASING
Two pointers are said to alias when they point to the same address. If you write to one pointer, it will affect the value you
read from the other pointer. In a function, the compiler often doesn’t know which pointers can alias and which pointers
can’t. The compiler must be very pessimistic and assume that any write to a pointer may affect the value read from any
other pointer, which can significantly reduce code efficiency.
Let's start with a very simple example. The following function increments two timer values by a step amount:
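The function's C source is omitted in these notes; from the assembly that follows, it is evidently of this form.

```c
/* Reconstruction of timers_v1 (the C source is omitted in these
   notes). Because timer1, timer2, and step are all pointers, the
   compiler must assume any of them may alias, so it reloads *step
   after writing *timer1. */
void timers_v1(int *timer1, int *timer2, int *step)
{
    *timer1 += *step;
    *timer2 += *step;
}
```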
This compiles to
timers_v1
        LDR  r3,[r0,#0]         ; r3 = *timer1
        LDR  r12,[r2,#0]        ; r12 = *step
        ADD  r3,r3,r12          ; r3 += r12
        STR  r3,[r0,#0]         ; *timer1 = r3
        LDR  r0,[r1,#0]         ; r0 = *timer2
        LDR  r2,[r2,#0]         ; r2 = *step
        ADD  r0,r0,r2           ; r0 += r2
        STR  r0,[r1,#0]         ; *timer2 = r0
        MOV  pc,r14             ; return
EXAMPLE In the code for timers_v3 we use a local variable step to hold the value of state->step. Now the
compiler does not need to worry that state may alias with timers.

void timers_v3(State *state, Timers *timers)
{
    int step = state->step;

    timers->timer1 += step;
    timers->timer2 += step;
}
■
You must also be careful of other, less obvious situations where aliasing may occur. When you call another function, this function may alter the state of memory and so change the values of any expressions involving memory reads, and the compiler must then evaluate those expressions again. For example, suppose you read state->step, call a function and then read state->step again. The compiler must assume that the function could change the value of state->step in memory. Therefore it will perform two reads, rather than reusing the first value it read for state->step.
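The workaround is the same as before: copy the value into a local before the call. In the following sketch, the State structure, the function names and the empty external call are all hypothetical, and the transformation is only safe when you know the call cannot change step:

```c
typedef struct { int step; } State;   /* hypothetical structure, for illustration */

static void do_something(void) { /* stands in for any external call */ }

/* The compiler must read state->step twice: do_something() might change it. */
int sum_v1(State *state)
{
    int a = state->step;
    do_something();
    return a + state->step;   /* second memory read */
}

/* If you know the call cannot change step, cache it in a local:
   only one memory read is needed. */
int sum_v2(State *state)
{
    int step = state->step;
    do_something();
    return step + step;       /* reuses the register copy */
}
```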
Another pitfall is to take the address of a local variable. Once you do this, the variable is referenced by a pointer and so aliasing can occur with other pointers. The compiler is likely to keep reading the variable from the stack in case aliasing occurs. Consider the following example, which reads and then checksums a data packet:
int checksum_next_packet(void)
{
    int *data;
    int N, sum = 0;

    data = get_next_packet(&N);
    do
    {
        sum += *(data++);
    } while (--N);
    return sum;
}
Here get_next_packet is a function returning the address and size of the next data packet. The previous code compiles to
checksum_next_packet
    STMFD r13!,{r4,r14}   ; save r4, lr on the stack
    SUB   r13,r13,#8      ; create two stacked variables
    ADD   r0,r13,#4       ; r0 = &N, N stacked
    MOV   r4,r0           ; r4 = &N
    BL    get_next_packet ; r0 = data
    MOV   r3,#0           ; sum = 0
checksum_loop
    LDR   r1,[r0],#4      ; r1 = *(data++)
    ADD   r3,r3,r1        ; sum += r1
    LDR   r2,[r4,#0]      ; r2 = N (read from stack)
    SUBS  r2,r2,#1        ; r2-- and set flags
    STR   r2,[r4,#0]      ; N = r2 (write to stack)
    BNE   checksum_loop   ; if (N!=0) goto loop
    MOV   r0,r3           ; r0 = sum
    ADD   r13,r13,#8      ; remove stacked variables
    LDMFD r13!,{r4,pc}    ; return r0
Note how the compiler reads and writes N from the stack for every N--. Once you take the address of N and pass it to get_next_packet, the compiler needs to worry about aliasing because the pointers data and &N may alias. To avoid this, don't take the address of local variables. If you must do this, then copy the value into another local variable before use.
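A sketch of that fix applied to the checksum example follows. The stub packet source is hypothetical and is included only so the block is self-contained; the point is that the loop counts down on a local copy whose address is never taken:

```c
/* Hypothetical packet source standing in for get_next_packet() in the text. */
static int packet[4] = { 1, 2, 3, 4 };

static int *get_next_packet(int *N)
{
    *N = 4;
    return packet;
}

/* Copy N into a local whose address is never taken, so the loop counter
   can live in a register instead of being re-read from the stack. */
int checksum_next_packet_v2(void)
{
    int *data;
    int N, sum = 0;

    data = get_next_packet(&N);
    int n = N;               /* local copy: cannot alias with data */
    do
    {
        sum += *(data++);
    } while (--n);
    return sum;
}
```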
You may wonder why the compiler makes room for two stacked variables when it only uses one. This is to keep the stack eight-byte aligned, which is required for the LDRD instructions available in ARMv5TE. The example above doesn't actually use an LDRD, but the compiler does not know whether get_next_packet will use this instruction.
SUMMARY Avoiding Pointer Aliasing
STRUCTURE ARRANGEMENT
The way you lay out a frequently used structure can have a significant impact on its performance and code density. There are two issues concerning structures on the ARM: the alignment of the structure entries and the overall size of the structure.
For architectures up to and including ARMv5TE, load and store instructions are only guaranteed to
load and store values with address aligned to the size of the access width. Table 5.4
summarizes these restrictions.
For this reason, ARM compilers will automatically align the start address of a structure to a multiple of the largest access width used within the structure (usually four or eight bytes) and align entries within structures to their access width by inserting padding.
For example, consider the structure
struct {
    char a;
    int b;
    char c;
    short d;
}
For a little-endian memory system the compiler will lay this out adding padding to ensure that the
next object is aligned to the sizeofthatobject:
Address +3 +2 +1 +0
+0 pad pad pad a
+4 b[31,24] b[23,16] b[15,8] b[7,0]
+8 d[15,8] d[7,0] pad c
By reordering the elements as

struct { char a; char c; short d; int b; }

we can remove most of the padding. This reduces the structure size from 12 bytes to 8 bytes, with the following new layout:
Address +3 +2 +1 +0
+0 d[15,8] d[7,0] c a
+4 b[31,24] b[23,16] b[15,8] b[7,0]
Therefore, it is a good idea to group structure elements of the same size, so that the structure layout doesn't contain unnecessary padding. The armcc compiler does include a keyword, __packed, that removes all padding. For example, the structure

__packed struct { char a; int b; char c; short d; }

will be laid out in memory with no padding at all:
Address +3 +2 +1 +0
+0 b[23,16] b[15,8] b[7,0] a
+4 d[15,8] d[7,0] c b[31,24]
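The size claims can be checked directly with sizeof and offsetof. This sketch assumes a typical ABI with a 4-byte int and 2-byte short (as on most ARM and x86 targets); the struct names are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Field order as in the text's first example: char, int, char, short.
   Padding after a and after c inflates this to 12 bytes on a typical ABI. */
struct BadOrder  { char a; int b; char c; short d; };

/* The same fields grouped by size, as the text recommends: 8 bytes. */
struct GoodOrder { char a; char c; short d; int b; };
```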
Instructions        Offset available from the base register
LDRB, LDRSB, STRB   0 to 31 bytes
LDRH, LDRSH, STRH   0 to 31 halfwords (0 to 62 bytes)
LDR, STR            0 to 31 words (0 to 124 bytes)
void dostageA(void);
void dostageB(void);
void dostageC(void);

typedef struct {
    unsigned int stageA : 1;
    unsigned int stageB : 1;
    unsigned int stageC : 1;
} Stages_v1;

void dostages_v1(Stages_v1 *stages)
{
    if (stages->stageA)
    {
        dostageA();
    }
    if (stages->stageB)
    {
        dostageB();
    }
    if (stages->stageC)
    {
        dostageC();
    }
}
Note that the compiler accesses the memory location containing the bit-field three times. Because the bit-field is stored in memory, the dostage functions could change its value. Also, the compiler uses two instructions to test bit 1 and bit 2 of the bit-field, rather than a single instruction.
You can generate far more efficient code by using an integer rather than a bit-field. Use enum or #define masks to divide the integer type into different fields.
EXAMPLE The following code implements the dostages function using logical operations rather than bit-fields:
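A mask-based version can be sketched as follows. The flag names are illustrative, and the stage functions are stubbed (recording which stages ran) only so the block is self-contained:

```c
/* Flags held in a plain integer rather than a bit-field. */
#define STAGEA (1u << 0)
#define STAGEB (1u << 1)
#define STAGEC (1u << 2)

typedef unsigned int Stages_v2;

static unsigned int run_log;   /* records which stages ran, for demonstration */

static void dostageA(void) { run_log |= 1; }
static void dostageB(void) { run_log |= 2; }
static void dostageC(void) { run_log |= 4; }

void dostages_v2(Stages_v2 *stages_v2)
{
    Stages_v2 stages = *stages_v2;   /* one memory read, cached in a register */

    if (stages & STAGEA) dostageA();
    if (stages & STAGEB) dostageB();
    if (stages & STAGEC) dostageC();
}
```

Because the flags are copied into a local once, the compiler reads memory a single time and each test is a simple AND against a constant.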
EXAMPLE These functions read a 32-bit integer from a byte stream pointed to by data. The byte streams contain little-endian and big-endian data, respectively. These functions are independent of the ARM memory system byte order since they only use byte accesses.
unsigned int readint_little(unsigned char *data)
{
    unsigned int a0, a1, a2, a3;

    a0 = *(data++);
    a1 = *(data++);
    a2 = *(data++);
    a3 = *(data++);
    return a0 | (a1 << 8) | (a2 << 16) | (a3 << 24);
}

unsigned int readint_big(unsigned char *data)
{
    unsigned int a0, a1, a2, a3;

    a0 = *(data++);
    a1 = *(data++);
    a2 = *(data++);
    a3 = *(data++);
    return (((((a0 << 8) | a1) << 8) | a2) << 8) | a3;
}
If speed is critical, then the fastest approach is to write several variants of the critical routine. For each possible alignment and ARM endianness configuration, you call a separate routine optimized for that situation.
EXAMPLE The read_samples routine takes an array of N 16-bit sound samples at address in. The sound samples are little-endian (for example, from a .wav file) and can be at any byte alignment. The routine copies the samples to an aligned array of short type values pointed to by out.
void read_samples(short *out, unsigned char *in, unsigned int N)
{
    unsigned short *data;      /* aligned input pointer */
    unsigned int sample, next;

    switch ((unsigned int)(long)in & 1)
    {
    case 0: /* the input pointer is aligned */
        data = (unsigned short *)in;
        do
        {
            sample = *(data++);
#ifdef BIG_ENDIAN
            sample = (sample >> 8) | (sample << 8);
#endif
            *(out++) = (short)sample;
        } while (--N);
        break;
    case 1: /* the input pointer is not aligned */
        data = (unsigned short *)(in - 1);
        sample = *(data++);
#ifdef BIG_ENDIAN
        sample = sample & 0xFF;  /* get first byte of sample */
#else
        sample = sample >> 8;    /* get first byte of sample */
#endif
        do
        {
            next = *(data++);
            /* complete one sample and start the next */
#ifdef BIG_ENDIAN
            *out++ = (short)((next & 0xFF00) | sample);
            sample = next & 0xFF;
#else
            *out++ = (short)((next << 8) | sample);
            sample = next >> 8;
#endif
        } while (--N);
        break;
    }
}
The routine works by having different code for each endianness and alignment. Endianness is dealt with at compile time using the BIG_ENDIAN compiler flag. Alignment must be dealt with at run time using the switch statement.
You can make the routine even more efficient by using 32-bit reads and writes rather than 16-bit reads and writes, which leads to four elements in the switch statement, one for each possible address alignment modulo four.
DIVISION
The ARM does not have a divide instruction in hardware. Instead the compiler implements divisions by calling software routines in the C library. There are many different types of division routine that you can tailor to a specific range of numerator and denominator values. We look at assembly division routines in detail in Chapter 7. The standard integer division routine provided in the C library can take between 20 and 100 cycles, depending on implementation, early termination, and the ranges of the input operands.
Division and modulus (/ and %) are such slow operations that you should avoid them as much as possible. However, division by a constant and repeated division by the same denominator can be handled efficiently. This section describes how to replace certain divisions by multiplications and how to minimize the number of division calls.
Circular buffers are one area where programmers often use division, but you can avoid these divisions completely. Suppose you have a circular buffer of size buffer_size bytes and a position indicated by a buffer offset. To advance the offset by increment bytes you could write

offset = (offset + increment) % buffer_size;

Instead it is far more efficient to write

offset += increment;
if (offset >= buffer_size)
{
    offset -= buffer_size;
}

The first version may take 50 cycles; the second will take 3 cycles because it does not involve a division. We've assumed that increment < buffer_size; you can always arrange this in practice.
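Wrapped as a function, the division-free update looks like this (the function name is illustrative):

```c
/* Division-free circular-buffer advance.
   Precondition: increment < buffer_size, as assumed in the text. */
unsigned int advance_offset(unsigned int offset, unsigned int increment,
                            unsigned int buffer_size)
{
    offset += increment;
    if (offset >= buffer_size)
    {
        offset -= buffer_size;   /* one compare and subtract, no division */
    }
    return offset;
}
```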
If you can’t avoid a division, then try to arrange that the numerator and denominator are unsigned integers.
Signed division routines are slower since they take the absolute values of the numerator and denominator and
then call the unsigned division routine. They fix the sign of the result afterwards.
Many C library division routines return the quotient and remainder from the division. In other words, a free remainder operation is available to you with each division operation and vice versa. For example, to find the (x, y) position of a location at offset bytes into a screen buffer, it is tempting to write

typedef struct { int x; int y; } point;

void getxy_v1(point *p, unsigned int offset, unsigned int bytes_per_line)
{
    p->y = offset / bytes_per_line;
    p->x = offset - p->y * bytes_per_line;
}

It appears that we have saved a division by using a subtract and multiply to calculate p.x, but in fact, it is often more efficient to write the function with the modulus or remainder operation.
EXAMPLE
In getxy_v2, the quotient and remainder operations only require a single call to a division routine:
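A sketch of what getxy_v2 might look like, assuming a simple point structure with int x and y fields (this matches the compiler output that follows, which stores p.x at offset 0 and p.y at offset 4):

```c
/* Assumed point type: x first, then y. */
typedef struct { int x; int y; } point;

/* The library division routine returns quotient and remainder together,
   so both fields cost a single division call. */
void getxy_v2(point *p, unsigned int offset, unsigned int bytes_per_line)
{
    p->x = offset % bytes_per_line;   /* the "free" remainder */
    p->y = offset / bytes_per_line;
}
```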
There is only one division call here, as you can see in the following compiler output. In fact, this
version is four instructions shorter than getxy_v1. Note that this may not be the case for all
compilers and C libraries.
getxy_v2
    STMFD r13!,{r4,r14}  ; stack r4, lr
    MOV   r4,r0          ; move p to r4
    MOV   r0,r2          ; r0 = bytes_per_line
    BL    __rt_udiv      ; (r0,r1) = (r1/r0, r1%r0)
    STR   r0,[r4,#4]     ; p.y = offset / bytes_per_line
    STR   r1,[r4,#0]     ; p.x = offset % bytes_per_line
    LDMFD r13!,{r4,pc}   ; return
(x, y, z) → (x/z, y/z)
In these situations it is more efficient to cache the value of 1/z in some way and use a multiplication by 1/z instead of a division. We will show how to do this in the next subsection.
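A minimal fixed-point sketch of the idea, assuming 16.16 precision. This is an approximation, not an exact division: it is only correct while the accumulated rounding error stays below one, so verify it over the operand ranges you care about (here x*recip must also fit in 32 bits, i.e. x < 2^16):

```c
/* Cache a 16.16 fixed-point reciprocal of z, rounded up.
   This costs one real division, done once per denominator. */
unsigned int recip16(unsigned int z)
{
    return (0x10000u + z - 1) / z;    /* ceil(2^16 / z) */
}

/* Divide by multiplying with the cached reciprocal and shifting.
   Approximate: check accuracy for your ranges of x and z. */
unsigned int div_by_recip(unsigned int x, unsigned int recip)
{
    return (x * recip) >> 16;
}
```

For repeated divisions by the same z, the one-off division in recip16 is amortized across many cheap multiply-and-shift operations.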
Introduction - Operating system (OS): An operating system (OS) is a piece of software that controls the overall operation of the computer. It acts as an interface between hardware and application programs. It facilitates the user to format disks; create, print, copy, delete and display files; read data from files; write data to files; control I/O operations; allocate memory locations; and process interrupts. It provides the users an interface to the hardware resources. In a multiuser system it allows several users to share the CPU time and the other system resources, provides inter-task communication, timers, clocks and memory management, and prevents different users from interfering with one another while sharing resources. Hence the OS is also known as a resource manager.
So, the Operating system can also be defined as a collection of system calls or functions which
provide an interface between hardware and application program.
It manages the hardware resources of a computer and hosts the applications that run on the computer. Hence it is also called a resource manager.
An OS typically provides multitasking, synchronization, Interrupt and Event Handling,
Input/Output, Inter-task Communication, Timers and Clocks and Memory Management. The core
of the OS is the Kernel which is typically a small, highly optimized set of libraries.
The Kernel is a program that constitutes the central core of an operating system. It has complete
control over everything that occurs in the system. The Kernel is the first part of the operating
system to load into memory during booting (i.e., system startup), and it remains there for the entire
duration of the session because its services are required continuously.
The kernel provides basic services for all other parts of the operating system, typically including
memory management, process management, file management and I/O (input/output) management
(i.e., accessing the peripheral devices). These services are requested by other parts of the operating
system or by application programs through a specified set of program interfaces referred to as
system calls.
Popular Operating Systems: Windows (from Microsoft), MacOS, MS-Dos, Linux(Open source),
Unix (Multi user-Bell Labs), Xenix (Microsoft), Android (Mobile).
Types of operating systems:
There are three important types of operating systems: (i) embedded operating systems, (ii) real-time operating systems and (iii) handheld operating systems.
(i).Embedded Operating System
The operating system used for embedded computer systems is known as embedded operating
system. These operating systems are designed to be compact, efficient, and reliable.
The embedded operating system uses a preemptive priority-based kernel, but this kernel does not meet strict deadlines. An embedded operating system can be obtained by removing the unnecessary components from the kernel of a desktop operating system, so it occupies less memory space.
The popularly known embedded operating systems are
(a).Embedded NT (b) Windows XP Embedded (c) Embedded
Linux
Embedded NT, in its minimal configuration without network support, occupies nearly 9 MB of RAM and 8 MB of Flash. It is a preemptive, multitasking operating system. Embedded NT is generally preferred to other OSs because of the ease of developing applications for it. It is suitable for embedded systems built around single-board computers for applications like Internet kiosks, Automatic Teller Machines (ATMs), etc.
Microsoft Windows XP Embedded is the successor to Embedded NT. It is also a preemptive multitasking operating system like Embedded NT. This OS is widely used in set-top boxes, point-of-sale terminals, Internet kiosks, etc.
Embedded Linux is open source software covered by the GNU General Public License (GPL), and hence the complete source code is available free of cost. The important features of Embedded Linux are POSIX support and the availability of large software resources.
Embedded Linux is used in embedded computer systems such as mobile phones, personal digital
assistants, media players, set-top boxes, and other consumer electronics devices, networking
equipment, machine control, industrial automation, navigation equipment and medical
instruments.
REAL TIME SYSTEMS: Real-time systems are those systems in which the correctness of the
system depends not only on the Output, but also on the time at which the results are produced
(Time constraints must be strictly followed).
Real time systems are of two types: (i) soft real time systems and (ii) hard real time systems. A soft real time system is one in which the performance of the system is only degraded, not destroyed, if the timing deadlines are not met.
For example: an air conditioner, a TV remote or music player, a bus reservation system, an automated teller machine in a bank, a lift, etc.
A hard real time system is one in which failure to meet the time deadlines may lead to a complete catastrophe or damage to the system.
For example: an air navigation system, a nuclear power plant, failure of car brakes, a gas leakage detection system, RADAR operation, an air traffic control system, etc.
Typical Real Time Applications: Real Time systems find applications in various fields of
science and technology. The prominent applications are (i) Digital Control (ii) command and
control, (iii) Signal processing (iv) Telecommunication systems and (v) Defense etc.
Examples:
In automobile engineering, the real time systems control the engine and brakes of the
vehicle and regulate traffic lights for smooth travel.
In air craft monitoring, the real time systems schedule and monitor the takeoff and landing
of the planes, make it fly, maintain the flight path, and avoid accidents.
Real time patient care systems monitor and regulate the blood pressure and heart beats of a patient; real time systems also entertain people with electronic games, TV and music.
Real time systems are found in Air Traffic Control as well. The Air Traffic Control (ATC) system regulates the flow of flights to each destination airport by assigning each aircraft an arrival time and a route to the destination.
Real time systems are important in industry also. For example, a system of robots performs assembly tasks and repairs in factories or chemical plants where human beings cannot enter.
In the avionics system of a military aircraft, real time systems perform the tracking and ballistic computations and coordinate the RADAR and weapon control systems.
Digital filtering, video and voice compressing/decompression, and radar signal processing
are the major applications of real time systems in signal processing.
Another interesting application is real-time database systems, which cover a diverse spectrum of information systems, ranging from stock price quotation systems to track records databases to real-time file systems.
Real time systems are also found in Supervisory Control and Data Acquisition (SCADA).
In SCADA systems the sensors are placed at different geographical points to collect the
raw data and this data are processed and stored in a Real time data base.
Robots are used in nuclear power stations to handle radioactive material and other dangerous materials.
Real time system applications are also found in office automation where
LASER printers and FAX machines are used.
REAL TIME OPERATING SYSTEM (RTOS): It is an operating system that supports real-time
applications by providing logically correct result within the deadline set by the user. A real time
operating system makes the embedded system into a real time embedded system. The basic
structure of RTOS is similar to regular OS but, in addition, it provides mechanisms to allow real
time scheduling of tasks.
Though real-time operating systems may or may not increase the speed of execution, they provide more precise and predictable timing characteristics than a general-purpose OS.
The figure below shows the embedded system with RTOS.
Not all embedded systems are designed with an RTOS. Low-end application systems do not require an RTOS; only high-end embedded systems that require scheduling need one. For example, an embedded system which measures temperature or humidity does not require any operating system, whereas a mobile phone, RADAR or satellite system used for high-end applications does.
RTOS                     Applications/Features
Windows CE (Microsoft)   Used in small-footprint mobile and connected devices; supported on ARM, MIPS, SH4 & x86 architectures
LynxOS                   Complex, hard real-time applications
Usage - RTOSs are typically used for embedded applications, while general purpose OSs are used for desktop PCs or other general-purpose computers.
Note: Jitter: The Timing error of a task over subsequent iterations of a program or loop is referred
to as jitter. RTOS are optimized to minimize jitter.
Types of kernels:
i. Monolithic kernels provide rich and powerful abstractions of the underlying hardware.
ii. Microkernels provide a small set of simple hardware abstractions and use applications called
servers to provide more functionality
iii. Hybrid (modified Micro kernels) Kernels are much like pure Microkernels, except that they
include some additional code in kernel space to increase performance.
iv. Exo-kernels provide minimal abstractions, allowing low-level hardware access. In Exo-
kernel systems, library operating systems provide the abstractions typically present in monolithic
kernels.
Pre-Emptive and Non-Pre-Emptive: In a normal operating system, if a task is running, it will continue to run until its completion. It cannot be stopped by the OS in the middle for any reason. This concept is known as non-preemptive scheduling.
In a real time OS, a running task can be stopped at any time by a higher priority task, without the consent of the currently running task. This is known as preemption.
So, Preemptive scheduling involves scheduling based on the highest priority. The highest priority
will always be given chance. Non-preemptive scheduling is a process is not interrupted once
started until it is finished.
Initialization of RTOS:
The RTOS is initialized using code of the following form:

void main(void)
{
    InitRTOS();                                  /* initialize the RTOS */
    StartTask(vRespondToButton, HIGH_PRIORITY);  /* create the tasks */
    StartTask(vCalculateTankLevels, LOW_PRIORITY);
    StartRTOS();                                 /* start the RTOS */
}
The heart or nucleus of any RTOS is the kernel. Inside the kernel is the scheduler. It is basically a
set of algorithms which manage the task running order. Multitasking definition comes from the
ability of the kernel to control multiple tasks that must run within time deadlines. Multitasking
may give the impression that multiple threads are running concurrently; as a matter of fact the processor runs task by task, according to the task schedule.
A task is a basic unit or atomic unit of execution that can be scheduled by an RTOS to use the
system resources like CPU, Memory, I/O devices etc. It starts with reading of the input data and of
the internal state of the task, and terminates with the production of the results and updating the
internal state. The control signal that initiates the execution of a task is provided by the operating
system.
There are two types of tasks. (i)Simple Task(S-Task) and (ii) Complex Task(C-Task).
Simple Task (S-task): A simple task is one which has no synchronization point, i.e., whenever an S-task is started, it continues until its termination point is reached. Because an S-task cannot be blocked within the body of the task, the execution time of an S-task does not depend directly on the progress of the other tasks in the node. S-tasks are mainly used in single user systems.
Complex Task (C-Task): A task is called a complex task (C-Task) if it contains a blocking
synchronization statement (e.g., a semaphore operation "wait") within the task body. Such a "wait"
operation may be required because the task must wait until a condition outside the task is satisfied,
e.g., until another task has finished updating a common data structure, or until input from a
terminal has arrived.
Task States:
At any instant of time a task can be in one of the following states: running (currently using the CPU), ready (runnable, but waiting for the CPU) or blocked (waiting for an event or resource).
If no task is ready to run and all of the tasks are blocked, the RTOS will usually run the idle task. An idle task does nothing; it has the lowest priority.
void IdleTask(void)
{
    while (1);
}
Creation of a Task:
A task is characterized by parameters like its name, priority, stack size and operating system options. To create a task these parameters must be specified. A simple call to create a task is given below.

result = task_create("TxTask", 100, 0x4000, OS_PREEMPTABLE); /* create task */
if (result == OS_SUCCESS)
{
    /* task successfully created */
}
Task Scheduler:
The task scheduler is one of the important components of the kernel. Basically it is a set of algorithms that manage the multiple tasks in an embedded system; the various tasks are handled by the scheduler in an orderly manner.
This produces the effect of simple multitasking with a single processor. The advantage of using a
scheduler is the ease of implementing the sleep mode in microcontrollers which will reduce the
power consumption considerably (from mA to µA). This is important in battery operated
embedded systems.
The task scheduler establishes task time slots. Time slot width and activation depends on the
available resources and priorities.
A scheduler decides which task will run next in a multitasking system. Every RTOS provides three specific functions:
(i) scheduling, (ii) dispatching and (iii) inter-process communication and synchronization.
Scheduling determines which task will run next in a multitasking system, dispatching performs the necessary bookkeeping to start that task, and inter-process communication and synchronization ensure that the tasks cooperate with one another.
Scheduling Algorithms: In Multitasking system to schedule the various tasks, different
scheduling algorithms are used. They are (a).First in First out (b).Round Robin algorithm
(c).Round Robin with priority (d) Non-preemptive (e)Pre-emptive.
In the FIFO scheduling algorithm, the tasks which are ready to run are kept in a queue and the CPU serves the tasks on a first-come, first-served basis.
In Round-Robin Algorithm the kernel allocates a certain amount of time for each task waiting in
the queue. For example, if three tasks 1, 2 and 3 are waiting in the queue, the CPU first executes
task1 then task2 then task3 and then again task1.
The round-robin algorithm can be slightly modified by assigning priority levels to the tasks. A
high priority task can interrupt the CPU so that it can be executed. This scheduling algorithm can
meet the desired response time for a high priority task. This is the Round Robin with priority.
In the Shortest-Job-First scheduling algorithm, the task that will take the minimum time to execute is given priority. The disadvantage is that, because this approach tries to satisfy the maximum number of tasks, some long tasks may have to wait forever.
In preemptive multitasking, the highest priority task is always executed by the CPU, by
preempting the lower priority task. All real-time operating systems implement this scheduling
algorithm.
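The core of preemptive priority scheduling - always run the highest-priority ready task - can be sketched as a selection over a toy ready list. All names and the fixed-size array here are illustrative; a real RTOS keeps a priority-sorted queue and re-runs this decision on every scheduling event:

```c
#define MAX_TASKS 4

/* Pick the index of the highest-priority ready task.
   Higher number = higher priority; returns -1 when nothing is ready,
   which corresponds to running the idle task. */
int pick_next_task(const int priority[], const int ready[])
{
    int best = -1;
    for (int i = 0; i < MAX_TASKS; i++)
    {
        if (ready[i] && (best < 0 || priority[i] > priority[best]))
        {
            best = i;
        }
    }
    return best;
}
```

When a high-priority task becomes ready, re-evaluating this choice and switching to the winner is exactly the preemption described above.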
The various function calls provided by the OS API for task management are given below.
Create a task
Delete a task
Suspend a task
Resume a task
Change priority of a task
Query a task
Process or Task:
Embedded program (a static entity) = a collection of firmware modules. When a firmware module
is executing, it is called a process or task . A task is usually implemented in C by writing a
function. A task or process simply identifies a job that is to be done within an embedded
application.
When a process is created, it is allocated a number of resources by the OS, which may include: –
Process stack – Memory address space – Registers (through the CPU) – A program counter (PC) –
I/O ports, network connections, file descriptors, etc.
Threads: A process or task is characterized by a collection of resources that are utilized to
execute a program. The smallest subset of these resources (a copy of the CPU registers including
the PC and a stack) that is necessary for the execution of the program is called a thread. A thread is
a unit of computation with code and context, but no private data.
Multitasking:
A multitasking environment allows applications to be constructed as a set of independent tasks,
each with a separate thread of execution and its own set of system resources. The inter-task
communication facilities allow these tasks to synchronize and coordinate their activity.
Multitasking provides the fundamental mechanism for an application to control and react to
multiple, discrete real-world events and is therefore essential for many real-time applications.
Multitasking creates the appearance of many threads of execution running concurrently when, in
fact, the kernel interleaves their execution on the basis of a scheduling algorithm. This also leads
to efficient utilization of the CPU time and is essential for many embedded applications where
processors are limited in computing speed due to cost, power, silicon area and other constraints. In
a multi-tasking operating system it is assumed that the various tasks are to cooperate to serve the
requirements of the overall system. Co-operation requires that the tasks communicate with each other and share common data in an orderly and disciplined manner, without creating undue contention and deadlocks. The way in which tasks communicate and share data is to be regulated
such that communication or shared data access error is prevented and data, which is private to a
task, is protected. Further, tasks may be dynamically created and terminated by other tasks, as and
when needed.
The faster the ISR can do its job, the better the real time performance of the RTOS. Hence the ISR
should be always as small as possible.
When the CPU receives either a software or hardware interrupt, it executes the corresponding ISR. Before that, all other interrupt sources are disabled, and they are re-enabled only after completion of the ISR. Hence the CPU must execute the ISR as quickly as possible, and the ISR must always be kept as small as possible.
In real-time operating systems, the interrupt latency, interrupt response time and the interrupt
recovery time are very important.
Interrupt Latency: It is the time between the generation of an interrupt by a device and the
servicing of the device which generated the interrupt.
For many operating systems, devices are serviced as soon as the device's interrupt handler is
executed. Interrupt latency may be affected by interrupt controllers, interrupt masking, and the
operating system's (OS) interrupt handling methods.
Interrupt Response Time: Time between receipt of interrupt signal and starting the code that
handles the interrupt is called interrupt response time.
Interrupt Recovery Time: Time required for CPU to return to the interrupted code/highest priority
task is called interrupt recovery time.
Semaphores:
A semaphore is a value or variable which controls the allocation of a resource among different tasks in a parallel programming environment. Semaphores are a useful tool in the prevention of race conditions and deadlocks; however, their use is by no means a guarantee that a program is free from these problems.
Semaphores which allow an arbitrary resource count are called counting semaphores, whilst
semaphores which are restricted to the values 0 and 1 (or locked/unlocked, unavailable/available)
are called binary semaphores.
The operation of a semaphore can be understood from the following diagram.
A binary semaphore is a synchronization object that can have only two states 0 or 1.
Take: Taking a binary semaphore brings it into the "taken" state; trying to take a semaphore that is already taken puts the invoking thread into a waiting queue.
Release: Releasing a binary semaphore brings it into the "not taken" state if there are no queued threads. If there are queued threads, then a thread is removed from the queue and resumed, and the binary semaphore remains in the "taken" state. Releasing a semaphore that is already in its "not taken" state has no effect.
Binary semaphores have no ownership attribute and can be released by any thread or interrupt handler regardless of which thread performed the last take operation. Because of this, binary semaphores are often used to synchronize threads with external events implemented as ISRs, for example waiting for a packet from a network or waiting for a button to be pressed. Because there is no ownership concept, a binary semaphore object can be created to be in either the "taken" or the "not taken" state initially.
Counting Semaphores:
A counting semaphore is a synchronization object that can have an arbitrarily large number of
states. The internal state is defined by a signed integer variable, the counter.
The counter value (N) has a precise meaning:
A negative value indicates that there are exactly -N threads queued on the semaphore.
A zero value indicates that there are no waiting threads; a wait operation would queue the invoking thread.
A positive value indicates that there are no waiting threads; a wait operation would not queue the invoking thread.
Two operations are defined for counting semaphores.
Wait: This operation decreases the semaphore counter. If the result is negative, then the invoking thread is queued.
Signal: This operation increases the semaphore counter. If the result is non-negative, then a waiting thread is removed from the queue and resumed.
Counting semaphores have no ownership attribute and can be signaled by any thread or interrupt
handler regardless of who performed the last wait operation. Because there is no ownership concept,
a counting semaphore object can be created with any initial counter value as long as it is non-negative.
Counting semaphores are usually used as guards of resources available in a discrete quantity.
For example, the counter may represent the number of used slots in a circular queue: producer
threads "signal" the semaphore when inserting items into the queue, and consumer threads
"wait" for an item to appear in the queue. This ensures that no consumer can fetch an
item from the queue if no items are available.
The OS function calls provided for Semaphore management are
Create a semaphore
Delete a semaphore
Acquire a semaphore
Release a semaphore
Query a semaphore
Mutexes:
Mutex means mutual exclusion. A mutex is a synchronization object that can have only two states:
not-owned and owned. Two operations are defined for mutexes.
Lock: This operation attempts to take ownership of a mutex; if the mutex is already owned by
another thread, then the invoking thread is queued.
Unlock: This operation relinquishes ownership of a mutex. If there are queued threads, then a thread
is removed from the queue and resumed, and ownership is implicitly assigned to that thread.
A mutex is basically a locking mechanism: a process locks a resource using the mutex, and as long as
the process holds the mutex, no other process can use the same resource (mutual exclusion). Once the
process is done with the resource, it releases the mutex. Here comes the concept of ownership: a mutex
is locked and released by the same process/thread. It cannot happen that a mutex is acquired by one
process and released by another.
So, unlike semaphores, mutexes have owners. A mutex can be unlocked only by the thread that
owns it. Most RTOSs implement this protocol in order to address the Priority Inversion problem.
Semaphores can also handle mutual exclusion problems but are best used as a communication
mechanism between threads or between ISRs and threads.
The OS function calls provided for mutex management are
Create a mutex
Delete a mutex
Acquire a mutex
Release a mutex
Query a mutex
Wait on a mutex
Difference between Mutex & Semaphore: Mutexes are typically used to serialize access to a
section of re-entrant code that cannot be executed concurrently by more than one thread. A mutex
object only allows one thread into a controlled section, forcing other threads which attempt to gain
access to that section to wait until the first thread has exited from that section.
A semaphore restricts the number of simultaneous users of a shared resource up to a maximum
number. Threads can request access to the resource (decrementing the semaphore), and can signal
that they have finished using the resource (incrementing the semaphore).
Mailboxes:
One of the important kernel services used to send messages to a task is the message mailbox. A
mailbox is basically a pointer-size variable. Tasks or ISRs can deposit and receive messages (the
pointer) through the mailbox.
A task looking for a message from an empty mailbox is blocked and placed on a waiting list, either
for a time (a timeout specified by the task) or until a message is received. When a message is sent to
the mailbox, the highest-priority task waiting for the message receives it in a priority-based
mailbox, or the first task to request the message receives it in a FIFO-based mailbox.
The operation of a mailbox object is similar to our postal mailbox. When someone posts a message
in our mailbox, we take out the message.
A task can have a mailbox into which others can post a mail. A task or ISR sends the message to the
mailbox.
To manage the mailbox object, the following function calls are provided in the OS API:
Create a mailbox
Delete a mailbox
Query a mailbox
Post a message in a mailbox
Read a message from a mailbox.
Message Queues:
Message queues are used to send one or more messages to a task, i.e. to establish inter-task
communication. Basically, a queue is an array of mailboxes. Tasks and ISRs can send messages to
and receive messages from a queue through services provided by the kernel.
Extraction of messages from a queue follows a FIFO or LIFO discipline.
Applications of message queue are
In each of these applications, a task or an ISR deposits the message in the message queue. Other
tasks can take the messages. Based on our application, the highest priority task or the first task
waiting in the queue can take the message. At the time of creating a queue, the queue is given
name or ID, queue length, sending task waiting list and receiving task waiting list.
To use a message queue, it must first be created. The creation of a queue returns a queue ID, so if
any task wishes to post a message to another task, it should use that queue ID.
qid = queue_create("MyQueue", queue_options); /* queue name and OS-specific
creation options */
Each queue can be configured as a fixed size/variable size.
The following function calls are provided to manage message queues
Create a queue
Delete a queue
Flush a queue
Post a message in queue
Post a message in front of queue
Read message from queue
Broadcast a message
Show queue information
Show queue waiting list.
Event Registers:
Some kernels provide a special register as part of each task's control block. This register, called an
event register, consists of a group of binary event flags used to track the occurrence of specific
events. Depending on a given kernel's implementation of this mechanism, an event register can be 8,
16, or 32 bits wide, maybe even more.
Each bit in the event register is treated like a binary flag and can be either set or cleared.
Through the event register, a task can check for the presence of particular events that can control its
execution. An external source, such as a task or an ISR, can set bits in the event register to inform
the task that a particular event has occurred.
For managing the event registers, the following function calls are provided:
Pipes:
Pipes are kernel objects that are used to exchange unstructured data and facilitate synchronization
among tasks. In a traditional implementation, a pipe is a unidirectional data exchange facility, as
shown in the figure below.
Two descriptors, one for each end of the pipe (one end for reading and one for writing), are returned
when the pipe is created. Data is written via one descriptor and read via the other. The data remains
in the pipe as an unstructured byte stream.
Data is read from the pipe in FIFO order. A pipe provides a simple data flow facility so that the
reader becomes blocked when the pipe is empty, and the writer becomes blocked when the pipe is
full. Typically, a pipe is used to exchange data between a data-producing task and a data-consuming
task, as shown in the figure below. It is also permissible to have several writers and multiple
readers on a pipe.
The following function calls are provided to manage pipes:
Create a pipe
Open a pipe
Close a pipe
Read from the pipe
Write to the pipe
Signals-Signal Handler
A signal is an event indicator. It is a software interrupt that is generated when an event occurs. It
diverts the signal receiver from its normal execution path and triggers the associated asynchronous
processing. Mainly, signals notify tasks of events that occurred during the execution of other
tasks or ISRs. The difference between a signal and a normal interrupt is that signals are so-called
software interrupts, which are generated via the execution of some software within the system. By
contrast, normal interrupts are usually generated by the arrival of an interrupt signal on one of the
CPU's external pins. They are not generated by software within the system but by external devices.
The number and type of signals defined is both system-dependent and RTOS- dependent. An easy
way to understand signals is to remember that each signal is associated with an event. The event can
be either unintentional, such as an illegal instruction encountered during program execution, or the
event may be intentional, such as a notification to one task from another that it is about to terminate.
While a task can specify the particular actions to undertake when a signal arrives, the task has no
control over when it receives signals. Consequently, signal arrivals often appear quite random.
When a signal arrives, the task is diverted from its normal execution path, and the corresponding
signal routine is invoked. The terms signal routine, signal handler, asynchronous event handler, and
asynchronous signal routine are inter-changeable. Each signal is identified by an integer value,
which is the signal number or vector number.
The function calls provided to manage timers are
Get time
Set time
Time delay (in system clock ticks)
Time delay (in seconds)
Reset timer
Memory Management:
It is a service provided by the kernel which allocates the memory needed, either static or dynamic, for
various processes. The memory manager allocates memory to the processes, manages it with
appropriate protection, and optimizes the memory needs and memory utilization. An RTOS may
disable support for dynamic block allocation, MMU support for dynamic page allocation, and
dynamic binding, as these increase the latency of servicing the tasks and ISRs.
Hence the two functions malloc() and free(), although available in the C language, are often avoided
by embedded engineers because of this latency problem.
So, an RTOS may or may not support memory protection in order to reduce the latency and
memory needs of the processes.
The API provides the following function calls to manage memory
Priority Inversion:
In scheduling, priority inversion is the scenario where a low-priority task holds a shared resource
that is required by a high-priority task. This causes the execution of the high-priority task to be
blocked until the low-priority task releases the resource, effectively "inverting" the relative
priorities of the two tasks.
If some other medium-priority task, one that does not depend on the shared resource, attempts to
run in the interim, it will take precedence over both the low-priority task and the high-priority task.
The consequences of the priority Inversion are
Saving memory:
Program Memory:
Data Memory:
• Make sure that you are not using two functions to do the same thing.
• Check that your development tools are not sabotaging you.
• Configure your RTOS to contain only those functions that you need.
• Look at the assembly language listings created by your cross-compiler to see if certain
of your C statements translate into huge numbers of instructions.
Saving power:
• The primary method for preserving battery power is to turn off parts or all of the system
whenever possible.
• Most embedded-system microprocessors have at least one power-saving mode; many
have several.
• The modes have names such as sleep mode, low-power mode, idle mode, standby
mode, and so on.
• A very common power-saving mode is one in which the microprocessor stops executing
instructions, stops any built-in peripherals, and stops its clock circuit. This saves a lot of
power, but the drawback typically is that the only way to start the microprocessor up
again is to reset it.
• Static RAM uses very little power when the microprocessor isn't executing instructions.
• Another typical power-saving mode is one in which the microprocessor stops executing
instructions but the on-board peripherals continue to operate.
• Another common method for saving power is to turn off the entire system and have the
user turn it back on when it is needed.
Shared memory:
In this model, information stored in a shared region of memory is processed, possibly under the
control of a supervisor process.
An example might be multiple cores sharing memory on one chip.
Message Passing:
In this model, data is shared by sending and receiving messages between co-operating processes,
using system calls. Message Passing is particularly useful in a distributed environment where the
communicating processes may reside on different, network connected, systems. Message passing
architectures are usually easier to implement but are also usually slower than shared memory
architectures.
An example might be a networked cluster of nodes
Typically Inter-Process Communication is built on two operations, send() and receive() involving
communication links created between co-operating processes.
Remote Procedure Call (RPC):
It can be regarded as a special case of the message-passing model. It has become widely accepted
because of the following features: simple call syntax and similarity to local procedure calls; ease of
use, efficiency, and generality. It can be used as an IPC mechanism between processes on different
machines and also between different processes on the same machine.
Sockets:
Sockets (Berkeley sockets) are one of the most widely used communication APIs. A socket is an
object through which messages are sent and received; it is a network communication endpoint.
In connection-based communication such as TCP, a server application binds a socket to a specific port
number. This has the effect of registering the server with the system to receive all data destined for that
port. A client can then rendezvous with the server at the server's port. Data transfer operations on
sockets work just like read and write operations on files. A socket is closed, just like a file, when
communication is finished.
Network communications are conducted through a pair of cooperating sockets, each known as the peer
of the other.
Processes connected by sockets can be on different computers (known as a heterogeneous environment)
that may use different data representations. Data is serialized into a sequence of bytes by the local
sender and deserialized into a local data format at the receiving end.
Task Synchronization:
All the tasks in a multitasking operating system work together to solve a larger problem, and to
synchronize their activities they occasionally communicate with one another.
For example, in the printer-sharing device the printer task doesn't have any work to do until new data
is supplied to it by one of the computer tasks. So the printer and the computer tasks must communicate
with one another to coordinate their access to common data buffers. One way to do this is to use a data
structure called a mutex. Mutexes are mechanisms provided by many operating systems to assist with
task synchronization.
A mutex is a multitasking-aware binary flag. It is because the processes of setting and clearing the
binary flag are atomic (i.e. these operations cannot be interrupted). When this binary flag is set, the
shared data buffer is assumed to be in use by one of the tasks. All other tasks must wait until that flag is
cleared before reading or writing any of the data within that buffer.
The atomicity of the mutex set and clear operations is enforced by the operating system, which disables
interrupts before reading or modifying the state of the binary flag.
Device drivers:
Device drivers simplify access to devices: they hide device-specific details as much as possible and
provide a consistent way to access different devices.
A device driver USER only needs to know the (standard) interface functions, without knowledge of the
physical properties of the device.
A device driver DEVELOPER needs to know the physical details and provides the interface functions as
specified.
UNIT-IV
EMBEDDED SOFTWARE DEVELOPMENT TOOLS
Contents at a glance:
DEBUGGING TECHNIQUES
• Host:
– A computer system on which all the programming tools run
– Where the embedded software is developed, compiled, tested, debugged, and
optimized prior to its translation for the target device.
• Target:
– After the program is written, compiled, assembled, and linked, it is moved to
the target
– After development, the code is cross-compiled, cross-assembled, linked into
the target processor's instruction set, and located into the target.
• Suppose the native compiler on a Windows NT system is based on the Intel Pentium. This
compiler can be used if the target microprocessor is also an Intel Pentium, but not if the
target microprocessor is something else, such as a Motorola or Zilog part.
• A cross-compiler is a program that runs on the host system and produces binary instructions
that will be understood by your target microprocessor. If we write C/C++ source code that
compiles under the native compiler and runs on the host, we can compile the same source
code with the cross-compiler and make it run on the target as well.
• That may not be possible in all cases: there is no problem with if, switch, and loop
statements for both compilers, but there may be errors with respect to the following:
Function declarations
Data sizes may be different on the host and the target
Data structures may be laid out differently on the two machines
The ability to access 16- and 32-bit entities differs between the two machines
Sometimes the cross-compiler may report a warning or error that the native compiler does not.
The figure shows the process of building software for an embedded system.
As you can see in the figure, the output files from each tool become the input files for the next.
Because of this, the tools must be compatible with each other.
A set of tools that is compatible in this way is called a tool chain. Tool chains run on various
hosts and build programs for various targets.
II. LINKER/LOCATORS FOR EMBEDDED SOFTWARE:
• Linker:
• Locator:
• produces target machine code (which the locator glues into the RTOS)
and the combined code (called map) gets copied into the target ROM
Linking Process shown below:
• The native linker creates a file on the disk drive of the host system that is read by a
part of the operating system called the loader whenever the user requests to run the
program.
• The loader finds memory into which to load the program and copies the program from
the disk into the memory.
• Address Resolution:
• The above figure shows the process of building application software with native tools. One
problem the tool chain must solve is that many microprocessor instructions contain the
addresses of their operands.
• In the figure, the MOVE instruction in ABBOTT.C that will load the value of the variable
idunno into register R1 must contain the address of that variable. Similarly, the CALL
instruction must contain the address of whosonfirst. The process of solving this problem is
called address resolution.
• When the abbott.c file is compiled, the compiler has no idea what the addresses of idunno
and whosonfirst() are; it just compiles both separately and leaves them as object files for the
linker.
• The linker decides how the address of idunno and the call to whosonfirst() must be patched.
When the linker puts the two object files together, it figures out where idunno and
whosonfirst() are in relation to one another and places them in the executable file.
• The loader then copies the program into memory and knows exactly where idunno and
whosonfirst() are in memory. This whole process is called address resolution.
Therefore the locator must know where the program will reside and fix up all memory references.
Locators have mechanisms that allow you to tell them where the program will be in the target system.
Locators use any number of different output file formats.
The tools you use to load your program into the target must understand whatever file format your
locator produces.
Another issue that locators must resolve in the embedded environment is that some parts of the
program need to end up in the ROM and some parts need to end up in RAM.
For example, whosonfirst() ends up in ROM, since it must be remembered even when power is off. The
variable idunno would have to be in RAM, since its data may be changed.
This issue does not arise with application programming, because the loader copies the entire program
into RAM.
Most tool chains deal with this problem by dividing programs into segments. Each segment is a
piece of the program that the locator can place in memory independently of the other segments.
Segments solve other problems too; for example, when the processor powers on, the embedded
system programmer must ensure that the first instruction is at a particular place, and segments
make this possible.
The linker/locator reshuffles these segments: it places the start-up code in Z.asm where the
processor begins its execution, places the code segment in ROM, and places the data segment in RAM.
Most compilers automatically divide the module into two or more segments: the instructions
(code), uninitialized data, initialized data, and constant strings. Cross-assemblers also allow you to
specify the segment or segments into which the output from the assembler should be placed.
The locator places the segments in memory. The following two lines of instructions tell one
commercial locator how to build the program.
Consider the following code fragment (FREQ_DEFAULT is an illustrative constant):
int ifreq = FREQ_DEFAULT;   /* first case: declaration with an initial value */
void setfreq(int freq)      /* second case: run-time assignment */
{
    ifreq = freq;
}
The question is where the variable ifreq must be stored. In the first case, the initial value of ifreq
must reside in ROM (this is the only memory that stores data while the power is off). In the second
case, ifreq must be in RAM, because setfreq() changes it frequently.
The only solution to the problem is to store the variable in RAM, store the initial value in ROM,
and copy the initial value into the variable at start-up. A loader ensures that each initialized
variable has the correct initial value when it loads the program; but there is no loader in an
embedded system, so the application must itself arrange for initial values to be copied into variables.
The locator deals with this by creating a shadow segment in ROM that contains all of the initial
values, a segment that is copied to the real initialized-data segment at start-up. When an embedded
system is powered on, the contents of the RAM are garbage; they only become all zeros if some
start-up code in the embedded system sets them to zero.
Locator Maps:
• Most locators will create an output file, called a map, that lists where the locator
placed each of the segments in memory. These are useful for debugging.
• An "advanced" locator is capable of running start-up code in ROM, which loads the
embedded code from ROM into RAM so that it executes quickly, since RAM is
faster.
A locator map is shown below:
– PROM programmers
– ROM emulators
– In circuit emulators
– Flash
– Monitors
PROM Programmers:
The classic way to get the software from the locator output file into the target system is by
creating a ROM or PROM.
Creating ROMs is appropriate when software development has been completed, since the
cost to build ROMs is quite high. Putting the program into a PROM requires a device
called a PROM programmer.
A PROM is appropriate if the software is small enough and if you plan to make changes to
the software and debug it. To do this, place the PROM in a socket on the target rather than
soldering it directly into the circuit (as the following figure shows). When you find a bug,
you can remove the PROM containing the buggy software from the target and put it into
the eraser (if it is an erasable PROM) or into the waste basket. Then program a new PROM
with the bug-fixed software and put that PROM in the socket. You need a small,
inexpensive tool called a chip puller to remove the PROM from the socket; you can insert
the PROM into the socket with no tool other than your thumb (see figure 8). If the PROM
programmer and the locator are from different vendors, it is up to us to make them
compatible.
In circuit emulators:
If we want to debug the software, then we can use overlay memory which is a common
feature of in-circuit emulators. In-circuit emulator is a mechanism to get software into target
for debugging purposes.
Flash:
If your target stores its program in flash memory, then one option you always have is to place
the flash memory in a socket and treat it like an EPROM. However, if the target has a serial
port, a network connection, or some other mechanism for communicating with the outside
world, flash memories open up another possibility: you can write a piece of software to
receive new programs from your host across the communication link and write them into the
flash memory. Although this may seem difficult to arrange, it is often worthwhile.
The reasons for loading new programs from the host are:
You can load new software into your system for debugging without pulling a chip out of its
socket and replacing it.
Downloading new software is a faster process than removing the chip, programming it, and
returning it to the socket.
Customers can load new versions of the software onto your product.
Monitors:
It is a program that resides in target ROM and knows how to load new programs onto the
system. A typical monitor allows you to send the software across a serial port, stores it
in the target RAM, and then runs it. Sometimes monitors also act as locators and offer a few
debugging services, such as setting breakpoints and displaying memory and register values.
You can write your own monitor program.
DEBUGGING TECHNIQUES
I. Testing on host machine
II. using laboratory tools
III. an example system
Introduction:
While developing embedded system software, the developer will write code with lots of bugs
in it. The testing and quality-assurance process may reduce the number of bugs by some
factor, but the only way to ship a product with fewer bugs is to write software with fewer
bugs. The world is extremely intolerant of buggy embedded systems.
Testing and debugging therefore play a very important role in the embedded software
development process.
BUT: It is impossible to exercise all the code on the target. For example, a laser printer may
have code to deal with the situation that arises when the user presses one of the buttons just
as the paper jams; but to test this case in real time, we have to make the paper jam and then
press the button within a millisecond, which is not very easy to do.
Example: In a bar-code scanner that shows the previous scan results on every scan, the bug
will be difficult to find and fix.
Leave an "audit trail" of test results:
Knowing that a telegraph "seems to work" in the network environment is not as valuable as
knowing exactly what it sends and receives, so it is valuable to store what it is sending and
receiving.
BUT: It is difficult to keep track of the results we got, because embedded systems do not
have a disk drive.
Conclusion: Don't test on the target, because it is difficult to achieve the goals by testing
software on the target system. The alternative is to test your code on the host system.
The hardware and hardware dependent code has been replaced with test scaffold software on
the right side. The scaffold software provides the same entry points as does the hardware
dependent code on the target system, and it calls the same functions in the hardware
independent code. The scaffold software takes its instructions from the keyboard or from a
file; it produces output onto the display or into the log file.
Conclusion: Using this technique you can design clean interface between hardware
independent software and rest of the code.
Calling Interrupt Routines by scaffold code:
Tasks are executed based on the occurrence of interrupts. Therefore, to make the system
do anything in the test environment, the test scaffold must execute the interrupt routines.
Interrupts have two parts: one that deals with the hardware (the hardware-dependent part)
and one that deals with the rest of the system (the hardware-independent part).
# frame arrives
# dst src ctrl
mr/56 ab
# backoff timeout expires
kt0
# timeout expires again
kt0
# some time passes
kn2
kn2
# another beacon frame arrives
Each command in this script file causes the test scaffold to call one of the interrupts in the
hardware independent part.
In response to the kt0 command, the test scaffold calls one of the timer interrupt routines. In
response to the command kn followed by a number, the test scaffold calls a different timer
interrupt routine the indicated number of times. The command mr causes the test scaffold to
write the data into memory.
Features of script files:
The commands are simple two- or three-letter commands, so the parser can be written
quickly.
Comments are allowed; comments in a script file indicate what is being tested, indicate what
results you expect, give version-control information, etc.
Data can be entered in ASCII or in hexadecimal.
Targets that have their radios turned off and tuned to different frequencies do not receive the
frame.
The scaffold simulates the interference that prevents one or more stations from receiving the
data. In this way the scaffold tests various pieces of software communication properly with
each other or not.(see the above figure).
Volt meters:
A voltmeter is for measuring the voltage difference between two points. The common use of a
voltmeter is to determine whether or not chips in the circuit have power. A system can suffer power
failure for any number of reasons: broken leads, incorrect wiring, etc. The usual way to use a
voltmeter is to turn on the power, put one of the meter probes on a pin that should be attached
to VCC and the other on a pin that should be attached to ground. If the voltmeter does not indicate
the correct voltage, then we have a hardware problem to fix.
Ohm Meters:
An ohmmeter is used for measuring the resistance between two points; its most common use is to
check whether two things are connected or not. If one of the address signals from the
microprocessor may not be connected to the RAM, turn the circuit off and then put the two
probes on the two points to be tested. If the ohmmeter reads out 0 ohms, it means that there is no
resistance between the two probes and that the two points on the circuit are therefore connected.
The product commonly known as a multimeter functions as both a voltmeter and an ohmmeter.
Oscilloscopes:
It is a device that graphs voltage versus time: time and voltage are graphed on the horizontal and
vertical axes, respectively. It is an analog device that shows exact voltages, not just low or high.
Features of Oscilloscope:
You can monitor one or two signals simultaneously.
You can adjust time and voltage scales fairly wide range.
You can adjust which vertical level on the oscilloscope screen corresponds to ground.
With the use of a trigger, the oscilloscope starts graphing. For example, we can tell the oscilloscope
to start graphing when the signal reaches 4.25 volts and is rising.
Oscilloscopes are extremely useful for hardware engineers, but software engineers use them for the
following purposes:
1. An oscilloscope can be used as a voltmeter: if the voltage on a signal never changes, it will
display a horizontal line whose location on the screen tells the voltage of the signal.
2. If the line on the oscilloscope display is flat, then no clock signal is reaching the microprocessor
and it is not executing any instructions.
3. Use the oscilloscope to see whether a signal is changing as expected.
4. Observing a digital signal whose transitions between VCC and ground are slow or malformed
shows that there is a hardware bug.
Figure 3 is a sketch of a typical oscilloscope. It has probes used to connect the oscilloscope to
the circuit; the probes usually have sharp metal ends that are held against a signal on the circuit.
Witch's caps fit over the metal points and contain a little clip that holds the probe in the circuit.
Each probe has a ground lead, a short wire that extends from the head of the probe and can easily
be attached to the circuit. The instrument has numerous adjustment knobs and buttons that allow
you to control it; some models have on-screen menus and a set of function buttons along the side
of the screen.
4(a): A Reasonable clock signal
Logic Analyzers:
This tool is similar to an oscilloscope in that it captures signals and graphs them on its screen, but
it differs from an oscilloscope in several fundamental ways:
A logic analyzer can track many signals simultaneously.
The logic analyzer only knows two voltages, VCC and ground. If the voltage is in between
VCC and ground, then the logic analyzer will report it as VCC or ground, not as an exact
voltage.
All logic analyzers are storage devices: they capture signals first and display them later.
Logic analyzers have much more complex triggering mechanisms than oscilloscopes.
Logic analyzers operate in state mode as well as timing mode.
Figure 7 shows a typical logic analyzer. Logic analyzers have display screens similar to those of
oscilloscopes. Most present menus on the screen and give you a keyboard to enter
choices; some have a mouse as well as network connections so they can be controlled from workstations.
Logic analyzers often include hard disks and diskette drives. Since a logic analyzer can attach to many
signals simultaneously, one or more ribbon cables typically connect it to the circuit under test.
In-circuit emulators:
In-circuit emulators, also called emulators or ICEs, replace the processor in the target system.
The ICE appears to the target as the processor: it connects to all of the signals and drives them. It can
perform debugging tasks: you can set breakpoints, and after a breakpoint is hit you can examine the contents
of memory and registers, view the source code, and resume execution. Emulators are extremely useful because
they combine the power of a debugger with that of a logic analyzer. Advantages of logic analyzers over emulators:
Logic analyzers have better trace filters and more sophisticated triggering
mechanisms.
Logic analyzers also run in timing mode.
Logic analyzers work with any microprocessor.
With a logic analyzer you can hook up as many or as few connections as you
like. With an emulator you must connect all of the signals.
Emulators are more invasive than logic analyzers.
Software-only Monitors:
One widely available debugging tool is often called a monitor. Monitors allow you to run software
on the actual target while providing a debugging interface similar to that of an in-circuit emulator.
Monitors typically work as follows:
One part of the monitor is a small program that resides in ROM on the target; it knows
how to receive software over a serial port or a network, copy it into RAM, and run
it. Other names for this part are target agent, debugging kernel, and so on.
Another part of the monitor runs on the host. It communicates with the debugging kernel and
provides the debugging interface, through a serial port or a communication network.
You write your modules and compile or assemble them.
The program on the host cooperates with the debugging kernel to download the compiled
modules into the target system's RAM.
You can then instruct the monitor to set breakpoints, run the system, and so on.
As the figure above shows, monitors are extraordinarily valuable: they give you a debugging interface without
requiring any special hardware.
Disadvantages of Monitors:
The target hardware must have a communication port to connect the debugging kernel
to the host program, and you may need to write the communication driver to get the monitor
working.
At some point you have to remove the debugging kernel from your target system and try to
run the software without it.
Most monitors are incapable of capturing traces the way logic analyzers and
emulators can.
Once a breakpoint is hit, stopping the execution can badly disrupt real-time operations.
Other Monitors:
Two other mechanisms are used to construct monitors; they differ from a normal monitor
in how they interact with the target. The first target interface is through a ROM emulator.
It handles downloading programs to the target and allows the host program to set breakpoints and
use various other debugging techniques.
Advantages of JTAG:
No communication port is needed on the target for the debugging process.
This mechanism does not depend on the hardware design.
No additional software is required in ROM.
UNIT V
Unit V contents at a glance:
The CPU has several internal registers that store values used internally. One of those registers is the
program counter (PC), which holds the address in memory of an instruction.
The program counter does not directly determine what the machine does next, but only
indirectly, by pointing to an instruction in memory.
2. Harvard architecture:
A Harvard machine has separate memories for data and program.
The program counter points to program memory, not data memory.
As a result, it is harder to write self-modifying programs (programs that write data values, then use
those values as instructions) on Harvard machines.
Advantage:
The separation of program and data memories provides higher performance for digital signal
processing.
ARM instructions are written one per line, starting after the first column.
Comments begin with a semicolon and continue to the end of the line.
A label, which gives a name to a memory location, comes at the beginning of the line, starting
in the first column.
Here is an example:
LDR r0,[r8] ; a comment
label ADD r4,r0,r1
Memory Organization in ARM Processor:
The ARM architecture supports two basic types of data:
The standard ARM word is 32 bits long.
The word may be divided into four 8-bit bytes.
The ARM processor can be configured at power-up to address the bytes in a word
in either little-endian mode (with the lowest-order byte residing in the low-order
bits of the word) or big-endian mode (with the lowest-order byte residing in the
high-order bits of the word).
ARM is a load-store architecture—data operands must first be loaded into the CPU and then
stored back to main memory to save the results
ADD r0,r1,r2
This instruction sets register r0 to the sum of the values stored in r1 and r2.
ADD r0,r1,#2 (immediate operands are allowed in addition)
Multiplication:
No immediate operand is allowed in multiplication.
The two source operands must be different registers.
MLA: The MLA instruction performs a multiply-accumulate operation, which is particularly useful in matrix
operations and signal processing.
MLA r0,r1,r2,r3 sets r0 to the value r1 × r2 + r3.
Shift operations:
Logical shifts (LSL, LSR)
Arithmetic shifts (ASL, ASR)
A left shift moves bits up toward the most-significant bit; a right shift moves bits down toward the
least-significant bit of the word.
The LSL and LSR modifiers perform left and right logical shifts, filling the vacated
bits of the operand with zeroes.
The arithmetic shift left is equivalent to an LSL, but the ASR copies the sign
bit into the vacated high-order bits—if the sign is 0, a 0 is copied, while if the sign is 1, a 1 is copied.
For instance, LDR r0,[r1,#16] loads r0 with the value stored at location r1+16 (r1 is the base address,
16 is the offset).
Auto-indexing updates the base register: LDR r0,[r1,#16]! first adds 16 to the value of
r1 and then uses that new value as the address. The ! operator causes the base register to be updated
with the computed address so that it can be used again later.
Post-indexing does not perform the offset calculation until after the fetch has been performed.
Consequently, LDR r0,[r1],#16 will load r0 with the value stored at the memory location whose address is
given by r1, and then add 16 to r1, setting r1 to the new value.
FLOW OF CONTROL INSTRUCTIONS
(Branch Instructions):
1. Conditional branches (e.g., BGE—B is branch, GE is the greater-than-or-equal condition)
2. Unconditional branches (B)
The branch instruction B #100 will add 400 to the current PC value, because the offset is counted in
32-bit instruction words (100 words × 4 bytes).
SHARC Processor:
4. base-plus-offset mode
5. Circular Buffers
expression : y = a*(b+c);
program:
R1 = DM(_b); ! Load b
R2 = DM(_c); ! Load c
R2 = R1 + R2; ! b + c
R0 = DM(_a); ! Load a
R2 = R2 * R0; ! a * (b + c)
DM(_y) = R2; ! Store result in y
SHARC jump:
Unconditional flow of control change:
JUMP foo
Three addressing modes: direct, indirect, and PC-relative.
I. BUS PROTOCOLS:
For data communication between different peripheral components, the following bus standards are
used:
VME
PCI
ISA etc.
For distributed embedded applications, the following interconnection network protocols are available:
I2C
CAN etc.
I2C:
The I2C bus is a well-known bus commonly used to link microcontrollers into systems.
I2C is designed to be low cost, easy to implement, and of moderate speed: up to 100 kbits/s for
the standard bus and up to 400 kbits/s for the extended bus.
It uses only two lines: the serial data line (SDL) for data and the serial clock line (SCL), which
indicates when valid data are on the data line.
The basic electrical interface of I2C to the bus is shown in Figure
A pull-up resistor keeps the default state of the signal high, and transistors are used in each bus
device to pull down the signal when a 0 is to be transmitted.
Open collector/open drain signaling allows several devices to simultaneously write the bus
without causing electrical damage.
The open collector/open drain circuitry allows a slave device to stretch a clock signal during a
read from a slave.
The master is responsible for generating the SCL clock, but the slave can stretch the low period
of the clock
The I2C bus is designed as a multimaster bus—any one of several different devices may act as
the master at various times.
As a result, there is no global master to generate the clock signal on SCL. Instead, a master
drives both SCL and SDL when it is sending data. When the bus is idle, both SCL and SDL
remain high.
The address 0000000 is used to signal a general call or bus broadcast, which can be used to address
all devices simultaneously. A bus transaction comprises a series of 1-byte transmissions: an
address followed by one or more data bytes.
Data-push programming:
I2C encourages a data-push programming style. When a master wants to write to a slave, it
transmits the slave's address followed by the data.
Since a slave cannot initiate a transfer, when the master wants to read, it must send a read request
with the slave's address and let the slave transmit the data.
Therefore, an address transmission includes the 7-bit address and 1 bit for the data direction: 0 for
writing from the master to the slave and 1 for reading from the slave to the master.
Bus transaction or transmission process:
1) start signal (SDL goes from 1 to 0 while SCL is high)
2) followed by the 7-bit device address
3) R/W (read/write) bit set to either 0 or 1
4) after the address, the data bytes are sent
5) after the complete data has been transmitted, the transmission stops.
The figure below shows write and read bus transactions:
In CAN terminology, a logical 1 on the bus is called recessive and a logical 0 is dominant.
The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls
the bus down (making 0 dominant over 1).
When all nodes are transmitting 1s, the bus is said to be in the recessive state; when a node
transmits a 0, the bus is in the dominant state. Data are sent on the network in packets known as
data frames.
The first field in the packet contains the packet's destination address and is known as the
arbitration field. The destination identifier is 11 bits long.
The trailing remote transmission request (RTR) bit is set to 1 (recessive) when the frame is used to
request data from the device specified by the identifier; when RTR = 0 (dominant), the frame carries
data to the destination identifier.
The control field provides an identifier extension bit and a 4-bit length code for the data field, with a
reserved bit in between. The data field is from 0 to 8 bytes (0 to 64 bits), depending on the value given
in the control field.
A cyclic redundancy check (CRC) is sent after the data field for error detection.
The acknowledge field is used to let the receivers signal whether the frame was correctly received:
the sender puts a recessive bit (1) in the ACK slot of the acknowledge field, and any receiver that
received the frame correctly overwrites it with a dominant (0) bit.
If the sender still sees a recessive bit (1) on the bus in the ACK slot, it knows that no node received the
frame and it must retransmit. The ACK slot is followed by a single-bit delimiter, followed by the
end-of-frame field.
Since CAN is a bus, it does not need network-layer services to establish end-to-end connections.
The protocol control block is responsible for determining when to send messages, when a message
must be resent due to arbitration losses, and when a message should be received.
INTERNET-ENABLED SYSTEMS:
IP Protocol:
The Internet Protocol (IP) is the fundamental protocol of the Internet.
It is an internetworking standard:
an Internet packet will travel over several different networks from source to destination.
IP allows data to flow seamlessly through these networks from one end user to another.
When node A wants to send data to node B, the application's data pass down through several layers of
the protocol stack to the IP layer.
IP creates packets for routing to the destination, which are then sent to the data link and physical
layers.
A node that transmits data among different types of networks is known as a router.
IP Packet Format:
The header and data payload are both of variable length.
The maximum total length of the header and data payload is 65,535 bytes.
An Internet address is a number (32 bits in early versions of IP, 128 bits in IPv6). An IPv4 address is
typically written in dotted-decimal form, xxx.xxx.xxx.xxx.
IP does not guarantee delivery: packets may be lost, and packets that do arrive may come out of order.
This is referred to as best-effort routing. Since routes for data may change quickly, with subsequent
packets being routed along very different paths with different delays, the real-time performance of IP
can be hard to predict.