Chapter 02 Embedded Systems Hardware Architecture


Technical University of Mombasa

TEE 4415: EMBEDDED SYSTEMS


Lecturer: Francisco Omutsani
Email: [email protected]

LECTURE SESSIONS
Session Two: Embedded System Architecture.
2.0 Session Objectives
By the end of this session, you should be able to:
 Describe embedded system architecture
 Explain the function of each building block of an embedded system's hardware
 Classify the types of embedded system processors and their architectures
 Describe the memory used in embedded system
 Describe the different I/O subsystem data transfer schemes
 Describe the CPU performance and Enhancement parameters

2.1 Architecture of Embedded System


An embedded system consists of custom-built hardware built around a Central Processing Unit (CPU).
This hardware also contains memory chips onto which the software is loaded.
The embedded system architecture can be represented as a layered architecture, as shown below.

Application Software

Operating System

Hardware

Figure 01 Layered Architecture of an Embedded System.

TUM is ISO 9001:2015 Certified Omutsani Francisco


Software
Software, also called programs, is the instructions that tell the computer what to do and how to do
it. A program is a set of instructions coded by the programmer to aid the microprocessor
in executing a given application.
The two main categories of software are System Software and Application Software.
 The system software, also called the operating system (OS), actually runs the system. This
software controls all the operations of the computer and its devices.
 Note that an OS is not compulsory in every embedded system. For small appliances
such as remote control units, air conditioners, toys etc., you can write only the software
specific to that application.
 On the other hand, for applications involving complex processing it is advisable to have an
operating system.
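The "software specific to that application" case above is usually written as one endless polling loop, often called a super-loop, with no OS or scheduler at all. The C sketch below illustrates the idea; the "registers" are invented stand-ins (plain variables) for real memory-mapped hardware registers, and the threshold is illustrative.

```c
#include <stdint.h>

/* Sketch of OS-less ("bare-metal") firmware for a small appliance.
   temp_sensor and fan_on are stand-ins for real device registers. */
static uint8_t temp_sensor;   /* simulated temperature reading */
static uint8_t fan_on;        /* simulated fan-control output  */

/* One iteration of the application-specific code: read the input,
   decide, drive the output. No operating system is involved. */
void appliance_step(uint8_t threshold) {
    fan_on = (temp_sensor > threshold) ? 1 : 0;
}

/* On real hardware, after reset, this loop would run forever:
 *   for (;;) { appliance_step(30); }
 */
```

The entire "system software" of such an appliance is just this loop plus the functions it calls.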

2.2 Hardware Architecture


Every embedded system consists of customized hardware built around a Central Processing Unit
(CPU). The following illustration shows the basic structure of an embedded system.

Fig 02 basic structure of an embedded system


 Sensor − It measures a physical quantity and converts it to an electrical signal which can
be read by an observer or by an electronic instrument such as an A-D converter. A sensor
stores the measured quantity to the memory.
 A-D Converter − an analog-to-digital converter converts the analog signal sent by the
sensor into a digital signal.
 Processor & ASICs − Processors process the data to measure the output and store it to the
memory.
 Memory: Used to store the processed data

 D-A Converter − A digital-to-analog converter converts the digital data fed by the
processor to analog data
 Actuator − An actuator compares the output given by the D-A Converter to the actual
(expected) output stored in it and stores the approved output.
In many of the embedded systems, especially in control systems, embedded hardware is used in a
loop as shown below:

Figure 03 Hardware in the loop


 For example, in a mobile phone the sensor corresponds to the antenna and the actuator
corresponds to the speaker.
2.2.1 Embedded System hardware blocks
Building blocks of the hardware of an embedded system

[Block diagram: the Central Processing Unit (CPU) at the centre, connected to the Read Only
Memory, Random Access Memory, Input Devices, Output Devices, Communication Interfaces and
Application-Specific Circuitry.]

Central Processing Unit (CPU):


This can be any of the following: a microcontroller, a microprocessor or a digital signal processor (DSP).
A microcontroller is a low-cost processor used in small-scale applications. A microcontroller
houses on the same chip the necessary components such as memory, a serial communication
interface and an analog-to-digital converter.

On the other hand, a microprocessor is more powerful, but you need to use external components
with it.
A DSP is used mainly for applications in which signal processing is involved, such as audio and
video processing.
Memory:
Memory is categorized as Random Access Memory (RAM) and Read Only Memory (ROM). The
content of the RAM will be erased if power to the chip is switched off, whereas ROM retains its
content even if the power is switched off. The firmware is therefore stored in the ROM. When the
power is switched on, the processor reads the ROM and the program is transferred to RAM, where
it is executed.
Input devices:
In an embedded system the input devices have very limited capability. Many embedded systems
will have a small keypad which may be used to input only digits. Many embedded systems used
in process control do not have any input devices; instead they interact with sensors and
transducers that produce electrical signals which are in turn fed to other systems.
Output Devices:
In an embedded system the output devices have very limited capability. Some embedded systems
will have a few LEDs to indicate the health status of the system's modules or to give a visual
indication of alarms. A small LCD (Liquid Crystal Display) may be used to display some important
parameters.
Communication Interface:
The embedded system may need to interact with other embedded systems, or it may have to
transmit data to a desktop. To facilitate this, embedded systems are provided with one or a few
communication interfaces such as RS232, RS422, RS485, USB (Universal Serial Bus), IEEE 1394,
Ethernet etc.
Application Specific Circuitry:
This includes the sensors, transducers and special processing/control circuitry required by an
embedded system depending on its unique application. This circuitry interacts with the CPU to
carry out its unique work.

2.2.2 Processors parameters


Processors for embedded applications are characterized by the following parameters:
 Very small in size

 Small memory capacity
 Inexpensive
 Low-power devices
 High-performance
 Special/dedicated applications
 Incorporate only the necessary peripherals

2.2.3 Processor internal architecture

[Block diagram: the General Purpose Registers and ALU, together with a Control Unit (CU)
containing the Stack Pointer, the Instruction Pointer (Program Counter), the Instruction Decoder,
the Memory Address Register and the Memory Data Register; the processor connects to the outside
world through the Address Bus, the Data Bus and the Control & Status Bus.]
Figure 04 Processor Internal Architecture


The CPU may be either a microcontroller, a microprocessor or a DSP, and consists of the following units:
a) ALU, which performs the arithmetic and logic operations.
b) General Purpose Registers; these registers constitute the processor's internal memory. The
number of registers varies from processor to processor. Registers carry the current data and
operands that are being manipulated by the processor. When a processor is referred to as 8-bit,
16-bit or 32-bit, this refers to the width of its registers.
c) Control Unit, which fetches instructions from memory, decodes them and then executes them. The CU
consists of:
i. Instruction Pointer (also called Program Counter), which points to (holds the address of)
the next instruction to be executed
ii. Stack Pointer, which points to the top of the stack in memory
iii. Instruction Decoder, used to decode the instruction

iv. Memory Address Register, which holds the address of the memory location on which a
read or write operation is performed
v. Memory Data Register, which holds the data for the memory location on which a read or
write operation is performed
In addition to manipulating data, the CPU's job is to read data and instructions from memory, read
and write data to memory, write data to output devices, and read data from input devices.
To perform these functions the processor communicates with other devices using three buses, namely:
 Data bus, which is bi-directional; used to carry data between the processor and
other devices
 Address bus, which is unidirectional; used to carry address signals from the processor
to memory
 Control and status bus, which is bi-directional; used to carry control/status
information about the operation and the health of the system.
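The fetch-decode-execute cycle driven by the Program Counter can be illustrated with a toy simulator. The three opcodes below (load immediate, add immediate, halt) are invented for this sketch; a real instruction set is far richer, but the loop structure is the same.

```c
#include <stdint.h>

/* A toy machine illustrating the fetch-decode-execute cycle.
   Invented opcodes: 1 = load immediate into ACC,
                     2 = add immediate to ACC, anything else = halt. */
typedef struct {
    uint8_t mem[16];  /* unified (Von Neumann style) memory      */
    uint8_t pc;       /* instruction pointer / program counter   */
    int     acc;      /* a single general-purpose register (ACC) */
} ToyCpu;

int toy_run(ToyCpu *c) {
    for (;;) {
        uint8_t op = c->mem[c->pc++];              /* fetch   */
        switch (op) {                              /* decode  */
        case 1: c->acc  = c->mem[c->pc++]; break;  /* execute: load */
        case 2: c->acc += c->mem[c->pc++]; break;  /* execute: add  */
        default: return c->acc;                    /* halt          */
        }
    }
}
```

For example, the program {1, 5, 2, 7, 0} loads 5 into ACC, adds 7, then halts with ACC = 12.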
2.3 Types of Processors
 Microcontrollers
A microcontroller (µC) is a small computer on a single integrated circuit consisting of a
relatively simple central processing unit (CPU) combined with peripheral devices such as
memories, I/O devices, and timers.
The simplest microcontrollers operate on 8-bit words and are suitable for applications that
require small amounts of memory and simple logical functions (vs. performance-intensive
arithmetic functions).
They may consume extremely small amounts of energy, and often include a sleep mode that
reduces the power consumption to nanowatts.
 Embedded components such as sensor network nodes and surveillance devices have
been demonstrated that can operate on a small battery for several years.
One example is a family of x86 CPUs used mainly in netbooks and other small mobile
computers.
Because these processors are designed to use relatively little energy without losing too much
performance relative to processors used in higher-end computers, they are suitable for some
embedded applications and in servers where cooling is problematic.
 DSP Processors

These are processors designed specifically to support numerically intensive signal processing
applications; they are often called DSP processors, or DSPs (digital signal processors).
 A motion control application, for example, may read position or location information
from sensors at sample rates ranging from a few Hertz (Hz, or samples per second) to
a few hundred Hertz.
 Audio signals are sampled at rates ranging from 8,000 Hz (or 8 kHz, the sample rate
used in telephony for voice signals) to 44.1 kHz (the sample rate of CDs).
 Ultrasonic applications (such as medical imaging) and high-performance music
applications may sample sound signals at much higher rates.
 Video typically uses sample rates of 25 or 30 Hz for consumer devices to much higher
rates for specialty measurement applications. Each sample, of course, contains an
entire image (called a frame), which itself has many samples (called pixels) distributed
in space rather than time.
 Other embedded applications that make heavy use of signal processing include:
 Interactive games
 Radar,
 Sonar,
 LIDAR (light detection and ranging) imaging systems;
 Video analytics (the extraction of information from video, for example for
surveillance);
 Driver-assist systems for cars;
 Medical electronics;
 Scientific instrumentation.
Signal processing applications all share certain characteristics.
 First, they deal with large amounts of data. The data may represent samples in time of a
physical process (such as samples of a wireless radio signal), samples in space (such
as images), or both (such as video and radar).
 Second, they typically perform sophisticated mathematical operations on the data,
including filtering, system identification, frequency analysis, machine learning, and
feature extraction. These operations are mathematically intensive.
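As a concrete instance of the filtering operations mentioned above, the sketch below implements a moving-average FIR filter, one of the simplest DSP kernels. The tap count and input data are illustrative only; real DSP code would use carefully chosen coefficients and fixed-point or SIMD arithmetic.

```c
#include <stddef.h>

/* Moving-average FIR filter: each output sample is the mean of the
   current input sample and the (taps - 1) samples before it.
   Near the start of the signal, fewer samples are available, so the
   average is taken over however many exist. */
void fir_moving_average(const double *x, double *y, size_t n, size_t taps) {
    for (size_t i = 0; i < n; i++) {
        double acc  = 0.0;
        size_t used = 0;
        for (size_t k = 0; k < taps && k <= i; k++) {
            acc += x[i - k];   /* sum the last `taps` samples */
            used++;
        }
        y[i] = acc / (double)used;
    }
}
```

Running this over a noisy sample stream smooths out sample-to-sample variation, which is exactly the kind of operation a DSP's multiply-accumulate hardware is built to make fast.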
2.4 CO-PROCESSORS:

These are auxiliary processors that provide flexibility in program execution: co-processors
attached to the CPU are allowed to implement some of the instructions. For example,
floating-point arithmetic was introduced into the Intel architecture by providing separate chips
that implemented the floating-point instructions.
To support co-processors, certain opcodes must be reserved in the instruction set for co-processor
operations. Because it executes instructions, a co-processor must be tightly coupled to the CPU.
When the CPU receives a co-processor instruction, the CPU must activate the co-processor and
pass it the relevant instruction. Co-processor instructions can load and store co-processor registers
or can perform internal operations. The CPU can suspend execution to wait for the co-processor
instruction to finish; it can also take a more superscalar approach and continue executing
instructions while waiting for the co-processor to finish.
Graphics Processors
A graphics processing unit (GPU) is a specialized processor designed especially to perform the
calculations required in graphics rendering.
 Earlier GPUs were used to render text and graphics, to combine multiple graphic
patterns, and to draw rectangles, triangles, circles, and arcs.
 Modern GPUs support 3D graphics, shading, and digital video.
o Dominant providers of GPUs today are Intel, NVIDIA and AMD.
 Some embedded applications, particularly games, are a good match for GPUs.
 However, GPUs have evolved towards more general programming models, and
hence have started to appear in other compute-intensive applications, such as
instrumentation.
 GPUs are typically quite power hungry, and therefore today are not a good match
for energy constrained embedded applications.

2.5 Processor Architecture


Based on the number of memories and data buses used, processors can be categorized into three types
of architecture, namely:
 Von Neumann Architecture
 Harvard Architecture
 Super Harvard Architecture
2.5.1 The Von Neumann architecture is the most widely used; it has only one memory chip that
stores both data and instructions.

[Diagram: the CPU connected to a single memory holding both instructions and data, via one
Address Bus and one Data Bus.]

2.5.2 The Harvard architecture


It has two separate memory chips:
 one to store the instructions, called program memory,
 the other to store the data, called data memory.
 It has two pairs of buses that link the two memory chips to the CPU.
 This architecture is more efficient because instructions and data are accessed faster.

[Diagram: the CPU connected to the Program Memory via a program memory address bus and a
program memory data bus, and to the Data Memory via a separate data memory address bus and
data memory data bus.]

Figure 07 Harvard Architecture
2.5.3 The Super Harvard Architecture (SHARC)
It is a slight but significant modification of the Harvard architecture.
In this architecture, provision has been made to store secondary data in the program memory
so as to balance the load on both memory blocks, since the data memory is more frequently
accessed than the program memory.
[Diagram: as in the Harvard architecture, the CPU connects to the Program Memory (holding
instructions and secondary data) and to the Data Memory over separate address and data bus
pairs; an instruction cache is added inside the CPU.]

Figure 08 Super Harvard Architecture (SHARC)

Another axis along which we can organize computer architectures relates to their instructions and
how they are executed.
Hence processors are divided into the following categories:
 Complex instruction set computers (CISC).
 Reduced instruction set computers (RISC).
Many early computer architectures were what are known today as complex instruction set
computers (CISC).
 These machines provided a variety of instructions that may perform very complex
tasks, such as string searching;
 They also generally used a number of different instruction formats of varying
lengths.
Reduced instruction set computers (RISC).
 These computers tended to provide somewhat fewer and simpler instructions.
 The instructions were also chosen so that they could be efficiently executed in pipelined
processors. Early RISC designs substantially outperformed CISC designs of the period.
In general, while evaluating processors, the following specifications are to be considered:
 Clock speed
 Length of the register
 Width of the data and address bus
 Number of registers
 Internal RAM
 Internal ROM
 On-chip peripherals such as timers, UART, DAC and ADC
 Interrupt lines
 Number of programmable I/O lines
2.6 ACCELERATORS:
An accelerator is attached to CPU buses to quickly execute certain key functions. Accelerators can
provide large performance increases for applications with computational kernels that spend a
great deal of time in a small section of code. Accelerators can also provide critical speedups for
low-latency I/O functions.
 The design of accelerated systems is one example of hardware/software co-design—the
simultaneous design of hardware and software to meet system objectives. Thus far, we have

taken the computing platform as a given; by adding accelerators, we can customize the
embedded platform to better meet our application’s demands.
As illustrated in Figure 4.1, a CPU accelerator is attached to the CPU bus. The CPU is often called
the host. The CPU talks to the accelerator through data and control registers in the accelerator.
These registers allow the CPU to monitor the accelerator’s operation and to give the accelerator
commands.
The CPU and accelerator may also communicate via shared memory. If the accelerator needs to
operate on a large volume of data, it is usually more efficient to leave the data in memory and
have the accelerator read and write memory directly rather than to have the CPU shuttle data
from memory to accelerator registers and back.

Fig 4.1 CPU accelerators in a system.


An accelerator is not a co-processor. A co-processor is connected to the internals of the CPU and
processes instructions as defined by opcode.
An accelerator interacts with the CPU through the programming model interface; it does not
execute instructions. Its interface is functionally equivalent to an I/O device, although it usually
does not perform input or output.
Both CPUs and accelerators perform computations required by the specification; at some level we
do not care whether the work is done on a programmable CPU or on a hardwired unit.
The first task in designing an accelerator is determining that our system actually needs one. We
have to make sure that the function we want to accelerate will run more quickly on our accelerator
than it will by executing as software on a CPU.

If our system CPU is a small microcontroller, the race may be easily won, but competing against a
high-performance CPU is a challenge. We also have to make sure that the accelerated function will
speed up the system. If some other operation is in fact the bottleneck, or if moving data into and
out of the accelerator is too slow, then adding the accelerator may not be a net gain.
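The "race" described above can be reduced to simple arithmetic: the accelerator is a net gain only if its compute time, plus the time to move data in and out, is less than the pure-software time. A minimal sketch, with all times as abstract cost figures supplied by the designer:

```c
/* Returns 1 if offloading the function to the accelerator is faster
   than executing it in software on the CPU, 0 otherwise.
   t_sw  : time to run the function in software on the CPU
   t_in  : time to move input data to the accelerator
   t_acc : accelerator compute time
   t_out : time to move results back
   (All in the same units, e.g. microseconds.) */
int accelerator_wins(double t_sw, double t_in, double t_acc, double t_out) {
    return (t_in + t_acc + t_out) < t_sw;
}
```

For example, a 5x-faster accelerator still loses if data transfer dominates: with t_sw = 30, t_acc = 20 beats the software time on raw compute, but adding 10 units each way of transfer makes the total 40, so the accelerator is not a net gain.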
2.7 Memory:
The memory is divided into two categories, namely program and data memory.
Both program and data memory can be internal to the processor (in a microcontroller) or
external. If the capacity of the internal memory is not sufficient, you can use external
memory chips to increase the memory capacity.

[Diagram: the processor connected via the Address Bus and Data Bus to RAM, EEPROM and
Flash Memory chips.]

Figure 09 Memory in Embedded systems


Memory chips are classified as:
 Random Access Memory (RAM)
 Read only Memory (ROM)
 Hybrid Memory.
RAM: It is a read-write chip that comes in two types, namely:
 Static RAM (SRAM): loses its content the moment power to the chip is turned off
 Dynamic RAM (DRAM): retains its contents for only a fraction of a second; to keep its contents
intact the DRAM is refreshed periodically using a DRAM controller.
 SRAM is faster and consumes less power.
 DRAM is very cheap and hence used in high-capacity RAM.
ROM is used to store the firmware in embedded systems; this firmware is written into the ROM
at the factory.
A variety of ROMs are available with different capabilities:
 Programmable ROM (PROM): it can be programmed only once by the user

 Erasable Programmable ROM (EPROM): it can be programmed many times
Hybrid memory devices:
 EEPROM
 Non-Volatile RAM
 Flash memory is a type of EEPROM; it is characterised by its fast read capability.
2.7.1 MEMORY SYSTEM MECHANISMS:
Modern microprocessors do more than just read and write a monolithic memory. Architectural
features improve both the speed and capacity of memory systems.
Microprocessor clock rates have increased at a faster rate than memory speeds, such that
memories are falling behind microprocessors every day. As a result, computer architects resort to
caches to increase the average performance of the memory system.
Although memory capacity is increasing steadily, program sizes are increasing as well, and
designers may not be willing to pay for all the memory demanded by an application. Modern
memory management units (MMUs) perform address translations that provide a larger virtual memory
space in a small physical memory. In this section, we review both caches and MMUs.
2.7.1.1 Caches:
Caches are widely used to speed up memory system performance. Many microprocessor
architectures include caches as part of their definition.
The cache speeds up average memory access time when properly used. However, it increases the
variability of memory access times: accesses that hit in the cache will be fast, while accesses to
locations not cached will be slow. This variability in performance makes it especially important
to understand how caches work so that we can better understand how to predict cache performance
and factor variability into system design.
A cache is a small, fast memory that holds copies of some of the contents of main memory.
Because the cache is fast, it provides higher-speed access for the CPU; but since it is small, not all
requests can be satisfied by the cache, forcing the system to wait for the slower main memory.
Caching makes sense when the CPU is using only a relatively small set of memory locations at any
one time; the set of active locations is often called the working set.
Figure 1.19 shows how the cache supports reads in the memory system. A cache controller mediates
between the CPU and the memory system comprising the main memory.

The cache controller sends a memory request to the cache and main memory. If the requested
location is in the cache, the cache controller forwards the location’s contents to the CPU and aborts
the main memory request; this condition is known as a cache hit.
If the location is not in the cache, the controller waits for the value from main memory and
forwards it to the CPU; this situation is known as a cache miss.

Fig 1.19: The cache in the memory system.


We can classify cache misses into several types depending on the situation that generated them:
 A compulsory miss (also known as a cold miss) occurs the first time a location is used,
 A capacity miss is caused by a too-large working set, and
 A conflict miss happens when two locations map to the same location in the cache.
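The performance effect of hits and misses is commonly summarized by the average memory access time, t_avg = h * t_cache + (1 - h) * t_main, where h is the hit rate. The sketch below codes this simplified model (it ignores write policy and multi-level caches):

```c
/* Average memory access time for a single-level cache.
   hit_rate : fraction of accesses satisfied by the cache (0.0 .. 1.0)
   t_cache  : cache access time
   t_main   : main-memory access time (same units as t_cache)
   Simplified model: ignores write policy and multi-level caches. */
double amat(double hit_rate, double t_cache, double t_main) {
    return hit_rate * t_cache + (1.0 - hit_rate) * t_main;
}
```

For example, with a 1-cycle cache, a 100-cycle main memory and a 90% hit rate, the average access costs 10.9 cycles, which shows why even a modest miss rate dominates average performance.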
2.8 I/O SUBSYSTEMS
The I/O subsystem layer hides the device peculiarities from the application layer, hence providing a
uniform access method to all the peripheral I/O devices of the system.

Application software layer
I/O subsystem layer
Device drivers
Interrupt handlers
I/O device hardware

Layer representation of the software components with respect to I/O devices
It comprises the I/O devices and their associated device drivers. The I/O subsystem defines a standard
set of functions called an Application Programming Interface (API) for the operation of the I/O
devices.
The functions of the device drivers include:

 Implementing each function call in the API set
 Exporting the set of function calls to the I/O subsystem layer
 Setting up the associated link between the I/O subsystem and the corresponding I/O device
Examples of function calls applicable to I/O subsystems operation include:
 Create function: used to create a virtual instance of the I/O device in the I/O subsystem
layer, making the device available for subsequent operations.
 Destroy function: used to delete the created virtual instance of an I/O device once the
services of the I/O device are no longer required, effectively giving the driver the opportunity to
perform cleanup operations such as unmapping the device from the system memory map
(space), deallocating the interrupt request line and removing the interrupt service routine
from the system.
 Open function: used to prepare an I/O device for subsequent operations such as read or
write, i.e. enable or disable a device; the function can also specify the mode of use, e.g. open
for read only or for read/write.
 Close function: used to close a previously opened I/O device once its service is no longer
needed.
 Read function: used to retrieve data from a previously opened I/O device. The caller
may specify the amount of data to be retrieved from the device and the location in memory
where the data is to be stored.
 Write function: used to transfer data to a previously opened I/O device. The caller may
specify the amount of data to be transferred to the device and the location in memory holding
the data.
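One common way to realize such an API set in C is a table of function pointers that each driver fills in, so the I/O subsystem can dispatch open/read/write/close calls without knowing which device it is talking to. The sketch below is hypothetical: the type names and the trivial "loopback" device are invented for illustration, not taken from any real OS.

```c
#include <stddef.h>

/* Hypothetical driver interface: one function pointer per API call. */
typedef struct {
    int (*open_fn)(int mode);
    int (*read_fn)(void *buf, size_t len);
    int (*write_fn)(const void *buf, size_t len);
    int (*close_fn)(void);
} DeviceDriver;

/* A toy "loopback" driver: reads return whatever was last written. */
static char   loop_buf[64];
static size_t loop_len = 0;

static int loop_open(int mode)  { (void)mode; loop_len = 0; return 0; }
static int loop_write(const void *buf, size_t len) {
    if (len > sizeof loop_buf) len = sizeof loop_buf;
    for (size_t i = 0; i < len; i++) loop_buf[i] = ((const char *)buf)[i];
    loop_len = len;
    return (int)len;
}
static int loop_read(void *buf, size_t len) {
    if (len > loop_len) len = loop_len;
    for (size_t i = 0; i < len; i++) ((char *)buf)[i] = loop_buf[i];
    return (int)len;
}
static int loop_close(void) { return 0; }

/* The driver exports its call set to the I/O subsystem as one table. */
const DeviceDriver loopback_driver = { loop_open, loop_read,
                                       loop_write, loop_close };
```

The I/O subsystem can keep one such table per registered device, which is what makes the access method uniform from the application's point of view.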
I/O devices can range from low bit rate to high bit rate hardware. Each device port can be
accessed through special processor instructions
There are two ways used to access each device port namely.
1. Memory Mapped I/O
2. I/O Mapped I/O or Isolated I/O
2.8.1 Memory Mapped I/O
The memory-mapped I/O system is used in smaller systems.
In this system there is only one address space; some addresses are assigned to memories and some
addresses to I/O devices, i.e. the addresses assigned to I/O devices are different from the addresses
which have been assigned to memories.

In other words, the I/O devices and memory share the memory map, with some addresses reserved
for the I/O devices and the rest for the memory.
This scheme is shown in figure 7.1.

Figure 7.1
The microprocessor uses R/W signals to determine the direction of data flow.
The main advantage of this scheme is that it does not require additional decoding circuitry, and the
same set of instructions may be used to fetch data either from the I/O devices or from the memory
locations.
Following instructions may be used for the data transfer in this scheme:
MOV M, A
It moves the contents of the accumulator to the memory location addressed by the H-L pair. If the
H-L pair contains the address of a memory location, then the data will be transferred to the memory
location; if on the other hand the H-L pair contains the address of an I/O device, then the
accumulator data will be transferred to the I/O device.
MOV A, M
It moves the contents of the memory location addressed by the H-L pair to the accumulator. If the
H-L pair contains the address of a memory location, then the data will be transferred from the memory
location; similarly, if the H-L pair contains the address of an I/O device, then the data from the I/O
device will be transferred to the accumulator.
STA address
It stores the contents of the accumulator to the addressed location. If the address represents the
address of a memory location, then the accumulator contents will be stored to the memory location;
if on the other hand the address represents the address of an I/O device, then the accumulator data
will be transferred to the I/O device.

LDA address
It loads the addressed contents into the accumulator. If the address represents the address of a
memory location, then the contents stored in the memory location will be transferred to the
accumulator; if on the other hand the address represents the address of an I/O device, then the
data from the I/O device will be transferred to the accumulator.
Many other instructions, like LDAX, STAX and other arithmetic and logic instructions, may
also be used in this memory-mapped I/O system.
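In C on a memory-mapped system, the same idea appears as ordinary loads and stores through a pointer to the device's address, declared volatile so the compiler does not optimize the accesses away. The register address below is made up for illustration, and a plain variable stands in for the device register so the sketch can run on a host machine:

```c
#include <stdint.h>

/* Memory-mapped I/O in C: a device register is just an address.
   0x40001000 is an invented address; set USE_REAL_HW to 1 only on a
   target whose memory map actually places a register there. */
#define USE_REAL_HW 0

#if USE_REAL_HW
#define LED_REG (*(volatile uint8_t *)0x40001000u)
#else
static uint8_t fake_led_reg;                       /* host stand-in */
#define LED_REG (*(volatile uint8_t *)&fake_led_reg)
#endif

/* Ordinary assignments become I/O writes: no special I/O instruction
   is needed, which is the defining property of memory-mapped I/O. */
void led_on(void)  { LED_REG |= 0x01; }
void led_off(void) { LED_REG &= (uint8_t)~0x01; }
```

The volatile qualifier matters: without it, the compiler may merge or delete the stores, since it cannot see that a device observes them.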
2.8.2 I/O Mapped I/O or Isolated I/O
Larger systems use this technique for the data communication to I/O devices.
In this technique the microprocessor treats the memory and I/O devices separately, using
different control lines for each of them.
Figure 7.2 shows this system used in 8085 microprocessor.

Figure 7.2
In 8085A the IO/M’ signal is used for this purpose. If this signal is low (0), it represents the
memory.
However, if this signal is high (1), it represents the I/O operations. RD’ and WR’ signals in
association with IO/M’ signal help in performing I/O read/write operations or memory
read/write operations.
Table 7.1 shows the operations of these signals.

Table 7.1

Fig 7.3
Comparison of Memory Mapped I/O and I/O Mapped I/O is given in table 7.2.

Table 7.2
2.9 DATA TRANSFER SCHEMES
The data transfer schemes are categorized depending upon the capabilities of I/O devices to
accept or transfer serial or parallel data.

Fig 7.0
The 8085 microprocessor is a parallel device, i.e. it transfers eight bits of data simultaneously over
eight data lines (parallel I/O mode).
However, parallel data communication is not possible with devices such as a CRT terminal or
cassette tape. For these devices, therefore, the serial I/O mode is used, which transfers a single bit
on a single line at a time.
 For serial data transmission, the 8-bit parallel word is converted to a stream of eight serial bits
using parallel-to-serial conversion.
 Similarly, in serial reception of data, the microprocessor receives a stream of eight bits one by
one, which are then converted to an 8-bit parallel word using serial-to-parallel conversion.
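The two conversions can be sketched directly in C as bit-shifting loops (LSB first here; a real serial link also adds start/stop bits or a clock line, which this sketch omits):

```c
#include <stdint.h>

/* Parallel-to-serial: unpack one byte into eight bits, LSB first,
   as a shift register would clock them out one line at a time. */
void byte_to_bits(uint8_t byte, uint8_t bits[8]) {
    for (int i = 0; i < 8; i++)
        bits[i] = (byte >> i) & 1u;
}

/* Serial-to-parallel: reassemble eight received bits into one byte. */
uint8_t bits_to_byte(const uint8_t bits[8]) {
    uint8_t byte = 0;
    for (int i = 0; i < 8; i++)
        byte |= (uint8_t)((bits[i] & 1u) << i);
    return byte;
}
```

Shifting a byte out and back in this way recovers the original word, which is exactly what a UART's transmit and receive shift registers do in hardware.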
A. SERIAL DATA TRANSFER
Serial interfaces for serial data transfer employ any of three modes:
 Simplex
 Half Duplex
 Full Duplex
Simplex: In this mode data transfer is accomplished in only one direction.

[Diagram: Transmitter → Receiver]

Examples include radio and TV broadcasting.


Half Duplex: involves data transfer in either direction, but only one direction at a time, using one
channel. An example is the radio walkie-talkie.

[Diagram: two stations, each with a transmitter and a receiver, sharing a single channel used in
one direction at a time.]

Full Duplex: involves transfer of data in both directions at the same time using two channels.
An example is the telephone.

[Diagram: two stations, each with a transmitter and a receiver, connected by two channels carrying
data in both directions simultaneously.]

B. PARALLEL DATA TRANSFER


The parallel data transfer can be accomplished in THREE ways:
 Programmed I/O
 Interrupt Driven
 DMA
2.9.1 Programmed I/O Data Transfer
This method of data transfer is generally used in simple microprocessor systems where speed
is unimportant.
This method uses instructions to get the data into or out of the microprocessor.
The data transfer can be synchronous or asynchronous depending upon the type and the speed of
the I/O devices.
Synchronous
 Synchronous type of data transfer can be used when the speed of the I/O devices matches
with the speed of the microprocessor.
 A common clock pulse synchronizes the microprocessor and the I/O devices. In such a
data transfer scheme, because the speeds match, the microprocessor does not
have to wait for the availability of the data; the data is transferred
as soon as the microprocessor issues the signal.
Asynchronous
 The asynchronous data transfer method is used when the speed of the I/O devices is
slower than the speed of the microprocessor.
 Because of the mismatch of the speed, the internal timing of the I/O device is independent
from the microprocessor and thus two units are said to be asynchronous to each other.
 The asynchronous data transfer is normally implemented using ‘handshaking’ mode.

 In the handshaking mode some signals are exchanged between the I/O device and
microprocessor before the data transfer takes place.
 The microprocessor has to check the status of the input/output device to see if the device is
ready for the data transfer.
 The microprocessor initiates the I/O device to get ready; the status of the I/O device is
continuously checked by the microprocessor until the I/O device becomes ready, and then the
microprocessor sends instructions to transfer the data.
 Flow chart for this mode of data transfer is shown in figure 7.4.

Fig 7.4
 Fig. 7.5 illustrates the asynchronous handshaking process used to transfer data from
the microprocessor to the I/O device.
 In this figure, the microprocessor sends a ‘Ready’ signal to the I/O device. When the
device is ready to accept the data, it returns an ‘ACK’ (Acknowledge) signal to the
microprocessor, indicating that it has acknowledged the ‘Ready’ signal and is ready
for the transfer of data.
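The polled handshake above can be sketched in C. This is a minimal software simulation, not real driver code: `dev_status` and `dev_data` stand in for the device's status and data registers, and the `DEV_READY` bit position is an assumed layout; on real hardware these would be memory-mapped registers at fixed addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated device registers (illustrative names; real hardware would
   map these to fixed I/O addresses). */
static volatile uint8_t dev_status; /* bit 0 = READY (assumed layout) */
static volatile uint8_t dev_data;   /* device data register            */

#define DEV_READY 0x01u

/* Programmed (polled) output: busy-wait until the device reports
   ready, then write one byte. Returns 0 on success. */
int polled_write(uint8_t byte)
{
    while ((dev_status & DEV_READY) == 0) {
        /* Processor time is wasted here while the device gets ready. */
    }
    dev_data = byte;
    dev_status &= (uint8_t)~DEV_READY; /* device busy until next READY */
    return 0;
}
```

The busy-wait loop is exactly the waiting the NOTE below criticises: the processor can do nothing else until the device raises its ready bit.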
Fig 7.5
Figure 7.6 shows the asynchronous handshaking process used to transfer data from the
I/O device to the microprocessor.
In this case the I/O device issues the ‘Ready’ signal to the microprocessor,
indicating that it is ready to send data.
In response to this signal, the microprocessor sends a valid-data signal to the I/O
device, and the valid data is then put on the data bus for the transfer.
Fig 7.6
NOTE: In the programmed I/O data transfer method discussed above, the microprocessor is
busy all the time checking for the availability of data from the slower I/O devices and
checking whether an I/O device is ready for the data transfer. In other words, in this
scheme some of the microprocessor's time is wasted waiting while an I/O device gets ready.
2.9.2 Interrupt Driven I/O Data Transfer


An interrupt is a hardware facility provided on the microprocessor to aid data transfer
between the I/O devices and the microprocessor.
The interrupt driven I/O data transfer method is efficient because no microprocessor
time is wasted waiting for an I/O device to become ready.
The I/O device informs the microprocessor whenever it is ready for the data transfer.
This is achieved by interrupting the microprocessor. The flow chart for this method of
data transfer is shown in figure 7.7.
Fig 7.7
 In the beginning the microprocessor initiates the data transfer by requesting the
I/O device to get ready, and then continues executing its original program rather
than wasting time checking the status of the I/O device.
 Whenever the device is ready to accept or supply data, it informs the processor
through a control signal known as the interrupt signal.
 In response to this interrupt signal, the microprocessor sends back an interrupt
acknowledge signal to the I/O device, indicating that it has received the request
(Fig. 7.8).
 It then suspends its job after executing the current instruction.
 It saves the contents of the program counter and status to the stack and jumps to a
subroutine program called the Interrupt Service Subroutine (ISS).
 The ISS saves the processor status onto the stack and executes the instructions for
the data transfer.
 It then restores the processor status and returns to the main program.
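The sequence above can be sketched as a software simulation. All names here are illustrative: `data_isr` plays the role of the ISS, and `dispatch` stands in for the hardware step in which the CPU checks for a pending interrupt between instructions; on a real target the vectoring and the register save/restore are done by the hardware and compiler.

```c
#include <assert.h>
#include <stdint.h>

static volatile uint8_t dev_data;     /* device data register          */
static volatile int     irq_pending;  /* set when the device asserts INTR */
static uint8_t rx_byte;               /* byte received by the program  */
static int     rx_ready;              /* set once the ISS has run      */

/* Interrupt Service Subroutine: performs the actual data transfer and
   clears the request so the main program can resume. */
static void data_isr(void)
{
    rx_byte     = dev_data;  /* the data transfer itself */
    rx_ready    = 1;
    irq_pending = 0;         /* acknowledge: withdraw the request */
}

/* Stand-in for the CPU's between-instruction interrupt check. */
static void dispatch(void)
{
    if (irq_pending)
        data_isr();
}
```

Until `irq_pending` is raised, the main program runs undisturbed; no time is spent polling the device.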
Fig 7.8
As already discussed, several input/output devices may be connected to the
microprocessor using the Interrupt Driven Data Transfer Scheme. The following interrupt
request configurations may arise while interfacing I/O devices to the microprocessor:
1. Single Interrupt system
2. Multi Interrupt System
1. Single Interrupt System
When only one interrupt line is available on the microprocessor and several I/O devices
are to be connected, the arrangement is known as a Single Interrupt System.
Figure 7.9 shows the way to connect several devices to one active-low interrupt input
terminal (INTR) of the microprocessor.
Figure 7.9
The way to connect several I/O devices to an active-high interrupt terminal (INTR) is
shown in figure 7.10. With an active-low interrupt line, the devices are connected to
the INTR terminal through open-collector NOT gates; when any device is active it drives
INTR low, enabling the interrupt line.
Similarly, with an active-high interrupt line the I/O devices are connected through an
OR gate. When any device output is high, the OR gate sends a high signal to the
interrupt line (INTR).
Figure 7.10
When the interrupt line is activated by either of the two methods discussed above, the
microprocessor does not know which device sent the interrupt signal.
Three techniques are commonly used to identify the device requesting the interrupt and
to resolve simultaneous requests by two or more devices. These are:
a) Polling:
The interrupt signal from each device can be used to set one bit of a register
wired as an input port. When an interrupt occurs, the ISS polls this port to see
which device requested service.
A priority is automatically established by the order of polling. This technique
is very simple but has the drawback of degrading response time.
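A sketch of the polling step, assuming one request bit per device in a status port (the bit assignment is hypothetical). Scanning from bit 0 upward fixes the priority order, exactly as the text notes:

```c
#include <assert.h>
#include <stdint.h>

/* Polling in a single-interrupt system: each device's request sets one
   bit of a status port; the ISS scans the bits in a fixed order, which
   establishes the priority. Returns the lowest-numbered requesting
   device, or -1 if no bit is set. */
int poll_interrupt_source(uint8_t status_port)
{
    for (int bit = 0; bit < 8; bit++)
        if (status_port & (uint8_t)(1u << bit))
            return bit;
    return -1;
}
```

Note that the scan itself takes time, which is the response-time penalty mentioned above.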
b) Daisy Chain:
This technique is shown in figure 7.11.
In this method each I/O device has an Interrupt Enable Input (IEI) and an
Interrupt Enable Output (IEO).
A device can make an interrupt request only if its IEI is high.
All the I/O devices are connected serially, like a chain.
The highest priority I/O device is placed at the first position, followed by
the lower priority devices in sequence.
If any device drives the interrupt line low, the INTR line of the processor is
asserted.
The interrupt acknowledge signal (INTA) is issued in response to the low INTR
line.
In figure 7.11, device 2 is requesting an interrupt, causing its IEO to go low.
This in turn disables devices 3 and 4. Note that device 1 is still able to
request an interrupt because it has a higher priority.
The interrupt acknowledge signal can be used to reset the IEO of the
interrupting device.
Fig 7.11
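The IEI/IEO chain can be modelled with a short function. This is a behavioural sketch of the priority resolution only, not of the electrical signalling; here device 0 is the head of the chain and so has the highest priority:

```c
#include <assert.h>

/* Daisy-chain priority sketch. Each stage passes IEI high to the next
   stage only if it is not itself requesting; the first requesting
   device that sees IEI high wins and pulls its IEO low, disabling all
   devices further down the chain. Returns the winning device index,
   or -1 if no device is requesting. */
int daisy_chain_grant(const int request[], int n)
{
    int iei = 1;                  /* chain input to device 0 is high */
    for (int i = 0; i < n; i++) {
        if (iei && request[i])
            return i;             /* this device's IEO goes low      */
        if (request[i])
            iei = 0;              /* lower-priority devices disabled */
    }
    return -1;
}
```

With devices 1 and 3 both requesting, device 1 wins, matching the figure's behaviour where a requester blocks everything after it in the chain.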
c) Priority Interrupt Controller (PIC):
In this method several I/O devices may be connected to a single interrupt line
through a programmable interrupt controller (e.g. the 8259 IC).
Up to 8 input/output devices may be connected to the microprocessor through one
controller.
If more than 8 I/O devices are to be connected, more PICs (programmable
interrupt controllers) are used in cascade.
2. Multi Interrupt System
When the microprocessor has several interrupt terminals and one I/O device is connected
to each interrupt terminal, the arrangement is known as a multi interrupt system. In
this scheme, the number of I/O devices connected to the interrupt lines should be equal
to or less than the number of interrupt terminals.
In this way one device is connected to each level of interrupt.
So when a device interrupts the microprocessor, the processor immediately knows which
device has interrupted.
Such an interrupt scheme is known as a vectored interrupt.
2.9.3 Direct Memory Access (DMA) Data Transfer
In the programmed I/O and interrupt driven I/O methods, data transfer between the I/O
devices and external memory takes place via the accumulator. For bulk data transfer
from I/O devices to memory or vice versa, these two methods are time consuming and
quite uneconomical
even though the speed of the I/O devices matches the speed of the microprocessor, since
the data is first transferred to the accumulator and then to the concerned device. The
Direct Memory Access (DMA) data transfer method is used for bulk data transfer between
I/O devices and memory. In this method I/O devices are allowed to transfer data
directly to the external memory without it being routed through the accumulator. For
this the microprocessor relinquishes control over the data bus and address bus, so that
they can be used for the transfer of data between the devices. For a data transfer
using the DMA process, the I/O device sends a request to the microprocessor; on receipt
of such a request, the microprocessor relinquishes the address and data buses and
informs the I/O device of the situation by sending an acknowledge signal, as shown in
figures 7.12 and 7.13. The I/O device withdraws the request when the data transfer
between the I/O device and external memory is complete.
Fig 7.12
It may be mentioned here that DMA can transfer data of the following types:
• Memory to I/O device
• I/O device to memory
• Memory to memory
• I/O device to I/O device
Fig 7.13
For transferring data through DMA, an interfacing chip known as a DMA controller is
used with the microprocessor; it helps generate the addresses for the data to be
transferred from the I/O devices (Fig. 7.14). The peripheral device sends a request
signal (DMARQ) to the DMA controller, and the DMA controller in turn passes it to the
microprocessor (HOLD signal). On receipt of the DMA request the microprocessor sends
an acknowledge signal (HLDA) to the DMA controller. On receipt of this signal the DMA
controller sends a DMA acknowledge signal (DMACK) to the I/O device. The DMA
controller then takes over control of the microprocessor's buses and controls the data
transfer between RAM and the I/O device. When the data transfer is complete, the DMA
controller returns control of the buses to the microprocessor by disabling the HOLD
and DMACK signals.
Fig 7.14
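The DMARQ/HOLD/HLDA/DMACK sequence can be sketched as a memory-to-memory simulation. The struct and function names are illustrative; in hardware the copy loop is driven by the DMA controller while the CPU is off the buses, not by CPU instructions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signal names follow the text; the struct is illustrative. */
typedef struct {
    int hold;   /* request passed to the CPU          */
    int hlda;   /* CPU's bus-grant reply              */
    int dmack;  /* acknowledge sent to the I/O device */
} dma_signals;

/* Transfer len bytes from src to dst once the CPU grants the buses. */
void dma_transfer(dma_signals *s, uint8_t *dst,
                  const uint8_t *src, size_t len)
{
    s->hold  = 1;              /* controller requests the buses      */
    s->hlda  = 1;              /* CPU relinquishes them (simulated)  */
    s->dmack = 1;              /* device told the transfer may start */
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];       /* data bypasses the accumulator      */
    s->hold  = 0;              /* return bus control to the CPU      */
    s->dmack = 0;
    s->hlda  = 0;
}
```

The key contrast with programmed and interrupt-driven I/O is the copy loop: no byte ever passes through the accumulator.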
2.10 CPU PERFORMANCE:


Let us consider the factors that can substantially influence program performance:
 Pipelining.
 Caching.
2.10.1 Pipelining
Modern CPUs are designed as pipelined machines in which several instructions are executed in
parallel. Pipelining greatly increases the efficiency of the CPU. But like any pipeline, a CPU
pipeline works best when its contents flow smoothly.
Some sequences of instructions can disrupt the flow of information in the pipeline and,
temporarily at least, slow down the operation of the CPU.
The ARM7 has a three-stage pipeline:
■ Fetch: the instruction is fetched from memory.
■ Decode: the instruction’s opcode and operands are decoded to determine what function
to perform.
■ Execute: the decoded instruction is executed.
Each of these operations requires one clock cycle for typical instructions. Thus, a normal
instruction requires three clock cycles to completely execute, known as the latency of instruction
execution. But since the pipeline has three stages, an instruction is completed in every clock cycle.
In other words, the pipeline has a throughput of one instruction per cycle.
Figure 1.22 illustrates the position of instructions in the pipeline during execution using the
notation introduced by Hennessy and Patterson [Hen06]. A vertical slice through the timeline
shows all instructions in the pipeline at that time. By following an instruction horizontally, we can
see the progress of its execution.
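The latency and throughput figures above follow from a simple formula: on an ideal k-stage pipeline, N instructions finish in k + (N − 1) cycles, because the first instruction takes k cycles and each later one completes one cycle after its predecessor. A sketch, assuming no hazards or stalls (the function name is ours):

```c
#include <assert.h>

/* Ideal pipeline timing: the first instruction takes `stages` cycles
   (the latency of instruction execution); every additional instruction
   completes one cycle later, giving a throughput of one instruction
   per cycle. Assumes no pipeline hazards or stalls. */
unsigned long pipeline_cycles(unsigned stages, unsigned long n_instr)
{
    if (n_instr == 0)
        return 0;
    return stages + (n_instr - 1);
}
```

For the three-stage ARM7, 100 instructions take 102 cycles, so the effective throughput is very close to one instruction per cycle.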
The C55x includes a seven-stage pipeline :
1. Fetch.
2. Decode.
3. Address: computes data and branch addresses.
4. Access 1: reads data.
5. Access 2: finishes the data read.
6. Read: puts operands onto the internal busses.
7. Execute: performs operations.
RISC machines are designed to keep the pipeline busy, whereas CISC machines may display
a wide variation in instruction timing. Pipelined RISC machines typically have more
regular timing characteristics: most instructions that do not have pipeline hazards
display the same latency.
Fig 1.22 Pipelined execution of ARM instructions.
2.10.2 Caching
We have already discussed caches functionally. Although caches are invisible in the programming
model, they have a profound effect on performance. We introduce caches because they
substantially reduce memory access time when the requested location is in the cache.
However, the desired location is not always in the cache, since the cache is
considerably smaller than main memory. As a result, caches cause the time required to
access memory to vary considerably. The
extra time required to access a memory location not in the cache is often called the cache miss
penalty.
The amount of variation depends on several factors in the system architecture, but a cache miss is
often several clock cycles slower than a cache hit. The time required to access a memory location
depends on whether the requested location is in the cache. However, as we have seen, a location
may not be in the cache for several reasons.
■ At a compulsory miss, the location has not been referenced before.
■ At a conflict miss, two particular memory locations are fighting for the same cache line.
■ At a capacity miss, the program’s working set is simply too large for the cache.
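The effect of the miss penalty on performance is usually summarized as the average memory access time, t_avg = t_hit + miss_rate × miss_penalty. A sketch for a single cache level (the formula is standard; the function name is ours):

```c
#include <assert.h>

/* Average memory access time for one cache level, in cycles:
   t_avg = t_hit + miss_rate * miss_penalty.
   miss_rate is a fraction in [0, 1]; the other arguments are cycle
   counts. */
double avg_access_time(double t_hit, double miss_rate,
                       double miss_penalty)
{
    return t_hit + miss_rate * miss_penalty;
}
```

For example, with a 1-cycle hit time, a 5% miss rate, and a 20-cycle miss penalty, the average access time doubles to about 2 cycles, which shows why even a small miss rate matters.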
The contents of the cache can change considerably over the course of execution of a
program. When several programs run concurrently on the CPU, they can also evict each
other's data from the cache, making access times still harder to predict.
2.11 CPU POWER CONSUMPTION:
Power consumption is, in some situations, as important as execution time. In this section we study
the characteristics of CPUs that influence power consumption and mechanisms provided by CPUs
to control how much power they consume. First, it is important to distinguish between energy and
power. Power is, of course, energy consumption per unit time. Heat generation depends on power
consumption. Battery life, on the other hand, most directly depends on energy consumption.
Generally, we will use the term power as shorthand for energy and power consumption,
distinguishing between them only when necessary.
The high-level power consumption characteristics of CPUs and other system components are
derived from the circuits used to build those components. Today, virtually all digital systems are
built with complementary metal oxide semiconductor (CMOS) circuitry. The detailed circuit
characteristics are best left to a study of VLSI design [Wol08], but the basic sources of CMOS
power consumption are easily identified and briefly described below.
■ Voltage drops: The dynamic power consumption of a CMOS circuit is proportional to the
square of the power supply voltage (V²). Therefore, by reducing the power supply
voltage to the lowest
level that provides the required performance, we can significantly reduce power consumption. We
also may be able to add parallel hardware and even further reduce the power supply voltage
while maintaining required performance.
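The V² dependence can be made concrete with the standard dynamic-power expression P = a·C·V²·f, where a is the activity factor, C the switched capacitance, V the supply voltage, and f the clock frequency. A sketch (the parameter names are ours; units are up to the caller):

```c
#include <assert.h>

/* Dynamic CMOS power: P = a * C * V^2 * f.
   a = activity factor, c = switched capacitance,
   v = supply voltage, f = clock frequency.
   The quadratic dependence on v is the point. */
double cmos_dynamic_power(double a, double c, double v, double f)
{
    return a * c * v * v * f;
}
```

For example, dropping the supply from 3.3 V to 1.8 V at the same frequency cuts dynamic power to roughly 30% of its original value, since (1.8/3.3)² ≈ 0.30.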
■ Toggling: A CMOS circuit uses most of its power when it is changing its output value.
This provides two ways to reduce power consumption. By reducing the speed at which the
circuit operates, we can reduce its power consumption (although not the total energy
required for the operation, since the result simply becomes available later). We can
also reduce energy consumption by eliminating unnecessary changes to the inputs of a
CMOS circuit: eliminating unnecessary glitches at the circuit outputs eliminates
unnecessary power consumption.
2.12 Session Summary


This session described the embedded system architecture, explained the function of each
building block of embedded system hardware, classified the types of embedded system
processors and their architectures, described the memory used in embedded systems,
described the I/O subsystem data transfer schemes, and finally discussed CPU
performance and enhancement parameters.
2.13 Student Activity
1. Define the following terminology as applied to embedded systems:
a. Sensors and Actuators
b. Microcontroller and microprocessor
c. RISC and CISC technology
2. With block diagrams, describe the following architectures:
a. Embedded system architecture
b. Internal processor architecture
3. Describe any FOUR characteristics of an embedded system.
4. With the aid of a block diagram, distinguish between Von Neumann and Harvard CPU
architectures.
5. Discuss the I/O subsystem data transfer schemes.
6. Describe the key CPU performance and enhancement parameters
2.14 Further Readings/References
Core Textbooks
i. Tim Wilmshurst (2009) Designing Embedded Systems with PIC Microcontrollers, Principles
and Applications, Second Edition, Newnes, ISBN-13: 978-1856177504, ISBN-10: 1856177505
ii. Rafiquzzaman M. (2014) Fundamentals of Digital Logic and Microcontrollers, 6th Edition,
Wiley, ISBN: 978-1-118-85579-9
iii. Robert B. R., Bruce J. W., Bryan A. J. (2014) Microcontrollers: From Assembly Language to C
Using the PIC24 Family, 2nd Edition, Cengage Learning PTR. ISBN-13: 978-1305076556,
ISBN-10: 1305076559
Core Journals
i. Journal of Microprocessors and Microsystems: Embedded Hardware Design, ISSN: 0141-
9331
ii. Journal of Electrical Engineering and Electronic Technology, ISSN 2325-9833
iii. International journal of electrical, computer, and systems engineering ISSN: 2077-1231
(Online) /2227-2739 (Print)
Recommended Textbooks
i. Dogan Ibrahim (2014) PIC Microcontroller Projects in C: Basic to Advanced, 2nd
Edition, Newnes, ISBN-10: 0080999247, ISBN-13: 978-0080999241
ii. Sabri Cetinkunt (2015) Mechatronics with Experiments, 2nd Edition, Wiley, ISBN: 978-1-
118-80246-5
iii. Ying Bai (2016) Practical Microcontroller Engineering with ARM Technology, Wiley, ISBN:
978-1-119-05237-1
Recommended Journals
i. IEEE Transactions on Communications ISSN:0090-6778
ii. IEEE journal on Selected Areas in Communications ISSN:0733-8716
iii. Journal of Microcontroller Engineering & Applications, ISSN 2455- 197X