Unit 1 Introduction To Embedded System Design
Unit 1: Syllabus
Introduction to Embedded System Design 09 Hrs
Introduction, Characteristics of Embedded Computing Applications, Concept
of Real time Systems, Challenges in Embedded System Design, Design Process:
Requirements, Specifications, Hardware Software Partitioning, Architecture
Design
Embedded System Architecture:
Processor performance Enhancement: Co-Processor & Hardware Accelerators,
Pipelining, Superscalar Execution, Multi Core CPUs.
2 MGRJ,ECE,RVCE
[Figure: Block diagram of a computer system (labels: MP, IC)]
Embedded System: Definition
An electronic/electromechanical system that is designed to perform a specific
function and is a combination of both hardware and firmware (software).
The Typical Embedded System
Source: Ref. 2
The Core of the Embedded Systems
The core of an embedded system falls into one (or more) of the following
categories.
• General Purpose and Domain Specific Processors
• Microprocessors
• Microcontrollers
• Digital Signal Processors
• Programmable Logic Devices (PLDs)
• Application Specific Integrated Circuits (ASICs)
• Commercial off the shelf Components (COTS)
Sensors & Actuators
Sensor:
A transducer device which converts energy from one form to another for
any measurement or control purpose. A sensor acts as an input device.
E.g. the Hall effect sensor which measures the distance between
the cushion and the magnet in the “Smart” running shoes from
Adidas.
Actuator:
A form of transducer device (mechanical or electrical) which converts
signals to a corresponding physical action (motion). An actuator acts as an
output device.
[Figure: Electronics-enabled “Smart” running shoes from Adidas. Photo courtesy of Adidas (www.adidas.com)]
Embedded Systems Vs General Computing Systems
• General Purpose System: A system which is a combination of generic hardware and a General Purpose Operating System for executing a variety of applications.
  Embedded System: A system which is a combination of special purpose hardware and an embedded OS for executing a specific set of applications.
• General Purpose System: Contains a General Purpose Operating System (GPOS).
  Embedded System: May or may not contain an operating system for functioning.
• General Purpose System: Applications are alterable (programmable) by the user (it is possible for the end user to re-install the Operating System, and add or remove user applications).
  Embedded System: The firmware of the embedded system is pre-programmed and it is non-alterable by the end user (there may be exceptions for systems supporting OS kernel image flashing through special hardware settings).
Embedded Systems Vs General Computing Systems…
• General Purpose System: Performance is the key deciding factor in the selection of the system. Always ‘Faster is Better’.
  Embedded System: Application specific requirements (like performance, power requirements, memory usage etc.) are the key deciding factors.
• General Purpose System: Less/not at all tailored towards reduced operating power requirements, options for different levels of power management.
  Embedded System: Highly tailored to take advantage of the power saving modes supported by hardware and Operating System.
• General Purpose System: Response requirements are not time critical.
  Embedded System: For certain categories of embedded systems, like mission critical systems, the response time requirement is highly critical.
What is this?
Characteristics of Embedded Computing Applications
Application and Domain Specific.
Reactive and Real time
Operates in Harsh environment
Distributed
Small size and Weight
Power Concerns
Compact Systems…
Characteristics…
Application and Domain Specific Systems
Embedded systems are not general-purpose computers.
Optimized for a specific application.
Many of the job characteristics are known before the hardware is designed
which allows the designer to focus on the specific design constraints of a well
defined application.
Embedded S/W usually cannot run on other embedded systems without
modification.
Hardware tailored to an application.
– Unnecessary circuitry is eliminated
– Resources shared if possible.
Characteristics …
Reactive & Real time systems…
One of the biggest challenges for embedded system designers is
performing an accurate worst case design analysis on systems with
statistical performance characteristics (e.g., cache memory on a DSP
or other embedded processor).
Accurately predicting the worst case may be difficult in complicated
architectures.
Real time system operation means the timing behavior of the system
should be deterministic. A real time system should not miss its deadlines.
Ex: Mission control, flight control systems, etc.
Characteristics …
Harsh environment
Many embedded systems do not operate in a controlled
environment.
Excessive heat is often a problem, especially in applications involving
combustion (e.g., Automobile applications).
Protection is needed against vibration, shock, lightning, power supply
fluctuations, water, corrosion, fire, and general physical damage.
The embedded system designer has to accurately model the different
parameters of the harsh real-world environment.
Characteristics …
Small and low weight
Many embedded computers are physically located within some larger system.
Challenge is to develop non-rectangular geometries for certain solutions.
Weight can also be a critical constraint.
Power Concerns
When controlling physical equipment, large current loads may need to be
switched in order to operate motors and other actuators.
Designer must carefully balance system tradeoffs among analog components,
power, mechanical, network, and digital hardware with corresponding software.
Power management needs to be considered when designing an embedded system.
The system should be designed so as to minimize its heat dissipation.
Higher power consumption means shorter battery life.
Characteristics …
Distributed Systems
A set of nodes connected by the network, cooperating to achieve a common goal
- Node: a µC + I/O + communication interface.
- One or multiple networks: wired, wireless.
Ex: Embedded systems in automobiles
Real Time Systems/Services
A real time service is triggered by a real world event and produces a
corresponding system response. How long this transformation of input to
output takes is a key design issue.
Real time services are often implemented by integrating H/W and S/W
components.
A real time system either polls sensors on a periodic basis, or the sensor
components provide digitized data at a known sampling interval with an
interrupt generated to the controller.
Real time systems are categorized into hard real time and soft real
time systems based on the time of completion.
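The hard/soft distinction can be illustrated with a minimal Python sketch. It assumes the usual interpretation that a missed deadline is a system failure in a hard real time system and only a loss of quality in a soft one. The `Service` class, the deadline values and the sample response times are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    deadline_ms: float  # hypothetical relative deadline for one response
    hard: bool          # True: hard real time, False: soft real time


def check(service, response_times_ms):
    """Return (number of missed deadlines, verdict) for measured responses."""
    missed = sum(1 for t in response_times_ms if t > service.deadline_ms)
    if missed == 0:
        verdict = "ok"
    elif service.hard:
        verdict = "failure"   # assumed: any miss is unacceptable for hard real time
    else:
        verdict = "degraded"  # assumed: a miss only reduces quality for soft real time
    return missed, verdict


flight = Service("flight control", deadline_ms=10.0, hard=True)
video = Service("video playback", deadline_ms=40.0, hard=False)
print(check(flight, [4.0, 9.0, 12.0]))
print(check(video, [30.0, 45.0, 38.0]))
```

The same measured response times therefore lead to different verdicts depending only on how the service is classified.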
Real Time Systems….
Service Response Timeline
Steps involved in the Design
Design happens mainly in three steps:
1. Modeling is the process of gaining a deeper understanding
of a system through imitation. Models express what a
system does or should do.
2. Design is the structured creation of hardware and
software. It specifies how a system does what it does.
3. Analysis is the process of gaining a deeper understanding
of a system through dissection. It specifies why a system
does what it does (or fails to do what a model says it
should do).
Source: NPTEL Course “Embedded System Design” by Arnab Sarkar, IIT, Guwahati
Design Process….
Each design step consists of a number of operations.
Modelling
• Requirements
• Specifications
• The product requirements are captured from the customer and converted into system level needs or processing requirements.
• English (or other natural language) is a common starting point.
• Computation models are used to capture the behaviour.
  E.g: Sequential program model, Finite state machine model, Communicating process model.
Design
• Architecture design
  - Architecture selection:
    • Choice of processing elements
    • Standard/custom/semi-custom HW
    • Memories, interfacing, communication
• Hardware software partitioning
• Hardware software codesign
• Component design
• System integration
Analysis
• Functionality test
• Objective and closeness functions defined by combining metrics like power, area, etc. are evaluated.
• If it is not close to the expected value, the design and modelling processes are reiterated.
• In Real Time (RT) systems, timing behaviour must be verified in addition to functional correctness.
Example: An Elevator Controller
Partial English Description
“Move the elevator either up or down to reach the requested floor. Once at the
requested floor, open the door for at least 10 seconds, and keep it open until the
requested floor changes. Ensure the door is never open while moving. Don’t change
directions unless there are no higher requests when moving up or no lower requests
when moving down…”
Source: NPTEL Course “Embedded System Design” by Arnab Sarkar, IIT, Guwahati
Elevator Controller…
Simple Elevator Controller
‒Request Resolver resolves various floor requests into
single requested floor.
‒Unit Control moves elevator to this requested floor.
Modelling: Sequential Program Model
Declarations:
Modelling: Finite State Machine(FSM) model
FSM for the UnitControl process.
An FSM model describes a system in terms of:
- Possible states:
E.g: Idle, GoingDown, GoingUp, DoorOpen
- Possible transitions from one state to another based on input
E.g: req > floor
- Actions that occur in each state
E.g: In the GoingUp state, u,d,o,t = 1,0,0,0 (up = 1; down, open, and timer_start = 0)
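The states, transitions and actions listed above can be sketched as a small Python state machine. Only the four state names, the `req > floor` condition and the GoingUp outputs come from the text. The remaining output rows, the other transition conditions, the 10-tick door timer (one tick taken as one second) and the one-floor-per-tick motion are assumptions made for illustration.

```python
# Outputs per state as (up, down, open, timer_start).
# Only the GoingUp row is given in the text; the other rows are assumed.
OUTPUTS = {
    "Idle":      (0, 0, 1, 0),
    "GoingUp":   (1, 0, 0, 0),
    "GoingDown": (0, 1, 0, 0),
    "DoorOpen":  (0, 0, 1, 1),
}


def next_state(state, req, floor, timer):
    """One transition of the assumed UnitControl FSM."""
    if state == "Idle":
        if req > floor:
            return "GoingUp"
        if req < floor:
            return "GoingDown"
        return "Idle"
    if state == "GoingUp":
        return "GoingUp" if req > floor else "DoorOpen"
    if state == "GoingDown":
        return "GoingDown" if req < floor else "DoorOpen"
    if state == "DoorOpen":
        return "DoorOpen" if timer < 10 else "Idle"
    raise ValueError(f"unknown state: {state}")


def simulate(req, floor, max_ticks=100):
    """Run the FSM for one request; returns (visited states, final floor)."""
    state, timer = "Idle", 0
    trace = [state]
    for _ in range(max_ticks):
        nxt = next_state(state, req, floor, timer)
        if state == "Idle" and nxt == "Idle":
            break  # no pending request
        if nxt == "GoingUp":
            floor += 1  # assumed: the elevator moves one floor per tick
        elif nxt == "GoingDown":
            floor -= 1
        elif nxt == "DoorOpen":
            # restart the timer on entry, count ticks while the door stays open
            timer = timer + 1 if state == "DoorOpen" else 0
        state = nxt
        trace.append(state)
    return trace, floor


trace, final_floor = simulate(req=3, floor=1)
print(trace[:4], final_floor)
```

Keeping the output table separate from the transition function mirrors the way an FSM diagram separates actions from arcs, which makes each part easy to check against the diagram.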
FSM model…
UnitControl process using a state machine.
Hardware Software Partitioning
Many functions can be done either by software on a general purpose microprocessor or by
hardware on application specific ICs (ASICs).
E.g: Game console graphics, PWM, PID control (hardware).
This leads to the Hardware/Software Co-design concept.
Where to place functionality?
E.g: A sort algorithm is faster in hardware, but more expensive;
it is more flexible in software, but slower.
Designer must be able to explore these various trade-offs:
▪ Speed.
▪ Reliability.
▪ Cost
▪ Form (size, weight, and power constraints.)
Hardware Software Partitioning…
Move “bottleneck” computations from software to hardware.
Hardware Implementation
Source: http://class.ece.iastate.edu/cpre488/lectures/Lect-08.pdf
Example:
FIR Filter
Source: NPTEL Course “Embedded System Design” by Arnab Sarkar, IIT, Guwahati
Hardware Software Co-design
Source: NPTEL Course “Embedded System Design” by Arnab Sarkar, IIT, Guwahati
Zynq 7000 SoC
Comment on this architecture.
Tutorial 1
Problem 1
Design a HONEY BEE COUNTER system with the following specifications. The bees
are assumed to enter the bee hive, a rectangular box, through a small hole. Another
hole is made for the bees to exit. Assume suitable sensors are placed at the entry and exit
holes. The system is designed to display the number of bees in the hive at any time.
Assume there are initially no bees in the hive.
Draw the block diagram and write the pseudo code of the above system implementation.
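One possible starting point for the counting logic is sketched below in Python. It assumes that each sensor delivers one clean pulse per bee and that the pulses arrive as a time-ordered list of `"entry"` and `"exit"` strings. The event names and the rule of ignoring an exit pulse on an empty hive are illustrative choices, not part of the problem statement.

```python
def count_bees(events):
    """Return the displayed count after each sensor event.

    events: iterable of "entry" / "exit" pulses in time order (assumed format).
    """
    count = 0  # initially there are no bees in the hive
    history = []
    for event in events:
        if event == "entry":
            count += 1
        elif event == "exit" and count > 0:
            count -= 1  # a spurious exit pulse on an empty hive is ignored
        history.append(count)  # value that would be sent to the display
    return history


print(count_bees(["entry", "entry", "exit", "exit", "exit", "entry"]))
```

On an MCU the two branches would typically live in the entry and exit sensor interrupt handlers, with the display refreshed from the shared counter.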
Tutorial 1
Problem 2
Design an MCU based system to control the temperature of a furnace with the following
specifications. The furnace temperature has to be maintained at 30 ± 1 °C. Connect
suitable sensors and actuators. Display the temperature on an LCD. The power
consumption has to be minimized. Show the design and implementation (diagram
+ program).
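A minimal sketch of one possible control law is on/off control with hysteresis, shown here in Python. It assumes a 30 °C setpoint with a 1 °C band and a heater as the only actuator. The names `SETPOINT_C`, `BAND_C` and `heater_command` are hypothetical, and a real furnace with thermal lag would probably need tighter switching thresholds to stay inside the band.

```python
SETPOINT_C = 30.0  # assumed target temperature
BAND_C = 1.0       # assumed allowed deviation


def heater_command(temp_c, heater_on):
    """Return the new heater state for the measured temperature."""
    if temp_c <= SETPOINT_C - BAND_C:
        return True       # too cold: switch the heater on
    if temp_c >= SETPOINT_C + BAND_C:
        return False      # too hot: switch the heater off
    return heater_on      # inside the band: keep the previous state


state = False
for reading in [28.5, 29.5, 30.5, 31.2, 30.4]:
    state = heater_command(reading, state)
    print(reading, state)
```

Holding the previous state inside the band avoids rapid relay switching, and the MCU can sleep between samples, which helps with the power requirement.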
Tutorial 1
Problem 3
MCUs are used for the control automation of a chemical plant. One MCU is used to
control the liquid flow of the blast furnace. Another MCU is used to control the
temperature of the blast furnace. The liquid level and temperature of the blast furnace are
displayed in the master room, which is powered by another MCU.
Show the block diagram connecting the different MCUs. Comment on the interface to
be used.
Enhancing Performance of Processors
Hardware Accelerators & Coprocessors
Hardware accelerators and coprocessors can be used to create more efficient, higher throughput
designs.
Hardware accelerators are dedicated fixed-function peripherals designed to perform a
single computationally intensive task over and over.
They offload this work from the main processor, which has a general purpose instruction set,
leaving it free for general-purpose tasks.
The use of accelerators and coprocessors is not a new concept.
E.g. the Intel 8087 math coprocessor, released in the 1980s.
However, it received renewed interest around 2002 due to the stall in single thread performance:
- Frequency scaling became unsustainable with smaller IC feature sizes.
- Instruction-level parallelism (ILP) can go only so far.
Hardware Accelerators..
Analog Devices SHARC® ADSP-2146x
SHARC® ADSP-2146x processor incorporates hardware accelerators for
implementing three widely used signal processing operations: FIR (finite impulse
response), IIR (infinite impulse response), and FFT (fast Fourier transform).
The ADSP-2146x core has a maximum clock rate of 450 MHz. By using SIMD (single-
instruction multiple-data), the core can perform two MAC (multiply-accumulate)
operations per clock cycle for a peak rate of 900 MMAC/sec.
The accelerator, in comparison, operates at a clock rate of 225 MHz. Using its four
dedicated MAC units, the FIR accelerator achieves a peak theoretical throughput of
900 MMAC/sec.
Source: White paper on hardware accelerators in SHARC processors by Paul Beckmann, DSP Concepts, LLC.
Analog Devices SHARC® ADSP-2146x
Consider a home theatre system with 7.1 channels of audio at 96 kHz operating at a
block size of 32 samples. Assume that room equalization is being applied by 8 FIR
filters, each 512 points long.
No. of MAC operations: 8 × 512 × 96 kHz ≈ 393 MMAC/sec.
If the core CPU were to perform the filtering, it would take about 44% of the 900 MMAC/sec
peak rate of a 450 MHz SHARC processor (393/900 ≈ 0.44).
This FIR processing represents a significant portion of the overall computation of
CPU and fortunately can be offloaded to the accelerator.
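The arithmetic behind these figures can be checked with a short Python sketch. The helper names `peak_mmac` and `fir_load_mmac` are hypothetical. The clock rates, MAC counts, filter count, filter length and sample rate are the values quoted above.

```python
def peak_mmac(clock_mhz, macs_per_cycle):
    """Peak multiply-accumulate rate in MMAC/sec."""
    return clock_mhz * macs_per_cycle


def fir_load_mmac(num_filters, taps, sample_rate_hz):
    """MAC rate needed by the FIR filters in MMAC/sec (one MAC per tap per sample)."""
    return num_filters * taps * sample_rate_hz / 1e6


core_peak = peak_mmac(450, 2)         # SIMD core: two MACs per cycle
accel_peak = peak_mmac(225, 4)        # FIR accelerator: four MAC units
load = fir_load_mmac(8, 512, 96_000)  # room equalization workload
core_share = load / core_peak

print(core_peak, accel_peak)
print(round(load, 1), round(core_share * 100))
```

Both engines have the same theoretical peak, so moving the filters to the accelerator frees a little under half of the core for other work.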
ARM NEON Hardware Accelerators..
Arm NEON technology is an advanced SIMD (single instruction multiple data) architecture
extension for the Arm Cortex-A series and Cortex-R52 processors.
NEON technology is intended to improve the multimedia user experience by accelerating
audio and video encoding/decoding, user interface, 2D/3D graphics or gaming.
NEON instructions allow up to:
Enhancing Performance of Processors…
Pipelining
A pipeline is a cascaded connection of processing stages which are connected to
perform a fixed function over a stream of data flowing from one end to the other.
In modern CPUs, the pipelines are applied for instruction execution, arithmetic
computation and memory access operations.
The pipeline is constructed with $k$ processing stages. The processed results are passed
from stage $S_i$ to stage $S_{i+1}$ for all $i = 1, 2, \ldots, k-1$.
$S_i$ = stage $i$
$L$ = latch
$\tau$ = clock period
$\tau_m$ = maximum stage delay
$d$ = latch delay
Source: Kai Hwang, “Advanced Computer Architecture”, Tata McGraw Hill Education.
Pipelining…
Clock Cycle ($\tau$)
$$\tau = \max_{1 \le i \le k}\{\tau_i\} + d = \tau_m + d$$
Pipeline Frequency or Maximum Throughput
$$f = \frac{1}{\tau}$$
Ideally, one result is expected to come out of the pipeline per cycle.
However, depending on the initiation rate of successive tasks, the actual throughput of the
pipeline may be lower than $f$.
Speedup ($S_k$)
Ideally, a pipeline with $k$ stages can process $n$ tasks in $k + (n - 1)$ clock cycles, where $k$
cycles are needed to complete the execution of the very first task and the remaining $(n - 1)$
tasks require $(n - 1)$ cycles.
Total time required: $T_k = [k + (n - 1)]\tau$
Speedup….
Consider an equivalent-function nonpipelined processor which has a flow-through delay
of $k\tau$.
Total time required: $T_1 = nk\tau$.
Speedup factor:
$$S_k = \frac{T_1}{T_k} = \frac{nk\tau}{[k + (n - 1)]\tau} = \frac{nk}{k + (n - 1)}$$
Speedup….
As $n \to \infty$ the speedup approaches $k$, so more stages promise more speedup.
However, the number of stages cannot be increased indefinitely because of practical constraints
on cost, implementation complexity, circuit implementation, etc.
The figure shows the optimal number of pipeline stages (performance/cost ratio vs. number of
stages).
In practice, most pipelining is staged with $2 \le k \le 15$. Very few pipelines are
designed to exceed 10 stages in real computers.
Speedup….
The efficiency $E_k$ is defined as:
$$E_k = \frac{S_k}{k} = \frac{n}{k + (n - 1)}$$
$E_k \to 1$ as $n \to \infty$ (upper bound)
$E_k = \frac{1}{k}$ when $n = 1$ (lower bound)
The pipeline throughput $H_k$ is defined as the number of tasks performed per unit
time.
$$H_k = \frac{n}{[k + (n - 1)]\tau} = \frac{nf}{k + (n - 1)} = E_k \cdot f$$
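The clock period, speedup, efficiency and throughput relations can be evaluated numerically with a small Python sketch. The function name `pipeline_metrics` and the example values (4 stages, 100 tasks, 9 ns maximum stage delay, 1 ns latch delay) are assumptions chosen only to exercise the formulas.

```python
def pipeline_metrics(k, n, tau_m, d):
    """Ideal linear-pipeline metrics for k stages and n tasks.

    tau_m: maximum stage delay, d: latch delay (same time unit, e.g. ns).
    """
    tau = tau_m + d                # clock period
    f = 1.0 / tau                  # pipeline frequency (maximum throughput)
    t_pipe = (k + (n - 1)) * tau   # T_k: time on the pipeline
    t_nonpipe = n * k * tau        # T_1: time on the nonpipelined processor
    speedup = t_nonpipe / t_pipe   # S_k
    efficiency = speedup / k       # E_k
    throughput = n / t_pipe        # H_k, tasks per time unit
    return {"tau": tau, "f": f, "S_k": speedup, "E_k": efficiency, "H_k": throughput}


m = pipeline_metrics(k=4, n=100, tau_m=9.0, d=1.0)
print(m)
print(pipeline_metrics(k=4, n=1, tau_m=9.0, d=1.0)["E_k"])  # lower bound 1/k
```

With many tasks the efficiency approaches 1 and the speedup approaches the stage count, while a single task gains nothing from the pipeline.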
Instruction Pipeline
ARM Cortex M3 three stage pipeline:
- The figure below shows 3 stage pipeline of Cortex M3.
- The Fetch stage fetches instructions from memory, presumably one per cycle.
-The Decode stage reveals the instruction function to be performed and
identifies resources needed .
-The instructions are executed in Execute stage.
[Figure: Cortex M3 three-stage pipeline timing diagram; horizontal axis: cycles 1, 2, 3, 4, 5, …]
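The overlap of Fetch, Decode and Execute across successive instructions can be visualised with a short Python sketch that prints a space-time diagram. It assumes an ideal pipeline with one instruction entering per cycle and no stalls. The single-letter stage labels and the function name `space_time` are illustrative.

```python
STAGES = ["F", "D", "E"]  # Fetch, Decode, Execute


def space_time(num_instructions, stages=STAGES):
    """Return one row per instruction; column c shows its stage in cycle c+1."""
    k = len(stages)
    total_cycles = k + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = ["."] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i is in stage s during cycle i+s+1
        rows.append("".join(row))
    return rows


for line in space_time(3):
    print(line)
```

Three instructions finish in 3 + 3 − 1 = 5 cycles, which matches the ideal $k + (n - 1)$ cycle count for a $k$-stage pipeline.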
Instruction Pipeline….
A seven stage instruction pipeline
- The figure below shows a seven stage pipeline with three Execute (E) stages.
- The Issue (I) stage reserves resources and controls pipeline interlocks.
- The Writeback (W) stage is used to write results back into the registers.
Seven stage Instruction Pipeline….
Seven stage Instruction Pipeline….
The following figure shows improved timing after the instruction issuing order
is changed (out of order execution) to eliminate unnecessary delays due to
dependence.
Enhancing Performance of Processors
Superscalar Execution
In a superscalar execution, multiple instruction pipelines are used. This implies multiple
instructions are issued per cycle and multiple results are generated per cycle.
Superscalar processors are designed to exploit more instruction level parallelism in
user programs.
Only independent instructions can be executed in parallel without causing a wait state.
The amount of instruction level parallelism varies widely depending on type of code
being executed.
The instruction issue degree in a superscalar processor has been limited to 2 to 5 in
practice (the average number of instructions that can be executed in parallel is about 2 without loop
unrolling).
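The effect of the issue degree can be estimated with a short Python sketch. It assumes an ideal case in which all instructions are independent, so that an m-issue machine with k-stage pipelines starts m instructions every cycle. The function name `ideal_cycles` and the example values are hypothetical, and real code with dependences will need more cycles.

```python
import math


def ideal_cycles(n, k, m=1):
    """Cycles to run n independent instructions on an m-issue, k-stage pipeline."""
    return k + math.ceil(n / m) - 1


scalar = ideal_cycles(n=12, k=4, m=1)       # single-issue pipeline
superscalar = ideal_cycles(n=12, k=4, m=3)  # issue degree 3
print(scalar, superscalar, scalar / superscalar)
```

Even in this ideal case the gain is less than the issue degree, because the pipeline fill time of k − 1 cycles is not reduced by issuing more instructions per cycle.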
Superscalar Execution….
[Figure: superscalar pipeline timing; horizontal axis: time in cycles]
Enhancing Performance of Processors..
VLIW architecture
• The Very Long Instruction Word (VLIW) architecture uses more
functional units.
• The CPI of VLIW processor is less compared to CISC & RISC processors.
• 256 or 1024 bits per instruction word.
• Programs are written in conventional short instruction words.
• The code compaction must be done by compiler.
• Instruction parallelism and data movement in a VLIW architecture are
completely specified at the compile time.
VLIW architecture…
Enhancing Performance of Processors…
Multi core CPUs
Power and frequency limitations observed on single core implementations have
paved the way for multicore technology.
The clock frequency of single core CPUs is limited in practice to about 4 GHz, because increasing the
frequency beyond this sharply increases power dissipation.
A multi-core processor is typically a single processor chip which contains several
cores.
The individual cores on a multi-core processor don’t necessarily run as fast as the
highest performing single-core processors, but they improve overall performance by
handling more tasks in parallel.
Multi core CPUs..
The multiple cores inside the chip are not clocked at a higher frequency; instead, their
capability to execute programs in parallel contributes to the overall
performance, making them more energy efficient, lower power cores, as shown in the
figure below.
Multi-core processors can also be implemented as a combination of
homogeneous and heterogeneous cores.
In a homogeneous core architecture, all the cores in the CPU are identical and they
improve the overall processor performance
by breaking up a computationally intensive
application into less computationally intensive
tasks and executing them in parallel.
E.g: AMD dual cores and Intel Core 2 Duo and
Quad cores.
Source: Intel Higher Education Program & FAER.
Multi core CPUs..
Challenges with multicores:
CPU Benchmarking standards
MIPS(Million Instructions Per Second)
$$\text{MIPS} = \frac{\text{Clock Frequency}}{\text{CPI} \times 1{,}000{,}000}$$
MIPS is only an approximation of a processor's performance because the
instructions of some processors do more work per instruction than those of others.
A computer rated at 100 MIPS may be able to compute certain values faster than
another computer rated at 120 MIPS.
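Why a lower MIPS rating can still win is shown by the following Python sketch. The two processors, their clock rates, CPI values and instruction counts are hypothetical numbers chosen for illustration. Only the MIPS formula itself comes from the text.

```python
def mips(clock_hz, cpi):
    """MIPS rating from clock frequency and average cycles per instruction."""
    return clock_hz / (cpi * 1_000_000)


def exec_time_s(instruction_count, clock_hz, cpi):
    """Execution time of a program with the given instruction count."""
    return instruction_count * cpi / clock_hz


# Processor A: richer instructions, so the same job needs fewer of them.
mips_a = mips(100e6, 1.0)
time_a = exec_time_s(1.0e6, 100e6, 1.0)

# Processor B: higher MIPS rating, but simpler instructions and a longer program.
mips_b = mips(120e6, 1.0)
time_b = exec_time_s(1.5e6, 120e6, 1.0)

print(mips_a, time_a)
print(mips_b, time_b)
```

Execution time depends on the instruction count as well as the rate, which is why benchmarks based on a fixed workload are preferred over a raw MIPS figure.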
Dhrystone MIPS: Dhrystone is a standard program consisting of arithmetic &
logical operations on integers and is used to benchmark CPU.
MIPS…
Tutorial 4
The execution times (in seconds) of three programs on three MCUs are given
below:
CPU Benchmarking standards
MFLOPS(Mega Floating Point Operations per Second)
Coremark CPU Benchmarking standards
Suggested reading
MMAC
CoreMark: https://www.eembc.org/coremark/
SPECint2006, SPECfp2006