
Unit 1 Introduction To Embedded System Design

This document provides an overview of embedded system design. It discusses the characteristics of embedded computing applications including being application specific, reactive and real-time, operating in harsh environments, distributed, small in size and weight, with power concerns. It also covers embedded system architecture, processors, buses, memory, ports and I/O. Block diagrams of computer systems and typical embedded systems are shown.

Uploaded by

Ahan Tejaswi

RV College of Engineering

Unit 1
Unit 1: Syllabus
• Introduction to Embedded System Design (09 Hrs)
Introduction, Characteristics of Embedded Computing Applications, Concept
of Real time Systems, Challenges in Embedded System Design, Design Process:
Requirements, Specifications, Hardware Software Partitioning, Architecture
Design
• Embedded System Architecture:
Processor performance Enhancement: Co-Processor & Hardware Accelerators,
Pipelining, Superscalar Execution, Multi Core CPUs.

2 MGRJ,ECE,RVCE
Block diagram of a computer system

[Figure: block diagram of a computer system. Source: T. L. Floyd, “Digital Fundamentals”, 9e]

• Bus: a collection of wires

Block diagram of a computer
• All computer systems consist of basic functional blocks that include a
CPU, memory and input/output ports.
• These blocks are connected together by three internal buses: the address
bus, the control bus and the data bus, collectively called the system bus.
• A port is a physical interface on a computer through which data is passed
to and from peripherals.
• The memory includes program memory (ROM), which stores the instructions
to be executed to solve a specific problem, and data memory, which stores data
while the instructions execute.

Embedded System: Definition
An electronic/electromechanical system which is designed to perform a specific
function and is a combination of both hardware and firmware (software).

E.g. Electronic Toys, Mobile Handsets, Washing Machines, Air Conditioners,
Automotive Control Units, Set Top Boxes, DVD Players, etc.

The Typical Embedded System

Source: Ref. 2
The Core of the Embedded Systems
The core of the embedded system falls into one (or more) of the following
categories:
• General Purpose and Domain Specific Processors
• Microprocessors
• Microcontrollers
• Digital Signal Processors
• Programmable Logic Devices (PLDs)
• Application Specific Integrated Circuits (ASICs)
• Commercial off the shelf Components (COTS)

Sensors & Actuators
Sensor:
A transducer device which converts energy from one form to another for
any measurement or control purpose. A sensor acts as an input device.
E.g. the Hall Effect sensor which measures the distance between
the cushion and the magnet in the Smart Running shoes from Adidas.
Actuator:
A form of transducer device (mechanical or electrical) which converts
signals to a corresponding physical action (motion). An actuator acts as an
output device.
E.g. the micro motor actuator which adjusts the position of the
cushioning element in the Smart Running shoes from Adidas.

[Figure: Electronics-enabled “Smart” running shoes from Adidas. Photo courtesy of Adidas (www.adidas.com)]
Communication Interface
• Communication interface is essential for communicating with various subsystems
of the embedded system and with the external world.
• Serial interfaces like I2C, SPI, UART, 1-Wire etc., and Parallel bus interface are
examples of ‘Onboard Communication Interface’.
• The ‘Product level communication interface’ (External Communication
Interface) is responsible for data transfer between the embedded system and
other devices or modules.
• Infrared (IR), Bluetooth (BT), Wireless LAN (Wi-Fi), Radio Frequency waves
(RF), etc. are examples of wireless external communication interfaces.
• RS-232C/RS-422/RS-485, USB, Ethernet (TCP/IP), parallel port, etc. are
examples of wired external communication interfaces.
Embedded Firmware/Software
• Embedded firmware is the control algorithm (program instructions) and the
configuration settings that an embedded system developer loads
into the code (program) memory of the embedded system.
• The embedded firmware can be developed in several ways, such as:
- Writing the program in a high level language.
- Writing the program in Assembly Language.

Embedded Systems Vs General Computing Systems
General Purpose System | Embedded System
A system which is a combination of generic hardware and a General Purpose Operating System for executing a variety of applications. | A system which is a combination of special purpose hardware and an embedded OS for executing a specific set of applications.
Contains a General Purpose Operating System (GPOS). | May or may not contain an operating system for functioning.
Applications are alterable (programmable) by the user (it is possible for the end user to re-install the Operating System, and add or remove user applications). | The firmware of the embedded system is pre-programmed and is non-alterable by the end user (there may be exceptions for systems supporting OS kernel image flashing through special hardware settings).
Performance is the key deciding factor in the selection of the system. Always ‘Faster is Better’. | Application specific requirements (like performance, power requirements, memory usage, etc.) are the key deciding factors.
Less/not at all tailored towards reduced operating power requirements or options for different levels of power management. | Highly tailored to take advantage of the power saving modes supported by the hardware and Operating System.
Response requirements are not time critical. | For certain categories of embedded systems, like mission critical systems, the response time requirement is highly critical.
Need not be deterministic in execution behavior. | Execution behavior is deterministic for certain types of embedded systems, like ‘Hard Real Time’ systems.

What is this?

Characteristics of Embedded Computing Applications
• Application and Domain Specific
• Reactive and Real time
• Operates in Harsh environment
• Distributed
• Small size and Weight
• Power Concerns
• Compact Systems…

Characteristics…
• Application and Domain Specific Systems
• Embedded systems are not general-purpose computers.
• Optimized for a specific application.
• Many of the job characteristics are known before the hardware is designed,
which allows the designer to focus on the specific design constraints of a
well-defined application.
• Embedded S/W usually cannot run on other embedded systems without
modification.
• Hardware tailored to an application.
- Unnecessary circuitry is eliminated.
- Resources are shared if possible.

Characteristics …

• Reactive & Real time Systems
• A typical embedded system senses the environment via sensors
and controls the environment using actuators.
• Embedded systems are expected to run at the speed of the
environment. This characteristic is called “reactive”.
• Reactive computation means that the system executes in response to
external events.
• External events can be either periodic or aperiodic.

Reactive & Real time systems…
• One of the biggest challenges for embedded system designers is
performing an accurate worst case design analysis on systems with
statistical performance characteristics (e.g., cache memory on a DSP
or other embedded processor).
• Accurately predicting the worst case may be difficult in complicated
architectures.
• Real time system operation means the timing behavior of the system
should be deterministic. A real time system should not miss its deadlines.
Ex: Mission control, flight control systems, etc.

Characteristics …
• Harsh environment
• Many embedded systems do not operate in a controlled
environment.
• Excessive heat is often a problem, especially in applications involving
combustion (e.g., automobile applications).
• Protection is needed from vibration, shock, lightning, power supply
fluctuations, water, corrosion, fire, and general physical damage.
• The embedded system designer has to accurately model the different
parameters of the harsh environment in the real world.

Characteristics …
• Small and low weight
• Many embedded computers are physically located within some larger system.
• The challenge is to develop non-rectangular geometries for certain solutions.
• Weight can also be a critical constraint.

• Power Concerns
• When controlling physical equipment, large current loads may need to be
switched in order to operate motors and other actuators.
• The designer must carefully balance system tradeoffs among analog components,
power, mechanical, network, and digital hardware with corresponding software.
• Power management needs to be considered when designing an embedded system.
The system should be designed so as to minimize the heat it dissipates.
Higher power consumption means shorter battery life.
Characteristics …
• Distributed Systems
• A set of nodes connected by a network, cooperating to achieve a common goal
- Node: a µC + I/O + communication interface.
- One or multiple networks: wired, wireless.
Ex: Embedded systems in automobiles

Real Time Systems/Services
• A real time service is triggered by a real world event and produces a
corresponding system response. How long this transformation of input to
output takes is a key design issue.
• Real time services are often implemented by integrating H/W and S/W
components.
• Real time systems either poll sensors on a periodic basis, or the sensor
components provide digitized data at a known sampling interval with an
interrupt generated to the controller.
• Real time systems are categorized into Hard real time and Soft real
time systems based on time of completion.

Real Time Systems….
Service Response Timeline

Steps involved in the Design
Design happens mainly in three steps:
1. Modeling is the process of gaining a deeper understanding
of a system through imitation. Models express what a
system does or should do.
2. Design is the structured creation of hardware and
software. It specifies how a system does what it does.
3. Analysis is the process of gaining a deeper understanding
of a system through dissection. It specifies why a system
does what it does (or fails to do what a model says it
should do).

Source: NPTEL Course “Embedded System Design” by Arnab Sarkar, IIT, Guwahati
Design Process….
• Each design step consists of a number of operations.

Modelling
• Requirements
• Specifications
• The product requirements are captured from the customer and converted into system level needs or processing requirements.
• English (or another natural language) is a common starting point.
• Computation models are used to capture the behaviour.
E.g: Sequential program model, Finite state machine model, Communicating process model.

Design
• Architecture design
- Architecture selection:
• Choice of processing elements
• Standard/custom/semi-custom HW
• Memories, interfacing, communication
• Hardware software partitioning
• Hardware software codesign
• Component design
• System integration

Analysis
• Functionality test
• Objective and closeness functions, defined by combining metrics like power, area, etc., are evaluated.
• If the result is not close to the expected value, the design and modelling processes are reiterated.
• In Real Time (RT) systems, timing behaviour must be verified in addition to functional correctness.

Example: An Elevator Controller
• Partial English Description
“Move the elevator either up or down to reach the requested floor. Once at the
requested floor, open the door for at least 10 seconds, and keep it open until the
requested floor changes. Ensure the door is never open while moving. Don’t change
directions unless there are no higher requests when moving up or no lower requests
when moving down…”

Source: NPTEL Course “Embedded System Design” by Arnab Sarkar, IIT, Guwahati
Elevator Controller…
• Simple Elevator Controller
- The Request Resolver resolves the various floor requests into a
single requested floor.
- The Unit Control moves the elevator to this requested floor.

Modelling: Sequential Program Model
Declarations:

Modelling: Finite State Machine(FSM) model
• FSM for the UnitControl process.
• An FSM model is described by considering systems with:
- Possible states:
E.g: Idle, GoingDown, GoingUp, DoorOpen
- Possible transitions from one state to another based on input
E.g: req > floor
- Actions that occur in each state
E.g: In the GoingUp state, u,d,o,t = 1,0,0,0 (up = 1; down, open, and timer_start = 0)
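
The states, transitions and actions listed above can be sketched as a small state machine. This is only an illustrative sketch: the state names and the GoingUp outputs come from the slide, while the outputs of the other states, the exact transition conditions and the 10 second door timer (taken from the English description of the elevator) are assumptions.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    GOING_UP = "GoingUp"
    GOING_DOWN = "GoingDown"
    DOOR_OPEN = "DoorOpen"


# Outputs per state as (up, down, open, timer_start).
# Only the GoingUp row is given on the slide; the others are assumed.
OUTPUTS = {
    State.IDLE: (0, 0, 1, 0),
    State.GOING_UP: (1, 0, 0, 0),
    State.GOING_DOWN: (0, 1, 0, 0),
    State.DOOR_OPEN: (0, 0, 1, 1),
}

DOOR_OPEN_SECONDS = 10  # assumed from "open the door for at least 10 seconds"


def next_state(state, req, floor, timer):
    """Return the next state given the requested floor, current floor and door timer."""
    if state is State.IDLE:
        if req > floor:
            return State.GOING_UP
        if req < floor:
            return State.GOING_DOWN
        return State.IDLE
    if state is State.GOING_UP:
        # Keep moving up until the requested floor is reached.
        return State.GOING_UP if req > floor else State.DOOR_OPEN
    if state is State.GOING_DOWN:
        # Keep moving down until the requested floor is reached.
        return State.GOING_DOWN if req < floor else State.DOOR_OPEN
    # DOOR_OPEN: hold the door until the timer expires.
    return State.DOOR_OPEN if timer < DOOR_OPEN_SECONDS else State.IDLE
```

Keeping the outputs in a table and the transitions in one function mirrors the FSM view: the door can only be open in states where neither up nor down is asserted.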

FSM model…
• UnitControl process using a state machine.

Hardware Software Partitioning
• Many functions can be done either in software on a general purpose microprocessor or in
hardware on application specific ICs (ASICs).
E.g: Game console graphics, PWM, PID control (hardware).
• This leads to the Hardware/Software Co-design concept.
• Where to place functionality?
E.g: A sort algorithm is faster in hardware, but more expensive.
It is more flexible in software, but slower.
• The designer must be able to explore these various trade-offs:
▪ Speed
▪ Reliability
▪ Cost
▪ Form (size, weight, and power constraints)
Hardware Software Partitioning…
• Move “bottleneck” computations from software to hardware.

Hardware Implementation

Example: FIR Filter

Source: http://class.ece.iastate.edu/cpre488/lectures/Lect-08.pdf

Hardware Software Partitioning…

Source: NPTEL Course “Embedded System Design” by Arnab Sarkar, IIT, Guwahati
Hardware Software Co-design

Source: NPTEL Course “Embedded System Design” by Arnab Sarkar, IIT, Guwahati
Zynq 7000 SoC
• Comment on this architecture.

Tutorial 1
Problem 1
Design a HONEY BEE COUNTER system with the following specifications. The bees
are assumed to enter the beehive, a rectangular box, through a small hole. Another
hole is made for the bees to exit. Assume suitable sensors are placed at the entry and exit
holes. The system is designed to display the number of bees in the hive at any time.
Assume there are initially no bees in the hive.
Draw the block diagram and write the pseudo code for the above system implementation.
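
One possible starting point for the pseudo code is sketched below. It is an assumption-laden illustration, not a model answer: the class name `BeeCounter` and its methods are hypothetical, and each sensor is assumed to raise exactly one event per bee passing through its hole.

```python
class BeeCounter:
    """Tracks the number of bees in the hive from entry and exit sensor events."""

    def __init__(self):
        # The problem statement assumes the hive starts empty.
        self.count = 0

    def on_entry(self):
        """Called once for every bee detected at the entry hole."""
        self.count += 1

    def on_exit(self):
        """Called once for every bee detected at the exit hole."""
        # Guard against a spurious exit event driving the count negative.
        if self.count > 0:
            self.count -= 1

    def display(self):
        """Text that would be sent to the display."""
        return f"Bees in hive: {self.count}"
```

On a real MCU the two event handlers would typically be interrupt service routines attached to the sensor inputs, with the display refreshed from the main loop.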

Tutorial 1

Problem 2
Design an MCU based system to control the temperature of a furnace with the following
specifications. The furnace temperature has to be maintained at 30±10C. Connect
suitable sensors & actuators. Display the temperature on an LCD. The power
consumption has to be minimized. Show the design & implementation (diagram
+ program).

Tutorial 1

Problem 3
MCUs are used to automate the control of a chemical plant. One MCU is used to
control the liquid flow of the blast furnace. Another MCU is used to control the
temperature of the blast furnace. The liquid level & temperature of the blast furnace are
displayed in the master room, which is powered by another MCU.
Show the block diagram connecting the different MCUs. Comment on the interface to
be used.

Enhancing Performance of Processors
Hardware Accelerators & Coprocessors
• We can use hardware accelerators and coprocessing to create more efficient, higher throughput
designs.
• Hardware accelerators are dedicated fixed-function peripherals designed to perform a
single computationally intensive task over and over.
• They offload work from the main processor, which has a general purpose instruction set,
freeing it for general-purpose tasks.
• The use of accelerators & coprocessors is not a new concept.
E.g. the Intel 8087 math coprocessor, released in the 1980s.
• But it received renewed interest around 2002 due to the stall in single thread performance.
- Frequency scaling became unsustainable with smaller IC feature sizes.
- Instruction-level parallelism (ILP) can go only so far.
Hardware Accelerators..
Analog Devices SHARC® ADSP-2146x
• The SHARC® ADSP-2146x processor incorporates hardware accelerators for
implementing three widely used signal processing operations: FIR (finite impulse
response), IIR (infinite impulse response), and FFT (fast Fourier transform).

• The ADSP-2146x core has a maximum clock rate of 450 MHz. By using SIMD (single-
instruction multiple-data), the core can perform two MAC (multiply-accumulate)
operations per clock cycle for a peak rate of 900 MMAC/sec.

• The accelerator, in comparison, operates at a clock rate of 225 MHz. Using its four
dedicated MAC units, the FIR accelerator achieves a peak theoretical throughput of
900 MMAC/sec.

Source: White paper on hardware accelerators in SHARC processors by Paul Beckmann, DSP Concepts, LLC.
Analog Devices SHARC® ADSP-2146x

• Consider a home theatre system with 7.1 channels of audio at 96 kHz operating at a
block size of 32 samples. Assume that room equalization is being applied by 8 FIR
filters, each 512 points long.
• No. of MAC operations: 8 x 512 x 96 kHz = 393 MMAC/sec.
• If the core CPU were to perform the filtering, it would take 44% of a 450 MHz
SHARC processor (393 MMAC/sec out of a peak of 900 MMAC/sec).
• This FIR processing represents a significant portion of the overall computation of the
CPU and fortunately can be offloaded to the accelerator.
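
The arithmetic behind these figures can be checked with a short sketch. The function names are illustrative only; the numbers (8 filters, 512 taps, 96 kHz, 450 MHz core, two MACs per cycle) are the ones quoted on the slides.

```python
def fir_mac_rate(num_filters, taps, sample_rate_hz):
    """MAC operations per second needed by a bank of FIR filters."""
    # Each output sample of each filter costs one MAC per tap.
    return num_filters * taps * sample_rate_hz


def core_load(mac_rate, clock_hz, macs_per_cycle):
    """Fraction of the core's peak MAC rate consumed by the workload."""
    peak_mac_rate = clock_hz * macs_per_cycle
    return mac_rate / peak_mac_rate


rate = fir_mac_rate(num_filters=8, taps=512, sample_rate_hz=96_000)
load = core_load(rate, clock_hz=450_000_000, macs_per_cycle=2)
print(f"{rate / 1e6:.0f} MMAC/sec, {load:.0%} of the core")
```

The same `core_load` call with 225 MHz and four MACs per cycle gives the identical fraction, since the accelerator's peak rate is also 900 MMAC/sec.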

ARM NEON Hardware Accelerators..
• Arm NEON technology is an advanced SIMD (single instruction multiple data) architecture
extension for the Arm Cortex-A series and Cortex-R52 processors.
• NEON technology is intended to improve the multimedia user experience by accelerating
audio and video encoding/decoding, user interface, 2D/3D graphics or gaming.
• NEON instructions allow up to:

• NEON can be used in multiple ways, including NEON enabled
libraries, the compiler's auto-vectorization feature, NEON intrinsics,
and NEON assembly code.

Source: www.arm.com


Tutorial 2
• A new multimedia unit (MU) added to a processor speeds up the
completion of multimedia instructions given to the processor by 4 times.
Assuming a program has 40% multimedia instructions, what is the overall
speedup gained when the program is executed on the
processor with the new MU compared to when it is run on the processor without this
MU? (Use Amdahl’s Law)
$$\text{Overall Speedup} = \frac{1}{(1 - F) + \frac{F}{S}}$$
• $F$ = fraction enhanced, $S$ = speedup of the enhanced fraction
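
As a quick check of the formula, the sketch below evaluates Amdahl's Law for the values in this problem (F = 0.4, S = 4). The function name is illustrative.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction F of the work is sped up by a factor S."""
    unenhanced = 1.0 - fraction_enhanced
    enhanced = fraction_enhanced / speedup_enhanced
    return 1.0 / (unenhanced + enhanced)


# 40% of the instructions run 4 times faster on the multimedia unit.
print(round(amdahl_speedup(0.4, 4), 3))
```

With these values the denominator is 0.6 + 0.1 = 0.7, so the overall speedup is about 1.43 even though the enhanced part runs 4 times faster.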

Enhancing Performance of Processors…
Pipelining
• A pipeline is a cascaded connection of processing stages which are connected to
perform a fixed function over a stream of data flowing from one end to the other.
• In modern CPUs, pipelines are applied to instruction execution, arithmetic
computation and memory access operations.
• The pipeline is constructed with $k$ processing stages. The processed results are passed
from stage $S_i$ to stage $S_{i+1}$ for all $i = 1, 2, \ldots, k-1$.
$S_i$ = stage $i$
$L$ = latch
$\tau$ = clock period
$\tau_m$ = maximum stage delay
$d$ = latch delay

Source: Kai Hwang, “Advanced Computer Architecture”, Tata McGraw Hill Education.
Pipelining…
Clock Cycle ($\tau$)
$$\tau = \max_{1 \le i \le k} \{\tau_i\} + d = \tau_m + d$$
Pipeline Frequency or Maximum Throughput
$$f = \frac{1}{\tau}$$
• Ideally, one result is expected to come out of the pipeline per cycle.
• However, depending on the initiation rate of successive tasks, the actual throughput of the
pipeline may be lower than $f$.
Speedup ($S_k$)
• Ideally, a pipeline with $k$ stages can process $n$ tasks in $k + (n - 1)$ clock cycles, where $k$
cycles are needed to complete the execution of the very first task and the remaining $(n - 1)$
tasks require $(n - 1)$ cycles.
• Total time required: $T_k = [k + (n - 1)]\tau$
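
A small numeric sketch of the clock period and total time formulas follows. The stage delays, latch delay and task count are made-up values chosen only to illustrate the calculation, and the function names are hypothetical.

```python
def clock_period(stage_delays, latch_delay):
    """Clock period: the slowest stage delay plus the latch delay."""
    return max(stage_delays) + latch_delay


def pipeline_time(k, n, tau):
    """Total time for n tasks on a k-stage pipeline with clock period tau."""
    return (k + (n - 1)) * tau


# Assumed example: three stages with delays in nanoseconds and a 1 ns latch.
stage_delays_ns = [10, 12, 9]
tau_ns = clock_period(stage_delays_ns, latch_delay=1)
total_ns = pipeline_time(k=len(stage_delays_ns), n=100, tau=tau_ns)
frequency_mhz = 1000 / tau_ns  # f = 1 / tau, with tau in ns
print(tau_ns, total_ns, round(frequency_mhz, 1))
```

The slowest stage sets the clock for every stage, which is why balancing stage delays matters when a pipeline is partitioned.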

Speedup….
• Consider an equivalent-function nonpipelined processor which has a flow-through delay
of $k\tau$.
• Total time required: $T_1 = nk\tau$.
• Speedup factor:
$$S_k = \frac{T_1}{T_k} = \frac{nk\tau}{[k + (n - 1)]\tau} = \frac{nk}{k + (n - 1)}$$
• The maximum speedup is $S_k \to k$ as $n \to \infty$. The maximum speedup is very difficult to achieve
because of data dependences between successive tasks, program branches, interrupts, etc.
• For small values of $n$, the speedup is very poor,
as shown in the figure.
• The larger the number of stages $k$, the higher the potential
speedup.

Speedup….
• However, the number of stages cannot be increased indefinitely because of practical constraints
on cost, implementation complexity, circuit implementation, etc.
• The figure shows the optimal number of pipeline stages (performance/cost ratio vs. number of
stages).

• In practice, most pipelining is staged with $2 \le k \le 15$. Very few pipelines are
designed to exceed 10 stages in real computers.

Speedup….
• The efficiency $E_k$ is defined as:
$$E_k = \frac{S_k}{k} = \frac{n}{k + (n - 1)}$$
$E_k \to 1$ as $n \to \infty$ (upper bound)
$E_k = \frac{1}{k}$ when $n = 1$ (lower bound)
• The pipeline throughput $H_k$ is defined as the number of tasks performed per unit
time.
$$H_k = \frac{n}{[k + (n - 1)]\tau} = \frac{nf}{k + (n - 1)} = E_k \cdot f$$

Maximum throughput $f$ occurs when $E_k \to 1$ as $n \to \infty$.
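
The speedup, efficiency and throughput definitions can be evaluated together, as in the sketch below. The function name and the example values (4 stages, 64 tasks, 10 ns clock period) are assumptions for illustration.

```python
def pipeline_metrics(k, n, tau):
    """Return (speedup S_k, efficiency E_k, throughput H_k) for a k-stage pipeline."""
    t_k = (k + (n - 1)) * tau   # pipelined time for n tasks
    t_1 = n * k * tau           # nonpipelined time for n tasks
    speedup = t_1 / t_k
    efficiency = speedup / k
    throughput = n / t_k        # tasks per unit time, equals efficiency / tau
    return speedup, efficiency, throughput


s, e, h = pipeline_metrics(k=4, n=64, tau=10e-9)
print(f"S_k = {s:.2f}, E_k = {e:.3f}, H_k = {h / 1e6:.1f} Mtasks/sec")
```

Calling the function with n = 1 gives a speedup of 1 and an efficiency of 1/k, the lower bound, while a very large n pushes the speedup towards k and the efficiency towards 1.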

Instruction Pipeline
• ARM Cortex M3 three stage pipeline:
- The figure below shows the 3 stage pipeline of the Cortex M3.
- The Fetch stage fetches instructions from memory, presumably one per cycle.
- The Decode stage reveals the instruction function to be performed and
identifies the resources needed.
- The instructions are executed in the Execute stage.

[Figure: three stage pipeline timing over cycles 1, 2, 3, 4, 5, …]

Instruction Pipeline….
• A seven stage instruction pipeline
- The figure below shows a seven stage pipeline with three Execute (E) stages.
- The Issue (I) stage reserves resources and controls pipeline interlocks.
- The Writeback (W) stage is used to write results back into the registers.

- Assume pipelined execution of the high level language statements
X = Y + Z and A = B x C
- These macro operations will be converted to several assembly statements.

Seven stage Instruction Pipeline….

• Assuming this architecture, the pipelined execution is shown below.

• The figure illustrates the issue of instructions following the original program order (in-order
execution). The shaded boxes correspond to idle cycles when instruction
issues are blocked due to resource latency or data dependencies.

Seven stage Instruction Pipeline….

• The following figure shows improved timing after the instruction issuing order
is changed (out-of-order execution) to eliminate unnecessary delays due to
dependence.

Enhancing Performance of Processors
Superscalar Execution
• In superscalar execution, multiple instruction pipelines are used. This implies that multiple
instructions are issued per cycle and multiple results are generated per cycle.
• Superscalar processors are designed to exploit more instruction level parallelism in
user programs.
• Only independent instructions can be executed in parallel without causing a wait state.
• The amount of instruction level parallelism varies widely depending on the type of code
being executed.
• The instruction issue degree in a superscalar processor has been limited to 2 to 5 in
practice (the average number of instructions that can be executed in parallel is 2 without loop
unrolling).

Superscalar Execution….

• The figure shows a triple-issue superscalar processor with degree $m = 3$ (horizontal axis: time in cycles).

• Due to the desire for a higher degree of instruction
level parallelism in programs, the superscalar
processor depends more on the compiler to
exploit parallelism.
E.g: IBM’s PowerPC

Enhancing Performance of Processors..
VLIW architecture
• The Very Long Instruction Word (VLIW) architecture uses more
functional units.
• The CPI of a VLIW processor is lower than that of CISC & RISC processors.
• Instruction words are 256 or 1024 bits long.
• Programs are written in conventional short instruction words.
• The code compaction must be done by the compiler.
• Instruction parallelism and data movement in a VLIW architecture are
completely specified at compile time.

VLIW architecture…

• A typical VLIW processor with its instruction format is shown in the figure.

VLIW architecture…

• A typical VLIW execution with degree 3 is shown in the figure.

Enhancing Performance of Processors…
Multi core CPUs
• Power and frequency limitations observed in single core implementations have
paved the way for multicore technology.
• The frequency of single core CPUs is limited to about 4 GHz, as any increase beyond this
frequency raises power dissipation excessively.
• A multi-core processor is typically a single processor which contains several
cores on a chip.
• The individual cores of a multi-core processor don’t necessarily run as fast as the
highest performing single-core processors, but they improve overall performance by
handling more tasks in parallel.

Multi core CPUs..

• The multiple cores inside the chip are not clocked at a higher frequency. Instead, their
capability to execute programs in parallel is what ultimately contributes to the overall
performance, making them more energy efficient, low power cores, as shown in the
figure below.
• Multi-core processors can also be implemented as a combination of both
homogeneous and heterogeneous cores.
• In a homogeneous core architecture, all the cores in the CPU are identical. They
improve the overall processor performance
by breaking up a computationally intensive
application into less computationally intensive
parts and executing them in parallel.
E.g: AMD dual cores, Intel Core 2 Duo and
quad cores.

Source: Intel Higher Education Program & FAER.
Multi core CPUs..

• Heterogeneous multicores consist of dedicated, application specific processor
cores that target the variety of applications to be executed
on a computer.
• An example could be a DSP core addressing multimedia applications that require
heavy mathematical calculations, a complex core addressing computationally
intensive applications and a remedial core which addresses less computationally
intensive applications.
E.g: TI OMAP (ARM core + DSP core), Qualcomm Snapdragon
Challenges with multicores:
• The majority of applications used today were written to run on a single processor and so
fail to use the capability of multi-core processors.

• The delay of on-chip interconnects is becoming a critical bottleneck in meeting
the performance targets of multi-core chips. To avoid data starvation, the performance
of the processor depends more on how fast a CPU can fetch data than on how fast it
can operate on it.
• Multiple cores accessing shared data simultaneously may lead to a timing
dependent error known as a “data race condition”. In a multi-core environment, a data
structure is open to access by all other cores while one core is updating it. If
a second core accesses the data before the first core finishes updating the
memory, the second core faults in some manner.
• The interaction between on-chip components (cores, memory
controllers) and shared components (caches and memories) is a further challenge;
bus contention and latency are the key areas of concern.
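
The data race described above is usually avoided by making each update to the shared data mutually exclusive. The sketch below illustrates the idea with Python threads standing in for cores; this is an analogy rather than a multi-core firmware implementation, and the function name and parameters are hypothetical.

```python
import threading


def run_counter(num_threads=4, increments=10_000):
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = {"value": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            # The lock makes the read-modify-write of the shared value atomic,
            # so no thread sees a half-finished update from another thread.
            with lock:
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


print(run_counter())
```

Without the lock, two workers could read the same old value and both write back old value + 1, losing an update. On real multi-core hardware the same role is played by mutexes, spinlocks or atomic instructions.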

CPU Benchmarking standards
• MIPS (Million Instructions Per Second)
$$\text{MIPS} = \frac{\text{Clock Frequency}}{\text{CPI} \times 1{,}000{,}000}$$
• MIPS is only an approximation of a processor's performance because some
processors' instructions do more work per instruction than others'.
• A computer rated at 100 MIPS may be able to compute certain values faster than
another computer rated at 120 MIPS.
• Dhrystone MIPS: Dhrystone is a standard program consisting of arithmetic &
logical operations on integers and is used to benchmark CPUs.
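
A short sketch of the MIPS calculation follows. The first function implements the formula above; the second uses the equivalent form based on a measured run (instruction count divided by execution time). The function names and the example numbers are assumptions for illustration.

```python
def mips_from_clock(clock_hz, cpi):
    """MIPS rating from clock frequency and average cycles per instruction."""
    return clock_hz / (cpi * 1_000_000)


def mips_from_run(instruction_count, seconds):
    """MIPS rating from a measured run: instructions executed per microsecond."""
    return instruction_count / (seconds * 1_000_000)


# Assumed example: a 100 MHz core with an average CPI of 2.
print(mips_from_clock(100_000_000, 2))
# Assumed example: 50 million instructions executed in 2 seconds.
print(mips_from_run(50_000_000, 2))
```

Because the rating depends on the CPI of the particular instruction mix, the same processor can show different MIPS figures on different programs.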

MIPS…
Tutorial 4
• The execution times (in seconds) of three programs on three MCUs are given
below:

Execution Time (in seconds)
Program | MCU A | MCU B | MCU C
Program 1 | 10 | 1 | 20
Program 2 | 1000 | 200 | 20
Program 3 | 500 | 200 | 20

• Assume that 100,000,000 instructions were executed in each of the three
programs. Calculate the MIPS (Million Instructions Per Second) rating of each
program on each of the three machines. Based on these ratings, draw a clear
conclusion regarding the relative performance of the three computers.

CPU Benchmarking standards
MFLOPS(Mega Floating Point Operations per Second)

• A floating-point operation is an addition, subtraction, multiplication, or division
operation applied to a number in a single or double precision floating point
representation.
• Clearly, a MFLOPS rating is dependent on the program. Different programs require
the execution of different numbers of floating-point operations.
• MFLOPS has a stronger claim than MIPS to being a fair comparison between
different computers. The key to this claim is that the same program running on
different computers may execute a different number of instructions but will always
execute the same number of floating-point operations.

Coremark CPU Benchmarking standards

• CoreMark® is a benchmark that measures the performance of microcontrollers (MCUs)
and central processing units (CPUs) used in embedded systems.
• Replacing the Dhrystone benchmark, CoreMark contains implementations of the
following algorithms: list processing (find and sort), matrix manipulation, state machine
(determine if an input stream contains valid numbers), and CRC (cyclic redundancy
check).
• It is designed to run on devices from 8-bit microcontrollers to 64-bit microprocessors.

• Suggested reading
MMAC,
CoreMark: https://www.eembc.org/coremark/
SPECint2006, SPECfp2006

