Unit 1-ERTS
04/28/2021 1
UNIT I INTRODUCTION TO
EMBEDDED COMPUTING
Complex systems and microprocessors – Design
example: Model train controller – Embedded system
design process – Formalism for system design –
Instruction sets preliminaries – ARM Processor – CPU:
Programming input and output – Supervisor mode,
exceptions and traps – Coprocessor – Memory system
mechanisms – CPU performance – CPU power
consumption.
DEFINITION
Embedded computing system: any device that
includes a programmable computer but is not itself
a general-purpose computer.
Take advantage of application characteristics to
optimize the design
EMBEDDING A COMPUTER
[Figure: an embedded computer, consisting of a CPU and memory, with analog input and analog output.]
EXAMPLES
Cell phone.
Printer.
Automobile: engine, brakes, dash, etc.
Airplane: engine, flight controls, nav/comm.
Digital television.
Household appliances.
CHARACTERISTICS OF EMBEDDED
SYSTEMS
Sophisticated functionality.
Real-time operation.
Low manufacturing cost.
Low power.
Designed to tight deadlines by small teams.
FUNCTIONAL COMPLEXITY
Often have to run sophisticated algorithms or multiple
algorithms.
Cell phone, laser printer.
Often provide sophisticated user interfaces.
REAL-TIME OPERATION
Must finish operations by deadlines.
Hard real time: missing deadline causes failure.
Soft real time: missing deadline results in degraded
performance.
Many systems are multi-rate: must handle operations
at widely varying rates.
NON-FUNCTIONAL REQUIREMENTS
Many embedded systems are mass-market items that
must have low manufacturing costs.
Limited memory, microprocessor power, etc.
Power consumption is critical in battery-powered
devices.
Excessive power consumption increases system cost even
in wall-powered devices.
WHAT DOES
“PERFORMANCE” MEAN?
In general-purpose computing, performance often
means average-case, may not be well-defined.
In real-time systems, performance means meeting
deadlines.
Missing the deadline by even a little is bad.
Finishing ahead of the deadline may not help.
CHARACTERIZING
PERFORMANCE
We need to analyze the system at several levels of
abstraction to understand performance:
CPU.
Platform.
Program.
Task.
Multiprocessor.
DESIGN GOALS
Performance.
Overall speed, deadlines.
Functionality and user interface.
Manufacturing cost.
Power consumption.
Other requirements (physical size, etc.)
LEVELS OF ABSTRACTION
requirements
specification
architecture
component design
system integration
TOP-DOWN VS. BOTTOM-UP
Top-down design:
start from most abstract description;
work to most detailed.
Bottom-up design:
work from small components to big system.
Real design uses both techniques.
STEPWISE REFINEMENT
At each level of abstraction, we must:
analyze the design to determine characteristics of the
current state of the design;
refine the design to add detail.
REQUIREMENTS
OBJECT-ORIENTED DESIGN
Object-oriented (OO) design: A generalization of
object-oriented programming.
Object = state + methods.
State provides each object with its own identity.
Methods provide an abstract interface to the object.
UML OBJECT
[Figure: UML object with object name d1 and class name Display (d1: Display). Attributes: pixels: array[] of pixels, elements, menu_items. Comment: pixels is a 2-D array.]
UML CLASS
[Figure: UML class with attributes pixels, elements, menu_items and operations mouse_click(), draw_box.]
THE CLASS INTERFACE
The operations provide the abstract interface
between the class’s implementation and other
classes.
Operations may have arguments, return values.
An operation can examine and/or modify the
object’s state.
RELATIONSHIPS BETWEEN
OBJECTS AND CLASSES
Association: objects communicate but one does
not own the other.
Aggregation: a complex object is made of several
smaller objects.
Composition: aggregation in which owner does not
allow access to its components.
Generalization: define one class in terms of
another.
CLASS DERIVATION
May want to define one class in terms of another.
Derived class inherits attributes, operations of base class.
[Figure: UML generalization drawn from Derived_class to Base_class.]
CLASS DERIVATION EXAMPLE
[Figure: base class Display with attributes pixels, elements, menu_items and operations pixel(), set_pixel(), mouse_click(), draw_box; derived classes BW_display and Color_map_display.]
MULTIPLE INHERITANCE
[Figure: derived class Multimedia_display inherits from base classes Speaker and Display.]
LINKS AND ASSOCIATIONS
Link: describes relationships between objects.
Association: describes relationship between
classes.
LINK EXAMPLE
Link defines the contains relationship:
[Figure: message objects joined by links; one object is shown with msg = msg2, length = 2114.]
ASSOCIATION EXAMPLE
[Figure: class diagram with attributes msg: ADPCM_stream, length: integer, and count: integer, joined by a contains association whose ends are labeled # contained messages and # containing message sets.]
STATE MACHINES
[Figure: states a and b joined by a transition.]
SIGNAL EVENT
[Figure: declaration of the <<signal>> mouse_click, and the event description: a transition from state a to state b labeled mouse_click.]
CALL EVENT
[Figure: transition from state c to state d labeled draw_box(10,5,3,2,blue).]
TIMER EVENT
[Figure: transition from state e to state f labeled tm(time-value).]
EXAMPLE STATE MACHINE
[Figure: state machine running from start to finish through the states found object and object highlighted; transitions are labeled in input/output form, including region = drawing/find_object(objid) and highlight(objid).]
SEQUENCE DIAGRAM EXAMPLE
[Figure: sequence diagram with objects m: Mouse, d1: Display, u: Menu; messages in time order: mouse_click(x,y,button), which_menu(x,y,i), call_menu(i).]
INTRODUCTION
Example: model train controller.
PURPOSES OF EXAMPLE
Follow a design through several levels of
abstraction.
Gain experience with UML.
MODEL TRAIN SETUP
[Figure: model train layout showing the console, the power supply, and a train with its receiver (rcvr) and motor.]
REQUIREMENTS
Console can control 8 trains on 1 track.
Throttle has at least 63 levels.
Inertia control adjusts responsiveness with at least
8 levels.
Emergency stop button.
Error detection scheme on messages.
REQUIREMENTS FORM
name                  model train controller
purpose               control speed of <= 8 model trains
inputs                throttle, inertia, emergency stop, train #
outputs               train control signals
functions             set engine speed w. inertia; emergency stop
performance           can update train speed at least 10 times/sec
manufacturing cost    $50
power                 wall powered
physical size/weight  console comfortable for 2 hands; < 2 lbs.
DIGITAL COMMAND
CONTROL
DCC created by model railroad hobbyists, picked
up by industry.
Defines way in which model trains, controllers
communicate.
Leaves many system design aspects open, allowing
competition.
This is a simple example of a big trend:
Cell phones, digital TV rely on standards.
DCC ELECTRICAL STANDARD
Voltage moves around the power supply voltage; adds no DC component.
1 is 58 µs, 0 is at least 100 µs.
[Figure: track voltage versus time, showing a logic 1 (58 µs) and a logic 0 (>= 100 µs).]
DCC COMMUNICATION
STANDARD
Basic packet format: PSA(sD)+E.
P: preamble = 1111111111.
S: packet start bit = 0.
A: address data byte.
s: data byte start bit.
D: data byte (data payload).
E: packet end bit = 1.
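The packet layout above can be sketched in C as a routine that emits the bits in order. This is a minimal illustration, not a DCC implementation: the function name `dcc_build_packet`, the '0'/'1' character encoding, and the choice of 0 for the data byte start bit s are assumptions that these notes do not specify.

```c
#include <stddef.h>
#include <stdint.h>

/* Append the 8 bits of `byte`, most significant bit first, as '0'/'1' characters. */
static size_t put_byte(char *out, size_t pos, uint8_t byte)
{
    for (int bit = 7; bit >= 0; bit--)
        out[pos++] = ((byte >> bit) & 1) ? '1' : '0';
    return pos;
}

/*
 * Build the bit string P S A (s D)+ E for one address byte and n data bytes.
 * Assumption: the data byte start bit s is 0 (not stated in the notes).
 * `out` must hold at least 10 + 1 + 8 + 9*n + 1 + 1 characters.
 * Returns the number of bits written (excluding the terminating NUL).
 */
size_t dcc_build_packet(char *out, uint8_t address, const uint8_t *data, size_t n)
{
    size_t pos = 0;

    for (int i = 0; i < 10; i++)             /* P: preamble = 1111111111 */
        out[pos++] = '1';
    out[pos++] = '0';                        /* S: packet start bit = 0 */
    pos = put_byte(out, pos, address);       /* A: address data byte */
    for (size_t i = 0; i < n; i++) {
        out[pos++] = '0';                    /* s: data byte start bit (assumed 0) */
        pos = put_byte(out, pos, data[i]);   /* D: data byte */
    }
    out[pos++] = '1';                        /* E: packet end bit = 1 */
    out[pos] = '\0';
    return pos;
}
```

With one data byte the routine produces 10 + 1 + 8 + 9 + 1 = 29 bits; each extra data byte adds 9 more.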
DCC PACKET TYPES
Baseline packet: minimum packet that must be
accepted by all DCC implementations.
Address data byte gives receiver address.
Instruction data byte gives basic instruction.
Error correction data byte gives ECC.
BASIC SYSTEM COMMANDS
command name    parameters
set-speed       speed (positive/negative)
set-inertia     inertia-value (non-negative)
estop           none
TYPICAL CONTROL SEQUENCE
[Figure: sequence diagram of messages from :console to :train_rcvr, in time order: set-inertia, set-speed, set-speed, estop, set-speed.]
MESSAGE CLASSES
[Figure: class diagram of the command base class and its derived message classes; one derived class has value: integer, another has value: unsigned-integer.]
ROLES OF MESSAGE CLASSES
Implemented message classes derived from
message class.
Attributes and operations will be filled in for detailed
specification.
Implemented message classes specify message
type by their class.
May have to add type as parameter to data structure in
implementation.
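One way to read the note about adding a type parameter: in a language without classes, such as C, the message type can be carried as an explicit tag next to the value. The sketch below assumes hypothetical names (`command_type`, `make_set_speed`, and so on) and 32-bit value fields; none of these are fixed by the specification.

```c
#include <stdint.h>

/* Hypothetical type tags, one per basic system command. */
enum command_type { CMD_SET_SPEED, CMD_SET_INERTIA, CMD_ESTOP };

/* A message whose type is a field instead of being implied by its class. */
struct command {
    enum command_type type;
    union {
        int32_t  speed;     /* set-speed: positive or negative */
        uint32_t inertia;   /* set-inertia: non-negative */
    } value;                /* unused for estop, which has no parameter */
};

struct command make_set_speed(int32_t speed)
{
    struct command c;
    c.type = CMD_SET_SPEED;
    c.value.speed = speed;
    return c;
}

struct command make_set_inertia(uint32_t inertia)
{
    struct command c;
    c.type = CMD_SET_INERTIA;
    c.value.inertia = inertia;
    return c;
}

struct command make_estop(void)
{
    struct command c;
    c.type = CMD_ESTOP;
    c.value.speed = 0;      /* placeholder; estop carries no value */
    return c;
}
```

A receiver would switch on `type` before reading `value`, which plays the role that the class identity plays in the object-oriented specification.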
SUBSYSTEM COLLABORATION DIAGRAM
Shows relationship between console and receiver (ignores
role of track):
[Figure: :console sends messages 1..n: command to :receiver.]
MAJOR SUBSYSTEM ROLES
Console:
read state of front panel;
format messages;
transmit messages.
Train:
receive message;
interpret message;
control the train.
CONSOLE SYSTEM CLASSES
[Figure: class diagram of the console and its component classes, joined by 1-to-1 associations; includes receiver* and sender*.]
CONSOLE CLASS ROLES
panel: describes analog knobs and interface
hardware.
formatter: turns knob settings into bit streams.
transmitter: sends data on track.
TRAIN SYSTEM CLASSES
[Figure: class diagram: a train set contains 1..t trains; each train has a receiver, a controller, and a motor interface, joined by 1-to-1 associations; includes detector* and pulser*.]
TRAIN CLASS ROLES
receiver: digitizes signal from track.
controller: interprets received commands and
makes control decisions.
motor interface: generates signals required by
motor.
DETAILED SPECIFICATION
We can now fill in the details of the conceptual
specification:
more classes;
behaviors.
Sketching out the spec first helps us understand the
basic relationships in the system.
TRAIN SPEED CONTROL
Motor controlled by pulse width modulation:
[Figure: pulse-width-modulated voltage waveform (V, + and -) with the duty cycle marked.]
CONSOLE PHYSICAL OBJECT CLASSES
[Figure: physical object classes knobs*, pulser*, sender*, detector*.]
PANEL AND MOTOR INTERFACE CLASSES
panel
  train-number() : integer
  speed() : integer
  inertia() : integer
  estop() : boolean
  new-settings()
motor-interface
  speed: integer
TRANSMITTER AND RECEIVER CLASSES
transmitter
  send-speed(adrs: integer, speed: integer)
  send-inertia(adrs: integer, val: integer)
  set-estop(adrs: integer)
receiver
  current: command
  new: boolean
  read-cmd()
  new-cmd() : boolean
  rcv-type(msg-type: command)
  rcv-speed(val: integer)
  rcv-inertia(val: integer)
CLASS DESCRIPTIONS
transmitter class has one behavior for each type of
message sent.
receiver class provides methods to:
detect a new message;
determine its type;
read its parameters (estop has no parameters).
FORMATTER CLASS
formatter
  current-train: integer
  current-speed[ntrains]: integer
  current-inertia[ntrains]: unsigned-integer
  current-estop[ntrains]: boolean
  send-command()
  panel-active() : boolean
  operate()
FORMATTER CLASS
DESCRIPTION
Formatter class holds state for each train, setting
for current train.
The operate() operation performs the basic
formatting task.
CONTROL INPUT CASES
Use a soft panel to show current panel settings for
each train.
Changing train number:
must change soft panel settings to reflect current train’s
speed, etc.
Controlling throttle/inertia/estop:
read panel, check for changes, perform command.
CONTROL INPUT SEQUENCE DIAGRAM
[Figure: sequence diagram with objects :knobs, :panel, :formatter, :transmitter; labels include change in control settings, change in speed/inertia/estop, read panel, panel-active, new-settings, set-knobs.]
FORMATTER OPERATE BEHAVIOR
[Figure: state diagram with state idle and activities update-panel() and send-command(); one transition is labeled other.]
PANEL-ACTIVE BEHAVIOR
[Figure: flowchart. panel*:read-train(); if true (T): current-train = train-knob, update-screen, changed = true; if false (F): continue. panel*:read-speed(); if true (T): current-speed = throttle, changed = true. Further checks follow (...).]
CONTROLLER CLASS
controller
  current-train: integer
  current-speed[ntrains]: integer
  current-direction[ntrains]: boolean
  current-inertia[ntrains]: unsigned-integer
  operate()
  issue-command()
SEQUENCE DIAGRAM FOR SET-SPEED COMMAND
[Figure: sequence diagram with objects :receiver, :controller, :motor-interface, :pulser*; messages in time order: new-cmd, cmd-type, then repeated set-pulse.]
CONTROLLER OPERATE BEHAVIOR
[Figure: state diagram: wait for a command from receiver, then receive-command(), then issue-command().]
REFINED COMMAND CLASSES
[Figure: class diagram. Base class command: type: 3-bits, address: 3-bits, parity: 1-bit. Three derived classes: type=010 with value: 7-bits; type=001 with value: 3-bits; type=000 with no value.]
VON NEUMANN
ARCHITECTURE
Memory holds data, instructions.
Central processing unit (CPU) fetches instructions
from memory.
Separate CPU and memory distinguishes
programmable computer.
CPU registers help out: program counter (PC),
instruction register (IR), general-purpose registers,
etc.
CPU + MEMORY
[Figure: CPU connected to memory by address and data lines; the PC holds 200, memory location 200 holds ADD r5,r1,r3, and that instruction is fetched into the IR.]
HARVARD ARCHITECTURE
[Figure: CPU (with PC) connected by separate address and data lines to a data memory and to a program memory.]
VON NEUMANN VS. HARVARD
Harvard can’t use self-modifying code.
Harvard allows two simultaneous memory fetches.
Most DSPs use Harvard architecture for streaming
data:
greater memory bandwidth;
more predictable bandwidth.
RISC VS. CISC
Complex instruction set computer (CISC):
many addressing modes;
many operations.
Reduced instruction set computer (RISC):
load/store;
pipelinable instructions.
INSTRUCTION SET
CHARACTERISTICS
Fixed vs. variable length.
Addressing modes.
Number of operands.
Types of operands.
ARM - INTRODUCTION
Advanced RISC Machines (now known as ARM) was established
as a joint venture between Acorn, Apple and VLSI in November
1990.
ARM is the industry's leading provider of 16/32-bit embedded
RISC microprocessor solutions.
Architectural simplicity allows very small implementations,
which result in very low power consumption.
The company licenses its high-performance, low-cost, power-efficient
RISC processors, peripherals, and system-chip designs to
leading international electronics companies.
ARM provides the comprehensive support required in developing a
complete system.
ARM - INTRODUCTION
Currently available processors:
ARM1
This was the very first ARM processor and was superseded by the
ARM2 fairly quickly.
ARM2
The ARM2 chip, and processor cell, features 27 registers of which
16 are accessible at any one time. Four processor modes are
available:
USR : user mode
IRQ : interrupt mode (with a private copy of R13 and R14)
FIQ : fast interrupt mode (private copies of R8 to R14)
SVC : supervisor mode. (private copies of R13 and R14)
ARM - INTRODUCTION
ARM3
This is an upgraded ARM2 cell, with a cache and dedicated coprocessor
interface added.
ARM6
This processor cell is the first of the commercially available ARMs to
have a full 32-bit addressing capability.
Additionally, the processor now has 31 registers along with six new
processor modes:
User32 - 32-bit USR mode
ARM - INTRODUCTION
ARM7
The ARM7 cell is functionally identical to the ARM6 cell in capabilities
but may be clocked faster than the ARM6.
A variant of the ARM7 cell offers an improved hardware multiply,
suitable for DSP work.
ARM8
Includes a five-stage pipeline, a speculative instruction fetcher and
internal tweaks to the processor to allow a higher clock speed.
StrongARM
This is the high-speed variant of the ARM chip family.
Architecturally it is similar to the ARM8 core, sharing the five-stage
pipeline with that processor.
A further difference is the change from a unified data and instruction cache
to split (Harvard architecture) instruction and data caches.
ARM - INTRODUCTION
ARM9
An incremental improvement over the ARM8, this chip features
the same five-stage pipeline but is now a Harvard architecture
chip, like the StrongARM.
ARM10
300 MHz
400 MIPS
600 mW
ARM ASSEMBLY LANGUAGE
EXAMPLE
label1 ADR r4,c
LDR r0,[r4] ; a comment
ADR r4,d
LDR r1,[r4]
SUB r0,r0,r1 ; comment
ARM INSTRUCTION SET
ARM versions.
ARM assembly language.
ARM programming model.
ARM memory organization.
ARM data operations.
ARM flow of control.
ARM REGISTERS
System & User   FIQ        Supervisor   Abort      IRQ        Undefined
R0              R0         R0           R0         R0         R0
R1              R1         R1           R1         R1         R1
R2              R2         R2           R2         R2         R2
R3              R3         R3           R3         R3         R3
R4              R4         R4           R4         R4         R4
R5              R5         R5           R5         R5         R5
R6              R6         R6           R6         R6         R6
R7              R7         R7           R7         R7         R7
R8              R8_fiq     R8           R8         R8         R8
R9              R9_fiq     R9           R9         R9         R9
R10             R10_fiq    R10          R10        R10        R10
R11             R11_fiq    R11          R11        R11        R11
R12             R12_fiq    R12          R12        R12        R12
R13             R13_fiq    R13_svc      R13_abt    R13_irq    R13_und
R14             R14_fiq    R14_svc      R14_abt    R14_irq    R14_und
R15 (PC)        R15 (PC)   R15 (PC)     R15 (PC)   R15 (PC)   R15 (PC)
CPSR            CPSR       CPSR         CPSR       CPSR       CPSR
                SPSR_fiq   SPSR_svc     SPSR_abt   SPSR_irq   SPSR_und
ARM PROGRAMMING MODEL
[Figure: registers r0 through r15 (r15 is the PC), each 32 bits wide (bits 31 to 0), plus the CPSR holding the N, Z, C, V flags.]
ENDIANNESS
Relationship between bit and byte/word ordering
defines endianness:
[Figure: byte ordering within a word for little-endian and big-endian.]
ARM DATA TYPES
Word is 32 bits long.
Word can be divided into four 8-bit bytes.
ARM addresses can be 32 bits long.
Address refers to byte.
Address 4 starts at byte 4.
Can be configured at power-up as either little- or
big-endian mode.
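The two byte orderings can be observed from C by looking at the byte stored at a word's lowest address. The following is a small sketch under the assumption of a hosted C compiler; the function names `is_little_endian` and `swap_bytes32` are illustrative only.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the least significant byte of a word sits at the lowest address. */
int is_little_endian(void)
{
    uint32_t word = 0x01020304u;
    uint8_t first;

    memcpy(&first, &word, 1);   /* read the byte at the word's lowest address */
    return first == 0x04;
}

/* Reverse the four bytes of a word, converting between the two orderings. */
uint32_t swap_bytes32(uint32_t w)
{
    return (w >> 24) |
           ((w >> 8) & 0x0000FF00u) |
           ((w << 8) & 0x00FF0000u) |
           (w << 24);
}
```

Data exchanged between machines of different endianness has to pass through a conversion such as `swap_bytes32` on one side.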
ARM STATUS BITS
Arithmetic, logical, and shifting operations can set the
CPSR bits:
N (negative), Z (zero), C (carry), V (overflow).
Examples:
-1 + 1 = 0: NZCV = 0110.
$2^{31} - 1 + 1 = -2^{31}$: NZCV = 1001.
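How the four flags follow from a 32-bit addition can be checked with a short C model. This is a sketch of the flag definitions only, not of any ARM instruction; the function name `add_flags` and the packing of the result as a 4-bit NZCV value are assumptions made for illustration.

```c
#include <stdint.h>

/* Compute N, Z, C, V for the 32-bit sum a + b, packed as the 4-bit value NZCV. */
unsigned add_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;                           /* wraps modulo 2^32 */
    unsigned n = r >> 31;                         /* sign bit of the result */
    unsigned z = (r == 0);                        /* result is zero */
    unsigned c = (r < a);                         /* unsigned carry out of bit 31 */
    unsigned v = (~(a ^ b) & (a ^ r)) >> 31;      /* operands share a sign, result differs */

    return (n << 3) | (z << 2) | (c << 1) | v;
}
```

For the first example, `add_flags(0xFFFFFFFF, 1)` gives binary 0110: the result is zero and a carry leaves bit 31, but there is no signed overflow.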
ARM INSTRUCTION SET
Features:
Load/Store architecture
3-address data processing instructions
Conditional execution
Load/Store multiple registers
Shift & ALU operation in single clock cycle
ARITHMETIC OPERATIONS
LOGICAL, SHIFT/ROTATE
COMPARISON, MOVE
INSTRUCTIONS
LOAD/STORE INSTRUCTIONS
FLOW OF CONTROL
B – Branch
BL – Branch and Link
Conditional execution:
Each data processing instruction can be prefixed by a condition code.
Result: smooth flow of instructions through the pipeline.
16 condition codes:
ADDRESSING MODES IN ARM
…
• Absolute Addressing (via a PC-relative offset) - by adding or subtracting to the PC
(r15) a constant equal to the distance between the current instruction
and the desired location, the desired address can be generated
without performing a load.
ADDRESSING MODES IN ARM
…
• Base-plus-offset Addressing - related to indirect addressing;
the register value is added to another value to form the address.
E.g.: LDR r0, [r1, #16] loads r0 with the value stored at location r1 + 16.
r1 is the base and the immediate value is the offset.
C ASSIGNMENTS IN ARM
INSTRUCTIONS
x = (a+b) - c
The C statement y = a*(b+c) can be
coded as follows
IMPLEMENTING AN IF STATEMENT IN ARM
if (a < b) {
x = 5;
y = c + d;
}
else x = c - d;
FIR FILTER
IMPLEMENTATION IN ARM
The C code for the FIR filter using a for loop:
for (i = 0, f = 0; i < N; i++)
    f = f + c[i] * x[i];
Using a while loop:
i = 0;
f = 0;
while (i < N) {
    f = f + c[i]*x[i];
    i++;
}
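The loop above can be wrapped in a function so that it compiles and can be tested on its own. This is a minimal sketch assuming integer coefficients and samples; the function name `fir` and the parameter `n` (standing in for N) are illustrative choices.

```c
#include <stddef.h>

/* Compute f = c[0]*x[0] + ... + c[n-1]*x[n-1], as in the loop above. */
int fir(const int *c, const int *x, size_t n)
{
    int f = 0;

    for (size_t i = 0; i < n; i++)
        f = f + c[i] * x[i];
    return f;
}
```

For coefficients {1, 2, 3} and samples {4, 5, 6} the function returns 1*4 + 2*5 + 3*6 = 32.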
ARM CODE FOR FIR FILTER
I/O DEVICES
Usually includes some non-digital component.
Typical digital interface to CPU:
[Figure: the CPU connects to the device's status register and data register, which sit in front of the device mechanism.]
PROGRAMMING I/O
Two types of instructions can support I/O:
special-purpose I/O instructions;
memory-mapped load/store instructions.
Intel x86 provides in, out instructions. Most
other CPUs use memory-mapped I/O.
I/O instructions do not preclude memory-mapped
I/O.
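With memory-mapped I/O, device registers are read and written with ordinary loads and stores. The sketch below simulates a device with a status register and a data register in normal memory so that it can run on a host; the register layout, the `DEV_READY` bit, and the names `device_regs` and `dev_try_write` are assumptions, and on real hardware the structure would sit at an address fixed by the platform.

```c
#include <stdint.h>

/* Hypothetical device register block: one status and one data register. */
struct device_regs {
    volatile uint8_t status;   /* bit 0 set = device ready (assumed layout) */
    volatile uint8_t data;
};

#define DEV_READY 0x01u

/* Write one byte if the device is ready. Returns 1 on success, 0 if busy. */
int dev_try_write(struct device_regs *dev, uint8_t byte)
{
    if ((dev->status & DEV_READY) == 0)
        return 0;                          /* device busy: caller may retry */
    dev->data = byte;                      /* an ordinary store performs the I/O */
    dev->status &= (uint8_t)~DEV_READY;    /* simulation only: real hardware clears this */
    return 1;
}
```

The `volatile` qualifier keeps the compiler from caching or removing the register accesses. A busy/wait driver would call `dev_try_write` in a loop until it returns 1.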
INTERRUPT I/O
Busy/wait is very inefficient.
CPU can’t do other work while testing device.
Hard to do simultaneous I/O.
Interrupts allow a device to change the flow of
control in the CPU.
Causes subroutine call to handle device.
INTERRUPT INTERFACE
[Figure: CPU (PC, IR) connected to the device by intr request and intr ack lines and a data/address bus; the device has status and data registers in front of its mechanism.]
PRIORITIES AND VECTORS
Two mechanisms allow us to make interrupts more
specific:
Priorities determine what interrupt gets CPU first.
Vectors determine what code is called for each type of
interrupt.
Mechanisms are orthogonal: most CPUs provide
both.
PRIORITIZED INTERRUPTS
[Figure: interrupt request lines L1, L2, .., Ln enter the CPU, which returns an interrupt acknowledge.]
INTERRUPT PRIORITIZATION
Masking: interrupt with priority lower than current
priority is not recognized until pending interrupt is
complete.
Non-maskable interrupt (NMI): highest-priority,
never masked.
Often used for power-down.
EXAMPLE: PRIORITIZED I/O
[Figure: sequence diagram with :interrupts, :foreground, :A, :B, :C; one step shows requests A,B arriving together.]
INTERRUPT VECTOR ACQUISITION
[Figure: sequence diagram between :CPU and :device: the CPU receives the request, the device receives the ack, and the CPU receives the vector.]
GENERIC INTERRUPT MECHANISM
[Flowchart; assume priority selection is handled before this point.]
intr? N: continue execution. Y: compare priorities.
intr priority > current priority? N: ignore. Y: ack.
vector? Y: call table[vector]. N: timeout? Y: bus error.
INTERRUPT SEQUENCE
CPU acknowledges request.
Device sends vector.
CPU calls handler.
Software processes request.
CPU restores state to foreground program.
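The combination of priorities and vectors can be modeled in C with a table of handler pointers. This is only a software sketch of the mechanism under stated assumptions: the table size, the names `set_handler`, `dispatch_interrupt`, and `demo_handler`, and the use of a return code of -1 for a missing vector (standing in for the bus error path) are all hypothetical.

```c
#include <stddef.h>

#define NUM_VECTORS 4

typedef void (*handler_t)(void);

static handler_t vector_table[NUM_VECTORS];  /* table[vector] -> handler */
static int current_priority = 0;             /* priority of the running code */

int demo_count = 0;                          /* example handler state */
void demo_handler(void) { demo_count++; }    /* example handler: counts calls */

void set_handler(unsigned vector, handler_t h)
{
    if (vector < NUM_VECTORS)
        vector_table[vector] = h;
}

/*
 * Model one interrupt request.
 * Returns 1 if the handler ran, 0 if the request was masked,
 * and -1 if no valid vector exists (the error path).
 */
int dispatch_interrupt(int priority, unsigned vector)
{
    int saved;

    if (priority <= current_priority)
        return 0;                            /* not above current priority: ignore */
    if (vector >= NUM_VECTORS || vector_table[vector] == NULL)
        return -1;                           /* no usable vector */
    saved = current_priority;
    current_priority = priority;             /* handler runs at the new priority */
    vector_table[vector]();                  /* call table[vector] */
    current_priority = saved;                /* restore state of the foreground program */
    return 1;
}
```

The priority test and the table lookup are independent of each other, which mirrors the point that priorities and vectors are orthogonal mechanisms.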
ARM INTERRUPTS
ARM7 supports two types of interrupts:
Fast interrupt requests (FIQs).
Interrupt requests (IRQs).
Interrupt table starts at location 0.
TRAP
Trap (software interrupt): an exception generated
by an instruction.
Call supervisor mode.
ARM uses SWI instruction for traps.
SHARC offers three levels of software interrupts.
Called by setting bits in IRPTL register.
CO-PROCESSOR
Co-processor: added function unit that is called by
instruction.
Floating-point units are often structured as co-processors.
ARM allows up to 16 designer-selected co-processors.
Floating-point co-processor uses units 1, 2.
C55x uses co-processors as well.
CACHES AND CPUS
[Figure: the CPU sends addresses to the cache controller, which returns data from the cache or fetches it from main memory over address and data lines.]
CACHE OPERATION
Many main memory locations are mapped onto one
cache entry.
May have caches for:
instructions;
data;
data + instructions (unified).
Terms
Cache hit: required location is in cache.
Cache miss: required location is not in cache.
Working set: set of locations used by program in a time
interval.
TYPES OF MISSES
Compulsory (cold): location has never been
accessed.
Capacity: working set is too large.
Conflict: multiple locations in working set map to
same cache entry.
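The miss types can be seen in a tiny model of a cache in which each memory location maps onto exactly one entry. This is a sketch under assumptions not taken from the notes: an 8-entry cache of one-word entries, a modulo mapping from word address to entry, and the names `cache_init` and `cache_access`.

```c
#include <stdint.h>

#define NUM_LINES 8u   /* assumed cache size: 8 one-word entries */

struct cache_line { int valid; uint32_t tag; };
struct cache { struct cache_line line[NUM_LINES]; };

/* Mark every entry empty. */
void cache_init(struct cache *c)
{
    for (unsigned i = 0; i < NUM_LINES; i++)
        c->line[i].valid = 0;
}

/* Access one word address. Returns 1 on a hit, 0 on a miss (the entry is then filled). */
int cache_access(struct cache *c, uint32_t word_addr)
{
    uint32_t index = word_addr % NUM_LINES;   /* many locations share one entry */
    uint32_t tag   = word_addr / NUM_LINES;   /* identifies which location is stored */
    struct cache_line *l = &c->line[index];

    if (l->valid && l->tag == tag)
        return 1;
    l->valid = 1;                             /* replace whatever was there */
    l->tag = tag;
    return 0;
}
```

Accessing address 3 twice gives a compulsory miss and then a hit. Accessing address 11, which maps to the same entry, evicts it, so the next access to 3 is a conflict miss.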
MULTI-LEVEL CACHE ACCESS
TIME
h1 = cache hit rate.
h2 = hit rate on L2 (fraction of accesses that hit in L1 or L2).
Average memory access time:
$t_{av} = h_1 t_{L1} + (h_2 - h_1) t_{L2} + (1 - h_2) t_{main}$
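A worked calculation helps when reading the average access time. The sketch below takes h2 as the cumulative fraction of accesses served by L1 or L2, so that the three weights h1, h2 - h1, and 1 - h2 sum to one; that reading, the function name `avg_access_time`, and the sample timings are assumptions.

```c
/*
 * Average memory access time for a two-level cache.
 * h1: fraction of accesses that hit in L1.
 * h2: fraction of accesses that hit in L1 or L2 (h2 >= h1).
 * t_l1, t_l2, t_main: access times of L1, L2, and main memory.
 */
double avg_access_time(double h1, double h2,
                       double t_l1, double t_l2, double t_main)
{
    return h1 * t_l1 + (h2 - h1) * t_l2 + (1.0 - h2) * t_main;
}
```

With h1 = 0.9, h2 = 0.98 and access times of 1, 10, and 100 cycles, the terms are 0.9 + 0.8 + 2.0, about 3.7 cycles on average.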
CACHE PERFORMANCE
BENEFITS
Keep frequently-accessed locations in fast cache.
Cache retrieves more than one word at a time.
Sequential accesses are faster after first access.
DIRECT-MAPPED CACHE
[Figure: the address selects a cache block; the lookup produces hit and value, and the byte field selects a byte within the block.]
WRITE OPERATIONS
Write-through: immediately copy write to main
memory.
Write-back: write to main memory only when
location is removed from cache.
SET-ASSOCIATIVE CACHE
A set of direct-mapped caches:
[Figure: several direct-mapped banks accessed in parallel, producing hit and data.]
SCRATCH PAD MEMORIES
Alternative to cache:
Software determines what is stored in scratch pad.
Provides predictable behavior at the cost of
software control.
C55x cache can be configured as scratch pad.
MEMORY MANAGEMENT
UNITS
Memory management unit (MMU) translates addresses:
[Figure: the CPU sends a logical address to the memory management unit, which sends a physical address to main memory.]
MEMORY MANAGEMENT
TASKS
Allows programs to move in physical memory
during execution.
Allows virtual memory:
memory images kept in secondary storage;
images returned to main memory on demand during
execution.
Page fault: request for location not resident in
memory.
ADDRESS TRANSLATION
Requires some sort of register/table to allow
arbitrary mappings of logical to physical
addresses.
Two basic schemes:
segmented;
paged.
Segmentation and paging can be combined (x86).
SEGMENTS AND PAGES
[Figure: memory divided into segment 1 and segment 2, and into page 1 and page 2.]
SEGMENT ADDRESS
TRANSLATION
[Figure: segment address translation producing the physical address.]
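PAGE ADDRESS TRANSLATION
[Figure: the logical address is split into page and offset; the page number selects the page i base, which is concatenated with the offset to form the physical address.]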
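Page address translation can be sketched in C as a table lookup followed by a concatenation of the page base with the offset. The page size of 4 kbytes, the 16-entry flat table, and the names `page_entry` and `translate` are assumptions chosen for illustration.

```c
#include <stdint.h>

#define PAGE_BITS 12u                        /* assumed 4-kbyte pages */
#define PAGE_MASK ((1u << PAGE_BITS) - 1u)
#define NUM_PAGES 16u                        /* assumed size of the flat page table */

struct page_entry {
    int      resident;   /* nonzero if the page is in main memory */
    uint32_t base;       /* physical page number */
};

/*
 * Translate a logical address using a flat page table.
 * Returns 0 and stores the physical address on success,
 * or -1 if the page is not resident (a page fault).
 */
int translate(const struct page_entry *table, uint32_t logical, uint32_t *physical)
{
    uint32_t page   = logical >> PAGE_BITS;  /* upper bits select the page */
    uint32_t offset = logical & PAGE_MASK;   /* lower bits pass through unchanged */

    if (page >= NUM_PAGES || !table[page].resident)
        return -1;
    *physical = (table[page].base << PAGE_BITS) | offset;   /* concatenate */
    return 0;
}
```

If logical page 2 maps to physical page 5, the logical address 0x2ABC translates to 0x5ABC: only the page number changes.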
PAGE TABLE ORGANIZATIONS
[Figure: page descriptors organized as a flat table or as a tree.]
ARM MEMORY
MANAGEMENT
Memory region types:
section: 1 Mbyte block;
large page: 64 kbytes;
small page: 4 kbytes.
An address is marked as section-mapped or page-mapped.
Two-level translation scheme.
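ARM ADDRESS TRANSLATION
[Figure: two-level translation. The translation table base register is concatenated with the 1st index to find a descriptor in the 1st level table; that descriptor is concatenated with the 2nd index to find a descriptor in the 2nd level table; that descriptor and the offset give the physical address.]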
PIPELINING
Several instructions are executed simultaneously at
different stages of completion.
Various conditions can cause pipeline bubbles that
reduce utilization:
branches;
memory system delays;
etc.
PERFORMANCE MEASURES
Latency: time it takes for an instruction to get
through the pipeline.
Throughput: number of instructions executed per
time period.
Pipelining increases throughput without reducing
latency.
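The difference between latency and throughput can be made concrete with cycle counts for an idealized pipeline. The sketch assumes one cycle per stage, no stalls, and an unpipelined machine that spends one cycle per stage on every instruction; the function names are illustrative.

```c
/* Cycles to complete n instructions on an ideal pipeline with the given number of stages. */
unsigned long pipelined_cycles(unsigned long stages, unsigned long n)
{
    /* The first instruction takes `stages` cycles; each later one finishes a cycle after. */
    return n == 0 ? 0 : stages + n - 1;
}

/* Cycles for the same work when each instruction must finish before the next starts. */
unsigned long unpipelined_cycles(unsigned long stages, unsigned long n)
{
    return stages * n;
}
```

For a 3-stage pipeline and 100 instructions this gives 102 cycles against 300: each instruction still takes 3 cycles from fetch to completion, but one instruction completes per cycle once the pipeline is full.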
ARM7 PIPELINE
ARM 7 has 3-stage pipe:
fetch instruction from memory;
decode opcode and operands;
execute.
ARM PIPELINE EXECUTION
[Figure: pipeline timing diagram over time steps 1, 2, 3.]
PIPELINE STALLS
If every step cannot be completed in the same
amount of time, pipeline stalls.
Bubbles introduced by stall increase latency,
reduce throughput.
ARM MULTI-CYCLE LDMIA INSTRUCTION
[Figure: pipeline timing diagram for a multi-cycle LDMIA instruction.]
CONTROL STALLS
Branches often introduce stalls (branch penalty).
Stall time may depend on whether branch is taken.
May have to squash instructions that already
started executing.
Don’t know what to fetch until condition is
evaluated.
MEMORY SYSTEM PERFORMANCE
Caches introduce indeterminacy in execution time.
Depends on order of execution.
Cache miss penalty: added time due to a cache
miss.
CPU POWER CONSUMPTION
Most modern CPUs are designed with power
consumption in mind to some degree.
Power vs. energy:
heat depends on power consumption;
battery life depends on energy consumption.
CPU POWER-SAVING STRATEGIES
Reduce power supply voltage.
Run at lower clock frequency.
Disable function units with control signals when
not in use.
Disconnect parts from power supply when not in
use.
POWER MANAGEMENT STYLES
Static power management: does not depend on
CPU activity.
Example: user-activated power-down mode.
Dynamic power management: based on CPU
activity.
Example: turning off function units.
ARM7 - APPLICATIONS
The ARM7 is ideally suited to those applications
requiring RISC performance from a compact, power-
efficient processor
Telecomms - GSM terminal controller
Datacomms - Protocol conversion
Portable Computing - Palmtop computer
Portable Instrument - Handheld data acquisition unit
Automotive - Engine management unit
Information systems - Smart cards
Imaging - JPEG controller