Computing Platforms: Design Methodology. Consumer Electronics Architectures. System-Level Performance and Power Analysis

Uploaded by

Lordwin Micheal

Computing platforms

Design methodology.
Consumer electronics architectures.
System-level performance and power analysis.
Evaluation boards

Designed by CPU manufacturer or others.
Includes CPU, memory, some I/O devices.
May include prototyping section.
CPU manufacturer often gives out reference design: can be used as starting point for your custom board design.
ARM evaluation module

ARM processor.
Display, serial port, etc.
Prototyping area.
BeagleBoard

OMAP processor.
Audio input and output.
Video output.
SD card.
Choosing a platform

CPU: choice of instruction sets, features, etc.
Bus determines available I/O devices, system performance.
Memory size, speed.
I/O devices vary in performance, cost.
Intellectual property

Hardware designs, source or object code, netlists, etc.
Used at all levels of design:
Schematics for hardware reference design.
Drivers and run-time libraries.
Software development environments.
BeagleBoard IP

PCB schematics and artwork files.
Bill of materials for components.
Compiler.
Linux.
Debugging embedded systems

Challenges:
target system may be hard to observe;
target may be hard to control;
may be hard to generate realistic inputs;
setup sequence may be complex.
Host/target design

Use a host system to prepare software for target system:
[Figure: host system connected to target system by a serial line.]
Host-based tools

Cross compiler:
compiles code on host for target system.
Cross debugger:
displays target state, allows target system to be controlled.
Software debuggers

A monitor program residing on the target provides basic debugger functions.
Debugger should have a minimal footprint in memory.
User program must be careful not to destroy debugger program, but debugger should be able to recover from some damage caused by user code.
Breakpoints

A breakpoint allows the user to stop execution, examine system state, and change state.
Replace the breakpointed instruction with a subroutine call to the monitor program.
ARM breakpoints

uninstrumented code      code with breakpoint
0x400 MUL r4,r6,r6       0x400 MUL r4,r6,r6
0x404 ADD r2,r2,r4       0x404 ADD r2,r2,r4
0x408 ADD r0,r0,#1       0x408 ADD r0,r0,#1
0x40c B loop             0x40c BL bkpoint

Breakpoint handler actions

Save registers.
Allow user to examine machine.
Before returning, restore system state.
Safest way to execute the instruction is to replace it and execute in place.
Put another breakpoint after the replaced breakpoint to allow restoring the original breakpoint.
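
The replace-and-restore idea can be sketched with a toy model. This is only an illustration under stated assumptions: instruction memory is a Python dict of address to instruction text, and the helper names `set_breakpoint` and `clear_breakpoint` are hypothetical, not part of any real monitor or debugger. The addresses and instructions are the ones from the ARM breakpoints slide.

```python
# Toy model of software breakpoints. Instruction memory is a dict of
# address -> instruction text; all names here are hypothetical.

BREAK_INSTR = "BL bkpoint"  # subroutine call into the monitor program


def set_breakpoint(memory, saved, addr):
    """Replace the instruction at addr with a call to the monitor."""
    if addr in saved:
        return  # a breakpoint is already installed at this address
    saved[addr] = memory[addr]  # remember the original instruction
    memory[addr] = BREAK_INSTR


def clear_breakpoint(memory, saved, addr):
    """Put the original instruction back so it can execute in place."""
    memory[addr] = saved.pop(addr)


memory = {
    0x400: "MUL r4,r6,r6",
    0x404: "ADD r2,r2,r4",
    0x408: "ADD r0,r0,#1",
    0x40C: "B loop",
}
saved = {}

set_breakpoint(memory, saved, 0x40C)
instrumented = dict(memory)  # snapshot: code with breakpoint
clear_breakpoint(memory, saved, 0x40C)
restored = dict(memory)      # snapshot: original code again
```

A real monitor would also save and restore registers around the handler; the sketch covers only the instruction swap.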
In-circuit emulators

A microprocessor in-circuit emulator is a specially-instrumented microprocessor.
Allows you to stop execution, examine CPU state, modify registers.
Logic analyzers

A logic analyzer is an array of low-grade oscilloscopes.
Logic analyzer architecture

[Figure: logic analyzer block diagram. Labels: UUT, sample memory, microprocessor, system clock, vector address, controller, state or timing mode, clock gen, keypad, display.]
Boundary scan

Simplifies testing of multiple chips on a board.
Registers on pins can be configured as a scan chain.
Used for debuggers, in-circuit emulators.
How to exercise code

Run on host system.
Run on target system.
Run in instruction-level simulator.
Run on cycle-accurate simulator.
Run in hardware/software co-simulation environment.
Debugging real-time code

Bugs in drivers can cause non-deterministic behavior in the foreground program.
Bugs may be timing-dependent.
Consumer electronics use cases

Multimedia: stored in compressed form, uncompressed on viewing.
Data storage and management: keep track of your multimedia, etc.
Communication: download, upload, chat.
Non-functional requirements for CE

Often battery-operated, strict power budget.
Very inexpensive.
User interface must be capable but inexpensive.
CE devices and hosts

Many devices talk to host system.
PC host does things that are hard to do on the device.
Increasingly, CE devices communicate directly over the network, avoiding the host for access.
Platforms and operating systems

Many CE devices use a DSP for signal processing and a RISC CPU for other tasks.
I/O devices include buttons, screen, USB.
Flash file systems

Flash is widely used for mass storage.
Flash wears out on writing (up to 1 million cycles).
Directory is most often written, wears out first.
Flash file system has layer that moves contents to level wear.
Hides wear leveling from API.
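
A minimal sketch of a wear-leveling layer, assuming a simple policy in which each write goes to the least-worn free physical block. The class `WearLevelingFlash` and its methods are hypothetical names for illustration; real flash file systems use more elaborate schemes.

```python
class WearLevelingFlash:
    """Toy flash translation layer; all names are hypothetical."""

    def __init__(self, num_physical_blocks):
        self.erase_counts = [0] * num_physical_blocks
        self.data = [None] * num_physical_blocks
        self.mapping = {}  # logical block -> physical block
        self.free = set(range(num_physical_blocks))

    def write(self, logical_block, payload):
        # Put the new copy in the least-worn free physical block.
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        self.data[target] = payload
        # Retire the old copy: erase it and return it to the free pool.
        old = self.mapping.get(logical_block)
        if old is not None:
            self.data[old] = None
            self.erase_counts[old] += 1
            self.free.add(old)
        self.mapping[logical_block] = target

    def read(self, logical_block):
        # The caller sees only logical blocks; the remapping is hidden.
        return self.data[self.mapping[logical_block]]


flash = WearLevelingFlash(num_physical_blocks=4)
for version in range(100):
    flash.write(0, f"directory v{version}")  # one frequently written block
```

The caller only ever names logical block 0, yet the erases are spread across all four physical blocks instead of landing on one, which is the sense in which the layer hides wear leveling from the API.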
Cell phones

Most popular CE device in history; most widely used computing device.
1 billion sold per year.
Handset talks to cell.
Cells hand off handset as it moves.
Cell phone platforms

Today’s cell phones use analog front end, digital baseband processing.
Future cell phones will perform IF processing with DSP.
Baseband processing in DSP:
Voice compression.
Network protocol.
Other processing:
Multimedia functions.
User interface.
File system.
Applications (contacts, etc.).
© 2000 Morgan Kaufmann. Overheads for Computers as Components.
System-level performance analysis

Performance depends on all the elements of the system:
CPU.
Cache.
Bus.
Main memory.
I/O device.
[Figure: CPU, cache, and memory.]

© 2008 Wayne Wolf. Overheads for Computers as Components, 2nd ed.
Bandwidth as performance

Bandwidth applies to several components:
Memory.
Bus.
CPU fetches.
Different parts of the system run at different clock rates.
Different components may have different widths (bus, memory).
Bandwidth and data transfers

Video frame: 320 x 240 x 3 = 230,400 bytes.
Transfer in 1/30 sec.
Transfer 1 byte/µsec, 0.23 sec per frame.
Too slow.
Increase bandwidth:
Increase bus width.
Increase bus clock rate.
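
The arithmetic on this slide can be checked with a short sketch. It assumes a transfer rate of one byte per microsecond, which is the rate consistent with the 0.23 sec per frame figure; the variable names are illustrative only.

```python
WIDTH, HEIGHT, BYTES_PER_PIXEL = 320, 240, 3
FRAME_RATE = 30  # frames per second

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL  # bytes in one frame
deadline = 1 / FRAME_RATE                       # seconds allowed per frame

bus_rate = 1_000_000                    # bytes/sec, assumed 1 byte per microsecond
transfer_time = frame_bytes / bus_rate  # seconds to move one frame

meets_deadline = transfer_time <= deadline
speedup_needed = transfer_time / deadline  # factor by which bandwidth must grow

print(f"{frame_bytes} bytes per frame, {transfer_time:.4f} sec per frame")
print(f"meets 1/30 sec deadline: {meets_deadline}, speedup needed: {speedup_needed:.2f}x")
```

Under these assumptions the transfer misses the deadline by roughly a factor of seven, which is the gap that a wider bus or a faster bus clock has to close.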
Bus bandwidth

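T: # bus cycles.
P: time/bus cycle.
Total time for transfer: t = TP.
D: data payload length.
O1 + O2 = overhead O.
Cycles to transfer N bytes: T_basic(N) = (D+O)N/W.
[Figure: one bus transfer of width W, made up of overhead O1, data payload D, and overhead O2.]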
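Bus burst transfer bandwidth

T: # bus cycles.
P: time/bus cycle.
Total time for transfer: t = TP.
D: data payload length.
O1 + O2 = overhead O.
Cycles to transfer N bytes: T_burst(N) = (BD+O)N/(BW).
[Figure: a burst of B data transfers (1, 2, …, B), each of width W, followed by overhead O.]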
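
The two cycle-count formulas can be compared side by side. In this sketch the function names `t_basic` and `t_burst` and the sample parameter values are illustrative assumptions, not values taken from the slides.

```python
def t_basic(n, d, o, w):
    """Bus cycles to move n bytes: each w-byte transfer costs d + o cycles."""
    return (d + o) * n / w


def t_burst(n, d, o, w, b):
    """Bus cycles to move n bytes when b transfers share one overhead o."""
    return (b * d + o) * n / (b * w)


# Hypothetical parameters: 1024 bytes, 1 data cycle, 3 overhead cycles,
# a 2-byte-wide bus, bursts of 4 transfers, 1 microsecond per bus cycle.
n, d, o, w, b = 1024, 1, 3, 2, 4
period = 1e-6

basic_cycles = t_basic(n, d, o, w)
burst_cycles = t_burst(n, d, o, w, b)
basic_time = basic_cycles * period  # t = TP
burst_time = burst_cycles * period

print(basic_cycles, burst_cycles)
```

With a burst length of 1 the burst formula reduces to the basic one; longer bursts spread the overhead O over more payload.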
Memory aspect ratios

[Figure: memory aspect ratios. Labels: 64 M, 16 M, 8 M, 8.]
Memory access times

Memory component access times come from chip data sheet.
Page modes allow faster access for successive transfers on same page.
If data doesn’t fit naturally into physical words:
A = [(E/w) mod W] + 1
Bus performance bottlenecks

Transfer 320 x 240 video frame @ 30 frames/sec = 612,000 bytes/sec.
Is performance bottleneck bus or memory?
[Figure: CPU and memory.]
Bus performance bottlenecks, cont’d.

Bus: assume 1 MHz bus, W=2, D=1, O=3:
T_basic = (1+3)612,000/2 = 1,224,000 cycles = 1.224 sec.
Memory: try burst mode B=4, width W=0.5, D=1, O=4, 10 ns clock:
T_mem = (4*1+4)612,000/(4*0.5) = 2,448,000 cycles = 0.02448 sec.
Performance spreadsheet
                     bus         memory
clock period (sec)   1.00E-06    1.00E-08
W                    2           0.5
D                    1           1
O                    3           4
B                                4
N                    612000      612000
T_basic (cycles)     1224000
T_mem (cycles)                   2448000
t (sec)              1.22E+00    2.45E-02
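
The spreadsheet rows can be recomputed with a few lines. This is a sketch that takes every parameter, including N = 612,000, from the spreadsheet as given; the dictionary layout and variable names are illustrative assumptions.

```python
N = 612_000  # bytes to transfer per second, as given in the spreadsheet

# Bus column: basic (non-burst) transfers.
bus = {"period": 1e-6, "W": 2, "D": 1, "O": 3}
bus_cycles = (bus["D"] + bus["O"]) * N / bus["W"]
bus_time = bus_cycles * bus["period"]

# Memory column: burst transfers of B words.
mem = {"period": 1e-8, "W": 0.5, "D": 1, "O": 4, "B": 4}
mem_cycles = (mem["B"] * mem["D"] + mem["O"]) * N / (mem["B"] * mem["W"])
mem_time = mem_cycles * mem["period"]

# The slower component limits the transfer.
bottleneck = "bus" if bus_time > mem_time else "memory"

print(f"bus:    {bus_cycles:.0f} cycles, {bus_time:.3e} sec")
print(f"memory: {mem_cycles:.0f} cycles, {mem_time:.3e} sec")
print(f"bottleneck: {bottleneck}")
```

With these numbers the bus needs more than one second to move one second of video while the memory needs only a few hundredths of a second, so the bus is the limiting component.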
Parallelism

Speed things up by running several units at once.
DMA provides parallelism if CPU doesn’t need the bus:
DMA + bus.
CPU.
