Devembeddedf (Compatibility Mode)


Embedded System Design

Dr. J.T. Devaraju
Dept. of Electronic Science
Bangalore University, Bangalore

Microprocessor vs microcontroller

[Figure: block diagrams of a microprocessor-based embedded system and a microcontroller-based embedded system. Each shows program memory, data storage, I/O and a real-time clock around a microprocessor (or microprocessor core), with I/O connecting to the outside world.]

• Microcontroller advantages
– lower cost, more reliable, better performance, faster and lower RF signature
– may be less flexible for research and development projects

Outline

• Embedded systems overview
– What are they?
• Design challenge – optimizing design metrics
• Technologies
– Processor technologies
– IC technologies
– Design technologies

What is an embedded system?

• An embedded system can be defined as a computing device embedded within an electronic device that does a specific, focused job.
• Embedded systems do a very specific task; they cannot be programmed to do different things.
• Embedded systems have very limited resources, particularly memory. Generally, they do not have secondary storage devices such as a CD-ROM or floppy disk drive.
• Hard to define. Nearly any computing system other than a desktop computer.

Embedded system cont…

• Embedded system technology is one of the highest growth areas, and each day our lives become more dependent on embedded systems.
• Recent statistics claim that an average home in cities has between 50 and 100 microcontrollers busy computing and controlling different environments and devices.
• A Mercedes S-Class car has 63 microprocessors. A BMW car has 65 microprocessors.
• Maybe after some years, who knows, human organs like the liver, kidney etc. may be replaced by embedded systems.

Applications

• Consumer electronics
• Office automation
• Industrial automation
• Biomedical engineering
• Wireless communication
• Data communication
• Telecommunications
• Automobile industry
• Military and so on.

Consumer electronics
• Digital camera, digital diary
• DVD player, electronic toys
• Microwave oven
• Remote controls for TV, air-conditioner
• Video game consoles, video recorders
• Wristwatches
• Mobile phones, PDAs
• Palmtops etc.

Office Automation
• Copying machine
• Fax machine
• Key telephone
• Printer
• Scanner
• Modem etc.

Industrial Automation
• For process control in pharmaceutical, cement, sugar, oil exploration, nuclear energy, electricity generation and transmission.
• For monitoring the temperature, pressure, humidity, etc., and then taking appropriate action based on the monitored levels to control other devices or to send information to a centralized monitoring station.
• In hazardous industrial environments, robots are used for complicated tasks such as hardware assembly.

Medical electronics
• ECG, EEG
• Blood pressure measuring devices
• X-ray scanners
• Equipment used in blood analysis, radiation, colonoscopy, endoscopy
• Advanced microscopes etc.

Computer networking
• Bridges, routers, Integrated Services Digital Networks (ISDN), Asynchronous Transfer Mode (ATM), X.25 and frame relay switches are embedded systems which implement the necessary data communication protocols.

Telecommunications
• Subscriber terminals: key telephones, ISDN phones, terminal adapters, web cameras
• Network equipment: multiplexers, multiple access systems, Packet Assemblers Disassemblers (PADs), satellite modems etc.
• IP phone, IP gateway, IP gatekeeper etc. are the latest embedded systems that provide very low-cost voice communication over the Internet.

Wireless technologies
• Base station controllers
• Router server
• Mobile switching centers

Instrumentation
• Testing and measurement equipment in scientific and engineering activities, such as oscilloscope, spectrum analyzer, logic analyzer, protocol analyzer, radio communication test set etc.

Security
• Security devices at homes, offices, airports etc. for authentication and verification
• Access control in high-security buildings

Automobile Industry
Today's high-tech car has about 20 embedded systems for transmission control,
• Engine spark control,
• Air-conditioning,
• Digital tachometer,
• Navigation etc.

Finance
• Smart cards (a smart card has a small microcontroller and memory, and it interacts with the smart card reader)
• ATM

BMW car Night Vision
Towards Autonomous Vehicles

http://iLab.usc.edu
http://beobots.org

TELEMEDICINE VIA SATELLITE

• Extension education for doctors in rural/remote areas
• Reaching the un-reached

[Figure: satellite telemedicine network linking rural / remote / inaccessible health centres and ambulances with referral hospitals and a panel of specialist doctors (cardiology, pathology) through video conferencing]

[Product photos: Buffalo's LinkStation Mini; Polaroid Zink; iKIT with keyboard]

The iPhone is a line of Internet- and multimedia-enabled smartphones
• Camera phone
• Text messaging and visual voicemail
• Portable media player (equivalent to a video iPod)
• Internet client with e-mail, web browsing, and HSDPA, Wi-Fi 802.11b/g connectivity
• WAP
• Multi-touch input method (virtual keyboard)
• GPRS, EDGE
• Bluetooth
• Accelerometer sensor for auto-rotate
• Proximity sensor
• Scratch-resistant surface
• Call records up to 100
• 8 GB / 16 GB storage, 128 MB RAM
• Infrared port
• USB
• Camera

• The first Google-powered handsets hit stores, and already the first disposable Android phone is coming…
• Hop-on will announce its Android phone in January 2009 at CES

Myvu Crystal "Personal Viewer"
• Connects easily to your video iPod or portable media player.
• Weighing about 1 ounce, Myvu's SolidOptex™ optical technology provides the user with the impression of a free-floating monitor. Let's call it a "Monitor-on-the-Nose."

• This robot has humidifying, oxygen-producing, aroma-emitting, and kinetic functions.
• The robotic plant can interact with people when they approach it and can 'dance' when music is played.
• It is about 1.30 meters tall and 40 centimeters in diameter. (The flower, not the kid… he's shorter…)

• Wearable computing
– Several Bluetooth helmets have been developed for skiing and motorcycling by companies such as Marker and Motorola.
– Jackets that plug into all of your gear and create a personal area network are available from ScotteVest.
– Bluetooth glove phone (Jason Bradbury, UK)

• 3M Mini-Projector is designed for business professionals
– TI and others develop the smallest projectors

• First, it will be the big 3D cinema roll-out
• Then Hollywood will encourage home TV as an outlet for their already paid-for 3D content
• Philips is creating some of the first 3D TVs that don't require glasses (auto-stereoscopic displays are built with dozens of micro lenses that transmit different images to the right and left eyes).

• The Minoru uses two lenses to capture your images and videos in 3-D.
• It can be used with programs like Windows Live Messenger.
• To see the 3-D image, your chat partner needs (you guessed it) some of those new 3D glasses!

• Plastic Logic will target business readers
• KINDLE, to the surprise of all, sold 240,000 units before Q4
• This $350 e-book reader is Amazon's iPod, at 378,000 units this year. The Kindle will in 3 years be a $1.1 billion business and 4% of all Amazon sales.

Robotic Applications
• Parts handling
• Assembly
• Painting
• Surveillance
• Security (bomb disposal … really telecherics rather than robotics)
• Home help (grass cutting, nursing)

[Photo: Iowa State robot available for ribbon cuttings, birthday parties, uprisings]

Applications cont..

• Well, the list goes on. It is no exaggeration to say that, wherever you go, you can see, or at least feel, the work of an embedded system!

Some common characteristics of embedded systems

• Single-functioned
• Tightly-constrained
• Reactive and real-time

Single-functioned

• Executes a single program repeatedly, unlike a desktop PC.
• Embedded systems do a very specific task; they cannot be programmed to do different things.
• Ex: pager, weighing machine.

Tightly-constrained

(Low cost, low power, small, fast)

• Embedded systems have to work against deadlines. A specific job has to be completed within a specific time. In some embedded systems, called real-time systems, the deadlines are stringent. Missing a deadline may cause a catastrophe: loss of life or damage to property.
• Ex: a missile that has to track and intercept an enemy aircraft. The missile contains an embedded system that tracks the aircraft and generates a control signal that will launch the missile. If there is a delay in tracking the aircraft and the missile misses the deadline, the enemy aircraft may drop a bomb and cause the loss of many lives. Hence, this system is a hard real-time embedded system.
• The engine management program of a car must generate pulses that actuate the fuel injectors in a timely and calculated pattern.
• Brake control system

Tightly-constrained cont…

• Embedded systems are constrained for power. As many embedded systems operate from a battery, the power consumption has to be very low.
• Embedded systems need to be highly reliable. Once in a while, pressing ALT-CTRL-DEL is OK on your desktop, but you cannot afford to reset your embedded system. It is not acceptable for your car engine management system to require occasional rebooting because of a software hang-up.
• Some embedded systems have to operate in extreme environmental conditions such as very high temperatures and humidity.
• Embedded systems that address the consumer market (for example, electronic toys) are very cost-sensitive: even a reduction of Rs 1 is a lot of cost saving, because thousands or millions of systems may be sold.

Reactive and real-time

• Continually reacts to changes in the system's environment
• Must compute certain results in real time without delay

The embedded system market is one of the highest growth areas, as these systems are used in every market segment.

Recent advancements in embedded systems: increasing integration of communication, multimedia and processing, and the relentless digitization of data (including even RF data), continue to expand the scope and complexity of ES.

To appreciate these advances and to contribute productively to the function of these systems, a comprehensive understanding of the technology behind the embedded system is a must.

The growing popularity of modern ES design requires electronics and computer engineers who can cross the boundary between HW and SW, knowing the technology capabilities and limitations of both in order to build ES, and the methods to evaluate a design.

Categories of Embedded Systems

Based on functionality and performance requirements, embedded systems can be categorized as
• Stand-alone embedded systems
• Real-time systems
• Networked information appliances
• Mobile devices

Stand-alone Embedded Systems

As the name implies, stand-alone systems work in stand-alone mode. They take inputs, process them and produce the desired output.
Examples: automobiles, consumer electronic items etc.

Real Time Systems

Embedded systems in which some specific work has to be done in a specific time period are called real-time systems.

Hard real-time systems

For example, consider a system that has to open a valve within 30 milliseconds when the humidity crosses a particular threshold. If the valve is not opened within 30 milliseconds, a catastrophe may occur. Such systems with strict deadlines are called hard real-time systems.

Soft real-time systems

In some embedded systems, deadlines are imposed, but not adhering to them once in a while may not lead to a catastrophe. For example, consider a DVD player. Suppose you give a command to the DVD player from a remote control, and there is a delay of a few milliseconds in executing that command. This delay won't have serious implications. Such systems are called soft real-time systems.

Networked Information Appliances

Embedded systems that are provided with network interfaces and accessed by networks such as a Local Area Network or the Internet are called networked information appliances.

Mobile Devices

Mobile devices such as mobile phones, Personal Digital Assistants (PDAs), smart phones etc. are a special category of embedded systems.

An embedded system example -- a digital camera

[Figure: digital camera chip. Blocks: lens, CCD, A2D, CCD preprocessor, pixel coprocessor, D2A, JPEG codec, microcontroller, multiplier/accumulator, DMA controller, display ctrl, memory controller, ISA bus interface, UART, LCD ctrl]

• Single-functioned -- always a digital camera
• Tightly-constrained -- low cost, low power, small, fast
• Reactive and real-time -- only to a small extent

Design challenge – optimizing design metrics

• Obvious design goal:
– Construct an implementation with desired functionality
• Key design challenge:
– Simultaneously optimize numerous design metrics
• Design metric
– A measurable feature of a system's implementation
– Optimizing design metrics is a key challenge

Design challenge – optimizing design metrics

• Common metrics
– Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost
– NRE cost (Non-Recurring Engineering cost): the one-time monetary cost of designing the system
– Size: the physical space required by the system
– Performance: the execution time or throughput of the system
– Power: the amount of power consumed by the system
– Flexibility: the ability to change the functionality of the system without incurring heavy NRE cost

Design challenge – optimizing design metrics

• Common metrics (continued)
– Time-to-prototype: the time needed to build a working version of the system
– Time-to-market: the time required to develop a system to the point that it can be released and sold to customers
– Maintainability: the ability to modify the system after its initial release
– Correctness, safety, many more

Design metric competition -- improving one may worsen others

[Figure: competing design metrics: power, performance, size, NRE cost]

• Metrics typically compete with one another; improving one often leads to worsening of another
• Expertise with both software and hardware is needed to optimize design metrics
– Not just a hardware or software expert, as is common
– A designer must be comfortable with various technologies in order to choose the best for a given application and constraints

Time-to-market: a demanding design metric

• Time required to develop a product to the point it can be sold to customers
• Market window
– Period during which the product would have highest sales
• Average time-to-market constraint is about 8 months
• Delays can be costly

[Figure: revenues ($) versus time (months)]

Losses due to delayed market entry

• Simplified revenue model
– Product life = 2W, peak at W
– Time of market entry defines a triangle, representing market penetration
– Triangle area equals revenue
• Loss
– The difference between the on-time and delayed triangle areas

[Figure: revenues ($) versus time. The on-time triangle rises (market rise) to peak revenue at W and falls (market fall) to zero at 2W; the delayed triangle starts at D and reaches a lower peak revenue from delayed entry]

Losses due to delayed market entry (cont.)

• Area = 1/2 * base * height
– On-time = 1/2 * 2W * W
– Delayed = 1/2 * (W-D+W)*(W-D)
• Percentage revenue loss = (D(3W-D)/(2W^2))*100%
• Example
– Lifetime 2W=52 wks, delay D=4 wks
– (4*(3*26 – 4)/(2*26^2)) = 22%
– Lifetime 2W=52 wks, delay D=10 wks
– (10*(3*26 – 10)/(2*26^2)) = 50%
– Delays are costly!

[Figure: revenues ($) versus time, with on-time and delayed entry triangles over D, W and 2W]

NRE and unit cost metrics

• Costs:
– Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost
– NRE cost (Non-Recurring Engineering cost): the one-time monetary cost of designing the system
– total cost = NRE cost + unit cost * # of units
– per-product cost = total cost / # of units = (NRE cost / # of units) + unit cost
• Example
– NRE=$2000, unit=$100
– For 10 units
– total cost = $2000 + 10*$100 = $3000
– per-product cost = $2000/10 + $100 = $300
• Amortizing NRE cost over the units results in an additional $200 per unit
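The two worked examples above (revenue loss from a delayed entry, and per-product cost with amortized NRE) can be checked numerically. The following is a minimal sketch, not part of the original slides; the function names `revenue_loss_percent` and `per_product_cost` are made up for illustration, and the model is the simplified triangle model with product life 2W and peak at W.

```python
def revenue_loss_percent(W, D):
    """Percentage revenue loss when market entry is delayed by D.

    Simplified triangle model: product life is 2W, revenue peaks at W.
    """
    on_time = 0.5 * (2 * W) * W              # area of the on-time triangle
    delayed = 0.5 * (W - D + W) * (W - D)    # area of the delayed triangle
    return (on_time - delayed) / on_time * 100


def per_product_cost(nre, unit, units):
    """Per-product cost with the NRE cost amortized over all units."""
    total = nre + unit * units               # total cost = NRE + unit cost * units
    return total / units


# Lifetime 2W = 52 weeks, so W = 26
for delay in (4, 10):
    print(delay, "weeks late:", round(revenue_loss_percent(26, delay)), "% loss")

print("per-product cost:", per_product_cost(2000, 100, 10))
```

With W = 26 the sketch reproduces the 22% and 50% losses for delays of 4 and 10 weeks, and the $300 per-product cost for 10 units.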

The performance design metric

• Measure of how long the system takes to execute desired tasks.
• Widely-used measure of system, widely-abused
– Clock frequency, instructions per second – not good measures
– Digital camera example – a user cares about how fast it processes images, not clock speed or instructions per second
• Latency (response time)
– Time between task start and end
– e.g., Cameras A and B process images in 0.25 seconds
• Throughput
– Tasks per second, e.g. Camera A processes 4 images per second
– Throughput can be more than latency seems to imply due to concurrency, e.g. Camera B may process 8 images per second (by capturing a new image while previous image is being stored).
• Speedup of B over A = B's performance / A's performance
– Throughput speedup = 8/4 = 2

NRE and unit cost metrics

• Compare technologies by costs -- best depends on quantity
– Technology A: NRE=$2,000, unit=$100
– Technology B: NRE=$30,000, unit=$30
– Technology C: NRE=$100,000, unit=$2
• But, must also consider time-to-market

[Figure: total cost (x1000), $0 to $200,000, and per-product cost, $0 to $200, versus number of units (volume), 0 to 2400, for technologies A, B and C]
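The claim that the best technology depends on quantity can be made concrete with the NRE and unit costs quoted for technologies A, B and C. This is a rough sketch under the slide's own cost model (total cost = NRE + unit cost * units); the helper names `total_cost`, `cheapest` and `break_even` are illustrative only.

```python
# (NRE cost, unit cost) in dollars, as quoted for technologies A, B and C
TECHNOLOGIES = {"A": (2_000, 100), "B": (30_000, 30), "C": (100_000, 2)}


def total_cost(name, units):
    """Total cost = NRE cost + unit cost * number of units."""
    nre, unit = TECHNOLOGIES[name]
    return nre + unit * units


def cheapest(units):
    """Technology with the lowest total cost at the given volume."""
    return min(TECHNOLOGIES, key=lambda name: total_cost(name, units))


def break_even(first, second):
    """Volume at which the two technologies have equal total cost."""
    nre1, unit1 = TECHNOLOGIES[first]
    nre2, unit2 = TECHNOLOGIES[second]
    return (nre2 - nre1) / (unit1 - unit2)


for volume in (100, 1000, 3000):
    print(volume, "units ->", cheapest(volume))
print("A/B break-even:", break_even("A", "B"))
print("B/C break-even:", break_even("B", "C"))
```

Under these numbers A is cheapest below 400 units, B between 400 and 2500 units, and C above 2500 units. Time-to-market is not modelled here.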

Three key embedded system technologies

• Technology
– A manner of accomplishing a task, especially using technical processes, methods, or knowledge
• Three key technologies for embedded systems
– Processor technology
– IC technology
– Design technology

Processor Technology

[Figure: processor technology divides into general-purpose processors (microprocessors & microcontrollers), application-specific processors (ASIP) and single-purpose processors (specific processor)]

Processor technology

• The architecture of the computation engine used to implement a system's desired functionality
• Processor does not have to be programmable
– "Processor" not equal to general-purpose processor

Processor technology

• Processors vary in their customization for the problem at hand

Desired functionality:
total = 0
for i = 1 to N loop
    total += M[i]
end loop

[Figure: controller and datapath of three processor types implementing the desired functionality. General-purpose processor ("software"): control logic and state register, IR, PC, register file, general ALU, program memory holding the assembly code for the loop, data memory. Application-specific processor: control logic and state register, IR, PC, registers, custom ALU, program memory holding the assembly code, data memory. Single-purpose processor ("hardware"): control logic, state register, index and total registers, an adder, data memory.]

General-purpose processors

• Programmable device used in a variety of applications
– Also known as "microprocessor"
• Features
– Program memory
– General datapath with large register file and general ALU
• User benefits
– Low time-to-market and NRE costs
– Unit cost may be low for small quantities
– High flexibility
• "Pentium" the most well-known, but there are hundreds of others

– Performance may be fast for computation-intensive applications
– Unit cost may be relatively high for large quantities
– Performance may be slow for certain operations
– Size and power may be large

[Figure: general-purpose processor. Controller (control logic and state register, IR, PC) and datapath (register file, general ALU), with program memory holding the assembly code for the loop, and data memory]

Single-purpose processors

• Digital circuit designed to execute exactly one program
– a.k.a. coprocessor, accelerator or peripheral
• Features
– Contains only the components needed to execute a single program
– No program memory
• Benefits
– Fast
– Low power
– Small size

[Figure: single-purpose processor. Controller (control logic, state register) and datapath (index, total, adder), with data memory]

Application-specific processors

• Programmable processor optimized for a particular class of applications having common characteristics
– Compromise between general-purpose and single-purpose processors
• Features
– Program memory
– Optimized datapath
– Special functional units
• Benefits
– Some flexibility, good performance, size and power

[Figure: application-specific processor. Controller (control logic and state register, IR, PC) and datapath (registers, custom ALU), with program memory holding the assembly code for the loop, and data memory]
                 GPP                        SPP                        ASIP
Performance      Fast; low for certain      Fast                       < GPP, good
                 applications
Size             Large                      Low                        Good
Power            Large                      Low                        Good
Unit cost        Low (small quantity),      Low (large quantity),
                 high (large quantity)      high (small quantity)
Flexibility      High                       Low
Design time      Low                        High
NRE cost         Low                        High                       Large
Time to market   Low                        High

How to choose a processor?

There are over 300 alternatives

[Figure: factors in choosing a processor: real-time constraints, legacy code, power budget, cost of goods, performance, time to market, tool support, landmines]

Selection of system components

• Adequacy of I/O and processor capabilities
– Digital and analog input/output facilities
– Timers, PWM channels, encoder interfaces
– Communication channels: I2C, RS232, parallel, CAN, Ethernet
– Interrupt sources and configurability
• Performance
– Processor core speed (MIPS)
– I/O overhead and performance (i.e. acquisition speed, accuracy)
– Dedicated hardware for communications and timing operations
– Requires a collection of benchmarking standards and tools
• RTOS and tool chain availability
– RTOS compatibility, device drivers and debugging tools
– Tool chain: compilers, debugging facilities and performance evaluation tools

Choices for processor architecture

• CISC - Complex Instruction Set Computer
– Many instructions which can perform involved operations: compact code
– Can be many clock cycles per instruction
– Large silicon area > Higher cost per die
• RISC - Reduced Instruction Set Computer
– More modern architecture
– One instruction executed per clock cycle > Very fast
• DSP - Digital Signal Processor
– Specialized type of uP
– Designed for real-time mathematical manipulation of data streams
  • Radar image processing, audio/voice processing, ultrasound and photographic image processing
– Includes instructions designed for multiplication and accumulation
Choice can be one of the three or a combination

IC technology

• The manner in which a digital (gate-level) implementation is mapped onto an IC
– IC: Integrated circuit, or "chip"
– IC technologies differ in their customization to a design
– ICs consist of numerous layers (perhaps 10 or more)
• IC technologies differ with respect to who builds each layer and when

[Figure: IC package, IC, and a transistor cross-section showing gate, oxide, source, channel, drain and silicon substrate]

IC technology

• Three types of IC technologies
– Full-custom/VLSI
– Semi-custom ASIC (gate array and standard cell)
– PLD (Programmable Logic Device)

Full-custom/VLSI

• All layers are optimized for an embedded system's particular digital implementation
– Placing transistors
– Sizing transistors
– Routing wires
• Benefits
– Excellent performance, small size, low power
• Drawbacks
– High NRE cost (e.g., $300k), long time-to-market

Semi-custom

• Lower layers are fully or partially built
– Designers are left with routing of wires and maybe placing some blocks
• Benefits
– Good performance, good size, less NRE cost than a full-custom implementation (perhaps $10k to $100k)
• Drawbacks
– Still require weeks to months to develop

PLD (Programmable Logic Device)

• All layers already exist
– Designers can purchase an IC
– Connections on the IC are either created or destroyed to implement desired functionality
– Field-Programmable Gate Array (FPGA) very popular
• Benefits
– Low NRE costs, almost instant IC availability
• Drawbacks
– Bigger, expensive (perhaps $30 per unit), power hungry, slower

Moore's law

• The most important trend in embedded systems
– Predicted in 1965 by Intel co-founder Gordon Moore
– IC transistor capacity has doubled roughly every 18 months for the past several decades

[Figure: logic transistors per chip (in millions), from 0.001 to 10,000, versus year. Note: logarithmic scale]

Moore's law

• Wow
– This growth rate is hard to imagine, most people underestimate
– How many ancestors do you have from 20 generations ago
  • i.e., roughly how many people alive in the 1500's did it take to make you?
  • 2^20 = more than 1 million people
– (This underestimation is the key to pyramid schemes!)

Graphical illustration of Moore's law

[Figure: timeline 1981, 1984, 1987, 1990, 1993, 1996, 1999, 2002. Leading edge chip in 1981: 10,000 transistors. Leading edge chip in 2002: 150,000,000 transistors]

• Something that doubles frequently grows more quickly than most people realize!
– A 2002 chip can hold about 15,000 1981 chips inside itself
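The doubling arithmetic behind these slides is easy to check. The sketch below assumes exact doubling every 18 months, which is only the rule of thumb stated above; the function name `transistors` is illustrative.

```python
def transistors(start_count, start_year, year, doubling_months=18):
    """Projected transistor count, assuming exact doubling every `doubling_months`."""
    doublings = (year - start_year) * 12 / doubling_months
    return start_count * 2 ** doublings


ancestors = 2 ** 20                               # 20 generations of doubling
projected_2002 = transistors(10_000, 1981, 2002)  # 21 years = 14 doublings
chips_inside = 150_000_000 // 10_000              # 1981 chips that fit in a 2002 chip

print(ancestors)
print(projected_2002)
print(chips_inside)
```

Fourteen doublings of 10,000 give about 164 million transistors, the same order as the 150,000,000 quoted for the 2002 chip.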

Design Technology

• The manner in which we convert our concept of desired system functionality into an implementation
• Compilation/Synthesis: Automates exploration and insertion of implementation details for lower level.
• Libraries/IP: Incorporates pre-designed implementation from lower abstraction level into higher level.
• Test/Verification: Ensures correct functionality at each level, thus reducing costly iterations between levels.

[Figure: abstraction levels from system specification through behavioral specification and RT specification to logic specification, and on to final implementation]

Level                      Compilation/Synthesis   Libraries/IP    Test/Verification
System specification       System synthesis        Hw/Sw/OS        Model simulat./checkers
Behavioral specification   Behavior synthesis      Cores           Hw-Sw cosimulators
RT specification           RT synthesis            RT components   HDL simulators
Logic specification        Logic synthesis         Gates/Cells     Gate simulators

Design productivity exponential increase

[Figure: productivity in (K) Trans./Staff-Mo. versus year, 1983 to 2009, on a logarithmic scale from 0.01 to 100,000]

• Exponential increase over the past few decades

The co-design ladder

• In the past:
– Hardware and software design technologies were very different
– Recent maturation of synthesis enables a unified view of hardware and software
• Hardware/software "codesign"

[Figure: the co-design ladder. Both sides start from sequential program code (e.g., C, VHDL). Software side: compilers (1960's, 1970's) produce assembly instructions; assemblers, linkers (1950's, 1960's) produce machine instructions; the implementation is a microprocessor plus program bits: "software". Hardware side: behavioral synthesis (1990's) produces register transfers; RT synthesis (1980's, 1990's) produces logic equations / FSM's; logic synthesis (1970's, 1980's) produces logic gates; the implementation is a VLSI, ASIC, or PLD implementation: "hardware".]

The choice of hardware versus software for a particular function is simply a tradeoff among various design metrics, like performance, power, size, NRE cost, and especially flexibility; there is no fundamental difference between what hardware or software can implement.

Independence of processor and IC technologies

• Basic tradeoff
– General vs. custom
– With respect to processor technology or IC technology
– The two technologies are independent

[Figure: processor technologies (general-purpose processor, ASIP, single-purpose processor) against IC technologies (PLD, semi-custom, full-custom). General, providing improved: flexibility, maintainability, NRE cost, time-to-prototype, time-to-market, cost (low volume). Customized, providing improved: power efficiency, performance, size, cost (high volume).]

Design productivity gap

• While designer productivity has grown at an impressive rate over the past decades, the rate of improvement has not kept pace with chip capacity

[Figure: IC capacity (logic transistors per chip, in millions) and productivity ((K) Trans./Staff-Mo.) versus year, both on logarithmic scales, with a widening gap between them]

Design productivity gap

• 1981 leading edge chip required 100 designer months
– 10,000 transistors / 100 transistors/month
• 2002 leading edge chip requires 30,000 designer months
– 150,000,000 / 5000 transistors/month
• Designer cost increase from $1M to $300M
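The designer-month figures above follow from dividing chip size by designer productivity. The sketch below repeats that arithmetic. The cost per designer-month is not stated in the slides; it is inferred here from the $1M figure for 100 designer months, so treat it as an assumption.

```python
def designer_months(transistor_count, productivity):
    """Designer months = transistors / (transistors per designer per month)."""
    return transistor_count / productivity


months_1981 = designer_months(10_000, 100)
months_2002 = designer_months(150_000_000, 5_000)

# Assumption: cost per designer-month inferred from $1M for the 1981 chip
cost_per_designer_month = 1_000_000 / months_1981
cost_2002 = months_2002 * cost_per_designer_month

print(months_1981, months_2002, cost_2002)
```

Productivity rose 50 times (100 to 5000 transistors per month) while chip capacity rose 15,000 times, so the effort grows by a factor of 300, which matches the jump from $1M to $300M.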

The mythical man-month

• The situation is even worse than the productivity gap indicates
• In theory, adding designers to team reduces project completion time
• In reality, productivity per designer decreases due to complexities of team management and communication
• In the software community, known as "the mythical man-month" (Brooks 1975)
• At some point, can actually lengthen project completion time! ("Too many cooks")
• 1M transistors, 1 designer = 5000 trans/month
• Each additional designer reduces each designer's productivity by 100 trans/month
• So 2 designers produce 4900 trans/month each

[Figure: team and individual productivity (trans/month, 0 to 60000) and months until completion versus number of designers (0 to 40). Months until completion: 43, 24, 19, 16, 15, 16, 18, 23]

Summary

• Embedded systems are everywhere
• Key challenge: optimization of design metrics
– Design metrics compete with one another
• A unified view of hardware and software is necessary to improve productivity
• Three key technologies
– Processor: general-purpose, application-specific, single-purpose
– IC: Full-custom, semi-custom, PLD
– Design: Compilation/synthesis, libraries/IP, test/verification
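The mythical man-month example earlier in this part (1M transistors, 5000 trans/month for one designer, 100 trans/month lost per additional designer) can be sketched as a small model. This is an illustration of the slide's linear assumption only; real teams will not follow it exactly, and the names below are made up.

```python
def per_designer_productivity(designers, base=5000, penalty=100):
    """Transistors per month per designer, after the team-size penalty."""
    return base - penalty * (designers - 1)


def months_until_completion(designers, size=1_000_000):
    """Project duration for a team of the given size."""
    team_productivity = designers * per_designer_productivity(designers)
    return size / team_productivity


for n in (5, 10, 15, 20, 25, 30, 35, 40):
    print(n, "designers:", round(months_until_completion(n)), "months")

# Team size that finishes soonest under this model
best = min(range(1, 41), key=months_until_completion)
print("best team size:", best)
```

Completion time falls to about 15 months at roughly 25 designers and then rises again, which is the "too many cooks" effect.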

Chapter 2: Custom single-purpose processors

Outline
• Introduction
• Combinational logic
• Sequential logic
• Custom single-purpose processor design
• RT-level custom single-purpose processor design
Introduction

• Processor
– Digital circuit that performs computation tasks
– Controller and datapath
– General-purpose: variety of computation tasks
– Single-purpose: one particular computation task
– Custom single-purpose: non-standard task
• A custom single-purpose processor may be
– Fast, small, low power
– But, high NRE, longer time-to-market, less flexible

[Figure: digital camera chip. Blocks: lens, CCD, A2D, CCD preprocessor, pixel coprocessor, D2A, JPEG codec, microcontroller, multiplier/accumulator, DMA controller, display ctrl, memory controller, ISA bus interface, UART, LCD ctrl]

CMOS transistor on silicon

• Transistor
– The basic electrical component in digital systems
– Acts as an on/off switch
– Voltage at "gate" controls whether current flows from source to drain
– Don't confuse this "gate" with a logic gate

[Figure: IC package, IC, and a transistor cross-section showing gate, oxide, source, channel, drain and silicon substrate; transistor symbol that conducts if gate=1]

CMOS transistor implementations

• Complementary Metal Oxide Semiconductor
• We refer to logic levels
– Typically 0 is 0V, 1 is 5V
• Two basic CMOS types
– nMOS conducts if gate=1
– pMOS conducts if gate=0
– Hence "complementary"
• Basic gates
– Inverter, NAND, NOR

[Figure: nMOS (conducts if gate=1) and pMOS (conducts if gate=0) symbols, and transistor-level circuits for the inverter (F = x'), the NAND gate (F = (xy)') and the NOR gate (F = (x+y)')]
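The nMOS/pMOS rule above (nMOS conducts if gate=1, pMOS conducts if gate=0) is enough to model the three basic gates at switch level. The sketch below assumes the standard CMOS topologies, which the slides only show as figures: pMOS transistors form the pull-up network to logic 1 and nMOS transistors form the pull-down network to logic 0. The function names are illustrative.

```python
def nmos(gate):
    """nMOS transistor conducts if gate = 1."""
    return gate == 1


def pmos(gate):
    """pMOS transistor conducts if gate = 0."""
    return gate == 0


def inverter(x):
    pull_up = pmos(x)                    # one pMOS between 1 and F
    pull_down = nmos(x)                  # one nMOS between F and 0
    assert pull_up != pull_down          # exactly one network conducts
    return 1 if pull_up else 0


def nand(x, y):
    pull_up = pmos(x) or pmos(y)         # two pMOS in parallel
    pull_down = nmos(x) and nmos(y)      # two nMOS in series
    assert pull_up != pull_down
    return 1 if pull_up else 0


def nor(x, y):
    pull_up = pmos(x) and pmos(y)        # two pMOS in series
    pull_down = nmos(x) or nmos(y)       # two nMOS in parallel
    assert pull_up != pull_down
    return 1 if pull_up else 0


for x in (0, 1):
    for y in (0, 1):
        print(x, y, "NAND:", nand(x, y), "NOR:", nor(x, y))
```

The internal assertions check the "complementary" property: for every input, exactly one of the two networks conducts, so the output is never floating or shorted.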

Basic logic gates

Gate       Function        VHDL
Driver     F = x           F <= x;
AND        F = x y         F <= x and y;
OR         F = x + y       F <= x or y;
XOR        F = x xor y     F <= x xor y;
Inverter   F = x’          F <= not x;
NAND       F = (x y)’      F <= x nand y;
NOR        F = (x+y)’      F <= x nor y;
XNOR       F = (x xor y)’  F <= x xnor y;

x y   AND  OR  XOR  NAND  NOR  XNOR
0 0    0    0   0    1     1    1
0 1    0    1   1    1     0    0
1 0    0    1   1    1     0    0
1 1    1    1   0    0     0    1

VHDL CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity AND2 is
  Port ( a : in STD_LOGIC;
         b : in STD_LOGIC;
         c : out STD_LOGIC);
end AND2;

architecture Behavioral of AND2 is
begin
  c <= a and b;
end Behavioral;

VERILOG
module and_2(
  input a,
  input b,
  output c
);
  // continuous assignment: c follows a AND b
  assign c = a & b;
endmodule

Combinational logic design

A) Problem description
y is 1 if a is 1, or b and c are 1. z is 1 if b or c is 1, but not both, or if all are 1.

B) Truth table
Inputs   Outputs
a b c    y z
0 0 0    0 0
0 0 1    0 1
0 1 0    0 1
0 1 1    1 0
1 0 0    1 0
1 0 1    1 1
1 1 0    1 1
1 1 1    1 1

C) Output equations
y = a'bc + ab'c' + ab'c + abc' + abc
z = a'b'c + a'bc' + ab'c + abc' + abc

D) Minimized output equations
y     bc: 00 01 11 10
a=0        0  0  1  0
a=1        1  1  1  1
y = a + bc

z     bc: 00 01 11 10
a=0        0  1  0  1
a=1        0  1  1  1
z = ab + b’c + bc’

E) Logic Gates
[Figure: gate-level circuit with inputs a, b, c and outputs y, z]

architecture Behavioral of prob is
begin
  y <= a or ( b and c );
  z <= ( a and b ) or ( b xor c );
end Behavioral;
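The minimization step above can be checked mechanically by comparing the sum-of-minterms equations with the minimized ones over all eight input rows. The following is a minimal sketch in C; the function names (`y_minterms`, `z_min`, `check_minimization`, and so on) are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Sum-of-minterms forms, read directly off the truth table. */
bool y_minterms(bool a, bool b, bool c) {
    return (!a && b && c) || (a && !b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}
bool z_minterms(bool a, bool b, bool c) {
    return (!a && !b && c) || (!a && b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

/* Minimized forms: y = a + bc, z = ab + b'c + bc'. */
bool y_min(bool a, bool b, bool c) { return a || (b && c); }
bool z_min(bool a, bool b, bool c) { return (a && b) || (!b && c) || (b && !c); }

/* Compares both forms on every input row; aborts on the first mismatch.
   Returns the number of rows checked. */
int check_minimization(void) {
    int rows = 0;
    for (int row = 0; row < 8; row++) {
        bool a = (row >> 2) & 1, b = (row >> 1) & 1, c = row & 1;
        assert(y_minterms(a, b, c) == y_min(a, b, c));
        assert(z_minterms(a, b, c) == z_min(a, b, c));
        rows++;
    }
    return rows;
}
```

Calling `check_minimization()` walks the truth table in the same a, b, c order as the slide; an exhaustive check like this is only practical because there are just three inputs.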

Combinational components

• n-bit, m x 1 Multiplexor (inputs I(m-1) … I1 I0, select S0 … S(log m), output O)
O = I0 if S=0..00
    I1 if S=0..01
    …
    I(m-1) if S=1..11
• log n x n Decoder (inputs I(log n -1) … I0, outputs O(n-1) … O1 O0)
O0 = 1 if I=0..00
O1 = 1 if I=0..01
…
O(n-1) = 1 if I=1..11
With enable input e: all O’s are 0 if e=0

Combinational components Cont…

• n-bit Adder (inputs A, B; outputs carry, sum)
sum = A+B (first n bits)
carry = (n+1)’th bit of A+B
With carry-in input Ci: sum = A + B + Ci
• n-bit Comparator (inputs A, B; outputs less, equal, greater)
less = 1 if A<B
equal = 1 if A=B
greater = 1 if A>B
• n-bit, m function ALU (inputs A, B, select S0 … S(log m); output O)
O = A op B, op determined by S.
May have status outputs carry, zero, etc.

architecture Behavioral of mux is
begin
  process(a,b,i0,i1,i2,i3)
    variable muxval:integer;
  begin
    -- build the select value from the two select bits
    muxval:=0;
    if(a='1')then
      muxval:=muxval+1;
    end if;
    if(b='1')then
      muxval:=muxval+2;
    end if;
    case muxval is
      when 0 => y<=i0;
      when 1 => y<=i1;
      when 2 => y<=i2;
      when 3 => y<=i3;
      when others => null;
    end case;
  end process;
end Behavioral;

architecture Behavioral of decoder is
begin
  -- active-low outputs: exactly one output is 0 for each input code
  y0<=not ((not a) and (not b));
  y1<=not ((not a) and b);
  y2<= not (a and (not b));
  y3<= not (a and b);
end Behavioral;

architecture Behavioral of adder4 is
begin
  -- ripple-carry chain of four full adders
  f1: entity work.fulad port map ( a(0), b(0), ci,    s(0), co(0) );
  f2: entity work.fulad port map ( a(1), b(1), co(0), s(1), co(1) );
  f3: entity work.fulad port map ( a(2), b(2), co(1), s(2), co(2) );
  f4: entity work.fulad port map ( a(3), b(3), co(2), s(3), co(3) );
end Behavioral;

architecture Behavioral of fulad is
begin
  sum <= a xor b xor c;
  carry <= (a and b) or (b and c) or (a and c);
end Behavioral;

architecture Behavioral of comp2 is
begin
  -- i(k) is 1 when bit k of a and b match
  i(0) <= a(0) xnor b(0);
  i(1) <= a(1) xnor b(1);

  eq <= i(0) and i(1);
  g <= ( a(1) and (not b(1)) ) or
       ( i(1) and a(0) and (not b(0)) );
  l <= ( (not a(1)) and b(1) ) or
       ( i(1) and (not a(0)) and b(0) );
end Behavioral;
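The adder equations above (sum = A + B + Ci, carry = the (n+1)'th bit) can be illustrated with a small software model of the same ripple-carry structure. This is a sketch under the assumption of a 4-bit adder built from four one-bit full adders, mirroring `adder4` and `fulad`; the C function names are hypothetical.

```c
#include <assert.h>

/* One-bit full adder: sum = a xor b xor cin, carry = ab + b*cin + a*cin. */
static void full_adder(unsigned a, unsigned b, unsigned cin,
                       unsigned *sum, unsigned *cout) {
    *sum  = a ^ b ^ cin;
    *cout = (a & b) | (b & cin) | (a & cin);
}

/* 4-bit ripple-carry adder: returns the 4-bit sum and writes the carry-out.
   The carry out of each stage feeds the carry-in of the next stage. */
unsigned adder4(unsigned a, unsigned b, unsigned cin, unsigned *carry_out) {
    assert(a < 16 && b < 16 && cin < 2);
    unsigned sum = 0, carry = cin;
    for (int i = 0; i < 4; i++) {
        unsigned s;
        full_adder((a >> i) & 1u, (b >> i) & 1u, carry, &s, &carry);
        sum |= s << i;
    }
    *carry_out = carry;
    return sum;
}
```

For example, adding 9 and 8 with no carry-in gives the 4-bit sum 1 with carry 1, that is, 17.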

Sequential components

• n-bit Register (inputs I, load, clear; output Q)
Q = 0 if clear=1,
    I if load=1 and clock=1,
    Q(previous) otherwise.
• n-bit Shift register (inputs I, shift; output Q)
- Content shifted
- I stored in msb
- Q = lsb
• n-bit Counter (inputs count, clear; output Q)
Q = 0 if clear=1,
    Q(prev)+1 if count=1 and clock=1.
[Figure: D flip-flop and JK flip-flop symbols with CK and CLR inputs]

// 4-bit register built from D flip-flops
module reg4 (input clk, input clr,
             input [3:0] i,
             output [3:0] q);
  dff df0(clk,clr,i[0],q[0]);
  dff df1(clk,clr,i[1],q[1]);
  dff df2(clk,clr,i[2],q[2]);
  dff df3(clk,clr,i[3],q[3]);
endmodule

// 4-bit shift register: i enters at the msb, q[0] is the lsb
module sreg (input clk, input clr,
             input i,
             output [3:0] q);
  dff df3(clk,clr,i,q[3]);
  dff df2(clk,clr,q[3],q[2]);
  dff df1(clk,clr,q[2],q[1]);
  dff df0(clk,clr,q[1],q[0]);
endmodule

// 4-bit ripple counter built from toggle flip-flops
module count4 (input clk, input clr,
               output [3:0] q);
  tff tf0(clk,clr,q[0]);
  tff tf1(q[0],clr,q[1]);
  tff tf2(q[1],clr,q[2]);
  tff tf3(q[2],clr,q[3]);
endmodule

module dff( input clk, input clr, input d,
            output reg q);
  always @(negedge clk or negedge clr)
    if (!clr)
      q <= 0;
    else
      q <= d;
endmodule

module tff( input clk, input clr, output reg q);
  always @(negedge clk or negedge clr)
    if (!clr)
      q <= 0;
    else
      q <= !q;
endmodule

JK FLIP FLOP
// module header is not shown on the slide; the name and port list are assumed
module jkff (input clk, input clr, input j, input k,
             output reg q, output reg qc);
  always @ ( negedge clk or negedge clr ) begin
    if (! clr) begin
      q <= 0;
      qc <= 1;
    end
    else begin
      q <= (j && qc) || (~k && q);
      qc <= ~((j && qc) || (~k && q));
    end
  end
endmodule

Sequential logic design

A) Problem Description
You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles.

B) State Diagram
[Figure: four states 0, 1, 2, 3. a=1 advances to the next state (3 wraps to 0), a=0 holds the current state. x=0 in states 0, 1, 2 and x=1 in state 3]

C) Implementation Model
[Figure: combinational logic with input a and output x produces next-state signals I1 I0 for the state register; the register outputs Q1 Q0 feed back into the combinational logic]
• Given this implementation model
– Sequential logic design quickly reduces to combinational logic design

D) State Table (Moore-type)
Inputs     Outputs
Q1 Q0 a    I1 I0  x
0  0  0    0  0   0
0  0  1    0  1   0
0  1  0    0  1   0
0  1  1    1  0   0
1  0  0    1  0   0
1  0  1    1  1   0
1  1  0    1  1   1
1  1  1    0  0   1

Sequential logic design (cont.)

E) Minimized Output Equations
I1    Q1Q0: 00 01 11 10
a=0          0  0  1  1
a=1          0  1  0  1
I1 = Q1’Q0a + Q1a’ + Q1Q0’

I0    Q1Q0: 00 01 11 10
a=0          0  1  1  0
a=1          1  0  0  1
I0 = Q0a’ + Q0’a

x     Q1Q0: 00 01 11 10
a=0          0  0  1  0
a=1          0  0  1  0
x = Q1Q0

F) Combinational Logic
[Figure: gate-level circuit with inputs a, Q1, Q0 and outputs x, I1, I0]
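The minimized equations for I1, I0 and x can be exercised with a short software model of the state register and its combinational logic. The sketch below assumes the reset state is Q1Q0 = 00 and that x is sampled once per clock before the state updates; the type and function names are hypothetical.

```c
#include <assert.h>

typedef struct { unsigned q1, q0; } DividerState;

/* Moore output: x = Q1Q0. */
unsigned divider_output(DividerState s) { return s.q1 & s.q0; }

/* Next-state logic:
   I1 = Q1'Q0a + Q1a' + Q1Q0'
   I0 = Q0a' + Q0'a */
DividerState divider_next(DividerState s, unsigned a) {
    assert(a < 2 && s.q1 < 2 && s.q0 < 2);
    DividerState n;
    n.q1 = (!s.q1 & s.q0 & a) | (s.q1 & !a) | (s.q1 & !s.q0);
    n.q0 = (s.q0 & !a) | (!s.q0 & a);
    return n;
}

/* Clocks the divider `cycles` times with a constant input a and
   counts how many cycles had x = 1. */
int count_pulses(int cycles, unsigned a) {
    DividerState s = { 0, 0 };
    int pulses = 0;
    for (int i = 0; i < cycles; i++) {
        pulses += (int)divider_output(s);
        s = divider_next(s, a);
    }
    return pulses;
}
```

With a held at 1, the model visits states 0, 1, 2, 3 and repeats, so `count_pulses(8, 1)` sees x = 1 in two of the eight cycles; with a held at 0 the state never leaves 0 and no pulse is produced.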

Custom single-purpose processor basic model

[Figure: controller and datapath. The controller receives external control inputs and datapath control inputs, and produces external control outputs and datapath control outputs. The datapath receives external data inputs and produces external data outputs]
[Figure: a view inside the controller and datapath. Controller: next-state and control logic plus a state register. Datapath: registers and functional units]

Example: greatest common divisor

• First create algorithm
• Convert algorithm to “complex” state machine
– Known as FSMD: finite-state machine with datapath
– Can use templates to perform such conversion

(a) black-box view
[Figure: GCD block with inputs go_i, x_i, y_i and output d_o]

(b) desired functionality
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }

State diagram templates

• Assignment statement
a = b
next statement
Template: a state performing a = b, followed by the state of the next statement.

• Loop statement
while (cond) {
  loop-body-statements
}
next statement
Template: condition state C. If cond, go to loop-body-statements, then to join state J, then back to C. If !cond, go to the next statement.

• Branch statement
if (c1)
  c1 stmts
else if c2
  c2 stmts
else
  other stmts
next statement
Template: condition state C. On c1 go to c1 stmts, on !c1*c2 go to c2 stmts, on !c1*!c2 go to others. All three branches meet at join state J, then go to the next statement.

(c) state diagram
1:     if 1 go to 2; if !1 go to 1-J
2:     if !go_i go to 2-J; if !(!go_i) go to 3
2-J:   go to 2
3:     x = x_i; go to 4
4:     y = y_i; go to 5
5:     if x!=y go to 6; if !(x!=y) go to 9
6:     if x<y go to 7; if !(x<y) go to 8
7:     y = y - x; go to 6-J
8:     x = x - y; go to 6-J
6-J:   go to 5-J
5-J:   go to 5
9:     d_o = x; go to 1-J
1-J:   go to 1

Creating the Datapath

• Create a register for any declared variable
• Create a functional unit for each arithmetic operation
• Connect the ports, registers and functional units
– Based on reads and writes
– Use multiplexors for multiple sources
• Create unique identifier
– for each datapath component control input and output
[Figure: datapath. x_i and y_i feed two n-bit 2x1 multiplexors (selects x_sel, y_sel) in front of registers x and y (loads x_ld, y_ld). Comparators != (5: x!=y) and < (6: x<y) produce x_neq_y and x_lt_y. Two subtractors compute 8: x-y and 7: y-x and feed the second multiplexor inputs. Register d (load d_ld) holds 9: d and drives d_o]

Creating the controller’s FSM

• FSM is same structure as FSMD
• Replace complex actions/conditions with datapath configurations

State  FSMD label  Datapath configuration
0000   1:
0001   2:          (branch on go_i)
0010   2-J:
0011   3:          x_sel = 0, x_ld = 1
0100   4:          y_sel = 0, y_ld = 1
0101   5:          (branch on x_neq_y)
0110   6:          (branch on x_lt_y)
0111   7:          y_sel = 1, y_ld = 1
1000   8:          x_sel = 1, x_ld = 1
1001   6-J:
1010   5-J:
1011   9:          d_ld = 1
1100   1-J:

Splitting into a controller and datapath

[Figure: (a) Controller FSM with states 0000 to 1100 as listed for the controller’s FSM; (b) Datapath with multiplexors, registers x, y, d, comparators and subtractors]

Controller implementation model
[Figure: combinational logic with inputs go_i, x_neq_y, x_lt_y and state Q3 Q2 Q1 Q0; outputs x_sel, y_sel, x_ld, y_ld, d_ld and next state I3 I2 I1 I0 into the state register]

Controller state table for the GCD example

Inputs                                  Outputs
Q3 Q2 Q1 Q0  x_neq_y  x_lt_y  go_i      I3 I2 I1 I0  x_sel  y_sel  x_ld  y_ld  d_ld
0  0  0  0   *        *       *         0  0  0  1   X      X      0     0     0
0  0  0  1   *        *       0         0  0  1  0   X      X      0     0     0
0  0  0  1   *        *       1         0  0  1  1   X      X      0     0     0
0  0  1  0   *        *       *         0  0  0  1   X      X      0     0     0
0  0  1  1   *        *       *         0  1  0  0   0      X      1     0     0
0  1  0  0   *        *       *         0  1  0  1   X      0      0     1     0
0  1  0  1   0        *       *         1  0  1  1   X      X      0     0     0
0  1  0  1   1        *       *         0  1  1  0   X      X      0     0     0
0  1  1  0   *        0       *         1  0  0  0   X      X      0     0     0
0  1  1  0   *        1       *         0  1  1  1   X      X      0     0     0
0  1  1  1   *        *       *         1  0  0  1   X      1      0     1     0
1  0  0  0   *        *       *         1  0  0  1   1      X      1     0     0
1  0  0  1   *        *       *         1  0  1  0   X      X      0     0     0
1  0  1  0   *        *       *         0  1  0  1   X      X      0     0     0
1  0  1  1   *        *       *         1  1  0  0   X      X      0     0     1
1  1  0  0   *        *       *         0  0  0  0   X      X      0     0     0
1  1  0  1   *        *       *         0  0  0  0   X      X      0     0     0
1  1  1  0   *        *       *         0  0  0  0   X      X      0     0     0
1  1  1  1   *        *       *         0  0  0  0   X      X      0     0     0
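The controller/datapath split can be illustrated with a small software model in which the controller only drives x_sel, y_sel, x_ld, y_ld and d_ld, and the datapath only reacts to them. This is a sketch under several assumptions: go_i is held at 1 (so state 0010 is never entered), inputs are positive, and the run stops after one result when state 1100 is reached. The struct and function names are hypothetical.

```c
#include <assert.h>

typedef struct { unsigned x, y, d; } Datapath;              /* registers x, y, d */
typedef struct { int x_sel, y_sel, x_ld, y_ld, d_ld; } Control;

/* One clock edge of the datapath: muxes choose the register inputs,
   load signals decide which registers actually update. */
static void datapath_clock(Datapath *dp, Control c, unsigned x_i, unsigned y_i) {
    unsigned x_next = c.x_sel ? dp->x - dp->y : x_i;   /* x_sel=1: subtractor x-y */
    unsigned y_next = c.y_sel ? dp->y - dp->x : y_i;   /* y_sel=1: subtractor y-x */
    if (c.d_ld) dp->d = dp->x;
    if (c.x_ld) dp->x = x_next;
    if (c.y_ld) dp->y = y_next;
}

/* Steps the controller FSM using the 4-bit state codes from the state table. */
unsigned gcd_fsmd(unsigned x_i, unsigned y_i) {
    assert(x_i > 0 && y_i > 0);
    Datapath dp = { 0, 0, 0 };
    unsigned state = 0x0;
    for (;;) {
        Control c = { 0, 0, 0, 0, 0 };
        unsigned next = 0x0;
        int x_neq_y = dp.x != dp.y;        /* status signals from the datapath */
        int x_lt_y  = dp.x <  dp.y;
        switch (state) {
        case 0x0: next = 0x1; break;                              /* 1:   */
        case 0x1: next = 0x3; break;                              /* 2: go_i = 1 */
        case 0x3: c.x_sel = 0; c.x_ld = 1; next = 0x4; break;     /* 3:   */
        case 0x4: c.y_sel = 0; c.y_ld = 1; next = 0x5; break;     /* 4:   */
        case 0x5: next = x_neq_y ? 0x6 : 0xB; break;              /* 5:   */
        case 0x6: next = x_lt_y ? 0x7 : 0x8; break;               /* 6:   */
        case 0x7: c.y_sel = 1; c.y_ld = 1; next = 0x9; break;     /* 7:   */
        case 0x8: c.x_sel = 1; c.x_ld = 1; next = 0x9; break;     /* 8:   */
        case 0x9: next = 0xA; break;                              /* 6-J: */
        case 0xA: next = 0x5; break;                              /* 5-J: */
        case 0xB: c.d_ld = 1; next = 0xC; break;                  /* 9:   */
        default:  return dp.d;                                    /* 1-J: done */
        }
        datapath_clock(&dp, c, x_i, y_i);
        state = next;
    }
}
```

Keeping the control signals in their own struct makes it easy to compare each `case` with the corresponding row of the state table.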

Optimizing single-purpose processors

• Optimization is the task of making design metric values the best possible
• Optimization opportunities
– original program
– FSMD
– datapath
– FSM

Optimizing the original program

• Analyze program attributes and look for areas of possible improvement
– number of computations
– size of variable
– time and space complexity
– operations used
• multiplication and division very expensive

Optimizing the original program (cont’)

original program
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }

GCD(42, 8) - 9 iterations to complete the loop
x and y values evaluated as follows: (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2).

Replace the subtraction operation(s) with modulo operation in order to speed up program.

optimized program
0: int x, y, r;
1: while (1) {
2:   while (!go_i);
     // x must be the larger number
3:   if (x_i >= y_i) {
4:     x=x_i;
5:     y=y_i;
     }
6:   else {
7:     x=y_i;
8:     y=x_i;
     }
9:   while (y != 0) {
10:    r = x % y;
11:    x = y;
12:    y = r;
     }
13:  d_o = x;
   }

GCD(42, 8) - 3 iterations to complete the loop
x and y values evaluated as follows: (42, 8), (8, 2), (2, 0)

Optimizing the original program (cont’)

• The second algorithm is far more efficient in terms of running time
• Efficient algorithm design is a widely researched area
• The choice of algorithm has the biggest impact on the efficiency
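The difference between the two programs can be measured directly by counting how often each loop body runs. The sketch below is an illustration only; the function names and the `steps` out-parameter are hypothetical. It counts loop-body executions, which is one fewer than the number of (x, y) pairs listed in the traces, since the last pair is the one that fails the loop test.

```c
#include <assert.h>

/* Original program: repeated subtraction until x == y. */
unsigned gcd_subtract(unsigned x, unsigned y, unsigned *steps) {
    assert(x > 0 && y > 0);          /* the loop does not terminate on zero */
    *steps = 0;
    while (x != y) {
        if (x < y) y = y - x;
        else       x = x - y;
        (*steps)++;
    }
    return x;
}

/* Optimized program: remainder instead of repeated subtraction. */
unsigned gcd_modulo(unsigned x_i, unsigned y_i, unsigned *steps) {
    /* x must be the larger number */
    unsigned x = (x_i >= y_i) ? x_i : y_i;
    unsigned y = (x_i >= y_i) ? y_i : x_i;
    *steps = 0;
    while (y != 0) {
        unsigned r = x % y;
        x = y;
        y = r;
        (*steps)++;
    }
    return x;
}
```

For GCD(42, 8) the subtraction loop body runs 8 times and the modulo loop body runs 2 times; both return 2. The gap widens as the ratio between the inputs grows, although each modulo operation is itself costlier in hardware than a subtraction.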

Optimizing the FSMD

• Areas of possible improvements
– merge states
• states with constants on transitions can be eliminated, transition taken is already known
• states with independent operations can be merged
– separate states
• states which require complex operations (a*b*c*d) can be broken into smaller states to reduce hardware size
– scheduling

Optimizing the FSMD (cont.)

Starting from the original FSMD (states 1, 2, 2-J, 3, 4, 5, 6, 7, 8, 6-J, 5-J, 9, 1-J):
• eliminate state 1 – outgoing transitions have constant values
• merge state 2 and state 2-J – no loop operation in between them
• merge state 3 and state 4 – assignment operations are independent of one another
• merge state 5 and state 6 – transitions from state 6 can be done in state 5
• eliminate state 5-J and 6-J – transitions from each state can be done from state 7 and state 8, respectively
• eliminate state 1-J – transition from state 1-J can be done directly from state 9

optimized FSMD
int x, y;
2:  if !go_i stay in 2; if go_i go to 3
3:  x = x_i; y = y_i; go to 5
5:  if x<y go to 7; if x>y go to 8; otherwise go to 9
7:  y = y - x; go to 5
8:  x = x - y; go to 5
9:  d_o = x; go to 2

Optimizing the datapath

• Sharing of functional units
– one-to-one mapping, as done previously, is not necessary
– if same operation occurs in different states, they can share a single functional unit
• Multi-functional units
– An ALU supports a variety of operations, so it can be shared among operations occurring in different states

Optimizing the FSM

• State encoding
– task of assigning a unique bit pattern to each state in an FSM
– size of state register and combinational logic vary
– can be treated as an ordering problem
• State minimization
– task of merging equivalent states into a single state
• two states are equivalent if, for all possible input combinations, they generate the same outputs and transition to the same next state

RT-level custom single-purpose processor design

• We often start with a state machine
– Rather than algorithm
– Cycle timing often too central to functionality
• Example
– Bus bridge that converts 4-bit bus to 8-bit bus
– Start with FSMD
– Known as register-transfer (RT) level
– Exercise: complete the design

Problem Specification
[Figure: Sender, Bridge, Receiver. Signals into the bridge: rdy_in, data_in(4), clock. Signals out of the bridge: rdy_out, data_out(8)]
A single-purpose processor that converts two 4-bit inputs, arriving one at a time over data_in along with a rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse.

FSMD
Inputs      rdy_in: bit; data_in: bit[4];
Outputs     rdy_out: bit; data_out: bit[8]
Variables   data_lo, data_hi: bit[4];

WaitFirst4:       if rdy_in=0 stay; if rdy_in=1 go to RecFirst4Start
RecFirst4Start:   data_lo=data_in; go to RecFirst4End
RecFirst4End:     if rdy_in=1 stay; if rdy_in=0 go to WaitSecond4
WaitSecond4:      if rdy_in=0 stay; if rdy_in=1 go to RecSecond4Start
RecSecond4Start:  data_hi=data_in; go to RecSecond4End
RecSecond4End:    if rdy_in=1 stay; if rdy_in=0 go to Send8Start
Send8Start:       data_out=data_hi & data_lo; rdy_out=1; go to Send8End
Send8End:         rdy_out=0; go to WaitFirst4

RT-level custom single-purpose processor design (cont’)

(a) Controller
Same states and transitions as the FSMD, with the register transfers replaced by control signals:
RecFirst4Start:   data_lo_ld=1
RecSecond4Start:  data_hi_ld=1
Send8Start:       data_out_ld=1; rdy_out=1
Send8End:         rdy_out=0

(b) Datapath
[Figure: data_in(4) feeds registers data_hi and data_lo (loads data_hi_ld, data_lo_ld); their outputs together feed the 8-bit data_out register (load data_out_ld); clk goes to all registers]

Summary

• Custom single-purpose processors
– Straightforward design techniques
– Can be built to execute algorithms
– Typically start with FSMD
– CAD tools can be of great assistance

Chapter 3 General-Purpose Processors: Software

Introduction

• General-Purpose Processor
– Processor designed for a variety of computation tasks
– Low unit cost, in part because manufacturer spreads NRE over large numbers of units
• Motorola sold half a billion 68HC05 microcontrollers in 1996 alone
– Carefully designed since higher NRE is acceptable
• Can yield good performance, size and power
– Low NRE cost, short time-to-market/prototype, high flexibility
• User just writes software; no processor design
– a.k.a. “microprocessor” – “micro” used when they were implemented on one or a few chips rather than entire rooms

Basic Architecture

• Control unit and datapath
– Circuitry for transforming data
– Note similarity to single-purpose processor
• Key differences
– Datapath is general
– Control unit doesn’t store the algorithm – the algorithm is “programmed” into the memory
[Figure: processor containing a control unit (controller, PC, IR) and a datapath (ALU, registers) linked by control/status signals, connected to I/O and memory]

Datapath Operations

• Load
– Read memory location into register
• ALU operation
– Input certain registers through ALU, store back in register
• Store
– Write register to memory location
[Figure: datapath with the ALU performing +1 on a register holding 10 to produce 11; memory locations 10 and 11 shown]

Control Unit

• Control unit: configures the datapath operations
– Sequence of desired operations (“instructions”) stored in memory – “program”
• Instruction cycle – broken into several sub-operations, each one clock cycle, e.g.:
– Fetch: Get next instruction into IR
– Decode: Determine what the instruction means
– Fetch operands: Move data from memory to datapath register
– Execute: Move data through the ALU
– Store results: Write data from register to memory
[Figure: processor with PC, IR, R0, R1; memory holds the program
100 load R0, M[500]
101 inc R1, R0
102 store M[501], R1
and the data value 10 at location 500]

Control Unit Sub-Operations

• Fetch
– Get next instruction into IR
– PC: program counter, always points to next instruction
– IR: holds the fetched instruction
• Decode
– Determine what the instruction means
• Fetch operands
– Move data from memory to datapath register
• Execute
– Move data through the ALU
– This particular instruction does nothing during this sub-operation
• Store results
– Write data from register to memory
– This particular instruction does nothing during this sub-operation
[Figure: processor with PC = 100 and IR = load R0, M[500]. Memory holds the program
100 load R0, M[500]
101 inc R1, R0
102 store M[501], R1
and the value 10 at location 500. After the operand fetch, R0 = 10]

Instruction Cycles

Each instruction takes one pass through Fetch, Decode, Fetch ops, Exec., Store results, one sub-operation per clock cycle.

PC=100   load R0, M[500]    R0 = 10
PC=101   inc R1, R0         R1 = 11 (ALU performs +1)
PC=102   store M[501], R1   M[501] = 11
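The instruction cycle traced above can be mimicked in software. The following is a minimal sketch, not a model of any real processor: the `Instr` struct, the opcode enum and the 1024-word memory size are assumptions made for the illustration, and the five sub-operations are collapsed into one function call per instruction.

```c
#include <assert.h>

enum Opcode { OP_LOAD, OP_INC, OP_STORE };        /* hypothetical encoding */

typedef struct { enum Opcode op; int rd, rs, addr; } Instr;

typedef struct {
    int pc;          /* address of the next instruction */
    Instr ir;        /* instruction register */
    int reg[2];      /* R0 and R1 */
    int mem[1024];   /* data memory */
} Cpu;

/* One instruction cycle; program[0] is assumed to sit at address `base`. */
void instruction_cycle(Cpu *cpu, const Instr *program, int base) {
    cpu->ir = program[cpu->pc - base];               /* fetch into IR        */
    cpu->pc += 1;                                    /* PC points to next    */
    const Instr *i = &cpu->ir;                       /* decode               */
    assert(i->addr >= 0 && i->addr < 1024);
    switch (i->op) {
    case OP_LOAD:                                    /* fetch operands       */
        cpu->reg[i->rd] = cpu->mem[i->addr];
        break;
    case OP_INC:                                     /* execute in the ALU   */
        cpu->reg[i->rd] = cpu->reg[i->rs] + 1;
        break;
    case OP_STORE:                                   /* store results        */
        cpu->mem[i->addr] = cpu->reg[i->rs];
        break;
    }
}

/* Runs the three-instruction program from the slides and returns M[501]. */
int run_slide_program(void) {
    const Instr program[] = {
        { OP_LOAD,  0, 0, 500 },   /* 100: load  R0, M[500] */
        { OP_INC,   1, 0, 0   },   /* 101: inc   R1, R0     */
        { OP_STORE, 0, 1, 501 },   /* 102: store M[501], R1 */
    };
    Cpu cpu = { .pc = 100 };       /* remaining fields start at zero */
    cpu.mem[500] = 10;
    while (cpu.pc < 103)
        instruction_cycle(&cpu, program, 100);
    return cpu.mem[501];
}
```

With M[500] = 10, the program leaves 11 in M[501], matching the final state in the slides.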

Architectural Considerations

• N-bit processor
– N-bit ALU, registers, buses, memory data interface
– Embedded: 8-bit, 16-bit, 32-bit common
– Desktop/servers: 32-bit, even 64
• PC size determines address space

Architectural Considerations

• Clock frequency
– Inverse of clock period
– Clock period must be longer than longest register to register delay in entire processor
– Memory access is often the longest

Two Memory Architectures

• Princeton
– Fewer memory wires
• Harvard
– Simultaneous program and data memory access
[Figure: Harvard: processor connected to separate program memory and data memory. Princeton: processor connected to a single memory (program and data)]

Cache Memory

• Memory access may be slow
• Cache is small but fast memory close to processor
– Holds copy of part of memory
– Hits and misses
[Figure: processor and cache in fast/expensive technology, usually on the same chip; memory in slower/cheaper technology, usually on a different chip]

Pipelining: Increasing Instruction Throughput

[Figure: non-pipelined vs pipelined dish cleaning. Non-pipelined: dishes 1 to 8 are each washed and then dried before the next dish starts. Pipelined: dish n is dried while dish n+1 is washed]
[Figure: pipelined instruction execution. Instructions 1 to 8 move through the stages Fetch-instr., Decode, Fetch ops., Execute, Store res., each instruction one stage behind the previous one]

Superscalar and VLIW Architectures

• Performance can be improved by:
– Faster clock (but there’s a limit)
– Pipelining: slice up instruction into stages, overlap stages
– Multiple ALUs to support more than one instruction stream
• Superscalar
– Scalar: non-vector operations
– Fetches instructions in batches, executes as many as possible
» May require extensive hardware to detect independent instructions
– VLIW: each word in memory has multiple independent instructions
» Relies on the compiler to detect and schedule instructions
» Currently growing in popularity
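The benefit of overlapping stages can be put into numbers with a simple timing model. The sketch below assumes an idealized pipeline: every stage takes exactly one time unit, a new item enters every time unit, and there are no stalls. The function names are hypothetical.

```c
#include <assert.h>

/* Without pipelining, each item passes through all stages before the
   next item starts. */
unsigned nonpipelined_time(unsigned items, unsigned stages) {
    assert(stages > 0);
    return items * stages;
}

/* With pipelining, the first item needs `stages` time units and every
   further item completes one time unit later. */
unsigned pipelined_time(unsigned items, unsigned stages) {
    assert(stages > 0);
    return items == 0 ? 0 : stages + items - 1;
}
```

For the dish example (8 dishes, 2 stages) this gives 16 time units without pipelining and 9 with it; for 8 instructions in the 5-stage instruction pipeline it gives 40 versus 12.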

Programmer’s View

• Programmer writes the program instructions that carry out the desired functionality
• Programmer doesn’t need detailed understanding of architecture, may need to know architectural abstraction
– Instead, needs to know what instructions can be executed
• Two levels of instructions:
– Assembly level
– Structured languages (C, C++, Java, etc.)
• Most development today done using structured languages
– But, some assembly level programming may still be necessary
– Drivers: portion of program that communicates with and/or controls (drives) another device
• Often have detailed timing considerations, extensive bit manipulation
• Assembly level may be best for these

Assembly-Level Instructions

Instruction 1   opcode   operand1   operand2
Instruction 2   opcode   operand1   operand2
Instruction 3   opcode   operand1   operand2
Instruction 4   opcode   operand1   operand2
...

• Instruction Set
– Defines the legal set of instructions for that processor
• Data transfer: memory/register, register/register, I/O, etc.
• Arithmetic/logical: move register through ALU and back
• Branches: determine next PC value when not just PC+1

Addressing Modes

Addressing mode     Operand field      Register-file contents   Memory contents
Immediate           Data
Register-direct     Register address   Data
Register indirect   Register address   Memory address           Data
Direct              Memory address                              Data
Indirect            Memory address                              Memory address, then Data

A Simple (Trivial) Instruction Set

Each instruction is two bytes: the opcode and Rn in the first byte, the remaining operand in the second byte.

Assembly instruct.   First byte   Second byte   Operation
MOV Rn, direct       0000 Rn      direct        Rn = M(direct)
MOV direct, Rn       0001 Rn      direct        M(direct) = Rn
MOV @Rn, Rm          0010 Rn      Rm            M(Rn) = Rm
MOV Rn, #immed.      0011 Rn      immediate     Rn = immediate
ADD Rn, Rm           0100 Rn      Rm            Rn = Rn + Rm
SUB Rn, Rm           0101 Rn      Rm            Rn = Rn - Rm
JZ Rn, relative      0110 Rn      relative      PC = PC + relative (only if Rn is 0)
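The two-byte layout in the instruction set table can be made concrete with a small encoder. This is a sketch under one assumption taken from the rest of the deck: a register operand in the second byte (Rm) occupies the upper four bits, as in the instruction set simulator's `sb >> 4`. The enum constants and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Opcodes from the instruction set table (upper nibble of the first byte). */
enum {
    OP_MOV_LOAD  = 0x0,   /* MOV Rn, direct   */
    OP_MOV_STORE = 0x1,   /* MOV direct, Rn   */
    OP_MOV_IND   = 0x2,   /* MOV @Rn, Rm      */
    OP_MOV_IMM   = 0x3,   /* MOV Rn, #immed.  */
    OP_ADD       = 0x4,   /* ADD Rn, Rm       */
    OP_SUB       = 0x5,   /* SUB Rn, Rm       */
    OP_JZ        = 0x6    /* JZ Rn, relative  */
};

typedef struct { uint8_t first_byte, second_byte; } Instruction;

/* Formats whose second byte is a plain 8-bit field:
   direct address, immediate value, or relative offset. */
Instruction encode_reg_byte(unsigned opcode, unsigned rn, uint8_t field) {
    assert(opcode <= OP_JZ && rn < 16);
    Instruction ins = { (uint8_t)((opcode << 4) | rn), field };
    return ins;
}

/* Formats with two registers: Rm goes in the upper nibble of the second byte. */
Instruction encode_reg_reg(unsigned opcode, unsigned rn, unsigned rm) {
    assert(rm < 16);
    return encode_reg_byte(opcode, rn, (uint8_t)(rm << 4));
}
```

For instance, MOV R1, #10 encodes as the bytes 0x31 0x0A, and ADD R0, R1 as 0x40 0x10.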

Sample Programs

C program
int total = 0;
for (int i=10; i!=0; i--)
  total += i;
// next instructions...

Equivalent assembly program
0      MOV R0, #0;     // total = 0
1      MOV R1, #10;    // i = 10
2      MOV R2, #1;     // constant 1
3      MOV R3, #0;     // constant 0
Loop:  JZ R1, Next;    // Done if i=0
5      ADD R0, R1;     // total += i
6      SUB R1, R2;     // i--
7      JZ R3, Loop;    // Jump always
Next:  // next instructions...

• Try some others
– Handshake: Wait until the value of M[254] is not 0, set M[255] to 1, wait until M[254] is 0, set M[255] to 0 (assume those locations are ports).
– (Harder) Count the occurrences of zero in an array stored in memory locations 100 through 199.

Programmer Considerations

• Program and data memory space
– Embedded processors often very limited
• e.g., 64 Kbytes program, 256 bytes of RAM (expandable)
• Registers: How many are there?
– Only a direct concern for assembly-level programmers
• I/O
– How to communicate with external signals?
• Interrupts

Microprocessor Architecture Overview

• If you are using a particular microprocessor, now is a good time to review its architecture

Example: parallel port driver

LPT Connection Pin   I/O Direction   Register Address
1                    Output          0th bit of register #2
2-9                  Output          0th-7th bit of register #0
10,11,12,13,15       Input           6,7,5,4,3th bit of register #1
14,16,17             Output          1,2,3th bit of register #2

[Figure: PC parallel port with a switch connected to pin 13 and an LED connected to pin 2]

• Using assembly language programming we can configure a PC parallel port to perform digital I/O
– write and read to three special registers to accomplish this; the table provides the list of parallel port connector pins and corresponding register locations
– Example: parallel port monitors the input switch and turns the LED on/off accordingly
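The switch-to-LED logic described above comes down to two bit masks. The sketch below shows only that masking logic on ordinary variables rather than real port I/O; the register layout (switch on bit 4 of register #1, LED on bit 0 of register #0) follows the pin table, while the function name and the pointer-based interface are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SWITCH_MASK 0x10u   /* bit 4 of register #1 (pin 13, switch) */
#define LED_MASK    0x01u   /* bit 0 of register #0 (pin 2, LED)     */

/* Reads the switch bit from the status register and sets or clears the
   LED bit in the data register, leaving all other data bits untouched. */
void check_port(const volatile uint8_t *status_reg, volatile uint8_t *data_reg) {
    assert(status_reg != 0 && data_reg != 0);
    if (*status_reg & SWITCH_MASK)
        *data_reg = (uint8_t)(*data_reg | LED_MASK);             /* LED on  */
    else
        *data_reg = (uint8_t)(*data_reg & (uint8_t)~LED_MASK);   /* LED off */
}
```

Reading the register, modifying a single bit and writing the result back is what keeps the other seven output pins unchanged.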

Parallel Port Example

; This program consists of a sub-routine that reads
; the state of the input pin, determining the on/off state
; of our switch and asserts the output pin, turning the LED
; on/off accordingly

        .386

CheckPort proc
        push ax              ; save the content
        push dx              ; save the content
        mov  dx, 3BCh + 1    ; base + 1 for register #1
        in   al, dx          ; read register #1
        and  al, 10h         ; mask out all but bit # 4
        cmp  al, 0           ; is it 0?
        jne  SwitchOn        ; if not, we need to turn the LED on

SwitchOff:
        mov  dx, 3BCh + 0    ; base + 0 for register #0
        in   al, dx          ; read the current state of the port
        and  al, 0FEh        ; clear first bit (masking)
        out  dx, al          ; write it out to the port
        jmp  Done            ; we are done

SwitchOn:
        mov  dx, 3BCh + 0    ; base + 0 for register #0
        in   al, dx          ; read the current state of the port
        or   al, 01h         ; set first bit (masking)
        out  dx, al          ; write it out to the port

Done:   pop  dx              ; restore the content
        pop  ax              ; restore the content
        ret                  ; return to the caller
CheckPort endp

extern "C" void CheckPort(void);   // defined in assembly

int main(void) {
  while( 1 ) {
    CheckPort();
  }
}

Operating System

• Optional software layer providing low-level services to a program (application).
– File management, disk access
– Keyboard/display interfacing
– Scheduling multiple programs for execution
• Or even just multiple threads from one program
– Program makes system calls to the OS

DB file_name “out.txt”   -- store file name

MOV R0, 1324             -- system call “open” id
MOV R1, file_name        -- address of file-name
INT 34                   -- cause a system call
JZ R0, L1                -- if zero -> error
. . . read the file
JMP L2                   -- bypass error cond.
L1:
. . . handle the error
L2:

Development Environment

• Development processor
– The processor on which we write and debug our programs
• Usually a PC
• Target processor
– The processor that the program will run on in our embedded system
• Often different from the development processor

Software Development Process

[Figure: C files go through the compiler and assembly files through the assembler to produce binary files; the linker combines them with library code into an executable file (implementation phase); the debugger and profiler work on the executable (verification phase)]
• Compilers
– Cross compiler
• Runs on one processor, but generates code for another
• Assemblers
• Linkers
• Debuggers
• Profilers

Running a Program

• If development processor is different than target, how can we run our compiled code? Two options:
– Download to target processor
– Simulate
• Simulation
– One method: Hardware description language
• But slow, not always available
– Another method: Instruction set simulator (ISS)
• Runs on development processor, but executes instructions of target processor

Instruction Set Simulator For A Simple Processor

#include <stdio.h>

typedef struct {
  unsigned char first_byte, second_byte;
} instruction;

instruction program[1024];  // instruction memory
unsigned char memory[256];  // data memory

// Executes the loaded program; returns 0 on success, -1 on an illegal opcode.
int run_program(int num_bytes) {
  int pc = -1;
  unsigned char reg[16] = {0}, fb, sb;

  while( ++pc >= 0 && pc < (num_bytes / 2) ) {
    fb = program[pc].first_byte;   // opcode in high nibble, Rn in low nibble
    sb = program[pc].second_byte;  // direct, immediate, relative, or Rm in high nibble
    switch( fb >> 4 ) {
      case 0: reg[fb & 0x0f] = memory[sb]; break;
      case 1: memory[sb] = reg[fb & 0x0f]; break;
      case 2: memory[reg[fb & 0x0f]] = reg[sb >> 4]; break;
      case 3: reg[fb & 0x0f] = sb; break;
      case 4: reg[fb & 0x0f] += reg[sb >> 4]; break;
      case 5: reg[fb & 0x0f] -= reg[sb >> 4]; break;
      case 6: // JZ: jump only if Rn is 0; the offset is signed
        if (reg[fb & 0x0f] == 0) pc += (signed char)sb;
        break;
      default: return -1;
    }
  }
  return 0;
}

// Not shown on the slide; a minimal version that dumps data memory.
void print_memory_contents(void) {
  for (int i = 0; i < 256; i++)
    printf("M[%d] = %d\n", i, memory[i]);
}

int main(int argc, char *argv[]) {
  FILE* ifs;

  if( argc != 2 ||
      (ifs = fopen(argv[1], "rb")) == NULL ) {
    return -1;
  }
  int num_bytes = (int)fread(program, 1, sizeof(program), ifs);
  fclose(ifs);
  if (run_program(num_bytes) == 0) {
    print_memory_contents();
    return(0);
  }
  else return(-1);
}

Testing and Debugging

• ISS
– Gives us control over time – set breakpoints, look at register values, set values, step-by-step execution, ...
– But, doesn’t interact with real environment
• Download to board
– Use device programmer
– Runs in real environment, but not controllable
• Compromise: emulator
– Runs in real environment, at speed or near
– Supports some controllability from the PC
[Figure: (a) implementation phase and verification phase both on the development processor, using debugger / ISS; (b) implementation phase on the development processor, verification phase using external tools: emulator, programmer]

Application-Specific Instruction-Set Processors (ASIPs)

• General-purpose processors
– Sometimes too general to be effective in demanding application
• e.g., video processing – requires huge video buffers and operations on large arrays of data, inefficient on a GPP
– But single-purpose processor has high NRE, not programmable
• ASIPs – targeted to a particular domain
– Contain architectural features specific to that domain
• e.g., embedded control, digital signal processing, video processing, network processing, telecommunications, etc.
– Still programmable

A Common ASIP: Microcontroller

• For embedded control applications
– Reading sensors, setting actuators
– Mostly dealing with events (bits): data is present, but not in huge amounts
– e.g., VCR, disk drive, digital camera (assuming SPP for image compression), washing machine, microwave oven
• Microcontroller features
– On-chip peripherals
• Timers, analog-digital converters, serial communication, etc.
• Tightly integrated for programmer, typically part of register space
– On-chip program and data memory
– Direct programmer access to many of the chip’s pins
– Specialized instructions for bit-manipulation and other low-level operations

Another Common ASIP: Digital Signal Processors (DSP)

• For signal processing applications
– Large amounts of digitized data, often streaming
– Data transformations must be applied fast
– e.g., cell-phone voice filter, digital TV, music synthesizer
• DSP features
– Several instruction execution units
– Multiply-accumulate single-cycle instruction, other instrs.
– Efficient vector operations – e.g., add two arrays
• Vector ALUs, loop buffers, etc.

Trend: Even More Customized ASIPs

• In the past, microprocessors were acquired as chips
• Today, we increasingly acquire a processor as Intellectual Property (IP)
– e.g., synthesizable VHDL model
• Opportunity to add custom datapath hardware and a few custom instructions, or delete a few instructions
– Can have significant performance, power and size impacts
– Problem: need compiler/debugger for customized ASIP
• Remember, most development uses structured languages
• One solution: automatic compiler/debugger generation
– e.g., www.tensillica.com
• Another solution: retargetable compilers
– e.g., www.improvsys.com (customized VLIW architectures)

Selecting a Microprocessor

• Issues
– Technical: speed, power, size, cost
– Other: development environment, prior expertise, licensing, etc.
• Speed: how to evaluate a processor’s speed?
– Clock speed – but instructions per cycle may differ
– Instructions per second – but work per instr. may differ
– Dhrystone: Synthetic benchmark, developed in 1984. Dhrystones/sec.
• MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital’s VAX 11/780). A.k.a. Dhrystone MIPS. Commonly used today.
– So, 750 MIPS = 750*1757 = 1,317,750 Dhrystones per second
– SPEC: set of more realistic benchmarks, but oriented to desktops
– EEMBC – EDN Embedded Benchmark Consortium, www.eembc.org
• Suites of benchmarks: automotive, consumer electronics, networking, office automation, telecommunications
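The Dhrystone MIPS conversion above is a single multiplication or division by 1757. The short sketch below makes the arithmetic explicit; the constant and function names are hypothetical.

```c
#include <assert.h>

/* 1 Dhrystone MIPS = 1757 Dhrystones per second (VAX 11/780 baseline). */
#define DHRYSTONES_PER_MIPS 1757UL

/* Converts a Dhrystone MIPS rating to Dhrystones per second. */
unsigned long mips_to_dhrystones(unsigned long mips) {
    return mips * DHRYSTONES_PER_MIPS;
}

/* Converts a measured Dhrystones-per-second score to Dhrystone MIPS. */
double dhrystones_to_mips(double dhrystones_per_sec) {
    assert(dhrystones_per_sec >= 0.0);
    return dhrystones_per_sec / (double)DHRYSTONES_PER_MIPS;
}
```

`mips_to_dhrystones(750)` reproduces the 1,317,750 Dhrystones per second quoted above.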

General Purpose Processors

Processor | Clock speed | Periph. | Bus Width | MIPS | Power | Trans. | Price
General Purpose Processors
Intel PIII | 1GHz | 2x16 K L1, 256K L2, MMX | 32 | ~900 | 97W | ~7M | $900
IBM PowerPC 750X | 550 MHz | 2x32 K L1, 256K L2 | 32/64 | ~1300 | 5W | ~7M | $900
MIPS R5000 | 250 MHz | 2x32 K 2 way set assoc. | 32/64 | NA | NA | 3.6M | NA
StrongARM SA-110 | 233 MHz | None | 32 | 268 | 1W | 2.1M | NA
Microcontroller
Intel 8051 | 12 MHz | 4K ROM, 128 RAM, 32 I/O, Timer, UART | 8 | ~1 | ~0.2W | ~10K | $7
Motorola 68HC811 | 3 MHz | 4K ROM, 192 RAM, 32 I/O, Timer, WDT, SPI | 8 | ~.5 | ~0.1W | ~10K | $5
Digital Signal Processors
TI C5416 | 160 MHz | 128K SRAM, 3 T1 Ports, DMA, 13 ADC, 9 DAC | 16/32 | ~600 | NA | NA | $34
Lucent DSP32C | 80 MHz | 16K Inst., 2K Data, Serial Ports, DMA | 32 | 40 | NA | NA | $75

Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998

Designing a General Purpose Processor

• Not something an embedded system designer normally would do
– But instructive to see how simply we can build one top down
– Remember that real processors aren’t usually built this way
• Much more optimized, much more bottom-up design

FSMD
Declarations:
bit PC[16], IR[16];
bit M[64k][16], RF[16][16];

Aliases:
op   IR[15..12]
rn   IR[11..8]
rm   IR[7..4]
dir  IR[7..0]
imm  IR[7..0]
rel  IR[7..0]

State    Condition   Action                        Next state
Reset                PC=0;                         Fetch
Fetch                IR=M[PC]; PC=PC+1             Decode
Decode               (branch on op)                one of the states below
Mov1     op = 0000   RF[rn] = M[dir]               to Fetch
Mov2     op = 0001   M[dir] = RF[rn]               to Fetch
Mov3     op = 0010   M[rn] = RF[rm]                to Fetch
Mov4     op = 0011   RF[rn] = imm                  to Fetch
Add      op = 0100   RF[rn] = RF[rn]+RF[rm]        to Fetch
Sub      op = 0101   RF[rn] = RF[rn]-RF[rm]        to Fetch
Jz       op = 0110   PC = (RF[rn]=0) ? rel : PC    to Fetch

26
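
As a rough illustration of how the FSMD above behaves, here is a minimal instruction-set simulator sketch in C. The type and function names (Cpu, cpu_reset, cpu_step) are invented for this example. Two readings are assumptions: Mov3 is treated as register-indirect (the address comes from RF[rn]), and Jz loads rel directly into PC as the FSMD is written.

```c
#include <stdint.h>

/* Storage declared by the FSMD. */
typedef struct {
    uint16_t pc;          /* PC[16] */
    uint16_t ir;          /* IR[16] */
    uint16_t rf[16];      /* RF[16][16] */
    uint16_t m[65536];    /* M[64k][16] */
} Cpu;

/* Reset state: PC = 0. */
void cpu_reset(Cpu *c) { c->pc = 0; }

/* One pass through Fetch, Decode and the selected execute state. */
void cpu_step(Cpu *c) {
    c->ir = c->m[c->pc];                 /* Fetch: IR = M[PC] */
    c->pc = (uint16_t)(c->pc + 1);       /*        PC = PC + 1 */

    unsigned op  = (c->ir >> 12) & 0xFu; /* IR[15..12] */
    unsigned rn  = (c->ir >> 8) & 0xFu;  /* IR[11..8]  */
    unsigned rm  = (c->ir >> 4) & 0xFu;  /* IR[7..4]   */
    unsigned low = c->ir & 0xFFu;        /* dir, imm and rel share IR[7..0] */

    switch (op) {
    case 0x0: c->rf[rn] = c->m[low]; break;                          /* Mov1 */
    case 0x1: c->m[low] = c->rf[rn]; break;                          /* Mov2 */
    case 0x2: c->m[c->rf[rn]] = c->rf[rm]; break;  /* Mov3, assumed indirect */
    case 0x3: c->rf[rn] = (uint16_t)low; break;                      /* Mov4 */
    case 0x4: c->rf[rn] = (uint16_t)(c->rf[rn] + c->rf[rm]); break;  /* Add  */
    case 0x5: c->rf[rn] = (uint16_t)(c->rf[rn] - c->rf[rm]); break;  /* Sub  */
    case 0x6: if (c->rf[rn] == 0) c->pc = (uint16_t)low; break;      /* Jz   */
    default:  break;                     /* undefined opcodes do nothing */
    }
}
```

Loading the words 0x3005, 0x3107, 0x4010 at addresses 0 to 2 and stepping three times leaves 5 + 7 = 12 in RF[0].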
Architecture of a Simple Microprocessor
• Storage devices for each declared variable
  – register file holds each of the variables
• Functional units to carry out the FSMD operations
  – One ALU carries out every required operation
• Connections added among the components' ports corresponding to the operations required by the FSM
• Unique identifiers created for every control signal

[Figure: control unit (controller with next-state and control logic and state register, PC, IR) and datapath (2x1 mux, RF (16), ALU, 3x1 mux, memory with A and D ports). Control signals: RFs, RFwa, RFwe, RFr1a, RFr1e, RFr2a, RFr2e, ALUs, ALUz, PCld, PCinc, PCclr, Irld, Ms, Mre, Mwe.]

A Simple Microprocessor
FSM operations that replace the FSMD operations after a datapath is created:

State (op) | FSMD operation | FSM operations
Reset | PC=0; | PCclr=1;
Fetch | IR=M[PC]; PC=PC+1 | Ms=10; Irld=1; Mre=1; PCinc=1;
Decode | from states below |
Mov1 (0000) | RF[rn] = M[dir] | RFwa=rn; RFwe=1; RFs=01; Ms=01; Mre=1;
Mov2 (0001) | M[dir] = RF[rn] | RFr1a=rn; RFr1e=1; Ms=01; Mwe=1;
Mov3 (0010) | M[rn] = RF[rm] | RFr1a=rn; RFr1e=1; Ms=10; Mwe=1;
Mov4 (0011) | RF[rn] = imm | RFwa=rn; RFwe=1; RFs=10;
Add (0100) | RF[rn] = RF[rn]+RF[rm] | RFwa=rn; RFwe=1; RFs=00; RFr1a=rn; RFr1e=1; RFr2a=rm; RFr2e=1; ALUs=00
Sub (0101) | RF[rn] = RF[rn]-RF[rm] | RFwa=rn; RFwe=1; RFs=00; RFr1a=rn; RFr1e=1; RFr2a=rm; RFr2e=1; ALUs=01
Jz (0110) | PC=(RF[rn]=0) ? rel : PC | PCld=ALUz; RFr1a=rn; RFr1e=1;

Every execute state returns to Fetch.

You just built a simple microprocessor!

Chapter Summary
• General-purpose processors
  – Good performance, low NRE, flexible
• Controller, datapath, and memory
• Structured languages prevail
  – But some assembly level programming still necessary
• Many tools available
  – Including instruction-set simulators, and in-circuit emulators
• ASIPs
  – Microcontrollers, DSPs, network processors, more customized ASIPs
• Choosing among processors is an important step
• Designing a general-purpose processor is conceptually the same as designing a single-purpose processor

Chapter 4 Standard Single Purpose Processors: Peripherals

Introduction
• Single-purpose processors
  – Performs specific computation task
  – Custom single-purpose processors
    • Designed by us for a unique task
  – Standard single-purpose processors
    • "Off-the-shelf" -- pre-designed for a common task
    • Low NRE and unit cost; fast, low power and small because customized for a particular task
    • a.k.a., peripherals
    • serial transmission
    • analog/digital conversions

Timers
• Timer: measures time intervals
  – To generate timed output events
    • e.g., hold traffic light green for 10 s
  – To measure input events
    • e.g., measure a car's speed
• Based on counting clock pulses
  • E.g., let Clk period be 10 ns
  • And we count 20,000 Clk pulses
  • Then 200 microseconds have passed
  • 16-bit counter would count up to 65,535*10 ns = 655.35 microsec., resolution = 10 ns
  • Top: indicates top count reached, wrap-around

[Figure: Basic timer. 16-bit up counter with Clk and Reset inputs, 16-bit Cnt output and Top output.]
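
The timer arithmetic above can be checked with a couple of small C helpers. This is only a sketch; the function names are made up for illustration, and bits is assumed to be smaller than the width of unsigned long.

```c
/* Time represented by a number of counted clock pulses, in nanoseconds. */
double elapsed_ns(unsigned long pulses, double clk_period_ns) {
    return (double)pulses * clk_period_ns;
}

/* Longest interval an n-bit up counter can measure before it wraps. */
double timer_range_ns(unsigned bits, double clk_period_ns) {
    unsigned long top = (1UL << bits) - 1UL;   /* 65,535 for 16 bits */
    return (double)top * clk_period_ns;
}
```

With a 10 ns clock, elapsed_ns(20000, 10.0) gives 200,000 ns (200 microseconds) and timer_range_ns(16, 10.0) gives 655,350 ns (655.35 microseconds), matching the slide.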
Counters
• Counter: like a timer, but counts pulses on a general input signal rather than clock
  – e.g., count cars passing over a sensor
  – Can often configure device as either a timer or counter

[Figure: Timer/counter. A 2x1 mux controlled by Mode selects Clk or Cnt_in as the input of a 16-bit up counter with Reset, 16-bit Cnt and Top.]

Other timer structures
• Interval timer
  – Indicates when desired time interval has passed
  – We set terminal count to desired interval
    • Number of clock cycles = Desired time interval / Clock period
• Cascaded counters
• Prescaler
  – Divides clock
  – Increases range, decreases resolution

[Figures: Timer with a terminal count (16-bit up counter compared with the terminal count to produce Top); 16/32-bit timer (two cascaded 16-bit up counters with Cnt1/Top1 and Cnt2/Top2); Timer with prescaler (Clk, prescaler controlled by Mode, 16-bit up counter).]
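
The interval-timer and prescaler relations listed above can be sketched as follows. The function names and the divide parameter are illustrative assumptions, not a particular device's registers.

```c
/* Terminal count = desired time interval / clock period (rounded). */
unsigned long terminal_count(double interval_ns, double clk_period_ns) {
    return (unsigned long)(interval_ns / clk_period_ns + 0.5);
}

/* A prescaler that divides the clock by `divide` coarsens the resolution... */
double prescaled_resolution_ns(double clk_period_ns, unsigned divide) {
    return clk_period_ns * (double)divide;
}

/* ...and stretches the range of an n-bit counter by the same factor. */
double prescaled_range_ns(unsigned bits, double clk_period_ns, unsigned divide) {
    unsigned long top = (1UL << bits) - 1UL;
    return (double)top * prescaled_resolution_ns(clk_period_ns, divide);
}
```

For a 10 ns clock, a 200 microsecond interval needs a terminal count of 20,000; dividing the clock by 8 changes the resolution to 80 ns and the 16-bit range to about 5.24 ms.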

Reaction Timer

[Figure: indicator light, reaction button, LCD showing "time: 100 ms".]

• Measure time between turning light on and user pushing button
  – 16-bit timer, clk period is 83.33 ns, counter increments every 6 cycles
  – Resolution = 6*83.33 ns = 0.5 microsec.
  – Range = 65535*0.5 microseconds = 32.77 milliseconds
  – Want program to count millisec., so initialize counter to 65535 – 1000/0.5 = 63535

/* main.c */
#define MS_INIT 63535
void main(void){
   int count_milliseconds = 0;
   configure timer mode
   set Cnt to MS_INIT
   wait a random amount of time
   turn on indicator light
   start timer
   while (user has not pushed reaction button){
      if(Top) {
         stop timer
         set Cnt to MS_INIT
         start timer
         reset Top
         count_milliseconds++;
      }
   }
   turn light off
   printf("time: %i ms", count_milliseconds);
}

Watchdog timer
• Must reset timer every X time unit, else timer generates a signal
• Common use: detect failure, self-reset
• Another use: timeouts
  – e.g., ATM machine
  – 16-bit timer, 2 microsec. resolution
  – timereg value = 2*(2^16-1)–X = 131070–X
  – For 2 min., X = 120,000 microsec.

[Figure: osc, prescaler (clk), scalereg (overflow), timereg (overflow) to system reset or interrupt; checkreg.]

/* main.c */
main(){
   wait until card inserted
   call watchdog_reset_routine
   while(transaction in progress){
      if(button pressed){
         perform corresponding action
         call watchdog_reset_routine
      }
   }
   /* if watchdog_reset_routine not called every < 2 minutes,
      interrupt_service_routine is called */
}

watchdog_reset_routine(){
   /* checkreg is set so we can load value into timereg. Zero is
      loaded into scalereg and 11070 is loaded into timereg */
   checkreg = 1
   scalereg = 0
   timereg = 11070
}

void interrupt_service_routine(){
   eject card
   reset screen
}
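
Both preload values on these two slides follow the same pattern: start the up counter far enough below its top that it overflows after the wanted interval. A minimal sketch, with illustrative function names, that simply restates the slide formulas:

```c
/* Initial value for a 16-bit up counter so that it reaches Top (65535)
   after interval_us, given the counter resolution in microseconds. */
unsigned long timer_initial_count(double interval_us, double resolution_us) {
    return 65535UL - (unsigned long)(interval_us / resolution_us + 0.5);
}

/* Watchdog preload as given on the slide: timereg = 2*(2^16 - 1) - X. */
unsigned long watchdog_timereg(unsigned long x) {
    return 2UL * 65535UL - x;
}
```

timer_initial_count(1000.0, 0.5) gives the MS_INIT value 63535, and watchdog_timereg(120000) gives the 11070 loaded by watchdog_reset_routine.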

Serial Transmission Using UARTs
• UART: Universal Asynchronous Receiver Transmitter
  – Takes parallel data and transmits serially
  – Receives serial data and converts to parallel
• Parity: extra bit for simple error checking
• Start bit, stop bit
• Baud rate
  – signal changes per second
  – bit rate usually higher

[Figure: an embedded device sends the byte 10011011 through a sending UART as a serial stream (start bit, data, end bit); the receiving UART reassembles 10011011.]

Serial (RS232C) Driver Interface

[Figure: 8051 (TXD pin 11, RXD pin 10) or PIC16F877A (RC6/Tx/CK, RC7/Rx/DT) connected through a MAX-233 level converter (T1IN/T1OUT, R1IN/R1OUT) to the serial port connector of the computer; +5V supply.]
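
To make the start bit, parity bit and stop bit concrete, here is a small C sketch that builds one UART frame. The function names are invented, and the framing choices (8 data bits sent least-significant bit first, even parity, one stop bit) are assumptions for the example; the slide does not fix them.

```c
#include <stdint.h>

/* Even parity bit: 1 when the data byte holds an odd number of 1s,
   so that data plus parity always contain an even number of 1s. */
unsigned even_parity(uint8_t data) {
    unsigned ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (data >> i) & 1u;
    return ones & 1u;
}

/* Build an 11-bit frame; bit 0 of the result is transmitted first:
   start (0), 8 data bits LSB first, parity, stop (1). */
uint16_t uart_frame(uint8_t data) {
    uint16_t frame = 0;                           /* bit 0: start bit = 0 */
    frame |= (uint16_t)((unsigned)data << 1);     /* bits 1..8: data      */
    frame |= (uint16_t)(even_parity(data) << 9);  /* bit 9: parity        */
    frame |= (uint16_t)(1u << 10);                /* bit 10: stop bit = 1 */
    return frame;
}
```

The byte 10011011 from the figure has five 1s, so its even-parity bit is 1.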
COM1: 0x3F8 to 0x3FE

Address | Register
0x3F8 | Data (baud rate divisor LSB when DLAB=1)
0x3F9 | Interrupt Enable Register (baud rate divisor MSB when DLAB=1)
0x3FA | Interrupt Identification Register
0x3FB | Line Control Register
0x3FC | Modem Control Register
0x3FD | Line Status Register
0x3FE | Modem Status Register

0x3FB Line Control Register (D7 ... D0)
D7: DLAB=1 to access divider latch
D4: parity select, 0 = odd, 1 = even
D3: PE (parity enable)
D2: stop bits, 0 = 1 stop bit, 1 = 2 stop bits
D1 D0: word length, 00 = 5 bits, 01 = 6 bits, 10 = 7 bits, 11 = 8 bits

0x3FD Line Status Register
D7 | D6   | D5   | D4 | D3 | D2 | D1 | D0
0  | TSRE | THRE | BI | FE | PE | OE | RxRDY
TSRE: Tx Shift Register Empty
THRE: Tx Hold Register Empty

Baud rate | Hex No. (divisor)
110   | 0x417
300   | 0x180
1200  | 0x060
2400  | 0x030
4800  | 0x018
9600  | 0x00C
19200 | 0x006

#include<iostream.h>
#include<stdio.h>
#include<conio.h>
#include<dos.h>
#define port 0x3F8 //com1 (0x3F8 to 0x3FE) OR 0x2F8 for com2

main()
{
int k,j,t;
clrscr();
outportb(port+3,0x80); //DLAB=1: baud rate specifier (0x3FB)
outportb(port+0,0x30); //Baud rate LSB (0x3F8), 0x0030 = 2400 baud
outportb(port+1,0x00); //Baud rate MSB (0x3F9)
outportb(port+3,0x03); //cw: 8 bits, no parity, 1 stop bit
outportb(port+2,0x00); //Interrupt identification Register
outportb(port+4,0x0b); //modem control

while(!(inportb(port+5) & 0x20)); //Check status (THRE) for ready to Tx
outportb(port,'a');
while(!(inportb(port+5) & 0x01)); //Check status (RxRDY) for ready to Rx
t=inportb(port);
}
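
The divisor values in the baud rate table can be derived rather than memorised. The sketch below assumes the usual PC serial port arrangement in which the divisor equals 115200 divided by the baud rate (a 1.8432 MHz UART clock divided by 16); that base rate and the function names are assumptions, since the slide only lists the resulting hex values.

```c
/* Divisor latch value for a PC-style UART (assumed base rate 115200). */
unsigned baud_divisor(unsigned long baud) {
    return (unsigned)(115200UL / baud);
}

/* Byte written to port+0 (LSB) while DLAB=1. */
unsigned char divisor_lsb(unsigned divisor) {
    return (unsigned char)(divisor & 0xFFu);
}

/* Byte written to port+1 (MSB) while DLAB=1. */
unsigned char divisor_msb(unsigned divisor) {
    return (unsigned char)((divisor >> 8) & 0xFFu);
}
```

baud_divisor(2400) returns 0x30, the value the listing writes to port+0 with 0x00 in port+1.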

Pulse width modulator
• Generates pulses with specific high/low times
• Duty cycle: % time high
  – Square wave: 50% duty cycle
• Common use: control average voltage to electric device
  – Simpler than DC-DC converter or digital-analog converter
  – DC motor speed, dimmer lights
• Another use: encode commands, receiver uses timer to decode

[Figure: pwm_o against clk for three cases. 25% duty cycle: average pwm_o is 1.25V. 50% duty cycle: average pwm_o is 2.5V. 75% duty cycle: average pwm_o is 3.75V.]

Controlling a DC motor with a PWM

Internal Structure of PWM
[Figure: clk feeds clk_div, which controls how fast the counter (0 – 254) increments; an 8-bit comparator compares counter with cycle_high. counter < cycle_high: pwm_o = 1. counter >= cycle_high: pwm_o = 0.]

Input Voltage | % of Maximum Voltage Applied | RPM of DC Motor
0    | 0   | 0
2.5  | 50  | 1840
3.75 | 75  | 6900
5.0  | 100 | 9200
Relationship between applied voltage and speed of the DC Motor

void main(void){
   /* controls period */
   PWMP = 0xff;
   /* controls duty cycle */
   PWM1 = 0x7f;
   while(1){};
}

The PWM alone cannot drive the DC motor; a possible way to implement a driver is shown below using an MJE3055T NPN transistor.

[Figure: driver circuit. A: from processor; B: to the DC motor; transistor on a 5V supply.]
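
The comparator rule of the PWM (output high while counter < cycle_high) and the resulting average voltage can be modelled in a few lines of C. This is a sketch with made-up function names; a period of 256 counts is assumed so the duty cycles come out as round numbers.

```c
/* Comparator: pwm_o is 1 while counter < cycle_high, otherwise 0. */
int pwm_out(unsigned counter, unsigned cycle_high) {
    return counter < cycle_high;
}

/* Average output voltage over one period of `period` counter values. */
double pwm_average(unsigned cycle_high, unsigned period, double vmax) {
    unsigned high = 0;
    for (unsigned counter = 0; counter < period; counter++)
        high += (unsigned)pwm_out(counter, cycle_high);
    return vmax * (double)high / (double)period;
}
```

With a 5 V supply, cycle_high values of 64, 128 and 192 out of 256 give 1.25 V, 2.5 V and 3.75 V, the three cases drawn on the slide.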

Pulse width modulator (PWM)

[Figure: PWM generated with a 16-bit up counter. The timer is loaded with TC = 65535-N ("Load timer") and reloaded at every timer overflow ("Timer OF").]

Controlling a DC motor with a PWM

[Figure: microcontroller pin RC5 drives the DC motor from supply V through R1, R2 and transistor Q2 (BC547), with capacitor C1; one part of the circuit is marked optional.]
TEMPERATURE CONTROL USING THYRISTOR (AC VOLTAGE CONTROLLER)

[Figure: AC mains voltage against time t with firing angle a; load voltage against t; heating (temp) against Vr.m.s.]

• 16 bit up counter, TC = 65535-N; the timer is reloaded ("Load timer") at each timer overflow ("Timer OF")
• 5 ms: if 1 IC = 1 µs, 5 ms = 5000 IC

TEMPERATURE CONTROL USING THYRISTOR contd…

[Figure: zero-crossing pulse (ZCP) detector on the 9V AC input using a PC817 optocoupler (1K, 10K, +5V) feeding INT0 of the µC; the timer output on RC5 drives a MOC3020 (LED, 100E, D5), which triggers the BTA16 TRIAC (390E 11W, 560E 11W, 0.1uF) in series with the load on the AC supply.]

CONSTANT CURRENT SOURCE USING PWM

[Figure: µC pin RC5 drives transistor Q2 (BU508) through R1 and R2, with capacitor C1, supplying the load from V; the voltage across a 1E resistor is read back by the ADC. PWM waveforms: 25% duty cycle, average pwm_o is 1.25V; 50% duty cycle, average pwm_o is 2.5V; 75% duty cycle, average pwm_o is 3.75V.]

Block Diagram (16 X 2)

[Figure: LCD controller, LCD driver and LCD panel. Signals: E, RW, RS, D0-D7, VDD, VL, Vss.]

LCD pin description

PIN NO | SYMBOL | DESCRIPTION
Pin 1 | VSS | Ground Terminal, 0V
Pin 2 | VDD | Supply Terminal, +5V
Pin 3 | VL | Liquid Crystal drive Voltage
Pin 4 | RS | Register Select: RS=0 – Instruction Register, RS=1 – Data Register
Pin 5 | R/W | Read/Write: R/W=1 – Read, R/W=0 – Write
Pin 6 | E | Enable: Enables Read/Write
Pin 7 to Pin 14 | DB0 to DB7 | Bi-directional data bus: When interface data length is 8 bits, data transfer is done once through DB0-DB7. When the interface data length is 4 bits, data transfer is done twice through DB4-DB7.
Pin 15, Pin 16 | BACKLIGHT SUPPLY | In case of 15 Pin Modules, Pin 15 is the supply (+5V) for the LED. In case of 16 Pin Modules, Pin 15 is the (LED+) (5V) and Pin 16 is (LED-) GND (0V) for the LED.
CODE (HEX) COMMAND TO LCD INSTRUCTION REGISTER
1 Clear display screen
2 Return home
4 Decrement cursor (shift cursor to left)
6 Increment cursor (shift cursor to right)
5 Shift display right
7 Shift display left
8 Display off, cursor off
A Display off, cursor on
C Display on, cursor off
E Display on, cursor on (not blinking)
F Display on, cursor blinking
10 Shift cursor position to left
14 Shift cursor position to right
18 Shift the entire display to the left
1C Shift the entire display to the right
80 Force cursor to beginning of 1st line
C0 Force cursor to beginning of 2nd line
38 2 lines and 5x7 matrix

Writing Command to LCD
1. Check for ready or wait for >5msec
2. Place command on D7.......D0
3. RS=0
4. R/W=0
5. Pulse E (high, then low)

void command(void)
{
setdelay(25); or call ready
PORTD=R3;
RC0=0; //Rs
RC1=0; //R/W
RC2=1; //E1
asm("nop");
asm("nop");
RC2=0;
}

Writing Data to LCD
1. Check for ready or wait for >200msec
2. Place data on D7.......D0
3. RS=1
4. R/W=0
5. Pulse E (high, then low)

void display(void)
{
setdelay(2); or call ready
PORTD=R3;
RC0=1; //Rs
RC1=0; //R/W
RC2=1; //E1
asm("nop");
asm("nop");
RC2=0;
}

LCD.ASM
          ORG 00H
          MOV R4,#9
          MOV A,#38H        ; 2 lines and 5x7 matrix
          CALL COMMAND
          MOV A,#0EH        ; Display on, cursor on
          CALL COMMAND
          MOV A,#01H        ; Clear display screen
          CALL COMMAND
          NOP
          MOV DPTR,#200H
          CLR A
AGAIN :   MOVC A,@A+DPTR
          CALL DATAWR
          INC DPTR
          CLR A
          DJNZ R4,AGAIN
HERE :    SJMP HERE         ; stop here so execution does not run into COMMAND
;****************
COMMAND : CALL READY
          MOV P3,A
          CLR P2.0          ; RS=0
          CLR P2.1          ; R/W=0
          SETB P2.2         ; E high
          NOP
          CLR P2.2          ; E low
          RET
;****************
DATAWR :  CALL READY
          MOV P3,A
          SETB P2.0         ; RS=1
          CLR P2.1          ; R/W=0
          SETB P2.2         ; E high
          NOP
          CLR P2.2          ; E low
          RET
;****************
READY :   SETB P3.7         ; OR use delay routine
          CLR P2.0
          SETB P2.1
BACK :    CLR P2.2
          SETB P2.2
          JB P3.7,BACK
          RET
;****************
          ORG 200H
          DB 'I','R','E'
          DB 'C','A','.'
          DB 'T','K',' '
          END

Keypad controller

[Figure: N=4, M=4 keypad with column lines C1–C4 and row lines R1–R4; the keypad controller (inputs N1–N4 and M1–M4) outputs k_pressed and a 4-bit key_code.]

Algorithm for keyboard-Display
• Wait until all keys are open.
• Check for any key press.
• Wait for around 10 ms (key debounce).
• Identify the key pressed by scanning each row taken one at a time.
• Assign key code.
• Display the key pressed on the 7-segment display.
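
The scanning and debounce steps of the algorithm above might look like this in C. The sketch is hardware-independent: the keys array is a hypothetical snapshot of the 4x4 matrix (nonzero means pressed), standing in for the port reads a real controller would perform, and the row-major key code is an assumption.

```c
#include <stdint.h>

#define ROWS 4
#define COLS 4

/* Scan each row, one at a time; return a key code 0..15
   (row * 4 + column) for the first pressed key, or -1 if none. */
int scan_keypad(uint8_t keys[ROWS][COLS]) {
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            if (keys[r][c])
                return r * COLS + c;
    return -1;
}

/* Debounce: accept a key only if two scans taken about 10 ms apart
   report the same key. */
int debounced_key(uint8_t first[ROWS][COLS], uint8_t second[ROWS][COLS]) {
    int k1 = scan_keypad(first);
    int k2 = scan_keypad(second);
    return (k1 >= 0 && k1 == k2) ? k1 : -1;
}
```

On real hardware the two snapshots would come from reading the column lines while driving each row in turn, with a delay of roughly 10 ms between the two scans.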
Stepper Motor
• A stepper motor is an electromechanical device which converts electrical pulses into discrete mechanical movements
• The shaft or spindle of a stepper motor rotates in discrete step increments when electrical command pulses are applied to it in the proper sequence
• The motor's rotation has several direct relationships to these applied input pulses
• The sequence of the applied pulses is directly related to the direction of motor shaft rotation
• The speed of the motor shaft rotation is directly related to the frequency of the input pulses
• The length of rotation is directly related to the number of input pulses applied
• A stepper motor can be a good choice whenever controlled movement is required. It can be used to advantage in applications where you need to control rotation angle, speed, position and synchronism

Stepper motor controller
• Stepper motor: rotates fixed number of degrees when given a "step" signal
  – In contrast, DC motor just rotates when power applied, coasts to stop
• Rotation achieved by applying specific voltage sequence to coils
• Controller greatly simplifies this

Sequence | A | B | A' | B'
1 | + | + | - | -
2 | - | + | + | -
3 | - | - | + | +
4 | + | - | - | +
5 | + | + | - | -

MC3479P pinout
 1 Vd          16 Vm
 2 A'          15 B
 3 A           14 B'
 4 GND         13 GND
 5 GND         12 GND
 6 Bias'/Set   11 Phase A'
 7 Clk         10 CW'/CCW
 8 O|C          9 Full'/Half Step

Motor wires: Red A, White A', Yellow B, Black B'

Wave drive (single coil excitation): only one winding is energized at any given time. The stator is energized according to the sequence A → B → A' → B' and the rotor steps from position 8 → 2 → 4 → 6. For unipolar and bipolar wound …

Full step drive (double coil excitation): two phases are energized at any given time. The stator is energized according to the sequence AB → A'B → A'B' → AB' and the rotor steps from position 1 → 3 → 5 → 7. Full step mode results in the same angular movement, but the mechanical position is offset by one half of a full step.

Half step drive (half step excitation): combines both wave and full step (1 and 2 phases on) drive modes. Every second step only one phase is energized, and during the other steps one phase on each stator. The stator is energized according to the sequence AB → B → A'B → A' → A'B' → B' → AB' → A and the rotor steps from position 1 → 2 → 3 → 4 → 5 → 6 → 7 → 8. This results in angular movements that are half of those in 1- or 2-phases-on drive modes.

The table below shows which pulses to apply in each mode (bit order L4 L3 L2 L1, hex value in parentheses). The anticlockwise column is the clockwise sequence traversed in reverse.

Single coil excitation
Clockwise | Anticlockwise
0 0 0 1 (1) | 0 0 0 1 (1)
0 0 1 0 (2) | 1 0 0 0 (8)
0 1 0 0 (4) | 0 1 0 0 (4)
1 0 0 0 (8) | 0 0 1 0 (2)

Double coil excitation
Clockwise | Anticlockwise
0 0 1 1 (3) | 0 0 1 1 (3)
0 1 1 0 (6) | 1 0 0 1 (9)
1 1 0 0 (C) | 1 1 0 0 (C)
1 0 0 1 (9) | 0 1 1 0 (6)

Half step excitation
Clockwise | Anticlockwise
0001 | 0001
0011 | 1001
0010 | 1000
0110 | 1100
0100 | 0100
1100 | 0110
1000 | 0010
1001 | 0011

Note: Step resolution is the angle the motor rotates on receiving one pulse. In half step excitation mode the motor rotates by half the specified step resolution per pulse: if the step resolution is 1.8 degrees, then in this mode each step is 0.9 degree. With a step resolution of 1.8 degrees, the motor needs 200 pulses to complete 1 revolution (360 degrees).
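
The excitation sequences and the pulses-per-revolution arithmetic can be sketched in C as below. The array and function names are illustrative; the patterns are the clockwise columns for L4 L3 L2 L1, and the reverse direction is produced by walking the same table backwards.

```c
/* Clockwise coil patterns (bits L4 L3 L2 L1) for each drive mode. */
static const unsigned char WAVE_SEQ[4] = {0x1, 0x2, 0x4, 0x8};
static const unsigned char FULL_SEQ[4] = {0x3, 0x6, 0xC, 0x9};
static const unsigned char HALF_SEQ[8] = {0x1, 0x3, 0x2, 0x6,
                                          0x4, 0xC, 0x8, 0x9};

/* Pattern to output at step number `step` (0, 1, 2, ...).
   dir >= 0 walks the table forwards (clockwise), dir < 0 backwards. */
unsigned char step_pattern(const unsigned char *seq, int len,
                           long step, int dir) {
    long i = (dir >= 0 ? step : -step) % len;
    if (i < 0)
        i += len;          /* keep the index inside 0..len-1 */
    return seq[i];
}

/* Pulses needed for one full revolution at a given step angle. */
long pulses_per_rev(double step_angle_deg) {
    return (long)(360.0 / step_angle_deg + 0.5);
}
```

A 1.8 degree motor needs pulses_per_rev(1.8) = 200 pulses per revolution in full step mode and pulses_per_rev(0.9) = 400 in half step mode.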

Driving the coils directly from a port (double coil excitation)

Flow: PORT=Sequence1 (6), delay; PORT=Sequence2 (3), delay; PORT=Sequence3 (9), delay; PORT=Sequence4 (C), delay.

        MOVLW  N          ; N = 50d
        MOVWF  COUNT
go:     MOVLW  0X06
        MOVWF  PORTB
        call   delay
        MOVLW  0X03
        MOVWF  PORTB
        call   delay
        MOVLW  0X09
        MOVWF  PORTB
        call   delay
        MOVLW  0X0C
        MOVWF  PORTB
        call   delay
        DECFSZ COUNT,1
        GOTO   go
        RETURN

The same sequence obtained by rotating one byte right:
01100110 = x66, 00110011 = x33, 10011001 = x99, 11001100 = xCC

        MOVLW  0X66
        MOVWF  SEQ
Back:   MOVF   SEQ,0
        MOVWF  PORT
        RRF    SEQ,0      ; copy bit 0 into carry (result in W is discarded)
        RRF    SEQ,1      ; rotate right so bit 0 re-enters at bit 7
        CALL   delay
        GOTO   Back

Stepper motor with controller (driver)

[Figure: 8051 P1.0 to CW'/CCW (pin 10) and P1.1 to CLK (pin 7) of the MC3479P stepper motor driver; driver outputs A' (pin 2), A (pin 3), B (pin 15), B' (pin 14) to the stepper motor.]

/* main.c */
sbit clk=P1^1;
sbit cw=P1^0;

void delay(void){
   int i, j;
   for (i=0; i<1000; i++)
      for ( j=0; j<50; j++)
         i = i + 0;
}

void main(void){
   /* turn the motor forward */
   cw=0;      /* set direction */
   clk=0;     /* pulse clock */
   delay();
   clk=1;

   /* turn the motor backwards */
   cw=1;      /* set direction */
   clk=0;     /* pulse clock */
   delay();
   clk=1;
}

The output pins on the stepper motor driver do not provide enough current to drive the stepper motor. To amplify the current, a buffer is needed. One possible implementation of the buffers is pictured below. Q1 is an MJE3055T NPN transistor and Q2 is an MJE2955T PNP transistor. A is connected to the 8051 microcontroller and B is connected to the stepper motor.

[Figure: buffer with +V, two 1K resistors, Q1 and Q2; input A, output B.]

Stepper motor without controller (driver)

[Figure: 8051 pins P2.0–P2.4 connected through buffers to the stepper motor; P2.4 tied to GND/+V to select direction.]

A possible way to implement the buffers is shown below. The 8051 alone cannot drive the stepper motor, so several transistors were added to increase the current going to the stepper motor. Q1 are MJE3055T NPN transistors and Q3 is an MJE2955T PNP transistor. A is connected to the 8051 microcontroller and B is connected to the stepper motor.

[Figure: buffer with +V, 1K resistors, Q1, Q2 and a 330 ohm resistor; input A, output B.]

/*main.c*/
sbit notA=P2^0;
sbit isA=P2^1;
sbit notB=P2^2;
sbit isB=P2^3;
sbit dir=P2^4;

/* coil patterns; file scope so that move() can use them */
int lookup[20] = {
   1, 1, 0, 0,
   0, 1, 1, 0,
   0, 0, 1, 1,
   1, 0, 0, 1,
   1, 1, 0, 0 };

void delay(){
   int a, b;
   for(a=0; a<5000; a++)
      for(b=0; b<10000; b++)
         a=a+0;
}

void move(int dir, int steps) {
   int y, z;
   /* clockwise movement */
   if(dir == 1){
      for(y=0; y<=steps; y++){
         for(z=0; z<=19; z+=4){
            isA=lookup[z];
            isB=lookup[z+1];
            notA=lookup[z+2];
            notB=lookup[z+3];
            delay();
         }
      }
   }
   /* counter clockwise movement */
   if(dir==0){
      for(y=0; y<=steps; y++){
         for(z=19; z>=0; z-=4){
            isA=lookup[z];
            isB=lookup[z-1];
            notA=lookup[z-2];
            notB=lookup[z-3];
            delay( );
         }
      }
   }
}

void main( ){
   while(1){
      /* move forward, 15 degrees (2 steps) */
      move(1, 2);
      /* move backwards, 7.5 degrees (1 step) */
      move(0, 1);
   }
}

Analog-to-digital converters

Proportionality (Vmax = 7.5V):
7.5V 1111 | 7.0V 1110 | 6.5V 1101 | 6.0V 1100
5.5V 1011 | 5.0V 1010 | 4.5V 1001 | 4.0V 1000
3.5V 0111 | 3.0V 0110 | 2.5V 0101 | 2.0V 0100
1.5V 0011 | 1.0V 0010 | 0.5V 0001 | 0V   0000

[Figure: three panels. Proportionality; analog to digital (analog input (V) against time t1–t4, digital output 0100 1000 0110 0101); digital to analog (digital input 0100 1000 0110 0101, analog output (V) against time t1–t4).]

Digital-to-analog conversion using successive approximation

Given an analog input signal whose voltage should range from 0 to 15 volts, and an 8-bit digital encoding, calculate the correct encoding for 5 volts. Then trace the successive-approximation approach to find the correct encoding.

5/15 = d/(2^8 - 1)
d = 85
Encoding: 01010101
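
Both the proportional calculation above and the successive-approximation search for the same encoding can be sketched in C. The function names are illustrative; the second function halves the voltage window once per bit, starting with the most significant bit.

```c
/* Proportional encoding: vin / vmax = d / (2^n - 1), rounded to nearest. */
unsigned adc_encode(double vin, double vmax, unsigned bits) {
    unsigned full_scale = (1u << bits) - 1u;
    return (unsigned)(vin / vmax * (double)full_scale + 0.5);
}

/* Successive approximation: compare vin with the midpoint of the current
   window; a 1 bit raises vmin to the midpoint, a 0 bit lowers vmax. */
unsigned adc_successive(double vin, double vmin, double vmax, unsigned bits) {
    unsigned code = 0;
    for (unsigned i = 0; i < bits; i++) {
        double mid = 0.5 * (vmax + vmin);
        code <<= 1;
        if (vin >= mid) {
            code |= 1u;
            vmin = mid;
        } else {
            vmax = mid;
        }
    }
    return code;
}
```

For 5 volts on a 0 to 15 volt range with 8 bits, both functions return 85 (binary 01010101).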

Successive-approximation method

Step | Midpoint | Result | Encoding so far
1 | ½(Vmax + Vmin) = ½(15 + 0) = 7.5 volts | Vmax = 7.5 volts | 0 0 0 0 0 0 0 0
2 | ½(7.5 + 0) = 3.75 volts | Vmin = 3.75 volts | 0 1 0 0 0 0 0 0
3 | ½(7.5 + 3.75) = 5.63 volts | Vmax = 5.63 volts | 0 1 0 0 0 0 0 0
4 | ½(5.63 + 3.75) = 4.69 volts | Vmin = 4.69 volts | 0 1 0 1 0 0 0 0
5 | ½(5.63 + 4.69) = 5.16 volts | Vmax = 5.16 volts | 0 1 0 1 0 0 0 0
6 | ½(5.16 + 4.69) = 4.93 volts | Vmin = 4.93 volts | 0 1 0 1 0 1 0 0
7 | ½(5.16 + 4.93) = 5.05 volts | Vmax = 5.05 volts | 0 1 0 1 0 1 0 0
8 | ½(5.05 + 4.93) = 4.99 volts | | 0 1 0 1 0 1 0 1

Chapter 5 Memory
Outline
• Memory Write Ability and Storage Permanence
• Common Memory Types
• Composing Memory
• Memory Hierarchy and Cache
• Advanced RAM

Introduction
• Embedded system's functionality aspects
  – Processing
    • processors
    • transformation of data
  – Storage
    • memory
    • retention of data
  – Communication
    • buses
    • transfer of data

Memory: basic concepts
• Stores large number of bits
  – m x n: m words of n bits each
  – k = Log2(m) address input signals
  – or m = 2^k words
  – e.g., 4,096 x 8 memory:
    • 32,768 bits
    • 12 address input signals
    • 8 input/output data signals
• Memory access
  – r/w: selects read or write
  – enable: read or write only when asserted
  – multiport: multiple accesses to different locations simultaneously

[Figure: m × n memory (m words, n bits per word); memory external view: 2^k × n read and write memory with r/w, enable, address inputs A0 … Ak-1 and data lines Qn-1 … Q0.]

Write ability/storage permanence
• Traditional ROM/RAM distinctions
  – ROM
    • read only, bits stored without power
  – RAM
    • read and write, lose stored bits without power
• Traditional distinctions blurred
  – Advanced ROMs can be written to
    • e.g., EEPROM
  – Advanced RAMs can hold bits without power
    • e.g., NVRAM
• Write ability
  – Manner and speed a memory can be written
• Storage permanence

[Figure: Write ability and storage permanence of memories, showing relative degrees along each axis (not to scale).
Storage permanence axis: life of product (mask-programmed ROM, OTP ROM, ideal memory); tens of years (EPROM, EEPROM, FLASH); battery life, 10 years (NVRAM); near zero (SRAM/DRAM). NVRAM and above are nonvolatile.
Write ability axis: during fabrication only; external programmer, one time only; external programmer, 1,000s of cycles; external programmer OR in-system, 1,000s of cycles; external programmer OR in-system, block-oriented writes, 1,000s of cycles; in-system, fast writes, unlimited cycles. The in-system entries are in-system programmable.]

Write ability
• Ranges of write ability
  – High end
    • processor writes to memory simply and quickly
    • e.g., RAM
  – Middle range
    • processor writes to memory, but slower
    • e.g., FLASH, EEPROM
  – Lower range
    • special equipment, "programmer", must be used to write to memory
    • e.g., EPROM, OTP ROM
  – Low end
    • bits stored only during fabrication
    • e.g., Mask-programmed ROM
• In-system programmable memory
  – Can be written to by a processor in the embedded system using the memory
  – Memories in high end and middle range of write ability

Storage permanence
• Range of storage permanence
  – High end
    • essentially never loses bits
    • e.g., mask-programmed ROM
  – Middle range
    • holds bits days, months, or years after memory's power source turned off
    • e.g., NVRAM
  – Lower range
    • holds bits as long as power supplied to memory
    • e.g., SRAM
  – Low end
    • begins to lose bits almost immediately after written
    • e.g., DRAM
• Nonvolatile memory
  – Holds bits after power is no longer supplied
  – High end and middle range of storage permanence
ROM: "Read-Only" Memory
• Nonvolatile memory
• Can be read from but not written to, by a processor in an embedded system
• Traditionally written to, "programmed", before inserting to embedded system
• Uses
  – Store software program for general-purpose processor
    • program instructions can be one or more ROM words

[Figure: External view. 2^k × n ROM with enable, address inputs A0 … Ak-1 and outputs Qn-1 … Q0.]

Example: 8 x 4 ROM
• Horizontal lines = words
• Vertical lines = data
• Lines connected only at circles
• Decoder sets word 2's line to 1 if address input is 010
• Data lines Q3 and Q1 are set to 1 because there is a "programmed" connection with word 2's line
• Word 2 is not connected with data lines Q2 and Q0

[Figure: Internal view. 8 × 4 ROM: a 3×8 decoder with enable and A0–A2 drives word lines word 0, word 1, word 2, …; data lines Q3 Q2 Q1 Q0; programmable connections; wired-OR.]

Implementing combinational function
• Any combinational circuit of n functions of same k variables can be done with 2^k x n ROM

Truth table
Inputs (address) | Outputs
a b c | y z
0 0 0 | 0 0
0 0 1 | 0 1
0 1 0 | 0 1
0 1 1 | 1 0
1 0 0 | 1 0
1 0 1 | 1 1
1 1 0 | 1 1
1 1 1 | 1 1

8×2 ROM contents (inputs a, b, c on the address lines, enable, outputs y and z)
word 0 | 0 0
word 1 | 0 1
word 2 | 0 1
word 3 | 1 0
word 4 | 1 0
word 5 | 1 1
word 6 | 1 1
word 7 | 1 1

Mask-programmed ROM
• Connections "programmed" at fabrication
  – set of masks
• Lowest write ability
  – only once
• Highest storage permanence
  – bits never change unless damaged
• Typically used for final design of high-volume systems
  – spread out NRE cost for a low unit cost
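
For the combinational-function slide above, the 8×2 ROM is simply a lookup table indexed by the inputs. A minimal C sketch (the names are invented; a disabled ROM is assumed to output 0):

```c
#include <stdint.h>

/* 8 x 2 ROM contents: address = abc, word = yz (y in bit 1, z in bit 0). */
static const uint8_t ROM_8X2[8] = {0x0, 0x1, 0x1, 0x2, 0x2, 0x3, 0x3, 0x3};

/* Read the word addressed by inputs a, b, c when the ROM is enabled. */
uint8_t rom_read(int enable, unsigned a, unsigned b, unsigned c) {
    unsigned address = ((a & 1u) << 2) | ((b & 1u) << 1) | (c & 1u);
    return enable ? ROM_8X2[address] : 0;
}

/* Split the 2-bit word into the two function outputs. */
unsigned out_y(uint8_t word) { return (word >> 1) & 1u; }
unsigned out_z(uint8_t word) { return word & 1u; }
```

Reading address 011 returns the word 10, so y = 1 and z = 0, as in the fourth row of the truth table.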

OTP ROM: One-time programmable ROM
• Connections "programmed" after manufacture by user
  – user provides file of desired contents of ROM
  – file input to machine called ROM programmer
  – each programmable connection is a fuse
  – ROM programmer blows fuses where connections should not exist
• Very low write ability
  – typically written only once and requires ROM programmer device
• Very high storage permanence
  – bits don't change unless reconnected to programmer and more fuses blown

EPROM: Erasable programmable ROM
• Programmable component is a MOS transistor
  – Transistor has "floating" gate surrounded by an insulator
  – (a) Negative charges form a channel between source and drain storing a logic 1
  – (b) Large positive voltage at gate causes negative charges to move out of channel and get trapped in floating gate storing a logic 0
  – (c) (Erase) Shining UV rays on surface of floating-gate causes negative charges to return to channel from floating gate restoring the logic 1
  – (d) An EPROM package showing quartz window through which UV light can pass
• Better write ability
  – can be erased and reprogrammed thousands of times
• Reduced storage permanence
  – program lasts about 10 years but is …

[Figure: (a) floating-gate transistor with source and drain, gate at 0V; (b) gate at +15V; (c) erase under UV light, 5-30 min; (d) EPROM package with quartz window.]
EEPROM: Electrically erasable programmable ROM
• Programmed and erased electronically
  – typically by using higher than normal voltage
  – can program and erase individual words
• Better write ability
  – can be in-system programmable with built-in circuit to provide higher than normal voltage
    • built-in memory controller commonly used to hide details from memory user
  – writes very slow due to erasing and programming
    • "busy" pin indicates to processor EEPROM still writing
  – can be erased and programmed tens of thousands of times
• Similar storage permanence to EPROM (about 10 years)

Flash Memory
• Extension of EEPROM
  – Same floating gate principle
  – Same write ability and storage permanence
• Fast erase
  – Large blocks of memory erased at once, rather than one word at a time
  – Blocks typically several thousand bytes large
• Writes to single words may be slower
  – Entire block must be read, word updated, then entire block written back
• Used with embedded systems storing large data items in nonvolatile memory
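
The reason single-word writes to flash may be slow can be shown with a small software model of one block. Everything here is a sketch: the FlashBlock type, the block size of 1024 words and the convention that erased cells read as all 1s are assumptions for illustration, not the interface of a real flash part.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_WORDS 1024        /* hypothetical block size */

typedef struct {
    uint16_t words[BLOCK_WORDS];
    unsigned erase_count;       /* flash endures a limited number of erases */
} FlashBlock;

/* Erase works on the whole block at once; erased cells read as all 1s. */
void flash_erase(FlashBlock *b) {
    memset(b->words, 0xFF, sizeof b->words);
    b->erase_count++;
}

/* Updating one word: read the entire block, update the word in the copy,
   erase the block, then write the entire block back. */
void flash_write_word(FlashBlock *b, unsigned index, uint16_t value) {
    uint16_t copy[BLOCK_WORDS];
    memcpy(copy, b->words, sizeof copy);
    copy[index] = value;
    flash_erase(b);
    memcpy(b->words, copy, sizeof copy);
}
```

Each call to flash_write_word costs one full block erase, which is why embedded software usually batches updates to flash.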

RAM: "Random-access" memory
• Typically volatile memory
  – bits are not held without power supply
• Read and written to easily by embedded system during execution
• Internal structure more complex than ROM
  – a word consists of several memory cells, each storing 1 bit
  – each input and output data line connects to each cell in its column
  – rd/wr connected to every cell
  – when row is enabled by decoder, each cell has logic that stores input data bit

[Figure: external view: 2^k × n read and write memory with r/w, enable, A0 … Ak-1, Qn-1 … Q0. Internal view: 4×4 RAM with a 2×4 decoder (enable, A0, A1), inputs I3 I2 I1 I0, memory cells, rd/wr to every cell, outputs Q3 Q2 Q1 Q0.]

Basic types of RAM
• SRAM: Static RAM
  – Memory cell uses flip-flop to store bit
  – Requires 6 transistors
  – Holds data as long as power supplied
• DRAM: Dynamic RAM
  – Memory cell uses MOS transistor and capacitor to store bit
  – More compact than SRAM
  – "Refresh" required due to capacitor leak

[Figure: memory cell internals. SRAM cell with Data', Data and W lines; DRAM cell with Data and W lines.]

Ram variations
• PSRAM: Pseudo-static RAM
  – DRAM with built-in memory refresh controller
  – Popular low-cost high-density alternative to SRAM
• NVRAM: Nonvolatile RAM
  – Holds data after external power removed
  – Battery-backed RAM
    • SRAM with own permanently connected battery
    • writes as fast as reads
    • no limit on number of writes unlike nonvolatile ROM-based memory
  – SRAM with EEPROM or flash
    • stores complete RAM contents on EEPROM or flash before power turned off

Example: HM6264 & 27C256 RAM/ROM devices
• Low-cost low-capacity memory devices
• Commonly used in 8-bit microcontroller-based embedded systems
• First two numeric digits indicate device type
  – RAM: 62
  – ROM: 27
• Subsequent digits indicate capacity in kilobits

Block diagrams (signal: pins)
HM6264: data<7…0>: 11-13, 15-19; addr<15...0>: 2, 23, 21, 24, 25, 3-10; /OE: 22; /WE: 27; /CS1: 20; CS2: 26
27C256: data<7…0>: 11-13, 15-19; addr<15...0>: 27, 26, 2, 23, 21, 24, 25, 3-10; /OE: 22; /CS: 20

Device characteristics
Device | Access Time (ns) | Standby Pwr. (mW) | Active Pwr. (mW) | Vcc Voltage (V)
HM6264 | 85-100 | .01 | 15 | 5
27C256 | 90 | .5 | 100 | 5

[Figure: timing diagrams for the read operation (data, addr, OE, /CS1, CS2) and the write operation (data, addr, WE, /CS1, CS2).]
Example: TC55V2325FF-100 memory device
• 2-megabit synchronous pipelined burst SRAM memory device
• Designed to be interfaced with 32-bit processors
• Capable of fast sequential reads and writes as well as single byte I/O

Device characteristics
Device            Access Time (ns)   Standby Pwr. (mW)   Active Pwr. (mW)   Vcc Voltage (V)
TC55V2325FF-100   10                 na                  1200               3.3

Block diagram signals: data<31…0>, addr<15…0>, addr<10...0>, /CS1, /CS2, CS3, /WE, /OE, MODE, /ADSP, /ADSC, /ADV, CLK
Timing diagram (a single read operation): CLK, /ADSP, /ADSC, /ADV, addr<15…0>, /WE, /OE, /CS1 and /CS2, CS3, data<31…0>

Composing memory
• Memory size needed often differs from size of readily available memories
• When available memory is larger, simply ignore unneeded high-order address bits and higher data lines
• When available memory is smaller, compose several smaller memories into one larger memory
  – Connect side-by-side to increase width of words
  – Connect top to bottom to increase number of words
    • added high-order address line selects smaller memory containing desired word using a decoder
  – Combine techniques to increase number and width of words

Figures:
– Increase width of words: three 2^m × n ROMs share the address lines and enable, forming a 2^m × 3n ROM with outputs Q3n-1 … Q0
– Increase number of words: two 2^m × n ROMs (address lines A0 … Am-1) plus a 1×2 decoder driven by Am form a 2^(m+1) × n ROM with outputs Qn-1 … Q0
– Increase number and width of words: both techniques combined
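The two composition techniques can be sketched in C. This is only an illustration under stated assumptions: the small ROMs are modeled as arrays, m = 4 and n = 8 are chosen arbitrarily, and the names read_deep and read_wide are hypothetical, not from the slides.

```c
#include <stdint.h>

#define M     4                  /* address bits per small ROM (assumed) */
#define WORDS (1u << M)          /* 2^m words per small ROM */

/* Increase number of words: the added high-order address bit A_m drives a
   1x2 decoder that enables exactly one of the two 2^m x 8 ROMs. */
uint8_t read_deep(const uint8_t rom0[WORDS], const uint8_t rom1[WORDS],
                  unsigned addr)
{
    unsigned select = (addr >> M) & 1u;      /* decoder input A_m */
    unsigned low    = addr & (WORDS - 1u);   /* A_0 .. A_(m-1) go to both ROMs */
    return select ? rom1[low] : rom0[low];
}

/* Increase width of words: both ROMs see the same address and enable; their
   8-bit outputs sit side by side to form one 16-bit word. */
uint16_t read_wide(const uint8_t rom_hi[WORDS], const uint8_t rom_lo[WORDS],
                   unsigned addr)
{
    unsigned low = addr & (WORDS - 1u);
    return (uint16_t)(((uint16_t)rom_hi[low] << 8) | rom_lo[low]);
}
```

Combining both functions (a decoder selecting between pairs of side-by-side ROMs) gives the third figure: more words and wider words at once.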

Memory hierarchy
• Want inexpensive, fast memory
• Main memory
  – Large, inexpensive, slow memory stores entire program and data
• Cache
  – Small, expensive, fast memory stores copy of likely accessed parts of larger memory
Figure: hierarchy from Processor through Registers, Cache, Main memory and Disk to Tape

Cache
• Usually designed with SRAM
  – faster but more expensive than DRAM
• Usually on same chip as processor
  – space limited, so much smaller than off-chip main memory
  – faster access (1 cycle vs. several cycles for main memory)
• Cache operation:
  – Request for main memory access (read or write)
  – First, check cache for copy
    • cache hit
      – copy is in cache, quick access
    • cache miss
      – copy not in cache, read address and possibly its neighbors into cache
• Several cache design choices
  – cache mapping, replacement policies, and write techniques

Cache mapping
• Far fewer cache addresses available than main memory addresses
• Are an address's contents in cache?
• Cache mapping used to assign main memory address to cache address and determine hit or miss
• Three basic techniques:
  – Direct mapping
  – Fully associative mapping
  – Set-associative mapping
• Caches partitioned into indivisible blocks or lines of adjacent memory addresses
  – usually 4 or 8 addresses per line

Direct mapping
• Main memory address divided into 2 fields
  – Index
    • cache address
    • number of bits determined by cache size
  – Tag
    • compared with tag stored in cache at address indicated by index
    • if tags match, check valid bit
• Valid bit
  – indicates whether data in slot has been loaded from memory
• Offset
  – used to find particular word in cache line
Figure: address fields Tag | Index | Offset; each cache entry holds V (valid), T (tag) and D (data); a comparator (=) checks the stored tag against the address tag
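A minimal sketch of a direct-mapped lookup, assuming a 16-bit address, 4 words per line and 16 lines. These sizes, the CacheLine type and the function names are illustrative assumptions, not taken from the slides.

```c
#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 2                     /* 4 words per line (assumed) */
#define INDEX_BITS  4                     /* 16 lines in the cache (assumed) */
#define NUM_LINES   (1u << INDEX_BITS)
#define LINE_WORDS  (1u << OFFSET_BITS)

typedef struct {
    bool     valid;                       /* V: line has been loaded from memory */
    uint16_t tag;                         /* T: high-order address bits */
    uint8_t  data[LINE_WORDS];            /* D: one line of adjacent words */
} CacheLine;

static unsigned addr_offset(uint16_t a) { return a & (LINE_WORDS - 1u); }
static unsigned addr_index(uint16_t a)  { return (a >> OFFSET_BITS) & (NUM_LINES - 1u); }
static unsigned addr_tag(uint16_t a)    { return (unsigned)a >> (OFFSET_BITS + INDEX_BITS); }

/* Returns true on a hit and stores the requested word in *out. */
bool cache_lookup(const CacheLine cache[NUM_LINES], uint16_t addr, uint8_t *out)
{
    const CacheLine *line = &cache[addr_index(addr)];
    if (line->valid && line->tag == addr_tag(addr)) {
        *out = line->data[addr_offset(addr)];
        return true;
    }
    return false;
}

/* On a miss, load the whole line (the address and its neighbors) from memory. */
void cache_fill(CacheLine cache[NUM_LINES], uint16_t addr, const uint8_t *memory)
{
    CacheLine *line = &cache[addr_index(addr)];
    unsigned base = addr & ~(LINE_WORDS - 1u);    /* first address of the line */
    for (unsigned i = 0; i < LINE_WORDS; i++)
        line->data[i] = memory[base + i];
    line->tag   = (uint16_t)addr_tag(addr);
    line->valid = true;
}
```

Two addresses with the same index but different tags compete for the same line, which is why a direct-mapped cache has no replacement choice.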

Fully associative mapping
• Complete main memory address stored in each cache address
• All addresses stored in cache simultaneously compared with desired address
• Valid bit and offset same as direct mapping
Figure: address fields Tag | Offset; every cache entry (V, T, D) has its own comparator (=)

Set-associative mapping
• Compromise between direct mapping and fully associative mapping
• Index same as in direct mapping
• But, each cache address contains content and tags of 2 or more memory address locations
• Tags of that set simultaneously compared as in fully associative mapping
• Cache with set size N called N-way set-associative
  – 2-way, 4-way, 8-way are common
Figure: address fields Tag | Index | Offset; each cache address holds two (V, T, D) entries, each with its own comparator (=)

Cache-replacement policy
• Technique for choosing which block to replace
  – when fully associative cache is full
  – when set-associative cache’s line is full
• Direct mapped cache has no choice
• Random
  – replace block chosen at random
• LRU: least-recently used
  – replace block not accessed for longest time
• FIFO: first-in-first-out
  – push block onto queue when accessed
  – choose block to replace by popping queue

Cache write techniques
• When written, data cache must update main memory
• Write-through
  – write to main memory whenever cache is written to
  – easiest to implement
  – processor must wait for slower main memory write
  – potential for unnecessary writes
• Write-back
  – main memory only written when “dirty” block replaced
  – extra dirty bit for each block set when cache block written to
  – reduces number of slow main memory writes
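A short sketch of LRU victim selection within one set, combined with the write-back dirty bit. The 4-way set size, the Block type, the timestamp field and the function names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define WAYS 4   /* 4-way set (assumed) */

typedef struct {
    bool     valid;
    bool     dirty;       /* set when the block is written (write-back) */
    uint32_t tag;
    uint32_t last_used;   /* timestamp of most recent access */
} Block;

/* LRU: prefer an empty slot, otherwise pick the block not accessed for the
   longest time (smallest timestamp). */
int choose_victim(const Block set[WAYS])
{
    int victim = 0;
    for (int i = 0; i < WAYS; i++) {
        if (!set[i].valid) return i;
        if (set[i].last_used < set[victim].last_used) victim = i;
    }
    return victim;
}

/* Write-back: only a dirty victim costs a slow main memory write. */
bool needs_write_back(const Block *victim)
{
    return victim->valid && victim->dirty;
}
```

With write-through, needs_write_back would never be consulted, because main memory is updated on every cache write instead.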

Cache impact on system performance
• Most important parameters in terms of performance:
  – Total size of cache
    • total number of data bytes cache can hold
    • tag, valid and other housekeeping bits not included in total
  – Degree of associativity
  – Data block size
• Larger caches achieve lower miss rates but higher access cost
  – e.g.,
    • 2 Kbyte cache: miss rate = 15%, hit cost = 2 cycles, miss cost = 20 cycles
      – avg. cost of memory access = (0.85 * 2) + (0.15 * 20) = 4.7 cycles
    • 4 Kbyte cache: miss rate = 6.5%, hit cost = 3 cycles, miss cost will not change
      – avg. cost of memory access = (0.935 * 3) + (0.065 * 20) = 4.105 cycles (improvement)

Cache performance trade-offs
• Improving cache hit rate without increasing size
  – Increase line size
  – Change set-associativity
Figure: % cache miss (0 to 0.16) versus cache size (1 Kb to 128 Kb) for 1-way, 2-way, 4-way and 8-way associativity
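The average-cost arithmetic above follows one formula: (1 - miss rate) × hit cost + miss rate × miss cost. A small C helper (the name avg_access_cost is hypothetical) makes it easy to try other cache sizes:

```c
/* Average cost of a memory access, in cycles.
   miss_rate is a fraction between 0 and 1. */
double avg_access_cost(double miss_rate, double hit_cost, double miss_cost)
{
    return (1.0 - miss_rate) * hit_cost + miss_rate * miss_cost;
}
```

Calling it with (0.15, 2, 20) and (0.065, 3, 20) reproduces the 2 Kbyte and 4 Kbyte figures from the slide.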

Advanced RAM
• DRAMs commonly used as main memory in processor based embedded systems
  – high capacity, low cost
• Many variations of DRAMs proposed
  – need to keep pace with processor speeds
  – FPM DRAM: fast page mode DRAM
  – EDO DRAM: extended data out DRAM
  – SDRAM/ESDRAM: synchronous and enhanced synchronous DRAM
  – RDRAM: rambus DRAM

Basic DRAM
• Address bus multiplexed between row and column components
• Row and column addresses are latched in, sequentially, by strobing ras and cas signals, respectively
• Refresh circuitry can be external or internal to DRAM device
  – strobes consecutive memory addresses
Figure: block diagram with Refresh Circuit, Row Addr. Buffer, Col Addr. Buffer, Row Decoder, Col Decoder, Sense Amplifiers, Bit storage array, Data In Buffer and Data Out Buffer; signals: address, ras, cas, rd/wr, clock, data

Fast Page Mode DRAM (FPM DRAM)
• Each row of memory bit array is viewed as a page
• Page contains multiple words
• Individual words addressed by column address
• Timing diagram:
  – row (page) address sent
  – 3 words read consecutively by sending column address for each
• Extra cycle eliminated on each read/write of words from same page
Figure: timing diagram with ras, cas, address (row, col, col, col) and data (data, data, data)

Extended data out DRAM (EDO DRAM)
• Improvement of FPM DRAM
• Extra latch before output buffer
  – allows strobing of cas before data read operation completed
• Reduces read/write latency by additional cycle
Figure: timing diagram with ras, cas, address (row, col, col, col) and data (data, data, data); speedup through overlap

(S)ynchronous and Enhanced Synchronous (ES) DRAM
• SDRAM latches data on active edge of clock
• Eliminates time to detect ras/cas and rd/wr signals
• A counter is initialized to column address then incremented on active edge of clock to access consecutive memory locations
• ESDRAM improves SDRAM
  – added buffers enable overlapping of column addressing
  – faster clocking and lower read/write latency possible
Figure: timing diagram with clock, ras, cas, address (row, col) and data (data, data, data)

Rambus DRAM (RDRAM)
• More of a bus interface architecture than DRAM architecture
• Data is latched on both rising and falling edge of clock
• Broken into 4 banks each with own row decoder
  – can have 4 pages open at a time
• Capable of very high throughput

DRAM integration problem
• SRAM easily integrated on same chip as processor
• DRAM more difficult
  – Different chip making process between DRAM and conventional logic
  – Goal of conventional logic (IC) designers:
    • minimize parasitic capacitance to reduce signal propagation delays and power consumption
  – Goal of DRAM designers:
    • create capacitor cells to retain stored information
  – Integration processes beginning to appear

Memory Management Unit (MMU)
• Duties of MMU
  – Handles DRAM refresh, bus interface and arbitration
  – Takes care of memory sharing among multiple processors
  – Translates logical memory addresses from processor to physical memory addresses of DRAM
• Modern CPUs often come with MMU built-in
• Single-purpose processors can be used

Chapter 6 Interfacing

Outline
• Interfacing basics
• Microprocessor interfacing
  – I/O Addressing
  – Interrupts
  – Direct memory access
• Arbitration
• Hierarchical buses
• Protocols
  – Serial
  – Parallel
  – Wireless

Introduction
• Embedded system functionality aspects
  – Processing
    • Transformation of data
    • Implemented using processors
  – Storage
    • Retention of data
    • Implemented using memory
  – Communication
    • Transfer of data between processors and memories
    • Implemented using buses
    • Called interfacing

A simple bus
• Wires:
  – Uni-directional or bi-directional
  – One line may represent multiple wires
• Bus
  – Set of wires with a single function
    • Address bus, data bus
  – Or, entire collection of wires
    • Address, data and control
    • Associated protocol: rules for communication
Figure: bus structure connecting Processor and Memory with rd'/wr, enable, addr[0-11] and data[0-7]

Ports
• Conducting device on periphery
• Connects bus to processor or memory
• Often referred to as a pin
  – Actual pins on periphery of IC package that plug into socket on printed-circuit board
  – Sometimes metallic balls instead of pins
  – Today, metal “pads” connecting processors and memories within single IC
• Single wire or set of wires with single function
  – E.g., 12-wire address port
Figure: Processor and Memory ports on a bus: rd'/wr, enable, addr[0-11], data[0-7]

Timing Diagrams
• Most common method for describing a communication protocol
• Time proceeds to the right on x-axis
• Control signal: low or high
  – May be active low (e.g., go’, /go, or go_L)
  – Use terms assert (active) and deassert
  – Asserting go’ means go=0
• Data signal: not valid or valid
• Protocol may have subprotocols
  – Called bus cycle, e.g., read and write
  – Each may be several clock cycles
• Read example
  – rd’/wr set low, address placed on addr for at least tsetup time before enable asserted, enable triggers memory to place data on data wires by time tread
Figures: read protocol (rd'/wr, enable, addr, data; tsetup, tread) and write protocol (rd'/wr, enable, addr, data; tsetup, twrite)

Basic protocol concepts
• Actor: master initiates, servant (slave) responds
• Direction: sender, receiver
• Addresses: special kind of data
  – Specifies a location in memory, a peripheral, or a register within a peripheral
• Time multiplexing
  – Share a single set of wires for multiple pieces of data
  – Saves wires at expense of time
Figure: time-multiplexed data transfer
– data serializing: master sends data(15:0) over data(8) as bits 15:8, then bits 7:0, one transfer per req
– address/data muxing: master sends addr, then data, over shared addr/data wires, one transfer per req

Basic protocol concepts: control methods
Strobe protocol (wires: req, data)
1. Master asserts req to receive data
2. Servant puts data on bus within time taccess
3. Master receives data and deasserts req
4. Servant ready for next request

Handshake protocol (wires: req, ack, data)
1. Master asserts req to receive data
2. Servant puts data on bus and asserts ack
3. Master receives data and deasserts req
4. Servant ready for next request
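Data serializing can be sketched in C as follows. The function names are hypothetical, and the high-byte-first order is taken from the 15:8 then 7:0 labels in the figure.

```c
#include <stdint.h>

/* Sender side: split a 16-bit word into two 8-bit bus transfers,
   high byte first. */
void serialize16(uint16_t word, uint8_t bus_values[2])
{
    bus_values[0] = (uint8_t)(word >> 8);     /* first req: bits 15:8 */
    bus_values[1] = (uint8_t)(word & 0xFFu);  /* second req: bits 7:0 */
}

/* Receiver side: rebuild the word from the two transfers. */
uint16_t deserialize16(const uint8_t bus_values[2])
{
    return (uint16_t)(((uint16_t)bus_values[0] << 8) | bus_values[1]);
}
```

The trade-off named on the slide is visible here: 8 data wires instead of 16, at the cost of two transfers per word.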

A strobe/handshake compromise
To achieve both the speed of strobe & varying response time tolerance of handshake
Wires: req, wait, data

Fast-response case
1. Master asserts req to receive data
2. Servant puts data on bus within time taccess (wait line is unused)
3. Master receives data and deasserts req
4. Servant ready for next request

Slow-response case
1. Master asserts req to receive data
2. Servant can't put data within taccess, asserts wait
3. Servant puts data on bus and deasserts wait
4. Master receives data and deasserts req
5. Servant ready for next request

ISA bus protocol – memory access
• ISA: Industry Standard Architecture
  – Common in 80x86’s
• Features
  – 20-bit address
  – Compromise strobe/handshake control
    • 4 cycles default
    • Unless CHRDY deasserted – resulting in additional wait cycles (up to 6)
Figures: Microprocessor, Memory and I/O Device on the ISA bus; memory-read bus cycle (cycles C1, C2, WAIT, C3, C4; signals CLOCK, D[7-0] DATA, A[19-0] ADDRESS, ALE, /MEMR, CHRDY) and memory-write bus cycle (same, with /MEMW)

Microprocessor interfacing: I/O addressing
• A microprocessor communicates with other devices using some of its pins
  – Port-based I/O (parallel I/O)
    • Processor has one or more N-bit ports
    • Processor’s software reads and writes a port just like a register
    • E.g., P0 = 0xFF; v = P1.2; -- P0 and P1 are 8-bit ports
  – Bus-based I/O
    • Processor has address, data and control ports that form a single bus
    • Communication protocol is built into the processor
    • A single instruction carries out the read or write protocol on the bus

Compromises/extensions
• Parallel I/O peripheral
  – When processor only supports bus-based I/O but parallel I/O needed
  – Each port on peripheral connected to a register within peripheral that is read/written by the processor
• Extended parallel I/O
  – When processor supports port-based I/O but more ports needed
  – One or more processor ports interface with parallel I/O peripheral extending total number of ports available for I/O
  – e.g., extending 4 ports to 6 ports in figure
Figures:
– Adding parallel I/O to a bus-based I/O processor: Processor and Memory on the system bus, with a parallel I/O peripheral providing Port A, Port B and Port C
– Extended parallel I/O: processor Ports 0 to 3, with one port driving a parallel I/O peripheral that provides Port A, Port B and Port C

Types of bus-based I/O: memory-mapped I/O and standard I/O
• Processor talks to both memory and peripherals using same bus – two ways to talk to peripherals
  – Memory-mapped I/O
    • Peripheral registers occupy addresses in same address space as memory
    • e.g., Bus has 16-bit address
      – lower 32K addresses may correspond to memory
      – upper 32K addresses may correspond to peripherals
  – Standard I/O (I/O-mapped I/O)
    • Additional pin (M/IO) on bus indicates whether a memory or peripheral access
    • e.g., Bus has 16-bit address
      – all 64K addresses correspond to memory when M/IO set to 0
      – all 64K addresses correspond to peripherals when M/IO set to 1

Memory-mapped I/O vs. Standard I/O
• Memory-mapped I/O
  – Requires no special instructions
    • Assembly instructions involving memory like MOV and ADD work with peripherals as well
    • Standard I/O requires special instructions (e.g., IN, OUT) to move data between peripheral registers and memory
• Standard I/O
  – No loss of memory addresses to peripherals
  – Simpler address decoding logic in peripherals possible
    • When number of peripherals much smaller than address space then high-order address bits can be ignored
      – smaller and/or faster comparators
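The difference between the two schemes comes down to how a bus access is decoded. The sketch below assumes the 16-bit example from the slide (lower 32K memory, upper 32K peripherals for the memory-mapped case); the Target type and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { TARGET_MEMORY, TARGET_PERIPHERAL } Target;

/* Memory-mapped I/O: the address alone selects the target.
   Address bit 15 set means the upper 32K, assumed to hold peripherals. */
Target decode_memory_mapped(uint16_t addr)
{
    return (addr & 0x8000u) ? TARGET_PERIPHERAL : TARGET_MEMORY;
}

/* Standard I/O: the extra M/IO pin selects the target, so all 64K
   addresses remain available in each space. */
Target decode_standard_io(uint16_t addr, bool m_io)
{
    (void)addr;   /* the address plays no part in choosing the space */
    return m_io ? TARGET_PERIPHERAL : TARGET_MEMORY;
}
```

In the standard I/O case the same address can name a memory word or a peripheral register, which is why no memory addresses are lost to peripherals.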

ISA bus
• ISA supports standard I/O
  – /IOR distinct from /MEMR for peripheral read
    • /IOW used for writes
  – 16-bit address space for I/O vs. 20-bit address space for memory
  – Otherwise very similar to memory protocol
Figure: ISA I/O bus read protocol (cycles C1, C2, WAIT, C3, C4; signals CLOCK, D[7-0] DATA, A[15-0] ADDRESS, ALE, /IOR, CHRDY)

A basic memory protocol
• Interfacing an 8051 to external memory
  – Ports P0 and P2 support port-based I/O when 8051 internal memory being used
  – Those ports serve as data/address buses when external memory is being used
  – 16-bit address and 8-bit data are time multiplexed; low 8-bits of address must therefore be latched with aid of ALE signal
Figure: 8051 (P0, P2, ALE, /WR, /RD, /PSEN) connected through a 74373 latch (D, Q, G) to an HM6264 (D<0...7>, A<0...15>, /CS1, CS2, /OE, /WE) and a 27C256 (D<0...7>, A<0...14>, /CS, /OE); timing shows P0 carrying Adr. 7…0 then Data, P2 carrying Adr. 15…8, and latch output Q holding Adr. 7…0

A more complex memory protocol
• Generates control signals to drive the TC55V2325FF memory chip in burst mode
  – Addr0 is the starting address input to device
  – GO is enable/disable input to device
Figure: specification for a single read operation (CLK, /ADSP, /ADSC, /ADV, addr<15…0>, /WE, /OE, /CS1 and /CS2, CS3, data<31…0>)
FSM description (transitions between states are labeled GO=0 or GO=1):
State   ADSP   ADSC   ADV   OE   Addr
S0      1      1      1     1    ‘Z’
S1      0      0      0     1    Addr0
S2      1      0      1     1    ‘Z’
S3      1      1      0     0    ‘Z’    (data is ready here!)

Microprocessor interfacing: interrupts
• Suppose a peripheral intermittently receives data, which must be serviced by the processor
  – The processor can poll the peripheral regularly to see if data has arrived – wasteful
  – The peripheral can interrupt the processor when it has data
• Requires an extra pin or pins: Int
  – If Int is 1, processor suspends current program, jumps to an Interrupt Service Routine, or ISR
  – Known as interrupt-driven I/O
  – Essentially, “polling” of the interrupt pin is built-into the hardware, so no extra time!

Microprocessor interfacing: interrupts
• What is the address (interrupt address vector) of the ISR?
  – Fixed interrupt
    • Address built into microprocessor, cannot be changed
    • Either ISR stored at address or a jump to actual ISR stored if not enough bytes available
  – Vectored interrupt
    • Peripheral must provide the address
    • Common when microprocessor has multiple peripherals connected by a system bus
  – Compromise: interrupt address table

Interrupt-driven I/O using fixed ISR location
Time sequence:
1(a): µP is executing its main program.
1(b): P1 receives input data in a register with address 0x8000.
2: P1 asserts Int to request servicing by the microprocessor.
3: After completing instruction at 100, µP sees Int asserted, saves the PC’s value of 100, and sets PC to the ISR fixed location of 16.
4(a): The ISR reads data from 0x8000, modifies the data, and writes the resulting data to 0x8001.
4(b): After being read, P1 de-asserts Int.
5: The ISR returns, thus restoring PC to 100+1=101, where µP resumes executing.

Interrupt-driven I/O using fixed ISR location (figure, steps 1 to 4)
Program memory:
  ISR
  16: MOV R0, 0x8000
  17: # modifies R0
  18: MOV 0x8001, R0
  19: RETI # ISR return
  ...
  Main program
  ...
  100: instruction
  101: instruction
µP (PC, Int input) connects over the system bus to data memory and to peripherals P1 (register 0x8000) and P2 (register 0x8001).
Step 1: µP is executing its main program; P1 receives input data in a register with address 0x8000.
Step 2: P1 asserts Int (Int = 1) to request servicing by the microprocessor.
Step 3: After completing instruction at 100, µP sees Int asserted, saves the PC’s value of 100, and sets PC to the ISR fixed location of 16.
Step 4: The ISR reads data from 0x8000, modifies the data, and writes the resulting data to 0x8001; after being read, P1 deasserts Int (Int = 0).

Interrupt-driven I/O using fixed ISR location (figure, step 5)
Step 5: The ISR returns, thus restoring PC to 100+1=101, where µP resumes executing.

Interrupt-driven I/O using vectored interrupt
Time sequence:
1(a): µP is executing its main program.
1(b): P1 receives input data in a register with address 0x8000.
2: P1 asserts Int to request servicing by the microprocessor.
3: After completing instruction at 100, µP sees Int asserted, saves the PC’s value of 100, and asserts Inta.
4: P1 detects Inta and puts interrupt address vector 16 on the data bus.
5(a): µP jumps to the address on the bus (16). The ISR there reads data from 0x8000, modifies the data, and writes the resulting data to 0x8001.
5(b): After being read, P1 deasserts Int.
6: The ISR returns, thus restoring PC to 100+1=101, where µP resumes executing.

Interrupt-driven I/O using vectored interrupt (figure, steps 1 to 6)
Program memory:
  ISR
  16: MOV R0, 0x8000
  17: # modifies R0
  18: MOV 0x8001, R0
  19: RETI # ISR return
  ...
  Main program
  ...
  100: instruction
  101: instruction
µP (PC, Int input, Inta output) connects over the system bus to data memory and to peripherals P1 (register 0x8000, interrupt address vector 16) and P2 (register 0x8001).
Step 1: µP is executing its main program; P1 receives input data in a register with address 0x8000.
Step 2: P1 asserts Int (Int = 1) to request servicing by the microprocessor.
Step 3: After completing instruction at 100, µP sees Int asserted, saves the PC’s value of 100, and asserts Inta (Inta = 1).
Step 4: P1 detects Inta and puts interrupt address vector 16 on the data bus.
Step 5: µP jumps to the address on the bus (16). The ISR there reads data from 0x8000, modifies the data, and writes the resulting data to 0x8001; after being read, P1 deasserts Int (Int = 0).
Step 6: The ISR returns, thus restoring the PC to 100+1=101, where the µP resumes executing.

Interrupt address table
• Compromise between fixed and vectored interrupts
  – One interrupt pin
  – Table in memory holding ISR addresses (maybe 256 words)
  – Peripheral doesn’t provide ISR address, but rather index into table
    • Fewer bits are sent by the peripheral
    • Can move ISR location without changing peripheral

Additional interrupt issues
• Maskable vs. non-maskable interrupts
  – Maskable: programmer can set bit that causes processor to ignore interrupt
    • Important when in the middle of time-critical code
  – Non-maskable: a separate interrupt pin that can’t be masked
    • Typically reserved for drastic situations, like power failure requiring immediate backup of data to non-volatile memory
• Jump to ISR
  – Some microprocessors treat jump same as call of any subroutine
    • Complete state saved (PC, registers) – may take hundreds of cycles
  – Others only save partial state, like PC only
    • Thus, ISR must not modify registers, or else must save them first
    • Assembly-language programmer must be aware of which registers stored
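The interrupt address table can be modeled in portable C as an array of function pointers indexed by the value the peripheral supplies. This is a host-side sketch only: the names install_isr, dispatch_interrupt and last_serviced are hypothetical, and a real processor performs the lookup in hardware.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 256

typedef void (*Isr)(void);

static Isr interrupt_table[TABLE_SIZE];   /* table in memory holding ISR addresses */

int last_serviced = -1;                   /* records which ISR ran (illustration only) */
void peripheral1_isr(void) { last_serviced = 1; }
void peripheral2_isr(void) { last_serviced = 2; }

/* The ISR location can move without changing the peripheral:
   only the table entry is rewritten. */
void install_isr(uint8_t index, Isr isr)
{
    interrupt_table[index] = isr;
}

/* The peripheral sends only an 8-bit index; the ISR address comes from the table. */
int dispatch_interrupt(uint8_t index)
{
    Isr isr = interrupt_table[index];
    if (isr == NULL) return -1;           /* no ISR installed for this index */
    isr();
    return 0;
}
```

An 8-bit index covers all 256 table entries, which is the sense in which fewer bits are sent by the peripheral than a full ISR address would need.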

Direct memory access
• Buffering
  – Temporarily storing data in memory before processing
  – Data accumulated in peripherals commonly buffered
• Microprocessor could handle this with ISR
  – Storing and restoring microprocessor state inefficient
  – Regular program must wait
• DMA controller more efficient
  – Separate single-purpose processor
  – Microprocessor relinquishes control of system bus to DMA controller
  – Microprocessor can meanwhile execute its regular program
    • No inefficient storing and restoring state due to ISR call
    • Regular program need not wait unless it requires the system bus
      – Harvard architecture – processor can fetch and execute instructions as long as they don’t access data memory – if they do, processor stalls

Peripheral to memory transfer without DMA, using vectored interrupt
Time sequence:
1(a): µP is executing its main program.
1(b): P1 receives input data in a register with address 0x8000.
2: P1 asserts Int to request servicing by the microprocessor.
3: After completing instruction at 100, µP sees Int asserted, saves the PC’s value of 100, and asserts Inta.
4: P1 detects Inta and puts interrupt address vector 16 on the data bus.
5(a): µP jumps to the address on the bus (16). The ISR there reads data from 0x8000 and then writes it to 0x0001, which is in memory.
5(b): After being read, P1 deasserts Int.
6: The ISR returns, thus restoring PC to 100+1=101, where µP resumes executing.

Peripheral to memory transfer without DMA, using vectored interrupt (figure, steps 1 to 6)
Program memory:
  ISR
  16: MOV R0, 0x8000
  17: # modifies R0
  18: MOV 0x0001, R0
  19: RETI # ISR return
  ...
  Main program
  ...
  100: instruction
  101: instruction
µP (PC, Int input, Inta output) connects over the system bus to data memory (0x0000, 0x0001) and to peripheral P1 (register 0x8000, interrupt address vector 16).
Step 1: µP is executing its main program; P1 receives input data in a register with address 0x8000.
Step 2: P1 asserts Int (Int = 1) to request servicing by the microprocessor.
Step 3: After completing instruction at 100, µP sees Int asserted, saves the PC’s value of 100, and asserts Inta (Inta = 1).
Step 4: P1 detects Inta and puts interrupt address vector 16 on the data bus.
Step 5: µP jumps to the address on the bus (16). The ISR there reads data from 0x8000 and then writes it to 0x0001, which is in memory; after being read, P1 de-asserts Int (Int = 0).
Step 6: The ISR returns, thus restoring PC to 100+1=101, where µP resumes executing.

Peripheral to memory transfer with DMA
Time sequence:
1(a): µP is executing its main program. It has already configured the DMA ctrl registers.
1(b): P1 receives input data in a register with address 0x8000.
2: P1 asserts req to request servicing by DMA ctrl.
3: DMA ctrl asserts Dreq to request control of system bus.
4: After executing instruction 100, µP sees Dreq asserted, releases the system bus, asserts Dack, and resumes execution. µP stalls only if it needs the system bus to continue executing.
5: DMA ctrl (a) asserts ack, (b) reads data from 0x8000, and (c) writes that data to 0x0001.
6: DMA de-asserts Dreq and ack, completing handshake with P1.
7(a): µP de-asserts Dack and resumes control of the bus.
7(b): P1 de-asserts req.

Peripheral to memory transfer with DMA (cont’) (figure, step 1)
No ISR needed! Program memory holds only the main program (..., 100: instruction, 101: instruction). µP (PC, Dreq, Dack) and DMA ctrl (registers holding 0x0001 and 0x8000; ack, req) share the system bus with data memory (0x0000, 0x0001) and P1 (register 0x8000).
Step 1: µP is executing its main program. It has already configured the DMA ctrl registers; P1 receives input data in a register with address 0x8000.

Peripheral to memory transfer with DMA (cont’) (figure, steps 2 to 6)
Same system as in step 1: no ISR needed; µP (PC, Dreq, Dack), DMA ctrl (registers holding 0x0001 and 0x8000; ack, req), data memory (0x0000, 0x0001) and P1 (register 0x8000) on the system bus.
Step 2: P1 asserts req (req = 1) to request servicing by DMA ctrl.
Step 3: DMA ctrl asserts Dreq (Dreq = 1) to request control of system bus.
Step 4: After executing instruction 100, µP sees Dreq asserted, releases the system bus, asserts Dack (Dack = 1), and resumes execution. µP stalls only if it needs the system bus to continue executing.
Step 5: DMA ctrl (a) asserts ack (ack = 1), (b) reads data from 0x8000, and (c) writes that data to 0x0001. (Meanwhile, processor still executing if not stalled!)
Step 6: DMA de-asserts Dreq and ack (Dreq = 0, ack = 0), completing the handshake with P1.
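The DMA sequence can be traced with a small host-side model in C. It is a simplification under stated assumptions: the signal changes happen one after another in a single function rather than concurrently, `space` stands in for the 64K address space on the system bus, and the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t src;   /* configured by the processor: peripheral register address */
    uint16_t dst;   /* configured by the processor: destination in data memory */
} DmaConfig;

typedef struct {
    bool req;       /* P1 -> DMA ctrl */
    bool ack;       /* DMA ctrl -> P1 */
    bool dreq;      /* DMA ctrl -> processor */
    bool dack;      /* processor -> DMA ctrl */
} Signals;

/* Walks through steps 3 to 7 for one byte.
   Returns false if P1 has not asserted req (step 2 has not happened). */
bool dma_service(const DmaConfig *cfg, Signals *s, uint8_t *space)
{
    if (!s->req) return false;
    s->dreq = true;                       /* 3: request control of the system bus */
    s->dack = true;                       /* 4: processor releases the bus */
    s->ack  = true;                       /* 5(a): acknowledge P1 */
    space[cfg->dst] = space[cfg->src];    /* 5(b), 5(c): read peripheral, write memory */
    s->dreq = false;                      /* 6: DMA ctrl completes the handshake */
    s->ack  = false;
    s->dack = false;                      /* 7(a): processor resumes control of bus */
    s->req  = false;                      /* 7(b): P1 de-asserts req */
    return true;
}
```

Note that no processor state is saved or restored anywhere in the sequence, which is the efficiency gain over the ISR-based transfer.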

ISA bus DMA cycles
Figure: Processor, Memory, DMA and I/O Device on the ISA bus (R and A lines between DMA and processor, and between DMA and I/O device)
– DMA memory-write bus cycle: cycles C1 to C7; signals CLOCK, D[7-0] DATA, A[19-0] ADDRESS, ALE, /IOR, /MEMW, CHRDY
– DMA memory-read bus cycle: cycles C1 to C7; signals CLOCK, D[7-0] DATA, A[19-0] ADDRESS, ALE, /MEMR, /IOW, CHRDY

Arbitration: Priority arbiter
• Consider the situation where multiple peripherals request service from single resource (e.g., microprocessor, DMA controller) simultaneously - which gets serviced first?
• Priority arbiter
  – Single-purpose processor
  – Peripherals make requests to arbiter, arbiter makes requests to resource
  – Arbiter connected to system bus for configuration only
Figure: microprocessor (Int, Inta), priority arbiter (Ireq1, Iack1, Ireq2, Iack2), Peripheral1 and Peripheral2 on the system bus

Arbitration using a priority arbiter
Figure: microprocessor (Int, Inta), priority arbiter (Ireq1, Iack1, Ireq2, Iack2), Peripheral1 and Peripheral2 on the system bus, annotated with the step numbers below
1. Microprocessor is executing its program.
2. Peripheral1 needs servicing so asserts Ireq1. Peripheral2 also needs servicing so asserts Ireq2.
3. Priority arbiter sees at least one Ireq input asserted, so asserts Int.
4. Microprocessor stops executing its program and stores its state.
5. Microprocessor asserts Inta.
6. Priority arbiter asserts Iack1 to acknowledge Peripheral1.
7. Peripheral1 puts its interrupt address vector on the system bus.
8. Microprocessor jumps to the address of ISR read from data bus, ISR executes and returns (and completes handshake with arbiter).
9. Microprocessor resumes executing its program.

Arbitration: Priority arbiter
• Types of priority
• Fixed priority
  – each peripheral has unique rank
  – highest rank chosen first with simultaneous requests
  – preferred when clear difference in rank between peripherals
• Rotating priority (round-robin)
  – priority changed based on history of servicing
  – better distribution of servicing especially among peripherals with similar priority demands
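The two priority types differ only in where the search for a pending request starts. A minimal sketch, assuming four request inputs packed into a bit mask with channel 0 as the highest fixed rank; NUM_CHANNELS and the function names are illustrative.

```c
#include <stdint.h>

#define NUM_CHANNELS 4   /* assumed number of request inputs */

/* Fixed priority: channel 0 has the highest rank.
   Returns the granted channel, or -1 if nothing is requested. */
int grant_fixed(uint8_t requests)
{
    for (int ch = 0; ch < NUM_CHANNELS; ch++)
        if (requests & (1u << ch)) return ch;
    return -1;
}

/* Rotating priority (round-robin): the search starts just after the
   channel that was serviced last, so that channel now ranks lowest. */
int grant_rotating(uint8_t requests, int last_granted)
{
    for (int i = 1; i <= NUM_CHANNELS; i++) {
        int ch = (last_granted + i) % NUM_CHANNELS;
        if (requests & (1u << ch)) return ch;
    }
    return -1;
}
```

With both channels 0 and 1 requesting continuously, grant_fixed always picks channel 0, while grant_rotating alternates between them.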

Arbitration: Daisy-chain arbitration
• Arbitration done by peripherals
  – Built into peripheral or external logic added
    • req input and ack output added to each peripheral
• Peripherals connected to each other in daisy-chain manner
  – One peripheral connected to resource, all others connected “upstream”
  – Peripheral’s req flows “downstream” to resource, resource’s ack flows “upstream” to requesting peripheral
  – Closest peripheral has highest priority
Figure: daisy-chain aware peripherals; µP (Int, Inta) on the system bus; Peripheral1 and Peripheral2 each with Ack_in, Ack_out, Req_out and Req_in; the last Req_in tied to 0

Arbitration: Daisy-chain arbitration
• Pros/cons
  – Easy to add/remove peripheral - no system redesign needed
  – Does not support rotating priority
  – One broken peripheral can cause loss of access to other peripherals
Figure: priority arbiter (Int, Inta, Ireq1, Iack1, Ireq2, Iack2) with Peripheral 1 and Peripheral 2, compared with daisy-chain aware peripherals (Ack_in, Ack_out, Req_out, Req_in)

Network-oriented arbitration
• When multiple microprocessors share a bus (sometimes called a network)
– Arbitration typically built into bus protocol
– Separate processors may try to write simultaneously causing collisions
• Data must be resent
• Don’t want to start sending again at same time
– statistical methods can be used to reduce chances
• Typically used for connecting multiple distant chips
– Trend – use to connect multiple on-chip processors

Example: Vectored interrupt using an interrupt table
• Fixed priority: i.e., Peripheral1 has highest priority
• Keyword “_at_” followed by memory address forces compiler to place variables in specific memory locations
– e.g., memory-mapped registers in arbiter, peripherals
• A peripheral’s index into interrupt table is sent to memory-mapped register in arbiter
• Peripherals receive external data and raise interrupt
[Figure: processor and memory (holding the jump table) on the memory bus, with a priority arbiter (MASK, IDX0, IDX1, ENABLE registers) and Peripheral 1 and Peripheral 2 (DATA registers)]

// _at_ is a compiler-specific extension (e.g., Keil C51), not standard C
unsigned char ARBITER_MASK_REG       _at_ 0xfff0;
unsigned char ARBITER_CH0_INDEX_REG  _at_ 0xfff1;
unsigned char ARBITER_CH1_INDEX_REG  _at_ 0xfff2;
unsigned char ARBITER_ENABLE_REG     _at_ 0xfff3;
unsigned char PERIPHERAL1_DATA_REG   _at_ 0xffe0;
unsigned char PERIPHERAL2_DATA_REG   _at_ 0xffe1;
void* INTERRUPT_LOOKUP_TABLE[256]    _at_ 0x0100;

void InitializePeripherals(void);   // defined below, declared here so main can call it

void main() {
    InitializePeripherals();
    for(;;) {}   // main program goes here
}

void Peripheral1_ISR(void) {
    unsigned char data;
    data = PERIPHERAL1_DATA_REG;
    // do something with the data
}

void Peripheral2_ISR(void) {
    unsigned char data;
    data = PERIPHERAL2_DATA_REG;
    // do something with the data
}

void InitializePeripherals(void) {
    ARBITER_MASK_REG = 0x03;        // enable both channels
    ARBITER_CH0_INDEX_REG = 13;     // table index for Peripheral1
    ARBITER_CH1_INDEX_REG = 17;     // table index for Peripheral2
    INTERRUPT_LOOKUP_TABLE[13] = (void*)Peripheral1_ISR;
    INTERRUPT_LOOKUP_TABLE[17] = (void*)Peripheral2_ISR;
    ARBITER_ENABLE_REG = 1;         // start arbitrating
}
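The listing above only runs on a target whose compiler supports _at_. To experiment with the same lookup-table idea on a desktop compiler, the arbiter registers can be replaced by ordinary variables. The sketch below is a software model under that assumption; dispatch_interrupt and last_isr_run are hypothetical names that stand in for what the arbiter and processor hardware do.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*isr_t)(void);

static isr_t interrupt_lookup_table[256];   /* stands in for the table at 0x0100 */
static unsigned char arbiter_index_reg[2];  /* stands in for the CH0/CH1 index registers */
static unsigned char arbiter_mask_reg;      /* stands in for the mask register */
static int last_isr_run;                    /* records which ISR ran, for observation only */

static void peripheral1_isr(void) { last_isr_run = 1; }
static void peripheral2_isr(void) { last_isr_run = 2; }

void initialize_peripherals(void)
{
    arbiter_mask_reg = 0x03;                /* enable both channels */
    arbiter_index_reg[0] = 13;
    arbiter_index_reg[1] = 17;
    interrupt_lookup_table[13] = peripheral1_isr;
    interrupt_lookup_table[17] = peripheral2_isr;
}

/* Models one interrupt: bit i of ireq set means channel i requests service.
   Fixed priority, channel 0 first. Returns the number of the ISR that ran,
   or 0 if no enabled channel was requesting. */
int dispatch_interrupt(unsigned ireq)
{
    unsigned pending = ireq & arbiter_mask_reg;
    for (int ch = 0; ch < 2; ch++) {
        if (pending & (1u << ch)) {
            isr_t isr = interrupt_lookup_table[arbiter_index_reg[ch]];
            if (isr != NULL) {
                last_isr_run = 0;
                isr();              /* jump through the table entry */
                return last_isr_run;
            }
        }
    }
    return 0;
}
```

The model uses a table of function pointers rather than void*, since converting between function pointers and void* is not guaranteed by standard C.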

Intel 8237 DMA controller
[Figure: Intel 8237 pin diagram: D[7..0], A[19..0], ALE, MEMR, MEMW, IOR, IOW, HLDA, HRQ, REQ 0 to REQ 3, ACK 0 to ACK 3]
Signal | Description
D[7..0] | These wires are connected to the system bus (ISA) and are used by the microprocessor to write to the internal registers of the 8237.
A[19..0] | These wires are connected to the system bus (ISA) and are used by the DMA to issue the memory location where the transferred data is to be written to. The 8237 is also addressed by the microprocessor through the lower bits of these address lines.
ALE* | This is the address latch enable signal. The 8237 uses this signal when driving the system bus (ISA).
MEMR* | This is the memory read signal issued by the 8237 when driving the system bus (ISA).
MEMW* | This is the memory write signal issued by the 8237 when driving the system bus (ISA).
IOR* | This is the I/O device read signal issued by the 8237 when driving the system bus (ISA) in order to read a byte from an I/O device.
IOW* | This is the I/O device write signal issued by the 8237 when driving the system bus (ISA) in order to write a byte to an I/O device.
HLDA | This signal (hold acknowledge) is asserted by the microprocessor to signal that it has relinquished the system bus (ISA).
HRQ | This signal (hold request) is asserted by the 8237 to signal to the microprocessor a request to relinquish the system bus (ISA).
REQ 0,1,2,3 | An attached device to one of these channels asserts this signal to request a DMA transfer.
ACK 0,1,2,3 | The 8237 asserts this signal to grant a DMA transfer to an attached device to one of these channels.
*See the ISA bus description in this chapter for complete details.

Intel 8259 programmable priority controller
[Figure: Intel 8259 pin diagram: D[7..0], A[0..0], RD, WR, INT, INTA, CAS[2..0], SP/EN, IR0 to IR7]
Signal | Description
D[7..0] | These wires are connected to the system bus and are used by the microprocessor to write or read the internal registers of the 8259.
A[0..0] | This pin acts in conjunction with WR/RD signals. It is used by the 8259 to decipher various command words the microprocessor writes and status the microprocessor wishes to read.
WR | When this write signal is asserted, the 8259 accepts the command on the data line, i.e., the microprocessor writes to the 8259 by placing a command on the data lines and asserting this signal.
RD | When this read signal is asserted, the 8259 provides on the data lines its status, i.e., the microprocessor reads the status of the 8259 by asserting this signal and reading the data lines.
INT | This signal is asserted whenever a valid interrupt request is received by the 8259, i.e., it is used to interrupt the microprocessor.
INTA | This signal is used to enable 8259 interrupt-vector data onto the data bus by a sequence of interrupt acknowledge pulses issued by the microprocessor.
IR 0,1,2,3,4,5,6,7 | An interrupt request is executed by a peripheral device when one of these signals is asserted.
CAS[2..0] | These are cascade signals to enable multiple 8259 chips to be chained together.
SP/EN | This function is used in conjunction with the CAS signals for cascading purposes.

Multilevel bus architectures
• Don’t want one bus for all communication
– Peripherals would need high-speed, processor-specific bus interface
• excess gates, power consumption, and cost; less portable
– Too many peripherals slows down bus
• Processor-local bus
– High speed, wide, most frequent communication
– Connects microprocessor, cache, memory controllers, etc.
• Peripheral bus
– Lower speed, narrower, less frequent communication
– Typically industry standard bus (ISA, PCI) for portability
• Bridge
– Single-purpose processor converts communication between busses
[Figure: microprocessor, cache, memory controller and DMA controller on the processor-local bus; a bridge connects it to the peripheral bus with three peripherals]

Advanced communication principles
• Layering
– Break complexity of communication protocol into pieces easier to design and understand
– Lower levels provide services to higher level
• Lower level might work with bits while higher level might work with packets of data
– Physical layer
• Lowest level in hierarchy
• Medium to carry data from one actor (device or node) to another
• Parallel communication
– Physical layer capable of transporting multiple bits of data
• Serial communication
– Physical layer transports one bit of data at a time
• Wireless communication
– No physical connection needed for transport at physical layer

Parallel communication
• Multiple data, control, and possibly power wires
– One bit per wire
• High data throughput with short distances
• Typically used when connecting devices on same IC or same circuit board
– Bus must be kept short
• long parallel wires result in high capacitance values which requires more time to charge/discharge
• Data misalignment between wires increases as length increases
• Higher cost, bulky

Serial communication
• Single data wire, possibly also control and power wires
• Words transmitted one bit at a time
• Higher data throughput with long distances
– Less average capacitance, so more bits per unit of time
• Cheaper, less bulky
• More complex interfacing logic and communication protocol
– Sender needs to decompose word into bits
– Receiver needs to recompose bits into word
– Control signals often sent on same wire as data increasing protocol complexity
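The extra work a serial link asks of sender and receiver can be illustrated with a small C sketch. The function names and the least-significant-bit-first order are assumptions made for the example; real protocols fix their own bit order and add framing.

```c
#include <stdint.h>

/* Sender side: decompose an 8-bit word into individual bits,
   least significant bit first (an assumed bit order). */
void word_to_bits(uint8_t word, uint8_t bits[8])
{
    for (int i = 0; i < 8; i++) {
        bits[i] = (uint8_t)((word >> i) & 1u);
    }
}

/* Receiver side: recompose the received bits into the original word. */
uint8_t bits_to_word(const uint8_t bits[8])
{
    uint8_t word = 0;
    for (int i = 0; i < 8; i++) {
        word |= (uint8_t)((bits[i] & 1u) << i);
    }
    return word;
}
```

Sending 0xA5 through word_to_bits and feeding the resulting array to bits_to_word gives back 0xA5; a parallel bus would instead place all eight bits on eight wires at once.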

Wireless communication
• Infrared (IR)
– Electromagnetic wave frequencies just below visible light spectrum
– Diode emits infrared light to generate signal
– Infrared transistor detects signal, conducts when exposed to infrared light
– Cheap to build
– Need line of sight, limited range
• Radio frequency (RF)
– Electromagnetic wave frequencies in radio spectrum
– Analog circuitry and antenna needed on both sides of transmission
– Line of sight not needed, transmitter power determines range

Error detection and correction
• Often part of bus protocol
• Error detection: ability of receiver to detect errors during transmission
• Error correction: ability of receiver and transmitter to cooperate to correct problem
– Typically done by acknowledgement/retransmission protocol
• Bit error: single bit is inverted
• Burst of bit error: consecutive bits received incorrectly
• Parity: extra bit sent with word used for error detection
– Odd parity: data word plus parity bit contains odd number of 1’s
– Even parity: data word plus parity bit contains even number of 1’s
– Always detects single bit errors, but not all burst bit errors
• Checksum: extra word sent with data packet of multiple words
– e.g., extra word contains XOR sum of all data words in packet
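Parity and the XOR checksum are both simple enough to sketch in C. The function names are chosen for this example only, and 8-bit data words are assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Even parity: returns the bit that makes the total number of 1s
   (data word plus parity bit) even. */
uint8_t even_parity_bit(uint8_t word)
{
    uint8_t parity = 0;
    for (int i = 0; i < 8; i++) {
        parity ^= (uint8_t)((word >> i) & 1u);
    }
    return parity;
}

/* Receiver check: nonzero when the received word and parity bit agree. */
int even_parity_ok(uint8_t word, uint8_t parity_bit)
{
    return even_parity_bit(word) == (parity_bit & 1u);
}

/* Checksum: XOR of all data words in the packet, sent as an extra word. */
uint8_t xor_checksum(const uint8_t *data, size_t n)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum ^= data[i];
    }
    return sum;
}
```

Flipping one bit of a word makes even_parity_ok fail, but flipping two bits leaves the parity unchanged, which is why parity cannot catch every burst error.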

Serial protocols: I2C
• I2C (Inter-IC)
– Two-wire serial bus protocol developed by Philips Semiconductors in the early 1980s
– Enables peripheral ICs to communicate using simple communication hardware
– Data transfer rates up to 100 kbits/s and 7-bit addressing possible in normal mode
– Up to 400 kbits/s in fast mode and 3.4 Mbits/s in high-speed mode; 10-bit addressing also available
– Common devices capable of interfacing to I2C bus:
• EEPROMs, Flash, and some RAM memory, real-time clocks, watchdog timers, and microcontrollers

I2C bus structure
[Figure: SCL and SDA lines (bus capacitance < 400 pF) connecting a microcontroller (master) to an EEPROM (servant, Addr=0x01), a temperature sensor (servant, Addr=0x02) and an LCD controller (servant, Addr=0x03)]
[Figure: SDA/SCL waveforms for start condition, sending 0, sending 1, and stop condition]
[Figure: typical read/write cycle: START, address bits A6 A5 ... A0, R/W, ACK from servant, data bits D8 D7 ... D0, ACK from receiver, STOP]
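In a 7-bit addressed transfer, the first byte after the start condition carries the servant address followed by the read/write bit. A minimal C sketch of building that byte is shown below; the function and macro names are assumptions for illustration and do not belong to any particular I2C driver.

```c
#include <stdint.h>

#define I2C_WRITE 0u   /* R/W bit = 0: master writes to the servant */
#define I2C_READ  1u   /* R/W bit = 1: master reads from the servant */

/* Builds the first byte of a transfer: address bits A6..A0 in the
   upper seven bits, R/W in the least significant bit. */
uint8_t i2c_address_byte(uint8_t addr7, uint8_t rw)
{
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (rw & 1u));
}
```

For the temperature sensor at address 0x02 in the bus structure figure, a read would start with i2c_address_byte(0x02, I2C_READ), which evaluates to 0x05.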

Serial protocols: CAN
• CAN (Controller area network)
– Protocol for real-time applications
– Developed by Robert Bosch GmbH
– Originally for communication among components of cars
– Applications now using CAN include:
• elevator controllers, copiers, telescopes, production-line control systems, and medical instruments
– Data transfer rates up to 1 Mbit/s and 11-bit addressing
– Common devices interfacing with CAN:
• 8051-compatible 8592 processor and standalone CAN controllers
– Actual physical design of CAN bus not specified in protocol
• Requires devices to transmit/detect dominant and recessive signals to/from bus
• e.g., ‘0’ = dominant, ‘1’ = recessive if single data wire used
• Bus guarantees dominant signal prevails over recessive signal if asserted simultaneously

Serial protocols: FireWire
• FireWire (a.k.a. I-Link, Lynx, IEEE 1394)
– High-performance serial bus developed by Apple Computer Inc.
– Designed for interfacing independent electronic components
• e.g., Desktop, scanner
– Data transfer rates from 12.5 to 400 Mbits/s, 64-bit addressing
– Plug-and-play capabilities
– Packet-based layered design structure
– Applications using FireWire include:
• disk drives, printers, scanners, cameras
– Capable of supporting a LAN similar to Ethernet
• 64-bit address:
– 10 bits for network ids, 1023 subnetworks
– 6 bits for node ids, each subnetwork can have 63 nodes
– 48 bits for memory address, each node can have 281 terabytes of distinct locations
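The 10/6/48-bit split of a FireWire address can be expressed with shifts and masks. The sketch below assumes the network id occupies the most significant bits, followed by the node id and then the memory address, which is the order the fields are listed in above; the function names are hypothetical.

```c
#include <stdint.h>

#define FW_OFFSET_MASK 0xFFFFFFFFFFFFull   /* low 48 bits: memory address */

/* Packs the three fields into one 64-bit address. */
uint64_t fw_pack(uint16_t network_id, uint8_t node_id, uint64_t offset)
{
    return ((uint64_t)(network_id & 0x3FFu) << 54)   /* 10-bit network id */
         | ((uint64_t)(node_id & 0x3Fu) << 48)       /* 6-bit node id */
         | (offset & FW_OFFSET_MASK);                /* 48-bit memory address */
}

/* Field extractors for a packed address. */
uint16_t fw_network_id(uint64_t addr) { return (uint16_t)(addr >> 54); }
uint8_t  fw_node_id(uint64_t addr)    { return (uint8_t)((addr >> 48) & 0x3Fu); }
uint64_t fw_offset(uint64_t addr)     { return addr & FW_OFFSET_MASK; }
```

The 48-bit field gives 2^48 distinct locations per node, roughly 281 x 10^12, which is where the 281 terabytes figure comes from.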

Serial protocols: USB
• USB (Universal Serial Bus)
– Easier connection between PC and monitors, printers, digital speakers, modems, scanners, digital cameras, joysticks, multimedia game equipment
– 2 data rates:
• 12 Mbps for increased bandwidth devices
• 1.5 Mbps for lower-speed devices (joysticks, game pads)
– Tiered star topology can be used
• One USB device (hub) connected to PC
– hub can be embedded in devices like monitor, printer, or keyboard or can be standalone
• Multiple USB devices can be connected to hub
• Up to 127 devices can be connected like this
– USB host controller
• Manages and controls bandwidth and driver software required by each peripheral
• Dynamically allocates power downstream according to devices connected/disconnected

Parallel protocols: PCI Bus
• PCI Bus (Peripheral Component Interconnect)
– High performance bus originated at Intel in the early 1990’s
– Standard adopted by industry and administered by PCISIG (PCI Special Interest Group)
– Interconnects chips, expansion boards, processor memory subsystems
– Data transfer rates of 127.2 to 508.6 Mbytes/s and 32-bit addressing
• Later extended to 64-bit while maintaining compatibility with 32-bit schemes
– Synchronous bus architecture
– Multiplexed data/address lines

Parallel protocols: ARM Bus
• ARM Bus
– Designed and used internally by ARM Corporation
– Interfaces with ARM line of processors
– Many IC design companies have own bus protocol
– Data transfer rate is a function of clock speed
• If clock speed of bus is X, transfer rate = 16 x X bits/s
– 32-bit addressing

Wireless protocols: IrDA
• IrDA
– Protocol suite that supports short-range point-to-point infrared data transmission
– Created and promoted by the Infrared Data Association (IrDA)
– Data transfer rates between 9.6 kbps and 4 Mbps
– IrDA hardware deployed in notebook computers, printers, PDAs, digital cameras, public phones, cell phones
– Lack of suitable drivers has slowed use by applications
– Windows 2000/98 now include support
– Becoming available on popular embedded OS’s

Wireless protocols: Bluetooth
• Bluetooth
– New, global standard for wireless connectivity
– Based on low-cost, short-range radio link
– Connection established when within 10 meters of each other
– No line-of-sight required
• e.g., Connect to printer in another room

Wireless protocols: IEEE 802.11
• IEEE 802.11
– Proposed standard for wireless LANs
– Specifies parameters for PHY and MAC layers of network
• PHY layer
– physical layer
– handles transmission of data between nodes
– provisions for data transfer rates of 1 or 2 Mbps
– operates in 2.4 to 2.4835 GHz frequency band (RF)
– or 300 to 428,000 GHz (IR)
• MAC layer
– medium access control layer
– protocol responsible for maintaining order in shared medium
– collision avoidance/detection

Chapter Summary
• Basic protocol concepts
– Actors, direction, time multiplexing, control methods
• General-purpose processors
– Port-based or bus-based I/O
– I/O addressing: Memory mapped I/O or Standard I/O
– Interrupt handling: fixed or vectored
– Direct memory access
• Arbitration
– Priority arbiter (fixed/rotating) or daisy chain
• Bus hierarchy
• Advanced communication
– Parallel vs. serial, wires vs. wireless, error detection/correction, layering
– Serial protocols: I2C, CAN, FireWire, and USB; Parallel: PCI and ARM.
– Serial wireless protocols: IrDA, Bluetooth, and IEEE 802.11.
