Devembeddedf (Compatibility Mode)
Dr. J.T. Devaraju
[Figure: embedded system block diagram. A microprocessor core with I/O blocks and a real-time clock; the I/O blocks connect to the outside world.]
– What are they?
• Design challenge – optimizing design metrics
• Technologies
– Processor technologies
– IC technologies
– Design technologies

• Hard to define. Nearly any computing system other than a desktop computer
• Embedded systems do a very specific task; they cannot be programmed to do different things.
• Embedded systems have very limited resources, particularly memory. Generally, they do not have secondary storage devices such as a CD-ROM or floppy disk.
Consumer electronics
• Digital camera, digital diary,
• DVD player, electronic toys,
• microwave oven,
• Remote controls for TV,
• air-conditioner,
• video game consoles, video recorders,
• wristwatches,
• Mobile phones, PDAs,
• Palmtops etc.

Office Automation
• Copying machine
• Fax machine
• Key telephone,
• Printer,
• Scanner,
• Modem etc.

Industrial Automation
• For process control in pharmaceutical, cement, sugar, oil exploration, nuclear energy, electricity generation and transmission.
• For monitoring the temperature, pressure, humidity, etc., and then taking appropriate action based on the monitored levels to control other devices or to send information to a centralized monitoring station.
• In hazardous industrial environments, robots are used for complicated tasks such as hardware assembly.

Finance
• Smart cards (a smart card has a small micro-controller and memory, and it interacts with the smart card reader)
• ATM
BMW car Night Vision
Towards Autonomous Vehicles
http://iLab.usc.edu
http://beobots.org
[Figure: telemedicine network. Rural, remote or inaccessible sites and ambulances, equipped with an iKIT with keyboard, connect by video conferencing to health centres, referral hospitals, pathology, specialists and a panel of doctors.]
Myvu Crystal "Personal Viewer" connects easily to your video iPod or portable media player
Weighing about 1 ounce, Myvu's SolidOptex™ optical technology provides the user with the impression of a free-floating monitor. Let's call it a "Monitor-on-the-Nose."

• This robot has humidifying, oxygen-producing, aroma-emitting, and kinetic functions.
• The robotic plant can interact with people when they approach it and can 'dance' when music is played.
It is about 1.30 meters tall and 40 centimeters in diameter.
(The flower, not the kid… he's shorter…)
• Wearable computing
• Several Bluetooth helmets have been developed for skiing and motorcycling from companies such as Marker and Motorola.
• Jackets that plug into all of your gear and create a personal area network are available from ScotteVest.
Bluetooth glove phone (Jason Bradbury, UK)

3M Mini-Projector is Designed for Business Professionals
TI and others develop the smallest projectors
Plastic Logic will target business readers

• KINDLE, to the surprise of all, sold 240,000 units before Q4
This $350 e-book reader is Amazon's iPod, at 378,000 units this year. The Kindle will in 3 years be a $1.1 billion business and 4% of all Amazon sales.
Robotic Applications
• Parts handling
• Assembly
• Painting
• Surveillance
• Security (bomb disposal … really telechirics rather than robotics)
• Home help (grass cutting, nursing)
Iowa State robot available for ribbon cuttings, birthday parties, uprisings
Single-functioned
• Executes a single program repeatedly, unlike a desktop PC.
• Embedded systems do a very specific task; they cannot be programmed to do different things.
Ex: pager, weighing machine.

Tightly-constrained
(Low cost, low power, small, fast)

Embedded systems have to work against deadlines. A specific job has to be completed within a specific time. In some embedded systems, called real-time systems, the deadlines are stringent. Missing a deadline may cause a catastrophe: loss of life or damage to property.
Ex: a missile that has to track and intercept an enemy aircraft. The missile contains an embedded system that tracks the aircraft and generates a control signal that will launch the missile. If there is a delay in tracking the aircraft and the missile misses the deadline, the enemy aircraft may drop a bomb and cause the loss of many lives. Hence, this system is a hard real-time embedded system.
Networked Information Appliances
Embedded systems that are provided with network interfaces and accessed by networks such as a Local Area Network or the Internet are called networked information appliances.

Mobile Devices
Mobile devices such as mobile phones, Personal Digital Assistants (PDAs), smart phones etc. are a special category of embedded systems.

An embedded system example - a digital camera
[Figure: digital camera chip with lens, CCD, memory controller, ISA bus interface, UART and LCD ctrl.]
• Single-functioned -- always a digital camera
• Tightly-constrained -- Low cost, low power, small, fast
• Reactive and real-time -- only to a small extent

– A measurable feature of a system's implementation
– Optimizing design metrics is a key challenge
– Power: the amount of power consumed by the system
– Flexibility: the ability to change the functionality of the system without incurring heavy NRE cost
Time-to-market: a demanding design metric
• Time required to develop a product to the point it can be sold to customers

Losses due to delayed market entry
• Simplified revenue model
– Product life = 2W, peak at W
– Time of market entry defines a triangle, representing market
[Figure: revenues ($) versus time for on-time and delayed entry. The market rises to peak revenue at W and falls until 2W; delayed entry starts at D and reaches a lower peak revenue.]
• Percentage revenue loss = (D(3W - D) / (2W^2)) * 100%
• Example
– Lifetime 2W = 52 wks, delay D = 4 wks
– (4*(3*26 - 4) / (2*26^2)) = 22%
– Lifetime 2W = 52 wks, delay D = 10 wks
– (10*(3*26 - 10) / (2*26^2)) = 50%
– Delays are costly!

– NRE cost (Non-Recurring Engineering cost): the one-time monetary cost of designing the system
– total cost = NRE cost + unit cost * # of units
– per-product cost = total cost / # of units
                   = (NRE cost / # of units) + unit cost
• Example
– NRE = $2000, unit = $100
– For 10 units
– total cost = $2000 + 10*$100 = $3000
– per-product cost = $2000/10 + $100 = $300
Amortizing NRE cost over the units results in an additional $200 per unit
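The two cost formulas above can be checked numerically. The following is a minimal sketch in Python; the function names are illustrative and not part of the original slides.

```python
def revenue_loss_percent(w, d):
    """Percentage revenue loss for the triangular model: D(3W - D) / (2W^2) * 100."""
    return d * (3 * w - d) / (2 * w ** 2) * 100


def per_product_cost(nre_cost, unit_cost, units):
    """Per-product cost = (NRE cost / number of units) + unit cost."""
    return nre_cost / units + unit_cost


if __name__ == "__main__":
    # Lifetime 2W = 52 weeks, so W = 26 weeks
    print(round(revenue_loss_percent(26, 4)))    # delay D = 4 weeks
    print(round(revenue_loss_percent(26, 10)))   # delay D = 10 weeks
    print(per_product_cost(2000, 100, 10))       # NRE = $2000, unit = $100, 10 units
```

Rounded to whole percent, the two delays reproduce the 22% and 50% figures, and the cost example gives $300 per product.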
[Figure: total cost ($0 to $120,000) and per-product cost ($0 to $120) versus number of units (volume), 0 to 2400.]
• But, must also consider time-to-market

– Tasks per second, e.g. Camera A processes 4 images per second
– Throughput can be more than latency seems to imply due to concurrency, e.g. Camera B may process 8 images per second (by capturing a new image while the previous image is being stored).
• Speedup of B over A = B's performance / A's performance
– Throughput speedup = 8/4 = 2
Three key embedded system technologies
• Technology
– A manner of accomplishing a task, especially using technical processes, methods, or knowledge
• Three key technologies for embedded systems
– Processor technology
– IC technology
– Design technology

Processor Technology
• General-purpose processor: microprocessors & microcontrollers
• Application-specific processor: ASIP
• Single-purpose processor: specific processor
[Figure labels: FP processor, programming router]
General-purpose processors
• Programmable device used in a variety of applications
– Also known as "microprocessor"
• Features
– Program memory
– General datapath with large register file and general ALU
– High flexibility
• "Pentium" the most well-known, but there are hundreds of others
[Figure: controller (control logic and state register, IR, PC) and datapath (register file, general ALU).]

– Performance may be fast for computation-intensive applications
– Unit cost may be relatively high for large quantities
– Performance may be slow for certain operations
– Size and power may be large
Single-purpose processors
• Digital circuit designed to execute exactly one program
– a.k.a. coprocessor, accelerator or peripheral
• Features
– No program memory
• Benefits
– Fast
– Low power
– Small size
[Figure: controller (control logic, state register) and datapath (index, total, +).]

Application-specific processors
• Programmable processor optimized for a particular class of applications having common characteristics
– Compromise between general-purpose and single-purpose
• Features
– Program memory
– Optimized datapath
– Special functional units
• Benefits
– Some flexibility, good performance, size and power
[Figure: controller (control logic and state register, IR, PC) and datapath (registers, custom ALU), with program memory holding assembly code for: total = 0; for i = 1 to …]
IC technology
• The manner in which a digital (gate-level) implementation is mapped onto an IC
– IC: Integrated circuit, or "chip"
– IC technologies differ in their customization to a design
– ICs consist of numerous layers (perhaps 10 or more)
• IC technologies differ with respect to who builds each layer and when
[Figure: IC package, IC, and a transistor cross-section with gate, oxide, source, channel, drain and silicon substrate.]

IC technology
• Three types of IC technologies
– Full-custom/VLSI
– Semi-custom ASIC (gate array and standard cell)
– PLD (Programmable Logic Device)

Full-custom/VLSI
• All layers are optimized for an embedded system's particular digital implementation
– Placing transistors
– Sizing transistors
– Routing wires
• Benefits
– Excellent performance, small size, low power
• Drawbacks
– High NRE cost (e.g., $300k), long time-to-market

Semi-custom
• Lower layers are fully or partially built
– Designers are left with routing of wires and maybe placing some blocks
• Benefits
– Good performance, good size, less NRE cost than a full-custom implementation (perhaps $10k to $100k)
• Drawbacks
– Still require weeks to months to develop
Moore's law
• Wow
– This growth rate is hard to imagine, most people underestimate
– How many ancestors do you have from 20 …

Graphical illustration of Moore's law
[Figure: leading-edge chip capacity from 1981 to 2002 in three-year steps: 10,000 transistors in 1981, 150,000,000 transistors in 2002.]

… into an implementation
Compilation/Synthesis: Automates exploration and insertion of implementation details for lower level.
Libraries/IP: Incorporates pre-designed implementation from lower abstraction level into higher level.
• System specification: system synthesis; Hw/Sw/OS; model simulat./checkers
• Behavioral specification: behavior synthesis; cores; Hw-Sw cosimulators
• RT specification: RT synthesis; RT components; HDL simulators
"codesign"
[Figure: productivity (0.01 to 10,000, log scale) versus year, 1983 to 2009.]
Design productivity gap
• While designer productivity has grown at an impressive rate over the past decades, the rate of improvement has not kept pace with chip capacity
[Figure: IC capacity (logic transistors per chip, in millions) and productivity ((K) Trans./Staff-Mo.) on log scales; the gap between the two curves widens over time.]

Design productivity gap
• 1981 leading edge chip required 100 designer months
– 10,000 transistors / 100 transistors/month
• 2002 leading edge chip requires 30,000 designer months
– 150,000,000 / 5000 transistors/month
• Designer cost increase from $1M to $300M
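The designer-month figures follow from dividing chip size by productivity. A small Python sketch of that arithmetic is below; the cost per designer month ($10,000) is not stated on the slide and is only inferred here from the $1M and $300M totals.

```python
def designer_months(transistors, transistors_per_month):
    """Design effort = chip size / designer productivity."""
    return transistors / transistors_per_month


# Assumption: a constant cost per designer month, inferred from $1M / 100 months.
COST_PER_DESIGNER_MONTH = 10_000

if __name__ == "__main__":
    for year, size, productivity in [(1981, 10_000, 100), (2002, 150_000_000, 5_000)]:
        months = designer_months(size, productivity)
        print(year, months, "designer months, cost $", months * COST_PER_DESIGNER_MONTH)
```

Chip capacity grew 15,000-fold between the two data points while productivity grew only 50-fold, which is the gap the slide describes.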
Chapter 2: Custom single-purpose processors

Outline
• Introduction
• Combinational logic
• Sequential logic
• Custom single-purpose processor design
• RT-level custom single-purpose processor design
              GPP                                  SPP     ASIP
Performance   Fast; low for certain applications   Fast    < GPP, good
Size          Large                                Low     Good
Design time   Low                                  High

Introduction
• Processor
– Digital circuit that performs computation tasks
• A custom single-purpose …
[Figure: digital camera chip with memory controller, ISA bus interface, UART and LCD ctrl.]

– Inverter, NAND, NOR
[Figure: IC package, IC, and a transistor cross-section with gate, oxide, source, channel, drain and silicon substrate.]
Combinational components
[Figure: block symbols for an n-bit, m x 1 multiplexor (inputs I(m-1) … I1 I0, select S0 … S(log m), output O), a log n x n decoder (inputs I(log n - 1) … I0, outputs O(n-1) … O1 O0), an n-bit adder (inputs A, B; outputs carry, sum), an n-bit comparator (inputs A, B; outputs less, equal, greater) and an n-bit, m-function ALU (inputs A, B, select S0 … S(log m); output O).]

• Multiplexor: O = I0 if S=0..00; I1 if S=0..01; …; I(m-1) if S=1..11
• Decoder: O0 = 1 if I=0..00; O1 = 1 if I=0..01; …; O(n-1) = 1 if I=1..11
– With enable input e, all O's are 0 if e=0
• Adder: sum = A+B (first n bits); carry = (n+1)'th bit of A+B
– With carry-in input Ci, sum = A + B + Ci
• Comparator: less = 1 if A<B; equal = 1 if A=B; greater = 1 if A>B
• ALU: O = A op B; op determined by S
– May have status outputs carry, zero, etc.

-- 4-to-1 multiplexor; a and b are the select inputs (entity declarations are not shown on the slides)
architecture Behavioral of mux is
begin
  process(a, b, i0, i1, i2, i3)
    variable muxval : integer;
  begin
    muxval := 0;
    if (a = '1') then
      muxval := muxval + 1;
    end if;
    if (b = '1') then
      muxval := muxval + 2;
    end if;
    case muxval is
      when 0 => y <= i0;
      when 1 => y <= i1;
      when 2 => y <= i2;
      when 3 => y <= i3;
      when others => null;
    end case;
  end process;
end Behavioral;

-- 2-to-4 decoder with active-low outputs
architecture Behavioral of decoder is
begin
  y0 <= not ((not a) and (not b));
  y1 <= not ((not a) and b);
  y2 <= not (a and (not b));
  y3 <= not (a and b);
end Behavioral;

-- 4-bit ripple-carry adder built from four full adders
architecture Behavioral of adder4 is
begin
  f1: entity work.fulad port map (a(0), b(0), ci,    s(0), co(0));
  f2: entity work.fulad port map (a(1), b(1), co(0), s(1), co(1));
  f3: entity work.fulad port map (a(2), b(2), co(1), s(2), co(2));
  f4: entity work.fulad port map (a(3), b(3), co(2), s(3), co(3));
end Behavioral;

-- full adder
architecture Behavioral of fulad is
begin
  sum   <= a xor b xor c;
  carry <= (a and b) or (b and c) or (a and c);
end Behavioral;

-- 2-bit comparator: eq, g (a > b), l (a < b)
architecture Behavioral of comp2 is
begin
  i(0) <= a(0) xnor b(0);
  i(1) <= a(1) xnor b(1);
  eq <= i(0) and i(1);
  g  <= (a(1) and (not b(1))) or (i(1) and a(0) and (not b(0)));
  l  <= ((not a(1)) and b(1)) or (i(1) and (not a(0)) and b(0));
end Behavioral;
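The adder and comparator equations can be sanity-checked in software before synthesis. The sketch below is a Python bit-level model, not part of the original listings; the function names mirror the VHDL architectures and the port order is an assumption.

```python
def full_adder(a, b, c):
    """One-bit full adder: same equations as the fulad architecture."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry


def adder4(a, b, ci=0):
    """4-bit ripple-carry adder: the carry of each stage feeds the next stage."""
    total, carry = 0, ci
    for k in range(4):
        bit, carry = full_adder((a >> k) & 1, (b >> k) & 1, carry)
        total |= bit << k
    return total, carry


def comp2(a, b):
    """2-bit comparator: returns (l, eq, g) using the comp2 equations."""
    a1, a0, b1, b0 = (a >> 1) & 1, a & 1, (b >> 1) & 1, b & 1
    i1 = 1 - (a1 ^ b1)          # xnor of the high bits
    i0 = 1 - (a0 ^ b0)          # xnor of the low bits
    eq = i0 & i1
    g = (a1 & (1 - b1)) | (i1 & a0 & (1 - b0))
    l = ((1 - a1) & b1) | (i1 & (1 - a0) & b0)
    return l, eq, g


if __name__ == "__main__":
    print(adder4(9, 8))    # sum (first 4 bits) and carry
    print(comp2(2, 1))     # (l, eq, g)
```

Checking every input combination, as the model allows, is a cheap way to catch an unbalanced parenthesis or a swapped operand in the gate-level equations.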
Sequential components
[Figure: block symbols for an n-bit register (inputs I, load, clear; output Q), an n-bit shift register (inputs I, shift; output Q) and an n-bit counter (inputs count, clear; output Q), plus D and JK flip-flops with CK and CLR inputs.]

• Register: Q = 0 if clear=1; I if load=1 and clock=1; Q(previous) otherwise.
• Shift register: Q = lsb
– Content shifted
– I stored in msb
• Counter: Q = 0 if clear=1; Q(prev)+1 if count=1 and clock=1.

JK Flip Flop
    // q and qc are complementary outputs; clr is active low
    always @(negedge clk or negedge clr) begin
      if (!clr) begin
        q  <= 0;
        qc <= 1;
      end
      else begin
        q  <= (j && qc) || (~k && q);
        qc <= ~((j && qc) || (~k && q));
      end
    end

4-bit register (all three structural listings are named sreg on the slides, so compile each on its own)
    module sreg (input clk, input clr,
                 input [3:0] i,
                 output [3:0] q);
      dff df0(clk, clr, i[0], q[0]);
      dff df1(clk, clr, i[1], q[1]);
      dff df2(clk, clr, i[2], q[2]);
      dff df3(clk, clr, i[3], q[3]);
    endmodule

4-bit shift register
    module sreg (input clk, input clr,
                 input i,
                 output [3:0] q);
      // I is stored in the msb; the content shifts toward the lsb
      dff df3(clk, clr, i,    q[3]);
      dff df2(clk, clr, q[3], q[2]);
      dff df1(clk, clr, q[2], q[1]);
      dff df0(clk, clr, q[1], q[0]);
    endmodule

    module dff (input clk, input clr, input d,
                output reg q);
      always @(negedge clk or negedge clr)
        if (!clr)
          q <= 0;
        else
          q <= d;
    endmodule

4-bit ripple counter
    module sreg (input clk, input clr,
                 output [3:0] q);
      tff tf0(clk,  clr, q[0]);
      tff tf1(q[0], clr, q[1]);
      tff tf2(q[1], clr, q[2]);
      tff tf3(q[2], clr, q[3]);
    endmodule

    module tff (input clk, input clr, output reg q);
      always @(negedge clk or negedge clr)
        if (!clr)
          q <= 0;
        else
          q <= !q;
    endmodule
A) Problem Description
You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles.

B) State Diagram
[Figure: four states 0, 1, 2, 3. Each state loops on a=0 and advances to the next state on a=1; state 3 returns to state 0 on a=1. x=0 in states 0, 1 and 2; x=1 in state 3.]

C) Implementation Model
[Figure: combinational logic with inputs a, Q1, Q0 and outputs x, I1, I0; a state register loads I1, I0 and drives Q1, Q0.]

D) State Table (Moore-type)
    Inputs         Outputs
    Q1  Q0  a      I1  I0  x
    0   0   0      0   0   0
    0   0   1      0   1   0
    0   1   0      0   1   0
    0   1   1      1   0   0
    1   0   0      1   0   0
    1   0   1      1   1   0
    1   1   0      1   1   1
    1   1   1      0   0   1

F) Combinational Logic (from the Karnaugh maps over Q1Q0 and a)
    I1 = Q1'Q0a + Q1a' + Q1Q0'
    I0 = Q0a' + Q0'a
    x  = Q1Q0

• Given this implementation model
– Sequential logic design quickly reduces to combinational logic design
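The combinational equations I1 = Q1'Q0a + Q1a' + Q1Q0', I0 = Q0a' + Q0'a and x = Q1Q0 can be exercised with a small state-register simulation. This is a Python sketch; the function names are illustrative only.

```python
def next_state(q1, q0, a):
    """Next-state logic of the clock divider (inputs and outputs are 0 or 1)."""
    i1 = ((1 - q1) & q0 & a) | (q1 & (1 - a)) | (q1 & (1 - q0))
    i0 = (q0 & (1 - a)) | ((1 - q0) & a)
    return i1, i0


def output(q1, q0):
    """Moore output: x depends only on the current state."""
    return q1 & q0


def run(a_values):
    """Clock the state register once per element of a_values; return x per cycle."""
    q1 = q0 = 0
    xs = []
    for a in a_values:
        xs.append(output(q1, q0))
        q1, q0 = next_state(q1, q0, a)
    return xs


if __name__ == "__main__":
    print(run([1] * 8))   # x pulses once every four cycles while a = 1
```

With a held at 1 the output is high in one cycle out of four; with a at 0 the machine holds its state, as the state diagram requires.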
Custom single-purpose processor basic model
[Figure: controller and datapath. External control inputs and datapath control inputs enter the controller, which produces external control outputs and datapath control outputs; external data inputs enter the datapath, which produces external data outputs.]
[Figure: a view inside the controller and datapath. The controller holds next-state and control logic and a state register; the datapath holds registers and functional units.]

Example: greatest common divisor
• First create algorithm
• Convert algorithm to "complex" state machine
– Known as FSMD: finite-state machine with datapath
– Can use templates to perform such conversion

(a) black-box view: GCD block with inputs go_i, x_i, y_i and output d_o
(b) desired functionality
    0: int x, y;
    1: while (1) {
    2:    while (!go_i);
    3:    x = x_i;
    4:    y = y_i;
    5:    while (x != y) {
    6:       if (x < y)
    7:          y = y - x;
             else
    8:          x = x - y;
          }
    9:    d_o = x;
       }

[Figure: state diagram templates. Loop statement: condition cond enters the loop-body statements and a join state J returns to the test; !cond goes to the next statement. Branch statement: conditions c1, !c1*c2 and !c1*!c2 select c1 stmts, c2 stmts or others, which meet at join state J before the next statement.]
[Figure: resulting GCD state diagram, with x<y leading to 7: y = y - x and !(x<y) leading to 8: x = x - y, followed by 6-J, 5-J, 9: d_o = x and 1-J (state 1100).]
Splitting into a controller and datapath

(a) Controller states
    0000 1:
    0001 2:     (go_i: to 3; !go_i: to 2-J)
    0010 2-J:
    0011 3:     x_sel = 0, x_ld = 1
    0100 4:     y_sel = 0, y_ld = 1
    0101 5:     (x_neq_y=1: to 6; x_neq_y=0: to 9)
    0110 6:     (x_lt_y=1: to 7; x_lt_y=0: to 8)
    0111 7:     y_sel = 1, y_ld = 1
    1000 8:     x_sel = 1, x_ld = 1
    1001 6-J:
    1010 5-J:
    1011 9:     d_ld = 1
    1100 1-J:

(b) Datapath
[Figure: inputs x_i and y_i pass through n-bit 2x1 muxes (x_sel, y_sel) into registers 0: x and 0: y (x_ld, y_ld). The registers feed comparators 5: x!=y (x_neq_y) and 6: x<y (x_lt_y) and subtractors 8: x-y and 7: y-x, whose results return to the muxes. Register 9: d (d_ld) drives d_o.]

Controller implementation model
[Figure: combinational logic with inputs go_i, x_neq_y, x_lt_y and Q3 Q2 Q1 Q0, and outputs x_sel, y_sel, x_ld, y_ld, d_ld and I3 I2 I1 I0; the state register loads I3..I0.]

Controller state table for the GCD example
    Inputs                                 Outputs
    Q3 Q2 Q1 Q0  x_neq_y  x_lt_y  go_i     I3 I2 I1 I0  x_sel  y_sel  x_ld  y_ld  d_ld
    0  0  0  0   *        *       *        0  0  0  1   X      X      0     0     0
    0  0  0  1   *        *       0        0  0  1  0   X      X      0     0     0
    0  0  0  1   *        *       1        0  0  1  1   X      X      0     0     0
    0  0  1  0   *        *       *        0  0  0  1   X      X      0     0     0
    0  0  1  1   *        *       *        0  1  0  0   0      X      1     0     0
    0  1  0  0   *        *       *        0  1  0  1   X      0      0     1     0
    0  1  0  1   0        *       *        1  0  1  1   X      X      0     0     0
    0  1  0  1   1        *       *        0  1  1  0   X      X      0     0     0
    0  1  1  0   *        0       *        1  0  0  0   X      X      0     0     0
    0  1  1  0   *        1       *        0  1  1  1   X      X      0     0     0
    0  1  1  1   *        *       *        1  0  0  1   X      1      0     1     0
    1  0  0  0   *        *       *        1  0  0  1   1      X      1     0     0
    1  0  0  1   *        *       *        1  0  1  0   X      X      0     0     0
    1  0  1  0   *        *       *        0  1  0  1   X      X      0     0     0
    1  0  1  1   *        *       *        1  1  0  0   X      X      0     0     1
    1  1  0  0   *        *       *        0  0  0  0   X      X      0     0     0
    1  1  0  1   *        *       *        0  0  0  0   X      X      0     0     0
    1  1  1  0   *        *       *        0  0  0  0   X      X      0     0     0
    1  1  1  1   *        *       *        0  0  0  0   X      X      0     0     0
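The controller's state sequence can be traced in software by pairing each state with its datapath action. The following Python sketch is a behavioural model, not the gate-level controller; it uses the state names from the FSMD (1, 2, 2-J, 3, …, 9) and stops after state 9 instead of looping back through 1-J, which is a simplification for illustration.

```python
def gcd_fsmd(x_i, y_i, go_i=1, max_cycles=10_000):
    """Step through the GCD FSMD one state per cycle; return (d_o, cycles used)."""
    x = y = 0
    state = "1"
    for cycle in range(1, max_cycles + 1):
        if state == "1":
            state = "2"
        elif state == "2":                 # wait for go_i
            state = "3" if go_i else "2-J"
        elif state == "2-J":
            state = "2"
        elif state == "3":                 # x_sel = 0, x_ld = 1
            x, state = x_i, "4"
        elif state == "4":                 # y_sel = 0, y_ld = 1
            y, state = y_i, "5"
        elif state == "5":                 # comparator x != y
            state = "6" if x != y else "9"
        elif state == "6":                 # comparator x < y
            state = "7" if x < y else "8"
        elif state == "7":                 # y_sel = 1, y_ld = 1
            y, state = y - x, "6-J"
        elif state == "8":                 # x_sel = 1, x_ld = 1
            x, state = x - y, "6-J"
        elif state == "6-J":
            state = "5-J"
        elif state == "5-J":
            state = "5"
        elif state == "9":                 # d_ld = 1
            return x, cycle
    raise RuntimeError("no result within max_cycles (go_i low, or an input is 0)")


if __name__ == "__main__":
    print(gcd_fsmd(42, 8))
```

The cycle count returned alongside the result makes the cost of the join states visible, which is useful when weighing which states are worth merging.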
Optimizing the original program (cont.)
… (2,2).
• GCD(42,8) - 3 iterations to complete the loop
– x and y values evaluated as follows: (42, 8), (8, 2), (2, 0)
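The two traces can be reproduced side by side. The optimized program itself is not preserved in these notes, so the Python sketch below assumes a remainder-based loop, which yields the trace (42, 8), (8, 2), (2, 0); the subtraction version follows the original algorithm and ends at (2, 2).

```python
def gcd_subtract_trace(x, y):
    """Original algorithm: repeated subtraction; returns every (x, y) pair visited."""
    trace = [(x, y)]
    while x != y:
        if x < y:
            y = y - x
        else:
            x = x - y
        trace.append((x, y))
    return trace


def gcd_modulo_trace(x, y):
    """Assumed optimization: order the inputs, then replace (x, y) by (y, x mod y)."""
    if x < y:
        x, y = y, x
    trace = [(x, y)]
    while y != 0:
        x, y = y, x % y
        trace.append((x, y))
    return trace


if __name__ == "__main__":
    print(gcd_subtract_trace(42, 8))
    print(gcd_modulo_trace(42, 8))
```

The remainder-based loop visits far fewer (x, y) pairs, at the price of needing a modulo unit in the datapath instead of a subtractor.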
Optimizing the FSMD
• Areas of possible improvements
– merge states
– separate states
• states which require complex operations (a*b*c*d)

Optimizing the FSMD (cont.)
• eliminate state 1 – outgoing transitions have constant values
• merge state 5 and state 6 – transitions from state 6 can be done in state 5
[Figure: original FSMD and optimized FSMD. The optimized version keeps state 2 (waiting on go_i), tests x!=y and x<y together, and keeps 7: y = y - x, 8: x = x - y and 9: d_o = x.]

– Rather than algorithm
– Cycle timing often too central to functionality
• Example
– Bridge that converts two 4-bit inputs, arriving one at a time over data_in along with a rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse.
– Exercise: complete the design

Bridge
[Figure: black-box view with inputs rdy_in, data_in(4) and clock, and outputs rdy_out and data_out(8) going to the receiver.]
Outputs
    rdy_out: bit; data_out: bit[8]
Variables
    data_lo, data_hi: bit[4];
(a) Controller
[Figure: states WaitFirst4, RecFirst4Start (data_lo_ld=1), RecFirst4End, WaitSecond4, RecSecond4Start (data_hi_ld=1, data_hi=data_in), RecSecond4End, a send state (data_out=data_hi & data_lo, rdy_out=1) and Send8End (rdy_out=0), with transitions on rdy_in=0 and rdy_in=1.]
(b) Datapath
[Figure: registers data_lo and data_hi, loaded from data_in(4) under data_lo_ld and data_hi_ld, drive data_out.]
Summary
• Custom single-purpose processors
– Straightforward design techniques
– Can be built to execute algorithms
– Typically start with FSMD
– CAD tools can be of great assistance

Chapter 3 General-Purpose Processors: Software
Control Unit Sub-Operations
• Fetch
– Get next instruction into IR
– PC: program counter, always points to next instruction
– IR: holds the fetched instruction
• Decode
– Determine what the instruction means
• Fetch operands
– Move data from memory to datapath register
• Execute
– This particular instruction does nothing during this sub-operation
• Store results
– This particular instruction does nothing during this sub-operation

[Figure, repeated for each sub-operation: processor with control unit (controller, PC = 100, IR = load R0, M[500]) and datapath (ALU, registers R0 and R1, control/status lines), connected to I/O and memory.]
Program and data in memory:
    100 load R0, M[500]
    101 inc R1, R0
    102 store M[501], R1
    500 10
    501
After the three instructions, R0 = 10, R1 = 11 and M[501] = 11.
Instruction Cycles
[Figure: three instruction cycles, each made of the sub-operations Fetch, Decode, Fetch ops, Exec. and Store results, one per clk cycle. PC=100: load R0, M[500] leaves R0 = 10. PC=101: inc R1, R0 passes R0 through the ALU (+1) and leaves R1 = 11. PC=102: store M[501], R1 leaves M[501] = 11.]
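The instruction cycle for this three-instruction program can be mimicked with a short interpreter. This is a Python sketch under stated assumptions: instructions are stored as tuples rather than encoded words, and inc R1, R0 is taken to mean R1 = R0 + 1, as the register values in the figure suggest.

```python
PROGRAM = {
    100: ("load", "R0", 500),     # load R0, M[500]
    101: ("inc", "R1", "R0"),     # inc R1, R0
    102: ("store", 501, "R1"),    # store M[501], R1
}


def run_program(program, memory, start_pc=100):
    """Run until the PC leaves the program; returns (registers, memory, pc)."""
    regs = {"R0": 0, "R1": 0}
    pc = start_pc
    while pc in program:
        ir = program[pc]                      # fetch: IR = M[PC]
        pc += 1                               # PC now points to the next instruction
        op, *args = ir                        # decode
        if op == "load":
            regs[args[0]] = memory[args[1]]   # fetch operands: memory to register
        elif op == "inc":
            regs[args[0]] = regs[args[1]] + 1 # execute: through the ALU
        elif op == "store":
            memory[args[0]] = regs[args[1]]   # store results: register to memory
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return regs, memory, pc


if __name__ == "__main__":
    print(run_program(PROGRAM, {500: 10}))
```

Each instruction uses only some of the five sub-operations, which is why the slides note that a given instruction "does nothing" during the others.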
[Figure: Harvard architecture (processor with separate program memory and data memory) and Princeton architecture (processor with one memory for program and data).]
– Harvard: simultaneous program and data memory access

• Memory access may …
– Holds copy of part of memory
– Hits and misses
[Figure: processor with a fast/expensive technology memory, usually on the same chip, in front of a slower/cheaper technology memory, usually on a different chip.]
Pipelining: Increasing Instruction Throughput
[Figure: non-pipelined and pipelined dish cleaning, with Wash and Dry stages for dishes 1 to 8 over time.]
[Figure: pipelined instruction execution, with stages Fetch-instr., Decode, Fetch ops., Execute and Store res. for instructions 1 to 8 over time.]

Superscalar and VLIW Architectures
• Performance can be improved by:
– Faster clock (but there's a limit)
– Pipelining: slice up instruction into stages, overlap stages
– Multiple ALUs to support more than one instruction stream
• Superscalar
– Scalar: non-vector operations
– Fetches instructions in batches, executes as many as possible
» May require extensive hardware to detect independent instructions
– VLIW: each word in memory has multiple independent instructions
» Relies on the compiler to detect and schedule instructions
» Currently growing in popularity

• Programmer doesn't need detailed understanding of architecture, may need to know architectural abstraction
– Instead, needs to know what instructions can be executed
• Two levels of instructions:
– Assembly level
– Structured languages (C, C++, Java, etc.)
• Most development today done using structured languages
– But, some assembly level programming may still be necessary
– Drivers: portion of program that communicates with and/or controls (drives) another device
• Often have detailed timing considerations, extensive bit manipulation
• Assembly level may be best for these

• Instruction Set
– Defines the legal set of instructions for that processor
• Data transfer: memory/register, register/register, I/O, etc.
• Arithmetic/logical: move register through ALU and back
• Branches: determine next PC value when not just PC+1
[Figure: instructions stored in memory (Instruction 1, …, Instruction 3, Instruction 4, ...), each with opcode, operand1 and operand2 fields.]
[Figure: register-indirect addressing. The operand field holds a register address, the register holds a memory address, and memory holds the data.]
    MOV Rn, #immed.    0011 Rn immediate    Rn = immediate
Sample Programs
[Figure: C program and equivalent assembly program.]

Programmer Considerations

Microprocessor Architecture Overview
• If you are using a particular microprocessor, now is a good time to review its architecture

Example: parallel port driver
    LPT Connection Pin    I/O Direction    Register Address
    1                     Output           0th bit of register #2
    2-9                   Output           0th to 7th bit of register #0
    10,11,12,13,15        Input            6,7,5,4,3th bit of register #1
    14,16,17              Output           …
[Figure: PC parallel port with a switch on pin 13 and an LED on pin 2.]

Parallel Port Example
    ; This program consists of a sub-routine that reads
    ; the state of the input pin, determining the on/off state
    ; of our switch and asserts the output pin, turning the LED
    ; on/off accordingly
    .386
    CheckPort proc
        push ax              ; save the content
        push dx              ; save the content
        mov  dx, 3BCh + 1    ; base + 1 for register #1
        in   al, dx          ; read register #1
        ...
    SwitchOn:
        mov  dx, 3BCh + 0    ; base + 0 for register #0
        in   al, dx          ; read the current state of the port
        or   al, 01h         ; set first bit (masking)
        out  dx, al          ; write it out to the port
        ...

    extern "C" CheckPort(void);   // defined in assembly
    void main(void) {
        while( 1 ) {
            CheckPort();
        }
    }

Operating System
• Optional software layer providing low-level services to a program (application).
– File management, disk …
– Scheduling multiple programs for execution
    DB file_name "out.txt"   -- store file name
    L2:
Development Environment
• Development processor
– The processor on which we write and debug our programs
• Usually a PC
• Target processor
– The processor that the program will run on in our embedded system
• Often different from the development processor

Software Development Process
• Compilers
– Cross compiler
• Runs on one processor, but generates code for another
• Assemblers
• Linkers
• Debuggers
• Profilers
[Figure: C files and an asm. file pass through the compiler and assembler to binary files; the linker combines them with a library into an exec. file (implementation phase); the debugger and profiler are used in the verification phase.]
[Figure: development processor and target processor.]
• ISS
– Gives us control over time – set breakpoints, look at register values, set values, step-by-step execution, ...
– But, doesn't interact with real environment
• Download to board
– Use device programmer
– Runs in real environment, but not controllable
• Compromise: emulator
– Runs in real environment, at speed or near
– Supports some controllability from the PC
[Figure: implementation phase and verification phase on the development processor (debugger / ISS), with external tools: emulator and programmer.]

• General-purpose processors
– Sometimes too general to be effective in demanding application
• e.g., video processing – requires huge video buffers and operations on large arrays of data, inefficient on a GPP
– But single-purpose processor has high NRE, not programmable
• ASIPs – targeted to a particular domain
– Contain architectural features specific to that domain
• e.g., embedded control, digital signal processing, video processing, network processing, telecommunications, etc.
– Still programmable
A Common ASIP: Microcontroller
• For embedded control applications
– Reading sensors, setting actuators
– Mostly dealing with events (bits): data is present, but not in huge amounts
– e.g., VCR, disk drive, digital camera (assuming SPP for image compression), washing machine, microwave oven
• Microcontroller features
– On-chip peripherals
• Timers, analog-digital converters, serial communication, etc.
• Tightly integrated for programmer, typically part of register space
– On-chip program and data memory
– Direct programmer access to many of the chip's pins
– Specialized instructions for bit-manipulation and other low-level operations

Another Common ASIP: Digital Signal Processors (DSP)
• For signal processing applications
– Large amounts of digitized data, often streaming
– Data transformations must be applied fast
– e.g., cell-phone voice filter, digital TV, music synthesizer
• DSP features
– Several instruction execution units
– Multiply-accumulate single-cycle instruction, other instrs.
– Efficient vector operations – e.g., add two arrays
• Vector ALUs, loop buffers, etc.
    Processor          Clock speed  Periph.                                           Bus width  MIPS  Power   Trans.  Price
    StrongARM SA-110   233 MHz      None                                              32         268   1W      2.1M    NA
    Microcontroller
    Intel 8051         12 MHz       4K ROM, 128 RAM, 32 I/O, Timer, UART              8          ~1    ~0.2W   ~10K    $7
    Motorola 68HC811   3 MHz        4K ROM, 192 RAM, 32 I/O, Timer, WDT, SPI          8          ~.5   ~0.1W   ~10K    $5
    Digital Signal Processors
    TI C5416           160 MHz      128K, SRAM, 3 T1 Ports, DMA, 13 …                 16/32      ~600  NA      NA      $34

– But instructive to see how simply we can build one top down
– Remember that real processors aren't …

[Figure: FSMD states, each returning to Fetch]
    0001  Mov2   M[dir] = RF[rn]
    0010  Mov3   M[rn] = RF[rm]
    0011  Mov4   RF[rn] = imm
    0100  Add    RF[rn] = RF[rn]+RF[rm]
Aliases:
    op   IR[15..12]
    rn   IR[11..8]
    rm   IR[7..4]
    dir  IR[7..0]
    imm  IR[7..0]
    rel  IR[7..0]
Architecture of a Simple Microprocessor
• Storage devices for each declared variable
– register file holds each of the variables
• Functional units to carry out the FSMD operations
– One ALU carries out every required operation
• Connections added among the components' ports corresponding to the operations required by the FSM
• Unique identifiers created for every control signal
[Figure: control unit and datapath. The controller (next-state and control logic; state register) drives control signals to all input controls and receives signals from all output controls; PC (PCld, PCinc, PCclr) and IR (Irld) are 16 bits wide; a 3x1 mux (Ms) selects the memory address; memory has A and D ports with Mre and Mwe; the datapath has a 2x1 mux (RFs), a 16-entry register file RF (RFwa, RFwe, RFr1a, RFr1e, RFr2a, RFr2e, outputs RFr1 and RFr2) and an ALU (ALUs, ALUz).]

A Simple Microprocessor
FSM operations that replace the FSMD operations after a datapath is created:
    State         FSMD operation               FSM operations
    Reset         PC=0;                        PCclr=1;
    Fetch         IR=M[PC]; PC=PC+1            MS=10; Irld=1; Mre=1; PCinc=1;
    Decode        from states below
    Mov1          RF[rn] = M[dir]              RFwa=rn; RFwe=1; RFs=01;
    Mov2  (0001)  M[dir] = RF[rn]              RFr1a=rn; RFr1e=1; Ms=01; Mwe=1;
    Mov3  (0010)  M[rn] = RF[rm]               RFr1a=rn; RFr1e=1; Ms=10; Mwe=1;
    Mov4  (0011)  RF[rn] = imm                 RFwa=rn; RFwe=1; RFs=10;
    Add   (0100)  RF[rn] = RF[rn]+RF[rm]       RFwa=rn; RFwe=1; RFs=00; RFr1a=rn; RFr1e=1; RFr2a=rm; RFr2e=1; ALUs=00
    Sub   (0101)  RF[rn] = RF[rn]-RF[rm]       RFwa=rn; RFwe=1; RFs=00; RFr1a=rn; RFr1e=1; RFr2a=rm; RFr2e=1; ALUs=01
    Jz    (0110)  PC=(RF[rn]=0) ? rel : PC     PCld=ALUz; RFr1a=rn; RFr1e=1;
Each instruction state returns to Fetch.

You just built a simple microprocessor!
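The instruction set above is small enough to interpret in a few lines. The following Python sketch is an instruction-level model, not the control-signal-level design; it assumes a single 256-word memory for program and data, opcode 0000 for Mov1, a register-indirect reading of Mov3 (M[RF[rn]] = RF[rm]), and that execution stops when the PC runs past the program, none of which is stated on the slides.

```python
MOV1, MOV2, MOV3, MOV4, ADD, SUB, JZ = range(7)


def encode(op, rn, low8):
    """Pack op = IR[15..12], rn = IR[11..8] and the low byte (dir, imm, rel, or rm << 4)."""
    return (op << 12) | (rn << 8) | (low8 & 0xFF)


def run(program, max_steps=1000):
    """Fetch, decode and execute until the PC leaves the program; returns (rf, mem)."""
    mem = [0] * 256
    mem[:len(program)] = program
    rf = [0] * 16
    pc = 0
    for _ in range(max_steps):
        if pc >= len(program):
            break
        ir = mem[pc]                                   # Fetch: IR = M[PC]
        pc += 1                                        #        PC = PC + 1
        op, rn = ir >> 12, (ir >> 8) & 0xF             # Decode
        rm, low8 = (ir >> 4) & 0xF, ir & 0xFF
        if op == MOV1:
            rf[rn] = mem[low8]                         # RF[rn] = M[dir]
        elif op == MOV2:
            mem[low8] = rf[rn]                         # M[dir] = RF[rn]
        elif op == MOV3:
            mem[rf[rn] & 0xFF] = rf[rm]                # assumed register-indirect store
        elif op == MOV4:
            rf[rn] = low8                              # RF[rn] = imm
        elif op == ADD:
            rf[rn] = (rf[rn] + rf[rm]) & 0xFFFF
        elif op == SUB:
            rf[rn] = (rf[rn] - rf[rm]) & 0xFFFF
        elif op == JZ:
            if rf[rn] == 0:
                pc = low8                              # PC = rel when RF[rn] is zero
        else:
            raise ValueError(f"undefined opcode {op}")
    return rf, mem


if __name__ == "__main__":
    prog = [encode(MOV4, 0, 5), encode(MOV4, 1, 3),
            encode(ADD, 0, 1 << 4), encode(MOV2, 0, 200)]
    rf, mem = run(prog)
    print(rf[0], mem[200])
```

An instruction-level model like this is the software counterpart of an instruction-set simulator: it gives step-by-step control but says nothing about cycle timing or control signals.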
Chapter Summary
• General-purpose processors
– Good performance, low NRE, flexible
• Controller, datapath, and memory
• Structured languages prevail
– But some assembly level programming still necessary
• Many tools available
– Including instruction-set simulators, and in-circuit emulators
• ASIPs
– Microcontrollers, DSPs, network processors, more customized ASIPs
• Choosing among processors is an important step
• Designing a general-purpose processor is conceptually the same as designing a single-purpose processor

Chapter 4 Standard Single Purpose Processors: Peripherals
Introduction
• Single-purpose processors
– Performs specific computation task
– Custom single-purpose processors
• Designed by us for a unique task
• "Off-the-shelf" -- pre-designed for a common task

Timers
• Timer: measures time intervals
– To generate timed output events
• e.g., hold traffic light green for 10 s
– To measure input events
• Based on counting clock pulses
[Figure: basic timer. A 16-bit up counter driven by Clk, with 16-bit output Cnt and output Top.]
Other timer structures
Counters • Interval timer Timer with a terminal
count
– Indicates when 16-bit up
Timer/counter Clk
• Counter: like a timer, but desired time interval counter 16 Cnt
has passed
counts pulses on a general Clk 2x1 mux 16-bit up counter
16
Cnt
– We set terminal count Reset
input signal rather than to desired interval =
16/32-bit timer
Top 16-bit up
clock Cnt_in Top
• Number of clock
Clk
counter 16 Cnt1
– e.g., count cars passing over Reset cycles = Desired Terminal count
Top2
Top1
indicator reaction
Reaction Timer Watchdog timer
light button
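The terminal count of an interval timer follows from dividing the desired interval by the clock period. A minimal sketch in C, assuming times are given in nanoseconds and the counter is 16 bits wide; the function name terminal_count and the 10 ns clock used in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Computes the terminal count for an interval timer.
   Returns 1 and stores the count if it fits in 16 bits, otherwise returns 0. */
int terminal_count(uint64_t interval_ns, uint32_t clk_period_ns, uint16_t *count) {
    assert(clk_period_ns > 0 && count != 0);
    uint64_t cycles = interval_ns / clk_period_ns;  /* number of clock cycles */
    if (cycles > 0xFFFFu) return 0;                 /* does not fit a 16-bit counter */
    *count = (uint16_t)cycles;
    return 1;
}
```

With a 10 ns clock, a 200 microsecond interval needs 20,000 cycles and fits. The 10 s traffic-light interval does not fit in 16 bits at that clock rate, which is the situation the wider 16/32-bit timer structure addresses.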
MAX-233
– bit rate usually higher
COM1 registers: 0x3F8 to 0x3FE
Address  Register
0x3F8    Data
0x3F9    Interrupt Enable Register
0x3FA    Interrupt Identification Register
0x3FB    Line Control Register
0x3FC    Modem Control Register
0x3FD    Line Status Register
0x3FE    Modem Status Register
With DLAB=1 the divider latch is accessed at 0x3F8 (LSB) and 0x3F9 (MSB).
Baud rate  Hex No.
110        0x417
300        0x180
1200       0x060
2400       0x030
4800       0x018
9600       0x00C
19200      0x006
Line Control Register (0x3FB), bits D7 to D0
D7     DLAB: 1 to access divider latch
D4     parity: 0 odd, 1 even
D3     PE (parity enable)
D2     stop bits: 0 = 1 stop bit, 1 = 2 bits
D1-D0  word length: 00 5 bits, 01 6 bits, 10 7 bits, 11 8 bits
Line Status Register (0x3FD)
D7  D6    D5    D4  D3  D2  D1  D0
0   TSRE  THRE  BI  FE  PE  OE  RxRDY
#include<iostream.h>
#include<stdio.h>
#include<conio.h>
#include<dos.h>
#define port 0x3F8 //com1 (0x3F8 to 0x3FE); use 0x2F8 for com2
main()
{
int k,j,t;
clrscr();
outportb(port+3,0x80); //Line control (0x3FB): DLAB=1, baud rate specifier
outportb(port+0,0x30); //Baud rate LSB (0x3F8): 0x0030 = 2400 baud
outportb(port+1,0x00); //Baud rate MSB (0x3F9)
outportb(port+3,0x03); //Line control: 8 bits, 1 stop bit, no parity
outportb(port+2,0x00); //Interrupt identification Register
outportb(port+4,0x0b); //modem control
...
}
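The baud-rate table and the Line Control Register fields above can be computed rather than looked up. The sketch below assumes the standard PC UART base rate of 115200 (so that divider = 115200 / baud) and uses hypothetical helper names uart_divisor and uart_lcr; it only builds the register values and does not touch any I/O port.

```c
#include <assert.h>
#include <stdint.h>

#define UART_BASE_RATE 115200UL  /* assumed: standard PC UART clock divided by 16 */

typedef enum { PARITY_NONE, PARITY_ODD, PARITY_EVEN } Parity;

/* Divider latch value for a given baud rate (integer division). */
uint16_t uart_divisor(uint32_t baud) {
    assert(baud > 0 && baud <= UART_BASE_RATE);
    return (uint16_t)(UART_BASE_RATE / baud);
}

/* Line Control Register value built from the bit fields in the table. */
uint8_t uart_lcr(unsigned data_bits, unsigned stop_bits, Parity parity, int dlab) {
    assert(data_bits >= 5 && data_bits <= 8);
    assert(stop_bits == 1 || stop_bits == 2);
    unsigned lcr = data_bits - 5;               /* D1-D0: word length */
    if (stop_bits == 2)        lcr |= 1u << 2;  /* D2: stop bits */
    if (parity != PARITY_NONE) lcr |= 1u << 3;  /* D3: parity enable */
    if (parity == PARITY_EVEN) lcr |= 1u << 4;  /* D4: even parity */
    if (dlab)                  lcr |= 1u << 7;  /* D7: DLAB */
    return (uint8_t)lcr;
}
```

For example, uart_divisor(2400) gives the 0x30 written to the LSB register in the program above, and uart_lcr(8, 1, PARITY_NONE, 0) gives the 0x03 control word.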
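TEMPERATURE CONTROL USING THYRISTOR (AC VOLTAGE CONTROLLER)
[Figure: zero-crossing detector built around a PC817 optocoupler (+5V, 9V, 10K, 1K) feeding the zero-crossing pulse (ZCP) to the INT input of the microcontroller; output RC5 drives a MOC3020 with LED, 0.1uF and a BTA16 triac between the AC supply and the heating load]
[Figure: waveforms of the AC mains voltage, the ZCP and the load voltage with firing delay a; the timer is loaded at each zero crossing and overflows (Timer OF) after the delay]
• Example delay: 5 ms = 5000 instruction cycles (IC) if 1 IC = 1 µs
• 16 bit up counter loaded with TC = 65535 - N (N = delay in instruction cycles)
CONSTANT CURRENT SOURCE USING PWM
[Figure: microcontroller output RC5 (pwm_o) driving Q2 (BU508) through R1, R2 and C1 (CAP, 1E) into the LOAD (V), with ADC feedback]
• 25% duty cycle – average pwm_o is 1.25V
• 50% duty cycle – average pwm_o is 2.5V.
[Figure: clk and pwm_o waveforms for each duty cycle]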
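The two calculations on these slides are short enough to write out. The sketch below uses the slide's formula TC = 65535 - N for the timer reload and computes the average PWM output as duty cycle times supply voltage; the 5 V supply is inferred from the slide's 25% and 50% figures, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reload value for a 16-bit up counter, using the slide's TC = 65535 - N.
   With this value the counter wraps to 0 on the count after 65535. */
uint16_t timer_reload(uint16_t n_cycles) {
    return (uint16_t)(65535u - n_cycles);
}

/* Average PWM output in millivolts: duty cycle (percent) times supply voltage. */
uint32_t pwm_average_mv(uint32_t duty_percent, uint32_t supply_mv) {
    assert(duty_percent <= 100);
    return supply_mv * duty_percent / 100;
}
```

A 5 ms firing delay at one instruction cycle per microsecond gives timer_reload(5000), that is 60535. With a 5000 mV supply, duty cycles of 25% and 50% give 1250 mV and 2500 mV.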
CODE (HEX) COMMAND TO LCD INSTRUCTION REGISTER
1 Clear display screen
2 Return home
4 Decrement cursor (shift cursor to left)
6 Increment cursor (shift cursor to right)
5 Shift display right
7 Shift display left
8 Display off, cursor off
A Display off, cursor on
C Display on, cursor off
E Display on, cursor on (not blinking)
F Display on, cursor blinking
10 Shift cursor position to left
14 Shift cursor position to right
18 Shift the entire display to the left
1C Shift the entire display to the right
80 Force cursor to beginning of 1st line
C0 Force cursor to beginning of 2nd line
38 2 lines and 5x7 matrix
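Several codes in this table are bit fields rather than arbitrary numbers. Assuming an HD44780-compatible controller (the codes 0x38, 0x80 and 0xC0 match that family, but the slide does not name the part), the display-control commands have the form 0000 1DCB and the cursor-position commands add a column offset to 0x80 or 0xC0. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Display on/off control command: 0000 1DCB
   D = display on, C = cursor on, B = cursor blinking. */
uint8_t lcd_display_control(int display_on, int cursor_on, int blink_on) {
    return (uint8_t)(0x08 | (display_on ? 0x04 : 0)
                          | (cursor_on  ? 0x02 : 0)
                          | (blink_on   ? 0x01 : 0));
}

/* Command that moves the cursor to a column (from 0) on line 1 or line 2. */
uint8_t lcd_goto(unsigned line, unsigned column) {
    assert((line == 1 || line == 2) && column < 40);
    return (uint8_t)((line == 1 ? 0x80 : 0xC0) + column);
}
```

lcd_display_control(1, 0, 0) reproduces code C (display on, cursor off), and lcd_goto(2, 0) reproduces code C0 (beginning of 2nd line).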
Keypad controller
[Figure: N=4, M=4 keypad (rows R1-R4, columns C1-C4) with lines N1-N4 and M1-M4 feeding the keypad controller, which outputs k_pressed and a 4-bit key_code]
Algorithm for keyboard-Display
• Wait until all keys are open.
• Check for any key press.
• Wait for around 10 ms (key debounce).
• Identify the key pressed by scanning each row, one at a time.
• Assign key code.
• Display the key pressed on the 7-segment display.
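The scanning and debounce steps of the algorithm above can be sketched in C without hardware access. In this sketch the caller supplies the column pattern read while each row is driven; on a real board that array would be filled by port reads. The key-code assignment (row * 4 + column) and the function names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define ROWS 4
#define COLS 4

/* row_cols[r] holds the 4-bit column pattern read while row r is driven
   (bit c set means the key at row r, column c is closed).
   Returns the key code 0..15, or -1 if all keys are open. */
int keypad_scan(const uint8_t row_cols[ROWS]) {
    assert(row_cols != 0);
    for (unsigned r = 0; r < ROWS; r++)
        for (unsigned c = 0; c < COLS; c++)
            if (row_cols[r] & (1u << c))
                return (int)(r * COLS + c);
    return -1;
}

/* Accepts a key only if two scans, taken about 10 ms apart, agree. */
int keypad_debounced(const uint8_t first[ROWS], const uint8_t second[ROWS]) {
    int k1 = keypad_scan(first);
    int k2 = keypad_scan(second);
    return (k1 >= 0 && k1 == k2) ? k1 : -1;
}
```

Comparing two scans is one simple way to realize the 10 ms debounce wait: a contact that bounces open again between the scans is rejected.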
Stepper Motor
• A stepper motor is an electromechanical device which converts electrical pulses into discrete rotational steps.
Note: In half-step excitation mode the motor rotates by half the specified step resolution per pulse. For example, if the step resolution is 1.8 degrees, each pulse moves the motor 0.9 degrees in this mode. Step resolution is the angle the motor rotates on receiving one pulse. With a step resolution of 1.8 degrees, the motor takes 200 pulses to complete one revolution (360 degrees).
Stepper motor with controller (driver)
[Figure: 8051 pins P1.0 and P1.1 driving the CW’/CCW (pin 10) and CLK (pin 7) inputs of an MC3479P stepper motor driver, whose outputs A’ (2), A (3), B’ (14) and B (15) connect to the stepper motor]
/* main.c */
#include <reg51.h>   /* Keil C51 SFR definitions (P1) */
sbit clk=P1^1;
sbit cw=P1^0;
void delay(void){
    int i, j;
    for (i=0; i<1000; i++)
        for ( j=0; j<50; j++)
            i = i + 0;
}
void main(void){
    /* turn the motor forward */
    cw=0; /* set direction */
    clk=0; /* pulse clock */
    delay();
    clk=1;
    /* turn the motor backwards */
    cw=1; /* set direction */
    clk=0; /* pulse clock */
    delay();
    clk=1;
}
Driving the windings directly from a port
Flowchart: PORT=Sequence1 (6), Delay, PORT=Sequence2 (3), Delay, PORT=Sequence3 (9), Delay, PORT=Sequence4 (C), Delay
        MOVLW N          ; N = 50 (decimal)
        MOVWF COUNT
go:     MOVLW 0X06       ; sequence 1
        MOVWF PORTB
        call delay
        MOVLW 0X03       ; sequence 2
        MOVWF PORTB
        call delay
        MOVLW 0X09       ; sequence 3
        MOVWF PORTB
        call delay
        MOVLW 0X0C       ; sequence 4
        MOVWF PORTB
        call delay
        DECFSZ COUNT,1   ; repeat N times
        GOTO go
        RETURN
Rotating a stored pattern (01100110 = 66 hex)
        ...
        RRF SEQ,1
        CALL delay
        GOTO back
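The rotate fragment above works because the byte 01100110 (0x66) produces the four winding patterns 6, 3, 9, C in its low nibble as it is rotated right. A minimal C sketch of that idea follows; it uses a plain 8-bit rotation, whereas the PIC RRF instruction rotates through the carry flag, so a real PIC routine would also have to manage the carry. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rotates an 8-bit pattern right by one bit (bit 0 wraps around to bit 7). */
uint8_t rotate_right8(uint8_t v) {
    return (uint8_t)((v >> 1) | ((v & 1u) << 7));
}

/* Low nibble of the pattern: the value that would drive the four windings. */
uint8_t winding_bits(uint8_t pattern) {
    return (uint8_t)(pattern & 0x0F);
}
```

Starting from 0x66, successive rotations give 0x33, 0x99 and 0xCC before returning to 0x66, so one byte and one rotate instruction replace the four explicit MOVLW/MOVWF pairs.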
Analog-to-digital conversion
using successive approximation
Given an analog input signal whose voltage should range from 0 to 15 volts, and an 8-bit digital encoding, calculate the correct encoding for
5 volts. Then trace the successive-approximation approach to find the correct encoding.
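For the exercise above, the ratio 5/15 = d/255 gives d = 85, or 01010101 in binary. The successive-approximation trace can be sketched in C as below: each bit, from most to least significant, is kept only if the trial code's voltage does not exceed the input. Voltages are passed in millivolts to stay in integer arithmetic; the function name sar_convert is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit successive approximation of vin on a 0..vmax scale. */
uint8_t sar_convert(uint32_t vin_mv, uint32_t vmax_mv) {
    assert(vmax_mv > 0 && vin_mv <= vmax_mv);
    uint8_t code = 0;
    for (int bit = 7; bit >= 0; bit--) {
        uint8_t trial = (uint8_t)(code | (1u << bit));
        /* Keep the bit if trial/255 * vmax <= vin, compared without division. */
        if ((uint64_t)trial * vmax_mv <= (uint64_t)vin_mv * 255u)
            code = trial;
    }
    return code;
}
```

For 5 V on a 15 V scale the trial codes are 128 (rejected), 64 (kept), 96 (rejected), 80 (kept), 88 (rejected), 84 (kept), 86 (rejected) and 85 (kept), ending at 85.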
Outline
• Memory Write Ability and Storage Permanence
• Common Memory Types
• Composing Memory
• Memory Hierarchy and Cache
• Advanced RAM
Introduction
• Embedded system’s functionality aspects
– Processing
• processors
• transformation of data
– Storage
• memory
• retention of data
– Communication
• buses
• transfer of data
• Stores large number of bits
– m x n: m words of n bits each
– k = Log2(m) address input signals
[Figure: m × n memory, m words]
• Traditional ROM/RAM distinctions
– ROM
• read only, bits stored without power
[Figure: storage permanence chart with labels Mask-programmed ROM, OTP ROM, Ideal memory, Life of product]
ROM: “Read-Only” Memory
• Nonvolatile memory
• Can be read from but not written to, by a processor in an embedded system
• Traditionally written to (“programmed”) before being used in the embedded system
• Uses
– Store software program for general-purpose processor
• program instructions can be one or more ROM words
[Figure: external view: 2^k × n ROM with enable, address inputs A0 … Ak-1 and data outputs]
Example: 8 x 4 ROM
• Horizontal lines = words
• Vertical lines = data
• Lines connected only at circles
• Decoder sets word 2’s line to 1 when the address input is 010
• Data lines Q3 and Q1 are set to 1 because there is a “programmed” connection with word 2’s line
• Word 2 is not connected with data lines Q2 and Q0
[Figure: internal view of 8 × 4 ROM: enable, 3×8 decoder with inputs A0, A1, A2, word lines word 0, word 1, word 2, …, data outputs Q3 Q2 Q1 Q0]
Implementing combinational function
• Any combinational circuit of n functions of same k variables can be done with 2^k x n ROM
Truth table
Inputs (address)  Outputs
a b c             y z
0 0 0             0 0
0 0 1             0 1
0 1 0             0 1
0 1 1             1 0
1 0 0             1 0
1 0 1             1 1
1 1 0             1 1
1 1 1             1 1
[Figure: 8×2 ROM with enable and address inputs c, b, a; word 0 to word 7 hold the y z output columns]
Mask-programmed ROM
• Connections “programmed” at fabrication
– set of masks
• Lowest write ability
– only once
• Highest storage permanence
– bits never change unless damaged
• Typically used for final design of high-volume systems
– spread out NRE cost for a low unit cost
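Implementing a combinational function with a ROM amounts to a table lookup: the inputs form the address and the stored word is the output. The sketch below stores the y z column of the slide's truth table, as read from the slide, in an 8-entry array; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 8x2 ROM contents: the word index is the address abc, each word holds yz. */
static const uint8_t rom_8x2[8] = {
    0x0, 0x1, 0x1, 0x2, 0x2, 0x3, 0x3, 0x3
};

/* Reads the ROM: a is the most significant address bit, c the least. */
uint8_t rom_lookup(unsigned a, unsigned b, unsigned c) {
    assert(a <= 1 && b <= 1 && c <= 1);
    return rom_8x2[(a << 2) | (b << 1) | c];
}

/* Output y is bit 1 of the word, output z is bit 0. */
unsigned out_y(uint8_t word) { return (word >> 1) & 1u; }
unsigned out_z(uint8_t word) { return word & 1u; }
```

No logic minimization is needed: changing the function only means changing the stored words, which is why a 2^k x n ROM can realize any n functions of k variables.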
– user provides file of desired contents of ROM
– file input to machine called ROM programmer
– each programmable connection is a fuse
– ROM programmer blows fuses where connections should not exist
[Figure: floating-gate transistor with floating gate, insulator, source and drain; panel (a), and panel (b) with +15V at the gate]
– (a) Negative charges form a channel between source and drain storing a logic 1
– (b) Large positive voltage at gate causes negative charges to move out of channel and get trapped in floating gate storing a logic 0
– (c) (Erase) Shining UV rays on surface of floating gate …
EEPROM: Electrically erasable programmable ROM
• Programmed and erased electronically
– typically by using higher than normal voltage
– can program and erase individual words
• Better write ability
– can be in-system programmable with built-in circuit to provide higher than normal voltage
• built-in memory controller commonly used to hide details from memory user
– writes very slow due to erasing and programming
• “busy” pin indicates to processor EEPROM still writing
– can be erased and programmed tens of thousands of times
• Similar storage permanence to EPROM (about 10 years)
Flash Memory
• Extension of EEPROM
– Same floating gate principle
– Same write ability and storage permanence
• Fast erase
– Large blocks of memory erased at once, rather than one word at a time
– Blocks typically several thousand bytes large
• Writes to single words may be slower
– Entire block must be read, word updated, then entire block written back
• Used with embedded systems storing large data items in nonvolatile memory
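The read, update and write-back sequence that makes single-word flash writes slow can be sketched with a simulated device. In the model below, erasing a block sets every byte to 0xFF and programming can only clear bits; this is typical flash behavior but an assumption here, as are the 4096-byte block size and all function names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096u
#define NUM_BLOCKS 4u

static uint8_t flash[NUM_BLOCKS * BLOCK_SIZE];  /* simulated flash array */

/* Erase works on a whole block and leaves every byte at 0xFF. */
static void flash_erase_block(uint32_t block) {
    memset(&flash[block * BLOCK_SIZE], 0xFF, BLOCK_SIZE);
}

/* Programming can only change bits from 1 to 0. */
static void flash_program(uint32_t addr, const uint8_t *src, uint32_t len) {
    for (uint32_t i = 0; i < len; i++)
        flash[addr + i] &= src[i];
}

/* Single-byte write: read entire block, update the byte, erase, write back. */
void flash_write_byte(uint32_t addr, uint8_t value) {
    static uint8_t buf[BLOCK_SIZE];
    assert(addr < sizeof flash);
    uint32_t block = addr / BLOCK_SIZE;
    uint32_t base = block * BLOCK_SIZE;
    memcpy(buf, &flash[base], BLOCK_SIZE);  /* entire block read */
    buf[addr - base] = value;               /* word updated */
    flash_erase_block(block);
    flash_program(base, buf, BLOCK_SIZE);   /* entire block written back */
}

uint8_t flash_read_byte(uint32_t addr) {
    assert(addr < sizeof flash);
    return flash[addr];
}
```

Overwriting 0x55 with 0xF0 only works because of the erase step; programming alone would leave 0x50, since bits cannot be set back to 1 without erasing.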
RAM: “Random-access” memory
• Typically volatile memory
– bits are not held without power supply
• Read and written to easily by embedded system during execution
[Figure: external view: 2^k × n read and write memory with r/w, enable, A0 … Ak-1 and Qn-1 … Q0]
Basic types of RAM
• SRAM: Static RAM
– Memory cell uses flip-flop to store bit
– Requires 6 transistors
[Figure: memory cell internals: SRAM cell with Data' and Data lines]
Ram variations
• PSRAM: Pseudo-static RAM
• NVRAM: Nonvolatile RAM
– Holds data after external power removed
Example: HM6264 & 27C256 RAM/ROM devices
• Low-cost low-capacity devices
• First two numeric digits …
[Figure: device pinouts, with data<7…0> on pins 11-13, 15-19 of both devices]
Device  Access Time (ns)  Standby Pwr. (mW)  Active Pwr. (mW)  Vcc Voltage (V)
HM6264  85-100            .01                15                5
27C256  90                .5                 100               5
Example: TC55V2325FF-100 memory device
• 2-megabit synchronous pipelined burst SRAM memory device
• Designed to be …
• … well as single byte I/O
[Figure: device pins data<31…0>, addr<15…0>, addr<10...0>, /CS1, /CS2, CS3, /WE, /OE, MODE, /ADSP, /ADSC, CLK; timing of a single read operation]
Device characteristics
Device           Access Time (ns)  Standby Pwr. (mW)  Active Pwr. (mW)  Vcc Voltage (V)
TC55V2325FF-100  10                na                 1200              3.3
Composing memory
• Memory size needed often differs from size of readily available memories
• When available memory is larger, simply ignore unneeded high-order address bits and higher data lines
• When available memory is smaller, compose several smaller memories into one larger memory
– Connect side-by-side to increase width of words
[Figure: increase number of words: a 2^(m+1) × n ROM built from two 2^m × n ROMs; A0 … Am-1 go to both ROMs and Am drives a 1×2 decoder that enables one of them]
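Both ways of composing memory can be sketched with two small arrays standing in for ROM chips. In the sketch below, the extra address bit Am plays the role of the 1×2 decoder when the number of words is doubled, and the two chips share one address when the word width is doubled. The ROM contents and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define M 3  /* each small ROM has 2^M words of 8 bits */

static const uint8_t rom0[1u << M] = {0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17};
static const uint8_t rom1[1u << M] = {0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27};

/* More words: address bit Am selects which ROM is enabled (the 1x2 decoder). */
uint8_t read_more_words(unsigned addr) {
    assert(addr < (2u << M));
    unsigned low = addr & ((1u << M) - 1u);  /* A0 .. Am-1 go to both ROMs */
    return (addr >> M) ? rom1[low] : rom0[low];
}

/* Wider words: both ROMs see the same address and sit side by side. */
uint16_t read_wider_words(unsigned addr) {
    assert(addr < (1u << M));
    return (uint16_t)((rom1[addr] << 8) | rom0[addr]);
}
```

The first function yields a 16 × 8 memory and the second an 8 × 16 memory from the same two 8 × 8 parts.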
– Tag
• compared with tag stored in cache at …
[Figure: cache entry with valid (V), tag (T) and data (D) fields; outputs Data and miss]
Fully associative mapping
• Complete main memory address stored in each cache address
• All addresses stored in cache simultaneously compared with desired address
Set-associative mapping
• Compromise between direct mapping and fully associative mapping
• Each cache address contains content and tags of 2 or more memory address locations
• Tags of that set simultaneously compared as in fully associative mapping
• Cache with set size N called N-way set-associative
– 2-way, 4-way, 8-way are common
[Figure: address fields Tag, Index, Offset; cache entries with V, T, D fields; comparators (=) producing Valid and Data]
• Larger caches achieve lower miss rates but higher access cost
[Figure: miss-rate plot]
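A 2-way set-associative lookup can be sketched as follows: the index selects a set, and the tags of both entries in that set are compared with the tag of the requested address. The field widths (2 offset bits, 3 index bits), the round-robin replacement and all names are assumptions; hardware compares the tags in parallel, which the loop only imitates.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_BITS 2
#define INDEX_BITS  3
#define SETS (1u << INDEX_BITS)
#define WAYS 2

typedef struct { int valid; uint32_t tag; } Line;
typedef struct {
    Line line[SETS][WAYS];
    unsigned next_victim[SETS];  /* round-robin replacement (assumed policy) */
} Cache;

/* Returns 1 on a hit, 0 on a miss (the missing block is then loaded). */
int cache_access(Cache *c, uint32_t addr) {
    assert(c != 0);
    uint32_t index = (addr >> OFFSET_BITS) & (SETS - 1u);
    uint32_t tag = addr >> (OFFSET_BITS + INDEX_BITS);
    for (unsigned w = 0; w < WAYS; w++)
        if (c->line[index][w].valid && c->line[index][w].tag == tag)
            return 1;
    unsigned v = c->next_victim[index];
    c->line[index][v].valid = 1;
    c->line[index][v].tag = tag;
    c->next_victim[index] = (v + 1u) % WAYS;
    return 0;
}
```

Addresses 0x100 and 0x200 map to the same set but can both stay cached; a third address mapping to that set (0x300) evicts one of them, which a direct-mapped cache would already have done on the second access.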
Advanced RAM
• DRAMs commonly used as main memory in processor based embedded systems
– high capacity, low cost
– FPM DRAM: fast page mode DRAM
– EDO DRAM: extended data out DRAM
Basic DRAM
• Address bus multiplexed between row and column components
• Row and column addresses are latched in, sequentially, by strobing ras and cas signals, respectively
[Figure: basic DRAM with data, address, ras, Refresh Circuit, Row Addr. Buffer, Row Decoder and Bit storage array]
Fast Page Mode DRAM (FPM DRAM)
• Each row of memory bit array is viewed as a page
• Page contains multiple words
• Individual words addressed by column address
• Timing diagram:
– row (page) address sent
– 3 words read consecutively by sending column address for each
• Extra cycle eliminated on each read/write of words from same page
[Figure: timing diagram with ras, cas and address (row, col, col, col)]
Extended data out DRAM (EDO DRAM)
• Improvement of FPM DRAM
• Extra latch before output buffer
– allows strobing of cas before data read operation completed
• Reduces read/write latency by additional cycle
[Figure: timing diagram with ras and cas]
(S)ynchronous and Enhanced Synchronous (ES) DRAM
• SDRAM latches data on active edge of clock
• Eliminates time to detect ras/cas and rd/wr signals
• A counter is initialized to column address then incremented on active edge of clock to access consecutive memory locations
• ESDRAM improves SDRAM
Rambus DRAM (RDRAM)
• More of a bus interface architecture than DRAM architecture
• Data is latched on both rising and falling edge of clock
• Broken into 4 banks each with own row …
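The saving from fast page mode can be illustrated by counting strobes. The sketch below splits each address into a row (page) and a column, and charges one cycle for every ras strobe and one for every cas strobe; a ras strobe is needed only when the row changes. The 8-bit column width and the one-cycle cost per strobe are assumptions, not device timings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COL_BITS 8  /* assumed: low 8 address bits select the column */

/* Counts strobe cycles for a sequence of accesses in fast page mode. */
uint32_t fpm_cycles(const uint16_t *addr, size_t n) {
    assert(addr != 0 || n == 0);
    uint32_t cycles = 0;
    int have_row = 0;
    uint16_t open_row = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t row = (uint16_t)(addr[i] >> COL_BITS);
        if (!have_row || row != open_row) {
            cycles += 1;   /* row (page) address sent, ras strobed */
            open_row = row;
            have_row = 1;
        }
        cycles += 1;       /* column address sent, cas strobed */
    }
    return cycles;
}
```

Three words from the same page cost 4 cycles (row, col, col, col, as in the timing diagram), while three words from different pages cost 6.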
DRAM integration problem
• SRAM easily integrated on same chip as processor
• DRAM more difficult
– Different chip making process between DRAM and conventional logic
– Goal of conventional logic (IC) designers:
• minimize parasitic capacitance to reduce signal propagation delays and power consumption
– Goal of DRAM designers:
• create capacitor cells to retain stored information
– Integration processes beginning to appear
Memory Management Unit (MMU)
• Duties of MMU
– Handles DRAM refresh, bus interface and arbitration
– Takes care of memory sharing among multiple processors
– Translates logical memory addresses from processor to physical memory addresses of DRAM
• Modern CPUs often come with MMU built-in
• Single-purpose processors can be used
Chapter 6 Interfacing
Outline
• Interfacing basics
• Microprocessor interfacing
– I/O Addressing
– Interrupts
– Direct memory access
• Arbitration
• Hierarchical buses
• Protocols
– Serial
– Parallel
– Wireless
– Communication
• Transfer of data between processors and memories
• Implemented using buses
• Called interfacing
[Figure: bus structure]
• Address bus, data bus
– Or, entire collection of wires
• Address, data and control
• Associated protocol: rules for communication
Ports
[Figure: Processor and Memory connected by a bus, each with ports rd'/wr, enable, addr[0-11] and data[0-7]]
• Conducting device on periphery
• Connects bus to processor or memory
• Often referred to as a pin
– Actual pins on periphery of IC package that plug into socket on printed-circuit board
– Sometimes metallic balls instead of pins
– Today, metal “pads” connecting processors and memories within single IC
• Single wire or set of wires with single function
– E.g., 12-wire address port
Timing Diagrams
• Most common method for describing a communication protocol
• Time proceeds to the right on x-axis
• Control signal: low or high
– May be active low (e.g., go’, /go, or go_L)
– Use terms assert (active) and deassert
– Asserting go’ means go=0
• Data signal: not valid or valid
• Protocol may have subprotocols
– Called bus cycle, e.g., read and write
– Each may be several clock cycles
• Read example
– rd’/wr set low, address placed on addr for at least tsetup time before enable asserted, enable triggers memory to place data on data wires by time tread
[Figure: read protocol and write protocol timing diagrams with rd'/wr, enable, addr, data, tsetup, tread and twrite]
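The read and write bus cycles described by the timing diagrams can be sketched as a sequence of signal assignments on a simulated bus. The struct below models the rd'/wr, enable, 12-bit addr and 8-bit data ports; the memory reacts when enable is asserted. Timing (tsetup, tread, twrite) appears only as ordering of the steps, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int rd_wr;            /* rd'/wr: 0 = read, 1 = write */
    int enable;
    uint16_t addr;        /* 12-bit address port */
    uint8_t data;         /* 8-bit data port */
    uint8_t cells[4096];  /* memory contents behind the ports */
} MemBus;

/* Memory side: acts only while enable is asserted. */
static void memory_react(MemBus *b) {
    if (!b->enable) return;
    if (b->rd_wr == 0) b->data = b->cells[b->addr];
    else               b->cells[b->addr] = b->data;
}

/* Read cycle: rd'/wr low and address first, then enable, then sample data. */
uint8_t bus_read(MemBus *b, uint16_t addr) {
    assert(b != 0 && addr < 4096);
    b->rd_wr = 0;
    b->addr = addr;        /* must be stable for tsetup before enable */
    b->enable = 1;
    memory_react(b);       /* memory places data on the bus by tread */
    uint8_t value = b->data;
    b->enable = 0;
    return value;
}

/* Write cycle: rd'/wr high, address and data first, then enable. */
void bus_write(MemBus *b, uint16_t addr, uint8_t value) {
    assert(b != 0 && addr < 4096);
    b->rd_wr = 1;
    b->addr = addr;
    b->data = value;
    b->enable = 1;
    memory_react(b);       /* memory stores the data within twrite */
    b->enable = 0;
}
```

The order of assignments mirrors the timing diagram: control and address lines settle before enable is asserted, and enable is deasserted only after the data has been transferred.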
[Figure: time multiplexing: data serializing (req; data 15:8, then 7:0) and address/data muxing (req; addr/data carrying addr, then data)]
Strobe protocol / Handshake protocol
1. Master asserts req to receive data (first step of both protocols)
[Figure: compromise between strobe and handshake, fast-response case and slow-response case, with req, wait, data and taccess]
• Features
– Compromise …
– CHRDY deasserted, resulting in additional wait cycles (up to 6)
[Figure: ISA memory access timing with CLOCK (C4), D[7-0] DATA, A[19-0] ADDRESS, /MEMR, /MEMW and CHRDY]
Microprocessor interfacing: I/O addressing
• A microprocessor communicates with other devices using some of its pins
– Port-based I/O (parallel I/O)
• Processor’s software reads and writes a port just like a register
• E.g., P0 = 0xFF; v = P1.2; -- P0 and P1 are 8-bit ports
– Bus-based I/O
• Processor has address, data and control ports that form a single bus
• Communication protocol is built into the processor
• A single instruction carries out the read or write protocol on the bus
Compromises/extensions
• Parallel I/O peripheral
– When processor only supports bus-based I/O but parallel I/O needed
– … read/written by the processor
[Figure: adding parallel I/O to a bus-based I/O processor: Processor and Memory on the System bus, with a Parallel I/O peripheral providing Port A, Port B, Port C]
• Extended parallel I/O
– When processor supports port-based I/O but more ports needed
– One or more processor ports interface with parallel I/O peripheral extending total number of ports available for I/O
– e.g., extending 4 ports to 6 ports in figure
[Figure: extended parallel I/O: Processor with Port 0 to Port 3, and a Parallel I/O peripheral providing Port A, Port B, Port C]
[Figure: ISA I/O bus read protocol: CYCLE C1, C2, WAIT, C3, C4 with CLOCK, A[15-0] ADDRESS, ALE, /IOR and CHRDY]
– /IOR distinct from /MEMR for peripheral read
– 16-bit address space for I/O vs. 20-bit address space for memory
– Otherwise very similar to memory protocol
• Interfacing an 8051 to external memory
– Ports P0 and P2 support port-based I/O when 8051 internal memory being used
– Those ports serve as data/address buses when external memory is being used
– 16-bit address and 8-bit data are time multiplexed; low 8-bits of address must therefore be latched with aid of ALE signal
[Figure: 8051 connected to external memory: P0 carries Data and Adr. 7…0 through a 74373 latch (D, Q, G driven by ALE), P2 carries Adr. 15…8; /WR, /RD and /PSEN drive /WE and /OE of an HM6264 (A<0...15>, D<0...7>, /CS1, CS2) and a 27C256 (A<0...14>, /CS)]
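The role of the latch in the 8051 interface can be sketched as follows. While ALE is high, port P0 carries the low address byte and the transparent latch follows it; once ALE goes low, the latch holds that byte and P0 is free to carry data. The full 16-bit address is the latched byte combined with P2. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t q; } Latch;  /* models the 8-bit latch output Q */

/* Transparent latch: Q follows the input while G (driven by ALE) is high. */
void latch_clock(Latch *l, int ale, uint8_t p0) {
    assert(l != 0);
    if (ale) l->q = p0;
}

/* External address: P2 supplies bits 15..8, the latch supplies bits 7..0. */
uint16_t external_address(const Latch *l, uint8_t p2) {
    assert(l != 0);
    return (uint16_t)((p2 << 8) | l->q);
}
```

In the test scenario, P0 first carries 0x34 with ALE high and then 0x99 (data) with ALE low; the address seen by the memory stays 0x1234.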
A more complex memory protocol
[Figure: specification for a single read operation: timing of CLK, /ADSP, /ADSC, /ADV, addr <15…0>, /WE and /OE, annotated "Data is ready here!"]
[Figure: FSM description: state S0 (ADSP=1, ADSC=1, ADV=1, OE=1, Addr = ‘Z’) and state S1 (ADSP=0, ADSC=0, ADV=0, OE=1, Addr = Addr0), with transitions on GO=0 and GO=1; remaining states truncated]
Microprocessor interfacing: interrupts
• Suppose a peripheral intermittently receives data, which must be serviced by the processor
– The processor can poll the peripheral regularly to see if data has arrived – wasteful
– The peripheral can interrupt the processor when it has data
• Requires an extra pin or pins: Int
Interrupt-driven I/O using fixed ISR location
3: After completing instruction at 100, µP sees Int asserted, saves the PC’s value of 100, and sets PC to the ISR fixed location of 16.
4(a): The ISR reads data from 0x8000, modifies the data, and writes the resulting data to 0x8001.
4(b): After being read, P1 deasserts Int.
Program memory
ISR
16: MOV R0, 0x8000
17: # modifies R0
18: MOV 0x8001, R0
19: RETI # ISR return
...
Main program
...
100: instruction
101: instruction
[Figure: µP (PC, Int), program memory and data memory on the system bus, with peripherals P1 (0x8000) and P2 (0x8001)]
Interrupt-driven I/O using vectored interrupt
3: After completing instruction at 100, µP sees Int asserted, saves the PC’s value of 100, and asserts Inta.
4: P1 detects Inta and puts interrupt address vector 16 on the data bus.
Program memory
ISR
16: MOV R0, 0x8000
17: # modifies R0
18: MOV 0x8001, R0
19: RETI # ISR return
...
Main program
...
100: instruction
101: instruction
[Figure: µP (PC, Int, Inta), program memory and data memory on the system bus, with peripherals P1 (0x8000, vector 16) and P2 (0x8001)]
Direct memory access
• Buffering
– Temporarily storing data in memory before processing
– Data accumulated in peripherals commonly buffered
• Microprocessor could handle this with ISR
– Storing and restoring microprocessor state inefficient
– Regular program must wait
• DMA controller more efficient
– Separate single-purpose processor
– Microprocessor relinquishes control of system bus to DMA controller
– Microprocessor can meanwhile execute its regular program
Peripheral to memory transfer without DMA, using vectored interrupt
Time
1(a): µP is executing its main program.
1(b): P1 receives input data in a register with address 0x8000.
2: P1 asserts Int to request servicing by the microprocessor.
3: After completing instruction at 100, µP sees Int asserted, saves the PC’s value of 100, and asserts Inta.
4: P1 detects Inta and puts interrupt address vector 16 on the data bus.
5(a): µP jumps to the address on the bus (16). The ISR there reads data from 0x8000 and then writes it to 0x0001, which is in memory.
5(b): After being read, P1 deasserts Int.
Peripheral to memory transfer without DMA, using vectored interrupt (cont’)
5(a): µP jumps to the address on the bus (16). The ISR there reads data from 0x8000 and then writes it to 0x0001, which is in memory.
5(b): After being read, P1 de-asserts Int.
6: The ISR returns, thus restoring PC to 100+1=101, where µP resumes executing.
Program memory
ISR
16: MOV R0, 0x8000
17: # modifies R0
18: MOV 0x0001, R0
19: RETI # ISR return
...
Main program
...
100: instruction
101: instruction
[Figure: µP (PC 100 + 1, Int, Inta), program memory and data memory (0x0000, 0x0001) on the system bus, with peripheral P1 (0x8000, vector 16)]
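The numbered steps of the vectored-interrupt transfer can be traced in a small simulation. The sketch below models only the state the steps mention: the PC, the Int line, the vector that P1 supplies, P1's data register at 0x8000 and the memory location 0x0001. It is a sequential model of the handshake, not a processor implementation, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t data_reg; int int_req; uint8_t vector; } Peripheral;
typedef struct { uint16_t pc; uint16_t saved_pc; uint8_t data_mem[2]; } Micro;

/* Called after the current instruction completes.
   Returns 1 if an interrupt was serviced, 0 if Int was not asserted. */
int service_interrupt(Micro *up, Peripheral *p1) {
    assert(up != 0 && p1 != 0);
    if (!p1->int_req) return 0;
    up->saved_pc = up->pc;                   /* 3: save PC, assert Inta */
    up->pc = p1->vector;                     /* 4, 5(a): jump to vector from P1 */
    up->data_mem[1] = p1->data_reg;          /* ISR: read 0x8000, write 0x0001 */
    p1->int_req = 0;                         /* 5(b): P1 deasserts Int */
    up->pc = (uint16_t)(up->saved_pc + 1);   /* 6: ISR returns, PC = 100+1 */
    return 1;
}
```

The saved PC and the return step are the overhead the DMA slides refer to: the main program makes no progress while the ISR moves the data.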
3: DMA ctrl asserts Dreq to request control of system bus
4: … µP asserts Dack and resumes execution; µP stalls only if it needs the system bus to continue executing.
No ISR needed!
[Figure: µP (PC, Dreq, Dack), program memory (main program at 100, 101) and data memory on the system bus, with DMA ctrl (registers 0x0001 and 0x8000, ack, req) and peripheral P1 (0x8000)]
Peripheral to memory transfer with DMA (cont’)
5: DMA ctrl (a) asserts ack, (b) reads data from 0x8000, and (c) writes that data to 0x0001. (Meanwhile, processor still executing if not stalled!)
6: DMA de-asserts Dreq and ack completing the handshake with P1.
No ISR needed!
[Figure: µP (PC, Dreq, Dack), program memory (main program at 100, 101) and data memory (0x0000, 0x0001) on the system bus, with DMA ctrl (registers 0x0001 and 0x8000, ack, req) and peripheral P1 (0x8000)]
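The DMA handshake can be traced the same way as the interrupt case. The sketch below runs one transfer from P1's data register to a memory location; the req line from P1 to the DMA controller is taken from the figure labels and is an assumption, as are the names. Note that the processor's PC does not appear at all, which is the point of "No ISR needed!".

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t data_reg; int req; int ack; } Peripheral;  /* P1 */
typedef struct { int dreq; int dack; } BusControl;                  /* uP and DMA ctrl lines */

/* Runs one complete transfer; returns 1 if a byte was moved. */
int dma_transfer(Peripheral *p1, BusControl *bus, uint8_t *mem, uint16_t dst) {
    assert(p1 != 0 && bus != 0 && mem != 0);
    if (!p1->req) return 0;     /* P1 has not requested service (assumed req line) */
    bus->dreq = 1;              /* 3: DMA ctrl asserts Dreq */
    bus->dack = 1;              /* 4: uP asserts Dack, releasing the bus */
    p1->ack = 1;                /* 5(a): DMA ctrl asserts ack */
    mem[dst] = p1->data_reg;    /* 5(b), 5(c): read peripheral, write memory */
    p1->req = 0;                /* P1's request has been served */
    bus->dreq = 0;              /* 6: DMA de-asserts Dreq and ack */
    p1->ack = 0;
    bus->dack = 0;              /* uP takes the bus back */
    return 1;
}
```

In a real system the processor keeps executing between steps 4 and 6 unless it needs the bus; the sequential code here shows only the order of the handshake signals.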
ISA-Bus
[Figure: DMA controller, memory and I/O device on the ISA bus, with R and A lines]
[Figure: DMA Memory-Write Bus Cycle and DMA Memory-Read Bus Cycle, each with CYCLE C1 to C7 and CLOCK]
• Priority arbiter
– Single-purpose processor
– Peripherals make requests to arbiter, arbiter makes requests to resource
– Arbiter connected to system bus for configuration only
Arbitration: Daisy-chain arbitration
• Arbitration done by peripherals
– Built into peripheral or external logic added
• req input and ack output added to each peripheral
• Peripherals connected to each other in daisy-chain manner
– One peripheral connected to resource, all others connected “upstream”
– Peripheral’s req flows “downstream” to resource, resource’s ack flows “upstream” to requesting peripheral
– Closest peripheral has highest priority
[Figure: daisy-chain aware peripherals: µP (Int, Inta) on the system bus with Peripheral1 and Peripheral2 chained through Ack_in/Ack_out and Req_out/Req_in; the last Req_in is tied to 0]
Arbitration: Daisy-chain arbitration
• Pros/cons
– Easy to add/remove peripheral - no system redesign needed
– Does not support rotating priority
– One broken peripheral can cause loss of access to other peripherals
[Figure: microprocessor with a priority arbiter (Ireq1, Iack1, Ireq2, Iack2) serving Peripheral 1 and Peripheral 2, compared with daisy-chain aware peripherals]
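The priority rule of a daisy chain ("closest peripheral has highest priority") follows from how the acknowledge signal travels. In the sketch below, the requests of all peripherals are ORed on their way downstream to the resource, and the acknowledge then moves upstream until it reaches the first peripheral that is requesting, which keeps it. Index 0 is the peripheral closest to the resource; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* req[0] belongs to the peripheral closest to the resource.
   Returns the index of the peripheral that receives the ack, or -1 if none. */
int daisy_chain_grant(const int *req, size_t n) {
    assert(req != 0 || n == 0);
    int ack = 0;
    for (size_t i = 0; i < n; i++)       /* requests flow downstream (OR) */
        if (req[i]) ack = 1;             /* resource acknowledges any request */
    for (size_t i = 0; i < n && ack; i++) {
        if (req[i]) return (int)i;       /* requesting peripheral keeps the ack */
        /* a non-requesting peripheral passes Ack_in on to Ack_out */
    }
    return -1;
}
```

The second loop also shows the weakness listed under pros/cons: a broken peripheral that fails to pass the ack on cuts off every peripheral upstream of it.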
… simultaneously causing collisions
• Data must be resent
• Don’t want to start sending again at same time
– statistical methods can be used to reduce chances
[Figure: Priority Arbiter with ENABLE and DATA lines on the Memory Bus, Peripheral 1, Peripheral 2 and Jump Table]
• A peripheral’s index into interrupt table is sent to memory-mapped register in arbiter
• Peripherals receive external data and raise interrupt
unsigned char ARBITER_MASK_REG _at_ 0xfff0;
unsigned char ARBITER_CH0_INDEX_REG _at_ 0xfff1;
unsigned char ARBITER_CH1_INDEX_REG _at_ 0xfff2;
unsigned char ARBITER_ENABLE_REG _at_ 0xfff3;
...
void Peripheral1_ISR(void) {
    unsigned char data;
    data = PERIPHERAL1_DATA_REG;
    // do something with the data
}
...
    ARBITER_ENABLE_REG = 1;
}
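The jump table mentioned above can be sketched in portable C as an array of function pointers indexed by the value a peripheral sends to the arbiter. This is a host-side illustration, not the slide's memory-mapped code: the table indices 13 and 17, the ISR bodies and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*isr_fn)(void);

static isr_fn interrupt_table[256];  /* jump table indexed by interrupt index */
static int last_serviced;            /* records which ISR ran (for illustration) */

static void peripheral1_isr(void) { last_serviced = 1; }
static void peripheral2_isr(void) { last_serviced = 2; }

/* Installs the ISRs at the indices the peripherals will report (assumed values). */
void install_isrs(void) {
    interrupt_table[13] = peripheral1_isr;
    interrupt_table[17] = peripheral2_isr;
}

/* Calls the ISR stored at the given index; returns 0 if none is installed. */
int dispatch_interrupt(uint8_t index) {
    if (interrupt_table[index] == 0) return 0;
    interrupt_table[index]();
    return 1;
}
```

Because the peripherals supply only an index, ISRs can be moved or replaced by rewriting table entries without reconfiguring the peripherals.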
Multilevel bus architectures
• Don’t want one bus for all communication
– Peripherals would need high-speed, processor-specific bus interface
• excess gates, power consumption, and cost; less portable
– Too many peripherals slows down bus
• Processor-local bus
– High speed, wide, most frequent communication
– Connects microprocessor, cache, memory controllers, etc.
• Peripheral bus
– Lower speed, narrower, less frequent communication
[Figure: Microprocessor, Cache, Memory controller and DMA controller on the Processor-local bus; a Bridge connects it to the Peripheral bus with three Peripherals]
Advanced communication principles
• Layering
– Break complexity of communication protocol into pieces easier to design and understand
– Lower levels provide services to higher level
• Lower level might work with bits while higher level might work with packets of data
– Physical layer
• Lowest level in hierarchy
• Medium to carry data from one actor (device or node) to another
• Parallel communication
– Physical layer capable of transporting multiple bits of data
Serial protocols: I2C
• I2C (Inter-IC)
– Two-wire serial bus protocol developed by Philips Semiconductors in the early 1980s
– Common devices capable of interfacing to I2C bus:
• EPROMS, Flash, and some RAM memory, real-time clocks, watchdog timers, and microcontrollers
I2C bus structure
[Figure: SCL and SDA lines (< 400 pF) connecting Micro-controller (master), EEPROM (servant, Addr=0x01), Temp. Sensor (servant, Addr=0x02) and LCD-controller (servant, Addr=0x03)]
Typical read/write cycle: START, A6, A5, …, A0, R/w, ACK, D8, D7, …, D0, ACK, STOP
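The first byte of the read/write cycle combines the 7-bit servant address (A6 to A0) with the R/w bit. A minimal sketch in C, assuming the usual I2C convention that R/w = 1 means read and 0 means write; the function name is hypothetical and the addresses are the ones shown in the bus-structure figure.

```c
#include <assert.h>
#include <stdint.h>

/* Builds the byte sent after START: address in bits 7..1, R/w in bit 0. */
uint8_t i2c_first_byte(uint8_t addr7, int read) {
    assert(addr7 < 0x80);  /* address must fit in 7 bits */
    return (uint8_t)((addr7 << 1) | (read ? 1u : 0u));
}
```

For the EEPROM at address 0x01, a write transfer starts with 0x02 and a read transfer with 0x03.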
Parallel protocols: ARM Bus
• ARM Bus
– Designed and used internally by ARM Corporation
– Interfaces with ARM line of processors
– Many IC design companies have own bus protocol
– Data transfer rate is a function of clock speed
• If clock speed of bus is X, transfer rate = 16 x X bits/s
– 32-bit addressing
Wireless protocols: IrDA
• IrDA
– Protocol suite that supports short-range point-to-point infrared data transmission
– Created and promoted by the Infrared Data Association (IrDA)
– Data transfer rates from 9.6 kbps to 4 Mbps
– IrDA hardware deployed in notebook computers, printers, PDAs, digital cameras, public phones, cell phones
– Lack of suitable drivers has slowed use by applications
– Windows 2000/98 now include support
– Becoming available on popular embedded OS’s
Chapter Summary
• Basic protocol concepts
– Actors, direction, time multiplexing, control methods
• General-purpose processors
– Port-based or bus-based I/O
– I/O addressing: Memory mapped I/O or Standard I/O
– Interrupt handling: fixed or vectored
– Direct memory access
• Arbitration
– Priority arbiter (fixed/rotating) or daisy chain
• Bus hierarchy
• Advanced communication
– Parallel vs. serial, wires vs. wireless, error detection/correction,
layering
– Serial protocols: I2C, CAN, FireWire, and USB; Parallel: PCI and
ARM.
– Serial wireless protocols: IrDA, Bluetooth, and IEEE 802.11.