DDCA Ch5

Chapter 5 of 'Digital Design and Computer Architecture' covers essential digital building blocks such as arithmetic circuits, number systems, and memory arrays. It discusses 1-bit adders, multibit adders, and various types of carry propagate adders, emphasizing their design and efficiency. The chapter also introduces hierarchical carry-lookahead adders for improved performance in larger systems.

Uploaded by

Serdar Bozdağ
Chapter 5

Digital Design and Computer Architecture, 2nd Edition


David Money Harris and Sarah L. Harris

Chapter 5 <1>
Chapter 5 :: Topics
• Introduction
• Arithmetic Circuits
• Number Systems
• Sequential Building Blocks
• Memory Arrays
• Logic Arrays

Introduction
• Digital building blocks:
– Gates, multiplexers, decoders, registers,
arithmetic circuits, counters, memory arrays,
logic arrays
• Building blocks demonstrate hierarchy,
modularity, and regularity:
– Hierarchy of simpler components
– Well-defined interfaces and functions
– Regular structure easily extends to different sizes
• Will use these building blocks in Chapter 7 to build a microprocessor
1-Bit Adders
Half adder (inputs A, B; outputs Cout, S):

A B | Cout S
0 0 |  0   0
0 1 |  0   1
1 0 |  0   1
1 1 |  1   0

S = A ⊕ B
Cout = AB

Full adder (inputs Cin, A, B; outputs Cout, S):

Cin A B | Cout S
 0  0 0 |  0   0
 0  0 1 |  0   1
 0  1 0 |  0   1
 0  1 1 |  1   0
 1  0 0 |  0   1
 1  0 1 |  1   0
 1  1 0 |  1   0
 1  1 1 |  1   1

S = A ⊕ B ⊕ Cin
Cout = AB + ACin + BCin
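
The full-adder equations map directly onto continuous assignments. A minimal sketch, assuming a module named fulladder with lowercase port names (both chosen here for illustration):

```systemverilog
// 1-bit full adder written directly from the sum and carry equations.
// The module and port names are illustrative, not taken from the slides.
module fulladder(input  logic a, b, cin,
                 output logic s, cout);
  assign s    = a ^ b ^ cin;                      // S = A xor B xor Cin
  assign cout = (a & b) | (a & cin) | (b & cin);  // Cout = AB + ACin + BCin
endmodule
```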

Multibit Adders (CPAs)
• Types of carry propagate adders (CPAs):
– Ripple-carry (slow)
– Carry-lookahead (fast)
– Prefix (faster)
• Carry-lookahead and prefix adders are faster for large adders but require more hardware
[Symbol: N-bit adder with N-bit inputs A and B, carry-in Cin, N-bit sum S, and carry-out Cout]
Ripple-Carry Adder
• Chain 1-bit adders together
• Carry ripples through entire chain
• Disadvantage: slow

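[Figure: 32-bit ripple-carry adder; the 1-bit adders for bits 0 to 31 are chained so that each carry out (C0, C1, ..., C30) drives the carry in of the next bit; outputs S31..S0 and Cout]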
Ripple-Carry Adder Delay

t_ripple = N × tFA
where tFA is the delay of a full adder
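
A ripple-carry adder can be sketched by chaining N full adders with a generate loop, which makes the N × tFA delay visible as one long carry chain. The module names, the parameter N, and the internal carry vector c are assumptions made for this sketch:

```systemverilog
// 1-bit full adder (same equations as on the 1-bit adder slide).
module fulladder(input  logic a, b, cin,
                 output logic s, cout);
  assign s    = a ^ b ^ cin;
  assign cout = (a & b) | (a & cin) | (b & cin);
endmodule

// N-bit ripple-carry adder: the carry out of bit i is the carry in of bit i+1.
module rippleadder #(parameter N = 32)
                   (input  logic [N-1:0] a, b,
                    input  logic         cin,
                    output logic [N-1:0] s,
                    output logic         cout);
  logic [N:0] c;            // c[i] is the carry into bit i
  assign c[0] = cin;

  genvar i;
  generate
    for (i = 0; i < N; i++) begin : stage
      fulladder fa(.a(a[i]), .b(b[i]), .cin(c[i]),
                   .s(s[i]), .cout(c[i+1]));
    end
  endgenerate

  assign cout = c[N];       // carry out of the last stage
endmodule
```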

Building a Faster Adder
• Carry-ripple adder
– Similar to adding by hand, column by column
– Con: Slow
• Output is not correct until the carries have rippled to the left (critical path)
• 4-bit carry-ripple adder has 4*2 = 8 gate delays
– Pro: Small
• 4-bit carry-ripple adder has just 4*5 = 20 gates
[Figure: 4-bit carry-ripple adder built from four full adders (FA), with inputs a3..a0, b3..b0, and cin, internal carries c3, c2, c1, and outputs cout, s3..s0]
Efficient Lookahead
[Figure: by-hand 4-bit addition showing the carries c4..c0 above A = a3..a0 and B = b3..b0, with sum cout, s3..s0]

c1 = a0b0 + (a0 xor b0)c0
c2 = a1b1 + (a1 xor b1)c1
c3 = a2b2 + (a2 xor b2)c2
c4 = a3b3 + (a3 xor b3)c3

• if a0b0 = 1 then c1 = 1 (call this G: Generate)
• if a0 xor b0 = 1 then c1 = 1 if c0 = 1 (call this P: Propagate)

Why those names? When a0b0 = 1, we should generate a 1 for c1. When a0 XOR b0 = 1, we should propagate the c0 value as the value of c1, meaning c1 should equal c0.

Gi = aibi (generate)
Pi = ai XOR bi (propagate)

c1 = G0 + P0c0
c2 = G1 + P1c1
c3 = G2 + P2c2
c4 = G3 + P3c3

Efficient Lookahead
Gi = aibi (generate)
Pi = ai XOR bi (propagate)
Gi/Pi are functions of the inputs only

c1 = G0 + P0c0

c2 = G1 + P1c1
c2 = G1 + P1(G0 + P0c0)
c2 = G1 + P1G0 + P1P0c0

c3 = G2 + P2c2
c3 = G2 + P2(G1 + P1G0 + P1P0c0)
c3 = G2 + P2G1 + P2P1G0 + P2P1P0c0

c4 = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0c0
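
The expanded carry equations can be written out as a 4-bit carry-lookahead adder. This is a sketch under assumed names (cla4, g, p); the sum bits use si = Pi xor ci, which follows from the full-adder sum equation:

```systemverilog
// 4-bit carry-lookahead adder written from the expanded carry equations.
// Every carry depends only on G, P, and c0, never on another carry.
module cla4(input  logic [3:0] a, b,
            input  logic       c0,
            output logic [3:0] s,
            output logic       c4);
  logic [3:0] g, p;
  logic       c1, c2, c3;

  assign g = a & b;   // Gi = ai bi       (generate)
  assign p = a ^ b;   // Pi = ai xor bi   (propagate)

  assign c1 = g[0] | (p[0] & c0);
  assign c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0);
  assign c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                   | (p[2] & p[1] & p[0] & c0);
  assign c4 = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                   | (p[3] & p[2] & p[1] & g[0])
                   | (p[3] & p[2] & p[1] & p[0] & c0);

  assign s = p ^ {c3, c2, c1, c0};   // si = Pi xor ci
endmodule
```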

Carry-Lookahead Adder (CLA)
• Each stage:
– HA for G and P
– Another XOR for s
– Call SPG block
• Create carry-lookahead logic from equations
• More efficient than naïve scheme, at expense of one extra gate delay

c1 = G0 + P0c0
c2 = G1 + P1G0 + P1P0c0
c3 = G2 + P2G1 + P2P1G0 + P2P1P0c0
cout = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0c0

[Figure: 4-bit carry-lookahead adder; each of Stage 1 to Stage 4 is an SPG block (a half-adder producing Gi and Pi plus an XOR producing si), and the carry-lookahead logic computes c1, c2, c3, and cout from the G and P bits and c0]
Carry-Lookahead Adder – High-Level View
[Figure: four SPG blocks (inputs a, b, cin; outputs P, G, and the sum bit) connected to a block of 4-bit carry-lookahead logic that takes P3..P0, G3..G0, and c0 and returns c1, c2, c3, and cout]
• Fast – only 4 gate delays
– Each stage has SPG block with 2 gate levels
– Carry-lookahead logic quickly computes the carry from the propagate and generate bits using 2 gate levels inside
• Reasonable number of gates – 4-bit adder has only 26 gates
• 4-bit adder comparison (gate delays, gates)
– Carry-ripple: (8, 20)
– Two-level: (2, 500)
– CLA: (4, 26)
o Nice compromise
Carry-Lookahead Adder – 32-bit?
• Problem: Gates get bigger in each stage
– 4th stage has 5-input gates
– 32nd stage would have 33-input gates
• Too many inputs for one gate
• Would require building from smaller gates, meaning
more levels (slower), more gates (bigger)
• One solution: Connect 4-bit CLA adders in carry-ripple manner
– Ex: 16-bit adder: 4 + 4 + 4 + 4 = 16 gate delays. Can we do better?
[Figure: the carry-lookahead logic, annotated "Gates get bigger in each stage" (Stage 4 shown)]
[Figure: 16-bit adder built from four 4-bit adders handling a15-a12/b15-b12, a11-a8/b11-b8, a7-a4/b7-b4, and a3-a0/b3-b0; the cout of each 4-bit adder drives the cin of the next, producing cout and s15..s0]
Hierarchical Carry-Lookahead Adders
• Better solution: rather than rippling the carries, just repeat the carry-lookahead concept
– Requires minor modification of 4-bit CLA adder to output P and G
[Figure: 16-bit hierarchical CLA; four 4-bit adders (these use carry-lookahead internally) each output P and G to a second level of 4-bit carry-lookahead logic, which returns the carries c1, c2, c3 to the adders and produces cout, P, and G; the sum outputs are s15-s12, s11-s8, s7-s4, s3-s0]
Hierarchical Carry-Lookahead Adders
• Hierarchical CLA concept can be applied for larger adders
• 32-bit hierarchical CLA:
– Only about 8 gate delays (2 for SPG block, then 2 per CLA level)
– Only about 14 gates in each 4-bit CLA logic block
[Figure: 32-bit hierarchical CLA; SPG blocks (inputs ai, bi; outputs P, G; carry input c) feed eight 4-bit CLA logic blocks, which feed two 4-bit CLA logic blocks, which feed one 2-bit CLA logic block]

Q: How many gate delays for 64-bit hierarchical CLA, using 4-bit CLA logic?
A: 16 CLA-logic blocks in 1st level, 4 in 2nd, 1 in 3rd, so still just 8 gate delays (2 for SPG, and 2+2+2 for CLA logic). CLA is a very efficient method.
Subtracter

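[Symbol: N-bit subtracter with inputs A and B and output Y. Implementation: an N-bit adder with B inverted on its way in, so that with a carry-in of 1 it computes Y = A + ~B + 1 = A - B]
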
Comparator: Equality

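[Symbol: 4-bit equality comparator with inputs A and B and output Equal. Implementation: each bit pair (A3/B3, A2/B2, A1/B1, A0/B0) is compared, and the four results are ANDed to produce Equal]

Comparators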
• N-bit equality comparator: Outputs 1 if two N-bit numbers are equal
– 4-bit equality comparator with inputs A and B
• a3 must equal b3, a2 = b2, a1 = b1, a0 = b0
– Two bits are equal if both 1, or both 0
– eq = (a3b3 + a3’b3’) * (a2b2 + a2’b2’) * (a1b1 + a1’b1’) * (a0b0 + a0’b0’)
• Note that function inside parentheses is XNOR
– eq = (a3 xnor b3) * (a2 xnor b2) * (a1 xnor b1) * (a0 xnor b0)
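
The XNOR form of eq translates to one line of HDL: a bitwise XNOR followed by an AND reduction. A sketch, assuming a module named eq4:

```systemverilog
// 4-bit equality comparator: XNOR each bit pair, then AND the four results.
module eq4(input  logic [3:0] a, b,
           output logic       eq);
  assign eq = &(a ~^ b);   // ~^ is bitwise XNOR; unary & is AND reduction
endmodule
```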

Example: 0110 = 0111 ?
• Bit pairs (a3 b3, a2 b2, a1 b1, a0 b0) = (0 0, 1 1, 1 1, 0 1)
• XNOR outputs = 1, 1, 1, 0, so eq = 0
[Symbol: 4-bit equality comparator with inputs a3..a0 and b3..b0 and output eq]

Comparator: Less Than

[Figure: an N-bit subtracter computes A - B; bit [N-1] (the sign bit) of the result is the A<B output]

Magnitude Comparator
• N-bit magnitude comparator: Two N-bit inputs A and B, outputs whether A>B, A=B, or A<B
– How to design? Consider comparing by hand.
– First compare a3 and b3. If equal, compare a2 and b2. And so on.
– Stop if comparison not equal (the two bits are 0 and 1, or 1 and 0); whichever of A or B has the 1 is thus greater. If never see unequal bit pair, then A=B.

Example: A=1011 B=1001
a3 = 1, b3 = 1: Equal
a2 = 0, b2 = 0: Equal
a1 = 1, b1 = 0: Not equal
So A > B
Magnitude Comparator
• By-hand example leads to idea for design
– Start at left, compare each bit pair, pass results to the right
– Each bit pair called a stage
– Each stage has 3 inputs taking results of higher stage, outputs new results to lower
stage
[Figure: 4-bit magnitude comparator; Stage 3 to Stage 0 each take one bit pair (a, b) plus in_gt, in_eq, in_lt from the stage to the left and pass out_gt, out_eq, out_lt to the right. The leftmost inputs are tied to Igt = 0, Ieq = 1, Ilt = 0; the rightmost outputs are AgtB, AeqB, AltB]
How to design each stage?
Magnitude Comparator
• Each stage:
– out_gt = in_gt + (in_eq * a * b’)
• A>B if already determined in higher stage, or if higher stages equal but in this stage a=1
and b=0
– out_lt = in_lt + (in_eq * a’ * b)
• A<B if already determined in higher stage, or if higher stages equal but in this stage a=0
and b=1
– out_eq = in_eq * (a XNOR b)
• A=B (so far) if already determined in higher stage and in this stage a=b too
– Simple circuit inside each stage, just a few gates (not shown)
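
The three stage equations can be sketched as one small module, then chained four times with the leftmost inputs tied to gt = 0, eq = 1, lt = 0. The module names (cmpstage, magcmp4) and the internal vectors are assumptions made for this sketch:

```systemverilog
// One comparator stage, written from the out_gt, out_lt, out_eq equations.
module cmpstage(input  logic a, b, in_gt, in_eq, in_lt,
                output logic out_gt, out_eq, out_lt);
  assign out_gt = in_gt | (in_eq & a & ~b);
  assign out_lt = in_lt | (in_eq & ~a & b);
  assign out_eq = in_eq & (a ~^ b);
endmodule

// 4-bit magnitude comparator: results ripple from stage 3 down to stage 0.
module magcmp4(input  logic [3:0] a, b,
               output logic       AgtB, AeqB, AltB);
  logic [4:0] gt, eq, lt;   // index i+1 feeds stage i; index 0 is the result

  // Inputs to the most significant stage: "equal so far"
  assign gt[4] = 1'b0;
  assign eq[4] = 1'b1;
  assign lt[4] = 1'b0;

  genvar i;
  generate
    for (i = 0; i < 4; i++) begin : stage
      cmpstage st(.a(a[i]), .b(b[i]),
                  .in_gt(gt[i+1]), .in_eq(eq[i+1]), .in_lt(lt[i+1]),
                  .out_gt(gt[i]),  .out_eq(eq[i]),  .out_lt(lt[i]));
    end
  endgenerate

  assign AgtB = gt[0];
  assign AeqB = eq[0];
  assign AltB = lt[0];
endmodule
```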

Magnitude Comparator
Example: 1011 = 1001 ?
• Stage 3 (a3 = 1, b3 = 1) and Stage 2 (a2 = 0, b2 = 0) pass along "equal"
• Stage 1 (a1 = 1, b1 = 0) sets out_gt = 1 and out_eq = 0
• Stage 0 passes the result through: AgtB = 1, AeqB = 0, AltB = 0
• Final answer appears on the right
• Takes time for answer to “ripple” from left to right
• Thus called “carry-ripple style” after the carry-ripple adder
– Even though there’s no “carry” involved
Exercise

Use magnitude comparators and logic to design a circuit that computes the minimum of three 8-bit numbers.

Arithmetic Logic Unit (ALU)

[Symbol: N-bit ALU with inputs A and B, 3-bit control input F, and output Y]

F2:0 Function
000 A & B
001 A | B
010 A + B
011 not used
100 A & ~B
101 A | ~B
110 A - B
111 SLT
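
The function table can be sketched behaviorally as follows. The names alu, bb, and s are assumptions for illustration. F2 selects B or ~B and doubles as the adder's carry-in, and F1:0 picks the result. The unused code 011 is not given a special case here, so it simply returns the zero-extended sign bit of A + B:

```systemverilog
// N-bit ALU sketch following the F2:0 function table.
module alu #(parameter N = 32)
            (input  logic [N-1:0] a, b,
             input  logic [2:0]   f,
             output logic [N-1:0] y);
  logic [N-1:0] bb, s;

  assign bb = f[2] ? ~b : b;    // F2 = 1 selects ~B
  assign s  = a + bb + f[2];    // with F2 = 1: A + ~B + 1 = A - B

  always_comb
    case (f[1:0])
      2'b00: y = a & bb;                      // AND
      2'b01: y = a | bb;                      // OR
      2'b10: y = s;                           // add or subtract
      2'b11: y = {{(N-1){1'b0}}, s[N-1]};     // SLT: zero-extended sign bit
    endcase
endmodule
```

For example, with A = 25, B = 32, and F = 111, s is 25 - 32 = -7, whose most significant bit is 1, so Y = 0x00000001.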

ALU Design
[Figure: ALU implementation. A 2:1 multiplexer controlled by F2 selects B or ~B; the adder produces Cout and the sum S; bit S[N-1] is zero-extended to N bits; a 4:1 multiplexer controlled by F1:0 selects the output Y]
Set Less Than (SLT) Example
• Configure 32-bit ALU for SLT operation: A = 25 and B = 32
– A < B, so Y should be 32-bit representation of 1 (0x00000001)
– F2:0 = 111
– F2 = 1 (adder acts as subtracter), so 25 - 32 = -7
– -7 has 1 in the most significant bit (S31 = 1)
– F1:0 = 11 multiplexer selects Y = S31 (zero extended) = 0x00000001.
[Figure: the ALU implementation with the SLT path through the adder, zero extend, and 4:1 multiplexer]
Shifters
• Logical shifter: shifts value to left or right and fills empty spaces with 0’s
– Ex: 11001 >> 2 = 00110
– Ex: 11001 << 2 = 00100
• Arithmetic shifter: same as logical shifter, but on right shift, fills empty spaces with the old most significant bit (msb)
– Ex: 11001 >>> 2 = 11110
– Ex: 11001 <<< 2 = 00100
• Rotator: rotates bits in a circle, such that bits shifted off one end are shifted into the other end
– Ex: 11001 ROR 2 = 01110
– Ex: 11001 ROL 2 = 00111
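
The six examples can be reproduced with a small 5-bit sketch. The module and output names are assumptions, and the rotate expressions assume shamt is at most 5:

```systemverilog
// 5-bit logical shifts, arithmetic right shift, and rotates.
module shifts5(input  logic [4:0] a,
               input  logic [2:0] shamt,   // assumed to be 0..5
               output logic [4:0] srl, sll, sra, ror, rol);
  assign srl = a >> shamt;                          // fill with 0s
  assign sll = a << shamt;                          // fill with 0s
  assign sra = $signed(a) >>> shamt;                // fill with old msb
  assign ror = (a >> shamt) | (a << (5 - shamt));   // wrap low bits to top
  assign rol = (a << shamt) | (a >> (5 - shamt));   // wrap high bits to bottom
endmodule
```

The arithmetic left shift (<<<) gives the same result as the logical left shift, so it has no separate output here.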

Shifter Design
[Figure: 4-bit right shifter (A3:0 >> shamt1:0 = Y3:0); each output bit Y3, Y2, Y1, Y0 comes from a 4:1 multiplexer with inputs 00, 01, 10, 11 whose select S1:0 is shamt1:0]
How to extend it to N-bit shifter?
Need 2Nx1 MUX for each bit.
Too expensive!

Barrel Shifter

Shifts either 4 bits or 0 bits based on input a

Shifts either 2 bits or 0 bits based on input b

Shifts either 1 bit or 0 bits based on input c
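
The three stages compose into any shift amount from 0 to 7 using only 2:1 multiplexers per bit. A sketch for 8 bits, assuming a left shift (the slide text does not state the direction) and hypothetical names barrel8, d, and q:

```systemverilog
// 8-bit barrel shifter: three cascaded stages shift by 4, 2, and 1.
// Left shift is an assumption; a, b, c are the stage select inputs.
module barrel8(input  logic [7:0] d,
               input  logic       a, b, c,
               output logic [7:0] q);
  logic [7:0] s4, s2;

  assign s4 = a ? (d  << 4) : d;    // stage 1: shift by 4 or 0
  assign s2 = b ? (s4 << 2) : s4;   // stage 2: shift by 2 or 0
  assign q  = c ? (s2 << 1) : s2;   // stage 3: shift by 1 or 0
endmodule
```

The total shift amount is the 3-bit number abc, so abc = 011 shifts by 3.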

Shifters as Multipliers, Dividers

• A << N = A × 2^N
– Example: 00001 << 2 = 00100 (1 × 2^2 = 4)
– Example: 11101 << 2 = 10100 (-3 × 2^2 = -12)
• A >>> N = A ÷ 2^N
– Example: 01000 >>> 2 = 00010 (8 ÷ 2^2 = 2)
– Example: 10000 >>> 2 = 11100 (-16 ÷ 2^2 = -4)

Registers
• N-bit register: Stores N bits, N is the width
– Common widths: 8, 16, 32
– Storing data into register: Loading
– Opposite of storing: Reading (does not alter contents)
• Basic register of Ch 3: Loaded every cycle
– Useful for implementing FSM: stores encoded state
[Figure: controller with combinational logic and a state register (clk, state bits s1 s0, next-state bits n1 n0)]
[Figure: 4-bit register built from four D flip-flops sharing clk, with inputs I3..I0 and outputs Q3..Q0; block symbol reg(4)]
Basic register loads on every clock cycle
How to extend to only load on certain cycles?

Register with Parallel Load
• Add 2x1 mux to front of each flip-flop
• Register’s load input selects mux input to pass
– load=0: existing flip-flop value; load=1: new input value
[Figure: 4-bit register with parallel load; a 2x1 mux in front of each D flip-flop selects the flip-flop's own output Q (load=0) or the new input I (load=1); block symbol with inputs I3..I0 and load and outputs Q3..Q0]
Exercise

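// 4-bit register whose select input chooses the operation on each rising clock edge
module register(
  input  logic       clk,
  input  logic [1:0] select,
  input  logic [3:0] In,
  output logic [3:0] Q
);

  always_ff @(posedge clk)
    case (select)
      2'b00: Q <= Q;        // hold the current value
      2'b01: Q <= In;       // parallel load
      2'b10: Q <= 4'b0000;  // clear
      2'b11: Q <= ~Q;       // complement the stored value
    endcase
endmodule
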
Counters
• Increments on each clock edge
• Used to cycle through numbers. For example,
– 000, 001, 010, 011, 100, 101, 110, 111, 000, 001…
• Example uses:
– Digital clock displays
– Program counter: keeps track of current instruction executing
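
A counter is a register whose next value is its current value plus 1. A minimal sketch, assuming a synchronous reset and the names counter, reset, and q:

```systemverilog
// N-bit counter: increments on each rising clock edge and wraps to 0.
// The synchronous reset is an assumption; an asynchronous one also works.
module counter #(parameter N = 3)
                (input  logic         clk, reset,
                 output logic [N-1:0] q);
  always_ff @(posedge clk)
    if (reset) q <= '0;        // back to 000
    else       q <= q + 1;     // 000, 001, ..., 111, 000, ...
endmodule
```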

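[Symbol: N-bit counter with CLK and Reset inputs and N-bit output Q. Implementation: an N-bit resettable register whose output Q feeds an adder that adds 1; the adder output is the register's next value]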
Counters and Timers
• N-bit up-counter: N-bit register that can increment (add 1) to its own value on each clock cycle
– 0000, 0001, 0010, 0011, ...., 1110, 1111, 0000
– Count “rolls over” from 1111 to 0000
• Terminal (last) count, tc, equals 1 during value just before rollover
• Internal design
– Register, incrementer, and N-input AND gate to detect terminal count
[Figure: 4-bit up-counter block symbol with inputs clr and cnt and outputs tc and C; internal design with a 4-bit register (clr, ld), a +1 incrementer, and an AND gate producing tc]
Up/Down-Counter
• Can count either up or down
– Includes both incrementer and decrementer
– Use dir input to select, via 2x1 mux: dir=0 means up
– Likewise, dir selects appropriate terminal count value (all 1s or all 0s)
[Figure: 4-bit up/down counter; a 4-bit register (clr, ld driven by cnt) feeds a –1 block and a +1 block, a 4-bit 2x1 mux controlled by dir selects the next value, and a second 2x1 mux selects the terminal count tc; output C]
Counter with Load
• Up-counter that can be loaded with external value
– Designed using 2x1 mux. ld input selects incremented value or external value
– Load the internal register when loading external value or when counting
– Note that ld has priority over cnt
[Figure: 4-bit register (clr, ld) fed by a 4-bit 2x1 mux that selects the external value L or the +1 incrementer output; inputs ld, cnt, clr; outputs tc and C]
Exercise

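// 4-bit down-counter with synchronous clear and set.
// Priority: clear, then set, then count.
module count4(
  input  logic       clk,
  input  logic       cnt,
  input  logic       set,
  input  logic       clear,
  output logic [3:0] count
);

  always_ff @(posedge clk)
    if      (clear) count <= 4'b0000;    // clear to 0
    else if (set)   count <= 4'b1111;    // preset to 15
    else if (cnt)   count <= count - 1;  // count down by 1 when enabled

endmodule
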
Timers
• Pulses output at user-specified timer interval when enabled
– “Ticks” like a clock
– Interval specified as multiple of base time unit
– If base is 1 microsec and user wants pulse every 300 ms, loads 300,000 into timer
• Can design using oscillator, register, and down-counter
[Figure: (a) internal design: a 32-bit register loaded with M, a 1-microsec oscillator clocking a 32-bit down-counter (inputs ld and cnt; count output C unused), and the counter's tc output driving Q; (b), (c) block symbol of the 32-bit 1-microsec timer with inputs M, load, and enable and output Q]

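// Down-counting timer: load M, count down while enabled, pulse Q at zero
module timer(input  logic        clk, load, enable,
             input  logic [31:0] M,
             output logic        Q,
             output logic [31:0] downcount);

  logic [31:0] next;

  assign next = downcount - 1;

  // Reload with M-1 on load or after a Q pulse; otherwise count down when enabled
  always_ff @(posedge clk)
    if (load | Q)    downcount <= M - 1;
    else if (enable) downcount <= next;

  // Q is registered: it is 1 in the cycle after downcount was 0
  always_ff @(posedge clk)
    if (downcount == 0) Q <= 1;
    else                Q <= 0;

endmodule
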
Shift Registers
• Shift a new bit in on each clock edge
• Shift a bit out on each clock edge
• Serial-to-parallel converter: converts serial input (Sin) to
parallel output (Q0:N-1)

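[Symbol: shift register with CLK, serial input Sin, serial output Sout, and N-bit parallel output Q. Implementation: N flip-flops in series from Sin to Sout, with outputs Q0, Q1, Q2, ..., QN-1]
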
Shift Register
• Shift right
– Move each bit one position right
– Rightmost bit is “dropped” after shift right
– Assume 0 shifted into leftmost bit

Q: Do four right shifts on 1001, showing value after each shift

A: 1001 (original)
0100
0010
0001
0000

• Implementation: Connect flip-flop output to next flip-flop’s input
[Figure: example register contents during a right shift, and a chain of flip-flops with serial input shr_in]

Shift Register
• To allow register to either shift or retain, use
2x1 muxes
– shr: “0” means retain, “1” shift
– shr_in: value to shift in
• May be 0, or 1
[Figure: (a) 4-bit shift register with a 2x1 mux in front of each D flip-flop, controlled by shr, with serial input shr_in and outputs Q3..Q0; (b) the same circuit with shr=1; (c) block symbol with inputs shr_in and shr and outputs Q3 Q2 Q1 Q0]
Left-shift register also easy to design
Shift Register with Parallel Load
• When Load = 1, acts as a normal N-bit register
• When Load = 0, acts as a shift register
• Now can act as a serial-to-parallel converter (Sin to Q0:N-1) or
a parallel-to-serial converter (D0:N-1 to Sout)
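
A shift register with parallel load can be sketched in a few lines. The names shiftreg, load, sin, d, q, and sout are assumptions, with q[0] playing the role of Q0 and q[N-1] the role of QN-1:

```systemverilog
// N-bit shift register with parallel load.
// load = 1: behaves as a normal register (q <= d).
// load = 0: shifts sin into q[0]; sout is the last bit, q[N-1].
module shiftreg #(parameter N = 8)
                 (input  logic         clk, load, sin,
                  input  logic [N-1:0] d,
                  output logic [N-1:0] q,
                  output logic         sout);
  always_ff @(posedge clk)
    if (load) q <= d;                   // parallel load
    else      q <= {q[N-2:0], sin};     // shift by one position

  assign sout = q[N-1];
endmodule
```

Loading d and then shifting N times reads the word out serially on sout; shifting N bits in on sin and then reading q performs the serial-to-parallel conversion.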

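[Figure: N flip-flops clocked by Clk, each preceded by a 2:1 mux controlled by Load that selects the shifted bit (input 0) or the parallel input Di (input 1); Sin enters the first stage and Sout leaves the last; parallel inputs D0, D1, D2, ..., DN-1 and outputs Q0, Q1, Q2, ..., QN-1]
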
Rotate Register
• Rotate right: Like shift right, but leftmost bit comes from rightmost bit
Register contents before shift right: 1 1 0 1
Register contents after shift right: 1 1 1 0

Example: Above-Mirror Car Display

Design the following system to be used for an above-mirror car display: At any point in time, one of four 8-bit values needs to be displayed:
1) temperature,
2) fuel economy,
3) fuel remaining,
4) speed.

Operation:
• The type of the value is determined by the 2-bit signals a1a0 or x1x0.
• The car’s central computer can update these values at arbitrary times and
in arbitrary order. It sends the data to your system over an 8-bit bus C
after setting the 2-bit signal a1a0 and single-bit signal load.
• Depending on the value of x1x0, your system should output the
corresponding 8-bit value to the display system through an 8-bit bus D.

Example: Above-Mirror Car Display (cont’d)

How can we reduce the number of wires from car computer to your
system from 11 to 4?

Register Files
• Accessing one of several registers is:
– OK if just a few registers
– Problematic when many
– Ex: Earlier above-mirror display, with 16 registers
• Much fanout (branching of wire): Weakens signal
• Many wires: Congestion
[Figure: the above-mirror display design with 4 8-bit registers (tolerable) next to the same design with 16 32-bit registers, which begins to have fanout and wire problems: too much fanout on the shared data input, a huge 32-bit 16x1 mux, and congestion from 16*32 = 512 wires]
Register File
• MxN register file: Efficient design for one-at-a-time write/read of many registers
– Consider 16 32-bit registers
[Symbol: 16 × 32 register file]
– W_data (32 bits): 32-bit data to write; called “write port”
– W_addr (4 bits): 4-bit “address” specifies which register to write
– W_en: Enable (load) line: Reg written on next clock
– R_data (32 bits): 32-bit data that is read; called “read port”
– R_addr (4 bits): 4-bit address specifies which register to read
– R_en: Enable read

Internal Implementation
• How to handle the large fanout problem?

• Use buffers/repeaters

• How to implement the 32-bit 16x1 MUX efficiently?

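Reminder: MUX implementation with tri-state buffers
[Figure: 32-bit 16x1 MUX with inputs 0 to 15 and select S, next to a mux built from tri-state buffers on inputs D0 and D1 driving a shared output Y]
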
Register File
• Internal design uses drivers and bus
[Figure: Internal design of 4x32 RF; 16x32 RF follows similarly. W_data passes through a bus driver to the d inputs of reg0..reg3; a 2x4 write decoder (inputs W_addr, enable W_en) drives each register's load input; each register output passes through a three-state driver onto the shared R_data bus; a 2x4 read decoder (inputs R_addr, enable R_en) enables one three-state driver]
• Bus driver: q = d; boosts signal
• Three-state driver: c=1: q=d; c=0: q=Z, like no connection
Note: Each driver in figure actually represents 32 1-bit drivers

Memory Arrays
• Efficiently store large amounts of data
• 3 common types:
– Dynamic random access memory (DRAM)
– Static random access memory (SRAM)
– Read only memory (ROM)
• M-bit data value read/ written at each unique N-bit address

[Symbol: memory array with N-bit Address input and M-bit Data port]
Memory Arrays
• 2-dimensional array of bit cells
• Each bit cell stores one bit
• N address bits and M data bits:
– 2^N rows and M columns
– Depth: number of rows (number of words)
– Width: number of columns (size of word)
– Array size: depth × width = 2^N × M

Example with a 2-bit address and 3-bit data (depth 4, width 3):

Address  Data
11       0 1 0
10       1 0 0
01       1 1 0
00       0 1 1

Memory Array Example
• 2^2 × 3-bit array
• Number of words: 4
• Word size: 3 bits
• For example, the 3-bit word stored at address 10 is 100

Memory Arrays

[Symbol: 1024-word × 32-bit array with a 10-bit Address input and a 32-bit Data port]

Memory Array Bit Cells
[Figure: bit cell holding a stored bit, connected to a wordline and a bitline]
(a) wordline = 1: stored bit = 0 gives bitline = 0; stored bit = 1 gives bitline = 1
(b) wordline = 0: bitline = Z whether the stored bit is 0 or 1

Memory Array
• Wordline:
– like an enable
– single row in memory array read/written
– corresponds to unique address
– only one wordline HIGH at once
[Figure: 4 × 3 memory array; a 2:4 decoder driven by the 2-bit Address asserts one of wordline3..wordline0, and bitline2..bitline0 carry Data2..Data0]

Address  Wordline   Stored bits (bitline2 bitline1 bitline0)
11       wordline3  0 1 0
10       wordline2  1 0 0
01       wordline1  1 1 0
00       wordline0  0 1 1

Types of Memory
• Random access memory (RAM): volatile
• Read only memory (ROM): nonvolatile

RAM: Random Access Memory
• Volatile: loses its data when power off
• Read and written quickly
• Main memory in your computer is RAM
(DRAM)

Historically called random access memory because any data word is accessed as easily as any other (in contrast to sequential access memories such as a tape recorder)

ROM: Read Only Memory
• Nonvolatile: retains data when power off
• Read quickly, but writing is impossible or
slow
• Flash memory in thumb drives and digital cameras is ROM

Historically called read only memory because ROMs were written at manufacturing time or by burning fuses. Once ROM was configured, it could not be written again. This is no longer the case for Flash memory and other types of ROMs.
Types of RAM
• DRAM (Dynamic random access memory)
• SRAM (Static random access memory)
• Differ in how they store data:
– DRAM uses a capacitor
– SRAM uses cross-coupled inverters

Robert Dennard, 1932 -
• Invented DRAM in
1966 at IBM
• Others were skeptical
that the idea would
work
• By the mid-1970’s
DRAM in virtually all
computers

DRAM
• Data bits stored on capacitor
• Dynamic because the value needs to be refreshed
(rewritten) periodically and after read:
– Charge leakage from the capacitor degrades the value
– Reading destroys the stored value

[Figure: DRAM bit cell; the wordline controls access between the bitline and the capacitor that holds the stored bit]
[Figure: DRAM bit cell shown twice, once with stored bit = 1 and once with stored bit = 0]

SRAM
[Figure: SRAM bit cell; cross-coupled inverters hold the stored bit, and the wordline controls access between the cell and its two bitlines]

Memory Arrays Review
[Figure: the 4 × 3 memory array (2:4 decoder, wordline3..wordline0, bitline2..bitline0, Data2..Data0) shown alongside the DRAM bit cell and the SRAM bit cell]
Memory Comparison

100
Exercise

5.39 Draw a circuit of transistors showing the internal structure for all the storage cells for a 4x2 DRAM (four words, two bits each), clearly labelling all internal components and connections.

Exercise

5.40 Draw a circuit of transistors showing the internal structure for all the storage cells for a 4x2 SRAM (four words, two bits each), clearly labelling all internal components and connections.

ROM: Dot Notation

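[Figure: ROM drawn in dot notation; a 2:4 decoder driven by the 2-bit Address selects one of four wordlines (11, 10, 01, 00) crossing the bitlines Data2, Data1, Data0. Insets show a bit cell containing 0 and a bit cell containing 1 at a wordline/bitline crossing]
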
Fujio Masuoka, 1944 -
• Developed memories and high
speed circuits at Toshiba, 1971-1994
• Invented Flash memory as an
unauthorized project pursued during
nights and weekends in the late
1970’s
• The process of erasing the memory
reminded him of the flash of a
camera
• Toshiba slow to commercialize the
idea; Intel was first to market in
1988
• Flash has grown into a $25 billion
per year market

ROM Storage

[Figure: ROM with a 2:4 decoder driven by the 2-bit Address and bitlines Data2, Data1, Data0]

Address  Data (Data2 Data1 Data0)
11       0 1 0
10       1 0 0
01       1 1 0
00       0 1 1

Depth: 4 words; width: 3 bits

ROM Logic

[Figure: ROM with a 2:4 decoder driven by the 2-bit Address (A1 A0) and bitlines Data2, Data1, Data0]

Data2 = A1 ⊕ A0
Data1 = ~A1 + A0
Data0 = ~A1 ~A0

Example: Logic with ROMs
Implement the following logic functions using a 2^2 × 3-bit ROM:
– X = AB
– Y = A + B
– Z = A ~B
[Figure: 2:4 decoder with address inputs A, B and wordlines 11, 10, 01, 00; bitlines X, Y, Z]

Logic with Any Memory Array
[Figure: 4 × 3 memory array with a 2:4 decoder driven by the 2-bit Address (A1 A0), wordline3..wordline0, and bitline2..bitline0 carrying Data2..Data0]

Address  Wordline   Data2 Data1 Data0
11       wordline3  0     1     0
10       wordline2  1     0     0
01       wordline1  1     1     0
00       wordline0  0     1     1

Data2 = A1 ⊕ A0
Data1 = ~A1 + A0
Data0 = ~A1 ~A0
Logic with Memory Arrays
Implement the following logic functions using a 2^2 × 3-bit memory array:
– X = AB
– Y = A + B
– Z = A ~B

Address (A, B)  Wordline   X Y Z
11              wordline3  1 1 0
10              wordline2  0 1 1
01              wordline1  0 1 0
00              wordline0  0 0 0

Logic with Memory Arrays
Called lookup tables (LUTs): look up output at each input
combination (address)
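
A lookup table can be sketched as a constant word indexed by the inputs. The module name lut2 and the parameter CONTENTS are assumptions for illustration; bit i of CONTENTS is the bit stored at address i, so the default 4'b1000 stores a 1 only at address 11:

```systemverilog
// 4-word x 1-bit lookup table: the inputs {a, b} form the address,
// and the stored bit at that address is the output.
module lut2 #(parameter logic [3:0] CONTENTS = 4'b1000)
             (input  logic a, b,
              output logic y);
  assign y = CONTENTS[{a, b}];   // read the addressed bit
endmodule
```

Changing CONTENTS changes the function without changing the hardware structure; for example, 4'b1110 would give Y = A + B.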
4-word x 1-bit Array

Truth Table
A B | Y
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1

[Figure: A and B drive address inputs A1 and A0 of a 2:4 decoder; the bit cells at addresses 00, 01, and 10 store 0 and the cell at address 11 stores 1; the bitline is the output]

Multi-ported Memories
• Port: address/data pair
• 3-ported memory
– 2 read ports (A1/RD1, A2/RD2)
– 1 write port (A3/WD3, WE3 enables writing)
• Register file: small multi-ported memory
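
The 3-ported memory can be sketched using the port names from the bullets above (A1/RD1, A2/RD2, A3/WD3, WE3). The module name regfile, the parameters N and M, and the choice of combinational reads with a clocked write are assumptions of this sketch:

```systemverilog
// 3-ported memory: two read ports and one write port.
// 2^N words of M bits each.
module regfile #(parameter N = 5, M = 32)
                (input  logic         clk, we3,
                 input  logic [N-1:0] a1, a2, a3,
                 input  logic [M-1:0] wd3,
                 output logic [M-1:0] rd1, rd2);
  logic [M-1:0] mem [2**N-1:0];

  // Write port: wd3 is stored at address a3 on the clock edge when we3 = 1
  always_ff @(posedge clk)
    if (we3) mem[a3] <= wd3;

  // Read ports: both addresses are read at the same time
  assign rd1 = mem[a1];
  assign rd2 = mem[a2];
endmodule
```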
[Symbol: three-ported memory array with CLK and WE3 inputs, N-bit addresses A1, A2, A3, M-bit write data WD3, and M-bit read data RD1 and RD2]

SystemVerilog Memory Arrays
// 256 x 3 memory module with one read/write port
module dmem(input  logic       clk, we,
            input  logic [7:0] a,
            input  logic [2:0] wd,
            output logic [2:0] rd);

  logic [2:0] RAM[255:0];

  // Combinational read of the addressed word
  assign rd = RAM[a];

  // Synchronous write when we is asserted
  always_ff @(posedge clk)
    if (we)
      RAM[a] <= wd;
endmodule

Logic Arrays
• PLAs (Programmable logic arrays)
– AND array followed by OR array
– Combinational logic only
– Fixed internal connections
• FPGAs (Field programmable gate arrays)
– Array of Logic Elements (LEs)
– Combinational and sequential logic
– Programmable internal connections

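PLAs
• X = ~A ~B C + A B ~C
• Y = A ~B
[Figure: M inputs feed an AND array that produces N implicants, which feed an OR array that produces P outputs. In the example, the inputs are A, B, C; the AND array forms the implicants ~A ~B C, A B ~C, and A ~B; the OR array forms the outputs X and Y]
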
PLAs: Dot Notation
[Figure: the same PLA (inputs A, B, C; implicants ~A ~B C, A B ~C, and A ~B; outputs X and Y) redrawn in dot notation]

FPGA: Field Programmable Gate Array

• Composed of:
– LEs (Logic elements): perform logic
– IOEs (Input/output elements): interface with outside
world
– Programmable interconnection: connect LEs and
IOEs
– Some FPGAs include other building blocks such as
multipliers and RAMs

General FPGA Layout

[Figure: general FPGA layout, labeled with CLB (Configurable Logic Block) and IOB (Input-Output Block)]
LE: Logic Element
• Composed of:
– LUTs (lookup tables): perform combinational logic
– Flip-flops: perform sequential logic
– Multiplexers: connect LUTs and flip-flops

Altera Cyclone IV LE

Chapter 5 <127>
Altera Cyclone IV LE
• The Cyclone IV LE has:
– 1 four-input LUT
– 1 registered output
– 1 combinational output

LE Configuration Example
Show how to configure a Cyclone IV LE to perform the
following functions:
– X = ~A ~B C + A B ~C
– Y = A ~B

LE 1: data 1 = A, data 2 = B, data 3 = C, data 4 = 0; LUT output = X

data 1 (A)  data 2 (B)  data 3 (C)  data 4  LUT output (X)
0           0           0           X       0
0           0           1           X       1
0           1           0           X       0
0           1           1           X       0
1           0           0           X       0
1           0           1           X       0
1           1           0           X       1
1           1           1           X       0

LE 2: data 1 = A, data 2 = B, data 3 = 0, data 4 = 0; LUT output = Y

data 1 (A)  data 2 (B)  data 3  data 4  LUT output (Y)
0           0           X       X       0
0           1           X       X       0
1           0           X       X       1
1           1           X       X       0

Exercise
Implement the function Y = JKLMPQR using Cyclone IV LEs.

131
Exercise
Implement the following sequential circuit using Cyclone IV LEs

FPGA Design Flow
Using a CAD tool (such as Altera’s Quartus II)
• Enter the design using schematic entry or an HDL
• Simulate the design
• Synthesize design and map it onto FPGA
• Download the configuration onto the FPGA
• Test the design
