RTL Design Slides PDF


Digital System Design

EEE344

M. Hassan Aslam
[email protected]
https://sites.google.com/ciitlahore.edu.pk/mianhassanaslam/
Digital Design
Chapter 5:
Register-Transfer Level
(RTL) Design
Slides to accompany the textbook Digital Design, First Edition,
by Frank Vahid, John Wiley and Sons Publishers, 2007.
http://www.ddvahid.com

Copyright © 2007 Frank Vahid


Instructors of courses requiring Vahid's Digital Design textbook (published by John Wiley and Sons) have permission to modify and use these slides for customary course-related activities,
subject to keeping this copyright notice in place and unmodified. These slides may be posted as unanimated pdf versions on publicly-accessible course websites.. PowerPoint source (or pdf
with animations) may not be posted to publicly-accessible websites, but may be posted for students on internal protected sites or distributed directly to students by other electronic means.
Instructors may make printouts of the slides available to students for a reasonable photocopying charge, without incurring royalties. Any other use requires explicit permission. Instructors
may obtain PowerPoint source or obtain special use permissions from Wiley – see http://www.ddvahid.com for information.
5.1
Introduction
◼ Chpt 2
❑ Capture Comb. behavior: Equations, truth tables
❑ Convert to circuit: AND + OR + NOT → Comb. logic
◼ Chpt 3
❑ Capture sequential behavior: FSMs
❑ Convert to circuit: Register + Comb. logic → Controller
◼ Chpt 4
❑ Datapath components, simple datapaths
◼ Chpt 5
❑ Capture behavior: High-level state machine
❑ Convert to circuit: Controller + Datapath → Processor
❑ Known as “RTL” (register-transfer level) design
Processors:
• Programmable (microprocessor)
• Custom
[Figure: Levels of digital design abstraction, from top to bottom: higher levels, register-transfer level (RTL), logic level, transistor level]
Note: Slides with animation are denoted with a small red "a" near the animated items
RTL Design: Capture Behavior, Convert to Circuit
◼ Recall
❑ Chapter 2: Combinational Logic Design
◼ First step: Capture behavior (using equation or truth table)
◼ Remaining steps: Convert to circuit
❑ Chapter 3: Sequential Logic Design
◼ First step: Capture behavior (using FSM)
◼ Remaining steps: Convert to circuit
◼ RTL Design (the method for creating custom processors)
❑ First step: Capture behavior (using high-level state machine, to be introduced)
❑ Remaining steps: Convert to circuit
5.2
RTL Design Method

RTL Design Method: “Preview” Example
◼ Soda dispenser
❑ c: bit input, 1 when coin deposited
❑ a: 8-bit input having value of deposited coin
❑ s: 8-bit input having cost of a soda
❑ d: bit output, processor sets to 1 when total value of deposited coins equals or exceeds cost of a soda
[Figure: Soda dispenser processor block with inputs c, a, s and output d; animated example with coin values of 25, a soda cost of 50, and the running total tot]
How can we precisely describe this processor’s behavior?
Preview Example: Step 1 -- Capture High-Level State Machine
◼ Declare local register tot
◼ Init state: Set d=0, tot=0
◼ Wait state: wait for coin
❑ If see coin, go to Add state
◼ Add state: Update total value: tot = tot + a
❑ Remember, a is present coin’s value
❑ Go back to Wait state
◼ In Wait state, if tot >= s, go to Disp(ense) state
◼ Disp state: Set d=1 (dispense soda)
❑ Return to Init state
High-level state machine:
Inputs: c (bit), a (8 bits), s (8 bits)
Outputs: d (bit)
Local registers: tot (8 bits)
States: Init (d=0, tot=0), Wait, Add (tot=tot+a), Disp (d=1)
Transitions: Init → Wait; Wait → Add on c; Wait → Wait on c’*(tot<s); Wait → Disp on c’*(tot<s)’; Add → Wait; Disp → Init
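The high-level state machine above can be sketched in C as a cycle-by-cycle software model. This is only an illustration, not part of the original slides: the names SodaDispenser, soda_reset and soda_clock are hypothetical, and each call to soda_clock is assumed to stand for one clock cycle in which the current state's actions run and one transition is taken.

```c
#include <stdint.h>

/* States of the soda dispenser high-level state machine. */
typedef enum { INIT, WAIT, ADD, DISP } SodaState;

typedef struct {
    SodaState state; /* current state */
    uint8_t tot;     /* local register tot (8 bits) */
    uint8_t d;       /* output d, as last set by a state's actions */
} SodaDispenser;

/* Hypothetical helper: start in the Init state. */
void soda_reset(SodaDispenser *m) {
    m->state = INIT;
    m->tot = 0;
    m->d = 0;
}

/* One clock cycle: run the current state's actions, then take a transition. */
void soda_clock(SodaDispenser *m, uint8_t c, uint8_t a, uint8_t s) {
    switch (m->state) {
    case INIT:
        m->d = 0;
        m->tot = 0;
        m->state = WAIT;
        break;
    case WAIT:
        if (c)               m->state = ADD;   /* c            */
        else if (m->tot < s) m->state = WAIT;  /* c'*(tot<s)   */
        else                 m->state = DISP;  /* c'*(tot<s)'  */
        break;
    case ADD:
        m->tot = (uint8_t)(m->tot + a);        /* tot = tot + a */
        m->state = WAIT;
        break;
    case DISP:
        m->d = 1;                              /* dispense soda */
        m->state = INIT;
        break;
    }
}
```

Calling soda_clock with c=1 and a=25 twice (each followed by a cycle in Add) against s=50 should drive the model through Wait, Add, Wait, Add, Wait, Disp, at which point d becomes 1.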
Preview Example: Step 2 -- Create Datapath
◼ Need tot register
◼ Need 8-bit comparator to compare s and tot
◼ Need 8-bit adder to perform tot = tot + a
◼ Wire the components as needed for above
◼ Create control input/outputs, give them names
[Figure: Datapath with inputs s and a (8 bits each). Register tot has control inputs tot_ld (ld) and tot_clr (clr); an 8-bit comparator (<) compares tot with s and outputs tot_lt_s; an 8-bit adder computes tot + a and feeds the tot register. The high-level state machine from Step 1 is shown alongside]
Preview Example: Step 3 – Connect Datapath to a Controller
◼ Controller’s inputs
❑ External input c (coin detected)
❑ Input from datapath comparator’s output, which we named tot_lt_s
◼ Controller’s outputs
❑ External output d (dispense soda)
❑ Outputs to datapath to load and clear the tot register
[Figure: Controller with external input c and external output d, connected to the datapath by tot_ld and tot_clr (controller to datapath) and tot_lt_s (datapath to controller); the datapath has external inputs s and a (8 bits each)]
Preview Example: Step 4 – Derive the Controller’s FSM
◼ Same states and arcs as high-level state machine
◼ But set/read datapath control signals for all datapath operations and conditions
Controller FSM:
Inputs: c, tot_lt_s (bit)
Outputs: d, tot_ld, tot_clr (bit)
States: Init (d=0, tot_clr=1), Wait, Add (tot_ld=1), Disp (d=1)
Transitions: Init → Wait; Wait → Add on c; Wait → Wait on c’*tot_lt_s; Wait → Disp on c’*tot_lt_s’; Add → Wait; Disp → Init
[Figure: controller FSM shown beside the controller/datapath connection from Step 3]
Preview Example: Completing the Design
◼ Implement the FSM as a state register and logic
❑ As in Ch3
❑ Table shown below
Inputs: c, tot_lt_s (bit)
Outputs: d, tot_ld, tot_clr (bit)
State  s1 s0 c  tot_lt_s | n1 n0 d  tot_ld tot_clr
Init   0  0  0  0        | 0  1  0  0      1
Init   0  0  0  1        | 0  1  0  0      1
Init   0  0  1  0        | 0  1  0  0      1
Init   0  0  1  1        | 0  1  0  0      1
Wait   0  1  0  0        | 1  1  0  0      0
Wait   0  1  0  1        | 0  1  0  0      0
Wait   0  1  1  0        | 1  0  0  0      0
Wait   0  1  1  1        | 1  0  0  0      0
Add    1  0  0  0        | 0  1  0  1      0
Disp   1  1  0  0        | 0  0  1  0      0
(Only the first input combination is listed for Add and Disp.)
[Figure: controller FSM from Step 4]
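As a sketch of what the controller's combinational logic could look like, the C function below encodes one set of equations that is consistent with the table rows listed for this slide. The equations were derived here from those rows and are not given in the slides; the names CtrlOut and soda_ctrl_logic are hypothetical.

```c
/* Controller logic outputs for one combination of state bits and inputs. */
typedef struct { int n1, n0, d, tot_ld, tot_clr; } CtrlOut;

/* State encoding taken from the table: Init=00, Wait=01, Add=10, Disp=11. */
CtrlOut soda_ctrl_logic(int s1, int s0, int c, int tot_lt_s) {
    CtrlOut o;
    int init = !s1 && !s0;
    int wait = !s1 && s0;
    int add  = s1 && !s0;
    int disp = s1 && s0;

    /* Next state: Wait goes to Add (10) on c, or to Disp (11) on c'*tot_lt_s'. */
    o.n1 = wait && (c || !tot_lt_s);
    /* n0 is 1 when the next state is Wait (01) or Disp (11). */
    o.n0 = init || (wait && !c) || add;

    /* Outputs depend only on the current state. */
    o.d = disp;
    o.tot_ld = add;
    o.tot_clr = init;
    return o;
}
```

For example, soda_ctrl_logic(0, 1, 0, 0) models the Wait state with no coin and tot >= s, and yields next state 11 (Disp) with all outputs 0, matching the corresponding table row.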
Step 1: Create a High-Level State Machine
◼ Let’s consider each step of the RTL design process in more detail
◼ Step 1
❑ Soda dispenser example
❑ Not an FSM because:
◼ Multi-bit (data) inputs a and s
◼ Local register tot
◼ Data operations tot=0, tot<s, tot=tot+a.
❑ Useful high-level state machine:
◼ Data types beyond just bits
◼ Local registers
◼ Arithmetic equations/expressions
[Figure: soda dispenser high-level state machine. Inputs: c (bit), a (8 bits), s (8 bits); Outputs: d (bit); Local registers: tot (8 bits)]
Example: Laser-Based Distance Measurer
[Figure: a laser pulse travels distance D to the object of interest and reflects back to the sensor in time T (in seconds); 2D = T sec * 3*10^8 m/sec]
◼ Laser-based distance measurement – pulse laser, measure time T to sense reflection
❑ Laser light travels at speed of light, 3*10^8 m/sec
❑ Distance is thus D = (T sec * 3*10^8 m/sec) / 2
Example: Laser-Based Distance Measurer
[Figure: laser-based distance measurer block with input B (from button), output L (to laser), input S (from sensor), and 16-bit output D (to display); T (in seconds) is the time for the laser pulse to return to the sensor]
◼ Inputs/outputs
❑ B: bit input, from button, to begin measurement
❑ L: bit output, activates laser
❑ S: bit input, senses laser reflection
❑ D: 16-bit output, to display computed distance
Example: Laser-Based Distance Measurer
DistanceMeasurer
Inputs: B (bit), S (bit)
Outputs: L (bit), D (16 bits)
Local storage: Dreg (16 bits) (required)
S0 (first state usually initializes the system):
L := '0' // laser off
Dreg := 0 // distance is 0
◼ Declare inputs, outputs, and local storage
❑ Dreg required for multi-bit output
◼ Create initial state, name it S0
❑ Initialize laser to off (L:='0')
❑ Initialize displayed distance to 0 (Dreg:=0)
Recall: '0' means single bit, 0 means integer
Example: Laser-Based Distance Measurer
DistanceMeasurer ...
S0: L := '0', Dreg := 0; go to S1
S1: stay on B' // button not pressed; leave on B // button pressed
◼ Add another state, S1, that waits for a button press
❑ B' – stay in S1, keep waiting
❑ B – go to a new state S2
Q: What should S2 do? A: Turn on the laser
Example: Laser-Based Distance Measurer
DistanceMeasurer ...
S0: L := '0', Dreg := 0; go to S1
S1: stay on B'; go to S2 on B
S2: L := '1' // laser on; go to S3
S3: L := '0' // laser off
◼ Add a state S2 that turns on the laser (L:='1')
◼ Then turn off laser (L:='0') in a state S3
Q: What to do next? A: Start timer, wait to sense reflection
Example: Laser-Based Distance Measurer
DistanceMeasurer
Inputs: B (bit), S (bit)
Outputs: L (bit), D (16 bits)
Local storage: Dreg, Dctr (16 bits)
S0: L := '0', Dreg := 0; go to S1
S1: Dctr := 0 // reset cycle count; stay on B'; go to S2 on B
S2: L := '1'; go to S3
S3: L := '0', Dctr := Dctr + 1 // count cycles; stay on S' // no reflection; leave on S // reflection
◼ Stay in S3 until sense reflection (S)
◼ To measure time, count cycles while in S3
❑ To count, declare local storage Dctr
❑ Initialize Dctr to 0 in S1. In S2 would have been O.K. too.
◼ Don't forget to initialize local storage – common mistake
❑ Increment Dctr each cycle in S3
Example: Laser-Based Distance Measurer
DistanceMeasurer
Inputs: B (bit), S (bit)
Outputs: L (bit), D (16 bits)
Local storage: Dreg, Dctr (16 bits)
S0: L := '0', Dreg := 0; go to S1
S1: Dctr := 0; stay on B'; go to S2 on B
S2: L := '1'; go to S3
S3: L := '0', Dctr := Dctr+1; stay on S'; go to S4 on S
S4: Dreg := Dctr/2 // calculate D; go to S1
◼ Once reflection detected (S), go to new state S4
❑ Calculate distance
❑ Assuming clock frequency is 3x10^8 Hz, light travels 1 meter per clock cycle, so Dctr holds the round-trip distance in meters and Dreg:=Dctr/2
◼ After S4, go back to S1 to wait for button again
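The completed high-level state machine can be exercised with a small C model. This is a sketch under stated assumptions rather than anything from the slides: the names DistanceMeasurer, dm_reset and dm_clock are hypothetical, one call to dm_clock stands for one clock cycle, and the clock is assumed to run at 3x10^8 Hz so that each counted cycle corresponds to 1 meter of light travel.

```c
#include <stdint.h>

typedef enum { S0, S1, S2, S3, S4 } DmState;

typedef struct {
    DmState state;
    uint16_t dctr;  /* local storage Dctr: cycle counter */
    uint16_t dreg;  /* local storage Dreg: drives output D */
    uint8_t l;      /* output L: 1 turns the laser on */
} DistanceMeasurer;

/* Hypothetical helper: start in S0. */
void dm_reset(DistanceMeasurer *m) {
    m->state = S0;
    m->dctr = 0;
    m->dreg = 0;
    m->l = 0;
}

/* One clock cycle: b is the button input B, s is the sensor input S. */
void dm_clock(DistanceMeasurer *m, int b, int s) {
    switch (m->state) {
    case S0:                       /* initialize */
        m->l = 0;
        m->dreg = 0;
        m->state = S1;
        break;
    case S1:                       /* wait for button, reset cycle count */
        m->dctr = 0;
        m->state = b ? S2 : S1;
        break;
    case S2:                       /* laser on */
        m->l = 1;
        m->state = S3;
        break;
    case S3:                       /* laser off, count cycles until reflection */
        m->l = 0;
        m->dctr++;
        m->state = s ? S4 : S3;
        break;
    case S4:                       /* round trip is Dctr meters, so D = Dctr/2 */
        m->dreg = (uint16_t)(m->dctr / 2);
        m->state = S1;
        break;
    }
}
```

If the model spends ten cycles in S3 before the reflection is sensed, Dctr reaches 10 and S4 stores 5 in Dreg.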
Step 2: Create a Datapath
◼ Datapath must
❑ Implement data storage
❑ Implement data computations
◼ Look at high-level state machine, do three substeps
❑ (a) Make data inputs/outputs be datapath inputs/outputs
❑ (b) Instantiate declared registers into the datapath (also instantiate a register for each data output)
❑ (c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations
Instantiate: to introduce a new component into a design.
Step 2 Example: Laser-Based Distance Measurer
(a) Make data inputs/outputs be datapath inputs/outputs
(b) Instantiate declared registers into the datapath (also instantiate a register for each data output)
(c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations
High-level state machine:
Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits)
Local Registers: Dctr (16 bits)
S0: L = 0, D = 0; go to S1
S1: Dctr = 0; stay on B’; go to S2 on B
S2: L = 1; go to S3
S3: L = 0, Dctr = Dctr + 1; stay on S’; go to S4 on S
S4: D = Dctr / 2 (calculate D)
[Figure: Datapath after substeps (a) and (b). Dctr: 16-bit up-counter with control inputs Dctr_clr (clear) and Dctr_cnt (count); Dreg: 16-bit register with control inputs Dreg_clr (clear) and Dreg_ld (load) and data input I; the output Q of Dreg drives the 16-bit output D]
Step 2 Example: Laser-Based Distance Measurer
(c) (continued) Examine every state and transition, and instantiate datapath components and connections to implement any data computations
[Figure: Datapath after substep (c), shown with the same high-level state machine. The 16-bit output Q of the Dctr up-counter passes through a >>1 (shift right by one) component, which computes Dctr / 2, into the data input I of Dreg; the output Q of Dreg drives the 16-bit output D]
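Step 2 Example Showing Mux Use
Local registers: E, F, G, R (16 bits)
T0: R = E + F
T1: R = R + G
[Figure: panels (a) to (d). An adder (+) with inputs A and B computes R = E + F for state T0 and R = R + G for state T1; in the final datapath, a 2×1 mux on each adder input, selected by add_A_s0 and add_B_s0, chooses the source for A and B, and the sum is stored in R]
◼ Introduce mux when one component input can come from more than one source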
Step 3: Connecting the Datapath to a Controller
◼ Laser-based distance measurer example
◼ Easy – just connect all control signals between controller and datapath
[Figure: Controller with inputs B (from button) and S (from sensor) and output L (to laser), connected to the datapath by the control signals Dreg_clr, Dreg_ld, Dctr_clr, and Dctr_cnt; the datapath drives the 16-bit output D (to display); 300 MHz clock. Datapath as in Step 2: Dctr 16-bit up-counter, >>1, Dreg 16-bit register]
Step 4: Deriving the Controller’s FSM
◼ FSM has same structure as high-level state machine
❑ Inputs/outputs all bits now
❑ Replace data operations by bit operations using datapath
High-level state machine:
Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits)
Local Registers: Dctr (16 bits)
S0: L=0, D=0; S1: Dctr = 0; S2: L=1; S3: L=0, Dctr = Dctr + 1; S4: D = Dctr / 2 (calculate D)
Controller FSM:
Inputs: B, S
Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt
Transitions: S0 → S1; S1 → S1 on B’; S1 → S2 on B; S2 → S3; S3 → S3 on S’; S3 → S4 on S; S4 → S1
State  L  Dreg_clr  Dreg_ld  Dctr_clr  Dctr_cnt  Comment
S0     0  1         0        0         0         (laser off) (clear D reg)
S1     0  0         0        1         0         (clear count)
S2     1  0         0        0         0         (laser on)
S3     0  0         0        0         1         (laser off) (count up)
S4     0  0         1        0         0         (load D reg with Dctr/2) (stop counting)
[Figure: controller and datapath connected as in Step 3, 300 MHz clock]
Step 4: Deriving the Controller’s FSM
◼ Using shorthand of outputs not assigned implicitly assigned 0
Inputs: B, S Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt
Transitions: S0 → S1; S1 → S1 on B’; S1 → S2 on B; S2 → S3; S3 → S3 on S’; S3 → S4 on S; S4 → S1
S0: L = 0, Dreg_clr = 1 (laser off) (clear D reg)
S1: Dctr_clr = 1 (clear count)
S2: L = 1 (laser on)
S3: L = 0, Dctr_cnt = 1 (laser off) (count up)
S4: Dreg_ld = 1, Dctr_cnt = 0 (load D reg with Dctr/2) (stop counting)
Step 4
◼ Implement FSM as state register and logic (Ch3) to complete the design
[Figure: controller and datapath of the laser-based distance measurer connected through Dreg_clr, Dreg_ld, Dctr_clr, and Dctr_cnt (300 MHz clock), shown with the shorthand controller FSM: Inputs: B, S; Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt; states S0 to S4]
RTL Design Examples and Issues
◼ We’ll use several more examples to illustrate RTL design
◼ Example: Bus interface
❑ Master processor can read register from any peripheral
◼ Each register has unique 4-bit address
◼ Assume 1 register/periph.
❑ Sets rd=1, A=address
❑ Appropriate peripheral places register data on 32-bit D lines
◼ Periph’s address provided on Faddr inputs (maybe from DIP switches, or another register)
[Figure: Master processor connected by a bus (rd, 32-bit D, 4-bit A) to peripherals Per0, Per1, ..., Per15. Inside a peripheral, the bus interface sits between the processor bus (rd, D, A) and the peripheral’s main part, and receives the 4-bit Faddr and the 32-bit register value Q]
RTL Example: Bus Interface
Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)
Outputs: D (32 bits)
Local register: Q1 (32 bits)
WaitMyAddress: D = “Z”, Q1 = Q; stay on ((A = Faddr) and rd)’; go to SendData on (A = Faddr) and rd
SendData: D = Q1; stay on rd; go to WaitMyAddress on rd’
◼ Step 1: Create high-level state machine
❑ State WaitMyAddress
◼ Output “nothing” (“Z”) on D, store peripheral’s register value Q into local register Q1
◼ Wait until this peripheral’s address is seen (A=Faddr) and rd=1
❑ State SendData
◼ Output Q1 onto D, wait for rd=0 (meaning main processor is done reading the D lines)
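The two-state machine above can be modeled in C as follows. This is a rough sketch with hypothetical names (BusInterface, bus_reset, bus_clock); the “Z” value on D is represented by a separate d_driven flag, which is a modeling assumption and not something the slides prescribe.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { WAIT_MY_ADDRESS, SEND_DATA } BusState;

typedef struct {
    BusState state;
    uint32_t q1;    /* local register Q1 */
    bool d_driven;  /* false models D = "Z" (nothing driven onto D) */
    uint32_t d;     /* value on D, meaningful only when d_driven is true */
} BusInterface;

/* Hypothetical helper: start in WaitMyAddress with D undriven. */
void bus_reset(BusInterface *p) {
    p->state = WAIT_MY_ADDRESS;
    p->q1 = 0;
    p->d_driven = false;
    p->d = 0;
}

/* One clock cycle of the bus interface high-level state machine. */
void bus_clock(BusInterface *p, bool rd, uint8_t a, uint8_t faddr, uint32_t q) {
    switch (p->state) {
    case WAIT_MY_ADDRESS:
        p->d_driven = false;              /* D = "Z" */
        p->q1 = q;                        /* Q1 = Q  */
        if (rd && a == faddr)             /* (A = Faddr) and rd */
            p->state = SEND_DATA;
        break;
    case SEND_DATA:
        p->d_driven = true;               /* D = Q1 */
        p->d = p->q1;
        if (!rd)                          /* rd' : master is done reading */
            p->state = WAIT_MY_ADDRESS;
        break;
    }
}
```

A read of the peripheral at address 5 would then look like: the master holds rd=1 with A=5, the model moves to SendData and drives the captured Q1 onto D, and it releases D again after rd returns to 0.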
RTL Example: Bus Interface
◼ Step 2: Create a datapath
(a) Datapath inputs/outputs
(b) Instantiate declared registers
(c) Instantiate datapath components and connections
[Figure: Bus interface datapath. Inputs A and Faddr (4 bits each) feed a 4-bit equality comparator (=) that outputs A_eq_Faddr; input Q (32 bits) feeds register Q1, loaded by Q1_ld; the 32-bit output of Q1 drives output D when D_en is 1. The high-level state machine is shown alongside, with the WaitMyAddress self-loop labeled ((A = Faddr) and rd)’]
RTL Example: Bus Interface
◼ Step 3: Connect datapath to controller
◼ Step 4: Derive controller’s FSM
Controller FSM:
Inputs: rd, A_eq_Faddr (bit)
Outputs: Q1_ld, D_en (bit)
WaitMyAddress: D_en = 0, Q1_ld = 1; stay on (A_eq_Faddr and rd)’; go to SendData on A_eq_Faddr and rd
SendData: D_en = 1, Q1_ld = 0; stay on rd; go to WaitMyAddress on rd’
[Figure: controller FSM connected to the bus interface datapath (A, Faddr, Q inputs; Q1 register; 4-bit = comparator; D output)]
RTL Example: Video Compression – Sum of Absolute Differences
◼ Video is a series of frames (e.g., 30 per second)
◼ Most frames similar to previous frame
❑ Compression idea: just send difference from previous frame
[Figure: (a) Frame 1 and Frame 2, only difference: ball moving; digitized frame 1 is 1 Mbyte and digitized frame 2 is 1 Mbyte. (b) Digitized frame 1 (1 Mbyte) followed by the difference of 2 from 1 (0.01 Mbyte): just send difference]
RTL Example: Video Compression – Sum of Absolute Differences
◼ Need to quickly determine whether two frames are similar enough to just send difference for second frame
❑ Compare corresponding 16x16 “blocks”
◼ Treat 16x16 block as 256-byte array
❑ Compute the absolute value of the difference of each array item
❑ Sum those differences – if above a threshold, send complete frame for second frame; if below, can use difference method (using another technique, not described)
[Figure: compare corresponding blocks of Frame 1 and Frame 2. Each is a pixel, assume represented as 1 byte (actually, a color picture might have 3 bytes per pixel, for intensity of red, green, and blue components of pixel)]
RTL Example: Video Compression – Sum of Absolute Differences
[Figure: SAD component with inputs A (256-byte array), B (256-byte array), and go, and integer output sad]
◼ Want fast sum-of-absolute-differences (SAD) component
❑ When go=1, sums the absolute differences of element pairs in arrays A and B, outputs that sum
RTL Example: Video Compression – Sum of Absolute Differences
Inputs: A, B (256 byte memory); go (bit)
Outputs: sad (32 bits)
Local registers: sum, sad_reg (32 bits); i (9 bits)
◼ S0: wait for go
◼ S1: initialize sum and index
◼ S2: check if done (i>=256)
◼ S3: add difference to sum, increment index
◼ S4: done, write to output sad_reg
High-level state machine:
S0: stay on !go; go to S1 on go
S1: sum = 0, i = 0; go to S2
S2: go to S3 on i<256; go to S4 on (i<256)’
S3: sum=sum+abs(A[i]-B[i]), i=i+1; go to S2
S4: sad_reg = sum
RTL Example: Video Compression – Sum of Absolute Differences
◼ Step 2: Create datapath
[Figure: SAD datapath, shown with the high-level state machine. The 9-bit register i (controls i_inc, i_clr) drives AB_addr and a <256 comparator that outputs i_lt_256; the 8-bit inputs A_data and B_data feed a subtractor (–) followed by abs; the result is added (+) to the 32-bit register sum (controls sum_ld, sum_clr); register sad_reg (control sad_reg_ld) loads sum and drives the 32-bit output sad]
Example: Video Compression – Sum of Absolute Differences
◼ Step 3: Connect to controller
◼ Step 4: Replace high-level state machine by FSM
Controller FSM (high-level action and the control signals that replace it):
S0: stay on go’; go to S1 on go
S1: sum=0 becomes sum_clr=1; i=0 becomes i_clr=1
S2: condition i<256 becomes i_lt_256
S3: sum=sum+abs(A[i]-B[i]) becomes sum_ld=1; AB_rd=1; i=i+1 becomes i_inc=1
S4: sad_reg=sum becomes sad_reg_ld=1
[Figure: controller (input go, output AB_rd) connected to the SAD datapath (AB_addr, A_data, B_data, i_lt_256, i_inc, i_clr, sum_ld, sum_clr, sad_reg_ld, sad)]
RTL Example: Video Compression – Sum of Absolute Differences
◼ Comparing software and custom circuit SAD
❑ Circuit: Two states (S2 & S3) for each i, 256 i’s → 512 clock cycles
❑ Software: Loop (for i = 1 to 256), but for each i, must move memory to local registers, subtract, compute absolute value, add to sum, increment i – say about 6 cycles per array item → 256*6 = 1536 cycles
❑ Circuit is about 3 times (300%) faster
❑ Later, we’ll see how to build SAD circuit that is even faster
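The cycle count for the circuit can be checked with a small C model of states S1 to S4. This is an illustrative sketch only: the function name sad_hlsm and the cycles output parameter are hypothetical, and one visit to S2 or S3 is assumed to cost one clock cycle.

```c
#include <stdint.h>
#include <stdlib.h>

/* Computes the sum of absolute differences of two 256-byte arrays by stepping
 * through states S2 and S3 of the high-level state machine. *cycles receives
 * the number of S2/S3 state visits. */
uint32_t sad_hlsm(const uint8_t A[256], const uint8_t B[256], unsigned *cycles) {
    uint32_t sum = 0;   /* S1: sum = 0 */
    unsigned i = 0;     /* S1: i = 0 (a 9-bit register in the slides) */
    unsigned n = 0;

    for (;;) {
        n++;                        /* S2: check i < 256 */
        if (!(i < 256)) break;      /* (i<256)' leads to S4 */
        n++;                        /* S3: accumulate and increment */
        sum += (uint32_t)abs((int)A[i] - (int)B[i]);
        i = i + 1;
    }
    *cycles = n;
    return sum;                     /* S4: sad_reg = sum */
}
```

The model reports 513 visits: the 512 cycles quoted above (two states for each of the 256 items) plus one final visit to S2 that detects the end of the loop.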
RTL Design Pitfalls and Good Practice
◼ Common pitfall: Assuming register is updated in the state it’s written
❑ Final value of Q?
❑ Final state?
❑ Answers may surprise you
◼ Value of Q unknown
◼ Final state is C, not D
❑ Why?
◼ State A: R=99 and Q=R happen simultaneously
◼ State B: R not updated with R+1 until next clock cycle, simultaneously with state register being updated
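The pitfall can be reproduced in a C model that computes every next value from the current register contents and commits them together at the clock edge. This is a sketch under assumptions: the slide's figure did not survive extraction, so the transition condition out of state B (go to C when R<100, otherwise D) is assumed here, and the names Pitfall and pitfall_clock are hypothetical.

```c
#include <stdint.h>

typedef struct {
    char state;    /* 'A', 'B', 'C', or 'D' */
    uint8_t r, q;  /* registers R and Q */
} Pitfall;

/* One clock edge: all next values are computed from the CURRENT register
 * contents, then committed together, as in real registers. */
void pitfall_clock(Pitfall *m) {
    uint8_t r_next = m->r;
    uint8_t q_next = m->q;
    char s_next = m->state;

    switch (m->state) {
    case 'A':
        r_next = 99;                         /* R = 99 */
        q_next = m->r;                       /* Q = R reads the OLD value of R */
        s_next = 'B';
        break;
    case 'B':
        r_next = (uint8_t)(m->r + 1);        /* R = R + 1 */
        s_next = (m->r < 100) ? 'C' : 'D';   /* assumed condition, sees old R */
        break;
    default:
        break;                               /* C and D: no further actions */
    }

    m->r = r_next;
    m->q = q_next;
    m->state = s_next;
}
```

Starting in state A with an arbitrary value in R, two clock edges leave the model in state C with Q holding that arbitrary old value, which matches the answers on the slide.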
RTL Design Pitfalls and Good Practice
◼ Solutions
❑ Read register in following state (Q=R)
❑ Insert extra state so that conditions use updated value
❑ Other solutions are possible, depends on the example
RTL Design Pitfalls and Good Practice
◼ Common pitfall: Reading outputs
❑ Outputs can only be written
❑ Solution: Introduce additional register, which can be written and read
(a) Inputs: A, B (8 bits); Outputs: P (8 bits). State S: P=A; state T: P=P+B
(b) Inputs: A, B (8 bits); Outputs: P (8 bits); Local register: R (8 bits). State S: R=A, P=A; state T: P=R+B
RTL Design Pitfalls and Good Practice
◼ Good practice: Register all data outputs
❑ In fig (a), output P would show spurious values as addition computes
◼ Furthermore, longest register-to-register path, which determines clock period, is not known until that output is connected to another component
❑ In fig (b), spurious outputs reduced, and longest register-to-register path is clear
[Figure: (a) an adder (+) fed by B and R drives output P directly; (b) the adder output is stored in register Preg, which drives P]
Control vs. Data Dominated RTL Design
◼ Designs often categorized as control-dominated or data-dominated
❑ Control-dominated design – Controller contains most of the complexity
❑ Data-dominated design – Datapath contains most of the complexity
❑ General, descriptive terms – no hard rule that separates the two types of designs
❑ Laser-based distance measurer – control dominated
❑ Bus interface – mix of control and data
❑ Now let’s do a data dominated design
Data Dominated RTL Design Example: FIR Filter
◼ Filter concept
❑ Suppose X is data from a temperature sensor, and particular input sequence is 180, 180, 181, 240, 180, 181 (one per clock cycle)
❑ That 240 is probably wrong!
◼ Could be electrical noise
❑ Filter should remove such noise in its output Y
❑ Simple filter: Output average of last N values
◼ Small N: less filtering
◼ Large N: more filtering, but less sharp output
[Figure: digital filter block with 12-bit input X, 12-bit output Y, and clock input clk]
Data Dominated RTL Design Example: FIR Filter
◼ FIR filter
❑ “Finite Impulse Response”
❑ Simply a configurable weighted sum of past input values
❑ y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
◼ Above known as “3 tap”
◼ Tens of taps more common
◼ Very general filter – User sets the constants (c0, c1, c2) to define specific filter
❑ RTL design
◼ Step 1: Create high-level state machine
❑ But there really is none! Data dominated indeed.
◼ Go straight to step 2
Data Dominated RTL Design Example: FIR Filter
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
◼ Step 2: Create datapath
❑ Begin by creating chain of xt registers to hold past values of X
[Figure: chain of xt registers clocked by clk inside the digital filter. Suppose sequence is: 180, 181, 240; the values shift along the chain one register per clock cycle, so the registers end up holding 240, 181, 180]
Data Dominated RTL Design Example: FIR Filter
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
◼ Step 2: Create datapath (cont.)
❑ Instantiate registers for c0, c1, c2
❑ Instantiate multipliers to compute c*x values
[Figure: 3-tap FIR filter. Registers xt0, xt1, xt2 hold x(t), x(t-1), x(t-2); registers c0, c1, c2 hold the constants; three multipliers (*) compute the c*x products]
Data Dominated RTL Design Example: FIR Filter
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
◼ Step 2: Create datapath (cont.)
❑ Instantiate adders
[Figure: 3-tap FIR filter as before, with two adders (+) summing the three multiplier outputs to produce Y]
Data Dominated RTL Design Example: FIR Filter
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
◼ Step 2: Create datapath (cont.)
❑ Add circuitry to allow loading of particular c register
[Figure: 3-tap FIR filter with a 2x4 decoder (inputs Ca1, Ca0, enable e driven by CL; outputs 0 to 3) that selects which of the registers c0, c1, c2 loads the value on input C; the sum of the three products is stored in register yreg, which drives Y]
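The datapath's arithmetic can be sketched in C as a sample-by-sample model. This is an illustration with hypothetical names (Fir3, fir3_load_c, fir3_step), not the slides' circuit: it uses 32-bit integers rather than the 12-bit signals of the slides, and it shifts the new sample in before computing the sum, so the extra cycle of latency introduced by yreg is not modeled.

```c
#include <stdint.h>

typedef struct {
    int32_t c[3];   /* constant registers c0, c1, c2 */
    int32_t xt[3];  /* xt0, xt1, xt2 hold x(t), x(t-1), x(t-2) */
    int32_t yreg;   /* output register */
} Fir3;

/* Load one constant register; ca plays the role of the Ca1/Ca0 address. */
void fir3_load_c(Fir3 *f, unsigned ca, int32_t value) {
    if (ca < 3) f->c[ca] = value;
}

/* Accept one input sample and return y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2). */
int32_t fir3_step(Fir3 *f, int32_t x) {
    /* Shift the register chain: xt2 <- xt1 <- xt0 <- X. */
    f->xt[2] = f->xt[1];
    f->xt[1] = f->xt[0];
    f->xt[0] = x;

    /* Three multipliers and two adders. */
    f->yreg = f->c[0] * f->xt[0] + f->c[1] * f->xt[1] + f->c[2] * f->xt[2];
    return f->yreg;
}
```

With all three constants set to 1 and the xt registers initially 0, feeding the sequence 180, 181, 240 returns 180, 361, and then 601, the sum of the last three samples.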
Data Dominated RTL Design Example: FIR Filter
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
◼ Step 3 & 4: Connect to controller, Create FSM
❑ No controller needed
❑ Extreme data-dominated example
❑ (Example of an extreme control-dominated design – an FSM, with no datapath)
◼ Comparing the FIR circuit to a software implementation
❑ Circuit
◼ Assume adder has 2-gate delay, multiplier has 20-gate delay
◼ Longest path goes through one multiplier and two adders
❑ 20 + 2 + 2 = 24-gate delay
◼ 100-tap filter, following design on previous slide, would have about a 34-gate delay: 1 multiplier and 7 adders on longest path (20 + 7*2 = 34)
❑ Software
◼ 100-tap filter: 100 multiplications, 100 additions. Say 2 instructions per multiplication, 2 per addition. Say 10-gate delay per instruction.
◼ (100*2 + 100*2)*10 = 4000 gate delays
❑ Circuit is more than 100 times faster (10,000% faster). Wow.
5.4
Determining Clock Frequency
◼ Designers of digital circuits often want fastest performance
❑ Means want high clock frequency
◼ Frequency limited by longest register-to-register delay
❑ Known as critical path
❑ If clock is any faster, incorrect data may be stored into register
❑ Longest path on right is 2 ns
◼ Ignoring wire delays, and register setup and hold times, for simplicity
[Figure: registers a and b, clocked by clk, feed an adder (+) with 2 ns delay whose output is stored in register c]
Critical Path
◼ Example shows four paths
❑ a to c through +: 2 ns
❑ a to d through + and *: 7 ns
❑ b to d through + and *: 7 ns
❑ b to d through *: 5 ns
◼ Longest path is thus 7 ns
◼ Fastest frequency
❑ 1 / 7 ns = 142 MHz
[Figure: registers a and b feed an adder (+, 2 ns delay) and a multiplier (*, 5 ns delay), with outputs stored in registers c and d; Max (2,7,7,5) = 7 ns]
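The arithmetic on this slide can be written out as a short C sketch. The function names critical_path_ns and max_clock_mhz are hypothetical, and, as on the slide, wire delays and register setup and hold times are ignored.

```c
#include <stddef.h>

/* Returns the longest delay (in ns) among the given register-to-register paths. */
double critical_path_ns(const double *delays_ns, size_t n) {
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (delays_ns[i] > worst) worst = delays_ns[i];
    }
    return worst;
}

/* Converts a clock period in ns to a frequency in MHz: f = 1/T, and
 * a 1 ns period corresponds to 1000 MHz. */
double max_clock_mhz(double critical_ns) {
    return 1000.0 / critical_ns;
}
```

For the four paths above (2, 7, 7, and 5 ns), critical_path_ns returns 7, and max_clock_mhz(7.0) gives roughly 142.9 MHz, which the slide rounds down to 142 MHz.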
Critical Path Considering Wire Delays
◼ Real wires have delay too
❑ Must include in critical path
◼ Example shows two paths
❑ Each is 0.5 + 2 + 0.5 = 3 ns
◼ Trend
❑ 1980s/1990s: Wire delays were tiny compared to logic delays
❑ But wire delays not shrinking as fast as logic delays
◼ Wire delays may even be greater than logic delays!
◼ Must also consider register setup and hold times, also add to path
◼ Then add some time to the computed path, just to be safe
❑ e.g., if path is 3 ns, say 4 ns instead
[Figure: registers a and b, clocked by clk, connect through 0.5 ns wires to an adder (+, 2 ns), whose output reaches register c through another 0.5 ns wire; each path is 3 ns]
A Circuit May Have Numerous Paths
◼ Paths can exist
❑ In the datapath
❑ In the controller
❑ Between the controller and datapath
❑ May be hundreds or thousands of paths
◼ Timing analysis tools that evaluate all possible paths automatically are very helpful
[Figure: soda dispenser controller (combinational logic with inputs c and tot_lt_s, outputs d, tot_ld, tot_clr, n1, n0; state register s1 s0 clocked by clk) and datapath (tot register, 8-bit < comparator, 8-bit adder, inputs s and a), with example paths labeled (a), (b), and (c)]
5.5
Behavioral Level Design: C to Gates
C code
int SAD (byte A[256], byte B[256]) // not quite C syntax
{
   uint sum; short uint i;
   sum = 0;
   i = 0;
   while (i < 256) {
      sum = sum + abs(A[i] - B[i]);
      i = i + 1;
   }
   return sum;
}
[Figure: the SAD high-level state machine (S0 to S4) from the earlier example, shown beside the C code]
◼ Earlier sum-of-absolute-differences example
❑ Started with high-level state machine
❑ C code is an even better starting point -- easier to understand
Behavioral-Level Design: Start with C (or Similar Language)
◼ Replace first step of RTL design method by two steps
❑ Capture in C, then convert C to high-level state machine
❑ How to convert from C to high-level state machine?
Step 1A: Capture in C
Step 1B: Convert to high-level state machine
Converting from C to High-Level State Machine
◼ Convert each C construct to equivalent states and transitions
◼ Assignment statement
target = expression;
❑ Becomes one state with assignment
[Figure: single state containing target=expression]
◼ If-then statement
if (cond) {
   // then stmts
}
❑ Becomes state with condition check, transitioning to “then” statements if condition true, otherwise to ending state
◼ “then” statements would also be converted to states
[Figure: condition state with transition cond to (then stmts) and transition !cond to (end); (then stmts) also leads to (end)]
Converting from C to High-Level State Machine
◼ If-then-else
if (cond) {
   // then stmts
}
else {
   // else stmts
}
❑ Becomes state with condition check, transitioning to “then” statements if condition true, or to “else” statements if condition false
[Figure: condition state with transition cond to (then stmts) and transition !cond to (else stmts); both lead to (end)]
◼ While loop statement
while (cond) {
   // while stmts
}
❑ Becomes state with condition check, transitioning to while loop’s statements if true, then transitioning back to condition check
[Figure: condition state with transition cond to (while stmts), which returns to the condition state, and transition !cond to (end)]
Simple Example of Converting from C to High-Level State Machine
Inputs: uint X, Y
Outputs: uint Max
(a)
if (X > Y) {
   Max = X;
}
else {
   Max = Y;
}
(b) Condition state with transition X>Y to (then stmts) and transition !(X>Y) to (else stmts); both lead to (end)
(c) Same as (b), with (then stmts) replaced by state Max=X and (else stmts) replaced by state Max=Y
◼ Simple example: Computing the maximum of two numbers
❑ Convert if-then-else statement to states (b)
❑ Then convert assignment statements to states (c)
Example: Converting Sum-of-Absolute-Differences C code to High-Level State Machine
◼ Convert each construct to states
❑ Simplify when possible, e.g., merge states
◼ From high-level state machine, follow RTL design method to create circuit
◼ Thus, can convert C to gates using straightforward automatable process
❑ Not all C constructs can be efficiently converted
❑ Use C subset if intended for circuit
❑ Can use languages other than C, of course
(a)
Inputs: byte A[256], B[256]
        bit go;
Output: int sad
main()
{
   uint sum; short uint i;
   while (1) {
      while (!go);
      sum = 0;
      i = 0;
      while (i < 256) {
         sum = sum + abs(A[i] - B[i]);
         i = i + 1;
      }
      sad = sum;
   }
}
[Figure: panels (b) to (g) build the high-level state machine step by step: the wait loop on !go (first with conditions !(!go) and !go, then simplified to go and !go), the states sum=0 and i=0, the loop test i<256 / !(i<256) with the while statements sum=sum+abs(A[i]-B[i]) and i=i+1, and the final state sad = sum]
Register Files
◼ MxN register file component provides efficient access to M N-bit-wide registers
❑ If we have many registers but only need access one or two at a time, a register file is more efficient
❑ Ex: Above-mirror display (earlier example), but this time having 16 32-bit registers
◼ Too many wires, and big mux is too slow
[Figure: above-mirror display built from separate 32-bit registers reg0 to reg15, each with a load input driven by a 4×16 decoder (outputs d0 to d15), and a 32-bit 16×1 mux (inputs i0 to i15, select s3-s0) choosing the displayed register; annotations: huge mux, too much fanout, congestion]
Register File
◼ Instead, want component that has one data input and one data output, and allows us to specify which internal register to write and which to read
[Figure: 16×32 register file with write side W_data (32 bits), W_addr (4 bits), W_en and read side R_data (32 bits), R_addr (4 bits), R_en]
Register File Timing Diagram
◼ Can write one register and read one register each clock cycle
❑ May be same register
[Figure: 4x32 register file with W_data (32 bits), W_addr (2 bits), W_en, R_data (32 bits), R_addr (2 bits), R_en; the timing diagram itself was not recoverable]
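The register file's interface can be sketched in C as a behavioral model. This is a sketch with hypothetical names (RegFile4x32, rf_clock, rf_read); it assumes writes take effect at the clock edge and models a disabled read port by returning false, since the slides do not say what R_data carries when R_en is 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Behavioral model of a 4x32 register file: four 32-bit registers. */
typedef struct { uint32_t reg[4]; } RegFile4x32;

/* Write port, evaluated at a clock edge: stores W_data when W_en = 1. */
void rf_clock(RegFile4x32 *rf, unsigned w_addr, bool w_en, uint32_t w_data) {
    if (w_en) rf->reg[w_addr & 3u] = w_data;   /* 2-bit address */
}

/* Read port: when R_en = 1, copies the addressed register to *r_data and
 * returns true; otherwise leaves *r_data untouched and returns false. */
bool rf_read(const RegFile4x32 *rf, unsigned r_addr, bool r_en, uint32_t *r_data) {
    if (!r_en) return false;
    *r_data = rf->reg[r_addr & 3u];
    return true;
}
```

Because the read and write ports have separate addresses, one rf_clock call and one rf_read call per cycle correspond to writing one register and reading one register (possibly the same one) in that cycle.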
5.6
Memory Components
◼ Register-transfer level design instantiates datapath components to create datapath, controlled by a controller
❑ A few more components are often used outside the controller and datapath
◼ MxN memory
❑ M words, N bits wide each
◼ Several varieties of memory, which we now introduce
[Figure: M×N memory: M words, N-bits wide each]
Random Access Memory (RAM)
◼ RAM – Readable and writable memory
❑ “Random access memory”
◼ Strange name – Created several decades ago to contrast with sequentially-accessed storage like tape drives
❑ Logically same as register file – Memory with address inputs, data inputs/outputs, and control
◼ RAM usually just one port; register file usually two or more
❑ RAM vs. register file
◼ RAM typically larger than roughly 512 or 1024 words
◼ RAM typically stores bits using a bit storage approach that is more efficient than a flip flop
◼ RAM typically implemented on a chip in a square rather than rectangular shape – keeps longest wires (hence delay) short
[Figure: register file from Chpt. 4 (16×32 register file with W_data, W_addr, W_en, R_data, R_addr, R_en) and the RAM block symbol (1024 × 32 RAM with 32-bit data, 10-bit addr, rw, en)]
RAM Internal Structure
[Figure: internal structure of the 1024x32 RAM. Let A = log2M. An AxM decoder takes addr0 to addr(A-1) and enable e (from en) and drives outputs d0 to d(M-1) as the word enable lines. Each word is a row of bit storage blocks (aka “cells”) with inputs wdata(N-1) to wdata0 and outputs rdata(N-1) to rdata0; rw goes to all cells]
◼ Similar internal structure as register file


❑ Decoder enables appropriate word based on address
inputs
❑ rw controls whether cell is written or read
❑ Let’s see what’s inside each RAM cell
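The decoder-plus-cells organization can be sketched behaviorally. This is a minimal model under stated assumptions: the function names `decoder` and `ram_access` are hypothetical, each word is held as one Python integer rather than N separate cells, and rw=1 is taken to mean write, as in the timing slide later in this section.

```python
def decoder(addr, num_outputs, enable):
    """A x M decoder: one-hot list of M word-enable lines (all 0 when disabled)."""
    return [1 if (enable and i == addr) else 0 for i in range(num_outputs)]


def ram_access(cells, addr, rw, en, wdata=None):
    """cells is a list of M words; rw=1 writes wdata, rw=0 reads."""
    word_enable = decoder(addr, len(cells), en)
    rdata = None
    for i, enabled in enumerate(word_enable):
        if not enabled:
            continue              # only the enabled word reacts to rw
        if rw == 1:
            cells[i] = wdata
        else:
            rdata = cells[i]
    return rdata


cells = [0] * 1024
ram_access(cells, addr=9, rw=1, en=1, wdata=500)   # write 500 into word 9
value = ram_access(cells, addr=9, rw=0, en=1)      # read word 9 back
disabled = ram_access(cells, addr=9, rw=0, en=0)   # en=0: no word is enabled
```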
Static RAM (SRAM)
◼ “Static” RAM cell
❑ 6 transistors (recall inverter is 2 transistors)
❑ Writing this cell
◼ word enable input comes from decoder
◼ When 0, value d loops around inverters
❑ That loop is where a bit stays stored
◼ When 1, the data bit value enters the loop
❑ data is the bit to be stored in this cell
❑ data’ enters on other side
❑ Example shows a “1” being written into cell
[Figure: SRAM cell inside the 1024x32 RAM: an inverter loop holding d and d’, connected to the data and data’ lines when word enable is 1. Shown with word enable = 0 (value held) and with word enable = 1, data = 1, data’ = 0 (a “1” being written)]
Static RAM (SRAM)
◼ “Static” RAM cell
❑ Reading this cell
◼ Somewhat trickier
◼ When rw set to read, the RAM logic sets both data and data’ to 1
◼ The stored bit d will pull either the left line or the right line down slightly below 1
◼ “Sense amplifiers” detect which side is slightly pulled down
❑ The electrical description of SRAM is really beyond our scope – just general idea here, mainly to contrast with DRAM...
[Figure: SRAM cell being read: data and data’ both set to 1, word enable = 1, one side pulled slightly below 1 (<1), both lines going to sense amplifiers]
Dynamic RAM (DRAM)
◼ “Dynamic” RAM cell
❑ 1 transistor (rather than 6)
❑ Relies on large capacitor to store bit
◼ Write: Transistor conducts, data voltage level gets stored on top plate of capacitor
◼ Read: Just look at value of d
◼ Problem: Capacitor discharges over time
❑ Must “refresh” regularly, by reading d and then writing it right back
[Figure: (a) DRAM cell: data line, transistor controlled by word enable, and a slowly discharging capacitor holding d; (b) waveforms of data, enable, and d, showing d discharging after the write]
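The need for refresh can be illustrated with a toy charge model. Everything numeric here is hypothetical: `LEAK` and `THRESHOLD` are made-up values chosen only to show the effect, and real DRAM timing and electrical behavior are far more involved.

```python
LEAK = 0.9        # hypothetical fraction of charge that survives one time step
THRESHOLD = 0.5   # hypothetical level above which the cell reads as 1


class DramCell:
    def __init__(self):
        self.charge = 0.0

    def write(self, bit):
        self.charge = 1.0 if bit else 0.0   # data level stored on the capacitor

    def read(self):
        return 1 if self.charge > THRESHOLD else 0   # "just look at value of d"

    def tick(self):
        self.charge *= LEAK                 # capacitor discharges over time

    def refresh(self):
        self.write(self.read())             # read d, then write it right back


def final_bit(steps, refresh_every=None):
    """Store a 1, let time pass, and report what the cell reads at the end."""
    cell = DramCell()
    cell.write(1)
    for t in range(1, steps + 1):
        cell.tick()
        if refresh_every and t % refresh_every == 0:
            cell.refresh()
    return cell.read()


lost = final_bit(steps=20)                    # never refreshed
kept = final_bit(steps=20, refresh_every=5)   # refreshed every 5 steps
```

With these made-up constants the unrefreshed cell decays below the threshold, while the periodically refreshed cell still reads 1.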
Comparing Memory Types
◼ Register file
❑ Fastest
❑ But biggest size
◼ SRAM
❑ Fast
❑ More compact than register file
◼ DRAM
❑ Slowest
◼ And refreshing takes time
❑ But very compact
◼ Use register file for small items, SRAM for large items, and DRAM for huge items
❑ Note: DRAM’s big capacitor requires a special chip design process, so DRAM is often a separate chip
[Figure: MxN memory implemented as a register file, SRAM, or DRAM; size comparison for same number of bits (not to scale)]
Reading and Writing a RAM
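[Figure: (a) timing diagram over clock cycles 1 to 3 with addr = 9, 13, 9 and data = 500, 999, then Z followed by 500; rw = 1 means write; after the two writes, RAM[9] now equals 500 and RAM[13] now equals 999. (b) close-up of one clock edge showing setup time and hold time for the inputs, and the access time before read data becomes valid]
◼ Writing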
❑ Put address on addr lines, data on data lines, set rw=1, en=1
◼ Reading
❑ Set addr and en lines, but put nothing (Z) on data lines, set rw=0
❑ Data will appear on data lines
◼ Don’t forget to obey setup and hold times
❑ In short – keep inputs stable before and after a clock edge

RAM Example: Digital Sound Recorder
◼ Behavior
❑ Record: Digitize sound, store as series of 4096 12-bit digital values in RAM
◼ We’ll use a 4096x16 RAM (12-bit wide RAM not common)
❑ Play back later
❑ Common behavior in telephone answering machine, toys, voice recorders
◼ To record, processor should read a-to-d, store read values into successive RAM words
❑ To play, processor should read successive RAM words and enable d-to-a
[Figure: microphone, analog-to-digital converter (ad_ld, ad_buf), processor (Ra, 12 bits; Rrw; Ren), 4096×16 RAM (data, addr, rw, en), digital-to-analog converter (da_ld), and speaker, connected by a 16-bit data bus]
RAM Example: Digital Sound Recorder
◼ RTL design of processor
❑ Create high-level state machine
❑ Begin with the record behavior
❑ Keep local register a
◼ Stores current address, ranges from 0 to 4095 (thus need 12 bits)
❑ Create state machine that counts from 0 to 4095 using a
◼ For each a
❑ Read analog-to-digital conv.
▪ ad_ld=1, ad_buf=1
❑ Write to RAM at address a
▪ Ra=a, Rrw=1, Ren=1
[Figure: Record behavior, high-level state machine. Local register: a (12 bits). State S: a=0. State T: ad_ld=1, ad_buf=1, Ra=a, Rrw=1, Ren=1. State U: a=a+1. Transitions labeled a<4095 and a=4095]
RAM Example: Digital Sound Recorder
❑ Now create play behavior
❑ Use local register a again, create state machine that counts from 0 to 4095 again
◼ For each a
❑ Read RAM
❑ Write to digital-to-analog conv.
◼ Note: Must write d-to-a one cycle after reading RAM, when the read data is available on the data bus
❑ The record and play state machines would be parts of a larger state machine controlled by signals that determine when to record or play
[Figure: Play behavior, high-level state machine. Local register: a (12 bits). State V: a=0. State W: ad_buf=0, Ra=a, Rrw=0, Ren=1. State X: da_ld=1, a=a+1. Transitions labeled a<4095 and a=4095]
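The record and play loops of the sound recorder can be sketched in software. This is a behavioral illustration, not the state machine itself: the function names `record` and `play`, and the callbacks `read_adc` and `write_dac` that stand in for the converters, are hypothetical, and cycle-level timing is collapsed into ordinary loops.

```python
RAM_WORDS = 4096


def record(ram, read_adc):
    """Record: for a = 0..4095, sample the a-to-d converter and write RAM[a]."""
    for a in range(RAM_WORDS):
        sample = read_adc()          # ad_ld=1, ad_buf=1
        ram[a] = sample & 0xFFFF     # Ra=a, Rrw=1, Ren=1 (16-bit words)


def play(ram, write_dac):
    """Play: for a = 0..4095, read RAM[a], then load it into the d-to-a converter."""
    for a in range(RAM_WORDS):
        data_bus = ram[a]            # Ra=a, Rrw=0, Ren=1
        write_dac(data_bus)          # da_ld=1, one step after the RAM read


ram = [0] * RAM_WORDS
samples = iter(range(RAM_WORDS))     # stand-in for digitized sound
record(ram, lambda: next(samples))
played = []
play(ram, played.append)             # collect what would reach the speaker
```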
Read-Only Memory – ROM
◼ Memory that can only be read from, not written to
❑ Data lines are output only
❑ No need for rw input
◼ Advantages over RAM
❑ Compact: May be smaller
❑ Nonvolatile: Saves bits even if power supply is turned off
❑ Speed: May be faster (especially than DRAM)
❑ Low power: Doesn’t need power supply to save bits, so can extend battery life
◼ Choose ROM over RAM if stored data won’t change (or won’t change often)
❑ For example, a table of Celsius to Fahrenheit conversions in a digital thermometer
[Figure: RAM block symbol (1024×32 RAM with 32-bit data, 10-bit addr, rw, en) next to ROM block symbol (1024x32 ROM with 32-bit data, 10-bit addr, en)]
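The Celsius-to-Fahrenheit table is a natural fit for a ROM because its contents never change. A minimal sketch, assuming the address is an integer Celsius temperature from 0 to 1023 and each word holds the rounded Fahrenheit value (the slides do not specify the table layout; `ROM` and `rom_read` are hypothetical names):

```python
# Contents are fixed when the ROM is "programmed"; a tuple keeps them read-only.
ROM = tuple(round(c * 9 / 5 + 32) for c in range(1024))


def rom_read(addr, en=1):
    """Return the stored word when enabled; there is no rw input and no write path."""
    return ROM[addr] if en else None


freezing = rom_read(0)      # 0 C
boiling = rom_read(100)     # 100 C
body = rom_read(37)         # 37 C is 98.6 F, stored rounded
```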
Read-Only Memory – ROM
[Figure: internal structure of the 1024x32 ROM (ROM block symbol: 32-bit data, 10-bit addr, en). Let A = log2M. An AxM decoder takes addr0 to addr(A-1) and enable e (from en) and drives outputs d0 to d(M-1) as the word enable lines. Each word is a row of bit storage blocks (aka “cells”) driving rdata(N-1) to rdata0]
◼ Internal logical structure similar to RAM, without the data input lines
ROM Types
◼ If a ROM can only be read, how
are the stored bits stored in the
first place?
❑ Storing bits in a ROM known as
programming
❑ Several methods
◼ Mask-programmed ROM
❑ Bits are hardwired as 0s or 1s during chip manufacturing
◼ 2-bit word on right stores “10”
◼ word enable (from decoder) simply passes the hardwired value through transistor
❑ Notice how compact, and fast, this memory would be
[Figure: two mask-programmed cells sharing one word enable line; the left cell is hardwired to 1 and the right cell to 0, each driving its own data line]
ROM Types
◼ Fuse-Based Programmable ROM
❑ Each cell has a fuse
❑ A special device, known as a programmer, blows certain fuses (using higher-than-normal voltage)
◼ Those cells will be read as 0s (involving some special electronics)
◼ Cells with unblown fuses will be read as 1s
◼ 2-bit word on right stores “10”
❑ Also known as One-Time Programmable (OTP) ROM
[Figure: two fuse-based cells sharing one word enable line; the left cell has an intact fuse and the right cell has a blown fuse]
ROM Types
◼ Erasable Programmable ROM (EPROM)
❑ Uses “floating-gate transistor” in each cell
❑ Special programmer device uses higher-than-normal voltage to cause electrons to tunnel into the gate
◼ Electrons become trapped in the gate
◼ Only done for cells that should store 0
◼ Other cells (without electrons trapped in gate) will be 1
❑ 2-bit word on right stores “10”
◼ Details beyond our scope – just general idea is necessary here
❑ To erase, shine ultraviolet light onto chip
◼ Gives trapped electrons energy to escape
◼ Requires chip package to have window
[Figure: two floating-gate transistor cells sharing one word enable line; the left cell reads 1 and the right cell, with trapped electrons in its floating gate, reads 0]
ROM Types
◼ Electrically-Erasable Programmable ROM (EEPROM)
❑ Similar to EPROM
◼ Uses floating-gate transistor, electronic programming to trap electrons in certain cells
❑ But erasing done electronically, not using UV light
❑ Erasing done one word at a time
◼ Flash memory
❑ Like EEPROM, but all words (or large blocks of words) can be erased simultaneously
❑ Became common relatively recently (late 1990s)
◼ Both types are in-system programmable
❑ Can be programmed with new stored bits while in the system in which the ROM operates
◼ Requires bi-directional data lines, and write control input
◼ Also need busy output to indicate that erasing is in progress – erasing takes some time
[Figure: 1024x32 EEPROM block symbol with 32-bit data, 10-bit addr, en, write, and busy]
ROM Example: Digital Telephone Answering Machine
Using a Flash Memory
◼ Want to record the outgoing announcement
❑ When rec=1, record digitized sound in locations 0 to 4095
❑ When play=1, play those stored sounds to digital-to-analog converter
◼ What type of memory?
❑ Should store without power supply – ROM, not RAM
❑ Should be in-system programmable – EEPROM or Flash, not EPROM, OTP ROM, or mask-programmed ROM
❑ Will always erase entire memory when reprogramming – Flash better than EEPROM
[Figure: microphone, analog-to-digital converter (ad_ld, ad_buf), processor (Ra, 12 bits; Rrw; Ren; er; bu; inputs rec and play), 4096x16 Flash with busy output holding “We’re not home.”, digital-to-analog converter (da_ld), and speaker, connected by a 16-bit data bus]
ROM Example: Digital Telephone Answering
Machine Using a Flash Memory
◼ High-level state machine
❑ Once rec=1, begin erasing flash by setting er=1
❑ Wait for flash to finish erasing by waiting for bu=0
❑ Execute loop that sets local register a from 0 to 4095, reading analog-to-digital converter and writing to flash for each a
[Figure: high-level state machine with states S, T, U, V. Local register: a (13 bits). Actions shown: a=0, er=1, er=0, ad_ld=1, ad_buf=1, Ra=a, Rrw=1, Ren=1, a=a+1. Transitions labeled rec, bu, bu’, a<4096, and a=4096]
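The erase, wait-for-busy, then write sequence can be sketched as follows. This is an assumption-heavy illustration: the `Flash` class, its `erase_cycles` parameter, and the convention that erased words read as all 1s are hypothetical modeling choices, not details given in the slides.

```python
class Flash:
    """Toy flash model: erase takes several cycles, during which bu (busy) is 1."""

    def __init__(self, num_words=4096, erase_cycles=3):   # erase_cycles is made up
        self.mem = [0] * num_words
        self.erase_cycles = erase_cycles
        self.remaining = 0

    @property
    def bu(self):
        return 1 if self.remaining > 0 else 0

    def start_erase(self):                # er=1
        self.remaining = self.erase_cycles

    def clock(self):
        if self.remaining > 0:
            self.remaining -= 1
            if self.remaining == 0:
                # Assumption: an erased word reads as all 1s.
                self.mem = [0xFFFF] * len(self.mem)

    def write(self, addr, data):          # Ra=addr, Rrw=1, Ren=1
        if self.bu:
            raise RuntimeError("flash is busy erasing")
        self.mem[addr] = data & 0xFFFF


def record_announcement(flash, read_adc):
    """Follow the slide's steps: erase, wait for bu=0, then write locations 0..4095."""
    flash.start_erase()
    waited = 0
    while flash.bu:                       # stay in the wait state while bu=1
        flash.clock()
        waited += 1
    for a in range(len(flash.mem)):
        flash.write(a, read_adc())        # ad_ld=1, ad_buf=1, then write at address a
    return waited


flash = Flash()
samples = iter(range(4096))               # stand-in for digitized sound
waited = record_announcement(flash, lambda: next(samples))
```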
Blurring of Distinction Between ROM and RAM
◼ We said that
❑ RAM is readable and writable
❑ ROM is read-only
◼ But some ROMs act almost like RAMs
❑ EEPROM and Flash are in-system programmable
◼ Essentially means that writes are slow
❑ Also, number of writes may be limited (perhaps a few million times)
◼ And, some RAMs act almost like ROMs
❑ Non-volatile RAMs: Can save their data without the power supply
◼ One type: Built-in battery, may work for up to 10 years
◼ Another type: Includes ROM backup for RAM – controller writes RAM contents to ROM before turning off
◼ New memory technologies evolving that merge RAM and ROM benefits
❑ e.g., MRAM
◼ Bottom line
❑ Lot of choices available to designer, must find best fit with design goals
[Figure: spectrum of memories from ROM to RAM, with Flash, EEPROM, and NVRAM in between]
Hierarchy and Abstraction
◼ Abstraction
❑ Hierarchy often involves not just grouping items into a new item, but also associating higher-level behavior with the new item, known as abstraction
◼ e.g., an 8-bit adder has an understandable high-level behavior – it adds two 8-bit binary numbers
❑ Frees designer from having to remember, or even from having to understand, the lower-level details
[Figure: 8-bit adder block with inputs a7..a0, b7..b0, and ci, and outputs co and s7..s0]
Hierarchy and Composing Larger
Components from Smaller Versions
◼ A common task is to compose smaller components into a larger one
❑ Gates: Suppose you have plenty of 3-input AND gates, but need a 9-input AND gate
◼ Can simply compose the 9-input gate from several 3-input gates
❑ Muxes: Suppose you have 4x1 and 2x1 muxes, but need an 8x1 mux
◼ s2 selects either top or bottom 4x1
◼ s1s0 select particular 4x1 input
◼ Implements 8x1 mux – 8 data inputs, 3 selects, one output
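Both compositions can be checked with a short functional sketch. The function names are hypothetical, and signals are modeled as Python integers 0 and 1 rather than wires.

```python
def and3(a, b, c):
    return a & b & c


def and9(bits):
    """9-input AND built from four 3-input AND gates (three feeding a fourth)."""
    return and3(and3(*bits[0:3]), and3(*bits[3:6]), and3(*bits[6:9]))


def mux2x1(i0, i1, s0):
    return i1 if s0 else i0


def mux4x1(i0, i1, i2, i3, s1, s0):
    return (i0, i1, i2, i3)[(s1 << 1) | s0]


def mux8x1(inputs, s2, s1, s0):
    """8x1 mux: s1s0 pick an input within each 4x1 mux, s2 picks top or bottom."""
    top = mux4x1(*inputs[0:4], s1, s0)
    bottom = mux4x1(*inputs[4:8], s1, s0)
    return mux2x1(top, bottom, s2)


data = [10, 11, 12, 13, 14, 15, 16, 17]
# Sweep all eight select combinations; each should return the matching input.
selected = [mux8x1(data, (k >> 2) & 1, (k >> 1) & 1, k & 1) for k in range(8)]
```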
Hierarchy and Composing Larger Components
from Smaller Versions
◼ Composing memory very common
◼ Making memory words wider
❑ Easy – just place memories side-by-side until desired width obtained
❑ Share address/control lines, concatenate data lines
❑ Example: Compose 1024x8 ROMs into 1024x32 ROM
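Widening by placing memories side by side can be sketched as follows. The helpers `make_rom` and `widen` are hypothetical, the ROM contents are arbitrary test data, and the choice that the first ROM supplies the most significant byte is an assumption.

```python
def make_rom(contents):
    """Return a read function for a ROM holding the given 8-bit words."""
    def read(addr, en=1):
        return contents[addr] if en else 0
    return read


def widen(roms):
    """Share addr/en among all ROMs and concatenate their 8-bit data outputs."""
    def read(addr, en=1):
        word = 0
        for rom in roms:                  # assumption: roms[0] is the top byte
            word = (word << 8) | rom(addr, en)
        return word
    return read


# Four 1024x8 ROMs with arbitrary contents, composed into one 1024x32 ROM.
byte_roms = [make_rom([(addr * 4 + lane) & 0xFF for addr in range(1024)])
             for lane in range(4)]
rom32 = widen(byte_roms)
word1 = rom32(1)                          # bytes 4, 5, 6, 7 side by side
```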
[Figure: four 1024x8 ROMs side by side sharing the 10-bit addr and en inputs; their four 8-bit data outputs are concatenated into data(31..0), forming a 1024x32 ROM]
Hierarchy and Composing Larger Components from Smaller Versions
◼ Creating memory with more words
❑ Put memories on top of one another until the number of desired words is achieved
❑ Use decoder to select among the memories
◼ Can use highest order address input(s) as decoder input
◼ Although actually, any address line could be used
❑ Example: Compose 1024x8 memories into 2048x8 memory
◼ a10 just chooses which memory to access
a10 a9 a8 ... a0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1 0
...
0 1 1 1 1 1 1 1 1 1 0
0 1 1 1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 0 1 0
...
1 1 1 1 1 1 1 1 1 1 0
1 1 1 1 1 1 1 1 1 1 1
◼ To create memory with more words and wider words, can first compose to enough words, then widen.
[Figure: 2048x8 ROM built from two 1024x8 ROMs. Address bits a9..a0 go to the addr input of both ROMs; a10 drives a 1x2 decoder (enabled by en) whose outputs d0 and d1 drive the two en inputs; the 8-bit data outputs are shared]
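Stacking two memories under a 1x2 decoder can be sketched like this. The helper names `make_rom` and `stack` are hypothetical and the contents are arbitrary test data; the sketch uses a10 as the decoder input, as in the example above.

```python
def make_rom(contents):
    """Return a read function for a 1024-word ROM with the given contents."""
    def read(addr):
        return contents[addr]
    return read


def stack(rom_low, rom_high, addr_bits=10):
    """Compose two 2**addr_bits-word memories into one with twice the words."""
    def read(addr, en=1):
        select = (addr >> addr_bits) & 1            # a10
        low_addr = addr & ((1 << addr_bits) - 1)    # a9..a0 go to both memories
        d0 = bool(en) and select == 0               # 1x2 decoder outputs
        d1 = bool(en) and select == 1
        if d0:
            return rom_low(low_addr)
        if d1:
            return rom_high(low_addr)
        return None                                 # en=0: neither memory enabled
    return read


low = [(3 * i) & 0xFF for i in range(1024)]         # arbitrary contents
high = [(7 * i + 1) & 0xFF for i in range(1024)]
rom2048 = stack(make_rom(low), make_rom(high))
from_low = rom2048(5)            # a10 = 0 selects the first memory
from_high = rom2048(1024 + 5)    # a10 = 1 selects the second memory
```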
Chapter Summary

❑ Modern digital design involves creating processor-level components
❑ Four-step RTL method can be used
◼ 1. High-level state machine 2. Create datapath 3. Connect datapath to controller 4. Derive controller FSM
❑ Several examples
◼ Control dominated, data dominated, and mix
❑ Determining fastest clock frequency
◼ By finding critical path
❑ Behavioral-level design – C to gates
◼ By using method to convert C (subset) to high-level state machine
❑ Additional RTL components
◼ Memory: RAM, ROM
◼ Queues
❑ Hierarchy: A key concept used throughout Chapters 2-5