Lecture#3 Chapter2 CustomSPP Hardware


Embedded Systems Design: A Unified Hardware/Software Introduction

Chapter 2: Custom single-purpose processors

Outline

• Introduction
• Combinational logic
• Sequential logic
• Custom single-purpose processor design
• RT-level custom single-purpose processor design

Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
Introduction
• Processor
– Digital circuit that performs computation tasks
– Controller and datapath
– General-purpose: variety of computation tasks
– Single-purpose: one particular computation task
– Custom single-purpose: non-standard task
• A custom single-purpose processor may be
– Fast, small, low power
– But, high NRE, longer time-to-market, less flexible

[Figure: Digital camera chip. Blocks: lens, CCD, CCD preprocessor, A2D, Pixel coprocessor, D2A, JPEG codec, Microcontroller, Multiplier/Accum, DMA controller, Display ctrl, Memory controller, ISA bus interface, UART, LCD ctrl]

CMOS transistor on silicon

• Transistor
– The basic electrical component in digital systems
– Acts as an on/off switch
– Voltage at “gate” controls whether current flows from source to drain
– Don’t confuse this “gate” with a logic gate

[Figure: transistor symbol with source, gate, and drain; conducts if gate=1. Cross-section from IC package to IC showing gate, oxide, source, channel, and drain on a silicon substrate]

Transistor size

[Figure, credit C. Keast: a 180 nm (0.18 µm) feature shown against the cross-section of a human hair (~75 µm). Caption: ~40,000 (65-nm node) transistors could fit on cross-section]
CMOS transistor implementations

• Complementary Metal Oxide Semiconductor
• We refer to logic levels
– Typically 0 is 0V, 1 is 5V
• Two basic CMOS types
– nMOS conducts if gate=1
– pMOS conducts if gate=0
– Hence “complementary”
• Basic gates
– Inverter, NAND, NOR

[Figure: nMOS (conducts if gate=1) and pMOS (conducts if gate=0) transistor symbols; CMOS circuits for the inverter, F = x'; the NAND gate, F = (xy)'; and the NOR gate, F = (x+y)']
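
The switch behaviour above can be sketched in C. This is a minimal switch-level model, not a circuit simulator. It assumes the standard CMOS arrangement (pMOS transistors form the pull-up network to 1, nMOS transistors form the pull-down network to 0); the type and function names are illustrative and do not come from the slides.

```c
#include <stdbool.h>

/* Switch-level transistor models: true means the transistor conducts. */
static bool nmos(bool gate) { return gate; }   /* conducts if gate=1 */
static bool pmos(bool gate) { return !gate; }  /* conducts if gate=0 */

/* Assumed structure: a pMOS pull-up network connects F to 1,
   an nMOS pull-down network connects F to 0. */
typedef struct { bool pull_up, pull_down; } cmos_gate;

cmos_gate cmos_inverter(bool x) {
    cmos_gate g = { pmos(x), nmos(x) };
    return g;
}

/* NAND: pMOS in parallel (either conducts), nMOS in series (both conduct). */
cmos_gate cmos_nand(bool x, bool y) {
    cmos_gate g = { pmos(x) || pmos(y), nmos(x) && nmos(y) };
    return g;
}

/* NOR: pMOS in series, nMOS in parallel. */
cmos_gate cmos_nor(bool x, bool y) {
    cmos_gate g = { pmos(x) && pmos(y), nmos(x) || nmos(y) };
    return g;
}

/* F is 1 exactly when the pull-up network conducts. */
bool cmos_output(cmos_gate g) { return g.pull_up; }
```

For every input combination exactly one of the two networks conducts, which is why the output is always driven and never shorted.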

Basic logic gates

One-input gates:

Gate      Function   x=0  x=1
Driver    F = x       0    1
Inverter  F = x'      1    0

Two-input gates (value of F for each input pair x y):

Gate   Function       00  01  10  11
AND    F = x y         0   0   0   1
OR     F = x + y       0   1   1   1
XOR    F = x ⊕ y       0   1   1   0
NAND   F = (x y)'      1   1   1   0
NOR    F = (x + y)'    1   0   0   0
XNOR   F = x ⊙ y       1   0   0   1

Combinational Circuit design using CMOS

Gate Array Design Style
• CMOS is the most popular VLSI technology (vs. BiCMOS, nMOS).
• CMOS uses both n-channel and p-channel transistors.
• Advantages: lower power dissipation, higher regularity, more reliable performance, higher noise margin, larger fanout, etc.
• Each type of transistor must sit in a material of the complementary type (the reverse-biased diodes prevent unwanted current flow).

A CMOS Inverter

A CMOS NAND Gate

A CMOS NOR Gate

Basic CMOS Logic Library

Construction of Compound Gates

Construction of Compound Gates

Combinational logic design
A) Problem description
y is 1 if a is 1, or b and c are 1. z is 1 if b or c is 1, but not both, or if all are 1.

B) Truth table
Inputs   Outputs
a b c    y z
0 0 0    0 0
0 0 1    0 1
0 1 0    0 1
0 1 1    1 0
1 0 0    1 0
1 0 1    1 1
1 1 0    1 1
1 1 1    1 1

C) Output equations
y = a'bc + ab'c' + ab'c + abc' + abc
z = a'b'c + a'bc' + ab'c + abc' + abc

D) Minimized output equations
K-map for y (columns are bc):
a    00 01 11 10
0     0  0  1  0
1     1  1  1  1
y = a + bc

K-map for z (columns are bc):
a    00 01 11 10
0     0  1  0  1
1     0  1  1  1
z = ab + b'c + bc'

E) Logic Gates
[Figure: gate-level circuit with inputs a, b, c and outputs y and z, built from the minimized equations]
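
As a quick check on steps C and D, the canonical and minimized equations can be compared over all eight input rows. The following is a small sketch in C; the function names are illustrative and not part of the slides.

```c
#include <stdbool.h>

/* Canonical sum-of-minterms forms from step C. */
static bool y_canonical(bool a, bool b, bool c) {
    return (!a && b && c) || (a && !b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}
static bool z_canonical(bool a, bool b, bool c) {
    return (!a && !b && c) || (!a && b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

/* Minimized forms from step D. */
static bool y_min(bool a, bool b, bool c) { return a || (b && c); }
static bool z_min(bool a, bool b, bool c) {
    return (a && b) || (!b && c) || (b && !c);
}

/* Returns how many of the 8 input rows give identical y and z
   in the canonical and minimized forms. */
int count_matching_rows(void) {
    int matches = 0;
    for (int row = 0; row < 8; row++) {
        bool a = (row >> 2) & 1, b = (row >> 1) & 1, c = row & 1;
        if (y_canonical(a, b, c) == y_min(a, b, c) &&
            z_canonical(a, b, c) == z_min(a, b, c))
            matches++;
    }
    return matches;
}
```

A result of 8 from `count_matching_rows()` means the minimization preserved both functions on every row of the truth table.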

Combinational components

• n-bit, m x 1 Multiplexor
– Inputs I(m-1) … I1 I0 (n bits each), select lines S0 … S(log m), output O (n bits)
– O = I0 if S=0..00, I1 if S=0..01, …, I(m-1) if S=1..11
• log n x n Decoder
– Inputs I(log n -1) … I0, outputs O(n-1) … O1 O0
– O0 = 1 if I=0..00, O1 = 1 if I=0..01, …, O(n-1) = 1 if I=1..11
– With enable input e: all O’s are 0 if e=0
• n-bit Adder
– Inputs A, B (n bits each), outputs carry and sum
– sum = A+B (first n bits), carry = (n+1)’th bit of A+B
– With carry-in input Ci: sum = A + B + Ci
• n-bit Comparator
– Inputs A, B (n bits each), outputs less, equal, greater
– less = 1 if A<B, equal = 1 if A=B, greater = 1 if A>B
• n bit, m function ALU
– Inputs A, B (n bits each), select lines S0 … S(log m), output O (n bits)
– O = A op B, op determined by S
– May have status outputs carry, zero, etc.
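
The behaviour of these components can be mirrored in software. The sketch below models the multiplexor, decoder, adder, and comparator as C functions; the names, the 32-bit word type, and the 1..31 width limit on the adder are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* m x 1 multiplexor: O = I[S]. */
uint32_t mux(const uint32_t *inputs, unsigned m, unsigned s) {
    assert(s < m);
    return inputs[s];
}

/* Decoder with enable: one-hot output, all zeros if e=0. */
uint32_t decode(unsigned i, unsigned e) {
    assert(i < 32);
    return e ? (1u << i) : 0u;
}

/* n-bit adder with carry-in: sum is the low n bits,
   carry is the (n+1)'th bit of A + B + Ci. */
typedef struct { uint32_t sum; unsigned carry; } adder_out;

adder_out add_n(unsigned n, uint32_t a, uint32_t b, unsigned ci) {
    assert(n >= 1 && n <= 31);          /* limit chosen for this sketch */
    uint32_t mask = (1u << n) - 1u;
    uint32_t full = (a & mask) + (b & mask) + (ci & 1u);
    adder_out out = { full & mask, (full >> n) & 1u };
    return out;
}

/* Comparator: exactly one of the three outputs is 1. */
typedef struct { unsigned less, equal, greater; } cmp_out;

cmp_out compare(uint32_t a, uint32_t b) {
    cmp_out out = { a < b, a == b, a > b };
    return out;
}
```

For example, `add_n(4, 9, 8, 0)` adds 9 and 8 in 4 bits: the true sum 17 does not fit, so `sum` is 1 and `carry` is 1.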

Sequential components

• n-bit Register
– Inputs I (n bits), load, clear; output Q (n bits)
– Q = 0 if clear=1, I if load=1 and clock=1, Q(previous) otherwise.
• n-bit Shift register
– Inputs I, shift; output Q
– Q = lsb
– Content shifted
– I stored in msb
• n-bit Counter
– Output Q (n bits)
– Q = 0 if clear=1, Q(prev)+1 if count=1 and clock=1.
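
A rough software model of these three components is given below, with one function call standing for one clock edge. Several details are assumptions made for the sketch and are not stated on the slide: clear is sampled at the clock edge and takes priority, the counter wraps around at 2^n, and the shift register shifts right so that the lsb is the output.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low n bits set (n between 1 and 32). */
static uint32_t mask_n(unsigned n) {
    assert(n >= 1 && n <= 32);
    return n == 32 ? 0xFFFFFFFFu : ((1u << n) - 1u);
}

/* Register: clear wins, then load, otherwise hold. */
uint32_t register_tick(uint32_t q, uint32_t i, unsigned clear, unsigned load) {
    if (clear) return 0;
    if (load)  return i;
    return q;
}

/* Shift register: content moves one place toward the lsb,
   the serial input I is stored in the msb. */
uint32_t shift_tick(uint32_t q, unsigned i, unsigned shift, unsigned n) {
    uint32_t mask = mask_n(n);
    if (!shift) return q & mask;
    return ((q & mask) >> 1) | ((uint32_t)(i & 1u) << (n - 1));
}

/* Counter: clear wins, then count up modulo 2^n (wrap is an assumption). */
uint32_t counter_tick(uint32_t q, unsigned clear, unsigned count, unsigned n) {
    uint32_t mask = mask_n(n);
    if (clear) return 0;
    if (count) return (q + 1u) & mask;
    return q & mask;
}
```

Each function takes the previous content `q` and returns the new content, so a sequence of clock edges is a sequence of calls that feed the result back in.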

Sequential logic design

A) Problem Description
You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles.

B) State Diagram
[Figure: four states 0, 1, 2, 3. With a=1 the machine advances 0 → 1 → 2 → 3 → 0; with a=0 it stays in the current state. Output x=0 in states 0, 1, 2 and x=1 in state 3]

C) Implementation Model
[Figure: combinational logic with inputs a, Q1, Q0 and outputs x, I1, I0; the state register holds Q1 Q0 and is loaded from I1 I0]

D) State Table (Moore-type)
Inputs      Outputs
Q1 Q0 a     I1 I0 x
0  0  0     0  0  0
0  0  1     0  1  0
0  1  0     0  1  0
0  1  1     1  0  0
1  0  0     1  0  0
1  0  1     1  1  0
1  1  0     1  1  1
1  1  1     0  0  1

• Given this implementation model
– Sequential logic design quickly reduces to combinational logic design

Go in the states 00, 01, 10, 11, 00, …

Sequential logic design (cont.)
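E) Minimized Output Equations

K-map for I1 (columns are Q1Q0):
a    00 01 11 10
0     0  0  1  1
1     0  1  0  1
I1 = Q1'Q0a + Q1a' + Q1Q0'

K-map for I0 (columns are Q1Q0):
a    00 01 11 10
0     0  1  1  0
1     1  0  0  1
I0 = Q0a' + Q0'a

K-map for x (columns are Q1Q0):
a    00 01 11 10
0     0  0  1  0
1     0  0  1  0
x = Q1Q0

F) Combinational Logic
[Figure: gate-level circuit with inputs a, Q1, Q0 producing x, I1, I0]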
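
The minimized equations can be exercised with a short simulation. The sketch below applies the equations for I1, I0, and x once per clock cycle; the struct and function names are illustrative, and the helper that holds a at 1 is an assumption made to show the divide-by-four behaviour.

```c
#include <stdbool.h>

typedef struct { bool q1, q0; } divider_state;

/* Moore output: x depends on the state only (x = Q1Q0). */
bool divider_output(divider_state s) { return s.q1 && s.q0; }

/* Next state from the minimized equations for I1 and I0. */
divider_state divider_next(divider_state s, bool a) {
    divider_state n;
    n.q1 = (!s.q1 && s.q0 && a) || (s.q1 && !a) || (s.q1 && !s.q0);
    n.q0 = (s.q0 && !a) || (!s.q0 && a);
    return n;
}

/* Counts the cycles with x=1 over `cycles` clock cycles,
   starting in state 00 with a held at 1. */
int count_pulses(int cycles) {
    divider_state s = { false, false };
    int pulses = 0;
    for (int t = 0; t < cycles; t++) {
        if (divider_output(s)) pulses++;
        s = divider_next(s, true);
    }
    return pulses;
}
```

With a held at 1 the state sequence is 00, 01, 10, 11, 00, …, so x is 1 in one cycle out of every four.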

Custom single-purpose processor basic model

[Figure: controller and datapath. The controller has external control inputs, datapath control inputs, external control outputs, and datapath control outputs. The datapath has external data inputs and external data outputs.]

[Figure: a view inside the controller and datapath. Controller: next-state and control logic, state register. Datapath: registers, functional units.]

Example: greatest common divisor
• First create algorithm
• Convert algorithm to “complex” state machine
– Known as FSMD: finite-state machine with datapath
– Can use templates to perform such conversion

(a) black-box view
[Figure: GCD block with inputs go_i, x_i, y_i and output d_o]

(b) desired functionality
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }

(c) state diagram
1:    go to 2: on 1; go to 1-J: on !1
2:    go to 2-J: on !go_i; go to 3: on !(!go_i)
2-J:  go to 2:
3:    x = x_i; go to 4:
4:    y = y_i; go to 5:
5:    go to 6: on x!=y; go to 9: on !(x!=y)
6:    go to 7: on x<y; go to 8: on !(x<y)
7:    y = y - x; go to 6-J:
8:    x = x - y; go to 6-J:
6-J:  go to 5-J:
5-J:  go to 5:
9:    d_o = x; go to 1-J:
1-J:  go to 1:
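
For reference, the loop in lines 5 to 8 of the program can be run as an ordinary C function. This is a minimal sketch: the function name is illustrative, the port handshake (go_i, d_o) is left out, and the positive-input precondition is an assumption added here because the subtraction loop would not terminate if either input were 0.

```c
#include <assert.h>

/* Subtraction-based GCD, following lines 5-8 of the slide's program. */
unsigned gcd_subtract(unsigned x, unsigned y) {
    assert(x > 0 && y > 0);   /* assumed precondition, see lead-in */
    while (x != y) {
        if (x < y)
            y = y - x;
        else
            x = x - y;
    }
    return x;                 /* the value the program writes to d_o */
}
```

Calling `gcd_subtract(42, 8)` walks through (34, 8), (26, 8), … down to (2, 2) and returns 2.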

State diagram templates
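Assignment statement
a = b
next statement
[State diagram: a state that performs a = b, followed by the state for the next statement]

Loop statement
while (cond) {
  loop-body-statements
}
next statement
[State diagram: condition state C: goes to the loop-body statements on cond and to the next statement on !cond; the loop body ends in join state J:, which returns to C:]

Branch statement
if (c1)
  c1 stmts
else if c2
  c2 stmts
else
  other stmts
next statement
[State diagram: condition state C: goes to c1 stmts on c1, to c2 stmts on !c1*c2, and to others on !c1*!c2; all three branches lead to join state J:, followed by the next statement]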

Creating the datapath
• Create a register for any declared variable
• Create a functional unit for each arithmetic operation
• Connect the ports, registers and functional units
– Based on reads and writes
– Use multiplexors for multiple sources
• Create unique identifier
– for each datapath component control input and output

[Figure: the GCD state diagram next to the resulting datapath. Inputs x_i and y_i feed two n-bit 2x1 multiplexors (selects x_sel, y_sel) that drive registers 0: x and 0: y (loads x_ld, y_ld). The registers feed a != comparator (5: x!=y, output x_neq_y), a < comparator (6: x<y, output x_lt_y), and two subtractors (8: x-y, 7: y-x) whose results return to the multiplexors. Register 9: d (load d_ld) drives output d_o.]

Creating the controller’s FSM
• Same structure as FSMD
• Replace complex actions/conditions with datapath configurations

Controller FSM (input go_i from outside; inputs x_neq_y, x_lt_y from the datapath):

Encoding  State  FSMD action or condition   Controller FSM equivalent
0000      1:     1 / !1                     (constant)
0001      2:     !go_i / !(!go_i)           !go_i / go_i
0010      2-J:
0011      3:     x = x_i                    x_sel = 0, x_ld = 1
0100      4:     y = y_i                    y_sel = 0, y_ld = 1
0101      5:     x!=y / !(x!=y)             x_neq_y / !x_neq_y
0110      6:     x<y / !(x<y)               x_lt_y / !x_lt_y
0111      7:     y = y - x                  y_sel = 1, y_ld = 1
1000      8:     x = x - y                  x_sel = 1, x_ld = 1
1001      6-J:
1010      5-J:
1011      9:     d_o = x                    d_ld = 1
1100      1-J:

[Figure: the datapath from the previous slide, with control inputs x_sel, y_sel, x_ld, y_ld, d_ld and status outputs x_neq_y, x_lt_y]

Splitting into a controller and datapath
[Figure: controller implementation model. Combinational logic takes go_i, x_neq_y, x_lt_y and the state bits Q3 Q2 Q1 Q0, and produces x_sel, y_sel, x_ld, y_ld, d_ld and the next-state bits I3 I2 I1 I0 for the state register. Shown beside the controller FSM (states 0000 through 1100) and (b) the datapath.]

Controller state table for the GCD example
Inputs                               Outputs
Q3 Q2 Q1 Q0 x_neq_y x_lt_y go_i      I3 I2 I1 I0 x_sel y_sel x_ld y_ld d_ld
0  0  0  0  *       *      *         0  0  0  1  X     X     0    0    0
0  0  0  1  *       *      0         0  0  1  0  X     X     0    0    0
0  0  0  1  *       *      1         0  0  1  1  X     X     0    0    0
0  0  1  0  *       *      *         0  0  0  1  X     X     0    0    0
0  0  1  1  *       *      *         0  1  0  0  0     X     1    0    0
0  1  0  0  *       *      *         0  1  0  1  X     0     0    1    0
0  1  0  1  0       *      *         1  0  1  1  X     X     0    0    0
0  1  0  1  1       *      *         0  1  1  0  X     X     0    0    0
0  1  1  0  *       0      *         1  0  0  0  X     X     0    0    0
0  1  1  0  *       1      *         0  1  1  1  X     X     0    0    0
0  1  1  1  *       *      *         1  0  0  1  X     1     0    1    0
1  0  0  0  *       *      *         1  0  0  1  1     X     1    0    0
1  0  0  1  *       *      *         1  0  1  0  X     X     0    0    0
1  0  1  0  *       *      *         0  1  0  1  X     X     0    0    0
1  0  1  1  *       *      *         1  1  0  0  X     X     0    0    1
1  1  0  0  *       *      *         0  0  0  0  X     X     0    0    0
1  1  0  1  *       *      *         0  0  0  0  X     X     0    0    0
1  1  1  0  *       *      *         0  0  0  0  X     X     0    0    0
1  1  1  1  *       *      *         0  0  0  0  X     X     0    0    0
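
The state table can be checked by simulating the controller and datapath together. The sketch below transcribes the table into a C switch and models the datapath registers, multiplexors, subtractors, and comparators. The struct and function names are illustrative, don't-care (X) select values are emitted as 0, and go_i is assumed to be held at 1 for the whole run.

```c
#include <assert.h>

/* Control word for one state; X entries from the table are emitted as 0. */
typedef struct { unsigned next, x_sel, y_sel, x_ld, y_ld, d_ld; } control;

/* Datapath registers x, y and d. */
typedef struct { unsigned x, y, d; } datapath;

/* Next-state and control logic, transcribed row by row from the table. */
static control controller(unsigned q, unsigned x_neq_y, unsigned x_lt_y,
                          unsigned go_i) {
    control c = { 0, 0, 0, 0, 0, 0 };
    switch (q) {
    case 0x0: c.next = 0x1; break;                           /* 1:   */
    case 0x1: c.next = go_i ? 0x3 : 0x2; break;              /* 2:   */
    case 0x2: c.next = 0x1; break;                           /* 2-J: */
    case 0x3: c.next = 0x4; c.x_sel = 0; c.x_ld = 1; break;  /* 3:   */
    case 0x4: c.next = 0x5; c.y_sel = 0; c.y_ld = 1; break;  /* 4:   */
    case 0x5: c.next = x_neq_y ? 0x6 : 0xB; break;           /* 5:   */
    case 0x6: c.next = x_lt_y ? 0x7 : 0x8; break;            /* 6:   */
    case 0x7: c.next = 0x9; c.y_sel = 1; c.y_ld = 1; break;  /* 7:   */
    case 0x8: c.next = 0x9; c.x_sel = 1; c.x_ld = 1; break;  /* 8:   */
    case 0x9: c.next = 0xA; break;                           /* 6-J: */
    case 0xA: c.next = 0x5; break;                           /* 5-J: */
    case 0xB: c.next = 0xC; c.d_ld = 1; break;               /* 9:   */
    default:  c.next = 0x0; break;        /* 1-J: and unused encodings */
    }
    return c;
}

/* One clock edge: all next values are computed from the old register
   contents, then the enabled registers load together. */
static void datapath_tick(datapath *dp, control c, unsigned x_i, unsigned y_i) {
    unsigned x_next = c.x_sel ? dp->x - dp->y : x_i;
    unsigned y_next = c.y_sel ? dp->y - dp->x : y_i;
    unsigned d_next = dp->x;
    if (c.x_ld) dp->x = x_next;
    if (c.y_ld) dp->y = y_next;
    if (c.d_ld) dp->d = d_next;
}

/* Runs from state 0000 until state 1-J (1100) is reached.
   Returns d_o and stores the number of clock cycles in *cycles. */
unsigned run_gcd(unsigned x_i, unsigned y_i, unsigned *cycles) {
    assert(x_i > 0 && y_i > 0);   /* assumed precondition */
    datapath dp = { 0, 0, 0 };
    unsigned q = 0x0;
    *cycles = 0;
    while (q != 0xC) {
        control c = controller(q, dp.x != dp.y, dp.x < dp.y, 1);
        datapath_tick(&dp, c, x_i, y_i);
        q = c.next;
        (*cycles)++;
    }
    return dp.d;
}
```

Under these assumptions each subtraction costs five states (5, 6, 7 or 8, 6-J, 5-J), which makes the cost of the unoptimized state machine easy to see from the cycle count.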

Completing the GCD custom single-purpose processor design
• We finished the datapath
• We have a state table for the next state and control logic
– All that’s left is combinational logic design
• This is not an optimized design, but we see the basic steps

[Figure: a view inside the controller and datapath. Controller: next-state and control logic, state register. Datapath: registers, functional units.]

RT-level custom single-purpose processor design
• We often start with a state machine
– Rather than algorithm
– Cycle timing often too central to functionality
• Example
– Bus bridge that converts 4-bit bus to 8-bit bus
– Start with FSMD
– Known as register-transfer (RT) level
– Exercise: complete the design

Problem Specification
[Figure: Sender, Bridge, and Receiver sharing a clock. The sender drives rdy_in and data_in(4) into the bridge; the bridge drives rdy_out and data_out(8) into the receiver.]
Bridge: A single-purpose processor that converts two 4-bit inputs, arriving one at a time over data_in along with a rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse.

FSMD
Inputs
rdy_in: bit; data_in: bit[4];
Outputs
rdy_out: bit; data_out: bit[8]
Variables
data_lo, data_hi: bit[4];

States:
WaitFirst4       stay while rdy_in=0; go to RecFirst4Start when rdy_in=1
RecFirst4Start   data_lo=data_in; go to RecFirst4End
RecFirst4End     stay while rdy_in=1; go to WaitSecond4 when rdy_in=0
WaitSecond4      stay while rdy_in=0; go to RecSecond4Start when rdy_in=1
RecSecond4Start  data_hi=data_in; go to RecSecond4End
RecSecond4End    stay while rdy_in=1; go to Send8Start when rdy_in=0
Send8Start       data_out=data_hi & data_lo, rdy_out=1; go to Send8End
Send8End         rdy_out=0; go to WaitFirst4
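
One way to explore the bridge FSMD is to step it in software, one call per clock cycle. The sketch below rests on two readings of the slide that should be treated as assumptions: the transitions are taken as RecFirst4Start → RecFirst4End and RecSecond4Start → RecSecond4End unconditionally, and `&` in data_hi & data_lo is read as concatenation, with data_hi as the upper four bits. The type and function names, and the helper that plays the sender, are hypothetical.

```c
#include <stdint.h>

typedef enum { WaitFirst4, RecFirst4Start, RecFirst4End, WaitSecond4,
               RecSecond4Start, RecSecond4End, Send8Start, Send8End
} bridge_state;

typedef struct {
    bridge_state state;
    uint8_t data_lo, data_hi;   /* 4-bit variables */
    uint8_t data_out;           /* 8-bit output */
    unsigned rdy_out;
} bridge;

void bridge_reset(bridge *b) {
    b->state = WaitFirst4;
    b->data_lo = b->data_hi = b->data_out = 0;
    b->rdy_out = 0;
}

/* One clock cycle: perform the current state's action, then move on. */
void bridge_tick(bridge *b, unsigned rdy_in, uint8_t data_in) {
    switch (b->state) {
    case WaitFirst4:
        b->state = rdy_in ? RecFirst4Start : WaitFirst4; break;
    case RecFirst4Start:
        b->data_lo = data_in & 0xF; b->state = RecFirst4End; break;
    case RecFirst4End:
        b->state = rdy_in ? RecFirst4End : WaitSecond4; break;
    case WaitSecond4:
        b->state = rdy_in ? RecSecond4Start : WaitSecond4; break;
    case RecSecond4Start:
        b->data_hi = data_in & 0xF; b->state = RecSecond4End; break;
    case RecSecond4End:
        b->state = rdy_in ? RecSecond4End : Send8Start; break;
    case Send8Start:
        /* assumed: "&" is concatenation, data_hi is the upper nibble */
        b->data_out = (uint8_t)((b->data_hi << 4) | b->data_lo);
        b->rdy_out = 1; b->state = Send8End; break;
    case Send8End:
        b->rdy_out = 0; b->state = WaitFirst4; break;
    }
}

/* Hypothetical sender: two rdy_in pulses of two cycles each,
   then idle cycles until rdy_out is seen. Returns data_out. */
uint8_t bridge_transfer(uint8_t first, uint8_t second) {
    bridge b;
    bridge_reset(&b);
    bridge_tick(&b, 1, first);  bridge_tick(&b, 1, first);
    bridge_tick(&b, 0, 0);
    bridge_tick(&b, 1, second); bridge_tick(&b, 1, second);
    bridge_tick(&b, 0, 0);
    bridge_tick(&b, 0, 0);      /* Send8Start: rdy_out goes to 1 */
    return b.rdy_out ? b.data_out : 0;
}
```

In this model the sender has to keep data_in valid for at least two cycles per pulse, because the value is captured in the state after the one that detects rdy_in=1.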

RT-level custom single-purpose processor design (cont’)

(a) Controller
Bridge FSM with the same states and rdy_in transitions as the FSMD; the register transfers are replaced by control signals:
RecFirst4Start   data_lo_ld=1
RecSecond4Start  data_hi_ld=1
Send8Start       data_out_ld=1, rdy_out=1
Send8End         rdy_out=0

(b) Datapath
[Figure: data_in(4) feeds registers data_hi and data_lo (loads data_hi_ld, data_lo_ld); their outputs together feed the 8-bit register data_out (load data_out_ld); clk goes to all registers; the controller handles rdy_in and rdy_out.]

RT-level custom single-purpose processor design

[Figure: the bridge problem specification and FSMD shown side by side with (a) the controller and (b) the datapath]

Optimizing single-purpose processors

• Optimization is the task of making design metric values the best possible
• Optimization opportunities
– original program
– FSMD
– datapath
– FSM

Optimizing the original program

• Analyze program attributes and look for areas of possible improvement
– number of computations
– size of variable
– time and space complexity
– operations used
• multiplication and division very expensive

Optimizing the original program (cont’)
original program
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }

optimized program
0: int x, y, r;
1: while (1) {
2:   while (!go_i);
     // x must be the larger number
3:   if (x_i >= y_i) {
4:     x=x_i;
5:     y=y_i;
     }
6:   else {
7:     x=y_i;
8:     y=x_i;
     }
9:   while (y != 0) {
10:    r = x % y;
11:    x = y;
12:    y = r;
     }
13:  d_o = x;
   }

replace the subtraction operation(s) with modulo operation in order to speed up program

original program: GCD(42, 8) - 9 iterations to complete the loop
x and y values evaluated as follows: (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2).

optimized program: GCD(42, 8) - 3 iterations to complete the loop
x and y values evaluated as follows: (42, 8), (8, 2), (2, 0)
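
The difference between the two programs can be measured directly. The sketch below wraps both loops in C functions that also count how many times the loop body runs; the function names and the `steps` parameter are illustrative. Note the counting convention: the slide's figures of 9 and 3 match the number of (x, y) pairs evaluated by the loop condition, which is one more than the number of loop-body executions counted here.

```c
#include <assert.h>

/* Original program: repeated subtraction.
   *steps receives the number of loop-body executions. */
unsigned gcd_subtract_counted(unsigned x, unsigned y, unsigned *steps) {
    assert(x > 0 && y > 0);   /* assumed precondition */
    *steps = 0;
    while (x != y) {
        if (x < y) y = y - x;
        else       x = x - y;
        (*steps)++;
    }
    return x;
}

/* Optimized program: modulo, with x set to the larger input first. */
unsigned gcd_modulo_counted(unsigned x_i, unsigned y_i, unsigned *steps) {
    assert(x_i > 0 && y_i > 0);   /* assumed precondition */
    unsigned x, y, r;
    if (x_i >= y_i) { x = x_i; y = y_i; }
    else            { x = y_i; y = x_i; }
    *steps = 0;
    while (y != 0) {
        r = x % y;
        x = y;
        y = r;
        (*steps)++;
    }
    return x;
}
```

For GCD(42, 8) the subtraction loop body runs 8 times and the modulo loop body runs twice; adding the final condition check to each gives the 9 and 3 quoted above.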

Optimizing the FSMD

• Areas of possible improvements
o merge states
• states with constants on transitions can be eliminated, transition taken is already known
• states with independent operations can be merged
o separate states
• states which require complex operations (a*b*c*d) can be broken into smaller states to reduce hardware size
o scheduling

Optimizing the FSMD (cont.)
original FSMD
[Figure: the FSMD from the GCD example, with int x, y; and states 1:, 2:, 2-J:, 3:, 4:, 5:, 6:, 7:, 8:, 6-J:, 5-J:, 9:, 1-J:]

Optimization steps
• eliminate state 1 – transitions have constant values
• merge state 2 and state 2J – no loop operation in between them
• merge state 3 and state 4 – assignment operations are independent of one another
• merge state 5 and state 6 – transitions from state 6 can be done in state 5
• eliminate state 5J and 6J – transitions from each state can be done from state 7 and state 8, respectively
• eliminate state 1-J – transition from state 1-J can be done directly from state 9

optimized FSMD
int x, y;
2:  stay on !go_i; go to 3: on go_i
3:  x = x_i, y = y_i; go to 5:
5:  go to 7: on x<y; go to 8: on x>y; otherwise go to 9:
7:  y = y - x; go to 5:
8:  x = x - y; go to 5:
9:  d_o = x; go to 2:
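
To get a feel for what the merged states save, the optimized FSMD can be stepped in software with one loop iteration per clock cycle. This is a sketch under stated assumptions: the enum and function names are illustrative, go_i is held at 1, and the run stops after state 9 instead of returning to state 2.

```c
#include <assert.h>

/* States of the optimized FSMD, named after the slide's state numbers. */
typedef enum { S2, S3, S5, S7, S8, S9 } fsmd_state;

/* Returns d_o and stores the number of clock cycles in *cycles. */
unsigned gcd_fsmd_optimized(unsigned x_i, unsigned y_i, unsigned *cycles) {
    assert(x_i > 0 && y_i > 0);   /* assumed precondition */
    unsigned x = 0, y = 0, d_o = 0;
    unsigned go_i = 1;            /* assumed: start signal already raised */
    fsmd_state s = S2;
    int done = 0;
    *cycles = 0;
    while (!done) {
        switch (s) {
        case S2: s = go_i ? S3 : S2; break;
        case S3: x = x_i; y = y_i; s = S5; break;   /* merged states 3, 4 */
        case S5: s = (x < y) ? S7 : (x > y) ? S8 : S9; break;
        case S7: y = y - x; s = S5; break;
        case S8: x = x - y; s = S5; break;
        case S9: d_o = x; done = 1; break;
        }
        (*cycles)++;
    }
    return d_o;
}
```

Each subtraction now costs two states (5, then 7 or 8), so GCD(42, 8) with its eight subtractions finishes in 20 cycles in this model.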

Optimizing the datapath

• Sharing of functional units
– one-to-one mapping, as done previously, is not necessary
– if the same operation occurs in different states, those states can share a single functional unit
• Multi-functional units
– ALUs support a variety of operations and can be shared among operations occurring in different states

Optimizing the FSM

• State encoding
– task of assigning a unique bit pattern to each state in an FSM
– size of state register and combinational logic vary
– can be treated as an ordering problem
• State minimization
– task of merging equivalent states into a single state
• two states are equivalent if, for all possible input combinations, they generate the same outputs and transition to the same next state

Summary

• Custom single-purpose processors


– Straightforward design techniques
– Can be built to execute algorithms
– Typically start with FSMD
– CAD tools can be of great assistance

Typical Exam Question

Design a single-purpose processor that outputs Fibonacci numbers up to n places. The processor has the following interface:
• go_i – start signal; when 1, the processor should start generating the numbers;
• n_i – the number of places to be generated;
• fib_o – gives the value of the current Fibonacci number.
Start with a function computing the desired result, translate it into a state diagram, and sketch a probable controller and datapath.
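
As a possible starting point for the first step only, the function below computes the numbers in C. It is a sketch, not a model answer: it assumes the sequence starts 1, 1, 2, 3, …, it writes each value into a caller-supplied array to stand in for successive values on fib_o, and it leaves out the go_i handshake. The function name and array interface are hypothetical.

```c
/* Writes the first n_i Fibonacci numbers into fib_o[0 .. n_i-1].
   Assumed convention: the sequence starts 1, 1, 2, 3, 5, ... */
void fibonacci(unsigned n_i, unsigned long *fib_o) {
    unsigned long prev = 0, curr = 1;
    for (unsigned i = 0; i < n_i; i++) {
        fib_o[i] = curr;                  /* value that would appear on fib_o */
        unsigned long next = prev + curr; /* the only arithmetic operation */
        prev = curr;
        curr = next;
    }
}
```

The loop uses one addition, one comparison, and a few variables, which hints at the registers and functional units a datapath for it would need.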

Question solution
