CSPP

The document describes custom single-purpose processors and their design. It discusses: - Processors can be general-purpose or single-purpose, designed for a specific computation task. Custom single-purpose processors can be fast, small, and low power but require more design time. - The basic components of digital circuits like transistors, CMOS transistors, and logic gates like inverters, AND, OR, and XOR gates. - Combinational logic components like decoders, adders, comparators, and arithmetic logic units (ALUs). - Sequential logic components like registers, shift registers, and counters to store state and perform operations over multiple clock cycles.

SNS COLLEGE OF TECHNOLOGY

(An Autonomous Institution)


COIMBATORE - 35

Custom Single-Purpose Processor

Dr. R. Rajasekaran : Embedded System Using Industrial Applications


Introduction
• Processor
– Digital circuit that performs computation tasks
– Controller and datapath
– General-purpose: variety of computation tasks
– Single-purpose: one particular computation task
– Custom single-purpose: non-standard task
• A custom single-purpose processor may be
– Fast, small, low power
– But, high NRE (non-recurring engineering) cost, longer time-to-market, less flexible

[Figure: digital camera chip. Blocks: lens, CCD, A2D, CCD preprocessor, Pixel coprocessor, D2A, JPEG codec, Microcontroller, Multiplier/Accum, DMA controller, Display ctrl, Memory controller, ISA bus interface, UART, LCD ctrl]

CMOS transistor on silicon
• Transistor
– The basic electrical component in digital systems
– Acts as an on/off switch
– Voltage at “gate” controls whether current flows from source to drain
– Don’t confuse this “gate” with a logic gate

[Figure: transistor symbol with gate, source, and drain (conducts if gate=1), and a cross-section of an IC package and IC showing gate, oxide, source, channel, and drain on a silicon substrate]

CMOS transistor implementations
• Complementary Metal Oxide Semiconductor
• We refer to logic levels
– Typically 0 is 0V, 1 is 5V
• Two basic CMOS types
– nMOS conducts if gate=1
– pMOS conducts if gate=0
– Hence “complementary”
• Basic gates
– Inverter, NAND, NOR

[Figure: nMOS (conducts if gate=1) and pMOS (conducts if gate=0) transistor symbols; transistor-level circuits for the inverter (F = x'), NAND gate (F = (xy)'), and NOR gate (F = (x+y)')]

Basic logic gates
One-input gates
x | Driver (F = x) | Inverter (F = x’)
0 |       0        |        1
1 |       1        |        0

Two-input gates
x y | AND      | OR        | XOR       | NAND       | NOR          | XNOR
    | F = xy   | F = x + y | F = x ⊕ y | F = (xy)’  | F = (x+y)’   | F = x ⊙ y
0 0 |    0     |     0     |     0     |     1      |      1       |     1
0 1 |    0     |     1     |     1     |     1      |      0       |     0
1 0 |    0     |     1     |     1     |     1      |      0       |     0
1 1 |    1     |     1     |     0     |     0      |      0       |     1

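The truth tables above can be written directly as C functions. This is a minimal sketch, not part of the original slides: the function names (gate_and, gate_xnor, and so on) are made up for illustration, and each logic level is modeled as a C bool.

```c
#include <assert.h>
#include <stdbool.h>

/* Each gate takes and returns one logic level (false = 0, true = 1).
   The names are illustrative; they do not come from the slides. */
bool gate_driver(bool x)         { return x; }
bool gate_inverter(bool x)       { return !x; }
bool gate_and(bool x, bool y)    { return x && y; }
bool gate_or(bool x, bool y)     { return x || y; }
bool gate_xor(bool x, bool y)    { return x != y; }    /* 1 when inputs differ */
bool gate_nand(bool x, bool y)   { return !(x && y); }
bool gate_nor(bool x, bool y)    { return !(x || y); }
bool gate_xnor(bool x, bool y)   { return x == y; }    /* 1 when inputs match */
```

For example, gate_nand(true, true) is false and gate_xor(true, false) is true, matching the last and third rows of the two-input table.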
Combinational logic design
A) Problem description
y is 1 if a is 1, or b and c are 1. z is 1 if b or c is 1, but not both, or if all are 1.

B) Truth table
Inputs  | Outputs
a b c   | y z
0 0 0   | 0 0
0 0 1   | 0 1
0 1 0   | 0 1
0 1 1   | 1 0
1 0 0   | 1 0
1 0 1   | 1 1
1 1 0   | 1 1
1 1 1   | 1 1

C) Output equations
y = a'bc + ab'c' + ab'c + abc' + abc
z = a'b'c + a'bc' + ab'c + abc' + abc

D) Minimized output equations
y     bc
  a   00  01  11  10
  0    0   0   1   0
  1    1   1   1   1
y = a + bc

z     bc
  a   00  01  11  10
  0    0   1   0   1
  1    0   1   1   1
z = ab + b’c + bc’

E) Logic Gates
[Figure: gate-level circuit with inputs a, b, c and outputs y, z]

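One way to check step D is to compare the minimized equations against the sum-of-minterms equations from step C on all eight input rows. The sketch below does that; the function names are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Step C: one product term per truth-table row whose output is 1. */
bool y_minterms(bool a, bool b, bool c)
{
    return (!a && b && c) || (a && !b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

bool z_minterms(bool a, bool b, bool c)
{
    return (!a && !b && c) || (!a && b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

/* Step D: minimized equations read off the K-maps. */
bool y_min(bool a, bool b, bool c) { return a || (b && c); }
bool z_min(bool a, bool b, bool c) { return (a && b) || (!b && c) || (b && !c); }

/* Counts the comparisons (two per input row, eight rows) in which a
   minimized equation disagrees with its sum-of-minterms form. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int row = 0; row < 8; row++) {
        bool a = (row >> 2) & 1, b = (row >> 1) & 1, c = row & 1;
        if (y_min(a, b, c) != y_minterms(a, b, c)) mismatches++;
        if (z_min(a, b, c) != z_minterms(a, b, c)) mismatches++;
    }
    return mismatches;
}
```

A return value of 0 from count_mismatches() means both minimized equations agree with the truth table on every row.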
Combinational components
n-bit, m x 1 Multiplexor
– Inputs: I(m-1) … I1 I0 (n bits each), select S(log m) … S0
– Output: O (n bits)
– O = I0 if S=0..00, I1 if S=0..01, …, I(m-1) if S=1..11

log n x n Decoder
– Inputs: I(log n -1) … I0
– Outputs: O(n-1) … O1 O0
– O0 = 1 if I=0..00, O1 = 1 if I=0..01, …, O(n-1) = 1 if I=1..11
– With enable input e: all O’s are 0 if e=0

n-bit Adder
– Inputs: A, B (n bits each)
– Outputs: sum (n bits), carry
– sum = A+B (first n bits), carry = (n+1)’th bit of A+B
– With carry-in input Ci: sum = A + B + Ci

n-bit Comparator
– Inputs: A, B (n bits each)
– Outputs: less, equal, greater
– less = 1 if A<B, equal = 1 if A=B, greater = 1 if A>B

n bit, m function ALU
– Inputs: A, B (n bits each), select S(log m) … S0
– Output: O (n bits)
– O = A op B, op determined by S
– May have status outputs carry, zero, etc.

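The behavior of these components can be sketched in C. The sizes here are assumptions chosen for illustration and are not fixed by the slides: an 8-bit adder and comparator, a 3x8 decoder, and a 4x1 multiplexor. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define N 8  /* assumed word width for this sketch */

typedef struct { uint8_t sum; uint8_t carry; } adder_out;
typedef struct { uint8_t less, equal, greater; } comparator_out;

/* n-bit adder with carry-in: sum is the first n bits of A + B + Ci,
   carry is the (n+1)'th bit. */
adder_out adder(uint8_t a, uint8_t b, uint8_t ci)
{
    unsigned wide = (unsigned)a + (unsigned)b + (ci & 1u);
    adder_out r;
    r.sum = (uint8_t)(wide & 0xFFu);
    r.carry = (uint8_t)((wide >> N) & 1u);
    return r;
}

/* n-bit comparator: exactly one of the three outputs is 1. */
comparator_out comparator(uint8_t a, uint8_t b)
{
    comparator_out r;
    r.less = a < b;
    r.equal = a == b;
    r.greater = a > b;
    return r;
}

/* 3x8 decoder with enable: output bit i is 1 when the input equals i;
   all outputs are 0 when e = 0. */
uint8_t decoder3x8(uint8_t i, uint8_t e)
{
    return e ? (uint8_t)(1u << (i & 7u)) : 0;
}

/* 4x1 multiplexor: the select value s picks one of the four inputs. */
uint8_t mux4x1(const uint8_t in[4], uint8_t s)
{
    return in[s & 3u];
}
```

For instance, adder(200, 100, 0) yields sum 44 with carry 1, because 300 does not fit in 8 bits.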
Sequential components
n-bit Register
– Inputs: I (n bits), load, clear
– Output: Q (n bits)
– Q = 0 if clear=1, I if load=1 and clock=1, Q(previous) otherwise.

n-bit Shift register
– Inputs: I, shift
– Output: Q
– Q = lsb
– Content shifted
– I stored in msb

n-bit Counter
– Output: Q (n bits)
– Q = 0 if clear=1, Q(prev)+1 if count=1 and clock=1.

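The three components can be modeled as functions that each represent one clock edge. This is a sketch under stated assumptions: an 8-bit width, clear taking priority over load and count, and a shift direction toward the lsb (consistent with "Q = lsb" and "I stored in msb"). The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Register: returns the new Q after one clock edge.
   Assumption: clear takes priority over load. */
uint8_t register_tick(uint8_t q, uint8_t i, int load, int clear)
{
    if (clear) return 0;
    if (load)  return i;
    return q;                      /* Q(previous) otherwise */
}

/* Shift register: one shift step. Returns the bit shifted out (the old lsb),
   moves the content one position toward the lsb, stores i in the msb. */
uint8_t shift_tick(uint8_t *content, uint8_t i)
{
    uint8_t out = (uint8_t)(*content & 1u);
    *content = (uint8_t)((*content >> 1) | ((i & 1u) << 7));
    return out;
}

/* Counter: returns the new Q after one clock edge; wraps around at 2^8. */
uint8_t counter_tick(uint8_t q, int count, int clear)
{
    if (clear) return 0;
    if (count) return (uint8_t)(q + 1u);
    return q;
}
```

Calling counter_tick(255, 1, 0) returns 0, which shows the wrap-around of an 8-bit counter.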
Sequential logic design
A) Problem Description
You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles.

B) State Diagram
[Figure: four states, 0 through 3. Each state loops to itself when a=0 and advances to the next state when a=1 (state 3 returns to state 0). Output x=0 in states 0, 1, and 2; x=1 in state 3.]

C) Implementation Model
[Figure: combinational logic takes input a and the current state Q1 Q0, and produces output x and the next state I1 I0; the state register holds Q1 Q0.]

D) State Table (Moore-type)
Inputs    | Outputs
Q1 Q0 a   | I1 I0 x
0  0  0   | 0  0  0
0  0  1   | 0  1  0
0  1  0   | 0  1  0
0  1  1   | 1  0  0
1  0  0   | 1  0  0
1  0  1   | 1  1  0
1  1  0   | 1  1  1
1  1  1   | 0  0  1

• Given this implementation model
– Sequential logic design quickly reduces to combinational logic design

Sequential logic design (cont.)
E) Minimized Output Equations

I1    Q1Q0
  a   00  01  11  10
  0    0   0   1   1
  1    0   1   0   1
I1 = Q1’Q0a + Q1a’ + Q1Q0’

I0    Q1Q0
  a   00  01  11  10
  0    0   1   1   0
  1    1   0   0   1
I0 = Q0a’ + Q0’a

x     Q1Q0
  a   00  01  11  10
  0    0   0   1   0
  1    0   0   1   0
x = Q1Q0

F) Combinational Logic
[Figure: gate-level circuit computing x, I1, and I0 from a, Q1, and Q0]

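The minimized equations are enough to simulate the clock divider cycle by cycle. The sketch below assumes the state register starts at Q1Q0 = 00 and holds a = 1 on every cycle; the type and function names are invented for the example.

```c
#include <assert.h>

typedef struct { unsigned q1, q0; } divider_state;

/* Moore output: x = Q1Q0 */
unsigned divider_output(divider_state s)
{
    return s.q1 & s.q0 & 1u;
}

/* Next-state logic from the minimized equations:
   I1 = Q1'Q0a + Q1a' + Q1Q0'   and   I0 = Q0a' + Q0'a */
divider_state divider_next(divider_state s, unsigned a)
{
    unsigned q1 = s.q1 & 1u, q0 = s.q0 & 1u;
    unsigned na, nq1, nq0;
    divider_state next;

    a &= 1u;
    na = a ^ 1u;      /* a'  */
    nq1 = q1 ^ 1u;    /* Q1' */
    nq0 = q0 ^ 1u;    /* Q0' */
    next.q1 = (nq1 & q0 & a) | (q1 & na) | (q1 & nq0);
    next.q0 = (q0 & na) | (nq0 & a);
    return next;
}

/* Runs the divider for the given number of clock cycles with a = 1 and
   counts the cycles in which x = 1 (assumed start state: 00). */
unsigned count_pulses(unsigned cycles)
{
    divider_state s = {0, 0};
    unsigned pulses = 0;
    for (unsigned i = 0; i < cycles; i++) {
        pulses += divider_output(s);
        s = divider_next(s, 1u);
    }
    return pulses;
}
```

With these definitions, count_pulses(16) returns 4: one output pulse for every four clock cycles, as the problem description asks.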
Custom single-purpose processor basic model

[Figure: controller and datapath. Signals: external control inputs and external control outputs on the controller, external data inputs and external data outputs on the datapath, and datapath control inputs and datapath control outputs between the two.]

[Figure: a view inside the controller and datapath. The controller contains next-state and control logic plus a state register; the datapath contains registers and functional units.]

Example: greatest common divisor
• First create algorithm
• Convert algorithm to “complex” state machine
– Known as FSMD: finite-state machine with datapath
– Can use templates to perform such conversion

(a) black-box view
[Figure: block GCD with inputs go_i, x_i, y_i and output d_o]

(b) desired functionality
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }

(c) state diagram
State   Action       Transitions
1:                   1 → 2: (the !1 branch is never taken)
2:                   !go_i → 2-J:; !(!go_i) → 3:
2-J:                 → 2:
3:      x = x_i      → 4:
4:      y = y_i      → 5:
5:                   x!=y → 6:; !(x!=y) → 9:
6:                   x<y → 7:; !(x<y) → 8:
7:      y = y - x    → 6-J:
8:      x = x - y    → 6-J:
6-J:                 → 5-J:
5-J:                 → 5:
9:      d_o = x      → 1-J:
1-J:                 → 1:

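The FSMD can be simulated in software with one switch case per state. The sketch below is an illustration rather than the slides' own code: it assumes go_i is already asserted, assumes x_i and y_i are both positive (the subtraction loop does not terminate otherwise), and stops at state 1-J instead of looping back to state 1. The enum and function names are made up.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { S1, S2, S2J, S3, S4, S5, S6, S7, S8, S6J, S5J, S9, S1J } gcd_state;

/* Walks the GCD FSMD for one computation and returns d_o.
   If steps is not NULL, it receives the number of states visited.
   Assumptions: go_i = 1, x_i > 0, y_i > 0. */
unsigned gcd_fsmd(unsigned x_i, unsigned y_i, unsigned *steps)
{
    const int go_i = 1;                 /* assumed: start signal already high */
    unsigned x = 0, y = 0, d_o = 0, visited = 0;
    gcd_state s = S1;

    for (;;) {
        visited++;
        switch (s) {
        case S1:  s = S2; break;                    /* while (1) */
        case S2:  s = go_i ? S3 : S2J; break;       /* while (!go_i); */
        case S2J: s = S2; break;
        case S3:  x = x_i; s = S4; break;
        case S4:  y = y_i; s = S5; break;
        case S5:  s = (x != y) ? S6 : S9; break;    /* while (x != y) */
        case S6:  s = (x < y) ? S7 : S8; break;     /* if (x < y) */
        case S7:  y = y - x; s = S6J; break;
        case S8:  x = x - y; s = S6J; break;
        case S6J: s = S5J; break;                   /* end of if/else */
        case S5J: s = S5; break;                    /* back to loop condition */
        case S9:  d_o = x; s = S1J; break;
        case S1J:                                   /* the FSMD would return to 1: */
            if (steps != NULL) *steps = visited;
            return d_o;
        }
    }
}
```

For example, gcd_fsmd(42, 8, NULL) returns 2. The join states (2-J, 6-J, 5-J, 1-J) do no work, which is what the later FSMD optimization removes.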
State diagram templates
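Assignment statement
  a = b
  next statement
Template: one state with action a = b, followed by the state for the next statement.

Loop statement
  while (cond) {
    loop-body-statements
  }
  next statement
Template: condition state C: goes to the loop-body statements on cond and to the next statement on !cond. The loop body ends in join state J:, which returns to C:.

Branch statement
  if (c1)
    c1 stmts
  else if c2
    c2 stmts
  else
    other stmts
  next statement
Template: condition state C: goes to the c1 stmts on c1, to the c2 stmts on !c1*c2, and to the other stmts on !c1*!c2. All branches meet at join state J:, which is followed by the next statement.
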
Creating the datapath
• Create a register for any declared variable
• Create a functional unit for each arithmetic operation
• Connect the ports, registers and functional units
– Based on reads and writes
– Use multiplexors for multiple sources
• Create unique identifier
– for each datapath component control input and output

[Figure: the GCD FSMD state diagram (states 1: through 1-J:) beside the resulting datapath. Datapath: inputs x_i and y_i feed two n-bit 2x1 multiplexors (selects x_sel, y_sel), which feed registers 0: x and 0: y (loads x_ld, y_ld). The registers feed a != comparator (5: x!=y, output x_neq_y), a < comparator (6: x<y, output x_lt_y), and two subtractors (8: x-y and 7: y-x). Register 9: d (load d_ld) drives d_o.]

Creating the controller’s FSM
• Same structure as FSMD
• Replace complex actions/conditions with datapath configurations

Controller (input go_i, plus x_neq_y and x_lt_y from the datapath)
Encoding  State  Datapath configuration   Transitions
0000      1:                              → 2: (the !1 branch is never taken)
0001      2:                              !go_i → 2-J:; !(!go_i) → 3:
0010      2-J:                            → 2:
0011      3:     x_sel = 0, x_ld = 1      → 4:
0100      4:     y_sel = 0, y_ld = 1      → 5:
0101      5:                              x_neq_y → 6:; !x_neq_y → 9:
0110      6:                              x_lt_y → 7:; !x_lt_y → 8:
0111      7:     y_sel = 1, y_ld = 1      → 6-J:
1000      8:     x_sel = 1, x_ld = 1      → 6-J:
1001      6-J:                            → 5-J:
1010      5-J:                            → 5:
1011      9:     d_ld = 1                 → 1-J:
1100      1-J:                            → 1:

[Figure: the original FSMD state diagram and the GCD datapath (multiplexors, registers x, y, and d, comparators, subtractors) shown alongside the controller]

Splitting into a controller and datapath

[Figure: (a) Controller: the GCD controller FSM (states 0000 through 1100) with inputs go_i, x_neq_y, x_lt_y and outputs x_sel, y_sel, x_ld, y_ld, d_ld. Controller implementation model: combinational logic plus a state register with outputs Q3 Q2 Q1 Q0 and inputs I3 I2 I1 I0.]

[Figure: (b) Datapath: inputs x_i and y_i, n-bit 2x1 multiplexors (x_sel, y_sel), registers 0: x and 0: y (x_ld, y_ld), != comparator (5: x!=y, output x_neq_y), < comparator (6: x<y, output x_lt_y), subtractors (8: x-y, 7: y-x), register 9: d (d_ld), output d_o.]

Controller state table for the GCD example

Inputs                                 | Outputs
Q3 Q2 Q1 Q0  x_neq_y  x_lt_y  go_i     | I3 I2 I1 I0  x_sel  y_sel  x_ld  y_ld  d_ld
0  0  0  0      *       *      *       | 0  0  0  1     X      X     0     0     0
0  0  0  1      *       *      0       | 0  0  1  0     X      X     0     0     0
0  0  0  1      *       *      1       | 0  0  1  1     X      X     0     0     0
0  0  1  0      *       *      *       | 0  0  0  1     X      X     0     0     0
0  0  1  1      *       *      *       | 0  1  0  0     0      X     1     0     0
0  1  0  0      *       *      *       | 0  1  0  1     X      0     0     1     0
0  1  0  1      0       *      *       | 1  0  1  1     X      X     0     0     0
0  1  0  1      1       *      *       | 0  1  1  0     X      X     0     0     0
0  1  1  0      *       0      *       | 1  0  0  0     X      X     0     0     0
0  1  1  0      *       1      *       | 0  1  1  1     X      X     0     0     0
0  1  1  1      *       *      *       | 1  0  0  1     X      1     0     1     0
1  0  0  0      *       *      *       | 1  0  0  1     1      X     1     0     0
1  0  0  1      *       *      *       | 1  0  1  0     X      X     0     0     0
1  0  1  0      *       *      *       | 0  1  0  1     X      X     0     0     0
1  0  1  1      *       *      *       | 1  1  0  0     X      X     0     0     1
1  1  0  0      *       *      *       | 0  0  0  0     X      X     0     0     0
1  1  0  1      *       *      *       | 0  0  0  0     X      X     0     0     0
1  1  1  0      *       *      *       | 0  0  0  0     X      X     0     0     0
1  1  1  1      *       *      *       | 0  0  0  0     X      X     0     0     0

(* = don’t-care input, X = don’t-care output)

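The state table and the datapath can be co-simulated to confirm that the split design still computes the GCD. The sketch below is one possible software model, not the slides' implementation: it assumes go_i is held at 1, positive inputs, registers that all update on the same clock edge, and a run that stops on reaching state 1100. All C names are hypothetical.

```c
#include <assert.h>

typedef struct { unsigned x, y, d; } datapath;
typedef struct { unsigned x_sel, y_sel, x_ld, y_ld, d_ld; } control;

/* Moore outputs of the controller: depend on the state encoding only.
   Don't-care (X) entries of the table are modeled as 0. */
control controller_outputs(unsigned q)
{
    control c = {0, 0, 0, 0, 0};
    switch (q) {
    case 0x3: c.x_sel = 0; c.x_ld = 1; break;   /* 3: x = x_i   */
    case 0x4: c.y_sel = 0; c.y_ld = 1; break;   /* 4: y = y_i   */
    case 0x7: c.y_sel = 1; c.y_ld = 1; break;   /* 7: y = y - x */
    case 0x8: c.x_sel = 1; c.x_ld = 1; break;   /* 8: x = x - y */
    case 0xB: c.d_ld = 1; break;                /* 9: d_o = x   */
    default: break;
    }
    return c;
}

/* Next-state logic, row by row from the state table. */
unsigned controller_next(unsigned q, unsigned x_neq_y, unsigned x_lt_y, unsigned go_i)
{
    switch (q) {
    case 0x0: return 0x1;
    case 0x1: return go_i ? 0x3 : 0x2;
    case 0x2: return 0x1;
    case 0x3: return 0x4;
    case 0x4: return 0x5;
    case 0x5: return x_neq_y ? 0x6 : 0xB;
    case 0x6: return x_lt_y ? 0x7 : 0x8;
    case 0x7: return 0x9;
    case 0x8: return 0x9;
    case 0x9: return 0xA;
    case 0xA: return 0x5;
    case 0xB: return 0xC;
    default:  return 0x0;                       /* 1100 and unused codes */
    }
}

/* One clock edge of the datapath: multiplexors pick the register inputs,
   and each register loads only when its load signal is 1. */
void datapath_tick(datapath *dp, control c, unsigned x_i, unsigned y_i)
{
    unsigned x_next = c.x_sel ? dp->x - dp->y : x_i;
    unsigned y_next = c.y_sel ? dp->y - dp->x : y_i;
    if (c.d_ld) dp->d = dp->x;
    if (c.x_ld) dp->x = x_next;
    if (c.y_ld) dp->y = y_next;
}

/* Runs controller and datapath together until state 1100 (1-J) is reached. */
unsigned gcd_processor(unsigned x_i, unsigned y_i)
{
    datapath dp = {0, 0, 0};
    unsigned q = 0x0;
    do {
        control c = controller_outputs(q);
        unsigned next = controller_next(q, dp.x != dp.y, dp.x < dp.y, 1u);
        datapath_tick(&dp, c, x_i, y_i);
        q = next;
    } while (q != 0xC);
    return dp.d;
}
```

Under these assumptions gcd_processor(42, 8) returns 2, the same result as the original algorithm.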
Completing the GCD custom single-purpose processor
• We finished the datapath
• We have a state table for the next state and control logic
– All that’s left is combinational logic design

[Figure: a view inside the controller and datapath. The controller contains next-state and control logic plus a state register; the datapath contains registers and functional units.]

RT-level custom single-purpose processor design
• We often start with a state machine
– Rather than algorithm
– Cycle timing often too central to functionality
• Example
– Bus bridge that converts 4-bit bus to 8-bit bus
– Start with FSMD
– Known as register-transfer (RT) level
– Exercise: complete the design

Problem Specification
A single-purpose processor that converts two 4-bit inputs, arriving one at a time over data_in along with a rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse.

[Figure: Sender, Bridge, and Receiver blocks. Signals: clock, rdy_in, data_in(4), rdy_out, data_out(8)]

FSMD (Bridge)
Inputs: rdy_in: bit; data_in: bit[4];
Outputs: rdy_out: bit; data_out: bit[8]
Variables: data_lo, data_hi: bit[4];

State             Action                                    Transitions
WaitFirst4                                                  rdy_in=0 → WaitFirst4; rdy_in=1 → RecFirst4Start
RecFirst4Start    data_lo=data_in                           → RecFirst4End
RecFirst4End                                                rdy_in=1 → RecFirst4End; rdy_in=0 → WaitSecond4
WaitSecond4                                                 rdy_in=0 → WaitSecond4; rdy_in=1 → RecSecond4Start
RecSecond4Start   data_hi=data_in                           → RecSecond4End
RecSecond4End                                               rdy_in=1 → RecSecond4End; rdy_in=0 → Send8Start
Send8Start        data_out=data_hi & data_lo, rdy_out=1     → Send8End
Send8End          rdy_out=0                                 → WaitFirst4

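RT-level custom single-purpose processor design (cont’)

(a) Controller (Bridge)
State             Datapath configuration        Transitions
WaitFirst4                                      rdy_in=0 → WaitFirst4; rdy_in=1 → RecFirst4Start
RecFirst4Start    data_lo_ld=1                  → RecFirst4End
RecFirst4End                                    rdy_in=1 → RecFirst4End; rdy_in=0 → WaitSecond4
WaitSecond4                                     rdy_in=0 → WaitSecond4; rdy_in=1 → RecSecond4Start
RecSecond4Start   data_hi_ld=1                  → RecSecond4End
RecSecond4End                                   rdy_in=1 → RecSecond4End; rdy_in=0 → Send8Start
Send8Start        data_out_ld=1, rdy_out=1      → Send8End
Send8End          rdy_out=0                     → WaitFirst4

(b) Datapath
[Figure: registers data_hi and data_lo load from data_in(4) under data_hi_ld and data_lo_ld; register data_out loads from data_hi and data_lo under data_out_ld; clk goes to all registers; rdy_in and rdy_out connect to the controller]
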
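The bridge FSMD can be exercised with a small cycle-level model in which each call represents one clock cycle. The sketch assumes that the action of a state happens in the cycle the FSMD is in that state, that data_in is still valid in the cycle after rdy_in rises, and that "data_hi & data_lo" denotes concatenation with data_hi in the upper four bits. The type and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    WAIT_FIRST4, REC_FIRST4_START, REC_FIRST4_END,
    WAIT_SECOND4, REC_SECOND4_START, REC_SECOND4_END,
    SEND8_START, SEND8_END
} bridge_state;

typedef struct {
    bridge_state state;
    uint8_t data_lo, data_hi;   /* 4-bit variables */
    uint8_t data_out;           /* 8-bit output */
    uint8_t rdy_out;
} bridge;

/* One clock cycle: perform the current state's action, then pick the next state. */
void bridge_tick(bridge *b, unsigned rdy_in, uint8_t data_in)
{
    switch (b->state) {
    case WAIT_FIRST4:
        b->state = rdy_in ? REC_FIRST4_START : WAIT_FIRST4; break;
    case REC_FIRST4_START:
        b->data_lo = data_in & 0x0Fu; b->state = REC_FIRST4_END; break;
    case REC_FIRST4_END:
        b->state = rdy_in ? REC_FIRST4_END : WAIT_SECOND4; break;
    case WAIT_SECOND4:
        b->state = rdy_in ? REC_SECOND4_START : WAIT_SECOND4; break;
    case REC_SECOND4_START:
        b->data_hi = data_in & 0x0Fu; b->state = REC_SECOND4_END; break;
    case REC_SECOND4_END:
        b->state = rdy_in ? REC_SECOND4_END : SEND8_START; break;
    case SEND8_START:
        b->data_out = (uint8_t)((b->data_hi << 4) | b->data_lo);
        b->rdy_out = 1; b->state = SEND8_END; break;
    case SEND8_END:
        b->rdy_out = 0; b->state = WAIT_FIRST4; break;
    }
}

/* Sends one low nibble and one high nibble through the bridge using a
   two-cycle rdy_in pulse for each, and returns the resulting data_out. */
uint8_t bridge_send_pair(uint8_t lo, uint8_t hi)
{
    bridge b = { WAIT_FIRST4, 0, 0, 0, 0 };
    bridge_tick(&b, 1, lo);   /* WaitFirst4 sees rdy_in = 1   */
    bridge_tick(&b, 1, lo);   /* RecFirst4Start latches lo    */
    bridge_tick(&b, 0, 0);    /* RecFirst4End sees rdy_in = 0 */
    bridge_tick(&b, 1, hi);   /* WaitSecond4 sees rdy_in = 1  */
    bridge_tick(&b, 1, hi);   /* RecSecond4Start latches hi   */
    bridge_tick(&b, 0, 0);    /* RecSecond4End sees rdy_in = 0 */
    bridge_tick(&b, 0, 0);    /* Send8Start drives data_out   */
    return b.data_out;
}
```

With the low nibble 0xA arriving first and the high nibble 0x5 second, bridge_send_pair(0xA, 0x5) returns 0x5A.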
Optimizing single-purpose processors
• Optimization is the task of making design metric values the best possible
• Optimization opportunities
– original program
– FSMD
– datapath
– FSM

Optimizing the original program
• Analyze program attributes and look for areas of possible improvement
– number of computations
– size of variables
– time and space complexity
– operations used
• multiplication and division are very expensive

Optimizing the original program (cont’)

original program
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }

Replace the subtraction operation(s) with the modulo operation in order to speed up the program.

optimized program
0: int x, y, r;
1: while (1) {
2:   while (!go_i);
     // x must be the larger number
3:   if (x_i >= y_i) {
4:     x=x_i;
5:     y=y_i;
     }
6:   else {
7:     x=y_i;
8:     y=x_i;
     }
9:   while (y != 0) {
10:    r = x % y;
11:    x = y;
12:    y = r;
     }
13:  d_o = x;
   }

Original program, GCD(42, 8): 9 iterations to complete the loop. x and y values evaluated as follows: (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2).
Optimized program, GCD(42, 8): 3 iterations to complete the loop. x and y values evaluated as follows: (42, 8), (8, 2), (2, 0).

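The two loop counts can be reproduced with instrumented versions of both loops. In this sketch the counter tallies every (x, y) pair the loop condition evaluates, including the starting pair and the final pair that ends the loop; that convention is an assumption, chosen because it matches the 9 and 3 quoted for GCD(42, 8). The function names and the pairs parameter are illustrative, and both inputs are assumed positive.

```c
#include <assert.h>

/* Subtraction-based GCD (original program). *pairs receives the number
   of (x, y) pairs evaluated by the loop condition. */
unsigned gcd_subtract(unsigned x, unsigned y, unsigned *pairs)
{
    unsigned evaluated = 1;            /* the starting pair */
    while (x != y) {
        if (x < y)
            y = y - x;
        else
            x = x - y;
        evaluated++;                   /* the pair produced by this pass */
    }
    *pairs = evaluated;
    return x;
}

/* Modulo-based GCD (optimized program), with x kept as the larger number. */
unsigned gcd_modulo(unsigned x_i, unsigned y_i, unsigned *pairs)
{
    unsigned x, y, r, evaluated = 1;
    if (x_i >= y_i) { x = x_i; y = y_i; }
    else            { x = y_i; y = x_i; }
    while (y != 0) {
        r = x % y;
        x = y;
        y = r;
        evaluated++;
    }
    *pairs = evaluated;
    return x;
}
```

For inputs 42 and 8, both functions return 2; gcd_subtract reports 9 evaluated pairs and gcd_modulo reports 3.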
Optimizing the FSMD
• Areas of possible improvements
– merge states
• states with constants on transitions can be eliminated, transition taken is already known
• states with independent operations can be merged
– separate states
• states which require complex operations (a*b*c*d) can be broken into smaller states to reduce hardware size
– scheduling
Optimizing the FSMD (cont.)

original FSMD
int x, y;
States: 1:, 2:, 2-J:, 3: x = x_i, 4: y = y_i, 5:, 6:, 7: y = y - x, 8: x = x - y, 6-J:, 5-J:, 9: d_o = x, 1-J:

Optimization steps
• eliminate state 1 – transitions have constant values
• merge state 2 and state 2J – no loop operation in between them
• merge state 3 and state 4 – assignment operations are independent of one another
• merge state 5 and state 6 – transitions from state 6 can be done in state 5
• eliminate state 5J and 6J – transitions from each state can be done from state 7 and state 8, respectively
• eliminate state 1-J – transition from state 1-J can be done directly from state 9

optimized FSMD
int x, y;
State   Action               Transitions
2:                           !go_i → 2:; go_i → 3:
3:      x = x_i, y = y_i     → 5:
5:                           x<y → 7:; x>y → 8:; otherwise → 9:
7:      y = y - x            → 5:
8:      x = x - y            → 5:
9:      d_o = x              → 2:

Optimizing the datapath
• Sharing of functional units
– one-to-one mapping, as done previously, is not necessary
– if the same operation occurs in different states, the states can share a single functional unit
• Multi-functional units
– ALUs support a variety of operations, so one ALU can be shared among operations occurring in different states
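As an illustration of sharing, the two GCD subtractions (y - x in state 7 and x - y in state 8) never occur in the same state, so one subtractor with multiplexed operands could serve both. The sketch below models that idea in C; the swap select signal and the function names are assumptions made for this example, and the inputs are assumed positive.

```c
#include <assert.h>

/* A single subtractor whose operands pass through two 2x1 multiplexors.
   swap = 0 computes x - y (state 8); swap = 1 computes y - x (state 7). */
unsigned shared_subtract(unsigned x, unsigned y, unsigned swap)
{
    unsigned left  = swap ? y : x;   /* multiplexor on the left operand  */
    unsigned right = swap ? x : y;   /* multiplexor on the right operand */
    return left - right;             /* the one shared functional unit   */
}

/* GCD using only the shared subtractor. */
unsigned gcd_shared(unsigned x, unsigned y)
{
    while (x != y) {
        if (x < y)
            y = shared_subtract(x, y, 1);
        else
            x = shared_subtract(x, y, 0);
    }
    return x;
}
```

The trade-off is one subtractor saved at the cost of two extra multiplexors and one more control signal from the controller.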
Optimizing the FSM
• State encoding
– task of assigning a unique bit pattern to each
state in an FSM
– size of state register and combinational logic
vary
– can be treated as an ordering problem
• State minimization
– task of merging equivalent states into a single
state
• states are equivalent if, for all possible input combinations, the two states generate the same
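To make the state encoding point concrete, the sketch below compares the state register size for two common encodings: a dense binary encoding and a one-hot encoding. One-hot is not mentioned in the slides and is brought in here only for comparison; the function names are hypothetical.

```c
#include <assert.h>

/* Smallest number of state register bits that gives every state a unique
   binary pattern. Assumes n_states is small enough to fit in 31 bits. */
unsigned binary_state_bits(unsigned n_states)
{
    unsigned bits = 0;
    while (bits < 31 && (1u << bits) < n_states)
        bits++;
    return bits;
}

/* One-hot encoding uses one state register bit per state. */
unsigned one_hot_state_bits(unsigned n_states)
{
    return n_states;
}
```

For the 13 states of the GCD controller (0000 through 1100), binary_state_bits(13) returns 4, matching the 4-bit state register Q3 Q2 Q1 Q0, whereas a one-hot encoding would need 13 bits.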
