
ARM Implementation: Datapath Control Unit (FSM)

The document discusses the implementation of an ARM processor including its datapath, control unit, clock scheme, register timing, ALU operations, barrel shifter design, multiplier design, and coprocessor interface. Key points include its use of a 2-phase non-overlapping clock, minimum datapath delay consisting of register read, shift, ALU and write times, carry-lookahead adder, crossbar barrel shifter, iterative low-cost multiplier using the ALU and shifter, and coprocessor interface using CPI, CPA, and CPB signals.

Uploaded by

Uday Kumar
Copyright
© Attribution Non-Commercial (BY-NC)

ARM Implementation

Datapath
Control unit (FSM)

2-phase non-overlapping clock scheme


Most ARMs do not operate with edge-sensitive registers. Instead, the design is based around 2-phase non-overlapping clocks, which are generated internally from a single clock signal. Data movement is controlled by passing the data alternately through latches which are open during phase 1 and latches which are open during phase 2.
[Figure: 2-phase non-overlapping clock; phase 1 and phase 2 within 1 clock cycle]
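The clock scheme can be illustrated with a small discrete-time simulation. This is a sketch under assumptions: the slides do not say how the two phases are generated internally, so the model uses the common cross-coupled NOR arrangement, where each phase is held low until the other phase has been low for `delay` ticks. The function name and the `ticks_per_cycle`/`delay` parameters are hypothetical.

```python
def two_phase_clock(cycles, ticks_per_cycle=10, delay=1):
    """Behavioural model of a cross-coupled-NOR two-phase clock generator.

    Time is discrete (ticks). `delay` is an assumed gate feedback delay in
    ticks; it sets the width of the non-overlap gap between the phases.
    """
    phi1, phi2 = [], []
    for t in range(cycles * ticks_per_cycle):
        # Single input clock: high for the first half of each cycle
        clk = 1 if (t % ticks_per_cycle) < ticks_per_cycle // 2 else 0
        # Each phase sees the other phase as it was `delay` ticks ago
        prev1 = phi1[t - delay] if t >= delay else 0
        prev2 = phi2[t - delay] if t >= delay else 0
        # phase 1 = NOR(not clk, phase 2); phase 2 = NOR(clk, phase 1)
        phi1.append(int(clk == 1 and prev2 == 0))
        phi2.append(int(clk == 0 and prev1 == 0))
    return phi1, phi2


p1, p2 = two_phase_clock(cycles=2)
print("phase 1:", "".join("#" if x else "_" for x in p1))
print("phase 2:", "".join("#" if x else "_" for x in p2))
```

The printed waveforms show a short interval after every edge where both phases are low; this is the non-overlap time that keeps a phase 1 latch and a phase 2 latch from ever being transparent together.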

ARM datapath timing


Register read
The register read buses are dynamic and are precharged during phase 2. During phase 1, the selected registers discharge the read buses, which become valid early in phase 1.

Shift operation
The second operand passes through the barrel shifter.

ALU operation
The ALU has input latches which are open in phase 1, allowing the operands to begin combining in the ALU as soon as they are valid, but they close at the end of phase 1 so that the phase 2 precharge does not get through to the ALU. The ALU processes the operands during phase 2, producing the valid output towards the end of the phase; the result is latched in the destination register at the end of phase 2.

ARM datapath timing (contd)


[Figure: ARM datapath timing across phase 1 and phase 2: register read time (read bus valid), shift time (shift out valid), ALU time (ALU operands latched, ALU out), register write time; precharge invalidates the buses]

Minimum datapath delay = register read time + shifter delay + ALU delay + register write set-up time + phase 2 to phase 1 non-overlap time
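The delay sum above directly bounds the clock rate. A minimal sketch, assuming all delays are given in nanoseconds; the `DatapathDelays` class and the example numbers are hypothetical, chosen only to show the arithmetic, not measured ARM figures.

```python
from dataclasses import dataclass


@dataclass
class DatapathDelays:
    """Hypothetical per-stage delays in nanoseconds."""
    register_read: float
    shift: float
    alu: float
    write_setup: float
    non_overlap: float  # phase 2 to phase 1 non-overlap time


def min_cycle_time(d):
    # The stages are in series within one clock cycle, so their delays add
    return d.register_read + d.shift + d.alu + d.write_setup + d.non_overlap


def max_clock_mhz(d):
    # 1000 ns per microsecond divided by the cycle time gives MHz
    return 1000.0 / min_cycle_time(d)


example = DatapathDelays(register_read=10, shift=8, alu=15, write_setup=2, non_overlap=1)
print(min_cycle_time(example), round(max_clock_mhz(example), 2))
```

Because the terms simply add, shaving time off any single stage (for example the shifter or the ALU carry path) raises the maximum clock frequency, which is why the following slides concentrate on adder and shifter delay.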

The original ARM1 ripple-carry adder

Carry logic: uses a CMOS AOI (AND-OR-INVERT) gate. Even bits use the circuit shown below; odd bits use the dual circuit with inverted inputs and outputs, and with the AND and OR gates swapped around. Worst-case path: 32 gates long.
[Figure: ARM1 ripple-carry adder circuit for one bit (A, Cin, sum)]
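The alternating-polarity trick can be checked with a bit-level model. This is a logical sketch, not a transistor-level one: each carry stage is modelled as a single inverting majority gate (the AOI function), and the names `inv_majority` and `ripple_carry_add` are hypothetical. On even bits the carry arrives true and leaves inverted; on odd bits it arrives inverted, so the operands are inverted as well and, because majority is self-dual, the same inverting stage yields a true carry again. Every bit adds exactly one gate to the carry chain, which gives the 32-gate worst case.

```python
def inv_majority(a, b, c):
    """One inverting carry stage: NOT(majority(a, b, c))."""
    return 1 - ((a & b) | (c & (a | b)))


def ripple_carry_add(a, b, cin=0, width=32):
    """Return (sum, carry_out, carry_chain_stages)."""
    result = 0
    carry = cin  # true polarity entering bit 0
    stages = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if i % 2 == 0:
            # Even bit: carry arrives true, leaves inverted
            c_true = carry
            carry = inv_majority(ai, bi, carry)
        else:
            # Odd bit: carry arrives inverted; invert operands too,
            # so NOT(maj(~a, ~b, ~c)) == maj(a, b, c) is a true carry
            c_true = 1 - carry
            carry = inv_majority(1 - ai, 1 - bi, carry)
        result |= (ai ^ bi ^ c_true) << i
        stages += 1
    if width % 2:
        carry = 1 - carry  # last stage was even, so its output is inverted
    return result, carry, stages


print(ripple_carry_add(0xFFFFFFFF, 1))
```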

ARM2 4-bit carry look-ahead scheme


Carry generate (G) and carry propagate (P) are formed for each 4-bit group, so that Cout[3] = Cin[0].P + G. Uses AOI and alternate AND/OR gates. Worst case: 8 gates long.
[Figure: 4-bit adder logic block: inputs A[3:0], B[3:0] and Cin[0]; outputs sum[3:0], G, P and Cout[3]]

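The group carry equation can be made concrete with a small model. This is a sketch, assuming 4-bit groups across a 32-bit word as on the slide; `group_generate_propagate` and `cla_add` are hypothetical names. Inside each group the sum bits are formed locally, while only the group carry `Cout = G + P.Cin` travels from group to group, so the carry passes through 8 group stages instead of 32 bit stages.

```python
def group_generate_propagate(a, b, lo, size):
    """Group G and P for bits lo .. lo+size-1."""
    G, P = 0, 1
    for i in range(lo, lo + size):  # LSB to MSB within the group
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, p = ai & bi, ai ^ bi
        G = g | (p & G)  # a carry is generated here, or generated below and propagated
        P = P & p        # the group propagates only if every bit propagates
    return G, P


def cla_add(a, b, cin=0, width=32, size=4):
    """Return (sum, carry_out, group_stages_on_carry_path)."""
    result, carry, group_stages = 0, cin, 0
    for lo in range(0, width, size):
        c = carry
        for i in range(lo, lo + size):  # local sum bits inside the group
            ai, bi = (a >> i) & 1, (b >> i) & 1
            result |= (ai ^ bi ^ c) << i
            c = (ai & bi) | (c & (ai ^ bi))
        G, P = group_generate_propagate(a, b, lo, size)
        carry = G | (P & carry)  # Cout[3] = Cin[0].P + G
        group_stages += 1
    return result, carry, group_stages


print(cla_add(0xFFFFFFFF, 1))
```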
The ARM2 ALU logic for one result bit


ALU functions:
- data operations (add, sub, ...)
- address computations for memory accesses
- branch target computations
- bit-wise logical operations

[Figure: ARM2 ALU logic for one result bit: function-select inputs fs, NA bus operand input, carry logic (G, P), ALU bus output]

ARM2 ALU function codes

The ARM6 carry-select adder scheme


Compute the sums of various fields of the word for a carry-in of zero and for a carry-in of one. The final result is selected by using the correct carry-in value to control a multiplexer.

[Figure: carry-select adder: a, b[3:0] + c gives sum[3:0]; higher fields, up to a, b[31:28], are each added for carry-in 0 (+) and carry-in 1 (+1), and multiplexers choose between s and s+1 to give sum[7:4], sum[15:8] and sum[31:16]]

Worst case: O(log2[word width]) gates long

Note: Be careful! Fan-out on some of these gates is high, so direct comparison with previous schemes is not applicable.
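The carry-select idea can be sketched in a few lines. This is a simplified model, assuming the field boundaries shown in the figure (sum[3:0], sum[7:4], sum[15:8], sum[31:16]) and treating each field as a single adder; in the real scheme the wider fields are themselves built from smaller selected blocks. The names `add_field` and `carry_select_add` are hypothetical.

```python
def add_field(a, b, lo, width, cin):
    """Add one field of a and b with a fixed carry-in; return (field_sum, carry_out)."""
    m = (1 << width) - 1
    total = ((a >> lo) & m) + ((b >> lo) & m) + cin
    return total & m, total >> width


def carry_select_add(a, b, cin=0):
    fields = [(0, 4), (4, 4), (8, 8), (16, 16)]  # sum[3:0], sum[7:4], sum[15:8], sum[31:16]
    result, carry = 0, cin
    for lo, width in fields:
        # In hardware both candidates are computed in parallel, before the carry is known
        s0, c0 = add_field(a, b, lo, width, 0)
        s1, c1 = add_field(a, b, lo, width, 1)
        # The multiplexer picks the candidate that matches the real carry-in
        s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << lo
    return result, carry


print(hex(carry_select_add(0x12345678, 0x0FEDCBA9)[0]))
```

The loop is sequential in Python, but only the chain of multiplexer selects is serial in hardware; doubling the field width at each step is what keeps that chain logarithmic in the word width.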

The ARM6 ALU organization


It is not easy to merge the arithmetic and logic functions, so a separate logic unit runs in parallel with the adder, and a multiplexer selects the output.
[Figure: ARM6 ALU organization: A and B operand latches; invert-A and invert-B XOR gates; adder with carry in (Cin); logic functions unit controlled by the function inputs; logic/arithmetic result mux; zero detect; outputs result and the N, Z, C, V flags]
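The organization above can be modelled as operand inversion, an adder and a logic unit side by side, and a result multiplexer. This is a simplified sketch: the function name `arm6_alu`, the string function selects and the flag handling are assumptions, not the ARM6 control encoding. It shows how the invert-A and invert-B XOR gates let one adder perform subtraction (A + NOT B + 1) and reverse subtraction, and how the logic unit with invert-B gives AND NOT (bit clear).

```python
MASK32 = 0xFFFFFFFF


def arm6_alu(a, b, func, invert_a=False, invert_b=False, cin=0):
    """Return (result, N, Z, C, V) for a simplified ARM6-style ALU."""
    # XOR gates on each operand path invert the operand when enabled
    a_op = (a ^ MASK32 if invert_a else a) & MASK32
    b_op = (b ^ MASK32 if invert_b else b) & MASK32
    # Adder and logic unit both work on the same latched operands
    total = a_op + b_op + cin
    logic_out = {"AND": a_op & b_op, "OR": a_op | b_op, "XOR": a_op ^ b_op}
    # Result multiplexer: arithmetic or logic output
    if func == "ADD":
        result = total & MASK32
        c = total >> 32
        # Signed overflow: result sign differs from both operand signs
        v = ((a_op ^ result) & (b_op ^ result)) >> 31 & 1
    elif func in logic_out:
        result = logic_out[func]
        c = v = 0  # simplification for logic functions
    else:
        raise ValueError(f"unknown function {func!r}")
    n = result >> 31
    z = int(result == 0)  # zero detect
    return result, n, z, c, v


print(arm6_alu(5, 3, "ADD", invert_b=True, cin=1))  # 5 - 3 via A + NOT B + 1
```

Reporting C and V as 0 for logic functions is only a modelling shortcut: in ARM, logical data-processing instructions take C from the shifter carry-out and leave V unchanged.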

ARM9 carry arbitration encoding


Carry arbitration adder

A | B | C       | u | v
0 | 0 | 0       | 0 | 0
0 | 1 | unknown | 1 | 0
1 | 0 | unknown | 1 | 0
1 | 1 | 1       | 1 | 1
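The table implies u = A OR B and v = A AND B, with (0, 0) meaning "carry is 0", (1, 1) meaning "carry is 1" and (1, 0) meaning "unknown, take the carry from below". A sketch of how these pairs could be merged, assuming a log-depth prefix tree (Kogge-Stone style); the tree shape and the names `arb_encode`, `arb_combine` and `carry_arbitration_add` are assumptions, not necessarily the ARM9 circuit. The combine rule follows from the table: a known upper pair wins, an unknown one defers to the lower pair.

```python
def arb_encode(a_bit, b_bit):
    """(u, v) per the table: (0,0) carry 0, (1,1) carry 1, (1,0) unknown."""
    return a_bit | b_bit, a_bit & b_bit


def arb_combine(hi, lo):
    """Merge a more significant span `hi` with the span `lo` below it."""
    u_hi, v_hi = hi
    u_lo, v_lo = lo
    # If hi is known, keep it; if hi is unknown (1, 0), take lo
    return v_hi | (u_hi & u_lo), v_hi | (u_hi & v_lo)


def carry_arbitration_add(a, b, cin=0, width=32):
    nodes = [arb_encode((a >> i) & 1, (b >> i) & 1) for i in range(width)]
    dist = 1
    while dist < width:  # log2(width) levels of combining
        nodes = [arb_combine(nodes[i], nodes[i - dist]) if i >= dist else nodes[i]
                 for i in range(width)]
        dist *= 2
    # Fold in the known carry-in; every carry out is now known (u == v)
    carries = [arb_combine(n, (cin, cin))[1] for n in nodes]
    result = 0
    for i in range(width):
        c_in = cin if i == 0 else carries[i - 1]
        result |= (((a >> i) & 1) ^ ((b >> i) & 1) ^ c_in) << i
    return result, carries[-1]


print(carry_arbitration_add(0x12345678, 0x0FEDCBA9))
```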

The cross-bar switch barrel shifter


Shifter delay is critical, since it contributes directly to the datapath cycle time.
Cross-bar switch matrix (32 x 32)
Principle for a 4 x 4 matrix:

[Figure: 4 x 4 cross-bar switch: inputs in[3] to in[0], outputs out[0] to out[3]; switch diagonals labelled right 3, right 2, right 1, no shift, left 1, left 2, left 3]

The cross-bar switch barrel shifter (contd)


Precharged logic is used, so each switch is a single NMOS transistor.
Precharging sets all outputs to logic 0, so those which are not connected to any input during switching remain at 0, giving the zero filling required by the shift semantics.
For rotate right, the right-shift diagonal is enabled together with the complementary shift-left diagonal (e.g., 'right 1' with 'left 3').
Arithmetic shift right uses sign extension: separate logic decodes the shift amount and discharges those outputs appropriately.
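The switch-matrix behaviour can be modelled by listing which (output, input) switches are closed for each shift. A sketch under assumptions: `crossbar_shift` and the shift-type strings are hypothetical names; outputs start at 0 to mimic the precharge, a closed switch connects one input to one output, and arithmetic shift right is modelled as a logical right shift followed by forcing the vacated outputs to the sign bit, standing in for the separate sign-extension logic.

```python
def crossbar_shift(value, kind, amount, width=32):
    """Shift `value` through a width x width cross-bar model."""
    if not 0 <= amount < width:
        raise ValueError("amount must be in [0, width)")
    bits = [(value >> i) & 1 for i in range(width)]
    if kind == "LSL":  # 'left amount' diagonal
        switches = [(j, j - amount) for j in range(amount, width)]
    elif kind in ("LSR", "ASR"):  # 'right amount' diagonal
        switches = [(j, j + amount) for j in range(width - amount)]
    elif kind == "ROR":  # right diagonal plus the complementary left diagonal
        right = [(j, j + amount) for j in range(width - amount)]
        left = [(j, j - (width - amount)) for j in range(width - amount, width)]
        switches = right + left
    else:
        raise ValueError(f"unknown shift {kind!r}")
    out = [0] * width  # precharged: unconnected outputs stay 0 (zero fill)
    for j, i in switches:
        out[j] |= bits[i]
    if kind == "ASR" and bits[-1]:
        # Separate logic fills the vacated top outputs with the sign bit
        for j in range(width - amount, width):
            out[j] = 1
    return sum(bit << j for j, bit in enumerate(out))


print(bin(crossbar_shift(0b0011, "ROR", 1, width=4)))  # 'right 1' + 'left 3'
```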

Multiplier design


All ARM processors apart from the first prototype have included hardware support for integer multiplication. Two styles of multiplier have been used:
- Older ARM cores include low-cost multiplication hardware that supports only the 32-bit result multiply and multiply-accumulate instructions.
- Recent ARM cores have high-performance multiplication hardware and support the 64-bit result multiply and multiply-accumulate instructions.

The low-cost support uses the main datapath iteratively, employing the barrel shifter and ALU to generate a 2-bit product in each clock cycle. Early-termination logic stops the iterations when there are no more ones in the multiply register. The multiplier employs a modified Booth's algorithm to produce the 2-bit product. This multiplication uses the existing shifter and ALU; the additional hardware it requires is limited to a dedicated two-bits-per-cycle shift register for the multiplier and a few gates for the Booth's algorithm control logic.
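The iterative scheme can be sketched with radix-4 (modified) Booth recoding, which consumes two multiplier bits per cycle and turns each pair into a digit in {-2, -1, 0, +1, +2}, so each cycle needs only a shift by 0 or 1 extra place and one add or subtract. This is a model under assumptions: `booth_multiply` is a hypothetical name, and early termination is simplified to "stop when no ones remain in the multiplier register and no recoding bit is pending"; the exact ARM control logic may differ.

```python
def booth_multiply(a, b, width=32):
    """32-bit result of a * b plus the number of 2-bit iterations used."""
    mask = (1 << width) - 1
    a &= mask
    m = b & mask  # multiplier register, consumed 2 bits per cycle
    # Bits (b[2i+1], b[2i], b[2i-1]) -> Booth digit
    recode = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
              0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    acc, prev, shift, cycles = 0, 0, 0, 0
    # Early termination: stop once no ones are left to recode
    while (m != 0 or prev != 0) and shift < width:
        pair = m & 0b11
        digit = recode[(pair << 1) | prev]
        # Barrel shifter supplies a << shift (times 1 or 2); ALU adds or subtracts
        acc = (acc + digit * (a << shift)) & mask
        prev = pair >> 1
        m >>= 2
        shift += 2
        cycles += 1
    return acc, cycles


print(booth_multiply(7, 6))
```

A small multiplier such as 6 finishes in 2 iterations instead of 16, which is the benefit early termination brings to typical code where one operand is small.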

Carry-propagate (a) and carry-save (b) adder structures

[Figure: (a) carry-propagate and (b) carry-save adder structures, each a row of four full-adder cells with Cin and Cout connections]
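The difference between the two structures can be shown arithmetically. In a carry-save adder each full adder's carry is kept as a separate output word instead of feeding the next cell, so three numbers are reduced to two (a partial sum and a partial carry) with no carry chain; a single carry-propagate add is needed only at the end. A minimal sketch; `carry_save_add` and `sum_with_carry_save` are hypothetical names.

```python
def carry_save_add(x, y, z):
    """One row of full adders with no carry chain: x + y + z == s + c."""
    s = x ^ y ^ z                             # per-bit sum outputs
    c = ((x & y) | (x & z) | (y & z)) << 1    # per-bit carry outputs, moved up one place
    return s, c


def sum_with_carry_save(values):
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)  # no carry propagation inside this loop
    return s + c                        # single carry-propagate add at the end


print(carry_save_add(5, 6, 7), sum_with_carry_save([1, 2, 3, 4, 5]))
```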

ARM high-speed multiplier organization

[Figure: ARM high-speed multiplier organization: registers Rs and Rm; Rs shifted right 8 bits/cycle; carry-save adders produce a partial sum and partial carry, which are rotated 8 bits/cycle; initialization for MLA; the ALU adds the partials at the end]
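The organization can be sketched by combining the carry-save idea with 8 multiplier bits per cycle. A model under assumptions: `csa` and `high_speed_multiply` are hypothetical names; the hardware rotates the partial sum and carry by 8 bits each cycle, whereas this sketch shifts the multiplicand instead (same 32-bit result); MLA initialization is modelled by preloading the accumulator into the partial sum.

```python
MASK32 = 0xFFFFFFFF


def csa(x, y, z):
    """3:2 carry-save step: returns (partial sum, partial carry), both 32-bit."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s & MASK32, c & MASK32


def high_speed_multiply(rm, rs, acc=0, bits_per_cycle=8):
    """32-bit result of rm * rs (+ acc for an MLA-style accumulate), plus cycles."""
    rm &= MASK32
    rs &= MASK32
    psum, pcarry = acc & MASK32, 0  # MLA: accumulator preloaded into the partials
    shift, cycles = 0, 0
    while rs != 0 and shift < 32:  # early termination once Rs has no ones left
        for i in range(bits_per_cycle):
            if (rs >> i) & 1:
                pp = (rm << (shift + i)) & MASK32  # partial product for this Rs bit
                psum, pcarry = csa(psum, pcarry, pp)
        rs >>= bits_per_cycle
        shift += bits_per_cycle
        cycles += 1
    return (psum + pcarry) & MASK32, cycles  # final ALU add of the partials


print(high_speed_multiply(1234, 5678), high_speed_multiply(3, 4, acc=10))
```

Keeping the running total as a sum/carry pair means no carry has to ripple across the word inside the loop; the only full carry-propagate addition is the one performed by the ALU at the end.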

ARM6 register cell circuit

[Figure: register cell with a write port from the ALU bus and read ports onto the A bus (read A) and B bus (read B)]

ARM register bank floorplan

[Figure: register bank floorplan: A bus and B bus read decoders, write decoders, register cells, Vdd and Vss rails, ALU bus, A bus, B bus, PC bus, INC bus, PC]

The ARM coprocessor interface

Coprocessor architecture
- Supports up to 16 coprocessors
- Each coprocessor can have up to 16 registers
Coprocessor instructions
- Internal operations on coprocessor registers
- Load/store registers from/to memory
- Move data to/from an ARM register
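The three instruction classes correspond to the classic ARM coprocessor instructions: CDP (internal coprocessor operation), LDC/STC (load/store coprocessor registers) and MCR/MRC (move to/from an ARM register). The limits of 16 coprocessors and 16 registers follow from the 4-bit coprocessor-number field (bits 11:8) and the 4-bit coprocessor register fields (CRn, CRd, CRm). A minimal decoder sketch; `decode_coprocessor_instruction` is a hypothetical name, and later additions such as the unconditional CDP2/LDC2 forms and MCRR/MRRC are deliberately ignored.

```python
def decode_coprocessor_instruction(word):
    """Classify a 32-bit ARM word; return (mnemonic, coprocessor_number) or None.

    Coprocessor register fields CRn (19:16), CRd (15:12) and CRm (3:0) are
    also 4 bits wide, hence up to 16 registers per coprocessor.
    """
    cp_num = (word >> 8) & 0xF  # 4-bit field: coprocessors 0..15
    if ((word >> 25) & 0b111) == 0b110:
        # LDC/STC: bit 20 is the load (1) / store (0) bit
        return ("LDC" if (word >> 20) & 1 else "STC"), cp_num
    if ((word >> 24) & 0xF) == 0b1110:
        if ((word >> 4) & 1) == 0:
            return "CDP", cp_num  # internal operation on coprocessor registers
        # Bit 20 set: move from coprocessor to ARM register (MRC)
        return ("MRC" if (word >> 20) & 1 else "MCR"), cp_num
    return None  # not a coprocessor instruction


print(decode_coprocessor_instruction(0xEE100F10))  # MRC p15, 0, R0, c0, c0, 0
```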

The ARM coprocessor interface (contd)

ARM7TDMI interface
- cpi (from ARM to all coprocessors): ARM has identified a coprocessor instruction and wishes to execute it.
- cpa (from coprocessors to ARM): coprocessor absent, i.e., there is no coprocessor present that is able to execute the current instruction.
- cpb (from coprocessors to ARM): coprocessor busy, i.e., it cannot execute the instruction yet.

The ARM coprocessor interface (contd)


- ARM decides not to execute it (e.g., the condition is not satisfied): it does not assert cpi, and the instruction is discarded.
- ARM decides to execute it (asserts cpi), but the coprocessor is absent (cpa active): ARM takes the undefined instruction trap.
- ARM decides to execute it (asserts cpi), and the coprocessor is present (cpa inactive) but busy (cpb active): ARM busy-waits until cpb is inactive, stalling the instruction stream.
- ARM decides to execute it (asserts cpi), and the coprocessor accepts it (cpa and cpb inactive): both sides commit to complete the instruction.
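The four cases can be expressed as a small decision function. A sketch under assumptions: `coprocessor_handshake` and its outcome strings are hypothetical; cpa is treated as fixed for the instruction while cpb is sampled once per cycle; and the "still waiting" result exists only because the simulation has a finite list of cpb samples.

```python
def coprocessor_handshake(execute, cpa, cpb_samples):
    """Outcome of issuing one coprocessor instruction.

    execute     -- ARM has decided to execute it (e.g. condition passed)
    cpa         -- coprocessor absent signal
    cpb_samples -- cpb value seen in each successive cycle
    Returns (outcome, busy_wait_cycles).
    """
    if not execute:
        return "discarded", 0  # cpi is not asserted
    # From here on ARM asserts cpi
    if cpa:
        return "undefined instruction trap", 0
    waits = 0
    for cpb in cpb_samples:
        if not cpb:
            return "committed", waits  # cpa and cpb both inactive
        waits += 1  # busy-wait, stalling the instruction stream
    return "still waiting", waits  # model only: ran out of samples


print(coprocessor_handshake(True, False, [1, 1, 0]))
```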
