Es - Mod 3

The document provides an overview of ARM processor architecture, detailing the ARM7 and ARM9 models, their pipeline architectures, and features such as instruction sets and memory organization. It explains the ARM family nomenclature, processor modes, and development tools used for programming ARM systems. Additionally, it covers types of ARM instructions, including data processing, arithmetic, logical, and load/store instructions.

Uploaded by

appukuttan7100
MODULE III ECT342 EMBEDDED SYSTEMS

3.1.ARM PROCESSOR ARCHITECTURE

ARM stands for Advanced RISC Machines (originally Acorn RISC Machine). The architecture was
originally developed by Acorn Computers Ltd of Cambridge, England, and the ARM7 followed in
the early 1990s. ARM7 supersedes the ARM6 processor and has a 3-stage pipeline architecture,
while ARM9 has a 5-stage pipeline architecture. ARM is an industry-standard architecture and
has been licensed to many semiconductor manufacturers all over the world. ARM is the market
leader in low-power and cost-sensitive embedded applications, offering high performance for
very low power consumption and price. The architecture is based on Reduced Instruction Set
Computer (RISC) principles. ARM also develops software tools, boards, debug tools, application
software, peripherals, etc. Several extensions are available for different applications, such as
the Thumb instruction set and Java acceleration.

3.1.1ARM NOMENCLATURE

ARM7TDMI is a member of the ARM family. The letters TDMI stand for:

• T: Thumb instruction set
• D: Debug interface
• M: Multiplier (hardware)
• I: In-circuit emulator
It is the most widely used ARM version.

3.1.2.ARM PROCESSOR CORE,CPU CORE AND ARM MICROCONTROLLER

1.Processor Core (registers, ALU, control unit)

The engine that fetches instructions and executes them.

E.g. ARM7TDMI, ARM9TDMI, ARM9E-S

2.CPU Core

Consists of the ARM processor core and some tightly coupled function blocks, such as cache and
memory management blocks. E.g. ARM710T, ARM720T, ARM740T, ARM920T,
ARM922T, ARM940T, ARM946E-S and ARM966E-S

3.ARM Microcontroller

Consists of an ARM CPU core and additional I/O peripherals. E.g. LPC2378

3.1.2ARCHITECTURAL INHERITANCE

ARM FAMILY

ARM (CPU CORES)

3.3.ARM ORGANIZATION AND IMPLEMENTATION

3.3.1ARM 3 STAGE PIPELINE ARCHITECTURE

3.3.1.1ARM7TDMI FEATURES

• 32-bit processor core

• 37 registers, each 32 bits wide
• 16 registers are available to the user
• Supports 8/16/32-bit data formats
• 3-stage pipelined architecture
• Some registers are shared between operating modes
• Based on the Von Neumann architecture
• High instruction throughput and impressive real-time interrupt response
• Provides hardware extensions for in-circuit debugging
• Provides the 16-bit Thumb instruction set for narrow memory operations

Memory Access

• The ARM7 is based on a Von Neumann, load/store architecture, i.e.

– A single 32-bit data bus is used for both instructions and data.
– Only the load/store instructions (and SWP) access memory.
• Memory is addressed as a 32-bit address space.
• Data can be 8-bit bytes, 16-bit half-words or 32-bit words, and memory may be seen as a
byte line folded into 4-byte words.
3.3.1.2.ARM7 3 STAGE PIPELINE ARCHITECTURE

• REGISTER BANK: 37 registers

• BARREL SHIFTER: can shift or rotate one operand by any number of bits
• MULTIPLIER UNIT AND 32-BIT ALU
• ADDRESS REGISTER AND INCREMENTER: select and hold all memory
addresses and generate sequential addresses when required
• DATA REGISTERS: hold data passing to and from the memory
• INSTRUCTION DECODER AND CONTROL LOGIC

3.3.1.3 TYPES OF 3 STAGE PIPELINE

THREE STAGE PIPELINE SINGLE CYCLE OPERATION

• Fetch
The instruction is fetched from memory and placed in the instruction pipeline.
• Decode
The instruction is decoded and the datapath control signals are prepared for the next cycle.
• Execute
The instruction owns the datapath: the register bank is read, an operand is shifted, the ALU
result is generated and written back into the destination register.
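
The overlap of the three stages can be sketched with a small Python model. This is only an illustration under stated assumptions: the function name `pipeline_schedule` and the instruction stream are hypothetical, and every instruction is assumed to spend exactly one cycle in each stage (the single-cycle case only).

```python
STAGES = ("Fetch", "Decode", "Execute")

def pipeline_schedule(instructions):
    """Return {cycle: {stage: instruction}} for an ideal 3-stage pipeline.

    Assumes one cycle per stage and no stalls (single-cycle operation).
    """
    schedule = {}
    for index, instr in enumerate(instructions):
        for offset, stage in enumerate(STAGES):
            cycle = index + offset + 1           # cycles are numbered from 1
            schedule.setdefault(cycle, {})[stage] = instr
    return schedule

program = ["ADD", "SUB", "ORR", "AND"]           # hypothetical instruction stream
schedule = pipeline_schedule(program)
for cycle in sorted(schedule):
    print(cycle, schedule[cycle])

total_cycles = max(schedule)                     # n + 2 cycles for n instructions
```

Once the pipeline is full, one instruction completes in every cycle, so n instructions need n + 2 cycles in this idealised model.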

3 STAGE PIPELINE MULTICYCLE OPERATION



3.3.1.4 DIFFERENT STATES

• When the processor is executing in ARM state:


– All instructions are 32 bits wide
– All instructions must be word aligned
• When the processor is executing in Thumb state:
– All instructions are 16 bits wide
– All instructions must be halfword aligned
• When the processor is executing in Jazelle state:
– All instructions are 8 bits wide
– Processor performs a word access to read 4 instructions at once
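
The width and alignment rules above can be expressed as a short Python check. This is a minimal sketch: the names `INSTRUCTION_BYTES`, `is_aligned` and `instructions_per_word` are hypothetical helpers, not part of any ARM tool.

```python
# Instruction width in bytes for each processor state, as listed above.
INSTRUCTION_BYTES = {"ARM": 4, "Thumb": 2, "Jazelle": 1}

def is_aligned(state, address):
    """True if `address` is a legal instruction address in `state`."""
    return address % INSTRUCTION_BYTES[state] == 0

def instructions_per_word(state):
    """How many instructions one 32-bit (4-byte) word access delivers."""
    return 4 // INSTRUCTION_BYTES[state]

print(is_aligned("ARM", 0x8002))         # halfword address, not word aligned
print(instructions_per_word("Jazelle"))  # one word holds four 8-bit instructions
```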
3.3.1.5 ARM STATE REGISTER SET

• Total 37 registers (bank switched)

– 31 general purpose registers
– 6 status registers
• Addressable registers in a given mode
– General purpose registers: R0-R12
– Stack pointer: R13
– Link register: R14
– PC: R15
• Fixed mapping based on processor mode
– Mapping of the 37 registers into 17 registers
– Almost every operating mode has its own private:
• Link register: R14
• Stack pointer: R13
ARM REGISTER SET

• GPR: general purpose registers (32 bit), r0 to r12

- Hold either data or an address. They are identified with the letter “r” prefixed to the
register number.
• SPR: special purpose registers

r13: stack pointer (SP), which stores the head of the stack in the current processor mode.
r14: link register (LR), where the core puts the return address whenever it calls a subroutine.
r15: program counter (PC), which contains the address of the next instruction to be fetched by the processor.

r13 and r14 can also be used as GPRs, which is useful because these registers are banked during a
processor mode change. r0 to r12 are orthogonal, meaning that any instruction that can be applied to
r0 can equally well be applied to any of the other registers. In addition to the 16 data registers
there are the cpsr (current program status register) and the spsr (saved program status register).
Banked registers

Out of the 37 registers, 20 are hidden from a program at different times. They are
available only when the processor is in a particular mode. The banked registers of a particular
mode are denoted by an underscore followed by the mode mnemonic,
e.g. abort mode has the banked registers r13_abt, r14_abt and spsr_abt.
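
The banking scheme can be modelled in a few lines of Python to check the register counts quoted above. This is a sketch, not a description of the hardware: `physical_register` is a hypothetical helper, and the detail that FIQ mode additionally banks r8 to r12 is taken from the standard ARM7TDMI register map rather than from these notes.

```python
EXCEPTION_MODES = ("fiq", "irq", "svc", "abt", "und")
ALL_MODES = ("usr", "sys") + EXCEPTION_MODES

def physical_register(mode, name):
    """Map a register name visible in `mode` to the physical register used."""
    if name == "cpsr":
        return "cpsr"                            # one CPSR shared by all modes
    if name == "spsr":
        if mode not in EXCEPTION_MODES:
            raise ValueError("user and system modes have no spsr")
        return "spsr_" + mode
    number = int(name[1:])                       # "r13" -> 13
    banked = (mode in EXCEPTION_MODES and number in (13, 14)) or \
             (mode == "fiq" and 8 <= number <= 12)   # assumption: standard FIQ banking
    return f"{name}_{mode}" if banked else name

visible = ["r%d" % n for n in range(16)] + ["cpsr", "spsr"]
physical = set()
for mode in ALL_MODES:
    for name in visible:
        if name == "spsr" and mode not in EXCEPTION_MODES:
            continue
        physical.add(physical_register(mode, name))

print(len(physical))                             # total physical registers
print(physical_register("abt", "r13"))           # banked stack pointer of abort mode
```

User and system modes map to the same physical registers, which is why system mode needs no banked registers of its own.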

ARM State Register Set (Cont.)

ARM Saves previous PSR into SPSRs during processor mode change
CPSR/SPSR(PROGRAM STATUS REGISTER)

• Mode bits ‘M’

– Select the processor mode
• Condition code flags
– N: negative result from ALU
– Z: zero result from ALU
– C: ALU operation carried out
– V: ALU operation overflowed
• Interrupt disable bits

– I = 1: IRQ disabled
– F = 1: FIQ disabled

• T bit:

– ‘0’: ARM state

– ‘1’: Thumb state
CPSR Mode Bits

• 10000: User mode
• 10001: FIQ mode
• 10010: IRQ mode
• 10011: Supervisor mode
• 10111: Abort mode
• 11011: Undefined mode
• 11111: System mode
• Only these mode bit settings are valid; programming any other combination into
the CPSR leads to UNPREDICTABLE results. Other bits are to be left unchanged.
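
A CPSR value can be decoded into the fields described above with a short Python function. This is a sketch under stated assumptions: `decode_cpsr` is a hypothetical helper, and the bit positions (N, Z, C, V in bits 31 to 28, I in bit 7, F in bit 6, T in bit 5, mode in bits 4 to 0) follow the standard ARM7TDMI layout, which these notes do not spell out.

```python
MODES = {
    0b10000: "User", 0b10001: "FIQ", 0b10010: "IRQ", 0b10011: "Supervisor",
    0b10111: "Abort", 0b11011: "Undefined", 0b11111: "System",
}

def decode_cpsr(value):
    """Split a 32-bit CPSR value into its named fields."""
    mode_bits = value & 0x1F
    if mode_bits not in MODES:
        raise ValueError("invalid mode bits: behaviour is UNPREDICTABLE")
    return {
        "N": (value >> 31) & 1,                  # negative
        "Z": (value >> 30) & 1,                  # zero
        "C": (value >> 29) & 1,                  # carry
        "V": (value >> 28) & 1,                  # overflow
        "I": (value >> 7) & 1,                   # 1 = IRQ disabled
        "F": (value >> 6) & 1,                   # 1 = FIQ disabled
        "T": (value >> 5) & 1,                   # 0 = ARM state, 1 = Thumb state
        "mode": MODES[mode_bits],
        "privileged": MODES[mode_bits] != "User",
    }

print(decode_cpsr(0x600000D3))
```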

3.3.1.6ARM PROCESSOR MODES/ARM PROGRAMMERS MODEL



ARM PROCESSOR MODES TYPES

• Six privileged modes and one non-privileged mode.
• Privileged modes allow full read/write access to the cpsr.
• Non-privileged mode only allows read access to the control field in the cpsr but still
allows read/write access to the condition flags.
• The privileged modes are abort, fast interrupt request, interrupt request, supervisor,
system and undefined.
• The non-privileged mode is user mode.
OR

ARM7TDMI has 7 modes of operation:

User (usr): Normal ARM program execution state

Class 1: Exception caused mode change
Abort (abt): Entered after a data or instruction prefetch abort
Undefined (und): Entered when an undefined instruction is executed
Class 2: Interrupt caused mode change
FIQ (fiq): Fast interrupt request
IRQ (irq): General-purpose interrupt handling
Class 3: Software interrupt
Supervisor (svc): Protected mode for the operating system
System (sys): A privileged user mode for the operating system

OR

• The ARM has six privileged operating modes:

– FIQ (entered when a high priority (fast) interrupt is raised)
– IRQ (entered when a low priority (normal) interrupt is raised)
– Supervisor (entered on reset and when a Software Interrupt instruction is
executed)
– Abort (used to handle memory access violations)
– Undef (used to handle undefined instructions)
– System (privileged mode using the same registers as user mode)
• The ARM architecture adds a seventh mode:
– User (unprivileged mode under which most tasks run)
3.3.1.7 EXCEPTIONS/INTERRUPTS OF ARM



Exception Handling and the Vector Table

• When an exception occurs, the core:

– Copies CPSR into SPSR_<mode>
– Sets appropriate CPSR bits, including the interrupt disable flags if appropriate
– Maps in appropriate banked registers
– Stores the “return address” in LR_<mode>
– Sets PC to vector address
• To return, the exception handler needs to:
– Restore CPSR from SPSR_<mode>
– Restore PC from LR_<mode>
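
The entry and return sequence above can be sketched as a Python model of the processor state. Assumptions to note: the dictionary `cpu` and the helpers `enter_exception` and `return_from_exception` are hypothetical, the vector addresses and mode encodings are the standard ARM ones rather than values given in these notes, and the per-exception adjustment of the return address is left out for brevity.

```python
VECTORS = {   # exception -> (mode entered, vector address); standard ARM values
    "reset": ("svc", 0x00), "undefined": ("und", 0x04), "swi": ("svc", 0x08),
    "prefetch_abort": ("abt", 0x0C), "data_abort": ("abt", 0x10),
    "irq": ("irq", 0x18), "fiq": ("fiq", 0x1C),
}
MODE_BITS = {"svc": 0b10011, "und": 0b11011, "abt": 0b10111,
             "irq": 0b10010, "fiq": 0b10001}
I_BIT, F_BIT, T_BIT = 1 << 7, 1 << 6, 1 << 5

def enter_exception(cpu, kind, return_address):
    mode, vector = VECTORS[kind]
    cpu["spsr_" + mode] = cpu["cpsr"]            # copy CPSR into SPSR_<mode>
    # New mode bits, ARM state (T = 0), IRQ disabled.
    cpsr = (cpu["cpsr"] & ~0x1F & ~T_BIT) | MODE_BITS[mode] | I_BIT
    if kind in ("reset", "fiq"):
        cpsr |= F_BIT                            # FIQ is disabled only on reset and FIQ
    cpu["cpsr"] = cpsr
    cpu["lr_" + mode] = return_address           # store return address in LR_<mode>
    cpu["pc"] = vector                           # set PC to the vector address
    return mode

def return_from_exception(cpu, mode):
    cpu["cpsr"] = cpu["spsr_" + mode]            # restore CPSR from SPSR_<mode>
    cpu["pc"] = cpu["lr_" + mode]                # restore PC from LR_<mode>

cpu = {"cpsr": 0x10, "pc": 0x8000}               # user mode, ARM state
mode = enter_exception(cpu, "irq", 0x8004)
print(mode, hex(cpu["cpsr"]), hex(cpu["pc"]))
```

Banked registers are modelled here simply as separate dictionary keys such as `lr_irq`, which mirrors the r14_irq naming used for banked registers.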
Modes/Register Usage

Exceptions change the processor mode.

• Each mode is associated with its own set of registers

• Separate: stack pointer (r13), link register (r14) and SPSR

• Some registers are shared with other modes

3.3.2 ARM 5 STAGE PIPELINE ARCHITECTURE(ARM9 TDMI)

3.3.2.1ARM 9 FEATURES

3.3.2.2ARM 9 5 STAGE PIPELINE ARCHITECTURE

ARM 7 AND ARM 9 COMPARISON

5 STAGE PIPELINE

• Fetch

The instruction is fetched from memory and placed in the instruction pipeline.
• Decode

The instruction is decoded and register operands are read from the register file.
• Execute

An operand is shifted and the ALU result is generated. If the instruction is a load or store, the
memory address is computed in the ALU.
• Buffer/data

Data memory is accessed if required. Otherwise the ALU result is simply buffered for one
clock cycle to give the same pipeline flow for all instructions.
• Write back

The results generated by the instruction are written back to the register file, including any data
loaded from memory.
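
The work done in each of the five stages can be traced with a small functional model in Python. This is a sketch only: `run_instruction`, the dictionary-based instruction format and the two example instructions are hypothetical, and the model walks one instruction at a time instead of overlapping several.

```python
def run_instruction(instr, regs, memory):
    """Walk one instruction through the five stages and return a trace."""
    trace = [("Fetch", instr["text"])]
    # Decode: register operands are read from the register file.
    base = regs[instr["rn"]]
    trace.append(("Decode", base))
    # Execute: ALU result, or the memory address for a load.
    alu_result = (base + instr["imm"]) & 0xFFFFFFFF
    trace.append(("Execute", alu_result))
    # Buffer/data: access data memory if required, else just buffer the result.
    value = memory[alu_result] if instr["op"] == "LDR" else alu_result
    trace.append(("Buffer/data", value))
    # Write back: the result (or loaded data) goes to the register file.
    regs[instr["rd"]] = value
    trace.append(("Write back", instr["rd"]))
    return trace

regs = {"r0": 0, "r1": 0x100}
memory = {0x104: 99}
add = {"text": "ADD r0,r1,#4", "op": "ADD", "rd": "r0", "rn": "r1", "imm": 4}
ldr = {"text": "LDR r0,[r1,#4]", "op": "LDR", "rd": "r0", "rn": "r1", "imm": 4}

add_trace = run_instruction(add, regs, memory)
r0_after_add = regs["r0"]
ldr_trace = run_instruction(ldr, regs, memory)
r0_after_ldr = regs["r0"]
print(add_trace)
print(ldr_trace)
```

Both instructions pass through the same five steps; the ADD simply carries its ALU result through the Buffer/data step unchanged.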

3.3.2.3.POST INDEXING, PRE INDEXING AND PRE INDEXING WITH WRITE BACK ADDRESSING MODES
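
The three addressing modes named in this heading differ in which address is used for the access and whether the base register is updated. The Python sketch below models a load under each mode; the function `load` and its argument names are hypothetical, and the assembler forms in the comments are the usual ARM syntax, not taken from these notes.

```python
def load(regs, memory, rd, rn, offset, mode):
    """Model LDR with a base register `rn` and an immediate `offset`."""
    base = regs[rn]
    if mode == "pre":                  # LDR rd,[rn,#offset]   base unchanged
        address = base + offset
    elif mode == "pre_writeback":      # LDR rd,[rn,#offset]!  base = base + offset
        address = base + offset
        regs[rn] = address
    elif mode == "post":               # LDR rd,[rn],#offset   access at base, then update
        address = base
        regs[rn] = base + offset
    else:
        raise ValueError("unknown addressing mode")
    regs[rd] = memory[address]
    return address

memory = {0x100: 11, 0x104: 22}
for mode in ("pre", "pre_writeback", "post"):
    regs = {"r0": 0, "r1": 0x100}
    load(regs, memory, "r0", "r1", 4, mode)
    print(mode, regs)
```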

3.3.2.4 ARM MEMORY ORGANIZATION

• Little Endian
In little endian ordering, bytes of increasing significance are stored at increasing addresses
in memory.
• Big Endian
In big endian ordering, bytes of decreasing significance are stored at increasing addresses
in memory.
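
The two byte orderings can be seen directly with Python's built-in `int.to_bytes`. This is a small illustration; the word value 0x12345678 is an arbitrary example.

```python
word = 0x12345678

# Byte at the lowest address comes first in each result.
little = word.to_bytes(4, "little")   # least significant byte at lowest address
big = word.to_bytes(4, "big")         # most significant byte at lowest address

print(little.hex())   # 78563412
print(big.hex())      # 12345678
```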

3.1.3.ARM DEVELOPMENT TOOLS

The ARM C compiler

The ARM C compiler is specifically designed to optimize software running on ARM
processors. A compiler is a program that translates source code from
a high-level programming language to a lower-level language (e.g. assembly
language, object code or machine code) to create an executable program. The ARM C compiler is
compliant with the ANSI (American National Standards Institute) standard for C and is supported
by the appropriate library of standard functions. It uses the ARM Procedure Call Standard for all
externally available functions. The compiler can also produce Thumb code.

The ARM assembler

is a full macro assembler which produces ARM object format output that can be linked with output
from the C compiler. An assembler is a program that takes basic computer instructions and converts
them into a pattern of bits that the computer's processor can use to perform its basic operations.
Assembly source language is near machine level, with most assembly instructions
translating into single ARM (or Thumb) instructions.

Linker

Takes one or more object files and combines them into an executable program. It resolves
symbolic references between the object files and extracts object modules from libraries as
needed by the program. It can assemble the various components of the program in a number
of different ways, depending on whether the code is to run in RAM or ROM. Normally the
linker includes debug tables in the output file. If the object files were compiled with full
debug information, this will include full symbolic debug tables. The linker can also produce
object library modules that are not executable but are ready for efficient linking with
object files in the future.

ARM symbolic debugger

is a front-end interface to assist in debugging programs running either under emulation
or remotely on a target system such as the ARM development board. The remote system
must support the appropriate remote debug protocols, either via a serial line or through a
JTAG test interface, which allows debugging of a system where the processor core is embedded
within an application-specific system chip. At a more sophisticated level ARMsd supports full
source level debugging, allowing the C programmer to debug a program using the source file to
specify breakpoints and using variable names from the original program.

ARMulator (ARM emulator)

is a suite of programs that models the behaviour of various ARM processor cores in
software on a host system. The ARMulator allows an ARM program developed using the
C compiler or assembler to be tested and debugged on a host machine with no ARM
processor connected. It allows the number of clock cycles the program takes to execute to be
measured exactly, so the performance of the target system can be evaluated. At its most
complex, the ARMulator can be used as the centre of a complete, timing-accurate C model of
the target system, with full details of the cache and memory management functions added,
running an operating system. Between simple program testing and this full system model, the
ARMulator comes with a set of model prototyping modules, including a rapid prototype memory
model and coprocessor interfacing support.

ARM Development Board

is a circuit board incorporating a range of components and interfaces to support the
development of ARM-based systems. It includes an ARM core, memory components which
can be configured to match the performance and bus width of the memory in the target
system, and electrically programmable devices which can be configured to emulate
application-specific peripherals. It can support both hardware and software development
before the final application-specific hardware is available.

3.2.1ARM INSTRUCTION SET


TYPES OF ARM INSTRUCTION SET

• DATA PROCESSING INSTRUCTIONS


• ARITHMETIC INSTRUCTIONS
• LOGICAL INSTRUCTIONS
• COMPARISON INSTRUCTIONS
• MULTIPLY INSTRUCTIONS
• BRANCH INSTRUCTIONS
• LOAD/STORE INSTRUCTIONS

1.DATA PROCESSING INSTRUCTIONS

2.ARITHMETIC INSTRUCTIONS

3.LOGICAL INSTRUCTIONS

4.COMPARISON INSTRUCTIONS

5.MULTIPLY INSTRUCTIONS

6.BRANCH INSTRUCTIONS

7.LOAD STORE INSTRUCTIONS


3.2.4.SIMPLE ASSEMBLY LANGUAGE PROGRAMMING


1.WAP FOR 16 BIT DATA TRANSFER

• LDRH R1,VALUE ; load the 16-bit halfword (LDRB would transfer only 8 bits)
• STRH R1,RESULT ; store the 16-bit halfword
2.WAP TO ADD TWO NUMBERS

• LDR R1,VALUE1
• LDR R2,VALUE2
• ADD R1,R1,R2 ; R1=R1+R2
• STR R1,RESULT
3.WAP TO FIND LARGER OF TWO NUMBERS

• LDR R1,VALUE1
• LDR R2,VALUE2
• CMP R1,R2
• BHI DONE ; unsigned compare: keep R1 if R1>R2
• MOV R1,R2 ; otherwise R2 is the larger (or equal) value
• DONE: STR R1,RESULT
4.WAP TO FIND ONES COMPLEMENT OF A NUMBER

• LDR R1,VALUE
• MVN R1,R1 ; R1=NOT R1
• STR R1,RESULT
5.WAP TO ADD FOUR NUMBERS

• MOV R0,R1 ; R0=R1
• ADD R0,R0,R2 ; R0=R0+R2
• ADD R0,R0,R3 ; R0=R0+R3
• ADD R0,R0,R4 ; R0=R0+R4
6.WAP TO SWAP THE CONTENTS OF REGISTERS R0 AND R1

• MOV R2,R0 ; R2=R0
• MOV R0,R1 ; R0=R1
• MOV R1,R2 ; R1=R2
7.WAP TO COMPUTE 4X^2+3X

• MUL R0,R1,R1 ; R0=R1^2
• MOV R2,#04
• MUL R0,R2,R0 ; R0=4R1^2
• MOV R2,#03
• MUL R2,R1,R2 ; R2=3R1
• ADD R0,R0,R2 ; R0=4R1^2+3R1
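
The register-by-register effect of the last program can be checked with a Python trace. This is a sketch: the function `poly` and the helper `mul` are hypothetical, with X assumed to be in R1 on entry and results truncated to 32 bits as the registers are.

```python
MASK = 0xFFFFFFFF

def mul(rm, rs):
    """32-bit result multiply, as produced by MUL."""
    return (rm * rs) & MASK

def poly(x):
    """Trace the 4X^2+3X program with X in R1; returns the final R0."""
    r1 = x & MASK
    r0 = mul(r1, r1)          # MUL R0,R1,R1   R0 = R1^2
    r2 = 4                    # MOV R2,#04
    r0 = mul(r2, r0)          # MUL R0,R2,R0   R0 = 4*R1^2
    r2 = 3                    # MOV R2,#03
    r2 = mul(r1, r2)          # MUL R2,R1,R2   R2 = 3*R1
    r0 = (r0 + r2) & MASK     # ADD R0,R0,R2   R0 = 4*R1^2 + 3*R1
    return r0

print(poly(5))                # 4*25 + 3*5 = 115
```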

3.3.5 THE ARM COPROCESSOR INTERFACE

• The ARM architecture supports a general mechanism for extending the instruction set
through the addition of coprocessors
• The most common use of a coprocessor is the system coprocessor used to control
on-chip functions such as the cache and memory management unit on the
ARM720
• A floating-point ARM coprocessor has also been developed, and application-specific
coprocessors are a possibility
The most important features of the coprocessor architecture are:

• Support for up to 16 logical coprocessors

• Each coprocessor can have up to 16 private registers of any reasonable size; they are
not limited to 32 bits
• Coprocessors use a load-store architecture, with instructions to perform internal
operations on registers, instructions to load and save registers from and to memory,
and instructions to move data to or from an ARM register
ARM7TDMI coprocessor interface

• The ARM7TDMI coprocessor interface is based on 'bus watching’


• The coprocessor is attached to a bus where the ARM instruction stream flows into the
ARM, and the coprocessor copies the instructions into an internal pipeline that
mimics the behaviour of the ARM instruction pipeline.
• As each coprocessor instruction begins execution there is a 'hand-shake' between the
ARM and the coprocessor to confirm that they are both ready to execute it.
The handshake uses three signals:

• cpi (from ARM to all coprocessors)

This signal, which stands for 'Coprocessor Instruction', indicates that the ARM has
identified a coprocessor instruction and wishes to execute it.
• cpa (from the coprocessors to ARM)

This is the 'Coprocessor Absent' signal, which tells the ARM that there is no coprocessor
present that is able to execute the current instruction.
• cpb (from the coprocessors to ARM)

This is the 'Coprocessor Busy' signal, which tells the ARM that the coprocessor cannot
begin executing the instruction yet.
• The timing is such that both the ARM and the coprocessor must generate their
respective signals autonomously.
• The coprocessor cannot wait until it sees cpi before generating cpa and cpb.
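
How the ARM might react to the two coprocessor signals once it has asserted cpi can be sketched in Python. Assumptions: the function `handshake` and its arguments are hypothetical, and the reactions modelled (taking the undefined instruction trap when cpa is active, busy-waiting while cpb is active) are the usual ARM7TDMI behaviour rather than something stated in these notes.

```python
def handshake(cpa, cpb_cycles):
    """Outcome of a coprocessor instruction after the ARM asserts cpi.

    cpa        -- True if no coprocessor claims the instruction
    cpb_cycles -- number of cycles the coprocessor keeps cpb (busy) active
    Returns (action, cycles_waited).
    """
    if cpa:
        # No coprocessor present: assumed to enter the undefined instruction trap.
        return ("undefined instruction trap", 0)
    waited = 0
    while waited < cpb_cycles:
        waited += 1            # ARM busy-waits while the coprocessor is busy
    return ("execute", waited)

print(handshake(cpa=True, cpb_cycles=0))
print(handshake(cpa=False, cpb_cycles=3))
```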

Coprocessor registers

• ARM coprocessors have their own private register sets and their state is controlled
by instructions that mirror the instructions that control ARM registers.
• The ARM has sole responsibility for control flow, so the coprocessor instructions are
concerned with data processing and data transfer
Coprocessor data operations

• Coprocessor data operations are completely internal to the coprocessor and cause a
state change in the coprocessor registers
• An example would be floating-point addition, where two registers in the floating-
point coprocessor are added together and the result placed into a third register

Coprocessor data transfers

• Coprocessor data transfer instructions load or store the values in coprocessor registers
from or to memory. Since coprocessors may support their own data types, the number
of words transferred for each register is coprocessor dependent.
• The ARM generates the memory address, but the coprocessor controls the number of
words transferred.
• A coprocessor may perform some type conversion as part of the transfer (for instance
the floating-point coprocessor converts all loaded values into its 80-bit internal
representation).

The ARM system control coprocessor

• The ARM system control coprocessor is an on-chip coprocessor, addressed as a logical
coprocessor, which controls the operation of the on-chip cache or caches, memory
management or protection unit, write buffer, prefetch buffer, branch target cache and
system configuration signals.
• The control is effected through the reading and writing of the CP registers.

• The registers are all 32 bits long, and access is restricted to MRC and
MCR instructions, which must be executed in supervisor mode.

• MRC: move to ARM register from coprocessor

• MCR: move to coprocessor from ARM register

• ARM CPUs which are used in embedded systems with fixed or controlled application
programs do not require a full memory management unit with address translation
capabilities.

• For such systems a simpler protection unit is adequate.


3.3.4 ARM implementation


• The design is divided into a:
– control section
– datapath section

3.3.4.1DATAPATH SECTION

Clocking scheme

• Data movement is controlled by passing the data alternately through latches which are
open during phase 1 and latches which are open during phase 2.
• The non-overlapping property of the phase 1 and phase 2 clocks ensures that there are
no race conditions in the circuit.

Datapath timing

• The register read buses are dynamic and are precharged during phase 2.
• When phase 1 goes high, the selected registers discharge the read buses, which
become valid early in phase 1.
• One operand is passed through the barrel shifter, which also uses dynamic techniques,
and the shifter output becomes valid a little later in phase 1.
• The ALU has input latches which are open during phase 1, allowing the operands to
begin combining in the ALU as soon as they are valid, but they close at the end of
phase 1 so that the phase 2 precharge does not get through to the ALU.
• The ALU then continues to process the operands through phase 2, producing a valid
output towards the end of the phase, which is latched in the destination register at the
end of phase 2.

DATAPATH CYCLE TIME
• The minimum datapath cycle time is therefore the sum of:
– the register read time
– the shifter delay
– the ALU delay
– the register write set-up time
– the phase 2 to phase 1 non-overlap time
(i)ALU delay
• Of these, the ALU delay dominates.
• The ALU delay is highly variable, depending on the operation it is performing.
• Logical operations are relatively fast, since they involve no carry propagation.
• Arithmetic operations (addition, subtraction and comparisons) involve longer logic
paths as the carry can propagate across the word width.
TYPES OF ADDER USED

(a)The original ARM1 ripple-carry adder circuit

• It is possible to create a logical circuit using multiple full adders to add N-bit
numbers.
• Each full adder inputs a C_in, which is the C_out of the previous adder.
• This kind of adder is called a ripple-carry adder, since each carry bit "ripples" to
the next full adder.
• Using a CMOS AND-OR-INVERT gate for the carry logic and alternating AND/OR
logic so that even bits use the circuit shown and odd bits use the dual circuit with
inverted inputs and outputs and AND and OR gates swapped around, the worst-case
carry path is 32 gates long.
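
The rippling of the carry from one full adder to the next can be shown with a bit-level Python model. This is a functional sketch: `full_adder` and `ripple_carry_add` are hypothetical names, and the gate-level AND-OR-INVERT implementation is not modelled.

```python
def full_adder(a, b, c_in):
    """One-bit full adder: returns (sum bit, carry out)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (c_in & (a ^ b))
    return s, c_out

def ripple_carry_add(x, y, c_in=0, width=32):
    """Add two `width`-bit numbers one bit at a time, rippling the carry."""
    result = 0
    carry = c_in
    for bit in range(width):
        s, carry = full_adder((x >> bit) & 1, (y >> bit) & 1, carry)
        result |= s << bit
    return result, carry          # carry is the carry out of the top bit

print(ripple_carry_add(0xFFFFFFFF, 1))   # carry ripples through all 32 bits
```

The loop body depends on the carry from the previous iteration, which is the software analogue of the 32-gate worst-case carry path.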

(b)Carry select adder

• The carry-select adder generally consists of two ripple carry adders and a multiplexer
• Adding two n-bit numbers with a carry-select adder is done with two adders
(therefore two ripple carry adders) in order to perform the calculation twice, one time
with the assumption of the carry-in being zero and the other assuming it will be one
• After the two results are calculated, the correct sum, as well as the correct carry-out,
is then selected with the multiplexer once the correct carry-in is known.
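
The carry-select idea can be sketched in Python by computing each block twice and choosing between the results. Assumptions: the names `block_add` and `carry_select_add` are hypothetical, the 8-bit block size is chosen only for illustration, and each block is added with ordinary integer arithmetic instead of an explicit ripple-carry circuit.

```python
def block_add(a, b, c_in, width):
    """Add one block; returns (sum, carry out)."""
    total = a + b + c_in
    return total & ((1 << width) - 1), total >> width

def carry_select_add(x, y, width=32, block=8):
    mask = (1 << block) - 1
    result, carry = 0, 0
    for shift in range(0, width, block):
        a = (x >> shift) & mask
        b = (y >> shift) & mask
        sum0, carry0 = block_add(a, b, 0, block)   # assume carry-in is 0
        sum1, carry1 = block_add(a, b, 1, block)   # assume carry-in is 1
        # Multiplexer: pick the pair matching the real carry-in.
        s, carry = (sum1, carry1) if carry else (sum0, carry0)
        result |= s << shift
    return result, carry

print(carry_select_add(0x00FF00FF, 0x00010001))
```

In hardware the two candidate sums of every block are computed in parallel, so only the multiplexer selection waits for the incoming carry.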
The ARM2 4-bit carry look-ahead scheme

• The carry-lookahead adder calculates one or more carry bits before the sum, which
reduces the wait time to calculate the result of the larger value bits.
• In order to allow a higher clock rate, ARM2 used a 4-bit carry look-ahead scheme to
reduce the worst-case carry path length.

• The logic produces carry generate (G) and propagate (P) signals which control the 4-
bit carry-out.

• The carry propagate path length is reduced to eight gate delays, again using merged
AND-OR-INVERT gates and alternating AND/OR logic
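
The generate and propagate signals for one 4-bit group can be written out in Python. This is a sketch of the textbook carry look-ahead equations, not of the ARM2 circuit itself; the function name `lookahead_4bit` is hypothetical and propagate is taken as the OR of the two operand bits.

```python
def lookahead_4bit(a, b, c_in):
    """Return (G, P, carry out) for a 4-bit group with operands a and b."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # bit generates a carry
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]    # bit propagates a carry
    group_g = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
               | (p[3] & p[2] & p[1] & g[0]))
    group_p = p[3] & p[2] & p[1] & p[0]
    c_out = group_g | (group_p & c_in)                   # no bit-by-bit ripple
    return group_g, group_p, c_out

print(lookahead_4bit(0b1111, 0b0000, 1))   # carry-in propagates straight through
```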
(c)Carry arbitration adder

• The adder logic was further improved on the ARM9TDMI, where a 'carry arbitration'
adder is used.
• This adder computes all intermediate carry values using a 'parallel-prefix' tree, which
is a very fast parallel logic structure.
• The carry arbitration scheme recodes the conventional propagate-generate information
in terms of two new variables, u and v.

ARM6 ALU STRUCTURE



• The input operands are each selectively inverted, then added and combined in the
logic unit, and finally the required result is selected and issued on the ALU result
bus.
• The C and V flags are generated in the adder (they have no meaning for logical
operations), the N flag is copied from bit 31 of the result and the Z flag is evaluated
from the whole result bus
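
The flag generation described above can be sketched in Python for the adder path. Assumptions: `add_with_flags` is a hypothetical helper, only addition and subtraction (as addition of the inverted operand with carry-in 1) are modelled, and the logic unit is left out.

```python
MASK = 0xFFFFFFFF

def add_with_flags(a, b, invert_b=False, carry_in=0):
    """Add a and (optionally inverted) b; return (result, flags)."""
    a &= MASK
    b = (~b if invert_b else b) & MASK           # selective operand inversion
    total = a + b + carry_in
    result = total & MASK
    flags = {
        "N": result >> 31,                       # copy of bit 31 of the result
        "Z": int(result == 0),                   # evaluated from the whole result
        "C": total >> 32,                        # carry out of the adder
        # Overflow: both operands differ in sign from the result.
        "V": (((a ^ result) & (b ^ result)) >> 31) & 1,
    }
    return result, flags

print(add_with_flags(0x7FFFFFFF, 1))                       # signed overflow
print(add_with_flags(5, 5, invert_b=True, carry_in=1))     # 5 - 5
```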
(ii)The barrel shifter

• In order to minimize the delay through the shifter, a cross-bar switch matrix is used
to steer each input to the appropriate output
• Each input is connected to each output through a switch.
• If pre-charged dynamic logic is used, as it is on the ARM datapaths, each switch can
be implemented as a single NMOS transistor.
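
The operations the barrel shifter provides can be modelled functionally in Python. This is a sketch of behaviour only, not of the cross-bar switch matrix: `barrel_shift` is a hypothetical name, the four shift kinds are the standard ARM ones (LSL, LSR, ASR, ROR), and shift amounts are limited to 0 to 31 for simplicity.

```python
MASK = 0xFFFFFFFF

def barrel_shift(value, kind, amount):
    """Shift or rotate a 32-bit value by `amount` bits (0..31)."""
    if not 0 <= amount <= 31:
        raise ValueError("amount must be in 0..31 in this sketch")
    value &= MASK
    if kind == "LSL":                            # logical shift left
        return (value << amount) & MASK
    if kind == "LSR":                            # logical shift right
        return value >> amount
    if kind == "ASR":                            # arithmetic shift right
        shifted = value >> amount
        if value >> 31:                          # replicate the sign bit
            shifted |= (MASK << (32 - amount)) & MASK
        return shifted
    if kind == "ROR":                            # rotate right
        return ((value >> amount) | (value << (32 - amount))) & MASK
    raise ValueError("unknown shift kind")

print(hex(barrel_shift(0x80000000, "ASR", 4)))
```

In the cross-bar each output bit is wired to exactly one input bit for a given shift, so any shift amount takes the same time.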

(iii)Multiplier design
• Two styles of multiplier have been used:
– Older ARM cores include low-cost multiplication hardware that supports only the
32-bit result multiply and multiply-accumulate instructions.
– Recent ARM cores have high-performance multiplication hardware and support the
64-bit result multiply and multiply-accumulate instructions.

As the multiplier is shifted right eight bits per cycle in the 'Rs' register, the partial sum
and carry are rotated right eight bits per cycle. The array is cycled up to four times, using
early termination to complete the instruction in fewer cycles where the multiplier has
sufficient zeros in the top bits, and the partial sum and carry are combined 32 bits at a
time and written back into the register bank. The high-speed multiplier requires
considerably more dedicated hardware than the low-cost solution employed on other ARM
cores. There are 160 bits of shift register and 128 bits of carry-save adder logic. The
incremental area cost is around 10% of the simpler processor cores, though a rather smaller
proportion of the higher-performance cores such as ARM8 and StrongARM. Its benefits are
that it speeds up multiplication by a factor of around 3 and it supports the added
functionality of the 64-bit result forms of the multiply instruction.
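
Early termination can be illustrated by counting how many passes through the array a given multiplier needs. This is a sketch under simplifying assumptions: `multiply_cycles` is a hypothetical name, only the array passes are counted (not the full instruction timing), and termination is modelled for zero top bits only, as described above.

```python
def multiply_cycles(multiplier):
    """Array passes needed when 8 multiplier bits are consumed per cycle."""
    multiplier &= 0xFFFFFFFF
    cycles = 1
    # Continue while bits remain above those already consumed (at most 4 passes).
    while multiplier >> (8 * cycles) and cycles < 4:
        cycles += 1
    return cycles

for value in (0x12, 0x1234, 0x123456, 0x12345678):
    print(hex(value), multiply_cycles(value))
```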
(iv)The register bank
The last major block on the ARM datapath is the register bank. This is where all the
user-visible state is stored in 31 general-purpose 32-bit registers, amounting to around 1
Kbit of data altogether. Since the basic 1-bit register cell is repeated so many times in the
design, it is worth putting considerable effort into minimizing its size.
ARM6 register cell circuit

The ARM datapath is laid out to a constant pitch per bit. The pitch is a compromise
between the optimum for the complex functions (such as the ALU), which are best suited to a
wide pitch, and the simple functions (such as the barrel shifter), which are most efficient
when laid out on a narrow pitch. Each function is then laid out to this pitch, remembering
that there may also be buses passing over a function (for example the B bus passes through
the ALU but is not used by it); space must be allowed for these. It is a good idea to produce a
floor-plan for the datapath noting the 'passenger' buses through each block. The order of the
function blocks is chosen to minimize the number of additional buses passing over the
more complex functions.

3.3.4.2CONTROL SECTION
ARM control logic structure

The control logic on the simpler ARM cores has three structural components:
• An instruction decoder PLA (programmable logic array). This unit uses some of the
instruction bits and an internal cycle counter to define the class of operation to be
performed on the datapath in the next cycle.
• Distributed secondary control associated with each of the major datapath
function blocks. This logic uses the class information from the main decoder PLA to
select other instruction bits and/or processor state information to control the datapath.
• Decentralized control units for specific instructions that take a variable number of
cycles to complete (load and store multiple, multiply and coprocessor operations).
Here the main decoder PLA locks into a fixed state until the remote control unit
indicates completion.
