ARM - An Understanding and More

The document provides an overview of the ARM processor architecture. It discusses that ARM is a 32-bit RISC processor intended for embedded applications. It has a load-store architecture and uses a 3-stage pipeline. The programmer model involves 16 general purpose registers and additional registers like the program counter, stack pointer, and link register. Each processor mode has its own bank of registers and stack pointer.

3/8/2020 ARM by Shriram 1

ARM – AN UNDERSTANDING AND MORE.
Shriram K Vasudevan
Dept. of CSE, Amrita University, Coimbatore, India.
Ph: 89399 18562

Source: inspired by various materials and it is a consolidation! Materials available are plenty, choosing the right one is the toughest task!

Agenda.
• Features and Basics
• Architecture.
• Programming Model.
• Instruction Set.
• Thumb Mode.
• Few Sample Codes.

What is ARM?
• The ARM processor is basically a 32-bit processor, meant
particularly for high-end applications which involve
complex computation.
• The ARM processor was first developed at Acorn
Computers Limited of Cambridge, England, between
1983 and 1985, shortly after the concept of RISC was
introduced at Stanford and Berkeley around 1980.
(ARM – Acorn RISC Machine)
• ARM specializes in the concept of the ARM core, which
it licenses to a number of other manufacturers
to make a variety of chips around the same processor
core. (Means, I tell you how to make, you make it in
your name!)

Contd.,
• So, the focus is not on a family of processors but,
conceptually, on a CPU architecture which may figure in
a number of different chips intended for embedded
applications.
• The ARM is based on RISC architecture, but it is not a
purely RISC architecture because it has been
enhanced to meet the requirements of embedded
applications. Versatility!
• The requirements for embedded applications are
basically high code density, low power consumption
and a small silicon footprint.
Architecturally, ARM satisfies the main conditions and
properties of RISC processors as well.

Features
• ARM processor has a large uniform register file.
• It is basically a LOAD-STORE architecture, where data processing operations
work only between registers and do not involve any memory operations.
• It is a 32-bit processor that also supports a 16-bit instruction set
(THUMB) and, on some cores, execution of 8-bit Java bytecodes (Jazelle).
• So, the 16-bit and 8-bit variants are embedded into a 32-bit processor.
• We will look at these 16-bit and 8-bit variants, THUMB and Jazelle,
later.
• ARM has got a very good speed vs. power consumption ratio and
high code density as required by embedded applications.
• It has got a barrel shifter in the data path, which can maximize the
hardware usage available on the chip.
• It has also got auto-increment and auto-decrement addressing modes to
optimize program loops; this is not very common with RISC processors. Also
ARM supports LOAD and STORE of multiple data elements through a single
instruction.
• ARM has also got a feature named ‘conditional execution’, where an
instruction gets executed only when a condition is met, which helps
the execution throughput.

See these labels.



The Pipeline
• At the heart of the ARM7 CPU is the instruction pipeline.
The pipeline is used to process instructions taken from
the program store.
• On the ARM 7 a three-stage pipeline is used.

Contd.,
• A three-stage pipeline is the simplest form of pipeline and does not
suffer from the kind of hazards such as read-before-write seen in
pipelines with more stages.
• The pipeline has independent hardware stages that execute one
instruction while decoding a second and fetching a third.
• The pipeline speeds up the throughput of CPU instructions so
effectively that most ARM instructions can be executed in a single
cycle.
• The pipeline works most efficiently on linear code. As soon as a
branch is encountered, the pipeline is flushed and must be refilled
before full execution speed can be resumed. (Very essential!)
• As we shall see, the ARM instruction set has some interesting
features which help smooth out small jumps in your code in
order to get the best flow of code through the pipeline.
• As the pipeline is part of the CPU, the programmer does not have any
exposure to it.

Programmer's Model – Register Architecture
• The ARM7 is a load-and-store architecture, so in order to
perform any data processing instructions the data has first
to be moved from the memory store into a central set of
registers, the data processing instruction has to be
executed and then the data is stored back into memory.
(This is Load and Store, Recollect the past)

Contd.,
• The central set of registers are a
bank of 16 user registers R0 –
R15. Each of these registers is
32 bits wide and R0 – R12 are
user registers in that they do not
have any specific other function.
(Means, general purpose
registers)
• The Registers R13 – R15 do
have special functions in the
CPU.
• R13 is used as the stack pointer
(SP). (You know what it is!)

Contd.,
• R14 is called the link register (LR).
• When a call is made to a function the
return address is automatically
stored in the link register and is
immediately available on return from
the function.
• This allows quick entry into and return
from a ‘leaf’ function (a function that is
not going to call further functions).
• If the function is not a leaf (i.e. it
is going to call other functions) then the
link register must be preserved on the
stack (R13). (Do you understand??)
• Finally R15 is the program counter
(PC).
• Interestingly, many instructions can be
performed on R13 - R15 as if they were
standard user registers (But, don’t ever
do this please!)

Current Program Status Register


• In addition to the register bank there is an additional 32 bit wide register
called the ‘current program status register’ (CPSR). The CPSR contains
a number of flags which report and control the operation of the ARM7
CPU.

Contd.,
• The top four bits of the CPSR contain the condition codes which are set by
the CPU. (Flags)
• The lowest eight bits in the CPSR contain flags which may be set or cleared
by the application code.
• Bits 7 and 6 are the I and F bits. These bits are used to enable and disable
the two interrupt sources which are external to the ARM7 CPU. You should be
careful when programming these two bits because in order to disable
either interrupt source the bit must be set to ‘1’ not ‘0’ as you might
expect. Bit 5 is the THUMB bit.
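The bit positions above can be checked with a small decoder. This is only a Python sketch, not ARM code: the helper name `decode_cpsr` and the example value 0x600000D3 are assumptions made for illustration, and N, Z, C, V are the usual names of the four condition flags in the top bits.

```python
# Sketch only: split a 32-bit CPSR value into the fields described above.
# decode_cpsr is a hypothetical helper name, not part of any ARM tool.

def decode_cpsr(cpsr):
    """Return the condition flags, interrupt masks, T bit and mode bits."""
    return {
        "N": (cpsr >> 31) & 1,   # negative flag (bit 31)
        "Z": (cpsr >> 30) & 1,   # zero flag (bit 30)
        "C": (cpsr >> 29) & 1,   # carry flag (bit 29)
        "V": (cpsr >> 28) & 1,   # overflow flag (bit 28)
        "I": (cpsr >> 7) & 1,    # 1 means IRQ is disabled
        "F": (cpsr >> 6) & 1,    # 1 means FIQ is disabled
        "T": (cpsr >> 5) & 1,    # 1 means THUMB state
        "mode": cpsr & 0x1F,     # lowest five bits select the mode
    }

# Example value chosen for illustration: Z and C set, both interrupts masked.
fields = decode_cpsr(0x600000D3)
print(fields)
```

Note that I = 1 and F = 1 here mean both interrupt sources are disabled, which is the "set to 1 to disable" point made above.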

Contd.,
• The ARM7 CPU is capable of executing two instruction sets; the ARM
instruction set which is 32 bits wide and the THUMB instruction set
which is 16 bits wide. (Jazelle is also there. We are not worried!)
• Consequently the T bit reports which instruction set is being executed.
(Refer the label slide).
• Your code should not try to set or clear this bit to switch between instruction
sets. (Means, instructions shall be different for different modes)
• The lowest five bits are the mode bits.
• The ARM7 has 7 different operating modes. (We shall see this later)
• Your application code will normally run in the user mode with access to the
register bank R0 – R15 and the CPSR as already discussed.
• However, in response to an exception such as an interrupt, memory
error or software interrupt instruction the processor will change modes.
• When this happens the registers R0 – R12 and R15 remain the same but R13
(SP) and R14 (LR) are replaced by a new pair of registers unique to that
mode. This means that each mode has its own stack pointer and link register.
(Understand this, please, this will help in resuming operation)
• In addition, the fast interrupt mode (FIQ) has duplicate registers for R8 – R12.

Contd.,
• Each of the modes except user mode has an additional
register called the “saved program status register”.
• If your application is running in user mode when an
exception occurs the mode will change and the current
contents of the CPSR will be saved into the SPSR.
(Context saving is this, folks)
Register Set and View

[Figure: register set by mode. Currently visible registers in User mode: r0 – r12, r13 (sp), r14 (lr), r15 (pc) and cpsr. Banked-out registers: FIQ has its own r8 – r12, r13 (sp), r14 (lr) and spsr; IRQ, SVC, Undef and Abort each have their own r13 (sp), r14 (lr) and spsr.]

Contd.,
• User : unprivileged mode under which most tasks run

• FIQ : entered when a high priority (fast) interrupt is raised

• IRQ : entered when a low priority (normal) interrupt is raised

• Supervisor : entered on reset and when a Software Interrupt instruction is executed

• Abort : used to handle memory access violations

• Undef : used to handle undefined instructions

• System : privileged mode using the same registers as user mode



Contd.,
• All modes other than USER are privileged.
• These have full access to system resources and can
change mode freely.
• There are also five exception modes (see the next slide),
which are entered when an exception occurs.
• System mode – Like user mode. But, privileged.
• For operating system tasks.
• Does not require additional registers, but needs system resources.
• So, made a privileged mode.

This will clarify.



Exception Modes
• When an exception occurs, the CPU will change modes
and the PC will be forced to an exception vector. The vector
table starts from address zero with the reset vector and
then has an exception vector every four bytes.

Contd.,
• There is a gap in the vector table because there is a
missing vector at 0x00000014.
• This location was used on an earlier ARM architecture
and has been preserved on ARM7 to ensure software
compatibility between different ARM architectures.
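The vector table described above can be written down as a simple lookup. This is only a Python sketch for reference: the names `VECTORS` and `vector_address` are made up for illustration, and the labels are the usual ARM7 exception names.

```python
# Sketch only: the ARM7 exception vector table, one entry every four bytes.
# VECTORS and vector_address are hypothetical names used for illustration.

VECTORS = {
    0x00: "Reset",
    0x04: "Undefined instruction",
    0x08: "Software interrupt (SWI)",
    0x0C: "Prefetch abort",
    0x10: "Data abort",
    0x14: "Reserved (the missing vector)",
    0x18: "IRQ",
    0x1C: "FIQ",
}

def vector_address(name):
    """Return the address the PC is forced to for the named exception."""
    for address, label in VECTORS.items():
        if label == name:
            return address
    raise KeyError(name)

for address, label in VECTORS.items():
    print(f"0x{address:08X}  {label}")
```

The gap at 0x00000014 shows up as the reserved entry between the data abort and IRQ vectors.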

Basic ARM Architecture


• Any architecture is not only characterized by its data path, but
also by its control path.
• This data path is organized in such a way that the
operands are not directly fetched from memory, and a
basic feature of RISC is that operands get fetched from
registers and not from memory.
• Instructions typically use two source registers and single
result/destination register.
• The more interesting facts are the presence of the ’barrel
shifter’ and the ‘increment/decrement’ logic.
• The barrel shifter on the data path can preprocess the data before it
enters the ALU.
• It is basically a combinational circuit that can shift a data word to
the left or to the right by an arbitrary number of positions in the
same cycle. (This feature is not supported in many
other processors)

Contd.,
• In classical shift register, the number of shifts requires an
equivalent number of clocks because, the shifting takes
place based on clocks.
• In a barrel shifter, a combinational circuit is used, so the
shifting takes place in a single step.
• In fact, the shift takes place within the same
instruction. This is a very basic enhancement present in the
ARM data path.
• The other interesting feature is the increment and
decrement logic, which can operate on the registers
independently of the ALU.
• This facilitates the implementation of auto-increment and auto-
decrement features in the ARM, where it is used for movement of
blocks of data between the memory and registers.
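What the barrel shifter does to the second operand can be modelled in a few lines. This is only a Python sketch of the arithmetic, not of the hardware: the function names are made up for illustration, and shift amounts are assumed to be in the range 0 to 31.

```python
# Sketch only: 32-bit shift and rotate operations applied to an operand
# before it reaches the ALU. Function names are hypothetical.

MASK32 = 0xFFFFFFFF

def lsl(value, n):
    """Logical shift left by n bits."""
    return (value << n) & MASK32

def lsr(value, n):
    """Logical shift right by n bits (zeros shifted in)."""
    return (value & MASK32) >> n

def asr(value, n):
    """Arithmetic shift right by n bits (sign bit copied in)."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32          # reinterpret as a negative number
    return (value >> n) & MASK32

def ror(value, n):
    """Rotate right by n bits."""
    value &= MASK32
    n %= 32
    return ((value >> n) | (value << (32 - n))) & MASK32

def add_shifted(rn, rm, n):
    """Model of ADD Rd, Rn, Rm, LSL #n: shift and add in one instruction."""
    return (rn + lsl(rm, n)) & MASK32

print(hex(add_shifted(10, 3, 2)))   # 10 + (3 << 2)
```

The last function is the point of the slide: the shift and the add belong to one instruction, so a multiply by a power of two comes along with the addition.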

ARM organization core data flow model

The arrows represent the direction of data flow, and the lines represent the
buses and the boxes represent either a storage unit or an operation unit.

ARM organization core data flow model
Instruction decoder:
• It decodes the instruction
before execution is carried
out.
• There are three kinds of
instruction set supported in
ARM core.
• They can be ARM
instruction set; Jazelle
Instruction set and
THUMB instruction set.
• With the CPSR one can set
the operation state and
accordingly instruction set
can be selected.
• Load and Store architecture:
• ARM falls in the RISC category which follows the
Load and Store architecture.
• Means, with Load and Store architecture in place,
registers will be mandatory for processing to be
carried out.
• Without registers one cannot do any sort of
operation with ARM core.
• ARM has 2 source registers Rm and Rn and one
destination register which carries the result. The
destination register is named as Rd.
• A and B are the two buses that will help in reading
the source operands.
• Rm and Rn values will be fetched from the buses
A and B and computation will be carried out in the
ALU or MAC (Multiplication and Accumulate unit).
Address registers are used to hold the address
and address bus will facilitate the storage action.
• Barrel shifter is a kind of support which is very
useful in association with ALU for expression
evaluation and address calculation.
• After going through the steps and sequences the
result will be moved to the Register Rd.

Note: A barrel shifter is a digital circuit that can shift a data word by a specified number of bits without the use of any sequential logic, only pure combinatorial logic. ... A barrel shifter is often used to shift and rotate n-bits in modern microprocessors, typically within a single clock cycle.


3-stage pipeline ARM organization


Fetch:
The instruction is fetched from memory and placed in the instruction pipeline.
Decode:
The instruction is decoded and the datapath control signals prepared for the
next cycle. In this stage the instruction 'owns' the decode logic but not the
datapath.
Execute:
The instruction 'owns' the datapath; the register bank is read, an operand
shifted, the ALU result generated and written back into a destination register

At any one time, three different instructions may occupy each of these
stages, so the hardware in each stage has to be capable of independent
operation

Contd.,
• When the processor is executing simple data processing
instructions the pipeline enables one instruction to be
completed every clock cycle.
• An individual instruction takes three clock cycles to
complete, so it has a three-cycle latency, but the
throughput is one instruction per cycle.
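The latency and throughput figures above can be seen by laying out which instruction sits in which stage on each cycle. This is only a Python sketch of an ideal pipeline with no branches or stalls; the function name `pipeline_schedule` and the four-instruction program are assumptions made for illustration.

```python
# Sketch only: occupancy of an ideal three-stage pipeline, cycle by cycle.
# pipeline_schedule is a hypothetical helper; no branches or stalls are modelled.

STAGES = ("Fetch", "Decode", "Execute")

def pipeline_schedule(instructions):
    """Return {cycle: {stage: instruction}} with cycles numbered from 1."""
    schedule = {}
    for index, name in enumerate(instructions):
        for offset, stage in enumerate(STAGES):
            cycle = index + offset + 1
            schedule.setdefault(cycle, {})[stage] = name
    return schedule

program = ["ADD", "SUB", "MOV", "CMP"]
schedule = pipeline_schedule(program)
for cycle in sorted(schedule):
    print(cycle, schedule[cycle])

# Last cycle in which any stage is busy.
total_cycles = max(schedule)
print("total cycles:", total_cycles)
```

Each instruction takes three cycles from fetch to execute, yet from cycle 3 onward one instruction finishes every cycle, so four instructions need six cycles rather than twelve.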

5 stage pipeline
• Higher performance ARM cores employ a 5-stage pipeline
and have separate instruction and data memories.
• Breaking instruction execution down into five components
rather than three reduces the maximum work which must
be completed in a clock cycle, and hence allows a higher
clock frequency to be used (provided that other system
components, and particularly the instruction memory, are
also redesigned to operate at this higher clock rate).

Contd.,
• Fetch: the instruction is fetched from memory and placed in the
instruction pipeline.
• Decode: the instruction is decoded and register operands read
from the register file. There are three operand read ports in the
register file, so most ARM instructions can source all their
operands in one cycle.
• Execute: an operand is shifted and the ALU result generated. If
the instruction is a load or store the memory address is
computed in the ALU.
• Buffer/data: data memory is accessed if required. Otherwise
the ALU result is simply buffered for one clock cycle to give the
same pipeline flow for all instructions.
• Write-back: the results generated by the instruction are written
back to the register file, including any data loaded from
memory.

What happens when an exception occurs?
• When an exception occurs, for example an IRQ
exception, the following actions are taken:
• First the address of the next instruction to be
executed (PC + 4) is saved into the link register

Contd.,
• Then the CPSR is copied into the SPSR of the
exception mode that is about to be entered (i.e.
SPSR_irq)
• The PC is then filled with the address of the exception
mode interrupt vector. In the case of the IRQ mode
this is 0x00000018 (Refer the table shown earlier.)

Contd.,
• At the same time the mode is changed to IRQ mode,
which causes R13 and R14 to be replaced by the IRQ
R13 and R14 registers.
• Once your code has finished processing the exception it
must return back to the user mode and continue where it
left off.
• However the ARM instruction set does not contain a
“return” or “return from interrupt” instruction, so
manipulating the PC must be done by regular instructions.

Contd.,
• The situation is further complicated by there being a
number of different return cases. (Makes life further
difficult)
• Let us consider three cases! All are very interesting to
look into!

Case: 1
• Consider the SWI instruction.
• In this case the SWI instruction is executed, the address
of the next instruction to be executed is stored in the Link
register and the exception is processed.
• In order to return from the exception all that is necessary
is to move the contents of the link register into the PC
and processing can continue.
• However in order to make the CPU switch modes back to
user mode, a modified version of the move instruction is
used and this is called MOVS (more about this later).
• Hence for a software interrupt the return instruction is
MOVS R15,R14 ; Move Link register into the PC and
switch modes.

Case: 2
• Consider the FIQ and IRQ exceptions. When such an exception
occurs the current instruction being executed is discarded
and the exception is entered.
• When the code returns from the exception the link register
contains the address of the discarded instruction plus four.
• In order to resume processing at the correct point we need to
roll back the value in the Link register by four.
• In this case we use the subtract instruction to deduct four from
the link register and store the results in the PC.
• As with the move instruction, there is a form of the subtract
instruction which will also restore the operating mode. For an
IRQ, FIQ or Prefetch Abort, the return instruction is:
• SUBS R15, R14,#4

Case: 3
• In the case of a data abort exception, the
exception will occur one instruction after execution of
the instruction which caused the exception.
• In this case we will ideally enter the data abort ISR, sort
out the problem with the memory and return to reprocess
the instruction that caused the exception. In this case
we have to roll back the PC by two instructions i.e.
the discarded instruction and the instruction that
caused the exception.
• In other words subtract eight from the link register and
store the result in the PC. For a data abort exception the
return instruction is
• SUBS R15, R14,#8
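The three return cases can be summarised as "subtract 0, 4 or 8 from the link register". This is only a Python sketch of that arithmetic: the table name `RETURN_OFFSET`, the function `return_address` and the sample addresses are assumptions made for illustration, and the mode switch done by the S suffix is not modelled.

```python
# Sketch only: the value loaded into the PC on return from each exception.
# RETURN_OFFSET and return_address are hypothetical names.

RETURN_OFFSET = {
    "SWI": 0,             # MOVS R15, R14
    "IRQ": 4,             # SUBS R15, R14, #4
    "FIQ": 4,             # SUBS R15, R14, #4
    "Prefetch abort": 4,  # SUBS R15, R14, #4
    "Data abort": 8,      # SUBS R15, R14, #8
}

def return_address(exception, link_register):
    """Return the address execution resumes at after the handler finishes."""
    return (link_register - RETURN_OFFSET[exception]) & 0xFFFFFFFF

# Sample link register values, chosen only for the demonstration.
print(hex(return_address("SWI", 0x8004)))
print(hex(return_address("IRQ", 0x8008)))
print(hex(return_address("Data abort", 0x8008)))
```

For the data abort case the result points back at the instruction that caused the abort, so it is executed again after the handler has sorted out the memory problem.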

Features of ARM – Apt here.


• ARM processor has a large uniform register file.
• It is basically a LOAD-STORE architecture, where data processing operations
work only between registers and do not involve any memory operations.
• It is a 32-bit processor that also supports a 16-bit instruction set
(THUMB) and, on some cores, execution of 8-bit Java bytecodes (Jazelle).
• So, the 16-bit and 8-bit variants are embedded into a 32-bit processor.
• We will look at these 16-bit and 8-bit variants, THUMB and Jazelle,
later.
• ARM has got a very good speed vs. power consumption ratio and high
code density as required by embedded applications.
• It has got a barrel shifter in the data path, which can maximize the
hardware usage available on the chip.
• It has also got auto-increment and auto-decrement addressing modes to optimize
program loops; this is not very common with RISC processors. Also ARM supports
LOAD and STORE of multiple data elements through a single instruction.
• ARM has also got a feature named ‘conditional execution’, where an instruction
gets executed only when a condition is met, which helps the execution
throughput.

Conditional Execution – Remember this.


• Most instruction sets only allow branches to be executed
conditionally.
• However, by reusing the condition evaluation hardware, ARM
effectively increases the number of instructions.
• All instructions contain a condition field which determines whether the
CPU will execute them.
• Non-executed instructions soak up 1 cycle.
• Still have to complete cycle so as to allow fetching and decoding of
following instructions.
• This removes the need for many branches, which stall the
pipeline (3 cycles to refill).
• Allows very dense in-line code, without branches.
• The Time penalty of not executing several conditional instructions is
frequently less than overhead of the branch or subroutine call that
would otherwise be needed.

Condition Field.

Conditional Execution
• To execute an instruction conditionally, simply postfix it with the
appropriate condition:
• For example an add instruction takes the form:
• ADD r0,r1,r2 ; r0 = r1 + r2 (ADDAL)
• To execute this only if the zero flag is set:
• ADDEQ r0,r1,r2 ; If zero flag set then…
; ... r0 = r1 + r2
• By default, data processing operations do not affect the
condition flags (apart from the comparisons where this is the
only effect). To cause the condition flags to be updated, the S
bit of the instruction needs to be set by postfixing the
instruction (and any condition code) with an “S”.
• For example to add two numbers and set the condition flags:
• ADDS r0,r1,r2 ; r0 = r1 + r2
; ... and set flags
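How the condition field decides whether an instruction runs can be modelled as a lookup on the four flags. This is only a Python sketch: the function names `condition_passed` and `add_conditional` are made up for illustration, and the flag tests are the standard meanings of the ARM condition mnemonics.

```python
# Sketch only: evaluate an ARM condition suffix against the N, Z, C, V flags.
# condition_passed and add_conditional are hypothetical helper names.

def condition_passed(cond, n, z, c, v):
    """Return True if an instruction with this condition suffix executes."""
    checks = {
        "EQ": z == 1,                # equal
        "NE": z == 0,                # not equal
        "CS": c == 1,                # carry set
        "CC": c == 0,                # carry clear
        "MI": n == 1,                # negative
        "PL": n == 0,                # positive or zero
        "VS": v == 1,                # overflow
        "VC": v == 0,                # no overflow
        "HI": c == 1 and z == 0,     # unsigned higher
        "LS": c == 0 or z == 1,      # unsigned lower or same
        "GE": n == v,                # signed greater or equal
        "LT": n != v,                # signed less than
        "GT": z == 0 and n == v,     # signed greater than
        "LE": z == 1 or n != v,      # signed less or equal
        "AL": True,                  # always (the default)
    }
    return checks[cond]

def add_conditional(cond, flags, r1, r2, r0):
    """Model of ADD<cond> r0, r1, r2: r0 changes only if the condition holds."""
    if condition_passed(cond, **flags):
        return (r1 + r2) & 0xFFFFFFFF
    return r0

zero_set = {"n": 0, "z": 1, "c": 0, "v": 0}
print(add_conditional("EQ", zero_set, 2, 3, 99))   # ADDEQ with Z = 1
```

With the zero flag clear, the same call leaves r0 untouched, which is the "soaks up one cycle but does nothing" behaviour described earlier.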

ARM Instruction Set / Thumb Instruction Set.

Quick Analysis of Instruction Set.


Data Processing Instructions.
• Largest family of ARM instructions, all sharing the same
instruction format.
• Contains:
• Arithmetic operations
• Comparisons (no results - just set condition codes)
• Logical operations
• Data movement between registers
• Remember, this is a load / store architecture
• These instructions work only on registers, NOT memory.
• They each perform a specific operation on one or two
operands.
• First operand always a register - Rn
• Second operand sent to the ALU via barrel shifter.

Arithmetic Instructions.
• Operations are:
• ADD operand1 + operand2
• ADC operand1 + operand2 + carry
• SUB operand1 - operand2
• SBC operand1 - operand2 + carry -1
• RSB operand2 - operand1
• RSC operand2 - operand1 + carry - 1
• Syntax:
• <Operation>{<cond>}{S} Rd, Rn, Operand2
• Examples
• ADD r0, r1, r2
• SUBGT r3, r3, #1
• RSBLES r4, r5, #5
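The six operations listed above differ only in operand order and in how the carry is used. This is only a Python sketch of the 32-bit results: the function name `arm_arith` is made up for illustration, and flag updates (the S suffix) are not modelled.

```python
# Sketch only: 32-bit results of the ARM arithmetic operations.
# arm_arith is a hypothetical helper; condition flags are not updated here.

MASK32 = 0xFFFFFFFF

def arm_arith(op, operand1, operand2, carry=0):
    """Return the 32-bit result of the named operation."""
    results = {
        "ADD": operand1 + operand2,
        "ADC": operand1 + operand2 + carry,
        "SUB": operand1 - operand2,
        "SBC": operand1 - operand2 + carry - 1,
        "RSB": operand2 - operand1,
        "RSC": operand2 - operand1 + carry - 1,
    }
    return results[op] & MASK32

for op in ("ADD", "ADC", "SUB", "SBC", "RSB", "RSC"):
    print(op, hex(arm_arith(op, 5, 3, carry=1)))
```

RSB is useful because only the second operand passes through the barrel shifter; reversing the subtraction lets the shifted value be the one subtracted from.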

Comparisons
• The only effect of the comparisons is to
• UPDATE THE CONDITION FLAGS. Thus no need to set S bit.
• Operations are:
• CMP operand1 - operand2, but result not written
• CMN operand1 + operand2, but result not written
• TST operand1 AND operand2, but result not written
• TEQ operand1 EOR operand2, but result not written
• Syntax:
• <Operation>{<cond>} Rn, Operand2
• Examples:
• CMP r0, r1
• TSTEQ r2, #5

Data Movement
• Operations are:
• MOV operand2
• MVN NOT operand2
Note that these make no use of operand1.
• Syntax:
• <Operation>{<cond>}{S} Rd, Operand2
• Examples:
• MOV r0, r1
• MOVS r2, #10
• MVNEQ r1,#0

Multiplication Instructions
• The Basic ARM provides two multiplication instructions.
• Multiply
• MUL{<cond>}{S} Rd, Rm, Rs ; Rd = Rm * Rs
• Multiply Accumulate - does addition for free
• MLA{<cond>}{S} Rd, Rm, Rs,Rn ; Rd = (Rm * Rs) + Rn
• Restrictions on use:
• Rd and Rm cannot be the same register
• Can be avoided by swapping Rm and Rs around. This works because
multiplication is commutative.
• Cannot use PC.
These will be picked up by the assembler if overlooked.
• Operands can be considered signed or unsigned
• Up to user to interpret correctly.

Load Store
• The ARM is a Load / Store Architecture:
• Does not support memory to memory data processing operations.
• Must move data values into registers before using them.
• This might sound inefficient, but in practice isn’t:
• Load data values from memory into registers.
• Process data in registers using a number of data processing
instructions which are not slowed down by memory access.
• Store results from registers out to memory.
• The ARM has three sets of instructions which interact with
main memory. These are:
• Single register data transfer (LDR / STR).
• Block data transfer (LDM/STM).
• Single Data Swap (SWP).

Software Support
• Download Keil from website. Select ARM core.
• It will ask you to register. Register, download and install. It
is easy.

Remember this.
• There must be an ENTRY directive. This tells the location
of the first executable instruction.
• The AREA name can be PROGRAM, DATA, or anything!
• The END directive is a must; it shows where the code
ends.
• ARM deals directly with 32-bit instructions, as you all
know.
• A halfword (16 bits) of data can be defined with the DCW
directive. To keep what follows word-aligned, use the ALIGN
directive as shown in the examples.

How to use Keil?



Select NXP – Philips – LPC2148



Simple addition with ARM mode instruction

Compare Using ARM Instruction

SWI &11 can be used instead of loop

Swapping two numbers with ARM.



One’s complement

Two’s complement

Greatest of 2 numbers

16 bit operation in ARM


; 16 bit data transfer happens here.
        TTL     16bitdatatransfer
        AREA    program, CODE, READONLY
; AREA is a directive which specifies the region where the code has
; to be stored. Here it is named program. It can be RESET also.
        ENTRY

Main
        LDRH    R1, Value       ; Load the 16-bit value to be moved (LDRB would load only one byte).
        STRH    R1, Result      ; Store it back as a halfword.
        SWI     &11             ; Software Interrupt instead of loop option seen earlier.

Value   DCW     &C123           ; 16-bit source data.
        ALIGN                   ; Pads to the next word boundary.
Result  DCW     0               ; Storage
        END

A Simple One’s complement in a different way.


; Ones complement Example.
        TTL     onescomplement
        AREA    program, CODE, READONLY
; AREA is a directive which specifies the region where the code has
; to be stored. Here it is named program. It can be RESET also.
        ENTRY

Main
        LDRH    R1, Value       ; Load the 16-bit value to be complemented.
        MVN     R1, R1          ; See the way I used R1 and R1. MVN is NOT.
        SWI     &11             ; Software Interrupt instead of loop option seen earlier.

Value   DCW     &0000
        ALIGN                   ; Pads to the next word boundary.
        END

Before Execution After Execution



Shift Left One Bit.


; Shift Left One Bit!
        TTL     shift left one bit
        AREA    RESET, CODE, READONLY
; AREA is a directive which specifies the region where the code
; has to be stored. Here it is RESET. It can be PROGRAM also.
        ENTRY
Main
        LDR     R1, Value       ; Load the 32-bit value to be shifted.
        MOV     R1, R1, LSL #1  ; One bit shift to the left.
        SWI     &11

Value   DCD     &0001
        END

Before Execution
After Execution

Getting into Thumb Mode. (No great change, it is a mode, that’s all!)
• Core has two execution states ARM and Thumb;
• Thumb is a compressed and 16 bit representation of a
subset of the ARM instruction set.
• Like ARM, Thumb also uses load store architecture for
data processing, data transfer and control flow
instructions.
• The standard chip that includes the Thumb instruction set
is the ARM7TDMI where "T" specifies Thumb.
(Remember this, we saw this in CPSR ;))

Contd.,

How to set thumb state?


• 0xD3 is the default value of the CPSR, as shown
below. So by default one can observe that the Thumb
state is disabled. To enable it, as already discussed, the
CPSR should be accessed and the T bit should be set.

Contd.,
• Setting the T bit can be done by adding 0x20 to 0xD3,
which gives 0xF3. This sets the T bit (bit 5), and the
THUMB state is then selected.

Try these all with ARM.


• 1. Write a program to find greatest of three numbers.
• 2. Write a program to SWAP two values without using the third
variable.
• 3. Write a simple code to find out if a number is prime or not.
• 4. Write a program to SWAP upper and lower nibbles of a
number.
• 5. Write a program to disassemble a value. E.g. 54H to 05H
and 04H.
• 6. Write a program to convert ASCII value to BINARY.
• 7. Write a program to convert HEXADECIMAL to BINARY.
• 8. Factorial of a given number.
• 9. Fibonacci series.
• 10. Prime number or not.
• 11. Sorting the numbers in a series.

Recollect all these! – Points to remember.



Memory Management in ARM


• Microprocessors are expected to execute instructions at a
very high rate. (Compare 8085 with ARM!!)
• This would be possible when you have sufficient memory
support. (Means, how important it is to have bigger RAM
in your phone)
• Not only bigger, i.e. large, but also, it should be faster!
• If small, well, it cannot hold all the contents.
• If slow, no use in holding all the contents (Because,
processor would not get the instructions in time to
process)

Contd.,
• Here comes the challenge!
• The larger the memory is, the slower it gets. (Obvious, right?)
• So, it is not possible for someone to design a large memory
which is also fast. (This is not possible!)
• Here comes the possibility:
• Combine a small, fast memory with a large, slow main
memory (Possible!! Trust me)
• Now, you get the feel of having a “large fast memory in
hand”.
• Let us name these now!
• Small, Fast component == Cache (Will hold the most
frequently accessed instructions. The library table/rack is the
example)
• Here come the terms temporal and spatial locality!
Contd.,
• Suppose you were a student writing a term paper on important
historical developments in computer hardware.
• You are sitting at a desk in a library with a collection of books that you
have pulled from the shelves and are examining.
• You find that several of the important computers that
you need to write about are described in the books you
have, but there is nothing about the EDSAC.
• Therefore, you go back to the shelves and look for an additional book.
You find a book on early British computers that covers EDSAC.
• Once you have a good selection of books on the desk in front of you,
there is a good probability that many of the topics you need can be
found in them, and you may spend most of your time just using the
books on the desk without going back to the shelves.
• Having several books on the desk in front of you saves time
compared to having only one book there and constantly having to go
back to the shelves to return it and take out another.
Contd.,
• The same principle allows us to create the illusion of a
large memory that we can access as fast as a very small
memory.
• Just as you did not need to access all the books in the
library at once with equal probability, a program does not
access all of its code or data at once with equal
probability.
• Otherwise, it would be impossible to make most memory
accesses fast and still have large memory in computers,
just as it would be impossible for you to fit all the library
books on your desk and still find what you wanted quickly.
Contd.,
• This principle of locality underlies both the way in which
you did your work in the library and the way that
programs operate.
• The principle of locality states that programs access a
relatively small portion of their address space at any
instant of time, just as you accessed a very small
portion of the library’s collection.
• There are two different types of locality:
• Temporal locality: if you recently brought a book to
your desk to look at, you will probably need to look at it
again soon.
• Spatial locality: when you brought out the book on early
English computers to find out about EDSAC, you also
noticed that there was another book shelved next to it
about early mechanical computers, so you brought back
that book too and, later on, found something useful in
it. Books on the same topic are shelved together in the
library to increase spatial locality.
Contd.,
• Just as accesses to books on the desk naturally exhibit
locality, locality in programs arises from simple and
natural program structures.
• For example, most programs contain loops, so
instructions and data are likely to be accessed repeatedly,
showing high amounts of temporal locality.
• Since instructions are normally accessed sequentially,
programs show high spatial locality.
• Accesses to data also exhibit a natural spatial locality.
For example, accesses to elements of an array or a
record will naturally have high degrees of spatial
locality.
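The two kinds of locality can be made concrete with a small sketch. It assumes a hypothetical loop that walks an array of 4-byte words twice; the helper names `loop_trace` and `locality_counts`, the base address and the array size are all illustrative and not from the slides. It counts how many accesses revisit an address already seen (temporal) and how many land next to the previous access (spatial).

```python
def loop_trace(base, n_elems, word_bytes=4, passes=2):
    """Addresses touched when a loop walks an array `passes` times."""
    trace = []
    for _ in range(passes):
        for i in range(n_elems):
            trace.append(base + i * word_bytes)
    return trace


def locality_counts(trace, word_bytes=4):
    """Count re-used addresses (temporal) and neighbouring accesses (spatial)."""
    seen = set()
    temporal = 0
    spatial = 0
    prev = None
    for addr in trace:
        if addr in seen:
            temporal += 1          # same address accessed again
        if prev is not None and abs(addr - prev) == word_bytes:
            spatial += 1           # address adjacent to the previous access
        seen.add(addr)
        prev = addr
    return temporal, spatial


trace = loop_trace(base=0x1000, n_elems=8)
temporal, spatial = locality_counts(trace)
print(f"{len(trace)} accesses, {temporal} repeats, {spatial} adjacent")
```

Most of the accesses fall into one or both categories, which is exactly what a cache exploits.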

Memory size and speed


• Let us look into the hierarchy!
• Registers (processor registers) are at the top of the
hierarchy!
• Access time is a few nanoseconds for the processor registers
we have seen in the ARM7 architecture.
• So, you need very little time to access the registers.
• On-chip cache of up to 32 KB, with an access time of about
10 nanoseconds or less.
• Some systems (desktops) may have a few hundred KB of
second-level off-chip cache. Access time is a few tens
of nanoseconds.
• Main memory (RAM) ranges from 512 MB to several GB.
• Finally, the hard drive, with heavy storage capacity in the range
of GBs and even TBs, and an access time of a few tens of
milliseconds.

Unified and Harvard caches


• Caches can be built in many ways. At the highest level, a
processor can have one of the following two organizations:
• A unified cache.
• This is a single cache for both instructions and data.
• Separate instruction and data caches.
• This organization is sometimes called a modified
Harvard architecture. (Separate caches let a load or store
execute in a single cycle, because the data access does not
compete with the instruction fetch.)
Cache Performance Metrics
• If the data requested by the processor appears in some block in
the upper level, this is called a hit (analogous to your finding the
information in one of the books on your desk).
• If the data is not found in the upper level, the request is called a
miss.
• The lower level in the hierarchy is then accessed to retrieve the
block containing the requested data.
• (Continuing our analogy, you go from your desk to the shelves to
find the desired book.)

Cache organization
• Since a cache holds a dynamically varying selection of
items from main memory, it must have storage for both
the data and the address at which the data is stored in
main memory. (remember this!)
• “Cache” means a safe place for hiding or storing things,
says the dictionary.
• Library example: the desk is also safe!! Books remain safe on
the table as well!
• We begin by looking at a very simple cache in which the
processor requests are each one word and the blocks
also consist of a single word.

Contd.,
• Figure RHS shows such a simple cache, before and after
requesting a data item that is not initially in the cache.
• Before the request, the cache contains a collection of
recent references X1, X2, . . . , Xn – 1, and the processor
requests a word Xn that is not in the cache.
• This request results in a miss, and the word Xn is brought
from memory into cache.
(Figure caption: This reference causes a miss that forces the cache
to fetch Xn from memory and insert it into the cache.)

Contd.,
• There are two questions to answer:
• How do we know if a data item is in the cache?
• Moreover, if it is, how do we find it?
• The answers to these two questions are related. If each word can
go in exactly one place in the cache, then it is straightforward to
find the word if it is in the cache.
• The simplest way to assign a location in the cache for each word in
memory is to assign the cache location based on the address of the
word in memory.
• This cache structure is called direct mapped, since each
memory location is mapped directly to exactly one
location in the cache. The typical mapping between
addresses and cache locations for a direct-mapped cache
is usually simple.

Contd.,
• For example, Figure LHS shows how the memory addresses
between 1 (binary 00001) and 29 (binary 11101) map to
locations 1 (binary 001) and 5 (binary 101) in a
direct-mapped cache of eight words.
• This is called Direct Mapping!
• Because each cache location can contain the contents of a
number of different memory locations, how do we know
whether the data in the cache corresponds to a requested
word?
• That is, how do we know whether a requested word is in
the cache or not?
• We answer this question by adding a set of tags to the
cache. The tags contain the address information required to
identify whether a word in the cache corresponds to the
requested word.
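As a quick check of the mapping described here, the following minimal sketch computes the cache location as the address modulo the number of cache blocks. The function name `cache_index` is illustrative, and the eight-word cache size is taken from the example.

```python
NUM_BLOCKS = 8  # direct-mapped cache of eight words, as in the example


def cache_index(address, num_blocks=NUM_BLOCKS):
    """Return the single cache location a memory address maps to."""
    return address % num_blocks


# Collect every address from 1 to 29 that lands in location 001 or 101.
maps_to_1 = [a for a in range(1, 30) if cache_index(a) == 0b001]
maps_to_5 = [a for a in range(1, 30) if cache_index(a) == 0b101]

print("location 001:", maps_to_1)
print("location 101:", maps_to_5)
```

Four different addresses share each location, which is why the cache also needs tags to tell them apart.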

Contd.,
• The tag needs only to
contain the upper portion
of the address,
corresponding to the bits
that are not used as an
index into the cache.
• For example, (See RHS)
we need only to have
the upper 2 of the 5
address bits in the tag,
since the lower 3-bit
index field of the
address selects the
block.
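The split of a 5-bit address into a 2-bit tag and a 3-bit index can be sketched with shifts and masks. The helper names `split_address` and `join_address` are illustrative, and the field widths are the ones used in this example only.

```python
INDEX_BITS = 3  # lower 3 bits select one of the 8 cache blocks


def split_address(address, index_bits=INDEX_BITS):
    """Split an address into (tag, index) fields."""
    index = address & ((1 << index_bits) - 1)  # keep the low index bits
    tag = address >> index_bits                # the remaining upper bits
    return tag, index


def join_address(tag, index, index_bits=INDEX_BITS):
    """Rebuild the full address by concatenating tag and index."""
    return (tag << index_bits) | index


tag, index = split_address(0b10010)
print(f"tag={tag:02b} index={index:03b}")
print("rebuilt address:", join_address(tag, index))
```

Storing only the tag is enough, because the index is implied by where the word sits in the cache.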

Contd.,
• We also need a way to recognize that a cache block does
not have valid information.
• For instance, when a processor starts up, the cache does
not have good data, and the tag fields will be
meaningless.
• Even after executing many instructions, some of the
cache entries may still be empty, as in Figure RHS.
• Thus, we need to know that the tag should be ignored for
such entries.
• The most common method is to add a valid bit to indicate
whether an entry contains a valid address.
• If the bit is not set, there cannot be a match for this block.
The cache is initially empty, with all valid bits (V entry in cache) turned off (N). The processor requests the
following binary addresses: 10110 (miss), 11010 (miss), 10110 (hit), 11010 (hit), 10000 (miss), 00011 (miss),
10000 (hit), and 10010 (miss). The figures show the cache contents after each miss in the sequence has been
handled. When address 10010 (18) is referenced, the entry for address 11010 (26) must be replaced, and a
reference to 11010 will cause a subsequent miss. The tag field will contain only the upper portion of the
address. The full address of a word contained in cache block i with tag field j for this cache is j × 8 + i, or
equivalently the concatenation of the tag field j and the index i. For example, in cache f above, index 010 has
tag 10 and corresponds to address 10010.
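The request sequence above can be replayed with a minimal simulation of an eight-block direct-mapped cache with valid bits. This is only a sketch: the class name `DirectMappedCache` and its fields are illustrative, and each block holds a single word as assumed in these slides.

```python
class DirectMappedCache:
    """Eight one-word blocks, each with a valid bit and a tag."""

    def __init__(self, num_blocks=8):
        self.num_blocks = num_blocks
        self.valid = [False] * num_blocks  # all valid bits start off (N)
        self.tags = [0] * num_blocks       # meaningless until valid is set

    def access(self, address):
        index = address % self.num_blocks  # lower bits pick the block
        tag = address // self.num_blocks   # upper bits identify the word
        if self.valid[index] and self.tags[index] == tag:
            return "hit"
        # Miss: fetch the word and overwrite whatever was in this block.
        self.valid[index] = True
        self.tags[index] = tag
        return "miss"


cache = DirectMappedCache()
requests = [0b10110, 0b11010, 0b10110, 0b11010,
            0b10000, 0b00011, 0b10000, 0b10010]
results = [cache.access(address) for address in requests]

for address, outcome in zip(requests, results):
    print(f"{address:05b} -> {outcome}")
```

After the last request, block 010 holds tag 10, so a later request for 11010 misses again.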

Contd.,
• So far, when we place a block in the cache, we have used a
simple placement scheme: A block can go in exactly one place in
the cache.
• It is direct mapping! We have seen it already!
• There is actually a whole range of schemes for placing blocks. At one
extreme is direct mapped, where a block can be placed in exactly one
location.
• At the other extreme is a scheme where a block can be placed in any
location in the cache.
• Such a scheme is called fully associative because a block in
memory may be associated with any entry in the cache.
• To find a given block in a fully associative cache, all the entries in the
cache must be searched because a block can be placed in any one.
• To make the search practical, it is done in parallel with a comparator
associated with each cache entry. These comparators significantly
increase the hardware cost, effectively making fully associative
placement practical only for caches with small numbers of blocks.
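For contrast, here is a minimal sketch of a fully associative lookup. The loop over all stored tags stands in for the parallel comparators. The slides do not name a replacement policy, so least-recently-used (LRU) eviction is an assumption here, and the class name `FullyAssociativeCache` is illustrative.

```python
class FullyAssociativeCache:
    """Any block may occupy any entry; every entry is checked on a lookup."""

    def __init__(self, num_blocks=8):
        self.num_blocks = num_blocks
        self.tags = []  # stored addresses, least recently used first

    def access(self, address):
        # Hardware compares all entries in parallel; here we scan them.
        for position, tag in enumerate(self.tags):
            if tag == address:
                # Assumed LRU policy: move the entry to the most-recent end.
                self.tags.append(self.tags.pop(position))
                return "hit"
        if len(self.tags) == self.num_blocks:
            self.tags.pop(0)  # evict the least recently used entry
        self.tags.append(address)
        return "miss"


cache = FullyAssociativeCache(num_blocks=8)
requests = [0b10110, 0b11010, 0b10110, 0b11010,
            0b10000, 0b00011, 0b10000, 0b10010]
results = [cache.access(address) for address in requests]
print(results)
print("11010 again:", cache.access(0b11010))
```

With eight entries and only five distinct addresses, nothing is evicted, so the repeated request for 11010 hits here.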

Contd.,
• The middle range of designs between direct mapped and
fully associative is called set associative.
• In a set-associative cache, there are a fixed number of
locations (at least two) where each block can be placed.
• A set-associative cache with n locations for a block is
called an n-way set-associative cache.
• An n-way set-associative cache consists of a number of
sets, each of which consists of n blocks. Each block in the
memory maps to a unique set in the cache given by the
index field, and a block can be placed in any element of
that set.
• Thus, a set associative placement combines direct-mapped
placement and fully associative placement: a block is directly
mapped into a set, and then all the blocks in the set are
searched for a match.

Contd.,
• Remember that in a direct-mapped cache, the position of
a memory block is given by
• (Block number) modulo (Number of cache blocks)
• In a set-associative cache, the set containing a memory
block is given by
• (Block number) modulo (Number of sets in the cache)
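The set-selection rule above can be sketched as a small n-way set-associative cache. The set is chosen by the modulo rule, and only the blocks in that set are searched. The configuration (4 sets, 2 ways), the LRU replacement within a set and the class name `SetAssociativeCache` are assumptions for illustration; the slides do not specify them.

```python
class SetAssociativeCache:
    """n-way set-associative cache with assumed LRU replacement per set."""

    def __init__(self, num_sets=4, ways=2):
        self.num_sets = num_sets
        self.ways = ways
        # Each set holds up to `ways` tags, least recently used first.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, block_number):
        set_index = block_number % self.num_sets  # direct-mapped to a set
        tag = block_number // self.num_sets
        entries = self.sets[set_index]
        if tag in entries:                        # search only this set
            entries.remove(tag)
            entries.append(tag)                   # mark as most recently used
            return "hit"
        if len(entries) == self.ways:
            entries.pop(0)                        # evict the LRU block of the set
        entries.append(tag)
        return "miss"


cache = SetAssociativeCache(num_sets=4, ways=2)
requests = [0b10110, 0b11010, 0b10110, 0b11010,
            0b10000, 0b00011, 0b10000, 0b10010]
results = [cache.access(block) for block in requests]
print(results)
```

In this configuration, addresses 10110, 11010 and 10010 all map to set 2, so the third one evicts the least recently used of the other two.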

PERIODICAL 2 PORTION IS COMPLETED.
Shriram K Vasudevan
