
Introduction To ARM

Uploaded by

Sasi Bhushan
Copyright
© © All Rights Reserved

UNIT – III: ARM ARCHITECTURE & PROGRAMMING MODEL

History, Architecture, ARM design philosophy, Registers, Program status register, Instruction pipeline, Interrupts and vector table, ARM
processor families, Instruction set: Data processing instructions, Addressing modes, Branch, Load-Store instructions, PSR instructions, and
Conditional instructions.
ARM Architecture:

• Load/store architecture
• A large array of uniform registers
• Fixed-length 32-bit instructions
• 3-address instructions
The ARM Architecture consists of
• Arithmetic Logic Unit
• Booth multiplier
• Barrel shifter
• Control unit
• Register file
The ARM processor also has other components, such as the program status register, which contains the processor flags (Z, S, V and C; S is the sign flag, called N elsewhere in these notes). The mode bits also reside in the program status register, along with the interrupt and fast interrupt disable bits. Some special registers are also used: the instruction register, the memory data read and write registers, and the memory address register.

Priority encoder:
The encoder is used in the multiple load and store instructions to select which register in the register file is to be loaded or stored.

Multiplexers:
Several multiplexers are used to manage the operation of the processor buses. Because of the limited project time, these components are implemented as behavioral models. Each component is described by an entity. Every entity has its own architecture, which can be optimized for particular requirements depending on its application. This makes the design easier to construct and maintain.

Arithmetic Logic Unit (ALU):


The ALU has two 32-bit inputs. The first comes from the register file, while the other comes from the shifter. The ALU outputs modify the status register flags. The V-bit output goes to the V flag, and the carry out (Cout) goes to the C flag. The most significant bit of the result represents the S flag, and the bits of the ALU output are NORed together to produce the Z flag. The ALU has a 4-bit function bus that allows up to 16 opcodes to be implemented.
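The flag wiring described above can be sketched in Python for a single ALU operation (addition). This is only an illustrative model, not the hardware design itself: the function name `alu_add` and the dictionary layout are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF

def alu_add(a, b):
    """Add two 32-bit values and return (result, flags). Hypothetical helper."""
    a &= MASK32
    b &= MASK32
    full = a + b                      # keep the 33rd bit so the carry is visible
    result = full & MASK32
    flags = {
        "S": (result >> 31) & 1,      # sign: most significant bit of the result
        "Z": int(result == 0),        # equivalent to NORing all 32 result bits
        "C": (full >> 32) & 1,        # carry out of bit 31
        # signed overflow: both operands share a sign that the result does not
        "V": ((~(a ^ b) & (a ^ result)) >> 31) & 1,
    }
    return result, flags
```

For instance, adding 1 to 0x7FFFFFFF sets S and V (the positive range overflowed), while adding 1 to 0xFFFFFFFF wraps to zero and sets Z and C.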

Booth Multiplier:

The multiplier has three 32-bit inputs, all of which come from the register file. The multiplier output is only the 32 least significant bits of the product. The entity representation of the multiplier is shown in the above block diagram. The multiplication starts whenever the start input goes active. The fin output goes high when the multiplication finishes.

Booth Algorithm:
The Booth algorithm is a well-known multiplication algorithm for 2's complement numbers. It treats positive and negative numbers uniformly. Moreover, runs of 0's or 1's in the multiplier are skipped over without any addition or subtraction being performed, which makes faster multiplication possible.
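A minimal Python sketch of radix-2 Booth recoding is given below to make the "skipping runs" idea concrete. It is a software model under stated assumptions, not the multiplier hardware: the name `booth_multiply`, the `bits` parameter and the returned operation count are all introduced for illustration.

```python
def booth_multiply(multiplicand, multiplier, bits=32):
    """Multiply using Booth recoding; return (product, add_sub_operations)."""
    q = multiplier & ((1 << bits) - 1)   # 2's complement bit pattern of the multiplier
    product = 0
    operations = 0
    previous = 0                         # the implicit bit to the right of bit 0
    for i in range(bits):
        current = (q >> i) & 1
        if current == 1 and previous == 0:
            product -= multiplicand << i   # a run of 1s starts: subtract
            operations += 1
        elif current == 0 and previous == 1:
            product += multiplicand << i   # a run of 1s ends: add
            operations += 1
        # inside a run of 0s or 1s nothing is added or subtracted
        previous = current
    return product, operations
```

With the multiplier 0b01110 (14) only two operations are needed, one subtraction where the run of 1s starts and one addition where it ends. A hardware multiplier such as the one described earlier would keep only the low 32 bits of the product.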

Barrel Shifter:
The barrel shifter has a 32-bit input to be shifted. This input comes from the register file or it may be immediate data. The shifter has several control inputs that come from the instruction register. The Shift field in the instruction controls the operation of the barrel shifter. This field indicates the kind of shift to be performed (logical left or right, arithmetic right or rotate right). The amount by which the register should be shifted is contained in an immediate field in the instruction, or it may be the lower 6 bits of a register in the register file.
The shift_val input bus is 6 bits wide, permitting shifts of up to 32 bits. The shift type input selects the required shift: 00, 01, 10 and 11 correspond to shift left, shift right, arithmetic shift right and rotate right, respectively. The barrel shifter is built mainly from multiplexers.
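The four shift types can be modelled with a short Python sketch. The 2-bit encoding follows the text above; the function name `barrel_shift` and the restriction of the shift amount to 0..31 are assumptions made to keep the example small.

```python
MASK32 = 0xFFFFFFFF

def barrel_shift(value, shift_type, amount):
    """Model a 32-bit barrel shifter; shift_type uses the 2-bit encoding above."""
    if not 0 <= amount <= 31:
        raise ValueError("this sketch only handles shift amounts 0..31")
    value &= MASK32
    if shift_type == 0b00:                       # logical shift left
        return (value << amount) & MASK32
    if shift_type == 0b01:                       # logical shift right
        return value >> amount
    if shift_type == 0b10:                       # arithmetic shift right
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK32       # sign bit fills the vacated bits
    if shift_type == 0b11:                       # rotate right
        return ((value >> amount) | (value << (32 - amount))) & MASK32
    raise ValueError("shift_type must be a 2-bit value")
```

Shifting 5 left by 2 gives 20, while an arithmetic shift right of 0x80000000 by 4 gives 0xF8000000 because the sign bit is copied into the vacated positions.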

Control Unit:
For any microprocessor, the control unit is the heart of the whole process and is responsible for system operation, so the control unit design is the most important part of the whole design. A control unit is sometimes designed as a purely combinational circuit. Here, the control unit is implemented as a simple state machine. The processor timing is also included in the control unit. Signals from the control unit are connected to each component in the processor to supervise its operation.

ARM design philosophy:


ARM, previously Advanced RISC Machine and originally Acorn RISC Machine, is a family of Reduced Instruction Set Computing (RISC) architectures for computer processors. The ARM processor core is a key component of many successful 32-bit embedded systems.
The RISC design philosophy
The design philosophy is aimed at delivering the following.
• Simple but powerful instructions
• Single-cycle execution at a high clock speed
• Intelligence in software rather than hardware
• Greater flexibility by reducing the complexity of instructions
The ARM core uses RISC architecture.
The RISC philosophy is implemented with four major design rules:
1. Instructions: RISC processors have a reduced number of instruction classes. These classes provide simple operations that can each execute in a single cycle. The compiler or programmer synthesizes complicated operations (for example, a divide operation) by combining several simple instructions. Each instruction is a fixed length to allow the pipeline to fetch future instructions before decoding the current instruction. In contrast, in CISC processors the instructions are often of variable size and take many cycles to execute.
2. Pipelines: The processing of instructions is broken down into smaller units that can be executed in parallel by pipelines. Ideally the pipeline advances by one step on each cycle for maximum throughput. There is no need for an instruction to be executed by a mini program called microcode, as on CISC processors.
3. Registers: RISC machines have a large general-purpose register set. Any register can contain either data or an address. In contrast, CISC processors have dedicated registers for specific purposes.
4. Load-store architecture: The processor operates on data held in registers. Separate load and store instructions transfer data between the register bank and external memory. In contrast, with a CISC design the data processing operations can act on memory directly.
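Rule 1 mentions that a divide is synthesized from simple instructions. One common way to do this is shift-and-subtract (restoring) division, sketched below in Python using only shifts, compares and subtracts. The routine name `divide_unsigned` is hypothetical and the sketch is not taken from any particular compiler's runtime library.

```python
def divide_unsigned(dividend, divisor, bits=32):
    """Unsigned division built from shifts, compares and subtracts only."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # bring down the next dividend bit, most significant first
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << i           # this quotient bit is 1
    return quotient, remainder
```

Each loop iteration corresponds to a handful of single-cycle data processing instructions, which is the trade-off the RISC philosophy accepts: more instructions, each of them simple.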
The ARM Design Philosophy
A number of physical features have driven the ARM processor design.
• Small size, to reduce power consumption and extend battery operation
• High code density
• Price sensitivity, with the ability to use slow, low-cost memory devices
• Reduced area of the die taken up by the embedded processor
• Hardware debug technology
• The ARM core is not a pure RISC architecture
Registers:
ARM processors provide general-purpose and special-purpose registers. Some additional registers are available in privileged execution modes. In all ARM processors, the following registers are available and accessible in any processor mode:
• 13 general-purpose registers R0-R12.
• One Stack Pointer (SP).
• One Link Register (LR).
• One Program Counter (PC).
• One Application Program Status Register (APSR).
The number of registers depends on the ARM version. According to the ARM Reference Manual, there are 30 general-purpose 32-bit registers, with the exception of ARMv6-M and ARMv7-M based processors. The first 16 registers are accessible in user-level mode; the additional registers are available in privileged software execution (with the exception of ARMv6-M and ARMv7-M). The rest of this section works with the registers that are accessible in any privilege mode: R0-R15. These 16 registers can be split into two groups: general-purpose and special-purpose registers.

R0-R12: can be used during common operations to store temporary values, pointers (locations in memory), etc. R0, for example, can be referred to as the accumulator during arithmetic operations or used for storing the result of a previously called function. R7 becomes useful while working with syscalls, as it stores the syscall number, and R11 helps us keep track of boundaries on the stack by serving as the frame pointer (covered later). Moreover, the function calling convention on ARM specifies that the first four arguments of a function are stored in the registers R0-R3.

R13: SP (Stack Pointer). The Stack Pointer points to the top of the stack. The stack is an area of memory used for function-specific storage, which is
reclaimed when the function returns. The stack pointer is therefore used for allocating space on the stack, by subtracting the value (in bytes) we want to
allocate from the stack pointer. In other words, if we want to allocate a 32 bit value, we subtract 4 from the stack pointer.

R14: LR (Link Register). When a function call is made, the Link Register is updated with the address of the instruction that follows the call. This allows the program to return to the "parent" function that initiated the "child" function call after the "child" function has finished.

R15: PC (Program Counter). The Program Counter is automatically incremented by the size of the instruction executed. This size is always 4 bytes in ARM state and 2 bytes in Thumb state.
When a branch instruction is executed, the PC holds the destination address. During execution, a read of the PC returns the address of the current instruction plus 8 (two ARM instructions) in ARM state, and the address of the current instruction plus 4 (two Thumb instructions) in Thumb (v1) state. This is different from x86, where the PC always points to the next instruction to be executed.
Current Program Status Register:
The Current Program Status Register (CPSR) holds the same program status flags as the APSR, and some additional information.
The CPSR holds:
The APSR flags.
The processor mode.
The interrupt disable flags.
The instruction set state (ARM, Thumb, ThumbEE, or Jazelle®).
The endianness state (on ARMv4T and later).
The execution state bits for the IT block (on ARMv6T2 and later).

The Current Program Status Register is a 32-bit register used in the ARM architecture to record information about the state of the program being executed and the state of the processor. This information is recorded by setting or clearing specific bits in the register. The top four bits (bits 31, 30, 29, and 28) are the condition code (cc) bits and are of most interest to us. Condition code bits are sometimes referred to as "flags". The lowest 8 bits (bit 7 through to bit 0) store information about the processor's own state. The remaining bits (bit 27 to bit 8) are unused on early ARM processors; later architecture versions assign some of them, such as the Q, J, GE, IT, E and A bits.
The N bit is the "negative flag" and indicates that a value is negative.
The Z bit is the "zero flag" and is set when an appropriate instruction produces a zero result.
The C bit is the "carry flag", but it can also be used to indicate "borrows" (from subtraction operations) and "extends" (from shift instructions).
The V bit is the "overflow flag" which is set if an instruction produces a result that overflows and hence may go beyond the range of numbers that can be
represented in 2's complement signed format.
For completeness, the other state bits are:
The I and F bits, which determine whether interrupts (such as requests for input/output) are enabled or disabled.
The T bit, which indicates whether the processor is in "Thumb" state, where the processor executes a subset of the instruction set encoded as compact 16-bit instructions. As Thumb code packs more instructions into the same amount of memory, it is an effective solution for applications where physical memory is at a premium.
The M4 to M0 bits, which are the mode bits. Application programs normally run in user mode (where the mode bits are 10000). Whenever an interrupt or similar event occurs, the processor switches into one of the alternative modes, allowing the software handler greater privileges with regard to memory manipulation.
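The bit layout described above can be illustrated with a small Python decoder. Only the user-mode encoding (10000) comes from the text; the other mode encodings in `MODES` are the standard ARM values and are included as an assumption for illustration, as is the function name `decode_cpsr`.

```python
# Mode field encodings; only User (10000) is stated in the notes above.
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}

def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into the fields described above."""
    return {
        "N": (cpsr >> 31) & 1,   # negative
        "Z": (cpsr >> 30) & 1,   # zero
        "C": (cpsr >> 29) & 1,   # carry
        "V": (cpsr >> 28) & 1,   # overflow
        "I": (cpsr >> 7) & 1,    # 1 = IRQ disabled
        "F": (cpsr >> 6) & 1,    # 1 = FIQ disabled
        "T": (cpsr >> 5) & 1,    # 1 = Thumb state
        "mode": MODES.get(cpsr & 0x1F, "unknown"),
    }
```

For example, `decode_cpsr(0x600000D3)` reports Z and C set, both interrupt types disabled, ARM state, and Supervisor mode.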

Program Status Registers:
At any given moment, you have access to 16 registers (R0-R15) and the Current Program Status Register (CPSR). In User mode, a restricted form of the
CPSR called the Application Program Status Register (APSR) is accessed instead.
The Current Program Status Register (CPSR) is used to store:
• The APSR flags.
• The current processor mode.
• Interrupt disable flags.
• The current processor state, that is, ARM, Thumb, ThumbEE, or Jazelle.
• The endianness.
• Execution state bits for the IT block.
The Program Status Registers (PSRs) form an additional set of banked registers. Each exception mode has its own Saved Program Status Register (SPSR)
where a copy of the pre-exception CPSR is stored automatically when an exception occurs. These are not accessible from User mode. Application
programmers must use the APSR to access the parts of the CPSR that can be changed in unprivileged mode. The APSR must be used only to access the N,
Z, C, V, Q, and GE[3:0] bits. These bits are not normally accessed directly, but instead set by condition code setting instructions and tested by instructions
that are executed conditionally. For example, the CMP R0, R1 instruction compares the values of R0 and R1 and sets the zero flag (Z) if R0 and R1 are
equal.
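The CMP example can be followed through in a short Python sketch: `cmp_flags` models the flag update for CMP R0, R1, and `condition_passes` models how a conditionally executed instruction tests those flags. Both names are hypothetical, and only a subset of the condition codes is listed.

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(r0, r1):
    """Flags produced by CMP R0, R1 (computes R0 - R1 and discards the result)."""
    r0 &= MASK32
    r1 &= MASK32
    result = (r0 - r1) & MASK32
    return {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(r0 >= r1),      # on ARM, C is set when the subtraction does NOT borrow
        "V": (((r0 ^ r1) & (r0 ^ result)) >> 31) & 1,
    }

# A subset of the ARM condition codes, expressed as tests on the flags.
CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
    "CS": lambda f: f["C"] == 1,
    "CC": lambda f: f["C"] == 0,
    "MI": lambda f: f["N"] == 1,
    "PL": lambda f: f["N"] == 0,
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],
}

def condition_passes(cond, flags):
    """Return True if an instruction with this condition suffix would execute."""
    return CONDITIONS[cond](flags)
```

After `cmp_flags(5, 5)` the Z flag is set, so an instruction such as MOVEQ would execute while MOVNE would be skipped.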

The individual bits represent the following:


• N – Negative result from ALU.
• Z – Zero result from ALU.
• C – ALU operation Carry out.
• V – ALU operation overflowed.
• Q – cumulative saturation (also described as sticky).
• J – indicates whether the core is in Jazelle state.
• GE[3:0] – used by some SIMD instructions.
• IT [7:2] – If-Then conditional execution of Thumb-2 instruction groups.
• E bit controls load/store endianness.
• A bit disables asynchronous aborts.
• I bit disables IRQ.
• F bit disables FIQ.
• T bit – indicates whether the core is in Thumb state.
• M[4:0] – specifies the processor mode (FIQ, IRQ, and so on).

Instruction pipeline
The process of fetching the next instruction while the current instruction is being executed is called "pipelining". Pipelining is supported by the processor to increase the speed of program execution and the throughput. In a pipeline, several operations take place simultaneously rather than serially.
The pipeline has three stages: fetch, decode and execute, as shown in the figure.

The three stages used in the pipeline are:


(i) Fetch: In this stage the ARM processor fetches the instruction from memory.
(ii) Decode: In this stage the processor identifies the instruction that is to be executed.
(iii) Execute: In this stage the processor processes the instruction and writes the result back to the desired register.
If these three stages of execution are overlapped, a higher speed of execution is achieved. Such a pipeline exists in the ARM7 processor family. Once the pipeline is filled, each instruction requires one cycle to complete execution. The figure below shows a three-stage pipelined instruction sequence.
• In the first cycle, the processor fetches instruction 1 from memory.
• In the second cycle, the processor fetches instruction 2 from memory and decodes instruction 1.
• In the third cycle, the processor fetches instruction 3 from memory, decodes instruction 2 and executes instruction 1.
• In the fourth cycle, the processor fetches instruction 4, decodes instruction 3 and executes instruction 2.
• Each instruction thus takes three cycles to pass through the pipeline, but once the pipeline is full it delivers a throughput of one instruction per cycle.
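The cycle-by-cycle behaviour listed above can be reproduced with a minimal Python sketch of an ideal three-stage pipeline (no stalls). The function name `pipeline_schedule` and the instruction labels are assumptions for the example.

```python
def pipeline_schedule(instructions):
    """Return one dict per clock cycle showing what each stage holds."""
    stages = ("fetch", "decode", "execute")
    total_cycles = len(instructions) + len(stages) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(stages):
            index = cycle - depth            # instruction currently in this stage
            in_range = 0 <= index < len(instructions)
            row[stage] = instructions[index] if in_range else None
        schedule.append(row)
    return schedule

for number, row in enumerate(pipeline_schedule(["I1", "I2", "I3", "I4"]), start=1):
    print(number, row)
```

Four instructions finish in six cycles rather than twelve: after the two cycles needed to fill the pipeline, one instruction leaves the execute stage on every cycle.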
In the case of a multi-cycle instruction, as shown in the figure below, instruction 2 (an STR store instruction) requires 4 clock cycles and hence the pipeline stalls for one clock pulse. The first instruction completes execution in the third clock pulse, while the second instruction, instead of completing in the fourth clock pulse, completes in the fifth. Thereafter every instruction completes execution in one clock pulse, as seen in the figure.

The amount of work done at each stage can be reduced by increasing the number of stages in the pipeline. The processor can then be operated at a higher frequency, which improves performance. Because more cycles are required to fill the pipeline, the system latency also increases. Data dependencies between instructions in different stages also become more likely as the number of pipeline stages increases, so instructions need to be scheduled when writing code to reduce data dependencies.
Interrupts and vector table
When an exception or interrupt occurs, the processor sets the pc to a specific memory address. The address is within a special address range called the
vector table. The entries in the vector table are instructions that branch to specific routines designed to handle a particular exception or interrupt. The
memory map address 0x00000000 is reserved for the vector table, a set of 32-bit words. On some processors the vector table can be optionally located at a
higher address in memory (starting at the offset 0xffff0000). Operating systems such as Linux and Microsoft’s embedded products can take advantage of
this feature. When an exception or interrupt occurs, the processor suspends normal execution and starts loading instructions from the exception vector table.
Each vector table entry contains a form of branch instruction pointing to the start of a specific routine:
• Reset vector is the location of the first instruction executed by the processor when power is applied. This instruction branches to the initialization code.
• Undefined instruction vector is used when the processor cannot decode an instruction.
• Software interrupt vector is called when you execute a SWI instruction. The SWI instruction is frequently used as the mechanism to invoke an operating system routine.
• Prefetch abort vector occurs when the processor attempts to fetch an instruction from an address without the correct access permissions. The actual abort occurs in the decode stage.
• Data abort vector is similar to a prefetch abort but is raised when an instruction attempts to access data memory without the correct access permissions.
• Interrupt request vector is used by external hardware to interrupt the normal execution flow of the processor. It can only be raised if IRQs are not masked in the cpsr.
Interrupt vector Table:
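As a sketch, the vector table can be written as a lookup from exception type to address. The offsets below are the standard ARM values rather than something stated in the text above, and they include the fast interrupt (FIQ) and reserved entries; the names `VECTOR_OFFSETS` and `vector_address` are hypothetical.

```python
# Standard ARM exception vector offsets (assumed here, not taken from the notes).
VECTOR_OFFSETS = {
    "Reset": 0x00,
    "Undefined instruction": 0x04,
    "Software interrupt": 0x08,
    "Prefetch abort": 0x0C,
    "Data abort": 0x10,
    "Reserved": 0x14,
    "Interrupt request (IRQ)": 0x18,
    "Fast interrupt request (FIQ)": 0x1C,
}

def vector_address(exception, high_vectors=False):
    """Address the pc is set to; high_vectors selects the 0xffff0000 base."""
    base = 0xFFFF0000 if high_vectors else 0x00000000
    return base + VECTOR_OFFSETS[exception]
```

Each entry is one 32-bit word, so consecutive vectors are 4 bytes apart; with the table at its high location, an IRQ sets the pc to 0xffff0018.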

ARM processor families


ARM has designed a number of processors that are grouped into different families according to the core they use. The families are based on the ARM7,
ARM9, ARM10, and ARM11 cores. The postfix numbers 7, 9, 10, and 11 indicate different core designs. The ascending number equates to an increase in
performance and sophistication. ARM8 was developed but was soon superseded.
Comparison of attributes between the ARM7, ARM9, ARM10, and ARM11 cores. The numbers quoted can vary greatly and are directly dependent upon
the type and geometry of the manufacturing process, which has a direct effect on the frequency (MHz) and power consumption (watts)

ARM7 Family
The ARM7 core has a Von Neumann–style architecture, where both data and instructions use the same bus. The core has a three-stage pipeline and
executes the architecture ARMv4T instruction set.

ARM9 Family
The ARM9 family was announced in 1997. Because of its five-stage pipeline, the ARM9 processor can run at higher clock frequencies than the ARM7
family. The extra stages improve the overall performance of the processor. The memory system has been redesigned to follow the Harvard architecture,
which separates the data D and instruction I buses.
ARM10 Family
The ARM10, announced in 1999, was designed for performance. It extends the ARM9 pipeline to six stages. It also supports an optional vector floating-point (VFP) unit, which adds a seventh stage to the ARM10 pipeline. The VFP significantly increases floating-point performance and is compliant with the IEEE 754-1985 floating-point standard.
ARM11 Family
The ARM1136J-S, announced in 2003, was designed for high-performance and power-efficient applications. ARM1136J-S was the first processor implementation to execute architecture ARMv6 instructions. It incorporates an eight-stage pipeline with separate load-store and arithmetic pipelines. Included in the ARMv6 instructions are single instruction multiple data (SIMD) extensions for media processing, specifically designed to increase video processing performance.
Specialized Processors
StrongARM was originally co-developed by Digital Semiconductor and is now exclusively licensed by Intel Corporation. It has been popular for PDAs and applications that require performance with low power consumption. It is a Harvard architecture with separate data (D) and instruction (I) caches. StrongARM was the first high-performance ARM processor to include a five-stage pipeline, but it does not support the Thumb instruction set. Intel's XScale is a follow-on product to the StrongARM and offers dramatic increases in performance. At the time of writing, XScale was quoted as being able to run at up to 1 GHz. XScale executes architecture v5TE instructions. It is a Harvard architecture and is similar to the StrongARM, as it also includes an MMU. SC100 is at the other end of the performance spectrum. It is designed specifically for low-power security applications. The SC100 is the first SecurCore and is based on an ARM7TDMI core with an MPU. This core is small and has low voltage and current requirements, which makes it attractive for smart card applications.
Addressing modes:
When accessing an operand for a data processing or movement instruction, there are several standard techniques used to specify the desired location. Most
processors support several of these addressing modes
1. Immediate addressing: the desired value is presented as a binary value in the instruction.
2. Absolute addressing: the instruction contains the full binary address of the desired value in memory.
3. Indirect addressing: the instruction contains the binary address of a memory location that contains the binary address of the desired value.
4. Register addressing: the desired value is in a register, and the instruction contains the register number.
5. Register indirect addressing: the instruction contains the number of a register which contains the address of the value in memory.
6. Base plus offset addressing: the instruction specifies a register (the base) and a binary offset to be added to the base to form the memory address.
7. Base plus index addressing: the instruction specifies a base register and another register (the index) which is added to the base to form the memory
address.
8. Base plus scaled index addressing: as above, but the index is multiplied by a constant (usually the size of the data item, and usually a power of two)
before being added to the base.
9. Stack addressing: an implicit or specified register (the stack pointer) points to an area of memory (the stack) where data items are written (pushed) or
read (popped) on a last-in-first-out basis.
Memory is addressed by generating the Effective Address (EA) of the operand by adding a signed offset to the contents of a base register Rn.
Pre-indexed mode:
• EA is the sum of the contents of the base register Rn and an offset value.
Pre-indexed with writeback:
• EA is generated the same way as in pre-indexed mode.
• EA is written back into Rn.
Post-indexed mode:
• EA is the contents of Rn.
• The offset is then added to this address and the result is written back to Rn.
Relative addressing mode:
• The Program Counter (PC) is used as the base register.
• It is a pre-indexed addressing mode with an immediate offset.
No absolute addressing mode is available in the ARM processor.
The offset is specified as:
• An immediate value in the instruction itself.
• The contents of a register specified in the instruction.
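The three indexing modes differ only in which address is used for the access and whether the base register changes. A minimal Python sketch, with the hypothetical function name `effective_address` and mode strings chosen for the example:

```python
MASK32 = 0xFFFFFFFF

def effective_address(rn, offset, mode):
    """Return (address used for the access, value left in Rn afterwards)."""
    if mode == "pre":                    # e.g. [Rn, #offset]
        return (rn + offset) & MASK32, rn
    if mode == "pre_writeback":          # e.g. [Rn, #offset]!
        ea = (rn + offset) & MASK32
        return ea, ea
    if mode == "post":                   # e.g. [Rn], #offset
        return rn, (rn + offset) & MASK32
    raise ValueError("unknown mode: " + mode)
```

With Rn = 0x1000 and an offset of 4, pre-indexed access uses 0x1004 and leaves Rn unchanged, writeback uses 0x1004 and updates Rn to 0x1004, and post-indexed uses 0x1000 and then updates Rn to 0x1004. The offset is signed, so a negative value walks backwards through memory.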
Addressing modes of ARM processor are classified as follows:

Addressing modes for Data Processing Operand (i.e op1):


There are two methods for addressing these operands.
Unmodified value: In this addressing mode, the register or value is given unmodified, i.e. without any shift or rotation.
e.g. MOV R0, #0x1234 This instruction moves the immediate constant value 1234H into register R0. (In the classic ARM encoding an immediate must be an 8-bit value rotated right by an even amount, so for this particular constant an assembler needs MOVW or a literal load such as LDR R0, =0x1234.)
Modified value: In this addressing mode, the given value or register is shifted or rotated. The different shift and rotate operations possible are listed below with examples.
(1) Logical shift left: This takes the value of a register and shifts it towards the most significant bits by n bits.
e.g. MOV R0, R1, LSL #2
After the execution of this instruction R0 will hold the value of R1 shifted left by 2 bits.
(2) Logical shift right: This takes the value of a register and shifts it to the right by n bits.
e.g. MOV R0, R1, LSR R2 After the execution of this instruction R0 will hold the value of R1 shifted right by the number of bits given in R2. R1 and R2 are not altered.
(3) Arithmetic shift right: This is similar to logical shift right, except that the MSB (the sign bit) is copied into the vacated bit positions, so the sign of the value is preserved.
e.g. MOV R0, R1, ASR #2 After the execution of this instruction R0 will hold the value of R1 arithmetically shifted right by 2 bits.
(4) Rotate right: This takes the value of a register and rotates it right by n bits.
e.g. MOV R0, R1, ROR R2 After the execution of this instruction R0 will hold the value of R1 rotated right by the number of bits given in R2.
(5) Rotate right extended: This is similar to rotate right by one bit, with the carry flag moved into the MSB, i.e. it is a rotate right through carry.
e.g. MOV R0, R1, RRX After the execution of this instruction R0 will hold the value of register R1 rotated right through carry by 1 bit.
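Rotate right extended is the least intuitive of the five operations, so here is a minimal Python sketch of it. The function name `rrx` is hypothetical; the returned carry-out is the bit that would go to the C flag if the instruction carried the S suffix.

```python
MASK32 = 0xFFFFFFFF

def rrx(value, carry_in):
    """Rotate right by one bit through the carry flag; return (result, carry_out)."""
    value &= MASK32
    carry_out = value & 1                            # bit 0 leaves the register
    result = ((carry_in & 1) << 31) | (value >> 1)   # old carry enters at bit 31
    return result, carry_out
```

For example, with R1 = 0x00000003 and the carry flag set, the result is 0x80000001 and the bit shifted out is 1.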
Addressing Modes for Memory Access Operand
As already discussed load and store instructions are used to access memory. The different memory access addressing modes are
(i) Register indirect addressing mode
(ii) Relative register indirect addressing mode
(iii) Base indexed indirect addressing mode
(iv) Base with scaled register addressing mode
Each of these addressing modes supports offset addressing, pre-index addressing and post-index addressing, as explained in the examples for each addressing mode.
(i) Register indirect addressing mode: In this addressing mode, a register gives the address of the memory location to be accessed.
e.g. LDR R0, [R1] This instruction loads register R0 with the 32-bit word at the memory address held in register R1.
(ii) Relative register indirect addressing mode: In this addressing mode the memory address is generated by adding an immediate value to a register. Pre-index and post-index are supported in this addressing mode.
e.g. (a) LDR R0, [R1, #4]
This instruction loads register R0 with the word at the memory address calculated by adding the constant value 4 to the address contained in register R1.
e.g. (b) LDR R0, [R1, #4]!
This is pre-index addressing. This instruction is the same as that in e.g. (a), but it also places the new address in R1, i.e. R1 ← R1 + 4.
e.g. (c) LDR R0, [R1], #4
This is post-index addressing. This instruction loads register R0 with the word at the memory address given in register R1. It then calculates the new address by adding 4 to R1 and places this new address in R1.
(iii) Base indexed indirect addressing mode: In this addressing mode the memory address is generated by adding the values of two registers. Pre-index and post-index are also supported in this addressing mode.
e.g. (a) LDR R0, [R1, R2]
This instruction loads register R0 with the word at the memory address calculated by adding register R1 to register R2.
e.g. (b) LDR R0, [R1, R2]!
This is pre-index addressing. This instruction is the same as that in e.g. (a), but it also places the new address in R1, i.e. R1 ← R1 + R2.
e.g. (c) LDR R0, [R1], R2
This is post-index addressing. This instruction loads register R0 with the word at the memory address given in register R1. It then calculates the new address by adding the value in register R2 to register R1 and places this new address in R1.
(iv) Base with scaled register addressing mode: In this addressing mode the memory address is generated by adding a register value to another register shifted left. Pre-index and post-index are supported in this addressing mode.
e.g. (a) LDR R0, [R1, R2, LSL #2]
This instruction loads register R0 with the word at the memory address calculated by adding register R1 to register R2 shifted left by 2 bits.
e.g. (b) LDR R0, [R1, R2, LSL #2]!
This is pre-indexed addressing. This instruction loads register R0 with the word at the memory address calculated by adding register R1 to register R2 shifted left by 2 bits. The new address is placed in register R1, i.e. R1 ← R1 + (R2 << 2).
e.g. (c) LDR R0, [R1], R2, LSL #2
This is post-indexed addressing. This instruction loads register R0 with the word at the memory address contained in register R1. It then calculates the new address by adding register R1 to register R2 shifted left by two bits. The new address is placed in register R1.

Describe the data processing instructions of ARM

Data Processing Instructions: The data processing instructions manipulate data within registers.
They are :
Move instructions
Arithmetic instructions
Logical instructions
Comparison instructions
Multiply instructions
Most data processing instructions can process one of their operands using the barrel shifter. If you use the S suffix on a data processing instruction, then it updates the flags in the CPSR. Move and logical operations update the carry flag C, negative flag N, and zero flag Z. The carry flag is set from the result of the barrel shift as the last bit shifted out. The N flag is set to bit 31 of the result. The Z flag is set if the result is zero.

Move Instructions:
The MOV instruction copies N into a destination register Rd, where N is a register or immediate value.
This instruction is useful for setting initial values and transferring data between registers.
Syntax: <instruction> {<cond>} {S} Rd, N

The second operand N for all data processing instructions is a register Rm or a constant preceded by #.
Example: This example shows a simple move instruction.
The MOV instruction takes the contents of register r5 and copies them into register r7, in this case, taking the value 5, and overwriting the value 8 in
register r7.
PRE r5 = 5; r7 = 8
MOV r7, r5 ; let r7 = r5
POST r5 = 5; r7 = 5

Barrel Shifter:
In the above example we showed a MOV instruction where N is a simple register. But N can be more than just a register or immediate value; it can also be a register Rm that has been preprocessed by the barrel shifter prior to being used by a data processing instruction. Data processing instructions are processed within the arithmetic logic unit (ALU). A unique and powerful feature of the ARM processor is the ability to shift the 32-bit binary pattern in one of the source registers left or right by a specific number of positions before it enters the ALU. Preprocessing or shift occurs within the cycle time of the instruction. This is particularly useful for loading constants into a register and achieving fast multiplies or division by a power of 2. The figure below shows the data flow between the ALU and the barrel shifter.

Register Rn enters the ALU without any pre-processing of registers.


Example: Apply a logical shift left (LSL) to register Rm before moving it to the destination register. The MOV instruction copies the shift
operator result N into register Rd. N represents the result of the LSL operation.
PRE r5 = 5; r7 = 8
MOV r7, r5, LSL #2 ; let r7 = r5*4 = (r5 << 2)
POST r5 = 5; r7 = 20
The example multiplies register r5 by four and then places the result into register r7. The five different shift operations that you can use within the barrel
shifter are summarized in the table below.

The table below lists the syntax for the different barrel shift operations available on data processing instructions. The second operand N can be an
immediate constant preceded by #, a register value Rm, or the value of Rm processed by a shift.

Example: This example of a MOVS instruction shifts register r1 left by one bit. This multiplies register r1 by 2^1 = 2. As you can see, the C
flag is updated in the cpsr because the S suffix is present in the instruction mnemonic.
PRE cpsr = nzcvqiFt_USER
r0 = 0x00000000
r1 = 0x80000004
MOVS r0, r1, LSL #1
POST cpsr = nzCvqiFt_USER
r0 = 0x00000008
r1 = 0x80000004
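
The flag behaviour in the MOVS example above can be modelled in a few lines of Python. This is only an illustrative sketch and not ARM code: the helper names lsl_with_carry and movs_lsl are invented for this note, and only LSL with shift amounts 0 to 31 is covered.

```python
MASK32 = 0xFFFFFFFF  # keep every value inside 32 bits, like an ARM register


def lsl_with_carry(value, amount):
    """Logical shift left; returns (result, carry_out).

    carry_out is the last bit shifted out of bit 31, or None when the
    shift amount is 0 (the C flag is then left unchanged).
    """
    if not 0 <= amount <= 31:
        raise ValueError("this sketch only handles shift amounts 0..31")
    value &= MASK32
    if amount == 0:
        return value, None
    carry_out = (value >> (32 - amount)) & 1
    return (value << amount) & MASK32, carry_out


def movs_lsl(rm, amount, carry_in=0):
    """Model of MOVS Rd, Rm, LSL #amount; returns (rd, flags)."""
    rd, carry_out = lsl_with_carry(rm, amount)
    flags = {
        "N": (rd >> 31) & 1,  # bit 31 of the result
        "Z": int(rd == 0),    # result is zero
        "C": carry_in if carry_out is None else carry_out,
    }
    return rd, flags
```

With r1 = 0x80000004 and a shift of one, movs_lsl(0x80000004, 1) returns 0x00000008 with C set, which matches the POST state of the example.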

Describe about Arithmetic Instructions of ARM

The arithmetic instructions implement addition and subtraction of 32-bit signed and unsigned values.

Example: This simple subtract instruction subtracts a value stored in register r2 from a value stored in register r1. The result is stored in register
r0.
PRE r0 = 0x00000000;
r1 = 0x00000002
r2 = 0x00000001
SUB r0, r1, r2
POST r0 = 0x00000001
Example: This reverse subtract instruction (RSB) subtracts r1 from the constant value #0, writing the result to r0. This instruction is useful for negating
numbers.
PRE r0 = 0x00000000;
r1 = 0x00000077

RSB r0, r1, #0; Rd = 0x0 - r1
POST r0 = -r1 = 0xFFFFFF89

Example: The SUBS instruction is useful for decrementing loop counters. In this example we subtract the immediate value one from the value
one stored in register r1. The result value zero is written to register r1. The cpsr is updated with the ZC flags being set.
PRE cpsr = nzcviFt_USER
r1 = 0x00000001
SUBS r1, r1, #1
POST cpsr = nZCviFt_USER
r1 = 0x00000000
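
The three subtract examples above can be checked with a small Python model. This is a sketch under the assumption that registers are plain 32-bit unsigned values; the function names subs and rsb are illustrative only. Note that for a subtraction the C flag means "no borrow", which is why SUBS r1, r1, #1 with r1 = 1 sets both Z and C.

```python
MASK32 = 0xFFFFFFFF


def subs(rn, op2):
    """Model of SUBS Rd, Rn, op2; returns (rd, flags)."""
    rn &= MASK32
    op2 &= MASK32
    rd = (rn - op2) & MASK32
    flags = {
        "N": (rd >> 31) & 1,
        "Z": int(rd == 0),
        "C": int(rn >= op2),  # C = NOT borrow for subtraction
        # signed overflow: operands differ in sign and result sign differs from rn
        "V": (((rn ^ op2) & (rn ^ rd)) >> 31) & 1,
    }
    return rd, flags


def rsb(rn, op2):
    """Model of RSB Rd, Rn, op2: reverse subtract, Rd = op2 - Rn."""
    return (op2 - rn) & MASK32
```

For instance, rsb(0x77, 0) yields 0xFFFFFF89, the two's complement negation shown in the RSB example.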

Using the Barrel Shifter with Arithmetic Instructions:


Example: Register r1 is first shifted one location to the left to give the value of twice r1. The ADD instruction then adds the result of the barrel
shift operation to register r1. The final result transferred into register r0 is equal to three times the value stored in register r1.

PRE r0 = 0x00000000;
r1 = 0x00000005
ADD r0, r1, r1, LSL #1
POST r0 = 0x0000000F
r1 = 0x00000005
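
The same trick extends to other small constants. The Python sketch below models ADD and RSB with an LSL-shifted second operand to multiply by 2^n + 1 and 2^n - 1; the helper names are hypothetical and overflow simply wraps at 32 bits.

```python
MASK32 = 0xFFFFFFFF


def add_lsl(rn, rm, shift):
    """Model of ADD Rd, Rn, Rm, LSL #shift: Rd = Rn + (Rm << shift)."""
    return (rn + ((rm << shift) & MASK32)) & MASK32


def rsb_lsl(rn, rm, shift):
    """Model of RSB Rd, Rn, Rm, LSL #shift: Rd = (Rm << shift) - Rn."""
    return (((rm << shift) & MASK32) - rn) & MASK32


r1 = 5
times_three = add_lsl(r1, r1, 1)  # r1 + 2*r1
times_seven = rsb_lsl(r1, r1, 3)  # 8*r1 - r1
```

Here times_three is 15 (0x0000000F), the value in the example, and times_seven is 35.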

Describe About Logical instructions of ARM

Logical Instructions:

Logical instructions perform bitwise logical operations on the two source registers.

Example: Shows a logical OR operation between registers r1 and r2. r0 holds the result.
PRE r0 = 0x00000000;
r1 = 0x02040608
r2 = 0x10305070
ORR r0, r1, r2
POST r0 = 0x12345678
Example: Shows a more complicated logical instruction called BIC, which carries out a logical bit clear.
PRE r1 = 0b1111;
r2 = 0b0101
BIC r0, r1, r2
POST r0 = 0b1010
This is equivalent to Rd = Rn AND NOT(N)
The logical instructions update the cpsr flags only if the S suffix is present. These instructions can use barrel-shifted second operands in the same way as
the arithmetic instructions.
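
As a quick check of the ORR and BIC results, both operations map directly onto Python's bitwise operators. This is a minimal sketch; the function names are illustrative and values are masked to 32 bits.

```python
MASK32 = 0xFFFFFFFF


def orr(rn, n):
    """Model of ORR Rd, Rn, N: bitwise OR."""
    return (rn | n) & MASK32


def bic(rn, n):
    """Model of BIC Rd, Rn, N: Rd = Rn AND NOT(N), clearing the bits set in N."""
    return rn & ~n & MASK32
```

orr(0x02040608, 0x10305070) gives 0x12345678, and bic(0b1111, 0b0101) gives 0b1010: every bit that is 1 in the second operand is cleared in the result.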

Describe About Conditional instructions of ARM

Comparison Instructions:
The comparison instructions are used to compare or test a register with a 32-bit value. They update the cpsr flag bits according to the result, but do not
affect other registers. These instructions always update the flags, so there is no need to apply the S suffix.

Example: This example shows a CMP comparison instruction. You can see that both registers, r0 and r9, are equal before executing the
instruction. The value of the z flag prior to execution is 0 and is represented by a lowercase z. After execution the z flag changes to 1 or an
uppercase Z. This change indicates equality.
PRE cpsr = nzcviFt_USER
r0 = 4;
r9 = 4
CMP r0, r9
POST cpsr = nZcviFt_USER
The CMP instruction is effectively a subtract instruction with the result discarded.
The TST instruction is a logical AND operation.
The TEQ instruction is a logical exclusive OR operation.
For each, the results are discarded but the condition bits are updated in the cpsr.
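
The three comparison instructions can be modelled as the corresponding operation with only the flags kept. The Python below is a sketch with invented function names; for simplicity TST and TEQ report only N and Z here (on real hardware C can also be affected through the barrel shifter).

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(rn, n):
    """CMP Rn, N: flags of Rn - N, result discarded."""
    rn &= MASK32
    n &= MASK32
    result = (rn - n) & MASK32
    return {
        "N": (result >> 31) & 1,
        "Z": int(result == 0),
        "C": int(rn >= n),  # no borrow
        "V": (((rn ^ n) & (rn ^ result)) >> 31) & 1,
    }


def tst_flags(rn, n):
    """TST Rn, N: flags of Rn AND N, result discarded."""
    result = rn & n & MASK32
    return {"N": (result >> 31) & 1, "Z": int(result == 0)}


def teq_flags(rn, n):
    """TEQ Rn, N: flags of Rn XOR N, result discarded."""
    result = (rn ^ n) & MASK32
    return {"N": (result >> 31) & 1, "Z": int(result == 0)}
```

cmp_flags(4, 4) reports Z = 1, as in the CMP example, while tst_flags(r, mask) reports Z = 1 exactly when none of the mask bits are set in r.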

Describe About Multiply Instructions of ARM

Multiply Instructions:
The multiply instructions multiply the contents of a pair of registers and, depending upon the instruction, accumulate the result with another register. The
long multiplies accumulate onto a pair of registers representing a 64-bit value. The final result is placed in a destination register or a pair of registers.

Example: This example shows a simple multiply instruction that multiplies registers r1 and r2 together and places the result into register r0.
PRE r0 = 0x00000000;
r1 = 0x00000002;
r2 = 0x00000002
MUL r0, r1, r2; r0 = r1*r2

POST r0 = 0x00000004;
r1 = 0x00000002;
r2 = 0x00000002
The long multiply instructions (SMLAL, SMULL, UMLAL, and UMULL) produce a 64-bit result. The result is too large to fit a single 32-bit register so
the result is placed in two registers labeled RdLo and RdHi. RdLo holds the lower 32 bits of the 64-bit result, and RdHi holds the higher 32 bits of the
64-bit result.
Example: Shows an example of a long unsigned multiply instruction. The instruction multiplies registers r2 and r3 and places the result into
registers r0 and r1. Register r0 contains the lower 32 bits, and register r1 contains the higher 32 bits of the 64-bit result.
PRE r0 = 0x00000000;
r1 = 0x00000000;
r2 = 0xF0000002;
r3 = 0x00000002
UMULL r0, r1, r2, r3 ; [r1,r0] = r2*r3
POST r0 = 0xe0000004 ; = RdLo
r1 = 0x00000001 ; = RdHi
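
The split of a 64-bit product into RdLo and RdHi can be sketched in Python, whose integers are unbounded. The function names umull, smull and to_signed32 are illustrative only, and both functions return the pair in (RdLo, RdHi) order.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def umull(rm, rs):
    """Model of UMULL RdLo, RdHi, Rm, Rs (unsigned 32 x 32 -> 64)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32


def smull(rm, rs):
    """Model of SMULL RdLo, RdHi, Rm, Rs (signed 32 x 32 -> 64)."""
    product = (to_signed32(rm) * to_signed32(rs)) & MASK64
    return product & MASK32, product >> 32
```

umull(0xF0000002, 2) returns (0xE0000004, 0x00000001), the r0 and r1 values in the example. The signed variant differs only when an operand has bit 31 set.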

Describe about Branching Instructions ARM


Branch Instructions:
A branch instruction changes the flow of execution or is used to call a routine. The change of execution flow forces the program counter pc to point to a
new address.

The address label is stored in the instruction as a signed pc-relative offset and must be within approximately 32 MB of the branch instruction. T refers to the
Thumb bit in the cpsr. When instructions set T, the ARM switches to Thumb state.
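
The 32 MB limit follows from how the offset is stored. The sketch below assumes the standard ARM-state encoding, which is not spelled out in these notes: a signed 24-bit word offset, relative to the address of the branch plus 8 (the pipeline means pc reads two instructions ahead). The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def encode_branch_offset(instr_addr, target):
    """Return the 24-bit offset field for a B/BL at instr_addr to target."""
    offset = target - (instr_addr + 8)  # pc is two instructions ahead
    if offset % 4 != 0:
        raise ValueError("ARM branch targets must be word aligned")
    if not -(1 << 25) <= offset < (1 << 25):
        raise ValueError("target is outside the +/-32 MB branch range")
    return (offset >> 2) & 0xFFFFFF


def decode_branch_target(instr_addr, imm24):
    """Recover the target address from the 24-bit offset field."""
    if imm24 & 0x800000:  # sign-extend the 24-bit field
        imm24 -= 1 << 24
    return (instr_addr + 8 + (imm24 << 2)) & MASK32
```

A 24-bit signed word offset shifted left by two gives a 26-bit signed byte offset, which is where the range of roughly plus or minus 32 MB comes from.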

Example: This example shows a forward and backward branch.

Example: The branch with link, or BL, instruction is similar to the B instruction but overwrites the link register lr with a return address. It
performs a subroutine call.

The branch exchange (BX) and branch exchange with link (BLX) are the third type of branch instruction. They are primarily used to branch to and from Thumb
code.

Describe about Load-Store instructions

Load-Store Instructions:
• Load-store instructions transfer data between memory and processor registers.
• There are three types of load-store instructions:
   • single-register transfer
   • multiple-register transfer
   • swap

Single-Register Transfer:
These instructions are used for moving a single data item in and out of a register.
The data types supported are signed and unsigned words (32-bit), half words (16-bit), and bytes.

Example: LDR r0, [r1]


STR r0, [r1]
The first instruction loads a word from the address stored in register r1 and places it into register r0. The second instruction goes the other way by storing
the contents of register r0 to the address contained in register r1. Register r1 is called the base address register.

Single-Register Load-Store Addressing Modes:


The ARM instruction set provides different modes for addressing memory.
These modes incorporate one of the indexing methods:
Preindex with write back,
Preindex
Postindex
Preindex with write back: It calculates an address from a base register plus address offset and then updates that address base register with the new address.
Preindex: It calculates an address from a base register plus address offset but does not update the address base register.
Postindex: It only updates the address base register after the address is used.
Note: The pre-index mode is useful for accessing an element in a data structure. The post index and pre index with write back modes are useful
for traversing an array.

The address offset can be provided in the instruction in one of three forms:
Immediate: It means the address is calculated using the base address register and a 12-bit offset encoded in the instruction.
Register: It means the address is calculated using the base address register and a specific register’s contents.
Scaled: It means the address is calculated using the base address register and a barrel shift operation.

Example: Index addressing modes

PRE r0 = 0x00000000;
r1 = 0x00009000
mem32 [0x00009000] = 0x01010101
mem32 [0x00009004] = 0x02020202
Preindexing with write back:
LDR r0, [r1, #4]!
POST (1) r0 = 0x02020202;
r1 = 0x00009004
Preindexing:
LDR r0, [r1, #4]
POST (2) r0 = 0x02020202;
r1 = 0x00009000
Postindexing:
LDR r0, [r1], #4
POST (3) r0 = 0x01010101;
r1 = 0x00009004
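
The difference between the three index methods is easiest to see in a small model. The Python below is a sketch only: registers and memory are plain dictionaries, the function names are made up, and the base register r1 is assumed to start at 0x00009000, the address of the first memory word.

```python
def ldr(regs, mem, rd, rn, offset, mode):
    """Model of a word load using one of the three index methods."""
    base = regs[rn]
    if mode == "preindex_writeback":  # LDR rd, [rn, #offset]!
        address = base + offset
        regs[rn] = address            # base register is updated
    elif mode == "preindex":          # LDR rd, [rn, #offset]
        address = base + offset       # base register is left alone
    elif mode == "postindex":         # LDR rd, [rn], #offset
        address = base                # access first ...
        regs[rn] = base + offset      # ... then update the base register
    else:
        raise ValueError("unknown index method: " + mode)
    regs[rd] = mem[address]
    return regs


def initial_state():
    """PRE state: r0 = 0, r1 = base address, two words in memory."""
    regs = {"r0": 0x00000000, "r1": 0x00009000}
    mem = {0x00009000: 0x01010101, 0x00009004: 0x02020202}
    return regs, mem
```

Calling ldr on a fresh initial_state() with each of the three modes and an offset of 4 reproduces POST (1), POST (2) and POST (3) respectively.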
Table below shows the addressing modes available for load and store of a 32-bit word or an unsigned byte. A signed offset or register is denoted by “+/−”,
identifying that it is either a positive or negative offset from the base address register Rn. The base address register is a pointer to a byte in memory, and the
offset specifies a number of bytes.

Table below provides an example of the different variations of the LDR instruction. Table below shows the addressing modes available on load and store
instructions using 16-bit half word or signed byte data.

There are no STRSB or STRSH instructions since STRH stores both signed and unsigned half words; similarly STRB stores both signed and unsigned bytes.

Table below shows the variations for STRH instructions.

Multiple-Register Transfer:
Load-store multiple instructions can transfer multiple registers between memory and the processor in a single instruction.
The transfer occurs from a base address register Rn pointing into memory.
Load-store multiple instructions can increase interrupt latency.
ARM implementations do not usually interrupt instructions while they are executing.
If an interrupt has been raised, then it has no effect until the load-store multiple instruction is complete.

Table below shows the different addressing modes for the load-store multiple instructions.

Here N is the number of registers in the list of registers. The base register Rn determines the source or destination address for a load store multiple
instruction.
This register can be optionally updated following the transfer when register Rn is followed by the ‘!’ character.
Example: Register r0 is the base register Rn and is followed by !, indicating that the register is updated after the
instruction is executed.

The decrement versions DA and DB of the load-store multiple instructions decrement the start address and then store to ascending memory locations. This
is equivalent to descending memory but accessing the register list in reverse order.

Example: This example shows an STM increment before instruction followed by an LDM decrement after instruction.
PRE
r0 = 0x00009000
r1 = 0x00000009
r2 = 0x00000008
r3 = 0x00000007
STMIB r0!, {r1-r3}
MOV r1, #1
MOV r2, #2
MOV r3, #3
PRE(2) r0 = 0x0000900c
r1 = 0x00000001
r2 = 0x00000002
r3 = 0x00000003
LDMDA r0!, {r1-r3}

POST r0 = 0x00009000
r1 = 0x00000009
r2 = 0x00000008
r3 = 0x00000007
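
The four addressing modes (IA, IB, DA, DB) differ only in where the block starts and how the base register moves. The following Python sketch models that, using dictionaries for registers and memory; all names are illustrative, and the lowest-numbered register always goes to the lowest address.

```python
def block_addresses(base, count, mode):
    """Return (ascending word addresses, updated base) for an LDM/STM."""
    size = 4 * count
    if mode == "IA":    # increment after
        start, new_base = base, base + size
    elif mode == "IB":  # increment before
        start, new_base = base + 4, base + size
    elif mode == "DA":  # decrement after
        start, new_base = base - size + 4, base - size
    elif mode == "DB":  # decrement before
        start, new_base = base - size, base - size
    else:
        raise ValueError("unknown addressing mode: " + mode)
    return [start + 4 * i for i in range(count)], new_base


def _ordered(reglist):
    # lowest register number is transferred to/from the lowest address
    return sorted(reglist, key=lambda name: int(name[1:]))


def stm(regs, mem, rn, reglist, mode, writeback=True):
    addresses, new_base = block_addresses(regs[rn], len(reglist), mode)
    for name, address in zip(_ordered(reglist), addresses):
        mem[address] = regs[name]
    if writeback:
        regs[rn] = new_base


def ldm(regs, mem, rn, reglist, mode, writeback=True):
    addresses, new_base = block_addresses(regs[rn], len(reglist), mode)
    for name, address in zip(_ordered(reglist), addresses):
        regs[name] = mem[address]
    if writeback:
        regs[rn] = new_base
```

Running stm with mode "IB" from r0 = 0x9000 and then ldm with mode "DA" walks the same three addresses (0x9004 to 0x900c) and returns r0 to 0x9000, which is why STMIB and LDMDA pair up in the example.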
Example: Load-store multiple instructions used for a block memory copy. This is a simple routine that copies blocks of 32 bytes from a source address
location to a destination address location. The example has two load-store multiple instructions, which use the same increment after addressing mode.

; r9 points to start of source data
; r10 points to start of destination data
; r11 points to end of the source
loop
; load 32 bytes from source and update r9 pointer
LDMIA r9!, {r0-r7}
; store 32 bytes to destination and update r10 pointer
STMIA r10!, {r0-r7} ; and store them
; have we reached the end
CMP r9, r11
BNE loop
CMP and BNE compare pointers r9 and r11 to check whether the end of the block copy has been reached.
If the block copy is complete, then the routine finishes; otherwise the loop repeats with the updated values of
register r9 and r10. The BNE is the branch instruction B with a condition mnemonic NE (not equal).
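
The control flow of the copy loop can be mirrored in Python. This is a sketch with memory modelled as a dictionary of word addresses and a hypothetical function name; like the assembly, the body runs before the end test, so the source length is assumed to be a positive multiple of 32 bytes.

```python
def block_copy(mem, src, dst, end):
    """Copy words from [src, end) to dst, 32 bytes (8 words) per pass."""
    if end <= src or (end - src) % 32 != 0:
        # the assembly loop would never see r9 == r11 in this case
        raise ValueError("source length must be a positive multiple of 32 bytes")
    r9, r10, r11 = src, dst, end
    while True:
        block = [mem[r9 + 4 * i] for i in range(8)]  # LDMIA r9!, {r0-r7}
        r9 += 32
        for i, word in enumerate(block):             # STMIA r10!, {r0-r7}
            mem[r10 + 4 * i] = word
        r10 += 32
        if r9 == r11:                                # CMP r9, r11 / BNE loop
            break
    return r9, r10
```

The explicit length check is an addition of this sketch; the assembly routine relies on the caller to pass pointers that satisfy it.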

Describe about Program Status Register instructions


Program Status Register Instructions:
The ARM instruction set provides two instructions to directly control a program status register (psr).
The MRS instruction transfers the contents of either the cpsr or spsr into a register.
The MSR instruction transfers the contents of a register into the cpsr or spsr.

In the syntax you can see a label called fields. This can be any combination of control (c), extension (x), status (s), and flags (f).
These fields relate to particular byte regions in a psr, as shown in the figure below.

The MRS first copies the cpsr into register r1. The BIC instruction clears bit 7 of r1.
Register r1 is then copied back into the cpsr, which enables IRQ interrupts.
PRE cpsr = nzcvqIFt_SVC
MRS r1, cpsr
BIC r1, r1, #0x80 ; 0b10000000
MSR cpsr_c, r1
POST cpsr = nzcvqiFt_SVC
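
The read-modify-write sequence can be sketched in Python by treating the cpsr as a 32-bit integer. The byte regions assumed for the fields are control = bits 7:0, extension = bits 15:8, status = bits 23:16 and flags = bits 31:24; the function names are invented for this note.

```python
MASK32 = 0xFFFFFFFF
I_BIT = 1 << 7  # IRQ disable bit (the 'I' in nzcvqIFt)

FIELD_MASKS = {
    "c": 0x000000FF,  # control
    "x": 0x0000FF00,  # extension
    "s": 0x00FF0000,  # status
    "f": 0xFF000000,  # flags
}


def msr(psr, value, fields):
    """Model of MSR psr_<fields>, Rm: only the named byte regions change."""
    mask = 0
    for field in fields:
        mask |= FIELD_MASKS[field]
    return ((psr & ~mask) | (value & mask)) & MASK32


def enable_irq(cpsr):
    r1 = cpsr                  # MRS r1, cpsr
    r1 = r1 & ~I_BIT & MASK32  # BIC r1, r1, #0x80
    return msr(cpsr, r1, "c")  # MSR cpsr_c, r1
```

Starting from 0x000000D3 (I and F set, SVC mode), enable_irq returns 0x00000053: only bit 7 changes, and because only the control field is written the condition flags cannot be disturbed.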

Describe about Conditional instructions of ARM


Conditional Execution:
Most ARM instructions are conditionally executed—you can specify that the instruction only executes if the condition code flags pass a given condition or
test. By using conditional execution instructions you can increase performance and code density. The condition field is a two-letter mnemonic appended to
the instruction mnemonic. The default mnemonic is AL, or always execute.
Conditional execution depends upon two components:
The condition field: located in the instruction (bits 31 to 28)
The condition flags: located in the cpsr (bits 31 to 28)

Example: This example shows an ADD instruction with the EQ condition appended. This instruction will only be executed when the zero flag in the cpsr is
set to 1.
; r0 = r1 + r2 if zero flag is set
ADDEQ r0, r1, r2
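
How a condition mnemonic is tested against the N, Z, C and V flags can be written out as a lookup table. The Python below is a sketch: the table follows the standard ARM condition codes as I understand them, and the function names condition_passed and addeq are illustrative.

```python
MASK32 = 0xFFFFFFFF

# Each entry maps a two-letter mnemonic to a test on (N, Z, C, V).
CONDITIONS = {
    "EQ": lambda n, z, c, v: z,                 # equal
    "NE": lambda n, z, c, v: not z,             # not equal
    "CS": lambda n, z, c, v: c,                 # carry set / unsigned >=
    "CC": lambda n, z, c, v: not c,             # carry clear / unsigned <
    "MI": lambda n, z, c, v: n,                 # negative
    "PL": lambda n, z, c, v: not n,             # positive or zero
    "VS": lambda n, z, c, v: v,                 # overflow
    "VC": lambda n, z, c, v: not v,             # no overflow
    "HI": lambda n, z, c, v: c and not z,       # unsigned >
    "LS": lambda n, z, c, v: (not c) or z,      # unsigned <=
    "GE": lambda n, z, c, v: n == v,            # signed >=
    "LT": lambda n, z, c, v: n != v,            # signed <
    "GT": lambda n, z, c, v: (not z) and n == v,  # signed >
    "LE": lambda n, z, c, v: z or n != v,       # signed <=
    "AL": lambda n, z, c, v: True,              # always
}


def condition_passed(cond, flags):
    test = CONDITIONS[cond]
    return bool(test(flags["N"], flags["Z"], flags["C"], flags["V"]))


def addeq(regs, rd, rn, rm, flags):
    """Model of ADDEQ rd, rn, rm: the add happens only if Z is set."""
    if condition_passed("EQ", flags):
        regs[rd] = (regs[rn] + regs[rm]) & MASK32
    return regs
```

When the condition fails the instruction behaves like a no-op, so short if/else sequences can be written without branches.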

Describe About Instructions that used for Loading Constants in ARM


There is no single ARM instruction that can move an arbitrary 32-bit constant into a register, because every ARM instruction is itself only 32 bits long. To
aid programming there are two pseudo instructions to move a 32-bit value into a register.

The first pseudo instruction writes a 32-bit constant to a register using whatever instructions are available. The second pseudo instruction writes a relative
address into a register, which will be encoded using a pc relative expression.
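
As background that is not stated in these notes: a data processing immediate is encoded as an 8-bit value rotated right by an even number of bit positions, so only some 32-bit constants fit in a single MOV or MVN. The Python sketch below tests that property; the function names are hypothetical and the strategy function only approximates what an assembler typically does for a constant-loading pseudo instruction.

```python
MASK32 = 0xFFFFFFFF


def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= MASK32
    for rotation in range(0, 32, 2):
        # rotating left by 'rotation' undoes a rotate right by 'rotation'
        undone = ((value << rotation) | (value >> (32 - rotation))) & MASK32
        if undone < 256:
            return True
    return False


def load_strategy(value):
    """Pick a plausible way to get value into a register (assumed policy)."""
    if is_arm_immediate(value):
        return "MOV"
    if is_arm_immediate(~value & MASK32):
        return "MVN"           # move the bitwise complement
    return "literal pool"      # fall back to a pc-relative load from memory
```

For example, 0xFF000000 is encodable but 0x00000101 is not, since its set bits span nine positions.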
Describe About Software Interrupt Instructions of ARM
Software Interrupt Instruction:
A software interrupt instruction (SWI) causes a software interrupt exception, which provides a mechanism for applications to call operating system
routines.

When the processor executes an SWI instruction, it sets the program counter pc to the offset 0x8 in the vector table. The instruction also forces the
processor mode to SVC, which allows an operating system routine to be called in a privileged mode. Each SWI instruction has an associated SWI number,
which is used to represent a particular function call or feature.

Example: An SWI call with SWI number 0x123456, used by ARM toolkits as a debugging SWI.
PRE
cpsr = nzcVqift_USER
pc = 0x00008000
lr = 0x003fffff; lr = r14
r0 = 0x12
0x00008000 SWI 0x123456
POST
cpsr = nzcVqIft_SVC
spsr = nzcVqift_USER
pc = 0x00000008
lr = 0x00008004
r0 = 0x12
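
The state change in this example can be modelled in Python. The sketch assumes the standard ARM encodings, which are not given in these notes: mode bits 4:0 of the cpsr (User = 0b10000, SVC = 0b10011), I = bit 7, T = bit 5, and the SWI number held in bits 23:0 of the instruction. Names such as take_swi are invented for illustration.

```python
MASK32 = 0xFFFFFFFF
MODE_MASK = 0x1F
MODE_SVC = 0x13
I_BIT = 1 << 7
T_BIT = 1 << 5
SWI_VECTOR = 0x00000008


def swi_number(instruction):
    """Extract the 24-bit SWI number from an ARM SWI instruction word."""
    return instruction & 0x00FFFFFF


def take_swi(cpsr, swi_address):
    """Return the processor state after an SWI executed at swi_address."""
    # switch to SVC mode in ARM state with IRQs disabled
    new_cpsr = ((cpsr & ~(MODE_MASK | T_BIT)) | MODE_SVC | I_BIT) & MASK32
    return {
        "spsr_svc": cpsr,            # old cpsr is saved
        "lr_svc": swi_address + 4,   # return address: instruction after the SWI
        "cpsr": new_cpsr,
        "pc": SWI_VECTOR,
    }
```

An SWI handler typically reads the instruction at lr - 4 and applies swi_number to it to find out which service was requested.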

Describe about Instructions used for Stack operation in ARM

Stack Operations:
The ARM architecture uses the load-store multiple instructions to carry out stack operations.
The pop operation (removing data from a stack) uses a load multiple instruction.
The push operation (placing data onto the stack) uses a store multiple instruction.
When using a stack you have to decide whether the stack will grow up or down in memory.
A stack is either ascending (A) or descending (D).
Ascending stacks grow towards higher memory addresses.
Descending stacks grow towards lower memory addresses.
In a full stack (F), the stack pointer sp points to an address that is the last used or full location (i.e., sp points to the last item on the stack).
In an empty stack (E), the sp points to an address that is the first unused or empty location (i.e., it points after the last item on the stack).
There are a number of load-store multiple addressing mode aliases available to support stack operations.
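
As one concrete case, a full descending (FD) stack pushes with a decrement-before store and pops with an increment-after load. The Python below is a sketch of that pairing, with memory as a dictionary and hypothetical function names.

```python
def push_fd(mem, sp, values):
    """Push onto a full descending stack (like a decrement-before store).

    values are stored in ascending address order starting at the new sp.
    Returns the updated stack pointer.
    """
    sp -= 4 * len(values)            # stack grows towards lower addresses
    for i, value in enumerate(values):
        mem[sp + 4 * i] = value
    return sp                        # sp points at the last item pushed (full)


def pop_fd(mem, sp, count):
    """Pop from a full descending stack (like an increment-after load).

    Returns (values, updated stack pointer).
    """
    values = [mem[sp + 4 * i] for i in range(count)]
    return values, sp + 4 * count
```

Because sp always points at a used location, a pop reads at sp first and then moves it up, while a push moves sp down first and then writes.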

ARM has specified an ARM-Thumb Procedure Call Standard (ATPCS) that defines how routines are called and how registers are allocated.

Describe About Swap Instructions used in ARM
Swap Instruction:
The swap instruction is a special case of a load-store instruction. It swaps the contents of memory with the contents of a register.
This instruction is an atomic operation—it reads and writes a location in the same bus operation, preventing any other instruction from reading or writing
to that location until it completes.

Example: The swap instruction loads a word from memory into register r0 and overwrites the memory with register r1.
PRE
mem32 [0x9000] = 0x12345678
r0 = 0x00000000
r1 = 0x11112222
r2 = 0x00009000
SWP r0, r1, [r2]
POST
mem32 [0x9000] = 0x11112222
r0 = 0x12345678
r1 = 0x11112222
r2 = 0x00009000
This instruction is particularly useful when implementing semaphores and mutual exclusion in an operating system.
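
A simple lock built on swap can be sketched as follows. This Python model only shows the logic: it is not atomic the way the hardware instruction is, memory is a dictionary, and the names swp, try_lock and unlock are invented here. The convention assumed is 0 = free, 1 = taken.

```python
def swp(mem, new_value, address):
    """Model of SWP Rd, Rm, [Rn]: return the old word and store the new one."""
    old_value = mem[address]
    mem[address] = new_value
    return old_value


def try_lock(mem, address):
    """Attempt to take the lock; True if this caller now owns it."""
    # Writing 1 is harmless if the lock was already taken: it stays 1.
    return swp(mem, 1, address) == 0


def unlock(mem, address):
    mem[address] = 0
```

The key point is that the test (reading the old value) and the set (writing 1) happen as one indivisible step on real hardware, so two contenders can never both read 0.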
