Introduction To ARM
History, Architecture, ARM design philosophy, Registers, Program status register, Instruction pipeline, Interrupts and vector table, ARM
processor families, Instruction set: Data processing instructions, Addressing modes, Branch, Load-Store instructions, PSR instructions, and
Conditional instructions.
ARM Architecture:
• Load/store architecture
• A large array of uniform registers
• Fixed-length 32-bit instructions
• 3-address instructions
The ARM Architecture consists of
Arithmetic Logic Unit
Booth multiplier
Barrel shifter
Control unit
Register file
The ARM processor also has other components, such as the program status register, which contains the processor flags (N, Z, C and V). The mode bits
also reside in the program status register, along with the interrupt (IRQ) and fast interrupt (FIQ) disable bits. Some special registers are also used,
such as the instruction register, the memory data read and write registers, and the memory address register.
Priority encoder:
The priority encoder is used by the load and store multiple instructions to indicate which register in the register file is to be loaded or stored.
Multiplexers:
Several multiplexers are used to control the operation of the processor buses. Because of the limited project time, these components are implemented
as behavioral models. Each component is described by an entity. Every entity has its own architecture, which can be optimized
for particular requirements depending on its application. This makes the design easier to construct and maintain.
Booth Algorithm:
The Booth algorithm is a well-known multiplication algorithm for 2’s complement numbers. It treats positive and negative numbers uniformly.
Moreover, runs of 0’s or 1’s in the multiplier are skipped over without any addition or subtraction being performed, which makes faster
multiplication possible.
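The behavior described above can be sketched in software. The following is a minimal Python model of radix-2 Booth multiplication, not a description of the actual ARM multiplier hardware; the function name booth_multiply and the register names acc, q and q_minus1 are illustrative assumptions.

```python
def booth_multiply(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Multiply two signed `bits`-wide integers using radix-2 Booth recoding."""
    mask = (1 << bits) - 1
    m = multiplicand & mask      # M, the multiplicand
    q = multiplier & mask        # Q, the multiplier
    acc = 0                      # A, the accumulator
    q_minus1 = 0                 # extra bit to the right of Q
    for _ in range(bits):
        pair = (q & 1, q_minus1)
        if pair == (1, 0):       # start of a run of 1s: A = A - M
            acc = (acc - m) & mask
        elif pair == (0, 1):     # end of a run of 1s: A = A + M
            acc = (acc + m) & mask
        # pairs (0, 0) and (1, 1) lie inside a run: no add or subtract
        # arithmetic shift right of the combined register A:Q:Q-1
        q_minus1 = q & 1
        q = (q >> 1) | ((acc & 1) << (bits - 1))
        sign = acc >> (bits - 1)
        acc = (acc >> 1) | (sign << (bits - 1))
    product = (acc << bits) | q          # 2*bits-wide two's complement value
    if product >> (2 * bits - 1):        # reinterpret as a signed integer
        product -= 1 << (2 * bits)
    return product
```

For example, booth_multiply(3, -4) works through the multiplier bit pairs and yields the signed product of the two operands. This sketch keeps the accumulator at the operand width, so the most negative multiplicand would need one extra accumulator bit.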
Barrel Shifter:
The barrel shifter has a 32-bit input to be shifted. This input comes from the register file or may be immediate data. The shifter has
other control inputs that come from the instruction register. The shift field in the instruction controls the operation of the barrel shifter. This
field indicates the type of shift to be performed (logical left or right, arithmetic right, or rotate right). The amount by which the register should be shifted
is contained in an immediate field in the instruction, or it may be taken from the lower 6 bits of a register in the register file.
The shift_val input bus is 6 bits wide, permitting shifts of up to 32 bits. The shift type values 00, 01, 10 and 11 correspond to shift left,
shift right, arithmetic shift right and rotate right, respectively. The barrel shifter is built mainly from multiplexers.
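As a rough illustration of the four shift types, the sketch below models the shifter in Python. The function name barrel_shift and the convention of returning the last bit shifted out as a carry are assumptions made for this example; only shift amounts from 0 to 32 are modeled.

```python
MASK32 = 0xFFFFFFFF

def barrel_shift(value: int, shift_type: int, amount: int):
    """Return (result, carry_out); carry_out is None when nothing is shifted."""
    if not 0 <= amount <= 32:
        raise ValueError("this sketch models shift amounts 0..32 only")
    value &= MASK32
    if amount == 0:
        return value, None
    if shift_type == 0b00:                       # logical shift left
        carry = (value >> (32 - amount)) & 1
        return (value << amount) & MASK32, carry
    if shift_type == 0b01:                       # logical shift right
        carry = (value >> (amount - 1)) & 1
        return value >> amount, carry
    if shift_type == 0b10:                       # arithmetic shift right
        signed = value - (1 << 32) if value >> 31 else value
        carry = (signed >> (amount - 1)) & 1
        return (signed >> amount) & MASK32, carry
    if shift_type == 0b11:                       # rotate right
        amount %= 32
        result = ((value >> amount) | (value << (32 - amount))) & MASK32
        return result, result >> 31
    raise ValueError("shift_type must be a 2-bit value")
```

Calling barrel_shift(0x80000004, 0b00, 1) shifts the value left by one bit and reports the bit that fell off the top as the carry.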
Control Unit:
For any microprocessor, the control unit is the heart of the whole design: it is responsible for the operation of the system, so its design is the most
important part of the whole design. A control unit is sometimes a purely combinational circuit. Here, the control unit is implemented as a simple
state machine. The processor timing is also included in the control unit. Signals from the control unit are connected to each component in
the processor to supervise its operation.
cycle. The compiler or programmer synthesizes complicated operations (a divide operation) by combining several simple instructions. Each instruction is a
fixed length to allow the pipeline to fetch future instructions before decoding the current instruction. In contrast, in CISC processors the instructions are
often of variable size and take many cycles to execute.
2. Pipelines —The processing of instructions is broken down into smaller units that can be executed in parallel by pipelines. Ideally the pipeline advances by
one step on each cycle for maximum throughput. There is no need for an instruction to be executed by a mini program called microcode as on CISC
processors.
3. Registers—RISC machines have a large general-purpose register set. Any register can contain either data or an address. In contrast, CISC processors have
dedicated registers for specific purposes.
4. Load-store architecture--The processor operates on data held in registers. Separate load and store instructions transfer data between the register bank and
external memory. In contrast, with a CISC design the data processing operations can act on memory directly.
The ARM Design Philosophy
There are a number of physical features that have driven the ARM processor design.
Small to reduce power consumption and extend battery operation
High code density
Price sensitivity, which leads to the use of slow, low-cost memory devices.
Reduce the area of the die taken up by the embedded processor.
Hardware debug technology
ARM core is not a pure RISC architecture
Register of ARM
Registers:
ARM processors provide general-purpose and special-purpose registers. Some additional registers are available in privileged execution modes. In all ARM
processors, the following registers are available and accessible in any processor mode:
13 general-purpose registers R0-R12.
One Stack Pointer (SP).
One Link Register (LR).
One Program Counter (PC).
One Application Program Status Register (APSR).
The number of registers depends on the ARM version. According to the ARM Reference Manual, there are 30 general-purpose 32-bit registers, with the
exception of ARMv6-M and ARMv7-M based processors. The first 16 registers are accessible in user-level mode; the additional registers are available in
privileged software execution (with the exception of ARMv6-M and ARMv7-M). In this tutorial series we will work with the registers that are accessible in
any privilege mode: r0-r15. These 16 registers can be split into two groups: general-purpose and special-purpose registers.
R0-R12: can be used during common operations to store temporary values, pointers (locations in memory), etc. R0, for example, can be referred to as the
accumulator during arithmetic operations, or used for storing the result of a previously called function. R7 becomes useful while working with syscalls, as it
stores the syscall number, and R11 helps us keep track of boundaries on the stack by serving as the frame pointer (covered later). Moreover, the
function calling convention on ARM specifies that the first four arguments of a function are stored in the registers r0-r3.
R13: SP (Stack Pointer). The Stack Pointer points to the top of the stack. The stack is an area of memory used for function-specific storage, which is
reclaimed when the function returns. The stack pointer is therefore used for allocating space on the stack, by subtracting the value (in bytes) we want to
allocate from the stack pointer. In other words, if we want to allocate a 32 bit value, we subtract 4 from the stack pointer.
R14: LR (Link Register). When a function call is made, the Link Register is updated with the memory address of the instruction that follows the
call. This allows the program to return to the “parent” function that initiated the “child” function call after the “child” function has
finished.
R15: PC (Program Counter). The Program Counter is automatically incremented by the size of the instruction executed. This size is always 4 bytes in ARM
state and 2 bytes in Thumb state.
When a branch instruction is being executed, the PC holds the destination address. During execution, PC stores the address of the current instruction plus 8
(two ARM instructions) in ARM state, and the current instruction plus 4 (two Thumb instructions) in Thumb(v1) state. This is different from x86 where PC
always points to the next instruction to be executed.
Current Program Status Register:
The Current Program Status Register (CPSR) holds the same program status flags as the APSR, and some additional information.
The CPSR holds:
The APSR flags.
The processor mode.
The interrupt disable flags.
The instruction set state (ARM, Thumb, ThumbEE, or Jazelle®).
The endianness state (on ARMv4T and later).
The execution state bits for the IT block (on ARMv6T2 and later).
The Current Program Status Register is a 32-bit wide register used in the ARM architecture to record various pieces of information regarding the state of the
program being executed by the processor and the state of the processor. This information is recorded by setting or clearing specific bits in the register. The
top four bits (bits 31, 30, 29, and 28) are the condition code (cc) bits and are of most interest to us. Condition code bits are sometimes referred to as "flags".
The lowest 8 bits (bit 7 through to bit 0) store information about the processor's own state. The remaining bits (i.e. bit 27 to bit 8) are currently unused in
most ARM processors.
The N bit is the "negative flag" and indicates that a value is negative.
The Z bit is the "zero flag" and is set when an appropriate instruction produces a zero result.
The C bit is the "carry flag" but it can also be used to indicate "borrows" (from subtraction operations) and "extends" (from shift instructions).
The V bit is the "overflow flag" which is set if an instruction produces a result that overflows and hence may go beyond the range of numbers that can be
represented in 2's complement signed format.
For completeness, the other state bits are:
The I and F bits which determine whether interrupts (such as requests for input/output) are enabled or disabled.
The T bit which indicates whether the processor is in "Thumb" mode, where the processor can execute a subset of the assembly language as 16-bit compact
instructions. As Thumb code packs more instructions into the same amount of memory, it is an effective solution to applications where physical memory is
at a premium.
The M4 to M0 bits are the mode bits. Application programs normally run in user mode (where the mode bits are 10000). Whenever an interrupt or similar
event occurs, the processor switches into one of the alternative modes allowing the software handler greater privileges with regard to memory manipulation.
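The bit layout described above can be made concrete with a small decoder. This is a minimal sketch: the function name describe_cpsr is hypothetical, the output imitates the nzcvqiFt_USER notation used in the examples later in these notes (uppercase means the bit is set), and the Q bit at position 27 and the mode encodings other than user mode (10000) are taken from the ARM architecture rather than from the text above.

```python
# Mode field values M4..M0; only user mode (0b10000) is stated in the notes.
MODES = {
    0b10000: "USER", 0b10001: "FIQ", 0b10010: "IRQ", 0b10011: "SVC",
    0b10111: "ABORT", 0b11011: "UNDEF", 0b11111: "SYSTEM",
}

# (letter, bit position) for each flag shown in the PRE/POST notation.
FLAG_BITS = [("n", 31), ("z", 30), ("c", 29), ("v", 28), ("q", 27),
             ("i", 7), ("f", 6), ("t", 5)]

def describe_cpsr(cpsr: int) -> str:
    """Render a 32-bit CPSR value as flags plus mode, e.g. nzcvqiFt_USER."""
    letters = "".join(
        name.upper() if (cpsr >> bit) & 1 else name
        for name, bit in FLAG_BITS
    )
    mode = MODES.get(cpsr & 0x1F, "UNKNOWN")
    return f"{letters}_{mode}"
```

For instance, a CPSR value with only the F bit set and the mode bits equal to 10000 is rendered as nzcvqiFt_USER.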
Instruction pipeline
Fetching the next instruction while the current instruction is being executed is called “pipelining”. The processor supports pipelining
to speed up program execution and increase throughput: several operations take place simultaneously rather than serially.
The pipeline has three stages: fetch, decode and execute, as shown in the figure.
first instruction completes execution in the third clock pulse, while the second instruction instead of completing execution in fourth clock pulse completes
the same in fifth clock pulse. Thereafter every instruction completes execution in one clock pulse as seen in this figure
The amount of work done at each stage can be reduced by increasing the number of stages in the pipeline, which allows the processor to be
operated at a higher clock frequency and so improves performance. Because more cycles are required to fill the pipeline, the system latency also increases. Data
dependencies between stages can also increase as the number of pipeline stages grows, so instructions need to be scheduled carefully when writing code to
reduce data dependencies.
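The timing of the three-stage fetch, decode and execute pipeline can be illustrated with a small model. This sketch assumes an ideal pipeline with no stalls or branches; the function name pipeline_schedule and the instruction names in the usage note are made up for the example.

```python
def pipeline_schedule(instructions, stages=("fetch", "decode", "execute")):
    """Return {cycle: {stage: instruction}} for an ideal, stall-free pipeline."""
    schedule = {}
    for index, name in enumerate(instructions):
        for offset, stage in enumerate(stages):
            cycle = index + offset + 1      # clock cycles are numbered from 1
            schedule.setdefault(cycle, {})[stage] = name
    return schedule
```

With pipeline_schedule(["ADD", "SUB", "CMP"]), cycle 3 holds ADD in execute, SUB in decode and CMP in fetch, which is the point at which the pipeline is full. Passing a longer tuple of stage names shows how a deeper pipeline takes more cycles to fill.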
Interrupts and vector table
When an exception or interrupt occurs, the processor sets the pc to a specific memory address. The address is within a special address range called the
vector table. The entries in the vector table are instructions that branch to specific routines designed to handle a particular exception or interrupt. The
memory map address 0x00000000 is reserved for the vector table, a set of 32-bit words. On some processors the vector table can be optionally located at a
higher address in memory (starting at the offset 0xffff0000). Operating systems such as Linux and Microsoft’s embedded products can take advantage of
this feature. When an exception or interrupt occurs, the processor suspends normal execution and starts loading instructions from the exception vector table.
Each vector table entry contains a form of branch instruction pointing to the start of a specific routine:
Reset vector is the location of the first instruction executed by the processor when power is
applied. This instruction branches to the initialization code.
Undefined instruction vector is used when the processor cannot decode an instruction.
Software interrupt vector is called when you execute a SWI instruction. The SWI instruction is frequently used as the mechanism to invoke an operating
system routine.
Prefetch abort vector occurs when the processor attempts to fetch an instruction from an address without the correct access permissions. The actual abort
occurs in the decode stage.
Data abort vector is similar to a pre-fetch abort but is raised when an instruction attempts to
access data memory without the correct access permissions.
Interrupt request vector is used by external hardware to interrupt the normal execution flow of the processor. It can only be raised if IRQs are not masked in
the cpsr.
Interrupt vector Table:
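The layout of the vector table can be sketched as a lookup. The offsets below follow the classic ARM exception vector layout (the notes confirm 0x8 for the software interrupt and the optional high base address 0xffff0000); the reserved entry at 0x14 and the fast interrupt entry at 0x1C are not described in the text above, and the names VECTOR_OFFSETS and vector_address are assumptions for this example.

```python
# Offset of each exception vector from the base of the vector table.
VECTOR_OFFSETS = {
    "reset": 0x00,
    "undefined": 0x04,
    "swi": 0x08,
    "prefetch_abort": 0x0C,
    "data_abort": 0x10,
    "reserved": 0x14,
    "irq": 0x18,
    "fiq": 0x1C,
}

def vector_address(exception: str, high_vectors: bool = False) -> int:
    """Return the address the pc is set to when `exception` is taken."""
    base = 0xFFFF0000 if high_vectors else 0x00000000
    return base + VECTOR_OFFSETS[exception]
```

Each entry is one 32-bit word, which is why the offsets advance in steps of four bytes.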
ARM7 Family
The ARM7 core has a Von Neumann–style architecture, where both data and instructions use the same bus. The core has a three-stage pipeline and
executes the architecture ARMv4T instruction set.
ARM9 Family
The ARM9 family was announced in 1997. Because of its five-stage pipeline, the ARM9 processor can run at higher clock frequencies than the ARM7
family. The extra stages improve the overall performance of the processor. The memory system has been redesigned to follow the Harvard architecture,
which separates the data D and instruction I buses.
ARM10 Family
The ARM10, announced in 1999, was designed for performance. It extends the ARM9 pipeline to six stages. It also supports an optional vector
floating-point (VFP) unit, which adds a seventh stage to the ARM10 pipeline. The VFP significantly increases floating-point performance and is
compliant with the IEEE 754-1985 floating-point standard.
ARM11 Family
The ARM1136J-S, announced in 2003, was designed for high-performance and power-efficient applications. ARM1136J-S was the first processor
implementation to execute architecture ARMv6 instructions. It incorporates an eight-stage pipeline with separate load-store and arithmetic pipelines.
Included in the ARMv6 instructions are single instruction multiple data (SIMD) extensions for media processing, specifically designed to increase video
processing performance.
Specialized Processors
StrongARM was originally co-developed by Digital Semiconductor and is now exclusively licensed by Intel Corporation. It has been popular for PDAs
and applications that require performance with low power consumption. It is a Harvard architecture with separate D and I caches. StrongARM was the first
high-performance ARM processor to include a five-stage pipeline, but it does not support the Thumb instruction set. Intel’s XScale is a follow-on product
to the StrongARM and offers dramatic increases in performance. At the time of writing, XScale was quoted as being able to run at up to 1 GHz. XScale
executes architecture v5TE instructions. It is a Harvard architecture and is similar to the StrongARM, as it also includes an MMU. SC100 is at the other
end of the performance spectrum. It is designed specifically for low-power security applications. The SC100 is the first SecurCore and is based on an
ARM7TDMI core with an MPU. This core is small and has low voltage and current requirements, which makes it attractive for smart card applications.
Addressing modes:
When accessing an operand for a data processing or movement instruction, there are several standard techniques used to specify the desired location. Most
processors support several of these addressing modes
1. Immediate addressing: the desired value is presented as a binary value in the instruction.
2. Absolute addressing: the instruction contains the full binary address of the desired value in memory.
3. Indirect addressing: the instruction contains the binary address of a memory location that contains the binary address of the desired value.
4. Register addressing: the desired value is in a register, and the instruction contains the register number.
5. Register indirect addressing: the instruction contains the number of a register which contains the address of the value in memory.
6. Base plus offset addressing: the instruction specifies a register (the base) and a binary offset to be added to the base to form the memory address.
7. Base plus index addressing: the instruction specifies a base register and another register (the index) which is added to the base to form the memory
address.
8. Base plus scaled index addressing: as above, but the index is multiplied by a constant (usually the size of the data item, and usually a power of two)
before being added to the base.
9. Stack addressing: an implicit or specified register (the stack pointer) points to an area of memory (the stack) where data items are written (pushed) or
read (popped) on a last-in-first-out basis.
Memory is addressed by generating the Effective Address (EA) of the operand by adding a signed offset to the contents of a base register Rn.
Pre-indexed mode:
EA is the sum of the contents of the base register Rn and an offset value.
Pre-indexed with writeback:
EA is generated the same way as pre-indexed mode.
EA is written back into Rn.
Post-indexed mode:
EA is the contents of Rn.
Offset is then added to this address and the result is written back to Rn.
Relative addressing mode:
Program Counter (PC) is used as a base register.
Pre-indexed addressing mode with immediate offset
No absolute addressing mode available in the ARM processor.
Offset is specified as:
Immediate value in the instruction itself.
Contents of a register specified in the instruction.
Addressing modes of ARM processor are classified as follows:
Data Processing Instructions: The data processing instructions manipulate data within registers.
They are :
Move instructions
Arithmetic instructions
Logical instructions
Comparison instructions
Multiply instructions
Most data processing instructions can process one of their operands using the barrel shifter. If you use the S suffix on a data processing instruction, then it
updates the flags in the CPSR. Move and logical operations update the carry flag C, negative flag N, and zero flag Z. The carry flag is set from the result of
the barrel shift as the last bit shifted out. The N flag is set to bit 31 of the result. The Z flag is set if the result is zero.
Move Instructions:
The MOV instruction copies N into a destination register Rd, where N is a register or immediate value.
This instruction is useful for setting initial values and transferring data between registers.
Syntax: <instruction> {<cond>} {S} Rd, N
The second operand N for all data processing instructions is a register Rm or a constant preceded by #.
Example: This example shows a simple move instruction.
The MOV instruction takes the contents of register r5 and copies them into register r7, in this case, taking the value 5, and overwriting the value 8 in
register r7.
PRE r5 = 5; r7 = 8
MOV r7, r5; let r7 = r5
POST r5 = 5; r7 = 5
Barrel Shifter:
In the above example we showed a MOV instruction where N is a simple register. But N can be more than just a register or immediate value; it can also be a
register Rm that has been pre-processed by the barrel shifter prior to being used by a data processing instruction. Data processing instructions are processed
within the arithmetic logic unit (ALU). A unique and powerful feature of the ARM processor is the ability to shift the 32-bit binary pattern in one of the
source registers left or right by a specific number of positions before it enters the ALU. Pre-processing or shift occurs within the cycle time of the
instruction. This is particularly useful for loading constants into a register and achieving fast multiplies or division by a power of 2. The below figure shows
the data flow between the ALU and the barrel shifter.
The below table lists the syntax for the different barrel shift operations available on data processing instructions. The second operand N can be an
immediate constant preceded by #, a register value Rm, or the value of Rm processed by a shift.
Example: This example of a MOVS instruction shifts register r1 left by one bit. This multiplies register r1 by 2^1. As you can see, the C
flag is updated in the cpsr because the S suffix is present in the instruction mnemonic.
PRE cpsr = nzcvqiFt_USER
r0 = 0x00000000
r1 = 0x80000004
MOVS r0, r1, LSL #1
POST cpsr = nzCvqiFt_USER
r0 = 0x00000008
r1 = 0x80000004
Arithmetic Instructions:
The arithmetic instructions implement addition and subtraction of 32-bit signed and unsigned values.
Example: This simple subtract instruction subtracts a value stored in register r2 from a value stored in register r1. The result is stored in register
r0.
PRE r0 = 0x00000000;
r1 = 0x00000002
r2 = 0x00000001
SUB r0, r1, r2
POST r0 = 0x00000001
Example: This reverse subtract instruction (RSB) subtracts r1 from the constant value #0, writing the result to r0. This instruction can be used to negate
numbers.
PRE r0 = 0x00000000;
r1 = 0x00000077
RSB r0, r1, #0; Rd = 0x0 - r1
POST r0 = -r1 = 0xFFFFFF89
Example: The SUBS instruction is useful for decrementing loop counters. In this example we subtract the immediate value one from the value
one stored in register r1. The result value zero is written to register r1. The cpsr is updated with the ZC flags being set.
PRE cpsr = nzcviFt_USER
r1 = 0x00000001
SUBS r1, r1, #1
POST cpsr = nZCviFt_USER
r1 = 0x00000000
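The way a flag-setting subtraction produces the N, Z, C and V flags can be sketched as follows. This is a minimal model written for these notes; the function name subs and the dictionary of flags are assumptions, and the carry flag follows the ARM convention that C is set when the subtraction does not borrow.

```python
MASK32 = 0xFFFFFFFF

def subs(rn: int, operand2: int):
    """Model SUBS: return (32-bit result, flags) for rn - operand2."""
    rn &= MASK32
    operand2 &= MASK32
    result = (rn - operand2) & MASK32
    flags = {
        "N": result >> 31,                  # bit 31 of the result
        "Z": int(result == 0),              # result is zero
        "C": int(rn >= operand2),           # set when no borrow occurs
        # signed overflow: operands differ in sign and the result's sign
        # differs from the sign of rn
        "V": ((rn ^ operand2) & (rn ^ result)) >> 31,
    }
    return result, flags
```

Calling subs(1, 1) returns a zero result with Z and C set, matching the nZC pattern in the POST state of the SUBS example above.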
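Example: This ADD instruction uses the barrel shifter on its second operand. Register r1 is shifted left by one bit (giving twice r1) and then added to
r1, so r0 receives three times the value stored in r1.
PRE r0 = 0x00000000;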
r1 = 0x00000005
ADD r0, r1, r1, LSL #1
POST r0 = 0x0000000F
r1 = 0x00000005
Logical Instructions:
Logical instructions perform bitwise logical operations on the two source registers.
Example: Shows a logical OR operation between registers r1 and r2. r0 holds the result.
PRE r0 = 0x00000000;
r1 = 0x02040608
r2 = 0x10305070
ORR r0, r1, r2
POST r0 = 0x12345678
Example: Shows a more complicated logical instruction called BIC, which carries out a logical bit clear.
PRE r1 = 0b1111;
r2 = 0b0101
BIC r0, r1, r2
POST r0 = 0b1010
This is equivalent to Rd = Rn AND NOT(N)
The logical instructions update the cpsr flags only if the S suffix is present. These instructions can use barrel-shifted second operands in the same way as
the arithmetic instructions.
Describe About Conditional instructions of ARM
Comparison Instructions:
The comparison instructions are used to compare or test a register with a 32-bit value. They update the cpsr flag bits according to the result, but do not
affect other registers. These instructions do not need the S suffix to update the flags.
Example: This example shows a CMP comparison instruction. You can see that both registers, r0 and r9, are equal before executing the
instruction. The value of the z flag prior to execution is 0 and is represented by a lowercase z. After execution the z flag changes to 1 or an
uppercase Z. This change indicates equality.
PRE cpsr = nzcviFt_USER
r0 = 4;
r9 = 4
CMP r0, r9
POST cpsr = nZcviFt_USER
The CMP is effectively a subtract instruction with the result discarded.
The TST instruction is a logical AND operation.
TEQ is a logical exclusive OR operation.
For each, the results are discarded but the condition bits are updated in the cpsr.
Multiply Instructions:
The multiply instructions multiply the contents of a pair of registers and, depending upon the instruction, accumulate the results with another register. The
long multiplies accumulate onto a pair of registers representing a 64-bit value. The final result is placed in a destination register or a pair of registers.
Example: This example shows a simple multiply instruction that multiplies registers r1 and r2 together and places the result into register r0.
PRE r0 = 0x00000000;
r1 = 0x00000002;
r2 = 0x00000002
MUL r0, r1, r2; r0 = r1*r2
POST r0 = 0x00000004;
r1 = 0x00000002;
r2 = 0x00000002
The long multiply instructions (SMLAL, SMULL, UMLAL, and UMULL) produce a 64-bit result. The result is too large to fit a single 32-bit register so
the result is placed in two registers labeled RdLo and
RdHi. RdLo holds the lower 32 bits of the 64-bit result, and RdHi holds the higher 32 bits of the 64-bit result.
Example: Shows an example of a long unsigned multiply instruction. The instruction multiplies registers r2 and r3 and places the result into
register r0 and r1. Register r0 contains the lower 32 bits, and register r1 contains the higher 32 bits of the 64-bit result.
PRE r0 = 0x00000000;
r1 = 0x00000000;
r2 = 0xF0000002;
r3 = 0x00000002
UMULL r0, r1, r2, r3 ; [r1,r0] = r2*r3
POST r0 = 0xe0000004 ; = RdLo
r1 = 0x00000001 ; = RdHi
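The split of a 64-bit product into RdLo and RdHi can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name umull is an assumption chosen to mirror the instruction.

```python
MASK32 = 0xFFFFFFFF

def umull(rm: int, rs: int):
    """Model UMULL: return (RdLo, RdHi) for the unsigned product rm * rs."""
    product = (rm & MASK32) * (rs & MASK32)   # up to 64 bits wide
    rd_lo = product & MASK32                  # lower 32 bits
    rd_hi = product >> 32                     # higher 32 bits
    return rd_lo, rd_hi
```

With the register values from the example above, umull(0xF0000002, 0x00000002) gives 0xE0000004 for RdLo and 0x00000001 for RdHi.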
Branch Instructions:
The address label is stored in the instruction as a signed pc-relative offset and must be within approximately 32 MB of the branch instruction. T refers to the
Thumb bit in the cpsr. When instructions set T, the ARM switches to Thumb state.
Example: The branch with link, or BL, instruction is similar to the B instruction but overwrites the link register lr with a return address. It
performs a subroutine call.
The branch exchange (BX) and branch exchange with link (BLX) instructions are the third type of branch instruction. They are primarily used to branch to and
from Thumb code.
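The signed pc-relative offset used by B and BL, and the resulting range of roughly 32 MB, can be illustrated by encoding a branch by hand. This sketch assumes ARM state, an unconditional (AL) branch, and the standard encoding with a 24-bit word offset; the function name encode_branch is hypothetical.

```python
def encode_branch(instr_addr: int, target: int, link: bool = False) -> int:
    """Return the 32-bit encoding of an unconditional B or BL instruction."""
    # In ARM state the pc reads as the branch instruction's address plus 8.
    offset = target - (instr_addr + 8)
    if offset % 4:
        raise ValueError("ARM branch targets must be word aligned")
    words = offset >> 2                      # offset is stored in words
    if not -(1 << 23) <= words < (1 << 23):  # signed 24-bit field
        raise ValueError("target is outside the roughly +/-32 MB range")
    opcode = 0xEB000000 if link else 0xEA000000   # condition AL, BL or B
    return opcode | (words & 0x00FFFFFF)
```

A branch to its own address needs an offset of minus two words because of the pc-plus-8 rule, so encode_branch(0x8000, 0x8000) produces the familiar encoding 0xEAFFFFFE.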
Load-Store Instructions:
Load-store instructions transfer data between memory and processor registers.
There are three types of load-store instructions:
single-register transfer
multiple-register transfer
swap
Single-Register Transfer:
These instructions are used for moving a single data item in and out of a register.
The data types supported are signed and unsigned words (32-bit), half words (16-bit), and bytes.
The offset can be specified in the instruction in different ways:
Immediate: the address is calculated using the base address register and a 12-bit offset encoded in the instruction.
Register: the address is calculated using the base address register and the contents of a specified register.
Scaled: the address is calculated using the base address register and a barrel shift operation.
PRE r0 = 0x00000000;
r1 = 0x00009000
mem32 [0x00009000] = 0x01010101
mem32 [0x00009004] = 0x02020202
Preindexing with write back:
LDR r0, [r1, #4]!
POST (1) r0 = 0x02020202;
r1 = 0x00009004
Preindexing:
LDR r0, [r1, #4]
POST (2) r0 = 0x02020202;
r1 = 0x00009000
Postindexing:
LDR r0, [r1], #4
POST (3) r0 = 0x01010101;
r1 = 0x00009004
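The three indexing methods in the example above can be compared side by side with a small model. This is a minimal sketch: memory is a dictionary of word addresses, and the function name ldr and the mode strings "pre", "pre_writeback" and "post" are assumptions made for the example.

```python
def ldr(memory: dict, regs: dict, rd: str, rn: str, offset: int, mode: str):
    """Model LDR with pre-index, pre-index with writeback, or post-index."""
    base = regs[rn]
    if mode == "post":                  # LDR rd, [rn], #offset
        address = base                  # access uses the unmodified base
        regs[rn] = base + offset        # base is updated afterwards
    elif mode in ("pre", "pre_writeback"):
        address = base + offset         # LDR rd, [rn, #offset]
        if mode == "pre_writeback":     # LDR rd, [rn, #offset]!
            regs[rn] = address
    else:
        raise ValueError("unknown addressing mode")
    regs[rd] = memory[address]
    return regs
```

Starting each time from r1 = 0x00009000 with the two memory words shown in the PRE state, the three modes give the three POST results listed above.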
Table below shows the addressing modes available for load and store of a 32-bit word or an unsigned byte. A signed offset or register is denoted by “+/−”,
identifying that it is either a positive or negative offset from the base address register Rn. The base address register is a pointer to a byte in memory, and the
offset specifies a number of bytes.
Table below provides an example of the different variations of the LDR instruction. Table below shows the addressing modes available on load and store
instructions using 16-bit half word or signed byte data.
There are no STRSB or STRSH instructions, since STRH stores both signed and unsigned half words; similarly, STRB stores both signed and unsigned bytes.
Table below shows the variations for STRH instructions.
Multiple-Register Transfer:
Load-store multiple instructions can transfer multiple registers between memory and the processor in a single instruction.
The transfer occurs from a base address register Rn pointing into memory.
Load-store multiple instructions can increase interrupt latency.
ARM implementations do not usually interrupt instructions while they are executing.
If an interrupt is raised, it has no effect until the load-store multiple instruction is complete.
Table below shows the different addressing modes for the load-store multiple instructions.
Here N is the number of registers in the list of registers. The base register Rn determines the source or destination address for a load store multiple
instruction.
This register can be optionally updated following the transfer when register Rn is followed by the ‘!’ character.
Example: Register r0 is the base register Rn and is followed by !, indicating that the register is updated after the
instruction is executed.
The decrement versions DA and DB of the load-store multiple instructions decrement the start address and then store to ascending memory locations. This
is equivalent to descending memory but accessing the register list in reverse order.
Example: This example shows an STM increment before instruction followed by an LDM decrement after instruction.
PRE
r0 = 0x00009000
r1 = 0x00000009
r2 = 0x00000008
r3 = 0x00000007
STMIB r0!, {r1-r3}
MOV r1, #1
MOV r2, #2
MOV r3, #3
PRE(2) r0 = 0x0000900c
r1 = 0x00000001
r2 = 0x00000002
r3 = 0x00000003
LDMDA r0!, {r1-r3}
POST r0 = 0x00009000
r1 = 0x00000009
r2 = 0x00000008
r3 = 0x00000007
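The STMIB and LDMDA sequence above can be reproduced with a small model of the four addressing modes. This is a sketch under stated assumptions: registers are identified by number, memory is a dictionary keyed by word address, the lowest-numbered register always maps to the lowest address, and the function name block_transfer is hypothetical.

```python
def block_transfer(memory, regs, base, reg_list, mode, load, writeback=True):
    """Model LDM/STM with mode IA, IB, DA or DB on register numbers."""
    n = len(reg_list)
    rn = regs[base]
    start = {
        "IA": rn,               # increment after
        "IB": rn + 4,           # increment before
        "DA": rn - 4 * n + 4,   # decrement after
        "DB": rn - 4 * n,       # decrement before
    }[mode]
    final = rn + 4 * n if mode in ("IA", "IB") else rn - 4 * n
    # The lowest register number is transferred at the lowest address.
    for i, reg in enumerate(sorted(reg_list)):
        address = start + 4 * i
        if load:
            regs[reg] = memory[address]
        else:
            memory[address] = regs[reg]
    if writeback:               # the '!' suffix on the base register
        regs[base] = final
    return regs
```

Running a store with mode "IB" followed by a load with mode "DA" on the same base register restores both the register contents and the base address, as in the PRE and POST states above.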
Example: Load-store multiple instructions can be used for a block memory copy. This example is a simple routine that copies blocks of 32 bytes from a source address
location to a destination address location. The example has two load-store multiple instructions, which use the same increment after addressing mode.
; r9 points to start of the source data
; r10 points to start of the destination data
; r11 points to end of the source
loop
; load 32 bytes from source and update r9 pointer
LDMIA r9!, {r0-r7}
; store 32 bytes to destination and update r10 pointer
STMIA r10!, {r0-r7} ; and store them
; have we reached the end
CMP r9, r11
BNE loop
CMP and BNE compare pointers r9 and r11 to check whether the end of the block copy has been reached.
If the block copy is complete, then the routine finishes; otherwise the loop repeats with the updated values of
register r9 and r10. The BNE is the branch instruction B with a condition mnemonic NE (not equal).
In the syntax you can see a label called fields. This can be any combination of control (c), extension
(x), status (s), and flags (f).
These fields relate to particular byte regions in a psr, as shown in below Figure.
The MRS instruction first copies the cpsr into register r1. The BIC instruction clears bit 7 of r1.
Register r1 is then copied back into the cpsr using MSR, which enables IRQ interrupts.
PRE cpsr = nzcvqIFt_SVC
MRS r1, cpsr
BIC r1, r1, #0x80 ; 0b10000000
MSR cpsr_c, r1
POST cpsr = nzcvqiFt_SVC
Example: This example shows an ADD instruction with the EQ condition appended. This instruction will only be executed when the zero flag in the cpsr is
set to 1.
; r0 = r1 + r2 if zero flag is set
ADDEQ r0, r1, r2
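How a condition mnemonic such as EQ is evaluated against the flags can be sketched as a lookup. The table below follows the standard ARM condition codes, of which the notes mention EQ and NE; the function names condition_passed and add_conditional are assumptions for this example.

```python
def condition_passed(cond: str, flags: dict) -> bool:
    """Evaluate an ARM condition mnemonic against the N, Z, C, V flags."""
    n, z, c, v = flags["N"], flags["Z"], flags["C"], flags["V"]
    table = {
        "EQ": z == 1, "NE": z == 0,
        "CS": c == 1, "CC": c == 0,
        "MI": n == 1, "PL": n == 0,
        "VS": v == 1, "VC": v == 0,
        "HI": c == 1 and z == 0, "LS": c == 0 or z == 1,
        "GE": n == v, "LT": n != v,
        "GT": z == 0 and n == v, "LE": z == 1 or n != v,
        "AL": True,
    }
    return table[cond]

def add_conditional(cond, regs, flags, rd, rn, rm):
    """Model ADD<cond> rd, rn, rm: the add only happens if cond passes."""
    if condition_passed(cond, flags):
        regs[rd] = (regs[rn] + regs[rm]) & 0xFFFFFFFF
    return regs
```

With the Z flag set, add_conditional("EQ", ...) writes r1 + r2 into r0; with Z clear, the register file is left unchanged, which is the behavior of ADDEQ described above.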
The first pseudo instruction writes a 32-bit constant to a register using whatever instructions are available. The second pseudo instruction writes a relative
address into a register, which will be encoded using a pc relative expression.
Describe About Software Interrupt Instructions of ARM
Software Interrupt Instruction:
A software interrupt instruction (SWI) causes a software interrupt exception, which provides a mechanism for applications to call operating system
routines.
When the processor executes an SWI instruction, it sets the program counter pc to the offset 0x8 in the vector table. The instruction also forces the
processor mode to SVC, which allows an operating system routine to be called in a privileged mode. Each SWI instruction has an associated SWI number,
which is used to represent a particular function call or feature.
Example: An SWI call with SWI number 0x123456, used by ARM toolkits as a debugging SWI.
PRE
cpsr = nzcVqift_USER
pc = 0x00008000
lr = 0x003fffff; lr = r14
r0 = 0x12
0x00008000 SWI 0x123456
POST
cpsr = nzcVqIft_SVC
spsr = nzcVqift_USER
pc = 0x00000008
lr = 0x00008004
r0 = 0x12
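The state changes in the PRE and POST listing above can be modeled in a few lines. This sketch assumes the SWI is executed in ARM state, treats pc as the address of the SWI instruction itself (as the listing does), and uses hypothetical dictionary keys lr_svc and spsr_svc for the banked SVC registers; the function name take_swi is also an assumption.

```python
MODE_MASK = 0x1F
MODE_SVC = 0b10011
I_BIT = 0x80            # IRQ disable bit (bit 7)
T_BIT = 0x20            # Thumb state bit (bit 5)
SWI_VECTOR = 0x00000008

def take_swi(state: dict) -> dict:
    """Return the processor state after an SWI exception is taken."""
    new_state = dict(state)
    new_state["spsr_svc"] = state["cpsr"]        # save the old cpsr
    new_state["lr_svc"] = state["pc"] + 4        # address of next instruction
    cpsr = (state["cpsr"] & ~MODE_MASK) | MODE_SVC
    cpsr = (cpsr | I_BIT) & ~T_BIT               # mask IRQs, enter ARM state
    new_state["cpsr"] = cpsr & 0xFFFFFFFF
    new_state["pc"] = SWI_VECTOR                 # vector table offset 0x8
    return new_state
```

General-purpose registers such as r0 pass through unchanged, which is how the SWI handler can receive parameters from the caller.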
Stack Operations:
The ARM architecture uses the load-store multiple instructions to carry out stack operations.
The pop operation (removing data from a stack) uses a load multiple instruction.
The push operation (placing data onto the stack) uses a store multiple instruction.
When using a stack you have to decide whether the stack will grow up or down in memory.
A stack is either ascending (A) or descending (D).
Ascending stacks grow towards higher memory addresses.
Descending stacks grow towards lower memory addresses.
In a full stack (F), the stack pointer sp points to an address that is the last used or full location (i.e., sp points to the last item on the stack).
In an empty stack (E), the sp points to an address that is the first unused or empty location (i.e., it points after the last item on the stack).
There are a number of load-store multiple addressing mode aliases available to support stack operations.
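The stack aliases map onto the ordinary load-store multiple addressing modes. The table below reflects the standard ARM assembler aliases; the push and pop helpers are a minimal sketch of a full descending stack only, and the names STACK_ALIASES, push_fd and pop_fd are assumptions for this example.

```python
# Stack-oriented mnemonic -> equivalent load-store multiple mnemonic.
STACK_ALIASES = {
    "STMFD": "STMDB", "LDMFD": "LDMIA",   # full descending
    "STMFA": "STMIB", "LDMFA": "LDMDA",   # full ascending
    "STMED": "STMDA", "LDMED": "LDMIB",   # empty descending
    "STMEA": "STMIA", "LDMEA": "LDMDB",   # empty ascending
}

def push_fd(memory: dict, sp: int, values: list) -> int:
    """Push values (lowest register first) on a full descending stack."""
    for value in reversed(values):
        sp -= 4                 # decrement before storing (STMDB)
        memory[sp] = value
    return sp

def pop_fd(memory: dict, sp: int, count: int):
    """Pop `count` words from a full descending stack (LDMIA)."""
    values = [memory[sp + 4 * i] for i in range(count)]
    return values, sp + 4 * count
```

After a push, sp points at the last item stored, which is the definition of a full stack given above; a matching pop returns the values in register order and restores sp.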
ARM has specified an ARM-Thumb Procedure Call Standard (ATPCS) that defines how routines are called and how registers are allocated.
Describe About Swap Instructions used in ARM
Swap Instruction:
The swap instruction is a special case of a load-store instruction. It swaps the contents of memory with the contents of a register.
This instruction is an atomic operation—it reads and writes a location in the same bus operation, preventing any other instruction from reading or writing
to that location until it completes.
Example: The swap instruction loads a word from memory into register r0 and overwrites the memory with register r1.
PRE
mem32 [0x9000] = 0x12345678
r0 = 0x00000000
r1 = 0x11112222
r2 = 0x00009000
SWP r0, r1, [r2]
POST
mem32 [0x9000] = 0x11112222
r0 = 0x12345678
r1 = 0x11112222
r2 = 0x00009000
This instruction is particularly useful when implementing semaphores and mutual exclusion in an operating system.
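The use of swap for mutual exclusion can be sketched as follows. This is only a model of the idea: ordinary Python code is not atomic in the way the SWP bus operation is, the lock convention (0 means free, 1 means taken) is an assumption, and the function names swp, try_lock and unlock are hypothetical.

```python
def swp(memory: dict, address: int, new_value: int) -> int:
    """Model SWP: write new_value to memory and return the old contents."""
    old_value = memory[address]
    memory[address] = new_value
    return old_value

def try_lock(memory: dict, address: int) -> bool:
    """Try to take a lock word: 0 means free, 1 means taken (assumed)."""
    # Swap in 1; if the old value was 0, this caller now owns the lock.
    return swp(memory, address, 1) == 0

def unlock(memory: dict, address: int) -> None:
    """Release the lock by writing 0 back to the lock word."""
    memory[address] = 0
```

Because the read and the write happen as one operation on real hardware, two callers swapping in 1 at the same time cannot both see the old value 0, so only one of them acquires the lock.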