ARM Microcontroller - CIE 2


ARM

Microcontroller

Vanitha A
Assistant Professor
ARM Organization and Implementation

• The ARM processors developed between 1983 and 1985, and the
ARM6 and ARM7 developed between 1990 and 1995, used a
3-stage pipeline.
• Since 1995 several new ARM cores have been introduced
which deliver significantly higher performance through the
use of 5-stage pipelines and separate instruction and data
memories.
• 3-stage pipeline ARM organization
• The register bank, which stores the processor state. It has two
read ports and one write port which can each be used to access
any register.
• The barrel shifter, which can shift or rotate one operand by
any number of bits.
• The ALU, which performs the arithmetic and logic functions
required by the instruction set.
• The address register and incrementer, which select and hold
all memory addresses and generate sequential addresses
when required.
• The data registers, which hold data passing to and from
memory.
• The instruction decoder and associated control logic.
• In a single-cycle data processing instruction, two register
operands are accessed.
• The value on the B bus is shifted and combined with the value
on the A bus in the ALU, then the result is written back into
the register bank.
• The program counter value is in the address register, from
where it is fed into the incrementer, then the incremented
value is copied back into r15 in the register bank and also into
the address register to be used as the address for the next
instruction fetch.
• The 3-stage pipeline
• Fetch: the instruction is fetched from memory and placed in
the instruction pipeline.
• Decode: the instruction is decoded and the datapath control
signals prepared for the next cycle.
• Execute: the register bank is read, an operand shifted, the
ALU result generated and written back into a destination
register.
• At any one time, three different instructions may occupy
each of these stages, so the hardware in each stage has to
be capable of independent operation.
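The overlap of the three stages can be sketched in C. This is a minimal illustration, assuming every instruction is single-cycle and the pipeline starts empty; the function names `instr_in_stage` and `print_pipeline` are invented for this sketch.

```c
#include <stdio.h>

/* Index of the instruction occupying a pipeline stage in a given cycle,
   or -1 if that stage is still empty.
   Stage 0 = fetch, 1 = decode, 2 = execute.
   Assumption: every instruction takes exactly one cycle per stage. */
int instr_in_stage(int cycle, int stage)
{
    int i = cycle - stage;
    return (i >= 0) ? i : -1;
}

/* Print the pipeline occupancy for the first n_cycles cycles. */
void print_pipeline(int n_cycles)
{
    static const char *names[3] = { "fetch", "decode", "execute" };
    for (int c = 0; c < n_cycles; c++) {
        printf("cycle %d:", c);
        for (int s = 0; s < 3; s++) {
            int i = instr_in_stage(c, s);
            if (i >= 0)
                printf("  %s=I%d", names[s], i);
            else
                printf("  %s=--", names[s]);
        }
        printf("\n");
    }
}
```

Calling `print_pipeline(4)` lists, for each cycle, which instruction is in fetch, decode and execute; from the third cycle onwards all three stages are busy with different instructions.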
Multi-cycle instruction execution

In this instruction sequence, all parts of the processor are active in every cycle.
The simplest way to view breaks in the ARM pipeline is to observe that:
• All instructions occupy the datapath for one or more adjacent cycles.
• For each cycle that an instruction occupies the datapath, it occupies the
decode logic in the immediately preceding cycle.
• During the first datapath cycle each instruction issues a fetch for the next
instruction but one.
• Branch instructions flush and refill the instruction pipeline.
• ARM instruction execution
• Data processing instructions
• A data processing instruction requires two operands.
• One operand is always a register and the other is either a
second register or an immediate value.
• The second operand is passed through the barrel shifter, where
it can undergo a general shift operation, then it is combined
with the first operand in the ALU using a general ALU operation.
• The result from the ALU is written back into the destination
register.
• All these operations take place in a single clock cycle.
Eg: ADD R0, R1, R2
    ADD R0, R1, #0x05
• Data transfer instructions
• In a data transfer (load or store) instruction, a register is
used as the base address, to which an offset (a register or an
immediate value) is added or from which it is subtracted.
• In the first cycle the address is sent to the address register,
and in the second cycle the data transfer takes place.
• During the data transfer cycle, the ALU holds the address
components from the first cycle and computes an auto-indexing
modification to the base register if this is required.
• The incremented PC value is stored in the register bank at the
end of the first cycle, so the address register is free to accept
the data transfer address for the second cycle.
• At the end of the second cycle the PC is fed back to the
address register to allow instruction prefetching to continue.
• A data store instruction (STR) with an immediate offset uses
the datapath in exactly this way over its two cycles.
• When the instruction specifies the store of a byte data type,
the 'data out' block extracts the bottom byte from the register
and replicates it four times across the 32-bit data bus.
• Load instructions follow a similar pattern except that the data
from memory only gets as far as the 'data in' register in the
second cycle.
• A third cycle is needed to transfer the data from there to the
destination register.
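The byte replication performed by the 'data out' block on a byte store can be sketched in C. This is only a model of the behaviour described above, not of the hardware; the function name `replicate_byte` is invented for this sketch.

```c
#include <stdint.h>

/* Model of the 'data out' block for a byte store: take the bottom
   byte of the register and copy it into all four byte lanes of the
   32-bit data bus. */
uint32_t replicate_byte(uint32_t reg)
{
    uint32_t b = reg & 0xFFu;   /* extract the bottom byte */
    return b * 0x01010101u;     /* same byte in lanes 0, 1, 2 and 3 */
}
```

With a register value of 0x12345678 the bus carries 0x78787878, so the memory system can pick the byte from whichever lane matches the target address.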
Note
• The address register is a pipeline register between the
processor data path and the external memory.
• The address register can produce the memory address for the
next cycle a little before the end of the current cycle.
Branching Instructions
• Branch instructions compute the target address in the first
cycle.
• A 24-bit immediate field is extracted from the instruction and
then shifted left two bit positions to give a word-aligned
offset which is added to the PC.
• The result is issued as an instruction fetch address, and while
the instruction pipeline refills the return address is copied
into the link register (r14) if required (cycle 2).
• The third cycle, which is required to complete the pipeline
refilling, is also used to make a small correction to the
value stored in the link register so that it points directly at
the instruction which follows the branch.
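The target address calculation can be sketched in C. The sketch assumes the usual ARM convention that the PC value seen by an instruction is its own address plus 8 (a consequence of the 3-stage pipeline), and that the 24-bit field is sign-extended before the shift; the function name `branch_target` is invented here.

```c
#include <stdint.h>

/* Compute the target of an ARM branch instruction.
   instr_addr : address of the branch instruction itself
   instr      : the 32-bit instruction word (offset in bits 23..0)
   Assumption: the PC reads as instr_addr + 8 when the offset is added. */
uint32_t branch_target(uint32_t instr_addr, uint32_t instr)
{
    uint32_t field = instr & 0x00FFFFFFu;   /* 24-bit immediate field */
    if (field & 0x00800000u)                /* sign-extend to 32 bits */
        field |= 0xFF000000u;
    uint32_t offset = field << 2;           /* word-aligned offset */
    uint32_t pc = instr_addr + 8u;          /* PC as seen by the branch */
    return pc + offset;                     /* wraps modulo 2^32 */
}
```

For example, the word 0xEAFFFFFE at address 0x8000 has an offset field of -2 words, so the target is 0x8008 - 8 = 0x8000: a branch to itself.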
ARM implementation
• Clocking scheme
• Datapath timing
• Adder design
• ALU functions
• The barrel shifter
• Multiplier design
• The register bank
• Datapath layout
• Control structures
• Datapath timing
• The normal timing of the datapath components in a 3-stage
pipeline is as follows.
• The register read buses are dynamic and are precharged
during phase 2.
• When phase 1 goes high, the selected registers discharge the
read buses which become valid early in phase 1. One
operand is passed through the barrel shifter.
• The ALU has input latches which are open during phase 1,
allowing the operands to begin combining in the ALU as
soon as they are valid, but they close at the end of phase 1 so
that the phase 2 precharge does not get through to the ALU.
• The ALU then continues to process the operands through
phase 2, producing a valid output towards the end of the
phase, which is latched in the destination register at the end
of phase 2.
The minimum datapath cycle time is therefore the sum of:
• the register read time;
• the shifter delay;
• the ALU delay;
• the register write set-up time;
• the phase 2 to phase 1 non-overlap time.
Adder design
• The first ARM processor prototype used a simple ripple-carry
adder.
• In order to allow a higher clock rate, ARM2 used a 4-bit
carry look-ahead scheme to reduce the worst-case carry path
length.
• The logic produces carry generate (G) and propagate (P)
signals which control the 4-bit carry-out.
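The generate and propagate idea can be sketched in C for one 4-bit group. This is the textbook formulation of carry look-ahead, shown as an assumption about the general technique rather than the actual ARM2 logic; the function name `cla4_carry_out` is invented for this sketch.

```c
/* Carry-out of a 4-bit adder group using carry look-ahead.
   a, b : 4-bit operands (only the low four bits are used)
   cin  : carry into the group (0 or 1)
   G is set when the group generates a carry on its own;
   P is set when the group passes an incoming carry straight through. */
unsigned cla4_carry_out(unsigned a, unsigned b, unsigned cin)
{
    unsigned g = a & b & 0xFu;      /* per-bit generate */
    unsigned p = (a | b) & 0xFu;    /* per-bit propagate */

    unsigned g0 = g & 1u, g1 = (g >> 1) & 1u, g2 = (g >> 2) & 1u, g3 = (g >> 3) & 1u;
    unsigned p0 = p & 1u, p1 = (p >> 1) & 1u, p2 = (p >> 2) & 1u, p3 = (p >> 3) & 1u;

    unsigned G = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0);
    unsigned P = p3 & p2 & p1 & p0;

    return G | (P & (cin & 1u));    /* carry-out without rippling bit by bit */
}
```

The carry-out depends only on G, P and the carry-in, so the worst-case carry path crosses one gate level per 4-bit group instead of one per bit.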
ALU functions
• The ALU does not only add its two inputs. It must perform
the full set of data operations defined by the instruction set,
including address computations for memory transfers,
branch calculations and bit-wise logical functions.
• The barrel shifter
• The ARM architecture supports instructions which perform a
shift operation in series with an ALU operation.
• The shifter performance is therefore critical since the shift
time contributes directly to the datapath cycle time.
• In order to minimize the delay through the shifter, a cross-bar
switch matrix is used to steer each input to the appropriate
output.
• A 4 x 4 matrix is shown
• Each input is connected to each output through a switch.
• The shifting functions are implemented by wiring switches
along diagonals to a common control input:
• For a left or right shift function, one diagonal is turned on.
This connects all the input bits to their respective outputs
where they are used.
• Those outputs that are not connected to any input during a
particular switching operation remain at '0' giving the zero
filling.
• For a rotate right function, the right shift diagonal is enabled
together with the complementary left shift diagonal.
• Arithmetic shift right uses sign-extension rather than zero-fill
for the unconnected output bits.
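The cross-bar behaviour described above can be modelled in C, one output bit at a time. This is a behavioural sketch only, assuming a 32 x 32 matrix and shift amounts of 0 to 31; the names `shift_op` and `crossbar_shift` are invented for this sketch.

```c
#include <stdint.h>

enum shift_op { OP_LSL, OP_LSR, OP_ASR, OP_ROR };

/* Behavioural model of a 32 x 32 cross-bar shifter.
   Each output bit j is connected to at most one input bit i by the
   enabled diagonal; unconnected outputs are zero-filled (LSL, LSR)
   or sign-filled (ASR). ROR enables the right-shift diagonal together
   with the complementary left-shift diagonal. */
uint32_t crossbar_shift(uint32_t in, enum shift_op op, unsigned k)
{
    uint32_t out = 0;
    unsigned sign = (in >> 31) & 1u;
    k &= 31u;                               /* model amounts 0..31 only */

    for (unsigned j = 0; j < 32; j++) {
        unsigned bit;
        if (op == OP_LSL) {
            bit = (j >= k) ? (in >> (j - k)) & 1u : 0u;
        } else {
            unsigned i = j + k;             /* right-shift diagonal */
            if (i < 32)
                bit = (in >> i) & 1u;
            else if (op == OP_ROR)
                bit = (in >> (i - 32)) & 1u; /* complementary diagonal */
            else if (op == OP_ASR)
                bit = sign;                 /* sign-extension */
            else
                bit = 0u;                   /* zero fill */
        }
        out |= (uint32_t)bit << j;
    }
    return out;
}
```

Every output bit is selected directly from an input bit, so the delay through the model's hardware counterpart does not depend on the shift amount.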
Multiplier design
All ARM processors apart from the first prototype have
included hardware support for integer multiplication. Two
types of multiplier are used in ARM:
• Older ARM cores include low-cost multiplication hardware
that supports only the 32-bit result multiply and multiply-
accumulate instructions.
• Recent ARM cores have high-performance multiplication
hardware and support the 64-bit result multiply and
multiply-accumulate instructions.
The register bank
• The last major block on the ARM datapath is the
register bank.
• These register cells are arranged in columns to form
a 32-bit register, and the columns are packed
together to form the complete register bank.
Architectural Support for High-level Languages

Abstraction in software design


• Assembly-level abstraction
• The assembly programming level works (almost) directly
with the raw machine instruction set, expressing the program
in terms of instructions, addresses, registers, bytes and
words.
• High-level languages
• A high-level language allows the programmer to think in
terms of abstractions that are above the machine level.
• The programmer may not even know on which machine the
program will ultimately run.
Data types
• A computer data type can therefore be characterized by:
• the number of bits it requires;
• the ordering of those bits;
• the uses to which the group of bits is put.
Decimal numbers
The right-hand digit represents the number of units, the digit to
its left the number of tens, then hundreds, thousands, and so
on. Each time we move left one place the value of the digit is
increased by a factor of 10.
Eg: 1995
Binary coded decimal
Four Boolean variables (bits) are needed to represent each digit
from 0 to 9 differently.
Eg: 0001 1001 1001 0101
This is the binary coded decimal scheme which is supported
by some computers and is commonly used in pocket
calculators.
Binary notation
The right-hand digit represents units, the next one 2s, then 4s,
and so on. Each time we move left one place the value of the
digit doubles.
Eg: 11111001011
Hexadecimal notation
The binary number can be split into groups of four binary digits
and each group replaced by a hexadecimal digit.
Eg: 7CB
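The four examples above all represent the same value, 1995. A short C sketch can check this; the helper names `to_bcd` and `from_binary` are invented for this illustration.

```c
#include <stdint.h>

/* Pack the decimal digits of n into binary coded decimal,
   four bits per digit (up to eight digits fit in 32 bits). */
uint32_t to_bcd(uint32_t n)
{
    uint32_t bcd = 0;
    int shift = 0;
    while (n > 0 && shift < 32) {
        bcd |= (n % 10u) << shift;  /* lowest decimal digit first */
        n /= 10u;
        shift += 4;
    }
    return bcd;
}

/* Convert a string of '0' and '1' characters to its value:
   each step to the left doubles the weight of a digit. */
uint32_t from_binary(const char *bits)
{
    uint32_t n = 0;
    for (; *bits == '0' || *bits == '1'; bits++)
        n = (n << 1) | (uint32_t)(*bits - '0');
    return n;
}
```

Here `to_bcd(1995)` gives 0x1995 (the bit pattern 0001 1001 1001 0101), while `from_binary("11111001011")` gives 1995, which C writes in hexadecimal as 0x7CB.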
• Number ranges
• On paper we use as many decimal digits as are required to
represent the number; a computer reserves a fixed number of
bits for a number.
• The 32-bit (unsigned) integer has a value in the range:
• 0 to 4 294 967 295 (decimal) = 0 to FFFFFFFF (hexadecimal)
• If a large number is subtracted from a small number, the
result will be negative and cannot be represented by an
unsigned integer of any size.
• Signed integers
• ARM supports a 2's complement binary notation where the
value of the top bit is made negative; in a 32-bit signed
integer all the bits have the same value as they have in the
unsigned case apart from bit 31, which has the value -2^31
instead of +2^31. Now the range of numbers is:
• -2 147 483 648 to +2 147 483 647 (decimal) = 80000000 to
7FFFFFFF (hexadecimal)
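The negative weight of bit 31 can be made explicit in C. This sketch computes the signed value of a 32-bit pattern exactly as described above, using a 64-bit result so that no overflow can occur; the function name `signed_value` is invented here.

```c
#include <stdint.h>

/* Value of a 32-bit pattern read as a 2's complement number:
   bits 0..30 keep their unsigned weights, bit 31 weighs -2^31. */
int64_t signed_value(uint32_t bits)
{
    int64_t low = (int64_t)(bits & 0x7FFFFFFFu);          /* bits 0..30 */
    int64_t top = (bits >> 31) ? -((int64_t)1 << 31) : 0; /* bit 31 */
    return top + low;
}
```

The pattern 80000000 (hexadecimal) gives the most negative value, 7FFFFFFF the most positive, and FFFFFFFF gives -1.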
• Real numbers
• 'Real' numbers are used to represent fractions and
transcendental values that are useful when dealing with
physical quantities.
• ASCII
• The normal way to store an ASCII character in a computer is
to put the 7-bit binary code into an 8-bit byte.
• ANSI C basic data types
• The dialect of the 'C' language defined by the American
National Standards Institute (ANSI) supports
the following basic data types:
• Signed and unsigned characters of at least eight bits.
• Signed and unsigned short integers of at least 16 bits.
• Signed and unsigned integers of at least 16 bits.
• Signed and unsigned long integers of at least 32 bits.
• Floating-point, double and long double floating-point
numbers.
• Enumerated types.
• Bitfields

ANSI C derived data types


• In addition, the ANSI C standard defines derived data types:
• Arrays of several objects of the same type.
• Functions which return an object of a given type
• Structures containing a sequence of objects of various types.
• Pointers (which are usually machine addresses) to objects
of a given type
• Unions which allow objects of different types to occupy the
same space at different times.
• The ARM C compiler aligns characters on byte boundaries.
• ARM architectural support for C data types
• Current versions of the ARM include signed byte and signed
and unsigned 16-bit loads and stores, providing some native
support for short integer and signed character types.
Expressions
• The basic ARM integer data processing instructions implement
most of the C integer arithmetic, bit-wise and shift primitives
directly.
• Since all data processing instructions operate only on values in
registers, the key to the efficient evaluation of a complex
expression is to get the required values into the registers in the
right order and to ensure that frequently used values are
normally resident in registers.
• There is a trade-off between the number of values that can be
held in registers and the number of registers remaining for
intermediate results during expression evaluation.
• Optimizing this trade-off is a major task for the compiler, as is
sorting out the right order to load and combine operands to
achieve the result prescribed by the operator precedence
defined in the language.
ARM support
• The 3-address instruction format used by the ARM gives the
compiler the maximum flexibility in how it preserves or re-
uses registers during expression evaluation.
• Thumb instructions are generally 2-address, which restricts
the compiler's freedom to some extent, and the smaller
number of general registers also makes its job harder.
• Accessing operands
A procedure will normally work with operands that are
presented in one of the following ways, and can be accessed as
indicated:
1. As an argument passed through a register. The value is
already in a register, so no further work is necessary.
2. As an argument passed on the stack. Stack pointer (r13)
relative addressing with an immediate offset known at
compile-time allows the operand to be collected with a single
LDR.
3. As a constant in the procedure's literal pool. PC-relative
addressing, again with an immediate offset known at compile-
time, gives access with a single LDR.
4. As a local variable. Local variables are allocated space on
the stack and are accessed by a stack pointer relative LDR.
5. As a global variable. Global (and static) variables are
allocated space in the static area and are accessed by static
base relative addressing. The static base is usually in r9.
• Pointer arithmetic
Arithmetic on pointers depends on the size of the data type that
the pointers are pointing to. If a pointer is incremented it
changes in units of the size of the data item in bytes.
Thus: int *p; p = p+1;
will increase the value of p by 4 bytes. Since the size of a data
type is known at compile-time, the compiler can scale constants
by an appropriate amount. A variable offset must be scaled at
run-time:
int i = 4; p = p + i;
If p is held in r0 and i in r1, the change to p may be compiled as
ADD r0, r0, r1, LSL #2 ; scale r1 to int
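The scaling can be observed directly in C. The sketch below measures how many bytes an int pointer moves when i is added, and separately models the `ADD r0, r0, r1, LSL #2` instruction on plain 32-bit values; both function names are invented, and the second assumes 4-byte ints as on ARM.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes an int pointer moves when i elements are added.
   Valid for 0 <= i <= 16 (the pointer stays within the local array). */
ptrdiff_t int_ptr_step_bytes(int i)
{
    int a[16] = { 0 };
    int *p = a;
    int *q = p + i;                 /* the compiler scales i by sizeof(int) */
    return (char *)q - (char *)p;
}

/* Model of ADD r0, r0, r1, LSL #2: the index in r1 is multiplied
   by 4 (the assumed size of an int) before being added to r0. */
uint32_t scaled_add(uint32_t r0, uint32_t r1)
{
    return r0 + (r1 << 2);
}
```

With p at address 0x1000 and i = 4, `scaled_add(0x1000, 4)` gives 0x1010, sixteen bytes further on.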
• Arrays
• Arrays in C are little more than a shorthand notation for
pointer operations,
• The declaration: int a[10];
• establishes a name, 'a', for the array which is just a pointer to
the first element, and a reference to a[i] is equivalent to the
pointer-plus-offset form *(a+i); the two may be used
interchangeably.
• Conditional statements
• Conditional statements are executed if the Boolean result of a
test is true (or false); in C these include if...else statements
and switches.
if...else
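• Consider a C statement to find the maximum of two integers:
    if (a>b)
        c=a;
    else
        c=b;
• If a=r0, b=r1 and c=r2, the compiled code could be as simple
as:
• The ARM program
        CMP   r0, r1      ; if (a>b)...
        BLE   ELSE        ; skip the 'if' clause when a<=b
        MOV   r2, r0      ; ..c=a..
        B     STOP        ; skip the 'else' clause
ELSE    MOV   r2, r1      ; ..else c=b
STOP    B     STOP
        END
• A second example uses conditional execution instead of branches:
    i = 10;
    if (i>5)
        i = 7;
    else
        i = 0;
• The ARM program (i is held in R0)
        MOV   R0, #10     ; initialize R0: i = 10
        CMP   R0, #5      ; if (i>5)...
        MOVGT R0, #7      ; ..i = 7..
        MOVLE R0, #0      ; ..else i = 0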
Switches
• A switch, or case, statement extends the two-way decision of
an if...else statement to many ways.
• The standard C form of a switch statement is:
switch (expression)
{
case constant-expression1: statements1
case constant-expression2: statements2
case constant-expressionN: statementsN
default: statementsD
}
Normally each group of statements ends with a 'break' (or a
'return') to cause the switch statement to terminate, otherwise
the C semantics cause execution to fall through to the next
group of statements.
A switch statement with 'breaks' can always be translated into
an equivalent series of if..else statements:
temp = expression;
if (temp==constant-expression1) {statements1}
else ...
else if (temp==constant-expressionN) {statementsN}
else {statementsD}
A jump table contains a target address for each possible value
of the switch expression:
• The C program
int main(void)
{
    int value;
    int var;
    value = 2;
    switch (value)
    {
    case 1:
        var = 2;
        break;
    case 2:
        var = 3;
    case 3:
        var = 4;
        break;
    default:
        var = 1;
        break;
    }
    return 0;
}
• The ARM program
caseentry:
        LDR   R1, =value      ; R1 = address of value
        LDR   R0, [R1]        ; R0 = value
        LDR   R2, =casetable  ; R2 = base of the jump table
        .
        .
a:
        MOV   R4, #0x1        ; default: var = 1
        B     caseend
b:
        MOV   R4, #0x2        ; case 1: var = 2
        B     caseend
c:
        MOV   R4, #0x3        ; case 2: var = 3, falls through
d:
        MOV   R4, #0x4        ; case 3: var = 4
        B     caseend
caseend:
        LDR   R1, =var
        STR   R4, [R1]        ; store the result in var
        MOV   R7, #0x1
        EOR   R0, R0, R0      ; R0 = 0
casetable:
        .word a
        .word b
        .word c
        .word d
value:  .word 0x2
var:    .space 0x4
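The same dispatch idea can be sketched in C with an array of function pointers standing in for the table of addresses. This is an illustration of the technique only, not what a compiler emits; all names here (`handler_default`, `handler_1`, `handler_2`, `handler_3`, `dispatch`) are invented for the sketch, and out-of-range values are sent to the default handler before the table is indexed.

```c
/* One handler per table entry; each returns the final value of var.
   handler_2 returns 4 because case 2 in the C program above sets
   var = 3 and then falls through into case 3. */
static int handler_default(void) { return 1; }
static int handler_1(void)       { return 2; }
static int handler_2(void)       { return 4; }
static int handler_3(void)       { return 4; }

/* Jump-table dispatch: index the table with the switch value
   after checking that the value is within the table. */
int dispatch(unsigned value)
{
    static int (*const table[])(void) = {
        handler_default,   /* value 0: no matching case */
        handler_1,         /* value 1 */
        handler_2,         /* value 2 */
        handler_3          /* value 3 */
    };
    if (value > 3u)        /* range check before indexing */
        return handler_default();
    return table[value]();
}
```

The cost of reaching any case is one range check and one indexed load, however many cases the switch has.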
• Compiling a switch statement is illustrated by a procedure in the
'Dhrystone' benchmark program which ends (effectively) as follows:
• The C program
    switch (a)
    {
    case 0: *b = 0;
        break;
    case 1:
        if (c>100) *b = 0;
        else *b = 3;
        break;
    case 2:
        *b = 1;
        break;
    case 3:
        break;
    case 4:
        *b = 2; break;
    } /* end of switch */
} /* end of procedure */
• The ARM program
        ; on entry a1 = 0, a2 = 3, v2 = switch expression
        CMP   v2,#4               ; check value for overrun..
        ADDLS pc,pc,v2,LSL #2     ; ..if OK, add to pc (+8)
        LDMDB fp,{v1,v2,fp,sp,pc} ; ..if not OK, return
        B     L0                  ; case 0
        B     L1                  ; case 1
        B     L2                  ; case 2
        LDMDB fp,{v1,v2,fp,sp,pc} ; case 3 (return)
        MOV   a1,#2               ; case 4
        STR   a1,[v1]
        LDMDB fp,{v1,v2,fp,sp,pc} ; return
L0      STR   a1,[v1]
        LDMDB fp,{v1,v2,fp,sp,pc} ; return
L1      LDR   a3,c_ADDR           ; get address of c
        LDR   a3,[a3]             ; get c
        CMP   a3,#&64             ; c>100?..
        STRLE a2,[v1]             ; .. No: *b = 3
        STRGT a1,[v1]             ; .. Yes: *b = 0
        LDMDB fp,{v1,v2,fp,sp,pc} ; return
c_ADDR  DCD   <address of c>
L2      MOV   a1,#1
        STR   a1,[v1]             ; *b = 1
        LDMDB fp,{v1,v2,fp,sp,pc} ; return
• if...else if...else statement
• The C program
void main()
{
    int x = 2;
    if (x == 0)
        printf("value is 0");
    else if (x == 1)
        printf("value is 1");
    else if (x == 2)
        printf("value is 2");
    .
    .
    else
        printf("no value matches");
}
• The ARM program
        MOV   R0,#2
        CMP   R0,#0
        BNE   NEXT1
        MOV   R2,R0
        B     STOP
NEXT1:  CMP   R0,#1
        BNE   NEXT2
        MOV   R2,R0
        B     STOP
NEXT2:  CMP   R0,#2
        BNE   NEXT3
        MOV   R2,R0
        B     STOP
        .
        .
STOP:   B     STOP
        END
• Loops
• The C language supports three forms of loop control
structure:
• for (e1; e2; e3) {..}
• while (e1) {..}
• do {..} while (e1)
• Here e1, e2 and e3 are expressions which evaluate to 'true' or
'false' and {..} is the body of the loop, which is executed a
number of times determined by the control structure.
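for loops
for (i=0; i<10; i++)
{
    a[i] = 0;
}
R0 = base, R1 = index, R2 = value to store
        ADR   R0,Table1
        MOV   R2,#0
        MOV   R1,#0
Back:   CMP   R1,#0x0A      ; i < 10?
        BHS   Stop          ; no: leave the loop
        STRB  R2,[R0,R1]    ; a[i] = 0 (byte-sized elements)
        ADD   R1,R1,#1      ; i++
        B     Back
Stop:   B     Stop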
• while loops
while (expression)
{
    Loop
}

ARM
Back:   CMP   R0,#0         ; test the expression (value in R0)
        BEQ   Exit          ; false: leave the loop
        .
        Loop
        .
        B     Back
Exit:
        .
• do while loop
do
{
    Loop
}
while (expression);

ARM
Start:
        :
        Loop
        :
        CMP   R0,#0         ; test the expression (value in R0)
        BNE   Start         ; true: repeat the body
Exit:
        :
Stop:   B     Stop
        END
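The compare-and-branch structure of the three loop forms can be written out in C using goto, which makes the correspondence with the assembly listings above easier to follow. This is a teaching sketch, not compiler output; the function names and the choice of loop bodies are invented here.

```c
/* for (i = 0; i < n; i++) a[i] = 0;  written as compare and branch */
void clear_for(unsigned char *a, int n)
{
    int i = 0;
back:
    if (i >= n) goto stop;      /* CMP, then branch out of the loop */
    a[i] = 0;                   /* STRB: loop body */
    i = i + 1;                  /* ADD */
    goto back;                  /* B Back */
stop:
    return;
}

/* while (r0 != 0) { n++; r0--; }  the test comes before the body */
unsigned count_while(unsigned r0)
{
    unsigned n = 0;
back:
    if (r0 == 0) goto done;     /* CMP R0,#0 / BEQ Exit */
    n++;
    r0--;
    goto back;                  /* B Back */
done:
    return n;
}

/* do { n++; if (r0 != 0) r0--; } while (r0 != 0);  test after the body */
unsigned count_do_while(unsigned r0)
{
    unsigned n = 0;
start:
    n++;                        /* the body always runs at least once */
    if (r0 != 0) r0--;
    if (r0 != 0) goto start;    /* CMP R0,#0 / BNE Start */
    return n;
}
```

The difference between the last two shows up when the condition is false on entry: `count_while(0)` performs no iterations, while `count_do_while(0)` performs one.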
Functions and procedures
• Program design
• Large programs are broken down into components that are
small enough to be tested thoroughly; a large monolithic
program is too complex to test fully and is likely to contain
hidden 'bugs'.
• Each small software component should perform a specified
operation using a well-defined interface.
• The full program should be designed as a hierarchy of
components.
• The top of the hierarchy is the program called main.
• At the lowest level of the hierarchy there are leaf routines;
these are routines which do not themselves call any lower-
level routines.
• The bottom-level routines will be library or system functions

Typical hierarchical program structure.


• Subroutine: a generic term for a routine that is called by a
higher-level routine.
• Function: a subroutine which returns a value through its
name.
• Procedure: a subroutine which is called to carry out some
operation on specified data item(s).
• An argument is an expression passed to a function call.
• A parameter is a value received by the function.
• A call by reference semantics would cause any change to a
parameter within a function to be passed back to the calling
program.
ARM Procedure Call Standard
ARM has defined a set of rules for procedure entry and exit.
The ARM Procedure Call Standard (APCS) is used by the
ARM C compiler.
• It defines particular uses for the 'general-purpose' registers.
• It defines which form of stack is used from the full/empty,
ascending/descending choices supported by the ARM
instruction set.
• It defines the format of a stack-based data structure used for
back-tracing when debugging programs.
• It defines the function argument and result passing
mechanism to be used by all externally visible functions and
procedures
• It supports the ARM shared library mechanism; that is, it
supports a standard way for shared code to access static data.
APCS register usage
The registers are divided into three sets:
1. Four argument registers which pass values into the function.
2. Five (to seven) register variables which the function must
return with unchanged values.
3. Seven (to five) registers which have a dedicated role, at
least some of the time.
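The three sets can be written down as a small lookup table in C. The register names used here (a1 to a4, v1 to v5, sb, sl, fp, ip, sp, lr, pc) are the conventional APCS names and are an assumption of this sketch, since the text above does not list them; the function and type names are likewise invented.

```c
/* Conventional APCS names for r0..r15 (assumed naming). */
static const char *const apcs_name[16] = {
    "a1", "a2", "a3", "a4",             /* r0-r3: argument registers */
    "v1", "v2", "v3", "v4", "v5",       /* r4-r8: register variables */
    "sb/v6", "sl/v7",                   /* r9, r10: dedicated or variable */
    "fp", "ip", "sp", "lr", "pc"        /* r11-r15: dedicated roles */
};

enum apcs_class { APCS_ARGUMENT, APCS_VARIABLE, APCS_DEDICATED };

/* Name of register r, or "?" if r is not in 0..15. */
const char *apcs_register_name(unsigned r)
{
    return (r < 16u) ? apcs_name[r] : "?";
}

/* Set to which register r belongs. r9 and r10 are counted as
   dedicated here; when their special role is not needed they can
   serve as extra register variables ("five to seven"). */
enum apcs_class apcs_register_class(unsigned r)
{
    if (r <= 3u) return APCS_ARGUMENT;
    if (r <= 8u) return APCS_VARIABLE;
    return APCS_DEDICATED;
}
```

Counting the entries gives four argument registers, five register variables and seven dedicated registers, matching the first figure in each range above.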
APCS variants
There are different variants of the APCS which are used to
generate code for a range of different systems. They support:
• 32- or 26-bit PCs
Older ARM processors operated in a 26-bit address space.
• Implicit or explicit stack-limit checking.
Stack overflow must be detected if code is to operate
reliably. The compiler can insert instructions to perform
explicit checks for overflow.
• Two ways to pass floating-point arguments.
The ARM floating-point architecture specifies a set of eight
floating-point registers. The APCS can use these to pass
floating-point arguments into functions.
• Re-entrant or non-re-entrant code.
Code specified as re-entrant is position-independent and
addresses all data indirectly through the static base register.
Use of memory
• In ARM, memory is arranged as a linear set of logical
addresses.
• A C program expects to have access to a fixed area of
program memory and two data areas that grow dynamically .
These dynamic data areas are:
• Stack
Whenever a function is called, a new activation frame is
created on the stack to hold a backtrace record, local variables,
and so on. When a function returns its stack space is
automatically recovered and will be reused for the next
function call.
• Heap
The heap is an area of memory used to satisfy program requests
for more memory for new data structures. The heap will grow
until memory runs out.
• The application image is loaded into the lowest address,
• The heap grows upwards from the top of the application and
the stack grows downwards from the top of memory.
• The unused memory between the top of the heap and the
bottom of the stack is allocated on demand to the heap or the
stack, and if it runs out the program stops due to lack of
memory.
• The memory management unit will allocate additional pages,
on demand, to the heap or the stack, until it runs out of pages
to allocate.
• In a system with no memory management support the
application will be allocated all or part of the physical
memory address space remaining once the operating system
has had its requirements met, and then the application runs
out of memory precisely when the top of the heap meets the
bottom of the stack.
• In an alternative model, the stack is a series of chained chunks
within the heap. This causes the application to occupy a single
contiguous area of memory, which grows in one direction as
required.
At each function call, stack space is allocated for arguments (if
they cannot all be passed in registers), to save registers for use
within the function, to save the return address and the old stack
pointer, and for local variables.

All the stack space is recovered on function exit and reused for
subsequent calls.
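The two-region model, with the heap growing upwards and the stack growing downwards until they meet, can be sketched in C. This is a toy model with invented names (`memory_model`, `heap_grow`, `stack_push_frame`, `stack_pop_frame`); it tracks only the two boundaries, not any real memory.

```c
#include <stdbool.h>
#include <stdint.h>

/* The free memory lies between heap_top and stack_bottom. */
struct memory_model {
    uint32_t heap_top;      /* first free address above the heap */
    uint32_t stack_bottom;  /* lowest address currently used by the stack */
};

/* Grow the heap upwards; fails when it would run into the stack. */
bool heap_grow(struct memory_model *m, uint32_t bytes)
{
    if (bytes > m->stack_bottom - m->heap_top)
        return false;       /* out of memory */
    m->heap_top += bytes;
    return true;
}

/* Allocate a stack frame on function call (the stack grows downwards). */
bool stack_push_frame(struct memory_model *m, uint32_t bytes)
{
    if (bytes > m->stack_bottom - m->heap_top)
        return false;       /* stack overflow */
    m->stack_bottom -= bytes;
    return true;
}

/* Recover the frame on function exit so the space can be reused. */
void stack_pop_frame(struct memory_model *m, uint32_t bytes)
{
    m->stack_bottom += bytes;
}
```

Memory runs out exactly when a request is larger than the gap between the top of the heap and the bottom of the stack, and popping a frame makes that space available again.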
Data alignment
• The ARM C compiler generally aligns data items on
appropriate boundaries:
• Bytes are stored at any byte address.
• Half-words are stored at even byte addresses.
• Words are stored on four-byte boundaries.
• When data items of different types are declared together,
the compiler will introduce padding for alignment.
Eg: struct S1 {char c; int x; short s;}
• This structure will occupy three words of memory.

normal struct memory allocation


• Memory wastage can be minimized by organizing
structures appropriately.
• A structure with the same contents, reordered, occupies
only two memory words instead of three.
• Ordering structure elements so that types smaller than a word
are grouped within a word minimizes the amount of padding that
the compiler has to insert to maintain efficient alignment.
• Eg: struct S2 {char c; short s; int x;}

Fig: more efficient struct memory allocation.


• Packed data is used to minimize memory use, even though this
will reduce performance.
• The ARM C compiler can produce code that works with
packed data structures where all the padding is removed.
• A packed struct gives precise control of the alignment of all
the fields.
• Eg: __packed struct S3 {char c; int x; short s;}

Fig: packed struct memory allocation.
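The padding rule can be reproduced with a small layout calculator in C. It works on a list of field sizes and alignments rather than on real structs, so the result does not depend on the machine it runs on; the names `field` and `struct_size` are invented here, and the sizes assume a 1-byte char, 2-byte short and 4-byte int as on ARM.

```c
#include <stddef.h>

/* Size and required alignment of one structure member, in bytes. */
struct field {
    size_t size;
    size_t align;
};

/* Total size of a structure laid out in declaration order.
   Each field is placed at the next address that is a multiple of its
   alignment; the total is rounded up to the largest alignment used.
   With packed != 0 every field is byte-aligned, so no padding appears. */
size_t struct_size(const struct field *f, size_t n, int packed)
{
    size_t offset = 0;
    size_t max_align = 1;
    for (size_t i = 0; i < n; i++) {
        size_t a = packed ? 1 : f[i].align;
        offset = (offset + a - 1) / a * a;   /* pad up to the alignment */
        offset += f[i].size;
        if (a > max_align)
            max_align = a;
    }
    return (offset + max_align - 1) / max_align * max_align;
}
```

For the field order char, int, short (S1) this gives 12 bytes, or three words; for char, short, int (S2) it gives 8 bytes, or two words; and for the packed layout (S3) it gives 7 bytes.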


Write, compile and run a 'Hello World' program
written in C.

/* Hello World in C */
#include <stdio.h>
int main()
{
printf( "Hello World\n" );
return ( 0 );
}
Show how the following data is organized in memory:
struct S1
{
char c;
int x;
};
struct S2
{
char c2[5];
struct S1 s1[2];
};
