Pa Seit Apk Unit-6
Processor Architecture
[214451]
Mrs.A.P.Kulkarni
Sinhgad Technical Education Society’s
UNIT No.: 6
Current Trends in Processor Architecture (06 Hrs)
UNIT – 6: Current Trends in Processor Architecture
ARM & RISC: ARM and RISC design philosophy, Introduction to ARM processor & its versions
(ARM 7, ARM 9, ARM 11), Features & advantages of ARM processor, Suitability of ARM processor
in embedded applications, ARM 7 dataflow model, Programmer's model, CPSR & SPSR registers,
Modes of operation, Difference between PIC and ARM.
Unit Objectives
1. Understand the concept of ARM and RISC design philosophy
2. Introduction to ARM processor
3. Features & advantages of ARM processor
4. CPSR & SPSR registers
5. Difference between PIC and ARM.
Unit Outcomes
1. Concept of ARM and RISC design philosophy
2. Features & advantages of ARM processor
3. Difference between PIC and ARM.
4. CPSR & SPSR registers
RISC Design Philosophy
Features of RISC
RISC stands for Reduced Instruction Set Computer
Most RISC processors use hardwired control
Most RISC processors use fixed-length 32-bit instructions
They have relatively few instructions
The instructions are predominantly register-based
Only a limited number of addressing modes (3 to 5) is used
The memory access cycle is broken into pipelined access operations
This involves the use of caches & working registers
A large register file & separate instruction & data caches are used
One-cycle execution time: RISC processors aim for a CPI (clocks per
instruction) of one. This is achieved by optimizing each instruction
on the CPU and by pipelining, a technique that overlaps the stages
(fetch, decode, execute, ...) of successive instructions so that
instructions are processed more efficiently.
The RISC architecture is used in ARM cores
The RISC philosophy is implemented with four
major design rules:
RISC processors are designed to execute simple but powerful
instructions within a single cycle at a high clock speed.
1. Instructions – RISC processors have a reduced number of
instruction classes.
• These classes provide simple operations that can each execute in
a single cycle.
• The compiler or programmer synthesizes complicated operations
(for example, a divide operation) by combining several simple instructions.
• Each instruction has a fixed length, which allows the pipeline to fetch
future instructions before decoding the current instruction.
• In contrast, in CISC processors the instructions are often of
variable size and take many cycles to execute.
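As a sketch of how a divide can be synthesized from simple instructions, the Python function below performs restoring (shift-and-subtract) division, one common technique for cores without a hardware divide instruction, such as the ARM7. The name `divide` is illustrative and the code assumes 32-bit unsigned operands; it is not the exact routine any particular compiler emits.

```python
def divide(dividend, divisor):
    """Unsigned 32-bit division using only shift, compare and subtract steps.

    Returns (quotient, remainder). Each loop iteration corresponds to a few
    simple single-cycle operations on a RISC core.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    dividend &= 0xFFFFFFFF
    quotient = 0
    remainder = 0
    for bit in range(31, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        # Compare and subtract: this produces one quotient bit.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder


print(divide(100, 7))  # (14, 2)
```

The loop always runs 32 times here; a real library routine may first skip leading zero bits to save iterations.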
2. Pipelines – The processing of instructions is broken down
into smaller units that can be executed in parallel by pipelines.
• Ideally the pipeline advances by one step on each cycle for
maximum throughput.
• Instructions do not need to be executed by a mini-program
called microcode, as they are on CISC processors.
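The idea of the pipeline advancing one step per cycle can be made concrete with a small Python model of an ideal pipeline (no stalls, no branches). The default stage names match the 3-stage ARM7 pipeline described later in this unit; the function names are illustrative only.

```python
def pipeline_schedule(instructions, stages=("Fetch", "Decode", "Execute")):
    """Return {cycle: {stage: instruction}} for an ideal pipeline with no stalls."""
    schedule = {}
    for i, instr in enumerate(instructions):
        for s, stage in enumerate(stages):
            cycle = i + s + 1  # instruction i reaches stage s in cycle i + s + 1
            schedule.setdefault(cycle, {})[stage] = instr
    return schedule


def total_cycles(n_instructions, n_stages):
    """Cycles to finish n instructions on an ideal k-stage pipeline: k + n - 1."""
    return n_stages + n_instructions - 1 if n_instructions else 0


for cycle, slots in sorted(pipeline_schedule(["ADD", "SUB", "CMP", "MOV"]).items()):
    print(cycle, slots)
```

Four instructions finish in 3 + 4 - 1 = 6 cycles instead of 4 x 3 = 12. For n instructions the average CPI is (k + n - 1)/n, which approaches one as n grows; this is the sense in which a pipelined RISC core achieves one-cycle execution.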
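3. Registers – RISC machines have a large general-purpose register set.
• Any register can contain either data or an address.
• Registers act as the fast local memory store for all data processing operations.
4. Load-store architecture – The processor operates on data held in registers.
• Separate load and store instructions transfer data between the register bank
and external memory.
• Since memory accesses are costly, separating them from data processing lets data
held in registers be used many times without repeated memory accesses.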
Comparison Between RISC & CISC
RISC | CISC
Reduced Instruction Set Computer. | Complex Instruction Set Computer.
Emphasizes software (the compiler) to optimize the instruction set. | Emphasizes hardware to optimize the instruction set.
Uses a hardwired control unit. | Uses a microprogrammed control unit.
Uses a large number of registers. | Uses fewer registers.
Instruction decoding is simple. | Instruction decoding is complex.
Pipelining is simple to implement. | Pipelining is difficult to implement.
Uses a limited number of instructions, each needing little time to execute. | Uses a large number of instructions, many needing more time to execute.
LOAD and STORE are separate, independent instructions; data processing is register-to-register. | Instructions can operate on memory directly (memory-to-memory operations) as well as through LOAD and STORE.
Spends more transistors on registers. | Spends more transistors on storing and decoding complex instructions.
Execution time per instruction is short (typically one cycle). | Execution time per instruction is longer (multiple cycles).
Used in high-end applications such as telecommunication, image processing and video processing. | Used in low-end applications such as home automation and security systems.
Fixed-format instructions. | Variable-format instructions.
Programs tend to take more space in memory. | Programs tend to take less space in memory.
Examples: ARM, PA-RISC, Power Architecture, Alpha, AVR, ARC and SPARC. | Examples: VAX, Motorola 68000 family, System/360, AMD and Intel x86 CPUs.
ARM Design Philosophy (RISC features accepted by ARM)
•A Large Uniform Register File –
•An ARM processor contains a large number of registers, like other
RISC processors.
•A Load-store Architecture –
•The ARM processor uses a RISC architecture.
•It contains a large number of registers.
•The instruction set contains separate load & store instructions
for transferring data between the register bank & external
memory.
•Data to be operated on is first loaded into a register &
then processed.
•Memory accesses are separated from data processing.
•So data items stored in registers can be used multiple times
without multiple memory accesses.
•This is advantageous since memory accesses are costly.
•Data-processing instructions do not operate on memory
directly; they operate only on register contents.
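A toy model helps show the load-store rule. In the hypothetical Python class below, only `ldr` and `str_` touch memory and `add` works purely on registers, so computing c = a + b takes two loads, an add and a store, while reusing the result afterwards costs no memory access. Passing a plain address to `ldr`/`str_` is a simplification; ARM code forms the address from a base register.

```python
class LoadStoreMachine:
    """Toy load-store machine (hypothetical): 16 registers plus a memory dict."""

    def __init__(self, memory):
        self.reg = [0] * 16
        self.mem = dict(memory)
        self.mem_accesses = 0

    def ldr(self, rd, addr):      # LDR: memory -> register
        self.reg[rd] = self.mem[addr]
        self.mem_accesses += 1

    def str_(self, rd, addr):     # STR: register -> memory
        self.mem[addr] = self.reg[rd]
        self.mem_accesses += 1

    def add(self, rd, rn, rm):    # ADD: registers only, never memory
        self.reg[rd] = (self.reg[rn] + self.reg[rm]) & 0xFFFFFFFF


def demo():
    # c = a + b, then d = c + c, with a at 0x100, b at 0x104, c at 0x108
    m = LoadStoreMachine({0x100: 7, 0x104: 5})
    m.ldr(1, 0x100)    # LDR r1, [a]
    m.ldr(2, 0x104)    # LDR r2, [b]
    m.add(0, 1, 2)     # ADD r0, r1, r2
    m.str_(0, 0x108)   # STR r0, [c]
    m.add(3, 0, 0)     # ADD r3, r0, r0 : reuses r0 without touching memory
    return m
```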
ARM Design Philosophy
•Uniform & fixed length (32-bit) instruction fields-
•The ARM processor instruction set contains a reduced number of
instructions.
•These instructions perform simple operations that can be
executed in a single cycle.
•Complicated operations, such as division, are synthesized by
the compiler or programmer by combining several
simple instructions.
•Each instruction has a fixed length.
•This allows the pipeline to fetch future instructions before
decoding the current instruction. In contrast, CISC processors
have variable-size, complex instructions that take more
cycles to execute.
•So in CISC the complexity is in the processor hardware, whereas in RISC
the complexity is in the compiler.
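The pipeline benefit of fixed-length instructions can be sketched in Python. With a fixed 4-byte width every fetch address is known in advance; with a variable-length encoding the next address is known only after the current instruction's length has been decoded. The variable-length format used here (length taken from the low two bits of the first byte) is purely hypothetical and does not model any real CISC ISA.

```python
def fixed_length_boundaries(start, count, width=4):
    """Instruction addresses for a fixed-width ISA: no decoding needed."""
    return [start + i * width for i in range(count)]


def variable_length_boundaries(code, start=0):
    """Instruction addresses for a HYPOTHETICAL variable-length encoding in which
    the low two bits of an instruction's first byte give its length minus one."""
    addresses = []
    pc = start
    while pc < len(code):
        addresses.append(pc)
        length = (code[pc] & 0b11) + 1  # must decode this byte before finding the next
        pc += length
    return addresses
```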
ARM Design Philosophy
•Three-address instruction formats-
•Most instructions of RISC & ARM processor have three address
instruction formats.
• That is, an instruction names two source operands and a separate destination,
e.g. ADD r0, r1, r2 places r1 + r2 in r0 without overwriting either source
ARM Design Philosophy (RISC features rejected by ARM)
• Register windows: The main problem with register windows is the large
chip area occupied by the large number of registers
– ARM therefore does not use register windows, which keeps chip area & cost down
• Delayed branches: when a branch instruction enters the pipeline, the
instructions already fetched after it must be discarded
– This causes pipeline problems, since it disturbs the smooth
flow of instructions
– Many RISC processors reduce this penalty by using delayed
branches
– Here, the branch takes effect only after the following instruction (in the
delay slot) has executed; the compiler tries to fill the slot with a useful instruction
– This delayed branching technique works well when the processor uses a
single pipeline
– However, it creates problems for superscalar implementations & does not
work well with branch prediction mechanisms, so ARM does not use delayed branches
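The effect of a delay slot on execution order can be sketched as follows. The program is a list of labels, the branch is assumed to be taken, and execution simply runs to the end of the list after the target; these are simplifications for illustration only, and the function name is hypothetical.

```python
def execution_order(program, branch_index, target_index, delay_slot):
    """Order in which instructions execute when the branch at branch_index is taken."""
    executed = program[:branch_index + 1]
    if delay_slot:
        # With a delayed branch, the instruction after the branch always executes.
        executed.append(program[branch_index + 1])
    # Execution then continues at the branch target.
    executed.extend(program[target_index:])
    return executed


prog = ["i0", "B L1", "slot", "skipped", "L1: i4"]
print(execution_order(prog, 1, 4, delay_slot=True))
```

With `delay_slot=False`, which is how ARM behaves, the instruction after the branch is discarded from the pipeline instead of being executed.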
• Single-cycle execution of all instructions – the ARM processor
executes most data-processing instructions in a single cycle
– However, many other instructions need multiple clock cycles for their
execution
– Even a simple load or store instruction needs more than one clock cycle,
– because at least two memory accesses are needed: one for the instruction & one for the
data
Introduction to ARM Processor
• ARM offers several processors, grouped into a number of families
based on the processor core they are implemented with.
• The architecture of ARM processors has continued to evolve with
every family.
• Some of the well-known ARM processor families are ARM7, ARM9,
ARM10 and ARM11.
• Every ARM processor implementation executes a specific
instruction set architecture (ISA).
ARM Nomenclature
• ARM uses the nomenclature ARM{x}{y}{z}{T}{D}{M}{I}{E}{J}{F}{-S} to describe its
processor implementations.
The letters or words after “ARM” indicate the features of a processor:
x – Family or series
y – Memory Management/Protection Unit
z – Cache
T – 16 bit Thumb decoder
D – JTAG Debugger
M – Fast Multiplier
I – Embedded In-circuit Emulator (ICE) Macrocell
E – Enhanced Instructions for DSP (assumes TDMI)
J – Jazelle (for accelerated JAVA execution)
F – Vector Floating-point Unit
S – Synthesizable Version
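The nomenclature can be applied mechanically. The Python sketch below splits a core name into the x, y, z digits and the feature letters listed above. It assumes that families 10 and 11 are written with two digits and all others with one, and it reports letters that are not in the list (for example the Z in ARM1176JZF-S) as unlisted rather than guessing their meaning.

```python
import re

# Feature letters as listed in the nomenclature above.
FEATURES = {
    "T": "16-bit Thumb decoder",
    "D": "JTAG debugger",
    "M": "Fast multiplier",
    "I": "Embedded ICE macrocell",
    "E": "Enhanced DSP instructions",
    "J": "Jazelle (Java acceleration)",
    "F": "Vector floating-point unit",
    "S": "Synthesizable version",
}


def decode_arm_name(name):
    """Split a core name such as 'ARM926EJ-S' into family, y, z and features."""
    m = re.fullmatch(r"ARM(\d+)([A-Z]*)(-S)?", name.upper())
    if m is None:
        raise ValueError(f"not an ARM core name: {name}")
    digits, letters, synth = m.groups()
    # Assumption: families 10 and 11 use two digits, all other families one.
    width = 2 if digits[:2] in ("10", "11") else 1
    family, rest = digits[:width], digits[width:]
    y = rest[0] if len(rest) > 0 else None  # memory management/protection unit
    z = rest[1] if len(rest) > 1 else None  # cache
    if synth:
        letters += "S"
    features = [FEATURES.get(ch, f"unlisted letter {ch}") for ch in letters]
    return {"family": family, "y": y, "z": z, "features": features}


print(decode_arm_name("ARM7TDMI"))
```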
Explanation of the features
T – Thumb Instruction Set – ARM processors support both the 32-bit
ARM instruction set and the 16-bit Thumb instruction set. Each
original ARM instruction has a 32-bit opcode, i.e. a 4-byte binary
pattern. Each Thumb instruction has a 16-bit opcode, i.e. a 2-byte
binary pattern, which improves code density.
D – JTAG Debug – JTAG is a serial protocol used by ARM to
transfer debug information between the processor and the
test equipment.
M – Fast Multiplier – Older ARM processors used a small and
simple multiplier unit, which required more clock cycles to
complete a single multiplication. With the introduction of the
fast multiplier unit, the clock cycles required for multiplication
are significantly reduced, and modern ARM processors are
capable of calculating a 32-bit product in a single cycle.
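Why a small multiplier needs more cycles can be sketched in Python, assuming an 8x32 multiplier array (the size listed for ARM7 in the ARM7/ARM9/ARM11 comparison table) that consumes 8 bits of the second operand per cycle. The sketch also assumes early termination, i.e. the multiply stops once the remaining operand bits are zero; cycle counts here are illustrative, not taken from a data sheet.

```python
def multiply_8x32(a, b):
    """Multiply two 32-bit unsigned values using one 8x32 partial product per cycle.

    Returns (32-bit product, cycles used).
    """
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    product = 0
    cycles = 0
    shift = 0
    while True:
        chunk = (b >> shift) & 0xFF        # next 8 bits of the multiplier operand
        product += (a * chunk) << shift    # one 8x32 partial product per cycle
        cycles += 1
        shift += 8
        # Early termination (assumption): stop when no set bits of b remain.
        if shift >= 32 or (b >> shift) == 0:
            break
    return product & 0xFFFFFFFF, cycles
```

A small multiplier operand such as 3 finishes in one cycle, while a full 32-bit operand needs four; a wider multiplier array reduces this count.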
I – Embedded ICE Macrocell – ARM processors have on-chip
debug hardware that allows the processor to set breakpoints
and watchpoints.
E – Enhanced Instructions – ARM processors with this extension
support the extended DSP instruction set for high-performance
DSP applications. With these extended DSP instructions, the
DSP performance of the ARM processor can be increased
without raising the clock frequency.
J – Jazelle – ARM processors with Jazelle technology can
accelerate the execution of Java bytecodes. Jazelle DBX (Direct
Bytecode eXecution) is used in mobile phones and other
consumer devices for high-performance Java execution
with minimal impact on memory use and battery life.
F – Vector Floating-point Unit – The floating-point architecture
in ARM processors provides execution of floating-point
arithmetic operations. The dynamic range and precision
offered by this floating-point architecture
are used in many real-time applications in the industrial and
automotive areas.
S – Synthesizable – The ARM processor core is available as
source code. This soft core can be compiled into a format
that can easily be used by EDA tools. Using the
processor source code, it is possible to modify the architecture
of the ARM processor.
ARM Architecture Enhancements
Revision | Example core implementation | ISA enhancement
ARMv1 | ARM1 | First ARM processor; 26-bit addressing
ARMv2 | ARM2 | 32-bit multiplier; 32-bit coprocessor support
ARMv2a | ARM3 | On-chip cache; atomic swap instruction; coprocessor 15 for cache management
ARMv3 | ARM6 & ARM7DI | 32-bit addressing; separate CPSR & SPSR; new modes (undefined instruction & abort); MMU support (virtual memory)
ARMv3M | ARM7M | Signed & unsigned long multiply instructions
ARMv4 | StrongARM | Load-store instructions for signed & unsigned halfwords/bytes; new mode (system); reserve SWI space for architecturally defined operations; 26-bit addressing mode no longer supported
ARMv4T | ARM7TDMI & ARM9T | Thumb
ARMv5TE | ARM9E & ARM10E | Superset of ARMv4T; extra instructions for changing state between ARM & Thumb; enhanced multiply instructions; extra DSP-type instructions; faster multiply-accumulate
ARMv5TEJ | ARM7EJ & ARM926EJ | Java acceleration
ARMv6 | ARM11 | Improved multiprocessor instructions; unaligned & mixed-endian data handling; new multimedia instructions
ARM processor families-
ARM7,ARM9,ARM 11
ARM7 – Introduced in 1994 (ARM7TDMI, ARM7EJ-S,
ARM720T)
• The ARM7 family has been immensely successful & has established
ARM as the architecture of choice in the digital world.
• Over the years, more than 10 billion devices based on the ARM7
processor family have powered a variety of cost- & power-sensitive
applications.
• Nowadays, newer embedded designs use the latest
ARM processors, such as the Cortex-M0 & Cortex-M3.
Note: The ARM7 processor family (ARM7TDMI) is not
recommended for new designs.
Features of ARM7
1. Pipeline Depth: 3 stage (Fetch, Decode, Execute)
2. Operating frequency: 80 MHz
3. Power Consumption: 0.06 mW/MHz.
4. MIPS/MHz: 0.97
5. Architecture used: Von-Neumann
6. MMU/MPU: Not present
7. Cache Memory: Not present
8. Jazelle Instruction: Not present
9. Thumb Instruction: Yes (16 bit instruction set)
10. ARM Instruction set: Yes (32 bit)
11. ISA (Instruction Set Architecture): ARMv4T (4th version)
12. Interrupt Controller: Not present
13. ISR entry: Non-deterministic ISR entry
14. Power Management: No built-in power management
15. Instruction Set Performance v/s code size: The optimal balance of
performance and code size requires interworking between
ARM & Thumb code
16. Ease of application porting from one device to another: Lack
of standardization inhibits application porting
ARM9 Processor Family
Introduction-
• The ARM9 family was announced in 1997
• This family enables a single-processor solution for
microcontroller, DSP & Java applications, offering savings in
chip area & complexity, power consumption & time to market
• ARM9E (enhanced) processors are well suited for applications
requiring a mix of DSP & microcontroller performance
• The ARM9 family includes the ARM926EJ-S, ARM946E-S &
ARM968E-S processors.
Features of ARM9
1. Pipeline Depth: 5-stage (Fetch, Decode, Execute, Memory,
Write)
2. Operating frequency: 150 MHz
3. Power Consumption: 0.19 mW/MHz
4. MIPS/MHz: 1.1
5. Architecture used: Harvard
6. MMU/MPU: Present
7. Cache Memory: Present (separate 16k/8k)
8. ARM/Thumb Instruction: Supports both
9. ISA (Instruction Set Architecture): ARMv5TEJ (ARM926EJ-S)
10. 31 general-purpose registers (32-bit)
11. 32-bit ALU & Barrel Shifter
12. Enhanced 32- bit MAC block
13. Memory Controller:
Memory operations are controlled by the MMU or MPU
– MMU:
• Provides virtual memory support
• Fast context switching extensions
– MPU:
• Enables memory protection & bounding
• Sandboxing of applications
14. Flexible cache design (sizes can be 4KB to 128KB)
15. Flexible core design
16. DSP enhancements (very important):
17. Single-cycle 32x16 multiplier implementation
18. Speeds up all the multiply instructions
19. New 32x16 & 16x16 multiply instructions
20. Allows independent access to the 16-bit halves of registers
21. ARM ISA supports 32x32 multiply instructions
22. Saturating arithmetic (QADD, QSUB)
23. Count leading zeros (CLZ) for faster division
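The saturating and count-leading-zeros operations can be modelled in a few lines of Python. The sketch follows the documented behaviour of QADD, QSUB and CLZ on 32-bit values: signed results clamp to the 32-bit range and a flag reports that saturation happened (on the core this sets the Q bit), and CLZ of zero is 32. The function names are illustrative, and the flag is returned rather than stored in a CPSR.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000


def _saturate(total):
    """Clamp to the signed 32-bit range; return (value, saturated?)."""
    if total > INT32_MAX:
        return INT32_MAX, True
    if total < INT32_MIN:
        return INT32_MIN, True
    return total, False


def qadd(a, b):
    """Signed saturating add, modelled on QADD."""
    return _saturate(a + b)


def qsub(a, b):
    """Signed saturating subtract, modelled on QSUB."""
    return _saturate(a - b)


def clz(x):
    """Number of leading zero bits in a 32-bit value, modelled on CLZ."""
    return 32 - (x & 0xFFFFFFFF).bit_length()
```

Saturation avoids the sign flip that ordinary wrap-around addition would cause in audio or filter code. For division, `clz(divisor) - clz(dividend)` (when the dividend is at least as large as the divisor) tells how far the divisor can be shifted left to line up with the dividend, so a shift-and-subtract loop can skip iterations that would only produce zero quotient bits.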
Applications of ARM9
ARM11 Processors Family
Introduction
• This family provides the engine that powers many smartphones and
is also widely used in consumer, home & embedded applications.
• It delivers low power & a range of performance from 350 MHz to
1 GHz.
• ARM11 processor software is compatible with all previous
generations of ARM processors.
• It introduces:
– 32-bit SIMD for media processing
– Physically tagged caches to improve OS context-switch
performance
– TrustZone for hardware-enforced security
– Tightly coupled memories for real-time applications
• The ARM11 family includes the
ARM1136J(F)-S, ARM1156T2-S, ARM1176JZ(F)-S & ARM11 MPCore processors.
Features of ARM11
1. Pipeline Depth: 8-stage
2. Operating frequency: 335 MHz
3. Power Consumption: 0.4 mW/MHz
4. MIPS/MHz: 1.2
5. Architecture used: Harvard
6. MMU/MPU: Present
7. Multiplier unit: 16x32 (uses 16 bits of a 32-bit register)
8. Cache Memory: Present (4-64 KB)
9. ISA (Instruction Set Architecture): ARMv6
10. Enhanced multiply instructions & saturation
11. Powerful ARMv6 instruction set architecture
12. Supports the Thumb instruction set, which reduces memory bandwidth & size
requirements by up to 35%
13. Supports Jazelle technology for efficient embedded Java
execution
14. Supports the DSP extensions
15. SIMD media processing extensions deliver up to 2x
performance for video processing
16. ARM TrustZone technology for on-chip security
17. Thumb-2 technology for enhanced performance, energy
efficiency & code density
18. Low power consumption
19. High-performance integer processor
20. Vectored interrupt interface & low interrupt-latency mode
speed up interrupt response & real-time performance
21. Optional vector floating-point coprocessor for automotive/
industrial controls & 3D graphics acceleration
Comparison Between ARM7, ARM9, ARM11
Processor attribute | ARM7 | ARM9 | ARM11
Pipeline depth | 3-stage | 5-stage | 8-stage
Typical MHz | 80 | 150 | 335
mW/MHz | 0.06 | 0.19 (+ cache) | 0.4 (+ cache)
MIPS/MHz | 0.97 | 1.1 | 1.2
Architecture | Von Neumann | Harvard | Harvard
Multiplier | 8x32 | 8x32 | 16x32
MMU/MPU | Absent | Present | Present
Cache memory | Absent | Present | Present
Configurable TCM | Absent | Present | Present
Jazelle technology | Absent | Present | Present
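The per-MHz figures in the comparison table become easier to compare once they are multiplied out at each core's typical clock. The Python sketch below does that arithmetic using only the table's numbers; the dictionary and function names are illustrative, and the ARM9/ARM11 power figures exclude cache power, as the table notes.

```python
# Typical figures copied from the ARM7/ARM9/ARM11 comparison table.
CORES = {
    "ARM7": {"mhz": 80, "mips_per_mhz": 0.97, "mw_per_mhz": 0.06},
    "ARM9": {"mhz": 150, "mips_per_mhz": 1.1, "mw_per_mhz": 0.19},
    "ARM11": {"mhz": 335, "mips_per_mhz": 1.2, "mw_per_mhz": 0.4},
}


def core_figures(name):
    """Return (MIPS, core power in mW, MIPS per mW) at the typical clock."""
    c = CORES[name]
    mips = c["mhz"] * c["mips_per_mhz"]
    power_mw = c["mhz"] * c["mw_per_mhz"]  # excludes cache power for ARM9/ARM11
    return mips, power_mw, mips / power_mw


for core in CORES:
    mips, mw, eff = core_figures(core)
    print(f"{core}: {mips:.1f} MIPS, {mw:.1f} mW, {eff:.2f} MIPS/mW")
```

On these figures ARM11 delivers about five times the ARM7's throughput (402 vs 77.6 MIPS) at roughly 28 times the core power (134 vs 4.8 mW), which is why the ARM7 remained attractive for cost- and power-sensitive designs.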
Features & Advantages of ARM processor
• Consists of a large uniform register file
• Supports load-store architecture
• Uses simple addressing modes
• Contains a reduced number of instructions
• It has uniform & fixed length (32-bit) instruction fields
• Does not use delayed branches, since they make exception handling
more complex
• Does not use register windows, to keep the chip area, & hence
the cost, small
• Every data-processing instruction controls both the arithmetic logic
unit and the shifter, to make maximum use of the ALU
& the shifter
• Supports auto-increment & auto-decrement addressing modes to
optimize program loops
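Auto-increment addressing can be illustrated with a loop that sums an array. On ARM this is typically a post-indexed load such as LDR r2, [r1], #4, which loads from the address in r1 and then adds 4 to r1 in the same instruction. The Python sketch below mimics that pointer update; the memory dictionary and function name are hypothetical.

```python
def sum_words(memory, base, count):
    """Sum `count` 32-bit words starting at `base`, stepping the pointer the way a
    post-indexed load (LDR r2, [r1], #4) does: use the address, then add 4."""
    total = 0
    r1 = base
    for _ in range(count):
        value = memory[r1]  # load from the current address...
        r1 += 4             # ...then auto-increment the base register
        total += value
    return total, r1        # r1 ends just past the last element
```

Because the pointer update is folded into the load, the loop body needs no separate ADD to advance the pointer.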
Suitability of ARM Processor in
Embedded System
The ARM instruction set differs from the pure RISC definition in
several ways that make the ARM instruction set suitable for
embedded applications:
• Variable cycle execution for certain instructions: Some ARM
instructions like load-store-multiple instructions vary in the
number of execution cycles depending upon the number of
registers being transferred. Load-store-multiple instructions
transfer data on sequential memory addresses which increases
performance since sequential memory accesses are often faster
than random accesses. Multiple register transfers also improve the
code density
Inline barrel shifter to improve core performance & code density:
The ARM arithmetic logic unit has a barrel shifter that is capable
of shift & rotate operations. This inline barrel shifter preprocesses
one of the input registers before it is used by an instruction. This
expands the capability of many instructions to improve core
performance & code density
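The inline barrel shifter can be modelled in Python as a function applied to the second register operand before it reaches the ALU. The sketch covers LSL, LSR, ASR and ROR with shift amounts 0 to 31 only; the special encodings for a shift of #0 (such as RRX) are deliberately not modelled. `add_shifted` mirrors one instruction such as ADD r0, r1, r1, LSL #2, which computes r1 x 5 without a multiply. Both function names are illustrative.

```python
MASK = 0xFFFFFFFF


def barrel_shift(value, op, amount):
    """Shift/rotate applied to Rm before the ALU (amount 0-31 in this sketch)."""
    if not 0 <= amount <= 31:
        raise ValueError("shift amount must be 0-31 in this sketch")
    value &= MASK
    if op == "LSL":   # logical shift left
        return (value << amount) & MASK
    if op == "LSR":   # logical shift right, zeros enter at the top
        return value >> amount
    if op == "ASR":   # arithmetic shift right, sign bit is copied
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK
    if op == "ROR":   # rotate right
        return ((value >> amount) | (value << (32 - amount))) & MASK
    raise ValueError(f"unknown shift {op}")


def add_shifted(rn, rm, op, amount):
    """ADD rd, rn, rm, <op> #amount: shift and add in a single instruction."""
    return (rn + barrel_shift(rm, op, amount)) & MASK


print(add_shifted(7, 7, "LSL", 2))  # 35, i.e. 7 * 5
```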
Thumb 16-bit instruction set: The Thumb instruction set consists of
16-bit instructions that act as a compact shorthand for a subset of
the 32-bit instructions of the standard ARM. These instructions
permit the ARM core to execute either 16 or 32-bit instructions.
The 16-bit instructions improve code density by about 30% over
32-bit fixed length instructions
Conditional Execution: An instruction is executed only when a
specific condition on the CPSR flags is satisfied. This feature improves
performance & code density by reducing the number of branch instructions.
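Conditional execution can be sketched in Python by computing the N, Z, C and V flags that a CMP would set and then testing an ARM condition code against them. The condition table follows the standard ARM condition codes; the function names are illustrative. `max_branchless` corresponds to the two-instruction sequence CMP r0, r1 followed by MOVLT r0, r1, which needs no branch.

```python
MASK = 0xFFFFFFFF


def cmp_flags(a, b):
    """N, Z, C, V flags set by CMP a, b (computes a - b on 32-bit values)."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    n = bool(result & 0x80000000)                  # result negative
    z = result == 0                                # result zero
    c = a >= b                                     # no borrow occurred
    v = bool((a ^ b) & (a ^ result) & 0x80000000)  # signed overflow
    return n, z, c, v


def condition_passed(cond, n, z, c, v):
    """True if an instruction with condition suffix `cond` would execute."""
    checks = {
        "EQ": z, "NE": not z,
        "CS": c, "CC": not c,
        "MI": n, "PL": not n,
        "VS": v, "VC": not v,
        "HI": c and not z, "LS": (not c) or z,
        "GE": n == v, "LT": n != v,
        "GT": (not z) and n == v, "LE": z or n != v,
        "AL": True,
    }
    return bool(checks[cond])


def max_branchless(a, b):
    """Signed max(a, b) as CMP r0, r1 ; MOVLT r0, r1."""
    r0 = a
    if condition_passed("LT", *cmp_flags(a, b)):  # MOVLT runs only if N != V
        r0 = b
    return r0
```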
Enhanced Instructions: The ARM instruction set also supports
enhanced digital signal processor (DSP) instructions. These
instructions allow the ARM processor to serve as a combination of a
processor plus a DSP.
ARM7 Dataflow Model
ARM7 uses a Von Neumann architecture, hence
data coming through the bus is either an
instruction or data (both come from the same memory).
The sign-extend hardware converts
signed 8-bit & 16-bit numbers to 32-bit
values as they are read from
memory & placed in a register (for
signed values); it fills zeros if the value is unsigned.
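What the sign-extend hardware does can be sketched in Python. Signed byte and halfword loads (LDRSB, LDRSH) copy the sign bit into the upper bits of the register, while unsigned loads (LDRB, LDRH) fill them with zeros. The function name is illustrative.

```python
def extend_to_32(value, bits, signed):
    """Widen an 8- or 16-bit value loaded from memory to a 32-bit register value."""
    if bits not in (8, 16):
        raise ValueError("only byte and halfword loads are modelled")
    value &= (1 << bits) - 1                       # keep only the loaded bits
    if signed and value & (1 << (bits - 1)):       # is the sign bit set?
        value |= 0xFFFFFFFF & ~((1 << bits) - 1)   # copy it into the upper bits
    return value                                   # unsigned: upper bits stay zero
```

For example, the byte 0x80 becomes 0xFFFFFF80 (-128) when loaded as signed but 0x00000080 (128) when loaded as unsigned.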
Source operands (Rn & Rm) are
read from the register file using the
internal buses A & B respectively, &
the result Rd is written back.
The PC value is in the address
register, which is fed into
the incrementer; the
incremented value is then copied back into
r15.
• The incremented PC value is also written into the address register, to be used as the
address for the next instruction fetch.
• ALU (arithmetic & logic unit) or MAC (multiply & accumulate
unit): takes the register values Rn & Rm from the A & B buses & computes a
result.
• Data-processing instructions write the result in Rd directly to the register
file.
• Load & store instructions use the ALU to generate an address, which is
held in the address register & broadcast on the address bus.
• Barrel shifter:
– One important feature of the ARM core is that register Rm can optionally be
preprocessed in the barrel shifter before it enters the ALU (left shift,
right shift, rotate, etc.).
– Depending on the instruction, the barrel shifter may be used or it may be
bypassed.
– Together, the barrel shifter & ALU can calculate a wide range of expressions
& addresses in the same cycle.
ARM7 Processor Modes
Mode | When does ARM enter this mode?
Abort | Failed attempt to access memory.
Fast interrupt request (FIQ) | An interrupt request arrives through the FIQ channel (input).
Interrupt request (IRQ) | An interrupt request arrives through the IRQ channel (input).
Supervisor | After reset. It is generally the mode that an OS kernel operates in.
System | Special version of user mode that allows full read-write access to the CPSR.
Undefined | The processor encounters an instruction that is undefined or not supported by the implementation.
User | Normal mode for running application programs.
ARM7 Programmer's Model or Register
Model
Explained
• In total: 17 visible registers + 20 banked registers = 37 registers.
• User mode is the non-privileged mode normally used while executing
applications. The registers active in user mode are:
• 16 data registers (r0 to r15) & one status register (CPSR)
• r0 to r13 are orthogonal general-purpose registers.
• Orthogonal means that any instruction that you can apply to r0 can
equally be applied to any of the other registers.
– E.g. ADD r0, r1, r2
– ADD r5, r6, r7
• r13 is the stack pointer (SP) and stores the top of the stack in the
current processor mode.
• r14 is the link register (LR), where the core puts the return address
when it calls a subroutine.
• r15 (PC), the program counter, stores the address of the next
instruction to be fetched.
• In ARM state, all ARM instructions are 32 bits wide.
• In Thumb state, all instructions are 16 bits wide.
• In ARM state, instructions have to be four-byte aligned in
memory, which implies that the bottom two bits of the PC are
always zero (e.g. memory locations 1000H, 1004H, 1008H).
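The alignment rule can be checked with a bit mask: in ARM state the bottom two address bits must be zero, and in Thumb state, where instructions are 2 bytes wide, only bit 0 must be zero. A minimal Python sketch (the function name is illustrative):

```python
def is_aligned(address, state):
    """ARM state: 4-byte aligned (bits [1:0] zero). Thumb state: 2-byte aligned (bit 0 zero)."""
    if state not in ("ARM", "Thumb"):
        raise ValueError(f"unknown state {state}")
    mask = 0b11 if state == "ARM" else 0b01
    return (address & mask) == 0


for addr in (0x1000, 0x1004, 0x1008, 0x1002):
    print(hex(addr), is_aligned(addr, "ARM"), is_aligned(addr, "Thumb"))
```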
CPSR: Current Program Status Register
• The ARM core uses the CPSR to monitor & control internal
operations.
• The unused part is reserved for future expansion.
• The CPSR is divided into four fields, each 8 bits wide:
flags, status, extension, and control.
• In current designs, the status & extension fields are reserved for
future use.
• Some ARM processor cores have extra bits allocated, e.g. the J bit,
available only on Jazelle-enabled processors, which execute 8-bit
instructions (Java bytecodes).
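Decoding a CPSR value makes the field layout concrete. The Python sketch below uses the standard ARMv4/v5 bit positions: condition flags N, Z, C, V in bits 31 to 28, the J bit in bit 24, the interrupt masks I and F in bits 7 and 6, the Thumb bit T in bit 5 and the mode in bits 4 to 0. Other bits (such as the Q flag of DSP-enhanced cores) are not decoded, and the function names are illustrative.

```python
# Mode encodings for CPSR bits [4:0].
MODES = {
    0b10000: "User", 0b10001: "FIQ", 0b10010: "IRQ", 0b10011: "Supervisor",
    0b10111: "Abort", 0b11011: "Undefined", 0b11111: "System",
}


def bit(value, n):
    return bool((value >> n) & 1)


def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into its four 8-bit fields, flags and mode."""
    cpsr &= 0xFFFFFFFF
    return {
        "flags_field": (cpsr >> 24) & 0xFF,
        "status_field": (cpsr >> 16) & 0xFF,
        "extension_field": (cpsr >> 8) & 0xFF,
        "control_field": cpsr & 0xFF,
        "N": bit(cpsr, 31), "Z": bit(cpsr, 30), "C": bit(cpsr, 29), "V": bit(cpsr, 28),
        "J": bit(cpsr, 24),  # Jazelle state (Jazelle-enabled cores only)
        "I": bit(cpsr, 7),   # 1 = IRQ disabled
        "F": bit(cpsr, 6),   # 1 = FIQ disabled
        "T": bit(cpsr, 5),   # 1 = Thumb state
        "mode": MODES.get(cpsr & 0x1F, "Reserved"),
    }


print(decode_cpsr(0x600000D3)["mode"])  # Supervisor
```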
CPSR Diagram
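SPSR: Saved Program Status Register
• Each exception mode has its own SPSR, which holds a copy of the CPSR saved on entry
to that mode; user and system modes have no SPSR.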
Exceptions
• Generated by external events or internal sources
• Seven types of exceptions
– Reset: Occurs when the Reset pin is asserted (power-up/reset)
– Undefined: Occurs when the currently executing instruction cannot
be recognized
– SWI: Occurs if a program in user mode executes an SWI instruction to
request OS services that are available in supervisor mode
– Prefetch Abort: Occurs if an instruction is fetched from an invalid
address; the exception is taken only when that instruction reaches the execute stage
– Data Abort: Occurs if a data load/store is attempted at an illegal address
– IRQ: Occurs if the IRQ pin goes low (only if the CPSR IRQ mask bit I is
0)
– FIQ: Occurs if the FIQ pin goes low (only if the CPSR FIQ mask bit F is 0)
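How the CPSR and SPSR cooperate when an exception is taken can be sketched in Python. On entry the core copies the CPSR into the SPSR of the new mode, saves a return address in that mode's LR, switches mode, enters ARM state, disables IRQ (and also FIQ for Reset and FIQ) and jumps to the exception vector. The sketch assumes the classic low vector table at address 0 (0x14 is reserved); the caller supplies the LR value because the exact per-exception return offsets are not modelled, nor is the return path that restores CPSR from SPSR. The dictionary and function names are illustrative.

```python
# Vector address and mode entered for each exception (classic low-vector table).
EXCEPTIONS = {
    "Reset": (0x00, 0b10011),           # Supervisor
    "Undefined": (0x04, 0b11011),       # Undefined
    "SWI": (0x08, 0b10011),             # Supervisor
    "Prefetch Abort": (0x0C, 0b10111),  # Abort
    "Data Abort": (0x10, 0b10111),      # Abort
    "IRQ": (0x18, 0b10010),             # IRQ (0x14 is reserved)
    "FIQ": (0x1C, 0b10001),             # FIQ
}
I_BIT, F_BIT, T_BIT = 1 << 7, 1 << 6, 1 << 5


def enter_exception(kind, cpsr, return_address):
    """Return the new register state after the core takes exception `kind`."""
    vector, mode = EXCEPTIONS[kind]
    spsr = cpsr                       # 1. copy CPSR into the new mode's SPSR
    lr = return_address               # 2. return address into the new mode's LR
    new_cpsr = (cpsr & ~0x1F) | mode  # 3. switch to the exception mode
    new_cpsr &= ~T_BIT                # 4. handler runs in ARM state
    new_cpsr |= I_BIT                 # 5. disable IRQ
    if kind in ("Reset", "FIQ"):
        new_cpsr |= F_BIT             #    and FIQ for Reset and FIQ
    return {"cpsr": new_cpsr & 0xFFFFFFFF, "spsr": spsr, "lr": lr, "pc": vector}
```

For example, an IRQ taken in user mode with CPSR = 0x60000010 leaves the flags untouched, sets the mode to IRQ with the I bit set (0x60000092), saves 0x60000010 in SPSR_irq and continues at 0x18.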
Difference between PIC & ARM
No. | PIC | ARM
01. | PIC microcontroller refers to Peripheral Interface Controller. | ARM microcontroller refers to Advanced RISC Machine.
06. | It is based on some features of RISC. | It is based on the RISC instruction set architecture.
11. | It is available at an average cost compared to its features. | It is available at a low cost compared to its features.
“Technological evolution is the result of our own desire to lead a better life.”