
Microprocessor Fundamentals

Belan Bapir Bakr


Table of Contents
• Processor Architecture
• How the CPU Works
• Pipelining

Architecture
In the context of computers, architecture has many meanings
Instruction Set Architecture (ISA)
The parts of a processor design that one needs to understand in order to write assembly/machine code
e.g. instruction set specifications, registers
Example ISAs:
Intel: x86, IA32, x86-64, Itanium
ARM
Microarchitecture (or Computer Organization)
Implementation of the architecture
e.g. cache size, core frequency
Platform Architecture (or System Design)
Memory and I/O buses
Memory controllers
Direct memory access

Components of the Computer
Since 1946 all computers have had 4 components:
Processor: controls the system
Memory: contains program and data
Input devices: get data from the outside world into the computer
Output devices: present computation results to the outside world
[Figure: Input Device, Central Processing Unit, Memory, Output Device]

What is the CPU?
CPU is the brain of the computer system
The computer works by executing a program containing many instructions
Program
Sequence of instructions that perform a task
Think of executing a program like playing music
Instruction
A binary code representing the simplest operation performed by the processor
Think of an individual instruction as a note coming from a musical instrument
Code forms:
Machine code: byte-level program that a processor executes
Assembly code: text representation of machine code

Memory
k × m array of stored bits (k is usually 2^n)
Address: unique (n-bit) identifier of location
Contents: m-bit value stored in location
Basic operations:
LOAD: read a value from a memory location
STORE: write a value to a memory location
Basic memory types:
RAM: read-and-write
ROM: read-only
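
A minimal sketch of the memory model above. The `Memory` class and its method names are hypothetical, chosen only for illustration; it assumes n-bit addresses and m-bit contents, and simply rejects addresses outside the range.

```python
class Memory:
    """A k x m array of stored bits, with k = 2**n locations of m bits each."""

    def __init__(self, n: int, m: int) -> None:
        self.n = n                     # address width in bits
        self.m = m                     # width of each stored value in bits
        self.cells = [0] * (2 ** n)    # k = 2**n locations, all initially 0

    def load(self, address: int) -> int:
        """LOAD: read the value stored at a memory location."""
        self._check(address)
        return self.cells[address]

    def store(self, address: int, value: int) -> None:
        """STORE: write a value to a memory location, keeping only m bits."""
        self._check(address)
        self.cells[address] = value & ((1 << self.m) - 1)

    def _check(self, address: int) -> None:
        if not 0 <= address < len(self.cells):
            raise IndexError(f"address {address:#x} is outside the {self.n}-bit range")


ram = Memory(n=4, m=8)    # 16 locations of 8 bits each
ram.store(0x3, 0x1FF)     # only the low 8 bits fit in the location
print(ram.load(0x3))      # prints 255
```

A ROM could be modelled the same way by making `store` raise an error.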

Buses
Most processors use the three-bus architecture.
Address bus
Carries the address of memory or I/O device for a data transfer. Determines the addressing range.
Unidirectional: always acts as an output of the CPU
Data bus
Carries data to be transferred between processor and memory or I/O.
Bidirectional: set as input when reading data, and output when writing data
Control bus
Carries status and control signals required for various operations
An assortment of signals, anything not address or data
e.g. R/W*, IO/M*, Interrupt, DMA

Address Bus
The address bus contains the address of the memory location or I/O device selected for a data transfer.
Address bus width determines the addressing range.

CPU      Address bus width (bits)   Total addressable memory
8051     16                         2^16 = 65,536 = 64 KB
8086     20                         2^20 = 1,048,576 = 1 MB
68000    24                         2^24 = 16,777,216 = 16 MB
ARM      32                         2^32 = 4,294,967,296 = 4 GB
Xeon     64                         2^64 = 18,446,744,073,709,551,616 = 16 EB
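
The table entries can be reproduced with a short calculation. This is only a sketch: the function names are hypothetical, and it assumes byte-addressable memory and binary units (1 KB = 1024 bytes), as the table does.

```python
def addressable_locations(address_bus_width: int) -> int:
    """Number of distinct addresses an n-bit address bus can select."""
    return 2 ** address_bus_width


def human_size(num_bytes: int) -> str:
    """Express a byte count in binary units (1 KB = 1024 B)."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    index = 0
    while num_bytes >= 1024 and index < len(units) - 1:
        num_bytes //= 1024
        index += 1
    return f"{num_bytes} {units[index]}"


for cpu, width in [("8051", 16), ("8086", 20), ("68000", 24), ("ARM", 32), ("Xeon", 64)]:
    size = addressable_locations(width)
    print(f"{cpu}: 2^{width} = {size:,} = {human_size(size)}")
```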

Data Bus
Width of data bus determines the amount of data transferable in one step
Most microcontrollers have 8-bit data buses
Can transfer 1 byte at any one time
A 32-bit word requires 4 transfers
ARM has a 32-bit data bus
Can transfer 4 bytes at once
Some chips have an external bus with a selectable width of 8, 16 or 32 bits
Selecting a smaller data bus results in lower performance but enables interfacing to lower-cost memory devices
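
The cost of a narrow data bus can be estimated by counting transfers. A minimal sketch, assuming a hypothetical helper `transfers_needed` and that each transfer moves exactly one bus width of data:

```python
def transfers_needed(word_bits: int, bus_bits: int) -> int:
    """Number of bus transfers needed to move one word (ceiling division)."""
    return -(-word_bits // bus_bits)


for bus_bits in (8, 16, 32):
    count = transfers_needed(32, bus_bits)
    print(f"32-bit word over a {bus_bits}-bit data bus: {count} transfer(s)")
```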

Address and Data Buses
[Figure: Motorola 68000 bus sizes. The CPU connects to memory through a 24-bit address bus (2^24 addresses, from $000000 to $FFFFFF) and a 16-bit data bus (bits 15 to 0).]

Inside the CPU: Memory-Facing Registers
Registers store binary data. The following registers interface with memory.

Program Counter (PC): points to the address of the instruction currently being executed by the CPU
Instruction Register (IR): stores the instruction read from the address indicated by the PC
Address Register: stores the address of the current data
Data Register: stores the value read from the address indicated by the Address Register

Inside the CPU: Internal Components
These components do not have direct access to memory.

Decoder: interprets the instruction brought to the IR and passes it to the CU
Control Unit (CU): generates control signals based on the instruction detected by the Decoder
Arithmetic/Logic Unit (ALU): performs arithmetic and logic operations
Accumulator (ACC): a register that stores the values used before and after an ALU operation

ALU size
The ALU operates on several bits simultaneously
“The size of the processor”
Usually (but not always) determines data bus size
Typical sizes:
4 bits (remote controls, etc.)
8 bits (microcontrollers: 68HC05, 8051, PIC)
16 bits (low-end microprocessors: Intel 8086)
32 bits (most popular size today: ARM, MIPS)
64 bits (servers: IBM POWER, Intel Xeon)

Table of Contents
• Processor Architecture
• How the CPU Works
• Pipelining

How does the CPU Work?

An instruction has 3 phases of execution:
Fetch: fetch the instruction from memory
Decode: interpret the binary instruction
Execute: perform the requested action

The Control Unit (CU) orchestrates the complete execution of each instruction
At the CU’s heart is a Finite State Machine (FSM) that sets up the state of the logic circuits according to each instruction.
This process is governed by the system clock
The FSM goes through one transition (“machine cycle”) for each tick of the clock.
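
The clock-driven FSM can be sketched in a few lines. This is a deliberate simplification: it assumes only three states, one per phase, whereas a real control unit has many more. The names `PHASES`, `next_phase` and `run_clock` are hypothetical.

```python
PHASES = ("FETCH", "DECODE", "EXECUTE")


def next_phase(phase: str) -> str:
    """One FSM transition: move to the next phase, wrapping after EXECUTE."""
    return PHASES[(PHASES.index(phase) + 1) % len(PHASES)]


def run_clock(ticks: int, start: str = "FETCH") -> list[str]:
    """Return the phase the FSM is in at each clock tick."""
    trace = []
    phase = start
    for _ in range(ticks):
        trace.append(phase)
        phase = next_phase(phase)   # exactly one transition per tick
    return trace


print(run_clock(6))
# ['FETCH', 'DECODE', 'EXECUTE', 'FETCH', 'DECODE', 'EXECUTE']
```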

Inside the CPU: An Example Program

C version

    void main(void)
    {
        int a = 1;
        int b = 2;
        int c;
        c = a + b;
    }

Assembly version

    Address   Assembly
    -------   ---------
    0x1000    LOAD 0x2000
    0x1002    ADD 0x2002
    0x1004    STORE 0x2004

Explanation

    LOAD 0x2000    Load the value of a into the Data Register.
    ADD 0x2002     Add the previously loaded value of a and the newly loaded value of b, and save the sum in ACC.
    STORE 0x2004   Save the sum to the address of c.

Executing LOAD instruction
1. The address of the instruction the CPU is about to execute, 0x1000, is in the PC.
2. The PC value, 0x1000, is put in the Address Register.
3. When 0x1000 enters the Address Register, location 0x1000 of the memory is accessed automatically.
4. The instruction there is read from memory.
5. The instruction is stored in the IR (LOAD 0x2000).
6. The instruction goes into the decoder. At the same time, the PC is increased.
7. The decoder interprets the instruction. The CU understands that it must get the value at address 0x2000.
8. The CU generates control signals to read the value at 0x2000 from memory.
9. The value 1 is entered into the Data Register by the control signals generated by the CU.
10. The value in the Data Register is available to any circuit that needs it.
11. Since this value may be operated on by the ALU, it is temporarily stored in the ACC.

Executing ADD instruction
1. As with LOAD, the PC holds the address of the instruction to execute. It is now 0x1002, because the PC has already been increased.
2. 0x1002 is put in the Address Register.
3. Address 0x1002 is accessed.
4. The value at 0x1002 is made available.
5. This value is stored in the IR.
6. The value in the IR is sent to the decoder. At the same time, the PC is increased.
7. The decoder interprets the value in the IR. The CU understands that it must add the value at address 0x2002.
8. The CU generates control signals to read the value at 0x2002, based on the decoder's interpretation. The ALU is given a control signal to add.
9. The data from 0x2002 is loaded and saved in the Data Register.
10. The ALU adds the data in the Data Register to the current value in the ACC.
11. The sum replaces the old value in the ACC.

Executing STORE instruction
1. The instruction to be executed is at 0x1004, which is the PC value.
2. The value in the PC is transferred to the Address Register.
3. Location 0x1004 is accessed.
4. The value from 0x1004 is made available to the CPU.
5. This value is saved in the IR.
6. The value in the IR is made available to the decoder, and the PC is incremented.
7. The decoder interprets the value in the IR.
8. The CU generates control signals to store the value in the ACC at 0x2004, based on the decoder's interpretation.
9. The output of the ALU is stored in the ACC.
10. Finally, the value in the ACC is stored in location 0x2004.
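
The three walk-throughs can be replayed with a toy simulator. This is a sketch under stated assumptions, not a model of any real processor: memory is a Python dictionary, each instruction is stored as a hypothetical `(opcode, operand)` pair rather than as binary machine code, and the PC advances by 2 to match the addresses of the example program.

```python
def run(memory: dict, pc: int, steps: int) -> tuple[int, int]:
    """Execute `steps` instructions of a toy accumulator machine."""
    acc = 0
    for _ in range(steps):
        address_register = pc             # PC value goes to the Address Register
        ir = memory[address_register]     # the instruction is read into the IR
        pc += 2                           # PC is increased while the IR is decoded
        opcode, operand = ir              # the decoder splits the instruction
        if opcode == "LOAD":
            data_register = memory[operand]   # value arrives in the Data Register
            acc = data_register               # and is kept in the ACC
        elif opcode == "ADD":
            data_register = memory[operand]
            acc = acc + data_register         # the ALU sum replaces the old ACC
        elif opcode == "STORE":
            memory[operand] = acc             # ACC is written back to memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return acc, pc


memory = {
    0x1000: ("LOAD", 0x2000),
    0x1002: ("ADD", 0x2002),
    0x1004: ("STORE", 0x2004),
    0x2000: 1,   # a
    0x2002: 2,   # b
    0x2004: 0,   # c
}
acc, pc = run(memory, pc=0x1000, steps=3)
print(memory[0x2004])   # prints 3
```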

Table of Contents
• Processor Architecture
• How the CPU Works
• Pipelining

Idea of the Pipeline 1/3

The sequence of operations of the CPU is the same regardless of the instruction
Separate circuits are active during different phases of execution
The phases of different instructions can therefore be executed in parallel

Idea of the Pipeline 2/3
With pipelining, the stages (fetch, decode, execute) of different instructions can be processed at once.
Pipelining is used even in the smallest $2 microcontroller.
For our short program, while LOAD 0x2000 is being executed, ADD 0x2002 is being decoded and STORE 0x2004 is being fetched from memory.

Idea of the Pipeline 3/3
The 3-stage pipeline of the famous ARM7.
[Figure: pipeline timing diagram, 1 clock cycle per cell, from the first cycle to the third cycle]
In the third cycle, the first opcode is executed, the second opcode is decoded, and the third opcode is fetched, all at once.
Without pipelining, fetch-decode-execute takes 3×3 = 9 cycles to execute 3 opcodes.
With a pipeline, it takes only 5 cycles to execute 3 opcodes.
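
The timing of the 3-stage pipeline can be drawn as a small text diagram. A sketch only, assuming an ideal pipeline with no stalls in which every stage takes exactly one clock cycle; `pipeline_schedule` is a hypothetical helper name.

```python
def pipeline_schedule(instructions, stages=("F", "D", "E")):
    """Return (total cycles, one row of cells per instruction) for an ideal pipeline."""
    k = len(stages)
    n = len(instructions)
    total = k + n - 1                       # cycles until the last instruction finishes
    rows = []
    for i, name in enumerate(instructions):
        cells = ["."] * total
        for s, stage in enumerate(stages):
            cells[i + s] = stage            # instruction i enters the pipeline at cycle i
        rows.append((name, cells))
    return total, rows


program = ["LOAD 0x2000", "ADD 0x2002", "STORE 0x2004"]
total, rows = pipeline_schedule(program)
for name, cells in rows:
    print(f"{name:<14}" + " ".join(cells))
print(f"pipelined: {total} cycles, non-pipelined: {3 * len(program)} cycles")
```

Reading the third column of the printed diagram gives E, D, F: the first opcode is executing while the second is decoded and the third is fetched.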

A 5-stage Pipeline
The instruction execution steps can be refined to increase the number of pipeline stages

[Figure: timing of non-pipelined versus pipelined execution]

Pipeline Performance
Latency
Defined as the time (or #cycles) from entering the pipeline until an instruction completes
Pipelining doesn’t help the latency of a single task
Throughput
Defined as the number of instructions executed per time period
Potential speedup = Number of pipeline stages

Trivia
The longest pipeline on a commercial machine is 31 stages on the Intel Pentium 4.

Speedup
A k-stage pipeline processes n tasks in k + (n − 1) clock cycles:
k cycles for the first task and
n − 1 cycles for the remaining n − 1 tasks
Total time to process n tasks with k stages and clock cycle τ:
For the pipelined processor:
$$T_k = [k + (n - 1)]\tau$$
For the non-pipelined processor:
$$T_1 = nk\tau$$
Speedup ($S_k \to k$ as $n \to \infty$):
$$S_k = \frac{T_{\text{non-pipelined}}}{T_{\text{pipelined}}} = \frac{T_1}{T_k} = \frac{nk\tau}{[k + (n - 1)]\tau} = \frac{nk}{k + n - 1}$$
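
Evaluating the speedup formula for a few values of n shows how it approaches k. A minimal sketch; the function name `speedup` is hypothetical, and the clock cycle τ cancels out, so it does not appear.

```python
def speedup(k: int, n: int) -> float:
    """S_k = nk / (k + n - 1) for a k-stage pipeline processing n tasks."""
    return (n * k) / (k + n - 1)


print(speedup(3, 3))                  # 3 opcodes on a 3-stage pipeline: 9 / 5 cycles
for n in (1, 10, 100, 10_000):
    print(f"k=5, n={n}: speedup = {speedup(5, n):.3f}")
```

With a single task the speedup is 1, which matches the earlier point that pipelining does not help the latency of a single task.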
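Clocking
[Figure: adjacent pipeline stages S_i and S_{i+1} separated by a latch, with stage delay τ_m and latch delay d]
Latch delay: d
Clock cycle of the pipeline: τ
$$\tau = \max(\tau_m) + d$$
where τ_m is the delay of stage m.
Pipeline frequency: f
$$f = \frac{1}{\tau}$$
∴ Pipeline rate is limited by the slowest pipeline stage.
Also, each additional stage adds a latch delay d.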
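
A worked example of the clocking formulas. The stage and latch delays below are made-up illustrative values, not measurements of any real processor, and the function names are hypothetical.

```python
def clock_period(stage_delays, latch_delay):
    """tau = max(tau_m) + d: the slowest stage plus one latch delay."""
    return max(stage_delays) + latch_delay


def pipeline_frequency(period):
    """f = 1 / tau."""
    return 1.0 / period


stage_delays_ns = [2.0, 3.0, 2.5]    # assumed delays of three stages, in ns
latch_delay_ns = 0.5                 # assumed latch delay, in ns

tau_ns = clock_period(stage_delays_ns, latch_delay_ns)
freq_mhz = pipeline_frequency(tau_ns * 1e-9) / 1e6
print(f"tau = {tau_ns} ns, f = {freq_mhz:.1f} MHz")
```

Speeding up the 2.0 ns or 2.5 ns stage would not change τ here, because the 3.0 ns stage sets the clock.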
Limits to Pipelining
Hazards prevent the next instruction from executing during its designated clock cycle
Structural hazards
Two instructions attempting to use the same resource at the same time
Data hazards
An instruction attempting to use data before it is available in the register file
Control hazards
Caused by branch instructions, which invalidate what is already in the pipeline, requiring flushing and refilling.
The simplest solution is to stall the pipeline until the hazard is resolved, inserting one or more “bubbles” in the pipeline
More stall cycles = lower performance
Complex solutions include branch prediction and data forwarding, which are outside the scope of this course
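
The cost of stalling can be estimated by adding bubbles to the ideal cycle count. A rough sketch, assuming each bubble delays completion by exactly one cycle; the function names and the numbers in the example are hypothetical.

```python
def pipelined_cycles(k: int, n: int, stall_cycles: int = 0) -> int:
    """Ideal k + (n - 1) cycles, plus one extra cycle per bubble."""
    return k + (n - 1) + stall_cycles


def effective_cpi(k: int, n: int, stall_cycles: int = 0) -> float:
    """Average cycles per instruction, including stalls."""
    return pipelined_cycles(k, n, stall_cycles) / n


# 100 instructions on a 5-stage pipeline, with and without 20 bubbles.
print(pipelined_cycles(5, 100), effective_cpi(5, 100))
print(pipelined_cycles(5, 100, 20), effective_cpi(5, 100, 20))
```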

CPU Performance
Processor performance is a function of:
IC: instruction count
CPI: cycles per instruction
Clk: clock cycle time

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} = IC \times CPI \times Clk$$

Reducing any of the 3 factors will lead to improved performance.
Pipelining reduces CPI.
Best case: CPI = 1.
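
A quick numerical check of the CPU time equation. The instruction count, CPI values and clock cycle time below are illustrative assumptions only, and `cpu_time` is a hypothetical helper name.

```python
def cpu_time(instruction_count: float, cpi: float, clock_cycle_s: float) -> float:
    """CPU time = IC x CPI x Clk, in seconds."""
    return instruction_count * cpi * clock_cycle_s


ic = 1_000_000_000    # assumed: 1 billion instructions in the program
clk = 1e-9            # assumed: 1 ns clock cycle (1 GHz)

print(f"CPI = 3.0: {cpu_time(ic, 3.0, clk):.2f} s")   # e.g. no pipelining, 3 cycles each
print(f"CPI = 1.0: {cpu_time(ic, 1.0, clk):.2f} s")   # best case with pipelining
```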
