Module 5

Module 5 covers processor structure, register organization, instruction cycles, and pipelining techniques. It discusses the functions of processors, types of registers, instruction execution characteristics, and the benefits of reduced instruction set architecture (RISC). Additionally, it addresses challenges such as pipeline hazards and optimization strategies for efficient instruction execution.


Processor Structure and Reduced Instruction Set
MODULE 5

Module 5: Processor Structure and Reduced Instruction Set

• Processor organization,
• Register organization
• Instruction cycle
• Instruction pipelining
• Processor Organization for Pipelining
• Instruction Execution Characteristics
• The Use of a Large Register File,
• Compiler-Based Register Optimization
• Reduced Instruction Set Architecture
Processor Functions
• A processor must perform several functions:
• Fetch instruction: Reads instructions from memory.
• Interpret instruction: Decodes the instruction to
determine the operation.
• Fetch data: Reads data from memory or I/O devices.
• Process data: Performs arithmetic or logical operations.
• Write data: Stores results in memory or sends them to an
I/O device.
• To achieve these tasks efficiently, the processor contains
registers, an ALU (Arithmetic and Logic Unit), and a Control
Unit (CU).
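The five functions above can be illustrated with a minimal fetch-interpret-execute loop. This is a hedged sketch, not a real ISA: the tuple instruction format and the LOAD/ADD/HALT opcodes are invented for illustration.

```python
# Minimal sketch of the processor functions listed above.
registers = {"R1": 0}
program = [("LOAD", "R1", 5), ("ADD", "R1", 2), ("HALT",)]

pc = 0
while True:
    instr = program[pc]         # fetch instruction
    pc += 1
    op = instr[0]               # interpret instruction
    if op == "HALT":
        break
    _, reg, value = instr       # fetch data (operand from the instruction)
    if op == "LOAD":
        registers[reg] = value  # write data
    elif op == "ADD":
        registers[reg] += value # process data, then write the result
print(registers["R1"])          # -> 7
```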
Register Organization
• Registers act as high-speed memory within the processor. They can be
categorized into:

1. User-Visible Registers:
• Used in assembly-level programming to minimize memory access.

• Types:
• General-purpose registers: Store operands for any operation.

• Data registers: Hold integer or floating-point values.

• Address registers: Store memory addresses for instructions.

• Condition code registers: Store flags like zero, carry, overflow, etc.
Register Organization
2. Control and Status Registers:
• Used by the processor and OS to manage execution.

• Common registers:
• Program Counter (PC): Stores the address of the next instruction.

• Instruction Register (IR): Holds the fetched instruction.

• Memory Address Register (MAR): Holds the address of data in memory.

• Memory Buffer Register (MBR): Temporarily stores data read from or written to memory.

• Program Status Word (PSW): Holds condition codes, interrupt enable/disable bits, and execution mode.
Instruction Cycle
The instruction cycle consists of fetch, decode, execute, and interrupt handling stages:

1. Fetch: Reads the instruction from memory into the Instruction Register (IR).

2. Decode: Determines the operation and required operands.

3. Execute: Performs the operation using the ALU or registers.

4. Interrupt Handling (if needed): Saves the current state and executes the interrupt
service routine.
• If indirect addressing is used, an indirect cycle fetches the actual operand address.

• The cycle repeats for each instruction in the program.


Data Flow in the Processor
Data moves between the PC, MAR, MBR, IR, and ALU during execution:

1. Fetch Cycle:
• PC → MAR → Address Bus

• Memory → MBR → IR (Instruction is fetched)

• PC is incremented for the next instruction

2. Indirect Cycle (if needed):


• MBR (stores address) → MAR → Memory read

• MBR (stores final operand address)


Data Flow in the Processor
Data moves between the PC, MAR, MBR, IR, and ALU during execution:

3. Execute Cycle:
• ALU operates on data from registers or memory.

• The result is stored in a register or memory location.

4. Interrupt Cycle:
• PC (saved in memory) → New PC loaded with interrupt routine address
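The fetch, indirect, and execute data flows above can be traced in a small sketch. The register names mirror the slides (PC, MAR, MBR, IR); the memory contents and addresses are made up for illustration.

```python
# Sketch of the fetch and indirect cycles described above.
memory = {100: "LOAD R1, (200)", 200: 300, 300: 42}

PC = 100
MAR = PC               # PC -> MAR: address placed on the address bus
MBR = memory[MAR]      # Memory -> MBR: instruction read from memory
IR = MBR               # MBR -> IR: instruction latched for decoding
PC += 1                # PC incremented for the next instruction

# Indirect cycle: the address field in IR points at the operand's address
MAR = 200              # address taken from the instruction in IR
MBR = memory[MAR]      # memory read returns the final operand address (300)
operand = memory[MBR]  # execute cycle can now fetch the actual operand
print(operand)         # -> 42
```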
Instruction Pipelining
Pipelining Strategy

• Instruction pipelining is a technique used to improve CPU performance by overlapping


instruction execution, much like an assembly line in a factory.

• Instead of executing one instruction at a time, the processor breaks the execution into stages,
with multiple instructions being processed simultaneously.

• Basic Two-Stage Pipeline


1. Fetch Instruction (FI): Reads the instruction from memory.
2. Execute Instruction (EI): Decodes and executes the instruction.

• This approach improves speed by allowing a new instruction to be fetched while another is
being executed. However, execution time varies, and branch instructions cause delays.
Instruction Pipelining
Six-Stage Instruction Pipeline

To optimize performance, instruction processing can be broken into more stages:

1. Fetch Instruction (FI): Reads the instruction into a buffer.

2. Decode Instruction (DI): Determines the opcode and operands.

3. Calculate Operands (CO): Computes effective addresses.

4. Fetch Operands (FO): Retrieves operands from memory or registers.

5. Execute Instruction (EI): Performs the computation.

6. Write Operand (WO): Stores the result.

• With a six-stage pipeline, multiple instructions are in different stages simultaneously. If each
stage takes the same amount of time, total execution time is significantly reduced.

• Challenges: Memory conflicts, branch instructions, and interrupts can stall the pipeline.
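Assuming each of the six stages takes exactly one cycle and no hazards occur, the overlap can be tabulated programmatically; with k stages and n instructions the last instruction completes in cycle k + n - 1.

```python
# Ideal six-stage pipeline timing chart (one cycle per stage, no stalls).
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def pipeline_chart(n_instructions):
    """Return {instruction: {stage: cycle}} for an ideal pipeline."""
    chart = {}
    for i in range(n_instructions):
        # instruction i enters FI in cycle i+1 and advances one stage per cycle
        chart[i] = {stage: i + s + 1 for s, stage in enumerate(STAGES)}
    return chart

chart = pipeline_chart(9)
total_cycles = chart[8]["WO"]   # cycle in which the last instruction finishes
print(total_cycles)             # 6 + 9 - 1 = 14 cycles, vs 6 * 9 = 54 unpipelined
```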
Pipeline Performance

Pipeline Hazards
Hazards occur when instruction dependencies prevent continuous execution.
Three types of hazards exist:

1. Resource Hazards (Structural Hazards)


• Occurs when multiple instructions require the same hardware resource.

• Example: If a memory read and an instruction fetch cannot occur simultaneously, the
pipeline must stall.

• Solution: Increase hardware resources (e.g., multiple memory ports, multiple ALUs).
Pipeline Hazards
2. Data Hazards
• Occurs when an instruction depends on the result of a previous instruction still in the pipeline
• Types:
• Read After Write (RAW) Hazard: A register read occurs before the previous instruction writes to it.
• Write After Read (WAR) Hazard: A write occurs before a previous instruction reads from the same location.
• Write After Write (WAW) Hazard: Two instructions write to the same location out of order.
• Example of RAW Hazard:
• ADD EAX, EBX ; EAX = EAX + EBX
• SUB ECX, EAX ; ECX = ECX - EAX (EAX is not ready)
• The pipeline must stall for EAX to be updated before being used.
• Solution:
• Forwarding (Bypassing): Pass data directly to dependent instructions.
• Pipeline Stalling: Delay execution until data is ready.
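The RAW example above can be checked mechanically. A minimal sketch, assuming a made-up (destination, source1, source2) tuple encoding for instructions:

```python
# Detect a RAW hazard between a producer and the instruction that follows it.
def raw_hazard(producer, consumer):
    """True if `consumer` reads a register that `producer` writes."""
    dst, _, _ = producer
    _, s1, s2 = consumer
    return dst in (s1, s2)

add = ("EAX", "EAX", "EBX")   # ADD EAX, EBX ; writes EAX
sub = ("ECX", "ECX", "EAX")   # SUB ECX, EAX ; reads EAX too early
print(raw_hazard(add, sub))   # True: the pipeline must stall or forward EAX
```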
Pipeline Hazards
3. Control Hazards (Branch Hazards)
• Occur when the pipeline fetches the wrong instruction after a branch (jump, if-else, loops,
etc.)

• Until the branch is executed, the pipeline does not know which instruction to fetch next.

• Penalty: Flushing incorrect instructions from the pipeline causes delays.


Handling Branch Hazards
1. Multiple Streams: Fetch both possible branch targets.
• Problem: Wastes resources and increases complexity.

2. Prefetch Branch Target: Fetch the next instruction and the branch target in
parallel.
• Used in IBM 360/91.

3. Loop Buffer: A small cache that stores recently executed instructions.


• If a loop repeats, it fetches instructions from the buffer instead of memory.

• Used in CDC Star-100, CRAY-1.

Handling Branch Hazards
4. Branch Prediction: The CPU predicts whether a branch will be taken.
• Static Prediction:
• Always Not Taken: Assume the branch is never taken.
• Always Taken: Assume the branch is always taken.
• Opcode-based Prediction: Certain opcodes predict branch behavior.

• Dynamic Prediction:
• Uses history to make better guesses.
• Taken/Not Taken Switch: Stores whether a branch was taken previously.
• Branch History Table (BHT): A cache that stores past branch decisions.

5. Delayed Branching:
• The CPU reorders instructions to execute useful instructions before resolving the branch.
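The "taken/not taken switch" above can be sketched as a one-bit predictor that remembers the last outcome of each branch; the loop-branch outcome sequence here is illustrative.

```python
# One-bit dynamic branch predictor (a tiny branch history table).
history = {}  # branch address -> last outcome (True = taken)

def predict(addr):
    return history.get(addr, False)   # static fallback: "always not taken"

def update(addr, taken):
    history[addr] = taken

correct = 0
outcomes = [True, True, True, True, False]  # a loop branch: taken 4x, then exits
for taken in outcomes:
    if predict(0x400) == taken:
        correct += 1
    update(0x400, taken)
print(correct)  # 3 of 5 correct: wrong on the first iteration and the loop exit
```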
Problem
• A pipelined processor has a clock rate of 2.5 GHz and executes a program with 1.5
million instructions. The pipeline has five stages, and instructions are issued at a
rate of one per clock cycle. Ignore penalties due to branch instructions and
out-of-sequence executions.
• a. What is the speedup of this processor for this program compared to a nonpipelined
processor?
• b. What is throughput (in MIPS) of the pipelined processor?
• Solution:
• Given:
• clock_rate_ghz = 2.5 # GHz
• instructions = 1.5e6 # 1.5 million instructions
• pipeline_stages = 5 # 5-stage pipeline
• instruction_issue_rate = 1 # One instruction per clock cycle
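Assuming the standard pipeline timing model (k cycles to fill the pipe, then one instruction completing per cycle), the solution follows from the speedup formula S = n*k / (k + n - 1):

```python
# Worked solution for the given values.
clock_rate = 2.5e9      # 2.5 GHz
n = 1.5e6               # instructions
k = 5                   # pipeline stages

# a. Speedup over a nonpipelined processor taking k cycles per instruction
speedup = (n * k) / (k + n - 1)
print(round(speedup, 4))        # ~5.0: approaches k for large n

# b. Throughput: one instruction completes per cycle once the pipe is full
mips = clock_rate / 1e6
print(mips)                     # 2500.0 MIPS
```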
Problem
• A nonpipelined processor has a clock rate of 2.5 GHz and an average CPI (cycles
per instruction) of 4. An upgrade to the processor introduces a five-stage
pipeline. However, due to internal pipeline delays, such as latch delay, the clock
rate of the new processor has to be reduced to 2 GHz.
• a. What is the speedup achieved for a typical program?
• b. What is the MIPS rate for each processor?
• Solution:
• Given:
• clock_rate_non_pipelined = 2.5e9 # 2.5 GHz
• CPI_non_pipelined = 4
• clock_rate_pipelined = 2.0e9 # 2 GHz
• pipeline_stages = 5
• CPI_pipelined = 1 # Ideal pipeline CPI
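Comparing time per instruction on the two processors gives the speedup and MIPS rates:

```python
# Worked solution: time per instruction = CPI / clock rate.
f_old, cpi_old = 2.5e9, 4      # nonpipelined: 2.5 GHz, CPI = 4
f_new, cpi_new = 2.0e9, 1      # pipelined: 2 GHz, ideal CPI = 1

# a. Speedup = old time per instruction / new time per instruction
speedup = (cpi_old * f_new) / (cpi_new * f_old)
print(speedup)                  # 3.2

# b. MIPS rate = clock rate / (CPI * 10^6)
print(f_old / (cpi_old * 1e6))  # 625.0 MIPS (nonpipelined)
print(f_new / (cpi_new * 1e6))  # 2000.0 MIPS (pipelined)
```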
Instruction Execution Characteristics
• Instruction execution characteristics are the patterns and behaviors observed during
the execution of high-level language (HLL) programs when compiled to machine-level code.
These characteristics help architects understand:
• What types of instructions occur most frequently
• How operands are used
• How control flows (e.g., branches, loops)
• Which parts of programs consume the most time
1. Operations Performed:
• Studies show that the most frequently executed operations in compiled HLL programs
are assignments, procedure calls, and conditional branches.

• Data movement and control flow dominate, not complex arithmetic.


Instruction Execution Characteristics
2. Operand Usage
• Operand = the data on which operations are performed.
• Findings:
• Most operands are simple scalar variables (e.g., integers, chars)
• Around 80% are local to the procedure/function
• Arrays, structures, and pointers are used less frequently
• Since most data is local, registers are ideal for holding them.

3. Execution Sequencing
• Most instructions are simple (e.g., add, load, branch)
• Procedure calls and returns are frequent and expensive
• Branch instructions (like if, for, while) are common and affect pipeline flow
• Efficient support for procedure calls, register use, and branch prediction is critical.
Instruction Execution Characteristics
• Example insight (from studies like Patterson’s and Hennessy’s):

• Even though CALL/RETURN occurs less frequently, it consumes a


disproportionate amount of time, due to saving/restoring context.
Instruction Execution Characteristics
• Instruction execution characteristics help architects design better CPUs:
• Add more registers for fast operand access

• Use simple instruction formats for fast decoding

• Design better pipelines and branch prediction

• Optimize hardware for realistic program behavior, not hypothetical workloads


Use of a Large Register File
• Large register file — a fast, on-chip storage space used to hold operands and
temporary results.
• This design choice is driven by the desire to minimize costly memory accesses
and maximize processor speed.
• Why Use a Large Register File?
1. Registers are faster than memory
• Accessing data in a register is much quicker than accessing cache or main memory.
• Keeping operands in registers significantly speeds up instruction execution.
2. High frequency of scalar and local variable access
• Studies show most high-level language (HLL) variables are:
• Scalars (e.g., integers, characters)
• Local to procedures (used within a function)
• Therefore, keeping these frequently used variables in registers is efficient.
Use of a Large Register File
• Why Use a Large Register File?
3. Reduces load/store operations
• RISC design limits memory access to only LOAD and STORE instructions.

• With enough registers, most operations can be done register-to-register, reducing memory traffic.

• Register Windows
• Each procedure needs its own set of registers.

• Calling another procedure (or returning from one) would typically require
saving/restoring registers to/from memory — slow.

• Hence use register windows — overlapping sets of registers assigned to each procedure.
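A hedged sketch of how overlapping windows pass parameters without memory traffic; the window size, overlap, and register-file size here are illustrative, not taken from any real machine.

```python
# Overlapping register windows: the caller's "out" registers are the same
# physical registers as the callee's "in" registers.
WINDOW = 16       # registers visible per procedure (hypothetical)
OVERLAP = 4       # out-registers of caller == in-registers of callee

physical = [0] * 64          # the large physical register file
cwp = 0                      # current window pointer

def reg(i):
    """Map a window-relative register number to a physical register index."""
    return (cwp * (WINDOW - OVERLAP) + i) % len(physical)

physical[reg(12)] = 99       # caller writes an "out" register (r12..r15)
cwp += 1                     # CALL: slide the window forward
print(physical[reg(0)])      # callee reads the same value as an "in" register
```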
Use of a Large Register File
• Global Variables
• Register windows are great for local variables, but global variables (shared across
functions) can't be held in these rotating windows.
• Solutions:
1. Assign global variables to memory (traditional)
2. Use fixed “global registers” — a small set of registers always accessible to all procedures
• Note: for frequently used local variables, a register file is faster and more efficient than a cache.
Use of a Large Register File
• Benefits of a Large Register File

• Reduced memory access = better performance

• Enables faster procedure calls with register windows

• Improves instruction pipelining efficiency

• Allows more operand storage for HLL programs


Compiler-Based Register Optimization
• In RISC architecture, the number of physical (hardware) registers is limited
(e.g., 16, 32). But high-level language (HLL) programs use many variables. So the
compiler is responsible for:
• Keeping as many frequently-used variables in registers as possible

• Minimizing load/store instructions to/from memory

• Reusing registers when possible without conflicts

• This is called register allocation and optimization.


Compiler-Based Register Optimization
• Graph Coloring Algorithm
• This is the most common algorithm used in compilers to perform register allocation.
• HLL programs refer to variables symbolically (e.g., a, b, sum)
• Compiler maps these symbolic variables to virtual registers
• Then it tries to assign virtual registers to physical registers in the most efficient way
• If registers run out, some variables must be "spilled" to memory (less efficient)
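A minimal sketch of the idea, using greedy coloring rather than a full Chaitin-style allocator; the interference graph and virtual-register names are invented for illustration. An edge means two virtual registers are live at the same time, so they need different physical registers.

```python
# Greedy graph-coloring register allocation sketch.
def color(graph, k):
    """Assign each node a color (physical register) 0..k-1, or 'spill'."""
    assignment = {}
    for node in graph:
        used = {assignment.get(n) for n in graph[node]}
        free = [c for c in range(k) if c not in used]
        assignment[node] = free[0] if free else "spill"
    return assignment

interference = {
    "v0": ["v1", "v2"], "v1": ["v0", "v2"], "v2": ["v0", "v1", "v3"],
    "v3": ["v2"], "v4": [],
}
print(color(interference, 3))   # three physical registers suffice here
```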

• Reason for Register Optimization


• Memory access is slow compared to registers
• Good register allocation = faster code, smaller binaries, less energy
• In RISC, since most operations are register-to-register, this becomes even more critical
Compiler-Based Register Optimization
• More Registers vs Compiler Optimization
• If you have many registers (like in some RISC CPUs), optimization becomes easier
• But even with few registers, a smart compiler can do a great job with optimization
• Studies show that beyond 32–64 registers, performance gains taper off unless your
compiler is very poor
Instruction Set Architecture
• Instruction Set Architecture (ISA) is the part of a computer architecture that
defines the interface between software and hardware. It includes:
• The set of instructions the processor can execute (e.g., arithmetic, logical, data transfer).

• Instruction formats, addressing modes, and data types.

• Registers, memory organization, and I/O mechanisms.

• Interrupts and exception handling mechanisms.

• ISA serves as the programmer’s view of the machine, defining what the
processor can do—not how it does it. It acts as a bridge between software and
the underlying hardware implementation.
RISC Architecture – Reduced Instruction Set Computer
• RISC is a computer architecture that uses a small, highly optimized set of
instructions, all designed to be executed very quickly — usually one instruction
per clock cycle.
Instruction Format – Reduced Instruction Set Computer
• RISC architectures typically use simple and fixed-length instruction formats to
facilitate fast decoding and efficient pipelining. Here's an overview of the
common formats:

• RISC instruction formats are usually of three types:


• R-Type (Register Type)
• I-Type (Immediate Type)
• J-Type (Jump Type)

• Simple, fixed instruction formats facilitate fast decoding

• Registers are used predominantly for operands


Instruction Format – Reduced Instruction Set Computer
• R-Type (Register Type)
• Used for arithmetic and logical operations.

Field Opcode rs rt rd shamt funct


Bit Length 6 5 5 5 5 6

• rs, rt : source registers

• rd : destination register
• shamt: shift amount
• funct: further specify the operation
• Example:
• ADD R1, R2, R3 ; Add contents of R2 and R3, store the result in R1
Instruction Format – Reduced Instruction Set Computer
• I-Type (Immediate Type)

• Used for data transfer, arithmetic with constants, and branching.


Field Opcode rs rt Immediate
Bit Length 6 5 5 16

• rs : Source register

• rt : destination register

• Immediate: constant value or address offset

• Example:
• ADDI R1, R2, #10 ; Add immediate value 10 to R2, store the result in R1
Instruction Format – Reduced Instruction Set Computer
• J-Type (Jump Type)

• Used for unconditional jumps.


Field Opcode Address
Bit Length 6 26

• Address : Jump target - usually combined with upper bits from the PC

• Example:
• JMP 10000 ; Jump to instruction at address 10000
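The three formats can be encoded with simple shifts and masks, assuming the field widths shown in the tables above (the ADD opcode/funct values follow the MIPS convention):

```python
# Pack the R-, I-, and J-type fields into 32-bit instruction words.
def r_type(opcode, rs, rt, rd, shamt, funct):
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def i_type(opcode, rs, rt, imm):
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def j_type(opcode, address):
    return (opcode << 26) | (address & 0x3FFFFFF)

# ADD rd=1, rs=2, rt=3 (MIPS encoding: opcode 0, funct 0x20)
word = r_type(0, 2, 3, 1, 0, 0x20)
print(f"{word:032b}")   # fixed 32-bit width is what makes decoding trivial
```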
Functional Elements – Reduced Instruction Set
Computer
1. Instruction Fetch Unit (IFU)
• Function: Fetches the next instruction from memory.
• Works with the Program Counter (PC) to determine the address of the next instruction.
• Uses instruction cache to reduce fetch time.
• First stage in the pipeline.

2. Instruction Decode Unit (IDU)


• Function: Decodes the fetched instruction into control signals.
• Identifies the operation (opcode), source and destination registers.
• Checks for operand readiness and forwards to the execution unit.
• Since instruction formats are simple and fixed, decoding is fast and efficient.
• Second stage in the pipeline.
Functional Elements – Reduced Instruction Set
Computer
3. Register File
• Function: Stores general-purpose data for computation.
• Contains a large number of registers (e.g., 32 or more).
• Most operands are read/written directly to/from registers, not memory.
• Registers are used for storing:
• Operands for ALU
• Intermediate values
• Function parameters

• Reduces memory access, speeds up computation.

4. Arithmetic and Logic Unit (ALU)


• Function: Performs arithmetic and logic operations like:
• ADD, SUB, AND, OR, XOR
• Comparisons (for branches)

• Works on data from registers, not memory.


• Third stage of the pipeline (Execute).
Functional Elements – Reduced Instruction Set
Computer
5. Control Unit
• Function: Directs the operation of the processor.
• Generates control signals based on the instruction type.
• Manages instruction flow, pipelining, branching, etc.
• Often hardwired instead of microprogrammed (unlike CISC).
• Enables fast instruction execution.

6. Load/Store Unit (Memory Access)


• Function: Handles memory access operations:
• LOAD: Read data from memory into a register
• STORE: Write data from a register to memory

• Only these instructions access memory.


• Separates memory from computation (Load/Store architecture).
Functional Elements – Reduced Instruction Set
Computer
7. Pipeline Registers
• Function: Hold intermediate data between pipeline stages.
• Allow overlapping execution of multiple instructions.
• Boosts throughput via instruction pipelining.

8. Program Counter (PC)


• Function: Holds the address of the next instruction to fetch.
• Automatically updated after each instruction.
• Can be modified by branch/jump instructions.

9. Instruction and Data Cache


• Function: Stores frequently accessed instructions and data.
• Reduces latency of memory operations.
• Helps maintain performance despite slower main memory.
• Instructions are fixed length (usually 32 bits) and simple in format.
Functional Elements – Reduced Instruction Set
Computer
Functional Element Functionality
Instruction Fetch Unit Fetches instructions
Instruction Decoder Decodes and prepares instructions
Register File Holds operands and results
ALU Performs arithmetic/logical operations
Control Unit Manages execution and pipelining
Load/Store Unit Handles memory reads/writes
Program Counter (PC) Tracks next instruction address
Pipeline Registers Buffer between pipeline stages
Instruction/Data Cache Fast access to code/data
Branch Prediction Unit Reduces branch penalties
Pipelining – Reduced Instruction Set Computer
• RISC designs are ideal for instruction pipelining, which overlaps execution stages of
different instructions. Typical RISC pipeline:
1. IF - Instruction Fetch

2. ID - Instruction Decode

3. EX – Execute

4. MEM - Memory Access (if needed)

5. WB - Write Back

• This allows RISC CPUs to execute 1 instruction per cycle after the pipeline fills up.
Execution – Reduced Instruction Set Computer
1. Fixed-Length Instructions:
• All instructions are 32-bit wide, which simplifies the fetch stage and allows easy decoding.

2. Few Addressing Modes:


• Only register and immediate addressing modes are supported, which simplifies the effective
address calculation stage in the pipeline.

3. Register-to-Register Operations:
• Operands come from fast-access registers, reducing memory access time and improving
overall execution speed.

4. Simple Control Logic:


• Because of uniform instruction formats and simple operations, the control unit can be
hardwired rather than microprogrammed, improving performance.
Advantage – Reduced Instruction Set Computer
• Simpler control logic → faster execution
• Easier to implement pipelining → higher instruction throughput
• Compiler optimization is easier
• Lower power consumption
• Better performance with fewer transistors → cost-effective

• Examples:
• ARM (used in most smartphones & embedded devices)
• MIPS
• RISC-V (open-source architecture)
• SPARC (used in servers)
• PowerPC
