Module 5: Processor Structure and Reduced Instruction Set
• Processor organization
• Register organization
• Instruction cycle
• Instruction pipelining
• Processor Organization for Pipelining
• Instruction Execution Characteristics
• The Use of a Large Register File
• Compiler-Based Register Optimization
• Reduced Instruction Set Architecture
Processor Functions
• A processor must perform several functions:
• Fetch instruction: Reads instructions from memory.
• Interpret instruction: Decodes the instruction to
determine the operation.
• Fetch data: Reads data from memory or I/O devices.
• Process data: Performs arithmetic or logical operations.
• Write data: Stores results in memory or sends them to an
I/O device.
• To achieve these tasks efficiently, the processor contains
registers, an ALU (Arithmetic and Logic Unit), and a Control
Unit (CU).
Register Organization
• Registers act as high-speed memory within the processor. They can be
categorized into:
1. User-Visible Registers:
• Used in assembly-level programming to minimize memory access.
• Types:
• General-purpose registers: Store operands for any operation.
• Condition code registers: Store flags like zero, carry, overflow, etc.
Register Organization
2. Control and Status Registers:
• Used by the processor and OS to manage execution.
• Common registers:
• Program Counter (PC): Stores the address of the next instruction.
• Memory Address Register (MAR): Holds the address of the memory location to be read or written.
• Memory Buffer Register (MBR): Temporarily stores data read from or written to memory.
• Program Status Word (PSW): Holds condition codes, interrupt enable/disable bits, and execution mode.
Instruction Cycle
The instruction cycle consists of fetch, decode, execute, and interrupt-handling stages:
1. Fetch: Reads the instruction from memory into the Instruction Register (IR).
2. Decode: Interprets the instruction to determine the operation and its operands.
3. Execute: Performs the required operation on the data.
4. Interrupt Handling (if needed): Saves the current state and executes the interrupt
service routine.
• If indirect addressing is used, an indirect cycle fetches the actual operand address.
• Register transfers in each sub-cycle (a toy fetch-decode-execute loop follows below):
1. Fetch Cycle:
• PC → MAR (Memory Address Register) → Address Bus; the fetched instruction passes through the MBR into the IR, and the PC is incremented.
2. Indirect Cycle:
• The address field of the instruction is used to read the effective operand address from memory.
3. Execute Cycle:
• The ALU operates on data from registers or memory.
4. Interrupt Cycle:
• The current PC is saved in memory, and the PC is loaded with the address of the interrupt routine.
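• The cycle can be pictured as a small loop; below is a minimal Python sketch of a toy accumulator machine (the 4-bit opcode / 12-bit address encoding and the opcode values are illustrative assumptions, not a real ISA).

memory = [0] * 4096          # word-addressable memory
memory[0] = 0x1005           # LOAD  [5] -> ACC
memory[1] = 0x2006           # ADD   [6] -> ACC
memory[2] = 0x3007           # STORE ACC -> [7]
memory[3] = 0x0000           # HALT
memory[5], memory[6] = 40, 2

pc, acc, running = 0, 0, True
while running:
    ir = memory[pc]                        # fetch cycle: PC -> MAR -> memory, instruction -> MBR -> IR
    pc += 1                                # PC is incremented to point at the next instruction
    opcode, addr = ir >> 12, ir & 0x0FFF   # decode: split opcode and address fields
    if opcode == 0x1:                      # execute cycle
        acc = memory[addr]                 # LOAD
    elif opcode == 0x2:
        acc += memory[addr]                # ADD
    elif opcode == 0x3:
        memory[addr] = acc                 # STORE
    else:
        running = False                    # HALT
print(memory[7])                           # prints 42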
Instruction Pipelining
Pipelining Strategy
• Instead of executing one instruction at a time, the processor breaks the execution into stages,
with multiple instructions being processed simultaneously.
• This approach improves speed by allowing a new instruction to be fetched while another is
being executed. However, instructions spend different amounts of time in each stage, and branch
instructions can force prefetched instructions to be discarded, causing delays.
Instruction Pipelining
Six-Stage Instruction Pipeline
• A typical six-stage decomposition is Fetch Instruction (FI), Decode Instruction (DI), Calculate Operands (CO), Fetch Operands (FO), Execute Instruction (EI), and Write Operand (WO).
• With six stages, multiple instructions are in different stages simultaneously. If each stage takes
an equal amount of time, total execution time is significantly reduced.
• Challenges: Memory conflicts, branch instructions, and interrupts can stall the pipeline.
Pipeline Performance
• For a k-stage pipeline executing n instructions with cycle time τ, the total time is Tk = [k + (n − 1)] × τ, compared with T1 = n × k × τ without pipelining.
• The speedup is Sk = T1 / Tk = nk / [k + (n − 1)], which approaches k as n grows (see the sketch below).
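• A short Python sketch of this formula (the function and variable names are mine, not from the text):

def pipeline_speedup(n_instructions, k_stages):
    """Ideal speedup of a k-stage pipeline over a nonpipelined processor, assuming no stalls."""
    return (n_instructions * k_stages) / (k_stages + n_instructions - 1)

# The speedup approaches the number of stages as the program gets longer:
for n in (10, 100, 1_000_000):
    print(n, round(pipeline_speedup(n, k_stages=5), 3))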
Pipeline Hazards
Hazards occur when instruction dependencies prevent continuous execution.
Three types of hazards exist:
1. Resource (Structural) Hazards
• Occur when two instructions need the same hardware resource in the same cycle.
• Example: If a memory read and an instruction fetch cannot occur simultaneously, the
pipeline must stall.
• Solution: Increase hardware resources (e.g., multiple memory ports, multiple ALUs).
Pipeline Hazards
2. Data Hazards
• Occur when an instruction depends on the result of a previous instruction that is still in the pipeline.
• Types:
• Read After Write (RAW) Hazard: A register read occurs before the previous instruction writes to it.
• Write After Read (WAR) Hazard: A write occurs before a previous instruction reads from the same location.
• Write After Write (WAW) Hazard: Two instructions write to the same location out of order.
• Example of RAW Hazard:
• ADD EAX, EBX ; EAX = EAX + EBX
• SUB ECX, EAX ; ECX = ECX - EAX (EAX is not ready)
• The pipeline must stall until EAX has been updated before it can be used (a small detection sketch follows this list).
• Solution:
• Forwarding (Bypassing): Pass data directly to dependent instructions.
• Pipeline Stalling: Delay execution until data is ready.
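• A minimal Python sketch of RAW-hazard detection; the instruction tuples (name, destination, sources) and the dependency window are simplified assumptions, not a real pipeline model.

program = [
    ("ADD EAX, EBX", "EAX", ("EAX", "EBX")),
    ("SUB ECX, EAX", "ECX", ("ECX", "EAX")),   # reads EAX written by the previous instruction
    ("MOV EDX, 5",   "EDX", ()),
]

def raw_hazards(prog, window=2):
    """Report (producer, consumer, register) pairs close enough to overlap in the pipeline."""
    hazards = []
    for i, (_, dest, _) in enumerate(prog):
        for j in range(i + 1, min(i + 1 + window, len(prog))):
            if dest in prog[j][2]:
                hazards.append((i, j, dest))
    return hazards

for producer, consumer, reg in raw_hazards(program):
    print(f"RAW hazard on {reg}: instruction {consumer} depends on instruction {producer}"
          f" -> stall or forward the result")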
Pipeline Hazards
3. Control Hazards (Branch Hazards)
• Occur when the pipeline fetches the wrong instruction after a branch (jump, if-else, loops,
etc.)
• Until the branch is resolved, the pipeline does not know which instruction to fetch next.
• Common approaches for dealing with branches (a branch-prediction sketch follows this list):
1. Multiple Streams: Replicate the front of the pipeline and fetch both possible paths of the branch.
2. Prefetch Branch Target: Fetch the next sequential instruction and the branch target in
parallel.
• Used in the IBM 360/91.
3. Loop Buffer: A small, very fast memory that holds the most recently fetched instructions; effective for short loops.
4. Branch Prediction:
• Static Prediction: Always predict taken (or not taken), or predict by opcode.
• Dynamic Prediction:
• Uses execution history to make better guesses.
• Taken/Not Taken Switch: One or more bits record whether the branch was taken previously.
• Branch History Table (BHT): A small cache that stores past branch decisions (and target addresses).
5. Delayed Branching:
• The compiler reorders instructions so that useful work fills the slot after the branch while it is resolved.
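• A minimal Python sketch of dynamic prediction using a 2-bit saturating counter per Branch History Table entry; the table size and indexing scheme are illustrative assumptions.

BHT_SIZE = 16
bht = [2] * BHT_SIZE               # 2-bit counters: 0-1 predict not taken, 2-3 predict taken

def predict(branch_pc):
    """Predict taken if this branch's counter is in a taken state."""
    return bht[branch_pc % BHT_SIZE] >= 2

def update(branch_pc, taken):
    """Move the counter toward the actual outcome, saturating at 0 and 3."""
    idx = branch_pc % BHT_SIZE
    bht[idx] = min(3, bht[idx] + 1) if taken else max(0, bht[idx] - 1)

# Example: a loop branch at address 0x40 is taken 9 times, then falls through once.
outcomes = [True] * 9 + [False]
correct = 0
for actual in outcomes:
    correct += predict(0x40) == actual
    update(0x40, actual)
print(f"{correct} of {len(outcomes)} predictions correct")   # 9 of 10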
Problem
• A pipelined processor has a clock rate of 2.5 GHz and executes a program with 1.5
million instructions. The pipeline has five stages, and instructions are issued at a
rate of one per clock cycle. Ignore penalties due to branch instructions and
out-of-sequence executions.
• a. What is the speedup of this processor for this program compared to a nonpipelined
processor?
• b. What is throughput (in MIPS) of the pipelined processor?
• Solution:
• Given:
• clock_rate_ghz = 2.5 # GHz
• instructions = 1.5e6 # 1.5 million instructions
• pipeline_stages = 5 # 5-stage pipeline
• instruction_issue_rate = 1 # One instruction per clock cycle
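• The answers follow directly from these values; the Python sketch below assumes the ideal pipeline timing Tk = [k + (n − 1)] cycles from the Pipeline Performance section.

clock_rate_hz = 2.5e9
instructions = 1.5e6
pipeline_stages = 5

# a. Speedup vs. a nonpipelined processor (which needs 5 cycles per instruction)
cycles_pipelined = pipeline_stages + (instructions - 1)     # fill time + one cycle per instruction
cycles_nonpipelined = pipeline_stages * instructions
print("speedup =", round(cycles_nonpipelined / cycles_pipelined, 2))      # ≈ 5.0

# b. Throughput: roughly one instruction per cycle once the pipeline is full
time_seconds = cycles_pipelined / clock_rate_hz
print("throughput =", round(instructions / time_seconds / 1e6), "MIPS")   # ≈ 2500 MIPS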
Problem
• A nonpipelined processor has a clock rate of 2.5 GHz and an average CPI (cycles
per instruction) of 4. An upgrade to the processor introduces a five-stage
pipeline. However, due to internal pipeline delays, such as latch delay, the clock
rate of the new processor has to be reduced to 2 GHz.
• a. What is the speedup achieved for a typical program?
• b. What is the MIPS rate for each processor?
• Solution:
• Given:
• clock_rate_non_pipelined = 2.5e9 # 2.5 GHz
• CPI_non_pipelined = 4
• clock_rate_pipelined = 2.0e9 # 2 GHz
• pipeline_stages = 5
• CPI_pipelined = 1 # Ideal pipeline CPI
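• The answers can be reproduced from the values above; the sketch assumes an ideal CPI of 1 for the pipelined processor, as given.

clock_rate_non_pipelined = 2.5e9   # Hz
cpi_non_pipelined = 4
clock_rate_pipelined = 2.0e9       # Hz (reduced by latch delays)
cpi_pipelined = 1

# a. Speedup = time per instruction (old) / time per instruction (new); the instruction count cancels.
time_old = cpi_non_pipelined / clock_rate_non_pipelined
time_new = cpi_pipelined / clock_rate_pipelined
print("speedup =", round(time_old / time_new, 2))                                      # 3.2

# b. MIPS rate = clock rate / (CPI * 10^6)
print("MIPS (nonpipelined) =", clock_rate_non_pipelined / (cpi_non_pipelined * 1e6))   # 625
print("MIPS (pipelined)    =", clock_rate_pipelined / (cpi_pipelined * 1e6))           # 2000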
Instruction Execution Characteristics
• Instruction execution characteristics are the patterns and behaviors observed when
high-level language (HLL) programs are compiled to machine-level code and executed.
These characteristics help architects understand:
• What types of instructions occur most frequently
• How operands are used
• How control flows (e.g., branches, loops)
• Which parts of programs consume the most time
1. Operations Performed:
• Studies show that the most frequently executed operations in compiled HLL programs are
assignment statements (data movement), conditional statements (IF, LOOP), and procedure
calls/returns.
2. Operands Used:
• Most operands are simple scalar variables, and the majority are local to the procedure.
3. Execution Sequencing:
• Most instructions are simple (e.g., add, load, branch).
• Procedure calls and returns are frequent and expensive.
• Branch instructions (from if, for, while) are common and affect pipeline flow.
• Efficient support for procedure calls, register use, and branch prediction is critical.
Instruction Execution Characteristics
• Example insight (from studies such as Patterson and Hennessy’s):
• With enough registers, most operations can be done register-to-register, reducing memory traffic.
• Register Windows
• Each procedure needs its own set of registers.
• Calling another procedure (or returning from one) would typically require
saving/restoring registers to/from memory — slow.
• Hence use register windows — overlapping sets of registers assigned to each procedure.
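• A toy Python model of overlapping windows (the window count, sizes, and register names are illustrative assumptions; real processors also spill and restore windows to memory when the circular buffer overflows, which this sketch ignores).

NUM_WINDOWS, LOCALS, OVERLAP = 4, 4, 4
phys = [0] * (NUM_WINDOWS * (LOCALS + OVERLAP))   # physical register file
cwp = 0                                           # current window pointer

def reg_index(kind, i):
    """Map a logical register (in/local/out) of the current window to a physical index."""
    base = cwp * (LOCALS + OVERLAP)
    if kind == "in":                      # shared with the caller's 'out' block
        return (base + i) % len(phys)
    if kind == "local":
        return (base + OVERLAP + i) % len(phys)
    if kind == "out":                     # becomes the callee's 'in' block
        return (base + OVERLAP + LOCALS + i) % len(phys)
    raise ValueError(kind)

def call():                               # a call just advances the window pointer;
    global cwp                            # no registers are copied to memory
    cwp = (cwp + 1) % NUM_WINDOWS

def ret():
    global cwp
    cwp = (cwp - 1) % NUM_WINDOWS

phys[reg_index("out", 0)] = 99            # caller places an argument in out[0]
call()
print(phys[reg_index("in", 0)])           # callee sees 99 in in[0] with no memory traffic
ret()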
Use of a Large Register File
• Global Variables
• Register windows are great for local variables, but global variables (shared across
functions) can't be held in these rotating windows.
• Solutions:
1. Assign global variables to memory locations (the traditional approach).
2. Use a fixed set of “global registers”: a small set of registers always accessible to all procedures.
• In contrast, for frequently used local variables, a large register file is faster and more efficient than a cache.
Use of a Large Register File
• Benefits of a Large Register File
• Keeping frequently used operands in registers minimizes memory traffic and avoids the
addressing overhead of a cache lookup.
Instruction Set Architecture (ISA)
• The ISA serves as the programmer’s view of the machine, defining what the
processor can do, not how it does it. It acts as a bridge between software and
the underlying hardware implementation.
RISC Architecture – Reduced Instruction Set Computer
• RISC is a computer architecture that uses a small, highly optimized set of
instructions, all designed to be executed very quickly — usually one instruction
per clock cycle.
Instruction Format – Reduced Instruction Set Computer
• RISC architectures typically use simple and fixed-length instruction formats to
facilitate fast decoding and efficient pipelining. Here's an overview of the
common formats:
• I-Type (Immediate Type)
• rs : Source register
• rt : Destination register
• immediate : A constant value encoded directly in the instruction
• Example:
• ADDI R1, R2, #10 ; Add the immediate value 10 to R2 and store the result in R1
Instruction Format – Reduced Instruction Set Computer
• J-Type (Jump Type)
• Address : Jump target address, usually combined with upper bits from the PC
• Example:
• JMP 10000 ; Jump to the instruction at address 10000
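• A Python sketch of how such fixed-length formats can be encoded and decoded; the MIPS-like field widths (opcode 6 bits, registers 5 bits, immediate 16 bits, jump address 26 bits) and the opcode value are assumptions for illustration.

def encode_i_type(opcode, rs, rt, imm):
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def encode_j_type(opcode, address):
    return (opcode << 26) | (address & 0x03FFFFFF)

def decode_i_type(word):
    return {"opcode": word >> 26, "rs": (word >> 21) & 0x1F,
            "rt": (word >> 16) & 0x1F, "imm": word & 0xFFFF}

# ADDI R1, R2, #10 with an assumed opcode of 8: rs = 2 (source), rt = 1 (destination)
word = encode_i_type(opcode=8, rs=2, rt=1, imm=10)
print(hex(word), decode_i_type(word))

• Because every instruction has the same width and field positions, decoding is just a few fixed shifts and masks, which is one reason these formats suit fast pipelined decoding.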
Functional Elements – Reduced Instruction Set Computer
1. Instruction Fetch Unit (IFU)
• Function: Fetches the next instruction from memory.
• Works with the Program Counter (PC) to determine the address of the next instruction.
• Uses instruction cache to reduce fetch time.
• First stage in the pipeline.
2. ID – Instruction Decode
• Decodes the instruction and reads the source registers.
3. EX – Execute
• Performs the ALU operation or calculates a memory address.
4. MEM – Memory Access
• Loads or stores data for load/store instructions.
5. WB – Write Back
• Writes the result back to the destination register.
• This allows RISC CPUs to execute one instruction per cycle once the pipeline is full.
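• The flow through the five stages can be visualized with a short Python sketch (no stalls or hazards are modeled; the instruction and stage names are illustrative).

STAGES = ["IF", "ID", "EX", "MEM", "WB"]
instructions = ["I1", "I2", "I3", "I4"]

for cycle in range(len(STAGES) + len(instructions) - 1):
    row = []
    for i, instr in enumerate(instructions):
        stage = cycle - i                 # instruction i enters IF at cycle i
        if 0 <= stage < len(STAGES):
            row.append(f"{instr}:{STAGES[stage]}")
    print(f"cycle {cycle + 1}: " + "  ".join(row))
# After cycle 5 the pipeline is full and one instruction completes every cycle.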
Execution – Reduced Instruction Set Computer
1. Fixed-Length Instructions:
• All instructions are 32 bits wide, which simplifies the fetch stage and allows easy decoding.
2. Load/Store Architecture:
• Only load and store instructions access memory; all other operations work on registers.
3. Register-to-Register Operations:
• Operands come from fast-access registers, reducing memory access time and improving
overall execution speed.
• Examples:
• ARM (used in most smartphones & embedded devices)
• MIPS
• RISC-V (open-source architecture)
• SPARC (used in servers)
• PowerPC