Unit 4 Notes
The Basic Processing Unit (BPU) is the core of a computer's central processing unit (CPU),
responsible for executing instructions from the program stored in memory. It handles the
basic operations necessary for instruction execution, like fetching instructions, decoding
them, and then executing them by utilizing different internal components such as the ALU,
registers, and control units.
● Registers: Small, high-speed storage locations inside the CPU. Common registers
include the Program Counter (PC), Instruction Register (IR), and Accumulator
(ACC). Registers hold data temporarily during processing.
● Control Unit (CU): Directs the operation of the processor by interpreting and
executing instructions fetched from memory. It generates control signals that direct
data movement and processing within the CPU.
● Arithmetic Logic Unit (ALU): The ALU performs arithmetic and logic operations. It
processes data by adding, subtracting, comparing, or performing bitwise operations on
the data.
● Memory Interface: This allows the BPU to access instructions and data from main
memory, such as RAM.
Example: In a simple instruction like ADD R1, R2, R3, the control unit fetches the
instruction, the ALU adds the contents of R2 and R3, and the result is stored in R1.
In summary, the BPU is the engine that drives instruction execution by coordinating the flow
of data between memory and the ALU while following instructions sequentially.
The Arithmetic Logic Unit (ALU) is a critical part of the BPU and is responsible for
executing the actual computation tasks that the processor needs to perform. It handles both
arithmetic (addition, subtraction, multiplication, division) and logical operations (AND, OR,
NOT, XOR).
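The operations listed above can be sketched as a toy dispatch function in Python. This is an illustrative model of what an ALU computes, not how the hardware is built; the operation names are generic, not tied to any real instruction set.

```python
# Toy ALU: maps an operation name to the computation it performs.
# Operation names are illustrative, not from a real ISA.
def alu(op, a, b=0):
    ops = {
        "ADD": lambda: a + b,
        "SUB": lambda: a - b,
        "AND": lambda: a & b,
        "OR":  lambda: a | b,
        "XOR": lambda: a ^ b,
        "NOT": lambda: ~a,   # bitwise complement; one-operand operation
    }
    return ops[op]()
```

For example, `alu("ADD", 2, 3)` yields 5 and `alu("AND", 6, 3)` yields 2, mirroring how the control unit selects one function of the same two inputs.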
The ALU typically works with:
● Operands: the data inputs that the ALU manipulates. They are usually
supplied by the processor's registers.
● Control Unit: the control unit signals the ALU which operation to perform, based
on the instruction being executed.
When the CPU fetches an instruction that involves an arithmetic or logical operation:
1. Instruction Fetch: The instruction is fetched from memory and loaded into the
Instruction Register (IR).
2. Instruction Decode: The control unit decodes the instruction to determine the type of
operation (arithmetic or logic) to be performed.
3. Operand Fetch: The data required for the operation (operands) is fetched from
registers or memory.
4. Execution: The ALU performs the desired operation (e.g., addition, subtraction) on
the fetched operands.
5. Result Storage: The result is stored in the appropriate register or memory location.
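The five steps above can be sketched as a loop over a toy register file. This is a simplified model assuming a hypothetical three-operand ADD/SUB instruction format, purely for illustration:

```python
def run(program, regs):
    """Execute a list of (op, dest, src1, src2) tuples, walking through
    fetch, decode, operand fetch, execute, and result storage each iteration."""
    pc = 0
    while pc < len(program):
        op, dest, s1, s2 = program[pc]             # fetch + decode
        a, b = regs[s1], regs[s2]                  # operand fetch
        result = a + b if op == "ADD" else a - b   # execute
        regs[dest] = result                        # result storage
        pc += 1
    return regs

regs = {"R1": 0, "R2": 4, "R3": 6}
run([("ADD", "R1", "R2", "R3")], regs)   # R1 = R2 + R3
```

After the run, R1 holds 10, showing the result-storage step writing back into the register file.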
The ALU is crucial because it performs the actual calculations and logical decisions that
programs need. Without the ALU, the processor wouldn't be able to perform even the most
basic functions, like adding two numbers or determining whether a condition is true or false.
ALU Design:
In modern processors, ALUs are highly optimized for performance. Some processors have
multiple ALUs to perform parallel operations. The size of the ALU (8-bit, 16-bit, 32-bit, etc.)
determines the word length it can process in a single operation.
Key design considerations:
● Performance: The speed of the ALU determines how fast the CPU can perform
operations, influencing the overall performance of the system.
● Instruction Set Architecture (ISA): The types of operations the ALU supports are
defined by the CPU’s ISA (e.g., x86, ARM). More complex ALUs can support a
wider range of operations, including floating-point arithmetic, vector operations, etc.
● Parallelism: Modern CPUs often include multiple ALUs to perform several
operations simultaneously, enhancing the performance of multi-core processors.
Instruction Execution
● Steps in Instruction Execution:
o Fetch: The instruction is fetched from memory.
o Decode: The fetched instruction is decoded to understand what actions are
required.
o Execute: The CPU executes the decoded instruction using the ALU and
registers.
o Writeback: The result of the execution is written back to a register or
memory.
● Example:
o Consider the instruction MOV R1, R2. The CPU fetches this instruction,
decodes it to understand that R2 needs to be copied to R1, and executes the
operation.
Branch Instruction
Branch instructions alter the flow of execution in a program and are critical in
controlling it. Unlike sequential execution, where instructions are executed one
after another, a branch instruction allows the program to "branch" or "jump" to a
different part of the code. It does this by modifying the program counter (PC) so
that it points to a new instruction address, either unconditionally or based on a
condition; accordingly, branch instructions are classified as conditional or
unconditional.
Multiple Bus Organization is a method of designing the internal data paths within a
computer system, particularly for its central processing unit (CPU). It improves the system's
performance by enabling multiple data transfers to occur simultaneously, which reduces the
number of clock cycles needed to complete instructions. A multiple bus architecture uses
more than one bus to transfer data simultaneously, allowing for greater data throughput and
faster performance. This is in contrast to a single bus system where all data and control
signals share a single bus.
What is a Bus?
A bus is a communication system that transfers data between components inside a computer,
such as between the CPU, memory, and input/output (I/O) devices. It consists of parallel lines
that carry data, memory addresses, and control signals (often organized as separate data,
address, and control buses).
In a single bus architecture, all components are connected to a single bus, which can lead to
bottlenecks when multiple components attempt to communicate simultaneously.
Control Sequence for Execution
The table below is the classic single-bus control sequence for an instruction such as
ADD (R3), R1 (add the memory operand whose address is in R3 to register R1):
Step Action
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R3out, MARin, Read
5. R1out, Yin, WMFC
6. MDRout, SelectY, Add, Zin
7. Zout, R1in, End
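Assuming the sequence above implements the standard textbook instruction ADD (R3), R1, it can be traced with a toy register-transfer model. The memory contents and register values below are invented for illustration; each line mirrors one step of the table:

```python
# Toy register-transfer trace of the control sequence above.
# Addresses and data values are made up; Select4 is modeled as "add 4 to PC".
mem = {100: "ADD (R3), R1", 200: 7}   # instruction word at 100, operand at 200
regs = {"PC": 100, "R1": 5, "R3": 200,
        "Y": 0, "Z": 0, "MAR": 0, "MDR": 0, "IR": ""}

regs["MAR"] = regs["PC"]; regs["Z"] = regs["PC"] + 4     # 1: PCout, MARin, Read, Select4, Add, Zin
regs["PC"] = regs["Z"]; regs["Y"] = regs["Z"]            # 2: Zout, PCin, Yin, WMFC
regs["MDR"] = mem[regs["MAR"]]                           #    (memory read completes)
regs["IR"] = regs["MDR"]                                 # 3: MDRout, IRin
regs["MAR"] = regs["R3"]                                 # 4: R3out, MARin, Read
regs["Y"] = regs["R1"]; regs["MDR"] = mem[regs["MAR"]]   # 5: R1out, Yin, WMFC
regs["Z"] = regs["Y"] + regs["MDR"]                      # 6: MDRout, SelectY, Add, Zin
regs["R1"] = regs["Z"]                                   # 7: Zout, R1in, End
```

At the end, R1 holds 5 + 7 = 12 and the PC has advanced to the next instruction.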
Why Use Multiple Bus Organization?
The main motivation behind using multiple buses is to increase the speed and efficiency of
data transfers in the CPU. By having more than one bus, different units in the CPU (such as
registers, memory, and I/O devices) can communicate with each other in parallel, leading to:
● Multiple Buses: More than one bus allows several data transfers to occur in parallel,
reducing overall instruction execution time.
● Register Transfers: Data can be transferred between different registers
simultaneously, improving the speed of operations.
● ALU and Memory Access: The ALU can fetch operands from one register, compute,
and store the result in another register, all in the same clock cycle.
A typical three-bus organization uses:
1. Bus A: Connects the registers to the inputs of the Arithmetic Logic Unit (ALU).
2. Bus B: Also connects the registers to the inputs of the ALU, providing simultaneous
access to two operands.
3. Bus C: Takes the result from the ALU and returns it to the registers or memory.
In this setup:
● Two source registers (say R1 and R2) can send data to the ALU at the same time
using Bus A and Bus B.
● The ALU computes the result and sends it back via Bus C to the destination register
(say R3).
This allows parallelism in the operations and increases the instruction throughput.
Instruction Execution in Multiple Bus Organization
Consider the execution of an instruction like:
ADD R1, R2, R3   ; R3 = R1 + R2
In a single-bus system, the operation would require multiple steps: transfer R1 to one ALU
input, then transfer R2 to the other, perform the addition, and finally transfer the result to
R3, with each transfer using the shared bus in a separate cycle.
In a multiple-bus system:
● R1 and R2 can be accessed simultaneously via separate buses (Bus A and Bus B).
● The ALU computes the sum in the same cycle and returns the result to R3 through
Bus C.
This saves clock cycles, allowing the next instruction to begin earlier.
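A back-of-the-envelope comparison makes the saving concrete. The cycle counts below are illustrative assumptions, not figures from any specific processor:

```python
def execute_cycles(n_instructions, cycles_per_add):
    """Total execute-phase cycles for n back-to-back ADDs,
    ignoring fetch/decode overlap for simplicity."""
    return n_instructions * cycles_per_add

# Assume the execute phase of ADD takes 3 bus transfers on a single bus
# (move R1, move R2, write R3) but 1 cycle with buses A, B, and C in parallel.
single_bus = execute_cycles(10, 3)   # 30 cycles for 10 additions
multi_bus  = execute_cycles(10, 1)   # 10 cycles for 10 additions
```

Under these assumptions, ten additions take 30 execute cycles on the single bus but only 10 with three buses.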
Advantages of Multiple Bus Organization
● Parallel Data Transfer: Multiple buses allow for parallel communication between
different parts of the CPU, reducing the instruction execution time.
● Higher Bandwidth: Multiple buses increase the data transfer capacity, as more data
can be moved at the same time.
● Improved Performance: The overall system performance improves because there are
fewer bottlenecks, especially when dealing with multiple instructions or data transfers
simultaneously.
Single Bus vs. Multiple Bus:
● Number of Buses: one bus shared by all components, versus two or more buses with
specific buses for different purposes.
● Data Transfer Bandwidth: limited by the single bus width, versus higher, as multiple
buses increase overall bandwidth.
● Complexity: simpler, with a single control logic, versus more complex due to the need
for coordination between multiple buses.
● Advantages:
o Reduces bus contention
o Increases data transfer rates
Challenges in Multiple Bus Organization
While multiple bus systems offer higher performance, they come with some
challenges:
● Increased Complexity: More buses require more complex control logic to prevent
bus conflicts and manage data transfers efficiently.
● Cost: The additional hardware (wires, control units) needed for multiple buses
increases the cost of the system.
● Synchronization: Managing data transfers across different buses in a synchronized
manner is challenging, especially in highly parallel systems.
Hardwired Control
● Hardwired Control is one of the methods used for designing the control unit in a
CPU. The control unit is responsible for generating the appropriate signals that
coordinate the operation of the other parts of the CPU during the execution of an
instruction. In a hardwired control system, the control signals are generated by a
combination of fixed logic circuits using gates, flip-flops, decoders, and other digital
components. Hardwired control involves designing control logic circuits to manage
the execution of instructions. These control signals are generated using a combination
of gates, flip-flops, decoders, and other circuits.
Advantages:
● Speed: control signals are produced directly by combinational logic, with no
microinstruction fetch, so hardwired control is fast.
● Suitability for small instruction sets: well matched to processors with a small,
fixed set of instructions.
Example: A control signal that directs data from the memory to the ALU could be
implemented using AND/OR gates based on the opcode.
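The idea can be sketched as pure combinational logic over opcode bits. The opcode encodings below are invented for illustration; in real hardware each signal would be a small network of gates:

```python
# Hardwired control as combinational logic: each control signal is a
# Boolean function of the opcode bits. Opcode values are hypothetical.
LOAD, STORE, ADD = 0b00, 0b01, 0b10   # invented 2-bit opcodes

def control_signals(opcode):
    b1 = (opcode >> 1) & 1   # high opcode bit
    b0 = opcode & 1          # low opcode bit
    return {
        "mem_read":  bool((not b1) and (not b0)),  # only LOAD reads memory
        "mem_write": bool((not b1) and b0),        # only STORE writes memory
        "alu_add":   bool(b1 and not b0),          # only ADD drives the adder
    }
```

Asserting `mem_read` for LOAD is literally "NOT b1 AND NOT b0", i.e. two inverters feeding an AND gate, which is what "implemented using AND/OR gates based on the opcode" means in practice.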
1. Instruction Fetch:
o The instruction is fetched from memory and loaded into the instruction
register.
2. Instruction Decode:
o The fetched instruction is decoded to determine the operation to be performed
and the operands involved.
3. Generate Control Signals:
o Based on the decoded instruction, the control unit generates a set of control
signals to activate different components of the CPU, such as the ALU,
registers, memory, etc.
4. Execution:
o The control signals guide the execution of the instruction, and the control unit
moves through various states (like fetching operands, performing the
operation, and storing the result).
ADD R1, R2, R3 ; Add contents of R2 and R3, and store the result in R1
1. Instruction Fetch:
o Fetch the instruction from memory into the instruction register.
o Control Signals: Memory Read, Load Instruction Register.
2. Instruction Decode:
o Decode the instruction and determine it is an ADD operation.
o Control Signals: Decode Instruction.
3. Execution:
o Transfer the values from registers R2 and R3 to the inputs of the ALU.
o Control Signals: Select ALU operation (ADD), Enable register R2 and R3,
Store result in register R1.
4. Store the Result:
o The ALU computes the sum and stores the result in R1.
o Control Signals: Write back to register R1.
This whole process is controlled by hardwired circuits that generate the correct signals at
each step.
Finite State Machine (FSM) Model
The hardwired control unit can be visualized as a Finite State Machine (FSM). The control
unit moves from one state to another based on input signals, producing output control signals
in each state.
● States: Each state represents a phase of instruction execution (fetch, decode, execute,
etc.).
● Transitions: The FSM transitions between states based on inputs such as the opcode
and clock pulses.
● Outputs: In each state, the FSM generates specific control signals required for that
phase of instruction execution.
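The FSM view above can be sketched as a table of states, each emitting control signals and naming its successor. The states and signal names are simplified and illustrative:

```python
# FSM model of a hardwired control unit: each state emits a set of
# control signals and transitions to the next state on the clock.
FSM = {
    "FETCH":     {"signals": ["PCout", "MARin", "Read"], "next": "DECODE"},
    "DECODE":    {"signals": ["MDRout", "IRin"],         "next": "EXECUTE"},
    "EXECUTE":   {"signals": ["Add", "Zin"],             "next": "WRITEBACK"},
    "WRITEBACK": {"signals": ["Zout", "R1in"],           "next": "FETCH"},
}

def step(state):
    """One clock tick: return this state's control signals and the next state."""
    return FSM[state]["signals"], FSM[state]["next"]
```

Stepping repeatedly from FETCH cycles through all four phases and returns to FETCH, mirroring how the hardwired FSM repeats the instruction cycle.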
Hardwired Control vs. Microprogrammed Control:
● Speed: hardwired control is generally faster due to direct signal generation;
microprogrammed control is slower, as control signals are fetched from memory.
● Flexibility: hardwired control is rigid and difficult to modify; microprogrammed
control is flexible and can add new instructions easily.
● Cost: hardwired control is more costly in terms of hardware; microprogrammed
control has a lower cost in terms of control unit design.
● Application: hardwired control is suitable for simple and fast processors;
microprogrammed control is suitable for complex processors with larger instruction sets.
● Simple and Fast Processors: Hardwired control is used in processors where speed is
a critical requirement, such as in embedded systems, where a small set of instructions
is used.
● Specialized Processors: Used in CPUs designed for specific tasks, where the
instruction set is small and well-defined.
● Overview: Control signals direct the CPU to perform specific tasks, such as fetching
data from memory, sending data to the ALU, or writing data to registers. These
signals are generated based on the instruction's opcode.
● Components of Control Signals:
o Instruction Register (IR): Holds the current instruction.
o Control Logic: Generates signals for memory, I/O devices, and the ALU.
Microprogrammed Control
In a microprogrammed control unit, the control signals for each step of instruction
execution are stored as microinstructions in a control memory and fetched one at a time,
rather than being generated directly by fixed logic circuits. This makes the control unit
slower than a hardwired one but much easier to modify or extend with new instructions.
Pipelining:
● Overview: Pipelining is a technique used to improve the performance of the CPU by
overlapping the execution of multiple instructions. Instructions are broken into
smaller stages, and each stage is processed simultaneously in different parts of the
CPU.
● Stages of Pipelining:
o Fetch
o Decode
o Execute
o Memory Access
o Writeback
● Example: While instruction 1 is in the decode stage, instruction 2 can be fetched.
This allows multiple instructions to be in different stages of execution simultaneously.
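The overlap can be visualized by computing which stage each instruction occupies in each cycle, assuming an idealized pipeline with no stalls:

```python
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Writeback"]

def stage_of(instr, cycle):
    """Stage that instruction `instr` (0-based issue order) occupies at
    `cycle` (0-based), or None if it is not in the pipeline that cycle."""
    s = cycle - instr
    return STAGES[s] if 0 <= s < len(STAGES) else None

# In cycle 1, instruction 0 is decoding while instruction 1 is being fetched.
assert stage_of(0, 1) == "Decode"
assert stage_of(1, 1) == "Fetch"
```

Each instruction simply lags its predecessor by one cycle, so in any cycle up to five instructions occupy the five stages simultaneously.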
Performance in Pipelining
● Performance Enhancement:
o Throughput: Pipelining increases the number of instructions that can be
processed per unit of time.
o Latency: Each individual instruction still takes about the same amount of
time (or slightly longer, due to stage overhead), but the overall throughput is
improved.
● Example: If a CPU completes one pipeline stage per clock cycle, a 5-stage
pipeline can finish 5 instructions in 9 cycles (5 cycles to fill the pipeline, then
one instruction completing per cycle), rather than the 25 cycles needed without
pipelining.
Hazards in Pipelining
In a pipelined processor, hazards are conditions that disrupt the smooth execution of
instructions and cause performance degradation. There are three main types of hazards:
1. Data Hazards
Data hazards occur when instructions that are executed in sequence depend on each other's
data. In a pipelined architecture, multiple instructions are processed concurrently, leading to
cases where an instruction requires data that is still being processed by a preceding
instruction.
Influence on Instruction Set Design and Performance: Data hazards necessitate careful
instruction scheduling. Processor designers must ensure that operand dependencies are
minimized or handled without stalling the pipeline. Techniques like operand forwarding
(bypassing) and pipeline stalls (bubbles) are used to mitigate data hazards.
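A toy RAW-hazard check makes the dependency explicit. The three-operand tuple format below is an illustrative convention, not a real instruction encoding:

```python
def raw_hazard(producer, consumer):
    """True if `consumer` reads a register that `producer` writes
    (a read-after-write dependency). Tuples: (op, dest, src1, src2)."""
    _, dest, *_ = producer
    _, _, s1, s2 = consumer
    return dest in (s1, s2)

i1 = ("ADD", "R1", "R2", "R3")   # writes R1
i2 = ("SUB", "R4", "R1", "R5")   # reads R1: RAW hazard with i1
```

When `raw_hazard(i1, i2)` is true, the hardware either forwards R1 from the ALU output of i1 directly to i2 (bypassing) or inserts bubbles until R1 is written back.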
Mitigation Techniques:
● Operand Forwarding (Bypassing): the result of an instruction is routed directly
from the ALU output to a dependent instruction's input, without waiting for the
writeback stage.
● Pipeline Stalls (Bubbles): the dependent instruction is delayed until the required
data is available.
2. Control Hazards
Control hazards, or branch hazards, arise when the outcome of a branch instruction
(conditional or unconditional) is unknown. The pipeline may need to fetch the next
instruction without knowing if the branch will be taken or not.
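One standard mitigation is branch prediction, commonly sketched in textbooks as a 2-bit saturating counter per branch. The version below is a simplified illustration of that scheme:

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not-taken,
    states 2-3 predict taken. One misprediction does not flip a
    strongly-held prediction; two in a row do."""
    def __init__(self):
        self.counter = 2   # start at weakly taken

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)
```

The pipeline fetches along the predicted path; on a misprediction the wrongly fetched instructions are flushed and the counter is updated toward the actual outcome.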
Mitigation Techniques:
● Branch Prediction: the processor guesses whether the branch will be taken and
speculatively fetches along the predicted path, flushing the pipeline on a
misprediction.
● Pipeline Stalls/Flushes: fetching is paused, or wrongly fetched instructions are
discarded, until the branch outcome is known.
3. Structural Hazards
Structural hazards occur when hardware resources required by the pipeline stages are
insufficient to support concurrent execution of multiple instructions. For example, a conflict
might arise if multiple stages of the pipeline need to access the memory at the same time.
Influence on Instruction Set Design and Performance: Structural hazards can significantly
impact performance if the pipeline design doesn’t provide sufficient resources (e.g., multiple
ALUs, memory access ports). To address this, instruction set design must account for
resource allocation and concurrency within the pipeline.
Mitigation Techniques:
● Multiple Functional Units: Adding more execution units like additional ALUs or
floating-point units helps mitigate conflicts over shared resources.
● Memory Interleaving: This technique increases memory bandwidth, allowing
multiple memory access requests to be handled concurrently.
● Out-of-order Execution: Helps alleviate the pressure on a single resource by
allowing instructions that don’t need that resource to execute first.
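Memory interleaving can be sketched as mapping consecutive addresses to different banks, so a burst of accesses proceeds in parallel. The bank count of four is an assumption for illustration:

```python
def bank_of(addr, n_banks=4):
    """Low-order interleaving: consecutive word addresses map to
    consecutive banks, so back-to-back accesses hit different banks
    and can be serviced concurrently."""
    return addr % n_banks

# Four consecutive addresses land in four different banks.
accesses = [bank_of(a) for a in range(100, 104)]
```

With four banks, a stream of sequential accesses never waits on a busy bank, which is exactly the increased memory bandwidth the bullet above describes.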
Pipelining introduces complexity into the instruction set architecture (ISA) by requiring
considerations for dependencies, branching, and resource conflicts. To optimize pipelining,
instruction sets may favor fixed-length instructions, simple addressing modes, and uniform
instruction formats, so that each pipeline stage performs a predictable amount of work per cycle.
By addressing these hazards through techniques like forwarding, branch prediction, and
multiple functional units, processors can minimize pipeline stalls, thereby improving system
performance, throughput, and resource utilization. Efficient pipelining ensures a higher
number of instructions executed per cycle (IPC), maximizing the benefits of parallelism
inherent in the architecture.