
Unit 3

1. Basic Non-Pipelined CPU Architecture


A non-pipelined CPU executes instructions sequentially. One instruction must complete all its stages before the next instruction can begin. Think of it like a single worker handling one task entirely before starting the next.

Core Components:

1. Control Unit (CU): The "brain" of the CPU. It fetches instructions from memory, decodes them, and generates control signals to direct the other components (ALU, registers, memory interfaces) on what to do and when.
2. Arithmetic Logic Unit (ALU): Performs arithmetic operations (addition, subtraction, multiplication, division) and logical operations (AND, OR, NOT, XOR, shifts). It takes operands from registers and returns the result to a register.
3. Registers: Small, extremely fast storage locations within the CPU. Used to hold data, instructions, addresses, and status information temporarily during execution. Key types include:
   ○ Program Counter (PC): Holds the memory address of the next instruction to be fetched.
   ○ Instruction Register (IR): Holds the instruction currently being decoded and executed.
   ○ Memory Address Register (MAR): Holds the address of the memory location to be accessed (read from or written to).
   ○ Memory Data Register (MDR) / Memory Buffer Register (MBR): Temporarily holds data being transferred to or from memory.
   ○ General Purpose Registers (GPRs): Used to hold operands and results for ALU operations. Accessible by the programmer (via assembly language).
   ○ Status Register / Flags Register: Holds status bits (flags) indicating results of operations (e.g., Zero flag, Carry flag, Overflow flag, Negative flag).
4. Internal Buses: Pathways connecting the different components (CU, ALU, Registers) within the CPU, allowing data and control signals to travel between them.

Operation: In a non-pipelined architecture, the CPU follows the Fetch-Decode-Execute cycle strictly sequentially for each instruction. If fetching takes 1 clock cycle, decoding 1, and executing 3, a single instruction takes 5 clock cycles. The next instruction only starts fetching after the previous one has fully completed execution. This leads to underutilization of CPU components (e.g., the fetch unit is idle during decode and execute).
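A minimal cycle-counting sketch of this sequential behaviour, using the stage latencies assumed in the example above (1 + 1 + 3 cycles):

```python
# Minimal sketch: cycle count for a strictly sequential (non-pipelined) CPU.
# Stage latencies are assumptions taken from the example in the text above.
FETCH_CYCLES = 1
DECODE_CYCLES = 1
EXECUTE_CYCLES = 3

def sequential_cycles(num_instructions: int) -> int:
    """Each instruction must finish all stages before the next one starts."""
    per_instruction = FETCH_CYCLES + DECODE_CYCLES + EXECUTE_CYCLES  # 5 cycles
    return num_instructions * per_instruction

print(sequential_cycles(1))    # 5 cycles for one instruction
print(sequential_cycles(100))  # 500 cycles for one hundred instructions
```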

2. Memory Hierarchy


CPUs operate much faster than main memory (RAM). Accessing RAM for every instruction and data operand would create a massive bottleneck. The Memory Hierarchy is a structure that uses multiple levels of memory with different speeds, sizes, and costs to bridge this gap.

Levels (Closest to CPU outwards):

1. Registers: Fastest, smallest, most expensive (part of the CPU). Hold currently active data/instructions. Access time ~1 CPU cycle.
2. Cache Memory (L1, L2, L3): Small, fast Static RAM (SRAM) located closer to the CPU (often on the same chip). Stores frequently accessed data and instructions from main memory.
   ○ L1 Cache: Smallest, fastest cache, often split into an instruction cache (L1i) and a data cache (L1d). Access time ~ a few CPU cycles.
   ○ L2 Cache: Larger and slower than L1. Access time ~ 10-20 cycles.
   ○ L3 Cache: Largest and slowest cache level, often shared by multiple CPU cores. Access time ~ 30-50 cycles.
3. Main Memory (RAM): Dynamic RAM (DRAM). Much larger than cache but significantly slower. Holds the currently running operating system and application programs/data. Access time ~ 100-200+ cycles.
4. Secondary Storage (Virtual Memory/Swap Space): Hard Disk Drives (HDDs) or Solid State Drives (SSDs). Largest capacity, slowest access time, cheapest per bit. Holds data and programs not currently in RAM. Used as an extension of RAM (virtual memory). Access time ~ milliseconds.
5. Tertiary Storage (Optional): Optical disks and magnetic tapes for backups and archival. Very slow.

Principle of Locality: The memory hierarchy works efficiently because programs tend to exhibit:

● Temporal Locality: If an item (instruction or data) is accessed, it is likely to be accessed again soon (loops, reuse of variables). Caching keeps recently used items close.
● Spatial Locality: If an item is accessed, items whose addresses are close by are likely to be accessed soon (sequential code execution, array processing). Caching fetches blocks of data around the requested item.

Goal: To provide the CPU with an average memory access time close to the cache speed, while offering the large capacity of main memory and secondary storage. The sketch below shows how this average is estimated.
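A minimal sketch of the standard average-memory-access-time (AMAT) estimate for a single cache level. The hit time, miss rate, and miss penalty used here are illustrative assumptions, not figures from the text:

```python
# Minimal sketch: average memory access time (AMAT) for one cache level.
# AMAT = hit_time + miss_rate * miss_penalty
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    return hit_time + miss_rate * miss_penalty

# Assumed values: cache hit in 2 cycles, 5% miss rate, 150-cycle penalty to reach DRAM.
print(amat(hit_time=2, miss_rate=0.05, miss_penalty=150))  # 9.5 cycles on average
```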

3. I/O Techniques


Input/Output (I/O) techniques manage the communication between the CPU/Memory system and external peripheral devices (keyboard, mouse, disk drives, network interfaces, printers, etc.).

1. Programmed I/O (PIO):
   ○ Mechanism: The CPU executes specific I/O instructions. It continuously checks (polls) the status register of the I/O device until it is ready for data transfer. The CPU is directly responsible for moving data between memory/registers and the I/O device buffer (a polling sketch appears after this list).
   ○ Pros: Simple to implement.
   ○ Cons: Very inefficient. The CPU wastes significant time waiting (polling) for the slow I/O device, unable to perform other tasks. Only suitable for very simple or slow devices.
2. Interrupt-Driven I/O:
   ○ Mechanism: The CPU initiates an I/O operation and then continues executing other tasks. When the I/O device is ready (e.g., data received, operation complete), it sends an interrupt signal to the CPU. The CPU suspends its current task, saves its state, executes an Interrupt Service Routine (ISR) to handle the data transfer, restores its state, and resumes the interrupted task.
   ○ Pros: Much more efficient than PIO, as the CPU does not wait idly.
   ○ Cons: Interrupt handling introduces overhead (saving/restoring state, context switching). Still involves the CPU in the actual data transfer.
3. Direct Memory Access (DMA):
   ○ Mechanism: A dedicated hardware controller, the DMA Controller (DMAC), manages the data transfer directly between the I/O device and main memory, without involving the CPU except at the beginning (to set up the transfer: source address, destination address, data count) and at the end (the DMAC sends an interrupt when done).
   ○ Pros: Most efficient for large data transfers. Frees the CPU almost entirely during the transfer and reduces CPU overhead significantly.
   ○ Cons: Requires a dedicated DMAC. Can lead to bus contention if the DMAC and CPU need the memory bus simultaneously (cycle stealing).
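A minimal sketch contrasting the busy-wait loop of programmed I/O with a DMA-style transfer as described above. The device-access helpers (status_register, read_data, dmac_setup) are hypothetical placeholders, not a real device API:

```python
# Minimal sketch: programmed I/O polling vs. a DMA-style transfer.
# status_register(), read_data(), and dmac_setup() are hypothetical placeholders.
READY = 1

def programmed_io_read(status_register, read_data, count):
    """CPU busy-waits on the device status bit and copies each word itself."""
    buffer = []
    for _ in range(count):
        while status_register() != READY:   # CPU is stuck polling here
            pass
        buffer.append(read_data())           # CPU moves the data itself
    return buffer

def dma_read(dmac_setup, count, dest_addr):
    """CPU only programs the DMA controller, then returns to other work.
    The DMAC raises an interrupt once all `count` words have been copied."""
    dmac_setup(source="device", destination=dest_addr, length=count)
    # ... CPU executes unrelated instructions until the completion interrupt ...
```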

4. CPU Architecture Types (Based on Operand Storage)


This classification refers to how the Instruction Set Architecture (ISA) specifies the operands for ALU instructions. A side-by-side example of the same statement in each style follows the list below.

1. Accumulator Architecture:
   ○ Concept: Uses a single special register called the "accumulator" as one implicit operand for most arithmetic/logic instructions. The other operand typically comes from memory. The result is stored back in the accumulator.
   ○ Example Instruction: ADD address (meaning ACC = ACC + Memory[address])
   ○ Characteristics: Simple hardware, short instructions (one explicit address). High memory traffic as operands frequently need loading/storing. Older architecture type (e.g., early microprocessors).
2. Stack Architecture:
   ○ Concept: Operands are implicitly on the top of a processor stack. ALU operations pop operands from the stack and push the result back onto it. Requires PUSH and POP instructions to move data between memory and the stack.
   ○ Example Instruction: ADD (pops the top two values, adds them, pushes the result)
   ○ Characteristics: Can lead to very compact code ("zero-address instructions"). Stack management can be complex. Efficient for evaluating complex expressions. Used in some systems such as the Java Virtual Machine (JVM).
3. General Purpose Register (GPR) Architecture:
   ○ Concept: Uses multiple general-purpose registers to hold operands and results. The dominant modern architecture.
   ○ Sub-types:
      ■ Register-Memory: Allows ALU instructions to have one operand in a register and another in memory. ADD R1, address (meaning R1 = R1 + Memory[address]).
      ■ Register-Register (Load/Store): ALU operations only work on registers. Separate LOAD and STORE instructions are required to move data between registers and memory: LOAD R1, address1; LOAD R2, address2; ADD R3, R1, R2; STORE address3, R3.
   ○ Characteristics: Reduces memory traffic compared to accumulator/stack architectures (registers are faster). Requires more complex instruction formats (specifying multiple registers). Load/Store is the basis for most RISC architectures (such as ARM, MIPS, RISC-V).
4. Memory-Memory Architecture (less common now):
   ○ Concept: Allows ALU instructions to operate directly on operands located in main memory, potentially storing the result back to memory.
   ○ Example Instruction: ADD address1, address2, address3 (meaning Memory[address1] = Memory[address2] + Memory[address3])
   ○ Characteristics: Very high flexibility, complex instructions. Very slow due to multiple memory accesses per instruction. Not common in modern general-purpose CPUs, though some complex instructions in CISC architectures might resemble this.
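To make the contrast concrete, here is a minimal sketch of how the single statement C = A + B might be encoded under each operand-storage style, followed by a tiny interpreter for the zero-address (stack) version. The register names, addresses, and exact mnemonics are illustrative assumptions:

```python
# Minimal sketch: C = A + B in each operand-storage style, plus a tiny stack machine.
# Mnemonics, register names, and addresses below are illustrative assumptions.
#
# Accumulator:        LOAD A ; ADD B ; STORE C          (ACC is the implicit operand)
# Register-Register:  LOAD R1, A ; LOAD R2, B ; ADD R3, R1, R2 ; STORE C, R3
# Memory-Memory:      ADD C, A, B                        (Memory[C] = Memory[A] + Memory[B])
# Stack (zero-address): PUSH A ; PUSH B ; ADD ; POP C

memory = {"A": 7, "B": 35, "C": 0}
program = [("PUSH", "A"), ("PUSH", "B"), ("ADD", None), ("POP", "C")]

stack = []
for opcode, operand in program:
    if opcode == "PUSH":            # move a value from memory onto the stack
        stack.append(memory[operand])
    elif opcode == "ADD":           # implicit operands: the top two stack entries
        b, a = stack.pop(), stack.pop()
        stack.append(a + b)
    elif opcode == "POP":           # store the top of the stack back to memory
        memory[operand] = stack.pop()

print(memory["C"])  # 42
```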

5. Detailed Datapath of a Typical Register-Based (Load/Store) CPU

The datapath shows the physical connections (buses) and functional units (ALU, registers, memory interfaces) through which data flows during instruction execution. Here is a simplified view for a load/store architecture:

(Visualize blocks connected by lines/arrows representing buses)

● Instruction Fetch:
   1. The content of the PC is sent to the MAR.
   2. A Read signal is sent to the Memory Interface.
   3. The PC is incremented (usually PC = PC + 4, assuming 32-bit instructions/addresses).
   4. Memory returns the instruction via the data bus to the MDR.
   5. The instruction moves from the MDR to the IR.
● Instruction Decode:
   1. The opcode part of the instruction in the IR is sent to the Control Unit.
   2. The Control Unit decodes the instruction and generates control signals for subsequent steps.
   3. Register operands specified in the IR (e.g., source registers Rs, Rt) are used to select registers from the Register File.
● Execute (Example: ADD R_dest, R_src1, R_src2)
   1. Data from R_src1 and R_src2 in the Register File are sent to the ALU inputs (Input A, Input B).
   2. The Control Unit sends an "ADD" signal to the ALU.
   3. The ALU performs the addition.
   4. The ALU result is routed back towards the Register File.
● Execute (Example: address calculation for LOAD/STORE R_t, offset(R_s))
   1. Data from R_s in the Register File is sent to ALU Input A.
   2. The offset value (part of the instruction, possibly sign-extended) is sent to ALU Input B.
   3. The Control Unit sends an "ADD" signal to the ALU.
   4. The ALU calculates the effective memory address (Base Address + Offset).
● Memory Access (Example: LOAD R_t, address)
   1. The calculated address (from the ALU) is sent to the MAR.
   2. The Control Unit sends a Read signal to the Memory Interface.
   3. Memory returns the data via the data bus to the MDR.
● Memory Access (Example: STORE R_t, address)
   1. The calculated address (from the ALU) is sent to the MAR.
   2. Data from register R_t (specified in the IR) is read from the Register File and sent to the MDR.
   3. The Control Unit sends a Write signal to the Memory Interface.
   4. Data from the MDR is written to memory at the specified address.
● Write Back (Example: ADD R_dest, ... or LOAD R_t, ...)
   1. For ADD: the result from the ALU output is written into R_dest in the Register File.
   2. For LOAD: the data fetched from memory (now in the MDR) is written into R_t in the Register File.
   3. The Control Unit provides the correct register address (R_dest or R_t) and the Write Enable signal to the Register File.

Key Datapath Elements:

● PC -> Adder -> Mux -> PC (for incrementing the PC)
● PC -> MAR
● Memory Interface <-> MAR, MDR
● MDR -> IR
● IR -> Control Unit
● IR (register fields) -> Register File (Read/Write addresses)
● Register File (Read Ports) -> ALU Inputs (possibly via Muxes)
● IR (immediate field) -> Sign Extender -> ALU Input B (via Mux)
● ALU Output -> MAR (for address calculation)
● ALU Output -> Register File (Write Port) (for R-type results)
● MDR -> Register File (Write Port) (for Load results)
● Register File (Read Port) -> MDR (for Store data)
● Control Unit -> control signals to Muxes, ALU, Register File (Write Enable), Memory Interface (Read/Write)

A register-transfer sketch of one ADD instruction flowing through this datapath follows.
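A minimal register-transfer sketch (assuming 4-byte instructions and a toy instruction encoding, not a real ISA) that walks one ADD R_dest, R_src1, R_src2 through the fetch, decode, execute, and write-back transfers listed above:

```python
# Minimal register-transfer sketch of one ADD instruction through the datapath.
# The instruction encoding and memory contents are toy assumptions, not a real ISA.
instruction_memory = {0: ("ADD", 3, 1, 2)}      # ADD R3, R1, R2 stored at address 0
register_file = {1: 10, 2: 32, 3: 0}            # R1=10, R2=32, R3 receives the result

PC = 0

# Fetch: PC -> MAR, memory read -> MDR -> IR, PC incremented
MAR = PC
MDR = instruction_memory[MAR]
IR = MDR
PC = PC + 4                                     # assuming 4-byte instructions

# Decode: opcode goes to the Control Unit, register fields select Register File ports
opcode, r_dest, r_src1, r_src2 = IR

# Execute: Register File read ports feed the ALU inputs
alu_input_a = register_file[r_src1]
alu_input_b = register_file[r_src2]
alu_output = alu_input_a + alu_input_b          # Control Unit asserts "ADD"

# Write Back: ALU output -> Register File write port (Write Enable asserted)
register_file[r_dest] = alu_output

print(register_file[3])  # 42
```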

6. Fetch-Decode-Execute Cycle (Typically 3 to 5 Stages)

This is the fundamental cycle performed by the CPU to execute instructions.

Basic 3-Stage Cycle (a minimal interpreter sketch follows the list below):

1. Fetch:
   ○ Get the address from the PC.
   ○ Load the instruction from memory at that address into the IR.
   ○ Increment the PC to point to the next instruction.
2. Decode:
   ○ Interpret the opcode in the IR.
   ○ Identify the operands needed.
   ○ Generate control signals for the execute stage.
   ○ Fetch operands from registers if needed.
3. Execute:
   ○ Perform the operation specified by the instruction (using the ALU, accessing memory, changing the PC for jumps/branches, writing results to registers).
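A minimal interpreter-style sketch of this 3-stage loop. The small "program" and its mnemonics (LOADI, ADD, HALT) are toy assumptions used only to show the cycle's structure:

```python
# Minimal sketch of the Fetch-Decode-Execute loop for a toy instruction set.
# The mnemonics (LOADI, ADD, HALT) and the program below are assumptions for illustration.
memory = [("LOADI", "R1", 40), ("LOADI", "R2", 2), ("ADD", "R3", "R1", "R2"), ("HALT",)]
registers = {"R1": 0, "R2": 0, "R3": 0}
PC = 0

while True:
    # Fetch: read the instruction addressed by PC into IR, then advance PC
    IR = memory[PC]
    PC += 1

    # Decode: split the opcode from its operands
    opcode, *operands = IR

    # Execute: perform the operation named by the opcode
    if opcode == "LOADI":                 # load an immediate value into a register
        registers[operands[0]] = operands[1]
    elif opcode == "ADD":                 # dest = src1 + src2
        dest, src1, src2 = operands
        registers[dest] = registers[src1] + registers[src2]
    elif opcode == "HALT":
        break

print(registers["R3"])  # 42
```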

Typical 5-Stage RISC Pipeline Cycle: (this breakdown is crucial for understanding pipelining)

1. IF (Instruction Fetch): Fetch the instruction from memory using the address in the PC, store it in the IR, and increment the PC.
2. ID (Instruction Decode & Register Fetch): Decode the instruction in the IR, identify the required registers, and read operand values from the Register File. Decode immediate values. Check for hazards.
3. EX (Execute / Address Calculation):
   ○ For ALU instructions: perform the operation using the ALU on the operands fetched in ID.
   ○ For Load/Store: calculate the effective memory address using the ALU (Base + Offset).
   ○ For Branches: calculate the branch target address and evaluate the branch condition.
4. MEM (Memory Access):
   ○ For Load: read data from memory using the address calculated in EX.
   ○ For Store: write data (fetched from a register in ID) to memory using the address calculated in EX.
   ○ Other instructions usually do nothing at this stage.
5. WB (Write Back): Write the result back into the Register File.
   ○ For ALU instructions: write the result from the EX stage.
   ○ For Load instructions: write the data fetched in the MEM stage.

In a non-pipelined CPU, one instruction goes through all 5 stages before the next one starts IF.

7. Microinstruction Sequencing & Implementation of the Control Unit

The Control Unit generates the signals that control the datapath. There are two main implementation approaches:

A. Hardwired Control Unit:

● Implementation: Uses fixed, dedicated combinational logic circuits (AND, OR, NOT gates, decoders) to generate control signals based on the instruction opcode, ALU flags, and timing signals (clock).
● Operation: The opcode bits directly feed into the logic gates. The outputs of these gates are the control signals.
● Microinstruction Sequencing: Not applicable in the same way. The "sequence" is determined by the flow through the fixed logic based on the current state and instruction.
● Pros: Very fast execution speed.
● Cons: Complex to design and debug. Inflexible; modifying the instruction set requires redesigning the hardware. Difficult to implement complex instruction sets. Typically used in RISC processors.

B. Microprogrammed Control Unit:

● Implementation: Control signals are stored as sequences of "microinstructions" in a special memory called the Control Store (or Control Memory, CM), typically ROM or fast RAM.
● Components:
   ○ Control Store (CS): Holds the microprogram(s).
   ○ Microinstruction Register (µIR): Holds the current microinstruction being executed.
   ○ Microprogram Counter (µPC): Holds the address of the next microinstruction in the CS to be fetched (analogous to the main PC).
   ○ Sequencing Logic: Determines the next value for the µPC.
● Operation (a sequencing sketch follows this section):
   1. The instruction opcode from the IR is mapped to a starting address in the Control Store.
   2. The µPC is loaded with this starting address.
   3. The microinstruction at the µPC address is fetched from the CS into the µIR.
   4. The bits in the µIR directly represent the control signals needed by the datapath for that micro-step.
   5. The Sequencing Logic uses information from the µIR (next-address field), the instruction opcode, and ALU flags to calculate the address of the next microinstruction (µPC update).
   6. Repeat steps 3-5 until the end of the micro-routine for the current machine instruction.
● Microinstruction Sequencing: How the next microinstruction address is determined:
   ○ Increment: µPC = µPC + 1 (default sequential execution).
   ○ Branching: Based on ALU flags (e.g., if the Zero flag is set, jump to microinstruction X, else continue).
   ○ Dispatching: Based on the opcode of the machine instruction (used to find the start of the correct micro-routine).
   ○ Explicit Next Address: The current microinstruction contains the address of the next one.
● Pros: Flexible (changing the instruction set means rewriting the microprogram in the CS, not redesigning hardware). Easier to implement complex instruction sets (CISC). Simpler design process.
● Cons: Slower than hardwired control due to the extra memory access time for fetching microinstructions from the CS.
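A minimal sketch of the microprogrammed control loop described above: dispatch on the machine opcode to a start address, then fetch microinstructions whose bits act as the control signals. The control-store contents, dispatch table, and signal names are illustrative assumptions:

```python
# Minimal sketch of microprogrammed control sequencing.
# The control-store contents, dispatch table, and signal names are assumptions.
dispatch_table = {"ADD": 10, "LOAD": 20}         # machine opcode -> micro-routine start address

control_store = {
    10: {"signals": ["RegRead", "ALU_ADD"], "next": 11},
    11: {"signals": ["RegWrite"],           "next": None},  # end of ADD micro-routine
    20: {"signals": ["ALU_ADD_ADDR"],       "next": 21},
    21: {"signals": ["MemRead"],            "next": 22},
    22: {"signals": ["RegWrite"],           "next": None},  # end of LOAD micro-routine
}

def run_microprogram(opcode: str):
    upc = dispatch_table[opcode]                 # dispatch: opcode selects the start address
    while upc is not None:
        uir = control_store[upc]                 # fetch the microinstruction into the uIR
        print(f"uPC={upc}: assert {uir['signals']}")   # uIR bits drive the datapath
        upc = uir["next"]                        # sequencing logic: explicit next address

run_microprogram("LOAD")
```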

8. Enhancing Performance with Pipelining


Pipelining is a technique used to improve CPU throughput by overlapping the execution stages of multiple instructions. It does not make a single instruction faster, but it increases the number of instructions completed per unit of time.
● Concept: Divide instruction processing into multiple stages (like the 5-stage IF, ID, EX, MEM, WB). Insert pipeline registers between stages to hold the intermediate results and control information for an instruction as it moves down the "assembly line".
● Operation: In an ideal pipeline, a new instruction enters the first stage (IF) in every clock cycle. While instruction 1 is in ID, instruction 2 is in IF. While instruction 1 is in EX, instruction 2 is in ID and instruction 3 is in IF, and so on.
● Benefit: If there are k stages, the ideal speedup compared to a non-pipelined CPU is k times (assuming balanced stage delays and no interruptions). In the 5-stage example, after the first instruction takes 5 cycles to complete, subsequent instructions complete at a rate of one per cycle (ideally). A cycle-count sketch follows this list.
● Challenges - Pipeline Hazards: Situations that prevent the next instruction in the pipeline from executing during its designated clock cycle.
   ○ Structural Hazards: Hardware resource conflict. Two different instructions in the pipeline need the same resource (e.g., memory access) at the same time. Solved by duplicating resources (e.g., separate instruction/data caches) or stalling.
   ○ Data Hazards: An instruction depends on the result of a previous instruction that is still in the pipeline and has not yet written its result.
      ■ Read After Write (RAW - true dependence): Instruction J tries to read before instruction I writes (e.g., ADD R1, R2, R3 followed by SUB R4, R1, R5). Solved by forwarding/bypassing (routing the result directly from the ALU output/MEM stage back to the ALU input for the next instruction) or by stalling (inserting bubbles/NOPs).
      ■ Write After Read (WAR - anti dependence): Instruction J tries to write before instruction I reads. Less common in simple pipelines; handled by register renaming in more advanced CPUs.
      ■ Write After Write (WAW - output dependence): Instruction J tries to write before instruction I writes (to the same register). Handled by ensuring writes happen in order or by register renaming.
   ○ Control Hazards (Branch Hazards): Occur with branch/jump instructions. The pipeline fetches sequential instructions assuming the branch is not taken, but if the branch is taken, the fetched instructions are wrong and must be flushed. Solved by:
      ■ Stalling: Wait until the branch outcome is known.
      ■ Branch Prediction: Guess the outcome (e.g., predict not taken, predict taken, use history). If wrong, flush and fetch the correct path.
      ■ Delayed Branch: Execute one or more instructions after the branch instruction, regardless of the outcome (the compiler tries to fill this "delay slot" with useful independent instructions).
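A minimal sketch of the ideal cycle counts behind that speedup claim: a non-pipelined k-stage CPU needs n*k cycles for n instructions, while an ideal k-stage pipeline needs k + (n - 1):

```python
# Minimal sketch: ideal cycle counts and speedup for a k-stage pipeline.
# Assumes perfectly balanced stages and no hazards (the "ideal" case in the text).
def nonpipelined_cycles(n: int, k: int) -> int:
    return n * k                      # each instruction uses all k stages alone

def pipelined_cycles(n: int, k: int) -> int:
    return k + (n - 1)                # first instruction fills the pipe, then one per cycle

n, k = 100, 5
print(nonpipelined_cycles(n, k))      # 500
print(pipelined_cycles(n, k))         # 104
print(nonpipelined_cycles(n, k) / pipelined_cycles(n, k))  # ~4.8, approaching k = 5
```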
The Need for a Memory Hierarchy

Modern computer systems use a memory hierarchy to balance speed, cost, and capacity. No single memory type can offer the ideal combination of very fast, very large, and very cheap. Thus, the hierarchy is designed to optimize performance while managing cost.

Why is a Memory Hierarchy Needed?

1. Processor vs. Memory Speed Gap

● Modern CPUs are extremely fast, capable of executing billions of instructions per second.
● Main memory (RAM) is slower than the processor.
● If the CPU had to wait every time for a RAM access, performance would drastically degrade.

➤ Solution: Use faster, smaller memory (caches) closer to the CPU.

2. Cost vs. Capacity Trade-off

● Fast memory (like the SRAM used in caches) is expensive.
● Slower memory (like DRAM or hard disks) is cheaper and provides more capacity.

➤ Solution: Use small amounts of fast memory and larger amounts of slow memory.

3. Locality of Reference Principle

Programs tend to access a small portion of memory repeatedly over short periods:

● Temporal locality: If a memory location is accessed, it is likely to be accessed again soon.
● Spatial locality: If one memory location is accessed, nearby locations are likely to be accessed soon.

➤ Caches exploit this by storing recently or nearby used data to speed up access.

Locality of Reference Principle

The locality of reference principle is a key concept in computer architecture that describes how programs tend to access a relatively small portion of memory at any given time. This principle can be broken down into two types:

1. Temporal Locality: This refers to the tendency of a program to access the same memory locations repeatedly within a short time frame. For example, if a program accesses a particular variable, it is likely to access it again soon.
2. Spatial Locality: This refers to the tendency of a program to access memory locations that are close to each other. For instance, if a program accesses a certain array element, it is likely to access nearby elements shortly thereafter.

The locality of reference principle is crucial for designing efficient memory systems because it allows for the implementation of faster, smaller memory types (like cache) that can store frequently accessed data, thereby reducing the average time to access memory.
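A minimal sketch of both kinds of locality in an ordinary summation loop (the array size is an arbitrary assumption): the running total is reused on every iteration (temporal locality), and the array elements are touched at consecutive addresses (spatial locality), which is exactly the access pattern caches reward:

```python
# Minimal sketch: temporal and spatial locality in a plain summation loop.
data = list(range(10_000))   # array size is an arbitrary assumption

total = 0                    # 'total' is re-read and re-written every iteration -> temporal locality
for i in range(len(data)):
    total += data[i]         # data[0], data[1], ... are adjacent in memory -> spatial locality

print(total)
```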

Memory Hierarchy in Practice

A memory hierarchy is a structured arrangement of different types of memory that vary in speed, size, and cost. The main levels of the memory hierarchy include:

1. Cache Memory: This is the fastest type of memory, located closest to the CPU. It is used to store frequently accessed data and instructions to speed up processing. Cache memory is typically divided into levels (L1, L2, L3), with L1 being the fastest and smallest.
2. Main Memory (RAM): This is the primary storage used by the CPU to hold data and instructions that are currently in use. It is slower than cache but larger in capacity. Main memory is volatile, meaning it loses its contents when power is turned off.
3. Secondary Memory: This includes storage devices like hard drives, SSDs, and optical disks. Secondary memory is non-volatile and is used for long-term data storage. It is much slower than both cache and main memory but offers much larger storage capacity at a lower cost.

Memory Parameters

● Access Time: The time between a memory request and the delivery of the data. Cache memory has the shortest access time, followed by main memory, and then secondary memory.
● Cycle Time: The time between successive accesses.
● Cost per Bit: A measure of how much it costs to store one bit of data. Cache memory is the most expensive per bit, followed by main memory, and then secondary memory, which is the least expensive.

Main Memory

Semiconductor RAM & ROM Organization

RAM (Random Access Memory): This is a type of volatile memory that allows data to be read and written. It is organized into cells, each with a unique address. RAM can be further categorized into:

● Static RAM (SRAM): Uses bistable latching circuitry to store each bit. It is faster and more expensive than DRAM and is used for cache memory due to its speed.
● Dynamic RAM (DRAM): Stores each bit in a capacitor, which must be refreshed periodically. It is slower and less expensive than SRAM and is used for main memory.

ROM (Read-Only Memory): This is non-volatile memory that is used to store firmware or software that does not change. It is organized similarly to RAM but is typically slower and cannot be easily modified.

Memory Expansion

Memory expansion refers to the ability to increase the amount of RAM in a system. This can be done by adding more RAM modules to the motherboard, allowing for improved performance and the ability to run more applications simultaneously.

Cache Memory

Associative & Direct Mapped Cache Organizations

Cache memory can be organized in different ways to optimize performance (an address-breakdown sketch for the direct-mapped case follows this list):

1. Direct Mapped Cache: Each block of main memory maps to exactly one cache line. This is simple and fast but can lead to cache misses if multiple memory blocks map to the same cache line (known as conflict misses).
2. Associative Cache: Any block of main memory can be stored in any cache line. This flexibility reduces conflict misses but requires more complex hardware to search for data, making it slower than direct-mapped caches.
3. Set-Associative Cache: This is a compromise between direct-mapped and fully associative caches. The cache is divided into sets, and each set can hold multiple blocks. A block of memory can be placed in any line within a specific set, balancing speed and complexity.
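A minimal sketch of how a direct-mapped cache splits an address into offset, index, and tag bits. The cache geometry (64 lines of 64-byte blocks) is an assumed example, not a figure from the text:

```python
# Minimal sketch: address breakdown for a direct-mapped cache.
# Assumed geometry: 64 lines of 64-byte blocks (a 4 KiB cache).
BLOCK_SIZE = 64          # bytes per block  -> 6 offset bits
NUM_LINES = 64           # cache lines      -> 6 index bits

def split_address(addr: int):
    offset = addr % BLOCK_SIZE
    index = (addr // BLOCK_SIZE) % NUM_LINES     # which cache line the block maps to
    tag = addr // (BLOCK_SIZE * NUM_LINES)       # identifies which block occupies that line
    return tag, index, offset

# Two addresses 4 KiB apart share the same index, so they conflict in a direct-mapped cache.
print(split_address(0x1234))   # (1, 8, 52)
print(split_address(0x2234))   # (2, 8, 52)  same index 8, different tag -> conflict miss
```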

Summary Table

Component     | Speed     | Cost/bit | Volatility   | Use case
--------------|-----------|----------|--------------|--------------------
Registers     | Fastest   | Highest  | Volatile     | CPU operations
Cache (SRAM)  | Very fast | High     | Volatile     | Speed up access
RAM (DRAM)    | Moderate  | Medium   | Volatile     | Main memory
ROM           | Slow      | Medium   | Non-volatile | Firmware/boot code
HDD/SSD       | Slowest   | Low      | Non-volatile | Long-term storage
