Computer Architecture

Computer architecture is the design and structure of a computer's hardware and its components, including the CPU, memory, and I/O devices, and how they communicate. It involves functional units like the input unit, CPU, memory unit, and buses that facilitate data transfer. The document outlines the operational concepts of executing instructions, the types of memory, and the bus structures used in computer systems.

UNIT-I

Computer Architecture refers to how the parts of a computer system are organized and how
they work together to process information. Think of it like the blueprint or design of a house, but
for computers.

Simple Definition

Computer architecture is the design and structure of a computer's hardware and system
components, such as the processor (CPU), memory (RAM), input/output devices, and how they
communicate with each other.

Real-Time Application Example

1.​ Smartphones: When you use a smartphone to open an app, the phone's processor
(CPU) follows a set of instructions to fetch the app from storage, load it into memory
(RAM), and display it on your screen. The architecture of the phone ensures that all
these parts work together efficiently and quickly.

Functional Units

1. Input Unit

●​ What it does: It takes input from devices like a keyboard, mouse, microphone, or
sensors and sends it to the computer for processing.
●​ Example: Typing on a keyboard or clicking with a mouse.
●​ Real-World Analogy: Like a receptionist who receives documents and hands them to
the processing department.

2. Central Processing Unit (CPU)

The CPU is the brain of the computer. It has three main parts:

a) Control Unit (CU)

●​ What it does: It acts like a manager, directing other parts of the computer to do their
jobs.
●​ Example: Tells the processor to fetch instructions, decode them, and execute them.
●​ Real-World Analogy: A manager in a factory giving instructions to workers.

b) Arithmetic Logic Unit (ALU)

●​ What it does: It performs calculations (addition, subtraction, etc.) and logic operations
(like comparing numbers).
●​ Example: Solving a math problem.
●​ Real-World Analogy: A worker in a factory who does calculations or checks product
quality.

3. Memory Unit

●​ What it does: It stores data and instructions. There are two types:
○​ RAM (Random Access Memory) – Temporary memory used while the computer
is on.
○​ ROM (Read-Only Memory) – Permanent memory that stores important
instructions.
●​ Example: When you open an app, it is loaded into RAM so you can use it quickly.
●​ Real-World Analogy: Like a bookshelf where books (data) are stored for workers to read.

Types of Memory

1. Registers (Fastest but Smallest)

●​ What are they? Registers are tiny storage locations inside the CPU. They hold small
amounts of data that the CPU is currently working on.
●​ Speed: Fastest type of memory.
●​ Size: Very small (a few bytes to kilobytes).
●​ Example: When you do math on a calculator, the numbers you're currently typing are
stored in registers until the final result is displayed.
●​ Real-World Analogy: Imagine a chef cooking in a kitchen. The chef keeps essential
ingredients (like salt, pepper, or spices) right on the counter. These are like registers —
small, quick to access, but can only hold a few items.

Key Point:

●​ Directly accessible by the CPU.


●​ Holds data like instructions, counters, or temporary results.
●​ Example in use: When adding 5 + 3, the numbers 5 and 3 are held in registers
temporarily until the CPU calculates 8.

💾 2. Primary Memory (RAM & ROM - Working Memory)


Primary memory is the working memory that the computer uses while running programs. It
holds data that the CPU actively needs. There are two types of primary memory: RAM and
ROM.

📜 (a) RAM (Random Access Memory)


●​ What is it? RAM is temporary, fast memory that stores data and programs while the
computer is running. Once you turn off the computer, everything in RAM is erased.
●​ Speed: Slower than registers, but faster than secondary memory.
●​ Size: Bigger than registers but smaller than hard drives (usually a few GBs).
●​ Example: When you open a game or a web browser, it is loaded into RAM so that it can
run smoothly.
●​ Real-World Analogy: Imagine a student doing homework on a table. The table is RAM
— it temporarily holds books, notebooks, and pens that the student is currently using.
When the student finishes, they put the items back in a storage cabinet (secondary
memory).

📘 (b) ROM (Read-Only Memory)


●​ What is it? ROM is permanent memory that holds important instructions for the
computer, like how to start (boot) the system. It cannot be changed or erased easily.
●​ Speed: Similar to RAM, but it cannot be modified once written.
●​ Size: Small, typically only a few MBs.
●​ Example: When you turn on your computer, ROM tells it how to start up.
●​ Real-World Analogy: ROM is like the recipe book that tells the chef (CPU) how to
prepare a specific dish. It’s always there and never changes.

Key Point:

●​ RAM: Temporary, holds running programs, erased after power-off.


●​ ROM: Permanent, holds system instructions (like startup code), not erased.

💽 3. Secondary Memory (Long-Term Storage)


●​ What is it? Secondary memory is the permanent storage where you store files,
documents, videos, and apps. It keeps data even when the computer is turned off.
●​ Speed: Slowest of the three types of memory.
●​ Size: Very large (from GBs to TBs or more).
●​ Example: Hard drives (HDDs), Solid State Drives (SSDs), USB drives, and SD cards.
●​ Real-World Analogy: This is like a storage cabinet where you store all your old
schoolwork, books, or files. It takes more time to find something, but it can hold a lot
more stuff than a kitchen counter or a work table.

Key Point:

●​ Secondary memory holds data permanently.


●​ Examples: Hard drives (HDD), Solid State Drives (SSD), USB drives, memory cards.
●​ Even if you turn off your computer, files like photos, movies, and apps are still there.
Basic Operational Concepts:

The primary function of a computer system is to execute a program, which is a sequence of instructions. All of these instructions are stored in the computer's memory.

Steps to execute any instruction are:

1️⃣ Fetch — Get the instruction from memory.


2️⃣ Decode — Understand what the instruction means.
3️⃣ Execute — Perform the action (like add numbers or show something on the screen).
4️⃣ Store — Save the result for later use (optional, but important).

Step 1: FETCH (Get the Instruction)

●​ What happens? The CPU takes the instruction (like "2 + 3") from memory and brings it
into the CPU.
●​ How it works? The Address Bus points to the location in memory where the instruction
is stored, and the instruction is transferred to the CPU through the Data Bus.
●​ Real-Life Example:
○​ Imagine you’re a student doing homework. The first step is to open your book to
get the question.
○​ For example, the question might be: "Add 2 + 3".

📜 Step 2: DECODE (Understand the Instruction)


●​ What happens? The CPU figures out what the instruction means. If the instruction is
"add 2 + 3," it understands that it needs to add the numbers.
●​ How it works? The Control Unit (CU) reads the instruction and tells the ALU
(Arithmetic Logic Unit) what operation to perform (like addition, subtraction, etc.).
●​ Real-Life Example:
○​ After you open the book and see the question "Add 2 + 3", you read and
understand that "add" means you need to combine the two numbers.

⚙️ Step 3: EXECUTE (Do the Action)


●​ What happens? The CPU does the actual work. If the instruction says "add 2 + 3," the
ALU (Arithmetic Logic Unit) performs the addition and gets the result 5.
●​ How it works? The ALU performs operations like addition, subtraction, comparisons, or
logical decisions.
●​ Real-Life Example:
○​ After you understand the question, you now do the math. You take 2, add 3 to it,
and you get the answer 5.

💾 Step 4: STORE (Save the Result)


●​ What happens? The result of the operation (like the number 5) is stored in memory or
in a register (a small storage space inside the CPU) so it can be used later.
●​ How it works? The result is saved so it can be displayed on the screen, used for further
calculations, or stored for later use.
●​ Real-Life Example:
○​ After you solve 2 + 3 = 5, you write down the answer in your notebook so you
don’t forget it.

Example of 4 Steps Using a Calculator App


Imagine you open a calculator app and type 2 + 3.​
Here’s how the CPU follows the 4-step process:

Step by step (what happens in the CPU, with the real-life example):

1. FETCH: CPU gets the instruction "2 + 3" from memory. (Student finds the math question in their book.)

2. DECODE: CPU understands that it needs to "add" 2 and 3. (Student reads and understands the math problem.)

3. EXECUTE: CPU's ALU adds 2 and 3 to get 5. (Student solves the math problem to get 5.)

4. STORE: CPU stores the result (5) in memory and shows it on the screen. (Student writes the answer (5) in their notebook.)
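As a rough sketch, the four steps above can be modeled in a few lines of Python. The memory layout, opcode name, and register name here are invented for illustration; this is a toy model, not a real CPU design:

```python
# Toy model of the fetch-decode-execute-store cycle for "2 + 3".

memory = {0: ("ADD", 2, 3)}   # the instruction stored at address 0
registers = {}                # small, fast storage inside the CPU

def run_cycle(address):
    # 1. FETCH: bring the instruction from memory into the CPU
    opcode, a, b = memory[address]
    # 2. DECODE: work out which operation the opcode asks for
    if opcode == "ADD":
        # 3. EXECUTE: the ALU performs the addition
        result = a + b
    else:
        raise ValueError("unknown opcode")
    # 4. STORE: keep the result in a register for later use
    registers["R0"] = result
    return result

print(run_cycle(0))  # prints 5
```

Running the cycle once fetches the "ADD 2 3" instruction, decodes it, executes the addition, and stores 5 in register R0.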

The Instruction Code is a small set of binary (0s and 1s) instructions that tells the CPU what to
do.​
Each instruction is like a secret code that only the CPU understands.

For example, a simple instruction like "ADD 2 + 3" might look like this in binary:

Instruction Code: 11010010


The instruction code consists of 2 parts:

Opcode | Operand

The CPU breaks this code into 2 parts:

●​ Operation Code (Opcode): What should be done (like ADD, SUBTRACT, MULTIPLY,
etc.).
●​ Operands: The data to work with (like the 2 and 3 in "2 + 3").
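The split can be shown with a couple of bitwise operations. The 3-bit opcode / 5-bit operand layout below is an assumed example format (the document does not specify the field widths of 11010010):

```python
# Splitting an 8-bit instruction code into opcode and operand fields.
# The 3-bit opcode / 5-bit operand layout is an assumed example format.

instruction = 0b11010010          # the example code 11010010

opcode  = instruction >> 5        # top 3 bits: what should be done
operand = instruction & 0b11111   # low 5 bits: the data to work with

print(bin(opcode), bin(operand))  # prints 0b110 0b10010
```

Real instruction sets (x86, ARM, RISC-V) each define their own field widths, but the idea of carving an instruction into opcode and operand bits is the same.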

Bus Structure:

A bus in computer architecture is like a highway inside the computer. It helps different parts of
the computer, like the CPU, memory, and input/output (I/O) devices, communicate with each
other. Without a bus, these components would not be able to share information.

What is a Bus?
A bus is a collection of wires, paths, or channels used to transfer data, instructions, and
control signals between different parts of a computer.

Types of Buses in a Computer


There are three main types of buses in a computer system:

1.​ 🏠 Address Bus — Carries addresses (locations) where data should be sent.
2.​ 📦 Data Bus — Carries actual data (like numbers, text, or images).
3.​ 🚦 Control Bus — Carries control signals (like "Read", "Write", or "Stop").

🏠 1. Address Bus (Where Should the Data Go?)


The Address Bus is like a Google Maps for the CPU. It tells the computer where the data
should be sent or retrieved from.

●​ Purpose: It identifies the location (address) in memory or input/output devices where data should be sent or fetched.
●​ Who Uses It? The CPU uses the Address Bus to tell memory or I/O devices where to
send or receive data.
●​ Unidirectional: It only goes in one direction — from the CPU to memory or I/O devices.

📦 2. Data Bus (What Data Should Be Sent?)


The Data Bus is like a delivery truck that carries the actual data (like letters, numbers, or
images) between components.

●​ Purpose: It carries the actual data, instructions, and information between the CPU,
memory, and input/output devices.
●​ Who Uses It? The CPU, memory, and input/output devices use the Data Bus to send
and receive actual data.
●​ Bidirectional: It goes in both directions — from CPU to memory and from memory
back to the CPU.

🚦 3. Control Bus (How Should We Send Data?)


The Control Bus is like a traffic light that tells everything on the road when to start, stop, or
change directions.

●​ Purpose: It controls when and how the CPU, memory, and I/O devices communicate. It
sends control signals like read, write, halt, and reset.
●​ Who Uses It? The Control Unit (part of the CPU) sends control signals to the other
components via the Control Bus.
●​ Bidirectional: The control signals go both ways between the CPU, memory, and
input/output devices.

How the Address Bus, Data Bus, and Control Bus Work
Together
When you open a file or run an app, the CPU uses all three buses together to complete the
task.​
Here’s how it works step-by-step when you press a key on the keyboard (like typing "A"):

1️⃣ CPU Sends Address:

●​ The CPU uses the Address Bus to tell the memory, "I need the data stored at location 1234."
●​ 🏠 Address Bus: Location = 1234

2️⃣ Control Signal is Sent:

●​ The CPU sends a "READ" command using the Control Bus to tell the memory, "Read the data at address 1234."
●​ 🚦 Control Bus: Command = READ

3️⃣ Data is Sent:

●​ The memory sends the requested data (like the letter "A") back to the CPU using the Data Bus.
●​ 📦 Data Bus: Data = A

This 3-step process happens in just a fraction of a second!
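The three-step read can be sketched as a toy transaction. The variable names stand in for the three buses; this is an illustration, not a hardware model:

```python
# Toy model of a memory read using the three buses.

memory = {0x1234: "A"}           # the letter "A" stored at address 0x1234

def read_transaction(address):
    address_bus = address        # 1) CPU puts the location on the Address Bus
    control_bus = "READ"         # 2) CPU sends the READ command on the Control Bus
    if control_bus == "READ":
        data_bus = memory[address_bus]
    else:
        data_bus = None
    return data_bus              # 3) memory returns the data on the Data Bus

print(read_transaction(0x1234))  # prints A
```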

🔥 Why Are Buses Important?


1.​ Speed: Buses allow components to send and receive data faster than older connection
methods.
2.​ Communication: Without buses, the CPU, memory, and I/O devices would not be able
to talk to each other.
3.​ Efficiency: The buses make data transfers fast and organized, just like a busy highway
with traffic lights to control the flow.

●​ The Address Bus (🏠) goes from the CPU to memory and I/O to specify where data should be read or written.
●​ The Data Bus (📦) carries the actual data back and forth between the CPU, memory, and I/O devices.
●​ The Control Bus (🚦) tells components what to do (like READ or WRITE) and ensures smooth communication.

🚀 Types of Bus Structures


There are three major types of bus structures used in computer architecture:

1.​ Single Bus Structure
2.​ Multiple Bus Structure
3.​ Double Bus Structure

📦 1. Single Bus Structure


A single bus structure means that all components (CPU, memory, and I/O devices) are
connected to one shared bus. It’s like a single highway where all cars (data) must use the
same road.

+-----------+      +------------+      +-------------+
|    CPU    |      |   MEMORY   |      | I/O DEVICES |
+-----------+      +------------+      +-------------+
      ⬆️⬇️               ⬆️⬇️                ⬆️⬇️
+---------------------------------------------------+
|              🚌 SINGLE BUS 🚌                      |
|      Address Bus | Data Bus | Control Bus          |
+---------------------------------------------------+

Explanation of the Single Bus Structure

●​ How It Works:
○​ The CPU, memory, and I/O devices are connected to a single, shared bus.
○​ When one device (like the CPU) wants to send data, all other devices must wait
until the bus is free.
○​ The Address Bus, Data Bus, and Control Bus are all combined into a single,
shared bus.
●​ Advantages:
○​ Simple design — Easy to design and cost-effective.
○​ Low cost — Fewer components and simpler connections.
●​ Disadvantages:
○​ Slow speed — Since only one device can send data at a time, others have to
wait.
○​ Data collision — Multiple devices may try to send data at the same time,
causing delays.
●​ Where Used?
○​ Old computers, simple microcontrollers, and simple embedded systems.
📦 2. Multiple Bus Structure
A multiple bus structure means that the CPU, memory, and I/O devices have separate buses
for each type of communication. Instead of sharing a single highway, they have dedicated
roads for each task.

+-----------+      +------------+      +-------------+
|    CPU    |      |   MEMORY   |      | I/O DEVICES |
+-----------+      +------------+      +-------------+
 🚌 CPU BUS 🚌   🚌 MEMORY BUS 🚌    🚌 I/O BUS 🚌

(each bus carries its own Address, Data, and Control lines)

📘 Explanation of the Multiple Bus Structure


●​ How It Works:
○​ The CPU, memory, and I/O devices have their own buses.
○​ This allows parallel communication — data can be sent from CPU to memory,
and I/O devices can send data to memory at the same time.
○​ The bus structure is divided into multiple buses, such as the CPU bus, memory
bus, and I/O bus.
●​ Advantages:
○​ Faster speed — Devices can communicate simultaneously.
○​ Reduced congestion — Data does not collide because multiple paths are
available.
●​ Disadvantages:
○​ More complex design — It is more expensive to design multiple buses.
○​ Higher cost — More buses, more wiring, and more complexity.
●​ Where Used?
○​ Modern computers (like laptops and desktops), advanced microprocessors, and
systems with heavy I/O usage.

Double Bus Structure


A Double Bus Structure is a more specific type of bus structure where there are two main
buses for connecting different components. It is commonly used in CPU design (like RISC
processors) to make CPU operations faster.

        +-------------+
        |     CPU     |
        +-------------+
         ⬇️         ⬇️
    🚌 Bus 1     🚌 Bus 2
         ⬇️         ⬇️
  +------------+  +------------+
  |   Memory   |  | I/O Device |
  +------------+  +------------+

📘 Explanation of Double Bus Structure


●​ In a Double Bus Structure, the system uses two buses instead of one.
●​ The buses are typically:
○​ Data Bus — Handles actual data.
○​ Address/Control Bus — Handles addresses and control signals.
●​ Some double-bus systems have a separate bus for instruction fetch and a separate
bus for data transfer.
○​ This is common in RISC processors, where instructions and data are handled on
separate buses to allow faster execution.
○​ While one bus fetches an instruction, the other can fetch the next data needed for
that instruction.

📘 Where It’s Used?


●​ RISC (Reduced Instruction Set Computing) processors use double-bus structures.
●​ Used in systems where the CPU needs to process instructions quickly by allowing
parallel execution of instruction fetching and data fetching.

Multiple Bus Structure is like having a highway with multiple lanes. Each lane is for different
vehicles (like trucks, cars, and buses) going to different locations.
Double Bus Structure is like having two roads — one for delivery trucks (data) and one for
postal vans (instructions) so they don’t block each other.

Software Performance:

Software refers to the collection of programs, applications, and operating systems that tell a
computer or device what to do. Unlike hardware (the physical parts of a computer), software is
the set of instructions that makes the hardware work.

📘 Different Types of Software in Computer Architecture (Simple Explanation)
In computer architecture, software refers to the programs and instructions that tell the hardware
(like the CPU, memory, and input/output devices) what to do. Software acts as a bridge between
the user and the hardware.

There are 3 main types of software in computer architecture:

1.​ System Software


2.​ Application Software
3.​ Programming Software

🔹 1. System Software
System software controls the hardware and provides a platform for other software to run. It
acts like a manager for the entire system.

📌 What it Does:
●​ It controls and manages the hardware (like CPU, memory, and devices).
●​ It provides a platform to run other software (like apps).

📌 Types of System Software:


1.​ Operating System (OS): Controls the whole computer system.
2.​ Utility Programs: Help in system maintenance (like antivirus, disk cleanup, etc.).

🔹 2. Application Software
Application software is designed to help users perform specific tasks like writing, drawing, or
playing games. It is what you use directly.

📌 What it Does:
●​ It allows users to do tasks like typing, editing, calculating, drawing, and playing games.
●​ It runs on top of system software (like Windows or Android).

📌 Types of Application Software:


1.​ Productivity Software: Helps with work (like Microsoft Word for writing).
2.​ Media Software: Used for editing images, audio, or video (like Photoshop).
3.​ Entertainment Software: Used for playing games or watching movies (like Netflix, VLC
Media Player).
4.​ Web Browsers: Used to access websites (like Google Chrome, Safari).
🔹 3. Programming Software
Programming software is used by developers to create, write, and test new software
programs. It provides the tools to create system software or application software.

📌 What it Does:
●​ It allows developers to write, edit, and test code to create new software.
●​ It provides tools to detect errors (bugs) and fix them.

📌 Types of Programming Software:


1.​ Compilers: Converts code into machine language (like GCC, Java compiler).
2.​ Interpreters: Executes the code line-by-line (like Python interpreter).
3.​ Code Editors: Used to write and edit code (like Visual Studio Code).
4.​ Debuggers: Used to find and fix errors in the program.

📘 Real-Life Example to Understand the 3 Types


Let’s imagine you are using your computer to watch a movie on Netflix. Here's how each type
of software plays a role:

1.​ System Software:


○​ The Operating System (OS) (like Windows or Android) makes sure your screen,
keyboard, sound, and memory are all working correctly.
○​ Utility Software (like antivirus) protects your system from viruses while watching
the movie.
2.​ Application Software:
○​ The Netflix app is an application software. It lets you watch movies, shows, and
videos.
○​ VLC Media Player is also an application software if you watch a downloaded
movie.
3.​ Programming Software:
○​ Developers use programming tools (like Visual Studio Code or the GCC compiler) to create the Netflix app. Without programming software, Netflix wouldn't exist.

What is Software Performance?

Software performance refers to how well software works. It measures how fast and efficient a
program is at completing tasks. In simpler terms, it is about how quickly and smoothly a
software or application runs.

Key aspects of software performance include:


1.​ Speed – How quickly the software performs tasks.
2.​ Efficiency – How well the software uses the resources (like memory, CPU) of the
computer.
3.​ Responsiveness – How quickly it reacts to user input or commands.

Factors used to check software performance:

Summary of Key Factors:

●​ Execution Speed: How fast the software performs tasks.


●​ Throughput: How much work it can do in a given time.
●​ Resource Utilization: How efficiently it uses hardware resources.
●​ Scalability: How well it handles more work or users.
●​ Reliability: How consistently it works without errors.
●​ Memory Usage: How much memory it uses while running.
●​ Latency: The delay between actions and responses.
●​ Error Rate: How often the software has errors or failures.

📘 Important Measures of Computer Performance


📌 1. Clock Speed
Clock speed tells us how fast the CPU works. It is measured in Hertz (Hz), usually in GHz
(Gigahertz).

📋 What is Clock Speed?


●​ The CPU does its work in small "ticks" called clock cycles.
●​ The clock speed tells us how many cycles the CPU can complete in one second.
●​ For example, if the clock speed is 3 GHz, it means the CPU can do 3 billion cycles per
second.

📋 Why is it Important?
●​ Higher clock speed = Faster CPU.
●​ It directly affects how fast your computer can run programs.

📌 2. Processor Cache
The cache is a small amount of super-fast memory inside the CPU. It stores frequently used
data so that the CPU can access it quickly.

📋 What is Processor Cache?


●​ It is a small, super-fast memory inside the CPU.
●​ It stores data and instructions that are frequently used.
●​ Instead of going to RAM (which is slower), the CPU can get data from the cache.

📋 Types of Cache:
1.​ L1 Cache: Smallest but fastest cache, located inside the CPU.
2.​ L2 Cache: Slightly bigger but slower than L1.
3.​ L3 Cache: Larger, but slower than L2, shared by all CPU cores.

📋 Why is it Important?
●​ More cache = Faster CPU.
●​ When the cache is big, the CPU can store more data close to it, reducing the time to
access memory.
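The benefit of keeping frequently used data close to the CPU can be shown with a toy cache. The access pattern and value mapping are invented for illustration:

```python
# Toy illustration of why a cache helps: data already in the small, fast
# cache is returned without another (slow) trip to RAM.

ram = {addr: addr * 10 for addr in range(100)}  # pretend main memory
cache = {}
ram_accesses = 0

def load(addr):
    global ram_accesses
    if addr in cache:            # cache hit: fast path, no RAM access
        return cache[addr]
    ram_accesses += 1            # cache miss: slow path to RAM
    cache[addr] = ram[addr]      # keep a copy close to the CPU
    return cache[addr]

for addr in [5, 5, 5, 7, 5]:     # repeated accesses mostly hit the cache
    load(addr)

print(ram_accesses)  # prints 2 (only addresses 5 and 7 went to RAM)
```

Real L1/L2/L3 caches also evict old entries when full; this sketch leaves eviction out to keep the hit/miss idea clear.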

📌 3. Basic Performance Equation


The performance of a computer can be calculated using a simple equation called the Basic
Performance Equation.

📋 Equation:
CPU Time = Instruction Count × CPI × Clock Cycle Time

Where:

●​ CPU Time = Total time to execute a program.


●​ Instruction Count (IC) = Number of instructions in the program.
●​ CPI (Cycles Per Instruction) = Number of clock cycles needed to execute each
instruction.
●​ Clock Cycle Time = Time taken for 1 clock cycle (1 / Clock Speed).

📋 Another Form of the Equation:


CPU Time = (Instruction Count × CPI) / Clock Speed

Where Clock Speed = 1 / Clock Cycle Time.

📋 Example Problem 1
Problem:​
A program has 2 billion instructions.​
The CPU takes 2 clock cycles per instruction (CPI = 2).​
The clock speed is 2 GHz (which means 1 cycle = 1 / 2 GHz = 0.5 nanoseconds).

Solution:

CPU Time = Instruction Count × CPI × Clock Cycle Time

CPU Time = 2 billion × 2 cycles/instruction × 0.5 ns

CPU Time = (2 × 10⁹) × 2 × (0.5 × 10⁻⁹ s)

CPU Time = 2 seconds

So, it takes 2 seconds to run this program.

📋 Example Problem 2
Problem:​
A program has 1.5 billion instructions.​
The CPU has a clock speed of 3 GHz.​
It takes 1.5 cycles per instruction (CPI = 1.5).

Solution:

CPU Time = (Instruction Count × CPI) / Clock Speed

CPU Time = (1.5 billion × 1.5) / 3 GHz

CPU Time = (2.25 × 10⁹) / (3 × 10⁹ cycles/s)

CPU Time = 0.75 seconds

So, it takes 0.75 seconds to run this program.
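The Basic Performance Equation can be written as a small function and checked against both worked examples above:

```python
# The Basic Performance Equation: CPU Time = (IC × CPI) / Clock Speed.

def cpu_time(instruction_count, cpi, clock_speed_hz):
    """Return total execution time in seconds."""
    return instruction_count * cpi / clock_speed_hz

# Example 1: 2 billion instructions, CPI = 2, 2 GHz clock
print(cpu_time(2e9, 2, 2e9))      # prints 2.0 (seconds)

# Example 2: 1.5 billion instructions, CPI = 1.5, 3 GHz clock
print(cpu_time(1.5e9, 1.5, 3e9))  # prints 0.75 (seconds)
```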

Instruction Set Architecture:

🔹 1. What is RISC (Reduced Instruction Set Computer)?


RISC means the CPU has a small set of simple instructions. Each instruction is very simple
and takes only one clock cycle to execute.

📋 Key Features of RISC


1.​ Simple Instructions: Each instruction is small and simple (like "add two numbers").
2.​ Faster Execution: Since each instruction takes only one clock cycle, RISC is fast.
3.​ Requires More Instructions: To complete a task, you might need more instructions
compared to CISC.
4.​ Load/Store Design: Data is only accessed from memory using specific load and store
instructions.

📋 Examples of RISC Processors


●​ ARM Processors (used in smartphones, tablets, smartwatches, etc.).
●​ Apple M1 and M2 Chips (used in new MacBooks and iPads).
●​ RISC-V Processors (new open-source architecture used in IoT devices).

📋 Devices that Use RISC


●​ Smartphones (because ARM processors are RISC-based).
●​ Tablets (Apple iPads use RISC-based chips like the Apple M1 or M2).
●​ Smartwatches (like Apple Watch or Samsung Galaxy Watch).

🔹 2. What is CISC (Complex Instruction Set Computer)?


CISC means the CPU has a large set of complex instructions. Each instruction can do
multiple things at once (like "load data, add it, and store it" in one instruction).

📋 Key Features of CISC


1.​ Complex Instructions: Each instruction can do multiple tasks at once (like load, add,
and store in one step).
2.​ Takes More Time Per Instruction: Since the instruction is complex, it takes more than
one clock cycle to execute.
3.​ Fewer Instructions: Since each instruction can do many things at once, you need fewer
instructions to complete a task.
4.​ Direct Memory Access: Data can be directly accessed from memory without using
special load/store instructions.

📋 Examples of CISC Processors


●​ Intel Processors (used in most desktops and laptops).
●​ AMD Processors (like Ryzen processors in gaming computers).
●​ x86 Processors (used in personal computers, servers, and laptops).

📋 Devices that Use CISC


●​ Laptops (like Dell, HP, Lenovo, and MacBooks with Intel or AMD processors).
●​ Desktops and PCs (most personal computers use Intel or AMD processors).
●​ Servers (large computers that host websites or applications).

📘 Memory Locations and Addresses in Computer Architecture (Simple Explanation)

When we store data in a computer, it needs to be placed somewhere so that it can be found
later. This "somewhere" is called a memory location. To identify where the data is stored, every
memory location is given a unique address. Think of it like how houses on a street have unique
house numbers so that you can find the correct house.

📌 1. What is a Memory Location?


A memory location is a specific spot in the computer's memory where data (like numbers,
letters, or instructions) is stored. Each memory location can hold one piece of data.

📌 2. What is a Memory Address?


A memory address is a unique number assigned to each memory location so the computer can
find and access it. It works like a house number or an index number for each mailbox. The
address tells the computer where to look for the data.

📌 3. Relationship Between Memory Location and Address


●​ A memory location is where the data is stored.
●​ A memory address is the unique number used to find the location of that data.

📘 1. CPU Addressing Capability


●​ What it is: The total memory the CPU can access, based on the number of address
lines.
●​ How it works: If a CPU has n address lines, it can access 2ⁿ memory locations.
●​ Example: A CPU with 32 address lines can access 2³² bytes = 4 GB of memory.
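The 2ⁿ rule is easy to verify directly (assuming one byte per address, i.e. byte addressing):

```python
# Addressable memory grows as 2^n with the number of address lines n.

def addressable_bytes(address_lines):
    return 2 ** address_lines     # one byte per address (byte addressing)

print(addressable_bytes(16))      # prints 65536 (64 KB)
print(addressable_bytes(32))      # prints 4294967296 (4 GB)
```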

📘 2. Word Length
●​ What it is: The size of data the CPU processes at once (in bits, like 8, 16, 32, or 64 bits).
●​ How it works: The larger the word length, the more data the CPU can process in one
go.
●​ Example: A 64-bit CPU can process 64 bits of data at a time, while a 32-bit CPU
processes only 32 bits.

📘 3. Memory Cell Organization


●​ What it is: How memory is divided into small storage units (cells), each storing 1 byte
or word.
●​ How it works: Each memory cell has a unique address to identify and access it.
●​ Example: A memory with 1 KB has 1024 cells, and each cell can store 1 byte of data.

📘 4. Byte Addressing
●​ What it is: Each byte (8 bits) in memory has a unique address.
●​ How it works: To access data, the CPU references the address of the exact byte.
●​ Example: If memory starts at address 0x0000, then 0x0001 is the address of the next
byte.

📘 5. Word Addressing
●​ What it is: Instead of addressing each byte, the CPU addresses whole words (like 2, 4,
or 8 bytes at a time).
●​ How it works: Each "word" gets a unique address, and the address jumps by the word
size.
●​ Example: If the word size is 4 bytes, and the first word is at address 0x0000, the next
word will be at 0x0004.
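The difference between the two addressing schemes is just the step size between consecutive addresses. The 4-byte word size below matches the example above:

```python
# Byte addresses step by 1; word addresses step by the word size.

WORD_SIZE = 4  # 4-byte words, as in the example above

byte_addresses = list(range(0, 12))             # 0, 1, 2, ..., 11
word_addresses = list(range(0, 12, WORD_SIZE))  # 0, 4, 8

print(word_addresses)  # prints [0, 4, 8]
```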

Memory Operations:

📘 1. Memory Read Operation


●​ What it is: The process of retrieving data from memory to the CPU.
●​ How it works: The CPU sends the memory address of the required data, and the
data is sent back to the CPU.
●​ Example: When you open a file, the computer reads the file's data from memory
and displays it on the screen.
📘 2. Memory Write Operation
●​ What it is: The process of storing data from the CPU to memory.
●​ How it works: The CPU sends the data and the memory address where the data should
be stored.
●​ Example: When you save a document, the data from the CPU is written into memory
(like on your hard drive or RAM).
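Both operations can be sketched by modeling memory as a mapping from addresses to values (a simplification; real memory is a fixed-size array of bytes):

```python
# Toy read/write: memory modeled as a dict from address to value.

memory = {}

def mem_write(address, data):
    memory[address] = data       # CPU sends address + data to store

def mem_read(address):
    return memory[address]       # CPU sends address, gets data back

mem_write(0x10, "hello")
print(mem_read(0x10))            # prints hello
```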

📘 1. What is an Instruction?
●​ What it is: An order or command given to the CPU to perform a specific task.
●​ How it works: Instructions tell the CPU what to do, like add numbers, move data,
or make decisions.
●​ Example: An instruction like ADD A, B tells the CPU to add the values in registers
A and B.

📘 2. Types of Instructions
🔹 1. Data Transfer Instructions
●​ What it is: Used to move data from one location to another.
●​ Example: MOV A, B moves data from register B to register A.

🔹 2. Arithmetic Instructions
●​ What it is: Used to perform math operations like add, subtract, multiply, and divide.
●​ Example: ADD A, B adds the values in registers A and B.

🔹 3. Logical Instructions
●​ What it is: Used to perform logical operations like AND, OR, and NOT.
●​ Example: AND A, B performs the logical AND operation on values in A and B.

🔹 4. Control Instructions
●​ What it is: Used to change the flow of program execution (like loops and jumps).
●​ Example: JUMP 100 moves the program execution to address 100.

🔹 5. Input/Output (I/O) Instructions


●​ What it is: Used to send or receive data to/from input/output devices.
●​ Example: IN A, PORT1 reads input from PORT1 and stores it in A.
📘 What is Assembly Language Notation?

●​ What it is: A low-level programming language that uses human-readable instructions


(like MOV, ADD) instead of binary machine code.
●​ How it works: Each assembly instruction corresponds to a specific CPU instruction.

MOV A, 5 ; Move the value 5 to register A

ADD A, B ; Add the value of register B to register A

●​ Explanation: MOV moves data, and ADD performs addition.

📘 2. Instruction Format Types


Instructions are made up of opcodes (the action, like ADD) and operands (the data or
addresses it works on). The format defines how many operands are used.

📘 3. Types of Instruction Formats


🔹 1. Three-Address Instruction
●​ What it is: Uses 3 addresses in one instruction (source1, source2, destination).
●​ Syntax: OPCODE DEST, SOURCE1, SOURCE2
●​ Example

ADD C, A, B ; C = A + B

Explanation: Add A and B and store the result in C.

🔹 2. Two-Address Instruction
●​ What it is: Uses 2 addresses (source and destination, one acts as both input and
output).
●​ Syntax: OPCODE DEST, SOURCE

ADD A, B ; A = A + B

Explanation: Add A and B and store the result back in A.

🔹 3. One-Address Instruction
●​ What it is: Uses 1 address for data, with the second operand stored in a special register
(like an Accumulator).
●​ Syntax: OPCODE ADDRESS
●​ Example

LOAD A ; Load data from memory location A into the accumulator

ADD B ; Add value from B to the accumulator

Explanation: Load A into the accumulator, then add B to it.

📘 1. What is an Accumulator?
What it is: A special register in the CPU that stores the result of arithmetic and logic operations.

How it works: It temporarily holds data for operations like ADD, SUB, AND, etc.

Example: When you add two numbers, the result is stored in the accumulator before being
moved elsewhere.

🔹 4. Zero-Address Instruction
●​ What it is: Uses no addresses, as it works with data in a stack (a special memory
structure).
●​ Syntax: OPCODE
●​ Example

PUSH 5 ; Push 5 onto the stack

PUSH 3 ; Push 3 onto the stack

ADD ; Pop two values from the stack, add them, and push result back

Explanation: The stack stores values, and ADD pops two values, adds them, and pushes the
result back.
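The stack behavior described above can be simulated in a few lines of Python (the push/add helpers are an invention of the sketch, not part of any real instruction set):

```python
# Minimal stack machine: PUSH places a value, ADD pops two values
# and pushes their sum back, exactly as in the zero-address example.
stack = []

def push(value):
    stack.append(value)

def add():
    b = stack.pop()   # top of stack
    a = stack.pop()   # next value down
    stack.append(a + b)

push(5)
push(3)
add()
print(stack[-1])   # 8
```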

📘 2. Which Address Instruction is Best?


●​ Best Choice: It depends on the CPU design and the type of task.
●​ Reason:
○​ Three-address instructions are best for faster, complex calculations but
require more memory space.
○​ Two-address instructions balance speed and space.
○​ One-address (with accumulator) is simple but slower due to extra data
movement.
○​ Zero-address is best for stack-based CPUs (like RPN calculators).
📘 3. Real-Time Application of Address Instructions
●​ Three-Address: Used in modern RISC processors for faster execution (like
smartphones, tablets).
●​ Two-Address: Used in embedded systems (like smart appliances) to save memory.
●​ One-Address: Used in older systems with simple accumulator-based CPUs.
●​ Zero-Address: Used in stack-based virtual machines (like the JVM for Java).

📘 1. What is Addressing Mode?


●​ What it is: Addressing mode tells the CPU how to find the data (operands) for an
instruction.
●​ Why it matters: It helps the CPU locate data in registers, memory, or as part of the
instruction itself.
●​ Example: When you ask for a book in a library, you can give its exact location
(register) or describe its position (memory address).

📘 2. Common Addressing Modes (with Examples)


🔹 1. Immediate Addressing Mode
●​ What it is: The operand (data) is directly part of the instruction.
●​ Syntax: MOV A, 5 (Here, 5 is the data itself).

MOV A, 10 ; Store the value 10 directly in register A

Usage: Used to load fixed values like constants.

🔹 2. Register Addressing Mode


●​ What it is: The operand (data) is stored in a CPU register.
●​ Syntax: MOV A, B (Copy data from register B to register A).

Example:​
MOV A, B ; Copy the value from register B to register A

●​ Usage: Used for fast data transfer between registers.

🔹 3. Direct Addressing Mode


●​ What it is: The operand is stored in a memory location whose address is part of the
instruction.
●​ Syntax: MOV A, [1000] (Copy data from memory address 1000 to register A).

Example:​
MOV A, [1000] ; Get the data from memory address 1000 and store it in
register A

●​ Usage: Used to access data stored in memory locations.

🔹 4. Indirect Addressing Mode


●​ What it is: The memory address of the data is stored in a register.
●​ Syntax: MOV A, [R1] (Copy data from memory address stored in register R1 to
register A).

Example:​
MOV R1, 1000 ; Store the memory address 1000 in register R1
MOV A, [R1] ; Access the data at address 1000 (from R1) and store
it in A

●​ Usage: Used in dynamic memory access (like pointers in C).

🔹 5. Indexed Addressing Mode


●​ What it is: The base address is combined with an offset to locate the operand.
●​ Syntax: MOV A, [BASE + OFFSET] (Get data from base + offset address).

Example:​
MOV R1, 1000 ; Base address in R1
MOV A, [R1 + 2] ; Access data from address 1002 (1000 + 2) and store
in A

●​ Usage: Used for arrays and loops where addresses change repeatedly.

🔹 6. Relative Addressing Mode


●​ What it is: The address is relative to the current program counter (PC).
●​ Syntax: JUMP + 10 (Move execution to 10 addresses forward).

Example:​

JUMP +4 ; Jump 4 instructions ahead in the program

●​ Usage: Used in loops and branching (like if-else conditions).

Example Program Using Different Addressing Modes


Problem: Add two numbers from memory and store the result in register A.

; 1. Immediate Addressing
MOV A, 5 ; Load value 5 into register A (Immediate)

; 2. Register Addressing
MOV B, 10 ; Load value 10 into register B
ADD A, B ; Add B to A (Register Addressing)

; 3. Direct Addressing
MOV A, [1000] ; Load value from memory address 1000 into A

; 4. Indirect Addressing
MOV R1, 2000 ; Store the memory address 2000 in register R1
MOV A, [R1] ; Load value from memory address 2000 (via R1) into A

; 5. Indexed Addressing
MOV R1, 1000 ; Base address
MOV A, [R1 + 4] ; Load value from address 1004 (base + 4) into A

; 6. Relative Addressing
JUMP +4 ; Jump 4 instructions ahead (relative to current PC)
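As a rough illustration, the addressing modes above can be mimicked with a small register/memory model in Python. The register names, addresses, and memory contents are invented for the example:

```python
registers = {"A": 0, "B": 0, "R1": 0}
memory = {1000: 42, 1004: 99, 2000: 7}

# Immediate: the operand is the value itself.
registers["A"] = 5

# Register: the operand comes from another register.
registers["B"] = 10
registers["A"] = registers["B"]

# Direct: the instruction carries the memory address.
registers["A"] = memory[1000]                 # 42

# Indirect: a register holds the address of the data.
registers["R1"] = 2000
registers["A"] = memory[registers["R1"]]      # 7

# Indexed: base address in a register plus an offset.
registers["R1"] = 1000
registers["A"] = memory[registers["R1"] + 4]  # 99
print(registers["A"])
```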

Computers Don't Understand Human Languages

●​ Problem: Computers only understand 0s and 1s. Assembly instructions are already
close to machine code.
●​ Why It Matters: If we wrote in English, the computer would need a super complex
translator to understand every sentence, every grammar rule, and every possible
meaning. Instead, assembly is a "middle language" between human language and
binary.
●​ Example:
○​ Binary (what the CPU understands): 1100001100000101
○​ Assembly (middle ground): MOV A, 5
○​ English (too complex): "Store the number 5 in register A of the CPU."

📘 What is Instruction Sequencing in Computer Architecture?


Instruction sequencing is the process of executing instructions in the correct order (or
sequence) so that a program runs properly. It ensures that the CPU follows the logical flow of
the program, one step at a time.
📘 Key Steps in Instruction Sequencing
1.​ Fetch the Instruction:​
The CPU gets the next instruction from memory.
2.​ Decode the Instruction:​
The CPU understands what the instruction means and what needs to be done.
3.​ Execute the Instruction:​
The CPU performs the operation, like adding two numbers or moving data.
4.​ Move to the Next Instruction:​
The CPU updates the Program Counter (PC) to point to the next instruction.
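The four steps above can be condensed into a toy fetch-decode-execute loop in Python. The instruction encoding (tuples) and the LOAD/ADD/HALT opcodes are invented for the sketch:

```python
# A toy fetch-decode-execute loop. "Memory" holds instructions as tuples,
# and the program counter (pc) selects the next one.
program = [("LOAD", 5), ("ADD", 3), ("HALT", None)]
acc = 0   # accumulator
pc = 0    # program counter

while True:
    opcode, operand = program[pc]   # 1. fetch
    pc += 1                         # 4. advance the PC
    if opcode == "LOAD":            # 2. decode, 3. execute
        acc = operand
    elif opcode == "ADD":
        acc += operand
    elif opcode == "HALT":
        break

print(acc)   # 8
```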

📘 Basic I/O Operations in Computer Architecture


I/O (Input/Output) operations refer to the process of transferring data between a computer and
the outside world (e.g., user, storage devices, or other systems). These operations are
essential for interacting with the system and executing tasks.

📘 Types of I/O Operations


1.​ Input Operations:
○​ Collect data from input devices (e.g., keyboard, mouse, scanner).
○​ Example: When you type on a keyboard, the characters are input into the
computer.
2.​ Output Operations:
○​ Send data from the computer to output devices (e.g., monitor, printer).
○​ Example: When you see text displayed on the screen, the computer is performing
an output operation.

📘 Steps of Basic I/O Operations


1.​ Device Request:​
The CPU sends a command to the input or output device.
2.​ Data Transfer:​
Data moves between the CPU and the device using specific channels (e.g., I/O ports or
buses).
3.​ Acknowledge:​
The device sends a signal back to the CPU, confirming the operation is complete.
📘 Example: Reading from a Keyboard
1.​ Input Request:​
The user types the letter A on the keyboard.
2.​ Data Transfer:
○​ The keyboard sends the binary representation of A (01000001) to the CPU.
○​ The CPU stores this data in memory for further processing.
3.​ Acknowledge:​
The CPU signals the system that the data was received.

📘 Example: Displaying Text on a Screen


1.​ Output Request:​
The CPU sends a command to display the letter A on the screen.
2.​ Data Transfer:
○​ The binary representation of A is sent to the display device.
○​ The monitor converts the binary data into visible text.
3.​ Acknowledge:​
The screen shows the letter A, completing the output process.

📘 Methods of I/O Operations


1.​ Programmed I/O:
○​ The CPU actively manages data transfer.
○​ Example: A CPU repeatedly checks if a printer is ready to print.
2.​ Interrupt-Driven I/O:
○​ Devices notify the CPU when they’re ready to send or receive data using an
interrupt signal.
○​ Example: A keyboard sends an interrupt when a key is pressed.
3.​ Direct Memory Access (DMA):
○​ Data transfers directly between memory and the device without CPU
involvement, improving efficiency.
○​ Example: Copying large files from a USB drive to your hard disk.

📘 Real-Life Analogy
Think of I/O operations as a conversation between you and a vending machine:
1.​ Input: You press a button to select a drink.
2.​ Processing: The machine reads your selection.
3.​ Output: The machine dispenses the drink.

📘 Conclusion
Basic I/O operations are vital for computer systems to communicate with external devices. They
handle both inputs (data in) and outputs (data out), ensuring smooth interactions between the
CPU, memory, and peripherals.
Arithmetic Unit
1's Complement:

●​ 1's complement of a binary number is obtained by flipping all the bits of the number,
i.e., changing every 1 to 0 and every 0 to 1.
●​ The 1's complement system represents negative numbers by inverting the bits of the
corresponding positive number.
●​ For example, the 1's complement of 110010 is 001101.

2's Complement:

●​ 2's complement of a binary number is obtained by first taking the 1's complement and
then adding 1 to the least significant bit (LSB).
●​ The 2's complement is more widely used in computer systems for representing signed
integers because it simplifies addition and subtraction operations.
●​ For example, the 2's complement of 110010 is 001110 (after first flipping the bits and
then adding 1).

complements (specifically 1's complement and 2's complement) are used to represent
negative numbers in binary systems, and they provide a way to indicate the negative sign of a
number.

Example:

Let's take the number 5 and represent it as a negative value using complement notation:

1.​ Positive 5 in binary:


○​ 5 in decimal = 00000101 (8-bit binary)
2.​ 1's complement of -5:

Invert the bits of 5:​



00000101 → 11111010

○​ 11111010 is the 1's complement of -5.


3.​ 2's complement of -5:

Start with the 1's complement (11111010), then add 1:​



11111010 + 1 = 11111011

○​ 11111011 is the 2's complement of -5.


Summary:

●​ 1's and 2's complement systems are used to represent negative numbers in binary.
●​ In both systems, the complement method allows negative numbers to be handled in
binary form efficiently.
●​ 2's complement is the most commonly used system in modern computers because it
simplifies arithmetic and resolves issues like double representation of zero.

Example:

Step 1: Convert 72 to Binary (Unsigned Representation)

First, we need to convert 72 to its binary representation:

●​ 72 in decimal = 1001000 in binary (7 bits).


●​ To represent this as an 8-bit binary number, we add a leading 0 to make it 8 bits:​
72 = 01001000 (8-bit binary).

Step 2: 1's Complement of 72

To find the 1's complement of 72, we invert all the bits of the binary representation:

1.​ The binary representation of 72 is:​


01001000
2.​ Now, invert each bit (change 1 to 0 and 0 to 1):​
01001000 → 10110111

So, the 1's complement of 72 (i.e., for -72) is:​


10110111

Step 3: 2's Complement of 72

To find the 2's complement of 72, we follow these steps:

1.​ First, find the 1's complement (which we already computed):​


1's complement of 72 = 10110111
2.​ Now, add 1 to the 1's complement result:​
10110111 + 1 = 10111000

So, the 2's complement of 72 (i.e., for -72) is:​


10111000
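The worked examples above (and the earlier -5 example) can be reproduced in Python; the 8-bit width is taken from the examples, and the helper names are invented:

```python
def ones_complement(value, bits=8):
    """Flip every bit within the given width (XOR with all ones)."""
    return value ^ ((1 << bits) - 1)

def twos_complement(value, bits=8):
    """One's complement plus one, kept within the word width."""
    return (ones_complement(value, bits) + 1) % (1 << bits)

# Reproduce the worked example for 72:
print(format(ones_complement(72), "08b"))  # 10110111
print(format(twos_complement(72), "08b"))  # 10111000
```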
Addition of signed numbers

Example: Subtracting Two Signed Numbers

In 2's complement arithmetic, subtraction is performed by adding the 2's complement of the
number being subtracted. For example, 7 - 8 is computed as 7 + (-8), which gives -1.

In 2's complement representation (which is commonly used for signed numbers in most
computer systems), -1 is represented as 11111111 in binary, assuming an 8-bit system.
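The 7 - 8 = -1 case can be demonstrated with Python's bitwise masking, using the 8-bit width from the text:

```python
# Subtraction as addition of a two's complement: 7 - 8 becomes 7 + (-8).
BITS = 8
MASK = (1 << BITS) - 1   # 0b11111111

def to_twos(value):
    """Encode a (possibly negative) integer in 8-bit two's complement."""
    return value & MASK

result = (to_twos(7) + to_twos(-8)) & MASK
print(format(result, "08b"))   # 11111111, the 8-bit pattern for -1
```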

Half Adder:

What is Half Adder?


Half Adder is a combinational logic circuit built from one EX-OR gate and one AND gate. The
half-adder circuit has two inputs, A and B; it adds the two input bits and generates a sum and
a carry.

A half adder is a fundamental digital circuit that adds two single-bit binary numbers and
produces a sum and a carry output. It doesn't have any carry input

Inputs and Outputs:

●​ Inputs: Two binary digits (A and B).


●​ Outputs:
○​ Sum (S): The least significant bit of the result.
○​ Carry (C): The carry-out bit (1 if there's a carry, 0 if there isn't).
The output obtained from the EX-OR gate is the sum of the two bits, while the output of the
AND gate is the carry. A carry from a previous addition cannot be taken in, because there is no
input or logic gate to process it. This is why the circuit is called a half adder.

Logical Expression of Half Adder

The logical expressions for the half adder are given as

Sum = A ⊕ B

Carry = A AND B
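These two expressions map directly onto Python's bitwise operators; a minimal sketch with its full truth table:

```python
def half_adder(a, b):
    """Sum is XOR of the inputs; carry is AND of the inputs."""
    return a ^ b, a & b   # (sum, carry)

# Truth table for all four input pairs:
for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(a, b, "->", s, c)
```

Only the 1 + 1 case produces a carry, which is why a single half adder cannot go beyond two bits.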

What is Full Adder ?


A Full Adder is a circuit built from two EX-OR gates, two AND gates, and
one OR gate. It adds three inputs and produces two outputs. The first two
inputs are A and B, and the third is an input carry, C-IN. The output carry
is designated C-OUT, and the normal output is designated S, the SUM.
The equation obtained by the EX-OR gate is the sum of the binary digits.
While the output obtained by AND gate is the carry obtained by addition.

Logical Expression of Full Adder

Given Below is the Logical Expression of Full Adder

SUM = (A XOR B) XOR Cin = (A ⊕ B) ⊕ Cin

CARRY-OUT = A AND B OR Cin(A XOR B) = A.B + Cin(A ⊕ B)
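The SUM and CARRY-OUT expressions can be checked directly in Python; cascading full adders into a ripple-carry adder (the 4-bit width is an assumption of the sketch) shows how multi-bit addition works:

```python
def full_adder(a, b, cin):
    """SUM = (A XOR B) XOR Cin; CARRY-OUT = A.B + Cin.(A XOR B)."""
    s = (a ^ b) ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_add(x, y, bits=4):
    """Chain full adders: each stage's carry-out feeds the next carry-in."""
    carry, result = 0, 0
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry

print(ripple_add(0b0101, 0b0011))  # (8, 0): 5 + 3 = 8, no carry out
```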


Advantages and Disadvantages

Advantages of Half Adder

●​ Flexible and easy to design.

●​ Uses fewer logic gates and is therefore cheaper.

Disadvantages of Half Adder

●​ Fails to process a carry input from the previously added numbers.

●​ Restricted to the addition of only two bits.

Advantages of Full Adder

●​ Can add 3 bits; it includes a carry input and a carry output, which
allows more elaborate computations.

●​ It can be cascaded to build adders for any number of bits, which
makes it suitable for multi-bit arithmetic.

Disadvantages of Full Adder

●​ More complex and needs more gates, which makes the design more
complicated and expensive.

●​ Slightly slower, because a signal typically passes through two gate
levels instead of one.
Applications

Applications of Half Adder

●​ Arithmetic operations like addition, subtraction, and multiplication in
simple digital circuits.

●​ Used as a building block for full adders and other small-scale
integrated circuits.

Applications of Full Adder

●​ Carry look-ahead adders in digital processors that perform multi-bit
binary addition.

●​ Present in arithmetic logic units (ALUs) and other complex digital
systems.

Fast Adders: An adder introduces carry propagation delay, which also slows
other arithmetic operations like multiplication and division, since they are
carried out through several addition or subtraction steps. Improving the
speed of addition therefore improves the speed of all other arithmetic
operations, so reducing the carry propagation delay of adders is of great
importance. Different logic design approaches have been employed to
overcome the carry propagation problem. One widely used approach is to
employ a carry look-ahead, which solves the problem by calculating the
carry signals in advance, based on the input signals. This type of adder
circuit is called a carry look-ahead adder.

A carry look-ahead adder is used to perform quick arithmetic operations.
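The carry look-ahead recurrence can be sketched in Python using generate (Gi = Ai·Bi) and propagate (Pi = Ai XOR Bi) signals. Note this is only a demonstration of the formula: the software loop below still computes carries one after another, whereas real hardware expands the recurrence so that every carry is produced in parallel from the inputs and C0:

```python
def carry_lookahead(a_bits, b_bits, c0=0):
    """Add two bit lists (LSB first) using generate/propagate signals.
    C(i+1) = Gi + Pi.Ci is the carry look-ahead recurrence."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate
    carries = [c0]
    for i in range(len(a_bits)):
        carries.append(g[i] | (p[i] & carries[i]))
    sums = [p[i] ^ carries[i] for i in range(len(a_bits))]
    return sums, carries[-1]

# Add 6 (0110) and 3 (0011), given LSB first:
sums, cout = carry_lookahead([0, 1, 1, 0], [1, 1, 0, 0])
print(sums, cout)   # [1, 0, 0, 1] 0  ->  9, LSB first
```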


BASIC PROCESSING UNIT
INTRODUCTION:

The Basic Processing Unit (BPU), often referred to as the Central Processing Unit
(CPU) in many cases, is the part of a computer or device that carries out instructions
from programs. It’s essentially the "brain" of the computer, where most of the work
happens. Here are its key fundamental concepts explained in simple terms:

1.​ Control Unit (CU):​

○​ The control unit directs the operations of the CPU. It tells the other parts of
the computer system how to respond to a program’s instructions. You can
think of it like a manager who tells everyone what to do.
2.​ Arithmetic Logic Unit (ALU):​

○​ This part of the CPU performs mathematical calculations (like addition or


subtraction) and logical operations (like comparing numbers or checking
conditions). It handles the "thinking" part of the computer.
3.​ Registers:​

○​ Registers are small, fast storage areas in the CPU used to temporarily
hold data and instructions that are being processed. They help speed up
operations by storing data close to where it's needed in the CPU.
4.​ Clock:​

○​ The clock helps the CPU know when to carry out operations. It generates
a consistent timing signal that helps synchronize the work of the different
components of the computer.
5.​ Cache:​

○​ The cache is a smaller, faster type of memory that stores frequently used
data and instructions so that the CPU can access them quickly, rather than
having to retrieve them from slower main memory.

In summary, the BPU/CPU processes information by fetching instructions, executing


them, and storing results—all in a very fast and organized manner—so that the
computer can perform tasks and run programs.

Execution of a Complete Instruction in the Basic Processing Unit (CPU)


The execution of a complete instruction in the CPU follows a series of steps known as
the Fetch-Decode-Execute Cycle or Instruction Cycle. This process happens rapidly
and repeatedly to allow the computer to run programs. Let’s break it down into simple
steps and provide a visual representation.

1. Fetch

●​ What happens? The CPU retrieves (fetches) the next instruction from memory
(RAM) to execute.
●​ How? The Program Counter (PC) holds the memory address of the next
instruction. The CPU uses the Memory Address Register (MAR) to access the
instruction stored at that location in memory, and it is loaded into the Memory
Buffer Register (MBR).

2. Decode

●​ What happens? The fetched instruction is decoded to understand what action


needs to be taken. This step involves translating the instruction into something
the CPU can work with.
●​ How? The Control Unit (CU) examines the instruction to determine the
operation (like adding, moving data, etc.) and identifies which registers or parts of
the CPU are involved.

3. Execute

●​ What happens? The CPU performs the action specified by the instruction.
●​ How? The Arithmetic Logic Unit (ALU) may carry out calculations, logical
operations, or data movement depending on the type of instruction.

4. Store (optional)

●​ What happens? After execution, the result may need to be stored back into
memory or a register.
●​ How? The result is written back to the register or memory as needed.

Visual Representation of the Instruction Cycle

Here's a simplified step-by-step flow:

+---------------------+
| Fetch Instruction |
+---------------------+
|
v
+---------------------+
| Decode Instruction |
+---------------------+
|
v
+---------------------+
| Execute Action |
+---------------------+
|
v
+---------------------+
| Store Result (if needed) |
+---------------------+
|
v
(Repeat the cycle for the next instruction)

Step-by-Step Breakdown:

1.​ Fetching the instruction:​


The CPU's Program Counter (PC) points to the address of the next instruction.
This instruction is fetched from memory and placed into the Instruction Register
(IR).​

2.​ Decoding the instruction:​


The Control Unit (CU) looks at the instruction in the IR and determines what
needs to be done (e.g., add two numbers, move data, etc.).​

3.​ Executing the instruction:​


The ALU performs the operation defined by the instruction (like adding numbers,
comparing values, etc.).​

4.​ Storing the result:​


If necessary, the result of the operation is stored in a register or back to
memory for later use.​

Example:
Let’s say the instruction is to add two numbers.

●​ Fetch: The instruction to add is fetched from memory.


●​ Decode: The control unit decodes that it needs to add two values.
●​ Execute: The ALU performs the addition of those two numbers.
●​ Store: The result of the addition is saved back to a register or memory.

This cycle repeats continuously as long as the program is running.

In simple terms, the CPU is like a worker who keeps fetching a list of tasks, figuring out
what each task requires, doing the task, and then storing the result. This happens very
quickly, so you don’t notice the process happening—it just looks like the computer is
running smoothly.

Let's walk through the execution of a complete instruction step by step, using a
simple example: "Add two numbers." In this case, we’ll assume the numbers are
stored in registers, and the instruction is to add them together.

Example Instruction:

●​ Instruction: ADD R1, R2 (This means: Add the value in register R1 to the value
in register R2 and store the result in R2).

Step-by-Step Execution of the Instruction

1. Fetch

●​ The CPU fetches the instruction (ADD R1, R2) from memory.
●​ The Program Counter (PC) holds the address of the next instruction to execute.
Let’s say the address of this instruction is 1000.
●​ The CPU uses the Memory Address Register (MAR) to access memory at
address 1000, and the Memory Buffer Register (MBR) temporarily stores the
instruction ADD R1, R2.
●​ The Instruction Register (IR) then holds this instruction, so the CPU knows
what it needs to do.

2. Decode

●​ The Control Unit (CU) decodes the instruction from the Instruction Register
(IR).​

●​ The CU determines that the instruction is an ADD operation and identifies the
operands (the values to be added). In this case, the operands are the values
stored in R1 and R2.​

●​ The CU also understands that the result should be stored back into R2.​

●​ Key takeaway: The CU decodes that it needs to take the values in R1 and R2,
perform an addition, and store the result in R2.​

3. Execute

●​ The Arithmetic Logic Unit (ALU) now performs the addition.


●​ The ALU retrieves the values from R1 and R2 (let’s assume R1 = 5 and R2 = 3).
●​ It adds the values: 5 + 3 = 8.
●​ The ALU performs the operation and sends the result (8) to the next step for
storage.

4. Store (optional)

●​ The result (8) from the ALU is stored back into R2.​

●​ Now, R2 holds the value 8.​

●​ Key takeaway: The result of the addition is saved in R2, replacing the old value
in R2 (which was 3).​

Summary of the Process:

1.​ Fetch: The instruction ADD R1, R2 is fetched from memory.


2.​ Decode: The Control Unit decodes that the instruction is an addition of the
values in registers R1 and R2.
3.​ Execute: The ALU adds the values in R1 (5) and R2 (3), resulting in 8.
4.​ Store: The result (8) is stored in R2.

Now, after the instruction is executed, R2 contains the new value 8.

Visual Representation of the Execution Process:


+---------------------+
| Fetch | <-- Fetch the instruction "ADD R1, R2"
| (ADD R1, R2) |
+---------------------+
|
v
+---------------------+
| Decode | <-- Decode the instruction: ADD R1 and R2
+---------------------+
|
v
+---------------------+
| Execute | <-- ALU adds R1 (5) and R2 (3), result = 8
+---------------------+
|
v
+---------------------+
| Store | <-- Store the result (8) in R2
+---------------------+
|
v
(Repeat for the next instruction)

This process happens very quickly, and the CPU continuously cycles through these
steps to process instructions in the program.
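The ADD R1, R2 walkthrough above can be condensed into a few lines of Python. The register values 5 and 3 follow the example; the tuple encoding of the instruction and the address 1000 are inventions of the sketch:

```python
# The four phases for "ADD R1, R2", mirroring the walkthrough above.
registers = {"R1": 5, "R2": 3}
memory = {1000: ("ADD", "R1", "R2")}
pc = 1000

instruction = memory[pc]            # 1. Fetch (PC -> MAR -> MBR -> IR)
opcode, src, dest = instruction     # 2. Decode (Control Unit)
if opcode == "ADD":
    result = registers[src] + registers[dest]   # 3. Execute (ALU: 5 + 3)
    registers[dest] = result        # 4. Store the result back into R2

print(registers["R2"])   # 8
```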

Hardwired Control in Basic Processing Unit (CPU)

Hardwired control is a method used in CPUs to manage and direct the execution of
instructions using fixed logic circuits. It involves the use of combinational circuits (like
decoders, gates, and multiplexers) that generate control signals to guide the flow of
data and operations within the CPU. This type of control is called "hardwired" because
the control logic is designed and implemented using physical hardware, and it is not
programmable.

How Hardwired Control Works:

In a hardwired control unit, the sequence of actions required to execute each


instruction is determined by fixed hardware components. These components generate
control signals based on the instruction being executed.
Key Elements of Hardwired Control:

1.​ Instruction Decoder:​

○​ The instruction decoder interprets the opcode (operation code) of the


instruction and translates it into a set of control signals.
2.​ Control Signals:​

○​ The control unit generates signals that control various parts of the CPU
(like the Arithmetic Logic Unit (ALU), registers, and memory).
○​ These signals determine what operations should be performed, such as
"add," "store," or "fetch."
3.​ Clock:​

○​ The clock synchronizes the operations of the CPU and ensures that the
control signals are executed in the right sequence.
4.​ Control Logic Circuit:​

○​ The control logic consists of logic gates and flip-flops, which are used to
control the data flow between registers, ALU, memory, and other
components.

Working of Hardwired Control:

●​ The CPU fetches an instruction from memory.


●​ The Instruction Decoder decodes the instruction and generates the
corresponding control signals.
●​ The control signals then direct the appropriate operations (e.g., ALU operation,
data movement) in each cycle of the fetch-decode-execute process.
●​ This happens quickly, synchronized by the clock, and involves no
reprogramming.

Pictorial Representation of Hardwired Control in a CPU:


+---------------------+ +---------------------+
| Instruction | | Instruction |
| Register (IR) | | Decoder |
| | | |
+---------------------+ +---------------------+
| |
v v
+------------------------+ +------------------------+
| Control Unit (CU) | ---> | Control Signals |
| (Hardwired Logic) | | for ALU, Memory, etc. |
+------------------------+ +------------------------+
|
v
+------------------------+ +------------------------+
| Arithmetic Logic Unit | <--> | Registers / Memory |
+------------------------+ +------------------------+
|
v
+------------------------+
| Output or Store Result|
+------------------------+

Step-by-Step Explanation with Pictorial Example:

1.​ Instruction Fetch:​

○​ The instruction is fetched from memory and stored in the Instruction


Register (IR).
2.​ Instruction Decoding:​

○​ The Instruction Decoder decodes the instruction to figure out what


operation is to be performed (e.g., ADD, SUB, LOAD).
3.​ Control Signal Generation:​

○​ The decoded instruction is sent to the Control Unit (CU), which contains
the hardwired logic.
○​ The CU generates the necessary control signals to tell the ALU,
registers, and memory what operations to perform.
4.​ Execution:​

○​ The ALU performs the operation (like adding two numbers) based on the
control signals.
○​ The results are stored in the registers or sent to memory.
5.​ Repeat:​

○​ The cycle continues with the CPU fetching, decoding, and executing
instructions until the program finishes.
Advantages of Hardwired Control:

●​ Faster Execution: Since the control signals are generated using fixed hardware,
the execution is faster than with programmable control systems.
●​ Simplicity: The design is simpler, especially for simpler or fixed operations.

Disadvantages of Hardwired Control:

●​ Limited Flexibility: It’s harder to modify or change the behavior of the control
unit, as it’s based on physical circuits.
●​ Complexity in Designing for Complex Instructions: As the instruction set
becomes more complex, the hardwired control design can become complicated
and less efficient.

In summary, hardwired control in a CPU is a fast and efficient way to control the flow
of operations by using fixed logic circuits to generate control signals, but it lacks the
flexibility of programmable control units.

Microprogrammed Control in Basic Processing Unit (CPU)

Microprogrammed control is a method of controlling the operations of a CPU using a


set of microinstructions. These microinstructions are stored in memory (usually ROM or
RAM), and each microinstruction controls a small operation of the CPU, such as moving
data, performing calculations, or interacting with I/O devices. The microprogrammed
control unit reads the microinstructions in sequence to execute a complex instruction.

How Microprogrammed Control Works:

Unlike hardwired control where control signals are generated by fixed logic circuits,
microprogrammed control uses a series of microinstructions stored in memory to
generate the control signals. The microinstructions are organized into microprograms
that represent the sequence of operations needed to perform a high-level instruction.

Key Components of Microprogrammed Control:

1.​ Control Memory:​

○​ A special memory (often ROM or RAM) that stores the microprogram


(sequence of microinstructions).
2.​ Microinstruction:​

○​ A low-level instruction that specifies individual operations, such as moving


data between registers or activating certain parts of the ALU.
3.​ Control Unit (CU):​

○​ The CU uses the address of the current microinstruction to fetch it from


the control memory. The microinstruction then generates control signals
that guide the operation of the CPU.
4.​ Program Counter (PC):​

○​ It holds the address of the next microinstruction to be fetched from control


memory.
5.​ Sequencer:​

○​ The sequencer manages the flow of execution by determining the address


of the next microinstruction (either sequential or conditional jump).

Step-by-Step Working of Microprogrammed Control:

1.​ Instruction Fetch:​

○​ The CPU fetches a high-level instruction (e.g., ADD R1, R2).


2.​ Fetch Microinstruction:​

○​ The Program Counter (PC) provides the address of the first


microinstruction for the ADD operation.
○​ The Control Unit (CU) fetches the first microinstruction from Control
Memory.
3.​ Microinstruction Decoding:​

○​ The microinstruction is decoded. This microinstruction specifies what


small operation to perform, such as moving data, activating the ALU, etc.
4.​ Generate Control Signals:​

○​ The decoded microinstruction generates control signals to operate the


components of the CPU (e.g., ALU, registers, memory).
5.​ Execute Operation:​

○​ The CPU performs the operation specified by the microinstruction (e.g.,


ALU adds values from R1 and R2).
6.​ Fetch Next Microinstruction:​

○​ The sequencer determines the address of the next microinstruction (it may
jump or continue sequentially) and updates the Program Counter (PC).
7.​ Repeat the Cycle:​

○​ The process repeats for each microinstruction until the full operation (e.g.,
ADD) is completed.

Pictorial Representation of Microprogrammed Control:


+---------------------+
| High-Level | <-- Fetch high-level instruction (e.g., ADD R1, R2)
| Instruction (ADD) |
+---------------------+
|
v
+---------------------+
| Program Counter | <-- Provide address for the first microinstruction
| (PC) |
+---------------------+
|
v
+---------------------+
| Control Memory | <-- Fetch the first microinstruction (e.g., move R1 to ALU)
+---------------------+
|
v
+---------------------+
| Microinstruction | <-- Decode microinstruction (e.g., ALU operation)
+---------------------+
|
v
+---------------------+
| Control Signals | <-- Generate signals (e.g., ALU, registers)
+---------------------+
|
v
+---------------------+
| Execute Operation | <-- Perform the operation (e.g., ALU adds R1 and R2)
+---------------------+
|
v
+---------------------+
| Sequencer / PC | <-- Update PC, fetch next microinstruction
+---------------------+
|
v
(Repeat for the next microinstruction)

Example of Microprogrammed Control in Action:

Let’s take an example where the instruction is "ADD R1, R2". This instruction is broken
down into several microinstructions to perform the addition:

1.​ First Microinstruction: Move data from register R1 to the ALU input.
2.​ Second Microinstruction: Move data from register R2 to the ALU input.
3.​ Third Microinstruction: Perform the addition operation in the ALU.
4.​ Fourth Microinstruction: Store the result back in R2.

Each of these microinstructions would be fetched from the control memory, decoded,
and executed step by step.
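The four-microinstruction breakdown above can be sketched as a tiny interpreter loop. This is a minimal illustration under stated assumptions, not a real control unit: the microinstruction names (MOVE_TO_ALU_A, ALU_ADD, etc.) and the control-memory layout are invented for this example.

```python
# A minimal sketch of a microprogrammed control loop. Registers live in
# a dictionary; "control memory" is just a list of microinstructions.
# All names here are illustrative, not taken from any real CPU.

def run_microprogram(control_memory, regs):
    """Fetch, decode, and execute microinstructions until HALT."""
    mpc = 0                             # microprogram counter (the "PC")
    alu_a = alu_b = alu_out = 0
    while True:
        op, arg = control_memory[mpc]   # fetch the microinstruction
        if op == "MOVE_TO_ALU_A":       # decode + act as "control signals"
            alu_a = regs[arg]
        elif op == "MOVE_TO_ALU_B":
            alu_b = regs[arg]
        elif op == "ALU_ADD":
            alu_out = alu_a + alu_b
        elif op == "STORE_RESULT":
            regs[arg] = alu_out
        elif op == "HALT":
            return regs
        mpc += 1                        # sequencer: next microinstruction

# The ADD R1, R2 example from the text, broken into microinstructions:
ADD_MICROPROGRAM = [
    ("MOVE_TO_ALU_A", "R1"),
    ("MOVE_TO_ALU_B", "R2"),
    ("ALU_ADD", None),
    ("STORE_RESULT", "R2"),
    ("HALT", None),
]

regs = run_microprogram(ADD_MICROPROGRAM, {"R1": 5, "R2": 7})
print(regs["R2"])   # 12
```

Changing the behaviour of this "control unit" only requires editing the list in memory, which is exactly the flexibility advantage discussed below.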

Advantages of Microprogrammed Control:

●​ Flexibility: It's easier to modify or change the control unit since microinstructions
can be updated in memory.
●​ Simplified Design: Microprogrammed control is easier to design for complex
instruction sets, as it breaks down high-level instructions into smaller,
manageable microinstructions.
●​ Cost-effective: Easier to implement for complex processors without needing
complex hardwired logic.

Disadvantages of Microprogrammed Control:

●​ Slower Execution: The execution time can be slower compared to hardwired


control since it involves fetching and decoding microinstructions.
●​ More Memory: Requires extra memory to store the microprogram.

Summary:
●​ Microprogrammed control uses a sequence of microinstructions to generate
control signals for CPU operations.
●​ It provides flexibility and simplifies complex instruction sets but may be slower
than hardwired control.
●​ It works by fetching, decoding, and executing microinstructions from control
memory, generating signals for operations like moving data or performing
calculations.

Here’s a tabular comparison between Microprogrammed Control and Hardwired Control:

Aspect                          | Microprogrammed Control                                                                      | Hardwired Control
--------------------------------|----------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------
Control Mechanism               | Uses microinstructions stored in control memory (ROM/RAM).                                   | Uses fixed logic circuits to generate control signals.
Flexibility                     | Highly flexible; microinstructions can be changed or updated easily.                         | Less flexible; any change requires modifying hardware.
Design Complexity               | Easier to design for complex instructions.                                                   | More complex to design, especially for complex instruction sets.
Execution Speed                 | Slower than hardwired control due to the need to fetch and decode microinstructions.         | Faster execution since control signals are generated directly by hardware.
Memory Requirement              | Requires additional memory to store microprograms.                                           | No need for extra memory as control signals are generated directly.
Cost                            | Can be more cost-effective for complex instruction sets, as it simplifies design.            | May require more complex hardware, increasing the cost.
Modification                    | Easy to modify; changes in instructions can be handled by altering microprograms in memory.  | Difficult to modify; changes require hardware redesign.
Speed of Operation              | Slower, as fetching and decoding microinstructions takes time.                               | Faster, as control signals are generated directly by the hardware without an intermediate step.
Suitability for Complex Systems | Suitable for complex systems with intricate instructions.                                    | Suitable for simpler systems with fewer and more straightforward instructions.
Examples                        | Used in complex processors (e.g., general-purpose CPUs, modern microcontrollers).            | Used in simpler processors or earlier CPUs with limited instruction sets.

Summary of Key Differences:

●​ Microprogrammed control is more flexible and easier to modify, making it


suitable for complex and evolving instruction sets. However, it is generally slower
and requires additional memory.
●​ Hardwired control is faster and more efficient in terms of execution, but it is less
flexible and harder to modify, which makes it better suited for simpler systems or
fixed operations.

What is Pipelining?

Pipelining is a technique used in computer architecture to improve the throughput of


the processor by overlapping the execution of multiple instructions. It works by dividing
the execution of instructions into discrete stages, allowing multiple instructions to be
processed simultaneously at different stages of execution.

How Pipelining Works:

In a traditional, non-pipelined processor, instructions are executed one after another,


meaning each instruction must complete before the next one begins. However, in a
pipelined processor, the process of executing an instruction is broken down into several
smaller stages. Each stage handles a specific part of the instruction (e.g., fetching,
decoding, executing, and storing).

As soon as one stage completes its task for one instruction, it can begin working on the
next instruction, while other stages continue to work on different instructions. This
overlap of operations leads to increased instruction throughput (more instructions
executed per unit of time).
Stages of Pipelining:

Typically, a pipeline has the following stages (though the number of stages may vary):

1.​ Instruction Fetch (IF): The instruction is fetched from memory.


2.​ Instruction Decode (ID): The instruction is decoded to determine what action
needs to be taken.
3.​ Execution (EX): The operation is performed (e.g., ALU operation).
4.​ Memory Access (MEM): Data is read from or written to memory if needed.
5.​ Write-back (WB): The result is written back to the register or memory.

Example of Pipelining:

Imagine four instructions are to be executed:

1.​ Instruction 1: ADD R1, R2, R3 (Add the contents of registers R2 and R3,
store result in R1)
2.​ Instruction 2: SUB R4, R5, R6 (Subtract contents of R5 and R6, store result
in R4)
3.​ Instruction 3: MOV R7, R8 (Move contents of R8 to R7)
4.​ Instruction 4: MUL R9, R10, R11 (Multiply contents of R10 and R11, store
result in R9)

In a non-pipelined processor, these instructions would be executed sequentially, one


after another. However, in a pipelined processor, as one instruction moves from one
stage to the next, the next instruction can begin its first stage (fetch), and so on.

After some time, all instructions are in different stages of execution, and the processor
is continuously working on all instructions in parallel, maximizing throughput.
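The overlap described above can be visualized with a short sketch that prints which stage each of the four instructions occupies in each cycle (a space-time diagram). The stage names follow the IF/ID/EX/MEM/WB convention introduced earlier in this section.

```python
# Sketch: generate the space-time diagram of a 5-stage pipeline for
# n instructions. Instruction i enters IF one cycle after instruction
# i-1, so n instructions finish in (stages + n - 1) cycles.

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_table(n_instructions):
    """Return one row per instruction: the stage occupied in each cycle."""
    total_cycles = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = []
        for cycle in range(total_cycles):
            stage = cycle - i          # instruction i enters IF at cycle i
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else "--")
        rows.append(row)
    return rows

for i, row in enumerate(pipeline_table(4), start=1):
    print(f"I{i}: " + " ".join(f"{s:>3}" for s in row))
```

Running this shows that from cycle 4 onward all four instructions are in different stages at once, which is the throughput gain pipelining provides.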

Advantages of Pipelining:

●​ Increased Throughput: More instructions can be processed in the same amount


of time.
●​ Better Resource Utilization: Different parts of the processor are used
simultaneously, leading to efficient use of resources.
●​ Faster Execution: Since instructions are processed concurrently, the overall
execution time is reduced.

Challenges in Pipelining:
●​ Data Hazards: When one instruction depends on the result of a previous
instruction, causing a delay.
●​ Control Hazards: Occurs due to branching instructions that alter the flow of
execution, requiring pipeline flushing or stalling.
●​ Structural Hazards: When hardware resources are insufficient to support the
simultaneous execution of multiple instructions.

Conclusion:

Pipelining is a key technique in modern processors that increases efficiency and speed
by allowing multiple instructions to be processed simultaneously at different stages. This
leads to higher throughput and faster execution times, but it also introduces
complexities like hazards that must be managed effectively.

Now let us look at a real-life example that operates on the pipelining concept. Consider a
water bottle packaging plant. For this case, let there be 3 processes that a bottle should
go through: inserting the bottle (I), filling water in the bottle (F), and sealing the
bottle (S).

It will be helpful for us to label these stages as stage 1, stage 2, and stage 3. Let each
stage take 1 minute to complete its operation. Now, in a non-pipelined operation, a
bottle is first inserted in the plant, and after 1 minute it is moved to stage 2, where
water is filled. During this time, nothing is happening in stage 1. Likewise, when the
bottle is in stage 3, both stage 1 and stage 2 are idle. But in a pipelined operation,
while one bottle is in stage 2, the next bottle can be loaded into stage 1. In the same
way, while a bottle is in stage 3, there can be one bottle each in stages 1 and 2.
Therefore, at the end of stage 3, we receive a new bottle every minute.

Therefore, the average time taken to manufacture each bottle is:

Without pipelining = 9/3 minutes = 3 minutes per bottle

I F S | | | | | |
| | | I F S | | |
| | | | | | I F S (9 minutes)

With pipelining = 5/3 minutes ≈ 1.67 minutes per bottle

I F S | |
| I F S |
| | I F S (5 minutes)

Thus, pipelined operation increases the efficiency of a system.
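The bottle-plant arithmetic above can be checked with a couple of lines, assuming k one-minute stages and n bottles: a non-pipelined plant takes n * k minutes, while a pipelined one takes k + n - 1 minutes.

```python
# Sketch verifying the bottle-plant timing: k stages of one minute
# each, n bottles through the plant.

def total_time(n, k, pipelined):
    """Total minutes to finish n items in a k-stage, 1-minute-per-stage plant."""
    return (k + n - 1) if pipelined else n * k

n, k = 3, 3                                   # 3 bottles, 3 stages
without = total_time(n, k, pipelined=False)   # 9 minutes
with_p = total_time(n, k, pipelined=True)     # 5 minutes
print(without / n)                            # 3.0 minutes per bottle
print(round(with_p / n, 2))                   # 1.67 minutes per bottle
```

These match the 9-minute and 5-minute totals shown in the diagrams above.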

Design of a basic Pipeline

●​ In a pipelined processor, a pipeline has two ends, the input end and the

output end. Between these ends, there are multiple stages/segments such

that the output of one stage is connected to the input of the next stage and

each stage performs a specific operation.

●​ Interface registers are used to hold the intermediate output between two

stages. These interface registers are also called latch or buffer.

●​ All the stages in the pipeline along with the interface registers are controlled

by a common clock.

Execution in a pipelined processor

The execution sequence of instructions in a pipelined processor can be visualized using a
space-time diagram. For example, consider a processor having 4 stages and let there be 2
instructions to be executed. We can visualize the execution sequence through the
following space-time diagrams:

Non-Overlapped Execution

Stage / Cycle   1    2    3    4    5    6    7    8
S1              I1                  I2
S2                   I1                  I2
S3                        I1                  I2
S4                             I1                  I2

Total time = 8 cycles

Overlapped Execution

Stage / Cycle   1    2    3    4    5
S1              I1   I2
S2                   I1   I2
S3                        I1   I2
S4                             I1   I2

Total time = 5 cycles

Pipeline Stages

A RISC processor has a 5-stage instruction pipeline to execute all the instructions in
the RISC instruction set. Following are the 5 stages of the RISC pipeline with their
respective operations:

●​ Stage 1 (Instruction Fetch): In this stage the CPU fetches the instructions

from the address present in the memory location whose value is stored in

the program counter.

●​ Stage 2 (Instruction Decode): In this stage, the instruction is decoded and

register file is accessed to obtain the values of registers used in the

instruction.

●​ Stage 3 (Instruction Execute): In this stage some of activities are done such

as ALU operations.

●​ Stage 4 (Memory Access): In this stage, memory operands are read and

written from/to the memory that is present in the instruction.


●​ Stage 5 (Write Back): In this stage, computed/fetched value is written back

to the register present in the instructions.

Performance of a pipelined processor

Consider a ‘k’ segment pipeline with clock cycle time ‘Tp’. Let there be ‘n’ tasks to be
completed in the pipelined processor. The first instruction takes ‘k’ cycles to come out
of the pipeline, but the remaining ‘n – 1’ instructions take only ‘1’ cycle each, i.e., a
total of ‘n – 1’ cycles. So, the time taken to execute ‘n’ instructions in a pipelined
processor is:

ETpipeline = k + n – 1 cycles
= (k + n – 1) Tp

In the same case, for a non-pipelined processor, the execution time of ‘n’ instructions
will be:

ETnon-pipeline = n * k * Tp

So, speedup (S) of the pipelined processor over the non-pipelined processor, when ‘n’
tasks are executed on the same processor is:

S = Performance of non-pipelined processor /


Performance of pipelined processor

As the performance of a processor is inversely proportional to the execution time, we


have,

S = ETnon-pipeline / ETpipeline
=> S = [n * k * Tp] / [(k + n – 1) * Tp]
S = [n * k] / [k + n – 1]

When the number of tasks ‘n’ is significantly larger than k (that is, n >> k), the term
k + n – 1 approaches n, so:

S ≈ [n * k] / n
S = k

where ‘k’ is the number of stages in the pipeline.

Also,

Efficiency = Given speedup / Max speedup = S / Smax

We know that Smax = k, so:

Efficiency = S / k

Throughput = Number of instructions / Total time to complete the instructions

Throughput = n / [(k + n – 1) * Tp]

Note: The cycles per instruction (CPI) value of an ideal pipelined processor is 1.
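The performance formulas above can be wrapped into a small helper, as a sketch: speedup S = n*k / (k + n – 1), efficiency S / k, and throughput n divided by the pipelined execution time.

```python
# Sketch of the pipeline performance formulas, assuming a k-stage
# pipeline, n tasks, and clock cycle time Tp.

def pipeline_metrics(n, k, tp):
    et_pipeline = (k + n - 1) * tp       # ETpipeline = (k + n - 1) * Tp
    et_nonpipeline = n * k * tp          # ETnon-pipeline = n * k * Tp
    speedup = et_nonpipeline / et_pipeline   # S = n*k / (k + n - 1)
    efficiency = speedup / k             # Smax = k, so Efficiency = S / k
    throughput = n / et_pipeline         # instructions per unit time
    return speedup, efficiency, throughput

# With many tasks, speedup approaches k (the number of stages):
s, e, t = pipeline_metrics(n=1_000_000, k=5, tp=1)
print(round(s, 3))   # 5.0
```

This numerically confirms the limit S → k derived above for n >> k.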

What are Data Hazards?

In computer architecture, data hazards occur when there are dependencies between
instructions that are executed in a pipeline, potentially causing delays or incorrect
results. These hazards arise because one instruction needs to use the data that is being
modified or is yet to be written by a previous instruction.

Data hazards are critical issues in pipelined processors since multiple instructions are
executed concurrently at different stages. Without careful management, a data hazard
can lead to errors or delays in processing.

Types of Data Hazards:

There are three main types of data hazards:

1.​ Read After Write (RAW) Hazard (True Dependency):​

○​ This is the most common type of data hazard.


○​ Occurs when an instruction depends on the result of a previous
instruction that has not yet completed its write-back stage.

Example:​
Instruction 1: ADD R1, R2, R3 (R1 = R2 + R3)

Instruction 2: SUB R4, R1, R5 (R4 = R1 - R5)

■​ In this example, Instruction 2 tries to read the value of R1 before
Instruction 1 writes its result to R1.
■​ This can lead to incorrect results or delays because R1 isn't
updated in time.
2.​ Write After Read (WAR) Hazard (Anti-Dependency):Occurs when a write
operation happens after a read operation on the same register or memory
location, but the write happens before the read is completed.
This type of hazard is less common and can typically be avoided by reordering
instructions or using pipeline interlocks.

Example:​
Instruction 1: SUB R1, R2, R3 (R1 = R2 - R3)

Instruction 2: ADD R2, R4, R5 (R2 = R4 + R5)

■​ In this example, Instruction 2 writes to R2 after Instruction 1 reads
it. The write to R2 could happen before the read, causing data
inconsistency.
3.​ Write After Write (WAW) Hazard (Output Dependency):​

○​ Occurs when two instructions write to the same register or memory


location, and the second instruction writes before the first.
○​ This can lead to the overwriting of data if not properly handled.

Example:​
Instruction 1: ADD R1, R2, R3 (R1 = R2 + R3)

Instruction 2: SUB R1, R4, R5 (R1 = R4 - R5)

■​ Here, both instructions are writing to R1, and if Instruction 2 writes


before Instruction 1, the result from Instruction 1 will be lost.
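The three hazard types above can be classified mechanically from the registers each instruction reads and writes. The dictionary representation used here ("writes"/"reads" sets) is illustrative, not tied to any real ISA.

```python
# A minimal hazard classifier: given two instructions where `second`
# follows `first` in program order, report which dependencies exist.

def classify_hazards(first, second):
    """Return the set of hazard types between two instructions."""
    hazards = set()
    if first["writes"] & second["reads"]:
        hazards.add("RAW")    # second reads what first writes (true dependency)
    if first["reads"] & second["writes"]:
        hazards.add("WAR")    # second writes what first reads (anti-dependency)
    if first["writes"] & second["writes"]:
        hazards.add("WAW")    # both write the same location (output dependency)
    return hazards

# ADD R1, R2, R3  followed by  SUB R4, R1, R5  -> RAW on R1
i1 = {"writes": {"R1"}, "reads": {"R2", "R3"}}
i2 = {"writes": {"R4"}, "reads": {"R1", "R5"}}
print(classify_hazards(i1, i2))   # {'RAW'}
```

A pipeline's interlock logic performs essentially this check between in-flight instructions to decide when to stall or forward.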

Handling Data Hazards:

There are several techniques used to handle data hazards in pipelined processors:

1.​ Pipeline Stalling:​

○​ The pipeline is stalled (delayed) until the data dependency is resolved.


○​ For example, Instruction 2 will wait until Instruction 1 writes its result to
the register before it can proceed.
2.​ Data Forwarding (Bypassing):​
○​ Data is forwarded (or bypassed) directly from one pipeline stage to
another, eliminating the need for stalling.
○​ For example, the result from the execution stage of Instruction 1 can be
directly passed to the execution stage of Instruction 2, reducing delays.
3.​ Reordering Instructions:​

○​ Sometimes, instructions can be reordered to avoid data hazards.


○​ For example, swapping non-dependent instructions can allow one
instruction to proceed while waiting for another to complete.
4.​ Speculative Execution:​

○​ The processor can speculate the result of a branch or instruction and


execute subsequent instructions, resolving dependencies later.

Example of Data Hazard in Pipelining:


Cycle:          1    2    3    4    5    6    7    8
Instruction 1:  IF   ID   EX   MEM  WB
Instruction 2:       IF   ID   EX   MEM  WB
Instruction 3:            IF   ID   EX   MEM  WB
Instruction 4:                 IF   ID   EX   MEM  WB

●​ If Instruction 2 needs to use the value from Instruction 1 in a later stage (e.g., R1
in the example), a RAW hazard will occur, because Instruction 2 can't execute
until Instruction 1 writes its result back.

Summary:

●​ Data hazards occur when there are dependencies between instructions in a


pipelined processor that can cause delays or errors in execution.
●​ There are three types of data hazards: RAW (Read After Write), WAR (Write
After Read), and WAW (Write After Write).
●​ Techniques like pipeline stalling, data forwarding, and instruction reordering
are used to manage and resolve data hazards in pipelined processors.

What are Instruction Hazards?

Instruction hazards occur in pipelined processors when there are issues with the flow
of instructions due to dependencies or conflicts between instructions being executed in
parallel. These hazards prevent the pipeline from operating efficiently and can lead to
delays or incorrect execution of instructions.

In simple terms, instruction hazards occur when an instruction cannot be executed in


the pipeline as expected because of interactions with other instructions already in the
pipeline.

Types of Instruction Hazards:

There are three main types of instruction hazards:

1.​ Structural Hazards:​

○​ Occur when there are not enough hardware resources (like functional
units, registers, or memory) to support the simultaneous execution of
multiple instructions in the pipeline.
○​ Example: If a processor has only one memory unit and multiple
instructions need memory access at the same time, a structural hazard
occurs because the memory unit cannot serve multiple requests
simultaneously.
2.​ Data Hazards:​

○​ Occur when instructions depend on the results of previous instructions


that are still being processed in the pipeline.
○​ Example: If one instruction needs data that is being modified by a
previous instruction, it could lead to incorrect results or delays until the
data is available.
○​ We discussed Data Hazards earlier, and they include:
■​ Read After Write (RAW): One instruction reads data that another
instruction is writing to.
■​ Write After Read (WAR): One instruction writes to a location that
another instruction is reading from.
■​ Write After Write (WAW): Two instructions write to the same
location, and the order of writes is crucial.
3.​ Control Hazards:​

○​ Occur due to branch instructions (like if, goto, etc.), which change the
flow of execution.
○​ Control hazards arise when the processor doesn’t know the target of a
branch instruction until it has been fully decoded, potentially causing
instructions after the branch to be fetched or executed prematurely.
○​ Example: If a branch instruction is encountered, the processor has to wait
to determine the target address before continuing to fetch subsequent
instructions. If it fetches the wrong instruction, a control hazard occurs.

How Instruction Hazards Affect Pipelining:

Instruction hazards can cause delays or incorrect results in a pipelined processor, and
different techniques are used to handle them:

●​ Structural hazards can be resolved by adding more resources, such as


additional memory units or functional units.
●​ Data hazards can be handled by techniques like pipeline stalling, data
forwarding (bypassing), or reordering instructions to avoid dependencies.
●​ Control hazards are usually managed using branch prediction, where the
processor guesses the target of the branch instruction and begins executing it in
advance, or by stalling the pipeline until the branch target is known.
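As a sketch of the branch-prediction idea mentioned above, here is the simplest possible predictor: one bit per branch address that remembers the last outcome. Real predictors (e.g., 2-bit saturating counters) are more elaborate; this only illustrates the mechanism.

```python
# A 1-bit branch predictor sketch: predict that each branch will do
# whatever it did last time; default to "not taken" for new branches.

class OneBitPredictor:
    def __init__(self):
        self.last_taken = {}          # branch address -> last outcome (bool)

    def predict(self, addr):
        return self.last_taken.get(addr, False)

    def update(self, addr, taken):
        self.last_taken[addr] = taken

p = OneBitPredictor()
outcomes = [True, True, True, False]   # a loop branch: taken 3x, then exits
correct = 0
for taken in outcomes:
    if p.predict(0x40) == taken:      # 0x40 is a made-up branch address
        correct += 1
    p.update(0x40, taken)
print(correct)   # 2  (mispredicts the first iteration and the loop exit)
```

Each misprediction corresponds to a pipeline flush in a real processor, which is exactly the cost control hazards impose.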

Pictorial Representation of Instruction Hazards:


+-----------------------+
| Instruction Fetch(IF) |  --> Instruction 1
+-----------------------+
           |
           v
+-----------------------+
| Instruction Decode(ID)|  --> Instruction 1
+-----------------------+
           |
           v
+-----------------------+
|     Execute (EX)      |  --> Instruction 1
+-----------------------+
           |
           v
+-----------------------+      +--------------------+
|  Memory Access (MEM)  |      | Instruction 2 (EX) |
+-----------------------+      +--------------------+
           |
           v
+-----------------------+
|    Write Back (WB)    |
+-----------------------+

In this example:
●​ Instruction 1 goes through the standard pipeline stages.
●​ Instruction 2 may have a data or control dependency on Instruction 1. This
could cause a data hazard or control hazard if the instructions are not properly
managed in the pipeline.

Conclusion:

●​ Instruction hazards in pipelined processors occur when there are dependencies


or conflicts between instructions, leading to delays or errors in execution.
●​ Structural hazards happen due to insufficient hardware resources.
●​ Data hazards arise from dependencies between instructions.
●​ Control hazards are caused by instructions that alter the flow of control (e.g.,
branches).
●​ Various techniques, such as stalling, forwarding, and branch prediction, are
used to handle and minimize the impact of instruction hazards.

Influence of Instruction Sets in Computer Architecture

In computer architecture, the instruction set (also known as the Instruction Set
Architecture or ISA) plays a crucial role in defining how a processor functions and
interacts with software. It is essentially a collection of instructions that the CPU can
execute, and it defines the set of operations, formats, and control mechanisms the
processor supports.

The design of the instruction set has a significant influence on the overall architecture,
performance, and efficiency of the processor.

How Instruction Sets Influence Computer Architecture:

1.​ Processor Design:​

○​ The instruction set directly affects how the processor is designed. The
types of instructions and the number of operands required for each
instruction determine the complexity of the processor's control unit and
data path.
○​ RISC (Reduced Instruction Set Computing) processors have simpler
instruction sets with fewer instructions, which typically require more
cycles per instruction but allow for easier and faster execution of each
instruction.
○​ CISC (Complex Instruction Set Computing) processors have more
complex instructions, often capable of performing several operations in
one instruction, which can reduce the number of instructions but may
involve more complex circuitry.
2.​ Performance:​

○​ The choice of instruction set affects execution speed. For example, in a


RISC architecture, each instruction is designed to be simple and execute
in a single clock cycle. This can result in faster execution of programs, as
there is less variation in instruction execution times.
○​ On the other hand, CISC architectures may execute more complex
instructions in fewer steps, which can sometimes be more efficient for
certain tasks but may involve more cycles for other types of operations.
3.​ Memory Usage:​

○​ The size and format of instructions can affect how much memory is
required for storing instructions. In CISC, each instruction might be
longer, as it encodes more operations or addressing modes. In RISC,
instructions are typically of a fixed length, which can make instruction
fetch and decode simpler and faster, but also might lead to larger
programs as more instructions are needed.
○​ The design of the instruction set also influences data storage (e.g., how
registers and memory are addressed), which impacts both the size and
performance of programs.
4.​ Compiler Design:​

○​ The instruction set defines the type of instructions that a compiler can
use to generate machine code. RISC instruction sets tend to rely on
simpler, more frequent instructions, making it easier for compilers to
optimize code. Compilers for CISC instruction sets must deal with more
complex, multi-operation instructions, which can make code generation
more challenging.
○​ A well-designed instruction set enables better optimizations by the
compiler, leading to improved performance of the resulting programs.
5.​ Instruction Execution Time:​

○​ In CISC architectures, instructions vary in length and complexity, which


means they may take a different number of clock cycles to execute. A
single instruction might perform multiple tasks, which could lead to
longer execution times for some operations.
○​ In RISC architectures, instructions are designed to complete in a fixed
number of clock cycles, typically one cycle, which simplifies performance
prediction and can lead to more predictable and efficient execution.
6.​ System Complexity and Cost:​

○​ CISC architectures tend to have more complex processors since they


support a larger and more intricate set of instructions, which requires
more complex decoding and control mechanisms.
○​ RISC architectures, with their simpler instruction sets, generally lead to
simpler processor designs, which can reduce the cost of manufacturing
and improve reliability.
7.​ Instruction Set Extensions:​

○​ Modern processors often include extended instruction sets to enhance


performance for specific tasks, such as multimedia processing or
cryptographic operations. For example, Intel processors include SSE
(Streaming SIMD Extensions) and AVX (Advanced Vector Extensions),
which are specialized instructions designed to speed up vector and
floating-point operations.
○​ These extensions allow processors to perform specific tasks more
efficiently but may complicate compatibility and performance
optimization across different systems.

Summary of the Influence on Instruction Sets:


Factor                     | RISC (Reduced Instruction Set Computing)                       | CISC (Complex Instruction Set Computing)
---------------------------|----------------------------------------------------------------|-------------------------------------------------------------------
Processor Design           | Simple, smaller instruction set leads to simpler CPU design.   | More complex, larger instruction set requires complex CPU design.
Execution Speed            | Fast due to simple instructions that execute in one cycle.     | May be slower per instruction, but fewer instructions overall.
Memory Usage               | Fixed-length instructions, often resulting in larger programs. | Variable-length instructions, leading to efficient use of memory.
Compiler Design            | Easier to optimize due to simple instructions.                 | More complex code generation due to multi-operation instructions.
System Complexity & Cost   | Simpler, lower-cost systems.                                   | More complex, higher-cost systems.
Instruction Execution Time | Typically 1 cycle per instruction.                             | Varies per instruction, may involve multiple cycles.

Conclusion:

The instruction set profoundly influences the design and performance of a processor.
While RISC focuses on simplicity and efficiency with smaller, uniform instructions,
CISC emphasizes a more complex set of instructions that can do more in a single step.
The decision between RISC and CISC impacts the processor's speed, memory usage,
system complexity, and how compilers optimize code, all of which contribute to the
overall performance and efficiency of a computer system.
Data Path and Control Considerations in Computer Architecture

In computer architecture, the data path and control are fundamental components that
work together to enable the processor to execute instructions efficiently.

1. Data Path:

The data path refers to the collection of components that are responsible for moving
data within the processor. These components include registers, multiplexers, arithmetic
logic units (ALUs), buses, and memory units. The data path is essentially the hardware
infrastructure that allows data to flow between different parts of the processor for
computation and storage.

Key elements of the data path include:

●​ Registers: Small, fast storage locations used to store data and intermediate
results during instruction execution.
●​ ALU (Arithmetic Logic Unit): The part of the processor that performs arithmetic
and logical operations (such as addition, subtraction, AND, OR, etc.).
●​ Multiplexers: Devices that select one of several input signals based on a control
signal, enabling different data sources to be routed to the appropriate
destination.
●​ Buses: Shared pathways that allow data to be transferred between components
within the processor (e.g., between registers and the ALU).
●​ Memory Units: Locations for storing instructions and data, such as caches, main
memory, and registers.

The data path is designed to efficiently process and move data in response to the
instructions being executed by the CPU.

2. Control:

The control unit manages the operation of the data path by generating the necessary
control signals. These control signals dictate the actions that different components in
the data path should take. The control unit can be designed in two ways: hardwired
control or microprogrammed control.
Key aspects of the control unit include:

●​ Control Signals: Signals that instruct various parts of the data path on what
actions to perform. For example, control signals can specify whether the ALU
should add or subtract, or which register to write to.
●​ Instruction Decoding: The control unit decodes the incoming instruction to
determine what needs to be done. The instruction set architecture (ISA) defines
the set of instructions that the control unit must handle.
●​ Execution of Instructions: Based on the decoded instruction, the control unit
will activate specific components of the data path to carry out the operation. For
example, it may select operands from registers, send them to the ALU, or write
back results to memory or registers.

Data Path and Control Relationship:

●​ The data path handles the flow and manipulation of data, while the control
unit coordinates the operations of the data path components.
●​ When the control unit receives an instruction, it decodes it and generates
control signals that direct the data path components to perform the necessary
operations (e.g., arithmetic, memory access, etc.).
●​ The interaction between the data path and control unit ensures that the correct
data is processed at the correct time and stored in the appropriate location.
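The split described above can be caricatured in a few lines: the "control signals" are just a dictionary, and the data path (a register file plus an ALU) carries out whatever they specify. The signal names (alu_op, src_a, src_b, dest) are invented for illustration.

```python
# Toy sketch of the data path / control split. The control unit's job
# is reduced to producing a dict of signals; the data path routes
# operands through the ALU and writes the result back.

def alu(op, a, b):
    """Data path component: perform the arithmetic the signals request."""
    return {"ADD": a + b, "SUB": a - b}[op]

def execute(signals, regs):
    """Data path: register-file read -> ALU -> write-back."""
    a = regs[signals["src_a"]]        # register file read port A
    b = regs[signals["src_b"]]        # register file read port B
    regs[signals["dest"]] = alu(signals["alu_op"], a, b)   # write-back
    return regs

# The control unit "decodes" ADD R1, R2, R3 into these signals:
signals = {"alu_op": "ADD", "src_a": "R2", "src_b": "R3", "dest": "R1"}
regs = execute(signals, {"R1": 0, "R2": 4, "R3": 6})
print(regs["R1"])   # 10
```

Note that the data path code never inspects the instruction itself; it only obeys the signals, which is precisely the division of labor between the two units.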

Data Path and Control in the Context of a Pipeline:

In pipelined processors, the data path and control unit must be carefully designed to
handle multiple instructions in parallel. The data path is structured in stages (e.g.,
instruction fetch, decode, execute, memory access, and write-back), and the control
unit generates the appropriate control signals for each stage.

Key Considerations in Data Path and Control Design:

1.​ Efficiency: The data path must be designed to move data quickly between
registers, memory, and the ALU. Minimizing the number of cycles it takes to
complete each instruction is important for improving processor performance.​
2.​ Hazards: Data hazards (like read-after-write) and control hazards (like
branching) must be managed to avoid delays and ensure correct execution.
Techniques like data forwarding (bypassing) and pipeline stalling are often
used.​

3.​ Complexity: The complexity of the control unit increases with the sophistication
of the instruction set. A CISC processor may require more complex control
signals than a RISC processor due to the greater variety of instructions.​

4.​ Synchronization: In a pipelined processor, the data path components must be


synchronized across the pipeline stages to ensure correct timing and execution.
The control unit must manage the timing of data flows and instruction
executions effectively.​
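The data-hazard consideration above can be sketched as a simple read-after-write (RAW) check between adjacent pipeline instructions. The instruction encoding below (a destination register plus a tuple of source registers) is a simplified assumption for illustration, not a real ISA.

```python
# Minimal sketch of read-after-write (RAW) hazard detection between two
# adjacent instructions in a pipeline. The (dest, sources) tuple format
# is an illustrative assumption, not a real instruction encoding.

def raw_hazard(earlier, later):
    """Return True if 'later' reads a register that 'earlier' writes."""
    dest, _ = earlier
    _, sources = later
    return dest in sources

# (destination_register, (source_registers))
i1 = ("r1", ("r2", "r3"))   # r1 = r2 + r3
i2 = ("r4", ("r1", "r5"))   # r4 = r1 + r5  -> reads r1 before write-back

print(raw_hazard(i1, i2))   # True: forward r1 to the ALU, or stall
print(raw_hazard(i2, i1))   # False: no dependence in this direction
```

A real pipeline resolves a detected hazard either by forwarding (bypassing) the ALU result directly to the dependent instruction or by stalling until write-back completes.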

Summary:

Aspect              Data Path                                  Control

Definition          The hardware components that move and      The unit that generates control
                    process data (ALU, registers, buses).      signals to coordinate the data path.

Components          ALU, registers, memory, multiplexers,      Control signals, instruction
                    buses.                                     decoding, and execution.

Function            Performs arithmetic/logical operations,    Directs the data path to perform
                    stores and retrieves data.                 the correct operation.

Role in Pipelining  Data flows through different stages of     Controls the flow and timing of data
                    the pipeline (fetch, decode, execute,      through the pipeline stages.
                    etc.).

Key Considerations  Data transfer speed, data hazards, and     Generating correct control signals,
                    correct routing.                           instruction decoding.

Conclusion:

The data path and control unit are integral parts of a processor's architecture. The
data path performs the actual computations and data transfers, while the control unit
ensures that the processor operates correctly by generating the control signals
necessary for executing instructions. Together, they determine how efficiently a
processor can handle and execute instructions, especially in complex, pipelined, or
multi-core systems.
Several important changes are made to the basic processor data path to support pipelined execution:

1. There are separate instruction and data caches that use separate address and data
connections to the processor. This requires two versions of the MAR register, IMAR for
accessing the instruction cache and DMAR for accessing the data cache.

2. The PC is connected directly to the IMAR, so that the contents of the PC can be
transferred to IMAR at the same time that an independent ALU operation is taking
place.

3. The data address in DMAR can be obtained directly from the register file or from the
ALU to support the register indirect and indexed addressing modes.

4. Separate MDR registers are provided for read and write operations. Data can be
transferred directly between these registers and the register file during load and store
operations without the need to pass through the ALU.

5. Buffer registers have been introduced at the inputs and output of the ALU. These are
registers SRC1, SRC2, and RSLT. Forwarding connections may be added if desired.

6. The instruction register has been replaced with an instruction queue, which is loaded
from the instruction cache.

7. The output of the instruction decoder is connected to the control signal pipeline.
MEMORY SYSTEM
BASIC CONCEPTS:

Basic Concepts of Memory System in Computer Architecture

In computer architecture, the memory system is one of the most important
subsystems, responsible for storing and retrieving data efficiently. The memory
system in a computer is hierarchical, with different levels of memory designed to
provide a balance between speed, cost, and storage capacity. A well-designed
memory system is critical for achieving high performance in modern computing.

Here, we will cover the essential concepts of the memory system in computer
architecture in detail.

1. Memory Hierarchy in Computer Architecture

The memory hierarchy in a computer is organized to optimize performance based
on speed and cost. The hierarchy moves from the fastest and most expensive
memory at the top to the slowest and least expensive memory at the bottom. This
organization allows for efficient access to data, reducing bottlenecks in performance.

Memory Levels in the Hierarchy:

1.​ Registers:
○​ Location: Inside the CPU (central processing unit).
○​ Purpose: Holds the data that the CPU is currently processing. Registers
store intermediate values for computations and instructions.
○​ Speed: Extremely fast, often measured in nanoseconds (ns).
○​ Size: Very small, usually a few bits or bytes (e.g., 32 or 64 bits per
register).
○​ Access: Directly accessible by the CPU.
2.​ Cache Memory:
○​ Location: Between the CPU and main memory (RAM). Typically
includes L1, L2, and sometimes L3 caches.
○​ Purpose: Stores frequently accessed data and instructions, reducing
the average time to access memory.
○​ Speed: Faster than RAM but slower than registers. L1 cache is the
fastest, followed by L2, and L3 cache.
○​ Size: Typically ranges from a few kilobytes (KB) for L1 cache to a few
megabytes (MB) for L3 cache.
○​ Access: Quick access due to proximity to the CPU.
3.​ Main Memory (RAM):
○​ Location: Typically installed on the motherboard or as separate chips.
○​ Purpose: Stores the operating system, applications, and data currently
in use. It is the main working memory of the computer.
○​ Speed: Slower than cache memory but faster than secondary storage.
○​ Size: Typically measured in gigabytes (GB) for modern computers.
○​ Access: Random access, meaning any location can be accessed
directly.
4.​ Secondary Storage:
○​ Location: External to the CPU, usually on hard disk drives (HDDs) or
solid-state drives (SSDs).
○​ Purpose: Used for long-term storage of data, programs, and files.
○​ Speed: Much slower than RAM but offers higher storage capacity.
○​ Size: Can range from hundreds of gigabytes to several terabytes (TB).
○​ Access: Slower access time due to mechanical components (HDD) or
non-volatile memory (SSD).
5.​ Tertiary and Off-line Storage:
○​ Location: External storage devices such as optical disks, magnetic
tapes, or cloud storage.
○​ Purpose: Used for backup, archival, and long-term storage of data.
○​ Speed: Slowest in the memory hierarchy.
○​ Size: Typically much larger in capacity compared to secondary storage.

Key Concept: The Memory Hierarchy is Designed for Trade-offs

●​ Registers are the fastest and smallest, directly accessible by the CPU, but
they are limited in capacity.
●​ Cache is smaller than main memory but much faster and stores recently used
data to reduce access times.
●​ Main memory (RAM) is slower but provides more storage for running
applications and data.
●​ Secondary storage offers large capacity at a slower speed but provides
non-volatile storage.
●​ Tertiary storage provides an even larger capacity but is the slowest for
access.
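The speed/cost trade-off above is commonly quantified as average memory access time (AMAT): the hit time plus the miss rate times the miss penalty. The cycle counts below are illustrative assumptions, not figures from the text.

```python
# Average memory access time (AMAT) for a two-level hierarchy:
#   AMAT = hit_time + miss_rate * miss_penalty
# The cycle counts used here are illustrative assumptions.

def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# Cache hits take 1 cycle; 5% of accesses miss and pay 100 cycles for RAM.
print(amat(hit_time=1, miss_rate=0.05, miss_penalty=100))  # 6.0 cycles
```

Even a small miss rate dominates the average when the miss penalty is large, which is why each hierarchy level tries to capture as many accesses as possible before falling through to the next.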

2. Types of Memory in Computer Systems


There are various types of memory used in computer systems, each with specific
purposes and characteristics:

a. Volatile Memory:

●​ Definition: Memory that loses its contents when the power is turned off.
●​ Examples:
○​ RAM (Random Access Memory): Used as the main working memory for
active data and instructions.
○​ Cache Memory: Temporary high-speed memory used to store
frequently accessed data and instructions.
○​ Registers: Small, fast storage locations within the CPU used for
operations.

b. Non-Volatile Memory:

●​ Definition: Memory that retains data even when the power is off.
●​ Examples:
○​ ROM (Read-Only Memory): Contains firmware or software that is
permanently programmed during manufacturing (e.g., BIOS in a
computer).
○​ Flash Memory (SSD): A form of non-volatile memory used in modern
storage devices like USB flash drives, SSDs, and memory cards.
○​ Hard Disk Drives (HDD): Magnetic storage used for secondary storage,
providing non-volatile storage at a larger capacity.

c. Semi-Volatile Memory:

●​ Definition: Memory that retains data for some period after power is turned off
but may eventually lose it.
●​ Examples:
○​ DRAM (Dynamic RAM): Requires periodic refreshes to retain data. It is
the most common type of main memory.
○​ SRAM (Static RAM): Faster than DRAM and does not require refreshing,
but it is still volatile.

3. Memory Addressing and Access Methods

a. Address Space:
●​ Address space refers to the range of addresses that a system can use to
identify data in memory. This can include the address space of the CPU (for
registers and cache) and the main memory (RAM).

b. Memory Access Methods:

●​ Random Access: In random access memory, data can be accessed directly
and in any order. This is typical of RAM and cache memory.
●​ Sequential Access: Data is accessed in a specific sequence, one item at a
time. This is typical of magnetic tape storage.
●​ Direct Memory Access (DMA): A method where peripherals (like disk drives)
can directly access the memory without CPU intervention, improving data
transfer speeds.

4. Memory Organization

Memory in a computer can be organized in different ways to optimize performance
and efficiency. Common memory organization techniques include:

a. Byte Addressable Memory:

●​ Memory is organized in units called bytes (8 bits). Each byte has a unique
address, and the CPU can access data one byte at a time.
●​ This is typical for most modern computer systems.

b. Word Addressable Memory:

●​ Memory is organized in larger units, called words (e.g., 16-bit or 32-bit words).
A word is the natural unit of data used by the CPU. This organization is used in
some older systems.
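The byte-addressable scheme above can be made concrete with a small calculation: on a machine with 4-byte words, a byte address splits into a word index and a byte offset within that word. The 32-bit word size is an assumption for illustration.

```python
# In a byte-addressable machine with 4-byte (32-bit) words, a byte
# address splits into a word index and a byte offset within the word.

WORD_SIZE = 4  # bytes per word (assumption: 32-bit words)

def split_address(byte_addr):
    return byte_addr // WORD_SIZE, byte_addr % WORD_SIZE

print(split_address(13))  # (3, 1): byte 13 lives in word 3 at offset 1
print(split_address(32))  # (8, 0): word-aligned address
```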

5. Memory Management Techniques

The operating system (OS) is responsible for managing the memory in a computer
system. Some of the key techniques used in memory management are:

a. Paging:
●​ Paging divides memory into fixed-size blocks called pages (in virtual memory
systems). The operating system manages the mapping of logical addresses
(virtual memory) to physical addresses (RAM) using a page table.
●​ Paging allows efficient use of memory and supports virtual memory, where
processes may use more memory than physically available.
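The page-table mapping described above can be sketched as a dictionary lookup: a virtual address is split into a virtual page number and an offset, and the page table maps the page number to a physical frame. The 4 KB page size and the table contents are illustrative assumptions.

```python
# Sketch of virtual-to-physical address translation with paging.
# The page size and page-table contents are illustrative assumptions.

PAGE_SIZE = 4096  # 4 KB pages

# Page table: virtual page number (VPN) -> physical frame number.
page_table = {0: 5, 1: 2, 2: 7}

def translate(virtual_addr):
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)
    if vpn not in page_table:
        raise LookupError("page fault")  # the OS would load the page here
    return page_table[vpn] * PAGE_SIZE + offset

print(translate(4100))  # VPN 1, offset 4 -> frame 2 -> physical 8196
```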

b. Segmentation:

●​ In segmented memory systems, memory is divided into segments of
variable lengths. Each segment can represent a logical unit of data, such as
code, stack, or heap.
●​ Segmentation allows more flexible memory management compared to
paging, as segments can grow or shrink in size.

c. Memory Protection:

●​ Memory protection ensures that processes cannot access memory allocated
to other processes, preventing memory corruption or unauthorized access.
●​ The MMU (Memory Management Unit) is responsible for enforcing memory
protection policies.

d. Garbage Collection:

●​ In languages like Java or Python, garbage collection automatically frees up
memory that is no longer being used by the program, preventing memory
leaks.
●​ The OS or runtime environment tracks objects in memory and reclaims
memory when they are no longer reachable.

6. Cache Memory and Cache Coherence

a. Cache Memory:

●​ Cache memory is a small but extremely fast memory located between the
CPU and main memory. It stores copies of frequently accessed data to speed
up data retrieval.
●​ Cache hits occur when data is found in the cache, leading to faster access
times. Cache misses occur when the data is not in the cache, requiring access
to slower main memory.
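The hit/miss behavior above can be illustrated with a tiny direct-mapped cache simulation. The cache geometry (4 lines of 16-byte blocks) is an illustrative assumption.

```python
# Tiny direct-mapped cache simulation counting hits and misses.
# The geometry (4 lines, 16-byte blocks) is an illustrative assumption.

NUM_LINES, BLOCK_SIZE = 4, 16

def simulate(addresses):
    lines = [None] * NUM_LINES          # tag stored in each cache line
    hits = misses = 0
    for addr in addresses:
        block = addr // BLOCK_SIZE
        index, tag = block % NUM_LINES, block // NUM_LINES
        if lines[index] == tag:
            hits += 1
        else:
            misses += 1
            lines[index] = tag          # fetch the block from main memory
    return hits, misses

# Re-reading the same 16-byte block turns later accesses into hits.
print(simulate([0, 4, 8, 0, 64]))  # (3, 2)
```

Addresses 0, 4, and 8 all fall in block 0, so only the first access misses; address 64 maps to the same cache line with a different tag, causing a second miss.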
b. Cache Coherence:

●​ Cache coherence ensures that multiple cache copies of the same memory
location are consistent across all cores in multi-core systems.
●​ Cache coherence protocols like MESI (Modified, Exclusive, Shared, Invalid)
are used to maintain consistency across caches in multi-core processors.
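A few MESI transitions for a single cache line, as seen by one core, can be sketched as a state table. The event names here are a simplified assumption; a real protocol also exchanges bus messages between caches and memory.

```python
# Simplified sketch of MESI state transitions for one cache line, from
# the point of view of a single core. Event names are an assumption; a
# real coherence protocol also involves bus/interconnect messages.

MESI = {
    ("Invalid",   "local_read"):   "Shared",    # fetch; others may hold it
    ("Invalid",   "local_write"):  "Modified",  # fetch and invalidate others
    ("Shared",    "local_write"):  "Modified",
    ("Shared",    "remote_write"): "Invalid",   # another core modified it
    ("Modified",  "remote_read"):  "Shared",    # write back, then share
    ("Exclusive", "local_write"):  "Modified",  # silent upgrade, no bus traffic
}

state = "Invalid"
for event in ["local_read", "local_write", "remote_read"]:
    state = MESI.get((state, event), state)
print(state)  # "Shared"
```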

7. Virtual Memory

●​ Virtual memory extends the concept of memory beyond physical RAM. It
allows a computer to use disk space (e.g., swap space) as if it were additional
RAM.
●​ Virtual memory allows large programs to run on systems with limited physical
memory, using techniques like paging or segmentation.
●​ The operating system uses the page table to translate virtual addresses into
physical addresses.

Conclusion

The memory system in computer architecture is a hierarchical structure designed to
balance speed, cost, and capacity. From the fastest registers and cache to
secondary storage like HDDs and SSDs, each level of memory serves a specific
purpose to optimize the overall system performance.

Key concepts such as paging, segmentation, memory protection, and virtual
memory are fundamental for memory management, enabling efficient use of
memory resources and supporting multi-tasking and larger applications.

Semiconductor RAM (Random Access Memory)

Semiconductor RAM is a type of volatile memory, meaning that it loses all stored
data when the power is turned off. Unlike non-volatile memory (e.g., hard drives,
SSDs), semiconductor RAM is designed for fast data access and is typically used in
computers, smartphones, and other electronic devices to store data temporarily
while the device is running.

Semiconductor RAM is mainly categorized into Static RAM (SRAM) and Dynamic
RAM (DRAM), each with different characteristics, as explained below.
1. Static RAM (SRAM)

Static RAM (SRAM) is a type of semiconductor RAM that uses flip-flops (a kind of
circuit) to store each bit of data. It is called "static" because it doesn't need to be
refreshed, unlike DRAM, which requires refreshing to retain data.

Key Features:

●​ No Refreshing Needed: Unlike DRAM, SRAM does not need periodic refreshing
of data.
●​ Faster: SRAM is faster than DRAM because accessing data in SRAM is quicker
and more direct.
●​ More Expensive: Since SRAM uses more transistors per bit (usually 4 to 6
transistors per bit), it is more expensive to produce.
●​ Smaller Capacity: SRAM tends to have smaller storage capacities compared
to DRAM because of its higher cost and larger space requirements.
●​ Used in Cache Memory: Because of its high speed, SRAM is commonly used
for CPU cache memory, which is critical for performance.

Operation of SRAM:

●​ Read Operation: When the CPU needs to access data, it sends an address to
the SRAM. The data is immediately available since it doesn’t need to be
refreshed.
●​ Write Operation: Data is written directly into the SRAM cell. Once written, the
data stays in the cell until it is updated or erased.

2. Dynamic RAM (DRAM)

Dynamic RAM (DRAM) is another type of semiconductor RAM, but it works
differently than SRAM. DRAM stores each bit of data in a capacitor, which can lose
its charge over time, meaning the data must be refreshed periodically to prevent it
from being lost.

Key Features:

●​ Needs Refreshing: DRAM stores data in capacitors that discharge over time, so
the memory needs to be refreshed regularly to retain the data.
●​ Slower than SRAM: DRAM is slower than SRAM because of the time needed
for refreshing and accessing data.
●​ Cheaper and Higher Capacity: DRAM is cheaper to produce and can store
much more data in the same physical space compared to SRAM. It is typically
used as the main memory (RAM) in computers and other devices.
●​ Used as Main Memory: DRAM is commonly used for the main system
memory in devices because of its larger capacity and lower cost.

Operation of DRAM:

●​ Read Operation: DRAM cells are addressed by row and column lines, and the
data stored in the capacitor is read.
●​ Write Operation: Data is written into the DRAM by charging the capacitor, and
this data needs to be refreshed continuously.

3. Asynchronous DRAM (ADRAM) & Synchronous DRAM (SDRAM)

Asynchronous DRAM (ADRAM):

●​ No Sync with Clock: Asynchronous DRAM operates independently of the
system clock. Data is transferred when the memory controller is ready, but it is
not synchronized with the clock signal of the CPU.
●​ Slower Data Transfer: Since it is not synchronized with the CPU clock, the
data transfer rate is slower compared to newer types of RAM.
●​ Older Technology: Asynchronous DRAM is mostly obsolete today, as newer
memory technologies are much faster.

Synchronous DRAM (SDRAM):

●​ Clock-Synchronized: Unlike ADRAM, Synchronous DRAM (SDRAM) is
synchronized with the system clock. This means data is transferred in sync
with the CPU's clock cycles, allowing for faster and more efficient data access.
●​ Faster Data Transfer: SDRAM can transfer data more quickly because of the
synchronization with the clock. It has a faster data rate compared to
asynchronous memory.
●​ Common in Modern Systems: SDRAM is widely used in modern computers,
smartphones, and other devices as the main system memory.
4. Double Data Rate SDRAM (DDR SDRAM)

DDR SDRAM is an enhanced version of SDRAM that improves performance by
transferring data twice per clock cycle, rather than once. This allows DDR to achieve
higher data transfer rates without increasing the clock speed.

Key Features:

●​ Twice the Speed of SDRAM: DDR SDRAM transfers data on both the rising and
falling edges of the clock signal, doubling the amount of data transferred per
clock cycle compared to regular SDRAM.
●​ Various Versions (DDR1, DDR2, DDR3, DDR4, DDR5): Over time, DDR
technology has evolved, with each new version offering higher speeds, lower
power consumption, and improved data transfer rates. DDR4 and DDR5 are
the latest versions commonly used today.
●​ Used in Modern Computers: DDR SDRAM is the standard memory in modern
computers and other devices, like gaming consoles and laptops, due to its
high performance and efficiency.

Operation of DDR SDRAM:

●​ Read/Write Operations: DDR SDRAM operates by synchronizing the data
transfer with the system clock. Data is transferred on both the rising and
falling edges of the clock signal, improving throughput.
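Transferring on both clock edges doubles the peak transfer rate for a given clock. A quick back-of-the-envelope calculation, assuming a standard 64-bit (8-byte) memory bus and a DDR4-3200 bus clock of 1600 MHz:

```python
# Peak transfer rate of DDR memory: two transfers per clock cycle on a
# 64-bit (8-byte) bus. The clock and bus width are illustrative values.

def ddr_peak_bandwidth(clock_hz, bus_bytes=8):
    return clock_hz * 2 * bus_bytes      # bytes per second

# DDR4-3200 clocks the bus at 1600 MHz and transfers twice per cycle.
gb_per_s = ddr_peak_bandwidth(1600e6) / 1e9
print(f"{gb_per_s:.1f} GB/s")  # 25.6 GB/s
```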

5. Rambus Memory (RDRAM)

Rambus DRAM (RDRAM) is a type of memory that was developed by Rambus Inc. It
was designed to be faster than standard SDRAM by using a high-speed data bus
and advanced signaling techniques. Although it had a brief period of popularity, it
was eventually superseded by DDR SDRAM.

Key Features:

●​ High Data Transfer Rate: RDRAM used a wider, faster data bus (the Rambus
Channel) and could transfer more data per clock cycle compared to
traditional SDRAM at the time.
●​ Wide Bandwidth: It was designed to support high-bandwidth applications,
like graphics and gaming, and was used in some high-end systems in the late
1990s and early 2000s.
●​ Expensive and Complex: Despite its high performance, RDRAM was
expensive to produce, and its complex interface required additional chips and
controllers, making it more costly than DDR SDRAM.
●​ Limited Adoption: Due to high costs and competition from DDR SDRAM,
RDRAM was not widely adopted and eventually phased out in favor of DDR
memory.

Operation of Rambus Memory (RDRAM):

●​ Data Transfer: RDRAM used a different architecture from DDR SDRAM, with
higher data transfer rates using a 16-bit wide bus instead of the traditional
64-bit bus used by DDR SDRAM.
●​ Clock Speed and Bandwidth: RDRAM typically operated at higher speeds
and with higher bandwidth than standard SDRAM, but it required additional
hardware like special controllers.

Summary of Differences:
Memory Type         Key Features                          Use Case                       Speed
SRAM                Faster, no refresh, more              CPU cache (L1, L2 caches)      Very Fast
                    expensive, smaller
DRAM                Slower, requires refreshing           Main system memory             Moderate
Asynchronous DRAM   No clock synchronization              Older memory technology        Slow
Synchronous DRAM    Synchronized with CPU clock           Main memory (modern systems)   Fast
DDR SDRAM           Double data rate, faster than SDRAM   Main memory (modern systems)   Very Fast
RDRAM               High bandwidth, expensive, complex    High-performance systems       Very Fast
                                                          (historical)

ROM (Read-Only Memory)


ROM (Read-Only Memory) is a type of non-volatile memory used primarily for
storing firmware, which is permanent software programmed into hardware. Unlike
RAM (Random Access Memory), ROM retains its data even when the power is
turned off, making it ideal for storing system instructions that do not change
frequently.

ROM is mainly used to store boot instructions, firmware, and system configurations
that do not require frequent updates.

Types of ROM

1.​ Mask ROM (MROM):


○​ Description: This is the original type of ROM. It is pre-programmed
during the manufacturing process. The data stored in MROM is
permanently set during production and cannot be changed.
○​ Advantages:
■​ Very cheap to produce in large quantities.
■​ Data is secure and cannot be altered.
○​ Disadvantages:
■​ Not rewritable; once data is written during manufacturing, it
cannot be changed or updated.
■​ Limited flexibility.
○​ Uses: Used for mass-produced products like calculators or low-cost
electronic devices.
2.​ PROM (Programmable ROM):
○​ Description: PROM is a type of ROM that can be programmed after
manufacture. It is one-time programmable (OTP), meaning once the
data is written, it cannot be changed.
○​ Advantages:
■​ More flexible than mask ROM because it allows for programming
after manufacture.
■​ Cheaper than other ROM types for small production runs.
○​ Disadvantages:
■​ Can only be written to once, so no modifications can be made
afterward.
○​ Uses: Used in situations where a specific, unchangeable piece of data
is needed, like storing system startup routines.
3.​ EPROM (Erasable Programmable ROM):
○​ Description: EPROM can be erased and reprogrammed using
ultraviolet (UV) light. It has a window in the package that allows UV
light to erase the data stored in it.
○​ Advantages:
■​ Reprogrammable, which allows for updates.
■​ More flexible than PROM.
○​ Disadvantages:
■​ Erasure requires exposure to UV light, which can take a long
time (typically 20–30 minutes).
■​ It has a limited number of write/erase cycles.
○​ Uses: EPROM was commonly used for system firmware that required
infrequent updates, like BIOS in older computers.
4.​ EEPROM (Electrically Erasable Programmable ROM):
○​ Description: EEPROM can be erased and reprogrammed electrically,
meaning it does not require UV light like EPROM. Data can be written
and erased byte by byte.
○​ Advantages:
■​ Can be reprogrammed without removal from the system.
■​ Data is retained even when power is off.
■​ More flexible than EPROM because it can be erased and
reprogrammed electrically.
○​ Disadvantages:
■​ Slower than other types of ROM (EPROM, Mask ROM).
■​ Limited number of write/erase cycles (typically 1 million cycles).
■​ More expensive than other ROM types.
○​ Uses: EEPROM is used for storing small amounts of data like BIOS
settings or configuration data in embedded systems.
5.​ Flash Memory:
○​ Description: Flash memory is a type of EEPROM that can be erased
and reprogrammed in blocks (rather than byte by byte like traditional
EEPROM). It is faster and more durable than EEPROM and is the most
common type of non-volatile memory used in modern applications.
○​ Advantages:
■​ Can be electrically erased and reprogrammed in large blocks.
■​ High-speed data access and rewriting.
■​ Low cost per bit of storage.
■​ Widely used for mass storage in devices like smartphones,
tablets, and USB drives.
○​ Disadvantages:
■​ Limited number of program/erase cycles (typically 10,000 to
100,000 cycles per block).
■​ Slower write speeds compared to read speeds.
○​ Uses: Flash memory is used in a wide range of applications, including
USB drives, solid-state drives (SSDs), memory cards, and embedded
devices.

Detailed Comparison of EEPROM, EPROM, and Flash Memory


Feature         EPROM                      EEPROM                          Flash Memory

Erase Method    UV light                   Electrically (byte-by-byte)     Electrically (block-wise)

Reprogramming   Yes, but requires UV       Yes, can be erased and          Yes, faster block-wise
                light for erasure          reprogrammed electrically       reprogramming

Write Speed     Slow, due to UV erasure    Slower compared to Flash        Fast, especially for read
                                                                           operations

Rewriting       Limited number of cycles   Limited (typically 1 million    Higher number of cycles
Frequency       (1,000–100,000)            cycles)                         (10,000–100,000 per block)

Cost            Higher due to UV erase     Higher than Flash but           Lower cost per bit compared
                requirements               cheaper than EPROM              to EEPROM and EPROM

Use Cases       Older systems, firmware    BIOS, configuration settings,   SSDs, USB drives, SD cards,
                updates                    small embedded systems          smartphones, etc.

Advantages and Disadvantages of Each Type of ROM

Advantages of EPROM:

●​ Reprogrammable: Can be erased and reprogrammed using UV light, allowing
flexibility in development.
●​ Data retention: Retains data without power, making it ideal for storing
firmware.

Disadvantages of EPROM:

●​ Slow Erasure: Erasing data requires UV light and can take a significant amount
of time (typically 20–30 minutes).
●​ Limited Rewrites: The chip can only handle a limited number of erasure and
programming cycles before the quality degrades.

Advantages of EEPROM:

●​ Electrical Erasure: Can be erased and reprogrammed electrically, making it
much more convenient than EPROM.
●​ Byte-by-byte: Allows data to be erased and reprogrammed byte by byte,
offering more flexibility.

Disadvantages of EEPROM:

●​ Slow Write Speeds: Rewriting data byte-by-byte is slower than block-based
systems like Flash.
●​ Limited Write Cycles: Similar to EPROM, EEPROM has a limited number of
write/erase cycles.
●​ Costly: More expensive than other types of memory (like Flash or Mask ROM).

Advantages of Flash Memory:

●​ Faster Performance: Compared to EEPROM, Flash memory offers faster read
and write operations, especially for large blocks of data.
●​ Durability: Flash memory has a higher number of read/write cycles
compared to EEPROM.
●​ Cost-Effective: Flash memory is cheaper per bit than both EEPROM and
EPROM, making it ideal for mass storage.

Disadvantages of Flash Memory:

●​ Limited Write/Erase Cycles: Although Flash memory offers better durability
than EEPROM, it still has a limited number of write/erase cycles (about 10,000
to 100,000 per block).
●​ Slower Write Operations: Writing data to Flash memory, particularly in large
blocks, can be slower than reading.
Applications of ROM, EPROM, EEPROM, and Flash Memory

●​ ROM:
○​ Storing firmware and system boot-up instructions in devices like
computers, calculators, and gaming consoles.
●​ EPROM:
○​ Firmware updates in older systems.
○​ Used in early microcontrollers and system boards where firmware
might need occasional updates.
●​ EEPROM:
○​ Storing configuration settings in embedded systems (e.g., saving BIOS
settings in computers, storing device configurations).
○​ Often used in small-scale storage applications where data changes
infrequently.
●​ Flash Memory:
○​ Used in modern storage devices like USB drives, solid-state drives
(SSDs), and memory cards (SD cards, microSD cards).
○​ In smartphones, tablets, and cameras for storing large amounts of
data quickly and efficiently.
○​ Flash-based storage is widely used in embedded systems and
automotive applications as well.

Memory Hierarchy: Speed, Size, and Cost

In computer architecture, memory hierarchy refers to the arrangement of different
types of memory that vary in speed, size, and cost to optimize overall system
performance and cost efficiency. The idea is to use the fastest memory (but usually
the smallest and most expensive) for operations that need high-speed access, while
slower, larger, and cheaper memory is used for storage of less frequently accessed
data.

The memory hierarchy is designed to exploit the principle of locality of reference
(both spatial and temporal) to improve the speed of accessing data and reduce the
cost of storing large amounts of data. Here’s an overview of the key levels of the
memory hierarchy and how speed, size, and cost vary across these levels.

Levels of Memory Hierarchy

1.​ Registers
○​ Speed: Fastest
○​ Size: Smallest (few bytes to a few kilobytes)
○​ Cost: Most expensive per byte
○​ Description: Registers are the smallest and fastest memory located
inside the CPU. They hold data that is currently being processed.
Registers are very expensive to produce but offer the highest speed.
2.​ Cache Memory (L1, L2, L3)
○​ Speed: Very fast, but slower than registers
○​ Size: Small (from a few KB to a few MB)
○​ Cost: Expensive per byte
○​ Description: Cache memory is located closer to the CPU than the main
memory and stores frequently used instructions and data to speed up
processing. There are typically multiple levels of cache (L1, L2, and L3),
with L1 being the smallest and fastest and L3 being larger but slower.
3.​ Main Memory (RAM)
○​ Speed: Slower than cache memory, but faster than secondary storage
○​ Size: Larger (from a few GB to several GB)
○​ Cost: Less expensive than cache memory per byte
○​ Description: RAM (Random Access Memory) is the primary working
memory in a computer where programs and data in active use are
stored. It is faster than secondary storage (like hard drives or SSDs) but
slower than cache memory. RAM is volatile, meaning it loses all data
when the power is turned off.
4.​ Secondary Storage (HDDs, SSDs, Optical Disks)
○​ Speed: Slowest (compared to all other memory types)
○​ Size: Very large (from hundreds of GB to several TB)
○​ Cost: Least expensive per byte
○​ Description: Secondary storage, like Hard Disk Drives (HDDs), Solid
State Drives (SSDs), or optical disks, is used for long-term data storage.
It is much slower than main memory but can store much larger
amounts of data. SSDs are faster than HDDs, but both are still much
slower than RAM or cache memory.
5.​ Tertiary Storage (Cloud Storage, Magnetic Tape)
○​ Speed: Slowest (can be very slow, especially in case of magnetic tape)
○​ Size: Extremely large (can be in the petabytes)
○​ Cost: Cheapest per byte
○​ Description: Tertiary storage includes things like cloud storage or
magnetic tape, and it's typically used for archiving or backup
purposes. Access to data in tertiary storage is much slower, but it is
very cheap for storing large amounts of data.
Comparison of Speed, Size, and Cost
Memory Type         Speed       Size                        Cost
Registers           Fastest     Smallest (few bytes)        Most expensive
L1 Cache            Very fast   Very small (32KB–128KB)     Expensive
L2 Cache            Fast        Small (128KB–10MB)          Expensive
L3 Cache            Fast        Larger (2MB–50MB)           Expensive
Main Memory (RAM)   Moderate    Larger (4GB–64GB)           Moderate
HDDs / SSDs         Slow        Large (250GB–10TB)          Cheap
Tertiary Storage    Very slow   Very large (TB to PB)       Cheapest

How Speed, Size, and Cost Impact the Design of Memory Hierarchy

1.​ Speed:
○​ Higher speed memory is needed to process data as quickly as
possible. The CPU needs to access data from memory quickly, so
registers and cache memory are designed to be extremely fast to
reduce bottlenecks. However, their small size limits how much data
they can store.
○​ RAM is slower but larger, providing more room for data that is actively
used. When more data is needed, it is fetched from secondary storage
(e.g., HDD/SSD), which is much slower but can store far more data.
2.​ Size:
○​ Smaller and faster memories, like registers and cache, are used for
frequently accessed data, while larger and slower memories, like RAM
and secondary storage, store less frequently accessed data.
○​ The design of the memory hierarchy balances between size and
speed. Registers are tiny because they only need to store a few bits of
data for immediate processing, while secondary storage can be much
larger because it is used for storing data not in active use.
3.​ Cost:
○​ Faster memory types like registers and cache are more expensive to
manufacture per byte. As a result, they are made smaller to limit the
cost.
○​ Main memory (RAM) is cheaper per byte than cache memory, and
secondary storage (like HDDs or SSDs) is even cheaper, allowing for
large storage capacities at lower costs. Tertiary storage like magnetic
tape or cloud storage is the least expensive per byte, though it comes
with the drawback of slower access times.

The Principle of Locality

The memory hierarchy is designed to take advantage of locality of reference, which
includes:

1.​ Temporal Locality: Recently accessed data is likely to be accessed again
soon. This is why cache memory stores recently accessed data.
2.​ Spatial Locality: Data that is near other data recently accessed is also likely to
be accessed. This principle is used by cache and memory systems to prefetch
or store contiguous blocks of data.

By having data that is frequently used stored in the fastest (and smallest) memory
types and less frequently used data stored in slower (and larger) memory types, a
system can optimize performance and minimize cost.

Summary of Memory Hierarchy

●​ Registers: Fastest, smallest, most expensive
●​ Cache Memory (L1, L2, L3): Very fast, small, expensive
●​ Main Memory (RAM): Moderate speed, larger size, moderate cost
●​ Secondary Storage (HDD/SSD): Slow, large, cheap
●​ Tertiary Storage (Cloud, Tape): Slowest, extremely large, cheapest

By organizing memory in this hierarchical manner, systems can optimize
performance (speed of access) while minimizing cost and maximizing storage
capacity for large amounts of data.
Cache Memories: Overview

Cache memory is a small, high-speed storage area between the CPU and the main
memory (RAM) in a computer. It stores frequently accessed data to speed up data
retrieval for the CPU. Cache memory works based on the principle of locality of
reference and mapping functions to decide which data to store in the cache.

Locality of Reference

Locality of reference refers to the tendency of programs to access a small set of
memory locations repeatedly over short periods of time. There are two types of
locality:

1.​ Temporal Locality:
○​ If a piece of data is accessed, it is likely to be accessed again in the
near future.
○​ Example: In loops, variables that are frequently accessed (like loop
counters) tend to have high temporal locality.
2.​ Spatial Locality:
○​ If a piece of data is accessed, nearby data is also likely to be accessed
soon.
○​ Example: In arrays or sequential data structures, when one element is
accessed, adjacent elements are likely to be accessed soon.

Cache memory takes advantage of both types of locality. By storing recently
accessed data (temporal locality) and data near it (spatial locality), cache memory
can significantly reduce the time it takes to access data compared to fetching it from
main memory.
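The effect of spatial locality can be made concrete with a small sketch. The code below is illustrative (the function names and the 4-word block size are our own assumptions, not a standard API): it counts how often consecutive accesses to a 2-D array fall in the same cache block under two traversal orders.

```python
def row_major_order(rows, cols):
    """Addresses visited scanning row by row (good spatial locality)."""
    return [r * cols + c for r in range(rows) for c in range(cols)]

def col_major_order(rows, cols):
    """Addresses visited scanning column by column (poor spatial locality)."""
    return [r * cols + c for c in range(cols) for r in range(rows)]

def same_block_fraction(addresses, block_size=4):
    """Fraction of consecutive accesses that land in the same cache block."""
    pairs = list(zip(addresses, addresses[1:]))
    same = sum(1 for a, b in pairs if a // block_size == b // block_size)
    return same / len(pairs)

print(same_block_fraction(row_major_order(4, 8)))  # 24/31, most accesses reuse a block
print(same_block_fraction(col_major_order(4, 8)))  # 0.0, every access changes block
```

The row-major scan visits neighbouring addresses, so most accesses hit data already brought in with the previous cache block; the column-major scan jumps a whole row each time and gets no such benefit.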

Write Operation Protocols in Cache Memory

The write operation refers to how data is written into the cache when a CPU writes
data that needs to be cached. There are two common protocols for handling write
operations:

1.​ Write-Through Cache:
○​ Description: In this scheme, when data is written to the cache, it is also
immediately written to the main memory.
○​ Advantages:
■​ Data in the main memory is always up to date.
■​ Simplicity in managing data consistency between cache and
main memory.
○​ Disadvantages:
■​ Slower due to the overhead of writing to both the cache and
main memory simultaneously.
2.​ Write-Back Cache:
○​ Description: In this scheme, data is written to the cache first, and only
when it is evicted (replaced) from the cache is it written to the main
memory.
○​ Advantages:
■​ Faster than write-through because writing to the main memory is
delayed.
■​ Reduced traffic between the cache and main memory.
○​ Disadvantages:
■​ More complex because it requires maintaining a dirty bit (flag
indicating whether the data in the cache has been modified but
not yet written to the main memory).
■​ Inconsistent data in the main memory until the cache data is
written back.
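The two protocols can be contrasted with a minimal sketch that counts memory writes for a single cached line. The class and method names here are illustrative, not a real cache API:

```python
class WriteThroughLine:
    """Every store is propagated to main memory immediately."""
    def __init__(self):
        self.memory_writes = 0
    def write(self, value):
        self.value = value
        self.memory_writes += 1   # write goes straight through to memory

class WriteBackLine:
    """Stores only mark the line dirty; memory is updated on eviction."""
    def __init__(self):
        self.memory_writes = 0
        self.dirty = False
    def write(self, value):
        self.value = value
        self.dirty = True         # memory is now stale until write-back
    def evict(self):
        if self.dirty:            # dirty bit decides whether to write back
            self.memory_writes += 1
            self.dirty = False

wt, wb = WriteThroughLine(), WriteBackLine()
for v in range(10):               # ten stores to the same cached line
    wt.write(v)
    wb.write(v)
wb.evict()
print(wt.memory_writes)  # 10 memory writes, one per store
print(wb.memory_writes)  # 1 memory write, deferred until eviction
```

Ten repeated stores cost ten memory writes under write-through but only one under write-back, which is exactly the traffic reduction (and the consistency risk) described above.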

Mapping Functions in Cache Memory

Cache mapping refers to the way data from the main memory is placed into the
cache memory. There are three types of mapping functions used in cache memory
systems:

1.​ Direct-Mapped Cache:
○​ Description: In this mapping scheme, each block of memory maps to
exactly one cache line (slot). The memory address is divided into three
parts: the tag, the index, and the block offset.
○​ Operation:
■​ The index part identifies the cache line, and the data from the
corresponding block of main memory is placed in that cache
line.
■​ The tag part identifies which block of main memory is stored in
the cache line.
■​ When the CPU accesses a memory address, the cache uses the
index to locate the correct cache line and compares the tag to
check for a cache hit or cache miss.
○​ Advantages:
■​ Simple and easy to implement.
■​ Fast access time for each cache line.
○​ Disadvantages:
■​ If multiple blocks of memory map to the same cache line, a
cache conflict may occur (eviction of data that is frequently
used).
○​ Example: If a cache has 4 lines, and memory has 16 blocks, the
memory address is divided into parts to map the blocks. A specific
block of memory will always map to the same cache line.
2.​ Associative-Mapped Cache:
○​ Description: In this mapping scheme, any block of memory can be
placed in any cache line. There is no specific restriction based on the
memory address.
○​ Operation: The memory address is divided into two parts: the tag and
the block offset. The cache line is searched completely for a match.
○​ Advantages:
■​ Flexible and avoids conflict misses (where multiple blocks map
to the same cache line).
■​ Better performance because more data can be stored in the
cache.
○​ Disadvantages:
■​ Slower than direct-mapped caches because every cache line
must be checked for a match.
■​ More complex and expensive hardware.
○​ Example: Any memory block can be placed into any cache line, so
there is no conflict when mapping. However, this requires a more
complex search mechanism to find the data in the cache.
3.​ Set-Associative Cache:
○​ Description: Set-associative mapping combines the advantages of
both direct-mapped and fully associative caches. The cache is divided
into several sets, and each set contains multiple cache lines. A memory
block can map to any line within a specific set, not just one line as in
direct-mapped cache.
○​ Operation: The memory address is divided into three parts: the tag, the
set index, and the block offset.
■​ The set index identifies which set the block belongs to.
■​ The cache checks all cache lines in that set for a match.
○​ Advantages:
■​ Reduces conflict misses (by allowing multiple lines in a set).
■​ Provides a good balance between speed and flexibility.
○​ Disadvantages:
■​ Slightly slower than direct-mapped caches but faster than fully
associative caches.

Example: If the cache is 2-way set-associative and has 4 sets, a
memory block can be placed in one of the two lines of a set, reducing
conflict misses. This gives a balance between the simplicity of direct
mapping and the flexibility of associative mapping.
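All three schemes split an address the same way once the number of sets is fixed: direct-mapped is a set-associative cache with one line per set, and fully associative is one with a single set. A sketch of the tag/index/offset split, assuming 16-byte blocks and 4 sets (both sizes are our own choices for illustration):

```python
BLOCK_SIZE = 16   # bytes per block -> 4 offset bits (assumed)
NUM_SETS = 4      # -> 2 set-index bits (assumed)

def split_address(addr, block_size=BLOCK_SIZE, num_sets=NUM_SETS):
    """Return (tag, set_index, offset) for a memory address."""
    offset = addr % block_size          # position within the block
    block_number = addr // block_size   # which memory block
    set_index = block_number % num_sets # which set the block maps to
    tag = block_number // num_sets      # identifies the block within the set
    return tag, set_index, offset

# Two addresses whose block numbers differ by a multiple of NUM_SETS
# map to the same set but carry different tags:
print(split_address(0x00))   # (0, 0, 0)
print(split_address(0x100))  # 0x100 // 16 = block 16 -> set 0, tag 4
```

With `num_sets` equal to the total number of lines this reduces to direct mapping (one candidate line per block); with `num_sets = 1` the set index disappears and any line may hold the block, i.e. fully associative mapping.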

Comparison of Mapping Types:


Mapping Type | Description | Advantages | Disadvantages
Direct-Mapped Cache | Each memory block maps to exactly one cache line | Simple to implement, fast access | Conflict misses, limited flexibility
Fully Associative Cache | Any memory block can be placed in any cache line | No conflict misses, flexible | Slow (must search the entire cache), expensive
Set-Associative Cache | Each block maps to a specific set with multiple cache lines | Balanced between direct-mapped and fully associative | Slower than direct-mapped, more complex
Conclusion

●​ Direct-mapped caches are simple and fast but suffer from conflict misses.
●​ Fully associative caches are highly flexible and avoid conflict misses but are
more complex and slower.
●​ Set-associative caches provide a compromise, offering better performance
than direct-mapped caches without the full complexity of fully associative
caches.
By utilizing locality of reference and different mapping schemes, cache memory
can significantly improve the overall performance of a computer system.

Performance Considerations in Memory Systems

When designing memory systems, several factors influence the overall
performance, including how data is accessed, how efficiently the cache operates,
and how the memory hierarchy is managed. Interleaving, hit rate, miss penalty, and
various cache optimizations are key considerations. These factors aim to reduce
memory access delays and maximize data throughput for better system
performance.

Let's explore the performance considerations in detail.

1. Memory Interleaving

Memory interleaving is a technique used to improve memory access speed by
distributing memory addresses across multiple memory modules or banks.

Concept:

●​ Instead of storing all data sequentially in a single block, interleaving splits the
memory into multiple blocks (also called banks), and data is stored across
these banks in a way that allows concurrent access.
●​ It reduces the delay of memory access by enabling parallel read or write
operations.

Types of Interleaving:

●​ 2-way Interleaving: Memory addresses are divided into two banks, and every
alternate address is mapped to a different bank.
●​ 4-way Interleaving: Memory addresses are divided into four banks, with each
bank storing data from every fourth address.

Advantages:

●​ Increased throughput: By accessing multiple memory banks simultaneously,
the overall speed improves.
●​ Reduced contention: Interleaving reduces the chances of bottlenecks when
multiple data requests happen simultaneously.
Disadvantages:

●​ Complexity in addressing: Memory addressing logic becomes more complex.


●​ Requires specific hardware: Not all systems support interleaving directly, so
hardware support is needed.
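Low-order interleaving comes down to modular arithmetic on the word address: the low bits pick the bank and the remaining bits pick the row within the bank. A sketch, assuming 4-way interleaving:

```python
NUM_BANKS = 4  # 4-way interleaving (assumed)

def bank_of(address):
    """Bank that holds this word under low-order interleaving."""
    return address % NUM_BANKS

def offset_in_bank(address):
    """Row within the selected bank."""
    return address // NUM_BANKS

addresses = range(8)
print([bank_of(a) for a in addresses])         # [0, 1, 2, 3, 0, 1, 2, 3]
print([offset_in_bank(a) for a in addresses])  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Because consecutive addresses fall in different banks, a sequential burst of four accesses can proceed in all four banks at once instead of queueing on a single module.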

2. Hit Rate and Miss Penalty

Hit Rate:

●​ Hit rate refers to the percentage of times that data requested by the CPU is
found in the cache (i.e., a cache hit).
○​ Formula: Hit Rate = (Number of Cache Hits / Total Number of
Memory Accesses) × 100
●​ A high hit rate means that data is frequently accessed from the cache, which
leads to faster performance.

Miss Rate:

●​ The miss rate is the opposite of the hit rate. It refers to the percentage of
times the requested data is not found in the cache (i.e., a cache miss).
○​ Formula: Miss Rate = 1 − Hit Rate
○​ Miss penalty is the time taken to fetch data from a slower memory
hierarchy (e.g., from main memory or disk) when a cache miss occurs.

Miss Penalty:

●​ Miss penalty refers to the additional time required to retrieve data from a
lower-level memory (such as RAM or even disk) when a cache miss happens.
●​ High miss penalty can severely degrade performance, as it requires
accessing slower memory sources.

Impact on Performance:

●​ A high hit rate minimizes the impact of miss penalties because data can be
retrieved from the fast cache.
●​ A low miss rate (or high hit rate) results in faster memory access and better
overall performance.
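Hit rate and miss penalty combine into a single figure of merit, the average memory access time (AMAT = hit time + miss rate × miss penalty). A sketch with assumed, round latencies:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles:
    every access pays the hit time; misses also pay the penalty."""
    return hit_time + miss_rate * miss_penalty

# Assumed figures: a cache hit costs 1 cycle, a miss adds 100 cycles.
print(amat(hit_time=1, miss_rate=0.05, miss_penalty=100))  # 6.0 cycles
print(amat(hit_time=1, miss_rate=0.01, miss_penalty=100))  # 2.0 cycles
```

Cutting the miss rate from 5% to 1% triples the effective memory speed here, which is why small improvements in hit rate matter so much when the miss penalty is large.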
3. Caches on the Processor Chip (On-Chip Caches)

On-chip caches are caches that are integrated directly onto the processor (CPU),
making them faster and more efficient than off-chip memory. Modern processors
typically include multiple levels of on-chip cache: L1 cache, L2 cache, and
sometimes L3 cache.

Advantages of On-Chip Caches:

●​ Reduced latency: On-chip caches have much lower access times compared to
off-chip memory.
●​ Higher throughput: The CPU can access data from the cache much more
quickly, leading to faster data processing.
●​ Energy efficiency: Since data does not need to travel far (within the chip),
power consumption is lower compared to accessing external memory.

Levels of On-Chip Cache:

●​ L1 Cache: Closest to the processor cores and the smallest in size (typically
32KB - 128KB per core). It stores the most frequently accessed data and
instructions.
●​ L2 Cache: Larger and slightly slower than L1 cache (typically 256KB - 8MB). It
serves as a backup to L1 cache and stores a broader set of data.
●​ L3 Cache: Shared among multiple cores in multi-core processors, much
larger in size (typically 2MB - 16MB), and slower than L1/L2 caches.

4. Write Buffer

A write buffer is a temporary storage area used to hold data before it is written to
the main memory or cache. It helps manage write operations efficiently.

Purpose:

●​ Speed up CPU operations: The CPU can continue executing instructions while
write operations are being buffered.
●​ Reduce contention: It reduces delays due to write operations waiting for
slower memory writes.

How it Works:
●​ When the CPU writes data to memory or cache, the data is first placed in the
write buffer.
●​ The buffer allows the CPU to continue other operations while the data is being
written to the destination (main memory or cache) at a later time.

Impact on Performance:

●​ Write buffers can improve performance by reducing write stalls and allowing
the CPU to perform other tasks while the data is being written.
●​ However, a full write buffer can cause stall cycles in the processor if there is
no space to write new data.
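A write buffer can be sketched as a bounded queue: the CPU enqueues stores and continues, stalling only when the queue is full. The capacity and class names below are illustrative, not a real hardware interface:

```python
from collections import deque

class WriteBuffer:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = deque()   # pending (address, value) stores
        self.stalls = 0          # times the CPU had to wait

    def cpu_write(self, addr, value):
        """Called by the CPU; returns immediately unless the buffer is full."""
        if len(self.entries) == self.capacity:
            self.stalls += 1     # full buffer: CPU stalls for a drain slot
            self.drain_one()
        self.entries.append((addr, value))

    def drain_one(self):
        """Memory side retires the oldest buffered write."""
        if self.entries:
            self.entries.popleft()

buf = WriteBuffer(capacity=2)
for i in range(5):               # five back-to-back stores, memory never drains
    buf.cpu_write(i, i)
print(buf.stalls)                # 3: every store after the buffer fills stalls
```

The first two stores are absorbed for free; once the 2-entry buffer is full, each further store costs a stall, which mirrors the "full write buffer causes stall cycles" point above.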

5. Prefetching

Prefetching is the process of anticipating future memory accesses and loading the
corresponding data into the cache before it is actually requested by the CPU.

Types of Prefetching:

●​ Hardware Prefetching: Managed by the hardware, where the system
automatically detects patterns in memory accesses and preloads data into
the cache.
●​ Software Prefetching: Done by the programmer or compiler, where explicit
instructions are added to fetch data into the cache before it is needed.

Benefits:

●​ Reduced cache miss rate: By bringing data into the cache ahead of time, the
likelihood of a cache miss is reduced.
●​ Increased throughput: Data is available in the cache when the CPU needs it,
reducing delays from fetching data from slower memory.

Challenges:

●​ Over-prefetching: Fetching unnecessary data can waste bandwidth and fill
the cache with irrelevant data.
●​ Latency: If the prefetching mechanism is not accurate, it can cause more
delays by fetching unnecessary data.
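The simplest hardware scheme is a sequential (next-block) prefetcher. The sketch below uses an unbounded cache as a simplification, purely to show how prefetching removes cold misses from a streaming access pattern:

```python
def run(accesses, prefetch=True):
    """Count misses for a sequence of block numbers, with or without a
    next-block prefetcher. The cache is modelled as an unbounded set,
    a deliberate simplification."""
    cache, misses = set(), 0
    for block in accesses:
        if block not in cache:
            misses += 1          # block must be fetched on demand
            cache.add(block)
        if prefetch:
            cache.add(block + 1) # pull the neighbour in ahead of need
    return misses

sequential = list(range(16))     # streaming, perfectly sequential pattern
print(run(sequential, prefetch=False))  # 16 misses: every block is cold
print(run(sequential, prefetch=True))   # 1 miss: only the very first block
```

On a sequential stream the prefetcher eliminates all but the first miss; on a random pattern the same `block + 1` fetches would mostly be wasted bandwidth, which is the over-prefetching risk noted above.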
6. Lockup-Free Cache

A lockup-free cache is a cache architecture that avoids cache access delays when
there are multiple memory accesses happening simultaneously.

How It Works:

●​ In a lockup-free system, the CPU can continue to access the cache even if
there are cache misses or other memory accesses occurring at the same
time.
●​ It prevents the system from being locked up (stalled) due to memory access
issues.

Benefits:

●​ Improved multitasking: Allows multiple memory accesses to occur
simultaneously without blocking.
●​ Reduced bottlenecks: Ensures that one memory access does not hold up
others, improving throughput.

Challenges:

●​ Increased complexity: Implementing lockup-free cache systems adds more
complexity to cache controllers and memory management.
●​ Cost: Lockup-free designs may require more advanced hardware, increasing
the cost of the system.

Summary of Key Techniques for Improving Cache Memory Performance


Technique | Purpose | Impact on Performance
Memory Interleaving | Distributes data across multiple memory banks for parallel access. | Increases memory throughput, reduces contention.
Hit Rate | Percentage of cache accesses that result in cache hits. | A higher hit rate leads to better performance by reducing misses.
Miss Penalty | The time penalty for accessing slower memory on a cache miss. | A lower miss penalty means better overall system speed.
On-Chip Caches | Caches integrated within the CPU chip to reduce latency. | Reduces memory access time, improves performance.
Write Buffer | Temporary storage for write operations to avoid delays. | Allows the CPU to continue work while write operations are handled.
Prefetching | Preloading data into the cache before it is accessed. | Reduces cache misses and improves throughput.
Lockup-Free Cache | Cache design that allows concurrent memory access. | Prevents CPU stalls during cache accesses, improving efficiency.

By utilizing these techniques, memory systems can be optimized to reduce latency,
increase throughput, and maximize CPU efficiency, ultimately improving overall
system performance.

Virtual Memory, Address Translation, and Translation Lookaside Buffer (TLB)

Virtual memory is a technique used by modern computer systems to provide an
"idealized" abstraction of the storage resources that are actually available on a given
machine. It creates an illusion for users of a very large (and continuous) memory
space, even if the physical memory (RAM) is much smaller. This abstraction is
achieved by using address translation between virtual addresses and physical
addresses.

Let's explore the key concepts in detail.

1. Virtual Memory: Overview

Virtual memory allows a computer to compensate for physical memory shortages
by temporarily transferring data from random access memory (RAM) to disk storage.
This allows systems to run larger applications than would otherwise be possible,
even if the physical memory is limited.
Key Concepts:

●​ Virtual Address Space: A process is given its own range of memory addresses,
referred to as its virtual address space.
●​ Physical Address Space: This is the actual memory space available in the
computer's RAM.
●​ Paging: Memory is divided into fixed-size blocks called pages (for virtual
memory) and frames (for physical memory). The operating system moves
pages between physical memory and secondary storage (like a hard drive or
SSD) as needed.
●​ Page Table: The page table is used to map virtual addresses to physical
addresses. It stores the mapping between virtual pages and physical frames.

Advantages of Virtual Memory:

●​ Isolation and protection: Each process operates in its own address space,
providing protection against interference from other processes.
●​ Efficient use of RAM: By allowing processes to use more memory than is
physically available, the system can run larger applications.
●​ Simplified programming model: Programmers don’t need to worry about
memory limitations because each process is given its own virtual address
space.

2. Address Translation: Virtual Address to Physical Address

Address translation is the process of converting a virtual address into a physical
address. The operating system uses a page table to map virtual addresses to
physical addresses.

How Address Translation Works:

●​ Virtual Address: When a program uses a virtual address to access memory, the
virtual address is divided into two parts:
1.​ Page Number: The higher-order bits that identify the virtual page.
2.​ Offset: The lower-order bits that specify the exact location within the
page (the byte offset).
●​ Page Table Lookup: The page table is used to find the corresponding
physical page frame for a given virtual page. The page table stores the
mapping between the virtual page number and the physical page frame
number.
●​ Physical Address: Once the physical page frame is found, the offset from the
virtual address is combined with the frame number to form the physical
address.

Example of Address Translation:

Consider a virtual address:

●​ Virtual Address: V = (Page Number, Offset)​
Let’s assume that the virtual page size is 4KB (so 12 bits are used for the offset
in the virtual address).
1.​ Page Table: A page table holds the mapping of virtual pages to
physical frames. For example, if virtual page 2 maps to physical frame
5, then the page table stores the mapping for page 2 → frame 5.
2.​ Translation Process:
■​ Extract the page number and the offset from the virtual address.
■​ Use the page table to find the corresponding physical page.
■​ Combine the physical frame number with the offset to obtain the
physical address.
●​ If we want to access memory at virtual address V = (2, 1000), the page
number is 2 and the offset is 1000. If the page table maps page 2 to physical
frame 5, the physical address is frame number × page size + offset, i.e.
5 × 4096 + 1000 = 21480.
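The translation steps above can be sketched directly, using 4 KB pages and the example page-2-to-frame-5 mapping (the dictionary page table is a simplification of the real structure):

```python
PAGE_SIZE = 4096  # 4 KB pages -> 12 offset bits

page_table = {2: 5}  # virtual page 2 -> physical frame 5 (example mapping)

def translate(virtual_address):
    """Virtual address -> physical address via the page table."""
    page_number = virtual_address // PAGE_SIZE  # high-order bits
    offset = virtual_address % PAGE_SIZE        # low-order bits
    frame = page_table[page_number]             # KeyError here models a page fault
    return frame * PAGE_SIZE + offset           # recombine frame and offset

# Virtual address (page 2, offset 1000):
print(translate(2 * PAGE_SIZE + 1000))  # 5 * 4096 + 1000 = 21480
```

Note that the frame number scales by the page size before the offset is added; the offset bits pass through the translation unchanged.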

3. Translation Lookaside Buffer (TLB)

The Translation Lookaside Buffer (TLB) is a special type of cache used to speed up
the address translation process. The TLB stores recent virtual-to-physical page
mappings to avoid the overhead of accessing the page table every time a translation
is needed.

How TLB Works:

●​ When the CPU needs to translate a virtual address to a physical address, it
first checks the TLB.
○​ If the virtual page number is found in the TLB (this is called a TLB hit),
the physical address can be retrieved quickly without accessing the
page table.
○​ If the virtual page number is not found in the TLB (this is called a TLB
miss), the CPU must access the page table to perform the translation,
which is slower.

Structure of the TLB:

●​ The TLB is a small, fast, associative cache that stores entries of the form:
1.​ Virtual Page Number (VPN): The virtual page.
2.​ Physical Frame Number (PFN): The corresponding physical frame.
●​ TLB Entry: A typical entry in the TLB consists of:
1.​ Tag: The virtual page number.
2.​ Data: The corresponding physical frame number.
3.​ Other information: Access permissions, dirty bit, etc.

TLB Hit and Miss:

●​ TLB Hit: The page number is found in the TLB. The physical address can be
generated immediately.
●​ TLB Miss: The page number is not found in the TLB. The CPU must access the
page table to find the physical frame, and then the mapping is typically
cached in the TLB.

TLB Replacement Policy:

●​ Like other caches, the TLB has a limited number of entries. When it’s full, an
entry must be replaced.
○​ Common replacement policies include LRU (Least Recently Used) and
FIFO (First In, First Out).

4. TLB with Address Translation: Diagram

Below is a simplified diagram showing how the TLB works in conjunction with the
page table during address translation.

+----------------------+
| CPU (Virtual Address)|
+----------------------+
           |
           v
+----------------------+
| Extract Page Number  |
| and Offset           |
+----------------------+
           |
           v
+----------------------+     TLB Miss      +----------------------+
| TLB Lookup (Check)   | ----------------> | Access Page Table    |
+----------------------+                   | for Mapping          |
           | TLB Hit                       +----------------------+
           v                                          |
+----------------------+                              |
| Use Frame Number     | <----------------------------+
| from TLB / Page Table|
+----------------------+
           |
           v
+-----------------------+
| Combine Physical      |
| Frame Number + Offset |
+-----------------------+
           |
           v
+-----------------------+
| Final Physical Address|
+-----------------------+

Explanation:
1.​ CPU generates a virtual address.
2.​ The TLB is checked for the virtual page number (VPN).
3.​ If the TLB contains the mapping (hit), the physical address is quickly obtained.
4.​ If there’s a miss, the page table is accessed to translate the virtual address to
a physical address.
5.​ The physical address is then used for memory access.
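The hit/miss flow above, combined with an LRU replacement policy, can be sketched as a small simulation. The capacity, mappings, and class names are made up for illustration:

```python
from collections import OrderedDict

class TLB:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.entries = OrderedDict()   # virtual page number -> frame number
        self.hits = self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.entries:        # TLB hit: translation is immediate
            self.hits += 1
            self.entries.move_to_end(vpn)       # refresh LRU position
            return self.entries[vpn]
        self.misses += 1               # TLB miss: walk the page table
        pfn = page_table[vpn]
        if len(self.entries) == self.capacity:
            self.entries.popitem(last=False)    # evict least recently used
        self.entries[vpn] = pfn        # cache the translation for next time
        return pfn

page_table = {0: 7, 1: 3, 2: 5}
tlb = TLB(capacity=2)
for vpn in [0, 1, 0, 2, 0]:   # page 0 stays hot; page 1 gets evicted
    tlb.lookup(vpn, page_table)
print(tlb.hits, tlb.misses)   # 2 hits, 3 misses
```

Because page 0 is touched again before eviction, its repeat lookups hit in the TLB; page 1, least recently used when page 2 arrives, is the one evicted.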

Summary of Concepts
Concept | Description
Virtual Memory | A technique that allows the computer to run large applications by using disk storage as virtual memory.
Address Translation | The process of converting a virtual address to a physical address using a page table.
Page Table | A data structure used to map virtual pages to physical frames.
TLB (Translation Lookaside Buffer) | A small, fast cache used to speed up virtual-to-physical address translation by storing recent mappings.
TLB Hit | Occurs when the requested virtual page number is found in the TLB.
TLB Miss | Occurs when the requested virtual page number is not found in the TLB, requiring a lookup in the page table.
TLB Replacement Policy | Rules for replacing entries in the TLB when it is full, such as LRU or FIFO.

In summary, virtual memory enables efficient memory usage by translating virtual
addresses to physical memory locations via address translation. The TLB helps
speed up this process by caching recent translations, reducing the need for frequent
accesses to the page table.

Memory Management Requirements


Memory management is a crucial aspect of any operating system (OS) that ensures
the efficient use of computer memory, particularly when multiple processes are
running. Effective memory management is essential for the system's performance,
resource allocation, and overall functionality. Below are the key requirements and
principles for managing memory in a computer system.

1. Efficient Allocation and Deallocation of Memory

The primary task of memory management is allocating memory to processes and
ensuring that it is freed up when no longer needed. This helps to maximize the use
of available memory and minimize waste.

●​ Dynamic Allocation: The system must allocate memory dynamically as
needed by programs. It should provide memory as requested by the
processes and release it when the process terminates or no longer requires it.
●​ Deallocation: Memory that is no longer in use by processes should be
returned to the memory pool to ensure that the system doesn't run out of
resources.
●​ Memory Pool Management: A memory pool is a set of available memory
blocks that can be allocated or deallocated as required.

2. Process Isolation and Protection

Memory management must ensure that processes are isolated from each other.
Each process should have its own private memory space, preventing one process
from interfering with or accessing the memory of another.

●​ Process Isolation: Each process should have its own virtual address space.
This prevents processes from directly accessing or modifying each other’s
memory, providing security and stability.
●​ Protection Mechanisms: The OS should prevent processes from accessing
memory they do not own, even if a process tries to use another process's
memory intentionally or due to bugs.
○​ This is achieved using mechanisms like base and limit registers,
access control lists, and memory segmentation.
3. Memory Address Translation

Most modern systems use virtual memory to manage memory. Memory
management must handle the translation between virtual addresses (used by
programs) and physical addresses (actual locations in the RAM).

●​ Paging: Memory is divided into fixed-size blocks (pages), and the system uses
a page table to map virtual addresses to physical addresses.
●​ Segmentation: Memory is divided into segments, each representing different
types of data, such as code, data, and stack.
●​ Virtual Memory: It allows processes to use more memory than physically
available by swapping data in and out of secondary storage (disk). This
requires efficient address translation mechanisms.

4. Efficient Use of Memory (Minimizing Fragmentation)

Memory fragmentation refers to the inefficient use of memory due to the allocation
and deallocation of memory blocks of various sizes. Fragmentation can degrade
system performance, so memory management should minimize both external and
internal fragmentation.

●​ External Fragmentation: Occurs when free memory is divided into small
blocks scattered across the memory. Even though there might be enough
total free memory, there may not be enough contiguous memory available to
satisfy a large allocation.
●​ Internal Fragmentation: Occurs when memory blocks are allocated in fixed
sizes, and a process does not use the entire block. The unused portion of the
block is wasted.

Solutions to Fragmentation:

●​ Compaction: Reorganizing memory to eliminate gaps created by fragmented
memory.
●​ Paging and Segmentation: Using fixed-size pages or segments can help
reduce external fragmentation. Paged Memory Allocation divides memory
into fixed-size chunks (pages) to prevent fragmentation.
●​ Buddy System: A memory allocation system where memory is divided into
blocks of power-of-two sizes. This reduces fragmentation by ensuring efficient
allocation.
5. Support for Multiprogramming and Multitasking

Modern systems run multiple processes at the same time (multiprogramming or
multitasking). The memory management system must be able to support the
concurrent execution of multiple processes by allocating and managing memory for
each process efficiently.

●​ Context Switching: Memory management must be able to handle context
switching, where the CPU switches from executing one process to another.
During this process, the memory state of the previous process must be saved
and the memory state of the new process restored.
●​ Shared Memory: Multiple processes may need to share data. Memory
management systems should support shared memory segments to allow
efficient communication between processes.

6. Memory Access Control and Security

Access control mechanisms are vital to protect the system from unauthorized
access and to prevent processes from violating memory boundaries.

●​ Permissions: Memory management systems must ensure that only
authorized processes can access specific memory regions. For instance, a
process may be given read-only access to certain parts of memory and
read-write access to others.
●​ Security: Memory management should also prevent malicious processes
from corrupting or accessing critical areas of the memory (e.g., the kernel
memory).
○​ Separation of user space and kernel space ensures that user
processes cannot directly access or modify kernel memory.

7. Swapping and Paging (Virtual Memory Management)

Virtual memory management allows the OS to run large programs on machines with
limited physical memory by swapping data in and out of secondary storage (disk).
This is done through paging or segmentation, which helps the system use memory
more efficiently.
●​ Paging: Dividing memory into small fixed-size blocks called pages. When the
system is running low on memory, it swaps pages from RAM to disk storage
(swap space) and vice versa.
●​ Swapping: The OS swaps entire processes or parts of processes between
RAM and disk to free up memory for other processes.

8. Support for Dynamic Memory Allocation

Modern applications often require dynamic memory allocation, which is memory
that is allocated at runtime as the application needs it.

●​ Heap Memory: The heap is a region of memory used for dynamic memory
allocation. The operating system needs to manage the allocation and
deallocation of memory in the heap to prevent issues like memory leaks or
segmentation faults.
●​ Garbage Collection: Some systems (e.g., Java, Python) use garbage
collection to automatically reclaim memory that is no longer in use. This helps
in managing memory for long-running applications.

9. Paging, Segmentation, and Combined Paging-Segmentation

Paging and segmentation are two methods of memory management, and some
systems combine both techniques.

●​ Paging: Splits memory into small fixed-size blocks (pages). Each process’s
address space is divided into pages, and these pages are mapped to frames
in physical memory.
●​ Segmentation: Divides the memory into segments that are logical divisions
such as code, stack, data, and heap. Each segment can grow or shrink
independently.
●​ Combined Paging-Segmentation: Some systems combine the two
techniques to allow more flexible memory management.

10. Resource Management


The OS must also track memory resources and manage them across multiple
processes. This involves:

●​ Memory Allocation Strategies: Deciding how to allocate memory to different
processes (first-fit, best-fit, worst-fit).
●​ Memory Pools: Ensuring that memory is allocated efficiently and that memory
is not wasted.
●​ Memory Usage Monitoring: Keeping track of memory usage patterns,
including which regions of memory are being used and how often, to make
better allocation decisions.
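First-fit and best-fit can be sketched as two placement functions over a list of free hole sizes (the hole sizes below are illustrative; worst-fit would pick the largest qualifying hole instead):

```python
def first_fit(holes, request):
    """Index of the first hole large enough for the request, or None."""
    for i, size in enumerate(holes):
        if size >= request:
            return i
    return None

def best_fit(holes, request):
    """Index of the smallest hole large enough (tightest fit), or None."""
    candidates = [(size, i) for i, size in enumerate(holes) if size >= request]
    return min(candidates)[1] if candidates else None

holes = [100, 40, 60, 25]
print(first_fit(holes, 30))  # 0 -> the 100-unit hole, found first
print(best_fit(holes, 30))   # 1 -> the 40-unit hole, tightest fit
```

First-fit is fast but may carve into a large hole unnecessarily; best-fit preserves large holes at the cost of scanning the whole free list and leaving many tiny leftover fragments.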

Summary of Memory Management Requirements


Requirement | Description
Efficient Allocation/Deallocation | Allocate and deallocate memory dynamically to maximize usage and prevent waste.
Process Isolation & Protection | Ensure that processes do not interfere with each other’s memory and prevent unauthorized access.
Address Translation | Translate virtual addresses to physical addresses using paging and segmentation.
Minimize Fragmentation | Reduce both external and internal fragmentation to optimize memory use.
Multiprogramming Support | Handle multiple processes and provide each with its own memory space while sharing resources.
Memory Access Control | Ensure that memory access is secure, with appropriate permissions for processes.
Swapping/Paging | Manage virtual memory through paging, swapping, and efficient use of secondary storage (disk).
Dynamic Memory Allocation | Support dynamic memory allocation at runtime and ensure efficient management of heap memory.
Resource Management | Efficiently manage memory resources, including allocation strategies and memory pools.
Secondary Storage: Overview

Secondary storage refers to non-volatile memory used to store data persistently in a
computer system. Unlike primary memory (RAM), which is volatile and temporarily
stores data that is actively being used, secondary storage provides long-term data
storage, even when the power is turned off. It includes devices like magnetic disks,
optical disks, and magnetic tapes.

Let's look at the details of magnetic disks, other storage devices, and how they
operate.

1. Magnetic Disk: Overview

Magnetic disks are the most widely used type of secondary storage. They store data
by magnetizing small sections (tracks) of a disk’s surface. Data is written to or read
from the disk by read/write heads that move across the surface.

Types of Magnetic Disks:

●​ Hard Disk Drives (HDDs): Traditional magnetic disks found in computers, servers, and other storage systems.
●​ Floppy Disks: An older, portable form of magnetic storage that is now largely
obsolete.

2. Advantages of Magnetic Disks

●​ High Storage Capacity: Magnetic disks offer large storage capacities ranging
from several gigabytes (GB) to multiple terabytes (TB).
●​ Non-Volatility: Data remains stored even when the power is turned off,
making it ideal for long-term storage.
●​ Faster Data Access: Faster than optical media (like CDs/DVDs) for reading
and writing data.
●​ Cost-Effective: Magnetic disks are more affordable than solid-state drives
(SSDs) for the amount of storage they provide.

3. Disadvantages of Magnetic Disks


●​ Mechanical Parts: Magnetic disks have moving parts (e.g., spindle, read/write
heads), which can wear out over time and lead to mechanical failure.
●​ Slower than Solid-State Drives (SSDs): Though fast, magnetic disks are
slower than modern SSDs, which do not have moving parts.
●​ Susceptible to Damage: Physical shocks or vibrations can damage the disk or
corrupt data, particularly in hard disk drives.

4. Organization and Accessing of Data on Disk

Magnetic disks are organized into tracks and sectors:

●​ Tracks: Circular paths on the disk surface where data is stored. Each platter of
a disk has multiple concentric tracks.
●​ Sectors: A track is further divided into smaller segments called sectors, which
typically hold 512 bytes or 4 KB of data.
●​ Cylinders: When multiple platters are used in a disk, the same track number
across different platters forms a cylinder.

Disk Access Mechanism:

1.​ The read/write head moves to the appropriate track.
2.​ Data is read or written to the correct sector.
3.​ This process is managed by the disk controller.
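Under simplifying assumptions (a fixed number of heads per cylinder and sectors per track; real drives use zoned recording and expose only logical block addresses), the track/sector/cylinder layout above maps to and from a linear block number like this:

```python
# Simplified CHS <-> LBA conversion. The geometry below is an illustrative
# assumption, not any real drive's layout. Sectors are numbered from 1 by
# long-standing convention; cylinders and heads from 0.
HEADS = 16       # heads (one track per platter surface) per cylinder
SECTORS = 63     # sectors per track

def chs_to_lba(c, h, s):
    # Walk whole cylinders, then whole tracks, then sectors within the track.
    return (c * HEADS + h) * SECTORS + (s - 1)

def lba_to_chs(lba):
    c, rem = divmod(lba, HEADS * SECTORS)
    h, s = divmod(rem, SECTORS)
    return c, h, s + 1

print(chs_to_lba(0, 0, 1))   # 0 — first sector of the first track
print(lba_to_chs(2048))      # (2, 0, 33)
```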

5. Access Time
Access time is the time it takes to retrieve or write data to the disk. It includes:

●​ Seek Time: The time it takes for the disk’s read/write head to move to the
correct track.
●​ Rotational Latency: The time it takes for the disk to rotate to the correct
sector under the read/write head.
●​ Data Transfer Time: The time taken to transfer data to or from the disk once
the correct sector is under the head.

The total disk access time is the sum of seek time, rotational latency, and data
transfer time.
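A worked example of that sum, using made-up but typical figures for a 7200 RPM drive (the numbers are assumptions for illustration, not a specific product's specs):

```python
# Average disk access time = average seek + rotational latency + transfer time.
rpm = 7200
avg_seek_ms = 9.0            # assumed average seek time
block_kb = 4                 # one 4 KB block
transfer_rate_mb_s = 150.0   # assumed sustained transfer rate

# On average the platter rotates half a revolution before the sector arrives.
rotational_latency_ms = 0.5 * (60_000 / rpm)              # ~4.17 ms
transfer_ms = block_kb / 1024 / transfer_rate_mb_s * 1000  # ~0.026 ms

total_ms = avg_seek_ms + rotational_latency_ms + transfer_ms
print(round(total_ms, 2))   # 13.19
```

Note that seek and rotational latency dominate; the actual data transfer is a tiny fraction of the total, which is why sequential access is so much faster than random access on magnetic disks.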

6. Data Buffer/Cache

The data buffer (or cache) is a small amount of high-speed memory used to store
frequently accessed data. It helps improve disk performance by reducing the
number of times data must be read from or written to the slower magnetic disk.

●​ Read Buffer: Caches data that is frequently read.
●​ Write Buffer: Temporarily holds data that is being written to the disk.

7. Disk Controller

The disk controller is a hardware component responsible for managing the reading
and writing of data on the disk. It communicates between the CPU and the disk,
converting commands from the operating system into actions that the disk hardware
can perform. The disk controller handles tasks such as:

●​ Managing the read/write heads.
●​ Translating logical block addresses into physical addresses on the disk.
●​ Managing data transfer between the disk and system memory.

8. ATA/EIDE Disks
●​ ATA (Advanced Technology Attachment): A standard interface for connecting
hard drives to computers. ATA defines both the physical connection and the
protocol for communication.
●​ EIDE (Enhanced IDE): An extended version of ATA that supports faster data
transfer rates, larger storage capacities, and additional features like CD-ROM
support.

Advantages:

●​ Widely supported by various systems.
●​ Cost-effective and reliable for traditional hard drives.

Disadvantages:

●​ Slower transfer speeds compared to newer interfaces like SATA or SCSI.

9. SCSI Disks (Small Computer System Interface)

SCSI is a set of standards for connecting and transferring data between computers
and peripheral devices. It supports multiple devices on a single bus and offers fast
data transfer rates.

Advantages:

●​ Supports multiple devices on one bus, reducing the need for multiple
controller cards.
●​ High performance and reliability, often used in servers and high-end
workstations.
●​ Can support a wide range of devices (hard drives, scanners, printers, etc.).

Disadvantages:

●​ More expensive than other interfaces like ATA or SATA.
●​ Slightly more complex setup.

10. SATA (Serial ATA) Disks

SATA is a newer interface for connecting hard drives and SSDs to the motherboard,
designed to replace the older IDE/ATA interface.
Advantages:

●​ Faster data transfer rates compared to older ATA/EIDE interfaces.
●​ Smaller cables for better airflow in systems.
●​ Hot-swappable (can add/remove drives without shutting down the system).

Disadvantages:

●​ Still slower than modern SAS (Serial Attached SCSI) or NVMe interfaces for
high-performance storage.
●​ Limited to a smaller number of devices connected per bus.

11. Floppy Disks

Floppy disks were once popular for storing small amounts of data. They are now
obsolete in most systems, but were used extensively in the past for portable storage.

Characteristics:

●​ Capacity: Typically 1.44 MB for a 3.5" floppy disk.
●​ Read/Write Mechanism: Magnetic coating on the disk stores data, which is
read or written by the disk drive's read/write head.

Advantages:

●​ Portable and inexpensive (during their time).
●​ Easy to use for transferring small files between computers.

Disadvantages:

●​ Low storage capacity (obsolete in modern systems).
●​ Slower data transfer speeds.
●​ Prone to damage from magnetic fields or physical impact.

12. RAID (Redundant Array of Independent Disks)

RAID is a data storage technology that combines multiple physical disk drives into
one or more logical units to improve data redundancy and performance.

RAID Levels:
●​ RAID 0: Striping (data is divided into blocks and spread across multiple disks).
Advantage: Fast read/write speeds. Disadvantage: No data redundancy.
●​ RAID 1: Mirroring (data is duplicated on two or more disks). Advantage: Data
redundancy (backup). Disadvantage: Requires double the storage.
●​ RAID 5: Striping with parity (data is striped across multiple disks with parity
information for redundancy). Advantage: Efficient use of space and data
redundancy. Disadvantage: Slower write speeds.
●​ RAID 10: Combination of RAID 1 and RAID 0. Advantage: Both redundancy and
improved performance. Disadvantage: Requires a minimum of 4 disks.
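The capacity trade-offs of these levels follow simple formulas. The sketch below assumes n identical disks and ignores metadata overhead that real arrays reserve:

```python
# Usable capacity of an array of n identical disks of `size` units each.
def usable_capacity(level, n, size):
    if level == 0:
        return n * size         # RAID 0: striping, all space usable, no redundancy
    if level == 1:
        return size             # RAID 1: mirroring, one disk's worth (n copies)
    if level == 5:
        return (n - 1) * size   # RAID 5: one disk's worth consumed by parity
    if level == 10:
        return (n // 2) * size  # RAID 10: mirrored pairs, then striped
    raise ValueError("unsupported RAID level")

print(usable_capacity(5, 4, 2))    # 6  (TB usable from four 2 TB disks)
print(usable_capacity(10, 4, 2))   # 4  (TB usable from the same four disks)
```

The comparison makes the RAID 5 vs RAID 10 trade explicit: RAID 5 yields more usable space from the same disks, while RAID 10 gives faster writes and tolerates certain double-disk failures.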

Advantages:

●​ Improved data reliability (redundancy).
●​ Enhanced performance (faster read/write speeds in some configurations).

Disadvantages:

●​ Requires more disks.
●​ Can be complex to set up and manage.

13. Optical Disks

Optical disks, such as CDs, DVDs, and Blu-ray disks, use laser technology to read
and write data. They are primarily used for storing media files, software, and archival
data.

Advantages:

●​ Large storage capacity (up to several GBs).
●​ Durable and resistant to physical damage (compared to magnetic disks).
●​ Portable and cheap for mass storage and distribution.

Disadvantages:

●​ Slower access times compared to magnetic disks or SSDs.


●​ Not ideal for frequent read/write operations.
●​ Storage capacity is limited compared to modern hard drives and SSDs.

14. Magnetic Tapes


Magnetic tapes are used primarily for data backup and archival storage. Data is
written to and read from the tape in a sequential manner.

Advantages:

●​ Extremely high storage capacity (terabytes per tape).
●​ Cost-effective for long-term data storage and backup.
●​ Reliable for archival purposes.

Disadvantages:

●​ Sequential access only (slower read/write compared to random access devices like hard disks).
●​ Expensive setup cost for high-end tape libraries.
●​ Slower data retrieval compared to other secondary storage solutions.

Summary of Secondary Storage Devices


●​ Magnetic Disks: Advantages: high capacity, cost-effective, persistent storage. Disadvantages: mechanical failure, slower than SSDs.
●​ ATA/EIDE: Advantages: widely supported, cost-effective. Disadvantages: slower than newer interfaces like SATA.
●​ SCSI: Advantages: fast, reliable, supports multiple devices. Disadvantages: expensive, complex setup.
●​ SATA: Advantages: faster than ATA/EIDE, easy to install. Disadvantages: slower than high-performance interfaces (SAS, NVMe).
●​ Floppy Disks: Advantages: portable, inexpensive (historically). Disadvantages: low capacity, obsolete.
●​ RAID: Advantages: data redundancy, improved performance. Disadvantages: complex setup, requires more disks.
●​ Optical Disks: Advantages: durable, cheap, good for mass storage. Disadvantages: slow access times, limited capacity.
●​ Magnetic Tapes: Advantages: high capacity, cost-effective for backups. Disadvantages: slow read/write speeds, sequential access only.
I/O Organization
Introduction:
In computer architecture, I/O (Input/Output) Organization refers to the way a computer system
manages the data exchange between its internal components (like the CPU, memory) and the
outside world (like keyboards, mice, displays, storage devices, etc.).

I/O Organization is how a computer's internal system communicates with the outside world to
send and receive data efficiently, which is crucial for real-time systems like gaming, interactive
devices, and even industrial machinery.

Simple Explanation:

1.​ Input Devices: Devices that send data into the computer, like keyboards, mice, and sensors.
2.​ Output Devices: Devices that display or produce results from the computer, like
monitors, printers, and speakers.
3.​ I/O Controllers: Hardware that helps manage and control data transfer between the
internal system and external devices.

Key Concepts:

●​ I/O Ports: These are physical or virtual ports where input or output devices connect to
the computer.
●​ Buses: These are communication pathways that carry data between the computer's
internal components and external devices.
●​ Direct Memory Access (DMA): A method where data is transferred directly between
memory and I/O devices, bypassing the CPU to speed up data transfer.

Real-Time Applications:

1.​ Touchscreen Devices: Smartphones and tablets use I/O organization to handle touch
input and display output (text, images).
2.​ Printers: Computers send data to printers (output) via an I/O controller to convert digital
documents into physical printouts.

Accessing I/O Devices

Accessing I/O devices refers to how a computer communicates with devices like keyboards,
mice, printers, and storage devices. This involves reading data from or writing data to external
devices, which can be done in different ways. The two most common methods of accessing I/O
devices are Memory-Mapped I/O and I/O Mapped I/O.

1. Memory-Mapped I/O (MMIO)


In Memory-Mapped I/O, I/O devices are treated like memory locations. The addresses for the
devices are part of the same address space used by the computer's RAM. This means that the
CPU can use the same instructions to access both memory and I/O devices.

●​ How it works: The I/O devices are assigned specific addresses within the system's
memory space. When the CPU reads or writes data to those addresses, the data is sent
to or received from the I/O devices.
●​ Advantages:
○​ Easier and faster to program because the CPU uses the same instructions for
memory and I/O.
○​ More flexible, as you can access larger address ranges.
●​ Disadvantages:
○​ Consumes address space that could otherwise be used for RAM.

Example: A keyboard or a display is mapped to specific memory addresses, so reading from the address will get you the input from the keyboard or output to the display.
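A toy model of memory-mapped I/O: the same load/store path serves both RAM and a device, distinguished only by address. The register address and keyboard behavior here are hypothetical, purely for illustration:

```python
# One flat address space: ordinary addresses hit RAM, but loads/stores in the
# device range are routed to a device register instead of memory cells.
RAM_SIZE = 0x1000
KBD_DATA = 0x2000        # hypothetical keyboard data register address
ram = bytearray(RAM_SIZE)
kbd_queue = [ord('A')]   # pretend one keypress is waiting

def load(addr):
    if addr == KBD_DATA:                       # device register, not RAM
        return kbd_queue.pop(0) if kbd_queue else 0
    return ram[addr]

def store(addr, value):
    if addr == KBD_DATA:                       # read-only register in this sketch
        return
    ram[addr] = value

store(0x10, 42)
print(load(0x10))          # 42  -- an ordinary memory access
print(chr(load(KBD_DATA))) # A   -- the same load instruction reads the device
```

The point of the sketch is that the CPU needs no special I/O instructions: the address decoder decides whether a load hits RAM or a device.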

2. I/O Mapped I/O (Port-Mapped I/O)

In I/O Mapped I/O, I/O devices have their own separate address space (different from the
memory address space). The CPU uses special instructions to access I/O devices.

●​ How it works: The I/O devices are assigned separate addresses, and the CPU uses
specific I/O instructions (like IN or OUT in assembly language) to access these devices.
●​ Advantages:
○​ Memory space for RAM is not taken up by I/O devices.
○​ More secure and simpler to manage since memory and I/O are separate.
●​ Disadvantages:
○​ The CPU requires different instructions to interact with I/O devices, making
programming slightly more complex.

Example: A printer or disk drive might be accessed through I/O instructions that target its
specific port addresses.

3. I/O Interface for Input Devices

An I/O interface is a hardware component that allows communication between the CPU and the
input device (like a keyboard or mouse). The interface converts signals from the input device
into data the CPU can understand and vice versa.

●​ How it works: When a user interacts with an input device, the interface converts the
physical input (e.g., keypress, mouse movement) into data that the CPU processes.
●​ Example: For a keyboard, each keypress generates a unique code. The interface
transmits this code to the CPU, which interprets it.
What is an Interrupt?

An interrupt is a mechanism that temporarily halts the current execution of a program or process in a computer system, allowing the CPU to immediately respond to important events or
requests. Once the interrupt is serviced (i.e., the CPU addresses the issue or request), the CPU
returns to the previous task or process. Interrupts enable a computer to handle multiple tasks
efficiently and prioritize critical operations.

In simpler terms, an interrupt is like a "signal" that says, "Hey, stop what you're doing and deal
with this right now!"

Types of Interrupts

Interrupts can be categorized based on their source and how they are handled. Here are the
main types of interrupts:

1. Hardware Interrupts

●​ Definition: These interrupts are generated by external hardware devices or peripherals (like a keyboard, mouse, or printer) to alert the CPU that an event requires its attention.
●​ Example: Pressing a key on the keyboard triggers a hardware interrupt, telling the CPU
to process the keypress.
●​ Characteristics: Hardware interrupts are asynchronous, meaning they can occur at any
time during program execution.
●​ Example Devices: Keyboard, mouse, printer, timer, disk drives.

2. Software Interrupts

●​ Definition: These interrupts are generated by software (programs or processes) running on the computer. They are often used for system calls, where a program requests a service from the operating system.
●​ Example: A program requesting to read from a file might trigger a software interrupt to
ask the OS to handle the task.
●​ Characteristics: Software interrupts are synchronous, meaning they occur as part of the
program's execution flow.
●​ Example Uses: System calls, debugging, exception handling (like division by zero).

3. Internal Interrupts (Exceptions)


●​ Definition: These interrupts occur as a result of errors or exceptional conditions during
the execution of a program. They are triggered by the CPU itself.
●​ Example: A program trying to divide a number by zero triggers an exception interrupt.
●​ Characteristics: Internal interrupts are usually errors or conditions that require
immediate attention, like invalid memory access or arithmetic errors.
●​ Example Types: Division by zero, invalid opcode, page fault (in virtual memory
systems).

4. External Interrupts

●​ Definition: These interrupts are generated by external hardware devices (e.g., a peripheral or external sensor).
●​ Example: A network card receiving a data packet might send an external interrupt to the
CPU, signaling that it should process the incoming data.
●​ Characteristics: External interrupts are typically asynchronous and occur in response to
external events or stimuli.
●​ Example Devices: Network interface cards, external sensors, or buttons.

How Interrupts Work:

1.​ Interrupt Request (IRQ): An interrupt is initiated by the hardware or software, which
sends an interrupt request to the CPU.
2.​ Interrupt Acknowledgment: The CPU stops executing the current instruction and
acknowledges the interrupt request.
3.​ Interrupt Service Routine (ISR): The CPU runs a special function or routine (ISR)
designed to handle the interrupt. The ISR takes care of the interrupt and processes it.
4.​ Return to Normal Execution: After the ISR finishes, the CPU resumes its previous task
or program where it left off.
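The four steps above can be sketched as a loop that checks for pending requests between instructions. This is a simulation of the idea, not real interrupt hardware:

```python
# Toy interrupt cycle: the CPU checks a pending-interrupt "line" between
# instructions, runs the ISR, then resumes the interrupted program.
pending = []   # pretend interrupt-request line (names are illustrative)
log = []

def isr(source):
    # Interrupt Service Routine: handle the event, then return to the CPU loop.
    log.append(f"ISR handled {source}")

def run(program):
    for instruction in program:
        # Step 1-2: request detected and acknowledged between instructions.
        while pending:
            isr(pending.pop(0))        # step 3: run the ISR
        log.append(f"executed {instruction}")  # step 4: resume normal execution

pending.append("keyboard")
run(["add", "load", "store"])
print(log)
# ['ISR handled keyboard', 'executed add', 'executed load', 'executed store']
```

Real CPUs additionally save and restore the program counter and registers around the ISR; that bookkeeping is implicit in Python's function call here.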

Example in Real Life:

Imagine you're cooking dinner (the CPU running a program) and your phone rings (an interrupt).
You stop cooking (pause the current task) and answer the call (handle the interrupt). After the
call is finished, you return to cooking (resume the original task).

Interrupts allow the CPU to be more efficient by responding quickly to important tasks without
having to constantly check for them manually (polling).

What is Direct Memory Access (DMA)?


Direct Memory Access (DMA) is a method of transferring data between an I/O device (like a
hard disk or keyboard) and the system's memory without involving the CPU. It allows data to
be moved directly between the memory and the device, freeing up the CPU to perform other
tasks. This method is faster and more efficient than using the CPU to move data because it
bypasses the need for the CPU to control every single byte of data transfer.

Key Points:

●​ Speed: DMA reduces CPU load, allowing faster data transfers.
●​ Efficiency: DMA allows simultaneous data transfer and processing.
●​ Automated Data Transfer: Once the DMA controller is set up, it handles the transfer
without requiring CPU intervention for each byte.

How DMA Works:

1.​ Initiation: The CPU sends a command to the DMA controller to set up the transfer,
specifying the memory address, the I/O device, and the amount of data to be transferred.
2.​ Transfer: The DMA controller takes control of the system's data bus and transfers data
directly between the I/O device and memory.
3.​ Completion: Once the transfer is complete, the DMA controller sends an interrupt to the
CPU to notify it that the transfer is finished, allowing the CPU to resume its normal tasks.

Steps in DMA Transfer:

1.​ CPU to DMA: The CPU initializes the DMA by sending the source and destination
addresses, the direction (read/write), and the number of data units.
2.​ DMA Controller: The DMA controller handles the data transfer directly, accessing the
memory and the I/O device without CPU involvement.
3.​ Interrupt: After completing the transfer, the DMA controller sends an interrupt to the
CPU, signaling that the task is finished.
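The three steps above, as a toy simulation in which the "controller" copies the whole block without the CPU touching individual bytes and then raises a completion interrupt:

```python
# Toy DMA transfer. All names and the 16-byte payload are illustrative.
memory = bytearray(64)                       # system RAM
device_buffer = bytearray(b"hello from disk!")  # data sitting in the I/O device
cpu_interrupts = []                          # pretend interrupt line to the CPU

def dma_transfer(src, dst, dst_offset, count):
    # Step 2: the DMA controller moves the block directly, no per-byte CPU work.
    dst[dst_offset:dst_offset + count] = src[:count]
    # Step 3: completion interrupt tells the CPU the transfer is done.
    cpu_interrupts.append("DMA complete")

# Step 1: the CPU programs source, destination, and length, then continues
# with other work while the transfer proceeds.
dma_transfer(device_buffer, memory, 0, len(device_buffer))
print(memory[:16].decode())   # hello from disk!
print(cpu_interrupts)         # ['DMA complete']
```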

Types of DMA:

1.​ Burst Mode DMA: The DMA controller transfers a block of data in a single burst. The
CPU is locked out during this burst and cannot perform other tasks.
2.​ Cycle Stealing DMA: The DMA controller transfers data one byte at a time, stealing
cycles from the CPU. The CPU gets control after each byte is transferred.
3.​ Block Mode DMA: Similar to burst mode, but the DMA controller transfers data in larger
blocks, giving the CPU more time between transfers.
4.​ Demand Mode DMA: The DMA controller transfers data when the I/O device is ready,
and the CPU can continue with other tasks in the meantime.
+-----------+         +----------------+         +------------+
|    CPU    |<------->| DMA Controller |<------->| I/O Device |
+-----------+         +----------------+         +------------+
      |                       |                        |
      |                       v                        v
      |                 +----------+          +---------------+
      +---------------->|  Memory  |<---------| Data Transfer |
                        +----------+          +---------------+

Explanation of Diagram:

1.​ CPU to DMA: The CPU sets up the DMA controller with the required information
(source, destination, and data size).
2.​ DMA Controller: Once initialized, the DMA controller takes over and transfers data
between the memory and I/O device without involving the CPU.
3.​ Memory: Data is moved from memory to the I/O device or from the I/O device to
memory.
4.​ Interrupt: After completing the data transfer, the DMA controller sends an interrupt to the
CPU, notifying it that the task is finished.

Advantages of DMA:

●​ Efficiency: The CPU is freed from manually managing every byte of data transfer,
allowing it to focus on more important tasks.
●​ Speed: Direct transfers without CPU intervention lead to faster data transfers.
●​ Concurrent Operation: DMA enables data transfer and CPU processing to occur
simultaneously, enhancing overall system performance.

What is a Bus in Computer Architecture?

A bus in computer architecture is a set of communication pathways or lines that allow data to be
transferred between different components of the computer, such as the CPU, memory, and
input/output devices. It acts like a "highway" that data, addresses, and control signals travel on,
enabling different parts of the computer to communicate with each other.

There are three main types of buses:

1.​ Data Bus: Carries data between the CPU, memory, and peripherals.
2.​ Address Bus: Carries memory addresses from the CPU to the memory and I/O devices.
3.​ Control Bus: Carries control signals that coordinate the operations of different
components (e.g., read/write signals).

Types of Buses

There are two major types of buses based on how they synchronize data transfer:
Synchronous Bus and Asynchronous Bus.

1. Synchronous Bus

A synchronous bus is a bus system where the data transfer is synchronized with a clock
signal. The clock signal regulates the timing of the data transfer, ensuring that both the sender
and receiver are ready to exchange data at the same time.

Key Features:

●​ Clock Signal: The operation of a synchronous bus depends on a central clock signal
that synchronizes data transfer between components.
●​ Timing: Data is transferred in sync with the clock, meaning every transfer happens at
fixed intervals.
●​ Faster Transfer: Since transfers are synchronized, the timing is predictable and faster.
●​ Efficiency: As both the sender and receiver are synchronized, the system ensures that
data is transferred only when both parties are ready.

Working:

In a synchronous bus, both the sender and receiver wait for the clock signal before sending or
receiving data. This reduces the chances of data being missed or out of sync.

Example:

When a CPU sends data to memory, the transfer will happen at a fixed clock cycle (e.g., every
5ns, 10ns), which ensures consistency and timing precision.
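Because one transfer completes per clock cycle, the peak throughput of a synchronous bus follows directly from the clock rate and the bus width. The figures below are illustrative, not a specific bus standard:

```python
# Peak throughput of a synchronous bus: one word transferred per clock cycle.
clock_hz = 100_000_000   # assumed 100 MHz bus clock
width_bytes = 4          # assumed 32-bit data bus

peak_bytes_per_s = clock_hz * width_bytes
print(peak_bytes_per_s // 1_000_000, "MB/s")   # 400 MB/s
```

Real buses fall short of this peak because of arbitration, turnaround, and wait-state cycles, but the formula gives the upper bound.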

Advantages:

●​ Predictable Timing: Since the clock controls the flow, the timing of the transfer is
predictable and managed.
●​ Faster Transfers: Typically faster because the system works in a regular and organized
way.

Disadvantages:
●​ Dependence on Clock: The entire system depends on the clock speed, meaning higher
clock speeds are needed for faster data transfers.
●​ Complexity: The system may require more complex control mechanisms to handle
various devices.

2. Asynchronous Bus

An asynchronous bus does not rely on a clock signal for synchronization. Instead, data
transfer is controlled by handshaking signals that indicate when the sender and receiver are
ready for data transfer. In other words, the sender and receiver communicate using control
signals, and the data is transferred when both devices are ready.

Key Features:

●​ No Clock Signal: The transfer does not rely on a common clock, making it more flexible.
●​ Handshaking: Communication occurs through handshaking signals. The sender signals
when data is ready, and the receiver signals when it's ready to receive the data.
●​ Variable Timing: The data transfer happens at variable intervals, as it is dependent on
the readiness of the components involved.

Working:

In an asynchronous bus, the sender places data on the bus and then signals the receiver using
control signals that the data is ready. The receiver, in turn, acknowledges the receipt of the data,
and the transfer happens without a fixed timing reference.

Example:

A peripheral device (like a keyboard or mouse) sends data to the CPU. It sends a signal to the
CPU when the data is ready. The CPU reads the data when it is available, without relying on a
clock cycle.
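The handshake can be sketched as a four-phase exchange over shared REQ/ACK flags. This is a simulation of the idea, not any particular bus protocol:

```python
# Four-phase handshake: request/acknowledge signals replace the clock.
# Each side only proceeds once the other side has responded.
events = []

def sender_put(channel, data):
    channel["data"] = data
    channel["req"] = True            # 1. sender raises REQ: data is valid
    events.append("REQ high")

def receiver_get(channel):
    if channel["req"]:
        value = channel["data"]      # 2. receiver latches the data
        channel["ack"] = True        # 3. receiver raises ACK
        events.append("ACK high")
        return value

def handshake_complete(channel):
    if channel["ack"]:
        channel["req"] = False       # 4. sender drops REQ, receiver drops ACK,
        channel["ack"] = False       #    and the bus is ready for the next word
        events.append("handshake done")

bus = {"data": None, "req": False, "ack": False}
sender_put(bus, 0x41)
print(receiver_get(bus))   # 65
handshake_complete(bus)
print(events)              # ['REQ high', 'ACK high', 'handshake done']
```

Notice that nothing in the exchange references time: a slow receiver simply raises ACK later, which is exactly why asynchronous buses tolerate devices of different speeds.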

Advantages:

●​ Flexibility: Can work with devices that have different operating speeds.
●​ No Need for Synchronization Clock: There’s no central clock, so devices can operate
independently of each other.

Disadvantages:

●​ Slower Data Transfers: Since there’s no clock, the handshaking process can make data
transfers slower compared to synchronous systems.
●​ Complex Communication: More control signals are needed to manage the data
transfer, which can make the system design more complex.
What are Interface Circuits?

Interface circuits are electronic circuits that allow communication between different
components of a computer system, such as the CPU, memory, and I/O devices (e.g., keyboard,
printer, disk drives). These circuits help translate the signals from one component to a form that
another component can understand. In simpler terms, an interface circuit acts as a translator
between the different parts of a computer system, enabling smooth data communication.

Types of Interface Circuits

1.​ Input Interface Circuits
2.​ Output Interface Circuits
3.​ Standard I/O Interface

1. Input Interface Circuits

Input interface circuits are used to connect input devices (e.g., keyboard, mouse, sensors) to
the computer system so that the data from these devices can be received by the CPU or
memory. These circuits manage the electrical signals from the input device and convert them
into a format that the CPU can understand.

Key Functions:

●​ Signal Conversion: They convert the signals from the input device (which may be in
analog or digital form) into a format suitable for the CPU (usually digital signals).
●​ Data Synchronization: Ensures that the data from the input device is correctly
synchronized with the system’s clock or communication protocol.
●​ Buffering: Stores data temporarily in a buffer before passing it to the CPU, preventing
data loss.

Example:

When you press a key on a keyboard, the input interface converts the keypress signal into
binary data, which is then sent to the CPU for processing.

2. Output Interface Circuits


Output interface circuits are used to connect output devices (e.g., monitors, printers,
speakers) to the computer system. These circuits convert the data generated by the CPU into a
format that the output device can process.

Key Functions:

●​ Signal Conversion: Converts the digital data from the CPU into a form that the output
device can understand. For example, converting digital data into analog signals for a
speaker.
●​ Data Formatting: Organizes the data according to the output device's specifications
(e.g., for printing a document or displaying an image on the screen).
●​ Buffering: Temporarily stores data to prevent overflow or delays in processing.

Example:

When you print a document, the output interface takes the digital data from the CPU and
converts it into a format that the printer can understand, such as control signals or paper
instructions.

3. Standard I/O Interface

A Standard I/O Interface defines a common set of rules, protocols, and connectors that
standardizes the communication between the computer system and external devices. These
interfaces are designed to provide compatibility across different systems and devices, ensuring
they can work together smoothly.

Key Features:

●​ Standardized Communication: Ensures that different devices can communicate with the CPU and other system components using a common language or protocol.
●​ Plug-and-Play: Devices can be easily connected to the system without requiring
extensive configuration.
●​ Protocol Compatibility: Defines a standard communication protocol that both the
device and the system understand, making integration easier.

Examples of Standard I/O Interfaces:

●​ USB (Universal Serial Bus): A widely used interface for connecting various devices like
keyboards, mice, printers, and external drives.
●​ PCI (Peripheral Component Interconnect): A standard used for connecting internal
components, such as network cards or graphics cards, to the motherboard.
●​ Serial and Parallel Ports: Older standards for connecting devices like printers and
external modems.
What is PCI (Peripheral Component Interconnect)?

PCI (Peripheral Component Interconnect) is a standard for connecting peripheral devices (such as network cards, sound cards, graphics cards, etc.) to a computer's motherboard. It
provides a high-speed data path for these devices to communicate with the CPU and memory.
PCI slots are used to insert expansion cards that extend the functionality of a computer.

Key Features:

●​ High-Speed Communication: PCI provides a fast data path, with bus clocks of 33 MHz or 66 MHz (extended to 133 MHz by the later PCI-X variant).
●​ Plug-and-Play: PCI devices can be added to the system without needing manual
configuration of settings like IRQ or memory addresses.
●​ Bus Architecture: PCI uses a shared parallel bus: many data lines carry the bits of a word at once, and the attached devices arbitrate for turns on the same data path.

Example:

A graphics card or network adapter installed in a PCI slot on the motherboard to expand the
system's capabilities.

What is SCSI (Small Computer System Interface)?

SCSI (Small Computer System Interface) is a set of standards for connecting and transferring
data between computers and peripheral devices, such as hard drives, scanners, and printers. It
provides a flexible and fast communication protocol for a wide range of devices.

Key Features:

●​ Multiple Devices: SCSI supports multiple devices on a single bus, allowing devices like
hard drives, printers, and CD drives to be connected simultaneously.
●​ Fast Data Transfer: SCSI provides faster data transfer rates compared to older
standards like parallel ports or serial connections.
●​ Versatile: SCSI can be used for both internal (inside the computer) and external
(connected to the computer) devices.

Example:

You might use a SCSI interface to connect multiple hard drives to a server or workstation,
providing high-speed data transfer and expandability.

What is USB (Universal Serial Bus)?


USB (Universal Serial Bus) is a widely-used standard for connecting various external devices
to a computer, such as keyboards, mice, printers, external storage devices, and more. It is
designed to be simple, universal, and provide power to connected devices.

Key Features:

●​ Ease of Use: USB is plug-and-play, meaning you can connect devices without restarting
the computer or manually configuring settings.
●​ Data and Power: USB not only transfers data but also supplies power to low-power
devices, eliminating the need for separate power cables.
●​ Multiple Device Support: USB can connect multiple devices through hubs, allowing
many devices to be connected to a single USB port.
●​ Hot-Swappable: Devices can be connected and disconnected without turning off the
computer.

Example:

A printer, flash drive, or smartphone can be connected to a computer using a USB port to
transfer data or charge the device.

Summary:

●​ PCI: A high-speed bus standard for connecting internal devices like graphics cards or
network cards to the motherboard.
●​ SCSI: A flexible and high-speed interface used for connecting multiple external devices
like hard drives and scanners.
●​ USB: A universal and widely used standard for connecting external devices to a
computer, providing both data transfer and power.