Computer Architecture
Computer Architecture refers to how the parts of a computer system are organized and how
they work together to process information. Think of it like the blueprint or design of a house, but
for computers.
Simple Definition
Computer architecture is the design and structure of a computer's hardware and system
components, such as the processor (CPU), memory (RAM), input/output devices, and how they
communicate with each other.
Real-World Example
1. Smartphones: When you use a smartphone to open an app, the phone's processor
(CPU) follows a set of instructions to fetch the app from storage, load it into memory
(RAM), and display it on your screen. The architecture of the phone ensures that all
these parts work together efficiently and quickly.
Functional Units
1. Input Unit
● What it does: It takes input from devices like a keyboard, mouse, microphone, or
sensors and sends it to the computer for processing.
● Example: Typing on a keyboard or clicking with a mouse.
● Real-World Analogy: Like a receptionist who receives documents and hands them to
the processing department.
2. Central Processing Unit (CPU)
The CPU is the brain of the computer. It has three main parts:
Control Unit (CU)
● What it does: It acts like a manager, directing other parts of the computer to do their
jobs.
● Example: Tells the processor to fetch instructions, decode them, and execute them.
● Real-World Analogy: A manager in a factory giving instructions to workers.
Arithmetic Logic Unit (ALU)
● What it does: It performs calculations (addition, subtraction, etc.) and logic operations
(like comparing numbers).
● Example: Solving a math problem.
● Real-World Analogy: A worker in a factory who does calculations or checks product
quality.
3. Memory Unit
● What it does: It stores data and instructions. There are two types:
○ RAM (Random Access Memory) – Temporary memory used while the computer
is on.
○ ROM (Read-Only Memory) – Permanent memory that stores important
instructions.
● Example: When you open an app, it is loaded into RAM so you can use it quickly.
● Real-World Analogy: Like a bookshelf where books (data) are stored for workers to
read.
Registers
● What are they? Registers are tiny storage locations inside the CPU. They hold small
amounts of data that the CPU is currently working on.
● Speed: Fastest type of memory.
● Size: Very small (a few bytes to kilobytes).
● Example: When you do math on a calculator, the numbers you're currently typing are
stored in registers until the final result is displayed.
● Real-World Analogy: Imagine a chef cooking in a kitchen. The chef keeps essential
ingredients (like salt, pepper, or spices) right on the counter. These are like registers —
small, quick to access, but can only hold a few items.
The Instruction Cycle
1. Fetch
● What happens? The CPU takes the instruction (like "2 + 3") from memory and brings it
into the CPU.
● How it works? The Address Bus points to the location in memory where the instruction
is stored, and the instruction is transferred to the CPU through the Data Bus.
● Real-Life Example:
○ Imagine you’re a student doing homework. The first step is to open your book to
get the question.
○ For example, the question might be: "Add 2 + 3".
1. FETCH: CPU gets the instruction "2 + 3" from memory. (Student finds the math question in their book.)
2. DECODE: CPU understands that it needs to "add" 2 and 3. (Student reads and understands the math problem.)
3. EXECUTE: CPU's ALU adds 2 and 3 to get 5. (Student solves the math problem to get 5.)
4. STORE: CPU stores the result (5) in memory and shows it on the screen. (Student writes the answer (5) in their notebook.)
The Instruction Code is a small set of binary (0s and 1s) instructions that tells the CPU what to
do.
Each instruction is like a secret code that only the CPU understands.
A simple instruction like "ADD 2 + 3" is encoded as a pattern of bits with two parts:
● Operation Code (Opcode): What should be done (like ADD, SUBTRACT, MULTIPLY,
etc.).
● Operands: The data to work with (like the 2 and 3 in "2 + 3").
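To make the opcode/operand split concrete, here is a minimal Python sketch, assuming a made-up 16-bit format (a 4-bit opcode followed by two 6-bit operand fields; this is not any real ISA):

# Pack a made-up instruction: 4-bit opcode + two 6-bit operands = 16 bits.
OPCODES = {"ADD": 0b0001, "SUB": 0b0010, "MUL": 0b0011}

def encode(op, a, b):
    # Shift the opcode and first operand into place, then OR everything together.
    return (OPCODES[op] << 12) | (a << 6) | b

word = encode("ADD", 2, 3)
print(format(word, "016b"))  # prints 0001000010000011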
Bus Structure:
A bus in computer architecture is like a highway inside the computer. It helps different parts of
the computer, like the CPU, memory, and input/output (I/O) devices, communicate with each
other. Without a bus, these components would not be able to share information.
What is a Bus?
A bus is a collection of wires, paths, or channels used to transfer data, instructions, and
control signals between different parts of a computer.
There are three main types of buses:
1. 🏠 Address Bus — Carries addresses (locations) where data should be sent.
2. 📦 Data Bus — Carries actual data (like numbers, text, or images).
3. 🚦 Control Bus — Carries control signals (like "Read", "Write", or "Stop").
📦 Data Bus
● Purpose: It carries the actual data, instructions, and information between the CPU,
memory, and input/output devices.
● Who Uses It? The CPU, memory, and input/output devices use the Data Bus to send
and receive actual data.
● Bidirectional: It goes in both directions — from CPU to memory and from memory
back to the CPU.
🚦 Control Bus
● Purpose: It controls when and how the CPU, memory, and I/O devices communicate. It
sends control signals like read, write, halt, and reset.
● Who Uses It? The Control Unit (part of the CPU) sends control signals to the other
components via the Control Bus.
● Bidirectional: The control signals go both ways between the CPU, memory, and
input/output devices.
How the Address Bus, Data Bus, and Control Bus Work
Together
When you open a file or run an app, the CPU uses all three buses together to complete the
task.
Here’s how it works step-by-step when you press a key on the keyboard (like typing "A"):
● The CPU uses the Address Bus 🏠 to tell the memory, "I need the data stored at location
1234."
○ Address Bus: Location = 1234
● The CPU sends a "READ" command using the Control Bus 🚦 to tell the memory, "Read
the data at address 1234."
○ Control Bus: Command = READ
● The memory sends the requested data ("A") back to the CPU through the Data Bus 📦.
○ Data Bus: Data = A
● The Address Bus (🏠) goes from the CPU to memory and I/O to specify where data
should be read or written.
● The Data Bus (📦) carries the actual data back and forth between the CPU, memory,
and I/O devices.
● The Control Bus (🚦) tells components what to do (like READ or WRITE) and ensures
smooth communication.
🚌 1. Single Bus Structure
+-----------+   +------------+   +-------------+
|    CPU    |   |   Memory   |   | I/O Devices |
+-----------+   +------------+   +-------------+
      |               |                 |
+-------------------------------------+
| Address Bus | Data Bus | Control Bus |
+-------------------------------------+
● How It Works:
○ The CPU, memory, and I/O devices are connected to a single, shared bus.
○ When one device (like the CPU) wants to send data, all other devices must wait
until the bus is free.
○ The Address Bus, Data Bus, and Control Bus are all combined into a single,
shared bus.
● Advantages:
○ Simple design — Easy to design and cost-effective.
○ Low cost — Fewer components and simpler connections.
● Disadvantages:
○ Slow speed — Since only one device can send data at a time, others have to
wait.
○ Data collision — Multiple devices may try to send data at the same time,
causing delays.
● Where Used?
○ Old computers, simple microcontrollers, and simple embedded systems.
📦 2. Multiple Bus Structure
A multiple bus structure means that the CPU, memory, and I/O devices have separate buses
for each type of communication. Instead of sharing a single highway, they have dedicated
roads for each task.
+-------------+
|     CPU     |
+-------------+
   |         |
   v         v
Memory Bus   I/O Bus
Multiple Bus Structure is like having a highway with multiple lanes. Each lane is for different
vehicles (like trucks, cars, and buses) going to different locations.
Double Bus Structure is like having two roads — one for delivery trucks (data) and one for
postal vans (instructions) so they don’t block each other.
Software Performance:
Software refers to the collection of programs, applications, and operating systems that tell a
computer or device what to do. Unlike hardware (the physical parts of a computer), software is
the set of instructions that makes the hardware work.
🔹 1. System Software
System software controls the hardware and provides a platform for other software to run. It
acts like a manager for the entire system.
📌 What it Does:
● It controls and manages the hardware (like CPU, memory, and devices).
● It provides a platform to run other software (like apps).
🔹 2. Application Software
Application software is designed to help users perform specific tasks like writing, drawing, or
playing games. It is what you use directly.
📌 What it Does:
● It allows users to do tasks like typing, editing, calculating, drawing, and playing games.
● It runs on top of system software (like Windows or Android).
🔹 3. Programming Software
Programming software provides the tools developers use to create other software.
📌 What it Does:
● It allows developers to write, edit, and test code to create new software.
● It provides tools to detect errors (bugs) and fix them.
Software performance refers to how well software works. It measures how fast and efficient a
program is at completing tasks. In simpler terms, it is about how quickly and smoothly a
software or application runs.
📌 1. Clock Speed
The clock speed (measured in GHz) is the number of cycles the CPU completes per second.
📋 Why is it Important?
● Higher clock speed = Faster CPU.
● It directly affects how fast your computer can run programs.
📌 2. Processor Cache
The cache is a small amount of super-fast memory inside the CPU. It stores frequently used
data so that the CPU can access it quickly.
📋 Types of Cache:
1. L1 Cache: Smallest but fastest cache, located inside the CPU.
2. L2 Cache: Slightly bigger but slower than L1.
3. L3 Cache: Larger, but slower than L2, shared by all CPU cores.
📋 Why is it Important?
● More cache = Faster CPU.
● When the cache is big, the CPU can store more data close to it, reducing the time to
access memory.
📋 Equation:
CPU Time = Instruction Count × CPI × Clock Cycle Time
Where:
● Instruction Count = the number of instructions the program executes.
● CPI = the average number of clock cycles per instruction.
● Clock Cycle Time = the duration of one clock cycle (1 / clock speed).
📋 Example Problem 1
Problem:
A program has 2 billion instructions.
The CPU takes 2 clock cycles per instruction (CPI = 2).
The clock speed is 2 GHz (which means 1 cycle = 1 / 2 GHz = 0.5 nanoseconds).
Solution:
CPU Time = 2 × 10⁹ instructions × 2 cycles/instruction × 0.5 ns
CPU Time = 2 × 2 × 0.5 seconds
CPU Time = 2 seconds
📋 Example Problem 2
Problem:
A program has 1.5 billion instructions.
The CPU has a clock speed of 3 GHz.
It takes 1.5 cycles per instruction (CPI = 1.5).
Solution:
Clock cycle time = 1 / 3 GHz ≈ 0.333 ns
CPU Time = 1.5 × 10⁹ × 1.5 × 0.333 ns
CPU Time = 0.75 seconds
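For a quick check, the formula can be evaluated in a few lines of Python; the helper below simply multiplies the three factors and reproduces both example answers:

# Applies CPU Time = Instruction Count x CPI x Clock Cycle Time.
def cpu_time(instructions, cpi, clock_hz):
    cycle_time = 1 / clock_hz          # seconds per clock cycle
    return instructions * cpi * cycle_time

print(cpu_time(2e9, 2, 2e9))      # Example 1 -> 2.0 seconds
print(cpu_time(1.5e9, 1.5, 3e9))  # Example 2 -> 0.75 seconds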
📘 1. Memory Locations and Addresses
When we store data in a computer, it needs to be placed somewhere so that it can be found
later. This "somewhere" is called a memory location. To identify where the data is stored, every
memory location is given a unique address. Think of it like how houses on a street have unique
house numbers so that you can find the correct house.
📘 2. Word Length
● What it is: The size of data the CPU processes at once (in bits, like 8, 16, 32, or 64 bits).
● How it works: The larger the word length, the more data the CPU can process in one
go.
● Example: A 64-bit CPU can process 64 bits of data at a time, while a 32-bit CPU
processes only 32 bits.
📘 4. Byte Addressing
● What it is: Each byte (8 bits) in memory has a unique address.
● How it works: To access data, the CPU references the address of the exact byte.
● Example: If memory starts at address 0x0000, then 0x0001 is the address of the next
byte.
📘 5. Word Addressing
● What it is: Instead of addressing each byte, the CPU addresses whole words (like 2, 4,
or 8 bytes at a time).
● How it works: Each "word" gets a unique address, and the address jumps by the word
size.
● Example: If the word size is 4 bytes, and the first word is at address 0x0000, the next
word will be at 0x0004.
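A short Python sketch of the difference, assuming a 4-byte word size as in the example above:

# Byte vs. word addressing: with 4-byte words, consecutive word
# addresses are 4 byte-addresses apart.
WORD_SIZE = 4

def word_start(word_index):
    # The byte address where a given word begins.
    return word_index * WORD_SIZE

for i in range(4):
    print(f"word {i} starts at byte address {word_start(i):#06x}")
# word 0 -> 0x0000, word 1 -> 0x0004, word 2 -> 0x0008, word 3 -> 0x000c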
Memory Operations:
📘 1. What is an Instruction?
● What it is: An order or command given to the CPU to perform a specific task.
● How it works: Instructions tell the CPU what to do, like add numbers, move data,
or make decisions.
● Example: An instruction like ADD A, B tells the CPU to add the values in registers
A and B.
📘 2. Types of Instructions
🔹 1. Data Transfer Instructions
● What it is: Used to move data from one location to another.
● Example: MOV A, B moves data from register B to register A.
🔹 2. Arithmetic Instructions
● What it is: Used to perform math operations like add, subtract, multiply, and divide.
● Example: ADD A, B adds the values in registers A and B.
🔹 3. Logical Instructions
● What it is: Used to perform logical operations like AND, OR, and NOT.
● Example: AND A, B performs the logical AND operation on values in A and B.
🔹 4. Control Instructions
● What it is: Used to change the flow of program execution (like loops and jumps).
● Example: JUMP 100 moves the program execution to address 100.
📘 3. Instruction Formats (by Number of Addresses)
🔹 1. Three-Address Instruction
● What it is: Uses 3 addresses (two sources and one destination).
● Syntax: OPCODE DEST, SOURCE1, SOURCE2
● Example:
ADD C, A, B ; C = A + B
🔹 2. Two-Address Instruction
● What it is: Uses 2 addresses (source and destination, one acts as both input and
output).
● Syntax: OPCODE DEST, SOURCE
ADD A, B ; A = A + B
🔹 3. One-Address Instruction
● What it is: Uses 1 address for data, with the second operand stored in a special register
(like an Accumulator).
● Syntax: OPCODE ADDRESS
● Example:
ADD 1000 ; ACC = ACC + value at memory address 1000
📘 1. What is an Accumulator?
● What it is: A special register in the CPU that stores the result of arithmetic and logic operations.
● How it works: It temporarily holds data for operations like ADD, SUB, AND, etc.
● Example: When you add two numbers, the result is stored in the accumulator before being
moved elsewhere.
🔹 4. Zero-Address Instruction
● What it is: Uses no addresses, as it works with data in a stack (a special memory
structure).
● Syntax: OPCODE
● Example:
ADD ; Pop two values from the stack, add them, and push result back
Explanation: The stack stores values, and ADD pops two values, adds them, and pushes the
result back.
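A minimal Python sketch of a stack machine, showing why a zero-address ADD needs no operand addresses (the stack itself supplies them):

# Toy stack machine: ADD names no addresses because operands come
# from the top of the stack.
stack = []

def push(value):
    stack.append(value)

def add():
    # Pop two values, add them, and push the result back.
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

push(2)
push(3)
add()
print(stack)  # [5]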
Addressing Modes
Example (Register Addressing):
MOV A, B ; Copy the value from register B to register A
Example (Direct Addressing):
MOV A, [1000] ; Get the data from memory address 1000 and store it in
register A
Example (Register Indirect Addressing):
MOV R1, 1000 ; Store the memory address 1000 in register R1
MOV A, [R1] ; Access the data at address 1000 (from R1) and store
it in A
Example (Indexed Addressing):
MOV R1, 1000 ; Base address in R1
MOV A, [R1 + 2] ; Access data from address 1002 (1000 + 2) and store
in A
● Usage: Used for arrays and loops where addresses change repeatedly.
Example (Relative Addressing):
JUMP +4 ; Jump 4 instructions ahead in the program
Summary of all addressing modes:
; 1. Immediate Addressing
MOV A, 5 ; Load value 5 into register A (Immediate)
; 2. Register Addressing
MOV B, 10 ; Load value 10 into register B
ADD A, B ; Add B to A (Register Addressing)
; 3. Direct Addressing
MOV A, [1000] ; Load value from memory address 1000 into A
; 4. Indirect Addressing
MOV R1, 2000 ; Store the memory address 2000 in register R1
MOV A, [R1] ; Load value from memory address 2000 (via R1) into A
; 5. Indexed Addressing
MOV R1, 1000 ; Base address
MOV A, [R1 + 4] ; Load value from address 1004 (base + 4) into A
; 6. Relative Addressing
JUMP +4 ; Jump 4 instructions ahead (relative to current PC)
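For illustration, the following Python sketch resolves an operand under each of these modes, using a plain list as "memory" and a dict as the register file (all addresses and values are made up):

# Toy operand resolution for the addressing modes listed above.
memory = [0] * 2048
memory[1000] = 42
memory[1004] = 99
registers = {"A": 0, "R1": 0}

def operand(mode, value, offset=0):
    if mode == "immediate":     # the value itself is the data
        return value
    if mode == "register":      # the data lives in a register
        return registers[value]
    if mode == "direct":        # the value is a memory address
        return memory[value]
    if mode == "indirect":      # a register holds the address
        return memory[registers[value]]
    if mode == "indexed":       # base register + offset
        return memory[registers[value] + offset]
    raise ValueError(mode)

registers["R1"] = 1000
print(operand("immediate", 5))       # 5
print(operand("direct", 1000))       # 42
print(operand("indirect", "R1"))     # 42
print(operand("indexed", "R1", 4))   # 99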
● Problem: Computers only understand 0s and 1s. Assembly instructions are already
close to machine code.
● Why It Matters: If we wrote in English, the computer would need a super complex
translator to understand every sentence, every grammar rule, and every possible
meaning. Instead, assembly is a "middle language" between human language and
binary.
● Example:
○ Binary (what the CPU understands): 1100001100000101
○ Assembly (middle ground): MOV A, 5
○ English (too complex): "Store the number 5 in register A of the CPU."
📘 Real-Life Analogy
Think of I/O operations as a conversation between you and a vending machine:
1. Input: You press a button to select a drink.
2. Processing: The machine reads your selection.
3. Output: The machine dispenses the drink.
📘 Conclusion
Basic I/O operations are vital for computer systems to communicate with external devices. They
handle both inputs (data in) and outputs (data out), ensuring smooth interactions between the
CPU, memory, and peripherals.
Arithmetic Unit
1's Complement:
● 1's complement of a binary number is obtained by flipping all the bits of the number,
i.e., changing every 1 to 0 and every 0 to 1.
● The 1's complement system represents negative numbers by inverting the bits of the
corresponding positive number.
● For example, the 1's complement of 110010 is 001101.
2's Complement:
● 2's complement of a binary number is obtained by first taking the 1's complement and
then adding 1 to the least significant bit (LSB).
● The 2's complement is more widely used in computer systems for representing signed
integers because it simplifies addition and subtraction operations.
● For example, the 2's complement of 110010 is 001110 (after first flipping the bits and
then adding 1).
complements (specifically 1's complement and 2's complement) are used to represent
negative numbers in binary systems, and they provide a way to indicate the negative sign of a
number.
Example:
Let's take the number 5 and represent it as a negative value using complement notation:
● +5 in 8-bit binary: 00000101
● 1's complement (flip every bit): 11111010 → represents −5 in 1's complement.
● 2's complement (1's complement + 1): 11111011 → represents −5 in 2's complement.
● 1's and 2's complement systems are used to represent negative numbers in binary.
● In both systems, the complement method allows negative numbers to be handled in
binary form efficiently.
● 2's complement is the most commonly used system in modern computers because it
simplifies arithmetic and resolves issues like double representation of zero.
Example: 7 − 8 = −1
● 7 in 8-bit binary: 00000111
● −8 in 2's complement: 11111000
● 00000111 + 11111000 = 11111111
In 2's complement representation (which is commonly used for signed numbers in most
computer systems), −1 is represented as 11111111 in binary, assuming an 8-bit system.
Half Adder:
A half adder is a fundamental digital circuit that adds two single-bit binary numbers and
produces a sum and a carry output. It doesn't have a carry input.
Sum = A ⊕ B
Carry = A AND B
Full Adder:
A full adder adds three bits: two inputs plus a carry input, and it produces a sum and a carry output.
● More complex and needs more gates than a half adder, making the design more involved.
● Slightly slower, because the signals normally pass through two gate levels instead of one.
Applications:
Half adders and full adders are the basic building blocks of binary addition circuits in any
digital system.
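A half adder and a full adder can be modeled directly with Boolean operators; this small Python sketch builds the full adder from two half adders plus an OR gate, which is one standard construction:

# Gate-level half adder and full adder modeled with bit operations.
def half_adder(a, b):
    s = a ^ b            # Sum = A XOR B
    carry = a & b        # Carry = A AND B
    return s, carry

def full_adder(a, b, cin):
    # Two half adders plus an OR gate for the two carry signals.
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, cin)
    return s2, c1 | c2

print(half_adder(1, 1))     # (0, 1)
print(full_adder(1, 1, 1))  # (1, 1) -> 1 + 1 + 1 = 11 in binary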
Fast Adders: Adders suffer from carry propagation delay while performing addition or
subtraction. This is a major problem, and because addition underlies almost every other
operation, improving the speed of addition improves the speed of the whole processor, so it
is of great importance. Different logic design approaches have been employed to overcome
the carry propagation problem; one widely used approach is the carry look-ahead adder.
The Basic Processing Unit (BPU), often referred to as the Central Processing Unit
(CPU) in many cases, is the part of a computer or device that carries out instructions
from programs. It’s essentially the "brain" of the computer, where most of the work
happens. Here are its key fundamental concepts explained in simple terms:
1. Control Unit (CU):
○ The control unit directs the operations of the CPU. It tells the other parts of
the computer system how to respond to a program's instructions. You can
think of it like a manager who tells everyone what to do.
2. Arithmetic Logic Unit (ALU):
○ The ALU performs arithmetic operations (like addition and subtraction) and
logical operations (like comparisons) on data.
3. Registers:
○ Registers are small, fast storage areas in the CPU used to temporarily
hold data and instructions that are being processed. They help speed up
operations by storing data close to where it's needed in the CPU.
4. Clock:
○ The clock helps the CPU know when to carry out operations. It generates
a consistent timing signal that helps synchronize the work of the different
components of the computer.
5. Cache:
○ The cache is a smaller, faster type of memory that stores frequently used
data and instructions so that the CPU can access them quickly, rather than
having to retrieve them from slower main memory.
1. Fetch
● What happens? The CPU retrieves (fetches) the next instruction from memory
(RAM) to execute.
● How? The Program Counter (PC) holds the memory address of the next
instruction. The CPU uses the Memory Address Register (MAR) to access the
instruction stored at that location in memory, and it is loaded into the Memory
Buffer Register (MBR).
2. Decode
● What happens? The CPU figures out what the fetched instruction means.
● How? The Control Unit decodes the instruction in the Instruction Register to
identify the operation and its operands.
3. Execute
● What happens? The CPU performs the action specified by the instruction.
● How? The Arithmetic Logic Unit (ALU) may carry out calculations, logical
operations, or data movement depending on the type of instruction.
4. Store (optional)
● What happens? After execution, the result may need to be stored back into
memory or a register.
● How? The result is written back to the register or memory as needed.
+---------------------+
| Fetch Instruction |
+---------------------+
|
v
+---------------------+
| Decode Instruction |
+---------------------+
|
v
+---------------------+
| Execute Action |
+---------------------+
|
v
+--------------------------+
| Store Result (if needed) |
+--------------------------+
|
v
(Repeat the cycle for the next instruction)
In simple terms, the CPU is like a worker who keeps fetching a list of tasks, figuring out
what each task requires, doing the task, and then storing the result. This happens very
quickly, so you don’t notice the process happening—it just looks like the computer is
running smoothly.
Let's walk through the execution of a complete instruction step by step, using a
simple example: "Add two numbers." In this case, we’ll assume the numbers are
stored in registers, and the instruction is to add them together.
Example Instruction:
● Instruction: ADD R1, R2 (This means: Add the value in register R1 to the value
in register R2 and store the result in R2).
1. Fetch
● The CPU fetches the instruction (ADD R1, R2) from memory.
● The Program Counter (PC) holds the address of the next instruction to execute.
Let’s say the address of this instruction is 1000.
● The CPU uses the Memory Address Register (MAR) to access memory at
address 1000, and the Memory Buffer Register (MBR) temporarily stores the
instruction ADD R1, R2.
● The Instruction Register (IR) then holds this instruction, so the CPU knows
what it needs to do.
2. Decode
● The Control Unit (CU) decodes the instruction from the Instruction Register
(IR).
● The CU determines that the instruction is an ADD operation and identifies the
operands (the values to be added). In this case, the operands are the values
stored in R1 and R2.
● The CU also understands that the result should be stored back into R2.
● Key takeaway: The CU decodes that it needs to take the values in R1 and R2,
perform an addition, and store the result in R2.
3. Execute
● The ALU receives the values from R1 (say 5) and R2 (say 3) and adds them: 5 + 3 = 8.
● Key takeaway: The ALU performs the addition and produces the result (8).
4. Store (optional)
● The result (8) from the ALU is stored back into R2.
● Key takeaway: The result of the addition is saved in R2, replacing the old value
in R2 (which was 3).
This process happens very quickly, and the CPU continuously cycles through these
steps to process instructions in the program.
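The walkthrough above can be condensed into a toy Python simulation; the instruction format, memory layout, and register values here are illustrative, not a real machine:

# Toy fetch-decode-execute-store loop for ADD R1, R2 (result goes to R2).
memory = {1000: ("ADD", "R1", "R2")}   # instruction stored at address 1000
registers = {"R1": 5, "R2": 3}
pc = 1000                               # Program Counter

opcode, src, dest = memory[pc]          # Fetch: read the instruction at PC
if opcode == "ADD":                     # Decode: identify the operation
    result = registers[src] + registers[dest]  # Execute: ALU adds the operands
    registers[dest] = result            # Store: write the result back to R2
print(registers)  # {'R1': 5, 'R2': 8}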
Hardwired control is a method used in CPUs to manage and direct the execution of
instructions using fixed logic circuits. It involves the use of combinational circuits (like
decoders, gates, and multiplexers) that generate control signals to guide the flow of
data and operations within the CPU. This type of control is called "hardwired" because
the control logic is designed and implemented using physical hardware, and it is not
programmable.
Key components of hardwired control:
1. Control Signals:
○ The control unit generates signals that control various parts of the CPU
(like the Arithmetic Logic Unit (ALU), registers, and memory).
○ These signals determine what operations should be performed, such as
"add," "store," or "fetch."
2. Instruction Decoder:
○ The decoder interprets the instruction's opcode to determine which control
signals must be generated.
3. Clock:
○ The clock synchronizes the operations of the CPU and ensures that the
control signals are executed in the right sequence.
4. Control Logic Circuit:
○ The control logic consists of logic gates and flip-flops, which are used to
control the data flow between registers, ALU, memory, and other
components.
How hardwired control works:
1. Fetch: The CPU fetches the instruction from memory.
2. Decode: The instruction decoder interprets the fetched instruction.
3. Control Signal Generation:
○ The decoded instruction is sent to the Control Unit (CU), which contains
the hardwired logic.
○ The CU generates the necessary control signals to tell the ALU,
registers, and memory what operations to perform.
4. Execution:
○ The ALU performs the operation (like adding two numbers) based on the
control signals.
○ The results are stored in the registers or sent to memory.
5. Repeat:
○ The cycle continues with the CPU fetching, decoding, and executing
instructions until the program finishes.
Advantages of Hardwired Control:
● Faster Execution: Since the control signals are generated using fixed hardware,
the execution is faster than with programmable control systems.
● Simplicity: The design is simpler, especially for simpler or fixed operations.
Disadvantages of Hardwired Control:
● Limited Flexibility: It's harder to modify or change the behavior of the control
unit, as it’s based on physical circuits.
● Complexity in Designing for Complex Instructions: As the instruction set
becomes more complex, the hardwired control design can become complicated
and less efficient.
In summary, hardwired control in a CPU is a fast and efficient way to control the flow
of operations by using fixed logic circuits to generate control signals, but it lacks the
flexibility of programmable control units.
Unlike hardwired control where control signals are generated by fixed logic circuits,
microprogrammed control uses a series of microinstructions stored in memory to
generate the control signals. The microinstructions are organized into microprograms
that represent the sequence of operations needed to perform a high-level instruction.
6. Sequencing:
○ The sequencer determines the address of the next microinstruction (it may
jump or continue sequentially) and updates the Program Counter (PC).
7. Repeat the Cycle:
○ The process repeats for each microinstruction until the full operation (e.g.,
ADD) is completed.
Let’s take an example where the instruction is "ADD R1, R2". This instruction is broken
down into several microinstructions to perform the addition:
1. First Microinstruction: Move data from register R1 to the ALU input.
2. Second Microinstruction: Move data from register R2 to the ALU input.
3. Third Microinstruction: Perform the addition operation in the ALU.
4. Fourth Microinstruction: Store the result back in R2.
Each of these microinstructions would be fetched from the control memory, decoded,
and executed step by step.
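A rough Python sketch of this idea: the "control memory" is just a list of microinstruction descriptions (the signal names are illustrative), stepped through in order:

# The ADD R1, R2 instruction as a sequence of microinstructions,
# executed one control step at a time.
microprogram = [
    "R1 -> ALU input X",
    "R2 -> ALU input Y",
    "ALU: X + Y -> result",
    "result -> R2",
]

for step, micro in enumerate(microprogram, start=1):
    # Each microinstruction is fetched from control memory and executed.
    print(f"step {step}: {micro}")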
Advantages of Microprogrammed Control:
● Flexibility: It's easier to modify or change the control unit since microinstructions
can be updated in memory.
● Simplified Design: Microprogrammed control is easier to design for complex
instruction sets, as it breaks down high-level instructions into smaller,
manageable microinstructions.
● Cost-effective: Easier to implement for complex processors without needing
complex hardwired logic.
Summary:
● Microprogrammed control uses a sequence of microinstructions to generate
control signals for CPU operations.
● It provides flexibility and simplifies complex instruction sets but may be slower
than hardwired control.
● It works by fetching, decoding, and executing microinstructions from control
memory, generating signals for operations like moving data or performing
calculations.
Suitability for Complex Systems: Microprogrammed control is suitable for complex systems
with intricate instructions; hardwired control is suitable for simpler systems with fewer and
more straightforward instructions.
What is Pipelining?
Pipelining is a technique in which the execution of an instruction is divided into stages, and
multiple instructions are overlapped in execution, like an assembly line.
As soon as one stage completes its task for one instruction, it can begin working on the
next instruction, while other stages continue to work on different instructions. This
overlap of operations leads to increased instruction throughput (more instructions
executed per unit of time).
Stages of Pipelining:
Typically, a pipeline has the following stages (though the number of stages may vary):
1. Instruction Fetch (IF): Read the next instruction from memory.
2. Instruction Decode (ID): Interpret the instruction and read its operands.
3. Execute (EX): Perform the operation in the ALU.
4. Memory Access (MEM): Read or write memory if the instruction needs it.
5. Write Back (WB): Write the result to a register.
Example of Pipelining:
1. Instruction 1: ADD R1, R2, R3 (Add the contents of registers R2 and R3,
store result in R1)
2. Instruction 2: SUB R4, R5, R6 (Subtract contents of R5 and R6, store result
in R4)
3. Instruction 3: MOV R7, R8 (Move contents of R8 to R7)
4. Instruction 4: MUL R9, R10, R11 (Multiply contents of R10 and R11, store
result in R9)
After some time, all instructions are in different stages of execution, and the processor
is continuously working on all instructions in parallel, maximizing throughput.
Advantages of Pipelining:
● Higher throughput: more instructions complete per unit of time.
● Better hardware utilization: every stage is kept busy instead of idling.
● Faster overall execution without increasing the speed of any single stage.
Challenges in Pipelining:
● Data Hazards: When one instruction depends on the result of a previous
instruction, causing a delay.
● Control Hazards: Occurs due to branching instructions that alter the flow of
execution, requiring pipeline flushing or stalling.
● Structural Hazards: When hardware resources are insufficient to support the
simultaneous execution of multiple instructions.
Conclusion:
Pipelining is a key technique in modern processors that increases efficiency and speed
by allowing multiple instructions to be processed simultaneously at different stages. This
leads to higher throughput and faster execution times, but it also introduces
complexities like hazards that must be managed effectively.
Now let us look at a real-life example that should operate based on the pipelined
operation concept. Consider a water bottle packaging plant. For this case, let there be
3 processes that a bottle should go through: Inserting the bottle (I), Filling water in the
bottle (F), Sealing the bottle (S).
It will be helpful for us to label these stages as stage 1, stage 2, and stage 3. Let each
stage take 1 minute to complete its operation. Now, in a non-pipelined operation, a
bottle is first inserted in the plant, and after 1 minute it is moved to stage 2 where
water is filled. Now, in stage 1 nothing is happening. Likewise, when the bottle is in
stage 3 both stage 1 and stage 2 are inactive. But in pipelined operation, when the
bottle is in stage 2, a new bottle can be loaded into stage 1. In the same way, while a
bottle is in stage 3, there can be one bottle each in stage 1 and stage 2. Therefore, once
the pipeline is full, we receive a new bottle every minute. Hence, the average time
taken to manufacture 1 bottle is:
Non-pipelined (3 bottles, one at a time):
I F S | | | | | |
| | | I F S | | |
| | | | | | I F S (9 minutes → average 9/3 = 3 minutes per bottle)
Pipelined (3 bottles, overlapped):
I F S | |
| I F S |
| | I F S (5 minutes → average 5/3 ≈ 1.67 minutes per bottle)
● In a pipelined processor, a pipeline has two ends, the input end and the
output end. Between these ends, there are multiple stages/segments such
that the output of one stage is connected to the input of the next stage, and
each stage performs a specific operation.
● Interface registers are used to hold the intermediate output between two stages.
● All the stages in the pipeline along with the interface registers are controlled
by a common clock.
Non-Overlapped Execution (total time = 8 cycles):
Stage/Cycle |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8
S1          | I1 |    |    |    | I2 |    |    |
S2          |    | I1 |    |    |    | I2 |    |
S3          |    |    | I1 |    |    |    | I2 |
S4          |    |    |    | I1 |    |    |    | I2
Overlapped Execution (total time = 5 cycles):
Stage/Cycle |  1 |  2 |  3 |  4 |  5
S1          | I1 | I2 |    |    |
S2          |    | I1 | I2 |    |
S3          |    |    | I1 | I2 |
S4          |    |    |    | I1 | I2
Pipeline Stages
A RISC processor has a 5-stage instruction pipeline to execute all the instructions in the
RISC instruction set. Following are the 5 stages of the RISC pipeline with their respective
operations:
● Stage 1 (Instruction Fetch): In this stage the CPU fetches the instruction
from the memory address whose value is stored in the program counter.
● Stage 2 (Instruction Decode): In this stage, the instruction is decoded and
the register file is accessed to read the registers used in the instruction.
● Stage 3 (Instruction Execute): In this stage, ALU operations are performed.
● Stage 4 (Memory Access): In this stage, memory operands are read and
written, as the instruction requires.
● Stage 5 (Write Back): In this stage, computed or fetched values are written
back to the register file.
Performance of a Pipelined Processor
Consider a k-segment pipeline with clock cycle time Tp. Let there be n tasks to be
completed in the pipelined processor. The first instruction takes k cycles to come out of
the pipeline, but each of the other n − 1 instructions takes only 1 cycle, i.e., a total of
n − 1 cycles. So, the time taken to execute n instructions in a pipelined processor is:
ETpipeline = (k + n − 1) cycles
= (k + n − 1) × Tp
In the same case, for a non-pipelined processor, the execution time of ‘n’ instructions
will be:
ETnon-pipeline = n * k * Tp
So, speedup (S) of the pipelined processor over the non-pipelined processor, when ‘n’
tasks are executed on the same processor is:
S = ETnon-pipeline / ETpipeline
=> S = [n * k * Tp] / [(k + n – 1) * Tp]
S = [n * k] / [k + n – 1]
When the number of tasks ‘n’ is significantly larger than k, that is, n >> k
S = n * k / n
S = k
where k is the number of stages in the pipeline.
Also:
Efficiency = S / Smax, and since Smax = k, Efficiency = S / k
Throughput = n / [(k + n − 1) × Tp]
Note: The cycles per instruction (CPI) of an ideal pipelined processor is 1.
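These formulas are straightforward to compute; the Python helper below evaluates execution time, speedup, efficiency, and throughput for sample values of k, n, and Tp:

# Pipeline performance formulas from above, for k stages, n tasks,
# and cycle time tp (here in nanoseconds).
def pipeline_metrics(k, n, tp):
    et_pipe = (k + n - 1) * tp        # pipelined execution time
    et_nonpipe = n * k * tp           # non-pipelined execution time
    speedup = et_nonpipe / et_pipe    # S = nk / (k + n - 1)
    efficiency = speedup / k          # S / Smax, where Smax = k
    throughput = n / et_pipe          # instructions per nanosecond
    return et_pipe, et_nonpipe, speedup, efficiency, throughput

print(pipeline_metrics(k=4, n=100, tp=1))
# speedup approaches k = 4 as n grows much larger than k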
In computer architecture, data hazards occur when there are dependencies between
instructions that are executed in a pipeline, potentially causing delays or incorrect
results. These hazards arise because one instruction needs to use the data that is being
modified or is yet to be written by a previous instruction.
Data hazards are critical issues in pipelined processors since multiple instructions are
executed concurrently at different stages. Without careful management, a data hazard
can lead to errors or delays in processing.
1. Read After Write (RAW) Hazard (True Dependency): Occurs when an
instruction tries to read a value before a previous instruction has finished
writing it.
Example:
Instruction 1: ADD R1, R2, R3 (R1 = R2 + R3)
Instruction 2: SUB R4, R1, R5 (R4 = R1 - R5)
■ In this example, Instruction 2 tries to read the value of R1 before
Instruction 1 writes its result to R1.
■ This can lead to incorrect results or delays because R1 isn't
updated in time.
2. Write After Read (WAR) Hazard (Anti-Dependency):Occurs when a write
operation happens after a read operation on the same register or memory
location, but the write happens before the read is completed.
This type of hazard is less common and can typically be avoided by reordering
instructions or using pipeline interlocks.
Example:
Instruction 1: SUB R1, R2, R3 (R1 = R2 - R3)
Instruction 2: ADD R2, R4, R5 (R2 = R4 + R5)
■ In this example, Instruction 2 writes to R2 after Instruction 1 reads
it. The write to R2 could happen before the read, causing data
inconsistency.
3. Write After Write (WAW) Hazard (Output Dependency): Occurs when two
instructions write to the same register or memory location, and the writes may
complete out of order.
Example:
Instruction 1: ADD R1, R2, R3 (R1 = R2 + R3)
Instruction 2: SUB R1, R4, R5 (R1 = R4 - R5)
■ Both instructions write to R1; if Instruction 2's write finishes before
Instruction 1's, R1 ends up holding a stale value.
There are several techniques used to handle data hazards in pipelined processors:
● Forwarding (bypassing): the result of one stage is passed directly to a later
instruction's stage without waiting for write-back.
● Pipeline stalling: the pipeline inserts bubbles (idle cycles) until the needed
value is ready.
● Instruction reordering: the compiler or hardware rearranges independent
instructions to fill the delay.
For example, if Instruction 2 needs to use the value from Instruction 1 in a later stage (e.g., R1
in the example), a RAW hazard will occur, because Instruction 2 can't execute
until Instruction 1 writes its result back.
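As a simple illustration, the Python sketch below flags a RAW hazard between adjacent instructions, representing each instruction as a (destination, source1, source2) tuple:

# Flag a RAW hazard: an instruction reads a register that the
# immediately preceding instruction writes.
instructions = [
    ("R1", "R2", "R3"),   # ADD R1, R2, R3
    ("R4", "R1", "R5"),   # SUB R4, R1, R5  (reads R1 -> RAW hazard)
]

for prev, curr in zip(instructions, instructions[1:]):
    dest, srcs = prev[0], curr[1:]
    if dest in srcs:
        print(f"RAW hazard: {dest} is written, then read immediately")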
Summary:
Instruction hazards occur in pipelined processors when there are issues with the flow
of instructions due to dependencies or conflicts between instructions being executed in
parallel. These hazards prevent the pipeline from operating efficiently and can lead to
delays or incorrect execution of instructions.
1. Structural Hazards:
○ Occur when there are not enough hardware resources (like functional
units, registers, or memory) to support the simultaneous execution of
multiple instructions in the pipeline.
○ Example: If a processor has only one memory unit and multiple
instructions need memory access at the same time, a structural hazard
occurs because the memory unit cannot serve multiple requests
simultaneously.
2. Data Hazards:
○ Occur when instructions depend on the results of previous instructions
that are still in the pipeline (see the RAW, WAR, and WAW hazards above).
3. Control Hazards:
○ Occur due to branch instructions (like if, goto, etc.), which change the
flow of execution.
○ Control hazards arise when the processor doesn’t know the target of a
branch instruction until it has been fully decoded, potentially causing
instructions after the branch to be fetched or executed prematurely.
○ Example: If a branch instruction is encountered, the processor has to wait
to determine the target address before continuing to fetch subsequent
instructions. If it fetches the wrong instruction, a control hazard occurs.
Instruction hazards can cause delays or incorrect results in a pipelined processor, and
different techniques are used to handle them:
Instruction 1:  IF -> ID -> EX -> MEM -> WB
Instruction 2:        IF -> ID -> EX  -> MEM -> WB
(Instruction 2 enters each stage one cycle behind Instruction 1; while Instruction 1
is in MEM, Instruction 2 is in EX.)
In this example:
● Instruction 1 goes through the standard pipeline stages.
● Instruction 2 may have a data or control dependency on Instruction 1. This
could cause a data hazard or control hazard if the instructions are not properly
managed in the pipeline.
Conclusion:
Instruction hazards reduce pipeline efficiency, but techniques such as stalling, forwarding,
and branch prediction keep the pipeline running correctly and as fast as possible.
In computer architecture, the instruction set (also known as the Instruction Set
Architecture or ISA) plays a crucial role in defining how a processor functions and
interacts with software. It is essentially a collection of instructions that the CPU can
execute, and it defines the set of operations, formats, and control mechanisms the
processor supports.
The design of the instruction set has a significant influence on the overall architecture,
performance, and efficiency of the processor.
1. Processor Design:
○ The instruction set directly affects how the processor is designed. The
types of instructions and the number of operands required for each
instruction determine the complexity of the processor's control unit and
data path.
○ RISC (Reduced Instruction Set Computing) processors have simpler
instruction sets with fewer instructions, which typically means more
instructions per program but allows easier and faster execution of each
instruction.
○ CISC (Complex Instruction Set Computing) processors have more
complex instructions, often capable of performing several operations in
one instruction, which can reduce the number of instructions but may
involve more complex circuitry.
2. Performance:
○ Simpler instructions can execute in fewer cycles and are easier to
pipeline, while complex instructions do more work each; the balance
chosen by the ISA shapes overall performance.
3. Memory Usage:
○ The size and format of instructions can affect how much memory is
required for storing instructions. In CISC, each instruction might be
longer, as it encodes more operations or addressing modes. In RISC,
instructions are typically of a fixed length, which can make instruction
fetch and decode simpler and faster, but also might lead to larger
programs as more instructions are needed.
○ The design of the instruction set also influences data storage (e.g., how
registers and memory are addressed), which impacts both the size and
performance of programs.
4. Compiler Design:
○ The instruction set defines the type of instructions that a compiler can
use to generate machine code. RISC instruction sets tend to rely on
simpler, more frequent instructions, making it easier for compilers to
optimize code. Compilers for CISC instruction sets must deal with more
complex, multi-operation instructions, which can make code generation
more challenging.
○ A well-designed instruction set enables better optimizations by the
compiler, leading to improved performance of the resulting programs.
5. Instruction Execution Time:
○ Fixed-length, simple instructions (RISC) typically complete in about one
cycle each, while variable-length, complex instructions (CISC) may take
several cycles, which affects how execution time is predicted and optimized.
Conclusion:
The instruction set profoundly influences the design and performance of a processor.
While RISC focuses on simplicity and efficiency with smaller, uniform instructions,
CISC emphasizes a more complex set of instructions that can do more in a single step.
The decision between RISC and CISC impacts the processor's speed, memory usage,
system complexity, and how compilers optimize code, all of which contribute to the
overall performance and efficiency of a computer system.
Data Path and Control Considerations in Computer Architecture
In computer architecture, the data path and control are fundamental components that
work together to enable the processor to execute instructions efficiently.
1. Data Path:
The data path refers to the collection of components that are responsible for moving
data within the processor. These components include registers, multiplexers, arithmetic
logic units (ALUs), buses, and memory units. The data path is essentially the hardware
infrastructure that allows data to flow between different parts of the processor for
computation and storage.
● Registers: Small, fast storage locations used to store data and intermediate
results during instruction execution.
● ALU (Arithmetic Logic Unit): The part of the processor that performs arithmetic
and logical operations (such as addition, subtraction, AND, OR, etc.).
● Multiplexers: Devices that select one of several input signals based on a control
signal, enabling different data sources to be routed to the appropriate
destination.
● Buses: Shared pathways that allow data to be transferred between components
within the processor (e.g., between registers and the ALU).
● Memory Units: Locations for storing instructions and data, such as caches, main
memory, and registers.
The data path is designed to efficiently process and move data in response to the
instructions being executed by the CPU.
2. Control:
The control unit manages the operation of the data path by generating the necessary
control signals. These control signals dictate the actions that different components in
the data path should take. The control unit can be designed in two ways: hardwired
control or microprogrammed control.
Key aspects of the control unit include:
● Control Signals: Signals that instruct various parts of the data path on what
actions to perform. For example, control signals can specify whether the ALU
should add or subtract, or which register to write to.
● Instruction Decoding: The control unit decodes the incoming instruction to
determine what needs to be done. The instruction set architecture (ISA) defines
the set of instructions that the control unit must handle.
● Execution of Instructions: Based on the decoded instruction, the control unit
will activate specific components of the data path to carry out the operation. For
example, it may select operands from registers, send them to the ALU, or write
back results to memory or registers.
3. How They Work Together:
● The data path handles the flow and manipulation of data, while the control
unit coordinates the operations of the data path components.
● When the control unit receives an instruction, it decodes it and generates
control signals that direct the data path components to perform the necessary
operations (e.g., arithmetic, memory access, etc.).
● The interaction between the data path and control unit ensures that the correct
data is processed at the correct time and stored in the appropriate location.
In pipelined processors, the data path and control unit must be carefully designed to
handle multiple instructions in parallel. The data path is structured in stages (e.g.,
instruction fetch, decode, execute, memory access, and write-back), and the control
unit generates the appropriate control signals for each stage.
Key Design Considerations:
1. Efficiency: The data path must be designed to move data quickly between
registers, memory, and the ALU. Minimizing the number of cycles it takes to
complete each instruction is important for improving processor performance.
2. Hazards: Data hazards (like read-after-write) and control hazards (like
branching) must be managed to avoid delays and ensure correct execution.
Techniques like data forwarding (bypassing) and pipeline stalling are often
used.
3. Complexity: The complexity of the control unit increases with the sophistication
of the instruction set. A CISC processor may require more complex control
signals than a RISC processor due to the greater variety of instructions.
Summary:
Aspect             | Data Path                            | Control Unit
Role in Pipelining | Data flows through the pipeline      | Controls the flow and timing of
                   | stages (fetch, decode, execute,      | data through the pipeline
                   | etc.).                               | stages.
Key Consideration  | Data transfer speed, data hazards,   | Generating correct control
                   | and correct routing.                 | signals, instruction decoding.
Conclusion:
The data path and control unit are integral parts of a processor's architecture. The
data path performs the actual computations and data transfers, while the control unit
ensures that the processor operates correctly by generating the control signals
necessary for executing instructions. Together, they determine how efficiently a
processor can handle and execute instructions, especially in complex, pipelined, or
multi-core systems.
Several important changes are made in this pipelined organization:
1. There are separate instruction and data caches that use separate address and data
connections to the processor. This requires two versions of the MAR register, IMAR for
accessing tile instruction cache and DMAR for accessing the data cache.
2. The PC is connected directly to the IMAR, so that the contents of the PC can be
transferred to IMAR at the same time that an independent ALU operation is taking
place.
3. The data address in DMAR can be obtained directly from the register file or from the
ALU to support the register indirect and indexed addressing modes.
4. Separate MDR registers are provided for read and write operations. Data can be
transferred directly between these registers and the register file during load and store
operations without the need to pass through the ALU.
5. Buffer registers have been introduced at the inputs and output of the ALU. These are
registers SRC1, SRC2, and RSLT. Forwarding connections may be added if desired.
6. The instruction register has been replaced with an instruction queue, which is loaded
from the instruction cache.
7. The output of the instruction decoder is connected to the control signal pipeline.
MEMORY SYSTEM
BASIC CONCEPTS:
Here, we will cover the essential concepts of the memory system in computer
architecture in detail.
1. Registers:
○ Location: Inside the CPU (central processing unit).
○ Purpose: Holds the data that the CPU is currently processing. Registers
store intermediate values for computations and instructions.
○ Speed: Extremely fast, often measured in nanoseconds (ns).
○ Size: Very small, usually a few bits or bytes (e.g., 32 or 64 bits per
register).
○ Access: Directly accessible by the CPU.
2. Cache Memory:
○ Location: Between the CPU and main memory (RAM). Typically
includes L1, L2, and sometimes L3 caches.
○ Purpose: Stores frequently accessed data and instructions, reducing
the average time to access memory.
○ Speed: Faster than RAM but slower than registers. L1 cache is the
fastest, followed by L2, and L3 cache.
○ Size: Typically ranges from a few kilobytes (KB) for L1 cache to a few
megabytes (MB) for L3 cache.
○ Access: Quick access due to proximity to the CPU.
3. Main Memory (RAM):
○ Location: Typically installed on the motherboard or as separate chips.
○ Purpose: Stores the operating system, applications, and data currently
in use. It is the main working memory of the computer.
○ Speed: Slower than cache memory but faster than secondary storage.
○ Size: Typically measured in gigabytes (GB) for modern computers.
○ Access: Random access, meaning any location can be accessed
directly.
4. Secondary Storage:
○ Location: External to the CPU, usually on hard disk drives (HDDs) or
solid-state drives (SSDs).
○ Purpose: Used for long-term storage of data, programs, and files.
○ Speed: Much slower than RAM but offers higher storage capacity.
○ Size: Can range from hundreds of gigabytes to several terabytes (TB).
○ Access: Slower access time due to mechanical components (HDD) or
non-volatile memory (SSD).
5. Tertiary and Off-line Storage:
○ Location: External storage devices such as optical disks, magnetic
tapes, or cloud storage.
○ Purpose: Used for backup, archival, and long-term storage of data.
○ Speed: Slowest in the memory hierarchy.
○ Size: Typically much larger in capacity compared to secondary storage.
● Registers are the fastest and smallest, directly accessible by the CPU, but
they are limited in capacity.
● Cache is smaller than main memory but much faster and stores recently used
data to reduce access times.
● Main memory (RAM) is slower but provides more storage for running
applications and data.
● Secondary storage offers large capacity at a slower speed but provides
non-volatile storage.
● Tertiary storage provides an even larger capacity but is the slowest for
access.
a. Volatile Memory:
● Definition: Memory that loses its contents when the power is turned off.
● Examples:
○ RAM (Random Access Memory): Used as the main working memory for
active data and instructions.
○ Cache Memory: Temporary high-speed memory used to store
frequently accessed data and instructions.
○ Registers: Small, fast storage locations within the CPU used for
operations.
b. Non-Volatile Memory:
● Definition: Memory that retains data even when the power is off.
● Examples:
○ ROM (Read-Only Memory): Contains firmware or software that is
permanently programmed during manufacturing (e.g., BIOS in a
computer).
○ Flash Memory (SSD): A form of non-volatile memory used in modern
storage devices like USB flash drives, SSDs, and memory cards.
○ Hard Disk Drives (HDD): Magnetic storage used for secondary storage,
providing non-volatile storage at a larger capacity.
c. Semi-Volatile Memory:
● Definition: Memory that retains data for some period after power is turned off
but may eventually lose it.
● Examples:
○ DRAM (Dynamic RAM): Requires periodic refreshes to retain data. It is
the most common type of main memory.
○ SRAM (Static RAM): Faster than DRAM and does not require refreshing,
but it is still volatile.
a. Address Space:
● Address space refers to the range of addresses that a system can use to
identify data in memory. This can include the address space of the CPU (for
registers and cache) and the main memory (RAM).
4. Memory Organization
a. Byte-Addressable Memory:
● Memory is organized in units called bytes (8 bits). Each byte has a unique
address, and the CPU can access data one byte at a time.
● This is typical for most modern computer systems.
b. Word-Addressable Memory:
● Memory is organized in larger units, called words (e.g., 16-bit or 32-bit words).
A word is the natural unit of data used by the CPU. This organization is used in
some older systems.
The operating system (OS) is responsible for managing the memory in a computer
system. Some of the key techniques used in memory management are:
a. Paging:
● Paging divides memory into fixed-size blocks called pages (in virtual memory
systems). The operating system manages the mapping of logical addresses
(virtual memory) to physical addresses (RAM) using a page table.
● Paging allows efficient use of memory and supports virtual memory, where
processes may use more memory than physically available.
b. Segmentation:
● Segmentation divides memory into variable-sized segments that match the
logical parts of a program (such as code, data, and stack), each with its own
base address and length.
c. Memory Protection:
● Memory protection prevents one process from reading or writing memory
that belongs to another process or to the operating system.
d. Garbage Collection:
● Garbage collection automatically reclaims memory that a program has
allocated but no longer uses.
a. Cache Memory:
● Cache memory is a small but extremely fast memory located between the
CPU and main memory. It stores copies of frequently accessed data to speed
up data retrieval.
● Cache hits occur when data is found in the cache, leading to faster access
times. Cache misses occur when the data is not in the cache, requiring access
to slower main memory.
b. Cache Coherence:
● Cache coherence ensures that multiple cache copies of the same memory
location are consistent across all cores in multi-core systems.
● Cache coherence protocols like MESI (Modified, Exclusive, Shared, Invalid)
are used to maintain consistency across caches in multi-core processors.
7. Virtual Memory
● Virtual memory lets programs use more memory than is physically installed
by keeping active pages in RAM and swapping inactive pages out to
secondary storage.
Conclusion
The memory system is organized as a hierarchy that balances speed, size, and cost, with
the operating system and hardware working together to keep frequently used data in the
fastest levels.
Semiconductor RAM is a type of volatile memory, meaning that it loses all stored
data when the power is turned off. Unlike non-volatile memory (e.g., hard drives,
SSDs), semiconductor RAM is designed for fast data access and is typically used in
computers, smartphones, and other electronic devices to store data temporarily
while the device is running.
Semiconductor RAM is mainly categorized into Static RAM (SRAM) and Dynamic
RAM (DRAM), each with different characteristics, as explained below.
1. Static RAM (SRAM)
Static RAM (SRAM) is a type of semiconductor RAM that uses flip-flops (a kind of
circuit) to store each bit of data. It is called "static" because it doesn't need to be
refreshed, unlike DRAM, which requires refreshing to retain data.
Key Features:
● No Refreshing Needed: Unlike DRAM, SRAM does not need periodic refreshing
of data.
● Faster: SRAM is faster than DRAM because accessing data in SRAM is quicker
and more direct.
● More Expensive: Since SRAM uses more transistors per bit (usually 4 to 6
transistors per bit), it is more expensive to produce.
● Smaller Capacity: SRAM tends to have smaller storage capacities compared
to DRAM because of its higher cost and larger space requirements.
● Used in Cache Memory: Because of its high speed, SRAM is commonly used
for CPU cache memory, which is critical for performance.
Operation of SRAM:
● Read Operation: When the CPU needs to access data, it sends an address to
the SRAM. The data is immediately available since it doesn’t need to be
refreshed.
● Write Operation: Data is written directly into the SRAM cell. Once written, the
data stays in the cell until it is updated or erased.
2. Dynamic RAM (DRAM)
Dynamic RAM (DRAM) stores each bit of data as a charge in a tiny capacitor. Because the
capacitors leak charge, the data must be refreshed periodically.
Key Features:
● Needs Refreshing: DRAM stores data in capacitors that discharge over time, so
the memory needs to be refreshed regularly to retain the data.
● Slower than SRAM: DRAM is slower than SRAM because of the time needed
for refreshing and accessing data.
● Cheaper and Higher Capacity: DRAM is cheaper to produce and can store
much more data in the same physical space compared to SRAM. It is typically
used as the main memory (RAM) in computers and other devices.
● Used as Main Memory: DRAM is commonly used for the main system
memory in devices because of its larger capacity and lower cost.
Operation of DRAM:
● Read Operation: DRAM cells are addressed by row and column lines, and the
data stored in the capacitor is read.
● Write Operation: Data is written into the DRAM by charging the capacitor, and
this data needs to be refreshed continuously.
3. Double Data Rate SDRAM (DDR SDRAM)
DDR SDRAM is a faster type of synchronous DRAM (SDRAM) that transfers data on both
edges of the clock signal.
Key Features:
● Twice the Speed of SDRAM: DDR SDRAM transfers data on both the rising and
falling edges of the clock signal, doubling the amount of data transferred per
clock cycle compared to regular SDRAM.
● Various Versions (DDR1, DDR2, DDR3, DDR4, DDR5): Over time, DDR
technology has evolved, with each new version offering higher speeds, lower
power consumption, and improved data transfer rates. DDR4 and DDR5 are
the latest versions commonly used today.
● Used in Modern Computers: DDR SDRAM is the standard memory in modern
computers and other devices, like gaming consoles and laptops, due to its
high performance and efficiency.
4. Rambus DRAM (RDRAM)
Rambus DRAM (RDRAM) is a type of memory that was developed by Rambus Inc. It
was designed to be faster than standard SDRAM by using a high-speed data bus
and advanced signaling techniques. Although it had a brief period of popularity, it
was eventually superseded by DDR SDRAM.
Key Features:
● High Data Transfer Rate: RDRAM used a wider, faster data bus (the Rambus
Channel) and could transfer more data per clock cycle compared to
traditional SDRAM at the time.
● Wide Bandwidth: It was designed to support high-bandwidth applications,
like graphics and gaming, and was used in some high-end systems in the late
1990s and early 2000s.
● Expensive and Complex: Despite its high performance, RDRAM was
expensive to produce, and its complex interface required additional chips and
controllers, making it more costly than DDR SDRAM.
● Limited Adoption: Due to high costs and competition from DDR SDRAM,
RDRAM was not widely adopted and eventually phased out in favor of DDR
memory.
Operation of RDRAM:
● Data Transfer: RDRAM used a different architecture from DDR SDRAM, with
higher data transfer rates using a 16-bit wide bus instead of the traditional
64-bit bus used by DDR SDRAM.
● Clock Speed and Bandwidth: RDRAM typically operated at higher speeds
and with higher bandwidth than standard SDRAM, but it required additional
hardware like special controllers.
Summary of Differences:
Memory Type | Key Features                             | Use Case                      | Speed
SRAM        | No refreshing needed, expensive, small   | CPU cache                     | Fastest
DRAM        | Needs refreshing, cheap, high capacity   | Main memory                   | Slower than SRAM
DDR SDRAM   | Double data rate, faster than SDRAM      | Main memory (modern systems)  | Very fast
RDRAM       | High-speed bus, costly, limited adoption | High-end systems (late 1990s) | Fast
ROM is mainly used to store boot instructions, firmware, and system configurations
that do not require frequent updates.
Types of ROM
Feature       | EPROM                          | EEPROM                                    | Flash Memory
Reprogramming | Yes, but requires UV light     | Yes, erased and reprogrammed electrically | Yes, faster block-wise reprogramming
Cost          | Higher due to UV erase needs   | Higher than Flash, cheaper than EPROM     | Lower cost per bit than EEPROM/EPROM
Advantages of EPROM:
● Non-volatile: data is retained when the power is off.
● Reusable: the chip can be erased with UV light and reprogrammed many times.
Disadvantages of EPROM:
● Slow Erasure: Erasing data requires UV light and can take a significant amount
of time (typically 20–30 minutes).
● Limited Rewrites: The chip can only handle a limited number of erasure and
programming cycles before the quality degrades.
Advantages of EEPROM:
● Can be erased and reprogrammed electrically, in place, without removing the chip.
● Supports byte-level updates, which is convenient for small configuration data.
Disadvantages of EEPROM:
● Writes and erases are slow compared to RAM.
● Supports only a limited number of write cycles and costs more per bit than Flash.
Common Uses:
● ROM:
○ Storing firmware and system boot-up instructions in devices like
computers, calculators, and gaming consoles.
● EPROM:
○ Firmware updates in older systems.
○ Used in early microcontrollers and system boards where firmware
might need occasional updates.
● EEPROM:
○ Storing configuration settings in embedded systems (e.g., saving BIOS
settings in computers, storing device configurations).
○ Often used in small-scale storage applications where data changes
infrequently.
● Flash Memory:
○ Used in modern storage devices like USB drives, solid-state drives
(SSDs), and memory cards (SD cards, microSD cards).
○ In smartphones, tablets, and cameras for storing large amounts of
data quickly and efficiently.
○ Flash-based storage is widely used in embedded systems and
automotive applications as well.
Memory Hierarchy: Speed, Size, and Cost
1. Registers
○ Speed: Fastest
○ Size: Smallest (few bytes to a few kilobytes)
○ Cost: Most expensive per byte
○ Description: Registers are the smallest and fastest memory located
inside the CPU. They hold data that is currently being processed.
Registers are very expensive to produce but offer the highest speed.
2. Cache Memory (L1, L2, L3)
○ Speed: Very fast, but slower than registers
○ Size: Small (from a few KB to a few MB)
○ Cost: Expensive per byte
○ Description: Cache memory is located closer to the CPU than the main
memory and stores frequently used instructions and data to speed up
processing. There are typically multiple levels of cache (L1, L2, and L3),
with L1 being the smallest and fastest and L3 being larger but slower.
3. Main Memory (RAM)
○ Speed: Slower than cache memory, but faster than secondary storage
○ Size: Larger (typically from a few GB to tens of GB)
○ Cost: Less expensive than cache memory per byte
○ Description: RAM (Random Access Memory) is the primary working
memory in a computer where programs and data in active use are
stored. It is faster than secondary storage (like hard drives or SSDs) but
slower than cache memory. RAM is volatile, meaning it loses all data
when the power is turned off.
4. Secondary Storage (HDDs, SSDs, Optical Disks)
○ Speed: Slowest (compared to all other memory types)
○ Size: Very large (from hundreds of GB to several TB)
○ Cost: Least expensive per byte
○ Description: Secondary storage, like Hard Disk Drives (HDDs), Solid
State Drives (SSDs), or optical disks, is used for long-term data storage.
It is much slower than main memory but can store much larger
amounts of data. SSDs are faster than HDDs, but both are still much
slower than RAM or cache memory.
5. Tertiary Storage (Cloud Storage, Magnetic Tape)
○ Speed: Slowest (can be very slow, especially in case of magnetic tape)
○ Size: Extremely large (can be in the petabytes)
○ Cost: Cheapest per byte
○ Description: Tertiary storage includes things like cloud storage or
magnetic tape, and it's typically used for archiving or backup
purposes. Access to data in tertiary storage is much slower, but it is
very cheap for storing large amounts of data.
Comparison of Speed, Size, and Cost

Memory Type | Speed | Size | Cost
Registers | Fastest | Smallest (bytes to KB) | Most expensive per byte
Cache (L1/L2/L3) | Very fast | KB to a few MB | Expensive per byte
Main Memory (RAM) | Fast | A few GB to tens of GB | Moderate
Secondary Storage (HDD/SSD) | Slow | Hundreds of GB to several TB | Cheap
Tertiary Storage (Tape/Cloud) | Slowest | Up to petabytes | Cheapest per byte
How Speed, Size, and Cost Impact the Design of Memory Hierarchy
1. Speed:
○ Higher speed memory is needed to process data as quickly as
possible. The CPU needs to access data from memory quickly, so
registers and cache memory are designed to be extremely fast to
reduce bottlenecks. However, their small size limits how much data
they can store.
○ RAM is slower but larger, providing more room for data that is actively
used. When more data is needed, it is fetched from secondary storage
(e.g., HDD/SSD), which is much slower but can store far more data.
2. Size:
○ Smaller and faster memories, like registers and cache, are used for
frequently accessed data, while larger and slower memories, like RAM
and secondary storage, store less frequently accessed data.
○ The design of the memory hierarchy balances between size and
speed. Registers are tiny because they only need to store a few bits of
data for immediate processing, while secondary storage can be much
larger because it is used for storing data not in active use.
3. Cost:
○ Faster memory types like registers and cache are more expensive to
manufacture per byte. As a result, they are made smaller to limit the
cost.
○ Main memory (RAM) is cheaper per byte than cache memory, and
secondary storage (like HDDs or SSDs) is even cheaper, allowing for
large storage capacities at lower costs. Tertiary storage like magnetic
tape or cloud storage is the least expensive per byte, though it comes
with the drawback of slower access times.
By having data that is frequently used stored in the fastest (and smallest) memory
types and less frequently used data stored in slower (and larger) memory types, a
system can optimize performance and minimize cost.
Cache memory is a small, high-speed storage area between the CPU and the main
memory (RAM) in a computer. It stores frequently accessed data to speed up data
retrieval for the CPU. Cache memory works based on the principle of locality of
reference and mapping functions to decide which data to store in the cache.
Locality of Reference

Programs tend to reuse data and instructions they have used recently. Caches exploit two forms of this behavior:
● Temporal Locality: If a memory location is accessed, it is likely to be accessed again soon (e.g., a loop counter).
● Spatial Locality: If a memory location is accessed, nearby locations are likely to be accessed soon (e.g., consecutive elements of an array).
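A minimal Python sketch of the idea (illustrative only, and added here as an example; real cache effects depend on the hardware and the language runtime):

# Spatial locality: summing consecutive elements touches neighboring
# addresses, which cache lines are designed to exploit.
data = list(range(1_000_000))
sequential_sum = sum(data)

# Poor spatial locality: a large stride lands far from the previous
# access each time, so cached neighbors are rarely reused.
stride = 4096
strided_sum = sum(data[i] for i in range(0, len(data), stride))

# Temporal locality: 'total' is reused on every iteration of the loop,
# so it stays in a register or the fastest cache level.
total = 0
for x in range(1000):
    total += x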
Cache Write Operations

The write operation refers to how data is written into the cache when the CPU writes data that needs to be cached. There are two common protocols for handling write operations:
● Write-Through: Data is written to the cache and to main memory at the same time. This is simple and keeps memory consistent, but every write pays the cost of a memory access.
● Write-Back: Data is written only to the cache, and the modified (dirty) block is written to main memory later, when it is evicted. Writes are faster, but the cache and memory can be temporarily out of sync.
Cache Mapping Functions

Cache mapping refers to the way data from main memory is placed into the cache memory. Three mapping functions are used in cache systems: direct mapping, fully associative mapping, and set-associative mapping.
Conclusion
● Direct-mapped caches are simple and fast but suffer from conflict misses.
● Fully associative caches are highly flexible and avoid conflict misses but are
more complex and slower.
● Set-associative caches provide a compromise, offering better performance
than direct-mapped caches without the full complexity of fully associative
caches.
By utilizing locality of reference and different mapping schemes, cache memory
can significantly improve the overall performance of a computer system.
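To make direct mapping concrete, here is a minimal Python sketch (added as an example, with assumed cache parameters) of how an address is split into tag, index, and offset:

# Assumed parameters: 64-byte blocks and 256 cache lines.
BLOCK_SIZE = 64                               # bytes per block -> 6 offset bits
NUM_LINES = 256                               # cache lines     -> 8 index bits

OFFSET_BITS = BLOCK_SIZE.bit_length() - 1     # log2(64) = 6
INDEX_BITS = NUM_LINES.bit_length() - 1       # log2(256) = 8

def split_address(addr: int):
    offset = addr & (BLOCK_SIZE - 1)                  # byte within the block
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)   # which cache line to check
    tag = addr >> (OFFSET_BITS + INDEX_BITS)          # identifies the memory block
    return tag, index, offset

print(split_address(0x12345))   # -> (4, 141, 5): tag 4, line 141, byte 5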
1. Memory Interleaving
Concept:
● Instead of storing all data sequentially in a single block, interleaving splits the
memory into multiple blocks (also called banks), and data is stored across
these banks in a way that allows concurrent access.
● It reduces the delay of memory access by enabling parallel read or write
operations.
Types of Interleaving:
● 2-way Interleaving: Memory addresses are divided into two banks, and every
alternate address is mapped to a different bank.
● 4-way Interleaving: Memory addresses are divided into four banks, with each
bank storing data from every fourth address.
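The bank-selection rule itself is simple. Below is a small Python sketch of low-order (4-way) interleaving, added as an illustration; the advantages follow after it:

NUM_BANKS = 4

def bank_for(address: int) -> int:
    # The low-order bits of the address pick the bank, so consecutive
    # addresses fall into different banks and can be accessed in parallel.
    return address % NUM_BANKS

for addr in range(8):
    print(f"address {addr} -> bank {bank_for(addr)}")
# addresses 0,1,2,3 go to banks 0,1,2,3; address 4 wraps back to bank 0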
Advantages:
● Higher Bandwidth: Several banks can service requests at the same time, so more data moves per unit of time.
● Reduced Wait Time: While one bank is busy completing an access, the next access can start in another bank.
2. Cache Performance: Hit Rate, Miss Rate, and Miss Penalty

Hit Rate:
● Hit rate refers to the percentage of times that data requested by the CPU is
found in the cache (i.e., a cache hit).
○ Formula: Hit Rate = (Number of Cache Hits / Total Number of Memory Accesses) × 100%
● A high hit rate means that data is frequently accessed from the cache, which
leads to faster performance.
Miss Rate:
● The miss rate is the opposite of the hit rate. It refers to the percentage of
times the requested data is not found in the cache (i.e., a cache miss).
○ Formula: Miss Rate = 1 − Hit Rate (as fractions; equivalently, 100% − Hit Rate when both are percentages)
Miss Penalty:
● Miss penalty refers to the additional time required to retrieve data from a
lower-level memory (such as RAM or even disk) when a cache miss happens.
● High miss penalty can severely degrade performance, as it requires
accessing slower memory sources.
Impact on Performance:
● A high hit rate minimizes the impact of miss penalties because data can be
retrieved from the fast cache.
● A low miss rate (or high hit rate) results in faster memory access and better
overall performance.
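These metrics can be computed directly. The sketch below also uses the standard average memory access time formula, AMAT = hit time + miss rate × miss penalty, which is not stated in these notes but follows from the definitions above; all numbers are assumed:

hits = 950
accesses = 1000

hit_rate = hits / accesses        # 0.95 -> 95% of accesses served by the cache
miss_rate = 1 - hit_rate          # 0.05 -> 5% must go to slower memory

hit_time_ns = 1                   # assumed cache access time
miss_penalty_ns = 100             # assumed main-memory access time

# Average memory access time: every access pays the hit time, and the
# missing fraction also pays the miss penalty.
amat_ns = hit_time_ns + miss_rate * miss_penalty_ns
print(f"hit rate = {hit_rate:.0%}, miss rate = {miss_rate:.0%}, AMAT = {amat_ns} ns")
# -> hit rate = 95%, miss rate = 5%, AMAT = 6.0 ns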
3. Caches on the Processor Chip (On-Chip Caches)
On-chip caches are caches that are integrated directly onto the processor (CPU),
making them faster and more efficient than off-chip memory. Modern processors
typically include multiple levels of on-chip cache: L1 cache, L2 cache, and
sometimes L3 cache.
Benefits of On-Chip Caches:
● Reduced latency: On-chip caches have much lower access times compared to
off-chip memory.
● Higher throughput: The CPU can access data from the cache much more
quickly, leading to faster data processing.
● Energy efficiency: Since data does not need to travel far (within the chip),
power consumption is lower compared to accessing external memory.
Cache Levels:
● L1 Cache: Closest to the processor cores and the smallest in size (typically
32KB - 128KB per core). It stores the most frequently accessed data and
instructions.
● L2 Cache: Larger and slightly slower than L1 cache (typically 256KB - 8MB). It
serves as a backup to L1 cache and stores a broader set of data.
● L3 Cache: Shared among multiple cores in multi-core processors, much
larger in size (typically 2MB - 16MB), and slower than L1/L2 caches.
4. Write Buffer
A write buffer is a temporary storage area used to hold data before it is written to
the main memory or cache. It helps manage write operations efficiently.
Purpose:
● Speed up CPU operations: The CPU can continue executing instructions while
write operations are being buffered.
● Reduce contention: It reduces delays due to write operations waiting for
slower memory writes.
How it Works:
● When the CPU writes data to memory or cache, the data is first placed in the
write buffer.
● The buffer allows the CPU to continue other operations while the data is being
written to the destination (main memory or cache) at a later time.
Impact on Performance:
● Write buffers can improve performance by reducing write stalls and allowing
the CPU to perform other tasks while the data is being written.
● However, a full write buffer can cause stall cycles in the processor if there is
no space to write new data.
5. Prefetching
Prefetching is the process of anticipating future memory accesses and loading the
corresponding data into the cache before it is actually requested by the CPU.
Types of Prefetching:
● Hardware Prefetching: The processor detects access patterns (e.g., sequential addresses) and fetches the next blocks automatically.
● Software Prefetching: The compiler or programmer inserts explicit prefetch instructions ahead of the actual use of the data.
Benefits:
● Reduced cache miss rate: By bringing data into the cache ahead of time, the
likelihood of a cache miss is reduced.
● Increased throughput: Data is available in the cache when the CPU needs it,
reducing delays from fetching data from slower memory.
Challenges:
● Wasted Bandwidth: Inaccurate prefetches fetch data that is never used.
● Cache Pollution: Prefetched data can evict blocks that the CPU still needs.
6. Lockup-Free Cache
A lockup-free (non-blocking) cache is a cache architecture that avoids stalling the processor when multiple memory accesses happen simultaneously.
How It Works:
● In a lockup-free system, the CPU can continue to access the cache even if
there are cache misses or other memory accesses occurring at the same
time.
● It prevents the system from being locked up (stalled) due to memory access
issues.
Benefits:
● The CPU can keep executing, and can even service cache hits while earlier misses are still being resolved ("hit under miss").
Challenges:
● Requires extra hardware to track outstanding misses, which increases design complexity.
Summary:

Technique | Description | Benefit
Write Buffer | Temporary storage for write operations to avoid delays | Allows the CPU to continue work while write operations are handled
Lockup-Free Cache | Cache design that allows concurrent memory access | Prevents CPU stalls during cache accesses, improving efficiency
Virtual Memory

Virtual memory gives each process the illusion of a large, private memory, with the operating system and hardware mapping those addresses onto physical RAM. Key concepts:
● Virtual Address Space: A process is given its own range of memory addresses, referred to as its virtual address space.
● Physical Address Space: This is the actual memory space available in the
computer's RAM.
● Paging: Memory is divided into fixed-size blocks called pages (for virtual
memory) and frames (for physical memory). The operating system moves
pages between physical memory and secondary storage (like a hard drive or
SSD) as needed.
● Page Table: The page table is used to map virtual addresses to physical
addresses. It stores the mapping between virtual pages and physical frames.
Benefits of Virtual Memory:
● Isolation and protection: Each process operates in its own address space,
providing protection against interference from other processes.
● Efficient use of RAM: By allowing processes to use more memory than is
physically available, the system can run larger applications.
● Simplified programming model: Programmers don’t need to worry about
memory limitations because each process is given its own virtual address
space.
Address Translation

● Virtual Address: When a program uses a virtual address to access memory, the
virtual address is divided into two parts:
1. Page Number: The higher-order bits that identify the virtual page.
2. Offset: The lower-order bits that specify the exact location within the
page (the byte offset).
● Page Table Lookup: The page table is used to find the corresponding
physical page frame for a given virtual page. The page table stores the
mapping between the virtual page number and the physical page frame
number.
● Physical Address: Once the physical page frame is found, the offset from the
virtual address is combined with the frame number to form the physical
address.
The Translation Lookaside Buffer (TLB) is a special type of cache used to speed up
the address translation process. The TLB stores recent virtual-to-physical page
mappings to avoid the overhead of accessing the page table every time a translation
is needed.
● The TLB is a small, fast, associative cache that stores entries of the form:
1. Virtual Page Number (VPN): The virtual page.
2. Physical Frame Number (PFN): The corresponding physical frame.
● TLB Entry: A typical entry in the TLB consists of:
1. Tag: The virtual page number.
2. Data: The corresponding physical frame number.
3. Other information: Access permissions, dirty bit, etc.
● TLB Hit: The page number is found in the TLB. The physical address can be
generated immediately.
● TLB Miss: The page number is not found in the TLB. The CPU must access the
page table to find the physical frame, and then the mapping is typically
cached in the TLB.
● Like other caches, the TLB has a limited number of entries. When it’s full, an
entry must be replaced.
○ Common replacement policies include LRU (Least Recently Used) and
FIFO (First In, First Out).
Below is a simplified diagram showing how the TLB works in conjunction with the
page table during address translation.
+----------------------+     +--------------------------+     +--------------------------+
| CPU issues virtual   | --> | TLB lookup: is the page  | --> | TLB Hit: use the cached  |
| address (page number |     | number in the TLB?       |     | frame number             |
| + offset)            |     +--------------------------+     +--------------------------+
+----------------------+                  |
                                          | TLB Miss
                                          v
                           +----------------------------------+
                           | Access the page table to get the |
                           | physical frame number            |
                           +----------------------------------+
                                          |
                                          v
                           +----------------------------------+
                           | Combine physical frame number    |
                           | with the page offset             |
                           +----------------------------------+
                                          |
                                          v
                           +----------------------------------+
                           | Final physical address           |
                           +----------------------------------+
Explanation:
1. CPU generates a virtual address.
2. The TLB is checked for the virtual page number (VPN).
3. If the TLB contains the mapping (hit), the physical address is quickly obtained.
4. If there’s a miss, the page table is accessed to translate the virtual address to
a physical address.
5. The physical address is then used for memory access.
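A simplified Python sketch of these steps (added as an example; the 4 KB page size and the page-table contents are assumed):

PAGE_SIZE = 4096                       # 4 KB pages -> low 12 bits are the offset

page_table = {0: 5, 1: 9, 2: 3}        # virtual page number -> physical frame
tlb = {}                               # small cache of recent translations

def translate(virtual_addr: int) -> int:
    vpn = virtual_addr // PAGE_SIZE    # 1. extract the virtual page number
    offset = virtual_addr % PAGE_SIZE  #    and the offset within the page

    if vpn in tlb:                     # 2.-3. TLB hit: no page-table walk
        pfn = tlb[vpn]
    else:                              # 4. TLB miss: consult the page table
        pfn = page_table[vpn]          #    (a real OS would handle page faults)
        tlb[vpn] = pfn                 #    cache the mapping for next time

    return pfn * PAGE_SIZE + offset    # 5. frame number + offset = physical addr

print(hex(translate(0x1234)))          # VPN 1 maps to frame 9 -> 0x9234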
Summary of Concepts

Concept | Description
TLB Hit | The virtual page number is found in the TLB, so the physical address is formed immediately.
TLB Miss | Occurs when the requested virtual page number is not found in the TLB, requiring a lookup in the page table.
TLB Replacement Policy | Rules for replacing entries in the TLB when it is full, such as LRU or FIFO.
Memory Management Requirements

Memory management must ensure that processes are isolated from each other.
Each process should have its own private memory space, preventing one process
from interfering with or accessing the memory of another.
● Process Isolation: Each process should have its own virtual address space.
This prevents processes from directly accessing or modifying each other’s
memory, providing security and stability.
● Protection Mechanisms: The OS should prevent processes from accessing
memory they do not own, even if a process tries to use another process's
memory intentionally or due to bugs.
○ This is achieved using mechanisms like base and limit registers,
access control lists, and memory segmentation.
3. Memory Address Translation
● Paging: Memory is divided into fixed-size blocks (pages), and the system uses
a page table to map virtual addresses to physical addresses.
● Segmentation: Memory is divided into segments, each representing different
types of data, such as code, data, and stack.
● Virtual Memory: It allows processes to use more memory than physically
available by swapping data in and out of secondary storage (disk). This
requires efficient address translation mechanisms.
Memory Fragmentation

Memory fragmentation refers to the inefficient use of memory due to the allocation
and deallocation of memory blocks of various sizes. Fragmentation can degrade
system performance, so memory management should minimize both external and
internal fragmentation.
Solutions to Fragmentation:
● Paging: Fixed-size pages eliminate external fragmentation (though some internal fragmentation remains within the last page of an allocation).
● Compaction: The OS periodically moves allocated blocks together to merge free space.
● Allocation Policies: Strategies such as best-fit allocation reduce wasted space.
Access Control

Access control mechanisms are vital to protect the system from unauthorized
access and to prevent processes from violating memory boundaries.
Virtual Memory Management

Virtual memory management allows the OS to run large programs on machines with
limited physical memory by swapping data in and out of secondary storage (disk).
This is done through paging or segmentation, which helps the system use memory
more efficiently.
● Paging: Dividing memory into small fixed-size blocks called pages. When the
system is running low on memory, it swaps pages from RAM to disk storage
(swap space) and vice versa.
● Swapping: The OS swaps entire processes or parts of processes between
RAM and disk to free up memory for other processes.
Dynamic Memory Management

● Heap Memory: The heap is a region of memory used for dynamic memory
allocation. The operating system needs to manage the allocation and
deallocation of memory in the heap to prevent issues like memory leaks or
segmentation faults.
● Garbage Collection: Some systems (e.g., Java, Python) use garbage
collection to automatically reclaim memory that is no longer in use. This helps
in managing memory for long-running applications.
Combining Paging and Segmentation

Paging and segmentation are two methods of memory management, and some
systems combine both techniques.
● Paging: Splits memory into small fixed-size blocks (pages). Each process’s
address space is divided into pages, and these pages are mapped to frames
in physical memory.
● Segmentation: Divides the memory into segments that are logical divisions
such as code, stack, data, and heap. Each segment can grow or shrink
independently.
● Combined Paging-Segmentation: Some systems combine the two
techniques to allow more flexible memory management.
Summary:

Requirement | Description
Process Isolation & Protection | Ensure that processes do not interfere with each other's memory and prevent unauthorized access.
Let's look at the details of magnetic disks, other storage devices, and how they
operate.
Magnetic disks have long been the most widely used type of secondary storage. They store data by magnetizing tiny regions along circular tracks on a disk's surface. Data is written to or read from the disk by read/write heads that move across the surface.
Key Characteristics:
● High Storage Capacity: Magnetic disks offer large storage capacities ranging
from several gigabytes (GB) to multiple terabytes (TB).
● Non-Volatility: Data remains stored even when the power is turned off,
making it ideal for long-term storage.
● Faster Data Access: Faster than optical media (like CDs/DVDs) for reading
and writing data.
● Cost-Effective: Magnetic disks are more affordable than solid-state drives
(SSDs) for the amount of storage they provide.
4. Disk Structure
● Tracks: Circular paths on the disk surface where data is stored. Each platter of
a disk has multiple concentric tracks.
● Sectors: A track is further divided into smaller segments called sectors, which
typically hold 512 bytes or 4 KB of data.
● Cylinders: When multiple platters are used in a disk, the same track number
across different platters forms a cylinder.
5. Access Time
Access time is the time it takes to retrieve or write data to the disk. It includes:
● Seek Time: The time it takes for the disk’s read/write head to move to the
correct track.
● Rotational Latency: The time it takes for the disk to rotate to the correct
sector under the read/write head.
● Data Transfer Time: The time taken to transfer data to or from the disk once
the correct sector is under the head.
The total disk access time is the sum of seek time, rotational latency, and data
transfer time.
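A short worked example in Python (all figures assumed, for a hypothetical 7200 RPM drive):

rpm = 7200
seek_ms = 9.0                                  # assumed average seek time
rotation_ms = 60_000 / rpm                     # one full rotation = 8.33 ms
avg_rotational_latency_ms = rotation_ms / 2    # on average, wait half a rotation

transfer_rate_mb_s = 150                       # assumed sustained transfer rate
block_kb = 4
transfer_ms = (block_kb / 1024) / transfer_rate_mb_s * 1000

total_ms = seek_ms + avg_rotational_latency_ms + transfer_ms
print(f"access time ≈ {total_ms:.2f} ms")      # ≈ 13.19 ms; dominated by seek
                                               # time and rotational latency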
6. Data Buffer/Cache
The data buffer (or cache) is a small amount of high-speed memory used to store
frequently accessed data. It helps improve disk performance by reducing the
number of times data must be read from or written to the slower magnetic disk.
7. Disk Controller
The disk controller is a hardware component responsible for managing the reading
and writing of data on the disk. It communicates between the CPU and the disk,
converting commands from the operating system into actions that the disk hardware
can perform. The disk controller handles tasks such as:
● Positioning the read/write heads over the correct track (seeking).
● Transferring data between the disk and system memory or cache.
● Detecting and correcting errors in data read from the disk.
8. ATA/EIDE Disks
● ATA (Advanced Technology Attachment): A standard interface for connecting
hard drives to computers. ATA defines both the physical connection and the
protocol for communication.
● EIDE (Enhanced IDE): An extended version of ATA that supports faster data
transfer rates, larger storage capacities, and additional features like CD-ROM
support.
Advantages:
● Simple, inexpensive, and widely supported by PCs of its era.
● Adequate performance for typical desktop workloads.
Disadvantages:
● Limited to two devices per channel (master/slave configuration).
● Slower transfer rates and shorter cables than SCSI or later interfaces.
9. SCSI (Small Computer System Interface)
SCSI is a set of standards for connecting and transferring data between computers
and peripheral devices. It supports multiple devices on a single bus and offers fast
data transfer rates.
Advantages:
● Supports multiple devices on one bus, reducing the need for multiple
controller cards.
● High performance and reliability, often used in servers and high-end
workstations.
● Can support a wide range of devices (hard drives, scanners, printers, etc.).
Disadvantages:
● More expensive than ATA/IDE interfaces.
● More complex to configure (device IDs, bus termination).
10. SATA (Serial ATA)
SATA is a newer interface for connecting hard drives and SSDs to the motherboard,
designed to replace the older IDE/ATA interface.
Advantages:
● Faster data transfer rates than the parallel ATA interface it replaced.
● Thinner cables that improve airflow and simplify installation.
● Supports hot-swapping of drives.
Disadvantages:
● Still slower than modern SAS (Serial Attached SCSI) or NVMe interfaces for
high-performance storage.
● Limited to a smaller number of devices connected per bus.
11. Floppy Disks
Floppy disks were once popular for storing small amounts of data. They are now
obsolete in most systems, but were used extensively in the past for portable storage.
Characteristics:
● Small removable magnetic disks, most commonly the 3.5-inch, 1.44 MB format.
Advantages:
● Cheap, portable, and rewritable; nearly universal on PCs of their time.
Disadvantages:
● Tiny capacity, slow access, and easily damaged media.
12. RAID (Redundant Array of Independent Disks)
RAID is a data storage technology that combines multiple physical disk drives into
one or more logical units to improve data redundancy and performance.
RAID Levels:
● RAID 0: Striping (data is divided into blocks and spread across multiple disks).
Advantage: Fast read/write speeds. Disadvantage: No data redundancy.
● RAID 1: Mirroring (data is duplicated on two or more disks). Advantage: Data
redundancy (backup). Disadvantage: Requires double the storage.
● RAID 5: Striping with parity (data is striped across multiple disks with parity
information for redundancy). Advantage: Efficient use of space and data
redundancy. Disadvantage: Slower write speeds.
● RAID 10: Combination of RAID 1 and RAID 0. Advantage: Both redundancy and
improved performance. Disadvantage: Requires a minimum of 4 disks.
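The redundancy in RAID 5 rests on XOR parity: the parity block is the XOR of the data blocks, so any one lost block can be rebuilt from the survivors. Below is a tiny Python sketch of the idea (with made-up blocks); RAID's overall advantages and disadvantages follow after it:

block_a = b"\x10\x20\x30"
block_b = b"\x0f\x0e\x0d"

# Parity block: bytewise XOR of the data blocks.
parity = bytes(a ^ b for a, b in zip(block_a, block_b))

# Simulate losing block_a: XOR the parity with the surviving block.
recovered = bytes(p ^ b for p, b in zip(parity, block_b))
assert recovered == block_a
print("block_a rebuilt from parity:", recovered.hex())   # -> 102030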
Advantages:
● Improved reliability through redundancy (except RAID 0) and, for several levels, better performance.
Disadvantages:
● Requires additional disks and controller hardware, and redundancy is not a substitute for backups.
13. Optical Disks
Optical disks, such as CDs, DVDs, and Blu-ray disks, use laser technology to read
and write data. They are primarily used for storing media files, software, and archival
data.
Advantages:
● Durable, cheap to produce, and well suited to mass distribution and archival storage.
Disadvantages:
● Slow access times and limited capacity compared to hard drives.
14. Magnetic Tapes
Magnetic tapes store data sequentially on reels or cartridges of magnetic tape and are used mainly for backups and archives.
Advantages:
● Very high capacity and low cost per byte, making them cost-effective for backups.
Disadvantages:
● Slow read/write speeds and sequential-only access; reaching a given record may require winding through the tape.
Summary:

Storage Type | Advantages | Disadvantages
Optical Disks | Durable, cheap, good for mass storage | Slow access times, limited capacity
Magnetic Tapes | High capacity, cost-effective for backups | Slow read/write speeds, sequential access only
I/O Organization
Introduction:
In computer architecture, I/O (Input/Output) Organization refers to the way a computer system
manages the data exchange between its internal components (like the CPU, memory) and the
outside world (like keyboards, mice, displays, storage devices, etc.).
I/O Organization is how a computer's internal system communicates with the outside world to
send and receive data efficiently, which is crucial for real-time systems like gaming, interactive
devices, and even industrial machinery.
Simple Explanation:
1. Input Devices: Devices that send data into the computer, like keyboards, mice, and sensors.
2. Output Devices: Devices that display or produce results from the computer, like
monitors, printers, and speakers.
3. I/O Controllers: Hardware that helps manage and control data transfer between the
internal system and external devices.
Key Concepts:
● I/O Ports: These are physical or virtual ports where input or output devices connect to
the computer.
● Buses: These are communication pathways that carry data between the computer's
internal components and external devices.
● Direct Memory Access (DMA): A method where data is transferred directly between
memory and I/O devices, bypassing the CPU to speed up data transfer.
Real-Time Applications:
1. Touchscreen Devices: Smartphones and tablets use I/O organization to handle touch
input and display output (text, images).
2. Printers: Computers send data to printers (output) via an I/O controller to convert digital
documents into physical printouts.
Accessing I/O devices refers to how a computer communicates with devices like keyboards,
mice, printers, and storage devices. This involves reading data from or writing data to external
devices, which can be done in different ways. The two most common methods of accessing I/O
devices are Memory-Mapped I/O and I/O Mapped I/O.
1. Memory-Mapped I/O
In Memory-Mapped I/O, I/O devices share the same address space as memory, so the CPU accesses them with ordinary load and store instructions.
● How it works: The I/O devices are assigned specific addresses within the system's
memory space. When the CPU reads or writes data to those addresses, the data is sent
to or received from the I/O devices.
● Advantages:
○ Easier and faster to program because the CPU uses the same instructions for
memory and I/O.
○ More flexible, as you can access larger address ranges.
● Disadvantages:
○ Consumes address space that could otherwise be used for RAM.
2. I/O Mapped I/O (Port-Mapped I/O)
In I/O Mapped I/O, I/O devices have their own separate address space (different from the memory address space), and the CPU uses special instructions to access them.
● How it works: The I/O devices are assigned separate addresses, and the CPU uses
specific I/O instructions (like IN or OUT in assembly language) to access these devices.
● Advantages:
○ Memory space for RAM is not taken up by I/O devices.
○ More secure and simpler to manage since memory and I/O are separate.
● Disadvantages:
○ The CPU requires different instructions to interact with I/O devices, making
programming slightly more complex.
Example: A printer or disk drive might be accessed through I/O instructions that target its
specific port addresses.
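The contrast can be sketched in software. Below is a conceptual Python model of the address decoder in a memory-mapped system (the device register address is an assumption, and real hardware does this routing in logic, not code):

RAM = bytearray(1024)
DEVICE_REGISTERS = {0x3F8: 0}      # assumed: one device register at address 0x3F8

def load(addr: int) -> int:
    if addr in DEVICE_REGISTERS:   # the decoder routes this access to the device
        return DEVICE_REGISTERS[addr]
    return RAM[addr]               # otherwise it is an ordinary memory read

def store(addr: int, value: int) -> None:
    if addr in DEVICE_REGISTERS:   # writing here "talks" to the device,
        DEVICE_REGISTERS[addr] = value
    else:                          # and that address is lost to RAM's range
        RAM[addr] = value

store(0x3F8, ord("A"))             # device write, using a normal store
store(0x100, 7)                    # ordinary RAM write
print(DEVICE_REGISTERS[0x3F8], RAM[0x100])   # -> 65 7

In an I/O-mapped system, by contrast, ordinary loads and stores would never reach the devices; separate "in/out" operations with their own port numbers would be used instead.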
An I/O interface is a hardware component that allows communication between the CPU and the
input device (like a keyboard or mouse). The interface converts signals from the input device
into data the CPU can understand and vice versa.
● How it works: When a user interacts with an input device, the interface converts the
physical input (e.g., keypress, mouse movement) into data that the CPU processes.
● Example: For a keyboard, each keypress generates a unique code. The interface
transmits this code to the CPU, which interprets it.
What is an Interrupt?
An interrupt is a signal sent to the CPU by hardware or software indicating an event that needs immediate attention. When an interrupt occurs, the CPU temporarily suspends its current work, handles the event, and then resumes where it left off.
In simpler terms, an interrupt is like a "signal" that says, "Hey, stop what you're doing and deal with this right now!"
Types of Interrupts
Interrupts can be categorized based on their source and how they are handled. Here are the
main types of interrupts:
1. Hardware Interrupts: Generated by hardware devices, such as a keypress or a disk finishing a transfer.
2. Software Interrupts: Triggered by programs, typically to request operating system services (system calls).
3. Internal Interrupts (Exceptions): Raised by the CPU itself when an error occurs, such as division by zero or an invalid instruction.
4. External Interrupts: Come from sources outside the processor, such as a timer or a power-failure signal.
How Interrupts Are Handled
1. Interrupt Request (IRQ): An interrupt is initiated by the hardware or software, which
sends an interrupt request to the CPU.
2. Interrupt Acknowledgment: The CPU stops executing the current instruction and
acknowledges the interrupt request.
3. Interrupt Service Routine (ISR): The CPU runs a special function or routine (ISR)
designed to handle the interrupt. The ISR takes care of the interrupt and processes it.
4. Return to Normal Execution: After the ISR finishes, the CPU resumes its previous task
or program where it left off.
Imagine you're cooking dinner (the CPU running a program) and your phone rings (an interrupt).
You stop cooking (pause the current task) and answer the call (handle the interrupt). After the
call is finished, you return to cooking (resume the original task).
Interrupts allow the CPU to be more efficient by responding quickly to important tasks without
having to constantly check for them manually (polling).
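A minimal Python sketch of the dispatch idea (the IRQ numbers and handler names are assumed): a vector table maps each interrupt number to its interrupt service routine.

def keyboard_isr():
    print("handling keypress")

def timer_isr():
    print("handling timer tick")

vector_table = {0: timer_isr, 1: keyboard_isr}   # IRQ number -> ISR

def handle_interrupt(irq: int):
    saved_context = "program state"   # CPU saves registers and program counter
    vector_table[irq]()               # jump to the ISR for this IRQ
    _ = saved_context                 # restore context; resume interrupted work

handle_interrupt(1)                   # -> handling keypress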
Direct Memory Access (DMA)
DMA lets an I/O device exchange data with memory directly, without the CPU copying each byte. Key Points:
1. Initiation: The CPU sends a command to the DMA controller to set up the transfer,
specifying the memory address, the I/O device, and the amount of data to be transferred.
2. Transfer: The DMA controller takes control of the system's data bus and transfers data
directly between the I/O device and memory.
3. Completion: Once the transfer is complete, the DMA controller sends an interrupt to the
CPU to notify it that the transfer is finished, allowing the CPU to resume its normal tasks.
The same transfer, seen from each component's side:
1. CPU to DMA: The CPU initializes the DMA by sending the source and destination
addresses, the direction (read/write), and the number of data units.
2. DMA Controller: The DMA controller handles the data transfer directly, accessing the
memory and the I/O device without CPU involvement.
3. Interrupt: After completing the transfer, the DMA controller sends an interrupt to the
CPU, signaling that the task is finished.
Types of DMA:
1. Burst Mode DMA: The DMA controller transfers a block of data in a single burst. The
CPU is locked out during this burst and cannot perform other tasks.
2. Cycle Stealing DMA: The DMA controller transfers data one byte at a time, stealing
cycles from the CPU. The CPU gets control after each byte is transferred.
3. Block Mode DMA: Similar to burst mode, but the DMA controller transfers data in larger
blocks, giving the CPU more time between transfers.
4. Demand Mode DMA: The DMA controller transfers data when the I/O device is ready,
and the CPU can continue with other tasks in the meantime.
+-------------------+       +--------------------+       +---------------------+
|        CPU        |<----->|   DMA Controller   |<----->|     I/O Device      |
+-------------------+       +--------------------+       +---------------------+
          |                           |                            |
          |                           v                            |
          |                  +----------------+                    |
          +----------------->|     Memory     |<-------------------+
                             +----------------+
                               (data transfer)
Explanation of Diagram:
1. CPU to DMA: The CPU sets up the DMA controller with the required information
(source, destination, and data size).
2. DMA Controller: Once initialized, the DMA controller takes over and transfers data
between the memory and I/O device without involving the CPU.
3. Memory: Data is moved from memory to the I/O device or from the I/O device to
memory.
4. Interrupt: After completing the data transfer, the DMA controller sends an interrupt to the
CPU, notifying it that the task is finished.
Advantages of DMA:
● Efficiency: The CPU is freed from manually managing every byte of data transfer,
allowing it to focus on more important tasks.
● Speed: Direct transfers without CPU intervention lead to faster data transfers.
● Concurrent Operation: DMA enables data transfer and CPU processing to occur
simultaneously, enhancing overall system performance.
What is a Bus?
A bus in computer architecture is a set of communication pathways or lines that allow data to be
transferred between different components of the computer, such as the CPU, memory, and
input/output devices. It acts like a "highway" that data, addresses, and control signals travel on,
enabling different parts of the computer to communicate with each other.
A bus consists of three types of lines:
1. Data Bus: Carries data between the CPU, memory, and peripherals.
2. Address Bus: Carries memory addresses from the CPU to the memory and I/O devices.
3. Control Bus: Carries control signals that coordinate the operations of different
components (e.g., read/write signals).
Types of Buses
There are two major types of buses based on how they synchronize data transfer:
Synchronous Bus and Asynchronous Bus.
1. Synchronous Bus
A synchronous bus is a bus system where the data transfer is synchronized with a clock
signal. The clock signal regulates the timing of the data transfer, ensuring that both the sender
and receiver are ready to exchange data at the same time.
Key Features:
● Clock Signal: The operation of a synchronous bus depends on a central clock signal
that synchronizes data transfer between components.
● Timing: Data is transferred in sync with the clock, meaning every transfer happens at
fixed intervals.
● Faster Transfer: Since transfers are synchronized, the timing is predictable and faster.
● Efficiency: As both the sender and receiver are synchronized, the system ensures that
data is transferred only when both parties are ready.
Working:
In a synchronous bus, both the sender and receiver wait for the clock signal before sending or
receiving data. This reduces the chances of data being missed or out of sync.
Example:
When a CPU sends data to memory, the transfer will happen at a fixed clock cycle (e.g., every
5ns, 10ns), which ensures consistency and timing precision.
Advantages:
● Predictable Timing: Since the clock controls the flow, the timing of the transfer is
predictable and managed.
● Faster Transfers: Typically faster because the system works in a regular and organized
way.
Disadvantages:
● Dependence on Clock: The entire system depends on the clock speed, meaning higher
clock speeds are needed for faster data transfers.
● Complexity: The system may require more complex control mechanisms to handle
various devices.
2. Asynchronous Bus
An asynchronous bus does not rely on a clock signal for synchronization. Instead, data
transfer is controlled by handshaking signals that indicate when the sender and receiver are
ready for data transfer. In other words, the sender and receiver communicate using control
signals, and the data is transferred when both devices are ready.
Key Features:
● No Clock Signal: The transfer does not rely on a common clock, making it more flexible.
● Handshaking: Communication occurs through handshaking signals. The sender signals
when data is ready, and the receiver signals when it's ready to receive the data.
● Variable Timing: The data transfer happens at variable intervals, as it is dependent on
the readiness of the components involved.
Working:
In an asynchronous bus, the sender places data on the bus and then signals the receiver using
control signals that the data is ready. The receiver, in turn, acknowledges the receipt of the data,
and the transfer happens without a fixed timing reference.
Example:
A peripheral device (like a keyboard or mouse) sends data to the CPU. It sends a signal to the
CPU when the data is ready. The CPU reads the data when it is available, without relying on a
clock cycle.
Advantages:
● Flexibility: Can work with devices that have different operating speeds.
● No Need for Synchronization Clock: There’s no central clock, so devices can operate
independently of each other.
Disadvantages:
● Slower Data Transfers: Since there’s no clock, the handshaking process can make data
transfers slower compared to synchronous systems.
● Complex Communication: More control signals are needed to manage the data
transfer, which can make the system design more complex.
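A step-by-step Python sketch of the handshake (signal names and ordering simplified for illustration; a real bus drives these as electrical lines, not variables):

def asynchronous_transfer(data):
    ready, ack = False, False

    ready = True                # 1. sender places data on the bus, asserts ready
    if ready:
        received = data         # 2. receiver latches the data
        ack = True              # 3. receiver asserts acknowledge
    if ack:
        ready = False           # 4. sender sees the ack and drops ready
        ack = False             # 5. receiver drops ack; the bus is idle again
    return received

print(asynchronous_transfer(0x42))    # -> 66; no shared clock was involved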
What are Interface Circuits?
Interface circuits are electronic circuits that allow communication between different
components of a computer system, such as the CPU, memory, and I/O devices (e.g., keyboard,
printer, disk drives). These circuits help translate the signals from one component to a form that
another component can understand. In simpler terms, an interface circuit acts as a translator
between the different parts of a computer system, enabling smooth data communication.
Input interface circuits are used to connect input devices (e.g., keyboard, mouse, sensors) to
the computer system so that the data from these devices can be received by the CPU or
memory. These circuits manage the electrical signals from the input device and convert them
into a format that the CPU can understand.
Key Functions:
● Signal Conversion: They convert the signals from the input device (which may be in
analog or digital form) into a format suitable for the CPU (usually digital signals).
● Data Synchronization: Ensures that the data from the input device is correctly
synchronized with the system’s clock or communication protocol.
● Buffering: Stores data temporarily in a buffer before passing it to the CPU, preventing
data loss.
Example:
When you press a key on a keyboard, the input interface converts the keypress signal into
binary data, which is then sent to the CPU for processing.
Output interface circuits connect output devices (e.g., monitor, printer, speakers) to the computer system so that processed data can be delivered to the device in a form it can use.
Key Functions:
● Signal Conversion: Converts the digital data from the CPU into a form that the output
device can understand. For example, converting digital data into analog signals for a
speaker.
● Data Formatting: Organizes the data according to the output device's specifications
(e.g., for printing a document or displaying an image on the screen).
● Buffering: Temporarily stores data to prevent overflow or delays in processing.
Example:
When you print a document, the output interface takes the digital data from the CPU and
converts it into a format that the printer can understand, such as control signals or paper
instructions.
A Standard I/O Interface defines a common set of rules, protocols, and connectors that
standardizes the communication between the computer system and external devices. These
interfaces are designed to provide compatibility across different systems and devices, ensuring
they can work together smoothly.
Common Standard Interfaces:
● USB (Universal Serial Bus): A widely used interface for connecting various devices like
keyboards, mice, printers, and external drives.
● PCI (Peripheral Component Interconnect): A standard used for connecting internal
components, such as network cards or graphics cards, to the motherboard.
● Serial and Parallel Ports: Older standards for connecting devices like printers and
external modems.
What is PCI (Peripheral Component Interconnect)?
PCI is a standard bus used to connect internal expansion devices, such as graphics cards, network cards, and sound cards, to a computer's motherboard.
Key Features:
● High-Speed Communication: PCI runs at bus clock speeds of 33 MHz up to 133 MHz (in PCI-X), giving data transfer rates of 133 MB/s and higher.
● Plug-and-Play: PCI devices can be added to the system without needing manual
configuration of settings like IRQ or memory addresses.
● Bus Architecture: PCI uses a parallel bus architecture, meaning multiple devices can
communicate over the same data path simultaneously.
Example:
A graphics card or network adapter installed in a PCI slot on the motherboard to expand the
system's capabilities.
SCSI (Small Computer System Interface) is a set of standards for connecting and transferring
data between computers and peripheral devices, such as hard drives, scanners, and printers. It
provides a flexible and fast communication protocol for a wide range of devices.
Key Features:
● Multiple Devices: SCSI supports multiple devices on a single bus, allowing devices like
hard drives, printers, and CD drives to be connected simultaneously.
● Fast Data Transfer: SCSI provides faster data transfer rates compared to older
standards like parallel ports or serial connections.
● Versatile: SCSI can be used for both internal (inside the computer) and external
(connected to the computer) devices.
Example:
You might use a SCSI interface to connect multiple hard drives to a server or workstation,
providing high-speed data transfer and expandability.
What is USB (Universal Serial Bus)?
USB is a widely used standard interface for connecting external devices, such as keyboards, printers, and storage drives, to a computer.
Key Features:
● Ease of Use: USB is plug-and-play, meaning you can connect devices without restarting
the computer or manually configuring settings.
● Data and Power: USB not only transfers data but also supplies power to low-power
devices, eliminating the need for separate power cables.
● Multiple Device Support: USB can connect multiple devices through hubs, allowing
many devices to be connected to a single USB port.
● Hot-Swappable: Devices can be connected and disconnected without turning off the
computer.
Example:
A printer, flash drive, or smartphone can be connected to a computer using a USB port to
transfer data or charge the device.
Summary:
● PCI: A high-speed bus standard for connecting internal devices like graphics cards or
network cards to the motherboard.
● SCSI: A flexible and high-speed interface used for connecting multiple external devices
like hard drives and scanners.
● USB: A universal and widely used standard for connecting external devices to a
computer, providing both data transfer and power.