Architecture Solve Part 1-1
Example: Imagine a memory system divided into four banks (Bank 0, Bank 1, Bank 2, Bank 3). When
the CPU needs to read or write data, it can access these banks in a staggered manner. While Bank 0 is
being accessed for the first operation, Bank 1 can be accessed for the second operation, Bank 2 for
the third, and Bank 3 for the fourth. This parallelism increases the overall memory throughput and
reduces latency.
Benefits:
Improved Performance: Allows for faster data access and higher bandwidth.
Reduced Latency: Minimizes the waiting time for memory operations.
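A minimal Python sketch (assuming simple low-order interleaving, where consecutive addresses rotate across the four banks):

NUM_BANKS = 4
# Low-order interleaving: consecutive addresses fall in consecutive banks,
# so a sequential burst keeps all four banks busy in parallel.
for addr in range(8):
    print(f"address {addr} -> Bank {addr % NUM_BANKS}")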
(h) State any two differences between synchronous and asynchronous pipeline processors.
Differences Between Synchronous and Asynchronous Pipeline Processors
1. Clock Dependency:
Synchronous Pipeline Processor: All stages operate in lockstep, synchronized by a global
clock signal. Each stage performs its operation within a clock cycle.
Asynchronous Pipeline Processor: Stages operate independently and are not synchronized
by a global clock. Instead, they use handshaking protocols to communicate and move data
between stages.
2. Performance and Flexibility:
Synchronous Pipeline Processor: Generally simpler to design and debug, but performance is
limited by the slowest stage, as all stages must wait for the clock signal.
Asynchronous Pipeline Processor: More flexible and can potentially achieve higher
performance, as each stage operates at its own speed. However, it is more complex to design
and manage due to the lack of a central clock.
(j) State the relation between clock frequency of the pipeline structure and propagation delay of
the pipeline stage.
Relation Between Clock Frequency and Propagation Delay in Pipeline Stages
Clock Frequency:
The clock frequency of a pipeline structure is the rate at which clock pulses are
generated, dictating the speed at which the pipeline stages process instructions. It is
measured in hertz (Hz).
Propagation Delay:
Propagation delay is the time taken for a signal to travel from the input to the output of a
pipeline stage. It determines how quickly a stage can complete its operation.
Relationship:
The clock frequency is inversely related to the propagation delay. Specifically, the clock
period (the inverse of clock frequency) must be at least as long as the maximum propagation
delay of any pipeline stage to ensure correct operation.
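Expressed as a formula (latch/setup overhead omitted for simplicity):

\[ T_{\text{clock}} \ge \max_i \tau_i \quad\Longrightarrow\quad f_{\text{clock}} = \frac{1}{T_{\text{clock}}} \le \frac{1}{\max_i \tau_i} \]

where \(\tau_i\) is the propagation delay of pipeline stage i.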
(k) ’Pipeline performance increases as the number of stages increases’ - critically justify the
statement.
Pipeline Performance and Number of Stages
Statement Justification:
Increased Throughput: Adding more stages to a pipeline can increase the throughput, as
more instructions can be processed simultaneously. Each stage performs a part of the task,
allowing multiple instructions to be in different stages of execution at the same time.
Shorter Clock Cycle: By breaking down tasks into smaller stages, the clock cycle time can be
reduced, since each stage requires less time to complete. This shorter clock cycle can lead to
a higher overall clock frequency, improving performance.
Critical Perspective:
Diminishing Returns: Beyond a certain point, adding more stages can lead to diminishing
returns. The overhead of managing the pipeline, including control logic and inter-stage
communication, can offset the gains in performance.
Increased Complexity: More stages increase the complexity of the pipeline design, leading to
higher costs and potential issues with synchronization, latency, and data hazards.
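A short Python sketch of this trade-off (illustrative numbers only, assuming 100 ns of total work split evenly across k stages with a 2 ns latch overhead per stage):

total_work_ns = 100.0      # assumed total combinational work
overhead_ns = 2.0          # assumed per-stage latch/control overhead
n = 1000                   # instructions
for k in (1, 2, 4, 8, 16, 32):
    cycle = total_work_ns / k + overhead_ns       # clock period
    pipelined = (k + n - 1) * cycle               # fill time + steady state
    unpipelined = n * (total_work_ns + overhead_ns)
    print(f"k={k:2d}  speedup={unpipelined / pipelined:.2f}")
# Speedup grows quickly at first, then flattens as the overhead dominates.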
(m) What do you mean by a 16-way set-associative cache with a line capacity of 512 bytes?
16-Way Set-Associative Cache with 512-Byte Line Capacity
Definition: A 16-way set-associative cache is a type of CPU cache that divides the cache into multiple
sets, where each set contains multiple lines or blocks of data. Specifically, a 16-way set-associative
cache means that each set contains 16 cache lines.
Details:
Set-Associative: This design lies between fully associative (where any block can be placed
anywhere in the cache) and direct-mapped (where each block has only one possible
location). Set-associative caches strike a balance, providing more flexibility and efficiency.
16-Way: In a 16-way set-associative cache, each set can hold 16 different lines of data,
reducing conflicts and improving hit rates.
Line Capacity: The line capacity of 512 bytes means each line in the cache can store 512
bytes of data. This is the size of the block transferred between the main memory and the
cache.
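A minimal Python sketch of the resulting address breakdown (the 256 KB total cache size and 32-bit address are hypothetical, since the question does not state them):

cache_bytes = 256 * 1024   # hypothetical total cache size
line_bytes = 512           # line capacity from the question
ways = 16                  # 16-way set-associative
num_sets = cache_bytes // (line_bytes * ways)      # 32 sets
offset_bits = line_bytes.bit_length() - 1          # 9 (2^9 = 512)
index_bits = num_sets.bit_length() - 1             # 5 (2^5 = 32)
tag_bits = 32 - index_bits - offset_bits           # 18
addr = 0x1234ABCD
offset = addr & (line_bytes - 1)
index = (addr >> offset_bits) & (num_sets - 1)
tag = addr >> (offset_bits + index_bits)
print(tag_bits, index_bits, offset_bits, hex(tag), index, offset)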
(n) CSA (Carry Save Adder)
Explanation: A Carry Save Adder (CSA) can add up to R + 1 numbers simultaneously, where R is the
radix (base) of the number system; in binary (R = 2), that is three numbers at once. A CSA avoids
immediate carry propagation: it sums the bits of each position independently and keeps the resulting
carries in a separate carry word, which is what lets it accept an extra operand. A sketch of one such
step follows.
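A minimal Python sketch of one carry-save (3:2) step: three operands are reduced to a sum word and a carry word with no carry propagation:

a, b, c = 0b1011, 0b1101, 0b0110
sum_word = a ^ b ^ c                              # per-bit sum, no carries
carry_word = ((a & b) | (b & c) | (a & c)) << 1   # saved carries, shifted left
assert sum_word + carry_word == a + b + c         # one final add resolves them
print(bin(sum_word), bin(carry_word))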
(o) Pipeline Architectures in Parallel Computer Design
1. Linear Pipeline: Instructions pass through a series of stages in a sequential manner.
2. Non-Linear Pipeline: Instructions may follow different paths or branches within the pipeline
based on control decisions, such as conditional statements.
(p) Condition for NaN in IEEE-754 Format
Condition: A NaN (Not a Number) in IEEE-754 format is represented by an exponent of all 1s
(1111...1111) and a non-zero fraction (mantissa).
(q) IEEE-754 Single Precision Format for -4.5
Sign Bit: 1 (indicating negative)
Exponent: 10000001 (true exponent 2 plus bias 127 = 129)
Mantissa: 00100000000000000000000
IEEE-754 Representation: 1 10000001 00100000000000000000000
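The bit pattern can be checked with Python's struct module:

import struct
# Pack -4.5 as an IEEE-754 single-precision value and print its 32 bits.
bits = struct.unpack('>I', struct.pack('>f', -4.5))[0]
s = f"{bits:032b}"
print(s[0], s[1:9], s[9:])    # 1 10000001 00100000000000000000000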
(r) Representation of Zero in IEEE-754
Zero: In IEEE-754 format, zero is represented by all bits being zero: 0 00000000
00000000000000000000000. This is +0; setting the sign bit to 1 gives −0.
(s) Relation Between Instruction Cycle, Machine Cycle, and Clock Cycle
1. Clock Cycle: The smallest unit of time in a processor, defined by the processor's clock
frequency.
2. Machine Cycle: Consists of several clock cycles and represents the time taken to execute a
basic operation, like fetching or decoding.
3. Instruction Cycle: Encompasses several machine cycles and represents the time taken to
execute a complete instruction.
(t) Difference Between Main Memory and Cache Memory
1. Speed: Cache memory is faster than main memory.
2. Size: Cache memory is smaller in size compared to main memory.
3. Purpose: Cache memory stores frequently accessed data to speed up processing, while main
memory stores all data and instructions required by the CPU.
(u) Difference Between Primary Memory and Secondary Memory
1. Volatility: Primary memory (RAM) is volatile, losing data when power is off; secondary
memory (HDD, SSD) is non-volatile.
2. Speed: Primary memory is faster, while secondary memory is slower but offers larger storage
capacity.
3. Usage: Primary memory is used for immediate data access by the CPU, whereas secondary
memory is used for long-term storage.
(v) Comparison Between SRAM and DRAM
1. Speed: SRAM is faster due to its static design, while DRAM is slower because it needs
periodic refreshing.
2. Power Consumption: SRAM consumes more power, while DRAM is more power-efficient.
3. Cost: SRAM is more expensive, whereas DRAM is cheaper and used for larger memory
requirements.
(w) Comparison Between RAM, SAM, and CAM
1. RAM (Random Access Memory): Memory where any byte can be accessed directly.
2. SAM (Sequential Access Memory): Memory where data is accessed in a fixed sequence.
3. CAM (Content Addressable Memory): Memory where data is accessed based on content
rather than address.
(x) Define and Explain Instruction Cycle
Instruction Cycle: The cycle through which the CPU fetches, decodes, and executes an instruction. It
includes several stages like fetching the instruction, decoding it, executing it, and storing the result.
(y) What is Interrupt?
Interrupt: An interrupt is a signal to the processor indicating an event that needs immediate
attention. It temporarily halts the current execution, saves its state, and executes an interrupt service
routine to address the event.
(z) Use of DMA (Direct Memory Access)
DMA: Direct Memory Access allows peripherals to directly transfer data to and from memory
without CPU intervention, improving efficiency and freeing up the CPU for other tasks.
Difference Between Turing Computing and Von Neumann Computing
1. Turing Computing: Based on Turing machines, abstract models of computation that define
algorithms and problem-solving without specifying hardware.
2. Von Neumann Computing: Refers to the architecture of a computer system where
instructions and data share the same memory space and are executed sequentially by a CPU.
(a) Functions of CPU Registers (2 Marks for each register/counter)
1. Accumulator (AC): Holds intermediate results of arithmetic and logical operations (2 marks).
2. Program Counter (PC): Holds the address of the next instruction to be executed (2 marks).
3. Stack Pointer (SP): Points to the top of the stack, used in stack operations like push and pop
(2 marks).
4. Instruction Register (IR): Holds the currently executing instruction (2 marks).
5. Memory Address Register (MAR): Holds the address of the memory location to be accessed
(2 marks).
6. Memory Buffer Register (MBR): Holds the data to be written to or read from the memory (2
marks).
7. Flag Register: Contains flags or indicators that represent the state of the CPU, such as the
zero, carry, sign, and overflow flags (2 marks).
8. Status Register: Similar to flag registers, it holds information about the state of the CPU and
the results of operations (2 marks).
(b) Compare RISC and CISC Computer Architecture (5 Marks)
RISC (Reduced Instruction Set Computer):
1. Simple instructions that execute in one clock cycle (1 mark).
2. Large number of general-purpose registers (1 mark).
3. Emphasis on software to perform complex tasks (1 mark).
CISC (Complex Instruction Set Computer):
1. Complex instructions that execute multiple tasks in a single instruction (1 mark).
2. Fewer general-purpose registers (1 mark).
3. Emphasis on hardware to execute complex instructions (1 mark).
(c) Characteristics of RISC Architecture. Compare Microprocessor and Microcontroller. (3 + 2)
Characteristics of RISC Architecture:
1. Simplified instruction set (1 mark).
2. Single-cycle instruction execution (1 mark).
3. Large number of registers (1 mark).
Comparison of Microprocessor and Microcontroller:
1. Microprocessor: General-purpose computing unit with no RAM, ROM, or I/O ports on the
chip. Used in PCs, laptops, and servers (1 mark).
2. Microcontroller: Contains CPU, RAM, ROM, and I/O ports on a single chip. Used in
embedded systems like automotive, home appliances, and IoT devices (1 mark).
(d) Explain with a Suitable Diagram How Associative Mapping in the Cache System Works (5 Marks)
(Note: Placeholder for an actual diagram)
Explanation: In associative mapping, any block of main memory can be placed in any line of the
cache. This provides flexibility but requires a search mechanism. Tags are used to identify the blocks,
and the cache controller searches all tags to find the required data. If found, it's a cache hit;
otherwise, it's a cache miss (3 marks for the explanation, 2 marks for the diagram).
(e) Number of Bits Required for Tag, Line, and Offset (5 Marks)
Total Address Size: 32 bits (assuming standard 32-bit address)
Offset: Line capacity of 256 bytes = 2^8, so 8 bits for offset (1 mark).
Number of Lines: 16KB / 256 bytes = 64 lines = 2^6, so 6 bits for the line (1 mark).
Tag: Remaining bits = 32 - (8 + 6) = 18 bits (1 mark).
Tag: 18 bits, Line: 6 bits, Offset: 8 bits.
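The same arithmetic as a quick Python check (the 32-bit address size is the stated assumption):

cache_bytes = 16 * 1024
line_bytes = 256
offset_bits = line_bytes.bit_length() - 1                  # 8
line_bits = (cache_bytes // line_bytes).bit_length() - 1   # 6 (64 lines)
tag_bits = 32 - line_bits - offset_bits                    # 18
print(tag_bits, line_bits, offset_bits)    # 18 6 8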
(f) Effects of Branches and Branch Prediction (3 + 2)
Effects of Branches:
1. Pipeline stalls due to branch instructions (1 mark).
2. Flushing the pipeline if the branch is taken (1 mark).
3. Reduced pipeline efficiency (1 mark).
Branch Prediction:
1. Static Prediction: Predicts branches based on a fixed behavior (1 mark).
2. Dynamic Prediction: Uses hardware to track the history of branches and make predictions (1
mark).
(g) Speedup Factor and Formula Derivation (2 + 3)
Speedup Factor: Ratio of execution time without pipeline to execution time with pipeline (2 marks).
Formula:
\[ \text{Speedup} = \frac{n \times t_{\text{non-pipeline}}}{k \times t_{\text{pipeline}} + (n - 1) \times t_{\text{pipeline}}} \]
where n = number of operations, k = number of stages, and t_pipeline = time per
stage (3 marks for derivation).
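A quick numeric check in Python (illustrative values, assuming the non-pipelined time is k times the stage time, i.e. an ideally divided task):

n, k, t_stage = 100, 5, 2.0            # operations, stages, ns per stage
t_non_pipeline = k * t_stage           # assumed ideal division of the task
speedup = (n * t_non_pipeline) / ((k + n - 1) * t_stage)
print(round(speedup, 2))               # 4.81, approaching k = 5 for large n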
(h) Advantages of a Pipeline Processor Based Parallel Computer and Floating-Point Addition in Four
Stage Pipeline (5 Marks)
Advantages:
1. Increased throughput due to parallel processing (2 marks).
2. Higher clock speed and efficiency (1 mark).
Floating-Point Addition:
1. Fetch operands.
2. Align exponents.
3. Add significands.
4. Normalize result (2 marks).
(i) Use of Feedback and Feed-Forward Path and Maximum Throughput Calculation (3 + 2)
Feedback Path: Routes the output of a stage back to an earlier stage so that a result can be
reused in a subsequent pass through the pipeline (1.5 marks).
Feed-Forward Path: Passes the output of a stage directly to a later, non-adjacent stage,
bypassing the stages in between (1.5 marks).
Max Throughput Calculation:
\[ \text{Max Throughput} = \frac{1}{\text{slowest stage delay}} = \frac{1}{110\ \text{ns}} \approx 9.09\ \text{MFLOPS} \]
(2 marks).
(j) Define and Relation Between CPI and MIPS (2 + 3)
CPI (Cycles Per Instruction): Average number of clock cycles per instruction (2 marks).
MIPS (Million Instructions Per Second):
\[ \text{MIPS} = \frac{\text{Clock Frequency}}{\text{CPI} \times 10^6} \]
(3 marks).
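For example, with illustrative figures (2 GHz clock, CPI of 2):

clock_hz = 2_000_000_000   # 2 GHz
cpi = 2.0
mips = clock_hz / (cpi * 1e6)
print(mips)                # 1000.0 MIPS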
(k) Design an Instruction Processing Pipeline (5 Marks)
Typical 5-Stage Pipeline:
1. IF (Instruction Fetch): Fetch instruction from memory.
2. ID (Instruction Decode): Decode instruction and fetch registers.
3. EX (Execute): Perform arithmetic/logic operations.
4. MEM (Memory Access): Read/write data from/to memory.
5. WB (Write Back): Write the result back to the register file (5 marks).
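A small Python sketch printing which instruction occupies each stage per cycle (ideal pipeline, hazards ignored):

stages = ["IF", "ID", "EX", "MEM", "WB"]
n_instr = 4
for cycle in range(n_instr + len(stages) - 1):
    row = []
    for s, name in enumerate(stages):
        i = cycle - s      # instruction index in stage s this cycle
        row.append(f"{name}:I{i}" if 0 <= i < n_instr else f"{name}:--")
    print(f"cycle {cycle + 1}: " + "  ".join(row))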
(l) Minimum Periods for Floating Point Addition (5 Marks)
Total stages = 4, delay per stage = 2 ns.
To add 100 numbers:
\[ \text{Number of periods} = (100 - 1) \times 1 + 4 = 103\ \text{periods} \]
The first result emerges once the 4-stage pipeline fills; each of the remaining 99 results follows
one period later (5 marks).
(m) CSA Carry Save Adder and Wallace Tree (2 + 3)
CSA: Can add up to R + 1 numbers simultaneously, where R is the radix; in binary (R = 2), three
numbers at once (2 marks).
Wallace Tree:
Partial products are reduced in stages using carry-save adders until only two numbers
remain; a conventional adder then produces the final result (3 marks).
(n) MMU with Segmentation and Paging (5 Marks)
Diagram: [Diagram needed]
Explanation:
Segmentation: Divides memory into segments of variable size.
Paging: Divides segments into fixed-size pages, which simplifies memory management and
reduces fragmentation (3 marks for explanation, 2 marks for diagram).
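A minimal Python sketch of the two-step translation (the table contents are hypothetical):

PAGE_SIZE = 4096
segment_table = {0: {0: 7, 1: 3}}     # hypothetical: segment -> page table
seg, page, offset = 0, 1, 123         # a virtual address (segment, page, offset)
frame = segment_table[seg][page]      # lookup 1: segment; lookup 2: page
physical = frame * PAGE_SIZE + offset
print(physical)                       # 3 * 4096 + 123 = 12411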
(o) Difference Between Paging and Segmentation (5 Marks)
1. Paging:
o Fixed-size pages (2.5 marks).
o Simplifies memory management (2.5 marks).
2. Segmentation:
o Variable-size segments (2.5 marks).
o Provides logical separation of memory (2.5 marks).
(p) Structure and Use of TLB (5 Marks)
Translation Lookaside Buffer (TLB):
Structure: Small, fast cache that stores recent translations of virtual memory to physical
addresses (2 marks).
Use: Improves memory access speed by reducing the number of accesses to the page table
(3 marks).
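A minimal Python sketch modeling the TLB as a small lookup cache in front of the page table (hypothetical page numbers):

page_table = {0: 5, 1: 9, 2: 1}   # hypothetical page -> frame map
tlb = {}                          # recently used translations
for page in [1, 1, 2, 1]:
    if page in tlb:               # TLB hit: no page-table access needed
        print(f"page {page}: TLB hit  -> frame {tlb[page]}")
    else:                         # TLB miss: walk page table, cache result
        tlb[page] = page_table[page]
        print(f"page {page}: TLB miss -> frame {tlb[page]}")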
(q) Definitions and Explanations (5 Marks each)
(i) Degree of Parallelism: Number of instructions that can be executed simultaneously (5 marks).
(ii) Scalable Architecture: An architecture that can efficiently handle increasing workloads or
resources (5 marks).
(iii) Amdahl’s Law:
\[ \text{Speedup} = \frac{1}{(1 - P) + \frac{P}{S}} \]
where P is the parallelizable portion and S is the speedup of the parallel portion (5 marks).
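For example, with 90% of the work parallelizable (P = 0.9) and a 10x speedup on that portion (S = 10):

P, S = 0.9, 10.0
speedup = 1 / ((1 - P) + P / S)
print(round(speedup, 2))   # 5.26 -- far below 10; the serial 10% dominates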
(iv) Dynamic Pipeline: A pipeline that can adapt its structure based on the current workload to
optimize performance (5 marks).