Architecture Solve Part 1-1

The document discusses various concepts related to computer architecture, including the Von Neumann Bottleneck, differences between CPU registers, the use of SRAM in cache memory, write-through policy, and cache coherence problems. It also covers pipeline processors, instruction cycles, and comparisons between different types of memory and architectures. Additionally, it addresses the functions of CPU registers and compares RISC and CISC architectures.


1.

Answer any five from the following questions: 5 × 2 = 10


(a) State the bottleneck problem in the case of the Von Neumann architecture and discuss the possible
solution.
The **Von Neumann Bottleneck** is a limitation in computer architecture where the shared bus
between the CPU and memory creates a data transfer bottleneck, slowing down overall system
performance. One solution is to use **cache memory**, which stores frequently accessed data,
reducing the need for slower main memory access.
(b) Compare General and special purpose CPU registers.
General-Purpose Registers:
1. Versatility: Can store data or addresses, making them multipurpose.
2. Flexibility: Used in a wide range of operations, including arithmetic, logical, and data
manipulation tasks.
3. Examples: Accumulator (A), Base Register (B), Index Register (X).
Special-Purpose Registers:
1. Dedicated Roles: Designed for specific tasks and control various aspects of CPU operations.
2. Limited Use: Cannot be used for general computations, only for their designated functions.
3. Examples: Program Counter (PC), Stack Pointer (SP), Status Register/Flags Register,
Instruction Register (IR).
(c) Why is SRAM used instead of DRAM in cache memory design?
1. Speed: SRAM (Static RAM) is faster than DRAM (Dynamic RAM). It does not require periodic
refreshing like DRAM, allowing for quicker data access, which is crucial for the high-speed
requirements of cache memory.
2. Latency: SRAM has lower latency compared to DRAM. The faster access time of SRAM significantly
improves the CPU's performance as it can quickly fetch instructions and data from the cache.
3. Power Consumption: Although SRAM uses more power compared to DRAM, the speed advantage
it offers outweighs the higher power consumption when used in smaller amounts like cache memory.
These advantages make SRAM an ideal choice for cache memory, ensuring faster and more efficient
performance of the CPU.
(d) What is write-through policy? Explain with a suitable example.
Write-Through Policy
Definition: The write-through policy is a cache writing technique in which data is written
simultaneously to both the cache and the main memory. This ensures that the main memory always
has the most up-to-date data.
Example: Imagine a CPU needs to update the value of a variable X stored in memory. With a write-
through policy:
1. The CPU writes the new value of X to the cache.
2. At the same time, the new value of X is also written to the main memory.
Advantages:
 Data Consistency: Ensures that the data in the main memory is always current, which
simplifies data synchronization.
 Reliability: Reduces the risk of data loss in the event of a power failure, since the main
memory is always updated.
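A minimal sketch (a hypothetical two-level model in Python, illustrative only) of how a single write touches both levels under write-through:

```python
# Write-through sketch: every write updates the cache AND main memory together.
class WriteThroughCache:
    def __init__(self, main_memory):
        self.lines = {}               # address -> value (the cache)
        self.memory = main_memory     # address -> value (main memory)

    def write(self, address, value):
        self.lines[address] = value   # 1. write the new value to the cache
        self.memory[address] = value  # 2. write it to main memory at the same time

    def read(self, address):
        if address not in self.lines:                # miss: fill from memory
            self.lines[address] = self.memory[address]
        return self.lines[address]                   # hit: serve from cache

ram = {0x10: 0}
cache = WriteThroughCache(ram)
cache.write(0x10, 42)        # update variable X
assert ram[0x10] == 42       # main memory is always current
```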
(e) What do you mean by a 4-way interleaved memory system?
4-Way Interleaved Memory System
Definition: A 4-way interleaved memory system is a technique used to improve the performance of
memory access by dividing the memory into four separate banks. These banks can be accessed
independently, allowing for multiple memory operations to occur simultaneously.

Example: Imagine a memory system divided into four banks (Bank 0, Bank 1, Bank 2, Bank 3). When
the CPU needs to read or write data, it can access these banks in a staggered manner. While Bank 0 is
being accessed for the first operation, Bank 1 can be accessed for the second operation, Bank 2 for
the third, and Bank 3 for the fourth. This parallelism increases the overall memory throughput and
reduces latency.
Benefits:
 Improved Performance: Allows for faster data access and higher bandwidth.
 Reduced Latency: Minimizes the waiting time for memory operations.
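A small sketch of low-order interleaving (assuming word addresses, with the bank selected by the two low address bits; this is one common arrangement, not the only one):

```python
# Low-order 4-way interleaving: consecutive addresses fall in consecutive banks.
NUM_BANKS = 4

def bank_and_offset(address):
    return address % NUM_BANKS, address // NUM_BANKS

for addr in range(8):
    bank, offset = bank_and_offset(addr)
    print(f"address {addr} -> bank {bank}, offset {offset}")
# Addresses 0..3 map to banks 0..3, so four accesses can proceed in parallel.
```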

(f) State the cache coherence problem in shared memory architecture.


Cache Coherence Problem in Shared Memory Architecture
Definition: The cache coherence problem occurs in shared memory architectures where multiple
processors have their own caches, leading to the possibility of data inconsistency. When one
processor updates a value in its cache, other processors may have outdated copies of that value in
their caches.
Example: Imagine two processors, P1 and P2, both caching the same memory location X. If P1
updates the value of X in its cache, P2's cache still holds the old value of X. This inconsistency can
lead to incorrect program execution.
Impact:
 Data Inconsistency: Different processors may operate on outdated data, causing errors.
 Synchronization Issues: It complicates synchronization mechanisms, as consistent data
sharing becomes harder.
Solution: Implementing cache coherence protocols like MESI (Modified, Exclusive, Shared, Invalid)
helps ensure all caches have a consistent view of the shared memory.
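A toy Python sketch of the problem itself (no coherence protocol modeled): P1 updates its private copy while P2 keeps reading the stale one:

```python
# Two private caches over one shared memory, with no coherence protocol.
memory = {"X": 1}
cache_p1 = dict(memory)   # P1 caches X
cache_p2 = dict(memory)   # P2 caches X

cache_p1["X"] = 99        # P1 writes its cached copy of X
print(cache_p2["X"])      # P2 still reads 1 -> stale data, incoherent view
```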

(g) What is an asynchronous pipeline processor - explain with a suitable example.


Asynchronous Pipeline Processor
Definition: An asynchronous pipeline processor is a type of pipeline processor where stages do not
rely on a global clock signal to synchronize operations. Instead, each stage operates independently
and communicates with adjacent stages using handshaking protocols, allowing data to move through
the pipeline based on the availability of resources.
Example: Consider a 4-stage pipeline with stages A, B, C, and D. In an asynchronous pipeline:
1. Stage A completes its task and sends a signal to Stage B indicating that data is ready.
2. Stage B receives the signal, processes the data, and then sends a signal to Stage C.
3. This process continues through all stages, with each stage working independently without a
global clock.
Benefits:
 Flexibility: Each stage operates at its own pace, which can lead to more efficient processing.
 Reduced Power Consumption: No need for a global clock, leading to potential energy
savings.
 Scalability: Easier to scale as each stage is self-contained and independent.

(h) State any two differences between synchronous and asynchronous pipeline processors.
Differences Between Synchronous and Asynchronous Pipeline Processors
1. Clock Dependency:
 Synchronous Pipeline Processor: All stages operate in lockstep, synchronized by a global
clock signal. Each stage performs its operation within a clock cycle.

 Asynchronous Pipeline Processor: Stages operate independently and are not synchronized
by a global clock. Instead, they use handshaking protocols to communicate and move data
between stages.
2. Performance and Flexibility:
 Synchronous Pipeline Processor: Generally simpler to design and debug, but performance is
limited by the slowest stage, as all stages must wait for the clock signal.
 Asynchronous Pipeline Processor: More flexible and can potentially achieve higher
performance, as each stage operates at its own speed. However, it is more complex to design
and manage due to the lack of a central clock.

(i) Why is a buffer used between two stages of a pipeline processor?


Purpose of Buffers in Pipeline Processors
1. Data Storage: Buffers act as temporary storage for data between pipeline stages, ensuring that
each stage has the necessary data available when it is ready to process.
2. Decoupling Stages: Buffers decouple the stages, allowing each stage to operate independently
without waiting for the previous stage to complete its processing. This enhances the overall
efficiency and throughput of the pipeline.
3. Handling Variations: Buffers help manage variations in the processing time of different stages. If
one stage takes longer to process data, the buffer can store intermediate results until the next stage
is ready, preventing bottlenecks.

(j) State the relation between clock frequency of the pipeline structure and propagation delay of
the pipeline stage.
Relation Between Clock Frequency and Propagation Delay in Pipeline Stages
Clock Frequency:
 The clock frequency of a pipeline structure is the rate at which the clock signals are
generated, dictating the speed at which the pipeline stages process instructions. It is
measured in Hertz (Hz).
Propagation Delay:
 Propagation delay is the time taken for a signal to travel from the input to the output of a
pipeline stage. It determines how quickly a stage can complete its operation.
Relationship:
 The clock frequency is inversely related to the propagation delay. Specifically, the clock
period (the inverse of clock frequency) must be at least as long as the maximum propagation
delay of any pipeline stage to ensure correct operation.
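A small worked example with assumed stage delays of 10 ns, 15 ns, and 20 ns:

```python
# The minimum clock period is set by the slowest pipeline stage.
stage_delays_ns = [10, 15, 20]        # assumed propagation delays
t_clock_ns = max(stage_delays_ns)     # clock period >= max propagation delay
f_max_mhz = 1e3 / t_clock_ns          # 1 / 20 ns = 50 MHz
print(f"clock period >= {t_clock_ns} ns, f_max = {f_max_mhz:.0f} MHz")
```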

(k) 'Pipeline performance increases as the number of stages increases' - critically justify the
statement.
Pipeline Performance and Number of Stages
Statement Justification:
 Increased Throughput: Adding more stages to a pipeline can increase the throughput, as
more instructions can be processed simultaneously. Each stage performs a part of the task,
allowing multiple instructions to be in different stages of execution at the same time.
 Shorter Clock Cycle: By breaking down tasks into smaller stages, the clock cycle time can be
reduced, since each stage requires less time to complete. This shorter clock cycle can lead to
a higher overall clock frequency, improving performance.
Critical Perspective:

 Diminishing Returns: Beyond a certain point, adding more stages can lead to diminishing
returns. The overhead of managing the pipeline, including control logic and inter-stage
communication, can offset the gains in performance.
 Increased Complexity: More stages increase the complexity of the pipeline design, leading to
higher costs and potential issues with synchronization, latency, and data hazards.

(l) State any two advantages of the pipeline structure.


1. Increased Throughput:
 Parallel Processing: By breaking down a task into multiple stages, a pipeline allows for
simultaneous processing of different instructions. This increases the number of instructions
completed per unit of time, significantly boosting overall throughput.
2. Improved CPU Performance:
 Higher Clock Speed: Each stage in the pipeline handles a smaller portion of the task, allowing
for shorter clock cycles. This enables the CPU to operate at a higher clock speed, resulting in
faster processing and improved performance.

(m) What do you mean by a 16-way set-associative cache with line capacity 512 bytes?
16-Way Set-Associative Cache with 512-Byte Line Capacity
Definition: A 16-way set-associative cache is a type of CPU cache that divides the cache into multiple
sets, where each set contains multiple lines or blocks of data. Specifically, a 16-way set-associative
cache means that each set contains 16 cache lines.
Details:
 Set-Associative: This design lies between fully associative (where any block can be placed
anywhere in the cache) and direct-mapped (where each block has only one possible
location). Set-associative caches strike a balance, providing more flexibility and efficiency.
 16-Way: In a 16-way set-associative cache, each set can hold 16 different lines of data,
reducing conflicts and improving hit rates.
 Line Capacity: The line capacity of 512 bytes means each line in the cache can store 512
bytes of data. This is the size of the block transferred between the main memory and the
cache.
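As an illustration with assumed totals (a 64 KB cache and 32-bit addresses, neither of which is given in the question), the address splits into tag, set, and offset fields as follows:

```python
# Field widths for a 16-way set-associative cache with 512-byte lines.
# Assumed: 64 KB total capacity, 32-bit addresses (illustrative values only).
cache_size, line_size, ways = 64 * 1024, 512, 16

num_lines = cache_size // line_size         # 128 lines in total
num_sets  = num_lines // ways               # 8 sets of 16 lines each
offset_bits = line_size.bit_length() - 1    # log2(512) = 9
set_bits    = num_sets.bit_length() - 1     # log2(8)   = 3
tag_bits    = 32 - offset_bits - set_bits   # 20
print(num_sets, tag_bits, set_bits, offset_bits)  # 8 20 3 9
```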
(n) CSA (Carry Save Adder)
Explanation: A Carry Save Adder (CSA) can add up to R + 1 numbers simultaneously, where R is the
radix (base) of the number system. A CSA sums the digits in each position without immediately
propagating carries, emitting a separate sum word and a carry word; deferring carry propagation is
what lets it absorb the extra operand.
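In binary (R = 2) the CSA is the familiar 3:2 compressor: three inputs in, a sum word and a carry word out. A minimal bitwise sketch:

```python
# Carry-save addition of three binary numbers (3:2 compression).
def carry_save_add(a, b, c):
    s = a ^ b ^ c                                  # per-bit sum, no propagation
    carry = ((a & b) | (a & c) | (b & c)) << 1     # per-bit majority, shifted left
    return s, carry

s, carry = carry_save_add(5, 6, 7)
assert s + carry == 5 + 6 + 7   # one conventional add finishes the job
print(s, carry)                 # 4 14
```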
(o) Pipeline Architectures in Parallel Computer Design
1. Linear Pipeline: Instructions pass through a series of stages in a sequential manner.
2. Non-Linear Pipeline: Instructions may follow different paths or branches within the pipeline
based on control decisions, such as conditional statements.
(p) Condition for NaN in IEEE-754 Format
Condition: A NaN (Not a Number) in IEEE-754 format is represented by an exponent of all 1s
(1111...1111) and a non-zero fraction (mantissa).
(q) IEEE-754 Single Precision Format for -4.5
 Sign Bit: 1 (indicating negative)
 Exponent: 10000001 (bias of 127 for the exponent 2)
 Mantissa: 00100000000000000000000
 IEEE-754 Representation: 1 10000001 00100000000000000000000
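The encoding can be checked with Python's struct module, which packs floats in IEEE-754 single precision:

```python
import struct

# Pack -4.5 as an IEEE-754 single-precision float and print its three fields.
bits = struct.unpack(">I", struct.pack(">f", -4.5))[0]
s = f"{bits:032b}"
print(s[0], s[1:9], s[9:])  # 1 10000001 00100000000000000000000
```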
(r) Representation of Zero in IEEE-754

Zero: In IEEE-754 format, zero is represented by all bits being zero: 0 00000000
00000000000000000000000.
(s) Relation Between Instruction Cycle, Machine Cycle, and Clock Cycle
1. Clock Cycle: The smallest unit of time in a processor, defined by the processor's clock
frequency.
2. Machine Cycle: Consists of several clock cycles and represents the time taken to execute a
basic operation, like fetching or decoding.
3. Instruction Cycle: Encompasses several machine cycles and represents the time taken to
execute a complete instruction.
(t) Difference Between Main Memory and Cache Memory
1. Speed: Cache memory is faster than main memory.
2. Size: Cache memory is smaller in size compared to main memory.
3. Purpose: Cache memory stores frequently accessed data to speed up processing, while main
memory stores all data and instructions required by the CPU.
(u) Difference Between Primary Memory and Secondary Memory
1. Volatility: Primary memory (RAM) is volatile, losing data when power is off; secondary
memory (HDD, SSD) is non-volatile.
2. Speed: Primary memory is faster, while secondary memory is slower but offers larger storage
capacity.
3. Usage: Primary memory is used for immediate data access by the CPU, whereas secondary
memory is used for long-term storage.
(v) Comparison Between SRAM and DRAM
1. Speed: SRAM is faster due to its static design, while DRAM is slower because it needs
periodic refreshing.
2. Power Consumption: SRAM consumes more power, while DRAM is more power-efficient.
3. Cost: SRAM is more expensive, whereas DRAM is cheaper and used for larger memory
requirements.
(w) Comparison Between RAM, SAM, and CAM
1. RAM (Random Access Memory): Memory where any byte can be accessed directly.
2. SAM (Sequential Access Memory): Memory where data is accessed in a fixed sequence.
3. CAM (Content Addressable Memory): Memory where data is accessed based on content
rather than address.
(x) Define and Explain Instruction Cycle
Instruction Cycle: The cycle through which the CPU fetches, decodes, and executes an instruction. It
includes several stages like fetching the instruction, decoding it, executing it, and storing the result.
(y) What is Interrupt?
Interrupt: An interrupt is a signal to the processor indicating an event that needs immediate
attention. It temporarily halts the current execution, saves its state, and executes an interrupt service
routine to address the event.
(z) Use of DMA (Direct Memory Access)
DMA: Direct Memory Access allows peripherals to directly transfer data to and from memory
without CPU intervention, improving efficiency and freeing up the CPU for other tasks.
Difference Between Turing Computing and Von Neumann Computing
1. Turing Computing: Based on Turing machines, abstract models of computation that define
algorithms and problem-solving without specifying hardware.
2. Von Neumann Computing: Refers to the architecture of a computer system where
instructions and data share the same memory space and are executed sequentially by a CPU.

(a) Functions of CPU Registers (2 Marks for each register/counter)
1. Accumulator (AC): Holds intermediate results of arithmetic and logical operations (2 marks).
2. Program Counter (PC): Holds the address of the next instruction to be executed (2 marks).
3. Stack Pointer (SP): Points to the top of the stack, used in stack operations like push and pop
(2 marks).
4. Instruction Register (IR): Holds the currently executing instruction (2 marks).
5. Memory Address Register (MAR): Holds the address of the memory location to be accessed
(2 marks).
6. Memory Buffer Register (MBR): Holds the data to be written to or read from the memory (2
marks).
7. Flag Registers: Contains flags or indicators that represent the state of the CPU, such as zero,
carry, sign, and overflow flags (2 marks).
8. Status Register: Similar to flag registers, it holds information about the state of the CPU and
the results of operations (2 marks).
(b) Compare RISC and CISC Computer Architecture (5 Marks)
 RISC (Reduced Instruction Set Computer):
1. Simple instructions that execute in one clock cycle (1 mark).
2. Large number of general-purpose registers (1 mark).
3. Emphasis on software to perform complex tasks (1 mark).
 CISC (Complex Instruction Set Computer):
1. Complex instructions that execute multiple tasks in a single instruction (1 mark).
2. Fewer general-purpose registers (1 mark).
3. Emphasis on hardware to execute complex instructions (1 mark).
(c) Characteristics of RISC Architecture. Compare Microprocessor and Microcontroller. (3 + 2)
Characteristics of RISC Architecture:
1. Simplified instruction set (1 mark).
2. Single-cycle instruction execution (1 mark).
3. Large number of registers (1 mark).
Comparison of Microprocessor and Microcontroller:
1. Microprocessor: General-purpose computing unit with no RAM, ROM, or I/O ports on the
chip. Used in PCs, laptops, and servers (1 mark).
2. Microcontroller: Contains CPU, RAM, ROM, and I/O ports on a single chip. Used in
embedded systems like automotive, home appliances, and IoT devices (1 mark).
(d) Explain with a Suitable Diagram How Associative Mapping in the Cache System Works (5 Marks)
(Note: Placeholder for an actual diagram)
Explanation: In associative mapping, any block of main memory can be placed in any line of the
cache. This provides flexibility but requires a search mechanism. Tags are used to identify the blocks,
and the cache controller searches all tags to find the required data. If found, it's a cache hit;
otherwise, it's a cache miss (3 marks for the explanation, 2 marks for the diagram).
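A minimal Python sketch of the lookup (real hardware compares all tags in parallel; here the search is sequential for clarity):

```python
# Fully associative lookup: a block may sit in any line, so every tag is checked.
class AssociativeCache:
    def __init__(self, num_lines):
        self.lines = [None] * num_lines     # each entry: (tag, block_data) or None

    def lookup(self, tag):
        for line in self.lines:             # hardware does these compares in parallel
            if line is not None and line[0] == tag:
                return line[1]              # cache hit
        return None                         # cache miss

cache = AssociativeCache(4)
cache.lines[2] = (0xABC, "block data")
print(cache.lookup(0xABC))   # hit: 'block data'
print(cache.lookup(0x123))   # miss: None
```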
(e) Number of Bits Required for Tag, Line, and Offset (5 Marks)
 Total Address Size: 32 bits (assuming standard 32-bit address)
 Offset: Line capacity of 256 bytes = 2^8, so 8 bits for offset (1 mark).
 Number of Lines: 16KB / 256 bytes = 64 lines = 2^6, so 6 bits for the line (1 mark).
 Tag: Remaining bits = 32 - (8 + 6) = 18 bits (1 mark).
Tag: 18 bits, Line: 6 bits, Offset: 8 bits.
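The same arithmetic as a quick script (using the 16 KB cache, 256-byte lines, and 32-bit addresses from the question):

```python
# Tag/line/offset widths for a direct-mapped 16 KB cache with 256-byte lines.
address_bits, cache_size, line_size = 32, 16 * 1024, 256

offset_bits = line_size.bit_length() - 1                   # log2(256) = 8
line_bits   = (cache_size // line_size).bit_length() - 1   # log2(64)  = 6
tag_bits    = address_bits - offset_bits - line_bits       # 32 - 14 = 18
print(tag_bits, line_bits, offset_bits)  # 18 6 8
```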
(f) Effects of Branches and Branch Prediction (3 + 2)
Effects of Branches:
1. Pipeline stalls due to branch instructions (1 mark).

2. Flushing the pipeline if the branch is taken (1 mark).
3. Reduced pipeline efficiency (1 mark).
Branch Prediction:
1. Static Prediction: Predicts branches based on a fixed behavior (1 mark).
2. Dynamic Prediction: Uses hardware to track the history of branches and make predictions (1
mark).
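One classic dynamic scheme is a 2-bit saturating counter per branch; a minimal sketch (illustrative only, not tied to any specific CPU):

```python
# 2-bit saturating-counter branch predictor.
counters = {}  # branch address -> state 0..3 (0,1: predict not taken; 2,3: taken)

def predict(pc):
    return counters.get(pc, 1) >= 2            # True means "predict taken"

def update(pc, taken):
    state = counters.get(pc, 1)
    counters[pc] = min(state + 1, 3) if taken else max(state - 1, 0)

for outcome in [True, True, False, True]:      # a loop branch, mostly taken
    print("predicted taken:", predict(0x400), "| actually taken:", outcome)
    update(0x400, outcome)                     # train on the real outcome
```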
(g) Speedup Factor and Formula Derivation (2 + 3)
Speedup Factor: Ratio of execution time without pipeline to execution time with pipeline (2 marks).
Formula:
Speedup = (n × t_non-pipeline) / (k × t_pipeline + (n − 1) × t_pipeline)
where n = number of operations, k = number of stages, t_pipeline = time per stage (3 marks for derivation).
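A quick numeric check with assumed values (k = 5 stages, n = 100 operations, t_pipeline = 2 ns, and t_non-pipeline taken as k × t_pipeline):

```python
# Pipeline speedup for assumed values; approaches k as n grows large.
k, n, t_pipe = 5, 100, 2e-9
t_nonpipe = k * t_pipe                               # one unpipelined operation
speedup = (n * t_nonpipe) / ((k + (n - 1)) * t_pipe)
print(f"{speedup:.2f}")  # 4.81, close to the ideal speedup of k = 5
```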
(h) Advantages of a Pipeline Processor Based Parallel Computer and Floating-Point Addition in Four
Stage Pipeline (5 Marks)
Advantages:
1. Increased throughput due to parallel processing (2 marks).
2. Higher clock speed and efficiency (1 mark).
Floating-Point Addition:
1. Fetch operands.
2. Align exponents.
3. Add significands.
4. Normalize result (2 marks).
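A toy sketch of the four steps on (significand, exponent) pairs in base 10 (illustrative handling of precision only):

```python
# Toy 4-stage floating-point addition on (significand, exponent) pairs, base 10.
def fp_add(a, b):
    (ma, ea), (mb, eb) = a, b                 # Stage 1: fetch operands
    if ea < eb:                               # Stage 2: align exponents
        ma, ea = ma / 10 ** (eb - ea), eb
    else:
        mb, eb = mb / 10 ** (ea - eb), ea
    m, e = ma + mb, ea                        # Stage 3: add significands
    while abs(m) >= 10:                       # Stage 4: normalize the result
        m, e = m / 10, e + 1
    return m, e

print(fp_add((9.5, 2), (7.5, 1)))  # 950 + 75 = 1025 -> (1.025, 3)
```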
(i) Use of Feedback and Feed-Forward Path and Maximum Throughput Calculation (3 + 2)
Feedback Path: Used to reroute the output of a stage back to a previous stage to ensure correct data
processing (1.5 marks).
Feed-Forward Path: Provides direct data transfer to subsequent stages without intermediate stages
(1.5 marks).
Max Throughput Calculation:
Max Throughput = 1 / (slowest stage delay) = 1 / 110 ns ≈ 9.09 MFLOPS
(2 marks).
(j) Define and Relation Between CPI and MIPS (2 + 3)
CPI (Cycles Per Instruction): Average number of clock cycles per instruction (2 marks).
MIPS (Million Instructions Per Second):
MIPS = Clock Frequency / (CPI × 10^6)
(3 marks).
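A worked example with assumed numbers (a 2 GHz clock and an average CPI of 4):

```python
# MIPS from clock frequency and CPI (assumed values).
clock_hz, cpi = 2e9, 4
mips = clock_hz / (cpi * 1e6)
print(mips)  # 500.0 MIPS
```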
(k) Design an Instruction Processing Pipeline (5 Marks)
Typical 5-Stage Pipeline:
1. IF (Instruction Fetch): Fetch instruction from memory.
2. ID (Instruction Decode): Decode instruction and fetch registers.
3. EX (Execute): Perform arithmetic/logic operations.
4. MEM (Memory Access): Read/write data from/to memory.
5. WB (Write Back): Write the result back to the register file (5 marks).
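A small script that prints the space-time diagram of this pipeline for three instructions (no hazards assumed):

```python
# Space-time diagram of the classic 5-stage pipeline.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def diagram(num_instructions):
    for i in range(num_instructions):
        # Instruction i enters IF at cycle i and advances one stage per cycle.
        row = ["   ."] * i + [f"{s:>4}" for s in STAGES]
        print(f"I{i + 1}:", "".join(row))

diagram(3)  # each instruction starts one cycle after the previous one
```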
(l) Minimum Periods for Floating Point Addition (5 Marks)
 Total stages = 4, delay per stage = 2ns.
 To add 100 numbers:

Number of periods = (100 − 1) × 1 + 4 = 103 periods
(5 marks).
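As a check, the same formula in code, including the total time at the stated 2 ns stage delay:

```python
# Clock periods to add 100 numbers in a 4-stage pipeline.
k, n, t_stage_ns = 4, 100, 2
periods = (n - 1) * 1 + k                                # 99 + 4 = 103 periods
print(periods, "periods =", periods * t_stage_ns, "ns")  # 103 periods = 206 ns
```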
(m) CSA Carry Save Adder and Wallace Tree (2 + 3)
CSA: Can add up to R + 1 numbers, where R is the radix (2 marks).
Wallace Tree:
 Use partial products reduction in stages until two numbers remain, then use a conventional
adder (3 marks).
(n) MMU with Segmentation and Paging (5 Marks)
Diagram: [Diagram needed]
Explanation:
 Segmentation: Divides memory into segments of variable size.
 Paging: Divides segments into fixed-size pages, which simplifies memory management and
reduces fragmentation (3 marks for explanation, 2 marks for diagram).
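A toy two-level translation sketch (hypothetical table layout and sizes, purely illustrative):

```python
# Segmentation with paging: segment -> page table -> frame (toy model).
PAGE_SIZE = 4096
segment_table = {0: {"page_table": {0: 7, 1: 3}}}   # segment 0 holds pages 0 and 1

def translate(segment, offset):
    page, page_offset = divmod(offset, PAGE_SIZE)   # split into page + page offset
    frame = segment_table[segment]["page_table"][page]
    return frame * PAGE_SIZE + page_offset          # physical address

print(hex(translate(0, 5000)))  # page 1, offset 904 -> frame 3 -> 0x3388
```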
(o) Difference Between Paging and Segmentation (5 Marks)
1. Paging:
o Fixed-size pages (2.5 marks).
o Simplifies memory management (2.5 marks).
2. Segmentation:
o Variable-size segments (2.5 marks).
o Provides logical separation of memory (2.5 marks).
(p) Structure and Use of TLB (5 Marks)
Translation Lookaside Buffer (TLB):
 Structure: Small, fast cache that stores recent translations of virtual memory to physical
addresses (2 marks).
 Use: Improves memory access speed by reducing the number of accesses to the page table
(3 marks).
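A toy dict-based sketch: the TLB is consulted first, and only a miss walks the page table:

```python
# TLB in front of a page table (toy model).
page_table = {i: i + 100 for i in range(1024)}   # virtual page -> frame
tlb = {}                                         # small cache of recent translations

def translate(vpage):
    if vpage in tlb:                 # TLB hit: fast path, no page-table access
        return tlb[vpage]
    frame = page_table[vpage]        # TLB miss: walk the page table
    tlb[vpage] = frame               # cache the translation for next time
    return frame

translate(5); translate(5)           # the second access hits the TLB
print(tlb)  # {5: 105}
```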
(q) Definitions and Explanations (5 Marks each)
(i) Degree of Parallelism: Number of instructions that can be executed simultaneously (5 marks).
(ii) Scalable Architecture: An architecture that can efficiently handle increasing workloads or
resources (5 marks).
(iii) Amdahl’s Law:
Speedup = 1 / ((1 − P) + P / S)
where P is the parallelizable portion and S is the speedup of the parallel portion (5 marks).
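A quick numeric example with assumed values (P = 0.9 of the work parallelizable, S = 10 on the parallel part):

```python
# Amdahl's law for assumed P and S.
P, S = 0.9, 10
speedup = 1 / ((1 - P) + P / S)
print(f"{speedup:.2f}")  # 5.26 -- far below 10, limited by the 10% serial part
```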
(iv) Dynamic Pipeline: A pipeline that can adapt its structure based on the current workload to
optimize performance (5 marks).

