
MCSE-103 Advanced Computer Architecture (June 2020)

The document discusses Flynn's classification of parallel computing structures, detailing four categories: SISD, SIMD, MISD, and MIMD, each with definitions, architectures, examples, and use cases. It also covers the need for parallel processing, emphasizing performance, efficiency, scalability, and cost-effectiveness, along with various parallel computing structures like pipelined processors and vector processors. Additionally, it addresses data and control hazards in pipelining, differences between multi-computer and multi-processor systems, and architectural models of multi-processor systems.


### Unit 1: Flynn's and Handler's Classification of Parallel Computing Structures

#### Question 1a: Flynn's Classification of Parallel Processing

- **Flynn's Taxonomy**:

- **SISD (Single Instruction stream, Single Data stream)**:

- **Definition**: A single instruction operates on a single data point at a time.

- **Architecture**:

- Single control unit directs operations.

- One processing element executes instructions.

- **Examples**:

- Traditional personal computers.

- Simple microprocessors (e.g., early Intel CPUs).

- **Use Cases**: Suitable for general-purpose computing tasks where parallelism is not required.

- **Diagram**:

```
Control Unit -> Processing Element -> Memory
```

- **SIMD (Single Instruction stream, Multiple Data streams)**:

- **Definition**: One instruction operates on multiple data points simultaneously.

- **Architecture**:

- Single control unit broadcasts instructions to multiple processing elements.

- Each processing element executes the same instruction on different pieces of data.

- **Examples**:

- Modern Graphics Processing Units (GPUs).

- Vector processors used in scientific computing.

- **Use Cases**: Ideal for tasks that can be parallelized across large data sets, such as image processing or scientific simulations.

- **Diagram**:

```
       Control Unit
      |    |    |
    PE1  PE2  PE3 ... PEn
```
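The SIMD idea can be sketched in plain Python: one operation is conceptually broadcast over all data elements at once, in contrast to a scalar (SISD-style) loop that touches one element per step. This is only a software model of the concept, not real SIMD hardware.

```python
def scalar_double(data):
    # SISD-style: one operation on one data element per step
    result = []
    for x in data:
        result.append(x * 2)
    return result

def simd_double(data):
    # SIMD-style: the same operation applied across all elements;
    # real hardware would execute this in parallel lanes in one step
    return list(map(lambda x: x * 2, data))

data = [1, 2, 3, 4]
assert scalar_double(data) == simd_double(data) == [2, 4, 6, 8]
```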

- **MISD (Multiple Instruction streams, Single Data stream)**:

- **Definition**: Multiple instructions operate on a single data stream.

- **Architecture**:

- Multiple control units and processing elements.

- Rarely used due to limited practical applications.

- **Examples**:

- Hypothetical systems; certain fault-tolerant systems with redundant computation.

- **Use Cases**: Potentially useful in scenarios requiring multiple types of analysis on the same data.

- **Diagram**:

```
CU1 -> PE1
CU2 -> PE2
CU3 -> PE3 ... CUn -> PEn
        |
   Data Stream
```

- **MIMD (Multiple Instruction streams, Multiple Data streams)**:

- **Definition**: Multiple processors execute different instructions on different data points simultaneously.

- **Architecture**:

- Multiple autonomous processors, each with its own control unit.

- Processors can operate asynchronously.

- **Examples**:

- Multicore processors.

- Distributed systems such as computer clusters.

- **Use Cases**: Suitable for a wide range of applications, from general-purpose computing to large-scale scientific simulations.

- **Diagram**:

```
CU1 -> PE1 -> Data Stream 1
CU2 -> PE2 -> Data Stream 2
CU3 -> PE3 -> Data Stream 3 ... CUn -> PEn -> Data Stream n
```
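The MIMD model can be illustrated with threads: independent workers run *different* functions (instruction streams) on *different* data, concurrently. The function names here are illustrative placeholders.

```python
# MIMD illustration: each worker executes its own instruction stream
# on its own data, all at the same time.
from concurrent.futures import ThreadPoolExecutor

def square(x):      # instruction stream 1
    return x * x

def negate(x):      # instruction stream 2
    return -x

def increment(x):   # instruction stream 3
    return x + 1

tasks = [(square, 3), (negate, 5), (increment, 9)]
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(fn, arg) for fn, arg in tasks]
    values = [f.result() for f in futures]

assert values == [9, -5, 10]
```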

#### Question 1b: Need for Parallel Processing and Classification of Parallel Computing Structures

- **Need for Parallel Processing**:

- **Performance**:

- Parallel processing can significantly increase computational speed by dividing tasks among multiple processors.

- Example: Weather simulations, where data can be processed in parallel to speed up forecasts.

- **Efficiency**:

- Utilizes resources more effectively by distributing workloads across processors.

- Example: Data centers distributing web requests across multiple servers.

- **Scalability**:

- Can handle larger problems by adding more processors.

- Example: Scientific research requiring large-scale simulations or data analysis.

- **Cost-Effectiveness**:

- Reduces processing time, leading to lower operational costs.

- Example: Financial modeling, where faster computations enable timely decisions and cost savings.

- **Classification of Parallel Computing Structures**:

- **Pipelined Processors**:

- **Definition**: Overlap the phases of instruction execution.

- **Stages**: Fetch, decode, execute, memory access, write-back.

- **Use Case**: Increases instruction throughput; commonly used in modern CPUs.

- **Diagram**:

```
Fetch -> Decode -> Execute -> Memory Access -> Write-Back
```

- **Vector Processors**:

- **Definition**: Perform operations on entire vectors simultaneously.

- **Use Case**: Efficient for scientific computations involving large data sets.

- **Example**: Cray supercomputers.

- **Array Processors**:

- **Definition**: Grid of processors performing the same instruction on different data points.

- **Use Case**: Suitable for data-parallel tasks.

- **Example**: Early SIMD systems.

- **Multithreaded Processors**:

- **Definition**: Use multiple threads within a single processor to perform tasks concurrently.

- **Use Case**: Enhances performance for applications that can be parallelized at the thread level.

- **Example**: Modern CPUs with hyper-threading.

- **Diagram**:

```
Processor Core
  Thread 1
  Thread 2
```

- **Multiprocessors**:

- **Definition**: Systems with multiple processors working on different tasks.

- **Types**: Tightly coupled (shared memory) vs. loosely coupled (distributed memory).

- **Use Case**: Suitable for high-performance computing tasks.

- **Diagram**:

```
Multiple Processors -> Shared Memory
```

### Unit 1: Pipelined and Vector Processors

#### Question 2a: What is Pipelining?

- **Definition**:

- Pipelining is a technique in which multiple instruction phases are overlapped to improve processing efficiency.

- Each stage in the pipeline performs part of an instruction and passes it to the next stage in sequence.

- **Processing in the Pipeline**:

- **Stages**:

- **Fetch**: Retrieve the instruction from memory.

- **Decode**: Determine the required operations and operands.

- **Execute**: Perform the operations.

- **Memory Access**: Read/write data from/to memory.

- **Write-Back**: Store the results back in registers.

- **Example**:

- An instruction pipeline with five stages: Fetch, Decode, Execute, Memory Access, Write-Back.

- Each stage processes a different instruction simultaneously.

- **Benefits**:

- **Increased Throughput**: Multiple instructions are in flight simultaneously, increasing overall processing speed.

- **Resource Efficiency**: Better utilization of processor resources by keeping all stages active.

- **Diagram**:

```
Time ->
Fetch -> Decode -> Execute -> Mem Access -> Write-Back
```
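Under the usual textbook assumptions (k equal stages, one cycle per stage, no stalls), the stage overlap can be captured in two one-line formulas: a non-pipelined machine needs n*k cycles for n instructions, while the pipeline needs k cycles to fill plus one cycle per additional instruction.

```python
def nonpipelined_cycles(n, k):
    # each of the n instructions runs all k stages to completion in turn
    return n * k

def pipelined_cycles(n, k):
    # k cycles to fill the pipeline, then one completion per cycle
    return k + (n - 1)

# Three instructions through a five-stage pipeline:
assert nonpipelined_cycles(3, 5) == 15
assert pipelined_cycles(3, 5) == 7
```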

#### Question 2b: Why Does Pipelining Improve Performance?

- **Increased Throughput**:

- Multiple instructions are processed simultaneously, leading to a higher number of instructions executed per unit of time.

- Example: If each stage takes one clock cycle, a five-stage pipeline completes one instruction every cycle once the pipeline is full, instead of one every five cycles.

- **Reduced Waiting Time**:

- Pipelining does not shorten the latency of a single instruction, but because the stages overlap, each instruction starts sooner and the interval between completions shrinks.

- Example: In a non-pipelined system, each instruction must wait for the previous one to finish all its stages before it can begin.

- **Resource Efficiency**:

- Keeps all stages of the processor active, maximizing resource usage.

- Example: Instead of one instruction monopolizing the processor, multiple instructions share its stages, reducing idle time.

- **Illustration**:

- In a non-pipelined architecture, an instruction might take five cycles to complete. In a pipelined architecture, once the pipeline is full, an instruction completes every cycle.

- Diagram:

```
Non-Pipelined:
Time ->   1  2  3  4  5  6  7  8  9  10
          |----- I1 -----||----- I2 -----|

Pipelined:
Time ->   1  2  3  4  5  6  7  8  9  10
Fetch     I1 I2 I3
Decode       I1 I2 I3
Execute         I1 I2 I3
Mem                I1 I2 I3
WB                    I1 I2 I3
```

### Unit 1: Speedup, Throughput, and Efficiency of Pipelined Architecture

#### Question 3a: Speedup, Throughput, and Efficiency of a Pipelined Architecture

- **Speedup**:

- **Definition**: The ratio of the time taken to complete a task without pipelining to the time taken with pipelining.

- **Formula**: Speedup (S) = Non-Pipelined Time / Pipelined Time.

- **Example**:

- Non-pipelined execution time: 100 cycles.

- Pipelined execution time: 25 cycles.

- Speedup: 100 / 25 = 4.

- **Diagram**:

```
Non-Pipelined -> |------------------------------|
Pipelined     -> |----|----|
```

- **Throughput**:

- **Definition**: The number of instructions processed per unit time.

- **Formula**: Throughput (T) = Number of Instructions / Time.

- **Example**:

- If a pipelined processor completes 10 instructions in 10 cycles, its throughput is 1 instruction per cycle.

- A non-pipelined processor executes instructions strictly one after another, so its throughput is correspondingly lower.

- **Efficiency**:

- **Definition**: The ratio of useful work done to the total work the pipeline hardware could do; equivalently, the speedup divided by the number of stages.

- **Formula**: Efficiency = Speedup / Number of Stages = n / (k + n - 1) for n instructions on an ideal k-stage pipeline, expressed as a percentage.

- **Example**:

- If a five-stage pipeline completes 100 instructions, the pipelined time is 5 + (100 - 1) = 104 cycles.

- Efficiency = (100 / 104) * 100% ≈ 96%.

- Efficiency approaches 100% as the pipeline stays full, because the overlap of stages minimizes idle time; it can never exceed 100%.
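The three metrics can be computed together from the standard ideal-pipeline formulas, using the same 100-instruction, five-stage example as above:

```python
def pipeline_metrics(n, k):
    """Ideal k-stage pipeline running n instructions, one cycle per stage."""
    t_seq = n * k            # non-pipelined time (cycles)
    t_pipe = k + (n - 1)     # pipelined time (cycles)
    speedup = t_seq / t_pipe
    throughput = n / t_pipe  # instructions per cycle
    efficiency = speedup / k # fraction of stage-cycles doing useful work
    return speedup, throughput, efficiency

s, t, e = pipeline_metrics(n=100, k=5)
assert round(s, 2) == 4.81   # approaches k = 5 as n grows
assert round(t, 2) == 0.96   # approaches 1 instruction per cycle
assert round(e, 2) == 0.96   # approaches 100%, never exceeds it
```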
### Unit 1: Vector Processing and SIMD Array Processor

#### Question 4a: What is Vector Processing?

- **Definition**:

- Vector processing involves executing a single instruction on multiple data elements simultaneously.

- This contrasts with scalar processing, where each instruction operates on a single data element at a time.

- **Applications**:

- **Scientific Computing**: Vector processors excel at linear algebra operations (matrix multiplications, vector additions).

- **Graphics Processing**: Used in rendering pipelines for transforming and shading vertices.

- **Signal Processing**: Efficient for processing large volumes of data in real-time applications (e.g., audio and video processing).

- **Benefits**:

- **Performance**: Handles large datasets efficiently by processing multiple data elements in parallel.

- **Speed**: Significantly faster than scalar processing for operations on large arrays or matrices.

- **Power Efficiency**: Achieves higher performance per watt than scalar processors due to parallelism.

- **Example**:

- Cray supercomputers historically used vector processing units for scientific simulations and modeling.

#### Question 4b: SIMD Array Processor

- **Definition**:

- SIMD (Single Instruction, Multiple Data) array processors execute the same instruction on multiple data elements simultaneously.

- Arrays of processing elements (PEs) operate in parallel under the control of a central control unit (CU).

- **Architecture**:

- **Control Unit (CU)**: Issues instructions to multiple PEs.

- **Processing Elements (PEs)**: Execute the same instruction, each on a different data element.

- **Interconnection Network**: Facilitates data exchange between PEs and memory.

- **Applications**:

- **Graphics Processing Units (GPUs)**: Use SIMD-style execution to run shader programs in parallel.

- **Scientific Computing**: Accelerates simulations involving large-scale computations.

- **Machine Learning**: SIMD processors optimize parallel operations in neural network training.

- **Diagram**:

```
Control Unit (CU) -> [PE1, PE2, PE3, ... , PEn]
```

### Unit 2: Data and Control Hazards

#### Question 5a: Data and Control Hazards

- **Data Hazards**:

- **Definition**: Occur when an instruction depends on the result of a previous instruction that has not yet completed.

- **Types**:

- **RAW (Read After Write)**: An instruction reads a register before an earlier instruction has written its new value; this is a true data dependence.

- **WAR (Write After Read)**: An instruction writes a register before an earlier instruction has read its old value.

- **WAW (Write After Write)**: Two instructions write the same register, and the writes could complete out of order.

- **Resolution**:

- **Forwarding**: Passing a result directly from one pipeline stage to another to avoid stalls.

- **Pipeline Interlocks**: Inserting bubbles (no-ops) until the hazard clears.

- **Register Renaming**: Using additional physical registers to remove WAR and WAW name conflicts.
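The effect of forwarding on a RAW hazard can be shown with a toy cycle count for the five-stage pipeline above (Fetch, Decode, Execute, Mem, Write-Back): the producing instruction writes its result in Write-Back, while the consumer, issued one cycle later, wants the operand much earlier. The stage numbers below are a simplified model, not a specific processor.

```python
# Toy RAW-hazard model: i1 writes R1 in Write-Back (stage 5);
# i2, issued one cycle behind, reads R1 in Decode.
def raw_stalls(forwarding):
    write_stage = 5   # i1's result becomes architecturally visible here
    read_stage = 2    # i2's Decode stage, reached in cycle read_stage + 1
    if forwarding:
        # ALU output is forwarded straight to i2's Execute input: no stall
        return 0
    # without forwarding, i2 must wait until i1's value is written back
    return write_stage - (read_stage + 1)

assert raw_stalls(forwarding=False) == 2
assert raw_stalls(forwarding=True) == 0
```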

- **Control Hazards**:

- **Definition**: Arise from branches and jumps, because the next instruction to fetch is not known until the branch is resolved.

- **Resolution**:

- **Branch Prediction**: Speculating whether a branch will be taken before the outcome is known, and squashing wrong-path instructions on a misprediction.

- **Delayed Branching**: Defining the slot(s) after a branch to execute regardless of the outcome, so the compiler can fill them with useful instructions.

- **Dynamic Scheduling**: Reordering independent instructions so useful work continues while the branch resolves.

#### Question 5b: Difference Between Multicomputer and Multiprocessor Systems

- **Multicomputer Systems**:

- **Definition**: Comprise multiple independent computers connected via a network.

- **Characteristics**:

- Each computer has its own memory and operating system.

- Communication between computers occurs via message passing.

- Examples include clusters of PCs or workstations connected over a network.

- **Use Cases**: High availability, scalability, and fault tolerance in distributed computing environments.

- **Multiprocessor Systems**:

- **Definition**: Consist of multiple processors sharing a common memory and operating system.

- **Characteristics**:

- Processors access shared memory for communication and synchronization.

- Examples include symmetric multiprocessing (SMP) systems and NUMA architectures.

- **Use Cases**: High-performance computing, where shared memory access speeds up inter-process communication and data sharing.
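The two communication styles can be contrasted with threads: "multiprocessor-style" shared memory guarded by a lock, versus "multicomputer-style" message passing through a queue. Threads here only stand in for processors or nodes.

```python
import threading, queue

# Shared-memory style: workers update one shared counter under a lock.
counter = 0
lock = threading.Lock()

def shared_worker():
    global counter
    with lock:
        counter += 1

# Message-passing style: workers share nothing; they send messages
# to a mailbox, and a collector sums what arrives.
mailbox = queue.Queue()

def message_worker():
    mailbox.put(1)

threads = [threading.Thread(target=shared_worker) for _ in range(4)]
threads += [threading.Thread(target=message_worker) for _ in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()

total = sum(mailbox.get() for _ in range(4))
assert counter == 4 and total == 4
```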

### Unit 2: Multiprocessor Models

#### Question 6a: Multiprocessor Architectural Models

- **Definition**:

- Multiprocessor systems feature multiple processors that can execute tasks concurrently.

- **Architectural Models**:

- **Tightly Coupled**:

- Processors share memory and communicate through it directly.

- Suitable for applications requiring high-speed communication and synchronization.

- **Loosely Coupled**:

- Processors have separate memories and communicate via interconnection networks.

- Offers scalability and fault tolerance but requires efficient message-passing protocols.

- **Use Cases**:

- Tightly coupled systems are ideal for real-time processing and high-performance computing (HPC).

- Loosely coupled systems excel in distributed computing environments where scalability and fault tolerance are critical.

#### Question 6b: Loosely Coupled Multiprocessor System

- **Characteristics**:

- **Independent Memory**: Each processor has its own local memory.

- **Communication**: Processors communicate via message passing over an interconnection network.

- **Scalability**: Easily scaled by adding more processors and nodes to the network.

- **Examples**: Beowulf clusters, grid computing networks.

- **Intra-node Communication**:

- Within a node, processors communicate over shared buses or local interconnects.

- **Inter-node Communication**:

- Data exchange between nodes uses message-passing protocols, which must manage communication overhead.

### Unit 3: Interconnection Networks and Load Balancing

#### Question 7a: Interconnection Network Schemes

- **Definition**:

- Interconnection networks connect processors, memory, and I/O devices within a multiprocessor or multicomputer system.

- **Schemes**:

- **Bus-based**:

- Uses a shared communication bus for data exchange.

- Simple and cost-effective, but can become a bottleneck.

- **Crossbar Switch**:

- Directly connects multiple devices in a non-blocking manner.

- Offers high throughput but is costly and complex to implement at scale.

- **Multistage Networks**:

- Connect devices through multiple stages (layers) of switches.

- Balance cost and performance; commonly used in large-scale systems.

- **Mesh and Torus**:

- Grid-based structures connecting processors in a mesh or toroidal topology.

- Provide fault tolerance and scalability; common in supercomputers and HPC clusters.
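The cost trade-off between a crossbar and a multistage network comes down to switch counts: an n x n crossbar needs n^2 crosspoints, while an omega-style multistage network built from 2x2 switching elements needs (n/2) * log2(n) switches. A quick back-of-envelope comparison:

```python
import math

def crossbar_crosspoints(n):
    # every input can be switched to every output: n * n crosspoints
    return n * n

def omega_switches(n):
    # log2(n) stages, each with n/2 two-by-two switching elements
    return (n // 2) * int(math.log2(n))

# The gap widens rapidly with system size:
assert crossbar_crosspoints(64) == 4096
assert omega_switches(64) == 192    # 6 stages of 32 switches
assert crossbar_crosspoints(1024) == 1_048_576
assert omega_switches(1024) == 5120
```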

#### Question 7b: Load Balancing in Multiprocessor Systems

- **Definition**:

- Load balancing distributes tasks and computational load evenly across processors to optimize system performance.

- **Techniques**:

- **Static Load Balancing**:

- Predetermined assignment of tasks based on known workload characteristics.

- Example: Round-robin scheduling, or partitioning tasks by computational complexity.

- **Dynamic Load Balancing**:

- Adjusts task assignment at run time based on current system load and performance metrics.

- Example: Task stealing, where idle processors take on tasks from overloaded processors.

- **Example**:

- Job-scheduling algorithms dynamically allocate tasks to processors based on their current workload.

- Load balancing ensures efficient resource utilization and minimizes idle time in multiprocessor systems.
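The two techniques can be sketched side by side: static round-robin assignment fixed before execution, versus dynamic assignment where the least-loaded worker pulls the next task from a shared queue (a simplified stand-in for task stealing). Task "costs" here are arbitrary illustrative numbers.

```python
from collections import deque

tasks = [5, 1, 4, 2, 3, 6, 2, 1]     # per-task costs (illustrative)

def round_robin(tasks, n_workers):
    # static: task i always goes to worker i mod n_workers
    loads = [0] * n_workers
    for i, cost in enumerate(tasks):
        loads[i % n_workers] += cost
    return loads

def dynamic_pull(tasks, n_workers):
    # dynamic: each task goes to the currently least-loaded worker,
    # as if the idle worker pulled it from a shared queue
    loads = [0] * n_workers
    q = deque(tasks)
    while q:
        w = loads.index(min(loads))
        loads[w] += q.popleft()
    return loads

static_loads = round_robin(tasks, 2)
dynamic_loads = dynamic_pull(tasks, 2)
assert sum(static_loads) == sum(dynamic_loads) == sum(tasks)
# dynamic assignment ends up at least as balanced as the static one
assert max(dynamic_loads) - min(dynamic_loads) <= max(static_loads) - min(static_loads)
```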
### Unit 3: Synchronization and Coherence in Multiprocessor Systems

#### Question 8a: Synchronization Mechanisms in Multiprocessor Systems

- **Definition**:

- Synchronization ensures the orderly execution of concurrent processes or threads that share resources.

- **Mechanisms**:

- **Mutual Exclusion**:

- Prevents multiple processes from accessing a shared resource simultaneously.

- Example: Locks, semaphores, or atomic instructions.

- **Atomic Operations**:

- Guarantee that a sequence of operations executes as a single indivisible unit.

- Example: Compare-and-swap (CAS) in shared-memory systems.

- **Barrier Synchronization**:

- Ensures that all processes reach a specific point before any continues execution.

- Example: Barriers used in parallel computations to synchronize threads between phases.

- **Diagram**:

```
Mutual Exclusion        -> Locks, Semaphores
Atomic Operations       -> Compare-and-swap
Barrier Synchronization -> Barrier
```
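Two of the mechanisms above map directly onto Python's threading primitives: a `Lock` for mutual exclusion and a `Barrier` so that no thread proceeds until all have arrived. (CAS has no direct Python equivalent; the lock stands in for atomicity here.)

```python
import threading

N = 4
total = 0
lock = threading.Lock()
barrier = threading.Barrier(N)
seen = []

def worker():
    global total
    with lock:              # mutual exclusion around the shared counter
        total += 1
    barrier.wait()          # nobody passes this point until all N arrive
    with lock:
        seen.append(total)  # by now every increment has happened

threads = [threading.Thread(target=worker) for _ in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert total == N
assert seen == [N] * N      # every thread observed the final count
```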

#### Question 8b: Cache Coherence Protocols

- **Definition**:

- Cache coherence ensures that multiple processors accessing shared data see a consistent view across their local caches.

- **Protocols**:

- **MESI Protocol**:

- Maintains cache coherence using four states: Modified, Exclusive, Shared, and Invalid.

- Ensures that only one cache at a time has the right to modify a given block of data.

- **MOESI Protocol**:

- Extends MESI with an Owned state, in which a cache can supply its modified data directly to other caches without first writing it back to main memory.

- Improves efficiency by reducing memory traffic and access latency.

- **MESIF Protocol**:

- Extends MESI with a Forward state that designates which of several sharing caches responds to a request.

- Reduces traffic on the interconnect by allowing a single direct cache-to-cache transfer instead of multiple replies.

- **Implementation**:

- Hardware coherence protocols keep caches consistent through snooping or directory-based approaches.

- Example: Intel processors use MESI-derived protocols to maintain cache coherence efficiently.
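A heavily simplified MESI sketch for a single cache line held by two caches makes the state transitions concrete. It tracks only the state letters; snooping is modeled by directly updating the other cache's state. Real protocols handle many more transitions (write-backs, evictions, upgrades) than this toy.

```python
def read(states, i):
    # cache i reads the line; the other cache may be downgraded to Shared
    other = 1 - i
    if states[i] == 'I':
        if states[other] in ('M', 'E', 'S'):
            states[other] = 'S'
            states[i] = 'S'
        else:
            states[i] = 'E'   # only cached copy: Exclusive
    return states

def write(states, i):
    # cache i writes the line: invalidate the other copy, own it dirty
    other = 1 - i
    states[other] = 'I'
    states[i] = 'M'
    return states

states = ['I', 'I']           # both caches start Invalid
read(states, 0);  assert states == ['E', 'I']   # sole reader: Exclusive
read(states, 1);  assert states == ['S', 'S']   # second reader: both Shared
write(states, 0); assert states == ['M', 'I']   # writer Modified, other Invalid
```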

### Conclusion

This document covers various aspects of parallel computing, from Flynn's and Handler's classifications to pipelining, vector processing, hazards, multiprocessor architectures, interconnection networks, load balancing, synchronization, and cache coherence. Each section provides explanations, examples, and diagrams to illustrate key concepts in advanced computer architecture.

