# MCSE-103 Advanced Computer Architecture (June 2020)
#### Question 1a: Flynn's Classification of Computer Architectures
- **Flynn's Taxonomy**: Classifies architectures by the number of concurrent instruction streams and data streams.
- **SISD (Single Instruction, Single Data)**:
- **Definition**: A single instruction stream operates on a single data stream.
- **Architecture**:
- One control unit drives a single processing element over one data stream.
- **Examples**:
- Traditional uniprocessor machines.
- **Use Cases**: Suitable for general-purpose computing tasks where parallelism is not required.
- **Diagram**:
```
Control Unit -> Processing Element -> Data Stream
```
- **SIMD (Single Instruction, Multiple Data)**:
- **Definition**: One instruction operates on multiple data points simultaneously.
- **Architecture**:
- Single control unit broadcasts instructions to multiple processing elements.
- Each processing element executes the same instruction on different pieces of data.
- **Examples**:
- GPUs and vector processors.
- **Use Cases**: Ideal for tasks that can be parallelized across large data sets, such as image processing or scientific simulations.
- **Diagram**:
```
      Control Unit
     /     |      \
   PE1    PE2    PE3
    |      |      |
  Data1  Data2  Data3
```
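To make the SIMD idea concrete, here is a minimal C sketch using x86 SSE intrinsics (the function name `simd_add` is illustrative, not from the source): a single instruction, `_mm_add_ps`, adds four floats at once, which is exactly the "one instruction, multiple data" pattern described above.
```c
#include <xmmintrin.h>  /* x86 SSE intrinsics */

/* Add two float arrays four lanes at a time. One instruction
   (_mm_add_ps) operates on four data elements simultaneously.
   Assumes n is a multiple of 4 to keep the sketch short. */
void simd_add(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);          /* load 4 floats */
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&c[i], _mm_add_ps(va, vb)); /* 4 additions in one instruction */
    }
}
```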
- **MISD (Multiple Instruction, Single Data)**:
- **Definition**: Multiple instructions operate on a single data stream.
- **Architecture**:
- Several processing elements, each with its own control unit, operate on the same data stream.
- **Examples**:
- Rare in practice; occasionally cited for fault-tolerant systems that run redundant computations on the same data.
- **Use Cases**: Potentially useful in scenarios requiring multiple types of analysis on the same data.
- **Diagram**:
```
CU1 -> PE1 \
CU2 -> PE2 --> Data Stream
CU3 -> PE3 /
```
- **MIMD (Multiple Instruction, Multiple Data)**:
- **Definition**: Multiple processors execute different instructions on different data points simultaneously.
- **Architecture**:
- Multiple autonomous processors, each with its own control unit.
- **Examples**:
- Multicore CPUs and multiprocessor clusters.
- **Diagram**:
```
CU1 -> PE1 -> Data Stream 1
CU2 -> PE2 -> Data Stream 2
CU3 -> PE3 -> Data Stream 3 ... CUn -> PEn -> Data Stream n
```
#### Question 1b: Need for Parallel Processing and Classification of Parallel Computing Structures
- **Performance**:
- Parallel processing can significantly increase computational speed by dividing tasks among multiple processors.
- Example: Weather simulations, where data can be processed in parallel to speed up forecasts.
- **Efficiency**:
- Spreading work across processors keeps hardware busy rather than idle.
- Example: Data centers distributing web requests across multiple servers.
- **Scalability**:
- Workloads can grow by adding processors rather than replacing the whole system.
- **Cost-Effectiveness**:
- Example: Financial modeling where faster computations can lead to timely decisions and cost savings.
- **Pipelined Processors**:
- **Definition**: Overlap instruction execution by dividing it into sequential stages.
- **Diagram**:
```
Fetch -> Decode -> Execute -> Memory Access -> Write-Back
```
- **Vector Processors**:
- **Definition**: Execute a single instruction over entire vectors of data elements.
- **Use Case**: Efficient for scientific computations involving large data sets.
- **Diagram**:
```
Vector Instruction -> [ V0 V1 V2 ... Vn ] -> pipelined functional unit
```
- **Array Processors**:
- **Definition**: Grid of processors performing the same instruction on different data points.
- **Diagram**:
```
PE - PE - PE
 |    |    |
PE - PE - PE
```
- **Multithreaded Processors**:
- **Definition**: Use multiple threads within a single processor to perform tasks concurrently.
- **Use Case**: Enhances performance for applications that can be parallelized at the thread level.
- **Diagram**:
```
Processor Core
 |- Thread 1
 |- Thread 2
```
- **Multiprocessors**:
- **Definition**: Systems with multiple processors working on different tasks.
- **Types**: Tightly coupled (shared memory) vs. loosely coupled (distributed memory).
- **Diagram**:
```
CPU1   CPU2   CPU3
   \     |     /
   Shared Memory
```
- **Definition**:
- Pipelining is a technique where multiple instruction phases are overlapped to improve processing efficiency.
- Each stage in the pipeline performs a part of an instruction, passing it to the next stage in a sequential manner.
- **Stages**:
- Fetch, Decode, Execute, Memory Access, Write-Back.
- **Example**:
- An instruction pipeline with five stages: Fetch, Decode, Execute, Memory Access, Write-Back.
- **Benefits**:
- **Increased Throughput**: Multiple instructions are processed simultaneously, increasing the overall processing speed.
- **Diagram**:
```
Time ->
I1: F -> D -> E -> M -> WB
I2:      F -> D -> E -> M -> WB
I3:           F -> D -> E -> M -> WB
```
- **Increased Throughput**:
- Multiple instructions are processed simultaneously, leading to a higher number of instructions executed per unit of time.
- Example: If each stage takes one clock cycle, a five-stage pipeline can complete five instructions every five cycles once the pipeline is full.
- Example: In a non-pipelined system, instructions would be executed sequentially, increasing wait time.
- **Resource Efficiency**:
- Example: Instead of having one instruction monopolize the processor, multiple instructions share resources, reducing idle time.
- **Illustration**:
- Diagram:
```
Non-Pipelined:
Time ->  1  2  3  4  5  6  7  8  9  10 ...
        [----- I1 -----][----- I2 -----] ...
Pipelined:
Time ->  1  2  3  4  5  6  7
Fetch  -> I1 I2 I3
Decode ->    I1 I2 I3
Execute->       I1 I2 I3
Mem    ->          I1 I2 I3
WB     ->             I1 I2 I3
```
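The staggered occupancy above can be generated mechanically. The short C program below is a supplementary sketch (not from the source) that prints which instruction occupies each stage in every cycle of an ideal five-stage pipeline.
```c
#include <stdio.h>

/* Ideal k-stage pipeline: instruction i (0-based) occupies stage s
   during cycle i + s, so n instructions finish in k + n - 1 cycles. */
int main(void) {
    const char *stage[] = {"Fetch  ", "Decode ", "Execute", "Mem    ", "WB     "};
    const int k = 5, n = 3;                 /* stages, instructions */
    for (int s = 0; s < k; s++) {
        printf("%s:", stage[s]);
        for (int c = 0; c < k + n - 1; c++) {
            int i = c - s;                  /* instruction in stage s at cycle c */
            if (i >= 0 && i < n) printf(" I%d", i + 1);
            else                 printf("   ");
        }
        printf("\n");
    }
    return 0;
}
```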
- **Speedup**:
- **Definition**: The ratio of the time taken to complete a task without pipelining to the time taken with pipelining.
- **Example**:
- If a task takes 100 time units without pipelining and 25 with pipelining, speedup = 100 / 25 = 4.
- **Throughput**:
- **Definition**: The number of instructions processed per unit time.
- **Example**:
- If a pipelined processor can complete 10 instructions in 10 cycles, its throughput is 1 instruction per cycle.
- This is compared to a non-pipelined processor where instructions complete sequentially, resulting in a lower throughput.
- **Efficiency**:
- **Definition**: The ratio of useful work done to the total work expended.
- **Example**:
- Once a five-stage pipeline is full, nearly every stage performs useful work in every cycle.
- This high efficiency is due to the overlap of instruction processing stages, minimizing idle time and maximizing use of resources.
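All three metrics follow from the ideal-pipeline cycle count k + n - 1. The sketch below computes them for a five-stage pipeline, assuming no stalls (a common textbook simplification, not stated in the source).
```c
#include <stdio.h>

/* Ideal-pipeline metrics, assuming no stalls:
   non-pipelined time = n*k cycles, pipelined time = k + n - 1 cycles. */
int main(void) {
    const int k = 5;    /* pipeline stages */
    const int n = 100;  /* instructions */
    double t_seq  = (double)n * k;
    double t_pipe = (double)(k + n - 1);
    double speedup    = t_seq / t_pipe;  /* approaches k as n grows */
    double throughput = n / t_pipe;      /* instructions per cycle */
    double efficiency = speedup / k;     /* fraction of the ideal k-fold speedup */
    printf("speedup = %.2f, throughput = %.2f IPC, efficiency = %.2f\n",
           speedup, throughput, efficiency);
    return 0;
}
```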
### Unit 1: Vector Processing and SIMD Array Processor
- **Definition**:
- Vector processing involves executing a single instruction on multiple data elements simultaneously.
- This contrasts with scalar processing, where each instruction operates on a single data element at a time.
- **Applications**:
- **Scientific Computing**: Vector processors excel in tasks such as linear algebra operations (matrix multiplications, vector additions).
- **Graphics Processing**: Used in rendering pipelines for transforming and shading vertices.
- **Signal Processing**: Efficient for processing large volumes of data in real-time applications (e.g., audio and video processing).
- **Benefits**:
- **Performance**: Handles large datasets efficiently by processing multiple data elements in parallel.
- **Speed**: Significantly faster than scalar processing for operations on large arrays or matrices.
- **Power Efficiency**: Achieves higher performance per watt compared to scalar processors due to parallelism.
- **Example**:
- Cray supercomputers historically used vector processing units for scientific simulations and modeling.
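As a concrete example of the loops vector hardware targets, here is SAXPY (y = a·x + y), a classic BLAS-style vector kernel, written as a plain C sketch: a vector processor (or an auto-vectorizing compiler) covers many iterations of this loop with each vector instruction.
```c
/* SAXPY (y = a*x + y), a classic BLAS-style vector kernel. On a vector
   processor, one vector instruction processes many elements of x and y
   at once instead of one scalar element per instruction. */
void saxpy(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```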
- **Definition**:
- SIMD (Single Instruction, Multiple Data) array processors execute the same instruction on multiple data elements simultaneously.
- Arrays of processing elements (PEs) operate in parallel under the control of a central unit (CU).
- **Architecture**:
- **Control Unit (CU)**: Broadcasts each instruction to all processing elements.
- **Processing Elements (PEs)**: Execute the same instruction but on different data elements.
- **Applications**:
- **Graphics Processing Units (GPUs)**: Use SIMD architecture for parallel execution of shader programs.
- **Scientific Computing**: Accelerates simulations involving large-scale computations.
- **Machine Learning**: SIMD processors optimize parallel operations in neural network training.
- **Diagram**:
```
        CU
     /  |  \
   PE1 PE2 PE3
    |   |   |
   M1  M2  M3
```
- **Data Hazards**:
- **Definition**: Occur when instructions depend on the results of previous instructions.
- **Types**:
- **RAW (Read After Write)**: Reading a register before a previous instruction's write to it has completed (a true dependence).
- **WAR (Write After Read)**: Writing to a register before its previous value is read.
- **WAW (Write After Write)**: Writing to the same register multiple times before the previous write completes.
- **Resolution**:
- **Forwarding**: Passing data directly from one pipeline stage to another to avoid stalls.
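A minimal C-level illustration of a true (RAW) dependence, assuming the two statements compile to back-to-back instructions: the second cannot proceed until the first produces r, which is exactly the case forwarding handles by routing the ALU result straight to the next instruction's input.
```c
/* Read-after-write (RAW) hazard: the multiply needs r, which the
   preceding add has not yet written back. Forwarding passes the add's
   ALU result directly to the multiply, avoiding a pipeline stall. */
int raw_example(int a, int b, int c) {
    int r = a + b;  /* produces r */
    return r * c;   /* consumes r on the very next instruction */
}
```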
- **Control Hazards**:
- **Definition**: Arise due to conditional branches that affect program flow.
- **Resolution**:
- **Branch Prediction**: Speculating whether a branch will be taken or not before the actual decision.
- **Delayed Branching**: Filling the slot(s) after a branch with instructions that execute regardless of the branch outcome, hiding the branch delay.
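One common prediction scheme (not named in the notes, so treat this as a supplementary sketch) is the 2-bit saturating counter: two consecutive mispredictions are needed to flip the prediction, which tolerates the occasional odd iteration of a loop.
```c
/* 2-bit saturating-counter branch predictor (supplementary sketch). */
typedef enum {
    STRONG_NOT_TAKEN, WEAK_NOT_TAKEN, WEAK_TAKEN, STRONG_TAKEN
} PredictorState;

/* Predict "taken" in the two upper states. */
int predict_taken(PredictorState s) {
    return s >= WEAK_TAKEN;
}

/* Move one step toward the actual outcome, saturating at the ends. */
PredictorState update(PredictorState s, int taken) {
    if (taken  && s < STRONG_TAKEN)     return s + 1;
    if (!taken && s > STRONG_NOT_TAKEN) return s - 1;
    return s;
}
```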
#### Question 5b: Difference Between Multicomputer and Multiprocessor Systems
- **Multicomputers**:
- **Definition**: Comprise multiple independent computers connected via a network.
- **Characteristics**:
- Each node has its own private memory; nodes communicate by passing messages over the network.
- **Use Cases**: High availability, scalability, and fault tolerance in distributed computing environments.
- **Multiprocessors**:
- **Definition**: Consist of multiple processors sharing a common memory and operating system.
- **Characteristics**:
- Processors communicate through shared memory under a single operating system.
- **Use Cases**: High-performance computing, where shared memory access speeds up inter-process communication and data sharing.
- **Definition**:
- Multiprocessor systems feature multiple processors that share a common memory space and can execute tasks concurrently.
- **Architectural Models**:
- **Tightly Coupled**:
- Processors share a common memory and communicate through it.
- Suitable for applications requiring high-speed communication and synchronization.
- **Loosely Coupled**:
- Processors have their own local memory and communicate over an interconnection network.
- Offers scalability and fault tolerance but requires efficient message-passing protocols.
- **Use Cases**:
- Tightly coupled systems are ideal for real-time processing and high-performance computing (HPC).
- Loosely coupled systems excel in distributed computing environments where scalability and fault tolerance are critical.
- **Characteristics**:
- **Scalability**: Easily scalable by adding more processors and nodes to the network.
- Processors communicate within the system using shared buses or interconnection networks.
- Data exchange between processors involves message-passing protocols that manage communication overhead.
- **Definition**:
- Interconnection networks connect processors, memory, and I/O devices within a multiprocessor or multicomputer system.
- **Schemes**:
- **Bus-based**:
- All components share a common bus; simple and inexpensive, but the bus becomes a bottleneck as processors are added.
- **Crossbar Switch**:
- A switch at every processor-memory crosspoint allows many simultaneous, non-blocking connections at higher hardware cost (see the toy model below).
- **Multistage and Point-to-Point Networks**:
- Provide fault tolerance and scalability, common in supercomputers and HPC clusters.
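To illustrate why a crossbar is non-blocking, the toy model below (all names are illustrative) connects N inputs to N outputs through a permutation: every input gets its own dedicated path, so all N transfers can proceed in the same cycle, whereas a shared bus would serialize them.
```c
#include <stdio.h>
#define N 4

/* Toy crossbar model: a permutation maps each input port to a distinct
   output port, so all N transfers proceed simultaneously (non-blocking). */
int main(void) {
    int out_port[N] = {2, 0, 3, 1};  /* input i -> output out_port[i] */
    for (int i = 0; i < N; i++)
        printf("input %d -> output %d\n", i, out_port[i]);
    return 0;
}
```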
- **Definition**:
- Load balancing distributes tasks and computational load evenly across processors to optimize system performance.
- **Techniques**:
- **Static Load Balancing**: Assigns tasks to processors before execution (see the round-robin sketch after this list).
- Example: Round-robin scheduling or partitioning tasks based on computational complexity.
- **Dynamic Load Balancing**: Adjusts task assignment in real time based on current system load and performance metrics.
- Example: Task stealing, where idle processors take on tasks from overloaded processors.
- **Example**:
- Job scheduling algorithms dynamically allocate tasks to processors based on their current workload.
- Load balancing ensures efficient resource utilization and minimizes idle time in multiprocessor systems.
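A minimal sketch of static round-robin assignment (the layout is illustrative): task i simply goes to processor i mod P, which balances task counts when tasks have similar cost; varying costs are what motivate the dynamic schemes above.
```c
#include <stdio.h>

/* Static round-robin load balancing: task i -> processor i mod P.
   Works well when tasks have similar cost; dynamic schemes such as
   task stealing are needed when costs vary at run time. */
int main(void) {
    const int num_procs = 4, num_tasks = 10;
    for (int task = 0; task < num_tasks; task++)
        printf("task %d -> processor %d\n", task, task % num_procs);
    return 0;
}
```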
### Unit 3: Synchronization and Cache Coherence
- **Synchronization**:
- **Definition**:
- Synchronization coordinates concurrent processes so that they access shared resources in a consistent, orderly way.
- **Mechanisms**:
- **Mutual Exclusion**:
- Ensures that only one process at a time executes a critical section that accesses shared data (see the mutex sketch below).
- **Atomic Operations**:
- Guarantees that a sequence of operations is executed as a single unit without interruption.
- **Barrier Synchronization**:
- Ensures that all processes reach a specific point before continuing execution.
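As a concrete instance of mutual exclusion, here is a minimal POSIX-threads sketch: without the lock, the increments race and the final count is unpredictable; with it, each increment executes as a unit.
```c
#include <pthread.h>
#include <stdio.h>

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread increments the shared counter inside a critical section. */
static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    /* enter critical section */
        counter++;                    /* protected shared update */
        pthread_mutex_unlock(&lock);  /* leave critical section */
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("counter = %ld\n", counter);  /* always 400000 with the lock */
    return 0;
}
```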
- **Cache Coherence**:
- **Definition**:
- Cache coherence ensures that multiple processors accessing shared data maintain consistency across their local caches.
- **Protocols**:
- **MESI Protocol**:
- Maintains cache coherence using four states: Modified, Exclusive, Shared, and Invalid.
- Ensures that only one cache has the right to modify a given block of data at a time.
- **MOESI Protocol**:
- Extends MESI with an Owned state, letting a cache supply modified data directly to other caches without first writing it back to main memory.
- **MESIF Protocol**:
- Extends MESI with a Forward state that designates which shared copy responds to requests, enabling quicker cache-to-cache transfers.
- **Implementation**:
- Hardware-based coherence protocols ensure consistent data across caches through snooping or directory-based approaches.
- Example: Intel processors use MESI-based protocols to maintain cache coherence efficiently.
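The MESI state machine can be summarized in a few transition functions. The sketch below is a deliberately simplified snooping-protocol model (real protocols also handle bus transactions, write-backs, and data supply), showing how one cache line reacts to local accesses and snooped events.
```c
/* Simplified MESI transitions for one cache line in one cache,
   driven by local accesses and snooped bus events. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } MesiState;

/* Local write: invalidate other copies, take exclusive ownership. */
MesiState on_local_write(MesiState s) {
    (void)s;
    return MODIFIED;
}

/* Local read miss: Exclusive if no other cache holds the line,
   Shared otherwise. */
MesiState on_local_read_miss(int other_copies_exist) {
    return other_copies_exist ? SHARED : EXCLUSIVE;
}

/* Another cache reads this line: supply data if dirty, downgrade. */
MesiState on_snoop_read(MesiState s) {
    return (s == MODIFIED || s == EXCLUSIVE) ? SHARED : s;
}

/* Another cache writes this line: our copy becomes stale. */
MesiState on_snoop_write(MesiState s) {
    (void)s;
    return INVALID;
}
```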
### Conclusion
This detailed response covers various aspects of parallel computing, from Flynn's and Handler's classifications to pipelining, vector processing, hazards, multiprocessor architectures, interconnection networks, load balancing, synchronization, and cache coherence. Each section provides in-depth explanations, examples, and diagrams to illustrate key concepts in advanced computer architecture.