
Advanced Computer Architecture

Unit I: Introduction to Parallel Processing

Dr. Roshan Koju 2024 December 8 / 15


Week I & II
Contents

∙ Definition, significance, and challenges of parallel processing
∙ Flynn's and Feng's classifications
∙ Architectural classification: SMP, MPP, and cluster computing
∙ Characteristics of parallel algorithms
∙ Parallel programming techniques and models (OpenMP, MPI)
∙ Parallel algorithms for multiprocessors and their performance
∙ Parallel programming languages (MPI, OpenMP, CUDA)
∙ Solving problems with parallel algorithms
∙ Practice questions
Definition of parallel processing
Parallel processing is a computing technique in which multiple processors or computational
units work simultaneously to execute multiple tasks or solve a single problem. In a parallel
processing system:
1. The workload is divided into smaller units that can run simultaneously.
2. The subtasks communicate and synchronize with one another as required.
3. The goal is to achieve faster execution and improved resource utilization.

Parallel processing can be implemented using various architectures, including multicore
processors, symmetric multiprocessing (SMP), massively parallel processors (MPP), or
distributed computing systems.

Parallel processing is categorized by how data and tasks are divided and processed. These
categories include:
1. Data Parallelism: Dividing data into smaller chunks and processing them
simultaneously.
2. Task Parallelism: Assigning different tasks to separate processors for
concurrent execution.
3. Hybrid Parallelism: Combining data and task parallelism to optimize performance.
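As a rough illustration of the first two categories, here is a minimal sketch in C, assuming a compiler with OpenMP support (OpenMP is introduced later in these slides); the array size and the stage_* helper functions are hypothetical placeholders used only for illustration.

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000
static double a[N], b[N];

/* Hypothetical, independent pieces of work used only for illustration. */
static void stage_fetch(void)   { /* e.g., read input data      */ }
static void stage_process(void) { /* e.g., transform the data   */ }

int main(void) {
    /* Data parallelism: the same operation applied to different chunks of data. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        b[i] = 2.0 * a[i];

    /* Task parallelism: different operations run concurrently on separate threads. */
    #pragma omp parallel sections
    {
        #pragma omp section
        stage_fetch();
        #pragma omp section
        stage_process();
    }
    printf("done\n");
    return 0;
}
```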

Significance of parallel processing
Performance Improvement:
A weather simulation task that might take hours on a single processor can be completed in minutes
when distributed across multiple processors.
Scalability for Large Data Sets:
Search engines like Google use parallel processing to index and retrieve data across billions of web
pages in real time.
Real-Time Processing:
In stock trading, algorithms process vast amounts of data to make buy/sell decisions in milliseconds.
Cost Efficiency with Resource Utilization:
Using distributed systems like cloud computing allows organizations to leverage parallel processing
without investing heavily in dedicated hardware.
Enabling Advanced Technologies:
∙ Machine Learning and AI: Training neural networks requires processing millions of computations
simultaneously.
∙ Graphics Rendering: High-quality 3D graphics and virtual reality rely on parallel GPU processing.
∙ Scientific Research: Simulations of physical phenomena like climate modeling and molecular
dynamics depend on parallel systems.

Challenges of Parallel Processing

∙ Task Division: Identifying independent subtasks can be complex.


∙ Synchronization and Communication: Coordination between tasks
can lead to overhead.
∙ Scalability Issues: Adding more processors doesn’t always guarantee
linear speedup due to communication delays and resource
contention.

Flynn's classification
Flynn's taxonomy classifies computer architectures by the number of concurrent instruction
streams and data streams, giving four classes: SISD, SIMD, MISD, and MIMD.

Feng's classification
Feng's classification groups parallel computers by their maximum degree of parallelism,
determined by word length and bit-slice length (word-serial versus word-parallel, and
bit-serial versus bit-parallel processing).
Architectural Classification of Parallel Processing
Symmetric Multiprocessing (SMP)

Symmetric Multiprocessing (SMP) is a type of computer architecture where multiple processors
share a common memory and work collaboratively on tasks. SMP is widely used in systems that
demand high performance, scalability, and reliability. All processors run under a single
instance of the operating system and share memory, I/O, and system resources equally.

Key characteristics:
• Symmetry Among Processors
• Shared Memory Architecture
• Shared Bus/System Interconnect
• Single Operating System
• Cache Usage
• Scalability
• Resource Sharing
• Fault Tolerance
• Performance Metrics
• Simplified Parallel Programming Model
Architectural Classification of Parallel Processing
Massively Parallel Processing (MPP).
Massively Parallel Processing (MPP) refers to a computing architecture where
numerous processors work simultaneously on different parts of a computational
task. It is a scalable approach designed to handle large-scale, data-intensive
problems, making it a cornerstone of high-performance computing (HPC).
a) MPP uses a shared-nothing architecture.
b) In MPP, each processor works on a different part of the task.
c) Each processor has its own set of disks.
d) Each node is responsible for processing only the rows on its own disks.
e) Scaling is easy: just add nodes (horizontal scaling).
f) MPP systems usually offer strong data compression capabilities.
g) MPP processors communicate with each other via a messaging interface.
h) In MPP, each processor uses its own operating system (OS) and memory.

Architectural Classification of Parallel Processing
Cluster Computing

Cluster computing is a model of computing that involves connecting a group of independent
computers, called nodes, to work together as a single system. These nodes collaborate to
solve computational problems, providing high performance, scalability, and fault tolerance.
It is widely used in fields requiring intensive computations, such as scientific research,
data analysis, and financial modeling.

Architecture of Cluster Computing
Nodes
∙ Each node is an individual computer (e.g., PC, server) with its own CPU, memory, and storage.
∙ Nodes are typically uniform to ensure consistency in performance.
Interconnection Network
∙ Nodes are connected using a high-speed network, such as Gigabit Ethernet, InfiniBand, or
Fibre Channel, to facilitate communication.
∙ The network's bandwidth and latency significantly impact the cluster's performance.
Cluster Middleware
∙ Software that manages the cluster, enabling coordination and resource allocation.
Storage System
∙ A shared storage system is often used to allow nodes to access the same data.
∙ Common systems include Network File System (NFS) and distributed file systems like HDFS
(Hadoop Distributed File System).
Master Node
∙ Responsible for scheduling tasks, managing resources, and monitoring the cluster's health.
∙ Centralized systems have a single master node, while decentralized systems distribute
responsibilities across multiple nodes.
Compute Nodes
∙ Perform the actual computational tasks assigned by the master node.

Classification of Clusters

By accessibility:
1. Open Cluster: Every node needs its own IP address, and the nodes are accessed only through
the internet or web. This type of cluster raises security concerns.
2. Closed Cluster: The nodes are hidden behind a gateway node, which provides increased
protection. Closed clusters need fewer IP addresses and are well suited for computational tasks.

By purpose:
1. Load-balancing clusters: The workload is distributed across the multiple servers installed
in the cluster network.
2. High-availability (HA) clusters: A group of nodes that maintains very high availability.
Such systems are highly reliable and designed to avoid downtime.
3. High-performance (HP) clusters: These clusters combine supercomputer-class nodes with
cluster computing techniques to solve complex, computationally demanding problems.
Characteristics of Parallel Algorithms
1. Decomposition and Concurrency: Parallel algorithms decompose a problem into smaller tasks
that can execute simultaneously. This decomposition can be data-based (data parallelism) or
task-based (task parallelism). For example, splitting a matrix multiplication problem into
independent sub-matrix operations.
2. Communication: Parallel algorithms often require processors to exchange intermediate data
during execution. The communication overhead, depending on the architecture (shared or
distributed memory), influences the algorithm's efficiency.
3. Synchronization: Coordination among processes is crucial to ensure correct execution.
Synchronization mechanisms (e.g., locks, barriers) manage dependencies but can introduce delays.
4. Scalability: A good parallel algorithm scales efficiently with the number of processors,
maintaining or improving performance as more resources are added.
5. Load Balancing: Effective distribution of tasks among processors ensures that no processor
remains idle while others are overloaded. Poor load balancing leads to suboptimal performance.
6. Fault Tolerance: Parallel algorithms should handle processor failures gracefully,
particularly in distributed systems, ensuring continued execution without significant data loss.
7. Efficiency and Speedup: Efficiency is the ratio of speedup to the number of processors used.
Speedup measures how much faster a parallel algorithm is compared to its sequential
counterpart. A well-designed parallel algorithm maximizes both.
8. Overhead: Parallel algorithms incur overhead due to communication, synchronization, and
data distribution. Minimizing this overhead is crucial for achieving high performance.

Parallel Programming Techniques
Task Decomposition Techniques
Task decomposition (also known as functional decomposition) divides
a problem into distinct tasks, each representing a functional
operation that can be executed concurrently.

Key Features:
∙ Tasks may have varying computational workloads.
∙ Task dependencies determine execution order.
∙ Typically used when tasks involve distinct processes (e.g., data fetching, processing,
visualization).
Steps in Task Decomposition:
1. Identify Tasks: Break the application into logically distinct operations.
2. Analyze Dependencies: Determine dependencies among tasks to avoid conflicts.
3. Assign Tasks to Processes: Allocate tasks to processors or threads based on their
dependencies and workload.

Example:
∙ In a weather simulation application:
o Task 1: Fetch weather data from sensors.
o Task 2: Process the fetched data using a mathematical model.
o Task 3: Visualize the results for end users.
Each task can run in parallel, provided proper synchronization, as sketched below.
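A minimal sketch of this weather example, assuming a C compiler with OpenMP 4.0+ task support; fetch_data, process_data, and visualize are hypothetical stand-ins for the three tasks, and the depend clauses encode the required ordering.

```c
#include <omp.h>
#include <stdio.h>

/* Hypothetical stand-ins for the three tasks on the slide. */
static void fetch_data(double *raw)                       { raw[0] = 21.5;          /* read sensors   */ }
static void process_data(const double *raw, double *out)  { out[0] = raw[0] * 2.0;  /* run the model  */ }
static void visualize(const double *out)                  { printf("result: %f\n", out[0]); }

int main(void) {
    double raw[1024], result[1024];
    #pragma omp parallel
    #pragma omp single
    {
        /* The depend clauses express the required order (fetch -> process -> visualize);
           independent tasks for other data batches could run alongside this chain. */
        #pragma omp task depend(out: raw[0])
        fetch_data(raw);
        #pragma omp task depend(in: raw[0]) depend(out: result[0])
        process_data(raw, result);
        #pragma omp task depend(in: result[0])
        visualize(result);
    }
    return 0;
}
```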

Parallel Programming Techniques
Data Decomposition Techniques
Data decomposition (or data partitioning) divides the data into smaller
chunks that can be processed simultaneously by different processing
units.
Key Features:
∙ Best suited for problems where the same operations are performed on different subsets of
data.
∙ Improves scalability and efficiency in data-intensive tasks.
Types of Data Decomposition:
1. Block Decomposition: Data is divided into contiguous blocks and assigned to different
processors.
o Example: Dividing a 1D array into equal-sized chunks for processing.
2. Cyclic Decomposition: Data elements are distributed cyclically among processors.
o Example: Assigning every nth element of an array to the same processor.
3. Block-Cyclic Decomposition: A hybrid of block and cyclic decomposition.
o Example: Dividing data into small blocks and distributing cyclically.

Example:
In matrix multiplication:
∙ Divide the rows of matrix A and columns of matrix B into smaller blocks.
∙ Assign each block to different processors for simultaneous computation of the result matrix.
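As a small illustration of block and cyclic decomposition from the list above, the following C sketch prints which indices of a 1D array each worker would own (N, NPROC, and the printed mapping are purely illustrative).

```c
#include <stdio.h>

#define N     16   /* illustrative array size        */
#define NPROC 4    /* illustrative number of workers */

int main(void) {
    /* Block decomposition: worker p gets one contiguous chunk. */
    int chunk = (N + NPROC - 1) / NPROC;               /* ceiling division */
    for (int p = 0; p < NPROC; p++) {
        int lo = p * chunk;
        int hi = (lo + chunk < N) ? lo + chunk : N;
        printf("block : worker %d -> indices [%d, %d)\n", p, lo, hi);
    }

    /* Cyclic decomposition: worker p gets every NPROC-th element. */
    for (int p = 0; p < NPROC; p++) {
        printf("cyclic: worker %d -> indices", p);
        for (int i = p; i < N; i += NPROC)
            printf(" %d", i);
        printf("\n");
    }
    return 0;
}
```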

Pipelining in Parallel Programming
Pipelining divides a task into a sequence of subtasks, where the output
of one becomes the input for the next. Each subtask is processed in
parallel, creating an assembly line effect.

Key Characteristics:
∙ Suitable for problems where tasks are divided into stages with dependencies.
∙ Improves throughput by overlapping execution of subtasks.
∙ Reduces idle time of processing units.

Stages in Pipelining:
1. Divide Work into Stages: Identify sequential subtasks.
2. Parallelize Stages: Assign each stage to a processor.
3. Execute Concurrently: Start new data inputs as soon as the previous stage is free.

Pipeline Performance Metrics:


1. Throughput: Number of tasks completed per unit time.
2. Latency: Time taken to process a single task from start to finish.
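As a small worked example of these metrics, assuming an idealized pipeline with k stages of equal duration t processing n inputs:

```latex
\text{Latency} = k\,t, \qquad
\text{Total time for } n \text{ inputs} \approx (k + n - 1)\,t, \qquad
\text{Throughput (steady state)} \approx \frac{1}{t}
```

For instance, with k = 4 stages of t = 1 ms each and n = 100 inputs, the latency per input is 4 ms and the total time is about 103 ms, versus 400 ms if each input had to pass through all stages before the next one started.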

Parallel Programming Models
Parallel programming models provide a framework for designing and implementing software that
can execute tasks concurrently on multiple processing units. These models abstract away some
of the complexities of hardware interaction, enabling developers to focus on application-level
parallelism.

Shared Memory Model (OpenMP)
The shared memory model is a parallel programming paradigm where multiple threads execute
within the same memory space. This model is particularly suited for systems with shared
memory architecture, such as symmetric multiprocessors (SMP).
Key Features:
• Global Address Space: All threads share the same memory, which eliminates the need for
explicit data transfer.
• Thread-Based Execution: Computations are divided among threads, which can access shared
variables.
• Synchronization Mechanisms: Required to prevent race conditions and ensure consistent data
(e.g., locks, barriers).
Core Components of OpenMP:
1. Directives: Pragma-based syntax for marking parallel regions (e.g., #pragma omp parallel).
2. Runtime Library Routines: Functions for managing threads, synchronization, and performance.
3. Environment Variables: Control runtime behavior (e.g., OMP_NUM_THREADS for setting the
number of threads).
Advantages:
• Simplicity in coding due to implicit communication.
• Suitable for fine-grained parallelism.

Message Passing Interface (MPI)
MPI is the most popular standard for distributed-memory parallel programming. It provides a
set of libraries for communication and synchronization across multiple processes.
Core Components of MPI:
1. Point-to-Point Communication: Direct communication between two processes (e.g., MPI_Send
and MPI_Recv).
2. Collective Communication: Communication involving all processes in a group (e.g.,
broadcast, scatter, gather).
3. Communicators: Define groups of processes that can communicate.
Advantages:
• High scalability; suitable for large-scale systems like clusters and supercomputers.
• Flexibility to run on heterogeneous systems.
Disadvantages:
• Increased complexity due to explicit communication.
• Overhead from data transfer and synchronization.
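A minimal OpenMP sketch of the shared memory model, assuming a C compiler with OpenMP support (compiled with, e.g., `gcc -fopenmp`); the array and the summation loop are illustrative.

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    static double x[N];
    double sum = 0.0;

    /* One directive parallelizes the loop; threads share x and combine
       their partial sums through the reduction clause. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        x[i] = (double)i;
        sum += x[i];
    }
    printf("sum = %f using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}
```

And a minimal MPI sketch of point-to-point communication with MPI_Send/MPI_Recv, assuming an MPI implementation such as MPICH or Open MPI is installed (run with at least two processes, e.g. `mpirun -np 2 ./a.out`).

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value = 42;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Process 0 sends one integer to process 1. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Process 1 receives it. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}
```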

Parallel Algorithms for Multiprocessors
Parallel algorithms leverage multiple processors to perform computations simultaneously,
enabling faster execution and efficient resource utilization. Multiprocessor systems,
characterized by shared or distributed memory architectures, are ideal platforms for
implementing such algorithms.

Divide-and-Conquer Algorithms
Divide-and-conquer is a strategy that breaks a problem into smaller subproblems, solves these
subproblems concurrently, and combines their results to form the solution to the original
problem.
Steps in Divide-and-Conquer:
1. Divide: Partition the input data into disjoint subsets.
2. Conquer: Solve each subset independently, often in parallel.
3. Combine: Merge the results of the subproblems.
Applications in Parallel Processing:
∙ Sorting: Parallel quicksort, parallel merge sort.
∙ Matrix Operations: Strassen's matrix multiplication.
∙ Numerical Problems: Parallel prefix sum, FFT (Fast Fourier Transform).
Example: Parallel Merge Sort (a code sketch follows after this section)
1. Divide the array into two halves.
2. Sort each half concurrently using recursive calls.
3. Merge the two sorted halves using parallel merging techniques.

Graph-Based Algorithms
Graph-based algorithms use the structure of a graph to parallelize computations. These
algorithms are crucial in solving problems like shortest path, network flow, and graph
traversal.
Key Graph-Based Parallel Algorithms:
1. Parallel Breadth-First Search (BFS): Used for traversing graphs level by level. Each level
can be explored concurrently by multiple processors.
2. Minimum Spanning Tree (MST): Algorithms like Borůvka's MST can be parallelized by
processing independent components simultaneously.
3. Parallel Dijkstra's Algorithm: Suitable for shortest-path computation in distributed
memory systems.
Graph Representation:
∙ Adjacency Matrix: Easy to parallelize but memory-intensive.
∙ Adjacency List: Efficient for sparse graphs but requires sophisticated load-balancing
strategies.
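Returning to the parallel merge sort outline above, here is a minimal sketch using OpenMP tasks for the two recursive halves, assuming a C compiler with OpenMP support; the sequential merge step and the 1000-element cutoff are illustrative simplifications rather than a tuned implementation.

```c
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sequential merge of a[lo..mid) and a[mid..hi) using scratch space tmp. */
static void merge(int *a, int *tmp, int lo, int mid, int hi) {
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof(int));
}

static void msort(int *a, int *tmp, int lo, int hi) {
    if (hi - lo < 2) return;
    int mid = lo + (hi - lo) / 2;
    /* Sort the two halves as independent tasks; small ranges stay sequential
       (the 1000-element cutoff is an arbitrary illustrative choice). */
    #pragma omp task shared(a, tmp) if (hi - lo > 1000)
    msort(a, tmp, lo, mid);
    #pragma omp task shared(a, tmp) if (hi - lo > 1000)
    msort(a, tmp, mid, hi);
    #pragma omp taskwait          /* both halves must finish before merging */
    merge(a, tmp, lo, mid, hi);
}

void parallel_merge_sort(int *a, int n) {
    int *tmp = malloc((size_t)n * sizeof(int));
    #pragma omp parallel
    #pragma omp single
    msort(a, tmp, 0, n);
    free(tmp);
}

int main(void) {
    int a[] = {5, 2, 9, 1, 7, 3, 8, 6, 4, 0};
    parallel_merge_sort(a, 10);
    for (int i = 0; i < 10; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}
```

The taskwait before merging plays the role of the synchronization step: both halves must be sorted before they can be combined.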

Performance of Parallel Algorithms

Speedup
Speedup is a measure of how much faster a parallel algorithm runs compared to its sequential
counterpart. It is defined as the ratio of the time taken to execute the sequential algorithm
to the time taken to execute the parallel algorithm.

Scalability
Scalability refers to the ability of a parallel algorithm to effectively use more processors
as they are added. There are two types of scalability:
∙ Strong scalability: How the execution time decreases as the number of processors increases,
for a fixed problem size.
∙ Weak scalability: How the algorithm handles increasing problem sizes as the number of
processors increases.
A parallel algorithm is said to be scalable if increasing the number of processors leads to a
proportional decrease in execution time or a proportional increase in problem size without
significant performance degradation.

Efficiency
Efficiency measures how effectively the processors are utilized in the parallel algorithm.
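In symbols (standard definitions, with T_s the sequential execution time and T_p the parallel execution time on p processors):

```latex
\text{Speedup: } S(p) = \frac{T_s}{T_p}, \qquad
\text{Efficiency: } E(p) = \frac{S(p)}{p} = \frac{T_s}{p\,T_p}
```

For example, if a job takes 100 s sequentially and 20 s on 8 processors, the speedup is 5 and the efficiency is 5/8 ≈ 0.625.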

Parallel Programming Languages

MPI (Message Passing Interface)
MPI is a standardized and portable message-passing system designed for parallel programming
in distributed-memory systems, where each processor has its own local memory. It is one of
the most widely used models for high-performance computing (HPC) applications.
∙ Key Features:
o MPI allows communication between processes running on different nodes or machines.
o It provides routines for sending and receiving messages between processes, which can be on
the same machine or distributed across different machines.
o MPI supports both point-to-point communication (i.e., between two processes) and collective
communication (i.e., involving multiple processes).
o It is language-independent but typically used with languages like C, C++, and Fortran.
MPI is designed to be highly scalable, making it suitable for supercomputers, clusters, and
grid computing.

OpenMP (Open Multi-Processing)
OpenMP is a parallel programming model for shared-memory systems, where multiple processors
or cores share the same memory space. OpenMP simplifies the process of parallel programming
by allowing developers to parallelize existing sequential programs with minimal changes.
∙ Key Features:
o OpenMP uses compiler directives, runtime library routines, and environment variables to
control parallelism.
o It supports multi-threading, where each thread operates on a shared memory space.
o It is most commonly used in C, C++, and Fortran, but the focus is on adding parallelism to
sequential code with simple annotations.
o OpenMP allows for implicit parallelism where loops or sections of code can be automatically
parallelized using a few simple commands.

CUDA (Compute Unified Device Architecture) for GPUs
CUDA is a parallel computing platform and programming model developed by NVIDIA for
general-purpose computing on GPUs (Graphics Processing Units). CUDA is designed to accelerate
computations by harnessing the massive parallelism of modern GPUs.
∙ Key Features:
o CUDA allows developers to write programs that execute parallel code on NVIDIA GPUs.
o It supports a programming model based on threads and blocks, where threads are grouped into
blocks, and blocks are organized into grids.
o CUDA provides APIs for working with large datasets, matrix computations, and other highly
parallel tasks.
o It supports both CPU-GPU parallelism and GPU-only parallelism for data-intensive tasks.

Solving Problems with Parallel Algorithms

Load Balancing Issues
Load balancing refers to the process of distributing tasks or computations evenly across
available processors in a parallel system. The goal is to ensure that no processor is
underutilized or overwhelmed, which would lead to inefficiencies and wasted resources.
Challenges:
∙ Uneven Task Distribution: If the work is not evenly distributed, some processors may finish
their tasks early while others may be overwhelmed. This leads to idle processors and slower
overall performance.
∙ Dynamic Workloads: In some cases, the amount of work assigned to each processor may change
during execution, especially when dealing with problems that involve dynamic inputs or
evolving data (e.g., real-time processing).
Strategies for Load Balancing:
∙ Static Load Balancing: Tasks are divided evenly at the start of the computation and remain
fixed throughout. This works well when the size of each task is known ahead of time and does
not change during execution.
∙ Dynamic Load Balancing: Work is distributed dynamically during execution. This is more
adaptable, especially when tasks take varying amounts of time to complete. A processor that
finishes its task early can pick up additional work from other processors that are still busy.
∙ Work Stealing: A technique where idle processors "steal" tasks from busy processors to
balance the load dynamically.

Deadlock and Synchronization Challenges
Deadlock occurs in parallel systems when two or more processes are blocked forever, waiting
for each other to release resources. In a parallel environment, deadlock is a serious issue
because it halts the progress of tasks and results in wasted computational power.
Causes of Deadlock:
∙ Resource Allocation: When multiple processes request resources (e.g., memory, I/O), and
each process holds one resource while waiting for another.
∙ Circular Waiting: A situation where processes form a cycle of dependencies, causing each
process to wait for another in the cycle.
Prevention of Deadlock:
∙ Resource Allocation Strategies: One approach is to allocate resources only if all required
resources are available, ensuring no partial allocations occur.
∙ Avoidance Algorithms: These algorithms ensure that a system does not enter a deadlock state
by preventing circular wait conditions.
∙ Timeouts: Implementing timeouts for waiting processes can help detect and recover from
potential deadlocks.
Synchronization Challenges:
Synchronization refers to the coordination of tasks and the correct order of operations in a
parallel system. Without proper synchronization, parallel tasks may access shared resources
at the wrong time, leading to inconsistent results or race conditions.

Race Conditions
A race condition occurs when two or more processes access shared data simultaneously, and the
final result depends on the order of execution. This can lead to unpredictable behavior,
where the outcome may vary each time the program is run.
Synchronization Strategies:
∙ Locks and Mutexes: These are mechanisms that allow only one process to access a shared
resource at a time.
∙ Semaphores: A signaling mechanism that controls access to shared resources, using signals
to indicate whether resources are available or not.
∙ Barriers: A synchronization method where processes wait until all of them reach a certain
point before proceeding.
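A minimal sketch of a race condition and one possible fix, assuming a C compiler with OpenMP support; the shared counters and the iteration count are illustrative.

```c
#include <omp.h>
#include <stdio.h>

#define ITERS 100000

int main(void) {
    long racy = 0, safe = 0;
    int nthreads = 1;

    #pragma omp parallel
    {
        #pragma omp single
        nthreads = omp_get_num_threads();

        for (int i = 0; i < ITERS; i++) {
            /* Race condition: unsynchronized read-modify-write on shared
               data, so increments from different threads can be lost. */
            racy++;

            /* Fix: a critical section acts as a lock/mutex, letting only
               one thread update the shared counter at a time. */
            #pragma omp critical
            safe++;
        }
    }
    printf("expected %ld, racy (unpredictable) %ld, safe %ld\n",
           (long)ITERS * nthreads, racy, safe);
    return 0;
}
```

For this particular increment pattern, `#pragma omp atomic` or a reduction clause would be lighter-weight alternatives to the critical section.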

Practice Questions-1
1. Define parallel processing and discuss its significance in modern computing environments.
2. Discuss the advantages, limitations, and scenarios where parallel processing outperforms
sequential approaches, with examples.
3. Compare and contrast Flynn's taxonomy with Feng's classification of parallel computers.
How does Feng's framework enhance the understanding of parallel processing systems?
4. Critically evaluate the role of parallel processing in emerging technologies such as
quantum computing and distributed systems.
5. Compare MPP and cluster computing.
6. Explain and compare multitasking, multiprocessing, and multithreading.
7. Parallel processing is widely used in various domains. Identify three real-world
applications of parallel processing and explain how its characteristics make it suitable for
these applications.
8. What are the primary challenges associated with implementing parallel processing systems?
Discuss trade-offs between complexity, cost, and performance in parallel computer
architecture design.
9. Describe the evolution of classifications for parallel processing systems from traditional
models like Flynn's taxonomy to hybrid models used in contemporary systems (e.g., cloud
computing and GPUs).
10. Analyze the significance of parallel processing in improving system performance. Discuss
the relationship between the degree of parallelism and scalability in parallel computing
architectures.

Practice Questions-2
1. Explain the architecture of Symmetric Multiprocessing (SMP) systems. Discuss the
advantages and challenges of using SMP for shared-memory applications. How does SMP ensure
memory consistency in concurrent processes?
2. Compare and contrast Symmetric Multiprocessing (SMP) and Massively Parallel Processing
(MPP) architectures. In which scenarios would MPP systems outperform SMP systems, and why?
Provide examples of applications suited for MPP systems.
3. Define cluster computing and its role in modern computing environments. Discuss the design
considerations for a high-performance cluster, including node configuration, network
topology, and fault tolerance mechanisms.
4. Analyze the differences between fog computing and cloud computing in terms of
architecture, latency, scalability, and resource management. Provide real-world examples
where fog computing is more advantageous than cloud computing and justify your reasoning.
5. Describe the key characteristics of grid computing and how it differs from cluster and
cloud computing. Discuss how resource allocation and job scheduling are managed in grid
computing environments, using specific algorithms or techniques as examples.
6. Explain the architecture of Symmetric Multiprocessing (SMP) and discuss how it handles
memory contention in multi-core processors. Provide an example of an application where SMP
systems are preferable and justify your choice.
7. Explain how parallel processing is implemented in Hyperconverged Infrastructure (HCI) and
traditional Three-Tier Computing. Discuss the following aspects:
   a. Resource pooling and scalability.
   b. Data locality and its impact on parallel processing performance.
   c. Challenges in maintaining fault tolerance and load balancing in both architectures.
   Provide a detailed comparison and suggest scenarios where each architecture would be most
   suitable for high-performance parallel computing workloads.
8. With the rise of hybrid cloud deployments, how can parallel processing models adapt to
leverage the benefits of both HCI and three-tier architectures? Propose a hybrid framework
combining strengths of both systems for optimal parallel processing.
10. Compare open clustering vs. closed clustering.
11. Explain various synchronization methods used in SMP and MPP.

Practice Questions-3
1. Explain the difference between data decomposition and task decomposition in parallel
programming. Provide examples where each technique would be most suitable.
2. Compare and contrast the shared memory model with the distributed memory model in parallel
programming. Highlight the challenges associated with each.
3. Explain the working of a parallel merge sort algorithm for multiprocessor systems. Provide
a pseudocode representation.
4. Discuss the key design considerations for implementing matrix multiplication on a
multiprocessor system. Include aspects such as load balancing and communication.
5. What is Amdahl's Law, and how does it impact the scalability of parallel algorithms?
Illustrate your answer with an example. Define speedup and efficiency in the context of
parallel algorithms. How do they help in evaluating the performance of a parallel algorithm?
6. Discuss the features of OpenMP that make it suitable for parallel programming. How does it
differ from MPI in terms of application scenarios?
7. Explain how CUDA facilitates parallel programming for GPUs. Provide an example of a simple
CUDA kernel for vector addition.
8. A dataset contains 1 billion integers, and you need to find the maximum value using a
parallel algorithm. Design an approach using a divide-and-conquer method and discuss its
complexity.
9. Consider a scenario where a large dataset needs to be processed for image recognition.
Analyze and compare the efficiency of task decomposition and data decomposition techniques
for this problem. Highlight the impact of inter-process communication overhead,
synchronization issues, and load balancing on the overall performance.
10. A parallel matrix multiplication algorithm is implemented on a multiprocessor system
using distributed memory architecture. Evaluate the performance of this algorithm using
metrics such as speedup, efficiency, and scalability. Additionally, discuss how the
performance would be affected if the number of processors is doubled but the problem size
remains constant (strong scaling).
Advanced Computer Architecture
Unit I

The End

Dr. Roshan Koju 2024 December 8/15


Week I & II
