Module 1
Parallel Computing
Definition: Parallel computing is a type of computation where many
calculations or processes are carried out simultaneously. Large problems
are divided into smaller ones, which are then solved concurrently.
• Types of Parallelism:
• Data Parallelism: Distributes data across different parallel
computing nodes and performs the same operation on each
node.
• Task Parallelism: Distributes tasks (operations) across
computing nodes, each possibly performing a different task.
• Architectures:
• Shared Memory: All processors access a common memory
space.
• Distributed Memory: Each processor has its own private
memory, and processors communicate through a network.
• Programming Models:
• Message Passing Interface (MPI): A standardized and
portable message-passing system designed to function on
parallel computing architectures.
• OpenMP: An API that supports multi-platform shared memory
multiprocessing programming.
• Synchronization: Ensuring that multiple parallel processes do not
interfere with each other. Techniques include locks, semaphores,
and barriers to manage access to shared resources (see the lock
sketch after this list).
• Applications: Used in various domains such as scientific
simulations, image processing, financial modeling, weather
forecasting, and big data analytics, where processing large datasets
or complex computations quickly is essential.
• Benefits and Challenges:
• Benefits: Significant reduction in computation time, ability to solve
complex problems, efficient utilization of resources.
• Challenges: Complexity in designing parallel algorithms, managing
data dependencies, ensuring load balancing, and handling
communication overhead.
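A minimal sketch of lock-based synchronization, using Python's standard threading module; the shared counter and thread count are illustrative choices, not taken from these notes:

import threading

counter = 0
lock = threading.Lock()

def increment(n):
    # The lock serializes updates so no increments are lost.
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 with the lock; often less without it

Without the lock, two threads can read the same counter value and both write back value + 1, losing an update; the lock makes the read-modify-write atomic.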
Parallel Processing Terminology
• Processor: A hardware unit capable of executing instructions. In
parallel processing, it can refer to CPU cores, GPU cores, or other
processing units working simultaneously to perform computations.
• Thread: The smallest unit of execution within a process. Threads
can run concurrently and share the same resources such as
memory, which allows them to execute tasks independently but
cooperatively.
• Task: A discrete unit of work or computation assigned to a
processor. Tasks are executed concurrently in a parallel processing
system, with each processor handling one or more tasks.
• Speedup: A metric for evaluating the performance gain of parallel
processing. It is the ratio of the time taken to solve a problem on a
single processor to the time taken on multiple processors, indicating
how much faster a task can be completed using parallelism.
• Scalability: The capacity of a parallel system to improve
performance proportionally with the addition of more processors.
Good scalability means that as more processors are added, the
system continues to achieve better performance efficiently.
• Granularity: Refers to the size of tasks in a parallel computing
system.
• Fine-Grained Parallelism: Involves small tasks that require
frequent communication and synchronization.
• Coarse-Grained Parallelism: Involves larger tasks with less
frequent communication and synchronization.
• Amdahl's Law: A formula used to find the maximum improvement
possible by improving a particular part of a system. It states that the
overall speedup of a task is limited by the sequential portion of the
task that cannot be parallelized. The formula is: Speedup = 1 / ((1 -
P) + (P / N)), where P is the fraction of the task that can be
parallelized, and N is the number of processors.
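As a quick illustration (the values P = 0.9 and N = 8 are chosen for this example, not taken from the notes), the formula can be evaluated directly:

def amdahl_speedup(p, n):
    # Amdahl's Law: speedup = 1 / ((1 - p) + p / n)
    return 1.0 / ((1.0 - p) + p / n)

print(amdahl_speedup(0.9, 8))         # ~4.71: 8 processors give less than 5x
print(amdahl_speedup(0.9, 1_000_000)) # ~10.0: the limit as N grows is 1/(1 - P)

Even with unlimited processors, a 10% sequential portion caps the speedup at 10x.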
Pipelining vs Data Parallelism
Pipelining:
• Definition: A technique where multiple instruction phases are
overlapped. It divides a process into stages, with each stage
completing a part of the process simultaneously.
• Stages: The process is broken down into sequential stages (e.g.,
instruction fetch, decode, execute, memory access, write-back in
CPU pipelines). Each stage processes different parts of multiple
instructions at once.
• Throughput: Increases throughput by executing different parts of
multiple instructions in parallel, reducing the overall time to complete
a sequence of instructions.
• Applications: Widely used in CPU architectures to improve
instruction execution efficiency and in signal processing for
continuous data flow processing.
• Advantages: Efficient use of CPU resources and increased instruction
throughput; the total time for a sequence of instructions falls even
though the latency of any single instruction does not.
• Challenges: Pipeline hazards such as data hazards (when
instructions depend on the results of previous ones), control hazards
(branching issues), and structural hazards (resource conflicts) need
to be managed.
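A minimal sketch of a two-stage software pipeline using Python threads and queues (the stage functions and item count are illustrative assumptions); while stage 2 transforms one item, stage 1 is already producing the next, which is how pipelining raises throughput:

import threading, queue

q1, q2 = queue.Queue(), queue.Queue()
SENTINEL = object()  # marks the end of the stream

def stage1():
    # Produce: push raw items into the pipeline.
    for x in range(5):
        q1.put(x)
    q1.put(SENTINEL)

def stage2():
    # Transform: square each item while stage1 keeps producing.
    while (x := q1.get()) is not SENTINEL:
        q2.put(x * x)
    q2.put(SENTINEL)

threads = [threading.Thread(target=f) for f in (stage1, stage2)]
for t in threads:
    t.start()
while (y := q2.get()) is not SENTINEL:
    print(y)  # 0 1 4 9 16
for t in threads:
    t.join()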
Data Parallelism:
• Definition: A technique where the same operation is performed on
different pieces of distributed data simultaneously. It involves dividing
data across multiple processing units.
• Operation: Each processor executes the same operation on its
portion of the data set. This can be achieved using SIMD (Single
Instruction, Multiple Data) architectures or parallel data processing
frameworks.
• Scalability: Highly scalable since adding more processors can
handle larger data sets without modifying the code significantly.
• Applications: Common in scientific computing, image and signal
processing, and large-scale data analysis, where operations can be
uniformly applied across large data sets.
• Advantages: High efficiency in processing large data sets, easy to
implement with modern parallel processing frameworks, good
scalability.
• Challenges: Requires uniformity in data and operations, potential for
load balancing issues if data is not evenly distributed, and may
involve significant inter-processor communication overhead.
Comparison:
• Process vs Data Focus: Pipelining focuses on dividing the
processing of a single task into stages, while data parallelism
focuses on performing the same task on different pieces of data
concurrently.
• Sequential vs Simultaneous Operations: Pipelining overlaps the
sequential stages of a task, so different stages of different tasks
execute at the same time, while data parallelism applies the same
operation to multiple data items simultaneously.
• Usage: Pipelining is mostly used in CPU architecture and specific
computational processes, whereas data parallelism is widely used in
parallel computing for large data sets.
Control Parallelism
Definition:
• Control parallelism, also known as task parallelism, refers to the
simultaneous execution of different tasks or threads, each
performing a distinct function within a larger computation. Unlike
data parallelism, where the same operation is applied to multiple
data elements simultaneously, control parallelism focuses on
distributing different tasks across multiple processors.
Task Distribution:
• Each processor or core is assigned a different task. These tasks
may have different computational loads and may execute different
instructions. This allows for concurrent execution of multiple parts of
a program, enhancing overall performance.
Synchronization:
• Requires synchronization mechanisms to manage dependencies
and ensure correct sequencing of tasks. Techniques such as locks,
semaphores, and barriers are commonly used to coordinate tasks
and avoid conflicts.
Applications:
• Widely used in operating systems (for managing different
processes), web servers (handling multiple client requests
simultaneously), and complex simulations where different
components of the simulation can run in parallel.
Advantages:
• Efficiency: Can significantly improve the efficiency and speed of
applications by utilizing multiple processors to perform different tasks
simultaneously.
• Flexibility: Allows for a flexible approach to parallelism,
accommodating various types of tasks and computations.
• Improved Resource Utilization: Ensures that all available
processing resources are utilized effectively, reducing idle time.
Challenges:
• Load Balancing: Ensuring that all processors have an
approximately equal amount of work can be challenging. Uneven
distribution can lead to some processors being overburdened while
others remain idle.
• Complexity: Writing and debugging programs with control
parallelism can be more complex due to the need to manage
dependencies and synchronization between tasks.
• Overhead: Synchronization and communication between tasks can
introduce overhead, potentially reducing the benefits of parallel
execution.
Control Parallel Approach
Definition:
• The control parallel approach, also known as task parallelism,
involves executing different tasks or threads concurrently, each
performing a distinct function. This method focuses on parallelizing
the control flow of a program rather than the data.
Task Division:
• The overall problem is decomposed into multiple, independent tasks.
Each task can be executed on a different processor or core, allowing
for concurrent execution of different parts of a program. These tasks
can vary in complexity and duration.
Concurrency:
• Tasks are executed simultaneously, leveraging multiple processing
units. This can lead to significant performance improvements,
especially in applications with multiple independent operations that
can be parallelized.
Synchronization and Communication:
• Requires careful synchronization to manage dependencies and data
sharing between tasks. Common synchronization mechanisms
include mutexes, semaphores, condition variables, and barriers to
ensure correct execution order and prevent race conditions.
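A minimal sketch of one of the mechanisms named above, a barrier, using Python's threading.Barrier (the worker count and two-phase structure are illustrative):

import threading

barrier = threading.Barrier(3)  # all 3 workers must arrive before any proceeds

def worker(name):
    print(f"{name}: phase 1 done")
    barrier.wait()  # block here until every worker reaches this point
    print(f"{name}: phase 2 starts")

threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

No worker enters phase 2 until all have finished phase 1, which is exactly the ordering guarantee a barrier provides.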
Applications:
• Control parallelism is widely used in real-time systems, multi-
threaded applications, web servers, and complex simulations. For
example, in a web server, handling different client requests
concurrently, or in simulations where different parts of the simulation
run in parallel.
Advantages:
• Performance: By executing tasks concurrently, the overall execution
time can be reduced significantly.
• Flexibility: Allows different parts of a program to be developed and
executed independently.
• Resource Utilization: Maximizes the use of available processing
resources by distributing different tasks across processors.
Challenges:
• Load Balancing: Ensuring an even distribution of tasks among
processors to avoid some processors being idle while others are
overloaded.
• Complexity: Writing and maintaining control parallel programs can
be complex due to the need for synchronization and managing
dependencies.
• Overhead: Synchronization and communication between tasks can
introduce overhead, which can diminish the performance gains from
parallel execution.
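A minimal task-parallel sketch using Python's concurrent.futures; the three functions are hypothetical stand-ins for distinct tasks, not an API from these notes:

import concurrent.futures

# Three different tasks: control parallelism runs distinct work concurrently.
def parse_log():
    return "parsed 120 lines"

def compute_stats():
    return sum(i * i for i in range(100_000))

def render_report():
    return "report rendered"

with concurrent.futures.ThreadPoolExecutor() as pool:
    futures = [pool.submit(f) for f in (parse_log, compute_stats, render_report)]
    for fut in futures:
        print(fut.result())  # each result comes from a different task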
Comparison with Data Parallelism:
• Focus: Control parallelism focuses on dividing tasks, whereas data
parallelism focuses on dividing data.
• Granularity: Control parallelism typically deals with larger tasks,
while data parallelism deals with smaller, uniform data elements.
• Use Cases: Control parallelism is suited for applications with
distinct, independent tasks, while data parallelism is ideal for
applications that apply the same operation to large data sets.
Data Parallel Approach
Definition:
• The data parallel approach involves distributing data across multiple
processing units and performing the same operation on each data
element simultaneously. This approach focuses on parallelizing
operations over large data sets rather than the control flow of a
program.
Data Distribution:
• The data set is divided into smaller chunks or partitions, with each
chunk assigned to a different processor or core. Each processor
performs the same operation on its respective data chunk, allowing
for concurrent data processing.
Operations:
• The same operation or function is applied to different data elements
in parallel. For example, in an image processing application, the
same filter might be applied to different parts of an image
simultaneously.
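A minimal data-parallel sketch using Python's multiprocessing.Pool; the square operation, data size, and chunking are illustrative assumptions:

import multiprocessing as mp

def square(x):
    # The single operation applied, in parallel, to every data element.
    return x * x

if __name__ == "__main__":
    data = list(range(16))
    with mp.Pool(processes=4) as pool:
        # The pool partitions the data; each worker runs the same
        # function on its chunk concurrently.
        result = pool.map(square, data, chunksize=4)
    print(result)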
Scalability:
• Data parallelism is highly scalable. Adding more processors or cores
can handle larger data sets or perform more operations concurrently,
with minimal changes to the program’s structure.
Synchronization:
• Generally involves minimal synchronization compared to control
parallelism. The main synchronization concern is managing the
aggregation of results from different processors, especially if tasks
involve reducing or combining results.
Applications:
• Widely used in scientific computing, big data analytics, machine
learning, and image processing. For example, applying the same
transformation to each pixel in an image or processing large-scale
datasets in parallel.
Advantages:
• Efficiency: Provides high efficiency in processing large data sets by
leveraging parallel execution.
• Simplicity: Easier to implement and reason about compared to
control parallelism, as the same operation is performed on each data
element.
• Scalability: Easily scalable by adding more processors to handle
larger data sets or more complex operations.
Challenges:
• Data Partitioning: Efficiently dividing data to ensure balanced
workloads across processors can be challenging. Uneven data
distribution can lead to some processors being overloaded while
others are idle.
• Communication Overhead: In cases where processors need to
share intermediate results or synchronize, communication overhead
can impact performance.
• Load Balancing: Ensuring that each processor has an equal
amount of work to prevent bottlenecks and inefficiencies.
Data Parallel Approach with I/O Parallel Reduction
Definition:
• Data Parallel Approach: This approach involves performing the
same operation simultaneously on different chunks of data
distributed across multiple processors. It focuses on parallelizing
computations over data rather than control flow.
• I/O Parallel Reduction: A technique used to combine or reduce data
results from multiple processing units. It is often employed to handle
large volumes of data efficiently by distributing input/output
operations and aggregation tasks.
Data Parallel Approach:
• Data Distribution: Large datasets are partitioned into smaller
chunks, which are processed concurrently by different processors or
cores. Each processor performs the same operation on its assigned
chunk of data.
• Operations: The same function or computation is applied to each
data chunk in parallel, such as applying a filter to image pixels or
performing numerical calculations on data arrays.
• Scalability: The approach is highly scalable, as increasing the
number of processors can handle larger datasets or more complex
operations with minimal changes to the program.
I/O Parallel Reduction:
• Purpose: To efficiently combine or reduce the results of parallel
operations. This is crucial when dealing with large datasets that are
processed in parallel and need to be aggregated into a final result.
• Process:
• Parallel Reduction: Involves applying a reduction operation
(e.g., summing values, finding maximum/minimum) across data
processed by different processors. Initially, local reductions are
performed within each processor, followed by global reduction
to combine these partial results.
• I/O Parallelism: Manages input/output operations concurrently,
often by distributing data read/write tasks across multiple I/O
channels or devices to prevent bottlenecks and improve
throughput.
• Techniques:
• Local Reduction: Each processor performs reduction
operations on its local data, resulting in partial results.
• Global Reduction: Partial results from all processors are
combined to produce the final aggregated result. This may
involve hierarchical reduction stages to efficiently merge results.
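A minimal sketch of the local-then-global reduction described above, again using multiprocessing (the four-way partition and the sum operation are illustrative):

import multiprocessing as mp

def local_sum(chunk):
    # Local reduction: each worker reduces only its own partition.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000))
    chunks = [data[i::4] for i in range(4)]  # partition across 4 workers
    with mp.Pool(processes=4) as pool:
        partials = pool.map(local_sum, chunks)  # local reductions in parallel
    total = sum(partials)  # global reduction combines the partial results
    print(total)  # 499500, the same as sum(data)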
Advantages:
• Efficiency: Reduces overall processing time by leveraging parallel
execution of I/O operations and data reduction tasks. This is
particularly useful for applications with large-scale data processing
and aggregation needs.
• Improved Throughput: Distributing I/O operations helps avoid
bottlenecks associated with sequential I/O processing, enhancing
overall system performance.
• Scalability: Handles increasing data volumes and processing
requirements effectively by distributing both computation and I/O
tasks.
Challenges:
• Data Partitioning: Efficiently dividing data and balancing workloads
across processors can be complex. Uneven distribution may lead to
some processors being overloaded while others are underutilized.
• Synchronization: Managing the coordination and aggregation of
partial results from multiple processors can introduce
synchronization overhead.
• I/O Bottlenecks: Even with parallel I/O, the system may encounter
bottlenecks if the I/O infrastructure cannot keep up with the demands
of parallel processing.
Applications:
• Scientific Computing: Large-scale simulations and data analysis
where results from multiple parallel computations need to be
aggregated.
• Big Data Analytics: Processing and summarizing large datasets
where data needs to be distributed, computed in parallel, and
reduced to a final result.
Prefix Sums
Definition:
• Prefix sums are used to compute the cumulative sum of elements in
an array up to each index. For an array A, the prefix sum array
P is defined such that P[i] is the sum of the elements
A[0] through A[i].
Algorithm:
• Initialization: Set P[0] = A[0].
• Iteration: For i = 1 to n-1, set P[i] = P[i-1] + A[i].
• Time Complexity: O(n), where n is the number of
elements in the array.
Applications:
• Efficiently answering range sum queries. For example, the sum of
elements from index l to r can be computed as P[r] - P[l-1],
treating P[-1] as 0 when l = 0.
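A minimal sketch of the sequential algorithm and a range sum query (the array values are illustrative):

def prefix_sums(a):
    # P[i] = A[0] + ... + A[i], built in O(n).
    p = [0] * len(a)
    p[0] = a[0]
    for i in range(1, len(a)):
        p[i] = p[i - 1] + a[i]
    return p

def range_sum(p, l, r):
    # Sum of A[l..r] via P[r] - P[l-1], with P[-1] taken as 0.
    return p[r] - (p[l - 1] if l > 0 else 0)

a = [3, 1, 4, 1, 5, 9]
p = prefix_sums(a)         # [3, 4, 8, 9, 14, 23]
print(range_sum(p, 2, 4))  # 4 + 1 + 5 = 10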
Advantages:
• Simplifies range sum queries and can be used to optimize various
array processing tasks.
Challenges:
• Requires additional space proportional to the size of the input array,
i.e., O(n).
List Ranking
Definition:
• List ranking is the process of determining the position or rank of
each element in a linked list, typically each node's distance from
the end (or head) of the list.
Algorithm:
• Sequential Approach: Traverse the list once from the head, assigning
each node its rank along the way (O(n) time).
• Parallel Approach: Use pointer jumping (pointer doubling), in which
every node repeatedly replaces its successor pointer with its
successor's successor while accumulating ranks, finishing in
O(log n) rounds (sketched below).
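A minimal sketch of pointer jumping on an array-encoded list; the inner loop below simulates sequentially what all nodes would do simultaneously on a parallel machine (the list encoding is an illustrative assumption):

def list_rank(nxt):
    # nxt[i] is the successor of node i; -1 marks the tail.
    # rank[i] ends up as the distance from node i to the tail.
    n = len(nxt)
    rank = [0 if nxt[i] == -1 else 1 for i in range(n)]
    nxt = list(nxt)
    changed = True
    while changed:  # O(log n) rounds; each round is one parallel step
        changed = False
        new_rank, new_nxt = rank[:], nxt[:]
        for i in range(n):  # conceptually, all nodes update at once
            if nxt[i] != -1:
                new_rank[i] = rank[i] + rank[nxt[i]]
                new_nxt[i] = nxt[nxt[i]]
                changed = True
        rank, nxt = new_rank, new_nxt
    return rank

print(list_rank([1, 2, 3, -1]))  # [3, 2, 1, 0] for the list 0 -> 1 -> 2 -> 3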
Applications:
• Useful in various parallel computing problems and in algorithms
requiring list manipulation and indexing.
Advantages:
• Efficiently determines the length of segments in linked lists, which
can be crucial for certain algorithms.
Challenges:
• Parallel implementations require careful management of shared data
and synchronization.
Graph Coloring
Definition:
• Graph coloring involves assigning colors to vertices of a graph such
that no two adjacent vertices share the same color.
Algorithm:
• Greedy Coloring: Assign colors sequentially, choosing the smallest
available color that does not conflict with adjacent vertices.
• Backtracking: Explore possible color assignments and backtrack if
a conflict is found.
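A minimal sketch of greedy coloring (the example graph, a 4-cycle, is an illustrative choice):

def greedy_coloring(adj):
    # Assign each vertex the smallest color unused by its neighbors.
    color = {}
    for v in adj:  # visiting order affects how many colors greedy uses
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # a 4-cycle
print(greedy_coloring(adj))  # {0: 0, 1: 1, 2: 0, 3: 1}: two colors suffice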
Applications:
• Used in scheduling problems, map coloring, and register allocation
in compilers.
Advantages:
• Helps in solving real-world problems involving conflicts and resource
allocation.
Challenges:
• Determining the minimum number of colors needed (chromatic
number) is NP-hard for general graphs.