
HPC UNIT 2

Decomposition:
1. Dividing a computation into smaller sub-computations that can be executed in parallel is called decomposition.
2. Decomposition speeds up the overall computation.
Task:
1. A task is a programmer-defined unit of computation.
2. Tasks are generated by subdividing the main computation through decomposition.
Dependency Graph: -
1. A decomposition can be illustrated in the form of a directed graph, with nodes corresponding to tasks and edges indicating that the result of one task is required for processing the next.
2. Such a graph is called a task dependency graph.
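
A minimal Python sketch of a task dependency graph, assuming a hypothetical three-task decomposition of (a + b) * (c + d); graphlib (standard library, Python 3.9+) reports which tasks are ready, and tasks that become ready together can run in parallel:

from graphlib import TopologicalSorter

# Hypothetical tasks for computing (a + b) * (c + d):
#   t1 = a + b, t2 = c + d, t3 = t1 * t2
# Each node maps to the set of tasks whose results it needs.
graph = {
    "t1": set(),
    "t2": set(),
    "t3": {"t1", "t2"},
}

ts = TopologicalSorter(graph)
ts.prepare()
while ts.is_active():
    ready = list(ts.get_ready())          # tasks whose dependencies are satisfied
    print("can run in parallel:", ready)  # first t1 and t2 together, then t3
    ts.done(*ready)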

Parallel Algorithm Models: -


Master-Slave Model:
1. Focuses on task distribution and centralized control.
2. Working: - 1) Master: assigns tasks. 2) Slave: executes tasks and reports back.
3. Control: Centralized; the master dictates the tasks.
Example: In a rendering farm, the master assigns rendering tasks to slave computers. Slaves render frames and send
them back to the master for assembly.
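
A minimal Python sketch of the master-slave model, assuming multiprocessing.Pool supplies the slave processes and render_frame is a hypothetical stand-in for the real rendering work; the main process acts as the master that assigns tasks and assembles the results:

from multiprocessing import Pool

def render_frame(frame_number):
    # Slave: executes its assigned task and reports the result back.
    return f"frame {frame_number:03d} rendered"

if __name__ == "__main__":
    frames = range(8)                                # work the master wants done
    with Pool(processes=4) as pool:                  # four slave processes
        rendered = pool.map(render_frame, frames)    # master assigns the tasks
    print("\n".join(rendered))                       # master assembles the results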

Data-Parallel Model:
1. Performs the same operation on different data concurrently.
2. Working: - Every processor performs the same operation on its assigned data.
3. Control: Centralized control for data distribution.
Example: Calculating the average temperature across multiple weather stations. Each processor computes a partial result (the sum and count of its station's readings), and the main system combines these into the overall average.
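
A minimal Python sketch of the data-parallel model: every worker applies the same operation (a partial sum and count) to its own station's readings, and the main process combines the partial results. The station readings are hypothetical:

from multiprocessing import Pool

def partial_stats(readings):
    # The same operation is applied to every chunk of data.
    return sum(readings), len(readings)

if __name__ == "__main__":
    stations = [                        # hypothetical temperature readings per station
        [21.0, 22.5, 23.0],
        [18.0, 19.5],
        [25.0, 24.0, 26.5, 25.5],
    ]
    with Pool(processes=3) as pool:
        parts = pool.map(partial_stats, stations)
    total, count = map(sum, zip(*parts))          # combine the partial results
    print("overall average:", total / count)

Combining sums and counts (rather than averaging the per-station averages) keeps the result exact even when stations have different numbers of readings.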

Task-Parallel Model:
1. Executes independent tasks concurrently.
2. Working: - Processors handle different tasks.
3. Control: Can be centralized or decentralized.
Example: Web server handling multiple requests. Each request is a task, and they can be processed concurrently.
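
A minimal Python sketch of the task-parallel model, treating each incoming request as an independent task; serve_page and run_query are hypothetical handlers, and a thread pool processes the unrelated tasks concurrently:

from concurrent.futures import ThreadPoolExecutor

def serve_page(path):
    return f"200 OK: contents of {path}"

def run_query(sql):
    return f"200 OK: results of '{sql}'"

if __name__ == "__main__":
    # Each (handler, argument) pair is one independent task.
    requests = [(serve_page, "/index.html"),
                (run_query, "SELECT name FROM users"),
                (serve_page, "/about.html")]
    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = [executor.submit(handler, arg) for handler, arg in requests]
        for future in futures:
            print(future.result())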

Characteristics of Tasks: -
1. Task Generation:
• Static: All tasks are known before execution starts (think of a grocery list you make beforehand).
• Dynamic: Tasks are created during execution (like encountering unexpected ingredients while cooking).
2. Task Sizes:
• Uniform: All tasks require roughly the same amount of work.
• Non-uniform: Tasks have varying workloads.
3. Knowledge of Task Sizes:
• Known: The workload of each task is known upfront.
• Unknown: The workload is not known until runtime.
4. Size of Data Associated with Tasks:
• Small tasks require minimal data to execute.
• Large tasks involve processing significant amounts of data.
Characteristics of Interactions:
1. Read-only vs. Read-write Interactions:
• Read-only: Tasks only access data from other tasks without modifying it.
• Read-write: Tasks read and modify the data associated with other tasks.
2. One-way vs. Two-way Interactions:
• One-way: A task initiates communication and completes it without further exchange.
• Two-way: Tasks engage in back-and-forth communication.

Decomposition Techniques

Recursive Decomposition:

1. Concept: Divides the problem into smaller sub-problems recursively until each can be solved by a single processor.

2. Recursive decomposition utilizes multiple processors efficiently for sorting large datasets.

3. Implementation: Each processor works on a subset of data, recursively applying quicksort until sorted.

4. Example: Parallel quicksort for sorting a large dataset of numbers.
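
A minimal Python sketch of recursive decomposition via parallel quicksort, kept to one level of decomposition for simplicity: the array is partitioned around a pivot, and the two independent sub-problems are sorted by separate worker processes:

from multiprocessing import Pool
import random

def quicksort(data):
    # Ordinary sequential quicksort, run inside each worker.
    if len(data) <= 1:
        return data
    pivot = data[0]
    left = [x for x in data[1:] if x < pivot]
    right = [x for x in data[1:] if x >= pivot]
    return quicksort(left) + [pivot] + quicksort(right)

def parallel_quicksort(data, workers=2):
    # One level of recursive decomposition: partition around a pivot,
    # then sort the two independent sub-problems in separate processes.
    if len(data) <= 1:
        return data
    pivot = data[0]
    subproblems = [[x for x in data[1:] if x < pivot],
                   [x for x in data[1:] if x >= pivot]]
    with Pool(workers) as pool:
        left_sorted, right_sorted = pool.map(quicksort, subproblems)
    return left_sorted + [pivot] + right_sorted

if __name__ == "__main__":
    numbers = [random.randint(0, 999) for _ in range(20)]
    print(parallel_quicksort(numbers))

Stopping the parallel recursion after the first partition avoids nesting process pools inside workers; deeper decomposition would instead reuse one pool with a recursion-depth cutoff.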

Data Decomposition:

1. Concept: Divides data into smaller chunks distributed to different processors.

2. Data decomposition enables parallel processing for matrix operations, improving efficiency.

3. Implementation: Each processor multiplies its assigned sub-matrices independently; the partial results are then combined.

4. Example: Parallel matrix multiplication of large matrices A and B.
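
A minimal Python sketch of data decomposition for matrix multiplication: the rows of A are split into chunks, each worker multiplies its chunk by B, and the partial results are concatenated into the product (plain nested lists keep the sketch dependency-free):

from multiprocessing import Pool

def multiply_chunk(args):
    # Each worker receives some rows of A together with all of B.
    rows, B = args
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in rows]

def parallel_matmul(A, B, workers=2):
    chunk = (len(A) + workers - 1) // workers          # rows of A per worker
    tasks = [(A[i:i + chunk], B) for i in range(0, len(A), chunk)]
    with Pool(workers) as pool:
        partial_results = pool.map(multiply_chunk, tasks)
    return [row for part in partial_results for row in part]

if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6], [7, 8]]
    B = [[1, 0], [0, 1]]                               # identity, so the product equals A
    print(parallel_matmul(A, B))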

Exploratory Decomposition:

1. Concept: Divides the problem space into sub-spaces for concurrent exploration.

2. Speeds up search for optimal solutions by utilizing multiple processors.

3. Implementation: Each processor explores its sub-space, identifies promising solutions, and communicates
findings for refinement.

4. Example: Parallel parameter space search for optimizing a scientific model.
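
A minimal Python sketch of exploratory decomposition, assuming a hypothetical one-dimensional model_error to minimise: the parameter space is split into sub-ranges, each worker searches its own sub-range, and the main process keeps the best solution found overall:

from multiprocessing import Pool

def model_error(x):
    # Hypothetical objective to minimise; the true optimum is at x = 3.7.
    return (x - 3.7) ** 2 + 1.0

def search_subspace(bounds):
    low, high = bounds
    candidates = [low + i * (high - low) / 100 for i in range(101)]
    best = min(candidates, key=model_error)
    return best, model_error(best)        # most promising solution in this sub-space

if __name__ == "__main__":
    subspaces = [(0.0, 2.5), (2.5, 5.0), (5.0, 7.5), (7.5, 10.0)]   # one per processor
    with Pool(processes=4) as pool:
        findings = pool.map(search_subspace, subspaces)
    best_x, best_err = min(findings, key=lambda f: f[1])
    print(f"best parameter ~ {best_x:.2f}, error ~ {best_err:.2f}")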

Limitations of parallelizing any algorithm: -


1. Amdahl's Law: the inherently sequential fraction of the work limits the achievable speedup (a worked example follows this list).
2. Communication overhead slows parallel processing.
3. Load imbalance reduces overall efficiency.
4. Memory bottleneck restricts scalability.
5. Synchronization complexity affects performance.
6. Not all algorithms parallelize effectively.
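
A short worked example of Amdahl's Law: if a fraction s of the work is inherently serial and p processors are used, speedup = 1 / (s + (1 - s) / p), so the speedup can never exceed 1 / s. Assuming a serial fraction of 10%:

def amdahl_speedup(serial_fraction, processors):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

if __name__ == "__main__":
    s = 0.10                                  # assume 10% of the work is inherently serial
    for p in (2, 4, 16, 1024):
        print(f"p = {p:4d}: speedup = {amdahl_speedup(s, p):.2f}")
    # As p grows, the speedup approaches 1 / s = 10x, no matter how many
    # processors are added.
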
Mapping Techniques for Load Balancing: -
Mapping techniques for load balancing in high-performance computing ensure tasks are distributed evenly across processors to maximize efficiency.

1. Static Mapping:
• Tasks are assigned to processors before execution.
• The allocation criteria are predetermined, such as task size or complexity.
• Implementation is straightforward, reducing complexity.
• However, load imbalance is possible if the tasks vary greatly in size.
2. Dynamic Mapping:
• Task assignments are adjusted during runtime based on system conditions.
• Processor performance and workload distribution are monitored constantly.
• Tasks are reallocated to ensure a balanced load across processors.
• Offers adaptability to changing workloads, minimizing idle time.
• However, it introduces overhead due to constant monitoring and decision-making (the sketch below contrasts the two approaches).
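
A minimal Python sketch contrasting the two approaches, assuming hypothetical tasks with known but uneven costs: static round-robin mapping fixes the assignment before execution, while the simulated dynamic mapping always gives the next task to the worker that becomes free first:

from collections import deque

task_costs = [5, 1, 1, 1, 8, 1, 1, 2]       # hypothetical, uneven per-task workloads
workers = 2

# Static mapping: round-robin assignment fixed before execution.
static = {w: [] for w in range(workers)}
for i, cost in enumerate(task_costs):
    static[i % workers].append(cost)
print("static load per worker:", {w: sum(c) for w, c in static.items()})

# Dynamic mapping (simulated): the worker that becomes free first
# takes the next pending task.
pending = deque(task_costs)
load = [0] * workers
while pending:
    w = load.index(min(load))
    load[w] += pending.popleft()
print("dynamic load per worker:", dict(enumerate(load)))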

Classification of Dynamic Mapping Techniques: -

Centralized vs. Distributed:

1. Centralized: Master assigns tasks, simpler but can be a bottleneck.

2. Distributed: Processes communicate directly, scalable but more complex.

Deterministic vs. Non-deterministic:

1. Deterministic: Rule-based task assignment, predictable but less adaptable.

2. Non-deterministic: Adaptive approaches, potentially better balancing but less predictable.

Informational Requirements:

1. Sender-initiated: Overloaded processes notify others so that some of their tasks can be reassigned.

2. Receiver-initiated: Idle processes request tasks, which helps balance the load (see the work-queue sketch after this classification).

3. Bilateral: Combines sender and receiver initiation for proactive balancing.

Communication Overhead:

1. Low-overhead: Minimizes communication, suitable for cost-sensitive scenarios.

2. High-overhead: Detailed information gathering for better balancing, but requires more communication.
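
A minimal Python sketch of receiver-initiated dynamic mapping using a shared work queue: idle workers pull the next task as soon as they finish their current one, so faster workers naturally take on more work. Squaring a number stands in for the real task:

from multiprocessing import Process, Queue

def worker(task_queue, result_queue):
    while True:
        task = task_queue.get()                   # idle worker requests the next task
        if task is None:                          # sentinel value: no more work
            break
        result_queue.put((task, task * task))     # squaring stands in for real work

if __name__ == "__main__":
    tasks, num_workers = list(range(20)), 4
    task_queue, result_queue = Queue(), Queue()
    for t in tasks:
        task_queue.put(t)
    for _ in range(num_workers):
        task_queue.put(None)                      # one sentinel per worker
    procs = [Process(target=worker, args=(task_queue, result_queue))
             for _ in range(num_workers)]
    for p in procs:
        p.start()
    results = [result_queue.get() for _ in tasks] # collect before joining
    for p in procs:
        p.join()
    print(sorted(results))
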
Different anomalies in parallel algorithms

1. Slow Performance: Parallel processing can be slower due to communication overhead and workload imbalances.

2. Reverse Speedup: Adding more processors can make a task run slower, especially for small problems or limited
parallelization.

3. Sequential Bottleneck: Inherent sequential work limits overall speedup, as per Amdahl's Law.

4. Data Sharing Issues: Shared-memory systems encounter performance problems due to data access conflicts and
cache invalidation.

5. Synchronization Overhead: Excessive coordination between processors using locks or barriers can slow down
processing.

6. Algorithmic Mismatch: Some algorithms are unsuitable for parallel execution, resulting in unexpected behaviour.

Principles of Parallel Algorithm Design

1. Designing parallel algorithms involves effectively utilizing multiple processors to solve a problem more efficiently
than a sequential approach.
2. Decomposition: Break the problem into smaller, independent tasks or data chunks.
3. Mapping: Assign these tasks or data chunks to processors, aiming for balanced workloads.
4. Communication: Minimize communication between processors to reduce overhead.
5. Synchronization: Coordinate processor execution using synchronization mechanisms sparingly.
6. Algorithmic Suitability: Choose algorithms suitable for parallelization, considering inherent parallelism and
avoiding bottlenecks.

Different methods for containing Interaction Overheads

1. Reduce Communication Costs: Keep frequently used data close to the processors that use it to minimize communication.
2. Optimize Communication Patterns: Minimize data contention and overlap communication with computation.
3. Strategic Data Management: Carefully replicate often-used data on each processor to avoid repeated communication, but be cautious of increased memory usage (see the sketch after this list).
4. Utilize System Optimizations: Use pre-optimized communication libraries for efficient data exchange patterns
like broadcasting and gathering.
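
A minimal Python sketch of strategic data replication, assuming a hypothetical read-only lookup table: it is copied once into each worker through the Pool initializer instead of being sent with every task, trading extra memory per worker for less communication:

from multiprocessing import Pool

_lookup = None                                    # per-worker replica of the shared data

def init_worker(lookup):
    # Replicate the read-only table once, when the worker process starts.
    global _lookup
    _lookup = lookup

def score(item):
    return _lookup[item] * 2                      # uses the local replica, no extra communication

if __name__ == "__main__":
    lookup = {i: i * 10 for i in range(1000)}     # hypothetical often-used data
    with Pool(processes=4, initializer=init_worker, initargs=(lookup,)) as pool:
        print(pool.map(score, [1, 5, 42]))        # -> [20, 100, 840]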
