24csppc202 Multicore Architecture and Programming
PART-A
2. Data Parallelism:
• Concept:
Splitting large datasets into smaller chunks and assigning each chunk to a different core
for processing.
• Example:
Processing a large image by dividing it into tiles and assigning each tile to a different
core.
• Benefits:
Simple to implement and can significantly speed up computations on large datasets.
• Tools:
Libraries and APIs such as OpenMP (for shared-memory parallelism in C/C++) and MPI
(for distributed-memory systems) can facilitate data parallelism.
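As an illustration, the following minimal OpenMP sketch (the array size and names are illustrative) divides the iterations of a loop over a large array among the available cores:
#include <omp.h>
#include <stdio.h>

#define N 1000000                 /* illustrative dataset size */

int main(void) {
    static double data[N];
    /* Each thread processes its own chunk of the array (data parallelism). */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        data[i] = data[i] * 2.0;  /* per-element work */
    }
    printf("Processed %d elements using up to %d threads\n", N, omp_get_max_threads());
    return 0;
}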
12.B Discuss deadlocks and livelocks in parallel programming. Explain various techniques
to prevent them.
Deadlocks and livelocks are concurrency problems that can occur in parallel programming,
hindering progress. Deadlocks are situations where two or more processes are blocked
indefinitely, waiting for each other to release resources, while livelocks involve processes
repeatedly changing their state in response to each other, but without making any actual
progress. Several techniques can be employed to prevent these issues.
Deadlocks
Deadlocks occur when two or more processes are stuck in a circular wait, each holding a
resource that the other needs to proceed. This can happen when processes acquire locks in
inconsistent orders, hold resources while waiting for others, and when resources cannot be preempted.
Techniques to Prevent Deadlocks:
• Lock Ordering:
Acquire locks in one consistent global order in every process so that a circular wait
cannot form.
• Eliminate Hold and Wait:
Ensure that a process either acquires all resources it needs at once or releases all
held resources before waiting for another.
• Timeout Mechanisms:
Use timeouts on resource acquisition so that a process that cannot obtain a resource
within a bounded time releases what it holds and retries later.
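As a concrete illustration of consistent lock ordering, here is a minimal sketch using POSIX threads (the mutex names are illustrative): every thread locks lock_a before lock_b, so a circular wait cannot occur.
#include <pthread.h>

pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

/* Every thread acquires the locks in the same global order: lock_a, then lock_b.
   Because no thread ever holds lock_b while waiting for lock_a, a circular wait
   (and therefore a deadlock) cannot form. */
void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
    /* ... use both shared resources ... */
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}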
Livelocks
Livelocks are similar to deadlocks, but the processes involved are not blocked; instead they
keep changing their state in response to each other without making any real progress. This
often happens when processes repeatedly retry operations that keep failing.
Techniques to Prevent Livelocks:
• Random Backoff:
Make each process wait for a random interval before retrying a failed operation, so
that competing processes stop reacting to each other in lockstep.
• Semaphores:
Use semaphores to control concurrency and limit the number of processes
accessing shared resources.
• Failure Handling and Retry Limits:
Ensure that transactions are designed to handle potential failures and avoid
infinite retry loops.
• Code Refactoring:
Refactor code to reduce the need for frequent retries that can contribute to livelocks.
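A minimal sketch of random backoff with POSIX threads (the timing values are illustrative): instead of immediately retrying a failed trylock, each thread backs off for a random interval, breaking the symmetric retry pattern that causes livelock.
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

pthread_mutex_t resource = PTHREAD_MUTEX_INITIALIZER;

void access_resource(void) {
    /* Keep trying to take the lock, but back off for a random interval between
       attempts so that competing threads do not keep retrying in lockstep. */
    while (pthread_mutex_trylock(&resource) != 0) {
        usleep((useconds_t)(rand() % 1000));   /* wait 0-999 microseconds */
    }
    /* ... critical section work ... */
    pthread_mutex_unlock(&resource);
}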
13.A) Describe different OpenMP directives and their usage with examples.
1. parallel Directive
• Purpose: Defines a parallel region where a team of threads executes the enclosed code
block concurrently.
• Syntax:
#pragma omp parallel
{
// Code executed by multiple threads
}
Example:
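A minimal complete program (the printed message is illustrative):
#include <omp.h>
#include <stdio.h>

int main(void) {
    #pragma omp parallel
    {
        printf("Hello from thread %d\n", omp_get_thread_num());
    }
    return 0;
}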
2. for Directive
• Purpose: Distributes loop iterations among threads in a parallel region.
• Syntax:
#pragma omp for
for (int i = 0; i < N; i++) {
// Loop body
}
Example:
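For instance, the following sketch (the array size N and names are illustrative) distributes the loop iterations among the threads:
#include <omp.h>

#define N 1000

int main(void) {
    int a[N];
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < N; i++) {
            a[i] = i * i;      /* iterations are shared among the threads */
        }
    }
    return 0;
}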
3. sections Directive
• Purpose: Splits the enclosed code into sections, each executed by a different thread.
• Syntax:
#pragma omp sections
{
#pragma omp section
// Section 1 code
#pragma omp section
// Section 2 code
}
Example:
#pragma omp parallel sections
{
#pragma omp section
printf("Section 1 executed by thread %d\n", omp_get_thread_num());
#pragma omp section
printf("Section 2 executed by thread %d\n", omp_get_thread_num());
}
4. single Directive
• Purpose: Specifies a block of code that should be executed by only one thread in a
parallel region.
• Syntax:
#pragma omp single
{
// Code executed by a single thread
}
Example:
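A minimal snippet (assumed to appear inside main in a program that includes <omp.h> and <stdio.h>):
#pragma omp parallel
{
    #pragma omp single
    printf("Executed once by thread %d\n", omp_get_thread_num());
    /* The other threads skip the block and wait at the implicit barrier at its end. */
}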
5. critical Directive
• Purpose: Ensures mutual exclusion for a block of code, preventing race conditions.
• Syntax:
#pragma omp critical
{
// Critical section
}
Example:
int sum = 0;
#pragma omp parallel for
for (int i = 0; i < N; i++) {
#pragma omp critical
sum += array[i];
}
6. barrier Directive
• Purpose: Synchronizes all threads in a team; threads wait until all have reached the
barrier.
• Syntax:
#pragma omp barrier
Example:
#pragma omp parallel
{
// Some computation
#pragma omp barrier
// Code here runs only after every thread has finished the computation above
}
7. master Directive
• Purpose: Specifies a block of code to be executed only by the master thread (thread 0).
• Syntax:
#pragma omp master
Example:
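A minimal snippet (assumed to appear inside main in a program that includes <omp.h> and <stdio.h>):
#pragma omp parallel
{
    #pragma omp master
    printf("Only the master thread (thread %d) prints this\n", omp_get_thread_num());
    /* Unlike single, master has no implied barrier at the end of the block. */
}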
Usage: For operations specific to the master thread without synchronization overhead.
8. atomic Directive
• Purpose: Performs a simple update on a shared variable atomically, avoiding race
conditions with less overhead than critical.
• Syntax:
#pragma omp atomic
shared_var++;
Example:
int count = 0;
#pragma omp parallel
{
#pragma omp atomic
count++;
}
Usage: For efficient synchronization on simple operations like increments.
9. task Directive
• Purpose: Creates an explicit task that can be executed asynchronously by any thread in
the team.
• Syntax:
#pragma omp task
{
// Task code
}
Example:
#pragma omp parallel
#pragma omp single
{
#pragma omp task
printf("Task executed by thread %d\n", omp_get_thread_num());
}
14.A.) Describe different MPI constructs and libraries used for distributed memory
programming.
Introduction to MPI
MPI (Message Passing Interface) is the de facto standard for distributed memory parallel
programming. It allows processes running on different nodes (with separate memory) to
communicate by sending and receiving messages explicitly.
MPI programs typically use multiple processes (not threads) and require explicit coordination.
1. Basic MPI Constructs
a) MPI_Init and MPI_Finalize
• Purpose: Initialize and shut down the MPI execution environment; every MPI program
must begin with MPI_Init and end with MPI_Finalize.
• Usage:
MPI_Init(&argc, &argv);
// ... MPI calls ...
MPI_Finalize();
b) MPI_Comm_size and MPI_Comm_rank
• Purpose: Get the number of processes and the rank (ID) of each process.
• Usage:
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
c) MPI_Send and MPI_Recv
• Purpose: Point-to-point communication; one process sends a message and another
process receives it.
• Usage:
MPI_Send(&data, count, datatype, dest, tag, MPI_COMM_WORLD);
MPI_Recv(&data, count, datatype, source, tag, MPI_COMM_WORLD, &status);
2. Collective Communication Constructs
a) MPI_Bcast
• Broadcasts data from one process (the root) to all other processes.
b) MPI_Scatter
• Distributes distinct chunks of data from the root process to every process.
c) MPI_Gather
• Collects data from all processes back to the root process.
d) MPI_Reduce
• Combines values from all processes (e.g., sum, max) into a single result on the root process.
e) MPI_Allreduce
• Like MPI_Reduce, but the result is distributed to all processes.
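A minimal sketch combining two of these collectives (the variable names and the use of a sum reduction are illustrative):
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, n = 0, total = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) n = 10;                          /* root sets the value           */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* broadcast it to all ranks     */
    int local = n + rank;                           /* each rank computes its share  */
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("Sum over all processes = %d\n", total);
    MPI_Finalize();
    return 0;
}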
3. Synchronization Constructs
a) MPI_Barrier
• Blocks each calling process until all processes in the communicator have reached the barrier.
MPI_Barrier(MPI_COMM_WORLD);
4. Derived Data Types
MPI allows defining custom data types for sending non-contiguous data.
a) MPI_Type_create_struct
• Builds a new MPI datatype from blocks of existing types at given displacements, for
example to send a C struct in a single message.
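A minimal sketch (the Record struct is illustrative) that builds and commits a datatype matching a C struct:
#include <mpi.h>
#include <stddef.h>

typedef struct { int id; double value; } Record;   /* illustrative struct */

/* Build an MPI datatype that matches Record so it can be sent as one message. */
void build_record_type(MPI_Datatype *rec_type) {
    int          blocklens[2] = {1, 1};
    MPI_Aint     displs[2]    = {offsetof(Record, id), offsetof(Record, value)};
    MPI_Datatype types[2]     = {MPI_INT, MPI_DOUBLE};
    MPI_Type_create_struct(2, blocklens, displs, types, rec_type);
    MPI_Type_commit(rec_type);
}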
5. Communicators
a) MPI_COMM_WORLD
• The default communicator containing all processes started with the MPI program.
b) MPI_Comm_split
• Used to divide the global communicator into subgroups for logical task grouping.
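For example, the following snippet (assuming MPI has already been initialized) splits the processes into two subgroups based on whether their rank is even or odd:
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

MPI_Comm newcomm;
/* color = rank % 2 puts even and odd ranks into separate communicators;
   key = rank keeps the original ordering inside each new group. */
MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &newcomm);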
6. MPI Libraries
a) MPICH
• A widely used, portable reference implementation of the MPI standard.
b) OpenMPI
• Open-source implementation used in many HPC environments.
c) Intel MPI
• Vendor implementation optimized for Intel processors and interconnects, commonly
used on Intel-based clusters.
Example MPI Program (point-to-point communication between two processes):
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size, value = 100;                   /* illustrative payload */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);    /* send to rank 1 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process 1 received %d from process 0\n", value);
    }
    MPI_Finalize();
    return 0;
}
15.B. Discuss OpenMP and MPI implementations with a case study. Compare their
advantages and limitations.
OpenMP and MPI Implementations with a Case Study: Comparison of Advantages and
Limitations
1. Introduction to OpenMP and MPI
OpenMP is a directive-based API for shared-memory (multithreaded) parallelism within a single
node, while MPI is a message-passing standard for distributed-memory parallelism across nodes.
2. Case Study: Matrix Multiplication (C = A × B)
The case study parallelizes matrix multiplication, first with OpenMP on one shared-memory
node and then with MPI across distributed-memory processes.
3. OpenMP Implementation
Approach:
• The loop over the rows of the result matrix C is divided among threads; all threads share
the input matrices A and B and write disjoint rows of C.
Code Snippet:
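A minimal sketch, assuming square N×N matrices (names and sizes are illustrative):
#define N 512                          /* illustrative matrix dimension */
double A[N][N], B[N][N], C[N][N];

void matmul_openmp(void) {
    /* The outer loop over rows of C is shared among the threads;
       A and B are read-only and each row of C is written by exactly one thread. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
}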
Characteristics:
• Runs on a single shared-memory node; communication is implicit through the shared
arrays and synchronization is handled by the OpenMP runtime.
4. MPI Implementation
Approach:
• The root process distributes the input data: each process receives a block of rows of
matrix A and a copy of matrix B.
• Each process computes its part of matrix C and sends it back to root.
Code Snippet:
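A minimal sketch under the same assumptions (N divisible by the number of processes; names are illustrative), using MPI_Scatter, MPI_Bcast, and MPI_Gather to move the data:
#include <mpi.h>
#include <stdlib.h>

#define N 512                                    /* illustrative matrix dimension */
static double A[N][N], B[N][N], C[N][N];

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows = N / size;                         /* rows per process; assumes N % size == 0 */
    double *localA = malloc(rows * N * sizeof(double));
    double *localC = malloc(rows * N * sizeof(double));

    /* Root scatters blocks of rows of A and broadcasts all of B. */
    MPI_Scatter(A, rows * N, MPI_DOUBLE, localA, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(B, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Each process computes its own block of rows of C. */
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += localA[i * N + k] * B[k][j];
            localC[i * N + j] = sum;
        }

    /* Root gathers the partial results into the full matrix C. */
    MPI_Gather(localC, rows * N, MPI_DOUBLE, C, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    free(localA);
    free(localC);
    MPI_Finalize();
    return 0;
}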
Characteristics:
• Runs across distributed-memory processes; data movement is explicit (scatter, broadcast,
gather) and matrix B is duplicated in the memory of every process.
5. Performance Comparison
Metric | OpenMP | MPI
Execution Time | Faster on a single node | Scales better across multiple nodes
Scalability | Limited to one node (shared memory) | Scales across cluster nodes
Communication | Implicit, memory shared | Explicit, requires message passing
Memory Usage | Efficient due to shared memory | Requires memory duplication
Development Effort | Easier to code and debug | Complex with explicit synchronization
Synchronization | Handled by OpenMP runtime | Manually managed using barriers
6. Advantages and Limitations
OpenMP Advantages
• Easier to code and debug; parallelism is added incrementally with compiler directives.
• Communication is implicit through shared memory, and synchronization is handled by the runtime.
OpenMP Limitations
• Restricted to a single shared-memory node, which limits scalability.
MPI Advantages
• Scales across the nodes of a cluster, so much larger problems can be handled.
MPI Limitations
• Requires explicit message passing and synchronization, increasing development effort,
and data may be duplicated across processes.
7. Hybrid Approach (MPI + OpenMP)
Approach:
• Use MPI to distribute the work across the nodes of a cluster and OpenMP to parallelize
the computation among the cores within each node.
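A minimal hybrid sketch (illustrative; it assumes an MPI library built with thread support):
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int provided, rank;
    /* Request thread support so OpenMP threads can coexist with MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* MPI distributes work across nodes; OpenMP parallelizes within each node. */
    #pragma omp parallel
    printf("Rank %d, thread %d\n", rank, omp_get_thread_num());

    MPI_Finalize();
    return 0;
}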
8. Conclusion
• For maximum flexibility and scalability, hybrid models (MPI + OpenMP) are the most
effective.