Gauravkumar 221it027@it301 Lab2
Gaurav Kumar
221IT027
//CODE:-
#include <stdio.h>
#include <sys/time.h>
#include <omp.h>
int main() {
    int i, num_threads = 4;                 /* assumed value for this run */
    long num_steps = 1000000;               /* assumed number of integration steps */
    double step = 1.0 / (double)num_steps;
    double x, pi, sum;
    double seq_time, parallel_time, speedup, efficiency;
    long long time_start, time_end;
    struct timeval TimeValue_Start, TimeValue_Final;
    struct timezone TimeZone_Start, TimeZone_Final;
    /* Sequential midpoint-rule integration of 4/(1+x^2) over [0,1] */
    sum = 0.0;
    gettimeofday(&TimeValue_Start, &TimeZone_Start);
    for (i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    pi = step * sum;
    gettimeofday(&TimeValue_Final, &TimeZone_Final);
    time_start = (long long)TimeValue_Start.tv_sec * 1000000 + TimeValue_Start.tv_usec;
    time_end = (long long)TimeValue_Final.tv_sec * 1000000 + TimeValue_Final.tv_usec;
    seq_time = (time_end - time_start) / 1000000.0;
    printf("Sequential calculation:\n");
    printf("Calculated Pi: %f, Time taken: %lf seconds\n\n", pi, seq_time);
    /* Parallel version: the same loop, with the partial sums combined by a reduction */
    sum = 0.0;
    omp_set_num_threads(num_threads);
    gettimeofday(&TimeValue_Start, &TimeZone_Start);
    #pragma omp parallel for private(x) reduction(+:sum)
    for (i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    pi = step * sum;
    gettimeofday(&TimeValue_Final, &TimeZone_Final);
    time_start = (long long)TimeValue_Start.tv_sec * 1000000 + TimeValue_Start.tv_usec;
    time_end = (long long)TimeValue_Final.tv_sec * 1000000 + TimeValue_Final.tv_usec;
    parallel_time = (time_end - time_start) / 1000000.0;
    speedup = seq_time / parallel_time;
    efficiency = speedup / num_threads;
    printf("Parallel calculation with %d threads:\n", num_threads);
    printf("Calculated Pi: %f, Time taken: %lf seconds\n", pi, parallel_time);
    printf("Speedup: %lf, Efficiency: %lf\n", speedup, efficiency);
    return 0;
}
//OUTPUT:-
ANALYSIS:-
This OpenMP program calculates the value of Pi using numerical integration via the midpoint
rule. The program first computes Pi sequentially, then repeats the calculation in parallel
using different numbers of threads. For each parallel computation, the program measures
the time taken, calculates the speedup and efficiency, and prints the results.
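For reference, the reported figures are assumed to follow the standard definitions: with p
threads, speedup is S(p) = T_sequential / T_parallel(p) and efficiency is E(p) = S(p) / p. Under
these definitions, the efficiency of 0.121929 reported for 32 threads corresponds to a speedup
of roughly 32 × 0.121929 ≈ 3.9.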
1. Performance Analysis:
o Speedup: As expected, the speedup increases with the number of threads.
However, the speedup is not linear, meaning that doubling the number of
threads does not necessarily halve the computation time.
o Efficiency: The efficiency decreases as the number of threads increases. This
is typical in parallel computing due to overheads such as thread management
and synchronization.
▪ For example, at 32 threads, efficiency drops significantly to 0.121929,
indicating that adding more threads yields diminishing returns. This is
likely due to the overhead becoming more significant compared to the
work done by each thread.
2. Impact of Parallelization:
o Using more threads does reduce the computation time, but the benefits
decrease with higher thread counts. At some point, the overhead of
managing many threads outweighs the benefits of parallelization, as seen
with the 32-thread case.
• The program effectively demonstrates the principles of parallel computing, including
speedup and efficiency.
• The code shows that while parallelization can significantly reduce computation time,
there are limits to its effectiveness due to overhead and the nature of the problem
being parallelized.
• Understanding the trade-offs between speedup and efficiency is crucial for
optimizing parallel programs.
Q.2 Develop an OpenMP program for matrix multiplication (C = A*B). Analyze the
speedup and efficiency of the parallelized code. Vary the size of your matrices
over 250, 500, 750, 1000, and 2000, and measure the runtime with one thread.
For each matrix size, vary the number of threads over 2, 4, and 8, and plot the
speedup versus the number of threads. Compute the efficiency.
//CODE:-
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
/* Fill an n x n matrix (row-major, 1-D storage) with pseudo-random values */
void initializeMatrix(double *M, int n) {
    for (int i = 0; i < n * n; i++) M[i] = rand() / (double)RAND_MAX;
}
int main() {
    int sizes[] = {250, 500, 750, 1000, 2000};
    int numSizes = sizeof(sizes) / sizeof(sizes[0]);
    int threads[] = {1, 2, 4, 8};
    int numThreads = sizeof(threads) / sizeof(threads[0]);
    for (int s = 0; s < numSizes; s++) {
        int n = sizes[s];
        double *A = malloc(n * n * sizeof(double));
        double *B = malloc(n * n * sizeof(double));
        double *C = malloc(n * n * sizeof(double));
        initializeMatrix(A, n); initializeMatrix(B, n);
        for (int t = 0; t < numThreads; t++) {
            omp_set_num_threads(threads[t]);
            double start = omp_get_wtime();
            /* C = A * B: the rows of C are distributed across the threads */
            #pragma omp parallel for
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) {
                    double sum = 0.0;
                    for (int k = 0; k < n; k++) sum += A[i * n + k] * B[k * n + j];
                    C[i * n + j] = sum;
                }
            printf("n=%d, threads=%d, time=%lf s\n", n, threads[t], omp_get_wtime() - start);
        }
        free(A); free(B); free(C);
    }
    return 0;
}
//OUTPUT:-
speedups = {
    250:  [1.0, 1.779375, 2.131326, 1.995449],
    500:  [1.0, 1.914561, 2.517489, 3.225659],
    750:  [1.0, 1.906177, 2.684145, 3.968366],
    1000: [1.0, 1.873228, 2.900678, 4.127598],
    2000: [1.0, 1.908705, 3.233444, 4.381061]
}
threads = [1, 2, 4, 8]   # one entry per speedup value; the single-thread run is the baseline
# Plotting script (matplotlib); the per-size plot loop is a straightforward reconstruction
import matplotlib.pyplot as plt
for size, s in speedups.items():
    plt.plot(threads, s, marker='o', label=f'{size} x {size}')
plt.xlabel('Number of Threads')
plt.ylabel('Speedup')
plt.title('Speedup vs. Number of Threads for Different Matrix Sizes')
plt.legend()
plt.grid(True)
plt.show()
// Graph:-
ANALYSIS:-
Matrix Size: 250 x 250
• Threads: 2 → Speedup: 1.78, Efficiency: 0.89
• Threads: 4 → Speedup: 2.13, Efficiency: 0.53
• Threads: 8 → Speedup: 1.99, Efficiency: 0.25
• Observation: Speedup improves up to 4 threads but falls back slightly at 8 threads, and
efficiency drops steadily, suggesting that parallelization overhead dominates for this
small matrix.
Matrix Size: 500 x 500
• Threads: 2 → Speedup: 1.91, Efficiency: 0.96
• Threads: 4 → Speedup: 2.52, Efficiency: 0.63
• Threads: 8 → Speedup: 3.23, Efficiency: 0.40
• Observation: Speedup scales better with increasing threads than for the 250 x 250
matrix, but efficiency still drops as more threads are used.
Matrix Size: 750 x 750
• Threads: 2 → Speedup: 1.91, Efficiency: 0.95
• Threads: 4 → Speedup: 2.68, Efficiency: 0.67
• Threads: 8 → Speedup: 3.97, Efficiency: 0.50
• Observation: Similar to the 500 x 500 case, but the gains in speedup with more
threads are more substantial.
Matrix Size: 1000 x 1000
• Threads: 2 → Speedup: 1.87, Efficiency: 0.94
• Threads: 4 → Speedup: 2.90, Efficiency: 0.73
• Threads: 8 → Speedup: 4.13, Efficiency: 0.52
• Observation: For this larger matrix size, speedup becomes more noticeable with 8
threads, though efficiency continues to decrease.
Matrix Size: 2000 x 2000
• Threads: 2 → Speedup: 1.91, Efficiency: 0.95
• Threads: 4 → Speedup: 3.23, Efficiency: 0.81
• Threads: 8 → Speedup: 4.38, Efficiency: 0.55
• Observation: Significant speedup is observed with 8 threads, indicating that larger
matrices benefit more from parallelization. The efficiency remains relatively high
compared to smaller matrices.
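The efficiency values quoted above follow directly from the measured speedups
(efficiency = speedup / number of threads). A minimal post-processing sketch in C that
reproduces them from the speedup data in the output (illustrative only; this helper is not
part of the original lab program):
#include <stdio.h>
int main() {
    int threads[] = {1, 2, 4, 8};
    int sizes[] = {250, 500, 750, 1000, 2000};
    /* Speedups copied from the output section above */
    double speedup[5][4] = {
        {1.0, 1.779375, 2.131326, 1.995449},
        {1.0, 1.914561, 2.517489, 3.225659},
        {1.0, 1.906177, 2.684145, 3.968366},
        {1.0, 1.873228, 2.900678, 4.127598},
        {1.0, 1.908705, 3.233444, 4.381061}
    };
    for (int s = 0; s < 5; s++)
        for (int t = 0; t < 4; t++)
            printf("n=%d, threads=%d: speedup=%.2f, efficiency=%.2f\n",
                   sizes[s], threads[t], speedup[s][t], speedup[s][t] / threads[t]);
    return 0;
}
For example, this reproduces the 0.55 efficiency reported for the 2000 x 2000 case on 8
threads (4.381061 / 8 ≈ 0.55).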
Graph Analysis
Plot: "Speedup vs. Number of Threads for Different Matrix Sizes"
• X-Axis: Number of threads (1, 2, 4, 8).
• Y-Axis: Speedup.
1. Speedup Trends:
o For all matrix sizes, the speedup increases as the number of threads
increases.
o The speedup curve tends to be more pronounced for larger matrices (e.g.,
size 2000), indicating better parallelization efficiency.
2. Smaller Matrices (e.g., 250, 500):
o The speedup gain is modest, especially when increasing threads from 4 to 8.
o The efficiency drops faster, showing that the overhead of managing more
threads outweighs the benefits of parallel processing for smaller matrices.
3. Larger Matrices (e.g., 1000, 2000):
o The speedup is more significant, particularly with 8 threads.
o The efficiency remains higher compared to smaller matrices, suggesting that
larger matrices have enough computational workload to benefit more from
parallelism.
4. General Efficiency:
o Efficiency decreases as the number of threads increases, which is typical in
parallel computing due to overhead and contention.
o Larger matrices maintain better efficiency with more threads, making them
more suitable for parallel execution.
• Small Matrices: Parallelization offers limited benefits due to lower computational
demands and higher overhead.
• Large Matrices: Parallelization is highly effective, particularly with a greater number
of threads, leading to significant speedup and better resource utilization.
• Optimal Thread Usage: For maximum efficiency, a balance must be struck between
the matrix size and the number of threads. Too many threads for smaller matrices
result in diminishing returns, while larger matrices can effectively leverage more
threads for substantial performance gains.
//CODE:-
#include <stdio.h>
#include <omp.h>
#define MAX_N 100                          /* assumed maximum number of terms */
long long fib[MAX_N];
omp_lock_t print_lock;
/* Fill fib[0..limit-1] with the Fibonacci series */
void generateFibonacci(int limit) {
    fib[0] = 0;
    if (limit > 1) fib[1] = 1;
    for (int i = 2; i < limit; i++)
        fib[i] = fib[i - 1] + fib[i - 2];
}
int main() {
    int n = 10;                            /* number of terms; the analysis below refers to 10 terms */
    int num_threads = 2;                   /* assumed; the program requires at least 2 threads */
    double start_time, end_time, total_time;
    omp_init_lock(&print_lock);
    if (n > MAX_N) {
        printf("Number of terms exceeds maximum limit of %d\n", MAX_N);
        return 1;
    }
    if (num_threads < 2) {
        printf("Number of threads must be at least 2.\n");
        return 1;
    }
    // Start timing
    start_time = omp_get_wtime();
    omp_set_num_threads(num_threads);
    /* Assumed structure for the parallel region: thread 0 generates the series
       and thread 1 prints it under the print lock. */
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        if (tid == 0) generateFibonacci(n);
        #pragma omp barrier                /* the series must be complete before printing */
        if (tid == 1) {
            omp_set_lock(&print_lock);
            for (int i = 0; i < n; i++)
                printf("fib[%d] = %lld (printed by thread %d)\n", i, fib[i], tid);
            omp_unset_lock(&print_lock);
        }
    }
    end_time = omp_get_wtime();
    total_time = end_time - start_time;
    printf("Total time: %lf seconds\n", total_time);
    omp_destroy_lock(&print_lock);
    return 0;
}
//OUTPUT:-
ANALYSIS:-
o Increasing the number of threads generally results in reduced total time and
higher speedup.
o However, the efficiency values in the output are anomalous: efficiency should
normally decrease as the thread count grows because of overhead and
diminishing returns.
o For such a small problem size (10 Fibonacci numbers), both the sequential and
the parallel runs finish almost instantly, so the measured times are dominated
by thread-management overhead rather than by useful work; this makes the
computed speedup and efficiency values unreliable and can produce the
unusually high figures seen here (see the sketch after this analysis).
o In practical scenarios with larger problem sizes, the measurements are more
meaningful and show the usual pattern: efficiency decreases as more threads
are used because of the growing management and synchronization overhead.
The program successfully demonstrates multi-threaded generation and printing of the
Fibonacci series using OpenMP. The observed high speedup and efficiency values are
primarily due to the small problem size and very fast computation time.
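To make the overhead argument concrete, the following sketch (an illustration, not part of
the lab code) times the serial generation of 10 Fibonacci terms against the cost of merely
opening and closing an empty OpenMP parallel region; on most systems the latter is far
larger, which is why speedup and efficiency figures for such a tiny workload are unreliable.
#include <stdio.h>
#include <omp.h>
int main() {
    long long fib[10] = {0, 1};
    /* Time the actual work: 10 Fibonacci terms computed serially */
    double t0 = omp_get_wtime();
    for (int i = 2; i < 10; i++) fib[i] = fib[i - 1] + fib[i - 2];
    double work = omp_get_wtime() - t0;
    /* Time an empty parallel region: pure thread-management overhead */
    double t1 = omp_get_wtime();
    #pragma omp parallel
    {
        /* intentionally empty */
    }
    double overhead = omp_get_wtime() - t1;
    printf("fib[9] = %lld, work: %g s, parallel-region overhead: %g s\n",
           fib[9], work, overhead);
    return 0;
}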
//CODE:-
#include <stdio.h>
#include <omp.h>
int main() {
    int size = 5;
    int A[5] = {1, 2, 3, 4, 5};            /* values inferred from the analysis below (C[0] = 1 + 10, C[4] = 5 + 14) */
    int B[5] = {10, 11, 12, 13, 14};
    int C[5];
    omp_set_num_threads(5);
    /* One iteration (one element of C) per thread */
    #pragma omp parallel for
    for (int i = 0; i < size; i++) {
        C[i] = A[i] + B[i];
        printf("Thread %d computed C[%d] = %d\n", omp_get_thread_num(), i, C[i]);
    }
    return 0;
}
//OUTPUT:-
ANALYSIS:-
This OpenMP program performs vector addition of two one-dimensional arrays (A and B) of
size 5 using 5 threads.
1. Thread Assignment:
o The iterations are divided among the threads (here, one element per thread),
but the order in which the threads execute and print is non-deterministic and
can vary between runs. In this run, threads 0, 4, 3, 1, and 2 printed their
results in that order.
2. Computation:
o Each thread correctly adds corresponding elements from arrays A and B and
stores the result in C. For example, thread 0 computes C[0] = 1 + 10 = 11, and
thread 4 computes C[4] = 5 + 14 = 19.
3. Output Order:
o The output order of the threads in the terminal may not follow the order of
the indices because the threads execute concurrently. However, each element
of C is correctly calculated based on the input arrays.
4. Final Result:
o The resultant vector C correctly contains the sums of the corresponding
elements of arrays A and B, demonstrating successful parallel computation.
• The program efficiently utilizes 5 threads to perform vector addition. The
parallelization ensures that the task is divided among the threads, potentially
speeding up the computation compared to a single-threaded approach.
• The output correctly reflects the work done by each thread, showing that each
thread handled different parts of the task and contributed to the final result.
//CODE:-
#include <stdio.h>
#include <omp.h>
int main() {
    int i, num_threads = 4;
    int x = 10, y = 20;                    /* shared by default; initial values taken from the analysis below */
    omp_set_num_threads(num_threads);
    /* i is private; x and y stay shared, so every thread's update is visible to the others */
    #pragma omp parallel private(i)
    {
        i = omp_get_thread_num();
        x += i; y += i;                    /* reconstructed update, consistent with the reported final x = 16, y = 26 */
        printf("Thread %d: x = %d, y = %d\n", i, x, y);
    }
    printf("After parallel region: x = %d, y = %d\n", x, y);
    return 0;
}
//OUTPUT:-
(b) Execute the same program with firstprivate(), record the results and write
your observation.
//CODE:-
#include <stdio.h>
#include <omp.h>
int main() {
    int i, num_threads = 4, x = 10, y = 20;
    omp_set_num_threads(num_threads);
    /* firstprivate gives each thread its own copy of x and y, initialised to 10 and 20 */
    #pragma omp parallel private(i) firstprivate(x, y)
    {
        i = omp_get_thread_num();
        x += i; y += i;                    /* modifies only this thread's private copies */
        printf("Thread %d: x = %d, y = %d\n", i, x, y);
    }
    printf("After parallel region: x = %d, y = %d\n", x, y);   /* still 10 and 20 */
    return 0;
}
//OUTPUT:-
ANALYSIS:-
In the first program, the variables x and y are shared among all threads, but i is declared as
private.
Each thread has its own private copy of i, which it sets to its thread number
(omp_get_thread_num()).
Since x and y are shared, any modification to these variables by one thread will affect their
values as seen by other threads.
Threads execute in parallel, and due to the shared nature of x and y, their values are updated
by each thread.
The final output of x = 16 and y = 26 is the result of the cumulative modifications by all
threads.
The thread execution order is non-deterministic, so the sequence in which x and y are
updated varies with each run.
In the second program, the variables x and y are declared as firstprivate.
Each thread gets its own private copy of x and y, initialized to 10 and 20, respectively, before
entering the parallel region.
Changes made to x and y inside a thread do not affect the copies of other threads or the
original values of x and y.
Each thread starts with its own copy of x = 10 and y = 20.
The threads modify their copies of x and y, but these changes are local to each thread.
After the parallel region, the original x and y remain unchanged at 10 and 20, respectively.
This behavior contrasts with the first program, where x and y were shared and each thread's
changes were visible to the others.
• private(i), with x and y shared: the threads modify the same x and y concurrently, which
can lead to race conditions and makes the order of updates, and hence the output,
non-deterministic.
• firstprivate(x, y): Each thread operates on its own copy of x and y, initialized with the
values before the parallel region. The original variables remain unchanged after the
parallel region.
This comparison highlights the importance of choosing the right data-sharing attribute
(shared, private, or firstprivate) depending on whether a variable should be visible to all
threads or each thread should work on its own copy, initialized from the value it held before
the parallel region.