
SWE2017 - Lab Assignment 1pages-7

program mpi code

Uploaded by

ccannavar
SWE2017 - Parallel Programming Lab Assignment – 6

alternative strategies like atomic operations or reduction techniques might be preferable, as
they can reduce the overhead associated with critical sections.

5.
Code:
#include <stdio.h>
#include <omp.h>

int main(void)
{
    int counter = 0;                                    // Shared by all threads
    int total_transactions = 100;
    int transactions_per_thread = total_transactions / 5; // 20 per thread

    #pragma omp parallel num_threads(5) // Request 5 threads
    {
        for (int i = 0; i < transactions_per_thread; i++)
        {
            // Critical section to safely update the shared counter variable
            #pragma omp critical
            {
                counter++;
                printf("Thread %d processed a transaction. Current counter: %d\n",
                       omp_get_thread_num(), counter);
            }
        }
    }

    printf("Final counter value after all transactions: %d\n", counter);

    return 0;
}

Output:

The #pragma omp critical directive is essential in this scenario because it ensures that
only one thread at a time can access and update the counter variable. Without this
protection, multiple threads might read the same initial counter value simultaneously,
leading to race conditions. For example, two threads could read the same value (say 10),
both increment it, and both write 11 back to counter, resulting in a missed update and an
incorrect final count.
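The lost update described above can be replayed step by step. The sketch below is not concurrent code: it runs the problematic interleaving sequentially, with two hypothetical "threads" A and B, so the outcome is deterministic. The function name lost_update_demo is an illustrative assumption.

```c
/* Replays the lost-update interleaving sequentially (no real threads).
   Both "threads" read the counter before either one writes back. */
int lost_update_demo(int initial)
{
    int counter = initial;

    int seen_by_a = counter;   /* thread A reads the counter */
    int seen_by_b = counter;   /* thread B reads the same value */

    counter = seen_by_a + 1;   /* A writes back its increment */
    counter = seen_by_b + 1;   /* B overwrites it with the same result */

    return counter;            /* two increments, but the value rose by one */
}
```

Calling lost_update_demo(10) returns 11 rather than 12, which is the missed update that the critical section prevents.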

Performance Consideration
The use of #pragma omp critical serializes access to counter, meaning only one thread
can update it at any given moment. This can lead to thread contention and performance
degradation, especially if many threads frequently enter the critical section. In
high-throughput applications, alternative strategies like atomic operations (e.g., #pragma omp
atomic) might be more efficient: for a single increment they can typically map to a hardware
atomic instruction rather than a general-purpose lock, which could reduce overhead compared
to a full critical section.
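As a minimal sketch of those alternatives, the two functions below count the same transactions with an atomic increment and with a reduction clause. The function names are assumptions made for illustration, and the per-transaction printf from the program above is left out, because neither atomic nor reduction would protect it. The loop is split with parallel for, so the result does not depend on how many threads the runtime actually provides.

```c
/* Variant 1: each increment is a single atomic update of the shared counter. */
int count_with_atomic(int total_transactions)
{
    int counter = 0;

    #pragma omp parallel for
    for (int i = 0; i < total_transactions; i++) {
        #pragma omp atomic
        counter++;
    }
    return counter;
}

/* Variant 2: each thread increments a private copy of counter;
   OpenMP sums the private copies when the loop ends. */
int count_with_reduction(int total_transactions)
{
    int counter = 0;

    #pragma omp parallel for reduction(+:counter)
    for (int i = 0; i < total_transactions; i++) {
        counter++;
    }
    return counter;
}
```

Both variants return total_transactions. The reduction version avoids shared writes inside the loop entirely, so it is likely the cheaper of the two when the only shared state is a sum.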

6.
Code:
#include <mpi.h>
#include <stdio.h>

// Returns 1 if the 6-digit number is a valid identifier, 0 otherwise.
int is_valid(int number)
{
    int digits[6];
    int sum = 0;

    // Split the number into digits, most significant digit first
    for (int i = 5; i >= 0; --i)
    {
        digits[i] = number % 10;
        number /= 10;
    }

    // The leading digit must not be zero
    if (digits[0] == 0)
        return 0;

    // No two adjacent digits may be equal
    for (int i = 0; i < 5; ++i)
    {
        if (digits[i] == digits[i + 1])
            return 0;
    }

    // The digit sum must not be 7, 11, or 13
    for (int i = 0; i < 6; ++i)
    {
        sum += digits[i];
    }
    if (sum == 7 || sum == 11 || sum == 13)
        return 0;

    return 1;
}

int main(int argc, char **argv)
{
    int rank, size, local_count = 0, global_count = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Split the range [100000, 1000000) into one contiguous block per rank
    int chunk = 900000 / size;
    int start = 100000 + rank * chunk;
    // The last rank also takes the remainder when 900000 is not divisible by size
    int end = (rank == size - 1) ? 1000000 : start + chunk;

    for (int i = start; i < end; i++)
    {
        if (is_valid(i))
        {
            local_count++;
        }
    }

    // Sum the per-rank counts on rank 0
    MPI_Reduce(&local_count, &global_count, 1, MPI_INT, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (rank == 0)
    {
        printf("Total valid identifiers: %d\n", global_count);
    }

    MPI_Finalize();
    return 0;
}

Output:

7.

In a real-time traffic surveillance system that requires high-speed lane detection, optimizing
the Hough Transform is essential. Below is a refined approach using OpenMP for parallel
processing, workload division, atomic operations, and quantization, alongside an
explanation of adaptive edge detection thresholding.

1. OpenMP and MPI

OpenMP is more suitable than MPI for this application because it uses a shared-memory
model, allowing efficient real-time, frame-by-frame processing on a single multi-core
system. OpenMP’s thread-based parallelism reduces latency by distributing tasks across CPU
cores, which is critical in real-time settings. Conversely, MPI is optimized for distributed
systems and would introduce network communication overhead, slowing down the frame
processing required for real-time responsiveness in high-resolution video streams.

2. Workload Division for High-Resolution Frames (e.g., 4K)

In OpenMP, dividing a high-resolution frame (like 4K) into regions—either strips or tiles—
lets each thread process a smaller portion independently. This approach minimizes memory
contention and optimizes cache usage, as each thread performs edge detection and Hough
Transform operations only on its designated region. For example, each thread could handle
edge detection and voting in the Hough space for a strip of the image, then contribute its
results to a shared accumulator array, enabling the detection of lane markings across the
entire frame.
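The strip division described above can be sketched as follows. This is an illustration under stated assumptions, not the assignment's implementation: the frame is taken to be a row-major binary edge map (nonzero means edge pixel), and the names strip_bounds and count_edge_pixels are hypothetical. Counting edge pixels stands in for the real per-strip work.

```c
#include <stddef.h>

/* Computes the half-open row range [row_start, row_end) of one strip.
   Rows that do not divide evenly are spread over the first strips. */
void strip_bounds(int height, int num_strips, int strip,
                  int *row_start, int *row_end)
{
    int base = height / num_strips;
    int extra = height % num_strips;

    *row_start = strip * base + (strip < extra ? strip : extra);
    *row_end = *row_start + base + (strip < extra ? 1 : 0);
}

/* Processes each strip in its own loop iteration; OpenMP assigns
   iterations (strips) to threads. Assumes a row-major edge map. */
long count_edge_pixels(const unsigned char *edges, int width, int height,
                       int num_strips)
{
    long total = 0;

    #pragma omp parallel for reduction(+:total)
    for (int s = 0; s < num_strips; s++) {
        int r0, r1;
        strip_bounds(height, num_strips, s, &r0, &r1);

        for (int y = r0; y < r1; y++) {
            for (int x = 0; x < width; x++) {
                if (edges[(size_t)y * width + x]) {
                    total++;
                }
            }
        }
    }
    return total;
}
```

For a 4K frame with 2160 rows and 8 strips, strip_bounds gives each strip 270 consecutive rows. Horizontal strips keep each thread's reads contiguous in a row-major image, which is one reason they are often tried before tiles.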

3. Using #pragma omp atomic for Shared Accumulator Array

The Hough Transform requires an accumulator array, where each element represents a line
parameter (angle and distance) and stores votes from detected edges. The #pragma omp
atomic directive helps avoid race conditions in this array by ensuring that each increment
operation is atomic, which means only one thread can update a specific accumulator cell at
a time. This ensures consistency when multiple threads vote on the same line, preventing
data loss or overwriting in the shared array.

To further reduce contention, threads could maintain private accumulator arrays during
processing, later combining them into the main accumulator array.
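Both voting strategies can be sketched as below. To keep the example short, each vote is assumed to arrive as a precomputed accumulator cell index (in a full pipeline that index would come from the quantized angle and distance of an edge pixel). The names vote_atomic and vote_private are hypothetical.

```c
#include <stdlib.h>

/* Strategy 1: every vote is an atomic increment on the shared accumulator. */
void vote_atomic(const int *cell, int num_votes, int *acc)
{
    #pragma omp parallel for
    for (int i = 0; i < num_votes; i++) {
        #pragma omp atomic
        acc[cell[i]]++;
    }
}

/* Strategy 2: each thread votes into a private accumulator, then merges it
   into the shared one. If the private allocation fails, that thread falls
   back to atomic votes on the shared array. */
void vote_private(const int *cell, int num_votes, int *acc, int acc_size)
{
    #pragma omp parallel
    {
        int *local = calloc((size_t)acc_size, sizeof *local);

        #pragma omp for
        for (int i = 0; i < num_votes; i++) {
            if (local != NULL) {
                local[cell[i]]++;
            } else {
                #pragma omp atomic
                acc[cell[i]]++;
            }
        }

        if (local != NULL) {
            /* Merge: one atomic add per nonzero cell instead of one per vote. */
            for (int k = 0; k < acc_size; k++) {
                if (local[k] != 0) {
                    #pragma omp atomic
                    acc[k] += local[k];
                }
            }
            free(local);
        }
    }
}
```

The private variant trades memory (one accumulator copy per thread) for fewer atomic operations, so it tends to pay off when many edge pixels vote for the same few cells.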

4. Role of theta_quantize and r_quantize Functions

Quantizing the angle and distance values reduces the computational load while keeping
enough accuracy to detect lane markings:

• theta_quantize: Discretizes the angle (θ) into a fixed number of bins, which limits the
number of orientations evaluated and so saves processing time.
• r_quantize: Discretizes the distance (r) from the origin to the line, grouping similar
values into the same bin. This reduces the resolution of the Hough space, speeding
up the accumulation process; with a suitably chosen bin width, the loss in
line-detection accuracy is small.
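One possible shape for the two functions is sketched below. The source names theta_quantize and r_quantize but does not give their signatures, so the parameters, the angle range [0, π), the distance range [-r_max, r_max], and the constant HOUGH_PI are all assumptions made for illustration.

```c
#define HOUGH_PI 3.14159265358979323846  /* assumed constant, not from the source */

/* Maps an angle in [0, pi) to a bin index in [0, num_bins).
   Out-of-range angles are clamped to the first or last bin. */
int theta_quantize(double theta, int num_bins)
{
    if (theta < 0.0) {
        return 0;
    }
    int bin = (int)(theta / HOUGH_PI * num_bins);
    return bin >= num_bins ? num_bins - 1 : bin;
}

/* Maps a signed distance in [-r_max, r_max] to a bin index in [0, num_bins).
   Out-of-range distances are clamped to the first or last bin. */
int r_quantize(double r, double r_max, int num_bins)
{
    if (r < -r_max) {
        return 0;
    }
    int bin = (int)((r + r_max) / (2.0 * r_max) * num_bins);
    return bin >= num_bins ? num_bins - 1 : bin;
}
```

With 180 angle bins each bin covers one degree, so theta_quantize(HOUGH_PI / 2, 180) lands in bin 90. Halving num_bins halves the accumulator size along that axis, at the cost of merging lines whose parameters differ by less than one bin width.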
