Program Execution ExpFinal


Programs Codes

Intro to Threads and parallel programming

Program 1

#include <stdio.h>
#include <omp.h>

int main() {
    // Start a parallel region
    #pragma omp parallel
    {
        // Each thread prints "Hello, World!" with its thread ID
        printf("Hello, World! from thread %d\n", omp_get_thread_num());
    }

    return 0;
}

Output

C:\Users\PC\Documents\HPC-College>gcc -fopenmp helloworld.c

C:\Users\PC\Documents\HPC-College>a.exe

Hello, World! from thread 1

Hello, World! from thread 0

Hello, World! from thread 3

Hello, World! from thread 4

Hello, World! from thread 2

Hello, World! from thread 7

Hello, World! from thread 6

Hello, World! from thread 5

Explanation:

• #include <omp.h>: Includes the OpenMP header file, which provides the declarations of OpenMP
functions and directives.

• #pragma omp parallel: This directive tells the compiler to create a parallel region in which multiple
threads execute the code in the following block.

• printf("Hello, World! from thread %d\n", omp_get_thread_num());: Inside the parallel region, each
thread prints a message showing its own thread ID.

• Compiling with OpenMP:

• To compile this program, you need a compiler that supports OpenMP. With GCC, you must pass the
-fopenmp flag. Here's how to do that:

gcc -fopenmp -o myprogram myprogram.c

• This tells GCC to enable OpenMP and link the required runtime libraries.

Running:

• When you execute the compiled program, "Hello, World! from thread X" is printed, where X is the
thread number assigned by OpenMP. How many threads are actually created depends on the
environment settings or on the default configuration.

• Environment Variables:

• The number of threads used in the parallel region can be influenced through the
OMP_NUM_THREADS environment variable. For instance:

export OMP_NUM_THREADS=4

• This sets the number of threads to 4.
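• As a small illustrative sketch (not part of the original handout), the thread count chosen by the
runtime can also be queried from inside the program:

#include <stdio.h>
#include <omp.h>

int main() {
    // omp_get_max_threads() reports how many threads a parallel region
    // would use, e.g. the value taken from OMP_NUM_THREADS.
    printf("Max threads available: %d\n", omp_get_max_threads());

    #pragma omp parallel
    {
        // Only one thread reports the size of the current team.
        #pragma omp single
        printf("Threads in this parallel region: %d\n", omp_get_num_threads());
    }

    return 0;
}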

Thread Safety:

• The printf function is thread-safe in most implementations, though it is good practice to design
your code to avoid race conditions, especially with more complex data handling.

The OpenMP #pragma omp parallel directive marks the code that should run in parallel.
Below is an explanation of how the code works and what it does.

What is #pragma omp parallel used for?

1. Parallel Execution:

The #pragma omp parallel directive serves primarily to create a parallel region in which
multiple threads execute code concurrently. The block of code following the directive is executed
simultaneously by several threads, making parallel processing possible.

2. Automatic Thread Management:

Whenever #pragma omp parallel is encountered, OpenMP creates and destroys the threads
automatically. You do not need to create or synchronize threads yourself; OpenMP manages these
details.
3. Block of Code Execution:

Each thread executes the code within the parallel region independently. This lets the
programmer split the work among several threads, which in most cases reduces the total execution
time of the program when the tasks can run in parallel.

Here,

#pragma omp parallel begins a parallel region.

• printf is called from within this parallel region by all threads. Since omp_get_thread_num() returns
the ID of the thread that executes the current code, each thread prints a different ID.

Important Features

1. Number of Threads:

The number of threads in the parallel region is decided by the OpenMP runtime system. You
can override this number with environment variables such as OMP_NUM_THREADS or specify it
programmatically with omp_set_num_threads().

2. Private and Shared Variables:

Variables in a parallel region can be either private to each thread or shared between
threads. Variables are shared by default unless specified otherwise with additional OpenMP clauses
such as private, firstprivate, or lastprivate (see the first sketch after this list).

3. Synchronization:

Basic synchronization is handled implicitly by OpenMP. For more complex synchronization
needs, such as avoiding race conditions, you may need additional OpenMP constructs, such as
critical sections or atomic operations (see the second sketch after this list).

4. Scalability:

The #pragma omp parallel directive lets you write parallel code that should scale with the
number of available processors or cores. By using multiple cores, you can gain substantial speedup
in computational tasks.
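To illustrate points 2 and 3 above, here are two minimal sketches. They are not part of the original
handout; the variable names are made up for illustration.

The first sketch shows a variable listed as private, so each thread works on its own copy, while a
shared variable is visible to all threads:

#include <stdio.h>
#include <omp.h>

int main() {
    int shared_val = 100;  // shared by default: all threads see the same variable
    int private_val = 0;   // listed in private() below: each thread gets its own copy

    #pragma omp parallel private(private_val)
    {
        // Only this thread's private copy is modified here.
        private_val = omp_get_thread_num();
        printf("Thread %d: private_val = %d, shared_val = %d\n",
               omp_get_thread_num(), private_val, shared_val);
    }

    return 0;
}

The second sketch shows how an atomic operation protects a shared counter from a race condition
when several threads update it concurrently:

#include <stdio.h>
#include <omp.h>

int main() {
    int counter = 0; // shared between all threads

    #pragma omp parallel
    {
        for (int i = 0; i < 1000; i++) {
            // Without this protection, concurrent increments could be lost.
            #pragma omp atomic
            counter++;
        }
    }

    // Each thread adds 1000, so the total equals 1000 * (number of threads).
    printf("counter = %d\n", counter);
    return 0;
}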

Example with Control of Number of Threads

If you want to explicitly set the number of threads, you can use:

#include <omp.h>
#include <stdio.h>

int main() {
    omp_set_num_threads(4); // Set the number of threads to 4

    #pragma omp parallel
    {
        printf("Hello from thread %d\n", omp_get_thread_num());
    }

    return 0;
}

This example forces the number of threads to 4 with the call to omp_set_num_threads(). The
output will now contain messages from 4 threads, provided the system supports that many
threads.

Conclusion

In summary, #pragma omp parallel is one of the basic OpenMP constructs for parallelizing code
across multiple threads; it exploits multi-core processors for a performance benefit.

The modification above restricts execution to 4 threads, so the program uses at most 4 threads
regardless of how many cores the system provides.

General Pragma Program

#include <omp.h>
#include <stdio.h>

int main() {
    // Start parallel region
    #pragma omp parallel
    {
        // Ensure that task creation is done by only one thread
        #pragma omp single
        {
            // Create tasks
            for (int i = 0; i < 10; i++) {
                #pragma omp task
                {
                    // Process task i
                    printf("Processing task %d in thread %d\n", i, omp_get_thread_num());
                }
            }

            // Ensure all tasks are completed before leaving the single block
            #pragma omp taskwait
        } // End of single block
    } // End of parallel region

    return 0;
}

Output

C:\Users\PC\Documents\HPC-College>a.exe

Processing task 0 in thread 4

Processing task 8 in thread 5

Processing task 9 in thread 5

Processing task 4 in thread 2

Processing task 3 in thread 1

Processing task 5 in thread 7

Processing task 6 in thread 6

Processing task 1 in thread 0

Processing task 7 in thread 4

Processing task 2 in thread 3

Explanation of Code

1. Parallel Region:

#pragma omp parallel: This is the directive that initiates a parallel region where multiple threads
are created.

2. Single Directive:

#pragma omp single: This ensures that only one thread in the whole parallel region executes the
enclosed code block. It is mainly used so that only one thread creates the tasks or performs
initialization work.

3. Task Creation:

The block inside #pragma omp single contains a for loop running from 0 to 9, and every iteration
creates a new task with #pragma omp task.

4. Processing of Tasks:

Each task, represented by the block of code inside #pragma omp task, will be executed
asynchronously by any available thread in the parallel region.

5. Task Distribution:

Every output line indicates which thread processed which task; the example as a whole shows how
tasks are distributed among the threads.

6. Conclusion

With these modifications, the code now creates and executes tasks correctly inside a parallel
region. The #pragma omp single directive ensures that only one thread creates the tasks, while
#pragma omp taskwait ensures that all tasks are completed before exiting the parallel region.

General Array Assigning Program

#include <omp.h>
#include <stdio.h>

#define N 1000 // Example size of the array

int main() {
    int array[N];

    // Initialize the array with some values
    for (int i = 0; i < N; i++) {
        array[i] = i;
    }

    // Parallelize and vectorize the loop
    #pragma omp parallel for simd
    for (int i = 0; i < N; i++) {
        array[i] = array[i] * 2;
    }

    // Print the first 10 elements to verify
    for (int i = 0; i < 10; i++) {
        printf("array[%d] = %d\n", i, array[i]);
    }

    return 0;
}

Output

C:\Users\PC\Documents\HPC-College>a.exe

array[0] = 0

array[1] = 2

array[2] = 4

array[3] = 6

array[4] = 8
array[5] = 10

array[6] = 12

array[7] = 14

array[8] = 16

array[9] = 18

Explanation of Directives

1. #pragma omp parallel for:

This directive tells the compiler to parallelize the for loop across multiple threads. Each thread
handles a portion of the loop's iterations, and the iterations are distributed across the threads so
they execute concurrently.

2. #pragma omp simd:

This directive instructs the compiler to vectorize the loop using SIMD (Single Instruction,
Multiple Data). SIMD operations perform the same operation on many data points simultaneously
with a single instruction, which can greatly improve performance for vectorizable loops.
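For comparison, here is a minimal sketch (not part of the lab program; the arrays and names are
made up for illustration) of a loop vectorized with #pragma omp simd alone, on a single thread,
without creating any extra threads:

#include <stdio.h>

#define N 1000

int main() {
    float a[N], b[N], c[N];

    for (int i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 2 * i;
    }

    // Single thread, but the compiler may process several array
    // elements per instruction using SIMD registers.
    #pragma omp simd
    for (int i = 0; i < N; i++) {
        c[i] = a[i] + b[i];
    }

    printf("c[10] = %f\n", c[10]);
    return 0;
}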

How It Works

1. Parallelization:

The #pragma omp parallel for directive spreads the loop iterations over the available threads, so
many iterations may execute in parallel. The OpenMP runtime is responsible for spawning the
threads and distributing the work among them.

2. Vectorization:

The #pragma omp simd directive tells the compiler to produce SIMD instructions for the loop. This
lets the loop exploit the SIMD hardware of modern processors, such as the SSE or AVX instructions
on x86 architectures.

Example Use Case

With N set to 1000, the program creates an array and initializes it with the values 0 to 999. The
parallelized and vectorized loop then doubles every element of the array. Initialization: array[i] = i;
fills the array with the values 0 to 999. Loop execution: array[i] = array[i] * 2; doubles each element
of array[]. The loop is parallelized and vectorized for better performance.

Practical Considerations

1. Compiler Support:

Make sure your compiler supports OpenMP and SIMD directives. Most modern compilers,
such as GCC, Clang, and Intel's ICC, do.
2. Compiler Flags:

While compiling, use flags that enable OpenMP and optimization: -fopenmp turns OpenMP
on, and -O2 turns on optimizations, including SIMD vectorization. A typical GCC/Clang command
line is shown after this list.

3. Data Alignment:

Ensure your data is aligned in memory for SIMD operations. Some compilers and processors
need the data to be aligned on specific boundaries for the best SIMD performance; a small
alignment sketch is included after this list.
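A typical GCC/Clang command line for the array program (myprogram is a placeholder name):

gcc -fopenmp -O2 -o myprogram myprogram.c

And a hedged alignment sketch (it assumes C11's aligned_alloc is available and that 32-byte
alignment suits the target's SIMD registers; the names are made up for illustration):

#include <stdio.h>
#include <stdlib.h>

#define N 1000

int main(void) {
    // Allocate N doubles on a 32-byte boundary
    // (the requested size must be a multiple of the alignment).
    double *a = aligned_alloc(32, N * sizeof(double));
    if (a == NULL) return 1;

    for (int i = 0; i < N; i++) {
        a[i] = i;
    }

    // Tell the compiler the pointer is 32-byte aligned so it can
    // emit aligned SIMD loads and stores.
    #pragma omp simd aligned(a : 32)
    for (int i = 0; i < N; i++) {
        a[i] = a[i] * 2.0;
    }

    printf("a[10] = %f\n", a[10]);
    free(a);
    return 0;
}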

In the output shown earlier, each element has been doubled, which shows that the parallelized and
vectorized loop worked as intended.

General Example of Adding sum program

#include <stdio.h>

int main() {
    int i, sum = 0;

    for (i = 1; i <= 100; i++)
        sum = sum + i;

    printf("Sum is %d\n", sum);
    return 0;
}

Output

5050

#include <stdio.h>: Includes the standard input-output library necessary for using printf.

int main(): The entry point of the program. In C, main should return an int, so it's best to declare it
as int main().

int i, sum = 0;: Declares i for the loop counter and sum to accumulate the total.

for (i = 1; i <= 100; i++): A loop that iterates from 1 to 100.

sum = sum + i;: Adds the current value of i to sum in each iteration.

printf("Sum is %d\n", sum);: Prints the final sum after the loop finishes.

return 0;: Returns 0 to indicate that the program completed successfully.

Manual calculation formula: Sum = N(N+1)/2; for N = 100, Sum = (100 × 101)/2 = 5050.

Now computing the sum with parallel programming and threads. Example program:

#include <stdio.h>
#include <omp.h> // Include the OpenMP header for parallel programming

int main() {
    int sum = 0;   // Variable to hold the final sum
    int tsum[4];   // Array to hold partial sums from each thread
    int i;         // Loop index variable

    omp_set_num_threads(4); // Set the number of threads to be used in parallel regions

    #pragma omp parallel
    {
        int id = omp_get_thread_num(); // Get the unique ID of the current thread

        // Initialize the partial sum for each thread
        tsum[id] = 0;

        // Divide the loop iterations among the threads
        #pragma omp for
        for (i = 0; i <= 100; i++) {
            // Each thread adds its share of the numbers from 0 to 100
            tsum[id] += i;
        }
    }

    // After the parallel region, sum up the partial sums from each thread
    for (i = 0; i < 4; i++) {
        // Print the sum calculated by each thread
        printf("\nThe sum in thread id %d is %d", i, tsum[i]);

        // Add the partial sum of each thread to the total sum
        sum += tsum[i];
    }

    // Print the final sum, which is the total sum of numbers from 0 to 100
    printf("\nsum = %d\n", sum);

    return 0; // Return 0 to indicate successful completion
}

Output

The sum in thread id 0 is 325

The sum in thread id 1 is 950

The sum in thread id 2 is 1575

The sum in thread id 3 is 2200

sum=5050
Key Points of the Program

1. Initialization:

o int sum = 0; int tsum[4]; int i; declares the variables. sum will store the final sum of all
integers from 0 to 100, and tsum[4] is an array to store the partial sums from each thread.

2. OpenMP Setup:

o omp_set_num_threads(4); sets the number of threads to 4.

o #pragma omp parallel starts a parallel region with 4 threads.

o Each thread initializes its local tsum[id] to 0, where id is the thread number.

3. Parallel Loop:

o #pragma omp for divides the for-loop iterations among the threads. Each thread
calculates its partial sum in tsum[id].

4. Aggregation:

o After the parallel region, the main thread prints the partial sum computed by each
thread and aggregates these into sum. (An alternative using OpenMP's reduction clause is
sketched below.)
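For reference, a more concise alternative (a sketch, not the lab's required method) aggregates the
sum with OpenMP's reduction clause, which removes the need for the tsum array:

#include <stdio.h>
#include <omp.h>

int main() {
    int sum = 0;

    // reduction(+:sum) gives each thread a private copy of sum and
    // combines the copies automatically at the end of the loop.
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i <= 100; i++) {
        sum += i;
    }

    printf("sum = %d\n", sum); // prints 5050
    return 0;
}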
