Program Execution ExpFinal
Program 1
#include <stdio.h>
#include <omp.h>
int main() {
    #pragma omp parallel
    printf("Hello from thread %d\n", omp_get_thread_num()); // each thread prints its own ID
    return 0;
}
Output
C:\Users\PC\Documents\HPC-College>a.exe
Explanation:
• #include <omp.h>: This includes the OpenMP header file, which provides the declarations of the OpenMP
functions and directives.
• #pragma omp parallel: This directive tells the compiler to create a parallel region in which multiple
threads execute the code in the block.
• printf("Hello from thread %d\n", omp_get_thread_num());: Inside the parallel region, each thread prints a
message showing its own thread number.
• To compile this program, you need a compiler that supports OpenMP. With GCC, you must use the
-fopenmp flag. Here's how to do that:
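gcc -fopenmp hello.c -o a.exe
Here, hello.c stands for the source file (the name is only illustrative) and a.exe is the produced executable.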
Running:
• When you execute the compiled program, "Hello from thread X" is printed, where X is the thread
number assigned by OpenMP. How many threads are actually created depends on the environment
settings or on the default configuration.
• Environment Variables:
• The number of threads used in the parallel region can be influenced through the
OMP_NUM_THREADS environment variable. For instance:
export OMP_NUM_THREADS=4
Thread Safety:
The OpenMP #pragma omp parallel directive marks the part of your code that should run in parallel.
Below is an explanation of how the code works and what it does.
1. Parallel Execution:
The #pragma omp parallel directive serves primarily to create a parallel region in
which multiple threads can execute code concurrently. This means that the block of code following
the directive is executed simultaneously by several threads, making parallel processing
possible.
2. Thread Management:
Whenever #pragma omp parallel is encountered, OpenMP handles thread creation
and destruction automatically. You do not need to create or synchronize threads yourself; OpenMP
manages these issues.
3. Block of Code Execution:
Each thread executes the code within the parallel region independently. This enables the
programmer to split the work among several threads, which in most cases reduces the total
execution time of the program if the tasks can be run in parallel.
Here,
• printf is called from within this parallel region by all threads. Since omp_get_thread_num() returns
the ID of the thread that executes the current code, each thread prints a different ID.
Important Features
1.Number of Threads:
The number of threads in the parallel region is decided by the OpenMP runtime system. You
can override this number using environment variables like OMP_NUM_THREADS or
programmatically specify this with omp_set_num_threads().
2. Data Sharing:
Variables in a parallel region can either be private to each thread or shared between
threads. Variables are shared by default unless otherwise specified through additional OpenMP clauses,
for example private, firstprivate, and lastprivate. A small illustrative sketch of these clauses follows
after this list.
3. Synchronization:
The basic synchronization issues are handled implicitly by OpenMP. For more complex
synchronization needs, such as avoiding race conditions, you may need additional OpenMP
constructs such as critical sections or atomic operations (a second sketch after this list shows an
atomic update).
4. Scalability:
The #pragma omp parallel directive allows you to write parallel code that scales with
the number of available processors or cores. By using multiple cores, you can gain substantial
speedup in computational tasks.
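As referenced in point 2 above, here is a small illustrative sketch of the private and firstprivate clauses. It is not part of the original program, and the variable names are hypothetical:
#include <stdio.h>
#include <omp.h>
int main() {
    int x = 10, y = 10;
    #pragma omp parallel private(x) firstprivate(y) num_threads(2)
    {
        // x is private and starts uninitialized; y is private but copied from the original value 10
        x = omp_get_thread_num();
        printf("thread %d: x=%d y=%d\n", omp_get_thread_num(), x, y);
    }
    printf("after the region: x=%d y=%d\n", x, y); // the original x and y are unchanged (both 10)
    return 0;
}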
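And, as referenced in point 3, a second sketch (again not from the original program) showing how an atomic update avoids a race condition when every thread increments the same shared counter:
#include <stdio.h>
#include <omp.h>
int main() {
    int counter = 0;
    #pragma omp parallel for
    for (int i = 0; i < 1000; i++) {
        #pragma omp atomic      // the increment becomes an indivisible update
        counter++;
    }
    printf("counter = %d\n", counter); // always 1000; without atomic the result could be wrong
    return 0;
}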
If you want to explicitly set the number of threads, you can use:
#include <omp.h>
#include <stdio.h>
int main() {
    omp_set_num_threads(4); // request 4 threads for the following parallel region
    #pragma omp parallel
    printf("Hello from thread %d\n", omp_get_thread_num());
    return 0;
}
This example is modified to force the number of threads to 4 with the call to
omp_set_num_threads(). The output will now contain messages from 4 threads, provided the system
supports that many threads.
Conclusion
The #pragma omp parallel directive is one of the basic OpenMP constructs for parallelizing code
for execution on multiple threads; it exploits multi-core processors for a performance benefit.
Here, the modification restricts the region to 4 threads, so the program executes with exactly
4 threads when the system can provide them.
#include <omp.h>
#include <stdio.h>
int main() {
    #pragma omp parallel
    #pragma omp single
    { // Create tasks: one thread creates a task for every loop iteration
        for (int i = 0; i < 10; i++) {
            #pragma omp task firstprivate(i)
            { // Process task i: any available thread may execute this block
              printf("Task %d processed by thread %d\n", i, omp_get_thread_num()); }
        }
        #pragma omp taskwait // wait for all created tasks to complete
    }
    return 0;
}
Output
C:\Users\PC\Documents\HPC-College>a.exe
Explanation of Code
1. Parallel Region:
#pragma omp parallel: This is the directive that initiates a parallel region where multiple threads
are created.
2. Single Directive:
#pragma omp single: This ensures that only one thread in the whole parallel region executes the
enclosed code block. It is mainly used so that only one thread creates the tasks or performs the
initialization.
3. Task Creation:
The block inside #pragma omp single contains a for loop running from 0 to 9, and every iteration
creates a new task with #pragma omp task.
4. Processing of Tasks:
Each task, represented by the block of code inside #pragma omp task, will be executed
asynchronously by any available thread in the parallel region.
5. Every line of the output indicates which thread has processed which task; the output as a whole
shows how the tasks are distributed among the threads.
6. Conclusion
With these modifications, the code now creates and executes tasks correctly inside a parallel
region. The #pragma omp single directive ensures only one thread creates tasks, while #pragma omp
taskwait ensures that all the tasks are completed before the parallel region is exited.
#include <omp.h>
#include <stdio.h>
#define N 1000
int main() {
    int array[N];
    #pragma omp parallel for
    for (int i = 0; i < N; i++) { array[i] = i; }                 // initialization: fill with 0..N-1
    #pragma omp parallel for simd
    for (int i = 0; i < N; i++) { array[i] = array[i] * 2; }      // parallelized and vectorized doubling
    for (int i = 0; i < 10; i++) printf("array[%d] = %d\n", i, array[i]); // print the first 10 elements
    return 0;
}
Output
C:\Users\PC\Documents\HPC-College>a.exe
array[0] = 0
array[1] = 2
array[2] = 4
array[3] = 6
array[4] = 8
array[5] = 10
array[6] = 12
array[7] = 14
array[8] = 16
array[9] = 18
Explanation of Directives
1. #pragma omp parallel for:
This directive informs the compiler to parallelize the for loop across multiple threads. A portion
of the loop's iterations is handled by each thread, and the iterations are distributed across threads
for concurrent execution.
2. #pragma omp simd:
This directive instructs the compiler to vectorize the loop using SIMD (Single Instruction,
Multiple Data). SIMD operations perform the same operation on many data points simultaneously
with a single instruction, which can greatly improve performance for vectorizable operations.
How It Works
1. Parallelization:
The #pragma omp parallel for directive spreads the iterations of the loop over the available threads, so
many iterations may be executed in parallel. The OpenMP runtime is responsible for spawning
threads and distributing the parallel work among them.
2. Vectorization:
The #pragma omp simd directive tells the compiler that it should produce SIMD instructions for the
loop. It allows the loop to exploit a modern processor's SIMD hardware capabilities, such as SSE or
AVX instructions on x86 architectures.
Let's take N = 1000. We create an array and initialize it with the values 0 to 999; the following
parallelized and vectorized loop then doubles every element of the array.
Initialization: array[i] = i; fills the array with the values 0 to 999.
Loop execution: array[i] = array[i] * 2; doubles each element of array[]. The loop is parallelized and
vectorized for better performance.
Practical Considerations
1. Compiler Support:
Make sure your compiler supports OpenMP and SIMD directives. Most modern compilers,
such as GCC, Clang, and Intel's ICC, do.
2. Flags for the Compiler:
While compiling, use flags to enable OpenMP and optimization. With GNU GCC and Clang,
for instance, the -fopenmp flag turns OpenMP on, and -O2 turns on optimizations, including SIMD
vectorization.
3. Data Alignment:
Ensure that your data is aligned in memory for SIMD operations. Some compilers and processors
need the data to be aligned on certain boundaries for the best SIMD performance (a short sketch is
shown after the next paragraph).
In this output, it can be seen that each element has been doubled, which shows that the
parallelized and vectorized loop worked as designed.
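As an optional, hedged sketch of the alignment point above (not taken from the original program, and aligned_alloc may not be available on every toolchain): C11's aligned_alloc provides 32-byte-aligned storage, and the aligned clause on the simd construct lets the compiler assume that alignment.
#include <stdio.h>
#include <stdlib.h>
#define N 1000
int main() {
    int *a = aligned_alloc(32, N * sizeof(int)); // 4000 bytes, a multiple of the 32-byte alignment
    #pragma omp parallel for simd aligned(a:32)
    for (int i = 0; i < N; i++)
        a[i] = i * 2;                            // compiler may emit aligned SIMD loads/stores
    printf("a[10] = %d\n", a[10]);               // prints 20
    free(a);
    return 0;
}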
#include <stdio.h>
int main() {
    int i, sum = 0;
    for (i = 1; i <= 100; i++)
        sum = sum + i;
    printf("Sum is %d\n", sum);
    return 0;
}
Output
Sum is 5050
#include <stdio.h>: Includes the standard input-output library necessary for using printf.
int main(): The entry point of the program. In C, main should return an int, so it's best to declare it
as int main().
int i, sum = 0;: Declares i for the loop counter and sum to accumulate the total.
sum = sum + i;: Adds the current value of i to sum in each iteration.
printf("Sum is %d\n", sum);: Prints the final sum after the loop finishes.
Now computing the sum with parallel programming and threads > Example program
#include <stdio.h>
#include <omp.h>
int main() {
    int sum = 0, tsum[4], i;   // sum holds the final result; tsum[] holds per-thread partial sums
    #pragma omp parallel num_threads(4) private(i)
    {
        int id = omp_get_thread_num();
        tsum[id] = 0;          // each thread clears its own partial sum
        #pragma omp for
        for (i = 0; i <= 100; i++)
            tsum[id] += i; }
    // After the parallel region, sum up the partial sums from each thread
    for (i = 0; i < 4; i++) {  // print each thread's partial sum and add it to the total
        printf("tsum[%d] = %d\n", i, tsum[i]);
        sum += tsum[i]; }
    // Print the final sum which is the total sum of numbers from 0 to 100
    printf("sum=%d\n", sum);
    return 0;
}
Output
sum=5050
Key Points of the Program
1. Initialization:
o int sum=0, tsum[4], i; initializes the variables. sum will store the final sum of all
integers from 0 to 100. tsum[4] is an array to store partial sums from each thread.
2. OpenMP Setup:
o Each thread initializes its local tsum[id] to 0, where id is the thread number.
3. Parallel Loop:
o #pragma omp for divides the for-loop iterations among the threads. Each thread
calculates its partial sum in tsum[id].
4. Aggregation:
o After the parallel region, the main thread prints the partial sum computed by each
thread and aggregates these into sum.