High Performance Computing System (CSE 5154) RCS
1A. A 2-D mesh SIMD parallel computer system is available. An array of N elements is to
be partitioned among p processors in the mesh, where p < N. After partitioning, the
processors simultaneously sum the numbers allocated to them. Write a parallel
algorithm in which each processor sums its partition of the numbers and the final sum
is then obtained. Give the time complexity analysis of your algorithm. 4M
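One possible approach to 1A can be sketched as a sequential C emulation. It assumes p = q x q processors arranged as a square mesh, a block partition of ceil(N/p) elements per processor, and a reduction that moves partial sums along each row to column 0 and then up column 0 to processor (0, 0). The names mesh_sum and MAX_Q are hypothetical, and the partitioning and reduction order are assumptions rather than the expected model answer. Under these assumptions the local phase costs O(N/p) and the two reduction phases cost O(sqrt(p)) communication steps, giving O(N/p + sqrt(p)) overall.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_Q 16  /* hypothetical upper bound on the mesh side length q */

/* Emulates summing n values on a q x q mesh (p = q*q processors, p < n).
   Each loop over (r, c) stands for one step executed by all processors at once. */
long mesh_sum(const long *a, size_t n, size_t q)
{
    long local[MAX_Q][MAX_Q];
    size_t p = q * q;
    assert(q > 0 && q <= MAX_Q && p < n);

    size_t chunk = (n + p - 1) / p;  /* ceil(n / p) elements per processor */

    /* Phase 1: every processor sums its own block, O(n/p) time. */
    for (size_t r = 0; r < q; r++) {
        for (size_t c = 0; c < q; c++) {
            size_t lo = (r * q + c) * chunk;
            size_t hi = (lo + chunk < n) ? lo + chunk : n;
            local[r][c] = 0;
            for (size_t i = lo; i < hi; i++)
                local[r][c] += a[i];
        }
    }

    /* Phase 2: in every row, partial sums move left one hop per step,
       q - 1 steps in total; all rows work in parallel on a real mesh. */
    for (size_t s = q - 1; s >= 1; s--)
        for (size_t r = 0; r < q; r++)
            local[r][s - 1] += local[r][s];

    /* Phase 3: column 0 moves its sums up one hop per step, q - 1 steps. */
    for (size_t s = q - 1; s >= 1; s--)
        local[s - 1][0] += local[s][0];

    return local[0][0];  /* processor (0, 0) holds the final sum */
}
```

For example, calling mesh_sum on the ten values 1 to 10 with q = 3 uses nine emulated processors and returns 55.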
1B. With an appropriate diagram, explain the rotating daisy chain algorithm. How does the
algorithm handle device priority issues? 4M
1C. Identify the appropriate MPI function that meets the following requirement and
illustrate it with suitable examples:
All the processes collect data from all other processes in the same communicator
and perform an operation on the data. 2M
2A. Define a kernel in an OpenCL program. Write a kernel to find the square of each
element of an array and add to it the respective element of the original array. Also write
equivalent code for a multithreaded version of the same. 4M
2B. Write a CUDA kernel to add two matrices A and B of dimensions M x N. The kernel uses
only one block containing M CUDA threads. Also write a code snippet of the main
program showing how you read the two matrices and invoke the kernel to meet the
above specification. 3M
2C. In Question 2B, use two matrices, each of size 4 x 3, with appropriate sample elements.
Explain how your kernel views these matrices and write down the iterative steps
showing how it processes them. 3M
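The thread-to-data mapping that Questions 2B and 2C describe can be illustrated with a plain C emulation. This is not CUDA code: the hypothetical parameter tid stands in for threadIdx.x, and the outer loop in matrix_add_emulated replaces the single block of M threads that a real launch would run concurrently. It assumes row-major storage in flat arrays and that thread tid owns row tid and iterates over the N columns of that row, which is one reasonable reading of "one block, M threads".

```c
#include <assert.h>

/* Work done by one emulated thread: tid plays the role of threadIdx.x.
   The thread owns row tid and walks its n columns one at a time. */
void add_row(int tid, const int *A, const int *B, int *C, int m, int n)
{
    assert(tid >= 0 && tid < m);
    for (int col = 0; col < n; col++) {
        int idx = tid * n + col;  /* row-major offset of element (tid, col) */
        C[idx] = A[idx] + B[idx];
    }
}

/* Stand-in for launching one block of m threads; on a GPU these
   m calls would execute concurrently instead of one after another. */
void matrix_add_emulated(const int *A, const int *B, int *C, int m, int n)
{
    for (int tid = 0; tid < m; tid++)
        add_row(tid, A, B, C, m, n);
}
```

With m = 4 and n = 3, each of the four emulated threads performs three iterations: in iteration col, thread tid reads offset tid * 3 + col from A and B and writes the sum to the same offset in C, so all 12 elements are covered after three iterations per thread.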