
Seat No -

Total number of questions : 60

11342_High Performance Computing


Time : 1hr
Max Marks : 50
N.B

1) All questions are Multiple Choice Questions having single correct option.

2) Attempt any 50 questions out of 60.

3) Use of calculator is allowed.

4) Each question carries 1 Mark.

5) Specially abled students are allowed 20 minutes extra for examination.

6) Do not use pencils to darken answer.

7) Use only black/blue ball point pen to darken the appropriate circle.

8) No change will be allowed once the answer is marked on OMR Sheet.

9) Rough work shall not be done on OMR sheet or on question paper.

10) Darken ONLY ONE CIRCLE for each answer.

Q.no 1. MIPS stands for?

A : Mandatory Instructions/sec

B : Millions of Instructions/sec

C : Most of Instructions/sec

D : Many Instructions / sec

Q.no 2. Depth First Search is equivalent to which of the traversal in the Binary
Trees?

A : Pre-order Traversal

B : Post-order Traversal

C : Level-order Traversal

D : In-order Traversal
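
For illustration, a minimal C++ sketch (the Node struct is a hypothetical stand-in, not from the paper) of why depth-first search on a binary tree matches pre-order traversal: the node is visited before its left and right subtrees.

    #include <cstdio>

    struct Node { int val; Node *left, *right; };  // hypothetical tree node

    // DFS visits the node first, then recurses left, then right --
    // exactly the pre-order sequence of the binary tree.
    void dfs(const Node* n) {
        if (!n) return;
        std::printf("%d ", n->val);  // visit before children => pre-order
        dfs(n->left);
        dfs(n->right);
    }
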
Q.no 3. Regarding implementation of Breadth First Search using queues, what is
the maximum distance between two nodes present in the queue? (considering
each edge length 1)

A : Can be anything

B:0

C : At most 1

D : Insufficient Information
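
A sketch of the queue property this question tests, assuming an adjacency-list graph: during BFS the queue only ever holds nodes from two consecutive levels, so with unit edge lengths any two queued nodes are at most distance 1 apart.

    #include <queue>
    #include <vector>

    // BFS from src. At any instant the queue contains nodes of level d
    // followed by nodes of level d+1, hence the "at most 1" answer.
    void bfs(const std::vector<std::vector<int>>& adj, int src) {
        std::vector<bool> seen(adj.size(), false);
        std::queue<int> q;
        seen[src] = true;
        q.push(src);
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (int v : adj[u])
                if (!seen[v]) { seen[v] = true; q.push(v); }
        }
    }
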

Q.no 4. Calling a kernel is typically referred to as _________.

A : kernel thread

B : kernel initialization

C : kernel termination

D : kernel invocation
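
A minimal CUDA sketch of what this question (and the __global__ question later in the paper) describes; the kernel name addOne and the device pointer d_data are illustrative, not from the paper.

    // Kernel: declared with the __global__ qualifier, void return type.
    __global__ void addOne(int* data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1;   // one element per thread
    }

    // Host side, after allocating/copying d_data: calling the kernel with
    // the <<<grid, block>>> launch syntax is the "kernel invocation".
    // addOne<<<(n + 255) / 256, 256>>>(d_data, n);
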

Q.no 5. The decomposition technique in which the same function is applied several times is called _________

A : Data Decomposition

B : Recursive Decomposition

C : Speculative Decomposition

D : Exploratory Decomposition

Q.no 6. The decomposition technique in which the input data is divided is called _________

A : Data Decomposition

B : Recursive Decomposition

C : Speculative Decomposition

D : Exploratory Decomposition
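
A hedged sketch contrasting the two techniques asked about in Q.no 5 and Q.no 6, using array summation as an illustrative problem: data decomposition divides the input into chunks, while recursive decomposition applies the same function repeatedly to smaller subproblems.

    // Data decomposition: the input is divided; each task sums one chunk.
    long sum_chunk(const int* a, int lo, int hi) {
        long s = 0;
        for (int i = lo; i < hi; ++i) s += a[i];
        return s;
    }

    // Recursive decomposition: the same function is applied again and
    // again, halving the problem; each half becomes a new task.
    long sum_rec(const int* a, int lo, int hi) {
        if (hi - lo <= 1) return hi > lo ? a[lo] : 0;
        int mid = lo + (hi - lo) / 2;
        return sum_rec(a, lo, mid) + sum_rec(a, mid, hi);
    }
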

Q.no 7. Several instructions execute simultaneously in ________________

A : processing

B : parallel processing
C : serial processing

D : multitasking

Q.no 8. Which of the following is not a decomposition technique?

A : Data Decomposition

B : Recursive Decomposition

C : Serial Decomposition

D : Exploratory Decomposition

Q.no 9. How many attributes are required to characterize the message-passing paradigm?

A:2

B:4

C:6

D:8

Q.no 10. Which of the following is not an in-place sorting algorithm?

A : Selection sort

B : Heap sort

C : Quick Sort

D : Merge sort

Q.no 11. The time complexity of heap sort in worst case is

A : O(log n)

B : O(n)

C : O(nlogn)

D : O(n^2)

Q.no 12. Most message-passing programs are written using

A : the single program multiple data (SPMD) model.

B : the multiple program single data (MPSD) model

C : the single program single data (SPSD) model

D : the multiple program multiple data (MPMD) model

Q.no 13. Decomposition stands for

A : Dividing Problem statement

B : Dividing no of processors

C : Dividing number of tasks

D : Dividing number of operation

Q.no 14. Message-passing programs are often written using

A : symmetric paradigm

B : asymmetric paradigm

C : asynchronous paradigm

D : synchronous paradigm

Q.no 15. Which of the following is not a mapping technique?

A : Static Mapping

B : Dynamic Mapping

C : Hybrid Mapping

D : All of Above

Q.no 16. Which of the following is not a stable sorting algorithm?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 17. Which of the following is a type of HPC application?

A : Management

B : Media mass
C : Business

D : Science

Q.no 18. The kernel code is identified by the ________ qualifier with a void return type

A : _host_

B : __global__

C : _device_

D : void

Q.no 19. The time complexity of a quick sort algorithm which makes use of
median, found by an O(n) algorithm, as pivot element is

A : O(n^2)

B : O(nlogn)

C : O(n log(log n))

D : O(n)

Q.no 20. When is the Breadth First Search of a graph unique?

A : When the graph is a Binary Tree

B : When the graph is a Linked List

C : When the graph is a n-ary Tree

D : When the graph is a Ternary Tree

Q.no 21. Which of the following is not an application of Depth First Search?

A : For generating topological sort of a graph

B : For generating Strongly Connected Components of a directed graph

C : Detecting cycles in the graph

D : Peer to Peer Networks

Q.no 22. The logical view of a machine supporting the message-passing paradigm
consists of p processes, each with its own _______

A : Partitioned Address space


B : Exclusive address space

C : Logical Address Space

D : Non-shared Address Space

Q.no 23. Which one of the following is not shared by threads?

A : program counter

B : stack

C : both program counter and stack

D : none of the mentioned

Q.no 24. Which of the following is a stable sorting algorithm?

A : Merge sort

B : Typical in-place quick sort

C : Heap sort

D : Selection sort

Q.no 25. In ………………. only one process at a time is allowed into its critical
section, among all processes that have critical sections for the same resource.

A : Mutual Exclusion

B : Synchronization

C : Deadlock

D : Starvation
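
A minimal C++ sketch of mutual exclusion (the names here are illustrative): the mutex guarantees that only one thread at a time runs the critical section for the shared resource.

    #include <mutex>

    std::mutex m;            // guards the shared resource
    int shared_counter = 0;

    void increment() {
        std::lock_guard<std::mutex> lock(m);  // acquire the lock
        ++shared_counter;                     // critical section
    }                                         // released on scope exit
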

Q.no 26. In cloud computing, we have an Internet cloud of resources that forms

A : Centralized computing

B : Decentralized computing

C : Parallel computing

D : All of Above

Q.no 27. Cloud computing offers a broader concept than which of the following?

A : Parallel computing
B : Centralized computing

C : Utility computing

D : Decentralized computing

Q.no 28. Writing parallel programs is referred to as

A : Parallel computation

B : Parallel processes

C : Parallel development

D : Parallel programming

Q.no 29. Network interfaces allow the transfer of messages from buffer memory
to the desired location without ____ intervention

A : DMA

B : CPU

C : I/O

D : Memory

Q.no 30. Consider a situation in which the assignment operation is very costly.
Which of the following sorting algorithms should be used so that the number of
assignment operations is minimized in general?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 31. A process can be ___________

A : single threaded

B : multithreaded

C : both single threaded and multithreaded

D : none of the mentioned

Q.no 32. High-performance computing tasks of a computer system are done by
A : node clusters

B : network clusters

C : both a and b

D : Beowulf clusters

Q.no 33. Interprocessor communication takes place via

A : Centralized memory

B : Shared memory

C : Message passing

D : Both A and B

Q.no 34. Which of the following is not a noncomparison sort?

A : Counting sort

B : Bucket sort

C : Radix sort

D : Shell sort

Q.no 35. Parallel computing uses _____ execution

A : sequential

B : unique

C : simultaneous

D : none of the answers is correct

Q.no 36. What happens when the event for which a thread is blocked occurs?

A : thread moves to the ready queue

B : thread remains blocked

C : thread completes

D : a new thread is provided

Q.no 37. Which of the following is NOT a characteristic of parallel computing?


A : Breaks a task into pieces

B : Uses a single processor or computer

C : Simultaneous execution

D : May use networking

Q.no 38. _____ are major issues with non-buffered blocking sends

A : concurrency and mutual exclusion

B : Idling and deadlocks

C : synchronization

D : scheduling

Q.no 39. If the given input array is sorted or nearly sorted, which of the following
algorithms gives the best performance?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort
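
A short sketch of why insertion sort wins on sorted or nearly sorted input: the inner shifting loop exits almost immediately, so the total work approaches O(n).

    // Insertion sort. On nearly sorted input the while-loop body rarely
    // runs, which is why this algorithm is the keyed answer here.
    void insertion_sort(int* a, int n) {
        for (int i = 1; i < n; ++i) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {  // shift larger elements right
                a[j + 1] = a[j];
                --j;
            }
            a[j + 1] = key;
        }
    }
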

Q.no 40. Message passing system allows processes to __________

A : communicate with one another without resorting to shared data

B : communicate with one another by resorting to shared data

C : share data

D : name the recipient or sender of the message

Q.no 41. ______________ leads to concurrency.

A : Serialization

B : Parallelism

C : Serial processing

D : Distribution
Q.no 42. The time required to create a new thread in an existing process is
___________

A : greater than the time required to create a new process

B : less than the time required to create a new process

C : equal to the time required to create a new process

D : none of the mentioned

Q.no 43. RMI stands for?

A : Remote Mail Invocation

B : Remaining Method Invention

C : Remaining Method Invocation

D : Remote Method Invocation

Q.no 44. A dynamic network of networks, a dynamic connection that grows, is called

A : Multithreading

B : Cyber cycle

C : Internet of things

D : None of these

Q.no 45. If one thread opens a file with read privileges then ___________

A : other threads in another process can also read from that file

B : other threads in the same process can also read from that file

C : any other thread cannot read from that file

D : all of the mentioned

Q.no 46. The basic operations in the message-passing programming paradigm are
___

A : initiate and listen

B : wait and acknowledge

C : request and reply


D : send and receive
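
A minimal sketch of the two basic operations using the standard MPI C API (assumes an MPI installation and a run with at least two ranks): rank 0 sends one integer and rank 1 receives it.

    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, x = 42;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0)        // basic operation #1: send
            MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)   // basic operation #2: receive
            MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        MPI_Finalize();
        return 0;
    }
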

Q.no 47. What is Inter process communication?

A : allows processes to communicate and synchronize their actions when using the
same address space

B : allows processes to communicate and synchronize their actions without using the
same address space

C : allows the processes to only synchronize their actions without communication

D : none of the mentioned

Q.no 48. Which process for ceramic components becomes easier through nanostructuring?

A : Lubrication

B : Coating

C : Fabrication

D : Wear

Q.no 49. Execution of several activities at the same time.

A : multi processing

B : parallel processing

C : serial processing

D : multitasking

Q.no 50. It is ___________ speed and ___________ latency.

A : High, high

B : Low, low

C : High, low

D : Low, high

Q.no 51. Process synchronization of programs is done by

A : input

B : output
C : operating system

D : memory

Q.no 52. The management of data flow between computers or devices or between
nodes in a network is called

A : Flow control

B : Data Control

C : Data Management

D : Flow Management

Q.no 53. Which of the following are TRUE for direct communication?

A : A communication link can be associated with N processes (N = max. number
of processes supported by the system)

B : A communication link can be associated with exactly two processes

C : Exactly N/2 links exist between each pair of processes (N = max. number of
processes supported by the system)

D : Exactly two links exist between each pair of processes

Q.no 54. Thread synchronization is required because ___________

A : all threads of a process share the same address space

B : all threads of a process share the same global variables

C : all threads of a process can share the same files

D : all of the mentioned

Q.no 55. Which of the following two operations are provided by the IPC facility?

A : write & delete message

B : delete & receive message

C : send & delete message

D : receive & send message

Q.no 56. Which of the following is not a possible way of data exchange?

A : Simplex
B : Multiplex

C : Half-duplex

D : Full-duplex

Q.no 57. Which of the following algorithms has lowest worst case time
complexity?

A : Insertion sort

B : Selection sort

C : Quick Sort

D : Heap sort
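
A sketch of the keyed answer using the C++ standard heap primitives: heap sort stays O(n log n) in the worst case, unlike quicksort's O(n^2).

    #include <algorithm>
    #include <vector>

    // Build a max-heap in O(n), then repeatedly move the maximum to the
    // end in O(n log n) total -- no quadratic worst case.
    void heap_sort(std::vector<int>& a) {
        std::make_heap(a.begin(), a.end());
        std::sort_heap(a.begin(), a.end());
    }
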

Q.no 58. A thread shares its resources (like data section, code section, open files,
signals) with ___________

A : other process similar to the one that the thread belongs to

B : other threads that belong to similar processes

C : other threads that belong to the same process

D : all of the mentioned

Q.no 59. The parallelism achieved on the basis of conditions is called

A : Instruction level

B : Thread level

C : Transaction level

D : None of Above

Q.no 60. The register context and stacks of a thread are deallocated when the
thread?

A : terminates

B : blocks

C : unblocks

D : spawns
Answer for Question No 1. is b

Answer for Question No 2. is a

Answer for Question No 3. is c

Answer for Question No 4. is d

Answer for Question No 5. is b

Answer for Question No 6. is a

Answer for Question No 7. is b

Answer for Question No 8. is c

Answer for Question No 9. is a

Answer for Question No 10. is d

Answer for Question No 11. is c

Answer for Question No 12. is c

Answer for Question No 13. is a

Answer for Question No 14. is c

Answer for Question No 15. is d

Answer for Question No 16. is b


Answer for Question No 17. is d

Answer for Question No 18. is b

Answer for Question No 19. is b

Answer for Question No 20. is b

Answer for Question No 21. is d

Answer for Question No 22. is b

Answer for Question No 23. is c

Answer for Question No 24. is a

Answer for Question No 25. is a

Answer for Question No 26. is d

Answer for Question No 27. is c

Answer for Question No 28. is d

Answer for Question No 29. is b

Answer for Question No 30. is b

Answer for Question No 31. is c

Answer for Question No 32. is d


Answer for Question No 33. is d

Answer for Question No 34. is d

Answer for Question No 35. is c

Answer for Question No 36. is a

Answer for Question No 37. is a

Answer for Question No 38. is b

Answer for Question No 39. is b

Answer for Question No 40. is a

Answer for Question No 41. is b

Answer for Question No 42. is b

Answer for Question No 43. is d

Answer for Question No 44. is c

Answer for Question No 45. is b

Answer for Question No 46. is d

Answer for Question No 47. is b

Answer for Question No 48. is c


Answer for Question No 49. is b

Answer for Question No 50. is c

Answer for Question No 51. is c

Answer for Question No 52. is a

Answer for Question No 53. is b

Answer for Question No 54. is d

Answer for Question No 55. is d

Answer for Question No 56. is b

Answer for Question No 57. is d

Answer for Question No 58. is c

Answer for Question No 59. is b

Answer for Question No 60. is a


Seat No -
Total number of questions : 60

11342_High Performance Computing


Time : 1hr
Max Marks : 50
N.B

1) All questions are Multiple Choice Questions having single correct option.

2) Attempt any 50 questions out of 60.

3) Use of calculator is allowed.

4) Each question carries 1 Mark.

5) Specially abled students are allowed 20 minutes extra for examination.

6) Do not use pencils to darken answer.

7) Use only black/blue ball point pen to darken the appropriate circle.

8) No change will be allowed once the answer is marked on OMR Sheet.

9) Rough work shall not be done on OMR sheet or on question paper.

10) Darken ONLY ONE CIRCLE for each answer.

Q.no 1. The kernel code is identified by the ________ qualifier with a void return type

A : _host_

B : __global__

C : _device_

D : void

Q.no 2. Regarding implementation of Breadth First Search using queues, what is


the maximum distance between two nodes present in the queue? (considering
each edge length 1)

A : Can be anything

B:0

C : At most 1

D : Insufficient Information
Q.no 3. The time complexity of a quick sort algorithm which makes use of median,
found by an O(n) algorithm, as pivot element is

A : O(n^2)

B : O(nlogn)

C : O(n log(log n))

D : O(n)

Q.no 4. Message-passing programs are often written using

A : symmetric paradigm

B : asymmetric paradigm

C : asynchronous paradigm

D : synchronous paradigm

Q.no 5. Which of the following is a type of HPC application?

A : Management

B : Media mass

C : Business

D : Science

Q.no 6. Which of the following is not a decomposition technique?

A : Data Decomposition

B : Recursive Decomposition

C : Serial Decomposition

D : Exploratory Decomposition

Q.no 7. Most message-passing programs are written using

A : the single program multiple data (SPMD) model.

B : the multiple program single data (MPSD) model

C : the single program single data (SPSD) model

D : the multiple program multiple data (MPMD) model


Q.no 8. The decomposition technique in which the input data is divided is called _________

A : Data Decomposition

B : Recursive Decomposition

C : Speculative Decomposition

D : Exploratory Decomposition

Q.no 9. The logical view of a machine supporting the message-passing paradigm
consists of p processes, each with its own _______

A : Partitioned Address space

B : Exclusive address space

C : Logical Address Space

D : Non-shared Address Space

Q.no 10. Which of the following is not a mapping technique?

A : Static Mapping

B : Dynamic Mapping

C : Hybrid Mapping

D : All of Above

Q.no 11. How many attributes are required to characterize the message-passing paradigm?

A:2

B:4

C:6

D:8

Q.no 12. Which of the following is a stable sorting algorithm?

A : Merge sort

B : Typical in-place quick sort

C : Heap sort
D : Selection sort

Q.no 13. The time complexity of heap sort in worst case is

A : O(log n)

B : O(n)

C : O(nlogn)

D : O(n^2)

Q.no 14. Decomposition stands for

A : Dividing Problem statement

B : Dividing no of processors

C : Dividing number of tasks

D : Dividing number of operation

Q.no 15. In ………………. only one process at a time is allowed into its critical
section, among all processes that have critical sections for the same resource.

A : Mutual Exclusion

B : Synchronization

C : Deadlock

D : Starvation

Q.no 16. Which one of the following is not shared by threads?

A : program counter

B : stack

C : both program counter and stack

D : none of the mentioned

Q.no 17. Calling a kernel is typically referred to as _________.

A : kernel thread

B : kernel initialization

C : kernel termination
D : kernel invocation

Q.no 18. Depth First Search is equivalent to which of the traversal in the Binary
Trees?

A : Pre-order Traversal

B : Post-order Traversal

C : Level-order Traversal

D : In-order Traversal

Q.no 19. Which of the following is not an application of Breadth First Search?

A : Finding shortest path between two nodes

B : Finding bipartiteness of a graph

C : GPS navigation system

D : Path Finding

Q.no 20. When is the Breadth First Search of a graph unique?

A : When the graph is a Binary Tree

B : When the graph is a Linked List

C : When the graph is a n-ary Tree

D : When the graph is a Ternary Tree

Q.no 21. Several instructions execute simultaneously in ________________

A : processing

B : parallel processing

C : serial processing

D : multitasking

Q.no 22. Which of the following is not an in-place sorting algorithm?

A : Selection sort

B : Heap sort

C : Quick Sort
D : Merge sort

Q.no 23. Which of the following is not a stable sorting algorithm?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 24. MIPS stands for?

A : Mandatory Instructions/sec

B : Millions of Instructions/sec

C : Most of Instructions/sec

D : Many Instructions / sec

Q.no 25. The decomposition technique in which the same function is applied several times is called _________

A : Data Decomposition

B : Recursive Decomposition

C : Speculative Decomposition

D : Exploratory Decomposition

Q.no 26. Time complexity of bubble sort in best case is

A : θ (n)

B : θ (nlogn)

C : θ (n^2)

D : θ (n(logn)^2)
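
A sketch of where the θ(n) best case comes from: bubble sort with an early-exit flag finishes after a single swap-free pass over already sorted input.

    #include <utility>

    void bubble_sort(int* a, int n) {
        for (int i = 0; i < n - 1; ++i) {
            bool swapped = false;
            for (int j = 0; j < n - 1 - i; ++j)
                if (a[j] > a[j + 1]) {
                    std::swap(a[j], a[j + 1]);
                    swapped = true;
                }
            if (!swapped) break;  // sorted input: one pass, theta(n)
        }
    }
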

Q.no 27. A process can be ___________

A : single threaded

B : multithreaded

C : both single threaded and multithreaded


D : none of the mentioned

Q.no 28. Message passing system allows processes to __________

A : communicate with one another without resorting to shared data

B : communicate with one another by resorting to shared data

C : share data

D : name the recipient or sender of the message

Q.no 29. Network interfaces allow the transfer of messages from buffer memory
to the desired location without ____ intervention

A : DMA

B : CPU

C : I/O

D : Memory

Q.no 30. Interprocessor communication takes place via

A : Centralized memory

B : Shared memory

C : Message passing

D : Both A and B

Q.no 31. Nanoscience can be studied with the help of ___________

A : Quantum mechanics

B : Newtonian mechanics

C : Macro-dynamic

D : Geophysics

Q.no 32. Which network topology is used for an interconnection network?

A : Bus based

B : Mesh

C : Linear Array
D : All of above

Q.no 33. The time required to create a new thread in an existing process is
___________

A : greater than the time required to create a new process

B : less than the time required to create a new process

C : equal to the time required to create a new process

D : none of the mentioned

Q.no 34. Which of the following is NOT a characteristic of parallel computing?

A : Breaks a task into pieces

B : Uses a single processor or computer

C : Simultaneous execution

D : May use networking

Q.no 35. ______________ leads to concurrency.

A : Serialization

B : Parallelism

C : Serial processing

D : Distribution

Q.no 36. A dynamic network of networks, a dynamic connection that grows, is called

A : Multithreading

B : Cyber cycle

C : Internet of things

D : None of these

Q.no 37. Running merge sort on an array of size n which is already sorted is

A : O(n)

B : O(nlogn)
C : O(n^2)

D : O(log n)

Q.no 38. Cloud computing offers a broader concept than which of the following?

A : Parallel computing

B : Centralized computing

C : Utility computing

D : Decentralized computing

Q.no 39. Which process for ceramic components becomes easier through nanostructuring?

A : Lubrication

B : Coating

C : Fabrication

D : Wear

Q.no 40. What is Inter process communication?

A : allows processes to communicate and synchronize their actions when using the
same address space

B : allows processes to communicate and synchronize their actions without using the
same address space

C : allows the processes to only synchronize their actions without communication

D : none of the mentioned

Q.no 41. What happens when the event for which a thread is blocked occurs?

A : thread moves to the ready queue

B : thread remains blocked

C : thread completes

D : a new thread is provided

Q.no 42. Execution of several activities at the same time.

A : multi processing
B : parallel processing

C : serial processing

D : multitasking

Q.no 43. Which of the following is not a noncomparison sort?

A : Counting sort

B : Bucket sort

C : Radix sort

D : Shell sort

Q.no 44. Writing parallel programs is referred to as

A : Parallel computation

B : Parallel processes

C : Parallel development

D : Parallel programming

Q.no 45. High-performance computing tasks of a computer system are done by

A : node clusters

B : network clusters

C : both a and b

D : Beowulf clusters

Q.no 46. Parallel computing uses _____ execution

A : sequential

B : unique

C : simultaneous

D : none of the answers is correct

Q.no 47. Consider a situation in which the assignment operation is very costly.
Which of the following sorting algorithms should be used so that the number of
assignment operations is minimized in general?
A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 48. _____ are major issues with non-buffered blocking sends

A : concurrency and mutual exclusion

B : Idling and deadlocks

C : synchronization

D : scheduling

Q.no 49. In cloud computing, we have an Internet cloud of resources that forms

A : Centralized computing

B : Decentralized computing

C : Parallel computing

D : All of Above

Q.no 50. If the given input array is sorted or nearly sorted, which of the following
algorithms gives the best performance?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 51. The link between two processes P and Q to send and receive messages is
called __________

A : communication link

B : message-passing link

C : synchronization link

D : all of the mentioned


Q.no 52. A dynamic network, a dynamic connection that grows, is called

A : Multithreading

B : Cyber cycle

C : Internet of things

D : Cyber-physical system

Q.no 53. The amount of data that can be carried from one point to another in a
given time period is called

A : Scope

B : Capacity

C : Bandwidth

D : Limitation

Q.no 54. Octa-core processors are processors of a computer system that contain

A : 2 processors

B : 4 processors

C : 6 processors

D : 8 processors

Q.no 55. Given a number of elements in the range [0, n^3], which of the following
sorting algorithms can sort them in O(n) time?

A : Counting sort

B : Bucket sort

C : Radix sort

D : Quick sort
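
A sketch of why radix sort is the keyed answer: in base n, every value below n^3 has at most three digits, so three stable counting-sort passes sort the array in O(n) total.

    #include <vector>

    // LSD radix sort in base n (n = a.size(), assumed >= 2).
    void radix_sort_cube(std::vector<long>& a) {
        const long n = static_cast<long>(a.size());
        if (n < 2) return;
        std::vector<long> out(n);
        for (long exp = 1; exp < n * n * n; exp *= n) {        // 3 passes
            std::vector<long> cnt(n, 0);
            for (long v : a) ++cnt[(v / exp) % n];             // count digits
            for (long d = 1; d < n; ++d) cnt[d] += cnt[d - 1]; // prefix sums
            for (long i = n - 1; i >= 0; --i)                  // stable scatter
                out[--cnt[(a[i] / exp) % n]] = a[i];
            a.swap(out);
        }
    }
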

Q.no 56. Termination of the process terminates ___________

A : first thread of the process

B : first two threads of the process

C : all threads within the process


D : no thread within the process

Q.no 57. The register context and stacks of a thread are deallocated when the
thread?

A : terminates

B : blocks

C : unblocks

D : spawns

Q.no 58. Which of the following two operations are provided by the IPC facility?

A : write & delete message

B : delete & receive message

C : send & delete message

D : receive & send message

Q.no 59. Which of the following is not a possible way of data exchange?

A : Simplex

B : Multiplex

C : Half-duplex

D : Full-duplex

Q.no 60. The parallelism achieved on the basis of operations is called

A : Instruction level

B : Thread level

C : Transaction level

D : None of Above
Answer for Question No 1. is b

Answer for Question No 2. is c

Answer for Question No 3. is b

Answer for Question No 4. is c

Answer for Question No 5. is d

Answer for Question No 6. is c

Answer for Question No 7. is c

Answer for Question No 8. is a

Answer for Question No 9. is b

Answer for Question No 10. is d

Answer for Question No 11. is a

Answer for Question No 12. is a

Answer for Question No 13. is c

Answer for Question No 14. is a

Answer for Question No 15. is a

Answer for Question No 16. is c


Answer for Question No 17. is d

Answer for Question No 18. is a

Answer for Question No 19. is d

Answer for Question No 20. is b

Answer for Question No 21. is b

Answer for Question No 22. is d

Answer for Question No 23. is b

Answer for Question No 24. is b

Answer for Question No 25. is b

Answer for Question No 26. is a

Answer for Question No 27. is c

Answer for Question No 28. is a

Answer for Question No 29. is b

Answer for Question No 30. is d

Answer for Question No 31. is a

Answer for Question No 32. is d


Answer for Question No 33. is b

Answer for Question No 34. is a

Answer for Question No 35. is b

Answer for Question No 36. is c

Answer for Question No 37. is b

Answer for Question No 38. is c

Answer for Question No 39. is c

Answer for Question No 40. is b

Answer for Question No 41. is a

Answer for Question No 42. is b

Answer for Question No 43. is d

Answer for Question No 44. is d

Answer for Question No 45. is d

Answer for Question No 46. is c

Answer for Question No 47. is b

Answer for Question No 48. is b


Answer for Question No 49. is d

Answer for Question No 50. is b

Answer for Question No 51. is a

Answer for Question No 52. is c

Answer for Question No 53. is c

Answer for Question No 54. is d

Answer for Question No 55. is c

Answer for Question No 56. is c

Answer for Question No 57. is a

Answer for Question No 58. is d

Answer for Question No 59. is b

Answer for Question No 60. is c


Seat No -
Total number of questions : 60

11342_High Performance Computing


Time : 1hr
Max Marks : 50
N.B

1) All questions are Multiple Choice Questions having single correct option.

2) Attempt any 50 questions out of 60.

3) Use of calculator is allowed.

4) Each question carries 1 Mark.

5) Specially abled students are allowed 20 minutes extra for examination.

6) Do not use pencils to darken answer.

7) Use only black/blue ball point pen to darken the appropriate circle.

8) No change will be allowed once the answer is marked on OMR Sheet.

9) Rough work shall not be done on OMR sheet or on question paper.

10) Darken ONLY ONE CIRCLE for each answer.

Q.no 1. The time complexity of a quick sort algorithm which makes use of median,
found by an O(n) algorithm, as pivot element is

A : O(n^2)

B : O(nlogn)

C : O(n log(log n))

D : O(n)

Q.no 2. Which of the following is not a stable sorting algorithm?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort
Q.no 3. Which one of the following is not shared by threads?

A : program counter

B : stack

C : both program counter and stack

D : none of the mentioned

Q.no 4. Which of the following is not a mapping technique?

A : Static Mapping

B : Dynamic Mapping

C : Hybrid Mapping

D : All of Above

Q.no 5. MIPS stands for?

A : Mandatory Instructions/sec

B : Millions of Instructions/sec

C : Most of Instructions/sec

D : Many Instructions / sec

Q.no 6. Message-passing programs are often written using

A : symmetric paradigm

B : asymmetric paradigm

C : asynchronous paradigm

D : synchronous paradigm

Q.no 7. Which of the following is not a decomposition technique?

A : Data Decomposition

B : Recursive Decomposition

C : Serial Decomposition

D : Exploratory Decomposition
Q.no 8. The time complexity of heap sort in worst case is

A : O(log n)

B : O(n)

C : O(nlogn)

D : O(n^2)

Q.no 9. Calling a kernel is typically referred to as _________.

A : kernel thread

B : kernel initialization

C : kernel termination

D : kernel invocation

Q.no 10. Which of the following is not an application of Breadth First Search?

A : Finding shortest path between two nodes

B : Finding bipartiteness of a graph

C : GPS navigation system

D : Path Finding

Q.no 11. Which of the following is not an application of Depth First Search?

A : For generating topological sort of a graph

B : For generating Strongly Connected Components of a directed graph

C : Detecting cycles in the graph

D : Peer to Peer Networks

Q.no 12. The logical view of a machine supporting the message-passing paradigm
consists of p processes, each with its own _______

A : Partitioned Address space

B : Exclusive address space

C : Logical Adress Space

D : Non shared Adress Space


Q.no 13. Regarding implementation of Breadth First Search using queues, what is
the maximum distance between two nodes present in the queue? (considering
each edge length 1)

A : Can be anything

B:0

C : At most 1

D : Insufficient Information

Q.no 14. In ………………. only one process at a time is allowed into its critical
section, among all processes that have critical sections for the same resource.

A : Mutual Exclusion

B : Synchronization

C : Deadlock

D : Starvation

Q.no 15. Depth First Search is equivalent to which of the traversal in the Binary
Trees?

A : Pre-order Traversal

B : Post-order Traversal

C : Level-order Traversal

D : In-order Traversal

Q.no 16. Most message-passing programs are written using

A : the single program multiple data (SPMD) model.

B : the multiple program single data (MPSD) model

C : the single program single data (SPSD) model

D : the multiple program multiple data (MPMD) model

Q.no 17. Decomposition stands for

A : Dividing Problem statement

B : Dividing no of processors
C : Dividing number of tasks

D : Dividing number of operation

Q.no 18. The decomposition technique in which the same function is applied several times is called _________

A : Data Decomposition

B : Recursive Decomposition

C : Speculative Decomposition

D : Exploratory Decomposition

Q.no 19. The decomposition technique in which the input data is divided is called _________

A : Data Decomposition

B : Recursive Decomposition

C : Speculative Decomposition

D : Exploratory Decomposition

Q.no 20. Which of the following is a stable sorting algorithm?

A : Merge sort

B : Typical in-place quick sort

C : Heap sort

D : Selection sort

Q.no 21. The kernel code is identified by the ________ qualifier with a void return type

A : _host_

B : __global__

C : _device_

D : void

Q.no 22. When is the Breadth First Search of a graph unique?

A : When the graph is a Binary Tree


B : When the graph is a Linked List

C : When the graph is a n-ary Tree

D : When the graph is a Ternary Tree

Q.no 23. Which of the following is not an in-place sorting algorithm?

A : Selection sort

B : Heap sort

C : Quick Sort

D : Merge sort

Q.no 24. Several instructions execute simultaneously in ________________

A : processing

B : parallel processing

C : serial processing

D : multitasking

Q.no 25. How many attributes are required to characterize the message-passing paradigm?

A:2

B:4

C:6

D:8

Q.no 26. A dynamic network of networks, a dynamic connection that grows, is called

A : Multithreading

B : Cyber cycle

C : Internet of things

D : None of these

Q.no 27. Nanoscience can be studied with the help of ___________


A : Quantum mechanics

B : Newtonian mechanics

C : Macro-dynamic

D : Geophysics

Q.no 28. Time complexity of bubble sort in best case is

A : θ (n)

B : θ (nlogn)

C : θ (n^2)

D : θ (n(logn)^2)

Q.no 29. The time required to create a new thread in an existing process is
___________

A : greater than the time required to create a new process

B : less than the time required to create a new process

C : equal to the time required to create a new process

D : none of the mentioned

Q.no 30. ______________ leads to concurrency.

A : Serialization

B : Parallelism

C : Serial processing

D : Distribution

Q.no 31. Interprocessor communication takes place via

A : Centralized memory

B : Shared memory

C : Message passing

D : Both A and B

Q.no 32. RMI stands for?


A : Remote Mail Invocation

B : Remaining Method Invention

C : Remaining Method Invocation

D : Remote Method Invocation

Q.no 33. Which of the following is not a noncomparison sort?

A : Counting sort

B : Bucket sort

C : Radix sort

D : Shell sort

Q.no 34. What is Inter process communication?

A : allows processes to communicate and synchronize their actions when using the
same address space

B : allows processes to communicate and synchronize their actions without using the
same address space

C : allows the processes to only synchronize their actions without communication

D : none of the mentioned

Q.no 35. _____ are major issues with non-buffered blocking sends

A : concurrency and mutual exclusion

B : Idling and deadlocks

C : synchronization

D : scheduling

Q.no 36. Which process for ceramic components becomes easier through nanostructuring?

A : Lubrication

B : Coating

C : Fabrication

D : Wear
Q.no 37. Parallel computing uses _____ execution

A : sequential

B : unique

C : simultaneous

D : none of the answers is correct

Q.no 38. A process can be ___________

A : single threaded

B : multithreaded

C : both single threaded and multithreaded

D : none of the mentioned

Q.no 39. Which of the following is NOT a characteristic of parallel computing?

A : Breaks a task into pieces

B : Uses a single processor or computer

C : Simultaneous execution

D : May use networking

Q.no 40. If the given input array is sorted or nearly sorted, which of the following
algorithms gives the best performance?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 41. It is ___________ speed and ___________ latency.

A : High, high

B : Low, low

C : High, low

D : Low, high
Q.no 42. Consider a situation in which the assignment operation is very costly.
Which of the following sorting algorithms should be used so that the number of
assignment operations is minimized in general?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 43. Running merge sort on an array of size n which is already sorted is

A : O(n)

B : O(nlogn)

C : O(n^2)

D : O(log n)

Q.no 44. Message passing system allows processes to __________

A : communicate with one another without resorting to shared data

B : communicate with one another by resorting to shared data

C : share data

D : name the recipient or sender of the message

Q.no 45. If one thread opens a file with read privileges then ___________

A : other threads in another process can also read from that file

B : other threads in the same process can also read from that file

C : any other thread cannot read from that file

D : all of the mentioned

Q.no 46. High-performance computing tasks of a computer system are done by

A : node clusters

B : network clusters

C : both a and b
D : Beowulf clusters

Q.no 47. The basic operations in the message-passing programming paradigm are
___

A : initiate and listen

B : wait and acknowledge

C : request and reply

D : send and receive

Q.no 48. In cloud computing, we have an Internet cloud of resources that forms

A : Centralized computing

B : Decentralized computing

C : Parallel computing

D : All of Above

Q.no 49. Cloud computing offers a broader concept than which of the following?

A : Parallel computing

B : Centralized computing

C : Utility computing

D : Decentralized computing

Q.no 50. Writing parallel programs is referred to as

A : Parallel computation

B : Parallel processes

C : Parallel development

D : Parallel programming

Q.no 51. Which of the following is not a possible way of data exchange?

A : Simplex

B : Multiplex

C : Half-duplex
D : Full-duplex

Q.no 52. A thread shares its resources (like data section, code section, open files,
signals) with ___________

A : other process similar to the one that the thread belongs to

B : other threads that belong to similar processes

C : other threads that belong to the same process

D : all of the mentioned

Q.no 53. Thread synchronization is required because ___________

A : all threads of a process share the same address space

B : all threads of a process share the same global variables

C : all threads of a process can share the same files

D : all of the mentioned

Q.no 54. The parallelism achieved on the basis of operations is called

A : Instruction level

B : Thread level

C : Transaction level

D : None of Above

Q.no 55. Which of the following algorithms has lowest worst case time
complexity?

A : Insertion sort

B : Selection sort

C : Quick Sort

D : Heap sort

Q.no 56. The transparency that allows resources and clients to move within a
system is called

A : Mobility transparency

B : Concurrency transparency
C : Performance transparency

D : Replication transparency

Q.no 57. Multiprocessor computer systems have the advantage of

A : cost

B : reliability

C : uncertainty

D : scalability

Q.no 58. Process synchronization of programs is done by

A : input

B : output

C : operating system

D : memory

Q.no 59. The management of data flow between computers or devices or between
nodes in a network is called

A : Flow control

B : Data Control

C : Data Management

D : Flow Management

Q.no 60. A thread is also called ___________

A : Light Weight Process(LWP)

B : Heavy Weight Process(HWP)

C : Process

D : None of the mentioned


Answer for Question No 1. is b

Answer for Question No 2. is b

Answer for Question No 3. is c

Answer for Question No 4. is d

Answer for Question No 5. is b

Answer for Question No 6. is c

Answer for Question No 7. is c

Answer for Question No 8. is c

Answer for Question No 9. is d

Answer for Question No 10. is d

Answer for Question No 11. is d

Answer for Question No 12. is b

Answer for Question No 13. is c

Answer for Question No 14. is a

Answer for Question No 15. is a

Answer for Question No 16. is c


Answer for Question No 17. is a

Answer for Question No 18. is b

Answer for Question No 19. is a

Answer for Question No 20. is a

Answer for Question No 21. is b

Answer for Question No 22. is b

Answer for Question No 23. is d

Answer for Question No 24. is b

Answer for Question No 25. is a

Answer for Question No 26. is c

Answer for Question No 27. is a

Answer for Question No 28. is a

Answer for Question No 29. is b

Answer for Question No 30. is b

Answer for Question No 31. is d

Answer for Question No 32. is d


Answer for Question No 33. is d

Answer for Question No 34. is b

Answer for Question No 35. is b

Answer for Question No 36. is c

Answer for Question No 37. is c

Answer for Question No 38. is c

Answer for Question No 39. is a

Answer for Question No 40. is b

Answer for Question No 41. is c

Answer for Question No 42. is b

Answer for Question No 43. is b

Answer for Question No 44. is a

Answer for Question No 45. is b

Answer for Question No 46. is d

Answer for Question No 47. is d

Answer for Question No 48. is d


Answer for Question No 49. is c

Answer for Question No 50. is d

Answer for Question No 51. is b

Answer for Question No 52. is c

Answer for Question No 53. is d

Answer for Question No 54. is c

Answer for Question No 55. is d

Answer for Question No 56. is a

Answer for Question No 57. is b

Answer for Question No 58. is c

Answer for Question No 59. is a

Answer for Question No 60. is a


Seat No -
Total number of questions : 60

11342_High Performance Computing


Time : 1hr
Max Marks : 50
N.B

1) All questions are Multiple Choice Questions having single correct option.

2) Attempt any 50 questions out of 60.

3) Use of calculator is allowed.

4) Each question carries 1 Mark.

5) Specially abled students are allowed 20 minutes extra for examination.

6) Do not use pencils to darken answer.

7) Use only black/blue ball point pen to darken the appropriate circle.

8) No change will be allowed once the answer is marked on OMR Sheet.

9) Rough work shall not be done on OMR sheet or on question paper.

10) Darken ONLY ONE CIRCLE for each answer.

Q.no 1. Which of the following is not an application of Depth First Search?

A : For generating topological sort of a graph

B : For generating Strongly Connected Components of a directed graph

C : Detecting cycles in the graph

D : Peer to Peer Networks

Q.no 2. The time complexity of heap sort in worst case is

A : O(log n)

B : O(n)

C : O(nlogn)

D : O(n^2)

Q.no 3. Message-passing programs are often written using


A : symmetric paradigm

B : asymmetric paradigm

C : asynchronous paradigm

D : synchronous paradigm

Q.no 4. Which of the following is not a decomposition technique?

A : Data Decomposition

B : Recursive Decomposition

C : Serial Decomposition

D : Exploratory Decomposition

Q.no 5. Several instructions execute simultaneously in ________________

A : processing

B : parallel processing

C : serial processing

D : multitasking

Q.no 6. MIPS stands for?

A : Mandatory Instructions/sec

B : Millions of Instructions/sec

C : Most of Instructions/sec

D : Many Instructions / sec

Q.no 7. Which of the following is a type of HPC application?

A : Management

B : Media mass

C : Business

D : Science

Q.no 8. The logical view of a machine supporting the message-passing paradigm
consists of p processes, each with its own _______
A : Partitioned Address space

B : Exclusive address space

C : Logical Address Space

D : Non-shared Address Space

Q.no 9. The decomposition technique in which the input data is divided is called _________

A : Data Decomposition

B : Recursive Decomposition

C : Speculative Decomposition

D : Exploratory Decomposition

Q.no 10. Which of the following is not an in-place sorting algorithm?

A : Selection sort

B : Heap sort

C : Quick Sort

D : Merge sort

Q.no 11. Regarding implementation of Breadth First Search using queues, what is
the maximum distance between two nodes present in the queue? (considering
each edge length 1)

A : Can be anything

B:0

C : At most 1

D : Insufficient Information

Q.no 12. Which of the following is a stable sorting algorithm?

A : Merge sort

B : Typical in-place quick sort

C : Heap sort

D : Selection sort
Q.no 13. Which of the following is not a mapping technique?

A : Static Mapping

B : Dynamic Mapping

C : Hybrid Mapping

D : All of Above

Q.no 14. Calling a kernel is typically referred to as _________.

A : kernel thread

B : kernel initialization

C : kernel termination

D : kernel invocation

Q.no 15. Depth First Search is equivalent to which of the traversal in the Binary
Trees?

A : Pre-order Traversal

B : Post-order Traversal

C : Level-order Traversal

D : In-order Traversal

Q.no 16. Decomposition stands for

A : Dividing Problem statement

B : Dividing no of processors

C : Dividing number of tasks

D : Dividing number of operation

Q.no 17. How many attributes are required to characterize the message-passing paradigm?

A:2

B:4

C:6
D:8

Q.no 18. Most message-passing programs are written using

A : the single program multiple data (SPMD) model.

B : the multiple program single data (MPSD) model

C : the single program single data (SPSD) model

D : the multiple program multiple data (MPMD) model

Q.no 19. When is the Breadth First Search of a graph unique?

A : When the graph is a Binary Tree

B : When the graph is a Linked List

C : When the graph is a n-ary Tree

D : When the graph is a Ternary Tree

Q.no 20. Which of the following is not a stable sorting algorithm?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 21. Which of the following is not an application of Breadth First Search?

A : Finding shortest path between two nodes

B : Finding bipartiteness of a graph

C : GPS navigation system

D : Path Finding

Q.no 22. In ………………. only one process at a time is allowed into its critical
section, among all processes that have critical sections for the same resource.

A : Mutual Exclusion

B : Synchronization

C : Deadlock
D : Starvation

Q.no 23. The kernel code is identified by the ________ qualifier with a void return type

A : _host_

B : __global__

C : _device_

D : void

Q.no 24. The decomposition technique in which the same function is applied several times is called _________

A : Data Decomposition

B : Recursive Decomposition

C : Speculative Decomposition

D : Exploratory Decomposition

Q.no 25. Which one of the following is not shared by threads?

A : program counter

B : stack

C : both program counter and stack

D : none of the mentioned

Q.no 26. Which process for ceramic components becomes easier through nanostructuring?

A : Lubrication

B : Coating

C : Fabrication

D : Wear

Q.no 27. Network interfaces allow the transfer of messages from buffer memory
to the desired location without ____ intervention

A : DMA

B : CPU
C : I/O

D : Memory

Q.no 28. Which network topology is used for an interconnection network?

A : Bus based

B : Mesh

C : Linear Array

D : All of above

Q.no 29. Running merge sort on an array of size n which is already sorted is

A : O(n)

B : O(nlogn)

C : O(n^2)

D : O(log n)

Q.no 30. A process can be ___________

A : single threaded

B : multithreaded

C : both single threaded and multithreaded

D : none of the mentioned

Q.no 31. High-performance computing tasks of a computer system are done by

A : node clusters

B : network clusters

C : both a and b

D : Beowulf clusters

Q.no 32. RMI stands for?

A : Remote Mail Invocation

B : Remaining Method Invention


C : Remaining Method Invocation

D : Remote Method Invocation

Q.no 33. Message passing system allows processes to __________

A : communicate with one another without resorting to shared data

B : communicate with one another by resorting to shared data

C : share data

D : name the recipient or sender of the message

Q.no 34. Cloud computing offers a broader concept than which of the following?

A : Parallel computing

B : Centralized computing

C : Utility computing

D : Decentralized computing

Q.no 35. Time complexity of bubble sort in best case is

A : θ (n)

B : θ (nlogn)

C : θ (n^2)

D : θ (n(logn)^2)

Q.no 36. Which of the following is NOT a characteristic of parallel computing?

A : Breaks a task into pieces

B : Uses a single processor or computer

C : Simultaneous execution

D : May use networking

Q.no 37. What happens when the event for which a thread is blocked occurs?

A : thread moves to the ready queue

B : thread remains blocked


C : thread completes

D : a new thread is provided

Q.no 38. Interprocessor communication takes place via

A : Centralized memory

B : Shared memory

C : Message passing

D : Both A and B

Q.no 39. _____ are major issues with non-buffered blocking sends

A : concurrency and mutual exclusion

B : Idling and deadlocks

C : synchronization

D : scheduling

Q.no 40. Writing parallel programs is referred to as

A : Parallel computation

B : Parallel processes

C : Parallel development

D : Parallel programming

Q.no 41. If one thread opens a file with read privileges then ___________

A : other threads in another process can also read from that file

B : other threads in the same process can also read from that file

C : any other thread cannot read from that file

D : all of the mentioned

Q.no 42. In cloud computing, we have an Internet cloud of resources that forms

A : Centralized computing

B : Decentralized computing
C : Parallel computing

D : All of Above

Q.no 43. The basic operations in the message-passing programming paradigm are
___

A : initiate and listen

B : wait and acknowledge

C : request and reply

D : send and receive

Q.no 44. Nanoscience can be studied with the help of ___________

A : Quantum mechanics

B : Newtonian mechanics

C : Macro-dynamic

D : Geophysics

Q.no 45. Execution of several activities at the same time.

A : multi processing

B : parallel processing

C : serial processing

D : multitasking

Q.no 46. The time required to create a new thread in an existing process is
___________

A : greater than the time required to create a new process

B : less than the time required to create a new process

C : equal to the time required to create a new process

D : none of the mentioned

Q.no 47. ______________ leads to concurrency.

A : Serialization
B : Parallelism

C : Serial processing

D : Distribution

Q.no 48. A dynamic network of networks, a dynamic connection that grows, is called

A : Multithreading

B : Cyber cycle

C : Internet of things

D : None of these

Q.no 49. Parallel computing uses _____ execution

A : sequential

B : unique

C : simultaneous

D : none of the answers is correct

Q.no 50. If the given input array is sorted or nearly sorted, which of the following
algorithms gives the best performance?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 51. The amount of data that can be carried from one point to another in a
given time period is called

A : Scope

B : Capacity

C : Bandwidth

D : Limitation

Q.no 52. Thread synchronization is required because ___________


A : all threads of a process share the same address space

B : all threads of a process share the same global variables

C : all threads of a process can share the same files

D : all of the mentioned

Q.no 53. Multiprocessor computer systems have the advantage of

A : cost

B : reliability

C : uncertainty

D : scalability

Q.no 54. A thread is also called ___________

A : Light Weight Process(LWP)

B : Heavy Weight Process(HWP)

C : Process

D : None of the mentioned

Q.no 55. The parallelism achieved on the basis of conditions is called

A : Instruction level

B : Thread level

C : Transaction level

D : None of Above

Q.no 56. NVIDIA's view was that the 'unifying theme' of every form of parallelism is the

A : CDA thread

B : PTA thread

C : CUDA thread

D : CUD thread

Q.no 57. The transparency that allows resources and clients to move within a
system is called
A : Mobility transparency

B : Concurrency transparency

C : Performance transparency

D : Replication transparency

Q.no 58. Which of the following is not a type of multiprocessor computer system?

A : dual core

B : blade server

C : clustered system

D : single core

Q.no 59. Which of the following are TRUE for direct communication?

A : A communication link can be associated with N processes (N = max. number
of processes supported by the system)

B : A communication link can be associated with exactly two processes

C : Exactly N/2 links exist between each pair of processes (N = max. number of
processes supported by the system)

D : Exactly two links exist between each pair of processes

Q.no 60. In indirect communication between processes P and Q __________

A : there is another process R to handle and pass on the messages between P and Q

B : there is another machine between the two processes to help communication

C : there is a mailbox to help communication between P and Q

D : none of the mentioned


Answer for Question No 1. is d

Answer for Question No 2. is c

Answer for Question No 3. is c

Answer for Question No 4. is c

Answer for Question No 5. is b

Answer for Question No 6. is b

Answer for Question No 7. is d

Answer for Question No 8. is b

Answer for Question No 9. is a

Answer for Question No 10. is d

Answer for Question No 11. is c

Answer for Question No 12. is a

Answer for Question No 13. is d

Answer for Question No 14. is d

Answer for Question No 15. is a

Answer for Question No 16. is a


Answer for Question No 17. is a

Answer for Question No 18. is c

Answer for Question No 19. is b

Answer for Question No 20. is b

Answer for Question No 21. is d

Answer for Question No 22. is a

Answer for Question No 23. is b

Answer for Question No 24. is b

Answer for Question No 25. is c

Answer for Question No 26. is c

Answer for Question No 27. is b

Answer for Question No 28. is d

Answer for Question No 29. is b

Answer for Question No 30. is c

Answer for Question No 31. is d

Answer for Question No 32. is d


Answer for Question No 33. is a

Answer for Question No 34. is c

Answer for Question No 35. is a

Answer for Question No 36. is a

Answer for Question No 37. is a

Answer for Question No 38. is d

Answer for Question No 39. is b

Answer for Question No 40. is d

Answer for Question No 41. is b

Answer for Question No 42. is d

Answer for Question No 43. is d

Answer for Question No 44. is a

Answer for Question No 45. is b

Answer for Question No 46. is b

Answer for Question No 47. is b

Answer for Question No 48. is c


Answer for Question No 49. is c

Answer for Question No 50. is b

Answer for Question No 51. is c

Answer for Question No 52. is d

Answer for Question No 53. is b

Answer for Question No 54. is a

Answer for Question No 55. is b

Answer for Question No 56. is c

Answer for Question No 57. is a

Answer for Question No 58. is d

Answer for Question No 59. is b

Answer for Question No 60. is c


Seat No -
Total number of questions : 60

11342_High Performance Computing


Time : 1hr
Max Marks : 50
N.B

1) All questions are Multiple Choice Questions having single correct option.

2) Attempt any 50 questions out of 60.

3) Use of calculator is allowed.

4) Each question carries 1 Mark.

5) Specially abled students are allowed 20 minutes extra for examination.

6) Do not use pencils to darken answer.

7) Use only black/blue ball point pen to darken the appropriate circle.

8) No change will be allowed once the answer is marked on OMR Sheet.

9) Rough work shall not be done on OMR sheet or on question paper.

10) Darken ONLY ONE CIRCLE for each answer.

Q.no 1. How many attributes are required to characterize the message-passing paradigm?

A:2

B:4

C:6

D:8

Q.no 2. Which of the following is not an application of Breadth First Search?

A : Finding shortest path between two nodes

B : Finding bipartiteness of a graph

C : GPS navigation system

D : Path Finding
Q.no 3. The time complexity of a quick sort algorithm which makes use of median,
found by an O(n) algorithm, as pivot element is

A : O(n^2)

B : O(nlogn)

C : O(n log(log n))

D : O(n)

Q.no 4. Several instructions execute simultaneously in ________________

A : processing

B : parallel processing

C : serial processing

D : multitasking

Q.no 5. Which of the following is a type of HPC application?

A : Management

B : Media mass

C : Business

D : Science

Q.no 6. In ………………. only one process at a time is allowed into its critical section,
among all processes that have critical sections for the same resource.

A : Mutual Exclusion

B : Synchronization

C : Deadlock

D : Starvation

Q.no 7. Depth First Search is equivalent to which of the traversal in the Binary
Trees?

A : Pre-order Traversal

B : Post-order Traversal

C : Level-order Traversal
D : In-order Traversal

Q.no 8. Which one of the following is not shared by threads?

A : program counter

B : stack

C : both program counter and stack

D : none of the mentioned

Q.no 9. Most message-passing programs are written using

A : the single program multiple data (SPMD) model.

B : the multiple program single data (MPSD) model

C : the single program single data (SPSD) model

D : the multiple program multiple data (MPMD) model

Q.no 10. The logical view of a machine supporting the message-passing paradigm
consists of p processes, each with its own _______

A : Partitioned Address space

B : Exclusive address space

C : Logical Address Space

D : Non-shared Address Space

Q.no 11. Which of the following is not an in-place sorting algorithm?

A : Selection sort

B : Heap sort

C : Quick Sort

D : Merge sort

Q.no 12. Which of the following is not a decomposition technique?

A : Data Decomposition

B : Recursive Decomposition

C : Serial Decomposition
D : Exploratory Decomposition

Q.no 13. Regarding implementation of Breadth First Search using queues, what is
the maximum distance between two nodes present in the queue? (considering
each edge length 1)

A : Can be anything

B:0

C : At most 1

D : Insufficient Information
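
For clarity, a minimal C sketch (illustrative only; the array-based queue and
the name bfs are assumed): BFS dequeues a node of level d and enqueues nodes
of level d+1, so the queue never holds nodes more than one level apart, which
is why the answer "at most 1" holds:

    /* BFS with a simple array queue; dist[] records the level of each node. */
    #include <stdio.h>
    #define MAXV 100
    int adj[MAXV][MAXV], nvert, dist[MAXV], q[MAXV];

    void bfs(int src) {
        for (int i = 0; i < nvert; i++) dist[i] = -1;
        int head = 0, tail = 0;
        dist[src] = 0;
        q[tail++] = src;
        while (head < tail) {
            int u = q[head++];
            for (int v = 0; v < nvert; v++)
                if (adj[u][v] && dist[v] == -1) {
                    dist[v] = dist[u] + 1;  /* queue holds levels d and d+1 only */
                    q[tail++] = v;
                }
        }
    }

    int main(void) {
        nvert = 4;
        adj[0][1] = adj[0][2] = adj[1][3] = 1;
        bfs(0);
        for (int i = 0; i < nvert; i++) printf("dist[%d]=%d\n", i, dist[i]);
        return 0;
    }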

Q.no 14. Calling a kernel is typically referred to as _________.

A : kernel thread

B : kernel initialization

C : kernel termination

D : kernel invocation

Q.no 15. The decomposition technique in which the function is used several
times is called _________

A : Data Decomposition

B : Recursive Decomposition

C : Speculative Decomposition

D : Exploratory Decomposition

Q.no 16. MIPS stands for?

A : Mandatory Instructions/sec

B : Millions of Instructions/sec

C : Most of Instructions/sec

D : Many Instructions / sec

Q.no 17. The time complexity of heap sort in worst case is

A : O(log n)

B : O(n)
C : O(nlogn)

D : O(n^2)
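
Note (a standard worked analysis, added for clarity): heap sort builds a heap
in O(n) and then performs n delete-max operations of O(log n) each, regardless
of the input order, so

    T(n) = O(n) + n \cdot O(\log n) = O(n \log n)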

Q.no 18. Which of the following is not a stable sorting algorithm?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 19. When is the Breadth First Search of a graph unique?

A : When the graph is a Binary Tree

B : When the graph is a Linked List

C : When the graph is a n-ary Tree

D : When the graph is a Ternary Tree

Q.no 20. Message-passing programs are often written using

A : symmetric paradigm

B : asymmetric paradigm

C : asynchronous paradigm

D : synchronous paradigm

Q.no 21. Which of the following is a stable sorting algorithm?

A : Merge sort

B : Typical in-place quick sort

C : Heap sort

D : Selection sort

Q.no 22. The decomposition technique in which the input is divided is called
_________

A : Data Decomposition

B : Recursive Decomposition
C : Speculative Decomposition

D : Exploratory Decomposition

Q.no 23. Which of the following is not an application of Depth First Search?

A : For generating topological sort of a graph

B : For generating Strongly Connected Components of a directed graph

C : Detecting cycles in the graph

D : Peer to Peer Networks

Q.no 24. Which of the following is not a mapping technique?

A : Static Mapping

B : Dynamic Mapping

C : Hybrid Mapping

D : All of Above

Q.no 25. The kernel code is identified by the ________ qualifier with void return type

A : _host_

B : __global__

C : _device_

D : void
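
For clarity, a minimal CUDA C sketch (illustrative only; the kernel name
add_one and its parameters are assumed, not from the original paper): the
kernel is declared with the __global__ qualifier and a void return type, and
calling it with the <<<grid, block>>> syntax is the kernel invocation:

    /* Kernel definition: __global__ qualifier, void return type. */
    __global__ void add_one(int *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] += 1;
    }

    /* Host-side kernel invocation (error checking omitted). */
    void launch(int *d_data, int n) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        add_one<<<blocks, threads>>>(d_data, n);
    }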

Q.no 26. Cloud computing offers a broader concept of which of the following?

A : Parallel computing

B : Centralized computing

C : Utility computing

D : Decentralized computing

Q.no 27. High performance computing tasks of a computer system are done by

A : node clusters

B : network clusters
C : both a and b

D : Beowulf clusters

Q.no 28. Execution of several activities at the same time is called

A : multi processing

B : parallel processing

C : serial processing

D : multitasking

Q.no 29. The basic operations in the message-passing programming paradigm are
___

A : initiate and listen

B : wait and acknowledge

C : request and reply

D : send and receive
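
For clarity, a minimal MPI sketch in C (illustrative only; assumes an MPI
installation and exactly two ranks) of the two basic operations, send and
receive:

    /* Rank 0 sends one integer; rank 1 receives it. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value = 42;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }
        MPI_Finalize();
        return 0;
    }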

Q.no 30. Nanoscience can be studied with the help of ___________

A : Quantum mechanics

B : Newtonian mechanics

C : Macro-dynamic

D : Geophysics

Q.no 31. Interprocessor communication takes place via

A : Centralized memory

B : Shared memory

C : Message passing

D : Both A and B

Q.no 32. ______________ leads to concurrency.

A : Serialization

B : Parallelism
C : Serial processing

D : Distribution

Q.no 33. If the given input array is sorted or nearly sorted, which of the following
algorithms gives the best performance?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort
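
For clarity, a minimal C sketch (illustrative only): on a sorted or nearly
sorted array, insertion sort's inner loop exits almost immediately, so the
total work stays close to O(n), which is why it performs best here:

    /* Insertion sort: few shifts when the input is nearly sorted. */
    #include <stdio.h>

    void insertion_sort(int a[], int n) {
        for (int i = 1; i < n; i++) {
            int key = a[i], j = i - 1;
            while (j >= 0 && a[j] > key) { /* rarely runs on sorted input */
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    int main(void) {
        int a[] = {1, 2, 4, 3, 5};
        insertion_sort(a, 5);
        for (int i = 0; i < 5; i++) printf("%d ", a[i]);
        return 0;
    }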

Q.no 34. A dynamic network of networks, a dynamic connection that grows, is
called

A : Multithreading

B : Cyber cycle

C : Internet of things

D : None of these

Q.no 35. A process can be ___________

A : single threaded

B : multithreaded

C : both single threaded and multithreaded

D : none of the mentioned

Q.no 36. Which network topology is used for an interconnection network?

A : Bus based

B : Mesh

C : Linear Array

D : All of above

Q.no 37. Parallel computing uses _____ execution

A : sequential
B : unique

C : simultaneous

D : none of the answers is correct

Q.no 38. It is ___________ speed and ___________ latency.

A : High, high

B : Low, low

C : High, low

D : Low, high

Q.no 39. Which of the following is NOT a characteristic of parallel computing?

A : Breaks a task into pieces

B : Uses a single processor or computer

C : Simultaneous execution

D : May use networking

Q.no 40. Message passing system allows processes to __________

A : communicate with one another without resorting to shared data

B : communicate with one another by resorting to shared data

C : share data

D : name the recipient or sender of the message

Q.no 41. What is interprocess communication?

A : allows processes to communicate and synchronize their actions when using the
same address space

B : allows processes to communicate and synchronize their actions without using the
same address space

C : allows the processes to only synchronize their actions without communication

D : none of the mentioned

Q.no 42. Time complexity of bubble sort in best case is


A : θ (n)

B : θ (nlogn)

C : θ (n^2)

D : θ (n(logn)^2)
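
For clarity, a minimal C sketch (illustrative only; assumes the common
flagged variant of bubble sort): if a full pass makes no swap the array is
already sorted, so a sorted input costs a single θ(n) pass:

    /* Bubble sort with an early-exit flag: best case is one pass. */
    #include <stdio.h>

    void bubble_sort(int a[], int n) {
        for (int i = 0; i < n - 1; i++) {
            int swapped = 0;
            for (int j = 0; j < n - 1 - i; j++)
                if (a[j] > a[j + 1]) {
                    int t = a[j]; a[j] = a[j + 1]; a[j + 1] = t;
                    swapped = 1;
                }
            if (!swapped) break;       /* sorted input: stop after one pass */
        }
    }

    int main(void) {
        int a[] = {1, 2, 3, 4, 5};     /* already sorted */
        bubble_sort(a, 5);
        for (int i = 0; i < 5; i++) printf("%d ", a[i]);
        return 0;
    }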

Q.no 43. What happens when the event for which a thread is blocked occurs?

A : thread moves to the ready queue

B : thread remains blocked

C : thread completes

D : a new thread is provided

Q.no 44. RMI stands for?

A : Remote Mail Invocation

B : Remaining Method Invention

C : Remaining Method Invocation

D : Remote Method Invocation

Q.no 45. If one thread opens a file with read privileges then ___________

A : other threads in the another process can also read from that file

B : other threads in the same process can also read from that file

C : any other thread can not read from that file

D : all of the mentioned

Q.no 46. Consider the situation in which the assignment operation is very costly.
Which of the following sorting algorithms should be performed so that the
number of assignment operations is minimized in general?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort
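
For clarity, a minimal C sketch (illustrative only): selection sort performs
at most one swap per pass, i.e. O(n) element assignments overall, which is
why it suits the costly-assignment setting:

    /* Selection sort: at most n-1 swaps in total. */
    #include <stdio.h>

    void selection_sort(int a[], int n) {
        for (int i = 0; i < n - 1; i++) {
            int min = i;
            for (int j = i + 1; j < n; j++)
                if (a[j] < a[min]) min = j;
            if (min != i) {            /* at most one swap per pass */
                int t = a[i]; a[i] = a[min]; a[min] = t;
            }
        }
    }

    int main(void) {
        int a[] = {3, 1, 2};
        selection_sort(a, 3);
        for (int i = 0; i < 3; i++) printf("%d ", a[i]);
        return 0;
    }
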
Q.no 47. _____ are major issues with non-buffered blocking sends

A : concurrency and mutual exclusion

B : Idling and deadlocks

C : synchronization

D : scheduling
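
For clarity, a minimal MPI sketch in C (illustrative only; deliberately
incorrect code, and it assumes exactly two ranks): with non-buffered blocking
sends, both ranks block inside the synchronous send waiting for a matching
receive that is never posted, demonstrating the idling-and-deadlock hazard;
reversing the send/receive order on one rank breaks the cycle:

    /* Both ranks send first: with no buffering, this deadlocks. */
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, other, sbuf = 1, rbuf;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        other = 1 - rank;                  /* peer rank (2 ranks assumed) */
        MPI_Ssend(&sbuf, 1, MPI_INT, other, 0, MPI_COMM_WORLD); /* blocks */
        MPI_Recv(&rbuf, 1, MPI_INT, other, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);       /* never reached */
        MPI_Finalize();
        return 0;
    }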

Q.no 48. Running merge sort on an array of size n which is already sorted is

A : O(n)

B : O(nlogn)

C : O(n^2)

D : O(log n)

Q.no 49. The time required to create a new thread in an existing process is
___________

A : greater than the time required to create a new process

B : less than the time required to create a new process

C : equal to the time required to create a new process

D : none of the mentioned

Q.no 50. Network interfaces allow the transfer of messages from buffer memory
to the desired location without ____ intervention

A : DMA

B : CPU

C : I/O

D : Memory

Q.no 51. Thread synchronization is required because ___________

A : all threads of a process share the same address space

B : all threads of a process share the same global variables

C : all threads of a process can share the same files


D : all of the mentioned

Q.no 52. Which of the following are TRUE for direct communication?

A : A communication link can be associated with N number of processes (N = max.
number of processes supported by system)

B : A communication link can be associated with exactly two processes

C : Exactly N/2 links exist between each pair of processes (N = max. number of
processes supported by system)

D : Exactly two links exist between each pair of processes

Q.no 53. The transparency that allows movement of resources and clients within a
system is called

A : Mobility transparency

B : Concurrency transparency

C : Performance transparency

D : Replication transparency

Q.no 54. In indirect communication between processes P and Q __________

A : there is another process R to handle and pass on the messages between P and Q

B : there is another machine between the two processes to help communication

C : there is a mailbox to help communication between P and Q

D : none of the mentioned

Q.no 55. The architecture which can compute several tasks simultaneously at the
processor level itself is called:

A : Multi core architecture

B : Multi processor architecture

C : Multi threaded architecture

D : All of above

Q.no 56. The amount of data that can be carried from one point to another in a
given time period is called

A : Scope
B : Capacity

C : Bandwidth

D : Limitation

Q.no 57. Process synchronization of programs is done by

A : input

B : output

C : operating system

D : memory

Q.no 58. NVIDIA thought that the 'unifying theme' of every form of parallelism is the

A : CDA thread

B : PTA thread

C : CUDA thread

D : CUD thread

Q.no 59. The transparency that enables accessing local and remote resources
using identical operations is called ____________

A : Access transparency

B : Concurrency transparency

C : Performance transparency

D : Scaling transparency

Q.no 60. Termination of the process terminates ___________

A : first thread of the process

B : first two threads of the process

C : all threads within the process

D : no thread within the process


Answer for Question No 1. is a

Answer for Question No 2. is d

Answer for Question No 3. is b

Answer for Question No 4. is b

Answer for Question No 5. is d

Answer for Question No 6. is a

Answer for Question No 7. is a

Answer for Question No 8. is c

Answer for Question No 9. is c

Answer for Question No 10. is b

Answer for Question No 11. is d

Answer for Question No 12. is c

Answer for Question No 13. is c

Answer for Question No 14. is d

Answer for Question No 15. is b

Answer for Question No 16. is b


Answer for Question No 17. is c

Answer for Question No 18. is b

Answer for Question No 19. is b

Answer for Question No 20. is c

Answer for Question No 21. is a

Answer for Question No 22. is a

Answer for Question No 23. is d

Answer for Question No 24. is d

Answer for Question No 25. is b

Answer for Question No 26. is c

Answer for Question No 27. is d

Answer for Question No 28. is b

Answer for Question No 29. is d

Answer for Question No 30. is a

Answer for Question No 31. is d

Answer for Question No 32. is b


Answer for Question No 33. is b

Answer for Question No 34. is c

Answer for Question No 35. is c

Answer for Question No 36. is d

Answer for Question No 37. is c

Answer for Question No 38. is c

Answer for Question No 39. is a

Answer for Question No 40. is a

Answer for Question No 41. is b

Answer for Question No 42. is a

Answer for Question No 43. is a

Answer for Question No 44. is d

Answer for Question No 45. is b

Answer for Question No 46. is b

Answer for Question No 47. is b

Answer for Question No 48. is b


Answer for Question No 49. is b

Answer for Question No 50. is b

Answer for Question No 51. is d

Answer for Question No 52. is b

Answer for Question No 53. is a

Answer for Question No 54. is c

Answer for Question No 55. is a

Answer for Question No 56. is c

Answer for Question No 57. is c

Answer for Question No 58. is c

Answer for Question No 59. is a

Answer for Question No 60. is c


Seat No -
Total number of questions : 60

11342_High Performance Computing


Time : 1hr
Max Marks : 50
N.B

1) All questions are Multiple Choice Questions having single correct option.

2) Attempt any 50 questions out of 60.

3) Use of calculator is allowed.

4) Each question carries 1 Mark.

5) Specially abled students are allowed 20 minutes extra for examination.

6) Do not use pencils to darken answer.

7) Use only black/blue ball point pen to darken the appropriate circle.

8) No change will be allowed once the answer is marked on OMR Sheet.

9) Rough work shall not be done on OMR sheet or on question paper.

10) Darken ONLY ONE CIRCLE for each answer.

Q.no 1. The time complexity of heap sort in worst case is

A : O(log n)

B : O(n)

C : O(nlogn)

D : O(n^2)

Q.no 2. Regarding implementation of Breadth First Search using queues, what is
the maximum distance between two nodes present in the queue? (considering
each edge length 1)

A : Can be anything

B:0

C : At most 1

D : Insufficient Information
Q.no 3. Most message-passing programs are written using

A : the single program multiple data (SPMD) model.

B : the multiple program and single data(MPSD) model

C : the single program single data (SPSD) model

D : the Multiple program multiple data (MPMD) model

Q.no 4. In ………………. only one process at a time is allowed into its critical section,
among all processes that have critical sections for the same resource.

A : Mutual Exclusion

B : Synchronization

C : Deadlock

D : Starvation

Q.no 5. Which of the following is not a mapping technique?

A : Static Mapping

B : Dynamic Mapping

C : Hybrid Mapping

D : All of Above

Q.no 6. When is the Breadth First Search of a graph unique?

A : When the graph is a Binary Tree

B : When the graph is a Linked List

C : When the graph is a n-ary Tree

D : When the graph is a Ternary Tree

Q.no 7. The decomposition technique in which the function is used several
times is called _________

A : Data Decomposition

B : Recursive Decomposition

C : Speculative Decomposition
D : Exploratory Decomposition

Q.no 8. Which of the following is a stable sorting algorithm?

A : Merge sort

B : Typical in-place quick sort

C : Heap sort

D : Selection sort

Q.no 9. Which of the following is not a decomposition technique?

A : Data Decomposition

B : Recursive Decomposition

C : Serial Decomposition

D : Exploratory Decomposition

Q.no 10. Which one of the following is not shared by threads?

A : program counter

B : stack

C : both program counter and stack

D : none of the mentioned

Q.no 11. Decomposition stands for

A : Dividing Problem statement

B : Dividing no of processors

C : Dividing number of tasks

D : Dividing number of operation

Q.no 12. Which of the following is not an application of Depth First Search?

A : For generating topological sort of a graph

B : For generating Strongly Connected Components of a directed graph

C : Detecting cycles in the graph


D : Peer to Peer Networks

Q.no 13. Which of the following is not an application of Breadth First Search?

A : Finding shortest path between two nodes

B : Finding bipartiteness of a graph

C : GPS navigation system

D : Path Finding

Q.no 14. Which of the following is not a stable sorting algorithm?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 15. Which of the following is a typical type of HPC application?

A : Management

B : Mass media

C : Business

D : Science

Q.no 16. Which of the following is not an in-place sorting algorithm?

A : Selection sort

B : Heap sort

C : Quick Sort

D : Merge sort

Q.no 17. Message-passing programs are often written using

A : symmetric paradigm

B : asymmetric paradigm

C : asynchronous paradigm
D : synchronous paradigm

Q.no 18. The logical view of a machine supporting the message-passing paradigm
consists of p processes, each with its own _______

A : Partitioned Address space

B : Exclusive address space

C : Logical Address Space

D : Non-shared Address Space

Q.no 19. Depth First Search is equivalent to which of the traversal in the Binary
Trees?

A : Pre-order Traversal

B : Post-order Traversal

C : Level-order Traversal

D : In-order Traversal

Q.no 20. The kernel code is identified by the ________ qualifier with void return type

A : _host_

B : __global__

C : _device_

D : void

Q.no 21. MIPS stands for?

A : Mandatory Instructions/sec

B : Millions of Instructions/sec

C : Most of Instructions/sec

D : Many Instructions / sec

Q.no 22. Calling a kernel is typically referred to as _________.

A : kernel thread

B : kernel initialization
C : kernel termination

D : kernel invocation

Q.no 23. The time complexity of a quick sort algorithm which uses the median,
found by an O(n) algorithm, as the pivot element is

A : O(n^2)

B : O(nlogn)

C : O(nlog(log(n))

D : O(n)

Q.no 24. Several instructions execution simultaneously in ________________

A : processing

B : parallel processing

C : serial processing

D : multitasking

Q.no 25. The decomposition technique in which the input is divided is called
_________

A : Data Decomposition

B : Recursive Decomposition

C : Speculative Decomposition

D : Exploratory Decomposition

Q.no 26. Time complexity of bubble sort in best case is

A : θ (n)

B : θ (nlogn)

C : θ (n^2)

D : θ (n(logn)^2)

Q.no 27. Nanoscience can be studied with the help of ___________

A : Quantum mechanics
B : Newtonian mechanics

C : Macro-dynamic

D : Geophysics

Q.no 28. A dynamic network of networks, a dynamic connection that grows, is
called

A : Multithreading

B : Cyber cycle

C : Internet of things

D : None of these

Q.no 29. Network interfaces allow the transfer of messages from buffer memory
to the desired location without ____ intervention

A : DMA

B : CPU

C : I/O

D : Memory

Q.no 30. In cloud computing, we have an internet cloud of resources to form

A : Centralized computing

B : Decentralized computing

C : Parallel computing

D : All of Above

Q.no 31. Which of the following is NOT a characteristic of parallel computing?

A : Breaks a task into pieces

B : Uses a single processor or computer

C : Simultaneous execution

D : May use networking

Q.no 32. Cloud computing offers a broader concept of which of the following?
A : Parallel computing

B : Centralized computing

C : Utility computing

D : Decentralized computing

Q.no 33. A process can be ___________

A : single threaded

B : multithreaded

C : both single threaded and multithreaded

D : none of the mentioned

Q.no 34. Message passing system allows processes to __________

A : communicate with one another without resorting to shared data

B : communicate with one another by resorting to shared data

C : share data

D : name the recipient or sender of the message

Q.no 35. It is ___________ speed and ___________ latency.

A : High, high

B : Low, low

C : High, low

D : Low, high

Q.no 36. If the given input array is sorted or nearly sorted, which of the following
algorithms gives the best performance?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 37. High performance computing tasks of a computer system are done by
A : node clusters

B : network clusters

C : both a and b

D : Beowulf clusters

Q.no 38. If one thread opens a file with read privileges then ___________

A : other threads in the another process can also read from that file

B : other threads in the same process can also read from that file

C : any other thread can not read from that file

D : all of the mentioned

Q.no 39. Interprocessor communication takes place via

A : Centralized memory

B : Shared memory

C : Message passing

D : Both A and B

Q.no 40. Which of the following is not a noncomparison sort?

A : Counting sort

B : Bucket sort

C : Radix sort

D : Shell sort

Q.no 41. Running merge sort on an array of size n which is already sorted is

A : O(n)

B : O(nlogn)

C : O(n^2)

D : O(log n)

Q.no 42. RMI stands for?


A : Remote Mail Invocation

B : Remaining Method Invention

C : Remaining Method Invocation

D : Remote Method Invocation

Q.no 43. The time required to create a new thread in an existing process is
___________

A : greater than the time required to create a new process

B : less than the time required to create a new process

C : equal to the time required to create a new process

D : none of the mentioned

Q.no 44. Which of the following is easier for ceramic components through nanostructuring?

A : Lubrication

B : Coating

C : Fabrication

D : Wear

Q.no 45. Parallel computing uses _____ execution

A : sequential

B : unique

C : simultaneous

D : none of the answers is correct

Q.no 46. ______________ leads to concurrency.

A : Serialization

B : Parallelism

C : Serial processing

D : Distribution

Q.no 47. What happens when the event for which a thread is blocked occurs?
A : thread moves to the ready queue

B : thread remains blocked

C : thread completes

D : a new thread is provided

Q.no 48. What is interprocess communication?

A : allows processes to communicate and synchronize their actions when using the
same address space

B : allows processes to communicate and synchronize their actions without using the
same address space

C : allows the processes to only synchronize their actions without communication

D : none of the mentioned

Q.no 49. Writing parallel programs is referred to as

A : Parallel computation

B : Parallel processes

C : Parallel development

D : Parallel programming

Q.no 50. The basic operations in the message-passing programming paradigm are
___

A : initiate and listen

B : wait and acknowledge

C : request and reply

D : send and receive

Q.no 51. Which of the following are TRUE for direct communication?

A : A communication link can be associated with N number of processes (N = max.
number of processes supported by system)

B : A communication link can be associated with exactly two processes

C : Exactly N/2 links exist between each pair of processes (N = max. number of
processes supported by system)

D : Exactly two links exist between each pair of processes

Q.no 52. A thread shares its resources (like data section, code section, open files,
signals) with ___________

A : other process similar to the one that the thread belongs to

B : other threads that belong to similar processes

C : other threads that belong to the same process

D : all of the mentioned

Q.no 53. Which of the following is not a type of multiprocessor in a computer system?

A : dual core

B : blade server

C : clustered system

D : single core

Q.no 54. The parallelism achieved on the basis of operations is called

A : Instruction level

B : Thread level

C : Transaction level

D : None of Above

Q.no 55. NVIDIA thought that the 'unifying theme' of every form of parallelism is the

A : CDA thread

B : PTA thread

C : CUDA thread

D : CUD thread

Q.no 56. In indirect communication between processes P and Q __________

A : there is another process R to handle and pass on the messages between P and Q

B : there is another machine between the two processes to help communication

C : there is a mailbox to help communication between P and Q


D : none of the mentioned

Q.no 57. Process synchronization of programs is done by

A : input

B : output

C : operating system

D : memory

Q.no 58. The management of data flow between computers or devices or between
nodes in a network is called

A : Flow control

B : Data Control

C : Data Management

D : Flow Management

Q.no 59. A thread is also called ___________

A : Light Weight Process(LWP)

B : Heavy Weight Process(HWP)

C : Process

D : None of the mentioned

Q.no 60. The parallelism achieved on the basis of conditions is called

A : Instruction level

B : Thread level

C : Transaction level

D : None of Above
Answer for Question No 1. is c

Answer for Question No 2. is c

Answer for Question No 3. is c

Answer for Question No 4. is a

Answer for Question No 5. is d

Answer for Question No 6. is b

Answer for Question No 7. is b

Answer for Question No 8. is a

Answer for Question No 9. is c

Answer for Question No 10. is c

Answer for Question No 11. is a

Answer for Question No 12. is d

Answer for Question No 13. is d

Answer for Question No 14. is b

Answer for Question No 15. is d

Answer for Question No 16. is d


Answer for Question No 17. is c

Answer for Question No 18. is b

Answer for Question No 19. is a

Answer for Question No 20. is b

Answer for Question No 21. is b

Answer for Question No 22. is d

Answer for Question No 23. is b

Answer for Question No 24. is b

Answer for Question No 25. is a

Answer for Question No 26. is a

Answer for Question No 27. is a

Answer for Question No 28. is c

Answer for Question No 29. is b

Answer for Question No 30. is d

Answer for Question No 31. is a

Answer for Question No 32. is c


Answer for Question No 33. is c

Answer for Question No 34. is a

Answer for Question No 35. is c

Answer for Question No 36. is b

Answer for Question No 37. is d

Answer for Question No 38. is b

Answer for Question No 39. is d

Answer for Question No 40. is d

Answer for Question No 41. is b

Answer for Question No 42. is d

Answer for Question No 43. is b

Answer for Question No 44. is c

Answer for Question No 45. is c

Answer for Question No 46. is b

Answer for Question No 47. is a

Answer for Question No 48. is b


Answer for Question No 49. is d

Answer for Question No 50. is d

Answer for Question No 51. is b

Answer for Question No 52. is c

Answer for Question No 53. is d

Answer for Question No 54. is c

Answer for Question No 55. is c

Answer for Question No 56. is c

Answer for Question No 57. is c

Answer for Question No 58. is a

Answer for Question No 59. is a

Answer for Question No 60. is b


Seat No -
Total number of questions : 60

11342_High Performance Computing


Time : 1hr
Max Marks : 50
N.B

1) All questions are Multiple Choice Questions having single correct option.

2) Attempt any 50 questions out of 60.

3) Use of calculator is allowed.

4) Each question carries 1 Mark.

5) Specially abled students are allowed 20 minutes extra for examination.

6) Do not use pencils to darken answer.

7) Use only black/blue ball point pen to darken the appropriate circle.

8) No change will be allowed once the answer is marked on OMR Sheet.

9) Rough work shall not be done on OMR sheet or on question paper.

10) Darken ONLY ONE CIRCLE for each answer.

Q.no 1. In ………………. only one process at a time is allowed into its critical section,
among all processes that have critical sections for the same resource.

A : Mutual Exclusion

B : Synchronization

C : Deadlock

D : Starvation

Q.no 2. Which of the following is not a mapping technique?

A : Static Mapping

B : Dynamic Mapping

C : Hybrid Mapping

D : All of Above
Q.no 3. Depth First Search is equivalent to which of the traversal in the Binary
Trees?

A : Pre-order Traversal

B : Post-order Traversal

C : Level-order Traversal

D : In-order Traversal

Q.no 4. Which of the following is a stable sorting algorithm?

A : Merge sort

B : Typical in-place quick sort

C : Heap sort

D : Selection sort

Q.no 5. Most message-passing programs are written using

A : the single program multiple data (SPMD) model.

B : the multiple program and single data(MPSD) model

C : the single program single data (SPSD) model

D : the Multiple program multiple data (MPMD) model

Q.no 6. The time complexity of heap sort in worst case is

A : O(log n)

B : O(n)

C : O(nlogn)

D : O(n^2)

Q.no 7. Which of the following is not an application of Depth First Search?

A : For generating topological sort of a graph

B : For generating Strongly Connected Components of a directed graph

C : Detecting cycles in the graph

D : Peer to Peer Networks


Q.no 8. The time complexity of a quick sort algorithm which uses the median,
found by an O(n) algorithm, as the pivot element is

A : O(n^2)

B : O(nlogn)

C : O(nlog(log(n))

D : O(n)

Q.no 9. Which of the following is a typical type of HPC application?

A : Management

B : Mass media

C : Business

D : Science

Q.no 10. MIPS stands for?

A : Mandatory Instructions/sec

B : Millions of Instructions/sec

C : Most of Instructions/sec

D : Many Instructions / sec

Q.no 11. The decomposition technique in which the function is used several
times is called _________

A : Data Decomposition

B : Recursive Decomposition

C : Speculative Decomposition

D : Exploratory Decomposition

Q.no 12. Which of the following is not an application of Breadth First Search?

A : Finding shortest path between two nodes

B : Finding bipartiteness of a graph

C : GPS navigation system


D : Path Finding

Q.no 13. Message-passing programs are often written using

A : symmetric paradigm

B : asymmetric paradigm

C : asynchronous paradigm

D : synchronous paradigm

Q.no 14. Regarding implementation of Breadth First Search using queues, what is
the maximum distance between two nodes present in the queue? (considering
each edge length 1)

A : Can be anything

B:0

C : At most 1

D : Insufficient Information

Q.no 15. The logical view of a machine supporting the message-passing paradigm
consists of p processes, each with its own _______

A : Partitioned Address space

B : Exclusive address space

C : Logical Address Space

D : Non-shared Address Space

Q.no 16. Several instructions execution simultaneously in ________________

A : processing

B : parallel processing

C : serial processing

D : multitasking

Q.no 17. Which one of the following is not shared by threads?

A : program counter

B : stack
C : both program counter and stack

D : none of the mentioned

Q.no 18. How many attributes are required to characterize the message-passing
paradigm?

A:2

B:4

C:6

D:8

Q.no 19. Which of the following is not a stable sorting algorithm?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 20. Which of the following is not a decomposition technique?

A : Data Decomposition

B : Recursive Decomposition

C : Serial Decomposition

D : Exploratory Decomposition

Q.no 21. Decomposition stands for

A : Dividing Problem statement

B : Dividing no of processors

C : Dividing number of tasks

D : Dividing number of operation

Q.no 22. Calling a kernel is typically referred to as _________.

A : kernel thread

B : kernel initialization
C : kernel termination

D : kernel invocation

Q.no 23. The decomposition technique in which the input is divided is called
_________

A : Data Decomposition

B : Recursive Decomposition

C : Speculative Decomposition

D : Exploratory Decomposition

Q.no 24. Which of the following is not an in-place sorting algorithm?

A : Selection sort

B : Heap sort

C : Quick Sort

D : Merge sort

Q.no 25. The kernel code is identified by the ________ qualifier with void return type

A : _host_

B : __global__

C : _device_

D : void

Q.no 26. Message passing system allows processes to __________

A : communicate with one another without resorting to shared data

B : communicate with one another by resorting to shared data

C : share data

D : name the recipient or sender of the message

Q.no 27. It is ___________ speed and ___________ latency.

A : High, high

B : Low, low
C : High, low

D : Low, high

Q.no 28. Running merge sort on an array of size n which is already sorted is

A : O(n)

B : O(nlogn)

C : O(n^2)

D : O(log n)

Q.no 29. If the given input array is sorted or nearly sorted, which of the following
algorithms gives the best performance?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 30. Which of the following is NOT a characteristic of parallel computing?

A : Breaks a task into pieces

B : Uses a single processor or computer

C : Simultaneous execution

D : May use networking

Q.no 31. What is interprocess communication?

A : allows processes to communicate and synchronize their actions when using the
same address space

B : allows processes to communicate and synchronize their actions without using the
same address space

C : allows the processes to only synchronize their actions without communication

D : none of the mentioned

Q.no 32. Network interfaces allow the transfer of messages from buffer memory
to the desired location without ____ intervention
A : DMA

B : CPU

C : I/O

D : Memory

Q.no 33. Execution of several activities at the same time is called

A : multi processing

B : parallel processing

C : serial processing

D : multitasking

Q.no 34. What happens when the event for which a thread is blocked occurs?

A : thread moves to the ready queue

B : thread remains blocked

C : thread completes

D : a new thread is provided

Q.no 35. Interprocessor communication takes place via

A : Centralized memory

B : Shared memory

C : Message passing

D : Both A and B

Q.no 36. ______________ leads to concurrency.

A : Serialization

B : Parallelism

C : Serial processing

D : Distribution

Q.no 37. High performance computing tasks of a computer system are done by
A : node clusters

B : network clusters

C : both a and b

D : Beowulf clusters

Q.no 38. Which network topology is used for an interconnection network?

A : Bus based

B : Mesh

C : Linear Array

D : All of above

Q.no 39. _____ are major issues with non-buffered blocking sends

A : concurrency and mutual exclusion

B : Idling and deadlocks

C : synchronization

D : scheduling

Q.no 40. The time required to create a new thread in an existing process is
___________

A : greater than the time required to create a new process

B : less than the time required to create a new process

C : equal to the time required to create a new process

D : none of the mentioned

Q.no 41. A process can be ___________

A : single threaded

B : multithreaded

C : both single threaded and multithreaded

D : none of the mentioned

Q.no 42. Cloud computing offers a broader concept of which of the following?
A : Parallel computing

B : Centralized computing

C : Utility computing

D : Decentralized computing

Q.no 43. RMI stands for?

A : Remote Mail Invocation

B : Remaining Method Invention

C : Remaining Method Invocation

D : Remote Method Invocation

Q.no 44. If one thread opens a file with read privileges then ___________

A : other threads in the another process can also read from that file

B : other threads in the same process can also read from that file

C : any other thread can not read from that file

D : all of the mentioned

Q.no 45. Writing parallel programs is referred to as

A : Parallel computation

B : Parallel processes

C : Parallel development

D : Parallel programming

Q.no 46. The basic operations in the message-passing programming paradigm are
___

A : initiate and listen

B : wait and acknowledge

C : request and reply

D : send and receive

Q.no 47. Parallel computing uses _____ execution


A : sequential

B : unique

C : simultaneous

D : none of the answers is correct

Q.no 48. Consider the situation in which the assignment operation is very costly.
Which of the following sorting algorithms should be performed so that the
number of assignment operations is minimized in general?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 49. Which of the following is not a noncomparison sort?

A : Counting sort

B : Bucket sort

C : Radix sort

D : Shell sort

Q.no 50. In cloud computing, we have an internet cloud of resources to form

A : Centralized computing

B : Decentralized computing

C : Parallel computing

D : All of Above

Q.no 51. The link between two processes P and Q to send and receive messages is
called __________

A : communication link

B : message-passing link

C : synchronization link

D : all of the mentioned


Q.no 52. Process synchronization of programs is done by

A : input

B : output

C : operating system

D : memory

Q.no 53. Which of the following is not a type of multiprocessor in a computer system?

A : dual core

B : blade server

C : clustered system

D : single core

Q.no 54. A thread shares its resources (like data section, code section, open files,
signals) with ___________

A : other process similar to the one that the thread belongs to

B : other threads that belong to similar processes

C : other threads that belong to the same process

D : all of the mentioned

Q.no 55. NVIDIA thought that the 'unifying theme' of every form of parallelism is the

A : CDA thread

B : PTA thread

C : CUDA thread

D : CUD thread

Q.no 56. Termination of the process terminates ___________

A : first thread of the process

B : first two threads of the process

C : all threads within the process

D : no thread within the process


Q.no 57. Given a number of elements in the range [0…n^3], which of the following
sorting algorithms can sort them in O(n) time?

A : Counting sort

B : Bucket sort

C : Radix sort

D : Quick sort
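
Note (a standard worked analysis, added for clarity): writing each key in base
n, every value in [0, n^3] has at most \lfloor \log_n n^3 \rfloor + 1 = 4
base-n digits, and one counting-sort pass per digit costs O(n + n), so radix
sort sorts the whole input in

    T(n) = O\bigl(4\,(n + n)\bigr) = O(n)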

Q.no 58. Which of the following two operations are provided by the IPC facility?

A : write & delete message

B : delete & receive message

C : send & delete message

D : receive & send message

Q.no 59. In indirect communication between processes P and Q __________

A : there is another process R to handle and pass on the messages between P and Q

B : there is another machine between the two processes to help communication

C : there is a mailbox to help communication between P and Q

D : none of the mentioned

Q.no 60. An octa-core processor is a processor of the computer system that
contains

A : 2 processors

B : 4 processors

C : 6 processors

D : 8 processors
Answer for Question No 1. is a

Answer for Question No 2. is d

Answer for Question No 3. is a

Answer for Question No 4. is a

Answer for Question No 5. is c

Answer for Question No 6. is c

Answer for Question No 7. is d

Answer for Question No 8. is b

Answer for Question No 9. is d

Answer for Question No 10. is b

Answer for Question No 11. is b

Answer for Question No 12. is d

Answer for Question No 13. is c

Answer for Question No 14. is c

Answer for Question No 15. is b

Answer for Question No 16. is b


Answer for Question No 17. is c

Answer for Question No 18. is a

Answer for Question No 19. is b

Answer for Question No 20. is c

Answer for Question No 21. is a

Answer for Question No 22. is d

Answer for Question No 23. is a

Answer for Question No 24. is d

Answer for Question No 25. is b

Answer for Question No 26. is a

Answer for Question No 27. is c

Answer for Question No 28. is b

Answer for Question No 29. is b

Answer for Question No 30. is a

Answer for Question No 31. is b

Answer for Question No 32. is b


Answer for Question No 33. is b

Answer for Question No 34. is a

Answer for Question No 35. is d

Answer for Question No 36. is b

Answer for Question No 37. is d

Answer for Question No 38. is d

Answer for Question No 39. is b

Answer for Question No 40. is b

Answer for Question No 41. is c

Answer for Question No 42. is c

Answer for Question No 43. is d

Answer for Question No 44. is b

Answer for Question No 45. is d

Answer for Question No 46. is d

Answer for Question No 47. is c

Answer for Question No 48. is b


Answer for Question No 49. is d

Answer for Question No 50. is d

Answer for Question No 51. is a

Answer for Question No 52. is c

Answer for Question No 53. is d

Answer for Question No 54. is c

Answer for Question No 55. is c

Answer for Question No 56. is c

Answer for Question No 57. is c

Answer for Question No 58. is d

Answer for Question No 59. is c

Answer for Question No 60. is d


Seat No -
Total number of questions : 60

11342_High Performance Computing


Time : 1hr
Max Marks : 50
N.B

1) All questions are Multiple Choice Questions having single correct option.

2) Attempt any 50 questions out of 60.

3) Use of calculator is allowed.

4) Each question carries 1 Mark.

5) Specially abled students are allowed 20 minutes extra for examination.

6) Do not use pencils to darken answer.

7) Use only black/blue ball point pen to darken the appropriate circle.

8) No change will be allowed once the answer is marked on OMR Sheet.

9) Rough work shall not be done on OMR sheet or on question paper.

10) Darken ONLY ONE CIRCLE for each answer.

Q.no 1. Which one of the following is not shared by threads?

A : program counter

B : stack

C : both program counter and stack

D : none of the mentioned

Q.no 2. Which of the following is not an application of Breadth First Search?

A : Finding shortest path between two nodes

B : Finding bipartiteness of a graph

C : GPS navigation system

D : Path Finding
Q.no 3. Regarding implementation of Breadth First Search using queues, what is
the maximum distance between two nodes present in the queue? (considering
each edge length 1)

A : Can be anything

B:0

C : At most 1

D : Insufficient Information

Q.no 4. In ………………. only one process at a time is allowed into its critical section,
among all processes that have critical sections for the same resource.

A : Mutual Exclusion

B : Synchronization

C : Deadlock

D : Starvation

Q.no 5. The kernel code is identified by the ________ qualifier with void return type

A : _host_

B : __global__

C : _device_

D : void

Q.no 6. Which of the following is not an application of Depth First Search?

A : For generating topological sort of a graph

B : For generating Strongly Connected Components of a directed graph

C : Detecting cycles in the graph

D : Peer to Peer Networks

Q.no 7. The time complexity of heap sort in worst case is

A : O(log n)

B : O(n)

C : O(nlogn)
D : O(n^2)

Q.no 8. The logical view of a machine supporting the message-passing paradigm
consists of p processes, each with its own _______

A : Partitioned Address space

B : Exclusive address space

C : Logical Address Space

D : Non-shared Address Space

Q.no 9. Which of the following is not an in-place sorting algorithm?

A : Selection sort

B : Heap sort

C : Quick Sort

D : Merge sort

Q.no 10. Which of the following is a typical type of HPC application?

A : Management

B : Mass media

C : Business

D : Science

Q.no 11. Which of the following is not a decomposition technique?

A : Data Decomposition

B : Recursive Decomposition

C : Serial Decomposition

D : Exploratory Decomposition

Q.no 12. Which of the following is not a mapping technique?

A : Static Mapping

B : Dynamic Mapping

C : Hybrid Mapping
D : All of Above

Q.no 13. The time complexity of a quick sort algorithm which uses the median,
found by an O(n) algorithm, as the pivot element is

A : O(n^2)

B : O(nlogn)

C : O(nlog(log(n))

D : O(n)

Q.no 14. The decomposition technique in which the function is used several
times is called _________

A : Data Decomposition

B : Recursive Decomposition

C : Speculative Decomposition

D : Exploratory Decomposition

Q.no 15. How many attributes are required to characterize the message-passing
paradigm?

A:2

B:4

C:6

D:8

Q.no 16. When is the Breadth First Search of a graph unique?

A : When the graph is a Binary Tree

B : When the graph is a Linked List

C : When the graph is a n-ary Tree

D : When the graph is a Ternary Tree

Q.no 17. Calling a kernel is typically referred to as _________.

A : kernel thread

B : kernel initialization
C : kernel termination

D : kernel invocation

Q.no 18. Message-passing programs are often written using

A : symmetric paradigm

B : asymmetric paradigm

C : asynchronous paradigm

D : synchronous paradigm

Q.no 19. Several instructions execution simultaneously in ________________

A : processing

B : parallel processing

C : serial processing

D : multitasking

Q.no 20. Decomposition stands for

A : Dividing Problem statement

B : Dividing no of processors

C : Dividing number of tasks

D : Dividing number of operation

Q.no 21. MIPS stands for?

A : Mandatory Instructions/sec

B : Millions of Instructions/sec

C : Most of Instructions/sec

D : Many Instructions / sec

Q.no 22. Which of the following is not a stable sorting algorithm?

A : Insertion sort

B : Selection sort
C : Bubble sort

D : Merge sort

Q.no 23. The decomposition technique in which the input is divided is called
_________

A : Data Decomposition

B : Recursive Decomposition

C : Speculative Decomposition

D : Exploratory Decomposition

Q.no 24. Most message-passing programs are written using

A : the single program multiple data (SPMD) model.

B : the multiple program and single data(MPSD) model

C : the single program single data (SPSD) model

D : the Multiple program multiple data (MPMD) model

Q.no 25. Depth First Search is equivalent to which of the traversal in the Binary
Trees?

A : Pre-order Traversal

B : Post-order Traversal

C : Level-order Traversal

D : In-order Traversal

Q.no 26. RMI stands for?

A : Remote Mail Invocation

B : Remaining Method Invention

C : Remaining Method Invocation

D : Remote Method Invocation

Q.no 27. Nanoscience can be studied with the help of ___________

A : Quantum mechanics
B : Newtonian mechanics

C : Macro-dynamic

D : Geophysics

Q.no 28. It is ___________ speed and ___________ latency.

A : High, high

B : Low, low

C : High, low

D : Low, high

Q.no 29. A process can be ___________

A : single threaded

B : multithreaded

C : both single threaded and multithreaded

D : none of the mentioned

Q.no 30. If one thread opens a file with read privileges then ___________

A : other threads in the another process can also read from that file

B : other threads in the same process can also read from that file

C : any other thread can not read from that file

D : all of the mentioned

Q.no 31. Which of the following is easier for ceramic components through nanostructuring?

A : Lubrication

B : Coating

C : Fabrication

D : Wear

Q.no 32. In cloud computing, we have an internet cloud of resources to form

A : Centralized computing
B : Decentralized computing

C : Parallel computing

D : All of Above

Q.no 33. What is interprocess communication?

A : allows processes to communicate and synchronize their actions when using the
same address space

B : allows processes to communicate and synchronize their actions without using the
same address space

C : allows the processes to only synchronize their actions without communication

D : none of the mentioned

Q.no 34. Time complexity of bubble sort in best case is

A : θ (n)

B : θ (nlogn)

C : θ (n^2)

D : θ (n(logn)^2)

Q.no 35. Consider the situation in which the assignment operation is very costly.
Which of the following sorting algorithms should be performed so that the
number of assignment operations is minimized in general?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 36. Message passing system allows processes to __________

A : communicate with one another without resorting to shared data

B : communicate with one another by resorting to shared data

C : share data

D : name the recipient or sender of the message


Q.no 37. ______________ leads to concurrency.

A : Serialization

B : Parallelism

C : Serial processing

D : Distribution

Q.no 38. High performance computing tasks of a computer system are done by

A : node clusters

B : network clusters

C : both a and b

D : Beowulf clusters

Q.no 39. What happens when the event for which a thread is blocked occurs?

A : thread moves to the ready queue

B : thread remains blocked

C : thread completes

D : a new thread is provided

Q.no 40. A dynamic network of networks, a dynamic connection that grows, is
called

A : Multithreading

B : Cyber cycle

C : Internet of things

D : None of these

Q.no 41. If the given input array is sorted or nearly sorted, which of the following
algorithms gives the best performance?

A : Insertion sort

B : Selection sort

C : Bubble sort
D : Merge sort

Q.no 42. Execution of several activities at the same time is called

A : multi processing

B : parallel processing

C : serial processing

D : multitasking

Q.no 43. _____ are major issues with non-buffered blocking sends

A : concurrency and mutual exclusion

B : Idling and deadlocks

C : synchronization

D : scheduling

Q.no 44. Parallel computing uses _____ execution

A : sequential

B : unique

C : simultaneous

D : none of the answers is correct

Q.no 45. Writing parallel programs is referred to as

A : Parallel computation

B : Parallel processes

C : Parallel development

D : Parallel programming

Q.no 46. Which of the following is NOT a characteristic of parallel computing?

A : Breaks a task into pieces

B : Uses a single processor or computer

C : Simultaneous execution
D : May use networking

Q.no 47. Interprocessor communication takes place via

A : Centralized memory

B : Shared memory

C : Message passing

D : Both A and B

Q.no 48. Running merge sort on an array of size n which is already sorted is

A : O(n)

B : O(nlogn)

C : O(n^2)

D : O(log n)

Q.no 49. Which of the following is not a noncomparison sort?

A : Counting sort

B : Bucket sort

C : Radix sort

D : Shell sort

Q.no 50. The time required to create a new thread in an existing process is
___________

A : greater than the time required to create a new process

B : less than the time required to create a new process

C : equal to the time required to create a new process

D : none of the mentioned

Q.no 51. Multi-processor systems of the computer system have the advantage of

A : cost

B : reliability

C : uncertainty
D : scalability

Q.no 52. The parallelism achieved on the basis of operations is called

A : Instruction level

B : Thread level

C : Transaction level

D : None of Above

Q.no 53. Process synchronization of programs is done by

A : input

B : output

C : operating system

D : memory

Q.no 54. An octa-core processor is a processor of the computer system that
contains

A : 2 processors

B : 4 processors

C : 6 processors

D : 8 processors

Q.no 55. Thread synchronization is required because ___________

A : all threads of a process share the same address space

B : all threads of a process share the same global variables

C : all threads of a process can share the same files

D : all of the mentioned

Q.no 56. Job throughput, data access, and storage are elements of __________.

A : Flexibility

B : Adaptation

C : Efficiency
D : Dependability

Q.no 57. Messages sent by a process __________

A : have to be of a fixed size

B : have to be a variable size

C : can be fixed or variable sized

D : None of the mentioned

Q.no 58. The link between two processes P and Q to send and receive messages is
called __________

A : communication link

B : message-passing link

C : synchronization link

D : all of the mentioned

Q.no 59. Which of the following algorithms has the lowest worst-case time
complexity?

A : Insertion sort

B : Selection sort

C : Quick Sort

D : Heap sort

Q.no 60. The register context and stacks of a thread are deallocated when the
thread?

A : terminates

B : blocks

C : unblocks

D : spawns
Answer for Question No 1. is c

Answer for Question No 2. is d

Answer for Question No 3. is c

Answer for Question No 4. is a

Answer for Question No 5. is b

Answer for Question No 6. is d

Answer for Question No 7. is c

Answer for Question No 8. is b

Answer for Question No 9. is d

Answer for Question No 10. is d

Answer for Question No 11. is c

Answer for Question No 12. is d

Answer for Question No 13. is b

Answer for Question No 14. is b

Answer for Question No 15. is a

Answer for Question No 16. is b


Answer for Question No 17. is d

Answer for Question No 18. is c

Answer for Question No 19. is b

Answer for Question No 20. is a

Answer for Question No 21. is b

Answer for Question No 22. is b

Answer for Question No 23. is a

Answer for Question No 24. is c

Answer for Question No 25. is a

Answer for Question No 26. is d

Answer for Question No 27. is a

Answer for Question No 28. is c

Answer for Question No 29. is c

Answer for Question No 30. is b

Answer for Question No 31. is c

Answer for Question No 32. is d


Answer for Question No 33. is b

Answer for Question No 34. is a

Answer for Question No 35. is b

Answer for Question No 36. is a

Answer for Question No 37. is b

Answer for Question No 38. is d

Answer for Question No 39. is a

Answer for Question No 40. is c

Answer for Question No 41. is b

Answer for Question No 42. is b

Answer for Question No 43. is b

Answer for Question No 44. is c

Answer for Question No 45. is d

Answer for Question No 46. is a

Answer for Question No 47. is d

Answer for Question No 48. is b


Answer for Question No 49. is d

Answer for Question No 50. is b

Answer for Question No 51. is b

Answer for Question No 52. is c

Answer for Question No 53. is c

Answer for Question No 54. is d

Answer for Question No 55. is d

Answer for Question No 56. is c

Answer for Question No 57. is c

Answer for Question No 58. is a

Answer for Question No 59. is d

Answer for Question No 60. is a


Seat No -
Total number of questions : 60

11342_High Performance Computing


Time : 1hr
Max Marks : 50
N.B

1) All questions are Multiple Choice Questions having single correct option.

2) Attempt any 50 questions out of 60.

3) Use of calculator is allowed.

4) Each question carries 1 Mark.

5) Specially abled students are allowed 20 minutes extra for examination.

6) Do not use pencils to darken answer.

7) Use only black/blue ball point pen to darken the appropriate circle.

8) No change will be allowed once the answer is marked on OMR Sheet.

9) Rough work shall not be done on OMR sheet or on question paper.

10) Darken ONLY ONE CIRCLE for each answer.

Q.no 1. Which one of the following is not shared by threads?

A : program counter

B : stack

C : both program counter and stack

D : none of the mentioned

Q.no 2. Regarding implementation of Breadth First Search using queues, what is
the maximum distance between two nodes present in the queue? (considering
each edge length 1)

A : Can be anything

B:0

C : At most 1

D : Insufficient Information
Q.no 3. The time complexity of heap sort in worst case is

A : O(log n)

B : O(n)

C : O(nlogn)

D : O(n^2)

Q.no 4. MIPS stands for?

A : Mandatory Instructions/sec

B : Millions of Instructions/sec

C : Most of Instructions/sec

D : Many Instructions / sec

Q.no 5. The time complexity of a quick sort algorithm which uses the median,
found by an O(n) algorithm, as the pivot element is

A : O(n^2)

B : O(nlogn)

C : O(nlog(log(n))

D : O(n)

Q.no 6. The decomposition technique in which the input is divided is called
_________

A : Data Decomposition

B : Recursive Decomposition

C : Speculative Decomposition

D : Exploratory Decomposition

Q.no 7. Most message-passing programs are written using

A : the single program multiple data (SPMD) model.

B : the multiple program and single data(MPSD) model

C : the single program single data (SPSD) model


D : the Multiple program multiple data (MPMD) model

Q.no 8. Which of the following is not a mapping technique?

A : Static Mapping

B : Dynamic Mapping

C : Hybrid Mapping

D : All of Above

Q.no 9. How many attributes are required to characterize the message-passing paradigm?

A:2

B:4

C:6

D:8

Q.no 10. The kernel code is identified by the ________ qualifier with void return type

A : _host_

B : __global__

C : _device_

D : void

Q.no 11. Depth First Search is equivalent to which of the traversal in the Binary
Trees?

A : Pre-order Traversal

B : Post-order Traversal

C : Level-order Traversal

D : In-order Traversal

Q.no 12. Message-passing programs are often written using

A : symmetric paradigm

B : asymmetric paradigm

C : asynchronous paradigm
D : synchronous paradigm

Q.no 13. Several instructions execution simultaneously in ________________

A : processing

B : parallel processing

C : serial processing

D : multitasking

Q.no 14. The decomposition technique in which the function is used several
times is called _________

A : Data Decomposition

B : Recursive Decomposition

C : Speculative Decomposition

D : Exploratory Decomposition

Q.no 15. Which of the following is a stable sorting algorithm?

A : Merge sort

B : Typical in-place quick sort

C : Heap sort

D : Selection sort

Q.no 16. In ………………. only one process at a time is allowed into its critical
section, among all processes that have critical sections for the same resource.

A : Mutual Exclusion

B : Synchronization

C : Deadlock

D : Starvation

Q.no 17. Decomposition stands for

A : Dividing Problem statement

B : Dividing no of processors
C : Dividing number of tasks

D : Dividing number of operation

Q.no 18. Which of the following is not an application of Depth First Search?

A : For generating topological sort of a graph

B : For generating Strongly Connected Components of a directed graph

C : Detecting cycles in the graph

D : Peer to Peer Networks

Q.no 19. When is the Breadth First Search of a graph unique?

A : When the graph is a Binary Tree

B : When the graph is a Linked List

C : When the graph is a n-ary Tree

D : When the graph is a Ternary Tree

Q.no 20. Which of the following is not a stable sorting algorithm?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 21. Which of the following is not an in-place sorting algorithm?

A : Selection sort

B : Heap sort

C : Quick Sort

D : Merge sort

Q.no 22. Which of the following is not an application of Breadth First Search?

A : Finding shortest path between two nodes

B : Finding bipartiteness of a graph


C : GPS navigation system

D : Path Finding

Q.no 23. Calling a kernel is typically referred to as _________.

A : kernel thread

B : kernel initialization

C : kernel termination

D : kernel invocation

Q.no 24. The logical view of a machine supporting the message-passing paradigm
consists of p processes, each with its own _______

A : Partitioned Address space

B : Exclusive address space

C : Logical Address Space

D : Non-shared Address Space

Q.no 25. Types of HPC applications include

A : Management

B : Media mass

C : Business

D : Science

Q.no 26. Which of the following is NOT a characteristic of parallel computing?

A : Breaks a task into pieces

B : Uses a single processor or computer

C : Simultaneous execution

D : May use networking

Q.no 27. It is ___________ speed and ___________ latency.

A : High, high

B : Low, low
C : High, low

D : Low, high

Q.no 28. Cloud computing offers a broader concept than which of the following?

A : Parallel computing

B : Centralized computing

C : Utility computing

D : Decentralized computing

Q.no 29. ______________ leads to concurrency.

A : Serialization

B : Parallelism

C : Serial processing

D : Distribution

Q.no 30. The basic operations in the message-passing programming paradigm are ___

A : initiate and listen

B : wait and acknoweldge

C : request and reply

D : send and receive
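
A minimal sketch of this send/receive pair using MPI (an assumed choice of
message-passing library; the question itself names no specific one):

/* send_recv.c - compile with mpicc, run with mpirun -np 2 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        value = 42;                                          /* data to send */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* "send"       */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                         /* "receive"    */
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}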

Q.no 31. A dynamic network of networks, i.e., a dynamic connection that grows, is called

A : Multithreading

B : Cyber cycle

C : Internet of things

D : None of these

Q.no 32. Running merge sort on an array of size n which is already sorted is

A : O(n)
B : O(nlogn)

C : O(n^2)

D : O(log n)

Q.no 33. Which network topology is used for an interconnection network?

A : Bus based

B : Mesh

C : Linear Array

D : All of above

Q.no 34. Message passing system allows processes to __________

A : communicate with one another without resorting to shared data

B : communicate with one another by resorting to shared data

C : share data

D : name the recipient or sender of the message

Q.no 35. In cloud computing, an internet cloud of resources can form

A : Centralized computing

B : Decentralized computing

C : Parallel computing

D : All of Above

Q.no 36. If one thread opens a file with read privileges then ___________

A : other threads in the another process can also read from that file

B : other threads in the same process can also read from that file

C : any other thread can not read from that file

D : all of the mentioned

Q.no 37. Consider the situation in which the assignment operation is very costly.
Which of the following sorting algorithms should be used so that the
number of assignment operations is minimized in general?
A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 38. Parallel computing uses _____ execution

A : sequential

B : unique

C : simultaneous

D : none of the answers is correct

Q.no 39. RMI stands for?

A : Remote Mail Invocation

B : Remaining Method Invention

C : Remaining Method Invocation

D : Remote Method Invocation

Q.no 40. Network interfaces allow the transfer of messages from buffer memory to the
desired location without ____ intervention

A : DMA

B : CPU

C : I/O

D : Memory

Q.no 41. Nanoscience can be studied with the help of ___________

A : Quantum mechanics

B : Newtonian mechanics

C : Macro-dynamic

D : Geophysics

Q.no 42. When the event for which a thread is blocked occurs?
A : thread moves to the ready queue

B : thread remains blocked

C : thread completes

D : a new thread is provided

Q.no 43. The time required to create a new thread in an existing process is
___________

A : greater than the time required to create a new process

B : less than the time required to create a new process

C : equal to the time required to create a new process

D : none of the mentioned

Q.no 44. In high performance computing, computer system tasks are done by

A : node clusters

B : network clusters

C : both a and b

D : Beowulf clusters

Q.no 45. Time complexity of bubble sort in best case is

A : θ (n)

B : θ (nlogn)

C : θ (n^2)

D : θ (n(logn)^2)

Q.no 46. Writing parallel programs is referred to as

A : Parallel computation

B : Parallel processes

C : Parallel development

D : Parallel programming

Q.no 47. Execution of several activities at the same time.


A : multi processing

B : parallel processing

C : serial processing

D : multitasking

Q.no 48. Which of the following for ceramic components becomes easier through nanostructuring?

A : Lubrication

B : Coating

C : Fabrication

D : Wear

Q.no 49. If the given input array is sorted or nearly sorted, which of the following
algorithms gives the best performance?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 50. _____ are major issues with non-buffered blocking sends

A : concurrency and mutual exclusion

B : Idling and deadlocks

C : synchronization

D : scheduling

Q.no 51. A thread is also called ___________

A : Light Weight Process(LWP)

B : Heavy Weight Process(HWP)

C : Process

D : None of the mentioned

Q.no 52. NVIDIA thought that the 'unifying theme' of every form of parallelism is the
A : CDA thread

B : PTA thread

C : CUDA thread

D : CUD thread

Q.no 53. In indirect communication between processes P and Q __________

A : there is another process R to handle and pass on the messages between P and Q

B : there is another machine between the two processes to help communication

C : there is a mailbox to help communication between P and Q

D : none of the mentioned

Q.no 54. The transparency that enables accessing local and remote resources
using identical operations is called ____________

A : Access transparency

B : Concurrency transparency

C : Performance transparency

D : Scaling transparency

Q.no 55. Octa-core processors are the processors of the computer system that
contain

A : 2 processors

B : 4 processors

C : 6 processors

D : 8 processors

Q.no 56. Given a number of elements in the range [0….n^3], which of the following
sorting algorithms can sort them in O(n) time?

A : Counting sort

B : Bucket sort

C : Radix sort

D : Quick sort
Q.no 57. Which of the following is not the possible ways of data exchange?

A : Simplex

B : Multiplex

C : Half-duplex

D : Full-duplex

Q.no 58. The register context and stacks of a thread are deallocated when the
thread?

A : terminates

B : blocks

C : unblocks

D : spawns

Q.no 59. A dynamic network, i.e., a dynamic connection that grows, is called

A : Multithreading

B : Cyber cycle

C : Internet of things

D : Cyber-physical system

Q.no 60. Multi-processor computer systems have the advantage of

A : cost

B : reliability

C : uncertainty

D : scalability
Answer for Question No 1. is c

Answer for Question No 2. is c

Answer for Question No 3. is c

Answer for Question No 4. is b

Answer for Question No 5. is b

Answer for Question No 6. is a

Answer for Question No 7. is c

Answer for Question No 8. is d

Answer for Question No 9. is a

Answer for Question No 10. is b

Answer for Question No 11. is a

Answer for Question No 12. is c

Answer for Question No 13. is b

Answer for Question No 14. is b

Answer for Question No 15. is a

Answer for Question No 16. is a


Answer for Question No 17. is a

Answer for Question No 18. is d

Answer for Question No 19. is b

Answer for Question No 20. is b

Answer for Question No 21. is d

Answer for Question No 22. is d

Answer for Question No 23. is d

Answer for Question No 24. is b

Answer for Question No 25. is d

Answer for Question No 26. is a

Answer for Question No 27. is c

Answer for Question No 28. is c

Answer for Question No 29. is b

Answer for Question No 30. is d

Answer for Question No 31. is c

Answer for Question No 32. is b


Answer for Question No 33. is d

Answer for Question No 34. is a

Answer for Question No 35. is d

Answer for Question No 36. is b

Answer for Question No 37. is b

Answer for Question No 38. is c

Answer for Question No 39. is d

Answer for Question No 40. is b

Answer for Question No 41. is a

Answer for Question No 42. is a

Answer for Question No 43. is b

Answer for Question No 44. is d

Answer for Question No 45. is a

Answer for Question No 46. is d

Answer for Question No 47. is b

Answer for Question No 48. is c


Answer for Question No 49. is b

Answer for Question No 50. is b

Answer for Question No 51. is a

Answer for Question No 52. is c

Answer for Question No 53. is c

Answer for Question No 54. is a

Answer for Question No 55. is d

Answer for Question No 56. is c

Answer for Question No 57. is b

Answer for Question No 58. is a

Answer for Question No 59. is c

Answer for Question No 60. is b


Seat No -
Total number of questions : 60

11342_High Performance Computing


Time : 1hr
Max Marks : 50
N.B

1) All questions are Multiple Choice Questions having single correct option.

2) Attempt any 50 questions out of 60.

3) Use of calculator is allowed.

4) Each question carries 1 Mark.

5) Specially abled students are allowed 20 minutes extra for examination.

6) Do not use pencils to darken answer.

7) Use only black/blue ball point pen to darken the appropriate circle.

8) No change will be allowed once the answer is marked on OMR Sheet.

9) Rough work shall not be done on OMR sheet or on question paper.

10) Darken ONLY ONE CIRCLE for each answer.

Q.no 1. MIPS stands for?

A : Mandatory Instructions/sec

B : Millions of Instructions/sec

C : Most of Instructions/sec

D : Many Instructions / sec

Q.no 2. In ………………. only one process at a time is allowed into its critical section,
among all processes that have critical sections for the same resource.

A : Mutual Exclusion

B : Synchronization

C : Deadlock

D : Starvation
Q.no 3. Which of the following is not an application of Breadth First Search?

A : Finding shortest path between two nodes

B : Finding bipartiteness of a graph

C : GPS navigation system

D : Path Finding

Q.no 4. Regarding implementation of Breadth First Search using queues, what is
the maximum distance between two nodes present in the queue? (considering
each edge length 1)

A : Can be anything

B:0

C : At most 1

D : Insufficient Information

Q.no 5. The decomposition technique in which the input is divided is called as_________

A : Data Decomposition

B : Recursive Decomposition

C : Speculative Decomposition

D : Exploratory Decomposition

Q.no 6. Which of the following is a stable sorting algorithm?

A : Merge sort

B : Typical in-place quick sort

C : Heap sort

D : Selection sort

Q.no 7. Which one of the following is not shared by threads?

A : program counter

B : stack

C : both program counter and stack


D : none of the mentioned

Q.no 8. Several instructions execution simultaneously in ________________

A : processing

B : parallel processing

C : serial processing

D : multitasking

Q.no 9. When the Breadth First Search of a graph is unique?

A : When the graph is a Binary Tree

B : When the graph is a Linked List

C : When the graph is a n-ary Tree

D : When the graph is a Ternary Tree

Q.no 10. Message-passing programs are often written using

A : symmetric Paradigm

B : asymmetric Paradigm

C : asynchronous paradigm

D : synchronous paradigm

Q.no 11. How many attributes are required to characterize the message-passing
paradigm?

A:2

B:4

C:6

D:8

Q.no 12. Following is not mapping technique

A : Static Mapping

B : Dynamic Mapping

C : Hybrid Mapping
D : All of Above

Q.no 13. The time complexity of a quick sort algorithm which makes use of
median, found by an O(n) algorithm, as pivot element is

A : O(n^2)

B : O(nlogn)

C : O(nlog(log(n)))

D : O(n)

Q.no 14. Following is not decomposition technique

A : Data Decomposition

B : Recursive Decomposition

C : Serial Decomposition

D : Exploratory Decomposition

Q.no 15. Most message-passing programs are written using

A : the single program multiple data (SPMD) model.

B : the multiple program and single data(MPSD) model

C : the single program single data (SPSD) model

D : the Multiple program multiple data (SPMD) model

Q.no 16. The logical view of a machine supporting the message-passing paradigm
consists of p processes, each with its own _______

A : Partitioned Address space

B : Exclusive address space

C : Logical Address Space

D : Non-shared Address Space

Q.no 17. The time complexity of heap sort in worst case is

A : O(log n)

B : O(n)
C : O(nlogn)

D : O(n^2)

Q.no 18. Decomposition stands for

A : Dividing Problem statement

B : Dividing no of processors

C : Dividing number of tasks

D : Dividing number of operation

Q.no 19. Types of HPC applications include

A : Management

B : Media mass

C : Business

D : Science

Q.no 20. The kernel code is identified by the ________ qualifier with void return type

A : _host_

B : __global__

C : _device_

D : void

Q.no 21. Which of the following is not a stable sorting algorithm?

A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 22. Which of the following is not an application of Depth First Search?

A : For generating topological sort of a graph

B : For generating Strongly Connected Components of a directed graph


C : Detecting cycles in the graph

D : Peer to Peer Networks

Q.no 23. Calling a kernel is typically referred to as _________.

A : kernel thread

B : kernel initialization

C : kernel termination

D : kernel invocation

Q.no 24. Which of the following is not an in-place sorting algorithm?

A : Selection sort

B : Heap sort

C : Quick Sort

D : Merge sort

Q.no 25. Depth First Search is equivalent to which of the traversal in the Binary
Trees?

A : Pre-order Traversal

B : Post-order Traversal

C : Level-order Traversal

D : In-order Traversal

Q.no 26. Cloud computing offers a broader concept than which of the following?

A : Parallel computing

B : Centralized computing

C : Utility computing

D : Decentralized computing

Q.no 27. RMI stands for?

A : Remote Mail Invocation

B : Remaining Method Invention


C : Remaining Method Invocation

D : Remote Method Invocation

Q.no 28. A dynamic network of networks, i.e., a dynamic connection that grows, is called

A : Multithreading

B : Cyber cycle

C : Internet of things

D : None of these

Q.no 29. The basic operations in the message-passing programming paradigm are ___

A : initiate and listen

B : wait and acknoweldge

C : request and reply

D : send and receive

Q.no 30. Which of the following is NOT a characteristic of parallel computing?

A : Breaks a task into pieces

B : Uses a single processor or computer

C : Simultaneous execution

D : May use networking

Q.no 31. Parallel computing uses _____ execution

A : sequential

B : unique

C : simultaneous

D : none of the answers is correct

Q.no 32. A process can be ___________

A : single threaded
B : multithreaded

C : both single threaded and multithreaded

D : none of the mentioned

Q.no 33. Network interfaces allow the transfer of messages from buffer memory to the
desired location without ____ intervention

A : DMA

B : CPU

C : I/O

D : Memory

Q.no 34. In high performance computing, computer system tasks are done by

A : node clusters

B : network clusters

C : both a and b

D : Beowulf clusters

Q.no 35. Message passing system allows processes to __________

A : communicate with one another without resorting to shared data

B : communicate with one another by resorting to shared data

C : share data

D : name the recipient or sender of the message

Q.no 36. The time required to create a new thread in an existing process is
___________

A : greater than the time required to create a new process

B : less than the time required to create a new process

C : equal to the time required to create a new process

D : none of the mentioned

Q.no 37. It is ___________ speed and ___________ latency.


A : High, high

B : Low, low

C : High, low

D : Low, high

Q.no 38. Which of the following for ceramic components becomes easier through nanostructuring?

A : Lubrication

B : Coating

C : Fabrication

D : Wear

Q.no 39. _____ are major issues with non-buffered blocking sends

A : concurrency and mutual exclusion

B : Idling and deadlocks

C : synchronization

D : scheduling

Q.no 40. Time complexity of bubble sort in best case is

A : θ (n)

B : θ (nlogn)

C : θ (n^2)

D : θ (n(logn)^2)

Q.no 41. When the event for which a thread is blocked occurs?

A : thread moves to the ready queue

B : thread remains blocked

C : thread completes

D : a new thread is provided

Q.no 42. If the given input array is sorted or nearly sorted, which of the following
algorithms gives the best performance?
A : Insertion sort

B : Selection sort

C : Bubble sort

D : Merge sort

Q.no 43. If one thread opens a file with read privileges then ___________

A : other threads in the another process can also read from that file

B : other threads in the same process can also read from that file

C : any other thread can not read from that file

D : all of the mentioned

Q.no 44. Which network topology is used for an interconnection network?

A : Bus based

B : Mesh

C : Linear Array

D : All of above

Q.no 45. Execution of several activities at the same time.

A : multi processing

B : parallel processing

C : serial processing

D : multitasking

Q.no 46. What is Inter process communication?

A : allows processes to communicate and synchronize their actions when using the
same address space

B : allows processes to communicate and synchronize their actions without using the
same address space

C : allows the processes to only synchronize their actions without communication

D : none of the mentioned


Q.no 47. Which of the following is not a noncomparison sort?

A : Counting sort

B : Bucket sort

C : Radix sort

D : Shell sort

Q.no 48. Running merge sort on an array of size n which is already sorted is

A : O(n)

B : O(nlogn)

C : O(n^2)

D : O(log n)

Q.no 49. Writing parallel programs is referred to as

A : Parallel computation

B : Parallel processes

C : Parallel development

D : Parallel programming

Q.no 50. Nanoscience can be studied with the help of ___________

A : Quantum mechanics

B : Newtonian mechanics

C : Macro-dynamic

D : Geophysics

Q.no 51. Given a number of elements in the range [0….n^3], which of the following
sorting algorithms can sort them in O(n) time?

A : Counting sort

B : Bucket sort

C : Radix sort

D : Quick sort
Q.no 52. Thread synchronization is required because ___________

A : all threads of a process share the same address space

B : all threads of a process share the same global variables

C : all threads of a process can share the same files

D : all of the mentioned

Q.no 53. In indirect communication between processes P and Q __________

A : there is another process R to handle and pass on the messages between P and Q

B : there is another machine between the two processes to help communication

C : there is a mailbox to help communication between P and Q

D : none of the mentioned

Q.no 54. Which of the following is not the possible ways of data exchange?

A : Simplex

B : Multiplex

C : Half-duplex

D : Full-duplex

Q.no 55. The link between two processes P and Q to send and receive messages is
called __________

A : communication link

B : message-passing link

C : synchronization link

D : all of the mentioned

Q.no 56. Octa-core processors are the processors of the computer system that
contain

A : 2 processors

B : 4 processors

C : 6 processors
D : 8 processors

Q.no 57. Multi-processor computer systems have the advantage of

A : cost

B : reliability

C : uncertainty

D : scalability

Q.no 58. Termination of the process terminates ___________

A : first thread of the process

B : first two threads of the process

C : all threads within the process

D : no thread within the process

Q.no 59. The transparency that enables accessing local and remote resources
using identical operations is called ____________

A : Access transparency

B : Concurrency transparency

C : Performance transparency

D : Scaling transparency

Q.no 60. NVIDIA thought that the 'unifying theme' of every form of parallelism is the

A : CDA thread

B : PTA thread

C : CUDA thread

D : CUD thread
Answer for Question No 1. is b

Answer for Question No 2. is a

Answer for Question No 3. is d

Answer for Question No 4. is c

Answer for Question No 5. is a

Answer for Question No 6. is a

Answer for Question No 7. is c

Answer for Question No 8. is b

Answer for Question No 9. is b

Answer for Question No 10. is c

Answer for Question No 11. is a

Answer for Question No 12. is d

Answer for Question No 13. is b

Answer for Question No 14. is c

Answer for Question No 15. is c

Answer for Question No 16. is b


Answer for Question No 17. is c

Answer for Question No 18. is a

Answer for Question No 19. is d

Answer for Question No 20. is b

Answer for Question No 21. is b

Answer for Question No 22. is d

Answer for Question No 23. is d

Answer for Question No 24. is d

Answer for Question No 25. is a

Answer for Question No 26. is c

Answer for Question No 27. is d

Answer for Question No 28. is c

Answer for Question No 29. is d

Answer for Question No 30. is a

Answer for Question No 31. is c

Answer for Question No 32. is c


Answer for Question No 33. is b

Answer for Question No 34. is d

Answer for Question No 35. is a

Answer for Question No 36. is b

Answer for Question No 37. is c

Answer for Question No 38. is c

Answer for Question No 39. is b

Answer for Question No 40. is a

Answer for Question No 41. is a

Answer for Question No 42. is b

Answer for Question No 43. is b

Answer for Question No 44. is d

Answer for Question No 45. is b

Answer for Question No 46. is b

Answer for Question No 47. is d

Answer for Question No 48. is b


Answer for Question No 49. is d

Answer for Question No 50. is a

Answer for Question No 51. is c

Answer for Question No 52. is d

Answer for Question No 53. is c

Answer for Question No 54. is b

Answer for Question No 55. is a

Answer for Question No 56. is d

Answer for Question No 57. is b

Answer for Question No 58. is c

Answer for Question No 59. is a

Answer for Question No 60. is c


Which of the following statements are true with regard to compute capability in CUDA

A. Code compiled for hardware of one compute capability will not need to be
re-compiled to run on hardware of another

B. Different compute capabilities may imply a different amount of local memory per
thread

C. Compute capability is measured by the number of FLOPS a GPU accelerator can
compute.

Answer : B

True or False: The threads in a thread block are distributed across SM units so that each
thread is executed by one SM unit.

A. True

B. False

Answer : B

The style of parallelism supported on GPUs is best described as

A. SISD - Single Instruction Single Data

B. MISD - Multiple Instruction Single Data

C. SIMT - Single Instruction Multiple Thread

Answer : C

True or false: Functions annotated with the __global__ qualifier may be executed on the
host or the device

A. True

B. False

Answer : B
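
A minimal sketch (kernel name hypothetical) of the rule this question probes:
a __global__ function executes on the device and is launched from host code.

// scale.cu - __global__ marks device code; the host launches it.
__global__ void scale(float *x, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    x[i] = s * x[i];                                // runs on the GPU
}
// Host side (dev_x assumed already allocated with cudaMalloc):
//     scale<<<4, 256>>>(dev_x, 2.0f);   // 4 blocks x 256 threads each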
Which of the following correctly describes a GPU kernel

A. A kernel may contain a mix of host and GPU code

B. All thread blocks involved in the same computation use the same kernel

C. A kernel is part of the GPU's internal micro-operating system, allowing it to act as an
independent host

Answer : B

Which of the following is not a form of parallelism supported by CUDA

A. Vector parallelism - Floating point computations are executed in parallel on wide
vector units

B. Thread level task parallelism - Different threads execute a different tasks

C. Block and grid level parallelism - Different blocks or grids execute different tasks

D. Data parallelism - Different threads and blocks process different parts of data in
memory

Answer : A

What strategy does the GPU employ if the threads within a warp diverge in their execution?

A. Threads are moved to different warps so that divergence does not occur within a
single warp

B. Threads are allowed to diverge

C. All possible execution paths are run by all threads in a warp serially so that thread
instructions do not diverge

Answer : C

Which of the following does not result in uncoalesced (i.e. serialized) memory access on the
K20 GPUs installed on Stampede

A. Aligned, but non-sequential access

B. Misaligned data access

C. Sparse memory access

Answer : A
Which of the following correctly describes the relationship between Warps, thread blocks,
and CUDA cores?

A. A warp is divided into a number of thread blocks, and each thread block executes on
a single CUDA core

B. A thread block may be divided into a number of warps, and each warp may execute
on a single CUDA core

C. A thread block is assigned to a warp, and each thread in the warp is executed on a
separate CUDA core

Answer : B

Shared memory in CUDA is accessible to:

A. All threads in a single block

B. Both the host and GPU

C. All threads associated with a single kernel

Answer : A
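
A short sketch (hypothetical kernel) of what this answer means in practice:
__shared__ data is private to one thread block and is coordinated with
__syncthreads().

// blocksum.cu - assumes the kernel is launched with 256 threads per block.
__global__ void blockSum(const float *in, float *out) {
    __shared__ float buf[256];                  // one copy per thread block
    int t = threadIdx.x;
    buf[t] = in[blockIdx.x * blockDim.x + t];   // stage data in shared memory
    __syncthreads();                            // whole block now sees buf
    if (t == 0) {                               // thread 0 reduces the block
        float s = 0.0f;
        for (int k = 0; k < blockDim.x; ++k) s += buf[k];
        out[blockIdx.x] = s;                    // one partial sum per block
    }
}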

The CPU (host) side of the CUDA architecture consists of

A. CUDA Libraries

B. CUDA Runtime

C. CUDA Driver

D. All Above

Answer : D

CUDA platform works on

A. C

B. C++

C. Fortran

D. All Above

Answer : D
Threads support Shared memory and Synchronization

A. True

B. False

Answer : A

Application of CUDA are

A. Fast Video Transcoding

B. Medical Imaging

C. Computational Science

D. Oil and Natural Resources exploration

E. All Above

Answer : E

The GPU executes device code

A. True

B. False

Answer : A
What are the issues in sorting?

A. Where the Input and Output Sequences are Stored

B. How Comparisons are Performed

C. All above

Answer : C

The parallel run time of the formulation for Bubble sort is

A. Tp = O((n/p) log(n/p)) + O(n) + O(n)

B. Tp = O((n/p) log(n/p)) + O((n/p) log p) + O(n/p)

C. None of the above

Answer : A

What are the variants of Bubble sort?

A. Shell sort

B. Quick sort

C. Odd-Even transposition

D. Option A & C

Answer : D
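
For reference, a serial sketch of the odd-even transposition variant named
above; each phase's compare-exchanges touch disjoint pairs, which is exactly
what makes this variant easy to parallelize (one process per element or block).

/* odd_even.c */
void odd_even_sort(int a[], int n) {
    for (int phase = 0; phase < n; ++phase) {          /* n phases suffice  */
        int start = (phase % 2 == 0) ? 0 : 1;          /* even or odd phase */
        for (int i = start; i + 1 < n; i += 2) {       /* disjoint pairs    */
            if (a[i] > a[i + 1]) {
                int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
            }
        }
    }
}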

What is the overall complexity of the parallel algorithm for quick sort?

A. Tp = O((n/p) log(n/p)) + O((n/p) log p) + O(log^2 p)

B. Tp = O((n/p) log(n/p)) + O((n/p) log p)

C. Tp = O((n/p) log(n/p)) + O(log^2 p)

Answer : A
Formally, given a weighted graph G(V, E, w), the all-pairs shortest paths problem is to
find the shortest paths between all pairs of vertices. True or False?

A. True
B. False

Answer : A

What is true for parallel formulation of Dijkstra’s Algorithm?

A. One approach partitions the vertices among different processes and has each process
compute the single-source shortest paths for all vertices assigned to it. We refer to
this approach as the source-partitioned formulation.
B. Another approach assigns each vertex to a set of processes and uses the parallel
formulation of the single-source algorithm to solve the problem on each set of
processes. We refer to this approach as the source-parallel formulation.
C. Both are true
D. None of these is true

Answer : C

Search algorithms can be used to solve discrete optimization problems. True or False ?

A. True
B. False
Answer : A

Examples of discrete optimization problems are:


A. planning and scheduling,
B. The optimal layout of VLSI chips,
C. Robot motion planning,
D. Test-pattern generation for digital circuits, and logistics and control.
E. All of above
Answer : E

List the important parameters of Parallel DFS

A. Work- Splitting Strategies

B. Load balancing Schemes

C. All of above

Answer : C
List the communication strategies for parallel BFS.

A. Random communication strategy

B. Ring communication strategy

C. Blackboard communication strategy

D. All of above

Answer : D

The lower bound on any comparison-based sort of n numbers is Θ(n log n)


A. True
B. False
Answer : A

In a compare-split operation
A. Each process sends its block of size n/p to the other process
B. Each process merges the received block with its own block and retains only the
appropriate half of the merged block
C. Both A & B
Answer : C

In a typical sorting network


A. Every sorting network is made up of a series of columns
B. Each column contains a number of comparators connected in parallel
C. Both A & B
Answer : C

Bubble sort is difficult to parallelize since the algorithm has no concurrency


A. True
B. False
Answer : A
What are the sources of overhead?

A. Essential /Excess Computation


B. Inter-process Communication
C. Idling
D. All above

Answer : D

Which are the performance metrics for parallel systems?

A. Execution Time
B. Total Parallel Overhead
C. Speedup
D. Efficiency
E. Cost
F. All above

Answer : F

The efficiency of a parallel program can be written as: E = TS / (p TP). True or False?

A. True
B. False

Answer : A

Overhead function or total overhead of a parallel system as the total time collectively
spent by all the processing elements over and above that required by the fastest known
sequential algorithm for solving the same problem on a single processing element.
True or False?
A. True
B. False
Answer : A

What is Speedup?
A. A measure that captures the relative benefit of solving a problem in parallel. It is defined as the
ratio of the time taken to solve a problem on a single processing element to the time required to
solve the same problem on a parallel computer with p identical processing elements.
B. A measure of the fraction of time for which a processing element is usefully
employed.
C. None of the above
Answer : A

In an ideal parallel system, speedup is equal to p and efficiency is equal to one. True or
False?
A. True
B. False
Answer : A
A parallel system is said to be ________________ if the cost of solving a problem on a
parallel computer has the same asymptotic growth (in Θ terms) as a function of the input
size as the fastest-known sequential algorithm on a single processing element.
A. Cost optimal
B. Non Cost optimal
Answer : A

Using fewer than the maximum possible number of processing elements to execute a
parallel algorithm is called ______________ a parallel system in terms of the number of
processing elements.

A. Scaling down
B. Scaling up
Answer : A

The __________________ function determines the ease with which a parallel system can
maintain a constant efficiency and hence achieve speedups increasing in proportion to the
number of processing elements.
A. Isoefficiency
B. Efficiency
C. Scalability
D. Total overhead
Answer : A

Minimum execution time for adding n numbers is TP = n/p + 2 log p. True or False?
A. True
B. False
Answer : A
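
As a worked instance of this formula (illustrative numbers): for n = 1024 and
p = 32, TP = 1024/32 + 2 log2 32 = 32 + 10 = 42 time units, versus the
n - 1 = 1023 steps of a purely sequential sum.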

The overhead function To = pTP − TS.


A. True
B. False
Answer : A

Performance Metrics for Parallel Systems: Speedup S = TS / TP


A. True
B. False
Answer : A
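
Tying these metrics together with illustrative numbers: if TS = 100 and p = 8
processing elements give TP = 20, then speedup S = TS/TP = 5, efficiency
E = S/p = TS/(p TP) = 0.625, and overhead TO = p TP - TS = 60; the ideal case
S = p = 8 with E = 1 would need TP = 12.5.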

Matrix-vector multiplication with 2-D partitioning requires some basic communication operations


A. one-to-one communication to align the vector along the main diagonal
B. one-to-all broadcast of each vector element among the n processes of each column
C. all-to-one reduction in each row
D. All Above
Answer : D
HPC MCQ QB for Mock Insem Examination

Unit I
1. Conventional architectures coarsely comprise a_

A. A processor
B. Memory system
C Data path.
D All of Above

2. Data intensive applications utilize_

A High aggregate throughput


B High aggregate network bandwidth
C High processing and memory system performance.
D None of above

3. A pipeline is like_

A Overlaps various stages of instruction execution to achieve performance.


B House pipeline
C Both a and b
D A gas line

4. Scheduling of instructions is determined by_

A True Data Dependency


B Resource Dependency
C Branch Dependency
D All of above

5. VLIW processors rely on_

A Compile time analysis


B Initial time analysis
C Final time analysis
D Mid time analysis

6. Memory system performance is largely captured by_

A Latency
B Bandwidth
C Both a and b
D none of above

7. The fraction of data references satisfied by the cache is called_


A Cache hit ratio
B Cache fit ratio
C Cache best ratio
D none of above

8. A single control unit that dispatches the same Instruction to various processors is__

A SIMD
B SPMD
C MIMD
D None of above

9. The primary forms of data exchange between parallel tasks are_

A Accessing a shared data space


B Exchanging messages.
C Both A and B
D None of Above

10. Switches map a fixed number of inputs to outputs.


A True
B False

Unit 2
1. The First step in developing a parallel algorithm is_

A. To Decompose the problem into tasks that can be executed concurrently


B. Execute directly
C. Execute indirectly
D. None of Above

2. The number of tasks into which a problem is decomposed determines its_

A. Granularity
B. Priority
C. Modernity
D. None of above

3. The length of the longest path in a task dependency graph is called_


A. the critical path length
B. the critical data length
C. the critical bit length
D. None of above

4. The graph of tasks (nodes) and their interactions/data exchange (edges)_


A. Is referred to as a task interaction graph
B. Is referred to as a task Communication graph
C. Is referred to as a task interface graph
D. None of Above

5. Mappings are determined by_

A. task dependency
B. task interaction graphs
C. Both A and B
D. None of Above

6. Decomposition Techniques are_


A. recursive decomposition
B. data decomposition
C. exploratory decomposition
D. speculative decomposition
E. All of Above

7. The Owner Computes Rule generally states that the process assigned a particular data
item is responsible for_

A. All computation associated with it


B. Only one computation
C. Only two computation
D. Only occasionally computation

8. A simple application of exploratory decomposition is_

A. The solution to a 15 puzzle


B. The solution to 20 puzzle
C. The solution to any puzzle
D. None of Above

9. Speculative Decomposition consist of _

A. conservative approaches
B. optimistic approaches
C. Both A and B
D. Only B

10. task characteristics include:


A. Task generation.
B. Task sizes.
C. Size of data associated with tasks.
D. All of Above
Unit 3

1. Group communication operations are built using point-to-point messaging primitives


A. True
B. False

2. Communicating a message of size m over an uncongested network takes time ts + tw·m

A. True
B. False

3. The dual of one-to-all broadcast is_

A. All-to-one reduction
B. All-to-one receiver
C. All-to-one Sum
D. None of Above

4. A hypercube has_

A. 2^d nodes
B. 2d nodes
C. 2n nodes
D. N nodes

5. A binary tree in which processors are (logically) at the leaves and internal nodes are
routing nodes.

A. True
B. False

6. In All-to-All Broadcast each processor is the source as well as destination.

A. True
B. False

7. The Prefix Sum Operation can be implemented using the_

A. All-to-all broadcast kernel.


B. All-to-one broadcast kernel.
C. One-to-all broadcast Kernel
D. Scatter Kernel

8. In the scatter operation_


A. Single node send a unique message of size m to every other node
B. Single node send a same message of size m to every other node
C. Single node send a unique message of size m to next node
D. None of Above

9. The gather operation is exactly the inverse of the_

A. Scatter operation
B. Broadcast operation
C. Prefix Sum
D. Reduction operation
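
A sketch of the scatter/gather pair in MPI terms (an assumed library choice;
the unit itself describes the operations abstractly):

/* scatter_gather.c - compile with mpicc */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int rank, p, chunk[4], *root_buf = NULL;           /* m = 4 words each   */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    if (rank == 0) {                                   /* root owns p*m data */
        root_buf = malloc(4 * p * sizeof(int));
        for (int i = 0; i < 4 * p; ++i) root_buf[i] = i;
    }
    /* scatter: a unique m-word piece of root_buf goes to each process */
    MPI_Scatter(root_buf, 4, MPI_INT, chunk, 4, MPI_INT, 0, MPI_COMM_WORLD);
    for (int i = 0; i < 4; ++i) chunk[i] += rank;      /* stand-in for work  */
    /* gather: the exact inverse, reassembling the pieces at the root */
    MPI_Gather(chunk, 4, MPI_INT, root_buf, 4, MPI_INT, 0, MPI_COMM_WORLD);
    free(root_buf);
    MPI_Finalize();
    return 0;
}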

10. In All-to-All Personalized Communication Each node has a distinct message of size m
for every other node

A. True
B. False

1. It is ___________ strength and ___________ permeability.


a) High, high
b) Low, low
c) High, low
d) Low, high
Answer: c
Explanation: It is specifically chosen so as to have particularly appropriate properties for the
expected use of the structure such as high strength and low permeability.

2. High Performance concrete works out to be economical.


a) True
b) False
Answer: a
Explanation: High Performance concrete works out to be economical, even though its initial
cost is high.
3. HPC is not used in high span bridges.
a) True
b) False
Answer: b
Explanation: Major applications of high-performance concrete in the field of Civil
Engineering constructions have been in the areas of long-span bridges, high-rise buildings
or structures, highway pavements, etc.

4. Concrete having 28-days' compressive strength in the range of 60 to 100 MPa.
a) HPC
b) VHPC
c) OPC
d) HSC
Answer: a
Explanation: High Performance Concrete having 28- days’ compressive strength in the
range of 60 to 100 MPa.
5. Concrete having 28-days compressive strength in the range of 100 to 150 MPa.
a) HPC
b) VHPC
c) OPC
d) HSC
Answer: b
Explanation: Very high performing Concrete having 28-days compressive strength in the
range of 100 to 150 MPa.


6. High-Performance Concrete is ____________ as compared to Normal Strength
Concrete.
a) Less brittle
b) Brittle
c) More brittle
d) Highly ductile
Answer: c
Explanation: High-Performance Concrete is more brittle as compared to Normal Strength
Concrete (NSC), especially when high strength is the main criteria.

7. The choice of cement for high-strength concrete should not be based only on mortar-
cube tests but it should also include tests of compressive strengths of concrete at
___________ days.
a) 28, 56, 91
b) 28, 60, 90
c) 30, 60, 90
d) 30, 45, 60
Answer: a
Explanation: The choice of cement for high-strength concrete should not be based only on
mortar-cube tests but it should also include tests of compressive strengths of concrete at
28, 56, and 91 days.

8. For high-strength concrete, a cement should produce a minimum 7-days mortar-cube
strength of approximately ___ MPa.
a) 10
b) 20
c) 30
d) 40
Answer: c
Explanation: For high-strength concrete, a cement should produce a minimum 7-days
mortar-cube strength of approximately 30 MPa.

9. ____________ mm nominal maximum size aggregates gives optimum strength.


a) 9.5 and 10.5
b) 10.5 and 12.5
c) 9.5 and 12.5
d) 11.5 and 12.5
Answer: c
Explanation: Many studies have found that 9.5 mm to 12.5 mm nominal maximum size
aggregates gives optimum strength.

10. Due to low w/c ratio _____________


a) It doesn’t cause any problems
b) It causes problems
c) Workability is easy
d) Strength is more
Answer: b
Explanation: Due to the low w/c ratio, it causes problems so superplasticizers are used.
CUDA and GPU architecture question bank (options numbered 1-4; Ans gives the keyed option)

1. Any condition that causes a processor to stall is called as _____. 1) Hazard 2) Page fault 3) System error 4) None of the above. Ans: 1
2. The time lost due to branch instruction is often referred to as _____. 1) Latency 2) Delay 3) Branch penalty 4) None of the above. Ans: 3
3. _____ method is used in centralized systems to perform out-of-order execution. 1) Scorecard 2) Score boarding 3) Optimizing 4) Redundancy. Ans: 2
4. The computer cluster architecture emerged as an alternative for ____. 1) ISA 2) Workstation 3) Supercomputer 4) Distributed systems. Ans: 3
5. NVIDIA CUDA Warp is made up of how many threads? 1) 512 2) 1024 3) 312 4) 32. Ans: 4
6. Out-of-order execution of instructions is not possible on GPUs. 1) True 2) False. Ans: 2
7. CUDA supports programming in: 1) C or C++ only 2) Java, Python 3) C, C++ and third-party wrappers 4) Pascal. Ans: 3
8. FADD, FMAD, FMIN, FMAX are ----- supported by Scalar Processors of NVIDIA GPU. 1) 32-bit IEEE floating point instructions 2) 32-bit integer instructions 3) both 4) none of the above. Ans: 1
9. Each streaming multiprocessor (SM) of CUDA hardware has ------ scalar processors. 1) 1024 2) 128 3) 512 4) 8. Ans: 4
10. Each NVIDIA GPU has ------ Streaming Multiprocessors. 1) 8 2) 1024 3) 512 4) 16. Ans: 4
11. CUDA provides ------- warp and thread scheduling; the overhead of thread creation is ------. 1) "programming-overhead", ... 2) "zero-overhead", ... 3) ... 4) ... Ans: 2
12. Each warp of a GPU receives a single instruction and "broadcasts" it to all of its threads; it is a ---- operation. 1) SIMD (Single Instruction Multiple Data) 2) SIMT (Single Instruction Multiple Thread) 3) SISD (Single Instruction Single Data) 4) SIST (Single Instruction Single Thread). Ans: 2
13. Limitations of CUDA Kernel: 1) recursion, call... 2) No recursion... 3) recursion, no... 4) No recursion... Ans: 2
14. What is Unified Virtual Machine? (all four options begin "It is a technique...") Ans: 1
15. _______ became the first language specifically designed by a GPU company to facilitate general-purpose computing on GPUs. 1) Python 2) C 3) CUDA C 4) Java. Ans: 3
16. The CUDA architecture consists of --------- for parallel computing kernels and functions. 1) RISC instruction set 2) CISC instruction set 3) ZISC instruction set 4) PTX instruction set. Ans: 4
17. CUDA stands for --------, designed by NVIDIA. 1) Common Uni... 2) Complex Unid... 3) Compute Unified Device Architecture 4) Complex Unstructured... Ans: 3
18. The host processor spawns multithread tasks (or kernels as they are known in CUDA). 1) True 2) False. Ans: 1
19. The NVIDIA G80 is a ---- CUDA core device, the NVIDIA G200 is a ---- CUDA core device, and the NVIDIA Fermi is a ---- CUDA core device. 1) 128, 240, 512 2) 32, 64, 128 3) 64, 128, 256 4) 256, 512, 1024. Ans: 1
20. NVIDIA 8-series GPUs offer -------- . 1) 50-200 GFLOPS 2) 200-400 GFLOPS 3) 400-800 GFLOPS 4) 800-1000 GFLOPS. Ans: 1
21. IADD, IMUL24, IMAD24, IMIN, IMAX are ----------- supported by Scalar Processors of NVIDIA GPU. 1) 32-bit IEEE floating point instructions 2) 32-bit integer instructions 3) both 4) none of the above. Ans: 2
22. CUDA Hardware programming model supports: a) fully generally data-parallel architecture; b) ... ; c) ... ; d) ... ; e) ... ; f) ... 1) a,c,d,f 2) b,c,d,e 3) a,d,e,f 4) a,b,c,d,e,f. Ans: 4
23. In the CUDA memory model the following memory types are available: a) Registers; b) ... ; c) ... ; d) ... ; e) ... ; f) ... 1) a, b, d, f 2) a, c, d, e, f 3) a, b, c, d, e, f 4) b, c, e, f. Ans: 3
24. What is the equivalent of the general C program int main(void) { printf("... with CUDA C? 1) int main( vo... 2) __global__ v... 3) __global__ v... 4) __global__ in... Ans: 2
25. Which function runs on the Device (i.e. GPU)? a) __global__ void kernel(void) { } b) int m... 1) a 2) b 3) both a, b. Ans: 1
26. A simple kernel for adding two integers, __global__ void add(int *a, int *b, int *c): 1) add() will execute on the device and will be called from the host 2) add() will ex... 3) add() will be... 4) add() will be... Ans: 1
27. If variable a is a host variable and dev_a is a device (GPU) variable, memory on the device is allocated with: 1) cudaMalloc(... 2) malloc( &dev... 3) cudaMalloc((void**)&dev_a, size) 4) malloc( (void... Ans: 3
28. If variable a is a host variable and dev_a is a device (GPU) variable, input is copied from host to device with: 1) memcpy( dev... 2) cudaMemcpy(dev_a, &a, size, cudaMemcpyHostToDevice) 3) memcpy( (vo... 4) cudaMemcpy(... Ans: 2
29. The triple angle brackets mark in a statement inside the main function; what does it indicate? 1) a call from host code to device code 2) a call from device code to host code 3) less than comparison 4) greater than comparison. Ans: 1
30. What makes a CUDA code run in parallel? 1) __global__ 2) main() function 3) kernel name 4) the first parameter inside the angular brackets. Ans: 4
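
Questions 24-30 above all reference the canonical CUDA "add" example; a
complete minimal CUDA C program tying them together (array size illustrative):

/* add.cu - cudaMalloc allocates device memory, cudaMemcpy moves data, and
   the first launch parameter in <<<N, 1>>> creates N parallel blocks. */
#include <stdio.h>

__global__ void add(int *a, int *b, int *c) {
    int i = blockIdx.x;                  /* one block per element here */
    c[i] = a[i] + b[i];
}

int main(void) {
    enum { N = 8 };
    int a[N], b[N], c[N], *dev_a, *dev_b, *dev_c;
    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = 2 * i; }

    cudaMalloc((void **)&dev_a, N * sizeof(int));
    cudaMalloc((void **)&dev_b, N * sizeof(int));
    cudaMalloc((void **)&dev_c, N * sizeof(int));
    cudaMemcpy(dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice);

    add<<<N, 1>>>(dev_a, dev_b, dev_c);  /* N blocks of 1 thread each */

    cudaMemcpy(c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; ++i) printf("%d ", c[i]);
    cudaFree(dev_a); cudaFree(dev_b); cudaFree(dev_c);
    return 0;
}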
marks question A B C D ans
Interconnection Networks Direct Both Static and
0 1 Both Dynamic Static
can be classified as? Network Dynamic.
Parallel Computers are used
Algorithmic Optimization This is an
1 1 to solve which types of Both None
Problems Problems explaination.
problems.
One clock Is used
How many clocks control
2 1 One Three Four Five to control all the
all the stages in a pipeline?
stages.
Main memory is
Main memory in parallel
3 1 Shared Parallel Fixed None shared in parallel
computing is____?
computing.
Ans- (d)-
Application
Which of these is not a class
Application Distributed Symmetric Multicore checkpoiting. is
4 1 of parallel computing
Checkpointing Computing Multiprocessing Computing not a class of
architetcture?
parallel computer
architecture.
Parallel computing
software
Parallel Computing software Parallel
Automatic Application solutionincludes all
5 1 solutions and Techniques All Programming
Parallelization Checkpointing of the following..
includes: languages.
This is an
explanation
The Processors are The Processors
6 2 connected to the memory Switches Cables Buses Registers are connected
through a set of? thru. the switches.
Superscalar Architetcure
This is an
7 2 has how many execution Two One Three Four
explaination.
units?
What is used to hold the The Intermediate
Intermediate
8 2 intermediate output in a Cache RAM ROM Registers are used
Register
pipeline to hold the output.
International
International Human Genome
Human
Which oranization performs Sequencing and Genome Sequencing for
Genome This is an
9 2 sequencing of Human Consortium for Sequencing and Humans and
Sequencing explaination.
Genome? Human Constrium, Consortium,
and
Genome Org. Org.
Consortium
Ans(c)- Five
There are how many stages
10 2 Five Three Two Six stages are there in
in RISC Processor?
a RISC processor.
The DRAM acess
Over the last decade, The
time rate has
DRAM access time has None of the
11 2 0.1 0.2 0.15 improved at a rate
improved at what rate per above
of 10% over the
year?
last decade.
marks question A B C D ans
Cache acts as low
Which memory acts as low- latency high
12 2 latency high bandwidth Cache Register DRAM EPROM bandwidth storage
storage? .This is an
explanation.
Which processor This is an
13 2 SIMD MIMD MISD MIMD
architecture is this? explaination.
This diagram
Which core processor is
14 2 Quad-Core Dual-Core Octa-Core Single-Core shows Quad-
this?
Core.
Data Caching is
Which of these is not a
15 2 Data Caching Decomposition Simplification Parsimony not a prinicple of
scalable design principle?
scable design.
The distance between any O(1) is the ditance
16 2 two nodes in Bus Based O(1) O(n Logn) O(N) O(n^2) between any two
network is? nodes.
All of these are
Early SIMD computers early staged
17 2 All MPP CM-2 Illiac IV
include: SIMD parallel
computers.
This is called
This is which configuration
18 2 Pass-through Cross-Over Shuffle None Pass-through
in Omega networks.
configuration.
Parallelization
Automatic Parallelization includes parse,
19 2 technique doesn’t Share Memory Analyse Schedule Parse analyse schedule
ncludes: and code
generation.
The P4 processor
The Pentuim 4 or P4
has 20 staged
20 2 processor has how many 20 15 18 10
pipeline. This is an
stage pipeline?
explanation.
Sum, Prioirity and
Which protocol is not used
common are used
21 3 to remove concurrent Identify Priority Common Sum
to remove
writes?
concurrent writes.
Exclusive EREW stands for
Erasable Read Easily Read
Read and Exclusive Read
22 3 EREW PRAM stands for? and Erasable and Easily None
Exclusive and Exclsuive
Write PRAM Write
Write Write PRAM.
Multiple
During each clock cycle,
Instuctiion are
multiple instructions are
23 3 Parallel Series Both a and b None piped in parallel.
piped into the processor
This is an
in________?
explanation.
Multistaged
Which Interconnection Multistage Dynamic
24 3 Cross-Bar Bus-Staged Network uses this
Network uses this equation. Networks Networks
eqn.
marks question A B C D ans
There are
generally four
How many types of parallel types of parallel
computing are available computing,
25 3 from both proprietary and 4 2 3 6 available from
open source parallel both proprietary
computing vendors? and open source
parallel computing
vendors.
If a piece of data
is repeatedly used,
If a piece of data is the effective
repeatedly used, the latency of this
effective latency of this memory system
memory system can be Memory can be reduced by
26 3 Hit Ratio Memory ratio Hit Fraction
reduced by the cache. The Fraction. the cache. The
fraction of data references fraction of data
satisfied by the cache is references
called? satisfied by the
cache is called the
cache hit ratio.
SuperScalar
Superscalar Architetcure Data- Architecture can
27 3 Scheduling Phasing Data Extraction
can create problem in? Compiling cause problems in
CPU scheduling.
In cut-through
In cut-through routing, a routing, a message
28 3 message is broken into fixed Flits Flow Digits Control Digits All is broken into
size units called? fixed size units
called flits.
The total communication
This is an
29 3 time for cut-through routing A B C D
explaination.
is?
The Disadvantage of GPU Load- Process All of the This is an
30 1 Data balancing
Pipeline is? balancing balancing above explaination.
Examples of GPU AMD Both AMD and
31 1 Both NVIDIA None
Processors are: Processors NVIDIA.
Simultaneous
execution of
Simultaneous execution of
Stream different programs
32 1 different programs on a data Data Execution Data-paralleism None
Parallelism on a data stream is
stream is called?
called Stream
Parallelism.
Early GPU controllers were GPU This is an
33 1 Video Shifters GPU Shifters Video-Movers
known as? Controllers Explaination.
Algorithm
_____development is a
development is a
critical component of
34 1 Algorithm Code Pseudocode Problem critical component
problem solving using
of problem solving
computers?
using computers
marks question A B C D ans
Graphics
Graphical Gaming Graph This is an
35 1 GPU stands for? Processsing
Processing Unit Processing Unit Processing Unit Explaination.
Unit
Parallelism leads
naturally to
Concurrency. For
Serial
36 1 What leads to concurrency? Parallelism Decomposition All example, Several
Processing
processes trying to
print a file on a
single printer.
Rasterization is the
process of
The process of determining
Space- determining which
which screen-space pixel
37 2 Rasterization Pixelisation Fragmentation Determining screen-space pixel
locations are covered by
Process locations are
each\ntriangle is known as?
covered by
each\ntriangle.
The
programmable
units of the GPU
The programmable units of
follow a single
38 2 the GPU follow which SPMD MISD MIMD SIMD
program multiple-
programming model?
data (SPMD)
programming
model.
Shared Address
Which space can ease the space can ease the
programming effort, programming
especially if the distribution Shared Parallel Series- effort, especially if
39 2 Data- Address
of data is different in Address Address Address the distribution of
different phases of the data is different in
algorithm? different phases of
the algorithm.
Processors are the
Which are the hardware hardware units
40 2 units that physically perform Processsor ALU CPU CU that physically
computations? perform
computations
All of the these are
Examples of Graphics API
41 2 All DirectX CUDA Open-CL examples of
are?
Graphics API
The mechanism by
The mechanism by which which tasks are
tasks are assigned to assigned to
42 2 Mapping Computation Process None
processes for execution is processes for
called___? execution is called
mapping.
marks question A B C D ans
A decomposition
A decomposition into a into a large
large number of small tasks number of small
43 2 Fine- grained Coarse-grained Vector-granied All
is called__________ tasks is called
granularity. fine-grained
granularity.
Identical
operations being
Identical operations being
applied
applied concurrently on Data-
44 2 Parallelism Data Serialsm Concurrency concurrently on
different data items is Parallelism
different data
called?
items is called
Data Parallelism.
System which do not have
This is the
45 2 parallel processsing SISD SIMD MISD MIMD
explainantion.
capabiities?
The time and the
The time and the location in location in the
the program of a static one- program of a static
46 2 Priori Polling Decomposition Execution
way interaction is known as one-way
? interaction is
known a priori.
Memory access in RISC
CALL and MOV and This is the
47 2 architecture is limited to STA and LDA Push and POP
RET JMP explaination.
which instructions?
Data Parallel
Which Algorithms can be algorithms can be
implemented in both shared- implemented in
Data-Parallel Quick-Sort Bubble Sort
48 2 address-space and Data Algorithm both shared-
Algo. Algo. Algo.
message-passing address-space
paradigms? and message-
passing paradigms
Randomized This figure shows
Which type of Distribution is Block-Cyclic Cyclic
49 2 Block None Randomized
this? Distribution Distribution
Distribution Block Distribution.
An abstraction
used to express
An abstraction used to such dependencies
express such dependencies Task- Time- among tasks and
Dependency
50 2 among tasks and their Dependency Dependency None their relative order
Graph.
relative order of execution is Graph. Graph of execution is
known as__________? known as a task-
dependency
graph.
51. (3 marks) Which is the simplest way to distribute an array and assign uniform contiguous portions of the array to different processes?
A. Block Distribution
B. Array Distribution
C. Process Distribution
D. All
ANSWER: A

52. (3 marks) An example of a decomposition with a regular interaction pattern is?
A. Image-dithering problem
B. 8-Queens problem
C. Travelling Salesman Problem
D. Time-complexity problems
ANSWER: A

53. (3 marks) A feature of a task-dependency graph that determines the average degree of concurrency for a given granularity is its?
A. Critical path
B. Process path
C. Granularity
D. Concurrency
ANSWER: A

54. (3 marks) The shared-address-space programming paradigms can handle which interactions?
A. Both
B. Two-way
C. One-way
D. None
ANSWER: A (both one-way and two-way interactions)

55. (3 marks) Which distribution can result in an almost perfect load balance due to the extreme fine-grained underlying decomposition?
A. Cyclic Distribution
B. Array Distribution
C. Block-Cyclic Distribution
D. Block Distribution
ANSWER: A

56. (3 marks) Data sharing interactions can be categorized as __________ interactions?
A. Both
B. Read-Write
C. Read-only
D. None
ANSWER: A (either read-only or read-write)
57. (3 marks) What is the way of structuring a parallel algorithm by selecting a decomposition and mapping technique and applying the appropriate strategy to minimize interactions called?
A. Algorithm Model
B. Parallel Model
C. Mapping Model
D. Data Model
ANSWER: A

58. (3 marks) Which algorithm is shown in the figure? (figure not reproduced)
A. Serial column-based algorithm
B. Column algorithm
C. Bubble sort algorithm
D. None
ANSWER: A

59. (3 marks) Algorithms based on the task graph model include:
A. Matrix factorization
B. Parallel quicksort
C. All
D. Quicksort
ANSWER: C

60. (1 mark) Which model permits simultaneous communication on all the channels connected to a node?
A. All-port communication
B. One-port communication
C. Dual-port communication
D. Quad-port communication
ANSWER: A

61. (1 mark) A process sends the same m-word message to every other process, but different processes may broadcast different messages. It is called?
A. All-to-All Broadcast
B. One-to-All Broadcast
C. All-to-All Reduction
D. None
ANSWER: A

62. (1 mark) The matrix is transposed using which operation?
A. All-to-all personalized communication
B. One-to-all personalized communication
C. All-to-one personalized communication
D. One-to-one personalized communication
ANSWER: A

63. (1 mark) Each node in a two-dimensional wraparound mesh has how many ports?
A. Four
B. Two
C. Three
D. One
ANSWER: A

64. (1 mark) Circular shift is a member of a broader class of global communication operations known as?
A. Permutation
B. Combination
C. Both A and B
D. None
ANSWER: A

65. (1 mark) We define_______ as the operation in which node i sends a data packet to node (i + q) mod p in a p-node ensemble (0 < q < p).
A. Linear shift
B. Circular q-shift
C. Circular shift
D. Linear q-shift
ANSWER: B
66. (1 mark) Parallel algorithms often require a single process to send identical data to all other processes or to a subset of them. This operation is known as?
A. One-to-All Broadcast
B. One-to-One Broadcast
C. All-to-One Broadcast
D. None
ANSWER: A

67. (1 mark) In which communication does each node send a distinct message of size m to every other node?
A. All-to-all personalized communication
B. One-to-one personalized communication
C. All-to-one personalized communication
D. One-to-all personalized communication
ANSWER: A

68. (1 mark) The all-to-all personalized communication operation is not used in which of these parallel algorithms?
A. Matrix transpose
B. Fourier transformation
C. Database join operation
D. Quicksort
ANSWER: D

69. (1 mark) The dual of one-to-all broadcast is?
A. All-to-One Reduction
B. All-to-One Broadcast
C. One-to-Many Reduction
D. All-to-All Broadcast
ANSWER: A

70. (1 mark) Reduction on a linear array can be performed by_______ the direction and the sequence of communication?
A. Reversing
B. Forwarding
C. Escaping
D. Widening
ANSWER: A

71. (2 marks) The given equation (not reproduced) is used to solve which topology's operations in all-to-all communications?
A. Hypercube
B. Mesh
C. Ring
D. Linear Array
ANSWER: A

72. (2 marks) The communication pattern of all-to-all broadcast can be used to perform________?
A. Third variation of reduction
B. Second variation of reduction
C. First variation of reduction
D. Fifth variation of reduction
ANSWER: A

73. (2 marks) A single node sends a unique message of size m to every other node. This operation is known as______?
A. Scatter
B. Reduction
C. Gather
D. Concatenate
ANSWER: A

74. (2 marks) The algorithm (not reproduced) represents which operation?
A. All-to-All Broadcast
B. All-to-All Broadcast
C. All-to-All Reduction
D. One-to-One Reduction
ANSWER: A

75. (2 marks) The message can be broadcast in how many steps?
A. log p
B. log(p^2)
C. One
D. sin(p)
ANSWER: A
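As a worked illustration of the log p bound in Question 75, here is a minimal C++ sketch (not from the source; node ids and the XOR partnering are the standard recursive-doubling scheme on a hypercube) that simulates a one-to-all broadcast on p = 8 nodes:

#include <cstdio>

int main() {
    const int p = 8;              // p-node ensemble; log2(8) = 3 steps
    bool has_msg[p] = {false};
    has_msg[0] = true;            // node 0 is the broadcast source
    for (int step = 0; (1 << step) < p; ++step) {
        bool snapshot[p];
        for (int i = 0; i < p; ++i) snapshot[i] = has_msg[i];
        // every node that already holds the message forwards it to the
        // neighbor whose id differs in bit 'step'
        for (int node = 0; node < p; ++node)
            if (snapshot[node]) has_msg[node ^ (1 << step)] = true;
        int count = 0;
        for (int i = 0; i < p; ++i) count += has_msg[i];
        printf("after step %d: %d of %d nodes have the message\n",
               step + 1, count, p);
    }
    return 0;
}

Running it prints 2, 4 and 8 informed nodes after steps 1, 2 and 3, matching the log p = 3 bound.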
76. (2 marks) The given equation (not reproduced) is used to solve which operation?
A. All-to-all personalized communication
B. One-to-all personalized communication
C. One-to-one personalized communication
D. All-to-one personalized communication
ANSWER: A

77. (2 marks) There are how many computations for n^2 words of data transferred among the nodes?
A. n^3
B. tan n
C. e^n
D. log n
ANSWER: A

78. (2 marks) The scatter operation is also known as?
A. One-to-all personalized communication
B. One-to-one personalized communication
C. All-to-one personalized communication
D. All-to-all personalized communication
ANSWER: A

79. (2 marks) A hypercube with 2^d nodes can be regarded as a d-dimensional mesh with____ nodes in each dimension.
A. Two
B. One
C. Three
D. Four
ANSWER: A

80. (2 marks) One-to-all broadcast and all-to-one reduction are used in several important parallel algorithms including?
A. Gaussian elimination
B. Shortest-path algorithms
C. Matrix-vector multiplication
D. All
ANSWER: D

81. (2 marks) Each node of the distributed-memory parallel computer is a______ shared-memory multiprocessor.
A. NUMA
B. UMA
C. CCMA
D. None
ANSWER: A

82. (2 marks) To perform a q-shift, we expand q as a sum of distinct powers of______?
A. 2
B. 3
C. e
D. log p
ANSWER: A
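A small C++ sketch tying together Questions 65 and 82 (the sample values q = 5 and p = 8 are assumptions, not from the source): it prints the destination (i + q) mod p of each node in a circular q-shift, and expands q into distinct powers of 2, which is how the hypercube implementation decomposes the shift:

#include <cstdio>

int main() {
    const int p = 8;      // number of nodes in the ensemble
    const int q = 5;      // shift amount, 0 < q < p
    // Destination of each node i in a circular q-shift
    for (int i = 0; i < p; ++i)
        printf("node %d -> node %d\n", i, (i + q) % p);
    // Expand q as a sum of distinct powers of 2 (its binary expansion),
    // the basis of the hypercube implementation of the shift
    for (int bit = 0; (1 << bit) <= q; ++bit)
        if (q & (1 << bit))
            printf("q includes 2^%d = %d\n", bit, 1 << bit);
    return 0;
}

For q = 5 the expansion printed is 2^0 + 2^2, i.e. the 5-shift is performed as a 1-shift followed by a 4-shift.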
83. (3 marks) In which implementation of circular shift is the entire row of the data set shifted?
A. Mesh
B. Hypercube
C. Ring
D. Linear
ANSWER: A

84. (3 marks) On a p-node hypercube with all-port communication, the coefficients of tw in the expressions for the communication times of one-to-all and all-to-all broadcast and personalized communication are all smaller than their single-port counterparts by a factor of?
A. log p
B. cos(p)
C. sin(p)
D. e^p
ANSWER: A

85. (3 marks) The given equation (not reproduced) represents which analysis in all-to-all broadcasts?
A. Data Model Analysis
B. Time Analysis
C. Cost Analysis
D. Space-Time Analysis
ANSWER: C

86. (3 marks) On a p-node hypercube, the size of each message exchanged in the i-th of the log p steps is? (the four options were equations in the original and are not reproduced; for all-to-all broadcast on a hypercube the standard result is 2^(i-1) m)
ANSWER: A

87. (3 marks) Which broadcast is applied on this 3-D hypercube? (figure not reproduced)
A. One-to-All Broadcast
B. One-to-One Broadcast
C. All-to-One Broadcast
D. All-to-One Reduction
ANSWER: A

88. (3 marks) The given equation (not reproduced) represents which analysis in one-to-all broadcasts?
A. Cost Analysis
B. Time Analysis
C. Data Analysis
D. Space Analysis
ANSWER: A

89. (3 marks) The time for circular shift on a hypercube can be improved by almost a factor of ______ for large messages.
A. log p
B. cos(p)
C. e^p
D. sin p
ANSWER: A
90. (1 mark) The execution time of a parallel algorithm doesn't depend upon?
A. Processor
B. Input size
C. Relative computation speed
D. Communication speed
ANSWER: A (the execution time depends on the input size, the number of processing elements used, and their relative computation and interprocess communication speeds)

91. (1 mark) Processing elements in a parallel system may become idle due to many reasons such as:
A. Load imbalance
B. Synchronization
C. Both
D. The processing element doesn't become idle
ANSWER: C (both synchronization and load imbalance)

92. (1 mark) If the scaled-speedup curve is close to linear with respect to the number of processing elements, then the parallel system is considered?
A. Scalable
B. Iso-scalable
C. Non-scalable
D. Scale-efficient
ANSWER: A

93. (1 mark) Which system is the combination of an algorithm and the parallel architecture on which it is implemented?
A. Parallel System
B. Data System
C. Parallel Architecture System
D. Series System
ANSWER: A

94. (1 mark) What is defined as the speedup obtained when the problem size is increased linearly with the number of processing elements?
A. Scaled Speedup
B. Unscalable Speedup
C. Superlinear Speedup
D. Isoefficiency Speedup
ANSWER: A

95. (1 mark) The maximum number of tasks that can be executed simultaneously at any time in a parallel algorithm is called its degree of__________.
A. Concurrency
B. Parallelism
C. Linearity
D. Execution
ANSWER: A

96. (1 mark) The isoefficiency due to concurrency in 2-D partitioning is:
A. O(p)
B. O(n log p)
C. O(1)
D. O(n^2)
ANSWER: A
97. (2 marks) The total time collectively spent by all the processing elements over and above that required by the fastest known sequential algorithm for solving the same problem on a single processing element is known as?
A. Total Overhead
B. Overhead
C. Parallel Runtime
D. Serial Runtime
ANSWER: A

98. (2 marks) Parallel computations involving matrices and vectors readily lend themselves to data ______________.
A. Decomposition
B. Composition
C. Linearity
D. Parallelism
ANSWER: A

99. (2 marks) Parallel 1-D with pipelining is a___________ algorithm?
A. Synchronous
B. Asynchronous
C. Optimal
D. Cost-optimal
ANSWER: B

100. (2 marks) The serial complexity of matrix-matrix multiplication is:
A. O(n^3)
B. O(n^2)
C. O(n)
D. O(n log n)
ANSWER: A

101. (2 marks) What is the problem size for n x n matrix multiplication?
A. Θ(n^3)
B. Θ(n log n)
C. Θ(n^2)
D. Θ(1)
ANSWER: A

102. (2 marks) The given equation (not reproduced) represents which function?
A. Overhead Function
B. Series Overtime
C. Parallel Overtime
D. Parallel Model
ANSWER: A

103. (2 marks) The efficiency of a parallel program can be written as: (the four options were equations in the original and are not reproduced)
ANSWER: A (elsewhere in this document the formula is given as E = Ts / (p Tp))

104. (2 marks) The total number of steps in the entire pipelined procedure is_______?
A. Θ(n)
B. Θ(n^2)
C. Θ(n^3)
D. Θ(1)
ANSWER: A

105. (2 marks) In Cannon's algorithm, the memory used is?
A. Θ(n^2)
B. Θ(n)
C. Θ(n^3)
D. Θ(n log n)
ANSWER: A

106. (2 marks) Consider the problem of multiplying two n × n dense, square matrices A and B to yield the product matrix C =:
A. A × B
B. A / B
C. A + B
D. A - B
ANSWER: A

107. (2 marks) The serial runtime of multiplying a matrix of dimension n x n with a vector is? (the four options were equations in the original and are not reproduced)
ANSWER: A (Θ(n^2); see Question 119 below: n^2 multiplications and additions)
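A minimal serial C++ sketch (the sample size n = 4 is an assumption) that makes the last few answers concrete by counting multiply-add pairs: matrix-vector multiplication performs n^2 of them and matrix-matrix multiplication n^3:

#include <cstdio>

int main() {
    const int n = 4;
    double A[n][n], B[n][n], C[n][n], x[n], y[n];
    // initialize with arbitrary sample values
    for (int i = 0; i < n; ++i) {
        x[i] = 1.0;
        for (int j = 0; j < n; ++j) { A[i][j] = 1.0; B[i][j] = 2.0; }
    }
    long ops = 0;
    // Matrix-vector product: n^2 multiplications and additions
    for (int i = 0; i < n; ++i) {
        y[i] = 0.0;
        for (int j = 0; j < n; ++j) { y[i] += A[i][j] * x[j]; ++ops; }
    }
    printf("mat-vec multiply-adds: %ld (n^2 = %d)\n", ops, n * n);
    ops = 0;
    // Matrix-matrix product: n^3 multiplications and additions
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            C[i][j] = 0.0;
            for (int k = 0; k < n; ++k) { C[i][j] += A[i][k] * B[k][j]; ++ops; }
        }
    printf("mat-mat multiply-adds: %ld (n^3 = %d)\n", ops, n * n * n);
    return 0;
}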
108. (2 marks) ________ is a measure of the fraction of time for which a processing element is usefully employed.
A. Efficiency
B. Overtime Function
C. Linearity
D. Superlinearity
ANSWER: A

109. (2 marks) The work performed by a serial algorithm is greater than its parallel formulation, or hardware features put the serial implementation at a disadvantage. This phenomenon is known as?
A. Superlinear Speedup
B. Linear Speedup
C. Performance Metrics
D. Super Linearity
ANSWER: A

110. (3 marks) The all-to-all broadcast and the computation of y[i] both take time?
A. Θ(n)
B. Θ(n log n)
C. Θ(n^2)
D. Θ(n^3)
ANSWER: A
111. (3 marks) If virtual processing elements are mapped appropriately onto physical processing elements, the overall communication time does not grow by more than a factor of?
A. n/p
B. p/n
C. n + p
D. n * p
ANSWER: A

112. (3 marks) Parallel execution time can be expressed as a function of problem size, overhead function, and the number of processing elements. The formed equation is: (the four options were equations in the original and are not reproduced; in standard texts, TP = (W + To(W, p)) / p)
ANSWER: A

113. (3 marks) In 2-D partitioning, the first alignment takes time =?
A. ts - tw·n/√p
B. ts * tw·n/√p
C. ts / (tw·n·√p)
D. ts + tw·n/√p
ANSWER: D

114. (3 marks) Using fewer than the maximum possible number of processing elements to execute a parallel algorithm is called________?
A. Scaling down
B. Scaling up
C. Scaling
D. Simulation
ANSWER: A

115. (3 marks) Which of the following is a drawback of matrix-matrix multiplication?
A. Memory optimal
B. Efficient
C. Time-bound
D. Complex
ANSWER: A

116. (3 marks) Consider the problem of sorting 1024 numbers (n = 1024, log n = 10) on 32 processing elements. The speedup expected is?
A. p/log n
B. p * log n
C. p + log n
D. n * log p
ANSWER: A

117. (3 marks) Consider the problem of adding n numbers on p processing elements such that p < n and both n and p are powers of 2. The overall parallel execution time of the problem is:
A. Θ((n/p) log p)
B. Θ((n*p) log p)
C. Θ((p/n) log p)
D. Θ(n log p)
ANSWER: A

118. (3 marks) The DNS algorithm has____ runtime?
A. Ω(n)
B. Ω(n^2)
C. Ω(n^3)
D. Ω(log n)
ANSWER: A
119. (3 marks) The serial algorithm requires______ multiplications and additions in matrix-vector multiplication.
A. n^3
B. n^2
C. log n
D. n log n
ANSWER: B

120. (1 mark) The time required to merge two sorted blocks of n/p elements is_________?
A. Θ(n/p)
B. Θ(n)
C. Θ(p/n)
D. Θ(n log p)
ANSWER: A

121. (1 mark) In parallel DFS, the stack is split into two equal pieces such that the size of the search space represented by each stack is the same. Such a split is called?
A. Half-split
B. Half-split
C. Parallel-split
D. None
ANSWER: A

122. (1 mark) To avoid sending very small amounts of work, nodes beyond a specified stack depth are not given away. This depth is called the_________depth.
A. Cutoff
B. Breakdown
C. Full
D. Series
ANSWER: A

123. (1 mark) In sequential sorting algorithms, the input and the sorted sequences are stored in which memory?
A. Main memory
B. Process's memory
C. Secondary memory
D. External memory
ANSWER: B

124. (1 mark) Each process sends its block to the other process. Now, each process merges the two sorted blocks and retains only the appropriate half of the merged block. We refer to this operation as?
A. Compare-split
B. Split
C. Compare
D. Exchange
ANSWER: A

125. (1 mark) Each process compares the received element with its own and retains the appropriate element. We refer to this operation as_______?
A. Compare-exchange
B. Exchange
C. Process-exchange
D. All
ANSWER: A
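A minimal C++ sketch of the two operations defined in Questions 124 and 125, with the message exchanges replaced by in-memory swaps (the function names are illustrative, not from the source):

#include <algorithm>
#include <cstdio>
#include <vector>

// Compare-exchange: two processes exchange one element each; the
// lower-ranked process keeps the smaller value, the higher the larger.
void compare_exchange(int &lo, int &hi) {
    if (lo > hi) std::swap(lo, hi);
}

// Compare-split: each process holds a sorted block of n/p elements;
// after merging, the lower-ranked process retains the smaller half of
// the merged block and the higher-ranked process the larger half.
void compare_split(std::vector<int> &lo, std::vector<int> &hi) {
    std::vector<int> merged(lo.size() + hi.size());
    std::merge(lo.begin(), lo.end(), hi.begin(), hi.end(), merged.begin());
    std::copy(merged.begin(), merged.begin() + lo.size(), lo.begin());
    std::copy(merged.begin() + lo.size(), merged.end(), hi.begin());
}

int main() {
    int a = 9, b = 3;
    compare_exchange(a, b);
    printf("compare-exchange: %d %d\n", a, b);   // 3 9

    std::vector<int> p0 = {1, 4, 7}, p1 = {2, 3, 8};
    compare_split(p0, p1);
    printf("compare-split: p0 = %d %d %d, p1 = %d %d %d\n",
           p0[0], p0[1], p0[2], p1[0], p1[1], p1[2]); // p0 = 1 2 3, p1 = 4 7 8
    return 0;
}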
126. (1 mark) Which algorithm maintains the unexpanded nodes in the search graph, ordered according to their l-value?
A. Parallel BFS
B. Parallel DFS
C. Both A and B
D. None
ANSWER: A

127. (1 mark) The critical issue in parallel depth-first search algorithms is the distribution of the search space among the____________?
A. Processors
B. Space
C. Memory
D. Blocks
ANSWER: A

128. (2 marks) Enumeration sort uses how many processes to sort n elements?
A. n^2
B. log n
C. n^3
D. n
ANSWER: A
129. (2 marks) Which sequence is a sequence of elements <a0, a1, ..., an-1> with the property that either (1) there exists an index i, 0 ≤ i ≤ n - 1, such that <a0, ..., ai> is monotonically increasing and <ai+1, ..., an-1> is monotonically decreasing, or (2) there exists a cyclic shift of indices so that (1) is satisfied?
A. Bitonic sequence
B. Acyclic sequence
C. Asymptotic sequence
D. Cyclic sequence
ANSWER: A
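A small C++ sketch (the helper is_bitonic is hypothetical, not from the source) that tests the definition in Question 129; for distinct elements, a sequence is bitonic exactly when there are at most two direction changes around the cyclic sequence:

#include <cstdio>
#include <vector>

// A sequence is bitonic if some cyclic shift of it first increases
// monotonically and then decreases. For distinct elements this is
// equivalent to at most two direction changes around the cycle.
bool is_bitonic(const std::vector<int> &v) {
    int n = (int)v.size(), changes = 0;
    for (int i = 0; i < n; ++i) {
        int cur = v[(i + 1) % n] - v[i];
        int nxt = v[(i + 2) % n] - v[(i + 1) % n];
        if ((cur > 0 && nxt < 0) || (cur < 0 && nxt > 0)) ++changes;
    }
    return changes <= 2;
}

int main() {
    std::vector<int> a = {1, 4, 6, 8, 3, 2};  // increases, then decreases
    std::vector<int> b = {1, 3, 2, 4};        // two peaks: not bitonic
    printf("a bitonic: %d\n", (int)is_bitonic(a));  // 1
    printf("b bitonic: %d\n", (int)is_bitonic(b));  // 0
    return 0;
}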
130. (2 marks) To make a substantial improvement over odd-even transposition sort, we need an algorithm that moves elements long distances. Which one of these is such a serial sorting algorithm?
A. Shellsort
B. Linear sort
C. Quicksort
D. Bubble sort
ANSWER: A

131. (2 marks) Quicksort is a_________ algorithm?
A. Divide and conquer
B. Greedy approach
C. Both A and B
D. None
ANSWER: A

132. (2 marks) The_______ transposition algorithm sorts n elements in n phases (n is even), each of which requires n/2 compare-exchange operations.
A. Odd-even
B. Odd
C. Even
D. None
ANSWER: A
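A minimal serial C++ sketch of the odd-even transposition sort from Question 132: n phases, alternating between (even, odd) and (odd, even) index pairs, each phase performing at most n/2 compare-exchange operations:

#include <algorithm>
#include <cstdio>

// Odd-even transposition sort: even phases compare pairs starting at
// even indices, odd phases compare pairs starting at odd indices.
void odd_even_sort(int *a, int n) {
    for (int phase = 0; phase < n; ++phase)
        for (int i = phase % 2; i + 1 < n; i += 2)
            if (a[i] > a[i + 1]) std::swap(a[i], a[i + 1]);
}

int main() {
    int a[] = {5, 2, 9, 1, 7, 3, 8, 4};
    odd_even_sort(a, 8);
    for (int x : a) printf("%d ", x);  // 1 2 3 4 5 7 8 9
    printf("\n");
    return 0;
}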
133. (2 marks) The average time complexity for bucket sort is?
A. O(n + k)
B. O(n log(n + k))
C. O(n^3)
D. Θ(n^2)
ANSWER: A

134. (2 marks) A popular serial algorithm for sorting an array of n elements whose values are uniformly distributed over an interval [a, b] is which algorithm?
A. Bucket sort
B. Quicksort
C. Linear sort
D. Bubble sort
ANSWER: A

135. (2 marks) Best-case time complexity of bubble sort is:
A. O(n)
B. O(n^3)
C. O(n log n)
D. O(n^2)
ANSWER: A

136. (2 marks) When more than one process tries to write to the same memory location, only one arbitrarily chosen process is allowed to write, and the remaining writes are ignored. This is called_________ in quicksort.
A. CRCW PRAM
B. PRAM
C. Partitioning
D. CRCW
ANSWER: A

137. (2 marks) Average time complexity of the quicksort algorithm is:
A. O(n log n)
B. O(n)
C. O(n^3)
D. Θ(n^2)
ANSWER: A

138. (2 marks) The isoefficiency function of Global Round Robin (GRR) is:
A. O(p^2 log p)
B. O(p log p)
C. O(log p)
D. O(p^2)
ANSWER: A

139. (2 marks) A_____ is a device with two inputs x and y and two outputs x' and y' in a sorting network.
A. Comparator
B. Router
C. Separator
D. Switch
ANSWER: A

140. (2 marks) If T is a DFS tree in G, then the parallel implementation of the algorithm runs in ______________ time complexity.
A. O(t)
B. O(t log n)
C. O(log t)
D. O(1)
ANSWER: A

141. (2 marks) In the quest for fast sorting methods, a number of networks have been designed that sort n elements in time significantly smaller than___?
A. Θ(n log n)
B. Θ(n)
C. Θ(1)
D. Θ(n^2)
ANSWER: A

142. (2 marks) The average value of the search overhead factor in parallel DFS is less than______?
A. One
B. Two
C. Three
D. Four
ANSWER: A

143. (3 marks) Parallel runtime for the ring architecture in a bitonic sort is:
A. Θ(n)
B. Θ(n log n)
C. Θ(n^2)
D. Θ(n^3)
ANSWER: A
144. (3 marks) The sequential complexity of the odd-even transposition algorithm is:
A. Θ(n^2)
B. Θ(n log n)
C. Θ(n^3)
D. Θ(n)
ANSWER: A

145. (3 marks) The algorithm (not reproduced) represents which bubble sort:
A. Sequential bubble sort
B. Circular bubble sort
C. Simple bubble sort
D. Linear bubble sort
ANSWER: A

146. (3 marks) Enumeration sort uses how much time to sort n elements?
A. Θ(1)
B. Θ(n log n)
C. Θ(n^2)
D. Θ(n)
ANSWER: A

147. (3 marks) The______ algorithm relies on the binary representation of the elements to be sorted.
A. Radix sort
B. Bubble sort
C. Quicksort
D. Bucket sort
ANSWER: A

148. (3 marks) Parallel runtime for the mesh architecture in a bitonic sort is:
A. Θ(n/log n)
B. Θ(n)
C. Θ(n^2)
D. Θ(n^3)
ANSWER: A

149. (1 mark) The number of threads in a thread block is limited by the architecture to a total of how many threads per block?
A. 512
B. 502
C. 510
D. 412
ANSWER: A

150. (1 mark) CUDA architecture is mainly provided by which company?
A. NVIDIA
B. Intel
C. Apple
D. IBM
ANSWER: A

151. (1 mark) In CUDA architecture, what are subprograms called?
A. Kernels
B. Grids
C. Elements
D. Blocks
ANSWER: A

152. (1 mark) What is the full form of CUDA?
A. Compute Unified Device Architecture
B. Computer Unified Device Architecture
C. Common USB Device Architecture
D. Common Unified Disk Architecture
ANSWER: A

153. (2 marks) Which of these is not an application of CUDA architecture?
A. Thermodynamics
B. Neural networks
C. VLSI simulation
D. Fluid dynamics
ANSWER: A

154. (2 marks) CUDA programming is especially well-suited to address problems that can be expressed as__________ computations.
A. Data-parallel
B. Task-parallel
C. Both A and B
D. None
ANSWER: A

155. (2 marks) CUDA C/C++ uses which keyword in programming:
A. global
B. kernel
C. Cuda_void
D. nvcc
ANSWER: A

156. (2 marks) CUDA programs are saved with_____ extension.
A. .cd
B. .cx
C. .cc
D. .cu
ANSWER: D
157. (2 marks) The Kepler K20X chip block diagram contains____ streaming multiprocessors (SMs).
A. 15
B. 8
C. 16
D. 7
ANSWER: A

158. (2 marks) The Kepler K20X architecture increases the register file size to:
A. 64K
B. 32K
C. 128K
D. 256K
ANSWER: A

159. (2 marks) The register file in a GPU is of what size?
A. 2 MB
B. 1 MB
C. 3 MB
D. 1024 B
ANSWER: A

160. (2 marks) NVIDIA's GPU computing platform is not enabled on which of the following product families:
A. AMD
B. Tegra
C. Quadro
D. Tesla
ANSWER: A

161. (2 marks) Tesla K40 has compute capability of:
A. 3.5
B. 3.2
C. 3.4
D. 3.1
ANSWER: A

162. (2 marks) The SIMD unit creates, manages, schedules and executes_____ threads simultaneously to create a warp.
A. 32
B. 16
C. 24
D. 8
ANSWER: A

163. (2 marks) Which hardware is used by the host interface to speed up the transfer of bulk data to and from the graphics pipeline?
A. Direct Memory Access
B. Memory hardware
C. Switch
D. Hub
ANSWER: A

164. (2 marks) A ____ is a collection of thread blocks of the same thread dimensionality which all execute the same kernel.
A. Grid
B. Core
C. Element
D. Blocks
ANSWER: A

165. (2 marks) Active warps can be classified into how many types?
A. 3
B. 2
C. 4
D. 5
ANSWER: A

166. (2 marks) All threads in a grid share the same_________space.
A. Global memory
B. Local memory
C. Synchronized memory
D. All
ANSWER: A

167. (2 marks) CUDA was introduced in which year?
A. 2007
B. 2006
C. 2008
D. 2010
ANSWER: A
168. (3 marks) Unlike a C function call, all CUDA kernel launches are:
A. Asynchronous
B. Synchronous
C. Both A and B
D. None
ANSWER: A

169. (3 marks) A warp consists of____ consecutive threads and all threads in a warp are executed in Single Instruction Multiple Thread (SIMT) fashion.
A. 32
B. 16
C. 64
D. 128
ANSWER: A

170. (3 marks) There are how many streaming multiprocessors in the CUDA architecture?
A. 16
B. 8
C. 12
D. 4
ANSWER: A

171. (3 marks) In CUDA programming, if the CPU is the host then the device will be:
A. GPU
B. Compiler
C. HDD
D. GPGPU
ANSWER: A

172. (3 marks) Both grids and blocks use the______ type with three unsigned integer fields.
A. dim3
B. dim2
C. dim1
D. dim4
ANSWER: A
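A minimal CUDA sketch of Question 172's dim3 type (the grid and block sizes are arbitrary sample values): dim3 carries three unsigned integer fields x, y and z, and unspecified fields default to 1. It also illustrates Question 168: the launch returns immediately, so the host must synchronize before relying on the kernel's output.

#include <cstdio>

__global__ void show_ids() {
    // one print per block: each thread can read its block coordinates
    if (threadIdx.x == 0 && threadIdx.y == 0)
        printf("block (%d, %d)\n", blockIdx.x, blockIdx.y);
}

int main() {
    dim3 block(16, 16, 1);   // 256 threads per block
    dim3 grid(2, 2, 1);      // 4 blocks in the grid
    show_ids<<<grid, block>>>();
    cudaDeviceSynchronize(); // kernel launches are asynchronous
    return 0;
}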
173. (3 marks) The Tesla P100 GPU based on the Pascal GPU architecture has 56 streaming multiprocessors (SMs), each capable of supporting up to____ active threads.
A. 2048
B. 512
C. 1024
D. 256
ANSWER: A

174. (3 marks) The maximum size at each level of the thread hierarchy is_____ dependent.
A. Device
B. Host
C. Compiler
D. Memory
ANSWER: A

175. (3 marks) The Intel i7 has a memory bus of width:
A. 19B
B. 180B
C. 152B
D. 102B
ANSWER: A

176. (3 marks) The__________ is the heart of the GPU architecture:
A. Streaming Multiprocessor
B. Multiprocessor
C. CUDA
D. Compiler
ANSWER: A

177. (3 marks) A kernel is defined using the_____ declaration specification.
A. __global__
B. __host__
C. __device__
D. __void__
ANSWER: A
178. (3 marks) The function printThreadInfo() is not used to print out which of the following information about each thread:
A. Block index
B. Matrix coordinates
C. Control index
D. Memory allocations
ANSWER: D
Which is an alternative option for latency hiding?
A. Increase CPU frequency
B. Multithreading
C. Increase Bandwidth
D. Increase Memory
ANSWER: B

______ Communication model is generally seen in tightly coupled systems.


A. Message Passing
B. Shared-address space
C. Client-Server
D. Distributed Network
ANSWER: B

The principal parameters that determine the communication latency are as follows:
A. Startup time (ts) Per-hop time (th) Per-word transfer time (tw)
B. Startup time (ts) Per-word transfer time (tw)
C. Startup time (ts) Per-hop time (th)
D. Startup time (ts) Message-Packet-Size(W)
ANSWER: A

The number and size of tasks into which a problem is decomposed


determines the __
A. Granularity
B. Task
C. Dependency Graph
D. Decomposition
ANSWER: A

Average Degree of Concurrency is...


A. The average number of tasks that can run concurrently over the entire
duration of execution of the process.
B. The average time that can run concurrently over the entire duration of
execution of the process.
C. The average in degree of task dependency graph.
D. The average out degree of task dependency graph.
ANSWER: A

Which task decomposition technique is suitable for the 15-puzzle problem?


A. Data decomposition
B. Exploratory decomposition
C. Speculative decomposition
D. Recursive decomposition
ANSWER: B

Which of the following method is used to avoid Interaction Overheads?


A. Maximizing data locality
B. Minimizing data locality
C. Increase memory size
D. None of the above.
ANSWER: A

Which of the following is not parallel algorithm model


A. The Data Parallel Model
B. The work pool model
C. The task graph model
D. The Speculative Model
ANSWER: D
Nvidia GPUs are based on which of the following architectures?
A. MIMD
B. SIMD
C. SISD
D. MISD
ANSWER: B

What is Critical Path?

A. The length of the longest path in a task dependency graph is called


the critical path length.
B. The length of the smallest path in a task dependency graph is called
the critical path length.
C. Path with loop
D. None of the mentioned.
ANSWER: A

Which decomposition technique uses the divide-and-conquer strategy?


A. recursive decomposition
B. Sdata decomposition
C. exploratory decomposition
D. speculative decomposition
ANSWER: A

If there are 6 nodes in a ring topology, how many message-passing cycles will be required to complete a one-to-all broadcast?

A. 1
B. 6
C. 3
D. 4
ANSWER: C

If there is a 4 x 4 mesh topology network, then how many ring operations will be performed to complete a one-to-all broadcast?

A. 4
B. 8
C. 16
D. 32
ANSWER: B

Consider an all-to-all broadcast in a ring topology with 8 nodes. How many messages will be present at each node after the 3rd step/cycle of communication?
A. 3
B. 4
C. 6
D. 7
ANSWER: B

Consider a hypercube topology with 8 nodes; how many message-passing cycles will be required in an all-to-all broadcast operation?

A. The longest path between any pair of finish nodes.


B. The longest directed path between any pair of start & finish node.
C. The shortest path between any pair of finish nodes.
D. The number of maximum nodes level in graph.
ANSWER: D
Scatter is ____________.
A. One to all broadcast communication
B. All to all broadcast communication
C. One to all personalised communication
D. None of the above.
ANSWER: C

If there is a 4 x 4 mesh topology, ______ message-passing cycles will be required to complete an all-to-all reduction.
A. 4
B. 6
C. 8
D. 16
ANSWER: C

Which of the following issue(s) is/are true about sorting techniques in parallel computing?
A. Large sequence is the issue
B. Where to store output sequence is the issue
C. Small sequence is the issue
D. None of the above
ANSWER: B

Partitioning of the series is done after ______________


A. Local arrangement
B. Processess assignments
C. Global arrangement
D. None of the above
ANSWER: C

In parallel DFS, processes have the following roles. (Select multiple choices if applicable.)
A. Donor
B. Active
C. Idle
D. Passive
ANSWER: A

Suppose there are 16 elements in a series; how many phases will be required to sort the series using parallel odd-even bubble sort?
A. 8
B. 4
C. 5
D. 15
ANSWER: D

Which are different sources of Overheads in Parallel Programs?


A. Interprocess interactions
B. Process Idling
C. All mentioned options
D. Excess Computation
ANSWER: C

What is speedup?

A. The ratio of the time taken to solve a problem on a parallel computer with p identical processing elements to the time required to solve the same problem on a single processor
B. The ratio of the time taken to solve a problem on a single processor to the time required to solve the same problem on a parallel computer with p identical processing elements
C. The ratio of the number of multiple processors to the size of data
D. None of the above
ANSWER: B

Efficiency is a measure of the fraction of time for which a processing


element is usefully employed.
A. TRUE
B. FALSE
ANSWER: A

CUDA helps to execute code in parallel mode using __________


A. CPU
B. GPU
C. ROM
D. Cache memory
ANSWER: B

In the thread-function execution scenario, a thread is a ___________


A. Work
B. Worker
C. Task
D. None of the above
ANSWER: B

In GPU Following statements are true


A. Grid contains Block
B. Block contains Threads
C. All the mentioned options.
D. SM stands for Streaming MultiProcessor
ANSWER: C

The computer system of a parallel computer is capable of_____________


A. Decentralized computing
B. Parallel computing
C. Centralized computing
D. All of these
ANSWER: A

In which application system Distributed systems can run well?


A. HPC
B. Distributed Framework
C. HRC
D. None of the above
ANSWER: A

A pipeline is like .................... ?


A. an automobile assembly line
B. house pipeline
C. both a and b
D. a gas line
ANSWER: A

Pipeline implements ?
A. fetch instruction
B. decode instruction
C. fetch operand
D. all of above
ANSWER: D

A processor performing fetch or decode of a different instruction during the execution of another instruction is called ______?
A. Super-scaling
B. Pipe-lining
C. Parallel Computation
D. None of these
ANSWER: B

In a parallel execution, the performance will always improve as the number of processors increases?
A. True
B. False
ANSWER: B

VLIW stands for ?


A. Very Long Instruction Word
B. Very Long Instruction Width
C. Very Large Instruction Word
D. Very Long Instruction Width
ANSWER: A

In VLIW the decision for the order of execution of the instructions


depends on the program itself?
A. True
B. False
ANSWER: A

Which one is not a limitation of a distributed memory parallel system?


A. Higher communication time
B. Cache coherency
C. Synchronization overheads
D. None of the above
ANSWER: B

Which of these steps can create conflict among the processors?


A. Synchronized computation of local variables
B. Concurrent write
C. Concurrent read
D. None of the above
ANSWER: B

Which one is not a characteristic of NUMA multiprocessors?


A. It allows shared memory computing
B. Memory units are placed in physically different location
C. All memory units are mapped to one common virtual global memory
D. Processors access their independent local memories
ANSWER: D

Which of these is not a source of overhead in parallel computing?


A. Non-uniform load distribution
B. Less local memory requirement in distributed computing
C. Synchronization among threads in shared memory computing
D. None of the above
ANSWER: B

Systems that do not have parallel processing capabilities are?


A. SISD
B. SIMD
C. MIMD
D. All of the above
ANSWER: A

How does the number of transistors per chip increase according to Moore's law?
A. Quadratically
B. Linearly
C. Cubicly
D. Exponentially
ANSWER: D

Parallel processing may occur?


A. in the instruction stream
B. in the data stream
C. both[A] and [B]
D. none of the above
ANSWER: C

To which class of systems does the von Neumann computer belong?


A. SIMD (Single Instruction Multiple Data)
B. MIMD (Multiple Instruction Multiple Data)
C. MISD (Multiple Instruction Single Data)
D. SISD (Single Instruction Single Data)
ANSWER: D

Fine-grain threading is considered as a ______ threading?


A. Instruction-level
B. Loop level
C. Task-level
D. Function-level
ANSWER: A

Multiprocessors are systems with multiple CPUs, capable of independently executing different tasks in parallel. In which category does every processor and memory module have similar access time?
A. UMA
B. Microprocessor
C. Multiprocessor
D. NUMA
ANSWER: A

For interprocessor communication, the misses that arise are called?


A. hit rate
B. coherence misses
C. commit misses
D. parallel processing
ANSWER: B

NUMA architecture uses _______in design?


A. cache
B. shared memory
C. message passing
D. distributed memory
ANSWER: D

A multiprocessor machine which is capable of executing multiple


instructions on multiple data sets?
A. SISD
B. SIMD
C. MIMD
D. MISD
ANSWER: C

In message passing, send and receive message between?


A. Task or processes
B. Task and Execution
C. Processor and Instruction
D. Instruction and decode
ANSWER: A

The First step in developing a parallel algorithm is_________?


A. To Decompose the problem into tasks that can be executed concurrently
B. Execute directly
C. Execute indirectly
D. None of Above
ANSWER: A

The number of tasks into which a problem is decomposed determines its?


A. Granularity
B. Priority
C. Modernity
D. None of above
ANSWER: A

The length of the longest path in a task dependency graph is called?


A. the critical path length
B. the critical data length
C. the critical bit length
D. None of above
ANSWER: A

The graph of tasks (nodes) and their interactions/data exchange (edges)?


A. Is referred to as a task interaction graph
B. Is referred to as a task Communication graph
C. Is referred to as a task interface graph
D. None of Above
ANSWER: A

Mappings are determined by?


A. task dependency
B. task interaction graphs
C. Both A and B
D. None of Above
ANSWER: C

Decomposition Techniques are?


A. recursive decomposition
B. data decomposition
C. exploratory decomposition
D. All of Above
ANSWER: D

The Owner Computes Rule generally states that the process assigned a
particular data item is responsible for?
A. All computation associated with it
B. Only one computation
C. Only two computation
D. Only occasionally computation
ANSWER: A
A simple application of exploratory decomposition is_?
A. The solution to a 15 puzzle
B. The solution to 20 puzzle
C. The solution to any puzzle
D. None of Above
ANSWER: A

Speculative Decomposition consist of _?


A. conservative approaches
B. optimistic approaches
C. Both A and B
D. Only B
ANSWER: C

Task characteristics include?


A. Task generation.
B. Task sizes.
C. Size of data associated with tasks.
D. All of Above
ANSWER: D

Writing parallel programs is referred to as?


A. Parallel computation
B. Parallel processes
C. Parallel development
D. Parallel programming
ANSWER: D

Parallel Algorithm Models?


A. Data parallel model
B. Bit model
C. Data model
D. network model
ANSWER: A

The number and size of tasks into which a problem is decomposed


determines the?
A. fine-granularity
B. coarse-granularity
C. sub Task
D. granularity
ANSWER: D

A feature of a task-dependency graph that determines the average degree


of concurrency for a given granularity is its ___________ path?
A. critical
B. easy
C. difficult
D. ambiguous
ANSWER: A

The pattern of___________ among tasks is captured by what is known as a


task-interaction graph?
A. Interaction
B. communication
C. optmization
D. flow
ANSWER: A
Interaction overheads can be minimized by____?
A. Maximize Data Locality
B. Maximize Volume of data exchange
C. Increase Bandwidth
D. Minimize social media contents
ANSWER: A

Type of parallelism that is naturally expressed by independent tasks in a


task-dependency graph is called _______ parallelism?
A. Task
B. Instruction
C. Data
D. Program
ANSWER: A

Speed up is defined as a ratio of?


A. s=Ts/Tp
B. S= Tp/Ts
C. Ts=S/Tp
D. Tp=S /Ts
ANSWER: A
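A tiny C++ sketch of the two metrics above, using made-up sample times (Ts, Tp and p here are assumptions, not from the source): speedup S = Ts/Tp and efficiency E = Ts/(p*Tp):

#include <cstdio>

int main() {
    double Ts = 100.0;  // serial runtime (sample value)
    double Tp = 8.0;    // parallel runtime (sample value)
    int p = 16;         // number of processing elements
    double S = Ts / Tp;        // speedup
    double E = Ts / (p * Tp);  // efficiency, equivalently S / p
    printf("S = %.2f, E = %.2f\n", S, E);  // S = 12.50, E = 0.78
    return 0;
}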

Parallel computing means to divide the job into several __________?


A. Bit
B. Data
C. Instruction
D. Task
ANSWER: D

_________ is a method for inducing concurrency in problems that can be solved using the divide-and-conquer strategy?
A. exploratory decomposition
B. speculative decomposition
C. data-decomposition
D. Recursive decomposition
ANSWER: D

The ___ time collectively spent by all the processing elements: T_all = p * T_P?
A. total
B. Average
C. mean
D. sum
ANSWER: A

Group communication operations are built using point-to-point messaging


primitives?
A. True
B. False
ANSWER: A

Communicating a message of size m over an uncongested network takes time ts + tw*m?
A. True
B. False
ANSWER: A

The dual of one-to-all broadcast is ?


A. All-to-one reduction
B. All-to-one receiver
C. All-to-one Sum
D. None of Above
ANSWER: A

A hypercube has?
A. 2^d nodes
B. 3^d nodes
C. 2n nodes
D. n nodes
ANSWER: A

A binary tree in which processors are (logically) at the leaves and


internal nodes are routing nodes?
A. True
B. False
ANSWER: A

In All-to-All Broadcast each processor is the source as well as


destination?
A. True
B. False
ANSWER: A

The Prefix Sum Operation can be implemented using the ?


A. All-to-all broadcast kernel.
B. All-to-one broadcast kernel.
C. One-to-all broadcast Kernel
D. Scatter Kernel
ANSWER: A
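For reference, a serial C++ sketch of what the prefix-sum operation computes; the parallel version follows the all-to-all broadcast communication pattern, as the answer above states, while this sketch only shows the result each node k must end up with, namely the sum of entries 0..k:

#include <cstdio>

int main() {
    int x[] = {3, 1, 4, 1, 5, 9, 2, 6};  // one value per node (sample data)
    const int n = 8;
    int s[n], run = 0;
    for (int k = 0; k < n; ++k) { run += x[k]; s[k] = run; }
    for (int k = 0; k < n; ++k) printf("%d ", s[k]);  // 3 4 8 9 14 23 25 31
    printf("\n");
    return 0;
}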

In the scatter operation ?


A. A single node sends a unique message of size m to every other node
B. A single node sends the same message of size m to every other node
C. A single node sends a unique message of size m to the next node
D. None of Above
ANSWER: A

The gather operation is exactly the inverse of the ?


A. Scatter operation
B. Broadcast operation
C. Prefix Sum
D. Reduction operation
ANSWER: A

In All-to-All Personalized Communication Each node has a distinct message


of size m for every other node ?
A. True
B. False
ANSWER: A

Parallel algorithms often require a single process to send identical data


to all other processes or to a subset of them. This operation is known as
_________?
A. one-to-all broadcast
B. All to one broadcast
C. one-to-all reduction
D. all to one reduction
ANSWER: A
In which of the following operation, a single node sends a unique message
of size m to every other node?
A. Gather
B. Scatter
C. One to all personalized communication
D. Both B and C
ANSWER: D

Gather operation is also known as ________?


A. One to all personalized communication
B. One to all broadcast
C. All to one reduction
D. All to All broadcast
ANSWER: A

one-to-all personalized communication does not involve any duplication of


data?
A. True
B. False
ANSWER: A

Gather operation, or concatenation, in which a single node collects a


unique message from each node?
A. True
B. False
ANSWER: A

Conventional architectures coarsely comprise of a?


A. A processor
B. Memory system
C. Data path.
D. All of Above
ANSWER: D

Data intensive applications utilize?


A. High aggregate throughput
B. High aggregate network bandwidth
C. High processing and memory system performance.
D. None of above
ANSWER: A

A pipeline is like?
A. Overlaps various stages of instruction execution to achieve
performance.
B. House pipeline
C. Both a and b
D. A gas line
ANSWER: A

Scheduling of instructions is determined?


A. True Data Dependency
B. Resource Dependency
C. Branch Dependency
D. All of above
ANSWER: D

VLIW processors rely on?


A. Compile time analysis
B. Initial time analysis
C. Final time analysis
D. Mid time analysis
ANSWER: A

Memory system performance is largely captured by?


A. Latency
B. Bandwidth
C. Both a and b
D. none of above
ANSWER: C

The fraction of data references satisfied by the cache is called?


A. Cache hit ratio
B. Cache fit ratio
C. Cache best ratio
D. none of above
ANSWER: A

A single control unit that dispatches the same Instruction to various


processors is?
A. SIMD
B. SPMD
C. MIMD
D. None of above
ANSWER: A

The primary forms of data exchange between parallel tasks are?


A. Accessing a shared data space
B. Exchanging messages.
C. Both A and B
D. None of Above
ANSWER: C

Switches map a fixed number of inputs to outputs?


A. True
B. False
ANSWER: A


Three-tier architecture simplifies an application's ____________?


A. Maintenance
B. Initiation
C. Implementation
D. Deployment
ANSWER: D

A dynamic network of networks, i.e., a dynamic connection that grows, is called?
A. Multithreading
B. Cyber cycle
C. Internet of things
D. Cyber-physical system
ANSWER: C

In which application system Distributed systems can run well?


A. HPC
B. HTC
C. HRC
D. Both A and B
ANSWER: D

What do both HPC and HTC systems desire?


A. Adaptivity
B. Transparency
C. Dependency
D. Secretive
ANSWER: B

The architecture in which no special machines manage the network resources is known as?
A. Peer-to-Peer
B. Space based
C. Tightly coupled
D. Loosely coupled
ANSWER: A

Distributed systems have how many significant characteristics?


A. 5 types
B. 2 types
C. 3 types
D. 4 types
ANSWER: C

Peer machines are built over?


A. Many Server machines
B. 1 Server machine
C. 1 Client machine
D. Many Client machines
ANSWER: D

A type of HTC application is?


A. Business
B. Engineering
C. Science
D. Media mass
ANSWER: A

Virtualization that creates one single address-space architecture is called?
A. Loosely coupled
B. Peer-to-Peer
C. Space-based
D. Tightly coupled
ANSWER: C

In cloud computing, we have an internet cloud of resources that forms?


A. Centralized computing
B. Decentralized computing
C. Parallel computing
D. All of these
ANSWER: D

Job throughput, data access, and storage are elements of __________?


A. Flexibility
B. Adaptation
C. Efficiency
D. Dependability
ANSWER: C

The ability to support billions of job requests over massive data sets is known as?
A. Efficiency
B. Dependability
C. Adaptation
D. Flexibility
ANSWER: C

Cloud computing offers a broader concept than which of the following?
A. Parallel computing
B. Centralized computing
C. Utility computing
D. Decentralized computing
ANSWER: C

The transparency that allows movement of resources and clients within a system is called?
A. Mobility transparency
B. Concurrency transparency
C. Performance transparency
D. Replication transparency
ANSWER: A

A program running in a distributed computer is known as?
B. Distributed program
C. Distributed application
D. Distributed computing
ANSWER: B

Computing on uniprocessor devices is called__________?


A. Grid computing
B. Centralized computing
C. Parallel computing
D. Distributed computing
ANSWER: B

Utility computing focuses on a______________ model?


A. Data
B. Cloud
C. Scalable
D. Business
ANSWER: D

A CPS merges which technologies?


A. 5C
B. 2C
C. 3C
D. 4C
ANSWER: C

Abbreviation of HPC?
A. High-peak computing
B. High-peripheral computing
C. High-performance computing
D. Highly-parallel computing
ANSWER: C

Peer-to-Peer leads to the development of technologies like?


A. Norming grids
B. Data grids
C. Computational grids
D. Both A and B
ANSWER: D

A type of HPC application is?


A. Management
B. Media mass
C. Business
D. Science
ANSWER: D

How many development generations has computer technology gone through?


A. 6
B. 3
C. 4
D. 5
ANSWER: D

Utilization rate of resources in an execution model is known to be its?


A. Adaptation
B. Efficiency
C. Dependability
D. Flexibility
ANSWER: B

Providing Quality of Service (QoS) assurance even under failure conditions is the responsibility of?
A. Dependability
B. Adaptation
C. Flexibility
D. Efficiency
ANSWER: A

Interprocessor communication takes place via?


A. Centralized memory
B. Shared memory
C. Message passing
D. Both B and C
ANSWER: D

Data centers and centralized computing cover many?
B. Minicomputers
C. Mainframe computers
D. Supercomputers
ANSWER: D

Which of the following is a primary goal of the HTC paradigm___________?


A. High ratio Identification
B. Low-flux computing
C. High-flux computing
D. Computer utilities
ANSWER: C

The high-throughput service provided is a measure taken by?


A. Flexibility
B. Efficiency
C. Dependability
D. Adaptation
ANSWER: D

What are the sources of overhead?


A. Essential /Excess Computation
B. Inter-process Communication
C. Idling
D. All above
ANSWER: D

Which are the performance metrics for parallel systems?


A. Execution Time
B. Total Parallel Overhead
C. Speedup
D. All above
ANSWER: D

The efficiency of a parallel program can be written as: E = Ts / pTp.


True or False?
A. True
B. False
ANSWER: A

The important feature of the VLIW is ______?


A. ILP
B. Performance
C. Cost effectiveness
D. delay
ANSWER: A

Which are the performance metrics for parallel systems?

A. Execution Time
B. Total Parallel Overhead
C. Speedup
D. Efficiency
E. Cost
F. All above

Answer : F


Overhead function or total overhead of a parallel system is the total time collectively
spent by all the processing elements over and above that required by the fastest known
sequential algorithm for solving the same problem on a single processing element.
True or False?
A. True
B. False
Answer : A

What is Speedup?
A. A measure that captures the relative benefit of solving a problem in parallel. It is defined as the
ratio of the time taken to solve a problem on a single processing element to the time required to
solve the same problem on a parallel computer with p identical processing elements.
B. A measure of the fraction of time for which a processing element is usefully
employed.
C. None of the above
Answer : A

In an ideal parallel system, speedup is equal to p and efficiency is equal to one. True or
False?
A. True
B. False
Answer : A
A parallel system is said to be ________________ if the cost of solving a problem on a
parallel computer has the same asymptotic growth (in Θ terms) as a function of the input
size as the fastest-known sequential algorithm on a single processing element.
A. Cost optimal
B. Non Cost optimal
Answer : A

Using fewer than the maximum possible number of processing elements to execute a
parallel algorithm is called ______________ a parallel system in terms of the number of
processing elements.

A. Scaling down
B. Scaling up
Answer : A

The __________________ function determines the ease with which a parallel system can
maintain a constant efficiency and hence achieve speedups increasing in proportion to the
number of processing elements.
A. Isoefficiency
B. Efficiency
C. Scalability
D. Total overhead
Answer : A

Minimum execution time for adding n numbers is T_P = n/p + 2 log p. True or False?
A. True
B. False
Answer : A
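A small C++ sketch evaluating the formula above for sample values (n = 1024 and unit-time steps are assumptions), showing how the n/p term shrinks while the 2 log p term grows:

#include <cmath>
#include <cstdio>

int main() {
    // T_P = n/p + 2 log2(p), evaluated for a few processor counts
    double n = 1024;
    for (int p = 1; p <= 64; p *= 4)
        printf("p = %2d  Tp = %.1f\n", p, n / p + 2 * std::log2((double)p));
    return 0;
}

For n = 1024 this prints Tp = 1024, 260, 72 and 28 for p = 1, 4, 16 and 64.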

The overhead function To = pTP − TS.


A. True
B. False
Answer : A

Performance Metrics for Parallel Systems: Speedup(S) =TS/TP


A. True
B. False
Answer : A

Matrix-vector multiplication with 2-D partitioning requires some basic communication operations:


A. one-to-one communication to align the vector along the main diagonal
B. one-to-all broadcast of each vector element among the n processes of each column
C. all-to-one reduction in each row
D. All Above
Answer : D
What are the issues in sorting?

A. Where the Input and Output Sequences are Stored

B. How Comparisons are Performed

C. All above

Answer : C

The parallel run time of the formulation for Bubble sort is

A. Tp = O((n/p) log(n/p)) + O(n) + O(n)

B. Tp = O((n/p) log(n/p)) + O((n/p) log p) + O(n/p)

C. None of the above

Answer : A

What are the variants of Bubble sort?

A. Shell sort

B. Quick sort

C. Odd-Even transposition

D. Option A & C

Answer : D

What is the overall complexity of parallel algorithm for quick sort?

A. Tp = O((n/p) log(n/p)) + O((n/p) log p) + O(log^2 p)

B. Tp = O((n/p) log(n/p)) + O((n/p) log p)

C. Tp = O((n/p) log(n/p)) + O(log^2 p)

Answer : A
Formally, given a weighted graph G(V, E, w), the all-pairs shortest paths problem is to
find the shortest paths between all pairs of vertices. True or False?

A. True
B. False

Answer : A

What is true for parallel formulation of Dijkstra’s Algorithm?

A. One approach partitions the vertices among different processes and has each process
compute the single-source shortest paths for all vertices assigned to it. We refer to
this approach as the source-partitioned formulation.
B. Another approach assigns each vertex to a set of processes and uses the parallel
formulation of the single-source algorithm to solve the problem on each set of
processes. We refer to this approach as the source-parallel formulation.
C. Both are true
D. Non of these is true

Answer : C

Search algorithms can be used to solve discrete optimization problems. True or False ?

A. True
B. False
Answer : A

Examples of Discrete optimization problems are ;


A. planning and scheduling,
B. The optimal layout of VLSI chips,
C. Robot motion planning,
D. Test-pattern generation for digital circuits, and logistics and control.
E. All of above
Answer : E

List the important parameters of Parallel DFS

A. Work- Splitting Strategies

B. Load balancing Schemes

C. All of above

Answer : C
List the communication strategies for parallel BFS.

A. Random communication strategy

B. Ring communication strategy

C. Blackboard communication strategy

D. All of above

Answer : D

The lower bound on any comparison-based sort of n numbers is Θ(nlog n)


A. True
B. False
Answer : A

In a compare-split operation
A. Each process sends its block of size n/p to the other process
B. Each process merges the received block with its own block and retains only the
appropriate half of the merged block
C. Both A & B
Answer : C

In a typical sorting network


A. Every sorting network is made up of a series of columns
B. Each column contains a number of comparators connected in parallel
C. Both A & B
Answer : C

Bubble sort is difficult to parallelize since the algorithm has no concurrency


A. True
B. False
Answer : A
Which of the following statements are true with regard to compute capability in CUDA

A. Code compiled for hardware of one compute capability will not need to be re-
compiled to run on hardware of another

B. Different compute capabilities may imply a different amount of local memory per
thread

C. Compute capability is measured by the number of FLOPS a GPU accelerator can


compute.

Answer : B

True or False: The threads in a thread block are distributed across SM units so that each
thread is executed by one SM unit.

A. True

B. False

Answer : B

The style of parallelism supported on GPUs is best described as

A. SISD - Single Instruction Single Data

B. MISD - Multiple Instruction Single Data

C. SIMT - Single Instruction Multiple Thread

Answer : C

True or false: Functions annotated with the __global__ qualifier may be executed on the
host or the device

A. True

B. False

Answer : B (__global__ functions execute on the device; they are launched from the host)
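
For contrast, a small CUDA sketch (not from the source) of the three qualifiers; the function names are illustrative.

__device__ float dev_only(float x) { return 2.0f * x; }       /* device only */
__host__ __device__ float both(float x) { return x + 1.0f; }  /* host or device */
__global__ void kernel_example(float *out)                    /* runs on the device,
                                                                 launched from the host */
{
    out[threadIdx.x] = both(dev_only((float)threadIdx.x));
}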
Which of the following correctly describes a GPU kernel

A. A kernel may contain a mix of host and GPU code

B. All thread blocks involved in the same computation use the same kernel

C. A kernel is part of the GPU's internal micro-operating system, allowing it to act as an independent host

Answer : B

Which of the following is not a form of parallelism supported by CUDA

A. Vector parallelism - Floating point computations are executed in parallel on wide


vector units

B. Thread level task parallelism - Different threads execute different tasks

C. Block and grid level parallelism - Different blocks or grids execute different tasks

D. Data parallelism - Different threads and blocks process different parts of data in
memory

Answer :A

What strategy does the GPU employ if the threads within a warp diverge in their execution?

A. Threads are moved to different warps so that divergence does not occur within a
single warp

B. Threads are allowed to diverge

C. All possible execution paths are run by all threads in a warp serially so that thread
instructions do not diverge

Answer : C
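
A minimal CUDA illustration (not from the source): the branch on threadIdx.x below diverges within a warp, so the hardware runs both paths serially with the inactive lanes masked off.

__global__ void divergent(int *out)
{
    int i = threadIdx.x;
    if (i % 2 == 0)          /* even lanes take one path ...             */
        out[i] = 2 * i;
    else                     /* ... odd lanes take the other; the warp   */
        out[i] = i + 1;      /* executes the two paths one after another */
}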

Which of the following does not result in uncoalesced (i.e. serialized) memory access on the
K20 GPUs installed on Stampede

A. Aligned, but non-sequential access

B. Misaligned data access

C. Sparse memory access

Answer : A
Which of the following correctly describes the relationship between Warps, thread blocks,
and CUDA cores?

A. A warp is divided into a number of thread blocks, and each thread block executes on
a single CUDA core

B. A thread block may be divided into a number of warps, and each warp may execute
on a single CUDA core

C. A thread block is assigned to a warp, and each thread in the warp is executed on a
separate CUDA core

Answer : B

Shared memory in CUDA is accessible to:

A. All threads in a single block

B. Both the host and GPU

C. All threads associated with a single kernel

Answer : A
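
A short CUDA sketch (not from the source) showing block-level sharing; it assumes n <= blockDim.x <= 256.

__global__ void reverse_in_block(int *d, int n)
{
    __shared__ int s[256];        /* one copy per thread block */
    int t = threadIdx.x;
    if (t < n) s[t] = d[t];
    __syncthreads();              /* make all writes visible block-wide */
    if (t < n) d[t] = s[n - 1 - t];
}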

On the CPU (host) side, the CUDA architecture consists of

A. CUDA Libraries

B. CUDA Runtime

C. CUDA Driver

D. All Above

Answer : D

The CUDA platform supports programs written in

A. C

B. C++

C. Fortran

D. All Above

Answer : D
Threads support Shared memory and Synchronization

A. True

B. False

Answer : A

Applications of CUDA are

A. Fast Video Transcoding

B. Medical Imaging

C. Computational Science

D. Oil and Natural Resources exploration

E. All Above

Answer : E

GPUs execute device code

A. True

B. False

Answer : A
---------------------------------------------------------------------------------------------------------------------
SET 1 (120 MCQs)
---------------------------------------------------------------------------------------------------------------------

1. Conventional architectures coarsely comprise a_

A. A processor
B. Memory system
C. Data path
D. All of above

2. Data intensive applications utilize_

A High aggregate throughput


B High aggregate network bandwidth
C High processing and memory system performance.
D None of above

3. A pipeline is like_

A Overlaps various stages of instruction execution to achieve performance.


B House pipeline
C Both a and b
D A gas line

4. Scheduling of instructions is determined by_

A True Data Dependency


B Resource Dependency
C Branch Dependency
D All of above

5. VLIW processors rely on_

A Compile time analysis


B Initial time analysis
C Final time analysis
D Mid time analysis

6. Memory system performance is largely captured by_

A Latency
B Bandwidth
C Both a and b
D none of above

7. The fraction of data references satisfied by the cache is called_

A. Cache hit ratio
B. Cache fit ratio
C. Cache best ratio
D. None of above
8. A single control unit that dispatches the same Instruction to various processors is__

A SIMD
B SPMD
C MIMD
D None of above

9. The primary forms of data exchange between parallel tasks are_

A Accessing a shared data space


B Exchanging messages.
C Both A and B
D None of Above

10. Switches map a fixed number of inputs to outputs.


A True
B False

11. The First step in developing a parallel algorithm is_

A. To Decompose the problem into tasks that can be executed concurrently


B. Execute directly
C. Execute indirectly
D. None of Above

12. The number of tasks into which a problem is decomposed determines its_

A. Granularity
B. Priority
C. Modernity
D. None of above

13. The length of the longest path in a task dependency graph is called_
A. the critical path length
B. the critical data length
C. the critical bit length
D. None of above

14. The graph of tasks (nodes) and their interactions/data exchange (edges)_
A. Is referred to as a task interaction graph

B. Is referred to as a task Communication graph
C. Is referred to as a task interface graph
D. None of Above

15. Mappings are determined by_

A. task dependency
B. task interaction graphs
C. Both A and B
D. None of Above

16. Decomposition Techniques are_


A. recursive decomposition
B. data decomposition
C. exploratory decomposition
D. speculative decomposition
E. All of Above

17. The Owner Computes Rule generally states that the process assigned a particular data
item is responsible for_

A. All computation associated with it


B. Only one computation
C. Only two computation
D. Only occasionally computation

18. A simple application of exploratory decomposition is_

A. The solution to a 15 puzzle


B. The solution to 20 puzzle
C. The solution to any puzzle
D. None of Above

19. Speculative Decomposition consist of _


A. conservative approaches
B. optimistic approaches
C. Both A and B
D. Only B

20. task characteristics include:


A. Task generation.
B. Task sizes.
C. Size of data associated with tasks.
D. All of Above

21. Group communication operations are built using point-to-point messaging primitives
A. True
B. False

22. Communicating a message of size m over an uncongested network takes time ts + tw*m

A. True
B. False

23. The dual of one-to-all broadcast is_

A. All-to-one reduction
B. All-to-one receiver
C. All-to-one Sum
D. None of Above

24. A hypercube has_

A. 2^d nodes
B. 2d nodes
C. 2n nodes
D. n nodes

25. A binary tree in which processors are (logically) at the leaves and internal nodes are
routing nodes.

A. True
B. False

26. In All-to-All Broadcast each processor is the source as well as destination.

A. True
B. False

27. The Prefix Sum Operation can be implemented using the_

A. All-to-all broadcast kernel.


B. All-to-one broadcast kernel.
C. One-to-all broadcast Kernel
D. Scatter Kernel
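
MPI exposes the prefix sum directly; a minimal example (not from the source, with illustrative values):

#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv)
{
    int rank, prefix;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int value = rank + 1;                /* each rank contributes rank+1 */
    /* prefix sum over the contributions of ranks 0..rank */
    MPI_Scan(&value, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    printf("rank %d: prefix sum = %d\n", rank, prefix);
    MPI_Finalize();
    return 0;
}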

28. In the scatter operation_

A. Single node send a unique message of size m to every other node


B. Single node send a same message of size m to every other node
C. Single node send a unique message of size m to next node

D. None of Above

29. The gather operation is exactly the inverse of the_

A. Scatter operation
B. Broadcast operation
C. Prefix Sum
D. Reduction operation
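
A minimal MPI illustration (not from the source) of scatter and its inverse, gather; the buffer size assumes at most 64 processes.

#include <mpi.h>
int main(int argc, char **argv)
{
    int p, rank, piece;
    int all[64];                                  /* assumes p <= 64 */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        for (int i = 0; i < p; i++) all[i] = 10 * i;
    /* scatter: the root sends a distinct element to every rank */
    MPI_Scatter(all, 1, MPI_INT, &piece, 1, MPI_INT, 0, MPI_COMM_WORLD);
    piece += 1;                                   /* local work */
    /* gather: the exact inverse, collecting the pieces on the root */
    MPI_Gather(&piece, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}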

30. In All-to-All Personalized Communication Each node has a distinct message of size m
for every other node

A. True
B. False

31. Computer system of a parallel computer is capable of

A. Decentralized computing
B. Parallel computing
C. Centralized computing
D. Distributed computing
E. All of these
F. None of these

32. Writing parallel programs is referred to as

A. Parallel computation
B. Parallel processes
C. Parallel development
D. Parallel programming
E. All of these
F. None of these

33. ____________ simplifies applications of three-tier architecture.


A. Maintenance
B. Initiation
C. Implementation
D. Deployment
E. All of these
F. None of these

34. A dynamic network of networks, a dynamic connection that grows, is called

A. Multithreading
B. Cyber cycle
C. Internet of things
D. Cyber-physical system
E. All of these
F. None of these

35. In which application systems can distributed systems run well?

A. HPC
B. HTC
C. HRC
D. Both A and B
E. All of these
F. None of these

36. Which property do both HPC and HTC systems desire?

A. Adaptivity
B. Transparency
C. Dependency
D. Secretive
E. All of these
F. None of these

37. The architecture in which no special machines manage the network resources is known as

A. Peer-to-Peer
B. Space based
C. Tightly coupled
D. Loosely coupled
E. All of these
F. None of these

38. Significant characteristics of distributed systems are of


A. 5 types
B. 2 types
C. 3 types
D. 4 types
E. All of these
F. None of these

39. Peer machines are built over

A. Many Server machines
B. 1 Server machine
C. 1 Client machine
D. Many Client machines
E. All of these
F. None of these

40. Types of HTC applications include

A. Business
B. Engineering
C. Science
D. Media mass
E. All of these
F. None of these

41. The architecture in which virtualization creates one single address space is called

A. Loosely coupled
B. Peer-to-Peer
C. Space-based
D. Tightly coupled
E. All of these
F. None of these

42. In cloud computing, an Internet cloud of resources forms

A. Centralized computing
B. Decentralized computing
C. Parallel computing
D. Both A and B
E. All of these
F. None of these

43. Job throughput, data access and storage are elements of __________.

A. Flexibility
B. Adaptation
C. Efficiency
D. Dependability
E. All of these
F. None of these

44. The ability to support billions of job requests over massive data sets is known as

A. Efficiency
B. Dependability
C. Adaptation
D. Flexibility
E. All of these
F. None of these

45. Cloud computing offers a broader concept than which of the following?

A. Parallel computing
B. Centralized computing
C. Utility computing
D. Decentralized computing
E. All of these
F. None of these

46. The transparency that allows movement of resources and clients within a system is called

A. Mobility transparency
B. Concurrency transparency
C. Performance transparency
D. Replication transparency
E. All of these
F. None of these

47. A distributed computer running a distributed program is known as

A. Distributed process
B. Distributed program
C. Distributed application
D. Distributed computing
E. All of these
F. None of these

48. Computing with uniprocessor devices is called __________.

A. Grid computing
B. Centralized computing
C. Parallel computing
D. Distributed computing
E. All of these
F. None of these

49. Utility computing focuses on a______________ model.

A. Data
B. Cloud
C. Scalable
D. Business
E. All of these
F. None of these

50. A CPS merges ___ technologies.

A. 5C
B. 2C
C. 3C
D. 4C
E. All of these
F. None of these

51. The abbreviation HPC stands for

A. High-peak computing
B. High-peripheral computing
C. High-performance computing
D. Highly-parallel computing
E. All of these
F. None of these

52. Peer-to-Peer leads to the development of technologies like

A. Norming grids
B. Data grids
C. Computational grids
D. Both A and B
E. All of these
F. None of these

53. Types of HPC applications include

A. Management
B. Media mass
C. Business
D. Science
E. All of these
F. None of these

54. Computer technology has gone through ___ development generations.

A. 6

B. 3
C. 4
D. 5
E. All of these
F. None of these

55. Utilization rate of resources in an execution model is known to be its

A. Adaptation
B. Efficiency
C. Dependability
D. Flexibility
E. All of these
F. None of these

56. Even under failure conditions Providing Quality of Service (QoS) assurance is the
responsibility of

A. Dependability
B. Adaptation
C. Flexibility
D. Efficiency
E. All of these
F. None of these

57. Interprocessor communication takes place via


A. Centralized memory
B. Shared memory
C. Message passing
D. Both B and C
E. All of these
F. None of these

58. Data centers and centralized computing cover many


A. Microcomputers
B. Minicomputers
C. Mainframe computers
D. Supercomputers
E. All of these
F. None of these

59. Which of the following is a primary goal of the HTC paradigm___________.


A. High ratio Identification
B. Low-flux computing
C. High-flux computing
D. Computer utilities

E. All of these
F. None of these

60. The high-throughput service provided is measured by


A. Flexibility
B. Efficiency
C. Adaptation
D. Dependability
E. All of these
F. None of these

61. What are the sources of overhead?


A. Essential /Excess Computation
B. Inter-process Communication
C. Idling
D. All above

62. Which are the performance metrics for parallel systems?


A. Execution Time
B. Total Parallel Overhead
C. Speedup
D. Efficiency
E. Cost
F. All above

63. The efficiency of a parallel program can be written as: E = Ts / pTp. True or False?
A. True
B. False

64. Overhead function or total overhead of a parallel system as the total time collectively
spent by all the processing elements over and above that required by the fastest known
sequential algorithm for solving the same problem on a single processing element. True
or False?
A. True
B. False

65. What is Speedup?


A. A measure that captures the relative benefit of solving a problem in parallel. It is defined
as the ratio of the time taken to solve a problem on a single processing element to the
time required to solve the same problem on a parallel computer with p identical
processing elements.
B. A measure of the fraction of time for which a processing element is usefully employed.
C. None of the above
Answer : A

66. In an ideal parallel system, speedup is equal to p and efficiency is equal to one. True or
False?
A. True
B. False
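
Worked check (illustrative numbers, not from the source): if TS = 100 s and TP = 20 s on p = 8 processing elements, then S = TS/TP = 5 and E = S/p = 0.625. In the ideal case TP = TS/p = 12.5 s, giving S = 8 = p and E = 1.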

67. A parallel system is said to be ________________ if the cost of solving a problem on a


parallel computer has the same asymptotic growth (in Θ terms) as a function of the input size
as the fastest-known sequential algorithm on a single processing element.
A. Cost optimal
B. Non Cost optimal

68. Using fewer than the maximum possible number of processing elements to execute a
parallel algorithm is called ______________ a parallel system in terms of the number of
processing elements.

A. Scaling down
B. Scaling up

69. The __________________ function determines the ease with which a parallel system can
maintain a constant efficiency and hence achieve speedups increasing in proportion to the
number of processing elements.
A. Isoefficiency
B. Efficiency
C. Scalability
D. Total overhead

70. Minimum execution time for adding n numbers is Tp = n/p + 2 log p. True or False?
A. True
B. False

71. The overhead function To = pTP − TS.


A. True
B. False

72. Performance Metrics for Parallel Systems: Speedup(S) =TS/TP


A. True
B. False

73. Matrix-vector multiplication with 2-D partitioning requires some basic communication operations
A. one-to-one communication to align the vector along the main diagonal
B. one-to-all broadcast of each vector element among the n processes of each column
C. all-to-one reduction in each row
D. All Above

74. What are the issues in sorting?
A. Where the Input and Output Sequences are Stored
B. How Comparisons are Performed
C. All above
Answer : C

75. The parallel run time of the formulation for Bubble sort is
A. Tp = O((n/p) log(n/p)) + O(n) + O(n)
B. Tp = O((n/p) log(n/p)) + O((n/p) log p) + O(n/p)
C. None of the above

76. What are the variants of Bubble sort?


A. Shell sort
B. Quick sort
C. Odd-Even transposition
D. Option A & C

77. What is the overall complexity of the parallel algorithm for quick sort?
A. Tp = O((n/p) log(n/p)) + O((n/p) log p) + O(log^2 p)
B. Tp = O((n/p) log(n/p)) + O((n/p) log p)
C. Tp = O((n/p) log(n/p)) + O(log^2 p)

78. Formally, given a weighted graph G(V, E, w), the all-pairs shortest paths problem is to
find the shortest paths between all pairs of vertices. True or False?

A. True
B. False

79. What is true for parallel formulation of Dijkstra’s Algorithm?


A. One approach partitions the vertices among different processes and has each process
compute the single-source shortest paths for all vertices assigned to it. We refer to this
approach as the source-partitioned formulation.
B. Another approach assigns each vertex to a set of processes and uses the parallel
formulation of the single-source algorithm to solve the problem on each set of processes.
We refer to this approach as the source-parallel formulation.
C. Both are true
D. None of these is true

80. Search algorithms can be used to solve discrete optimization problems. True or False ?

A. True
B. False

81. Examples of Discrete optimization problems are ;
A. planning and scheduling,
B. The optimal layout of VLSI chips,
C. Robot motion planning,
D. Test-pattern generation for digital circuits, and logistics and control.
E. All of above

82. List the important parameters of Parallel DFS


A. Work- Splitting Strategies
B. Load balancing Schemes
C. All of above

83.List the communication strategies for parallel BFS.


A. Random communication strategy
B. Ring communication strategy
C. Blackboard communication strategy
D. All of above

84. The lower bound on any comparison-based sort of n numbers is Θ(n log n)


A. True
B. False

85. In a compare-split operation


A. Each process sends its block of size n/p to the other process
B. Each process merges the received block with its own block and retains only the
appropriate half of the merged block
C. Both A & B

86. In a typical sorting network


A. Every sorting network is made up of a series of columns
B. Each column contains a number of comparators connected in parallel
C. Both A & B

87. Bubble sort is difficult to parallelize since the algorithm has no concurrency
A. True
B. False

88. Which of the following statements are true with regard to compute capability in CUDA
A. Code compiled for hardware of one compute capability will not need to be re-compiled to
run on hardware of another
B. Different compute capabilities may imply a different amount of local memory per
thread
C. Compute capability is measured by the number of FLOPS a GPU accelerator can
compute.

87. True or False: The threads in a thread block are distributed across SM units so that each
thread is executed by one SM unit.
A. True
B. False

88. The style of parallelism supported on GPUs is best described as


A. SISD - Single Instruction Single Data
B. MISD - Multiple Instruction Single Data
C. SIMT - Single Instruction Multiple Thread

87. True or false: Functions annotated with the __global__ qualifier may be executed on the
host or the device
A. True
B. False

88. Which of the following correctly describes a GPU kernel


A. A kernel may contain a mix of host and GPU code
B. All thread blocks involved in the same computation use the same kernel
C. A kernel is part of the GPU's internal micro-operating system, allowing it to act as an independent host

89. Which of the following is not a form of parallelism supported by CUDA


A. Vector parallelism - Floating point computations are executed in parallel on wide
vector units
B. Thread level task parallelism - Different threads execute different tasks
C. Block and grid level parallelism - Different blocks or grids execute different tasks
D. Data parallelism - Different threads and blocks process different parts of data in memory

90.What strategy does the GPU employ if the threads within a warp diverge in their
execution?
A. Threads are moved to different warps so that divergence does not occur within a single
warp
B. Threads are allowed to diverge
C. All possible execution paths are run by all threads in a warp serially so that thread
instructions do not diverge

91. Which of the following does not result in uncoalesced (i.e. serialized) memory access on the K20 GPUs installed on Stampede
A. Aligned, but non-sequential access
B. Misaligned data access
C. Sparse memory access

92. Which of the following correctly describes the relationship between Warps, thread
blocks, and CUDA cores?

A. A warp is divided into a number of thread blocks, and each thread block executes on a
single CUDA core
B. A thread block may be divided into a number of warps, and each warp may execute
on a single CUDA core
C. A thread block is assigned to a warp, and each thread in the warp is executed on a
separate CUDA core

93. Shared memory in CUDA is accessible to:


A. All threads in a single block
B. Both the host and GPU
C. All threads associated with a single kernel

94. CUDA Architecture CPU consist of


A. CUDA Libraries
B. CUDA Runtime
C. CUDA Driver
D. All Above

95. The CUDA platform supports programs written in


A. C
B. C++
C. Fortran
D. All Above

96. Threads support Shared memory and Synchronization


A. True
B. False

97. Applications of CUDA are


A. Fast Video Transcoding
B. Medical Imaging
C. Computational Science
D. Oil and Natural Resources exploration
E. All Above

98. GPUs execute device code


A. True
B. False

99. Hazards are eliminated through register renaming by renaming all

A. Source register
B. Memory
C. Data
D. Destination register

100. Types of HPC application

A. Mass Media
B. Business
C. Management
D. Science

101. A distributed operating system must provide a mechanism for

A. intraprocessor communication
B. intraprocess and intraprocessor communication
C. interprocess and interprocessor communication
D. interprocessor communication

102. This is computation not performed by the serial version

A. Serial computation
B. Excess computation
C. perpendicular computation
D. parallel computing

103. The important feature of the VLIW is ______

A. ILP
B. Performance
C. Cost effectiveness
D. delay

104. A tightly coupled set of threads executing a single task is called

A. Multithreading
B. Parallel processing
C. Recurrence
D. Serial processing

105. Parallel Algorithm Models

A. Data parallel model

B. Bit model
C. Data model
D. Network model

106. MPI_Recv is used to

A. reverse message
B. receive message
C. forward message
D. Collect message
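
A minimal blocking send/receive pair (not from the source, with illustrative values):

#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv)
{
    int rank, msg;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        msg = 42;
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);      /* blocks until the message arrives */
        printf("received %d\n", msg);
    }
    MPI_Finalize();
    return 0;
}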

107. Status bit is also called

A. Binary bit
B. Flag bit
C. Signed bit
D. Unsigned bit

108. The misses that arise from interprocessor communication are called

A. hit rate
B. coherence misses
C. commit misses
D. parallel processing

109. The interconnection topologies are implemented using _________ as a node.

A. control unit
B. microprocessor
C. processing unit
D. microprocessor or processing unit

110. _________ gives the theoretical speedup in latency of the execution of a task at fixed
execution time

A. Amdahl's

B. Moore's
C. Metcalfe's
D. Gustafson's law

111. The number and size of tasks into which a problem is decomposed determines the

A. fine-granularity
B. coarse-granularity
C. sub Task
D. granularity

112. MPI_Finalize used for

A. Stop mpi environment program


B. initialise program
C. Include header files
D. program start

113. Private data are used by a single processor, while shared data are used by a

A. Single processor
B. Multi processor
C. Single tasking
D. Multi tasking

114. The time lost due to the branch instruction is often referred to as ____________

A. Delay
B. Branch penalty
C. Latency
D. control hazard

115. NUMA architecture uses _______in design

A. cache
B. shared memory

C. message passing
D. distributed memory

116. Divide and conquer approach is known for

A. Sequential algorithm development
B. Parallel algorithm development
C. Task defined algorithm
D. Non defined Algorithm

117. The parallelism across branches require which scheduling

A. Global scheduling
B. Local Scheduling
C. post scheduling
D. pre scheduling

118. Parallel processing may occur

A. In the data stream


B. In instruction stream
C. In network
D. In transferring

119. Pipe-lining is a unique feature of _______.

A. CISC
B. RISC
C. ISA
D. IANA

120. In MPI programming, MPI_CHAR is the datatype for

A. Unsigned char
B. Signed char
C. Long char
D. Unsigned long char

---------------------------------------------------------------------------------------------------------------------
SET 2 (26 MCQs)
---------------------------------------------------------------------------------------------------------------------

1. What is Cuda Architecture?


a.CUDA Architecture included a unified shader pipeline, allowing each and every chip to be
marshaled by a program.
b.CUDA Architecture included a unified shader pipeline, allowing each and every unit on the
chip to be marshaled by a program intending to perform general-purpose computations
c.CUDA Architecture included a unified shader pipeline, allowing each and every logic unit on
the chip to be marshaled by a program intending to perform general-purpose computations
d.CUDA Architecture included a unified shader pipeline, allowing each and every arithmetic
logic unit (ALU) on the chip to be marshaled by a program intending to perform general-purpose
computations
Ans.D

2. For the following code, write the kernel launch:

#include <stdio.h>
__global__ void kernel( void ) { }

int main( void ) {
    // launch the kernel here
    printf( "Hello, World!\n" );
    return 0;
}
a.kernel<1, 1>(1,1);
b.kernel<<<1, 1>>>(1,1);
c.kernel<<<1, 1>>>();
d.kernel<<1, 1>>();
Ans. c

3. Find out which is the kernel from following code:


#include <iostream>
__global__ void add( int a, int b, int *c ) {
*c = a + b;
}
int main( void ) {
int c; int *dev_c;
HANDLE_ERROR( cudaMalloc( (void**)&dev_c, sizeof(int) ) );
add<<<1,1>>>( 2, 7, dev_c );
HANDLE_ERROR( cudaMemcpy( &c, dev_c, sizeof(int), cudaMemcpyDeviceToHost ) );
printf( "2 + 7 = %d\n", c );
cudaFree( dev_c );
return 0;
}
a.cudaMalloc( (void**)&dev_c, sizeof(int) )
b.add<<<1,1>>>(2, 7, dev_c)
c.add<<1,1>>( 2, 7, dev_c );
d.add<<<1,1>>>()

Ans.b

4. From following code which particular line is responsible for copying between device to host

#include <iostream>
__global__ void add( int a, int b, int *c ) {
*c = a + b;
}
int main( void ) {
int c; int *dev_c;
HANDLE_ERROR( cudaMalloc( (void**)&dev_c, sizeof(int) ) );
add<<<1,1>>>( 2, 7, dev_c );
HANDLE_ERROR( cudaMemcpy( &c, dev_c, sizeof(int), cudaMemcpyDeviceToHost ) );
printf( "2 + 7 = %d\n", c );
cudaFree( dev_c );
return 0;
}
a. c, dev_c, sizeof(int);
b. HANDLE_ERROR( &c, dev_c, sizeof(int), cudaMemcpyDeviceToHost );
c. HANDLE_ERROR( cudaMemcpy( &c, dev_c, sizeof(int), cudaMemcpyDeviceToHost ) );
d. cudaMemcpy( &c, dev_c, sizeof(int), cudaMemcpyDeviceToHost ) ;
Ans.c

5. What is output of the following code:

#include <iostream>
__global__ void add( int a, int b, int *c ) {
*c = a + b;
}
int main( void ) {
int c; int *dev_c;
HANDLE_ERROR( cudaMalloc( (void**)&dev_c, sizeof(int) ) );
add<<<1,1>>>( 2, 7, dev_c );
HANDLE_ERROR( cudaMemcpy( &c, dev_c, sizeof(int), cudaMemcpyDeviceToHost ) );
printf( "2 + 7 = %d\n", c );
cudaFree( dev_c );
return 0;
}
a.2
b.9
c.7
d.0
Ans. b

6. What is the function of the __global__ qualifier in a CUDA program?


a. alerts the compiler that a function should be compiled to run on a device instead of the host

b. alerts the interpreter that a function should be compiled to run on a device instead of the host
c. alerts the interpreter that a function should be interpreted to run on a device instead of the host
d. alerts the interpreter that a function should be compiled to run on a host instead of the device
ans.a

7. The on-chip memory which is local to every multithreaded Single Instruction Multiple Data
(SIMD) Processor is called
a. Local Memory
b. Global Memory
c. Flash memory
d. Stack
Ans. a

8. The machine object that the hardware creates, manages, schedules, and executes is a thread of
a. DIMS instructions
b. DMM instructions
c. SIMD instructions
d. SIM instructions
Ans. c

9. The primary and essential mechanism to support the sparse matrices is


a. Gather-scatter operations
b. Gather operations
c. Scatter operations
d. Gather-scatter technique
Ans. a

10. Which of the following architectures is/are not suitable for realizing SIMD ?
a. Vector Processor
b. Array Processor
c. Von Neumann
d. All of the above
Ans . c

11. Multithreading allows multiple threads to share the functional units of a


a.Multiple processor
b.Single processor
c.Dual core
d. Corei5
Ans . b

12. Which compiler is used to compile CUDA source code:


a.gcc
b.nvc++
c.nc++

OptimusPrime Page 23
d.nvcc
Ans.d

13. Which command line is used to compile a CUDA program:


a.nvcc hello.cu -o hello
b.nvg++ heloo.cpp -o hello
c.ncc hello.c -o hello
D.g++ hello.cu -o hello
Ans.a

14.The syntax of kernel execution configuration is as follows


a.<<< M , T >>> with a grid of M thread blocks. Each thread block has T parallel blocks
b.<<< M , T >>> with a grid of M blocks. Each thread block has T parallel threads
c.<<< M , T >>> with a grid of M thread blocks. Each thread block has T parallel threads
d.<<< M , T >>> with a grid of M thread blocks. Each thread block has T threads
Ans. c

15. What does threadIdx.x contain?


A.contains the index of the thread within the block
b.contains the index of the block within the thread
c.contains the index of the thread size within the block
d.contains the index of the block size within the thread
Ans. A

16. What does blockDim.x contain?


a.contains the size of block
b.contains the size of block thread
c.contains the size of thread block (number of threads in the thread block).
d.the size of thread block
Ans. c
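
Putting the two together, a minimal sketch (not from the source): blockIdx.x, blockDim.x and threadIdx.x combine into a unique global index.

__global__ void scale(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   /* global thread index */
    if (i < n) x[i] *= 2.0f;
}
/* launch with M = (n + T - 1) / T blocks of T threads: scale<<<M, T>>>(d_x, n); */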

17. Memory allocation of variables x and y in CUDA:


A.float *b, *a;
cudaMallocManaged(&, N*sizeof(float));
cudaMallocManaged(&, N*sizeof(float));
B.float *x, *y;
cudaMallocManaged(&a, N*sizeof(float));
cudaMallocManaged(&b, N*sizeof(float));
c.float *a, *b;
cudaMallocManaged(&x, N*sizeof(float));
cudaMallocManaged(&y, N*sizeof(float));
d.float *x, *y;
cudaMallocManaged(&x, N*sizeof(float));
cudaMallocManaged(&y, N*sizeof(float));
Ans. D

18. Which function is used to free memory in CUDA?
a.cudaFree()
b.Free()
c.Cudafree()
d.CudaFree()
Ans. a
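
An end-to-end managed-memory sketch (not from the source, with illustrative sizes): allocate, launch, synchronize, then release with cudaFree.

#include <cstdio>
__global__ void add_one(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}
int main()
{
    const int N = 1 << 10;
    float *x;
    cudaMallocManaged(&x, N * sizeof(float));   /* visible to host and GPU */
    for (int i = 0; i < N; i++) x[i] = (float)i;
    add_one<<<(N + 255) / 256, 256>>>(x, N);
    cudaDeviceSynchronize();                    /* wait for the kernel */
    printf("x[0] = %f\n", x[0]);                /* prints 1.000000 */
    cudaFree(x);                                /* release the managed allocation */
    return 0;
}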

19. Which of the following is not a form of parallelism supported by CUDA


a.Vector parallelism - Floating point computations are executed in parallel on wide vector units
b.Thread level task parallelism - Different threads execute different tasks
c.Block and grid level parallelism - Different blocks or grids execute different tasks
d.Data parallelism - Different threads and blocks process different parts of data in memory
Ans . a

20.The style of parallelism supported on GPUs is best described as


a.SISD - Single Instruction Single Data
b.MISD - Multiple Instruction Single Data
c.SIMT - Single Instruction Multiple Thread
d.MIMD - Multiple Instruction Multiple Data
Ans. c

21. Which of the following correctly describes a GPU kernel


a.A kernel may contain a mix of host and GPU code
b.All thread blocks involved in the same computation use the same kernel
c.A kernel is part of the GPU's internal micro-operating system, allowing it to act as an independent host
d.All thread blocks involved in the same computation use the different kernel
Ans .b

22.Shared memory in CUDA is accessible to:


a.All threads in a single block
b.Both the host and GPU
c.All threads associated with a single kernel
d.one thread in a single block
Ans.a

23.Which of the following correctly describes the relationship between Warps, thread blocks,
and CUDA cores?
a.A warp is divided into a number of thread blocks, and each thread block executes on a single
CUDA core
b.A thread block may be divided into a number of warps, and each warp may execute on a single
CUDA core
c.A thread block is assigned to a warp, and each thread in the warp is executed on a separate
CUDA core
d. A block index is same as thread index
Ans .b

24. A processor that is assigned a thread block and executes its code is usually called a
a. multithreaded MIMD processor
b. multithreaded SIMD processor
c. multithreaded
d. multicore
Ans. b

25. Threads blocked together and executed in groups of 32 threads are called a
a.block of thread
b.thread block
c.thread
d.block
Ans. b

26.Who developed CUDA :


a. ARM
b. INTEL
c. AMD
d. NVIDIA
Ans. d

---------------------------------------------------------------------------------------------------------------------
SET 3 (30 MCQs)
---------------------------------------------------------------------------------------------------------------------

Unit I

1. Conventional architectures coarsely comprise a_

A. A processor
B. Memory system
C. Data path
D. All of above
2. Data intensive applications utilize_

A High aggregate throughput


B High aggregate network bandwidth
C High processing and memory system performance.
D None of above
3. A pipeline is like_

A Overlaps various stages of instruction execution to achieve performance.


B House pipeline
C Both a and b
D A gas line

4. Scheduling of instructions is determined by_

A True Data Dependency


B Resource Dependency
C Branch Dependency
D All of above
5. VLIW processors rely on_

A Compile time analysis


B Initial time analysis
C Final time analysis
D Mid time analysis
6. Memory system performance is largely captured by_

A Latency
B Bandwidth
C Both a and b
D none of above

7. The fraction of data references satisfied by the cache is called_


A. Cache hit ratio
B. Cache fit ratio
C. Cache best ratio
D. None of above
8. A single control unit that dispatches the same Instruction to various processors is__

A SIMD
B SPMD
C MIMD
D None of above
9. The primary forms of data exchange between parallel tasks are_

A Accessing a shared data space


B Exchanging messages.
C Both A and B
D None of Above

10. Switches map a fixed number of inputs to outputs.


A True
B False

Unit 2

1. The First step in developing a parallel algorithm is_

A. To Decompose the problem into tasks that can be executed concurrently

B. Execute directly
C. Execute indirectly
D. None of Above

2. The number of tasks into which a problem is decomposed determines its_

A. Granularity
B. Priority
C. Modernity
D. None of above

3. The length of the longest path in a task dependency graph is called_


A. the critical path length
B. the critical data length
C. the critical bit length
D. None of above

4. The graph of tasks (nodes) and their interactions/data exchange (edges)_

A. Is referred to as a task interaction graph
B. Is referred to as a task Communication graph
C. Is referred to as a task interface graph
D. None of Above

5. Mappings are determined by_

A. task dependency
B. task interaction graphs
C. Both A and B
D. None of Above

6. Decomposition Techniques are_


A. recursive decomposition
B. data decomposition
C. exploratory decomposition
D. speculative decomposition
E. All of Above

7. The Owner Computes Rule generally states that the process assigned a particular data item is
responsible for_

A. All computation associated with it


B. Only one computation
C. Only two computation
D. Only occasionally computation

8. A simple application of exploratory decomposition is_

A. The solution to a 15 puzzle


B. The solution to 20 puzzle
C. The solution to any puzzle
D. None of Above

9. Speculative Decomposition consist of _

A. conservative approaches
B. optimistic approaches
C. Both A and B
D. Only B

10. task characteristics include:


A. Task generation.
B. Task sizes.
C. Size of data associated with tasks.

D. All of Above

Unit 3

1. Group communication operations are built using point-to-point messaging primitives


A. True
B. False

2. Communicating a message of size m over an uncongested network takes time ts + tw*m

A. True
B. False

3. The dual of one-to-all broadcast is_

A. All-to-one reduction
B. All-to-one receiver
C. All-to-one Sum
D. None of Above

4. A hypercube has_

A. 2^d nodes
B. 2d nodes
C. 2n nodes
D. n nodes

5. A binary tree in which processors are (logically) at the leaves and internal nodes are routing
nodes.

A. True
B. False

6. In All-to-All Broadcast each processor is the source as well as destination.

A. True
B. False

7. The Prefix Sum Operation can be implemented using the_

A. All-to-all broadcast kernel.


B. All-to-one broadcast kernel.
C. One-to-all broadcast Kernel
D. Scatter Kernel

8. In the scatter operation_

A. Single node send a unique message of size m to every other node
B. Single node send a same message of size m to every other node
C. Single node send a unique message of size m to next node
D. None of Above

9. The gather operation is exactly the inverse of the_

A. Scatter operation
B. Broadcast operation
C. Prefix Sum
D. Reduction operation

10. In All-to-All Personalized Communication Each node has a distinct message of size m for
every other node

A. True
B. False

---------------------------------------------------------------------------------------------------------------------
SET 4 ( MCQs)
---------------------------------------------------------------------------------------------------------------------

1.Message passing system allows processes to : a) communicate with one another without
resorting to shared data b) communicate with one another by resorting to shared data c) share
data d) name the recipient or sender of the message
Ans-a

2. An IPC facility provides at least two operations : a) write & delete message b) delete &
receive message c) send & delete message d) receive & send message
Ans- d

3.Messages sent by a process : a) have to be of a fixed size b) have to be a variable size c) can be
fixed or variable sized d) None of the mentioned
Ans- c

4.The link between two processes P and Q to send and receive messages is called : a)
communication link b) message-passing link c) synchronization link d) all of the mentioned
Ans- a

5.In the Zero capacity queue : a) the queue can store at least one message b) the sender blocks
until the receiver receives the message c) the sender keeps sending and the messages don't wait
in the queue d) none of the mentioned
Ans- b

6.Inter process communication :

a) allows processes to communicate and synchronize their actions when using the same address
space.
b) allows processes to communicate and synchronize their actions without using the same
address space.
c) allows the processes to only synchronize their actions without communication.
d) None of these
Ans- b

7.In the non blocking send :


a) the sending process keeps sending until the message is received b) the sending process sends
the message and resumes operation
c) the sending process keeps sending until it receives a message
d) None of these
Ans- b

8.In indirect communication between processes P and Q :


a) there is another process R to handle and pass on the messages between P and Q
b) there is another machine between the two processes to help communication
c) there is a mailbox to help communication between P and Q
d) None of these
Ans- c

9. In SIMD, elements of short vectors are processed in _______


a) parallel
b) one by one
c) first come first serve basis
d) on priority basis
Ans- a

10.Which one of the following is the correct sequence


a) SIMD < SIMT < SMT
b) SIMD > SIMT > SMT
c) SIMD < SIMT > SMT
d) SIMD > SIMT < SMT
Ans- a

11.Single instruction is applied to a multiple data item to produce the ___ output(s).
a)multiple
b)different
c)same
Ans- c

12. Which of the following is NOT a characteristic of parallel computing?


a) Breaks a task into pieces
b) Uses a single processor or computer
c ) Simultaneous execution
d) May use networking

Ans- b

13. Parallel computing uses _____ execution.


a) sequential
b) uniquec ) simultaneous
d) none of the answers is correct.
Ans- c

14) A collection of lines that connects several devices is called ..............


A. bus
B. peripheral connection wires
C. Both a and b
D. internal wires
Answer: A. bus

15) A complete microcomputer system consist of ...........


A. microprocessor
B. memory
C. peripheral equipment
D. all of the above
Answer: D. all of the above

16) PC Program Counter is also called ...................


A. instruction pointer
B. memory pointer
C. data counter
D. file pointer
Answer: A. instruction pointer

17) Data hazards occur when .....................


A. Greater performance loss
B. Pipeline changes the order of read/write access to operands
C. Some functional unit is not fully pipelined
D. Machine size is limited
Answer: B. Pipeline changes the order of read/write access to operands

18) Which of the following bus is used to transfer data from main memory to peripheral device?
A. DMA bus
B. Output bus
C. Data bus
D.All of the above
Answer: C. Data bus

19) Micro instructions are stored in


A. computer memory
B. primary storage

C. secondary storage
D. control memory
E. cache memory
Answer: D. control memory

20. Pipeline processing implement


A. fetch instruction
B. decode instruction
C. fetch operand
D. calculate operand
E. execute instruction
F. all of the above
Answer: F. all of the above

21) Instruction pipelining has minimum stages


A. 4
B. 2
C. 3
D. 6
Answer: B. 2

22). Systems that do not have parallel processing capabilities are


A. SISD
B. SIMD
C. MIMD
D. All of the above
Answer: A. SISD

23) Who is regarded as the founder of Computer Architecture?


A. Alan Turing B. Konrad Zuse
C. John von Neumann
D. John William Mauchly
E. None of the answers above is correct
Answer: C. John von Neumann

24). What is characteristic for the organization of a computer architecture?


A. Size

B. Dynamic behaviour
C. Static behaviour
D. Speed
E. None of the answers above is correct
Answer: B. Dynamic behaviour

25). What is usually regarded as the von Neumann Bottleneck?


A. Processor/memory interface

B. Control unit
C. Arithmetic logical unit D. Instruction set
E. None of the answers above is correct
Answer: A. Processor/memory interface

26) How does the number of transistors per chip increase according to Moore's law?
A. Quadratically
B. Linearly
C. Cubicly
D. Exponentially
E. None of the answers above is correct
Answer: D. Exponentially

27) Which value has the speedup of a parallel program that achieves an efficiency of 75% on 32
processors?
A. 18
B. 24
C. 16
D. 20
E. None of the answers above is correct
Answer: B. 24
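
Worked check (illustrative): speedup S = E x p = 0.75 x 32 = 24.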

28). Pipelining strategy is called implement


A. instruction execution
B. instruction prefetch
C. instruction decoding
D. instruction manipulation
Answer: B. instruction prefetch

29). The concept of pipelining is most effective in improving performance if the tasks being
performed in different stages :
A. require different amount of time
B. require about the same amount of time
C. require different amount of time with time difference between any two tasks being same
D. require different amount with time difference between any two tasks being different
Answer: B. require about the same amount of time

30) Which Algorithm is better choice for pipelining?


A. Small Algorithm
B. Hash Algorithm
C. Merge-Sort Algorithm
D. Quick-Sort Algorithm
Answer: C. Merge-Sort Algorithm

31) The expression 'delayed load' is used in context of


A. processor-printer communication

B. memory-monitor communication
C. pipelining
D. none of the above
Answer: C. pipelining

32 ) Parallel processing may occur


A. in the instruction stream
B. in the data stream
C. both[A] and [B]
D. none of the above
Answer: C. both[A] and [B]

33 ) The cost of a parallel processing is primarily determined by :


A. Time Complexity
B. Switching Complexity
C. Circuit Complexity
D. None of the above
Answer: C. Circuit Complexity

34 ) An instruction to provide small delay in program


A. LDA
B. NOP
C. BEA
D. None of the above
Answer: B. NOP

35 )Characteristic of RISC (Reduced Instruction Set Computer) instruction set is


A. three instructions per cycle
B. two instructions per cycle
C. one instruction per cycle
D. None of the above
Answer: C. one instruction per cycle

36) In daisy-chaining priority method, all the devices that can request an interrupt are connected
in
A. parallel
B. serial
C. random
D. none of the above
Answer: B. serial

37 ) Which one of the following is a characteristic of CISC (Complex Instruction Set Computer)
A. Fixed format instructions
B. Variable format instructions
C. Instructions are executed by hardware
D. None of the above

Answer: B. Variable format instructions

38). During the execution of the instructions, a copy of the instructions is placed in the ______ .
A. Register
B. RAM
C. System heap
D. Cache
Answer: D. Cache

39 ) Two processors A and B have clock frequencies of 700 Mhz and 900 Mhz respectively.
Suppose A can execute an instruction with an average of 3 steps and B can execute with an
average of 5 steps. For the execution of the same instruction which processor is faster ?
A. A
B. B
C. Both take the same time
D. Insufficient information
Answer: A. A
40 )A processor performing fetch or decoding of different instruction during the execution of
another instruction is called ______ .
A. Super-scaling
B. Pipe-lining
C. Parallel Computation
D. None of these
Answer: B. Pipe-lining

41 ) For a given FINITE number of instructions to be executed, which architecture of the


processor provides for a faster execution ?
A. ISA
B. ANSA
C. Super-scalar
D. All of the above
Answer: C. Super-scalar

42 )The clock rate of the processor can be improved by,


A. Improving the IC technology of the logic circuits
B. Reducing the amount of processing done in one step
C. By using overclocking method
D. All of the above
Answer: D. All of the above

43 )An optimizing Compiler does,


A. Better compilation of the given piece of code.
B. Takes advantage of the type of processor and reduces its process time.
C. Does better memory management.
D. Both a and c
Answer: B. Takes advantage of the type of processor and reduces its process time.

44 )The ultimate goal of a compiler is to,
A. Reduce the clock cycles for a programming task.
B. Reduce the size of the object code.
C. Be versatile.
D. Be able to detect even the smallest of errors.
Answer: A. Reduce the clock cycles for a programming task.

45 )To which class of systems does the von Neumann computer belong?
A. SIMD (Single Instruction Multiple Data)
B. MIMD (Multiple Instruction Multiple Data)
C. MISD (Multiple Instruction Single Data)
D. SISD (Single Instruction Single Data)
E. None of the answers above is correct.
Answer: D. SISD (Single Instruction Single Data)

46). Parallel programs: Which speedup could be achieved according to Amdahl´s law for infinite
number of processors if 5% of a program is sequential and the remaining part is ideally parallel?
A. Infinite speedup
B. 5
C. 20
D. 50
E. None of the answers above is correct.
Answer: C. 20
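
Worked check (assuming Amdahl's law S = 1/(f + (1 - f)/p)): with sequential fraction f = 0.05 and p approaching infinity, S approaches 1/f = 1/0.05 = 20.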

47). Itanium processor: Which hazard can be circumvented by register rotation?


A. Control hazards
B. Data hazards
C. Structural hazards
D. None
E. None of the answers above is correct.
Answer: B. Data hazards

48 Which MIMD systems are best scalable with respect to the number of processors?
A. Distributed memory computers
B. ccNUMA systems
C. nccNUMA systems
D. Symmetric multiprocessors
E. None of the answers above is correct
Answer: A. Distributed memory computers

49 ). Cache coherence: For which shared (virtual) memory systems is the snooping protocol
suited?
A. Crossbar connected systems
B. Systems with hypercube network
C. Systems with butterfly network

D. Bus based systems
E. None of the answers above is correct.
Answer: D. Bus based systems


Unit 2

1. The best mode of connection between devices which need to send or receive large amounts of
data over a short distance is _____
a) BUS
b) Serial port
c) Parallel port
d) Isochronous port
Answer: c
Explanation: The parallel port transfers around 8 to 16 bits of data simultaneously over the lines,
hence increasing transfer rates.

8. In the output interface of the parallel port, along with the valid signal ______ is also sent.
a) Data
b) Idle signal
c) Interrupt
d) Acknowledge signal
Answer: b
Explanation: The idle signal is used to check if the device is idle and ready to receive data.
3. Parallel computing uses _____ execution.
a) sequential
b) unique
c) simultaneous
d) none of the above
ans c

4. Heap can be used as ________________
a) Priority queue
b) Stack
c) A decreasing order array
d) None of the mentioned
Answer: a

Explanation: The property of heap that the value of root must be either greater or less than both
of its children makes it work like a priority queue.
6. Which of the following is true about parallel computing performance?
a. Computations use multiple processors.
b. There is an increase in speed.
c. The increase in speed is loosely tied to the number of processor or computers used.
d. All of the answers are correct.
ANS: a

7.Which of the following is NOT a characteristic of parallel computing?


a.Breaks a task into pieces
b.Uses a single processor or computer
c.Simultaneous execution
d.May use networking
ans: b

8) Computer system of a parallel computer is capable of


A. Decentralized computing B. Parallel computing C. Centralized computing D. Distributed computing E. All of these F. None of these
Answer- A

9) Writing parallel programs is referred to as


A. Parallel computation B. Parallel processes C. Parallel development D. Parallel programming
E. Parallel computation F. All of these G. None of these
Answer- D

10) In which application systems can distributed systems run well?


A. HPC B. HTC C. HRC
D. Both A and B E. All of these F. None of these
Answer- D

11) Which property do both HPC and HTC systems desire?


A. Adaptivity B. Transparency C. Dependency D. Secretive E. All of these F. None of these
Answer- B

12) The architecture in which no special machines manage the network resources is known as
A. Peer-to-Peer B. Space based C. Tightly coupled D. Loosely coupled E. All of these F. None
of these

Answer- A

13) The architecture in which virtualization creates one single address space is called
A. Loosely coupled B. Peer-to-Peer C. Space-based D. Tightly coupled E. All of these F. None of these
Answer- C

14) Computing with uniprocessor devices is called __________.


A. Grid computing B. Centralized computing C. Parallel computing D. Distributed computing E.
All of these F. None of these
Answer- B

15) Utility computing focuses on a______________ model.


A. Data B. Cloud C. Scalable D. Business E. All of these F. None of these
Answer- D

16) The abbreviation HPC stands for


A. High-peak computing B. High-peripheral computing
C. High-performance computing
D. Highly-parallel computing
E. All of these
F. None of these
Answer- C

17) Peer-to-Peer leads to the development of technologies like


A. Norming grids
B. Data grids
C. Computational grids
D. Both A and B
E. All of these
F. None of these
Answer- D

18) Types of HPC applications include


A. Management B. Media mass C. Business D. Science E. All of these F.None of these
Answer- D

20) Utilization rate of resources in an execution model is known to be its


A. Adaptation B. Efficiency C. Dependability D. Flexibility E. All of these F. None of these
Answer- B

21) Interprocessor communication takes place via


A. Centralized memory B. Shared memory C. Message passing D. Both B and C E. All of these
F. None of these
Answer- D

22) Data centers and centralized computing cover many
A. Microcomputers B. Minicomputers C. Mainframe computers D. Supercomputers E. All of
these F. None of these
Answer- D

23) Execution of several activities at the same time. a) processing b) parallel processing c) serial
processing d) multitasking
Answer: b

24) Parallel processing has single execution flow. a) True b) False


Answer: b

25) A term for simultaneous access to a resource, physical or logical. a) Multiprogramming


b) Multitasking c) Threads d) Concurrency
Answer: D

26) ______________ leads to concurrency. a) Serialization b) Parallelism c) Serial processing d)


Distribution
Answer: b

27) A parallelism based on increasing processor word size. a) Increasing b) Count based c) Bit
based d) Bit level
Answer: d

28) A type of parallelism that uses micro architectural techniques. a) instructional b) bit level c)
bit based d) increasing
Answer: A
29) MIPS stands for? a) Mandatory Instructions/sec b) Millions of Instructions/sec c) Most of
Instructions/sec d) Many Instructions / sec
Answer: B

30) Several instructions execution simultaneously in ________________ a) processing b)


parallel processing c) serial processing d) multitasking
Answer: B

31) Computer has a built-in system clock that emits millions of regularly spaced electric pulses
per _____ called clock cycles. a) second b) millisecond c) microsecond d) minute
Answer: a

32) It takes one clock cycle to perform a basic operation. a) True b) False
Answer: a

33) The operation that does not involves clock cycles is _________ a) Installation of a device b)
Execute c) Fetch d) Decode
Answer: a

34). The number of clock cycles per second is referred as ________ a) Clock speed b) Clock
frequency c) Clock rate d) Clock timing
Answer: a

35). CISC stands for ____________ a) Complex Information Sensed CPU b) Complex
Instruction Set Computer c) Complex Intelligence Sensed CPU d) Complex Instruction Set CPU
Answer: b

36). Which of the following processor has a fixed length of instructions? a) CISC b) RISC c)
EPIC d) Multi-core
Answer: b

37). Processor which is complex and expensive to produce is ________ a) RISC b) EPIC c)
CISC d) Multi-core
Answer: c

38). The architecture that uses a tighter coupling between the compiler and the processor is
____________ a) EPIC b) Multi-core c) RISC d) CISC
Answer: a

39). MAR stands for ___________ a) Memory address register b) Main address register c) Main
accessible register
d) Memory accessible register
Answer: a.

40). A circuitry that processes that responds to and processes the basic instructions that are
required to drive a computer system is ________ a) Memory b) ALU c) CU d) Processor
Answer: d

41) The graph of tasks (nodes) and their interactions/data exchange (edges)_
A. Is referred to as a task interaction graph
B. Is referred to as a task Communication graph
C. Is referred to as a task interface graph
D. None of Above
Answer: A

42) task characteristics include:


A)Task generation.
B)Task sizes.
C)Size of data associated with tasks.
D) All of Above
Answer: D

43). Decomposition Techniques are_


A. recursive decomposition
B. data decomposition

C. exploratory decomposition
D. speculative decomposition
E. All of Above
Answer: E

44) The Owner Computes Rule generally states that the process assigned a particular data item is
responsible for_
A. All computation associated with it
B. Only one computation
C. Only two computation
D. Only occasionally computation
Answer: A

Unit 3

1.Which topology requires a central controller or hub?


a.Mesh
b.Star
c.Bus
d.Ring
Ans:- b

2.Which topology requires a multipoint connection?


a.Mesh
b.Star
c.Bus
d.Ring
Ans:- c

3 Multipoint topology is
a.Bus
b.Star
c.Mesh
d.Ring
Ans:- a

4.In mesh topology, every device has a dedicated


a.Multipoint linking
b.Point to point linking
c.None of Above
d.Both a and b
Ans:- b

5 .Bus, ring and star topologies mostly used in the


a.LAN
b.MAN

c.WAN
d.Internetwork
Ans:- a

6 Combination of two or more topologies are called


a.Star
b.Bus
c.Ring
d.Hybrid
Ans:- d

7 The topology with highest reliability is ?


a.Bus topology
b.Star topology
c.Ring Topology
d.Mesh Topology
Ans:- d

8 .Star Topology is Based On a Central Device that can be __________ ?


a.HUB
b.Switch
c.Only a
d.Both a and b
Ans:- d

9.Which of the following is not type of the network topology?


a.Mesh
b.Bus
c.Ring
d.stub
Ans:- d

10.In a star-topology Ethernet LAN, _______ is just a point where the signals coming from
different stations collide; it is the collision point.
a.An active hub
b.A passive hub
c.either (a) or (b)
d.neither (a) nor (b)
Ans:- b

11) Group communication operations are built using point-to-point messaging primitives
A. True
B. False
Ans:- A

12) A hypercube has_

A. 2^d nodes
B. 2d nodes
C. 2^n nodes
D. n nodes
Ans:- A

13)The function of multiplexing is


a. To reduce the bandwidth of the signal to be transmitted
b. To combine multiple data streams over a single data channel
c. To allow multiple data streams over multiple channels in a prescribed format
d. To match the frequencies of the signal at the transmitter as well as the receiver
Ans: b
14) A 3 GHz carrier is DSB SC modulated by a signal with maximum frequency of 2 MHz. The
minimum sampling frequency required for the signal so that the signal is ideally sampled is
a. 4 MHz
b. 6 MHz
c. 6.004 GHz
d. 6 GHz
Ans: c

15) Emitter modulator amplifier for Amplitude Modulation


a. Operates in class A mode
b. Has a low efficiency
c. Output power is small
d. All of the above
Ans: d

16) Super heterodyne receivers


a. Have better sensitivity
b. Have high selectivity
c. Need extra circuitry for frequency conversion
d. All of the above
Ans: d

17) The AM spectrum consists of


a. Carrier frequency
b. Upper side band frequency
c. Lower side band frequency
d. All of the above
Ans: d

18) Standard intermediate frequency used for AM receiver is


a. 455 MHz
b. 455 KHz
c. 455 Hz
d. None of the above

Ans: b

19) In the TV receivers, the device used for tuning the receiver to the incoming signal is
a. Varactor diode
b. High pass Filter
c. Zener diode
d. Low pass filter
Ans: a

20) The modulation technique that uses the minimum channel bandwidth and transmitted power
is
a. FM
b. DSB-SC
c. VSB
d. SSB
Ans: d

21) Calculate the bandwidth occupied by a DSB signal when the modulating frequency lies in
the range from 100 Hz to 10KHz.
a. 28 KHz
b. 24.5 KHz
c. 38.6 KHz
d. 19.8 KHz
Ans: d

23)What is a high performance multi-core processor that can be used to accelerate a wide
variety of applications using parallel computing.
1. CLU
2. GPU
3. CPU
4. DSP
ANS-2

24) What is GPU?


1. Grouped Processing Unit
2. Graphics Processing Unit
3. Graphical Performance Utility
4. Graphical Portable Unit
ANS-2

25. A grid, which runs on the GPU, consists of a set of


1. 32 Thread
2. 32 Block
3. Unit Block
4. Thread Block
ANS-4
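A minimal CUDA sketch of this grid/block/thread hierarchy (kernel name, sizes, and data are illustrative assumptions, not from the question bank): a kernel launch creates a grid of thread blocks, and each block is a set of threads.

__global__ void scale(float *x, float a, int n) {
    /* global index: block offset plus thread offset within the block */
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;    /* guard against out-of-range threads */
}
/* launch a grid of 4 thread blocks with 256 threads each: */
/* scale<<<4, 256>>>(d_x, 2.0f, 1024); */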

26. Interprocessor communication takes place via
1. Centralized memory
2. Shared memory
3. Message passing
4. Both 2 and 3
ANS-4

27. Decomposition into a large number of tasks results in coarse-grained decomposition
1. True
2. False
ANS-2

28. The fetch and execution cycles are interleaved with the help of __
1. Modification in processor architecture
2. Clock
3. Special unit
4. Control unit
ANS-2

29. The processor of the system which can read/write GPU memory is known as
1. kernal
2. device
3. Server
4. Host
ANS-4

30. Increasing the granularity of decomposition and utilizing the resulting concurrency to
perform more tasks in parallel decreases performance.
1. TRUE
2. FALSE
ANS-2

---------------------------------------------------------------------------------------------------------------------
SET 5 (MCQs)
---------------------------------------------------------------------------------------------------------------------

What are the sources of overhead?


A. Essential /Excess Computation
B. Inter-process Communication
C. Idling
D. All above
Answer : D

Which are the performance metrics for parallel systems?

A. Execution Time
B. Total Parallel Overhead
C. Speedup
D. Efficiency
E. Cost
F. All above
Answer : F

The efficiency of a parallel program can be written as: E = Ts / (p Tp). True or False?
A. True
B. False
Answer : A

The overhead function, or total overhead, of a parallel system is the total time collectively
spent by all the processing elements over and above that required by the fastest known
sequential algorithm for solving the same problem on a single processing element.
True or False?
A. True
B. False
Answer : A

What is Speedup?
A. A measure that captures the relative benefit of solving a problem in parallel. It is defined
as the ratio of the time taken to solve a problem on a single processing element to the time
required to solve the same problem on a parallel computer with p identical processing elements.
B. A measure of the fraction of time for which a processing element is usefully employed.
C. None of the above
Answer : A

In an ideal parallel system, speedup is equal to p and efficiency is equal to one. True or
False?
A. True
B. False
Answer : A

A parallel system is said to be ________________ if the cost of solving a problem on a
parallel computer has the same asymptotic growth (as a function of the input size) as the
fastest-known sequential algorithm on a single processing element.


A. Cost optimal
B. Non Cost optimal
Answer : A

Using fewer than the maximum possible number of processing elements to execute a
parallel algorithm is called ______________ a parallel system in terms of the number of
processing elements.
A. Scaling down
B. Scaling up
Answer : A

The __________________ function determines the ease with which a parallel system can
maintain a constant efficiency and hence achieve speedups increasing in proportion to the
number of processing elements.
A. Isoefficiency
B. Efficiency
C. Scalability
D. Total overhead
Answer : A

Minimum execution time for adding n numbers is Tp = n/p + 2 log p. True or False?
A. True
B. False
Answer : A

The overhead function To = pTP − TS.


A. True
B. False
Answer : A

Performance Metrics for Parallel Systems: Speedup (S) = Ts / Tp


A. True
B. False
Answer : A
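As a worked illustration of the metrics above (timing values are arbitrary assumptions), this host-side C sketch computes speedup, efficiency, and total overhead from Ts, Tp, and p:

#include <stdio.h>
int main(void) {
    double Ts = 100.0;  /* example serial run time */
    double Tp = 15.0;   /* example parallel run time on p processors */
    int p = 8;
    double S  = Ts / Tp;         /* speedup S = Ts/Tp */
    double E  = Ts / (p * Tp);   /* efficiency E = Ts/(p*Tp) */
    double To = p * Tp - Ts;     /* total overhead To = p*Tp - Ts */
    printf("S = %.2f, E = %.2f, To = %.1f\n", S, E, To);
    return 0;
}

For these example numbers, S ≈ 6.67, E ≈ 0.83, and To = 20; an ideal system would give S = p = 8 and E = 1.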

Matrix-vector multiplication with 2D partitioning requires some basic communication operations


A. one-to-one communication to align the vector along the main diagonal
B. one-to-all broadcast of each vector element among the n processes of each column
C. all-to-one reduction in each row
D. All Above
Answer : D

What are the issues in sorting?


A. Where the Input and Output Sequences are Stored
B. How Comparisons are Performed
C. All above
Answer : C

The parallel run time of the formulation for Bubble sort is


A. Tp = O((n/p) log(n/p)) + O(n) + O(n)
B. Tp = O((n/p) log(n/p)) + O((n/p) log p) + O(n/p)
C. None of the above

Answer : A

What are the variants of Bubble sort?


A. Shell sort
B. Quick sort
C. Odd-Even transposition
D. Option A & C
Answer : D
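A compact C sketch of odd-even transposition (the variant named above; the in-memory array is an illustrative assumption). The key point is that all compare-exchanges within one phase touch disjoint pairs, so they can run in parallel:

void odd_even_sort(int *a, int n) {
    for (int phase = 0; phase < n; phase++) {
        /* even phase: pairs (0,1),(2,3),...; odd phase: (1,2),(3,4),... */
        int start = (phase % 2 == 0) ? 0 : 1;
        for (int i = start; i + 1 < n; i += 2)   /* independent pairs */
            if (a[i] > a[i + 1]) {
                int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
            }
    }
}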

What is the overall complexity of parallel algorithm for quick sort?


A. Tp = O((n/p) log(n/p)) + O((n/p) log p) + O(log^2 p)
B. Tp = O((n/p) log(n/p)) + O((n/p) log p)
C. Tp = O((n/p) log(n/p)) + O(log^2 p)
Answer : A

Formally, given a weighted graph G(V, E, w), the all-pairs shortest paths problem is to
find the shortest paths between all pairs of vertices. True or False?
A. True
B. False
Answer : A

What is true for parallel formulation of Dijkstra’s Algorithm?


A. One approach partitions the vertices among different processes and has each process
compute the single-source shortest paths for all vertices assigned to it. We refer to
this approach as the source-partitioned formulation.
B. Another approach assigns each vertex to a set of processes and uses the parallel
formulation of the single-source algorithm to solve the problem on each set of
processes. We refer to this approach as the source-parallel formulation.
C. Both are true
D. Non of these is true
Answer : C

Search algorithms can be used to solve discrete optimization problems. True or False ?
A. True
B. False
Answer : A

Examples of Discrete optimization problems are ;


A. planning and scheduling,
B. The optimal layout of VLSI chips,
C. Robot motion planning,
D. Test-pattern generation for digital circuits, and logistics and control.
E. All of above
Answer : E

List the important parameters of Parallel DFS

A. Work- Splitting Strategies
B. Load balancing Schemes
C. All of above
Answer : C

List the communication strategies for parallel BFS.


A. Random communication strategy
B. Ring communication strategy
C. Blackboard communication strategy
D. All of above
Answer : D

The lower bound on any comparison-based sort of n numbers is Θ(n log n)


A. True
B. False
Answer : A

In a compare-split operation
A. Each process sends its block of size n/p to the other process
B. Each process merges the received block with its own block and retains only the
appropriate half of the merged block
C. Both A & B
Answer : C
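A hedged MPI sketch of one compare-split step as described above (function name, block handling, and tags are illustrative assumptions; both blocks are assumed already sorted):

#include <mpi.h>
#include <stdlib.h>
/* mine: my sorted block of k keys; partner: rank to exchange with;
   keep_low: nonzero if this rank keeps the smaller half. */
void compare_split(int *mine, int k, int partner, int keep_low, MPI_Comm comm) {
    int *theirs = malloc(k * sizeof(int));
    int *merged = malloc(2 * k * sizeof(int));
    /* step 1: each process sends its block of size n/p to the other */
    MPI_Sendrecv(mine, k, MPI_INT, partner, 0,
                 theirs, k, MPI_INT, partner, 0, comm, MPI_STATUS_IGNORE);
    /* step 2: merge the received block with my own block */
    for (int i = 0, j = 0, m = 0; m < 2 * k; m++)
        merged[m] = (j >= k || (i < k && mine[i] <= theirs[j]))
                        ? mine[i++] : theirs[j++];
    /* step 3: retain only the appropriate half of the merged block */
    for (int m = 0; m < k; m++)
        mine[m] = keep_low ? merged[m] : merged[k + m];
    free(theirs); free(merged);
}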

In a typical sorting network


A. Every sorting network is made up of a series of columns
B. Each column contains a number of comparators connected in parallel
C. Both A & B
Answer : C

Bubble sort is difficult to parallelize since the algorithm has no concurrency


A. True
B. False
Answer : A

Unit I

1. Conventional architectures coarsely comprise of a_


A. A processor
B. Memory system
C Data path.
D All of Above

2. Data intensive applications utilize_


A High aggregate throughput
B High aggregate network bandwidth

C High processing and memory system performance.
D None of above

3. A pipeline is like_
A Overlaps various stages of instruction execution to achieve performance.
B House pipeline
C Both a and b
D A gas line

4. Scheduling of instructions is determined_


A True Data Dependency
B Resource Dependency
C Branch Dependency
D All of above

5. VLIW processors rely on_


A Compile time analysis
B Initial time analysis
C Final time analysis
D Mid time analysis

6. Memory system performance is largely captured by_


A Latency
B Bandwidth
C Both a and b
D none of above

7. The fraction of data references satisfied by the cache is called_


A Cache hit ratio
B Cache fit ratio
C Cache best ratio
D none of above

8. A single control unit that dispatches the same Instruction to various processors is__
A SIMD
B SPMD
C MIMD
D None of above

9. The primary forms of data exchange between parallel tasks are_


A Accessing a shared data space
B Exchanging messages.
C Both A and B
D None of Above

10. Switches map a fixed number of inputs to outputs.

A True
B False
Unit 2

1. The First step in developing a parallel algorithm is_


A. To Decompose the problem into tasks that can be executed concurrently
B. Execute directly
C. Execute indirectly
D. None of Above

2. The number of tasks into which a problem is decomposed determines its_


A. Granularity
B. Priority
C. Modernity
D. None of above

3. The length of the longest path in a task dependency graph is called_


A. the critical path length
B. the critical data length
C. the critical bit length
D. None of above

4. The graph of tasks (nodes) and their interactions/data exchange (edges)_


A. Is referred to as a task interaction graph
B. Is referred to as a task Communication graph
C. Is referred to as a task interface graph
D. None of Above

5. Mappings are determined by_


A. task dependency
B. task interaction graphs
C. Both A and B
D. None of Above

6. Decomposition Techniques are_


A. recursive decomposition
B. data decomposition
C. exploratory decomposition
D. speculative decomposition
E. All of Above

7. The Owner Computes Rule generally states that the process assigned a particular data
item is responsible for_
A. All computation associated with it
B. Only one computation
C. Only two computation

D. Only occasionally computation

8. A simple application of exploratory decomposition is_


A. The solution to a 15 puzzle
B. The solution to 20 puzzle
C. The solution to any puzzle
D. None of Above

9. Speculative Decomposition consist of _


A. conservative approaches
B. optimistic approaches
C. Both A and B
D. Only B

10. task characteristics include:


A. Task generation.
B. Task sizes.
C. Size of data associated with tasks.
D. All of Above

Unit 3

1. Group communication operations are built using point-to-point messaging primitives


A. True
B. False

2. Communicating a message of size m over an uncongested network takes time ts + tmw


A. True
B. False

3. The dual of one-to-all broadcast is_


A. All-to-one reduction
B. All-to-one receiver
C. All-to-one Sum
D. None of Above

4. A hypercube has_
A. 2^d nodes
B. 2d nodes
C. 2^n nodes
D. n nodes

5. A binary tree in which processors are (logically) at the leaves and internal nodes are
routing nodes.
A. True
B. False

6. In All-to-All Broadcast each processor is the source as well as destination.
A. True
B. False

7. The Prefix Sum Operation can be implemented using the_


A. All-to-all broadcast kernel.
B. All-to-one broadcast kernel.
C. One-to-all broadcast Kernel
D. Scatter Kernel

8. In the scatter operation_


A. Single node send a unique message of size m to every other node
B. Single node send a same message of size m to every other node
C. Single node send a unique message of size m to next node
D. None of Above

9. The gather operation is exactly the inverse of the_


A. Scatter operation
B. Broadcast operation
C. Prefix Sum
D. Reduction operation

10. In All-to-All Personalized Communication Each node has a distinct message of size m
for every other node
A. True
B. False

1. It is ___________ strength and ___________ permeability.


a) High, high
b) Low, low
c) High, low
d) Low, high
Answer: c
Explanation: It is specifically chosen so as to have particularly appropriate properties for the
expected use of the structure such as high strength and low permeability.

2. High Performance concrete works out to be economical.


a) True
b) False
Answer: a
Explanation: High Performance concrete works out to be economical, even though its initial
cost is high.

3. HPC is not used in high span bridges.
a) True
b) False
Answer: b
Explanation: Major applications of high-performance concrete in the field of Civil
Engineering constructions have been in the areas of long-span bridges, high-rise buildings
or structures, highway pavements, etc.

4. Concrete having 28- days’ compressive strength in the range of 60 to 100 MPa.
a) HPC
b) VHPC
c) OPC
d) HSC
Answer: a
Explanation: High Performance Concrete having 28- days’ compressive strength in the
range of 60 to 100 MPa

5. Concrete having 28-days compressive strength in the range of 100 to 150 MPa.
a) HPC
b) VHPC
c) OPC
d) HSC
Answer: b
Explanation: Very high performing Concrete having 28-days compressive strength in the
range of 100 to 150 MPa.

6. High-Performance Concrete is ____________ as compared to Normal Strength


Concrete.
a) Less brittle
b) Brittle
c) More brittle
d) Highly ductile
Answer: c
Explanation: High-Performance Concrete is more brittle as compared to Normal Strength
Concrete (NSC), especially when high strength is the main criteria

7. The choice of cement for high-strength concrete should not be based only on mortar-cube
tests but it should also include tests of compressive strengths of concrete at
___________ days.
a) 28, 56, 91
b) 28, 60, 90
c) 30, 60, 90

d) 30, 45, 60
Answer: a
Explanation: The choice of cement for high-strength concrete should not be based only on
mortar-cube tests but it should also include tests of compressive strengths of concrete at
28, 56, and 91 days.

8. For high-strength concrete, a cement should produce a minimum 7-days mortar-cube


strength of approximately ___ MPa.
a) 10
b) 20
c) 30
d) 40
Answer: c
Explanation: For high-strength concrete, a cement should produce a minimum 7-days
mortar-cube strength of approximately 30 MPa.

9. ____________ mm nominal maximum size aggregates gives optimum strength.


a) 9.5 and 10.5
b) 10.5 and 12.5
c) 9.5 and 12.5
d) 11.5 and 12.5
Answer: c

Explanation: Many studies have found that 9.5 mm to 12.5 mm nominal maximum size
aggregates gives optimum strength.
10. Due to low w/c ratio _____________
a) It doesn’t cause any problems
b) It causes problems
c) Workability is easy
d) Strength is more
Answer: b
Explanation: Due to the low w/c ratio, it causes problems so superplasticizers are used.

Which of the following statements are true with regard to compute capability in CUDA
A. Code compiled for hardware of one compute capability will not need to be recompiled
to run on hardware of another
B. Different compute capabilities may imply a different amount of local memory per
thread
C. Compute capability is measured by the number of FLOPS a GPU accelerator can
compute.
Answer : B

True or False: The threads in a thread block are distributed across SM units so that each
thread is executed by one SM unit.
A. True
B. False
Answer : B

The style of parallelism supported on GPUs is best described as


A. SISD - Single Instruction Single Data
B. MISD - Multiple Instruction Single Data
C. SIMT - Single Instruction Multiple Thread
Answer : C

True or false: Functions annotated with the __global__ qualifier may be executed on the
host or the device
A. True
B. False
Answer : B

Which of the following correctly describes a GPU kernel


A. A kernel may contain a mix of host and GPU code
B. All thread blocks involved in the same computation use the same kernel
C. A kernel is part of the GPU's internal micro-operating system, allowing it to act as in
independent host
Answer : B

Which of the following is not a form of parallelism supported by CUDA


A. Vector parallelism - Floating point computations are executed in parallel on wide
vector units
B. Thread level task parallelism - Different threads execute a different tasks
C. Block and grid level parallelism - Different blocks or grids execute different tasks
D. Data parallelism - Different threads and blocks process different parts of data in
memory
Answer :A

What strategy does the GPU employ if the threads within a warp diverge in their execution?
A. Threads are moved to different warps so that divergence does not occur within a
single warp
B. Threads are allowed to diverge
C. All possible execution paths are run by all threads in a warp serially so that thread
instructions do not diverge
Answer : C
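A tiny CUDA sketch of the divergence case described above (kernel name and data are illustrative assumptions): the branch depends on the thread index, so threads of the same warp take different paths and the warp serializes both paths.

__global__ void diverge(int *out) {
    int i = threadIdx.x;
    if (i % 2 == 0) out[i] = 2 * i;   /* even lanes take this path */
    else            out[i] = i + 1;   /* odd lanes take the other path */
}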

Which of the following does not result in uncoalesced (i.e. serialized) memory access on the
K20 GPUs installed on Stampede
A. Aligned, but non-sequential access
B. Misaligned data access

C. Sparse memory access
Answer : A

Which of the following correctly describes the relationship between Warps, thread blocks,
and CUDA cores?
A. A warp is divided into a number of thread blocks, and each thread block executes on
a single CUDA core
B. A thread block may be divided into a number of warps, and each warp may execute
on a single CUDA core
C. A thread block is assigned to a warp, and each thread in the warp is executed on a
separate CUDA core
Answer : B

Shared memory in CUDA is accessible to:


A. All threads in a single block
B. Both the host and GPU
C. All threads associated with a single kernel
Answer : A
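A hedged CUDA sketch of block-level shared memory (kernel name and the fixed block size of 256 are assumptions): the __shared__ buffer is visible only to the threads of one block, which synchronize with __syncthreads().

__global__ void block_sum(const float *in, float *out) {
    __shared__ float buf[256];             /* assumes blockDim.x == 256 */
    int t = threadIdx.x;
    buf[t] = in[blockIdx.x * blockDim.x + t];
    __syncthreads();                       /* all block threads reach here */
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {  /* tree reduction */
        if (t < s) buf[t] += buf[t + s];
        __syncthreads();
    }
    if (t == 0) out[blockIdx.x] = buf[0];  /* one partial sum per block */
}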

The CUDA architecture consists of


A. CUDA Libraries
B. CUDA Runtime
C. CUDA Driver
D. All Above
Answer : D

CUDA platform works on


A. C
B. C++
C. Fortran
D. All Above
Answer : D

Threads support Shared memory and Synchronization


A. True
B. False
Answer : A

Application of CUDA are


A. Fast Video Transcoding
B. Medical Imaging
C. Computational Science
D. Oil and Natural Resources exploration
E. All Above
Answer : E

GPU execute device code
A. True
B. False
Answer : A

---------------------------------------------------------------------------------------------------------------------
SET 6 (MCQs)
---------------------------------------------------------------------------------------------------------------------

1) Execution of several activities at the same time.


a) processing b) parallel processing c) serial processing d) multitasking
Ans: b Explanation:

2)
Parallel processing has single execution flow.
a) True b) False
Ans: b Explanation: The statement is false. Sequential programming specifically has single
execution flow.

3)
A term for simultaneous access to a resource, physical or logical.
a) Multiprogramming b) Multitasking c) Threads d) Concurrency
Ans: d Explanation: Concurrency is the term used for the same. When several things are
accessed simultaneously, the job is said to be concurrent.

4)
______________ leads to concurrency.
a) Serialization b) Parallelism c) Serial processing d) Distribution Ans: b Explanation:
Parallelism leads naturally to Concurrency. For example, Several processes trying to print a file
on a single printer.

5)
A parallelism based on increasing processor word size.
a) Increasing b) Count based c) Bit based d) Bit level
Ans: d Explanation: Bit level parallelism is based on increasing processor word size. It focuses
on hardware capabilities for structuring.

6)
The measure of the “effort” needed to maintain efficiency while adding processors.
a) Maintainability b) Efficiency
c) Scalability d) Effectiveness
Ans: C Explanation: The measure of the “effort” needed to maintain efficiency while adding
processors is called as scalability.

7)
Several instructions execution simultaneously in ________________

a) processing b) parallel processing c) serial processing d) multitasking
Ans: b Explanation: In parallel processing, the several instructions are executed simultaneously.

8)
Conventional architectures coarsely comprise of a_
a) A processor
b) Memory system
c) Data path.
d) All of Above
Ans: d Explanation:

9) A pipeline is like_
a) Overlaps various stages of instruction execution to achieve performance.
b) House pipeline
c) Both a and b
d) A gas line
Ans: a Explanation:

10) VLIW processors rely on_


a) Compile time analysis
b) Initial time analysis
c) Final time analysis
d) Mid time analysis
Ans: a Explanation:

11)
Memory system performance is largely captured by_
a) Latency
b) Bandwidth
c) Both a and b
d) none of above
Ans: c Explanation:

12)
The fraction of data references satisfied by the cache is called_
a) Cache hit ratio
b) Cache fit ratio
c) Cache best ratio
d) none of above
Ans: a Explanation:

13)
A single control unit that dispatches the same Instruction to various processors is__
a) SIMD
b) SPMD
c) MIMD

OptimusPrime Page 63
d) None of above
Ans: a Explanation:

14)
The primary forms of data exchange between parallel tasks are_
a) Accessing a shared data space
b) Exchanging messages.
c) Both A and B
d) None of Above Ans: c Explanation:

16)
Switches map a fixed number of inputs to outputs.
a) True
b) False
Ans: a Explanation:

UNIT-2

1)
The First step in developing a parallel algorithm is_
a) To Decompose the problem into tasks that can be executed concurrently
b) Execute directly
c) Execute indirectly
d) None of Above
Ans: a Explanation:

2)
The number of tasks into which a problem is decomposed determines its_
a) Granularity
b) Priority
c) Modernity
d) None of above
Ans: A Explanation:

3)
The length of the longest path in a task dependency graph is called_
a) the critical path length
b) the critical data length
c) the critical bit length
d) None of above
Ans: a Explanation:

4)
The graph of tasks (nodes) and their interactions/data exchange (edges)_
a) Is referred to as a task interaction graph
b) Is referred to as a task Communication graph

c) Is referred to as a task interface graph
d) None of Above
Ans: a Explanation:

5)
Mappings are determined by_
a) task dependency
b) task interaction graphs
c) Both A and B
d) None of Above
Ans: c Explanation:

6)
Decomposition Techniques are_
a) recursive decomposition
b) data decomposition
c) exploratory decomposition
d) speculative decomposition
e) All of Above
Ans: E Explanation:

7)
The Owner Computes Rule generally states that the process assigned a particular data item is
responsible for_
a) All computation associated with it
b) Only one computation
c) Only two computation
d) Only occasionally computation
Ans: A Explanation:

8)
A simple application of exploratory decomposition is_
a) The solution to a 15 puzzle
b) The solution to 20 puzzle
c) The solution to any puzzle
d) None of Above
Ans: A Explanation:

9)
Speculative Decomposition consist of _
a) conservative approaches
b) optimistic approaches
c) Both A and B
d) Only B
Ans: C Explanation:

10)
task characteristics include:
a) Task generation.
b) Task sizes.
c) Size of data associated with tasks.
d) All of Above
Ans: d Explanation:

UNIT-3

1)
Group communication operations are built using point-to-point messaging primitives
a) True
b) False
Ans: A Explanation:

2)
Communicating a message of size m over an uncongested network takes time ts + tmw
a) True
b) False
Ans: A Explanation:

3)
The dual of one-to-all broadcast is_
a) All-to-one reduction
b) All-to-one receiver
c) All-to-one Sum
d) None of Above
Ans: A Explanation:

4)
A hypercube has_
a) 2^d nodes
b) 2d nodes
c) 2^n nodes
d) n nodes
Ans: a Explanation:

5)
A binary tree in which processors are (logically) at the leaves and internal nodes are routing
nodes.
a) True
b) False
Ans: A Explanation:

6)
In All-to-All Broadcast each processor is the source as well as destination.
a) True

b) False
Ans: A Explanation:

7)
The Prefix Sum Operation can be implemented using the_
a) All-to-all broadcast kernel.
b) All-to-one broadcast kernel.
c) One-to-all broadcast Kernel
d) Scatter Kernel
Ans: A Explanation:

8)
In the scatter operation_
a) Single node send a unique message of size m to every other node
b) Single node send a same message of size m to every other node
c) Single node send a unique message of size m to next node
d) None of Above
Ans: A Explanation:

9)
The gather operation is exactly the inverse of the_
a) Scatter operation
b) Broadcast operation
c) Prefix Sum
d) Reduction operation
Ans: A Explanation:

10)
In All-to-All Personalized Communication Each node has a distinct message of size m for every
other node
a) True
b) False
Ans: a Explanation:

UNIT-1

1) Conventional architectures coarsely comprise of a______

a) processor
b) Memory system
c) Datapath.
d) All of Above
Ans: d
Explanation:

2) Data intensive applications utilize______
a) High aggregate throughput
b) High aggregate network bandwidth
c) High processing and memory system performance.
d) None of above
Ans: a
Explanation:

3) A pipeline is like_____

a. Overlaps various stages of instruction execution to achieve performance.


b. House pipeline
c. Both a and b
d. gas line
Ans: a

Explanation:
4) Scheduling of instructions is determined ____
a) True Data Dependency
b) Resource Dependency
c) Branch Dependency
d) All of above
Ans: d

Explanation:
5) VLIW processors rely on______

a) Compile time analysis


b) Initial time analysis
c) Final time analysis
d) Mid time analysis
Ans: a

Explanation:
6) Memory system performance is largely captured by_____

a) Latency
b) Bandwidth
c) Both a and b
d) none of above
Ans: c
Explanation:

7) The fraction of data references satisfied by the cache is called_____

a) Cache hit ratio

b) Cache fit ratio
c) Cache best ratio
d) none of above
Ans: a
Explanation:
8) A single control unit that dispatches the same Instruction to various processors is__
a) SIMD
b) SPMD
c) MIMD
d) None of above
Ans: a
Explanation:
9) The primary forms of data exchange between parallel tasks are_

a. Accessing a shared data space


b. Exchanging messages.
c. Both A and B
d. None of Above
Ans: c
Explanation:
10) Switches map a fixed number of inputs to outputs.
a) True
b) False
Ans: a
Explanation:
11) The stage in which the CPU fetches the instructions from the instruction cache in
superscalar organization is
a) Prefetch stage
b) D1 (first decode) stage
c) D2 (second decode) stage
d) Final stage
Ans: a
Explanation: In the prefetch stage of pipeline, the CPU fetches the instructions from the
instruction cache, which stores the instructions to be executed. In this stage, CPU also aligns the
codes appropriately.

12) The CPU decodes the instructions and generates control words in
a) Prefetch stage
b) D1 (first decode) stage
c) D2 (second decode) stage
d) Final stage
Ans: b

Explanation: In D1 stage, the CPU decodes the instructions and generates control words. For
simple RISC instructions, only single control word is enough for starting the execution.
13) The fifth stage of pipeline is also known as

a) read back stage
b) read forward stage
c) write back stage
d) none of the mentioned
Ans: c
Explanation: The fifth stage or final stage of pipeline is also known as “Write back (WB)
stage”.
14) In the execution stage the function performed is
a) CPU accesses data cache
b) executes arithmetic/logic computations
c) executes floating point operations in execution unit
d) all of the mentioned
Ans: d
Explanation: In the execution stage, known as E-stage, the CPU accesses data cache, executes
arithmetic/logic computations, and floating point operations in execution unit.

15) The stage in which the CPU generates an address for data memory references is
a) prefetch stage
b) D1 (first decode) stage
c) D2 (second decode) stage
d) execution stage
Ans: c
Explanation: In the D2 (second decode) stage, CPU generates an address for data memory
references in this stage. This stage is required where the control word from D1 stage is again
decoded for final execution.

16) The feature of separated caches is


a) supports the superscalar organization
b) high bandwidth
c) low hit ratio
d) all of the mentioned
Ans: d
Explanation: The separated caches have low hit ratio compared to a unified cache, but have the
advantage of supporting the superscalar organization and high bandwidth.
17) In the operand fetch stage, the FPU (Floating Point Unit) fetches the operands from
a) floating point unit
b) instruction cache
c) floating point register file or data cache
d) floating point register file or instruction cache
Ans: C
Explanation: In the operand fetch stage, the FPU (Floating Point Unit) fetches the operands
from either floating point register file or data cache.
18) The FPU (Floating Point Unit) writes the results to the floating point register file in
a) X1 execution state
b) X2 execution state

c) write back stage
d) none of the mentioned

Ans: c
Explanation: In the two execution stages of X1 and X2, the floating point unit reads the data
from the data cache and executes the floating point computation. In the “write back stage” of
pipeline, the FPU (Floating Point Unit) writes the results to the floating point register file.

19) The floating point multiplier segment performs floating point multiplication in
a) single precision
b) double precision
c) extended precision
d) all of the mentioned

Ans: d
Explanation: The floating point multiplier segment performs floating point multiplication in
single precision, double precision and extended precision.
20) The instruction or segment that executes the floating point square root instructions is
a) floating point square root segment
b) floating point division and square root segment
c) floating point divider segment
d) none of the mentioned

Ans: c
Explanation: The floating point divider segment executes the floating point division and square
root instructions.

21) The floating point rounder segment performs rounding off operation at

a) after write back stage


b) before write back stage
c) before arithmetic operations
d) none of the mentioned
Ans: b
Explanation: The results of floating point addition or division process may be required to be
rounded off, before write back stage to the floating point registers.
21) Which of the following is a floating point exception that is generated in case of integer
arithmetic?
a) divide by zero
b) overflow
c) denormal operand
d) all of the mentioned
Ans: D
Explanation: In the case of integer arithmetic, the possible floating point exceptions in Pentium
are:
1. divide by zero

2. overflow
3. denormal operand
4. underflow
5. invalid operation.


11. Choose the most accurate (CORRECT) statement:


a. Scalability is a measure of the capacity to increase speedup in proportion to the number
of processors
b. Efficiency is the ratio of the serial run time of the best sequential algorithm for solving a
problem to the time taken by the parallel algorithm to solve the same problem
on p processors
c. Run time is the time that elapses from the moment a parallel computation starts to the
moment the last processor finishes.
d. Superlinear is the fraction of time for which a processor is usefully employed
12. Parallelism can be used to increase the (parallel) size of the problem is applicable in
___________________.
a. Amdahl's Law
b. Gustafson-Barsis's Law
c. Newton's Law
d. Pascal's Law

13. ____________ is due to load imbalance, synchronization, or serial components as parts of
overheads in parallel programs.
a. Interprocess interaction
b. Synchronization
c. Idling
d. Excess computation
14. Which of the following parallel methodological design elements focuses on recognizing
opportunities for parallel execution?
a. Partitioning
b. Communication
c. Agglomeration
d. Mapping
15. Considering to use weak or strong scaling is part of ______________ in addressing the
challenges of distributed memory programming.
a. Splitting the problem
b. Speeding up computations
c. Speeding up communication
d. Speeding up hardware
16. Domain and functional decomposition are considered in the following parallel
methodological design elements, EXCEPT:
a. Partitioning
b. Communication
c. Agglomeration
d. Mapping
17. Synchronization is one of the common issues in parallel programming. The issues related to
synchronization include the followings, EXCEPT:
a. Deadlock
b. Livelock
c. Fairness
d. Correctness
18. Which of the followings is the BEST description of Message Passing Interface (MPI)?
a. A specification of a shared memory library
b. MPI uses objects called communicators and groups to define which collection of
processes may communicate with each other
c. Only communicators and not groups are accessible to the programmer only by a "handle"
d. A communicator is an ordered set of processes
---------------------------------------------------------------------------------------------------------------------
SET 7 MCQs
---------------------------------------------------------------------------------------------------------------------
Which is alternative options for latency hiding?
A. Increase CPU frequency
B. Multithreading
C. Increase Bandwidth

D. Increase Memory
ANSWER: B
______ communication model is generally seen in tightly coupled
systems.
A. Message Passing
B. Shared-address space
C. Client-Server
D. Distributed Network
ANSWER: B
The principal parameters that determine the communication latency
are as follows:
A. Startup time (ts) Per-hop time (th) Per-word transfer time (tw)
B. Startup time (ts) Per-word transfer time (tw)
C. Startup time (ts) Per-hop time (th)
D. Startup time (ts) Message-Packet-Size(W)
ANSWER: A
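A one-line C helper as a hedged illustration of these parameters (the routing model and units are assumptions; this is the usual cut-through style estimate for a message of m words over l hops):

double comm_time(double ts, double th, double tw, int l, int m) {
    return ts + l * th + m * tw;  /* startup + per-hop + per-word cost */
}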
The number and size of tasks into which a problem is decomposed
determines the __
A. Granularity
B. Task
C. Dependency Graph
D. Decomposition
ANSWER: A
Average Degree of Concurrency is...
A. The average number of tasks that can run concurrently over the
entire duration of execution of the process.
B. The average time that can run concurrently over the entire
duration of execution of the process.
C. The average in degree of task dependency graph.
D. The average out degree of task dependency graph.
ANSWER: A
Which task decomposition technique is suitable for the 15-puzzle
problem?
A. Data decomposition
B. Exploratory decomposition
C. Speculative decomposition
D. Recursive decomposition
ANSWER: B
Which of the following method is used to avoid Interaction
Overheads?
A. Maximizing data locality
B. Minimizing data locality
C. Increase memory size
D. None of the above.
ANSWER: A
Which of the following is not parallel algorithm model

A. The Data Parallel Model
B. The work pool model
C. The task graph model
D. The Speculative Model
ANSWER: D
Nvidia GPUs are based on which of the following architectures?
A. MIMD
B. SIMD
C. SISD
D. MISD
ANSWER: B
What is Critical Path?
A. The length of the longest path in a task dependency graph is
called the critical path length.
B. The length of the smallest path in a task dependency graph is
called the critical path length.
C. Path with loop
D. None of the mentioned.
ANSWER: A
Which decomposition technique uses divide-and-conquer strategy?
A. recursive decomposition
B. data decomposition
C. exploratory decomposition
D. speculative decomposition
ANSWER: A
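A plain C quicksort sketch illustrating why divide-and-conquer suits recursive decomposition (pivot choice and names are illustrative assumptions): after partitioning, the two recursive calls are independent and can be assigned to separate tasks.

void quicksort(int *a, int lo, int hi) {
    if (lo >= hi) return;
    int p = a[hi], i = lo;                  /* simple last-element pivot */
    for (int j = lo; j < hi; j++)
        if (a[j] < p) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
    int t = a[i]; a[i] = a[hi]; a[hi] = t;  /* place pivot */
    quicksort(a, lo, i - 1);                /* independent subtask */
    quicksort(a, i + 1, hi);                /* independent subtask */
}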
If there are 6 nodes in a ring topology how many message passing
cycles will be required to complete broadcast process in one to all?
A. 1
B. 6
C. 3
D. 4
ANSWER: C
If there is a 4 x 4 mesh topology network, then how many ring operations
will be performed to complete one-to-all broadcast?
A. 4
B. 8
C. 16
D. 32
ANSWER: B
Consider all to all broadcast in ring topology with 8 nodes. How
many messages will be present with each node after 3rd step/cycle of
communication?
A. 3
B. 4
C. 6
D. 7

ANSWER: B
Consider Hypercube topology with 8 nodes then how many message
passing cycles will require in all to all broadcast operation?
A. The longest path between any pair of finish nodes.
B. The longest directed path between any pair of start & finish
node.
C. The shortest path between any pair of finish nodes.
D. The number of maximum nodes level in graph.
ANSWER: D
Scatter is ____________.
A. One to all broadcast communication
B. All to all broadcast communication
C. One to all personalised communication
D. None of the above.
ANSWER: C
If there is a 4 x 4 mesh topology, ______ message passing cycles will be
required to complete all-to-all reduction.
A. 4
B. 6
C. 8
D. 16
ANSWER: C
Which issue(s) is/are true about sorting techniques with
parallel computing?
A. Large sequence is the issue
B. Where to store output sequence is the issue
C. Small sequence is the issue
D. None of the above
ANSWER: B
Partitioning of the series is done after ______________
A. Local arrangement
B. Processess assignments
C. Global arrangement
D. None of the above
ANSWER: C
In parallel DFS, processes have the following roles. (Select multiple
choices if applicable)
A. Donor
B. Active
C. Idle
D. Passive
ANSWER: A
Suppose there are 16 elements in a series then how many phases will
be required to sort the series using parallel odd-even bubble sort?
A. 8
B. 4

C. 5
D. 15
ANSWER: D
Which are different sources of Overheads in Parallel Programs?
A. Interprocess interactions
B. Process Idling
C. All mentioned options
D. Excess Computation
ANSWER: C
Speedup is defined as:
A. The ratio of the time taken to solve a problem on a parallel
computer with p identical processing elements to the time required
to solve the same problem on a single processor
B. The ratio of the time taken to solve a problem on a single
processor to the time required to solve the same problem on a
parallel computer with p identical processing elements
C. The ratio of number of multiple processors to size of data
D. None of the above
ANSWER: B
Efficiency is a measure of the fraction of time for which a
processing element is usefully employed.
A. TRUE
B. FALSE
ANSWER: A
CUDA helps to execute code in parallel using __________
A. CPU
B. GPU
C. ROM
D. Cache memory
ANSWER: B
In the thread-function execution scenario, a thread is a ___________
A. Work
B. Worker
C. Task
D. None of the above
ANSWER: B
Which of the following statements about GPU are true?
A. Grid contains Block
B. Block contains Threads
C. All the mentioned options.
D. SM stands for Streaming MultiProcessor
ANSWER: C
Computer system of a parallel computer is capable of_____________
A. Decentralized computing

B. Parallel computing
C. Centralized computing
D. All of these
ANSWER: A
In which application system can distributed systems run well?
A. HPC
B. Distributed Framework
C. HRC
D. None of the above
ANSWER: A
A pipeline is like .................... ?
A. an automobile assembly line
B. house pipeline
C. both a and b
D. a gas line
ANSWER: A
Pipeline implements ?
A. fetch instruction
B. decode instruction
C. fetch operand
D. all of above
ANSWER: D
A processor performing fetch or decoding of different instruction
during the execution of another instruction is called ______ ?
A. Super-scaling
B. Pipe-lining
C. Parallel Computation
D. None of these
ANSWER: B
In a parallel execution, the performance will always improve as the
number of processors will increase?
A. True
B. False
ANSWER: B
VLIW stands for ?
A. Very Long Instruction Word
B. Very Long Instruction Width
C. Very Large Instruction Word
D. Very Long Instruction Width
ANSWER: A
In VLIW the decision for the order of execution of the instructions
depends on the program itself?
A. True
B. False
ANSWER: A
Which one is not a limitation of a distributed memory parallel
system?
A. Higher communication time
B. Cache coherency
C. Synchronization overheads
D. None of the above
ANSWER: B
Which of these steps can create conflict among the processors?
A. Synchronized computation of local variables
B. Concurrent write
C. Concurrent read
D. None of the above
ANSWER: B
Which one is not a characteristic of NUMA multiprocessors?
A. It allows shared memory computing
B. Memory units are placed in physically different location
C. All memory units are mapped to one common virtual global memory
D. Processors access their independent local memories
ANSWER: D
Which of these is not a source of overhead in parallel computing?
A. Non-uniform load distribution
B. Less local memory requirement in distributed computing
C. Synchronization among threads in shared memory computing
D. None of the above
ANSWER: B
Systems that do not have parallel processing capabilities are?
A. SISD
B. SIMD
C. MIMD
D. All of the above
ANSWER: A
How does the number of transistors per chip increase according to
Moore ´s law?
A. Quadratically
B. Linearly
C. Cubicly
D. Exponentially
ANSWER: D
Parallel processing may occur?
A. in the instruction stream
B. in the data stream
C. both[A] and [B]
D. none of the above
ANSWER: C
To which class of systems does the von Neumann computer belong?
A. SIMD (Single Instruction Multiple Data)
B. MIMD (Multiple Instruction Multiple Data)

C. MISD (Multiple Instruction Single Data)
D. SISD (Single Instruction Single Data)
ANSWER: D
Fine-grain threading is considered as a ______ threading?
A. Instruction- level
B. Loop level
C. Task-level
D. Function-level
ANSWER: A
A multiprocessor is a system with multiple CPUs, which are capable of
independently executing different tasks in parallel. In which
category does every processor and memory module have similar access
time?
B. Microprocessor
C. Multiprocessor
D. NUMA
ANSWER: A
The misses that arise from interprocessor communication are called?
A. hit rate
B. coherence misses
C. comitt misses
D. parallel processing
ANSWER: B
NUMA architecture uses _______in design?
A. cache
B. shared memory
C. message passing
D. distributed memory
ANSWER: D
A multiprocessor machine which is capable of executing multiple
instructions on multiple data sets?
A. SISD
B. SIMD
C. MIMD
D. MISD
ANSWER: C
In message passing, messages are sent and received between?
A. Task or processes
B. Task and Execution
C. Processor and Instruction
D. Instruction and decode
ANSWER: A
The First step in developing a parallel algorithm is_________?
A. To Decompose the problem into tasks that can be executed
concurrently
B. Execute directly

C. Execute indirectly
D. None of Above
ANSWER: A
The number of tasks into which a problem is decomposed determines
its?
A. Granularity
B. Priority
C. Modernity
D. None of above
ANSWER: A
The length of the longest path in a task dependency graph is called?
A. the critical path length
B. the critical data length
C. the critical bit length
D. None of above
ANSWER: A
The graph of tasks (nodes) and their interactions/data exchange
(edges)?
A. Is referred to as a task interaction graph
B. Is referred to as a task Communication graph
C. Is referred to as a task interface graph
D. None of Above
ANSWER: A
Mappings are determined by?
A. task dependency
B. task interaction graphs
C. Both A and B
D. None of Above
ANSWER: C
Decomposition Techniques are?
A. recursive decomposition
B. data decomposition
C. exploratory decomposition
D. All of Above
ANSWER: D
The Owner Computes Rule generally states that the process assigned a
particular data item is responsible for?
A. All computation associated with it
B. Only one computation
C. Only two computation
D. Only occasionally computation
ANSWER: A
A simple application of exploratory decomposition is_?
A. The solution to a 15 puzzle
B. The solution to 20 puzzle
C. The solution to any puzzle

D. None of Above
ANSWER: A
Speculative Decomposition consist of _?
A. conservative approaches
B. optimistic approaches
C. Both A and B
D. Only B
ANSWER: C
task characteristics include?
A. Task generation.
B. Task sizes.
C. Size of data associated with tasks.
D. All of Above
ANSWER: D
Writing parallel programs is referred to as?
A. Parallel computation
B. Parallel processes
C. Parallel development
D. Parallel programming
ANSWER: D
Parallel Algorithm Models?
A. Data parallel model
B. Bit model
C. Data model
D. network model
ANSWER: A
The number and size of tasks into which a problem is decomposed
determines the?
A. fine-granularity
B. coarse-granularity
C. sub Task
D. granularity
ANSWER: D
A feature of a task-dependency graph that determines the average
degree of concurrency for a given granularity is its ___________
path?
A. critical
B. easy
C. difficult
D. ambiguous
ANSWER: A
The pattern of___________ among tasks is captured by what is known
as a task-interaction graph?
A. Interaction
B. communication
C. optmization

D. flow
ANSWER: A
Interaction overheads can be minimized by____?
A. Maximize Data Locality
B. Maximize Volume of data exchange
C. Increase Bandwidth
D. Minimize social media contents
ANSWER: A
Type of parallelism that is naturally expressed by independent tasks
in a task-dependency graph is called _______ parallelism?
A. Task
B. Instruction
C. Data
D. Program
ANSWER: A
Speed up is defined as a ratio of?
A. s=Ts/Tp
B. S= Tp/Ts
C. Ts=S/Tp
D. Tp=S /Ts
ANSWER: A
Parallel computing means to divide the job into several __________?
A. Bit
B. Data
C. Instruction
D. Task
ANSWER: D
_________ is a method for inducing concurrency in problems that can
be solved using the divide-and-conquer strategy?
A. exploratory decomposition
B. speculative decomposition
C. data-decomposition
D. Recursive decomposition
ANSWER: D
The ______ time collectively spent by all the processing elements is
Tall = p TP.
A. total
B. Average
C. mean
D. sum
ANSWER: A
Group communication operations are built using point-to-point
messaging primitives?
A. True
B. False
ANSWER: A

Communicating a message of size m over an uncongested network takes
time ts + tmw?
A. True
B. False
ANSWER: A
The dual of one-to-all broadcast is ?
A. All-to-one reduction
B. All-to-one receiver
C. All-to-one Sum
D. None of Above
ANSWER: A
A hypercube has?
A. 2^d nodes
B. 2d nodes
C. 2^n nodes
D. n nodes
ANSWER: A
A binary tree in which processors are (logically) at the leaves and
internal nodes are routing nodes?
A. True
B. False
ANSWER: A
In All-to-All Broadcast each processor is thesource as well as
destination?
A. True
B. False
ANSWER: A
The Prefix Sum Operation can be implemented using the ?
A. All-to-all broadcast kernel.
B. All-to-one broadcast kernel.
C. One-to-all broadcast Kernel
D. Scatter Kernel
ANSWER: A
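In MPI the prefix-sum operation is available directly; a minimal hedged sketch (contribution values are illustrative assumptions), in which each rank receives the sum of the contributions of ranks 0 through itself:

#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, pre;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int x = rank + 1;  /* this rank's contribution (example value) */
    MPI_Scan(&x, &pre, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    printf("rank %d: prefix sum = %d\n", rank, pre);
    MPI_Finalize();
    return 0;
}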
In the scatter operation ?
A. Single node send a unique message of size m to every other node
B. Single node send a same message of size m to every other node
C. Single node send a unique message of size m to next node
D. None of Above
ANSWER: A
The gather operation is exactly the inverse of the ?
A. Scatter operation
B. Broadcast operation
C. Prefix Sum
D. Reduction operation
ANSWER: A
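A hedged MPI sketch of the scatter/gather pair described above (buffer sizes assume at most 16 processes and a piece size of 4): the root sends a unique piece to every rank, and gather collects the pieces back, the exact inverse.

#include <mpi.h>
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, p, full[64], piece[4];
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    if (rank == 0)                          /* root prepares p unique pieces */
        for (int i = 0; i < 4 * p; i++) full[i] = i;
    MPI_Scatter(full, 4, MPI_INT, piece, 4, MPI_INT, 0, MPI_COMM_WORLD);
    /* ... each rank works on its own unique piece here ... */
    MPI_Gather(piece, 4, MPI_INT, full, 4, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}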
In All-to-All Personalized Communication Each node has a distinct
message of size m for every other node?
A. True
B. False
ANSWER: A
Parallel algorithms often require a single process to send identical
data to all other processes or to a subset of them. This operation
is known as _________?
A. one-to-all broadcast
B. All to one broadcast
C. one-to-all reduction
D. all to one reduction
ANSWER: A
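A minimal hedged MPI sketch of one-to-all broadcast and its dual, all-to-one reduction (root rank and data values are illustrative assumptions):

#include <mpi.h>
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, x = 0, total = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) x = 42;                          /* source's data */
    MPI_Bcast(&x, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* one-to-all broadcast */
    MPI_Reduce(&x, &total, 1, MPI_INT, MPI_SUM, 0,  /* the dual: reduction */
               MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}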
In which of the following operation, a single node sends a unique
message of size m to every other node?
A. Gather
B. Scatter
C. One to all personalized communication
D. Both A and C
ANSWER: D
Gather operation is also known as ________?
A. One to all personalized communication
B. One to all broadcast
C. All to one reduction
D. All to All broadcast
ANSWER: A
one-to-all personalized communication does not involve any
duplication of data?
A. True
B. False
ANSWER: A
Gather operation, or concatenation, in which a single node collects
a unique message from each node?
A. True
B. False
ANSWER: A
Conventional architectures coarsely comprise of a?
A. A processor
B. Memory system
C. Data path.
D. All of Above
ANSWER: D
Data intensive applications utilize?
A. High aggregate throughput
B. High aggregate network bandwidth
C. High processing and memory system performance.
D. None of above

ANSWER: A
A pipeline is like?
A. Overlaps various stages of instruction execution to achieve
performance.
B. House pipeline
C. Both a and b
D. A gas line
ANSWER: A
Scheduling of instructions is determined?
A. True Data Dependency
B. Resource Dependency
C. Branch Dependency
D. All of above
ANSWER: D
VLIW processors rely on?
A. Compile time analysis
B. Initial time analysis
C. Final time analysis
D. Mid time analysis
ANSWER: A
Memory system performance is largely captured by?
A. Latency
B. Bandwidth
C. Both a and b
D. none of above
ANSWER: C
The fraction of data references satisfied by the cache is called?
A. Cache hit ratio
B. Cache fit ratio
C. Cache best ratio
D. none of above
ANSWER: A
A single control unit that dispatches the same Instruction to
various processors is?
A. SIMD
B. SPMD
C. MIMD
D. None of above
ANSWER: A
The primary forms of data exchange between parallel tasks are?
A. Accessing a shared data space
B. Exchanging messages.
C. Both A and B
D. None of Above
ANSWER: C
Switches map a fixed number of inputs to outputs?

A. True
B. False
ANSWER: A
The First step in developing a parallel algorithm is?
A. To Decompose the problem into tasks that can be executed
concurrently
B. Execute directly
C. Execute indirectly
D. None of Above
ANSWER: A
The number of tasks into which a problem is decomposed determines
its?
A. Granularity
B. Priority
C. Modernity
D. None of above
ANSWER: A
The length of the longest path in a task dependency graph is called?
A. the critical path length
B. the critical data length
C. the critical bit length
D. None of above
ANSWER: A
The graph of tasks (nodes) and their interactions/data exchange
(edges)?
A. Is referred to as a task interaction graph
B. Is referred to as a task Communication graph
C. Is referred to as a task interface graph
D. None of Above
ANSWER: A
Mappings are determined by?
A. task dependency
B. task interaction graphs
C. Both A and B
D. None of Above
ANSWER: C
Decomposition Techniques are?
A. recursive decomposition
B. data decomposition
C. exploratory decomposition
D. All of Above
ANSWER: D
The Owner Computes Rule generally states that the process assigned a
particular data item is responsible for?
A. All computation associated with it
B. Only one computation

C. Only two computation
D. Only occasionally computation
ANSWER: A
A simple application of exploratory decomposition is?
A. The solution to a 15 puzzle
B. The solution to 20 puzzle
C. The solution to any puzzle
D. None of Above
ANSWER: A
Speculative Decomposition consist of ?
A. conservative approaches
B. optimistic approaches
C. Both A and B
D. Only B
ANSWER: C
Task characteristics include?
A. Task generation.
B. Task sizes.
C. Size of data associated with tasks.
D. All of Above.
ANSWER: D
Group communication operations are built using point-to-point
messaging primitives?
A. True
B. False
ANSWER: A
Communicating a message of size m over an uncongested network takes
time ts + tmw?
A. True
B. False
ANSWER: A
The dual of one-to-all broadcast is?
A. All-to-one reduction
B. All-to-one receiver
C. All-to-one Sum
D. None of Above
ANSWER: A
A hypercube has?
A. 2^d nodes
B. 3d nodes
C. 2^n nodes
D. n nodes
ANSWER: A
A binary tree in which processors are (logically) at the leaves and
internal nodes are routing nodes?
A. True

B. False
ANSWER: A
In All-to-All Broadcast each processor is the source as well as
destination?
A. True
B. False
ANSWER: A
The Prefix Sum Operation can be implemented using the?
A. All-to-all broadcast kernel.
B. All-to-one broadcast kernel.
C. One-to-all broadcast Kernel
D. Scatter Kernel
ANSWER: A
In the scatter operation?
A. Single node send a unique message of size m to every other node
B. Single node send a same message of size m to every other node
C. Single node send a unique message of size m to next node
D. None of Above
ANSWER: A
The gather operation is exactly the inverse of the?
A. Scatter operation
B. Broadcast operation
C. Prefix Sum
D. Reduction operation
ANSWER: A
In All-to-All Personalized Communication Each node has a distinct
message of size m for every other node?
A. True
B. False
ANSWER: A
Computer system of a parallel computer is capable of?
A. Decentralized computing
B. Parallel computing
C. Centralized computing
D. Distributed computing
ANSWER: A
Writing parallel programs is referred to as?
A. Parallel computation
B. Parallel processes
C. Parallel development
D. Parallel programming
ANSWER: D
Three-tier architecture simplifies the application's ____________?
A. Maintenance
B. Initiation

C. Implementation
D. Deployment
ANSWER: D
Dynamic networks of networks, is a dynamic connection that grows is
called?
A. Multithreading
B. Cyber cycle
C. Internet of things
D. Cyber-physical system
ANSWER: C
In which application systems can distributed systems run well?
A. HPC
B. HTC
C. HRC
D. Both A and B
ANSWER: D
Which of the following do HPC and HTC systems desire?
A. Adaptivity
B. Transparency
C. Dependency
D. Secretive
ANSWER: B
The architecture in which no special machines manage the network
resources is known as?
A. Peer-to-Peer
B. Space based
C. Tightly coupled
D. Loosely coupled
ANSWER: A
Significant characteristics of distributed systems are of?
A. 5 types
B. 2 types
C. 3 types
D. 4 types
ANSWER: C
Peer machines are built over?
A. Many Server machines
B. 1 Server machine
C. 1 Client machine
D. Many Client machines
ANSWER: D
Types of HTC applications include?
A. Business
B. Engineering
C. Science
D. Media mass

ANSWER: A
The architecture in which virtualization creates one single address
space is called?
A. Loosely coupled
B. Peer-to-Peer
C. Space-based
D. Tightly coupled
ANSWER: C
In cloud computing, an internet cloud of resources forms which of the following?
A. Centralized computing
B. Decentralized computing
C. Parallel computing
D. All of these
ANSWER: D
Job throughput, data access, and storage are elements of
__________?
A. Flexibility
B. Adaptation
C. Efficiency
D. Dependability
ANSWER: C
The ability to support billions of job requests over massive data sets
is known as?
A. Efficiency
B. Dependability
C. Adaptation
D. Flexibility
ANSWER: C
Cloud computing offers a broader concept than which of the following?
A. Parallel computing
B. Centralized computing
C. Utility computing
D. Decentralized computing
ANSWER: C
The transparency that allows movement of resources and clients within a
system is called?
A. Mobility transparency
B. Concurrency transparency
C. Performance transparency
D. Replication transparency
ANSWER: A
A distributed program running in a distributed computer is known as?
A. Distributed process
B. Distributed program
C. Distributed application

D. Distributed computing
ANSWER: B
Computing with uniprocessor devices is called __________?
A. Grid computing
B. Centralized computing
C. Parallel computing
D. Distributed computing
ANSWER: B
Utility computing focuses on a______________ model?
A. Data
B. Cloud
C. Scalable
D. Business
ANSWER: D
A CPS merges which technologies?
A. 5C
B. 2C
C. 3C
D. 4C
ANSWER: C
Abbreviation of HPC?
A. High-peak computing
B. High-peripheral computing
C. High-performance computing
D. Highly-parallel computing
ANSWER: C
Peer-to-Peer leads to the development of technologies like?
A. Norming grids
B. Data grids
C. Computational grids
D. Both B and C
ANSWER: D
A type of HPC application is?
A. Management
B. Media mass
C. Business
D. Science
ANSWER: D
Computer technology has gone through how many development generations?
A. 6
B. 3
C. 4
D. 5
ANSWER: D
The utilization rate of resources in an execution model is known to be
its?

A. Adaptation
B. Efficiency
C. Dependability
D. Flexibility
ANSWER: B
Providing Quality of Service (QoS) assurance even under failure
conditions is the responsibility of?
A. Dependability
B. Adaptation
C. Flexibility
D. Efficiency
ANSWER: A
Interprocessor communication takes place via?
A. Centralized memory
B. Shared memory
C. Message passing
D. Both B and C
ANSWER: D
Data centers and centralized computing cover many data centers and ____?
A. Microcomputers
B. Minicomputers
C. Mainframe computers
D. Supercomputers
ANSWER: D
Which of the following is a primary goal of the HTC
paradigm ___________?
A. High ratio Identification
B. Low-flux computing
C. High-flux computing
D. Computer utilities
ANSWER: C
The high-throughput service provided is a measure taken by?
A. Flexibility
B. Efficiency
C. Dependability
D. Adaptation
ANSWER: D
What are the sources of overhead?
A. Excess computation
B. Inter-process Communication
C. Idling
D. All above
ANSWER: D
Which are the performance metrics for parallel systems?
A. Execution Time
B. Total Parallel Overhead

C. Speedup
D. All above
ANSWER: D
The efficiency of a parallel program can be written as: E = Ts /
(p Tp). True or False?
A. True
B. False
ANSWER: A
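
Worked sketch (added for illustration): the efficiency formula above evaluates directly. A minimal C sketch; the serial/parallel times and processor count are illustrative values only.

#include <stdio.h>

// Direct evaluation of the formulas above: S = Ts/Tp, E = Ts/(p*Tp).
int main(void) {
    double ts = 175.0, tp = 50.0;   // illustrative serial/parallel times
    int p = 4;
    double speedup = ts / tp;
    double efficiency = ts / (p * tp);
    printf("S = %.2f, E = %.2f\n", speedup, efficiency);  // S = 3.50, E = 0.88
    return 0;
}
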
The important feature of the VLIW is ______?
A. ILP
B. Performance
C. Cost effectiveness
D. delay
ANSWER: A

---------------------------------------------------------------------------------------------------------------------
SET 8 (MCQs)
---------------------------------------------------------------------------------------------------------------------

1. Any condition that causes a processor to stall is called _____.
A. Hazard  B. Page fault  C. System error  D. None of the above
ANSWER: A

2. The time lost due to a branch instruction is often referred to as _____.
A. Latency  B. Delay  C. Branch penalty  D. None of the above
ANSWER: C

3. The _____ method is used in centralized systems to perform out-of-order execution.
A. Scorecard  B. Scoreboarding  C. Optimizing  D. Redundancy
ANSWER: B

4. The computer cluster architecture emerged as an alternative for ____.
A. ISA  B. Workstation  C. Supercomputers  D. Distributed systems
ANSWER: C

5. An NVIDIA CUDA warp is made up of how many threads?
A. 512  B. 1024  C. 312  D. 32
ANSWER: D

6. Out-of-order execution of instructions is not possible on GPUs.
A. True  B. False
ANSWER: B

7. CUDA supports programming in:
A. C or C++ only  B. Java, Python, and more  C. C, C++, and third-party wrappers for Java, Python, and more  D. Pascal
ANSWER: C

8. FADD, FMAD, FMIN, FMAX are ----- supported by the scalar processors of an NVIDIA GPU.
A. 32-bit IEEE floating point instructions  B. 32-bit integer instructions  C. Both  D. None of the above
ANSWER: A

9. Each streaming multiprocessor (SM) of CUDA hardware has ------ scalar processors (SP).
A. 1024  B. 128  C. 512  D. 8
ANSWER: D

10. Each NVIDIA GPU has ------ streaming multiprocessors.
A. 8  B. 1024  C. 512  D. 16
ANSWER: D

11. CUDA provides ------- warp and thread scheduling. Also, the overhead of thread creation is on the order of ----.
A. "programming-overhead", 2 clock cycles  B. "zero-overhead", 1 clock cycle  C. 64, 2 clock cycles  D. 32, 1 clock cycle
ANSWER: B

12. Each warp of a GPU receives a single instruction and "broadcasts" it to all of its threads. It is a ---- operation.
A. SIMD (Single instruction multiple data)  B. SIMT (Single instruction multiple thread)  C. SISD (Single instruction single data)  D. SIST (Single instruction single thread)
ANSWER: B

13. Limitations of a CUDA kernel:
A. Recursion, call stack, static variable declarations  B. No recursion, no call stack, no static variable declarations  C. Recursion, no call stack, static variable declarations  D. No recursion, call stack, no static variable declarations
ANSWER: B

14. What is the Unified Virtual Machine?
A. It is a technique that allows both CPU and GPU to read from a single virtual machine, simultaneously.  B. It is a technique for managing separate host and device memory spaces.  C. It is a technique for executing device code on the host and host code on the device.  D. It is a technique for executing general-purpose programs on the device instead of the host.
ANSWER: A

15. _______ became the first language specifically designed by a GPU company to facilitate general-purpose computing on ____.
A. Python, GPUs  B. C, CPUs  C. CUDA C, GPUs  D. Java, CPUs
ANSWER: C

16. The CUDA architecture consists of --------- for parallel computing kernels and functions.
A. RISC instruction set architecture  B. CISC instruction set architecture  C. ZISC instruction set architecture  D. PTX instruction set architecture
ANSWER: D

17. CUDA stands for --------, designed by NVIDIA.
A. Common Union Discrete Architecture  B. Complex Unidentified Device Architecture  C. Compute Unified Device Architecture  D. Complex Unstructured Distributed Architecture
ANSWER: C

18. The host processor spawns multithread tasks (or kernels as they are known in CUDA) onto the GPU device.
A. True  B. False
ANSWER: A

19. The NVIDIA G80 is a ---- CUDA core device, the NVIDIA G200 is a ---- CUDA core device, and the NVIDIA Fermi is a ---- CUDA core device.
A. 128, 256, 512  B. 32, 64, 128  C. 64, 128, 256  D. 256, 512, 1024
ANSWER: A

20. NVIDIA 8-series GPUs offer --------.
A. 50-200 GFLOPS  B. 200-400 GFLOPS  C. 400-800 GFLOPS  D. 800-1000 GFLOPS
ANSWER: A

21. IADD, IMUL24, IMAD24, IMIN, IMAX are ----------- supported by the scalar processors of an NVIDIA GPU.
A. 32-bit IEEE floating point instructions  B. 32-bit integer instructions  C. Both  D. None of the above
ANSWER: B

22. The CUDA hardware programming model supports: a) fully generally data-parallel architecture; b) general thread launch; c) global load-store; d) parallel data cache; e) scalar architecture; f) integers, bit operations.
A. a, c, d, f  B. b, c, d, e  C. a, d, e, f  D. a, b, c, d, e, f
ANSWER: D

23. In the CUDA memory model the following memory types are available: a) registers; b) local memory; c) shared memory; d) global memory; e) constant memory; f) texture memory.
A. a, b, d, f  B. a, c, d, e, f  C. a, b, c, d, e, f  D. b, c, e, f
ANSWER: C

24. What is the CUDA C equivalent of the general C program: int main(void) { printf("Hello, World!\n"); return 0; }
A. int main( void ) { kernel<<<1,1>>>(); printf("Hello, World!\n"); return 0; }
B. __global__ void kernel( void ) { }  int main( void ) { kernel<<<1,1>>>(); printf("Hello, World!\n"); return 0; }
C. __global__ void kernel( void ) { kernel<<<1,1>>>(); printf("Hello, World!\n"); return 0; }
D. __global__ int main( void ) { kernel<<<1,1>>>(); printf("Hello, World!\n"); return 0; }
ANSWER: B

25. Which function runs on the device (i.e. GPU): a) __global__ void kernel( void ) { }  b) int main( void ) { ... return 0; }
A. a  B. b  C. Both a and b  D. None
ANSWER: A

26. A simple kernel for adding two integers: __global__ void add( int *a, int *b, int *c ) { *c = *a + *b; } where __global__ is a CUDA C keyword which indicates that:
A. add() will execute on the device, add() will be called from the host  B. add() will execute on the host, add() will be called from the device  C. add() will be called and executed on the host  D. add() will be called and executed on the device
ANSWER: A

27. If variable a is a host variable and dev_a is a device (GPU) variable, to allocate memory to dev_a select the correct statement:
A. cudaMalloc( &dev_a, sizeof( int ) )  B. malloc( &dev_a, sizeof( int ) )  C. cudaMalloc( (void**) &dev_a, sizeof( int ) )  D. malloc( (void**) &dev_a, sizeof( int ) )
ANSWER: C

28. If variable a is a host variable and dev_a is a device (GPU) variable, to copy input from variable a to variable dev_a select the correct statement:
A. memcpy( dev_a, &a, size );  B. cudaMemcpy( dev_a, &a, size, cudaMemcpyHostToDevice );  C. memcpy( (void*) dev_a, &a, size );  D. cudaMemcpy( (void*) &dev_a, &a, size, cudaMemcpyDeviceToHost );
ANSWER: B

29. What does the triple-angle-bracket mark in a statement inside the main function indicate?
A. A call from host code to device code  B. A call from device code to host code  C. Less-than comparison  D. Greater-than comparison
ANSWER: A

30. What makes a CUDA code run in parallel?
A. __global__ indicates parallel execution of code  B. The main() function indicates parallel execution of code  C. The kernel name outside the triple angle brackets indicates execution of the kernel N times in parallel  D. The first parameter value inside the triple angle brackets (N) indicates execution of the kernel N times in parallel
ANSWER: D
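
Worked sketch (added for illustration, not part of the original set): Q26-Q30 exercise the standard CUDA host/device workflow. A minimal CUDA C program tying cudaMalloc, cudaMemcpy, and the <<<...>>> launch together; the kernel name add and the size N are illustrative.

#include <stdio.h>
#include <cuda_runtime.h>

// Kernel: executes on the device, called from the host (Q26).
__global__ void add(int *a, int *b, int *c) {
    int i = threadIdx.x;          // one thread per element
    c[i] = a[i] + b[i];
}

int main(void) {
    const int N = 4;
    int a[N] = {1, 2, 3, 4}, b[N] = {10, 20, 30, 40}, c[N];
    int *dev_a, *dev_b, *dev_c;

    // Allocate device memory (Q27): note the (void**) cast.
    cudaMalloc((void**)&dev_a, N * sizeof(int));
    cudaMalloc((void**)&dev_b, N * sizeof(int));
    cudaMalloc((void**)&dev_c, N * sizeof(int));

    // Host-to-device copies (Q28).
    cudaMemcpy(dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice);

    // Chevron syntax (Q29/Q30): 1 block of N threads.
    add<<<1, N>>>(dev_a, dev_b, dev_c);

    cudaMemcpy(c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; i++) printf("%d ", c[i]);   // 11 22 33 44
    printf("\n");

    cudaFree(dev_a); cudaFree(dev_b); cudaFree(dev_c);
    return 0;
}
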

32. Which of the following statements is NOT TRUE for internal sorting algorithms?
A. Usually deal with a small number of elements  B. The number of elements must be able to fit in the process's main memory  C. Use auxiliary memory like tape or hard disk  D. Usually are of the compare-exchange type
ANSWER: C

34. In sorting networks, for an INCREASING COMPARATOR with input x, y select the correct output X', Y':
A. X' = min{x, y} and Y' = min{x, y}  B. X' = max{x, y} and Y' = min{x, y}  C. X' = min{x, y} and Y' = max{x, y}  D. X' = max{x, y} and Y' = max{x, y}
ANSWER: C

36. In sorting networks, for a DECREASING COMPARATOR with input x, y select the correct output X', Y':
A. X' = min{x, y} and Y' = min{x, y}  B. X' = max{x, y} and Y' = min{x, y}  C. X' = min{x, y} and Y' = max{x, y}  D. X' = max{x, y} and Y' = max{x, y}
ANSWER: B

38. Which of the following is TRUE for a bitonic sequence? a) Monotonically increasing b) Monotonically decreasing c) With cyclic shift of indices d) First increasing then decreasing
A. a) and b)  B. a), b) and d)  C. a), b) and c)  D. a), b), c) and d)
ANSWER: D

40. Which of the following is NOT a bitonic sequence?
A. {8, 6, 4, 2, 3, 5, 7, 9}  B. {0, 4, 8, 9, 2, 1}  C. {3, 5, 7, 9, 8, 6, 4, 2}  D. {1, 2, 4, 7, 6, 0, 1}
ANSWER: D

42. The procedure of sorting a bitonic sequence using bitonic splits is called:
A. Bitonic merge  B. Bitonic split  C. Bitonic divide  D. Bitonic series
ANSWER: A

44. While mapping bitonic sort on a hypercube, compare-exchange operations take place between wires whose labels differ in:
A. One bit  B. Two bits  C. Three bits  D. Four bits
ANSWER: A

46. Which of the following is NOT a way of mapping the input wires of the bitonic sorting network to a MESH of processes?
A. Row-major mapping  B. Column-major mapping  C. Row-major snakelike mapping  D. Row-major shuffled mapping
ANSWER: B

48. Which sorting algorithm is given by the steps below?
1. procedure X_SORT(n)
2. begin
3. for i := n - 1 downto 1 do
4. for j := 1 to i do
5. compare-exchange(aj, aj+1);
6. end X_SORT
A. Selection sort  B. Bubble sort  C. Parallel selection sort  D. Parallel bubble sort
ANSWER: B

50. The odd-even transposition algorithm sorts n elements in n phases (n is even), each of which requires ------------ compare-exchange operations.
A. 2n  B. n^2  C. n/2  D. n
ANSWER: C

52. What is TRUE about SHELL SORT?
A. Moves elements only one position at a time  B. Moves elements a long distance  C. During the second phase the algorithm switches to odd-even transposition sort  D. Both B and C
ANSWER: D

54. Which is the fastest sorting algorithm?
A. Bubble sort  B. Odd-even transposition sort  C. Shell sort  D. Quick sort
ANSWER: D

56. Quicksort's performance is greatly affected by the way it partitions a sequence.
A. True  B. False
ANSWER: A

58. The pivot in quicksort can be selected as:
A. Always the first element  B. Always the last element  C. Always the middle-index element  D. A randomly selected element
ANSWER: D

60. Quicksort uses recursive decomposition.
A. True  B. False
ANSWER: A

62. In the first step of parallelizing quicksort for n elements to get subarrays, which of the following statements is TRUE?
A. Only one process is used  B. n processes are used  C. Two processes are used  D. None of the above
ANSWER: A

64. In the binary tree representation created by the execution of quicksort, the pivot is at:
A. A leaf node  B. The root of the tree  C. Any internal node  D. None of the above
ANSWER: B

66. What is the worst-case time complexity of the quicksort algorithm?
A. O(N)  B. O(N log N)  C. O(N^2)  D. O(log N)
ANSWER: C

68. What is the average running time of the quicksort algorithm?
A. O(N)  B. O(N log N)  C. O(N^2)  D. O(log N)
ANSWER: B

70. Odd-even transposition sort is a variation of:
A. Quick sort  B. Shell sort  C. Bubble sort  D. Selection sort
ANSWER: C
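
Worked sketch (added for illustration): Q48-Q50 describe compare-exchange based sorting. A minimal serial C sketch of odd-even transposition sort; the function names are illustrative.

#include <stdio.h>

// Swap a[j] and a[j+1] if out of order: the compare-exchange step.
static void compare_exchange(int a[], int j) {
    if (a[j] > a[j + 1]) { int t = a[j]; a[j] = a[j + 1]; a[j + 1] = t; }
}

// n phases, alternating even and odd starting indices.
// Each phase performs about n/2 compare-exchanges (Q50).
void odd_even_sort(int a[], int n) {
    for (int phase = 0; phase < n; phase++)
        for (int j = (phase % 2 == 0) ? 0 : 1; j + 1 < n; j += 2)
            compare_exchange(a, j);
}

int main(void) {
    int a[] = {5, 2, 8, 1, 4, 7, 3, 6};
    odd_even_sort(a, 8);
    for (int i = 0; i < 8; i++) printf("%d ", a[i]);  // 1 2 3 4 5 6 7 8
    printf("\n");
    return 0;
}
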


72. What is the average-case time complexity of odd-even transposition sort?
A. O(N log N)  B. O(N)  C. O(log N)  D. O(N^2)
ANSWER: D

74. Shell sort is an improvement on:
A. Quick sort  B. Bubble sort  C. Insertion sort  D. Selection sort
ANSWER: C

76. In parallel quicksort, the pivot is sent to processes by:
A. Broadcast  B. Multicast  C. Selective multicast  D. Unicast
ANSWER: A

78. In parallel quicksort, each process divides the unsorted list into:
A. n lists  B. 2 lists  C. 4 lists  D. n-1 lists
ANSWER: B

80. Time complexity of DFS is? (V - number of vertices, E - number of edges)
A. O(V + E)  B. O(V)  C. O(E)  D. O(V*E)
ANSWER: A

82. A person wants to visit some places. He starts from a vertex and then wants to visit every vertex till it finishes from one vertex, backtracks and then explores another vertex from the same vertex. What algorithm should he use?
A. BFS  B. DFS  C. Prim's  D. Kruskal's
ANSWER: B

84. Given an array of n elements and p processes, in the message-passing version of parallel quicksort, each process stores --------- elements of the array.
A. n*p  B. n-p  C. p/n  D. n/p
ANSWER: D

86. In parallel quicksort, the pivot selection strategy is crucial for:
A. Maintaining load balance  B. Maintaining uniform distribution of elements in process groups  C. Effective pivot selection at the next level  D. All of the above
ANSWER: D

88. In the execution of the hypercube formulation of quicksort for d = 3, split along the ----------- dimension to partition the sequence into two big blocks, one greater than the pivot and the other smaller than the pivot.
A. First  B. Second  C. Third  D. None of the above
ANSWER: C

90. Which parallel formulation of quicksort is possible?
A. Shared-address-space parallel formulation  B. Message-passing formulation  C. Hypercube formulation  D. All of the above
ANSWER: D

92. Which formulation of Dijkstra's algorithm exploits more parallelism?
A. Source-partitioned formulation  B. Source-parallel formulation  C. Partitioned-parallel formulation  D. All of the above
ANSWER: B

94. In Dijkstra's all-pairs shortest paths, each process computes the single-source shortest paths for all vertices assigned to it in the SOURCE PARTITIONED FORMULATION.
A. True  B. False
ANSWER: A

96. A complete graph is a graph in which each pair of vertices is adjacent.
A. True  B. False
ANSWER: A

98. The space required to store the adjacency matrix of a graph with n vertices is:
A. In order of n  B. In order of n log n  C. In order of n squared  D. In order of n/2
ANSWER: C

100. A graph can be represented by:
A. Identity matrix  B. Adjacency matrix  C. Sparse list  D. Sparse matrix
ANSWER: B

102. To solve the all-pairs shortest paths problem, which algorithm/s is/are used? a) Floyd's algorithm b) Dijkstra's single-source shortest paths c) Prim's algorithm d) Kruskal's algorithm
A. a) and c)  B. a) and b)  C. b) and c)  D. c) and d)
ANSWER: B

104. Simple backtracking is a depth-first search method that terminates upon finding the first solution.
A. True  B. False
ANSWER: A

106. Best-first search (BFS) algorithms can search both graphs and trees.
A. True  B. False
ANSWER: A

108. The A* algorithm is a:
A. BFS algorithm  B. DFS algorithm  C. Prim's algorithm  D. Kruskal's algorithm
ANSWER: A

110. Identify the load-balancing scheme/s:
A. Asynchronous round robin  B. Global round robin  C. Random polling  D. All of the above methods
ANSWER: D

112. An important component of best-first search (BFS) algorithms is the:
A. Open list  B. Closed list  C. Node list  D. Mode list
ANSWER: A
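
Worked sketch (added for illustration): Q102 names Floyd's algorithm for all-pairs shortest paths. A minimal C sketch over an adjacency matrix; the matrix contents and the INF sentinel are illustrative.

#include <stdio.h>

#define N 4
#define INF 1000000   // stand-in for "no edge"

// Floyd's algorithm: d[i][j] becomes the shortest i->j distance.
// Storage is O(n^2), matching Q98 on adjacency-matrix space.
void floyd(int d[N][N]) {
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (d[i][k] + d[k][j] < d[i][j])
                    d[i][j] = d[i][k] + d[k][j];
}

int main(void) {
    int d[N][N] = {
        {0,   5,   INF, 10},
        {INF, 0,   3,   INF},
        {INF, INF, 0,   1},
        {INF, INF, INF, 0}
    };
    floyd(d);
    printf("dist(0,3) = %d\n", d[0][3]);  // 9, via 0->1->2->3
    return 0;
}
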
114. A CUDA program is comprised of two primary components: a host and a _____.
A. GPU kernel  B. CPU kernel  C. OS  D. None of the above
ANSWER: A

115. The kernel code is identified by the ________ qualifier with void return type.
A. _host_  B. __global__  C. _device_  D. void
ANSWER: B

116. The kernel code is only callable by the host.
A. True  B. False
ANSWER: A

117. The kernel code is executable on the device and host.
A. True  B. False
ANSWER: B

118. Calling a kernel is typically referred to as _________.
A. Kernel thread  B. Kernel initialization  C. Kernel termination  D. Kernel invocation
ANSWER: D

119. Host code in a CUDA application can initialize a device.
A. True  B. False
ANSWER: A

120. Host code in a CUDA application can allocate GPU memory.
A. True  B. False
ANSWER: A

124. Host code in a CUDA application cannot invoke kernels.
A. True  B. False
ANSWER: B

125. CUDA offers the chevron syntax to configure and execute a kernel.
A. True  B. False
ANSWER: A

126. The BlockPerGrid and ThreadPerBlock parameters are related to the ________ model supported by CUDA.
A. Host  B. Kernel  C. Thread abstraction  D. None of the above
ANSWER: C

127. _________ is callable from the device only.
A. _host_  B. __global__  C. _device_  D. None of the above
ANSWER: C

128. ______ is callable from the host.
A. _host_  B. __global__  C. _device_  D. None of the above
ANSWER: B

129. ______ is callable from the host.
A. _host_  B. __global__  C. _device_  D. None of the above
ANSWER: A

130. CUDA supports ____________, in which code in a single thread is executed by all other threads.
A. Thread division  B. Thread termination  C. Thread abstraction  D. None of the above
ANSWER: C

131. In CUDA, a single invoked kernel is referred to as a _____.
A. Block  B. Thread  C. Grid  D. None of the above
ANSWER: C

132. A grid is comprised of ________ of threads.
A. Blocks  B. Bunches  C. Hosts  D. None of the above
ANSWER: A

133. A block is comprised of multiple _______.
A. Threads  B. Bunches  C. Hosts  D. None of the above
ANSWER: A

134. A solution to the problem of representing the parallelism in an algorithm is:
A. CUD  B. PTA  C. CDA  D. CUDA
ANSWER: D

139. Host code in a CUDA application cannot reset a device.
A. True  B. False
ANSWER: B

150. Host code in a CUDA application can transfer data to and from the device.
A. True  B. False
ANSWER: A

151. Host code in a CUDA application cannot deallocate memory on the GPU.
A. True  B. False
ANSWER: B
OptimusPrime Page 107


D. Y. Patil College of Engineering, Akurdi, Pune 411044
Department of Computer Engineering
___________________________________________________________________________________

Date: 23/07/2020
Class: BE Computer   Div: A + B   Subject: High Performance Computing
Academic Year: 2020-21   Sem: I   Exam Date: 23/07/2020

1. Select different aspects of parallelism:
A. Data intensive applications utilize high aggregate throughput  B. Server applications utilize high aggregate network bandwidth  C. Scientific applications typically utilize high processing and memory system performance  D. All of the above
ANSWER: D

2. Select the correct answer: DRAM access times have only improved at the rate of roughly ____% per year over this interval.
A. 10  B. 20  C. 40  D. 50
ANSWER: A

3. Justify why to use parallel computing:
A. The real world is massively parallel  B. Save time and/or money  C. Solve larger / more complex problems  D. Provide concurrency  E. All of the above
ANSWER: E

4. Analyze: if the second instruction has data dependencies with the first, but the third instruction does not, the first and third instructions can be co-scheduled. Which type of issue is this?
A. In-order  B. Out-of-order  C. Both of the above  D. None of the above
ANSWER: B

5. Select the parameters which capture memory system performance:
A. Latency  B. Bandwidth  C. Both of the above  D. None of the above
ANSWER: C

6. Consider the example of a fire hose. The water comes out of the hose five seconds after the hydrant is turned on. Once the water starts flowing, the hydrant delivers water at the rate of 15 gallons/second. Analyze the bandwidth and latency.
A. Bandwidth: 5 gallons/second and Latency: 15 seconds  B. Bandwidth: 5*15 gallons/second and Latency: 15 seconds  C. Bandwidth: 15 gallons/second and Latency: 5 seconds  D. Bandwidth: 3 gallons/second and Latency: 5 seconds
ANSWER: C

7. Select alternate approaches for hiding memory latency:
A. Prefetching  B. Multithreading  C. Spatial locality  D. All of the above
ANSWER: D

8. Select which clause in OpenMP is similar to private, except that the values of the variables are initialized to the corresponding values before the parallel directive.
A. private  B. firstprivate  C. shared  D. All of the above
ANSWER: B

9. The time which includes all overheads that are determined by the length of the message, like bandwidth of links, error checking and correction, etc., is called:
A. Startup time (ts)  B. Per-hop time (th)  C. Per-word transfer time (tw)  D. All of the above
ANSWER: C

10. Select in which routing technique the message is divided into packets:
A. Store-and-forward routing  B. Packet routing  C. Cut-through routing  D. In both B and C
ANSWER: D

11. Which of the following is an efficient method of cache updating?
A. Snoopy writes  B. Write through  C. Write within  D. Buffered write
ANSWER: A

12. Select which protocol is used for maintaining coherence of multiple processors:
A. Data coherence protocols  B. Commit coherence protocols  C. Recurrence  D. Cache coherence protocols
ANSWER: D

13. The misses arising from inter-processor communication are often called:
A. Coherence misses  B. Commit misses  C. Parallel processing  D. Hit rate
ANSWER: A

14. As per Flynn's classification, where may parallel processing occur?
A. In the instruction stream  B. In the data stream  C. Both of the above  D. None of the above
ANSWER: C

15. Which of the following projects of Blue Gene is not in development?
A. Blue Gene/L  B. Blue Gene/M  C. Blue Gene/P  D. Blue Gene/Q
ANSWER: B

(Mrs. Dhanashree Phalke)   (Mrs. Vaishali Kolhe)   (Dr. Kailash Shaw)   (Dr. Vinayak Kottawar)
Subject Teacher   Academic Coordinator   Dept. NBA Coordinator   HOD Computer
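
Worked sketch (added for illustration): Q8 above contrasts OpenMP's private and firstprivate clauses. A minimal C sketch, assuming an OpenMP-enabled compiler; the variable name x is illustrative.

#include <stdio.h>
#include <omp.h>

int main(void) {
    int x = 42;

    // firstprivate: each thread gets its own copy of x, initialized to 42.
    // With plain private, the per-thread copies would start uninitialized.
    #pragma omp parallel firstprivate(x) num_threads(2)
    {
        x += omp_get_thread_num();   // modifies the thread-local copy only
        printf("thread %d sees x = %d\n", omp_get_thread_num(), x);
    }

    printf("after region, x = %d\n", x);  // still 42: copies are discarded
    return 0;
}
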


D. Y. Patil College of Engineering, Akurdi, Pune 411044
Department of Computer Engineering
___________________________________________________________________________________

Unit Test II   Date: 26/08/2020
Class: BE Computer   Div: A + B   Subject: High Performance Computing
Academic Year: 2020-21   Sem: I   Exam Date: 26/08/2020

1. Task interaction graphs represent _____ dependencies, whereas task dependency graphs represent ______ dependencies.
A. control, data  B. task, data  C. process, control  D. data, control
ANSWER: D

2. Select the correct answer: which graph represents tasks as nodes and their interactions/data exchange as edges?
A. Task dependency graph  B. Process dependency graph  C. Process interaction graph  D. Task interaction graph
ANSWER: D

3. The average number of tasks that can be processed in parallel over the execution of the program is called the _______________.
A. Average degree of concurrency  B. Degree of concurrency  C. Critical path length  D. Maximum concurrency
ANSWER: A

4. The number of tasks that can be executed in parallel is the ____________ of a decomposition.
A. Average concurrency  B. Degree of concurrency  C. Critical path length  D. Maximum concurrency
ANSWER: B

5. A decomposition can be illustrated in the form of a directed graph with nodes corresponding to tasks and edges indicating that the result of one task is required for processing the next. Such a graph is called a _________.
A. Process dependency graph  B. Task dependency graph  C. Task interaction graph  D. Process interaction graph
ANSWER: B

6. In which case does the owner-computes rule imply that the output is computed by the process to which the output data is assigned?
A. Input data decomposition  B. Output data decomposition  C. Both of the above  D. None of the above
ANSWER: B

7. Select relevant task characteristics from the options given below:
A. Task generation  B. Task sizes  C. Size of data associated with tasks  D. All of the above
ANSWER: D

8. A classic example of game playing - each 15-puzzle board - is an example of ___________.
A. Static task generation  B. Dynamic task generation  C. None of the above  D. All of the above
ANSWER: B

9. Analyze the task interaction pattern of the multiplication of a sparse matrix with a vector:
A. Static regular interaction pattern  B. Static irregular interaction pattern  C. Dynamic regular interaction pattern  D. Dynamic irregular interaction pattern
ANSWER: B

10. Select the methods for containing interaction overheads:
A. Maximize data locality  B. Minimize volume of data exchange  C. Minimize frequency of interactions  D. Minimize contention and hot spots  E. All of the above
ANSWER: E

11. Which model is equally suitable to shared-address-space or message-passing paradigms, since the interaction is naturally two-way?
A. Work pool model  B. Master-slave model  C. Data parallel model  D. Producer-consumer or pipeline model
ANSWER: B

12. In which type of model are tasks dynamically assigned to the processes for balancing the load?
A. Work pool model  B. Master-slave model  C. Data parallel model  D. Producer-consumer or pipeline model
ANSWER: A

13. Select the appropriate stage of the GPU pipeline which receives commands from the CPU and also pulls geometry information from system memory.
A. Pixel processing  B. Vertex processing  C. Memory interface  D. Host interface
ANSWER: D

14. Select the hardware specifications which most affect a GPU card's speed:
A. GPU clock speed  B. Size of memory bus  C. Amount of available memory  D. Memory clock rate  E. All of the above
ANSWER: E

15. Select the appropriate stage of the GPU pipeline where computations include texture mapping and math operations.
A. Pixel processing  B. Vertex processing  C. Memory interface  D. Host interface
ANSWER: A

(Mrs. Dhanashree Phalke)   (Mrs. Vaishali Kolhe)   (Dr. Kailash Shaw)   (Dr. Vinayak Kottawar)
Subject Teacher   Academic Coordinator   Dept. NBA Coordinator   HOD Computer
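
Worked sketch (added for illustration): Q9 above classifies sparse matrix-vector multiplication as a static but irregular interaction pattern. A minimal serial C sketch using the CSR format; the matrix values are illustrative.

#include <stdio.h>

// Sparse matrix-vector multiply, y = A*x, with A in CSR form.
// The access pattern depends on where the nonzeros fall, which is
// why the interaction pattern is static yet irregular (Q9).
void spmv_csr(int n, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y) {
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}

int main(void) {
    // 3x3 matrix [[4,0,1],[0,2,0],[3,0,5]] in CSR.
    int row_ptr[] = {0, 2, 3, 5};
    int col_idx[] = {0, 2, 1, 0, 2};
    double val[]  = {4, 1, 2, 3, 5};
    double x[] = {1, 1, 1}, y[3];
    spmv_csr(3, row_ptr, col_idx, val, x, y);
    printf("%g %g %g\n", y[0], y[1], y[2]);  // 5 2 8
    return 0;
}
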


D. Y. Patil College of Engineering, Akurdi, Pune 411044
Department of Computer Engineering
___________________________________________________________________________________

Unit Test III   Date: 14/10/2020
Class: BE Computer   Div: A   Subject: High Performance Computing
Academic Year: 2020-21   Sem: I   Exam Date: 14/10/2020

1. In all-to-one reduction, data items must be combined piece-wise and the result made available at a _________ processor.
A. First  B. Last  C. Target  D. N-1
ANSWER: C

2. Analyze the cost of scatter and gather:
A. T = tw log p + ts m (p-1)  B. T = ts log p + tw m (p-1)  C. T = ts log p - tw m (p-1)  D. T = tw log p - ts m (p-1)
ANSWER: B

3. All-to-all personalized communication is also known as _______________.
A. Partial exchange  B. Total exchange  C. Both of the above  D. None of the above
ANSWER: B

4. All-to-all personalized communication is performed independently in each row with clustered messages of size _______ on a mesh.
A. m  B. p  C. m√p  D. p√m
ANSWER: C

5. In all-to-all personalized communication on a ring, the size of the message reduces by ______ at each step.
A. m  B. p  C. m-1  D. p-1
ANSWER: A

6. The all-to-all broadcast and reduction algorithm on a ring terminates in _________ steps.
A. p  B. p+1  C. p-1  D. p*p
ANSWER: C

7. In all-to-all broadcast on a mesh, in which sequence does the operation perform?
A. Rowwise, rowwise  B. Rowwise, columnwise  C. Columnwise, rowwise  D. Columnwise, columnwise
ANSWER: B

8. In the ________ operation, a single node sends a unique message of size m to every other node.
A. Scatter  B. Gather
ANSWER: A

9. In the _____ operation, a single node collects a unique message from each node.
A. Scatter  B. Gather
ANSWER: B

10. Messages get smaller in _________ and stay constant in _________.
A. Broadcast, gather  B. Gather, broadcast  C. Scatter, broadcast  D. Scatter, gather
ANSWER: C

11. The time taken by all-to-all broadcast on a ring is ______.
A. T = 2ts(√p - 1) + tw m(p-1)  B. T = (ts + tw m)(p-1)  C. T = ts log p + tw m(p-1)  D. T = 2ts(√p - 1) - tw m(p-1)
ANSWER: B

12. The time taken by all-to-all broadcast on a mesh is ______.
A. T = 2ts(√p - 1) + tw m(p-1)  B. T = (ts + tw m)(p-1)  C. T = ts log p + tw m(p-1)  D. T = 2ts(√p - 1) - tw m(p-1)
ANSWER: A

13. The time taken by all-to-all broadcast on a hypercube is ______.
A. T = 2ts(√p - 1) + tw m(p-1)  B. T = (ts + tw m)(p-1)  C. T = ts log p + tw m(p-1)  D. T = 2ts(√p - 1) - tw m(p-1)
ANSWER: C

14. _____ is a special permutation in which node i sends a data packet to node (i + q) mod p in a p-node ensemble (0 ≤ q ≤ p).
A. Left shift  B. Right shift  C. Circular shift  D. Linear shift
ANSWER: C

15. The prefix-sum operation can be implemented using the _______ kernel.
A. All-to-all reduction  B. All-to-all broadcast  C. One-to-all broadcast  D. All-to-one broadcast
ANSWER: B

(Mrs. Dhanashree Phalke)   (Mrs. Vaishali Kolhe)   (Dr. Kailash Shaw)   (Dr. Vinayak Kottawar)
Subject Teacher   Academic Coordinator   Dept. NBA Coordinator   HOD Computer
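
Worked sketch (added for illustration): Q14 above defines the circular q-shift. A minimal C sketch using MPI's combined send-receive to avoid deadlock; q = 1 and the payload are illustrative.

#include <stdio.h>
#include <mpi.h>

// Circular q-shift: every rank sends to (rank + q) mod p and
// receives from (rank - q + p) mod p, as in Q14.
int main(int argc, char **argv) {
    int rank, p, q = 1, sendval, recvval;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    sendval = rank * 100;   // illustrative payload
    MPI_Sendrecv(&sendval, 1, MPI_INT, (rank + q) % p, 0,
                 &recvval, 1, MPI_INT, (rank - q + p) % p, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d received %d\n", rank, recvval);
    MPI_Finalize();
    return 0;
}
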


D. Y. Patil College of Engineering, Akurdi, Pune 411044
Department of Computer Engineering
___________________________________________________________________________________

Unit Test IV   Date: 09/11/2020
Class: BE Computer   Div: A   Subject: High Performance Computing
Academic Year: 2020-21   Sem: I   Exam Date: 11/11/2020

1. Select the parameters on which the parallel runtime of a program depends:
A. Input size  B. Number of processors  C. Communication parameters of the machine  D. All of the above
ANSWER: D

2. The time that elapses from the moment the first processor starts to the moment the last processor finishes execution is called ___________.
A. Serial runtime  B. Parallel runtime  C. Overhead runtime  D. Excess runtime
ANSWER: B

3. Select how the overhead function (To) is calculated:
A. To = TP - TS  B. To = p*n TP - TS  C. To = p TP - TS  D. To = TP - p TS
ANSWER: C

4. What is the ratio of the time taken to solve a problem on a single processor to the time required to solve the same problem on a parallel computer with p identical processing elements?
A. Efficiency  B. Overall time  C. Speedup  D. Scaleup
ANSWER: C

5. The parallel time for odd-even sort (efficient parallelization of bubble sort) is 50 seconds. The serial time for bubble sort is 175 seconds. Evaluate the speedup of bubble sort.
A. 3.75  B. 3.5  C. 0.33  D. 0.26
ANSWER: B

6. Consider the problem of adding n numbers by using n processing elements. The serial time taken is Θ(n) and the parallel time is Θ(log n). Evaluate the efficiency.
A. E = Θ(n / log n)  B. E = Θ(n log n)  C. E = Θ(log n / n)  D. E = Θ(1 / log n)
ANSWER: D

7. What will be the efficiency of cost-optimal parallel systems?
A. E = O(n)  B. E = O(1)  C. E = O(p)  D. E = O(n log n)
ANSWER: B

8. Which law states that the maximum speedup of a parallel program is limited by the sequential fraction of the initial sequential program?
A. Amdahl's Law  B. Flynn's Law  C. Moore's Law  D. Von Neumann's Law
ANSWER: A

9. Arrange the steps for matrix-vector 2-D partitioning:
i) The result vector is computed by performing an all-to-one reduction along the columns.
ii) Alignment of the vector x along the principal diagonal of the matrix.
iii) Copy the vector elements from each diagonal process to all the processes in the corresponding column using n simultaneous broadcasts among all processors in the column.
A. i, ii, iii  B. ii, iii, i  C. iii, i, ii  D. ii, i, iii
ANSWER: B

10. Arrange the communication sequence in matrix-vector 2-D partitioning:
i) All-to-one reduction in each row
ii) One-to-all broadcast of each vector element among the n processes of each column
iii) One-to-one communication to align the vector along the main diagonal
A. i, ii, iii  B. ii, iii, i  C. iii, ii, i  D. ii, i, iii
ANSWER: C

11. Parallel time in rowwise 1-D partitioning of matrix-vector multiplication where p = n is ____.
A. Θ(1)  B. Θ(n log n)  C. Θ(n^2)  D. Θ(n)
ANSWER: D

12. What are the sources of overhead in parallel programs?
A. Interprocess interaction  B. Idling  C. Excess computation  D. All of the above
ANSWER: D

13. What are the performance metrics of parallel systems?
A. Execution time  B. Total parallel overhead  C. Speedup  D. Efficiency  E. All of the above
ANSWER: E

14. The isoefficiency function determines the ease with which a parallel system can maintain a constant efficiency. True or false?
A. True  B. False
ANSWER: A

15. Which matrix-matrix multiplication algorithm uses a 3-D partitioning?
A. Cannon's algorithm  B. DNS algorithm  C. Both of the above  D. None of the above
ANSWER: B

(Mrs. Dhanashree Phalke)   (Mrs. Vaishali Kolhe)   (Dr. Kailash Shaw)   (Dr. Vinayak Kottawar)
Subject Teacher   Academic Coordinator   Dept. NBA Coordinator   HOD Computer
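
Worked sketch (added for illustration): Q8 above names Amdahl's law. A minimal C sketch evaluating the bound; the serial fraction f = 0.1 is an illustrative value.

#include <stdio.h>

// Amdahl's law (Q8): with serial fraction f, the speedup on p
// processors is bounded by S(p) = 1 / (f + (1 - f)/p).
double amdahl(double f, int p) {
    return 1.0 / (f + (1.0 - f) / p);
}

int main(void) {
    double f = 0.1;   // illustrative: 10% of the work is sequential
    for (int p = 1; p <= 64; p *= 4)
        printf("p = %2d -> speedup %.2f\n", p, amdahl(f, p));
    // As p grows, the speedup approaches 1/f = 10 but never exceeds it.
    return 0;
}
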


D. Y. Patil College of Engineering, Akurdi, Pune 411044
Department of Computer Engineering
___________________________________________________________________________________

Prelim Exam   Date: 29/12/2020
Class: BE Computer   Div: A & B   Subject: High Performance Computing
Academic Year: 2020-21   Sem: I   Exam Date: 31/12/2020

1. Which of the following is a type of parallelism?
A. Bit-level parallelism  B. Instruction-level parallelism  C. Loop-level parallelism  D. All of the above
ANSWER: D

2. Which parallelism is used by VLIW?
A. Bit-level parallelism  B. Instruction-level parallelism  C. Loop-level parallelism  D. Task-level parallelism
ANSWER: B

3. The tendency of a software process to access information items whose addresses are near one another is known as:
A. Spatial locality  B. Temporal locality  C. Permanent locality  D. Sequential locality
ANSWER: A

4. Parallel computers are classified based on Flynn's taxonomy. Which among the following options does not come under it?
A. SISD  B. SIMD  C. MIMD  D. SIPD
ANSWER: D

5. Which among the following is the popular multistage network?
A. Hypercube  B. Omega  C. Gamma  D. K-D Mesh
ANSWER: B

6. The multicore architecture that consists of dedicated application-specific processor cores, targeting the issue of running a variety of applications on a computer, is:
A. Homogeneous core architecture  B. Heterogeneous core architecture  C. Polaris core architecture  D. None of the above
ANSWER: B

7. Decomposition of a computation into a small number of large tasks is:
A. Fine-grained granularity  B. Course-grained granularity  C. Coarse-grained granularity  D. Task-grained granularity
ANSWER: C

8. Which among the following is a type of decomposition?
A. Data decomposition  B. Hybrid decomposition  C. Speculative decomposition  D. All of the above
ANSWER: D

9. The 15-puzzle problem uses which type of decomposition?
A. Data decomposition  B. Exploratory decomposition  C. Speculative decomposition  D. Recursive decomposition
ANSWER: B

10. An interaction pattern is considered to be _______ if it has some structure that can be exploited for efficient implementation.
A. Structured interaction  B. Unstructured interaction  C. Regular interaction  D. Irregular interaction
ANSWER: C

11. The mapping in which tasks are distributed to processes during execution is called ___.
A. Dynamic mapping  B. Static mapping  C. Pre-execution mapping  D. In-process mapping
ANSWER: A

12. The parallel algorithm model in which mapping of tasks is done dynamically, where pointers to tasks are stored in a physically shared list / priority queue / hash table / tree, is called the:
A. Data parallel model  B. Producer-consumer model  C. Task graph model  D. Work pool model
ANSWER: D

13. The world's first GPU, marketed by NVIDIA in 1999, is the:
A. GeForce 356  B. GeForce 256  C. GeForce 3800  D. GeForce 956
ANSWER: B

14. The operation in which data from all processes are combined at a single destination process is:
A. All-to-one reduction  B. All-to-all reduction  C. One-to-all reduction  D. None of the above
ANSWER: A

15. In the scatter operation a single node sends a unique message to every node; it is also called:
A. One-to-one personalized communication  B. One-to-all broadcast communication  C. One-to-all personalized communication  D. All-to-all personalized communication
ANSWER: C

16. A single-port communication node can communicate on all the channels connected to it and provides apparent speedup.
A. True  B. False
ANSWER: B

17. Symmetric multiprocessor architectures are sometimes known as:
A. Uniform memory access  B. Static memory access  C. Variable memory access  D. All of the above
ANSWER: A

18. A heuristic is a way of trying:
A. To discover something or an idea embedded in a program  B. To search and measure how far a node in a search tree seems to be from a goal  C. To compare two nodes in a search tree to see if one is better than another  D. All of the mentioned
ANSWER: A

19. The A* algorithm is based on:
A. Breadth-first search  B. Depth-first search  C. Best-first search  D. Hill climbing
ANSWER: C

20. Best-first search can be implemented using the following data structure:
A. Queue  B. Stack  C. Priority queue  D. Circular queue
ANSWER: C

21. _____ is a measure of the fraction of time for which a processing element is usefully employed.
A. Scalability  B. Efficiency  C. Speedup  D. Isoefficiency
ANSWER: B

22. The ___ of a parallel system is a measure of its capacity to increase speedup in proportion to the number of processing elements.
A. Speedup  B. Cost  C. Efficiency  D. Scalability
ANSWER: D

23. ___ helps us determine the best algorithm/architecture combination for a particular problem without explicitly analyzing all possible combinations under all possible conditions.
A. Isoefficiency metric of scalability  B. Efficiency metric of scalability  C. Cost metric of scalability  D. None of the above
ANSWER: A

24. It is defined as the ratio of the time taken to solve a problem on a single processing element to the time taken on a parallel computer with p identical processing elements:
A. Total parallel overhead  B. Efficiency  C. Cost  D. Speedup
ANSWER: D

25. In practice, a speedup greater than p is sometimes observed. It is called the _______.
A. Scalability effect  B. Superscalar effect  C. Superlinearity effect  D. Speedup effect
ANSWER: C

26. Odd-even transposition sort is not cost-optimal, because its process-time product is:
A. Θ(n^2)  B. Θ(n log n)  C. O(n^3)  D. O(n + log n)
ANSWER: A

27. The quicksort algorithm has an average complexity of:
A. O(n^3)  B. O(n + log n)  C. Θ(n log n)  D. Θ(n^2)
ANSWER: C

28. Parallel code executes in many concurrent device (GPU) threads across multiple parallel processing elements, called:
A. Synchronising multiprocessors  B. Streaming multiprocessors  C. Scalable multiprocessors  D. Summative multiprocessors
ANSWER: B

29. ____ partitions the vertices among different processes and has each process compute the single-source shortest path for all vertices assigned to it.
A. Source-parallel formulation  B. Single-partitioned formulation  C. Source-partitioned formulation  D. Shortest-path-partitioned formulation
ANSWER: C

30. A processor assigned a thread block that executes code is usually called a:
A. Multithreaded DIMS processor  B. Multithreaded SIMD processor  C. Multithreaded queue  D. Multithreaded stack
ANSWER: B

31. The processor of a system which can read/write GPU memory is known as the:
A. Server  B. Kernel  C. Guest  D. Host
ANSWER: D

32. CUDA stands for:
A. Compute uniform device architecture  B. Computing universal device architecture  C. Computer unicode device architecture  D. Compute unified device architecture
ANSWER: D

33. The devices that are used primarily for database, file server, and web applications are known as:
A. Servers  B. Desktops  C. Tablets  D. Supercomputers
ANSWER: A

34. GPUs are designed for running a large number of complex tasks.
A. True  B. False
ANSWER: B

35. The parallel algorithm design contains a number of processes where one process may send identical data to all other processes; this is called:
A. All-to-one broadcast  B. All-to-all broadcast  C. One-to-all broadcast  D. None of these
ANSWER: C

36. Efficient utilization can be achieved by devising a broadcasting algorithm with the method known as:
A. Recursive doubling  B. Recursive  C. Scatter and gather  D. None of these
ANSWER: A

37. The balanced tree is mapped naturally from the hypercube algorithm for one-to-all broadcast, where the intermediate nodes are the ________ and the leaf nodes are the __________.
A. Switching nodes, processing nodes  B. Processing nodes, switching nodes
ANSWER: A

38. The prefix-sum operation is also called the scan operation.
A. True  B. False
ANSWER: A

39. All-to-all personalized communication is also called the:
A. Scan operation  B. Total exchange method  C. None of these
ANSWER: B

40. On which network are broadcast and reduction operations performed in two steps: 1. operations along the rows; 2. operations along the columns?
A. Ring  B. Hypercube  C. Linear array  D. Mesh
ANSWER: D

41. The gather operation is also called all-to-one reduction.
A. True  B. False
ANSWER: B

42. The method used in various parallel algorithms like the Fourier transform, matrix transpose, and some parallel database join operations is called:
A. All-to-all personalized communication  B. All-to-all broadcast  C. Total exchange method  D. Both A and C
ANSWER: D

43. Consider a sequence in which numbers are originally arranged as <2, 4, 5, 6, 1>; then the sequence of prefix sums will be:
A. <2, 6, 11, 17, 18>  B. <6, 15, 21, 22>  C. None of these
ANSWER: A

44. Select the parameters on which the parallel runtime of a program depends:
A. Input size  B. Number of processors  C. Communication parameters of the machine  D. All of the above
ANSWER: D

45. The time that elapses from the moment the first processor starts to the moment the last processor finishes execution is called ___________.
A. Serial runtime  B. Parallel runtime  C. Overhead runtime  D. Excess runtime
ANSWER: B

46. Select how the overhead function (To) is calculated:
A. To = TP - TS  B. To = p*n TP - TS  C. To = p TP - TS  D. To = TP - p TS
ANSWER: C

47. The parallel time for odd-even sort (efficient parallelization of bubble sort) is 50 seconds. The serial time for bubble sort is 175 seconds. Evaluate the speedup of bubble sort.
A. 3.75  B. 3.5  C. 0.33  D. 0.26
ANSWER: B

48. Consider the problem of adding n numbers by using n processing elements. The serial time taken is Θ(n) and the parallel time is Θ(log n). Evaluate the efficiency.
A. E = Θ(n / log n)  B. E = Θ(n log n)  C. E = Θ(log n / n)  D. E = Θ(1 / log n)
ANSWER: D

49. What will be the efficiency of cost-optimal parallel systems?
A. E = O(n)  B. E = O(1)  C. E = O(p)  D. E = O(n log n)
ANSWER: B

50. Which law states that the maximum speedup of a parallel program is limited by the sequential fraction of the initial sequential program?
A. Amdahl's Law  B. Flynn's Law  C. Moore's Law  D. Von Neumann's Law
ANSWER: A

51. Arrange the steps for matrix-vector 2-D partitioning:
i) The result vector is computed by performing an all-to-one reduction along the columns.
ii) Alignment of the vector x along the principal diagonal of the matrix.
iii) Copy the vector elements from each diagonal process to all the processes in the corresponding column using n simultaneous broadcasts among all processors in the column.
A. i, ii, iii  B. ii, iii, i  C. iii, i, ii  D. ii, i, iii
ANSWER: B

52. Arrange the communication sequence in matrix-vector 2-D partitioning:
i) All-to-one reduction in each row
ii) One-to-all broadcast of each vector element among the n processes of each column
iii) One-to-one communication to align the vector along the main diagonal
A. i, ii, iii  B. ii, iii, i  C. iii, ii, i  D. ii, i, iii
ANSWER: C

53. Parallel time in rowwise 1-D partitioning of matrix-vector multiplication where p = n is ____.
A. Θ(1)  B. Θ(n log n)  C. Θ(n^2)  D. Θ(n)
ANSWER: D

54. NVIDIA thought that the 'unifying theme' of every form of parallelism is the:
A. CDA thread  B. PTA thread  C. CUDA thread  D. CUD thread
ANSWER: C

55. Threads being blocked altogether and executed in sets of 32 threads are called a:
A. Thread block  B. 32 thread  C. 32 block  D. Unit block
ANSWER: A

56. The length of a vector operation in a real program is often:
A. Known  B. Unknown  C. Visible  D. Invisible
ANSWER: A

57. A code, known as a grid, which runs on a GPU consists of a set of:
A. 32 threads  B. Unit blocks  C. 32 blocks  D. Thread blocks
ANSWER: D

58. NVIDIA unveiled the industry's first DirectX 10 GPU, the ___.
A. GTX 1050  B. GeForce 8800 GTX  C. GeForce GTX 1080  D. GTX 1060
ANSWER: B

59. The number of instructions being executed defines the:
A. Instruction count  B. Hit time  C. Clock rate  D. All of the above
ANSWER: A

60. In CUDA programming, a kernel is launched using which pair of brackets?
A. <<< >>>  B. {{{ }}}  C. ((( )))  D. [[[ ]]]
ANSWER: A

61. In CUDA programming, the special function used to transfer data between host and device is ___.
A. Memcopy()  B. Memorycpy()  C. cudaMemcpy()  D. cudaMemorycpy()
ANSWER: C

62. The streaming multiprocessor in CUDA divides the threads in a block into ___.
A. Warps  B. Packets  C. Grids  D. Thread blocks
ANSWER: A

63. Sources of overheads in a parallel program are:
A. Idling  B. Interprocess communication  C. Excess computation  D. All of the above
ANSWER: D

64. What are the sources of overhead in parallel programs?
A. Interprocess interaction  B. Idling  C. Excess computation  D. All of the above
ANSWER: D

65. What are the performance metrics of parallel systems?
A. Execution time  B. Total parallel overhead  C. Speedup  D. Efficiency  E. All of the above
ANSWER: E

66. The isoefficiency function determines the ease with which a parallel system can maintain a constant efficiency. True or false?
A. True  B. False
ANSWER: A

67. Which matrix-matrix multiplication algorithm uses a 3-D partitioning?
A. Cannon's algorithm  B. DNS algorithm  C. Both of the above  D. None of the above
ANSWER: B

68. A solution representing the parallelism in an algorithm is:
A. CDA  B. PTA  C. CUDA  D. CUD
ANSWER: C

69. Blocking optimization is used to improve temporal locality, to reduce:
A. Hit miss  B. Misses  C. Hit rate  D. Cache misses
ANSWER: B

70. Data are allocated to disks in RAID at the:
A. Block level  B. Cache level  C. Low level  D. High level
ANSWER: A

71. In CUDA C programming, serial code is executed by the __ and parallel code is executed by the __.
A. CPU, CPU  B. GPU, CPU  C. GPU, GPU  D. CPU, GPU
ANSWER: D

72. A kernel function is qualified by the qualifier:
A. __local__  B. __universal__  C. __global__  D. A or C
ANSWER: C

(Mrs. D. A. Phalke & Mrs. Neha D. Patil)   (Mrs. Vaishali Kolhe)   (Dr. Kailash Shaw)   (Dr. Vinayak Kottawar)
Subject Teacher   Academic Coordinator   Dept. NBA Coordinator   HOD Computer
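
Worked sketch (added for illustration): Q38 and Q43 above concern the prefix-sum (scan) operation. A minimal serial C sketch that reproduces the sequence from Q43.

#include <stdio.h>

// Inclusive prefix sum (scan): out[i] = in[0] + ... + in[i].
void prefix_sum(const int *in, int *out, int n) {
    int running = 0;
    for (int i = 0; i < n; i++) {
        running += in[i];
        out[i] = running;
    }
}

int main(void) {
    int in[] = {2, 4, 5, 6, 1}, out[5];   // input from Q43
    prefix_sum(in, out, 5);
    for (int i = 0; i < 5; i++) printf("%d ", out[i]);  // 2 6 11 17 18
    printf("\n");
    return 0;
}
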


UNIT ONE   SUB: 410241 HPC

1. A pipeline is like ____________.
A. An automobile assembly line  B. A house pipeline  C. Both A and B  D. A gas line
ANSWER: A

2. Data hazards occur when _____________.
A. Greater performance loss  B. The pipeline changes the order of read/write access to operands  C. Some functional unit is not fully pipelined  D. Machine size is limited
ANSWER: B

3. Systems that do not have parallel processing capabilities are:
A. SISD  B. SIMD  C. MIMD  D. All of the above
ANSWER: A

4. How does the number of transistors per chip increase according to Moore's law?
A. Quadratically  B. Linearly  C. Cubicly  D. Exponentially
ANSWER: D

5. Parallel processing may occur:
A. In the instruction stream  B. In the data stream  C. Both A and B  D. None of the above
ANSWER: C

6. Execution of several activities at the same time:
A. Processing  B. Parallel processing  C. Serial processing  D. Multitasking
ANSWER: B

7. Cache memory works on the principle of:
A. Locality of data  B. Locality of memory  C. Locality of reference  D. Locality of reference & memory
ANSWER: C

8. SIMD represents an organization that ______________.
A. Refers to a computer system capable of processing several programs at the same time  B. Represents the organization of a single computer containing a control unit, processor unit and a memory unit  C. Includes many processing units under the supervision of a common control unit  D. None of the above
ANSWER: C

9. A processor performing fetch or decoding of a different instruction during the execution of another instruction is called ______.
A. Super-scaling  B. Pipelining  C. Parallel computation  D. None of these
ANSWER: B

10. A general MIMD configuration is usually called:
A. A multiprocessor  B. A vector processor  C. An array processor  D. None of the above
ANSWER: A

11. A Von Neumann computer uses which one of the following?
A. SISD  B. SIMD  C. MISD  D. MIMD
ANSWER: A

12. MIMD stands for:
A. Multiple instruction multiple data  B. Multiple instruction memory data  C. Memory instruction multiple data  D. Multiple information memory data
ANSWER: A

13. MIPS stands for:
A. Memory Instruction Per Second  B. Major Instruction Per Second  C. Main Information Per Second  D. Million Instructions Per Second
ANSWER: D

14. M. J. Flynn's parallel processing classification is based on:
A. Multiple instructions  B. Multiple data  C. Both A and B  D. None of the above
ANSWER: C

15. VLIW stands for:
A. Vector Large Instruction Word  B. Very Long Instruction Word  C. Very Large Integrated Word  D. Very Low Integrated Word
ANSWER: B

16. The major disadvantage of a pipeline is:
A. High cost of the individual dedicated pipe  B. Initial setup time  C. If a branch instruction is encountered the pipe has to be flushed  D. All of the above
ANSWER: C

17. A topology that involves tokens:
A. Star  B. Ring  C. Bus  D. Daisy chaining
ANSWER: B

18. A multipoint topology is:
A. Bus  B. Star  C. Mesh  D. Ring
ANSWER: A

19. In super-scalar mode, all the similar instructions are grouped and executed together.
A. True  B. False
ANSWER: A

20. Which mechanism performs an analysis on the code to determine which data items may become unsafe for caching, and marks those items accordingly?
A. Directory protocol  B. Snoopy protocol  C. Server-based cache coherence  D. Compiler-based cache coherence
ANSWER: D

21. How many processors can be organized in a 5-dimensional binary hypercube system?
A. 25  B. 10  C. 32  D. 20
ANSWER: C

22. Multiprocessors are classified as ________.
A. SIMD  B. MIMD  C. SISD  D. MISD
ANSWER: B

23. Which of the following is not one of the interconnection structures?
A. Crossbar switch  B. Hypercube system  C. Single port memory  D. Time-shared common bus
ANSWER: C

24. Which combinational device is used in a crossbar switch for selecting the proper memory from multiple addresses?
A. Multiplexer  B. Decoder  C. Encoder  D. Demultiplexer
ANSWER: A

25. How many switch points are there in a crossbar switch network that connects 9 processors to 6 memory modules?
A. 50  B. 63  C. 60  D. 54
ANSWER: D

26. In a three-cube structure, node 101 cannot communicate directly with node:
A. 001  B. 011  C. 100  D. 111
ANSWER: B

27. Which method is used as an alternative to the snooping-based coherence protocol?
A. Directory protocol  B. Memory protocol  C. Compiler-based protocol  D. None of the above
ANSWER: A

28. Snoopy cache protocols are used in ----------------- based systems.
A. Bus  B. Mesh  C. Star  D. Hypercube
ANSWER: A

29. A superscalar architecture contains ------------- execution units for instruction execution.
A. Multiple  B. Single  C. None of the above
ANSWER: A

30. The time taken by the header of a message between two directly connected nodes is called -----------------.
A. Startup time  B. Per-hop time  C. Per-word transfer time  D. Packaging time
ANSWER: B

31. The number of switches required for a network with n inputs and n outputs is ------------------.
A. n  B. n^2  C. n^3  D. n^4
ANSWER: B

32. Which of the following is not a static network?
A. Bus  B. Ring  C. Mesh  D. Crossbar switch
ANSWER: D

33. In super-scalar processors, ________ mode of execution is used.
A. In-order  B. Post order  C. Out-of-order  D. None of the mentioned
ANSWER: C

34. ______ have been developed specifically for pipelined systems.
A. Utility software  B. Speed-up utilities  C. Optimizing compilers  D. None of the above
ANSWER: C

35. Which of the following is a combination of several processors on a single chip?
A. Multicore architecture  B. RISC architecture  C. CISC architecture  D. Subword parallelism
ANSWER: A

36. The important feature of the VLIW is .....
A. ILP  B. Cost effectiveness  C. Performance  D. None of the mentioned
ANSWER: A

37. The parallel execution of operations in VLIW is done according to the schedule determined by the .....
A. Task scheduler  B. Interpreter  C. Compiler  D. Encoder
ANSWER: C

38. VLIW processors are much simpler as they do not require .....
A. Computational registers  B. Complex logic circuits  C. SSD slots  D. Scheduling hardware
ANSWER: D

39. The VLIW architecture follows the ..... approach to achieve parallelism.
A. MISD  B. SISD  C. SIMD  D. MIMD
ANSWER: D

40. Which of the following is not a pipeline conflict?
A. Timing variations  B. Branching  C. Load balancing  D. Data dependency
ANSWER: C
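
Worked sketch (added for illustration): Q21 and Q26 above rest on hypercube addressing. A minimal C sketch that lists a node's neighbors by flipping one address bit; d = 3 and node 101 are the values from Q26.

#include <stdio.h>

// In a d-dimensional binary hypercube (2^d nodes, Q21), the
// neighbors of a node are found by flipping one address bit.
int main(void) {
    int d = 3, node = 5;              // node 101 in a three-cube (Q26)
    printf("neighbors of %d:", node);
    for (int bit = 0; bit < d; bit++)
        printf(" %d", node ^ (1 << bit));   // 100, 111, 001
    printf("\n");   // prints 4 7 1 -> node 011 (3) is not a direct neighbor
    return 0;
}
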


UNIT SUB : 410241 HPC
TWO

Sr. Questions a b c d Answer


No.

1. A task dependency graph is ______.  a) directed  b) undirected  c) directed acyclic  d) undirected acyclic.  Answer: c
2. In a task dependency graph, the longest directed path between any pair of start and finish nodes is called the ______.  a) total work  b) critical path  c) task path  d) task length.  Answer: b
3. Which of the following is not a granularity type?  a) coarse grain  b) large grain  c) medium grain  d) fine grain.  Answer: b
4. Which of the following is an example of data decomposition?  a) matrix multiplication  b) merge sort  c) quick sort  d) 15-puzzle.  Answer: a
5. Which problems can be handled by recursive decomposition?  a) backtracking  b) greedy method  c) divide-and-conquer problems  d) branch and bound.  Answer: c
6. In this decomposition, problem decomposition goes hand in hand with its execution.  a) data decomposition  b) recursive decomposition  c) explorative decomposition  d) speculative decomposition.  Answer: c
7. Which of the following is not an example of explorative decomposition?  a) n-queens problem  b) 15-puzzle problem  c) tic tac toe  d) quick sort.  Answer: d
8. Topological sort can be applied to which of the following graphs?  a) Undirected cyclic graphs  b) Directed cyclic graphs  c) Undirected acyclic graphs  d) Directed acyclic graphs.  Answer: d
9. In most cases, topological sort starts from a node which has ______.  a) maximum degree  b) minimum degree  c) any degree  d) zero degree.  Answer: d
10. Which of the following is not an application of topological sorting?  a) Finding prerequisites of a task  b) Finding deadlock in an operating system  c) Finding a cycle in a graph  d) Ordered statistics.  Answer: d
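Questions 8-10 lean on how a topological ordering is actually produced; a minimal sketch of Kahn's algorithm (my illustration, not part of the question bank) makes the zero in-degree intuition of question 9 concrete:

#include <queue>
#include <vector>
using namespace std;

// Kahn's algorithm: topological sort of a DAG given as adjacency lists.
// Returns an empty vector if the graph contains a cycle.
vector<int> topoSort(int n, const vector<vector<int>>& adj) {
    vector<int> indeg(n, 0), order;
    for (int u = 0; u < n; ++u)
        for (int v : adj[u]) ++indeg[v];

    queue<int> q;                       // nodes with zero in-degree start first
    for (int u = 0; u < n; ++u)
        if (indeg[u] == 0) q.push(u);

    while (!q.empty()) {
        int u = q.front(); q.pop();
        order.push_back(u);
        for (int v : adj[u])            // "remove" u's outgoing edges
            if (--indeg[v] == 0) q.push(v);
    }
    return (int)order.size() == n ? order : vector<int>{};
}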

11. In ______, tasks are defined before starting the execution of the algorithm.  a) dynamic tasks  b) static tasks  c) regular tasks  d) one-way tasks.  Answer: b
12. Which of the following is not an array distribution method of data partitioning?  a) block  b) cyclic  c) block-cyclic  d) chunk.  Answer: d
13. Blocking optimization is used to improve temporal locality, to reduce ______.  a) hits  b) misses  c) hit rate  d) cache misses.  Answer: b
14. CUDA holds that the "unifying theme" of every form of parallelism is the ______.  a) CDA thread  b) PTA thread  c) CUDA thread  d) CUD thread.  Answer: c
15. Topological sort of a directed acyclic graph is ______.  a) always unique  b) always not unique  c) sometimes unique and sometimes not unique  d) always unique if the graph has an even number of vertices.  Answer: c
16. Threads being grouped together and executed in sets of 32 threads is called a ______.  a) thread block  b) 32 thread  c) 32 block  d) unit block.  Answer: a
17. True or False: The threads in a thread block are distributed across SM units so that each thread is executed by one SM unit.  a) TRUE  b) FALSE.  Answer: a
18. When is the topological sort of a graph unique?  a) When there exists a Hamiltonian path in the graph  b) In the presence of multiple nodes with in-degree 0  c) In the presence of a single node with in-degree 0  d) In the presence of a single node with out-degree 0.  Answer: a
19. What is a high-performance multi-core processor that can be used to accelerate a wide variety of applications using parallel computing?  a) CPU  b) DSP  c) GPU  d) CLU.  Answer: c
20. A good mapping does not depend on which of the following factors?  a) knowledge of task sizes  b) the size of data associated with tasks  c) characteristics of inter-task interactions  d) task overhead.  Answer: d
21. CUDA is a parallel computing platform and programming model.  a) TRUE  b) FALSE.  Answer: a
22. Which of the following is not a form of parallelism supported by CUDA?  a) Vector parallelism: floating point computations are executed in parallel on wide vector units  b) Thread-level task parallelism: different threads execute different tasks  c) Block and grid-level parallelism: different blocks or grids execute different tasks  d) Data parallelism: different threads and blocks process different parts of data in memory.  Answer: a
23. The style of parallelism supported on GPUs is best described as ______.  a) MISD (Multiple Instruction Single Data)  b) SIMT (Single Instruction Multiple Thread)  c) SISD (Single Instruction Single Data)  d) MIMD.  Answer: b
24. True or False: Functions annotated with the __global__ qualifier may be executed on the host or the device.  a) TRUE  b) FALSE.  Answer: a
25. Which of the following correctly describes a GPU kernel?  a) A kernel may contain a mix of host and GPU code  b) All thread blocks involved in the same computation use the same kernel  c) A kernel is part of the GPU's internal micro-operating system, allowing it to act as an independent host  d) A kernel may contain only host code.  Answer: b
26. A code known as a grid, which runs on a GPU, consists of a set of ______.  a) 32 threads  b) unit blocks  c) 32 blocks  d) thread blocks.  Answer: d
27. Which of the following is not a parallel algorithm model?  a) data parallel model  b) task graph model  c) task model  d) work pool model.  Answer: c
28. Having a load before a store in running program order, then interchanging this order, results in ______.  a) WAW hazards  b) destination registers  c) WAR hazards  d) registers.  Answer: c
29. The model based on passing a stream of data through processes arranged in succession is called the ______.  a) producer-consumer model  b) hybrid model  c) task graph model  d) work pool model.  Answer: a
30. When instruction i and instruction j both tend to write the same register or memory location, it is called ______.  a) input dependence  b) output dependence  c) ideal pipeline  d) digital call.  Answer: b
31. Multithreading allows multiple threads to share the functional units of a ______.  a) multiple processor  b) single processor  c) dual core  d) Core i5.  Answer: b
32. Allowing multiple instructions to issue in a clock cycle is the goal of ______.  a) single-issue processors  b) dual-issue processors  c) multiple-issue processors  d) no-issue processors.  Answer: c
33. OpenGL stands for:  a) Open General Liability  b) Open Graphics Library  c) Open Guide Line  d) Open Graphics Layer.  Answer: b
34. Which of the following is not an advantage of OpenGL?  a) There is more detailed documentation for OpenGL, while other APIs don't have such detailed documentation  b) OpenGL is portable  c) OpenGL is more functional than any other API  d) It is not a cross-platform API.  Answer: d
35. The work pool model uses a ______ approach for task assignment.  a) static  b) dynamic  c) centralized  d) decentralized.  Answer: b
36. Which of the following is false regarding the data parallel model?  a) all tasks perform the same computations  b) the degree of parallelism increases with the size of the problem  c) matrix multiplication is an example of data parallel computation  d) dynamic mapping is done.  Answer: d
37. Which of the following are methods for containing interaction overheads?  a) maximizing data locality  b) minimizing the volume of data exchange  c) minimizing the frequency of interactions  d) all of the above.  Answer: d
38. Which of the following are classes of the dynamic mapping centralized method?  a) self scheduling  b) chunk scheduling  c) both a and b  d) none of the above.  Answer: c
39. Which of the following is not a scheme for static mapping?  a) block distribution  b) block-cyclic distribution  c) cyclic distribution  d) self scheduling.  Answer: d



UNIT THREE (SUB: 410241 HPC)

1. Group communication operations are built using which primitives?  a) one-to-all  b) all-to-all  c) point-to-point  d) none of these.  Answer: c
2. ______ can be performed in an identical fashion by inverting the process.  a) Recursive doubling  b) Reduction  c) Broadcast  d) None of these.  Answer: b
3. Broadcast and reduction operations on a mesh are performed ______.  a) along the rows  b) along the columns  c) both a and b, concurrently  d) none of these.  Answer: c
4. Cost analysis on a ring is ______.  a) (ts + tw m)(p - 1)  b) (ts - tw m)(p + 1)  c) (tw + ts m)(p - 1)  d) (tw - ts m)(p + 1).  Answer: a
5. Cost analysis on a mesh is ______.  a) 2ts(sqrt(p) + 1) + tw m(p - 1)  b) 2tw(sqrt(p) + 1) + ts m(p - 1)  c) 2tw(sqrt(p) - 1) + ts m(p - 1)  d) 2ts(sqrt(p) - 1) + tw m(p - 1).  Answer: d
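For reference, the all-to-all broadcast costs quoted in questions 4 and 5 (ts: startup time, tw: per-word transfer time, m: message size, p: number of nodes), together with the hypercube counterpart, which I add here from the standard analysis rather than from the bank itself:

$$T_{\text{ring}} = (t_s + t_w m)(p-1), \qquad T_{\text{mesh}} = 2\,t_s(\sqrt{p}-1) + t_w m (p-1), \qquad T_{\text{hypercube}} = t_s \log p + t_w m (p-1).$$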

6. Communication between two directly linked nodes is ______.  a) cut-through routing  b) store-and-forward routing  c) nearest-neighbour communication  d) none.  Answer: c
7. All-to-one communication (reduction) is the dual of ______ broadcast.  a) all-to-all  b) one-to-all  c) one-to-one  d) all-to-one.  Answer: b
8. Which is known as reduction?  a) all-to-one  b) all-to-all  c) one-to-one  d) one-to-all.  Answer: a
9. Which is known as broadcast?  a) one-to-one  b) one-to-all  c) all-to-all  d) all-to-one.  Answer: b
10. The dual of all-to-all broadcast is ______.  a) all-to-all reduction  b) all-to-one reduction  c) both  d) none.  Answer: a
11. The all-to-all broadcast algorithm for the 2D mesh is based on the ______.  a) linear array algorithm  b) ring algorithm  c) both  d) none.  Answer: b
12. In the first phase of 2D mesh all-to-all, the message size is ______.  a) p  b) m sqrt(p)  c) m  d) p sqrt(m).  Answer: c
13. In the second phase of 2D mesh all-to-all, the message size is ______.  a) m  b) p sqrt(m)  c) p  d) m sqrt(p).  Answer: d
14. In all-to-all on a hypercube, the size of the message to be transmitted at the next step is ______ by concatenating the received message with the current data.  a) doubled  b) tripled  c) halved  d) unchanged.  Answer: a
15. The all-to-all broadcast on a hypercube needs ______ steps.  a) p  b) sqrt(p) - 1  c) log p  d) none.  Answer: c
16. The one-to-all personalized communication operation is commonly called the ______.  a) gather operation  b) concatenation  c) scatter operation  d) none.  Answer: c
17. The dual of the scatter operation is the ______.  a) concatenation  b) gather operation  c) both  d) none.  Answer: c
18. In the scatter operation on a hypercube, on each step the size of the messages communicated is ______.  a) tripled  b) halved  c) doubled  d) unchanged.  Answer: b
19. Which is also called "total exchange"?  a) all-to-all broadcast  b) all-to-all personalized communication  c) all-to-one reduction  d) none.  Answer: b
20. All-to-all personalized communication can be used in ______.  a) Fourier transform  b) matrix transpose  c) sample sort  d) all of the above.  Answer: d
21. In collective communication operations, "collective" means they ______.  a) involve a group of processors  b) involve a group of algorithms  c) involve a group of variables  d) none of these.  Answer: a
22. The efficiency of a data parallel algorithm depends on the ______.  a) efficient implementation of the algorithm  b) efficient implementation of the operation  c) both  d) none.  Answer: b
23. All processes participate in a single ______ interaction operation.  a) global  b) local  c) wide  d) variable.  Answer: a
24. Subsets of processes participate in ______ interaction.  a) global  b) local  c) wide  d) variable.  Answer: b
25. The goal of a good algorithm is to implement commonly used ______ patterns.  a) communication  b) interaction  c) parallel  d) regular.  Answer: a
26. Reduction can be used to find the sum, product, maximum, or minimum of ______ of numbers.  a) tuples  b) lists  c) sets  d) all of the above.  Answer: c
27. The source ______ is the bottleneck.  a) process  b) algorithm  c) list  d) tuple.  Answer: a
28. Only connections between single pairs of nodes being used at a time is ______.  a) good utilization  b) poor utilization  c) massive utilization  d) medium utilization.  Answer: b
29. All processes that have the data sending it on again is ______.  a) recursive doubling  b) the naive approach  c) reduction  d) all.  Answer: a
30. The ______ do not snoop the messages going through them.  a) nodes  b) variables  c) tuples  d) lists.  Answer: a
31. Accumulating results and sending them with the same pattern is ______.  a) broadcast  b) naive approach  c) recursive doubling  d) reduction symmetric.  Answer: d
32. Every node on the linear array has the data and broadcasts on the columns with the linear array algorithm in ______.  a) parallel  b) vertical  c) horizontal  d) all.  Answer: a
33. Using different links every time and forwarding in parallel again is ______.  a) better for congestion  b) better for reduction  c) better for communication  d) better for the algorithm.  Answer: a
34. In a balanced binary tree, the number of processing nodes is equal to the number of ______.  a) leaves  b) elements  c) branches  d) none.  Answer: a
35. One-to-all broadcast uses a ______.  a) divide-and-conquer type algorithm  b) sorting type algorithm  c) searching type algorithm  d) simple algorithm.  Answer: a
36. For the sake of simplicity, the number of nodes is a power of ______.  a) 1  b) 2  c) 3  d) 4.  Answer: b
37. Nodes with zero in the i least significant bits participate in ______.  a) the algorithm  b) broadcast  c) communication  d) searching.  Answer: c
38. Every node has to know when to communicate, that is, ______.  a) call the procedure  b) call for broadcast  c) call for communication  d) call the congestion.  Answer: a
39. The procedure is distributed and requires only point-to-point ______.  a) synchronization  b) communication  c) both  d) none.  Answer: a
40. Renaming relative to the source is done by ______ with the source.  a) XOR  b) XNOR  c) AND  d) NAND.  Answer: a
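A minimal sketch of how a one-to-all broadcast can be written with plain point-to-point messages using recursive doubling (questions 27, 29 and 36 above). This is my MPI-style illustration, not code from the bank; it assumes the number of processes is a power of 2 and that the source is rank 0:

#include <mpi.h>

/* One-to-all broadcast of `buf` (count ints) from rank 0, built only from
   point-to-point sends/receives via recursive doubling: in step k, every
   rank that already holds the data forwards it to rank + 2^k. */
void broadcast_from_zero(int *buf, int count, MPI_Comm comm) {
    int rank, p;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &p);            /* assumed to be a power of 2 */

    for (int step = 1; step < p; step <<= 1) {
        if (rank < step && rank + step < p) {
            /* this rank already has the data: pass it along */
            MPI_Send(buf, count, MPI_INT, rank + step, 0, comm);
        } else if (rank >= step && rank < 2 * step) {
            /* this rank receives the data exactly once, from rank - step */
            MPI_Recv(buf, count, MPI_INT, rank - step, 0, comm,
                     MPI_STATUS_IGNORE);
        }
    }
}

After log p steps every rank holds the message, which is the source of the "power of 2" simplification in question 36.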



UNIT FOUR (SUB: 410241 HPC)

1. Mathematically, efficiency is ______.  a) E = S/p  b) E = p/S  c) E*S = p/2  d) E = p + E/E.  Answer: a
2. The cost of a parallel system is sometimes referred to as ______.  a) work  b) processor-time product  c) both  d) none.  Answer: c
3. In the scaling characteristics of parallel programs, the serial time Ts is ______.  a) increasing  b) constant  c) decreasing  d) none.  Answer: b
4. Speedup tends to saturate and efficiency ______ as a consequence of Amdahl's law.  a) increases  b) stays constant  c) decreases  d) none.  Answer: c
5. Scaled speedup is obtained when the problem size is ______ linearly with the number of processing elements.  a) increased  b) constant  c) decreased  d) dependent on problem size.  Answer: a
6. The n × n matrix is partitioned among n processors, with each processor storing a complete ______ of the matrix.  a) row  b) column  c) both  d) depends on the processor.  Answer: a
7. Cost-optimal parallel systems have an efficiency of ______.  a) 1  b) n  c) log n  d) complex.  Answer: a
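Questions 1, 2 and 7 fit together as follows (my restatement of the standard definitions, with Ts the serial runtime, Tp the parallel runtime and p processing elements):

$$S = \frac{T_s}{T_p}, \qquad E = \frac{S}{p} = \frac{T_s}{p\,T_p}, \qquad \text{Cost} = p\,T_p,$$

and a system is cost-optimal exactly when $p\,T_p = \Theta(T_s)$, i.e. when $E = \Theta(1)$.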
8. The n × n matrix is partitioned among n^2 processors such that each processor owns a ______ element.  a) n  b) 2n  c) single  d) double.  Answer: c
9. How many basic communication operations are used in matrix-vector multiplication?  a) 1  b) 2  c) 3  d) 4.  Answer: c
10. The DNS algorithm of matrix multiplication uses ______.  a) 1-D partitioning  b) 2-D partitioning  c) 3-D partitioning  d) both a and b.  Answer: c
11. In pipelined execution, the steps contain ______.  a) normalization  b) communication  c) elimination  d) all.  Answer: d
12. The cost of the parallel algorithm is higher than the sequential run time by a factor of ______.  a) 3/2  b) 2/3  c) 3*2  d) 2/3 + 3/2.  Answer: a
13. The load imbalance problem in parallel Gaussian elimination can be alleviated by using a ______ mapping.  a) acyclic  b) cyclic  c) both  d) none.  Answer: b
14. A parallel algorithm is evaluated by its runtime as a function of ______.  a) the input size  b) the number of processors  c) the communication parameters  d) all.  Answer: d
15. For a problem consisting of W units of work, p ______ W processors can be used optimally.  a) <=  b) >=  c) <  d) >.  Answer: a
16. C(W) ______ Θ(W) for optimality (necessary condition).  a) >  b) <  c) <=  d) equals.  Answer: d
17. Many interactions in practical parallel programs occur in a ______ pattern.  a) well-defined  b) zig-zag  c) reverse  d) straight.  Answer: a
18. Efficient implementation of basic communication operations can improve ______.  a) performance  b) communication  c) the algorithm  d) all.  Answer: a
19. Efficient use of basic communication operations can reduce ______.  a) development effort  b) software quality  c) both  d) none.  Answer: a
20. Group communication operations are built using ______ messaging primitives.  a) point-to-point  b) one-to-all  c) all-to-one  d) none.  Answer: a
21. One processor having a piece of data that it needs to send to everyone is ______.  a) one-to-all  b) all-to-one  c) point-to-point  d) all of the above.  Answer: a
22. The dual of one-to-all is ______.  a) all-to-one reduction  b) one-to-all reduction  c) point-to-point reduction  d) none.  Answer: a
23. Data items must be combined piece-wise and the result made available at the ______.  a) target processor  b) target variable  c) final target  d) receiver.  Answer: a
24. The simplest way to send p - 1 messages from the source to the other p - 1 processors lacks ______.  a) an algorithm  b) communication  c) concurrency  d) a receiver.  Answer: c
25. In an eight-node ring, node ______ is the source of the broadcast.  a) 1  b) 2  c) 8  d) 0.  Answer: d
26. The processors compute the ______ product of the vector element and the local matrix.  a) local  b) global  c) both  d) none.  Answer: a
27. One-to-all broadcast uses ______.  a) recursive doubling  b) a simple algorithm  c) both  d) none.  Answer: a
28. In a broadcast and reduction on a balanced binary tree, reduction is done in ______.  a) recursive order  b) straight order  c) vertical order  d) parallel order.  Answer: a
29. If "X" is the message to broadcast, it initially resides at source node ______.  a) 1  b) 2  c) 8  d) 0.  Answer: d
30. The logical operators used in the algorithm are ______.  a) XOR  b) AND  c) both  d) none.  Answer: c
31. A generalization of broadcast in which each processor is ______.  a) a source as well as a destination  b) only a source  c) only a destination  d) none.  Answer: a
32. The algorithm terminates in ______ steps.  a) p  b) p + 1  c) p + 2  d) p - 1.  Answer: d
33. Each node first sends to one of its neighbours the data it needs to ______.  a) broadcast  b) identify  c) verify  d) none.  Answer: a
34. The second communication phase is a column-wise ______ broadcast of the consolidated data.  a) all-to-all  b) one-to-all  c) all-to-one  d) point-to-point.  Answer: a
35. All nodes collect ______ messages corresponding to the √p nodes of their respective rows.  a) √p  b) p  c) p + 1  d) p - 1.  Answer: a
36. It is not possible to port the ______ to a higher-dimensional network.  a) algorithm  b) hypercube  c) both  d) none.  Answer: a
37. If we port the algorithm to a higher-dimensional network, it would cause ______.  a) errors  b) contention  c) recursion  d) none.  Answer: b
38. In the scatter operation, a ______ node sends a message to every other node.  a) single  b) double  c) triple  d) none.  Answer: a
39. The gather operation is exactly the inverse of the ______.  a) scatter operation  b) recursion operation  c) execution  d) none.  Answer: a
40. It has a communication pattern similar to all-to-all broadcast, except in ______.  a) reverse order  b) parallel order  c) straight order  d) vertical order.  Answer: a
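A minimal sketch tying together questions 6 and 26 above (row-wise 1-D partitioning with local products). This is my MPI-style illustration, not code from the bank; the function and buffer names are my own, and it assumes p divides n evenly:

#include <mpi.h>
#include <stdlib.h>

/* Row-wise 1-D partitioned matrix-vector multiply y = A x.
   Each of the p processes owns n/p complete rows of A and the n/p
   matching entries of x; one all-gather rebuilds the full vector x,
   after which each process computes its local dot products. */
void matvec_rowwise(const double *Arows, const double *xlocal,
                    double *ylocal, int n, MPI_Comm comm) {
    int p;
    MPI_Comm_size(comm, &p);
    int rows = n / p;                      /* rows owned by this process */

    double *x = (double *)malloc(n * sizeof(double));
    MPI_Allgather(xlocal, rows, MPI_DOUBLE, x, rows, MPI_DOUBLE, comm);

    for (int i = 0; i < rows; ++i) {       /* local products only */
        double s = 0.0;
        for (int j = 0; j < n; ++j)
            s += Arows[i * n + j] * x[j];
        ylocal[i] = s;
    }
    free(x);
}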



UNIT FIVE (SUB: 410241 HPC)

1. In ______, the number of elements to be sorted is small enough to fit into the process's main memory.  a) internal sorting  b) internal searching  c) external sorting  d) external searching.  Answer: a
2. ______ algorithms use auxiliary storage (such as tapes and hard disks) for sorting because the number of elements to be sorted is too large to fit into memory.  a) internal sorting  b) internal searching  c) external sorting  d) external searching.  Answer: c
3. ______ can be comparison-based or non-comparison-based.  a) Searching  b) Sorting  c) both a and b  d) none of the above.  Answer: b
4. The fundamental operation of comparison-based sorting is ______.  a) compare-exchange  b) searching  c) sorting  d) swapping.  Answer: a
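A compact sketch of the compare-exchange step from question 4 inside a serial odd-even transposition sort (my illustration; the parallel version performs a compare-split on whole blocks instead):

#include <algorithm>
#include <vector>
using namespace std;

// Odd-even transposition sort: n phases, each doing ~n/2 compare-exchanges.
void oddEvenSort(vector<int>& a) {
    const int n = (int)a.size();
    for (int phase = 0; phase < n; ++phase) {
        int start = (phase % 2 == 0) ? 0 : 1;   // even phase: pairs (0,1),(2,3),...
        for (int i = start; i + 1 < n; i += 2)
            if (a[i] > a[i + 1])
                swap(a[i], a[i + 1]);           // the compare-exchange primitive
    }
}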

5. The complexity of bubble sort is Θ(n^2).  a) TRUE  b) FALSE.  Answer: a
6. Bubble sort is difficult to parallelize since the algorithm has no concurrency.  a) TRUE  b) FALSE.  Answer: a
7. Quicksort is one of the most common sorting algorithms for sequential computers because of its simplicity, low overhead, and optimal average complexity.  a) TRUE  b) FALSE.  Answer: a
8. The performance of quicksort depends critically on the quality of the ______.  a) non-pivot  b) pivot  c) center element  d) length of the array.  Answer: b
9. The complexity of quicksort is O(n log n).  a) TRUE  b) FALSE.  Answer: a
10. DFS begins by expanding the initial node and generating its successors. In each subsequent step, DFS expands one of the most recently generated nodes.  a) TRUE  b) FALSE.  Answer: a
11. The main advantage of ______ is that its storage requirement is linear in the depth of the state space being searched.  a) BFS  b) DFS  c) a and b  d) none of the above.  Answer: b
12. ______ algorithms use a heuristic to guide the search.  a) BFS (best-first search)  b) DFS  c) a and b  d) none of the above.  Answer: a
13. If the heuristic is admissible, BFS (best-first search) finds the optimal solution.  a) TRUE  b) FALSE.  Answer: a
14. The search overhead factor of a parallel system is defined as the ratio of the work done by the parallel formulation to that done by the sequential formulation.  a) TRUE  b) FALSE.  Answer: a
15. The critical issue in parallel depth-first search algorithms is the distribution of the search space among the processors.  a) TRUE  b) FALSE.  Answer: a
16. Graph search involves a closed list, where the major operation is a ______.  a) sort  b) search  c) lookup  d) none of the above.  Answer: c
17. Breadth-first search is equivalent to which traversal in binary trees?  a) Pre-order traversal  b) Post-order traversal  c) Level-order traversal  d) In-order traversal.  Answer: c
18. The time complexity of breadth-first search is? (V: number of vertices, E: number of edges)  a) O(V + E)  b) O(V)  c) O(E)  d) O(V*E).  Answer: a
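A minimal adjacency-list BFS (my illustration) showing where the O(V + E) bound in the previous question comes from: each vertex enters the queue at most once and each edge is scanned once.

#include <queue>
#include <vector>
using namespace std;

// Level-order (breadth-first) traversal from `src`; O(V + E) overall.
vector<int> bfsOrder(int n, const vector<vector<int>>& adj, int src) {
    vector<bool> visited(n, false);
    vector<int> order;
    queue<int> q;
    visited[src] = true;
    q.push(src);                       // every vertex is enqueued at most once
    while (!q.empty()) {
        int u = q.front(); q.pop();
        order.push_back(u);
        for (int v : adj[u])           // every edge is examined once
            if (!visited[v]) { visited[v] = true; q.push(v); }
    }
    return order;
}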
19. Which of the following is not an application of breadth-first search?  a) When the graph is a binary tree  b) When the graph is a linked list  c) When the graph is an n-ary tree  d) When the graph is a ternary tree.  Answer: b
20. In BFS, how many times is a node visited?  a) Once  b) Twice  c) As many times as the in-degree of the node  d) Thrice.  Answer: c
21. Best-first search is a searching algorithm used in graphs.  a) TRUE  b) FALSE.  Answer: a
22. Which of the following is not a stable sorting algorithm in its typical implementation?  a) Insertion sort  b) Merge sort  c) Quick sort  d) Bubble sort.  Answer: c
23. Which of the following is not true about comparison-based sorting algorithms?  a) The minimum possible time complexity of a comparison-based sorting algorithm is O(n log n) for a random input array  b) Any comparison-based sorting algorithm can be made stable by using position as a criterion when two elements are compared  c) Counting sort is not a comparison-based sorting algorithm  d) Heap sort is not a comparison-based sorting algorithm.  Answer: d



UNIT SIX (SUB: 410241 HPC)

1. A CUDA program is comprised of two primary components: a host and a ______.  a) GPU kernel  b) CPU kernel  c) OS  d) none of the above.  Answer: a
2. The kernel code is identified by the ______ qualifier with a void return type.  a) __host__  b) __global__  c) __device__  d) void.  Answer: b
3. The kernel code is only callable by the host.  a) TRUE  b) FALSE.  Answer: a
4. The kernel code is executable on the device and the host.  a) TRUE  b) FALSE.  Answer: b
5. Calling a kernel is typically referred to as ______.  a) kernel threading  b) kernel initialization  c) kernel termination  d) kernel invocation.  Answer: d
6. Host code in a CUDA application can initialize a device.  a) TRUE  b) FALSE.  Answer: a
7. Host code in a CUDA application can allocate GPU memory.  a) TRUE  b) FALSE.  Answer: a
8. Host code in a CUDA application cannot invoke kernels.  a) TRUE  b) FALSE.  Answer: b
9. CUDA offers the chevron syntax to configure and execute a kernel.  a) TRUE  b) FALSE.  Answer: a
10. The BlockPerGrid and ThreadPerBlock parameters are related to the ______ model supported by CUDA.  a) host  b) kernel  c) thread abstraction  d) none of the above.  Answer: c
11. ______ is callable from the device only.  a) __host__  b) __global__  c) __device__  d) none of the above.  Answer: c
12. ______ is callable from the host.  a) __host__  b) __global__  c) __device__  d) none of the above.  Answer: b
13. ______ is callable from the host.  a) __host__  b) __global__  c) __device__  d) none of the above.  Answer: a
14. CUDA supports ______, in which code in a single thread is executed by all other threads.  a) thread division  b) thread termination  c) thread abstraction  d) none of the above.  Answer: c
15. In CUDA, a single invoked kernel is referred to as a ______.  a) block  b) thread  c) grid  d) none of the above.  Answer: c
16. A grid is comprised of ______ of threads.  a) blocks  b) bunches  c) hosts  d) none of the above.  Answer: a
17. A block is comprised of multiple ______.  a) threads  b) bunches  c) hosts  d) none of the above.  Answer: a
18. A solution to the problem of representing parallelism in an algorithm is ______.  a) CUD  b) PTA  c) CDA  d) CUDA.  Answer: d
19. Host code in a CUDA application cannot reset a device.  a) TRUE  b) FALSE.  Answer: b
20. Host code in a CUDA application can transfer data to and from the device.  a) TRUE  b) FALSE.  Answer: a
21. Host code in a CUDA application cannot deallocate memory on the GPU.  a) TRUE  b) FALSE.  Answer: b

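A minimal CUDA C++ sketch (my illustration, not from the bank) tying the unit together: the __global__ qualifier, host-side allocation and transfers, the chevron <<<blocksPerGrid, threadsPerBlock>>> launch, and cleanup. The kernel name and sizes are my own choices:

#include <cuda_runtime.h>

__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;     // global thread index
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1 << 20;
    int *h = new int[n]();                             // host buffer, zeroed
    int *d = nullptr;

    cudaMalloc(&d, n * sizeof(int));                   // allocate GPU memory
    cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice);

    int threadsPerBlock = 256;
    int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    addOne<<<blocksPerGrid, threadsPerBlock>>>(d, n);  // kernel invocation

    cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);                                       // deallocate GPU memory
    cudaDeviceReset();                                 // host can reset the device
    delete[] h;
    return 0;
}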


1. Which of the following is true about one-to-all broadcast?  A. Initially there are P (number of processors) copies of the message, and after the broadcast there is finally a single copy.  B. Initially there is a single copy of the message, and after the broadcast there are finally P (number of processors) copies.  Answer: B

2. If a total of 8 nodes are in a ring topology, how many source nodes will be present after one-to-all message broadcasting?


3. The current source node selects the ______ node as the next source node in a linear/ring one-to-all message broadcast.  A. nearest node  B. farthest node.  Answer: B (farthest node)

4. In all-to-one reduction, after the reduction the final copy of the message is available on which node?  A. Source node  B. Destination node  C. Both of the above  D. None of these.  Answer: B (destination node)

5. If a 4×4 mesh topology network is present (as shown in the video), how many broadcast cycles are required for the message to reach all 16 nodes?  Options: 4, 16.

6. If there are 8 nodes in a ring topology, how many message-passing cycles are required to complete the reduction process?

7. In one-to-all broadcast using a hypercube topology, how does the source node select the next destination node?  A. The node having the lowest binary code (label)  B. The node having the highest binary code (label)  C. All connected nodes at a time  D. None of the above.  Answer: B

8. If there are 8 nodes connected in a ring topology, then ______ message-passing cycles are required to complete an all-to-all broadcast in parallel mode.  Answer: 7 (p - 1 cycles)

9. Consider all-to-all broadcast in a ring topology with 8 nodes. How many messages will be present at each node after the 3rd step/cycle of communication?  Answer: 4 (the initial message plus one received per cycle)

10. If there are 16 messages in a 4×4 mesh, how many message-passing cycles in total are required to complete the all-to-all broadcast operation?  Answer: 6 (from the 2√P - 2 formula of the next question, with P = 16)

11. If there are P messages in an m×m mesh, how many message-passing cycles in total are required to complete the all-to-all broadcast operation?  A. 2√P - 2  B. 2√P - 1  C. 2√P  D. None of the above.  Answer: A (2√P - 2)
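A short derivation of the 2√P - 2 count above (my sketch, following the two-phase mesh algorithm quoted in Unit Three): all-to-all broadcast on a √P × √P wraparound mesh first runs the ring algorithm along each row and then along each column, and each phase needs √P - 1 steps, so

$$(\sqrt{P}-1) + (\sqrt{P}-1) = 2\sqrt{P}-2, \qquad P = 16:\ 2\cdot 4 - 2 = 6 \text{ cycles},$$

which matches question 10.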

12. How many message-passing cycles are required for all-to-all broadcasting in an 8-node hypercube?  Answer: 3 (log p steps, as in Unit Three)

13. In the scatter operation, after message broadcasting every node ends up with the same message copy.  True / False.  Answer: False

14. CUDA helps to execute code in parallel mode using the ______.  A. CPU  B. GPU  C. ROM  D. Cache memory.  Answer: B (GPU)

15. In the thread-function execution scenario, a thread is a ______.  A. Work  B. Worker  C. Task  D. None of the above.  Answer: B (worker)

16. Which of the following statements about GPUs are true? (select all that apply)  A. A block contains grids  B. A grid contains blocks  C. A block contains threads  D. SM stands for Streaming Multimedia  E. SM stands for Streaming Multiprocessor.  Answer: B, C and E

17. Which issue(s) hold for sorting techniques with parallel computing? (select all that apply)  A. A large sequence is the issue  B. Where to store the output sequence is the issue  C. Where to store the input sequence is the issue  D. None of the above.  Answer: B and C

18. Partitioning of the series is done after ______.  A. Local arrangement  B. Process assignments  C. Global arrangement  D. None of the above.  Answer: C (global arrangement)

19. In parallel DFS, processes have the following roles (select all that apply):  A. Donor  B. Active  C. Idle  D. Recipient.  Answer: A (donor) and D (recipient)

20. Suppose there are 16 elements in a series; how many phases are required to sort the series using parallel odd-even bubble sort?  Answer: 15

21. Which are the different sources of overheads in parallel programs? (select all that apply)  A. Interprocess interactions  B. Process idling  C. Large amount of data  D. Excess computation.  Answer: A, B and D

22. Speedup (S) is ______.  A. The ratio of the time taken to solve a problem on parallel processors to the time required to solve the same problem on a single processor with p identical processing elements  B. The ratio of the time taken to solve a problem on a single processor to the time required to solve the same problem on a parallel computer with p identical processing elements  C. The ratio of the number of processors to the size of the data  D. None of the above.  Answer: B

23. Efficiency is a measure of the fraction of time for which a processing element is usefully employed.  TRUE / FALSE.  Answer: TRUE


marks question A B C D ans
Interconnection Networks Direct Both Static and
0 1 Both Dynamic Static
can be classified as? Network Dynamic.
Parallel Computers are used
Algorithmic Optimization This is an
1 1 to solve which types of Both None
Problems Problems explaination.
problems.
One clock Is used
How many clocks control
2 1 One Three Four Five to control all the
all the stages in a pipeline?
stages.
Main memory is
Main memory in parallel
3 1 Shared Parallel Fixed None shared in parallel
computing is____?
computing.
Ans- (d)-
Application
Which of these is not a class
Application Distributed Symmetric Multicore checkpoiting. is
4 1 of parallel computing
Checkpointing Computing Multiprocessing Computing not a class of
architetcture?
parallel computer
architecture.
Parallel computing
software
Parallel Computing software Parallel
Automatic Application solutionincludes all
5 1 solutions and Techniques All Programming
Parallelization Checkpointing of the following..
includes: languages.
This is an
explanation
The Processors are The Processors
6 2 connected to the memory Switches Cables Buses Registers are connected
through a set of? thru. the switches.
Superscalar Architetcure
This is an
7 2 has how many execution Two One Three Four
explaination.
units?
What is used to hold the The Intermediate
Intermediate
8 2 intermediate output in a Cache RAM ROM Registers are used
Register
pipeline to hold the output.
International
International Human Genome
Human
Which oranization performs Sequencing and Genome Sequencing for
Genome This is an
9 2 sequencing of Human Consortium for Sequencing and Humans and
Sequencing explaination.
Genome? Human Constrium, Consortium,
and
Genome Org. Org.
Consortium
Ans(c)- Five
There are how many stages
10 2 Five Three Two Six stages are there in
in RISC Processor?
a RISC processor.
The DRAM acess
Over the last decade, The
time rate has
DRAM access time has None of the
11 2 0.1 0.2 0.15 improved at a rate
improved at what rate per above
of 10% over the
year?
last decade.

OptimusPrime Page 177


marks question A B C D ans
Cache acts as low
Which memory acts as low- latency high
12 2 latency high bandwidth Cache Register DRAM EPROM bandwidth storage
storage? .This is an
explanation.
Which processor This is an
13 2 SIMD MIMD MISD MIMD
architecture is this? explaination.
This diagram
Which core processor is
14 2 Quad-Core Dual-Core Octa-Core Single-Core shows Quad-
this?
Core.
Data Caching is
Which of these is not a
15 2 Data Caching Decomposition Simplification Parsimony not a prinicple of
scalable design principle?
scable design.
The distance between any O(1) is the ditance
16 2 two nodes in Bus Based O(1) O(n Logn) O(N) O(n^2) between any two
network is? nodes.
All of these are
Early SIMD computers early staged
17 2 All MPP CM-2 Illiac IV
include: SIMD parallel
computers.
This is called
This is which configuration
18 2 Pass-through Cross-Over Shuffle None Pass-through
in Omega networks.
configuration.
Parallelization
Automatic Parallelization includes parse,
19 2 technique doesn’t Share Memory Analyse Schedule Parse analyse schedule
ncludes: and code
generation.
The P4 processor
The Pentuim 4 or P4
has 20 staged
20 2 processor has how many 20 15 18 10
pipeline. This is an
stage pipeline?
explanation.
Sum, Prioirity and
Which protocol is not used
common are used
21 3 to remove concurrent Identify Priority Common Sum
to remove
writes?
concurrent writes.
Exclusive EREW stands for
Erasable Read Easily Read
Read and Exclusive Read
22 3 EREW PRAM stands for? and Erasable and Easily None
Exclusive and Exclsuive
Write PRAM Write
Write Write PRAM.
Multiple
During each clock cycle,
Instuctiion are
multiple instructions are
23 3 Parallel Series Both a and b None piped in parallel.
piped into the processor
This is an
in________?
explanation.
Multistaged
Which Interconnection Multistage Dynamic
24 3 Cross-Bar Bus-Staged Network uses this
Network uses this equation. Networks Networks
eqn.
OptimusPrime Page 178
marks question A B C D ans
There are
generally four
How many types of parallel types of parallel
computing are available computing,
25 3 from both proprietary and 4 2 3 6 available from
open source parallel both proprietary
computing vendors? and open source
parallel computing
vendors.
If a piece of data
is repeatedly used,
If a piece of data is the effective
repeatedly used, the latency of this
effective latency of this memory system
memory system can be Memory can be reduced by
26 3 Hit Ratio Memory ratio Hit Fraction
reduced by the cache. The Fraction. the cache. The
fraction of data references fraction of data
satisfied by the cache is references
called? satisfied by the
cache is called the
cache hit ratio.
SuperScalar
Superscalar Architetcure Data- Architecture can
27 3 Scheduling Phasing Data Extraction
can create problem in? Compiling cause problems in
CPU scheduling.
In cut-through
In cut-through routing, a routing, a message
28 3 message is broken into fixed Flits Flow Digits Control Digits All is broken into
size units called? fixed size units
called flits.
The total communication
This is an
29 3 time for cut-through routing A B C D
explaination.
is?
The Disadvantage of GPU Load- Process All of the This is an
30 1 Data balancing
Pipeline is? balancing balancing above explaination.
Examples of GPU AMD Both AMD and
31 1 Both NVIDIA None
Processors are: Processors NVIDIA.
Simultaneous
execution of
Simultaneous execution of
Stream different programs
32 1 different programs on a data Data Execution Data-paralleism None
Parallelism on a data stream is
stream is called?
called Stream
Parallelism.
Early GPU controllers were GPU This is an
33 1 Video Shifters GPU Shifters Video-Movers
known as? Controllers Explaination.
Algorithm
_____development is a
development is a
critical component of
34 1 Algorithm Code Pseudocode Problem critical component
problem solving using
of problem solving
computers?
using computers
OptimusPrime Page 179
marks question A B C D ans
Graphics
Graphical Gaming Graph This is an
35 1 GPU stands for? Processsing
Processing Unit Processing Unit Processing Unit Explaination.
Unit
Parallelism leads
naturally to
Concurrency. For
Serial
36 1 What leads to concurrency? Parallelism Decomposition All example, Several
Processing
processes trying to
print a file on a
single printer.
Rasterization is the
process of
The process of determining
Space- determining which
which screen-space pixel
37 2 Rasterization Pixelisation Fragmentation Determining screen-space pixel
locations are covered by
Process locations are
each\ntriangle is known as?
covered by
each\ntriangle.
The
programmable
units of the GPU
The programmable units of
follow a single
38 2 the GPU follow which SPMD MISD MIMD SIMD
program multiple-
programming model?
data (SPMD)
programming
model.
Shared Address
Which space can ease the space can ease the
programming effort, programming
especially if the distribution Shared Parallel Series- effort, especially if
39 2 Data- Address
of data is different in Address Address Address the distribution of
different phases of the data is different in
algorithm? different phases of
the algorithm.
Processors are the
Which are the hardware hardware units
40 2 units that physically perform Processsor ALU CPU CU that physically
computations? perform
computations
All of the these are
Examples of Graphics API
41 2 All DirectX CUDA Open-CL examples of
are?
Graphics API
The mechanism by
The mechanism by which which tasks are
tasks are assigned to assigned to
42 2 Mapping Computation Process None
processes for execution is processes for
called___? execution is called
mapping.

OptimusPrime Page 180


marks question A B C D ans
A decomposition
A decomposition into a into a large
large number of small tasks number of small
43 2 Fine- grained Coarse-grained Vector-granied All
is called__________ tasks is called
granularity. fine-grained
granularity.
Identical
operations being
Identical operations being
applied
applied concurrently on Data-
44 2 Parallelism Data Serialsm Concurrency concurrently on
different data items is Parallelism
different data
called?
items is called
Data Parallelism.
System which do not have
This is the
45 2 parallel processsing SISD SIMD MISD MIMD
explainantion.
capabiities?
The time and the
The time and the location in location in the
the program of a static one- program of a static
46 2 Priori Polling Decomposition Execution
way interaction is known as one-way
? interaction is
known a priori.
Memory access in RISC
CALL and MOV and This is the
47 2 architecture is limited to STA and LDA Push and POP
RET JMP explaination.
which instructions?
Data Parallel
Which Algorithms can be algorithms can be
implemented in both shared- implemented in
Data-Parallel Quick-Sort Bubble Sort
48 2 address-space and Data Algorithm both shared-
Algo. Algo. Algo.
message-passing address-space
paradigms? and message-
passing paradigms
Randomized This figure shows
Which type of Distribution is Block-Cyclic Cyclic
49 2 Block None Randomized
this? Distribution Distribution
Distribution Block Distribution.
An abstraction
used to express
An abstraction used to such dependencies
express such dependencies Task- Time- among tasks and
Dependency
50 2 among tasks and their Dependency Dependency None their relative order
Graph.
relative order of execution is Graph. Graph of execution is
known as__________? known as a task-
dependency
graph.

OptimusPrime Page 181


marks question A B C D ans
Block distributions
are some of the
Which is the simplest way to simplest ways to
distribute an array and distribute an array
Block Array Process
51 3 assign uniform contiguous All and assign uniform
Distrbution Distrbution Distribution
portions of the array to contiguous
different processes? portions of the
array to different
processes
An example of a
An example of a decomposition
Image- Travelling Time-
decomposition with a 8 Queen with a regular
52 3 dethering Salesman complexity
regular interaction pattern problem. interaction pattern
problem. Problem Problens
is? is the problem of
image dithering.
A feature of a
task-dependency
A feature of a task-
graph that
dependency graph that
determines the
53 3 determines the average Critical-path Process-path Granularity. Concurrency
average degree of
degree of concurrency for a
concurrency for a
given granularity is
given granularity is
critical path.
The shared-
address-space
The shared-address-space programming
54 3 programming paradigms can Both Two way One way None paradigms can
handle which interactions? handle both one-
way and two-way
interactions.
Cyclic Distribution
can result in an
Which distribution can result
almost perfect
in an almost perfect load
Cyclic Array Block-Cyclic Block load balance due
55 3 balance due to the extreme
Distribution. Distribution Distribution Distribution. to the extreme
fine-grained underlying
fine-grained
decomposition.
underlying
decomposition.
Data sharing
interactions can be
Data sharing interactions
categorized as
56 3 can be categorized Both Read-Write Read only None
either read-only or
as__________interactions?
read-write
interactions

OptimusPrime Page 182


marks question A B C D ans
Algo. Model is a
way of structuring
a parallel algorithm
What is the way of
by selecting a
structuring a parallel
decomposition
algorithm by selecting a
Algorithm Mapping and mapping
57 3 decomposition and mapping Parallel Model Data Model
Model Model technique and
technique and applying the
applying the
appropriate strategy to
appropriate
minimize interactions called?
strategy to
minimize
interactions.
This is Serial
Serial column Column- Bubble Sort
58 3 Which Algorithm is this? None. Column based
based Algo. Algorithm Algo.
algorithm.
Algorithms based on the Matrix- Parallel This is an
59 3 All Quicksort
task graph model include: Factorization QuickSort Explaination.
All-port
communication
Which model permits model permits
simultaneous communication All-port One-port Dual-port Quad-port simultaneous
60 1
on all the channels communication communication communication communication communication on
connected to a node? all the channels
connected to a
node.
A process sends the same
m-word message to every
other process, but different All to All One to All All to All This is an
61 1 None
processes may broadcast Broadcast Broadcast Reduction Explaination.
different messages. It is
called?
All to All One-to-all All-to-one One to one
The Matrix is transposed This is an
62 1 personalized personalized personalized personalized
using which operation? Explaination.
communication communication communication communication.
Each node in a
Each node in a two-
two-dimensional
63 1 dimensional wraparound Four Two Three One
wraparound mesh
mesh has how many ports?
has four ports
Circular shift is a member of
a broader class of global This is ann
64 1 Permutation Combination. Both a and b None
communication operations explaination.
known as?
We define a
circular q-shift as
We define_______ as the the operation in
operation in which node i which node i
Circular q-
65 1 sends a data packet to node Linear shift Circular shift Linear q-shift. sends a data
shift
(i + q) mod p in a p-node packet to node (i
ensemble (0 < q < p). + q) mod p in a p-
node ensemble (0
OptimusPrime < q <Page
p). 183
marks question A B C D ans
Parallel algorithms
often require a
Parallel algorithms often single process to
require a single process to send identical data
send identical data to all One to All One to One All to One to all other
66 1 None
other processes or to a Broadcast Broadcast Broadcast processes or to a
subset of them. This subset of them.
operation is known as? This operation is
known as One to
All Broadcast.
In which Communication
All to All One to One All-to-one One-to-all
each node sends a distinct This is an
67 1 personalized personalized personalized personalized
message of size m to every Explaination.
communication communication communication communication.
other node?
All to All personalized
communication operation is Matrix- Fourier Database Join This is an
68 1 Quick Sort
not used in a which of these Transpose Transformation operation Explaination.
parallel algorithms?
The dual of one to
The Dual of one-to-all All to one All to one One to Many All to All all Broadcast is
69 1
broadcast is? Reduction Broadcast Reduction Broadcast called all to one
reduction.
Reduction on a
Reduction on a linear array linear array can be
can be performed performed by
70 1 by_______ the direction Reversing Forwarding Escaping Widening simply reversing
and the sequence of the direction and
communication? the sequence of
communication
This equation is used to
solve which topology This is an
71 2 Hypercube Mesh Ring Linear-Array
operations in all to all Explaination.
communications?
\nThe communication
Second
pattern of all-to-all Third Variation First Variation Fifth Variation This is an
72 2 Variation of
broadcast can be used to of Reduction of Reduction of Reduction Explaination.
Reduction
perform________?
In the scatter
A single node sends a
operation, a single
unique message of size m to
node sends a
73 2 every other node. This Scatter Reduction Gather Concatenate
unique message of
operation is known
size m to every
as______?
other node.
The Algorithm represents All to All All to All All to All One to One This is an
74 2
which broadcast? Broadcast Broadcast Reduction Reduction explaination.
The message can be The message can
75 2 broadcast in how many Log(p) Log(p^2) One Sin(p) be broadcast in
steps? log p steps.
All to All One-to-all One to one All-to-one
This equation is used to This is an
76 2 personalized OptimusPrime
personalized personalized personalized
solve which operations? Explaination.
Page 184
communication communication communication communication.
marks question A B C D ans
There are n^3
There are how many
computations for
computations for n^2 words
77 2 N^3 Tan n E^n Log n n^2 words of data
of data transferred among
transferred among
the nodes?
the nodes.
Scatter opeartion
One-to-all One-to-one All-to-one All-to-all is also known as
Scatter Operation is also
78 2 personalized personalized personalized personalized One-to-all
known as?
communication communication communication communication. personalized
communication.
A hypercube with
A Hypercube with 2d nodes 2d nodes can be
can be regarded as a d- regarded as a d-
79 2 Two One Three Four
dimensional mesh with____ dimensional mesh
nodes in each dimension. with two nodes in
each dimension
One-to-all broadcast and
all-to-one reduction are Gausiian Shortest path Matrix- Vector This is an
80 2 All
used in several important Elimination Algo. multiplication Explaination.
parallel algorithms including?
Each node of the
Each node of the distributed-
distributed-memory parallel memory parallel
81 2 computer is a______ NUMA UMA CCMA None computer is a
shared-memory NUMA shared-
multiprocessor. memory
multiprocessor.
To perform a q-
To perform a q-shift, we shift, we expand q
82 2 expand q as a sum of 2 3 e Log p as a sum of
distinct powers of______? distinct powers of
2.
In which implementation of
This is an
83 3 circular shift, the entire row Mesh Hypercube Ring Linear
Explaination
to data set is shifted by
On a p-node
hypercube with
all-port
communication,
On a p-node hypercube
the coefficients of
with all-port communication,
tw in the
the coefficients of tw in the
expressions for the
expressions for the
communication
communication times of
84 3 Log(p) Cos(p) Sin(p) E^p times of one-to-all
one-to-all and all-to-all
and all-to-all
broadcast and personalized
broadcast and
communication are all
personalized
smaller than their single-port
communication are
counterparts by a factor of?
all smaller than
their single-port
OptimusPrime counterparts
Page 185by a
factor of log p.
marks question A B C D ans
The Equation represents
Data Model Space-Time Ans-(c) Cost
85 3 which analysis in All to All Cost Analysis Time Analysis
Analysis Analysis Analysis.
Broadcasts?
On a p-node hypercube, the
size of each message
86 3 A B C D A
exchanged in the i th of the
log p steps is?
This figure shows
One to All
Which broadcast is applied One to All One to One All to One All to one
87 3 Broadcast being
on this 3D hypercube? Broadcast Broadcast Broadcast Reduction
applied on 3D
hypercube.
The Equation represents
This is an
88 3 which analysis in One to All Cost Analysis Time Analysis Data Analysis Space Analysis
explaination.
Broadcasts?
The time for
The time for circular shift on circular shift on a
a hypercube can be hypercube can be
89 3 improved by almost a factor Log p Cos(p) e^p Sin p improved by
of ______ for large almost a factor of
messages. log p for large
messages.
The execution time
of a parallel
algorithm depends
not only on input
size but also on
The execution time of the number of
Relative Communication
90 1 parallel algorithm Processor Input Size processing
computation speed
doesn’t depends upon? elements used,
and their relative
computation and
interprocess
communication
speeds.
Processing elements in a The processing Both
parallel system may become Load element synchronization
91 1 Both Synchronization
idle due to many reasons Imbalance doesn’t and load
such as: become idle. imbalance
If the scaled-
speedup curve is
If the scaled-speedup curve close to linear with
is close to linear with respect to the
respect to the number of number of
92 1 Scalable Iso-scalable Non-Scalable Scale-Efficient
processing elements, then processing
the parallel system is elements, then the
considered as? parallel system is
considered
scalable

OptimusPrime Page 186


marks question A B C D ans
A parallel system
is the combination
Which system is the
of an algorithm
combination of an algorithm Parallel Data- Parallel Architecture
93 1 Series System and the parallel
and the parallel architecture System System System
architecture on
on which it is implemented?
which it is
implemented
Scalable Speedup
defined as the
What is defined as the speedup obtained
speedup obtained when the when the problem
Scalable Unscalable Superlinearity Isoefficiency
94 1 problem size is increased size is increased
Speedup Speedup Speedup Speedup
linearly with the number of linearly with the
processing elements? number of
processing
elements
The maximum
number of tasks
The maximum number of that can be
tasks that can be executed executed
95 1 simultaneously at any time in Concurrency Parallelism Linearity Execution simultaneously at
a parallel algorithm is called any time in a
its degree of__________. parallel algorithm
is called its degree
of concurrency.
The isoefficiency due to
This is an
96 1 concurrency in 2-D O(p) O(n Logp) O(1) O(n^2)
explaination.
partitioning is:
We define total
overhead of a
parallel system as
the total time
The total time collectively collectively spent
spent by all the processing by all the
elements over and above processing
that required by the fastest elements over and
Total Parallel
97 2 known sequential algorithm Overhead Serial Runtime above that
Overhead Runtime
for solving the same required by the
problem on a single fastest known
processing element is sequential
known as? algorithm for
solving the same
problem on a
single processing
element.
Parallel
Parallel computations computations
involving matrices and involving\nmatrices
98 2 vectors readily lend Decomposition Composition Linearity Parallelsim and vectors
themselves to data readily lend
______________. themselves to data
OptimusPrime
decomposition.
Page 187
marks question A B C D ans
Parallel 1-D with Pipelining
This is an
99 2 is a___________ Synchronous Asynchronous Optimal Cost-optimal
explaination.
algorithm?
The serial complexity of
This is an
100 2 Matrix-Matrix Multiplication Õ•(n^3) O(n^2) O(n) O(nlogn)
explaination
is:
What is the problem size for Ï´(n^3) is the
101 2 Ï´(n^3) Ï´(nlogn) Ï´(n^2) Ï´(1)
n x n matrix multiplication? problem size.
The given equation Overhead Series Parallel This is an
102 2 Parallel Model
represents which function? Function Overtime Overtime explaination.
The efficiency of a parallel
103 2 A B C D A
program can be written as:
The total number
The total number of steps in
of steps in the
104 2 the entire pipelined Θ(n) Θ(n^2) Θ(n^3) Θ(1)
entire pipelined
procedure is_______?
procedure is Θ(n)
In Canon's Algorithm, the This is an
105 2 θ(n^2) θ(n) θ(n^3) θ(nlogn)
memory used is? explaination.
Consider the
problem of
Consider the problem of
multiplying two n
multiplying two n × n
× n dense,
106 2 dense, square\nmatrices A A×B A/B A+B A-B
square\nmatrices
and B to yield the product
A and B to yield
matrix C =:
the product matrix
C = A × B.
The serial runtime of
multiplying a matrix of
107 2 A B C D A
dimension n x n with a
vector is?
Efficiency is a
measure of the
________is a measure of
fraction of time for
the fraction of time for Overtime
108 2 Efficiency Linearity Superlinearity which a
which a processing element Function
processing
is usefully employed.
element is usefully
employed.
When the work performed
by a serial algorithm is
greater than its parallel
formulation or due to Superlinear Linear Performance This is an
109 2 Super Linearity
hardware features that put Speedup Speedups Metrics explaintion
the serial implementation at
a disadvantage.This
phenomena is known as?
The all-to-all broadcast and
This is an
110 3 the computation of y[i] both Θ(n) Θ(nlogn) Θ(n^2) Θ(n^3)
explaination.
take time?

OptimusPrime Page 188


marks question A B C D ans
If virtual
processing
elements are
mapped
If virtual processing
appropriately onto
elements are mapped
physical
appropriately onto physical
processing
111 3 processing elements, the N/p P/n N+p N*p
elements, the
overall communication time
overall
does not grow by more than
communication
a factor of
time does not
grow by more
than a factor of
n/p
Parallel execution time can
be expressed as a function
of problem size, overhead
112 3 A B C D A
function, and the number of
processing elements.The
Formed eqn is:
In 2-D partioning, the first Ts + twn/
113 3 Ts - twn/√p.\n Ts*twn/√p.\n Ts/ twn*√p.\n Ts + twn/√p.\n
alignment takes time=? √p.\n
Using fewer than
the maximum
Using fewer than the
possible number
maximum possible number
of processing
114 3 of processing elements to Scaling Down Scaling up Scaling Stimulation
elements to
execute a parallel algorithm
execute a parallel
is called________?
algorithm is called
scaling down.
Which of the following is a
Memory This is an
115 3 drawback of matrix matrix Efficient Time-bound Complex
Optimal explaination
multiplication?
Consider the
problem of sorting
Consider the problem of 1024 numbers (n
sorting 1024 numbers (n = = 1024, log n =
116 3 1024, log n = 10) on 32 P/log n P*log n P+logn N*log p 10) on 32
processing elements. The processing
speedup expected is elements. The
speedup expected
is p/logn
Consider the problem of
adding n numbers on p
processing elements such
Ꙩ((n/p) log Ꙩ((n*p) log Ꙩ((p/n) log Ans-(a)-
117 3 that p < n and both n and p Ꙩ((n) log p).
p). p). p). Ꙩ((n/p) log p).
are powers of 2. The overall
parallel execution time of the
problem is:
DNS algorithm has____ DNS has Ω(n)
118 3 Ω(n) Ω(n^2) Ω(n^3) Ω(logn)
runtime? OptimusPrime runtime
Page 189
marks question A B C D ans
The serial algorithm Ans-(b)-n^2. The
requires______ serial algorithm
119 3 multiplications and additions N^2 N^3 Log n Nlog(n) requires n^2
in matrix-vector multiplications and
multiplication.\n additions.\n\n
The time required
The time required to merge to merge two
120 1 two sorted blocks of n/p θ(n/p) θ(n) θ(p/n) θ(nlogp) sorted blocks of
elements is_________?\n\n n/p elements is
θ(n/p).\n\n
The stack is split
into two equal
In Parallel DFS,the stack is pieces such that
split into two equal pieces the size of the
such that the size of the search space
121 1 Half-Split Half-Split Parallel-Split None
search space represented represented by
by each stack is the same. each stack is the
Such a split is called?. same. Such a split
is called a half-
split.
122. (1 mark) To avoid sending very small amounts of work, nodes beyond a specified stack depth are not given away. This depth is called the ________ depth.
Cut-Off
Breakdown
Full
Series
Ans: Cut-Off

123. (1 mark) In sequential sorting algorithms, the input and the sorted sequences are stored in which memory?
Main Memory
Process Memory
Secondary Memory
External Memory
Ans: Process Memory

124. (1 mark) Each process sends its block to the other process. Now, each process merges the two sorted blocks and retains only the appropriate half of the merged block. We refer to this operation as?
Compare-Split
Split
Compare
Exchange
Ans: Compare-Split

125. (1 mark) Each process compares the received element with its own and retains the appropriate element. We refer to this operation as _______?
Compare-Exchange
Exchange
Process-Exchange
All
Ans: Compare-Exchange
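
For reference, a minimal C sketch of the two operations described in questions 124 and 125 (function and variable names are ours, not from the question bank): compare-exchange retains the smaller or larger of a pair, while compare-split merges two sorted k-element blocks and keeps the appropriate half.

    /* Compare-exchange: retain the smaller (keep_min != 0) or larger element. */
    void compare_exchange(int *mine, int received, int keep_min) {
        if (keep_min)
            *mine = (*mine < received) ? *mine : received;
        else
            *mine = (*mine > received) ? *mine : received;
    }

    /* Compare-split: merge two sorted k-element blocks, keep one half. */
    void compare_split(int *mine, const int *other, int k, int keep_small) {
        int merged[2 * k];                  /* C99 variable-length array */
        int i = 0, j = 0, m = 0;
        while (i < k && j < k)              /* standard two-way merge */
            merged[m++] = (mine[i] <= other[j]) ? mine[i++] : other[j++];
        while (i < k) merged[m++] = mine[i++];
        while (j < k) merged[m++] = other[j++];
        for (m = 0; m < k; m++)             /* retain the appropriate half */
            mine[m] = keep_small ? merged[m] : merged[k + m];
    }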
126. (1 mark) Which algorithm maintains the unexpanded nodes in the search graph, ordered according to their l-value?
Parallel BFS
Parallel DFS
Both a and b
None
Ans: Parallel BFS

127. (1 mark) The critical issue in parallel depth-first search algorithms is the distribution of the search space among the ____________?
Processors
Space
Memory
Blocks
Ans: Processors

128. (2 marks) Enumeration Sort uses how many processes to sort n elements?
n^2
log n
n^3
n
Ans: n^2

129. (2 marks) Which sequence is a sequence of elements <a0, a1, ..., an-1> with the property that either (1) there exists an index i, 0 ≤ i ≤ n - 1, such that <a0, ..., ai> is monotonically increasing and <ai+1, ..., an-1> is monotonically decreasing, or (2) there exists a cyclic shift of indices so that (1) is satisfied?
Bitonic Sequence
Acyclic Sequence
Asymptotic Sequence
Cyclic Sequence
Ans: Bitonic Sequence
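
As a concrete illustration (the values are ours): <1, 3, 5, 7, 6, 4, 2, 0> is bitonic, since it increases and then decreases. A bitonic sequence can be sorted by recursive bitonic splits, sketched below in C for n a power of two:

    /* Sort an already-bitonic array ascending via recursive bitonic splits. */
    void bitonic_merge(int *a, int n) {
        if (n < 2) return;
        int half = n / 2;
        for (int i = 0; i < half; i++)
            if (a[i] > a[i + half]) {       /* compare-exchange across halves */
                int t = a[i]; a[i] = a[i + half]; a[i + half] = t;
            }
        bitonic_merge(a, half);             /* both halves are again bitonic */
        bitonic_merge(a + half, half);
    }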
130. (2 marks) To make a substantial improvement over odd-even transposition sort, we need an algorithm that moves elements long distances. Which one of these is such a serial sorting algorithm?
Shell Sort
Linear Sort
Quick-Sort
Bubble Sort
Ans: Shell Sort

131. (2 marks) Quick-Sort is a _________ algorithm?
Divide and Conquer
Greedy Approach
Both a and b
None
Ans: Divide and Conquer

132. (2 marks) The _______ transposition algorithm sorts n elements in n phases (n is even), each of which requires n/2 compare-exchange operations.
Odd-Even
Odd
Even
None
Ans: Odd-Even
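
A serial C sketch of the algorithm just described (names ours):

    /* Odd-even transposition sort: n phases, each with ~n/2 compare-exchanges. */
    void odd_even_transposition(int *a, int n) {
        for (int phase = 0; phase < n; phase++) {
            int start = (phase % 2 == 0) ? 0 : 1;   /* alternate even/odd pairs */
            for (int i = start; i + 1 < n; i += 2)
                if (a[i] > a[i + 1]) {
                    int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
                }
        }
    }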
133. (2 marks) The average time complexity for Bucket Sort is?
O(n+k)
O(nlog(n+k))
O(n^3)
θ(n^2)
Ans: O(n+k)

134. (2 marks) A popular serial algorithm for sorting an array of n elements whose values are uniformly distributed over an interval [a, b] is which algorithm?
Bucket Sort
Quick-Sort
Linear Sort
Bubble-Sort
Ans: Bucket Sort

135. (2 marks) Best Case time complexity of Bubble Sort is:
O(n)
O(n^3)
O(nlogn)
O(n^2)
Ans: O(n)

136. (2 marks) When more than one process tries to write to the same memory location, only one arbitrarily chosen process is allowed to write, and the remaining writes are ignored. This is called _________ in quick sort.
CRCW-PRAM
PRAM
Partitioning
CRCW
Ans: CRCW-PRAM

137. (2 marks) Average Time Complexity of the quicksort algorithm is:
O(nlogn)
O(n)
O(n^3)
θ(n^2)
Ans: O(nlogn)

138. (2 marks) The isoefficiency function of Global Round Robin (GRR) is:
O(p^2 log p)
O(p log p)
O(log p)
O(p^2)
Ans: O(p^2 log p)

139. (2 marks) A _____ is a device with two inputs x and y and two outputs x' and y' in a Sorting Network.
Comparator
Router
Separator
Switch
Ans: Comparator

140. (2 marks) If T is a DFS tree in G then the parallel implementation of the algorithm runs in ______________ time complexity.
O(t)
O(tlogn)
O(logt)
O(1)
Ans: O(t) (the algorithm outputs a proof that can be verified in O(t) time)

141. (2 marks) In the quest for fast sorting methods, a number of networks have been designed that sort n elements in time significantly smaller than ___?
θ(nlogn)
θ(n)
θ(1)
θ(n^2)
Ans: θ(nlogn)

142. (2 marks) The average value of the search overhead factor in parallel DFS is less than ______?
One
Two
Three
Four
Ans: One

143. (3 marks) Parallel runtime for Ring architecture in a bitonic sort is:
θ(n)
θ(nlogn)
θ(n^2)
θ(n^3)
Ans: θ(n)
144. (3 marks) The Sequential Complexity of the Odd-Even Transposition Algorithm is:
θ(n^2)
θ(nlogn)
θ(n^3)
θ(n)
Ans: θ(n^2)

145. (3 marks) The Algorithm represents which bubble sort:
Sequential Bubble Sort
Circular Bubble Sort
Simple Bubble Sort
Linear Bubble Sort
Ans: Sequential Bubble Sort

146. (3 marks) Enumeration Sort uses how much time to sort n elements?
θ(1)
θ(nlogn)
θ(n^2)
θ(n)
Ans: θ(1)

147. (3 marks) The ______ algorithm relies on the binary representation of the elements to be sorted.
Radix-sort
Bubble Sort
Quick-Sort
Bucket-Sort
Ans: Radix-sort

148. (3 marks) Parallel runtime for Mesh architecture in a bitonic sort is:
θ(n/logn)
θ(n)
θ(n^2)
θ(n^3)
Ans: θ(n/logn)

149. (1 mark) The number of threads in a thread block is limited by the architecture to a total of how many threads per block?
512
502
510
412
Ans: 512

150. (1 mark) CUDA Architecture is mainly provided by which company?
NVIDIA
Intel
Apple
IBM
Ans: NVIDIA

151. (1 mark) In CUDA Architecture, what are subprograms called?
Kernel
Grid
Element
Blocks
Ans: Kernel

152. (1 mark) What is the full form of CUDA?
Compute Unified Device Architecture
Computer Unified Device Architecture
Common USB Device Architecture
Common Unified Disk Architecture
Ans: Compute Unified Device Architecture

153. (2 marks) Which of these is not an application of CUDA Architecture?
Thermo Dynamics
Neural Networks
VLSI Simulation
Fluid Dynamics
Ans: Thermo Dynamics

154. (2 marks) CUDA programming is especially well-suited to address problems that can be expressed as __________ computations.
Data parallel
Task Parallel
Both a and b
None
Ans: Data parallel

155. (2 marks) CUDA C/C++ uses which keyword in programming:
global
kernel
Cuda_void
nvcc
Ans: global (the __global__ qualifier)

156. (2 marks) CUDA programs are saved with _____ extension.
.cd
.cx
.cc
.cu
Ans: .cu
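
Tying questions 151 to 156 together, a minimal sketch (the kernel and file names are ours): the kernel carries the __global__ keyword, the source is saved as, say, add.cu, and is compiled with nvcc.

    __global__ void add(int *c, const int *a, const int *b, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread index */
        if (i < n)
            c[i] = a[i] + b[i];
    }
    /* host side: add<<<blocksPerGrid, threadsPerBlock>>>(c, a, b, n); */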
157. (2 marks) The Kepler K20X chip block contains ____ streaming multiprocessors (SMs).
15
8
16
7
Ans: 15

158. (2 marks) The Kepler K20X architecture increases the register file size to:
64K
32K
128K
256K
Ans: 64K

159. (2 marks) The register file in a GPU is of what size?
2 MB
1 MB
3 MB
1024 B
Ans: 2 MB

160. (2 marks) NVIDIA's GPU computing platform is not enabled on which of the following product families:
AMD
Tegra
Quadro
Tesla
Ans: AMD

161. (2 marks) Tesla K-40 has compute capability of:
3.5
3.2
3.4
3.1
Ans: 3.5

162. (2 marks) The SIMD unit creates, manages, schedules and executes _____ threads simultaneously to create a warp.
32
16
24
8
Ans: 32

163. (2 marks) Which hardware is used by the host interface to fasten the transfer of bulk data to and from the graphics pipeline?
Direct Memory Access
Memory Hardware
Switch
Hub
Ans: Direct Memory Access

164. (2 marks) A ____ is a collection of thread blocks of the same thread dimensionality which all execute the same kernel.
Grid
Core
Element
Blocks
Ans: Grid

165. (2 marks) Active Warps can be classified into how many types?
3
2
4
5
Ans: 3

166. (2 marks) All threads in a grid share the same _________ space.
Global memory
Local Memory
Synchronized Memory
All
Ans: Global memory

167. (2 marks) CUDA was introduced in which year?
2007
2006
2008
2010
Ans: 2007

168. (3 marks) Unlike a C function call, all CUDA kernel launches are:
Asynchronous
Synchronous
Both a and b
None
Ans: Asynchronous
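
Since the launch returns immediately, host code that needs the kernel's results must synchronize explicitly; a sketch reusing the kernel from the earlier example:

    add<<<blocksPerGrid, threadsPerBlock>>>(c, a, b, n);  /* returns at once */
    cudaDeviceSynchronize();         /* block the host until the kernel finishes */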
169. (3 marks) A warp consists of ____ consecutive threads and all threads in a warp are executed in Single Instruction Multiple Thread (SIMT) fashion.
32
16
64
128
Ans: 32

170. (3 marks) There are how many streaming multiprocessors in CUDA architecture?
16
8
12
4
Ans: 16

171. (3 marks) In CUDA programming, if CPU is the host then device will be:
GPU
Compiler
HDD
GPGPU
Ans: GPU

172. (3 marks) Both grids and blocks use the ______ type with three unsigned integer fields.
dim3
dim2
dim1
dim4
Ans: dim3
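
A sketch of a 2-D launch configuration with dim3 (the sizes and kernel name are illustrative, not from the question bank):

    dim3 block(16, 16);        /* 256 threads per block; the unset z defaults to 1 */
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    kernel2D<<<grid, block>>>(img, width, height);   /* hypothetical kernel */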
173. (3 marks) Tesla P100 GPU based on the Pascal GPU Architecture has 56 Streaming Multiprocessors (SMs), each capable of supporting up to ____ active threads.
2048
512
1024
256
Ans: 2048

174. (3 marks) The maximum size at each level of the thread hierarchy is _____ dependent.
Device
Host
Compiler
Memory
Ans: Device

175. (3 marks) Intel i7 has the memory bus of width:
19B
180B
152B
102B
Ans: 19B

176. (3 marks) The __________ is the heart of the GPU architecture:
Streaming Multiprocessor
CUDA
Compiler
Multiprocessor
Ans: Streaming Multiprocessor (SM)

177. (3 marks) A kernel is defined using the _____ declaration specification.
_global
_host
_device
_void
Ans: _global (the __global__ qualifier)

178. (3 marks) The function printThreadInfo() is not used to print out which of the following information about each thread:
Block Index
Matrix Coordinates
Control-Index
Memory Allocations
Ans: Memory Allocations
----------------------------------------------------------------------------------------------------------------------------- --
UNIT 1
----------------------------------------------------------------------------------------------------------------------------- --

1. Which is the type of Microcomputer Memory

Address
Contents
Both a and b
none
Ans:
Both a and b

2. A collection of lines that connects several devices is called

Bus
Peripheral connection wires
Both a and b
internal wires
Ans:
Bus

3. Conventional architectures coarsely comprise of a

Processor
Memory System
Data path
All of the above
Ans:
All of the above

4. VLIW processors rely on

Compile time analysis


Initial time analysis
Final time analysis
id time analysis
Ans:
Compile time analysis

5. HPC is not used in high span bridges

True
False
Ans:
False

6. The access time of memory is …………… the time required for performing any single CPU operation.

longer than
shorter than

negligible than
same as
Ans:
longer than

7. Data intensive applications utilize_

High aggregate throughput


High aggregate network bandwidth
high processing and memory system performance
none of above
Ans:
High aggregate throughput

8. Memory system performance is largely captured by_

Latency
bandwidth
both a and b
none of above
Ans:
both a and b

9. A processor performing fetch or decoding of different instruction during the execution of another
instruction is called __ .

Super-scaling
Pipe-lining
Parallel Computation
none of above
Ans:
Pipe-lining

10. For a given FINITE number of instructions to be executed, which architecture of the processor
provides for a faster execution ?

ISA
ANSA
Super-scalar
All of the above
Ans:
Super-scalar



11. HPC works out to be economical.

True
false
Ans:
True

12. High Performance Computing tasks of the Computer System are done by

Node Cluster
Network Cluster
Beowulf Cluster
Stratified Cluster
Ans:
Beowulf Cluster

13. Octa Core Processors are the processors of the computer system that contains

2 Processors
4 Processors
6 Processors
8 Processors
Ans:
8 Processors

14. Parallel computing uses _ execution

sequential
unique
simultaneous
None of above
Ans:
simultaneous

15. Which of the following is NOT a characteristic of parallel computing?

Breaks a task into pieces


Uses a single processor or computer
Simultaneous execution
May use networking
Ans:
Uses a single processor or computer

16. Which of the following is true about parallel computing performance?

Computations use multiple processors


There is an increase in speed
The increase in speed is loosely tied to the number of processor or computers used
All of the Ans:s are correct.
Ans:
All of the Ans:s are correct.

17. __ leads to concurrency.

Serialization
Parallelism
Serial processing
Distribution

Ans:
Parallelism

18. MIPS stands for?

Mandatory Instructions/sec
Millions of Instructions/sec
Most of Instructions/sec
Many Instructions / sec
Ans:
Millions of Instructions/sec

19. Which MIMD systems are best scalable with respect to the number of processors

Distributed memory computers


ccNUMA systems
Symmetric multiprocessors
None of above
Ans:
Distributed memory computers



20. To which class of systems does the von Neumann computer belong?

SIMD (Single Instruction Multiple Data)


MIMD (Multiple Instruction Multiple Data)
MISD (Multiple Instruction Single Data)
SISD (Single Instruction Single Data)
Ans:
SISD (Single Instruction Single Data)

21. Which of the architecture is power efficient?

CISC
RISC
ISA
IANA
Ans:
RISC

22. Pipe-lining is a unique feature of _.

RISC
CISC
ISA
IANA
Ans:
RISC

23. The computer architecture aimed at reducing the time of execution of instructions is __.

RISC
CISC
ISA
IANA
Ans:
RISC

24. Type of microcomputer memory is

processor memory
primary memory
secondary memory
All of above
Ans:
All of above

25. A pipeline is like_

Overlaps various stages of instruction execution to achieve performance.


House pipeline
Both a and b
A gas line
Ans:
Overlaps various stages of instruction execution to achieve performance.

26. Scheduling of instructions is determined by_

True Data Dependency


Resource Dependency
Branch Dependency
All of above
Ans:
All of above

27. The fraction of data references satisfied by the cache is called_

Cache hit ratio


Cache fit ratio
Cache best ratio
none of above
Ans:
Cache hit ratio

28. A single control unit that dispatches the same Instruction to various processors is__

SIMD
SPMD
MIMD
none of above

Ans:
SIMD

29. The primary forms of data exchange between parallel tasks are_

Accessing a shared data space


Exchanging messages.
Both A and B
none of above
Ans:
Both A and B

30. Switches map a fixed number of inputs to outputs.

True
False
Ans:
True
----------------------------------------------------------------------------------------------------------------------------- --
UNIT 2
-------------------------------------------------------------------------------------------------------------------------------

1. The First step in developing a parallel algorithm is_

To Decompose the problem into tasks that can be executed concurrently


Execute directly
Execute indirectly
None of Above
Ans:
To Decompose the problem into tasks that can be executed concurrently

2. The number of tasks into which a problem is decomposed determines its_

Granularity
Priority
Modernity
None of Above
Ans:
Granularity

3. The length of the longest path in a task dependency graph is called_

the critical path length


the critical data length
the critical bit length
None of Above
Ans:
the critical path length

4. The graph of tasks (nodes) and their interactions/data exchange (edges)_

Is referred to as a task interaction graph
Is referred to as a task Communication graph
Is referred to as a task interface graph
None of Above
Ans:
Is referred to as a task interaction graph

5. Mappings are determined by_

task dependency
task interaction graphs
Both A and B
None of Above
Ans:
Both A and B

6. Decomposition Techniques are_

recursive decomposition
data decomposition
exploratory decomposition
speculative decomposition
All of above
Ans:
All of above

7. The Owner Computes rule generally states that the process assigned a particular data item is
responsible for _

All computation associated with it


Only one computation
Only two computation
Only occasionally computation
Ans:
All computation associated with it

8. A simple application of exploratory decomposition is_

The solution to a 15 puzzle


The solution to 20 puzzle
The solution to any puzzle
None of Above
Ans:
The solution to a 15 puzzle

9. Speculative Decomposition consist of _

conservative approaches
optimistic approaches
Both A and B
only B

Ans:
Both A and B



10. task characteristics include:

Task generation.
Task sizes.
Size of data associated with tasks.
All of above
Ans:
All of above

11. What is a high performance multi-core processor that can be used to accelerate a wide variety of
applications using parallel computing.

CLU
GPU
CPU
DSP
Ans:
GPU

12. What is GPU?

Grouped Processing Unit


Graphics Processing Unit
Graphical Performance Utility
Graphical Portable Unit
Ans:
Graphics Processing Unit

13. A code, known as GRID, which runs on a GPU, consists of a set of

32 Thread
32 Block
Unit Block
Thread Block
Ans:
Thread Block

14. Interprocessor communication that takes place

Centralized memory
Shared memory
Message passing
Both A and B
Ans:
Both A and B

15. Decomposition into a large number of tasks results in coarse-grained decomposition

True
False
Ans:
False

16. Relevant task characteristics include

Task generation.
Task sizes
Size of data associated with tasks
Overhead
both A and B
Ans:
both A and B

17. The fetch and execution cycles are interleaved with the help of __

Modification in processor architecture


Clock
Special unit
Control unit
Ans:
Clock

18. The processor of system which can read /write GPU memory is known as

kernal
device
Server
Host
Ans:
Host

19. Increasing the granularity of decomposition and utilizing the resulting concurrency to perform more
tasks in parallel decreases performance.

TRUE
FALSE
Ans:
FALSE



20. If there is dependency between tasks it implies their is no need of interaction between them.

TRUE
FALSE
Ans:
FALSE

21. Parallel quick sort is example of task parallel model

TRUE
FALSE
Ans:
TRUE

22. True Data Dependency is

The result of one operation is an input to the next.


Two operations require the same resource.
Ans:
The result of one operation is an input to the next.

23. What is Granularity ?

The size of database


The size of data item
The size of record
The size of file
Ans:
The size of data item

24. In coarse-grained parallelism, a program is split into …………………… tasks and ……………………… size

Large tasks , Smaller Size


Small Tasks , Larger Size
Small Tasks , Smaller Size
Equal task, Equal Size
Ans:
Large tasks , Smaller Size

-------------------------------------------------------------------------------------------------------------------------------
UNIT 3
----------------------------------------------------------------------------------------------------------------------------- --

1. The primary and essential mechanism to support the sparse matrices is

Gather-scatter operations
Gather operations
Scatter operations
Gather-scatter technique
Ans:
Gather-scatter operations

2. In the gather operation, a single node collects a ———

Unique message from each node


Unique message from only one node
Different message from each node
None of Above

Ans:
Unique message from each node

3. In the scatter operation, a single node sends a ————

Unique message of size m to every other node


Different message of size m to every other node
Different message of different size m to every other node
All of Above
Ans:
Unique message of size m to every other node

4. Is all-to-all broadcasting the same as all-to-all personalized communication?

Yes
No
Ans:
No

5. Is scatter operation is same as Broadcast?

Yes
No
Ans:
No

6. All-to-all personalized communication is also known as

Total Exchange
Personal Message
Scatter
Gather
Ans:
Total Exchange

7. By which way, scatter operation is different than broadcast

Message size
Number of nodes
Same
None of above
Ans:
Message size

8. The gather operation is exactly the _ of the scatter operation

Inverse
Reverse
Multiple
Same
Ans:

Inverse

9. The gather operation is exactly the inverse of the_

Scatter operation
Broadcast operation
Prefix Sum
Reduction operation
Ans:
Scatter operation



10. The dual of one-to-all broadcast is all-to-one reduction. True or False?

TRUE
FALSE
Ans:
TRUE

11. A binary tree in which processors are (logically) at the leaves and internal nodes are routing nodes.

TRUE
FALSE
Ans:
TRUE

12. Group communication operations are built using point-to-point messaging primitives

TRUE
FALSE
Ans:
TRUE

13. Communicating a message of size m over an uncongested network takes time ts + tw·m

True
False
Ans:
True

14. Parallel programs: Which speedup could be achieved according to Amdahl´s law for infinite number
of processors if 5% of a program is sequential and the remaining part is ideally parallel?

Infinite speedup
5
20
None of above
Ans:
20
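
Worked out: with sequential fraction s = 0.05, Amdahl's law gives speedup = 1 / (s + (1 - s)/p), which tends to 1/s = 1/0.05 = 20 as p goes to infinity.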

15. Shift register that performs a circular shift is called

Invalid Counter
Valid Counter
Ring
Undefined
Ans:
Ring

16. 8 bit information can be stored in

2 Registers
4 Registers
6 Registers
8 Registers
Ans:
8 Registers

17. The result of prefix expression * / b + – d a c d, where a = 3, b = 6, c = 1, d = 5 is

0
5
10
8
Ans:
10
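
Worked out: evaluating * / b + - d a c d inside-out gives d - a = 5 - 3 = 2, then 2 + c = 3, then b / 3 = 6 / 3 = 2, then 2 * d = 2 * 5 = 10.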

18. The height of a binary tree is the maximum number of edges in any root to leaf path. The maximum
number of nodes in a binary tree of height h is?

2^h - 1
2^(h-1) - 1
2^(h+1) - 1
2 * (h+1)
Ans:
2^(h+1) - 1

19. A hypercube has_

2^d nodes
2d nodes
2n Nodes
N Nodes
Ans:
2^d nodes



20. The Prefix Sum Operation can be implemented using the_

All-to-all broadcast kernel


All-to-one broadcast kernel
One-to-all broadcast Kernel

Scatter Kernel
Ans:
All-to-all broadcast kernel
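
One concrete realization of the prefix-sum idea uses recursive doubling; a minimal C/MPI sketch (names ours; assumes the number of processes is a power of two and one integer per process):

    /* Hypercube-style prefix sum: log p exchange steps. */
    int prefix_sum(int mine, MPI_Comm comm) {
        int rank, p, total = mine, result = mine;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &p);
        for (int d = 1; d < p; d <<= 1) {
            int partner = rank ^ d, recvd;
            MPI_Sendrecv(&total, 1, MPI_INT, partner, 0,
                         &recvd, 1, MPI_INT, partner, 0,
                         comm, MPI_STATUS_IGNORE);
            total += recvd;                   /* sum over the current subcube */
            if (partner < rank)
                result += recvd;              /* only lower ranks enter my prefix */
        }
        return result;
    }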

21.In the scatter operation_

Single node send a unique message of size m to every other node


Single node send a same message of size m to every other node
Single node send a unique message of size m to next node
None of Above
Ans:
Single node send a unique message of size m to every other node

22. In All-to-All Personalized Communication Each node has a distinct message of size m for every other
node

True
False
Ans:
True

23. A binary tree in which processors are (logically) at the leaves and internal nodes are
routing nodes.

True
False
Ans:
True

24. In All-to-All Broadcast each processor is thesource as well as destination.

True
False
Ans:
True

----------------------------------------------------------------------------------------------------------------------------- --
UNIT 4
-------------------------------------------------------------------------------------------------------------------------------

1. mathematically efficiency is

e=s/p
e=p/s
e*s=p/2
e=p+e/e
Ans:
e=s/p
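
Worked out: if p = 16 processing elements give speedup S = 8, then efficiency E = S/p = 8/16 = 0.5.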

2. Cost of a parallel system is sometimes referred to____ of product

work
processor time
both
none
Ans:
both

3. Scaling Characteristics of Parallel Programs Ts is

increase
constant
decreases
none
Ans:
constant

4. Speedup tends to saturate and efficiency _ as a consequence of Amdahl’s law.

increase
constant
decreases
none
Ans:
decreases

5. Speedup obtained when the problem size is ______ linearly with the number of processing elements.

increase
constant
decreases
depend on problem size
Ans:
increase

6. The n × n matrix is partitioned among n processors, with each processor storing a complete ____ of the
matrix.

row
column
both
depend on processor
Ans:
row

7. cost-optimal parallel systems have an efficiency of _____

1
n
logn
complex
Ans:

1

8. The n × n matrix is partitioned among n^2 processors such that each processor owns a _ element

n
2n
single
double
Ans:
single

9. how many basic communication operations are used in matrix vector multiplication

1
2
3
4
Ans:
3

10. In DNS algorithm of matrix multiplication it used

1d partition
2d partition
3d partition
both a,b
Ans:
3d partition



11. In the Pipelined Execution, steps contain

normalization
communication
elimination
all
Ans:
all

12. the cost of the parallel algorithm is higher than the sequential run time by a factor of __

3/2
2/3
3*2
2/3+3/2
Ans:
3/2

13. The load imbalance problem in Parallel Gaussian Elimination can be alleviated by using a __ mapping

acyclic
cyclic
both
none
Ans:
cyclic

14. A parallel algorithm is evaluated by its runtime in function of

the input size,


the number of processors,
the communication parameters.
all
Ans:
all

15. For a problem consisting of W units of work, p__W processors can be used optimally

<=
>=
<
>
Ans:
<=

16. C(W)__Θ(W) for optimality (necessary condition).

>
<
<=
equals
Ans:
equals

17. Many interactions in practical parallel programs occur in ____ pattern

well defined
zig-zac
reverse
straight
Ans:
well defined

18. efficient implementation of basic communication operation can improve

performance
communication
algorithm
all
Ans:
performance

19. efficient use of basic communication operations can reduce

development effort
software quality
both
none
Ans:
development effort



20. Group communication operations are built using _____ Messenging primitives.

point-to-point
one-to-all
all-to-one
none
Ans:
point-to-point

21. One processor has a piece of data and it needs to send it to everyone; this is

one -to-all
all-to-one
point -to-point
all of above
Ans:
one -to-all

22. The simplest way to send p-1 messages from source to the other p-1 processors

Algorithm
communication
concurrency
receiver
Ans:
concurrency

23. In an eight-node ring, node __ is the source of broadcast

1
2
8
0
Ans:
0

24. The processors compute __ product of the vector element and the local matrix

local
global
both
none

Ans:
local

25. one to all broadcast use

recursive doubling
simple algorithm
both
none
Ans:
recursive doubling

26. In a broadcast and reduction on a balanced binary tree reduction is done in __

recursive order
straight order
vertical order
parallel order
Ans:
recursive order

27. if “X” is the message to broadcast it initially resides at the source node

1
2
8
0
Ans:
0

28. logical operators used in algorithm are

XOR
AND
both
none
Ans:
both

29. Generalization of broadcast in which each processor is

Source as well as destination


only source
only destination
none
Ans:
Source as well as destination

30. The algorithm terminates in _ steps

p+1
p+2
p-1
Ans:
p-1

31. Each node first sends to one of its neighbours the data it needs to….

broadcast
identify
verify
none
Ans:
broadcast

32. The second communication phase is a columnwise __ broadcast of consolidated

All-to-all
one -to-all
all-to-one
point-to-point
Ans:
All-to-all

33. All nodes collect __ messages corresponding to √p nodes respectively

√p
p
p+1
p-1
Ans:
√p

34. It is not possible to port __ to a higher dimensional network

Algorithm
hypercube
both
none
Ans:
Algorithm

35. If we port the algorithm to a higher dimensional network it would cause

error
contention
recursion
none
Ans:
contention

36. In the scatter operation a __ node sends a message to every other node

single
double
triple
none
Ans:
single

37. The gather operation is exactly the inverse of _

scatter operation
recursion operation
execution
none
Ans:
scatter operation

38. Similar communication pattern to all-to-all broadcast, except in the _____

reverse order
parallel order
straight order
vertical order
Ans:
reverse order

----------------------------------------------------------------------------------------------------------------------------- --
UNIT 5
----------------------------------------------------------------------------------------------------------------------------- --

1. In _, the number of elements to be sorted is small enough to fit into the process’s main memory.

internal sorting
internal searching
external sorting
external searching
Ans:
internal sorting

2. __ algorithms use auxiliary storage (such as tapes and hard disks) for sorting because the number of
elements to be sorted is too large to fit into memory.

internal sorting
internal searching
External sorting
external searching
Ans:
External sorting

3. __ can be comparison-based or noncomparison-based.

searching
Sorting
both a and b
none of above
Ans:
Sorting

4. The fundamental operation of comparison-based sorting is __

compare-exchange
searching
Sorting
swapping
Ans:
compare-exchange

5. The complexity of bubble sort is Θ(n^2)

TRUE
FALSE
Ans:
TRUE

6. Bubble sort is difficult to parallelize since the algorithm has no concurrency.

TRUE
FALSE
Ans:
TRUE

7. Quicksort is one of the most common sorting algorithms for sequential computers because of its
simplicity, low overhead, and optimal average complexity.

TRUE
FALSE
Ans:
TRUE

8. The performance of quicksort depends critically on the quality of the __

non-pivote
pivot
center element
len of array
Ans:
pivot

9. the complexity of quicksort is O(nlog n)

TRUE

FALSE
Ans:
TRUE

10. DFS begins by expanding the initial node and generating its successors. In each subsequent step, DFS
expands one of the most recently generated nodes.

TRUE
FALSE
Ans:
TRUE



11. The main advantage of __ is that its storage requirement is linear in the depth of the state space being
searched.

BFS
DFS
a and b
none of above
Ans:
DFS

12. _ algorithms use a heuristic to guide search.

BFS
DFS
a and b
none of above
Ans:
BFS

13. If the heuristic is admissible, the BFS finds the optimal solution.

TRUE
FALSE
Ans:
TRUE

14. The search overhead factor of the parallel system is defined as the ratio of the work done by the
parallel formulation to that done by the sequential formulation

TRUE
FALSE
Ans:
TRUE

15. The critical issue in parallel depth-first search algorithms is the distribution of the search space among
the processors.

TRUE

FALSE
Ans:
TRUE

16. Graph search involves a closed list, where the major operation is a _

sorting
searching
lookup
none of above
Ans:
lookup

17. Breadth First Search is equivalent to which of the traversal in the Binary Trees?

Pre-order Traversal
Post-order Traversal
Level-order Traversal
In-order Traversal
Ans:
Level-order Traversal

18. Time Complexity of Breadth First Search is? (V – number of vertices, E – number of edges)

O(V + E)
O(V)
O(E)
O(V*E)
Ans:
O(V + E)

19. Which of the following is not an application of Breadth First Search?

When the graph is a Binary Tree


When the graph is a Linked List
When the graph is a n-ary Tree
When the graph is a Ternary Tree
Ans:
When the graph is a Linked List



20. In BFS, how many times a node is visited?

Once
Twice
Equivalent to number of indegree of the node
Thrice
Ans:
Equivalent to number of indegree of the node

21. Is Best First Search a searching algorithm used in graphs.

TRUE
FALSE
Ans:
TRUE

22. The critical issue in parallel depth-first search algorithms is the distribution of the search space among
the processors.

TRUE
FALSE
Ans:
TRUE

23. Graph search involves a closed list, where the major operation is a _

sorting
searching
lookup
none of above
Ans:
lookup

24. Which of the following is not a stable sorting algorithm in its typical implementation.

Insertion Sort
Merge Sort
Quick Sort
Bubble Sort
Ans:
Quick Sort

25. Which of the following is not true about comparison based sorting algorithms?

The minimum possible time complexity of a comparison based sorting algorithm is O(nLogn) for a
random input array
Any comparison based sorting algorithm can be made stable by using position as a criteria when two
elements are compared
Counting Sort is not a comparison based sorting algortihm
Heap Sort is not a comparison based sorting algorithm.
Ans:
Heap Sort is not a comparison based sorting algorithm.

------------------------------------------------------------------------------------------------------- ------------------------
UNIT 6
----------------------------------------------------------------------------------------------------------------------------- --

1. A CUDA program is comprised of two primary components: a host and a _

GPU kernel
CPU kernel

OS
none of above
Ans:
GPU kernel

2. The kernel code is identified by the ________ qualifier with void return type

_host_
__global__
_device_
void
Ans:
__global__
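
A short sketch contrasting the three qualifiers (function names are ours):

    __global__ void kernel_fn(float *x) { x[threadIdx.x] *= 2.0f; } /* device code, host-callable */
    __device__ float device_fn(float v) { return 2.0f * v; }        /* device code, device-callable */
    __host__ void host_fn(void) { }                                 /* ordinary host function */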

3. The kernel code is only callable by the host

TRUE
FALSE
Ans:
TRUE

4. The kernel code is executable on the device and host

TRUE
FALSE
Ans:
FALSE

5. Calling a kernel is typically referred to as _

kernel thread
kernel initialization
kernel termination
kernel invocation
Ans:
kernel invocation

6. Host codes in a CUDA application can Initialize a device

TRUE
FALSE
Ans:
TRUE

7. Host codes in a CUDA application can Allocate GPU memory

TRUE
FALSE
Ans:
TRUE

8. Host codes in a CUDA application can not Invoke kernels

TRUE
FALSE
Ans:
FALSE

9. CUDA offers the Chevron Syntax to configure and execute a kernel.

TRUE
FALSE
Ans:
TRUE

10. the BlockPerGrid and ThreadPerBlock parameters are related to the __ model supported by CUDA.

host
kernel
thread abstraction
none of above
Ans:
thread abstraction
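
A common way these two launch parameters are computed (a sketch; N and 256 are illustrative, and the kernel is the add() example from earlier):

    int threadsPerBlock = 256;
    int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock; /* ceil(N/256) */
    add<<<blocksPerGrid, threadsPerBlock>>>(c, a, b, N);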



11. _ is Callable from the device only

_host_
__global__
_device_
none of above
Ans:
_device_

12. __ is Callable from the host

__global__
_device_
none of above
Ans:
__global__

13. CUDA supports __ in which code in a single thread is executed by all other threads.

thread division
thread termination
thread abstraction
none of above
Ans:
thread abstraction

14. In CUDA, a single invoked kernel is referred to as a _

block
tread
grid
none of above
Ans:
grid

15. A grid is comprised of __ of threads.

block
bunch
host
none of above
Ans:
block

16. Host codes in a CUDA application can not Reset a device

TRUE
FALSE
Ans:
FALSE

17. A block is comprised of multiple _

threads
bunch
host
none of above
Ans:
threads

18. A solution to the problem of representing the parallelism in an algorithm is

CUD
PTA
CDA
CUDA
Ans:
CUDA

19. Host codes in a CUDA application can Transfer data to and from the device

TRUE
FALSE
Ans:
TRUE

20. Host codes in a CUDA application can not Deallocate memory on the GPU

TRUE
FALSE
Ans:
FALSE

----------------------------------------------------------------------------------------------------------------------------- --

Question 1 : The need for parallel processor to increase speedup

1. Moore's Law
2. Minsky's conjecture
3. Flynn's Law
4. Amdahl's Law

ANSWER
Amdahl's Law

Question 2 : Which of the following interrupt is non maskable

1. INTR
2. RST 7.5
3. RST 6.5
4. TRAP

ANSWER
TRAP

Question 3 : Systems that desire HPC require

1. Adaptivity
2. Transparency
3. Dependency
4. Secretive

ANSWER
Transparency

Question 4 : When every cache hierarchy level is a subset of the level further away from the
processor, it is called

1. Synchronous
2. Atomic synschronous
3. Distrubutors

4. Multilevel inclusion

ANSWER
Multilevel inclusion

Question 5 : ______________ leads to concurrency

1. Serialization
2. cloud computing
3. Distribution
4. Parallelism

ANSWER
Parallelism

Question 6 : The problem where process concurrency becomes an issue is called as ___________

1. Reader-write problem
2. Bankers problem
3. Bakery problem
4. Philosophers problem

ANSWER
Reader-write problem

Question 7 : Interprocess communication takes place through

1. Centralized memory
2. Message passing
3. shared memory
4. cache memory

ANSWER
shared memory

Question 8 : Speedup can be as low as____

1. 1
2. 2
3. 0
4. 3

ANSWER
0

Question 9 : A type of parallelism that uses micro architectural techniques.

1. bit based
2. bit level
3. increasing
4. instructional

ANSWER
instructional

Question 10 : MPI_Comm_size

1. Returns number of processes


2. Returns number of line
3. Returns size of program
4. Returns value of instruction

ANSWER
Returns number of processes

Question 11 : High performance computing tasks of the computer system are done by

1. node clusters
2. network clusters
3. Beowulf clusters
4. compute nodes

ANSWER
compute nodes

Question 12 : MPI_Comm_rank

1. returns rank
2. returns processes
3. returns value
4. Returns value of instruction

ANSWER
returns rank
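
The MPI calls quizzed in this set fit together as in this minimal C sketch (file name ours, built with mpicc):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);                /* initialize MPI environment */
        int size, rank;
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* returns number of processes */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* returns this process's rank */
        printf("process %d of %d\n", rank, size);
        MPI_Finalize();                        /* stop MPI environment */
        return 0;
    }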

Question 13 : A processor performing fetch or decoding of different instruction during the
execution of another instruction is called ______ .

1. Super-scaling
2. Pipe-lining
3. Parallel Computation
4. distributed

ANSWER
Pipe-lining

Question 14 : Any condition that causes a processor to stall is called as _________

1. page fault
2. system error
3. Hazard
4. execuation error

ANSWER
Hazard

Question 15 : Characteristic of RISC (Reduced Instruction Set Computer) instruction set is

1. one word instruction


2. two word instruction
3. three word instruction
4. four word instruction

ANSWER
one word instruction

Question 16 : The disadvantage of using a parallel mode of communication is ______

1. Leads to erroneous data transfer


2. It is costly
3. Security of data
4. complexity of network

ANSWER
It is costly

Question 17 : A microprogram sequencer

1. generates the address of next micro instruction to be executed.


2. generates the control signals to execute a microinstruction.
3. sequentially averages all microinstructions in the control memory.
4. enables the efficient handling of a micro program subroutine.

ANSWER
generates the address of next micro instruction to be executed.

Question 18 : The___ time collectively spent by all the processing elements Tall = p TP

1. total
2. Average
3. mean
4. sum

ANSWER
total

Question 19 : In a distributed computing environment, distributed shared memory is used which


is_____________

1. Logical combination of virtual memories on the nodes


2. Logical combination of physical memories on the nodes
3. Logical combination of the secondary memories on all the nodes
4. Logical combinatin of files

ANSWER
Logical combination of virtual memories on the nodes

Question 20 : The average number of steps taken to execute the set of instructions can be made
to be less than one by following _______ .

1. Sequentional
2. super-scaling
3. pipe-lining
4. ISA

ANSWER
super-scaling

Question 21 : The main difference between the VLIW and the other approaches to improve
performance is ___________

1. increase in performance
2. Lack of complex hardware design
3. Cost effectiveness
4. latency

ANSWER
Lack of complex hardware design

Question 22 : CISC stands for

1. Complete Instruction Sequential Compilation


2. Complete Instruction Sequential Compiler
3. Complete Instruction Serial Compilation
4. Complex Instruction set computer

ANSWER
Complex Instruction set computer

Question 23 : Speedup, in theory, should be ______ bounded by p

1. lower
2. upper
3. left
4. right

ANSWER
upper

Question 24 : Virtualization that creates one single address space architecture is called

1. Loosely coupled
2. Space based
3. Tightly coupled
4. peer-to-peer

ANSWER
Space based

Question 25 : MPI_Init

1. Close MPI environment
2. Initialize MPI environment
3. start programing
4. Call processes

ANSWER
Initialize MPI environment

Question 26 : Content of the program counter is added to the address part of the instruction in
order to obtain the effective address is called

1. relative address mode


2. index addressing mode
3. register mode
4. implied mode

ANSWER
relative address mode

Question 27 : The straight-forward model used for the memory consistency, is called

1. Sequential consistency
2. Random consistency
3. Remote node
4. Host node

ANSWER
Sequential consistency

Question 28 : Which MIMD systems are best scalable with respect to the number of processors

1. Distributed memory
2. ccNUMA
3. nccNUMA
4. Symmetric multiprocessor

ANSWER
Distributed memory

Question 29 : Memory management on a multiprocessor must deal with all of found on

1. Uniprocessor Computer

2. Computer
3. Processor
4. System

ANSWER
Uniprocessor Computer

Question 30 : The___ time collectively spent by all the processing elements Tall = p TP

1. total
2. sum
3. average
4. product

ANSWER
total

Question 31 : Hazards are eliminated through renaming by renaming all

1. Source register
2. Memory
3. Data
4. Destination register

ANSWER
Destination register

Question 32 : The situation wherein the data of operands are not available is called ______

1. stock
2. Deadlock
3. data hazard
4. structural hazard

ANSWER
data hazard

Question 33 : Types of HPC applications include

1. Mass Media
2. Business
3. Management

4. Science

ANSWER
Science

Question 34 : A distributed operating system must provide a mechanism for

1. intraprocessor communication
2. intraprocess and intraprocessor communication
3. interprocess and interprocessor communication
4. interprocessor communication

ANSWER
interprocess and interprocessor communication

Question 35 : This is computation not performed by the serial version

1. Serial computation
2. Excess computation
3. serial computation
4. parallel computing

ANSWER
Excess computation

Question 36 : The important feature of the VLIW is ______

1. ILP
2. Performance
3. Cost effectiveness
4. delay

ANSWER
ILP

Question 37 : The tightly coupled set of threads executing and working on a single task is called

1. Multithreading
2. Parallel processing
3. Recurrence
4. Serial processing

ANSWER
Multithreading

Question 38 : Parallel Algorithm Models

1. Data parallel model


2. Bit model
3. Data model
4. network model

ANSWER
Data parallel model

Question 39 : MPI_Recv is used for

1. reverse message
2. receive message
3. forward message
4. Collect message

ANSWER
receive message

Question 40 : Status bit is also called

1. Binary bit
2. Flag bit
3. Signed bit
4. Unsigned bit

ANSWER
Flag bit

Question 41 : For interprocessor communication, the misses that arise are called

1. hit rate
2. coherence misses
3. comitt misses
4. parallel processing

ANSWER
coherence misses

Question 42 : The interconnection topologies are implemented using _________ as a node.

1. control unit
2. microprocessor
3. processing unit
4. microprocessor or processing unit

ANSWER
microprocessor or processing unit

Question 43 : _________ gives the theoretical speedup in latency of the execution of a task at
fixed execution time

1. Amdahl's
2. Moor's
3. metcalfe's
4. Gustafson's law

ANSWER
Gustafson's law

Question 44 : The number and size of tasks into which a problem is decomposed determines the

1. fine-grainularity
2. coarse-grainularity
3. sub Task
4. granularity

ANSWER
granularity

Question 45 : MPI_Finalize is used for

1. Stop mpi environment program


2. intitalise program
3. Include header files
4. program start

ANSWER
Stop mpi environment program

Question 46 : Private data is the data that is used by a

1. Single processor
2. Multi processor
3. Single tasking
4. Multi tasking

ANSWER
Single processor

Question 47 : The time lost due to the branch instruction is often referred to as ____________

1. Delay
2. Branch penalty
3. Latency
4. control hazard

ANSWER
Branch penalty

Question 48 : NUMA architecture uses _______in design

1. cache
2. shared memory
3. message passing
4. distributed memory

ANSWER
distributed memory

Question 49 : Divide and Conquer approach is known for

1. Sequential algorithm development


2. Parallel algorithm development
3. Task defined algorithm
4. Non defined Algorithm

ANSWER
Sequential algorithm development

Question 50 : The parallelism across branches requires which scheduling

1. Global scheduling
2. Local Scheduling
3. post scheduling
4. pre scheduling

ANSWER
Global scheduling

Question 51 : Parallel processing may occur

1. In the data stream


2. In instruction stream
3. In network
4. In transferring

ANSWER
In the data stream

Question 52 : Pipe-lining is a unique feature of _______.

1. CISC
2. RISC
3. ISA
4. IANA

ANSWER
RISC

Question 53 : In MPI programming MPI_char is the instruction for

1. Unsign Char
2. Sign character
3. Long Char
4. unsign long char

ANSWER
Sign Char

Question 54 : To increase the speed of memory access in pipelining, we make use of _______

1. Special memory locations


2. Special purpose registers

3. Cache
4. Buffer

ANSWER
Cache

Question 55 : If the value V(x) of the target operand is contained in the address field itself, the
addressing mode is

1. Immediate
2. Direct
3. Indirect
4. Implied

ANSWER
Immediate

Question 56 : In a multi-processor configuration two coprocessors are connected to host 8086


processor. The instruction sets of the two coprocessors

1. must be same
2. may overlap
3. must be disjoint
4. must be the same as that of host

ANSWER
-------------

Question 57 : A feature of a task-dependency graph that determines the average degree of


concurrency for a given granularity is its ___________ path

1. critical
2. easy
3. difficult
4. ambiguous

ANSWER
critical

Question 58 : MPI_Send is used for

1. collect message

2. transfer message
3. send message
4. receive message

ANSWER
send message
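
A matching MPI_Send/MPI_Recv pair, sketched in C (buffer name and tag are ours):

    int msg = 42;
    if (rank == 0)
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* send to rank 1 */
    else if (rank == 1)
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                        /* receive from rank 0 */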

Question 59 : What is usually regarded as the von Neumann Bottleneck

1. Instruction set
2. Arithmetic logical unit
3. Processor/memory interface
4. Control unit

ANSWER
Processor/memory interface

Question 60 : An interface between the user or an application program, and the system resources
is

1. Microprocessor
2. Microcontroller
3. Multimicroprocessor
4. operating system

ANSWER
operating system

Question 61 : The computer architecture aimed at reducing the time of execution of instructions
is ________.

1. CISC
2. RISC
3. SPARC
4. ISA

ANSWER
RISC

Question 62 : parallel computer is capable of

1. Decentralized computing

2. Parallel computing
3. Distributed computing
4. centralized computing

ANSWER
Parallel computing

Question 63 : Design of _______processor is complex

1. parallel
2. pipeline
3. serial
4. distributed

ANSWER
pipeline

Question 64 : The instructions which copy information from one location to another either in the
processor’s internal register set or in the external main memory are called

1. Data transfer instructions


2. Program control instructions
3. Input-output instructions
4. Logical instructions

ANSWER
Data transfer instructions

Question 65 : The pattern of___________ among tasks is captured by what is known as a task-
interaction graph

1. interaction
2. communication
3. optmization
4. flow

ANSWER
interaction

Question 66 : In vector processor a single instruction, can ask for ____________ data operations

1. multiple
2. single

3. two
4. four

ANSWER
multiple

Question 67 : The cost of a parallel processing is primarily determined by

1. switching complexity
2. circuit complexity
3. Time Complexity
4. space complexity

ANSWER
Time Complexity

Question 68 : Interaction overheads can be minimized by____

1. Maximize Data Locality


2. Maximize Volume of data exchange
3. Increase Bandwidth
4. Minimize social media contents

ANSWER
Maximize Data Locality

Question 69 : This is computation not performed by the serial version

1. Excess Computation
2. serial computation
3. Parallel Computing
4. cluster computation

ANSWER
Excess Computation

Question 70 : The cost of dynamic networks is often determined by the number of


____________ nodes in the network.

1. Packet
2. Ring
3. Static

4. Switching

ANSWER
Switching

Question 71 : The contention for the usage of a hardware device is called ______

1. data hazard
2. Stalk
3. Deadlock
4. structural hazard

ANSWER
structural hazard

Question 72 : Which Algorithm is better choice for pipelining

1. Small Algorithm
2. Hash Algorithm
3. Merge-Sort Algorithm
4. Quick-Sort Algorithm

ANSWER
Merge-Sort Algorithm

Question 73 : In MPI programming MPI_Reduce is the instruction for

1. Full operation
2. Limited operation
3. reduction operation
4. selected operation

ANSWER
reduction operation

Question 74 : The stalling of the processor due to the unavailability of the instructions is called
as ___________

1. Input hazard
2. data hazard
3. structural hazard
4. control hazard

ANSWER
control hazard

Question 75 : _____processors rely on compile time analysis to identify and bundle together
instructions that can be executed concurrently

1. VILW
2. LVIW
3. VLIW
4. VLWI

ANSWER
VLIW

Question 76 : type of parallelism that is naturally expressed by independent tasks in a task-


dependency graph is called _______ parallelism.

1. Task
2. Instruction
3. Data
4. Program

ANSWER
Task

Question 77 : NSM has launched its first supercomputer at

1. BHU
2. IITB
3. IITKG
4. IITM

ANSWER
BHU

Question 78 : Writing parallel programs is referred to as

1. Parallel computation
2. parallel development
3. parallel programming
4. Parallel processing

ANSWER
parallel programming

Question 79 : A processor performing fetch or decoding of different instruction during the


execution of another instruction is called ______ .

1. Super-scaling
2. Pipe-lining
3. Parallel computation
4. serial computation

ANSWER
Pipe-lining

Question 80 : Zero address instruction format is used for

1. RISC architecture
2. CISC architecture
3. Von-Neuman architecture
4. Stack-organized architecture

ANSWER
Stack-organized architecture

Question 81 : An interface between the user or an application program, and the system resources
are

1. microprocessor
2. microcontroller
3. multi-microprocessor
4. operating system

ANSWER
operating system

Question 82 : The main objective in building the multi-microprocessor is

1. greater throughput
2. enhanced fault tolerance
3. greater throughput and enhanced fault tolerance
4. zero throughput

ANSWER
greater throughput and enhanced fault tolerance

Question 83 : UMA architecture uses _______in design

1. cache
2. shared memory
3. message passing
4. distributed memory

ANSWER
shared memory

Question 84 : To which class of systems does the von Neumann computer belong

1. SIMD
2. MIMD
3. MISD
4. SISD

ANSWER
SISD

Question 85 : characteristic of CISC (Complex Instruction Set Computer)

1. Variable format instruction


2. Fixed format instructions
3. Instruction are executed by hardware
4. unsign long char

ANSWER
Variable format instruction

Question 86 : A _________ computation performs one multiply-add on a single pair of vector


elements

1. dot product
2. cross product
3. multiply
4. add

ANSWER

dot product

Question 87 : Data parallelism is parallelism inherent in

1. program loops
2. Serial program
3. parallel program
4. long programs

ANSWER
program loops

Question 88 : What is the execution time per stage of a pipeline that has 5 equal stages and a
mean overhead of 12 cycles

1. 2 cycles
2. 3 cycles
3. 5 cycles
4. 4 cycles

ANSWER
3 cycles

Question 89 : This algorithm is a called greedy because

1. the greedy algorithm never considers the same solution again


2. the greedy algorithm always give same solution again
3. the greedy algorithm never considers the optimal solution
4. the greedy algorithm never considers whole program

ANSWER
the greedy algorithm never considers the same solution again

Question 90 : If n is a power of two, we can perform this operation in ____ steps by propagating
partial sums up a logical binary tree of processors.

1. logn
2. nlogn
3. n
4. n^2

ANSWER

logn

Question 91 : A multiprocessor machine which is capable of executing multiple instructions on


multiple data sets

1. SISD
2. SIMD
3. MIMD
4. MISD

ANSWER
MIMD

Question 92 : Tree networks suffer from a communication bottleneck at higher levels of the tree.
The network that alleviates this by using fatter links at higher levels is called a _________ tree.

1. fat
2. binary
3. order static
4. heap tree

ANSWER
FAT

Question 93 : Multiple applications running independently are typically called

1. Multiprograming
2. multiithreading
3. Multitasking
4. Synchronization

ANSWER
Multiprograming

Question 94 : Each of the clock cycles from the previous section of execution becomes a

1. Previous stage
2. stall
3. previous cycle
4. pipe stage

ANSWER

pipe stage

Question 95 : The main objective in building the multimicroprocessor is

1. greater throughput
2. enhanced fault tolerance
3. greater throughput and enhanced fault tolerance
4. none of the mentioned

ANSWER
greater throughput and enhanced fault tolerance

Question 96 : Waiting until there are no data hazards, then

1. stall
2. write operand
3. Read operand
4. Branching

ANSWER
Read operand

Question 97 : In message passing, send and receive message between

1. Task or processes
2. Task and Execution
3. Processor and Instruction
4. Instruction and decode

ANSWER
Task or processes

Question 98 : We denote the serial runtime by TS and the parallel____by TP

1. runtime
2. clock time
3. processor time
4. clock frequency

ANSWER
runtime

Question 99 : A uniprocessor computing device is called __________.

1. Grid computing
2. Centralized computing
3. Parallel computing
4. Distributed computing

ANSWER
Centralized computing

Question 100 : The tightly coupled set of threads executing and working on a single task is called

1. Serial processing
2. parallel processing
3. Multithreading
4. Recurrent

ANSWER
Multithreading

Question 101 : what is WAR

1. Write before read


2. write after write
3. write after read
4. write with read

ANSWER
write after read

Question 102 : Partitioning refers to decomposing the computational activity into a

1. Small Task
2. Large Task
3. Full program
4. group of program

ANSWER
Small Task

Question 103 : Speed up is defined as a ratio of

1. s=Ts/Tp
2. S= Tp/Ts
3. Ts=S/Tp
4. Tp=S /Ts

ANSWER
s=Ts/Tp

Question 104 : A processor that continuously tries to acquire the locks, spinning around a loop
till it reaches its success, is known as

1. Spin locks
2. Store locks
3. Link locks
4. Store operational

ANSWER
Spin locks

Question 105 : Pipelining strategy is used to implement

1. Instruction execution
2. Instruction prefetch
3. Instruction manipulation
4. instruction decoding

ANSWER
Instruction prefetch

Question 106 : Parallel computing means to divide the job into several __________

1. Bit
2. Data
3. Instruction
4. Task

ANSWER
Task

Question 107 : if a piece of data is repeatedly used, the effective latency of this memory system
can be reduced by the __________.

1. RAM
2. ROM
3. Cache
4. HDD

ANSWER
Cache

Question 108 : Processing of multiple tasks simultaneously on multiple processors is called

1. Parallel processing
2. Distributed processing
3. Uni- processing
4. Multi-processing

ANSWER
Multi-processing

Question 109 : The buffer in the instruction execution sequence that holds the instruction result is known as

1. Data buffer
2. control buffer
3. reorder buffer
4. ordered buffer

ANSWER
reorder buffer

Question 110 : A multiprocessor operating system must take care of

1. authorized data access and data protection


2. unauthorized data access and data protection
3. authorized data access
4. data protection

ANSWER
-------------

Question 111 : The expression 'delayed load' is used in context of

1. prefetching
2. pipelining

3. processor-printer communication
4. memory-monitor communication

ANSWER
pipelining

Question 112 : _________ is a method for inducing concurrency in problems that can be solved
using the divide-and-conquer strategy.

1. exploratory decomposition
2. speculative decomposition
3. data-decomposition
4. Recursive decomposition

ANSWER
Recursive decomposition

Question 113 : If no node has a copy of a cache block, this state is known as

1. Uniform memory access


2. Cached
3. Un-cached
4. Commit

ANSWER
Un-cached

----------------------------------------------------------------------------------------------------------------------------- --

1: The computer system of a parallel computer is capable of

A. Decentralized computing
B. Parallel computing
C. Centralized computing
D. Decentralized computing
E. Distributed computing
F. All of these
G. None of these

Ans :
A
2: Writing parallel programs is referred to as

A. Parallel computation
B. Parallel processes

C. Parallel development
D. Parallel programming
E. Parallel computation
F. All of these
G. None of these
Ans :
D

3: Three-tier architecture simplifies an application's ____________.


A. Maintenance
B. Initiation
C.Implementation
D. Deployment
E. All of these
F. None of these
Ans :
D

4: A dynamic network of networks, whose connections grow dynamically, is called

A. Multithreading
B. Cyber cycle
C. Internet of things
D. Cyber-physical system
E. All of these
F. None of these
Ans :
C

5: In which application systems can distributed systems run well?

A. HPC
B. HTC
C. HRC
D. Both A and B
E. All of these
F. None of these
Ans :
D

6: HPC and HTC systems both desire

A. Adaptivity
B. Transparency
C. Dependency
D. Secretive
E. Adaptivity
F. All of these
G. None of these

Ans :
B

7: The architecture in which no special machines manage the network and resources are shared is known as

A. Peer-to-Peer
B. Space based
C. Tightly coupled
D. Loosely coupled
E. All of these
F. None of these

Ans :
A
8: Distributed systems have significant characteristics of

A. 5 types
B. 2 types
C. 3 types
D. 4 types
E. All of these
F. None of these

Ans :
C
9: Peer machines are built over

A. Many Server machines


B. 1 Server machine
C. 1 Client machine
D. Many Client machines
E. All of these
F. None of these

Ans :
D
10: HTC applications are of type

A. Business
B. Engineering
C. Science
D. Media mass
E. All of these
F. None of these

Ans :

A
11: The virtualization architecture that creates one single address space is called

A. Loosely coupled
B. Peer-to-Peer
C. Space-based
D. Tightly coupled
E. Loosely coupled
F. All of these
G. None of these

Ans :
C
12: In cloud computing, we have an Internet cloud of resources formed by

A. Centralized computing
B. Decentralized computing
C. Parallel computing
D. Both A and B
E. All of these
F. None of these

Ans :
E
13: Job throughput, data access, and storage are elements of __________.

A. Flexibility
B. Adaptation
C. Efficiency
D. Dependability
E. All of these
F. None of these

Ans :
C
14: The ability to support billions of job requests over massive data sets is known as

A. Efficiency
B. Dependability
C. Adaptation
D. Flexibility
E. All of these
F. None of these

Ans :
C
15: Cloud computing offers a broader concept than which of the following?

A. Parallel computing
B. Centralized computing
C. Utility computing
D. Decentralized computing
E. Parallel computing
F. All of these
G. None of these

Ans :
C
16: The transparency that allows resources and clients to move within a system is called

A.Mobility transparency
B. Concurrency transparency
C. Performance transparency
D. Replication transparency
E. All of these
F. None of these

Ans :
A
17: A distributed computer running a distributed program is known as

A. Distributed process
B. Distributed program
C. Distributed application
D. Distributed computing
E. All of these
F. None of these

Ans :
B
18: Computing on uniprocessor devices is called __________.

A. Grid computing
B. Centralized computing
C. Parallel computing
D. Distributed computing
E. All of these
F. None of these

Ans :
B
19: Utility computing focuses on a______________ model.

A. Data

B. Cloud
C. Scalable
D. Business
E. All of these
F. None of these

Ans :
D
20: A CPS merges ______ technologies.

A. 5C
B. 2C
C. 3C
D. 4C
E. All of these
F. None of these

Ans :
C
21: Abbreviation of HPC

A. High-peak computing
B. High-peripheral computing

C. High-performance computing

D. Highly-parallel computing

E. All of these

F. None of these

Ans :
C

22: Peer-to-Peer leads to the development of technologies like

A. Norming grids

B. Data grids

C. Computational grids

D. Both A and B

E. All of these
F. None of these

Ans :
D
23: HPC applications are of type

A. Management
B. Media mass
C. Business
D. Science
E. All of these
F.None of these

Ans :
D
24: Computer technology has gone through how many development generations?

A. 6
B. 3
C. 4
D. 5
E. All of these
F. None of these

Ans :
D

25: The utilization rate of resources in an execution model is known as its

A. Adaptation
B. Efficiency
C. Dependability
D. Flexibility
E. All of these
F. None of these

Ans :
B
26: Providing Quality of Service (QoS) assurance, even under failure conditions, is the responsibility of

A. Dependability
B. Adaptation
C. Flexibility
D. Efficiency
E. All of these
F. None of these

Ans :
A
27: Interprocessor communication takes place via

A. Centralized memory
B. Shared memory
C. Message passing
D. Both A and B
E. All of these
F. None of these

Ans :
D

28: Data centers and centralized computing cover many

A. Microcomputers
B. Minicomputers
C. Mainframe computers
D. Supercomputers
E. All of these
F. None of these

Ans :
D
29: Which of the following is a primary goal of the HTC paradigm?

A. High ratio Identification


B. Low-flux computing
C. High-flux computing
D. Computer utilities
E. All of these
F. None of these

Ans :
C
30: The high-throughput service provided is a measure taken by

A. Flexibility
B. Efficiency
C. Adaptation
D. Dependability
E. All of these
F. None of these

Ans :
C

----------------------------------------------------------------------------------------------------------------------------- -

A modem is very helpful to link up two computers with the help of?
(A). telephone line
(B). dedicated line
(C). All of these
(D). None of these
Ans : (C)
A whole micro-computer system consists of which of the following?
(A). microprocessor
(B). memory
(C). peripheral equipment
(D). all of these
(E). None of these
Ans : (D).
Which of the following is a micro-program written in 0s and 1s?
(A). binary micro-program
(B). binary microinstruction
(C). symbolic microinstruction
(D). Symbolic microinstruction
(E). None of these
Ans : A
A pipeline is similar to which of the following?
(A). a gas line
(B). house pipeline
(C). both a and b
(D). an automobile assembly line
(E). None of these
Ans : D

A processor performing fetching or decoding of instructions during the execution of another
instruction is commonly known as?
(A). Super-scaling
(B). Parallel Computation
(C). Pipe-lining
(D). None of these
Ans : C
An optimizing compiler performs which of the following?
(A). Better compilation of the given code.
(B). better memory management.
(C). Takes the benefit of processor type and decreases its process time.
(D). Both a and c
(E). None of these
Ans : C
Which of the following wires is a collection of lines that connects several devices?
(A). internal wires
(B). peripheral connection wires
(C). Both a and b
(D). bus
(E). None of these
Ans : (D).
Which of the following is an instruction to give a small delay in the program?
(A). NOP
(B). LDA
(C). BEA
(D). None of these
Ans : A
How to define a peripheral?
(A). any physical device connected to the computer
(B). tape drive connected to a computer
(C). any drives installed in the computer

(D). None of these
Ans : A
----------------------------------------------------------------------------------------------------------------------------- --

UNIT ONE SUB : 410241 HPC

Sr.No.  Questions  a  b  c  d  Answer
1 A pipeline is like .................... an automobile
assembly line
house pipeline both a and b a gas line
a
2  Data hazards occur when .....................  (a) Greater performance loss  (b) Pipeline changes the order of read/write access to operands  (c) Some functional unit is not fully pipelined  (d) Machine size is limited  Answer: b
3 Systems that do not have parallel
processing capabilities are
SISD SIMD MIMD All of the above
a

4 How does the number of transistors


per chip increase according to Moore
Quadratically Linearly Cubicly Exponentially
d
´s law?

5 Parallel processing may occur in the


instruction
B. in the data
stream
both[A] and [B] none of the
above
c
stream
6 Execution of several activities at the
same time.
processing parallel processing serial processing multitasking
b

7  Cache memory works on the principle of  (a) Locality of data  (b) Locality of memory  (c) Locality of reference  (d) Locality of reference & memory  Answer: c
8  SIMD represents an organization that ______________.  (a) refers to a computer system capable of processing several programs at the same time  (b) represents organization of single computer containing a control unit, processor unit and a memory unit  (c) includes many processing units under the supervision of a common control unit  (d) none of the above  Answer: c
9 A processor performing fetch or
decoding of different instruction
Super-scaling Pipe-lining Parallel Computation None of these
b
during the execution of another
instruction is called ______ .
10 General MIMD configuration usually
called
a
multiprocessor
a vector processor array processor none of the
above.
a
11 A Von Neumann computer uses which SISD
one of the following?
SIMD MISD MIMD.
a
12 MIMD stands for Multiple
instruction
Multiple Memory instruction
instruction memory multiple data
Multiple
information
a
multiple data data memory data
13 MIPS stands for: Memory
Instruction Per
Major Instruction
Per Second
Main Information
Per Second
Million
Instruction Per
d
Second Second
14 M.J. Flynn's parallel processing
classification is based on:
Multiple
Instructions
Multiple data Both (a) and (b) None of the
above
c

15 VLIW stands for: Vector Large


Instruction
Very Long
Instruction Word
Very Large Integrated Very Low
Word Integrated Word
b
Word
16 The major disadvantage of pipeline is: High cost
individual
Initial setup time If branch instruction All of the above
is encountered the
c
dedicated pipe has to be flushed

17 A topology that involves Tokens. Star Ring Bus Daisy Chaining


b
18 multipoint topology is bus star mesh ring
a
19 In super-scalar mode, all the similar TRUE
instructions are grouped and executed
False
a
together.

20 Which mechanism performs an


analysis on the code to determine
Directory
protocol
Snoopy protocol Server based cache
coherence
Compiler based
cache coherence
d
which data items may become unsafe
for caching, and they mark those
items accordingly?
21 25 10 32 20
How many processors can be
organized in 5-dimensional binary
c
hypercube system?
22 Multiprocessors are classified as
________.
SIMD MIMD SISD MISD
b
23 Which of the following is not one of
the interconnection structures?
Crossbar switch Hypercube system Single port memory Time-shared
common bus
c

24 Which combinational device is used


in crossbar switch for selecting proper
Multiplexer Decoder Encoder Demultiplexer
a
memory from multiple addresses?
25 How many switch points are there in 50 63 60 54
crossbar switch network that connects
d
9 processors to 6 memory modules?

26 1 11 100 111
In a three-cube structure, node 101
cannot communicate directly with
b
node?
27 Which method is used as an
alternative way of snooping-based
Directory
protocol
Memory protocol Compiler based
protocol
None of above
a
coherence protocol?
28 snoopy cache protocol are used in
-----------------based system
bus mesh star hypercube
a

29 superscalar architecture contains


-------------execution units for
multiple single none of the above
a
instruction execution
30 time taken by header of a message
between two directly connected nodes
startup time per hop time per word transfer
time
packaging time
b
is called as-----------------

31 the number of switch requirement for n


a network with n input and n output
n2 n3 n4
b
is ------------------

32 which of the following is not static


network
bus ring mesh crossbar switch
d
33 In super-scalar processors, ________
mode of execution is used.
In-order Post order Out of order None of the
mentioned
c

34 ______ have been developed


specifically for pipelined systems.
Utility software Speed up utilities Optimizing
compilers
None of the
above
c
35 Which of the following is a Multicore
combination of several processors on architecture
RISC architecture CISC architecture Subword
parallelism
a
a single chip?
36 The important feature of the VLIW
is .....
ILP Cost effectiveness performance None of the
mentioned
a

37 The parallel execution of operations


in VLIW is done according to the
sk scheduler Interpreter Compiler Encoder
c
schedule determined by .....

38 The VLIW processors are much


simpler as they do not require of .....
Computational
register
Complex logic
circuits
SSD slots Scheduling
hardware
d

39 The VLIW architecture follows .....


approach to achieve parallelism.
MISD SISD SIMD MIMD
d

40 Which of the following is not a


Pipeline Conflicts?
Timing
Variations
Branching Load Balancing Data
Dependency
c
UNIT TWO SUB : 410241 HPC

Sr.No.  Questions  a  b  c  d  Answer

1  Task dependency graph is ------------------  (a) directed  (b) undirected  (c) directed acyclic  (d) undirected acyclic  Answer: c
2 In task dependency graph longest
directed path between any pair of start
total work critical path task path task length
b
and finish node is called as --------------

3 which of the following is not a


granularity type
course grain large grain medium grain fine grain
b
4  Which of the following is an example of data decomposition?  (a) matrix multiplication  (b) merge sort  (c) quick sort  (d) 15 puzzle  Answer: a

5 which problems can be handled by


recursive decomposition
backtracking greedy method divide and conquer branch and
problem bound
c

6 In this decomposition problem data


decomposition goes hand in hand with its decomposition
recursive
decomposition
explorative
decomposition
speculative
decomposition
c
execution
7 which of the following is not an example n queens
of explorative decomposition problem
15 puzzal problem tic tac toe quick sort
d

8 Topological sort can be applied to which


of the following graphs?
a) Undirected
Cyclic Graphs
b) Directed Cyclic
Graphs
c) Undirected Acyclic d) Directed
Graphs Acyclic Graphs
d
9 In most of the cases, topological sort starts a) Maximum
from a node which has __________ Degree
b) Minimum
Degree
c) Any degree d) Zero Degree
d

10 Which of the following is not an


application of topological sorting?
a) Finding b) Finding
prerequisite of a Deadlock in an
c) Finding Cycle in a d) Ordered
graph Statistics
d
task Operating System

11 In ------------task are defined before


starting the execution of the algorithm
dynamic task static task regular task one way task
b

12 which of the following is not the array


distribution method of data partitioning
block cyclic block cyclic chunk
d

13 blocking optimization is used to improve hit miss


temmporal locality for reduce
misses hit rate cache misses
b

14 CUDA thought that 'unifying theme' of


every form of parallelism is
CDA thread PTA thread CUDA thread CUD thread
c

15 Topological sort of a Directed Acyclic


graph is?
a) Always
unique
b) Always Not
unique
c) Sometimes unique d) Always unique
and sometimes not if graph has even
c
unique number of
vertices
16 threads being block altogether and being thread block
executed in the sets of 32 threads called a
32 thread 32 block unit block
a

17 True or False: The threads in a thread


block are distributed across SM units so
TRUE FALSE
a
that each thread is executed by one SM
unit.
18 When the topological sort of a graph is
unique?
a) When there
exists a
b) In the presence c) In the presence of
of multiple nodes single node with
d) In the
presence of
a
hamiltonian with indegree 0 indegree 0 single node with
path in the graph outdegree 0

19 What is a high performance multi-core


processor that can be used to accelerate a
CPU DSP GPU CLU
c
wide variety of applications using
parallel computing.
20 A good mapping does not depends on
which following factor
knowledge of
task sizes
the size of data
associated with
characteristics of
inter-task
task overhead
d
tasks interactions
21 CUDA is a parallel computing platform
and programming model
TRUE FALSE
a

22 Which of the following is not a form of


parallelism supported by CUDA
Vector
parallelism -
Thread level task
parallelism -
Block and grid level
parallelism -
Data parallelism
- Different
a
Floating point Different threads Different blocks or threads and
computations execute a different grids execute blocks process
are executed in tasks different tasks different parts of
parallel on wide data in memory
vector units

23 The style of parallelism supported on


GPUs is best described as
MISD - Multiple
Instruction
SIMT - Single
Instruction
SISD - Single
Instruction Single
MIMD
b
Single Data Multiple Thread Data

24 True or false: Functions annotated with


the __global__ qualifier may be executed
TRUE FALSE
a
on the host or the device
25 Which of the following correctly
describes a GPU kernel
A kernel may
contain a mix of
All thread blocks
involved in the
A kernel is part of
the GPU's internal
kernel may
contain only host
b
host and GPU same computation micro-operating code
code use the same system, allowing it
kernel to act as in
independent host
26 a code known as grid which runs on GPU 32 thread
consisting of a set of
unit block 32 block thread block
d

27 which of the following is not an parallel


algorithm model
data parallel
model
task graph model task model work pool model
c

28 Having load before the store in a running WAW hazards


program order, then interchanging this
Destination
registers
WAR hazards Registers
c
order, results in a
29 model based on the passing of stream of
data through process arranged in a
producer
consumer model
hybrid model task graph model work pool model
a
succession is called as

30 When instruction i and instruction j are


tends to write the same register or the
Input
dependence
Output
dependence
Ideal pipeline Digital call
b
memory location, it is called

31 Multithreading allowing
multiple-threads for sharing the
Multiple
processor
Single processor Dual core Corei5
b
functional units of a
32 Allowing multiple instructions for issuing Single-issue
in a clock cycle, is the goal of processors
Dual-issue
processors
Multiple-issue
processors
No-issue
processors
c
33 OpenGL stands for: A. Open General B. Open Graphics
Liability Library
C. Open Guide Line D. Open Graphics
Layer
b

34 which of the following is not an


advantage of OpenGL
There is more
detailed
OpenGL is
portable.
OpenGL is more It is not a
functional than any cross-platform
d
documentation other API. API,
for OpenGL
while other API's
don't have such
detailed
documentation.

35 work pool model uses ----------------


approach for task assignment
static dynamic centralized decentralized
b

36 which of the following is false regarding


data parallel model
all task perform degree of
same parallelism
matrix
multiplication is
dynamic
mapping is done
d
computations increase with size
example of data
of problem parallel
which of the following are methods for
containing interaction overheads
maximizing data minimize volumn min frequency
locality of data exchange interactions
of all the above
d

38 which of the following are classes of


dynamic mapping centralized method
self scheduling chunk scheduling both a and b none of the
above
c

39 which of the following is not scheme for


static mapping
block
distribution
block cyclic
distributions
cyclic distributions self scheduling
d
UNIT THREE SUB : 410241 HPC
Sr. No. Questions a b c d Answer


1 Group communication operations are built


using which primitives?
one to all all to all point to point None of these
c
2 ___ can be performed in an identical fashion Recursive
by inverting the process. Doubling
Reduction Broadcast None of these
b

3 Broadcast and reduction operations on a


mesh is performed
along the rows along the
columns
both a and b
concurrently
None of these
c

4 Cost Analysis on a ring is (ts + twm)(p - 1) (ts - twm)(p + 1) (tw + tsm)(p - 1) (tw - tsm)(p +
1)
a
5 Cost Analysis on a mesh is 2ts(sqrt(p) + 1) +
twm(p - 1)
2tw(sqrt(p) + 1) + 2tw(sqrt(p) - 1) +
tsm(p - 1) tsm(p - 1)
2ts(sqrt(p) - 1)
+ twm(p - 1)
d

6 Communication between two directly link


nodes
Cut-through
routing
Store-and-forwar Nearest
d routing neighbour
None
c
communication

7 All-to-one communication (reduction) is the


dual of ______ broadcast.
all-to-all one-to-all one-to-one all-to-one
b

8 Which is known as Reduction? all-to-one all-to-all one-to-one one-to-all


a
9 Which is known as Broadcast? one-to-one one-to-all all-to-all all-to-one
b
10 The dual of all-to-all broadcast is all-to-all
reduction
all-to-one
reduction
Both None
a
11 All-to-all broadcast algorithm for the 2D
mesh is based on the
Linear Array
Algorithm
Ring algorithm Both None
b

12 In the first phase of 2D Mesh All to All, the


message size is ___
p m*sqrt(p) m p*sqrt(m)
c

13 In the second phase of 2D Mesh All to All, the m


message size is ___
p*sqrt(m) p m*sqrt(p)
d

14 In All to All on Hypercube, The size of the


message to be transmitted at the next step is
doubled tripled halfed no change
a
____ by concatenating the received message
with their current data

15 The all-to-all broadcast on Hypercube needs p


____ steps
sqrt(p) - 1 log p None
c

16 One-to-All Personalized Communication


operation is commonly called ___
gather operation concatenation scatter operation None
c

17 The dual of the scatter operation is the concatenation gather operation Both None
c
18 In Scatter Operation on Hypercube, on each
step, the size of the messages communicated
tripled halved doubled no change
b
is ____
19 Which is also called "Total Exchange" ? All-to-all
broadcast
All-to-all
personalized
all-to-one
reduction
None
b
communication
20 All-to-all personalized communication can
be used in ____
Fourier transform matrix transpose sample sort all of the
above
d

21 In collective communication operations,


collective means
involve group of
processors
involve group of
algorithms
involve group of none of these
variables
a

22 efficiency of data parallel algorithm depends efficient


on the implementation
efficient
implementation
both none
b
of the algorithm of the operation

23 All processes participate in a single ______


interaction operation.
global local wide variable
a

24 subsets of processes in ______ interaction. global local wide variable


b

25 Goal of good algorithm is to implement


commonly used _____ pattern.
communication interaction parallel regular
a

26 Reduction can be used to find the sum,


product, maximum, minimum of _____ of
tuple list sets all of above
c
numbers.
27 source ____ is bottleneck. process algorithm list tuple
a
28 only connections between single pairs of
nodes are used at a time is
good utilization poor utilization massive
utilization
medium
utilization
b

29 all processes that have the data can send it


again is
recursive
doubling
naive approach reduction all
a
30 The ____ do not snoop the messages going
through them.
nodes variables tuple list
a

31 accumulate results and send with the same


pattern is...
broadcast naive approach recursive
doubling
reduction
symmetric
d

32 every node on the linear array has the data


and broadcast on the columns with the linear
parallel vertical horizontal all
a
array algorithm in _____

33 using different links every time and


forwarding in parallel again is
better for
congestion
better for
reduction
better for
communication
better for
algorithm
a

34 In a balanced binary tree processing nodes is leaves


equal to
number of
elemnts
branch none
a
35 In one -to- all broadcast there is divide and
conquer type
sorting type
algorithm
searching type
algorithm
simple
algorithm
a
algorithm
36 For sake of simplicity, the number of nodes is 1 2 3 4
a power of
b

37 Nodes with zero in i least significant bits


participate in _______
algorithm broadcast communication searching
c
38 every node has to know when to
communicate that is
call the procedure call for broadcast call for
communication
call the
congestion
a

39 the procedure is disturbed and require only


point-to-point _______
synchronization communication both none
a

40 Renaming relative to the source is _____ the


source.
XOR XNOR AND NAND
a
UNIT FOUR SUB : 410241 HPC

Sr. No. Questions a b c d Answer


1 mathematically efficiency is e=s/p e=p/s e*s=p/2 e=p+e/e


a
2 Cost of a parallel system is sometimes referred to____ of
product
work processor time both none
c
3 Scaling Characteristics of Parallel Programs Ts is increase constant decreases none
b
4 Speedup tends to saturate and efficiency _____ as a
consequence of Amdahl’s law.
increase constant decreases none
c
5 Speedup obtained when the problem size is _______ linearly increase
with the number of processing elements.
constant decreases depend on
problem
a
size
6 The n × n matrix is partitioned among n processors, with
each processor storing complete ___ of the matrix.
row column both depend on
processor
a

7 1
cost-optimal parallel systems have an efficiency of ___ n logn complex
a
8 The n × n matrix is partitioned among n2 processors such
that each processor owns a _____ element.
n 2n single double
c

9 1 2 3 4
how many basic communication operations are used in
matrix vector multiplication
c
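
As a hedged illustration of how these communication operations show up in code (the program, the MPI usage, the all-ones test data, and the assumption that n is divisible by p are mine, not the question bank's), here is a sketch of rowwise 1-D partitioned matrix-vector multiplication; the collective used to assemble the full vector on every process is an all-to-all broadcast (MPI_Allgather):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int p, rank;
        MPI_Comm_size(MPI_COMM_WORLD, &p);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int n = 4 * p, rows = n / p;                     /* n divisible by p by construction */
        double *A  = malloc(rows * n * sizeof(double));  /* this process's block of rows */
        double *xl = malloc(rows * sizeof(double));      /* this process's block of x    */
        double *x  = malloc(n * sizeof(double));         /* full x after the gather      */
        double *y  = malloc(rows * sizeof(double));

        for (int i = 0; i < rows; i++) {                 /* all-ones data: y[i] should be n */
            xl[i] = 1.0;
            for (int j = 0; j < n; j++)
                A[i * n + j] = 1.0;
        }

        /* the all-to-all broadcast step: every process obtains the whole vector x */
        MPI_Allgather(xl, rows, MPI_DOUBLE, x, rows, MPI_DOUBLE, MPI_COMM_WORLD);

        for (int i = 0; i < rows; i++) {                 /* purely local dot products */
            y[i] = 0.0;
            for (int j = 0; j < n; j++)
                y[i] += A[i * n + j] * x[j];
        }
        if (rank == 0) printf("y[0] = %g (expected %d)\n", y[0], n);

        free(A); free(xl); free(x); free(y);
        MPI_Finalize();
        return 0;
    }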

10 In DNS algorithm of matrix multiplication it used 1d partition 2d partition 3d partition both a,b
c
11 In the Pipelined Execution, steps contain normalization communicatio elimination
n
all
d
12 3/2 2/3
the cost of the parallel algorithm is higher than the
sequential run time by a factor of __
3*2 2/3+3/2
a
13 The load imbalance problem in Parallel Gaussian
Elimination: can be alleviated by using a ____ mapping
acyclic cyclic both none
b

14 A parallel algorithm is evaluated by its runtime in function the input size, the number of the
of processors, communicatio
all
d
n parameters.

15 For a problem consisting of W units of work, p__W


processors can be used optimally.
<= >= < >
a

16 C(W)__Θ(W) for optimality (necessary condition). > < <= equals


d
17 many interactions in oractical parallel programs occur in
_____ pattern
well defined zig-zac reverse straight
a
18 efficient implementation of basic communication operation performance
can improve
communicatio algorithm
n
all
a
19 efficient use of basic communication operations can reduce development
effort and
software
quality
both none
a
20 Group communication operations are built using_____
Messenging primitives.
point-to-point one-to-all all-to-one none
a
21 one processor has a piece of data and it need to send to
everyone is
one -to-all all-to-one point -to-point all of above
a
22 the dual of one -to-all is all-to-one
reduction
one -to-all
reduction
pnoint
-to-point
none
a
reducntion
23 Data items must be combined piece-wise and the result
made available at
target
processor
target
variable
a
finally finatlalyrget
receiver
24 wimpleat way to send p-1 messages from source to the other Algorithm
p-1 processors
communicatio concurrency
n
receiver
c

25 1 2 8 0
In a eight node ring, node ____ is source of broadcast
d
26 The processors compute ______ product of the vector
element and the loval matrix
local global both none
a
27 one to all broadcast use recursive
doubling
simple
algorithm
both none
a

28 In a broadcast and reduction on a balanced binary tree


reduction is done in ______
recursive
order
straight order vertical order parallel
order
a
29 1 2 8 0
if "X" is the message to broadcast it initially resides at the
source node
d
30 logical operators used in algorithm are XOR AND both none
c
31 Generalization of broadcast in Which each processor is Source as well only source
as destination
only
destination
none
a
32 The algorithm terminates in _____ steps p p+1 p+2 p-1
d
33 Each node first sends to one of its neighbours the data it
need to....
broadcast identify verify none
a
34 The second communication phase is a columnwise ______
broadcast of consolidated
All-to-all one -to-all all-to-one
point-to-poi
a
nt
35 All nodes collects _____ message corresponding to √p nodes
to their respectively
√p p p+1 p-1
a
36 It is not possible to port ____ for higher dimensional
network
Algorithm hypercube both none
a
37 If we port algorithm to higher dimemsional network it
would cause
error contention recursion none
b
38 In the scatter operation ____ node send message to every
other node
single double triple none
a
39 The gather Operation is exactly the inverse of _____ scatter
operation
recursion
operation
execution none
a
40 Similar communication pattern to all-to-all broadcast
except in the_____
reverse order parallel order straight order vertical
order
a
UNIT FIVE SUB : 410241 HPC

Sr. No. Questions a b c d Answer


1 In ___________, the number of elements to be sorted internal sorting internal


is small enough to fit into the process's main searching
external sorting external
searching
a
memory.
2 ______________ algorithms use auxiliary storage
(such as tapes and hard disks) for sorting because
internal sorting internal
searching
External
sorting
external
searching
c
the number of elements to be sorted is too large to
fit into memory.
3 ______ can be comparison-based or
noncomparison-based.
searching Sorting both a and b none of above
b
4 The fundamental operation of comparison-based
sorting is ________.
compare-excha searching
nge
Sorting swapping
a

5 The complexity of bubble sort is Θ(n2). TRUE FALSE


a
6 Bubble sort is difficult to parallelize since the
algorithm has no concurrency.
TRUE FALSE
a
7 Quicksort is one of the most common sorting
algorithms for sequential computers because of its
TRUE FALSE
a
simplicity, low overhead, and optimal average
complexity.
8 The performance of quicksort depends critically
on the quality of the ______-.
non-pivote pivot center element len of array
b
9 the complexity of quicksort is O(nlog n). TRUE FALSE
a
10 DFS begins by expanding the initial node and TRUE FALSE
generating its successors. In each subsequent step,
DFS expands one of the most recently generated
nodes.

11 The main advantage of ______ is that its storage


requirement is linear in the depth of the state
BFS DFS a and b none of above
b
space being searched.
12 _____ algorithms use a heuristic to guide search. BFS DFS a and b none of above
a
13 If the heuristic is admissible, the BFS finds the
optimal solution.
TRUE FALSE
a
14 The search overhead factor of the parallel system is TRUE
defined as the ratio of the work done by the
FALSE
a
parallel formulation to that done by the sequential
formulation
15 The critical issue in parallel depth-first search
algorithms is the distribution of the search space
TRUE FALSE
a
among the processors.

16 Graph search involves a closed list, where the


major operation is a _______
sorting searching lookup none of above
c

17 ______________ algorithms use auxiliary storage


(such as tapes and hard disks) for sorting because
internal sorting internal
searching
External
sorting
external
searching
c
the number of elements to be sorted is too large to
fit into memory.
18 ______ can be comparison-based or
noncomparison-based.
searching Sorting both a and b none of above
b
19 If the heuristic is admissible, the BFS finds the
optimal solution.
TRUE FALSE
a
20 The search overhead factor of the parallel system is TRUE
defined as the ratio of the work done by the
FALSE
a
parallel formulation to that done by the sequential
formulation

21 Breadth First Search is equivalent to which of the


traversal in the Binary Trees?
Pre-order
Traversal
Post-order
Traversal
Level-order
Traversal
In-order
Traversal
c

22 Time Complexity of Breadth First Search is? (V –


number of vertices, E – number of edges)
O(V + E) O(V) O(E) O(V*E)
a
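
A minimal C sketch (illustrative, not from the original; the example graph is hypothetical) showing why BFS runs in O(V + E): each vertex is enqueued and dequeued exactly once, and each adjacency list is scanned exactly once.

    #include <stdio.h>

    #define V 5
    /* hypothetical undirected graph: adj[u] lists the neighbours of u */
    int adj[V][V] = {{1, 2}, {0, 3}, {0, 3}, {1, 2, 4}, {3}};
    int deg[V]    = {2, 2, 2, 3, 1};

    void bfs(int src) {
        int visited[V] = {0}, queue[V], head = 0, tail = 0;
        visited[src] = 1;
        queue[tail++] = src;                     /* enqueue the source */
        while (head < tail) {                    /* each vertex dequeued once: O(V) */
            int u = queue[head++];
            printf("%d ", u);                    /* level-order visit */
            for (int k = 0; k < deg[u]; k++) {   /* each edge scanned once: O(E) */
                int v = adj[u][k];
                if (!visited[v]) { visited[v] = 1; queue[tail++] = v; }
            }
        }
    }

    int main(void) { bfs(0); return 0; }         /* prints: 0 1 2 3 4 */
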
23 Which of the following is not an application of
Breadth First Search?
When the graph When the
is a Binary Tree graph is a
When the graph When the graph
is a n-ary Tree is a Ternary
b
Linked List Tree
24 In BFS, how many times a node is visited? Once Twice Equivalent to
number of
Thrice
c
indegree of the
node

25 Is Best First Search a searching algorithm used in


graphs.
TRUE FALSE
a
26 The critical issue in parallel depth-first search
algorithms is the distribution of the search space
TRUE FALSE
a
among the processors.
27 Graph search involves a closed list, where the
major operation is a _______
sorting searching lookup none of above
c
28 The fundamental operation of comparison-based
sorting is ________.
compare-excha searching
nge
Sorting swapping
a

29 The complexity of bubble sort is Θ(n2). TRUE FALSE


a
30 DFS begins by expanding the initial node and TRUE FALSE
generating its successors. In each subsequent step,
DFS expands one of the most recently generated
nodes.

31 The main advantage of ______ is that its storage


requirement is linear in the depth of the state
BFS DFS a and b none of above
b
space being searched.
32 Breadth First Search is equivalent to which of the
traversal in the Binary Trees?
Pre-order
Traversal
Post-order
Traversal
Level-order
Traversal
In-order
Traversal
c

33 Time Complexity of Breadth First Search is? (V –


number of vertices, E – number of edges)
O(V + E) O(V) O(E) O(V*E)
a
34 Which of the following is not an application of
Breadth First Search?
When the graph When the
is a Binary Tree graph is a
When the graph When the graph
is a n-ary Tree is a Ternary
b
Linked List Tree

35 In BFS, how many times a node is visited? Once Twice Equivalent to


number of
Thrice
c
indegree of the
node

36 Is Best First Search a searching algorithm used in


graphs.
TRUE FALSE
a
37 Which of the following is not a stable sorting
algorithm in its typical implementation.
Insertion Sort Merge Sort Quick Sort Bubble Sort
c
38 Which of the following is not true about
comparison based sorting algorithms?
The minimum
possible time
Any
comparison
Counting Sort is
not a
Heap Sort is not
a comparison
d
complexity of a based sorting comparison based sorting
comparison algorithm can based sorting algorithm.
based sorting be made stable algortihm
algorithm is by using
O(nLogn) for a position as a
random input criteria when
array two elements
39 In ___________, the number of elements to be sorted internal sorting
is small enough to fit into the process's main
internal
searching
external sorting external
searching
a
memory.
UNIT SIX SUB : 410241 HPC

Sr.No.  Questions  a  b  c  d  Answer

1 A CUDA program is comprised of two primary


components: a host and a _____.
GPU kernel CPU kernel OS none of above
a
2 The kernel code is dentified by the ________qualifier
with void return type
_host_ __global__ _device_ void
b
3 The kernel code is only callable by the host TRUE FALSE
a
4 The kernel code is executable on the device and host TRUE FALSE
b
5 Calling a kernel is typically referred to as _________. kernel
thread
kernel kernel
initialization termination
kernel
invocation
d
6 Host codes in a CUDA application can Initialize a device TRUE FALSE
a
7 Host codes in a CUDA application can Allocate GPU
memory
TRUE FALSE
a
8 A CUDA program is comprised of two primary
components: a host and a _____.
GPU kernel CPU kernel OS none of above
a

9 A CUDA program is comprised of two primary


components: a host and a _____.
GPU kernel CPU kernel OS none of above
a

10 The kernel code is dentified by the ________qualifier


with void return type
_host_ __global__ _device_ void
b
11 Host codes in a CUDA application can not Invoke kernels TRUE FALSE
b
12 CUDA offers the Chevron Syntax to configure and execute
a kernel.
TRUE FALSE
a
13 the BlockPerGrid and ThreadPerBlock parameters are
related to the ________ model supported by CUDA.
host kernel thread abstract none of above
ion
c
14 _________ is Callable from the device only _host_ __global__ _device_ none of above
c
15 ______ is Callable from the host _host_ __global__ _device_ none of above
b
16 ______ is Callable from the host _host_ __global__ _device_ none of above
a
17 CUDA supports ____________ in which code in a single
thread is executed by all other threads.
tread
division
tread
termination
thread
abstraction
none of above
c

18 In CUDA, a single invoked kernel is referred to as a _____. block tread grid none of above
c
19 A grid is comprised of ________ of threads. block bunch host none of above
a
20 A block is comprised of multiple _______. treads bunch host none of above
a
21 a solution of the problem in representing the
parallelismin algorithm is
CUD PTA CDA CUDA
d
22 ______ is Callable from the host _host_ __global__ _device_ none of above
b
23 ______ is Callable from the host _host_ __global__ _device_ none of above
a
24 A CUDA program is comprised of two primary
components: a host and a _____.
GPU kernel CPU kernel OS none of above
a
25 The kernel code is dentified by the ________qualifier
with void return type
_host_ __global__ _device_ void
b
26 Host codes in a CUDA application can not Reset a device TRUE FALSE
b
27 Host codes in a CUDA application can not Invoke kernels TRUE FALSE
b
28 A CUDA program is comprised of two primary
components: a host and a _____.
GPU kernel CPU kernel OS none of above
a
29 Calling a kernel is typically referred to as _________. kernel
thread
kernel kernel
initialization termination
kernel
invocation
d
30 In CUDA, a single invoked kernel is referred to as a _____. block tread grid none of above
c
31 A grid is comprised of ________ of threads. block bunch host none of above
a
32 A block is comprised of multiple _______. treads bunch host none of above
a
33 A CUDA program is comprised of two primary
components: a host and a _____.
GPU kernel CPU kernel OS none of above
a
34 ______ is Callable from the host _host_ __global__ _device_ none of above
a
35 In CUDA, a single invoked kernel is referred to as a _____. block tread grid none of above
c
36 the BlockPerGrid and ThreadPerBlock parameters are
related to the ________ model supported by CUDA.
host kernel thread abstract none of above
ion
c
37 Host codes in a CUDA application can Transfer data to and TRUE
from the device
FALSE
a
38 Host codes in a CUDA application can not Deallocate
memory on the GPU
TRUE FALSE
b
39 Host codes in a CUDA application can not Reset a device TRUE FALSE
b
40 Calling a kernel is typically referred to as _________. kernel
thread
kernel kernel
initialization termination
kernel
invocation
d
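
The following minimal CUDA C program (an illustrative sketch, not taken from the question bank) ties the pieces above together: the __global__ qualifier marking device code that is callable from the host, a host main() that performs the kernel invocation, and the chevron syntax whose two parameters are BlocksPerGrid and ThreadsPerBlock.

    #include <stdio.h>

    __global__ void kernel(void) { }      /* runs on the device (GPU) */

    int main(void) {                      /* host code runs on the CPU */
        kernel<<<1, 1>>>();               /* kernel invocation: 1 block, 1 thread */
        cudaDeviceSynchronize();          /* wait for the device to finish */
        printf("Hello, World!\n");
        return 0;
    }
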
UNIT FIVE SUB : 410241 HPC

Questions Option 1 Option 2 Option 3 Option 4 Ans

Which of the following statements is NOT TRUE for Internal Sorting algorithms  Option 1: Usually deal with small number of elements  Option 2: No of elements must be able to fit in process's main memory  Option 3: Use auxiliary memory like tape or hard disk  Option 4: Usually are of type compare-exchange  Ans: 3

In sorting networks for INCREASING COMPARATOR with input x, y select the correct output X', Y' from the following options  Option 1: X' = min{x, y} and Y' = min{x, y}  Option 2: X' = max{x, y} and Y' = min{x, y}  Option 3: X' = min{x, y} and Y' = max{x, y}  Option 4: X' = max{x, y} and Y' = max{x, y}  Ans: 3

In sorting networks for DECREASING COMPARATOR with input x, y select the correct output X', Y' from the following options  Option 1: X' = min{x, y} and Y' = min{x, y}  Option 2: X' = max{x, y} and Y' = min{x, y}  Option 3: X' = min{x, y} and Y' = max{x, y}  Option 4: X' = max{x, y} and Y' = max{x, y}  Ans: 2

Which of the following is TRUE for a Bitonic Sequence:  a) Monotonically increasing  b) Monotonically decreasing  c) With cyclic shift of indices  d) First increasing then decreasing  Option 1: a) and b)  Option 2: a) and b) and d)  Option 3: a) and b) and c)  Option 4: a) and b) and c) and d)  Ans: 4

Which of the following is NOT a BITONIC Sequence  Option 1: {8, 6, 4, 2, 3, 5, 7, 9}  Option 2: {0, 4, 8, 9, 2, 1}  Option 3: {3, 5, 7, 9, 8, 6, 4, 2}  Option 4: {1, 2, 4, 7, 6, 0, 1}  Ans: 4

The procedure of sorting a bitonic sequence using bitonic splits is called  Option 1: Bitonic Merge  Option 2: Bitonic Split  Option 3: Bitonic Divide  Option 4: Bitonic Series  Ans: 1
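
A small C sketch of the two comparator types from the preceding questions (illustrative only, not from the original): an increasing comparator emits (min, max) and a decreasing comparator emits (max, min); networks of these comparators make up the bitonic merge.

    /* increasing comparator: X' = min{x, y}, Y' = max{x, y} */
    static void compare_exchange_inc(int *x, int *y) {
        if (*x > *y) { int t = *x; *x = *y; *y = t; }
    }

    /* decreasing comparator: X' = max{x, y}, Y' = min{x, y} */
    static void compare_exchange_dec(int *x, int *y) {
        if (*x < *y) { int t = *x; *x = *y; *y = t; }
    }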

While mapping Bitonic sort on Hypercube,


Compare-exchange operations take place between
One Bit Two bits Three Bits Four bits
1
wires whose labels differ in

Which of following is NOT A WAY of mapping the


input wires of the bitonic
Row Major
Mapping
Column Major
Mapping
Row Major
Snakelike
Row Major
Shuffled Mapping
2
sorting network to a MESH of processes mapping

Which is the sorting algorithm in below given steps -


1. procedure X_SORT(n)
Selection Sort Bubble Sort Parallel Selcetion Parallel Bubble
Sort Sort
2
2. begin
3. for i := n - 1 downto 1 do
4. for j := 1 to i do
5. compare-exchange(aj, aj + 1);
6. end X_SORT

The odd-even transposition algorithm sorts n elements in n phases (n is even), each of which requires ------------ compare-exchange operations  Option 1: n2  Option 2: 2n  Option 3: n/2  Option 4: n  Ans: 3
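
A compact C sketch of odd-even transposition sort (illustrative, not from the original): n alternating phases, each performing the n/2 compare-exchange operations counted in the question above.

    void odd_even_sort(int a[], int n) {
        for (int phase = 0; phase < n; phase++) {
            int start = (phase % 2 == 0) ? 0 : 1;      /* even or odd phase */
            for (int j = start; j + 1 < n; j += 2)
                if (a[j] > a[j + 1]) {                 /* compare-exchange */
                    int t = a[j]; a[j] = a[j + 1]; a[j + 1] = t;
                }
        }
    }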

What is TRUE about SHELL SORT Moves elements Moves elements


only one position long distance
During second
phase algorithm
both 2 and 3
4
at a time switches to odd
even
transposition sort
Which is the fastest sorting algorithm Bubble Sort Odd-Even
Transposition
Shell Sort Quick Sort
4
Sort
Quicksort's performance is greatly affected by the
way it partitions a sequence.
TRUE FALSE
1

Pivot in Quick sort can be selected as Always First


Element
Always Last
element
Always Middle
index Element
Randomly
Selected Element
4

Quick sort uses Recursive Decomposition TRUE FALSE


1
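
A minimal sequential C sketch (illustrative only) of why quicksort fits recursive decomposition: after partitioning around the pivot, the two sub-arrays are completely independent tasks that a parallel formulation can hand to different processes.

    void quicksort(int a[], int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a[hi], i = lo;               /* last element as pivot */
        for (int j = lo; j < hi; j++)
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;   /* place pivot at index i */
        quicksort(a, lo, i - 1);                 /* the two halves are     */
        quicksort(a, i + 1, hi);                 /* independent subtasks   */
    }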

In first step of parallelizing quick sort for n elements Only one process n processes are
to get subarrays, which of the following statement is is used used
two processes are None of the above
used
1
TRUE

In Binary tree representation created by execution of Leaf Node


Quick sort, Pivot is at
Root of tree Any internal
node
None of the above
2

O(N2)
What is the worst case time complexity of a quick sort O(N)
algorithm?
O(N log N) O(log N)
3

O(N2)
What is the average running time of a quick sort
algorithm?
O(N) O(N log N) O(log N)
2
Odd-even transposition sort is a variation of Quick Sort Shell Sort Bubble Sort Selection Sort
3
O(N2)
What is the average case time complexity of odd-even O(N log N)
transposition sort?
O(N) O(log N)
4

Shell sort is an improvement on Quick Sort Bubble Sort Insertion sort Selection Sort
3
In parallel Quick Sort Pivot is sent to processes by Broadcast Multicast Selective
Multicast
Unicast
1

In parallel Quick Sort each process divides the


unsorted list into
n Lists 2 Lists 4 Lists n-1 Lists
2

Time Complexity of DFS is? (V – number of vertices, E O(V + E)


– number of edges)
O(V) O(E) O(V*E)
1

A person wants to visit some places. He starts from a BFS


vertex and then wants to visit every vertex till it
DFS Prim's Kruskal's
2
finishes from one vertex, backtracks and then explore
other vertex from same vertex. What algorithm he
should use?

Given an array of n elements and p processes, in the


message-passing version of the parallel quicksort,
n*p n-p p/n n/p
4
each process stores ---------elements of array

In parallel quick sort Pivot selecton strategy is crucial Maintaing load


for balance
Maintaining
uniform
Effective Pivot
selection in next
all of the above
4
distribution of level
elements in
process groups

In execution of the hypercube formulation of


quicksort for d = 3, split along -----------dimention to
first scond third None of above
3
partition sequence into two big blocks, one greater
than pivot and other smaller than pivot as shown in
diagram

Which Parallel formulation of Quick sort is possible Shared-Address-S Message Passing


pace Parallel formulation
Hypercube
Formulation
All of the above
4
Formulation

Which formulation of Dijkstra's algorithm exploits


more parallelism
source-partitione source-parallel
d formulation formulation
Partitioned-Parall All of above
el Formulation
2

In Dijkstra's all pair shortest path each process


compute the single-source shortest paths for all
TRUE FALSE
1
vertices assigned to it in SOURCE PARTITIONED
FORMULATION

A complete graph is a graph in which each pair of


vertices is adjacent
TRUE FALSE
1

The space required to store the adjacency matrix of a in order of n


graph with n vertices is
in order of n log
n
in order of n
squared
in order of n/2
3

Graph can be represented by Identity Matrix Adjacency Matrix Sprse list Sparse matrix
2

to solve the all-pairs shortest paths problem which


algorithm/s is/are used a) Floyd's algorithm b)
a) and c) a) and b) b) and c) c) and d)
2
Dijkstra's single-source shortest paths c) Prim's
Algorithm d) Kruskal's Algorithm
Simple backtracking is a depth-first search method
that terminates upon finding the first solution.
TRUE FALSE
1

Best-first search (BFS) algorithms can search both


graphs and trees.
TRUE FALSE
1

A* algorithm is a BFS algorithm DFS Algorithm Prim's Algorithm Kruskal's


Algorithm
1
identify Load-Balancing Scheme/s Asynchronous
Round Robin
Global Round
Robin
Random Polling All above
methods
4

important component of best-first search (BFS)


algorithms is
Open List Closed List Node List Mode List
1
UNIT SIX SUB : 410241 HPC
Sr. No. Questions Option 1 Option 2 Option 3 Option 4 Ans

1 Any condition that causes a processor to stall Hazard


is called as _____.
Page fault System error None of the above
1

2 The time lost due to branch instruction is


often referred to as _____.
Latency Delay Branch penalty None of the above
3

3 _____ method is used in centralized systems to Scorecard


perform out of order execution.
Score boarding Optimizing Redundancy
2

4 The computer cluster architecture emerged


as an alternative for ____.
ISA Workstation Super computers Distributed
systems
3

5 NVIDIA CUDA Warp is made up of how many 512 1024 312 32


threads?
4

6 Out-of-order instructions is not possible on


GPUs.
TRUE FALSE -- --
2

7 CUDA supports programming in .... C or C++ only Java, Python, and C, C++, third party Pascal
more wrappers for
3
Java, Python, and
more

8 FADD, FMAD, FMIN, FMAX are ----- supported 32-bit IEEE


by Scalar Processors of NVIDIA GPU. floating point
32-bit integer
instructions
both none of the above
1
instructions
9 Each streaming multiprocessor (SM) of CUDA 1024 128 512 8
herdware has ------ scalar processors (SP).
4

10 8 1024 512 16
Each NVIDIA GPU has ------ Streaming
Multiprocessors
4

11 CUDA provides ------- warp and thread


scheduling. Also, the overhead of thread
“programming-ov “zero-overhead”, 1 64, 2 clock
erhead”, 2 clock clock
32, 1 clock
2
creation is on the order of ----.

12 Each warp of GPU receives a single


instruction and “broadcasts” it to all of its
SIMD (Single
instruction
SIMT (Single
instruction
SISD (Single SIST (Single
instruction single instruction single
2
threads. It is a ---- operation. multiple data) multiple thread) data) thread)

13 Limitations of CUDA Kernel recursion, call


stack, static
No recursion, no
call stack, no static
recursion, no call
stack, static
No recursion, call
stack, no static
2
variable variable variable variable
declaration declarations declaration declarations

14 What is Unified Virtual Machine It is a technique


that allow both
It is a technique
for managing
It is a technique
for executing
It is a technique
for executing
1
CPU and GPU to separate host and device code on general purpose
read from single device memory host and host code programs on
virtual machine, spaces. on device. device instead of
simultaneously. host.
15 _______ became the first language specifically Python, GPUs.
designed by a GPU Company to facilitate
C, CPUs. CUDA C, GPUs. Java, CPUs.
3
general purpose computing on ____.
16 The CUDA architecture consists of --------- for
parallel computing kernels and functions.
RISC instruction
set architecture
CISC instruction
set architecture
ZISC instruction
set architecture
PTX instruction
set architecture
4

17 CUDA stands for --------, designed by NVIDIA. Common Union


Discrete
Complex
Unidentified
Compute Unified Complex
Device Unstructured
3
Architecture Device Architecture Distributed
Architecture Architecture

18 The host processor spawns multithread tasks TRUE


(or kernels as they are known in CUDA) onto
FALSE --- ---
1
the GPU device. State true or false.

19 The NVIDIA G80 is a ---- CUDA core device,


the NVIDIA G200 is a ---- CUDA core device,
128, 256, 512 32, 64, 128 64, 128, 256 256, 512, 1024
1
and the NVIDIA Fermi is a ---- CUDA core
device.

20 NVIDIA 8-series GPUs offer -------- . 50-200 GFLOPS 200-400 GFLOPS 400-800 GFLOPS 800-1000 GFLOPS
1

21 IADD, IMUL24, IMAD24, IMIN, IMAX are


----------- supported by Scalar Processors of
32-bit IEEE
floating point
32-bit integer
instructions
both none of the above
2
NVIDIA GPU. instructions

22 CUDA Hardware programming model


supports: a) fully generally data-parallel
a,c,d,f b,c,d,e a,d,e,f a,b,c,d,e,f
4
archtecture; b) General thread launch; c)
Global load-store; d) Parallel data cache; e)
Scalar architecture; f) Integers, bit operation
23 In CUDA memory model there are following
memory types available: a) Registers; b)
a, b, d, f a, c, d, e, f a, b, c, d, e, f b, c, e, f
3
Local Memory; c) Shared Memory; d) Global
Memory; e) Constant Memory; f) Texture
Memory.
24  What is the equivalent of the general C program with CUDA C: int main(void) { printf("Hello, World!\n"); return 0; }
Option 1: int main( void ) { kernel<<<1,1>>>(); printf("Hello, World!\n"); return 0; }
Option 2: __global__ void kernel( void ) { }  int main( void ) { kernel<<<1,1>>>(); printf("Hello, World!\n"); return 0; }
Option 3: __global__ void kernel( void ) { kernel<<<1,1>>>(); printf("Hello, World!\n"); return 0; }
Option 4: __global__ int main( void ) { kernel<<<1,1>>>(); printf("Hello, World!\n"); return 0; }
Ans: 2

25 Which function runs on Device (i.e. GPU): a)


__global__ void kernel (void ) { } b) int main
a b both a,b ---
1
( void ) { ... return 0; }

26 A simple kernel for adding two integers:


__global__ void add( int *a, int *b, int *c ) { *c
add() will execute add() will execute add() will be
on device, add() on host, add() will called and
add() will be
called and
1
= *a + *b; } where __global__ is a CUDA C will be called be called from executed on host executed on
keyword which indicates that: from host device device

27 If variable a is host variable and dev_a is a cudaMalloc( &dev malloc( &dev_a,


device (GPU) variable, to allocate memory to _a, sizeof( int ) ) sizeof( int ) )
cudaMalloc( (void malloc( (void**)
**) &dev_a, &dev_a,
3
dev_a select correct statement: sizeof( int ) ) sizeof( int ) )
28 If variable a is host variable and dev_a is a
device (GPU) variable, to copy input from
memcpy( dev_a,
&a, size);
cudaMemcpy( dev memcpy( (void*)
_a, &a, size, dev_a, &a, size);
cudaMemcpy( (vo
id*) &dev_a, &a,
2
variable a to variable dev_a select correct cudaMemcpyHost size,
statement: ToDevice ); cudaMemcpyDevi
ceToHost );

29 Triple angle brackets mark in a statement a call from host


inside main function, what does it indicates? code to device
a call from device less than
code to host code comparison
greater than
comparison
1
code

30 What makes a CUDA code runs in parallel __global__ main() function Kernel name
indicates parallel indicates parallel outside triple
first parameter
value inside
4
execution of code execution of code angle bracket triple angle
indicates bracket (N)
excecution of indicates
kernel N times in excecution of
parallel kernel N times in
parallel
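
Assembling the fragments from questions 26-28 into one runnable CUDA C program (a sketch under the same assumptions as those questions; the scalar input values are mine): allocate device memory with cudaMalloc, copy inputs host-to-device, launch the kernel with the chevron syntax, then copy the result back device-to-host.

    #include <stdio.h>

    __global__ void add(int *a, int *b, int *c) { *c = *a + *b; }

    int main(void) {
        int a = 2, b = 7, c;
        int *dev_a, *dev_b, *dev_c;
        cudaMalloc((void **)&dev_a, sizeof(int));
        cudaMalloc((void **)&dev_b, sizeof(int));
        cudaMalloc((void **)&dev_c, sizeof(int));
        cudaMemcpy(dev_a, &a, sizeof(int), cudaMemcpyHostToDevice);
        cudaMemcpy(dev_b, &b, sizeof(int), cudaMemcpyHostToDevice);
        add<<<1, 1>>>(dev_a, dev_b, dev_c);   /* called from host, runs on device */
        cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);
        printf("%d\n", c);                    /* prints 9 */
        cudaFree(dev_a); cudaFree(dev_b); cudaFree(dev_c);
        return 0;
    }
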
ZEAL EDUCATION SOCIETY’S
ZEAL COLLEGE OF ENGINEERING AND RESEARCH
NARHE │PUNE -41 │ INDIA
DEPARTMENT OF COMPUTER ENGINEERING

Name of the Teacher: Dr. Prasad S.Halgaonkar

Class: BE Subject: High Performance Computing


AY: 2020-21 SEM: I

UNIT-1
1) Conventional architectures coarsely comprise of a______

a) processor
b) Memory system
c) Datapath.
d) All of Above

Ans: d
Explanation:
2) Data intensive applications utilize______

a) High aggregate throughput


b) High aggregate network bandwidth
c) High processing and memory system performance.
d) None of above

Ans: a
Explanation:
3) A pipeline is like_____

a. Overlaps various stages of instruction execution to achieve


performance.
b. House pipeline
c. Both a and b
d. gas line

Ans: a
Explanation:
4) Scheduling of instructions is determined by ____
a) True Data Dependency
b) Resource Dependency
c) Branch Dependency
d) All of above

Ans: d
Explanation:
5) VLIW processors rely on______

a) Compile time analysis


b) Initial time analysis
c) Final time analysis
d) Mid time analysis
Ans: a
Explanation:
6) Memory system performance is largely captured by_____

a) Latency
b) Bandwidth
c) Both a and b
d) none of above

Ans: c
Explanation:
7) The fraction of data references satisfied by the cache is called_____

a) Cache hit ratio


b) Cache fit ratio
c) Cache best ratio
d) none of above

Ans: a
Explanation:
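A worked illustration (assumed numbers, not from the original): with a cache hit ratio h = 0.9, cache access time 1 ns, and memory access time 100 ns, the effective access time is h x 1 + (1 - h) x 100 = 0.9 + 10 = 10.9 ns, versus 100 ns with no cache; this is how a high hit ratio reduces the effective memory latency.
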
8) A single control unit that dispatches the same Instruction to various
processors is__
a) SIMD
b) SPMD
c) MIMD
d) None of above

Ans: a
Explanation:
9) The primary forms of data exchange between parallel tasks are_

a. Accessing a shared data space


b. Exchanging messages.
c. Both A and B
d. None of Above
Ans: c
Explanation:
10) Switches map a fixed number of inputs to outputs.
a) True
b) False

Ans: a
Explanation:
11) The stage in which the CPU fetches the instructions from the instruction cache in
superscalar organization is
a) Prefetch stage
b) D1 (first decode) stage
c) D2 (second decode) stage
d) Final stage

Ans: a
Explanation: In the prefetch stage of pipeline, the CPU fetches the instructions from the instruction
cache, which stores the instructions to be executed. In this stage, CPU also aligns the
codes appropriately.

12) The CPU decodes the instructions and generates control words in

a) Prefetch stage
b) D1 (first decode) stage
c) D2 (second decode) stage
d) Final stage
Ans: b
Explanation: In the D1 stage, the CPU decodes the instructions and generates control words. For simple
RISC instructions, only a single control word is enough for starting the execution.
13) The fifth stage of pipeline is also known as
a) read back stage
b) read forward stage
c) write back stage
d) none of the mentioned
Ans: c
Explanation: The fifth stage or final stage of pipeline is also known as “Write back (WB) stage”.
14) In the execution stage the function performed is
a) CPU accesses data cache
b) executes arithmetic/logic computations
c) executes floating point operations in execution unit
d) all of the mentioned

Ans: d
Explanation: In the execution stage, known as E-stage, the CPU accesses data cache, executes
arithmetic/logic computations, and floating point operations in execution unit.

15) The stage in which the CPU generates an address for data memory references in this
stage is
a) prefetch stage
b) D1 (first decode) stage
c) D2 (second decode) stage
d) execution stage
Ans: c

Explanation: In the D2 (second decode) stage, CPU generates an address for data memory
references in this stage. This stage is required where the control word from D1 stage is
again decoded for final execution.

16) The feature of separated caches is


a) supports the superscalar organization
b) high bandwidth
c) low hit ratio
d) all of the mentioned
Ans: d
Explanation: The separated caches have low hit ratio compared to a unified cache, but have the
advantage of supporting the superscalar organization and high bandwidth.
17) In the operand fetch stage, the FPU (Floating Point Unit) fetches the operands from
a) floating point unit
b) instruction cache
c) floating point register file or data cache
d) floating point register file or instruction cache

Ans: C
Explanation: In the operand fetch stage, the FPU (Floating Point Unit) fetches the operands from
either floating point register file or data cache.
18) The FPU (Floating Point Unit) writes the results to the floating point register file in
a) X1 execution state
b) X2 execution state
c) write back stage
d) none of the mentioned

Ans: c
Explanation: In the two execution stages of X1 and X2, the floating point unit reads the data from the
data cache and executes the floating point computation. In the “write back stage” of
pipeline, the FPU (Floating Point Unit) writes the results to the floating point register file.

19) The floating point multiplier segment performs floating point multiplication in

a) single precision
b) double precision
c) extended precision
d) all of the mentioned

Ans: d
Explanation: The floating point multiplier segment performs floating point multiplication in single
precision, double precision and extended precision.
20) The instruction or segment that executes the floating point square root instructions is
a) floating point square root segment
b) floating point division and square root segment
c) floating point divider segment
d) none of the mentioned

Ans: c
Explanation: The floating point divider segment executes the floating point division and square root
instructions.

21) The floating point rounder segment performs rounding off operation at

a) after write back stage


b) before write back stage
c) before arithmetic operations
d) none of the mentioned
Ans: b

Explanation: The results of floating point addition or division process may be required to be rounded
off, before write back stage to the floating point registers.
22) Which of the following is a floating point exception that is generated in case of integer
arithmetic?
a) divide by zero
b) overflow
c) denormal operand
d) all of the mentioned
Ans: D

Explanation: In the case of integer arithmetic, the possible floating point exceptions in Pentium are:
1. divide by zero
2. overflow
3. denormal operand
4. underflow
5. invalid operation.

Name and Sign of Subject Teacher


ZEAL EDUCATION SOCIETY’S
ZEAL COLLEGE OF ENGINEERING AND RESEARCH
NARHE │PUNE -41 │ INDIA
DEPARTMENT OF COMPUTER ENGINEERING

Name of the Teacher: Dr. Prasad S.Halgaonkar

Class: BE Subject: High Performance Computing


AY: 2020-21 SEM: I

UNIT-2
Note: Correct Answers are in Bold Fonts
1. The First step in developing a parallel algorithm is_

A. To Decompose the problem into tasks that can be executed concurrently


B. Execute directly
C. Execute indirectly
D. None of Above

2. The number of tasks into which a problem is decomposed determines its_

A. Granularity
B. Priority
C. Modernity
D. None of above

3. The length of the longest path in a task dependency graph is called_


A. the critical path length
B. the critical data length
C. the critical bit length
D. None of above

4. The graph of tasks (nodes) and their interactions/data exchange (edges)_


A. Is referred to as a task interaction graph
B. Is referred to as a task Communication graph
C. Is referred to as a task interface graph
D. None of Above

5. Mappings are determined by_

A. task dependency
B. task interaction graphs
C. Both A and B
D. None of Above

6. Decomposition Techniques are_


A. recursive decomposition
B. data decomposition
C. exploratory decomposition

D. speculative decomposition
E. All of Above

7. The Owner Computes Rule generally states that the process assigned a particular data
item is responsible for_

A. All computation associated with it


B. Only one computation
C. Only two computation
D. Only occasionally computation

8. A simple application of exploratory decomposition is_

A. The solution to a 15 puzzle


B. The solution to 20 puzzle
C. The solution to any puzzle
D. None of Above

9. Speculative Decomposition consist of _

A. conservative approaches
B. optimistic approaches
C. Both A and B
D. Only B

10. task characteristics include:


A. Task generation.
B. Task sizes.
C. Size of data associated with tasks.
D. All of Above

11. Choose the most accurate (CORRECT) statement:


a. Scalability is a measure of the capacity to increase speedup in
proportion to the number of processors
b. Efficiency is the ratio of the serial run time of the best sequential algorithm for
solving a problem to the time taken by the parallel algorithm to solve the
same problem on p processors
c. Run time is the time that elapses from the moment a parallel computation
starts to the moment the last processor finishes.
d. Superlinear is the fraction of time for which a processor is usefully employed
12. Parallelism can be used to increase the (parallel) size of the problem is applicable in
___________________.
a. Amdahl's Law
b. Gustafson-Barsis's Law
c. Newton's Law

d. Pascal's Law
13. ____________ is due to load imbalance, synchronization, or serial components as
parts of overheads in parallel programs.
a. Interprocess interaction
b. Synchronization
c. Idling
d. Excess computation
14. Which of the following parallel methodological design elements focuses on
recognizing opportunities for parallel execution?
a. Partitioning
b. Communication
c. Agglomeration
d. Mapping
15. Considering to use weak or strong scaling is part of ______________ in addressing
the challenges of distributed memory programming.
a. Splitting the problem
b. Speeding up computations
c. Speeding up communication
d. Speeding up hardware
16. Domain and functional decomposition are considered in the following parallel
methodological design elements, EXCEPT:
a. Partitioning
b. Communication
c. Agglomeration
d. Mapping
17. Synchronization is one of the common issues in parallel programming. The issues
related to synchronization include the followings, EXCEPT:
a. Deadlock
b. Livelock
c. Fairness
d. Correctness
18. Which of the following is the BEST description of Message Passing Interface (MPI)?
a. A specification of a shared memory library
b. MPI uses objects called communicators and groups to define which
collection of processes may communicate with each other
c. Only communicators and not groups are accessible to the programmer only
by a "handle"
d. A communicator is an ordered set of processes
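Question 18's vocabulary (communicators, groups, process ranks) maps onto a few lines of C. A minimal sketch, assuming an MPI implementation such as MPICH or Open MPI is installed (compile with mpicc, run with mpirun):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    /* MPI_COMM_WORLD is the predefined communicator holding all processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's position in the group */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* number of processes in the communicator */

    printf("process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}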

Name and Sign of Subject Teacher


ZEAL EDUCATION SOCIETY’S
ZEAL COLLEGE OF ENGINEERING AND RESEARCH
NARHE │PUNE -41 │ INDIA
DEPARTMENT OF COMPUTER ENGINEERING

Name of the Teacher: Mr. B A Chaugule


Class: BE Subject: High Performance Computing
AY: 2020-21 SEM: I
UNIT-1
1) Execution of several activities at the same time.
a) processing
b) parallel processing
c) serial processing
d) multitasking
Ans: b
Explanation:
2) Parallel processing has single execution flow.
a) True
b) False
Ans: b
Explanation: The statement is false. Sequential programming specifically has single execution
flow.
3) A term for simultaneous access to a resource, physical or logical.
a) Multiprogramming
b) Multitasking
c) Threads
d) Concurrency
Ans: d
Explanation: Concurrency is the term used for the same. When several things are accessed
simultaneously, the job is said to be concurrent.
4) ______________ leads to concurrency.
a) Serialization
b) Parallelism
c) Serial processing
d) Distribution
Ans: b
Explanation: Parallelism leads naturally to Concurrency. For example, Several processes trying
to print a file on a single printer.
5) A parallelism based on increasing processor word size.
a) Increasing
b) Count based
c) Bit based
d) Bit level
Ans: d
Explanation: Bit level parallelism is based on increasing processor word size. It focuses on
hardware capabilities for structuring.
6) The measure of the “effort” needed to maintain efficiency while adding
processors.
a) Maintainability
b) Efficiency

c) Scalability
d) Effectiveness
Ans: C
Explanation: The measure of the “effort” needed to maintain efficiency while adding
processors is called as scalability.
7) Several instructions execution simultaneously in ________________
a) processing
b) parallel processing
c) serial processing
d) multitasking
Ans: b
Explanation: In parallel processing, the several instructions are executed simultaneously.
8) Conventional architectures coarsely comprise a _
a) A processor
b) Memory system
c) Data path.
d) All of Above
Ans: d
Explanation:
9) A pipeline is like_
a) Overlaps various stages of instruction execution to achieve performance.
b) House pipeline
c) Both a and b
d) A gas line
Ans: a
Explanation:
10) VLIW processors rely on_
a) Compile time analysis
b) Initial time analysis
c) Final time analysis
d) Mid time analysis
Ans: a
Explanation:
11) Memory system performance is largely captured by_
a) Latency
b) Bandwidth
c) Both a and b
d) none of above
Ans: c
Explanation:
12) The fraction of data references satisfied by the cache is called_
a) Cache hit ratio
b) Cache fit ratio

c) Cache best ratio


d) none of above
Ans: a
Explanation:
13) A single control unit that dispatches the same Instruction to various processors
is__
a) SIMD
b) SPMD
c) MIMD
d) None of above
Ans: a
Explanation:
14) The primary forms of data exchange between parallel tasks are_
a) Accessing a shared data space
b) Exchanging messages.
c) Both A and B
d) None of Above
Ans: c
Explanation:
16) Switches map a fixed number of inputs to outputs.
a) True
b) False
Ans: a
Explanation:
UNIT-2
1) The First step in developing a parallel algorithm is_
a) To Decompose the problem into tasks that can be executed concurrently
b) Execute directly
c) Execute indirectly
d) None of Above
Ans: a
Explanation:
2) The number of tasks into which a problem is decomposed determines its_
a) Granularity
b) Priority
c) Modernity
d) None of above
Ans: A
Explanation:
3) The length of the longest path in a task dependency graph is called_
a) the critical path length
b) the critical data length
c) the critical bit length

d) None of above
Ans: a
Explanation:
4) The graph of tasks (nodes) and their interactions/data exchange (edges)_
a) Is referred to as a task interaction graph
b) Is referred to as a task Communication graph
c) Is referred to as a task interface graph
d) None of Above
Ans: a
Explanation:
5) Mappings are determined by_
a) task dependency
b) task interaction graphs
c) Both A and B
d) None of Above
Ans: c
Explanation:
6) Decomposition Techniques are_
a) recursive decomposition
b) data decomposition
c) exploratory decomposition
d) speculative decomposition
e) All of Above
Ans: E
Explanation:
7) The Owner Computes Rule generally states that the process assigned a
particular data item is responsible for_
a) All computation associated with it
b) Only one computation
c) Only two computation
d) Only occasionally computation
Ans: A
Explanation:
8) A simple application of exploratory decomposition is_
a) The solution to a 15 puzzle
b) The solution to 20 puzzle
c) The solution to any puzzle
d) None of Above
Ans: A
Explanation:
9) Speculative Decomposition consist of _

a) conservative approaches
b) optimistic approaches
c) Both A and B
d) Only B
Ans: C
Explanation:
10) task characteristics include:
a) Task generation.
b) Task sizes.
c) Size of data associated with tasks.
d) All of Above
Ans: d
Explanation:
UNIT-3
1) Group communication operations are built using point-to-point messaging
primitives
a) True
b) False
Ans: A
Explanation:
2) Communicating a message of size m over an uncongested network takes
time ts + tw m
a) True
b) False
Ans: A
Explanation:
3) The dual of one-to-all broadcast is_
a) All-to-one reduction
b) All-to-one receiver
c) All-to-one Sum
d) None of Above
Ans: A
Explanation:
4) A hypercube has_
a) 2^d nodes
b) 2d nodes
c) 2n Nodes
d) N Nodes
Ans: a
Explanation:
5) A binary tree in which processors are (logically) at the leaves and internal
nodes are routing nodes.

a) True
b) False
Ans: A
Explanation:
6) In All-to-All Broadcast each processor is the source as well as destination.
a) True
b) False
Ans: A
Explanation:
7) The Prefix Sum Operation can be implemented using the_
a) All-to-all broadcast kernel.
b) All-to-one broadcast kernel.
c) One-to-all broadcast Kernel
d) Scatter Kernel
Ans: A
Explanation:
8) In the scatter operation_
a) Single node send a unique message of size m to every other node
b) Single node send a same message of size m to every other node
c) Single node send a unique message of size m to next node
d) None of Above
Ans: A
Explanation:
9) The gather operation is exactly the inverse of the_
a) Scatter operation
b) Broadcast operation
c) Prefix Sum
d) Reduction operation
Ans: A
Explanation:
10) In All-to-All Personalized Communication Each node has a distinct
message of size m for every other node
a) True
b) False
Ans: a
Explanation:
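A note tying questions 8 and 9 together: gather is the exact inverse of scatter, and on a p-node hypercube both take the same time under the ts (startup) / tw (per-word) cost model used in the unit tests later in this document, for messages of size m:

T = ts log p + tw m (p - 1)

The log p term counts the communication steps; the tw m (p - 1) term is the total volume of distinct data that must leave (for scatter) or reach (for gather) the root.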

Name and Sign of Subject Teacher


D. Y. Patil College of Engineering, Akurdi, Pune 411044
Department of Computer Engineering

___________________________________________________________________________________

Unit Test I Date: 23/07/2020

Class : BE Computer Div: A + B Subject : High Performance Computing


Academic Year : 2020-21 Sem : I Exam Date: 23/07/2020

1. Select different aspects of parallelism:
A. data intensive applications utilize high aggregate throughput
B. server applications utilize high aggregate network bandwidth
C. scientific applications typically utilize high processing and memory system performance
D. all of the above
Ans: D [2 marks, CO 1, PO 1, PSO 3, BTL 4]

2. Select the correct answer: DRAM access times have only improved at the rate of roughly ____% per year over this interval.
A. 10   B. 20   C. 40   D. 50
Ans: A [2 marks, CO 1, PO 1, PSO 3, BTL 4]

3. Justify why to use parallel computing:
A. Real world is massively parallel
B. Save time and/or money
C. Solve larger / more complex problems
D. Provide concurrency
E. All of the above
Ans: E [2 marks, CO 1, PO 3, PSO 3, BTL 5]

4. Analyze: if the second instruction has data dependencies with the first, but the third instruction does not, the first and third instructions can be co-scheduled. Which type of issue is this?
A. In-order   B. Out-of-order   C. Both of the above   D. None of the above
Ans: B [2 marks, CO 1, PO 3, PSO 3, BTL 4]

5. Select the parameters which capture memory system performance:
A. Latency   B. Bandwidth   C. Both of the above   D. None of the above
Ans: C [2 marks, CO 1, PO 3, PSO 3, BTL 4]

6. Consider the example of a fire-hose. The water comes out of the hose five seconds after the hydrant is turned on; once the water starts flowing, the hydrant delivers water at the rate of 15 gallons/second. Analyze the bandwidth and latency.
A. Bandwidth: 5 gallons/second and Latency: 15 seconds
B. Bandwidth: 5*15 gallons/second and Latency: 15 seconds
C. Bandwidth: 15 gallons/second and Latency: 5 seconds
D. Bandwidth: 3 gallons/second and Latency: 5 seconds
Ans: C [2 marks, CO 1, PO 4, PSO 3, BTL 5]

7. Select alternate approaches for hiding memory latency:
A. Prefetching   B. Multithreading   C. Spatial locality   D. All of the above
Ans: D [2 marks, CO 1, PO 3, PSO 3, BTL 4]

8. Select which clause in OpenMP is similar to private, except that the values of the variables are initialized to the corresponding values before the parallel directive:
A. Private   B. Firstprivate   C. Shared   D. All of the above
Ans: B [2 marks, CO 1, PO 5, PSO 3, BTL 4]
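Question 8's distinction is easiest to see in code. A minimal sketch, assuming a compiler with OpenMP support (e.g. gcc -fopenmp); the variable name x is illustrative:

#include <stdio.h>
#include <omp.h>

int main(void) {
    int x = 10;   /* value assigned before the parallel directive */

    /* firstprivate: each thread gets a private copy of x, initialized
       to the value x had before the region; with private(x) the
       copies would start with an indeterminate value.              */
    #pragma omp parallel firstprivate(x)
    {
        x += omp_get_thread_num();
        printf("thread %d sees x = %d\n", omp_get_thread_num(), x);
    }

    printf("after the region, x = %d\n", x);   /* original copy unchanged: 10 */
    return 0;
}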

9. The time which includes all overheads that are determined by the length of the message, like bandwidth of links, error checking and correction, etc., is called:
A. Startup time (ts)   B. Per-hop time (th)   C. Per-word transfer time (tw)   D. All of the above
Ans: C [2 marks, CO 1, PO 1, PSO 3, BTL 1]

10. Select in which routing technique the message is divided into packets:
A. Store-and-forward routing   B. Packet routing   C. Cut-through routing   D. In both B and C
Ans: D [2 marks, CO 1, PO 3, PSO 3, BTL 4]
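A note on questions 9 and 10: with the three parameters just defined (startup time ts, per-hop time th, per-word transfer time tw), the two routing techniques of question 10 have the standard costs below for a message of m words crossing l links; store-and-forward retransmits the entire message at every hop, while cut-through pipelines the flits along the route:

Store-and-forward routing: tcomm = ts + (tw m + th) l   (approx. ts + tw m l, since th is usually small)
Cut-through routing:       tcomm = ts + l th + tw m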

11. Which of the following is an efficient method of cache updating?
A. Snoopy writes   B. Write through   C. Write within   D. Buffered write
Ans: A [2 marks, CO 1, PO 1, PSO 3, BTL 1]

12. Select which protocol is used for maintaining coherence of multiple processors:
A. Data coherence protocols   B. Commit coherence protocols   C. Recurrence   D. Cache coherence protocols
Ans: D [2 marks, CO 1, PO 3, PSO 3, BTL 4]

13. The misses arising from inter-processor communication are often called:
A. Coherence misses   B. Commit misses   C. Parallel processing   D. Hit rate
Ans: A [2 marks, CO 1, PO 1, PSO 3, BTL 1]

14. As per Flynn's classification, where may parallel processing occur?
A. in the instruction stream   B. in the data stream   C. both of the above   D. none of the above
Ans: C [2 marks, CO 1, PO 1, PSO 3, BTL 1]

15. Which of the following projects of Blue Gene is not in development?
A. Blue Gene / L   B. Blue Gene / M   C. Blue Gene / P   D. Blue Gene / Q
Ans: B [2 marks, CO 1, PO 1, PSO 3, BTL 1]

(Mrs. Dhanashree Phalke) (Mrs. Vaishali Kolhe) ( Dr. Kailash Shaw) (Dr. Vinayak Kottawar)
Subject Teacher Academic Coordintor Dept. NBA Coordinator HOD Computer
D. Y. Patil College of Engineering, Akurdi, Pune 411044
Department of Computer Engineering

___________________________________________________________________________________

Unit Test II Date: 26/08/2020

Class : BE Computer Div: A + B Subject : High Performance Computing


Academic Year : 2020-21 Sem : I Exam Date: 26/08/2020

1. Task interaction graphs represent _____ dependencies, whereas task dependency graphs represent ______ dependencies.
A. control, data   B. task, data   C. process, control   D. data, control
Ans: D [2 marks, CO 2, PO 1, PSO 3, BTL 4]

2. Select the correct answer. Which graph represents tasks as nodes and their interactions/data exchange as edges?
A. task dependency graph   B. process dependency graph   C. process interaction graph   D. task interaction graph
Ans: D [2 marks, CO 2, PO 1, PSO 3, BTL 4]

3. The average number of tasks that can be processed in parallel over the execution of the program is called the _______________.
A. average degree of concurrency   B. degree of concurrency   C. critical path length   D. maximum concurrency
Ans: A [2 marks, CO 2, PO 1, PSO 3, BTL 4]

4. The number of tasks that can be executed in parallel is the ____________ of a decomposition.
A. average concurrency   B. degree of concurrency   C. critical path length   D. maximum concurrency
Ans: B [2 marks, CO 2, PO 1, PSO 3, BTL 4]

5. A decomposition can be illustrated in the form of a directed graph with nodes corresponding to tasks and edges indicating that the result of one task is required for processing the next. Such a graph is called a _________.
A. process dependency graph   B. task dependency graph   C. task interaction graph   D. process interaction graph
Ans: B [2 marks, CO 2, PO 1, PSO 3, BTL 4]

6. In which case does the owner-computes rule imply that the output is computed by the process to which the output data is assigned?
A. input data decomposition   B. output data decomposition   C. Both of the above   D. None of the above
Ans: B [2 marks, CO 2, PO 4, PSO 3, BTL 5]

7. Select relevant task characteristics from the options given below:
A. Task generation   B. Task sizes   C. Size of data associated with tasks   D. All of the above
Ans: D [2 marks, CO 2, PO 1, PSO 3, BTL 4]

8. A classic example of game playing — each 15-puzzle board is an example of ___________.
A. Static Task Generation   B. Dynamic Task Generation   C. None of the above   D. All of the above
Ans: B [2 marks, CO 2, PO 3, PSO 3, BTL 4]

9. Analyze the task interaction pattern of the multiplication of a sparse matrix with a vector:
A. static regular interaction pattern   B. static irregular interaction pattern   C. dynamic regular interaction pattern   D. dynamic irregular interaction pattern
Ans: B [2 marks, CO 2, PO 3, PSO 3, BTL 5]

10. Select the methods for containing interaction overheads:
A. Maximize data locality   B. Minimize volume of data exchange   C. Minimize frequency of interactions   D. Minimize contention and hot-spots   E. All of the above
Ans: E [2 marks, CO 2, PO 3, PSO 3, BTL 4]

11. Which model is equally suitable to shared-address-space or message-passing paradigms, since the interaction is naturally two-way?
A. Work pool model   B. Master slave model   C. Data parallel model   D. Producer consumer or pipeline model
Ans: B [2 marks, CO 2, PO 1, PSO 3, BTL 4]

12. In which type of model are tasks dynamically assigned to the processes for balancing the load?
A. Work pool model   B. Master slave model   C. Data parallel model   D. Producer consumer or pipeline model
Ans: A [2 marks, CO 2, PO 3, PSO 3, BTL 4]

13. Select the appropriate stage of the GPU pipeline which receives commands from the CPU and also pulls geometry information from system memory:
A. pixel processing   B. vertex processing   C. memory interface   D. host interface
Ans: D [2 marks, CO 2, PO 12, PSO 3, BTL 4]

14. Select the hardware specifications which most affect the GPU card's speed:
A. GPU Clock Speed   B. Size of memory bus   C. Amount of available memory   D. Memory Clock Rate   E. All of the above
Ans: E [2 marks, CO 2, PO 12, PSO 3, BTL 1]

15. Select the appropriate stage of the GPU pipeline where computations include texture mapping and math operations:
A. pixel processing   B. vertex processing   C. memory interface   D. host interface
Ans: A [2 marks, CO 2, PO 12, PSO 3, BTL 1]

(Mrs. Dhanashree Phalke) (Mrs. Vaishali Kolhe) ( Dr. Kailash Shaw) (Dr. Vinayak Kottawar)
Subject Teacher Academic Coordintor Dept. NBA Coordinator HOD Computer
D. Y. Patil College of Engineering, Akurdi, Pune 411044
Department of Computer Engineering

___________________________________________________________________________________

Unit Test III Date: 14/10/2020

Class : BE Computer Div: A Subject : High Performance Computing


Academic Year : 2020-21 Sem : I Exam Date: 14/10/2020

1. In all-to-one reduction, data items must be combined piece-wise and the result made available at a _________ processor.
A. First   B. Last   C. Target   D. N-1
Ans: C [2 marks, CO 3, PO 1, PSO 3, BTL 4]

2. Analyze the cost of scatter and gather:
A. T = tw log p + ts m (p-1)   B. T = ts log p + tw m (p-1)   C. T = ts log p - tw m (p-1)   D. T = tw log p - ts m (p-1)
Ans: B [2 marks, CO 3, PO 4, PSO 3, BTL 4]

3. All-to-all personalized communication is also known as _______________.
A. partial exchange   B. total exchange   C. both of the above   D. none of the above
Ans: B [2 marks, CO 3, PO 1, PSO 3, BTL 1]

4. All-to-all personalized communication is performed independently in each row with clustered messages of size _______ on a mesh.
A. m   B. p   C. m√p   D. p√m
Ans: C [2 marks, CO 3, PO 1, PSO 3, BTL 4]

5. In all-to-all personalized communication on a ring, the size of the message reduces by ______ at each step.
A. m   B. p   C. m-1   D. p-1
Ans: A [2 marks, CO 3, PO 1, PSO 3, BTL 1]

6. The all-to-all broadcast and reduction algorithm on a ring terminates in _________ steps.
A. p   B. p+1   C. p-1   D. p*p
Ans: C [2 marks, CO 3, PO 1, PSO 3, BTL 1]

7. In all-to-all broadcast on a mesh, the operation performs in which sequence?
A. rowwise, rowwise   B. rowwise, columnwise   C. columnwise, rowwise   D. columnwise, columnwise
Ans: B [2 marks, CO 3, PO 1, PSO 3, BTL 3]

8. In the ________ operation, a single node sends a unique message of size m to every other node.
A. Scatter   B. Gather
Ans: A [2 marks, CO 3, PO 3, PSO 3, BTL 1]

9. In the _____ operation, a single node collects a unique message from each node.
A. Scatter   B. Gather
Ans: B [2 marks, CO 3, PO 3, PSO 3, BTL 1]

10. Messages get smaller in _________ and stay constant in _________.
A. broadcast, gather   B. gather, broadcast   C. scatter, broadcast   D. scatter, gather
Ans: C [2 marks, CO 3, PO 1, PSO 3, BTL 4]

11. The time taken by all-to-all broadcast on a ring is ______.
A. T = 2ts(√p - 1) + tw m (p-1)   B. T = (ts + tw m)(p-1)   C. T = ts log p + tw m (p-1)   D. T = 2ts(√p - 1) - tw m (p-1)
Ans: B [2 marks, CO 3, PO 4, PSO 3, BTL 4]

12. The time taken by all-to-all broadcast on a mesh is ______.
A. T = 2ts(√p - 1) + tw m (p-1)   B. T = (ts + tw m)(p-1)   C. T = ts log p + tw m (p-1)   D. T = 2ts(√p - 1) - tw m (p-1)
Ans: A [2 marks, CO 3, PO 4, PSO 3, BTL 4]

13. The time taken by all-to-all broadcast on a hypercube is ______.
A. T = 2ts(√p - 1) + tw m (p-1)   B. T = (ts + tw m)(p-1)   C. T = ts log p + tw m (p-1)   D. T = 2ts(√p - 1) - tw m (p-1)
Ans: C [2 marks, CO 3, PO 4, PSO 3, BTL 4]
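Collecting the results of questions 11-13 in one place (p processes, message size m):

All-to-all broadcast on a ring:      T = (ts + tw m)(p - 1)
All-to-all broadcast on a mesh:      T = 2 ts (√p - 1) + tw m (p - 1)
All-to-all broadcast on a hypercube: T = ts log p + tw m (p - 1)

The tw m (p - 1) term is identical in all three, because every node must receive the same total volume of data regardless of topology; richer topologies only reduce the startup (ts) component.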

14. _____ is a special permutation in which node i sends a data packet to node (i + q) mod p in a p-node ensemble (0 ≤ q ≤ p).
A. Left shift   B. Right shift   C. Circular shift   D. Linear shift
Ans: C [2 marks, CO 3, PO 1, PSO 3, BTL 1]

15. The prefix-sum operation can be implemented using the _______ kernel.
A. all-to-all reduction   B. all-to-all broadcast   C. one-to-all broadcast   D. all-to-one broadcast
Ans: B [2 marks, CO 3, PO 1, PSO 3, BTL 1]

(Mrs. Dhanashree Phalke) (Mrs. Vaishali Kolhe) ( Dr. Kailash Shaw) (Dr. Vinayak Kottawar)
Subject Teacher Academic Coordintor Dept. NBA Coordinator HOD Computer
D. Y. Patil College of Engineering, Akurdi, Pune 411044
Department of Computer Engineering

___________________________________________________________________________________

Unit Test IV Date: 09/11/2020

Class : BE Computer Div: A Subject : High Performance Computing


Academic Year : 2020-21 Sem : I Exam Date: 11/11/2020

1. Select the parameters on which the parallel runtime of a program depends:
A. Input size   B. Number of processors   C. Communication parameters of the machine   D. All of the above
Ans: D [2 marks, CO 4, PO 1, PSO 3, BTL 4]

2. The time that elapses from the moment the first processor starts to the moment the last processor finishes execution is called ___________.
A. Serial runtime   B. Parallel runtime   C. Overhead runtime   D. Excess runtime
Ans: B [2 marks, CO 4, PO 4, PSO 3, BTL 4]

3. Select how the overhead function (To) is calculated:
A. To = TP - TS   B. To = p*n TP - TS   C. To = p TP - TS   D. To = TP - p TS
Ans: C [2 marks, CO 4, PO 1, PSO 3, BTL 1]

4. What is the ratio of the time taken to solve a problem on a single processor to the time required to solve the same problem on a parallel computer with p identical processing elements?
A. Efficiency   B. Overall time   C. Speedup   D. Scaleup
Ans: C [2 marks, CO 4, PO 1, PSO 3, BTL 4]

5. The parallel time for odd-even sort (efficient parallelization of bubble sort) is 50 seconds. The serial time for bubble sort is 175 seconds. Evaluate the speedup of bubble sort.
A. 3.75   B. 3.5   C. 0.33   D. 0.26
Ans: B [2 marks, CO 4, PO 1, PSO 3, BTL 1]

6. Consider the problem of adding n numbers by using n processing elements. The serial time taken is Θ(n) and the parallel time is Θ(log n). Evaluate the efficiency.
A. E = Θ(n / log n)   B. E = Θ(n log n)   C. E = Θ(log n / n)   D. E = Θ(1 / log n)
Ans: D [2 marks, CO 4, PO 1, PSO 3, BTL 1]
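Questions 5 and 6 follow directly from the definitions S = TS / TP (speedup) and E = S / p (efficiency):

Q.5: S = 175 / 50 = 3.5
Q.6: S = Θ(n) / Θ(log n) = Θ(n / log n), and with p = n processing elements, E = S / n = Θ(1 / log n)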

7. What will be the efficiency of cost-optimal parallel systems?
A. E = O(n)   B. E = O(1)   C. E = O(p)   D. E = O(n log n)
Ans: B [2 marks, CO 4, PO 1, PSO 3, BTL 3]

8. Which law states that the maximum speedup of a parallel program is limited by the sequential fraction of the initial sequential program?
A. Amdahl's Law   B. Flynn's Law   C. Moore's Law   D. Von Neumann's Law
Ans: A [2 marks, CO 4, PO 3, PSO 3, BTL 1]

9. Arrange the steps for Matrix-Vector 2-D partitioning:
i) the result vector is computed by performing an all-to-one reduction along the columns
ii) alignment of the vector x along the principal diagonal of the matrix
iii) copy the vector elements from each diagonal process to all the processes in the corresponding column using n simultaneous broadcasts among all processors in the column
A. i, ii, iii   B. ii, iii, i   C. iii, i, ii   D. ii, i, iii
Ans: B [2 marks, CO 4, PO 3, PSO 3, BTL 1]

10. Arrange the communication sequence in Matrix-Vector 2-D partitioning:
i) all-to-one reduction in each row
ii) one-to-all broadcast of each vector element among the n processes of each column
iii) one-to-one communication to align the vector along the main diagonal
A. i, ii, iii   B. ii, iii, i   C. iii, ii, i   D. ii, i, iii
Ans: C [2 marks, CO 4, PO 1, PSO 3, BTL 4]

11. Parallel time in rowwise 1-D partitioning of matrix-vector multiplication where p = n is ____.
A. Θ(1)   B. Θ(n log n)   C. Θ(n^2)   D. Θ(n)
Ans: D [2 marks, CO 4, PO 4, PSO 3, BTL 4]

12. What are the sources of overhead in parallel programs?
A. Interprocess interaction   B. Idling   C. Excess computation   D. All of the above
Ans: D [2 marks, CO 4, PO 4, PSO 3, BTL 4]

13. What are the performance metrics of parallel systems?
A. Execution time   B. Total parallel overhead   C. Speedup   D. Efficiency   E. All of the above
Ans: E [2 marks, CO 4, PO 4, PSO 3, BTL 4]

14. The isoefficiency function determines the ease with which a parallel system can maintain a constant efficiency. True or false?
A. True   B. False
Ans: A [2 marks, CO 4, PO 1, PSO 3, BTL 1]

15. Which matrix-matrix multiplication algorithm uses a 3-D partitioning?
A. Cannon's algorithm   B. DNS algorithm   C. Both of the above   D. None of the above
Ans: B [2 marks, CO 4, PO 1, PSO 3, BTL 1]

(Mrs. Dhanashree Phalke) (Mrs. Vaishali Kolhe) ( Dr. Kailash Shaw) (Dr. Vinayak Kottawar)
Subject Teacher Academic Coordintor Dept. NBA Coordinator HOD Computer
D. Y. Patil College of Engineering, Akurdi, Pune 411044
Department of Computer Engineering

___________________________________________________________________________________

Prelim Exam Date: 29/12/2020

Class : BE Computer Div: A & B Subject : High Performance Co mputing


Academic Year : 2020-21 Sem : I Exam Date: 31/12/2020

1. Which of the following is a type of parallelism?
a. Bit level parallelism   b. Instruction level parallelism   c. Loop level parallelism   d. All of the above
Ans: d [1 mark, CO 1, PO 1,12, PSO 1, BTL 2]

2. Which parallelism is used by VLIW?
a. Bit level parallelism   b. Instruction level parallelism   c. Loop level parallelism   d. Task level parallelism
Ans: b [1 mark, CO 1, PO 1,12, PSO 1, BTL 2]

3. The tendency of a software process to access information items whose addresses are near one another is known as:
a. Spatial locality   b. Temporal locality   c. Permanent locality   d. Sequential locality
Ans: a [1 mark, CO 1, PO 1, PSO 1, BTL 1]

4. Parallel computers are classified based on Flynn's taxonomy; which among the following options does not come under it?
a. SISD   b. SIMD   c. MIMD   d. SIPD
Ans: d [1 mark, CO 1, PO 1,12, PSO 1, BTL 1]

5. Which among the following is the popular multistage network?
a. Hypercube   b. Omega   c. Gamma   d. K-D Mesh
Ans: b [1 mark, CO 1, PO 1, PSO 1, BTL 2]

6. The multicore architecture that consists of dedicated application-specific processor cores, targeting the issue of running a variety of applications on a computer, is:
a. Homogeneous core architecture   b. Heterogeneous core architecture   c. Polaris core architecture   d. None of the above
Ans: b [1 mark, CO 1, PO 1, PSO 1, BTL 3]

7. Decomposition of a computation into a small number of large tasks is:
a. fine grained granularity   b. course grained granularity   c. coarse grained granularity   d. task grained granularity
Ans: c [1 mark, CO 2, PO 1, PSO 3, BTL 1]

8. Which among the following is a type of decomposition?
a. Data decomposition   b. Hybrid decomposition   c. Speculative decomposition   d. All of the above
Ans: d [1 mark, CO 2, PO 1,12, PSO 3, BTL 2]

9. The 15-puzzle problem uses which type of decomposition?
a. Data decomposition   b. Exploratory decomposition   c. Speculative decomposition   d. Recursive decomposition
Ans: b [1 mark, CO 2, PO 1,4,12, PSO 3, BTL 2]

10. An interaction pattern is considered to be _______ if it has some structure that can be exploited for efficient implementation.
a. Structured interaction   b. Unstructured interaction   c. Regular interaction   d. Irregular interaction
Ans: c [1 mark, CO 2, PO 1,12, PSO 3, BTL 2]

11. The mapping in which tasks are distributed to processes during execution is called ___.
a. Dynamic mapping   b. Static mapping   c. Pre-execution mapping   d. In-process mapping
Ans: a [1 mark, CO 2, PO 1, PSO 1, BTL 1]

12. The parallel algorithm model in which mapping of tasks is done dynamically, where pointers to tasks are stored in a physically shared list/priority queue/hash table/tree, is called:
a. The data parallel model   b. Producer consumer model   c. The task graph model   d. Work pool model
Ans: d [1 mark, CO 2, PO 1,2, PSO 1, BTL 2]

13. The world's first GPU, marketed by NVIDIA in 1999, is:
a. GeForce 356   b. GeForce 256   c. GeForce 3800   d. GeForce 956
Ans: b [1 mark, CO 6, PO 5, PSO 3, BTL 1]

14. The operation in which data from all processes are combined at a single destination process is:
a. All-to-one reduction   b. All-to-all reduction   c. One-to-all reduction   d. None of the above
Ans: a [1 mark, CO 3, PO 1, PSO 1, BTL 2]

15. In the scatter operation a single node sends a unique message to every node; this is also called:
a. One-to-one personalized communication   b. One-to-all broadcast communication   c. One-to-all personalized communication   d. All-to-all personalized communication
Ans: c [1 mark, CO 3, PO 1, PSO 1, BTL 2]

16. A single-port communication node can communicate on all the channels connected to it and provides apparent speedup.
a. True   b. False
Ans: b [1 mark, CO 3, PO 1, PSO 1, BTL 1]

17. Symmetric multiprocessor architectures are sometimes known as:
a. Uniform memory access   b. Static memory access   c. Variable memory access   d. All of the above
Ans: a [1 mark, CO 3, PO 1, PSO 1, BTL 1]

18. Heuristic is a way of trying:
a. To discover something or an idea embedded in a program
b. To search and measure how far a node in a search tree seems to be from a goal
c. To compare two nodes in a search tree to see if one is better than another
d. All of the mentioned
Ans: a [1 mark, CO 4, PO 1,2, PSO 3, BTL 2]

19. The A* algorithm is based on:
a. Breadth-first search   b. Depth-first search   c. Best-first search   d. Hill climbing
Ans: c [1 mark, CO 5, PO 1,2, PSO 1, BTL 2]

20. Best-first search can be implemented using the following data structure:
a. Queue   b. Stack   c. Priority Queue   d. Circular Queue
Ans: c [1 mark, CO 5, PO 1,2, PSO 1, BTL 1]

21. _____ is a measure of the fraction of time for which a processing element is usefully employed.
a. Scalability   b. Efficiency   c. Speedup   d. Isoefficiency
Ans: b [1 mark, CO 5, PO 1,2, PSO 1, BTL 2]

22. The ___ of a parallel system is a measure of its capacity to increase speedup in proportion to the number of processing elements.
A. Speedup   B. Cost   C. Efficiency   D. Scalability
Ans: D [1 mark, CO 3, PO 1,12, PSO 1, BTL 2]

23. ___ helps us determine the best algorithm/architecture combination for a particular problem without explicitly analyzing all possible combinations under all possible conditions.
a. Isoefficiency metric of scalability   b. Efficiency metric of scalability   c. Cost metric of scalability   d. None of the above
Ans: a [1 mark, CO 3, PO 1,3, PSO 1, BTL 2]

24. It is defined as the ratio of the time taken to solve a problem on a single processing element to the time taken to solve the same problem on a parallel computer with p identical processing elements:
a. Total parallel overhead   b. Efficiency   c. Cost   d. Speedup
Ans: d [1 mark, CO 3, PO 1,12, PSO 1, BTL 1]

25. In practice a speedup greater than p is sometimes observed. It is called _______.
a. scalability effect   b. superscalar effect   c. superlinearity effect   d. speedup effect
Ans: c [1 mark, CO 3, PO 1,2,12, PSO 1, BTL 1]

26. Odd-even transposition sort is not cost-optimal, because its processor-time product is:
a. Θ(n^2)   b. Θ(n log n)   c. O(n^3)   d. O(n + log n)
Ans: a [1 mark, CO 5, PO 1,2,5, PSO 3, BTL 3]

27. The quicksort algorithm has an average complexity of:
a. O(n^3)   b. O(n + log n)   c. Θ(n log n)   d. Θ(n^2)
Ans: c [1 mark, CO 5, PO 1,2,5, PSO 1, BTL 3]

28. Parallel code executes in many concurrent device (GPU) threads across multiple parallel processing elements, called:
a. Synchronising multiprocessors   b. Streaming multiprocessors   c. Scalable multiprocessors   d. Summative multiprocessors
Ans: b [1 mark, CO 6, PO 1,2,12, PSO 1, BTL 2]

29. ____ partitions the vertices among different processes and has each process compute the single-source shortest path for all vertices assigned to it.
a. Source parallel formulation   b. Single partitioned formulation   c. Source partitioned formulation   d. Shortest path partitioned formulation
Ans: c [1 mark, CO 5, PO 1,2,12, PSO 3, BTL 2]

30. A processor assigned a thread block that executes code is what we usually call a:
a. Multithreaded DIMS processor   b. Multithreaded SIMD processor   c. Multithreaded queue   d. Multithreaded stack
Ans: b [1 mark, CO 2, PO 1, PSO 1, BTL 1]

31. The processor of the system which can read/write GPU memory is known as the:
a. Server   b. Kernel   c. Guest   d. Host
Ans: d [1 mark, CO 6, PO 1, PSO 1, BTL 1]

32. CUDA stands for:
a. Compute uniform device architecture   b. Computing universal device architecture   c. Computer unicode device architecture   d. Compute unified device architecture
Ans: d [1 mark, CO 6, PO 1,2,5, PSO 2, BTL 1]

33. The devices that are used primarily for database, file server and mostly web applications are known as:
a. Servers   b. Desktops   c. Tablets   d. Supercomputers
Ans: a [1 mark, CO 1, PO 1, PSO 1, BTL 1]

34. GPUs are designed for running a large number of complex tasks.
a. True   b. False
Ans: b [1 mark, CO 6, PO 1,2, PSO 1, BTL 1]

35. The parallel algorithm design contains a number of processes where one process may send identical data to all other processes; this is called:
a. All-to-one broadcast   b. All-to-all broadcast   c. One-to-all broadcast   d. None of these
Ans: c [1 mark, CO 3, PO 1, PSO 1, BTL 2]

36. Efficient utilization can be achieved by devising a broadcasting algorithm with the method known as:
a. Recursive doubling   b. Recursive   c. Scatter and Gather   d. None of these
Ans: a [1 mark, CO 3, PO 1, PSO 1, BTL 1]

37. The balanced tree is mapped naturally from the hypercube algorithm for one-to-all broadcast, where the intermediate nodes are the ________ and the leaf nodes are the __________.
a. switching nodes, processing nodes   b. processing nodes, switching nodes
Ans: a [1 mark, CO 3, PO 1,12, PSO 1, BTL 2]

38. The prefix-sum operation is also called the scan operation.
a. True   b. False
Ans: a [1 mark, CO 3, PO 1,12, PSO 1, BTL 1]

39. All-to-all personalized communication is also called:
a. Scan operation   b. Total exchange method   c. None of these
Ans: b [1 mark, CO 3, PO 1,12, PSO 1, BTL 2]

40. On which network are broadcast and reduction operations performed in two steps: 1. operations along the row, 2. operations along the column?
a. Ring   b. Hypercube   c. Linear array   d. Mesh
Ans: d [1 mark, CO 3, PO 1,12, PSO 1, BTL 2]

41. The gather operation is also called all-to-one reduction.
a. True   b. False
Ans: b [1 mark, CO 3, PO 1,8, PSO 1, BTL 1]

42. The method which is used in various parallel algorithms like Fourier transform, matrix transpose and some parallel database join operations is called:
a. All-to-all personalized communication   b. All-to-all broadcast   c. Total exchange method   d. Both a & c
Ans: d [1 mark, CO 3, PO 1,12, PSO 1, BTL 1]

43. Consider a sequence in which numbers are originally arranged <2, 4, 5, 6, 1>; the sequence of prefix sums will be:
a. <2, 6, 11, 17, 18>   b. <6, 15, 21, 22>   c. None of these
Ans: a [1 mark, CO 3, PO 4, PSO 2, BTL 3]
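Question 43 can be checked mechanically. A minimal sketch in C of the scan (prefix-sum) operation, stepping by doubling distances in the spirit of the recursive-doubling method of question 36 (the array stands in for p = 5 processes):

#include <stdio.h>

int main(void) {
    int x[] = {2, 4, 5, 6, 1};               /* the sequence from question 43 */
    int p = sizeof(x) / sizeof(x[0]);

    /* log p rounds; in round d each position i adds the value held at
       distance d, mirroring the parallel communication steps. Iterating
       i downward keeps the not-yet-updated values of this round intact. */
    for (int d = 1; d < p; d *= 2)
        for (int i = p - 1; i >= d; i--)
            x[i] += x[i - d];

    for (int i = 0; i < p; i++)
        printf("%d ", x[i]);                  /* prints: 2 6 11 17 18 */
    printf("\n");
    return 0;
}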
44. Select the parameters on which the parallel runtime of a program depends:
A. Input size   B. Number of processors   C. Communication parameters of the machine   D. All of the above
Ans: D [1 mark, CO 4, PO 1, PSO 3, BTL 4]

45. The time that elapses from the moment the first processor starts to the moment the last processor finishes execution is called ___________.
A. Serial runtime   B. Parallel runtime   C. Overhead runtime   D. Excess runtime
Ans: B [1 mark, CO 4, PO 4, PSO 3, BTL 4]

46. Select how the overhead function (To) is calculated:
A. To = TP - TS   B. To = p*n TP - TS   C. To = p TP - TS   D. To = TP - p TS
Ans: C [1 mark, CO 4, PO 1, PSO 3, BTL 1]

47. The parallel time for odd-even sort (efficient parallelization of bubble sort) is 50 seconds. The serial time for bubble sort is 175 seconds. Evaluate the speedup of bubble sort.
A. 3.75   B. 3.5   C. 0.33   D. 0.26
Ans: B [1 mark, CO 4, PO 1, PSO 3, BTL 1]

48. Consider the problem of adding n numbers by using n processing elements. The serial time taken is Θ(n) and the parallel time is Θ(log n). Evaluate the efficiency.
A. E = Θ(n / log n)   B. E = Θ(n log n)   C. E = Θ(log n / n)   D. E = Θ(1 / log n)
Ans: D [1 mark, CO 4, PO 1, PSO 3, BTL 1]

49. What will be the efficiency of cost-optimal parallel systems?
A. E = O(n)   B. E = O(1)   C. E = O(p)   D. E = O(n log n)
Ans: B [1 mark, CO 4, PO 1, PSO 3, BTL 3]

50. Which law states that the maximum speedup of a parallel program is limited by the sequential fraction of the initial sequential program?
A. Amdahl's Law   B. Flynn's Law   C. Moore's Law   D. Von Neumann's Law
Ans: A [1 mark, CO 4, PO 3, PSO 3, BTL 1]

51. Arrange the steps for Matrix-Vector 2-D partitioning:
i) the result vector is computed by performing an all-to-one reduction along the columns
ii) alignment of the vector x along the principal diagonal of the matrix
iii) copy the vector elements from each diagonal process to all the processes in the corresponding column using n simultaneous broadcasts among all processors in the column
A. i, ii, iii   B. ii, iii, i   C. iii, i, ii   D. ii, i, iii
Ans: B [1 mark, CO 4, PO 3, PSO 3, BTL 1]

52. Arrange the communication sequence in Matrix-Vector 2-D partitioning:
i) all-to-one reduction in each row
ii) one-to-all broadcast of each vector element among the n processes of each column
iii) one-to-one communication to align the vector along the main diagonal
A. i, ii, iii   B. ii, iii, i   C. iii, ii, i   D. ii, i, iii
Ans: C [1 mark, CO 4, PO 1, PSO 3, BTL 4]

53. Parallel time in rowwise 1-D partitioning of matrix-vector multiplication where p = n is ____.
A. Θ(1)   B. Θ(n log n)   C. Θ(n^2)   D. Θ(n)
Ans: D [1 mark, CO 4, PO 4, PSO 3, BTL 4]

54. NVIDIA thought that the 'unifying theme' of every form of parallelism is the:
a. CDA thread   b. PTA thread   c. CUDA thread   d. CUD thread
Ans: c [1 mark, CO 6, PO 1,2,12, PSO 1, BTL 2]

55. Threads being blocked altogether and executed in sets of 32 threads is called a:
a. Thread block   b. 32 thread   c. 32 block   d. Unit block
Ans: a [1 mark, CO 6, PO 1,2,12, PSO 1, BTL 2]

56. The length of a vector operation in a real program is often:
a. Known   b. Unknown   c. Visible   d. Invisible
Ans: a [1 mark, CO 6, PO 1,2,12,6, PSO 1, BTL 3]

57. A code, known as a grid, which runs on a GPU consists of a set of:
a. 32 thread   b. Unit block   c. 32 block   d. Thread block
Ans: d [1 mark, CO 6, PO 1,12,5, PSO 1, BTL 1]

58. NVIDIA unveiled the industry's first DirectX 10 GPU, the ___.
a. GTX 1050   b. GeForce 8800 GTX   c. GeForce GTX 1080   d. GTX 1060
Ans: b [1 mark, CO 6, PO 1,12,5, PSO 1, BTL 1]

59. The number of instructions being executed defines the:
a. Instruction count   b. Hit time   c. Clock rate   d. All of the above
Ans: a [1 mark, CO 2, PO 1, PSO 1, BTL 1]

60. In CUDA programming, a kernel is launched using which pair of brackets?
a. <<< >>>   b. {{{ }}}   c. ((( )))   d. [[[ ]]]
Ans: a [1 mark, CO 6, PO 1,2,12,5, PSO 3, BTL 2]

61. In CUDA programming, the special function used to transfer data between host and device is ___.
a. Memcopy()   b. Memorycpy()   c. cudaMemcpy()   d. cudaMemorycpy()
Ans: c [1 mark, CO 6, PO 1,2,12,5, PSO 1, BTL 1]

62. The streaming multiprocessor in CUDA divides the threads in a block into groups called ___.
a. Warp   b. Packet   c. Grid   d. Thread block
Ans: a [1 mark, CO 6, PO 1,12,5, PSO 1, BTL 2]

63. Sources of overheads in a parallel program are:
a. Idling   b. Interprocess communication   c. Excess computation   d. All of the above
Ans: d [1 mark, CO 3, PO 1,12,2, PSO 1, BTL 2]

64. What are the sources of overhead in parallel programs?
A. Interprocess interaction   B. Idling   C. Excess computation   D. All of the above
Ans: D [1 mark, CO 4, PO 4, PSO 3, BTL 4]

65. What are the performance metrics of parallel systems?
A. Execution time   B. Total parallel overhead   C. Speedup   D. Efficiency   E. All of the above
Ans: E [1 mark, CO 4, PO 4, PSO 3, BTL 4]

66. The isoefficiency function determines the ease with which a parallel system can maintain a constant efficiency. True or false?
A. True   B. False
Ans: A [1 mark, CO 4, PO 1, PSO 3, BTL 1]

67. Which matrix-matrix multiplication algorithm uses a 3-D partitioning?
A. Cannon's algorithm   B. DNS algorithm   C. Both of the above   D. None of the above
Ans: B [1 mark, CO 4, PO 1, PSO 3, BTL 1]

68. A solution representing the parallelism in an algorithm is:
A. CDA   B. PTA   C. CUDA   D. CUD
Ans: C [1 mark, CO 6, PO 1, PSO 1, BTL 2]

69. Blocking optimization is used to improve temporal locality, to reduce:
A. Hit miss   B. Misses   C. Hit rate   D. Cache misses
Ans: B [1 mark, CO 5, PO 1, PSO 1, BTL 2]

70. Data are allocated to disks in RAID at the:
A. Block level   B. Cache level   C. Low level   D. High level
Ans: A [1 mark, CO 6, PO 1, PSO 1, BTL 1]

71. In CUDA C programming, serial code is executed by __ and parallel code is executed by __.
a. CPU, CPU   b. GPU, CPU   c. GPU, GPU   d. CPU, GPU
Ans: d [1 mark, CO 6, PO 1,2,12,5, PSO 2, BTL 2]

72. A kernel function is qualified by the qualifier:
a. __local__   b. __universal__   c. __global__   d. A or C
Ans: c [1 mark, CO 6, PO 1,3, PSO 1, BTL 1]
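Several of the CUDA questions above (28, 55, 57, 60, 62, 71 and 72) describe one execution picture: host code launches a grid of thread blocks, each block runs on a streaming multiprocessor, and the hardware issues each block's threads in warps of 32. A minimal sketch of the standard indexing idiom (the kernel name and launch sizes here are illustrative, not from the paper):

__global__ void vadd(const int *a, const int *b, int *c, int n) {
    /* one thread per element: block offset plus thread position in the block */
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

/* host-side launch, using the triple angle brackets of question 60:
   vadd<<<(n + 255) / 256, 256>>>(dev_a, dev_b, dev_c, n);           */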

(Mrs. D.A. Phalke & Mrs. Neha D. Patil) (Mrs. Vaishali Kolhe) ( Dr. Kailash Shaw) (Dr. Vinayak Kottawar)
Subject Teacher Academic Coordintor Dept. NBA Coordinator HOD Computer
HPC MCQ QB for Insem Examination

Unit I
1. Conventional architectures coarsely comprise a _

A. A processor
B. Memory system
C Data path.
D All of Above

2. Data intensive applications utilize_

A High aggregate throughput


B High aggregate network bandwidth
C High processing and memory system performance.
D None of above

3. A pipeline is like_

A Overlaps various stages of instruction execution to achieve performance.


B House pipeline
C Both a and b
D A gas line

4. Scheduling of instructions is determined by _

A True Data Dependency


B Resource Dependency
C Branch Dependency
D All of above

5. VLIW processors rely on_

A Compile time analysis


B Initial time analysis
C Final time analysis
D Mid time analysis

6. Memory system performance is largely captured by_

A Latency
B Bandwidth
C Both a and b
D none of above

7. The fraction of data references satisfied by the cache is called_


A Cache hit ratio
B Cache fit ratio
C Cache best ratio
D none of above

8. A single control unit that dispatches the same Instruction to various processors is__

A SIMD
B SPMD
C MIMD
D None of above

9. The primary forms of data exchange between parallel tasks are_

A Accessing a shared data space


B Exchanging messages.
C Both A and B
D None of Above

10. Switches map a fixed number of inputs to outputs.


A True
B False

Unit 2
1. The First step in developing a parallel algorithm is_

A. To Decompose the problem into tasks that can be executed concurrently


B. Execute directly
C. Execute indirectly
D. None of Above

2. The number of tasks into which a problem is decomposed determines its_

A. Granularity
B. Priority
C. Modernity
D. None of above

3. The length of the longest path in a task dependency graph is called_


A. the critical path length
B. the critical data length
C. the critical bit length
D. None of above

4. The graph of tasks (nodes) and their interactions/data exchange (edges)_


A. Is referred to as a task interaction graph
B. Is referred to as a task Communication graph
C. Is referred to as a task interface graph
D. None of Above

5. Mappings are determined by_

A. task dependency
B. task interaction graphs
C. Both A and B
D. None of Above

6. Decomposition Techniques are_


A. recursive decomposition
B. data decomposition
C. exploratory decomposition
D. speculative decomposition
E. All of Above

7. The Owner Computes Rule generally states that the process assigned a particular data
item is responsible for_

A. All computation associated with it


B. Only one computation
C. Only two computation
D. Only occasionally computation

8. A simple application of exploratory decomposition is_

A. The solution to a 15 puzzle


B. The solution to 20 puzzle
C. The solution to any puzzle
D. None of Above

9. Speculative Decomposition consist of _

A. conservative approaches
B. optimistic approaches
C. Both A and B
D. Only B

10. task characteristics include:


A. Task generation.
B. Task sizes.
C. Size of data associated with tasks.
D. All of Above
Unit 3

1. Group communication operations are built using point-to-point messaging primitives


A. True
B. False

2. Communicating a message of size m over an uncongested network takes time ts + tw m

A. True
B. False

3. The dual of one-to-all broadcast is_

A. All-to-one reduction
B. All-to-one receiver
C. All-to-one Sum
D. None of Above

4. A hypercube has_

A. 2^d nodes
B. 2d nodes
C. 2n Nodes
D. N Nodes

5. A binary tree in which processors are (logically) at the leaves and internal nodes are
routing nodes.

A. True
B. False

6. In All-to-All Broadcast each processor is the source as well as destination.

A. True
B. False

7. The Prefix Sum Operation can be implemented using the_

A. All-to-all broadcast kernel.


B. All-to-one broadcast kernel.
C. One-to-all broadcast Kernel
D. Scatter Kernel

8. In the scatter operation_


A. Single node send a unique message of size m to every other node
B. Single node send a same message of size m to every other node
C. Single node send a unique message of size m to next node
D. None of Above

9. The gather operation is exactly the inverse of the_

A. Scatter operation
B. Broadcast operation
C. Prefix Sum
D. Reduction operation

10. In All-to-All Personalized Communication Each node has a distinct message of size m
for every other node

A. True
B. False

1. High performance concrete is of ___________ strength and ___________ permeability.


a) High, high
b) Low, low
c) High, low
d) Low, high
Answer: c
Explanation: It is specifically chosen so as to have particularly appropriate properties for the
expected use of the structure such as high strength and low permeability.

2. High Performance concrete works out to be economical.


a) True
b) False
Answer: a
Explanation: High Performance concrete works out to be economical, even though its initial
cost is high.

3. HPC is not used in high span bridges.


a) True
b) False
Answer: b
Explanation: Major applications of high-performance concrete in the field of Civil
Engineering constructions have been in the areas of long-span bridges, high-rise buildings
or structures, highway pavements, etc.

4. Concrete having 28-days' compressive strength in the range of 60 to 100 MPa is:
a) HPC
b) VHPC
c) OPC
d) HSC
Answer: a
Explanation: High Performance Concrete having 28- days’ compressive strength in the
range of 60 to 100 MPa.

5. Concrete having 28-days compressive strength in the range of 100 to 150 MPa.
a) HPC
b) VHPC
c) OPC
d) HSC
Answer: b
Explanation: Very high performing Concrete having 28-days compressive strength in the
range of 100 to 150 MPa.


6. High-Performance Concrete is ____________ as compared to Normal Strength


Concrete.
a) Less brittle
b) Brittle
c) More brittle
d) Highly ductile
Answer: c
Explanation: High-Performance Concrete is more brittle as compared to Normal Strength
Concrete (NSC), especially when high strength is the main criteria.

7. The choice of cement for high-strength concrete should not be based only on mortar-
cube tests but it should also include tests of compressive strengths of concrete at
___________ days.
a) 28, 56, 91
b) 28, 60, 90
c) 30, 60, 90
d) 30, 45, 60
Answer: a
Explanation: The choice of cement for high-strength concrete should not be based only on
mortar-cube tests but it should also include tests of compressive strengths of concrete at
28, 56, and 91 days.

8. For high-strength concrete, a cement should produce a minimum 7-days mortar-cube


strength of approximately ___ MPa.
a) 10
b) 20
c) 30
d) 40
Answer: c
Explanation: For high-strength concrete, a cement should produce a minimum 7-days
mortar-cube strength of approximately 30 MPa.

9. ____________ mm nominal maximum size aggregates gives optimum strength.


a) 9.5 and 10.5
b) 10.5 and 12.5
c) 9.5 and 12.5
d) 11.5 and 12.5
Answer: c
Explanation: Many studies have found that 9.5 mm to 12.5 mm nominal maximum size
aggregates gives optimum strength.

10. Due to low w/c ratio _____________


a) It doesn’t cause any problems
b) It causes problems
c) Workability is easy
d) Strength is more
Answer: b
Explanation: Due to the low w/c ratio, it causes problems so superplasticizers are used.
marks question A B C D ans
Interconnection Networks Direct Both Static and
0 1 Both Dynamic Static
can be classified as? Network Dynamic.
Parallel Computers are used
Algorithmic Optimization This is an
1 1 to solve which types of Both None
Problems Problems explaination.
problems.
One clock Is used
How many clocks control
2 1 One Three Four Five to control all the
all the stages in a pipeline?
stages.
Main memory is
Main memory in parallel
3 1 Shared Parallel Fixed None shared in parallel
computing is____?
computing.
Ans- (d)-
Application
Which of these is not a class
Application Distributed Symmetric Multicore checkpoiting. is
4 1 of parallel computing
Checkpointing Computing Multiprocessing Computing not a class of
architetcture?
parallel computer
architecture.
Parallel computing
software
Parallel Computing software Parallel
Automatic Application solutionincludes all
5 1 solutions and Techniques All Programming
Parallelization Checkpointing of the following..
includes: languages.
This is an
explanation
The Processors are The Processors
6 2 connected to the memory Switches Cables Buses Registers are connected
through a set of? thru. the switches.
Superscalar Architetcure
This is an
7 2 has how many execution Two One Three Four
explaination.
units?
What is used to hold the The Intermediate
Intermediate
8 2 intermediate output in a Cache RAM ROM Registers are used
Register
pipeline to hold the output.
International
International Human Genome
Human
Which oranization performs Sequencing and Genome Sequencing for
Genome This is an
9 2 sequencing of Human Consortium for Sequencing and Humans and
Sequencing explaination.
Genome? Human Constrium, Consortium,
and
Genome Org. Org.
Consortium
Ans(c)- Five
There are how many stages
10 2 Five Three Two Six stages are there in
in RISC Processor?
a RISC processor.
The DRAM acess
Over the last decade, The
time rate has
DRAM access time has None of the
11 2 0.1 0.2 0.15 improved at a rate
improved at what rate per above
of 10% over the
year?
last decade.
marks question A B C D ans
Cache acts as low
Which memory acts as low- latency high
12 2 latency high bandwidth Cache Register DRAM EPROM bandwidth storage
storage? .This is an
explanation.
Which processor This is an
13 2 SIMD MIMD MISD MIMD
architecture is this? explaination.
This diagram
Which core processor is
14 2 Quad-Core Dual-Core Octa-Core Single-Core shows Quad-
this?
Core.
Data Caching is
Which of these is not a
15 2 Data Caching Decomposition Simplification Parsimony not a prinicple of
scalable design principle?
scable design.
The distance between any O(1) is the ditance
16 2 two nodes in Bus Based O(1) O(n Logn) O(N) O(n^2) between any two
network is? nodes.
All of these are
Early SIMD computers early staged
17 2 All MPP CM-2 Illiac IV
include: SIMD parallel
computers.
This is called
This is which configuration
18 2 Pass-through Cross-Over Shuffle None Pass-through
in Omega networks.
configuration.
Parallelization
Automatic Parallelization includes parse,
19 2 technique doesn’t Share Memory Analyse Schedule Parse analyse schedule
ncludes: and code
generation.
The P4 processor
The Pentuim 4 or P4
has 20 staged
20 2 processor has how many 20 15 18 10
pipeline. This is an
stage pipeline?
explanation.
Sum, Prioirity and
Which protocol is not used
common are used
21 3 to remove concurrent Identify Priority Common Sum
to remove
writes?
concurrent writes.
Exclusive EREW stands for
Erasable Read Easily Read
Read and Exclusive Read
22 3 EREW PRAM stands for? and Erasable and Easily None
Exclusive and Exclsuive
Write PRAM Write
Write Write PRAM.
Multiple
During each clock cycle,
Instuctiion are
multiple instructions are
23 3 Parallel Series Both a and b None piped in parallel.
piped into the processor
This is an
in________?
explanation.
Multistaged
Which Interconnection Multistage Dynamic
24 3 Cross-Bar Bus-Staged Network uses this
Network uses this equation. Networks Networks
eqn.
Q.25 (3 marks): How many types of parallel computing are available from both proprietary and open source parallel computing vendors?
A. 4
B. 2
C. 3
D. 6
ANSWER: A

Q.26 (3 marks): If a piece of data is repeatedly used, the effective latency of this memory system can be reduced by the cache. The fraction of data references satisfied by the cache is called?
A. Hit ratio
B. Memory ratio
C. Hit fraction
D. Memory fraction
ANSWER: A (it is called the cache hit ratio)

Q.27 (3 marks): Superscalar architecture can create a problem in?
A. Scheduling
B. Phasing
C. Data extraction
D. Data compiling
ANSWER: A (it can cause problems in CPU scheduling)

Q.28 (3 marks): In cut-through routing, a message is broken into fixed-size units called?
A. Flits
B. Flow digits
C. Control digits
D. All
ANSWER: A

Q.29 (3 marks): The total communication time for cut-through routing is? [the options are equations, not reproduced]
ANSWER: A

Q.30 (1 mark): The disadvantage of the GPU pipeline is?
A. Load balancing
B. Data balancing
C. Process balancing
D. All of the above
ANSWER: A

Q.31 (1 mark): Examples of GPU processors are:
A. Both AMD and NVIDIA
B. AMD processors
C. NVIDIA
D. None
ANSWER: A

Q.32 (1 mark): Simultaneous execution of different programs on a data stream is called?
A. Stream parallelism
B. Data execution
C. Data parallelism
D. None
ANSWER: A

Q.33 (1 mark): Early GPU controllers were known as?
A. Video shifters
B. GPU shifters
C. Video movers
D. GPU controllers
ANSWER: A

Q.34 (1 mark): _____ development is a critical component of problem solving using computers.
A. Algorithm
B. Code
C. Pseudocode
D. Problem
ANSWER: A
Q.35 (1 mark): GPU stands for?
A. Graphics Processing Unit
B. Graphical Processing Unit
C. Gaming Processing Unit
D. Graph Processing Unit
ANSWER: A

Q.36 (1 mark): What leads to concurrency?
A. Parallelism
B. Serial processing
C. Decomposition
D. All
ANSWER: A (parallelism leads naturally to concurrency; for example, several processes trying to print a file on a single printer)

Q.37 (2 marks): The process of determining which screen-space pixel locations are covered by each triangle is known as?
A. Rasterization
B. Pixelisation
C. Fragmentation
D. Space-determining process
ANSWER: A

Q.38 (2 marks): The programmable units of the GPU follow which programming model?
A. SPMD
B. MISD
C. MIMD
D. SIMD
ANSWER: A (they follow a single-program multiple-data model)

Q.39 (2 marks): Which address space can ease the programming effort, especially if the distribution of data is different in different phases of the algorithm?
A. Shared address
B. Parallel address
C. Series address
D. Data address
ANSWER: A

Q.40 (2 marks): Which are the hardware units that physically perform computations?
A. Processors
B. ALU
C. CPU
D. CU
ANSWER: A

Q.41 (2 marks): Examples of graphics APIs are?
A. All of these
B. DirectX
C. CUDA
D. OpenCL
ANSWER: A

Q.42 (2 marks): The mechanism by which tasks are assigned to processes for execution is called ___?
A. Mapping
B. Computation
C. Process
D. None
ANSWER: A
Q.43 (2 marks): A decomposition into a large number of small tasks is called __________ granularity.
A. Fine-grained
B. Coarse-grained
C. Vector-grained
D. All
ANSWER: A

Q.44 (2 marks): Identical operations being applied concurrently on different data items is called?
A. Data parallelism
B. Parallelism
C. Data serialism
D. Concurrency
ANSWER: A

Q.45 (2 marks): Systems which do not have parallel processing capabilities are?
A. SISD
B. SIMD
C. MISD
D. MIMD
ANSWER: A

Q.46 (2 marks): The time and the location in the program of a static one-way interaction is known ______?
A. A priori
B. By polling
C. By decomposition
D. At execution
ANSWER: A

Q.47 (2 marks): Memory access in RISC architecture is limited to which instructions?
A. STA and LDA
B. CALL and RET
C. MOV and JMP
D. PUSH and POP
ANSWER: A

Q.48 (2 marks): Which algorithms can be implemented in both shared-address-space and message-passing paradigms?
A. Data-parallel algorithms
B. Quicksort algorithm
C. Bubble-sort algorithm
D. Data algorithm
ANSWER: A

Q.49 (2 marks): Which type of distribution is this? [figure not reproduced]
A. Randomized block distribution
B. Block-cyclic distribution
C. Cyclic distribution
D. None
ANSWER: A (the figure shows a randomized block distribution)

Q.50 (2 marks): An abstraction used to express such dependencies among tasks and their relative order of execution is known as __________?
A. Task-dependency graph
B. Dependency graph
C. Time-dependency graph
D. None
ANSWER: A
Q.51 (3 marks): Which is the simplest way to distribute an array and assign uniform contiguous portions of the array to different processes?
A. Block distribution
B. Array distribution
C. Process distribution
D. All
ANSWER: A

Q.52 (3 marks): An example of a decomposition with a regular interaction pattern is?
A. Image-dithering problem
B. 8-queens problem
C. Travelling salesman problem
D. Time-complexity problems
ANSWER: A

Q.53 (3 marks): A feature of a task-dependency graph that determines the average degree of concurrency for a given granularity is?
A. Critical path
B. Process path
C. Granularity
D. Concurrency
ANSWER: A

Q.54 (3 marks): The shared-address-space programming paradigms can handle which interactions?
A. Both
B. Two-way
C. One-way
D. None
ANSWER: A (both one-way and two-way interactions)

Q.55 (3 marks): Which distribution can result in an almost perfect load balance due to the extremely fine-grained underlying decomposition?
A. Cyclic distribution
B. Array distribution
C. Block-cyclic distribution
D. Block distribution
ANSWER: A

Q.56 (3 marks): Data-sharing interactions can be categorized as __________ interactions?
A. Both
B. Read-write
C. Read-only
D. None
ANSWER: A (either read-only or read-write)
Q.57 (3 marks): What is the way of structuring a parallel algorithm by selecting a decomposition and mapping technique and applying the appropriate strategy to minimize interactions called?
A. Algorithm model
B. Parallel model
C. Data model
D. Mapping model
ANSWER: A

Q.58 (3 marks): Which algorithm is this? [figure not reproduced]
A. Serial column-based algorithm
B. Column algorithm
C. Bubble-sort algorithm
D. None
ANSWER: A

Q.59 (3 marks): Algorithms based on the task graph model include:
A. All of these
B. Matrix factorization
C. Parallel quicksort
D. Quicksort
ANSWER: A

Q.60 (1 mark): Which model permits simultaneous communication on all the channels connected to a node?
A. All-port communication
B. One-port communication
C. Dual-port communication
D. Quad-port communication
ANSWER: A

Q.61 (1 mark): A process sends the same m-word message to every other process, but different processes may broadcast different messages. It is called?
A. All-to-all broadcast
B. One-to-all broadcast
C. All-to-all reduction
D. None
ANSWER: A

Q.62 (1 mark): The matrix is transposed using which operation?
A. All-to-all personalized communication
B. One-to-all personalized communication
C. All-to-one personalized communication
D. One-to-one personalized communication
ANSWER: A

Q.63 (1 mark): Each node in a two-dimensional wraparound mesh has how many ports?
A. Four
B. Two
C. Three
D. One
ANSWER: A

Q.64 (1 mark): Circular shift is a member of a broader class of global communication operations known as?
A. Permutation
B. Combination
C. Both a and b
D. None
ANSWER: A

Q.65 (1 mark): We define _______ as the operation in which node i sends a data packet to node (i + q) mod p in a p-node ensemble (0 < q < p).
A. Circular q-shift
B. Linear shift
C. Circular shift
D. Linear q-shift
ANSWER: A
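To make the circular q-shift concrete, the destination of each node is plain modular arithmetic. A minimal C sketch (the function name is our own, for illustration):

/* Destination of node i under a circular q-shift in a p-node ensemble. */
int qshift_dest(int i, int q, int p) {
    return (i + q) % p;   /* node i sends its packet to node (i + q) mod p */
}

For example, with p = 8 and q = 3, node 6 sends its packet to node (6 + 3) mod 8 = 1.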
Q.66 (1 mark): Parallel algorithms often require a single process to send identical data to all other processes or to a subset of them. This operation is known as?
A. One-to-all broadcast
B. One-to-one broadcast
C. All-to-one broadcast
D. None
ANSWER: A

Q.67 (1 mark): In which communication does each node send a distinct message of size m to every other node?
A. All-to-all personalized communication
B. One-to-one personalized communication
C. All-to-one personalized communication
D. One-to-all personalized communication
ANSWER: A

Q.68 (1 mark): The all-to-all personalized communication operation is not used in which of these parallel algorithms?
A. Quicksort
B. Matrix transpose
C. Fourier transform
D. Database join operation
ANSWER: A

Q.69 (1 mark): The dual of one-to-all broadcast is?
A. All-to-one reduction
B. All-to-one broadcast
C. One-to-many reduction
D. All-to-all broadcast
ANSWER: A

Q.70 (1 mark): Reduction on a linear array can be performed by _______ the direction and the sequence of communication.
A. Reversing
B. Forwarding
C. Escaping
D. Widening
ANSWER: A

Q.71 (2 marks): This equation is used to solve which topology's operations in all-to-all communications? [equation not reproduced]
A. Hypercube
B. Mesh
C. Ring
D. Linear array
ANSWER: A

Q.72 (2 marks): The communication pattern of all-to-all broadcast can be used to perform ________?
A. Second variation of reduction
B. Third variation of reduction
C. First variation of reduction
D. Fifth variation of reduction
ANSWER: A

Q.73 (2 marks): A single node sends a unique message of size m to every other node. This operation is known as ______?
A. Scatter
B. Reduction
C. Gather
D. Concatenate
ANSWER: A

Q.74 (2 marks): The algorithm represents which broadcast? [algorithm not reproduced]
A. All-to-all broadcast
B. All-to-all reduction
C. One-to-one reduction
ANSWER: A

Q.75 (2 marks): The message can be broadcast in how many steps?
A. log p
B. log(p^2)
C. One
D. sin(p)
ANSWER: A

Q.76 (2 marks): This equation is used to solve which operation? [equation not reproduced]
A. All-to-all personalized communication
B. One-to-all personalized communication
C. One-to-one personalized communication
D. All-to-one personalized communication
ANSWER: A
Q.77 (2 marks): There are how many computations for n^2 words of data transferred among the nodes?
A. n^3
B. tan n
C. e^n
D. log n
ANSWER: A

Q.78 (2 marks): The scatter operation is also known as?
A. One-to-all personalized communication
B. One-to-one personalized communication
C. All-to-one personalized communication
D. All-to-all personalized communication
ANSWER: A

Q.79 (2 marks): A hypercube with 2^d nodes can be regarded as a d-dimensional mesh with ____ nodes in each dimension.
A. Two
B. One
C. Three
D. Four
ANSWER: A

Q.80 (2 marks): One-to-all broadcast and all-to-one reduction are used in several important parallel algorithms, including?
A. All of these
B. Gaussian elimination
C. Shortest-path algorithms
D. Matrix-vector multiplication
ANSWER: A

Q.81 (2 marks): Each node of the distributed-memory parallel computer is a ______ shared-memory multiprocessor.
A. NUMA
B. UMA
C. CCMA
D. None
ANSWER: A

Q.82 (2 marks): To perform a q-shift, we expand q as a sum of distinct powers of ______?
A. 2
B. 3
C. e
D. log p
ANSWER: A

Q.83 (3 marks): In which implementation of circular shift is the entire row of the data set shifted?
A. Mesh
B. Hypercube
C. Ring
D. Linear
ANSWER: A

Q.84 (3 marks): On a p-node hypercube with all-port communication, the coefficients of tw in the expressions for the communication times of one-to-all and all-to-all broadcast and personalized communication are all smaller than their single-port counterparts by a factor of?
A. log p
B. cos(p)
C. sin(p)
D. e^p
ANSWER: A
Q.85 (3 marks): The equation represents which analysis in all-to-all broadcasts? [equation not reproduced]
A. Cost analysis
B. Time analysis
C. Data-model analysis
D. Space-time analysis
ANSWER: A

Q.86 (3 marks): On a p-node hypercube, the size of each message exchanged in the i-th of the log p steps is? [the options are equations, not reproduced]
ANSWER: A

Q.87 (3 marks): Which broadcast is applied on this 3-D hypercube? [figure not reproduced]
A. One-to-all broadcast
B. One-to-one broadcast
C. All-to-one broadcast
D. All-to-one reduction
ANSWER: A

Q.88 (3 marks): The equation represents which analysis in one-to-all broadcasts? [equation not reproduced]
A. Cost analysis
B. Time analysis
C. Data analysis
D. Space analysis
ANSWER: A

Q.89 (3 marks): The time for circular shift on a hypercube can be improved by almost a factor of ______ for large messages.
A. log p
B. cos(p)
C. e^p
D. sin p
ANSWER: A

Q.90 (1 mark): The execution time of a parallel algorithm does not depend upon?
A. Processor
B. Input size
C. Relative computation speed
D. Communication speed
ANSWER: A (the execution time depends on input size, the number of processing elements used, and their relative computation and interprocess communication speeds)

Q.91 (1 mark): Processing elements in a parallel system may become idle due to many reasons, such as:
A. Both
B. Load imbalance
C. Synchronization
D. The processing element doesn't become idle
ANSWER: A (both synchronization and load imbalance)

Q.92 (1 mark): If the scaled-speedup curve is close to linear with respect to the number of processing elements, then the parallel system is considered?
A. Scalable
B. Iso-scalable
C. Non-scalable
D. Scale-efficient
ANSWER: A
Q.93 (1 mark): Which system is the combination of an algorithm and the parallel architecture on which it is implemented?
A. Parallel system
B. Series system
C. Data-parallel system
D. Architecture system
ANSWER: A

Q.94 (1 mark): What is defined as the speedup obtained when the problem size is increased linearly with the number of processing elements?
A. Scalable speedup
B. Unscalable speedup
C. Superlinearity speedup
D. Isoefficiency speedup
ANSWER: A

Q.95 (1 mark): The maximum number of tasks that can be executed simultaneously at any time in a parallel algorithm is called its degree of __________.
A. Concurrency
B. Parallelism
C. Linearity
D. Execution
ANSWER: A

Q.96 (1 mark): The isoefficiency due to concurrency in 2-D partitioning is:
A. O(p)
B. O(n log p)
C. O(1)
D. O(n^2)
ANSWER: A

Q.97 (2 marks): The total time collectively spent by all the processing elements over and above that required by the fastest known sequential algorithm for solving the same problem on a single processing element is known as?
A. Total overhead
B. Overhead
C. Parallel runtime
D. Serial runtime
ANSWER: A

Q.98 (2 marks): Parallel computations involving matrices and vectors readily lend themselves to data ______________.
A. Decomposition
B. Composition
C. Linearity
D. Parallelism
ANSWER: A
Q.99 (2 marks): Parallel 1-D with pipelining is a ___________ algorithm.
A. Synchronous
B. Asynchronous
C. Optimal
D. Cost-optimal
ANSWER: A

Q.100 (2 marks): The serial complexity of matrix-matrix multiplication is:
A. O(n^3)
B. O(n^2)
C. O(n)
D. O(n log n)
ANSWER: A

Q.101 (2 marks): What is the problem size for n x n matrix multiplication?
A. Θ(n^3)
B. Θ(n log n)
C. Θ(n^2)
D. Θ(1)
ANSWER: A

Q.102 (2 marks): The given equation represents which function? [equation not reproduced]
A. Overhead function
B. Parallel model
C. Series overtime
D. Parallel overtime
ANSWER: A

Q.103 (2 marks): The efficiency of a parallel program can be written as: [the options are equations, not reproduced]
ANSWER: A (E = Ts / (p · Tp), as stated elsewhere in this bank)

Q.104 (2 marks): The total number of steps in the entire pipelined procedure is _______?
A. Θ(n)
B. Θ(n^2)
C. Θ(n^3)
D. Θ(1)
ANSWER: A

Q.105 (2 marks): In Cannon's algorithm, the memory used is?
A. θ(n^2)
B. θ(n)
C. θ(n^3)
D. θ(n log n)
ANSWER: A

Q.106 (2 marks): Consider the problem of multiplying two n × n dense, square matrices A and B to yield the product matrix C =:
A. A × B
B. A / B
C. A + B
D. A - B
ANSWER: A

Q.107 (2 marks): The serial runtime of multiplying a matrix of dimension n x n with a vector is? [the options are equations, not reproduced]
ANSWER: A (the serial algorithm performs n^2 multiplications and additions; see Q.119)

Q.108 (2 marks): ________ is a measure of the fraction of time for which a processing element is usefully employed.
A. Efficiency
B. Overtime function
C. Linearity
D. Superlinearity
ANSWER: A

Q.109 (2 marks): When the work performed by a serial algorithm is greater than its parallel formulation, or due to hardware features that put the serial implementation at a disadvantage, this phenomenon is known as?
A. Superlinear speedup
B. Super linearity
C. Linear speedups
D. Performance metrics
ANSWER: A

Q.110 (3 marks): The all-to-all broadcast and the computation of y[i] both take time?
A. Θ(n)
B. Θ(n log n)
C. Θ(n^2)
D. Θ(n^3)
ANSWER: A
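As a worked example of these metrics (the numbers are our own, for illustration): if the best serial algorithm takes Ts = 120 time units and a parallel program on p = 4 processing elements takes Tp = 40 units, then the speedup is S = Ts / Tp = 120 / 40 = 3, and the efficiency is E = S / p = Ts / (p · Tp) = 120 / 160 = 0.75, i.e., each processing element is usefully employed 75% of the time. The total overhead is To = p · Tp - Ts = 160 - 120 = 40 units.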
Q.111 (3 marks): If virtual processing elements are mapped appropriately onto physical processing elements, the overall communication time does not grow by more than a factor of:
A. n/p
B. p/n
C. n + p
D. n * p
ANSWER: A

Q.112 (3 marks): Parallel execution time can be expressed as a function of problem size, overhead function, and the number of processing elements. The formed equation is: [the options are equations, not reproduced]
ANSWER: A

Q.113 (3 marks): In 2-D partitioning, the first alignment takes time =?
A. ts + tw·n/√p
B. ts - tw·n/√p
C. ts·tw·n/√p
D. ts / (tw·√p)
ANSWER: A

Q.114 (3 marks): Using fewer than the maximum possible number of processing elements to execute a parallel algorithm is called ________?
A. Scaling down
B. Scaling up
C. Scaling
D. Stimulation
ANSWER: A

Q.115 (3 marks): Which of the following is a drawback of matrix-matrix multiplication?
A. Memory optimal
B. Efficient
C. Time-bound
D. Complex
ANSWER: A

Q.116 (3 marks): Consider the problem of sorting 1024 numbers (n = 1024, log n = 10) on 32 processing elements. The speedup expected is:
A. p / log n
B. p * log n
C. p + log n
D. n * log p
ANSWER: A

Q.117 (3 marks): Consider the problem of adding n numbers on p processing elements such that p < n and both n and p are powers of 2. The overall parallel execution time of the problem is:
A. Θ((n/p) log p)
B. Θ((n*p) log p)
C. Θ((p/n) log p)
D. Θ(n log p)
ANSWER: A

Q.118 (3 marks): The DNS algorithm has ____ runtime.
A. Ω(n)
B. Ω(n^2)
C. Ω(n^3)
D. Ω(log n)
ANSWER: A
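The n^2 multiply-and-add count quoted in the matrix-vector questions nearby comes directly from the doubly nested loop. A minimal C sketch (function and variable names are our own):

/* y = A*x for an n x n matrix A in row-major order: n^2 multiply-adds. */
void matvec(int n, const double *A, const double *x, double *y) {
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = 0; j < n; j++)        /* n iterations per row ...   */
            sum += A[i * n + j] * x[j];    /* ... one multiply-add each  */
        y[i] = sum;                        /* n rows => n * n = n^2 ops  */
    }
}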
Q.119 (3 marks): The serial algorithm requires ______ multiplications and additions in matrix-vector multiplication.
A. n^2
B. n^3
C. log n
D. n log n
ANSWER: A

Q.120 (1 mark): The time required to merge two sorted blocks of n/p elements is _________?
A. θ(n/p)
B. θ(n)
C. θ(p/n)
D. θ(n log p)
ANSWER: A

Q.121 (1 mark): In parallel DFS, the stack is split into two equal pieces such that the size of the search space represented by each stack is the same. Such a split is called?
A. Half-split
B. Parallel-split
C. None
ANSWER: A

Q.122 (1 mark): To avoid sending very small amounts of work, nodes beyond a specified stack depth are not given away. This depth is called the _________ depth.
A. Cut-off
B. Breakdown
C. Full
D. Series
ANSWER: A

Q.123 (1 mark): In sequential sorting algorithms, the input and the sorted sequences are stored in which memory?
A. Process memory
B. Main memory
C. Secondary memory
D. External memory
ANSWER: A (they are stored in the process's memory)

Q.124 (1 mark): Each process sends its block to the other process. Then each process merges the two sorted blocks and retains only the appropriate half of the merged block. We refer to this operation as?
A. Compare-split
B. Split
C. Compare
D. Exchange
ANSWER: A
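A hedged sketch of the compare-split step described above (helper and variable names are our own): after exchanging blocks, one process keeps the smaller half of the merged block and the other keeps the larger half.

/* Compare-split for an ascending sort: 'mine' and 'theirs' are sorted
   blocks of k elements each; keep_small selects which half is retained. */
void compare_split(int k, double *mine, const double *theirs,
                   double *work, int keep_small) {
    int i = 0, j = 0;
    if (keep_small) {                       /* keep the k smallest elements */
        for (int t = 0; t < k; t++)
            work[t] = (j == k || (i < k && mine[i] <= theirs[j]))
                      ? mine[i++] : theirs[j++];
    } else {                                /* keep the k largest elements */
        i = j = k - 1;
        for (int t = k - 1; t >= 0; t--)
            work[t] = (j < 0 || (i >= 0 && mine[i] >= theirs[j]))
                      ? mine[i--] : theirs[j--];
    }
    for (int t = 0; t < k; t++) mine[t] = work[t];
}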
Q.125 (1 mark): Each process compares the received element with its own and retains the appropriate element. We refer to this operation as _______?
A. Compare-exchange
B. Exchange
C. Process-exchange
D. All
ANSWER: A

Q.126 (1 mark): Which algorithm maintains the unexpanded nodes in the search graph, ordered according to their l-value?
A. Parallel BFS
B. Parallel DFS
C. Both a and b
D. None
ANSWER: A

Q.127 (1 mark): The critical issue in parallel depth-first search algorithms is the distribution of the search space among the ____________?
A. Processors
B. Space
C. Memory
D. Blocks
ANSWER: A

Q.128 (2 marks): Enumeration sort uses how many processes to sort n elements?
A. n^2
B. log n
C. n^3
D. n
ANSWER: A

Q.129 (2 marks): Which sequence is a sequence of elements <a0, a1, ..., an-1> with the property that either (1) there exists an index i, 0 ≤ i ≤ n - 1, such that <a0, ..., ai> is monotonically increasing and <ai+1, ..., an-1> is monotonically decreasing, or (2) there exists a cyclic shift of indices so that (1) is satisfied?
A. Bitonic sequence
B. Acyclic sequence
C. Asymptotic sequence
D. Cyclic sequence
ANSWER: A
Q.130 (2 marks): To make a substantial improvement over odd-even transposition sort, we need an algorithm that moves elements long distances. Which one of these is such a serial sorting algorithm?
A. Shell sort
B. Linear sort
C. Quicksort
D. Bubble sort
ANSWER: A

Q.131 (2 marks): Quicksort is a _________ algorithm.
A. Divide and conquer
B. Greedy approach
C. Both a and b
D. None
ANSWER: A

Q.132 (2 marks): The _______ transposition algorithm sorts n elements in n phases (n is even), each of which requires n/2 compare-exchange operations.
A. Odd-even
B. Odd
C. Even
D. None
ANSWER: A

Q.133 (2 marks): The average time complexity for bucket sort is?
A. O(n + k)
B. O(n log(n + k))
C. O(n^3)
D. θ(n^2)
ANSWER: A

Q.134 (2 marks): A popular serial algorithm for sorting an array of n elements whose values are uniformly distributed over an interval [a, b] is which algorithm?
A. Bucket sort
B. Quicksort
C. Linear sort
D. Bubble sort
ANSWER: A

Q.135 (2 marks): The best-case time complexity of bubble sort is:
A. O(n)
B. O(n^3)
C. O(n log n)
D. O(n^2)
ANSWER: A
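A minimal serial sketch of the odd-even transposition idea from the questions above (names are our own; the parallel version assigns one element per process and turns each phase into compare-exchange steps with a neighbour):

/* Odd-even transposition sort: n phases, up to n/2 compare-exchanges each. */
void odd_even_sort(int n, double *a) {
    for (int phase = 0; phase < n; phase++) {
        int start = (phase % 2 == 0) ? 0 : 1;   /* even or odd phase */
        for (int i = start; i + 1 < n; i += 2) {
            if (a[i] > a[i + 1]) {              /* compare-exchange  */
                double tmp = a[i]; a[i] = a[i + 1]; a[i + 1] = tmp;
            }
        }
    }
}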
Q.136 (2 marks): When more than one process tries to write to the same memory location, only one arbitrarily chosen process is allowed to write, and the remaining writes are ignored. This is called _________ in quicksort.
A. CRCW PRAM
B. PRAM
C. Partitioning
D. CRCW
ANSWER: A (it is the CRCW PRAM quicksort algorithm)

Q.137 (2 marks): The average time complexity of the quicksort algorithm is:
A. O(n log n)
B. O(n)
C. O(n^3)
D. θ(n^2)
ANSWER: A

Q.138 (2 marks): The isoefficiency function of Global Round Robin (GRR) is:
A. O(p^2 log p)
B. O(p log p)
C. O(log p)
D. O(p^2)
ANSWER: A

Q.139 (2 marks): A _____ is a device with two inputs x and y and two outputs x' and y' in a sorting network.
A. Comparator
B. Router
C. Separator
D. Switch
ANSWER: A

Q.140 (2 marks): If T is a DFS tree in G, then the parallel implementation of the algorithm runs in ______________ time complexity.
A. O(t)
B. O(t log n)
C. O(log t)
D. O(1)
ANSWER: A

Q.141 (2 marks): In the quest for fast sorting methods, a number of networks have been designed that sort n elements in time significantly smaller than ___?
A. θ(n log n)
B. θ(n)
C. θ(1)
D. θ(n^2)
ANSWER: A

Q.142 (2 marks): The average value of the search overhead factor in parallel DFS is less than ______?
A. One
B. Two
C. Three
D. Four
ANSWER: A

Q.143 (3 marks): The parallel runtime for the ring architecture in a bitonic sort is:
A. θ(n)
B. θ(n log n)
C. θ(n^2)
D. θ(n^3)
ANSWER: A
Q.144 (3 marks): The sequential complexity of the odd-even transposition algorithm is:
A. θ(n^2)
B. θ(n log n)
C. θ(n^3)
D. θ(n)
ANSWER: A

Q.145 (3 marks): The algorithm represents which bubble sort? [algorithm not reproduced]
A. Sequential bubble sort
B. Circular bubble sort
C. Simple bubble sort
D. Linear bubble sort
ANSWER: A

Q.146 (3 marks): Enumeration sort uses how much time to sort n elements?
A. θ(1)
B. θ(n log n)
C. θ(n^2)
D. θ(n)
ANSWER: A

Q.147 (3 marks): The ______ algorithm relies on the binary representation of the elements to be sorted.
A. Radix sort
B. Bubble sort
C. Quicksort
D. Bucket sort
ANSWER: A

Q.148 (3 marks): The parallel runtime for the mesh architecture in a bitonic sort is:
A. θ(n/log n)
B. θ(n)
C. θ(n^2)
D. θ(n^3)
ANSWER: A

Q.149 (1 mark): The number of threads in a thread block is limited by the architecture to a total of how many threads per block?
A. 512
B. 502
C. 510
D. 412
ANSWER: A

Q.150 (1 mark): CUDA architecture is mainly provided by which company?
A. NVIDIA
B. Intel
C. Apple
D. IBM
ANSWER: A

Q.151 (1 mark): In CUDA architecture, what are subprograms called?
A. Kernels
B. Grids
C. Elements
D. Blocks
ANSWER: A

Q.152 (1 mark): What is the full form of CUDA?
A. Compute Unified Device Architecture
B. Computer Unified Device Architecture
C. Common USB Device Architecture
D. Common Unified Disk Architecture
ANSWER: A

Q.153 (2 marks): Which of these is not an application of the CUDA architecture?
A. Thermodynamics
B. Neural networks
C. VLSI simulation
D. Fluid dynamics
ANSWER: A

Q.154 (2 marks): CUDA programming is especially well-suited to address problems that can be expressed as __________ computations.
A. Data-parallel
B. Task-parallel
C. Both a and b
D. None
ANSWER: A

Q.155 (2 marks): CUDA C/C++ uses which keyword in programming?
A. global
B. kernel
C. Cuda_void
D. nvcc
ANSWER: A

Q.156 (2 marks): CUDA programs are saved with the _____ extension.
A. .cd
B. .cx
C. .cc
D. .cu
ANSWER: D
Q.157 (2 marks): The Kepler K20X chip block diagram contains ____ streaming multiprocessors (SMs).
A. 15
B. 8
C. 16
D. 7
ANSWER: A

Q.158 (2 marks): The Kepler K20X architecture increases the register file size to:
A. 64K
B. 32K
C. 128K
D. 256K
ANSWER: A

Q.159 (2 marks): The register file in a GPU is of what size?
A. 2 MB
B. 1 MB
C. 3 MB
D. 1024 B
ANSWER: A

Q.160 (2 marks): NVIDIA's GPU computing platform is not enabled on which of the following product families?
A. AMD
B. Tegra
C. Quadro
D. Tesla
ANSWER: A

Q.161 (2 marks): The Tesla K40 has a compute capability of:
A. 3.5
B. 3.2
C. 3.4
D. 3.1
ANSWER: A

Q.162 (2 marks): The SIMD unit creates, manages, schedules and executes _____ threads simultaneously to create a warp.
A. 32
B. 16
C. 24
D. 8
ANSWER: A

Q.163 (2 marks): Which hardware is used by the host interface to speed up the transfer of bulk data to and from the graphics pipeline?
A. Direct Memory Access
B. Memory hardware
C. Switch
D. Hub
ANSWER: A

Q.164 (2 marks): A ____ is a collection of thread blocks of the same thread dimensionality which all execute the same kernel.
A. Grid
B. Core
C. Element
D. Blocks
ANSWER: A

Q.165 (2 marks): Active warps can be classified into how many types?
A. 3
B. 2
C. 4
D. 5
ANSWER: A

Q.166 (2 marks): All threads in a grid share the same _________ space.
A. Global memory
B. Local memory
C. Synchronized memory
D. All
ANSWER: A

Q.167 (2 marks): CUDA was introduced in which year?
A. 2007
B. 2006
C. 2008
D. 2010
ANSWER: A
Q.168 (3 marks): Unlike a C function call, all CUDA kernel launches are:
A. Asynchronous
B. Synchronous
C. Both a and b
D. None
ANSWER: A

Q.169 (3 marks): A warp consists of ____ consecutive threads, and all threads in a warp are executed in Single Instruction Multiple Thread (SIMT) fashion.
A. 32
B. 16
C. 64
D. 128
ANSWER: A

Q.170 (3 marks): There are how many streaming multiprocessors in the CUDA architecture?
A. 16
B. 8
C. 12
D. 4
ANSWER: A

Q.171 (3 marks): In CUDA programming, if the CPU is the host, then the device will be:
A. GPU
B. Compiler
C. HDD
D. GPGPU
ANSWER: A

Q.172 (3 marks): Both grids and blocks use the ______ type with three unsigned integer fields.
A. dim3
B. dim2
C. dim1
D. dim4
ANSWER: A

Q.173 (3 marks): The Tesla P100 GPU based on the Pascal GPU architecture has 56 streaming multiprocessors (SMs), each capable of supporting up to ____ active threads.
A. 2048
B. 512
C. 1024
D. 256
ANSWER: A

Q.174 (3 marks): The maximum size at each level of the thread hierarchy is _____ dependent.
A. Device
B. Host
C. Compiler
D. Memory
ANSWER: A

Q.175 (3 marks): The Intel i7 has a memory bus of width:
A. 19B
B. 180B
C. 152B
D. 102B
ANSWER: A

Q.176 (3 marks): The __________ is the heart of the GPU architecture.
A. Streaming Multiprocessor
B. CUDA
C. Compiler
D. Multiprocessor
ANSWER: A
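A minimal CUDA sketch tying these facts together (the kernel name, array size, and scale factor are our own illustration): the kernel is declared with __global__, launched asynchronously with a dim3 grid/block configuration, and the host must synchronize before relying on the result.

#include <cuda_runtime.h>

__global__ void scale(float *x, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread index */
    if (i < n) x[i] *= s;
}

int main(void) {
    int n = 1 << 20;
    float *d_x;
    cudaMalloc(&d_x, n * sizeof(float));   /* device buffer (left uninitialized
                                              here; a real program would copy
                                              input data in first) */
    dim3 block(256);                       /* threads per block */
    dim3 grid((n + block.x - 1) / block.x);/* enough blocks to cover n */
    scale<<<grid, block>>>(d_x, 2.0f, n);  /* launch returns immediately */
    cudaDeviceSynchronize();               /* host waits for the kernel */
    cudaFree(d_x);
    return 0;
}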
Q.177 (3 marks): A kernel is defined using the _____ declaration specification.
A. __global__
B. __host__
C. __device__
D. _void
ANSWER: A

Q.178 (3 marks): The function printThreadInfo() is not used to print out which of the following information about each thread?
A. Block index
B. Matrix coordinates
C. Control index
D. Memory allocations
ANSWER: D
Which is an alternative option for latency hiding?

A. Increase CPU frequency

B. Multithreading

C. Increase Bandwidth

D. Increase Memory

ANSWER: B

The ______ communication model is generally seen in tightly coupled systems.

A. Message Passing

B. Shared-address space

C. Client-Server

D. Distributed Network

ANSWER: B

The principal parameters that determine the communication latency are as follows:

A. Startup time (ts) Per-hop time (th) Per-word transfer time (tw)

B. Startup time (ts) Per-word transfer time (tw)

C. Startup time (ts) Per-hop time (th)

D. Startup time (ts) Message-Packet-Size(W)

ANSWER: A

The number and size of tasks into which a problem is decomposed determines the __

A. Granularity

B. Task

C. Dependency Graph

D. Decomposition

ANSWER: A

Average Degree of Concurrency is...

A. The average number of tasks that can run concurrently over the entire duration of execution of
the process.
B. The average time that can run concurrently over the entire duration of execution of the process.

C. The average in degree of task dependency graph.

D. The average out degree of task dependency graph.

ANSWER: A

Which task decomposition technique is suitable for the 15-puzzle problem?

A. Data decomposition

B. Exploratory decomposition

C. Speculative decomposition

D. Recursive decomposition

ANSWER: B

Which of the following method is used to avoid Interaction Overheads?

A. Maximizing data locality

B. Minimizing data locality

C. Increase memory size

D. None of the above.

ANSWER: A

Which of the following is not parallel algorithm model

A. The Data Parallel Model

B. The work pool model

C. The task graph model

D. The Speculative Model

ANSWER: D

Nvidia GPUs are based on which of the following architectures?

A. MIMD

B. SIMD

C. SISD

D. MISD
ANSWER: B

What is Critical Path?

A. The length of the longest path in a task dependency graph is called the critical path length.

B. The length of the smallest path in a task dependency graph is called the critical path length.

C. Path with loop

D. None of the mentioned.

ANSWER: A

Which decomposition technique uses the divide-and-conquer strategy?

A. recursive decomposition

B. Sdata decomposition

C. exploratory decomposition

D. speculative decomposition

ANSWER: A

If there are 6 nodes in a ring topology, how many message-passing cycles will be required to complete the broadcast process in one-to-all?

A. 1

B. 6

C. 3

D. 4

ANSWER: C

If there is a 4 x 4 mesh topology network, then how many ring operations will be performed to complete a one-to-all broadcast?

A. 4

B. 8

C. 16

D. 32

ANSWER: B
Consider an all-to-all broadcast in a ring topology with 8 nodes. How many messages will be present at each node after the 3rd step/cycle of communication?

A. 3

B. 4

C. 6

D. 7

ANSWER: B
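(The count follows from the mechanics of all-to-all broadcast on a ring: every node starts with its own message and, in each communication step, receives exactly one new message from a neighbour while forwarding another. After the 3rd step a node therefore holds 1 + 3 = 4 messages; after all p - 1 = 7 steps it holds all 8.)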

Consider a hypercube topology with 8 nodes; then how many message-passing cycles will be required in an all-to-all broadcast operation?

A. The longest path between any pair of finish nodes.

B. The longest directed path between any pair of start & finish node.

C. The shortest path between any pair of finish nodes.

D. The number of maximum nodes level in graph.

ANSWER: D

Scatter is ____________.

A. One to all broadcast communication

B. All to all broadcast communication

C. One to all personalised communication

D. None of the above.

ANSWER: C

If there is a 4 x 4 mesh topology, ______ message-passing cycles will be required to complete an all-to-all reduction.

A. 4

B. 6

C. 8

D. 16

ANSWER: C

Which of the following issue(s) is/are true about sorting techniques with parallel computing?
A. Large sequence is the issue

B. Where to store output sequence is the issue

C. Small sequence is the issue

D. None of the above

ANSWER: B

Partitioning of the series is done after ______________.

A. Local arrangement

B. Processess assignments

C. Global arrangement

D. None of the above

ANSWER: C

In parallel DFS, processes have the following roles. (Select multiple choices if applicable.)

A. Donor

B. Active

C. Idle

D. Passive

ANSWER: A

Suppose there are 16 elements in a series; then how many phases will be required to sort the series using parallel odd-even bubble sort?

A. 8

B. 4

C. 5

D. 15

ANSWER: D

Which are different sources of Overheads in Parallel Programs?

A. Interprocess interactions

B. Process Idling
C. All mentioned options

D. Excess Computation

ANSWER: C

Speedup is defined as ________.

A. The ratio of the time taken to solve a problem on a parallel computer with p identical processing elements to the time required to solve the same problem on a single processor.

B. The ratio of the time taken to solve a problem on a single processor to the time required to solve the same problem on a parallel computer with p identical processing elements.

C. The ratio of number of multiple processors to size of data

D. None of the above

ANSWER: B

Efficiency is a measure of the fraction of time for which a processing element is usefully employed.

A. TRUE

B. FALSE

ANSWER: A

CUDA helps to execute code in parallel mode using __________.

A. CPU

B. GPU

C. ROM

D. Cache memory

ANSWER: B

In the thread-function execution scenario, a thread is a ___________.

A. Work

B. Worker

C. Task

D. None of the above

ANSWER: B
In GPU Following statements are true

A. Grid contains Block

B. Block contains Threads

C. All the mentioned options.

D. SM stands for Streaming MultiProcessor

ANSWER: C

The computer system of a parallel computer is capable of _____________.

A. Decentralized computing

B. Parallel computing

C. Centralized computing

D. All of these

ANSWER: A

In which application system can distributed systems run well?

A. HPC

B. Distrubuted Framework

C. HRC

D. None of the above

ANSWER: A

A pipeline is like ____________?

A. an automobile assembly line

B. house pipeline

C. both a and b

D. a gas line

ANSWER: A

Pipeline implements ?

A. fetch instruction
B. decode instruction

C. fetch operand

D. all of above

ANSWER: D

A processor performing fetch or decoding of a different instruction during the execution of another instruction is called ______?

A. Super-scaling

B. Pipe-lining

C. Parallel Computation

D. None of these

ANSWER: B

In a parallel execution, the performance will always improve as the number of processors increases?

A. True

B. False

ANSWER: B

VLIW stands for ?

A. Very Long Instruction Word

B. Very Long Instruction Width

C. Very Large Instruction Word

D. Very Long Instruction Width

ANSWER: A

In VLIW, the decision on the order of execution of the instructions depends on the program itself?

A. True

B. False

ANSWER: A

Which one is not a limitation of a distributed memory parallel system?


A. Higher communication time

B. Cache coherency

C. Synchronization overheads

D. None of the above

ANSWER: B

Which of these steps can create conflict among the processors?

A. Synchronized computation of local variables

B. Concurrent write

C. Concurrent read

D. None of the above

ANSWER: B

Which one is not a characteristic of NUMA multiprocessors?

A. It allows shared memory computing

B. Memory units are placed in physically different location

C. All memory units are mapped to one common virtual global memory

D. Processors access their independent local memories

ANSWER: D

Which of these is not a source of overhead in parallel computing?

A. Non-uniform load distribution

B. Less local memory requirement in distributed computing

C. Synchronization among threads in shared memory computing

D. None of the above

ANSWER: B

Systems that do not have parallel processing capabilities are?

A. SISD

B. SIMD

C. MIMD
D. All of the above

ANSWER: A

How does the number of transistors per chip increase according to Moore's law?

A. Quadratically

B. Linearly

C. Cubicly

D. Exponentially

ANSWER: D

Parallel processing may occur?

A. in the instruction stream

B. in the data stream

C. both [A] and [B]

D. none of the above

ANSWER: C

To which class of systems does the von Neumann computer belong?

A. SIMD (Single Instruction Multiple Data)

B. MIMD (Multiple Instruction Multiple Data)

C. MISD (Multiple Instruction Single Data)

D. SISD (Single Instruction Single Data)

ANSWER: D

Fine-grain threading is considered as a ______ threading?

A. Instruction-level

B. Loop level

C. Task-level

D. Function-level

ANSWER: A
A multiprocessor is a system with multiple CPUs, which are capable of independently executing different tasks in parallel. In which category does every processor and memory module have similar access time?

A. UMA

B. Microprocessor

C. Multiprocessor

D. NUMA

ANSWER: A

For interprocessor communication, the misses that arise are called?

A. hit rate

B. coherence misses

C. commit misses

D. parallel processing

ANSWER: B

NUMA architecture uses _______ in its design?

A. cache

B. shared memory

C. message passing

D. distributed memory

ANSWER: D

A multiprocessor machine which is capable of executing multiple instructions on multiple data sets?

A. SISD

B. SIMD

C. MIMD

D. MISD

ANSWER: C

In message passing, messages are sent and received between?

A. Task or processes
B. Task and Execution

C. Processor and Instruction

D. Instruction and decode

ANSWER: A

The First step in developing a parallel algorithm is_________?

A. To Decompose the problem into tasks that can be executed concurrently

B. Execute directly

C. Execute indirectly

D. None of Above

ANSWER: A

The number of tasks into which a problem is decomposed determines its?

A. Granularity

B. Priority

C. Modernity

D. None of above

ANSWER: A

The length of the longest path in a task dependency graph is called?

A. the critical path length

B. the critical data length

C. the critical bit length

D. None of above

ANSWER: A

The graph of tasks (nodes) and their interactions/data exchange (edges)?

A. Is referred to as a task interaction graph

B. Is referred to as a task Communication graph

C. Is referred to as a task interface graph

D. None of Above
ANSWER: A

Mappings are determined by?

A. task dependency

B. task interaction graphs

C. Both A and B

D. None of Above

ANSWER: C

Decomposition Techniques are?

A. recursive decomposition

B. data decomposition

C. exploratory decomposition

D. All of Above

ANSWER: D

The Owner Computes Rule generally states that the process assigned a particular data item is
responsible for?

A. All computation associated with it

B. Only one computation

C. Only two computation

D. Only occasionally computation

ANSWER: A

A simple application of exploratory decomposition is?

A. The solution to a 15 puzzle

B. The solution to 20 puzzle

C. The solution to any puzzle

D. None of Above

ANSWER: A
Speculative decomposition consists of?

A. conservative approaches

B. optimistic approaches

C. Both A and B

D. Only B

ANSWER: C

Task characteristics include?

A. Task generation.

B. Task sizes.

C. Size of data associated with tasks.

D. All of Above

ANSWER: D

Writing parallel programs is referred to as?

A. Parallel computation

B. Parallel processes

C. Parallel development

D. Parallel programming

ANSWER: D

Parallel Algorithm Models?

A. Data parallel model

B. Bit model

C. Data model

D. network model

ANSWER: A

The number and size of tasks into which a problem is decomposed determines the?

A. fine-granularity

B. coarse-granularity
C. sub Task

D. granularity

ANSWER: D

A feature of a task-dependency graph that determines the average degree of concurrency for a given
granularity is its ___________ path?

A. critical

B. easy

C. difficult

D. ambiguous

ANSWER: A

The pattern of___________ among tasks is captured by what is known as a task-interaction graph?

A. Interaction

B. communication

C. optmization

D. flow

ANSWER: A

Interaction overheads can be minimized by____?

A. Maximize Data Locality

B. Maximize Volume of data exchange

C. Increase Bandwidth

D. Minimize social media contents

ANSWER: A

Type of parallelism that is naturally expressed by independent tasks in a task-dependency graph is called _______ parallelism?

A. Task

B. Instruction

C. Data

D. Program
ANSWER: A

Speedup is defined as a ratio of?

A. s=Ts/Tp

B. S= Tp/Ts

C. Ts=S/Tp

D. Tp=S /Ts

ANSWER: A

Parallel computing means to divide the job into several __________?

A. Bit

B. Data

C. Instruction

D. Task

ANSWER: D

_________ is a method for inducing concurrency in problems that can be solved using the divide-and-conquer strategy?

A. exploratory decomposition

B. speculative decomposition

C. data-decomposition

D. Recursive decomposition

ANSWER: D

The ___ time collectively spent by all the processing elements is Tall = p · TP?

A. total

B. Average

C. mean

D. sum

ANSWER: A
Group communication operations are built using point-to-point messaging primitives?

A. True

B. False

ANSWER: A

Communicating a message of size m over an uncongested network takes time ts + tw·m?

A. True

B. False

ANSWER: A

The dual of one-to-all broadcast is ?

A. All-to-one reduction

B. All-to-one receiver

C. All-to-one Sum

D. None of Above

ANSWER: A

A hypercube has?

A. 2^d nodes

B. 3d nodes

C. 2n Nodes

D. N Nodes

ANSWER: A

A binary tree in which processors are (logically) at the leaves and internal nodes are routing nodes?

A. True

B. False

ANSWER: A

In All-to-All Broadcast each processor is thesource as well as destination?

A. True
B. False

ANSWER: A

The Prefix Sum Operation can be implemented using the ?

A. All-to-all broadcast kernel.

B. All-to-one broadcast kernel.

C. One-to-all broadcast Kernel

D. Scatter Kernel

ANSWER: A
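For reference, a minimal C sketch of a prefix sum in MPI (our own example; the question above describes building it on top of the broadcast kernels, while MPI also exposes it directly as the MPI_Scan collective):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value, prefix;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    value = rank + 1;   /* each process contributes one value */
    /* inclusive prefix sum: process k receives value_0 + ... + value_k */
    MPI_Scan(&value, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    printf("rank %d: prefix sum = %d\n", rank, prefix);
    MPI_Finalize();
    return 0;
}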

In the scatter operation ?

A. Single node send a unique message of size m to every other node

B. Single node send a same message of size m to every other node

C. Single node send a unique message of size m to next node

D. None of Above

ANSWER: A

The gather operation is exactly the inverse of the ?

A. Scatter operation

B. Broadcast operation

C. Prefix Sum

D. Reduction operation

ANSWER: A
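A minimal C/MPI sketch of the scatter/gather duality described above (buffer names and values are our own illustration):

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int rank, size, item;
    int *src = NULL, *all = NULL;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {                       /* root holds one item per process */
        src = malloc(size * sizeof(int));
        all = malloc(size * sizeof(int));
        for (int i = 0; i < size; i++) src[i] = 100 + i;
    }
    /* scatter: the root sends a unique item to every process */
    MPI_Scatter(src, 1, MPI_INT, &item, 1, MPI_INT, 0, MPI_COMM_WORLD);
    item += rank;                          /* some local work */
    /* gather is the exact inverse: the root collects one item from each */
    MPI_Gather(&item, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank == 0) { free(src); free(all); }
    MPI_Finalize();
    return 0;
}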

In All-to-All Personalized Communication Each node has a distinct message of size m for every other
node ?

A. True

B. False

ANSWER: A

Parallel algorithms often require a single process to send identical data to all other processes or to a
subset of them. This operation is known as _________?
A. one-to-all broadcast

B. All to one broadcast

C. one-to-all reduction

D. all to one reduction

ANSWER: A
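A minimal C/MPI sketch of one-to-all broadcast and its dual, all-to-one reduction (the values are our own illustration):

#include <mpi.h>

int main(int argc, char **argv) {
    int rank, x = 0, total = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) x = 42;
    /* one-to-all broadcast: the root's x reaches every process */
    MPI_Bcast(&x, 1, MPI_INT, 0, MPI_COMM_WORLD);
    /* all-to-one reduction (the dual): every x is combined at the root */
    MPI_Reduce(&x, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}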

In which of the following operations does a single node send a unique message of size m to every other node?

A. Gather

B. Scatter

C. One to all personalized communication

D. Both B and C

ANSWER: D

Gather operation is also known as ________?

A. One to all personalized communication

B. One to all broadcast

C. All to one reduction

D. All to All broadcast

ANSWER: A

One-to-all personalized communication does not involve any duplication of data?

A. True

B. False

ANSWER: A

Gather operation, or concatenation, in which a single node collects a unique message from each
node?

A. True

B. False

ANSWER: A
Conventional architectures coarsely comprise a?

A. A processor

B. Memory system

C. Data path.

D. All of Above

ANSWER: D

Data intensive applications utilize?

A. High aggregate throughput

B. High aggregate network bandwidth

C. High processing and memory system performance.

D. None of above

ANSWER: A

A pipeline is like?

A. Overlaps various stages of instruction execution to achieve performance.

B. House pipeline

C. Both a and b

D. A gas line

ANSWER: A

Scheduling of instructions is determined by?

A. True Data Dependency

B. Resource Dependency

C. Branch Dependency

D. All of above

ANSWER: D

VLIW processors rely on?

A. Compile time analysis

B. Initial time analysis


C. Final time analysis

D. Mid time analysis

ANSWER: A

Memory system performance is largely captured by?

A. Latency

B. Bandwidth

C. Both a and b

D. none of above

ANSWER: C

The fraction of data references satisfied by the cache is called?

A. Cache hit ratio

B. Cache fit ratio

C. Cache best ratio

D. none of above

ANSWER: A

A single control unit that dispatches the same Instruction to various processors is?

A. SIMD

B. SPMD

C. MIMD

D. None of above

ANSWER: A

The primary forms of data exchange between parallel tasks are?

A. Accessing a shared data space

B. Exchanging messages.

C. Both A and B

D. None of Above

ANSWER: C
Switches map a fixed number of inputs to outputs?

A. True

B. False

ANSWER: A

A three-tier architecture simplifies application ____________?

A. Maintenance
B. Initiation

C. Implementation

D. Deployment

ANSWER: D

A dynamic network of networks with a dynamic connection that grows is called?

A. Multithreading

B. Cyber cycle

C. Internet of things

D. Cyber-physical system

ANSWER: C

In which application system can distributed systems run well?

A. HPC

B. HTC

C. HRC

D. Both A and B

ANSWER: D

Which property do HPC and HTC systems desire?

A. Adaptivity

B. Transparency

C. Dependency

D. Secretive

ANSWER: B

The architecture in which no special machines manage the network and resources are shared is known as?

A. Peer-to-Peer

B. Space based

C. Tightly coupled

D. Loosely coupled
ANSWER: A

Significant characteristics of distributed systems are of how many types?

A. 5 types

B. 2 types

C. 3 types

D. 4 types

ANSWER: C

Peer machines are built over?

A. Many Server machines

B. 1 Server machine

C. 1 Client machine

D. Many Client machines

ANSWER: D

HTC applications are of which type?

A. Business

B. Engineering

C. Science

D. Media mass

ANSWER: A

The architecture in which virtualization creates one single address space is called?

A. Loosely coupled

B. Peer-to-Peer

C. Space-based

D. Tightly coupled

ANSWER: C

In cloud computing, we have an Internet cloud of resources that forms?


A. Centralized computing

B. Decentralized computing

C. Parallel computing

D. All of these

ANSWER: D

Job throughput, data access, and storage are elements of __________?

A. Flexibility

B. Adaptation

C. Efficiency

D. Dependability

ANSWER: C

The ability to support billions of job requests over massive data sets is known as?

A. Efficiency

B. Dependability

C. Adaptation

D. Flexibility

ANSWER: C

Cloud computing offers a broader concept than which of the following?

A. Parallel computing

B. Centralized computing

C. Utility computing

D. Decentralized computing

ANSWER: C

The transparency that allows movement of resources and clients within a system is called?

A. Mobility transparency

B. Concurrency transparency

C. Performance transparency
D. Replication transparency

ANSWER: A

A program running in a distributed computer is known as a?

A. Distributed process

B. Distributed program

C. Distributed application

D. Distributed computing

ANSWER: B

Computing with uniprocessor devices is called __________?

A. Grid computing

B. Centralized computing

C. Parallel computing

D. Distributed computing

ANSWER: B

Utility computing focuses on a______________ model?

A. Data

B. Cloud

C. Scalable

D. Business

ANSWER: D

A CPS merges which technologies?

A. 5C

B. 2C

C. 3C

D. 4C

ANSWER: C
Abbreviation of HPC?

A. High-peak computing

B. High-peripheral computing

C. High-performance computing

D. Highly-parallel computing

ANSWER: C

Peer-to-Peer leads to the development of technologies like?

A. Norming grids

B. Data grids

C. Computational grids

D. Both A and B

ANSWER: D

HPC applications are of which type?

A. Management

B. Media mass

C. Business

D. Science

ANSWER: D

How many development generations has computer technology gone through?

A. 6

B. 3

C. 4

D. 5

ANSWER: D

The utilization rate of resources in an execution model is known as its?

A. Adaptation

B. Efficiency
C. Dependability

D. Flexibility

ANSWER: B

Providing Quality of Service (QoS) assurance, even under failure conditions, is the responsibility of?

A. Dependability

B. Adaptation

C. Flexibility

D. Efficiency

ANSWER: A

Interprocessor communication takes place via?

A. Centralized memory

B. Shared memory

C. Message passing

D. Both A and B

ANSWER: D

Data centers and centralized computing cover many ______?

A. Microcomputers

B. Minicomputers

C. Mainframe computers

D. Supercomputers

ANSWER: D

Which of the following is a primary goal of the HTC paradigm?

A. High ratio Identification

B. Low-flux computing

C. High-flux computing

D. Computer utilities

ANSWER: C
The high-throughput service provided is a measure taken by?

A. Flexibility

B. Efficiency

C. Dependability

D. Adaptation

ANSWER: D

What are the sources of overhead?

A. Essential /Excess Computation

B. Inter-process Communication

C. Idling

D. All above

ANSWER: D

Which are the performance metrics for parallel systems?

A. Execution Time

B. Total Parallel Overhead

C. Speedup

D. All above

ANSWER: D

The efficiency of a parallel program can be written as: E = Ts / (p·Tp). True or False?

A. True

B. False

ANSWER: A
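
As a quick illustration of these metrics, here is a minimal C sketch, assuming the serial time Ts and parallel time Tp have already been measured (the numeric values below are hypothetical, chosen only to make the arithmetic visible):

#include <stdio.h>

int main(void) {
    double Ts = 8.0;          /* hypothetical serial runtime, in seconds */
    double Tp = 1.25;         /* hypothetical parallel runtime on p processors */
    int p = 8;                /* number of processing elements */

    double S = Ts / Tp;       /* speedup: S = Ts / Tp */
    double E = Ts / (p * Tp); /* efficiency: E = Ts / (p * Tp) = S / p */

    printf("S = %.2f, E = %.2f\n", S, E); /* prints S = 6.40, E = 0.80 */
    return 0;
}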

The important feature of the VLIW is ______?

A. ILP

B. Performance

C. Cost effectiveness
D. delay

ANSWER: A
Which of the following statements are true with regard to compute capability in CUDA

A. Code compiled for hardware of one compute capability will not need to be re-compiled to run on hardware of another

B. Different compute capabilities may imply a different amount of local memory per thread

C. Compute capability is measured by the number of FLOPS a GPU accelerator can compute.

Answer : B

True or False: The threads in a thread block are distributed across SM units so that each
thread is executed by one SM unit.

A. True

B. False

Answer : B

The style of parallelism supported on GPUs is best described as

A. SISD - Single Instruction Single Data

B. MISD - Multiple Instruction Single Data

C. SIMT - Single Instruction Multiple Thread

Answer : C

True or false: Functions annotated with the __global__ qualifier may be executed on the host or the device

A. True

B. False

Answer : B
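
A minimal sketch of the point being tested: a __global__ function is declared in the source alongside host code and launched from the host, but its body executes only on the device (the kernel name here is illustrative):

#include <stdio.h>
#include <cuda_runtime.h>

__global__ void onDevice(void) {
    /* this body runs on the GPU only; it cannot be called like a normal host function */
}

int main(void) {
    onDevice<<<1, 1>>>();      /* launched from the host with an execution configuration */
    cudaDeviceSynchronize();   /* wait for the device to finish before the host continues */
    printf("kernel finished\n");
    return 0;
}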
Which of the following correctly describes a GPU kernel

A. A kernel may contain a mix of host and GPU code

B. All thread blocks involved in the same computation use the same kernel

C. A kernel is part of the GPU's internal micro-operating system, allowing it to act as an independent host

Answer : B

Which of the following is not a form of parallelism supported by CUDA

A. Vector parallelism - Floating point computations are executed in parallel on wide vector units

B. Thread level task parallelism - Different threads execute different tasks

C. Block and grid level parallelism - Different blocks or grids execute different tasks

D. Data parallelism - Different threads and blocks process different parts of data in
memory

Answer :A

What strategy does the GPU employ if the threads within a warp diverge in their execution?

A. Threads are moved to different warps so that divergence does not occur within a
single warp

B. Threads are allowed to diverge

C. All possible execution paths are run by all threads in a warp serially so that thread
instructions do not diverge

Answer : C
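
A sketch of the divergence the question describes; in the illustrative kernel below, even and odd lanes of a warp take different branches, so the hardware executes both paths serially with the inactive lanes masked off:

__global__ void divergent(int *out) {
    int i = threadIdx.x;
    if (i % 2 == 0)        /* half the lanes in each warp take this path...   */
        out[i] = 2 * i;
    else                   /* ...and the other half take this one, serialized */
        out[i] = i + 1;
}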

Which of the following does not result in uncoalesced (i.e. serialized) memory access on the
K20 GPUs installed on Stampede

A. Aligned, but non-sequential access

B. Misaligned data access

C. Sparse memory access

Answer : A
Which of the following correctly describes the relationship between Warps, thread blocks,
and CUDA cores?

A. A warp is divided into a number of thread blocks, and each thread block executes on
a single CUDA core

B. A thread block may be divided into a number of warps, and each warp may execute
on a single CUDA core

C. A thread block is assigned to a warp, and each thread in the warp is executed on a
separate CUDA core

Answer : B

Shared memory in CUDA is accessible to:

A. All threads in a single block

B. Both the host and GPU

C. All threads associated with a single kernel

Answer : A
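
A minimal sketch of block-local shared memory with a barrier; the kernel below (illustrative, assuming a single block of at most 256 threads) reverses an array through a __shared__ staging buffer:

__global__ void reverseBlock(int *d, int n) {
    __shared__ int s[256];   /* visible to all threads of this block, and only this block */
    int t = threadIdx.x;
    if (t < n) s[t] = d[t];
    __syncthreads();         /* block-wide barrier: all writes finish before any reads */
    if (t < n) d[t] = s[n - 1 - t];
}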

On the CPU (host) side, the CUDA architecture consists of

A. CUDA Libraries

B. CUDA Runtime

C. CUDA Driver

D. All Above

Answer : D

CUDA platform works on

A. C

B. C++

C. Fortran

D. All Above

Answer : D
Threads within a block support shared memory and synchronization

A. True

B. False

Answer : A

Application of CUDA are

A. Fast Video Transcoding

B. Medical Imaging

C. Computational Science

D. Oil and Natural Resources exploration

E. All Above

Answer : E

The GPU executes device code

A. True

B. False

Answer : A
What are the issues in sorting?

A. Where the Input and Output Sequences are Stored

B. How Comparisons are Performed

C. All above

Answer : C

The parallel run time of the formulation for Bubble sort is

A. Tp = O((n/p) log(n/p)) + O(n) + O(n)

B. Tp = O((n/p) log(n/p)) + O((n/p) log p) + O(n/p)

C. None of the above

Answer : A

What are the variants of Bubble sort?

A. Shell sort

B. Quick sort

C. Odd-Even transposition

D. Option A & C

Answer : D
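
For reference, a serial C sketch of the odd-even transposition variant; in the parallel formulation each of the p processes holds a block of n/p elements, and each compare-exchange below becomes a compare-split with a neighboring process:

void odd_even_sort(int a[], int n) {
    for (int phase = 0; phase < n; phase++) {      /* n alternating phases */
        int start = (phase % 2 == 0) ? 0 : 1;      /* even phase: pairs (0,1),(2,3),... */
        for (int i = start; i + 1 < n; i += 2) {
            if (a[i] > a[i + 1]) {                 /* compare-exchange of neighbors */
                int tmp = a[i]; a[i] = a[i + 1]; a[i + 1] = tmp;
            }
        }
    }
}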

What is the overall complexity of parallel algorithm for quick sort?

A. Tp = O((n/p) log(n/p)) + O((n/p) log p) + O(log^2 p)

B. Tp = O((n/p) log(n/p)) + O((n/p) log p)

C. Tp = O((n/p) log(n/p)) + O(log^2 p)

Answer : A
Formally, given a weighted graph G(V, E, w), the all-pairs shortest paths problem is to
find the shortest paths between all pairs of vertices. True or False?

A. True
B. False

Answer : A

What is true for parallel formulation of Dijkstra’s Algorithm?

A. One approach partitions the vertices among different processes and has each process
compute the single-source shortest paths for all vertices assigned to it. We refer to
this approach as the source-partitioned formulation.
B. Another approach assigns each vertex to a set of processes and uses the parallel
formulation of the single-source algorithm to solve the problem on each set of
processes. We refer to this approach as the source-parallel formulation.
C. Both are true
D. None of these is true

Answer : C

Search algorithms can be used to solve discrete optimization problems. True or False ?

A. True
B. False
Answer : A

Examples of Discrete optimization problems are ;


A. planning and scheduling,
B. The optimal layout of VLSI chips,
C. Robot motion planning,
D. Test-pattern generation for digital circuits, and logistics and control.
E. All of above
Answer : E

List the important parameters of Parallel DFS

A. Work- Splitting Strategies

B. Load balancing Schemes

C. All of above

Answer : C
List the communication strategies for parallel BFS.

A. Random communication strategy

B. Ring communication strategy

C. Blackboard communication strategy

D. All of above

Answer : D

The lower bound on any comparison-based sort of n numbers is Θ(n log n)


A. True
B. False
Answer : A

In a compare-split operation
A. Each process sends its block of size n/p to the other process
B. Each process merges the received block with its own block and retains only the
appropriate half of the merged block
C. Both A & B
Answer : C
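
A hedged C sketch of the compare-split idea for two sorted blocks of size k each; it is written as if both blocks sit in one address space, whereas a real message-passing version would first exchange the blocks between the two processes:

#include <string.h>

void compare_split(int mine[], const int other[], int k, int keep_small) {
    int merged[2 * k];                 /* C99 variable-length array, for illustration */
    int i = 0, j = 0;
    for (int m = 0; m < 2 * k; m++)    /* ordinary two-way merge of the sorted blocks */
        merged[m] = (j >= k || (i < k && mine[i] <= other[j]))
                    ? mine[i++] : other[j++];
    /* retain only the appropriate half of the merged block, as the question states */
    memcpy(mine, keep_small ? merged : merged + k, k * sizeof(int));
}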

In a typical sorting network


A. Every sorting network is made up of a series of columns
B. Each column contains a number of comparators connected in parallel
C. Both A & B
Answer : C

Bubble sort is difficult to parallelize since the algorithm has no concurrency


A. True
B. False
Answer : A
What are the sources of overhead?

A. Essential /Excess Computation


B. Inter-process Communication
C. Idling
D. All above

Answer : D

Which are the performance metrics for parallel systems?

A. Execution Time
B. Total Parallel Overhead
C. Speedup
D. Efficiency
E. Cost
F. All above

Answer : F

The efficiency of a parallel program can be written as: E = Ts / (p·Tp). True or False?

A. True
B. False

Answer : A

The overhead function, or total overhead, of a parallel system is the total time collectively
spent by all the processing elements over and above that required by the fastest known
sequential algorithm for solving the same problem on a single processing element.
True or False?
A. True
B. False
Answer : A

What is Speedup?
A. A measure that captures the relative benefit of solving a problem in parallel. It is defined as the
ratio of the time taken to solve a problem on a single processing element to the time required to
solve the same problem on a parallel computer with p identical processing elements.
B. A measure of the fraction of time for which a processing element is usefully
employed.
C. None of the above
Answer : A

In an ideal parallel system, speedup is equal to p and efficiency is equal to one. True or
False?
A. True
B. False
Answer : A
A parallel system is said to be ________________ if the cost of solving a problem on a
parallel computer has the same asymptotic growth (as a function of the input size) as the
fastest-known sequential algorithm on a single processing element.
A. Cost optimal
B. Non Cost optimal
Answer : A

Using fewer than the maximum possible number of processing elements to execute a
parallel algorithm is called ______________ a parallel system in terms of the number of
processing elements.

A. Scaling down
B. Scaling up
Answer : A

The __________________ function determines the ease with which a parallel system can
maintain a constant efficiency and hence achieve speedups increasing in proportion to the
number of processing elements.
A. Isoefficiency
B. Efficiency
C. Scalability
D. Total overhead
Answer : A

Minimum execution time for adding n numbers is Tp = n/p + 2 log p. True or False?
A. True
B. False
Answer : A

The overhead function To = pTP − TS.


A. True
B. False
Answer : A

Performance Metrics for Parallel Systems: Speedup (S) = TS / TP


A. True
B. False
Answer : A

Matrix-vector multiplication with 2D partitioning requires some basic communication operations


A. one-to-one communication to align the vector along the main diagonal
B. one-to-all broadcast of each vector element among the n processes of each column
C. all-to-one reduction in each row
D. All Above
Answer : D
Which is an alternative option for latency hiding?
A. Increase CPU frequency
B. Multithreading
C. Increase Bandwidth
D. Increase Memory
ANSWER: B

______ Communication model is generally seen in tightly coupled system.
A. Message Passing
B. Shared-address space
C. Client-Server
D. Distributed Network
ANSWER: B

The principal parameters that determine the communication latency are as follows:
A. Startup time (ts) Per-hop time (th) Per-word transfer time (tw)
B. Startup time (ts) Per-word transfer time (tw)
C. Startup time (ts) Per-hop time (th)
D. Startup time (ts) Message-Packet-Size(W)
ANSWER: A

The number and size of tasks into which a problem is decomposed determines the __
A. Granularity
B. Task
C. Dependency Graph
D. Decomposition
ANSWER: A

Average Degree of Concurrency is...


A. The average number of tasks that can run concurrently over the
entire duration of execution of the process.
B. The average time that can run concurrently over the entire
duration of execution of the process.
C. The average in degree of task dependency graph.
D. The average out degree of task dependency graph.
ANSWER: A

Which task decomposition technique is suitable for the 15-puzzle problem?
A. Data decomposition
B. Exploratory decomposition
C. Speculative decomposition
D. Recursive decomposition
ANSWER: B

Which of the following methods is used to avoid interaction overheads?
A. Maximizing data locality
B. Minimizing data locality
C. Increase memory size
D. None of the above.
ANSWER: A

Which of the following is not a parallel algorithm model?


A. The Data Parallel Model
B. The work pool model
C. The task graph model
D. The Speculative Model
ANSWER: D

Nvidia GPUs are based on which of the following architectures?


A. MIMD
B. SIMD
C. SISD
D. MISD
ANSWER: B

What is Critical Path?

A. The length of the longest path in a task dependency graph is called the critical path length.
B. The length of the smallest path in a task dependency graph is
called the critical path length.
C. Path with loop
D. None of the mentioned.
ANSWER: A

Which decomposition technique uses the divide-and-conquer strategy?

A. recursive decomposition
B. data decomposition
C. exploratory decomposition
D. speculative decomposition
ANSWER: A

If there are 6 nodes in a ring topology, how many message passing cycles will be required to complete the broadcast process in one to all?

A. 1
B. 6
C. 3
D. 4
ANSWER: C

If there is a 4 x 4 mesh topology network, then how many ring operations will be performed to complete one to all broadcast?

A. 4
B. 8
C. 16
D. 32
ANSWER: B

Consider all to all broadcast in a ring topology with 8 nodes. How many messages will be present with each node after the 3rd step/cycle of communication?

A. 3
B. 4
C. 6
D. 7
ANSWER: B

Consider a hypercube topology with 8 nodes; then how many message passing cycles will be required in an all to all broadcast operation?

A. 8
B. 7
C. 3
D. 4
ANSWER: C

The critical path in a task dependency graph is ____________.

A. The longest path between any pair of finish nodes.
B. The longest directed path between any pair of start & finish nodes.
C. The shortest path between any pair of finish nodes.
D. The number of maximum node levels in the graph.
ANSWER: B

Scatter is ____________.

A. One to all broadcast communication


B. All to all broadcast communication
C. One to all personalised communication
D. None of the above.
ANSWER: C
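
Scatter is what MPI exposes as MPI_Scatter; a hedged C sketch follows (the array size 16 and root rank 0 are illustrative assumptions, and the sketch assumes the number of processes divides 16 evenly):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    int full[16];                       /* only the root's copy of this buffer is used */
    if (rank == 0)
        for (int i = 0; i < 16; i++) full[i] = i;

    int part[16];
    int m = 16 / p;                     /* each rank receives a distinct chunk of size m */
    MPI_Scatter(full, m, MPI_INT, part, m, MPI_INT, 0, MPI_COMM_WORLD);

    printf("rank %d received first element %d\n", rank, part[0]);
    MPI_Finalize();
    return 0;
}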

If there is a 4x4 mesh topology, ______ message passing cycles will be required to complete all to all reduction.
A. 4
B. 6
C. 8
D. 16
ANSWER: C

Following issue(s) is/are true about sorting techniques with parallel computing.
A. Large sequence is the issue
B. Where to store output sequence is the issue
C. Small sequence is the issue
D. None of the above
ANSWER: B

Partitioning of a series is done after ______________


A. Local arrangement
B. Processess assignments
C. Global arrangement
D. None of the above
ANSWER: C

In parallel DFS, processes have the following roles. (Select multiple choices if applicable)
A. Donor
B. Active
C. Idle
D. Passive
ANSWER: A
Suppose there are 16 elements in a series then how many phases will
be required to sort the series using parallel odd-even bubble sort?
A. 8
B. 4
C. 5
D. 15
ANSWER: D

Which are different sources of Overheads in Parallel Programs?


A. Interprocess interactions
B. Process Idling
C. All mentioned options
D. Excess Computation
ANSWER: C

Speedup (S) is ____________.

A. The ratio of the time taken to solve a problem on a parallel computer with p identical
processing elements to the time required to solve the same problem on a single processor.
B. The ratio of the time taken to solve a problem on a single processor to the time required
to solve the same problem on a parallel computer with p identical processing elements.
C. The ratio of the number of processors to the size of data.
D. None of the above
ANSWER: B

Efficiency is a measure of the fraction of time for which a processing element is usefully employed.
A. TRUE
B. FALSE
ANSWER: A

CUDA helps to execute code in parallel mode using __________

A. CPU
B. GPU
C. ROM
D. Cache memory
ANSWER: B

In a thread-function execution scenario, a thread is a ___________


A. Work
B. Worker
C. Task
D. None of the above
ANSWER: B

In a GPU, the following statements are true


A. Grid contains Block
B. Block contains Threads
C. All the mentioned options.
D. SM stands for Streaming MultiProcessor
ANSWER: C

The computer system of a parallel computer is capable of _____________


A. Decentralized computing
B. Parallel computing
C. Centralized computing
D. All of these
ANSWER: A

In which application systems can distributed systems run well?


A. HPC
B. Distributed Framework
C. HRC
D. None of the above
ANSWER: A

A pipeline is like .................... ?


A. an automobile assembly line
B. house pipeline
C. both a and b
D. a gas line
ANSWER: A

Pipeline implements ?
A. fetch instruction
B. decode instruction
C. fetch operand
D. all of above
ANSWER: D

A processor performing fetch or decoding of a different instruction during the execution of another instruction is called ______ ?
A. Super-scaling
B. Pipe-lining
C. Parallel Computation
D. None of these
ANSWER: B

In a parallel execution, the performance will always improve as the number of processors increases?
A. True
B. False
ANSWER: B

VLIW stands for ?


A. Very Long Instruction Word
B. Very Long Instruction Width
C. Very Large Instruction Word
D. Very Large Instruction Width
ANSWER: A

In VLIW, the decision for the order of execution of the instructions depends on the program itself?
A. True
B. False
ANSWER: A

Which one is not a limitation of a distributed memory parallel system?
A. Higher communication time
B. Cache coherency
C. Synchronization overheads
D. None of the above
ANSWER: B

Which of these steps can create conflict among the processors?


A. Synchronized computation of local variables
B. Concurrent write
C. Concurrent read
D. None of the above
ANSWER: B

Which one is not a characteristic of NUMA multiprocessors?


A. It allows shared memory computing
B. Memory units are placed in physically different location
C. All memory units are mapped to one common virtual global memory
D. Processors access their independent local memories
ANSWER: D

Which of these is not a source of overhead in parallel computing?


A. Non-uniform load distribution
B. Less local memory requirement in distributed computing
C. Synchronization among threads in shared memory computing
D. None of the above
ANSWER: B

Systems that do not have parallel processing capabilities are?


A. SISD
B. SIMD
C. MIMD
D. All of the above
ANSWER: A

How does the number of transistors per chip increase according to Moore's law?
A. Quadratically
B. Linearly
C. Cubicly
D. Exponentially
ANSWER: D

Parallel processing may occur?


A. in the instruction stream
B. in the data stream
C. both[A] and [B]
D. none of the above
ANSWER: C
To which class of systems does the von Neumann computer belong?
A. SIMD (Single Instruction Multiple Data)
B. MIMD (Multiple Instruction Multiple Data)
C. MISD (Multiple Instruction Single Data)
D. SISD (Single Instruction Single Data)
ANSWER: D

Fine-grain threading is considered as a ______ threading?


A. Instruction-level
B. Loop level
C. Task-level
D. Function-level
ANSWER: A

A multiprocessor is a system with multiple CPUs, capable of independently executing different tasks in parallel. In which category does every processor and memory module have similar access time?
A. UMA
B. Microprocessor
C. Multiprocessor
D. NUMA
ANSWER: A

For interprocessor communication, the misses that arise are called?


A. hit rate
B. coherence misses
C. comitt misses
D. parallel processing
ANSWER: B

NUMA architecture uses _______ in its design?


A. cache
B. shared memory
C. message passing
D. distributed memory
ANSWER: D

A multiprocessor machine which is capable of executing multiple instructions on multiple data sets?
A. SISD
B. SIMD
C. MIMD
D. MISD
ANSWER: C

In message passing, send and receive message between?


A. Task or processes
B. Task and Execution
C. Processor and Instruction
D. Instruction and decode
ANSWER: A
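
A hedged C sketch of such a send/receive pair using MPI, one common message-passing interface (the ranks, tag, and payload below are illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int msg = 42;                                           /* illustrative payload */
    if (rank == 0) {
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* task 0 sends to task 1 */
    } else if (rank == 1) {
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("task 1 received %d\n", msg);
    }
    MPI_Finalize();
    return 0;
}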

The First step in developing a parallel algorithm is_________?


A. To Decompose the problem into tasks that can be executed
concurrently
B. Execute directly
C. Execute indirectly
D. None of Above
ANSWER: A

The number of tasks into which a problem is decomposed determines its?
A. Granularity
B. Priority
C. Modernity
D. None of above
ANSWER: A

The length of the longest path in a task dependency graph is called?


A. the critical path length
B. the critical data length
C. the critical bit length
D. None of above
ANSWER: A

The graph of tasks (nodes) and their interactions/data exchange (edges)?
A. Is referred to as a task interaction graph
B. Is referred to as a task Communication graph
C. Is referred to as a task interface graph
D. None of Above
ANSWER: A

Mappings are determined by?


A. task dependency
B. task interaction graphs
C. Both A and B
D. None of Above
ANSWER: C

Decomposition Techniques are?


A. recursive decomposition
B. data decomposition
C. exploratory decomposition
D. All of Above
ANSWER: D

The Owner Computes Rule generally states that the process assigned a
particular data item is responsible for?
A. All computation associated with it
B. Only one computation
C. Only two computation
D. Only occasionally computation
ANSWER: A

A simple application of exploratory decomposition is_?


A. The solution to a 15 puzzle
B. The solution to 20 puzzle
C. The solution to any puzzle
D. None of Above
ANSWER: A

Speculative Decomposition consists of _?


A. conservative approaches
B. optimistic approaches
C. Both A and B
D. Only B
ANSWER: C

task characteristics include?


A. Task generation.
B. Task sizes.
C. Size of data associated with tasks.
D. All of Above
ANSWER: D

Writing parallel programs is referred to as?


A. Parallel computation
B. Parallel processes
C. Parallel development
D. Parallel programming
ANSWER: D

Parallel Algorithm Models?


A. Data parallel model
B. Bit model
C. Data model
D. network model
ANSWER: A

The number and size of tasks into which a problem is decomposed determines the?

A. fine-granularity
B. coarse-granularity
C. sub task
D. granularity
ANSWER: D

A feature of a task-dependency graph that determines the average degree of concurrency for a given granularity is its ___________ path?
A. critical
B. easy
C. difficult
D. ambiguous
ANSWER: A

The pattern of ___________ among tasks is captured by what is known as a task-interaction graph?
A. Interaction
B. communication
C. optmization
D. flow
ANSWER: A

Interaction overheads can be minimized by____?


A. Maximize Data Locality
B. Maximize Volume of data exchange
C. Increase Bandwidth
D. Minimize social media contents
ANSWER: A

Type of parallelism that is naturally expressed by independent tasks in a task-dependency graph is called _______ parallelism?
A. Task
B. Instruction
C. Data
D. Program
ANSWER: A

Speed up is defined as a ratio of?


A. s=Ts/Tp
B. S= Tp/Ts
C. Ts=S/Tp
D. Tp=S /Ts
ANSWER: A

Parallel computing means to divide the job into several __________?


A. Bit
B. Data
C. Instruction
D. Task
ANSWER: D

_________ is a method for inducing concurrency in problems that can be solved using the divide-and-conquer strategy?

A. exploratory decomposition
B. speculative decomposition
C. data decomposition
D. Recursive decomposition
ANSWER: D

The ___ time collectively spent by all the processing elements: Tall = p TP?
A. total
B. Average
C. mean
D. sum
ANSWER: A

Group communication operations are built using point-to-point messaging primitives?
A. True
B. False
ANSWER: A
Communicating a message of size m over an uncongested network takes time ts + tw·m?
A. True
B. False
ANSWER: A

The dual of one-to-all broadcast is ?


A. All-to-one reduction
B. All-to-one receiver
C. All-to-one Sum
D. None of Above
ANSWER: A

A hypercube has?
A. 2^d nodes
B. 3^d nodes
C. 2n nodes
D. n nodes
ANSWER: A

A binary tree in which processors are (logically) at the leaves and internal nodes are routing nodes?
A. True
B. False
ANSWER: A

In All-to-All Broadcast, each processor is the source as well as the destination?
A. True
B. False
ANSWER: A

The Prefix Sum Operation can be implemented using the ?


A. All-to-all broadcast kernel.
B. All-to-one broadcast kernel.
C. One-to-all broadcast Kernel
D. Scatter Kernel
ANSWER: A
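
A serial C sketch that fixes the semantics of the prefix-sum (scan) operation; the parallel hypercube formulation follows the same communication pattern as all-to-all broadcast, with each node additionally keeping a running partial sum (this loop is illustrative only):

void prefix_sum(const int in[], int out[], int n) {
    int running = 0;
    for (int i = 0; i < n; i++) {
        running += in[i];
        out[i] = running;   /* out[i] = in[0] + in[1] + ... + in[i] (inclusive scan) */
    }
}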

In the scatter operation ?


A. A single node sends a unique message of size m to every other node
B. A single node sends the same message of size m to every other node
C. A single node sends a unique message of size m to the next node
D. None of Above
ANSWER: A

The gather operation is exactly the inverse of the ?


A. Scatter operation
B. Broadcast operation
C. Prefix Sum
D. Reduction operation
ANSWER: A

In All-to-All Personalized Communication, each node has a distinct message of size m for every other node?
A. True
B. False
ANSWER: A

Parallel algorithms often require a single process to send identical data to all other processes or to a subset of them. This operation is known as _________?
A. one-to-all broadcast
B. All to one broadcast
C. one-to-all reduction
D. all to one reduction
ANSWER: A

In which of the following operations does a single node send a unique message of size m to every other node?

A. Gather
B. Scatter
C. One to all personalized communication
D. Both B and C
ANSWER: D

Gather operation is also known as ________?


A. All to one personalized communication
B. One to all broadcast
C. All to one reduction
D. All to All broadcast
ANSWER: A

One-to-all personalized communication does not involve any duplication of data?
A. True
B. False
ANSWER: A

Gather operation, or concatenation, in which a single node collects a unique message from each node?
A. True
B. False
ANSWER: A

Conventional architectures coarsely comprise of a?


A. A processor
B. Memory system
C. Data path.
D. All of Above
ANSWER: D

Data intensive applications utilize?


A. High aggregate throughput
B. High aggregate network bandwidth
C. High processing and memory system performance.
D. None of above
ANSWER: A
A pipeline is like?
A. Overlaps various stages of instruction execution to achieve
performance.
B. House pipeline
C. Both a and b
D. A gas line
ANSWER: A

Scheduling of instructions is determined by?


A. True Data Dependency
B. Resource Dependency
C. Branch Dependency
D. All of above
ANSWER: D

VLIW processors rely on?


A. Compile time analysis
B. Initial time analysis
C. Final time analysis
D. Mid time analysis
ANSWER: A

Memory system performance is largely captured by?


A. Latency
B. Bandwidth
C. Both a and b
D. none of above
ANSWER: C

The fraction of data references satisfied by the cache is called?


A. Cache hit ratio
B. Cache fit ratio
C. Cache best ratio
D. none of above
ANSWER: A

A single control unit that dispatches the same instruction to various processors is?
A. SIMD
B. SPMD
C. MIMD
D. None of above
ANSWER: A

The primary forms of data exchange between parallel tasks are?


A. Accessing a shared data space
B. Exchanging messages.
C. Both A and B
D. None of Above
ANSWER: C

Switches map a fixed number of inputs to outputs?


A. True
B. False
ANSWER: A

Three-tier architecture simplifies an application's ____________?


A. Maintenance
B. Initiation
C. Implementation
D. Deployment
ANSWER: D

A dynamic network of networks, with dynamic connections that grow, is called?
A. Multithreading
B. Cyber cycle
C. Internet of things
D. Cyber-physical system
ANSWER: C

In which application systems can distributed systems run well?

A. HPC
B. HTC
C. HRC
D. Both A and B
ANSWER: D

1. Following is true about one to all broadcast:

A. In one to all broadcast, initially there will be P (number of processors) copies of the message, and after the broadcast there will finally be a single copy.

B. In one to all broadcast, initially there will be a single copy of the message, and after the broadcast there will finally be P (number of processors) copies.

Answer: B

2. If a total of 8 nodes are in a ring topology, after one to all message broadcasting, how many source nodes will be present?

Answer: 8
3. The current source node selects the _____ node as the next source node in a linear/ring one to all message broadcast.

A. nearest node

B. longest (farthest) node

Answer: longest (farthest) node

4. In all-to-one reduction, after the reduction, the final copy of the message is available on which node?

A. Source Node

B. Destination Node

C. Both of the above

D. None of these

Answer: Destination Node

5. If there is a 4 by 4 mesh topology network, then how many broadcast cycles will be required for the message to reach all 16 nodes?

A. 8

B. 4

C. 16

Answer: 4

6. If there are 8 nodes in a ring topology, how many message passing cycles will be required to complete the reduction process?

Answer: 4

7. In one to all broadcast using hypercube topology, how does the source node select the next destination node?

A. Node having the lowest binary code (label)

B. Node having the highest binary code (label)

C. To all connected nodes at a time

D. None of the above

Answer: Node having the highest binary code (label)

8. If there are 8 nodes connected in a ring topology, then ___ message passing cycles will be required to complete all to all broadcast in parallel mode.

Answer: 7

9. Consider all to all broadcast in a ring topology with 8 nodes. How many messages will be present with each node after the 3rd step/cycle of communication?

A. 7

B. None of the above

Answer: None of the above (each node holds 4 messages after the 3rd step)

10. If there are 16 messages in a 4x4 mesh, then how many message passing cycles in total will be required to complete the all to all broadcast operation?

Answer: 6

11. If there are P messages in an m x m mesh, then how many message passing cycles in total will be required to complete the all to all broadcast operation?

A. 2√P - 2

B. 2√P - 1

C. 2√P

D. None of the above

Answer: 2√P - 2

12. How many message passing cycles are required for all-to-all broadcast in an 8-node hypercube?

Answer: 3

13. In the scatter operation, after message broadcasting, every node ends up with the same message copy.

A. True

B. False

Answer: False

14. CUDA helps to execute code in parallel mode using __________

A. CPU

B. GPU

C. ROM

D. Cache memory

Answer: GPU

15. In a thread-function execution scenario, a thread is a ___________

A. Work

B. Worker

C. Task

D. None of the above

Answer: Worker
16. In a GPU, which of the following statements are true?

A. Block contains Grid

B. Grid contains Block

C. Block contains Threads

D. SM stands for Streaming MultiMedia

E. SM stands for Streaming MultiProcessor

Answer: "Grid contains Block", "Block contains Threads", "SM stands for Streaming MultiProcessor"

17. Following issue(s) is/are true about sorting techniques with parallel computing.

A. Large sequence is the issue

B. Where to store the output sequence is the issue

C. Where to store the input sequence is the issue

D. None of the above

Answer: "Where to store the output sequence is the issue", "Where to store the input sequence is the issue"
18. Partitioning of a series is done after __

A. Local arrangement

B. Process assignments

C. Global arrangement

D. None of the above

Answer: Global arrangement

19. In parallel DFS, processes have the following roles. (Select multiple choices if applicable)

A. Donor

B. Active

C. Idle

D. Recipient

Answer: "Donor", "Recipient"
20. Suppose there are 16 elements in a series; then how many phases will be required to sort the series using parallel odd-even bubble sort?

Answer: 15

21. Which are different sources of overheads in parallel programs?

A. Interprocess interactions

B. Process idling

C. Large amount of data

D. Excess computation

Answer: "Interprocess interactions", "Process idling", "Excess computation"

22. Speedup (S) is ____________.

A. The ratio of the time taken to solve a problem on parallel processors to the time required to solve the same problem on a single processor with p identical processing elements

B. The ratio of the time taken to solve a problem on a single processor to the time required to solve the same problem on a parallel computer with p identical processing elements

C. The ratio of the number of multiple processors to the size of data

D. None of the above

Answer: The ratio of the time taken to solve a problem on a single processor to the time required to solve the same problem on a parallel computer with p identical processing elements

23. Efficiency is a measure of the fraction of time for which a processing element is usefully employed.

A. TRUE

B. FALSE

Answer: TRUE

1. What is CUDA Architecture?

a.CUDA Architecture included a unified shader pipeline, allowing each and every chip to be
marshaled by a program.
b.CUDA Architecture included a unified shader pipeline, allowing each and every unit on the
chip to be marshaled by a program intending to perform general-purpose computations
c.CUDA Architecture included a unified shader pipeline, allowing each and every logic unit
on the chip to be marshaled by a program intending to perform general-purpose computations
d.CUDA Architecture included a unified shader pipeline, allowing each and every arithmetic
logic unit (ALU) on the chip to be marshaled by a program intending to perform general-purpose
computations

Ans.D

2. For the following code, write the kernel invocation:

__global__ void kernel( void ) { }

int main( void ) {
// invoke the kernel here
printf( "Hello, World!\n" );
return 0;
}

a. kernel<1, 1>(1,1);
b. kernel<<<1, 1>>>(1,1);
c. kernel<<<1, 1>>>();
d. kernel<<1, 1>>();

Ans. c

3. Find out which is the kernel invocation in the following code:

#include <iostream>

__global__ void add( int a, int b, int *c ) {
*c = a + b;
}

int main( void ) {
int c;
int *dev_c;
HANDLE_ERROR( cudaMalloc( (void**)&dev_c, sizeof(int) ) );
add<<<1,1>>>( 2, 7, dev_c );
HANDLE_ERROR( cudaMemcpy( &c, dev_c, sizeof(int), cudaMemcpyDeviceToHost ) );
printf( "2 + 7 = %d\n", c );
cudaFree( dev_c );
return 0;
}

a. cudaMalloc( (void**)&dev_c, sizeof(int) )
b. add<<<1,1>>>( 2, 7, dev_c )
c. add<<1,1>>( 2, 7, dev_c );
d. add<<<1,1>>>()

Ans. b

4. In the following code, which particular line is responsible for copying from device to host?

#include <iostream>

__global__ void add( int a, int b, int *c ) {
*c = a + b;
}

int main( void ) {
int c;
int *dev_c;
HANDLE_ERROR( cudaMalloc( (void**)&dev_c, sizeof(int) ) );
add<<<1,1>>>( 2, 7, dev_c );
HANDLE_ERROR( cudaMemcpy( &c, dev_c, sizeof(int), cudaMemcpyDeviceToHost ) );
printf( "2 + 7 = %d\n", c );
cudaFree( dev_c );
return 0;
}

a. c, dev_c, sizeof(int);
b. HANDLE_ERROR( &c, dev_c, sizeof(int), cudaMemcpyDeviceToHost );
c. HANDLE_ERROR( cudaMemcpy( &c, dev_c, sizeof(int), cudaMemcpyDeviceToHost ) );
d. cudaMemcpy( &c, dev_c, sizeof(int), cudaMemcpyDeviceToHost ) ;

Ans. c

5. What is the output of the following code?

#include <iostream>

__global__ void add( int a, int b, int *c ) {
*c = a + b;
}

int main( void ) {
int c;
int *dev_c;
HANDLE_ERROR( cudaMalloc( (void**)&dev_c, sizeof(int) ) );
add<<<1,1>>>( 2, 7, dev_c );
HANDLE_ERROR( cudaMemcpy( &c, dev_c, sizeof(int), cudaMemcpyDeviceToHost ) );
printf( "2 + 7 = %d\n", c );
cudaFree( dev_c );
return 0;
}

a. 2
b. 9
c. 7
d. 0

Ans. b

6. What is the function of the __global__ qualifier in a CUDA program?

a. alerts the compiler that a function should be compiled to run on a device instead of the host
b. alerts the interpreter that a function should be compiled to run on a device instead of the host
c. alerts the interpreter that a function should be interpreted to run on a device instead of the host
d. alerts the interpreter that a function should be compiled to run on a host instead of the device

Ans. a

7.The on-chip memory which is local to every multithreaded Single Instruction Multiple Data (SIMD)
Processor is called

a.Local Memory
b.Global Memory
c.Flash memory
d.Stack

Ans. a

8. The machine object created by the hardware, which manages, schedules, and executes, is a thread of

a.DIMS instructions
b.DMM instructions
c.SIMD instructions
d.SIM instructions

Ans. c

9. The primary and essential mechanism to support the sparse matrices is

a.Gather-scatter operations
b.Gather operations
c.Scatter operations
d.Gather-scatter technique

Ans. a

10. Which of the following architectures is/are not suitable for realizing SIMD?

a. Vector Processor
b. Array Processor
c. Von Neumann
d. All of the above

Ans. c

11. Multithreading allows multiple threads to share the functional units of a


a.Multiple processor
b.Single processor
c.Dual core
d. Corei5

Ans . b

12. Which compiler is used to compile CUDA source code?


a.gcc
b.nvc++
c.nc++
d.nvcc

Ans.d

13. Which command line is used to compile a CUDA program?

a.nvcc hello.cu -o hello


b. nvg++ hello.cpp -o hello
c.ncc hello.c -o hello
D.g++ hello.cu -o hello

Ans.a

14. The syntax of the kernel execution configuration is as follows:


a.<<< M , T >>> with a grid of M thread blocks. Each thread block has T parallel blocks
b.<<< M , T >>> with a grid of M blocks. Each thread block has T parallel threads
c.<<< M , T >>> with a grid of M thread blocks. Each thread block has T parallel threads
d.<<< M , T >>> with a grid of M thread blocks. Each thread block has T threads

Ans. c

15. What does threadIdx.x contain?

A.contains the index of the thread within the block


b.contains the index of the block within the thread
c.contains the index of the thread size within the block
d.contains the index of the block size within the thread

Ans. A

16. What does blockDim.x contain?


a.contains the size of block
b.contains the size of block thread
c.contains the size of thread block (number of threads in the thread block).
d.the size of thread block
Ans. c
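
Putting threadIdx.x, blockIdx.x, and blockDim.x together gives the idiomatic global index computation; a minimal sketch (the kernel name and bounds check are illustrative):

__global__ void add_one(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread index */
    if (i < n)                                      /* the grid may overshoot n */
        x[i] += 1.0f;
}

/* launched, for example, as: add_one<<<(n + 255) / 256, 256>>>(x, n); */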

17. Memory allocation of variables x and y in CUDA:

a. float *b, *a;
   cudaMallocManaged(&, N*sizeof(float));
   cudaMallocManaged(&, N*sizeof(float));

b. float *x, *y;
   cudaMallocManaged(&a, N*sizeof(float));
   cudaMallocManaged(&b, N*sizeof(float));

c. float *a, *b;
   cudaMallocManaged(&x, N*sizeof(float));
   cudaMallocManaged(&y, N*sizeof(float));

d. float *x, *y;
   cudaMallocManaged(&x, N*sizeof(float));
   cudaMallocManaged(&y, N*sizeof(float));

Ans. d

18. Which function is used to free memory in CUDA?

a. cudaFree()
b. Free()
c. Cudafree()
d. CudaFree()

Ans. a
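
A minimal end-to-end sketch tying cudaMallocManaged() and cudaFree() together (the kernel, sizes, and values are illustrative assumptions, not part of the question bank):

#include <stdio.h>
#include <cuda_runtime.h>

__global__ void axpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main(void) {
    int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));  /* unified memory, visible to CPU and GPU */
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    axpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);
    cudaDeviceSynchronize();                   /* wait before the host reads y */
    printf("y[0] = %f\n", y[0]);               /* expect 5.0 */

    cudaFree(x);                               /* release both managed buffers */
    cudaFree(y);
    return 0;
}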

19. Which of the following is not a form of parallelism supported by CUDA

a.Vector parallelism - Floating point computations are executed in parallel on wide vector units
b. Thread level task parallelism - Different threads execute different tasks
c.Block and grid level parallelism - Different blocks or grids execute different tasks
d.Data parallelism - Different threads and blocks process different parts of data in memory

Ans . a

20.The style of parallelism supported on GPUs is best described as


a.SISD - Single Instruction Single Data
b.MISD - Multiple Instruction Single Data
c.SIMT - Single Instruction Multiple Thread
d.MIMD - Multiple Instruction Multiple Data

Ans. c

21. Which of the following correctly describes a GPU kernel

a.A kernel may contain a mix of host and GPU code


b.All thread blocks involved in the same computation use the same kernel
c. A kernel is part of the GPU's internal micro-operating system, allowing it to act as an independent host
d.All thread blocks involved in the same computation use the different kernel

Ans .b

22. Shared memory in CUDA is accessible to:

a. All threads in a single block
b. Both the host and GPU
c. All threads associated with a single kernel
d. One thread in a single block

Ans. a

23.Which of the following correctly describes the relationship between Warps, thread blocks, and CUDA
cores?
a.A warp is divided into a number of thread blocks, and each thread block executes on a single
CUDA core
b.A thread block may be divided into a number of warps, and each warp may execute on a single
CUDA core
c.A thread block is assigned to a warp, and each thread in the warp is executed on a separate CUDA
core
d. A block index is same as thread index

Ans .b

24. A processor assigned a thread block, which executes that code, is usually called a
a. multithreaded MIMD processor
b. multithreaded SIMD processor
c. multithreaded processor
d. multicore processor
Ans. b

25. Threads grouped together and executed in sets of 32 threads are called a

a.block of thread
b.thread block
c.thread
d.block

Ans. b

26. Who developed CUDA?

a. ARM
b. INTEL
c. AMD
d. NVIDIA

Ans. d
