Mid Sem1
Attempt all the questions. Clearly label each part of a question in your answers, e.g., 1.A, 1.B,
and so on.
1. Suppose we have two computers A and B. Computer A has a clock cycle of 1 nanosecond and performs
2 instructions per cycle. Computer B, instead, has a clock cycle of 600 picoseconds and performs 1.25
instructions per cycle. Assuming a program requires the execution of the same number of instructions
in both computers: [2 + 2 = 4]
A. Which computer is faster for this program?
B. Which computer will be faster if Computer B required 10% more instructions than Computer
A for executing the same program?
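The comparison in this question rests on the relation execution time = instruction count / (instructions per cycle) × clock cycle time. A minimal sketch of that relation follows; the function name `exec_time_ns` and the instruction count of 1000 are illustrative assumptions, not part of the question.

```python
def exec_time_ns(instructions, cycle_ns, ipc):
    """Execution time in nanoseconds.

    instructions: number of instructions executed
    cycle_ns:     clock cycle time in nanoseconds
    ipc:          instructions completed per cycle
    """
    cycles = instructions / ipc
    return cycles * cycle_ns


# Hypothetical instruction count; only the ratio between machines matters.
n = 1000
time_a = exec_time_ns(n, cycle_ns=1.0, ipc=2.0)          # Computer A
time_b = exec_time_ns(n, cycle_ns=0.6, ipc=1.25)         # Computer B, same count
time_b_more = exec_time_ns(1.1 * n, cycle_ns=0.6, ipc=1.25)  # B with 10% more
```

Comparing `time_a` with `time_b` addresses part A, and comparing `time_a` with `time_b_more` addresses part B.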
2. Assume the runtime of an application for a problem is 100 seconds for problem size 1. It consists of an
initialization phase which lasts for 10 seconds and cannot be parallelized, and a problem-solving phase
which can be perfectly parallelized and grows quadratically with the problem size. [4 + 3 + 2 + 1 = 10]
A. What is the speedup for the given application as a function of the number of processors p and
the problem size n?
B. What are the execution time and speedup of the application with problem size 1, if it is
parallelized and runs on 4 processors?
C. What is the execution time of the application if the problem size is increased to 4 and it runs
on 4 processors? What is the speedup?
D. With the same setting as before, what is the speedup if executed on 16 processors?
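One way to model this question is sketched below. It assumes that the 10-second initialization phase is independent of the problem size (the question does not state this explicitly) and that the remaining 90 seconds at problem size 1 form the problem-solving phase. The names `parallel_time` and `speedup` are illustrative.

```python
def parallel_time(n, p, t_init=10.0, t_solve_1=90.0):
    """Modelled runtime in seconds for problem size n on p processors.

    t_init:    serial initialization time (assumed independent of n)
    t_solve_1: problem-solving time at n = 1 on one processor
    """
    # The solving phase grows as n^2 and is divided evenly among p processors.
    return t_init + t_solve_1 * n ** 2 / p


def speedup(n, p):
    """Speedup relative to one processor at the same problem size."""
    return parallel_time(n, 1) / parallel_time(n, p)
```

Under these assumptions, `parallel_time(1, 4)` and `speedup(1, 4)` correspond to part B, `parallel_time(4, 4)` and `speedup(4, 4)` to part C, and `speedup(4, 16)` to part D.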
3. For the task graphs given below, determine the following: (1) Maximum degree of concurrency; (2)
Critical path length; (3) Maximum achievable speedup over one process assuming that an arbitrarily
large number of processes is available; (4) The minimum number of processes needed to obtain the
maximum possible speedup. [2 × (2 + 2 + 2 + 2) = 16]
Table 1: Task graphs (Figure-1 and Figure-2)
4. (A) Design a parallel algorithm to sum the even integers in a list of n integers with work complexity O(n)
and depth O(log n). You can assume that parallel prefix sum has O(n) work and O(log n) depth.
(B) Analyze the work and depth of your algorithm.
[5 + 3 = 8]
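One possible approach to question 4, sketched here sequentially, is a map step followed by a prefix sum. The map replaces every odd element with 0 (each element is independent, so O(n) work and O(1) depth in a parallel setting), and the prefix sum contributes O(n) work and O(log n) depth as the question permits. In this sketch `itertools.accumulate` is only a sequential stand-in for a parallel prefix sum, and the name `sum_even` is illustrative.

```python
from itertools import accumulate


def sum_even(values):
    """Sum the even integers in values using a map step and a prefix sum."""
    # Map step: keep even values, replace odd values with 0.
    # Each element is processed independently of the others.
    masked = [v if v % 2 == 0 else 0 for v in values]
    # Scan step: sequential stand-in for a parallel prefix sum.
    prefix = list(accumulate(masked))
    # The last prefix value is the total; an empty list sums to 0.
    return prefix[-1] if prefix else 0
```

For example, `sum_even([1, 2, 3, 4, 6])` adds 2, 4, and 6.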
5. Show with a schematic diagram how you can embed 8 processors arranged in a linear array into a 3D
hypercube. You should clearly show with an arrow or numbering which node of the array is mapped to
which node of the hypercube. Compute the following as a result of the embedding: (1) congestion, (2)
dilation, and (3) expansion. [4 + 1 + 1 + 1 = 7]
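One common choice for this kind of embedding (an assumption here, not the only valid mapping) is the binary reflected Gray code, which sends array node i to hypercube node i XOR (i >> 1). The sketch below builds that mapping and measures its dilation as the largest Hamming distance between the images of neighbouring array nodes. The function names are illustrative.

```python
def gray(i):
    """Binary reflected Gray code of i."""
    return i ^ (i >> 1)


def embed_linear_array(d):
    """Map nodes 0 .. 2^d - 1 of a linear array to labels of a d-dimensional hypercube."""
    return [gray(i) for i in range(2 ** d)]


def dilation(mapping):
    """Largest Hamming distance between images of adjacent array nodes."""
    return max(bin(a ^ b).count("1") for a, b in zip(mapping, mapping[1:]))


mapping = embed_linear_array(3)
```

Each entry of `mapping` is the 3-bit hypercube label assigned to the array node at that index; `dilation(mapping)` checks how far apart neighbouring array nodes land in the hypercube.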
6. Consider a processor operating at 1 GHz connected to a DRAM with a latency of 100 ns. Assume that
the processor has two multiply-add units and is capable of executing four floating-point operations in
each clock cycle. As there is no cache, you can assume that the block size is one word. [2 + 2 + 2 = 6]
A. What is the peak processor rating in terms of FLOPS?
B. What is the peak speed in computing the dot-product of two vectors?
C. How is the computation of the dot product affected if the block size is changed to 4 words?
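A rough model for question 6 is sketched below. It assumes that memory fetches are not overlapped, that each fetch costs the full DRAM latency and delivers one block, that the two vectors are stored contiguously so a block holds consecutive elements, and that compute time is negligible next to memory latency. The names `peak_gflops` and `dot_product_mflops` are illustrative.

```python
def peak_gflops(clock_ghz, flops_per_cycle):
    """Peak rating in GFLOPS: clock rate times operations per cycle."""
    return clock_ghz * flops_per_cycle


def dot_product_mflops(latency_ns, block_words):
    """Memory-bound dot-product rate in MFLOPS under the stated assumptions."""
    # One fetch per vector delivers block_words elements of each vector.
    time_ns = 2 * latency_ns
    # Each element pair needs one multiply and one add.
    flops = 2 * block_words
    # FLOPs per nanosecond equals GFLOPS; multiply by 1000 for MFLOPS.
    return flops / time_ns * 1000.0
```

Under these assumptions, `peak_gflops(1.0, 4)` relates to part A, `dot_product_mflops(100, 1)` to part B, and `dot_product_mflops(100, 4)` to part C.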
7. Outline the major differences between a shared memory parallel computer and a distributed memory
parallel computer. Discuss the following: [3 + 3 + 3 = 9]
A. Show the differences in the basic architecture of these two types of machines with respect to
memory access.
B. Describe how the programming models are different between these two paradigms due to the
differences in memory access.
C. Report the relative abundance of one architecture versus the other and where they typically
appear in a practical sense.