Architecture
Instructions:
1) Figures to the right indicate marks (for which these questions were asked in the
past). They may also indicate the expected length of the answer/points to be covered.
2) Write out the questions followed by the answers.
3) Don’t change the question numbers.
4) Attempt any 150 different questions.
5) Submit by 25 November 2016. You may even submit in three instalments of 50 or
two instalments of 75 each. Timely submission will ensure that you get the
checked tutorials back in time for your study.
TUTORIALS 1
SUM=0
DO 10 I=1, 1000, 1
10 SUM = SUM + B(I)
Make the following assumptions about the prediction accuracy and hit rate:
Prediction accuracy is 90% (for instructions in the buffer).
Hit rate in the buffer is 90% (for branches predicted taken).
Assume that 60% of the branches are taken. 6
1.12 Discuss different software approaches used to exploit instruction level parallelism. 3
1.13 Consider a loop like the one below:
Assume that A, B, and C are distinct, non-overlapping arrays. What are the data
dependences among the statements S1 and S2 in the loop? 4
1.14 Consider a loop like this one:
What are the dependences between S1 and S2? Is this loop parallel? If not, show how
to make it parallel. 5
1.15 The following loop has multiple types of dependences. Find all the true dependences,
output dependences, and anti-dependences, and eliminate the output dependences
and anti-dependences by renaming. 6
1.16 Assume we have a computer where the clocks per instruction (CPI) is 1.0 when all
memory accesses hit in the cache. The only data accesses are loads and stores, and
these total 50% of the instructions. If the miss penalty is 25 clock cycles and the miss
rate is 2%, how much faster would the computer be if all instructions were cache hits? 4
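Questions like 1.16 reduce to a short CPI calculation. A quick sketch of the arithmetic, assuming one instruction fetch per instruction, so memory accesses per instruction = 1 + 0.5 for the loads/stores:

```python
# Question 1.16: speedup of an all-hits machine over one with a 2% miss rate.
cpi_ideal = 1.0
accesses_per_instr = 1.0 + 0.5      # one fetch + 0.5 data accesses per instruction
miss_rate = 0.02
miss_penalty = 25                    # clock cycles

stall_cycles = accesses_per_instr * miss_rate * miss_penalty   # 0.75 cycles/instr
cpi_real = cpi_ideal + stall_cycles                            # 1.75
speedup = cpi_real / cpi_ideal
print(f"speedup with a perfect cache = {speedup:.2f}")         # 1.75
```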
1.17 A pipeline P is found to provide a speedup of 6.16 when operating at 100 MHz and an
efficiency of 88 percent.
i) How many stages does P have?
ii) What are P’s MIPS and CPI performance levels? 8
1.18 Let’s use an in-order execution computer for the first example, such as the Ultra-
SPARC III. Assume the cache miss penalty is 100 clock cycles, and all instructions
normally take 1.0 clock cycles (ignoring memory stalls). Assume the average miss rate
is 2%, there is an average of 1.5 memory references per instruction, and the average
number of cache misses per 1000 instructions is 30. What is the impact on performance
when behaviour of the cache is included? Calculate the impact using both misses per
instruction and miss rate. 6
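The two ways of computing the memory stall cycles asked for in 1.18 should agree; a sketch:

```python
# Question 1.18: memory-stall CPI computed via misses/instruction and via miss rate.
miss_penalty = 100
cpi_ideal = 1.0

# (a) Misses per instruction: 30 misses per 1000 instructions.
stalls_a = (30 / 1000) * miss_penalty                 # 3.0 cycles per instruction

# (b) Miss rate x memory references per instruction.
stalls_b = 0.02 * 1.5 * miss_penalty                  # 3.0 cycles per instruction

cpi_real = cpi_ideal + stalls_a                        # 4.0: the machine is 4x slower
print(stalls_a, stalls_b, cpi_real)
```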
1.19 What is a nonblocking cache? Discuss its advantages. 4
1.20 Explain compiler-controlled prefetching technique. 4
1.21 What is a virtual cache? Explain why virtual caches are not popular. 5
1.22 What can interleaving and wide memory buy? Consider the following description of a
computer and its cache performance:
Block size = 1 word
Memory bus width = 1 word
Miss rate = 3%
Memory accesses per instruction = 1.2
Cache miss penalty = 64 cycles (as above)
Average cycles per instruction (ignoring cache misses) = 2
If we change the block size to 2 words, the miss rate falls to 2%, and a 4-word block has
a miss rate of 1.2%. What is the improvement in performance of interleaving two ways
versus doubling the width of memory and the bus? 5
1.23 Discuss Flynn’s classification of processors. 3
1.24 Compare shared memory multiprocessor architecture and distributed memory
architecture. 4
1.25 Suppose you want to achieve a speedup of 80 with 100 processors. What fraction of
the original computation can be sequential? 6
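Question 1.25 is a direct application of Amdahl's law; solving the speedup equation for the sequential fraction:

```python
# Question 1.25: sequential fraction s allowed for speedup 80 on 100 processors.
# Amdahl's law: speedup = 1 / (s + (1 - s) / p).  Solving for s:
p, target = 100, 80
s = (p / target - 1) / (p - 1)          # (1.25 - 1) / 99
print(f"at most {s:.4%} of the computation may be sequential")

# Sanity check: plugging s back in reproduces the target speedup.
speedup = 1 / (s + (1 - s) / p)
```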
1.26 Suppose we have an application running on a 32-processor multiprocessor, which has a 400
ns time to handle reference to a remote memory. For this application, assume that all
the references except those involving communication hit in the local memory hierarchy,
which is slightly optimistic. Processors are stalled on a remote request, and the
processor clock rate is 1 GHz. If the base IPC (assuming that all references hit in the
cache) is 2, how much faster is the multiprocessor if there is no communication versus
if 0.2% of the instructions involve a remote communication reference? 5
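The arithmetic in 1.26 converts the remote-access time into cycles and folds it into the CPI:

```python
# Question 1.26: cost of remote references at 1 GHz.
clock_hz = 1e9
remote_s = 400e-9
remote_cycles = remote_s * clock_hz            # 400 cycles per remote reference
cpi_base = 1 / 2.0                             # base IPC of 2 -> CPI of 0.5
cpi_comm = cpi_base + 0.002 * remote_cycles    # 0.5 + 0.8 = 1.3
ratio = cpi_comm / cpi_base
print(f"no-communication machine is {ratio:.1f}x faster")    # 2.6x
```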
1.27 Compare shared and switched interconnection media. 4
1.28 Describe the following terminologies associated with multiprocessor operating systems
and MIMD algorithms:
i) Protection mechanisms
ii) Scheduling
iii) Degree of decomposition of a parallel algorithm. [3x3]
1.29 The CM-5 supercomputer used wormhole routing, with each switch buffer being just 4
bits per port. Compare efficiency of store-and-forward versus wormhole routing for a
128-node machine using a CM-5 interconnection sending a 16-byte payload. Assume
each switch takes 0.25 μs and that the transfer rate is 20 MB/sec. 5
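For 1.29, the difference between the two routing schemes is whether the whole payload is buffered at every hop. The hop count for a 128-node CM-5 fat tree is not stated above, so `hops = 7` below is only an illustrative assumption:

```python
# Question 1.29 sketch: store-and-forward vs wormhole latency.
switch_us = 0.25                      # per-switch delay, microseconds
payload_bytes = 16
transfer_us = payload_bytes / 20.0    # 20 MB/s = 20 bytes/us -> 0.8 us for 16 bytes

def store_and_forward(hops):
    # The entire payload is received and retransmitted at every switch.
    return hops * (switch_us + transfer_us)

def wormhole(hops):
    # Only the header pays the per-switch delay; the payload pipelines behind it.
    return hops * switch_us + transfer_us

hops = 7                              # assumption, not given in the question
print(store_and_forward(hops), wormhole(hops))   # approx. 7.35 vs 2.55 us
```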
TUTORIALS 2
2.1 Comment on the guidelines to convert any blocking multi-stage interconnection network
to its equivalent non-blocking counterpart. 4
2.2 Briefly comment on the major reasons for cache coherence problem in any
multiprocessor system. 4
2.3 Define diameter with respect to static networks. 4
2.4 What are the characteristics of SIMD architecture? 4
2.5 Comment on the number and type of process states of any message passing system in
a multiprocessor system. 4
2.6 What factors determine the performance of vector processors? 4
2.7 Estimate the effect of branch instructions on the performance of pipelined architectures
with suitable set of parameters. 4
2.8 Assume that an unpipelined machine has a 10-ns clock cycle and it uses 4 cycles each
for ALU operations and branch instructions and 5 cycles for memory operations.
Assume that the relative frequencies of these operations are 40%, 20% and 40%
respectively. Suppose that due to clock skew and setup, pipelining the machine adds 1
ns overhead to the clock. Ignoring any latency impact, how much speedup in the
instruction execution rate will be gained from a pipeline? 6
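The calculation in 2.8 compares the average unpipelined instruction time with the overhead-stretched pipeline clock:

```python
# Question 2.8: speedup from pipelining with 1 ns clock overhead.
clock_ns = 10
avg_cycles = 0.40 * 4 + 0.20 * 4 + 0.40 * 5      # ALU, branch, memory mix = 4.4
unpipelined_ns = clock_ns * avg_cycles            # 44 ns per instruction
pipelined_ns = clock_ns + 1                       # one result per 11 ns clock
speedup = unpipelined_ns / pipelined_ns
print(f"speedup = {speedup:.1f}")                 # 4.0
```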
2.9 From a given reservation table of a pipelined architecture (uni-function), how will you find
the minimal and maximal values of average latencies? Justify your answer. 12
2.10 What is meant by cache coherency? Explain with the help of a suitable example. State
any three techniques to reduce cache misses. How does a two-level cache increase
performance? Derive the formula for average access time in a three-level cache.
[3+2+6+4+3]
2.11 What is a vector processor? What are the properties of vector instructions? How are the
two important issues like Vector Length and Stride tackled? 12
2.12 Vectorizing compilers generally detect loops that can be executed on a pipelined vector
computer. Are the vectorization algorithms used by vectorizing compilers suitable for
MIMD machine parallelization? Justify your answer. 6
2.13 Explain the different types of dependences among instructions with suitable examples.
10
2.14 Consider the following loop:
for (i = 1; i <= 500; i++) {
    A[i] = A[i] + B[i];   /* Statement 1 or S1 */
    B[i+1] = C[i] + D[i]; /* Statement 2 or S2 */
}
What are the dependences between S1 and S2? Is this loop parallel? If not, show how to
make it parallel. 8
2.15 How is a block found if it is in cache? What are the different write policies for cache? 10
2.16 Assume the following miss rates:
Size Instruction Cache Data Cache Unified Cache
16 KB 0.64% 6.47% 2.87%
32 KB 0.39% 4.82% 1.99%
Which has the lower miss rate: a 16-KB instruction cache with a 16-KB data cache or a
32-KB unified cache? Assume a hit takes 1 clock cycle and the miss penalty is 50 clock
cycles, and a load or store hit takes 1 extra clock cycle on a unified cache. What type of
hazard does the unified cache pose? What is the average memory access time in each
case? Assume write-through caches with write buffer and ignore the difficulties due to
hazard. 8
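For 2.16 the overall miss rate of the split configuration is a weighted average of the two caches. The instruction/data reference mix is not restated above, so the 75%/25% split below is an assumption; substitute the mix your text derives from the load/store frequency:

```python
# Question 2.16 sketch: split (16 KB + 16 KB) vs unified (32 KB) AMAT.
hit_time, miss_penalty = 1, 50
f_instr, f_data = 0.75, 0.25            # assumed reference mix

# Split caches: each reference type goes to its own cache.
miss_split = f_instr * 0.0064 + f_data * 0.0647
amat_split = hit_time + miss_split * miss_penalty

# Unified cache: a load/store hit pays 1 extra cycle (structural hazard
# between the instruction fetch and the data access on the single port).
miss_unified = 0.0199
amat_unified = hit_time + f_data * 1 + miss_unified * miss_penalty
print(amat_split, amat_unified)
```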
2.17 What are dynamic networks? What are the characteristics of access time in such
networks? What is Multistage Interconnection Network? 9
2.18 What do you mean by hazard? What are different types of hazards? What are the
alternative techniques to reduce the data hazard? 9
TUTORIALS 3
3.1 What are the differences between scalar instructions and vector instructions? Give at
least four differences. 4
3.2 A two-level memory hierarchy is represented by M1 and M2. M1 has a hit ratio h. The
access times of M1 and M2 are t1 and t2 respectively. What is the effective memory
access time of this hierarchy? 4
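The formula asked for in 3.2 can be written with or without t1 on the miss path; the version below assumes M2 is accessed only after an M1 miss, so a miss pays t1 + t2:

```python
# Question 3.2: effective access time of a two-level hierarchy (M1, M2).
def effective_access_time(h, t1, t2):
    # Hit in M1 with probability h; on a miss, M1 is probed first (t1),
    # then M2 supplies the word (t2).
    return h * t1 + (1 - h) * (t1 + t2)

print(effective_access_time(0.95, 10, 100))   # approx. 15 ns for these example values
```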
3.3 A linear instruction pipeline having ten stages operates at 25 MHz. If one instruction is
issued per clock cycle, what will be the speedup factor to execute a program of 15,000
instructions as compared with the use of an equivalent non-pipelined processor with an
equal amount of flow-through delay? Ignore penalties due to branch instructions and
out of sequence execution. 4
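Question 3.3 uses the standard k-stage pipeline speedup formula, S = nk / (k + n − 1), where both machines have the same flow-through delay:

```python
# Question 3.3: speedup of a 10-stage pipeline over an equivalent
# non-pipelined processor for n = 15,000 instructions.
k, n = 10, 15_000
speedup = n * k / (k + n - 1)
print(f"speedup = {speedup:.4f}")     # just under the ideal factor k = 10
```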
3.4 Draw the Illiac mesh (4x4) network. Label the nodes as N0, N1, ….., N15. List all the
nodes reachable from node N0 in exactly one step. 4
3.5 State advantage and drawback of using independent request and grants in central bus
arbitration system in a multiprocessor system. 4
3.6 Define Grain size. Give examples for fine, medium and coarse grains. 4
3.7 Name the two well-established concepts on which VLIW is based. Give format of the
VLIW instruction. 4
3.8 Identify dependencies among the following statements in a given program:
pipeline showing all line widths. 10
3.14 What are the conflicts that are resolved in dynamic instruction scheduling based on
Tomasulo’s algorithm? Consider a seven-stage pipeline having fetch, decode, issue,
three execute, and write-back stages. Present the schedule for the following minimum-
register machine code used for computing X = Y+Z and A = B×C. Use a timing diagram for
the pipeline. Add suitable wait states wherever required.
R1 ← M (Y); R2 ← M (Z);
R3 ← (R1) + (R2); M (X) ← R3;
R1 ← M (B); R2 ← M (C);
R3 ← (R1) × (R2); M (A) ← R3 8
3.15 Answer the following for the reservation table shown: -
1 2 3 4
S1 X X
S2 X
S3 X
TUTORIALS 4
4.1 What is the difference between a microprocessor and a microprogram? Are all
microprogrammed computers microprocessors? 4
4.2 Distinguish between arithmetic and logical shifts. Show that in the former case signs are
preserved. 4
4.3 DMA access is given higher priority than CPU access to memory, why? 4
4.4 What is critical section? What are the requirements that a critical section needs to
satisfy? 4
4.5 The memory unit of a computer has 256 K words of 32 bits each. The computer has an
instruction format with four fields, an opcode field, a mode to specify one of six modes,
a register address field to specify one of 56 processor registers and a memory address.
Specify the instruction format. How many instructions are there at most in the
computer? 4
4.6 Classify SIMD and MIMD machines in the light of sequential and parallel machine
architecture. 4
4.7 Differentiate among SCSI, PCI and USB bus systems. 4
4.8 Define pipelining. Show with a simple instance how CPU enhances its performance
using pipelining. 8
4.9 Prove that the time taken by a two-stage pipelined execution of an instruction stream is
half of the corresponding sequential execution time. Explain using a timing diagram. 10
4.10 Explain the role of the cache memory in memory hierarchy to speed up instruction
execution time. 9
4.11 Justify the need of two separate caches in CPU performance enhancement. 9
4.12 How do bridges used between multiprocessor clusters allow transactions initiated on a
local bus to be completed on a remote bus? 6
4.13 What do you mean by BUS hierarchies? Can a processor put data to any BUS or read
data from any BUS? 6
4.14 Why is a crossbar switch network called a single-stage, non-blocking and permutation
network? 6
4.15 What is superscalar processing? Illustrate with the help of an example the conditions of
pipeline stalling in a superscalar processor. 12
4.16 Suggest how VLIW architecture can achieve superscalar performance. 6
4.17 What is the motivation behind introducing parallelism into computer system? Distinguish
between shared-memory and distributed-memory computers. 9
4.18 Illustrate the application of parallel processing using the problem of computing the sum
of Numbers (constants). Show how you arrange the dependency problem. 9
4.19 Write notes on the following: -
(i) Synchronous vs. asynchronous parallel processing.
(ii) Horizontal and Vertical expansion.
(iii) Network topology and reliability in a computer network. [3x5]
4.20 What is the difference between a direct and an indirect address instruction? 3
TUTORIALS 5
5.1 What are the problem areas which cause a poorly designed instruction set to stall
frequently in a pipelined processor? 4
5.2 What is the Amdahl’s law for vectorization? 4
5.3 Why are shared memory machines and distributed memory machines suited for fine
grained and coarse grained problems respectively in case of parallel computing? 4
5.4 What do you understand by spatial & temporal locality? What is clustering? How does
locality of reference help in clusters? 4
5.5 What is VLIW and how is it different from RISC or CISC? 4
5.6 What do you understand by control flow & data flow scheduling? 4
5.7 Compare circuit switching, store & forward technique and wormhole routing technique.
4
5.8 A four segment pipeline implements a function and has the following delays for each
segment (b=0.2):
Segment # Maximum Delay
1 17 ns
2 15 ns
3 19 ns
4 14 ns
ADD.W R7, R7, 4
LD.W R1, 0(R7)
MUL.W R2, R1, R1
ADD.W R3, R3, R2
LD.W R4, 2(R7)
SUB.W R5, R2, R3
.W R5, R5, R4
The percentage of delay that occurs due to address and execution interlocks is given by the
following table:
} 9
TUTORIALS 6
6.1 How many data dependent hazards are there? Which particular hazard may occur for
branch type of instructions? 4
6.2 Compare between Dependency Graph, Signal Flow Graph and Data Flow Graph. 4
6.3 Compare Daisy chaining, Polling and Independent request system interconnect
structures. 4
6.4 Discuss the four machine organizations, according to Flynn's classification. 4
6.5 How does internal forwarding enhance the performance of computers? 4
6.6 Which coupling system is better for a higher degree of interaction between tasks?
Differentiate between Loosely Coupled System (LCS) and Tightly Coupled System
(TCS). 4
6.7 What do you mean by pipelining? How does it differ from parallel processing? 4
6.8 Write down an O(n²) algorithm for SIMD matrix multiplication. Establish the correctness
of the complexity. Explain the underlying architecture. 12
6.9 Sketch an O(n log₂ n) algorithm for matrix multiplication. 6
6.10 Define systolic array. Discuss the various properties of systolic array. 5
6.11 Draw a systolic array corresponding to bubble-sorter. 4
6.12 Using systolic arrays multiply two full matrices. 9
6.13 Draw the DG (dependence graph) for the Warshall–Floyd algorithm.
for k from 1 to N
for i from 1 to N
for j from 1 to N
X(i,j,k) ← X(i,j,k−1) + X(i,k,k−1) × X(k,j,k−1). 10
6.14 Describe the procedure of mapping DGs (dependence graphs) and SFGs (signal flow
graphs) to systolic arrays. 8
6.15 What do you mean by network topology? Write short notes on the following topologies
and relatively compare them in terms of degree of a processor and longest distance
between two processors: (i) Mesh of Trees, (ii) Pyramid, (iii) Shuffle-Exchange, and (iv)
Hypercube. [1+12]
6.16 Write a short note on Star interconnection network and draw such a network for the
number of processors N = 4!. 5
6.17 Explain single-stage and multistage dynamic interconnect networks, with necessary
figures. 10
6.18 Corresponding to the reservation table below, draw the state diagram. Clearly indicate
the cross collision vectors and collision matrices. 8
Stage \ Time → 0 1 2 3 4
1 A B A B
2 A B
3 B AB
TUTORIALS 7
7.1 What are the limitations of Instruction-level parallelism (ILP)? Also explain the design
challenges of Pipeline processors. 4
7.2 Explain the basic concept of CPU machine instruction and its types. 4
7.3 Write short-note on Multiport Memory. 4
7.4 What is significance of Virtual Memory in advanced computer architecture? 4
7.5 State the advantages and disadvantages of shared versus memory-mapped I/O. 4
7.6 How is the performance of processors improved by using superscalar
architectures? 4
7.7 Explain the types of system performance factors in a parallel architecture. 4
7.8 In a pipeline organization, it is difficult to keep a large pipeline at its maximum rate
because of pipeline hazards. Explain Control Hazards by taking a
suitable program. 9
7.9 Explain the Flynn’s classification for Computer Architectures based on the nature of the
instruction flow executed by the computer with diagram. 9
7.10 Explain the basic terms related to Bus Protocols. Give reasons for increasing the bus
bandwidth of Input/Output devices. 9
7.11 Discuss the limiting factors of parallel execution by taking a simple example. 9
7.12 What is a pipeline bubble? Deduce what is in a pipeline stage that is “executing” a
bubble? 8
7.13 What are the properties of the Vector Processors? Explain each component of Vector-
Register Processors with diagram. 10
7.14 What is the need of VLIW architecture? Also compare its properties with Explicitly
Parallel Instruction Computing (EPIC) model. 6
7.15 In Program parallelism, what are the different issues of scheduling? 5
7.16 How is cache memory performance analyzed in advanced processor architecture? 7
7.17 Give classification criteria for interconnection networks (INs). 8
7.18 Explain Multiple-Instruction Multiple-Data streams (MIMD) parallel architectures model
with diagram. 10
7.19 State network properties with respect to system architectures. Explain Dynamic
Networks Bus Systems in detail. 6
7.20 Vectorizing compilers generally detect loops that can be executed on a pipelined vector
computer. Are the vectorization algorithms used by vectorizing compilers suitable for
MIMD machine parallelization? 6
7.21 What is meant by Cache-Coherency? Explain with the help of a suitable example. 6
TUTORIALS 8
8.1 What is meant by throughput rate? Calculate the throughput rate for a program with
clock frequency 25 MHz, a CPI of 5 and an instruction count of 100. 4
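The computation in 8.1 is just clock rate over CPI:

```python
# Question 8.1: throughput rate.
f_hz, cpi, n_instr = 25e6, 5, 100
throughput_mips = f_hz / cpi / 1e6        # 5 MIPS
exec_time_s = n_instr * cpi / f_hz        # 20 microseconds for 100 instructions
print(throughput_mips, exec_time_s)
```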
8.2 Consider the following code written for a sequential execution on a uniprocessor
system:
L1: DO 10 I = 1, N
L2: A(I) = B(I) + C(I)
L3: 10 CONTINUE
L4: SUM = 0
L5: DO 20 I = 1, N
L6: SUM = SUM + A(I)
L7: 20 CONTINUE
8.8 A computer system has a 128-byte cache. It uses four-way set-associative mapping
with 8 bytes in each block. The physical address size is 32 bits and the smallest
addressable unit is 1 byte. Draw a diagram showing the organization of the cache,
indicating how physical addresses are related to cache addresses. 6
8.9 Differentiate between asynchronous and synchronous pipeline model? 6
8.10 Vectorizing compilers generally detect loops that can be executed on a pipelined vector
computer. Are the vectorization algorithms used by vectorizing compilers suitable for
MIMD machine parallelization? Justify your answer. 6
8.11 Explain the concept of daisy chain bus arbitration with the help of a diagram. Also draw
the corresponding timing diagram. 6
8.12 Consider a three stage pipeline having following reservation table: -
1 2 3 4 5 6 7 8 →Time
S1 X X X
S2 X X
S3 X X X
Find the state transition diagram corresponding to this reservation table. 6
8.13 Define the pipeline performance/cost ratio. For what purpose is it used? 6
8.14 What are message passing systems? Explain the format of a message, packet and flit in
the context of message passing systems. 6
8.15 What is meant by wormhole routing? Where is it used? 6
8.16 What are “hot spots” in context of interconnection networks? How do they affect the
designing of interconnection networks? 6
8.17 Explain the functioning of vector processor with the help of suitable diagram? 6
8.18 What is the Cache Coherence Problem? What is a snooping cache? Discuss with examples
the write-through and write-back protocols for cache consistency. 12
8.19 Suppose that scalar operations take 10 times longer to execute per result than vector
operations. Given a program which is originally written in scalar code:
(a) What percentage of the code needs to be vectorized in order to achieve
speedup factors of 2, 4 and 6 respectively?
(b) Suppose the program contains 15% of code that cannot be vectorized such as
sequential I/O operations. Now repeat question (a) above for the remaining code to
achieve the three speed up factors. [9+9]
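Both parts of 8.19 follow from the vector form of Amdahl's law: with scalar code 10x slower per result, normalized execution time is (1 − f) + f/10 for vectorized fraction f, and solving speedup = 1 / ((1 − f) + f/10) for f gives:

```python
# Question 8.19: vectorized fraction f needed for a given speedup when
# vector operations are 10x faster per result than scalar ones.
def fraction_needed(speedup, ratio=10):
    # speedup = 1 / ((1 - f) + f/ratio)  =>  f = (1 - 1/speedup) / (1 - 1/ratio)
    return (1 - 1 / speedup) / (1 - 1 / ratio)

for s in (2, 4, 6):
    print(s, round(fraction_needed(s), 4))
# Part (b): with 15% of the code unvectorizable, f can be at most 0.85,
# so any required fraction above 0.85 is unreachable for that program.
```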
8.20 What are the software tools used for the development of parallel programming? 8
8.21 Define the following:
(i) Multilevel page table
(ii) Hashing function
(iii) Resource conflict
(iv) Simple operation latency
(v) Address mapping
[5x2]
TUTORIALS 9
9.1 Give at least four differences between scalar and vector instructions. 4
9.2 Give difference between PCI, USB and SCSI Bus Systems. 4
9.3 Define grain size. Give an example for fine, medium and coarse grain. 4
9.4 Suggest some scheme to replace memory values in the cache when a cache miss occurs
and the cache is full. 4
9.5 What kind of instruction may cause problem with instruction prefetch? 4
9.6 Define diameter with respect to static networks. 4
9.7 Briefly explain data flow architecture. 4
9.8 Explain cache coherency with an example. How does a two-level cache increase
performance? Derive the formula for average access time in a three-level cache. 10
9.9 Describe briefly the dual bus design for shared memory multiprocessors with special
emphasis on clustered architecture. 8
9.10 Discuss in detail various interconnect architecture for MIMD computers. 9
9.11 How will the following code be vectorized? Explain.
for (i = 0; i < 1024; i++) {
    if (x[i] > 0)
        y[i] = z[i];
    else
        w[i] = w[i-1];
}
9
9.12 Describe the following terminology associated with multiprocessors.
(a) Mutual exclusion
(b) Critical section
(c) Hardware lock
(d) Semaphore.
[4×4.5]
9.13 Briefly explain speed up in a pipeline with K-stages. 10
9.14 Explain data dependency among pipeline stages. 8
9.15 What is the difference between a superscalar and a superpipelined processor? 9
9.16 What are the limitations of instruction level parallelism that affect superscalar
performance? 9
9.17 Explain branch handling strategies for a pipelined processor under hierarchical memory
system. 10
9.18 What is a stride? How does it affect the design of vector memory? Give an example. 8
TUTORIALS 10
10.1 Describe briefly SIMD machine model. 4
10.2 What are the limitations of instruction-level parallelism? Explain briefly. 4
10.3 Explain the cluster model of memory organisation, with the help of suitable diagrams. 4
10.4 Explain briefly how RISC architecture attempts to reduce execution time. 4
10.5 Differentiate between Loosely Coupled System and Tightly Coupled System? 4
10.6 Explain the crossbar network. 4
10.7 How is the performance of processors improved by using superscalar architecture?
4
10.8 Describe instruction pipelines for CISC scalar processors with respect to instruction
prefetching, data forwarding and hazard avoidance. 10
10.9 How can the penalties of branches and jumps be reduced in pipeline performance? 8
10.10 What are the three types of dependences? Describe each briefly with an
example. 12
10.11 An I/O bus standard defines how to connect a computer system and a device. What is
the mechanism for doing so? 6
10.12 What are the various techniques for reducing cache miss penalty? 12
10.13 Why do we need virtual memory among many processes? Explain briefly. 6
10.14 In program parallelism, what are different issues of parallelism? 6
10.15 What do you mean by wormhole routing? Where do we need it? Explain. 6
10.16 In vector processing, how do we handle the situation when vector length of the
program is not exactly 64? 6
10.17 How have vector processor instructions helped to improve parallelism? What are the
primary components of the instruction format of a typical vector processor? 12
10.18 What is pipeline bubble? Explain with an example. 6
10.19 Write down an O(n²) algorithm for SIMD matrix multiplication. Establish the
correctness of the complexity. 10
10.20 Describe the procedure of mapping DGs and SFGs to systolic arrays. 8
TUTORIALS 11
11.1 Comment on the blocking and non-blocking multi-stage interconnection networks. 4
11.2 Comment on the cache coherence problem for uniprocessor and multiprocessor systems.
4
11.3 Compare and contrast SCSI and USB bus systems with respect to operation,
advantage and disadvantage. 4
11.4 What are alternative forms of data flow architecture? Describe briefly. 4
11.5 Comment on the number and type of process states of any message passing system in
a multiprocessor system. 4
11.6 Critically comment on the operating systems of multiprocessor system. 4
11.7 Describe semaphore with respect to multiprocessing. 4
11.8 What is the optimum size of cache memory? Justify your answer. 6
11.9 Write algorithms to add a set of 8 integer values in multiprocessor environments where
processors are in cube-connected and mesh-connected SIMD systems. Compare their
time complexities. 12
11.10 Briefly discuss the alternative techniques to reduce the miss rate for cache memory.
10
11.11 Consider the following program segment:
DO I = 1, N
DO J = 2, N
S1: A(I,J) = B(I,J) + C(I,J)
S2: C(I,J) = D(I,J)/2
S3: E(I,J) = A(I,J-1) ** 2 + E(I,J-1)
Enddo
Enddo
Show all data dependences among the statements. 8
11.12 Discuss the performance parameters of interconnection network. 6
11.13 Describe independent memory bank techniques for high bandwidth memory. 6
11.14 Briefly discuss the alternative flow control strategies in any message passing
system? 6
11.15 What are the desirable characteristics of any processor, member of a multiprocessor
system? 4
11.16 Describe the VLIW approach to reducing the complexity of multiple-issue processors. 6
11.17 What are different addressing techniques of cache memory? Explain with example. 8
11.18 From a given reservation table of a pipelined architecture (uni-function), how will you
find the minimal and maximal values of average latencies? Justify your answer. 6
11.19 What are the various major hurdles of pipelining? Describe each briefly. 12
11.20 How is a vector processor different from a scalar processor? Suggest a suitable
instruction format for a vector processor. What are the major complexities for
vectorizing compilers when the application program’s vector data length differs
from the vector length of the vector processor? 10
11.21 A computer has a 16-way interleaved memory. We are required to access a 64 by 64
matrix. Compute the total time required for the access, if the elements are accessed
by
(i) row by row
(ii) column by column. 8
TUTORIALS 12
12.1 In any typical CPU, which class of Instructions will normally take maximum time to
execute? Justify your answer with proper reasoning. 4
12.2 For any Pipelined CPU, is it mandatory for each of its machine instructions to be of identical
length? Justify your answer with proper reasoning. 4
12.3 What are the pros and cons if the entire Electronic Memory Hierarchy L1 Cache, L2
Cache, L3 Cache as well as main memory are split into Instruction Memory & Data
Memory? 4
12.4 What are the key features of a Universal Serial Bus [USB] in terms of Peak Data
Transfer Rate, Voltage Level and Interface with the mainframe? 4
12.5 Which among the two: Memory-mapped I/O or I/O-mapped I/O is suitable in a Multi
Core Architecture with shared Main Memory? Justify your answer with proper reasoning.
4
12.6 Enlist the main features of a VLIW processor by an example. 4
12.7 Specify the main Vector Instruction types along with relevant examples and/or
Expressions. 4
12.8 A 5-stage pipeline is used to overlap all instructions except branch instructions.
The target of a branch cannot be fetched until the current instruction is completed.
Assume that 20% of the instructions are branch instructions, that each stage has the
same amount of delay, and that the pipeline clock is 10 ns.
(i) Compute the Average access Time.
(ii) Compute the Throughput of the system. 6
12.9 Explain in brief the following with reference to the Memory Hierarchy of Computer
System along with the relevance of each.
(i) Coherence Property.
(ii) Various types of Locality of Reference 6
12.10 Give a comparative study of the following two types of Buses PCI & SCSI 2 in terms of
the following parameters:
(i) Data Width.
(ii) Clock Rate.
(iii) Number of Bus Masters.
(iv) Peak Bandwidths.
[1.5x4]
12.11 Specify a Comparative study among the four (4) different Techniques that are
normally adopted to reduce Cache Miss Rate in a modern day processor. 6
12.12 In terms of an I/O device, define the following:
i) Throughput.
ii) Response Time.
iii) Show an approximate plot of Throughput versus Response Time. 6
12.13 Specify the associated problems with an Out of Order Execution of Assembly
level Instruction with some examples. 6
12.14 Specify, in a stepwise fashion, the typical Timing Protocol maintained between a
Master and a Slave over an Asynchronous System BUS. 6
12.15 Consider the following code sequence.
DIV.D F0,F2,F4
ADD.D F6,F0,F8
S.D F6, 0 (R1)
SUB.D F8,F10,F14
MUL.D F6,F10,F8
DADDUI R1,R1, #-8 ; decrement pointer 8 bytes (per DW)
BNE R1, R2, Loop ; Loop till Entire Array is Processed
Suppose we have a VLIW that could issue two memory references, two FP operations,
and one integer operation or branch in every clock cycle. Show an unrolled version of the
loop for such a processor. Unroll as many times as necessary to eliminate any stalls.
Ignore the branch delay slot. 12
12.18 Specify the three (3) most important capabilities needed in a Compiler to be
able to speculate branches ambitiously. 6
12.19 Name at least 2 (two) possible Structural Hazards along with their viable remedies in
any pipelined CPU. 6
12.20 Formulate a six-segment instruction pipeline for a computer. Specify the operations to
be performed in each segment. 6
12.21 Explain the following performance parameters of an Interconnection Network 6
(i) Bandwidth
(ii) Transport Latency
(iii) Receiver Overhead
12.22 Write short notes on the following:
(a) SISD, SIMD and MIMD
(b) Cross bar Switch
(c) Super Scalar Processor
(6+6+6)
TUTORIALS 13
13.1 Explain the different types of system performance factors in digital computer technology.
4
13.2 Explain pipelining. 4
13.3 Explain SISD and SIMD in detail. 4
13.4 Differentiate between CISC and RISC. 4
13.5 What are the main responsibilities of the Operating System (OS) while handling I/O
devices? 4
13.6 Briefly explain how compiler-based cache optimization reduces the cache miss
rate. 4
13.7 Write the properties of the instructions provided in vector processor. 4
13.8 How should new high level language (HLL) programs be compiled and executed
efficiently on processor architecture? 5
13.9 Assume an instruction cache miss rate of 2%, a data cache miss rate of 4%. If a
machine has a CPI of 2 without any memory stalls and the miss penalty is 40 cycles for
all misses, determine how much faster the machine would run with a perfect cache that
never missed. Assume a 36% combined frequency for load and store
instructions. 4
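The CPI accounting for 13.9: every instruction is fetched, so instruction-cache misses apply to all instructions, while data-cache misses apply only to the 36% that are loads or stores:

```python
# Question 13.9: speedup of a perfect cache.
cpi_base, miss_penalty = 2.0, 40
i_stalls = 0.02 * miss_penalty             # 0.8 cycles per instruction
d_stalls = 0.36 * 0.04 * miss_penalty      # 0.576 cycles per instruction
cpi_real = cpi_base + i_stalls + d_stalls  # 3.376
speedup = cpi_real / cpi_base
print(f"perfect cache is {speedup:.3f}x faster")   # 1.688x
```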
13.10 Write short note on Symbolic processors. Draw the processor architecture of the
Symbolic processor and explain how the stack models are useful to simplify instruction-
set design. 9
13.11 Differentiate Static Vs Dynamic interconnection network. Give examples of each
interconnection network used in modern multiprocessor architectures. 9
13.12 With respect to parallel computer models, explain following system performance
factors: 9
(i) Peak Rate
(ii) Speedup
(iii) Efficiency
13.13 Why are program segments required to be independent in parallel computer models?
Name the different forms of independence in parallelism and explain each in detail. 9
13.14 What is the cache inconsistency problem? Which protocol or method is useful to
overcome cache coherence issues? 9
13.15 Briefly explain the basic principles and structure of linear pipelining in processor. 6
13.16 Briefly explain how processors and memories are connected in multiprocessor system
with the help of Cache coherence protocol. 3
13.17 Analyze the data dependences among the following statements in a given program:
(i) Specify the reservation table for this pipeline with six columns and four rows.
(ii) List the set of forbidden latencies between task initiations.
(iii) Draw the state diagram which shows all possible latency cycles.
(iv) Determine the minimal average latency (MAL). 10
14.11 Explain memory consistency models in detail. 6
14.12 What is the difference between Trap and Interrupt? 2
14.13 Differentiate implicit and explicit parallelism. 6
14.14 What are the limitations on parallel execution? Explain in detail. 6
14.15 What do you mean by strobe? Explain source-initiated and destination-initiated strobe. 6
14.16 Give the flow chart for division of two signed-magnitude numbers. Discuss the logic of
the flow chart. 10
14.17 Explain hardware polling method for data transfer. 5
14.18 What is crossbar network? 3
14.19 Explain in detail the different mappings used for cache memory. Compare them. 8
14.20 What are interrupts? Explain the different types of interrupts. 6
14.21 Differentiate linear pipeline and non-linear pipeline processor. 4
14.22 What is parallelism? Differentiate between hardware and software parallelism. 6
14.23 What are the different shared memory multiprocessor models? Explain any one
model in detail. 6
14.24 What can be inferred from the term “dynamic instruction scheduling”? 6
*****
Tutorials 2011
15.1 Define Spinlock. What is busy waiting?
15.2 Explain the differences among the UMA, NUMA and COMA models.
15.3 What do you mean by implicit parallelism and explicit parallelism?
15.4 Explain briefly the Flynn’s Classification.
15.5 What is a thread? Why are threads needed?
15.6 How is synchronization of threads achieved? Explain with an example.
15.7 What is Vector Supercomputer?
15.8 Explain Hardware and Software parallelism.
15.9 Explain the term Sharing I/O Devices.
15.10 What is a Multicore processor? What are its advantages and disadvantages?
15.11 What are the various levels of program optimization or software optimization?
15.12 Explain parallelism in a uniprocessor system.
15.13 What is the difference between the shared memory model and the message passing model?
15.14 Define the term pipelining. Classify pipelining according to the levels of processing.
15.15 What do you mean by Pipeline hazards? What are the three major types of hazards?
15.16 Compare the features of RISC and CISC processors.
15.17 Explain the Superscalar pipelined design.
15.18 What is the objective of memory hierarchy in computer system?
15.19 What are the various CPU scheduling techniques?
15.20 What is a Dual core processor? Explain the difference between dual core and single
core processor.
15.21 What is Sun Niagara? Write some of its characteristics.
15.22 Explain Moore’s law.
15.23 What are the various coherency protocols?
15.24 What are the various memory consistency models?
15.25 What is the cache coherence problem? How is this problem dealt with in MIMD
architectures?
15.26 Describe briefly the concept of data parallel computation.
15.27 What are the various features of Intel Montecito model?
15.28 What do you mean by compiler optimization? What are its various types?
15.29 Explain the difference between computer organization and computer architecture.
15.30 What is Instruction Level Parallelism?
15.31 What is OpenMP? What are its core elements?
15.32 Write short notes on Local miss rate, Global miss rate, Multilevel Inclusion, Multilevel
Exclusion, compulsory miss and conflict miss.
15.33 Distinguish between shared memory and distributed shared memory.
15.34 Write about the Snoopy Bus Protocols.
15.35 Explain balancing of a subsystem bandwidth.
15.36 Explain the various barrier synchronization algorithms.
15.37 What is communication latency?
15.38 What are the various features of POWER4 processor?
15.39 What do you mean by Pipeline Stalling?
15.40 What is Digital Image Processing? Explain the fundamental steps in Digital Image
Processing.
Tutorials 2014-15
16.1 What is computer architecture? Describe the computer system generations in detail.
16.2 What is Multiprogramming?
16.3 What is parallel processing? Explain the parallel processing at various levels:
(i) Job or program level
(ii) Procedure level
(iii) Inter-instruction level
(iv) Intra-instruction level
16.4 Show the basic uniprocessor architecture and explain its major components.
16.5 Describe the parallel processing mechanism in the following categories:
(a) Multiplicity of functional units
(b) Parallelism and pipelining within CPU
(c) Overlapped CPU and I/O operation
(d) Use of hierarchical memory system
(e) Balancing of Subsystem bandwidth
(f) Multiprogramming and time-sharing
16.6 The execution times (in seconds) of 4 programs on 3 computers are given below:
                  Execution Time (seconds)
Program        Computer A   Computer B   Computer C
Program 1           1            10           20
Program 2        1000           100           20
Program 3         500          1000           50
Program 4         100           800          100
Assume that 100,000,000 instructions were executed in each case. Calculate the MIPS
rating for each program on each of the three machines. Based on these ratings, can you
draw a clear conclusion regarding the relative performance of the three computers?
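A worked sketch for 16.6, using MIPS = instruction count / (execution time × 10^6) with the execution times from the table and the stated 10^8 instructions per run.

```python
# Sketch for 16.6: MIPS ratings from the execution-time table above.
# MIPS = instruction count / (execution time in seconds * 1e6).

times = {  # program -> (Computer A, Computer B, Computer C), seconds
    "Program 1": (1, 10, 20),
    "Program 2": (1000, 100, 20),
    "Program 3": (500, 1000, 50),
    "Program 4": (100, 800, 100),
}
instructions = 100_000_000

mips = {p: tuple(instructions / (t * 1e6) for t in ts) for p, ts in times.items()}
for p, (a, b, c) in mips.items():
    print(f"{p}: A={a:g}  B={b:g}  C={c:g} MIPS")
```

No machine achieves the best MIPS rating on every program (A wins Program 1, C wins Program 2, and so on), so MIPS alone supports no clear conclusion about relative performance.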
16.7 What do you mean by pipeline computer? Explain its working.
16.8 Show the space-time diagram for pipelined processor for five pipeline stages.
16.9 What are array Computers?
16.10 Describe the Multiprocessor system and explain the three different interconnections:
(a) Time-shared common bus
(b) Crossbar switch network
(c) Multiport Memories
16.11 A workstation uses a 15-MHz processor with a claimed 10-MIPS rating to execute a
given program mix. Assume a one-cycle-delay memory access.
(i) What is the effective CPI (cycles per instruction) of this computer?
(ii) Suppose the computer is being upgraded with a 30-MHz clock. However, the speed
of the memory subsystem remains unchanged, and consequently two clock cycles
are needed per memory access. If 30% of the instructions require one memory
access and another 5% require two memory accesses per instruction, what is the
performance of the upgraded processor with a compatible instruction set and equal
instruction counts in the given program mix?
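A sketch of the usual solution to 16.11, under the common textbook interpretation that the slower memory after the upgrade adds one extra stall cycle per memory access on top of the original CPI.

```python
# Sketch for 16.11, assuming the upgrade adds +1 stall cycle per memory
# access (memory now takes 2 cycles instead of 1).

# (i) effective CPI = clock rate / MIPS rate
cpi_old = 15e6 / 10e6              # 1.5

# (ii) 30% of instructions make 1 memory access, 5% make 2,
# each access costing one extra cycle after the upgrade.
extra_cpi = 0.30 * 1 + 0.05 * 2    # 0.4 extra cycles per instruction
cpi_new = cpi_old + extra_cpi      # 1.9
mips_new = 30 / cpi_new            # ~15.8 MIPS at 30 MHz

print(f"(i) CPI = {cpi_old}  (ii) new CPI = {cpi_new}, MIPS = {mips_new:.2f}")
```

So the upgraded processor runs at roughly 15.8 MIPS rather than the 20 MIPS a doubled clock alone would suggest.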
16.12 Write Bernstein's conditions for parallelism. Detect the parallelism in the following
program.
P1 : C = D * E
P2 : M = G + C
P3 : A = B + C
P4 : C = L + M
P5 : F = G/E
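The pairwise check behind 16.12 can be sketched mechanically. Two statements Pi and Pj can execute in parallel iff Ii ∩ Oj = ∅, Oi ∩ Ij = ∅ and Oi ∩ Oj = ∅; the input/output sets below are read off the assignments by hand.

```python
# Bernstein's-condition check for P1..P5 above.

stmts = {  # name -> (input set, output set)
    "P1": ({"D", "E"}, {"C"}),   # C = D * E
    "P2": ({"G", "C"}, {"M"}),   # M = G + C
    "P3": ({"B", "C"}, {"A"}),   # A = B + C
    "P4": ({"L", "M"}, {"C"}),   # C = L + M
    "P5": ({"G", "E"}, {"F"}),   # F = G / E
}

def bernstein(i, j):
    """True iff Pi and Pj satisfy all three of Bernstein's conditions."""
    (Ii, Oi), (Ij, Oj) = stmts[i], stmts[j]
    return not (Ii & Oj) and not (Oi & Ij) and not (Oi & Oj)

names = sorted(stmts)
parallel = [(a, b) for k, a in enumerate(names) for b in names[k + 1:]
            if bernstein(a, b)]
print(parallel)  # the pairs that can run in parallel
```

The check yields P1 ∥ P5, P2 ∥ P3, P2 ∥ P5, P3 ∥ P5 and P4 ∥ P5; every other pair conflicts through C or M.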
16.13 Justify how a multithreaded architecture in a multiprocessor system increases
processor utilization.
16.14 Explain cache optimization techniques for reducing cache miss penalty.
16.15 Explain how memory hierarchy bridges the gap between CPU and memory speed.
16.16 Consider the execution of an object code with 200,000 instructions on a 40-MHz
processor. The program consists of 4 major types of instructions. The instruction mix
and the CPI needed for each instruction type are given below, based on the results of a
program trace experiment.
16.17 What are the system attributes for performance of computer system?
16.18 Compare instruction level parallelism with thread level parallelism.
16.19 Distinguish between multiprocessor and multicomputer based on their structures,
resource sharing and inter-processor communications.
16.20 What are main design issues of scalable MIMD computers? Explain them briefly.
16.21 A non-pipelined computer uses a 5-nsec clock. The average number of clock cycles
per instruction required by this machine is 4.5. When the machine is pipelined, it
requires a 6-nsec clock. Find the speedup due to pipelining.
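A one-line sketch for 16.21, assuming steady-state pipeline operation (one instruction per cycle, fill/drain overhead ignored for a long instruction stream):

```python
# Sketch for 16.21: steady-state pipeline speedup.

t_nonpipe = 4.5 * 5          # ns per instruction without pipelining (CPI x cycle)
t_pipe = 6.0                 # ns per instruction pipelined (one per 6-ns cycle)
speedup = t_nonpipe / t_pipe # 22.5 / 6 = 3.75
print(speedup)
```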
16.22 Explain the superpipelined superscalar design.
16.23 What is Multiprocessor scheduling?
16.24 What do you mean by Time Sharing, Multiprogramming and Multithreading?
16.25 Discuss the various performance issues of nonlinear pipeline processors.
16.26 What are the various limitations of Instruction Level Pipelining?
16.27 Consider the design of a three-level memory hierarchy with the following specifications
for memory characteristics.
The design goal is to achieve an effective memory – access time T=10.04 μs with a
cache hit ratio h1= 0.98 and a hit ratio h2= 0.9 in the main memory. Also, the total cost of
the memory hierarchy is upper-bounded by Rs. 15,000.
16.28 A 40 MHz processor was used to execute a benchmark program with the following
instruction mix and clock cycle counts.
Instruction type      No. of Instructions   Clock cycle count
Integer arithmetic         45,000                   1
Data transfer              32,000                   2
Floating point             15,000                   2
Control transfer            8,000                   2
Determine the effective CPI, MIPS rate and execution time for this program.
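A worked sketch for 16.28, computing the effective CPI as total cycles over total instructions, then the MIPS rate and execution time from it:

```python
# Sketch for 16.28: effective CPI, MIPS rate and execution time
# from the instruction mix in the table above.

clock_hz = 40e6
mix = {  # type -> (instruction count, cycles per instruction)
    "integer arithmetic": (45_000, 1),
    "data transfer":      (32_000, 2),
    "floating point":     (15_000, 2),
    "control transfer":   (8_000,  2),
}

total_instr = sum(n for n, _ in mix.values())        # 100,000 instructions
total_cycles = sum(n * c for n, c in mix.values())   # 155,000 cycles
cpi = total_cycles / total_instr                     # 1.55
mips = clock_hz / (cpi * 1e6)                        # ~25.8 MIPS
exec_time = total_cycles / clock_hz                  # 3.875 ms

print(f"CPI = {cpi}, MIPS = {mips:.2f}, time = {exec_time * 1e3:.3f} ms")
```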
16.29 What is Thread? Explain how Threads are created and terminated.
16.30 How Synchronization of Thread is achieved? Explain with examples.
16.31 What is dependence? Explain three major types of dependence.
16.32 Show the dependence graph among the statements with justification.
S1 : A = B + D
S2 : C = A * 3
S3 : A = A + C
S4 : E = A/2
16.33 Explain the characteristics of fine-grained SIMD architecture.
16.34 Explain data parallel pipelining with the help of block diagram.
16.35 Explain the various design issues for SIMD systems.
16.36 Differentiate the fundamental types of data-parallel systems on the basis of their
principal characteristics.
16.37 Describe briefly the concept of Data-parallel computation.
16.38 Explain the similarity and differences between the RAM model of serial computation
and PRAM model of parallel computation.
16.39 Describe briefly different PRAM models based on read or write conflicts.
16.40 Define linear pipeline processor. Explain Asynchronous and Synchronous models.
16.41 What is the need of Multistage network?
16.42 Consider the execution of the following code segment consisting of six statements.
Use Bernstein’s condition to detect the maximum parallelism embedded in this code.
Justify the portion that can be executed in parallel and the remaining portions that must
be executed sequentially.
S1 : X ← ( A + B ) * ( A – B)
S2 : Y ← ( C + D ) / ( C – D )
S3 : Z ← X + Y
S4 : A ← E * F
S5 : Y ← E – Z
S6 : B ← ( X – F ) * A
16.43 Write and explain the various instruction set principles.
16.44 What do you mean by Cache Coherence problem?
16.45 Explain the crossbar switch and multiport memory.
*****
Tutorial 17 - GBPEC December 2014 Q. Paper
17.1 What are the basic principle and parameters of linear pipelining? How is it different from
non-linear pipelining? Define all parameters with suitable examples. 10
17.2 What are data dependency hazards in parallelism? How are they detected? Explain with
a suitable example. 10
17.3 What are the parameters that affect scheduling for ILP? Define dynamic scheduling
with suitable example and parameters. 10
17.4 What are the basic characteristics and requirements of Vector processing? Define any
one method of vectorization with suitable example. 10
17.5 What are SIMD array processors? How does communication take place among the
PEs? Define with suitable example and figure. 10
17.6 What are the PRAM models? Define each of them with suitable figure and
characteristics. 10
17.7 What is the hypercube processor organisation? How do processors communicate in a
hypercube organisation? Explain with a suitable example. 10
17.8 What is a shared memory system and how is it categorized? Define each type of shared
memory system with a suitable figure. 10
17.9 Define any one static scheduling model with suitable example and figure. 10
17.10 What is load balancing? Define any one load balancing technique with suitable
example and figure. 10
17.11 Write a parallel algorithm for the Matrix multiplication. 10
17.12 What is pipeline chaining? How are collisions prevented in pipelining? Define with a
suitable example. 10
17.13 Write a short note on following: 5x4=20
(a) Loop Unrolling
(b) Hardware-based Speculation
(c) Amdahl’s Law
(d) OpenMP
Tutorial 18 - GBPEC December 2015 Q. Paper
18.1 What is branch handling? Draw and explain arithmetic pipelines. 10
18.2 Explain arithmetic pipelining and trace scheduling techniques. 10
18.3 A non-pipelined processor X has a clock rate of 25 MHz and an average CPI of 4.
Processor Y, an improved version of X, is designed with a 5-stage linear instruction
pipeline. However, due to latch delay and clock skew effects, the clock rate of Y is only
20 MHz. 10
(i) If a program containing 100 instructions is executed on both processors, what is
the speedup of processor Y compared with that of processor X?
(ii) Calculate the MIPS rate of each processor during the execution of this particular
program.
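A worked sketch for 18.3, assuming the standard k + (n − 1) cycle count for a k-stage linear pipeline executing n instructions:

```python
# Sketch for 18.3: speedup and MIPS of pipelined Y vs. non-pipelined X.
# Assumes Y needs k + (n - 1) cycles for n instructions (k = 5 stages).

n = 100
# Processor X: non-pipelined, 25 MHz, average CPI = 4
t_x = n * 4 / 25e6                 # 16 microseconds
# Processor Y: 5-stage pipeline at 20 MHz
cycles_y = 5 + (n - 1)             # 104 cycles
t_y = cycles_y / 20e6              # 5.2 microseconds

speedup = t_x / t_y                # ~3.08
mips_x = n / (t_x * 1e6)           # 6.25 MIPS
mips_y = n / (t_y * 1e6)           # ~19.23 MIPS
print(f"speedup = {speedup:.2f}, MIPS_X = {mips_x}, MIPS_Y = {mips_y:.2f}")
```

So despite its slower clock, the pipelined processor Y is about 3.08 times faster on this program.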
18.4 What is a dynamic scheduling algorithm? Explain vector processing. 10
18.5 Explain vectorization methods and static scheduling with example. 10
18.6 Explain exploiting instruction level parallelism with example. 10
18.7 What is a SIMD array processor? Explain SIMD interconnection network. 10
18.8 What is data and control parallelism? Explain Amdahl’s law. 10
18.9 What do you understand by PRAM algorithms? Discuss and explain with suitable
example about the PRAM algorithm for merging two sorted lists. 10
18.10 Define parallel computing. What are the fundamental issues in parallel processing?
Why is parallel computing required? 10
18.11 Explain shared memory and message passing systems. 10
18.12 What is a processor organization? Explain binary tree and hypercube. 10
18.13 What is load balancing? Explain dilation loading with example. 10
18.14 Explain processor graphs and the embedding of task graphs. 10
18.15 Explain how degree of parallelism and number of processors affect the performance of
a parallel computing system. Discuss various speedup performance laws. 10
*****
Breakup of Questions:
Tutorial 1 – 29 Questions
Tutorial 2 – 18 Questions
Tutorial 3 – 19 Questions
Tutorial 4 – 20 Questions
Tutorial 5 – 17 Questions
Tutorial 6 – 22 Questions
Tutorial 7 – 21 Questions
Tutorial 8 – 21 Questions
Tutorial 9 – 18 Questions
Tutorial 10 – 20 Questions
Tutorial 11 – 21 Questions
Tutorial 12 – 22 Questions
Tutorial 13 – 22 Questions
Tutorial 14 – 24 Questions
Tutorial 15 – 40 Questions
Tutorial 16 – 45 Questions
Tutorial 17 – 13 Questions
Tutorial 18 – 15 Questions
Total – 294 + 113 = 407 Questions
Attempt any 150 different questions.