
Advanced Computer Architecture (TCS-703) Question Bank

UNIT – I PROGRAM AND NETWORK PROPERTIES

Q 1. Differentiate the generations of Electronic Computers with their merits and demerits.

1. FIRST GENERATION
 Introduction:
1. 1946-1959 is the period of first generation computers.
2. J. P. Eckert and J. W. Mauchly invented the first successful electronic computer called ENIAC. ENIAC stands
for "Electronic Numerical Integrator And Computer".
 Few Examples are:
1. ENIAC
2. EDVAC
3. UNIVAC
4. IBM-701
5. IBM-650

 Advantages:
1. It made use of vacuum tubes, the only electronic components available in those days.
2. These computers could calculate in milliseconds.
 Disadvantages:
1. These were very big in size; a machine could weigh about 30 tons.
2. These computers were based on vacuum tubes.
3. These computers were very costly.
4. They could store only a small amount of information, because magnetic drums were used for storage.
2. SECOND GENERATION
 Introduction:
1. 1959-1965 is the period of second-generation computers.
2. Second generation computers were based on transistors instead of vacuum tubes.
 Few Examples are:
1. Honeywell 400
2. IBM 7094
3. CDC 1604
4. CDC 3600
5. UNIVAC 1108
… many more
 Advantages:
1. Due to the use of transistors instead of vacuum tubes, the size of the electronic components decreased. This
resulted in computers smaller than those of the first generation.
2. They consumed less energy and did not produce as much heat as the first generation.
 Disadvantages:
1. A cooling system was required.
2. Constant maintenance was required.
3. Only used for specific purposes.

3. THIRD GENERATION
 Introduction:
1. 1965-1971 is the period of third generation computer.
2. These computers were based on Integrated circuits.
3. The IC was invented by Jack Kilby and Robert Noyce in 1958-1959.
4. An IC is a single component containing a number of transistors.
 Few Examples are:
1. PDP-8
2. PDP-11
3. ICL 2900
4. IBM 360
5. IBM 370
… and many more
 Advantages:
1. These computers were cheaper as compared to second-generation computers.
2. They were fast and reliable.
3. The use of ICs reduced the size of the computer.
4. ICs not only reduced the size of the computer but also improved its performance compared to
previous computers.
5. This generation of computers had a larger storage capacity.
 Disadvantages:
1. IC chips are difficult to maintain.
2. Highly sophisticated technology was required for the manufacture of IC chips.
3. Air conditioning is required.

4. FOURTH GENERATION
 Introduction:
1. 1971-1980 is the period of fourth generation computer.
2. This technology is based on Microprocessor.
3. A microprocessor is used in a computer for any logical and arithmetic function to be performed in any
program.
4. Graphical User Interface (GUI) technology was exploited to offer more comfort to users.
 Few Examples are:
1. IBM 4341
2. DEC 10
3. STAR 1000
4. PDP 11
… and many more
 Advantages:
1. Fastest in computation, and size reduced as compared to the previous generation of computers.
2. Heat generated is negligible.
3. Small in size as compared to previous generation computers.
4. Less maintenance is required.
5. All types of high-level languages can be used with these computers.
 Disadvantages:
1. Microprocessor design and fabrication are very complex.
2. Air conditioning is required in many cases due to the presence of ICs.
3. Advanced technology is required to make the ICs.

5. FIFTH GENERATION
 Introduction:
1. The period of the fifth generation is 1980 onwards.
2. This generation is based on artificial intelligence.
3. The aim of the fifth generation is to make devices that can respond to natural-language input and are
capable of learning and self-organization.
4. This generation is based on ULSI (Ultra Large Scale Integration) technology, resulting in
microprocessor chips having ten million electronic components.
 Few Examples are:
1. Desktop
2. Laptop
3. NoteBook
4. UltraBook
5. Chromebook
… and many more
 Advantages:
1. It is more reliable and works faster.
2. It is available in different sizes and unique features.
3. It provides computers with more user-friendly interfaces with multimedia features.
 Disadvantages:
1. They need very low-level languages.
2. Over-reliance on them may dull human thinking.

Q 2. Explain Flynn’s Classification Schemes and also give diagram to each category.

M.J. Flynn proposed a classification for the organization of a computer system by the number of instructions and data
items that are manipulated simultaneously.

The sequence of instructions read from memory constitutes an instruction stream.

The operations performed on the data in the processor constitute a data stream.


Note: The term 'Stream' refers to the flow of instructions or data.

Parallel processing may occur in the instruction stream, in the data stream, or both.

Flynn's classification divides computers into four major groups that are:

1. Single instruction stream, single data stream (SISD)

SISD stands for 'Single Instruction and Single Data Stream'. It represents the organization of a single computer
containing a control unit, a processor unit, and a memory unit. Instructions are decoded by the control unit,
which then sends them to the processor unit for execution.

The data stream flows between the processor and memory bi-directionally.

Examples: Older generation computers, minicomputers, and workstations


2. Single instruction stream, multiple data stream (SIMD)

SIMD stands for 'Single Instruction and Multiple Data Stream'. It represents an organization that includes
many processing units under the supervision of a common control unit. All processors receive the same
instruction from the control unit but operate on different items of data.

SIMD is mainly dedicated to array processing machines. However, vector processors can also be seen
as a part of this group.

3. Multiple instruction stream, single data stream (MISD)

MISD stands for 'Multiple Instruction and Single Data Stream'. In MISD, multiple processing units
operate on one single data stream. Each processing unit operates on the data independently via a
separate instruction stream.

The MISD structure is only of theoretical interest, since no practical system has been constructed
using this organization.

Where, M = Memory Modules, CU = Control Unit, P = Processor Units

4. Multiple instruction stream, multiple data stream (MIMD)

MIMD stands for 'Multiple Instruction and Multiple Data Stream'. In this organization, all processors
in a parallel computer can execute different instructions and operate on different data at the
same time.

In MIMD, each processor has a separate program, and an instruction stream is generated from
each program.

Q 3. Define the System Attributes (Ic, p, m, k and τ) with formulas.
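A sketch of the standard definitions, assuming the usual notation: Ic is the instruction count, p the number of processor cycles per instruction, m the number of memory references per instruction, k the ratio of memory-access time to processor-cycle time, and τ the processor cycle time (f = 1/τ):

```latex
\begin{align*}
\mathrm{CPI} &= p + m\,k && \text{(average cycles per instruction)} \\
T &= I_c \times \mathrm{CPI} \times \tau \;=\; I_c\,(p + m\,k)\,\tau && \text{(total CPU time)} \\
\mathrm{MIPS\ rate} &= \frac{I_c}{T \times 10^{6}} \;=\; \frac{f}{\mathrm{CPI} \times 10^{6}} && \text{(million instructions per second)}
\end{align*}
```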


Q 4. How many categories of Shared multiprocessors and explain each of them?

A shared-memory multiprocessor is an architecture consisting of a modest number of processors, all of
which have direct (hardware) access to all the main memory in the system.

1) UMA: A symmetric multiprocessor (SMP) is a system architecture in which all the processors can
access each memory block in the same amount of time. This capability is often referred to as
uniform memory access (UMA).

2) NUMA: Nonuniform memory access (NUMA) architectures retain access by all processors to all the
main memory blocks within the system, but they do not guarantee equal access times to all memory
blocks from all processors.

3) COMA: In a Cache-Only Memory Architecture there is no memory hierarchy at each processor node;
remote caches are assisted by distributed cache directories. It is a variant of the CC-NUMA model.

Q 5: - What is Bernstein’s Condition?

Bernstein's conditions are the conditions applied to two statements S1 and S2 that are to be executed
on the processor. They state that the following three conditions must be satisfied for two successive
statements S1 and S2 to be executed concurrently and still produce the same result (Ii is the input
set and Oi the output set of statement Si):

1. I1 ∩ O2 = ∅
2. I2 ∩ O1 = ∅
3. O1 ∩ O2 = ∅

Show the dependence graph among the statements with justification.

a) S1: A = B + D
   S2: C = A x 3
   S3: A = A + C
   S4: E = A / 2

b) S1: X = SIN (Y)
   S2: Z = X + W
   S3: Y = -2.5 x W
   S4: X = COS (Z)
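Bernstein's conditions can be checked mechanically from the read and write sets of each statement. A minimal Python sketch, with the sets transcribed by hand from statement set (a):

```python
def bernstein_parallel(stmt1, stmt2):
    """stmtX = (reads, writes) as sets. True iff the pair satisfies
    Bernstein's three conditions: I1∩O2 = ∅, I2∩O1 = ∅, O1∩O2 = ∅."""
    (r1, w1), (r2, w2) = stmt1, stmt2
    return not (r1 & w2) and not (r2 & w1) and not (w1 & w2)

# Statement set (a): S1: A = B + D, S2: C = A x 3, S3: A = A + C, S4: E = A / 2
S1 = ({"B", "D"}, {"A"})
S2 = ({"A"}, {"C"})
S3 = ({"A", "C"}, {"A"})
S4 = ({"A"}, {"E"})

print(bernstein_parallel(S1, S2))  # False: S2 reads A, which S1 writes (flow dependence)
print(bernstein_parallel(S2, S4))  # True: no overlapping sets, so S2 and S4 may run concurrently
```

Each False result corresponds to an edge in the dependence graph the question asks for.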

Q 6: - Differentiate between software and Hardware Parallelism. With the help of a suitable example,
discuss the mismatch between software parallelism and hardware parallelism.
Q 7: - Discuss the grain size and latency. Also Explain the level of parallelism in program execution on
modern computers.

In parallel computing, granularity (or grain size) of a task is a measure of the amount of work
(or computation) which is performed by that task.

In computer networking, latency is an expression of how much time it takes
for a data packet to travel from one designated point to another.
Ideally, latency will be as close to zero as possible.

Bit-level parallelism: It is the form of parallel computing based on
increasing the processor's word size. It reduces the number of
instructions that the system must execute in order to perform a
task on large-sized data.
Instruction-level parallelism: A processor can ordinarily complete
less than one instruction per clock cycle. When instructions are
re-ordered and grouped so that they are executed concurrently
without affecting the result of the program, this is called
instruction-level parallelism.
Task Parallelism: Task parallelism employs the decomposition of
a task into subtasks and then allocates each of the subtasks for
execution. The processors perform execution of the subtasks
concurrently.
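The task-parallelism idea can be sketched with Python's standard concurrent.futures module; the data, the chunking into four subtasks, and the summing subtask are illustrative assumptions, not part of the original text:

```python
from concurrent.futures import ThreadPoolExecutor

def subtask(chunk):
    """One subtask: sum its own slice of the data independently."""
    return sum(chunk)

data = list(range(1, 101))
chunks = [data[i:i + 25] for i in range(0, 100, 25)]  # decompose into 4 subtasks

# Task parallelism: each subtask is allocated to a worker and runs concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(subtask, chunks))

total = sum(partials)
print(total)  # 5050
```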

UNIT – II SYSTEM INTERCONNECT ARCHITECTURES

Q 8: - Define the following with the help of an example.

(i) Node Degree: The degree of a node is the number of edges connected to the node.

(ii) Network Diameter: The diameter of a network is the longest of all the calculated shortest
paths in the network, i.e., the distance between the two most distant nodes in the network.

(iii) Bisection Width: The minimum number of edges that must be removed in order to divide the
network into two halves of equal size, or of sizes differing by at most one node.

(iv) Data Routing Function: Routing refers to the process of intelligently selecting the shortest
and most reliable path over which to send data to its ultimate destination.
Q 9: - Define the following Static connection networks with example.

Static networks provide fixed connections between nodes. (A node can be
a processing unit, a memory module, an I/O module, or any combination
thereof.) With a static network, links between nodes are unchangeable and
cannot be easily reconfigured.

(i) Ring and Chordal Ring: A chordal ring is an augmented ring (a circulant graph). Formally it is
defined by the pair (n, L), where n is the number of nodes of the ring and L is the set of chord
lengths; a chord of length l connects every pair of nodes of the ring that are at distance l in
the ring.

(ii) Barrel Shifter: A barrel shifter is a digital circuit that can shift a data word by a
specified number of bits using only pure combinational logic, without any sequential logic.

(iii) Systolic Arrays: A systolic array is a network of processors that rhythmically compute and
pass data through the system.

(iv) Cube-Connected Cycles: The cube-connected cycles network is an undirected cubic graph,
formed by replacing each vertex of a hypercube graph by a cycle.

(v) k-ary n-cube networks: There are k nodes in each of the n dimensions; each node can be
labelled by an n-digit number of radix (base) k, and each node is connected to every node whose
label differs in only one digit by one.
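As a sketch of the k-ary n-cube definition, assuming wraparound links (digits change by ±1 modulo k, as in a torus), the neighbours of a node can be enumerated directly from its radix-k label:

```python
def kary_ncube_neighbors(label, k, n):
    """Neighbors of a node in a k-ary n-cube: vary each of the n digits
    of the radix-k label by +/-1 (wrapping modulo k), one digit at a time."""
    digits = [(label // k**i) % k for i in range(n)]
    nbrs = set()
    for i in range(n):
        for delta in (1, -1):
            d = digits.copy()
            d[i] = (d[i] + delta) % k
            nbrs.add(sum(v * k**j for j, v in enumerate(d)))
    return sorted(nbrs)

# A 2-ary 3-cube is the ordinary binary hypercube with 8 nodes:
print(kary_ncube_neighbors(0, k=2, n=3))  # [1, 2, 4]
# A 4-ary 2-cube (4x4 torus): node 0 has degree 2n = 4:
print(kary_ncube_neighbors(0, k=4, n=2))  # [1, 3, 4, 12]
```

Note that for k = 2 the +1 and -1 moves coincide, so the degree drops from 2n to n, matching the hypercube.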

Q 10: - Define the following Dynamic connection networks with example.

With a dynamic network, the connections between nodes are established
by the settings of a set of interconnected switch boxes.

(i) Digital Buses: A bus is a communication system that transfers data between components inside
a computer, or between computers. The expression covers all related hardware components
(wire, optical fiber, etc.) and software, including communication protocols.

(ii) Switch Modules: An a x b switch module has a inputs and b outputs; the switch settings
determine which inputs are dynamically connected to which outputs at any moment.

(iii) Multistage Interconnection Networks (MINs): A class of high-speed computer networks
usually composed of processing elements (PEs) on one end of the network and memory elements
(MEs) on the other end, connected by switching elements (SEs).

Q 11: - What is an Omega Network? Explain the various omega network routing.
An Omega network is a network configuration often used in parallel
computing architectures. It is an indirect topology that relies on the perfect
shuffle interconnection algorithm.

Destination-tag routing
In destination-tag routing, switch settings are determined solely by the message
destination. The most significant bit of the destination address is used to select the
output of the switch in the first stage; if the most significant bit is 0, the upper output
is selected, and if it is 1, the lower output is selected. The next-most significant bit of
the destination address is used to select the output of the switch in the next stage,
and so on until the final output has been selected.
XOR-tag routing
In XOR-tag routing, switch settings are based on (source PE) XOR (destination PE).
This XOR-tag contains 1s in the bit positions that must be swapped and 0s in the bit
positions that both source and destination have in common. The most significant bit
of the XOR-tag is used to select the setting of the switch in the first stage; if the most
significant bit is 0, the switch is set to pass-through, and if it is 1, the switch is
crossed. The next-most significant bit of the tag is used to set the switch in the next
stage, and so on until the final output has been selected.
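Both routing schemes reduce to reading off the tag bits, most significant bit first. A minimal Python sketch, assuming an 8x8 (three-stage) omega network of 2x2 switches:

```python
def destination_tag_settings(dst, stages=3):
    """Output selections per stage (MSB first): tag bit 0 selects the
    upper switch output, tag bit 1 selects the lower output."""
    bits = format(dst, f"0{stages}b")
    return ["upper" if b == "0" else "lower" for b in bits]

def xor_tag_settings(src, dst, stages=3):
    """Switch states per stage from the XOR-tag (MSB first): tag bit 0
    means pass-through, tag bit 1 means crossed."""
    bits = format(src ^ dst, f"0{stages}b")
    return ["through" if b == "0" else "cross" for b in bits]

print(destination_tag_settings(5))  # ['lower', 'upper', 'lower'] for destination 101
print(xor_tag_settings(6, 1))       # 110 XOR 001 = 111, so all three switches are crossed
```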

Q 12: - Explain the routing in an omega network having permutation function f = (0, 6, 4, 7, 3) (1, 5) (2).
Q 13: - What is meant by cache coherence problem? Describe various protocols (Snoopy Bus Protocols)
to handle cache coherence.

Cache coherence refers to the problem of keeping the data held in the individual processors'
caches consistent. The main problem is dealing with writes by a processor.

Snoopy Protocols:
 Snoopy protocols distribute the responsibility for maintaining
cache coherence among all of the cache controllers in a
multiprocessor system.
 A cache must recognize when a line that it holds is shared with
other caches.
 When an update action is performed on a shared cache line, it
must be announced to all other caches by a broadcast
mechanism.
 Each cache controller is able to "snoop" on the network to
observe these broadcast notifications and react accordingly.
 Snoopy protocols are ideally suited to a bus-based
multiprocessor, because the shared bus provides a simple
means for broadcasting and snooping.
 Two basic approaches to the snoopy protocol have been
explored: write-invalidate and write-update (write-broadcast).
 With a write-invalidate protocol, there can be multiple readers but
only one writer at a time.
 Initially, a line may be shared among several caches for reading
purposes.
 When one of the caches wants to perform a write to the line, it
first issues a notice that invalidates that line in the other caches,
making the line exclusive to the writing cache. Once the line is
exclusive, the owning processor can make local writes until
some other processor requires the same line.

(i) Write through Cache

(ii) Write Back Cache

(iii) Write Once Cache

Write-through - all data written to the cache is also written to memory at the same time.
Write-back - when data is written to a cache, a dirty bit is set for the affected block. The
modified block is written to memory only when the block is replaced.
Write-Once was the first MESI protocol defined. It has the optimization of
executing write-through on the first write and a write-back on all subsequent
writes, reducing the overall bus traffic in consecutive writes to the computer
memory.

Q 14: - Discuss the different Speedup Performance Laws:

(i) Amdahl’s Law

Amdahl's Law says that the time to solve a problem (t) using
a parallel algorithm is t = P/N + L, where P is the total amount of core
time for calculations that can be done in parallel, N is the number of
processors, and L is the time to do the parts of the program that cannot be
done in parallel.
(ii) Gustafson’s Law
Gustafson’s Law says that if you apply P processors to a
task that has serial fraction f, scaling the task to take the
same amount of time as before, the speedup is
Speedup = f + P(1 - f) = P - f(P - 1)
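Both laws can be evaluated numerically; a small sketch (the 10% serial fraction and 8 processors are illustrative values, not from the question):

```python
def amdahl_speedup(serial_fraction, n):
    """Amdahl's Law: fixed problem size, so the serial fraction caps speedup."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

def gustafson_speedup(serial_fraction, p):
    """Gustafson's Law (scaled problem size): Speedup = f + P(1 - f)."""
    return serial_fraction + p * (1.0 - serial_fraction)

# 10% serial work on 8 processors:
print(round(amdahl_speedup(0.1, 8), 2))     # 4.71
print(round(gustafson_speedup(0.1, 8), 2))  # 7.3
```

The gap between the two results illustrates why scaling the workload with the processor count (Gustafson) is more optimistic than fixing it (Amdahl).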

UNIT – III PROCESSOR AND MEMORY HIERARCHY

Q 15: - Write the differences between RISC and CISC architecture.

RISC vs CISC (each point contrasts RISC with CISC):

1. RISC is a Reduced Instruction Set Computer; CISC is a Complex Instruction Set Computer.
2. RISC emphasizes software to optimize the instruction set; CISC emphasizes hardware to
optimize the instruction set.
3. RISC uses a hard-wired programming unit; CISC uses a microprogramming unit.
4. RISC requires multiple register sets to store instructions; CISC requires a single register
set to store instructions.
5. RISC has simple decoding of instructions; CISC has complex decoding of instructions.
6. Use of the pipeline is simple in RISC; use of the pipeline is difficult in CISC.
7. RISC uses a limited number of instructions that require less time to execute; CISC uses a
large number of instructions that take more time to execute.
8. RISC uses LOAD and STORE as independent instructions in register-to-register interaction;
CISC incorporates LOAD and STORE into the instructions in register-to-memory interaction.
9. RISC has more transistors on memory registers; CISC uses its transistors for storing complex
instructions.
10. The execution time of RISC is very short; the execution time of CISC is longer.
11. RISC architecture is used in high-end applications like telecommunication, image processing,
and video processing; CISC architecture is used in low-end applications such as home automation
and security systems.
12. RISC has fixed-format instructions; CISC has variable-format instructions.
13. A program written for RISC architecture needs more space in memory; a program written for
CISC needs less space in memory.

Q 16: - Discuss the pipelining in Super Scalar with the help of an example.

Processors capable of achieving an instruction execution throughput
of more than one instruction per cycle are known as 'Superscalar
Processors'.
In the above diagram, there is a processor with two execution units: one
for integer and one for floating-point operations. The instruction fetch unit
is capable of reading two instructions at a time and storing them in the
instruction queue. In each cycle, the dispatch unit retrieves and decodes
up to two instructions from the front of the queue. If there is one integer
instruction, one floating-point instruction, and no hazards, both instructions
are dispatched in the same clock cycle.

Q 17: - Outline the architecture of VLIW processor and also simplify the pipeline operations of VLIW
Processors.
Very Long Instruction Word (VLIW) architecture in P-DSPs
(programmable DSP) increases the number of instructions that are
processed per cycle. It is a concatenation of several short instructions
and requires multiple execution units running in parallel, to carry out
the instructions in a single cycle. A language compiler or pre-
processor separates program instructions into basic operations and
places them into the VLIW processor, which then disassembles and
transfers each operation to an appropriate execution unit.
VLIW P-DSPs have a number of processing units (data paths) i.e.
they have a number of ALUs, MAC units, shifters, etc. The VLIW is
accessed from memory and is used to specify the operands and
operations to be performed by each of the data paths.

As shown in figure, the multiple functional units share a common


multiported register file for fetching the operands and storing the
results. Parallel random access by the functional units to the register
file is facilitated by the read/write cross bar. Execution of the
operations in the functional units is carried out concurrently with the
load/ store operation of data between a RAM and the register file.
Q 18: - With a suitable diagram explain the memory hierarchy and also discuss the basic properties of
each memory module.

Memory Hierarchy in Computer Architecture

The memory hierarchy design in a computer system mainly includes different


storage devices. Most computers include additional storage so that they can
work with more data than the main memory alone can hold. The memory
hierarchy is usually drawn as a hierarchical pyramid. The design of the
memory hierarchy is divided into two types: primary (internal) memory and
secondary (external) memory.

Primary Memory
The primary memory is also known as internal memory, and
it is directly accessible by the processor. This memory
includes main memory, cache, and CPU registers.

Secondary Memory
The secondary memory is also known as external memory,
and it is accessible by the processor through an
input/output module. This memory includes optical disks,
magnetic disks, and magnetic tape.

Q 19: - Explain the inclusion property and data transfers between adjacent levels of memory hierarchy
with an example.

Inclusion Property: it implies that all information items are originally stored in level
Mn. During the processing, subsets of Mn are copied into Mn-1. Similarly, subsets
of Mn-1 are copied into Mn-2, and so on.

Q 20: - Describe the concept of locality of reference and its types. Also differentiate among them.

Locality of reference refers to a phenomenon in which a


computer program tends to access the same set of memory
locations for a particular time period. In other
words, Locality of Reference refers to the tendency of the
computer program to access instructions whose addresses
are near one another. The property of locality of reference
is mainly shown by loops and subroutine calls in a program.
Temporal Locality –
Temporal locality means that the data or instruction currently
being fetched may be needed again soon. So we should store
that data or instruction in the cache memory, so that we can
avoid searching main memory again for the same data.

Spatial Locality –
Spatial locality means instruction or data near to the current
memory location that is being fetched, may be needed
soon in the near future. This is slightly different from the
temporal locality. Here we are talking about nearly located
memory locations while in temporal locality we were talking
about the actual memory location that was being fetched.
Q 21: - Write a short note on the following: -

(i) Hit Ratio: The hit ratio is calculated by counting cache hits and comparing them with the
total number of content requests received.

(ii) Effective Access Time: The effective access time is the average access time, weighted by
the relative probabilities of a hit or a miss. So if a hit happens 80% of the time and a miss
happens 20% of the time, then the effective (average) time over a large number of accesses will
be 0.8 * (hit time) + 0.2 * (miss time).
(iii) Hierarchy optimization.

UNIT IV - PIPELINING AND BACKPLANE BUS SYSTEM

Q 22: - Define linear pipeline processors and its classifications.

A linear pipeline processor is a cascade of processing stages which
are linearly connected to perform a fixed function over a stream of data
flowing from one end to the other.
Q 23: - What is asynchronous pipelining. Explain with a diagram.

Asynchronous pipelining is a form of parallelism that is


useful in both distributed and shared memory systems. We
show that asynchronous pipeline schedules are a
generalization of both noniterative DAG (directed acyclic
graph) schedules as well as simpler pipeline schedules,
unifying these two types of scheduling.

Q 24: - Define the following in terms of Pipelining.

(i) Speedup factor


In computer architecture, speedup is a number that measures
the relative performance of two systems processing the same
problem. More technically, it is the improvement in speed of
execution of a task executed on two similar architectures with
different resources.
(ii) PCR (Performance/Cost Ratio)
A pipeline performance/cost ratio (PCR) has been defined as
PCR = Throughput/(c + kh), where Throughput = 1/t
and t = t_latch + T/k. T is the total time required for the nonpipelined
execution. Derive an expression for the optimal number of pipeline
stages, k_opt, that maximizes the PCR.
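A sketch of the requested derivation, treating the stage count k as a continuous variable:

```latex
\mathrm{PCR}(k) = \frac{1}{\left(t_{\mathrm{latch}} + T/k\right)\left(c + kh\right)}
\qquad
g(k) = \left(t_{\mathrm{latch}} + \frac{T}{k}\right)(c + kh)
     = c\,t_{\mathrm{latch}} + h\,t_{\mathrm{latch}}\,k + \frac{cT}{k} + hT
```

Maximizing PCR means minimizing the denominator g(k). Setting g'(k) = h t_latch - cT/k^2 = 0 gives

```latex
k_{\mathrm{opt}} = \sqrt{\frac{cT}{h\,t_{\mathrm{latch}}}}
```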
(iii) Efficiency and Throughput: Efficiency measures the amount of work done, regardless of
how much completed product there is - it's process-oriented. Throughput is the rate of
production or the rate at which something can be processed (throughput = output / duration).

Q 25: - Consider the five-stage pipelined processor specified by the following reservation table

(a) List the set of forbidden latencies and the collision vector".

(b) Draw a state transition diagram showing all possible initial sequences {cycles} without causing a
collision in the pipeline.

(c) List all the simple cycles from the state diagram.

(d) identify the greedy cycles among the simple cycles.

(e) What is the minimum average latency (MAL) of this pipeline?


Q 26: - Explain the different hardware support for exposing ILP. Explain ILP with a suitable example.

Instruction-level Parallelism (ILP) is a family of processor and compiler


design techniques that speed up execution by causing individual
machine operations, such as memory loads and stores, integer
additions and floating point multiplications, to execute in parallel.

Example :
Suppose, 4 operations can be carried out in single clock cycle.
So there will be 4 functional units, each attached to one of the
operations, branch unit, and common register file in the ILP
execution hardware. The sub-operations that can be performed
by the functional units are Integer ALU, Integer Multiplication,
Floating Point Operations, Load, Store. Let the respective
latencies be 1, 2, 3, 2, 1.
Let the sequence of instructions be –

1. y1 = x1*1010
2. y2 = x2*1100
3. z1 = y1+0010
4. z2 = y2+0101
5. t1 = t1+1
6. p = q*1000
7. clr = clr+0010
8. r = r+0001
Sequential record of execution vs. Instruction-level Parallel
record of execution –

Q 27: - Describe basic pipelining and differentiate between instruction and arithmetic pipelining with
example of each.

Pipelining is the process of accumulating instructions from the


processor through a pipeline. It allows storing and executing
instructions in an orderly process. It is also known as pipeline
processing. Pipelining is a technique where multiple instructions are
overlapped during execution.

1. Arithmetic Pipeline :
An arithmetic pipeline divides an arithmetic problem into various
sub problems for execution in various pipeline segments. It is
used for floating point operations, multiplication and various
other computations. The process or flowchart arithmetic pipeline
for floating point addition is shown in the diagram.
Floating point addition using arithmetic pipeline :
The following sub operations are performed in this case:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalise the result
First of all the two exponents are compared and the larger of the two
exponents is chosen as the result exponent. The difference in
the exponents then decides how many places we must shift the
mantissa of the number with the smaller exponent to the right. After
this shift, both mantissas are aligned. Finally the addition of both
numbers takes place, followed by normalisation of the result in the
last segment.
Example:
Let us consider two numbers,

X=0.3214*10^3 and Y=0.4500*10^2

Explanation:
First of all the two exponents are subtracted to give 3-2=1. Thus
3 becomes the exponent of the result, and the mantissa of the
smaller number is shifted 1 place to the right to give
Y=0.0450*10^3

Finally the two numbers are added to produce


Z=0.3664*10^3

As the result is already normalized the result remains the same.


2. Instruction Pipeline :
In this a stream of instructions can be executed by overlapping
fetch, decode and execute phases of an instruction cycle. This
type of technique is used to increase the throughput of the
computer system. An instruction pipeline reads instruction from
the memory while previous instructions are being executed in
other segments of the pipeline. Thus we can execute multiple
instructions simultaneously. The pipeline will be more efficient if
the instruction cycle is divided into segments of equal duration.
In the most general case computer needs to process each
instruction in following sequence of steps:
1. Fetch the instruction from memory (FI)
2. Decode the instruction (DA)
3. Calculate the effective address
4. Fetch the operands from memory (FO)
5. Execute the instruction (EX)
6. Store the result in the proper place
The flowchart for instruction pipeline is shown below.
Let us see an example of instruction pipeline.
Example:
Q 28: - Explain the Tomasulo’s Algorithm for dynamic instruction scheduling.

Tomasulo's algorithm is a computer architecture


hardware algorithm for dynamic scheduling of instructions that allows
out-of-order execution and enables more efficient use of multiple
execution units.
Q 29: - Explain the instruction level parallelism with dynamic approaches. What is pipelining? Explain
various hazards involved in implementing pipelining.

UNIT V - DEVELOPING PARALLEL COMPUTING APPLICATIONS

Q 30: - Discuss the PRAM Algorithms with the help of some examples.

Parallel Random Access Machines (PRAM) is a model, which is


considered for most of the parallel algorithms. Here, multiple
processors are attached to a single block of memory. A PRAM model
contains −
 A set of similar type of processors.
 All the processors share a common memory unit. Processors
can communicate among themselves through the shared
memory only.
 A memory access unit (MAU) connects the processors with the
single shared memory.
Here, n processors can perform independent operations
on n data items in a particular unit of time. This may result in
simultaneous access to the same memory location by different
processors.
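The classic PRAM example is summing n values with n processors in O(log n) parallel steps. This sequential Python sketch simulates that schedule: every addition inside one pass of the inner loop is independent of the others, so on a PRAM they would all execute in the same time step:

```python
def pram_sum(values):
    """Simulated PRAM reduction: ceil(log2 n) parallel steps; in step s,
    processor i adds in the partial sum held 2**s positions away."""
    a = list(values)
    n = len(a)
    step = 1
    while step < n:
        # These additions are mutually independent: one parallel step.
        for i in range(0, n - step, 2 * step):
            a[i] += a[i + step]
        step *= 2
    return a[0]

print(pram_sum(range(1, 9)))  # 36, computed in log2(8) = 3 parallel steps
```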

Q 31: - Define following parallel computing applications: -

(i) Pre order tree traversal


Preorder traversal visits the root first, then traverses the left
subtree (call Preorder(left-subtree)), and finally traverses the
right subtree (call Preorder(right-subtree)).
Uses of Preorder: Preorder traversal is used to create a
copy of the tree. Preorder traversal is also used to get the prefix
expression of an expression tree.
(ii) Merging two sorted lists
Write a SortedMerge() function that takes two lists, each of
which is sorted in increasing order, and merges the two
together into one list which is in increasing order.
SortedMerge() should return the new list.
(iii) Parallel Quick Sort
We randomly choose a pivot from one of the processes and
broadcast it to every process. Each process divides its
unsorted list into two lists: those smaller than (or equal to) the
pivot, and those greater than the pivot. Each process in the upper
half of the process list sends its "low list" to a partner process in
the lower half.
Given a list of numbers: {79, 17, 14, 65, 89, 4, 95, 22, 63, 11 }
The first number, 79, is chosen as pivot
Low list contains {17, 14, 65, 4, 22, 63, 11 } High list contains {89, 95 }
For sublist {17, 14, 65, 4, 22, 63, 11 }, choose 17 as pivot
Low list contains {14, 4, 11 } High list contains {65, 22, 63 } . . . { 4, 11, 14, 17, 22, 63, 65 } is
the sorted result of sublist {17, 14, 65, 4, 22, 63, 11 }
For sublist {89, 95 } choose 89 as pivot Low list is empty (no need for further recursions)
High list contains {95 } (no need for further recursions) {89, 95 } is the sorted result of sublist
{89, 95 }
Final sorted result: { 4, 11, 14, 17, 22, 63, 65, 79, 89, 95 }
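The partitioning step traced above can be sketched sequentially (in the real parallel algorithm each process partitions its own chunk concurrently after the pivot broadcast; this sequential sketch just reproduces the trace):

```python
def partition(values):
    """One quicksort step: the first element is broadcast as the pivot,
    and every remaining element goes to the low or high list."""
    pivot, rest = values[0], values[1:]
    low = [v for v in rest if v <= pivot]
    high = [v for v in rest if v > pivot]
    return pivot, low, high

def quicksort(values):
    if len(values) <= 1:
        return list(values)
    pivot, low, high = partition(values)
    return quicksort(low) + [pivot] + quicksort(high)

data = [79, 17, 14, 65, 89, 4, 95, 22, 63, 11]
pivot, low, high = partition(data)
print(low)              # [17, 14, 65, 4, 22, 63, 11]
print(high)             # [89, 95]
print(quicksort(data))  # [4, 11, 14, 17, 22, 63, 65, 79, 89, 95]
```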

Q 32: - Explain the Jacobi Algorithm for linear systems.


In numerical linear algebra, the Jacobi method is an iterative algorithm for
determining the solutions of a strictly diagonally dominant system of linear
equations. Each diagonal element is solved for, and an approximate value
is plugged in. The process is then iterated until it converges.

Example :
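A minimal sketch of the method in Python; the 2x2 system below is an illustrative, strictly diagonally dominant example chosen so the iteration converges:

```python
def jacobi(A, b, iterations=50):
    """Jacobi iteration for a strictly diagonally dominant system Ax = b:
    each x[i] is solved from its own row using only the previous iterate."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

# Diagonally dominant system: 4x + y = 9, x + 3y = 5, exact solution x = 2, y = 1
A = [[4.0, 1.0], [1.0, 3.0]]
b = [9.0, 5.0]
x = jacobi(A, b)
print([round(v, 6) for v in x])  # [2.0, 1.0]
```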
