Advanced Computer Architecture
Q 1. Differentiate the generations of Electronic Computers with their merits and demerits.
1. FIRST GENERATION
Introduction:
1. 1946-1959 is the period of first generation computers.
2. J. P. Eckert and J. W. Mauchly invented the first successful electronic computer, called ENIAC. ENIAC stands
for “Electronic Numerical Integrator And Computer”.
Few Examples are:
1. ENIAC
2. EDVAC
3. UNIVAC
4. IBM-701
5. IBM-650
…
Advantages:
1. They made use of vacuum tubes, which were the only electronic components available in those days.
2. These computers could calculate in milliseconds.
Disadvantages:
1. These were very big in size; the weight was about 30 tonnes.
2. These computers were based on vacuum tubes.
3. These computers were very costly.
4. They could store only a small amount of information because they used magnetic drum memory.
2. SECOND GENERATION
Introduction:
1. 1959-1965 is the period of second-generation computer.
2. Second generation computers were based on transistors instead of vacuum tubes.
Few Examples are:
1. Honeywell 400
2. IBM 7094
3. CDC 1604
4. CDC 3600
5. UNIVAC 1108
… many more
Advantages:
1. Due to the use of transistors instead of vacuum tubes, the size of the electronic components decreased. This
resulted in a smaller computer as compared to first generation computers.
2. They consumed less energy and did not produce as much heat as the first generation.
Disadvantages:
1. A cooling system was required.
2. Constant maintenance was required.
3. Only used for specific purposes.
3. THIRD GENERATION
Introduction:
1. 1965-1971 is the period of third generation computer.
2. These computers were based on Integrated circuits.
3. The IC was invented by Robert Noyce and Jack Kilby in 1958-1959.
4. An IC is a single component containing a number of transistors.
Few Examples are:
1. PDP-8
2. PDP-11
3. ICL 2900
4. IBM 360
5. IBM 370
… and many more
Advantages:
1. These computers were cheaper as compared to second-generation computers.
2. They were fast and reliable.
3. The use of ICs reduced the size of the computer.
4. ICs not only reduced the size of the computer but also improved its performance as compared to previous
computers.
5. This generation of computers had larger storage capacity.
Disadvantages:
1. IC chips are difficult to maintain.
2. Highly sophisticated technology was required for the manufacturing of IC chips.
3. Air conditioning is required.
4. FOURTH GENERATION
Introduction:
1. 1971-1980 is the period of fourth generation computer.
2. This technology is based on Microprocessor.
3. A microprocessor is used in a computer to perform the arithmetic and logical functions required by any
program.
4. Graphical User Interface (GUI) technology was exploited to offer more comfort to users.
Few Examples are:
1. IBM 4341
2. DEC 10
3. STAR 1000
4. PDP 11
… and many more
Advantages:
1. Fastest in computation, and the size was reduced as compared to the previous generation of computers.
2. Heat generated is negligible.
3. Small in size as compared to previous generation computers.
4. Less maintenance is required.
5. All types of high-level languages can be used on these computers.
Disadvantages:
1. Microprocessor design and fabrication are very complex.
2. Air conditioning is required in many cases due to the presence of ICs.
3. Advanced technology is required to make the ICs.
5. FIFTH GENERATION
Introduction:
1. The period of the fifth generation is 1980 onwards.
2. This generation is based on artificial intelligence.
3. The aim of the fifth generation is to make devices which can respond to natural language input and are
capable of learning and self-organization.
4. This generation is based on ULSI (Ultra Large Scale Integration) technology, resulting in the production of
microprocessor chips having ten million electronic components.
Few Examples are:
1. Desktop
2. Laptop
3. NoteBook
4. UltraBook
5. Chromebook
… and many more
Advantages:
1. These computers are more reliable and work faster.
2. They are available in different sizes and with unique features.
3. They provide more user-friendly interfaces with multimedia features.
Disadvantages:
1. They need very low-level languages.
2. They may make human brains dull and dependent.
Q 2. Explain Flynn’s Classification Schemes and also give diagram to each category.
M.J. Flynn proposed a classification for the organization of a computer system by the number of instructions and data
items that are manipulated simultaneously.
The sequence of instructions read from memory constitutes an instruction stream. The operations performed on the
data in the processor constitute a data stream.
Parallel processing may occur in the instruction stream, in the data stream, or both.
Flynn's classification divides computers into four major groups that are:
SISD
SISD stands for 'Single Instruction and Single Data Stream'. It represents the organization of a single computer
containing a control unit, a processor unit, and a memory unit. Instructions are decoded by the control unit, which
then sends them to the processing unit for execution.
SIMD
SIMD stands for 'Single Instruction and Multiple Data Stream'. It represents an organization that includes
many processing units under the supervision of a common control unit. All processors receive the same
instruction from the control unit but operate on different items of data.
SIMD is mainly dedicated to array processing machines. However, vector processors can also be seen
as a part of this group.
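As a rough software analogy only (not part of the original notes), the SISD/SIMD distinction can be illustrated with a scalar loop versus a vectorised NumPy operation:

import numpy as np

a = np.arange(8)
b = np.arange(8, 16)

# SISD-like: one instruction operates on one pair of data items at a time.
c_scalar = np.empty_like(a)
for i in range(len(a)):
    c_scalar[i] = a[i] + b[i]

# SIMD-like: one operation is applied to many data items at once.
c_vector = a + b

assert (c_scalar == c_vector).all()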
MISD
MISD stands for 'Multiple Instruction and Single Data stream'.
MISD structure is only of theoretical interest since no practical system has been constructed
using this organization.
In MISD, multiple processing units operate on a single data stream. Each processing unit
operates on the data independently via a separate instruction stream.
MIMD
MIMD stands for 'Multiple Instruction and Multiple Data Stream'.
In this organization, all processors in a parallel computer can execute different instructions
and operate on various data at the same time.
In MIMD, each processor has a separate program and an instruction stream is generated from
each program.
a) S1: A = B + D
   S2: C = A X 3
   S3: A = A + C
   S4: E = A / 2
b) S1: X = SIN (Y)
   S2: Z = X + W
   S3: Y = - 2.5 X W
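The question text for these statement lists is not reproduced in the notes; assuming they are to be analysed for data dependences (a typical exercise on this material), the dependences would be:
a) S1 -> S2 and S1 -> S3 flow dependences on A, S2 -> S3 flow dependence on C, S3 -> S4 flow dependence on A, S2 -> S3 anti-dependence on A (S2 reads A before S3 overwrites it), and S1 -> S3 output dependence on A (both write A).
b) S1 -> S2 flow dependence on X, and S1 -> S3 anti-dependence on Y (S1 reads Y before S3 overwrites it).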
Q 6: - Differentiate between software and Hardware Parallelism. With the help of a suitable example,
discuss the mismatch between software parallelism and hardware parallelism.
Q 7: - Discuss the grain size and latency. Also Explain the level of parallelism in program execution on
modern computers.
In parallel computing, granularity (or grain size) of a task is a measure of the amount of work
(or computation) which is performed by that task.
(i) Node Degree: The degree of a node is the number of edges connected to
the node.
(v) k-ary n-cube networks: there are k nodes in each dimension; each node can be labelled by an n-digit
number of radix (base) k, and each node is connected to every node whose label differs in only one digit
by one (a small neighbour-computation sketch follows the next item).
(ii) Switch Modules: The Ethernet switch network module is a modular, high-density voice network
module that provides Layer 2 switching across Ethernet ports.
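To make the k-ary n-cube adjacency rule above concrete, here is a minimal Python sketch (the labelling convention is an assumption, not taken from the notes):

def kary_ncube_neighbors(label, k, n):
    # label is a tuple of n digits, each in 0..k-1; two nodes are adjacent
    # when their labels differ by one (modulo k, i.e. with wrap-around)
    # in exactly one digit.
    neighbors = set()
    for dim in range(n):
        for step in (1, -1):
            nbr = list(label)
            nbr[dim] = (nbr[dim] + step) % k
            neighbors.add(tuple(nbr))
    return sorted(neighbors)

# Example: node (0, 0) in a 4-ary 2-cube (a 4 x 4 torus) has degree 2n = 4.
print(kary_ncube_neighbors((0, 0), k=4, n=2))   # [(0, 1), (0, 3), (1, 0), (3, 0)]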
Q 11: - What is an Omega Network? Explain the various omega network routing.
An Omega network is a network configuration often used in parallel
computing architectures. It is an indirect topology that relies on the perfect
shuffle interconnection algorithm.
Destination-tag routing
In destination-tag routing, switch settings are determined solely by the message
destination. The most significant bit of the destination address is used to select the
output of the switch in the first stage; if the most significant bit is 0, the upper output
is selected, and if it is 1, the lower output is selected. The next-most significant bit of
the destination address is used to select the output of the switch in the next stage,
and so on until the final output has been selected.
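A minimal Python sketch of destination-tag routing (assuming an 8 x 8 network, i.e. three stages of 2 x 2 switches):

def destination_tag_route(dest, stages=3):
    # The most significant destination bit controls the first stage, the next
    # bit the second stage, and so on; 0 selects the upper output, 1 the lower.
    return [(dest >> (stages - 1 - s)) & 1 for s in range(stages)]

# Routing any message to destination 6 = 110 in binary:
print(destination_tag_route(6))   # [1, 1, 0] -> lower, lower, upper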
XOR-tag routing
In XOR-tag routing, switch settings are based on (source PE) XOR (destination PE).
This XOR-tag contains 1s in the bit positions that must be swapped and 0s in the bit
positions that both source and destination have in common. The most significant bit
of the XOR-tag is used to select the setting of the switch in the first stage; if the most
significant bit is 0, the switch is set to pass-through, and if it is 1, the switch is
crossed. The next-most significant bit of the tag is used to set the switch in the next
stage, and so on until the final output has been selected.
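A matching sketch for XOR-tag routing, where each tag bit sets the switch at that stage straight (0) or crossed (1); the source/destination pair is made up for illustration:

def xor_tag_route(source, dest, stages=3):
    tag = source ^ dest            # 1s mark the bit positions that must be swapped
    return [(tag >> (stages - 1 - s)) & 1 for s in range(stages)]

# From source PE 1 (001) to destination PE 6 (110): tag = 111.
print(xor_tag_route(1, 6))   # [1, 1, 1] -> all three switches crossed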
Q 12: - Explain the routing in an omega network having permutation function f = (0, 6, 4, 7, 3) (1, 5) (2).
Q 13: - What is meant by cache coherence problem? Describe various protocols (Snoopy Bus Protocols)
to handle cache coherence.
Cache coherence refers to the problem of keeping the data in the private caches of a multiprocessor consistent. The
main problem is dealing with writes by a processor.
Snoopy Protocols:
Snoopy protocols distribute the responsibility for maintaining
cache coherence among all of the cache controllers in a
multiprocessor system.
A cache must recognize when a line that it holds is shared with
other caches.
When an update action is performed on a shared cache line, it
must be announced to all other caches by a broadcast
mechanism.
Each cache controller is able to “snoop” on the network to
observe these broadcast notifications and react accordingly.
Snoopy protocols are ideally suited to a bus-based
multiprocessor, because the shared bus provides a simple
means for broadcasting and snooping.
Two basic approaches to the snoopy protocol have been
explored: write-invalidate and write-update (write-broadcast).
With a write-invalidate protocol, there can be multiple readers but
only one writer at a time.
Initially, a line may be shared among several caches for reading
purposes.
When one of the caches wants to perform a write to the line, it
first issues a notice that invalidates that line in the other caches,
making the line exclusive to the writing cache. Once the line is
exclusive, the owning processor can make local writes until
some other processor requires the same line.
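A minimal simulation sketch of the write-invalidate idea (illustrative only, not a full protocol such as MESI): each cache keeps a per-line state, and a write broadcasts an invalidation that every other cache snoops.

class Bus:
    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, writer, addr):
        # Every other cache controller snoops the bus and reacts.
        for c in self.caches:
            if c is not writer:
                c.snoop_invalidate(addr)

class Cache:
    def __init__(self, name, bus):
        self.name = name
        self.lines = {}            # address -> 'SHARED' or 'EXCLUSIVE'
        self.bus = bus
        bus.attach(self)

    def read(self, addr):
        # A read loads the line in shared state.
        self.lines[addr] = 'SHARED'

    def write(self, addr):
        # Announce the write so other caches invalidate their copies,
        # then keep the line exclusively.
        self.bus.broadcast_invalidate(self, addr)
        self.lines[addr] = 'EXCLUSIVE'

    def snoop_invalidate(self, addr):
        # Another cache is writing this line: drop our copy.
        self.lines.pop(addr, None)

bus = Bus()
c0, c1 = Cache('P0', bus), Cache('P1', bus)
c0.read(0x100); c1.read(0x100)     # the line is shared by both caches
c0.write(0x100)                    # P0 writes: P1's copy is invalidated
print(c0.lines, c1.lines)          # {256: 'EXCLUSIVE'} {}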
Write-through - all data written to the cache is also written to memory at the same time.
Write-back - when data is written to a cache, a dirty bit is set for the affected block. The
modified block is written to memory only when the block is replaced.
Write-Once was the first MESI protocol defined. It has the optimization of
executing write-through on the first write and a write-back on all subsequent
writes, reducing the overall bus traffic in consecutive writes to the computer
memory.
(i) Amdahl's Law
Amdahl's Law says that the time to solve a problem (t) using
a parallel algorithm is t = P/N + L, where P is the total amount of core
time for calculations that can be done in parallel, N is the number of
tasks (processors working in parallel), and L is the time to do the parts
of the program that cannot be done in parallel.
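A small numeric sketch of this formula (the workload figures below are made up for illustration):

# Hypothetical workload: 90 s of parallelisable core time (P) and 10 s of
# serial time (L), following the t = P/N + L formula quoted above.
P, L = 90.0, 10.0

def parallel_time(N):
    return P / N + L

t1 = parallel_time(1)                        # 100 s on a single processor
for N in (1, 2, 4, 8, 100):
    t = parallel_time(N)
    print(N, round(t, 2), round(t1 / t, 2))  # speedup approaches (P + L) / L = 10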
(ii) Gustafson’s Law
Gustafson’s Law says that if you apply P processors to a
task that has serial fraction f, scaling the task to take the
same amount of time as before, the speedup is
Speedup = f + P(1 − f) = P − f(P − 1)
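A corresponding numeric sketch (again with made-up values) of the scaled speedup:

f = 0.10                               # hypothetical serial fraction

def gustafson_speedup(P):
    return f + P * (1 - f)             # equivalently P - f * (P - 1)

for P in (1, 2, 4, 8, 100):
    print(P, gustafson_speedup(P))     # grows almost linearly with P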
RISC vs CISC:
1. RISC architecture can be used with high-end applications like telecommunication, image processing, video
processing, etc., whereas CISC architecture can be used with low-end applications such as home automation and
security systems.
2. The program written for RISC architecture needs to take more space in memory, whereas the program written
for CISC architecture takes less space in memory.
Q 16: - Discuss the pipelining in Super Scalar with the help of an example.
Q 17: - Outline the architecture of VLIW processor and also simplify the pipeline operations of VLIW
Processors.
Very Long Instruction Word (VLIW) architecture in P-DSPs
(programmable DSP) increases the number of instructions that are
processed per cycle. It is a concatenation of several short instructions
and requires multiple execution units running in parallel, to carry out
the instructions in a single cycle. A language compiler or pre-
processor separates program instructions into basic operations and
places them into a very long instruction word; the VLIW processor then
disassembles the word and transfers each operation to an appropriate execution unit.
VLIW P-DSPs have a number of processing units (data paths) i.e.
they have a number of ALUs, MAC units, shifters, etc. The VLIW is
accessed from memory and is used to specify the operands and
operations to be performed by each of the data paths.
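A toy sketch of the idea that one long instruction word carries one operation per functional unit (the register names and operation encoding are hypothetical, for illustration only):

# One "very long instruction word": one slot per functional unit.
vliw_word = {
    'alu':   ('add', 'r1', 'r2', 'r3'),   # r1 = r2 + r3
    'mac':   ('mac', 'r4', 'r5', 'r6'),   # r4 += r5 * r6
    'shift': ('lsl', 'r7', 'r7', 2),      # r7 = r7 << 2
    'load':  ('ld',  'r8', '0x100'),      # r8 = mem[0x100]
}

def dispatch(word):
    # The processor disassembles the word; each slot goes to its own
    # execution unit in the same cycle.
    for unit, operation in word.items():
        print(f"{unit:>5} unit executes {operation}")

dispatch(vliw_word)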
Primary Memory
The primary memory is also known as internal memory, and
it is accessible by the processor directly. This memory
includes main memory, cache memory, and CPU registers.
Secondary Memory
The secondary memory is also known as external memory,
and it is accessible by the processor through an
input/output module. This memory includes optical disks,
magnetic disks, and magnetic tape.
Q 19: - Explain the inclusion property and data transfers between adjacent levels of memory hierarchy
with an example.
Q 20: - Describe the concept of locality of reference and its types. Also differentiate among them.
Spatial Locality –
Spatial locality means that an instruction or data item near the
memory location currently being fetched may be needed in the
near future. This is slightly different from temporal locality: here
we are talking about nearby memory locations, while in temporal
locality we are talking about the same memory location being
fetched again.
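A minimal sketch of both kinds of locality in ordinary code (temporal locality is shown only for contrast):

data = list(range(1024))

# Spatial locality: consecutive iterations touch neighbouring addresses,
# so fetching one cache line brings in the next few elements as well.
total = 0
for x in data:        # data[0], data[1], data[2], ... are adjacent in memory
    total += x

# Temporal locality: the same locations ('total' and the loop variable) are
# reused on every iteration, so they stay in the cache (or in registers).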
Q 21: - Write a short note on the following: -
Q 25: - Consider the five-stage pipelined processor specified by the following reservation table
(a) List the set of forbidden latencies and the collision vector.
(b) Draw a state transition diagram showing all possible initial sequences (cycles) without causing a
collision in the pipeline.
(c) List all the simple cycles from the state diagram.
Example :
Suppose 4 operations can be carried out in a single clock cycle.
So there will be 4 functional units, each attached to one of the
operations, along with a branch unit and a common register file in
the ILP execution hardware. The sub-operations that can be performed
by the functional units are integer ALU, integer multiplication,
floating point operations, load, and store. Let the respective
latencies be 1, 2, 3, 2, 1.
Let the sequence of instructions be –
1. y1 = x1*1010
2. y2 = x2*1100
3. z1 = y1+0010
4. z2 = y2+0101
5. t1 = t1+1
6. p = q*1000
7. clr = clr+0010
8. r = r+0001
Sequential record of execution vs. Instruction-level Parallel
record of execution –
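The two records of execution are not reproduced in the notes, but the dependences can be read off the sequence above: instruction 3 needs the result of instruction 1 (y1), instruction 4 needs the result of instruction 2 (y2), and instructions 5-8 are independent of all the others. In a sequential record of execution the instructions issue one after another, whereas in an instruction-level parallel record the independent operations can be issued in the same cycle, subject only to the number of functional units and the latencies listed above.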
Q 27: - Describe basic pipelining and differentiate between instruction and arithmetic pipelining with
example of each.
1. Arithmetic Pipeline :
An arithmetic pipeline divides an arithmetic problem into various
sub-problems for execution in different pipeline segments. It is
used for floating point operations, multiplication and various
other computations. The flowchart of the arithmetic pipeline
for floating point addition is shown in the diagram.
Floating point addition using arithmetic pipeline :
The following sub operations are performed in this case:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalise the result.
First of all, the two exponents are compared and the larger of the two
is chosen as the exponent of the result. The difference between the
exponents then decides how many places the mantissa of the number
with the smaller exponent must be shifted to the right. After this shift,
both mantissas are aligned. Finally, the two mantissas are added, and
the result is normalised in the last segment.
Example:
Let us consider two numbers,
Explanation:
First of all, the two exponents are subtracted to give 3 − 2 = 1. Thus
3 becomes the exponent of the result, and the mantissa of the number
with the smaller exponent is shifted one place to the right to give
Y = 0.0450*10^3
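A minimal Python sketch of the four pipeline segments for decimal (mantissa, exponent) operands; the value 0.3214 * 10^3 used for X below is a hypothetical operand, since the original numbers are not reproduced in the notes, while Y = 0.4500 * 10^2 matches the shifted value shown above:

def fp_add(x, y):
    (mx, ex), (my, ey) = x, y

    # Segment 1: compare exponents; the larger one becomes the result exponent.
    if ex < ey:
        (mx, ex), (my, ey) = (my, ey), (mx, ex)

    # Segment 2: align the mantissa of the smaller-exponent operand.
    my = my / (10 ** (ex - ey))

    # Segment 3: add the mantissas.
    mz, ez = mx + my, ex

    # Segment 4: normalise so that the mantissa is less than 1.0.
    while abs(mz) >= 1.0:
        mz, ez = mz / 10, ez + 1
    return (mz, ez)

print(fp_add((0.3214, 3), (0.4500, 2)))   # approximately (0.3664, 3)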
Q 30: - Discuss the PRAM Algorithms with the help of some examples.
Example :
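The worked example itself is missing from the notes. As a stand-in, here is a minimal sketch (simulated sequentially, assuming a data-parallel sum is an acceptable illustration) of a classic EREW PRAM algorithm: summing n numbers in O(log n) steps.

def pram_sum(values):
    a = list(values)
    n = len(a)                 # assume n is a power of two for simplicity
    stride = 1
    while stride < n:
        # Conceptually, all iterations of this inner loop run in parallel,
        # one per processor, so the whole while-loop takes O(log n) steps.
        for i in range(0, n, 2 * stride):
            a[i] = a[i] + a[i + stride]
        stride *= 2
    return a[0]

print(pram_sum([3, 1, 4, 1, 5, 9, 2, 6]))   # 31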