

Unit 1 Fundamentals of Computer Architecture

Structure:
1.1 Introduction
Objectives
1.2 Computational Model
The basic items of computations
The problem description model
The execution model
1.3 Evolution of Computer Architecture
1.4 Process and Thread
Concept of process
Concept of thread
1.5 Concepts of Concurrent and Parallel Execution
1.6 Classification of Parallel Processing
Single instruction single data (SISD)
Single instruction multiple data (SIMD)
Multiple instruction single data (MISD)
Multiple instruction multiple data (MIMD)
1.7 Parallelism and Types of Parallelism
1.8 Levels of Parallelism
1.9 Summary
1.10 Glossary
1.11 Terminal Questions
1.12 Answers

1.1 Introduction
As you all know, computers vary greatly in terms of physical size, speed of
operation, storage capacity, application, cost, ease of maintenance and
various other parameters. The hardware of a computer consists of physical
parts that are connected in some way so that the overall structure achieves
the pre-assigned functions. Each hardware unit can be viewed at different
levels of abstraction. You will find that simplification can go on to still deeper
levels. You will be surprised to know that many technologies exist for
manufacturing microchips.

The complexity of integration is likely to go on increasing with time. As a consequence, smaller and more powerful computers will go on appearing.


Evidently, which components are used and how they are interconnected,
dictates what the resulting computer will be good at doing. Thus, in a faster
computer, you will find special components connected in a special way that
enhances the speed of operation of the designed computer.
Different computer designs can have different components. Moreover, the same components can be interconnected in a variety of ways. Each design will offer different performance to its users. Exactly which components, interconnected in which ways, produce what performance is the subject of Computer Architecture. In this unit, we will study the basics of Computer Architecture.
Objectives:
After studying this unit, you should be able to:
• explain computational model and its types
• state the different levels of evolution of computer architecture
• differentiate between process and thread
• describe the concepts of concurrent and parallel execution
• identify the various classifications of parallel processing
• list the types of parallelism
• list the levels of parallelism

1.2 Computational Model


Computer architecture may be defined as “The Structure and behavior of a
Conceptual model of a Computer System to perform the required
functionalities”.
Computer Architecture deals with the issue of selection of hardware
components and interconnecting them to create computers that achieve
specified functional, performance and cost goals.
Proceeding in this way, the hardware (at least the electronic part) breaks down into the following simple digital components.
• Registers
• Counters
• Adders
• Multiplexers
• De-multiplexers
• Coders
• Decoders
• I/O Controllers
A common foundation or paradigm that links the computer architecture and
language groups is called a Computational Model. The concept or idea of
computational model expresses a higher level of abstraction than can be
achieved by either the computer architecture or the programming language
alone, and includes both.
The computational model consists of the subsequent three abstractions:
1. The basic items of computations
2. The problem description model
3. The execution model
Contrary to common belief, the set of abstractions that should be chosen to specify computational models is not self-evident. A small set of criteria will define fewer but relatively basic computational models, while a wider set of criteria will result in a fairly large number of different models.
1.2.1 The basic items of computations
This abstraction identifies the basic items of computation. It specifies the items to which computation refers and the kinds of computations (operations) that can be executed on them. For example, in the von Neumann computational model, the fundamental items of computation are data.
This data will normally be represented by named entities so that several different data items can be distinguished in the course of a computation. These identifiable entities are commonly called variables in programming languages and are implemented by memory or register addresses in architectures.
The well-known computational models, such as the Turing model, the von Neumann model and the dataflow model, are based on the concept of data. These models are briefly explained below:
The Turing machine architecture operates by manipulating symbols on a
tape. In other words, a tape with innumerable slots exists, and at any one point
in time, the Turing machine is in a specific slot. The machine can change the
symbol and shift to a different slot based on the symbol read at that slot. All of this is entirely deterministic.
The von Neumann architecture describes the stored-program computer, where data and instructions are stored in memory and the machine works by changing its internal state. In other words, an instruction operates on some data and changes that data, so naturally there is state maintained in the system.
Dataflow architecture stands in marked contrast to the conventional von Neumann, or control flow, architecture. Dataflow architectures have no program counter: the execution of instructions in dataflow systems is determined solely by the availability of input arguments to the instructions. Even though dataflow architecture has not been used in any commercially successful computer hardware, it is highly relevant to many software architectures, such as database engine designs and parallel computing frameworks.
On the other hand, there are various models independent of data. In these
models, the basic items of computation are:
• Objects, and the messages sent to them requesting an associated manipulation (as in the object-based model)
• Arguments and the functions applied on them (applicative model)
• Elements of sets and the predicates declared on them (predicate-logic-
based model).
1.2.2 The problem description model
The problem description model covers both the style and the method of problem description. The problem description style specifies how problems in a particular computational model are expressed. The style is either procedural or declarative. In a procedural style, the algorithm for solving the problem is stated; a particular solution is then declared in the form of an algorithm. In a declarative style, all the facts and relationships relevant to the given problem have to be stated.
There are two modes for conveying these relationships and facts. The first
employs functions, as in the applicative model of computation, while the
second declares the relationships and facts in the form of predicates, as in the
predicate-logic-based computational model. Now, we will study the second component of the problem description model, that is, the problem description method. It is interpreted differently for the procedural and the declarative styles. In the procedural style, it specifies the way in which the solution of the given problem has to be described. In the declarative style, by contrast, it specifies the way in which the problem itself has to be described.
1.2.3 The execution model
This is the third and final constituent of the computational model. It can be divided into three stages.
• Interpretation of how to perform the computation
• Execution semantics
• Control of the execution sequences
The first stage describes the interpretation of the computation, which is strongly linked to the problem description method. The choice of problem description method and the interpretation of the computation are mutually dependent on one another.
The next stage of the execution model states the execution semantics. This is a rule that specifies how a particular execution step is to be performed. This rule is, of course, linked with the chosen problem description method and with the way the interpretation of the computation is understood. The final stage of the model states the rule governing the execution sequences. In the basic models, execution is either control driven, data driven or demand driven, as outlined in the list below and illustrated in the short sketch that follows it.
• In control-driven execution, it is assumed that there is a program consisting of a sequence of instructions. The execution sequence is then implicitly specified by the order of the instructions. Nevertheless, explicit control instructions can also be used to specify a departure from the implied execution sequence.
• Data-driven execution is characterised by the rule that an operation is activated as soon as all the needed input data is available. Data-driven execution control is characteristic of the dataflow model of computation.
• In demand-driven execution, operations are activated only when their execution is required to obtain the final result. Demand-driven execution control is normally used in the applicative computational model.
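The contrast between control-driven and demand-driven activation can be seen in a small C sketch; the function and variable names below are invented for the illustration and do not come from any particular architecture. Data-driven activation, in which an operation fires as soon as all of its inputs have arrived, is inherently parallel and is only described in the text above.

#include <stdio.h>

/* An operation whose activation we want to observe. */
static int square(int x) {
    printf("  square(%d) activated\n", x);
    return x * x;
}

int main(void) {
    int need_square = 0;   /* the final result does not need square() */

    /* Control-driven: instructions run in program order, so square()
       is activated whether or not its result is eventually used.     */
    int eager = square(7);
    int control_driven_result = need_square ? eager : 0;

    /* Demand-driven: the operation is activated only when its result
       is demanded; here the call is skipped because need_square == 0. */
    int demand_driven_result = need_square ? square(7) : 0;

    printf("control-driven: %d, demand-driven: %d\n",
           control_driven_result, demand_driven_result);
    return 0;
}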
Self Assessment Questions
1. The _________ model refers to both the style and method of problem
description.
2. In a _________ , the algorithm for solving the problem is stated.


3. _________ execution is characterised by the rule that an operation is
activated as soon as all the needed input data is available.

1.3 Evolution of Computer Architecture


With the advent of revolutionary developments in the area of semiconductor technology, computer architecture has gradually evolved in stages over the years. The main target of this evolution is to enhance the performance of processors. The history of computers begins with the invention of the abacus in 3000 BC, followed by the invention of mechanical calculators in 1617. The years from 1642 to 1980 are marked by the invention of zeroth, first, second and third generation computers. The years from 1980 till today are marked by fourth generation computers. Fifth generation computers are still under research and development.
Zeroth Generation Computers: The zeroth generation of computers (1642-1946) was marked by the invention of largely mechanical computers. In 1642, a French mathematician named Blaise Pascal invented
the first mechanical device which was called Pascaline. In 1822, Charles
Babbage, an English mathematician, invented a machine called Difference
Engine to compute tables of numbers for naval navigation. Later on, in the
year 1834, Babbage attempted to build a digital computer, called Analytical
Engine. The analytical engine had all the parts of a modern computer i.e. the
store (memory unit), the mill (computation unit), the punched card reader (input
unit) and the punched/printed output (output unit). As all the basic parts of modern computers were conceived by Charles Babbage, he is known as the Father of Computers.
First Generation Computers: The first generation of computers (1946-1954) was marked by the use of vacuum tubes or valves as their basic electronic
component. Although these computers were faster than earlier mechanical
devices, they had many disadvantages. First of all, they were very large in
size. They consumed too much power and generated too much heat, even when used for a short duration of time. They were very unreliable and broke down frequently. They required regular maintenance, and their components also had to be assembled manually.
Some examples of first generation computers are ENIAC (Electronic
Numerical Integrator and Calculator), EDVAC (Electronic Discrete Variable Automatic Computer), EDSAC (Electronic Delay Storage Automatic Calculator), UNIVAC I (Universal Automatic Calculator) and IAS machine
(Institute for Advanced Study machine built by Princeton’s Institute for
Advanced Study). The basic design of first generation computer is shown in
figure 1.1.

Figure 1.1: Basic Design of a First Generation Computer

IAS machine was a new version of the EDVAC, which was built by von
Neumann. The basic design of IAS machine is now known as von Neumann
machine, which had five basic parts - the memory, the arithmetic logic unit, the
program control unit, the input and output unit as shown in figure 1.2.

Second Generation Computers: The first generation of computers became out-dated when, in 1954, the Philco Corporation developed transistors that could
be used in place of vacuum tubes. The second generation of computers (1953-
64) was marked by the use of transistors in place of vacuum tubes. Transistors
had a number of advantages over vacuum tubes. As transistors were made from pieces of silicon, they were more compact than vacuum tubes.
The second-generation computers were smaller in size and generated less
heat than first generation computers. Although they were slightly faster and
more reliable than earlier computers, they also had many disadvantages.
They had limited storage capacity, consumed more power and were also
relatively slow in performance. Some examples of second generation
computers are IBM 701, PDP-1 and IBM 650. The basic design of a second
generation computer is shown in figure 1.3.

Figure 1.3: Basic Design of Second Generation Computer

Third Generation Computers: Second generation computers became out-dated after the invention of ICs. The third generation of computers (1964-
1978) was marked by use of Integrated Circuits (ICs) in place of transistors.
As hundreds of transistors could be put on a single small circuit, ICs were more compact than transistors. Third generation computers removed many drawbacks of second generation computers. They were even smaller in size, generated much less heat and required much less power compared to the earlier two generations of computers. These computers also required less human labour at the assembly stage.
Some examples of third generation computers are IBM 360, PDP-8, Cray-1
and VAX. The basic design of a third generation computer is shown in figure
1.4.

Figure 1.4: Basic Design of a Third Generation Computer

Fourth Generation Computers: The third generation computers became out-dated when it was found, around 1978, that thousands of ICs could be
integrated onto a single chip, called LSI (Large Scale Integration).
The fourth generation of computers (1978 till date) was marked by the use of Large Scale Integrated (LSI) circuits in place of ICs. As thousands of ICs could be put onto a single circuit, LSI circuits are still more compact than ICs. In 1978, it was found that millions of components could be packed onto a single circuit, known as Very Large Scale Integration (VLSI). VLSI is the technology that led to the development of the popular Personal Computers (PCs), also called Microcomputers.
Some examples of fourth generation computers are IBM PC, IBM PC/AT, 386,
486, Pentium and CRAY-2. The basic design of a fourth generation computer
is shown in figure 1.5.

Fifth Generation Computers: Although fourth generation computers offer many advantages to users, they still have one main disadvantage. The major
drawback of these computers is that they have no intelligence on their own.
Scientists are now trying to remove this drawback by making computers which would have artificial intelligence. The fifth generation computers (tomorrow's computers) are still in the research and development stage. These computers would have artificial intelligence. They will use ULSI (Ultra Large-Scale Integration) chips in place of VLSI chips. One ULSI chip contains millions of components on a single IC. Robots have some features of fifth generation computers.
Self Assessment Questions
4. ______ was the first mechanical device, invented by Blaise Pascal.
5. ________ was a new version of the EDVAC, which was built by von
Neumann.
6. The fourth generation of computers was marked by use of Integrated
Circuits (ICs) in place of transistors. (True/ False)
7. Personal Computers (PCs) are also called Microcomputers.
(True/ False)

Activity 1:
Using the Internet, find out about the Fifth Generation Computer Systems (FGCS) project: the idea behind it, its implementation, timeline and outcome.

1.4 Process and Thread


Every process provides the resources required to execute a program. A
process has an executable code, a virtual address space, open handles to
system objects, a unique process identifier, a security context, minimum and
maximum working set sizes, environment variables, a priority class, and at
least one thread of execution. Each process is begun with a single thread,
often called the primary thread, but can create additional threads from any of
its threads.
A thread is the entity within a process that can be scheduled for execution. All
threads of a process share its system resources and virtual address space.
Additionally, each thread maintains exception handlers, thread local storage,
a scheduling priority, a unique thread identifier, and a set of structures the
system will utilise to save the thread context until it is scheduled. The thread
context includes the thread's set of machine registers, a thread environment
block, the kernel stack and a user stack in the address space of the thread's
process. Threads can also have their own security context, which is valuable
in impersonating clients.
The basic difference between a process and a thread is that every process has its own data memory location, but all related threads can share the same data memory while having their individual stacks. A process is a collection of virtual memory space, code, data and system resources, whereas a thread is a unit of code that is executed serially within a process.
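A minimal sketch of this sharing, using POSIX threads in C (compile with -pthread), is shown below; the global counter, the mutex and the worker function are invented for the illustration. Both threads update the same global variable, which lives in the shared data memory of the process, while each thread's loop counter lives on its own private stack.

#include <pthread.h>
#include <stdio.h>

int shared_counter = 0;                    /* shared by all threads of the process */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    const char *name = arg;                /* thread name passed by main            */
    int local = 0;                         /* lives on this thread's private stack  */
    for (int i = 0; i < 100000; i++) {
        local++;
        pthread_mutex_lock(&lock);
        shared_counter++;                  /* same memory seen by both threads      */
        pthread_mutex_unlock(&lock);
    }
    printf("%s: local = %d\n", name, local);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, "thread-1");
    pthread_create(&t2, NULL, worker, "thread-2");
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("shared_counter = %d\n", shared_counter);   /* 200000 */
    return 0;
}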
Let’s study these concepts in detail.
1.4.1 Concept of process
In operating system terminology, instead of the term ‘program’, the notion of
process is used in connection with execution. It designates a commission or
job, or a quantum of work dealt with as an entity. Consequently, the resources
required, such as address space, are typically allocated on a process basis.
Each process has a life cycle, which consists of creation, an execution phase
and termination.
Process creation involves the following four main actions:

• Setting up the process description: Usually, operating systems describe a process by means of a description table which is called the Process
a process by means of a description table which is called the Process
Control Block or PCB. A PCB contains all the information relevant to the
whole life cycle of a process. It holds basic data such as process
identification, owner, process status, description of the
allocated address space and so on.
• Allocating address space: Allocation of address space to a process for
execution is the second major component of process creation. This
consists of two approaches: sharing the address space among the created
processes (shared memory) or allocating distinct address spaces to each
process (per-process address spaces).

• Loading the program into the allocated address space: Subsequently, the executable program file will usually be loaded into the allocated
memory space.

• Passing the process description to the scheduler: Finally, the process thus created is passed to the process scheduler which allocates the
processor to the competing processes. The process scheduler manages
processes typically by setting up and manipulating queues of PCBs. Thus, after creating a process, the scheduler puts its PCB into the queue of ready-to-run processes.
Process scheduling involves three key concepts: the declaration of distinct
process states, the specification of the state transition diagram and the
statement of a scheduling policy. As far as process states are concerned, there
are three basic states connected with scheduling:
• The ready-to-run state
• The running state and
• The wait (or blocked) state.
In the wait state, processes are suspended or blocked, waiting for the occurrence of some event before becoming ready to run again. When the scheduler selects a process for execution, its state is changed from ready-to-run to running. Finally, a process in the wait state can go back to the ready-to-run state if the event it is waiting for has occurred. You can see the various process states in figure 1.6.
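The three scheduling states and the transitions between them can be summarised in a small C sketch; the enum values, event names and transition function are invented for the illustration and do not describe any particular operating system.

#include <stdio.h>

/* The three basic scheduling states of a process. */
typedef enum { READY_TO_RUN, RUNNING, WAITING } proc_state;

/* Possible scheduling events that trigger state transitions. */
typedef enum { DISPATCH, BLOCK_ON_EVENT, EVENT_OCCURRED, PREEMPT } sched_event;

/* Return the next state for a given current state and event. */
proc_state next_state(proc_state s, sched_event e) {
    if (s == READY_TO_RUN && e == DISPATCH)  return RUNNING;       /* scheduler selects the process */
    if (s == RUNNING && e == BLOCK_ON_EVENT) return WAITING;       /* process waits for an event    */
    if (s == RUNNING && e == PREEMPT)        return READY_TO_RUN;  /* time slice expires            */
    if (s == WAITING && e == EVENT_OCCURRED) return READY_TO_RUN;  /* awaited event has occurred    */
    return s;                                /* no transition defined: stay in the current state */
}

int main(void) {
    const char *names[] = { "ready-to-run", "running", "wait" };
    proc_state s = READY_TO_RUN;
    sched_event trace[] = { DISPATCH, BLOCK_ON_EVENT, EVENT_OCCURRED, DISPATCH, PREEMPT };
    for (int i = 0; i < 5; i++) {
        s = next_state(s, trace[i]);
        printf("state: %s\n", names[s]);
    }
    return 0;
}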

1.4.2 Concept of thread
A thread is a fundamental unit of CPU utilisation, which consists of a
program counter, a stack, and a set of registers and a thread ID. Conventional
heavyweight processes consist of a single thread of control. In other words,
there is one program counter and one sequence of instructions that can be carried out at any given time.
At present, multi-threaded applications have taken the place of single thread
applications. These have multiple threads within a single process, each having
their own program counter, stack and set of registers, but sharing common
code, data, and certain structures such as open files. See figure 1.7 to find out
the differences between the two processes.

Figure 1.7: Single and Multithreaded Processes

Threads are of great use in modern programming, particularly when a process has multiple tasks to perform in parallel with the others. This is mainly helpful
when one of the tasks may block, and it is needed to permit the other tasks to
continue without blocking. For example, in a word processor, a background
thread may check spelling and grammar while a foreground thread processes
user input (keystrokes), while yet a third thread loads images from the hard
drive, and a fourth does periodic automatic backups of the file being edited.
Self Assessment Questions
8. All threads of a process share its virtual address space and system
resources. (True/ False)
9. When the scheduler selects a process for execution, its state is changed
from ready-to-run to the wait state. (True/ False)

1.5 Concepts of Concurrent and Parallel Execution


Concurrent execution is the temporal behaviour of the N-client 1-server model
where one client is served at any given moment. This model has a dual nature; it is sequential in a small time scale, but simultaneous in a rather large time scale. In this situation, the key problem is how the competing clients, let us
say processes or threads, should be scheduled for service (execution) by the
single server (processor). The scheduling policy may be viewed as covering
the following two aspects:
Pre-emption rule: It deals with whether servicing a client can be interrupted
or not and, if so, on what occasions. The pre-emption rule may either specify
time-sharing, which restricts continuous service for each client to the duration
of a time slice, or can be priority based, interrupting the servicing of a client
whenever a higher priority client requests service.
Selection rule: It states how one of the competing clients is selected for
service. The selection rule is typically based on certain parameters, such as
priority, time of arrival, and so on. This rule specifies an algorithm to determine
a numeric value, which we will call the rank, from the given parameters. During
selection, the ranks of all competing clients are computed and the client with
the highest rank is scheduled for service.
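The selection rule can be sketched as a tiny ranking function in C; the rank formula used below (priority weighted heavily, earlier arrival breaking ties) is an invented example of such a rule, not a prescribed scheduling policy.

#include <stdio.h>

/* A competing client (process or thread) as seen by the scheduler. */
typedef struct {
    const char *name;
    int priority;       /* higher value = more important   */
    int arrival_time;   /* earlier arrival = served sooner */
} client;

/* Illustrative rank: weight priority heavily, break ties by arrival order. */
static int rank(const client *c) {
    return c->priority * 100 - c->arrival_time;
}

/* Selection rule: compute the rank of every competing client and pick the highest. */
static const client *select_client(const client *clients, int n) {
    const client *best = &clients[0];
    for (int i = 1; i < n; i++)
        if (rank(&clients[i]) > rank(best))
            best = &clients[i];
    return best;
}

int main(void) {
    client ready_queue[] = {
        { "editor",  2, 10 },
        { "backup",  1,  3 },
        { "compile", 3, 12 },
    };
    printf("selected for service: %s\n",
           select_client(ready_queue, 3)->name);   /* prints "compile" */
    return 0;
}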
Parallel execution: Parallel execution is associated with the N-client N-server model. Having more than one server allows the servicing of more than one client at the same time; this is called parallel execution. Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem. It may make use of multiple CPUs. A problem is broken into discrete parts that can be solved concurrently. Each part is further broken down into a series of instructions, and instructions from each part execute simultaneously on different CPUs, as shown in figure 1.8.
Figure 1.8: Parallel Computing Systems

Thus, we can say that a computer system is said to be a Parallel Processing System or Parallel Computer if it provides facilities for the simultaneous processing of various sets of data or the simultaneous execution of multiple instructions. On a computer with more than one processor, each of several
processes can be assigned to its own processor, to allow the processes to
progress simultaneously. If only one processor is available, the effect of parallel processing can be simulated by having the processor run each process in turn for a short time.
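A minimal sketch of this decomposition in C with POSIX threads (compile with -pthread) is given below; the array, the number of parts and the helper names are chosen only for the illustration. The problem of summing an array is broken into discrete parts, and each part is handled by its own thread, which may run on its own processor.

#include <pthread.h>
#include <stdio.h>

#define N      1000000
#define PARTS  4                          /* number of discrete parts / threads */

static int data[N];

typedef struct { int lo, hi; long long sum; } part;

static void *sum_part(void *arg) {
    part *p = (part *)arg;
    p->sum = 0;
    for (int i = p->lo; i < p->hi; i++)   /* each part works on its own slice   */
        p->sum += data[i];
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) data[i] = 1;

    pthread_t tid[PARTS];
    part parts[PARTS];
    for (int k = 0; k < PARTS; k++) {
        parts[k].lo = k * (N / PARTS);
        parts[k].hi = (k + 1) * (N / PARTS);
        pthread_create(&tid[k], NULL, sum_part, &parts[k]);  /* parts run simultaneously */
    }

    long long total = 0;
    for (int k = 0; k < PARTS; k++) {
        pthread_join(tid[k], NULL);
        total += parts[k].sum;            /* combine the partial results */
    }
    printf("total = %lld\n", total);      /* 1000000 */
    return 0;
}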

Self Assessment Questions


10. Concurrent execution is the temporal behaviour of the _____ Model.
11. During selection, the ranks of all competing clients are computed and the
client with the highest rank is scheduled for service. (True/ False)

1.6 Classification of Parallel Processing


The core elements of parallel processing are Central Processing Units (CPUs). The essential computing process is the execution of a sequence of instructions on a set of data. The term stream is used here to denote a sequence of items as executed by a single processor or multiprocessor. Based on the number of instruction and data streams that can be processed simultaneously, Flynn classified computer systems into four categories. They are:
(a) Single Instruction Single Data (SISD)
(b) Single Instruction Multiple Data (SIMD)
(c) Multiple Instruction Single Data (MISD)
(d) Multiple Instruction Multiple Data (MIMD)
Let's learn more about them.
1.6.1 Single instruction single data (SISD)
Computers with a single processor that is capable of executing scalar
arithmetic operations using a single instruction stream and a single data
stream are called SISD (Single Instruction Single Data) computers. They are
characterised by:
Single instruction: Only a single instruction stream is being acted on by the CPU during any one clock cycle.
Single data: Only a single data stream is being used as input during any one clock cycle.
This is the oldest and, even today, a very widespread type of computer.
Examples: Most PCs, single CPU workstations and mainframes.
Figure 1.9 shows an example of SISD.

1.6.2 Single instruction multiple data (SIMD)


Computers with a single processor that is capable of executing vector
arithmetic operations using a single instruction stream but multiple data
streams are called SIMD (Single Instruction Multiple Data) computers. They
are characterised by:
Single instruction: Every processing unit performs the same instruction at any given clock cycle.
Multiple data: Each processing unit can operate on a different data element.
This category of machine characteristically has an instruction dispatcher, a very high-bandwidth internal network, and a very large array of very small-capacity instruction units. It is best suited to specialised problems characterised by a high degree of regularity, such as image processing. Figure
1.10 shows an example of SIMD processing.
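The idea of a single instruction operating on multiple data elements can be sketched in C with x86 SSE intrinsics; this assumes an x86 processor with SSE support and is only an illustration of the SIMD style, not a description of any machine mentioned in the text.

#include <xmmintrin.h>   /* SSE intrinsics */
#include <stdio.h>

int main(void) {
    float a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float b[4] = { 10.0f, 20.0f, 30.0f, 40.0f };
    float c[4];

    /* One SIMD instruction adds four pairs of floats at the same time:
       a single instruction stream operating on multiple data elements. */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb);
    _mm_storeu_ps(c, vc);

    for (int i = 0; i < 4; i++)
        printf("%.1f ", c[i]);           /* 11.0 22.0 33.0 44.0 */
    printf("\n");
    return 0;
}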

1.6.3 Multiple instruction single data (MISD)
Computers with multiple processors that are capable of executing different
operations using multiple instruction streams but single data stream are called
MISD (Multiple instruction Single Data) computers. They are characterised by:
Multiple instructions: Every processing unit operates on the data independently via separate instruction streams.
Single data: A single data stream is fed into multiple processing units.
Some conceivable uses of this architecture are multiple frequency filters operating on a single signal stream and multiple cryptography algorithms
trying to crack a single coded message. Figure 1.11 shows an example of
MISD processing.

Figure 1.11: MISD Process

1.6.4 Multiple Instruction Multiple Data (MIMD)
Computers with multiple processors that are capable of executing vector
arithmetic operations using multiple instruction streams and multiple data
streams are called MIMD (Multiple Instruction Multiple Data) computers. They
are characterised by:
Multiple instructions: Each processor may be executing a different instruction stream.
Multiple data: Each processor may be working with a different data stream.
It is the most common type of parallel computer. Most modern computers fall
into this category. Execution can be synchronous or asynchronous,
deterministic or non-deterministic.
Examples: most current supercomputers, networked parallel computer “grids”
and multi-processor computers. Figure 1.12 shows a case of MIMD processing.

Self Assessment Questions
12. In _________ all processing units execute the same instruction at any
given clock cycle.
13. In which system a single data stream is fed into multiple processing units?
14. ________ is the most common type of parallel computer.

1.7 Parallelism and Types of Parallelism


A parallel computer is a set of processors that are able to work cooperatively
to solve a computational problem. This definition broadly includes parallel supercomputers that have hundreds or thousands of processors, networks of workstations, embedded systems and multiple-
processor workstations. Parallel computers have the potential to
concentrate computational resources like processors, memory, or I/O
bandwidth on important computational problems. The following are the various
types of parallelism:
Bit-level parallelism: Bit-level parallelism is a form of parallel computing
based on increasing processor word size. From the advent of very-large-scale integration (VLSI) computer chip fabrication technology in the 1970s until about 1986, advancements in computer architecture were driven by increasing bit-level parallelism.
Instruction-level parallelism: A computer program is a stream of linearised
instructions carried out by a processor. These instructions can be rearranged and combined into groups which are then executed in parallel without altering the outcome of the program. This is known as instruction-level parallelism.
Advances in instruction-level parallelism dominated computer architecture
from the mid-1980s until the mid-1990s.
Data parallelism: Data parallelism is parallelism inherent in program loops. It centres on distributing the data across various computing nodes to be processed in parallel. Parallelising loops often leads to similar (not necessarily identical) operation sequences or functions being performed on elements of a large data structure. Many scientific and engineering applications exhibit data parallelism.
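A data-parallel loop can be sketched in C with OpenMP (compile with -fopenmp on GCC or Clang); the arrays and their size are invented for the illustration. The same operation is applied to every element, and because the iterations are independent, the runtime may distribute them across the available processor cores.

#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N], b[N];
    for (int i = 0; i < N; i++) b[i] = i;

    /* The loop iterations carry no dependencies on one another, so the
       compiler/runtime may split them across threads: data parallelism. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * b[i] + 1.0;

    printf("a[10] = %.1f (computed by up to %d threads)\n",
           a[10], omp_get_max_threads());
    return 0;
}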
Self Assessment Questions
15. Parallel computers offer the potential to concentrate computational
resources on important computational problems. (True/ False)
16. Advances in instruction-level parallelism dominated computer architecture
from the mid-1990s until the mid-2000s. (True/False)

1.8 Levels of Parallelism


Parallelism is one of the most popular ideas in computing. Architectures, compilers and operating systems have been striving for more than two decades to extract and utilise as much parallelism as possible in order to speed up
computation. The notion of parallelism is used in two different contexts. Either
it designates available parallelism in programs or it refers to parallelism
occurring during execution, called utilised parallelism.
Types of available parallelism: Problem solutions may contain two different
kinds of available parallelism, called functional parallelism and data
parallelism.
Functional parallelism is that kind of parallelism which arises from the logic of
a problem solution. On the contrary, data parallelism comes from using data
structures that allow parallel operations on their elements, such as vectors or
matrices, in problem solutions. From another point of view, parallelism can be
considered as being either regular or irregular. Data parallelism is regular, whereas functional parallelism, with the exception of loop-level parallelism, is usually irregular.
Levels of available functional parallelism: Programs written in imperative
languages may represent functional parallelism at different levels, that is, at different sizes of granularity. In this respect, we can identify the following four
levels and corresponding granularity sizes:
• Parallelism at the instruction level (fine-grained parallelism): Available
instruction-level parallelism means that particular instructions of a program
may be executed in parallel. To this end, instructions can be either
assembly (machine-level) or high-level language instructions. Usually,
instruction-level parallelism is understood at the machine-language
(assembly-language) level.
• Parallelism at the loop level (middle-grained parallelism): Parallelism
may also be available at the loop level. Here, consecutive loop iterations
are candidates for parallel execution. However, data dependencies
between subsequent loop iterations, called recurrences, may restrict their
parallel execution.
• Parallelism at the procedure level (middle-grained parallelism): Next,
there is parallelism available at the procedure level in the form of parallel
executable procedures. The extent of parallelism exposed at this level depends mainly on the kind of problem solution considered.
• Parallelism at the program level (coarse-grained parallelism): Lastly,
different programs (users) are obviously independent of each other. Thus,
parallelism is also available at the user level (which we consider to be
coarse-grained parallelism). Multiple, independent users are a key source
of parallelism occurring in computing scenarios.
Utilisation of functional parallelism: Available parallelism can be utilised by
architectures, compilers and operating systems conjointly for speeding up
computation. Let us first discuss the utilisation of functional parallelism.
In general, functional parallelism can be utilised at four different levels of
granularity, that is,
• Instruction
• Thread
• Process
• User level
It is quite natural to utilise available functional parallelism, which is inherent in
a conventional sequential program, at the instruction level by executing
instructions in parallel. This can be achieved by means of architectures
capable of parallel instruction execution. Such architectures are referred to as
instruction-level function-parallel architectures or simply instruction-level
parallel architectures, commonly abbreviated as ILP-architectures.

Available functional parallelism in a program can also be utilised at the thread and/or at the process level. Threads and processes are self-contained
execution entities embodying an executable chunk of code. Threads and
processes can be created either by the programmer using parallel languages
or by operating systems that support multi-threading or multitasking. They can
also be automatically generated by parallel compilers during compilation of
high-level language programs. Available loop and procedure-level parallelism
will often be exposed in the form of threads and processes.
Self Assessment Questions
17. Parallelism occurring during execution is called --------------------- .
18. Parallelism at the instruction level is also called middle-grained
parallelism. (True/ False)
19. Data parallelism is regular, whereas functional parallelism, with the exception of loop-level parallelism, is usually irregular. (True/ False)

Activity 2:
Decide which architecture is most appropriate for a given application. First determine the form of parallelisation which would best suit the application, then decide on both the hardware and software for running your parallelised application.

1.9 Summary
Let us recapitulate the important concepts discussed in this unit:
• Computer Architecture deals with the issue of selection of hardware
components and interconnecting them to create computers that achieve
specified functional, performance and cost goals.
• The concept of a computational model represents a higher level of
abstraction than can be achieved by either the computer architecture or
the programming language alone, and covers both.
• History of computers begins with the invention of the abacus in 3000 BC,
followed by the invention of mechanical calculators in 1617. Fifth
generation computers are still under research and development.
• Each process provides the resources needed to execute a program.
• A thread is the entity within a process that can be scheduled for execution.
• Concurrent execution is the temporal behaviour of the N-client 1-server
model where one client is served at any given moment.

• Parallel execution is associated with the N-client N-server model.
• Based on the number of instruction and data streams that can be processed simultaneously, Flynn classified computer systems into four categories.
• The notion of parallelism is used in two different contexts. Either it designates available parallelism in programs or it refers to parallelism occurring during execution, called utilised parallelism. Three different types of parallelism can be distinguished: bit-level, instruction-level and data parallelism.

1.10 Glossary
• EDSAC: Electronic Delay Storage Automatic Calculator
• EDVAC: Electronic Discrete Variable Automatic Computer
• ENIAC: Electronic Numerical Integrator and Calculator
• IC: Integrated Circuit where hundreds of transistors could be put on a
single small circuit.
• LSI: Large Scale Integration, it can pack thousands of transistors on a single chip
• MSI: Medium Scale Integration, it packs as many as 100 transistors
• PCB: Process Control Block, it is a description table which contains all the
information relevant to the whole life cycle of a process.
• SSI: Small Scale Integration, it can pack 10 to 20 transistors in a single
chip.
• UNIVAC I: Universal Automatic Calculator
• ULSI: Ultra Large-Scale Integration, it contains millions of components on a single IC
• VLSI: Very Large Scale Integration, it can pack hundreds of thousands of transistors on a single chip
1.11 Terminal Questions
1. Explain the concept of Computational Model. Describe its various types.
2. What are the different stages of evolution of Computer Architecture?
Explain in detail.
3. What is the difference between process and thread?
4. Explain the concepts of concurrent and parallel execution.
5. State Flynn’s classification of Parallel Processing.
6. Explain the types of parallelism.
7. What are the various levels of parallelism?

1.12 Answers
Self Assessment Questions
1. Problem description
2. Procedural style
3. Data-driven
4. Pascaline
5. IAS machine
6. False
7. True
8. True
9. False
10. N-client 1-server
11. True
12. Single Instruction Multiple Data
13. Multiple Instruction Single Data
14. Multiple Instruction Multiple Data
15. True
16. False
17. Utilised parallelism
18. False
19. True

Terminal Questions
1. A common foundation or paradigm that links the computer architecture
and language classes is called a Computational Model. Refer Section 1.2.
2. History of computers begins with the invention of the abacus in 3000 BC,
followed by the invention of mechanical calculators in 1617. The years
beyond 1642 till 1980 are marked by inventions of zeroth, first, second and
third generation computers. Refer Section 1.3.
3. A thread is the entity within a process that can be scheduled for execution.
Refer Section 1.4.
4. Concurrent execution is the temporal behaviour of the N-client 1-server
model where one client is served at any given moment. Parallel execution
is associated with N-client N-server model. Refer Section 1.5.
5. Flynn classifies the computer system into four categories. Refer Section
1.6.
6. There are three types of parallelism. Refer section 1.7.
7. The notion of parallelism is used in two different contexts. Either it
designates available parallelism in programs or it refers to parallelism occurring during execution, called utilised parallelism. Refer Section 1.8.

References:
• Hwang, K. (1993). Advanced Computer Architecture. McGraw-Hill.
• Godse, D. A. & Godse, A. P. (2010). Computer Organization. Technical Publications. pp. 3-9.
• Hennessy, J. L., Patterson, D. A. & Goldberg, D. (2002). Computer Architecture: A Quantitative Approach (3rd ed.). Morgan Kaufmann.
• Sima, D., Fountain, T. J. & Kacsuk, P. (1997). Advanced Computer Architectures: A Design Space Approach. Addison-Wesley Longman.

E-references:
• www.cs.clemson.edu/~mark/hist.html
• www.people.bu.edu/bkia/
• www.ac.upc.edu/
• www.inf.ed.ac.uk/teaching/courses/car/

Unit 2 Fundamentals of Computer Design

Structure:
2.1 Introduction
Objectives
2.2 Changing Face of Computing
Desktop computing
Servers
Embedded computers
2.3 Computer Designer
2.4 Technology Trends
2.5 Quantitative Principles in Computer Design
Advantages of parallelism
Principle of locality
Focus on the common case
2.6 Power Consumption
2.7 Summary
2.8 Glossary
2.9 Terminal Questions
2.10 Answers

2.1 Introduction
In the previous unit, you studied the computational model and the evolution of computer architecture. You also studied the concepts of process and thread. We also covered two types of execution - concurrent and parallel - as well as the types and levels of parallelism. In this unit, we will throw light on the changing face of computing, the task of the computer designer and the quantitative principles of computer design. We will also examine technology trends and understand the concept of power consumption and efficiency metrics.
You can define computer design as the activity that converts the architectural design of a computer into an implementation in a particular organisation. Thus, computer design is also referred to as computer implementation. The computer designer is responsible for the hardware architecture of the computer.
Objectives:
After studying this unit, you should be able to:
• identify the changing face of computing
• explain the tasks of the computer designer
• describe the technology trends
• discuss the quantitative principles of the computer design
• describe power consumption and efficiency metrics

2.2 Changing Face of Computing


Computer technology has undergone drastic changes in the roughly 60 years since the first general-purpose computer was invented. It was in the late 1970s that the microprocessor made its entrance. The microprocessor had the ability to integrate the functions of a computer's Central Processing Unit (CPU) on a single integrated circuit. This drove growth of about 35% per year in computer performance. The cost advantage of mass-producing microprocessors, combined with this 35% growth rate, led to an increasing share of the computer business being based on the microprocessor.

In the 1960s, main-frame computers used to be the most prevalent ones.
These computers required huge investments in terms of monitoring support
operators. Main-frame computers used to support distinctive applications like
business data support and large-scale scientific computing. Then, in the 1970s, came the minicomputers, which were smaller-sized computers and supported applications in scientific laboratories. These minicomputers soon rode the popularity of time-sharing, i.e., multiple users sharing the computers.
In the late 1970s, we observed the emergence of supercomputers, high-performance computers for scientific computing. This class of computers led the way to innovations that later reduced the investment required for a computer.
In the 1980s came the desktop computers - based on microprocessors - in the
form of personal computers (also known as PCs) and workstations. The
personal computers facilitated the rise of servers - computers that were highly
reliable, supported long-term data storage and access, and improved
computing power.
Individually owned computers, in the 1990s, gave rise to more personalised services and enhanced communication with other computers all over the world. This resulted in the rise of the Internet and the World Wide Web (WWW) and of personal digital assistants (PDAs). By the 2000s, the face of computing started changing with the arrival of cell phones and their extraordinary popularity.
embedded computer system is designed for particular control functions within
a larger system such as a cell phone. It is embedded as part of a complete
device. Embedded computers are computers installed into other devices. As
a matter of fact, 98% of computing devices are embedded in all kinds of electronic equipment. Computers are moving away from the desktops and
laptops and are finding use in everyday devices like mobile phones, credit
cards, planes and cars and even in homes in everyday appliances such as
stoves, refrigerators, microwaves, dishwashers, and driers. The trends have
been shown in figure 2.1.

These changes have dramatically changed the face of computing and the
computing applications. This has led to three different computer markets, each adapted to different requirements, specifications and applications. These
are explained as follows:
2.2.1 Desktop computing
Desktop computers have the largest market in terms of costs. The market spans from low-end systems to very high-end, heavily configured systems. Throughout this range, both cost and capability vary in terms of performance. This blend of performance and price matters most to the customers in the market and thus to computer designers. Consequently, the newest, highest-performance microprocessors as well as cost-reduced microprocessors sell largely in the desktop systems category.
Characteristics of desktop computing
The important characteristics of desktop computing are:
1. Ease-of-use: In desktop computers, all the computer parts come as separate, detachable components. This makes use easy and comfortable for the user.
2. Extensive graphic capabilities: It provides extensive graphics capabilities for data visualisation and manipulation. It supports two- and three-dimensional visuals.
2.2.2 Servers
The existence and popularity of servers emerged with the evolution of the desktop computer. The role of servers expanded to provide more reliable usage, storage and access of data, and to provide users with large-scale computing services. Web-based services accelerated this trend tremendously. Servers have successfully replaced traditional main-frame computers and have become the backbone of large enterprises, providing users with large-scale storage.
Characteristics of a server
For servers, different characteristics are important to be understood. They are
explained below:
1. Dependability: A server's dependability is critical. The breakdown of a server is far more disastrous than the breakdown of a single independent system.
2. Scalability: Servers are highly scalable in terms of the increasing demand
or requirements. They can be scaled up in the computing services,
memory, storage capacity, etc.
3. Efficiency: Servers are highly efficient and cost-effective. The responsiveness to individual requests remains high.
2.2.3 Embedded computers
An embedded computer is a computer system designed to perform a particular
function or task. It is embedded as component of a bigger complete device.
Embedded systems contain microcontrollers dedicated to handle a specific
task. An embedded system can also be defined as a single-purpose computer embedded in a device to control some particular function of that bigger device.
Nowadays, embedded computers are the fastest growing segment of the computer market. These devices cover a range of electronics - microwave ovens, washing machines, air conditioners and printers, all of which contain simple embedded computers - up to digital cell phones, set-top boxes, play stations, etc.
Embedded computers are based on different microprocessors - 8-bit, 16-bit or 32-bit - that execute millions of instructions per second. Even though the variety of embedded computers in the market is large, price is a chief consideration in the design of these computers. The performance requirement is also
crucial, but the chief objective is to meet the performance need at the minimum
cost.
Characteristics of embedded computers
1. Real-time performance: The performance requisite in an embedded
application is real-time execution. Speed, though in varying degrees, is an
important factor in all architectures. The ability to assure real-time
performance acts as a constraint on the speed needs of the system. Real-
time performance means that the agent is assured to perform within
certain time restraints as specified by the task and the environment.
2. Soft real-time: In a number of applications, a more nuanced requisite exists: the average time for a particular task is constrained, as well as the number of occurrences when the maximum time is exceeded. Such approaches are occasionally called soft real-time, and they arise when it is possible to sometimes miss the time limit on an event, provided that not too many of them are missed.
3. Need to minimise memory size: Memory can be a considerable element
of the system cost. Thus, it is vital to limit the memory size according to
the requirement.
4. Need to minimise memory power: Larger memory also means a higher power requirement. The use of batteries places a strong emphasis on low power. Unnecessary usage of power needs to be avoided to keep the power requirement low.
Self Assessment Questions
5. The __________ had the ability to integrate the functions of a
computer’s Central Processing Unit (CPU) on a single-integrated circuit.
6. _____________ computers used to support typical applications like
business data support and large-scale scientific computing.
7. The performance requirement in an embedded application is real-time
execution. (True/False)
8. ______________ is the chief objective of embedded computers.

2.3 Computer Designer


A computer designer is a person who designs CPUs or computers that are
actually built for considerable use. He also plays an important role in the further
development of computer designs. Computer designers are also known as computer architects. The world's first computer designer was Charles Babbage (1791-1871) (see Figure 2.2). He is considered the father of computers and
holds the credit of inventing the first mechanical computer that eventually led
to more complex designs.

Figure 2.2: Charles Babbage


A perfectly functioning difference engine, built from his original designs, was completed by the London Science Museum in 1991. Parts of his uncompleted mechanisms are also on display in the museum. Nine years later, the Science Museum completed the printer Babbage had designed for the difference engine.
Tasks of a computer designer: The tasks of a computer designer are
complex and challenging. The designer needs to determine the attributes that are important for a new computer, then design the computer to maximise performance while keeping in mind the design constraints - cost, time, power, size and memory. Computer designers often create and customise
computer systems necessary for performing a company's daily tasks. See
Table 2.1 for functional requirements and features to be met.
Table 2.1: Functional Requirements and features

The design process includes consideration of a variety of technologies, from
compilers and operating systems to logic design and packaging. Initially, the
computer designing process only involved the instruction set design.
The other stages were known as the implementation. In reality, however, the job of a computer designer is much more than just instruction set design, and the technical obstacles are much more challenging than those faced in instruction set design. Now let us quickly review the instruction set architecture.
Instruction Set Architecture (ISA)
The Instruction Set Architecture (ISA) is the part of the processor that is visible
to the programmer or compiler writer. The ISA acts as the boundary between
software and hardware. It includes the native data types, instructions,
registers, addressing modes, memory architecture, interrupt and exception
handling, and external I/O. ISA can be classified into the following categories:
1. Complex instruction set computer (CISC) - It provides a wide variety of specialised instructions, many of which may not be frequently used in practical programs.
2. Reduced instruction set computer (RISC) - This provides only the instructions that are commonly used in programs and thus makes the

processor simpler. The less common operations are implemented as
subroutines, where the extra processor execution time is offset by their
infrequent use.
3. Very long instruction word (VLIW) - In this, the processor receives many
instructions encoded and retrieved in one instruction word.
Figure 2.3 shows the Instruction Set Architecture.

Figure 2.3: Instruction Set Architecture

Now, we will discuss the low-level implementation of the 80x86 instruction set.
Computers cannot execute high-level language constructs like ones found in
C. Rather they execute a relatively small set of machine instructions, such as
addition, subtraction, Boolean operations, and data transfers. Thus, the
engineers decided to encode the instructions in a

Manipal University Jaipur B1648 Page No. 35


Computer Architecture Unit 1

numeric format (suitable for storage in memory). The structure of the ISA is
given below:
1. Class of ISA: The operands are registers or memory locations, and nearly
all ISAs today are classed as general-purpose register architectures. The 80x86
has 16 general-purpose registers and 16 registers for floating-point data. The
two popular versions of this class are register-memory ISAs, such as the 80x86,
which can access memory as part of many instructions, and load-store ISAs,
such as MIPS, which can access memory only with load or store instructions.
Figure 2.4 shows the structure of a programming model consisting of General
Purpose Registers and Memory.

Figure 2.4: Programming Model: General-Purpose Registers (GPRs) and Memory
2. Memory Addressing: Virtually all desktop and server computers, including
the 80x86, use byte addressing to access memory operands. Some designs
require the objects to be aligned. The 80x86 does not require alignment, but
accesses are generally faster if operands are aligned.
3. Addressing Mode: Every instruction of a computer specifies an operation on
certain data. There are numerous ways of specifying the address of the data
to be operated on; these different ways of specifying data are called the
addressing modes. In addition to specifying registers and constant operands,
addressing modes specify how to calculate the effective memory address of an
operand by using information held in registers.

The 80x86 supports some addressing modes for code or data:


i) Absolute/direct address

   | load | reg | address |

   (Effective address = address as given in the instruction)

   It needs large space in an instruction for a long address. It is generally
   accessible on CISC machines that have variable-length instructions.
ii) Indexed absolute address

   | load | reg | index | address |

   (Effective address = address + contents of specified index register)

   Even this needs large space in an instruction for a large address. The
   address is the beginning of an array, and the particular array element
   needed can be selected by the index.
iii) Base plus index plus offset

   | load | reg | base | index | offset |

   (Effective address = offset + contents of specified base register
                        + contents of specified index register)

The beginning address of the array could be stored in the base register, the
index will choose the particular record needed and the offset can choose the
field inside that record.
iv) Scaled

   | load | reg | base | index |

   (Effective address = contents of specified base register
                        + scaled contents of specified index register)

   The beginning of an array or vector is stored in the base register, and the
   index register could contain the number of the particular array element
   needed.
v) Register indirect

   | load | reg | base |

   (Effective address = contents of base register)

   This is a distinctive addressing mode. Many computers just use base plus
   offset with an offset value of 0.
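As a rough illustration, the effective-address calculations above can be
summarised in a few lines of C. The function names, types and the scale factor
below are hypothetical and are not part of the 80x86 definition; they only
restate the equations given for each mode.

#include <stdint.h>

/* Illustrative effective-address (EA) calculations for the modes above. */
uint32_t ea_absolute(uint32_t address) {
    return address;                      /* EA = address in the instruction  */
}
uint32_t ea_indexed_absolute(uint32_t address, uint32_t index) {
    return address + index;              /* EA = address + index register    */
}
uint32_t ea_base_index_offset(uint32_t base, uint32_t index, uint32_t offset) {
    return base + index + offset;        /* EA = offset + base + index       */
}
uint32_t ea_scaled(uint32_t base, uint32_t index, uint32_t scale) {
    return base + index * scale;         /* EA = base + scaled index         */
}
uint32_t ea_register_indirect(uint32_t base) {
    return base;                         /* EA = contents of base register   */
}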
4. Types and sizes of operands: Machine instructions operate on operands of
several types. Some types supported by ISAs include
character (e.g., 8-bit ASCII or 16-bit Unicode), signed and unsigned
integers, and single- and double-precision floating-point numbers. ISAs
typically support various sizes for integer numbers.
For example, a 32-bit architecture includes arithmetic instructions that
operate on 8-bit integers, 16-bit integers (short integers), and 32-bit
integers. Signed integers are represented using two's complement binary
representation.

5. Instructions: Machine instructions are of two types: control flow
instructions and data processing instructions. Data processing instructions
manipulate operands
in memory locations and registers; they support arithmetic operations, logic
operations, shift operations and data transfer operations. Control flow
instructions allow us to change the execution flow to an instruction other
than the next one in sequence.
6. Encoding an ISA - Several factors, such as the architecture type, the
number of general-purpose registers, the number and type of instructions,
and the number of operands, affect encoding. In variable-length encoding,
almost every operation can be combined with almost any addressing mode. In
fixed-length encoding, the opcode is combined with the addressing mode
specifiers into an instruction word of fixed size. A third technique, known
as hybrid encoding, is a combination of both: it reduces the variability in
instruction encoding but still permits multiple instruction lengths.
Implementation: Implementation of the instruction set architecture
comprises two components: organisation and hardware. The high-level
attributes of a computer's design, such as the memory system, the memory
interconnect, and the design of the internal processor or CPU, are covered by
the term organisation. The CPU, i.e., the central processing unit, is where
arithmetic, logic, branching, and data transfer are implemented. Hardware
refers to the specifics of a computer, including the detailed logic design and
the packaging technology. Frequently, a line of computers contains models with
different detailed hardware implementations but identical instruction set
architectures and nearly identical organisations. Figure 2.5 shows the
components of architecture.

Figure 2.5: Components of Architecture

Here, in this unit, the word architecture covers all three aspects of computer
design - instruction set architecture, organisation, and hardware. Thus,
computer designers must design a computer keeping in mind the functional
requirements as well as price, power, performance, and goals. The functional
requirements also have to be determined by the computer architect, which is
a tedious job. The requirements are determined after reviewing the market
specific features. Also, the computer designers must be aware of the
technology trends in the market and the use of computers to avoid
unnecessary costs and failure of the architecture system. Thus, we will study
some important technology trends in the following section.
Self Assessment Questions
5. The world’s first designer was __________________
6. _________________ acts as the boundary between software and
hardware.
7. The 80x86 has __________________ general-purpose registers.
8. CISC stands for __________________ .

Activity 1:
Visit any two organisations. Now make a list of the different type of computers
they are using - desktop, servers and embedded computers - and compare
with one another. What proportion of each type of computing are they using?

2.4 Technology Trends


Technology trends need to be studied on a regular basis in order to cope with
the dynamic and rapidly changing market. The instruction set should be
designed so that it can adapt to the rapid changes in technology. The designer
should plan for the technology changes that would lead to the success of the
computer.
There are four main technology changes that are essential to modern
implementations. These are as follows:
1. Integrated circuit logic technology: Integrated circuits or microchips are
electronic circuits manufactured by forming interconnections between
semiconductor devices. Changes in this technology occur very rapidly;
examples are the evolution of mobile phones, digital microwave ovens, etc.
2. Semiconductor DRAM (dynamic random-access memory): DRAM
uses a capacitor to store each bit of data, and the level of charge on each
capacitor determines whether that bit is a logical 1 or 0. However these
capacitors do not hold their charge indefinitely, and therefore the data
needs to be refreshed periodically. DRAM is the semiconductor memory used in
personal computers and workstations, and its capacity increases by about
40% every year.
3. Magnetic disk technology: Magnetic disks include floppy disks, compact
disks, hard disks, etc. The disk surface facing the drive head is coated with
magnetic particles organised into microscopic areas called domains; each
domain acts like a tiny magnet with north and south poles. This technology is
currently improving by about 30% every year.
4. Network technology: A network refers to a collection of computers and
hardware components connected together through communication channels.
Communication protocols govern the communication in the network and provide
the basis for network programming. Network performance depends both on the
switches and on the transmission systems.
These rapidly changing technologies shape the design of a computer that may
have a useful life of more than five years. By studying these technology
trends, computer designers have been able to reduce costs at the rate at which
the technology changes.
Self Assessment Questions
9. The designer should never plan for the technology changes that would
lead to the success of the computer. (True/False)


10. ______________ are electronic circuits manufactured by forming
interconnections between semiconductor devices.

2.5 Quantitative Principles in Computer Design


Now that we have understood the changing face of computing and the tasks
of a computer designer, we can explore principles that are useful in the
design and analysis of computers. Let us study some important evaluations and
equations. Figure 2.6 depicts the quantitative principles of computer design.

Figure 2.6: Quantitative Principles of Computer Design


2.5.1 Advantages of parallelism
Performance of the computer is improved by taking advantage of parallelism;
exploiting parallelism is one of the most important ways of enhancing
performance. Firstly, parallelism should be used at the system level. Using
multiple processors and multiple disks improves performance on a typical
server benchmark, because the workload and instructions can be spread over
them. The ability to expand the number of processors and disks further is an
important feature of servers and is known as scalability. Secondly,
parallelism among instructions can be exploited through pipelining, at the
individual processor level. The main objective of pipelining is to overlap
instruction execution so as to cut the total time taken to complete the
instruction series.
It is feasible to overlap the execution of instructions, completely or
partially, because not every instruction depends on its immediate predecessor;
this is the key factor that permits pipelining to work. Thirdly, parallelism
can also be exploited at
the level of detailed digital design. For example, set-associative caches use
multiple banks of memory that are searched in parallel to find a desired item,
and modern ALUs use carry-lookahead, which exploits parallelism to reduce the
time for calculating sums from linear to logarithmic in the number of bits per
operand.
2.5.2 Principle of locality
The principle of locality is an important program property: programs tend to
reuse the data and instructions they have used recently. The principle of
locality lets us predict, with reasonable accuracy, the data and instructions
that a program will require in the near future, based on its accesses in the
recent past.
There are two different kinds of locality: temporal locality, which states
that items referenced recently are likely to be accessed again in the near
future; and spatial locality, which states that items whose addresses are near
the recently used items tend to be referenced close together in time. Locality
is exploited by a component called cache memory, which is located between the
CPU (or processor) and the main memory, as shown in figure 2.7.

Figure 2.7: Cache Memory Position
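As a small illustration (the array a and the bound n below are hypothetical),
consider the following C loop, which exhibits both kinds of locality:

/* Summing an array shows spatial and temporal locality. */
int sum_array(const int a[], int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];   /* a[i], a[i+1], ... are adjacent in memory: spatial locality */
                       /* sum, i and the loop instructions are reused every
                          iteration: temporal locality */
    return sum;
}

A cache exploits both properties: once a[0] has been fetched, its neighbours
are likely to be in the same cache block, and the loop code and loop variables
stay in the cache because they are reused repeatedly.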

2.5.3 Focus on the common case


This is the most important and widely used principle of computer design. It
states that, when making a design trade-off, the frequent case should be
favoured over the rare case. Because the common case occurs more often,
improving it has a greater impact, so that is where resources should be spent.
Focussing on the common case works positively both for power and for resource
allocation, thus leading to overall improvement. We need to optimise the
instruction fetch and decode unit of a processor first, as it may be used more
often than a multiplier. The principle applies to dependability as well.
Optimising the frequent case is more beneficial than optimising the infrequent
case, and it is usually simpler and faster too. For example, it is rare for an
overflow to occur when adding two numbers in the processor, so performance
improves by optimising the more common case of no overflow. To apply this
principle, all we need to do is analyse what the common case is and how much
performance can be gained by making it faster. To quantify this, we will study
Amdahl's Law below.
Amdahl’s Law
This law helps compute the performance gain that can be obtained by
improving any division of the computer. Amdahl’s law states that “the
performance improvement to be gained from using some faster mode of
execution is limited by the fraction of the time the faster mode can be used.”
(Hennessy and Patterson)
Figure 2.8 shows the predicted speed using Amdahl’s law in a graphic form.

Figure 2.8: Predicted Speed using Amdahl’s Law


The law defines the Speedup ratio that can be achieved by improving any
element of the computer. Speedup is:
Speedup = (Performance for entire task using the enhancement when possible) /
          (Performance for entire task without using the enhancement)

Or,
Speedup = (Execution time for entire task without using the enhancement) /
          (Execution time for entire task using the enhancement when possible)

Amdahl’s law helps us to find the speedup from some enhancement. This
depends on the following two factors:
1. The fraction of the computation time in the original computer that can be
converted to take advantage of the enhancement - For example, if 20
seconds of the execution time of a program that takes 60 seconds in total
can use an enhancement, the fraction is 20/60. This value, which we will
call Fraction enhanced, is always less than or equal to 1.
2. The improvement gained by the enhanced execution mode; that is, how
much faster the task would run if the enhanced mode were used for the
entire program - This value is the time of the original mode over the time
of the enhanced mode. If the enhanced mode takes, say, 2 seconds for a
portion of the program, while it is 5 seconds in the original mode, the
improvement is 5/2. We will call this value, which is always greater than 1,
Speedup enhanced.
The execution time using the original computer with the enhanced mode will
be the time spent using the unenhanced portion of the computer plus the time
spent using the enhancement:

Execution time new = Execution time old ×
      [(1 - Fraction enhanced) + (Fraction enhanced / Speedup enhanced)]

The overall speedup is the ratio of the execution times:

Speedup overall = Execution time old / Execution time new
                = 1 / [(1 - Fraction enhanced) + (Fraction enhanced / Speedup enhanced)]
In the above equations, often it is difficult to calculate the new and the old
times directly.
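As a worked example (the numbers are assumed, purely to illustrate the
formula): suppose an enhancement speeds up a portion that accounts for 40% of
the original execution time by a factor of 10. Then Fraction enhanced = 0.4 and
Speedup enhanced = 10, so:

Speedup overall = 1 / ((1 - 0.4) + 0.4/10) = 1 / 0.64 ≈ 1.56

That is, the whole program runs only about 1.56 times faster, even though the
enhanced portion itself runs 10 times faster.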
Self Assessment Questions
11. Performance of the computer is improved by __________________ .
12. The ability of the servers to expand its processors and disks is known as
____________________ .

13. The main objective of _____________________ is to overlap instruction
execution so as to cut the total time taken to complete the instruction
series.
14. __________________ declares that the item referred in the recent
times has potential to be accessed in the near future.
15. __________________ states that the items nearby the location of
the recently used items may also be referred close together in time.

2.6 Power Consumption


Power consumption is another important design criterion that affects the
design of modern computers. Power efficiency can normally be traded for
performance or cost benefits. Recent processor designs put more emphasis on
power efficiency, and in the growing world of embedded computers, power
efficiency has become a major concern of computer designers.
It is now widely accepted that power is a primary concern for modern
microprocessors; indeed, it has become a design constraint in most cases.
Total power is the sum of static and dynamic power. Static power is
proportional to the number of transistors, whereas dynamic power is generally
the product of the number of transistors switching and the switching rate.
Static power is mainly a concern at the design stage, while during operation
dynamic power is the dominant energy consumer.
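As a rough guide (this is the standard CMOS approximation, not a figure taken
from this unit), dynamic power can be written as:

Power dynamic ≈ 1/2 × Capacitive load × Voltage^2 × Frequency switched

Because power depends on the square of the voltage, lowering the supply
voltage is the most effective way to reduce dynamic power, which is one reason
supply voltages have dropped from one processor generation to the next.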
Technologists estimated the rough percentage of usage by each component
of the computer. This is represented in the following figure 2.9, which shows
CPUs only drawing about 5 percent of a PC's total power.

Figure 2.9: Pie Chart Showing Power Consumption Distribution

Most techniques used to improve performance, viz. multiprogramming and
multithreading, will of course increase energy consumption. But the question
here is: does power consumption increase at a higher rate than performance?
Unfortunately, the current techniques used by designers to improve
performance are inefficient from the point of view of power consumption.
This occurs due to the following two characteristics:
1. Issuing multiple instructions incurs some overhead in logic that grows
faster than the issue rate grows. Thus, without voltage reductions to
decrease power, it is likely to lead to a lower performance per watt.
2. There is a growing gap between high (peak) issue rates and sustained
performance. The number of transistors switching is proportional to the high
issue rate, while performance is proportional to the sustained performance;
the growing gap between the two leads to increased energy consumption per
unit of performance. This gap arises from many issues.
For example, if we want to sustain four instructions per clock, we must
fetch more, issue more, and initiate execution on more than four
instructions. This widens the gap, and such techniques cannot improve
long-term power efficiency.

Self Assessment Questions


16. _________________ can normally be traded for performance or cost
benefits.
17. _________________ is the product of the transistor switching and
the switching rate.
18. The number of switching transistor rate is proportional to __________
and the performance is proportional to _____________ .

2.7 Summary
Let us recapitulate the important concepts discussed in this unit:
• There are two types of execution - concurrent and parallel.
• Computer design is an activity that converts the architecture design of the
computer into a programming structure implementation of a particular
organisation.
• Computer technology has made dramatic progress in the roughly 60 years
since the first general-purpose computer was invented.
• Desktop computers have the largest market in terms of costs. It varies
from low-end systems to very high-end heavily configured computer
systems.
• The world's first computer designer was Charles Babbage, who is considered
the father of computers.
• Computer designer needs to determine the attributes that are necessary
for a new computer, then design a computer to maximise the performance.
• The Instruction Set Architecture (ISA) is the part of the processor that is
visible to the programmer or compiler writer.
• Performance of the computer is improved by taking advantage of
parallelism.
• Focussing on the common case will work positively both for power and
resource allocation, thus, leading to advancement.

2.8 Glossary
• CISC: Complex instruction set computer
• Computer designer: A person who designs CPUs or computers that are
actually built, come into considerable use and influence the further
development of computer designs.
• Desktop computers: These are in the form of personal computers (also
known as PCs) and workstations.


• Embedded computer: A computer system designed to perform a
particular function or task.
• Instruction Set Architecture (ISA): A part of the processor that is
visible to the programmer or compiler writer.
• Integrated circuits: An electronic circuit manufactured by the patterned
diffusion of trace elements into the surface of a thin substrate of
semiconductor material.
• RISC: Reduced instruction set computer
• Supercomputers: These are high-performance computers for scientific
computing.
• VLIW: Very long instruction word

2.9 Terminal Questions


1. Describe the three types of computer markets.
2. Explain the characteristics of embedded computers.
3. Who is a computer designer? Explain the job of a computer designer.
4. What are the components of Instruction Set architecture? Discuss in
brief.
5. Explain the technology trends in computer design.
6. Discuss briefly the quantitative principles in computer design.
7. Elucidate Amdahl’s Law.

2.10 Answers
Self Assessment Questions
1. Microprocessor
2. Main-frame
3. True
4. Minimum cost
5. Charles Babbage
6. ISA
7. 16
8. Complex instruction set computer
9. False
10. Integrated circuits or microchips
11. Adopting parallelism

12. Scalability
13. Pipelining
14. Temporal Locality
15. Spatial Locality
16. Power efficiency
17. Dynamic Power
18. High issue rates, sustained performance

Terminal Questions
1. Desktop computers have the largest market in terms of costs. It varies from
low-end systems to very high-end heavily configured computer systems.
Refer Section 2.2.
2. An embedded system is a single-purpose computer embedded in a device
to control some particular function of that bigger device. The performance
requirement of an embedded application is real-time execution. Refer
Section 2.2.
3. Computer Designer is a person who has designed CPUs or computers that
were actually built and came into considerable use and influenced the
further development of computer designs. Refer Section 2.3.
4. Architecture covers all three aspects of computer design - instruction set
architecture, organisation, and hardware. Refer Section 2.3.
5. Technology trends need to be studied on a regular basis in order to cope
with the dynamic and rapidly changing market. The instruction set should
be designed so that it can adapt to the rapid changes of the technology. Refer
Section 2.4.
6. Quantitative principles in computer design are: Take Advantage of
Parallelism, Principle of Locality, Focus on the Common Case and
Amdahl’s Law. Refer Section 2.5.
7. Amdahl’s law states that the performance improvement to be gained from
using some faster mode of execution is limited by the fraction of the time
the faster mode can be used. Refer Section 2.5.
References:
• David Salomon, (2008), Computer Organisation, NCC Blackwell.
• John L. Hennessy and David A. Patterson, Computer Architecture: A
Quantitative Approach, (4th Ed.), Morgan Kaufmann Publishers

• Joseph D. Dumas II, Computer Architecture, CRC Press
• Nicholas P. Carter, Schaum's Outline of Computer Architecture, McGraw-Hill
Professional
E-references:
• http://publib.boulder.ibm.com/infocenter/zos/basics/topic/com.ibm.zos.zconcepts/zconcepts_75.html/ Retrieved on 30-03-2012
• http://www.ibm.com/search/csass/search?sn=mh&q=multiprocessing%20system&lang=en&cc=us&/ Retrieved on 31-03-2012

Unit 3 Instruction Set Principles


Structure:
3.1 Introduction
Objectives
3.2 Classifying instruction set architecture
Zero-address instructions
One-address instructions
Two-address instructions
Three-address instructions
3.3 Memory Addressing
3.4 Address Modes for Signal Processing
3.5 Operations in the instruction sets
Fetch & decode
Execution cycle (instruction execution)
3.6 Instructions for Control Flow
3.7 MIPS Architecture
3.8 Summary
3.9 Glossary
3.10 Terminal Questions
3.11 Answers

3.1 Introduction
In the previous unit, you have studied about fundamentals of computer
architecture and design. Now we will study in detail about the instruction set
and its principles.
The instruction set or the instruction set architecture (ISA) is the set of basic
instructions that a processor understands. In other words, an instruction set,
or instruction set architecture (ISA), is the part of the computer architecture
related to programming, including the native data types, instructions, registers,
addressing modes, memory architecture, interrupt and exception handling,
and external I/O. There are a number of instructions in a program that have to
be accessed in a particular sequence. This encourages us to describe the
issue of instruction and its sequence which we will study in this unit. In this
unit, you will study the fundamentals involved in instruction set architecture
and design. Firstly, the operations in the instruction sets, instruction set
architecture, memory locations and addresses, memory addressing, abstract
model of the main memory, and instructions for control flow need to be
categorised. Also, we will discuss about MIPS (Microprocessor without
Interlocked Pipeline Stages) architecture.
Objectives:
After studying this unit, you should be able to:
• classify instruction set architecture
• identify memory addressing
• explain address modes for signal processing
• list the various operations in the instruction sets
• recognise instructions for control flow
• describe MIPS architecture along with its characteristics

3.2 Classifying Instruction Set Architecture


The reference manuals provided with a computer system contain a description
of its physical and logical structure. They describe the internal construction
of the CPU, as well as the processor registers available and their logical
capabilities. The manuals explain all the hardware-executed instructions,
their binary code format, and give an accurate definition of each instruction.
The control unit of the CPU interprets each instruction code and provides the
essential control functions required to process the instruction.
The instruction format is generally represented in a rectangular box denoting
the bits of the instruction as they appear in memory words or in a control
register. The bits of the instruction are separated into groups called fields.
The most common fields found in instruction formats are:
1. An operation code field that specifies the operation to be performed.
2. An address field that designates a memory address or a processor
register.
3. A mode field that specifies the way the operand or the effective address is
determined.
Apart from these fields some other special fields can also be employed, for
example a field that gives the number of shifts in a shift-type instruction.
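As a purely illustrative sketch (the field widths below are hypothetical and
do not describe any particular machine), a 32-bit instruction word containing
these three fields could be declared in C as a bit-field structure:

struct instruction {
    unsigned opcode  : 8;   /* operation to be performed                     */
    unsigned mode    : 4;   /* how the operand or effective address is found */
    unsigned address : 20;  /* memory address or processor register          */
};                          /* 8 + 4 + 20 = 32 bits in total                 */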
The operation code field of an instruction is a collection of bits that
specifies a processor operation, such as add, subtract, complement, or shift.
The bits that define the mode field of the instruction code specify a variety
of alternatives for choosing the operands from the given address. Operations
specified by computer instructions are executed on data stored in memory or
processor registers. Operands residing in memory are identified by a memory
address, while those residing in processor registers are given by a register
address.
A register address is a binary number of k bits that defines one of 2^k
registers in the CPU. Thus, a CPU with 16 processor registers R0 through R15 will have
a register address field of four bits. The binary number 0101, for example, will
designate register R5. Instructions in computers can be of different lengths
containing varying number of addresses. The following are the different types
of instruction formats:
3.2.1 Zero-address instructions
In zero-address machines, both operands are assumed to be stored at a
default location. The stack is used as the source of the input operands, and
the result goes back onto the stack. A stack is a LIFO (last-in-first-out)
data structure, which is supported by all processors, whether or not they are
zero-address machines. LIFO implies that the last item placed on the stack is
the first item to be taken out of the stack.
All operations on this type of machine assume that the required input operands
are the top two values on the stack. The result of the operation is placed on
top of the stack. Table 3.1 gives some sample instructions for the stack
machines. Notice that the first two instructions are not zero-address
instructions. These two are special instructions that use a single address and
are used to move data between memory and stack.
Table 3.1: Sample Stack Machine Instructions
Instruction    Semantics
push addr      Places the value at address addr on top of the stack:
               push([addr])
pop addr       Stores the top value on the stack at memory address addr:
               M(addr) = pop
add            Adds the top two values on the stack and pushes the result onto
               the stack: push(pop + pop)
sub            Subtracts the second top value from the top value of the stack
               and pushes the result onto the stack: push(pop - pop)
mult           Multiplies the top two values on the stack and pushes the result
               onto the stack: push(pop * pop)

The zero-address format is used by all other instructions. Now, we will see
how the stack machine converts the arithmetic expression we studied in the
earlier subsections. In these machines, the statement:
A = B + C*D - E + F + A
is translated to the following code:
push E    ; <E>
push C    ; <C, E>
push D    ; <D, C, E>
mult      ; <C*D, E>
push B    ; <B, C*D, E>
add       ; <B+C*D, E>
sub       ; <B+C*D-E>
push F    ; <F, B+C*D-E>
add       ; <F+B+C*D-E>
push A    ; <A, F+B+C*D-E>
add       ; <A+F+B+C*D-E>
pop A     ; <>
On the right, we show the state of the stack after executing each instruction.
The top element of the stack is shown on the left. Notice that we pushed E
early because we need to subtract it from (B+C*D).
To implement stack machines efficiently, the top portion of the stack is kept
internal to the processor; the number of entries kept internally is known as
the stack depth. The remaining stack is kept in memory. Thus, to use the top
values that are within the stack depth, we do not have to access the memory.
3.2.2 One-address instructions
Earlier, memory used to be expensive and slow, so a special set of registers
was used to provide an input operand to the ALU and to receive the result from
it. Because results accumulate in them, these registers are known as
accumulators. Mostly, there is only one accumulator register in a machine.
This type of design, called accumulator machines, is attractive mainly when
memory is expensive.
Most operations, in accumulator machines, are performed on the contents of
the accumulator and the operand supplied by the instruction. Therefore, these
machines’ instructions need to state only the address of an individual operand.
A few sample accumulator machine instructions are shown in table 3.2.
Table 3.2: Sample Accumulator Machine Instructions
Instruction    Semantics
load addr      Copies the value at address addr into the accumulator:
               accumulator = [addr]
store addr     Stores the value in the accumulator at the memory address addr:
               M(addr) = accumulator
add addr       Adds the contents of the accumulator and the value at address
               addr: accumulator = accumulator + [addr]
sub addr       Subtracts the value at memory address addr from the contents of
               the accumulator: accumulator = accumulator - [addr]
mult addr      Multiplies the contents of the accumulator and the value at
               address addr: accumulator = accumulator * [addr]

In these machines, the C statement:


A=B+C*D-E+F+A
is converted to the following code:
load C ; load C into the accumulator
mult D ; accumulator = C*D
add B ; accumulator = C*D+B
sub E ; accumulator = C*D+B-E
add F ; accumulator = C*D+B-E+F
add A ; accumulator = C*D+B-E+F+A
store A ; store the accumulator contents
3.2.3 Two-address instructions
Here the instruction has two address fields, each of which specifies either a
memory word or a processor register. Usually, we use dest (as in table 3.3) to
indicate the address used for the destination; this address also supplies one
of the source operands. The Pentium is an example of a processor that uses two
addresses. Table 3.3 gives some sample instructions of a two-address machine.
On these machines, the C statement
A=B+C*D-E+F+A
is converted to the following code:
load T,C ; T = C
mult T,D ; T = C*D
add T,B ; T= B+ C*D
sub T,E ; T= B+ C*D - E
add T,F ; T= B+ C*D - E+ F
add A,T ; A= B + C*D - E + F + A

Table 3.3: Sample Two-Address Machine Instructions

3.2.4 Three-address instructions


Here the instruction specifies three addresses. The general format of an
instruction is: operation dest, op1, op2
where:
• operation - operation to be carried out;
• dest - address to store the result
• op1, op2 - operands on which instruction is to be executed.
Three-address machines specify all three addresses explicitly. RISC processors
use three addresses. Table 3.4 gives some sample instructions of a
three-address machine.
In these machines, the C statement:
A=B+C*D-E+F+A
is converted to the following code:
mult T,C,D ; T = C*D
add T,T,B  ; T = B + C*D
sub T,T,E  ; T = B + C*D - E
add T,T,F  ; T = B + C*D - E + F
add A,T,A  ; A = B + C*D - E + F + A
Table 3.4: Sample Three-Address Machine Instructions

The three-address format results in short programs when evaluating arithmetic
expressions. This is the biggest benefit of the three-address format. The
shortcoming is that the binary-coded instructions need too many
bits to specify three addresses. For example, the Cyber 170 is a commercial
computer which uses three-address instructions (see figure 3.1). The
instruction formats in the Cyber computer are restricted to either three
register address fields or two register address fields and one memory address
field.

Figure 3.1: Cyber 170 CPU Architecture


A comparison
Each of the four different types of addressing examined above has its own
advantages. The number of instructions that need to be executed increases as
the number of addresses is reduced. Now, let us
assume that the number of memory accesses is our performance metric; the lower
this number, the better.
In the three-address machine, every instruction takes four memory accesses:
one access to read the instruction, two for getting the two input operands, and
a final one to write the result back in memory. As there are a total of five
instructions, this machine generates a total of 20 memory accesses.
Similar to the three-address machine, in the two-address machine, each
arithmetic instruction takes four accesses. Remember, one address is used to
double as a source and destination address. Thus, the five arithmetic
instructions require 20 memory accesses. Additionally, we have the load
instruction that needs three accesses. As a result, it gives a total of 23 memory
accesses.
Reading or writing the accumulator does not require a memory access, as the
accumulator is a register, and thus the accumulator machine fares better. In
this machine, only two accesses are required by each instruction. As there are
seven instructions, this machine generates 14
memory accesses. Finally, if it is assumed that the stack depth is large
enough that all our push and pop operations stay within this limit, the stack
machine takes 19 accesses. This number is obtained by noting that each push or
pop instruction takes two memory accesses, whereas the five arithmetic
instructions take one memory access each.
This comparison suggests that the accumulator machine is the fastest. Note,
however, that both the accumulator and the stack machines assume the existence
of registers, whereas the same has not been assumed for the other two
machines. Though the three addresses of an instruction could all be register
addresses, here in particular it is assumed that there are no registers on the
three- and two-address machines. If we assume that these two machines have a
single register to hold the temporary T, the count for the three-address
machine falls to 12 memory accesses. The corresponding number for the
two-address machine is 13 memory accesses.
This simple example shows that as we reduce the number of addresses, we
tend to increase the number of memory accesses.
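To recap the counts for A = B + C*D - E + F + A (with no registers assumed on
the three- and two-address machines): the three-address machine needs 5
instructions × 4 accesses = 20; the two-address machine needs 5 × 4 + 3 for
the load = 23; the accumulator machine needs 7 × 2 = 14; and the stack machine
needs 7 push/pop × 2 + 5 arithmetic × 1 = 19 memory accesses.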
Self Assessment Questions
1. The bits of the instruction are divided into groups called __________ .
2. _____________ use an implied accumulator (AC) register for all data
manipulation.

Activity 1:
After learning about the different instruction formats, find out which
instruction format your computer is based on and compare that format with
the other formats.

3.3 Memory Addressing


Memory addressing is the logical structure of a computer’s random-access
memory (RAM). We all know that a cell is the general term used for the
smallest unit of memory that the CPU can read or write. The size of a cell in
most modern computers is 8 bits. 8 bits join to form 1 byte. Hardware-
accessible units of memory larger than one cell are called words.
At present, 32 bits (4 bytes) and 64 bits (8 bytes) are the most common word
sizes. Each memory cell has an exclusive integer address, thus, the CPU
accesses a cell by using its address. Addresses of logically adjacent cells differ
by 1. Thus, the address space of a processor is the range of possible integer
addresses, typically (0 : 2^n - 1).
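For example, a processor with 32-bit addresses (n = 32) can address
2^32 = 4,294,967,296 cells, that is, 4 GB of byte-addressable memory.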
Any operation to be performed is specified by the operation field of the
instruction. The execution of the operation is performed on some data stored
in computer registers or memory words. Selection of operands during program
execution depends on the addressing mode of the instruction. There are
various ways of specifying address of the data to be operated on. These
different ways of specifying data are called the addressing modes. In other
words, Addressing modes are the method used to determine which part of
memory is being referred to by a machine instruction. RAM is divided into a
number of sections which are referenced individually through the addressing
modes. The CPU accesses that portion of memory and performs the action
specified by the machine instruction. Depending upon the type of computer
architecture, the addressing mode is selected. The purpose of using address
mode techniques by the computer is to accommodate one or both of the
following provisions:
1. To give programming versatility to the user by providing such facilities as
pointers to memory, counters for loop control, indexing of data, and
program relocation.
2. To reduce the number of bits in the addressing field of the instruction.
Self Assessment Questions
3. Selection of operands during program execution does not depend on the
addressing mode of the instruction. (True/ False)
4. Hardware-accessible units of memory larger than one cell are called
words. (True/ False)

3.4 Address Modes for Signal Processing


A distinct addressing mode field is required in instruction format for signal
processing as shown in figure 3.2. The operation code (opcode) specifies the
operation to be performed. The mode field is responsible for locating the
operands needed for the operation.
An address field in an instruction may or may not be present. If it’s there, it
may designate a memory address and if not, then a processor register may be
designated. It is noticeable that each address field may be associated with its
own specific addressing mode.

Opcode Mode Address


Figure 3.2: Instruction Format with Mode Field
The following are the different types of address modes:
Implied Mode: The operands in this mode are specified implicitly in the
explanation of the instruction. For example, the instruction ‘‘complement
accumulator’’ is considered as an implied mode instruction as the description
of the instruction implies the operand in the accumulator register. In fact, all
register reference instructions that use an accumulator are implied mode
instructions. Zero-address instructions are also implied mode instructions.
For example, the operation:
<a: = b + c;>
can be done using the sequence
<load b; add c; store a;>
The destination (the accumulator) is implied in every "load" and "add"
instruction; the source (the accumulator) is implied in every "store" instruction.
Immediate Mode: The operand in this mode is stated in the instruction itself,
i.e. there is an operand field rather than an address field in the immediate
mode instruction. The operand field contains the actual operand to be used in
union with the operation specific in the instruction. For example:
MVI B, #20h
Means the value 20h is moved to register B
ADD r0, #50h ; (Add 50h to the contents of R0)
Register Mode: In this mode, the operands are in registers that reside within
the CPU. The register required is chosen from a register field in the instruction.
For example:
Add R4, R3
Means Add the value of R4 and R3 and store in R4
MOV AL, BL
Means, Move the content of register BL to AL
Register Indirect Mode: In this mode, the instruction specifies a register in
the CPU that contains the address of the operand and not the operand itself.
Usage of register indirect mode instruction necessitates the placing of memory
address of the operand in the processor register with a previous instruction.
For example:
ADD R4, (R1)
MOV CX, [BX]
Means the contents of BX (representing the memory address) register will be
moved to CX register
Auto-increment or Auto-decrement Mode: After a register has been used to
access data in memory, it is often necessary to increment or decrement that
register. This could be done with a separate increment or decrement
instruction; however, because the operation is so common, some computers
provide a special mode that increments or decrements the content of the
register automatically. For example:
Auto-increment:
Add R1, (R2)+
Auto-decrement:
Add R1,-(R2)
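In C terms (the pointer p and the loop below are hypothetical, shown only as
an analogy), auto-increment corresponds to the familiar pointer idiom *p++,
and auto-decrement to *--p:

/* The compiler can map *p++ onto an auto-increment addressing mode. */
int sum_list(const int *p, int n) {
    int sum = 0;
    while (n-- > 0)
        sum += *p++;   /* use the value p points to, then advance p */
    return sum;        /* *--p would move p back first, then use the value */
}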
Direct Addressing Mode: In this mode, the operand resides in memory and
its address is given directly by the address field of the instruction, such
that the effective address is equal to the address part of the instruction.
For example:
LD Acc, [5]
(Load the value in memory location 5 into the accumulator)
MOV A, 30h
This instruction will read the data out of the internal RAM address 30
(hexadecimal) and store it in the Accumulator.
Indirect Addressing Mode: Unlike direct address mode, in this mode, the
address field gives the address where the effective address is stored in
memory. Control fetches the instruction from memory and uses its address part
to access memory again to read the effective address.
A few addressing modes require that the address field of the instruction be
added to the content of a specific register in the CPU. The effective address
in these modes is obtained from the following equation:
Effective address = Address part of instruction + Content of CPU register
The CPU Register used in the computation may be the program counter, Index
Register or a base Register. For example:
LD Acc, [5]
(Load the value stored in the memory location pointed to by the operand into
the accumulator)
Relative Address Mode: This mode is applied often with branch type
instruction where the branch address position is relative to the instruction word
address. As such in this mode, the program counter contents are added to the
address element of the instruction so as to acquire the effectual address
whose location in memory is relative to the address of the following instruction.
Since the relative address can be specified with the smaller number of bits
than those required to design the entire memory address, it results in a shorter
address field in the instruction format. For example:
JMP +2
(Will tell the processor to move 2 bytes ahead)
MOV CL, [BX+4]
Moves the value located 4 bytes beyond the address in BX into register CL
Indexed Addressing Mode: In this mode, the effective address is acquired
by adding the index register content to an instruction’s address element. The
index register is a unique CPU register which contains an index value and can
be added after its value is used to access the memory. For example:
Add R3, (R1 + R2)
My_array DB ‘1’, ‘2’, ‘3’,’4’,’5’;
MOV AL, My_array [3];

So AL holds value 4.
Base Register Addressing Mode: In this mode, the effective address is
obtained by adding the content of a base register to the address part of the
instruction, just as in the indexed addressing mode, except that the register
here is a base register and not an index register.
MOV AX, [1000+BX]
This instruction adds the contents of BX with 1000 to produce the address of
the memory value to fetch. This instruction is useful for accessing elements of
arrays, records, and other data structures.
The difference between the base register and indexed addressing modes is
based on their usage rather than their computation. An index register is
assumed to hold an index number that is relative to the address part of the
instruction.
A base register is assumed to hold a base address, and the address field of
the instruction gives a displacement relative to this base address. The
base register addressing mode is handy for relocation of programs from one
memory to another as required in multi programming systems.
The address values of instructions must reflect this change of position; with
a base register, the displacement values of instructions do not have to
change. Only the value of the base register requires updating to reflect the
beginning of a new memory segment.
Self Assessment Questions
5. ____________ instructions are implied mode instructions.
6. Relative Address Mode is applied often with __________ instruction.
7. The ______________ is a special CPU register that contains an index
value.

3.5 Operations in the Instruction Sets


A program is a sequence of instructions located in the computer's memory
unit. The program is executed by following a cycle for each instruction. Every
instruction cycle is subdivided into a series of sub-cycles or phases. The
following are the parts of an instruction cycle:
1. Fetch an instruction from memory.
2. Decode the instruction.
3. Read the effective address from memory if the instruction has an indirect
address.
4. Execute the instruction.
After the completion of step 4, the control goes back to step 1 to fetch, decode
and execute the next instruction. This process continues indefinitely unless a
HALT instruction is encountered. In an improved instruction execution cycle,
we can introduce a third cycle known as the interrupt cycle. Figure 3.3
illustrates how the interrupt cycle fits into the overall cycle.


Figure 3.3: Instruction Cycle with Interrupts

3.5.1 Fetch & decode


To bring the instructions from main memory into the instruction register, the
CPU first places the value of PC into memory address register. The PC always
points to the next instruction to be executed. The memory read is initiated and
the instruction from that location gets copied in Instruction Register (IR). PC is
also incremented by one simultaneously so that it points to the next instruction
to be executed. This completes the fetch cycle for an instruction as shown in
figure 3.4.

Figure 3.4: Instructions Cycle


Decoding means interpretation of the instruction. Each and every instruction
initiates a sequence of steps to be executed by the CPU. Decoding means
deciding which course of action is to be taken for execution of the instruction
and what sequence of control signals must be generated for it. Before
execution, operands, i.e. necessary data is fetched from the memory.
3.5.2 Execution cycle (instruction execution)
As studied in the previous sections, the fundamental task performed by a
computer is the implementation of a program. The program, that is to be
executed, is a set of instructions, and is stored in memory. The task is
completed when instructions of the program are executed by the central
processing unit (CPU). The CPU is mainly responsible for the instruction
execution. Now, let us examine several typical registers, some of which are
generally available in most machines.
These registers are:
Memory Address Register (MAR): It identifies the address of memory
location from where the data or instruction is to be accessed (for read
operation) or where the data is to be stored (for write operations).
Memory Buffer Register (MBR): It is a register that temporarily stores the
data that is to be written in the memory (for write operations) or the data
received from the memory (for read operation).
Program Counter (PC): The program counter keeps a record of the
instruction that is to be performed after the instruction in progress.
Instruction Register (IR): Instructions are loaded into this register before
they are executed.
The model of instruction processing can simply be stated in a two-step
process. Firstly, the CPU reads (fetches) instructions (codes) from the memory
one by one, and executes or performs the operation specified by this
instruction.
The instruction fetch is performed for every instruction. Instruction fetch
involves reading of an instruction from a position, where it is stored, in the
memory to the CPU. The execution of this instruction may entail various
operations as per the nature of the instruction.
An instruction cycle refers to the processing needed for a single instruction
(fetch and execution). The instruction cycle consist of the fetch cycle and the
execute cycle.
Program execution comes to an end if:

• The electric power supply is stopped, or
• Any irrecoverable error occurs, or
• All the instructions of the program have been executed in sequence.
The fetched instruction is in the form of binary code and is loaded into an
instruction register (IR), in the CPU. The CPU interprets the instruction and
takes the required action.
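The two-step model described above can be summarised by the following C-style
sketch. The register and memory names follow the list above, the memory size
is hypothetical, and decode_and_execute() is a stand-in for the CPU's internal
control steps, so this is an illustration rather than the circuitry of any
particular processor:

unsigned int memory[65536];           /* main memory (hypothetical size)     */
unsigned int PC, MAR, MBR, IR;        /* program counter and CPU registers   */
int decode_and_execute(unsigned int ir);  /* stand-in; returns 0 on HALT     */

void run(void) {
    int running = 1;
    while (running) {
        MAR = PC;                     /* address of the next instruction     */
        MBR = memory[MAR];            /* fetch: read the instruction word    */
        IR  = MBR;                    /* load the instruction register       */
        PC  = PC + 1;                 /* point to the following instruction  */
        running = decode_and_execute(IR);  /* interpret and perform the
                                              operation; operands may be read
                                              or written through MAR and MBR */
    }
}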
Self Assessment Questions
8. In an improved instruction execution cycle, we can introduce a third cycle
known as the ____________________________ .
9. Write the full form of:
a. MAR b. MBR

3.6 Instructions for Control Flow


Memory locations are storage houses for instructions. When processed in the
CPU, the instructions are fetched from consecutive memory locations and
implemented. Each time an instruction is fetched from memory, the program
counter is simultaneously incremented with the address of the next instruction
in sequence. Once a data transfer or data manipulation instruction is executed,
control returns to the fetch cycle with the program counter containing the
address of the instruction next in sequence.
In case of a program control type of instruction, execution of instruction may
change the address value in the program counter and cause the flow of control
to be altered. The conditions for altering the content of the program counter
are specified by program control instruction, and the conditions for data-
processing operations are specified by data transfer and manipulation
instructions.
As a result of execution of a program control instruction, a change in value of
program counter occurs, which causes a break in the sequence of instruction
execution. This is an important feature in digital computers, as it provides
control over the flow of program execution and a capability for branching to
different program segments. Some typical program control instructions are
listed in table 3.5.
Table 3.5: Typical Program Control Instructions
Name Mnemonic
Branch BR
Jump JMP
Skip SKP
Call CALL
Return RET
Compare (by subtraction) CMP
Test (by ANDing) TST

The branch and jump instructions are identical in their use but sometimes they
are used to denote different addressing modes. The branch is usually a one-
address instruction. Branch and jump instructions may be conditional or
unconditional.
An unconditional branch instruction, as a name denotes, causes a branch to
the specified address without any conditions. On the contrary the conditional
branch instruction specifies a condition such as branch if positive or branch if
zero. If the condition is met, the program counter is loaded with the branch
address and the next instruction is taken from this address. If the condition is
not met, the program counter remains unaltered and the next instruction is
taken from the next location in sequence.
The skip instruction does not require an address field and is, therefore, a zero-
address instruction. A conditional skip instruction will skip the next instruction,
if the condition is met. This is achieved by incrementing the program counter
during the execute phase in addition to its being incremented during the fetch
phase. If the condition is not met, control proceeds with the next instruction in
sequence where the programmer inserts an unconditional branch instruction.
Thus, a skip-branch pair of instructions causes a branch if the condition is not
met, while a single conditional branch instruction causes a branch if the
condition is met.
The call and return instructions are used in conjunction with subroutines. The
compare instruction performs a subtraction between two operands, but the
result of the operation is not retained. However, certain status bit conditions
are set as a result of the operation. In a similar fashion, the test instruction
performs the logical AND of two operands and updates certain status bits
without retaining the result or changing the operands. The status bits of
interest are the carry bit, the sign bit, a zero indication, and an overflow
condition.
The four status bits are symbolised by C, S, Z, and V. The bits are set or
cleared as a result of an operation performed in the ALU.
1. Bit C (carry) is set to 1 if the end carry C8 is 1. It is cleared to 0 if the carry
is 0.
2. Bit S (sign) is set to 1 if the highest-order bit F7 is 1. It is set to 0 if the bit
is 0. S = 0 indicates a positive number and S = 1 a negative number.
3. Bit Z (zero) is set to 1 if the result of the ALU contains all 0’s. It is cleared
to 0 otherwise. In other words, Z = 1 if the result is zero and Z = 0 if the
result is not zero.
4. Bit V (overflow) is set to 1 if the exclusive-OR of the last two carries is equal
to 1, and cleared to 0 otherwise. This is the condition for an overflow when
negative numbers are in 2’s complement. For the 8-bit ALU, V = 1 if the
result is greater than +127 or less than -128.
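As an illustration only (this C sketch is not part of the original text; it simply models the four rules above for an 8-bit addition, and the function and variable names are made up):

#include <stdint.h>

/* Model of the status bits C, S, Z, V for the 8-bit addition F = A + B. */
void status_bits_add8(uint8_t a, uint8_t b, int *c, int *s, int *z, int *v)
{
    uint16_t wide = (uint16_t)a + (uint16_t)b;   /* keeps the end carry C8      */
    uint8_t  f    = (uint8_t)wide;               /* the 8-bit ALU result F      */

    *c = (wide >> 8) & 1;                        /* C: end carry out of bit 7   */
    *s = (f >> 7) & 1;                           /* S: highest-order bit F7     */
    *z = (f == 0);                               /* Z: result is all 0's        */
    /* V: carry into bit 7 XOR carry out of bit 7, i.e. 2's-complement overflow */
    *v = ((((a & 0x7F) + (b & 0x7F)) >> 7) & 1) ^ *c;
}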
As you can see in figure 3.5, the status bits can be checked after an ALU
operation to determine certain relationships that exist between the values of A
and B. If bit V is set after the addition of two signed numbers, it indicates an
overflow condition.

Figure 3.5: Status Register Bits


If Z is set after an exclusive-OR operation, it indicates that A = B. This is so
because A XOR B = 0 when the two operands are equal, and an all-0's result
sets the Z bit. A single bit in A can be checked to determine if it is
0 or 1 by masking all bits except the bit in question and then checking the Z
status bit.
Self Assessment Questions
10. When processed in the CPU, the instructions are fetched from
_________________ locations and implemented.
11. The ______________ and _____________ are identical in their use
but sometimes they are used to denote different addressing modes.

3.7 MIPS Architecture


After considerable research on efficient processor organisation and VLSI
integration at Stanford University, the MIPS (Microprocessor without
Interlocked Pipeline Stages) architecture evolved. At the same time, a
research group at Berkeley designed the RISC-I chip based on almost the
same ideas. Today, the acronym RISC is interpreted as "regular instruction
set computer", and the RISC ideas are used in every current microprocessor
design. To get a better idea about MIPS Architecture, look at figure 3.6.

Figure 3.6: MIPS Architecture


The principal features of the MIPS architecture are as follows:
• It has a five-stage execution pipeline: fetch, decode, execute, memory-
access, write-result.
• It has a regular instruction set where all instructions are 32-bit.
• There are three-operand arithmetical and logical instructions.
• It consists of 32 general-purpose registers of 32-bits each.
• There is no status register and there are no instruction side-effects.
• There are no complex instructions (like stack management, string
operations, etc.).
• It has optional coprocessors for system management and floating-point.
• Only the load and store instructions access memory.
• It has a flat address space of 4 Gbytes of main memory (2^32 bytes).
• The Memory-management unit (MMU) maps virtual to actual physical
addresses.
• Optimising C compiler replaces hand-written assembly code.

• Hardware structure does not check dependencies.
• Its software tool chain knows about hardware and generates correct code.
MIPS Corporation originated in 1984. The R2000 microprocessor was their first
product, followed by the R2010 floating-point coprocessor. Early systems made
effective use of both chips. The next MIPS processor was the R3000, a variant
of the R2000 with an identical instruction set but optimised for low-cost
embedded systems. This processor and its system-on-a-chip implementations
are still popular and used extensively even today. Since then, several improved
variants of the original instruction set have been introduced:
• MIPS-I: This is the original 32-bit instruction set; and is still common.
• MIPS-II: It is an improved instruction set with dozens of new instructions.
• MIPS-III: It has a 64-bit instruction set used by the R4000 series.
• MIPS-IV: It is an upgrade of the MIPS III.

The most significant characteristic of the MIPS architecture is the regular
register set. It consists of the 32-bit wide program counter (PC) and a bank of
32 general-purpose registers called r0, ..., r31, each of which is 32 bits
wide. All general-purpose registers can be used as target registers and
data sources for all logical, arithmetical, memory-access, and control-flow
instructions. Only r0 is special because it is internally hardwired to zero.
Reading r0 always returns the value 0x00000000, and a value written to r0 is
ignored and discarded.
Self Assessment Questions
12. One of the key features of the MIPS architecture is the ___________ .
13. Two separate 32-bit registers called ___________ and __________
are provided for the integer multiplication and division instructions.

Activity 2:
Visit a computer hardware store and try to collect as much information as
possible about the MIPS processor. Compare its features with other
processors.

3.8 Summary
• Each computer has its own particular instruction code format called its
Instruction Set.
• The different types of instruction formats are three-address instructions,
two-address instructions, one-address instructions and zero-address
instructions.
• A distinct addressing mode field is required in instruction format for signal
processing.
• The program is executed by going through a cycle for each instruction.
• The prototype chip of MIPS architecture demonstrated that it is possible to
integrate a microprocessor with five-stage execution pipeline and cache
controller into a single silicon chip.

3.9 Glossary
• Cell: The smallest unit of memory that the CPU can read or write is cell.
• Decoding: It means interpretation of the instruction.
• Fields: Groups containing bits of instruction.
• Instruction set: Each computer has its own particular instruction code
format called its Instruction Set.
• MIPS: Microprocessor without Interlocked Pipeline Stages.
• Operation: It is a binary code that instructs the computer to perform a
specific operation.
• RISC: Reduced Instruction Set Computer
• Words: Hardware-accessible units of memory larger than one cell are
called words.

3.10 Terminal Questions


1. What are instruction sets? Explain the fields found in instruction formats.
2. Give the classification of the various instruction sets.
3. Define memory addressing.
4. Explain the different types of addressing modes.
5. Describe the instruction cycle and its various phases.
6. Explain the instructions required for control flow.
7. Write a short note on MIPS architecture.

3.11 Answers
Self Assessment Questions
1. Fields
2. One-address instructions
3. False
4. True
5. Zero-address

6. Branch type
7. Index register
8. Interrupt cycle
9. a. Memory Buffer Register
10. b. Memory Address Register
11. Consecutive memory
12. Branch, jump instructions
13. Regular register set.
14. HI, LO

Terminal Questions
1. Each computer has its own particular instruction code format called its
Instruction Set. Refer Section 3.2.
2. The different types of instruction formats are three-address instructions,
two-address instructions, one-address instructions and zero-address
instructions. Refer Section 3.2.
3. Memory addressing is the logical structure of a computer's random-access
memory (RAM). Refer Section 3.3.
4. A distinct addressing mode field is required in instruction format for signal
processing. Refer Section 3.4.
5. The program is executed by going through a cycle for each instruction.
Each instruction cycle is now subdivided into a sequence of sub cycles or
phases. Refer Section 3.5.
6. The conditions for altering the content of the program counter are specified
by program control instructions, and the conditions for data-processing
operations are specified by data transfer and manipulation instructions.
Refer Section 3.6.
7. After considerable research on efficient processor organisation and VLSI
integration at Stanford University, the MIPS architecture evolved. Refer
Section 3.7.

References:
• Hwang, K. (1993) Advanced Computer Architecture. McGraw-Hill, 1993.
• D. A. Godse & A. P. Godse (2010). Computer Organization. Technical
Publications. pp. 3-9.
• John L. Hennessy, David A. Patterson, David Goldberg (2002)
"Computer Architecture: A Quantitative Approach", Morgan Kaufmann; 3rd
edition.

• Dezso Sima, Terry J. Fountain, Peter Kacsuk (1997) Advanced computer
architectures - a design space approach. Addison-Wesley- Longman: I-
XXIII, 1-766.
E-references:
• http://tams-www.informatik.uni-hamburg.de/applets/hades/webdemos/mips.html
• http://www.withfriendship.com/user/servex/mips-architecture.php
• http://en.wikipedia.org/wiki/File:CDC_Cyber_170_CPU_architecture.png
Unit 4 Pipelined Processor
Structure:
4.1 Introduction
Objectives
4.2 Pipelining
4.3 Types of Pipelining
4.4 Pipelining Hazards
4.5 Data Hazards
4.6 Control Hazards
4.7 Techniques to Handle Hazards
Minimising data hazard stalls by forwarding
Reducing pipeline branch penalties
4.8 Performance Improvement Pipeline
4.9 Effects of Hazards on Performance
4.10 Summary
4.11 Glossary
4.12 Terminal Questions
4.13 Answers

4.1 Introduction
In the previous unit, you studied the changing face of computing, as well as
the meaning and tasks of a computer designer. We also covered the technology
trends and the quantitative principles in computer design.
this unit, we will introduce you to pipelining processing, the pipeline hazards,
structural hazards, control hazards and techniques to handle them. We will
also examine the performance improvement with pipelines and understand the
effect of hazards on performance.
A parallel processing system can carry out concurrent data processing to
attain quicker execution time. For example, as an instruction is being executed

in the ALU, the subsequent instruction can be read from memory. The system
may have more than one ALU and be able to execute two or more instructions
simultaneously. Additionally, the system may have two or more processors
operating at the same time. The rationale of parallel processing is to speed up
the computer's processing capability and increase its throughput.
Parallel processing can be viewed from various levels of complexity. A
multifunctional organisation is usually associated with a complex control unit
to coordinate all the activities among the various components.
There are a variety of ways in which parallel processing can be done. We
consider parallel processing under the following main topics:
1. Pipeline processing
2. Vector processing
3. Array processing
Out of these, we will study the pipeline processing in this unit.
Objectives:
After studying this unit, you should be able to:
• explain the concept of pipelining
• list the types of pipelining
• identify various pipeline hazards
• describe data hazards
• discuss control hazards
• analyse the techniques to handle hazards
• describe the performance improvement with pipelines
• explain the effect of hazards on performance

4.2 Pipelining
An implementation technique by which the execution of multiple instructions
can be overlapped is called pipelining. This pipeline technique splits up the
sequential process of an instruction cycle into sub-processes that operates
concurrently in separate segments. As you know computer processors can
execute millions of instructions per second. At the time one instruction is
getting processed, the following one in line also gets processed within the
same time, and so on. A pipeline permits multiple instructions to get executed
at the same time. Without a pipeline, every instruction has to wait for the
previous one to be complete. The main advantage of pipelining is that it
increases the instruction throughput, which is defined as the number of

instructions completed per unit time. Thus, a program runs faster.
In pipelining, several computations can run in distinct segments at the same
time. A register is associated with each segment in the pipeline to provide
isolation between each segment. Thus, each segment can operate on distinct
data simultaneously. Pipelining is also called virtual parallelism as it provides
an essence of parallelism only at the instruction level. In pipelining, the CPU
executes each instruction in a series of following stages:
1. Instruction Fetching (IF)
2. Instruction Decoding (ID)
3. Instruction Execution (EX)
4. Memory access (MEM)
5. Register Write back (WB)
The CPU while executing a sequence of instructions can pipeline these
common steps. However, in a non-pipelined CPU, instructions are executed
in strict sequence following the steps mentioned above. In pipelined
processors, it is desirable to determine the outcome of a conditional branch as
early as possible in the execution sequence. To understand pipelining, let us
discuss how an instruction flows through the data path in a five-segment
pipeline.
Consider a pipeline with five processing units, where each unit is assumed to
take 1 cycle to finish its execution as described in the following steps:
a) Instruction fetch cycle (IF): In the first step, the instruction at the address
held in the program counter (PC) is fetched from memory into the
Instruction Register (IR), and the PC is incremented.
b) Instruction decode/register fetch cycle (ID): The fetched instruction is
decoded and the source registers are read into two temporary registers.
Decoding and reading of registers are done in parallel.
c) Instruction execution cycle (EX): In this cycle, the ALU operates on the
operands prepared in the previous cycle, performing an arithmetic or
logical operation, an effective address calculation, or a branch-target
computation.
d) Memory access completion cycle (MEM): In this cycle, the address of
the operand calculated during the prior cycle is used to access memory.
In case of load and store instructions, either data returns from memory and
is placed in the Load Memory Data (LMD) register or is written into
memory. In case of branch instruction, the PC is replaced with the branch
destination address in the ALU output register.
e) Register write back cycle (WB): During this stage, both single cycle and
two cycle instructions write their results into the register file.
These steps of five-segment pipelined processor are shown in figure 4.1.

Figure 4.1: A Five-Segment Pipelined Processor
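As a rough, illustrative calculation (not taken from the text): with five stages of one clock cycle each, an instruction still needs five cycles to complete, but once the pipeline is full one instruction finishes every cycle. Executing n instructions therefore takes about 5 + (n - 1) cycles, so 100 instructions need roughly 104 cycles instead of the 500 cycles a non-pipelined machine needing five cycles per instruction would take, which is close to a five-fold improvement in throughput.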

The segments are isolated by registers. The simplest way to visualise a
segment is to think of it as consisting of an input register and a
combinational circuit that processes the data stored in the register. See
table 4.1 for examples of the suboperations performed in each segment of
the pipeline.

Table 4.1: Suboperations Performed in Each Segment of Pipeline

Segment 1:  R1 ← An, R2 ← Bn, R3 ← Cn, R4 ← Dn     (input An, Bn, Cn, Dn)
Segment 2:  R5 ← An * Bn, R6 ← Cn * Dn             (multiply)
Segment 3:  R7 ← R5 + R6                           (add and store in register R7)

Now we will study an example of a pipeline in figure 4.2, which computes

    An*Bn + Cn*Dn,    n = 1, 2, 3, ...

Figure 4.2: Example of Pipeline Processing

In the figure, each segment has one or more registers with a combinational
circuit. Each register is loaded with new data at the start of a new clock period.
Refer to table 4.2 for an example of the contents of the registers in the pipeline.
On 1st clock pulse, data is loaded in registers R1, R2, R3, and R4.
On 2nd clock pulse, product is stored in registers R5 and R6.
On 3rd clock pulse, the data in R5, R6 are added and stored in R7.
Thus, only three clock periods in total are required to compute An*Bn + Cn*Dn.
Table 4.2: Contents of Registers in Pipeline Example

                  Segment 1           Segment 2          Segment 3
Clock Pulse    R1  R2  R3  R4       R5      R6           R7
    1          A1  B1  C1  D1       -       -            -
    2          A2  B2  C2  D2       A1*B1   C1*D1        -
    3          A3  B3  C3  D3       A2*B2   C2*D2        A1*B1 + C1*D1
    4          -   -   -   -        A3*B3   C3*D3        A2*B2 + C2*D2
    5          -   -   -   -        -       -            A3*B3 + C3*D3
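The clocked behaviour in table 4.2 can also be mimicked in software. The following C sketch is only an illustration added here (the input values are made up); it clocks the segment registers in lockstep exactly as the table does:

#include <stdio.h>

#define N 3   /* number of input sets, n = 1, 2, 3 */

int main(void)
{
    double A[N] = {1, 2, 3}, B[N] = {4, 5, 6}, C[N] = {7, 8, 9}, D[N] = {1, 1, 1};
    /* Segment registers: R1..R4 feed segment 2, R5 and R6 feed segment 3, R7 holds the result. */
    double R1 = 0, R2 = 0, R3 = 0, R4 = 0, R5 = 0, R6 = 0, R7 = 0;

    for (int clock = 1; clock <= N + 2; clock++) {
        /* All registers are loaded "simultaneously" on the clock pulse, so the new
           values are computed from the old contents before any register is updated. */
        double n7 = R5 + R6;          /* segment 3: add                      */
        double n5 = R1 * R2;          /* segment 2: multiply                 */
        double n6 = R3 * R4;
        R7 = n7;  R5 = n5;  R6 = n6;
        if (clock <= N) {             /* segment 1: load the next inputs     */
            R1 = A[clock - 1];  R2 = B[clock - 1];
            R3 = C[clock - 1];  R4 = D[clock - 1];
        } else {
            R1 = R2 = R3 = R4 = 0;
        }
        if (clock >= 3)               /* a result reaches R7 from pulse 3 on */
            printf("clock %d: R7 = %g\n", clock, R7);
    }
    return 0;
}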

An instruction pipeline operates on a stream of instructions by overlapping the
fetch, decode and execute phases of the instruction cycle. High-speed
computers usually contain pipelined arithmetic units. The execution of floating-
point operations, multiplication of fixed-point numbers and similar
computations encountered in scientific problems is done through these
pipelined arithmetic units.
A pipeline multiplier is essentially an array multiplier with special adders
designed to minimise the carry propagation time through the partial products.
Floating-point operations are easily decomposed into suboperations.
While previous instructions are being executed in other segments, an
instruction pipeline reads successive instructions from memory. As a result,
the fetch and execute phases overlap and perform simultaneous operations.
One possible complication associated with such a scheme is that an instruction
may cause a branch out of sequence. In that case, all the instructions that have
been read from memory after the branch instruction must be discarded and the
pipeline must be emptied.

The instruction fetch section can be implemented by means of a first-in first-out
(FIFO) buffer, which forms a queue rather than a stack.
The instruction pipeline design will be most efficient if the instruction cycle
is divided into segments of equal duration. The time taken by each step to
accomplish its task depends on the instruction and the manner in which it is
executed.
Self Assessment Questions
1. An implementation technique by which the execution of multiple
instructions can be overlapped is called _______ .
2. Pipelining is also called ______________ .
3. LMD is the short for _________________ .
4. The instruction fetch segment can be implemented by means of a
_________________.

4.3 Types of Pipelining


Pipelines are of two types - Linear and Non-linear.
a) Linear pipelines: These pipelines perform only one pre-defined fixed
function at specific times, with data flowing in a forward direction from one
stage to the next stage. A linear pipeline can be visualised as a collection of processing
segments, where each segment completes a part of an instruction. The
result obtained from the processing in each segment is transferred to the
next segment in the pipeline. As in these pipelines, repeated evaluations
of the same function are performed with different data for some specified
period of time, these pipelines are also called static pipelines.
b) Non-linear pipelines: These pipelines can perform more than one
operation at a time as they have the provision to be reconfigured to
execute variable functions at different times. As these pipelines can
execute different functions at different times, they are called dynamic
pipelines.
An example of a non-linear pipeline is a three-stage pipeline that performs
subtraction and multiplication on different data at the same time as
illustrated in figure 4.3.

In this three-stage pipeline, the input data must go through stages 1, 2 and 3
to perform multiplication and through stages 1 and 3 only to perform
subtraction. Therefore, dynamic pipelines require feed forward and feedback
connections in addition to the streamline connections between the stages.

Self Assessment Questions
5. ___________ pipelines perform only one pre-defined fixed function
at specific times in a forward direction from one stage to the next stage.
6. _______________ pipelines can perform more than one operation at
a time as they have the provision to be reconfigured to execute variable
functions at different times.
7. Non-Linear pipelines are also called ___________________ .

4.4 Pipelining Hazards


Hazards are the situations that stop the next instruction in the instruction
stream from being executed during its designated clock cycle. Hazards reduce
the performance from the ideal speedup gained by pipelining. In general, there
are three major categories of hazards that can affect normal operation of a
pipeline.
1. Structural hazards (also called resource conflicts): They occur from
resource conflicts when the hardware cannot support all possible
combinations of instructions in simultaneous overlapped execution. These
are caused by multiple accesses to memory performed by segments. In
most cases this problem can be resolved by using separate instruction and
data memories.
2. Data hazards (also called data dependency): They occur when an
instruction depends on the result of a previous instruction in a way that is
exposed by the overlapping of instructions in the pipeline. This arises when an
instruction requires the output of a previous instruction and that output is not
yet available. This is explained in detail in section 4.5.
3. Control hazards (also called branch difficulties): Branch difficulties arise
from branch and other instructions that change the content of PC (Program
Counter). This is explained in detail in the section 4.6.
Stalling can become essential due to the hazards present in the pipelines. The
processor can stall on different events:
1. A cache miss: A cache miss stalls all the instructions in the pipeline, both
before and after the instruction causing the miss.
2. A hazard in the pipeline: When a hazard is handled, some instructions in
the pipeline are allowed to proceed while others are delayed.
Once an instruction is stalled, all the instructions following this instruction
are stalled. Instructions in the line preceding the stalled instruction must
keep going, or else the hazard will never clear.

Self Assessment Questions
8. ______________ are the situations that stop the next instruction in
the instruction stream from being executed during its designated clock
cycle.
9. Structural Hazards are also called _______________ .
10. Data Hazards are also called ___________________ .
11. Control Hazards are also called _____________ .

4.5 Data Hazards


Pipelining changes the relative timing of instructions by executing them at the
same time. This leads to data and control hazards. In pipelining, data hazards
arise when the pipeline changes the order of read/write accesses to operands
so that it differs from the order seen by sequentially executing instructions on
an unpipelined machine. In simple terms, a data hazard occurs when an
attempt is made to use data before it is ready. The pipelined execution of such
instructions is given below:
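(The original listing is not reproduced here; the following sequence, with illustrative register numbers, is consistent with the discussion that follows.)

    ADD R1, R2, R3
    SUB R4, R1, R5
    AND R6, R1, R7
    OR  R8, R1, R9
    XOR R10, R1, R11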
The instructions following the ADD make use of the end result of the ADD
instruction (in R1). The ADD instruction writes the value of R1 in the write back
(WB) pipe stage, but the value is read by the SUB instruction during the
instruction decode (ID) stage (IDsub). This problem is referred to as a data
hazard because a wrong value is read by the SUB instruction and an attempt
is made to use it.
If an interrupt occurs between the ADD and SUB instructions, the WB stage of
the ADD will complete, and the value of R1 at that point will be the result of
the ADD.
As we can see in figures 4.4 and 4.5, the AND instruction is also affected by
the data hazard: the write of R1 does not finish until the end of clock cycle 5.
Therefore, the AND instruction, which reads R1 in cycle 4, will not retrieve the
correct result.

Figure 4.4: Pipelined Execution of the Instruction


The SUB instruction reads the wrong value as it reads the data (cycle 3) before
the ADD instruction writes the value (cycle 5). The register read of XOR
instruction occurs in clock cycle 6. This is performed correctly as it is done
after the register write by ADD. The OR instruction can function without any
problem. To attain this, the register file reads are performed in the second
half of the cycle and the writes in the first half. In cycle 5, the first half of the
cycle performs the write to the register file by ADD and the second half of the
cycle performs the read of the registers by OR.


Figure 4.5: Clock Cycles and Execution Order of Instructions

Self Assessment Questions


12. Pipelining has a major effect on changing the relative timing of instructions
by overlapping their execution. (True/False)
13. The register read of the XOR instruction occurs in clock cycle _________.

4.6 Control Hazards


Control hazards can cause a greater performance loss for a pipeline than
data hazards. On execution of a branch, the PC may or may not be changed
to something other than its current value plus 4. If the PC is changed by the
branch to its target address, then it is known as a taken branch; else it is known
as not taken or untaken. Control hazards are also known as branching hazards
and occur with branches. In this case, the processor will not know the outcome
of the branch when it needs to insert a new instruction into the pipeline (the
fetch stage).
The simplest method of dealing with branches is to stall the pipeline as soon
as the branch is detected. Until the instruction is confirmed to be a branch,
there is no need to stall the pipeline; thus, the stall does not occur until after
the ID stage. The pipelining behaviour then looks as in figure 4.6.

Branch instruction        IF  ID  EX  MEM  WB
Branch successor              IF  stall  stall  IF  ID  EX  MEM  WB
Branch successor + 1                            IF  ID  EX  MEM  WB
Branch successor + 2                                IF  ID  EX  MEM
Branch successor + 3                                    IF  ID  EX
Branch successor + 4                                        IF  ID
Branch successor + 5                                            IF

Figure 4.6: Three-Cycle Stall in the Pipeline

The control hazard stall is not implemented in the same way as the data
hazard stall, since the instruction fetch (IF) cycle is to be repeated as soon as
the branch target is known. Thus, the first IF cycle is essentially a stall, as it
never performs useful work. By setting the IF/ID register to zero, we can
implement the stall for the three cycles. The repetition of the IF stage is not
required if the branch is untaken, since the correct instruction may already
have been fetched.
Self Assessment Questions
14. _________ cause a greater performance loss for a pipeline than
_________.
15. If the PC is changed by the branch to its target address, then it is known
as __________________ branch; else it is known as __________ .

4.7 Techniques to Handle Hazards


In this section, we will discuss the techniques to handle data and control
hazards. Now, let us start with the concept of forwarding technique to handle
data hazard.

4.7.1 Minimising data hazard stalls by forwarding


The problem posed by data hazards can be solved with a simple hardware
technique called forwarding (also called bypassing and sometimes short-
circuiting). The key insight in forwarding is that the result is not really needed
by the SUB instruction until after the ADD actually produces it. The only
problem is to make it available for SUB when it needs it. If the result can be
moved from where the ADD produces it (the execute/memory access (EX/MEM)
register) to where it is required by the SUB (the ALU input latch), then the need
for a stall can be avoided.
Using this insight, the mechanism of forwarding works as follows:
1. The ALU result from the EX/MEM register is always fed back to the ALU
input latches.
2. If the forwarding hardware detects that the register corresponding to a
source for the current ALU operation was written by the previous ALU
operation, the control logic selects the forwarded result as the ALU input
rather than the value read from the register file.
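As a rough software model of this selection (a sketch only; the structure and field names are invented for illustration and do not come from this text):

#include <stdbool.h>
#include <stdint.h>

/* Illustrative pipeline-register fields. */
typedef struct { uint8_t dest; int32_t alu_result; bool writes_reg; } ExMemReg;
typedef struct { uint8_t src;  int32_t reg_value; } IdExReg;

/* Choose one ALU input: bypass the EX/MEM result if the previous instruction
   writes the register that the current instruction reads, otherwise use the
   value read from the register file. */
static int32_t alu_input(const ExMemReg *exmem, const IdExReg *idex)
{
    if (exmem->writes_reg && exmem->dest != 0 && exmem->dest == idex->src)
        return exmem->alu_result;     /* forwarded (bypassed) value   */
    return idex->reg_value;           /* value from the register file */
}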
If the SUB instruction is stalled, the ADD instruction will be completed and the
bypass will not be activated. This is also true for the case of an interrupt
between the two instructions.
Figure 4.5 shows that the results of not only the immediate previous instruction
are forwarded but also from an instruction initiated two cycles earlier. The
bypass paths and the highlights of the timing of the register reads and writes
are shown in figure 4.7. We can execute this code sequence without stalls.


Figure 4.7: Example with the Bypass Paths in Place

We can generalise forwarding to include passing a result directly to the
functional unit that needs it: a result is forwarded from the output of one unit
to the input of another, rather than just from the result of a unit to the input of
the same unit. For example, let's consider the following sequence:
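(The original listing is not reproduced here; a sequence consistent with the forwardings described below, using illustrative offsets, is:)

    ADD R1, R2, R3
    LW  R4, 0(R1)
    SW  12(R1), R4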
By forwarding the result of R1 and R4 from the pipeline registers to the ALU
and data memory inputs, we can prevent a stall.
The store requires an operand during MEM, and the forwarding of that operand
is shown in figure 4.8(a).


Figure 4.8(a): Forwarding Example

• The first forwarding is for value R1 from EXadd to EXlw.
• The second forwarding is also for value R1 from MEMadd to EXsw.
• The third forwarding is for value R4 from MEMlw to MEMsw.
Figure 4.8 (b) shows all the forwarding paths for this example.
In DLX, a forwarding path may be required from any pipeline register to the
input of any functional unit. Forwarding paths are required from both the
ALU/MEM and MEM/WB registers to their inputs, as operands are accepted
by both the ALU and data memory. Additionally, a zero detection unit is used
by DLX (RISC processor architecture). This unit operates during the EX cycle,
and requires forwarding as well. Later in this section, we will explore all the
necessary forwarding paths and the control of those paths.

Figure 4.8(b): Forwarding Paths of the Above Example


The result of the load is forwarded from the memory output in the memory
access/write back (MEM/WB) register to the memory input. Also, the ALU
output is forwarded to the ALU input for the address calculation of both the
load and the store. This is similar to forwarding another ALU operation. If the
store is dependent on an immediately preceding ALU operation, the result
would need to be forwarded to prevent a stall.
4.7.2 Reducing pipeline branch penalties
In this section, we discuss four simple compile-time schemes for dealing with
the pipeline stalls that are caused by branch delay. In these four schemes the
actions for a branch are static - they are fixed for every branch throughout the
entire execution.
The branch penalty can be minimised by the software using knowledge of the
hardware scheme and branch behaviour. The branch optimisations rely on
compile-time branch prediction technology. Hence, we will discuss this
technology after these schemes. The schemes for handling branches are given
below.
1. Freeze or flush the pipeline: This is the simplest scheme to handle
branches. In this scheme, any instruction after the branch is held or deleted
until the branch destination is known. This solution is attractive because of
its simplicity for both hardware and software; it is the solution shown in the
pipeline in figure 4.6. Here, the branch penalty is fixed and cannot be
reduced by software.
2. Assume each branch as not-taken: In this scheme, every branch is
treated as not taken; it simply allows the hardware to carry on as if the
branch were not executed. Here, you should be careful that no change
should take place in the machine state until the branch outcome is definitely
known. A complication may arise from the need to know when the state
might be changed by an instruction and how to "back out" a change. This
complexity persuades us to prefer the simpler solution of flushing the
pipeline in machines with complex pipeline structures.
3. Predict-not-taken or predict-untaken scheme: This scheme focuses on
carrying on the fetching of instructions as if the branch was a standard
instruction. The pipeline seems as if nothing usual is occurring. If the
branch is taken, however, the fetched instruction needs to be turned into
a no-op (simply by clearing the IF/ID register) and the fetch at the target
address needs to be restarted. This is shown in figure 4.9.

Untaken branch instruction   IF  ID  EX  MEM  WB
Instruction i + 1                IF  ID  EX  MEM  WB
Instruction i + 2                    IF  ID  EX  MEM  WB
Instruction i + 3                        IF  ID  EX  MEM  WB
Instruction i + 4                            IF  ID  EX  MEM  WB

Taken branch instruction     IF  ID  EX  MEM  WB
Instruction i + 1                IF  idle  idle  idle  idle
Branch target                        IF  ID  EX  MEM  WB
Branch target + 1                        IF  ID  EX  MEM  WB
Branch target + 2                            IF  ID  EX  MEM  WB

Figure 4.9: Predict-Not-Taken Scheme

An alternative scheme is to treat every branch as taken. As soon as the branch
is decoded and the target address is computed, we assume the branch to be
taken and begin fetching and executing at the target. In the DLX pipeline, since
the target address is not known before the branch outcome is identified, this
approach offers no advantage. In some machines, where the target address
is known before the branch outcome, a predict-taken scheme might make
sense. In both the predict-taken and predict-not-taken schemes, the compiler can
improve performance by organising the code so that the most common path
matches the hardware‘s selection. Additional opportunities for the compiler to
improve performance are provided by our fourth scheme.
The fourth scheme, used in some machines, is known as delayed branch.
Many microprogrammed control units use this technique. In a delayed branch,
the execution cycle with a branch delay of length n is
Branch instruction
Sequential successor 1
Sequential successor 2
...
Sequential successor n
Branch target if taken


The sequential successors occupy the branch-delay slots. These instructions
are executed whether or not the branch is taken. Figure 4.10 shows the
pipeline behaviour with one branch-delay slot.

Figure 4.10: Behaviour of a Delayed Branch

In practice, there is only a single-instruction delay in all machines with delayed
branch, and we concentrate on that case.
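As an illustration (the instructions and label are made up, not taken from the text), a compiler can fill the delay slot with an instruction from before the branch that the branch does not depend on:

    Before scheduling            After scheduling
    ADD  R1, R2, R3              BEQZ R4, L
    BEQZ R4, L                   ADD  R1, R2, R3    ; now sits in the delay slot

Since the ADD neither computes R4 nor is needed to decide the branch, it can safely execute whether or not the branch is taken, so the delay slot always does useful work.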
Self Assessment Questions
16. The problem posed due to data hazards can be solved with a simple
hardware technique called __________________ .
17. Forwarding is also called _________ or _________________ .
18. ____________ is the method of holding or deleting any instructions
after the branch until the branch destination is known.
19. ________________ technique simply allows the hardware to
continue as if the branch were not executed.

4.8 Performance Improvement with Pipeline


Performance is a function of CPI (cycles per instruction), clock cycle time and
instruction count. Reducing any of the three factors will lead to improved
performance.
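In equation form, this is the standard relation

    CPU time = Instruction count x CPI x Clock cycle time

so reducing any one factor, with the others held constant, reduces the CPU time.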
Firstly, it is necessary to relate the concept of pipelining to the instruction
execution process, i.e., to overlap computations of different tasks by operating
on them simultaneously in different stages. This will decrease the clock cycle
and the effective time taken by the CPU in comparison to the original clock
cycle. The instruction execution process lends itself naturally to pipelining
through the overlap of the subtasks of instruction fetch, decode and execute.

Figure 4.11: Pipeline Clock and Timing

In figure 4.11 given above:
Clock cycle of the pipeline: t
Latch delay: d
    t = max {tm} + d,  where tm is the delay of stage m
Pipeline frequency: f
    f = 1 / t
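As an illustrative calculation (numbers made up for the example): if the slowest stage needs 9 ns and the latch delay is 1 ns, then t = 9 + 1 = 10 ns and f = 1/t = 100 MHz.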
Performance counters: Performance counters are components of real
processors that track a variety of events carried out by a processor to
facilitate the understanding of its performance. Here, we will study four
memory-mapped performance counters:
• Cycle count - 0xFF00: This is the number of cycles since the processor was
last reset.
• Instruction count - 0xFF01: This is the number of actual instructions
executed since the processor was last reset.
• Load-stall count - 0xFF02: It states the number of cycles lost to load-use
stalls, i.e., the number of cycles in which no instruction is executed because
of a load-use stall.
• Branch-stall count - 0xFF03: This counts the cycles lost to branch
mispredictions and/or stalls, i.e., the number of cycles in which no
instruction is executed because of a branch misprediction.
• In the single-issue pipeline, in every cycle one (and only one) of the
instruction count, load-stall, or branch-stall counters is incremented. As
such, the cycle count should be equal to the sum of these three registers.
• In the dual-issue processor, only one of the instruction count, load-stall, or
branch-stall counters is incremented per cycle, but the instruction count
register may sometimes be incremented by two (for cycles in which two
instructions execute). As such, the sum of these three registers will be
greater than or equal to the cycle count.
The performance counters should be counted by the processor during the
write-back stage of the pipeline; to be precise, such a cycle is neither a
branch-stall nor a load-stall cycle. The current value of these counters can be
determined by using an LD or LDR instruction to access them. The LD
instruction takes a source label and stores its address into the destination
register. The LDR instruction adds an immediate offset to the source register's
value and stores the result into the destination register.
To avoid complexity, stores to these locations do not change the value of the
counters, although the contents of memory may still be updated by the stores.
This makes little difference because, whenever these locations are read, the
value in the counter is used rather than the value in memory. Basically, these
counters can be reset to zero only when the entire system is reset.
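As a rough illustration of how software might sample these counters (a sketch only; it assumes a C environment in which the memory-mapped addresses above can be read through volatile pointers, which is an assumption and not something this text specifies):

#include <stdint.h>
#include <stdio.h>

/* Memory-mapped performance counters at the addresses listed above (assumed readable from C). */
#define CYCLE_COUNT   ((volatile uint32_t *)0xFF00)
#define INSTR_COUNT   ((volatile uint32_t *)0xFF01)
#define LOAD_STALLS   ((volatile uint32_t *)0xFF02)
#define BRANCH_STALLS ((volatile uint32_t *)0xFF03)

int main(void)
{
    uint32_t cycles  = *CYCLE_COUNT;
    uint32_t instrs  = *INSTR_COUNT;
    uint32_t lstalls = *LOAD_STALLS;
    uint32_t bstalls = *BRANCH_STALLS;

    /* On the single-issue pipeline, cycles should equal instrs + lstalls + bstalls. */
    printf("CPI = %.2f, load-stall cycles = %u, branch-stall cycles = %u\n",
           instrs ? (double)cycles / (double)instrs : 0.0, lstalls, bstalls);
    return 0;
}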
Self Assessment Questions
20. ____________ states the number of cycles lost to load-use stalls.
21. ____________ instruction takes a source label and stores its address
into the destination register.
22. ____________ stores the source register's value plus an immediate
value offset and stores it in the destination register.

4.9 Effect of Hazards on the Performance


Hazards are of various types, and they reduce the speedup gained from
pipelining. A stall causes the pipeline performance to degrade from the ideal
performance.

Speedup from pipelining = Average instruction time unpipelined / Average instruction time pipelined
                        = (CPI unpipelined x Clock cycle time unpipelined) / (CPI pipelined x Clock cycle time pipelined)

CPI is cycles per instruction, which determines the cycle count for each
instruction. The ideal CPI on a pipelined machine is almost always 1.
Therefore, the pipelined CPI is:
CPIpipelined = Ideal CPI + Pipeline stall clock cycles per instruction

= 1 + Pipeline stall clock cycles per instruction

If the cycle time overhead of pipelining is ignored and the stages are all
assumed to be perfectly balanced, then the two machines have an equal cycle
time and:
Speedup = CPI unpipelined / (1 + Pipeline stall cycles per instruction)
If all instructions take the same number of cycles, which must also equal the
number of pipeline stages (the depth of the pipeline) then unpipelined CPI is
equal to the depth of the pipeline, leading to

Speedup = Pipeline depth / (1 + Pipeline stall cycles per instruction)
If there are no pipeline stalls, this leads to the intuitive result that pipelining
can improve performance by the depth of pipeline.
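For example (an illustrative calculation): with a pipeline depth of 5 and an average of 0.25 stall cycles per instruction, the speedup is 5 / (1 + 0.25) = 4 rather than the ideal factor of 5.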
Self Assessment Questions
23. A __________ hazard causes the pipeline performance to degrade from
the ideal performance.
24. CPI is the abbreviation for ___________ .

Activity 1:
Pick any two hazards from the organisation you previously visited. Now
implement the handling techniques to these hazards.

4.10 Summary
Let us recapitulate the important concepts discussed in this unit:
• A parallel processing system is able to perform concurrent data

processing to achieve faster execution time.


• An implementation technique by which the execution of multiple
instructions can be overlapped is called pipelining. In pipelining, the CPU
executes each instruction in a series of following small common steps:
• Instruction Fetching (IF)
• Instruction Decoding (ID)
• Instruction Execution(EX)
• Memory Access (MEM)
• Write back (WB)
• The segments are isolated by registers. The simplest way to visualise a
segment is to think of it as consisting of an input register and a
combinational circuit that processes the data stored in the register.
• Linear pipelines perform only one pre-defined fixed function at specific
times in a forward direction from one stage to the next stage. Non-Linear
pipelines can perform more than one operation at a time as they have the
provision to be reconfigured to execute variable functions at different
times.
• Hazards are the situations that stop the next instruction in the instruction
stream from being executed during its designated clock cycle. Structural
Hazards occur from resource conflicts when the hardware cannot support
all possible combinations of instructions in simultaneous overlapped
execution.
• Data Hazards occur when an instruction depends on the result of a
previous instruction in a way that is exposed by the overlapping of
instructions in the pipeline.
• Pipelining has a major effect on changing the relative timing of instructions
by overlapping their execution. This leads to data and control hazards.
Control hazards cause a greater performance loss for a pipeline than
data hazards do.
• The problem posed due to data hazards can be solved with a simple
hardware technique called forwarding.
4.11 Glossary
• CPI: Cycles per Instruction
• EX: Instruction Execution
• FIFO: First-in first-out
• Freeze or Flush the pipeline: Holding or deleting any instructions after

the branch until the branch destination is known.


• Forwarding: A simple hardware technique that can solve the problem
posed due to data hazards.
• Hazards: Situations that stop the next instruction in the instruction stream
from being executed during its designated clock cycle.
• ID: Instruction Decoding
• IF: Instruction Fetching
• Pipelining: An implementation technique by which the execution of
multiple instructions can be overlapped
• Pipeline multiplier: An array multiplier with special adders designed to
minimise the carry propagation time through the partial products.
• WB: Write back

4.12 Terminal Questions


1. What do you understand by Parallel Processing? What are the different
types of Parallel Processing?
2. Describe Pipelining Processing. Explain the sequence of instructions in
Pipelining.
3. Explain briefly the types of Pipelining.
4. What do you mean by Hazards? Explain the types of Hazards.
5. Explain in detail the techniques to handle Hazards.

4.13 Answers
Self Assessment Questions
1. Pipelining
2. Virtual parallelism
3. Load Memory Data
4. First-in first-out (FIFO) buffer
5. Linear
6. Non-Linear
7. Dynamic pipelines
8. Hazards
9. Resource conflicts
10. Data dependency
11. Branch difficulties

12. True
13. 6
14. Control Hazards, data hazards
15. Taken, not taken or untaken
16. Forwarding
17. Bypassing or short-circuiting
18. Freeze or flush the pipeline
19. Assume each branch as not-taken
20. Load-stall count
21. LD
22. LDR
23. Stall
24. Cycles per Instruction

Terminal Questions
1. The concurrent use of two or more CPUs or processors to execute a
program is called parallel processing. For details, refer Section 4.1.
2. An implementation technique by which the execution of multiple
instructions can be overlapped is called pipelining. Refer Section 4.2 for
more details.
3. There are two types of pipelining-Linear and non-linear. Refer Section 4.3
for more details.
4. Hazards are the situations that stop the next instruction in the instruction
stream from being executed during its designated clock cycle. Refer
Section 4.4.
5. There are two techniques to handle hazards namely minimising data
hazard stalls by forwarding and reducing pipeline branch penalties. Refer
Section 4.7.
References:
• David Salomon, Computer Organisation, 2008, NCC Blackwell
• John L. Hennessy and David A. Patterson, Computer Architecture: A
Quantitative Approach, Fourth Edition, Morgan Kaufmann Publishers
• Joseph D. Dumas II; Computer Architecture; CRC Press
• Nicholas P. Carter; Schaum’s outline of computer Architecture; McGraw-
Hill Professional
