Computer Architecture AllClasses-Outline-1-99
Structure:
1.1 Introduction
Objectives
1.2 Computational Model
The basic items of computations
The problem description model
The execution model
1.3 Evolution of Computer Architecture
1.4 Process and Thread
Concept of process
Concept of thread
1.5 Concepts of Concurrent and Parallel Execution
1.6 Classification of Parallel Processing
Single instruction single data (SISD)
Single instruction multiple data (SIMD)
Multiple instruction single data (MISD)
Multiple instruction multiple data (MIMD)
1.7 Parallelism and Types of Parallelism
1.8 Levels of Parallelism
1.9 Summary
1.10 Glossary
1.11 Terminal Questions
1.12 Answers
1.1 Introduction
As you know, computers vary greatly in terms of physical size, speed of
operation, storage capacity, application, cost, ease of maintenance and
various other parameters. The hardware of a computer consists of physical
parts that are connected in some way so that the overall structure achieves
the pre-assigned functions. Each hardware unit can be viewed at different
levels of abstraction, and this simplification can be carried to still deeper
levels. Many technologies exist for manufacturing microchips, and at lower
levels of abstraction a computer is built from digital components such as:
• De-multiplexers
• Encoders
• Decoders
• I/O Controllers
A common foundation or paradigm that links the computer architecture and
language groups is called a Computational Model. The concept or idea of
computational model expresses a higher level of abstraction than can be
achieved by either the computer architecture or the programming language
alone, and includes both.
The computational model consists of the subsequent three abstractions:
1. The basic items of computations
2. The problem description model
3. The execution model
Contrary to common belief, the set of abstractions that should be chosen to
specify computational models is not at all obvious. A small set of criteria will
define fewer but relatively basic computational models, while a wider set of
criteria will result in a fairly large number of different models.
1.2.1 The basic items of computations
This concept identifies the basic items of computation. It specifies the items
to which computation refers and the kind of computations (operations) that
can be performed on them. For example, in the von Neumann computational
model, the basic items of computation are data.
These data items will normally be identified by distinct entities so that several
different data items can be distinguished in the course of a computation.
These identifiable entities are commonly called variables in programming
languages and are implemented by memory or register addresses in
architectures.
The well-known computational models, such as the Turing model, the von
Neumann model and the dataflow model, are all based on data. These
models are briefly explained below:
The Turing machine operates by manipulating symbols on a tape. In other
words, there is a tape with an unbounded number of cells, and at any one
point in time the Turing machine is positioned at a specific cell. Based on the
symbol read at that cell, the machine can change the symbol and shift to a
different cell. All of this is deterministic.
The von Neumann architecture describes the stored-program computer,
where data and instructions are stored in memory and the machine works by
changing its internal state: an instruction operates on some data and updates
that data. So, naturally, a state is maintained in the system.
Dataflow architecture stands in clear contrast to the conventional von
Neumann, or control flow, architecture. Dataflow architectures have no
program counter; the execution of instructions in a dataflow system is
determined solely by the availability of the input arguments to those
instructions. Even though dataflow architecture has not been used in any
commercially successful computer hardware, it is highly relevant to many
software architectures, such as database engine designs and parallel
computing frameworks.
On the other hand, there are various models independent of data. In these
models, the basic items of computation are:
• Messages or objects sent to them needing an associated manipulation (as
in the object-based model)
• Arguments and the functions applied on them (applicative model)
• Elements of sets and the predicates declared on them (predicate-logic-
based model).
1.2.2 The problem description model
The problem description model refers to both the style and the method of
problem description. The problem description style specifies how problems
in a particular computational model are expressed. The style is either
procedural or declarative. In a procedural style, the algorithm for solving the
problem is given; a particular solution is then stated in the form of an
algorithm. In a declarative style, all the facts and relationships relevant to the
given problem have to be stated.
There are two ways of conveying these facts and relationships. The first
employs functions, as in the applicative model of computation, while the
second declares the facts and relationships in the form of predicates, as in
the predicate-logic-based computational model. Now, we will study the
second component of the problem description model, that is, the problem
description method. It is interpreted differently for the procedural and the
declarative styles. In the procedural style, the problem description model
specifies how the solution of the given problem has to be described. In the
declarative style, by contrast, it specifies how the problem itself has to be
described.
1.2.3 The execution model
This is the third and the final constituent of computational model. It can be
divided into three stages.
• Interpretation of how to perform the computation
• Execution semantics
• Control of the execution sequences
The first stage describes how the computation is to be interpreted, which is
strongly linked to the problem description method. The choice of problem
description method and the interpretation of the computation are mutually
dependent on one another.
The second stage of the execution model specifies the execution semantics.
This is the rule that states how a single execution step is to be performed.
This rule is, of course, linked with the chosen problem description method
and with the way the execution of the computation is interpreted. The final
stage of the model specifies the rule governing the execution sequences. In
the basic models, execution is either control-driven, data-driven or
demand-driven.
• In control-driven execution, it is assumed that there is a program
consisting of a sequence of instructions. The execution sequence is then
implicitly specified by the order of the instructions.
Nevertheless, explicit control instructions can also be used to specify a
departure from the implied execution sequence.
• Data-driven execution is characterised by the rule that an operation is
activated as soon as all the needed input data is available. Data-driven
execution control is characteristic of the dataflow model of computation.
• In demand-driven execution, operations are activated only when their
execution is needed to obtain the final result. Demand-driven execution
control is typically used in the applicative computational model.
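To make the contrast concrete, here is a small hedged sketch in C (the Node structure, ready() and fire() are invented for this illustration and are not taken from any particular dataflow machine). It evaluates (a + b) * (c - d) under the data-driven rule: a node fires as soon as its operands are present, with no program counter dictating the order.

/* A minimal sketch of data-driven (dataflow-style) execution. */
#include <stdio.h>

typedef struct {
    const char *op;   /* operation name                 */
    int in[2];        /* input operand slots            */
    int have;         /* how many operands have arrived */
    int result;       /* produced value                 */
    int done;
} Node;

static int ready(const Node *n) { return n->have == 2 && !n->done; }

static void fire(Node *n) {              /* execute one enabled node */
    if (n->op[0] == '+') n->result = n->in[0] + n->in[1];
    if (n->op[0] == '-') n->result = n->in[0] - n->in[1];
    if (n->op[0] == '*') n->result = n->in[0] * n->in[1];
    n->done = 1;
    printf("fired %s -> %d\n", n->op, n->result);
}

int main(void) {
    Node add = {"+", {2, 3}, 2, 0, 0};   /* a=2, b=3 already available */
    Node sub = {"-", {7, 4}, 2, 0, 0};   /* c=7, d=4 already available */
    Node mul = {"*", {0, 0}, 0, 0, 0};   /* waits for the two results  */

    /* Data-driven rule: any node whose inputs are all present may fire,
     * in any order; no program counter dictates the sequence.          */
    if (ready(&add)) fire(&add);
    if (ready(&sub)) fire(&sub);
    mul.in[0] = add.result; mul.in[1] = sub.result; mul.have = 2;
    if (ready(&mul)) fire(&mul);
    return 0;
}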
Self Assessment Questions
1. The _________ model refers to both the style and method of problem
description.
The IAS machine was a new version of the EDVAC, built by von Neumann.
The basic design of the IAS machine is now known as the von Neumann
machine, which had five basic parts: the memory, the arithmetic logic unit,
the program control unit, and the input and output units, as shown in figure 1.2.
Activity 1:
Using the Internet, find out about the Fifth Generation Computer Systems
(FGCS) project: the idea behind it, its implementation, timeline and outcome.
Figure 1.8: Parallel Computing Systems
In SISD, only a single instruction stream is acted on by the CPU during any
one clock cycle. This is the oldest and, even today, the most widespread type
of computer. Examples: most PCs, single-CPU workstations and mainframes.
Figure 1.9 shows an example of SISD.
Instructions can be re-ordered and combined into groups which are then
acted upon in parallel without altering the outcome of the program. This is
known as instruction-level parallelism. Advances in instruction-level
parallelism dominated computer architecture from the mid-1980s until the
mid-1990s.
Data parallelism: Data parallelism is the parallelism inherent in program
loops. It centres on distributing the data across various computing nodes to
be processed in parallel. Parallelising loops often leads to similar (not
necessarily identical) operation sequences or functions being performed on
elements of a large data structure. Many scientific and engineering
applications exhibit data parallelism.
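As a simple illustration (not taken from the text, and the array names are invented), the loop below applies the same operation to every element of a large array. The iterations are independent, so they could be distributed across computing nodes or threads; the optional OpenMP pragma is one way to express this.

/* A minimal sketch of data parallelism: the same operation is applied to
 * independent elements of a large array, so the iterations can run in
 * parallel. Compiling with -fopenmp enables the optional pragma below.  */
#include <stdio.h>
#define N 1000000

int main(void) {
    static double a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    #pragma omp parallel for   /* each iteration is independent of the others */
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[123456] = %f\n", c[123456]);
    return 0;
}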
Self Assessment Questions
15. Parallel computers offer the potential to concentrate computational
resources on important computational problems. (True/ False)
16. Advances in instruction-level parallelism dominated computer architecture
from the mid-1990s until the mid-2000s. (True/False)
Parallelism can be exploited at different sizes of granularity. In this respect,
we can identify the following four levels and corresponding granularity sizes:
• Parallelism at the instruction level (fine-grained parallelism): Available
instruction-level parallelism means that particular instructions of a program
may be executed in parallel. To this end, instructions can be either
assembly (machine-level) or high-level language instructions. Usually,
instruction-level parallelism is understood at the machine-language
(assembly-language) level.
• Parallelism at the loop level (middle-grained parallelism): Parallelism
may also be available at the loop level. Here, consecutive loop iterations
are candidates for parallel execution. However, data dependencies
between subsequent loop iterations, called recurrences, may restrict their
parallel execution.
• Parallelism at the procedure level (middle-grained parallelism): Next,
there is parallelism available at the procedure level in the form of parallel
executable procedures. The extent of parallelism exposed at this level
depends mainly on the kind of problem solution considered.
• Parallelism at the program level (coarse-grained parallelism): Lastly,
different programs (users) are obviously independent of each other. Thus,
parallelism is also available at the user level (which we consider to be
coarse-grained parallelism). Multiple, independent users are a key source
of parallelism occurring in computing scenarios.
Utilisation of functional parallelism: Available parallelism can be utilised by
architectures, compilers and operating systems conjointly for speeding up
computation. Let us first discuss the utilisation of functional parallelism.
In general, functional parallelism can be utilised at four different levels of
granularity, that is,
• Instruction
• Thread
• Process
• User level
It is quite natural to utilise available functional parallelism, which is inherent in
a conventional sequential program, at the instruction level by executing
instructions in parallel. This can be achieved by means of architectures
capable of parallel instruction execution. Such architectures are referred to as
instruction-level function-parallel architectures or simply instruction-level
parallel architectures, commonly abbreviated as ILP-architectures.
Activity 2:
Decide which architecture is most appropriate for a given application. First
determine the form of parallelisation which would best suit the application,
then decide on both the hardware and software for running your parallelised
application.
1.9 Summary
Let us recapitulate the important concepts discussed in this unit:
• Computer Architecture deals with the issue of selection of hardware
components and interconnecting them to create computers that achieve
specified functional, performance and cost goals.
• The concept of a computational model represents a higher level of
abstraction than can be achieved by either the computer architecture or
the programming language alone, and covers both.
• History of computers begins with the invention of the abacus in 3000 BC,
followed by the invention of mechanical calculators in 1617. Fifth
generation computers are still under research and development.
• Each process provides the resources needed to execute a program.
• A thread is the entity within a process that can be scheduled for execution.
• Concurrent execution is the temporal behaviour of the N-client 1-server
model where one client is served at any given moment.
1.10 Glossary
• EDSAC: Electronic Delay Storage Automatic Calculator
• EDVAC: Electronic Discrete Variable Automatic Computer
• ENIAC: Electronic Numerical Integrator and Computer
• IC: Integrated Circuit where hundreds of transistors could be put on a
single small circuit.
• LSI: Large Scale Integration, it can pack thousands of transistors
• MSI: Medium Scale Integration, it packs as many as 100 transistors
• PCB: Process Control Block, it is a description table which contains all the
information relevant to the whole life cycle of a process.
• SSI: Small Scale Integration, it can pack 10 to 20 transistors in a single
chip.
• UNIVAC I: Universal Automatic Computer
• ULSI: Ultra Large-Scale Integration, it contains millions of components on
a single IC
• VLSI: Very Large Scale Integration, it can pack hundreds of thousands to
millions of transistors
1.11 Terminal Questions
1. Explain the concept of Computational Model. Describe its various types.
2. What are the different stages of evolution of Computer Architecture?
Explain in detail.
3. What is the difference between process and thread?
4. Explain the concepts of concurrent and parallel execution.
5. State Flynn’s classification of Parallel Processing.
6. Explain the types of parallelism.
7. What are the various levels of parallelism?
1.12 Answers
Self Assessment Questions
1. Problem description
2. Procedural style
3. Data-driven
4. Pascaline
5. IAS machine
6. False
7. True
8. True
9. False
10. N-client 1-server
11. True
12. Single Instruction Multiple Data
13. Multiple Instruction Single Data
14. Multiple Instruction Multiple Data
15. True
16. False
17. Utilised parallelism
18. False
19. True
Terminal Questions
1. A common foundation or paradigm that links the computer architecture
and language classes is called a Computational Model. Refer Section 1.2.
2. History of computers begins with the invention of the abacus in 3000 BC,
followed by the invention of mechanical calculators in 1617. The years
beyond 1642 till 1980 are marked by inventions of zeroth, first, second and
third generation computers. Refer Section 1.3.
3. A thread is the entity within a process that can be scheduled for execution.
Refer Section 1.4.
4. Concurrent execution is the temporal behaviour of the N-client 1-server
model where one client is served at any given moment. Parallel execution
is associated with N-client N-server model. Refer Section 1.5.
5. Flynn classifies the computer system into four categories. Refer Section
1.6.
6. There are three types of parallelism. Refer section 1.7.
7. The notion of parallelism is used in two different contexts. Either it
designates available parallelism in programs or it refers to parallelism
utilised during execution. Refer Section 1.8.
References:
• Hwang, K. (1993). Advanced Computer Architecture. McGraw-Hill.
• D. A. Godse & A. P. Godse (2010). Computer Organization. Technical
Publications. pp. 3-9.
• John L. Hennessy, David A. Patterson, David Goldberg (2002)
"Computer Architecture: A Quantitative Approach", Morgan Kaufmann; 3rd
edition.
• Dezso Sima, Terry J. Fountain, Peter Kacsuk (1997). Advanced Computer
Architectures: A Design Space Approach. Addison-Wesley Longman.
E-references:
• www.cs.clemson.edu/~mark/hist.html
• www.people.bu.edu/bkia/
• www.ac.upc.edu/
• www.inf.ed.ac.uk/teaching/courses/car/
Structure:
2.1 Introduction
Objectives
2.2 Changing Face of Computing
Desktop computing
Servers
Embedded computers
2.3 Computer Designer
2.4 Technology Trends
2.5 Quantitative Principles in Computer Design
Advantages of parallelism
Principle of locality
Focus on the common case
2.6 Power Consumption
2.7 Summary
2.8 Glossary
2.9 Terminal Questions
2.10 Answers
2.1 Introduction
In the previous unit, you studied about the computational model and the
evolution of computer architecture. You also studied the concepts of process
and thread. We also covered two types of execution, concurrent and parallel,
as well as the types and levels of parallelism. In this unit, we will throw light
on the changing face of computing, the task of the computer designer and its
quantitative principles. We will also examine the technology trends and
understand the concepts of power consumption and efficiency metrics.
You can define computer design as the activity that converts the architectural
design of the computer into an implementation for a particular organisation.
Thus, computer design is also referred to as computer implementation. The
computer designer is responsible for the hardware architecture of the
computer.
Objectives:
After studying this unit, you should be able to:
• identify the changing face of computing
• explain the tasks of the computer designer
• describe the technology trends
• discuss the quantitative principles of the computer design
• describe power consumption and efficiency metrics
These changes have dramatically changed the face of computing and the
computing applications. This has led to three different computer markets each
adapted with different requirements, specifications and applications. These
are explained as follows:
2.2.1 Desktop computing
Desktop computers have the largest market in terms of costs. It varies from
low-end systems to very high-end heavily configured computer systems.
Throughout this range, the cost and the capability also vary in terms of
performance. This combination of price and performance matters most to
the customers in this market and, thus, to the computer designers.
Consequently, the newest, highest-performance microprocessors as well as
cost-reduced microprocessors are largely sold in the category of desktop
systems.
Characteristics of desktop computing
The important characteristics of desktop computing are:
1. Ease-of-use: In desktop computers, all the computer parts come as
separate, detachable components. This makes the computer easy and
comfortable for the user to work with.
2. Extensive graphic capabilities: Desktop computers provide extensive
graphics capabilities.
In embedded computers, price is often crucial, but the chief objective is to
meet the performance need at minimum cost.
Characteristics of embedded computers
1. Real-time performance: The performance requirement in an embedded
application is real-time execution. Speed, though in varying degrees, is an
important factor in all architectures. The ability to guarantee real-time
performance acts as a constraint on the speed requirements of the system.
Real-time performance means that the system is guaranteed to respond
within certain time constraints specified by the task and the environment.
2. Soft real-time: In a number of applications, a more sophisticated
requirement exists: the average time for a particular task is constrained, as
well as the number of occasions on which some maximum time is
exceeded. Such approaches are sometimes called soft real-time, and they
arise when it is possible to occasionally miss the time constraint on an
event, provided that not too many of them are missed.
3. Need to minimise memory size: Memory can be a considerable element
of the system cost. Thus, it is vital to limit the memory size according to
the requirement.
4. Need to minimise memory power: Larger memory also means higher
power consumption. The emphasis on low power is driven by the use of
batteries. Unnecessary use of power needs to be avoided to keep the
power requirement low.
Self Assessment Questions
5. The __________ had the ability to integrate the functions of a
computer’s Central Processing Unit (CPU) on a single-integrated circuit.
6. _____________ computers used to support typical applications like
business data support and large-scale scientific computing.
7. The performance requirement in an embedded application is real-time
execution. (True/False)
8. ______________ is the chief objective of embedded computers.
computer architects. The world's first computer designer was Charles
Babbage (1791-1871) (see Figure 2.2). He is considered the father of the
computer and holds the credit of inventing the first mechanical computer,
which eventually led to more complex designs.
Now, we will discuss the low-level implementation of the 80x86 instruction set.
Computers cannot execute high-level language constructs like ones found in
C. Rather they execute a relatively small set of machine instructions, such as
addition, subtraction, Boolean operations, and data transfers. Thus, the
engineers decided to encode the instructions in a
numeric format (suitable for storage in memory). The structure of the ISA is
given below:
1. Class of ISA: The operands are registers or memory locations, and nearly
all ISAs today are classified as general-purpose register architectures. The
80x86 has 16 general-purpose registers and 16 registers for floating-point
data. The two popular versions of this class are register-memory ISAs, which
can access memory as part of many instructions, and load-store ISAs, which
can access memory only with load or store instructions.
Figure 2.4 shows the structure of a programming model consisting of General
Purpose Registers and Memory.
+------+-----+---------+
| load | reg | address |
+------+-----+---------+
Even this format needs a large field in the instruction to hold a large address.
The address is the beginning of an array, and the particular array element
needed can be selected by the index.
iii) Base plus index plus offset
The beginning address of the array could be stored in the base register, the
index will choose the particular record needed and the offset can choose the
field inside that record.
iv) Scaled
The beginning of an array or vector is stored in the base register, and the
index contains the number of the particular array element needed; the index
is scaled by the element size to form the address.
v) Register indirect
This is sometimes treated as a distinct addressing mode, but many computers
simply use base plus offset with an offset value of 0.
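As a hedged illustration (the Record structure and field names below are invented), the C fragment shows how these addressing modes typically appear once a compiler maps them onto array and record accesses.

/* A minimal sketch of how the addressing modes above appear in practice.
 * The struct layout and names are illustrative only.                    */
#include <stdio.h>

struct Record { int id; double balance; };   /* "offset" selects a field */

int main(void) {
    struct Record table[4] = {{1, 10.0}, {2, 20.0}, {3, 30.0}, {4, 40.0}};
    int index = 2;

    /* Base + index + offset: base = &table, index selects the record,
     * offset selects the 'balance' field inside that record.            */
    double b = table[index].balance;

    /* Scaled: the index is multiplied by the element size (here
     * sizeof(struct Record)) to reach the wanted array element.         */
    struct Record *p = table + index;        /* same element as above    */

    /* Register indirect: a pointer held "in a register" with offset 0.  */
    int id = p->id;

    printf("id=%d balance=%.1f\n", id, b);
    return 0;
}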
4. Types and sizes of operands: Machine instructions operate on operands
of several types. Some types supported by ISAs include character (e.g.,
8-bit ASCII or 16-bit Unicode), signed and unsigned integers, and single-
and double-precision floating-point numbers. ISAs typically support
several sizes of integer.
For example, a 32-bit architecture includes arithmetic instructions that
operate on 8-bit integers, 16-bit integers (short integers) and 32-bit
integers. Signed integers are represented using two's complement binary
representation.
Here, in this unit, the word architecture covers all three aspects of computer
design - instruction set architecture, organisation, and hardware. Thus,
computer designers must design a computer keeping in mind the functional
requirements as well as price, power and performance goals. The functional
requirements also have to be determined by the computer architect, which is
a tedious job. The requirements are determined after reviewing the market
specific features. Also, the computer designers must be aware of the
technology trends in the market and the use of computers to avoid
unnecessary costs and failure of the architecture system. Thus, we will study
some important technology trends in the following section.
Self Assessment Questions
5. The world’s first designer was __________________
6. _________________ acts as the boundary between software and
hardware.
7. ISA has __________________ general-purpose registers.
8. CISC stands for __________________ .
Activity 1:
Visit any two organisations. Now make a list of the different type of computers
they are using - desktop, servers and embedded computers - and compare
with one another. What proportion of each type of computing are they using?
the dynamic and rapidly changing market. The instruction set should be
designed so that it can adapt to the rapid changes in technology. The
designer should plan for the technology changes that would lead to the
success of the computer.
There are four main technologies that are essential to modern
implementations. These are as follows:
1. Integrated circuit logic technology: Integrated circuits or microchips are
electronic circuits manufactured by forming interconnections between
semiconductor devices. Changes in this technology occur very rapidly.
Some examples are the evolution of mobile phones, digital microwave
ovens, etc.
2. Semiconductor DRAM (dynamic random-access memory): DRAM
uses a capacitor to store each bit of data, and the level of charge on each
capacitor determines whether that bit is a logical 1 or 0. However these
capacitors do not hold their charge indefinitely, and therefore the data
needs to be refreshed periodically. It is a semiconductor memory that is
used in personal computers and workstations. Its density increases by
about 40% every year.
3. Magnetic disk technology: Magnetic disks include floppy disks, hard
disks, etc. The disk surface is coated with magnetic particles organised
into microscopic areas called domains; each domain acts like a tiny
magnet with north and south poles. Disk density is currently increasing by
about 30% every year.
4. Network technology: A network refers to a collection of computers and
hardware components connected together through communication
channels. Communication protocols govern the communication in the
network and provide the basis for network programming. Network
performance depends both on the switches and on the transmission
systems.
These rapidly changing technologies mould the design of a computer that will
have a life of more than five years. It has been observed that, by studying
these technology trends, computer designers have been able to reduce costs
at the rate at which the technology changes.
Self Assessment Questions
9. The designer should never plan for the technology changes that would
lead to the success of the computer. (True/False)
Parallelism can also be exploited at the level of detailed digital design. For
example, set-associative caches use multiple banks of memory that are
searched in parallel to find a desired item, and modern ALUs use
carry-lookahead, which exploits parallelism to speed the process of
computing sums from linear to logarithmic in the number of bits per operand.
2.5.2 Principle of locality
The principle of locality is an important program property: programs tend to
reuse the data and instructions they have used recently. This principle allows
us to predict the data and instructions that a program is likely to need in the
near future, based on its pattern of accesses in the recent past.
There are two kinds of locality: temporal locality, which states that items
referenced recently are likely to be accessed again in the near future, and
spatial locality, which states that items whose addresses are near one
another tend to be referenced close together in time. Recently used items are
kept in a component called cache memory, which is located between the
CPU (or processor) and the main memory, as shown in figure 2.7.
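As a small illustration (not from the text), the loop below walks an array in address order, so consecutive accesses exhibit spatial locality, while the repeated use of the running sum in every iteration exhibits temporal locality; a cache exploits both.

/* A minimal sketch of temporal and spatial locality in a simple loop. */
#include <stdio.h>
#define N 1024

int main(void) {
    int a[N];
    for (int i = 0; i < N; i++) a[i] = i;

    long sum = 0;
    for (int i = 0; i < N; i++) {
        /* Spatial locality: a[i] and a[i+1] sit in adjacent memory, so a
         * cache line fetched for one access also serves the next ones.  */
        sum += a[i];
        /* Temporal locality: sum (and the loop variable i) are reused on
         * every iteration, so they stay in registers or in the cache.   */
    }
    printf("sum = %ld\n", sum);
    return 0;
}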
instruction fetch and decode unit of a processor first, as it may be used more
often than a multiplier. This applies to dependability as well.
Optimising the frequent case tends to yield more benefit than optimising the
rare case, and the frequent case is often simpler to handle. For example,
overflow is rare when adding two numbers in the processor, so performance
is improved by optimising the more common case of no overflow. To apply
this principle, we need to decide what the common case is and how much
performance can be gained by making it faster. To quantify this, we will study
Amdahl's Law below.
Amdahl’s Law
This law helps compute the performance gain that can be obtained by
improving any division of the computer. Amdahl’s law states that “the
performance improvement to be gained from using some faster mode of
execution is limited by the fraction of the time the faster mode can be used.”
(Hennessy and Patterson)
Figure 2.8 shows the predicted speedup using Amdahl's law in graphic form.
Speedup is defined as:

Speedup = (Execution time for the entire task without using the enhancement)
          / (Execution time for the entire task using the enhancement when possible)
Amdahl’s law helps us to find the speedup from some enhancement. This
depends on the following two factors:
1. The fraction of the computation time in the original computer that can be
converted to take advantage of the enhancement - For example, if 20
seconds of the execution time of a program that takes 60 seconds in total
can use an enhancement, the fraction is 20/60. This value, which we will
call Fraction enhanced, is always less than or equal to 1.
2. The improvement gained by the enhanced execution mode; that is, how
much faster the task would run if the enhanced mode were used for the
entire program - This value is the time of the original mode over the time
of the enhanced mode. If the enhanced mode takes, say, 2 seconds for a
portion of the program, while it is 5 seconds in the original mode, the
improvement is 5/2. We will call this value, which is always greater than 1,
Speedup enhanced.
The execution time using the original computer with the enhanced mode will
be the time spent using the unenhanced portion of the computer plus the time
spent using the enhancement:

Execution time_new = Execution time_old x [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

The overall speedup is the ratio of the two execution times:

Speedup_overall = Execution time_old / Execution time_new
                = 1 / [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
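As a worked illustration using the figures quoted above (20 of 60 seconds can use the enhancement, which runs 5/2 times faster), the short program below evaluates Amdahl's law; the function name is invented for this sketch.

/* A minimal sketch applying Amdahl's law with the figures quoted above:
 * 20 s of a 60 s run can use the enhancement, which is 5/2 times faster. */
#include <stdio.h>

static double amdahl_speedup(double fraction_enhanced, double speedup_enhanced) {
    /* Speedup_overall = 1 / ((1 - F) + F / S) */
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced);
}

int main(void) {
    double fraction = 20.0 / 60.0;   /* Fraction_enhanced */
    double speedup  = 5.0 / 2.0;     /* Speedup_enhanced  */
    printf("Overall speedup = %.3f\n", amdahl_speedup(fraction, speedup));
    /* Prints 1.250: the 40 s unenhanced part still dominates. */
    return 0;
}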
2.7 Summary
Let us recapitulate the important concepts discussed in this unit:
• There are two types of execution - concurrent and parallel.
• Computer design is the activity that converts the architectural design of the
computer into an implementation for a particular organisation.
• Computer technology has changed drastically in the roughly 60 years since
the first general-purpose computer was invented.
• Desktop computers have the largest market in terms of costs. It varies
from low-end systems to very high-end heavily configured computer
systems.
• The world's first computer designer was Charles Babbage, who is
considered the father of the computer.
• Computer designer needs to determine the attributes that are necessary
for a new computer, then design a computer to maximise the performance.
• The Instruction Set Architecture (ISA) is the part of the processor that is
visible to the programmer or compiler writer.
• Performance of the computer is improved by taking advantage of
parallelism.
• Focussing on the common case will work positively both for power and
resource allocation, thus, leading to advancement.
2.8 Glossary
• CISC: Complex instruction set computer
• Computer designer: A person who designs CPUs or computers that are
actually built, come into considerable use and influence the further
development of computer designs.
• Desktop computers: These are in the form of personal computers (also
known as PCs).
2.10 Answers
Self Assessment Questions
1. Microprocessor
2. Main-frame
3. True
4. Minimum cost
5. Charles Babbage
6. ISA
7. 16
8. Complex instruction set computer
9. False
10. Integrated circuits or microchips
11. Adopting parallelism
12. Scalability
13. Pipelining
14. Temporal Locality
15. Spatial Locality
16. Power efficiency
17. Dynamic Power
18. High issue rates, sustained performance
Terminal Questions
1. Desktop computers have the largest market in terms of costs. It varies from
low-end systems to very high-end heavily configured computer systems.
Refer Section 2.2.
2. An embedded system is a single-purpose computer embedded in a device
to control some particular function of that bigger device. The performance
requirement of an embedded application is real-time execution. Refer
Section 2.2.
3. Computer Designer is a person who has designed CPUs or computers that
were actually built and came into considerable use and influenced the
further development of computer designs. Refer Section 2.3.
4. Architecture covers all three aspects of computer design - instruction set
architecture, organisation, and hardware. Refer Section 2.3.
5. Technology trends need to be studied on a regular basis in order to cope
with the dynamic and rapidly changing market. The instruction set should
be designed such to adapt the rapid changes of the technology. Refer
Section 2.4.
6. Quantitative principles in computer design are: Take Advantage of
Parallelism, Principle of Locality, Focus on the Common Case and
Amdahl’s Law. Refer Section 2.5.
7. Amdahl’s law states that the performance improvement to be gained from
using some faster mode of execution is limited by the fraction of the time
the faster mode can be used. Refer Section 2.5.
References:
• David Salomon (2008). Computer Organisation. NCC Blackwell.
• John L. Hennessy and David A. Patterson. Computer Architecture: A
Quantitative Approach (4th Ed.). Morgan Kaufmann Publishers.
3.1 Introduction
In the previous unit, you studied the fundamentals of computer architecture
and design. Now we will study the instruction set and its principles in detail.
The instruction set or the instruction set architecture (ISA) is the set of basic
instructions that a processor understands. In other words, an instruction set,
or instruction set architecture (ISA), is the part of the computer architecture
related to programming, including the native data types, instructions, registers,
addressing modes, memory architecture, interrupt and exception handling,
and external I/O. There are a number of instructions in a program that have to
be accessed in a particular sequence. This motivates the discussion of
instruction issue and sequencing, which we will study in this unit. In this
unit, you will study the fundamentals involved in instruction set architecture
and design. Firstly, the operations in the instruction set, the instruction set
architecture, memory locations and addresses, memory addressing, the
abstract model of the main memory, and instructions for control flow need to
be understood.
push addr: Places the value at memory address addr on top of the stack. push(M[addr])
pop addr: Stores the top value on the stack at memory address addr. M[addr] = pop
add: Adds the top two values on the stack and pushes the result onto the stack. push(pop + pop)
sub: Subtracts the second value on the stack from the top value and pushes the result onto the stack. push(pop - pop)
mult: Multiplies the top two values on the stack and pushes the result onto the stack. push(pop * pop)
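To make the zero-address style concrete, here is a small hedged sketch in C (the memory layout and values are invented) that evaluates (A + B) * C using exactly these stack operations.

/* A minimal sketch of a zero-address (stack) machine evaluating (A+B)*C.
 * Memory addresses and values are illustrative only.                    */
#include <stdio.h>

static int mem[16] = {0};          /* tiny data memory                   */
static int stk[16]; static int sp = 0;

static void push(int v)      { stk[sp++] = v; }
static int  pop(void)        { return stk[--sp]; }
static void push_addr(int a) { push(mem[a]); }          /* push addr     */
static void pop_addr(int a)  { mem[a] = pop(); }        /* pop addr      */
static void add(void)        { push(pop() + pop()); }   /* add           */
static void mult(void)       { push(pop() * pop()); }   /* mult          */

int main(void) {
    mem[0] = 4; mem[1] = 6; mem[2] = 3;  /* A=4, B=6, C=3                */

    push_addr(0);   /* push A       */
    push_addr(1);   /* push B       */
    add();          /* A + B        */
    push_addr(2);   /* push C       */
    mult();         /* (A + B) * C  */
    pop_addr(3);    /* store result */

    printf("result = %d\n", mem[3]);     /* prints 30                    */
    return 0;
}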
The branch and jump instructions are identical in their use but sometimes they
are used to denote different addressing modes. The branch is usually a one-
address instruction. Branch and jump instructions may be conditional or
unconditional.
An unconditional branch instruction, as a name denotes, causes a branch to
the specified address without any conditions. On the contrary the conditional
branch instruction specifies a condition such as branch if positive or branch if
zero. If the condition is met, the program counter is loaded with the branch
address and the next instruction is taken from this address. If the condition is
not met, the program counter remains unaltered and the next instruction is
taken from the next location in sequence.
The skip instruction does not require an address field and is, therefore, a zero-
address instruction. A conditional skip instruction will skip the next instruction,
if the condition is met. This is achieved by incrementing the program counter
during the execute phase in addition to its being incremented during the fetch
phase. If the condition is not met, control proceeds with the next instruction in
sequence where the programmer inserts an unconditional branch instruction.
Thus, a skip-branch pair of instructions causes a branch if the condition is not
met, while a single conditional branch instruction causes a branch if the
condition is met.
The call and return instructions are used in conjunction with subroutines. The
compare instruction performs a subtraction between two operands, but the
result of the operation is not retained. However, certain status bit conditions
are set as a result of the operation. In a similar fashion, the test instruction
performs the logical AND of two operands and updates certain status bits
without retaining the result or changing the operands. The status bits of
interest are the carry bit, the sign bit, a zero indication, and an overflow
condition.
The four status bits are symbolised by C, S, Z, and V. The bits are set or
cleared as a result of an operation performed in the ALU.
1. Bit C (carry) is set to 1 if the end carry C8 is 1. It is cleared to 0 if the carry
is 0.
2. Bit S (sign) is set to 1 if the highest-order bit F7 is 1. It is cleared to 0 if the
bit is 0. S = 0 indicates a positive number and S = 1 a negative number.
3. Bit Z (zero) is set to 1 if the result of the ALU contains all 0’s. It is cleared
to 0 otherwise. In other words, Z = 1 if the result is zero and Z = 0 if the
result is not zero.
4. Bit V (overflow) is set to 1 if the exclusive-OR of the last two carries is equal
to 1, and cleared to 0 otherwise. This is the condition for an overflow when
signed numbers are represented in 2's complement. For the 8-bit ALU,
V = 1 if the result is greater than +127 or less than -128.
As you can see in figure 3.5, the status bits can be checked after an ALU
operation to determine certain relationships that exist between the values of A
and B. If bit V is set after the addition of two signed numbers, it indicates an
overflow condition.
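The small routine below is a hedged sketch (it is not the circuit of figure 3.5) of how C, S, Z and V can be derived in software after an 8-bit addition in 2's complement, following the rules listed above.

/* A minimal sketch computing the C, S, Z and V status bits after an
 * 8-bit addition, following the rules listed above.                    */
#include <stdio.h>
#include <stdint.h>

static void add8_flags(uint8_t a, uint8_t b) {
    uint16_t wide = (uint16_t)a + (uint16_t)b;   /* keep the end carry C8  */
    uint8_t  f    = (uint8_t)wide;

    int C = (wide >> 8) & 1;                     /* end carry out of bit 7 */
    int S = (f >> 7) & 1;                        /* highest-order bit F7   */
    int Z = (f == 0);                            /* all bits of result 0   */
    /* V: carry into bit 7 XOR carry out of bit 7                          */
    int carry_in7 = ((a & 0x7F) + (b & 0x7F)) >> 7;
    int V = carry_in7 ^ C;

    printf("%4d + %4d = %4d  C=%d S=%d Z=%d V=%d\n",
           (int8_t)a, (int8_t)b, (int8_t)f, C, S, Z, V);
}

int main(void) {
    add8_flags(100, 50);    /* 150 > +127: expect V = 1          */
    add8_flags(0xF0, 0x10); /* -16 + 16 = 0: expect Z = 1, C = 1 */
    return 0;
}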
Activity 2:
Visit a computer hardware store and try to collect as much information as
possible about the MIPS processor. Compare its features with other
processors.
3.8 Summary
• Each computer has its own particular instruction code format called its
Instruction Set.
• The different types of instruction formats are three-address instructions,
two-address instructions, one-address instructions and zero-address
instructions.
• A distinct addressing mode field is required in instruction format for signal
processing.
• The program is executed by going through a cycle for each instruction.
• The prototype chip of the MIPS architecture demonstrated that it is possible
to integrate a microprocessor with a five-stage execution pipeline and a
cache controller on a single silicon chip.
3.9 Glossary
• Cell: The smallest unit of memory that the CPU can read or write is cell.
• Decoding: It means interpretation of the instruction.
• Fields: Groups containing bits of instruction.
• Instruction set: Each computer has its own particular instruction code
format called its Instruction Set.
• MIPS: Microprocessor without Interlocked Pipeline Stages.
• Operation: It is a binary code that instructs the computer to perform a
specific operation.
• RISC: Reduced Instruction Set Computer
• Words: Hardware-accessible units of memory larger than one cell are
called words.
3.11 Answers
Self Assessment Questions
1. Fields
2. One-address instructions
3. False
4. True
5. Zero-address
Terminal Questions
1. Each computer has its own particular instruction code format called its
Instruction Set. Refer Section 3.2.
2. The different types of instruction formats are three-address instructions,
two-address instructions, one-address instructions and zero-address
instructions. Refer Section 3.2.
3. Memory addressing is the logical structure of a computer's random-access
memory (RAM). Refer Section 3.3.
4. A distinct addressing mode field is required in instruction format for signal
processing. Refer Section 3.4.
5. The program is executed by going through a cycle for each instruction.
Each instruction cycle is now subdivided into a sequence of sub cycles or
phases. Refer Section 3.5.
6. The conditions for altering the content of the program counter are specified
by program control instruction, and the conditions for data- processing
operations are specified by data transfer and manipulation instructions.
Refer Section 3.6.
7. After considerable research on efficient processor organisation and VLSI
integration at Stanford University, the MIPS architecture evolved. Refer
Section 3.7.
References:
• Hwang, K. (1993). Advanced Computer Architecture. McGraw-Hill.
• D. A. Godse & A. P. Godse (2010). Computer Organization. Technical
Publications. pp. 3-9.
• John L. Hennessy, David A. Patterson, David Goldberg (2002)
"Computer Architecture: A Quantitative Approach", Morgan Kaufmann; 3rd
edition.
4.1 Introduction
In the previous unit, you studied about the changing face of computing. Also,
you studied the meaning and tasks of a computer designer. We also covered
the technology trends and the quantitative principles in computer design. In
this unit, we will introduce you to pipelining processing, the pipeline hazards,
structural hazards, control hazards and techniques to handle them. We will
also examine the performance improvement with pipelines and understand the
effect of hazards on performance.
A parallel processing system can carry out concurrent data processing to
attain a quicker execution time. For example, while one instruction is being
executed in the ALU, the next instruction can be read from memory.
4.2 Pipelining
An implementation technique by which the execution of multiple instructions
can be overlapped is called pipelining. This pipeline technique splits up the
sequential process of an instruction cycle into sub-processes that operate
concurrently in separate segments. As you know, computer processors can
execute millions of instructions per second. At the time one instruction is
getting processed, the following one in line also gets processed within the
same time, and so on. A pipeline permits multiple instructions to get executed
at the same time. Without a pipeline, every instruction has to wait for the
previous one to be complete. The main advantage of pipelining is that it
increases the instruction throughput, which is defined as the number of
instructions completed per unit of time.
In the figure, each segment consists of one or more registers together with
combinational circuits. Each register is loaded with new data at the start of a
new clock period. Refer to table 4.2 for an example of the contents of the
registers in the pipeline.
On the 1st clock pulse, data are loaded into registers R1, R2, R3 and R4.
On the 2nd clock pulse, the two products are stored in registers R5 and R6.
On the 3rd clock pulse, the data in R5 and R6 are added and stored in R7.
So a total of only 3 clock periods is required to compute An*Bn + Cn*Dn.
Table 4.2: Contents of Registers in Pipeline Example

Clock Pulse | Segment 1 (R1, R2, R3, R4) | Segment 2 (R5, R6) | Segment 3 (R7)
1           | A1  B1  C1  D1             | -       -          | -
2           | A2  B2  C2  D2             | A1*B1   C1*D1      | -
3           | A3  B3  C3  D3             | A2*B2   C2*D2      | A1*B1 + C1*D1
4           | -   -   -   -              | A3*B3   C3*D3      | A2*B2 + C2*D2
5           | -   -   -   -              | -       -          | A3*B3 + C3*D3
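A small hedged sketch of this three-segment pipeline is given below (the operand values are invented); on every simulated clock pulse the registers are updated from the previous segment, reproducing the flow of table 4.2.

/* A minimal sketch of the three-segment pipeline computing Ai*Bi + Ci*Di.
 * Values are illustrative; 0 plays the role of '-' in table 4.2.         */
#include <stdio.h>
#define N 3

int main(void) {
    int A[N] = {1, 2, 3}, B[N] = {4, 5, 6}, C[N] = {7, 8, 9}, D[N] = {1, 1, 1};
    int R1 = 0, R2 = 0, R3 = 0, R4 = 0, R5 = 0, R6 = 0, R7 = 0;

    for (int clock = 1; clock <= N + 2; clock++) {
        /* Update the later segments first, so each uses last cycle's data. */
        R7 = R5 + R6;                 /* segment 3: add the two products    */
        R5 = R1 * R2; R6 = R3 * R4;   /* segment 2: form the products       */
        if (clock <= N) {             /* segment 1: load the next operands  */
            R1 = A[clock - 1]; R2 = B[clock - 1];
            R3 = C[clock - 1]; R4 = D[clock - 1];
        } else {
            R1 = R2 = R3 = R4 = 0;
        }
        printf("pulse %d: R7 = %d\n", clock, R7);  /* first result on pulse 3 */
    }
    return 0;
}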
In this three-stage pipeline, the input data must go through stages 1, 2 and 3
to perform multiplication and through stages 1 and 3 only to perform
subtraction. Therefore, dynamic pipelines require feed forward and feedback
connections in addition to the streamline connections between the stages.
Figure 4.6: Three-Cycle Stall in the Pipeline
The control hazard stall is not implemented in the same way as the data
hazard stall, since the instruction fetch (IF) cycle is to be repeated as soon as
the branch target is known. Thus, the first IF cycle is essentially a stall, as it
never performs useful work. By setting the IF/ID register to zero, we can
implement the stall for the three cycles. The repetition of the IF stage is not
required if the branch is untaken, since the correct instruction has already
been fetched.
Self Assessment Questions
14. _________ cause a greater performance loss for a pipeline than _________ .
15. If the PC is changed by the branch to its target address, then it is known
as __________________ branch; else it is known as __________ .
Instruction i + 2: IF ID EX MEM WB
Instruction i + 3: IF ID EX MEM WB
Instruction i + 4: IF ID EX MEM WB
In reality, all machines with delayed branch have a single instruction delay,
and we focus on that case.
Self Assessment Questions
16. The problem posed due to data hazards can be solved with a simple
hardware technique called __________________ .
17. Forwarding is also called _________ or _________________ .
18. ____________ is the method of holding or deleting any instructions
after the branch until the branch destination is known.
19. ________________ technique simply allows the hardware to
continue as if the branch were not executed.
such, the cycle count should be equal to the sum of these three registers.
• In the dual-issue processor, only one of the instruction count, load stall, or
branch stall counters is increased, but the instruction count register may
sometimes be incremented by two (for cycles in which two instructions
execute). As such, the sum of these three registers will be greater than or
equal to the cycle count.
The processor should update the performance counters during the write-back
stage of the pipeline; a cycle in which an instruction completes write-back is,
by definition, neither a branch stall nor a load stall cycle. The current value of
these counters can be determined by using an
LD or LDR instruction to access them. The LD instruction takes a source label
and loads the value at that address into the destination register. The LDR
instruction adds an immediate offset to the source register's value to form an
address and loads the value at that address into the destination register.
To avoid complexity, stores to these locations do not change the value of the
counters, although the contents of memory may still be updated by the stores.
This hardly matters because, whenever these locations are read, the value in
the counter is used rather than the value in memory.
Basically, these counters can be reset to zero only when the entire system is
reset.
Self Assessment Questions
20. ____________ states the number of cycles lost to load-use stalls.
21. ____________ instruction takes a source label and stores its address
into the destination register.
22. ____________ stores the source register's value plus an immediate
value offset and stores it in the destination register.
CPI is cycles per instruction, the average number of clock cycles each
instruction takes. The ideal CPI on a pipelined machine is almost always 1.
Therefore, the pipelined CPI is:
CPI_pipelined = Ideal CPI + Pipeline stall clock cycles per instruction
If the cycle time overhead of pipelining is ignored and the stages are all
assumed to be perfectly balanced, then the two machines have an equal cycle
time and:
Speedup = CPI_unpipelined / (1 + Pipeline stall cycles per instruction)
If all instructions take the same number of cycles, which must also equal the
number of pipeline stages (the depth of the pipeline), then the unpipelined
CPI is equal to the depth of the pipeline, leading to:

Speedup = Pipeline depth / (1 + Pipeline stall cycles per instruction)
If there are no pipeline stalls, this leads to the intuitive result that pipelining
can improve performance by the depth of pipeline.
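As a hedged numeric illustration of these formulas (the figures are invented), a five-stage pipeline that suffers 0.25 stall cycles per instruction would be evaluated as follows.

/* A minimal sketch applying the pipeline speedup formula above with
 * illustrative numbers: depth 5, 0.25 stall cycles per instruction.    */
#include <stdio.h>

int main(void) {
    double pipeline_depth   = 5.0;
    double stalls_per_instr = 0.25;

    double cpi_pipelined = 1.0 + stalls_per_instr;              /* ideal CPI = 1 */
    double speedup = pipeline_depth / (1.0 + stalls_per_instr); /* formula above */

    printf("Pipelined CPI = %.2f\n", cpi_pipelined);             /* 1.25 */
    printf("Speedup over unpipelined = %.2f\n", speedup);        /* 4.00 */
    return 0;
}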
Self Assessment Questions
23. A __________ causes the pipeline performance to degrade from the
ideal performance.
24. CPI is the abbreviation for ___________ .
Activity 1:
Identify any two hazards in the systems of the organisation you previously
visited. Now apply the appropriate handling techniques to these hazards.
4.10 Summary
Let us recapitulate the important concepts discussed in this unit:
• A parallel processing system is able to perform concurrent data processing
to achieve a faster execution time.
4.13 Answers
Self Assessment Questions
1. Pipelining
2. Virtual parallelism
3. Load Memory Data
4. First-in first-out (FIFO) buffer
5. Linear
6. Non-Linear
7. Dynamic pipelines
8. Hazards
9. Resource conflicts
10. Data dependency
11. Branch difficulties
12. True
13. 6
14. Control Hazards, data hazards
15. Taken, not taken or untaken
16. Forwarding
17. Bypassing or short-circuiting
18. Freeze or flush the pipeline
19. Assume each branch as not-taken
20. Load-stall count
21. LD
22. LDR
23. Stall
24. Cycles per Instruction
Terminal Questions
1. The concurrent use of two or more CPU or processors to execute a
program is called parallel processing. For details -Refer Section 4.1.
2. An implementation technique by which the execution of multiple
instructions can be overlapped is called pipelining. Refer Section 4.2 for
more details.
3. There are two types of pipelining-Linear and non-linear. Refer Section 4.3
for more details.
4. Hazards are the situations that stop the next instruction in the instruction
stream from being executed during its designated clock cycle. Refer
Section 4.4.
5. There are two techniques to handle hazards namely minimising data
hazard stalls by forwarding and reducing pipeline branch penalties. Refer
Section 4.7.
References:
• David Salomon, Computer Organisation, 2008, NCC Blackwell
• John L. Hennessy and David A. Patterson, Computer Architecture: A
Quantitative Approach, Fourth Edition, Morgan Kaufmann Publishers
• Joseph D. Dumas II; Computer Architecture; CRC Press
• Nicholas P. Carter; Schaum’s outline of computer Architecture; McGraw-
Hill Professional