UNIT 1 Advanced Computer Architecture Introduction


UNIT 1 ACA VIII CSE

1.1 INTRODUCTION
Computer technology has made incredible progress in the roughly 60 years since the first general-purpose electronic computer was created. Advances such as improved chip manufacturing and better algorithms made it possible to successfully develop a new class of architectures with simpler instructions, called RISC (Reduced Instruction Set Computer) architectures, in the early 1980s.

Figure 1.1 shows the performance growth produced by the combination of architectural and organizational enhancements in the 1980s. Later, in the 2000s, further growth has come from exploiting the following forms of parallelism:
1. instruction-level parallelism (ILP)
2. thread-level parallelism (TLP)
3. data-level parallelism (DLP)

1.2 CLASSES OF COMPUTERS


In the 1960s, the dominant form of computing was on large mainframes—computers costing
millions of dollars and stored in computer rooms with multiple operators overseeing their support.
Typical applications included business data processing and large-scale scientific computing.
The 1970s saw the birth of the minicomputer, a smaller-sized computer initially focused on
applications in scientific laboratories, but rapidly branching out with the popularity of time-sharing
—multiple users sharing a computer interactively through independent terminals. That decade also
saw the emergence of supercomputers, which were high-performance computers for scientific
computing. Although few in number, they were important historically because they pioneered
innovations that later trickled down to less expensive computer classes.


The 1980s saw the rise of the desktop computer based on microprocessors, in the form of both
personal computers and workstations. The individually owned desktop computer replaced time-
sharing and led to the rise of servers—computers that provided larger-scale services such as
reliable, long-term file storage and access, larger memory, and more computing power. The 1990s
saw the emergence of the Internet and the World Wide Web, and the first successful handheld
computing devices (personal digital assistants).

Desktop Computing
Desktop computing was the first, and is still the largest, market in dollar terms. It spans from low-end systems that sell for under $500 to high-end, heavily configured workstations that may sell for $5000. Throughout this range in price and capability, the desktop market tends to be driven to optimize price-performance. Desktop computing also tends to be reasonably well characterized in terms of applications and benchmarking, though the increasing use of Web-centric, interactive applications complicates evaluation. Desktop processors are generally quite sufficient to carry out the jobs of typical users.

Servers
As the shift to desktop computing occurred, the role of servers grew to provide larger-scale and
more reliable file and computing services. The World Wide Web accelerated this trend because of
the tremendous growth in the demand and sophistication of Web-based services. Such servers have
become the backbone of large-scale enterprise computing, replacing the traditional mainframe
as they need to run round the clock.
Their cost is high, and scalability (the ability to balance a growing load, for example by adding processors, memory, or disks) and good throughput are key design parameters.
A related category is supercomputers. They are the most expensive computers, costing tens of
millions of dollars, and they emphasize floating-point performance.
Embedded Computers
Embedded computers are the fastest growing portion of the computer market. These devices range
from everyday machines—most microwaves, most washing machines, most printers, most
networking switches, and all cars contain simple embedded microprocessors—to handheld digital
devices, such as cell phones and smart cards, to video games and digital set-top boxes.
Embedded computers have the widest spread of processing power and cost. They include 8-bit and
16-bit processors that may cost less than a dime, as well as 32-bit microprocessors that execute 100
million instructions per second yet still cost very little.


The key performance requirement in an embedded application is often real-time execution: a real-time performance requirement exists when a segment of the application has an absolute maximum execution time.
Two other key characteristics of many embedded applications are the need to minimize memory and the need to minimize power.
Sometimes the application is expected to fit totally in the memory on the processor chip; other times the memory is provided off the chip (for example, a mobile phone with a memory card).

1.3 DEFINING COMPUTER ARCHITECTURE


The task the computer designer faces is a complex one: Determine what attributes are important for
a new computer, then design a computer to maximize performance while staying within cost, power,
and availability constraints. This task has many aspects, including instruction set design, functional
organization, logic design, and implementation. The implementation may encompass integrated
circuit design, packaging, power, and cooling. Optimizing the design requires familiarity with a
very wide range of technologies, from compilers and operating systems to logic design and
packaging.
The architect's first task is designing the instruction set architecture (ISA). While doing this, the following issues must be addressed:
1. Class of ISA—Nearly all ISAs today are classified as general-purpose register architectures,
where the operands are either registers or memory locations.
2. Memory addressing—Virtually all desktop and server computers, including the 80x86 and
MIPS, use byte addressing to access memory operands; memory addressing defines how the
processor specifies the locations it accesses.
3. Addressing modes—In addition to specifying registers and constant operands, addressing
modes specify the address of a memory object, e.g., direct, indirect, base, indexed, and with
offset.
4. Types and sizes of operands—Like most ISAs, MIPS and 80x86 support operand sizes of
8-bit (ASCII character), 16-bit (Unicode character or half word), 32-bit (integer or word),
64-bit (double word or long integer).
5. Operations—The general categories of operations are data transfer, arithmetic logical,
control (discussed next), and floating point.
6. Control flow instructions—Virtually all ISAs, including 80x86 and MIPS, support
conditional branches, unconditional jumps, procedure calls, and returns. Both use PC-
relative addressing, where the branch address is specified by an address field that is added to

the PC (a PC-relative target calculation is sketched after this list).
7. Encoding an ISA—There are two basic choices on encoding: fixed length and variable
length.
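As a concrete illustration of the PC-relative control flow in item 6, the sketch below computes a MIPS-style branch target, where a signed 16-bit offset counts instructions and is added to the address of the instruction after the branch (PC + 4). This is a minimal sketch; the function name and sample addresses are made up for illustration.

```python
def branch_target(pc, offset_field):
    """MIPS-style PC-relative branch: target = (PC + 4) + (offset << 2).

    pc           -- address of the branch instruction itself
    offset_field -- signed 16-bit offset from the instruction, counted in words
    """
    return (pc + 4) + (offset_field << 2)

# Hypothetical example: a branch at 0x00400010 whose offset field is 3
# lands 3 instructions beyond the following instruction.
print(hex(branch_target(0x00400010, 3)))   # 0x400020
```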

The Rest of Computer Architecture: Designing the Organization and Hardware to Meet Goals
and Functional Requirements consists of the following issues.
Organization and hardware. The term organization includes the high-level aspects of a
computer’s design, such as the memory system, the memory interconnect, and the design of the
internal processor or CPU (central processing unit—where arithmetic, logic, branching, and data
transfer are implemented).

Hardware refers to the specifics of a computer, including the detailed logic design and the
packaging technology of the computer. Often a line of computers contains computers with identical
instruction set architectures and nearly identical organizations, but they differ in the detailed
hardware implementation.
The word architecture covers all three aspects of computer design—instruction set architecture,
organization, and hardware.
Computer architects must design a computer to meet functional requirements as well as price,
power, performance, and availability goals. All of these are summarized in the table below.
Functional requirements and the typical features required or supported:

Application area: Target of computer
  General-purpose desktop: Balanced performance for a range of tasks, including interactive performance for graphics, video, and audio (Ch. 2, 3, 5, App. B)
  Scientific desktops and servers: High-performance floating point and graphics (App. I)
  Commercial servers: Support for databases and transaction processing; enhancements for reliability and availability; support for scalability (Ch. 4, App. B, E)
  Embedded computing: Often requires special support for graphics or video (or other application-specific extension); power limitations and power control may be required (Ch. 2, 3, 5, App. B)

Level of software compatibility: Determines amount of existing software for computer
  At programming language: Most flexible for designer; need new compiler (Ch. 4, App. B)
  Object code or binary compatible: Instruction set architecture is completely defined—little flexibility—but no investment needed in software or porting programs

Operating system requirements: Necessary features to support chosen OS (Ch. 5, App. E)
  Size of address space: Very important feature (Ch. 5); may limit applications
  Memory management: Required for modern OS; may be paged or segmented (Ch. 5)
  Protection: Different OS and application needs: page vs. segment; virtual machines (Ch. 5)

Standards: Certain standards may be required by marketplace
  Floating point: Format and arithmetic: IEEE 754 standard (App. I), special arithmetic for graphics or signal processing
  I/O interfaces: For I/O devices: Serial ATA, Serial Attach SCSI, PCI Express (Ch. 6, App. E)
  Operating systems: UNIX, Windows, Linux, CISCO IOS
  Networks: Support required for different networks: Ethernet, Infiniband (App. E)
  Programming languages: Languages (ANSI C, C++, Java, FORTRAN) affect instruction set (App. B)

The list above summarizes some of the most important functional requirements an architect
faces. Each entry names a class of requirement and gives specific examples, along with
references to the chapters and appendices of the textbook that deal with the specific issues.

1.4 TRENDS IN TECHNOLOGY
If an instruction set architecture is to be successful, it must be designed to survive rapid changes in
computer technology. After all, a successful new instruction set architecture may last decades—for
example, the core of the IBM mainframe has been in use for more than 40 years. An architect must
plan for technology changes that can increase the lifetime of a successful computer.

To plan for the evolution of a computer, the designer must be aware of rapid changes in
implementation technology. Four implementation technologies, which change at a dramatic pace,
are critical to modern implementations:
• Integrated circuit logic technology—Transistor density increases by about 35% per
year, quadrupling in somewhat over four years. Increases in die size are less predictable
and slower, ranging from 10% to 20% per year. The combined effect is a growth rate in
transistor count on a chip of about 40% to 55% per year. Device speed scales more
slowly, as we discuss below.
• Semiconductor DRAM (dynamic random-access memory)—Capacity increases by about
40% per year, doubling roughly every two years.
• Magnetic disk technology—Prior to 1990, density increased by about 30% per year,
doubling in three years. It rose to 60% per year thereafter, and increased to 100% per
year in 1996. Since 2004, it has dropped back to 30% per year. Despite this roller coaster
of rates of improvement, disks are still 50–100 times cheaper per bit than DRAM.
• Network technology—Network performance depends both on the performance of
switches and on the performance of the transmission system.
These rapidly changing technologies shape the design of a computer that, with speed and
technology enhancements, may have a lifetime of five or more years.
Performance Trends: Bandwidth over Latency
Bandwidth or throughput is the total amount of work done in a given time, such as megabytes per second for a disk transfer. In contrast, latency or response time is the time between the start and the completion of an event, such as milliseconds for a disk access. The figure below plots the relative improvement in bandwidth and latency for technology milestones for microprocessors, memory, networks, and disks.


Figure Log-log plot of bandwidth and latency milestones from Figure 1.9 relative to the first
milestone. Note that latency improved about 10X while bandwidth improved about 100X to
1000X.
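To make the bandwidth/latency distinction above concrete, here is a small sketch using made-up disk numbers (illustrative values, not measurements):

```python
# Hypothetical disk read: 5 ms of seek and rotation before the first byte
# (latency), then 80 ms spent streaming 8 MB of data.
latency_s = 0.005
transfer_s = 0.080
megabytes_moved = 8.0

response_time_s = latency_s + transfer_s        # start-to-completion time
bandwidth_mb_s = megabytes_moved / response_time_s

print(f"response time: {response_time_s * 1000:.0f} ms")   # 85 ms
print(f"bandwidth:     {bandwidth_mb_s:.1f} MB/s")          # ~94.1 MB/s
```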

Scaling of Transistor Performance and Wires


Integrated circuit processes are characterized by the feature size, which is the minimum size of a
transistor or a wire in either the x or y dimension. Feature sizes decreased from 10 microns in
1971 to 0.09 microns in 2006. Because the transistor count per square millimeter of silicon is
determined by the surface area of a transistor, the density of transistors increases quadratically
with a linear decrease in feature size.
The increase in transistor performance, however, is more complex. As feature sizes shrink, devices
shrink quadratically in the horizontal dimension and also shrink in the vertical dimension.
The shrink in the vertical dimension requires a reduction in operating voltage to maintain correct
operation and reliability of the transistors. This combination of scaling factors leads to a complex
interrelationship between transistor performance and process feature size. To a first approximation,
transistor performance improves linearly with decreasing feature size.
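A quick numerical check of the quadratic-density claim, using two example feature sizes (the values are illustrative):

```python
# Transistor density scales with the inverse square of the feature size,
# so a linear shrink from 0.13 micron to 0.09 micron roughly doubles the
# transistors per square millimeter.
old_feature_um = 0.13
new_feature_um = 0.09

density_gain = (old_feature_um / new_feature_um) ** 2
print(f"density improvement: {density_gain:.2f}x")   # ~2.09x
```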

Although transistors generally improve in performance with decreased feature size, wires in an
integrated circuit do not. In particular, the signal delay for a wire increases in proportion to the
product of its resistance and capacitance. Of course, as feature size shrinks, wires get shorter, but
the resistance and capacitance per unit length get worse.
In the past few years, wire delay has become a major design limitation for large integrated circuits
and is often more critical than transistor switching delay.

1.5 TRENDS IN POWER IN INTEGRATED CIRCUITS


Power also provides challenges as devices are scaled.
First, power must be brought in and distributed around the chip, and modern microprocessors use
hundreds of pins and multiple interconnect layers for just power and ground.
Second, power is dissipated as heat and must be removed.

For CMOS chips, the traditional dominant energy consumption has been in switching transistors,
also called dynamic power. The power required per transistor is proportional to the product of the
load capacitance of the transistor, the square of the voltage, and the frequency of switching, with
watts being the unit:

Power dynamic = 1 ⁄ 2 × Capacitive load × Voltage² × Frequency switched

Mobile devices care about battery life more than power, so energy is the proper metric, measured in
joules:
Energy dynamic = Capacitive load × Voltage²

Hence, dynamic power and energy are greatly reduced by lowering the voltage, and so voltages
have dropped from 5V to just over 1V in 20 years.
Although dynamic power is the primary source of power dissipation in CMOS, static power is
becoming an important issue because leakage current flows even when a transistor is off:

Power static = Current static × Voltage

Thus, increasing the number of transistors increases power even if they are turned off, and leakage
current increases in processors with smaller transistor sizes. As a result, very low power systems are
even gating the voltage to inactive modules to control loss due to leakage.
Example Some microprocessors today are designed to have adjustable voltage, so that a 15%
reduction in voltage may result in a 15% reduction in frequency. What would be the impact on
dynamic power?
Answer Since the capacitance is unchanged, the answer is the ratios of the voltages and
frequencies:

Power new / Power old = ( (Voltage × 0.85)² × (Frequency switched × 0.85) ) / ( Voltage² × Frequency switched ) = 0.85³ ≈ 0.61
thereby reducing power to about 60% of the original.
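The worked example can be checked with a short sketch of the dynamic power equation; the capacitive load and base frequency below are arbitrary placeholders, since only the new/old ratio matters:

```python
def dynamic_power(cap_load, voltage, frequency):
    """Dynamic power = 1/2 x capacitive load x voltage^2 x frequency switched."""
    return 0.5 * cap_load * voltage ** 2 * frequency

cap, volts, freq = 1.0, 1.2, 2.0e9          # arbitrary baseline values
p_old = dynamic_power(cap, volts, freq)
p_new = dynamic_power(cap, 0.85 * volts, 0.85 * freq)   # 15% lower V and f

print(f"power ratio: {p_new / p_old:.3f}")   # 0.85**3 ~ 0.614
```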

1.6 TRENDS IN COST
Although there are computer designs where costs tend to be less important (specifically
supercomputers), cost-sensitive designs are of growing significance.
Indeed, in the past 20 years, the use of technology improvements to lower cost, as well as increase
performance, has been a major theme in the computer industry.

The Impact of Time, Volume, and Commodification


The cost of a manufactured computer component decreases over time even without major
improvements in the basic implementation technology. The underlying principle that drives costs
down is the learning curve—manufacturing costs decrease over time. The learning curve itself is
best measured by change in yield—the percentage of manufactured devices that survives the testing
procedure. Whether it is a chip, a board, or a system, designs that have twice the yield will have half
the cost.
Understanding how the learning curve improves yield is critical to projecting costs over a product’s
life.

Microprocessor prices also drop over time, but because they are less standardized than DRAMs, the
relationship between price and cost is more complex. In a period of significant competition, price
tends to track cost closely, although microprocessor vendors probably rarely sell at a loss. Figure
shows processor price trends for Intel microprocessors.

Volume is a second key factor in determining cost. Increasing volumes affect cost in several ways.
First, they decrease the time needed to get down the learning curve, which is partly proportional to
the number of systems (or chips) manufactured. Second, volume decreases cost, since it increases
purchasing and manufacturing efficiency. As a rule of thumb, some designers have estimated that
cost decreases about 10% for each doubling of volume.
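Because the 10%-per-doubling rule of thumb compounds, a short sketch makes its effect easy to see; the starting cost and volumes are hypothetical:

```python
import math

def cost_after_volume_growth(base_cost, base_volume, new_volume, drop=0.10):
    """Rule of thumb: cost falls about 10% for each doubling of volume."""
    doublings = math.log2(new_volume / base_volume)
    return base_cost * (1.0 - drop) ** doublings

# Hypothetical part: $100 at 10,000 units; three doublings to 80,000 units.
print(f"${cost_after_volume_growth(100.0, 10_000, 80_000):.2f}")   # ~$72.90
```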

Commodities are products that are sold by multiple vendors in large volumes and are essentially
identical. Virtually all the products sold on the shelves of grocery stores are commodities, as are
standard DRAMs, disks, monitors, and keyboards.
Because many vendors ship virtually identical products, it is highly competitive. Of course, this
competition decreases the gap between cost and selling price, but it also decreases cost.

Cost of an Integrated Circuit


Although the costs of integrated circuits have dropped exponentially, the basic process of silicon
manufacture is unchanged: A wafer is still tested and chopped into dies that are packaged (see
Figures 1.11 and 1.12 in the book). Thus the cost of a packaged integrated circuit is

Cost of integrated circuit = (Cost of die + Cost of testing die + Cost of packaging and final test) / Final test yield

In this section, we focus on the cost of dies, summarizing the key issues in testing and packaging
at the end.

Learning how to predict the number of good chips per wafer requires first learning how many dies
fit on a wafer and then learning how to predict the percentage of those that will work. From there it
is simple to predict cost:

Cost of die = Cost of wafer / (Dies per wafer × Die yield)

The most interesting feature of this first term of the chip cost equation is its sensitivity to die size,
shown below.
The number of dies per wafer is approximately the area of the wafer divided by the area of the die.
It can be more accurately estimated by

Dies per wafer = π × (Wafer diameter / 2)² / Die area − π × Wafer diameter / √(2 × Die area)
Example Find the number of dies per 300 mm (30 cm) wafer for a die that is 1.5 cm on a side.
Answer: The die area is 2.25 cm². Thus

Dies per wafer = π × (30 / 2)² / 2.25 − π × 30 / √(2 × 2.25) ≈ 314.2 − 44.4 ≈ 270
However, this only gives the maximum number of dies per wafer. The critical question is: What is
the fraction of good dies on a wafer number, or the die yield?
A simple model of integrated circuit yield, which assumes that defects are randomly distributed
over the wafer and that yield is inversely proportional to the complexity of the fabrication process,
leads to the following:

Die yield = Wafer yield × ( 1 + Defects per unit area × Die area / α )^(−α)
The above formula is an empirical model developed by looking at the yield of many manufacturing
lines.
Example Find the die yield for dies that are 1.5 cm on a side and 1.0 cm on a side, assuming a
defect density of 0.4 per cm2 and α is 4.
Answer: The total die areas are 2.25 cm² and 1.00 cm². For the larger die, the yield is

Die yield = ( 1 + 0.4 × 2.25 / 4 )^(−4) ≈ 0.44

and for the smaller die it is

Die yield = ( 1 + 0.4 × 1.00 / 4 )^(−4) ≈ 0.68
That is, less than half of all the large die are good but more than two-thirds of the small die are
good.
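Both worked examples can be reproduced with a short sketch of the dies-per-wafer and die-yield approximations quoted above (wafer yield is assumed to be 100% here):

```python
import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Wafer area over die area, minus the dies lost around the circular edge."""
    usable = math.pi * (wafer_diameter_cm / 2) ** 2 / die_area_cm2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return usable - edge_loss

def die_yield(defects_per_cm2, die_area_cm2, alpha=4.0, wafer_yield=1.0):
    """Empirical model: wafer yield x (1 + defects x die area / alpha)^(-alpha)."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)

print(round(dies_per_wafer(30, 2.25)))     # ~270 dies on a 300 mm wafer
print(round(die_yield(0.4, 2.25), 2))      # ~0.44 for the 1.5 cm die
print(round(die_yield(0.4, 1.00), 2))      # ~0.68 for the 1.0 cm die
```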

What should a computer designer remember about chip costs? The manufacturing process
dictates the wafer cost, wafer yield, and defects per unit area, so the sole control of the designer is
die area. In practice, because the number of defects per unit area is small, the cost per die grows
roughly as the square of the die area. The computer
designer affects die size, and hence cost, both by what functions are included on or excluded from
the die and by the number of I/O pins.

Cost versus Price


With the commoditization of computers, the margin between the cost to manufacture a
product and the price the product sells for has been shrinking. Those margins pay for a company’s
research and development (R&D), marketing, sales, manufacturing equipment maintenance,
building rental, cost of financing, pretax profits, and taxes.

1.7 DEPENDABILITY
Historically, integrated circuits were one of the most reliable components of a computer. Although
their pins may be vulnerable, and faults may occur over communication channels, the error rate
inside the chip was very low.
Computers are designed and constructed at different layers of abstraction. We can descend
recursively down through a computer seeing components enlarge themselves to full subsystems
until we run into individual transistors. Although some faults are widespread, like the loss of power,
many can be limited to a single component in a module. Thus, utter failure of a module at one level
may be considered merely a component error in a higher-level module. This distinction is helpful in
trying to find ways to build dependable computers.

Module reliability is a measure of the continuous service accomplishment (or, equivalently, of the
time to failure) from a reference initial instant. Hence, the mean time to failure (MTTF) is a
reliability measure. The reciprocal of MTTF is a rate of failures, generally reported as failures per
billion hours of operation, or FIT (for failures in time). Service interruption is measured as mean
time to repair (MTTR). Mean time between failures (MTBF) is simply the sum of MTTF and MTTR.
Although MTBF is widely used, MTTF is often the more appropriate term.

Module availability is a measure of the service accomplishment with respect to the alternation
between the two states of accomplishment and interruption.
For nonredundant systems with repair, module availability is

Module availability = MTTF / (MTTF + MTTR)
Example See Book
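Since the worked example is left to the book, here is a minimal sketch of the measures defined above, using hypothetical MTTF and MTTR values:

```python
def fit_rate(mttf_hours):
    """Failure rate in FIT: failures per billion (10^9) hours of operation."""
    return 1e9 / mttf_hours

def availability(mttf_hours, mttr_hours):
    """Module availability for a nonredundant system with repair."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Hypothetical module: MTTF of 1,000,000 hours and MTTR of 24 hours.
print(f"{fit_rate(1_000_000):.0f} FIT")                      # 1000 FIT
print(f"availability = {availability(1_000_000, 24):.6f}")   # ~0.999976
```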

1.8 MEASURING, REPORTING, AND SUMMARIZING PERFORMANCE

When we say one computer is faster than another, what do we mean? The computer user is
interested in reducing response time—the time between the start and the completion of an event—
also referred to as execution time. The administrator of a large data processing center may be
interested in increasing throughput—the total amount of work done in a given time.

In comparing design alternatives, we often want to relate the performance of two different
computers, say, X and Y. The phrase “X is faster than Y” is used here to mean that the response time
or execution time is lower on X than on Y for the given task. In particular, “X is n times faster than
Y” will mean

n = Execution time Y / Execution time X

Since execution time is the reciprocal of performance, the following relationship holds:

n = Execution time Y / Execution time X = Performance X / Performance Y
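A tiny sketch of this relationship, with made-up execution times:

```python
# Hypothetical task: computer X finishes in 10 s, computer Y in 15 s.
time_x, time_y = 10.0, 15.0

n = time_y / time_x                        # "X is n times faster than Y"
perf_ratio = (1 / time_x) / (1 / time_y)   # Performance X / Performance Y

print(n, perf_ratio)   # both are 1.5
```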
Unfortunately, time is not always the metric quoted in comparing the performance of computers.

Even execution time can be defined in different ways depending on what we count. The most
straightforward definition of time is called wall-clock time, response time, or elapsed time, which is
the latency to complete a task, including disk accesses, memory accesses, input/output activities,
operating system overhead—everything. With multiprogramming, the processor works on another
program while waiting for I/O and may not necessarily minimize the elapsed time of one program.

Hence, we need a term that takes this activity into account. CPU time recognizes this distinction and
means the time the processor is computing, not including the time waiting for I/O or running other programs.
(Clearly, the response time seen by the user is the elapsed time of the program, not the CPU time.)
Computer users who routinely run the same programs would be the perfect candidates to evaluate a
new computer. To evaluate a new system the users would simply compare the execution time of
their workloads—the mixture of programs and operating system commands that users run on a
computer.

Benchmarks
The best choice of benchmarks to measure performance are real applications, such as a compiler.
Attempts at running programs that are much simpler than a real application have led to performance
pitfalls. Examples include
1. kernels, which are small, key pieces of real applications;
2. toy programs, which are 100-line programs from beginning programming assignments,
such as quicksort; and
3. synthetic benchmarks, which are fake programs invented to try to match the profile and
behavior of real applications, such as Dhrystone.

One of the most successful attempts to create standardized benchmark application suites has been
SPEC (the Standard Performance Evaluation Corporation).

Desktop Benchmarks
Desktop benchmarks divide into two broad classes: processor-intensive benchmarks and graphics-
intensive benchmarks, although many graphics benchmarks include intensive processor activity.

Server Benchmarks
Just as servers have multiple functions, so there are multiple types of benchmarks. The simplest
benchmark is perhaps a processor throughput-oriented benchmark.


1.9 QUANTITATIVE PRINCIPLES OF COMPUTER DESIGN


The guidelines and principles that are useful in the design and analysis of computers are stated as
follows.
Take Advantage of Parallelism
Taking advantage of parallelism is one of the most important methods for improving performance.
Being able to expand memory and the number of processors and disks is called scalability, and it is
a valuable asset for servers.
At the level of an individual processor, taking advantage of parallelism among instructions is
critical to achieving high performance. One of the simplest ways to do this is through pipelining.
Parallelism can also be exploited at the level of detailed digital design. For example, set-associative
caches use multiple banks of memory that are typically searched in parallel to find a desired item.
Modern ALUs use carry-lookahead, which uses parallelism to speed the process of computing sums
from linear to logarithmic in the number of bits per operand.

Principle of Locality
Important fundamental observations have come from properties of programs. The most important
program property that we regularly exploit is the principle of locality: Programs tend to reuse data
and instructions they have used recently. A widely held rule of thumb is that a program spends 90%
of its execution time in only 10% of the code.
Focus on the Common Case
Focusing on the common case works for power as well as for resource allocation and performance.
The instruction fetch and decode unit of a processor may be used much more frequently than a
multiplier, so optimize it first. The principle applies to dependability as well.
A fundamental law, called Amdahl’s Law, can be used to quantify this principle.
The performance gain that can be obtained by improving some portion of a computer can be
calculated using Amdahl’s Law. Amdahl’s Law states that the performance improvement to be
gained from using some faster mode of execution is limited by the fraction of the time the faster
mode can be used.
Amdahl’s Law defines the speedup that can be gained by using a particular feature. What is
speedup? Suppose that we can make an enhancement to a computer that will improve performance
when it is used. Speedup is the ratio

Speedup = Performance for entire task using the enhancement when possible / Performance for entire task without using the enhancement

Alternatively,

Speedup = Execution time for entire task without using the enhancement / Execution time for entire task using the enhancement when possible
Amdahl’s Law gives us a quick way to find the speedup from some enhancement, which depends
on two factors:
1. The fraction of the computation time in the original computer that can be converted to take
advantage of the enhancement—For example, if 20 seconds of the execution time of a
program that takes 60 seconds in total can use an enhancement, the fraction is 20/60. This
value, which we will call Fraction enhanced, is always less than or equal to 1.
2. The improvement gained by the enhanced execution mode; that is, how much faster the task
would run if the enhanced mode were used for the entire program.
This value is the time of the original mode over the time of the enhanced mode. If the
enhanced mode takes, say, 2 seconds for a portion of the program, while it is 5 seconds in
the original mode, the improvement is 5/2. We will call this value, which is always greater
than 1, Speedup enhanced.

The execution time using the original computer with the enhanced mode will be the time spent
using the unenhanced portion of the computer plus the time spent using the enhancement:

Execution time new = Execution time old × ( (1 − Fraction enhanced) + Fraction enhanced / Speedup enhanced )

The overall speedup is the ratio of the execution times:

Speedup overall = Execution time old / Execution time new = 1 / ( (1 − Fraction enhanced) + Fraction enhanced / Speedup enhanced )
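A short sketch of Amdahl's Law using the numbers from the two factors above (an enhanceable fraction of 20/60 and an enhanced-mode speedup of 5/2):

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - Fraction) + Fraction / Speedup)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# 20 of 60 seconds can use the enhancement, and that portion runs 2.5x faster.
print(f"{amdahl_speedup(20 / 60, 5 / 2):.2f}x overall")   # 1.25x overall
```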
Amdahl’s Law can serve as a guide to how much an enhancement will improve performance and
how to distribute resources to improve cost performance. The goal, clearly, is to spend resources
proportional to where time is spent. Amdahl’s Law is particularly useful for comparing the overall
system performance of two alternatives, but it can also be applied to compare two processor design
alternatives.

Example: See Book

The Processor Performance Equation
Essentially all computers are constructed using a clock running at a constant rate.

These discrete time events are called ticks, clock ticks, clock periods, clocks, cycles, or clock
cycles. Computer designers refer to the time of a clock period by its duration (e.g., 1 ns) or by its
rate (e.g., 1 GHz). CPU time for a program can then be expressed two ways:

CPU time = CPU clock cycles for a program × Clock cycle time
or

CPU time = CPU clock cycles for a program / Clock rate
In addition to the number of clock cycles needed to execute a program, we can also count the
number of instructions executed—the instruction path length or instruction count (IC). If we
know the number of clock cycles and the instruction count, we can calculate the average number of
clock cycles per instruction (CPI).
Designers sometimes also use instructions per clock (IPC), which is the inverse of CPI.
CPI is computed as

CPI = CPU clock cycles for a program / Instruction count
This processor figure of merit provides insight into different styles of instruction sets and
implementations.

By transposing instruction count in the above formula, clock cycles can be defined as IC × CPI.

This allows us to use CPI in the execution time formula:

CPU time = Instruction count × Cycles per instruction × Clock cycle time

Expanding the first formula into the units of measurement shows how the pieces fit together:

CPU time = (Instructions / Program) × (Clock cycles / Instruction) × (Seconds / Clock cycle) = Seconds / Program
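A minimal sketch of the processor performance equation; the instruction count, CPI, and clock rate are made-up values for illustration:

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = IC x CPI x clock cycle time, where cycle time = 1 / clock rate."""
    return instruction_count * cpi * (1.0 / clock_rate_hz)

# Hypothetical program: 2 billion instructions, average CPI of 1.5, 2 GHz clock.
ic, cpi, clock = 2e9, 1.5, 2e9
print(f"CPU time = {cpu_time(ic, cpi, clock):.2f} s")   # 1.50 s
print(f"IPC = {1 / cpi:.2f} instructions per clock")    # 0.67
```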

As this formula demonstrates, processor performance is dependent upon three characteristics: clock
cycle (or rate), clock cycles per instruction, and instruction count. Furthermore, CPU time is equally
dependent on these three characteristics: A 10% improvement in any one of them leads to a 10%
improvement in CPU time.

Unfortunately, it is difficult to change one parameter in complete isolation from others because the
basic technologies involved in changing each characteristic are interdependent:

1. Clock cycle time—Hardware technology and organization


2. CPI—Organization and instruction set architecture
3. Instruction count—Instruction set architecture and compiler technology
