Computer Architecture Free Notes
1.1 INTRODUCTION
Computer technology has made incredible progress in the roughly 60 years since the first general-
purpose electronic computer was created. Advances such as improved chip manufacturing and better
compiler and algorithm technology made it possible to develop successfully a new set of architectures
with simpler instructions, called RISC (Reduced Instruction Set Computer) architectures, in the early
1980s.
Figure 1.1 shows that the combination of architectural and organizational enhancements drove rapid
performance growth in the 1980s and 1990s. Later, in the 2000s, further growth came from exploiting
three kinds of parallelism:
1. instruction-level parallelism (ILP)
2. thread-level parallelism (TLP)
3. data-level parallelism (DLP)
The 1980s saw the rise of the desktop computer based on microprocessors, in the form of both
personal computers and workstations. The individually owned desktop computer replaced time-
sharing and led to the rise of servers—computers that provided larger-scale services such as
reliable, long-term file storage and access, larger memory, and more computing power. The 1990s
saw the emergence of the Internet and the World Wide Web, and the first successful handheld
computing devices (personal digital assistants, or PDAs).
Desktop Computing
Desktop computing also tends to be reasonably well characterized in terms of applications and
benchmarking, though the increasing use of Web-centric, interactive applications poses new
challenges in performance evaluation.
The first, and still the largest market in dollar terms, is desktop computing. Desktop computing
spans from low-end systems that sell for under $500 to high-end, heavily configured workstations
that may sell for $5000. Throughout this range in price and capability, the desktop market tends to
be driven to optimize price-performance. The processors used are generally more than sufficient for
the jobs of typical users.
Servers
As the shift to desktop computing occurred, the role of servers grew to provide larger-scale and
more reliable file and computing services. The World Wide Web accelerated this trend because of
the tremendous growth in the demand and sophistication of Web-based services. Such servers have
become the backbone of large-scale enterprise computing, replacing the traditional mainframe,
since they need to run around the clock. Such servers are costly, must be scalable (balancing the
load across them is a key design parameter), and must provide good throughput.
A related category is supercomputers. They are the most expensive computers, costing tens of
millions of dollars, and they emphasize floating-point performance.
Embedded Computers
Embedded computers are the fastest growing portion of the computer market. These devices range
from everyday machines—most microwaves, washing machines, printers, and networking switches,
and all cars contain simple embedded microprocessors—to handheld digital devices, such as cell
phones and smart cards, to video games and digital set-top boxes.
Embedded computers have the widest spread of processing power and cost. They include 8-bit and
16-bit processors that may cost less than a dime, as well as 32-bit microprocessors that execute
100 million instructions per second yet still cost only a few dollars.
7. Encoding an ISA—There are two basic choices on encoding: fixed length and variable
length.
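As a toy illustration of that trade-off, the sketch below encodes the same register-add operation in a fixed 32-bit format and in a compact variable-length byte format. The opcode values and field widths are entirely hypothetical, not a real ISA; the point is only that fixed-length encodings simplify decoding while variable-length encodings can shrink code size.

```python
# Toy illustration of fixed-length vs. variable-length instruction encoding.
# The opcode numbers and field widths are hypothetical, not a real ISA.
import struct

def encode_fixed(opcode, rd, rs1, rs2):
    """Pack an R-type-style instruction into a fixed 32-bit word."""
    word = ((opcode & 0x7F)
            | ((rd & 0x1F) << 7)
            | ((rs1 & 0x1F) << 12)
            | ((rs2 & 0x1F) << 17))
    return struct.pack("<I", word)          # always 4 bytes

def encode_variable(opcode, operands):
    """One opcode byte followed by one byte per operand (length varies)."""
    return bytes([opcode]) + bytes(operands)

if __name__ == "__main__":
    print(len(encode_fixed(0x33, 1, 2, 3)))        # 4 bytes, trivial to decode
    print(len(encode_variable(0x33, [1, 2, 3])))   # 4 bytes here, but can shrink
    print(len(encode_variable(0x01, [])))          # a no-operand op: 1 byte
```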
The Rest of Computer Architecture: Designing the Organization and Hardware to Meet Goals
and Functional Requirements
This part of the design task involves the following issues.
Organization and hardware. The term organization includes the high-level aspects of a
computer’s design, such as the memory system, the memory interconnect, and the design of the
internal processor or CPU (central processing unit—where arithmetic, logic, branching, and data
transfer are implemented).
Hardware refers to the specifics of a computer, including the detailed logic design and the
packaging technology of the computer. Often a line of computers contains computers with identical
instruction set architectures and nearly identical organizations, but they differ in the detailed
hardware implementation.
The word architecture covers all three aspects of computer design—instruction set architecture,
organization, and hardware.
Computer architects must design a computer to meet functional requirements as well as price,
power, performance, and availability goals. These requirements are summarized in the list below.
Functional requirements: Typical features required or supported

Application area: Target of computer
  General-purpose desktop: Balanced performance for a range of tasks, including interactive performance for graphics, video, and audio (Ch. 2, 3, 5, App. B)
  Scientific desktops and servers: High-performance floating point and graphics (App. I)
  Commercial servers: Support for databases and transaction processing; enhancements for reliability and availability; support for scalability (Ch. 4, App. B, E)
  Embedded computing: Often requires special support for graphics or video (or other application-specific extension); power limitations and power control may be required (Ch. 2, 3, 5, App. B)

Level of software compatibility: Determines amount of existing software for computer
  At programming language: Most flexible for designer; need new compiler (Ch. 4, App. B)
  Object code or binary compatible: Instruction set architecture is completely defined—little flexibility—but no investment needed in software or porting programs.

Operating system requirements: Necessary features to support chosen OS (Ch. 5, App. E)
  Size of address space: Very important feature (Ch. 5); may limit applications
  Memory management: Required for modern OS; may be paged or segmented (Ch. 5)
  Protection: Different OS and application needs: page vs. segment; virtual machines (Ch. 5)

Standards: Certain standards may be required by marketplace
  Floating point: Format and arithmetic: IEEE 754 standard (App. I), special arithmetic for graphics or signal processing
  I/O interfaces: For I/O devices: Serial ATA, Serial Attach SCSI, PCI Express (Ch. 6, App. E)
  Operating systems: UNIX, Windows, Linux, CISCO IOS
  Networks: Support required for different networks: Ethernet, Infiniband (App. E)
  Programming languages: Languages (ANSI C, C++, Java, FORTRAN) affect instruction set (App. B)
The list above summarizes some of the most important functional requirements an architect faces.
Each entry names the class of requirement and then gives specific examples, together with
references to the chapters and appendices that deal with the specific issues.
4 TRENDS IN TECHNOLOGY
If an instruction set architecture is to be successful, it must be designed to survive rapid changes in
computer technology. After all, a successful new instruction set architecture may last decades—for
example, the core of the IBM mainframe has been in use for more than 40 years. An architect must
plan for technology changes that can increase the lifetime of a successful computer.
To plan for the evolution of a computer, the designer must be aware of rapid changes in
implementation technology.
[Figure: Log-log plot of bandwidth and latency milestones from Figure 1.9, plotted relative to the
first milestone. Latency improved about 10X while bandwidth improved about 100X to 1000X.]
Although transistors generally improve in performance with decreased feature size, wires in an
integrated circuit do not. In particular, the signal delay for a wire increases in proportion to the
product of its resistance and capacitance. Of course, as feature size shrinks, wires get shorter, but
the resistance and capacitance per unit length get worse.
In the past few years, wire delay has become a major design limitation for large integrated circuits
and is often more critical than transistor switching delay.
For CMOS chips, the traditional dominant energy consumption has been in switching transistors,
also called dynamic power. The power required per transistor is proportional to the product of the
load capacitance of the transistor, the square of the voltage, and the frequency of switching, with
watts being the unit:
Power_dynamic = 1/2 × Capacitive load × Voltage² × Frequency switched
Mobile devices care about battery life more than power, so energy is the proper metric, measured in
joules:
Energy_dynamic = Capacitive load × Voltage²
Hence, dynamic power and energy are greatly reduced by lowering the voltage, and so voltages
have dropped from 5V to just over 1V in 20 years.
Although dynamic power is the primary source of power dissipation in CMOS, static power is
becoming an important issue because leakage current flows even when a transistor is off:
Power_static = Current_static × Voltage
Thus, increasing the number of transistors increases power even if they are turned off, and leakage
current increases in processors with smaller transistor sizes. As a result, very low power systems are
even gating the voltage to inactive modules to control loss due to leakage.
Example Some microprocessors today are designed to have adjustable voltage, so that a 15%
reduction in voltage may result in a 15% reduction in frequency. What would be the impact on
dynamic power?
Answer Since the capacitance is unchanged, the answer for dynamic power is the ratio of the
voltages and frequencies:
Power_new / Power_old = ((Voltage × 0.85)² × (Frequency switched × 0.85)) / (Voltage² × Frequency switched) = 0.85³ ≈ 0.61
Thus the new design uses about 61% of the dynamic power of the original design.
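To double-check the arithmetic above, here is a minimal Python sketch. The function names and the 15% scaling factor are just illustrative, and the model is the simplified dynamic-power relation given earlier, not a full circuit model.

```python
# Minimal sketch: effect of scaling voltage and frequency on dynamic power,
# using Power_dynamic = 0.5 * C_load * V**2 * f from the text.

def dynamic_power(c_load, voltage, frequency):
    """Dynamic power of a switching transistor (watts)."""
    return 0.5 * c_load * voltage**2 * frequency

def power_ratio(voltage_scale, frequency_scale):
    """Ratio of new dynamic power to old when V and f are both scaled."""
    # Capacitance cancels out, so the ratio is scale_V^2 * scale_f.
    return voltage_scale**2 * frequency_scale

if __name__ == "__main__":
    # 15% reduction in both voltage and frequency (the example above).
    print(power_ratio(0.85, 0.85))   # ~0.614, i.e. about 61% of original power
```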
6 TRENDS IN COST
Although there are computer designs where costs tend to be less important (most notably
supercomputers), cost-sensitive designs are of growing significance.
Indeed, in the past 20 years, the use of technology improvements to lower cost, as well as increase
performance, has been a major theme in the computer industry.
The cost of a manufactured computer component decreases over time even without major
improvements in the basic implementation technology. The underlying principle that drives costs
down is the learning curve—manufacturing costs decrease over time. The learning curve itself is
best measured by change in yield—the percentage of manufactured devices that survives the testing
procedure. Whether it is a chip, a board, or a system, designs that have twice the yield will have half
the cost.
Understanding how the learning curve improves yield is critical to projecting costs over a product’s
life.
Microprocessor prices also drop over time, but because they are less standardized than DRAMs, the
relationship between price and cost is more complex. In a period of significant competition, price
tends to track cost closely, although microprocessor vendors probably rarely sell at a loss. Figure
shows processor price trends for Intel microprocessors.
Volume is a second key factor in determining cost. Increasing volumes affect cost in several ways.
First, they decrease the time needed to get down the learning curve, which is partly proportional to
the number of systems (or chips) manufactured. Second, volume decreases cost, since it increases
purchasing and manufacturing efficiency. As a rule of thumb, some designers have estimated that
cost decreases about 10% for each doubling of volume.
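As a hedged sketch of that rule of thumb, the snippet below turns the "roughly 10% per doubling" estimate into a formula; the base cost, base volume, and target volume in the usage example are hypothetical numbers, not data from the text.

```python
# Illustrative sketch of the "cost drops ~10% per doubling of volume" rule of
# thumb quoted above; all numbers here are hypothetical.
import math

def cost_at_volume(base_cost, base_volume, volume, drop_per_doubling=0.10):
    """Estimated unit cost after volume grows from base_volume to volume."""
    doublings = math.log2(volume / base_volume)
    return base_cost * (1.0 - drop_per_doubling) ** doublings

if __name__ == "__main__":
    # A part costing $100 at 10,000 units would cost roughly $73 at 80,000 units
    # (three doublings: 100 * 0.9**3).
    print(round(cost_at_volume(100.0, 10_000, 80_000), 2))
```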
Commodities are products that are sold by multiple vendors in large volumes and are essentially
identical. Virtually all the products sold on the shelves of grocery stores are commodities, as are
standard DRAMs, disks, monitors, and keyboards.
Because many vendors ship virtually identical products, the market is highly competitive. Of course,
this competition decreases the gap between cost and selling price, but it also decreases cost.
In this section, we focus on the cost of dies, summarizing the key issues in testing
and packaging at the end.
Learning how to predict the number of good chips per wafer requires first learning how many dies
fit on a wafer and then learning how to predict the percentage of those that will work. From there it
is simple to predict cost:
Cost of die = Cost of wafer / (Dies per wafer × Die yield)
The most interesting feature of this first term of the chip cost equation is its sensitivity to die size,
shown below.
The number of dies per wafer is approximately the area of the wafer divided by the area of the die.
It can be more accurately estimated by
Dies per wafer = (π × (Wafer diameter / 2)²) / Die area - (π × Wafer diameter) / sqrt(2 × Die area)
The second term compensates for the dies that do not fit near the periphery of a round wafer.
Example Find the number of dies per 300 mm (30 cm) wafer for a die that is 1.5 cm on a side.
Answer The die area is 2.25 cm². Thus
Dies per wafer = (π × (30 / 2)²) / 2.25 - (π × 30) / sqrt(2 × 2.25) = 706.9 / 2.25 - 94.2 / 2.12 ≈ 270
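The same estimate is easy to script. This is a minimal sketch of the dies-per-wafer formula above; the function name and the truncation to whole dies are my own choices, not from the text.

```python
# Sketch of the dies-per-wafer estimate used in the example above.
import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Approximate number of whole dies that fit on a round wafer."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = (math.pi * wafer_diameter_cm) / math.sqrt(2 * die_area_cm2)
    return int(wafer_area / die_area_cm2 - edge_loss)

if __name__ == "__main__":
    # ~270 dies (269 after truncation) for a 1.5 cm x 1.5 cm die on a 30 cm wafer.
    print(dies_per_wafer(30, 2.25))
```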
However, this only gives the maximum number of dies per wafer. The critical question is: What
fraction of the dies on a wafer are good, i.e., what is the die yield?
A simple model of integrated circuit yield, which assumes that defects are randomly distributed
over the wafer and that yield is inversely proportional to the complexity of the fabrication process,
leads to the following:
Die yield = Wafer yield × (1 + Defects per unit area × Die area / α)^(-α)
Here wafer yield accounts for wafers that are completely bad, and α is a parameter that corresponds
roughly to the number of masking levels, a measure of manufacturing complexity.
The above formula is an empirical model developed by looking at the yield of many manufacturing
lines.
Example Find the die yield for dies that are 1.5 cm on a side and 1.0 cm on a side, assuming a
defect density of 0.4 per cm2 and α is 4.
Answer The total die areas are 2.25 cm² and 1.00 cm². For the larger die, the yield is
Die yield = (1 + 0.4 × 2.25 / 4)^(-4) = 0.44
For the smaller die, it is
Die yield = (1 + 0.4 × 1.00 / 4)^(-4) = 0.68
That is, less than half of all the large die are good, but more than two-thirds of the small die are
good.
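Putting the pieces together, here is a minimal sketch that combines the yield model with the earlier cost-of-die formula. The $5000 wafer cost in the usage example is hypothetical; only the formulas come from the text.

```python
# Sketch combining the yield model and the cost-of-die formula from the text.
import math

def die_yield(defect_density, die_area, alpha=4.0, wafer_yield=1.0):
    """Fraction of good dies, per the empirical model above."""
    return wafer_yield * (1 + defect_density * die_area / alpha) ** (-alpha)

def cost_per_die(wafer_cost, wafer_diameter_cm, die_area_cm2,
                 defect_density, alpha=4.0):
    """Cost of die = cost of wafer / (dies per wafer * die yield)."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = (math.pi * wafer_diameter_cm) / math.sqrt(2 * die_area_cm2)
    dies = wafer_area / die_area_cm2 - edge_loss
    return wafer_cost / (dies * die_yield(defect_density, die_area_cm2, alpha))

if __name__ == "__main__":
    print(round(die_yield(0.4, 2.25), 2))   # 0.44 for the 1.5 cm die
    print(round(die_yield(0.4, 1.00), 2))   # 0.68 for the 1.0 cm die
    # Hypothetical $5000 wafer, 30 cm diameter, 2.25 cm^2 die:
    print(round(cost_per_die(5000, 30, 2.25, 0.4), 2))
```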
What should a computer designer remember about chip costs? The manufacturing process
dictates the wafer cost, wafer yield, and defects per unit area, so the sole control of the designer is
die area. In practice, because the number of defects per unit area is small, the cost per die grows
roughly as the square of the die area. The computer
designer affects die size, and hence cost, both by what functions are included on or excluded from
the die and by the number of I/O pins.
7 DEPENDABILITY
Historically, integrated circuits were one of the most reliable components of a computer. Although
their pins may be vulnerable, and faults may occur over communication channels, the error rate
inside the chip was very low.
Computers are designed and constructed at different layers of abstraction. We can descend
recursively down through a computer seeing components enlarge themselves to full subsystems
until we run into individual transistors. Although some faults are widespread, like the loss of power,
many can be limited to a single component in a module. Thus, utter failure of a module at one level
may be considered merely a component error in a higher-level module. This distinction is helpful in
trying to find ways to build dependable computers.
Module reliability is a measure of the continuous service accomplishment (or, equivalently, of the
time to failure) from a reference initial instant. Hence, the mean time to failure (MTTF) is a
reliability measure. The reciprocal of MTTF is a rate of failures, generally reported as failures per
billion hours of operation, or FIT (for failures in time). Service interruption is measured as mean
time to repair (MTTR). Mean time between failures (MTBF) is simply the sum of MTTF and MTTR.
Although MTBF is widely used, MTTF is often the more appropriate term.
Module availability is a measure of the service accomplishment with respect to the alternation
between the two states of accomplishment and interruption.
For nonredundant systems with repair, module availability is
Module availability = MTTF / (MTTF + MTTR)
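For instance, a quick sketch of the availability formula; the MTTF and MTTR figures in the usage example are hypothetical, not measured values.

```python
# Sketch of the module availability formula above.
def availability(mttf_hours, mttr_hours):
    """Fraction of time a nonredundant module with repair is in service."""
    return mttf_hours / (mttf_hours + mttr_hours)

if __name__ == "__main__":
    # Hypothetical disk: MTTF of 1,000,000 hours, MTTR of 24 hours.
    print(availability(1_000_000, 24))   # ~0.999976
```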
When we say one computer is faster than another is, what do we mean? The computer user is
interested in reducing response time—the time between the start and the completion of an event—
also referred to as execution time. The administrator of a large data processing center may be
interested in increasing throughput—the total amount of work done in a given time.
In comparing design alternatives, we often want to relate the performance of two different
computers, say, X and Y. The phrase “X is faster than Y” is used here to mean that the response time
or execution time is lower on X than on Y for the given task. In particular, “X is n times faster than
Y” will mean
n = Execution time_Y / Execution time_X
Since execution time is the reciprocal of performance, the following relationship holds:
n = Execution time_Y / Execution time_X = (1 / Performance_Y) / (1 / Performance_X) = Performance_X / Performance_Y
Unfortunately, time is not always the metric quoted in comparing the performance of computers.
Even execution time can be defined in different ways depending on what we count. The most
straightforward definition of time is called wall-clock time, response time, or elapsed time, which is
the latency to complete a task, including disk accesses, memory accesses, input/output activities,
operating system overhead—everything. With multiprogramming, the processor works on another
program while waiting for I/O and may not necessarily minimize the elapsed time of one program.
Hence, we need a term to consider this activity. CPU time recognizes this distinction and means the
time the processor is computing, not including the time waiting for I/O or running other programs.
(Clearly, the response time seen by the user is the elapsed time of the program, not the CPU time.)
Computer users who routinely run the same programs would be the perfect candidates to evaluate a
new computer. To evaluate a new system the users would simply compare the execution time of
their workloads—the mixture of programs and operating system commands that users run on a
computer.
Benchmarks
The best choice of benchmarks to measure performance is real applications, such as a compiler.
Attempts at running programs that are much simpler than a real application have led to performance
pitfalls. Examples include
1. kernels, which are small, key pieces of real applications;
2. toy programs, which are 100-line programs from beginning programming assignments,
such as quicksort; and
3. synthetic benchmarks, which are fake programs invented to try to match the profile and
behavior of real applications, such as Dhrystone.
One of the most successful attempts to create standardized benchmark application suites has been
SPEC (the Standard Performance Evaluation Corporation).
Desktop Benchmarks
Desktop benchmarks divide into two broad classes: processor-intensive benchmarks and graphics-
intensive benchmarks, although many graphics benchmarks include intensive processor activity.
Server Benchmarks
Just as servers have multiple functions, so there are multiple types of benchmarks. The simplest
benchmark is perhaps a processor throughput-oriented benchmark.
Principle of Locality
Important fundamental observations have come from properties of programs. The most important
program property that we regularly exploit is the principle of locality: Programs tend to reuse data
and instructions they have used recently. A widely held rule of thumb is that a program spends 90%
of its execution time in only 10% of the code.
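To make this concrete, here is a small, hedged sketch: a toy direct-mapped cache simulation whose block size, capacity, and access pattern are all hypothetical. Because the usage example repeatedly sweeps a small array, almost every access after the first sweep hits in the cache, which is exactly the reuse that the principle of locality predicts.

```python
# Illustrative sketch: a tiny direct-mapped cache simulation showing how the
# locality of a simple loop-based access pattern yields a high hit rate.
# Block size, cache size, and the access pattern are all hypothetical.

BLOCK_SIZE = 16      # addresses per cache block
NUM_BLOCKS = 64      # block frames in the cache

def hit_rate(addresses):
    """Fraction of accesses that hit in a direct-mapped cache of block tags."""
    tags = [None] * NUM_BLOCKS
    hits = 0
    for addr in addresses:
        block = addr // BLOCK_SIZE
        index = block % NUM_BLOCKS
        if tags[index] == block:
            hits += 1            # spatial/temporal reuse pays off here
        else:
            tags[index] = block  # miss: fill the block frame
    return hits / len(addresses)

if __name__ == "__main__":
    # Sequential sweep over a small array, repeated 10 times (typical loop reuse).
    pattern = [a for _ in range(10) for a in range(0, 512)]
    print(round(hit_rate(pattern), 3))   # ~0.994, thanks to locality
```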
Focus on the Common Case
Focusing on the common case works for power as well as for resource allocation and performance.
The instruction fetch and decode unit of a processor may be used much more frequently than a
multiplier, so optimize it first. It works on dependability as well.
A fundamental law, called Amdahl’s Law, can be used to quantify this principle.
The performance gain that can be obtained by improving some portion of a computer can be
calculated using Amdahl’s Law. Amdahl’s Law states that the performance improvement to be
gained from using some faster mode of execution is limited by the fraction of the time the faster
mode can be used.
Amdahl’s Law defines the speedup that can be gained by using a particular feature. What is
speedup? Suppose that we can make an enhancement to a computer that will improve performance
when it is used. Speedup is the ratio
Speedup = Performance for entire task using the enhancement when possible / Performance for entire task without using the enhancement
Alternatively,
Speedup = Execution time for entire task without using the enhancement / Execution time for entire task using the enhancement when possible
Amdahl’s Law gives us a quick way to find the speedup from some enhancement, which depends
on two factors:
1. The fraction of the computation time in the original computer that can be converted to take
advantage of the enhancement—For example, if 20 seconds of the execution time of a
program that takes 60 seconds in total can use an enhancement, the fraction is 20/60. This
value, which we will call Fraction_enhanced, is always less than or equal to 1.
2. The improvement gained by the enhanced execution mode; that is, how much faster the task
would run if the enhanced mode were used for the entire program.
This value is the time of the original mode over the time of the enhanced mode. If the
enhanced mode takes, say, 2 seconds for a portion of the program, while it is 5 seconds in
the original mode, the improvement is 5/2. We will call this value, which is always greater
than 1, Speedup_enhanced.
The execution time using the original computer with the enhanced mode will be the time spent
using the unenhanced portion of the computer plus the time spent using the enhancement:
Execution time_new = Execution time_old × ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
The overall speedup is the ratio of the execution times:
Speedup_overall = Execution time_old / Execution time_new = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
Amdahl’s Law can serve as a guide to how much an enhancement will improve performance and
how to distribute resources to improve cost performance. The goal, clearly, is to spend resources
proportional to where time is spent. Amdahl’s Law is particularly useful for comparing the overall
system performance of two alternatives, but it can also be applied to compare two processor design
alternatives.
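As a hedged illustration of the law: the 20-second portion of a 60-second program reuses the fraction example given earlier, while the 5x speedup of the enhanced portion is a hypothetical number chosen only to exercise the formula.

```python
# Sketch of Amdahl's Law as stated above.
def overall_speedup(fraction_enhanced, speedup_enhanced):
    """Speedup of the whole task when only part of it is accelerated."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)

if __name__ == "__main__":
    # 20 s of a 60 s program can use the enhancement (fraction = 1/3);
    # assume (hypothetically) the enhanced portion runs 5x faster.
    print(round(overall_speedup(20 / 60, 5.0), 3))   # ~1.364
```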
The Processor Performance Equation
Essentially all computers are constructed using a clock running at a constant rate.
These discrete time events are called ticks, clock ticks, clock periods, clocks, cycles, or clock
cycles. Computer designers refer to the time of a clock period by its duration (e.g., 1 ns) or by its
rate (e.g., 1 GHz). CPU time for a program can then be expressed two ways:
CPU time = CPU clock cycles for a program × Clock cycle time
or
CPU time = CPU clock cycles for a program / Clock rate
In addition to the number of clock cycles needed to execute a program, we can also count the
number of instructions executed—the instruction path length or instruction count (IC). If we
know the number of clock cycles and the instruction count, we can calculate the average number of
clock cycles per instruction (CPI).
Designers sometimes also use instructions per clock (IPC), which is the inverse of CPI.
CPI is computed as
CPI = CPU clock cycles for a program / Instruction count
This processor figure of merit provides insight into different styles of instruction sets and
implementations.
By transposing instruction count in the above formula, clock cycles can be defined as IC × CPI.
CPU time = Instruction count × Cycles per instruction × Clock cycle time
Expanding the first formula into the units of measurement shows how the pieces fit together:
CPU time = (Instructions / Program) × (Clock cycles / Instruction) × (Seconds / Clock cycle) = Seconds / Program
As this formula demonstrates, processor performance is dependent upon three characteristics: clock
cycle (or rate), clock cycles per instruction, and instruction count. Furthermore, CPU time is equally
dependent on these three characteristics: A 10% improvement in any one of them leads to a 10%
improvement in CPU time.
Unfortunately, it is difficult to change one parameter in complete isolation from others because the
basic technologies involved in changing each characteristic are interdependent:
1. Clock cycle time depends on hardware technology and organization.
2. CPI depends on organization and the instruction set architecture.
3. Instruction count depends on the instruction set architecture and compiler technology.
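To tie the pieces of the equation together, here is a minimal sketch; the instruction mix, per-class CPIs, and 2 GHz clock rate are hypothetical numbers, not measurements from the text.

```python
# Sketch of the processor performance equation:
# CPU time = Instruction count x CPI x Clock cycle time.

def cpu_time(instruction_counts, cpis, clock_rate_hz):
    """CPU time in seconds given per-class instruction counts and CPIs."""
    total_cycles = sum(ic * cpi for ic, cpi in zip(instruction_counts, cpis))
    return total_cycles / clock_rate_hz

if __name__ == "__main__":
    # Hypothetical mix: ALU, load/store, and branch instruction classes.
    counts = [500e6, 300e6, 200e6]        # instructions executed per class
    cpis = [1.0, 2.5, 2.0]                # average cycles per instruction per class
    time_s = cpu_time(counts, cpis, 2e9)  # 2 GHz clock
    avg_cpi = sum(c * p for c, p in zip(counts, cpis)) / sum(counts)
    print(round(time_s, 4), "seconds, average CPI =", round(avg_cpi, 2))
```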