Fundamentals of Computer Design
Introduction
Computer technology has made incredible progress, i.e., more
performance, more main memory, and more disk storage.
This is due to innovations in:
Technology – its contribution to performance is fairly steady.
Computer design/architecture – its contribution is much
less consistent.
1960s – Mainframes: Large machines costing millions of dollars and stored in computer rooms with
multiple operators overseeing their support. Typical applications included
business data processing and large-scale scientific computing.
1980s – Desktop computers: Based on microprocessors, in the form of both personal computers and
workstations.
Servers: Time-sharing systems were replaced by individually owned desktop
computers and by servers, i.e., computers that provide larger-scale
services such as reliable, long-term file storage and access, larger memory,
and more computing power.
Since 2000 – Embedded computers: Computers lodged in other devices, where their presence is not
immediately obvious, e.g., cell phones.
The changes in different classes of computers changed views on:
Computing Technologies
Computing applications
Computer markets
Desktop computing: spans from low-end systems that sell for under $500 to high-end,
heavily configured workstations that may sell for $5000. Throughout
this range in price and capability, the desktop market tends to be
driven to optimize price-performance.
Embedded computers have two key design requirements:
i. To minimize memory: memory can be a substantial portion of
system cost, so it is important to optimize memory size.
ii. To minimize power: larger memories also mean more power, and
optimizing power is often critical in embedded applications.
Although the emphasis on low power is frequently driven by the
use of batteries, the need to use less expensive packaging—plastic
(versus ceramic)—and the absence of a fan for cooling also limit
total power consumption.
Defining Computer Architecture
Computer Architecture
1. Determines what attributes are important for a new
computer.
2. Designs a computer to maximize performance while staying
within cost, power, and availability constraints.
Aspects in this task include:
i. Instruction set design
ii. Functional organization
iii. Logic design
iv. Implementation – It may encompass integrated
circuit design, packaging, power, and cooling.
Optimizing the design requires familiarity with a very wide
range of technologies, from compilers and operating
systems to logic design and packaging.
Instruction Set Architecture [ISA] and its 7 dimensions
[using examples from MIPS (Microprocessor without Interlocked
Pipeline Stages) and 80x86] :
1. Class of ISA:
80x86: a register-memory ISA, which can access memory as part of many instructions.
MIPS: a load-store ISA, which can access memory only with load or store instructions.
All recent ISAs are load-store.
2. Memory addressing: both use byte addressing; MIPS requires objects to be
aligned, while the 80x86 does not.
3. Addressing modes: both MIPS and the 80x86 support Register; Immediate (for
constants); and Displacement, i.e., a constant offset is added to a register to
form the memory address.
The 80x86 supports three variations of displacement: no register (absolute),
two registers (based indexed with displacement), and two registers where one
register is multiplied by the size of the operand in bytes (based with scaled
index and displacement). It also has more modes, like the last three minus
(i.e., without) the displacement field: register indirect, indexed, and based
with scaled index.
4. Types and sizes of operands: like most ISAs, MIPS and the 80x86 support operand
sizes of 8-bit (ASCII character), 16-bit (Unicode character or half word),
32-bit (integer or word), 64-bit (double word or long integer), and IEEE 754
floating point in 32-bit (single precision) and 64-bit (double precision).

Design is driven by the application software, which determines how the computer will
be used. If a large body of software exists for a certain ISA, the architect may decide
that a new computer should implement the existing instruction set (so that the existing
software base can be reused).
Design a computer to meet functional requirements as well as price, power, performance, and
availability goals.
Must be aware of important trends in both the technology and the use of computers, as such
trends not only affect future cost, but also the longevity of an architecture.
Functional requirements to consider in designing a new computer:
Wire delay has become a more critical design limitation than transistor
switching delay for large integrated circuits!
Trends in Power in Integrated Circuits
What challenges does power present as devices are scaled?
i.Bringing in and distributing the power around the chip: e.g. Modern
microprocessors use hundreds of pins and multiple interconnect layers for just
power and ground.
ii.Removal of heat dissipated by the power.
iii.Preventing hot spots.
Different types of power and energy consumption:
i. Dynamic power: the energy consumed in switching transistors on CMOS chips.
It is proportional to the product of the load capacitance of the transistor, the
square of the voltage, and the frequency of switching, with watts being the unit:

Power dynamic = 1/2 x Capacitive load x Voltage² x Frequency switched

ii. Energy: mobile devices care about battery life more than power, so energy is
the proper metric, measured in joules:

Energy dynamic = Capacitive load x Voltage²
iii. Static power: power lost to leakage current, which flows even when transistors
are off. Gating the voltage to inactive modules (in which transistors are off) is
used to control the loss due to leakage.
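A minimal Python sketch of the two relations above; the capacitance, voltage, and frequency figures are invented for illustration, not taken from the text:

```python
# Sketch of the dynamic power and energy relations above.
# All numeric inputs are assumed example values.

def dynamic_power(cap_load_farads, voltage, freq_hz):
    """Power_dynamic = 1/2 x capacitive load x voltage^2 x switching frequency (watts)."""
    return 0.5 * cap_load_farads * voltage ** 2 * freq_hz

def dynamic_energy(cap_load_farads, voltage):
    """Energy_dynamic = capacitive load x voltage^2 (joules per switching event)."""
    return cap_load_farads * voltage ** 2

print(dynamic_power(1e-9, 1.0, 2e9))   # 1.0 W for the assumed values
# Halving the voltage cuts the energy per switching event to one quarter:
print(dynamic_energy(1e-9, 0.5) / dynamic_energy(1e-9, 1.0))  # 0.25
```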
Three factors drive the cost of a manufactured computer component:
I. Time
II. Volume
III. Commodification
IMPACT OF TIME OVER COST
The cost of a manufactured computer component decreases over time
even without major improvements in the basic implementation technology.
The underlying principle that drives costs down is the learning curve—
i.The more times a task has been performed, the less time will be required on
each subsequent iteration.
ii.As the quantity of items produced doubles, costs decrease at a predictable
rate.
iii.Manufacturing costs decrease over time.
E.g.
Price per megabyte of DRAM has dropped over the long term by 40% per year. Since
DRAMs tend to be priced in close relationship to cost—with the exception of periods
when there is a shortage or an oversupply—price and cost of DRAM track closely.
Microprocessor prices also drop over time, but because they are less standardized
than DRAMs, the relationship between price and cost is more complex. In a period of
significant competition, price tends to track cost closely, although microprocessor
vendors probably rarely sell at a loss.
IMPACT OF VOLUME OVER COST
i. Volume decreases the time needed to get down the learning curve.
ii. Volume increases purchasing and manufacturing efficiency.
iii. Volume decreases the amount of development cost that must be amortized by
each computer, thus allowing cost and selling price to be closer.
IMPACT OF COMMODITIES OVER COST
What are commodities? Commodities are products that are sold by
multiple vendors in large volumes and are essentially identical.
E.g. All the products sold on the shelves of grocery stores are commodities, as
are standard DRAMs, disks, monitors, and keyboards.
• The number of dies per wafer is approximately the area of the wafer divided by the area of
the die. It can be more accurately estimated by

                   π x (Wafer diameter / 2)²     π x Wafer diameter
Dies per wafer = ---------------------------- – ----------------------
                          Die area                sqrt(2 x Die area)

The first term is the ratio of wafer area (πr²) to die area. The second compensates for the “square peg in
a round hole” problem—rectangular dies near the periphery of round wafers. Dividing the
circumference (πd) by the diagonal of a square die is approximately the number of dies along the
edge.
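The estimate translates directly into code. A minimal Python sketch; the 30 cm wafer and 2.25 cm² die are assumed example inputs:

```python
import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Wafer area / die area, minus a correction for partial dies at the edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss

# Assumed example: 30 cm diameter wafer, 2.25 cm^2 die.
print(round(dies_per_wafer(30, 2.25)))  # about 270 dies
```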
What is the fraction of good dies on a wafer, or the die
yield?
A simple model of integrated circuit yield, which assumes that
defects are randomly distributed over the wafer and that
yield is inversely proportional to the complexity of the
fabrication process, gives:

Die yield = Wafer yield x ( 1 + Defects per unit area x Die area / α ) ^ (–α)

where α is a parameter that corresponds to the complexity of the manufacturing
process. The formula is an empirical model developed by looking at the yield of many
manufacturing lines.
Wafer yield accounts for wafers that are completely bad and so need not be
tested. For simplicity, wafer yield is assumed to be 100%.
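A matching sketch for die yield; the 0.4 defects per cm² and α = 4.0 are assumed example values, not figures from the text:

```python
def die_yield(defects_per_cm2, die_area_cm2, alpha=4.0, wafer_yield=1.0):
    """Die yield = wafer yield x (1 + defects per unit area x die area / alpha)^(-alpha)."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** -alpha

# Assumed example: 0.4 defects per cm^2, alpha = 4, and the 2.25 cm^2 die above.
y = die_yield(0.4, 2.25)
print(f"die yield = {y:.2f}")                     # about 0.44
print(f"good dies per wafer = {round(270 * y)}")  # about 120 of the ~270 dies
```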
•Although some faults are widespread, like the loss of power, many can be
limited to a single component in a module. Thus, utter failure of a module at one
level may be considered merely a component error in a higher-level module.
When is a system operating properly?
System providers offer Service Level Agreements (SLA) [or Service Level
Objectives (SLO)] to guarantee their service would be dependable.
E.g., they would pay the customer a penalty if they did not meet the
agreement for more than some number of hours per month.
Thus, an SLA could be used to decide whether the system was up or down.
Two main measures (quantitative metrics) of dependability:
– reliability
– availability
Failures can be tolerated through redundancy, either in time (retry the
operation) or in resources (have other components take over from the one that
failed).
Once the component is replaced and the system is fully repaired, the
dependability of the system is assumed to be as good as new.
Example:
Assume:
One power supply is sufficient to run the disk subsystem, and
one redundant power supply is added.
The redundant pair fails only if the second power supply fails before
the first one is replaced.
Thus, if the chance of a second failure before repair is small, then the MTTF of
the pair is large.
With two power supplies and independent failures,
the mean time until one power supply fails is MTTF power supply / 2.
A good approximation of the probability of a second failure before repair is
MTTR / MTTF power supply. Hence,

MTTF pair = (MTTF power supply / 2) / (MTTR / MTTF power supply) = MTTF power supply² / (2 x MTTR)

making the pair about 4150 times more reliable than a single power supply.
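The arithmetic can be checked with a short script. The MTTF of 200,000 hours and MTTR of 24 hours below are assumed illustrative values, chosen to be consistent with the 4150x figure quoted above:

```python
# Reliability of a redundant power-supply pair, assuming independent failures.
mttf = 200_000  # hours until one supply fails (assumed value)
mttr = 24       # hours to replace a failed supply (assumed value)

# MTTF_pair = (MTTF / 2) / (MTTR / MTTF) = MTTF^2 / (2 x MTTR)
mttf_pair = mttf ** 2 / (2 * mttr)
print(f"MTTF of the pair = {mttf_pair:,.0f} hours")               # ~830,000,000
print(f"improvement over one supply = {mttf_pair / mttf:,.0f}x")  # matches the "about 4150 times" above
```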
Measuring, Reporting, and
Summarizing Performance
Performance measures/ metrics:
i. Response time (or execution time): the time between the start and
the completion of an event. The design goal is to reduce this time.
Performance is the reciprocal of response time.
E.g. The user of a desktop computer may say a computer
is faster when a program runs in less time.
ii. Throughput: the total amount of work done in a given time.
The phrase “the throughput of X is 1.3 times higher than Y” signifies that
the number of tasks completed per unit time on computer X is 1.3 times the
number of tasks completed on Y.
iii.CPU time:
With multiprogramming, the processor works on another program while waiting for
I/O and may not necessarily minimize the elapsed time of one program. Hence, a term
is needed to consider this activity. CPU time recognizes this distinction.
Definition:
CPU time is the time the processor is computing, not including the time
waiting for I/O or running other programs. (Clearly, the response time seen
by the user is the elapsed time of the program, not the CPU time.)
Computer users who routinely run the same programs would be the perfect
candidates to evaluate a new computer.
To evaluate a new system the users would simply compare the execution time
of their workloads—the mixture of programs and operating system
commands that users run on a computer.
Two types:
i. Real applications: examples include compilers.
ii. Simplified programs:
Kernels, which are small, key pieces of real applications;
Toy programs, which are 100-line programs from beginning programming
assignments, such as quick-sort;
Synthetic benchmarks, which are fake programs invented to try
to match the profile and behavior of real applications, e.g., Dhrystone.
The Dhrystone benchmark contains no floating point operations, thus the name is a pun on the
then-popular Whetstone benchmark for floating point operations. The output from the
benchmark is the number of Dhrystones per second (the number of iterations of the main code
loop per second).
Drawback of using benchmarks:
Alternatively,
            Execution time for the entire task without using the enhancement
Speedup = -----------------------------------------------------------------------
            Execution time for the entire task using the enhancement when possible

Speedup tells us how much faster a task will run using the computer
with the enhancement, as opposed to the original computer.
Two factors determine the speedup from an enhancement:
• Fraction Enhanced : The fraction of the computation time in the original
computer that can be converted to take advantage of the enhancement.
For example: if 20 seconds of the execution time of a program that takes
60 seconds in total can use an enhancement, the fraction is 20/60. This
value is always less than or equal to 1.
• Speed up Enhanced : The improvement gained by the enhanced
execution mode, that is, how much faster the task would run if the
enhanced mode were used for the entire program.
For example: if the enhanced mode takes 2 seconds for a portion of
the program that takes 5 seconds in the original mode, the
improvement is 5/2. This value is always greater than 1.
Calculation of Execution Time and Speedup Overall

Execution time new = Execution time old x [ (1 – Fraction Enhanced) + (Fraction Enhanced / Speedup Enhanced) ]

Speedup Overall = Execution time old / Execution time new
                = 1 / [ (1 – Fraction Enhanced) + (Fraction Enhanced / Speedup Enhanced) ]
Example 2
• Suppose that a given architecture does not have hardware support for multiplication,
so multiplications have to be done through repeated addition. If it takes 200 cycles to
perform a multiplication in software, and 4 cycles to perform a multiplication in
hardware, what is the overall speedup from hardware support for multiplication if a
program spends 10% of its time doing multiplications? What about a program that
spends 40% of its time doing multiplications?
• In both cases, the speedup when the multiplication hardware is used is 200 / 4 =
50 (ratio of time to do a multiplication without the hardware to time with the
hardware). In the case where the program spends 10% of its time doing
multiplications, Fraction Enhanced = 0.1; i.e., 1–Fraction Enhanced = 0.9.
• By Amdahl’s law, we get,
Speedup Overall = 1 / [ 0.9 + ( 0.1 / 50 ) ] = 1.11
• If the program spends 40% of its time doing multiplications before the addition of
hardware multiplication, then Fraction Enhanced is 0.4.
Hence 1–Fraction Enhanced is 0.6; We get, Speedup = 1 / [ 0.6 + (0.4 / 50) ] = 1.64
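The same computations, expressed as a short Python sketch of Amdahl's law (all values come from the example above):

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Speedup Overall = 1 / [(1 - F) + F / S], Amdahl's law as given above."""
    return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

hw_speedup = 200 / 4  # software multiply vs. hardware multiply: 50x
print(f"{amdahl_speedup(0.1, hw_speedup):.2f}")  # 1.11 when 10% is multiplications
print(f"{amdahl_speedup(0.4, hw_speedup):.2f}")  # 1.64 when 40% is multiplications
```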
Example 3
• If the 1998 version of a computer executes a program in 200
sec and the 2000 version of the computer executes the
program in 150 sec. What is the speedup that the
manufacturer has achieved over the two years period ?
• Hint :
Execution Time old
Speedup = -------------------------------
Execution Time new
Answer:
Speedup = Execution Time old / Execution Time new = 200 / 150 = 1.33
• Virtually all computers are constructed using a clock running at a constant
rate. These discrete time events are called ticks, clock ticks, clock periods,
clocks, or cycles.
• A clock period is referred to either by its duration (e.g., 1 ns) or by its
rate (e.g., 1 GHz).
CPU Time = CPU Clock cycles for a program x Clock Cycle Time
OR
CPU Time = CPU Clock cycles for a program / Clock Rate
• If we know the number of clock cycles and the instruction count, we can
calculate Clock cycles Per Instruction (CPI). Instructions Per Clock (IPC) is
the inverse of CPI.

          CPU Clock cycles for a program
CPI = --------------------------------------
               Instruction Count
The total number of clock cycles for a program can be defined as IC x CPI.
Hence CPU Time = Instruction Count x Cycles Per Instruction x Clock Cycle Time

             Instructions     Clock Cycles      Seconds       Seconds
CPU Time = -------------- x -------------- x ------------- = ----------
               Program        Instruction     Clock Cycle     Program
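A minimal Python sketch of the CPU performance equation; the instruction count, CPI, and clock rate are invented example values:

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU Time = IC x CPI x clock cycle time = IC x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz

# Assumed example: 1 billion instructions, CPI of 2.0, 1 GHz clock.
print(cpu_time_seconds(1_000_000_000, 2.0, 1e9))  # 2.0 seconds
```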
Calculating CPU Clock Cycles

                       n
CPU Clock cycles =     Σ  ( ICi x CPIi )
                      i=1

Overall CPI is:

          n
          Σ ( ICi x CPIi )
         i=1                      n          ICi
CPI = ---------------------- =    Σ  ( ----------------- x CPIi )
         Instruction Count       i=1    Instruction Count
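The weighted sum translates into a short helper; the instruction mix below is an assumed example, not data from the text:

```python
def overall_cpi(mix):
    """Overall CPI = sum of (ICi / Instruction Count) x CPIi over all classes.
    'mix' maps an instruction class to (instruction count, CPI)."""
    total = sum(ic for ic, _ in mix.values())
    return sum((ic / total) * cpi for ic, cpi in mix.values())

# Assumed example mix (counts in millions of instructions):
mix = {"ALU": (500, 1.0), "load/store": (300, 2.0), "branch": (200, 1.5)}
print(overall_cpi(mix))  # (500x1.0 + 300x2.0 + 200x1.5) / 1000 = 1.4
```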
Example1
• Suppose we have made the following
measurements:
Frequency of FP operations = 25 %
Average CPI of FP operations = 4.0
Average CPI of other instructions = 1.33
Frequency of FPSQR = 2 %
CPI of FPSQR = 20
Assume that the two design alternatives are to decrease the
CPI of FPSQR to 2 or to decrease the average CPI of all
FP operations to 2.5. Compare these two design alternatives
using the processor performance equation.
Answer
First compute the original CPI (with neither enhancement):
CPI Original = (25% x 4.0) + (75% x 1.33) = 2.0

With the new FPSQR unit:
CPI with new FPSQR = CPI Original – 2% x (CPI Old FPSQR – CPI New FPSQR only)
                   = 2.0 – 0.02 x (20 – 2) = 1.64

With the average CPI of all FP operations decreased to 2.5:
CPI New FP = (75% x 1.33) + (25% x 2.5) = 1.625

Since the clock rate and instruction count are unchanged, the alternative with
the lower CPI is faster: improving all FP operations (CPI = 1.625) is slightly
better than improving only FPSQR (CPI = 1.64).
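Checking both alternatives with a few lines of Python (all values come from the example):

```python
# Design-alternative comparison from the example above.
cpi_original = 0.25 * 4.0 + 0.75 * 1.33          # ~2.0

# Alternative 1: decrease the CPI of FPSQR from 20 to 2 (FPSQR is 2% of instructions).
cpi_new_fpsqr = cpi_original - 0.02 * (20 - 2)   # ~1.64

# Alternative 2: decrease the average CPI of all FP operations from 4.0 to 2.5.
cpi_new_fp = 0.75 * 1.33 + 0.25 * 2.5            # ~1.625

print(cpi_original, cpi_new_fpsqr, cpi_new_fp)
# Lower CPI wins at equal clock rate and instruction count:
print(f"speedup of new-FP design over new-FPSQR design = {cpi_new_fpsqr / cpi_new_fp:.3f}")
```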