Chapter # 1 COAL

The document provides an overview of computer systems architecture and concepts. It discusses eight great ideas in computer architecture over the last 60 years, including designing for Moore's law, using abstraction to simplify design, making the common case fast, performance via parallelism and pipelining, performance via prediction, hierarchy of memories, and dependability via redundancy. It also describes the layers between an application and hardware, including the compiler which translates high-level languages to machine code, and the operating system which handles tasks like input/output, storage allocation, and resource sharing.

Uploaded by

Shahzael Mughal

CHAPTER 1

Computer Abstractions and Technology

1.1 Introduction
1.2 Eight Great Ideas in Computer Architecture
1.3 Below Your Program
1.4 Under the Covers
1.5 Technologies for Building Processors and Memory
1.6 Performance
1.7 The Power Wall
1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors
1.9 Real Stuff: Benchmarking the Intel Core i7
1.10 Fallacies and Pitfalls
1.11 Concluding Remarks
1.12 Historical Perspective and Further Reading
1.13 Exercises

CMPS290 Class Notes (Chap01) Page 1 / 24 by Kuo-pao Yang

1.1 Introduction

 Modern computer technology requires professionals of every computing specialty to
understand both hardware and software.

Classes of Computing Applications and Their Characteristics

 Personal computers
o A computer designed for use by an individual, usually incorporating a graphics
display, a keyboard, and a mouse.
o Personal computers emphasize delivery of good performance to single users at
low cost and usually execute third-party software.
o This class of computing, which is only about 35 years old, drove the evolution of
many computing technologies.
 Server computers
o A computer used for running larger programs for multiple users, often
simultaneously, and typically accessed only via a network.
o Servers are built from the same basic technology as desktop computers, but
provide for greater computing, storage, and input/output capacity.
 Supercomputers
o A class of computers with the highest performance and cost.
o Supercomputers consist of tens of thousands of processors and many terabytes of
memory, and cost tens to hundreds of millions of dollars.
o Supercomputers are usually used for high-end scientific and engineering
calculations.
 Embedded computers
o A computer inside another device used for running one predetermined application
or collection of software.
o Embedded computers include the microprocessors found in your car, the
computers in a television set, and the networks of processors that control a
modern airplane or cargo ship.
o Embedded computing systems are designed to run one application or one set of
related applications that are normally integrated with the hardware and delivered
as a single system; thus, despite the large number of embedded computers, most
users never really see that they are using a computer!
o Elaboration: Many embedded processors are designed using processor cores, a
version of a processor written in a hardware description language, such as Verilog
or VHDL. The core allows a designer to integrate other application-specific
hardware with the processor core for fabrication on a single chip.

FIGURE 1.1 The $2^X$ vs. $10^Y$ bytes ambiguity was resolved by adding a binary notation for all the
common size terms. These prefixes work for bits as well as bytes, so a gigabit (Gb) is $10^9$ bits while
a gibibit (Gib) is $2^{30}$ bits.
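
The gap between the two notations grows with the size of the prefix. The sketch below compares the decimal and binary terms side by side; the dictionary names and the helper `relative_gap` are illustrative choices, not something defined in these notes.

```python
# Decimal (SI) prefixes are powers of 10; binary (IEC) prefixes are powers of 2.
DECIMAL = {"kilo": 10**3, "mega": 10**6, "giga": 10**9, "tera": 10**12}
BINARY = {"kibi": 2**10, "mebi": 2**20, "gibi": 2**30, "tebi": 2**40}


def relative_gap(decimal_value: int, binary_value: int) -> float:
    """Return how much larger the binary term is, as a fraction of the decimal term."""
    return (binary_value - decimal_value) / decimal_value


if __name__ == "__main__":
    for (d_name, d_val), (b_name, b_val) in zip(DECIMAL.items(), BINARY.items()):
        gap = relative_gap(d_val, b_val)
        print(f"{d_name}: {d_val}  {b_name}: {b_val}  gap: {gap:.1%}")
```

Under these definitions a gibibit is roughly 7% larger than a gigabit, which is why the two terms should not be used interchangeably.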

Welcome to the PostPC Era

 Personal Mobile Device (PMD)
o Replacing the PC is the personal mobile device (PMD).
o PMDs are battery operated with wireless connectivity to the Internet.
o Figure 1.2 shows the rapid growth over time of tablets and smart phones versus that of
PCs and traditional cell phones.

FIGURE 1.2 The number manufactured per year of tablets and smart phones, which reflect the PostPC
era, versus personal computers and traditional cell phones. Smart phones represent the recent growth in
the cell phone industry, and they passed PCs in 2011. Tablets are the fastest growing category, nearly
doubling between 2011 and 2012. Recent PCs and traditional cell phone categories are relatively flat or
declining.

 Cloud Computing
o Warehouse Scale Computers (WSC): Companies like Amazon and Google build
these WSCs containing 100,000 servers and then let companies rent portions of
them so that they can provide software services to PMDs without having to build
WSCs of their own.

o Software as a Service (SaaS): It delivers software and data as a service over the
Internet, usually via a thin program such as a browser that runs on local client
devices, instead of binary code that must be installed and runs wholly on that
device. Examples include web search and social networking. SaaS deployed via
the cloud is revolutionizing the software industry just as PMDs and WSCs are
revolutionizing the hardware industry.
 Today’s software developers will often have a portion of software that runs on a PMD
and a portion that runs in the Cloud.

What You Can Learn in This Book

 How are programs written in a high-level language, such as C or Java, translated into
machine language, and how does the hardware execute them?
 What is the interface between the software and the hardware, and how does software
instruct the hardware to perform needed functions?
 What determines the performance of a program, and how can a programmer
improve the performance?
 What techniques can be used by hardware designers to improve performance?
 What are the reasons for and the consequences of the recent switch from sequential
processing to parallel processing?

Understanding Program Performance

 Algorithm
o Determines both the number of source-level statements and the number of I/O
operations executed
 Programming language, compiler, architecture
o Determine the number of machine instructions executed per operation
 Processor and memory system
o Determine how fast instructions are executed
 I/O system (hardware and OS)
o Determines how fast I/O operations are executed

1.2 Eight Great Ideas in Computer Architecture

 We now introduce eight great ideas that computer architects have invented in
the last 60 years of computer design.
 These ideas are so powerful they have lasted long after the first computer that used
them, with newer architects demonstrating their admiration by imitating their
predecessors.

1. Design for Moore’s Law

o It states that integrated circuit resources double every 18-24 months.

2. Use Abstraction to Simplify Design

o A major productivity technique for hardware and software is to use abstractions
to represent the design at different levels of representation; lower-level details are
hidden to offer a simpler model at higher levels.

3. Make the Common Case Fast

o Making the common case fast will tend to enhance performance better than
optimizing the rare case. Ironically, the common case is often simpler than the
rare case and hence is often easier to enhance.

4. Performance via Parallelism

o Since the dawn of computing, computer architects have offered designs that get
more performance by performing operations in parallel.

5. Performance via Pipelining
o A particular pattern of parallelism is so prevalent in computer architecture that it
merits its own name: pipelining.

6. Performance via Prediction

o The next great idea is prediction. In some cases, it can be faster on average to
guess and start working rather than wait until you know for sure.

7. Hierarchy of Memories
o Architects have found that they can address these conflicting demands with a
hierarchy of memories with the fastest, smallest, and most expensive memory
per bit at the top of the hierarchy and the slowest, largest, and cheapest per bit at
the bottom.

8. Dependability via Redundancy

o Computers not only need to be fast; they need to be dependable. Since any
physical device can fail, we make systems dependable by including redundant
components that can take over when a failure occurs and help detect failures.
Restoring redundancy!

1.3 Below Your Program

 To go from a complex application to the simple instructions involves several layers of
software that interpret or translate high-level operations into simple computer
instructions, an example of the great idea of abstraction.
 Application software: written in high-level language
 System software: sits between the hardware and application software. Two types of
systems software are central to every computer today: an operating system and a
compiler.
o Compiler: Translates HLL code to machine code
o Operating System: Provides a variety of services and supervisory functions.
 Handling input and output operations
 Allocating storage and memory
 Scheduling tasks & sharing resources: Providing for protected sharing of the
computer among multiple applications using it simultaneously
 Hardware
o Processor, memory, I/O controllers

FIGURE 1.3 A simplified view of hardware and software as hierarchical layers, shown as concentric
circles with hardware in the center and applications software outermost. In complex applications, there
are often multiple layers of application software as well. For example, a database system may run on
top of the systems software hosting an application, which in turn runs on top of the database.

From a High-Level Language to the Language of Hardware
 High-level programming language: A portable language such as C, C++, Java, or
Visual Basic that is composed of words and algebraic notation that can be translated by
a compiler into assembly language
 Compiler: A program that translates high-level language statements into assembly
language statements.
 Assembler: A program that translates a symbolic version of instructions into the
binary version
 Assembly language: A symbolic representation of machine instructions
 Machine language: A binary representation of machine instructions

FIGURE 1.4 C program compiled into assembly language and then assembled into binary machine
language. Although the translation from high-level language to binary machine language is shown in
two steps, some compilers cut out the middleman and produce binary machine language directly.
These languages and this program are examined in more detail in Chapter 2.
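
The assembler step in Figure 1.4 (symbolic instruction in, binary word out) can be sketched for a single instruction. This is a minimal illustration that assumes the MIPS instruction set, which these notes do not name explicitly; the function `assemble_add` and the small `REGISTERS` table are hypothetical helpers, and a real assembler handles far more than one instruction.

```python
# Register numbers for a few MIPS registers (standard MIPS numbering).
REGISTERS = {"$t0": 8, "$t1": 9, "$s1": 17, "$s2": 18}


def assemble_add(rd: str, rs: str, rt: str) -> int:
    """Encode `add rd, rs, rt` as a 32-bit MIPS R-format word.

    Field layout, most to least significant:
    opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)
    """
    opcode, shamt, funct = 0, 0, 32  # funct 32 selects `add`
    return (
        (opcode << 26)
        | (REGISTERS[rs] << 21)
        | (REGISTERS[rt] << 16)
        | (REGISTERS[rd] << 11)
        | (shamt << 6)
        | funct
    )


if __name__ == "__main__":
    word = assemble_add("$t0", "$s1", "$s2")  # assembly: add $t0, $s1, $s2
    print(f"{word:032b}")  # the binary machine-language form
    print(f"0x{word:08x}")
```

The symbolic form is what a person reads; the 32-bit word is what the hardware executes. Both describe the same instruction.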

1.4 Under the Covers

 The five classic components of a computer are input, output, memory, datapath,
and control, with the last two (datapath and control) sometimes combined and called
the processor.
o Input device: A mechanism through which the computer is fed information, such
as a keyboard.
o Output device: A mechanism that conveys the result of a computation to a user,
such as a display, or to another computer, such as a network adapter.
o Memory: The storage area in which programs are kept when they are running and
that contains the data needed by the running programs, such as hard disk,
CD/DVD, flash memory.
o Central processor unit (CPU): Also called the processor. The active part of the computer,
which contains the datapath and control and which adds numbers, tests numbers,
signals I/O devices to activate, and so on.
o Datapath: The component of the processor that performs arithmetic operations
o Control: The component of the processor that commands the datapath, memory,
and I/O devices according to the instructions of the program

FIGURE 1.5 The organization of a computer, showing the five classic components. The processor
gets instructions and data from memory. Input writes data to memory, and output reads data from
memory. Control sends the signals that determine the operations of the datapath, memory, input, and
output.

Through the Looking Glass
 LCD screen: A display technology using a thin layer of liquid polymers that can be
used to transmit or block light according to whether a charge is applied.
 Pixel: The smallest individual picture element. Screens are composed of hundreds of
thousands to millions of pixels, organized in a matrix.
 The image is composed of a matrix of picture elements, or pixels, which can be
represented as a matrix of bits, called a bit map. The display matrix in a typical tablet
ranges in size from 1024 × 768 to 2048 × 1536.
 A color display might use 8 bits for each of the three colors (red, blue, and green),
for 24 bits per pixel, permitting millions of different colors to be displayed. Figure 1.6
shows a frame buffer with a simplified design of just 4 bits per pixel.

FIGURE 1.6 Each coordinate in the frame buffer on the left determines the shade of the
corresponding coordinate for the raster scan CRT display on the right. Pixel (X0, Y0) contains the bit
pattern 0011, which is a lighter shade on the screen than the bit pattern 1101 in pixel (X1, Y1).
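
The storage a frame buffer needs follows directly from the screen dimensions and the bits per pixel quoted above. The sketch below works that out; the function name `frame_buffer_bytes` is an illustrative choice, and it assumes the bit map is stored uncompressed with no padding between rows.

```python
def frame_buffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed to hold one full frame, rounded up to a whole byte."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8


if __name__ == "__main__":
    # The two tablet resolutions mentioned in the notes, at 24 bits per pixel.
    for width, height in [(1024, 768), (2048, 1536)]:
        size = frame_buffer_bytes(width, height, 24)
        print(f"{width} x {height}: {size} bytes ({size / 2**20:.2f} MiB)")
```

Doubling both dimensions quadruples the number of pixels, so the larger screen needs four times the frame buffer of the smaller one.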

Touchscreen
 While PCs also use LCD displays, the tablets and smartphones of the PostPC era
have replaced the keyboard and mouse with touch-sensitive displays, which have the
wonderful user interface advantage of users pointing directly at what they are interested
in rather than indirectly with a mouse.

Opening the Box

FIGURE 1.7 Components of the Apple iPad 2 A1395. The metal back of the iPad (with the reversed
Apple logo in the middle) is in the center. At the top is the capacitive multitouch screen and LCD
display. To the far right is the 3.8 V, 25 watt-hour, polymer battery, which consists of three Li-ion cell
cases and offers 10 hours of battery life. To the far left is the metal frame that attaches the LCD to the
back of the iPad. The small components surrounding the metal back in the center are what we think of
as the computer; they are often L-shaped to fit compactly inside the case next to the battery. Figure 1.8
shows a close-up of the L-shaped board to the lower left of the metal case, which is the logic printed
circuit board that contains the processor and the memory. The tiny rectangle below the logic board
contains a chip that provides wireless communication: Wi-Fi, Bluetooth, and FM tuner. It fits into a
small slot in the lower left corner of the logic board. Near the upper left corner of the case is another L-
shaped component, which is a front-facing camera assembly that includes the camera, headphone jack,
and microphone. Near the right upper corner of the case is the board containing the volume control and
silent/screen rotation lock button along with a gyroscope and accelerometer. These last two chips
combine to allow the iPad to recognize 6-axis motion. The tiny rectangle next to it is the rear-facing
camera. Near the bottom right of the case is the L-shaped speaker assembly. The cable at the bottom is
the connector between the logic board and the camera/volume control board. The board between the
cable and the speaker assembly is the controller for the capacitive touchscreen. (Courtesy iFixit,
www.ifixit.com)

 Integrated circuit: Also called a chip. A device combining dozens to millions of
transistors.
 Dynamic random access memory (DRAM): Memory built as an integrated circuit;
it provides random access to any location. Access times are 50 nanoseconds and cost
per gigabyte in 2012 was $5 to $10.
 The memory is where the programs are kept when they are running; it also contains
the data needed by the running programs. The memory is built from DRAM chips.
 Multiple DRAMs are used together to contain the instructions and data of a program.

FIGURE 1.8 The logic board of Apple iPad 2 in Figure 1.7. The photo highlights five integrated
circuits. The large integrated circuit in the middle is the Apple A5 chip, which contains dual ARM
processor cores that run at 1 GHz as well as 512 MB of main memory inside the package. Figure 1.9
shows a photograph of the processor chip inside the A5 package. The similar sized chip to the left is
the 32 GB flash memory chip for non-volatile storage. There is an empty space between the two chips
where a second flash chip can be installed to double storage capacity of the iPad. The chips to the right
of the A5 include power controller and I/O controller chips. (Courtesy iFixit, www.ifixit.com)

Inside the Processor

 Datapath: performs operations on data
 Control: sequences datapath, memory, ...
 Cache memory: Consists of a small, fast memory that acts as a buffer for the DRAM
memory.
o Static random access memory (SRAM): Also memory built as an integrated
circuit, but faster and less dense than DRAM

FIGURE 1.9 The processor integrated circuit inside the A5 package. The size of chip is 12.1 by 10.1
mm, and it was manufactured originally in a 45-nm process (see Section 1.5). It has two identical
ARM processors or cores in the middle left of the chip and a PowerVR graphical processor unit (GPU)
with four datapaths in the upper left quadrant. To the left and bottom side of the ARM cores are
interfaces to main memory (DRAM). (Courtesy Chipworks, www.chipworks.com)

A Safe Place for Data
 Volatile memory: Storage, such as DRAM, that retains data only if it is receiving
power.
o Volatile main memory (primary memory): loses instructions and data when
powered off.
 Nonvolatile memory: A form of memory that retains data even in the absence of a
power source and that is used to store programs between runs.
o Non-volatile secondary memory:
 Magnetic disk
 Flash memory
 Optical disk (CDROM, DVD)

Communicating with Other Computers

 Communication, resource sharing, nonlocal access
 Local area network (LAN): Ethernet
 Wide area network (WAN): the Internet
 Wireless network: WiFi, Bluetooth

1.5 Technologies for Building Processors and Memory

 Electronics technology continues to evolve

o Increased capacity and performance
o Reduced cost
 A transistor is simply an on/off switch controlled by electricity.
 The integrated circuit (IC) combined dozens to hundreds of transistors into a single
chip.
 Very large-scale integrated (VLSI) circuit: A device containing hundreds of
thousands to millions of transistors.

Year   Technology                    Relative performance/cost
1951   Vacuum tube                   1
1965   Transistor                    35
1975   Integrated circuit (IC)       900
1995   Very large scale IC (VLSI)    2,400,000
2013   Ultra large scale IC          250,000,000,000
FIGURE 1.10 Relative performance per unit cost of technologies used in computers over time.
Source: Computer Museum, Boston, with 2013 extrapolated by the authors.

FIGURE 1.11 Growth of capacity per DRAM chip over time. The y-axis is measured in kibibits ($2^{10}$
bits). The DRAM industry quadrupled capacity almost every three years, a 60% increase per year, for
20 years. In recent years, the rate has slowed down and is somewhat closer to doubling every two years
to three years.
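
The "60% per year" figure is the compound annual rate that corresponds to quadrupling every three years. The sketch below converts a growth factor over a period into an annual rate; `annual_growth` is an illustrative helper name, and it assumes smooth compound growth.

```python
def annual_growth(factor: float, years: float) -> float:
    """Equivalent compound annual growth rate for `factor`-fold growth over `years`."""
    return factor ** (1.0 / years) - 1.0


if __name__ == "__main__":
    print(f"4x every 3 years: {annual_growth(4, 3):.1%} per year")
    print(f"2x every 2 years: {annual_growth(2, 2):.1%} per year")
    print(f"2x every 3 years: {annual_growth(2, 3):.1%} per year")
```

Quadrupling every three years works out to just under 59% per year, which the caption rounds to 60%. The slower recent pace of doubling every two to three years corresponds to roughly 26% to 41% per year.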

Manufacturing ICs
 Silicon crystal ingot: A rod composed of a silicon crystal that is between 8 and 12
inches in diameter and about 12 to 24 inches long
 Wafer: A slice from a silicon ingot no more than 0.1 inches thick, used to create chips.

FIGURE 1.12 The chip manufacturing process. After being sliced from the silicon ingot, blank
wafers are put through 20 to 40 steps to create patterned wafers (see Figure 1.13). These patterned
wafers are then tested with a wafer tester, and a map of the good parts is made. Then, the wafers are
diced into dies (see Figure 1.9). In this figure, one wafer produced 20 dies, of which 17 passed testing.
(X means the die is bad.) The yield of good dies in this case was 17/20, or 85%. These good dies are
then bonded into packages and tested one more time before shipping the packaged parts to customers.
One bad packaged part was found in this final test.

 Intel Core i7 Wafer

o 300mm wafer, 280 chips, 32nm technology
o Each chip is 20.7 x 10.5 mm

FIGURE 1.13 A 12-inch (300 mm) wafer of Intel Core i7 (Courtesy Intel). The number of dies on
this 300 mm (12 inch) wafer at 100% yield is 280, each 20.7 by 10.5 mm. The several dozen partially
rounded chips at the boundaries of the wafer are useless; they are included because it’s easier to create
the masks used to pattern the silicon. This die uses a 32-nanometer technology, which means that the
smallest features are approximately 32 nm in size, although they are typically somewhat smaller than
the actual feature size, which refers to the size of the transistors as “drawn” versus the final
manufactured size.
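
The yield and die-count figures in Figures 1.12 and 1.13 can be sanity-checked with a little arithmetic. The sketch below is a rough estimate only: dividing wafer area by die area ignores the partial dies lost at the round edge, so it gives an upper bound rather than the real count. The function names are illustrative and not taken from the notes.

```python
import math


def die_yield(good_dies: int, total_dies: int) -> float:
    """Fraction of dies on a wafer that pass testing."""
    return good_dies / total_dies


def dies_per_wafer_upper_bound(wafer_diameter_mm: float,
                               die_width_mm: float,
                               die_height_mm: float) -> int:
    """Wafer area divided by die area; ignores dies lost at the wafer edge."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    die_area = die_width_mm * die_height_mm
    return int(wafer_area // die_area)


if __name__ == "__main__":
    print(f"Yield in Figure 1.12: {die_yield(17, 20):.0%}")
    bound = dies_per_wafer_upper_bound(300, 20.7, 10.5)
    print(f"Area-only bound for the Core i7 wafer: {bound} dies")
```

The area-only bound comes out above the 280 dies quoted for the Core i7 wafer, which is consistent with the caption's remark that the partially rounded chips at the boundary are useless.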

1.6 Performance

 Response Time and Throughput

o Response time: Also called execution time. The total time required for the computer
to complete a task, including disk access, memory access, I/O activities, operating
system overhead, CPU execution time, and so on. In short, how long it takes to do a task.
o Throughput: Also called bandwidth. Another measure of performance: the
number of tasks completed per unit time, that is, the total work done per unit time.
 How are response time and throughput affected by
o Replacing the processor with a faster version?
o Adding more processors?
 Relative Performance
o Define Performance = 1/Execution Time
o If X is n times as fast as Y, then the execution time on Y is n times as long as it is on
X.

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

o Example: If computer A runs a program in 10 seconds and computer B runs the
same program in 15 seconds, how much faster is A than B?

$$\frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{15\,\text{s}}{10\,\text{s}} = 1.5$$

So A is 1.5 times as fast as B.
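
The relative-performance definition translates directly into code. In the sketch below, `relative_performance` is an illustrative name; it returns n such that X is n times as fast as Y, using only the two execution times.

```python
def relative_performance(exec_time_x: float, exec_time_y: float) -> float:
    """Return n where X is n times as fast as Y (Performance = 1 / Execution Time)."""
    performance_x = 1.0 / exec_time_x
    performance_y = 1.0 / exec_time_y
    return performance_x / performance_y


if __name__ == "__main__":
    # Computer A takes 10 s and computer B takes 15 s for the same program.
    n = relative_performance(10.0, 15.0)
    print(f"A is {n:.2f} times as fast as B")
```

A result below 1 would mean the first machine is the slower one.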

 Clock cycle: Also called tick, clock tick, clock period, clock, or cycle. The time for
one clock period, usually of the processor clock, which runs at a constant rate

o Clock period: duration of a clock cycle
e.g., 250 ps = 0.25 ns = $250 \times 10^{-12}$ s
o Clock frequency (rate): cycles per second
e.g., 4.0 GHz = 4000 MHz = $4.0 \times 10^{9}$ Hz
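
Clock period and clock rate are reciprocals of each other, so the two example values above describe the same clock. A minimal sketch, with illustrative function names:

```python
def clock_rate_hz(period_s: float) -> float:
    """Clock rate in hertz for a given clock period in seconds."""
    return 1.0 / period_s


def clock_period_s(rate_hz: float) -> float:
    """Clock period in seconds for a given clock rate in hertz."""
    return 1.0 / rate_hz


if __name__ == "__main__":
    print(f"250 ps period -> {clock_rate_hz(250e-12) / 1e9:.1f} GHz")
    print(f"4.0 GHz rate  -> {clock_period_s(4.0e9) * 1e12:.0f} ps")
```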

CPU Performance and its Factors
 CPU Time

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$

 Performance improved by
o Reducing number of clock cycles
o Increasing clock rate
o Hardware designer must often trade off clock rate against cycle count
 CPU Time Example: Our favorite program runs in 10 seconds on computer A, which
has a 2 GHz clock. We are trying to help a computer designer build a computer, B,
which will run this program in 6 seconds. The designer has determined that a
substantial increase in the clock rate is possible, but this increase will affect the rest of
the CPU design, causing computer B to require 1.2 times as many clock cycles as
computer A for this program. What clock rate should we tell the designer to target?
o Computer A: 2 GHz clock, 10sec CPU time
o Designing Computer B
 Aim for 6sec CPU time
 Can do faster clock, but causes 1.2 × clock cycles
o How fast must Computer B clock be?

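$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$$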

o Answer: 4GHz
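
The same calculation can be written as a small function, which makes it easy to try other targets. This is a sketch of the worked example only; `required_clock_rate` and its parameter names are illustrative.

```python
def required_clock_rate(cpu_time_a_s: float,
                        clock_rate_a_hz: float,
                        cycle_ratio: float,
                        target_time_b_s: float) -> float:
    """Clock rate B needs to hit its target time, given B needs
    `cycle_ratio` times as many clock cycles as A for the program."""
    cycles_a = cpu_time_a_s * clock_rate_a_hz  # CPU Time = Cycles / Clock Rate
    cycles_b = cycle_ratio * cycles_a
    return cycles_b / target_time_b_s


if __name__ == "__main__":
    rate_b = required_clock_rate(10.0, 2.0e9, 1.2, 6.0)
    print(f"Computer B needs a {rate_b / 1e9:.1f} GHz clock")
```

Computer B must run its clock at twice the rate of A to finish in 6 seconds, even though the run time only falls by a factor of 10/6, because B also executes 20% more cycles.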

Instruction Performance
 Instruction Count and CPI

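$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$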

 Clock cycles per instruction (CPI): Average number of clock cycles per instruction
for a program or program fragment.
o Instruction Count for a program
 Determined by program, ISA and compiler
o Average cycles per instruction
 Determined by CPU hardware
 If different instructions have different CPI
 Average CPI affected by instruction mix

 CPI Example: Suppose we have two implementations of the same instruction set
architecture. Computer A has a clock cycle time of 250ps and a CPI of 2.0 for some
program, and computer B has a clock cycle time of 500ps and a CPI of 1.2 for the
same program. Which computer is faster for this program and by how much?
o Computer A: Cycle Time = 250ps, CPI = 2.0
o Computer B: Cycle Time = 500ps, CPI = 1.2
o Same instruction set architecture (ISA)
o Which is faster, and by how much?

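$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$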

o Answer: Computer A is 1.2 times as fast as computer B for this program
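
Because both machines implement the same ISA, the instruction count I is identical and cancels in the ratio. The sketch below makes that explicit by evaluating the CPU time for an arbitrary instruction count; `cpu_time_ps` is an illustrative helper, and the count of one million instructions is a made-up value used only to show that it does not affect the answer.

```python
def cpu_time_ps(instruction_count: int, cpi: float, cycle_time_ps: float) -> float:
    """CPU Time = Instruction Count x CPI x Clock Cycle Time (in picoseconds)."""
    return instruction_count * cpi * cycle_time_ps


if __name__ == "__main__":
    instructions = 1_000_000  # hypothetical; any positive count gives the same ratio
    time_a = cpu_time_ps(instructions, cpi=2.0, cycle_time_ps=250.0)
    time_b = cpu_time_ps(instructions, cpi=1.2, cycle_time_ps=500.0)
    print(f"CPU Time B / CPU Time A = {time_b / time_a:.2f}")
```

Computer B has the lower CPI, but its clock cycle is twice as long, so A still finishes first.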

The Classic CPU Performance Equation
 CPI in More Detail
o If different instruction classes take different numbers of cycles

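$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$

o Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$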

 CPI Example: A compiler designer is trying to decide between two code sequences
for a particular computer. The hardware designers have supplied the following facts:

Class            A   B   C
CPI for class    1   2   3

For a particular high-level language statement, the compiler writer is considering two
code sequences that required the following instruction counts:

Class               A   B   C
IC in sequence 1    2   1   2
IC in sequence 2    4   1   1

Which code sequence executes the most instructions? Which will be faster? What is
the CPI for each sequence?

o Alternative compiled code sequences using instructions in classes A, B, C

o Sequence 1:
 IC = 2 + 1 + 2 = 5 inst.
 Clock Cycles = 2×1 + 1×2 + 2×3 = 10
 Avg. CPI = 10 / 5 = 2.0
o Sequence 2:
 IC = 4 + 1 + 1 = 6 inst.
 Clock Cycles = 4×1 + 1×2 + 1×3 = 9
 Avg. CPI = 9 / 6 = 1.5
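
The two sequences can be compared programmatically with the per-class CPI values given above. In this sketch, `analyze` and `CPI_BY_CLASS` are illustrative names; the function returns the instruction count, total clock cycles, and average CPI for one code sequence.

```python
# CPI for each instruction class, as supplied by the hardware designers.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def analyze(counts: dict) -> tuple:
    """Return (instruction count, clock cycles, average CPI) for a code sequence.

    `counts` maps an instruction class name to how many instructions
    of that class the sequence executes.
    """
    instruction_count = sum(counts.values())
    clock_cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())
    return instruction_count, clock_cycles, clock_cycles / instruction_count


if __name__ == "__main__":
    sequences = {
        "Sequence 1": {"A": 2, "B": 1, "C": 2},
        "Sequence 2": {"A": 4, "B": 1, "C": 1},
    }
    for name, counts in sequences.items():
        ic, cycles, cpi = analyze(counts)
        print(f"{name}: IC = {ic}, cycles = {cycles}, CPI = {cpi}")
```

Sequence 2 executes more instructions (6 versus 5) but needs fewer clock cycles (9 versus 10), so it is the faster of the two. Instruction count alone is not a reliable predictor of execution time.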

Performance Summary

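$$\text{Execution Time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$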

 Performance depends on
o Algorithm: affects IC, possibly CPI
o Programming language: affects IC, CPI
o Compiler: affects IC, CPI
o Instruction set architecture: affects IC, CPI, Clock rate

1.7 The Power Wall

 Clock rate and Power

FIGURE 1.16 Clock rate and Power for Intel x86 microprocessors over eight generations and 25
years. The Pentium 4 made a dramatic jump in clock rate and power but less so in performance. The
Prescott thermal problems led to the abandonment of the Pentium 4 line. The Core 2 line reverts to a
simpler pipeline with lower clock rates and multiple processors per chip. The Core i5 pipelines follow
in its footsteps.

 In CMOS (complementary metal oxide semiconductor) IC technology

$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

 Reducing Power Example: Suppose we developed a new, simpler processor that has
85% of the capacitive load of the more complex older processor. Further, assume that
it has adjustable voltage so that it can reduce voltage 15% compared to the older processor,
which results in a 15% shrink in frequency. What is the impact on dynamic power?

o Suppose a new CPU has

 85% of capacitive load of old CPU
 15% voltage reduction
 15% frequency reduction

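$$\frac{P_{\text{new}}}{P_{\text{old}}} = \frac{(C_{\text{old}} \times 0.85) \times (V_{\text{old}} \times 0.85)^2 \times (F_{\text{old}} \times 0.85)}{C_{\text{old}} \times V_{\text{old}}^2 \times F_{\text{old}}} = 0.85^4 = 0.52$$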

o Answer: the new processor uses about half (0.52) the power of the old processor
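
Since only ratios matter, the old processor's absolute capacitance, voltage, and frequency never need to be known. The sketch below computes the power ratio from the three scale factors; `dynamic_power_ratio` is an illustrative name, and it assumes the CMOS power relation stated above.

```python
def dynamic_power_ratio(cap_scale: float, volt_scale: float, freq_scale: float) -> float:
    """New power divided by old power, given each quantity's scale factor.

    Voltage is squared because Power = Capacitive load x Voltage^2 x Frequency.
    """
    return cap_scale * volt_scale ** 2 * freq_scale


if __name__ == "__main__":
    ratio = dynamic_power_ratio(0.85, 0.85, 0.85)
    print(f"New processor uses {ratio:.2f} of the old processor's power")
```

Voltage has the largest effect of the three factors because it enters the relation squared.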

1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors

 Uniprocessor: A single program running on the single processor

 Multicore microprocessors
o More than one processor per chip
o A “quadcore” microprocessor is a chip that contains four processors, or four cores.

FIGURE 1.17 Growth in processor performance since the mid-1980s. This chart plots performance
relative to the VAX 11/780 as measured by the SPECint benchmarks (see Section 1.10). Prior to the
mid-1980s, processor performance growth was largely technology-driven and averaged about 25% per
year. The increase in growth to about 52% since then is attributable to more advanced architectural and
organizational ideas. The higher annual performance improvement of 52% since the mid-1980s meant
performance was about a factor of seven higher in 2002 than it would have been had it stayed at 25%.
Since 2002, the limits of power, available instruction-level parallelism, and long memory latency have
slowed uniprocessor performance recently, to about 22% per year.

1.9 Real Stuff: Benchmarking the Intel Core i7

SPEC CPU Benchmark

 Programs used to measure performance
o Supposedly typical of actual workload
 Standard Performance Evaluation Corp (SPEC)
o Develops benchmarks for CPU, I/O, Web, …
 SPEC CPU2006
o Elapsed time to execute a selection of programs
 Negligible I/O, so focuses on CPU performance
o Normalize relative to reference machine
o Summarize as geometric mean of performance ratios
 CINT2006 (integer) and CFP2006 (floating-point)

FIGURE 1.18 SPECINTC2006 benchmarks running on a 2.66 GHz Intel Core i7 920. As the
equation on page 35 explains, execution time is the product of the three factors in this table: instruction
count in billions, clocks per instruction (CPI), and clock cycle time in nanoseconds. SPECratio is
simply the reference time, which is supplied by SPEC, divided by the measured execution time. The
single number quoted as SPECINTC2006 is the geometric mean of the SPECratios.

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$

 Intel Core i7 Geometric mean is 25.7
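
The geometric mean of n ratios is the n-th root of their product. The sketch below computes it through logarithms, which avoids overflow when many ratios are multiplied. The function name and the sample ratios are illustrative; the ratios are made-up values, not the Figure 1.18 measurements.

```python
import math


def geometric_mean(ratios: list) -> float:
    """n-th root of the product of n positive ratios, computed via logarithms."""
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    # Hypothetical SPECratios (reference time / measured time) for three programs.
    sample_ratios = [10.0, 20.0, 40.0]
    print(f"Geometric mean: {geometric_mean(sample_ratios):.2f}")
```

Unlike the arithmetic mean, the geometric mean of a set of ratios gives the same relative ranking of two machines no matter which reference machine is used for normalization.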

1.11 Concluding Remarks

 Eight Great Ideas in Computer Architecture

1. Design for Moore’s Law
2. Use Abstraction to Simplify Design
3. Make the Common Case Fast
4. Performance via Parallelism
5. Performance via Pipelining
6. Performance via Prediction
7. Hierarchy of Memories
8. Dependability via Redundancy
 The execution time is related to other important measurements we can make by the
following equation:

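$$\text{Execution Time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$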

No ratings yet
Chapter 1 Measuring Understanding Performance
63 pages
Omputer Rganization and Esign: The Hardware/Software Interface
No ratings yet
Omputer Rganization and Esign: The Hardware/Software Interface
64 pages
Cse.m-ii-Advances in Computer Architecture (12scs23) - Notes
No ratings yet
Cse.m-ii-Advances in Computer Architecture (12scs23) - Notes
213 pages
Computer Abstractions and Technology: Omputer Rganization AND Esign
No ratings yet
Computer Abstractions and Technology: Omputer Rganization AND Esign
50 pages
Chapter 0
No ratings yet
Chapter 0
26 pages
Week 4a - Computer Architecture Fundamentals - Part 1
No ratings yet
Week 4a - Computer Architecture Fundamentals - Part 1
45 pages
EC8552 Computer Architecture and Organization Notes 1
No ratings yet
EC8552 Computer Architecture and Organization Notes 1
106 pages
Chapter 01
No ratings yet
Chapter 01
56 pages
Lecture 2 - Fundamentals
No ratings yet
Lecture 2 - Fundamentals
67 pages
Computer Abstractions and Technology
No ratings yet
Computer Abstractions and Technology
51 pages
System Integration and Architecture II
No ratings yet
System Integration and Architecture II
7 pages
CSC 247 Class Lecture
No ratings yet
CSC 247 Class Lecture
33 pages
CS3350B Computer Architecture: Marc Moreno Maza
100% (1)
CS3350B Computer Architecture: Marc Moreno Maza
45 pages
Computer Abstractions and Technology: The Hardware/Software Interface 5
No ratings yet
Computer Abstractions and Technology: The Hardware/Software Interface 5
52 pages
Chapter 01
No ratings yet
Chapter 01
49 pages
01 Abst
No ratings yet
01 Abst
64 pages
Chapter 01
No ratings yet
Chapter 01
49 pages
CS 258 Parallel Computer Architecture: CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley
No ratings yet
CS 258 Parallel Computer Architecture: CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley
44 pages
Chapter1 Computer Abstractions and Technology
No ratings yet
Chapter1 Computer Abstractions and Technology
53 pages
KTMT
No ratings yet
KTMT
470 pages
Chapter 01 Modified
No ratings yet
Chapter 01 Modified
55 pages
CA0216D Chapter1B
No ratings yet
CA0216D Chapter1B
32 pages
ARM Computer Organization-Chapter01
No ratings yet
ARM Computer Organization-Chapter01
55 pages
Unit I Overview & Instructions: Cs6303-Computer Architecture
100% (1)
Unit I Overview & Instructions: Cs6303-Computer Architecture
16 pages
Mcgraw-Hill/Irwin Mcgraw-Hill/Irwin
No ratings yet
Mcgraw-Hill/Irwin Mcgraw-Hill/Irwin
66 pages
Performance of A Computer
No ratings yet
Performance of A Computer
83 pages
Patterson6e MIPS Ch01 PPT
No ratings yet
Patterson6e MIPS Ch01 PPT
49 pages
Computer Architecture, A Quantitative Approach - Hennessy, Patterson 4
No ratings yet
Computer Architecture, A Quantitative Approach - Hennessy, Patterson 4
912 pages
CMP2008 - NOTLAR - Chapter - 01 - 4UP
No ratings yet
CMP2008 - NOTLAR - Chapter - 01 - 4UP
13 pages
Fundamentals of Modern Computer Architecture: From Logic Gates to Parallel Processing
From Everand
Fundamentals of Modern Computer Architecture: From Logic Gates to Parallel Processing
Sam Steed
No ratings yet
Comp Organization
0% (1)
Comp Organization
49 pages
CompArch - Chapter One
No ratings yet
CompArch - Chapter One
9 pages
Virtual Report Processing: The Mapper Story
From Everand
Virtual Report Processing: The Mapper Story
Louis Schlueter
No ratings yet
How Practical Are Fault Injection Attacks Really
No ratings yet
How Practical Are Fault Injection Attacks Really
9 pages
Motherboard Manual For Gigabyte GA-586TX
No ratings yet
Motherboard Manual For Gigabyte GA-586TX
112 pages
JNTUH COA Unit 4
No ratings yet
JNTUH COA Unit 4
25 pages
Schiller AT-6 ECG - Service Manual
No ratings yet
Schiller AT-6 ECG - Service Manual
150 pages
Signal Integrity 1
No ratings yet
Signal Integrity 1
15 pages
Pc1602ar Dwa A Q Powertip
No ratings yet
Pc1602ar Dwa A Q Powertip
19 pages
BIOS Survival Guide by Jean-Paul Rodrigue and Phil Croucher
100% (6)
BIOS Survival Guide by Jean-Paul Rodrigue and Phil Croucher
50 pages
Presentacion 10 Barrybrei
No ratings yet
Presentacion 10 Barrybrei
43 pages
Netking: Ahsan Jameel Khan
No ratings yet
Netking: Ahsan Jameel Khan
20 pages
DRAM
No ratings yet
DRAM
24 pages
Resume Chintan Shukla
No ratings yet
Resume Chintan Shukla
3 pages
Semiconductor Devices Overview
No ratings yet
Semiconductor Devices Overview
10 pages
Data Storage
No ratings yet
Data Storage
9 pages
Introduction To Computing - Module 3 - Hardware Components of Personal Computer
No ratings yet
Introduction To Computing - Module 3 - Hardware Components of Personal Computer
42 pages
Signa HD, HDX & HDXT 1.5T Block Diagrams OP23
No ratings yet
Signa HD, HDX & HDXT 1.5T Block Diagrams OP23
135 pages
Cisf Hardware
No ratings yet
Cisf Hardware
32 pages
Hs-5220/Hs-5620: 100Mhz FSB / Vga / Lan / Sound
No ratings yet
Hs-5220/Hs-5620: 100Mhz FSB / Vga / Lan / Sound
55 pages
Samsung Electronics
No ratings yet
Samsung Electronics
3 pages
Cos Po1 Po2 Po3 Po4 Po5 Po6 Po7 Po8 Po9 Po10 Po11 Po 12 Pso1 Pso2 Co1 Co2 Co3 Co4 Co5 Average (Rounded To Nearest Integer)
No ratings yet
Cos Po1 Po2 Po3 Po4 Po5 Po6 Po7 Po8 Po9 Po10 Po11 Po 12 Pso1 Pso2 Co1 Co2 Co3 Co4 Co5 Average (Rounded To Nearest Integer)
127 pages
Unit 5
No ratings yet
Unit 5
33 pages
Computer Architecture 16 Marks
100% (1)
Computer Architecture 16 Marks
28 pages
IBM PC 300 GL Maintenance Manual
100% (1)
IBM PC 300 GL Maintenance Manual
88 pages
Computer Architecture Slide
No ratings yet
Computer Architecture Slide
352 pages
Components of A Computer: Device Description Keyboard
No ratings yet
Components of A Computer: Device Description Keyboard
31 pages
Memories Complete
No ratings yet
Memories Complete
51 pages
ME
No ratings yet
ME
37 pages
Semiconductor Assembly and Chip Mounting Machines Designed For High-Speed Operation and High Product Quality
No ratings yet
Semiconductor Assembly and Chip Mounting Machines Designed For High-Speed Operation and High Product Quality
6 pages
Computer Organization & Architecture (KCA 105) : DR Manmohan Mishra Associate Professor MCA, Department
No ratings yet
Computer Organization & Architecture (KCA 105) : DR Manmohan Mishra Associate Professor MCA, Department
87 pages
Approximate NoC and Memory Controller Architectures For GPGPU Accelerators
No ratings yet
Approximate NoC and Memory Controller Architectures For GPGPU Accelerators
15 pages
COA Assignment: Ques 1) Describe The Principles of Magnetic Disk
No ratings yet
COA Assignment: Ques 1) Describe The Principles of Magnetic Disk
8 pages