Chapter 1 Computer Abstractions and Technology

The document discusses computer organization and design. It covers topics like computer abstractions, the components of a computer, processor performance, and measuring execution time. The document contains detailed information and examples related to computer hardware and software.


COMPUTER ORGANIZATION AND DESIGN

5th
Edition
The Hardware/Software Interface

Chapter 1
Computer Abstractions
and Technology
§1.1 Introduction
The Computer Revolution
 Progress in computer technology
 Underpinned by Moore’s Law
 The number of transistors in an IC doubles every 18-24 months
 Makes novel applications feasible
 Cell phones
 World Wide Web
 Search Engines
 Self-driving cars, drones
 VR games
 Computers are pervasive
Chapter 1 — Computer Abstractions and Technology — 2
Classes of Computers
 Desktop computers
 General purpose, variety of software
 Subject to cost/performance tradeoff
 Server computers
 Network based
 High capacity, performance, reliability
 Range from small servers to building-sized installations
 Embedded computers
 Hidden as components of systems
 Stringent power/performance/cost constraints

Processor Market - The PostPC Era

Understanding Performance
 Algorithm
 Determines number of operations executed
 Programming language, compiler
 Determine number of machine instructions for each source-level statement
 Processor and memory system
 Determine how fast machine instructions are executed
 I/O system (including OS)
 Determines how fast I/O operations are executed

§1.2 Eight Great Ideas in Computer Architecture
Eight Great Ideas
 Design for Moore’s Law

 Use abstraction to simplify design

 Make the common case fast

 Performance via parallelism

 Performance via pipelining

 Performance via prediction

 Hierarchy of memories

 Dependability via redundancy

§1.2 Below Your Program
Below Your Program
 Application software
 Written in high-level language
 System software
 Compiler: translates high-level language (HLL) code to machine code
 Operating System: service code
 Handling input/output
 Managing memory and storage
 Scheduling tasks & sharing resources
 Hardware
 Processor, memory, I/O controllers

Levels of Program Code
 High-level language
 Level of abstraction closer to problem domain
 Provides for productivity and portability
 Assembly language
 Textual representation of instructions
 Hardware representation
 Binary digits (bits)
 Encoded instructions and data

§1.3 Under the Covers
Components of a Computer
The BIG Picture
 Same components for all kinds of computer
 Desktop, server, embedded
 Input/output includes
 User-interface devices
 Display, keyboard, mouse
 Storage devices
 Hard disk, CD/DVD, flash
 Network adapters
 For communicating with other computers
[Figure: block diagram of the components: Input, Output, Processor, Memory]

Anatomy of a Computer

[Figure: computer with labels for an output device, a network cable, and two input devices]

Opening the Box

Inside the Processor (CPU)
 AMD Barcelona: 4 processor cores

Cache memory: small, fast SRAM memory for immediate access to data

Abstractions
The BIG Picture

 Abstraction helps us deal with complexity
 Hide lower-level detail
 Instruction set architecture (ISA)
 The hardware/software interface
 Instruction set, register set, addressing modes, interrupts, etc.

Technology Trend
Year   Technology                   Relative performance/cost
1951   Vacuum tube                  1
1965   Transistor                   35
1975   Integrated circuit (IC)      900
1995   Very large scale IC (VLSI)   2,400,000
2005   Ultra large scale IC         6,200,000,000

 Technology continues to evolve
 Increased capacity and performance
 Reduced cost
[Figure: DRAM capacity]

§1.7 Real Stuff: The AMD Opteron X4
Manufacturing ICs

 Yield: proportion of working dies per wafer

AMD Opteron X2 Wafer
 12-inch wafer, 117 chips, 90nm technology
Intel Core i7 Wafer
 12-inch wafer, 280 chips, 32nm technology
 Each chip is 20.7 x 10.5 mm

§1.4 Performance
Defining Performance
 Which airplane has the best performance?

[Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 by Passenger Capacity, Cruising Range (miles), Cruising Speed (mph), and Passengers x mph]

Response Time and Throughput
 Response time
 How long it takes to do a task
 Throughput
 Total work done per unit time
 e.g., tasks/transactions/… per hour
 How are response time and throughput affected by
 Replacing the processor with a faster version?
 Adding more processors?
 We’ll focus on response time for now…

Relative Performance
 Define Performance = 1/Execution Time
 “X is n times faster than Y”
\[
\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n
\]
 Example: time taken to run a program
 10s on A, 15s on B
 Execution Time_B / Execution Time_A = 15s / 10s = 1.5
 So A is 1.5 times faster than B
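The relative-performance definition can be checked numerically. The following is a minimal Python sketch; the function name `relative_performance` is an assumption made for illustration and does not come from the slides.

```python
def relative_performance(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Performance = 1 / execution time, so
    Performance_X / Performance_Y = exec_time_y / exec_time_x.
    """
    return exec_time_y / exec_time_x


# Slide example: the program takes 10 s on A and 15 s on B.
n = relative_performance(exec_time_x=10.0, exec_time_y=15.0)
print(f"A is {n} times faster than B")
```

Note that the slower machine's time goes in the numerator, because a shorter execution time means higher performance.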
Measuring Execution Time
 Elapsed time
 Total response time, including all aspects
 CPU processing, I/O, OS overhead, idle time
 Determines system performance

 CPU time (the time we focus on)
 Comprises user CPU time and system CPU time
 We focus on user CPU time: the CPU time spent executing our application's own code

CPU Clocking
 Digital hardware governed by a constant-rate clock
[Figure: clock timing diagram showing the clock period, clock cycles, data transfer/calculation, and state update]
 Clock period (cycle time): duration of a clock cycle
 e.g., 250 ps = 0.25 ns = 250×10^-12 s
 Note: s, ms (10^-3), µs (10^-6), ns (10^-9), ps (10^-12)
 Clock rate (frequency): cycles per second, the reciprocal of the clock period
 e.g., 4.0 GHz = 4000 MHz = 4.0×10^9 Hz
 Note: Hz, kHz (10^3), MHz (10^6), GHz (10^9), THz (10^12)
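Since clock rate and clock period are reciprocals, converting between them is a single division once the unit prefixes are expressed in base units. A small Python sketch follows; the helper names and the `PS`/`GHZ` constants are assumptions introduced for illustration.

```python
def period_to_rate_hz(clock_period_s: float) -> float:
    """Clock rate (Hz) is the reciprocal of the clock period (seconds)."""
    return 1.0 / clock_period_s


def rate_to_period_s(clock_rate_hz: float) -> float:
    """Clock period (seconds) is the reciprocal of the clock rate (Hz)."""
    return 1.0 / clock_rate_hz


PS = 1e-12   # one picosecond, in seconds
GHZ = 1e9    # one gigahertz, in hertz

# Slide values: a 250 ps clock period corresponds to a 4.0 GHz clock rate.
print(period_to_rate_hz(250 * PS) / GHZ)   # approximately 4.0 (GHz)
print(rate_to_period_s(4.0 * GHZ) / PS)    # approximately 250 (ps)
```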
CPU Time
\[
\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}
\]

 Performance improved by
 Reducing number of clock cycles
 Increasing clock rate
 Hardware designer must often trade off clock rate against cycle count

CPU Time Example
 Computer A: 2GHz clock, 10s CPU time
 Designing Computer B
 Aim for 6s CPU time
 A faster clock is possible, but it causes 1.2× as many clock cycles
 How fast must Computer B clock be? (i.e., clock rate?)
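One way to work this example uses the relation CPU Time = Clock Cycles / Clock Rate. First recover the cycle count of Computer A, then scale it by 1.2 for Computer B:

\[
\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9
\]
\[
\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}
\]

So Computer B needs a 4 GHz clock, twice the clock rate of Computer A.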

Instruction Count and CPI
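\[
\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}
\]
\[
\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}
\]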

 Instruction Count for a program


 Determined by program (language/algorithm), compiler, ISA
 Average cycles per instruction
 Determined by CPU hardware
 If different instructions have different CPI
 Average CPI affected by instruction mix
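The CPU time relation on this slide translates directly into code. The sketch below is illustrative only: the function name `cpu_time_s` and the program parameters are hypothetical values chosen to make the arithmetic easy to follow.

```python
def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU Time = Instruction Count x CPI / Clock Rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 10 billion instructions, average CPI of 2.0, 4 GHz clock.
t = cpu_time_s(instruction_count=10e9, cpi=2.0, clock_rate_hz=4e9)
print(f"CPU time: {t} s")

# Halving the average CPI (for example, through a better instruction mix)
# halves the CPU time when the other two factors stay fixed.
t_better = cpu_time_s(instruction_count=10e9, cpi=1.0, clock_rate_hz=4e9)
print(f"CPU time with CPI 1.0: {t_better} s")
```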

CPI Example
 Computer A: Cycle Time = 250ps, CPI = 2.0
 Computer B: Cycle Time = 500ps, CPI = 1.2
 A and B implement the same ISA, so the instruction count I is the same on both
 Same program running on A and B: which is faster, and by how much?
\[
\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}
\]
\[
\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}
\]
 A is faster, by this much:
\[
\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2
\]

CPI in More Detail
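 If different instruction classes take different numbers of cycles
\[
\text{Clock Cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{Instruction Count}_i \right)
\]
 Weighted average CPI
\[
\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)
\]
 The ratio Instruction Count_i / Instruction Count is the relative frequency of class i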

CPI Example
 Alternative compiled code sequences using instructions in classes A, B, C

Class              A   B   C
CPI for class      1   2   3
IC in sequence 1   2   1   2
IC in sequence 2   4   1   1

 Sequence 1: IC = 5
 Clock Cycles = 2×1 + 1×2 + 2×3 = 10
 Avg. CPI = 10/5 = 2.0
 Sequence 2: IC = 6
 Clock Cycles = 4×1 + 1×2 + 1×3 = 9
 Avg. CPI = 9/6 = 1.5
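The two code sequences can be recomputed with a short Python sketch of the weighted-average CPI calculation. The per-class CPI values and instruction counts are taken from the table in this example; the names `CPI_PER_CLASS`, `clock_cycles`, and `average_cpi` are assumptions for illustration.

```python
# CPI for each instruction class, from the example table.
CPI_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def clock_cycles(instruction_counts: dict) -> int:
    """Sum of CPI_i x Instruction Count_i over all classes."""
    return sum(CPI_PER_CLASS[cls] * count for cls, count in instruction_counts.items())


def average_cpi(instruction_counts: dict) -> float:
    """Total clock cycles divided by total instruction count."""
    return clock_cycles(instruction_counts) / sum(instruction_counts.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    print(name, "cycles =", clock_cycles(seq), "avg CPI =", average_cpi(seq))
```

Sequence 2 executes more instructions but needs fewer clock cycles, which is why instruction count alone does not determine performance.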
Performance Summary
The BIG Picture
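\[
\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}
\]
The three factors are IC, CPI, and Tcycle (= 1/CR, where CR is the clock rate), respectively.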

 Performance depends on
 Algorithm: affects IC, possibly CPI
 Programming language: affects IC, CPI
 Compiler: affects IC, CPI
 Instruction set architecture: affects IC, CPI, Tc

§1.5 The Power Wall
Power Trends

 Power consumption goes with clock rate
 In CMOS IC technology
\[
\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}
\]
Slide annotations: Power ×22, Voltage 5V → 1V, Frequency ×300
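Because power depends on the square of the voltage, lowering the supply voltage offsets a rising clock frequency. The Python sketch below compares a new design against an old one using ratios; the function name `relative_power` and the scaling factors are hypothetical and are not figures from the slides.

```python
def relative_power(cap_load_ratio: float, voltage_ratio: float, freq_ratio: float) -> float:
    """Ratio P_new / P_old under Power = Capacitive load x Voltage^2 x Frequency.

    Each argument is the new value divided by the old value.
    """
    return cap_load_ratio * voltage_ratio ** 2 * freq_ratio


# Hypothetical scaling: same capacitive load, half the voltage, twice the frequency.
ratio = relative_power(cap_load_ratio=1.0, voltage_ratio=0.5, freq_ratio=2.0)
print(f"New design uses {ratio:.2f}x the power of the old design")
```

Under these assumed numbers the clock rate doubles while power is halved, which illustrates why voltage reduction mattered so much until it could not be pushed further.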
Reducing Power

 The power wall


 We can’t reduce voltage further
 We can’t remove more heat
 How else can we improve performance?

§1.6 The Sea Change: The Switch to Multiprocessors
Uniprocessor Performance

Constrained by power, instruction-level parallelism, memory latency


 The response: turn to designs with multiple processors (cores)

Multiprocessors
 Multicore microprocessors
 More than one processor per chip
 Requires explicitly parallel programming
 Rewrite programs for parallelism
 Load balancing
 Optimizing communication and synchronization
 Comparison
 With instruction-level parallelism (pipeline)
 Hardware executes multiple instructions at once
 Hidden from the programmer (Note: In the past, programmers
could rely on innovations in hardware to double the performance
every 18 months without having to change a line of code)

SPEC CPU Benchmark
 Programs used to measure performance
 Supposedly typical of actual workload
 Standard Performance Evaluation Corp (SPEC)
 Develops benchmarks for CPU, I/O, Web, …
 SPEC CPU2006
 Elapsed time to execute a selection of programs
 Negligible I/O, so focuses on CPU performance
 Normalize relative to reference machine
 Summarize as geometric mean of performance ratios
 CINT2006 (integer) and CFP2006 (floating-point)

\[
\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}
\]

CINT2006 for Opteron X4 2356
Name         Description                     IC×10^9   CPI     Tc (ns)   Exec time (s)   Ref time (s)   SPECratio
perl         Interpreted string processing   2,118     0.75    0.40      637             9,777          15.3
bzip2        Block-sorting compression       2,389     0.85    0.40      817             9,650          11.8
gcc          GNU C Compiler                  1,050     1.72    0.40      724             8,050          11.1
mcf          Combinatorial optimization      336       10.00   0.40      1,345           9,120          6.8
go           Go game (AI)                    1,658     1.09    0.40      721             10,490         14.6
hmmer        Search gene sequence            2,783     0.80    0.40      890             9,330          10.5
sjeng        Chess game (AI)                 2,176     0.96    0.40      837             12,100         14.5
libquantum   Quantum computer simulation     1,623     1.61    0.40      1,047           20,720         19.8
h264avc      Video compression               3,102     0.80    0.40      993             22,130         22.3
omnetpp      Discrete event simulation       587       2.94    0.40      690             6,250          9.1
astar        Games/path finding              1,082     1.79    0.40      773             7,020          9.1
xalancbmk    XML parsing                     1,058     2.70    0.40      1,143           6,900          6.0
Geometric mean                                                                                          11.7

Ref time: provided by SPEC


SPECratio: Ref-time/Exec-time (The bigger the SPECratio, the faster the CPU is).
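The geometric mean in the last row can be reproduced from the SPECratio column. The Python sketch below copies the twelve ratios from the table; the names `SPEC_RATIOS` and `geometric_mean` are assumptions for illustration. It averages the logarithms, which is equivalent to taking the n-th root of the product.

```python
import math

# SPECratio values copied from the CINT2006 table for the Opteron X4 2356.
SPEC_RATIOS = {
    "perl": 15.3, "bzip2": 11.8, "gcc": 11.1, "mcf": 6.8,
    "go": 14.6, "hmmer": 10.5, "sjeng": 14.5, "libquantum": 19.8,
    "h264avc": 22.3, "omnetpp": 9.1, "astar": 9.1, "xalancbmk": 6.0,
}


def geometric_mean(values) -> float:
    """n-th root of the product of n positive values, computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


gm = geometric_mean(SPEC_RATIOS.values())
print(f"Geometric mean of SPECratios: {gm:.1f}")
```

The result rounds to the 11.7 shown in the table.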

§1.8 Fallacies and Pitfalls
Amdahl’s Law
 Pitfall: Improve an aspect of a computer and expect a
proportional improvement in overall performance
 Incorrect

 Amdahl’s Law:
\[
T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}
\]
 Example: multiply accounts for 80s/100s
 How much improvement in multiply performance to get 4× overall performance?
\[
\frac{100}{4} = \frac{80}{n} + 20 \;\Rightarrow\; n = 16
\]
 How about 5×?
\[
\frac{100}{5} = \frac{80}{n} + 20 \;\Rightarrow\; \text{Can't be done!}
\]
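The multiply example can be replayed in code. This Python sketch applies Amdahl's Law as stated on the slide and solves it for the required improvement factor n; the function names are assumptions for illustration, and `required_factor` returns `None` when no finite factor reaches the target.

```python
def improved_time(t_affected: float, t_unaffected: float, improvement_factor: float) -> float:
    """Amdahl's Law: T_improved = T_affected / improvement factor + T_unaffected."""
    return t_affected / improvement_factor + t_unaffected


def required_factor(t_total: float, t_affected: float, target_speedup: float):
    """Improvement factor needed on the affected part, or None if unreachable."""
    t_unaffected = t_total - t_affected
    # Time left over for the affected part once the unaffected part is paid for.
    budget = t_total / target_speedup - t_unaffected
    if budget <= 0:
        return None
    return t_affected / budget


print(required_factor(t_total=100.0, t_affected=80.0, target_speedup=4.0))  # 4x overall
print(required_factor(t_total=100.0, t_affected=80.0, target_speedup=5.0))  # 5x overall
```

For the 5× target the unaffected 20 s already uses the entire 20 s budget, so the function returns `None`, matching the "Can't be done!" conclusion.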
Example
 Suppose we enhance a machine making all floating-point instructions run five times faster. If the execution time of some benchmark before the floating-point enhancement is 10 sec, what will the speedup be if half of the 10 sec is spent executing floating-point instructions?

 We are looking for a benchmark to show off the new floating-point unit described above, and want the overall benchmark to show a speedup of 2. One benchmark we are considering runs for 100 sec with the old floating-point hardware. How much of the execution time would floating-point instructions have to account for in this program in order to yield the desired speedup on this benchmark?
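Both questions follow from Amdahl's Law, with floating-point time as the affected part and an improvement factor of 5. One possible working is sketched here.

For the first question, 5 sec are affected and 5 sec are unaffected:

\[
T_{\text{improved}} = \frac{5}{5} + 5 = 6\,\text{sec}, \qquad \text{Speedup} = \frac{10}{6} \approx 1.67
\]

For the second question, a speedup of 2 on a 100 sec run means a target of 50 sec. Let x be the floating-point time in seconds under the old hardware:

\[
\frac{x}{5} + (100 - x) = 50 \;\Rightarrow\; 0.8x = 50 \;\Rightarrow\; x = 62.5\,\text{sec}
\]

So floating-point instructions would have to account for 62.5 sec, or 62.5% of the original execution time.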

Fallacy: Low Power at Idle
 Look back at i7 power benchmark
 At 100% load: 258W
 At 50% load: 170W (66%)
 At 10% load: 121W (47%)
 Google data center
 Mostly operates at 10% – 50% load
 At 100% load less than 1% of the time
 Consider designing processors to make
power proportional to load

Pitfall: MIPS as a Performance Metric
 MIPS: Millions of Instructions Per Second
 Doesn’t account for
 Differences in ISAs between computers
 Differences in complexity between instructions
 Even with the same ISA, different programs on the same CPU have different MIPS
\[
\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}
\]
 CPI varies between programs on a given CPU
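A short Python sketch shows how a higher MIPS rating can accompany a longer execution time. The clock rate and the two compiled versions of a program below are hypothetical numbers chosen for illustration; they do not come from the slides.

```python
def mips(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = Clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


def execution_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Execution time = Instruction count x CPI / Clock rate."""
    return instruction_count * cpi / clock_rate_hz


CLOCK_RATE_HZ = 4e9  # hypothetical 4 GHz CPU

# Two hypothetical compilations of the same program on the same CPU.
versions = {
    "version 1": {"instruction_count": 5e9, "cpi": 2.0},
    "version 2": {"instruction_count": 8e9, "cpi": 1.5},
}

for name, v in versions.items():
    rating = mips(CLOCK_RATE_HZ, v["cpi"])
    seconds = execution_time_s(v["instruction_count"], v["cpi"], CLOCK_RATE_HZ)
    print(f"{name}: {rating:.0f} MIPS, {seconds:.2f} s")
```

Under these assumptions, version 2 has the higher MIPS rating yet takes longer to run, because it executes more instructions. Execution time remains the measure that matters.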

§1.9 Concluding Remarks
Concluding Remarks
 Cost/performance is improving
 Due to underlying technology development
 Hierarchical layers of abstraction
 In both hardware and software
 Instruction set architecture
 The hardware/software interface
 Execution time
 The best performance measure
 Power is a limiting factor
 Use parallelism to improve performance