Computer Abstractions and Technology


COMPUTER ORGANIZATION AND DESIGN

Chapter 1
Computer Abstractions
and Technology

Lecture slides are adapted/modified from slides provided by the textbook,
Computer Organization and Design, by David A. Patterson and John L. Hennessy,
published by Morgan Kaufmann Publishers.

The Computer Revolution
◼ Progress in computer technology
◼ Underpinned by Moore’s Law: the number of transistors per chip doubles every two years
◼ Makes novel applications feasible
◼ Computers in automobiles
◼ Cell phones
◼ Human genome project
◼ World Wide Web
◼ Search engines
◼ Computers are pervasive

Chapter 1 — Computer Abstractions and Technology — 2


Classes of Computers
◼ Personal computers
◼ General purpose, variety of software
◼ Subject to cost/performance tradeoff

◼ Server computers
◼ Network based
◼ High capacity, performance, reliability
◼ Range from small servers to building-sized installations



Classes of Computers
◼ Supercomputers
◼ High-end scientific and engineering
calculations
◼ Highest capability but represent a small
fraction of the overall computer market

◼ Embedded computers
◼ Hidden as components of systems
◼ Stringent power/performance/cost constraints





The PostPC Era
◼ Personal Mobile Device (PMD)
◼ Battery operated
◼ Connects to the Internet
◼ Hundreds of dollars
◼ Smart phones, tablets, electronic glasses
◼ Cloud computing
◼ Warehouse Scale Computers (WSC)
◼ Software as a Service (SaaS) (web search, social
networking)
◼ Portion of software run on a PMD and a
portion run in the Cloud
◼ Amazon and Google
Cloud Computing
Cloud computing refers to
(1) a large collection of servers that provide services over the Internet, and
(2) a dynamically varying number of servers provided as a utility.

SaaS: a portion of the code runs on the PMD and a portion runs in the Cloud.



What You Will Learn
◼ How programs are translated into the
machine language
◼ And how the hardware executes them
◼ The hardware/software interface
◼ What determines program performance
◼ And how it can be improved
◼ How hardware designers improve
performance
◼ What is parallel processing
Understanding Performance
◼ Algorithm
◼ Determines number of operations executed
◼ Programming language, compiler, architecture
◼ Determine number of machine instructions executed
per operation
◼ Processor and memory system
◼ Determine how fast instructions are executed
◼ I/O system (including OS)
◼ Determines how fast I/O operations are executed



Eight Great Ideas
◼ Design for Moore’s Law

◼ Use abstraction to simplify design

◼ Make the common case fast

◼ Performance via parallelism

◼ Performance via pipelining

◼ Performance via prediction

◼ Hierarchy of memories
◼ Dependability via redundancy



Below Your Program
◼ Application software
◼ Written in high-level language
◼ System software
◼ Compiler: translates HLL code to
machine code
◼ Operating System: service code
◼ Handling input/output
◼ Managing memory and storage
◼ Scheduling tasks & sharing resources
◼ Hardware
◼ Processor, memory, I/O controllers



Levels of Program Code
◼ High-level language
◼ Level of abstraction closer
to problem domain
◼ Provides for productivity
and portability
◼ Assembly language
◼ Textual representation of
instructions
◼ Hardware representation
◼ Binary digits (bits)
◼ Encoded instructions and
data



Components of a Computer
The BIG Picture
◼ Same components for all kinds of computer
◼ Desktop, server, embedded
◼ Input/output includes
◼ User-interface devices
◼ Display, keyboard, mouse
◼ Storage devices
◼ Hard disk, CD/DVD, flash
◼ Network adapters
◼ For communicating with
other computers



Touchscreen
◼ PostPC device
◼ Supersedes keyboard
and mouse
◼ Resistive and
Capacitive types
◼ Most tablets, smart
phones use capacitive
◼ Capacitive allows
multiple touches
simultaneously



Through the Looking Glass
◼ LCD screen: picture elements (pixels)
◼ Mirrors content of frame buffer memory



Opening the Box
Capacitive multitouch LCD screen

3.8 V, 25 Watt-hour battery

Computer board



Inside the Processor (CPU)
◼ Datapath: performs operations on data
◼ Control: sequences datapath, memory, ...
◼ Cache memory
◼ Small fast SRAM memory for immediate
access to data



Inside the Processor
◼ Apple A5



Abstractions
The BIG Picture

◼ Abstraction helps us deal with complexity


◼ Hide lower-level detail
◼ Instruction set architecture (ISA)
◼ The hardware/software interface
◼ Application binary interface
◼ The ISA plus system software interface
◼ Implementation
◼ The details underlying an interface
A Safe Place for Data
◼ Volatile main memory
◼ Loses instructions and data when power off
◼ Non-volatile secondary memory
◼ Magnetic disk
◼ Flash memory
◼ Optical disk (CDROM, DVD)



Networks
◼ Communication, resource sharing,
nonlocal access
◼ Local area network (LAN): Ethernet
◼ Wide area network (WAN): the Internet
◼ Wireless network: WiFi, Bluetooth



§1.5 Technologies for Building Processors and Memory
Technology Trends
◼ Electronics technology continues to evolve
◼ Increased capacity and performance
◼ Reduced cost
[Figure: DRAM capacity]

Year   Technology                    Relative performance/cost
1951   Vacuum tube                   1
1965   Transistor                    35
1975   Integrated circuit (IC)       900
1995   Very large scale IC (VLSI)    2,400,000
2013   Ultra large scale IC          250,000,000,000



Semiconductor Technology
◼ Silicon: semiconductor
◼ Add materials to transform properties:
◼ Conductors
◼ Insulators
◼ Switch



Manufacturing ICs

◼ Yield: proportion of working dies per wafer



Intel Core i7 Wafer

◼ 300mm wafer, 280 chips, 32nm technology


◼ Each chip is 20.7 x 10.5 mm
Integrated Circuit Cost
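$$\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$$

$$\text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}}$$

$$\text{Yield} = \frac{1}{\left(1 + \text{Defects per area} \times \text{Die area}/2\right)^2}$$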

◼ Nonlinear relation to area and defect rate


◼ Wafer cost and area are fixed
◼ Defect rate determined by manufacturing process
◼ Die area determined by architecture and circuit design
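
The three cost formulas above can be sketched in a few lines of Python. This is only an illustration: the wafer cost and defect density below are made-up values, and the 300 mm figure is assumed to be the wafer diameter. The function names are hypothetical, not from the textbook.

```python
import math


def dies_per_wafer(wafer_area_mm2: float, die_area_mm2: float) -> float:
    """Approximate die count as wafer area / die area (ignores edge loss)."""
    return wafer_area_mm2 / die_area_mm2


def die_yield(defects_per_mm2: float, die_area_mm2: float) -> float:
    """Yield = 1 / (1 + defects per area * die area / 2)^2."""
    return 1.0 / (1.0 + defects_per_mm2 * die_area_mm2 / 2.0) ** 2


def cost_per_die(cost_per_wafer: float, wafer_area_mm2: float,
                 die_area_mm2: float, defects_per_mm2: float) -> float:
    """Cost per die = cost per wafer / (dies per wafer * yield)."""
    dies = dies_per_wafer(wafer_area_mm2, die_area_mm2)
    return cost_per_wafer / (dies * die_yield(defects_per_mm2, die_area_mm2))


# Wafer and die sizes from the Core i7 slide; cost and defect rate are assumed.
wafer_area = math.pi * (300.0 / 2.0) ** 2   # 300 mm diameter wafer, in mm^2
die_area = 20.7 * 10.5                      # die size in mm^2
ASSUMED_WAFER_COST = 5000.0                 # hypothetical, in dollars
ASSUMED_DEFECTS_PER_MM2 = 0.001             # hypothetical defect density

print(f"dies per wafer (approx.): {dies_per_wafer(wafer_area, die_area):.0f}")
print(f"yield: {die_yield(ASSUMED_DEFECTS_PER_MM2, die_area):.3f}")
print(f"cost per die: {cost_per_die(ASSUMED_WAFER_COST, wafer_area, die_area, ASSUMED_DEFECTS_PER_MM2):.2f}")
```

Because the area-ratio approximation ignores partial dies at the wafer edge, it gives a somewhat higher count than the 280 chips quoted for the Core i7 wafer.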



§1.6 Performance
Defining Performance
◼ Which airplane has the best performance?

[Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 by passenger capacity, cruising range (miles), cruising speed (mph), and passengers × mph]



Response Time and Throughput
◼ Response time
◼ How long it takes to do a task
◼ Throughput
◼ Total work done per unit time
◼ e.g., tasks/transactions/… per hour
◼ How are response time and throughput affected
by
◼ Replacing the processor with a faster version?
◼ Adding more processors?
◼ We’ll focus on response time for now…



Relative Performance
◼ Define Performance = 1/Execution Time
◼ “X is n times faster than Y”

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

◼ Example: time taken to run a program


◼ 10s on A, 15s on B
◼ $\text{Execution Time}_B / \text{Execution Time}_A = 15\,\text{s} / 10\,\text{s} = 1.5$
◼ So A is 1.5 times faster than B
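
As a quick check of the definition above, the ratio can be computed directly. The helper name `relative_performance` is made up for this sketch.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y (n = time_Y / time_X)."""
    return time_y / time_x


# Values from the slide: the program takes 10 s on A and 15 s on B.
n = relative_performance(time_x=10.0, time_y=15.0)
print(f"A is {n} times faster than B")
```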
Measuring Execution Time
◼ Elapsed time
◼ Total response time, including all aspects
◼ Processing, I/O, OS overhead, idle time
◼ Determines system performance
◼ CPU time
◼ Time spent processing a given job
◼ Discounts I/O time, other jobs’ shares
◼ Comprises user CPU time and system CPU
time
◼ Different programs are affected differently by
CPU and system performance
CPU Clocking
◼ Operation of digital hardware governed by a
constant-rate clock
[Figure: clock timing diagram showing the clock period, clock cycles, data transfer and computation, and state update]

◼ Clock period: duration of a clock cycle
◼ e.g., 250ps = 0.25ns = $250 \times 10^{-12}$ s
◼ Clock frequency (rate): cycles per second
◼ e.g., 4.0GHz = 4000MHz = $4.0 \times 10^{9}$ Hz
CPU Time
$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$
◼ Performance improved by
◼ Reducing number of clock cycles
◼ Increasing clock rate
◼ Hardware designer must often trade off clock
rate against cycle count



CPU Time Example
◼ Computer A: 2GHz clock, 10s CPU time
◼ Designing Computer B
◼ Aim for 6s CPU time
◼ Can do faster clock, but causes 1.2 × clock cycles
◼ How fast must Computer B clock be?
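$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$$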
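
The arithmetic of this example can be reproduced with a short Python sketch. The function names are hypothetical. The inputs are the ones given on the slide.

```python
def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """Clock cycles = CPU time * clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles: float, target_cpu_time_s: float) -> float:
    """Clock rate = clock cycles / CPU time."""
    return cycles / target_cpu_time_s


cycles_a = clock_cycles(cpu_time_s=10.0, clock_rate_hz=2e9)   # Computer A
cycles_b = 1.2 * cycles_a                                     # B needs 1.2x the cycles
rate_b = required_clock_rate(cycles_b, target_cpu_time_s=6.0)

print(f"Clock cycles on A: {cycles_a:.3g}")
print(f"Required clock rate for B: {rate_b / 1e9:.1f} GHz")
```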
Instruction Count and CPI
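$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$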
◼ Instruction Count for a program
◼ Determined by program, ISA and compiler
◼ Average cycles per instruction
◼ Determined by CPU hardware
◼ If different instructions have different CPI
◼ Average CPI affected by instruction mix



CPI Example
◼ Computer A: Cycle Time = 250ps, CPI = 2.0
◼ Computer B: Cycle Time = 500ps, CPI = 1.2
◼ Same ISA
◼ Which is faster, and by how much?
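$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps} \quad \text{(A is faster…)}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2 \quad \text{(…by this much)}$$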
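
A minimal Python sketch of the same comparison follows. Since both computers run the same program on the same ISA, the instruction count I cancels, so the code compares time per instruction only. The helper name is an assumption made for this example.

```python
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average time per instruction in picoseconds: CPI * cycle time."""
    return cpi * cycle_time_ps


time_a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250.0)   # Computer A
time_b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500.0)   # Computer B

faster = "A" if time_a < time_b else "B"
ratio = max(time_a, time_b) / min(time_a, time_b)
print(f"{faster} is faster by a factor of {ratio:.1f}")
```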
CPI in More Detail
◼ If different instruction classes take different
numbers of cycles
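$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$

◼ Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$

where $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of instruction class $i$.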



CPI Example
◼ Alternative compiled code sequences using
instructions in classes A, B, C
Class               A   B   C
CPI for class       1   2   3
IC in sequence 1    2   1   2    (2+1+2 = 5 instructions)
IC in sequence 2    4   1   1    (4+1+1 = 6 instructions)

◼ Sequence 1: IC = 5
◼ Clock Cycles = 2×1 + 1×2 + 2×3 = 10
◼ Avg. CPI = 10/5 = 2.0
◼ Sequence 2: IC = 6
◼ Clock Cycles = 4×1 + 1×2 + 1×3 = 9
◼ Avg. CPI = 9/6 = 1.5
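
The two code sequences can be checked with a small Python sketch of the weighted-CPI calculation. The dictionary layout and function names are choices made for this illustration. The per-class CPI values and instruction counts come from the table above.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts: dict) -> int:
    """Sum of CPI_i * instruction count_i over all instruction classes."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts: dict) -> float:
    """Weighted average CPI = total clock cycles / total instruction count."""
    return total_cycles(counts) / sum(counts.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    print(name, "cycles =", total_cycles(seq), "avg CPI =", average_cpi(seq))
```

Sequence 2 executes more instructions but finishes in fewer cycles, which is why instruction count alone is not a reliable performance measure.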
Performance Summary
The BIG Picture

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

◼ Performance depends on
◼ Algorithm: affects IC, possibly CPI
◼ Programming language: affects IC, CPI
◼ Compiler: affects IC, CPI
◼ Instruction set architecture: affects IC, CPI, Tc



§1.7 The Power Wall
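Power Trends
[Figure: power trends chart; labels: "More complex pipeline", "Simpler pipeline", "Core 2"]

◼ In CMOS IC technology, the primary source of energy consumption is dynamic energy: the energy consumed when transistors switch on->off or off->on, at a rate controlled by the clock frequency.

$$\text{Dynamic Power} = 0.5 \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

Figure annotations: power ×30, voltage 5V → 1V, frequency ×1000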
Reducing Power
◼ Suppose a new CPU has
◼ 85% of capacitive load of old CPU
◼ 15% voltage and 15% frequency reduction
$$\frac{P_{new}}{P_{old}} = \frac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^2 \times F_{old} \times 0.85}{C_{old} \times V_{old}^2 \times F_{old}} = 0.85^4 = 0.52$$

◼ The power wall


◼ We can’t reduce voltage further
◼ We can’t remove more heat
◼ How else can we improve performance?
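
The power-ratio calculation on this slide can be verified with a short Python sketch. It assumes the dynamic power model Power = 0.5 × capacitive load × voltage² × frequency. The function names are hypothetical.

```python
def dynamic_power(capacitive_load: float, voltage: float, frequency: float) -> float:
    """Dynamic power = 0.5 * C * V^2 * f."""
    return 0.5 * capacitive_load * voltage ** 2 * frequency


def power_ratio(cap_scale: float, volt_scale: float, freq_scale: float) -> float:
    """P_new / P_old when C, V and f are each scaled by the given factors."""
    return cap_scale * volt_scale ** 2 * freq_scale


# New CPU: 85% of the capacitive load, 15% lower voltage, 15% lower frequency.
ratio = power_ratio(cap_scale=0.85, volt_scale=0.85, freq_scale=0.85)
print(f"P_new / P_old = {ratio:.2f}")
```

The 0.5 constant and the old values of C, V and f cancel in the ratio, so only the scale factors matter.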
§1.8 The Sea Change: The Switch to Multiprocessors
Uniprocessor Performance

Constrained by power, instruction-level parallelism, and memory latency



Multiprocessors
◼ Multicore microprocessors
◼ More than one processor per chip
◼ Requires explicitly parallel programming
◼ Compare with instruction level parallelism
◼ Hardware executes multiple instructions at once
◼ Hidden from the programmer
◼ Hard to do
◼ Programming for performance
◼ Load balancing
◼ Optimizing communication and synchronization



SPEC CPU Benchmark
◼ Programs used to measure performance
◼ Supposedly typical of actual workload
◼ Standard Performance Evaluation Corp (SPEC)
◼ Develops benchmarks for CPU, I/O, Web, …
◼ SPEC CPU2006
◼ Elapsed time to execute a selection of programs
◼ Negligible I/O, so focuses on CPU performance
◼ Normalize relative to reference machine
◼ Summarize as geometric mean of performance ratios
◼ CINT2006 (integer) and CFP2006 (floating-point)

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
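
The geometric-mean summary can be sketched as follows. The ratios in the example are invented for illustration and are not SPEC results. The function name is an assumption.

```python
import math


def geometric_mean(ratios: list) -> float:
    """n-th root of the product of n execution time ratios."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical execution time ratios relative to a reference machine.
example_ratios = [2.0, 8.0]
print(f"geometric mean = {geometric_mean(example_ratios):.2f}")
```

One reason to prefer the geometric mean for normalized ratios is that the ranking of two machines does not depend on which machine is chosen as the reference.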



CINT2006 for Intel Core i7 920



SPEC Power Benchmark
◼ Power consumption of server at different
workload levels
◼ Performance: ssj_ops
◼ Power: Watts (Joules/sec)

$$\text{Overall ssj\_ops per Watt} = \frac{\sum_{i=0}^{10} \text{ssj\_ops}_i}{\sum_{i=0}^{10} \text{power}_i}$$

ssj_ops/watt (server side Java operations per second per watt)
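
A minimal sketch of the overall metric: sum the throughput over all measured load levels and divide by the summed power. The sample numbers are made up and use only four load levels for brevity, whereas the formula above sums over eleven (i = 0 to 10). The function name is hypothetical.

```python
def overall_ssj_ops_per_watt(ssj_ops: list, power_watts: list) -> float:
    """Sum of ssj_ops over all load levels divided by the sum of power."""
    if len(ssj_ops) != len(power_watts):
        raise ValueError("need one power reading per load level")
    return sum(ssj_ops) / sum(power_watts)


# Hypothetical measurements at decreasing load levels, ending at idle.
example_ops = [300.0, 200.0, 100.0, 0.0]
example_power = [100.0, 80.0, 60.0, 60.0]
print(overall_ssj_ops_per_watt(example_ops, example_power))
```

Note that the idle level contributes power but no operations, so a server that draws a lot of power at idle scores worse on this metric.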



SPECpower_ssj2008 for Xeon X5650



Pitfall: Amdahl’s Law
◼ Improving an aspect of a computer and
expecting a proportional improvement in
overall performance
$$T_{improved} = \frac{T_{affected}}{\text{improvement factor}} + T_{unaffected}$$

◼ Example: multiply accounts for 80s/100s
◼ How much improvement in multiply performance to get 5× overall?

$$20 = \frac{80}{n} + 20$$

◼ Can’t be done!
◼ Corollary: make the common case fast
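
The multiply example can be worked through in Python. This sketch solves the Amdahl's Law equation for the improvement factor and reports when no factor is large enough. The function names and the use of `None` to signal "cannot be done" are choices made for this illustration.

```python
def improved_time(t_affected: float, t_unaffected: float, factor: float) -> float:
    """T_improved = T_affected / improvement factor + T_unaffected."""
    return t_affected / factor + t_unaffected


def required_factor(t_affected: float, t_unaffected: float, target_time: float):
    """Improvement factor needed to reach target_time, or None if unreachable."""
    remaining = target_time - t_unaffected
    if remaining <= 0:
        return None   # the unaffected part alone already uses up the target time
    return t_affected / remaining


# Multiply takes 80 s of a 100 s run; a 5x overall speedup means 20 s total.
print(required_factor(t_affected=80.0, t_unaffected=20.0, target_time=20.0))
# A 2.5x overall speedup (40 s total) is reachable.
print(required_factor(t_affected=80.0, t_unaffected=20.0, target_time=40.0))
```

The first call returns `None` because the 20 s of non-multiply work already equals the whole target, so 80/n would have to be zero.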
Fallacy: Low Power at Idle
◼ Look back at the SPECpower benchmark (Xeon X5650)
◼ At 100% load: 258W
◼ At 50% load: 170W (66%)
◼ At 10% load: 121W (47%)
◼ Google data center
◼ Mostly operates at 10% – 50% load
◼ At 100% load less than 1% of the time
◼ Consider designing processors to make
power proportional to load



Pitfall: MIPS as a Performance Metric
◼ MIPS: Millions of Instructions Per Second
◼ Doesn’t account for
◼ Differences in ISAs between computers
◼ Differences in complexity between instructions

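$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$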
◼ CPI varies between programs on a given CPU
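
To see why MIPS depends on the program being run, the two equivalent forms of the formula can be sketched in Python. The clock rate and CPI values below are invented for illustration, and the function names are hypothetical.

```python
def mips_from_counts(instruction_count: float, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time * 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_cpi(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


# Same hypothetical 4 GHz CPU, two programs with different average CPI.
print(mips_from_cpi(clock_rate_hz=4e9, cpi=2.0))
print(mips_from_cpi(clock_rate_hz=4e9, cpi=1.0))
```

The same processor reports a different MIPS rating for each program, so a single MIPS figure cannot characterize it.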
Concluding Remarks
◼ Cost/performance is improving
◼ Due to underlying technology development
◼ Hierarchical layers of abstraction
◼ In both hardware and software
◼ Instruction set architecture
◼ The hardware/software interface
◼ Execution time: the best performance
measure
◼ Power is a limiting factor
◼ Use parallelism to improve performance