
EECS 112 (Spring 2024)

Organization of Digital Computers

Chapter 01
Computer Abstraction and Technology
Hyoukjun Kwon
[email protected]
EECS 112 (Spring 2024)
Organization of Digital Computers

Section 1. Technology

2
Introduction: The Computer Revolution
§ Three revolutions for civilization:
• Agricultural revolution
• Industrial revolution
• Information revolution
o The computer revolution is its foundation

§ The computer revolution makes novel applications feasible; computers are pervasive:
• Computers in automobiles
• Cell phones
• Human genome project
• World Wide Web
• Search engines
3
Classes of Computers
§ Personal computers (“PC”)
• General purpose, variety of software
• Subject to cost/performance tradeoff

§ Server computers
• Network based
• High capacity, performance, reliability
• Range from small servers to building sized

§ Supercomputers
• High-end scientific and engineering calculations
• Highest capability but represent a small fraction of the overall computer market

§ Embedded computers
• Hidden as components of systems
• Stringent power/performance/cost constraints
4
The Post PC Era

5
The Post PC Era
§ Personal Mobile Device (PMD)
• Battery operated
• Connects to the Internet
• Hundreds of dollars
• Smart phones, tablets, electronic glasses

§ Cloud computing
• Warehouse Scale Computers (WSC)
• Software as a Service (SaaS)
• A portion of the software runs on a PMD, and a portion runs in the Cloud
• Amazon and Google

§ Others:
• AR/VR
• Autonomous driving
• …
6
Opening the Box

Inside iPhone XS Max

7
Opening the Box

Inside iPhone XS Max

8
Inside the Processor (CPU)
§ Datapath: performs operations on data (i.e., computation)

§ Control: sequences datapath, memory, and other components

§ On-chip Memory: Stores data near CPU cores


• Cache memory: Fast and small SRAM memory for immediate access to data

High-level comparison of on-chip and off-chip memory:
• Cache memories require 1-2 cycles for access (but small size: KB to 10s of MB range, subject to the chip size)
• DRAM (off-chip memory) requires 50-60 cycles* (but large size: GB range)

* S. Eyerman et al., “DRAM Bandwidth and Latency Stacks: Visualizing DRAM Bottlenecks.” ISPASS 2022 (Intel Paper) 9
Inside the Processor

Apple A12 System-on-Chip (SoC)

10
Networks
§ Functionalities
• Communication: exchange information
• Resource sharing
o e.g., many users can share one GPU server
• Nonlocal access to remote resources
o e.g., provide access to a computer server

§ Types of Networks
• Local area network (LAN)
o Wired: Ethernet
o Wireless: WiFi
• Wide area network (WAN): the Internet
• Personal area network (PAN)
o Wireless network: mainly Bluetooth

11
Memory and Storage
§ Volatile main memory (DRAM)
• Loses instructions and data when power off
§ Non-volatile secondary memory (storage)
• Flash memory (solid state drive; SSD)
• Magnetic disk (hard disk drive)
• Optical disk (CD-ROM, DVD)

No single option is ideal for all use cases; they all have trade-offs

12
Technology Trends
§ Electronics technology continues to evolve
• Increased capacity and performance
• Reduced cost

Year   Technology                   Relative performance/cost
1951   Vacuum tube                  1
1965   Transistor                   35
1975   Integrated circuit (IC)      900
1995   Very large scale IC (VLSI)   2,400,000
2013   Ultra large scale IC         250,000,000,000

§ Semiconductor Technology
• Built upon silicon
• Add materials (conductors and insulators) to transform properties
• Organize the structure to form transistors
• Transistors work as electrically controlled switches (conduct or insulate under specific conditions)

What does a transistor look like?

13
Modern Transistor: MOSFET
§ Metal Oxide Semiconductor Field-Effect Transistor
§ Insulator: a material where electric current doesn't flow freely (e.g., rubber blocks electricity)

[Figure: cross-sections of an n-channel MOSFET (n-doped source and drain on a p-substrate of silicon) and a p-channel MOSFET (p-doped source and drain on an n-substrate), each with a gate on top of a thin gate-oxide insulator]

• Figure source: M. Riordan et al., "The invention of the transistor." Reviews of Modern Physics 71.2, 1999. 14
Modern Transistor: MOSFET
§ Metal Oxide Semiconductor Field-Effect Transistor

[Figure: silicon lattice diagrams showing a "free" electron around a phosphorus (P) dopant atom and a "hole" around a boron (B) dopant atom, next to the n-channel and p-channel MOSFET cross-sections]

§ N-doped substrate
• Add atoms with five valence electrons (e.g., phosphorus)
• "Free" electrons can flow away
§ P-doped substrate
• Add atoms with three valence electrons (e.g., boron)
• "Holes" can accommodate incoming electrons

15
MOSFET as a Switch
VT: a voltage large enough to create an n-channel
§ How an n-type MOSFET works as a switch
• Gate voltage Vh > VT: electrons are attracted to the surface under the gate and form an "n-channel" between source and drain => "closed" switch (i.e., connected)
• Gate at GND: no n-channel forms => "open" switch (i.e., disconnected)

[Figure: n-channel MOSFET cross-sections with the gate at Vh > VT (channel present) and at GND (no channel)]

A p-type MOSFET works in the opposite way (if GND is applied to the gate, the switch is closed) 16
MOSFET as a Switch
§ "High" voltage (+5 V) on the gate == pushing the switch to close it: an n-channel forms and connects source and drain
§ "Low" voltage (0 V) on the gate == not pushing the switch: no n-channel forms, so the switch stays open

[Figure: N-type MOSFET cross-sections paired with mechanical push-switch analogies for the closed and open cases]

What you need to remember: a MOSFET is an electrically controlled switch 17


Two Types of MOSFET

[Figure: N-type MOSFET (n-doped source/drain on a p-substrate) and P-type MOSFET (p-doped source/drain on an n-substrate)]

<N-type MOSFET>                        <P-type MOSFET>
Gate Voltage   Connectivity            Gate Voltage   Connectivity
"Low"          Disconnected (Open)     "Low"          Connected (Closed)
"High"         Connected (Closed)      "High"         Disconnected (Open)

<Analogy>
N-type: a switch with a "short" spring; we need to push the switch (i.e., apply "high" voltage) to connect it
P-type: a switch with a "long" spring; we need to pull the switch (i.e., apply "high" voltage) to disconnect it

18
Switch Abstraction of MOSFET
VGS: the voltage between gate and source
VT: a threshold voltage for a connection

§ nMOS switch: source and drain connected if VGS > VT
§ pMOS switch: source and drain connected if VGS < VT (== inverted nMOS; the circle on the gate means "inversion")

[Figure: n-channel and p-channel MOSFET cross-sections next to their switch-symbol abstractions]

19
Building Logic Gates with Transistors

[Figure: inverter circuit between Vdd = 5 V and Vss = 0 V, with input In and output Out]

Input            Output
"Low" == 0 V     "High" == 5 V
"High" == 5 V    "Low" == 0 V

The input signal is "inverted" => "Inverter" or "NOT" gate

Take-away: we can build all logic gates using transistors 20
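To make the switch abstraction concrete, here is a minimal C sketch (illustrative, not from the slides) that models the nMOS and pMOS switches above and composes them into a NOT gate:

```c
#include <stdio.h>
#include <stdbool.h>

/* Switch abstraction from the slides: an nMOS switch conducts when its gate
   is "high"; a pMOS switch conducts when its gate is "low". */
static bool nmos_conducts(bool gate_high) { return gate_high; }
static bool pmos_conducts(bool gate_high) { return !gate_high; }

/* CMOS inverter: the pMOS connects the output to Vdd ("high") when the input
   is low; the nMOS connects the output to Vss ("low") when the input is high.
   Exactly one of the two switches is closed for any input. */
static bool not_gate(bool in) {
    if (pmos_conducts(in)) return true;  /* output tied to Vdd */
    return false;                        /* nmos_conducts(in): output tied to Vss */
}

int main(void) {
    printf("In=0 -> Out=%d\n", not_gate(false)); /* Out=1 */
    printf("In=1 -> Out=%d\n", not_gate(true));  /* Out=0 */
    return 0;
}
```

Placing such switch networks in series or in parallel in the same style yields NAND and NOR gates, which is the sense in which all logic gates can be built from transistors.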


Manufacturing ICs

Yield: proportion of working dies per wafer


* Resource: From Sand to Silicon: The Making of a Microchip (by Intel) https://youtu.be/_VMYPLXnd7E?si=IFXOEHxxqL9TvPSn 21
Intel® Core 10th Gen “Ice Lake” CPUs Wafer
§ 300mm wafer, 506 chips, 10nm
technology
§ Each chip is 11.4 x 10.7 mm

22
Integrated Circuit Cost

Cost per die = Cost per wafer / (Dies per wafer × Yield)

Dies per wafer ≈ Wafer area / Die area

Yield = 1 / (1 + Defects per area × Die area)^N

• The yield formula comes from empirical observations of yields at IC factories; N is related to the number of critical processing steps

§ Nonlinear relation to area and defect rate
• Wafer cost and area are fixed
• Defect rate determined by manufacturing process
• Die area determined by architecture and circuit design
=> We should minimize the die area

23
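As a sanity check on these formulas, here is a minimal C sketch; the wafer cost, defect rate, and exponent N below are made-up illustrative numbers, not real process data (only the die size comes from the Ice Lake slide above):

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    /* Illustrative inputs (assumed values, not real process data) */
    double wafer_cost      = 10000.0;  /* dollars per wafer */
    double wafer_area      = 70686.0;  /* mm^2 (300 mm wafer: pi * 150^2) */
    double die_area        = 122.0;    /* mm^2 (~11.4 x 10.7 mm, slide 22) */
    double defects_per_mm2 = 0.001;    /* defects per unit area */
    double n               = 2.0;      /* process-complexity exponent N */

    double dies_per_wafer = wafer_area / die_area;
    double yield = 1.0 / pow(1.0 + defects_per_mm2 * die_area, n);
    double cost_per_die = wafer_cost / (dies_per_wafer * yield);

    printf("Dies per wafer: %.0f\n", dies_per_wafer);
    printf("Yield:          %.3f\n", yield);
    printf("Cost per die:   $%.2f\n", cost_per_die);
    return 0;
}
```

Doubling the die area in this sketch both halves the dies per wafer and lowers the yield, which is the nonlinear cost relation the slide refers to.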
EECS 112 (Spring 2024)
Organization of Digital Computers

Section 2. Abstraction

24
Seven Great Ideas in Computer Architecture

§ Use abstraction to simplify design
§ Make the common case fast
§ Performance via parallelism
§ Performance via pipelining
§ Performance via prediction
§ Hierarchy of memories
§ Dependability via redundancy

We will focus on these ideas in EECS 112

25
Below Your Program
§ Application software
• Written in high-level language (HLL)
§ System software
• Compiler: translates HLL code to machine code
• Operating System: service code
o Handling input/output
o Managing memory and storage
o Scheduling tasks & sharing resources
§ Hardware
• Processor, memory, I/O controllers

26
Levels of Program Code
§ High-level language (Python, C, C++, …)
• Level of abstraction closer to problem domain
• Provides for productivity and portability
§ Assembly language
• Textual representation of instructions
§ Hardware representation
• Binary digits (bits)
• Encoded instructions and data

Thanks to the multiple levels of abstraction, we don't have to deal with low-level details like raw bit encodings

27
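To illustrate the three levels, here is a minimal sketch of one C statement together with an assumed RISC-V-style translation in the comments (the exact assembly and bit encoding depend on the compiler and ISA, and the register assignments are hypothetical):

```c
#include <stdio.h>

int main(void) {
    int a = 3, b = 4;

    /* High-level language (C): close to the problem domain */
    int sum = a + b;

    /* Assembly language: a textual representation of the single instruction
       a compiler might emit for the addition (RISC-V, assuming a in x6,
       b in x7, and sum in x5):
           add x5, x6, x7
       Hardware representation: the 32 bits that encode that instruction:
           0000000 00111 00110 000 00101 0110011
           (funct7 rs2   rs1   f3  rd    opcode)                            */

    printf("sum = %d\n", sum);
    return 0;
}
```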
Components of a Computer
§ Same components for all kinds of computer
• Desktop, server, embedded, …

§ Input/output includes
• User-interface devices
o Display, keyboard, mouse, touch screen
• Storage devices
o Hard disk, CD/DVD, flash
• Network adapters
o For communicating with other computers
§ Processor
• Control + Datapath + on-chip memory (cache)

29
Components of a Computer

Thanks to the multiple levels of abstraction, programmers do not have to worry about hardware-level details.

However, to be a strong programmer who can write high-performance code, you should understand how the underlying hardware (CPU, GPU, etc.) works! 30
EECS 112 (Spring 2024)
Organization of Digital Computers

Section 3. Performance

31
Topics in Performance
§ Definition of performance in computer system
§ Relative performance based on the execution time
§ Clock frequency and period (clock cycle time)
§ CPU time and CPI
§ Performance Formula
§ Factors affecting the performance

32
Defining Performance
§ Which airplane has the best performance?

33
Performance Metrics: Response Time and Throughput
§ Response time (i.e., latency)
• How long it takes to complete a task
Vs.
§ Throughput
• Total work done per unit time
o e.g., tasks/transactions/… per hour

§ Latency- and throughput-oriented optimization strategies
• Latency-oriented (e.g., CPU): replace the processor with a faster version
• Throughput-oriented (e.g., GPU): add many cores (each simpler than a regular processor core); individual cores can be slower

NVIDIA's video on throughput:
https://youtu.be/-P28LKWTzrI?si=z3_I8AV0TG-fHhh1

§ We’ll focus on response time for now


34
Topics in Performance
§ Definition of performance in computer system
§ Relative performance based on the execution time
§ Clock frequency and period (clock cycle time)
§ CPU time and CPI
§ Performance Formula
§ Factors affecting the performance

35
Relative Performance
§ Define Performance = 1 / Execution Time
§ "X is n times faster than Y" means:

Performance_X / Performance_Y = Execution Time_Y / Execution Time_X = n

§ Example: time taken to run a program
• 10s on A, 15s on B
• Execution Time_B / Execution Time_A = 15s / 10s = 1.5
• So A is 1.5 times faster than B

36
Measuring Execution Time
§ Elapsed time (aka wall clock time or response time)
• Total response time, including all aspects (i.e., “end-to-end” latency)
o Includes processing (computation), I/O (data movement), OS overhead, idle time
(“Stalls”; to be discussed later in the lecture), and so on
• Determines system performance

§ CPU time
• Time spent processing a given job on a CPU
o Discounts I/O time, other jobs’ shares
• Consists of user CPU time (time spent on user-defined programs) and
system CPU time (time spent on OS/system services for running the program)
• Different programs are affected differently by CPU and system performance

Example: what if the OS takes too long for dynamic memory allocation (e.g., malloc)? 37
Topics in Performance
§ Definition of performance in computer system
§ Relative performance based on the execution time
§ Clock frequency and period (clock cycle time)
§ CPU time and CPI
§ Performance Formula
§ Factors affecting the performance

38
CPU Clocking
§ Operation of digital hardware is governed by a constant-rate clock

[Figure: clock waveform showing the clock period; data transfer and computation happen within a cycle, and state is updated at the cycle boundary]

§ Clock period (clock cycle time): duration of a clock cycle
• e.g., 250 ps = 0.25 ns = 250 × 10^-12 s

§ Clock frequency (clock rate): number of cycles per second
• e.g., 4.0 GHz = 4000 MHz = 4.0 × 10^9 Hz

39
39
Clock Frequency and Cycle
§ Frequency: how many times does a signal oscillate per second?
• Once a second => 1 Hz; the duration of one clock signal is 1 second (1 second / 1 clock = 1)
• Twice a second => 2 Hz; the duration of one clock signal is 0.5 second (1 second / 2 clocks = 0.5)

[Figure: 1 Hz and 2 Hz square-wave signals over a 1-second window]

Hz ("Hertz"): the number of oscillations per second (i.e., frequency)
Clock Cycle: the duration of one clock signal

40
40
Clock Frequency and Cycle
§ Frequency: how many times does a signal oscillate per second?
• N times per second => N Hz; the duration of one clock signal (clock cycle) is 1/N second (1 second / N clocks = 1/N)

[Figure: N Hz square-wave signal over a 1-second window]

Key Idea: the clock cycle must be longer than the critical path delay, because we sample values at the end of each clock cycle

41
Topics in Performance
§ Definition of performance in computer system
§ Relative performance based on the execution time
§ Clock frequency and period (clock cycle time)
§ CPU time and CPI
§ Performance Formula
§ Factors affecting the performance

42
CPU Time

CPU Time = CPU Clock Cycles × Clock Cycle Time
         = CPU Clock Cycles / Clock Rate
  (e.g., a 1 GHz clock rate corresponds to a 1 ns clock cycle time)

Clock Cycle Time (clock period) = 1 / Clock Rate

§ Performance can be improved by
• Reducing the number of clock cycles (i.e., decreasing the numerator above)
• Increasing the clock rate (i.e., increasing the denominator above)

Hardware designers often need to trade off clock rate against cycle count

43
43
CPU Time Example
§ Computer A: 2 GHz clock, 10 s CPU time
§ Designing Computer B
• Aim for 6 s CPU time
• Can use a faster clock, but that causes 1.2 × the clock cycles
§ How fast must the Computer B clock be?

Clock Rate_B = Clock Cycles_B / CPU Time_B = (1.2 × Clock Cycles_A) / 6s

Clock Cycles_A = CPU Time_A × Clock Rate_A = 10s × 2 GHz = 20 × 10^9

Clock Rate_B = (1.2 × 20 × 10^9) / 6s = (24 × 10^9) / 6s = 4 GHz
44
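A minimal sketch reproducing this calculation:

```c
#include <stdio.h>

int main(void) {
    double rate_a       = 2.0e9; /* Computer A: 2 GHz */
    double time_a       = 10.0;  /* Computer A: 10 s CPU time */
    double time_b       = 6.0;   /* target CPU time for Computer B */
    double cycle_growth = 1.2;   /* B needs 1.2x the clock cycles */

    double cycles_a = time_a * rate_a;                 /* 20e9 cycles */
    double rate_b   = cycle_growth * cycles_a / time_b;
    printf("Required clock rate for B: %.1f GHz\n", rate_b / 1e9); /* 4.0 */
    return 0;
}
```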
Instruction Count and CPI
§ Instruction Count for a program
• Determined by program, ISA, and compiler
o ISA (Instruction Set Architecture): defines the supported instructions, data types, registers, etc.
§ CPI: Clock cycles Per Instruction
• Average number of clock cycles each instruction takes when executing a program
• Determined by CPU hardware
• Each instruction has a different CPI: the average CPI is affected by the instruction mix
o e.g., 20% load/store + 80% compute vs. 30% load/store + 5% conditional + 65% compute

#Clock cycles = Instruction Count × Cycles per Instruction (CPI)

CPU Time = #Clock cycles × Clock Cycle Time
         = Instruction Count × CPI × Clock Cycle Time
         = (Instruction Count × CPI) / Clock Rate

45
Example: Using CPI to Compute CPU Time
§ Computer A: Cycle Time = 250 ps, CPI = 2.0
§ Computer B: Cycle Time = 500 ps, CPI = 1.2
§ Same ISA
§ Which is faster, and by how much?

CPU Time_A = Instruction Count × CPI_A × Cycle Time_A
           = I × 2.0 × 250 ps = I × 500 ps
CPU Time_B = Instruction Count × CPI_B × Cycle Time_B
           = I × 1.2 × 500 ps = I × 600 ps

CPU Time_B / CPU Time_A = (I × 600 ps) / (I × 500 ps) = 1.2

=> A is faster, by 1.2×

46
CPI in More Detail
§ If different instruction classes (e.g., integer add vs. floating-point add) take different numbers of cycles:

Clock Cycles = Σ_{i=1}^{n} (CPI_i × Instruction Count_i)

§ Weighted average CPI:

CPI = Clock Cycles / Instruction Count
    = Σ_{i=1}^{n} (CPI_i × (Instruction Count_i / Instruction Count))

The term (Instruction Count_i / Instruction Count) reflects the relative occurrence frequency of instruction class i

47
CPI Example
§ Alternative compiled code sequences using instructions in classes A, B, C

Instruction Class    A   B   C
CPI for class        1   2   3
IC in sequence 1     2   1   2
IC in sequence 2     4   1   1

§ Sequence 1: IC = 5
• Clock Cycles = 2×1 + 1×2 + 2×3 = 10
• Avg. CPI = 10/5 = 2.0
§ Sequence 2: IC = 6
• Clock Cycles = 4×1 + 1×2 + 1×3 = 9
• Avg. CPI = 9/6 = 1.5

* IC: Instruction Count 48
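A minimal sketch implementing the weighted-CPI formula from the previous slide for these two sequences:

```c
#include <stdio.h>

#define NUM_CLASSES 3

/* Average CPI given per-class CPIs and per-class instruction counts (IC). */
static double avg_cpi(const int cpi[], const int ic[]) {
    int cycles = 0, insts = 0;
    for (int i = 0; i < NUM_CLASSES; i++) {
        cycles += cpi[i] * ic[i]; /* Clock Cycles = sum(CPI_i * IC_i) */
        insts  += ic[i];
    }
    return (double)cycles / insts;
}

int main(void) {
    int cpi[NUM_CLASSES]  = {1, 2, 3}; /* classes A, B, C */
    int seq1[NUM_CLASSES] = {2, 1, 2};
    int seq2[NUM_CLASSES] = {4, 1, 1};
    printf("Sequence 1: avg CPI = %.1f\n", avg_cpi(cpi, seq1)); /* 2.0 */
    printf("Sequence 2: avg CPI = %.1f\n", avg_cpi(cpi, seq2)); /* 1.5 */
    return 0;
}
```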


Topics in Performance
§ Definition of performance in computer system
§ Relative performance based on the execution time
§ Clock frequency and period (clock cycle time)
§ CPU time and CPI
§ Performance Formula
§ Factors affecting the performance

49
Performance Summary

CPU Time = Seconds / Program
         = (Instructions / Program) × (Clock cycles / Instruction) × (Seconds / Clock cycle)

• (Instructions / Program): determined by the algorithm, programming language, compiler, and ISA
• (Clock cycles / Instruction): the CPI
• (Seconds / Clock cycle): the clock cycle time

§ Performance depends on
• Algorithm: affects Instruction Count (IC), possibly CPI
• Programming language: affects IC, CPI
• Compiler: affects IC, CPI
• Instruction Set Architecture (ISA): affects IC, CPI, clock rate
+) Microarchitecture (hardware implementation details)

50
Topics in Performance
§ Definition of performance in computer system
§ Relative performance based on the execution time
§ Clock frequency and period (clock cycle time)
§ CPU time and CPI
§ Performance Formula
§ Factors affecting the performance

51
Understanding Factors Affecting “Performance”
§ Algorithm
Ø What is the problem-solving strategy (Mathematics level)?

§ Programming Language, Compiler, and Architecture


Ø How will we generate low-level (very detailed) instructions for our computer to run
the algorithm?
Ø What kind of hardware modules do we have?

§ Microarchitecture: Processor and Memory System


Ø How are underlying hardware modules implemented?

§ Input and Output (I/O): Hardware and Software


Ø How fast can we move data into / out of the processor?

52
Understanding Factors Affecting “Performance”
§ Algorithm
• What it means
Ø What is the problem-solving strategy (Mathematics level)?

• Example: add the integers from 0 to 100
o Algorithm choice 1: add the individual numbers from 0 to 100 => 100 adds
o Algorithm choice 2: use the mathematical formula Sum(n) = n(n+1)/2 => 1 add, 1 mult, and 1 div

Which algorithm is better (lighter weight; faster)?

From a mathematics perspective, choice 2

What if we have a very efficient addition engine that can handle 128 adds every cycle?

We need to understand the underlying hardware architecture to precisely analyze performance and optimize our program

53
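A minimal sketch of the two algorithm choices (illustrative code, not from the slides):

```c
#include <stdio.h>

int main(void) {
    int n = 100;

    /* Choice 1: add the individual numbers (n adds) */
    int sum_loop = 0;
    for (int i = 0; i <= n; i++) sum_loop += i;

    /* Choice 2: closed-form formula (1 add, 1 mult, 1 div) */
    int sum_formula = n * (n + 1) / 2;

    printf("loop: %d, formula: %d\n", sum_loop, sum_formula); /* both 5050 */
    return 0;
}
```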
Understanding Factors Affecting “Performance”
§ Programming Language, Compiler, and Architecture
• What it means
Ø How will we generate instructions for our computer to run the algorithm?
• High-level Example
• You want to purchase stamps and a burger.
• Choice 1
(1) Get to the post office and purchase stamps
(2) Go back home and put your stamps on your desk
(3) Get to the Xn-N-Xout to get a burger
(4) Go back home and enjoy the burger
• Choice 2
(1) Get to the post office and purchase stamps
(2) Get to the Xn-N-Xout to get a burger
(3) Go back home, put your stamps on your desk, and enjoy the burger

Which choice is better?

What if the post office and Xn-N-Xout are in the opposite direction?
54
Understanding Factors Affecting “Performance”
§ Microarchitecture: Processor and Memory System
• What it means
Ø How are underlying hardware modules implemented?
• High-level Example
• CPU 1
• Data load from on-chip memory: 10 cycles
• Summing up 128 integers with a special instruction: 1 cycle
• Summing up 128 integers without the special instruction: 128 cycles
• CPU 2
• Data load from on-chip memory: 1 cycle
• Summing up 128 integers: 128 cycles

Which choice is better?


It depends on the problem, algorithm, and instruction choice

55
Understanding Factors Affecting “Performance”
§ Input and Output (I/O): Hardware and Software
• What it means
Ø How fast can we move data into / out of the processor?
• High-level example: the bottleneck is cooking speed
o Ingredient delivery to the restaurant: avg 20 ingredients / minute
o Kitchen: avg 1 burger / minute
o Output: avg 1 burger / minute
o Assumption: one burger needs five ingredients 56
Understanding Factors Affecting “Performance”
§ Input and Output (I/O): Hardware and Software
• What it means
Ø How fast can we move data into / out of the processor?
• High-level example: the bottleneck is ingredient delivery (i.e., I/O)
o Ingredient delivery to the restaurant: avg 20 ingredients / minute
o Kitchen: avg 6 burgers / minute
o Output: avg 4 burgers / minute (delivery-limited: 20 ingredients / 5 per burger)

57
EECS 112 (Spring 2024)
Organization of Digital Computers

Section 4. Power

58
Static and Dynamic Power in CMOS Technology
§ Static Power
• Mainly due to leakage: a small current flows through a transistor even when it is inactive (activity: a change in the value)

§ Dynamic Power
• Power consumed when we change the value of a transistor

Dynamic Power ∝ (1/2) × C × f × V^2

• C: capacitance. Depends on the technology node (e.g., 45 nm, 16 nm, 7 nm, ...) and fanout (i.e., how many transistors are connected downstream)
• f: transition frequency (i.e., how often do we change between 0 and 1?)
• V: voltage

f is related to the clock rate (the higher the clock rate, the higher f)
Power and Clock Frequency Trend over 30 Years

[Figure: clock rate and power over three decades; technology innovations (smaller transistors) and architectural optimizations (e.g., multi-core) shaped the trend]

Dynamic Power ∝ (1/2) × C × f × V^2

We reached the "power wall": the clock frequency cannot be increased further due to the heat
How Severe was the Power Wall?

Video: CPU overclocking (over 9 GHz) using liquid nitrogen

* Source: Sung Hwan Kim, "Germanium-Source Tunnel Field Effect Transistors for Ultra-Low Power Digital Logic." University of

Uniprocessor Performance

[Figure: single-processor performance over time]

Constrained by power, instruction-level parallelism, and memory latency 62
Reducing Power
§ Example: a new CPU
• 85% of the capacitive load of the old CPU (e.g., 40 nm -> 32 nm technology)
• 15% voltage reduction and 15% frequency reduction

P_new / P_old = (C_old × 0.85) × (V_old × 0.85)^2 × (F_old × 0.85) / (C_old × V_old^2 × F_old)
             = 0.85^4 = 0.52

§ The power wall
• We can't reduce voltage further (lowering the voltage further makes the transistors too leaky)
• We can't remove more heat efficiently

Power is a challenge for integrated circuits:
• Power must be brought in and distributed around the chip
• Power is dissipated as heat and must be removed

§ What else can we do to improve performance?

63
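A minimal sketch checking the power-scaling arithmetic above:

```c
#include <stdio.h>

int main(void) {
    double c_scale = 0.85; /* capacitive load scaling */
    double v_scale = 0.85; /* voltage scaling */
    double f_scale = 0.85; /* frequency scaling */

    /* Dynamic power is proportional to C * V^2 * f, so the constant 1/2
       and the old-CPU values cancel out in the ratio. */
    double power_ratio = c_scale * v_scale * v_scale * f_scale;
    printf("P_new / P_old = %.2f\n", power_ratio); /* 0.52 */
    return 0;
}
```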
Multiprocessors
§ Multicore microprocessors
• More than one processor per chip

§ To fully utilize the performance potential, explicit parallel


programming is required
• Hard to do:
o Programming for performance
o Load balancing
o Optimizing communication and synchronization

Often referred to as “CMP” : Chip MultiProcessor

64
FYI: TDP vs Power Consumption
§ TDP: Thermal Design Power
• How much heat dissipation can the target cooling system (i.e., the default cooler) manage?

Common error: TDP == actual power (related, but not the same!)
1. If the load is not 100%, the power consumption is less than the TDP
2. When the load is 100%, if your cooling solution is better than the default one, your system may draw more power than the TDP

* NVIDIA, “GPU Power Primer.” 2019 65


FYI: TDP vs Power Consumption

Example: a CPU consumed at most 221.87 W under the Prime95 benchmark, while its TDP is 170 W

Note: this is provided as an example. Exact power consumption depends on many factors (e.g., computer case, room temperature, system configuration, etc.)

* AnandTech, "The AMD Ryzen 9 7950X3D Review." 2023. 66


EECS 112 (Spring 2024)
Organization of Digital Computers

Section 5. Performance Optimization Pitfalls

67
Pitfall 01: Amdahl's Law
§ Improving one aspect of a computer and expecting a proportional improvement in overall performance

T_improved = T_affected / improvement factor + T_unaffected

§ Example: multiply accounts for 80s out of 100s
• How much improvement in multiply performance is needed to get 5× overall?

20 = 80/n + 20  => Can't be done!

§ Amdahl's Law:

S_overall = 1 / (p/s + (1 - p))

• S_overall: theoretical speedup of the whole task
• s: speedup of the part of the task that benefits from improvements
• p: proportion of execution time that the part benefiting from improvements originally occupied

§ Corollary: make the common case fast
68
Deep Dive into Amdahl's Law

• f: fraction of a computation that will get a speedup from optimization
• S: the amount of speedup
• Speedup_enhanced(f, S): overall (end-to-end) speedup given f and S

* M. D. Hill and M. R. Marty, "Amdahl's Law in the Multicore Era," Computer, vol. 41, no. 7, pp. 33-38, July 2008. 69
Amdahl's Law: Example
• f: fraction of a computation that will get a speedup from optimization
• S: the amount of speedup
• Speedup_enhanced(f, S): overall (end-to-end) speedup given f and S

Before parallelization (1.0s total):
• 0.4s cannot be parallelized (1 - f = 40%)
• 0.6s can be parallelized (f = 60%)

Parallelize on a dual-core CPU (assumption: perfect parallelization; 2× speedup)

After parallelization (0.7s total):
• 0.4s cannot be parallelized
• 0.3s for the parallelized portion

70
Amdahl's Law: Example
• f: fraction of a computation that will get a speedup from optimization
• S: the amount of speedup
• Speedup_enhanced(f, S): overall (end-to-end) speedup given f and S

After parallelization: 0.4s (cannot be parallelized) + 0.3s (parallelized)

Speedup_enhanced(f = 0.6, S = 2) = 1 / ((1 - f) + f/S)
                                 = 1 / ((1 - 0.6) + 0.6/2)
                                 = 1 / (0.4 + 0.3)
                                 = 1 / 0.7 = 1.43

Only a 43% effective speedup with a 2× speedup on the parallelizable portion

71
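A minimal sketch of the formula, reproducing this example and showing the cap imposed by the serial fraction:

```c
#include <stdio.h>

/* Amdahl's Law: overall speedup when a fraction f of the work is sped up by S */
static double amdahl_speedup(double f, double s) {
    return 1.0 / ((1.0 - f) + f / s);
}

int main(void) {
    printf("f=0.6, S=2:   %.2f\n", amdahl_speedup(0.6, 2.0)); /* 1.43 */
    /* Even with a near-infinite speedup, the serial 40% caps us at 2.5x: */
    printf("f=0.6, S=1e9: %.2f\n", amdahl_speedup(0.6, 1e9)); /* ~2.50 */
    return 0;
}
```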
Amdahl's Law: Easier Version
• f: fraction of a computation that will get a speedup from optimization
• S: the amount of speedup
• Speedup_enhanced(f, S): overall (end-to-end) speedup given f and S

After parallelization: 0.4s (cannot be parallelized) + 0.3s (parallelized)

Speedup_enhanced(f = 0.6, S = 2) = Speed_after / Speed_before
                                 = (#Computation / Latency_after) / (#Computation / Latency_before)
                                 = Latency_before / Latency_after
                                 = (0.4 + 0.6) / (0.4 + 0.6/2)
                                 = 1.0 / 0.7 = 1.43

2× speedup == half the latency
72
Amdahl's Law: Understanding the Original Version

Speedup_enhanced(f, S) = 1 / ((1 - f) + f/S)

• The numerator (1) is the latency before optimization (normalized to 1)
• The denominator ((1 - f) + f/S) is the latency after optimization if we view the original latency as 1 (i.e., normalized)
• f: fraction of a computation that will get a speedup from optimization
• S: the amount of speedup
• Speedup_enhanced(f, S): overall (end-to-end) speedup given f and S

73
Amdahl's Law: Implication
§ Question: what do we want to optimize (when investing engineering effort)?

Before optimization (1.0s total): 0.2s on opt. candidate 1 + 0.8s on opt. candidate 2

§ Best case for candidate 1 (infinite speedup): 0.0s + 0.8s => 0.8s total

§ Best case for candidate 2 (infinite speedup): 0.2s + 0.0s => 0.2s total

Implication: let's optimize something significant, not minor aspects 74


Pitfall 02: Using a Subset of the Performance Equation (e.g., MIPS)
§ MIPS: Millions of Instructions Per Second
• Doesn't account for
o Differences in ISAs between computers
o Differences in complexity between instructions

MIPS = Instruction count / (Execution time × 10^6)
     = Instruction count / ((Instruction count × CPI / Clock rate) × 10^6)
     = Clock rate / (CPI × 10^6)

• CPI varies between programs on a given CPU

75
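A minimal sketch with made-up numbers (hypothetical machines, not from the slides) showing how MIPS can mislead: the machine with the higher MIPS rating takes longer to run the same program:

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical machines running the same program (illustrative numbers):
       X executes more, simpler instructions; Y executes fewer, slower ones. */
    double ic_x = 10e9, cpi_x = 1.0, rate_x = 2.0e9; /* X: 2 GHz */
    double ic_y = 6e9,  cpi_y = 2.0, rate_y = 3.0e9; /* Y: 3 GHz */

    double time_x = ic_x * cpi_x / rate_x; /* 5.0 s */
    double time_y = ic_y * cpi_y / rate_y; /* 4.0 s */
    double mips_x = ic_x / (time_x * 1e6); /* 2000 MIPS */
    double mips_y = ic_y / (time_y * 1e6); /* 1500 MIPS */

    /* X has the higher MIPS rating, but Y finishes the program sooner. */
    printf("X: %.0f MIPS, %.1f s\n", mips_x, time_x);
    printf("Y: %.0f MIPS, %.1f s\n", mips_y, time_y);
    return 0;
}
```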
Concluding Remarks
§ Cost/performance is improving
• Due to underlying technology development
§ Hierarchical layers of abstraction
• In both hardware and software
§ Instruction set architecture
• The hardware/software interface
§ Execution time: A useful performance measure
§ Power is a limiting factor
• Use parallelism to improve performance

76
