01 Abstraction and Technology
Chapter 01
Computer Abstraction and Technology
Hyoukjun Kwon
[email protected]
EECS 112 (Spring 2024)
Organization of Digital Computers
Section 1. Technology
Introduction: The Computer Revolution
Three revolutions for civilization:
§ Agricultural revolution
§ Industrial revolution
§ Information revolution
• Computer revolution is the foundation
Classes of Computers
§ Personal computers (“PC”)
• General purpose, variety of software
• Subject to cost/performance tradeoff
§ Server computers
• Network based
• High capacity, performance, reliability
• Range from small servers to building sized
§ Supercomputers
• High-end scientific and engineering calculations
• Highest capability but represent a small fraction of the overall computer market
§ Embedded computers
• Hidden as components of systems
• Stringent power/performance/cost constraints
The Post PC Era
§ Personal Mobile Device (PMD)
• Battery operated
• Connects to the Internet
• Hundreds of dollars
• Smart phones, tablets, electronic glasses
§ Cloud computing
• Warehouse Scale Computers (WSC)
• Software as a Service (SaaS)
• A portion of the software runs on a PMD, and a portion runs in the cloud
• Amazon and Google
§ Others:
• AR/VR
• Autonomous driving
• …
Opening the Box
Inside the Processor (CPU)
§ Datapath: performs operations on data (i.e., computation)
§ High-level comparison of on-chip and off-chip memory
• Cache memories (on-chip) require 1-2 cycles for access, but are small (KB to tens of MB; subject to the chip size)
• DRAM (off-chip memory) requires 50-60 cycles* for access, but is large (GB range)
* S. Eyerman et al., “DRAM Bandwidth and Latency Stacks: Visualizing DRAM Bottlenecks.” ISPASS 2022 (Intel paper)
Inside the Processor
Networks
§ Functionalities
• Communication: exchange information
• Resource sharing
o e.g., many users can share one GPU server
• Nonlocal access to remote resources
o e.g., provide access to a computer server
§ Types of Networks
• Local area network (LAN)
o Wired: Ethernet
o Wireless: WiFi
• Wide area network (WAN): the Internet
• Personal area network (PAN)
o Wireless network: Mainly WiFi and Bluetooth
Memory and Storage
§ Volatile main memory (DRAM)
• Loses instructions and data when powered off
§ Non-volatile secondary memory (storage)
• Flash memory (solid state drive; SSD)
• Magnetic disk (hard disk drive)
• Optical disk (CDROM, DVD)
No single option is ideal for all use cases; they all have trade-offs
Technology Trends
§ Electronics technology continues to evolve
• Increased capacity and performance
• Reduced cost
Year   Technology                    Relative performance/cost
1951   Vacuum tube                   1
1965   Transistor                    35
1975   Integrated circuit (IC)       900
1995   Very large scale IC (VLSI)    2,400,000
2013   Ultra large scale IC          250,000,000,000
§ Semiconductor Technology
• Built upon silicon, a semiconductor
• Add materials to transform its properties: conductors, insulators, and areas that conduct or insulate under specific conditions
• Organize the structure so that those areas become transistors
• Transistors work as electrically controlled switches
What does a transistor look like?
Modern Transistor: MOSFET
§ Metal Oxide Semiconductor Field-Effect Transistor
• Insulator: a material where electric current doesn’t flow freely (e.g., rubber blocks electricity)
[Figure: cross-sections of two MOSFETs. n-channel MOSFET: gate over a gate oxide (insulator), with n-type source and drain regions in a p-substrate (silicon). p-channel MOSFET: same structure, with p-type source and drain regions in an n-substrate (silicon).]
• Figure source: M. Riordan et al., “The invention of the transistor.” Reviews of Modern Physics 71.2, 1999.
Modern Transistor: MOSFET
§ Doping turns silicon (Si) into n-type or p-type material
• N-doped substrate
o Add atoms with five valence electrons (P in the figure)
o “Free” electrons can flow away
• P-doped substrate
o Add atoms with three valence electrons (B in the figure)
o “Holes” can accommodate incoming electrons
[Figure: silicon lattices doped with P (leaving a “free” electron) and with B (leaving a “hole”), shown next to the n-channel and p-channel MOSFET cross-sections.]
MOSFET as a Switch
§ How an n-type MOSFET works as a switch
• VT: a voltage large enough to create an n-channel
• Gate at Vh > VT: electrons are attracted to the surface under the gate oxide due to Vh, forming an “n-channel” between the source and the drain
• Gate at GND: no n-channel forms between the source and the drain
• A p-type MOSFET works in the opposite way (if GND is applied to the gate, the switch is closed)
MOSFET as a Switch
§ N-type MOSFET (n-type source and drain in a p-substrate)
• +5 V (“high” voltage) on the gate == pushing the switch to close it: an “n-channel” connects the source and the drain
• 0 V (“low” voltage) on the gate == do not push the switch; the switch is open: no n-channel
Switch Abstraction of MOSFET
• VGS: the voltage between gate and source
• VT: a threshold voltage for a connection
• Source and drain are connected if VGS > VT
[Figure: n-channel MOSFET cross-section next to its switch symbol with gate, source, and drain terminals.]
Building Logic Gates with Transistors
§ Vdd = 5 V
Input (In)         Output (Out)
“Low” == 0 V       “High” == 5 V
“High” == 5 V      “Low” == 0 V
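The switch abstraction and the input/output table above can be sketched in a few lines of Python. This is only a toy model, not a circuit simulation: the threshold `VT = 1.0` V, the function names, and the choice of a CMOS-style inverter (a p-type transistor pulling Out up to Vdd and an n-type transistor pulling it down to GND) are assumptions for illustration, since the slides give only the table.

```python
VDD = 5.0  # supply voltage from the slide (volts)
VT = 1.0   # threshold voltage; assumed value, for illustration only


def nmos_connected(v_gate, v_source=0.0, vt=VT):
    """n-type MOSFET: source and drain are connected if VGS > VT."""
    return (v_gate - v_source) > vt


def pmos_connected(v_gate, v_source=VDD, vt=VT):
    """p-type MOSFET works the opposite way: connected when the gate is low."""
    return (v_source - v_gate) > vt


def inverter(v_in):
    """Assumed inverter: pMOS connects Out to Vdd, nMOS connects Out to GND."""
    pull_up = pmos_connected(v_in)    # closed when In is low
    pull_down = nmos_connected(v_in)  # closed when In is high
    # This toy model only defines the two clean logic levels.
    assert pull_up != pull_down, "input is outside the modeled logic levels"
    return VDD if pull_up else 0.0


for v_in in (0.0, 5.0):
    print(f"In = {v_in} V -> Out = {inverter(v_in)} V")
```

Inputs between the two logic levels are deliberately rejected, because a pure on/off switch model says nothing useful about them.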
Integrated Circuit Cost
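\[ \text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}} \]
\[ \text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}} \]
\[ \text{Yield} = \frac{1}{\left(1 + \text{Defects per area} \times \text{Die area}\right)^{N}} \]
• The yield formula comes from empirical observations of yields at IC factories; N is related to the number of critical processing steps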
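As a rough illustration of how the three cost formulas combine, here is a small Python sketch. Every number in it (wafer cost, areas, defect density, and N) is made up for the example and does not describe any real process.

```python
def dies_per_wafer(wafer_area, die_area):
    """Approximation from the slide: ignores dies lost at the wafer edge."""
    return wafer_area / die_area


def die_yield(defects_per_area, die_area, n):
    """Yield = 1 / (1 + defects per area * die area)^N."""
    return 1.0 / (1.0 + defects_per_area * die_area) ** n


def cost_per_die(cost_per_wafer, wafer_area, die_area, defects_per_area, n):
    """Cost per die = cost per wafer / (dies per wafer * yield)."""
    dies = dies_per_wafer(wafer_area, die_area)
    good_fraction = die_yield(defects_per_area, die_area, n)
    return cost_per_wafer / (dies * good_fraction)


# Hypothetical inputs: areas in cm^2, defects per cm^2, cost in dollars.
example = cost_per_die(
    cost_per_wafer=4480.0,
    wafer_area=700.0,
    die_area=1.0,
    defects_per_area=0.25,
    n=2,
)
print(f"Cost per die: ${example:.2f}")
```

Because yield falls as die area grows while dies per wafer also falls, doubling the die area more than doubles the cost per die in this model.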
Section 2. Abstraction
Seven Great Ideas in Computer Architecture
§ Hierarchy of memories
Below Your Program
§ Application software
• Written in high-level language (HLL)
§ System software
• Compiler: translates HLL code to machine code
• Operating System: service code
o Handling input/output
o Managing memory and storage
o Scheduling tasks & sharing resources
§ Hardware
• Processor, memory, I/O controllers
Levels of Program Code
§ High-level language (Python, C, C++, …)
• Level of abstraction closer to problem domain
• Provides for productivity and portability
§ Assembly language
• Textual representation of instructions
§ Hardware representation
• Binary digits (bits)
• Encoded instructions and data
Components of a Computer
§ Same components for all kinds of computer
• Desktop, server, embedded, …
§ Input/output includes
• User-interface devices
o Display, keyboard, mouse, touch screen
• Storage devices
o Hard disk, CD/DVD, flash
• Network adapters
o For communicating with other computers
§ Processor
• Control + datapath + on-chip memory (cache)
Section 3. Performance
Topics in Performance
§ Definition of performance in a computer system
§ Relative performance based on the execution time
§ Clock frequency and period (clock cycle time)
§ CPU time and CPI
§ Performance Formula
§ Factors affecting the performance
Defining Performance
§ Which airplane has the best performance?
Performance Metrics: Response Time and Throughput
§ Response time (i.e., latency)
• How long it takes to complete a task
Vs.
§ Throughput
• Total work done per unit time
o e.g., tasks/transactions/… per hour
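Relative Performance
§ Define Performance:
\[ \text{Performance} = \frac{1}{\text{Execution Time}} \]
§ “X is n times faster than Y”:
\[ \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution Time}_Y}{\text{Execution Time}_X} = n \]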
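A minimal sketch of the definition in Python. The two execution times (10 s and 15 s) are hypothetical values chosen only to show the arithmetic.

```python
def performance(execution_time_s):
    """Performance is defined as the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(time_x_s, time_y_s):
    """Return n such that 'X is n times faster than Y'."""
    return performance(time_x_s) / performance(time_y_s)


# Hypothetical measurements: X takes 10 s and Y takes 15 s on the same program.
n = times_faster(time_x_s=10.0, time_y_s=15.0)
print(f"X is {n:.2f} times faster than Y")
```

Note that the ratio of performances equals the inverse ratio of execution times, so the faster machine is the one with the smaller time.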
Measuring Execution Time
§ Elapsed time (aka wall clock time or response time)
• Total response time, including all aspects (i.e., “end-to-end” latency)
o Includes processing (computation), I/O (data movement), OS overhead, idle time (“stalls”; to be discussed later in the lecture), and so on
• Determines system performance
§ CPU time
• Time spent processing a given job on a CPU
o Discounts I/O time and other jobs’ shares
• Consists of user CPU time (time spent on user-defined programs) and system CPU time (time spent on OS/system services for running the program)
• Different programs are affected differently by CPU and system performance
Example: What if the OS takes too long for dynamic memory allocation (e.g., malloc)?
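One way to see the difference between elapsed time and CPU time is to measure both for the same task. The sketch below uses Python's standard `time.perf_counter` and `os.times`; the helper name `measure` and the toy workload are assumptions for illustration, and the resolution of `os.times` depends on the platform.

```python
import os
import time


def measure(task):
    """Run task() and report elapsed, user CPU, and system CPU time in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    task()
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return {
        "elapsed": wall_end - wall_start,
        "user_cpu": cpu_end.user - cpu_start.user,
        "system_cpu": cpu_end.system - cpu_start.system,
    }


def sleepy_task():
    """Toy workload: mostly idle waiting, plus a little computation."""
    time.sleep(0.2)                      # idle time: counts toward elapsed time only
    sum(i * i for i in range(100_000))   # computation: counts toward user CPU time


result = measure(sleepy_task)
for name, seconds in result.items():
    print(f"{name}: {seconds:.3f} s")
```

The sleep dominates the elapsed time but contributes almost nothing to CPU time, which is the kind of gap that I/O waits and stalls create in real programs.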
CPU Clocking
§ Operation of digital hardware governed by a constant-rate clock
[Figure: clock timing diagram. Each clock period (one cycle) contains data transfer and computation, followed by a state update.]
Clock Frequency and Cycle
§ Frequency: How many times does a signal oscillate each second?
• Hz (“Hertz”): the number of oscillations per second (i.e., frequency)
• Once a second => 1 Hz
o The duration of one clock signal: 1 second (1 second / 1 clock = 1)
• Twice a second => 2 Hz
o The duration of one clock signal: 0.5 second (1 second / 2 clocks = 0.5)
• N times per second => N Hz
o The duration of one clock signal (clock cycle): 1/N second (1 second / N clocks = 1/N)
[Figure: signal value (0 or 1) over time for each case, with a 1-second interval marked.]
Key Idea: Clock cycle must be longer than the critical path delay
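The relation between frequency and clock cycle, and the key idea about the critical path, can be sketched as follows. The 400 ps critical path delay is a hypothetical value, and `max_frequency` is only the bound implied by the key idea, not a full timing analysis.

```python
def clock_period(frequency_hz):
    """Duration of one clock cycle in seconds: 1 second / N clocks."""
    return 1.0 / frequency_hz


def max_frequency(critical_path_delay_s):
    """Clock rate bound: the cycle must be longer than the critical path delay."""
    return 1.0 / critical_path_delay_s


print(clock_period(1))           # 1 Hz clock
print(clock_period(2))           # 2 Hz clock
print(clock_period(2e9) * 1e9)   # 2 GHz clock, printed in nanoseconds

# Hypothetical critical path delay of 400 ps.
print(max_frequency(400e-12) / 1e9, "GHz is the upper bound on the clock rate")
```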
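CPU Time
\[ \text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}} \]
\[ \text{Clock Cycle Time (clock period)} = \frac{1}{\text{Clock Rate}} \]
§ Performance can be improved by
• Reducing the number of clock cycles (i.e., decreasing the numerator of the above)
• Increasing the clock rate (i.e., increasing the denominator of the above)
Hardware designers often need to trade off clock rate against cycle count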
CPU Time Example
§ Computer A: 2 GHz clock, 10 s CPU time
§ Designing Computer B
• Aim for 6 s CPU time
• Can use a faster clock, but doing so causes 1.2× as many clock cycles
§ How fast must Computer B’s clock be?
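One way to work through this example, assuming CPU Time = Clock Cycles / Clock Rate: first recover Computer A's cycle count, scale it by 1.2 for Computer B, then divide by the 6 s target. The function names are illustrative only.

```python
def clock_cycles(cpu_time_s, clock_rate_hz):
    """Clock cycles = CPU time * clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles, target_cpu_time_s):
    """Clock rate = clock cycles / CPU time."""
    return cycles / target_cpu_time_s


cycles_a = clock_cycles(cpu_time_s=10.0, clock_rate_hz=2e9)  # Computer A
cycles_b = 1.2 * cycles_a                                    # B needs 1.2x the cycles
rate_b = required_clock_rate(cycles_b, target_cpu_time_s=6.0)

print(f"Computer A runs {cycles_a:.3g} cycles")
print(f"Computer B needs a {rate_b / 1e9:.1f} GHz clock")
```

Under these assumptions the answer works out to 4 GHz, twice Computer A's clock rate, even though the target time is only 40% lower.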
Example: Using CPI to compute CPU Time
§ Computer A: Cycle Time = 250ps, CPI = 2.0
§ Computer B: Cycle Time = 500ps, CPI = 1.2
§ Same ISA
§ Which is faster, and by how much?
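A sketch of the comparison, assuming CPU Time = Instruction Count × CPI × Cycle Time. Because both computers run the same ISA, the instruction count is identical and cancels out of the ratio; the value `IC = 1_000_000` is an arbitrary placeholder.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """CPU time = instruction count * CPI * clock cycle time."""
    return instruction_count * cpi * cycle_time_s


IC = 1_000_000  # arbitrary placeholder; same program and ISA on both computers

time_a = cpu_time(IC, cpi=2.0, cycle_time_s=250e-12)
time_b = cpu_time(IC, cpi=1.2, cycle_time_s=500e-12)

faster = "A" if time_a < time_b else "B"
ratio = max(time_a, time_b) / min(time_a, time_b)
print(f"Computer {faster} is faster by a factor of {ratio:.2f}")
```

Computer A spends 500 ps per instruction on average and Computer B spends 600 ps, so A is faster by a factor of 1.2 despite its higher CPI.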
CPI in More Detail
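§ If different instruction classes (e.g., integer add vs. floating-point add) take different numbers of cycles:
\[ \text{Clock Cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{Instruction Count}_i \right) \]
§ The average CPI is then a weighted average:
\[ \text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right) \]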
CPI Example
§ Alternative compiled code sequences using instructions in classes A, B, C
Instruction Class     A    B    C
CPI for class         1    2    3
IC in sequence 1      2    1    2
IC in sequence 2      4    1    1
§ Sequence 1: IC = 5
• Clock Cycles = 2×1 + 1×2 + 2×3 = 10
• Avg. CPI = 10/5 = 2.0
§ Sequence 2: IC = 6
• Clock Cycles = 4×1 + 1×2 + 1×3 = 9
• Avg. CPI = 9/6 = 1.5
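The same calculation can be checked with a short Python sketch. The dictionary layout and function names are illustrative choices; the numbers are the ones from the table.

```python
CPI_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def clock_cycles(instruction_counts):
    """Sum of CPI_i * IC_i over all instruction classes."""
    return sum(CPI_PER_CLASS[cls] * count for cls, count in instruction_counts.items())


def average_cpi(instruction_counts):
    """Total clock cycles divided by total instruction count."""
    return clock_cycles(instruction_counts) / sum(instruction_counts.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    print(name, "cycles =", clock_cycles(seq), "avg CPI =", average_cpi(seq))
```

Sequence 2 executes more instructions but finishes in fewer clock cycles, so instruction count alone does not predict which sequence is faster.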
Performance Summary
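\[ \text{CPU Time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}} \]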
§ Performance depends on
• Algorithm: affects Instruction Count (IC), possibly CPI
• Programming language: affects IC, CPI
• Compiler: affects IC, CPI
Understanding Factors Affecting “Performance”
§ Algorithm
• What it means
Ø What is the problem-solving strategy (Mathematics level)?
What if we have a very efficient addition engine that can handle 128 adds every cycle?
Understanding Factors Affecting “Performance”
§ Programming Language, Compiler, and Architecture
• What it means
Ø How will we generate instructions for our computer to run the algorithm?
• High-level Example
• You want to purchase stamps and a burger.
• Choice 1
(1) Get to the post office and purchase stamps
(2) Go back home and put your stamps on your desk
(3) Get to the Xn-N-Xout to get a burger
(4) Go back home and enjoy the burger
• Choice 2
(1) Get to the post office and purchase stamps
(2) Get to the Xn-N-Xout to get a burger
(3) Go back home, put your stamps on your desk, and enjoy the burger
What if the post office and Xn-N-Xout are in opposite directions?
Understanding Factors Affecting “Performance”
§ Microarchitecture: Processor and Memory System
• What it means
Ø How are underlying hardware modules implemented?
• High-level Example
• CPU 1
• Data load from on-chip memory: 10 cycles
• Summing up 128 integers with a special instruction: 1 cycle
• Summing up 128 integers without the special instruction: 128 cycles
• CPU 2
• Data load from on-chip memory: 1 cycle
• Summing up 128 integers: 128 cycles
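To make the comparison concrete, the sketch below adds up the cycle counts from the example. It assumes a deliberately simple cost model that the slide does not state: one data load followed by one 128-integer sum, with no overlap between the two.

```python
def total_cycles(load_cycles, sum_cycles):
    """Assumed model: one load, then one 128-integer sum, with no overlap."""
    return load_cycles + sum_cycles


cpu1_with_special = total_cycles(load_cycles=10, sum_cycles=1)
cpu1_without_special = total_cycles(load_cycles=10, sum_cycles=128)
cpu2 = total_cycles(load_cycles=1, sum_cycles=128)

print("CPU 1, special instruction:   ", cpu1_with_special, "cycles")
print("CPU 1, no special instruction:", cpu1_without_special, "cycles")
print("CPU 2:                        ", cpu2, "cycles")
```

Under this model CPU 1 wins only if the software actually uses the special instruction; otherwise its slower loads make it the slowest option.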
Understanding Factors Affecting “Performance”
§ Input and Output (I/O): Hardware and Software
• What it means
Ø How fast can we move data into / out of the processor?
• High-level Example (assumption: one burger needs five ingredients)
o Ingredients delivered to the restaurant: avg. 20 ingredients / minute
o Cooking: avg. 1 burger / minute
o Output: avg. 1 burger / minute
o Bottleneck: cooking speed
Understanding Factors Affecting “Performance”
§ Input and Output (I/O): Hardware and Software
• What it means
Ø How fast can we move data into / out of the processor?
• High-level Example (one burger needs five ingredients)
o Ingredients delivered to the restaurant: avg. 20 ingredients / minute
o Cooking: avg. 6 burgers / minute
o Output: avg. 4 burgers / minute
o Bottleneck: ingredient delivery (i.e., I/O)
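The two burger scenarios follow one rule: the output rate is the minimum of what delivery can supply and what the kitchen can cook. A minimal sketch of that rule, with illustrative function names and the five-ingredients-per-burger assumption from the example:

```python
INGREDIENTS_PER_BURGER = 5  # assumption stated in the example


def burgers_per_minute(ingredients_per_minute, cooking_rate):
    """Output rate is limited by the slower of delivery (I/O) and cooking (compute)."""
    supply_rate = ingredients_per_minute / INGREDIENTS_PER_BURGER
    return min(supply_rate, cooking_rate)


def bottleneck(ingredients_per_minute, cooking_rate):
    """Name the stage that limits the output rate."""
    supply_rate = ingredients_per_minute / INGREDIENTS_PER_BURGER
    return "cooking speed" if cooking_rate < supply_rate else "ingredient delivery (I/O)"


for cooking_rate in (1, 6):
    rate = burgers_per_minute(20, cooking_rate)
    print(f"cooking {cooking_rate}/min -> {rate} burgers/min, "
          f"bottleneck: {bottleneck(20, cooking_rate)}")
```

Making the kitchen six times faster only quadruples the output here, because delivery of 20 ingredients per minute can feed at most 4 burgers per minute.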
Section 4. Power
Static and Dynamic Power in CMOS Technology
§ Static Power
• Mainly due to leakage: a small current flows through a transistor even when there is no activity (activity: a change in the value)
§ Dynamic Power
• Power consumed when we change the value of a transistor
\[ \text{Dynamic Power} \propto \frac{1}{2} C f V^{2} \]
• C: Capacitance. Depends on the technology node (e.g., 45 nm, 16 nm, 7 nm, …) and fanout (i.e., how many transistors are connected downstream)
• f: Transition frequency (i.e., how often we change between 0 and 1)
• V: Voltage
f is related to the clock rate (the higher the clock rate, the higher f)
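A small sketch of how the dynamic power relation scales. All quantities are normalized to 1.0 and the 15% reductions are hypothetical; the point is only to show that voltage enters quadratically while frequency enters linearly.

```python
def dynamic_power(capacitance, frequency, voltage):
    """Dynamic power up to a constant factor: 1/2 * C * f * V^2."""
    return 0.5 * capacitance * frequency * voltage ** 2


baseline = dynamic_power(capacitance=1.0, frequency=1.0, voltage=1.0)

# Hypothetical 15% reductions, applied one at a time.
lower_voltage = dynamic_power(1.0, 1.0, 0.85) / baseline
lower_frequency = dynamic_power(1.0, 0.85, 1.0) / baseline

print(f"15% lower voltage   -> {lower_voltage:.4f} of baseline power")
print(f"15% lower frequency -> {lower_frequency:.4f} of baseline power")
```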
Power and Clock Frequency Trend over 30 years
[Figure: clock frequency and power over 30 years, annotated with “Technology innovations (smaller transistors)” and “Architectural optimizations (e.g., multi-core)”.]
\[ \text{Dynamic Power} \propto \frac{1}{2} C f V^{2} \]
• Reached the “Power wall”: cannot further increase the clock frequency due to the heat
How Severe was the Power Wall?
* Source: Sung Hwan Kim, “Germanium-Source Tunnel Field Effect Transistors for Ultra-Low Power Digital Logic.” University of
Uniprocessor Performance
Multiprocessors
§ Multicore microprocessors
• More than one processor per chip
FYI: TDP vs Power Consumption
§ TDP: Thermal Design Power
• How much heat dissipation can the target cooling system (i.e., the default cooler) manage?
Common Error: TDP == Actual Power (related, but not the same!)
1. If the load is not 100%, the power consumption is less than the TDP
Pitfall 01: Amdahl’s Law
§ Pitfall: improving an aspect of a computer and expecting a proportional improvement in overall performance
\[ T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}} \]
§ Amdahl’s Law:
\[ S_{\text{overall}} = \frac{1}{\frac{p}{s} + (1 - p)} \]
• \(S_{\text{overall}}\): theoretical speedup of the whole task
• \(s\): speedup of the part of the task that benefits from improvements
• \(p\): proportion of execution time that the part benefiting from improvements originally occupied
§ Corollary: make the common case fast
Deep dive into Amdahl’s Law
* M. D. Hill and M. R. Marty, “Amdahl’s Law in the Multicore Era,” Computer, vol. 41, no. 7, pp. 33-38, July, 2008.
Amdahl’s Law: Example
• f: fraction of a computation that is sped up by the optimization
• S: the amount of speedup of that fraction
• Speedup_enhanced(f, S): overall (end-to-end) speedup with f and S
                          Cannot be parallelized    Can be parallelized
Before parallelization    0.4 s                     0.6 s
After parallelization     0.4 s                     0.3 s
\[ \text{Speedup}_{\text{enhanced}}(f = 0.6, S = 2) = \frac{1}{(1 - f) + \frac{f}{S}} = \frac{1}{(1 - 0.6) + \frac{0.6}{2}} = \frac{1}{0.4 + 0.3} = \frac{1}{0.7} \approx 1.43 \]
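The example can be reproduced with a direct transcription of the formula into Python. The function name `amdahl_speedup` is an illustrative choice; the inputs f = 0.6 and S = 2 are the ones from the example, and the last line shows the limit as S grows without bound.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the work is sped up by a factor s."""
    return 1.0 / ((1.0 - f) + f / s)


example = amdahl_speedup(f=0.6, s=2)
print(round(example, 2))  # 1.43

# Even with an unbounded speedup of the parallel part, the 0.4 serial part remains.
limit = amdahl_speedup(f=0.6, s=float("inf"))
print(round(limit, 2))  # 2.5
```

The limit of 1 / (1 - f) is why the fraction that cannot be improved ends up dominating the end-to-end result.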
Amdahl’s Law: Easier Version
• 2X speedup == half latency, so the overall speedup is the ratio of latencies (same example: 0.4 s + 0.6 s before, 0.4 s + 0.3 s after)
\[ \text{Speedup}_{\text{enhanced}}(f = 0.6, S = 2) = \frac{\text{Speed}_{\text{after}}}{\text{Speed}_{\text{before}}} = \frac{\#\text{Computation} / \text{Latency}_{\text{after}}}{\#\text{Computation} / \text{Latency}_{\text{before}}} = \frac{\text{Latency}_{\text{before}}}{\text{Latency}_{\text{after}}} \]
\[ = \frac{0.4 + 0.6}{0.4 + \frac{0.6}{2}} = \frac{1.0}{0.4 + 0.3} = \frac{1}{0.7} \approx 1.43 \]
Amdahl’s Law: Understanding the Original Version
[Figure annotation: the latency before optimization (normalized to 1)]
Amdahl’s Law: Implication
§ Question: Which part do we want to optimize (i.e., where do we invest engineering cost)?
                       Opt. candidate 1    Opt. candidate 2
Before optimization    0.2 s               0.8 s
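A minimal sketch of how one might compare the two candidates using the latency view of Amdahl's Law. The 2× local speedup is a hypothetical value; the slide gives only the 0.2 s and 0.8 s latencies.

```python
def latency_after(part_latencies, optimized_index, s):
    """Total latency when only the part at optimized_index is sped up by s."""
    return sum(
        latency / s if index == optimized_index else latency
        for index, latency in enumerate(part_latencies)
    )


parts = [0.2, 0.8]   # seconds for candidate 1 and candidate 2
S = 2.0              # hypothetical speedup achievable for either candidate

before = sum(parts)
speedups = [before / latency_after(parts, i, S) for i in range(len(parts))]

for i, speedup in enumerate(speedups, start=1):
    print(f"Optimizing candidate {i}: overall speedup {speedup:.2f}x")
```

With the same local speedup, optimizing the larger part pays off far more, which is the "make the common case fast" corollary in numbers.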
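MIPS: Millions of Instructions Per Second
\[ \text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \frac{\text{Instruction count}}{\frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^{6}} = \frac{\text{Clock rate}}{\text{CPI} \times 10^{6}} \]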
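The two forms of the MIPS formula can be checked against each other numerically. The clock rate, CPI, and instruction count below are hypothetical values picked so the arithmetic is easy to follow.

```python
def mips_from_counts(instruction_count, execution_time_s):
    """MIPS = instruction count / (execution time * 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_rate(clock_rate_hz, cpi):
    """MIPS = clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


# Hypothetical machine and program.
clock_rate_hz = 2e9
cpi = 2.0
instruction_count = 1e9

execution_time_s = instruction_count * cpi / clock_rate_hz
print(mips_from_counts(instruction_count, execution_time_s))
print(mips_from_rate(clock_rate_hz, cpi))
```

Both forms give the same value because the instruction count cancels, which also shows that MIPS depends on CPI and therefore varies from program to program on the same machine.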
Concluding Remarks
§ Cost/performance is improving
• Due to underlying technology development
§ Hierarchical layers of abstraction
• In both hardware and software
§ Instruction set architecture
• The hardware/software interface
§ Execution time: A useful performance measure
§ Power is a limiting factor
• Use parallelism to improve performance