Ico22 - 1 - Computer Abstraction and Technology
Chun-Jen Tsai
NYCU
02/17/2022
An Array of Computers
Traditional sense of computers:
Supercomputers, mainframes, workstations/PCs, notebooks/tablets
3/42
Two Ends of Computer Categories
Personal Mobile Device (PMD)
Battery operated
Connects to the Internet
Hundreds of dollars
Smart phones, tablets, electronic glasses
Cloud computing
Warehouse Scale Computers (WSC)
Software as a Service (SaaS)
A portion of the software runs on a PMD and a portion runs in the cloud
Amazon and Google
4/42
Performance Factors
Algorithm
Determines number of operations executed
Programming language, compiler, and architecture
Determine number of instructions executed per operation
Processor and memory system
Determine how fast instructions are executed
I/O system (including OS)
Determines how fast I/O operations are executed
5/42
Seven Great Ideas
Use abstraction to simplify design
Make the common case fast
Performance via parallelism
Performance via pipelining
Performance via prediction
Hierarchy of memories
Dependability via redundancy
6/42
Below Your Program
Application software
Written in high-level language
System software
Compiler: translates HLL code to machine code
Operating System: service code
Handling input/output
Managing memory and storage
Scheduling tasks & sharing resources
Hardware
Processor, memory, I/O controllers
7/42
Levels of Program Code
High-level language
Level of abstraction closer to problem domain
Provides for productivity and portability
Assembly language
Textual representation of instructions
Machine code
Binary digits (bits)
Encoded instructions and data
8/42
Components of a Computer
Same components for all kinds of computer
Desktop, server, embedded
Input/output includes
User-interface devices
Display, keyboard, mouse
Storage devices
Hard disk, CD/DVD, flash
Network adapters
For communicating with other computers
9/42
Frame Buffers for Video Display
LCD screen: picture elements (pixels)
Mirrors content of frame buffer memory
10/42
Opening the Box
The internals of an iPhone
11/42
Inside the Processor (CPU)
Datapath: performs operations on data
Control: sequences datapath, memory, ...
Cache memory: Small SRAM for fast access to data
Apple A12:
12/42
Abstractions
Abstraction helps us deal with complexity
Hide lower-level detail
Instruction set architecture (ISA)
The hardware/software interface
Application binary interface
The ISA plus system software interface
Implementation
The details underlying an interface
13/42
A Safe Place for Data
Volatile main memory
Loses instructions and data when power off
Non-volatile secondary memory
Magnetic disk
Flash memory
Optical disk (CDROM, DVD)
14/42
Networks
Communication, resource sharing, nonlocal access
Local area network (LAN): Ethernet
Wide area network (WAN): the Internet
Wireless network: WiFi, Bluetooth
15/42
Technology Trends
Electronics technology continues to evolve
Increased capacity and performance
Reduced cost
DRAM capacity
17/42
Intel® Core 10th Gen
300mm wafer, 506 chips, 10nm technology
Each chip is 11.4 x 10.7 mm
18/42
Integrated Circuit Cost
Nonlinear relation to area and defect rate
Wafer cost and area are fixed
Defect rate determined by manufacturing process
Die area determined by architecture and circuit design
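The nonlinear dependence on die area and defect rate can be sketched numerically. The yield expression below is a commonly cited textbook approximation and is an assumption here, since the slide text does not give a formula. The dies-per-wafer estimate is a simple area ratio that ignores edge loss, and the wafer cost and defect density are hypothetical placeholder values.

```python
import math


def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Rough estimate: wafer area divided by die area (ignores edge loss)."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    return wafer_area / die_area_mm2


def die_yield(defects_per_mm2, die_area_mm2):
    """Assumed yield model: 1 / (1 + defects_per_area * die_area / 2)^2."""
    return 1.0 / (1.0 + defects_per_mm2 * die_area_mm2 / 2.0) ** 2


def cost_per_die(wafer_cost, wafer_diameter_mm, die_area_mm2, defects_per_mm2):
    """Wafer cost spread over the good dies only."""
    good_dies = dies_per_wafer(wafer_diameter_mm, die_area_mm2) * die_yield(
        defects_per_mm2, die_area_mm2
    )
    return wafer_cost / good_dies


# Hypothetical inputs: wafer cost and defect density are placeholders.
WAFER_COST = 1000.0
DEFECTS_PER_MM2 = 0.001

small = cost_per_die(WAFER_COST, 300, 100.0, DEFECTS_PER_MM2)
large = cost_per_die(WAFER_COST, 300, 200.0, DEFECTS_PER_MM2)
print(f"cost ratio for 2x die area: {large / small:.2f}")
```

Under this model, doubling the die area more than doubles the cost per die, because fewer dies fit on the wafer and a smaller fraction of them are good.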
20/42
Response Time and Throughput
Response time
How long it takes to do a task
Throughput
Total work done per unit time
e.g., tasks/transactions/… per hour
How are response time and throughput affected by
Replacing the processor with a faster version?
Adding more processors?
21/42
Relative Performance
Define Performance = 1/Execution Time
“X is n times faster than Y”
\[
\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n
\]
22/42
Measuring Execution Time
Elapsed time
Total response time, including all aspects
Processing, I/O, OS overhead, idle time
Determines system performance
CPU time
Time spent processing a given job
Discounts I/O time, other jobs’ shares
Comprises user CPU time and system CPU time
Different programs are affected differently by CPU and system performance
23/42
CPU Clocking
Operation of digital hardware governed by a constant-rate clock
[Figure: clock waveform marking the clock period and clock cycles, with data transfer and computation within each cycle followed by a state update]
24/42
CPU Time
Performance improved by
Reducing number of clock cycles
Increasing clock rate
Hardware designer must often trade off clock rate against cycle count
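The two levers listed on this slide follow from the usual definition of CPU time. The relation below is the standard textbook form, written out here as a reminder; it does not appear in the slide text itself.

```latex
\[
\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time}
= \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}
\]
```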
25/42
CPU Time Example
Computer A: 2GHz clock, 10s CPU time
Designing Computer B
Aim for 6s CPU time
A faster clock is possible, but it requires 1.2 × as many clock cycles
How fast must Computer B clock be?
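\[
\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}
\]
\[
\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9
\]
\[
\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}
\]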
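The arithmetic for Computer B can be checked with a short script. This is a minimal sketch; the function and variable names are illustrative, and only the numbers come from the slide.

```python
def required_clock_rate(clock_cycles, target_cpu_time_s):
    """Clock rate (Hz) needed to finish clock_cycles within the target time."""
    return clock_cycles / target_cpu_time_s


clock_rate_a = 2e9          # Computer A: 2 GHz
cpu_time_a = 10.0           # Computer A: 10 s of CPU time
cycles_a = cpu_time_a * clock_rate_a

cycles_b = 1.2 * cycles_a   # Computer B needs 1.2x as many cycles
clock_rate_b = required_clock_rate(cycles_b, 6.0)

print(f"Computer B clock rate: {clock_rate_b / 1e9:.1f} GHz")
```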
26/42
Instruction Count and CPI
Instruction Count for a program
Determined by program, ISA and compiler
Average cycles per instruction
Determined by CPU hardware
If different instructions have different CPI
Average CPI affected by instruction mix
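Instruction count and CPI connect to CPU time through the standard relations below. They are given here as the usual textbook form, with \(T_c\) assumed to denote the clock cycle time.

```latex
\[
\text{Clock Cycles} = \text{Instruction Count} \times \text{CPI}
\]
\[
\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times T_c
= \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}
\]
```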
27/42
CPI Example
Computer A: Cycle Time = 250ps, CPI = 2.0
Computer B: Cycle Time = 500ps, CPI = 1.2
Same ISA
Which is faster, and by how much?
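One way to answer the question is to compare time per instruction, since both machines run the same instruction count under the same ISA. The sketch below assumes CPU time = instruction count × CPI × cycle time; the function name is illustrative.

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction in picoseconds."""
    return cpi * cycle_time_ps


time_a = time_per_instruction_ps(2.0, 250)   # Computer A
time_b = time_per_instruction_ps(1.2, 500)   # Computer B

# Same ISA means the same instruction count, so it cancels in the ratio.
speedup_a_over_b = time_b / time_a

print(f"A: {time_a:.0f} ps/instr, B: {time_b:.0f} ps/instr")
print(f"A is {speedup_a_over_b:.1f} times faster than B")
```

Under these assumptions Computer A is faster, by a factor of about 1.2, despite its higher CPI.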
28/42
CPI in More Detail
If different instruction classes take different # of cycles
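\[
\text{Clock Cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{Instruction Count}_i \right)
\]
\[
\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)
\]
The ratio Instruction Count_i / Instruction Count is the relative frequency of class i.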
29/42
CPI Example
Alternative compiled code sequences using instructions in classes A, B, C
Class              A   B   C
CPI for class      1   2   3
IC in sequence 1   2   1   2
IC in sequence 2   4   1   1
Sequence 1: IC = 5
Clock Cycles = 2×1 + 1×2 + 2×3 = 10
Avg. CPI = 10/5 = 2.0
Sequence 2: IC = 6
Clock Cycles = 4×1 + 1×2 + 1×3 = 9
Avg. CPI = 9/6 = 1.5
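The same comparison can be reproduced in a few lines. This is a small sketch using the per-class CPI values and instruction counts from the table; the dictionary layout and function names are illustrative choices.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def clock_cycles(ic_by_class):
    """Sum of CPI_i * IC_i over all instruction classes."""
    return sum(CPI_BY_CLASS[cls] * count for cls, count in ic_by_class.items())


def average_cpi(ic_by_class):
    """Total clock cycles divided by total instruction count."""
    return clock_cycles(ic_by_class) / sum(ic_by_class.values())


seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("Sequence 1", seq1), ("Sequence 2", seq2)):
    print(name, "cycles =", clock_cycles(seq), "avg CPI =", average_cpi(seq))
```

Sequence 2 executes more instructions but needs fewer clock cycles, which is why instruction count alone is not a reliable guide to performance.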
30/42
Performance Summary
Performance depends on
Algorithm: affects IC, possibly CPI
Programming language: affects IC, CPI
Compiler: affects IC, CPI
Instruction set architecture: affects IC, CPI, and clock cycle time (Tc)
31/42
Power Trends
In CMOS IC technology
\[
\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}
\]
Power ×30, voltage 5V → 1V, frequency ×1000
32/42
Reducing Power
Suppose a new CPU has
85% of capacitive load of old CPU
15% voltage and 15% frequency reduction
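The effect of these reductions can be worked out from the CMOS power relation, power = capacitive load × voltage² × frequency. The sketch below assumes that relation and treats each figure as a ratio of new to old; the function name is illustrative.

```python
def relative_power(cap_ratio, voltage_ratio, freq_ratio):
    """New/old power ratio under power = C * V^2 * f."""
    return cap_ratio * voltage_ratio ** 2 * freq_ratio


# 85% capacitive load, 15% lower voltage, 15% lower frequency
ratio = relative_power(0.85, 0.85, 0.85)

print(f"new power / old power = {ratio:.2f}")
```

Because voltage enters squared, the ratio is 0.85 to the fourth power, so the new CPU draws roughly half the power of the old one.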
33/42
Uniprocessor Performance
34/42
Multiprocessors
Multicore microprocessors
More than one processor per chip
Requires explicitly parallel programming
Compare with instruction level parallelism
Hardware executes multiple instructions at once
Hidden from the programmer
Hard to do
Programming for performance
Load balancing
Optimizing communication and synchronization
35/42
CPU Benchmark
Programs used to measure performance
Supposedly typical of actual workload
36/42
SPECint 2017 Examples
Integer benchmarks on a 1.8 GHz Intel Xeon E5-2650L
37/42
SPEC Power Benchmark
Power consumption of server at different workloads
Performance: ssj_ops/sec
Power: Watts (Joules/sec)
\[
\text{Overall ssj\_ops per Watt} = \frac{\sum_{i=0}^{10} \text{ssj\_ops}_i}{\sum_{i=0}^{10} \text{power}_i}
\]
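The metric sums throughput and power over the eleven load levels (100% down to 0% in 10% steps) before dividing. The sketch below shows only the calculation; the measurement values are made-up placeholders, not results for any real server.

```python
def overall_ssj_ops_per_watt(ssj_ops, power_watts):
    """Sum of ssj_ops over all load levels divided by sum of power."""
    if len(ssj_ops) != len(power_watts):
        raise ValueError("need one power reading per load level")
    return sum(ssj_ops) / sum(power_watts)


# Placeholder data for load levels 100%, 90%, ..., 0% (hypothetical values).
ssj_ops = [1000 * level for level in range(10, -1, -1)]
power_watts = [50 + 5 * level for level in range(10, -1, -1)]

score = overall_ssj_ops_per_watt(ssj_ops, power_watts)
print(f"overall ssj_ops per Watt: {score:.1f}")
```

Note that the idle level contributes power to the denominator but no operations to the numerator, so high idle power lowers the overall score.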
Example:
Xeon E5-2650L
38/42
Amdahl’s Law
Improving an aspect of a computer and expecting a proportional improvement in overall performance
\[
T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}
\]
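A quick numerical sketch shows why the improvement is not proportional. The timings below are hypothetical: a 100 s program in which 80 s is affected by the improvement and 20 s is not.

```python
def improved_time(t_affected, t_unaffected, improvement_factor):
    """Amdahl's Law: only the affected part shrinks."""
    return t_affected / improvement_factor + t_unaffected


T_AFFECTED = 80.0     # hypothetical seconds that benefit from the improvement
T_UNAFFECTED = 20.0   # hypothetical seconds that do not

original = T_AFFECTED + T_UNAFFECTED
for factor in (2, 4, 10, 1000):
    t = improved_time(T_AFFECTED, T_UNAFFECTED, factor)
    print(f"factor {factor:>4}: time {t:6.2f} s, overall speedup {original / t:.2f}")
```

However large the improvement factor, the total time never drops below the unaffected 20 s, so the overall speedup in this example stays below 5.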
39/42
Fallacy: Low Power at Idle
Look back at i7 power benchmark
At 100% load: 258W
At 50% load: 170W (66%)
At 10% load: 121W (47%)
Google data center
Mostly operates at 10% – 50% load
At 100% load less than 1% of the time
Consider designing processors to make power proportional to load
40/42
Pitfall: MIPS as a Performance Metric
MIPS: Millions of Instructions Per Second
Doesn’t account for
Differences in ISAs between computers
Differences in complexity between instructions
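\[
\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6}
= \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6}
= \frac{\text{Clock rate}}{\text{CPI} \times 10^6}
\]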
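The two forms of the MIPS expression can be checked against each other. In this sketch the instruction count, clock rate, and CPI are hypothetical inputs chosen only for illustration.

```python
def mips_from_time(instruction_count, execution_time_s):
    """MIPS = instruction count / (execution time * 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_cpi(clock_rate_hz, cpi):
    """MIPS = clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


# Hypothetical program and machine
instruction_count = 1_000_000_000
clock_rate_hz = 4e9
cpi = 2.0

execution_time_s = instruction_count * cpi / clock_rate_hz
print(mips_from_time(instruction_count, execution_time_s))
print(mips_from_cpi(clock_rate_hz, cpi))
```

The instruction count cancels in the second form, which is part of the pitfall: the rating says nothing about how many instructions a given ISA needs to complete the task.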
41/42
Concluding Remarks
Cost/performance is improving
Due to underlying technology development
42/42