
Chapter 01

The document discusses the fundamentals of computer architecture, focusing on performance improvements driven by advancements in semiconductor technology and computer architectures. It highlights the shift from instruction-level parallelism to new models like data-level and thread-level parallelism, and categorizes different classes of computers based on their performance emphasis. Additionally, it covers trends in technology, power consumption, cost, dependability, and principles of computer design, while addressing common misconceptions in the field.

Uploaded by

Awesome Ali TV

Computer Architecture

A Quantitative Approach, Sixth Edition

Chapter 1
Fundamentals of Quantitative
Design and Analysis

Copyright © 2019, Elsevier Inc. All rights reserved. 1


Introduction
Computer Technology
 Performance improvements:
 Improvements in semiconductor technology

Feature size, clock speed
 Improvements in computer architectures

Enabled by HLL compilers, UNIX

Lead to RISC architectures

 Together have enabled:
 Lightweight computers
 Productivity-based managed/interpreted programming languages



Introduction
Single Processor Performance



Introduction
Current Trends in Architecture
 Cannot continue to leverage instruction-level parallelism (ILP)
 Single processor performance improvement ended in 2003

 New models for performance:
 Data-level parallelism (DLP)
 Thread-level parallelism (TLP)
 Request-level parallelism (RLP)

 These require explicit restructuring of the application



Classes of Computers
Classes of Computers
 Personal Mobile Device (PMD)
 e.g. smart phones, tablet computers
 Emphasis on energy efficiency and real-time performance
 Desktop Computing
 Emphasis on price-performance
 Servers
 Emphasis on availability, scalability, throughput
 Clusters / Warehouse Scale Computers
 Used for “Software as a Service (SaaS)”
 Emphasis on availability and price-performance
 Sub-class: Supercomputers, emphasis: floating-point
performance and fast internal networks
 Internet of Things/Embedded Computers
 Emphasis: price



Classes of Computers
Parallelism
 Classes of parallelism in applications:
 Data-Level Parallelism (DLP)
 Task-Level Parallelism (TLP)

 Classes of architectural parallelism:


 Instruction-Level Parallelism (ILP)
 Vector architectures/Graphics Processing Units (GPUs)
 Thread-Level Parallelism
 Request-Level Parallelism



Classes of Computers
Flynn’s Taxonomy
 Single instruction stream, single data stream (SISD)

 Single instruction stream, multiple data streams (SIMD)
 Vector architectures
 Multimedia extensions
 Graphics processor units

 Multiple instruction streams, single data stream (MISD)
 No commercial implementation

 Multiple instruction streams, multiple data streams (MIMD)
 Tightly-coupled MIMD
 Loosely-coupled MIMD



 The sequence of instructions read from memory is called an instruction stream.
 The sequence of operations performed on the data in the processor is called a data stream.



 SISD:



 SIMD



SIMD
 All processors receive same instruction but
operate on different items of data.



MIMD



Defining Computer Architecture
Defining Computer Architecture
 “Old” view of computer architecture:
 Instruction Set Architecture (ISA) design
 i.e. decisions regarding:

registers, memory addressing, addressing modes,
instruction operands, available operations, control flow
instructions, instruction encoding

 “Real” computer architecture:


 Specific requirements of the target machine
 Design to maximize performance within constraints:
cost, power, and availability
 Includes ISA, microarchitecture, hardware



Defining Computer Architecture
Instruction Set Architecture
 Class of ISA
 General-purpose registers
 Register-memory vs load-store
 RISC-V registers
 32 g.p., 32 f.p.

Register   Name       Use                 Saver
x0         zero       constant 0          n/a
x1         ra         return addr         caller
x2         sp         stack ptr           callee
x3         gp         gbl ptr
x4         tp         thread ptr
x5-x7      t0-t2      temporaries         caller
x8         s0/fp      saved/frame ptr     callee
x9         s1         saved               callee
x10-x17    a0-a7      arguments           caller
x18-x27    s2-s11     saved               callee
x28-x31    t3-t6      temporaries         caller
f0-f7      ft0-ft7    FP temps            caller
f8-f9      fs0-fs1    FP saved            callee
f10-f17    fa0-fa7    FP arguments        caller
f18-f27    fs2-fs11   FP saved            callee
f28-f31    ft8-ft11   FP temps            caller



Defining Computer Architecture
Instruction Set Architecture
 Memory addressing
 RISC-V: byte addressed, aligned accesses faster
 Addressing modes
 RISC-V: Register, immediate, displacement
(base+offset)
 Other examples: autoincrement, indexed, PC-relative
 Types and size of operands
 RISC-V: 8-bit, 16-bit, 32-bit, 64-bit



Defining Computer Architecture
Instruction Set Architecture
 Operations
 RISC-V: data transfer, arithmetic, logical, control,
floating point
 See Fig. 1.5 in text
 Control flow instructions
 Use content of registers (RISC-V) vs. status bits (x86,
ARMv7, ARMv8)
 Return address in register (RISC-V, ARMv7, ARMv8)
vs. on stack (x86)
 Encoding
 Fixed (RISC-V, ARMv7/v8 except compact instruction set) vs. variable length (x86)

Trends in Technology
Trends in Technology
 Integrated circuit technology (Moore’s Law)
 Transistor density: 35%/year
 Die size: 10-20%/year
 Integration overall: 40-55%/year

 DRAM capacity: 25-40%/year (slowing)


 8 Gb (2014), 16 Gb (2019), possibly no 32 Gb

 Flash capacity: 50-60%/year


 8-10X cheaper/bit than DRAM

 Magnetic disk capacity: recently slowed to 5%/year


 Density increases may no longer be possible; capacity may still grow by going from 7 to 9 platters
 8-10X cheaper/bit than Flash
 200-300X cheaper/bit than DRAM
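
To make the annual rates above more concrete, a compound growth rate can be converted into a doubling time. This is a minimal sketch: the function names are illustrative, and the rates are the ones quoted on this slide.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a compound annual growth rate (0.35 means 35%/year)."""
    return math.log(2) / math.log(1 + annual_growth)


def growth_factor(annual_growth: float, years: float) -> float:
    """Total improvement factor after compounding for the given number of years."""
    return (1 + annual_growth) ** years


if __name__ == "__main__":
    rates = {
        "Transistor density": 0.35,
        "Flash capacity (low end)": 0.50,
        "Magnetic disk capacity": 0.05,
    }
    for label, rate in rates.items():
        years = doubling_time_years(rate)
        print(f"{label}: {rate:.0%}/year doubles in about {years:.1f} years")
```

At 35%/year, density doubles roughly every 2.3 years; at 5%/year, disk capacity needs about 14 years to double.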



Trends in Technology
Bandwidth and Latency
 Bandwidth or throughput
 Total work done in a given time
 32,000-40,000X improvement for processors
 300-1200X improvement for memory and disks

 Latency or response time


 Time between start and completion of an event
 50-90X improvement for processors
 6-8X improvement for memory and disks



Trends in Technology
Bandwidth and Latency

Log-log plot of bandwidth and latency milestones



Trends in Technology
Transistors and Wires
 Feature size
 Minimum size of transistor or wire in x or y
dimension
 10 microns in 1971 to .011 microns in 2017
 Transistor performance scales linearly

Wire delay does not improve with feature size!
 Integration density scales quadratically



Trends in Power and Energy
Power and Energy
 Problem: Get power in, get power out

 Thermal Design Power (TDP)
 Characterizes sustained power consumption
 Used as target for power supply and cooling system
 Lower than peak power (which is often 1.5X higher), but higher than average power consumption

 Clock rate can be reduced dynamically to limit power consumption

 Energy per task is often a better measurement

Trends in Power and Energy
Dynamic Energy and Power
 Dynamic energy
 Transistor switch from 0 -> 1 or 1 -> 0
 $\frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2$

 Dynamic power
 $\frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$

 Reducing clock rate reduces power, not energy
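
The two expressions above can be checked numerically. The sketch below uses normalized, hypothetical baseline values (capacitive load, voltage and frequency all set to 1) and illustrative function names. It shows that lowering only the clock rate reduces power but not energy per transition, while lowering voltage reduces both.

```python
def dynamic_energy(capacitive_load: float, voltage: float) -> float:
    """Energy of a single 0->1 or 1->0 transition: 1/2 * C * V^2."""
    return 0.5 * capacitive_load * voltage ** 2


def dynamic_power(capacitive_load: float, voltage: float, frequency: float) -> float:
    """Power when transitions happen `frequency` times per second: 1/2 * C * V^2 * f."""
    return dynamic_energy(capacitive_load, voltage) * frequency


if __name__ == "__main__":
    c, v, f = 1.0, 1.0, 1.0  # normalized, hypothetical baseline

    # Case 1: clock rate reduced by 15%, voltage unchanged.
    print("power ratio :", dynamic_power(c, v, 0.85 * f) / dynamic_power(c, v, f))
    print("energy ratio:", dynamic_energy(c, v) / dynamic_energy(c, v))

    # Case 2: voltage and clock rate both reduced by 15%.
    print("power ratio :", dynamic_power(c, 0.85 * v, 0.85 * f) / dynamic_power(c, v, f))
    print("energy ratio:", dynamic_energy(c, 0.85 * v) / dynamic_energy(c, v))
```

Because voltage enters as a square, a 15% voltage reduction cuts energy to about 72% of the baseline, which is why voltage scaling is paired with frequency scaling in practice.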



Trends in Power and Energy
Power
 Intel 80386 consumed ~2 W
 3.3 GHz Intel Core i7 consumes 130 W
 Heat must be dissipated from a 1.5 x 1.5 cm chip
 This is the limit of what can be cooled by air



Trends in Power and Energy
Reducing Power
 Techniques for reducing power:
 Do nothing well
 Dynamic Voltage-Frequency Scaling

 Low power state for DRAM, disks


 Overclocking, turning off cores

Trends in Power and Energy
Static Power
 Static power consumption
 25-50% of total power
 $\text{Current}_{static} \times \text{Voltage}$
 Scales with number of transistors
 To reduce: power gating



Trends in Cost
Trends in Cost
 Cost driven down by learning curve
 Yield

 DRAM: price closely tracks cost

 Microprocessors: price depends on volume
 10% less for each doubling of volume
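
The "10% less for each doubling of volume" rule of thumb can be written as a small function. This is only a sketch: the function name, the base price and the volumes are hypothetical, and the rule is applied continuously between doublings, which is an assumption.

```python
import math


def price_after_volume(base_price: float, base_volume: float, new_volume: float,
                       drop_per_doubling: float = 0.10) -> float:
    """Estimated price after volume grows from base_volume to new_volume.

    Each doubling of volume multiplies the price by (1 - drop_per_doubling).
    """
    doublings = math.log2(new_volume / base_volume)
    return base_price * (1 - drop_per_doubling) ** doublings


if __name__ == "__main__":
    # Hypothetical part: 100 currency units at a volume of 1,000 units.
    for volume in (1_000, 2_000, 4_000, 8_000):
        print(volume, round(price_after_volume(100.0, 1_000, volume), 2))
```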



Trends in Cost
Integrated Circuit Cost
 Integrated circuit

 Bose-Einstein formula:

 Defects per unit area = 0.016-0.057 defects per square cm (2010)


 N = process-complexity factor = 11.5-15.5 (40 nm, 2010)
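
The cost model behind this slide can be sketched in code. The formulas in the comments are the ones given in the accompanying textbook chapter (dies per wafer, Bose-Einstein die yield, cost per die). The wafer cost, wafer size, die area and the specific defect density and N used in the example are hypothetical values chosen inside the ranges quoted above.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """pi * (d/2)^2 / die_area  -  pi * d / sqrt(2 * die_area).

    The second term approximates the partial dies lost along the wafer edge.
    """
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss


def die_yield(defects_per_cm2: float, die_area_cm2: float, n: float,
              wafer_yield: float = 1.0) -> float:
    """Bose-Einstein model: wafer_yield / (1 + defects_per_area * die_area)^N."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n


def cost_per_die(wafer_cost: float, wafer_diameter_cm: float, die_area_cm2: float,
                 defects_per_cm2: float, n: float) -> float:
    """Cost of wafer / (dies per wafer * die yield)."""
    good_dies = dies_per_wafer(wafer_diameter_cm, die_area_cm2) * die_yield(
        defects_per_cm2, die_area_cm2, n)
    return wafer_cost / good_dies


if __name__ == "__main__":
    # Hypothetical inputs: 30 cm wafer costing 5000, defect density 0.03/cm^2, N = 12.
    for area in (1.0, 2.25):
        print(area, round(cost_per_die(5000.0, 30.0, area, 0.03, 12), 2))
```

Because die area appears both in the die count and, raised to the power N, in the yield, cost per die grows much faster than linearly with die area.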



Dependability
Dependability
 Module reliability
 Mean time to failure (MTTF)
 Mean time to repair (MTTR)
 Mean time between failures (MTBF) = MTTF + MTTR
 Availability = MTTF / MTBF
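
The definitions above translate directly into code. A minimal sketch with illustrative function names; the MTTF and MTTR figures in the example are hypothetical.

```python
def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the module is in service: MTTF / (MTTF + MTTR)."""
    return mttf_hours / mtbf(mttf_hours, mttr_hours)


if __name__ == "__main__":
    # Hypothetical module: fails every 999 hours on average, takes 1 hour to repair.
    print(f"MTBF = {mtbf(999.0, 1.0):.0f} hours")
    print(f"Availability = {availability(999.0, 1.0):.3%}")
```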



Measuring Performance
Measuring Performance
 Typical performance metrics:
 Response time
 Throughput

 Speedup of X relative to Y
 $\text{Execution time}_Y / \text{Execution time}_X$

 Execution time
 Wall clock time: includes all system overheads
 CPU time: only computation time

 Benchmarks
 Kernels (e.g. matrix multiply)
 Toy programs (e.g. sorting)
 Synthetic benchmarks (e.g. Dhrystone)
 Benchmark suites (e.g. SPEC06fp, TPC-C)



Principles
Principles of Computer Design
 Take Advantage of Parallelism
 e.g. multiple processors, disks, memory banks,
pipelining, multiple functional units

 Principle of Locality
 Reuse of data and instructions

 Focus on the Common Case


 Amdahl’s Law
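
Amdahl's Law bounds the overall speedup by the fraction of execution time that an enhancement actually affects: overall speedup = 1 / ((1 - fraction enhanced) + fraction enhanced / speedup enhanced). A minimal sketch follows; the function name and the example numbers are illustrative.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the original execution time
    is accelerated by a factor of `speedup_enhanced`."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


if __name__ == "__main__":
    # An enhancement that makes 40% of the execution time 10x faster.
    print(amdahl_speedup(0.4, 10.0))
    # Even an enormous speedup of that 40% cannot exceed 1 / (1 - 0.4).
    print(amdahl_speedup(0.4, 1e9), 1.0 / (1.0 - 0.4))
```

The second line of output illustrates "focus on the common case": the untouched 60% of the execution time caps the overall gain at about 1.67x.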



Principles
Principles of Computer Design
 The Processor Performance Equation
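
For reference, the processor performance equation is usually written as below. IC (instruction count) and CPI (clock cycles per instruction) are the conventional symbols and are assumed here.

```latex
\begin{aligned}
\text{CPU time} &= \text{IC} \times \text{CPI} \times \text{Clock cycle time} \\
                &= \frac{\text{IC} \times \text{CPI}}{\text{Clock rate}}
\end{aligned}
```

The three factors depend on different things: instruction count on the ISA and compiler, CPI on the ISA and organization, and clock cycle time on the organization and hardware technology.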



Principles
Principles of Computer Design
 Different instruction types having different
CPIs
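
When instruction types have different CPIs, the overall CPI is the average weighted by how often each type executes, and CPU time follows as instruction count x CPI / clock rate. The sketch below assumes a hypothetical instruction mix and clock rate; the function names are illustrative.

```python
def average_cpi(mix):
    """Weighted CPI for a list of (fraction_of_instructions, cpi) pairs.

    The fractions are expected to sum to 1.
    """
    return sum(fraction * cpi for fraction, cpi in mix)


def cpu_time_seconds(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


if __name__ == "__main__":
    # Hypothetical mix: 50% ALU (CPI 1), 30% load/store (CPI 2), 20% branch (CPI 3).
    mix = [(0.5, 1.0), (0.3, 2.0), (0.2, 3.0)]
    cpi = average_cpi(mix)
    print(f"average CPI = {cpi:.2f}")
    print(f"CPU time = {cpu_time_seconds(1e9, cpi, 2e9):.3f} s for 1e9 instructions at 2 GHz")
```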



Fallacies and Pitfalls
 All exponential laws must come to an end
 Dennard scaling (constant power density)

Stopped by threshold voltage
 Disk capacity

30-100% per year to 5% per year
 Moore’s Law

Most visible with DRAM capacity

ITRS disbanded

Only four foundries left producing state-of-the-art
logic chips

11 nm, 3 nm might be the limit



Fallacies and Pitfalls
 Multiprocessors are a silver bullet
 Performance is now a programmer’s burden
 Falling prey to Amdahl’s Law
 A single point of failure
 Hardware enhancements that increase
performance also improve energy
efficiency, or are at worst energy neutral
 Benchmarks remain valid indefinitely
 Compiler optimizations target benchmarks



Fallacies and Pitfalls
 The rated mean time to failure of disks is
1,200,000 hours or almost 140 years, so
disks practically never fail
 MTTF values from manufacturers assume regular replacement
 Peak performance tracks observed
performance
 Fault detection can lower availability
 Not all operations are needed for correct
execution
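
To see why the rated disk MTTF above does not mean that disks practically never fail, it helps to convert it into an expected number of failures per year across many disks. This sketch assumes a constant failure rate and that failed disks are replaced; the fleet size of 1,000 disks is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours


def mttf_in_years(mttf_hours: float) -> float:
    """Rated MTTF expressed in years of continuous operation."""
    return mttf_hours / HOURS_PER_YEAR


def annual_failure_fraction(mttf_hours: float) -> float:
    """Expected fraction of disks failing per year, assuming a constant failure rate."""
    return HOURS_PER_YEAR / mttf_hours


def expected_failures_per_year(num_disks: int, mttf_hours: float) -> float:
    """Expected failures per year in a fleet where failed disks are replaced."""
    return num_disks * annual_failure_fraction(mttf_hours)


if __name__ == "__main__":
    rated_mttf = 1_200_000  # hours, as quoted on the slide
    print(f"{mttf_in_years(rated_mttf):.0f} years")
    print(f"{annual_failure_fraction(rated_mttf):.2%} of disks per year")
    print(f"{expected_failures_per_year(1000, rated_mttf):.1f} failures/year in 1000 disks")
```

Under these assumptions, a fleet of 1,000 such disks still sees several failures every year, even though a single disk's rated MTTF is close to 140 years.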

PC vs SMART PHONE
 1. Architecture
• Smartphone CPUs: Often use ARM architecture, which is optimized for power efficiency and lower heat generation, making them suitable for battery-powered devices.
• PC CPUs: Typically use x86 architecture (from Intel or AMD), designed for higher performance, with more cores and threads, allowing for complex computing

 2. Performance
• Smartphone CPUs: Generally designed for lower performance compared to PC CPUs. They focus on efficient multitasking, handling everyday tasks, and running mobile applications.
• PC CPUs: Offer higher clock speeds, more cores, and greater processing power, making them better suited for demanding tasks like gaming, video

 3. Power Consumption
• Smartphone CPUs: Prioritize energy efficiency to extend battery life. They often throttle performance to conserve power during less demanding tasks.
• PC CPUs: Can consume more power, especially during intensive workloads. They are typically paired with larger cooling systems to manage heat.

 4. Use Cases
• Smartphone CPUs: Optimized for mobile applications, multitasking, and connectivity features like cellular and Wi-Fi.
• PC CPUs: Designed for a broader range of applications, including high-performance computing, gaming, and professional software.
