Computer Architecture

ELE 475 / COS 475

Slide Deck 1: Introduction and Instruction Set Architectures
David Wentzlaff
Department of Electrical Engineering
Princeton University

What is Computer Architecture?

Application
(Gap too large to bridge in one step)
Physics

In its broadest definition, computer architecture is the design of the abstraction/implementation layers that allow us to execute information processing applications efficiently using manufacturing technologies.

Abstractions in Modern Computing Systems

Application
Algorithm
Programming Language
Operating System/Virtual Machines
Instruction Set Architecture
Microarchitecture
Register-Transfer Level
    [Computer Architecture (ELE 475)]
Gates
Circuits
Devices
Physics

Computer Architecture is Constantly Changing

Application
Algorithm
Programming Language
Operating System/Virtual Machines
Instruction Set Architecture
Microarchitecture
Register-Transfer Level
Gates
Circuits
Devices
Physics

Application Requirements:
- Suggest how to improve architecture
- Provide revenue to fund development

Technology Constraints:
- Restrict what can be done efficiently
- New technologies make new arch possible

Architecture provides feedback to guide application and technology research directions

Computers Then

IAS Machine. Design directed by John von Neumann.

First booted in Princeton NJ in 1952
Smithsonian Institution Archives (Smithsonian Image 95-06151)

Computers Now

Sensor nets, cameras, set-top boxes, media players, laptops, games, servers, routers, smartphones, automobiles, robots, supercomputers

Major Technology Generations

[Figure: major technology generations, from Kurzweil. Labels: Electromechanical, Relays, Vacuum Tubes, Bipolar, nMOS, pMOS, CMOS]

Sequential Processor Performance

[Figure: sequential processor performance plot. Annotations: RISC; Move to multi-processor]

Course Structure

- Recommended Readings
- In-Lecture Questions
- Problem Sets
  - Very useful for exam preparation
- Peer Evaluation
- Midterm
- Final Exam

Course Content: Computer Organization (ELE 375)

Computer Organization: Basic Pipelined Processor
~50,000 Transistors
Photo of Berkeley RISC I, University of California (Berkeley)

Course Content: Computer Architecture (ELE 475)

Instruction Level Parallelism
- Superscalar
- Very Long Instruction Word (VLIW)
Long Pipelines (Pipeline Parallelism)
Advanced Memory and Caches
Data Level Parallelism
- Vector
- GPU
Thread Level Parallelism
- Multithreading
- Multiprocessor
- Multicore
- Manycore

[Figure: die shot annotated with "Computer Organization (ELE 375) Processor"]
~700,000,000 Transistors
Intel Nehalem Processor, Original Core i7, Image Credit Intel:
http://download.intel.com/pressroom/kits/corei7/images/Nehalem_Die_Shot_3.jpg

Architecture vs. Microarchitecture

Architecture/Instruction Set Architecture:
- Programmer-visible state (Memory & Register)
- Operations (Instructions and how they work)
- Execution Semantics (interrupts)
- Input/Output
- Data Types/Sizes

Microarchitecture/Organization:
- Tradeoffs on how to implement ISA for some metric (Speed, Energy, Cost)
- Examples: Pipeline depth, number of pipelines, cache size, silicon area, peak power, execution ordering, bus widths, ALU widths
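
The architecture/microarchitecture split can be sketched in a few lines of Python. Everything here is hypothetical and not taken from the slides: the three-instruction toy ISA, the register names, and the per-instruction cycle costs are invented purely to show that two implementations can agree on programmer-visible state while differing in a metric such as speed.

```python
# Toy illustration: one ISA, two hypothetical implementations.
# Instruction names and cycle costs are invented for this sketch.

def execute(program, cycle_cost):
    """Run `program`; return (architectural register state, cycle count).

    The register file is the programmer-visible (architectural) state.
    `cycle_cost` stands in for a microarchitectural design choice.
    """
    regs = {"r0": 0, "r1": 0, "r2": 0}
    cycles = 0
    for op, dst, a, b in program:
        if op == "li":        # load immediate: dst = a
            regs[dst] = a
        elif op == "add":     # dst = regs[a] + regs[b]
            regs[dst] = regs[a] + regs[b]
        elif op == "mul":     # dst = regs[a] * regs[b]
            regs[dst] = regs[a] * regs[b]
        else:
            raise ValueError(f"unknown opcode: {op}")
        cycles += cycle_cost[op]
    return regs, cycles


PROGRAM = [
    ("li", "r0", 6, None),
    ("li", "r1", 7, None),
    ("mul", "r2", "r0", "r1"),
    ("add", "r2", "r2", "r0"),
]

# Two made-up designs: a small iterative multiplier vs. a fast one.
SMALL_CORE = {"li": 1, "add": 1, "mul": 32}
FAST_CORE = {"li": 1, "add": 1, "mul": 3}

small_state, small_cycles = execute(PROGRAM, SMALL_CORE)
fast_state, fast_cycles = execute(PROGRAM, FAST_CORE)
```

Both runs end with the same register contents, which is what the ISA promises; only the cycle counts differ, which is where the microarchitectural tradeoffs show up.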

Software Developments

up to 1955
Libraries of numerical routines
- Floating point operations
- Transcendental functions
- Matrix manipulation, equation solvers, . . .

1955-60
High level Languages - Fortran 1956
Operating Systems
- Assemblers, Loaders, Linkers, Compilers
- Accounting programs to keep track of usage and charges

Machines required experienced operators
- Most users could not be expected to understand these programs, much less write them
- Machines had to be sold with a lot of resident software

Compatibility Problem at IBM

By the early 1960s, IBM had 4 incompatible lines of computers!

701  -> 7094
650  -> 7074
702  -> 7080
1401 -> 7010

Each system had its own
- Instruction set
- I/O system and Secondary Storage: magnetic tapes, drums and disks
- assemblers, compilers, libraries, ...
- market niche: business, scientific, real time, ...

=> IBM 360

IBM 360: Design Premises
Amdahl, Blaauw and Brooks, 1964

- The design must lend itself to growth and successor machines
- General method for connecting I/O devices
- Total performance: answers per month rather than bits per microsecond => programming aids
- Machine must be capable of supervising itself without manual intervention
- Built-in hardware fault checking and locating aids to reduce down time
- Simple to assemble systems with redundant I/O devices, memories etc. for fault tolerance
- Some problems required floating-point larger than 36 bits

IBM 360: A General-Purpose Register (GPR) Machine

Processor State
- 16 General-Purpose 32-bit Registers
  - may be used as index and base register
  - Register 0 has some special properties
- 4 Floating Point 64-bit Registers
- A Program Status Word (PSW)
  - PC, Condition codes, Control flags

A 32-bit machine with 24-bit addresses
- But no instruction contains a 24-bit address!

Data Formats
- 8-bit bytes, 16-bit half-words, 32-bit words, 64-bit double-words

The IBM 360 is why bytes are 8 bits long today!
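
One way to see how a machine can use 24-bit addresses without ever encoding one in an instruction is the sketch below. It assumes the S/360's base-plus-index-plus-displacement addressing with a 12-bit displacement, and assumes that naming register 0 as base or index contributes zero; neither detail is spelled out on the slide. The function name and the register values are hypothetical.

```python
# Sketch of effective-address formation, assuming base + index + 12-bit
# displacement addressing truncated to 24 bits. Names are illustrative only.

ADDRESS_MASK = (1 << 24) - 1      # keep the low 24 bits of the sum
MAX_DISPLACEMENT = (1 << 12) - 1  # assumed 12-bit unsigned displacement


def effective_address(regs, base, index, displacement):
    """Return the 24-bit address named by (base, index, displacement).

    `regs` holds the 16 general-purpose register values. Register number 0
    in the base or index field is assumed to mean "no register".
    """
    if not 0 <= displacement <= MAX_DISPLACEMENT:
        raise ValueError("displacement does not fit in 12 bits")
    total = displacement
    if base != 0:
        total += regs[base]
    if index != 0:
        total += regs[index]
    return total & ADDRESS_MASK


regs = [0] * 16
regs[0] = 0xDEAD        # ignored when register 0 is named as base or index
regs[12] = 0x012000     # hypothetical base address
regs[3] = 0x000010      # hypothetical index value
regs[5] = 0xFFFFFFFF    # used to show truncation to 24 bits

addr = effective_address(regs, base=12, index=3, displacement=0x0A4)
```

The instruction itself carries only two 4-bit register numbers and a 12-bit displacement; the full address comes from register contents, so 24 bits reach 2^24 bytes (16 MiB) of storage.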

IBM 360: Initial Implementations

                 Model 30            ...   Model 70
Storage          8K - 64 KB                256K - 512 KB
Datapath         8-bit                     64-bit
Circuit Delay    30 nsec/level             5 nsec/level
Local Store      Main Store                Transistor Registers
Control Store    Read only 1 µsec          Conventional circuits

IBM 360 instruction set architecture (ISA) completely hid the underlying technological differences between various models.
Milestone: The first true ISA designed as portable hardware-software interface!
With minor modifications it still survives today!

IBM 360: 47 years later
The zSeries z11 Microprocessor

- 5.2 GHz in IBM 45nm PD-SOI CMOS technology
- 1.4 billion transistors in 512 mm²
- 64-bit virtual addressing
  - original S/360 was 24-bit, and S/370 was 31-bit extension
- Quad-core design
- Three-issue out-of-order superscalar pipeline
- Out-of-order memory accesses
- Redundant datapaths
  - every instruction performed in two parallel datapaths and results compared
- 64KB L1 I-cache, 128KB L1 D-cache on-chip
- 1.5MB private L2 unified cache per core, on-chip
- On-Chip 24MB eDRAM L3 cache
- Scales to 96-core multiprocessor with 768MB of shared L4 eDRAM

[IBM, Kevin Shum, HotChips, 2010]
Image Credit: IBM. Courtesy of International Business Machines Corporation, © International Business Machines Corporation.

Same Architecture, Different Microarchitecture

                AMD Phenom X4                       Intel Atom
ISA             X86 Instruction Set                 X86 Instruction Set
Cores           Quad Core                           Single Core
Power           125W                                2W
Decode          3 Instructions/Cycle/Core           2 Instructions/Cycle/Core
L1 Cache        64KB L1 I Cache, 64KB L1 D Cache    32KB L1 I Cache, 24KB L1 D Cache
L2 Cache        512KB L2 Cache                      512KB L2 Cache
Execution       Out-of-order                        In-order
Clock           2.6GHz                              1.6GHz
Image Credit    AMD                                 Intel

Different Architecture, Different Microarchitecture

                AMD Phenom X4                       IBM POWER7
ISA             X86 Instruction Set                 Power Instruction Set
Cores           Quad Core                           Eight Core
Power           125W                                200W
Decode          3 Instructions/Cycle/Core           6 Instructions/Cycle/Core
L1 Cache        64KB L1 I Cache, 64KB L1 D Cache    32KB L1 I Cache, 32KB L1 D Cache
L2 Cache        512KB L2 Cache                      256KB L2 Cache
Execution       Out-of-order                        Out-of-order
Clock           2.6GHz                              4.25GHz
Image Credit    AMD                                 IBM

Courtesy of International Business Machines Corporation, © International Business Machines Corporation.

Where Do Operands Come from And Where Do Results Go?

[Figure: four processor/memory diagrams, each with an ALU: Stack (operands at top of stack, TOS), Accumulator, Register-Memory, Register-Register]

                                      Stack    Accumulator    Register-Memory    Register-Register
Number of Explicitly Named Operands   0        1              2 or 3             2 or 3
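
As a rough illustration of the operand counts, the sketch below writes C = A + B in each of the four styles (stack, accumulator, register-memory, register-register) and counts the operands that the add instruction names explicitly. The mnemonics and register names are textbook-style placeholders chosen for this example, not taken from the slides, and the register forms shown use the three-operand variant.

```python
# C = A + B in four ISA styles. Mnemonics and register names are
# illustrative placeholders, not a real instruction set.
SEQUENCES = {
    "Stack": [
        ("push", "A"),
        ("push", "B"),
        ("add",),                   # operands are implicit: top two stack entries
        ("pop", "C"),
    ],
    "Accumulator": [
        ("load", "A"),
        ("add", "B"),               # accumulator is the implicit second operand
        ("store", "C"),
    ],
    "Register-Memory": [
        ("load", "R1", "A"),
        ("add", "R3", "R1", "B"),   # one source operand comes from memory
        ("store", "R3", "C"),
    ],
    "Register-Register": [
        ("load", "R1", "A"),
        ("load", "R2", "B"),
        ("add", "R3", "R1", "R2"),  # all ALU operands are registers
        ("store", "R3", "C"),
    ],
}


def alu_operand_count(sequence):
    """Return how many operands the add instruction names explicitly."""
    for instr in sequence:
        if instr[0] == "add":
            return len(instr) - 1
    raise ValueError("sequence contains no add instruction")


counts = {name: alu_operand_count(seq) for name, seq in SEQUENCES.items()}
```

A two-operand register form such as `add R1, B` (destination doubling as a source) would account for the "2" in "2 or 3".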

Stack-Based Instruction Set Architecture (ISA)

[Figure: processor with top-of-stack (TOS) and ALU, connected to memory]

- Burroughs B5000 (1960)
- Burroughs B6700
- HP 3000
- ICL 2900
- Symbolics 3600

Modern
- Inmos Transputer
- Forth machines
- Java Virtual Machine
- Intel x87 Floating Point Unit
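
A minimal sketch of how a stack-based ISA executes may help here. The four-instruction set below (push, pop, add, mul) is a hypothetical toy, not the instruction set of any machine listed above; it only shows that arithmetic instructions take their operands from the top of the stack and push the result back.

```python
# Toy stack machine. The instruction set is invented for illustration.

def run_stack_program(program, memory):
    """Execute `program` against `memory` (a dict of named locations)."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "push":            # push memory[name] onto the stack
            stack.append(memory[instr[1]])
        elif op == "pop":           # pop top of stack into memory[name]
            memory[instr[1]] = stack.pop()
        elif op == "add":           # replace top two entries with their sum
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "mul":           # replace top two entries with their product
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory


# D = (A + B) * C
memory = {"A": 2, "B": 3, "C": 4, "D": 0}
program = [
    ("push", "A"),
    ("push", "B"),
    ("add",),
    ("push", "C"),
    ("mul",),
    ("pop", "D"),
]
run_stack_program(program, memory)
```

Note that `add` and `mul` carry no operand fields at all, which keeps instructions short; the cost is that operand order is fixed by the order of the pushes.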