
CS6461 Computer Architecture

Fall 2016
Morris Lancaster
Adapted from Professor Stephen Kaisler's notes

Lecture 3 - Instruction Set Architecture


Evolution of ISAs
Single Accumulator (EDSAC, 1950)
Accumulator + Index Registers (Manchester Mark I, IBM 700 series, 1953)
Separation of Programming Model from Implementation:
  High-Level Language Based (B5000, 1963); Concept of a Family (IBM 360, 1964)
General Purpose Register Machines:
  Complex Instruction Sets (VAX, Intel 432, 1977-80); Load/Store
  Architecture (CDC 6600, Cray 1, 1963-76)
RISC (MIPS, SPARC, HP-PA, IBM RS/6000, PowerPC, ... 1987)
LIW/EPIC? (IA-64, ... 1999)



Instruction Set Architecture (ISA)

The ISA is the portion of the machine visible to the assembly-level
programmer or to the compiler writer.
- It is (in recent designs) not the real machine!
- It is an interface between hardware and low-level software.
- It standardizes instructions and the bit patterns that represent them,
  following specified design principles.
- It is the machine description that a hardware designer must understand
  to design a correct implementation of the computer.
- Advantage: different implementations of the same instruction set are
  possible (e.g., IBM System/360).
- Disadvantage: the desire for compatibility prevents trying out new ideas.
- Question: is binary compatibility really important?
Why is ISA Design important?

The instruction set architecture is one of the most important design
issues that a CPU designer must get right from the start.
- Features like caches, pipelining, superscalar implementation, etc., can
  all be grafted onto a CPU design long after the original design is
  obsolete (e.g., the Intel x86 architecture).
- However, it is very difficult to change the instructions a CPU executes
  once the CPU is in production and people are writing software that uses
  those instructions.
- Therefore, one must choose the instructions for a CPU carefully.
You might be tempted to take the "kitchen sink" approach to instruction
set design and include as many instructions as you can dream up.
This approach fails for several reasons.
Nasty Reality #1: Silicon Real Estate

The first problem with "putting it all on the chip" is that each feature
requires some number of transistors on the CPU's silicon die.
- CPU designers work within a "silicon budget": they are given a finite
  number of transistors to work with.
- This means there aren't enough transistors to support putting every
  feature on a CPU.
- The original 8086 processor, for example, had a transistor budget of
  less than 30,000 transistors.
- The Pentium III processor had a budget of over eight million transistors.



Nasty Reality #2: Cost

Although it is possible to put millions of transistors on a CPU today,
the more transistors you use, the more expensive the CPU.
The more instructions you have (CISC), the more complex the circuitry
and, consequently, the more transistors you need.



Nasty Reality #3: Expandability

One problem with the "kitchen sink" approach is that it is very difficult
to anticipate all the features people will want.
- For example, Intel's MMX and SIMD instruction enhancements were added to
  make multimedia programming more practical on the Pentium processor.
- Back in 1978, very few people could have anticipated the need for these
  instructions.
- When designing a CPU with the "kitchen sink" approach, it is common to
  discover that programs almost never use some of the available
  instructions.
- Unless very few programs use an instruction (and you're willing to let
  them break), or you can automatically simulate the instruction in
  software, removing instructions is very difficult.



Nasty Reality #4: Legacy Support

Often an instruction the CPU designer feels is important turns out to be
less useful than anticipated.
- For example, the LOOP instruction on the 80x86 CPU sees very little use
  in modern high-performance programs.
- The 80x86 ENTER instruction is another good example.
- This is almost the opposite of the expandability problem: retaining
  instructions that turn out not to be useful.
- Unfortunately, you cannot easily remove instructions in later versions
  of a processor, because doing so will break existing programs that use
  those instructions.
- Generally, once you add an instruction, you have to support it in the
  instruction set forever.
Nasty Reality #5: Complexity

The popularity of a new processor is easily measured by how much software
people write for it.
- Most CPU designs die a quick death because no one writes software
  specific to that CPU.
- Therefore, a CPU designer must consider the assembly programmers and
  compiler writers who will be using the chip upon its introduction.
While a "kitchen sink" approach might seem to appeal to such programmers,
the truth is that no one wants to learn an overly complex system.
If your CPU does everything under the sun, this might appeal to someone
who is already familiar with the CPU; but pity the poor soul who doesn't
know the chip and has to learn it all at once.



Basic Instruction Design Goals

In a typical von Neumann architecture, the computer encodes CPU
instructions as numeric values and stores these values in memory.
- The encoding of these instructions is one of the major tasks in
  instruction set design and requires very careful thought.
- To encode an instruction, we must pick a unique numeric opcode value
  for each instruction.
- An n-bit number provides 2^n different possible opcodes, so to encode
  m instructions you need an opcode that is at least ceil(log2(m)) bits
  long.
- With 128 instructions (2^7), you need a 7-line to 128-line decoder to
  determine which instruction you have.
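As a concrete illustration, here is a minimal C sketch (the function name
is my own, not from the lecture) that computes the minimum opcode width
for m instructions by doubling the opcode space one bit at a time:

    #include <stdio.h>

    /* Minimum opcode width for m instructions: the smallest number of
       bits whose 2^bits opcode space can hold all m encodings. */
    static unsigned opcode_bits(unsigned long m)
    {
        unsigned bits = 0;
        unsigned long capacity = 1;
        while (capacity < m) {
            capacity <<= 1;   /* one more bit doubles the opcode space */
            bits++;
        }
        return bits;
    }

    int main(void)
    {
        printf("128 instructions need %u opcode bits\n", opcode_bits(128));  /* 7 */
        printf("200 instructions need %u opcode bits\n", opcode_bits(200));  /* 8 */
        return 0;
    }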



8-Instruction Decoder
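The original slide shows the decoder as a logic diagram; a behavioral
sketch in C (a hypothetical one-hot 3-to-8 decoder, not the slide's
circuit) captures the same idea:

    /* Behavioral model of a 3-to-8 decoder: a 3-bit opcode drives
       exactly one of eight instruction-select lines high. */
    unsigned char decode_3to8(unsigned opcode)
    {
        return (unsigned char)(1u << (opcode & 0x7u));   /* one-hot output */
    }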



What Must an Instruction Specify?

- Which operation to perform
- Where to find the operands (registers, memory, immediate, stack, other)
- Where to store the result (registers, memory, stack, other)
- The location of the next instruction (either implicitly or explicitly)



Interpreting Memory Addresses

Objects may have byte or word addresses: an address refers to the number
of bytes or words counted from the beginning of memory.
- Little endian puts the byte whose address is xx00 at the least
  significant position in the word.
- Big endian puts the byte whose address is xx00 at the most significant
  position in the word.
- Alignment: data must be aligned on a boundary equal to its size.
  Misalignment typically results in an alignment fault (a hardware trap)
  that must be handled by the operating system.
Most machines today use byte addressing, since we often want to
manipulate byte-sized objects such as characters.
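A minimal C sketch (assuming a 32-bit word; an illustration, not part of
the lecture) shows how to test which convention the host uses by
inspecting the byte at the lowest address:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t word = 0x01020304;
        uint8_t *lowest = (uint8_t *)&word;   /* the byte whose address is xx00 */

        if (*lowest == 0x04)
            printf("little endian: least significant byte at lowest address\n");
        else
            printf("big endian: most significant byte at lowest address\n");
        return 0;
    }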



Types of Instructions

- Transfer of control: jumps, calls, ...
- Arithmetic/logical: add, subtract, multiply, divide, sqrt, log, sin,
  cos, ...; also Boolean comparisons
- Load/store: load a value from memory or store a value to memory
- Stack instructions: push/pop
- System: traps, I/O, halt, no-operation, ...
- Floating point arithmetic
- Decimal
- String operations
- Multimedia operations: Intel's MMX and Sun's VIS



Computer Architectures (from Instruction Perspective)

Accumulator (acc):
  1-address     add A        acc = acc + memory[A]
  1+x address   addx A       acc = acc + memory[A + x]

Stack (tos = top of stack):
  0-address     add          tos = tos + next

General purpose register:
  2-address     add A B      EA(A) = EA(A) + EA(B)
  3-address     add A B C    EA(A) = EA(B) + EA(C)

Load/store, computing Mem3 = Mem1 + Mem2:
  0-memory      load R1, Mem1
                load R2, Mem2
                add R1, R2
                store R1, Mem3
  1-memory      load R1, Mem1
                add R1, Mem2
                store R1, Mem3


Addressing Architectures

Motorola 6800 / Intel 8080/8085 (1970s):
  1-address architecture:  ADDA <addr>       (A) = (A) + (addr)
Intel x86 (1980s):
  2-address architecture:  ADD EAX, EBX      (EAX) = (EAX) + (EBX)
MIPS (1990s):
  3-address architecture:  ADD $2, $3, $4    ($2) = ($3) + ($4)

Computing C = B + A in each style:

  Stack      Accumulator   Register            Register
                           (register-memory)   (load-store)
  Push A     Load A        Load R1, A          Load R1, A
  Push B     Add B         Add R1, B           Load R2, B
  Add        Store C       Store C, R1         Add R3, R1, R2
  Pop C                                        Store C, R3



Graphic View

(Figure: operand flow diagrams contrasting an accumulator machine
(acc = acc + mem[C]), a register-memory machine (R1 = R1 + mem[C]), and a
load-store machine (R3 = R1 + R2), each shown against memory.)

Registers are the class that won out.
- The more registers on the CPU the better, except that studies of early
  RISC machines found that 32 registers were more than enough.
- Today that is debatable, given video and gaming workloads.


Stack Architecture

Needs a top-of-stack register, a stack limit register, ...
Pros:
- Good code density (operand addressing is implicit: the top of stack)
- Low hardware requirements
- Easy to write a simple compiler for
Cons:
- The stack becomes the bottleneck
- Little opportunity for parallelism or pipelining
- Data is not always at the top of the stack when needed, so additional
  instructions are required
- Difficult to write an optimizing compiler for (a good PhD topic!)
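To make the 0-address model concrete, here is a minimal C sketch of stack
execution for C = B + A (Push A; Push B; Add; Pop C); the interpreter
names and the 64-entry stack size are my own assumptions:

    #include <stdio.h>

    static int stack[64];
    static int tos = -1;   /* top-of-stack index */

    static void push(int v) { stack[++tos] = v; }
    static int  pop(void)   { return stack[tos--]; }
    static void add(void)   { int b = pop(), a = pop(); push(a + b); }

    int main(void)
    {
        int A = 2, B = 3, C;
        push(A);
        push(B);
        add();          /* tos = tos + next, as on the slide */
        C = pop();
        printf("C = %d\n", C);   /* prints 5 */
        return 0;
    }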



Accumulator Architectures

One or more accumulators (which can masquerade as GPRs).
Pros:
- Very low hardware requirements
- Easy to design and understand
Cons:
- The accumulator becomes the bottleneck
- Little opportunity for parallelism or pipelining
- High memory traffic



Memory-Memory Architectures

Operands are fetched directly from memory into internal registers.
Pros:
- Requires fewer instructions (especially with 3 operands)
- Easy to write compilers for (especially with 3 operands)
Cons:
- Very high memory traffic (especially with 3 operands)
- Variable number of clocks per instruction (especially with 2 operands,
  indexing, and indirect access)
- With two operands, more data movements are required



Register-Memory Architectures

Operands are fetched from registers and from memory.
Pros:
- Some data can be accessed without being loaded first
- Instruction format is easy to encode
- Good code density
Cons:
- Operands are not equivalent (poor orthogonality)
- Variable number of clocks per instruction (with indexing, indirect
  access, and so on)
- May limit the number of registers



Load-Store/Register-Register Architectures

Operands are fetched only from registers, which must be loaded in advance.
Pros:
- Simple, fixed-length instruction encoding
- Instructions take similar numbers of cycles
- Relatively easy to pipeline
Cons:
- Higher instruction count
- Not all instructions need three operands
- Dependent on a good compiler



Addressing Modes

To get to the data, we must have a way of addressing it: the addressing
mode. Data can be in several places:
- in the instruction
- in a register
- in main memory
For main memory, we must generate an absolute address, i.e., the address
that is sent out over the memory bus (from the MAR).



DEC VAX Addressing Modes
Studies by [Clark and Emer] indicate that modes 1-4 account for 93% of
all operands on the VAX.

  Mode                 Form                Example               Action
  Register direct      Ri                  Add R4, R3            R4 <- R4 + R3
  Immediate (literal)  #n                  Add R4, #3            R4 <- R4 + 3
  Displacement         M[Ri + #n]          Add R4, 100(R1)       R4 <- R4 + M[100 + R1]
  Register indirect    M[Ri]               Add R4, (R1)          R4 <- R4 + M[R1]
  Indexed              M[Ri + Rj]          Add R4, (R1 + R2)     R4 <- R4 + M[R1 + R2]
  Direct (absolute)    M[#n]               Add R4, (1000)        R4 <- R4 + M[1000]
  Memory indirect      M[M[Ri]]            Add R4, @(R3)         R4 <- R4 + M[M[R3]]
  Autoincrement        M[Ri++]             Add R4, (R2)+         R4 <- R4 + M[R2]; R2 <- R2 + d
  Autodecrement        M[--Ri]             Add R4, -(R2)         R2 <- R2 - d; R4 <- R4 + M[R2]
  Scaled               M[Ri + Rj*d + #n]   Add R4, 100(R2)[R3]   R4 <- R4 + M[100 + R2 + R3*d]

  (d is the operand size in bytes.)


Addressing Examples - I

- Register direct addressing: the register contains the operand.
- Direct addressing to main memory: the instruction contains the main
  memory address of the operand.
- Indirect addressing to main memory: the instruction contains an address
  in main memory whose contents are the address of the operand.



Addressing Examples - II

- Register indirect addressing: the register contains the address of the
  main memory location that contains the operand.
- Base-displacement addressing: the contents of the register specify a
  base address, which is added to the displacement value in the
  instruction to produce the effective main memory address.
- Relative addressing: the displacement value in the instruction is added
  to the contents of the PC to generate the effective main memory
  address.
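A one-line C sketch of relative addressing (the "+4" assumes the PC has
already advanced past a 4-byte instruction, a MIPS-like convention chosen
here for illustration):

    #include <stdint.h>

    /* PC-relative effective address: displacement added to the updated PC. */
    uint32_t branch_target(uint32_t pc, int32_t displacement)
    {
        return pc + 4 + displacement;   /* +4: address of the next instruction */
    }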



Control Instructions: Issues and Tradeoffs - I

Compare and branch:
  + no extra compare instruction; no state passed between instructions
  -- requires an ALU operation; restricts code scheduling opportunities
Implicitly set condition codes (Z, N, V, C):
  + can be set "for free" as a result of an arithmetic/logical operation
  -- constrains code reordering; extra state to save/restore
Explicitly set condition codes:
  + can be set "for free"; decouples branch/fetch from the pipeline
  -- extra state to save/restore
Condition in a general purpose register:
  + no special state, but uses up a register
  -- branch condition is separate from the branch logic in the pipeline



Control Instructions: Issues and Tradeoffs - II

Some data for MIPS:
- 80% of branches use immediate data, and > 80% of those values are zero
- 50% of branches use == 0 or != 0
The compromise in MIPS:
- branch==0 and branch!=0 instructions
- compare instructions for all other comparisons, such as LT, LTE, GT, GTE
Link (return) address:
- implicit register - many recent architectures use this
  + fast, simple
  -- software must save the register before the next call, interrupts or not
- explicit register
  + may avoid saving the register, though it is a good idea to do so
  -- the register must be specified
- processor stack
  + recursion is direct
  -- complex instructions


Control Instructions: Issues and Tradeoffs - III

Save or restore state: what state?
- Function calls: registers
- System calls: registers, flags, PC, PSW, etc.
Hardware need not save registers:
- The caller can save the registers it has in use
- The callee can save the registers it will use
Hardware register save (IBM STM, VAX CALLS): is it faster?
- Many recent architectures do no register saving
- Or they do implicit register saving with register windows (SPARC)



What comes first? The compiler or the ISA?

Depends on who you ask.
- Early computers were programmable only in machine or assembly language;
  compilers came later, like FORTRAN I, MAD, LISP, COBOL, BASIC, ALGOL,
  and more (Jean Sammet, in History of Programming Languages, listed over
  700 of them!).
- AT&T, in designing its WE chips, closely examined how the C compiler
  generated instructions and designed an ISA to efficiently support that
  compiler.
- Similarly for the Intel iAPX 432 and Ada 83.


Compiler Structure

(Figure: compiler structure diagram, not reproduced here.)



Compilation Steps

- Parsing -> intermediate representation
- Jump optimization
- Loop optimizations
- Register allocation
- Code generation -> assembly code
- Common subexpression elimination
- Procedure inlining
- Constant propagation
- Strength reduction
- Pipeline scheduling



What do Compiler Writers Want?

Characteristics of the ISA:
- regularity
- orthogonality
- composability
Compilers perform a giant case analysis:
- too many choices make it hard
Orthogonal instruction sets:
- operation, addressing mode, and data type vary independently
Provide one solution or all possible solutions:
- e.g., 2 branch conditions (eq, lt), or all six (eq, ne, lt, gt, le, ge),
  but not 3 or 4
Let the compiler put instructions together to build more complex
sequences.
Designing ISAs to Improve Compilation

- Provide enough general purpose registers to ease register allocation
  (more than 16).
- Provide regular instruction sets by keeping the operations, data types,
  and addressing modes orthogonal.
- Provide primitive constructs rather than trying to map to a high-level
  language.
- Simplify trade-offs among alternatives.
- Allow compilers to help make the common case fast.



ISA Metrics

- Orthogonality: no special registers, few special cases, all operand
  modes available with any data type or instruction type
- Completeness: support for a wide range of operations and target
  applications
- Regularity: no overloading of the meanings of instruction fields
- Streamlined design: resource needs are easily determined; tradeoffs are
  simplified
- Ease of compilation (and programming?), ease of implementation,
  scalability



Summary of ISA Design Space

Five primary dimensions:
- Number of explicit operands (0, 1, 2, 3)
- Operand storage: where besides memory?
- Effective address: how is the memory location specified?
- Type and size of operands (byte, int, float, vector, ...): how are they
  specified?
- Operations (add, sub, mul, ...): how are they specified?

Other aspects:
- Successor instruction: how is it specified?
- Conditions: how are they determined?
- Encodings: fixed or variable length? How wide?
- Parallelism
Looking at ISAs

1. Amdahl, Blaauw, and Brooks, "Architecture of the IBM System/360,"
   IBM Journal of Research and Development, 8(2):87-101, April 1964.
2. Lonergan and King, "Design of the B 5000 System," Datamation, vol. 7,
   no. 5, pp. 28-32, May 1961.
3. Patterson and Ditzel, "The Case for the Reduced Instruction Set
   Computer," Computer Architecture News, October 1980.
4. Clark and Strecker, "Comments on 'The Case for the Reduced Instruction
   Set Computer'," Computer Architecture News, October 1980.



DG Nova ISA

(Figures: DG Nova instruction set reference, not reproduced here.)
