ISA Architecture
By
AJAL.A.J - AP/ECE
Instruction Set Architecture
• Instruction set architecture is the structure of a
computer that a machine language programmer must
understand to write a correct (timing independent)
program for that machine.
[Figure: several ops packed into one instruction word together with bundling info]
Instruction Set Architecture
• Computer Architecture =
Instruction Set Architecture + Machine Organization
• ISA families:
– RISC (MIPS, SPARC, HP-PA, IBM RS6000, PowerPC, ... 1987)
– VLIW / "EPIC" (IA-64, ... 1999)
Instruction Set Architecture
• Computer Architecture = Hardware + ISA
– Interface between all the software that runs on the
machine and the hardware that executes it
Instruction Set, or Instruction Set Architecture (ISA)
• An instruction set, or instruction set
architecture (ISA), is the part of the computer
architecture related to programming, including the
native data types, instructions, registers, addressing
modes, memory architecture, interrupt and exception
handling, and external I/O. An ISA includes a
specification of the set of opcodes (machine language)
and the native commands implemented by a particular processor.
Microarchitecture
• Instruction set architecture is distinguished from
the microarchitecture, which is the set of processor
design techniques used to implement the
instruction set. Computers with different
microarchitectures can share a common instruction set.
• For example, the Intel Pentium and
the AMD Athlon implement nearly identical
versions of the x86 instruction set, but have
radically different internal designs.
NUAL vs. UAL
• Unit Assumed Latency (UAL)
– Semantics of the program are that each
instruction is completed before the next one is
issued
– This is the conventional sequential model
• Non-Unit Assumed Latency (NUAL)
– At least one instruction has an assumed latency L > 1:
the program is correct as long as the next L-1 instructions
do not expect to see its result
– Example: if a load has an assumed latency of 2, the
instruction immediately after it must not read the loaded
value (a toy sketch follows below)
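A hypothetical C sketch (not from the slides; register names and values are invented) of the difference: under UAL every result is visible to the very next instruction, while under NUAL a load with an assumed latency of 2 delivers its value one instruction later, so a consumer scheduled too early sees the old register contents.

```c
#include <stdio.h>

/* Toy model of NUAL semantics: a load with assumed latency 2 writes its
   result just before the *second* following instruction issues.
   Purely illustrative; not a real machine model. */
int main(void) {
    int r[4] = {0};          /* tiny register file */
    int pending_reg = -1, pending_val = 0, countdown = 0;

    /* Cycle 1: load r1 <- 42 (assumed latency 2 under NUAL) */
    pending_reg = 1; pending_val = 42; countdown = 2;

    /* Cycle 2: add r2 <- r1 + 1 issues one instruction after the load */
    if (--countdown == 0) r[pending_reg] = pending_val;   /* not yet: latency not elapsed */
    r[2] = r[1] + 1;
    printf("NUAL: consumer right after the load sees r1 = %d, so r2 = %d\n", r[1], r[2]);

    /* Cycle 3: the load's latency has elapsed, r1 now holds 42 */
    if (--countdown == 0) r[pending_reg] = pending_val;
    r[3] = r[1] + 1;
    printf("NUAL: consumer two slots later sees r1 = %d, so r3 = %d\n", r[1], r[3]);

    /* Under UAL the load would have completed before the next instruction,
       so both consumers would have seen r1 = 42. */
    return 0;
}
```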
MIPS R2000 Organization
[Block diagram: the CPU with registers $0-$31, an arithmetic unit, a
multiply/divide unit with Lo and Hi registers, and the exception
registers BadVAddr, Cause, Status, and EPC; Coprocessor 1 (FPU) with
registers $0-$31 and its own arithmetic unit]
Definitions
Performance_x / Performance_y = ExecutionTime_y / ExecutionTime_x = n
[Figure: layers above the instruction set — Application, Programming Language, Compiler]
Performance = 1 / ExecutionTime

Performance = 1 / (CPI × CycleTime)
Performance = 1 / (Cycles/Instruction × Seconds/Cycle) = Instructions / Seconds
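To make these formulas concrete, here is a small C sketch (the machine parameters are invented values, not from the slides) that computes performance from CPI and cycle time and compares two machines:

```c
#include <stdio.h>

/* Illustrative numbers only (not from the slides). */
int main(void) {
    double cpi_x = 2.0,  cycle_x = 1e-9;   /* machine X: CPI = 2, 1 ns cycle (1 GHz) */
    double cpi_y = 1.25, cycle_y = 2e-9;   /* machine Y: CPI = 1.25, 2 ns cycle (500 MHz) */

    /* Performance = 1 / (CPI x CycleTime) = instructions per second */
    double perf_x = 1.0 / (cpi_x * cycle_x);
    double perf_y = 1.0 / (cpi_y * cycle_y);

    printf("Machine X: %.0f instructions/second\n", perf_x);
    printf("Machine Y: %.0f instructions/second\n", perf_y);

    /* Performance_x / Performance_y = ExecutionTime_y / ExecutionTime_x = n */
    printf("X is %.2f times faster than Y\n", perf_x / perf_y);
    return 0;
}
```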
[Figure: the instruction set as the interface between software and hardware]
Dynamic Metrics
• How many instructions are executed?
• How many bytes does the processor fetch to execute the
program?
• How many clocks are required per instruction?
• How "lean" a clock is practical?
ExecutionTime = 1 / Performance = Instructions × Cycles/Instruction × Seconds/Cycle
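As a worked example of these dynamic metrics (with invented program and machine numbers), a short C sketch evaluating ExecutionTime = Instructions × Cycles/Instruction × Seconds/Cycle:

```c
#include <stdio.h>

/* Invented program/machine parameters, for illustration only. */
int main(void) {
    double instructions = 1e9;    /* dynamic instruction count */
    double cpi          = 1.5;    /* average cycles per instruction */
    double cycle_time   = 0.5e-9; /* seconds per cycle (2 GHz clock) */

    /* ExecutionTime = Instructions x Cycles/Instruction x Seconds/Cycle */
    double exec_time = instructions * cpi * cycle_time;
    printf("Execution time: %.3f s\n", exec_time);   /* 0.750 s */

    /* Halving CPI (e.g. through a better pipeline) halves execution time. */
    printf("With CPI = %.2f: %.3f s\n",
           cpi / 2, instructions * (cpi / 2) * cycle_time);
    return 0;
}
```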
[Timing diagram (no pipelining): Instr 1 and Instr 2 execute one after the other, occupying clock cycles 1-10]
With Pipelining
• The processor works on all pipeline stages simultaneously,
each on a different instruction.
• While the processor is decoding one instruction, it can
fetch the next instruction at the same time.
[Timing diagram (with pipelining): Instr 1-5 overlap in the pipeline and all five complete within clock cycles 1-9]
Pipeline (cont.)
• The length of a pipeline step (the clock period) is set
by the longest step
• Thus in RISC, all instructions were made the same
length, so the steps take similar time
• Each stage takes 1 clock cycle
• In theory, one instruction should finish every clock
cycle (sketched below)
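A small illustrative C sketch of the ideal-pipeline arithmetic implied above (stage and instruction counts are made-up values): with k stages and n instructions, an ideal pipeline needs k + (n − 1) cycles instead of k × n.

```c
#include <stdio.h>

/* Ideal pipeline timing: no stalls, every stage takes one cycle.
   Stage and instruction counts are illustrative, not from the slides. */
int main(void) {
    int stages = 5;           /* e.g. fetch, decode, execute, memory, write-back */
    int instructions = 100;

    int no_pipeline   = stages * instructions;        /* one instruction at a time */
    int with_pipeline = stages + (instructions - 1);  /* one completes per cycle once full */

    printf("Without pipelining: %d cycles\n", no_pipeline);   /* 500 */
    printf("With pipelining:    %d cycles\n", with_pipeline); /* 104 */
    printf("Speedup: %.2fx\n", (double)no_pipeline / with_pipeline);
    return 0;
}
```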
Pipeline Problem
VLIW Goals:
Flexible enough
Match the technology well
VLIW philosophy:
– “dumb” hardware
– “intelligent” compiler
VLIW - History
• Floating Point Systems Array Processor
– very successful in 70’s
– all latencies fixed; fast memory
• Multiflow
– Josh Fisher (now at HP)
– 1980’s Mini-Supercomputer
• Cydrome
– Bob Rau (now at HP)
– 1980’s Mini-Supercomputer
• Tera
– Burton Smith
– 1990’s Supercomputer
– Multithreading
• Intel IA-64 (Intel & HP)
VLIW Processors
o Superscalar architectures
Dispatching individual instructions to be executed completely
independently in different parts of the processor
o Out-of-order execution
Executing instructions in an order different from the original program order
Parallel processing
Processing instructions in parallel requires three
major tasks: checking dependencies between instructions,
assigning instructions to functional units, and deciding
when each instruction starts. These tasks can be handled
in hardware or by the compiler (a compiler-side scheduling
sketch follows below):
o Hardware approach:
Works upon dynamic parallelism, where
scheduling of instructions is done at run time
o Software approach:
Works on static parallelism, where
scheduling of instructions is done by the compiler
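As a rough illustration of the software (compiler) approach, here is a hedged C sketch, not taken from the slides, that greedily packs independent operations into wide issue slots at compile time; the operation list, dependence encoding, and 3-wide machine are invented for the example:

```c
#include <stdio.h>

#define NOPS   6
#define WIDTH  3   /* issue slots per VLIW word (assumed machine width) */

/* Each operation depends on at most two earlier operations (-1 = none).
   The dependence graph below is invented purely for illustration. */
typedef struct { const char *name; int dep0, dep1; } Op;

static Op ops[NOPS] = {
    {"load  r1",       -1, -1},
    {"load  r2",       -1, -1},
    {"add   r3,r1,r2",  0,  1},
    {"load  r4",       -1, -1},
    {"mul   r5,r3,r4",  2,  3},
    {"store r5",        4, -1},
};

int main(void) {
    int scheduled_word[NOPS];     /* which VLIW word each op was placed in */
    int placed[NOPS] = {0};
    int word = 0, done = 0;

    /* Greedy list scheduling: in each word, issue up to WIDTH ops whose
       dependences were all satisfied in *earlier* words. */
    while (done < NOPS) {
        int slots = 0;
        printf("word %d:", word);
        for (int i = 0; i < NOPS && slots < WIDTH; i++) {
            if (placed[i]) continue;
            int ready =
                (ops[i].dep0 < 0 ||
                 (placed[ops[i].dep0] && scheduled_word[ops[i].dep0] < word)) &&
                (ops[i].dep1 < 0 ||
                 (placed[ops[i].dep1] && scheduled_word[ops[i].dep1] < word));
            if (ready) {
                placed[i] = 1;
                scheduled_word[i] = word;
                printf("  [%s]", ops[i].name);
                slots++;
                done++;
            }
        }
        printf("\n");
        word++;
    }
    return 0;
}
```

The "dumb hardware, intelligent compiler" philosophy is visible here: all of the dependence checking and slot assignment happens before run time, so the hardware only has to issue whatever was packed into each word.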
Block Diagram
[Block diagram: the VLIW compiler produces wide instructions for an
I-fetch & issue unit, which drives several FUs, two memory ports, and a
multi-ported register file]
Working
Bundle of instructions
• 128-bit bundles
• 3 41-bit instructions per bundle
• 2 bundles can be issued at once
• if only one bundle issues, another is fetched to replace it
• less delay in bundle issue (see the field-layout sketch below)
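The bullets describe an IA-64-style bundle: 128 bits holding three 41-bit instruction slots, which leaves 5 bits for a template field (the template interpretation and the exact field order are assumptions based on IA-64, not spelled out above). A small C sketch of extracting the fields:

```c
#include <stdint.h>
#include <stdio.h>

/* An IA-64-style bundle: 128 bits = 5-bit template + 3 x 41-bit slots.
   Field order (template in the low bits) is an assumption for illustration. */
typedef struct { uint64_t lo, hi; } Bundle;   /* low and high 64 bits */

static uint64_t bundle_bits(Bundle b, int pos, int width) {
    /* Extract 'width' bits starting at bit 'pos' of the 128-bit value. */
    uint64_t out = 0;
    for (int i = 0; i < width; i++) {
        int p = pos + i;
        uint64_t bit = (p < 64) ? (b.lo >> p) & 1u : (b.hi >> (p - 64)) & 1u;
        out |= bit << i;
    }
    return out;
}

int main(void) {
    Bundle b = { 0x0123456789ABCDEFull, 0xFEDCBA9876543210ull }; /* dummy contents */

    uint64_t tmpl  = bundle_bits(b,  0,  5);   /* bits  0..4   */
    uint64_t slot0 = bundle_bits(b,  5, 41);   /* bits  5..45  */
    uint64_t slot1 = bundle_bits(b, 46, 41);   /* bits 46..86  */
    uint64_t slot2 = bundle_bits(b, 87, 41);   /* bits 87..127 */

    printf("template=%llu slot0=%#llx slot1=%#llx slot2=%#llx\n",
           (unsigned long long)tmpl, (unsigned long long)slot0,
           (unsigned long long)slot1, (unsigned long long)slot2);
    return 0;
}
```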
Data path : A simple VLIW Architecture
[Figure: several FUs sharing one central register file]
Scalability?
Access time, area, and power consumption increase sharply with the
number of register ports (a rough cost model is sketched below)
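As a back-of-the-envelope illustration of why port count hurts (the quadratic area growth and the 3-ports-per-FU figure are assumptions commonly used in the literature, not numbers from the slides):

```c
#include <stdio.h>

/* Rough model (assumption): each port adds a wordline and a bitline per cell,
   so register-file cell area grows roughly with the square of the port count.
   A machine with N FUs typically needs about 2 read + 1 write port per FU. */
int main(void) {
    for (int fus = 1; fus <= 8; fus *= 2) {
        int ports = 3 * fus;                                  /* 2 reads + 1 write per FU */
        double relative_area = (double)ports * ports / 9.0;   /* normalized to the 1-FU case */
        printf("%d FUs -> %2d ports -> ~%.0fx register file area\n",
               fus, ports, relative_area);
    }
    return 0;
}
```

The steep growth is the usual motivation for the clustered organization on the next slide, which splits the FUs across several smaller register files.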
Data path : Clustered VLIW Architecture
(distributed register file)
[Figure: the FUs are split into clusters, each with its own register file,
joined by an interconnection network]
Coarse-grain FUs with a VLIW core
[Figure: a VLIW core whose multiplexer network connects register pairs
(Reg1/Reg2), the IR and microcode, and a coarse-grain FU]
Application-Specific FUs
[Figure: a functional unit is characterized by its number of inputs,
its number of outputs, and its functionality]
Superscalar Processors
• Superscalar
– Operations are specified sequentially (in program order)
– Hardware figures out resource assignment and time of execution
Superscalars vs. VLIW
Comparison: CISC, RISC, VLIW