Stud CSA Processors Mod2 Part1
Module - 2 Part - 1
Processors
• Advanced Processor Technology
– Design Space of Processors
– Instruction-Set Architectures
– CISC Scalar Processors
– RISC Scalar Processors
• Superscalar and Vector Processors
– Superscalar Processors
– VLIW Architecture
– Vector and Symbolic Processors
Advanced Processor Technology
• Major microprocessor families include:
– CISC Computers
– RISC Computers
– Superscalar Processors
– VLIW Processors
– Superpipelined Processors
– Vector Supercomputers
– Symbolic Processors
• Scalar and vector processors are used for numerical computations.
• Symbolic processors have been developed for AI applications.
Design space of processors
• The two broad categories of processors are CISC and RISC.
❑ CISC (Complex Instruction Set Computing)
• Conventional processors like Intel i486, M68040, VAX/8600, IBM 390, etc
fall into this family.
• Typical clock rate ~ 33 – 50 MHz
• With microprogrammed control, typical CPI ~ 1 – 20.
• CISC processors are at the upper part of the design space.
❑ RISC (Reduced Instruction set computing)
• Includes today’s RISC processors like Intel i860, SPARC, MIPS R3000, IBM
RS/6000, Alpha, ARM, etc.
• Have faster clock rates ~ 20 – 120 MHz
• With hardwired control, typical CPI ~ 1 – 2.
❑ Super Scalar processors
• Special subclass of RISC processor
• Allow Multiple instructions to be executed simultaneously during each
cycle.
• Effective CPI lower than that of a scalar RISC
• Clock rate same as that of a scalar RISC.
❑ Very long instruction word (VLIW) architecture
•Uses even more functional units than a superscalar processor.
•CPI can be further lowered.
•Due to very long instructions (256 to 1024 bits per instruction), its clock rate is
slow
•VLIW processors have been mostly implemented with microprogrammed control.
•Eg: Intel i860 RISC processor
❑ Vector supercomputers
• Use multiple functional units for concurrent scalar and vector operations.
• The effective CPI of a processor used in a supercomputer should be very low
•The cost increases appreciably if a processor design is restricted to the lower
right corner
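The design space above trades clock rate against CPI. A minimal sketch of that trade-off, assuming the usual relation MIPS rate = clock rate (MHz) / CPI; the three design points are illustrative values picked from the ranges quoted above, not measurements of any real processor:

```python
def mips_rate(clock_mhz, cpi):
    """Millions of instructions per second for a given clock (MHz) and average CPI."""
    return clock_mhz / cpi

# Illustrative design points (assumed values inside the quoted ranges).
design_points = {
    "CISC, 33 MHz, CPI 4": (33.0, 4.0),
    "scalar RISC, 50 MHz, CPI 1.25": (50.0, 1.25),
    "superscalar RISC, 50 MHz, CPI 0.5": (50.0, 0.5),
}

rates = {name: mips_rate(f, cpi) for name, (f, cpi) in design_points.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} MIPS")
```

Moving toward a higher clock rate and a lower CPI raises the MIPS rate, which is why that corner of the design space is the expensive one.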
Instruction Pipeline
• The execution cycle of a typical instruction involves four
phases:
fetch, decode, execute & write-back
• Often executed by an instruction pipeline.
• An instruction processor can be modelled by a pipeline
structure.
• The pipeline, like an industrial assembly line, receives
successive instructions from its input end and executes them
in a streamlined, overlapped fashion as they flow through.
• A pipeline cycle is intuitively defined as the time required for
each phase to complete its operation, assuming equal delay
in all phases (pipeline stages).
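The overlapped flow described above can be sketched as a small space-time schedule. This assumes the four phases named earlier, equal stage delays, and one instruction entering per cycle; the function names are hypothetical:

```python
STAGES = ["fetch", "decode", "execute", "write-back"]

def pipeline_schedule(num_instructions, stages=STAGES):
    """For each instruction, map every stage to the cycle in which it occupies that stage."""
    schedule = []
    for i in range(num_instructions):
        # Instruction i enters the first stage in cycle i + 1 and advances one stage per cycle.
        schedule.append({stage: i + s + 1 for s, stage in enumerate(stages)})
    return schedule

def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles to finish all instructions: fill the pipeline once, then one result per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1

for i, row in enumerate(pipeline_schedule(3)):
    print(f"I{i + 1}: {row}")
print("total cycles for 3 instructions:", total_cycles(3))
```

With many instructions the total approaches one cycle per instruction, which is the benefit of overlapping the phases.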
Definitions (instruction pipeline)
❖ Basic definitions associated with instruction pipeline operations:
• Instruction pipeline cycle— the clock period of the instruction pipeline.
• Instruction issue latency— the time (in cycles) required between the issuing of
two adjacent instructions.
• Instruction issue rate— the number of instructions issued per cycle, also called
the degree of a superscalar processor.
• Simple operation latency — Simple operations make up the vast majority of
instructions executed by the machine, such as integer adds, loads, stores,
branches, moves, etc. In contrast, complex operations are those requiring
an order-of-magnitude longer latency, such as divides, cache misses, etc. These
latencies are measured in number of cycles.
• Resource conflicts— This refers to the situation where two or more instructions
demand use of the same functional unit at the same time.
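As an illustration of the resource-conflict definition, the sketch below scans a list of requests and reports any functional unit demanded by more than one instruction in the same cycle. The request format and the unit names are assumptions made for this example:

```python
from collections import defaultdict

def find_resource_conflicts(requests):
    """requests: iterable of (instruction, functional_unit, cycle) tuples.

    Returns {(unit, cycle): [instructions]} for every unit requested by two or
    more instructions in the same cycle.
    """
    usage = defaultdict(list)
    for instr, unit, cycle in requests:
        usage[(unit, cycle)].append(instr)
    return {key: instrs for key, instrs in usage.items() if len(instrs) > 1}

# Hypothetical trace: I1 and I2 both need the ALU in cycle 3.
trace = [("I1", "ALU", 3), ("I2", "ALU", 3), ("I3", "FPU", 3), ("I4", "ALU", 4)]
print(find_resource_conflicts(trace))
```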
Instruction Pipeline
❑ A base scalar processor is defined as a machine with one instruction issued per cycle, a
one-cycle latency for a simple operation, and a one-cycle latency between instruction
issues. The instruction pipeline can be fully utilized if successive instructions can enter it
continuously at the rate of one per cycle. The effective CPI rating is 1 for the ideal pipeline.
❑ If the instruction issue latency is two cycles per instruction, the pipeline is underutilized.
The effective CPI rating is 2.
❑ Another underpipelined situation is one in which the pipeline cycle time is doubled by combining
pipeline stages. In this case, the fetch and decode phases are combined into one pipeline
stage, and execute and write back are combined into another stage. This also results in
poor pipeline utilization: the clock rate of the pipeline is lowered by one half.
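The three situations above can be compared by counting execution time in units of the base pipeline cycle. This is a sketch under simple assumptions (a linear pipeline, no stalls other than the stated issue latency); the function and parameter names are hypothetical:

```python
def execution_time(n, stages, issue_latency, cycle_time):
    """Time, in base cycles, to run n instructions through a linear pipeline.

    stages        -- number of pipeline stages
    issue_latency -- pipeline cycles between the issue of adjacent instructions
    cycle_time    -- length of one pipeline cycle, in base cycles
    """
    if n == 0:
        return 0
    cycles = stages + (n - 1) * issue_latency
    return cycles * cycle_time

n = 1000
base = execution_time(n, stages=4, issue_latency=1, cycle_time=1)          # ideal base scalar
two_cycle_issue = execution_time(n, stages=4, issue_latency=2, cycle_time=1)
merged_stages = execution_time(n, stages=2, issue_latency=1, cycle_time=2)  # stages combined in pairs

print(base, two_cycle_issue, merged_stages)
```

For long instruction streams both underpipelined cases take roughly twice as long as the base scalar pipeline.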
Data path architecture and control unit of a scalar processor
• This part covers the data path architecture and control unit of a typical, simple scalar
processor without an instruction pipeline.
• Main memory, I/O controllers, etc. are connected to the external bus.
• The control unit generates control signals required for the fetch, decode, ALU
operation, memory access and write result phases of instruction execution.
• The control unit itself may use microcoded logic (CISC) or hardwired logic
(RISC).
Processors & Coprocessors
• The central processor of a computer is called the CPU
– Scalar processor
– Multiple functional units
– Floating point accelerator
• Floating point unit can be coprocessor
– Attached with CPU
– Executes instructions dispatched by CPU
– Can’t be used alone, can’t handle I/O operations
Architectural Models of a basic Scalar processor…
Instruction Set Architectures
• The instruction set, also called the instruction set architecture (ISA), is the part of a
computer that pertains to programming, which is basically its machine
language.
• The instruction set defines the primitive commands or machine instructions available to
the processor.
• Two approaches to ISA design: CISC and RISC
⮚ Characteristics of instruction set:-
• Instruction formats
• Data formats/types
• Addressing modes
• General purpose registers
• Opcode specifications
• Flow control mechanisms
• Memory architecture
• Interrupt and exception handling
• External I/O
⮚ Examples of instruction set
▪ ADD - Add two numbers together.
▪ COMPARE - Compare numbers.
▪ IN - Input information from a device, e.g., keyboard.
▪ JUMP - Jump to designated RAM address.
▪ LOAD - Load information from RAM to the CPU.
▪ OUT - Output information to device, e.g., monitor.
▪ STORE - Store information to RAM.
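To make the example instructions concrete, here is a toy interpreter for a hypothetical one-accumulator machine. The operand conventions (every operand is a RAM address, ADD and COMPARE work on the accumulator) are assumptions for illustration and do not describe any real ISA:

```python
def run(program, ram, inputs=()):
    """Interpret (opcode, operand...) tuples; returns (outputs, equal_flag)."""
    acc = 0                 # accumulator register inside the CPU
    equal_flag = False      # set by COMPARE
    pc = 0                  # program counter
    pending = list(inputs)  # values waiting on the input device
    outputs = []            # values sent to the output device
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "LOAD":        # RAM -> CPU
            acc = ram[args[0]]
        elif op == "STORE":     # CPU -> RAM
            ram[args[0]] = acc
        elif op == "ADD":       # acc = acc + RAM[address]
            acc += ram[args[0]]
        elif op == "COMPARE":   # compare acc with RAM[address]
            equal_flag = (acc == ram[args[0]])
        elif op == "JUMP":      # continue at the designated program address
            pc = args[0]
        elif op == "IN":        # input device -> CPU
            acc = pending.pop(0)
        elif op == "OUT":       # CPU -> output device
            outputs.append(acc)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return outputs, equal_flag

program = [
    ("IN",), ("STORE", 0),   # first input -> RAM[0]
    ("IN",), ("ADD", 0),     # second input + RAM[0]
    ("JUMP", 6),             # skip the next instruction
    ("ADD", 0),              # never executed
    ("COMPARE", 0),
    ("OUT",),
]
ram = [0, 0]
outputs, equal_flag = run(program, ram, inputs=[2, 3])
print(outputs, equal_flag)
```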
Complex Instruction Set Computing (CISC)
• HLL statements directly implemented in hardware. Add more and more functions into
the hardware
• Instruction set very large & complex
• Characterized by microprogrammed control with Control ROM
• Typically contains 120 - 350 instructions
• Variable instruction format (16 - 64 bit)
• A few (8 - 24) general purpose registers
• Clock rate (33 - 50 MHz), CPI (2 - 15)
• Lot of memory based instructions
• Unified cache design
• More than a dozen addressing modes
• Improve execution efficiency
Reduced Instruction Set Computing (RISC)
• Only 25% of a large instruction set is used frequently, 95% of the time. All the rarely
used instructions are moved to software
• Reduced instruction set
• Characterized by hardwired control without Control ROM
• Typically contains less than 100 instructions
• Fixed instruction format (32 bit)
• Lot of general purpose registers (32 - 192)
• Clock rate (50 - 150 MHz), CPI (< 1.5)
• Mostly register based instructions
• Split data and instruction cache design
• Only 3 – 5 addressing modes
• Memory access only by load/store instructions
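The contrast between memory based instructions and load/store access can be sketched by expanding one memory-to-memory add into register based steps. The instruction tuples and register names (r1, r2, r3) are hypothetical and chosen only for this illustration:

```python
def lower_memory_add(dest, src1, src2):
    """Expand a hypothetical memory-to-memory 'ADD dest, src1, src2' into load/store steps."""
    return [
        ("LOAD", "r1", src1),       # memory -> register
        ("LOAD", "r2", src2),       # memory -> register
        ("ADD", "r3", "r1", "r2"),  # register-to-register arithmetic
        ("STORE", "r3", dest),      # register -> memory
    ]

def memory_ops(sequence):
    """Opcodes in the sequence that touch memory."""
    return [instr[0] for instr in sequence if instr[0] in ("LOAD", "STORE")]

risc_sequence = lower_memory_add("C", "A", "B")
for instr in risc_sequence:
    print(instr)
```

One complex instruction becomes four simple ones, and only the LOAD and STORE steps access memory; the arithmetic itself stays in registers.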
CISC vs RISC Architectures
CISC Scalar Processor
•Large number of instructions
•High CPI
• Generic RISC processors are called scalar RISC because they are designed to issue
one instruction per cycle
• RISC processors push some of the less frequently used operations into software
• RISC processors depend heavily on a good compiler because complex HLL
instructions are to be converted into primitive low level instructions, which are few
in number
• RISC processors have a higher clock rate and lower CPI
RISC Scalar Processor
Advantages:
•Speed: Since a simplified instruction set allows for a pipelined, superscalar
design, RISC processors often achieve 2 to 4 times the performance of CISC
processors using comparable semiconductor technology and the same clock
rates.
•Simpler Hardware: Because the instruction set of a RISC processor is so simple,
it uses up much less chip space; extra functions such as memory management
units or floating point arithmetic units, can also be placed on the same chip.
Smaller chips allow a semiconductor manufacturer to place more parts on a
single silicon wafer, which can lower the per-chip cost dramatically.
•Shorter Design Cycle: Since RISC processors are simpler than corresponding
CISC processors, they can be designed more quickly, and can take advantage of
other technological developments sooner than corresponding CISC designs,
leading to greater leaps in performance between generations.
General characteristics
•Instruction set consists of fewer than 100 instructions
•Low CPI
RISC - Example 1
❖ SPARC runs each procedure with a set of thirty two 32-bit registers
• Eight of these registers are global registers shared by all procedures
• Remaining twenty four registers are window registers associated with only one
procedure
• Concept of using overlapped registers is the most important feature
introduced
• Each register window is divided into three sections – Ins, Locals and Outs
• Locals are addressable by each procedure and Ins & Outs are shared among
procedures
• Input registers : hold the arguments passed to a function.
• Local registers : store any local data.
• Output registers : when calling a function, the caller places its arguments
in these registers.
RISC - Example 1 (Window Registers)
RISC- Register Window
• At any time, an instruction can access 8 global registers and a 24-register window.
• A register window comprises a 16-register set- divided into 8 in and 8 local registers-
together with the 8 in registers of an adjacent register set, addressable from the
current window as its out registers.
• When a procedure is called, the register window shifts by sixteen registers, hiding
the old input registers and old local registers and making the old output registers
the new input registers.
• The current (active) window into the r registers is given by the current window
pointer (CWP) register. It is decremented on a procedure call and incremented on return.
• Window Invalid Mask (WIM): the bit for the oldest window is set to 1. If that window is
accessed, a trap occurs, its contents are saved onto the stack, the WIM is rotated by 1 bit,
and the next lowest window becomes the oldest.
• Trap base register – pointer to trap handler
• A special register (Y) is used to create a 64-bit product in multiply step instructions
• Overlapping windows save time in inter procedure communication, faster context
switching.
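The window overlap can be modelled in a few lines. This is a simplified sketch, not the SPARC specification: it assumes 8 windows, uses the register numbering r0-r7 globals, r8-r15 outs, r16-r23 locals, r24-r31 ins, and leaves out the WIM, overflow traps and the hardwired-zero behaviour of r0. The class and method names are hypothetical:

```python
class RegisterWindows:
    """Toy model of overlapping register windows (no WIM, no overflow traps)."""

    def __init__(self, num_windows=8):
        self.num_windows = num_windows
        self.globals = [0] * 8                    # r0-r7, shared by all procedures
        self.windowed = [0] * (16 * num_windows)  # 16 new registers per window
        self.cwp = 0                              # current window pointer

    def _index(self, r):
        base = self.cwp * 16
        if 8 <= r < 16:       # outs
            offset = r - 8
        elif 16 <= r < 24:    # locals
            offset = 8 + (r - 16)
        else:                 # ins: the same cells as the outs of window cwp + 1
            offset = 16 + (r - 24)
        return (base + offset) % len(self.windowed)

    def read(self, r):
        if not 0 <= r < 32:
            raise IndexError("register number must be 0..31")
        return self.globals[r] if r < 8 else self.windowed[self._index(r)]

    def write(self, r, value):
        if not 0 <= r < 32:
            raise IndexError("register number must be 0..31")
        if r < 8:
            self.globals[r] = value
        else:
            self.windowed[self._index(r)] = value

    def save(self):      # procedure call: CWP is decremented
        self.cwp = (self.cwp - 1) % self.num_windows

    def restore(self):   # procedure return: CWP is incremented
        self.cwp = (self.cwp + 1) % self.num_windows

regs = RegisterWindows()
regs.write(8, 42)     # caller puts an argument in its first out register
regs.save()           # call: the window shifts by sixteen registers
print(regs.read(24))  # callee sees the argument in its first in register
```

No register contents are copied on the call; only the pointer moves, which is where the saving in inter-procedure communication comes from.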
RISC-The Floating-point Unit (FPU)
• A typical instruction
format
Pipelining in VLIW Architecture
❑ Memory to memory VP
• Memory based instructions
• Longer instructions
• Instructions include memory address
Vector Instructions
• Scalar pipeline
• Each “Execute-Stage”
operates upon a scalar
operand
• Vector pipeline
• Each “Execute-Stage”
operates upon a vector
operand
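The difference between the two pipelines can be illustrated by counting issued instructions. In this sketch a scalar add is issued once per element pair, while a single vector add streams the whole operand through the execute stage; the functions are hypothetical and model only the instruction count, not timing:

```python
def scalar_add(a, b):
    """Element-wise add using one scalar instruction per element pair."""
    result = []
    issued = 0
    for x, y in zip(a, b):
        result.append(x + y)  # each execute stage handles one scalar operand pair
        issued += 1
    return result, issued

def vector_add(a, b):
    """Element-wise add using a single vector instruction over whole vectors."""
    result = [x + y for x, y in zip(a, b)]  # one execute stage streams the full vector
    return result, 1

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
print(scalar_add(a, b))
print(vector_add(a, b))
```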
Symbolic Processors
❑ Symbolic processors
- Prolog Processors, Lisp Processors or symbolic manipulators.
- Deal with logic programs, symbolic lists, objects, scripts, production systems,
semantic networks, frames and artificial neural networks.
❑ Application areas:
- pattern recognition, expert systems, artificial intelligence, cognitive science,
machine learning, text retrieval, theorem proving, knowledge engineering, etc.
❑ Symbolic processors differ from numeric processors in terms of:-
– Data and knowledge representations
– Primitive operations
– Algorithmic behavior
– Memory
– I/O communication
– Special architectural features
Characteristics of Symbolic Processors
• Lisp program is a set of functions in which data are passed from
function to function.
• The concurrent execution of these functions brings parallelism.
• The applicative and recursive nature of Lisp requires an environment
that efficiently supports stack computations and function calling.
• Use of linked lists as the basic data structure makes it possible to implement an automatic
garbage collection mechanism.
• Primitive operations demand a special instruction set with compare,
matching, logic and symbolic manipulation operations.
• Floating point operations are not often used.
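The points on linked lists, recursion and garbage collection can be sketched with cons cells and a simple mark-and-sweep pass. Mark-and-sweep is chosen here only for illustration; the text does not say which collection scheme a given Lisp machine uses, and all names below are hypothetical:

```python
class Cons:
    """A Lisp-style list cell with a value (car) and a link to the next cell (cdr)."""
    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr
        self.marked = False

def make_list(values, heap):
    """Build a linked list of cons cells and record every cell in the heap."""
    head = None
    for v in reversed(values):
        head = Cons(v, head)
        heap.append(head)
    return head

def length(cell):
    """Recursive list length, in the applicative style described above."""
    return 0 if cell is None else 1 + length(cell.cdr)

def mark(cell):
    """Mark every cell reachable from the given cell."""
    while cell is not None and not cell.marked:
        cell.marked = True
        if isinstance(cell.car, Cons):
            mark(cell.car)   # nested list stored in the car field
        cell = cell.cdr

def collect(heap, roots):
    """Mark from the roots, then keep only marked cells and clear their marks."""
    for root in roots:
        mark(root)
    live = [c for c in heap if c.marked]
    for c in live:
        c.marked = False
    return live

heap = []
kept = make_list([1, 2, 3], heap)
make_list([4, 5], heap)        # no reference retained: these cells become garbage
heap = collect(heap, [kept])
print(length(kept), len(heap))
```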
Symbolic Processor - Example
• Symbolic Lisp Processor
• Layered architecture -
Simplified instruction set,
Stack oriented machine