Functional Units: Input and Arithmetic Logic
Functional Units: Input and Arithmetic Logic
Arithmetic
Input and
logic
Memory
Output Control
I/O Processor
MAR MDR
Control
PC R0
R1
Processor
IR
Figure 1. 3. Singlebus structure.
Speed Issue
Different devices have different
transfer/operate speed.
If the speed of bus is bounded by the slowest
device connected to it, the efficiency will be
very low.
How to solve this?
Main Cache
memory memory Processor
Bus
Cost
Memory management
Processor Clock
Clock, clock cycle, and clock rate
The execution of each instruction is divided
into several steps, each of which completes
in one clock cycle.
Hertz – cycles per second
Basic Performance Equation
T – processor time required to execute a program that has been
prepared in high-level language
N – number of actual machine language instructions needed to
complete the execution (note: loop)
S – average number of basic steps needed to execute one
machine instruction. Each step completes in one clock cycle
R – clock rate
Note: these are not independent to each other
N×S
T=
R
How to improve T?
Pipeline and Superscalar
Operation
Instructions are not necessarily executed one after
another.
The value of S doesn’t have to be the number of
clock cycles to execute one instruction.
Pipelining – overlapping the execution of successive
instructions.
Add R1, R2, R3
Superscalar operation – multiple instruction
pipelines are implemented in the processor.
Goal – reduce S (could become <1!)
Clock Rate
Increase clock rate
Improve the integrated-circuit (IC) technology to make
the circuits faster
Reduce the amount of processing done in one basic step
(however, this may increase the number of basic steps
needed)
Increases in R that are entirely caused by
improvements in IC technology affect all
aspects of the processor’s operation equally
except the time to access the main memory.
CISC and RISC
Tradeoffbetween N and S
A key consideration is the use of pipelining
S is close to 1 even though the number of basic steps
per instruction may be considerably larger
It is much easier to implement efficient pipelining in
processor with simple instruction sets
Reduced Instruction Set Computers (RISC)
Complex Instruction Set Computers (CISC)
Compiler
A compiler translates a high-level language program
into a sequence of machine instructions.
To reduce N, we need a suitable machine
instruction set and a compiler that makes good use
of it.
Goal – reduce N×S
A compiler may not be designed for a specific
processor; however, a high-quality compiler is
usually designed for, and with, a specific processor.
Performance Measurement
T is difficult to compute.
Measure computer performance using benchmark programs.
System Performance Evaluation Corporation (SPEC) selects and
publishes representative application programs for different application
domains, together with test results for many commercially available
computers.
Compile and run (no simulation)
Reference computer
i =1
Multiprocessors and
Multicomputers
Multiprocessor computer
Execute a number of different application tasks in parallel
Execute subtasks of a single large task in parallel
All processors have access to all of the memory – shared-memory
multiprocessor
Cost – processors, memory units, complex interconnection networks
Multicomputers
Each computer only have access to its own memory
Exchange message via a communication network – message-
passing multicomputers