Computer Organization and Architecture
• Characteristics of Multiprocessors
• Interconnection Structures
• Interprocessor Arbitration
• Interprocessor Communication/Synchronization
• Cache Coherence
• Typically,
– What operations are performed on the data in the registers
– What information is passed between registers
• Register Transfer
• Arithmetic Microoperations
• Logic Microoperations
• Shift Microoperations
MICROOPERATIONS (1)
MICROOPERATION (2)
R ← f(R, R)
f: shift, load, clear, increment, add, subtract, complement,
and, or, xor, …
Computer Organization Computer Architecture
Register Transfer & -operations 10 Register Transfer Language
- Microoperations set
DESIGNATION OF REGISTERS
MAR
– Registers may also be represented showing the bits of data they contain
DESIGNATION OF REGISTERS
• Designation of a register
- a register
- portion of a register
- a bit of a register
15 0 15 8 7 0
R2 PC(H) PC(L)
Numbering of bits Subfields
REGISTER TRANSFER
R2 ← R1
REGISTER TRANSFER
R3 ← R5
– the data lines from the source register (R5) to the destination
register (R3)
– Parallel load in the destination register (R3)
– Control lines to perform the action
CONTROL FUNCTIONS
• Often actions need to only occur if a certain condition is true
• This is similar to an “if” statement in a programming language
• In digital systems, this is often done via a control signal, called
a control function
– If the signal is 1, the action takes place
• This is represented as:
P: R2 ← R1
t t+1
Timing diagram
Clock
Load
Transfer occurs here
• The same clock controls the circuits that generate the control function
and the destination register
• Registers are assumed to use positive-edge-triggered flip-flops
SIMULTANEOUS OPERATIONS
P: R3 ← R5, MAR ← IR
CONNECTING REGISTERS
[Figure: a 4-line common bus built from four 4x1 multiplexers; select lines x and y choose which register drives the bus lines]
[Figure: registers R0 to R3 take their data from the 4-line bus; a 2x4 decoder (select inputs z and w, enable E) with outputs D0 to D3 drives their Load inputs]
MEMORY (RAM)
• Memory (RAM) can be thought of as a sequential circuit
containing some number of registers
• These registers hold the words of memory
• Each of the r registers is indicated by an address
• These addresses range from 0 to r-1
• Each register (word) can hold n bits of data
• Assume the RAM contains r = 2^k words. It needs the
following
– n data input lines
– n data output lines
– k address lines
– A Read control line
– A Write control line
[Figure: RAM unit with n data input lines, k address lines, Read and Write control lines, and n data output lines]
MEMORY TRANSFER
• Collectively, the memory is viewed at the register level as
a device, M.
• Since it contains multiple locations, we must specify
which address in memory we will be using
• This is done by indexing memory references
M
Memory Read
AR
unit Write
MEMORY TRANSFER
Write: M[AR] ← R1
MEMORY READ
• To read a value from a location in memory and load it into a
register, the register transfer language notation looks like this:
R1 ← M[MAR]
MEMORY WRITE
• To write a value from a register to a location in memory looks like this
in register transfer language:
M[MAR] ← R1
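The read and write transfers above can be sketched in Python. This is only an illustration, not part of the original slides: the class name Memory, its methods, and the sample values are assumptions chosen for the example.

```python
class Memory:
    """Word-addressable RAM with r = 2**k words of n bits each (illustrative model)."""

    def __init__(self, k: int, n: int):
        self.words = [0] * (2 ** k)   # addresses 0 .. r-1
        self.mask = (1 << n) - 1      # keep only n bits per word

    def read(self, address: int) -> int:
        return self.words[address]

    def write(self, address: int, value: int) -> None:
        self.words[address] = value & self.mask


M = Memory(k=12, n=16)   # 4096 x 16, the size used later for the Basic Computer
MAR, R1 = 0x135, 0xBEEF  # hypothetical register contents

M.write(MAR, R1)         # memory write: M[MAR] <- R1
R1 = 0                   # overwrite R1 so the read below is visible
R1 = M.read(MAR)         # memory read:  R1 <- M[MAR]
```

The address register only selects the word; the data always moves between the selected word and a register.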
MICROOPERATIONS
ARITHMETIC MICROOPERATIONS
• The basic arithmetic microoperations are
– Addition
– Subtraction
– Increment
– Decrement
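As a rough sketch of how addition and subtraction share one circuit, the following Python models a ripple-carry adder-subtractor bit by bit. The function names and the mode input m are assumptions made for this example (m = 0 adds, m = 1 subtracts by adding the complement of B plus 1).

```python
def full_adder(a: int, b: int, c_in: int):
    """One full-adder stage: returns (sum bit, carry out)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (c_in & (a ^ b))
    return s, c_out


def add_sub(a: int, b: int, m: int, n: int = 4):
    """n-bit adder-subtractor: m = 0 computes A + B, m = 1 computes A + B' + 1."""
    carry = m                          # the mode bit is also the initial carry
    result = 0
    for i in range(n):
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ m       # XOR with m complements B when subtracting
        s, carry = full_adder(a_i, b_i, carry)
        result |= s << i
    return result, carry               # carry is the final carry out


total, c_add = add_sub(0b0110, 0b0011, m=0)   # 6 + 3
diff, c_sub = add_sub(0b0110, 0b0011, m=1)    # 6 - 3
```

With unsigned operands, a final carry of 1 after a subtraction indicates that no borrow occurred.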
4-Bit adder-subtractor
Binary Adder-Subtractor
[Figure: four full adders (FA) with inputs A0-A3 and B0-B3, carries C0-C4, and sum outputs S0-S3]
Binary Incrementer
[Figure: four half adders (HA) with inputs A0-A3 and a constant 1, sum outputs S0-S3, and carry output C4]
ARITHMETIC CIRCUIT
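[Figure: 4-bit arithmetic circuit. Each stage i has a full adder (FA) with inputs Xi = Ai and Yi, where Yi comes from a 4x1 MUX selected by S1 and S0; MUX input 0 is Bi and two of the remaining inputs are the constants 0 and 1. Carries C0 (= Cin) through C4 (= Cout) link the stages, and the outputs are D0-D3]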
LOGIC MICROOPERATIONS
• Specify binary operations on the strings of bits in registers
– Logic microoperations are bit-wise operations, i.e., they work on the
individual bits of data
– useful for bit manipulations on binary data
– useful for making logical decisions based on the bit value
• There are, in principle, 16 different logic functions that can
be defined over two binary input variables
A B F0 F1 F2 … F13 F14 F15
0 0 0 0 0 … 1 1 1
0 1 0 0 0 … 1 1 1
1 0 0 0 1 … 0 1 1
1 1 0 1 0 … 1 0 1
Ai
0
Bi
1
4X1 Fi
MUX
2
3 Select
S1
S0
Function table
S1 S0 Output μ-operation
0 0 F = A ∧ B AND
0 1 F = A ∨ B OR
1 0 F = A ⊕ B XOR
1 1 F = A’ Complement
– Selective-set A ← A + B
– Selective-complement A ← A ⊕ B
– Selective-clear A ← A • B’
– Mask (Delete) A ← A • B
– Clear A ← A ⊕ B
– Insert A ← (A • B) + C
– Compare A ← A ⊕ B
– ...
SELECTIVE SET
1100 At
1010 B
1110 At+1 (A ← A + B)
SELECTIVE COMPLEMENT
1100 At
1010 B
0110 At+1 (A ← A ⊕ B)
SELECTIVE CLEAR
1100 At
1010 B
0100 At+1 (A ← A • B’)
MASK OPERATION
1100 At
1010 B
1000 At+1 (A ← A • B)
CLEAR OPERATION
1100 At
1010 B
0110 At+1 (A ← A ⊕ B)
INSERT OPERATION
• An insert operation is used to introduce a specific bit pattern
into the A register, leaving the other bit positions unchanged
• This is done as
– A mask operation to clear the desired bit positions, followed by
– An OR operation to introduce the new bits into the desired
positions
– Example
» Suppose you wanted to introduce 1010 into the low order
four bits of A: 1101 1000 1011 0001 A (Original)
1101 1000 1011 1010 A (Desired)
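The bit patterns in the examples above can be checked with Python's bitwise operators. This is an illustrative sketch; the variable names and the helper insert() are assumptions for the example, and + in the RTL is read as logical OR.

```python
A, B = 0b1100, 0b1010
WIDTH_MASK = 0b1111                           # the examples use 4-bit registers

selective_set = A | B                         # A <- A + B (OR)
selective_complement = A ^ B                  # A <- A xor B
selective_clear = A & ~B & WIDTH_MASK         # A <- A . B'
mask = A & B                                  # A <- A . B


def insert(a: int, field: int, new_bits: int, width: int = 16) -> int:
    """Mask out the bit positions marked in `field`, then OR in the new bits."""
    keep = ~field & ((1 << width) - 1)        # 1s where A must stay unchanged
    return (a & keep) | (new_bits & field)


# Introduce 1010 into the low-order four bits of 1101 1000 1011 0001
desired = insert(0b1101_1000_1011_0001, 0b1111, 0b1010)
```

The insert helper performs the two steps named on the slide: an AND with a mask that clears the target positions, followed by an OR with the new bits.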
SHIFT MICROOPERATIONS
• There are three types of shifts
– Logical shift
– Circular shift
– Arithmetic shift
• What differentiates them is the information that goes into
the serial input
LOGICAL SHIFT
• In a logical shift the serial input to the shift is a 0.
CIRCULAR SHIFT
• In a circular shift the serial input is the bit that is shifted out
of the other end of the register.
ARITHMETIC SHIFT
• An arithmetic shift is meant for signed binary numbers
(integer)
• An arithmetic left shift multiplies a signed number by two
• An arithmetic right shift divides a signed number by two
• The main distinction of an arithmetic shift is that it must keep
the sign of the number the same as it performs the
multiplication or division
ARITHMETIC SHIFT
• A left arithmetic shift operation must be checked for
overflow
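A minimal Python sketch of the three shift types on an n-bit register follows. The function names are assumptions for this example, and inputs are assumed to be unsigned integers that already fit in n bits. The overflow test for the arithmetic left shift compares the two leftmost bits before the shift, since the sign changes exactly when they differ.

```python
def shl(x: int, n: int = 4) -> int:
    """Logical shift left: serial input 0 enters on the right."""
    return (x << 1) & ((1 << n) - 1)


def shr(x: int, n: int = 4) -> int:
    """Logical shift right: serial input 0 enters on the left."""
    return x >> 1


def cil(x: int, n: int = 4) -> int:
    """Circular shift left: the bit shifted out re-enters on the right."""
    return ((x << 1) & ((1 << n) - 1)) | (x >> (n - 1))


def cir(x: int, n: int = 4) -> int:
    """Circular shift right: the bit shifted out re-enters on the left."""
    return (x >> 1) | ((x & 1) << (n - 1))


def ashr(x: int, n: int = 4) -> int:
    """Arithmetic shift right: the sign bit is copied into itself."""
    return (x >> 1) | (x & (1 << (n - 1)))


def ashl(x: int, n: int = 4):
    """Arithmetic shift left: returns (result, overflow flag)."""
    overflow = ((x >> (n - 1)) ^ (x >> (n - 2))) & 1
    return shl(x, n), overflow
```

For example, with 4-bit registers ashl(0b0110) reports overflow because +6 doubled does not fit in 4-bit two's complement, while ashl(0b1110), which is -2, does not.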
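[Figure: 4-bit combinational shifter built from multiplexers with a common select line S; inputs A0-A3 and a serial input (IL), outputs H0-H3, with a function table]
[Figure: one stage of the arithmetic logic shift unit. A 4x1 MUX produces Fi from the arithmetic circuit output Di (with carry Ci+1), the logic circuit output Ei, Ai-1 (shr), and Ai+1 (shl); the arithmetic and logic circuits take inputs Ai and Bi]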
• Computer Registers
• Addressing modes
• Instruction set
INTRODUCTION
• Every different processor type has its own design (different
registers, buses, microoperations, machine instructions, etc)
• A modern processor is a very complex device
• It contains
– Many registers
– Multiple arithmetic units, for both integer and floating point calculations
– The ability to pipeline several consecutive instructions to speed execution
– Etc.
• However, to understand how processors work, we will start with
a simplified processor model
• This is similar to what real processors were like ~25 years ago
• M. Morris Mano introduces a simple processor model he calls
the Basic Computer
• We will use this to introduce processor organization and the
relationship of the RTL model to the higher level computer
processor
CPU RAM
0
15 0
4095
INSTRUCTIONS
• Program
– A sequence of (machine) instructions
• (Machine) Instruction
– A group of bits that tells the computer to perform a specific operation
(a sequence of micro-operations)
• The instructions of a program, along with any needed data
are stored in memory
• The CPU reads the next instruction from memory
• It is placed in an Instruction Register (IR)
• Control circuitry in control unit then translates the
instruction into the sequence of microoperations
necessary to implement it
INSTRUCTION FORMAT
• A computer instruction is often divided into two parts
– An opcode (Operation Code) that specifies the operation for that
instruction
– An address that specifies the registers and/or locations in memory to
use for that operation
• In the Basic Computer, since the memory contains 4096 (=
2^12) words, we need 12 bits to specify which memory
address this instruction will use
• In the Basic Computer, bit 15 of the instruction specifies
the addressing mode (0: direct addressing, 1: indirect
addressing)
• Since the memory words, and hence the instructions, are
16 bits long, that leaves 3 bits for the instruction’s opcode
Instruction Format
15 14 12 11 0
I Opcode Address
Addressing
mode
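The 16-bit format above (bit 15 = I, bits 14-12 = opcode, bits 11-0 = address) can be unpacked with shifts and masks. The function name decode and the sample words are assumptions for this sketch.

```python
def decode(instruction: int):
    """Split a 16-bit Basic Computer instruction into (I, opcode, address)."""
    i_bit = (instruction >> 15) & 0b1        # addressing mode: 0 direct, 1 indirect
    opcode = (instruction >> 12) & 0b111     # 3-bit operation code
    address = instruction & 0xFFF            # 12-bit address field
    return i_bit, opcode, address


direct = decode(0x2107)     # I = 0, opcode 2, address 107 (hex)
indirect = decode(0x9350)   # I = 1, opcode 1, address 350 (hex)
```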
ADDRESSING MODES
• The address field of an instruction can represent either
– Direct address: the address in memory of the data to use (the address of the
operand), or
– Indirect address: the address in memory of the address in memory of the data to use
300 1350
457 Operand
1350 Operand
+ +
AC AC
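The difference between the two modes can be sketched as a single extra memory access. In this illustration memory is a plain dictionary, the addresses 457, 300, and 1350 follow the figure, and the operand values 11 and 22 are made up for the example.

```python
def effective_address(memory: dict, i_bit: int, address: int) -> int:
    """Direct: the address field is the operand address.
    Indirect: the address field points to a word holding the operand address."""
    if i_bit:
        return memory[address] & 0xFFF   # one extra memory read
    return address


memory = {457: 11, 300: 1350, 1350: 22}  # hypothetical contents

ea_direct = effective_address(memory, 0, 457)    # operand is at 457
ea_indirect = effective_address(memory, 1, 300)  # M[300] = 1350, operand is at 1350
operand = memory[ea_indirect]
```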
PROCESSOR REGISTERS
• A processor has many registers to hold instructions,
addresses, data, etc
• The processor has a register, the Program Counter (PC) that
holds the memory address of the next instruction to get
– Since the memory in the Basic Computer only has 4096 locations, the PC
only needs 12 bits
• In a direct or indirect addressing, the processor needs to keep
track of what locations in memory it is addressing: The
Address Register (AR) is used for this
– The AR is a 12 bit register in the Basic Computer
• When an operand is found, using either direct or indirect
addressing, it is placed in the Data Register (DR). The
processor then uses this value as data for its operation
• The Basic Computer has a single general purpose register –
the Accumulator (AC)
PROCESSOR REGISTERS
• The significance of a general purpose register is that it can be
referred to in instructions
– e.g. load AC with the contents of a specific memory location; store the
contents of AC into a specified memory location
• Often a processor will need a scratch register to store
intermediate results or other temporary data; in the Basic
Computer this is the Temporary Register (TR)
• The Basic Computer uses a very simple model of input/output
(I/O) operations
– Input devices are considered to send 8 bits of character data to the processor
– The processor can send 8 bits of character data to output devices
• The Input Register (INPR) holds an 8 bit character received from an
input device
• The Output Register (OUTR) holds an 8 bit character to be sent
to an output device
BASIC COMPUTER
REGISTERS
Registers in the Basic Computer
11 0
PC
Memory
11 0
4096 x 16
AR
15 0
IR CPU
15 0 15 0
TR DR
7 0 7 0 15 0
OUTR INPR AC
List of BC Registers
DR 16 Data Register Holds memory operand
AR 12 Address Register Holds address for memory
AC 16 Accumulator Processor register
IR 16 Instruction Register Holds instruction code
PC 12 Program Counter Holds address of instruction
TR 16 Temporary Register Holds temporary data
INPR 8 Input Register Holds input character
OUTR 8 Output Register Holds output character
COMMON BUS SYSTEM
[Figure: Basic Computer registers and memory connected to a 16-bit common bus. Select lines S2, S1, S0 choose the bus source: AR (1), PC (2), DR (3), AC (4), IR (5), TR (6), memory unit 4096 x 16 (7). AR, PC, DR, AC, and TR have LD, INR, and CLR inputs; IR and OUTR have LD only. The ALU drives AC and E, INPR is connected to the ALU, AR supplies the memory address, and memory has Read and Write inputs. All registers share the clock.]
COMMON BUS SYSTEM
• Three control lines, S2, S1, and S0 control which register the
bus selects as its input
S2 S1 S0 Register
0 0 0 x
0 0 1 AR
0 1 0 PC
0 1 1 DR
1 0 0 AC
1 0 1 IR
1 1 0 TR
1 1 1 Memory
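The selection table can be read as a multiplexer: the three select bits form a number from 0 to 7 that picks the bus source. The sketch below is an illustration only; the function name, the dictionary of register values, and the use of None for the unused combination 000 are assumptions.

```python
BUS_SOURCES = {1: "AR", 2: "PC", 3: "DR", 4: "AC", 5: "IR", 6: "TR", 7: "Memory"}


def bus_value(s2: int, s1: int, s0: int, registers: dict, memory_word: int):
    """Return the word placed on the common bus for select lines S2 S1 S0."""
    select = (s2 << 2) | (s1 << 1) | s0
    if select == 0:
        return None                      # no source drives the bus
    source = BUS_SOURCES[select]
    if source == "Memory":
        return memory_word               # word read from M[AR]
    return registers[source]


# Hypothetical register contents
registers = {"AR": 0x123, "PC": 0x021, "DR": 0x0F0F, "AC": 0xAAAA, "IR": 0x2107, "TR": 0}

from_pc = bus_value(0, 1, 0, registers, memory_word=0xABCD)    # PC drives the bus
from_mem = bus_value(1, 1, 1, registers, memory_word=0xABCD)   # memory drives the bus
```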
BASIC COMPUTER INSTRUCTIONS
• Basic Computer Instruction Format
BASIC COMPUTER INSTRUCTIONS
Symbol Hex Code (I=0) Hex Code (I=1) Description
AND 0xxx 8xxx AND memory word to AC
ADD 1xxx 9xxx Add memory word to AC
LDA 2xxx Axxx Load AC from memory
STA 3xxx Bxxx Store content of AC into memory
BUN 4xxx Cxxx Branch unconditionally
BSA 5xxx Dxxx Branch and save return address
ISZ 6xxx Exxx Increment and skip if zero
INSTRUCTION SET COMPLETENESS
A computer should have a set of instructions so that the user can
construct machine language programs to evaluate any function
that is known to be computable.
• Instruction Types
Functional Instructions
- Arithmetic, logic, and shift instructions
- ADD, CMA, INC, CIR, CIL, AND, CLA
Transfer Instructions
- Data transfers between the main memory
and the processor registers
- LDA, STA
Control Instructions
- Program sequencing and control
- BUN, BSA, ISZ
Input/Output Instructions
- Input and output
- INP, OUT
CONTROL UNIT
• Control unit (CU) of a processor translates from machine
instructions to the control signals for the microoperations
that implement them
TIMING AND CONTROL
Control unit of Basic Computer
Instruction register (IR)
15 14 13 12 11 - 0 Other inputs
3x8
decoder
7 6543 210
D0
I Combinational
D7 Control Control
logic signals
T 15
T0
15 14 . . . . 2 1 0
4 x 16
decoder
TIMING SIGNALS
- Generated by 4-bit sequence counter and 4x16 decoder
- The SC can be incremented or cleared.
T0
T1
T2
T3
T4
D3
CLR
SC
INSTRUCTION CYCLE
FETCH and DECODE
• Fetch and Decode
T0: AR ← PC (S0S1S2=010, T0=1)
T1: IR ← M[AR], PC ← PC + 1 (S0S1S2=111, T1=1)
T2: D0, . . . , D7 ← Decode IR(12-14), AR ← IR(0-11), I ← IR(15)
T1
S2
T0 S1 Bus
S0
Memory 7
unit
Address
Read
AR 1
LD
PC 2
INR
IR 5
LD
Clock
Common bus
[Flowchart: instruction cycle]
T0: AR ← PC
T1: IR ← M[AR], PC ← PC + 1
T2: Decode Opcode in IR(12-14), AR ← IR(0-11), I ← IR(15)
T3: the action depends on D7 and I
D7'IT3: AR ← M[AR]
D7'I'T3: Nothing
D7I'T3: Execute a register-reference instr., SC ← 0
D7IT3: Execute an input-output instr., SC ← 0
T4: Execute memory-reference instruction, SC ← 0
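The timing steps T0 to T3 can be traced in Python as ordinary assignments. This is a sketch under simplifying assumptions: memory is a list of 4096 words, the function name fetch_decode and the returned dictionary are invented for the example, and the sample memory contents are hypothetical.

```python
def fetch_decode(memory: list, pc: int) -> dict:
    """Follow T0-T3 of the instruction cycle and classify the instruction."""
    ar = pc                                  # T0: AR <- PC
    ir = memory[ar]                          # T1: IR <- M[AR]
    pc = (pc + 1) & 0xFFF                    #     PC <- PC + 1
    opcode = (ir >> 12) & 0b111              # T2: decode IR(12-14)
    ar = ir & 0xFFF                          #     AR <- IR(0-11)
    i_bit = (ir >> 15) & 1                   #     I  <- IR(15)
    d7 = opcode == 7
    if d7:                                   # T3 for D7 = 1
        kind = "input-output" if i_bit else "register-reference"
    else:                                    # T3 for D7 = 0
        if i_bit:
            ar = memory[ar] & 0xFFF          # D7'IT3: AR <- M[AR]
        kind = "memory-reference"
    return {"kind": kind, "opcode": opcode, "ar": ar, "pc": pc}


memory = [0] * 4096
memory[0x100] = 0x2107      # direct memory-reference instruction
memory[0x101] = 0x7200      # register-reference instruction
memory[0x102] = 0xF800      # input-output instruction
memory[0x103] = 0x9300      # indirect memory-reference instruction
memory[0x300] = 0x0550      # the pointer used by the indirect instruction

first = fetch_decode(memory, 0x100)
```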
REGISTER REFERENCE INSTRUCTIONS
Register Reference Instructions are identified when
- D7 = 1, I = 0
- Register Ref. Instr. is specified in b0 ~ b11 of IR
- Execution starts with timing signal T3
MEMORY REFERENCE INSTRUCTIONS
Symbol Operation Decoder Symbolic Description
AND D0 AC ← AC ∧ M[AR]
ADD D1 AC ← AC + M[AR], E ← Cout
LDA D2 AC ← M[AR]
STA D3 M[AR] ← AC
BUN D4 PC ← AR
BSA D5 M[AR] ← PC, PC ← AR + 1
ISZ D6 M[AR] ← M[AR] + 1, if M[AR] + 1 = 0 then PC ← PC+1
- The effective address of the instruction is in AR and was placed there during
timing signal T2 when I = 0, or during timing signal T3 when I = 1
- Memory cycle is assumed to be short enough to complete in a CPU cycle
- The execution of MR instruction starts with T4
AND to AC
D0T4: DR ← M[AR] Read operand
D0T5: AC ← AC ∧ DR, SC ← 0 AND with AC
ADD to AC
D1T4: DR ← M[AR] Read operand
D1T5: AC ← AC + DR, E ← Cout, SC ← 0 Add to AC and store carry in E
AR = 135 135 21
Memory Memory
BSA:
D5T4: M[AR] ← PC, AR ← AR + 1
D5T5: PC ← AR, SC ← 0
[Flowchart: memory-reference instructions]
AND: D0T4: DR ← M[AR]; D0T5: AC ← AC ∧ DR, SC ← 0
ADD: D1T4: DR ← M[AR]; D1T5: AC ← AC + DR, E ← Cout, SC ← 0
LDA: D2T4: DR ← M[AR]; D2T5: AC ← DR, SC ← 0
STA: D3T4: M[AR] ← AC, SC ← 0
BUN: D4T4: PC ← AR, SC ← 0
BSA: D5T4: M[AR] ← PC, AR ← AR + 1; D5T5: PC ← AR, SC ← 0
ISZ: D6T4: DR ← M[AR]; D6T5: DR ← DR + 1; D6T6: M[AR] ← DR, if (DR = 0) then (PC ← PC + 1), SC ← 0
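BSA and ISZ are the two memory-reference instructions that both write memory and change PC, so they are worth tracing step by step. The Python below is an illustrative sketch: the function names are assumptions, memory is a list of 16-bit words, and the BSA values AR = 135 and PC = 21 follow the example on the slide.

```python
def bsa(memory: list, ar: int, pc: int):
    """Branch and save return address; returns the new (AR, PC)."""
    memory[ar] = pc                  # D5T4: M[AR] <- PC
    ar = (ar + 1) & 0xFFF            #       AR <- AR + 1
    pc = ar                          # D5T5: PC <- AR
    return ar, pc


def isz(memory: list, ar: int, pc: int) -> int:
    """Increment the memory word and skip the next instruction if it becomes zero."""
    dr = memory[ar]                  # D6T4: DR <- M[AR]
    dr = (dr + 1) & 0xFFFF           # D6T5: DR <- DR + 1 (16-bit wraparound)
    memory[ar] = dr                  # D6T6: M[AR] <- DR
    if dr == 0:
        pc = (pc + 1) & 0xFFF        #       if (DR = 0) then PC <- PC + 1
    return pc


memory = [0] * 4096
ar, pc = bsa(memory, ar=135, pc=21)  # return address 21 is saved at location 135

memory[200] = 0xFFFF                 # a counter holding -1
pc_after_skip = isz(memory, ar=200, pc=50)
```

After the BSA call the subroutine body starts at location 136, and an indirect BUN through location 135 returns to address 21.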
AC
Transmitter
Keyboard interface INPR FGI
INPR Input register - 8 bits
OUTR Output register - 8 bits Serial Communications Path
FGI Input flag - 1 bit Parallel Communications Path
FGO Output flag - 1 bit
IEN Interrupt enable - 1 bit
FGI=0 FGO=1
Start Input Start Output
FGI 0
AC Data
yes yes
FGI=0
FGO=0
no
no
AC INPR
OUTR AC
INPUT-OUTPUT INSTRUCTIONS
D7IT3 = p
IR(i) = Bi, i = 6, …, 11
p: SC ← 0 Clear SC
INP pB11: AC(0-7) ← INPR, FGI ← 0 Input char. to AC
OUT pB10: OUTR ← AC(0-7), FGO ← 0 Output char. from AC
SKI pB9: if(FGI = 1) then (PC ← PC + 1) Skip on input flag
SKO pB8: if(FGO = 1) then (PC ← PC + 1) Skip on output flag
ION pB7: IEN ← 1 Interrupt enable on
IOF pB6: IEN ← 0 Interrupt enable off
PROGRAM-CONTROLLED INPUT/OUTPUT
• Program-controlled I/O
- Continuous CPU involvement
I/O takes valuable CPU time
- CPU slowed down to I/O speed
- Simple
- Least hardware
Input
Output
LOOP, LDA DATA
LOP, SKO DEV
BUN LOP
OUT DEV
- The I/O interface, instead of the CPU, monitors the I/O device.
- When the interface finds that the I/O device is ready for data transfer,
it generates an interrupt request to the CPU
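[Flowchart: interrupt cycle. While instructions execute, if IEN = 1 and either FGI = 1 or FGO = 1, then R ← 1. In the interrupt cycle: branch to location 1 (PC ← 1), IEN ← 0, R ← 0.]
[Figure: memory before and after the interrupt cycle. Location 1 holds 0 BUN 1120, the main program occupies locations up to 255, and the I/O program starts at 1120 and ends with 1 BUN 0. Before the interrupt PC = 256; after it, location 0 holds the return address 256 and PC = 1.]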
Register-Reference
D7I'T3 = r (Common to all register-reference instr)
IR(i) = Bi (i = 0,1,2, ..., 11)
r: SC ← 0
CLA rB11: AC ← 0
CLE rB10: E ← 0
CMA rB9: AC ← AC'
CME rB8: E ← E'
CIR rB7: AC ← shr AC, AC(15) ← E, E ← AC(0)
CIL rB6: AC ← shl AC, AC(0) ← E, E ← AC(15)
INC rB5: AC ← AC + 1
SPA rB4: If(AC(15) =0) then (PC ← PC + 1)
SNA rB3: If(AC(15) =1) then (PC ← PC + 1)
SZA rB2: If(AC = 0) then (PC ← PC + 1)
SZE rB1: If(E=0) then (PC ← PC + 1)
HLT rB0: S ← 0
Input-Output
D7IT3 = p (Common to all input-output instructions)
IR(i) = Bi (i = 6,7,8,9,10,11)
p: SC ← 0
INP pB11: AC(0-7) ← INPR, FGI ← 0
OUT pB10: OUTR ← AC(0-7), FGO ← 0
SKI pB9: If(FGI=1) then (PC ← PC + 1)
SKO pB8: If(FGO=1) then (PC ← PC + 1)
ION pB7: IEN ← 1
IOF pB6: IEN ← 0
DESIGN OF BASIC COMPUTER (BC)
Hardware Components of BC
A memory unit: 4096 x 16.
Registers:
AR, PC, DR, AC, IR, TR, OUTR, INPR, and SC
Flip-Flops(Status):
I, S, E, R, IEN, FGI, and FGO
Decoders: a 3x8 Opcode decoder
a 4x16 timing decoder
Common bus: 16 bits
Control logic gates:
Adder and Logic circuit: Connected to AC
CONTROL OF FLAGS
IEN: Interrupt Enable Flag
pB7: IEN ← 1 (I/O Instruction)
pB6: IEN ← 0 (I/O Instruction)
RT2: IEN ← 0 (Interrupt)
D
7
p
I
J Q IEN
B
7
T3
B6
K
R
T2
selected
x1 x2 x3 x4 x5 x6 x7 S2 S1 S0 register
0 0 0 0 0 0 0 0 0 0 none
1 0 0 0 0 0 0 0 0 1 AR
0 1 0 0 0 0 0 0 1 0 PC
0 0 1 0 0 0 0 0 1 1 DR
0 0 0 1 0 0 0 1 0 0 AC
0 0 0 0 1 0 0 1 0 1 IR
0 0 0 0 0 1 0 1 1 0 TR
0 0 0 0 0 0 1 1 1 1 Memory
For AR D4T4: PC ← AR
D5T5: PC ← AR
x1 = D4T4 + D5T5
DESIGN OF ACCUMULATOR LOGIC
Circuits associated with AC
16
Adder and
16 16 16
From DR logic AC
circuit To bus
8
From INPR
Control
gates
CONTROL OF AC
REGISTER
Gate structures for controlling
the LD, INR, and CLR of AC
AND
C LD
i ADD
FA I J Q
i
AC(i)
DR
C
i+1
K
INPR
From
INPR
bit(i)
COM
SHR
AC(i+1)
SHL
AC(i-1)
Introduction
Machine Language
Assembly Language
Assembler
Program Loops
Subroutines
Input-Output Programming
INTRODUCTION
Those concerned with computer architecture should
have a knowledge of both hardware and software
because the two branches influence each other.
Instruction Set of the Basic Computer
Symbol Hexa code Description (m: effective address; M: memory word (operand) found at m)
AND 0 or 8 AND M to AC
ADD 1 or 9 Add M to AC, carry to E
LDA 2 or A Load AC from M
STA 3 or B Store AC in M
BUN 4 or C Branch unconditionally to m
BSA 5 or D Save return address in m and branch to m+1
ISZ 6 or E Increment M and skip if zero
CLA 7800 Clear AC
CLE 7400 Clear E
CMA 7200 Complement AC
CME 7100 Complement E
CIR 7080 Circulate right E and AC
CIL 7040 Circulate left E and AC
INC 7020 Increment AC, carry to E
SPA 7010 Skip if AC is positive
SNA 7008 Skip if AC is negative
SZA 7004 Skip if AC is zero
SZE 7002 Skip if E is zero
HLT 7001 Halt computer
INP F800 Input information and clear flag
OUT F400 Output information and clear flag
SKI F200 Skip if input flag is on
SKO F100 Skip if output flag is on
ION F080 Turn interrupt on
IOF F040 Turn interrupt off
MACHINE LANGUAGE
• Program
A list of instructions or statements for directing
the computer to perform a required data
processing task
• Machine-language
- Binary code
- Octal or hexadecimal code
• Assembly-language (Assembler)
- Symbolic code
• Fortran Program
INTEGER A, B, C
DATA A,83 / B,-23
C=A+B
END
ASSEMBLY LANGUAGE
Syntax of the BC assembly language
Each line is arranged in three columns called fields
Label field
- May be empty or may specify a symbolic
address consisting of up to 3 characters
- Terminated by a comma
Instruction field
- Specifies a machine or a pseudo instruction
- May specify one of
* Memory reference instr. (MRI)
MRI consists of two or three symbols separated by spaces.
ADD OPR (direct address MRI)
ADD PTR I (indirect address MRI)
* Register reference or input-output instr.
Non-MRI does not have an address part
* Pseudo instr. with or without an operand
Symbolic address used in the instruction field must be
defined somewhere as a label
Comment field
- May be empty or may include a comment
PSEUDO-INSTRUCTIONS
ORG N
Hexadecimal number N is the memory loc.
for the instruction or operand listed in the following line
END
Denotes the end of symbolic program
DEC N
Signed decimal number N to be converted to the binary
HEX N
Hexadecimal number N to be converted to the binary
TRANSLATION TO BINARY
Hexadecimal Code
Location Content Symbolic Program
ORG 100
100 2107 LDA SUB
101 7200 CMA
102 7020 INC
103 1106 ADD MIN
104 3108 STA DIF
105 7001 HLT
106 0053 MIN, DEC 83
107 FFE9 SUB, DEC -23
108 0000 DIF, HEX 0
END
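To check the hexadecimal translation above, the program can be run on a very small interpreter. The sketch below is not a full Basic Computer: it models only the six instructions this program uses (LDA, ADD, STA, CMA, INC, HLT), ignores timing, and the function name run is an assumption. INC leaves E untouched here, following the register-reference RTL for INC.

```python
def run(memory: list, pc: int) -> int:
    """Execute from address pc until HLT; returns the final AC."""
    ac, e = 0, 0
    while True:
        ir = memory[pc]
        pc = (pc + 1) & 0xFFF
        i_bit, opcode, ar = ir >> 15, (ir >> 12) & 0b111, ir & 0xFFF
        if opcode == 7:                       # register-reference or I/O
            if ir == 0x7200:                  # CMA: AC <- AC'
                ac ^= 0xFFFF
            elif ir == 0x7020:                # INC: AC <- AC + 1
                ac = (ac + 1) & 0xFFFF
            elif ir == 0x7001:                # HLT
                return ac
            else:
                raise NotImplementedError(hex(ir))
            continue
        if i_bit:                             # indirect: AR <- M[AR]
            ar = memory[ar] & 0xFFF
        if opcode == 1:                       # ADD: AC <- AC + M[AR], E <- Cout
            total = ac + memory[ar]
            ac, e = total & 0xFFFF, total >> 16
        elif opcode == 2:                     # LDA: AC <- M[AR]
            ac = memory[ar]
        elif opcode == 3:                     # STA: M[AR] <- AC
            memory[ar] = ac
        else:
            raise NotImplementedError(hex(ir))


# Location/content pairs copied from the table (all values are hexadecimal)
program = {0x100: 0x2107, 0x101: 0x7200, 0x102: 0x7020, 0x103: 0x1106,
           0x104: 0x3108, 0x105: 0x7001, 0x106: 0x0053, 0x107: 0xFFE9,
           0x108: 0x0000}
memory = [0] * 4096
for location, word in program.items():
    memory[location] = word

final_ac = run(memory, 0x100)
```

The program negates SUB by complementing and incrementing, then adds MIN, so DIF at location 108 ends up holding 83 - (-23) = 106, which is 006A in hexadecimal.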
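First pass
[Flowchart: LC ← 0. Scan the next line of code. If the line has a label, store the symbol in the address-symbol table together with the value of LC. If the line is ORG, set LC; if it is END, go to the second pass; otherwise increment LC and continue.]
Second pass
[Flowchart: LC ← 0. Scan the next line of code. For a pseudo-instruction: ORG sets LC, END finishes (Done), and DEC or HEX converts the operand to binary and stores it in the location given by LC. For an MRI: get the operation code and set bits 2-4, search the address-symbol table for the binary equivalent of the symbol address and set bits 5-16, set the first bit to 1 if I is present and to 0 otherwise, then store the result in the location given by LC. For a valid non-MRI instruction: store the binary equivalent of the instruction in the location given by LC. Otherwise report an error in the line of code. Increment LC and continue.]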
PROGRAM LOOPS
Loop: A sequence of instructions that are executed many times,
each with a different set of data
Fortran program to add 100 numbers:
DIMENSION A(100)
INTEGER SUM, A
SUM = 0
DO 3 J = 1, 100
3 SUM = SUM + A(J)
- Hardware Implementation
- Implementation of an operation in a computer
with one machine instruction
* Multiplication
- For simplicity, unsigned positive numbers
- 8-bit numbers -> 16-bit product
X = 0000 1111 P
cir EAC
Y = 0000 1011 0000 0000
0000 1111 0000 1111
Y AC 0001 1110 0010 1101
0000 0000 0010 1101
=0 =1 0111 1000 1010 0101
E 1010 0101
PP+X
E0
AC X
cil EAC
cil
X AC
CTR CTR + 1
0 =0
CTR Stop
ORG 100
LOP, CLE / Clear E
LDA Y / Load multiplier
CIR / Transfer multiplier bit to E
STA Y / Store shifted multiplier
SZE / Check if bit is zero
BUN ONE / Bit is one; goto ONE
BUN ZRO / Bit is zero; goto ZRO
ONE, LDA X / Load multiplicand
ADD P / Add to partial product
STA P / Store partial product
CLE / Clear E
ZRO, LDA X / Load multiplicand
CIL / Shift left
STA X / Store shifted multiplicand
ISZ CTR / Increment counter
BUN LOP / Counter not zero; repeat loop
HLT / Counter is zero; halt
CTR, DEC -8 / This location serves as a counter
X, HEX 000F / Multiplicand stored here
Y, HEX 000B / Multiplier stored here
P, HEX 0 / Product formed here
END
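The shift-and-add loop of the assembly program can be mirrored in Python to check the result for the stored operands. This is a sketch: the function name multiply is an assumption, and the 16-bit masks stand in for the width of the memory words.

```python
def multiply(x: int, y: int, bits: int = 8) -> int:
    """Unsigned shift-and-add multiplication of two 8-bit numbers."""
    p = 0                                # P, the partial product
    for _ in range(bits):                # CTR counts from -8 up to 0
        e = y & 1                        # CIR moves the low multiplier bit into E
        y >>= 1                          # shifted multiplier is stored back in Y
        if e:                            # SZE: add only when the bit is one
            p = (p + x) & 0xFFFF         # P <- P + X
        x = (x << 1) & 0xFFFF            # CIL: shift the multiplicand left
    return p


product = multiply(0x000F, 0x000B)       # X = 15, Y = 11 as in the listing
```

The result for these operands is 165, which is 1010 0101 in binary, matching the final value of P in the trace.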
CLE / Clear E to 0
SPA / Skip if AC is positive
CME / AC is negative
CIR / Circulate E and AC
SUBROUTINES
Subroutine
Example
CHARACTER MANIPULATION
PROGRAM INTERRUPT
Tasks of Interrupt Service Routine
- Save the Status of CPU
Contents of processor registers and Flags
MICROPROGRAMMED CONTROL
• Control Memory
• Sequencing Microinstructions
• Microprogram Example
Microprogram
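[Figure: microprogrammed control organization. Next address generation logic, fed by the IR and the status flip-flops, loads the control storage address register; control storage (the microprogram memory) feeds the control storage data register, whose outputs drive the CPU and memory.]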
Control Memory
• When the control signals are generated by hardware using
conventional logic design techniques, the control unit is
said to be hardwired.
• Microprogramming is a second alternative for designing
the control unit of a digital computer. The principle of
microprogramming is an elegant and systematic method
for controlling the micro operation sequences in a digital
computer.
• The control variables at any given time can be represented
by a string of 1's and 0's called a control word. As such,
control words can be programmed to perform various
operations on the components of the system.
• A control unit whose binary control variables are stored in
memory is called a microprogrammed control unit.
Control Memory….
• The microinstruction specifies one or more micro-
operations for the system.
• A sequence of microinstructions constitutes a
microprogram.
• A memory that is part of a control unit is referred to as a
control memory.
• A computer that employs a microprogrammed control unit
will have two separate memories: a main memory and a
control memory.
• The main memory is available to the user for storing the
programs. The contents of main memory may alter when
the data are manipulated and every time that the program
is changed.
• The user's program in main memory consists of machine
instructions and data.
Control Memory….
• The control memory holds a fixed microprogram that
cannot be altered by the occasional user.
• The microprogram consists of microinstructions that
specify various internal control signals for execution of
register microoperations.
• Each machine instruction initiates a series of
microinstructions in control memory.
Control Memory….
• The control memory address register specifies the
address of the microinstruction, and the control data
register holds the microinstruction read from memory.
• A microinstruction contains bits for initiating
microoperations in the data processor part and bits that
determine the address sequence for the control memory.
Control Memory….
• The next address generator is sometimes called a
microprogram sequencer, as it determines the address
sequence that is read from control memory.
• The address of the next microinstruction can be
specified in several ways, depending on the sequencer
inputs.
• Typical functions of a microprogram sequencer are
incrementing the control address register by one,
loading into the control address register an address from
control memory, transferring an external address, or
loading an initial address to start the control operations.
• The control data register holds the present
microinstruction while the next address is computed and
read from memory. The data register is sometimes called
a pipeline register.
Control Memory….
• It allows the execution of the microoperations specified
by the control word simultaneously with the generation
of the next microinstruction.
• This configuration requires a two-phase clock, with one
clock applied to the address register and the other to the
data register.
• The main advantage of the microprogrammed control is
the fact that once the hardware configuration is
established, there should be no need for further
hardware or wiring changes.
• If we want to establish a different control sequence for
the system, all we need to do is specify a different set of
microinstructions for control memory.
TERMINOLOGY
Microprogram
- Program stored in memory that generates all the control signals required
to execute the instruction set correctly
- Consists of microinstructions
Microinstruction
- Contains a control word and a sequencing word
Control Word - All the control information required for one clock cycle
Sequencing Word - Information needed to decide
the next microinstruction address
- Vocabulary to write a microprogram
Dynamic Microprogramming
- Computer system whose control unit is implemented with
a microprogram in WCS
- Microprogram can be changed by a systems programmer or a user
TERMINOLOGY
- In-line Sequencing
- Branch
- Conditional Branch
- Subroutine
- Loop
- Instruction OP-code mapping
Address Sequencing
• Microinstructions are stored in control memory in
groups, with each group specifying a routine.
• Each computer instruction has its own microprogram
routine in control memory to generate the
microoperations that execute the instruction.
• The hardware that controls the address sequencing of
the control memory must be capable of sequencing the
microinstructions within a routine and be able to branch
from one routine to another.
• To appreciate the address sequencing in a
microprogram control unit, let us enumerate the steps
that the control must undergo during the execution of a
single computer instruction.
• An initial address is loaded into the control address
register when power is turned on in the computer.
Address Sequencing…..
• This address is usually the address of the first
microinstruction that activates the instruction fetch routine.
• The fetch routine may be sequenced by incrementing the
control address register through the rest of its
microinstructions. At the end of the fetch routine, the
instruction is in the instruction register of the computer.
• The control memory next must go through the routine that
determines the effective address of the operand.
• A machine instruction may have bits that specify various
addressing modes, such as indirect address and index
registers.
• The effective address computation routine in control
memory can be reached through a branch microinstruction,
which is conditioned on the status of the mode bits of the
instruction.
Address Sequencing…..
• The next step is to generate the microoperations that
execute the instruction fetched from memory. The
microoperation steps to be generated in processor registers
depend on the operation code part of the instruction.
• Each instruction has its own microprogram routine stored
in a given location of control memory. The transformation
from the instruction code bits to an address in control
memory where the routine is located is referred to as a
mapping process.
• A mapping procedure is a rule that transforms the
instruction code into a control memory address.
• Once the required routine is reached, the microinstructions
that execute the instruction may be sequenced by
incrementing the control address register, but sometimes
the sequence of microoperations will depend on values of
certain status bits in processor registers .
Address Sequencing…..
• Microprograms that employ subroutines will require an
external register for storing the return address. Return
addresses cannot be stored in ROM because the unit has
no writing capability.
• The address sequencing capabilities required in a
control memory are:
Address Sequencing…..
Address Sequencing…..
Conditional Branching:
• The branch logic provides decision-making capabilities in
the control unit.
• The status conditions are special bits in the system that
provide parameter information such as the carry-out of an
adder, the sign bit of a number, the mode bits of an
instruction, and input or output status conditions.
• Information in these bits can be tested and actions
initiated based on their condition: whether their value is 1
or 0.
• The status bits, together with the field in the
microinstruction that specifies a branch address, control
the conditional branch decisions generated in the branch
logic.
Address Sequencing…..
• The branch logic hardware may be implemented in a variety
of ways. The simplest way is to test the specified condition
and branch to the indicated address if the condition is met,
otherwise, the address register is incremented.
• This can be implemented with a multiplexer. Suppose that
there are eight status bit conditions in the system.
• Three bits in the microinstruction are used to specify any
one of eight status bit conditions. These three bits provide
the selection variables for the multiplexer.
• If the selected status bit is in the 1 state, the output of the
multiplexer is 1, otherwise, it is 0.
• A 1 output in the multiplexer generates a control signal to
transfer the branch address from the microinstruction into
the control address register. A 0 output in the multiplexer
causes the address register to be incremented.
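The multiplexer-based branch logic described above reduces to a two-way choice. In the following sketch the function name, the argument names, and the 7-bit width of the control address register are assumptions made for illustration.

```python
def next_address(car: int, branch_address: int, select: int, status_bits: list) -> int:
    """Pick the next control memory address from the selected status bit."""
    mux_output = status_bits[select]     # 3 select bits choose one of 8 status bits
    if mux_output == 1:
        return branch_address            # load the branch address into CAR
    return (car + 1) & 0x7F              # otherwise increment CAR (7 bits assumed)


# Eight hypothetical status bits; input 7 is tied to 1 for unconditional branches
status_bits = [0, 1, 0, 0, 0, 0, 0, 1]

taken = next_address(car=10, branch_address=64, select=1, status_bits=status_bits)
not_taken = next_address(car=10, branch_address=64, select=0, status_bits=status_bits)
unconditional = next_address(car=10, branch_address=64, select=7, status_bits=status_bits)
```

Tying one multiplexer input permanently to 1 is a simple way to obtain an unconditional branch from the same hardware.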
Address Sequencing…..
Mapping of Instruction
A special type of branch exists when a microinstruction
specifies a branch to the first word in control memory where a
microprogram routine for an instruction is located.
The status bits for this type of branch are the bits in the
operation code part of the instruction. For example, a computer
with a simple instruction format as shown in Fig., has an
operation code of four bits which can specify up to 16 distinct
instructions.
Mapping of Instruction….
This mapping consists of placing a 0 in the most
significant bit of the address, transferring the four
operation code bits, and clearing the two least
significant bits of the control address register.
This provides for each computer instruction a
microprogram routine with a capacity of four
microinstructions.
If the routine needs more than four microinstructions, it
can use addresses 1000000 through 1111111.
If it uses fewer than four microinstructions, the unused
memory locations would be available for other routines.
One can extend this concept to a more general mapping
rule by using a ROM to specify the mapping function.
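The mapping rule above (a 0 in the most significant bit, the four operation code bits, then two cleared low-order bits) amounts to a left shift by two. A minimal sketch, assuming a 4-bit operation code and a 7-bit control address; the function name `map_opcode` is hypothetical:

```python
def map_opcode(opcode):
    """Map a 4-bit operation code XXXX to the 7-bit control address 0XXXX00."""
    if not 0 <= opcode <= 0b1111:
        raise ValueError("operation code must fit in four bits")
    return opcode << 2        # two cleared low bits; bit 6 stays 0


# Each routine gets four consecutive control memory words.
start_addresses = [map_opcode(op) for op in range(16)]
```

With this rule the sixteen routines start at addresses 0, 4, 8, ..., 60, which leaves addresses 1000000 through 1111111 free for longer routines.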
Mapping of Instruction….
MICROINSTRUCTION SEQUENCING
Instruction code
Mapping
logic
Subroutine
register
Control address register (SBR)
(CAR)
Incrementer
select a status
bit
Microoperations
Branch address
CONDITIONAL BRANCH
Load address
Control address register
Increment
MUX
Control memory
...
Status bits
(condition)
Next address
Conditional Branch
If Condition is true, then Branch (address from
the next address field of the current microinstruction)
else Fall Through
Conditions to Test: O(overflow), N(negative),
Z(zero), C(carry), etc.
Unconditional Branch
Fixing the value of one status bit at the input of the multiplexer to 1
MAPPING OF INSTRUCTIONS
Direct Mapping
  OP-codes of Instructions      Control Storage
  ADD  0000                     Address 0000: ADD Routine
  AND  0001                     Address 0001: AND Routine
  LDA  0010                     Address 0010: LDA Routine
  STA  0011                     Address 0011: STA Routine
  BUN  0100                     Address 0100: BUN Routine
Mapping Bits: 10 xxxx 010
  Address 10 0000 010: ADD Routine
Mapping from the OP-code to a Microinstruction Address
  Machine instruction:       OP-code 1 0 1 1 | Address
  Mapping bits:              0 x x x x 0 0
  Microinstruction address:  0 1 0 1 1 0 0
Mapping memory (ROM or PLA) -> Control Memory
Subroutines
• Subroutines are programs that are used by other
routines to accomplish a particular task.
• A subroutine can be called from any point within the
main body of the microprogram. Frequently, many
microprograms contain identical sections of code.
• Microinstructions can be saved by employing
subroutines that use common sections of microcode.
Subroutines…
• Microprograms that use subroutines must have a
provision for storing the return address during a
subroutine call and restoring the address during a
subroutine return.
• This may be accomplished by placing the incremented
output from the control address register into a
subroutine register and branching to the beginning of
the subroutine.
• The subroutine register can then become the source for
transferring the address for the return to the main
routine.
• The best way to structure a register file that stores
addresses for subroutines is to organize the registers in
a last-in, first-out (LIFO) stack.
MICROPROGRAM EXAMPLE
Computer Configuration
MUX
10 0
AR
Address Memory
10 0 2048 x 16
PC
MUX
15 0
6 0 6 0 DR
SBR CAR
Microinstruction Format
3 3 3 2 2 7
F1 F2 F3 CD BR AD
F3    Microoperation     Symbol
000   None               NOP
001   AC ← AC ⊕ DR       XOR
010   AC ← AC’           COM
011   AC ← shl AC        SHL
100   AC ← shr AC        SHR
101   PC ← PC + 1        INCPC
110   PC ← AR            ARTPC
111   Reserved
BR    Symbol   Function
00    JMP      CAR ← AD if condition = 1
               CAR ← CAR + 1 if condition = 0
01    CALL     CAR ← AD, SBR ← CAR + 1 if condition = 1
               CAR ← CAR + 1 if condition = 0
10    RET      CAR ← SBR (Return from subroutine)
11    MAP      CAR(2-5) ← DR(11-14), CAR(0,1,6) ← 0
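The four BR functions listed above can be expressed as one next-address function. The sketch below is illustrative only: the function name `sequence`, the argument order, and the use of plain integers for CAR, SBR, AD and DR are assumptions made for the example.

```python
JMP, CALL, RET, MAP = 0b00, 0b01, 0b10, 0b11

def sequence(br, condition, car, sbr, ad, dr):
    """Return (new_car, new_sbr) for one microinstruction."""
    if br == JMP:
        # CAR <- AD if condition = 1, else CAR <- CAR + 1
        return (ad if condition else car + 1), sbr
    if br == CALL:
        # CAR <- AD, SBR <- CAR + 1 if condition = 1, else CAR <- CAR + 1
        if condition:
            return ad, car + 1
        return car + 1, sbr
    if br == RET:
        # CAR <- SBR
        return sbr, sbr
    if br == MAP:
        # CAR(2-5) <- DR(11-14), CAR(0,1,6) <- 0
        opcode = (dr >> 11) & 0b1111
        return opcode << 2, sbr
    raise ValueError("BR is a 2-bit field")


# CALL from address 0 to AD = 1000011 with a true condition.
car, sbr = sequence(CALL, 1, 0, 0, 0b1000011, 0)
# RET then resumes at the saved address.
car_after_return, _ = sequence(RET, 0, car, sbr, 0, 0)
```

A single SBR supports only one level of subroutine nesting; a LIFO stack of return addresses would be needed for nested calls.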
SYMBOLIC MICROINSTRUCTIONS
• Symbols are used in microinstructions as in assembly language
• A symbolic microprogram can be translated into its binary equivalent
by a microprogram assembler.
Sample Format
five fields: label; micro-ops; CD; BR; AD
SYMBOLIC MICROPROGRAM
• Control Storage: 128 20-bit words
• The first 64 words: Routines for the 16 machine instructions
• The last 64 words: Used for other purposes (e.g., fetch routine and other subroutines)
• Mapping: OP-code XXXX maps to 0XXXX00, so the first addresses of the 16 routines are
0 (0 0000 00), 4 (0 0001 00), 8, 12, 16, 20, ..., 60
BINARY
MICROPROGRAM
Address Binary Microinstruction
Micro Routine Decimal Binary F1 F2 F3 CD BR AD
ADD 0 0000000 000 000 000 01 01 1000011
1 0000001 000 100 000 00 00 0000010
2 0000010 001 000 000 00 00 1000000
3 0000011 000 000 000 00 00 1000000
microoperation fields
F1 F2 F3
AND
ADD AC
Arithmetic
logic and DR
DRTAC shift unit
PCTAR
DRTAR
From From
PC DR(0-10) Load
AC
Select 0 1
Multiplexers
Load Clock
AR
Clock CAR
Control Storage
MUX-1 selects an address from one of four sources and routes it into the CAR
Input Logic
I 1 I0 T Meaning Source of Address S1S0 L
S1 = I1
S0 = I1I0 + I1’T
L = I1’I0T
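The three input-logic equations can be checked by enumerating all combinations of I1, I0 and T. The sketch below evaluates them directly. The association of S1S0 values with address sources (00 incrementer, 01 branch address AD, 10 SBR, 11 external MAP) is an assumption, since the rows of the truth table did not survive in these slides; the names `input_logic` and `SOURCES` are hypothetical.

```python
# Assumed meaning of the MUX-1 select lines S1 S0 (not confirmed by the slides).
SOURCES = {(0, 0): "CAR + 1", (0, 1): "AD", (1, 0): "SBR", (1, 1): "MAP"}

def input_logic(i1, i0, t):
    """Evaluate S1 = I1, S0 = I1 I0 + I1' T, L = I1' I0 T."""
    s1 = i1
    s0 = (i1 & i0) | ((1 - i1) & t)
    load_sbr = (1 - i1) & i0 & t
    return s1, s0, load_sbr


# Build the full truth table: (I1, I0, T) -> (source, L)
truth_table = {}
for i1 in (0, 1):
    for i0 in (0, 1):
        for t in (0, 1):
            s1, s0, load_sbr = input_logic(i1, i0, t)
            truth_table[(i1, i0, t)] = (SOURCES[(s1, s0)], load_sbr)
```

Under these assumptions, L is 1 only for I1 I0 = 01 with T = 1, which is the case in which the SBR must capture the return address.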
MICROPROGRAM SEQUENCER
External
(MAP)
L
I0 3 2 1 0
Input Load
I1 logic S1 MUX1 SBR
T S0
1 Incrementer
I MUX2 Test
S
Z Select
Clock CAR
Control memory
Microops CD BR AD
... ...
MICROINSTRUCTION FORMAT
Information in a Microinstruction
- Control Information
- Sequencing Information
- Constant
Information which is useful when feeding into the system
Field Encoding
Field A Field B
Field A Field B
2 bits 6 bits
2 bits 3 bits
2x4 6 x 64
2x4 3x8 Decoder Decoder
Decoder Decoder
Decoder and
1 of 4 1 of 8 selection logic
Two-level microprogram
First level
-Vertical format Microprogram
Second level
-Horizontal format Nanoprogram
- Interprets the microinstruction fields, thus converts a vertical
microinstruction format into a horizontal
nanoinstruction format.
11 bits
Control memory
2048 x 8
Microinstruction (8 bits)
Nanomemory address
Nanomemory
256 x 200
Overview
Fundamental Concepts
Executing an Instruction
Processor Organization
Internal processor
bus
Control signals
PC
Instruction
Address
decoder and
lines
MDR HAS MAR control logic
TWO INPUTS
Memory
AND TWO bus
OUTPUTS
MDR
Data
lines IR
Datapath
Y
Constant 4 R0
Select MUX
Add
A B
ALU Sub R n - 1
control ALU
lines
Carry-in
XOR TEMP
Z
Textbook Page 413
Figure 7.1. Single-bus organization of the datapath inside a processor.
Executing an Instruction
Register Transfers
Internal processor
bus
Riin
Ri
Riout
Yin
Constant 4
Select MUX
A B
ALU
Zin
Zout
Figure 7.2. Input and output gating for the registers in Figure 7.1.
Register Transfers
D Q
1
Q
Riout
Ri in
Clock
Figure 7.3. Input and output gating for one register bit.
MDR
Figure 7.4. Connection and control signals for register MDR.
Step
Timing
1 2 3
Clock
MAR ← [R1]
MARin
Assume MAR
is always available
on the address lines
of the memory bus. Address
MR
MDRinE
Data
Wait for the MFC response from the memory
MFC
R2 ← [MDR]
Execution of a Complete
Instruction
• Add (R3), R1
• Fetch the instruction
• Fetch the first operand (the contents of the memory
location pointed to by R3)
• Perform the addition
• Load the result into R1
Architecture
Execution of a Complete
Instruction Internal processor
bus
Add (R3), R1 Control signals
PC
Instruction
Step Action Address
decoder and
lines
MAR control logic
Constant 4 R0
5 R1out, Yin, WMFC
6 MDRout, SelectY, Add, Zin
Step Action
Multiple-Bus Organization
Bus A Bus B Bus C
Incrementer
PC
Register
file
Constant 4
MUX
A
ALU R
Instruction
decoder
IR
MDR
MAR
Figure 7.8. Three-bus organization of the datapath.
Multiple-Bus Organization
Step Action
3 MDRoutB, R=B, IRin
4 R4outA, R5outB, SelectA, Add, R6in, End
Quiz
Internal processor
bus
Control signals
of the instruction
lines
MAR control logic
Memory
Add R1, R2
bus
MDR
Data
IR
including the instruction
lines
Add
A B
ALU Sub R n - 1
control ALU
lines
Carry-in
XOR TEMP
External
inputs
Decoder/
IR
encoder
Condition
codes
Control signals
Step decoder
T 1 T2 Tn
INS 1
External
INS 2 inputs
Instruction
IR Encoder
decoder
Condition
codes
INSm
Run End
Control signals
Generating Zin
• Zin = T1 + T6 • ADD + T4 • BR + …
Branch Add
T4 T6
T1
Figure 7.12. Generation of the Zin control signal for the processor in Figure 7.1.
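The Zin expression above is a sum of products over time steps and decoded instructions, which can be sketched as a Boolean function. Only the three terms shown on the slide are modeled; the terms hidden behind the ellipsis are not reconstructed. The function name `z_in` and the string instruction names are assumptions made for the example.

```python
def z_in(step, instruction):
    """Zin = T1 + T6 * ADD + T4 * BR (only the terms shown on the slide)."""
    t1 = step == 1
    t4 = step == 4
    t6 = step == 6
    add = instruction == "ADD"
    br = instruction == "BR"
    return t1 or (t6 and add) or (t4 and br)


# Steps of an Add instruction in which Zin is asserted by these terms.
add_steps = [t for t in range(1, 8) if z_in(t, "ADD")]
```

In hardware each product term is an AND gate fed by the step decoder and the instruction decoder, and the terms are combined by one OR gate.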
Generating End
T7 T5 T4 T5
End
A Complete Processor
Instruction Data
cache cache
Bus interface
Processor
System bus
Main Input/
memory Output
Overview
Microinstruction  PCin  PCout  MARin  Read  MDRout  IRin  Yin  Select  Add  Zin  Zout  R1out  R1in  R3out  WMFC  End
1                 0     1      1      1     0       0     0    1       1    1    0     0      0     0      0     0
2                 1     0      0      0     0       0     1    0       0    0    1     0      0     0      1     0
3                 0     0      0      0     1       1     0    0       0    0    0     0      0     0      0     0
4                 0     0      1      1     0       0     0    0       0    0    0     0      0     1      0     0
5                 0     0      0      0     0       0     1    0       0    0    0     1      0     0      1     0
6                 0     0      0      0     1       0     0    0       1    1    0     0      0     0      0     0
7                 0     0      0      0     0       0     0    0       0    0    1     0      1     0      0     1
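Each row of the table above is a control word with one bit per control signal. A short sketch shows how such a word can be decoded back into the signals it asserts. The column order used here (PCin, PCout, MARin, Read, MDRout, IRin, Yin, Select, Add, Zin, Zout, R1out, R1in, R3out, WMFC, End) is an assumption recovered from the scrambled column headings and checked against the control sequence for Add (R3),R1; the names `COLUMNS` and `decode` are hypothetical.

```python
# Assumed left-to-right column order of the control word.
COLUMNS = ["PCin", "PCout", "MARin", "Read", "MDRout", "IRin", "Yin", "Select",
           "Add", "Zin", "Zout", "R1out", "R1in", "R3out", "WMFC", "End"]

def decode(control_word):
    """Return the names of the signals set to 1 in a 16-bit control word string."""
    bits = control_word.replace(" ", "")
    if len(bits) != len(COLUMNS):
        raise ValueError("control word must have 16 bits")
    return [name for name, bit in zip(COLUMNS, bits) if bit == "1"]


step1 = decode("0 1 1 1 0 0 0 1 1 1 0 0 0 0 0 0")
step7 = decode("0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1")
```

Under the assumed column order, microinstruction 1 asserts PCout, MARin, Read, Select, Add and Zin, and microinstruction 7 asserts Zout, R1in and End.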
Overview
Step Action
Figure 7.6. Control sequence for execution of the instruction Add (R3),R1.
Overview
• Control store
Starting
IR address One function
generator cannot be carried
out by this simple
organization.
Clock P C
Control
store CW
Overview
Overview
External
inputs
Starting and
branch address Condition
IR codes
generator
Clock PC
Control
store CW
Microinstructions
F1                 F2                 F3                 F4            F5
0000: No transfer  000: No transfer   000: No transfer   0000: Add     00: No action
0001: PCout        001: PCin          001: MARin         0001: Sub     01: Read
0010: MDRout       010: IRin          010: MDRin                       10: Write
0011: Zout         011: Zin           011: TEMPin
0100: R0out        100: R0in          100: Yin           1111: XOR
0101: R1out        101: R1in
0110: R2out        110: R2in                             (16 ALU functions)
0111: R3out        111: R3in
1010: TEMPout
1011: Offsetout
Further Improvement
Microprogram Sequencing
- Bit-ORing
- Wide-Branch Addressing
- WMFC
11 10 8 7 4 3 0
Address Microinstruction
(octal)
External Condition
Inputs codes
Decoding circuits
AR
Control store
Next address I R
Microinstruction decoder
Control signals
F0 F1 F2 F3
F4 F5 F6 F7
F8 F9 F10
Octal
address F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10
0 0 0 0 0 0 0 0 0 0 1 0 0 1 01 1 0 0 1 0 0 0 0 01 1 0 0 0 0
0 0 1 0 0 0 0 0 0 1 0 0 1 1 00 1 1 0 0 0 0 0 0 00 0 1 0 0 0
0 0 2 0 0 0 0 0 0 1 1 0 1 0 01 0 0 0 0 0 0 0 0 00 0 0 0 0 0
0 0 3 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 00 0 0 1 1 0
121 0 1 0 1 0 0 1 0 1 0 0 01 1 0 0 1 0 0 0 0 01 1 0 0 0 0
122 0 1 1 1 1 0 0 0 0 1 1 10 0 0 0 0 0 0 0 0 00 0 1 0 0 1
1 7 0 0 1 1 1 1 0 0 1 0 1 0 00 0 0 0 1 0 0 0 0 01 0 1 0 0 0
1 7 1 0 1 1 1 1 0 1 0 0 1 0 00 0 1 0 0 0 0 0 0 00 0 0 0 0 0
1 7 2 0 1 1 1 1 0 1 1 1 0 1 01 1 0 0 0 0 0 0 0 00 0 0 0 0 0
1 7 3 0 0 0 0 0 0 0 0 0 1 1 10 1 0 0 0 0 0 0 0 00 0 0 0 0 0
Decoder
Decoder
IR Rsrc Rdst
InstDecout
External
inputs ORmode
Decoding
circuits
Condition ORindsrc
codes
AR
Control store
Rdstout
Rdstin
Microinstruction
decoder
Rsrcout
Rsrcin
bit-ORing
• Parallel Processing
• Pipelining
• Arithmetic Pipeline
• Instruction Pipeline
• RISC Pipeline
• Vector Processing
• Array Processors
PARALLEL PROCESSING
- Inter-Instruction level
- Intra-Instruction level
PARALLEL COMPUTERS
Architectural Classification
– Flynn's classification
» Based on the multiplicity of Instruction Streams and
Data Streams
» Instruction Stream
• Sequence of Instructions read from memory
» Data Stream
• Operations performed on the data in the processor
VLIW
MISD Nonexistence
Systolic arrays
Dataflow
Associative processors
Message-passing multicomputers
Hypercube
Mesh
Reconfigurable
Instruction stream
Characteristics
- Standard von Neumann machine
- Instructions and data are stored in memory
- One operation at a time
Limitations
Von Neumann bottleneck
• Multiprogramming
• Spooling
• Multifunction processor
• Pipelining
• Exploiting instruction-level parallelism
- Superscalar
- Superpipelining
- VLIW (Very Long Instruction Word)
M CU P
M CU P
Memory
• •
• •
• •
M CU Data stream
P
Instruction stream
Characteristics
- There is no computer at present that can be
classified as MISD
Control Unit
Instruction stream
Data stream
Alignment network
Characteristics
- Only one copy of the program exists
- A single controller executes one instruction at a time
Array Processors
- The control unit broadcasts instructions to all PEs,
and all active PEs execute the same instructions
- ILLIAC IV, GF-11, Connection Machine, DAP, MPP
Systolic Arrays
- Regular arrangement of a large number of
very simple processors constructed on
VLSI circuits
- CMU Warp, Purdue CHiP
Associative Processors
- Content addressing
- Data transformation operations over many sets
of arguments with a single instruction
- STARAN, PEPE
Interconnection Network
Shared Memory
Characteristics
- Multiple processing units
- Message-passing multicomputers
Buses,
Interconnection Network(IN) Multistage IN,
Crossbar Switch
P P ••• P
Characteristics
All processors have equally direct access to
one large memory address space
Example systems
Bus and cache-based systems
- Sequent Balance, Encore Multimax
Multistage IN-based systems
- Ultracomputer, Butterfly, RP3, HEP
Crossbar switch-based systems
- C.mmp, Alliant FX/8
Limitations
Memory access latency
Hot spot problem
MESSAGE-PASSING MULTICOMPUTER
Message-Passing Network Point-to-point connections
P P ••• P
M M ••• M
Characteristics
- Interconnected computers
- Each processor has its own memory and
communicates via message passing
Example systems
- Tree structure: Teradata, DADO
- Mesh-connected: Rediflow, Series 2010, J-Machine
- Hypercube: Cosmic Cube, iPSC, NCUBE, FPS T Series, Mark III
Limitations
- Communication overhead
- Hard to program
PIPELINING
A technique of decomposing a sequential process
into suboperations, with each subprocess being
executed in a special dedicated segment that
operates concurrently with all other segments.
Ai * Bi + Ci for i = 1, 2, 3, ... , 7
Ai Bi Memory
Ci
Segment 1
R1 R2
Multiplier
Segment 2
R3 R4
Adder
Segment 3
R5
Clock Pulse   Segment 1     Segment 2          Segment 3
Number        R1    R2      R3         R4      R5
1             A1    B1      --         --      --
2             A2    B2      A1 * B1    C1      --
3             A3    B3      A2 * B2    C2      A1 * B1 + C1
4             A4    B4      A3 * B3    C3      A2 * B2 + C2
5             A5    B5      A4 * B4    C4      A3 * B3 + C3
6             A6    B6      A5 * B5    C5      A4 * B4 + C4
7             A7    B7      A6 * B6    C6      A5 * B5 + C5
8             --    --      A7 * B7    C7      A6 * B6 + C6
9             --    --      --         --      A7 * B7 + C7
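The register contents in the table above can be reproduced with a small clock-by-clock simulation of the three segments. This is a sketch under stated assumptions: the function name `run_pipeline` and the use of `None` for an empty register are choices made for the example, not part of the slides.

```python
def run_pipeline(A, B, C):
    """Simulate the 3-segment pipeline computing A[i] * B[i] + C[i]."""
    n = len(A)
    r1 = r2 = r3 = r4 = r5 = None          # None marks an empty register
    trace, results = [], []
    for clock in range(1, n + 3):          # k + n - 1 pulses with k = 3
        # All registers load on the same clock edge, so the next values
        # are computed from the current ones before any is overwritten.
        next_r5 = r3 + r4 if r3 is not None else None      # segment 3: adder
        next_r3 = r1 * r2 if r1 is not None else None      # segment 2: multiplier
        next_r4 = C[clock - 2] if r1 is not None else None
        next_r1 = A[clock - 1] if clock <= n else None     # segment 1: load
        next_r2 = B[clock - 1] if clock <= n else None
        r1, r2, r3, r4, r5 = next_r1, next_r2, next_r3, next_r4, next_r5
        trace.append((clock, r1, r2, r3, r4, r5))
        if r5 is not None:
            results.append(r5)
    return trace, results


trace, results = run_pipeline([1, 2, 3, 4, 5, 6, 7], [2] * 7, [1] * 7)
```

For seven operand sets the simulation runs for nine clock pulses, matching the nine rows of the table, and the first result appears in R5 at clock pulse 3.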
GENERAL PIPELINE
General Structure of a 4-Segment Pipeline
Clock
Input S1 R1 S2 R2 S3 R3 S4 R4
Space-Time Diagram
Clock cycles   1   2   3   4   5   6   7   8   9
Segment 1      T1  T2  T3  T4  T5  T6
        2          T1  T2  T3  T4  T5  T6
        3              T1  T2  T3  T4  T5  T6
        4                  T1  T2  T3  T4  T5  T6
PIPELINE SPEEDUP
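n: Number of tasks to be performed
k: Number of pipeline segments
tp: Clock cycle time of the pipeline
tn: Time to complete one task without pipelining
Speedup
Sk: Speedup
Sk = (n * tn) / ((k + n - 1) * tp)
lim (n → ∞) Sk = tn / tp ( = k, if tn = k * tp )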
Pipelined System
(k + n - 1)*tp = (4 + 99) * 20 = 2060nS
Non-Pipelined System
n*k*tp = 100 * 80 = 8000nS
Speedup
Sk = 8000 / 2060 = 3.88
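The worked example above can be reproduced with a short calculation. The sketch assumes, as the example does, that the non-pipelined task time is tn = k * tp unless stated otherwise; the function name `pipeline_speedup` is hypothetical.

```python
def pipeline_speedup(n, k, tp, tn=None):
    """Speedup Sk = (n * tn) / ((k + n - 1) * tp)."""
    if tn is None:
        tn = k * tp                      # assumption used in the example above
    return (n * tn) / ((k + n - 1) * tp)


# 100 tasks, 4 segments, 20 ns per segment: 8000 ns versus 2060 ns.
example = pipeline_speedup(n=100, k=4, tp=20)
# For a very large number of tasks the speedup approaches k.
large_n = pipeline_speedup(n=10**9, k=4, tp=20)
```

The example value rounds to 3.88, and the large-n value comes within a tiny fraction of the theoretical limit of 4.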
ARITHMETIC PIPELINE
Floating-point adder Exponents
a b
Mantissas
A B
X = A x 2^a
Y = B x 2^b
R R
R R
R R
Stages: Other
Exponent fraction Fraction
S1 subtractor selector
Fraction with min(p,q)
r = max(p,q)
Right shifter
t = |p - q|
S2 Fraction
adder
r c
Leading zero
S3 counter
c
Left shifter
r
d
Exponent
S4 adder
s d
C = A + B = c x 2^r = d x 2^s
(r = max(p,q), 0.5 ≤ d < 1)
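The four stages of the floating-point adder can be followed step by step in software. The sketch below mirrors them for operands X = A x 2^a and Y = B x 2^b, using Python floats for the fractions and `math.frexp` in place of the leading-zero counter and left shifter. It is an illustration of the data flow only, not a model of the hardware; the function name `fp_add` is hypothetical.

```python
import math

def fp_add(A, a, B, b):
    """Add A * 2**a and B * 2**b; return (d, s) with 0.5 <= |d| < 1."""
    # S1: compare exponents; select the fraction with the smaller exponent
    #     and shift it right by the exponent difference.
    t = abs(a - b)
    r = max(a, b)
    if a >= b:
        larger, smaller = A, B
    else:
        larger, smaller = B, A
    aligned = smaller / (2 ** t)
    # S2: add the aligned fractions.
    c = larger + aligned
    if c == 0:
        return 0.0, 0
    # S3: normalize the sum (stands in for leading-zero count and left shift).
    d, shift = math.frexp(c)             # c = d * 2**shift, 0.5 <= |d| < 1
    # S4: adjust the exponent by the normalization shift.
    s = r + shift
    return d, s


# 0.5 * 2**3 + 0.75 * 2**1 = 4.0 + 1.5 = 5.5
d, s = fp_add(0.5, 3, 0.75, 1)
```

Because `math.frexp` also handles a fraction sum of 1 or more, the same code covers the case in which the result must be shifted right rather than left.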
INSTRUCTION CYCLE
Six Phases* in an Instruction Cycle
[1] Fetch an instruction from memory
[2] Decode the instruction
[3] Calculate the effective address of the operand
[4] Fetch the operands from memory
[5] Execute the operation
[6] Store the result in the proper place
INSTRUCTION PIPELINE
i FI DA FO EX
i+1 FI DA FO EX
i+2 FI DA FO EX
Pipelined
i FI DA FO EX
i+1 FI DA FO EX
i+2 FI DA FO EX
Decode instruction
Segment2: and calculate
effective address
yes Branch?
no
Fetch operand
Segment3: from memory
Interrupt yes
Interrupt?
handling
no
Update PC
Empty pipe
Step:            1   2   3   4   5   6   7   8   9   10  11  12  13
Instruction  1   FI  DA  FO  EX
             2       FI  DA  FO  EX
(Branch)     3           FI  DA  FO  EX
             4               FI          FI  DA  FO  EX
             5                               FI  DA  FO  EX
             6                                   FI  DA  FO  EX
             7                                       FI  DA  FO  EX
R1 <- R1 + 1
INC DA bubble R1 +1
Control hazards
Branches and other instructions that change the PC
delay the fetch of the next instruction
JMP ID PC + PC Branch address dependency
bubble IF ID OF OE OS
STRUCTURAL HAZARDS
Structural Hazards
Occur when some resource has not been
duplicated enough to allow all combinations
of instructions in the pipeline to execute
i+1 FI DA FO EX
DATA HAZARDS
Data Hazards
Interlock
- hardware detects the data dependencies and delays the scheduling
of the dependent instruction by stalling enough clock cycles
Forwarding (bypassing, short-circuiting)
- Accomplished by a data path that routes a value from a source
(usually an ALU) to a user, bypassing a designated register. This
allows the produced value to be used at an earlier stage in the
pipeline than would otherwise be possible
Software Technique
Instruction Scheduling(compiler) for delayed load
FORWARDING HARDWARE
Example: Register
file
ALU Operations
E: Write the result to the
destination register R4
ADD I A E
INSTRUCTION SCHEDULING
a = b + c;
d = e - f;
Delayed Load
A load requiring that the following instruction not use its result
CONTROL HAZARDS
Branch Instructions
Next
Instruction FI DA FO EX
CONTROL HAZARDS
Prefetch Target Instruction
– Fetch instructions in both streams, branch not taken and branch taken
– Both are saved until the branch is executed. Then, select the right
instruction stream and discard the wrong stream
Branch Target Buffer(BTB; Associative Memory)
– Entry: Addr of previously executed branches; Target instruction
and the next few instructions
– When fetching an instruction, search BTB.
– If found, fetch the instruction stream in BTB;
– If not, a new stream is fetched and the BTB is updated
Loop Buffer(High Speed Register file)
– Storage of an entire loop, which allows the loop to execute without accessing memory
Branch Prediction
– Guessing the branch condition, and fetch an instruction stream based on
the guess. Correct guess eliminates the branch penalty
Delayed Branch
– Compiler detects the branch and rearranges the instruction sequence
by inserting useful instructions that keep the pipeline busy
in the presence of a branch instruction
RISC PIPELINE
RISC
- Machine with a very fast clock cycle that
executes at the rate of one instruction per cycle
<- Simple Instruction Set
Fixed Length Instruction Format
Register-to-Register Operations
DELAYED LOAD
LOAD: R1 ← M[address 1]
LOAD: R2 ← M[address 2]
ADD: R3 ← R1 + R2
STORE: M[address 3] ← R3
Three-segment pipeline timing
Pipeline timing with data conflict
clock cycle    1  2  3  4  5  6
Load R1        I  A  E
Load R2           I  A  E
Add R1+R2            I  A  E
Store R3                I  A  E
Pipeline timing with delayed load
clock cycle    1  2  3  4  5  6  7
Load R1        I  A  E
Load R2           I  A  E
NOP                  I  A  E
Add R1+R2               I  A  E
Store R3                   I  A  E
The data dependency is taken care of by the compiler rather than the hardware
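The compiler's part in a delayed load can be sketched as a pass that inserts a NOP whenever the instruction immediately after a load reads the register being loaded. The instruction representation as (operation, destination, sources) tuples and the function name `insert_delay_slots` are assumptions made for this example; a real compiler would first try to move a useful independent instruction into the slot.

```python
def insert_delay_slots(program):
    """Insert a NOP after each LOAD whose result is used by the next instruction."""
    scheduled = []
    for index, (op, dest, sources) in enumerate(program):
        scheduled.append((op, dest, sources))
        if op == "LOAD" and index + 1 < len(program):
            next_sources = program[index + 1][2]
            if dest in next_sources:
                scheduled.append(("NOP", None, ()))
    return scheduled


program = [
    ("LOAD", "R1", ("M[address 1]",)),
    ("LOAD", "R2", ("M[address 2]",)),
    ("ADD", "R3", ("R1", "R2")),
    ("STORE", "M[address 3]", ("R3",)),
]
scheduled_ops = [op for op, _, _ in insert_delay_slots(program)]
```

For the four-instruction program above, the pass places one NOP between the second load and the add, which gives the five-instruction sequence of the delayed-load timing.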
DELAYED BRANCH
Compiler analyzes the instructions before and after
the branch and rearranges the program sequence by
inserting useful instructions in the delay steps
VECTOR PROCESSING
Vector Processing Applications
• Problems that can be efficiently formulated in terms of vectors
– Long-range weather forecasting
– Petroleum explorations
– Seismic data analysis
– Medical diagnosis
– Aerodynamics and space flight simulations
– Artificial intelligence and expert systems
– Mapping the human genome
– Image processing
VECTOR PROGRAMMING
DO 20 I = 1, 100
20 C(I) = B(I) + A(I)
Conventional computer
Initialize I = 0
20 Read A(I)
Read B(I)
Store C(I) = A(I) + B(I)
Increment I = I + 1
If I ≤ 100 goto 20
Vector computer
VECTOR INSTRUCTIONS
f1: V → V
f2: V → S
f3: V x V → V        V: Vector operand
f4: V x S → V        S: Scalar operand
Source
A
AR AR AR AR
DR DR DR DR
Data bus
Address Interleaving
2. Static Memories
CMOS Cell
4. Synchronous DRAMs
Synchronous DRAM
Organization of a
2M × 32 memory
module using
512K × 8 static
memory chips
Read-only Memories
A ROM cell
ROM
PROM
EPROM
EEPROM
FLASH MEMORY
Cache Memories
1. Mapping Functions:
Direct Mapping
Associative Mapping
Set-Associative Mapping
Direct-mapped cache
Associative-mapped cache
2.Replacement Algorithms
3.Examples of Mapping Techniques
Memory Hierarchy
Performance Considerations
1. Interleaving
2. Hit Rate and Miss Penalty
3. Caches on the Processor Chip
4. Other Enhancements:
Write Buffer
Prefetching
Lockup-Free cache
Virtual Memory
Address Translation
Secondary Storage
Magnetic Hard Disks:
Optical Disks
CD Technology: Optical disk
Optical disk
CD-ROM
CD-Recordables
CD-ReWritables
DVD Technology
DVD-RAM
A computer system
Bus Arbitration
Bus Master: The device that is allowed to initiate data transfers on the bus
at any given time is called the bus master.
1. Centralized
2. Decentralized
BUSES
Synchronous Bus
Asynchronous Bus
Interface Circuits
Parallel Port
A PARALLEL PORT INTERFACE FOR THE BUS, STATE DIAGRAM FOR THE TIMING LOGIC
Serial Port
Small Computer System Interface