
CSC 307 - Computer System Architecture

Elizade University, Ilara-Mokin

Lecturer: Prof. (Mrs.) Bolanle Ojokoh
Outline
• Introduction
• Data Representation and Number Systems
• Logic Gates
• Registers and Transfer Notations
• Instruction Formats and Memory Addressing Modes
• Memory Hierarchy and Management
• Pipelining
Lecture One
• Introduction
• Data Representation and Number Systems
Computer Architecture’s Changing Definition

• 1950s Computer Architecture


– Computer Arithmetic, with UNIVAC I, II
• 1960s
– Operating system support, especially memory management
• 1970s to mid 1980s Computer Architecture
– Instruction Set Design, especially ISA appropriate for compilers
– Vector processing and shared memory multiprocessors
• 1990s Computer Architecture
– Design of CPU, memory system, I/O system, Multi-processors, Networks
– Design for VLSI
• 2000s Computer Architecture:
– Special purpose architectures, Functionally reconfigurable, Special
considerations for low power/mobile processing, highly parallel
structures
Levels of Representation

High Level Language Program (e.g., C):

    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;

      ↓ Compiler

Assembly Language Program (e.g., MIPS):

    lw   $15, 0($2)
    lw   $16, 4($2)
    sw   $16, 0($2)
    sw   $15, 4($2)

      ↓ Assembler

Machine Language Program:

    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111

      ↓ Machine Interpretation

Control Signal Specification:

    ALUOP[0:3] <= InstReg[9:11] & MASK

Levels of Abstraction

  Application programming:  Graphical Interface, Application, Libraries
  Operating System
  System programming:       Programming Language, Assembler Language
  Instruction Set Architecture - “Machine Language”
  Processor and IO System
  Firmware:                 Microprogramming
  Computer Design:          Datapath and Control
  Digital Design:           Logic Design
  Circuit Design:           Circuits and devices
  Fabrication:              Semiconductors, Materials
The Instruction Set: A Critical Interface

Computer Architecture = Instruction Set Architecture + Machine Organization

The instruction set is the interface between software and hardware.

• Instruction Set Design
  – Machine Language
  – Compiler view
  – “Computer Architecture”, “Instruction Set Architecture”
  – the “Building Architect”

• Computer Organization and Design (the focus of this course)
  – Machine implementation
  – Logic designer’s view
  – “Processor Architecture”, “Computer Organization”
  – the “Construction Engineer”
Instruction Set Architecture

What the ISA specifies (as documented in an Architecture Reference Manual,
Principles of Operation, or Programming Guide):

• Data Types: encoding and representation
• Memory Model
• Program-visible processor state: general registers, program counter,
  processor status, …
• Instruction Set: instructions and formats, addressing modes, data
  structures
• System Model: states, privilege, interrupts, IO
• External Interfaces: IO, management

“. . . the attributes of a [computing] system as seen by the programmer,
i.e. the conceptual structure and functional behavior, as distinct from
the organization of the data flows and controls, the logic design, and
the physical implementation.”
                              – Amdahl, Blaauw, and Brooks, 1964
Computer Organization

• Capabilities & performance characteristics of principal functional
  units (e.g., registers, ALU, shifters, memory management, etc.)
• Ways in which these components are interconnected
  – Datapath: nature of information flows and connection of functional
    units
  – Control: logic and means by which such information flow is controlled
• Choreography of functional units to realize the ISA
• Register Transfer Level description / microcode

The “hardware” designer’s view includes logic and firmware.

This Course Focuses on General Purpose Processors

• A general-purpose computer system
  – Uses a programmable processor
  – Can run “any” application
  – Potentially optimized for some class of applications
  – Common names: CPU, DSP, NPU, microcontroller, microprocessor

• Von Neumann computer
  – Unified main memory, for both programs & data
  – Buses & controllers to connect processor, memory, IO devices
  – (Figure: Input and Output attach to Memory, which connects to a
    Processor consisting of Control and Datapath; example: MIT
    Whirlwind, 1951)

Computers are pervasive – servers, standalone PCs, network processors,
embedded processors, …
Von-Neumann Machine

Today, “computers” are connected processors:

(Figure: a processor with its caches sits on buses; adapters and
controllers connect the memory, disks, and I/O devices such as displays,
keyboards, and networks)

• All have interfaces & organizations

Why Study Computer Architecture?
• Enable better systems: make computers faster, cheaper, smaller,
more reliable, …
– By exploiting advances and changes in underlying technology/circuits

• Enable new applications


– Life-like 3D visualization 20 years ago?
– Virtual reality?
– Personal genomics?

• Enable better solutions to problems

– Software innovation is built on trends and changes in computer architecture
– > 50% performance improvement per year has enabled software innovation

• Understand why computers work the way they do


Data representation
• Text and numbers stored in binary (0,1)

• “Bit”: Binary digit – a single 0 or 1


• “Byte”: string of eight bits, the fundamental unit of computer memory
Bits, Bytes, Nibbles…

• Bits: single binary digits
• Nibbles: groups of four bits
• Bytes: groups of eight bits (and how they are arranged in memory)
Number Systems
• Normal humans: use decimal (base ten)

• Computer scientists: use alternatives


– Binary (base two)
– Hexadecimal (base sixteen)

Number Systems
• Decimal (base ten; digits 0..9)

  857₁₀ = 8×10² + 5×10¹ + 7×10⁰

• Binary (base two; digits 0..1)

  11001₂ = 1×2⁴ + 1×2³ + 0×2² + 0×2¹ + 1×2⁰ = 25₁₀
Hexadecimal Numbers
Hex Digit Decimal Equivalent Binary Equivalent
0 0 0000
1 1 0001
2 2 0010
3 3 0011
4 4 0100
5 5 0101
6 6 0110
7 7 0111
8 8 1000
9 9 1001
A 10 1010
B 11 1011
C 12 1100
D 13 1101
E 14 1110
F 15 1111
Number Systems
• Hexadecimal (base sixteen;
  digits 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F)

  4D7A₁₆ = 4×16³ + 13×16² + 7×16¹ + 10×16⁰

Binary to Hexadecimal Conversion

• Put the binary digits into groups of four, starting from the LSB (right)
Binary to Hexadecimal Conversion Example

• Convert the binary number to hexadecimal:

  0000 0011 0001 1010 1100 0000 0100 0000
     0    3    1    A    C    0    4    0
Decimal to Binary conversion

• Repeatedly divide by 2 until the quotient is 0


• The remainders are the bits
• First remainder is least significant bit (lsb)
• Last remainder is most significant bit (msb)

• Example: convert 57₁₀ to binary

  57/2 = 28 r 1,  28/2 = 14 r 0,  14/2 = 7 r 0,
  7/2 = 3 r 1,  3/2 = 1 r 1,  1/2 = 0 r 1

  Reading the remainders from last (msb) to first (lsb): 57₁₀ = 111001₂
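This repeated-division recipe is directly executable. Below is a minimal
Python sketch (the function name dec_to_bin is ours, for illustration):

    def dec_to_bin(n):
        """Convert a non-negative integer to a binary string by
        repeatedly dividing by 2; the first remainder is the lsb."""
        if n == 0:
            return "0"
        bits = []
        while n > 0:
            n, r = divmod(n, 2)
            bits.append(str(r))          # remainders arrive lsb-first
        return "".join(reversed(bits))   # reverse: last remainder is msb

    print(dec_to_bin(57))  # prints 111001, matching the worked example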
Hexadecimal ↔ binary conversion
• Hexadecimal digit = shorthand for 4 bits

0000 0001 0010 0011 0100 0101 0110 0111


0 1 2 3 4 5 6 7

1000 1001 1010 1011 1100 1101 1110 1111


8 9 A B C D E F

• Example: convert 6C4D₁₆ to binary

  0110 1100 0100 1101
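Since each hex digit is shorthand for exactly four bits, both conversions
reduce to nibble grouping. A small Python sketch (helper names are ours):

    def bin_to_hex(bits):
        """Pad to a multiple of 4, group into nibbles, and map each
        nibble to one hex digit."""
        bits = bits.zfill((len(bits) + 3) // 4 * 4)
        nibbles = [bits[i:i + 4] for i in range(0, len(bits), 4)]
        return "".join("{:X}".format(int(n, 2)) for n in nibbles)

    def hex_to_bin(hexstr):
        """Each hex digit expands to exactly 4 bits."""
        return " ".join("{:04b}".format(int(d, 16)) for d in hexstr)

    print(bin_to_hex("00000011000110101100000001000000"))  # 031AC040
    print(hex_to_bin("6C4D"))  # 0110 1100 0100 1101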
Addition
• Decimal

• Binary
Addition
• Hexadecimal (the 1 marks a carry):

       1
    3A3E
  + 516B
    8BA9

• Binary
Binary Addition Examples

• Add the following 4-bit binary numbers (operands shown in the figure):
  result 1110

• Add the following 4-bit binary numbers (operands shown in the figure):
  result [1]0001 (the bracketed 1 is the carry out of the msb)
Signed Binary Numbers
• Sign and Magnitude:
  – 1 sign bit, N-1 magnitude bits
  – Example: -5 = 1101₂
             +5 = 0101₂

• Two’s Complement (why?)
  – Same as unsigned binary, but the most significant bit (msb) has the
    value -2^(N-1)
  – Most positive 4-bit number: 0111₂
  – Most negative 4-bit number: 1000₂
“Taking the Two’s Complement”
• Reversing the sign of a two’s complement
number
• Method:
1. Invert the bits
2. Add 1

• Example: Reverse the sign of 0111


1. 1000
2. + 1
1001
Two’s Complement Examples
• Take the two’s complement of 0101:
    1010   (inverted bits)
  +    1
    1011

• Take the two’s complement of 1010:
    0101   (inverted bits)
  +    1
    0110
Two’s Complement Addition
• Add 6 + (-6) using two’s complement numbers:
     0110   (6)
  +  1010   (-6): two’s comp. of 0110
  [1]0000   (the carry out of the msb is discarded)

• Add -2 + 3 using two’s complement numbers:
     1110   (-2): two’s comp. of 0010
  +  0011   (3)
  [1]0001   → 0001
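The invert-and-add-1 rule and the discarded carry can be checked with
ordinary integer arithmetic. A Python sketch for 4-bit registers
(function names are ours):

    N = 4  # register width in bits

    def twos_complement(x):
        """Invert the bits, then add 1 (all modulo 2**N)."""
        return ((x ^ (2**N - 1)) + 1) % 2**N

    def add(a, b):
        """N-bit addition: the carry out of the msb is discarded."""
        return (a + b) % 2**N

    minus_six = twos_complement(0b0110)           # 1010
    print(format(add(0b0110, minus_six), "04b"))  # 0000: 6 + (-6)
    print(format(add(0b1110, 0b0011), "04b"))     # 0001: -2 + 3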
Floating point representation
• Fixed-point
• Normalized floating point
• IEEE standard
Floating point representation
• Fixed-point example: 1.625, with 24 integer bits and 8 fraction bits.
  The integer part 1 takes the 24 bits and the fraction 0.625 = .101₂
  takes the 8 bits, giving:
  00000000 00000000 00000001 10100000
Floating Point Representation

• Normalized Floating-point Representation:


– sign, exponent, significand (or mantissa):
  Number = (−1)^sign × significand × base^exponent
– e.g. 1.2345 = 12345 × 10⁻⁴
– more bits for significand gives more accuracy
– more bits for exponent increases range
• IEEE 754 floating point standard (universal):
– single precision: 8 bit exponent, 23 bit significand (1 bit sign)
– double precision: 11 bit exponent, 52 bit significand
Floating-point representation
• E.g. -96₁₀ = -1100000₂ = -1.100000₂ × 2⁶

To fit this into the bits (1 sign bit, 4 exponent bits with a bias of 7,
and 3 mantissa bits):
  – sign bit = 1 (because the number is negative)
  – exponent: 6 + 7 (bias) = 13₁₀ = 1101₂ for the 4 exponent bits
  – mantissa: 100 (the 3 bits following the leading 1)

  Result: 1 1101 100
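The packing steps can be scripted. A Python sketch of this toy format,
assuming 1 sign bit, 4 exponent bits with a bias of 7, 3 mantissa bits,
and a magnitude of at least 1 (the function name is ours):

    def encode_toy_float(value, exp_bits=4, man_bits=3, bias=7):
        """Normalize to 1.fff x 2**exp and pack sign|exponent|mantissa;
        only the bits after the leading 1 are stored, as in the slide."""
        sign = 1 if value < 0 else 0
        mag, exp = abs(value), 0
        while mag >= 2:                  # normalize the significand
            mag /= 2
            exp += 1
        man = int(round((mag - 1) * 2**man_bits))  # bits after leading 1
        return "{:b} {:0{}b} {:0{}b}".format(
            sign, exp + bias, exp_bits, man, man_bits)

    print(encode_toy_float(-96))  # 1 1101 100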
Binary Floating-Point Formats
Decimal Floating Point (DFP) Addition
• Step 1: equalize the exponents
– add the mantissas only when exponents are the
same.
– the number with the smaller exponent shifts its point to the left,
  and the number with the larger exponent shifts its point to the right
– rewriting the operand with the smaller exponent could result in a
  loss of the least significant digits
– keep a guard digit, round digit, and sticky digit for the operand
  with the smaller exponent
DFP addition
• Step 2: add the mantissas

      0099999 × 10¹
    + 0016234 × 10⁻³

  after equalizing the exponents becomes:

      0999990 × 10⁰
    + 0000016(234) × 10⁰    (digits in parentheses are guard digits)
    = 1000006(234) × 10⁰
• Step 3: Normalize the result if necessary
DFP addition
• Step 4: Round the number if needed:
  1000006(234) × 10⁰ → 1000006 × 10⁰
• Step 5: Repeat step 3 if the result is no longer normalized
• The final result is 1000006 × 10⁰
• The exact answer is 1000006.234
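The whole computation can be traced with plain integers. A Python sketch
of the example above, keeping a 7-digit significand (variable names and
the use of Python's round for the guard digits are our choices):

    # Operands as (significand, exponent): value = significand * 10**exp
    a = (99999, 1)       # 0099999 x 10^1  = 999990
    b = (16234, -3)      # 0016234 x 10^-3 = 16.234

    # Step 1: equalize exponents by rescaling onto the smaller one
    exp = min(a[1], b[1])
    total = a[0] * 10**(a[1] - exp) + b[0] * 10**(b[1] - exp)  # Step 2

    # Steps 3-4: renormalize to 7 significant digits, rounding on the
    # dropped (guard) digits
    drop = max(len(str(total)) - 7, 0)
    print(f"{round(total / 10**drop)} x 10^{exp + drop}")  # 1000006 x 10^0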
Lecture Two
• Logic Gates
Digital Logic Gates

(Figure: the standard gate symbols and their output functions F)
Example of binary signals

Two values: 0 or 1

Input-Output signals for gates

Boolean Algebra
• Basic definitions:

• x+0=0+x=x
• x.1=1.x=x
• x.(y+z)=(x.y)+(x.z)
• x+(y.z)=(x+y).(x+z)
• x+x’=1
• x.x’=0
Boolean Algebra Theorems
• x+x=x
• x.x=x
• x+1=1
• x.0=0
• x+x.y=x
• x.(x+y)=x

Boolean Function Implementation

(Figure: gate-level circuit; the gate outputs are labeled y' and y'.z)
Boolean Function Implementation

(Figure: gate-level circuits; the gate outputs are labeled x'.y'.z,
x'.y.z, x.y', and x'.z)
Complement of a function
• DeMorgan’s theorem:
• (x+y)’=x’.y’
(x.y)’=x’+y’

• What about three variables?
• (x+y+z)' = ?
• Let A = x+y. Then (A+z)' = A'.z' = (x+y)'.z' = x'.y'.z'
• Similarly, (x.y.z)' = x'+y'+z'
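With only a handful of input combinations, DeMorgan's theorem can be
verified exhaustively. A Python sketch using 0/1 values, where 1 - x
plays the role of x':

    from itertools import product

    for x, y in product((0, 1), repeat=2):
        assert 1 - (x | y) == (1 - x) & (1 - y)     # (x+y)' = x'.y'
        assert 1 - (x & y) == (1 - x) | (1 - y)     # (x.y)' = x'+y'

    for x, y, z in product((0, 1), repeat=3):
        assert 1 - (x | y | z) == (1 - x) & (1 - y) & (1 - z)
        assert 1 - (x & y & z) == (1 - x) | (1 - y) | (1 - z)

    print("DeMorgan holds for 2 and 3 variables")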
Canonical & Standard Forms
• Consider two binary variables x, y and the AND operation
• four combinations are possible: x.y, x’.y, x.y’, x’.y’
• each AND term is called a minterm or standard products

• for n variables we have 2ⁿ minterms

• Consider two binary variables x, y and the OR operation


• four combinations are possible: x+y, x’+y, x+y’, x’+y’
• each OR term is called a maxterm or standard sums

• for n variables we have 2ⁿ maxterms

• Canonical Forms:
• Boolean functions expressed as a sum of minterms or product of maxterms.

Minterms
• x y z Terms Designation
• 0 0 0 x’.y’.z’ m0
• 0 0 1 x’.y’.z m1
• 0 1 0 x’.y.z’ m2
• 0 1 1 x’.y.z m3
• 1 0 0 x.y’.z’ m4
• 1 0 1 x.y’.z m5
• 1 1 0 x.y.z’ m6
• 1 1 1 x.y.z m7
Maxterms
• x y z Designation Terms
• 0 0 0 M0 x+y+z
• 0 0 1 M1 x+y+z’
• 0 1 0 M2 x+y’+z
• 0 1 1 M3 x+y’+z’
• 1 0 0 M4 x’+y+z
• 1 0 1 M5 x’+y+z’
• 1 1 0 M6 x’+y’+z
• 1 1 1 M7 x’+y’+z’

Boolean Function: Example

How to express algebraically

• Question: How do we find the function using the truth table?

• Truth table example:


• x y z   F1  F2
• 0 0 0   0   0
• 0 0 1   1   0
• 0 1 0   0   0
• 0 1 1   0   1
• 1 0 0   1   0
• 1 0 1   0   1
• 1 1 0   0   1
• 1 1 1   1   1
Boolean Function: Example

How to express algebraically

• 1.Form a minterm for each combination forming a 1


• 2.OR all of those terms

• Truth table example:


• x y z F1 minterm
• 0 0 0 0
• 0 0 1 1 x’.y’.z m1
• 0 1 0 0
• 0 1 1 0
• 1 0 0 1 x.y’.z’ m4
• 1 0 1 0
• 1 1 0 0
• 1 1 1 1 x.y.z m7

• F1=m1+m4+m7=x’.y’.z+x.y’.z’+x.y.z=Σ(1,4,7)

Boolean Function: Example

How to express algebraically

• Truth table example:


• x y z F2 minterm
• 0 0 0 0 m0
• 0 0 1 0 m1
• 0 1 0 0 m2
• 0 1 1 1 m3
• 1 0 0 0 m4
• 1 0 1 1 m5
• 1 1 0 1 m6
• 1 1 1 1 m7

• F2=m3+m5+m6+m7=x’.y.z+x.y’.z+x.y.z’+x.y.z=Σ(3,5,6,7)

Boolean Function: Example

How to express algebraically

• 1.Form a maxterm for each combination forming a 0


• 2.AND all of those terms

• Truth table example:


• x y z F1 maxterm
• 0 0 0 0 x+y+z M0
• 0 0 1 1
• 0 1 0 0 x+y’+z M2
• 0 1 1 0 x+y’+z’ M3
• 1 0 0 1
• 1 0 1 0 x’+y+z’ M5
• 1 1 0 0 x’+y’+z M6
• 1 1 1 1

• F1 = M0.M2.M3.M5.M6 = Π(0,2,3,5,6)

Boolean Function: Example

How to express algebraically

• Truth table example:


• x y z F2 maxterm
• 0 0 0 0 x+y+z M0
• 0 0 1 0 x+y+z’ M1
• 0 1 0 0 x+y’+z M2
• 0 1 1 1
• 1 0 0 0 x’+y+z M4
• 1 0 1 1
• 1 1 0 1
• 1 1 1 1

• F2 = M0.M1.M2.M4 = Π(0,1,2,4) = (x+y+z).(x+y+z’).(x+y’+z).(x’+y+z)

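The recipe (collect minterms where the function is 1, maxterms where it
is 0) is easy to automate. A Python sketch for the F1 used above (helper
names are ours):

    from itertools import product

    F1 = [0, 1, 0, 0, 1, 0, 0, 1]        # truth values for rows 000..111
    rows = list(product((0, 1), repeat=3))

    def minterm(i):   # product term that equals 1 only on row i
        return ".".join(v + ("" if b else "'")
                        for v, b in zip("xyz", rows[i]))

    def maxterm(i):   # sum term that equals 0 only on row i
        return "(" + "+".join(v + ("'" if b else "")
                              for v, b in zip("xyz", rows[i])) + ")"

    ones = [i for i, f in enumerate(F1) if f]
    zeros = [i for i, f in enumerate(F1) if not f]
    print("F1 = Σ", ones, "=", " + ".join(minterm(i) for i in ones))
    print("F1 = Π", zeros, "=", ".".join(maxterm(i) for i in zeros))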
Maxterms & Minterms: Intuitions
• Minterms:
• If a function is expressed as SUM of PRODUCTS, then if a single
product is 1 the function would be 1.

• Maxterms:
• If a function is expressed as a PRODUCT of SUMS, then if a single
  sum term is 0 the function would be 0.

• Canonical Forms:
• Boolean functions expressed as a sum of minterms or product of
maxterms.

Standard Forms

Standard Form: Sum of Products or Product of Sums

Nonstandard Forms

Nonstandard Form: neither a Sum of Products nor a Product of Sums

Implementations

Three-level implementation vs. two-level implementation

Two-level implementation is normally preferred because of the importance of delay.

• Registers and Transfer Notations
What are Registers?
Registers…
Register Transfer Language (RTL)
• Digital System: An interconnection of hardware
modules that do a certain task on the information.
• Registers + Operations performed on the data stored
in them = Digital Module
• Modules are interconnected with common data and
control paths to form a digital computer system
Register Transfer Language cont.
• Microoperations: operations executed on data stored
in one or more registers.
• For any function of the computer, a sequence of
microoperations is used to describe it
• The result of the operation may:
  – replace the previous binary information of a register, or
  – be transferred to another register
Shift Right Operation
101101110011 → 010110111001
Register Transfer Language cont.
• The internal hardware organization of a digital
computer is defined by specifying:
• The set of registers it contains and their function
• The sequence of microoperations performed on the
binary information stored in the registers
• The control that initiates the sequence of
microoperations
• Registers + Microoperation hardware + Control functions = Digital
  Computer
Register Transfer Language cont.
• Register Transfer Language (RTL) : a symbolic
notation to describe the microoperation transfers
among registers
Next steps:
– Define symbols for various types of microoperations,
– Describe the hardware that implements these
microoperations
Register Transfer (our first microoperation)
• Computer registers are designated by capital
letters (sometimes followed by numerals) to
denote the function of the register
• R1: processor register
• MAR: Memory Address Register (holds an address for a
memory unit)
• PC: Program Counter
• IR: Instruction Register
• SR: Status Register
Register Transfer cont.

• The individual flip-flops in an n-bit register are


numbered in sequence from 0 to n-1 (from
the right position toward the left position)

  R1:  | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |

  Register R1, showing individual bits (a block diagram of a register)
Register Transfer cont.

Other ways of drawing the block diagram of a register:

  PC:  15 ............ 0        (numbering of bits)

  PC(H) = bits 15-8 (upper byte), PC(L) = bits 7-0 (lower byte)
  (register partitioned into two parts)
Register Transfer cont.
• Information transfer from one register to another is described
by a replacement operator: R2 ← R1
• This statement denotes a transfer of the content of register R1
into register R2
• The transfer happens in one clock cycle
• The content of the R1 (source) does not change
• The content of the R2 (destination) will be lost and replaced
by the new data transferred from R1
• We are assuming that the circuits are available from the
outputs of the source register to the inputs of the destination
register, and that the destination register has a parallel load
capability
Register Transfer cont.
• Conditional transfer occurs only under a
control condition

• Representation of a (conditional) transfer


P: R2 ← R1
• A binary condition (P equals to 0 or 1)
determines when the transfer occurs
• The content of R1 is transferred into R2 only if
P is 1
Register Transfer cont.

Hardware implementation of a controlled transfer: P: R2 ← R1

Block diagram: a control circuit generates P, which drives the Load
input of register R2; the outputs of R1 feed the inputs of R2, and both
registers share the clock.

Timing diagram: Load is synchronized with the clock; P becomes active at
time t, and the transfer into R2 occurs at the next clock transition,
at t+1.
Register Transfer cont.

Basic Symbols for Register Transfers

  Symbol               Description                      Examples
  Letters & numerals   Denotes a register               MAR, R2
  Parentheses ( )      Denotes a part of a register     R2(0-7), R2(L)
  Arrow ←              Denotes transfer of information  R2 ← R1
  Comma ,              Separates two microoperations    R2 ← R1, R1 ← R2
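A conditional transfer such as P: R2 ← R1 can be mimicked in a few
lines. A toy Python model (the dictionary of registers is our
representation; it ignores the clock edge on which a real transfer
occurs):

    regs = {"R1": 0b1010, "R2": 0b0000}

    def transfer(dst, src, P=1):
        """P: dst <- src -- happens only when the control condition P
        is 1; the source is unchanged, the destination is overwritten."""
        if P:
            regs[dst] = regs[src]

    transfer("R2", "R1", P=0)
    print(format(regs["R2"], "04b"))  # 0000: no transfer, P was 0
    transfer("R2", "R1", P=1)
    print(format(regs["R2"], "04b"))  # 1010: content of R1 copied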
Bus and Memory Transfers
• Paths must be provided to transfer information from
one register to another
• A Common Bus System is a scheme for transferring
information between registers in a multiple-register
configuration
• A bus: set of common lines, one for each bit of a
register, through which binary information is
transferred one at a time
• Control signals determine which register is selected
by the bus during each particular register transfer
Bus and Memory Transfers

(Figure: a 4-line common bus connecting four 4-bit registers A, B, C
and D. Bit i of every register feeds multiplexer MUXi, and the two
select lines S1, S0 determine which register drives the bus lines.)
Bus and Memory Transfers
• The transfer of information from a bus into one of many
destination registers is done:
– By connecting the bus lines to the inputs of all destination
registers and then:
– activating the load control of the particular destination register
selected
• We write: R2 ← C to symbolize that the content of register
C is loaded into the register R2 using the common system
bus
• It is equivalent to: BUS ← C (select C), followed by R2 ← BUS (load R2)
Bus and Memory Transfers: Three-State Bus
Buffers
• A bus system can be constructed with three-
state buffer gates instead of multiplexers
• A three-state buffer is a digital circuit that
exhibits three states: logic-0, logic-1, and high-
impedance (Hi-Z)
(Figure: a three-state buffer with normal input A, control input C, and
output B; when C = 1, B follows A, and when C = 0 the output is in the
high-impedance state)
Bus and Memory Transfers: Memory
Transfer
• Memory read : Transfer from memory
• Memory write : Transfer to memory
• Data being read or written is called a memory word (called M)
• It is necessary to specify the address of M when writing/reading memory
• This is done by enclosing the address in square brackets
following the letter M
• Example: M[0016] : the memory contents at address
0x0016
Bus and Memory Transfers: Memory Transfer
cont.

• Assume that the address of a memory unit is


stored in a register called the Address Register
AR
• Let’s represent a Data Register with DR; then:
• Read: DR ← M[AR]
• Write: M[AR] ← DR
Bus and Memory Transfers: Memory Transfer cont.

Example: R1 ← M[AR]

  AR = x12, and R1 initially holds 100.

  RAM contents:  address  x0C  x0E  x10  x12  x14  x16  x18
                 word      19   34   45   66    0   13   22

  After R1 ← M[AR], R1 holds 66 (the word stored at address x12).
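The read microoperation maps directly onto a dictionary model of memory.
A Python sketch of the example above (the dict representation is ours):

    # Memory as a mapping from address to word; AR supplies the address
    M = {0x0C: 19, 0x0E: 34, 0x10: 45, 0x12: 66,
         0x14: 0, 0x16: 13, 0x18: 22}
    AR = 0x12
    R1 = 100

    R1 = M[AR]        # Read:  R1 <- M[AR]
    print(R1)         # 66, the word stored at address x12

    DR = 0
    M[AR] = DR        # Write: M[AR] <- DR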
Lecture Three
• Microoperations
Microoperations
• The microoperations most often encountered
in digital computers are classified into four
categories:
– Register transfer microoperations
– Arithmetic microoperations (on numeric data
stored in the registers)
– Logic microoperations (bit manipulations on non-
numeric data)
– Shift microoperations
Arithmetic Microoperations
• The basic arithmetic microoperations are:
addition, subtraction, increment, decrement,
and shift
• Addition Microoperation:
R3 ←R1+R2
• Subtraction Microoperation:
  R3 ← R1 - R2, or, using the 1’s complement:
  R3 ← R1 + R2′ + 1
Arithmetic Microoperations cont.
• One’s Complement Microoperation:
  R2 ← R2′
• Two’s Complement Microoperation:
R2 ←R2+1
• Increment Microoperation:
R2 ←R2+1
• Decrement Microoperation:
R2 ←R2-1
Logic Microoperations

OR Microoperation
• Symbols: ∨, +
• Gate: OR
• Example: 100110₂ ∨ 1010110₂ = 1110110₂

• Distinguishing the two uses of +:
  P+Q: R1 ← R2 + R3, R4 ← R5 ∨ R6
  (the + between the control conditions P and Q denotes OR; the + in
  R2 + R3 denotes arithmetic ADD; ∨ denotes the logic OR microoperation)
Logic Microoperations

AND Microoperation
• Symbol: ∧
• Gate: AND
• Example: 100110₂ ∧ 1010110₂ = 0000110₂


Logic Microoperations

Complement (NOT) Microoperation


• Symbol: ′ (prime / overbar)
• Gate: inverter (NOT)
• Example: (1010110₂)′ = 0101001₂


Logic Microoperations

XOR (Exclusive-OR) Microoperation


• Symbol: ⊕
• Gate: XOR
• Example: 100110₂ ⊕ 1010110₂ = 1110000₂


Logic Microoperations
Other Logic Microoperations
Selective-set Operation
• Used to force selected bits of a register into
logic-1 by using the OR operation

• Example: 0100₂ ∨ 1000₂ = 1100₂
  (the first operand is in a processor register; the second is loaded
  into a register from memory to perform the selective-set operation)
Logic Microoperations
Other Logic Microoperations cont.
Selective-complement (toggling) Operation
• Used to force selected bits of a register to be
complemented by using the XOR operation

• Example: 0001₂ ⊕ 1000₂ = 1001₂
  (the first operand is in a processor register; the second is loaded
  into a register from memory to perform the selective-complement
  operation)
Logic Microoperations
Other Logic Microoperations cont.
Insert Operation
• Step 1: mask (AND) out the bits to be replaced
• Step 2: OR in the desired value

• Example: suppose R1 = 0110 1010, and we desire to replace the
  leftmost 4 bits (0110) with 1001. Then:
  – Step 1: 0110 1010 ∧ 0000 1111 = 0000 1010
  – Step 2: 0000 1010 ∨ 1001 0000 = 1001 1010
  ⇒ R1 = 1001 1010
Logic Microoperations
Other Logic Microoperations cont.
NAND Microoperation
• Symbol: ↑ (NOT-AND)
• Gate: NAND
• Example: (100110₂ ∧ 1010110₂)′ = 1111001₂


Logic Microoperations
Other Logic Microoperations cont.
NOR Microoperation
• Symbol: ↓ (NOT-OR)
• Gate: NOR
• Example: (100110₂ ∨ 1010110₂)′ = 0001001₂


Logic Microoperations
Other Logic Microoperations cont.
Set (Preset) Microoperation
• Force all bits into 1’s by ORing them with a value in
which all its bits are being assigned to logic-1
• Example: 100110₂ ∨ 111111₂ = 111111₂
Clear (Reset) Microoperation
• Force all bits into 0’s by ANDing them with a value in
which all its bits are being assigned to logic-0
• Example: 100110₂ ∧ 000000₂ = 000000₂
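All of these logic microoperations are one-line bitwise expressions. A
Python sketch for an 8-bit register (function names are ours), ending
with the insert example worked earlier:

    W = 8
    ONES = 2**W - 1

    def selective_set(A, B):        return A | B     # OR forces 1s
    def selective_complement(A, B): return A ^ B     # XOR toggles
    def set_all(A):                 return A | ONES  # preset to all 1s
    def clear_all(A):               return A & 0     # reset to all 0s

    def insert(A, value, mask):
        """Step 1: AND with the mask to clear the field being replaced;
        Step 2: OR in the new value."""
        return (A & mask) | value

    print(format(selective_set(0b0100, 0b1000), "04b"))         # 1100
    print(format(selective_complement(0b0001, 0b1000), "04b"))  # 1001
    print(format(insert(0b01101010, 0b10010000, 0b00001111),
                 "08b"))                                        # 10011010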
Shift Microoperations
• Used for serial transfer of data
• Also used in conjunction with arithmetic, logic, and
other data-processing operations
• The contents of the register can be shifted to the left
or to the right
• As being shifted, the first flip-flop receives its binary
information from the serial input
• Three types of shift: Logical, Circular, and Arithmetic
Shift Microoperations cont.

(Figure: for a shift right, the serial input enters at bit r(n-1) and
the serial output leaves from bit r0; for a shift left, the serial input
enters at bit r0 and the serial output leaves from bit r(n-1). What is
connected to the serial input determines the shift type.)

**Note that the bit ri is the bit at position (i) of the register
Shift Microoperations:
Logical Shifts
• Transfers 0 through the serial input
• Logical Shift Right: R1 ← shr R1
• Logical Shift Left: R2 ← shl R2

(Figure: in a logical shift left, a 0 enters at r0, every bit moves one
position to the left, and the bit shifted out of r(n-1) is lost; the
right shift is the mirror image)
Shift Microoperations:
Circular Shifts (Rotate Operation)
• Circulates the bits of the register around the two ends without loss
  of information
• Circular Shift Right: R1 ← cir R1
• Circular Shift Left: R2 ← cil R2

(Figure: in a circular shift left, the bit leaving r(n-1) re-enters at
r0; the right rotate is the mirror image)
Shift Microoperations
Arithmetic Shifts
• Shifts a signed binary number to the left or right
• An arithmetic shift-left multiplies a signed binary
number by 2: ashl (00100): 01000
• An arithmetic shift-right divides the number by 2
ashr (00100) : 00010
• An overflow may occur in arithmetic shift-left, and
occurs when the sign bit is changed (sign reversal)
Shift Microoperations
Arithmetic Shifts cont.

(Figure: in an arithmetic shift right, the sign bit r(n-1) is kept and
also shifted into r(n-2); in an arithmetic shift left, a 0 enters at r0
and the bit shifted into the sign position may change it, causing
overflow)
Shift Microoperations cont.

• Example: Assume R1=11001110, then:


– Arithmetic shift right once : R1 = 11100111
– Arithmetic shift right twice : R1 = 11110011
– Arithmetic shift left once : R1 = 10011100
– Arithmetic shift left twice : R1 = 00111000
– Logical shift right once : R1 = 01100111
– Logical shift left once : R1 = 10011100
– Circular shift right once : R1 = 01100111
– Circular shift left once : R1 = 10011101
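Each of these shifts is a short bitwise expression. A Python sketch for
an 8-bit register (function names follow the slide's mnemonics) that
reproduces the results above:

    W = 8
    MASK = 2**W - 1

    def shr(x):  return x >> 1                              # 0 enters msb
    def shl(x):  return (x << 1) & MASK                     # 0 enters lsb
    def cir(x):  return (x >> 1) | ((x & 1) << (W - 1))     # lsb -> msb
    def cil(x):  return ((x << 1) & MASK) | (x >> (W - 1))  # msb -> lsb
    def ashr(x): return (x >> 1) | (x & (1 << (W - 1)))     # keep sign
    def ashl(x): return (x << 1) & MASK                     # may overflow

    R1 = 0b11001110
    for op in (ashr, ashl, shr, shl, cir, cil):
        print(op.__name__, format(op(R1), "08b"))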
Lecture Four
• Instruction Formats and Memory Addressing
Modes
Instruction Formats
Common Instruction Formats
Instruction and Word Length Relationships
Instruction and Word Length Relationships
Design of Instruction Formats
Addressing
Immediate Addressing
Immediate Addressing…
Direct Addressing
Direct Addressing…
Register Addressing
Register Indirect Addressing
Register Indirect Addressing
Indexed Addressing
Indexed Addressing
Lecture Five
• Memory Hierarchy and Management
MEMORY HIERARCHY AND MANAGEMENT
• The main memory exhibits certain characteristics: It is
fast; It is randomly accessed; It is expensive; It is located
close (but not inside) to the CPU; It is used to store
currently executed programs and data. On the other
hand, the secondary memory is slow, cheap, directly
accessed, and located remotely from the CPU. 
• Microprocessors work at a very high rate and need large memories,
  but the memories in place are much slower than the microprocessors.
  Therefore, there is the need for memory that can accommodate very
  large programs and work at a speed comparable to that of the
  microprocessors.
The Memory Hierarchy
• The larger a memory, the slower it is; the faster the memory, the
  greater the cost/bit. This led to the creation of a composite memory
  system which combines a small, fast memory with a large, slow main
  memory and behaves (most of the time) like a large, fast memory.
  This two-level principle can be extended into a hierarchy of many
  levels, including the secondary memory (disk store).
The Memory Hierarchy
• The effectiveness of such a memory hierarchy
is based on property of programs called the
principle of locality. This states that most
programs do not access all code or data
uniformly.
The Memory Hierarchy
Cache Memory

Operations with the Cache Memory


Cache Memory
• When there is an access to an item which is in the cache, we have a
  hit; when the item is not in the cache, we have a miss. The
  proportion of all memory accesses that are satisfied by the cache is
  called the hit rate, while the proportion that are not is called the
  miss rate. The miss rate of a well-designed cache should be very low
  (a few %).
Cache Memory
• The Cache space (~Kbytes- Mbytes) is much smaller than
main memory (~Mbytes- Gbytes); Items have to be
placed in the cache so that they are available there when
(and possibly only when) they are needed. This can only
work with the principle of locality. During execution of a
program, memory references by the processor, for both
instructions and data, tend to cluster: once an area of the
program is entered, there are repeated references to a
small set of instructions (loop, subroutine) and data
(components of a data structure, local variables or
parameters on the stack).
Cache Memory
• There are two types of locality:
• Temporal locality (locality in time): If an item
is referenced, it will tend to be referenced
again soon.
• Spatial locality (locality in space): If an item is
referenced, items whose addresses are close
by will tend to be referenced soon.
Cache Memory
• It is common to split the cache into one part dedicated to
  instructions and one dedicated to data (a split cache). Sometimes
  the implementation is unified (a unified cache).
Cache Memory
• Advantages of unified caches:
• They are able to better balance the load
between instruction and data fetches
depending on the dynamics of the program
execution.
• The design and implementation are cheaper.
Cache Memory
• Advantages of split caches
• Competition for the cache between
instruction processing and execution units is
eliminated.
• Instruction fetch can proceed in parallel with
memory access from the execution unit.
Cache Memory

Separate Data and Instruction Cache


Cache Memory Organization Techniques

• Direct Mapping
• Set Associative Mapping
• Associative Mapping
Direct Mapping
• In direct mapping, a memory block is
mapped into a unique cache line,
depending on the memory address of
the respective block.
• Tags are stored in the cache in order to
distinguish among blocks which fit into
the same cache line.
 
Direct Mapping

(Figure) On a miss, the block will be placed in the cache line which
corresponds to the 14-bit field in the memory address of the respective
block.
Cache Memory Organization Techniques
• Advantages of direct mapping are as follows:
• It is simple and cheap;
• The tag field is short; only those bits which are not used to address
the cache have to be stored;
• Access is very fast.
• Disadvantages:
• A given block fits into a fixed cache location
• A given cache line will be replaced whenever there is a reference to
another memory block which fits to the same line, regardless of
what the status of the other cache lines is. This can produce a low
hit ratio, even if only a very small part of the cache is effectively
used
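The address split behind direct mapping is simple arithmetic. Below is a
Python sketch of a direct-mapped lookup; the 14-bit line field follows
the slide's figure, while the 2-bit word-in-block field and the example
addresses are our illustrative assumptions:

    WORD_BITS, LINE_BITS = 2, 14

    def split(addr):
        word = addr & (2**WORD_BITS - 1)
        line = (addr >> WORD_BITS) & (2**LINE_BITS - 1)
        tag = addr >> (WORD_BITS + LINE_BITS)
        return tag, line, word

    cache = {}  # line -> tag of the block currently stored there

    def access(addr):
        tag, line, _ = split(addr)
        if cache.get(line) == tag:
            return "hit"
        cache[line] = tag   # miss: the resident block is simply replaced
        return "miss"

    # Two addresses with equal line fields but different tags evict
    # each other, no matter how empty the rest of the cache is:
    print(access(0x1234AB), access(0x1234AB), access(0x9234AB))
    # miss hit miss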
Set Associative Mapping
• In set associative mapping, a memory block is mapped
into any of the lines of a set. The set is determined by the
memory address, but the line inside the set can be any
one.
• If a block has to be placed in the cache the particular line
of the set will be determined according to a replacement
algorithm.
• The number of lines in a set is determined by the designer. With 2
  lines/set it is a two-way set associative mapping; with 4 lines/set,
  a four-way set associative mapping.
Set Associative Mapping

Set Associative Mapping (with a two-way set associative cache)


Set Associative Mapping
• Advantages of Set Associative Mapping:
• There is fast access
• It is relatively simple
• The tag field is short
• It keeps most of the advantages of direct mapping, and tries to eliminate
the main shortcoming of direct mapping; a certain flexibility is given
concerning the line to be replaced when a new block is read into the
cache.

• Disadvantage
• Cache hardware is more complex than for direct mapping.
• In practice, 2 and 4-way set associative mapping are used with very good
results. Larger sets do not produce further significant performance
improvement.
Associative Mapping
• In associative mapping, a memory block can be mapped to any
cache line. If a block has to be placed in the cache, the particular
line will be determined according to a replacement algorithm.

• All tags, corresponding to every line in the cache memory, have


to be checked in order to determine if we have a hit or miss. If we
have a hit, the cache logic finally points to the actual line in the
cache. The cache line is retrieved based on a portion of its
content (the tag field) rather than its address. Such a memory
structure is called associative memory.
Associative Mapping
Associative Mapping
• Advantages of Associative Mapping:
• Associative mapping provides the highest flexibility concerning the line
to be replaced when a new block is read into the cache.
 
• Disadvantages:
• It is complex
• The tag field is long
• Fast access can be achieved only using high performance associative
memories for the cache; this can be difficult and expensive to get.

• When a new block is to be placed into the cache, the block stored in
one of the cache lines has to be replaced.
Replacement Algorithms
• Random replacement:
• One of the candidate lines is selected randomly.

• Least recently used (LRU):


•  The candidate line which holds the block that has been in the cache
the longest without being referenced is selected.

• First-in-first-out (FIFO):
• The candidate line which holds the block that has been in the cache the
longest is selected.

•  Least frequently used (LFU):


• The candidate line which holds the block that has got the fewest
references is selected.
Replacement Algorithms
• LRU is the most efficient: relatively simple to
implement and yields good results.
• FIFO is simple to implement. Random
replacement is the simplest to implement and
results are surprisingly good.
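LRU can be modelled with an ordered dictionary that records recency of
use. A Python sketch of one set of a set-associative cache (the class
name is ours):

    from collections import OrderedDict

    class LRUSet:
        """One cache set holding `ways` lines, evicting the least
        recently used line when the set is full."""
        def __init__(self, ways):
            self.ways, self.lines = ways, OrderedDict()

        def access(self, tag):
            if tag in self.lines:
                self.lines.move_to_end(tag)     # referenced: most recent
                return "hit"
            if len(self.lines) == self.ways:
                self.lines.popitem(last=False)  # evict the LRU line
            self.lines[tag] = True
            return "miss"

    s = LRUSet(ways=2)
    print([s.access(t) for t in "ABACB"])
    # ['miss', 'miss', 'hit', 'miss', 'miss'] -- C evicted B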
Write Strategies
• The problem is:
• How to keep cache content and the content of the
main memory consistent without losing too much
performance? Problems arise when a write is issued
to a memory address, and the content of the
respective address is potentially changed.
• These strategies are outlined as follows:
•  Write-through
•  Write-through with buffered write
• Copy-back
Write Strategies
• Write-through
• All write operations are passed to the main memory; if the
addressed location is currently held in the cache, the cache is
updated so that it is coherent with the main memory. For writes,
the processor always slows down to main memory speed.
 
• Write-through with buffered write
• The same as write-through, but instead of slowing the processor
down by writing directly to main memory, the write address and
data are stored in a high-speed write buffer; the write buffer
transfers data to main memory while the processor continues its
task. The speed is higher, but requires more complex hardware
 
Write Strategies
• Copy-back
• Write operations update only the cache memory
which is not kept coherent with main memory;
cache lines have to remember if they have been
updated; if such a line is replaced from the cache, its
content has to be copied back to memory. It
exhibits good performance (usually several writes
are performed on a cache line before it is replaced
and has to be copied into main memory). The
hardware is complex.
Lecture Six
• Pipelining
Pipelining
Two Stage Pipeline
Six-Stage Pipeline
Pipeline Hazards
• Pipeline hazards are situations that prevent the next
instruction in the instruction stream from executing
during its designated clock cycle. The instruction is
said to be stalled. When an instruction is stalled, all
instructions later in the pipeline than the stalled
instruction are also stalled. Instructions earlier than
the stalled one can continue. No new instructions
are fetched during the stall.
 
Pipeline Hazards
• Types of hazards:
• Structural hazards
• Data hazards
• Control hazards
 
Structural hazards

Structural hazards occur when a certain resource (memory, functional
unit) is requested by more than one instruction at the same time. An
example is shown in the figure below: instruction ADD R4,X fetches its
operand X from memory in the FO stage, and the memory doesn’t accept
another access during that cycle.

The penalty is 1 cycle.

(Figure: structural hazard)
Structural hazards
• How to avoid Structural Hazards:
Certain resources are duplicated in order to
avoid structural hazards. Functional units
(ALU, FP unit) can be pipelined themselves in
order to support several instructions at a time.
A classical way to avoid hazards at memory
access is by providing separate data and
instruction caches.
Data Hazards

We have two instructions, I1 and I2. In a pipeline, the execution of I2
can start before I1 has terminated. If, in a certain stage of the
pipeline, I2 needs the result produced by I1, but this result has not
yet been generated, we have a data hazard.

The penalty is 2 cycles.

(Figure: data hazards)
Data Hazards
• How to avoid Data Hazards:
• Some of the penalty produced by data hazards
can be avoided using a technique called
forwarding (bypassing).

Forwarding
Data Hazards

The ALU result is always fed back to the ALU input. If the hardware
detects that the value needed for the current operation is the one
produced by the previous operation (but which has not yet been written
back), it selects the forwarded result as the ALU input, instead of the
value read from register or memory.

After the EI stage of the MUL instruction, the result is available by
forwarding. The penalty is reduced to one cycle.
Control Hazards
• Control hazards are produced by branch instructions. An example
  produced by an unconditional branch is shown in the figure on the
  next slide, for a code sequence of the form:

    --------------
    BR   TARGET
    --------------
  TARGET - - - - - - - - - - - - - -
Control Hazards

(Figure: control hazard due to an unconditional branch)

The penalty is 3 cycles.
Control Hazards

    ADD R1,R2      ; R1 = R1 + R2
    BEZ TARGET     ; branch if zero
    instruction i+1
    -------------
  TARGET - - - - - - - - - - - - -

(Figure: control hazard due to a conditional branch)
Reducing Branch Penalties
• Branch instructions represent a major
problem in assuring an optimal flow through
the pipeline. Several approaches have been
taken for reducing branch penalties.
• These include:
• Delayed Branching
• Branch Prediction
• Speculative execution
Reducing Branch Penalties
• Delayed Branching
• With delayed branching the CPU always
executes the instruction that immediately
follows after the branch and only then alters
(if necessary) the sequence of execution. The
instruction after the branch is said to be in the
branch delay slot.
 
Reducing Branch Penalties
• Branch Prediction
• Correct branch prediction is very important and can produce
substantial performance improvements. Based on the predicted
outcome, the respective instruction can be fetched, as well as
the instructions following it, and they can be placed into the
instruction queue. If, after the branch condition is computed, it
turns out that the prediction was correct, execution continues.
On the other hand, if the prediction is not fulfilled, the fetched
instruction(s) must be discarded and the correct instruction
must be fetched. To take full advantage of branch prediction, we
can have the instructions not only fetched but also begin
execution. This is known as speculative execution.
Reducing Branch Penalties
• Speculative execution
• This means that instructions are executed before the processor is
certain that they are in the correct execution path. If it turns out that
the prediction was correct, execution goes on without introducing
any branch penalty. If, however, the prediction is not fulfilled, the
instruction(s) started in advance and all their associated data must be
purged and the state previous to their execution restored.

Branch prediction strategies can be static or dynamic:
• Static prediction techniques do not take into consideration execution
  history.
• Dynamic prediction techniques improve the accuracy of the prediction
  by recording the history of conditional branches.
