William Stallings
Computer Organization and Architecture
10th Edition
Chapter 1
Basic Concepts and Computer Evolution
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Computer Architecture
• Attributes of a system visible to the programmer
• Have a direct impact on the logical execution of a program
Architectural attributes include:
• Instruction set, number of bits used to represent various data types, I/O mechanisms, techniques for addressing memory
370 Architecture
IBM System/370 architecture
• Was introduced in 1970
• Included a number of models
• Could upgrade to a more expensive, faster model without having to abandon original software
• New models are introduced with improved technology, but retain the same architecture so that the customer’s software investment is protected
• Architecture has survived to this day as the architecture of IBM’s mainframe product line
Computer Organization
Organizational attributes include:
I/O Main
memory
Control
Unit
computer and its external
CONTROL
environment
UNIT
System Interconnection –
Sequencing
Logic
Control Unit
• Controls the operation of the CPU and hence the computer
Arithmetic and Logic Unit (ALU)
• Performs the computer’s data processing function
Registers
• Provide storage internal to the CPU
CPU Interconnection
• Some mechanism that provides for communication among the control unit, ALU, and registers
Multicore Computer Structure
Central processing unit (CPU)
• Portion of the computer that fetches and executes instructions
• Consists of an ALU, a control unit, and registers
• Referred to as a processor in a system with a single processing unit
Core
• An individual processing unit on a processor chip
• May be equivalent in functionality to a CPU on a single-CPU system
• Specialized processing units are also referred to as cores
Processor
• A physical piece of silicon containing one or more cores
• Is the computer component that interprets and executes instructions
• Referred to as a multicore processor if it contains multiple cores
Cache Memory
• Multiple layers of memory between the processor and main memory
• Is smaller and faster than main memory
• Used to speed up memory access by placing in the cache data from main memory that is likely to be used in the near future
• Multiple levels of cache may be used, with level 1 (L1) closest to the core and additional levels (L2, L3, etc.) progressively farther from the core
Figure labels: main memory chips, processor chip, I/O chips; processor chip with cores and L3 cache; core with instruction logic, arithmetic and logic unit (ALU), load/store logic, L2 instruction cache, L2 data cache
Figure 1.3 Motherboard with Two Intel Quad-Core Xeon Processors
Figure 1.4 IBM zEnterprise EC12 Processor Unit (PU) Chip Diagram (labels: Storage Control (SC), Memory Controller (MC))
Figure labels (IAS structure): arithmetic-logic circuits with AC, MQ, and MBR; program control unit (CC) with IBR, PC, IR, MAR, and control circuits; main memory (M) holding instructions and data in locations M(0) through M(4095); input-output equipment (I, O); control signals and addresses
AC: accumulator register
MQ: multiply-quotient register
MBR: memory buffer register
IBR: instruction buffer register
PC: program counter
MAR: memory address register
IR: instruction register
Figure labels (IAS memory formats), bits numbered 0 to 39:
(a) Number word: sign bit in bit 0
(b) Instruction word: left instruction (20 bits, bits 0 to 19) and right instruction (20 bits, bits 20 to 39), each with an opcode (8 bits) and an address (12 bits)
Memory address register (MAR)
• Specifies the address in memory of the word to be written from or read into the MBR
Instruction register (IR)
• Contains the 8-bit opcode instruction being executed
Instruction buffer register (IBR)
• Employed to temporarily hold the right-hand instruction from a word in memory
Program counter (PC)
• Contains the address of the next instruction pair to be fetched from memory
Figure labels (partial flowchart of IAS operation): left instruction required? (No/Yes); IR ← IBR (0:7), MAR ← IBR (8:19); IR ← MBR (20:27), MAR ← MBR (28:39); IBR ← MBR (20:39), IR ← MBR (0:7), MAR ← MBR (8:19); PC ← PC + 1; decode instruction in IR; execution cycle: AC ← MBR; Is AC > 0?; AC ← AC + MBR
Transistor
• Smaller
• Cheaper
• Dissipates less heat than a vacuum tube
• Is a solid state device made from silicon

Table 1.1 The IAS Instruction Set

Instruction Type      Opcode    Symbolic Representation  Description
Data transfer         00001010  LOAD MQ                  Transfer contents of register MQ to the accumulator AC
                      00001001  LOAD MQ,M(X)             Transfer contents of memory location X to MQ
                      00100001  STOR M(X)                Transfer contents of accumulator to memory location X
                      00000001  LOAD M(X)                Transfer M(X) to the accumulator
                      00000010  LOAD –M(X)               Transfer –M(X) to the accumulator
                      00000011  LOAD |M(X)|              Transfer absolute value of M(X) to the accumulator
                      00000100  LOAD –|M(X)|             Transfer –|M(X)| to the accumulator
Unconditional branch  00001101  JUMP M(X,0:19)           Take next instruction from left half of M(X)
                      00001110  JUMP M(X,20:39)          Take next instruction from right half of M(X)
Conditional branch    00001111  JUMP+ M(X,0:19)          If number in the accumulator is nonnegative, take next instruction from left half of M(X)
                      00010000  JUMP+ M(X,20:39)         If number in the accumulator is nonnegative, take next instruction from right half of M(X)
Arithmetic            00000101  ADD M(X)                 Add M(X) to AC; put the result in AC
                      00000111  ADD |M(X)|               Add |M(X)| to AC; put the result in AC
                      00000110  SUB M(X)                 Subtract M(X) from AC; put the result in AC
                      00001000  SUB |M(X)|               Subtract |M(X)| from AC; put the remainder in AC
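As a rough illustration of the IAS instruction word format (two 20-bit instructions per 40-bit word, each with an 8-bit opcode and a 12-bit address), the following Python sketch splits a word into its left and right instructions. The function name `decode_ias_word`, the `OPCODES` dictionary, and the sample word are hypothetical; bit 0 is assumed to be the most significant bit, and only a few opcodes from Table 1.1 are included.

```python
# Hypothetical helper: split a 40-bit IAS word into two (opcode, address) pairs.
# Assumes bit 0 is the most significant bit of the word.

OPCODES = {  # a few entries copied from Table 1.1
    0b00000001: "LOAD M(X)",
    0b00100001: "STOR M(X)",
    0b00000101: "ADD M(X)",
    0b00000110: "SUB M(X)",
}

def decode_ias_word(word):
    """Return ((left_opcode, left_addr), (right_opcode, right_addr))."""
    if not 0 <= word < (1 << 40):
        raise ValueError("IAS words are 40 bits wide")
    left = (word >> 20) & 0xFFFFF    # bits 0..19
    right = word & 0xFFFFF           # bits 20..39

    def split(instr):
        # 8-bit opcode followed by 12-bit address
        return (instr >> 12) & 0xFF, instr & 0xFFF

    return split(left), split(right)

# Sample word (made up): LOAD M(0x0FA) on the left, ADD M(0x0FB) on the right.
sample = (0x01 << 32) | (0x0FA << 20) | (0x05 << 12) | 0x0FB
for opcode, addr in decode_ias_word(sample):
    print(OPCODES.get(opcode, "unknown"), hex(addr))
```

Packing two instructions per word is why the PC addresses instruction pairs and the IBR holds the right-hand instruction while the left one executes.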
Generation  Approximate Dates  Technology                          Typical Speed (operations per second)
1           1946–1957          Vacuum tube                         40,000
2           1957–1964          Transistor                          200,000
3           1965–1971          Small and medium scale integration  1,000,000
4           1972–1977          Large scale integration             10,000,000
5           1978–1991          Very large scale integration        100,000,000
6           1991-              Ultra large scale integration       >1,000,000,000

Second Generation Computers
Introduced:
• More complex arithmetic and logic units and control units
• The use of high-level programming languages
• Provision of system software which provided the ability to:
  • Load programs
  • Move data to peripherals
  • Libraries perform common computations
Figure labels (IBM 7094 configuration): IBM 7094 computer with CPU, memory, and data channels; peripheral devices: mag tape units, card punch, line printer, card reader, teleprocessing equipment
History of Computers
Third Generation: Integrated Circuits
• Manufacturing process was expensive and cumbersome
• The two most important members of the third generation were the IBM System/360 and the DEC PDP-8
Figure 1.11 Relationship Among Wafer, Chip, and Gate (labels: wafer, chip, gate, packaged chip)
Figure 1.12 Growth in Transistor Count on Integrated Circuits (DRAM memory) (axes: year, 1947 to 2011; transistor count, 1 to 100 bn, logarithmic scale)
Moore’s Law
1965; Gordon Moore – co-founder of Intel
Observed number of transistors that could be put on a single chip was doubling every year
The pace slowed to a doubling every 18 months in the 1970s but has sustained that rate ever since
Consequences of Moore’s law:
• The cost of computer logic and memory circuitry has fallen at a dramatic rate
• The electrical path length is shortened, increasing operating speed
• Computer becomes smaller and is more convenient to use in a variety of environments
• Reduction in power and cooling requirements
• Fewer interchip connections

IBM System/360
• Announced in 1964
• Product line was incompatible with older IBM machines
• Was the success of the decade and cemented IBM as the overwhelmingly dominant computer vendor
• The architecture remains to this day the architecture of IBM’s mainframe computers
• Was the industry’s first planned family of computers
• Models were compatible in the sense that a program written for one model should be capable of being executed by another model in the series
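To get a feel for the doubling rates quoted for Moore's law (every year at first, every 18 months later), the sketch below computes the growth factor implied by a fixed doubling period. The function name `growth_factor` and the ten-year horizon are illustrative assumptions, not figures from the slides.

```python
def growth_factor(years, doubling_period_years):
    """Factor by which a count grows if it doubles once per period."""
    return 2 ** (years / doubling_period_years)

# Doubling every year versus every 18 months, over an assumed 10-year span.
yearly = growth_factor(10, 1.0)
slower = growth_factor(10, 1.5)
print(f"{yearly:.0f}x vs {slower:.0f}x")
```

Even at the slower rate the count grows by roughly two orders of magnitude per decade, which is what drives the consequences listed for Moore's law.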
Omnibus
LSI: Large Scale Integration
• More than 1000 components can be placed on a single integrated circuit chip
VLSI: Very Large Scale Integration
• 10,000 components per chip
ULSI: Ultra Large Scale Integration
• One billion components

Semiconductor Memory
Microprocessors

Semiconductor Memory
In 1970 Fairchild produced the first relatively capacious semiconductor memory
• Chip was about the size of a single core
• Could hold 256 bits of memory
• Non-destructive
• Much faster than core
In 1974 the price per bit of semiconductor memory dropped below the price per bit of core memory
Since 1970 semiconductor memory has been through 13 generations
Each generation has provided four times the storage density of the previous generation, accompanied by declining cost per bit and declining access time
The Internet of Things (IoT)
Term that refers to the expanding interconnection of smart devices, ranging from appliances to tiny sensors
Embedded Operating Systems
Application Processors versus Dedicated Processors
Cloud Computing
Figure labels: hardware AES, A/D converter, D/A converter, peripheral bus
Cloud Networking
Refers to the networks and network management functionality that must
be in place to enable cloud computing
Cloud Storage
Subset of cloud computing
Chapter 3
Function and Interconnection
regard to the type of data contained there
Execution occurs in a sequential fashion (unless explicitly modified) from one instruction to the next
The result of the process of connecting the various components in the desired configuration
Software
• A sequence of codes or instructions
• Part of the hardware interprets each instruction and generates control signals
• Provide a new sequence of codes for each new program instead of rewiring the hardware

Major components:
• CPU
  • Instruction interpreter
  • Module of general-purpose arithmetic and logic functions
• I/O Components
  • Input module
    • Contains basic components for accepting data and instructions and converting them into an internal form of signals usable by the system
  • Output module
    • Means of reporting results

Figure 3.1 Hardware and Software Approaches
(a) Programming in hardware: data, sequence of arithmetic and logic functions, results
(b) Programming in software: instruction codes, instruction interpreter, control signals, general-purpose arithmetic and logic functions, data, results
CPU Main Memory
0
System 1
2
PC MAR Bus
Memory Memory buffer Instruction
Fetch Cycle Execute Cycle At the beginning of each instruction cycle the processor
fetches an instruction from memory
Action Categories: Processor-memory, Processor-I/O
Instruction format: Opcode (bits 0 to 3), Address (bits 4 to 15)
Program Counter (PC) = Address of instruction
Instruction Register (IR) = Instruction being executed
Accumulator (AC) = Temporary storage
Figure labels (example of program execution; memory and register contents as shown in the figure):
Memory: 300 = 1940, 301 = 5941, 302 = 2941, 940 = 0003, 941 = 0002
Step 1: PC = 300, IR = 1940
Step 2: PC = 301, AC = 0003, IR = 1940
Step 3: PC = 301, AC = 0003, IR = 5941
Step 4: PC = 302, AC = 0005, IR = 5941 (3 + 2 = 5)
Step 5: PC = 302, AC = 0005, IR = 2941
Step 6: PC = 303, AC = 0005, IR = 2941, memory location 941 = 0005
Figure labels (instruction cycle state diagram): instruction address calculation, instruction fetch, instruction operation decoding, operand address calculation, operand fetch (multiple operands), data operation, operand address calculation, operand store (multiple results); return for string or vector data; instruction complete, fetch next instruction
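The six steps of the program execution example can be reproduced with a small simulation. This is only a sketch under stated assumptions: instructions are 16 bits with a 4-bit opcode and a 12-bit address, all values are hexadecimal, and the opcode meanings (1 = load AC from memory, 5 = add memory to AC, 2 = store AC to memory) are inferred from the register trace rather than stated in the slides. The function name `run` is hypothetical.

```python
# Hypothetical simulator for the three-instruction example; values are hexadecimal.
LOAD, STORE, ADD = 0x1, 0x2, 0x5   # opcode meanings inferred from the trace

def run(memory, pc, count):
    """Execute `count` instructions starting at `pc`; return (pc, ac, ir)."""
    ac = ir = 0
    for _ in range(count):
        ir = memory[pc]            # fetch cycle: IR <- M[PC]
        pc += 1                    # PC <- PC + 1
        opcode, address = ir >> 12, ir & 0xFFF
        if opcode == LOAD:         # execute cycle
            ac = memory[address]
        elif opcode == ADD:
            ac = (ac + memory[address]) & 0xFFFF
        elif opcode == STORE:
            memory[address] = ac
        else:
            raise ValueError(f"unsupported opcode {opcode:#x}")
    return pc, ac, ir

memory = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,
          0x940: 0x0003, 0x941: 0x0002}
pc, ac, ir = run(memory, 0x300, 3)
print(hex(pc), hex(ac), hex(ir), hex(memory[0x941]))
```

Each loop iteration corresponds to two steps of the figure: the odd-numbered step is the fetch, the even-numbered step is the execute.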
I/O: Generated by an I/O controller, to signal normal completion of an operation, request service from the processor, or to signal a variety of error conditions.
Hardware failure: Generated by a failure such as power failure or memory parity error.
Figure labels (program flow of control): code segments 1 to 5, WRITE, END, HALT
Figure 3.10 Program Timing: Short I/O Wait
Figure 3.11 Program Timing: Long I/O Wait
Figure labels: (a) Sequential interrupt processing; interrupt handler
Figure labels (time sequence of interrupts): user program, printer interrupt service routine, communication interrupt service routine, t = 0
• Processor reads an instruction or a unit of data from memory
• Processor writes a unit of data to memory
• Processor reads data from an I/O device via an I/O module
• Processor sends data to the I/O device
• An I/O module is allowed to exchange data directly with memory without going through the processor using direct memory access
Figure 3.15 Computer Modules (labels: memory, I/O module with M ports, CPU; address, data, instructions, control signals, interrupt signals, internal data, external data)
Figure 3.17 Multicore Configuration Using QPI (labels: Core A, Core B, Core C, Core D, DRAM, I/O Hub, I/O device)
Figure 3.18 QPI Layers (labels: Protocol, packets; Routing; Link, flits; Physical, phits)
COMPONENT A
Intel QuickPath Interconnect Port
Fwd Clk
Rcv Clk
Transmission Lanes Reception Lanes
#2n+1 #n+1 #1 QPI
lane 0
Fwd Clk
Rcv Clk
A popular high bandwidth, processor independent bus that can function as a mezzanine or peripheral bus
Figure labels: core, chipset, memory, Gigabit Ethernet, PCIe, PCIe–PCI bridge
Figure 3.22 PCIe Protocol Layers
Figure 3.23 PCIe Multilane Distribution (labels: 128b/130b, PCIe lane 2, B6, B2)
• Receives read and write requests from the software above the TL and creates request packets for transmission to a destination via the link layer
• Most transactions use a split transaction technique
  • A request packet is sent out by a source PCIe device which then waits for a response called a completion packet
• TL messages and some write transactions are posted transactions (meaning that no response is expected)
• TL packet format supports 32-bit memory addressing and extended 64-bit memory addressing
Figure labels: (a) Transmitter: scrambler (8b), 128b/130b encoding (128b to 130b), parallel to serial (1b), differential driver (D+, D–); (b) Receiver: differential receiver (D+, D–), clock recovery circuit, data recovery circuit, serial to parallel (1b to 130b), 128b/130b decoding (128b), descrambler (8b)
Memory
• The memory space includes system main memory and PCIe I/O devices
• Certain ranges of memory addresses map into I/O devices
I/O
• This address space is used for legacy PCI devices, with reserved address ranges used to address legacy I/O devices
Configuration
• This address space enables the TL to read/write configuration registers associated with I/O devices
Message
• This address space is for control signals related to interrupts, error handling, and power management

Table 3.2 PCIe TLP Transaction Types

Address Space               TLP Type                     Purpose
Memory                      Memory Read Request          Transfer data to or from a location in the system memory map.
                            Memory Read Lock Request
                            Memory Write Request
I/O                         I/O Read Request             Transfer data to or from a location in the system memory map for legacy devices.
                            I/O Write Request
Configuration               Config Type 0 Read Request   Transfer data to or from a location in the configuration space of a PCIe device.
                            Config Type 0 Write Request
                            Config Type 1 Read Request
                            Config Type 1 Write Request
Message                     Message Request              Provides in-band messaging and event reporting.
                            Message Request with Data
Memory, I/O, Configuration  Completion                   Returned for certain requests.
                            Completion with Data
                            Completion Locked
                            Completion Locked with Data
Number
of octets
1 STP framing 1 Start
Appended by PL
2 Sequence number
DLLP
Created
by DLL
4
Chapter 3
Figure 3.25 PCIe Protocol Data Unit Format PCIe data link layer
RAM technology is divided into two technologies:
Presence or absence of charge in a capacitor is interpreted as a binary 1 or 0
Requires periodic charge refreshing to maintain data storage
Figure labels: (a) Dynamic RAM (DRAM) cell: address line, transistor, storage capacitor, bit line B; (b) Static RAM (SRAM) cell: dc voltage, transistors T3 to T6, nodes C1 and C2, address line, bit lines
Static RAM Both volatile
(SRAM) Power must be continuously supplied to the
memory to preserve the bit values
Static
Faster
Used for cache memory (both on and off chip)
Read-Only Memory (ROM)
• No power source is required to maintain the bit values in memory
• Data or program is permanently in main memory and never needs to be loaded from a secondary storage device
• Data is actually wired into the chip as part of the fabrication process
• Disadvantages of this:
  • No room for error; if one bit is wrong, the whole batch of ROMs must be thrown out
  • Data insertion step includes a relatively large fixed cost

Programmable ROM (PROM)
• Nonvolatile and may be written into only once
• Writing process is performed electrically and may be performed by supplier or customer at a time later than the original chip fabrication
• Special equipment is required for the writing process
• Provides flexibility and convenience
• Attractive for high volume production runs
EPROM
• Erasable programmable read-only memory
• Erasure process can be performed repeatedly
• More expensive than PROM but it has the advantage of the multiple update capability

EEPROM
• Electrically erasable programmable read-only memory
• Can be written into at any time without erasing prior contents
• Combines the advantage of non-volatility with the flexibility of being updatable in place
• More expensive than EPROM

Flash Memory
• Intermediate between EPROM and EEPROM in both cost and functionality
• Uses an electrical erasing technology, does not provide byte-level erasure
• Microchip is organized so that a section of memory cells are erased in a single action or “flash”

Figure labels (DRAM organization): RAS, CAS, WE, OE; address inputs A0 to A10; row address buffer, row decoder, column address buffer, column decoder, refresh counter, MUX, refresh circuitry; memory array (2048 × 2048 × 4); data input buffer, data output buffer, D1 to D4
Figure labels (chip packaging): (a) 8 Mbit EPROM; (b) 16 Mbit DRAM
Figure labels (memory module organization): memory address register (MAR), decode 1 of 512, 512 bits, Chip #1 to Chip #8, decode 1 of 512 bit-sense
If consecutive words of
memory are stored in different
banks, the transfer of a block of
memory is speeded up
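A minimal sketch of how consecutive words can be spread across banks: the bank is taken from the low-order part of the address, so neighbouring addresses fall in different banks. The function name `bank_of`, the choice of low-order interleaving, and the bank count of 4 are assumptions for illustration only.

```python
def bank_of(address, num_banks):
    """Low-order interleaving: returns (bank, offset within that bank)."""
    return address % num_banks, address // num_banks

# Eight consecutive word addresses spread over 4 banks (both numbers assumed).
for addr in range(8):
    print(addr, bank_of(addr, 4))
```

Because a block of consecutive words touches every bank in turn, the banks can work on their parts of the block at the same time.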
Hard Failure
• Permanent physical defect
• Memory cell or cells affected cannot reliably store data but become stuck at 0 or 1 or switch erratically between 0 and 1
• Can be caused by:
  • Harsh environmental abuse
  • Manufacturing defects
  • Wear

Soft Error
• Random, non-destructive event that alters the contents of one or more memory cells
• No permanent damage to memory
• Can be caused by:
  • Power supply problems
  • Alpha particles

Figure 5.7 Error-Correcting Code Function (labels: Data In, M data bits, function f, K check bits, Memory, Compare, Corrector, Data Out)
Table 5.2 Increase in Word Length with Error Correction
Figure labels: panels (a) and (b), circles A, B, and C
Figure 5.9 Layout of Data Bits and Check Bits

Bit position     12    11    10    9     8     7     6     5     4     3     2     1
Position number  1100  1011  1010  1001  1000  0111  0110  0101  0100  0011  0010  0001
Data bit         D8    D7    D6    D5          D4    D3    D2          D1
Check bit                                C8                      C4          C2    C1

Check bit calculation example (same layout):

Bit position     12    11    10    9     8     7     6     5     4     3     2     1
Word stored as   0     0     1     1     0     1     0     0     1     1     1     1
Word fetched as  0     0     1     1     0     1     1     0     1     1     1     1
Check bit                                0                       0           0     1
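The check bit example can be replayed with a short Hamming-code sketch for the 12-bit layout of Figure 5.9. The function names are hypothetical, and the syndrome is computed here as the XOR of the position numbers that hold a 1, which for this layout yields the same value as comparing stored and recomputed check bits; treat it as an illustration rather than the textbook's procedure.

```python
# Sketch of a single-error-correcting Hamming code for the 12-bit layout.
# Positions 1, 2, 4, 8 hold check bits; the rest hold data bits D1..D8.
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]   # positions of D1..D8
CHECK_POSITIONS = [1, 2, 4, 8]

def encode(data_bits):
    """data_bits: [D1, ..., D8]. Returns dict position -> bit (positions 1..12)."""
    word = dict(zip(DATA_POSITIONS, data_bits))
    for c in CHECK_POSITIONS:
        # Check bit c is the XOR of all data bits whose position has bit c set.
        word[c] = 0
        for p in DATA_POSITIONS:
            if p & c:
                word[c] ^= word[p]
    return word

def syndrome(word):
    """XOR of the positions holding a 1; zero means no error detected."""
    s = 0
    for position, bit in word.items():
        if bit:
            s ^= position
    return s

def as_string(word):
    """Bits from position 12 down to position 1."""
    return "".join(str(word[p]) for p in range(12, 0, -1))

stored = encode([1, 0, 0, 1, 1, 1, 0, 0])   # D1..D8 for data word 00111001
fetched = dict(stored)
fetched[6] ^= 1                              # flip position 6 (D3), as in the example
print(as_string(stored), as_string(fetched), syndrome(fetched))
```

A nonzero syndrome names the position of the flipped bit, so correcting the word amounts to inverting that one position.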
Advanced DRAM Organization
• One of the most critical system bottlenecks when using high-performance processors is the interface to main internal memory
• SDRAM
• DDR-DRAM
Figure labels: panels (d), (e), (f)
Table 5.3 SDRAM Pin Assignment

A0 to A12  Address inputs
BA0, BA1   Bank address lines
CLK        Clock input
CS         Chip select
RAS        Row address strobe

Figure labels (SDRAM read timing): clock cycles T0 to T8; COMMAND: READ A followed by NOPs; DQs: DOUT A0, DOUT A1, DOUT A2, DOUT A3
DDR1 DDR2 DDR3 DDR4
Flash Memory
Uses only one transistor per bit so it achieves the high density of
EPROM
Figure labels (flash memory operation): control gate, floating gate, N+ drain, N+ source, P-substrate; (b) Flash memory cell in one state; (c) Flash memory cell in zero state
Figure labels (flash memory characteristics): (a) NOR; (b) NAND; axes: active power, code execution, capacity, read speed, write speed, rated from low to high and easy to hard
Figure labels (memory hierarchy): DRAM, PCRAM, NAND FLASH, ReRAM, HARD DISK; decreasing cost per bit, increasing capacity or density
Summary
Internal Memory
Figure labels (cell storing binary 0 and binary 1): free layer, interface layer, insulating layer, reference layer, perpendicular magnetic layer, direction of magnetization, electric current
Table 4.1
Key Characteristics of Computer Memory Systems
Unit of transfer
• For internal memory the unit of transfer is equal to the number of electrical lines into and out of the memory module
Method of Accessing Units of Data
Capacity and Performance:
Memory
The most common forms are:
• Semiconductor memory
• Magnetic surface memory
• Optical
• Magneto-optical
Several physical characteristics of data storage are important:
• Volatile memory
  • Information decays naturally or is lost when electrical power is switched off
• Nonvolatile memory
  • Once recorded, information remains without deterioration until deliberately changed
  • No electrical power is needed to retain information
• Semiconductor memory
  • May be either volatile or nonvolatile
• Nonerasable memory
  • Cannot be altered, except by destroying the storage unit
  • Semiconductor memory of this type is known as read-only memory (ROM)
For random-access memory the organization is a key design issue
• Organization refers to the physical arrangement of bits to form words

Example
Suppose that the processor has access to two levels of memory. Level 1 contains 1000 words and has an access time of 0.1 microseconds; level 2 contains 100,000 words and has an access time of 1 microsecond. Assume that if a word to be accessed is in level 1, then the processor accesses it directly. If it is in level 2, then the word is first transferred to level 1 and then accessed by the processor. For simplicity, ignore the time required for the processor to determine whether the word is in level 1 or level 2.
Suppose 95% of the memory accesses are found in cache (level 1). Calculate the average time required to access a word.
Average time to access a word = (0.95)(0.1 microsec.) + (0.05)(0.1 microsec. + 1 microsec.)
= 0.095 + 0.055
= 0.15 microsec.
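The same calculation can be written as a small function, which makes it easy to try other hit ratios. The function name `average_access_time` and its parameter names are illustrative; the formula is the one used in the example, where a miss costs the level 2 access plus the level 1 access.

```python
def average_access_time(hit_ratio, t1, t2):
    """Two-level memory: a hit costs t1, a miss costs t1 + t2."""
    return hit_ratio * t1 + (1 - hit_ratio) * (t1 + t2)

# Values from the example, in microseconds.
print(round(average_access_time(0.95, 0.1, 1.0), 3))
```

Lowering the hit ratio in this sketch shows how quickly the average approaches the level 2 access time.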
Design constraints on a computer’s memory can be summed up by three questions:
• How much, how fast, how expensive
Figure 4.2 Performance of a Simple Two-Level Memory (horizontal axis: fraction of accesses involving only Level 1 (hit ratio), from 0 to 1)
Disk cache
• A portion of main memory can be used as a buffer to hold data temporarily that is to be read out to disk
• A few large transfers of data can be used instead of many small transfers of data
• Data can be retrieved rapidly from the software cache rather than slowly from the disk
Figure labels: (a) Single cache: CPU, cache (fast), main memory (slow); word transfer, block transfer
Figure labels: (a) Cache: line numbers 0 to C–1, tag, block; block length (K words); main memory: memory address, Block 0 (K words)
Figure 4.5 Cache Read Operation (labels: START; receive address RA from CPU; block containing RA in cache?; yes: fetch RA word and deliver to CPU; otherwise: allocate cache line for main memory block, load main memory block into cache line, deliver RA word to CPU; DONE)
Figure 4.6 Typical Cache Organization (labels: processor, cache, system bus, address, address buffer, control, data, data buffer)
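The read flow of Figure 4.5 can be sketched in a few lines. This is an illustration under assumptions, not the textbook's design: the cache is direct-mapped, `BLOCK_WORDS` and `NUM_LINES` are arbitrary small values, the class name `Cache` is hypothetical, and on a miss the block is loaded before the word is delivered, whereas hardware can overlap those two actions.

```python
# Hypothetical direct-mapped cache illustrating the read flow of Figure 4.5.
BLOCK_WORDS = 4      # words per block (assumption)
NUM_LINES = 8        # number of cache lines (assumption)

class Cache:
    def __init__(self, main_memory):
        self.main_memory = main_memory   # list of words
        self.lines = {}                  # line number -> (tag, block)
        self.hits = self.misses = 0

    def read(self, ra):
        """Return the word at address RA, loading its block on a miss."""
        block_number, word = divmod(ra, BLOCK_WORDS)
        tag, line = divmod(block_number, NUM_LINES)
        entry = self.lines.get(line)
        if entry is not None and entry[0] == tag:    # block is in cache
            self.hits += 1
        else:                                        # access main memory
            self.misses += 1
            start = block_number * BLOCK_WORDS
            block = self.main_memory[start:start + BLOCK_WORDS]
            self.lines[line] = (tag, block)          # allocate and load line
        return self.lines[line][1][word]             # deliver RA word to CPU

cache = Cache(list(range(100, 228)))     # 128 words of sample data
values = [cache.read(a) for a in (5, 6, 5, 37, 5)]
print(values, cache.hits, cache.misses)
```

Addresses 5 and 37 fall in different blocks that share a cache line here, so alternating between them forces a reload each time.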
Table 4.2
Elements of Cache Design

Cache Addresses
• Logical
• Physical
Cache Size
Mapping Function
• Direct
• Associative
• Set Associative
Replacement Algorithm
• Least recently used (LRU)
• First in first out (FIFO)
• Least frequently used (LFU)
• Random
Write Policy
• Write through
• Write back
Line Size
Number of caches
• Single or two level
• Unified or split

Cache Addresses
Virtual Memory
Virtual memory
• Facility that allows programs to address memory from a logical point of view, without regard to the amount of main memory physically available
• When used, the address fields of machine instructions contain virtual addresses
• For reads to and writes from main memory, a hardware memory management unit (MMU) translates each virtual address into a physical address in main memory
Figure 4.7 Logical and Physical Caches
(a) Logical Cache (labels: processor, logical address, cache, MMU, physical address, main memory, data)
(b) Physical Cache (labels: processor, logical address, MMU, physical address, cache, main memory, data)

Table 4.3 Cache Sizes of Some Processors

Processor            Type                             Year of Introduction  L1 Cache (a)         L2 Cache        L3 Cache
IBM 360/85           Mainframe                        1968                  16 to 32 kB          -               -
PDP-11/70            Minicomputer                     1975                  1 kB                 -               -
VAX 11/780           Minicomputer                     1978                  16 kB                -               -
IBM 3033             Mainframe                        1978                  64 kB                -               -
IBM 3090             Mainframe                        1985                  128 to 256 kB        -               -
Intel 80486          PC                               1989                  8 kB                 -               -
Pentium              PC                               1993                  8 kB/8 kB            256 to 512 KB   -
PowerPC 601          PC                               1993                  32 kB                -               -
PowerPC 620          PC                               1996                  32 kB/32 kB          -               -
PowerPC G4           PC/server                        1999                  32 kB/32 kB          256 KB to 1 MB  2 MB
IBM S/390 G6         Mainframe                        1999                  256 kB               8 MB            -
Pentium 4            PC/server                        2000                  8 kB/8 kB            256 KB          -
IBM SP               High-end server/supercomputer    2000                  64 kB/32 kB          8 MB            -
CRAY MTA (b)         Supercomputer                    2000                  8 kB                 2 MB            -
Itanium              PC/server                        2001                  16 kB/16 kB          96 KB           4 MB
Itanium 2            PC/server                        2002                  32 kB                256 KB          6 MB
IBM POWER5           High-end server                  2003                  64 kB                1.9 MB          36 MB
CRAY XD-1            Supercomputer                    2004                  64 kB/64 kB          1 MB            -
IBM POWER6           PC/server                        2007                  64 kB/64 kB          4 MB            32 MB
IBM z10              Mainframe                        2008                  64 kB/128 kB         3 MB            24-48 MB
Intel Core i7 EE 990 Workstation/server               2011                  6 × 32 kB/32 kB      1.5 MB          12 MB
IBM zEnterprise 196  Mainframe/Server                 2011                  24 × 64 kB/128 kB    24 × 1.5 MB     24 MB L3, 192 MB L4

(a) Two values separated by a slash refer to instruction and data caches.
(b) Both caches are instruction only; no data caches.
(Table can be found on page 134 in the textbook.)
Because there are fewer cache lines than main memory blocks, an algorithm is needed for mapping main memory blocks into cache lines
Three techniques can be used:
Figure labels: (a) Direct mapping: first m blocks of main memory (equal to size of cache), B0 to Bm–1, and cache memory with m lines, L0 to Lm–1; b = length of block in bits; t = length of tag in bits. (b) Associative mapping: one block of main memory and cache memory with lines up to Lm–1
+
Direct Mapping
Address is in two parts
Least significant w bits identify unique word or byte within a block of main memory
[Figure: direct-mapping cache organization. The (s + w)-bit memory address is split into Tag (s – r bits), Line (r bits), and Word (w bits). The tag stored in the selected cache line Li is compared with the tag of the address: 1 if match (hit in cache), 0 if no match. Main memory block Bj holds words W4j to W(4j+3)]
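The Tag / Line / Word split used by direct mapping can be sketched in a few lines. This is a minimal illustration, not the textbook's code: the function name is hypothetical, and the field widths in the usage note (r = 14 line bits, w = 2 word bits of a 24-bit address) are assumptions chosen to be consistent with the direct mapping example in these slides.

```python
def split_direct_mapped(address: int, r: int, w: int) -> tuple[int, int, int]:
    """Split an address into (tag, line, word) for a direct-mapped cache.

    r = number of line bits, w = number of word bits; the remaining
    high-order s - r bits form the tag.
    """
    word = address & ((1 << w) - 1)          # least significant w bits
    line = (address >> w) & ((1 << r) - 1)   # next r bits select the cache line
    tag = address >> (w + r)                 # remaining high-order bits
    return tag, line, word


# Address 0001 0110 0000 0000 0000 0100 (binary) = 0x160004
tag, line, word = split_direct_mapped(0x160004, r=14, w=2)
print(f"tag={tag:02X} line={line:04X} word={word}")
```

With these assumed widths, address 0x160004 yields tag 16 and line 0001, which agrees with the cache entry shown for data 11235813 in the example.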
Address length = (s + w) bits
[Figure: direct mapping example. Main memory address (binary) = Tag + Line + Word]
Tag (hex) | Main memory address (binary) | Data
00 | 000000000000000000000000 | 13579246
00 | 000000000000000000000100 |
00 | 000000001111111111111000 |
00 | 000000001111111111111100 |
16 | 000101100000000000000000 | 77777777
16 | 000101100000000000000100 | 11235813
Cache lines shown: line 0000 holds tag 00, data 13579246; line 0001 holds tag 16, data 11235813
Figure 4.11 Fully Associative Cache Organization
[Figure: the s-bit tag of the memory address is compared with the tag of every cache line up to Lm–1: 1 if match (hit in cache), 0 if no match. Main memory block Bj ends at word W(4j+3)]
Figure 4.12 Associative Mapping Example
Main memory address = Tag (22 bits) + Word (2 bits); data words are 32 bits
Tag (hex) | Main memory address (binary) | Data
3FFFFD | 111111111111111111110100 | 33333333
3FFFFE | 111111111111111111111000 | 11223344
3FFFFF | 111111111111111111111100 | 24682468
Note: Memory address values are in binary representation; other values are in hexadecimal
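The associative mapping example can be checked with a short sketch. It assumes the 22-bit tag and 2-bit word fields stated for Figure 4.12; the function names and the list-of-pairs cache model are hypothetical, and a real cache compares all tags in parallel in hardware rather than in a loop.

```python
WORD_BITS = 2  # assumed from "Tag 22 bits, Word 2 bits" in Figure 4.12


def split_associative(address: int) -> tuple[int, int]:
    """Return (tag, word) for a fully associative cache."""
    return address >> WORD_BITS, address & ((1 << WORD_BITS) - 1)


def lookup(cache_lines, address: int):
    """Search every line for a matching tag; return the data or None on a miss."""
    tag, _word = split_associative(address)
    for line_tag, data in cache_lines:  # sequential stand-in for parallel compare
        if line_tag == tag:
            return data
    return None


cache_lines = [(0x3FFFFD, 0x33333333), (0x3FFFFE, 0x11223344), (0x3FFFFF, 0x24682468)]
print(hex(lookup(cache_lines, 0xFFFFFC)))
```

Because any block can occupy any line, the tag must carry all s bits of the block number, which is why it is so much wider here than in direct mapping.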
Associative mapping summary:
Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of blocks in main memory = 2^(s+w)/2^w = 2^s
Number of lines in cache = undetermined
Size of tag = s bits
+
Set Associative Mapping
Compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages
Cache consists of a number of sets
Each set contains a number of lines
A given block maps to any line in a given set
e.g. 2 lines per set
2 way associative mapping
A given block can be in one of 2 lines in only one set
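[Figure: mapping from main memory to cache, k-way set associative. Cache memory set 0 holds k lines L0 to Lk–1. (b) k direct-mapped caches: the first v blocks of main memory, B0 to Bv–1, map to the v lines L0 to Lv–1 of one way]
[Figure: k-way set-associative cache organization. The (s + w)-bit memory address is split into Tag (s – d bits), Set (d bits), and Word (w bits). The tag is compared with the tags of the k lines of the selected set (F0 to Fk–1 for set 0, Fk to F2k–1 for set 1): 1 if match (hit in cache), 0 if no match (miss in cache)]
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
[Figure: set-associative mapping example]
Tag (hex) | Main memory address (binary) | Data
000 | 000000001111111111111000 |
000 | 000000001111111111111100 |
02C | 000101100000000000000000 | 77777777
02C | 000101100000000000000100 | 11235813
02C | 000101100011001110011100 | FEDCBA98
Cache sets shown: set 0000 holds tag 000 (data 13579246) and tag 02C (data 77777777); set 0001 holds tag 02C (data 11235813); set 0CE7 holds tag 02C (data FEDCBA98)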
[Figure: hit ratio (0.0 to 1.0) versus cache size in bytes (1k to 1M) for direct, 2-way, 4-way, 8-way, and 16-way caches]
+
Once the cache has been filled, when a new block is brought into the cache, one of the existing blocks must be replaced
For direct mapping there is only one possible line for any particular block and no choice is possible
For the associative and set-associative techniques a replacement algorithm is needed
To achieve high speed, an algorithm must be implemented in hardware
First-in-first-out (FIFO)
Replace that block in the set that has been in the cache longest
Easily implemented as a round-robin or circular buffer technique
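The round-robin idea behind FIFO replacement can be sketched as follows. This is an illustrative software model only (real caches implement it in hardware); the class and attribute names are hypothetical.

```python
class FifoSet:
    """One cache set with k lines and FIFO (round-robin) replacement."""

    def __init__(self, k: int):
        self.tags = [None] * k   # tag stored in each line, None = empty
        self.next_victim = 0     # circular pointer to the oldest line

    def access(self, tag) -> bool:
        """Return True on a hit; on a miss, replace the oldest line."""
        if tag in self.tags:
            return True
        self.tags[self.next_victim] = tag
        self.next_victim = (self.next_victim + 1) % len(self.tags)
        return False


two_way = FifoSet(2)
results = [two_way.access(t) for t in ["A", "B", "A", "C", "A"]]
print(results, two_way.tags)
```

Note that the hit on "A" does not move the pointer: FIFO tracks only how long a block has been resident, not how recently it was used, so "A" is still the first block evicted.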
Write through
Simplest technique
All write operations are made to main memory as well as to the cache
The main disadvantage of this technique is that it generates substantial memory traffic and may create a bottleneck
Write back
Minimizes memory writes
Updates are made only in the cache
Portions of main memory are invalid and hence accesses by I/O modules can be allowed only through the cache
This makes for complex circuitry and a potential bottleneck
When a block of data is retrieved and placed in the cache not only the desired word but also some number of adjacent words are retrieved
As the block size increases more useful data are brought into the cache
As the block size increases the hit ratio will at first increase because of the principle of locality
The hit ratio will begin to decrease as the block becomes bigger and the probability of using the newly fetched information becomes less than the probability of reusing the information that has to be replaced
Two specific effects come into play:
• Larger blocks reduce the number of blocks that fit into a cache
• As a block becomes larger each additional word is farther from the requested word
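The difference in memory traffic between write through and write back can be made concrete with a small counting model. This is a sketch under simplifying assumptions (one value per address, explicit eviction, no reads); the class name `ToyCache` and its methods are hypothetical.

```python
class ToyCache:
    """Counts main-memory writes under a write-through or write-back policy."""

    def __init__(self, write_back: bool):
        self.write_back = write_back
        self.lines = {}          # address -> cached value
        self.dirty = set()       # addresses modified only in the cache
        self.memory = {}         # stand-in for main memory
        self.memory_writes = 0

    def _write_memory(self, address, value):
        self.memory[address] = value
        self.memory_writes += 1

    def write(self, address, value):
        self.lines[address] = value
        if self.write_back:
            self.dirty.add(address)              # defer the memory update
        else:
            self._write_memory(address, value)   # write through immediately

    def evict(self, address):
        value = self.lines.pop(address)
        if address in self.dirty:                # write back only if modified
            self.dirty.discard(address)
            self._write_memory(address, value)


through, back = ToyCache(write_back=False), ToyCache(write_back=True)
for cache in (through, back):
    for value in (1, 2, 3):
        cache.write(0x100, value)
    cache.evict(0x100)
print(through.memory_writes, back.memory_writes)
```

Until the eviction, the write-back model's main memory does not hold the current value for address 0x100, which is the "portions of main memory are invalid" condition noted on the slide.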
+
Multilevel Caches
As logic density has increased it has become possible to have a cache on the same chip as the processor
The on-chip cache reduces the processor’s external bus activity, which speeds up execution and increases overall system performance
When the requested instruction or data is found in the on-chip cache, the bus access is eliminated
On-chip cache accesses will complete appreciably faster than would even zero-wait state bus cycles
During this period the bus is free to support other transfers
The use of multilevel caches complicates all of the design issues related to caches, including size, replacement algorithm, and write policy
Figure 4.17 Total Hit Ratio (L1 and L2) for 8 Kbyte and 16 Kbyte L1
[Figure: hit ratio (0.82 to 0.98) with one curve for L1 = 8k and one for L1 = 16k]
Has become common to split cache:
One dedicated to instructions
One dedicated to data
Both exist at the same level, typically as two L1 caches
Advantages of unified cache:
Higher hit rate
Table 4.4
Intel Cache
Problem: External memory slower than the system bus.
Solution: Add external cache using faster memory technology.
Processor on which feature first appears: 386
Problem: Increased processor speed results in external bus becoming a bottleneck for cache access.
Solution: Move external cache on-chip, operating at the same speed as the processor.
Processor on which feature first appears: 486
Problem: Internal cache is rather small, due to limited space on chip
Solution: Add external L2 cache using faster technology than main memory
Processor on which feature first appears: 486
Problem: Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache.
Solution: Create separate data and instruction caches.
Processor on which feature first appears: Pentium
+ Summary
Chapter 4
Cache Memory
Computer memory system overview
Characteristics of Memory Systems
Elements of cache design
Cache addresses
Cache size
+
Instruction Sets: Characteristics and Functions
Elements of a Machine Instruction
Instruction Operand Operand
fetch fetch store
Instruction Representation
Opcodes are represented by abbreviations called mnemonics
Examples include:
ADD Add
SUB Subtract
MUL Multiply
DIV Divide
LOAD Load data from memory
STOR Store data to memory
Operands are also represented symbolically
Each symbolic opcode has a fixed binary representation
The programmer specifies the location of each symbolic operand
+ Instruction Types
Data processing
• Arithmetic instructions provide computational capabilities for processing numeric data
• Logic (Boolean) instructions operate on the bits of a word as bits rather than as numbers, thus they provide capabilities for processing any other type of data the user may wish to employ
Data storage
• Movement of data into or out of register and or memory locations
Data movement
• I/O instructions are needed to transfer programs and data into memory and the results of computations back out to the user
Control
• Test instructions are used to test the value of a data word or the status of a computation
• Branch instructions are used to branch to a different set of instructions depending on the decision made
Figure 12.3 Programs to Execute Y = (A – B) / [C + (D × E)]
(a) Three-address instructions
Instruction | Comment
SUB Y, A, B | Y ← A – B
MPY T, D, E | T ← D × E
ADD T, T, C | T ← T + C
DIV Y, Y, T | Y ← Y ÷ T
(b) Two-address instructions
Instruction | Comment
MOVE Y, A | Y ← A
SUB Y, B | Y ← Y – B
MOVE T, D | T ← D
MPY T, E | T ← T × E
ADD T, C | T ← T + C
DIV Y, T | Y ← Y ÷ T
(c) One-address instructions
Instruction | Comment
LOAD D | AC ← D
MPY E | AC ← AC × E
ADD C | AC ← AC + C
STOR Y | Y ← AC
LOAD A | AC ← A
SUB B | AC ← AC – B
DIV Y | AC ← AC ÷ Y
STOR Y | Y ← AC
Table 12.1
Utilization of Instruction Addresses (Nonbranching Instructions)
Number of Addresses | Symbolic Representation | Interpretation
3 | OP A, B, C | A ← B OP C
2 | OP A, B | A ← A OP B
1 | OP A | AC ← AC OP A
0 | OP | T ← (T – 1) OP T
AC = accumulator
T = top of stack
(T – 1) = second element of stack
A, B, C = memory or register locations
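The one-address program can be traced with a toy accumulator machine. This is a sketch for illustration: the interpreter, its tuple encoding of instructions, and the sample operand values are all hypothetical; only the instruction sequence comes from the one-address listing.

```python
def run_one_address(program, memory):
    """Execute one-address instructions against a single accumulator (AC)."""
    ac = 0
    for opcode, operand in program:
        if opcode == "LOAD":
            ac = memory[operand]          # AC <- operand
        elif opcode == "STOR":
            memory[operand] = ac          # operand <- AC
        elif opcode == "ADD":
            ac = ac + memory[operand]
        elif opcode == "SUB":
            ac = ac - memory[operand]
        elif opcode == "MPY":
            ac = ac * memory[operand]
        elif opcode == "DIV":
            ac = ac / memory[operand]
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return memory


program = [("LOAD", "D"), ("MPY", "E"), ("ADD", "C"), ("STOR", "Y"),
           ("LOAD", "A"), ("SUB", "B"), ("DIV", "Y"), ("STOR", "Y")]
memory = run_one_address(program, {"A": 10, "B": 4, "C": 1, "D": 1, "E": 2, "Y": 0})
print(memory["Y"])
```

The denominator C + (D × E) is computed first and parked in Y because the single accumulator cannot hold two intermediate results at once; this is the cost of having only one explicit address per instruction.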
Instruction Set Design
Very complex because it affects so many aspects of the computer system
All machine languages include numeric data types
Numbers stored in a computer are limited:
Limit to the magnitude of numbers representable on a machine
In the case of floating-point numbers, a limit to their precision
Three types of numerical data are common in computers:
Binary integer or binary fixed point
Binary floating point
Decimal
Packed decimal
Each decimal digit is represented by a 4-bit code with two digits stored per byte
To form numbers 4-bit codes are strung together, usually in multiples of 8 bits
A common form of data is text or character strings
Textual data in character form cannot be easily stored or transmitted by data processing and communications systems because they are designed for binary data
Most commonly used character code is the International Reference Alphabet (IRA)
Referred to in the United States as the American Standard Code for Information Interchange (ASCII)
Another code used to encode characters is the Extended Binary Coded Decimal Interchange Code (EBCDIC)
EBCDIC is used on IBM mainframes
An n-bit unit consisting of n 1-bit items of data, each item having the value 0 or 1
To manipulate the bits of a data item
If floating-point operations are implemented in software, we need to be able to shift significant bits in some operations
To convert from IRA to packed decimal, we need to extract the rightmost 4 bits of each byte
Table 12.2
Data Type | Description
General | Byte, word (16 bits), doubleword (32 bits), quadword (64 bits), and double quadword (128 bits) locations with arbitrary binary contents.
Integer | A signed binary value contained in a byte, word, or doubleword, using twos complement representation.
Ordinal | An unsigned integer contained in a byte, word, or doubleword.
Unpacked binary coded decimal (BCD) | A representation of a BCD digit in the range 0 through 9, with one digit in each byte.
Packed BCD | Packed byte representation of two BCD digits; value in the range 0 to 99.
Near pointer | A 16-bit, 32-bit, or 64-bit effective address that represents the
Bit field | A contiguous sequence of bits in which the position of each bit is considered as an independent unit. A bit string can begin at any bit position of any byte and can contain up to 32 bits.
Bit string | A contiguous sequence of bits, containing from zero to 2^32 – 1 bits.
Byte string | A contiguous sequence of bytes, words, or doublewords, containing from zero to 2^32 – 1 bytes.
Floating point | See Figure 12.4.
Packed SIMD (single instruction, multiple data) | Packed 64-bit and 128-bit data types
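The IRA-to-packed-decimal conversion mentioned above is a short bit-manipulation exercise. The sketch below assumes the input is a string of the digit characters 0 to 9 (IRA/ASCII codes 0x30 to 0x39) and pads odd-length input with a leading zero digit; the function name and the padding choice are assumptions, not something the slides prescribe.

```python
def ira_to_packed_decimal(text: str) -> bytes:
    """Pack a string of IRA/ASCII digit characters two digits per byte."""
    for ch in text:
        if not "0" <= ch <= "9":
            raise ValueError(f"not a decimal digit: {ch!r}")
    # The rightmost 4 bits of each character code are the digit's value.
    digits = [ord(ch) & 0x0F for ch in text]
    if len(digits) % 2:
        digits.insert(0, 0)  # assumed padding so digits pair up evenly
    return bytes((high << 4) | low for high, low in zip(digits[0::2], digits[1::2]))


print(ira_to_packed_decimal("1234").hex())
```

The masking step `ord(ch) & 0x0F` is the "extract the rightmost 4 bits" operation: the character "7" is code 0x37, and its low nibble is 7.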
Byte unsigned integer
7 0
ARM Data Types
ARM processors support data types of:
• 8 (byte)
• 16 (halfword)
• 32 (word) bits in length
For all three data types an unsigned interpretation is supported in which the value represents an unsigned, nonnegative integer
Unaligned access
• When this option is enabled, the processor uses one or more memory accesses to generate the required transfer of adjacent bytes transparently to the programmer
Figure 12.5 ARM Endian Support - Word Load/Store with E-bit
Data bytes in memory (ascending address values from byte 0 to byte 3) appear in the 32-bit word (bit 31 down to bit 0) as follows:
program status register E-bit = 0: Byte 3 Byte 2 Byte 1 Byte 0
program status register E-bit = 1: Byte 0 Byte 1 Byte 2 Byte 3
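The two byte orderings in Figure 12.5 can be reproduced with Python's built-in `int.from_bytes`. This is a sketch of the effect only: the `load_word` helper and its `e_bit` parameter are hypothetical names that model the program status register E-bit as a plain argument.

```python
def load_word(memory_bytes: bytes, e_bit: int) -> int:
    """Assemble a 32-bit word from 4 bytes at ascending addresses.

    e_bit = 0 models a little-endian load (byte 0 lands in bits 7..0);
    e_bit = 1 models a big-endian load (byte 0 lands in bits 31..24).
    """
    if len(memory_bytes) != 4:
        raise ValueError("a word load needs exactly 4 bytes")
    return int.from_bytes(memory_bytes, "big" if e_bit else "little")


data = bytes([0x11, 0x22, 0x33, 0x44])  # byte 0 .. byte 3
print(hex(load_word(data, 0)), hex(load_word(data, 1)))
```

The same four bytes in memory produce two different register values, which is why data exchanged between systems must agree on byte order.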
Common Instruction Set Operations (page 1 of 2)
(Table can be found on page 426 in textbook.)
Type: Data Transfer
Move (transfer): Transfer word or block from source to destination
Store: Transfer word from processor to memory
Load (fetch): Transfer word from memory to processor
Exchange: Swap contents of source and destination
Clear (reset): Transfer word of 0s to destination
Type: Arithmetic
Subtract: Compute difference of two operands
Multiply: Compute product of two operands
Divide: Compute quotient of two operands
Absolute: Replace operand by its absolute value
Negate: Change sign of operand
Increment: Add 1 to operand
Decrement: Subtract 1 from operand
Type: Logical
AND: Perform logical AND
OR: Perform logical OR
NOT (complement): Perform logical NOT
Exclusive-OR: Perform logical XOR
Test: Test specified condition; set flag(s) based on outcome
Compare: Make logical or arithmetic comparison of two or more operands; set flag(s) based on outcome
Set Control Variables: Class of instructions to set controls for protection purposes, interrupt handling, timer control, etc.
Shift: Left (right) shift operand, introducing constants at end
Rotate: Left (right) shift operand, with wraparound end
Common Instruction Set Operations (page 2 of 2)
(Table can be found on page 426 in textbook.)
Type: Transfer of Control
Jump (branch): Unconditional transfer; load PC with specified address
Jump Conditional: Test specified condition; either load PC with specified address or do nothing, based on condition
Jump to Subroutine: Place current program control information in known location; jump to specified address
Skip Conditional: Test specified condition; either skip or do nothing based on condition
Halt: Stop program execution
Wait (hold): Stop program execution; test specified condition repeatedly; resume execution when condition is satisfied
No operation: No operation is performed, but program execution is continued
Type: Input/Output
Input (read): Transfer data from specified I/O port or device to destination (e.g., main memory or processor register)
Output (write): Transfer data from specified source to I/O port or device
Start I/O: Transfer instructions to I/O processor to initiate I/O operation
Test I/O: Transfer status information from I/O system to specified destination
Type: Conversion
Translate: Translate values in a section of memory based on a table of correspondences
Convert: Convert the contents of a word from one form to another (e.g., packed decimal to binary)
Table 12.4
Processor Actions for Various Types of Operations
Data Transfer: Transfer data from one location to another. If memory is involved: determine memory address; perform virtual-to-actual-memory address transformation; check cache; initiate memory read/write
Arithmetic: May involve data transfer, before and/or after; perform function in ALU; set condition codes and flags
Logical: Same as arithmetic
Conversion: Similar to arithmetic and logical. May involve special logic to perform conversion
Transfer of Control: Update program counter. For subroutine call/return, manage parameter passing and linkage
I/O: Issue command to I/O module. If memory-mapped I/O, determine memory-mapped address
Data Transfer
Most fundamental type of machine instruction
Must specify:
• Location of the source and destination operands
• The length of data to be transferred must be indicated
• The mode of addressing for each operand must be specified
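Table 12.6
Basic Logical Operations
P | Q | NOT P | P AND Q | P OR Q | P XOR Q | P = Q
0 | 0 | 1 | 0 | 0 | 0 | 1
0 | 1 | 1 | 0 | 1 | 1 | 0
1 | 0 | 0 | 0 | 1 | 1 | 0
1 | 1 | 0 | 1 | 1 | 0 | 1
[Figure: shift and rotate operations. Panels recovered: (b) Logical left shift; (d) Arithmetic left shift, with sign bit S; (e) Right rotate]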
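The shift and rotate panels named above can be sketched for an 8-bit word. The word width, the function names, and the convention that an arithmetic left shift retains the sign bit while shifting the remaining bits are assumptions made for this illustration.

```python
WIDTH = 8                      # assumed word size for the example
MASK = (1 << WIDTH) - 1        # 0xFF
SIGN = 1 << (WIDTH - 1)        # 0x80


def logical_left_shift(x: int) -> int:
    """Shift all bits left by one; a 0 enters on the right, the top bit is lost."""
    return (x << 1) & MASK


def arithmetic_left_shift(x: int) -> int:
    """Keep the sign bit S in place and shift only the remaining bits left."""
    return (x & SIGN) | ((x << 1) & (MASK ^ SIGN))


def right_rotate(x: int) -> int:
    """Shift right by one; the bit leaving on the right re-enters on the left."""
    return (x >> 1) | ((x & 1) << (WIDTH - 1))


x = 0b10100110
for name, f in [("logical left", logical_left_shift),
                ("arithmetic left", arithmetic_left_shift),
                ("right rotate", right_rotate)]:
    print(f"{name:16s} {f(x):08b}")
```

Rotation loses no information, so applying `right_rotate` eight times to an 8-bit value returns the original bit pattern.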
Instructions that
Table 12.7
+
Input/Output
Variety of approaches taken:
Isolated programmed I/O
Memory-mapped programmed I/O
DMA
Use of an I/O processor
Many implementations provide only a few I/O instructions, with the specific actions specified by parameters, codes, or command words
+
System Control
Instructions that can be executed only while the processor is in a certain privileged state or is executing a program in a special privileged area of memory
Typically these instructions are reserved for the use of the operating system
Examples of system control operations:
A system control instruction may read or alter a control register
An instruction to read or modify a storage protection key
Access to process control blocks in a multiprogramming system
Reasons why transfer-of-control operations are required:
It is essential to be able to execute each instruction more than once
Virtually all programs involve some decision making
It helps if there are mechanisms for breaking the task up into smaller pieces that can be worked on one at a time
Most common transfer-of-control operations found in instruction sets:
Branch
Skip
Procedure call
[Figure: branch instructions; addresses 200, 201, 211, and 235 are shown without instructions]
Memory address | Instruction
202 | SUB X, Y
203 | BRZ 211 (conditional branch)
210 | BR 202 (unconditional branch)
225 | BRE R1, R2, 235 (conditional branch)
[Figure: nested procedures. (a) Calls and returns; (b) Execution sequence. Main memory addresses shown: 4000 (main program), 4100 CALL Proc1, 4101, 4500, 4800 (procedure Proc2, ending in RETURN)]
Figure 12.9 Use of Stack to Implement Nested Procedures of Figure 12.8
Figure 12.10 Stack Frame Growth Using Sample Procedures P and Q
[Figure: (a) P is active; (b) P has called Q. Each frame holds a return point (P: Return Point, Q: Return Point), the old frame pointer, and values x1 and x2; the frame pointer and stack pointer mark the current frame]
The intent was to provide tools for the compiler writer to produce optimized machine language translation of high-level language programs
The x86 provides four instructions to support procedure call/return:
CALL
ENTER
LEAVE
RETURN
When a new procedure is called the following must be performed upon entry to the new procedure:
Push the return point on the stack
Push the current frame pointer on the stack
Copy the stack pointer as the new value of the frame pointer
Adjust the stack pointer to allocate a frame
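The four entry steps listed above can be traced with a toy stack machine. This is a sketch, not x86 semantics in detail: the class, the descending-stack convention, the memory size, and the sample return addresses are all assumptions for illustration.

```python
class ToyStackMachine:
    """Models procedure entry and exit with a stack pointer and a frame pointer."""

    def __init__(self, size: int = 64):
        self.mem = [0] * size
        self.sp = size   # empty descending stack: sp points one past the top
        self.fp = size

    def push(self, value: int) -> None:
        self.sp -= 1
        self.mem[self.sp] = value

    def pop(self) -> int:
        value = self.mem[self.sp]
        self.sp += 1
        return value

    def enter(self, return_point: int, frame_words: int) -> None:
        self.push(return_point)   # 1. push the return point
        self.push(self.fp)        # 2. push the current frame pointer
        self.fp = self.sp         # 3. copy the stack pointer into the frame pointer
        self.sp -= frame_words    # 4. adjust the stack pointer to allocate a frame

    def leave(self) -> int:
        self.sp = self.fp         # discard the frame
        self.fp = self.pop()      # restore the caller's frame pointer
        return self.pop()         # return point for the caller


m = ToyStackMachine()
m.enter(return_point=4101, frame_words=2)   # P is active
m.enter(return_point=4601, frame_words=1)   # P has called Q
print(m.leave(), m.leave(), m.sp, m.fp)
```

Saving the old frame pointer in each frame forms a chain through the stack, so unwinding any depth of nested calls needs no bookkeeping outside the stack itself.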
Table 12.8
x86 Status Flags
Status Bit | Name | Description
CF | Carry | Indicates carrying or borrowing out of the left-most bit position following an arithmetic operation. Also modified by some of the shift and rotate operations.
PF | Parity | Parity of the least-significant byte of the result of an arithmetic or logic operation. 1 indicates even parity; 0 indicates odd parity.
AF | Auxiliary Carry | Represents carrying or borrowing between half-bytes of an 8-bit arithmetic or logic operation. Used in binary-coded decimal arithmetic.
ZF | Zero | Indicates that the result of an arithmetic or logic operation is 0.
SF | Sign | Indicates the sign of the result of an arithmetic or logic operation.
OF | Overflow | Indicates an arithmetic overflow after an addition or subtraction for twos complement arithmetic.
Table 12.9
x86 Condition Codes for Conditional Jump and SETcc Instructions (Table can be found on page 440 in the textbook.)
Symbol | Condition Tested | Comment
A, NBE | CF=0 AND ZF=0 | Above; Not below or equal (greater than, unsigned)
AE, NB, NC | CF=0 | Above or equal; Not below (greater than or equal, unsigned); Not carry
B, NAE, C | CF=1 | Below; Not above or equal (less than, unsigned); Carry set
BE, NA | CF=1 OR ZF=1 | Below or equal; Not above (less than or equal, unsigned)
E, Z | ZF=1 | Equal; Zero (signed or unsigned)
G, NLE | [(SF=1 AND OF=1) OR (SF=0 AND OF=0)] AND [ZF=0] | Greater than; Not less than or equal (signed)
GE, NL | (SF=1 AND OF=1) OR (SF=0 AND OF=0) | Greater than or equal; Not less than (signed)
L, NGE | (SF=1 AND OF=0) OR (SF=0 AND OF=1) | Less than; Not greater than or equal (signed)
LE, NG | (SF=1 AND OF=0) OR (SF=0 AND OF=1) OR (ZF=1) | Less than or equal; Not greater than (signed)
NE, NZ | ZF=0 | Not equal; Not zero (signed or unsigned)
NO | OF=0 | No overflow
NS | SF=0 | Not sign (not negative)
NP, PO | PF=0 | Not parity; Parity odd
O | OF=1 | Overflow
P | PF=1 | Parity; Parity even
S | SF=1 | Sign (negative)
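A few of the condition codes can be exercised by modeling the flags produced when two 8-bit values are compared by subtraction. This is a sketch: the helper names are hypothetical, only CF, ZF, SF, and OF are modeled, and only five of the conditions from the table are included.

```python
def to_signed8(x: int) -> int:
    """Interpret an 8-bit pattern as a twos complement value."""
    return x - 256 if x & 0x80 else x


def sub8_flags(a: int, b: int) -> dict:
    """Flags after computing a - b on 8-bit operands (0..255)."""
    result = (a - b) & 0xFF
    signed_result = to_signed8(a) - to_signed8(b)
    return {
        "CF": int(a < b),                            # borrow out of the top bit
        "ZF": int(result == 0),
        "SF": result >> 7,                           # sign bit of the result
        "OF": int(not -128 <= signed_result <= 127)  # signed overflow
    }


CONDITIONS = {
    "A": lambda f: f["CF"] == 0 and f["ZF"] == 0,        # above (unsigned)
    "B": lambda f: f["CF"] == 1,                         # below (unsigned)
    "E": lambda f: f["ZF"] == 1,                         # equal
    "G": lambda f: f["SF"] == f["OF"] and f["ZF"] == 0,  # greater (signed)
    "L": lambda f: f["SF"] != f["OF"],                   # less (signed)
}


def holds(condition: str, a: int, b: int) -> bool:
    return CONDITIONS[condition](sub8_flags(a, b))


# 0x01 versus 0xFF: below as unsigned (1 < 255), greater as signed (1 > -1)
print(holds("B", 0x01, 0xFF), holds("G", 0x01, 0xFF))
```

The same bit patterns compare differently under the unsigned conditions (A, B) and the signed ones (G, L), which is why the table provides both families.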
MMX Instruction Set
In 1996 Intel introduced MMX technology into its Pentium product line
MMX is a set of highly optimized instructions for multimedia tasks
Video and audio data are typically composed of large arrays of small data types
Three new data types are defined in MMX
Packed byte
Packed word
Packed doubleword
Each data type is 64 bits in length and consists of multiple smaller data fields, each of which holds a fixed-point integer
Table 12.10
MMX Instruction Set (Table can be found on page 442 in the textbook.)
Category: Arithmetic
PADD [B, W, D]: Parallel add of packed eight bytes, four 16-bit words, or two 32-bit doublewords, with wraparound.
PADDS [B, W]: Add with saturation.
PADDUS [B, W]: Add unsigned with saturation
PSUB [B, W, D]: Subtract with wraparound.
PSUBS [B, W]: Subtract with saturation.
PSUBUS [B, W]: Subtract unsigned with saturation
PMULHW: Parallel multiply of four signed 16-bit words, with high-order 16 bits of 32-bit result chosen.
PMULLW: Parallel multiply of four signed 16-bit words, with low-order 16 bits of 32-bit result chosen.
PMADDWD: Parallel multiply of four signed 16-bit words; add together adjacent pairs of 32-bit results.
Category: Comparison
PCMPEQ [B, W, D]: Parallel compare for equality; result is mask of 1s if true or 0s if false.
PCMPGT [B, W, D]: Parallel compare for greater than; result is mask of 1s if true or 0s if false.
Category: Conversion
PACKUSWB: Pack words into bytes with unsigned saturation.
PACKSS [WB, DW]: Pack words into bytes, or doublewords into words, with signed saturation.
PUNPCKH [BW, WD, DQ]: Parallel unpack (interleaved merge) high-order bytes, words, or doublewords from MMX register.
PUNPCKL [BW, WD, DQ]: Parallel unpack (interleaved merge) low-order bytes, words, or doublewords from MMX register.
Category: Logical
PAND: 64-bit bitwise logical AND
PANDN: 64-bit bitwise logical AND NOT
POR: 64-bit bitwise logical OR
PXOR: 64-bit bitwise logical XOR
Category: Shift
PSLL [W, D, Q]: Parallel logical left shift of packed words, doublewords, or quadword by amount specified in MMX register or immediate value.
PSRL [W, D, Q]: Parallel logical right shift of packed words, doublewords, or quadword.
PSRA [W, D]: Parallel arithmetic right shift of packed words, doublewords, or quadword.
Category: Data Transfer
MOV [D, Q]: Move doubleword or quadword to/from MMX register.
Category: State Mgt
EMMS: Empty MMX state (empty FP registers tag bits).
Note: If an instruction supports multiple data types [byte (B), word (W), doubleword (D), quadword (Q)], the data types are indicated in brackets.
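The difference between wraparound and saturation arithmetic on packed bytes can be shown with a small model. This sketch represents a 64-bit packed-byte operand as a list of eight unsigned byte values; the function names echo the PADDB and PADDUSB mnemonics but are plain Python helpers, not bindings to the actual instructions.

```python
def padd_b(a: list[int], b: list[int]) -> list[int]:
    """Packed byte add with wraparound: each lane keeps only its low 8 bits."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]


def paddus_b(a: list[int], b: list[int]) -> list[int]:
    """Packed byte add with unsigned saturation: each lane clamps at 255."""
    return [min(x + y, 0xFF) for x, y in zip(a, b)]


pixels = [250, 200, 100, 0, 255, 128, 64, 32]   # eight byte lanes
boost = [10] * 8
print(padd_b(pixels, boost))
print(paddus_b(pixels, boost))
```

For image brightness, saturation is usually the desired behavior: with wraparound, a nearly white pixel (250) would become nearly black (4) instead of staying at full intensity.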
Table 12.11
ARM Conditions for Conditional Instruction Execution (Table can be found on page 445 in the textbook.)
Code | Symbol | Condition Tested | Comment
0000 | EQ | Z=1 | Equal
0001 | NE | Z=0 | Not equal
0010 | CS/HS | C=1 | Carry set/unsigned higher or same
0011 | CC/LO | C=0 | Carry clear/unsigned lower
0100 | MI | N=1 | Minus/negative
0101 | PL | N=0 | Plus/positive or zero
0110 | VS | V=1 | Overflow
0111 | VC | V=0 | No overflow
1000 | HI | C = 1 AND Z = 0 | Unsigned higher
1001 | LS | C = 0 OR Z = 1 | Unsigned lower or same
1010 | GE | N=V [(N = 1 AND V = 1) OR (N = 0 AND V = 0)] | Signed greater than or equal
1011 | LT | N≠V [(N = 1 AND V = 0) OR (N = 0 AND V = 1)] | Signed less than
1100 | GT | (Z = 0) AND (N = V) | Signed greater than
1101 | LE | (Z = 1) OR (N ≠ V) | Signed less than or equal
1110 | AL | - | Always (unconditional)
1111 | - | - | This instruction can only be executed unconditionally
+ Summary
Chapter 12
Instruction Sets: Characteristics and Functions
Machine instruction characteristics
• Elements of a machine instruction
• Instruction representation
• Instruction types
• Number of addresses
• Instruction set design
Types of operands
• Numbers
• Characters
• Logical data
Intel x86 and ARM data types
Types of operations
• Data transfer
• Arithmetic
• Logical
• Conversion
• Input/output
• System control
• Transfer of control
Intel x86 and ARM operation types
+
Figure 6.1 Inductive Write/Magnetoresistive Read Head
[Figure labels: read current, MR sensor, write current, shield, inductive write element, magnetization, recording medium]
Figure 6.2 Disk Data Layout
[Figure labels: rotation, track, sector, track sector, inter-track gap, inter-sector gap]
[Figure: disk drive components. Labels: read-write head (1 per surface), platter, cylinder, spindle, boom, direction of arm motion]
[Figure: disk track format with sectors 0 to 29, 600 bytes/sector; the index marks the start of the track]
Field | Bytes
gap 1 | 17
ID field | 7
gap 2 | 41
data field | 515
gap 3 | 20
+
The head must generate or sense an electromagnetic field of sufficient magnitude to write and read properly
The narrower the head, the closer it must be to the platter surface to function
A narrower head means narrower tracks and therefore greater data density
The closer the head is to the disk the greater the risk of error from impurities or imperfections
Winchester Heads
Used in sealed drive assemblies that are almost free of contaminants
Designed to operate closer to the disk’s surface than conventional rigid disk heads, thus allowing greater data density
Is actually an aerodynamic foil that rests lightly on the platter’s surface when the disk is motionless
The air pressure generated by a spinning disk is enough to make the foil rise above the surface
Table 6.2
Typical Hard Disk Drive Parameters
Characteristics | Seagate Enterprise | Seagate Barracuda XT | Seagate Cheetah NS | Seagate Laptop HDD
Application | Enterprise | Desktop | Network attached storage, application servers | Laptop
Capacity | 6 TB | 3 TB | 600 GB | 2 TB
Average seek time | 4.16 ms | N/A | 3.9 ms read, 4.2 ms write | 13 ms
Spindle speed | 7200 rpm | 7200 rpm | 10,075 rpm | 5400 rpm
Average latency | 4.16 ms | 4.16 ms | 2.98 ms | 5.6 ms
Maximum sustained transfer rate | 216 MB/s | 149 MB/s | 97 MB/s | 300 MB/s
Bytes per sector | 512/4096 | 512 | 512 | 4096
Tracks per cylinder (number of platter surfaces) | 8 | 10 | 8 | 4
Cache | 128 MB | 64 MB | 16 MB | 8 MB
When the disk drive is operating the disk is rotating at constant speed
To read or write the head must be positioned at the desired track and at the beginning of the desired sector on the track
Track selection involves moving the head in a movable-head system or electronically selecting one head on a fixed-head system
Once the track is selected, the disk controller waits until the appropriate sector rotates to line up with the head
Seek time
On a movable-head system, the time it takes to position the head at the track
Rotational delay (rotational latency)
The time it takes for the beginning of the sector to reach the head
Access time
The sum of the seek time and the rotational delay
The time it takes to get into position to read or write
Transfer time
Once the head is in position, the read or write operation is then performed as the sector moves under the head
This is the data transfer portion of the operation
Figure 6.5 Timing of a Disk I/O Transfer
[Figure stages: wait for device, wait for channel, seek, rotational delay, data transfer; device busy]
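The access-time definition above can be turned into a quick calculation. The sketch assumes the usual convention that the average rotational delay is the time for half a revolution; that convention and the function names are assumptions added here, while the spindle speeds and seek time in the usage lines are taken from the typical drive parameters quoted in these slides.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Average rotational delay, assumed to be half of one revolution."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


def average_access_time_ms(seek_ms: float, rpm: float) -> float:
    """Access time = seek time + rotational delay."""
    return seek_ms + average_rotational_latency_ms(rpm)


for rpm in (7200, 10_075, 5400):
    print(rpm, round(average_rotational_latency_ms(rpm), 2))
print(round(average_access_time_ms(13, 5400), 2))
```

The results (about 4.17, 2.98, and 5.56 ms) line up with the average latency figures listed for drives at those spindle speeds, which suggests the half-revolution convention is the one used there.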
+
RAID
Redundant Array of Independent Disks
Consists of 7 levels
Levels do not imply a hierarchical relationship but designate different design architectures that share three common characteristics:
1) Set of physical disk drives viewed by the operating system as a single logical drive
2) Data are distributed across the physical drives of an array in a scheme known as striping
3) Redundant disk capacity is used to store parity information, which guarantees data recoverability in case of a disk failure

Table 6.3 RAID Levels
Category | Level | Description | Disks Required | Data Availability | Large I/O Data Transfer Capacity | Small I/O Request Rate
Striping | 0 | Nonredundant | N | Lower than single disk | Very high | Very high for both read and write
Mirroring | 1 | Mirrored | 2N | Higher than RAID 2, 3, 4, or 5; lower than RAID 6 | Higher than single disk for read; similar to single disk for write | Up to twice that of a single disk for read; similar to single disk for write
Parallel access | 2 | Redundant via Hamming code | N+m | Much higher than single disk; comparable to RAID 3, 4, or 5 | Highest of all listed alternatives | Approximately twice that of a single disk
Parallel access | 3 | Bit-interleaved parity | N+1 | Much higher than single disk; comparable to RAID 2, 4, or 5 | Highest of all listed alternatives | Approximately twice that of a single disk
Independent access | 4 | Block-interleaved parity | N+1 | Much higher than single disk; comparable to RAID 2, 3, or 5 | Similar to RAID 0 for read; significantly lower than single disk for write | Similar to RAID 0 for read; significantly lower than single disk for write
Independent access | 5 | Block-interleaved distributed parity | N+1 | Much higher than single disk; comparable to RAID 2, 3, or 4 | Similar to RAID 0 for read; lower than single disk for write | Similar to RAID 0 for read; generally lower than single disk for write
Independent access | 6 | Block-interleaved dual distributed parity | N+2 | Highest of all listed alternatives | Similar to RAID 0 for read; lower than RAID 5 for write | Similar to RAID 0 for read; significantly lower than RAID 5 for write
N = number of data disks; m proportional to log N
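The recoverability claim in characteristic 3 can be illustrated for the single-parity levels (3, 4, and 5). The sketch below assumes the parity strip is the byte-wise XOR of the data strips, which is the conventional choice but is not stated on the slide; the function names are hypothetical.

```python
from functools import reduce

def xor_parity(strips):
    """Return the byte-wise XOR of a list of equally sized strips."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))

def rebuild_missing(surviving_strips, parity):
    """Recover one lost strip: XOR the surviving strips with the parity strip."""
    return xor_parity(list(surviving_strips) + [parity])

# Three data strips (N = 3) plus one parity strip (N + 1 disks in total).
data = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]
parity = xor_parity(data)

# Pretend the disk holding data[1] failed and rebuild its strip.
recovered = rebuild_missing([data[0], data[2]], parity)
```

Because XOR is its own inverse, any single missing strip can be rebuilt this way; losing two strips at once cannot be repaired with one parity strip, which is why RAID 6 carries N+2 disks.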
Figure 6.6 RAID Levels (pages 1 and 2)
Strip layout across four disks:
strip 0 strip 1 strip 2 strip 3
strip 4 strip 5 strip 6 strip 7
strip 8 strip 9 strip 10 strip 11
strip 12 strip 13 strip 14 strip 15
(d) RAID 3 (bit-interleaved parity)
Block-level parity row: block 12 block 13 block 14 block 15 P(12-15)
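The strip grid in Figure 6.6 follows a round-robin rule: consecutive strips go to consecutive disks. A minimal sketch of that mapping, assuming zero-based disk and row numbers (the function names are illustrative only):

```python
def locate_strip(strip_number, num_disks):
    """Map a logical strip number to (disk index, row on that disk)."""
    return strip_number % num_disks, strip_number // num_disks

def build_layout(num_strips, num_disks):
    """Return a list of rows; each row lists the strip stored on each disk."""
    num_rows = -(-num_strips // num_disks)  # ceiling division
    rows = [[None] * num_disks for _ in range(num_rows)]
    for strip in range(num_strips):
        disk, row = locate_strip(strip, num_disks)
        rows[row][disk] = strip
    return rows

layout = build_layout(16, 4)  # reproduces the 4 x 4 grid of strips 0..15
```

With this layout a request for strips 4 to 7 touches all four disks once, which is the source of the high transfer capacity listed for RAID 0.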
Logical Disk
+
Table 6.4 RAID Comparison (page 2 of 2)
SSDs have the following advantages over HDDs:
High-performance input/output operations per second (IOPS)
Durability
Longer lifespan
Table 6.5 Comparison of Solid State Drives and Disk Drives (columns: NAND Flash Drives; Seagate Laptop Internal HDD)
Figure 6.8 Solid State Drive Architecture (host system: operating system software, file system software, I/O driver software, interface)
+ Practical Issues
There are two practical issues peculiar to SSDs that are not faced by HDDs:
SSD performance has a tendency to slow down as the device is used
The entire block must be read from the flash memory
Flash memory becomes unusable after a certain number of writes
Techniques for prolonging life:
Front-ending the flash with a cache to delay and group

Table 6.6 Optical Disk
CD
Compact Disk. A nonerasable disk that stores digitized audio information. The standard system uses 12-cm disks and can record more than 60 minutes of uninterrupted playing time.
CD-ROM
Compact Disk Read-Only Memory. A nonerasable disk used for storing computer data. The standard system uses 12-cm disks and can hold more than 650 Mbytes.
CD-R
CD Recordable. Similar to a CD-ROM. The user can write to the disk only once.
CD-RW
CD Rewritable. Similar to a CD-ROM. The user can erase and rewrite to the disk multiple times.
DVD
Digital Versatile Disk. A technology for producing digitized, compressed representation
Compact Disk Read-Only Memory (CD-ROM)
Audio CD and the CD-ROM share a similar technology
The main difference is that CD-ROM players are more rugged and have error correction devices to ensure that data are properly transferred
Production:
The disk is formed from a resin such as polycarbonate
Digitally recorded information is imprinted as a series of microscopic pits on the surface of the polycarbonate
This is done with a finely focused, high intensity laser to create a master disk
The master is used, in turn, to make a die to stamp out copies onto polycarbonate
The pitted surface is then coated with a highly reflective surface, usually aluminum or gold
This shiny surface is protected against dust and scratches by a top coat of clear acrylic
Finally a label can be silkscreened onto the acrylic
Figure 6.9 CD Operation (labels: land, pit, polycarbonate plastic, aluminum, laser transmit/receive)
+
CD-ROM is appropriate for the distribution of large amounts of data to a large number of users
The CD-ROM has two advantages:
The optical disk together with the information stored on it can be mass replicated inexpensively

CD-ROM sector fields:
Field | Length | Contents
SYNC | 12 bytes | 00 FF . . . FF 00
ID | 4 bytes | MIN, SEC, Sector, Mode
Data | 2048 bytes | Data
L-ECC | 288 bytes | ECC
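The four field lengths above add up to 12 + 4 + 2048 + 288 = 2352 bytes per sector. The following sketch splits a raw sector along those boundaries. It assumes the sync pattern is one 00 byte, ten FF bytes, and one 00 byte, and it treats the MIN, SEC, Sector, and Mode bytes as plain integers; the class and function names are hypothetical.

```python
from typing import NamedTuple

SYNC = b"\x00" + b"\xff" * 10 + b"\x00"   # 12-byte sync pattern
SECTOR_SIZE = 12 + 4 + 2048 + 288          # 2352 bytes in total

class Sector(NamedTuple):
    minute: int
    second: int
    sector: int
    mode: int
    data: bytes
    ecc: bytes

def parse_sector(raw: bytes) -> Sector:
    """Split a raw sector into its ID fields, data, and layered ECC."""
    if len(raw) != SECTOR_SIZE:
        raise ValueError(f"expected {SECTOR_SIZE} bytes, got {len(raw)}")
    if raw[:12] != SYNC:
        raise ValueError("sync pattern not found at start of sector")
    minute, second, sector, mode = raw[12:16]   # four ID bytes
    return Sector(minute, second, sector, mode, raw[16:2064], raw[2064:])

# Build a dummy sector: ID bytes 1, 2, 3 and mode 1, zero-filled data and ECC.
raw = SYNC + bytes([1, 2, 3, 1]) + bytes(2048) + bytes(288)
parsed = parse_sector(raw)
```

Only 2048 of the 2352 bytes carry user data, so roughly 13% of each sector is spent on synchronization, identification, and error correction.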
Disk is prepared in such a way that it can be subsequently written once with a laser beam of modest intensity
Molecules exhibit a random orientation that reflects light poorly
Crystalline state
Has a smooth surface that reflects light
Figure: disk cross-sections, each 1.2 mm thick (labels: protective layer (acrylic); reflective layer (aluminum); polycarbonate substrate (plastic); laser focuses on polycarbonate pits in front of reflective layer; fully reflective layer, side 1 and side 2; polycarbonate layer, side 1)
Figure: CD and Blu-ray track geometry (labels: data layer, beam spot, land, pit, track, laser wavelength; CD laser wavelength = 780 nm; dimensions shown: 2.11 µm, 1.2 µm, 0.58 µm, 0.6 µm)
+
Tape systems use the same reading and recording techniques as disk systems
Medium is flexible polyester tape coated with magnetizable material
Serial recording
Data are laid out as a sequence of bits along each track
Table 6.7
Track 2
Track 1
+ Summary
Chapter 6
External Memory
Magnetic disk
Magnetic read and write mechanisms
RAID
RAID level 0
RAID level 1
+
Instruction Sets: Addressing Modes and Formats
Direct
Indirect
Register
Displacement
Stack
Figure: addressing modes, showing how the instruction fields A and R lead to the operand in memory or registers: (a) Immediate, (b) Direct, (c) Indirect, (d) Register, (e) Register Indirect, (f) Displacement, (g) Stack (implicit, top of stack register)

Table 13.1 Basic Addressing Modes
Mode | Algorithm | Principal Advantage | Principal Disadvantage
Immediate | Operand = A | No memory reference | Limited operand magnitude
Direct | EA = A | Simple | Limited address space
Indirect | EA = (A) | Large address space | Multiple memory references
Register | EA = R | No memory reference | Limited address space
Register indirect | EA = (R) | Large address space | Extra memory reference
Displacement | EA = A + (R) | Flexibility | Complexity
Stack | EA = top of stack | No memory reference | Limited applicability
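The algorithms in Table 13.1 can be tried out on a toy machine. In this sketch, memory and the register file are plain dictionaries, `a` stands for the address field A, `r` for the register field R, and the stack pointer is kept in a register named "SP"; all of these names are assumptions made for illustration.

```python
def fetch_operand(mode, memory, registers, a=None, r=None):
    """Return the operand selected by one of the basic addressing modes."""
    if mode == "immediate":          # Operand = A
        return a
    if mode == "direct":             # EA = A
        return memory[a]
    if mode == "indirect":           # EA = (A): two memory accesses
        return memory[memory[a]]
    if mode == "register":           # EA = R
        return registers[r]
    if mode == "register_indirect":  # EA = (R)
        return memory[registers[r]]
    if mode == "displacement":       # EA = A + (R)
        return memory[a + registers[r]]
    if mode == "stack":              # EA = top of stack
        return memory[registers["SP"]]
    raise ValueError(f"unknown addressing mode: {mode}")

memory = {100: 200, 200: 7, 205: 9, 300: 11}
registers = {1: 200, 2: 5, "SP": 300}

results = {
    "immediate": fetch_operand("immediate", memory, registers, a=100),
    "direct": fetch_operand("direct", memory, registers, a=100),
    "indirect": fetch_operand("indirect", memory, registers, a=100),
    "register": fetch_operand("register", memory, registers, r=1),
    "register_indirect": fetch_operand("register_indirect", memory, registers, r=1),
    "displacement": fetch_operand("displacement", memory, registers, a=200, r=2),
    "stack": fetch_operand("stack", memory, registers),
}
```

The same field value (100) yields three different operands under immediate, direct, and indirect addressing, which is the trade-off between operand range and memory references that the table summarizes.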
Operand = A
This mode can be used to define and use constants or set initial
values of variables
Typically the number will be stored in twos complement form
The leftmost bit of the operand field is used as a sign bit
Advantage:
No memory reference other than the instruction fetch is required to
obtain the operand, thus saving one memory or cache cycle in the
instruction cycle
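Because the leftmost bit of the operand field acts as the sign bit, decoding an immediate operand amounts to sign extension. A small sketch, with a hypothetical function name and the field width passed as a parameter:

```python
def decode_immediate(field, bits):
    """Interpret a bits-wide operand field as a twos complement integer."""
    field &= (1 << bits) - 1            # keep only the operand field
    sign_bit = 1 << (bits - 1)          # leftmost bit of the field
    return field - (1 << bits) if field & sign_bit else field

small_positive = decode_immediate(0b0101, 4)
small_negative = decode_immediate(0b1011, 4)
byte_minus_one = decode_immediate(0xFF, 8)
```

An n-bit field can therefore hold only values from -2^(n-1) to 2^(n-1) - 1, which is the limited operand magnitude noted as the disadvantage of this mode.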
Disadvantage:
Instruction execution requires two memory references to fetch the operand
One to get its address and a second to get its value
A rarely used variant of indirect addressing is multilevel or cascaded indirect addressing
EA = ( . . . (A) . . . )
Disadvantage is that three or more memory references could be required to fetch an operand

Register Addressing
Advantages:
• Only a small address field is needed in the instruction
• No time-consuming memory references are required
Disadvantage:
• The address space is very limited
+
Register Indirect Addressing
Analogous to indirect addressing
The only difference is whether the address field refers to a memory location or a register
EA = (R)
Address space limitation of the address field is overcome by having that field refer to a word-length location containing an address
Uses one less memory reference than indirect addressing

Displacement Addressing
Combines the capabilities of direct addressing and register indirect addressing
EA = A + (R)
Requires that the instruction have two address fields, at least one of which is explicit
The value contained in one address field (value = A) is used directly
The other address field refers to a register whose contents are added to A to produce the effective address
Most common uses:
Relative addressing
Base-register addressing
Indexing
+
Base-Register Addressing
Indexing
The address field references a main memory address and the referenced register contains a positive displacement from that address
The method of calculating the EA is the same as for base-register addressing
An important use is to provide an efficient mechanism for performing iterative operations
Autoindexing
Automatically increment or decrement the index register after each reference to it
EA = A + (R)
(R) ← (R) + 1
Postindexing
Indexing is performed after the indirection
EA = (A) + (R)
Preindexing
Indexing is performed before the indirection
EA = (A + (R))

Stack Addressing
A stack is a linear array of locations
Sometimes referred to as a pushdown list or last-in-first-out queue
A stack is a reserved block of locations
Items are appended to the top of the stack so that the block is partially filled
Associated with the stack is a pointer whose value is the address of the top of the stack
The stack pointer is maintained in a register
Thus references to stack locations in memory are in fact register indirect addresses
Is a form of implied addressing
The machine instructions need not include a memory reference but implicitly operate on the top of the stack
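The three indexing formulas differ only in when the register is added and whether it is updated. The sketch below uses dictionaries for memory and registers and an index register named "X"; these names, and the step size of 1, are assumptions for illustration.

```python
def autoindex(a, r, registers, step=1):
    """EA = A + (R), then (R) <- (R) + step."""
    ea = a + registers[r]
    registers[r] += step
    return ea

def postindex(a, r, memory, registers):
    """EA = (A) + (R): indirection first, then indexing."""
    return memory[a] + registers[r]

def preindex(a, r, memory, registers):
    """EA = (A + (R)): indexing first, then indirection."""
    return memory[a + registers[r]]

# Autoindexing walks through consecutive locations of an array at address 1000.
loop_registers = {"X": 0}
addresses = [autoindex(1000, "X", loop_registers) for _ in range(3)]

# Same A and (R), different results for postindexing and preindexing.
memory = {100: 500, 103: 900}
registers = {"X": 3}
post_ea = postindex(100, "X", memory, registers)
pre_ea = preindex(100, "X", memory, registers)
```

Autoindexing is what makes indexing convenient for iterative operations: each reference yields the next element without a separate increment instruction.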
Table 13.2
x86 Addressing Modes
Mode Algorithm
Immediate Operand = A
Register Operand LA = R
Displacement LA = (SR) + A
Base LA = (SR) + (B)
Base with Displacement LA = (SR) + (B) + A
Scaled Index with Displacement LA = (SR) + (I) × S + A
Base with Index and Displacement LA = (SR) + (B) + (I) + A
Base with Scaled Index and Displacement LA = (SR) + (I) × S + (B) + A
Relative LA = (PC) + A
LA = linear address
(X) = contents of X
SR = segment register
PC = program counter
A = contents of an address field in the instruction
R = register
B = base register
I = index register
S = scaling factor
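Most rows of Table 13.2 are special cases of the last memory form, LA = (SR) + (I) × S + (B) + A, with some terms set to zero. A sketch of that calculation follows. The parameter names are hypothetical, `segment_base` stands in for the (SR) term, scale factors are restricted to 1, 2, 4, and 8, and the result is truncated to 32 bits as an assumption about the operating mode.

```python
def linear_address(segment_base=0, base=0, index=0, scale=1, displacement=0):
    """Compute LA = (SR) + (I) * S + (B) + A, truncated to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4, or 8")
    return (segment_base + index * scale + base + displacement) & 0xFFFFFFFF

# Displacement mode: only the segment and the address field contribute.
la_displacement = linear_address(segment_base=0x10000, displacement=0x10)

# Base with scaled index and displacement: element 3 of an array of
# 4-byte items that starts 0x10 bytes past the address held in the base register.
la_scaled = linear_address(segment_base=0x10000, base=0x2000,
                           index=3, scale=4, displacement=0x10)
```

The scaled-index forms let a compiler address array elements of 2, 4, or 8 bytes directly from an element number, without a separate multiply instruction.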
Figure 13.3 ARM Indexing Methods
STRB r0, [r1, #12]
(a) Offset: r0 (destination register for STR) = 0x5; original base register r1 = 0x200; offset = 0xC; 0x5 is stored at address 0x20C; r1 remains 0x200
(c) Postindex: r0 (destination register for STR) = 0x5; original base register r1 = 0x200; offset = 0xC; 0x5 is stored at address 0x200; updated base register r1 = 0x20C

Branch instructions
The only form of addressing for branch instructions is immediate
Instruction contains 24-bit value
Shifted 2 bits left so that the address is on a word boundary
Effective range ± 32MB from the program counter
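The indexing methods in Figure 13.3 and the branch offset rule can both be checked with a few lines of simulation. In the sketch below, memory and registers are dictionaries and the function names are hypothetical. `branch_target` takes whatever program counter value the hardware uses as its reference point; any pipeline adjustment to that value is left out.

```python
def store_byte(memory, registers, rd, rn, offset, mode):
    """Simulate STRB rd, [rn, #offset] for the three indexing methods."""
    base = registers[rn]
    if mode == "offset":          # STRB r0, [r1, #12]
        address = base + offset
    elif mode == "preindex":      # STRB r0, [r1, #12]!
        address = base + offset
        registers[rn] = address   # base register is updated
    elif mode == "postindex":     # STRB r0, [r1], #12
        address = base            # store uses the original base
        registers[rn] = base + offset
    else:
        raise ValueError(f"unknown indexing mode: {mode}")
    memory[address] = registers[rd] & 0xFF
    return address

def branch_target(pc, imm24):
    """Sign-extend the 24-bit field, shift it left 2 bits, add it to pc."""
    if imm24 & 0x800000:
        imm24 -= 1 << 24
    return (pc + (imm24 << 2)) & 0xFFFFFFFF

outcomes = {}
for mode in ("offset", "preindex", "postindex"):
    memory, registers = {}, {"r0": 0x5, "r1": 0x200}
    address = store_byte(memory, registers, "r0", "r1", 12, mode)
    outcomes[mode] = (address, registers["r1"])

forward = branch_target(0x8000, 0x000002)    # +2 words
backward = branch_target(0x8000, 0xFFFFFE)   # -2 words
```

The 24-bit signed field covers -2^23 to 2^23 - 1 words; multiplied by 4 bytes per word, that gives the range of roughly ±32 MB stated above.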
Figure 13.4 ARM Load/Store Multiple Addressing
LDMxx r10, {r0, r1, r4}
STMxx r10, {r0, r1, r4}
Base register r10 = 0x20C
Address | Increment after (IA) | Increment before (IB) | Decrement after (DA) | Decrement before (DB)
0x218 | | (r4) | |
0x214 | (r4) | (r1) | |
0x210 | (r1) | (r0) | |
0x20C | (r0) | | (r4) |
0x208 | | | (r1) | (r4)
0x204 | | | (r0) | (r1)
0x200 | | | | (r0)

Instruction Formats
Define the layout of the bits of an instruction, in terms of its constituent fields
Must include an opcode and, implicitly or explicitly, indicate the addressing mode for each operand
For most instruction sets more than one instruction format is used
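The address pattern in Figure 13.4 follows from two rules: the lowest-numbered register always occupies the lowest address, and the suffix decides where the 4-byte slots start relative to the base register. A sketch with registers given as integers (the function name is hypothetical):

```python
def multiple_addresses(base, reg_list, mode):
    """Return {register number: address} for LDM/STM with IA, IB, DA, or DB."""
    regs = sorted(reg_list)   # lowest-numbered register gets the lowest address
    count = len(regs)
    start = {
        "IA": base,                     # increment after
        "IB": base + 4,                 # increment before
        "DA": base - 4 * (count - 1),   # decrement after
        "DB": base - 4 * count,         # decrement before
    }[mode]
    return {reg: start + 4 * i for i, reg in enumerate(regs)}

# LDMxx r10, {r0, r1, r4} with r10 = 0x20C, for each of the four suffixes.
table = {mode: multiple_addresses(0x20C, [0, 1, 4], mode)
         for mode in ("IA", "IB", "DA", "DB")}
```

Pairing a store that uses DB with a load that uses IA gives the usual push and pop behavior of a stack that grows toward lower addresses.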
+
Instruction Length
Most basic design issue
Affects, and is affected by:
Memory size
Memory organization
Bus structure
Processor complexity
Processor speed
Should be equal to the memory-transfer length or one should be a multiple of the other
Should be a multiple of the character length, which is usually 8 bits, and of the length of fixed-point numbers

Number of addressing modes
Number of operands
Number of register sets
Address range
Address granularity
Memory Reference Instructions
Opcode (bits 0-2) | D/I (bit 3) | Z/C (bit 4) | Displacement (bits 5-11)
Input/Output Instructions
1 1 0 (bits 0-2) | Device (bits 3-8) | Opcode (bits 9-11)
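Extracting the fields of these 12-bit formats is a matter of shifting and masking. The sketch below assumes bit 0 is the most significant bit, as the left-to-right labels 0 to 11 suggest, and that an opcode of 110 in bits 0-2 marks an input/output instruction; the function and key names are hypothetical.

```python
def decode_12bit(word):
    """Split a 12-bit instruction word into the fields of its format."""
    word &= 0xFFF
    opcode = (word >> 9) & 0b111            # bits 0-2
    if opcode == 0b110:                     # input/output format
        return {
            "format": "io",
            "device": (word >> 3) & 0x3F,   # bits 3-8
            "opcode": word & 0b111,         # bits 9-11
        }
    return {
        "format": "memory_reference",
        "opcode": opcode,
        "d_i": (word >> 8) & 1,             # bit 3
        "z_c": (word >> 7) & 1,             # bit 4
        "displacement": word & 0x7F,        # bits 5-11
    }

memory_ref = decode_12bit(0b001_1_0_0101010)
io_instr = decode_12bit(0b110_000011_100)
```

With only 7 displacement bits, a memory reference instruction can reach 128 locations directly; the D/I and Z/C bits are what extend that reach.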
Figure: instruction formats numbered 1 to 13, built from Opcode, R, FP, Source, Destination, and Memory Address fields; for example, format 2 is Opcode (7 bits), R (3), Source (6), and format 9 is Opcode (4 bits), Source (6), Destination (6), Memory Address (16)
Variations can be provided efficiently and compactly
Does not remove the desirability of making all of the
Because the processor does not know the length of the next
bytes or words equal to at least the longest possible instruction
Figure 13.8 Examples of VAX Instructions
Hexadecimal Format (8 bits per row) | Explanation | Assembler Notation and Description
0 5 | Opcode for RSB | RSB: Return from subroutine
B 0 | Opcode for MOVW | MOVW 356(R4), 25(R11): Move a word from address that is 356 plus contents of R4 to address that is 25 plus contents of R11
C 4 | Word displacement mode, Register R4 |
6 4, 0 1 | 356 in hexadecimal |
A B | Byte displacement mode, Register R11 |
1 9 | 25 in hexadecimal |
5 0 | Register mode R0 | (description fragment) R0 and store the result in location whose address is sum of A and 4 times the contents of R2
4 2 | Index prefix R2 |
D F | Indirect word relative (displacement from PC) |

Figure 13.9 x86 Instruction Format
Instruction prefixes (0, 1, 2, 3, or 4 bytes): Instruction prefix (0 or 1 bytes), Segment override (0 or 1 bytes), Operand size override (0 or 1 bytes), Address size override (0 or 1 bytes)
Opcode (1, 2, or 3 bytes)
ModR/M (0 or 1 bytes): Mod (bits 7-6), Reg/Opcode (bits 5-3), R/M (bits 2-0)
SIB (0 or 1 bytes): Scale (bits 7-6), Index (bits 5-3), Base (bits 2-0)
Displacement (0, 1, 2, or 4 bytes)
Immediate (0, 1, 2, or 4 bytes)
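The ModR/M and SIB bytes in Figure 13.9 each pack three fields into 8 bits, split 2, 3, and 3 from the most significant end. A sketch of the split follows; the function names are hypothetical, and the 2-bit scale field is converted to the factor 1, 2, 4, or 8 that it encodes.

```python
def split_modrm(byte):
    """Split a ModR/M byte into Mod (bits 7-6), Reg/Opcode (5-3), R/M (2-0)."""
    return {
        "mod": (byte >> 6) & 0b11,
        "reg_opcode": (byte >> 3) & 0b111,
        "rm": byte & 0b111,
    }

def split_sib(byte):
    """Split an SIB byte into Scale (bits 7-6), Index (5-3), Base (2-0)."""
    return {
        "scale": 1 << ((byte >> 6) & 0b11),   # encoded as a power of two
        "index": (byte >> 3) & 0b111,
        "base": byte & 0b111,
    }

modrm = split_modrm(0x44)   # binary 01 000 100
sib = split_sib(0x98)       # binary 10 011 000
```

The two bytes together supply the base register, index register, and scale factor; the displacement, when present, follows them in the instruction.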
+
Thumb-2 Instruction Set
Delivers overall code density comparable with Thumb, together with the
performance levels associated with the ARM ISA
Before Thumb-2, developers had to choose between Thumb for size and
ARM for performance
Address Contents Address Contents
101 0010 0010 101 2201 101 2201
102 0001 0010 102 1202 102 1202
103 0001 0010 103 1203 103 1203
104 0011 0010 104 3204 104 3204