William Stallings
Computer Organization and Architecture, 10th Edition

Chapter 1
Basic Concepts and Computer Evolution
Computer Architecture versus Computer Organization

 Computer architecture
  • Attributes of a system visible to the programmer
  • Have a direct impact on the logical execution of a program
  • Architectural attributes include: the instruction set, the number of bits used to represent various data types, I/O mechanisms, and techniques for addressing memory
 Computer organization
  • The operational units and their interconnections that realize the architectural specifications
  • Organizational attributes include: hardware details transparent to the programmer, control signals, interfaces between the computer and peripherals, and the memory technology used

IBM System/370 Architecture

 Was introduced in 1970
 Included a number of models
 Customers could upgrade to a more expensive, faster model without having to abandon original software
 New models are introduced with improved technology, but retain the same architecture so that the customer's software investment is protected
 The architecture has survived to this day as the architecture of IBM's mainframe product line
Structure and Function

 Hierarchical system
  • Set of interrelated subsystems
 The hierarchical nature of complex systems is essential to both their design and their description
 The designer need only deal with a particular level of the system at a time
 Concerned with structure and function at each level
 Structure
  • The way in which components relate to each other
 Function
  • The operation of individual components as part of the structure

Function

 There are four basic functions that a computer can perform:
  • Data processing – data may take a wide variety of forms, and the range of processing requirements is broad
  • Data storage – short-term and long-term
  • Data movement
    - Input-output (I/O) – when data are received from or delivered to a device (peripheral) that is directly connected to the computer
    - Data communications – when data are moved over longer distances, to or from a remote device
  • Control – a control unit manages the computer's resources and orchestrates (coordinates) the performance of its functional parts in response to instructions
Structure: Top-Level View

 CPU – controls the operation of the computer and performs its data processing functions
 Main memory – stores data
 I/O – moves data between the computer and its external environment
 System interconnection – some mechanism that provides for communication among CPU, main memory, and I/O

Figure 1.1 A Top-Down View of a Computer (the computer comprises the CPU, main memory, and I/O connected by the system bus; the CPU comprises registers, the ALU, and the control unit connected by an internal bus; the control unit comprises sequencing logic, control unit registers and decoders, and control memory)
Structure: The CPU

 Control unit
  • Controls the operation of the CPU and hence the computer
 Arithmetic and logic unit (ALU)
  • Performs the computer's data processing function
 Registers
  • Provide storage internal to the CPU
 CPU interconnection
  • Some mechanism that provides for communication among the control unit, ALU, and registers

Multicore Computer Structure

 Central processing unit (CPU)
  • Portion of the computer that fetches and executes instructions
  • Consists of an ALU, a control unit, and registers
  • Referred to as a processor in a system with a single processing unit
 Core
  • An individual processing unit on a processor chip
  • May be equivalent in functionality to a CPU on a single-CPU system
  • Specialized processing units are also referred to as cores
 Processor
  • A physical piece of silicon containing one or more cores
  • Is the computer component that interprets and executes instructions
  • Referred to as a multicore processor if it contains multiple cores
Cache Memory

 Multiple layers of memory between the processor and main memory
 Is smaller and faster than main memory
 Used to speed up memory access by placing in the cache data from main memory that is likely to be used in the near future
 A greater performance improvement may be obtained by using multiple levels of cache, with level 1 (L1) closest to the core and additional levels (L2, L3, etc.) progressively farther from the core

Figure 1.2 Simplified View of Major Elements of a Multicore Computer (motherboard with main memory chips, I/O chips, and a processor chip holding several cores and L3 cache; each core contains instruction logic, an arithmetic and logic unit (ALU), load/store logic, L1 instruction and data caches, and L2 instruction and data caches)
Figure 1.3 Motherboard with Two Intel Quad-Core Xeon Processors

Figure 1.4 IBM zEnterprise EC12 Processor Unit (PU) Chip Diagram (includes the storage control (SC) and memory controller (MC) elements)
History of Computers
First Generation: Vacuum Tubes

 Vacuum tubes were used for digital logic elements and memory
 IAS computer
  • Fundamental design approach was the stored-program concept
  • Attributed to the mathematician John von Neumann
  • First publication of the idea was in 1945 for the EDVAC
  • Design began at the Princeton Institute for Advanced Studies
  • Completed in 1952
  • Prototype of all subsequent general-purpose computers

Figure 1.5 IBM zEnterprise EC12 Core Layout: ISU (instruction sequence unit), IFU (instruction fetch unit), IDU (instruction decode unit), LSU (load-store unit), XU (translation unit), FXU (fixed-point unit), BFU (binary floating-point unit), DFU (decimal floating-point unit), RU (recovery unit), COP (dedicated co-processor)
Figure 1.6 IAS Structure (the central processing unit contains the arithmetic-logic unit (CA), with its arithmetic-logic circuits and the AC and MQ registers, and the program control unit (CC), with its control circuits; instructions and data pass through the MBR between the CPU, main memory (M, locations M(0) through M(4095)), and the input-output equipment (I, O); the control unit issues addresses through the MAR and control signals to the rest of the machine)

Register legend:
 AC: accumulator register
 MQ: multiply-quotient register
 MBR: memory buffer register
 IBR: instruction buffer register
 PC: program counter
 MAR: memory address register
 IR: instruction register

Figure 1.7 IAS Memory Formats
 Number word (40 bits): a sign bit in bit 0 followed by a 39-bit magnitude
 Instruction word (40 bits): a left instruction in bits 0-19 and a right instruction in bits 20-39; each instruction consists of an 8-bit opcode and a 12-bit address
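To make the Figure 1.7 layout concrete, here is a small illustrative sketch in Python (not part of the original slides) that splits a 40-bit IAS instruction word into its left and right instructions and extracts each 8-bit opcode and 12-bit address; the example opcodes are taken from Table 1.1.

```python
def decode_ias_word(word):
    """Split a 40-bit IAS instruction word into two (opcode, address) pairs.

    Bit 0 is the leftmost (most significant) bit, as in Figure 1.7:
    bits 0-19 hold the left instruction, bits 20-39 the right instruction;
    each instruction is an 8-bit opcode followed by a 12-bit address.
    """
    left = (word >> 20) & 0xFFFFF          # bits 0-19
    right = word & 0xFFFFF                 # bits 20-39

    def split(instr):
        opcode = (instr >> 12) & 0xFF      # top 8 bits of the 20-bit instruction
        address = instr & 0xFFF            # low 12 bits
        return opcode, address

    return split(left), split(right)

# Example word: LOAD M(X) (opcode 00000001) from address 0x0F4 on the left,
# ADD M(X) (opcode 00000101) from address 0x0F5 on the right.
word = (0x01 << 32) | (0x0F4 << 20) | (0x05 << 12) | 0x0F5
print(decode_ias_word(word))   # ((1, 244), (5, 245))
```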
IAS Registers

 Memory buffer register (MBR): contains a word to be stored in memory or sent to the I/O unit, or is used to receive a word from memory or from the I/O unit
 Memory address register (MAR): specifies the address in memory of the word to be written from or read into the MBR
 Instruction register (IR): contains the 8-bit opcode instruction being executed
 Instruction buffer register (IBR): employed to temporarily hold the right-hand instruction from a word in memory
 Program counter (PC): contains the address of the next instruction pair to be fetched from memory
 Accumulator (AC) and multiplier-quotient (MQ): employed to temporarily hold operands and results of ALU operations

Figure 1.8 Partial Flowchart of IAS Operation (fetch cycle: if the next instruction is not already in the IBR, a memory access is required, so MAR <- PC and MBR <- M(MAR); for a left instruction, IR <- MBR(0:7) and MAR <- MBR(8:19), with the right half saved via IBR <- MBR(20:39); for a right instruction, IR <- IBR(0:7) and MAR <- IBR(8:19); then PC <- PC + 1. Execution cycle: the instruction in the IR is decoded and performed, for example AC <- M(X), "go to M(X, 0:19)", "if AC > 0 then go to M(X, 0:19)", or AC <- AC + M(X), with MBR <- M(MAR) used to fetch the operand. Notation: M(X) = contents of the memory location whose address is X; (i:j) = bits i through j.)
Table 1.1 The IAS Instruction Set (the full table is on page 17 of the textbook)

Data transfer
  00001010  LOAD MQ          Transfer contents of register MQ to the accumulator AC
  00001001  LOAD MQ,M(X)     Transfer contents of memory location X to MQ
  00100001  STOR M(X)        Transfer contents of accumulator to memory location X
  00000001  LOAD M(X)        Transfer M(X) to the accumulator
  00000010  LOAD –M(X)       Transfer –M(X) to the accumulator
  00000011  LOAD |M(X)|      Transfer absolute value of M(X) to the accumulator
  00000100  LOAD –|M(X)|     Transfer –|M(X)| to the accumulator

Unconditional branch
  00001101  JUMP M(X,0:19)   Take next instruction from left half of M(X)
  00001110  JUMP M(X,20:39)  Take next instruction from right half of M(X)

Conditional branch
  00001111  JUMP+ M(X,0:19)  If number in the accumulator is nonnegative, take next instruction from left half of M(X)
  00010000  JUMP+ M(X,20:39) If number in the accumulator is nonnegative, take next instruction from right half of M(X)

Arithmetic
  00000101  ADD M(X)         Add M(X) to AC; put the result in AC
  00000111  ADD |M(X)|       Add |M(X)| to AC; put the result in AC
  00000110  SUB M(X)         Subtract M(X) from AC; put the result in AC
  00001000  SUB |M(X)|       Subtract |M(X)| from AC; put the remainder in AC
  00001011  MUL M(X)         Multiply M(X) by MQ; put most significant bits of result in AC, put least significant bits in MQ
  00001100  DIV M(X)         Divide AC by M(X); put the quotient in MQ and the remainder in AC
  00010100  LSH              Multiply accumulator by 2; i.e., shift left one bit position
  00010101  RSH              Divide accumulator by 2; i.e., shift right one bit position

Address modify
  00010010  STOR M(X,8:19)   Replace left address field at M(X) by 12 rightmost bits of AC
  00010011  STOR M(X,28:39)  Replace right address field at M(X) by 12 rightmost bits of AC

History of Computers
Second Generation: Transistors

 Smaller
 Cheaper
 Dissipates less heat than a vacuum tube
 Is a solid-state device made from silicon
 Was invented at Bell Labs in 1947
 It was not until the late 1950s that fully transistorized computers were commercially available
Computer Generations

Generation   Approximate Dates   Technology                           Typical Speed (operations per second)
1            1946–1957           Vacuum tube                          40,000
2            1957–1964           Transistor                           200,000
3            1965–1971           Small and medium scale integration   1,000,000
4            1972–1977           Large scale integration              10,000,000
5            1978–1991           Very large scale integration         100,000,000
6            1991–               Ultra large scale integration        >1,000,000,000

Second Generation Computers

 Introduced:
  • More complex arithmetic and logic units and control units
  • The use of high-level programming languages
  • Provision of system software which provided the ability to:
    - Load programs
    - Move data to peripherals
    - Use libraries to perform common computations
Figure 1.9 An IBM 7094 Configuration (the CPU and memory connect through a multiplexor to several data channels; the data channels serve the peripheral devices: magnetic tape units, card punch, line printer, card reader, drum, disks, hypertapes, and teleprocessing equipment)

History of Computers
Third Generation: Integrated Circuits

 1958 – the invention of the integrated circuit
 Discrete component
  • Single, self-contained transistor
  • Manufactured separately, packaged in their own containers, and soldered or wired together onto masonite-like circuit boards
  • Manufacturing process was expensive and cumbersome
 The two most important members of the third generation were the IBM System/360 and the DEC PDP-8
Integrated Circuits

 A computer consists of gates, memory cells, and interconnections among these elements
  • Data storage – provided by memory cells
  • Data processing – provided by gates
  • Data movement – the paths among components are used to move data from memory to memory and from memory through gates to memory
  • Control – the paths among components can carry control signals
 The gates and memory cells are constructed of simple digital electronic components
 The integrated circuit exploits the fact that such components as transistors, resistors, and conductors can be fabricated from a semiconductor such as silicon
  • Many transistors can be produced at the same time on a single wafer of silicon
  • Transistors can be connected with a process of metallization to form circuits

Figure 1.10 Fundamental Computer Elements ((a) a gate: a Boolean logic function with inputs, an output, and an activate signal; (b) a memory cell: a binary storage cell with a data input, a data output, and read/write select signals)
Figure 1.11 Relationship Among Wafer, Chip, and Gate (a packaged chip is cut from a wafer; each chip contains many gates)

Figure 1.12 Growth in Transistor Count on Integrated Circuits (DRAM memory) (log-scale axis from 1 to 100 billion transistors, spanning 1947 to 2011)
Moore's Law

 1965; Gordon Moore – co-founder of Intel
 Observed that the number of transistors that could be put on a single chip was doubling every year
 The pace slowed to a doubling every 18 months in the 1970s but has sustained that rate ever since
 Consequences of Moore's law:
  • The cost of computer logic and memory circuitry has fallen at a dramatic rate
  • The electrical path length is shortened, increasing operating speed
  • The computer becomes smaller and is more convenient to use in a variety of environments
  • Reduction in power and cooling requirements
  • Fewer interchip connections

IBM System/360

 Announced in 1964
 Product line was incompatible with older IBM machines
 Was the success of the decade and cemented IBM as the overwhelmingly dominant computer vendor
 The architecture remains to this day the architecture of IBM's mainframe computers
 Was the industry's first planned family of computers
 Models were compatible in the sense that a program written for one model should be capable of being executed by another model in the series
Family Characteristics

Figure 1.13 PDP-8 Bus Structure (the console controller, CPU, main memory, and I/O modules are all attached to a single shared bus, the Omnibus)
Later Generations

 LSI – Large Scale Integration: more than 1,000 components can be placed on a single integrated circuit chip
 VLSI – Very Large Scale Integration: 10,000 components per chip
 ULSI – Ultra Large Scale Integration: one billion components per chip
 Two key developments of the later generations: semiconductor memory and microprocessors

Semiconductor Memory

 In 1970 Fairchild produced the first relatively capacious semiconductor memory
  • The chip was about the size of a single core
  • Could hold 256 bits of memory
  • Non-destructive
  • Much faster than core
 In 1974 the price per bit of semiconductor memory dropped below the price per bit of core memory
 Since 1970 semiconductor memory has been through 13 generations
 Each generation has provided four times the storage density of the previous generation, accompanied by declining cost per bit and declining access time
Microprocessors

 The density of elements on processor chips continued to rise
  • More and more elements were placed on each chip so that fewer and fewer chips were needed to construct a single computer processor
 1971 – Intel developed the 4004
  • First chip to contain all of the components of a CPU on a single chip
  • Birth of the microprocessor
 1972 – Intel developed the 8008
  • First 8-bit microprocessor
 1974 – Intel developed the 8080
  • First general-purpose microprocessor
  • Faster, has a richer instruction set, has a large addressing capability

(a) 1970s Processors

                        4004       8008      8080      8086                   8088
Introduced              1971       1972      1974      1978                   1979
Clock speeds            108 kHz    108 kHz   2 MHz     5 MHz, 8 MHz, 10 MHz   5 MHz, 8 MHz
Bus width               4 bits     8 bits    8 bits    16 bits                8 bits
Number of transistors   2,300      3,500     6,000     29,000                 29,000
Feature size (µm)       10         8         6         3                      6
Addressable memory      640 Bytes  16 KB     64 KB     1 MB                   1 MB
(b) 1980s Processors

                        80286             386TM DX          386TM SX          486TM DX CPU
Introduced              1982              1985              1988              1989
Clock speeds            6 MHz – 12.5 MHz  16 MHz – 33 MHz   16 MHz – 33 MHz   25 MHz – 50 MHz
Bus width               16 bits           32 bits           16 bits           32 bits
Number of transistors   134,000           275,000           275,000           1.2 million
Feature size (µm)       1.5               1                 1                 0.8 – 1
Addressable memory      16 MB             4 GB              16 MB             4 GB
Virtual memory          1 GB              64 TB             64 TB             64 TB
Cache                   —                 —                 —                 8 kB

(c) 1990s Processors

                        486TM SX          Pentium            Pentium Pro             Pentium II
Introduced              1991              1993               1995                    1997
Clock speeds            16 MHz – 33 MHz   60 MHz – 166 MHz   150 MHz – 200 MHz       200 MHz – 300 MHz
Bus width               32 bits           32 bits            64 bits                 64 bits
Number of transistors   1.185 million     3.1 million        5.5 million             7.5 million
Feature size (µm)       1                 0.8                0.6                     0.35
Addressable memory      4 GB              4 GB               64 GB                   64 GB
Virtual memory          64 TB             64 TB              64 TB                   64 TB
Cache                   8 kB              8 kB               512 kB L1 and 1 MB L2   512 kB L2
(d) Recent Processors

                        Pentium III     Pentium 4        Core 2 Duo       Core i7 EE 4960X
Introduced              1999            2000             2006             2013
Clock speeds            450 – 660 MHz   1.3 – 1.8 GHz    1.06 – 1.2 GHz   4 GHz
Bus width               64 bits         64 bits          64 bits          64 bits
Number of transistors   9.5 million     42 million       167 million      1.86 billion
Feature size (nm)       250             180              65               22
Addressable memory      64 GB           64 GB            64 GB            64 GB
Virtual memory          64 TB           64 TB            64 TB            64 TB
Cache                   512 kB L2       256 kB L2        2 MB L2          1.5 MB L2/15 MB L3
Number of cores         1               1                2                6

The Evolution of the Intel x86 Architecture

 Two prominent processor families are the Intel x86 and the ARM architectures
 Current x86 offerings represent the results of decades of design effort on complex instruction set computers (CISCs)
 An alternative approach to processor design is the reduced instruction set computer (RISC)
 The ARM architecture is used in a wide variety of embedded systems and is one of the most powerful and best-designed RISC-based systems on the market
Highlights of the Evolution of the Intel Product Line

 8080: World's first general-purpose microprocessor; an 8-bit machine with an 8-bit data path to memory; was used in the first personal computer (the Altair)
 8086: A more powerful 16-bit machine; has an instruction cache, or queue, that prefetches a few instructions before they are executed; the first appearance of the x86 architecture; the 8088 was a variant of this processor and was used in IBM's first personal computer (securing the success of Intel)
 80286: Extension of the 8086 enabling addressing of a 16-MB memory instead of just 1 MB
 80386: Intel's first 32-bit machine; first Intel processor to support multitasking
 80486: Introduced the use of much more sophisticated and powerful cache technology and sophisticated instruction pipelining; also offered a built-in math coprocessor
 Pentium: Intel introduced the use of superscalar techniques, which allow multiple instructions to execute in parallel
 Pentium Pro: Continued the move into superscalar organization with aggressive use of register renaming, branch prediction, data flow analysis, and speculative execution
 Pentium II: Incorporated Intel MMX technology, which is designed specifically to process video, audio, and graphics data efficiently
 Pentium III: Incorporated additional floating-point instructions: the Streaming SIMD Extensions (SSE)
 Pentium 4: Includes additional floating-point and other enhancements for multimedia
 Core: First Intel x86 processor with a dual core, i.e., two cores on a single chip
 Core 2: Extends the Core architecture to 64 bits; Core 2 Quad provides four cores on a single chip; more recent Core offerings have up to 10 cores per chip; an important addition to the architecture was the Advanced Vector Extensions instruction set
Embedded Systems

 The use of electronics and software within a product
 Billions of computer systems are produced each year that are embedded within larger devices
 Today many devices that use electric power have an embedded computing system
 Often embedded systems are tightly coupled to their environment
  • This can give rise to real-time constraints imposed by the need to interact with the environment
  • Constraints such as required speeds of motion, required precision of measurement, and required time durations dictate the timing of software operations
  • If multiple activities must be managed simultaneously, this imposes more complex real-time constraints

Figure 1.14 Possible Organization of an Embedded System (a processor and memory together with custom logic, a human interface, a diagnostic port, A/D and D/A conversion, and the actuators/indicators and sensors that couple the system to its environment)
The Internet of Things (IoT)

 Term that refers to the expanding interconnection of smart devices, ranging from appliances to tiny sensors
 Is primarily driven by deeply embedded devices
 Generations of deployment culminating in the IoT:
  • Information technology (IT): PCs, servers, routers, firewalls, and so on, bought as IT devices by enterprise IT people and primarily using wired connectivity
  • Operational technology (OT): machines/appliances with embedded IT built by non-IT companies, such as medical machinery, SCADA, process control, and kiosks, bought as appliances by enterprise OT people and primarily using wired connectivity
  • Personal technology: smartphones, tablets, and eBook readers bought as IT devices by consumers, exclusively using wireless connectivity and often multiple forms of wireless connectivity
  • Sensor/actuator technology: single-purpose devices bought by consumers, IT, and OT people, exclusively using wireless connectivity, generally of a single form, as part of larger systems
 It is the fourth generation that is usually thought of as the IoT, and it is marked by the use of billions of embedded devices

Embedded Operating Systems

 There are two general approaches to developing an embedded operating system (OS):
  • Take an existing OS and adapt it for the embedded application
  • Design and implement an OS intended solely for embedded use

Application Processors versus Dedicated Processors

 Application processors
  • Defined by the processor's ability to execute complex operating systems
  • General-purpose in nature
  • An example is the smartphone – the embedded system is designed to support numerous apps and perform a wide variety of functions
 Dedicated processors
  • Dedicated to one or a small number of specific tasks required by the host device
  • Because such an embedded system is dedicated to a specific task or tasks, the processor and associated components can be engineered to reduce size and cost
Figure 1.15 Typical Microcontroller Chip Elements (a processor connected over the system bus to RAM for temporary data, ROM for program and data, EEPROM for permanent data, an A/D converter for analog data acquisition, a D/A converter for analog data transmission, serial I/O ports for send/receive data, parallel I/O ports for peripheral interfaces, and a timer for timing functions)

Deeply Embedded Systems

 Subset of embedded systems
 Has a processor whose behavior is difficult to observe both by the programmer and the user
 Uses a microcontroller rather than a microprocessor
 Is not programmable once the program logic for the device has been burned into ROM
 Has no interaction with a user
 Dedicated, single-purpose devices that detect something in the environment, perform a basic level of processing, and then do something with the results
 Often have wireless capability and appear in networked configurations, such as networks of sensors deployed over a large area
 Typically have extreme resource constraints in terms of memory, processor size, time, and power consumption
ARM

 Refers to a processor architecture that has evolved from RISC design principles and is used in embedded systems
 Family of RISC-based microprocessors and microcontrollers designed by ARM Holdings, Cambridge, England
 Chips are high-speed processors that are known for their small die size and low power requirements
 Probably the most widely used embedded processor architecture, and indeed the most widely used processor architecture of any kind in the world
 ARM: originally Acorn RISC Machine, later Advanced RISC Machine

ARM Products

 Cortex-M family
  • Cortex-M0
  • Cortex-M0+
  • Cortex-M3
  • Cortex-M4
Cloud Computing

 NIST defines cloud computing as:
  "A model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction."
 You get economies of scale, professional network management, and professional security management
 The individual or company only needs to pay for the storage capacity and services they need
 The cloud provider takes care of security

Figure 1.16 Typical Microcontroller Chip Based on Cortex-M3 (the Cortex-M3 processor core sits on a bus matrix that connects an ICode interface, SRAM and peripheral interfaces, and a peripheral bus serving hardware AES, an A/D converter, and a D/A converter; an on-chip oscillator provides the clock)
Cloud Networking

 Refers to the networks and network management functionality that must be in place to enable cloud computing
 One example is the provisioning of high-performance and/or high-reliability networking between the provider and subscriber
 The collection of network capabilities required to access a cloud, including making use of specialized services over the Internet, linking enterprise data center to a cloud, and using firewalls and other network security devices at critical points to enforce access security policies

Cloud Storage

 Subset of cloud computing
 Consists of database storage and database applications hosted remotely on cloud servers
 Enables small businesses and individual users to take advantage of data storage that scales with their needs and to take advantage of a variety of database applications without having to buy, maintain, and manage the storage assets
Summary
Chapter 1 – Basic Concepts and Computer Evolution

 Organization and architecture
 Structure and function
 Brief history of computers
  • The first generation: vacuum tubes
  • The second generation: transistors
  • The third generation: integrated circuits
  • Later generations
 The evolution of the Intel x86 architecture
 Embedded systems
  • The Internet of things
  • Embedded operating systems
  • Application processors versus dedicated processors
  • Microprocessors versus microcontrollers
  • Embedded versus deeply embedded systems
 ARM architecture
  • ARM evolution
  • Instruction set architecture
  • ARM products
 Cloud computing
  • Basic concepts
  • Cloud services

William Stallings
Computer Organization and Architecture, 10th Edition
Chapter 3
A Top-Level View of Computer Function and Interconnection

 Contemporary computer designs are based on concepts developed by John von Neumann at the Institute for Advanced Studies, Princeton
 Referred to as the von Neumann architecture and based on three key concepts:
  • Data and instructions are stored in a single read-write memory
  • The contents of this memory are addressable by location, without regard to the type of data contained there
  • Execution occurs in a sequential fashion (unless explicitly modified) from one instruction to the next
 Hardwired program
  • The result of the process of connecting the various components in the desired configuration
Figure 3.1 Hardware and Software Approaches ((a) programming in hardware: data flows through a fixed sequence of arithmetic and logic functions to produce results; (b) programming in software: instruction codes are fed to an instruction interpreter, which issues control signals to a general-purpose arithmetic and logic module that operates on the data)

Software

 A sequence of codes or instructions
 Part of the hardware interprets each instruction and generates control signals
 Provides a new sequence of codes for each new program instead of rewiring the hardware

Major components:

 CPU
  • Instruction interpreter
  • Module of general-purpose arithmetic and logic functions
 I/O components
  • Input module
    - Contains basic components for accepting data and instructions and converting them into an internal form of signals usable by the system
  • Output module
    - Means of reporting results
Registers

 Memory address register (MAR)
  • Specifies the address in memory for the next read or write
 Memory buffer register (MBR)
  • Contains the data to be written into memory or receives the data read from memory
 I/O address register (I/OAR)
  • Specifies a particular I/O device
 I/O buffer register (I/OBR)
  • Used for the exchange of data between an I/O module and the CPU

Figure 3.2 Computer Components: Top-Level View (the CPU, holding the PC, IR, MAR, MBR, I/OAR, I/OBR, and an execution unit, communicates over the system bus with main memory, whose locations 0 through n–1 hold instructions and data, and with an I/O module containing buffers. PC = program counter; IR = instruction register; MAR = memory address register; MBR = memory buffer register; I/OAR = input/output address register; I/OBR = input/output buffer register)
Instruction Fetch and Execute

 At the beginning of each instruction cycle the processor fetches an instruction from memory
 The program counter (PC) holds the address of the instruction to be fetched next
 The processor increments the PC after each instruction fetch so that it will fetch the next instruction in sequence
 The fetched instruction is loaded into the instruction register (IR)
 The processor interprets the instruction and performs the required action

Figure 3.3 Basic Instruction Cycle (START -> fetch cycle: fetch next instruction -> execute cycle: execute instruction -> repeat, or HALT)
Action Categories

 Processor-memory
  • Data transferred from processor to memory or from memory to processor
 Processor-I/O
  • Data transferred to or from a peripheral device by transferring between the processor and an I/O module
 Data processing
  • The processor may perform some arithmetic or logic operation on data
 Control
  • An instruction may specify that the sequence of execution be altered

Figure 3.4 Characteristics of a Hypothetical Machine
 (a) Instruction format (16 bits): a 4-bit opcode in bits 0-3 and a 12-bit address in bits 4-15
 (b) Integer format (16 bits): a sign bit (S) followed by a 15-bit magnitude
 (c) Internal CPU registers: program counter (PC) = address of instruction; instruction register (IR) = instruction being executed; accumulator (AC) = temporary storage
 (d) Partial list of opcodes: 0001 = load AC from memory; 0010 = store AC to memory; 0101 = add to AC from memory
Figure 3.5 Example of Program Execution (contents of memory and registers in hexadecimal): a three-instruction fragment at locations 300-302 (1940, 5941, 2941) is executed in six steps; the first instruction loads the contents of location 940 (0003) into the AC, the second adds the contents of location 941 (0002) giving AC = 0005 (3 + 2 = 5), and the third stores the AC back into location 941; the PC advances from 300 to 303.

Figure 3.6 Instruction Cycle State Diagram (instruction address calculation -> instruction fetch -> instruction operation decoding -> operand address calculation -> operand fetch (repeated for multiple operands) -> data operation -> operand address calculation -> operand store (repeated for multiple results); return for string or vector data; when the instruction is complete, fetch the next instruction)
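As a concrete rendering of the fetch-execute cycle, the sketch below (illustrative Python; the memory contents and register values are those of Figure 3.5, while the code itself is an assumed simulation rather than material from the slides) steps the hypothetical machine of Figure 3.4 through the three-instruction program using the partial opcode list: 0001 = load, 0101 = add, 0010 = store.

```python
# Minimal simulation of the hypothetical machine of Figures 3.4 and 3.5.
# 16-bit words: 4-bit opcode, 12-bit address. All values are hexadecimal.
memory = {0x300: 0x1940,   # load AC from location 940
          0x301: 0x5941,   # add contents of location 941 to AC
          0x302: 0x2941,   # store AC to location 941
          0x940: 0x0003,
          0x941: 0x0002}

pc, ac = 0x300, 0
for _ in range(3):
    ir = memory[pc]                      # fetch cycle: instruction into IR
    pc += 1                              # PC now points to the next instruction
    opcode, address = ir >> 12, ir & 0xFFF
    if opcode == 0x1:                    # 0001 = load AC from memory
        ac = memory[address]
    elif opcode == 0x5:                  # 0101 = add to AC from memory
        ac = (ac + memory[address]) & 0xFFFF
    elif opcode == 0x2:                  # 0010 = store AC to memory
        memory[address] = ac

print(hex(ac), hex(memory[0x941]))       # 0x5 0x5, as in step 6 of Figure 3.5
```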
Table 3.1 Classes of Interrupts

 Program: generated by some condition that occurs as a result of an instruction execution, such as arithmetic overflow, division by zero, attempt to execute an illegal machine instruction, or reference outside a user's allowed memory space.
 Timer: generated by a timer within the processor. This allows the operating system to perform certain functions on a regular basis.
 I/O: generated by an I/O controller, to signal normal completion of an operation, to request service from the processor, or to signal a variety of error conditions.
 Hardware failure: generated by a failure such as power failure or memory parity error.

Figure 3.7 Program Flow of Control Without and With Interrupts ((a) no interrupts: after each WRITE call the user program waits while the I/O program (I/O command through END) completes; (b) interrupts, short I/O wait: the I/O command is issued, the user program continues executing, and when the I/O operation completes the interrupt handler runs before control returns; (c) interrupts, long I/O wait: the user program reaches the next WRITE before the I/O operation finishes and must still wait for part of it. The marked points indicate where an interrupt occurs during the course of execution of the user program.)
Figure 3.8 Transfer of Control via Interrupts (the user program is interrupted after instruction i; control transfers to the interrupt handler, which on completion returns control to instruction i + 1)

Figure 3.9 Instruction Cycle with Interrupts (START -> fetch cycle: fetch next instruction -> execute cycle: execute instruction -> interrupt cycle: with interrupts enabled, check for an interrupt and, if one is pending, process the interrupt; with interrupts disabled the interrupt cycle is skipped -> repeat, or HALT)
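Figure 3.9 can be read as the basic loop of Figure 3.3 with one extra stage after each execution. The outline below is a minimal illustrative sketch in Python; the helper names (fetch, execute, interrupts_enabled, interrupt_pending, process_interrupt) are placeholders invented for this example, not functions defined in the text.

```python
def run(cpu):
    # Basic instruction cycle extended with the interrupt cycle of Figure 3.9.
    while not cpu.halted:
        instruction = fetch(cpu)    # fetch cycle: read the instruction at PC, advance PC
        execute(cpu, instruction)   # execute cycle: perform the required action
        # Interrupt cycle: only entered when interrupts are enabled.
        if interrupts_enabled(cpu) and interrupt_pending(cpu):
            process_interrupt(cpu)  # save PC/state, load PC with the handler address
```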
Figure 3.10 Program Timing: Short I/O Wait ((a) without interrupts, the processor waits for each I/O operation to complete; (b) with interrupts, the I/O operation proceeds concurrently with processor execution, and because the I/O operation finishes before the next WRITE is reached, no processor waiting is required)

Figure 3.11 Program Timing: Long I/O Wait ((a) without interrupts, the processor waits for the entire I/O operation; (b) with interrupts, the I/O operation is concurrent with processor execution, and the processor waits only for the portion of the I/O operation that has not yet completed when the next WRITE is reached)
Figure 3.12 Instruction Cycle State Diagram, With Interrupts (the state diagram of Figure 3.6 with an interrupt check added after the data operation and operand store: if no interrupt is pending, the next instruction is fetched; otherwise the interrupt is processed before fetching the next instruction)

Figure 3.13 Transfer of Control with Multiple Interrupts ((a) sequential interrupt processing: interrupt handler X completes before interrupt handler Y begins; (b) nested interrupt processing: handler X is itself interrupted and control passes to handler Y, which returns to X, which then returns to the user program)
I/O Function

 An I/O module can exchange data directly with the processor
 The processor can read data from or write data to an I/O module
  • The processor identifies a specific device that is controlled by a particular I/O module
  • Uses I/O instructions rather than memory-referencing instructions
 In some cases it is desirable to allow I/O exchanges to occur directly with memory
  • The processor grants to an I/O module the authority to read from or write to memory, so that the I/O-memory transfer can occur without tying up the processor
  • The I/O module issues read or write commands to memory, relieving the processor of responsibility for the exchange
  • This operation is known as direct memory access (DMA)

Figure 3.14 Example Time Sequence of Multiple Interrupts (a user program starting at t = 0 is interrupted in turn by the printer, disk, and communication interrupt service routines)
The interconnection structure must support the following types of transfers:

 Memory to processor: the processor reads an instruction or a unit of data from memory
 Processor to memory: the processor writes a unit of data to memory
 I/O to processor: the processor reads data from an I/O device via an I/O module
 Processor to I/O: the processor sends data to the I/O device
 I/O to or from memory: an I/O module is allowed to exchange data directly with memory, without going through the processor, using direct memory access

Figure 3.15 Computer Modules (a memory module of N words, addresses 0 to N–1, with read and write controls, address input, and data in/out; an I/O module with M ports, read and write controls, internal and external data paths, and interrupt signals; a CPU that accepts instructions and data, issues addresses and control signals, sends data, and receives interrupt signals)
Bus Interconnection

 A bus is a communication pathway connecting two or more devices
  • Its key characteristic is that it is a shared transmission medium
 Signals transmitted by any one device are available for reception by all other devices attached to the bus
  • If two devices transmit during the same time period, their signals will overlap and become garbled
 A bus typically consists of multiple communication lines
  • Each line is capable of transmitting signals representing binary 1 and binary 0
 Computer systems contain a number of different buses that provide pathways between components at various levels of the computer system hierarchy
 System bus
  • A bus that connects major computer components (processor, memory, I/O)
 The most common computer interconnection structures are based on the use of one or more system buses

Data Bus

 Data lines that provide a path for moving data among system modules
 May consist of 32, 64, 128, or more separate lines
 The number of lines is referred to as the width of the data bus
 The number of lines determines how many bits can be transferred at a time
 The width of the data bus is a key factor in determining overall system performance
Address Bus

 Used to designate the source or destination of the data on the data bus
  • If the processor wishes to read a word of data from memory, it puts the address of the desired word on the address lines
 Width determines the maximum possible memory capacity of the system
 Also used to address I/O ports
  • The higher-order bits are used to select a particular module on the bus, and the lower-order bits select a memory location or I/O port within the module

Control Bus

 Used to control the access to and the use of the data and address lines
 Because the data and address lines are shared by all components, there must be a means of controlling their use
 Control signals transmit both command and timing information among system modules
  • Timing signals indicate the validity of data and address information
  • Command signals specify operations to be performed

Figure 3.16 Bus Interconnection Scheme (the CPU, memory modules, and I/O modules all attach to the shared control lines, address lines, and data lines that make up the bus)
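As a quick worked example of the two points above (the 16-bit width and the 4-bit module field are hypothetical numbers chosen for illustration, not values from the slides): an n-bit address bus can address at most 2^n locations, and an address can be split into a module-select field and an in-module offset.

```python
# Maximum memory capacity addressable with an n-bit address bus is 2**n locations.
n = 16
print(2 ** n)               # 65536 addressable locations for a 16-bit address bus

# Example split: the 4 higher-order bits select a module on the bus,
# the 12 lower-order bits select a location or I/O port within that module.
address = 0xA7F3
module = address >> 12      # 0xA   -> module 10
offset = address & 0xFFF    # 0x7F3 -> location within the module
print(module, hex(offset))  # 10 0x7f3
```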
Point-to-Point Interconnect

 The principal reason for the change was the electrical constraints encountered with increasing the frequency of wide synchronous buses
  • At higher and higher data rates it becomes increasingly difficult to perform the synchronization and arbitration functions in a timely fashion
  • A conventional shared bus on the same chip magnified the difficulties of increasing bus data rate and reducing bus latency to keep up with the processors
 Has lower latency, higher data rate, and better scalability

QuickPath Interconnect (QPI)

 Introduced in 2008
 Multiple direct connections
  • Direct pairwise connections to other components, eliminating the need for arbitration found in shared transmission systems
 Layered protocol architecture
  • These processor-level interconnects use a layered protocol architecture rather than the simple use of control signals found in shared bus arrangements
 Packetized data transfer
  • Data are sent as a sequence of packets, each of which includes control headers and error control codes
Figure 3.17 Multicore Configuration Using QPI (cores A, B, C, and D are interconnected pairwise by QPI links; each core has its own DRAM attached over a memory bus, and I/O hubs connect to I/O devices over PCI Express)

Figure 3.18 QPI Layers (the protocol layers of two components exchange packets, the routing layers determine their paths, the link layers exchange flits, and the physical layers exchange phits)
Figure 3.19 Physical Interface of the Intel QPI Interconnect (each port on components A and B has a set of transmission lanes, a set of reception lanes, and forwarded and received clocks (Fwd Clk, Rcv Clk); component A's transmission lanes connect to component B's reception lanes and vice versa)

Figure 3.20 QPI Multilane Distribution (the bit stream of flits is distributed round-robin across the 20 QPI lanes: bits #1, #n+1, #2n+1, ... go to lane 0; bits #2, #n+2, #2n+2, ... go to lane 1; and so on up to bits #n, #2n, #3n, ... on lane 19)
QPI Link Layer

 Performs two key functions: flow control and error control
  • These functions operate on the level of the flit (flow control unit)
  • Each flit consists of a 72-bit message payload and an 8-bit error control code called a cyclic redundancy check (CRC)
 Flow control function
  • Needed to ensure that a sending QPI entity does not overwhelm a receiving QPI entity by sending data faster than the receiver can process the data and clear buffers for more incoming data
 Error control function
  • Detects and recovers from bit errors, and so isolates higher layers from experiencing bit errors

QPI Routing Layer

 Used to determine the course that a packet will traverse across the available system interconnects
 Defined by firmware, which describes the possible paths that a packet can follow

QPI Protocol Layer

 The packet is defined as the unit of transfer
 One key function performed at this level is a cache coherency protocol, which deals with making sure that main memory values held in multiple caches are consistent
 A typical data packet payload is a block of data being sent to or from a cache
PCI and PCI Express

 PCI is a popular high-bandwidth, processor-independent bus that can function as a mezzanine or peripheral bus
 Delivers better system performance for high-speed I/O subsystems
 PCI Special Interest Group (SIG)
  • Created to develop further and maintain the compatibility of the PCI specifications
 PCI Express (PCIe)
  • Point-to-point interconnect scheme intended to replace bus-based schemes such as PCI
  • A key requirement is high capacity to support the needs of higher data rate I/O devices, such as Gigabit Ethernet
  • Another requirement deals with the need to support time-dependent data streams

Figure 3.21 Typical Configuration Using PCIe (a chipset connects the cores, memory, Gigabit Ethernet, and a PCIe-PCI bridge; a PCIe switch fans out to PCIe endpoints and a legacy endpoint)
Figure 3.22 PCIe Protocol Layers (the transaction layers of two devices exchange transaction layer packets (TLPs), the data link layers exchange data link layer packets (DLLPs), and the physical layers carry a 128b/130b-encoded byte stream over each PCIe lane)

Figure 3.23 PCIe Multilane Distribution (successive bytes of the outgoing stream are striped round-robin across the lanes: B0 to lane 0, B1 to lane 1, B2 to lane 2, B3 to lane 3, then B4 back to lane 0, and so on, with each lane applying 128b/130b encoding)
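The lane striping of Figure 3.23 is simply a round-robin split of the outgoing byte stream. The snippet below is an illustrative Python sketch of that distribution for a 4-lane link (the function name is an assumption for this example, and the per-lane 128b/130b encoding step is omitted).

```python
def stripe_bytes(data: bytes, lanes: int = 4):
    """Distribute successive bytes round-robin across PCIe lanes (as in Figure 3.23)."""
    return [data[i::lanes] for i in range(lanes)]

# B0..B7 of the byte stream: lane 0 gets B0 and B4, lane 1 gets B1 and B5, and so on.
print(stripe_bytes(bytes(range(8))))
# [b'\x00\x04', b'\x01\x05', b'\x02\x06', b'\x03\x07']
```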
PCIe Transaction Layer (TL)

 Receives read and write requests from the software above the TL and creates request packets for transmission to a destination via the link layer
 Most transactions use a split transaction technique
  • A request packet is sent out by a source PCIe device, which then waits for a response called a completion packet
 TL messages and some write transactions are posted transactions (meaning that no response is expected)
 The TL packet format supports 32-bit memory addressing and extended 64-bit memory addressing

Figure 3.24 PCIe Transmit and Receive Block Diagrams ((a) transmitter: outgoing bytes are scrambled, 128b/130b encoded, converted from parallel to serial, and driven onto the differential pair (D+, D–) by the differential driver; (b) receiver: the differential receiver with its clock recovery and data recovery circuits recovers the serial bit stream, which is converted from serial to parallel, 128b/130b decoded, and descrambled)
PCIe Address Spaces

 Memory
  • The memory space includes system main memory and PCIe I/O devices
  • Certain ranges of memory addresses map into I/O devices
 I/O
  • This address space is used for legacy PCI devices, with reserved address ranges used to address legacy I/O devices
 Configuration
  • This address space enables the TL to read/write configuration registers associated with I/O devices
 Message
  • This address space is for control signals related to interrupts, error handling, and power management

Table 3.2 PCIe TLP Transaction Types

 Memory address space
  • Memory Read Request, Memory Read Lock Request, Memory Write Request – transfer data to or from a location in the system memory map
 I/O address space
  • I/O Read Request, I/O Write Request – transfer data to or from a location in the system memory map for legacy devices
 Configuration address space
  • Config Type 0 Read Request, Config Type 0 Write Request, Config Type 1 Read Request, Config Type 1 Write Request – transfer data to or from a location in the configuration space of a PCIe device
 Message address space
  • Message Request, Message Request with Data – provides in-band messaging and event reporting
 Memory, I/O, Configuration
  • Completion, Completion with Data, Completion Locked, Completion Locked with Data – returned for certain requests
Figure 3.25 PCIe Protocol Data Unit Format ((a) Transaction Layer Packet: STP framing (1 octet), appended by the physical layer; sequence number (2 octets), appended by the data link layer; header (12 or 16 octets), data (0 to 4096 octets), and optional ECRC (0 or 4 octets), created by the transaction layer; LCRC (4 octets), appended by the data link layer. (b) Data Link Layer Packet: start framing (1 octet), DLLP content (4 octets), created by the data link layer, CRC (2 octets), end framing (1 octet).)

Summary
Chapter 3 – A Top-Level View of Computer Function and Interconnection

 Computer components
 Computer function
  • Instruction fetch and execute
  • Interrupts
  • I/O function
 Interconnection structures
 Bus interconnection
 Point-to-point interconnect
  • QPI physical layer
  • QPI link layer
  • QPI routing layer
  • QPI protocol layer
 PCI express
  • PCI physical and logical architecture
  • PCIe physical layer
  • PCIe transaction layer
  • PCIe data link layer
William Stallings
Computer Organization and Architecture, 10th Edition

Chapter 5
Internal Memory
Figure 5.1 Memory Cell Operation ((a) write: control/select signals and a data-in line set the state of the cell; (b) read: control/select signals cause the cell to place its stored value on the sense line)

Table 5.1 Semiconductor Memory Types

Memory Type                          Category            Erasure                    Write Mechanism   Volatility
Random-access memory (RAM)           Read-write memory   Electrically, byte-level   Electrically      Volatile
Read-only memory (ROM)               Read-only memory    Not possible               Masks             Nonvolatile
Programmable ROM (PROM)              Read-only memory    Not possible               Electrically      Nonvolatile
Erasable PROM (EPROM)                Read-mostly memory  UV light, chip-level       Electrically      Nonvolatile
Electrically Erasable PROM (EEPROM)  Read-mostly memory  Electrically, byte-level   Electrically      Nonvolatile
Flash memory                         Read-mostly memory  Electrically, block-level  Electrically      Nonvolatile
Dynamic RAM

 RAM technology is divided into two technologies:
  • Dynamic RAM (DRAM)
  • Static RAM (SRAM)
 DRAM
  • Made with cells that store data as charge on capacitors
  • Presence or absence of charge in a capacitor is interpreted as a binary 1 or 0
  • Requires periodic charge refreshing to maintain data storage
  • The term dynamic refers to the tendency of the stored charge to leak away, even with power continuously applied

Figure 5.2 Typical Memory Cell Structures ((a) dynamic RAM (DRAM) cell: a single transistor and storage capacitor between the address line, the bit line B, and ground; (b) static RAM (SRAM) cell: a flip-flop of transistors T1-T6 between the dc voltage and ground, connected to the address line and the bit lines B and B-complement)
Static RAM (SRAM)

 Digital device that uses the same logic elements used in the processor
 Binary values are stored using traditional flip-flop logic-gate configurations
 Will hold its data as long as power is supplied to it

SRAM versus DRAM

 Both are volatile
  • Power must be continuously supplied to the memory to preserve the bit values
 Dynamic cell
  • Simpler to build, smaller
  • More dense (smaller cells = more cells per unit area)
  • Less expensive
  • Requires the supporting refresh circuitry
  • Tends to be favored for large memory requirements
  • Used for main memory
 Static cell
  • Faster
  • Used for cache memory (both on and off chip)
Read-Only Memory (ROM)

 Contains a permanent pattern of data that cannot be changed or added to
 No power source is required to maintain the bit values in memory
 Data or program is permanently in main memory and never needs to be loaded from a secondary storage device
 Data is actually wired into the chip as part of the fabrication process
 Disadvantages of this:
  • No room for error; if one bit is wrong, the whole batch of ROMs must be thrown out
  • The data insertion step includes a relatively large fixed cost

Programmable ROM (PROM)

 Less expensive alternative
 Nonvolatile and may be written into only once
 Writing process is performed electrically and may be performed by supplier or customer at a time later than the original chip fabrication
 Special equipment is required for the writing process
 Provides flexibility and convenience
 Attractive for high-volume production runs
Read-Mostly Memory

 EPROM (erasable programmable read-only memory)
  • Erasure process can be performed repeatedly
  • More expensive than PROM, but it has the advantage of the multiple update capability
 EEPROM (electrically erasable programmable read-only memory)
  • Can be written into at any time without erasing prior contents
  • Combines the advantage of non-volatility with the flexibility of being updatable in place
  • More expensive than EPROM
 Flash memory
  • Intermediate between EPROM and EEPROM in both cost and functionality
  • Uses an electrical erasing technology; does not provide byte-level erasure
  • The microchip is organized so that a section of memory cells is erased in a single action or "flash"

Figure 5.3 Typical 16 Megabit DRAM (4M x 4) (timing and control logic driven by RAS, CAS, WE, and OE; a refresh counter and MUX; row and column address buffers (A0-A10) and decoders; a 2048 x 2048 x 4 memory array with refresh circuitry; and data input and output buffers for D1-D4)
Figure 5.4 Typical Memory Package Pins and Signals ((a) an 8-Mbit EPROM organized as 1M x 8 in a 32-pin dual in-line package, with address lines A0-A19, data lines D0-D7, chip enable (CE), program voltage (Vpp), and power (Vcc) and ground (Vss) pins; (b) a 16-Mbit DRAM organized as 4M x 4 in a 24-pin dual in-line package, with multiplexed address lines A0-A10, data lines D0-D3, and RAS, CAS, WE, and OE control signals)

Figure 5.5 256-KByte Memory Organization (eight 512 x 512 chips; the memory address register (MAR) drives a 1-of-512 row decoder and a 1-of-512 bit-sense column decoder shared by all chips; each chip contributes one bit, and the eight bits of the addressed byte are assembled in the memory buffer register (MBR))
Interleaved Memory
 Composed of a collection of DRAM chips
 Grouped together to form a memory bank
 Each bank is independently able to service a memory read or write request
 K banks can service K requests simultaneously, increasing memory read or write rates by a factor of K
 If consecutive words of memory are stored in different banks, the transfer of a block of memory is speeded up

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
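The following is a minimal sketch (my own illustration, not from the text) of low-order interleaving: consecutive word addresses fall in different banks, so a block of consecutive words can be serviced by several banks in parallel. The bank count K = 4 is an assumption made for the example.

#include <stdio.h>

#define K 4   /* number of banks (assumed for the example) */

int main(void)
{
    /* consecutive word addresses rotate through the K banks */
    for (unsigned addr = 0; addr < 8; addr++)
        printf("word %u -> bank %u, offset %u\n", addr, addr % K, addr / K);
    return 0;
}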
+
 Hard Failure
 Permanent physical defect
 Memory cell or cells affected cannot reliably store data but become stuck at 0 or 1 or switch erratically between 0 and 1
 Can be caused by:
 Harsh environmental abuse
 Manufacturing defects
 Wear

 Soft Error
 Random, non-destructive event that alters the contents of one or more memory cells
 No permanent damage to memory
 Can be caused by:
 Power supply problems
 Alpha particles

Figure 5.7 Error-Correcting Code Function: the M data bits in pass through a function f producing K check bits, both of which are stored; on a read, the stored check bits are compared with check bits recomputed from the fetched data, and a corrector uses the comparison to produce the M data bits out and an error signal

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 5.8 Hamming Error-Correcting Code (Venn-diagram illustration, parts (a) through (d))

Table 5.2 Increase in Word Length with Error Correction

            Single-Error Correction        Single-Error Correction/Double-Error Detection
Data Bits   Check Bits    % Increase       Check Bits    % Increase
8           4             50               5             62.5
16          5             31.25            6             37.5
32          6             18.75            7             21.875
64          7             10.94            8             12.5
128         8             6.25             9             7.03
256         9             3.52             10            3.91

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 5.9 Layout of Data Bits and Check Bits: bit positions 12 down to 1 with position numbers 1100 down to 0001; data bits D8-D1 occupy the positions whose position numbers contain more than one 1 bit, and check bits C8, C4, C2, C1 occupy positions 8, 4, 2, and 1

Figure 5.10 Check Bit Calculation: the word is stored as 001101001111 and fetched as 001101101111; the check bits are recomputed from the fetched data bits and compared with the stored check bits, and the resulting syndrome identifies the bit position in error

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
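Below is a minimal sketch (my own illustration, assuming the data/check-bit layout of Figure 5.9) of how the even-parity check bits are computed and how comparing stored and recomputed check bits yields a syndrome pointing at the erroneous bit. The byte values used are illustrative, not the textbook's example.

#include <stdio.h>

/* Compute C8,C4,C2,C1 for an 8-bit data word (D8..D1).  Data bits D1..D8 sit
   in positions 3,5,6,7,9,10,11,12; a data bit in position p contributes to
   check bit 2^k whenever bit k of p is set. */
static unsigned hamming_check_bits(unsigned char data)
{
    static const int data_pos[8] = { 3, 5, 6, 7, 9, 10, 11, 12 }; /* D1..D8 */
    unsigned check = 0;                 /* bit0=C1, bit1=C2, bit2=C4, bit3=C8 */
    for (int i = 0; i < 8; i++) {
        if (data >> i & 1) {
            int pos = data_pos[i];
            for (int k = 0; k < 4; k++)
                if (pos >> k & 1)
                    check ^= 1u << k;   /* even parity: XOR of contributions */
        }
    }
    return check;
}

int main(void)
{
    unsigned char stored  = 0x35;       /* illustrative data byte */
    unsigned char fetched = 0x75;       /* same byte with one bit (D7) flipped */
    /* syndrome = stored check bits XOR recomputed check bits;
       nonzero means an error, and its value is the erroneous bit position */
    printf("syndrome = %u\n",
           hamming_check_bits(stored) ^ hamming_check_bits(fetched)); /* 11 */
    return 0;
}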
Figure 5.11 Hamming SEC-DED Code (Venn-diagram illustration, parts (a) through (f))

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

+
Advanced DRAM Organization
 One of the most critical system bottlenecks when using high-performance processors is the interface to main internal memory
 The traditional DRAM chip is constrained both by its internal architecture and by its interface to the processor's memory bus
 A number of enhancements to the basic DRAM architecture have been explored
 The schemes that currently dominate the market are SDRAM and DDR-DRAM
 SDRAM
 DDR-DRAM
 RDRAM

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Synchronous DRAM (SDRAM)

One of the most widely used forms of DRAM

Exchanges data with the processor synchronized


to an external clock signal and running at the full
speed of the processor/memory bus without
imposing wait states

With synchronous access the DRAM moves data in


and out under control of the system clock
• The processor or other master issues the instruction
and address information which is latched by the DRAM
• The DRAM then responds after a set number of clock
cycles
• Meanwhile the master can safely do other tasks while
the SDRAM is processing

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Table 5.3 SDRAM Pin Assignments

A0 to A12       Address inputs
BA0, BA1        Bank address lines
CLK             Clock input
CKE             Clock enable
CS              Chip select
RAS             Row address strobe
CAS             Column address strobe
WE              Write enable
DQ0 to DQ15     Data input/output
DQM             Data mask

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

Figure 5.13 SDRAM Read Timing (Burst Length = 4, CAS latency = 2): a READ command issued at clock T0 is followed by NOPs; the four data words DOUT A0 through DOUT A3 appear on the DQ lines on successive clocks beginning two cycles after the command

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
 Developed by the JEDEC Solid State Technology Association (the Electronic Industries Alliance's semiconductor-engineering-standardization body)
 Numerous companies make DDR chips, which are widely used in desktop computers and servers
 DDR achieves higher data rates in three ways:
 First, the data transfer is synchronized to both the rising and falling edge of the clock, rather than just the rising edge
 Second, DDR uses a higher clock rate on the bus to increase the transfer rate
 Third, a buffering scheme is used

Table 5.4 DDR Characteristics

                                    DDR1       DDR2        DDR3        DDR4
Prefetch buffer (bits)              2          4           8           8
Voltage level (V)                   2.5        1.8         1.5         1.2
Front side bus data rates (Mbps)    200-400    400-1066    800-2133    2133-4266

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+
Flash Memory

 Used both for internal memory and external memory applications

 First introduced in the mid-1980’s

 Is intermediate between EPROM and EEPROM in both cost and


functionality

 Uses an electrical erasing technology like EEPROM

 It is possible to erase just blocks of memory rather than an entire chip

 Gets its name because the microchip is organized so that a section of


memory cells are erased in a single action

 Does not provide byte-level erasure

 Uses only one transistor per bit so it achieves the high density of
EPROM

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 5.15 Flash Memory Operation: (a) transistor structure with control gate, floating gate, drain, source, and P-substrate; (b) flash memory cell in the one state; (c) flash memory cell in the zero state

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 5.17 Kiviat Graphs for Flash Memory: (a) NOR and (b) NAND, each rated on cost per bit, standby power, active power, read speed, write speed, capacity, suitability for code execution, and suitability for file storage

Figure 5.18 Nonvolatile RAM within the Memory Hierarchy: SRAM, STT-RAM, DRAM, PCRAM, ReRAM, NAND flash, and hard disk, arranged by increasing performance and endurance in one direction and decreasing cost per bit with increasing capacity or density in the other

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 5.19 Nonvolatile RAM Technologies: (a) STT-RAM cell, with a free perpendicular magnetic layer and a reference perpendicular magnetic layer separated by an insulating layer between the bit line and a base electrode, the direction of magnetization of the free layer encoding binary 0 or 1; (b) PCRAM cell, with polycrystalline or amorphous chalcogenide over a heater and insulator between top and bottom electrodes; (c) ReRAM cell, in which a metal-oxide filament gives low resistance when reduced and high resistance when oxidized

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

+ Summary
Internal Memory
Chapter 5
 Semiconductor main memory
 Organization
 DRAM and SRAM
 Types of ROM
 Chip logic
 Chip packaging
 Module organization
 Interleaved memory
 Error correction
 DDR DRAM
 Synchronous DRAM
 DDR SDRAM
 Flash memory
 Operation
 NOR and NAND flash memory
 Newer nonvolatile solid-state memory technologies

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+
William Stallings
Computer Organization and Architecture
10th Edition

+ Chapter 4
Cache Memory

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Table 4.1 Key Characteristics of Computer Memory Systems

Location: Internal (e.g. processor registers, cache, main memory); External (e.g. optical disks, magnetic disks, tapes)
Capacity: Number of words; Number of bytes
Unit of Transfer: Word; Block
Access Method: Sequential; Direct; Random; Associative
Performance: Access time; Cycle time; Transfer rate
Physical Type: Semiconductor; Magnetic; Optical; Magneto-optical
Physical Characteristics: Volatile/nonvolatile; Erasable/nonerasable
Organization: Memory modules

+
 Location
 Refers to whether memory is internal or external to the computer
 Internal memory is often equated with main memory
 Processor requires its own local memory, in the form of registers
 Cache is another form of internal memory
 External memory consists of peripheral storage devices that are accessible to the processor via I/O controllers
 Capacity
 Memory is typically expressed in terms of bytes
 Unit of transfer
 For internal memory the unit of transfer is equal to the number of electrical lines into and out of the memory module

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Method of Accessing Units of Data

Sequential access
 Memory is organized into units of data called records
 Access must be made in a specific linear sequence
 Access time is variable

Direct access
 Involves a shared read-write mechanism
 Individual blocks or records have a unique address based on physical location
 Access time is variable

Random access
 Each addressable location in memory has a unique, physically wired-in addressing mechanism
 The time to access a given location is independent of the sequence of prior accesses and is constant
 Any location can be selected at random and directly addressed and accessed
 Main memory and some cache systems are random access

Associative
 A word is retrieved based on a portion of its contents rather than its address
 Each location has its own addressing mechanism and retrieval time is constant independent of location or prior access patterns
 Cache memories may employ associative access

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

Capacity and Performance:
The two most important characteristics of memory

Three performance parameters are used:

Access time (latency)
• For random-access memory it is the time it takes to perform a read or write operation
• For non-random-access memory it is the time it takes to position the read-write mechanism at the desired location

Memory cycle time
• Access time plus any additional time required before a second access can commence
• Additional time may be required for transients to die out on signal lines or to regenerate data if they are read destructively
• Concerned with the system bus, not the processor

Transfer rate
• The rate at which data can be transferred into or out of a memory unit
• For random-access memory it is equal to 1/(cycle time)

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+ Memory
 The most common forms are:
 Semiconductor memory
 Magnetic surface memory
 Optical
 Magneto-optical
 Several physical characteristics of data storage are important:
 Volatile memory
 Information decays naturally or is lost when electrical power is switched off
 Nonvolatile memory
 Once recorded, information remains without deterioration until deliberately changed
 No electrical power is needed to retain information
 Magnetic-surface memories
 Are nonvolatile
 Semiconductor memory
 May be either volatile or nonvolatile
 Nonerasable memory
 Cannot be altered, except by destroying the storage unit
 Semiconductor memory of this type is known as read-only memory (ROM)
 For random-access memory the organization is a key design issue
 Organization refers to the physical arrangement of bits to form words

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

+ Example

Suppose that the processor has access to two levels of memory. Level 1 contains 1000 words and has an access time of 0.1 microseconds; level 2 contains 100,000 words and has an access time of 1 microsecond. Assume that if a word to be accessed is in level 1, then the processor accesses it directly. If it is in level 2, then the word is first transferred to level 1 and then accessed by the processor. For simplicity, ignore the time required for the processor to determine whether the word is in level 1 or level 2.

Suppose 95% of the memory accesses are found in the cache (level 1). Calculate the average time required to access a word.

Average time to access a word = (0.95)(0.1 microsec) + (0.05)(0.1 microsec + 1 microsec)
                              = 0.095 + 0.055
                              = 0.15 microsec

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
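A minimal sketch (my own, not from the text) of the same calculation as a reusable formula: T_avg = H*T1 + (1-H)*(T1 + T2), where H is the hit ratio and T1, T2 are the level-1 and level-2 access times.

#include <stdio.h>

static double avg_access_time(double hit_ratio, double t1, double t2)
{
    /* a miss costs the level-1 access plus the level-2 access */
    return hit_ratio * t1 + (1.0 - hit_ratio) * (t1 + t2);
}

int main(void)
{
    /* values from the example: T1 = 0.1 us, T2 = 1 us, 95% hit ratio */
    printf("average access time = %.3f microseconds\n",
           avg_access_time(0.95, 0.1, 1.0));   /* prints 0.150 */
    return 0;
}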
 Design constraints on a computer’s memory can be summed
up by three questions:
 How much, how fast, how expensive

 There is a trade-off among capacity, access time, and cost


 Faster access time, greater cost per bit
 Greater capacity, smaller cost per bit
 Greater capacity, slower access time

 The way out of the memory dilemma is not to rely on a single


memory component or technology, but to employ a memory
hierarchy

Figure 4.1 The Memory Hierarchy

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.


© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 4.2 Performance of a Simple Two-Level Memory: the average access time falls from T1 + T2 toward T1 as the fraction of accesses involving only level 1 (the hit ratio) rises from 0 to 1

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

+ Memory
 The use of three levels exploits the fact that semiconductor memory comes in a variety of types which differ in speed and cost
 Data are stored more permanently on external mass storage devices
 External, nonvolatile memory is also referred to as secondary memory or auxiliary memory
 Disk cache
 A portion of main memory can be used as a buffer to hold data temporarily that is to be read out to disk
 A few large transfers of data can be used instead of many small transfers of data
 Data can be retrieved rapidly from the software cache rather than slowly from the disk

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 4.3 Cache and Main Memory: (a) a single cache between the CPU (word transfer, fast) and main memory (block transfer, slow); (b) a three-level cache organization with L1, L2, and L3 caches between the CPU and main memory

Figure 4.4 Cache/Main-Memory Structure: (a) the cache holds C lines, each consisting of a tag and a block of K words; (b) main memory holds 2^n words organized as M blocks of K words each

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 4.5 Cache Read Operation: receive address RA from the CPU; if the block containing RA is in the cache, fetch the RA word and deliver it to the CPU; otherwise access main memory for the block containing RA, allocate a cache line for the block, load the block into the line, and deliver the RA word to the CPU

Figure 4.6 Typical Cache Organization: the cache connects to the processor via data, control, and address lines, and to the system bus through address and data buffers

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
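The following is a minimal sketch (my own pseudocode-style C, not the book's) of the read flow of Figure 4.5 for a direct-mapped cache. The field widths and the memory_read_block() helper are assumptions made purely for the illustration.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define WORD_BITS   2
#define LINE_BITS   10
#define NUM_LINES   (1u << LINE_BITS)
#define BLOCK_WORDS (1u << WORD_BITS)

struct cache_line { bool valid; uint32_t tag; uint32_t words[BLOCK_WORDS]; };
static struct cache_line cache[NUM_LINES];

/* Stand-in for a main-memory block read (hypothetical helper). */
static void memory_read_block(uint32_t block_addr, uint32_t words[BLOCK_WORDS])
{
    for (uint32_t i = 0; i < BLOCK_WORDS; i++)
        words[i] = block_addr + i;            /* dummy contents */
}

static uint32_t cache_read(uint32_t addr)     /* receive address RA from CPU */
{
    uint32_t word = addr & (BLOCK_WORDS - 1);
    uint32_t line = (addr >> WORD_BITS) & (NUM_LINES - 1);
    uint32_t tag  = addr >> (WORD_BITS + LINE_BITS);

    if (!(cache[line].valid && cache[line].tag == tag)) {
        /* miss: access main memory for the block containing RA,
           allocate the cache line, and load the block into it */
        memory_read_block(addr & ~(uint32_t)(BLOCK_WORDS - 1), cache[line].words);
        cache[line].valid = true;
        cache[line].tag   = tag;
    }
    return cache[line].words[word];           /* deliver RA word to CPU */
}

int main(void)
{
    printf("%u\n", cache_read(0x1234));       /* first access misses, loads the block */
    printf("%u\n", cache_read(0x1234));       /* second access hits in the cache */
    return 0;
}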
Table 4.2 Elements of Cache Design

Cache Addresses: Logical; Physical
Cache Size
Mapping Function: Direct; Associative; Set Associative
Replacement Algorithm: Least recently used (LRU); First in first out (FIFO); Least frequently used (LFU); Random
Write Policy: Write through; Write back
Line Size
Number of caches: Single or two level; Unified or split

+ Cache Addresses
Virtual Memory
 Virtual memory
 Facility that allows programs to address memory from a logical point of view, without regard to the amount of main memory physically available
 When used, the address fields of machine instructions contain virtual addresses
 For reads to and writes from main memory, a hardware memory management unit (MMU) translates each virtual address into a physical address in main memory

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Table 4.3 Cache Sizes of Some Processors

Processor             Type                            Year of Introduction   L1 Cache^a          L2 Cache         L3 Cache
IBM 360/85            Mainframe                       1968                   16 to 32 kB         —                —
PDP-11/70             Minicomputer                    1975                   1 kB                —                —
VAX 11/780            Minicomputer                    1978                   16 kB               —                —
IBM 3033              Mainframe                       1978                   64 kB               —                —
IBM 3090              Mainframe                       1985                   128 to 256 kB       —                —
Intel 80486           PC                              1989                   8 kB                —                —
Pentium               PC                              1993                   8 kB/8 kB           256 to 512 KB    —
PowerPC 601           PC                              1993                   32 kB               —                —
PowerPC 620           PC                              1996                   32 kB/32 kB         —                —
PowerPC G4            PC/server                       1999                   32 kB/32 kB         256 KB to 1 MB   2 MB
IBM S/390 G6          Mainframe                       1999                   256 kB              8 MB             —
Pentium 4             PC/server                       2000                   8 kB/8 kB           256 KB           —
IBM SP                High-end server/supercomputer   2000                   64 kB/32 kB         8 MB             —
CRAY MTA^b            Supercomputer                   2000                   8 kB                2 MB             —
Itanium               PC/server                       2001                   16 kB/16 kB         96 KB            4 MB
Itanium 2             PC/server                       2002                   32 kB               256 KB           6 MB
IBM POWER5            High-end server                 2003                   64 kB               1.9 MB           36 MB
CRAY XD-1             Supercomputer                   2004                   64 kB/64 kB         1 MB             —
IBM POWER6            PC/server                       2007                   64 kB/64 kB         4 MB             32 MB
IBM z10               Mainframe                       2008                   64 kB/128 kB        3 MB             24-48 MB
Intel Core i7 EE 990  Workstation/server              2011                   6 x 32 kB/32 kB     1.5 MB           12 MB
IBM zEnterprise 196   Mainframe/server                2011                   24 x 64 kB/128 kB   24 x 1.5 MB      24 MB L3, 192 MB L4

a Two values separated by a slash refer to instruction and data caches.
b Both caches are instruction only; no data caches.
(Table can be found on page 134 in the textbook.)

Figure 4.7 Logical and Physical Caches: (a) in a logical cache the processor presents the logical address directly to the cache, and the MMU sits between the cache and main memory; (b) in a physical cache the MMU translates the logical address before the cache, so the cache is accessed with a physical address

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
 Because there are fewer cache lines than main memory blocks, an algorithm is needed for mapping main memory blocks into cache lines
 Three techniques can be used: direct, associative, and set associative

Figure 4.8 Mapping From Main Memory to Cache: Direct and Associative. (a) Direct mapping: each of the first m blocks of main memory (equal to the size of the cache) maps to one of the m cache lines. (b) Associative mapping: one block of main memory can map into any line of the cache. (b = length of block in bits, t = length of tag in bits)

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+
Direct Mapping
 Each block of main memory maps to only one cache line
 i.e. if a block is in cache, it must be in one specific place
 Address is viewed in two parts
 Least significant w bits identify a unique word or byte within a block of main memory
 Most significant s bits specify one of the 2^s blocks of main memory
 The MSBs are split into a cache line field of r bits and a tag of s – r bits (most significant)

Figure 4.9 Direct-Mapping Cache Organization: the (s + w)-bit memory address is split into tag, line, and word fields; the line field selects a cache line, and the stored tag is compared with the address tag to signal a hit or miss

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Direct Mapping Summary
 Address length = (s + w) bits
 Number of addressable units = 2^(s+w) words or bytes
 Block size = line size = 2^w words or bytes
 Number of blocks in main memory = 2^(s+w)/2^w = 2^s
 Number of lines in cache = m = 2^r
 Size of cache = 2^(r+w) words or bytes
 Size of tag = (s – r) bits

Figure 4.10 Direct Mapping Example: 16-MByte main memory and a 16K-line cache; the 24-bit main memory address is split into an 8-bit tag, a 14-bit line number, and a 2-bit word field (memory address values are in binary representation; other values are in hexadecimal)

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
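A minimal sketch (my own illustration) of the address decomposition used in the direct-mapping example of Figure 4.10: w = 2 word bits, r = 14 line bits, and an 8-bit tag. Dropping the tag bits is what makes the line number equal to the block number modulo the number of lines.

#include <stdio.h>

#define WORD_BITS 2
#define LINE_BITS 14

int main(void)
{
    unsigned addr = 0x16339C;                       /* an address from Figure 4.10 */
    unsigned word = addr & ((1u << WORD_BITS) - 1);
    unsigned line = (addr >> WORD_BITS) & ((1u << LINE_BITS) - 1);
    unsigned tag  = addr >> (WORD_BITS + LINE_BITS);
    /* prints: tag = 16, line = 0CE7, word = 0, matching the figure */
    printf("tag = %02X, line = %04X, word = %u\n", tag, line, word);
    return 0;
}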
Figure 4.11 Fully Associative Cache Organization: the (s + w)-bit address is split into a tag and a word field only; the address tag is compared simultaneously with the tag of every cache line

Figure 4.12 Associative Mapping Example: 16-MByte main memory and a 16K-line cache; the 24-bit main memory address is split into a 22-bit tag and a 2-bit word field (memory address values are in binary representation; other values are in hexadecimal)

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Associative Mapping Summary
 Address length = (s + w) bits
 Number of addressable units = 2^(s+w) words or bytes
 Block size = line size = 2^w words or bytes
 Number of blocks in main memory = 2^(s+w)/2^w = 2^s
 Number of lines in cache = undetermined
 Size of tag = s bits

+
Set Associative Mapping
 Compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages
 Cache consists of a number of sets
 Each set contains a number of lines
 A given block maps to any line in a given set
 e.g. 2 lines per set
 2-way associative mapping
 A given block can be in one of 2 lines in only one set

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 4.13 Mapping From Main Memory to Cache: k-Way Set Associative. (a) The cache can be viewed as v associative-mapped caches of k lines each; (b) or as k direct-mapped caches of v lines each

Figure 4.14 k-Way Set Associative Cache Organization: the (s + w)-bit address is split into tag, set, and word fields; the set field selects one set and the address tag is compared with the tags of the k lines in that set

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Set Associative Mapping Summary
 Number of addressable units = 2^(s+w) words or bytes
 Block size = line size = 2^w words or bytes
 Number of blocks in main memory = 2^(s+w)/2^w = 2^s
 Number of lines in set = k
 Number of sets = v = 2^d
 Number of lines in cache = m = kv = k x 2^d
 Size of cache = k x 2^(d+w) words or bytes
 Size of tag = (s – d) bits

Figure 4.15 Two-Way Set Associative Mapping Example: 16-MByte main memory and a 16K-line cache organized as two-way sets; the 24-bit main memory address is split into a 9-bit tag, a 13-bit set number, and a 2-bit word field (memory address values are in binary representation; other values are in hexadecimal)

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 4.16 Varying Associativity over Cache Size: hit ratio versus cache size from 1k to 1M bytes for direct, 2-way, 4-way, 8-way, and 16-way mapping

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

+
 Once the cache has been filled, when a new block is brought into the cache, one of the existing blocks must be replaced
 For direct mapping there is only one possible line for any particular block and no choice is possible
 For the associative and set-associative techniques a replacement algorithm is needed
 To achieve high speed, an algorithm must be implemented in hardware

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+ The most common replacement
algorithms are:
 Least recently used (LRU)
 Most effective
 Replace that block in the set that has been in the cache longest with
no reference to it
 Because of its simplicity of implementation, LRU is the most popular
replacement algorithm

 First-in-first-out (FIFO)
 Replace that block in the set that has been in the cache longest
 Easily implemented as a round-robin or circular buffer technique

 Least frequently used (LFU)


 Replace that block in the set that has experienced the fewest
references
 Could be implemented by associating a counter with each line

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
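Below is a minimal sketch (my own illustration, not a hardware description) of LRU for a two-way set-associative cache: each set keeps one USE bit per line, the referenced line's bit is set and the other cleared, and on a miss the line whose USE bit is 0 is replaced.

#include <stdio.h>
#include <stdbool.h>

struct set2 {
    unsigned tag[2];
    bool     valid[2];
    bool     use[2];     /* use[i] true means line i was referenced more recently */
};

/* Returns the way that now holds the block with the given tag. */
static int access_set(struct set2 *s, unsigned tag)
{
    for (int i = 0; i < 2; i++)
        if (s->valid[i] && s->tag[i] == tag) {        /* hit */
            s->use[i] = true; s->use[1 - i] = false;
            return i;
        }
    int victim = s->use[0] ? 1 : 0;                   /* least recently used way */
    s->tag[victim] = tag; s->valid[victim] = true;
    s->use[victim] = true; s->use[1 - victim] = false;
    return victim;
}

int main(void)
{
    struct set2 s = {0};
    int a = access_set(&s, 0xA);      /* miss -> way 0 */
    int b = access_set(&s, 0xB);      /* miss -> way 1 */
    int c = access_set(&s, 0xA);      /* hit  -> way 0 */
    int d = access_set(&s, 0xC);      /* miss -> evicts B in way 1 (the LRU way) */
    printf("%d %d %d %d\n", a, b, c, d);   /* prints 0 1 0 1 */
    return 0;
}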
 Write through
 Simplest technique
 All write operations are made to main memory as well as to the cache
 The main disadvantage of this technique is that it generates substantial memory traffic and may create a bottleneck
 Write back
 Minimizes memory writes
 Updates are made only in the cache
 Portions of main memory are invalid and hence accesses by I/O modules can be allowed only through the cache
 This makes for complex circuitry and a potential bottleneck

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

When a block of data is retrieved and placed in the cache, not only the desired word but also some number of adjacent words are retrieved.
As the block size increases, more useful data are brought into the cache.
As the block size increases, the hit ratio will at first increase because of the principle of locality.
The hit ratio will begin to decrease as the block becomes bigger and the probability of using the newly fetched information becomes less than the probability of reusing the information that has to be replaced.
Two specific effects come into play:
• Larger blocks reduce the number of blocks that fit into a cache
• As a block becomes larger each additional word is farther from the requested word

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+
Multilevel Caches
 As logic density has increased it has become possible to have a cache on the same chip as the processor
 The on-chip cache reduces the processor's external bus activity, speeds up execution time, and increases overall system performance
 When the requested instruction or data is found in the on-chip cache, the bus access is eliminated
 On-chip cache accesses will complete appreciably faster than would even zero-wait state bus cycles
 During this period the bus is free to support other transfers
 Two-level cache:
 Internal cache designated as level 1 (L1)
 External cache designated as level 2 (L2)
 Potential savings due to the use of an L2 cache depends on the hit rates in both the L1 and L2 caches
 The use of multilevel caches complicates all of the design issues related to caches, including size, replacement algorithm, and write policy

Figure 4.17 Total Hit Ratio (L1 and L2) for 8 Kbyte and 16 Kbyte L1: total hit ratio versus L2 cache size from 1k to 2M bytes

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
 Has become common to split cache:
 One dedicated to instructions
 One dedicated to data
 Both exist at the same level, typically as two L1 caches
 Advantages of unified cache:
 Higher hit rate
 Balances load of instruction and data fetches automatically
 Only one cache needs to be designed and implemented
 Trend is toward split caches at the L1 and unified caches for higher levels
 Advantages of split cache:
 Eliminates cache contention between instruction fetch/decode unit and execution unit
 Important in pipelining

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

Table 4.4 Intel Cache Evolution

Problem: External memory slower than the system bus.
Solution: Add external cache using faster memory technology.
Processor on which feature first appears: 386

Problem: Increased processor speed results in external bus becoming a bottleneck for cache access.
Solution: Move external cache on-chip, operating at the same speed as the processor.
Processor: 486

Problem: Internal cache is rather small, due to limited space on chip.
Solution: Add external L2 cache using faster technology than main memory.
Processor: 486

Problem: Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit's data access takes place.
Solution: Create separate data and instruction caches.
Processor: Pentium

Problem: Increased processor speed results in external bus becoming a bottleneck for L2 cache access.
Solution: Create separate back-side bus that runs at higher speed than the main (front-side) external bus. The BSB is dedicated to the L2 cache.
Processor: Pentium Pro
Solution: Move L2 cache on to the processor chip.
Processor: Pentium II

Problem: Some applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too small.
Solution: Add external L3 cache.
Processor: Pentium III
Solution: Move L3 cache on-chip.
Processor: Pentium 4

(Table is on page 150 in the textbook.)

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 4.18 Pentium 4 Block Diagram: out-of-order execution logic, an L1 instruction cache (12K ops), the instruction fetch/decode unit, an L3 cache (1 MB) on the system bus, integer and FP register files, load and store address units, simple and complex integer ALUs, FP/MMX and FP move units, an L2 cache (512 KB), and an L1 data cache (16 KB)

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

Table 4.5 Pentium 4 Cache Operating Modes

Control Bits        Operating Mode
CD    NW            Cache Fills    Write Throughs    Invalidates
0     0             Enabled        Enabled           Enabled
1     0             Disabled       Enabled           Enabled
1     1             Disabled       Disabled          Disabled

Note: CD = 0; NW = 1 is an invalid combination.

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+ Summary
Cache Memory
Chapter 4
 Computer memory system overview
 Characteristics of Memory Systems
 Memory Hierarchy
 Cache memory principles
 Pentium 4 cache organization
 Elements of cache design
 Cache addresses
 Cache size
 Mapping function
 Replacement algorithms
 Write policy
 Line size
 Number of caches

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

+
William Stallings
Computer Organization and Architecture
10th Edition

+ Chapter 12
Instruction Sets: Characteristics and Functions

 The operation of the processor is determined by the instructions it executes, referred to as machine instructions or computer instructions
 The collection of different instructions that the processor can execute is referred to as the processor's instruction set
 Each instruction must contain the information required by the processor for execution

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Elements of a Machine Instruction

Operation code (opcode)
• Specifies the operation to be performed. The operation is specified by a binary code, known as the operation code, or opcode

Source operand reference
• The operation may involve one or more source operands, that is, operands that are inputs for the operation

Result operand reference
• The operation may produce a result

Next instruction reference
• This tells the processor where to fetch the next instruction after the execution of this instruction is complete

Figure 12.1 Instruction Cycle State Diagram: instruction address calculation, instruction fetch, instruction operation decoding, operand address calculation, operand fetch, data operation, and operand store; with loops for multiple operands, multiple results, and a return for string or vector data, then instruction complete, fetch next instruction

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Source and result operands can be in one of four areas:

1) Main or virtual memory
 As with next instruction references, the main or virtual memory address must be supplied

2) I/O device
 The instruction must specify the I/O module and device for the operation. If memory-mapped I/O is used, this is just another main or virtual memory address

3) Processor register
 A processor contains one or more registers that may be referenced by machine instructions.
 If more than one register exists, each register is assigned a unique name or number and the instruction must contain the number of the desired register

4) Immediate
 The value of the operand is contained in a field in the instruction being executed

+
Instruction Representation
 Within the computer each instruction is represented by a sequence of bits
 The instruction is divided into fields, corresponding to the constituent elements of the instruction
 Example: a 16-bit instruction with a 4-bit opcode and two 6-bit operand references

Figure 12.2 A Simple Instruction Format

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+
Instruction Representation
 Opcodes are represented by abbreviations called mnemonics
 Examples include:
 ADD   Add
 SUB   Subtract
 MUL   Multiply
 DIV   Divide
 LOAD  Load data from memory
 STOR  Store data to memory
 Operands are also represented symbolically
 Each symbolic opcode has a fixed binary representation
 The programmer specifies the location of each symbolic operand

Instruction Types

Data processing
• Arithmetic instructions provide computational capabilities for processing numeric data
• Logic (Boolean) instructions operate on the bits of a word as bits rather than as numbers, thus they provide capabilities for processing any other type of data the user may wish to employ

Data storage
• Movement of data into or out of register and/or memory locations

Data movement
• I/O instructions are needed to transfer programs and data into memory and the results of computations back out to the user

Control
• Test instructions are used to test the value of a data word or the status of a computation
• Branch instructions are used to branch to a different set of instructions depending on the decision made

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 12.3 Programs to Execute Y = (A – B) / (C + D x E)

(a) Three-address instructions
SUB  Y, A, B     Y ← A – B
MPY  T, D, E     T ← D x E
ADD  T, T, C     T ← T + C
DIV  Y, Y, T     Y ← Y ÷ T

(b) Two-address instructions
MOVE Y, A        Y ← A
SUB  Y, B        Y ← Y – B
MOVE T, D        T ← D
MPY  T, E        T ← T x E
ADD  T, C        T ← T + C
DIV  Y, T        Y ← Y ÷ T

(c) One-address instructions
LOAD D           AC ← D
MPY  E           AC ← AC x E
ADD  C           AC ← AC + C
STOR Y           Y ← AC
LOAD A           AC ← A
SUB  B           AC ← AC – B
DIV  Y           AC ← AC ÷ Y
STOR Y           Y ← AC

Table 12.1 Utilization of Instruction Addresses (Nonbranching Instructions)

Number of Addresses   Symbolic Representation   Interpretation
3                     OP A, B, C                A ← B OP C
2                     OP A, B                   A ← A OP B
1                     OP A                      AC ← AC OP A
0                     OP                        T ← (T – 1) OP T

AC = accumulator
T = top of stack
(T – 1) = second element of stack
A, B, C = memory or register locations

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Instruction Set Design
 Very complex because it affects so many aspects of the computer system
 Defines many of the functions performed by the processor
 Programmer's means of controlling the processor
 Fundamental design issues:
 • Operation repertoire: How many and which operations to provide and how complex operations should be
 • Data types: The various types of data upon which operations are performed
 • Instruction format: Instruction length in bits, number of addresses, size of various fields, etc.
 • Registers: Number of processor registers that can be referenced by instructions and their use
 • Addressing: The mode or modes by which the address of an operand is specified

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
 A common form of data is text or character strings
 Textual data in character form cannot be easily stored or transmitted by data processing and communications systems because they are designed for binary data
 Most commonly used character code is the International Reference Alphabet (IRA)
 Referred to in the United States as the American Standard Code for Information Interchange (ASCII)
 Another code used to encode characters is the Extended Binary Coded Decimal Interchange Code (EBCDIC)
 EBCDIC is used on IBM mainframes

 All machine languages include numeric data types
 Numbers stored in a computer are limited:
 Limit to the magnitude of numbers representable on a machine
 In the case of floating-point numbers, a limit to their precision
 Three types of numerical data are common in computers:
 Binary integer or binary fixed point
 Binary floating point
 Decimal
 Packed decimal
 Each decimal digit is represented by a 4-bit code with two digits stored per byte
 To form numbers, 4-bit codes are strung together, usually in multiples of 8 bits (a sketch follows this slide)

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
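A minimal sketch (my own illustration, not from the text) of packed decimal: each decimal digit becomes a 4-bit BCD code, two digits per byte, and the codes are strung together into a multiple of 8 bits.

#include <stdio.h>

/* Pack an unsigned decimal number into BCD, one nibble per digit. */
static unsigned long pack_bcd(unsigned long n)
{
    unsigned long bcd = 0;
    int shift = 0;
    do {
        bcd |= (n % 10) << shift;   /* low decimal digit into the next nibble */
        n /= 10;
        shift += 4;
    } while (n != 0);
    return bcd;
}

int main(void)
{
    /* 246 packs to the nibbles 2, 4, 6: 0000 0010 0100 0110 */
    printf("246 packed as BCD = 0x%04lX\n", pack_bcd(246));  /* prints 0x0246 */
    return 0;
}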
 An n-bit unit consisting of n 1-bit items of data, each item having the value 0 or 1
 Two advantages to the bit-oriented view:
 Memory can be used most efficiently for storing an array of Boolean or binary data items in which each item can take on only the values 1 (true) and 0 (false)
 To manipulate the bits of a data item
 If floating-point operations are implemented in software, we need to be able to shift significant bits in some operations
 To convert from IRA to packed decimal, we need to extract the rightmost 4 bits of each byte

Table 12.2 x86 Data Types

General: Byte, word (16 bits), doubleword (32 bits), quadword (64 bits), and double quadword (128 bits) locations with arbitrary binary contents.
Integer: A signed binary value contained in a byte, word, or doubleword, using twos complement representation.
Ordinal: An unsigned integer contained in a byte, word, or doubleword.
Unpacked binary coded decimal (BCD): A representation of a BCD digit in the range 0 through 9, with one digit in each byte.
Packed BCD: Packed byte representation of two BCD digits; value in the range 0 to 99.
Near pointer: A 16-bit, 32-bit, or 64-bit effective address that represents the offset within a segment. Used for all pointers in a nonsegmented memory and for references within a segment in a segmented memory.
Far pointer: A logical address consisting of a 16-bit segment selector and an offset of 16, 32, or 64 bits. Far pointers are used for memory references in a segmented memory model where the identity of a segment being accessed must be specified explicitly.
Bit field: A contiguous sequence of bits in which the position of each bit is considered as an independent unit. A bit string can begin at any bit position of any byte and can contain up to 32 bits.
Bit string: A contiguous sequence of bits, containing from zero to 2^32 – 1 bits.
Byte string: A contiguous sequence of bytes, words, or doublewords, containing from zero to 2^32 – 1 bytes.
Floating point: See Figure 12.4.
Packed SIMD (single instruction, multiple data): Packed 64-bit and 128-bit data types.

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 12.4 x86 Numeric Data Formats: byte, word, doubleword, and quadword unsigned integers; byte, word, doubleword, and quadword signed integers (twos complement, with sign bit); and half-, single-, double-, and double-extended-precision floating point, each with a sign bit, exponent, and significand

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

Single-Instruction-Multiple-Data (SIMD) Data Types
 Introduced to the x86 architecture as part of the extensions of the instruction set to optimize performance of multimedia applications
 These extensions include MMX (multimedia extensions) and SSE (streaming SIMD extensions)
 Data types:
 Packed byte and packed byte integer
 Packed word and packed word integer
 Packed doubleword and packed doubleword integer
 Packed quadword and packed quadword integer
 Packed single-precision floating-point and packed double-precision floating-point

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
ARM Data Types
 ARM processors support data types of:
 • 8 (byte)
 • 16 (halfword)
 • 32 (word) bits in length
 Alignment checking
 • When the appropriate control bit is set, a data abort signal indicates an alignment fault for attempting unaligned access
 Unaligned access
 • When this option is enabled, the processor uses one or more memory accesses to generate the required transfer of adjacent bytes transparently to the programmer
 All three data types can also be used for twos complement signed integers
 For all three data types an unsigned interpretation is supported in which the value represents an unsigned, nonnegative integer

Figure 12.5 ARM Endian Support - Word Load/Store with E-bit: with the program status register E-bit = 0 the data bytes in memory (ascending addresses, byte 0 to byte 3) load into the ARM register little-endian; with E-bit = 1 they load big-endian

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
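A minimal sketch (my own illustration) of the two byte orderings in Figure 12.5: the same four data bytes assembled into a 32-bit word little-endian (byte 0 in the least significant position, as with E-bit = 0) and big-endian (E-bit = 1). The byte values are arbitrary.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t bytes[4] = { 0x11, 0x22, 0x33, 0x44 };   /* byte 0 .. byte 3 in memory */

    uint32_t little = 0, big = 0;
    for (int i = 0; i < 4; i++) {
        little |= (uint32_t)bytes[i] << (8 * i);        /* byte 0 -> bits 7:0   */
        big    |= (uint32_t)bytes[i] << (8 * (3 - i));  /* byte 0 -> bits 31:24 */
    }
    printf("little-endian word = 0x%08X\n", little);    /* 0x44332211 */
    printf("big-endian word    = 0x%08X\n", big);       /* 0x11223344 */
    return 0;
}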
Table 12.3 Common Instruction Set Operations (page 1 of 2)

Data Transfer
 Move (transfer): Transfer word or block from source to destination
 Store: Transfer word from processor to memory
 Load (fetch): Transfer word from memory to processor
 Exchange: Swap contents of source and destination
 Clear (reset): Transfer word of 0s to destination
 Set: Transfer word of 1s to destination
 Push: Transfer word from source to top of stack
 Pop: Transfer word from top of stack to destination

Arithmetic
 Add: Compute sum of two operands
 Subtract: Compute difference of two operands
 Multiply: Compute product of two operands
 Divide: Compute quotient of two operands
 Absolute: Replace operand by its absolute value
 Negate: Change sign of operand
 Increment: Add 1 to operand
 Decrement: Subtract 1 from operand

Logical
 AND: Perform logical AND
 OR: Perform logical OR
 NOT (complement): Perform logical NOT
 Exclusive-OR: Perform logical XOR
 Test: Test specified condition; set flag(s) based on outcome
 Compare: Make logical or arithmetic comparison of two or more operands; set flag(s) based on outcome
 Set Control Variables: Class of instructions to set controls for protection purposes, interrupt handling, timer control, etc.
 Shift: Left (right) shift operand, introducing constants at end
 Rotate: Left (right) shift operand, with wraparound end

(Table can be found on page 426 in textbook.)

Table 12.3 Common Instruction Set Operations (page 2 of 2)

Transfer of Control
 Jump (branch): Unconditional transfer; load PC with specified address
 Jump Conditional: Test specified condition; either load PC with specified address or do nothing, based on condition
 Jump to Subroutine: Place current program control information in known location; jump to specified address
 Return: Replace contents of PC and other register from known location
 Execute: Fetch operand from specified location and execute as instruction; do not modify PC
 Skip: Increment PC to skip next instruction
 Skip Conditional: Test specified condition; either skip or do nothing based on condition
 Halt: Stop program execution
 Wait (hold): Stop program execution; test specified condition repeatedly; resume execution when condition is satisfied
 No operation: No operation is performed, but program execution is continued

Input/Output
 Input (read): Transfer data from specified I/O port or device to destination (e.g., main memory or processor register)
 Output (write): Transfer data from specified source to I/O port or device
 Start I/O: Transfer instructions to I/O processor to initiate I/O operation
 Test I/O: Transfer status information from I/O system to specified destination

Conversion
 Translate: Translate values in a section of memory based on a table of correspondences
 Convert: Convert the contents of a word from one form to another (e.g., packed decimal to binary)

(Table can be found on page 426 in textbook.)

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Table 12.4 Processor Actions for Various Types of Operations

Data Transfer: Transfer data from one location to another. If memory is involved: determine memory address, perform virtual-to-actual-memory address transformation, check cache, initiate memory read/write.
Arithmetic: May involve data transfer, before and/or after. Perform function in ALU. Set condition codes and flags.
Logical: Same as arithmetic.
Conversion: Similar to arithmetic and logical. May involve special logic to perform conversion.
Transfer of Control: Update program counter. For subroutine call/return, manage parameter passing and linkage.
I/O: Issue command to I/O module. If memory-mapped I/O, determine memory-mapped address.

(Table can be found on page 427 in textbook.)

Data Transfer
 Most fundamental type of machine instruction
 Must specify:
 • Location of the source and destination operands
 • The length of data to be transferred must be indicated
 • The mode of addressing for each operand must be specified

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Table 12.5 Examples of IBM EAS/390 Data Transfer Operations

Operation Mnemonic   Name              Number of Bits Transferred   Description
L                    Load              32      Transfer from memory to register
LH                   Load Halfword     16      Transfer from memory to register
LR                   Load              32      Transfer from register to register
LER                  Load (Short)      32      Transfer from floating-point register to floating-point register
LE                   Load (Short)      32      Transfer from memory to floating-point register
LDR                  Load (Long)       64      Transfer from floating-point register to floating-point register
LD                   Load (Long)       64      Transfer from memory to floating-point register
ST                   Store             32      Transfer from register to memory
STH                  Store Halfword    16      Transfer from register to memory
STC                  Store Character   8       Transfer from register to memory
STE                  Store (Short)     32      Transfer from floating-point register to memory
STD                  Store (Long)      64      Transfer from floating-point register to memory

(Table can be found on page 428 in textbook.)

+ Arithmetic
 Most machines provide the basic arithmetic operations of add, subtract, multiply, and divide
 These are provided for signed integer (fixed-point) numbers
 Often they are also provided for floating-point and packed decimal numbers
 Other possible operations include a variety of single-operand instructions:
 Absolute
 Take the absolute value of the operand
 Negate
 Negate the operand
 Increment
 Add 1 to the operand
 Decrement
 Subtract 1 from the operand

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Table 12.6 Basic Logical Operations

P   Q   NOT P   P AND Q   P OR Q   P XOR Q   P = Q
0   0   1       0         0        0         1
0   1   1       0         1        1         0
1   0   0       0         1        1         0
1   1   0       1         1        0         1

Figure 12.6 Shift and Rotate Operations: (a) logical right shift, (b) logical left shift, (c) arithmetic right shift, (d) arithmetic left shift, (e) right rotate, (f) left rotate

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Table 12.7 Examples of Shift and Rotate Operations

Input      Operation                         Result
10100110   Logical right shift (3 bits)      00010100
10100110   Logical left shift (3 bits)       00110000
10100110   Arithmetic right shift (3 bits)   11110100
10100110   Arithmetic left shift (3 bits)    10110000
10100110   Right rotate (3 bits)             11010100
10100110   Left rotate (3 bits)              00110101

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
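The following is a minimal sketch (my own illustration) that reproduces the 8-bit results of Table 12.7 for the input 10100110 shifted or rotated by 3 bits. Note that in the table the arithmetic left shift preserves the sign bit, so it differs from the logical left shift here.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t x = 0xA6;            /* 10100110 */
    int n = 3;

    uint8_t lsr = x >> n;                                       /* 00010100 */
    uint8_t lsl = (uint8_t)(x << n);                            /* 00110000 */
    /* arithmetic right shift replicates the sign bit
       (signed right shift is arithmetic on most compilers) */
    uint8_t asr = (uint8_t)((int8_t)x >> n);                    /* 11110100 */
    /* arithmetic left shift: sign bit preserved, remaining bits shifted */
    uint8_t asl = (uint8_t)((x & 0x80) | ((uint8_t)(x << n) & 0x7F)); /* 10110000 */
    uint8_t ror = (uint8_t)((x >> n) | (x << (8 - n)));         /* 11010100 */
    uint8_t rol = (uint8_t)((x << n) | (x >> (8 - n)));         /* 00110101 */

    printf("%02X %02X %02X %02X %02X %02X\n", lsr, lsl, asr, asl, ror, rol);
    return 0;   /* prints 14 30 F4 B0 D4 35 */
}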
+
Input/Output
 Variety of approaches taken:
 Isolated programmed I/O
 Memory-mapped programmed I/O
 DMA
 Use of an I/O processor
 Many implementations provide only a few I/O instructions, with the specific actions specified by parameters, codes, or command words

System Control
 Instructions that can be executed only while the processor is in a certain privileged state or is executing a program in a special privileged area of memory
 Typically these instructions are reserved for the use of the operating system
 Examples of system control operations:
 A system control instruction may read or alter a control register
 An instruction to read or modify a storage protection key
 Access to process control blocks in a multiprogramming system

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
 Reasons why transfer-of-control operations are required:
 It is essential to be able to execute each instruction more than once
 Virtually all programs involve some decision making
 It helps if there are mechanisms for breaking the task up into smaller pieces that can be worked on one at a time
 Most common transfer-of-control operations found in instruction sets:
 Branch
 Skip
 Procedure call

Figure 12.7 Branch Instructions: a conditional branch BRZ 211 at memory address 203 following SUB X, Y at 202; an unconditional branch BR 202 at address 210; and a conditional branch BRE R1, R2, 235 at address 225

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Skip Instructions
 Includes an implied address
 Typically implies that one instruction be skipped, thus the implied address equals the address of the next instruction plus one instruction length
 Because the skip instruction does not require a destination address field it is free to do other things
 Example is the increment-and-skip-if-zero (ISZ) instruction

Procedure Call Instructions
 A procedure is a self-contained computer program that is incorporated into a larger program
 At any point in the program the procedure may be invoked, or called
 Processor is instructed to go and execute the entire procedure and then return to the point from which the call took place
 Two principal reasons for use of procedures:
 Economy
 A procedure allows the same piece of code to be used many times
 Modularity
 Involves two basic instructions:
 A call instruction that branches from the present location to the procedure
 Return instruction that returns from the procedure to the place from which it was called

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 12.8 Nested Procedures: (a) calls and returns: a main program beginning at address 4000 calls Proc1 (at 4500) from address 4100; Proc1 calls Proc2 (at 4800) from addresses 4600 and 4650; (b) the resulting execution sequence

Figure 12.9 Use of Stack to Implement Nested Procedures of Figure 12.8: the return addresses 4101, 4601, and 4651 are pushed onto and popped from the stack as each CALL and RETURN executes

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+
x86 Operation Types
 The x86 provides a complex array of operation types, including a number of specialized instructions
 The intent was to provide tools for the compiler writer to produce optimized machine language translation of high-level language programs
 Provides four instructions to support procedure call/return:
 CALL
 ENTER
 LEAVE
 RETURN
 When a new procedure is called, the following must be performed upon entry to the new procedure:
 Push the return point on the stack
 Push the current frame pointer on the stack
 Copy the stack pointer as the new value of the frame pointer
 Adjust the stack pointer to allocate a frame

Figure 12.10 Stack Frame Growth Using Sample Procedures P and Q: (a) while P is active, its frame holds the old frame pointer, P's return point, and locals x1 and x2; (b) after P has called Q, Q's frame (old frame pointer, Q's return point, locals y1 and y2) is pushed above P's, and the stack and frame pointers move accordingly

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Table 12.8 x86 Status Flags
 CF (Carry): Indicates carrying or borrowing out of the left-most bit position following an arithmetic operation. Also modified by some of the shift and rotate operations.
 PF (Parity): Parity of the least-significant byte of the result of an arithmetic or logic operation. 1 indicates even parity; 0 indicates odd parity.
 AF (Auxiliary Carry): Represents carrying or borrowing between half-bytes of an 8-bit arithmetic or logic operation. Used in binary-coded decimal arithmetic.
 ZF (Zero): Indicates that the result of an arithmetic or logic operation is 0.
 SF (Sign): Indicates the sign of the result of an arithmetic or logic operation.
 OF (Overflow): Indicates an arithmetic overflow after an addition or subtraction for twos complement arithmetic.

Table 12.9 x86 Condition Codes for Conditional Jump and SETcc Instructions (table can be found on page 440 in the textbook)
 A, NBE: CF=0 AND ZF=0. Above; Not below or equal (greater than, unsigned)
 AE, NB, NC: CF=0. Above or equal; Not below (greater than or equal, unsigned); Not carry
 B, NAE, C: CF=1. Below; Not above or equal (less than, unsigned); Carry set
 BE, NA: CF=1 OR ZF=1. Below or equal; Not above (less than or equal, unsigned)
 E, Z: ZF=1. Equal; Zero (signed or unsigned)
 G, NLE: [(SF=1 AND OF=1) OR (SF=0 AND OF=0)] AND [ZF=0]. Greater than; Not less than or equal (signed)
 GE, NL: (SF=1 AND OF=1) OR (SF=0 AND OF=0). Greater than or equal; Not less than (signed)
 L, NGE: (SF=1 AND OF=0) OR (SF=0 AND OF=1). Less than; Not greater than or equal (signed)
 LE, NG: (SF=1 AND OF=0) OR (SF=0 AND OF=1) OR (ZF=1). Less than or equal; Not greater than (signed)
 NE, NZ: ZF=0. Not equal; Not zero (signed or unsigned)
 NO: OF=0. No overflow
 NS: SF=0. Not sign (not negative)
 NP, PO: PF=0. Not parity; Parity odd
 O: OF=1. Overflow
 P: PF=1. Parity; Parity even
 S: SF=1. Sign (negative)
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
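As an illustration of how the flag combinations in Table 12.9 arise, the following sketch computes CF, ZF, SF, and OF for an 8-bit subtraction and then applies the table's tests for unsigned "below" and signed "less than". The function and operand values are illustrative only:

```python
# Sketch: derive x86-style flags from an 8-bit subtraction a - b,
# then evaluate two conditions from Table 12.9.
def flags_after_sub(a, b, bits=8):
    mask = (1 << bits) - 1
    result = (a - b) & mask
    cf = int(a < b)                          # borrow out of the most significant bit
    zf = int(result == 0)
    sf = (result >> (bits - 1)) & 1          # sign of the result
    sa, sb = (a >> (bits - 1)) & 1, (b >> (bits - 1)) & 1
    of = int(sa != sb and sf != sa)          # twos complement overflow
    return cf, zf, sf, of

cf, zf, sf, of = flags_after_sub(0x05, 0x7F)              # 5 - 127
print("B (below, unsigned):", cf == 1)                    # CF=1
print("L (less than, signed):",
      (sf == 1 and of == 0) or (sf == 0 and of == 1))     # per Table 12.9
```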
+ MMX Instruction Set
 1996 Intel introduced MMX technology into its Pentium product line
 MMX is a set of highly optimized instructions for multimedia tasks
 Video and audio data are typically composed of large arrays of small data types
 Three new data types are defined in MMX:
  Packed byte
  Packed word
  Packed doubleword
 Each data type is 64 bits in length and consists of multiple smaller data fields, each of which holds a fixed-point integer

Table 12.10 MMX Instruction Set (table can be found on page 442 in the textbook)
Arithmetic
 PADD [B, W, D]: Parallel add of packed eight bytes, four 16-bit words, or two 32-bit doublewords, with wraparound.
 PADDS [B, W]: Add with saturation.
 PADDUS [B, W]: Add unsigned with saturation.
 PSUB [B, W, D]: Subtract with wraparound.
 PSUBS [B, W]: Subtract with saturation.
 PSUBUS [B, W]: Subtract unsigned with saturation.
 PMULHW: Parallel multiply of four signed 16-bit words, with high-order 16 bits of 32-bit result chosen.
 PMULLW: Parallel multiply of four signed 16-bit words, with low-order 16 bits of 32-bit result chosen.
 PMADDWD: Parallel multiply of four signed 16-bit words; add together adjacent pairs of 32-bit results.
Comparison
 PCMPEQ [B, W, D]: Parallel compare for equality; result is mask of 1s if true or 0s if false.
 PCMPGT [B, W, D]: Parallel compare for greater than; result is mask of 1s if true or 0s if false.
Conversion
 PACKUSWB: Pack words into bytes with unsigned saturation.
 PACKSS [WB, DW]: Pack words into bytes, or doublewords into words, with signed saturation.
 PUNPCKH [BW, WD, DQ]: Parallel unpack (interleaved merge) high-order bytes, words, or doublewords from MMX register.
 PUNPCKL [BW, WD, DQ]: Parallel unpack (interleaved merge) low-order bytes, words, or doublewords from MMX register.
Logical
 PAND: 64-bit bitwise logical AND
 PANDN: 64-bit bitwise logical AND NOT
 POR: 64-bit bitwise logical OR
 PXOR: 64-bit bitwise logical XOR
Shift
 PSLL [W, D, Q]: Parallel logical left shift of packed words, doublewords, or quadword by amount specified in MMX register or immediate value.
 PSRL [W, D, Q]: Parallel logical right shift of packed words, doublewords, or quadword.
 PSRA [W, D]: Parallel arithmetic right shift of packed words or doublewords.
Data Transfer
 MOV [D, Q]: Move doubleword or quadword to/from MMX register.
State Mgt
 EMMS: Empty MMX state (empty FP registers tag bits).
Note: If an instruction supports multiple data types [byte (B), word (W), doubleword (D), quadword (Q)], the data types are indicated in brackets.
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
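To make packed, saturating arithmetic concrete, here is a sketch of the per-element effect of the byte form of PADDUS; it operates on a Python list rather than a 64-bit MMX register, so it only illustrates the behavior, not the instruction itself:

```python
# Sketch: unsigned saturating add on eight packed bytes (effect of PADDUS on bytes).
def paddus_bytes(a, b):
    # each element is an unsigned byte; sums clamp at 255 instead of wrapping
    return [min(x + y, 255) for x, y in zip(a, b)]

pixels   = [250, 200, 128, 64, 32, 16, 8, 0]
brighten = [ 20,  20,  20, 20, 20, 20, 20, 20]
print(paddus_bytes(pixels, brighten))   # [255, 220, 148, 84, 52, 36, 28, 20]
```

Saturation matters for media data: a brightened pixel stays white (255) rather than wrapping around to a dark value.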
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Table 12.11 ARM Conditions for Conditional Instruction Execution (table can be found on page 445 in the textbook)
 0000 EQ: Z=1. Equal
 0001 NE: Z=0. Not equal
 0010 CS/HS: C=1. Carry set/unsigned higher or same
 0011 CC/LO: C=0. Carry clear/unsigned lower
 0100 MI: N=1. Minus/negative
 0101 PL: N=0. Plus/positive or zero
 0110 VS: V=1. Overflow
 0111 VC: V=0. No overflow
 1000 HI: C=1 AND Z=0. Unsigned higher
 1001 LS: C=0 OR Z=1. Unsigned lower or same
 1010 GE: N=V [(N=1 AND V=1) OR (N=0 AND V=0)]. Signed greater than or equal
 1011 LT: N≠V [(N=1 AND V=0) OR (N=0 AND V=1)]. Signed less than
 1100 GT: (Z=0) AND (N=V). Signed greater than
 1101 LE: (Z=1) OR (N≠V). Signed less than or equal
 1110 AL: (none). Always (unconditional)
 1111: (none). This instruction can only be executed unconditionally

+ Summary
Chapter 12: Instruction Sets: Characteristics and Functions
 Machine instruction characteristics
  Elements of a machine instruction
  Instruction representation
  Instruction types
  Number of addresses
  Instruction set design
 Types of operands
  Numbers
  Characters
  Logical data
 Intel x86 and ARM data types
 Types of operations
  Data transfer
  Arithmetic
  Logical
  Conversion
  Input/output
  System control
  Transfer of control
 Intel x86 and ARM operation types
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
William Stallings
Computer Organization and Architecture
10th Edition

+ Chapter 6
External Memory
© 2016 Pearson Education, Inc.,
Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+ Magnetic Read and Write Mechanisms
 A disk is a circular platter constructed of nonmagnetic material, called the substrate, coated with a magnetizable material
  Traditionally the substrate has been an aluminium or aluminium alloy material
  Recently glass substrates have been introduced
  Benefits of the glass substrate:
   Improvement in the uniformity of the magnetic film surface to increase disk reliability
   A significant reduction in overall surface defects to help reduce read-write errors
   Ability to support lower fly heights
   Better stiffness to reduce disk dynamics
   Greater ability to withstand shock and damage
 The write mechanism exploits the fact that electricity flowing through a coil produces a magnetic field
 The write head itself is made of easily magnetizable material and is in the shape of a rectangular doughnut with a gap along one side and a few turns of conducting wire along the opposite side
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 6.1 Inductive Write/Magnetoresistive Read Head: write current flows through an inductive write element; a shielded MR sensor carries the read current and senses the magnetization of the recording medium below.

Figure 6.2 Disk Data Layout: concentric tracks divided into sectors, separated by inter-track and inter-sector gaps; one read-write head per surface is mounted on a boom that moves across the platters; the set of tracks at one arm position forms a cylinder around the spindle.

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.


© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 6.3 Comparison of Disk Layout Methods: (a) constant angular velocity, with the same number of sectors on every track; (b) multiple zone recording, with more sectors per track in outer zones.

Figure 6.4 Winchester Disk Format (Seagate ST506): each track holds 30 physical sectors of 600 bytes; each sector consists of an ID field (synch byte, track #, head #, sector #, CRC) and a data field (synch byte, 512 data bytes, CRC), separated by gaps 1, 2, and 3.
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Table 6.1 Physical Characteristics of Disk Systems
 Head Motion: Fixed head (one per track); Movable head (one per surface)
 Disk Portability: Nonremovable disk; Removable disk
 Sides: Single sided; Double sided
 Platters: Single platter; Multiple platter
 Head Mechanism: Contact (floppy); Fixed gap; Aerodynamic gap (Winchester)

+ Physical Characteristics
 Fixed-head disk
  One read-write head per track
  Heads are mounted on a fixed rigid arm that extends across all tracks
 Movable-head disk
  One read-write head
  Head is mounted on an arm; the arm can be extended or retracted
 Nonremovable disk
  Permanently mounted in the disk drive
  The hard disk in a personal computer is a nonremovable disk
 Removable disk
  Can be removed and replaced with another disk
  Advantages:
   Unlimited amounts of data are available with a limited number of disk systems
   A disk may be moved from one computer system to another
  Floppy disks and ZIP cartridge disks are examples of removable disks
 Double sided disk
  Magnetizable coating is applied to both sides of the platter

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+ Winchester Heads
 The head must generate or sense an electromagnetic field of sufficient magnitude to write and read properly
 The narrower the head, the closer it must be to the platter surface to function
 A narrower head means narrower tracks and therefore greater data density
 The closer the head is to the disk the greater the risk of error from impurities or imperfections
 Winchester heads:
  Used in sealed drive assemblies that are almost free of contaminants
  Designed to operate closer to the disk's surface than conventional rigid disk heads, thus allowing greater data density
  The head is actually an aerodynamic foil that rests lightly on the platter's surface when the disk is motionless
  The air pressure generated by a spinning disk is enough to make the foil rise above the surface

Table 6.2 Typical Hard Disk Drive Parameters
 Drives (application): Seagate Enterprise (enterprise application servers); Seagate Barracuda XT (desktop); Seagate Cheetah NS (network attached storage, application servers); Seagate Laptop HDD (laptop)
 Capacity: 6 TB; 3 TB; 600 GB; 2 TB
 Average seek time: 4.16 ms; N/A; 3.9 ms read, 4.2 ms write; 13 ms
 Spindle speed: 7200 rpm; 7200 rpm; 10,075 rpm; 5400 rpm
 Average latency: 4.16 ms; 4.16 ms; 2.98 ms; 5.6 ms
 Maximum sustained transfer rate: 216 MB/s; 149 MB/s; 97 MB/s; 300 MB/s
 Bytes per sector: 512/4096; 512; 512; 4096
 Tracks per cylinder (number of platter surfaces): 8; 10; 8; 4
 Cache: 128 MB; 64 MB; 16 MB; 8 MB
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
 When the disk drive is operating the disk is rotating at constant speed
 To read or write the head must be positioned at the desired track and at the beginning of the desired sector on the track
  Track selection involves moving the head in a movable-head system or electronically selecting one head on a fixed-head system
  Once the track is selected, the disk controller waits until the appropriate sector rotates to line up with the head
 Seek time
  On a movable-head system, the time it takes to position the head at the track
 Rotational delay (rotational latency)
  The time it takes for the beginning of the sector to reach the head
 Access time
  The sum of the seek time and the rotational delay
  The time it takes to get into position to read or write
 Transfer time
  Once the head is in position, the read or write operation is then performed as the sector moves under the head
  This is the data transfer portion of the operation

Figure 6.5 Timing of a Disk I/O Transfer: wait for device, wait for channel, seek, rotational delay, data transfer; the device is busy from the start of the seek through the end of the transfer.

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
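A worked example of these parameters; the drive numbers below are illustrative assumptions rather than figures from Table 6.2. Average rotational delay is half a revolution, and transfer time is the fraction of a revolution occupied by the requested data:

```python
# Sketch: average time to service one disk request.
avg_seek_ms     = 4.0              # assumed average seek time
rpm             = 7200
bytes_per_track = 512 * 500        # assumed 500 sectors of 512 bytes per track
request_bytes   = 4096

ms_per_rev   = 60_000 / rpm                     # 8.33 ms per revolution
rot_delay_ms = ms_per_rev / 2                   # on average, half a revolution
transfer_ms  = ms_per_rev * request_bytes / bytes_per_track

access_ms = avg_seek_ms + rot_delay_ms          # time to get into position
total_ms  = access_ms + transfer_ms
print(f"access {access_ms:.2f} ms, transfer {transfer_ms:.3f} ms, total {total_ms:.2f} ms")
```

For a small request the seek and rotational delay dominate; the transfer itself is a tiny fraction of the total.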
+ RAID (Redundant Array of Independent Disks)
 Consists of 7 levels
 Levels do not imply a hierarchical relationship but designate different design architectures that share three common characteristics:
  1) Set of physical disk drives viewed by the operating system as a single logical drive
  2) Data are distributed across the physical drives of an array in a scheme known as striping
  3) Redundant disk capacity is used to store parity information, which guarantees data recoverability in case of a disk failure

Table 6.3 RAID Levels (N = number of data disks; m proportional to log N)
 Level 0, Striping, Nonredundant, N disks: data availability lower than single disk; very high large I/O data transfer rate; very high small I/O request rate for both read and write
 Level 1, Mirroring, Mirrored, 2N disks: availability higher than RAID 2, 3, 4, or 5 but lower than RAID 6; transfer rate higher than single disk for read, similar to single disk for write; request rate up to twice that of a single disk for read, similar to single disk for write
 Level 2, Parallel access, Redundant via Hamming code, N+m disks: availability much higher than single disk, comparable to RAID 3, 4, or 5; transfer rate highest of all listed alternatives; request rate approximately twice that of a single disk
 Level 3, Parallel access, Bit-interleaved parity, N+1 disks: availability much higher than single disk, comparable to RAID 2, 4, or 5; transfer rate highest of all listed alternatives; request rate approximately twice that of a single disk
 Level 4, Independent access, Block-interleaved parity, N+1 disks: availability much higher than single disk, comparable to RAID 2, 3, or 5; transfer rate similar to RAID 0 for read, significantly lower than single disk for write; request rate similar to RAID 0 for read, significantly lower than single disk for write
 Level 5, Independent access, Block-interleaved distributed parity, N+1 disks: availability much higher than single disk, comparable to RAID 2, 3, or 4; transfer rate similar to RAID 0 for read, lower than single disk for write; request rate similar to RAID 0 for read, generally lower than single disk for write
 Level 6, Independent access, Block-interleaved dual distributed parity, N+2 disks: availability highest of all listed alternatives; transfer rate similar to RAID 0 for read, lower than RAID 5 for write; request rate similar to RAID 0 for read, significantly lower than RAID 5 for write

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 6.6 RAID Levels (page 1 of 2): (a) RAID 0 (non-redundant), strips 0-15 distributed round-robin across four disks; (b) RAID 1 (mirrored), each strip duplicated on a mirror disk; (c) RAID 2 (redundancy through Hamming code), bit-interleaved data disks b0-b3 plus code disks f0(b), f1(b), f2(b).

Figure 6.6 RAID Levels (page 2 of 2): (d) RAID 3 (bit-interleaved parity); (e) RAID 4 (block-level parity), parity blocks P(0-3), P(4-7), ... held on a dedicated disk; (f) RAID 5 (block-level distributed parity), parity blocks rotated across all disks; (g) RAID 6 (dual redundancy), two check blocks P and Q per stripe.

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 6.7 Data Mapping for a RAID Level 0 Array: strips 0-15 of the logical disk are mapped round-robin by the array management software across physical disks 0-3 (strips 0, 4, 8, 12 on disk 0; strips 1, 5, 9, 13 on disk 1; and so on).

+ RAID 0 Performance
 Addresses the issues of request patterns of the host system and layout of the data
 Impact of redundancy does not interfere with analysis
 For applications to experience a high transfer rate, two requirements must be met:
  1. A high transfer capacity must exist along the entire path between host memory and the individual disk drives
  2. The application must make I/O requests that drive the disk array efficiently
 For an individual I/O request for a small amount of data the I/O time is dominated by the seek time and rotational latency
 A disk array can provide high I/O execution rates by balancing the I/O load across multiple disks
 If the strip size is relatively large, multiple waiting I/O requests can be handled in parallel, reducing the queuing time for each request
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
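The round-robin mapping of Figure 6.7 is simple modular arithmetic; a sketch, where the array size is an assumption:

```python
# Sketch: map a logical strip number to (physical disk, strip slot on that disk)
# for a RAID 0 array striped round-robin, as in Figure 6.7.
def map_strip(logical_strip, n_disks=4):
    return logical_strip % n_disks, logical_strip // n_disks

for strip in range(8):
    disk, slot = map_strip(strip)
    print(f"logical strip {strip} -> disk {disk}, slot {slot}")
# strip 0 -> disk 0, slot 0; strip 1 -> disk 1, slot 0; ...; strip 4 -> disk 0, slot 1
```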
+ RAID Level 1
 Characteristics:
  Differs from RAID levels 2 through 6 in the way in which redundancy is achieved
  Redundancy is achieved by the simple expedient of duplicating all the data
  Data striping is used, but each logical strip is mapped to two separate physical disks so that every disk in the array has a mirror disk that contains the same data
  RAID 1 can also be implemented without data striping, although this is less common
 Positive aspects:
  A read request can be serviced by either of the two disks that contains the requested data
  There is no "write penalty"
  Recovery from a failure is simple; when a drive fails the data can be accessed from the second drive
  Provides real-time copy of all data
  Can achieve high I/O request rates if the bulk of the requests are reads
 Principal disadvantage is the cost

+ RAID Level 2
 Makes use of a parallel access technique
  In a parallel access array all member disks participate in the execution of every I/O request
  Spindles of the individual drives are synchronized so that each disk head is in the same position on each disk at any given time
 Data striping is used
  Strips are very small, often as small as a single byte or word
 An error-correcting code is calculated across corresponding bits on each data disk and the bits of the code are stored in the corresponding bit positions on multiple parity disks
  Typically a Hamming code is used, which is able to correct single-bit errors and detect double-bit errors
  The number of redundant disks is proportional to the log of the number of data disks
 Would only be an effective choice in an environment in which many disk errors occur

+ RAID Level 3
 Characteristics:
  Requires only a single redundant disk, no matter how large the disk array
  Employs parallel access, with data distributed in small strips
  Instead of an error correcting code, a simple parity bit is computed for the set of individual bits in the same position on all of the data disks
 Redundancy:
  In the event of a drive failure, the parity drive is accessed and data is reconstructed from the remaining devices
  Once the failed drive is replaced, the missing data can be restored on the new drive and operation resumed
  In the event of a disk failure, all of the data are still available in what is referred to as reduced mode
  Return to full operation requires that the failed disk be replaced and the entire contents of the failed disk be regenerated on the new disk
 Performance:
  Can achieve very high data transfer rates
  In a transaction-oriented environment performance suffers

+ RAID Level 4
 Makes use of an independent access technique
  In an independent access array, each member disk operates independently so that separate I/O requests can be satisfied in parallel
 Data striping is used
  Strips are relatively large
 Involves a write penalty when an I/O write request of small size is performed
  Each time a write occurs the array management software must update not only the user data but also the corresponding parity bits
  To calculate the new parity the array management software must read the old user strip and the old parity strip
  Thus each strip write involves two reads and two writes (see the sketch following this slide)
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
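The "two reads and two writes" follow from how a parity strip is updated. A sketch with byte strings standing in for strips (the strip contents are purely illustrative):

```python
# Sketch: small-write parity update in RAID 4/5.
# new parity = old parity XOR old data XOR new data, so the software
# reads the old data and old parity strips (2 reads) and writes
# the new data and new parity strips (2 writes).
def xor_strips(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

old_data   = bytes([0x11, 0x22, 0x33, 0x44])
old_parity = bytes([0xA0, 0xB1, 0xC2, 0xD3])
new_data   = bytes([0x55, 0x22, 0x33, 0x00])

new_parity = xor_strips(xor_strips(old_parity, old_data), new_data)
print(new_parity.hex())   # the two writes would store new_data and new_parity
```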
+ RAID Level 5
 Organized in a similar fashion to RAID 4
 Difference is distribution of the parity strips across all disks
  A typical allocation is a round-robin scheme
 The distribution of parity strips across all drives avoids the potential I/O bottleneck found in RAID 4

+ RAID Level 6
 Two different parity calculations are carried out and stored in separate blocks on different disks
 Advantage is that it provides extremely high data availability
  Three disks would have to fail within the mean time to repair (MTTR) interval to cause data to be lost
 Incurs a substantial write penalty because each write affects two parity blocks

Table 6.4 RAID Comparison (page 1 of 2)

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Table 6.4 RAID Comparison (page 2 of 2)

+ Solid State Drives
 SSDs have the following advantages over HDDs:
  High-performance input/output operations per second (IOPS)
  Durability
  Longer lifespan
  Lower power consumption
  Quieter and cooler running capabilities
  Lower access times and latency rates

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 6.8 Solid State Drive Architecture: the host system (operating system software, file system software, I/O driver software) connects through an interface to the SSD controller, which in turn connects through an internal interface to the NAND flash drives.

Table 6.5 Comparison of Solid State Drives and Disk Drives (NAND flash drives vs. Seagate Laptop Internal HDD)
 File copy/write speed: 200-550 Mbps vs. 50-120 Mbps
 Power draw/battery life: less power draw, averages 2-3 watts, resulting in 30+ minute battery boost vs. more power draw, averages 6-7 watts and therefore uses more battery
 Storage capacity: typically not larger than 512 GB for notebook size drives, 1 TB max for desktops vs. typically around 500 GB and 2 TB maximum for notebook size drives, 4 TB max for desktops
 Cost: approx. $0.50 per GB for a 1-TB drive vs. approx. $0.15 per GB for a 4-TB drive

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+ Practical Issues
 There are two practical issues peculiar to SSDs that are not faced by HDDs:
 SSD performance has a tendency to slow down as the device is used
  Before a block can be rewritten, the entire block must be read from the flash memory and placed in a RAM buffer
  Before the block can be written back to flash memory, the entire block of flash memory must be erased
  The entire block from the buffer is then written back to the flash memory
 Flash memory becomes unusable after a certain number of writes
  Techniques for prolonging life:
   Front-ending the flash with a cache to delay and group write operations
   Using wear-leveling algorithms that evenly distribute writes across blocks of cells
   Bad-block management techniques
  Most flash devices estimate their own remaining lifetimes so systems can anticipate failure and take preemptive action

Table 6.6 Optical Disk Products
 CD: Compact Disk. A nonerasable disk that stores digitized audio information. The standard system uses 12-cm disks and can record more than 60 minutes of uninterrupted playing time.
 CD-ROM: Compact Disk Read-Only Memory. A nonerasable disk used for storing computer data. The standard system uses 12-cm disks and can hold more than 650 Mbytes.
 CD-R: CD Recordable. Similar to a CD-ROM. The user can write to the disk only once.
 CD-RW: CD Rewritable. Similar to a CD-ROM. The user can erase and rewrite to the disk multiple times.
 DVD: Digital Versatile Disk. A technology for producing digitized, compressed representation of video information, as well as large volumes of other digital data. Both 8 and 12 cm diameters are used, with a double-sided capacity of up to 17 Gbytes. The basic DVD is read-only (DVD-ROM).
 DVD-R: DVD Recordable. Similar to a DVD-ROM. The user can write to the disk only once. Only one-sided disks can be used.
 DVD-RW: DVD Rewritable. Similar to a DVD-ROM. The user can erase and rewrite to the disk multiple times. Only one-sided disks can be used.
 Blu-Ray DVD: High definition video disk. Provides considerably greater data storage density than DVD, using a 405-nm (blue-violet) laser. A single layer on a single side can store 25 Gbytes.
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+ Compact Disk Read-Only Memory (CD-ROM)
 Audio CD and the CD-ROM share a similar technology
  The main difference is that CD-ROM players are more rugged and have error correction devices to ensure that data are properly transferred
 Production:
  The disk is formed from a resin such as polycarbonate
  Digitally recorded information is imprinted as a series of microscopic pits on the surface of the polycarbonate
   This is done with a finely focused, high intensity laser to create a master disk
  The master is used, in turn, to make a die to stamp out copies onto polycarbonate
  The pitted surface is then coated with a highly reflective surface, usually aluminum or gold
  This shiny surface is protected against dust and scratches by a top coat of clear acrylic
  Finally a label can be silkscreened onto the acrylic

Figure 6.9 CD Operation: a laser transmit/receive unit reads the pits and lands through the polycarbonate plastic; the reflective aluminum layer sits beneath the protective acrylic and label.

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+ CD-ROM Uses
 CD-ROM is appropriate for the distribution of large amounts of data to a large number of users
 Because of the expense of the initial writing process it is not appropriate for individualized applications
 The CD-ROM has two advantages:
  The optical disk together with the information stored on it can be mass replicated inexpensively
  The optical disk is removable, allowing the disk itself to be used for archival storage
 The CD-ROM disadvantages:
  It is read-only and cannot be updated
  It has an access time much longer than that of a magnetic disk drive

Figure 6.10 CD-ROM Block Format: each 2352-byte block consists of a 12-byte SYNC field (00, FF ... FF, 00), a 4-byte ID field (MIN, SEC, Sector, Mode), a 2048-byte data field, and 288 bytes of layered error-correcting code (L-ECC).

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.


© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
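A quick check of the 682-MB figure quoted in Figure 6.11, using the block format of Figure 6.10 (2048 user-data bytes per 2352-byte block); the 75 blocks per second and 74-minute playing time are assumptions based on the standard audio CD rate, not values from the slides:

```python
# Sketch: CD-ROM user-data capacity from the block format.
blocks_per_second     = 75       # assumed standard CD rate
playing_time_minutes  = 74       # assumed playing time of a standard disk
user_bytes_per_block  = 2048     # from the Figure 6.10 block format

blocks = blocks_per_second * playing_time_minutes * 60
capacity = blocks * user_bytes_per_block
print(f"{blocks} blocks, {capacity/1e6:.0f} million bytes "
      f"({capacity/2**20:.0f} MiB)")   # roughly 682 MB, i.e. more than 650 Mbytes
```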
Figure 6.11 CD-ROM and DVD-ROM:
(a) CD-ROM (capacity 682 MB): label, protective acrylic layer, reflective aluminum layer, and polycarbonate plastic substrate, 1.2 mm thick; the laser focuses on polycarbonate pits in front of the reflective layer.
(b) DVD-ROM, double-sided, dual-layer (capacity 17 GB): each 1.2-mm side has a polycarbonate substrate, a fully reflective layer, a polycarbonate layer, and a semireflective layer; the laser focuses on pits in one layer on one side at a time, and the disk must be flipped to read the other side.

+ CD Recordable (CD-R) and CD Rewritable (CD-RW)
 CD-R: write-once read-many
  Accommodates applications in which only one or a small number of copies of a set of data is needed
  Disk is prepared in such a way that it can be subsequently written once with a laser beam of modest intensity
  Medium includes a dye layer which is used to change reflectivity and is activated by a high-intensity laser
  Provides a permanent record of large volumes of user data
 CD-RW: can be repeatedly written and overwritten
  Phase change disk uses a material that has two significantly different reflectivities in two different phase states
   Amorphous state: molecules exhibit a random orientation that reflects light poorly
   Crystalline state: has a smooth surface that reflects light well
  A beam of laser light can change the material from one phase to the other
  Disadvantage is that the material eventually and permanently loses its desirable properties
  Advantage is that it can be rewritten

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 6.12 Optical Memory Characteristics: compares beam spot size, pit dimensions, and track spacing for CD (780-nm laser wavelength), DVD (650-nm laser), and Blu-ray (405-nm laser); each shorter wavelength permits smaller pits and tighter track spacing, and hence higher data density.

+ Magnetic Tape
 Tape systems use the same reading and recording techniques as disk systems
 Medium is flexible polyester tape coated with magnetizable material
  Coating may consist of particles of pure metal in special binders or vapor-plated metal films
 Data on the tape are structured as a number of parallel tracks running lengthwise
 Serial recording
  Data are laid out as a sequence of bits along each track
 Data are read and written in contiguous blocks called physical records
 Blocks on the tape are separated by gaps referred to as inter-record gaps

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 6.13 Typical Magnetic Tape Features: (a) serpentine reading and writing, with tracks recorded in alternating directions along the length of the tape; (b) block layout for a system that reads/writes four tracks simultaneously.

Table 6.7 LTO Tape Drives (LTO-1 through LTO-8)
 Release date: 2000; 2003; 2005; 2007; 2010; TBA; TBA; TBA
 Compressed capacity: 200 GB; 400 GB; 800 GB; 1600 GB; 3.2 TB; 8 TB; 16 TB; 32 TB
 Compressed transfer rate: 40 MB/s; 80 MB/s; 160 MB/s; 240 MB/s; 280 MB/s; 525 MB/s; 788 MB/s; 1.18 GB/s
 Linear density (bits/mm), LTO-1 through LTO-5: 4880; 7398; 9638; 13250; 15142
 Tape tracks, LTO-1 through LTO-5: 384; 512; 704; 896; 1280
 Tape length, LTO-1 through LTO-5: 609 m; 609 m; 680 m; 820 m; 846 m
 Tape width: 1.27 cm
 Write elements, LTO-1 through LTO-5: 8; 8; 16; 16; 16
 WORM?: No; No; Yes; Yes; Yes; Yes; Yes; Yes
 Encryption capable?: No; No; No; Yes; Yes; Yes; Yes; Yes
 Partitioning?: No; No; No; No; Yes; Yes; Yes; Yes

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+ Summary
Chapter 6: External Memory
 Magnetic disk
  Magnetic read and write mechanisms
  Data organization and formatting
  Physical characteristics
  Disk performance parameters
 RAID
  RAID levels 0 through 6
 Solid state drives
  SSD compared to HDD
  SSD organization
  Practical issues
 Optical memory
  Compact disk
  Digital versatile disk
  High-definition optical disks
 Magnetic tape

William Stallings
Computer Organization and Architecture
10th Edition
© 2016 Pearson Education, Inc., Hoboken,
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. NJ. All rights reserved.
+ Chapter 13
Instruction Sets: Addressing Modes and Formats

Addressing Modes
 Immediate
 Direct
 Indirect
 Register
 Register indirect
 Displacement
 Stack

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 13.1 Addressing Modes: (a) immediate, operand carried in the instruction; (b) direct, address field A points to the operand in memory; (c) indirect, address field points to a memory word containing the operand address; (d) register, operand held in register R; (e) register indirect, register R holds the operand address; (f) displacement, address field A plus the contents of register R; (g) stack, implicit top of stack.

Table 13.1 Basic Addressing Modes (Mode; Algorithm; Principal Advantage; Principal Disadvantage)
 Immediate; Operand = A; No memory reference; Limited operand magnitude
 Direct; EA = A; Simple; Limited address space
 Indirect; EA = (A); Large address space; Multiple memory references
 Register; EA = R; No memory reference; Limited address space
 Register indirect; EA = (R); Large address space; Extra memory reference
 Displacement; EA = A + (R); Flexibility; Complexity
 Stack; EA = top of stack; No memory reference; Limited applicability


© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
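The algorithms in Table 13.1 read almost directly as code. The sketch below models memory and registers as Python dictionaries (the addresses, register names, and contents are arbitrary illustrative values) and returns the operand each mode would deliver:

```python
# Sketch: the addressing-mode algorithms of Table 13.1.
memory    = {100: 500, 500: 7, 600: 9, 610: 11}   # address -> contents (illustrative)
registers = {"R1": 600}

def operand(mode, A=None, R=None):
    if mode == "immediate":         return A                        # operand = A
    if mode == "direct":            return memory[A]                # EA = A
    if mode == "indirect":          return memory[memory[A]]        # EA = (A)
    if mode == "register":          return registers[R]             # operand in register
    if mode == "register_indirect": return memory[registers[R]]     # EA = (R)
    if mode == "displacement":      return memory[A + registers[R]] # EA = A + (R)
    raise ValueError(mode)

print(operand("immediate", A=100))            # 100
print(operand("direct", A=100))               # 500
print(operand("indirect", A=100))             # 7   (memory[500])
print(operand("register_indirect", R="R1"))   # 9   (memory[600])
print(operand("displacement", A=10, R="R1"))  # 11  (memory[610])
```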
+ Immediate Addressing
 Simplest form of addressing
 Operand = A
 This mode can be used to define and use constants or set initial values of variables
  Typically the number will be stored in twos complement form
  The leftmost bit of the operand field is used as a sign bit
 Advantage:
  No memory reference other than the instruction fetch is required to obtain the operand, thus saving one memory or cache cycle in the instruction cycle
 Disadvantage:
  The size of the number is restricted to the size of the address field, which, in most instruction sets, is small compared with the word length

+ Direct Addressing
 The address field contains the effective address of the operand: EA = A
 Simple, requiring only one memory reference and no special calculation
 Limitation is that it provides only a limited address space

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+ Indirect Addressing
 Reference to the address of a word in memory which contains a full-length address of the operand
 EA = (A)
  Parentheses are to be interpreted as meaning contents of
 Advantage:
  For a word length of N an address space of 2^N is now available
 Disadvantage:
  Instruction execution requires two memory references to fetch the operand
   One to get its address and a second to get its value
 A rarely used variant of indirect addressing is multilevel or cascaded indirect addressing
  EA = ( ... (A) ... )
  Disadvantage is that three or more memory references could be required to fetch an operand

+ Register Addressing
 Address field refers to a register rather than a main memory address
 EA = R
 Advantages:
  Only a small address field is needed in the instruction
  No time-consuming memory references are required
 Disadvantage:
  The address space is very limited

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+ Register Indirect Addressing
 Analogous to indirect addressing
  The only difference is whether the address field refers to a memory location or a register
 EA = (R)
 Address space limitation of the address field is overcome by having that field refer to a word-length location containing an address
 Uses one less memory reference than indirect addressing

+ Displacement Addressing
 Combines the capabilities of direct addressing and register indirect addressing
 EA = A + (R)
 Requires that the instruction have two address fields, at least one of which is explicit
  The value contained in one address field (value = A) is used directly
  The other address field refers to a register whose contents are added to A to produce the effective address
 Most common uses:
  Relative addressing
  Base-register addressing
  Indexing

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+ Relative Addressing
 The implicitly referenced register is the program counter (PC)
  The next instruction address is added to the address field to produce the EA
  Typically the address field is treated as a twos complement number for this operation
  Thus the effective address is a displacement relative to the address of the instruction

+ Base-Register Addressing
 The referenced register contains a main memory address and the address field contains a displacement from that address
 The register reference may be explicit or implicit
 Exploits the locality of memory references
 Convenient means of implementing segmentation
  In some implementations a single segment base register is employed and is used implicitly
  In others the programmer may choose a register to hold the base address of a segment and the instruction must reference it explicitly

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+ Indexing
 The address field references a main memory address and the referenced register contains a positive displacement from that address
 The method of calculating the EA is the same as for base-register addressing
 An important use is to provide an efficient mechanism for performing iterative operations
 Autoindexing
  Automatically increment or decrement the index register after each reference to it
  EA = A + (R)
  (R) ← (R) + 1
 Postindexing
  Indexing is performed after the indirection
  EA = (A) + (R)
 Preindexing
  Indexing is performed before the indirection
  EA = (A + (R))

+ Stack Addressing
 A stack is a linear array of locations
  Sometimes referred to as a pushdown list or last-in-first-out queue
 A stack is a reserved block of locations
  Items are appended to the top of the stack so that the block is partially filled
 Associated with the stack is a pointer whose value is the address of the top of the stack
  The stack pointer is maintained in a register
  Thus references to stack locations in memory are in fact register indirect addresses
 Is a form of implied addressing
  The machine instructions need not include a memory reference but implicitly operate on the top of the stack

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
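The difference between the indexing variants is only the order of the indirection and the addition. A short sketch of the effective-address (EA) rules, with an illustrative memory/register model:

```python
# Sketch: autoindexing, preindexing, and postindexing EA rules from the slide.
memory    = {200: 300, 205: 400}   # address -> contents (illustrative)
registers = {"X": 5}               # index register

def preindex_ea(A, R):    # EA = (A + (R)): index first, then indirection
    return memory[A + registers[R]]

def postindex_ea(A, R):   # EA = (A) + (R): indirection first, then index
    return memory[A] + registers[R]

def autoindex_ea(A, R):   # EA = A + (R); then (R) <- (R) + 1
    ea = A + registers[R]
    registers[R] += 1
    return ea

print(preindex_ea(200, "X"))    # memory[205] = 400
print(postindex_ea(200, "X"))   # memory[200] + 5 = 305
print(autoindex_ea(200, "X"))   # 205, and X is incremented to 6
```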
Table 13.2 x86 Addressing Modes (Mode; Algorithm)
 Immediate; Operand = A
 Register Operand; LA = R
 Displacement; LA = (SR) + A
 Base; LA = (SR) + (B)
 Base with Displacement; LA = (SR) + (B) + A
 Scaled Index with Displacement; LA = (SR) + (I) × S + A
 Base with Index and Displacement; LA = (SR) + (B) + (I) + A
 Base with Scaled Index and Displacement; LA = (SR) + (I) × S + (B) + A
 Relative; LA = (PC) + A
where LA = linear address, (X) = contents of X, SR = segment register, PC = program counter, A = contents of an address field in the instruction, R = register, B = base register, I = index register, S = scaling factor
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 13.3 ARM Indexing Methods, using STRB r0, [r1, #12] with r0 = 0x5 as the register to be stored and r1 = 0x200 as the original base register:
(a) Offset (STRB r0, [r1, #12]): the offset 0xC is added to the base register to form the address 0x20C; r1 is left unchanged
(b) Preindex (STRB r0, [r1, #12]!): the address 0x20C is formed as in offset mode and the base register is updated to 0x20C
(c) Postindex (STRB r0, [r1], #12): the transfer uses the original base address 0x200, and the base register is then updated to 0x20C

+ ARM Data Processing and Branch Instruction Addressing
 Data processing instructions
  Use either register addressing or a mixture of register and immediate addressing
  For register addressing the value in one of the register operands may be scaled using one of the five shift operators
 Branch instructions
  The only form of addressing for branch instructions is immediate
  Instruction contains a 24-bit value
  Shifted 2 bits left so that the address is on a word boundary
  Effective range ±32 MB from the program counter

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
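A small check of the ±32 MB figure: a 24-bit signed immediate shifted left by 2 covers byte offsets from -2^25 to 2^25 - 4. The helper below is only an illustration of that arithmetic, not ARM decode logic:

```python
# Sketch: range of an ARM branch offset (24-bit signed immediate, shifted left 2).
def branch_offset(imm24):
    # interpret the 24-bit field as signed, then shift onto a word boundary
    if imm24 & 0x800000:
        imm24 -= 1 << 24
    return imm24 << 2

print(branch_offset(0x7FFFFF))   #  33554428 bytes, roughly +32 MB
print(branch_offset(0x800000))   # -33554432 bytes, roughly -32 MB
```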
Figure 13.4 ARM Load/Store Multiple Addressing (LDMxx r10, {r0, r1, r4} and STMxx r10, {r0, r1, r4}): with base register r10 = 0x20C, the increment after (IA), increment before (IB), decrement after (DA), and decrement before (DB) variants place the register list at different address ranges above or below the base address (0x200 through 0x218).

Instruction Formats
 Define the layout of the bits of an instruction, in terms of its constituent fields
 Must include an opcode and, implicitly or explicitly, indicate the addressing mode for each operand
 For most instruction sets more than one instruction format is used

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+ Instruction Length
 Most basic design issue
 Affects, and is affected by:
  Memory size
  Memory organization
  Bus structure
  Processor complexity
  Processor speed
 Should be equal to the memory-transfer length or one should be a multiple of the other
 Should be a multiple of the character length, which is usually 8 bits, and of the length of fixed-point numbers

+ Allocation of Bits
 Factors that determine the use of the addressing bits include:
  Number of addressing modes
  Number of operands
  Number of register sets
  Address range
  Address granularity

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 13.5 PDP-8 Instruction Formats: memory reference instructions (3-bit opcode, D/I bit, Z/C bit, 7-bit displacement); input/output instructions (opcode 110, 6-bit device field, 3-bit opcode); and register reference microinstructions in three groups (group 1: CLA, CLL, CMA, CML, RAR, RAL, BSW, IAC; group 2: CLA, SMA, SZA, SNL, RSS, OSR, HLT; group 3: CLA, MQA, MQL), all packed into a 12-bit word.
 D/I = Direct/Indirect address; Z/C = Page 0 or Current page; CLA = Clear Accumulator; CLL = Clear Link; CMA = CoMplement Accumulator; CML = CoMplement Link; RAR = Rotate Accumulator Right; RAL = Rotate Accumulator Left; BSW = Byte SWap; IAC = Increment ACcumulator; SMA = Skip on Minus Accumulator; SZA = Skip on Zero Accumulator; SNL = Skip on Nonzero Link; RSS = Reverse Skip Sense; OSR = Or with Switch Register; HLT = HaLT; MQA = Multiplier Quotient into Accumulator; MQL = Multiplier Quotient Load

Figure 13.6 PDP-10 Instruction Format: a single 36-bit format with a 9-bit opcode, 4-bit register field, indirect bit (I), 4-bit index register field, and 18-bit memory address.


© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+ Variable-Length Instructions
 Variations can be provided efficiently and compactly
 Increases the complexity of the processor
 Does not remove the desirability of making all of the instruction lengths integrally related to word length
 Because the processor does not know the length of the next instruction to be fetched, a typical strategy is to fetch a number of bytes or words equal to at least the longest possible instruction
  Sometimes multiple instructions are fetched

Figure 13.7 Instruction Formats for the PDP-11: thirteen formats combining 4-, 7-, 8-, 10-, 13-, and 16-bit opcodes with 6-bit source and destination operand specifiers, optional 16-bit memory addresses, register (R) and floating-point register (FP) fields, and a condition code (CC) field; numbers below the fields indicate bit length; source and destination each contain a 3-bit addressing mode field and a 3-bit register number; FP indicates one of four floating-point registers; R indicates one of the general-purpose registers.

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 13.8 Examples of VAX Instructions (hexadecimal format, explanation, and assembler notation): RSB (opcode 05, return from subroutine); CLRL R9 (opcode D4, register operand 59, clear register R9); MOVW 356(R4), 25(R11) (opcode B0, word-displacement specifier on R4 with displacement 164 hex = 356, byte-displacement specifier on R11 with displacement 19 hex = 25); ADDL3 #5, R0, @A[R2] (opcode C1, short literal 5, register mode R0, index prefix R2, indirect word relative displacement from the PC to location A), which adds 5 to a 32-bit integer in R0 and stores the result in the location whose address is the sum of A and 4 times the contents of R2.

Figure 13.9 x86 Instruction Format: 0 to 4 one-byte prefixes (instruction prefix, segment override, operand size override, address size override), a 1- to 3-byte opcode, an optional ModR/M byte (2-bit Mod, 3-bit Reg/Opcode, 3-bit R/M), an optional SIB byte (2-bit Scale, 3-bit Index, 3-bit Base), a 0-, 1-, 2-, or 4-byte displacement, and a 0-, 1-, 2-, or 4-byte immediate.
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+ Thumb-2 Instruction Set
 The only instruction set available on the Cortex-M microcontroller products
 Is a major enhancement to the Thumb instruction set architecture (ISA)
  Introduces 32-bit instructions that can be intermixed freely with the older 16-bit Thumb instructions
  Most 32-bit Thumb instructions are unconditional, whereas almost all ARM instructions can be conditional
  Introduces a new If-Then (IT) instruction that delivers much of the functionality of the condition field in ARM instructions
 Delivers overall code density comparable with Thumb, together with the performance levels associated with the ARM ISA
 Before Thumb-2 developers had to choose between Thumb for size and ARM for performance

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Figure 13.14 Computation of the Formula N = I + J + K
(a) Binary program and (b) hexadecimal program: addresses 101-104 contain the instruction words 2201, 1202, 1203, 3204; addresses 201-204 contain the data 0002, 0003, 0004, 0000.
(c) Symbolic program:
 101 LDA 201
 102 ADD 202
 103 ADD 203
 104 STA 204
 201 DAT 2
 202 DAT 3
 203 DAT 4
 204 DAT 0
(d) Assembly program:
 FORMUL LDA I
        ADD J
        ADD K
        STA N
 I      DATA 2
 J      DATA 3
 K      DATA 4
 N      DATA 0


© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. © 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+ Summary
Chapter 13: Instruction Sets: Addressing Modes and Formats
 Addressing modes
  Immediate addressing
  Direct addressing
  Indirect addressing
  Register addressing
  Register indirect addressing
  Displacement addressing
  Stack addressing
 x86 addressing modes
 ARM addressing modes
 Instruction formats
  Instruction length
  Allocation of bits
  Variable-length instructions
 x86 instruction formats
 ARM instruction formats
 Assembly language

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
