Computer Architecture Slide
Computer Organization
and Architecture
Chapter 1
Introduction
Architecture & Organization 1
[Figure: functional view of a computer: data movement apparatus, control mechanism, data processing facility]
Operations (1)
Data movement
  e.g. keyboard to screen
[Figure: functional view: data storage facility, data movement apparatus, control mechanism, data processing facility]
Operations (2)
Storage
  e.g. Internet download to disk
[Figure: functional view: data storage facility, data movement apparatus, control mechanism, data processing facility]
Operation (3)
[Figure: functional view: data movement apparatus, control mechanism, data processing facility]
Operation (4)
[Figure: functional view: data movement apparatus, control mechanism, data processing facility]
Structure - Top Level
[Figure: top-level structure: the computer (central processing unit, main memory, systems interconnection, input/output) connected to peripherals and communication lines]
Structure - The CPU
[Figure: the CPU within the computer (I/O, system bus, memory): arithmetic and logic unit, registers, control unit, internal CPU interconnection]
Structure - The Control Unit
[Figure: the control unit within the CPU (ALU, registers, internal bus): sequencing logic, control unit registers and decoders, control memory]
Outline of the Book (1)
http://www.shore.net/~ws/COA5e.html
  links to sites of interest
  links to sites for courses that use the book
  errata list for book
  information on other books by W. Stallings
Internet Resources
- Web sites to look for
comp.arch
comp.arch.arithmetic
comp.arch.storage
William Stallings
Computer Organization
and Architecture
Chapter 2
Computer Evolution and
Performance
ENIAC - background
[Figure: main memory, input/output equipment, accumulator, MQ, MBR, IBR, PC, MAR, IR, control circuits, program control unit; paths labelled "instructions & data" and "address"]
Commercial Computers
1964
Replaced (& not compatible with) 7000 series
First planned “family” of computers
  Similar or identical instruction sets
  Similar or identical O/S
  Increasing speed
  Increasing number of I/O ports (i.e. more terminals)
  Increased memory size
  Increased cost
Multiplexed switch structure
DEC PDP-8
1964
First minicomputer (after miniskirt!)
Did not need air conditioned room
Small enough to sit on a lab bench
$16,000
  $100k+ for IBM 360
Embedded applications & OEM
BUS STRUCTURE
DEC - PDP-8 Bus Structure
OMNIBUS
Semiconductor Memory
1970
Fairchild
Size of a single core
  i.e. 1 bit of magnetic core storage
Holds 256 bits
Non-destructive read
Much faster than core
Capacity approximately doubles each year
Intel
1971 - 4004
  First microprocessor
  All CPU components on a single chip
  4 bit
Followed in 1972 by 8008
  8 bit
  Both designed for specific applications
1974 - 8080
  Intel’s first general purpose microprocessor
Speeding it up
Pipelining
On board cache
On board L1 & L2 cache
Branch prediction
Data flow analysis
Speculative execution
Performance Mismatch
http://www.intel.com/
  Search for the Intel Museum
http://www.ibm.com
http://www.dec.com
Charles Babbage Institute
PowerPC
Intel Developer Home
William Stallings
Computer Organization
and Architecture
Chapter 3
A top level view of computer
function & Interconnection
Program Concept
A sequence of steps
For each step, an arithmetic or logical operation
is done
For each operation, a different set of control
signals is needed
Function of Control Unit
We have a computer!
Components
Two steps:
Fetch
Execute
Fetch Cycle
Processor-memory
data transfer between CPU and main memory
Processor I/O
Data transfer between CPU and I/O module
Data processing
Some arithmetic or logical operation on data
Control
Alteration of sequence of operations
e.g. jump
Combination of above
Example of Program Execution
Instruction Cycle -
State Diagram
Interrupts
Mechanism by which other modules (e.g. I/O)
may interrupt normal sequence of processing
Program
e.g. overflow, division by zero
Timer
Generated by internal processor timer
Used in pre-emptive multi-tasking
I/O
from I/O controller
Hardware failure
e.g. memory parity error
Program Flow Control
Interrupt Cycle
Added to instruction cycle
Processor checks for interrupt
Indicated by an interrupt signal
If no interrupt, fetch next instruction
If interrupt pending:
Suspend execution of current program
Save context
Set PC to start address of interrupt handler routine
Process interrupt
Restore context and continue interrupted program
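The interrupt cycle above can be sketched as a small simulation. This is only an illustration under simplifying assumptions: instructions are plain strings, "executing" one just records it, and the names run, program and interrupts are hypothetical rather than taken from the slides.

```python
from collections import deque


def run(program, interrupts):
    """Simulate fetch/execute with an interrupt check after each instruction.

    program    : list of instruction names (hypothetical, for illustration)
    interrupts : dict mapping an instruction index to the name of an
                 interrupt that is raised while that instruction executes
    Returns the order in which instructions and handlers ran.
    """
    trace = []
    pending = deque()
    pc = 0
    while pc < len(program):
        instruction = program[pc]      # fetch
        trace.append(instruction)      # execute (here: just record it)
        if pc in interrupts:           # an interrupt signal arrives meanwhile
            pending.append(interrupts[pc])
        pc += 1
        # Interrupt cycle: check for pending interrupts before the next fetch.
        while pending:
            saved_pc = pc                               # save context
            trace.append("ISR:" + pending.popleft())    # process interrupt
            pc = saved_pc                               # restore context
    return trace


print(run(["A", "B", "C"], {1: "io"}))
```

The handler runs between B and C, after which the interrupted program continues where it left off.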
Instruction Cycle (with
Interrupts) - State Diagram
Multiple Interrupts
Disable interrupts
Processor will ignore further interrupts whilst
processing one interrupt
Interrupts remain pending and are checked after first
interrupt has been processed
Interrupts handled in sequence as they occur
Define priorities
Low priority interrupts can be interrupted by higher
priority interrupts
When higher priority interrupt has been processed,
processor returns to previous interrupt
Multiple Interrupts - Sequential
Multiple Interrupts - Nested
Connecting
Carries data
Remember that there is no difference between “data”
and “instruction” at this level
Width is a key determinant of performance
8, 16, 32, 64 bit
Address bus
Dedicated
Separate data & address lines
Multiplexed
Shared lines
Address valid or data valid control line
Advantage - fewer lines
Disadvantages
More complex control
Reduced ultimate performance
Elements of Bus Design
Type: Dedicated, Multiplexed
Bus Width: Address, Data
Method of Arbitration: Centralized, Distributed
Timing: Synchronous, Asynchronous
Data Transfer Type: Read, Write, Read-modify-write, Read-after-write, Block
PCI Bus Lines (Required)
Systems lines
Including clock and reset
Address & Data
32 time mux lines for address/data
Interpret & validate lines
Interface Control
Arbitration
Not shared
Direct connection to PCI bus arbiter
Error lines
PCI Bus Lines (Optional)
Interrupt lines
Not shared
Cache support
64-bit Bus Extension
Additional 32 lines
Time multiplexed
2 lines to enable devices to agree to use 64-bit transfer
JTAG/Boundary Scan
For testing procedures
PCI Commands
Location
Capacity
Unit of transfer
Access method
Performance
Physical type
Physical characteristics
Organisation
Location
CPU
Internal
External
Capacity
Word size
The natural unit of organisation
Number of words
or Bytes
Unit of Transfer
Internal
Usually governed by data bus width
External
Usually a block which is much larger than a word
Addressable unit
Smallest location which can be uniquely addressed
Word internally
Cluster on M$ disks
Access Methods (1)
Sequential
Start at the beginning and read through in order
Access time depends on location of data and previous
location
e.g. tape
Direct
Individual blocks have unique address
Access is by jumping to vicinity plus sequential
search
Access time depends on location and previous
location
e.g. disk
Access Methods (2)
Random
Individual addresses identify locations exactly
Access time is independent of location or previous
access
e.g. RAM
Associative
Data is located by a comparison with contents of a
portion of the store
Access time is independent of location or previous
access
e.g. cache
Memory Hierarchy
Registers
In CPU
Internal or Main memory
May include one or more levels of cache
“RAM”
External memory
Backing store
Performance
Access time
Time between presenting the address and getting the
valid data
Memory Cycle time
Time may be required for the memory to “recover”
before next access
Cycle time is access + recovery
Transfer Rate
Rate at which data can be moved
Physical Types
Semiconductor
RAM
Magnetic
Disk & Tape
Optical
CD & DVD
Others
Bubble
Hologram
Physical Characteristics
Decay
Volatility
Erasable
Power consumption
Organisation
How much?
Capacity
How fast?
Time is money
How expensive?
Hierarchy List
Registers
L1 Cache
L2 Cache
Main memory
Disk cache
Disk
Optical
Tape
So you want fast?
RAM
Misnamed as all semiconductor memory is random
access
Read/Write
Volatile
Temporary storage
Static or dynamic
Dynamic RAM
Permanent storage
Microprogramming (see later)
Library subroutines
Systems programs (BIOS)
Function tables
Types of ROM
Written during manufacture
Very expensive for small runs
Programmable (once)
PROM
Needs special equipment to program
Read “mostly”
Erasable Programmable (EPROM)
Erased by UV
Electrically Erasable (EEPROM)
Takes much longer to write than read
Flash memory
Erase whole memory electrically
Organisation in detail
Hard Failure
Permanent defect
Soft Error
Random, non-destructive
No permanent damage to memory
Detected using Hamming error correcting code
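As a rough illustration of how a Hamming code locates a single flipped bit, here is a sketch of the small (7,4) code. Real memory systems use wider codes; the function names and the 4-bit data word are assumptions made for this example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit Hamming code word (positions 1..7)."""
    code = [0] * 8                    # index 0 unused so indices equal positions
    code[3], code[5], code[6], code[7] = data_bits
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    return code[1:]


def hamming74_correct(word):
    """Return (corrected word, error position), with position 0 for no error."""
    code = [0] + list(word)
    syndrome = 0
    for p in (1, 2, 4):
        if sum(code[i] for i in range(1, 8) if i & p) % 2:
            syndrome += p             # failed checks add up to the bad position
    if syndrome:
        code[syndrome] ^= 1           # flip the faulty bit back
    return code[1:], syndrome


stored = hamming74_encode([1, 0, 1, 1])
damaged = stored[:]
damaged[4] ^= 1                       # soft error in position 5
print(hamming74_correct(damaged))
```

The syndrome is the sum of the failing parity checks, which is why it directly names the position of the bad bit.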
Error Correcting Code Function
Cache
Size
Mapping Function
Replacement Algorithm
Write Policy
Block Size
Number of Caches
Size does matter
Cost
More cache is expensive
Speed
More cache is faster (up to a point)
Checking cache for data takes time
Typical Cache Organization
Mapping Function
Cache of 64kByte
Cache block of 4 bytes
i.e. cache is 16k (2^14) lines of 4 bytes
16MBytes main memory
24 bit address
(2^24 = 16M)
Direct Mapping
24 bit address
2 bit word identifier (4 byte block)
22 bit block identifier
8 bit tag (=22-14)
14 bit slot or line
No two blocks in the same line have the same Tag field
Check contents of cache by finding line and checking Tag
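The 8/14/2 split of the 24 bit address can be checked with a few shifts and masks. This is a minimal sketch; the constant and function names are hypothetical, and the sample addresses are chosen only for illustration.

```python
WORD_BITS = 2    # 4 byte block
LINE_BITS = 14   # 16k cache lines
TAG_BITS = 8     # 22 bit block identifier minus 14 bit line


def split_address(address):
    """Split a 24 bit address into (tag, line, word) for direct mapping."""
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word


# Two addresses that differ only in the tag compete for the same line.
print([hex(field) for field in split_address(0x16339C)])
print([hex(field) for field in split_address(0x00339C)])
```

Both sample addresses land on the same line with different tags, so alternating between them would evict each other every time.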
Direct Mapping
Cache Line Table
Simple
Inexpensive
Fixed location for given block
If a program accesses 2 blocks that map to the same
line repeatedly, cache misses are very high
Associative Mapping
Tag 9 bit | Set 13 bit | Word 2 bit
No choice
Each block only maps to one line
Replace that line
Replacement Algorithms (2)
Associative & Set Associative
Foreground reading
Find out detail of Pentium II cache systems
NOT just from Stallings!
Newer RAM Technology (1)
Foreground reading
Check out any other RAM you can find
See Web site:
The RAM Guide
William Stallings
Computer Organization
and Architecture
Chapter 5
External Memory
Types of External Memory
Magnetic Disk
RAID
Removable
Optical
CD-ROM
CD-Writable (WORM)
CD-R/W
DVD
Magnetic Tape
Magnetic Disk
Fixed head
One read write head per track
Heads mounted on fixed rigid arm
Movable head
One read write head per side
Mounted on a movable arm
Fixed and Movable Heads
Removable or Not
Removable disk
Can be removed from drive and replaced with
another disk
Provides unlimited storage capacity
Easy data transfer between systems
Nonremovable disk
Permanently mounted in the drive
Floppy Disk
Universal
Cheap
Fastest external storage
Getting larger all the time
Multiple Gigabyte now usual
Removable Hard Disk
ZIP
Cheap
Very common
Only 100M
JAZ
Not cheap
1G
LS-120 (a: drive)
Also reads 3.5” floppy
Becoming more popular?
Finding Sectors
[Figure: sector format: ID field (sync byte, track, head, sector, CRC) followed by data field (sync byte, data, CRC)]
Foreground reading
Find others
Characteristics
Seek time
Moving head to correct track
(Rotational) latency
Waiting for data to rotate under head
Access time = Seek + Latency
Transfer rate
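A quick calculation shows how seek time and rotational latency combine. The sketch below assumes the usual approximation that average latency is half a rotation; the function name and the sample figures (9 ms seek, 7200 rpm) are hypothetical values picked for illustration.

```python
def average_access_time_ms(seek_ms, rpm):
    """Average access time = seek time + average rotational latency."""
    rotation_ms = 60_000 / rpm          # time for one full revolution
    average_latency_ms = rotation_ms / 2
    return seek_ms + average_latency_ms


print(round(average_access_time_ms(seek_ms=9, rpm=7200), 2))
```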
RAID 0
No redundancy
Data striped across all disks
Round Robin striping
Increase speed
Multiple data requests probably not on same disk
Disks seek in parallel
A set of data is likely to be striped across multiple
disks
RAID 1
Mirrored Disks
Data is striped across disks
2 copies of each stripe on separate disks
Read from either
Write to both
Recovery is simple
Swap faulty disk & re-mirror
No down time
Expensive
RAID 2
Disks are synchronized
Very small stripes
Often single byte/word
Error correction calculated across corresponding
bits on disks
Multiple parity disks store Hamming code error
correction in corresponding positions
Lots of redundancy
Expensive
Not used
RAID 3
Similar to RAID 2
Only one redundant disk, no matter how large
the array
Simple parity bit for each set of corresponding
bits
Data on failed drive can be reconstructed from
surviving data and parity info
Very high transfer rates
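The reconstruction step relies on XOR parity: the parity block is the XOR of all data blocks, so XORing the surviving blocks with the parity yields the missing one. A minimal sketch, with hypothetical function names and made-up two-byte blocks:

```python
from functools import reduce


def parity(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity_block):
    """Rebuild the one missing data block from the survivors and the parity."""
    return parity(list(surviving_blocks) + [parity_block])


data = [b"\x0f\xa0", b"\x33\x01", b"\xff\x10"]
p = parity(data)
# Pretend the second disk failed and rebuild its block.
print(reconstruct([data[0], data[2]], p) == data[1])
```

This works for any single failed disk, which is why one redundant disk is enough regardless of array size.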
RAID 5
Like RAID 4
Parity striped across all disks
Round robin allocation for parity stripe
Avoids RAID 4 bottleneck at parity disk
Commonly used in network servers
[Figure: CD-ROM block format: sync (00, FF x 10, 00), ID (min, sec, sector, mode), data, layered ECC]
Difficult
Move head to rough position
Set correct speed
Read address
Adjust to required location
(Yawn!)
CD-ROM for & against
CD-Writable
WORM
Now affordable
Compatible with CD-ROM drives
CD-RW
Erasable
Getting cheaper
Mostly CD-ROM drive compatible
DVD - what’s in a name?
Multi-layer
Very high capacity (4.7G per layer)
Full length movie on single disk
Using MPEG compression
Finally standardized (honest!)
Movies carry regional coding
Players only play correct region films
Can be “fixed”
DVD - Writable
Serial access
Slow
Very cheap
Backup and archive
Digital Audio Tape (DAT)
Chapter 6
Input/Output
Input/Output Problems
Human readable
Screen, printer, keyboard
Machine readable
Monitoring and control
Communication
Modem
Network Interface Card (NIC)
I/O Module Function
[Figure: I/O module: address lines, data lines, input/output logic, external device interface logic with data, status, and control signals]
I/O Module Decisions
Programmed
Interrupt driven
Direct Memory Access (DMA)
Programmed I/O
[Figure: 8259A interrupt controller: inputs IRQ0 to IRQ7, output to the INTR line of the 8086]
ISA Bus Interrupt System
http://www.pcguide.com/ref/mbsys/res/irq/func.htm
Parallel interface
8, 16, 32 bit data lines
Daisy chained
Devices are independent
Devices can communicate with each other as
well as host
SCSI - 1
Early 1980s
8 bit
5MHz
Data rate 5 MBytes/s
Seven devices
Eight including host interface
SCSI - 2
1991
16 and 32 bit
10MHz
Data rate 20 or 40 MBytes/s
[Figure: SCSI bus phases: bus free, arbitration, (re)selection, then command, data, status, message]
SCSI Timing Diagram
Configuring SCSI
Daisy chain
Up to 63 devices on single port
Really 64 of which one is the interface itself
Up to 1022 buses can be connected with
bridges
Automatic configuration
No bus terminators
May be tree structure
FireWire v SCSI
FireWire 3 Layer Stack
Physical
Transmission medium, electrical and signaling
characteristics
Link
Transmission of data in packets
Transaction
Request-response protocol
FireWire - Physical Layer
Chapter 7
Operating System
Support
Objectives and Functions
Convenience
Making the computer easier to use
Efficiency
Allowing better use of computer resources
Layers and Views of a
Computer System
Operating System Services
Program creation
Program execution
Access to I/O devices
Controlled access to files
System access
Error detection and response
Accounting
O/S as a Resource Manager
Types of Operating System
Interactive
Batch
Single program (Uni-programming)
Multi-programming (Multi-tasking)
Early Systems
Instructions to Monitor
Usually denoted by $
e.g.
$JOB
$FTN
... Some Fortran instructions
$LOAD
$RUN
... Some data
$END
Desirable Hardware Features
Memory protection
To protect the Monitor
Timer
To prevent a job monopolizing the system
Privileged instructions
Only executed by Monitor
e.g. I/O
Interrupts
Allows for relinquishing and regaining control
Multi-programmed Batch
Systems
Key to multi-programming
Long term
Medium term
Short term
I/O
Long Term Scheduling
Dispatcher
Fine grained decisions of which job to execute
next
i.e. which job actually gets to use the processor
in the next time slot
Process States
Process Control Block
Identifier
State
Priority
Program counter
Memory pointers
Context data
I/O status
Accounting information
Key Elements of O/S
Process Scheduling
Uni-program
Memory split into two
One for Operating System (monitor)
One for currently executing program
Multi-program
“User” part is sub-divided and shared among active
processes
Swapping
Demand paging
Do not require all pages of a process in memory
Bring in pages as required
Page fault
Required page is not in memory
Operating System must swap in required page
May need to swap out a page to make space
Select page to throw out based on recent history
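One common way to pick a victim "based on recent history" is least recently used (LRU). The sketch below counts page faults under that policy; it is only one possible policy, and the function name and reference string are hypothetical.

```python
from collections import OrderedDict


def count_page_faults(references, frames):
    """Count page faults for a reference string using LRU replacement."""
    memory = OrderedDict()               # resident pages, least recent first
    faults = 0
    for page in references:
        if page in memory:
            memory.move_to_end(page)     # hit: mark as most recently used
        else:
            faults += 1                  # page fault: page must be swapped in
            if len(memory) >= frames:
                memory.popitem(last=False)   # swap out least recently used
            memory[page] = True
    return faults


print(count_page_faults([1, 2, 3, 1, 4, 2], frames=3))
print(count_page_faults([1, 2, 3, 1, 4, 2], frames=4))
```

Running the same reference string with more frames shows the effect of fitting more memory: fewer faults for the same workload.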
Thrashing
Solutions
Good page replacement algorithms
Reduce number of processes running
Fit more memory
Bonus
Stallings chapter 7
Stallings, W. Operating Systems, Internals and
Design Principles, Prentice Hall 1998
Loads of Web sites on Operating Systems
William Stallings
Computer Organization
and Architecture
Chapter 8
Computer Arithmetic
Arithmetic & Logic Unit
+3 = 00000011
+2 = 00000010
+1 = 00000001
+0 = 00000000
-1 = 11111111
-2 = 11111110
-3 = 11111101
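The table above can be reproduced by masking a signed integer to the word length. A minimal sketch, assuming an 8 bit word by default; the function name is hypothetical.

```python
def to_twos_complement(value, bits=8):
    """Return the two's complement bit pattern of value as a string."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking maps negative values onto the upper half of the unsigned range.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


for n in (3, 2, 1, 0, -1, -2, -3):
    print(f"{n:+d} = {to_twos_complement(n)}")
```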
Benefits
0= 00000000
Bitwise not 11111111
Add 1 to LSB +1
Result 1 00000000
Overflow is ignored, so:
-0=0
Negation Special Case 2
-128 = 10000000
bitwise not 01111111
Add 1 to LSB +1
Result 10000000
So:
-(-128) = -128 X
Monitor MSB (sign bit)
It should change during negation
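Both special cases fall out of the "bitwise not, add 1" rule when the carry out of the word is discarded. The sketch below works on raw 8 bit patterns; the function names are hypothetical, and the overflow test simply applies the sign bit check described above (zero is excluded because its sign legitimately stays the same).

```python
def negate(pattern, bits=8):
    """Two's complement negation: invert every bit, add 1, drop the carry."""
    mask = (1 << bits) - 1
    return ((pattern ^ mask) + 1) & mask


def negation_overflowed(pattern, bits=8):
    """True when a nonzero pattern keeps its sign bit after negation."""
    sign = 1 << (bits - 1)
    result = negate(pattern, bits)
    return pattern != 0 and (pattern & sign) == (result & sign)


print(format(negate(0b00000000), "08b"))        # -0 = 0
print(format(negate(0b10000000), "08b"))        # -(-128) wraps back to -128
print(negation_overflowed(0b10000000))
```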
Range of Numbers
8 bit 2s complement
+127 = 01111111 = 2^7 - 1
-128 = 10000000 = -2^7
16 bit 2s complement
+32767 = 01111111 11111111 = 2^15 - 1
-32768 = 10000000 00000000 = -2^15
Conversion Between Lengths
Complex
Work out partial product for each digit
Take care with place value (column)
Add partial products
Multiplication Example
               00001101   Quotient
Divisor 1011 ) 10010011   Dividend
                1011
               001110   Partial remainder
                 1011
                 001111   Partial remainder
                   1011
                    100   Remainder
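The long division of 10010011 by 1011 can be replayed with a shift-and-subtract loop. This is a sketch for unsigned operands only; the function name is hypothetical.

```python
def binary_long_division(dividend, divisor, bits=8):
    """Unsigned shift-and-subtract division, one dividend bit per step."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Bring down the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:      # divisor fits: subtract, quotient bit 1
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


q, r = binary_long_division(0b10010011, 0b1011)
print(format(q, "08b"), format(r, "b"))
```

In decimal this is 147 divided by 11, giving 13 remainder 4.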
Real Numbers
Stallings Chapter 8
IEEE 754 on IEEE Web site
William Stallings
Computer Organization
and Architecture
Chapter 9
Instruction Sets:
Characteristics
and Functions
What is an instruction set?
Data processing
Data storage (main memory)
Data movement (I/O)
Program flow control
Number of Addresses (a)
3 addresses
Operand 1, Operand 2, Result
a = b + c;
May be a fourth - next instruction (usually implicit)
Not common
Needs very long words to hold everything
Number of Addresses (b)
2 addresses
One address doubles as operand and result
a=a+b
Reduces length of instruction
Requires some extra work
Temporary storage to hold some results
Number of Addresses (c)
1 address
Implicit second address
Usually a register (accumulator)
Common on early machines
Number of Addresses (d)
0 (zero) addresses
All addresses implicit
Uses a stack
e.g. push a
push b
add
pop c
c=a+b
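The zero-address sequence above can be traced with a tiny stack interpreter. This is a sketch only: the tuple encoding of instructions, the dictionary standing in for memory, and the function name are all assumptions for illustration.

```python
def run_stack_program(program, memory):
    """Execute push/add/pop instructions against a dict used as memory."""
    stack = []
    for op, *operand in program:
        if op == "push":
            stack.append(memory[operand[0]])   # copy a memory value to the stack
        elif op == "add":
            right = stack.pop()                # both operands are implicit
            left = stack.pop()
            stack.append(left + right)
        elif op == "pop":
            memory[operand[0]] = stack.pop()   # store top of stack in memory
        else:
            raise ValueError(f"unknown operation: {op}")
    return memory


program = [("push", "a"), ("push", "b"), ("add",), ("pop", "c")]
print(run_stack_program(program, {"a": 2, "b": 3}))
```

Note that only push and pop name a memory location; add takes both operands from the stack, which is what makes it a zero-address instruction.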
How Many Addresses
More addresses
More complex (powerful?) instructions
More registers
Inter-register operations are quicker
Fewer instructions per program
Fewer addresses
Less complex (powerful?) instructions
More instructions per program
Faster fetch/execution of instructions
Design Decisions (1)
Operation repertoire
How many ops?
What can they do?
How complex are they?
Data types
Instruction formats
Length of op code field
Number of addresses
Design Decisions (2)
Registers
Number of CPU registers available
Which operations can be performed on which
registers?
Addressing modes (later…)
RISC v CISC
Types of Operand
Addresses
Numbers
Integer/floating point
Characters
ASCII etc.
Logical Data
Bits or flags
(Aside: Is there any difference between numbers and characters?
Ask a C programmer!)
Pentium Data Types
8 bit Byte
16 bit word
32 bit double word
64 bit quad word
Addressing is by 8 bit unit
A 32 bit double word is read at addresses
divisible by 4
Specific Data Types
Data Transfer
Arithmetic
Logical
Conversion
I/O
System Control
Transfer of Control
Data Transfer
Specify
Source
Destination
Amount of data
May be different instructions for different
movements
e.g. IBM 370
Or one instruction and different addresses
e.g. VAX
Arithmetic
Bitwise operations
AND, OR, NOT
Conversion
Privileged instructions
CPU needs to be in specific state
Ring 0 on 80386+
Kernel mode
For operating systems use
Transfer of Control
Branch
e.g. branch to x if result is zero
Skip
e.g. increment and skip if zero
ISZ Register1
Branch xxxx
ADD A
Subroutine call
c.f. interrupt call
Foreground Reading