CSC303 Architecture Lecture Note 2019
By
References
1. Introduction.
3. System Buses.
4. Internal Memory.
5. External Memory.
6. Input/Output.
8. Computer Arithmetic.
INTRODUCTION
Computer systems have conventionally been defined through their interfaces at a number of layered
abstraction levels, each providing functional support to its predecessor. Included among the levels
are the application programs, the high- level languages, and the set of machine instructions. Based on
the interface between different levels of the system, a number of computer architectures can be
defined.
The interface between the application programs and a high- level language is referred to as language
architecture. The instruction set architecture defines the interface between the basic machine
instruction set and the runtime and I/O control.
A different definition of computer architecture is built on four basic viewpoints. These are the
structure, the organization, the implementation, and the performance.
In this definition:
The structure defines the interconnection of various hardware components.
The organization defines the dynamic interplay and management of the various components.
The implementation defines the detailed design of hardware components, and
The Performance specifies the behavior of the computer system.
A computer system, like any system, consists of an interrelated set of components. The system is best
characterized in terms of structure - the way in which components are interconnected - and function -
the operation of the individual components. Furthermore, a computer's organization is hierarchic.
Each major component can be further described by decomposing it into its major subcomponents and
describing their structure and function. For clarity and ease of understanding, this hierarchical
organization is described in this Lecture from the top down:
• Computer system: Major components are processor, memory, I/O.
• Processor: Major components are control unit, registers, ALU, and instruction execution unit.
• Control unit: Major components are control memory, microinstruction sequencing logic, and
registers.
Throughout the discussion, aspects of the system are viewed from the points of view of both
architecture (those attributes of a system visible to a machine language programmer) and organization
(the operational units and their interconnections that realize the architecture).
Organization and Architecture
• Computer Architecture refers to those attributes of a system that have a direct impact on the
logical execution of a program. Examples:
o The instruction set
o The number of bits used to represent various data types
o I/O mechanisms
o Memory addressing techniques
• Computer Organization refers to the operational units and their interconnections that realize
the architectural specifications. Examples are things that are transparent to the programmer:
o control signals
o interfaces between computer and peripherals
o the memory technology being used
• So, for example, the fact that a multiply instruction is available is a computer architecture
issue. How that multiply is implemented is a computer organization issue.
I. Harvard and von Neumann Architectures, which differ based on the way the CPU accesses
main memory.
• Harvard Architecture
The Harvard architecture is a computer architecture with physically separate storage (memory) and
signal pathways for instructions and data
In contrast with the Harvard Architecture, the von Neumann machine uses the same memory for both
instructions and data.
All contemporary computer designs are based on concepts developed by John von Neumann at the
Institute for Advanced Study, Princeton. Such a design is referred to as the von Neumann
architecture and is based on three key concepts (listed under Computer Components later in these notes):
Function
• A functional view of the computer
• Basic functions that a computer can perform:
o Data Processing - a wide variety of forms, but only a few fundamental methods or types
o Data Storage - long-term or short, temporary storage
o Data Movement
▪ Input/Output - when data are received from or delivered to a peripheral, a device
connected directly to the computer
▪ Data Communications - when data is moved over longer distances, to or from a remote
device
• Control - of the above functions, by instructions provided by the user of the computer (i.e.
their programs)
Four (4) possible types of operations with this basic structure:
Device for Moving Data (transferring data from one peripheral or communication line to another)
Device for Storing Data (data transferred from the external environment to storage, and vice versa)
Device for Processing Data in Storage
Device for Processing Data En-route Between the Outside World and Storage
Structure
• Simplest possible view of a computer:
o Storage
o Processing
o Peripherals
o Communication Lines
• Internal Structure of the Computer Itself:
o Central Processing Unit (CPU): Controls the operation of the computer and performs its
data processing functions. Often simply referred to as processor.
o Main Memory: Stores data.
o I/O: Moves data between the computer and its external environment.
o System Interconnection: Some mechanism that provides for communication among CPU,
main memory, and I/O.
• Main Structural components of the CPU:
o Control Unit: Controls the operation of the CPU and hence the computer.
o Arithmetic and Logic Unit (ALU): Performs the computer's data processing functions.
o Registers: Provides storage internal to the CPU.
o CPU Interconnection: Some mechanism that provides for communication among the
control unit, ALU, and registers.
A Brief History of Computers
• 1946: Princeton Institute for Advanced Studies (IAS) computer
o Prototype for all subsequent general-purpose computers. With rare exceptions, all of
today’s computers have this same general structure, and are thus referred to as von
Neumann machines.
Figure 2.2 Expanded Structure of IAS Computer
THE COMPUTER SYSTEM
System Buses
Interconnecting Basic Components
Computer Components
• The von Neumann architecture is based on three key concepts:
o Data and instructions are stored in a single read-write memory
o The contents of this memory are addressable by location, without regard to the type of
data contained there
o Execution occurs in a sequential fashion (unless explicitly modified) from one instruction
to the next
• Two approaches to programming
o Hardwired Programming - Constructing a configuration of hardware logic components
to perform a particular set of arithmetic and logic operations on a set of data
o Software Programming - A sequence of codes or instructions, each of which supplies the
necessary control signals to a general-purpose configuration of control and logic functions
(which may themselves be hardwired programs)
• Other components needed
o I/O Components - a means to:
▪ Accept data and instructions in some form, and convert to an internal form of signals
▪ Report results
o Main Memory
▪ Distinguished from external storage/peripherals
▪ A place to temporarily store both:
• Instructions - Data interpreted as codes for generating control signals
• Data - Data upon which computations are performed
▪ Interactions among Computer Components
o Memory Address Register – specifies address for next read or write
o Memory Buffer Register – contains data to be written into or receives data read from
memory
o I/O address register - Specifies a particular I/O device
o I/O buffer register - used for exchange of data between an I/O module and CPU (or
memory)
o Memory Module - a set of locations
▪ With sequentially numbered addresses
▪ Each holds a binary number that can be either an instruction or data
Computer Function
• Processing required for a single instruction is called an instruction cycle
• Simple POV (Point-Of-View): 2 steps - the processor fetches an instruction from memory, then executes it
• The interconnection structure must support the following types of transfers:
o Memory to CPU
o CPU to Memory
o I/O to CPU
o CPU to I/O
o I/O to or from Memory - using Direct Memory Access (DMA)
Bus Interconnection
• A bus is a shared transmission medium
o Must only be used by one device at a time
o When used to connect major computer components (CPU, memory, I/O) is called a system
bus
• Three functional groups of communication lines
o Data lines (data bus) - move data between system modules
▪ Width is a key factor in determining overall system performance
o Address lines - designate source or destination of data on the data bus
▪ Width determines the maximum possible memory capacity of the system (may be a
multiple of width)
▪ Also used to address I/O ports. Typically:
➢ high-order bits select a particular module
➢ lower-order bits select a memory location or I/O port within the module
o Control lines - control access to and use of the data and address lines. Typical control
lines include:
▪ Memory write: causes data on the bus to be written into the addressed location
▪ Memory read: causes data from the addressed location to be placed on the bus
▪ I/O write: causes data on the bus to be output to the addressed I/O port
▪ I/O read: causes data from the addressed I/O port to be placed on the bus
▪ Transfer ACK: indicates that data have been accepted from or placed on the bus
▪ Bus request: indicates that a module needs to gain control of the bus
▪ Bus grant: indicates that a requesting module has been granted control of the bus
▪ Interrupt request: indicates that an interrupt is pending
▪ Interrupt ACK: acknowledges that the pending interrupt has been recognized
▪ Clock: is used to synchronize operations
▪ Reset: initializes all modules.
• If one module wishes to send data to another, it must:
o Obtain use of the bus
o Transfer data via the bus
• If one module wishes to request data from another, it must:
o Obtain use of the bus
o Transfer a request to the other module over control and address lines
o Wait for second module to send data
• Typical physical arrangement of a system bus
o A number of parallel electrical conductors
o Each system component (usually on one or more boards) taps into some or all of the bus
lines (usually with a slotted connector)
o System can be expanded by adding more boards
o A bad component can be replaced by replacing the board where it resides
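Note: the address-line split described above (high-order bits selecting a module, low-order bits selecting a location or port within it) can be sketched in a few lines of Python. The 16-bit address and the 4-bit module field below are illustrative assumptions, not taken from any particular bus.

```python
# Illustrative only: assume a 16-bit address bus where the top 4 bits
# select a module and the remaining 12 bits select a location within it.
ADDR_BITS = 16
MODULE_BITS = 4
OFFSET_BITS = ADDR_BITS - MODULE_BITS

def decode(address):
    """Split a bus address into (module number, offset within that module)."""
    module = address >> OFFSET_BITS               # high-order bits
    offset = address & ((1 << OFFSET_BITS) - 1)   # low-order bits
    return module, offset

if __name__ == "__main__":
    addr = 0xA123
    module, offset = decode(addr)
    print(f"address {addr:#06x} -> module {module}, offset {offset:#05x}")
```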
Multiple Bus Hierarchies
• A great number of devices on a bus will cause performance to suffer
o Propagation delay - the time it takes for devices to coordinate the use of the bus
o The bus may become a bottleneck as the aggregate data transfer demand approaches the
capacity of the bus (in available transfer cycles/second)
• Traditional Hierarchical Bus Architecture
o Use of a cache structure insulates CPU from frequent accesses to main memory
o Main memory can be moved off local bus to a system bus
o Expansion bus interface
▪ buffers data transfers between system bus and I/O controllers on expansion bus
▪ insulates memory-to-processor traffic from I/O traffic
• Traditional Hierarchical Bus Architecture Example
• High-performance Hierarchical Bus Architecture
o Traditional hierarchical bus breaks down as higher and higher performance is seen in the
I/O devices
o Incorporates a high-speed bus
▪ Specifically designed to support high-capacity I/O devices
▪ Brings high-demand devices into closer integration with the processor and at the same
time is independent of the processor
▪ Changes in processor architecture do not affect the high-speed bus, and vice versa
o Sometimes known as a mezzanine architecture
• High-performance Hierarchical Bus Architecture Example
o Asynchronous Timing
▪ The occurrence of one event on a bus follows and depends on the occurrence of a previous
event
▪ Allows system to take advantage of advances in device performance by having a mixture of
slow and fast devices, using older and newer technology, sharing the same bus
▪ BUT harder to implement and test than synchronous timing
• Bus Width
o Data bus: wider = better performance
o Address bus: wider = more locations can be referenced
• Data Transfer Type
o All buses must support write (master to slave) and read (slave to master) transfers
• Combination operations
o Read-modify-write
▪ A read followed immediately by a write to the same address.
▪ Address is only broadcast once, at the beginning of the operation
▪ Indivisible, to prevent access to the data element by other potential bus masters
▪ Principal purpose is to protect shared memory in a multiprogramming system
o Read-after-write - indivisible operation consisting of a write followed immediately by a
read from the same address (for error checking purposes)
• Block data transfer
o one address cycle followed by n data cycles
o first data item to or from specified address
o remaining data items to or from subsequent addresses
PCI
• PCI = Peripheral Component Interconnect
o High-bandwidth
o Processor independent
o Can function as a mezzanine or peripheral bus
• Current Standard
o up to 64 data lines at 33 MHz
o requires few chips to implement
o supports other buses attached to PCI bus
o public domain, initially developed by Intel to support Pentium-based systems
o supports a variety of microprocessor-based configurations, including multiple processors
o uses synchronous timing and centralized arbitration
• Typical Desktop System
Note: Bridge acts as a data buffer so that the speed of the PCI bus may differ from that of the
processor’s I/O capability.
• Typical Server System
Note: In a multiprocessor system, one or more PCI configurations may be connected by bridges to
the processor’s system bus.
• Bus Structure
o 50 mandatory signal lines, divided into the following groups:
▪ System Pins - includes clock and reset
▪ Address and Data Pins - 32 time-multiplexed lines for addresses and data, plus lines
to interpret and validate these
▪ Interface Control Pins - control timing of transactions and provide coordination among
initiators and targets
▪ Arbitration Pins - not shared, each PCI master has its own pair to connect to PCI bus
arbiter
▪ Error Reporting Pins - for parity and other errors
o 50 optional signal lines, divided into the following groups:
▪ Interrupt Pins - not shared, each PCI device has its own interrupt line or lines to an
interrupt controller
▪ Cache Support Pins
▪ 64-bit Bus Extension Pins - 32 additional time-multiplexed lines for addresses and
data, plus lines to interpret and validate these, and to provide agreement between two
PCI devices on use of these
▪ JTAG/Boundary Scan Pins - support testing procedures defined in IEEE Standard 1149.1
• PCI Commands
o issued by the initiator (the master) to the target (the slave)
o Use the C/BE lines
o Types
- Interrupt Acknowledge
- Special Cycle
- I/O Read
- I/O Write
- Memory Read
- Memory Read Line
- Memory Read Multiple
- Memory Write
- Memory Write and Invalidate
- Configuration Read
- Configuration Write
- Dual Address Cycle
THE COMPUTER SYSTEM
MEMORY
Internal Memory
Characteristics of Computer Memory Systems
• Location
o CPU (registers and L1 cache)
o Internal Memory (main)
o External (secondary)
• Capacity
o Word Size - typically equal to the number of bits used to represent a number and to the
instruction length.
o Number of Words - has to do with the number of addressable units (which are typically
words, but are sometimes bytes, regardless of word size). For addresses of length A (in
bits), the number of addressable units is 2^A (see the short example at the end of this list).
• Unit of Transfer
o Word
o Block
• Access Method
o Sequential Access
▪ information used to separate or identify records is stored with the records
▪ access must be made in a specific linear sequence
▪ the time to access an arbitrary record is highly variable
o Direct Access
▪ individual blocks or records have an address based on physical location
▪ access is by direct access to general vicinity of desired information, then some search
▪ access time is still variable, but not as much as sequential access
o Random Access
▪ each addressable location has a unique, physical location
▪ access is by direct access to desired location
▪ access time is constant and independent of prior accesses
o Associative
▪ desired units of information are retrieved by comparing a sub-part of the unit with a
desired mask -- location is not needed
▪ access time is constant and independent of prior accesses
▪ most useful for searching - a search through N possible locations would take O(N)
with Random Access Memory, but O(1) with Associative Memory
• Performance
o Access Time
o Memory Cycle Time - primarily for random-access memory = access time + additiona l
time required before a second access can begin (refresh time, for example)
o Transfer Rate
▪ Generally measured in bits/second
▪ Inversely proportional to memory cycle time for random access memory
• Physical Type
o Most common - semiconductor and magnetic surface memories
o Others - optical, bubble, mechanical (e.g., paper tape), core, etc.
• Physical Characteristics
o volatile - information decays or is lost when power is lost
o non-volatile - information remains without deterioration until changed -- no electrical
power needed
o non-erasable
▪ information cannot be altered with a normal memory access cycle. As a practical
matter, must be non-volatile
• Organization - the physical arrangement of bits to form words.
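Note: to make the Capacity item above concrete, the following small Python sketch relates address width A to the number of addressable units (2^A); the address widths shown are illustrative examples only.

```python
# Illustrative: with an A-bit address, the number of addressable units is 2**A.
def addressable_units(address_bits):
    return 2 ** address_bits

if __name__ == "__main__":
    for a in (16, 20, 32):
        print(f"A = {a:2d} address bits -> 2**{a} = {addressable_units(a):,} addressable units")
```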
The Memory Hierarchy
• Design Constraints
o How much? “If you build it, they will come.” Applications tend to be built to use any
commonly available amount, so question is open-ended.
o How fast? Must be able to keep up with the CPU -- don’t want to waste cycles waiting for
instructions or operands.
o How expensive? Cost of memory (also associated with “How much?”) must be reasonable
vs. other component costs.
• There are trade-offs between the 3 key characteristics of memory (cost, capacity, and access
time) which yield the following relationships:
o Smaller access time -> greater cost per bit
o Greater capacity -> smaller cost per bit
o Greater capacity -> greater access time
• Contemporary Memory Hierarchy
o Magnetic Tape
o Optical/Magnetic Disk
o Disk Cache
o Main Memory
o Cache
o Registers
Semiconductor or Main Memory
• Types of Random-Access Semiconductor Memory
o RAM - Random Access Memory
▪ possible both to read data from the memory and to easily and rapidly write new data
into the memory
▪ volatile - can only be used for temporary storage (all the other types of random-access
memory are non-volatile)
▪ Types:
✓ Dynamic - stores data as charge on capacitors
➢ charges tend to discharge over time
➢ require a periodic refresh operation (similar to a memory reference)
➢ more dense and less expensive than comparable static RAMs
✓ Static - stores data in traditional flip-flop logic gates
➢ no refresh needed
➢ generally faster than dynamic RAMs
o ROM - Read Only Memory
▪ contains a permanent pattern of data which cannot be changed
▪ data is actually wired-in to the chip as part of the fabrication process
▪ cheaper for high-volume production
o PROM - Programmable Read Only Memory
▪ writing process is performed electrically
▪ may be written after chip fabrication
✓ writing uses different electronics than normal memory writes
o EPROM - Erasable Programmable Read Only Memory
▪ read and written electrically, as with PROM
▪ before a write, all cells must be erased by exposure to UV radiation (erasure takes
about 20 minutes)
✓ writing uses different electronics than normal memory writes
✓ errors can be corrected by erasing and starting over
▪ more expensive than PROM
o EEPROM - Electrically Erasable Programmable Read Only Memory
▪ byte-level writing - any part(s) of the memory can be written at any time
▪ updateable in place - writing uses ordinary bus control, address, and data lines
▪ writing takes much longer than reading
▪ more expensive (per bit) and less dense than EPROM
o Flash Memory
▪ uses electrical erasing technology
▪ allows individual blocks to be erased, but not byte-level erasure; modern flash
memory is updateable in place (some devices may function more like I/O modules)
▪ much faster erasure than EPROM
▪ same density as EPROM
▪ Sometimes refers to other devices, such as battery-backed RAM and tiny hard-
disk drives which behave like flash memory for all intents and purposes.
• Organization
o Typical organization
▪ bits read/written at a time
▪ Logically 4 square arrays of 2048x2048 cells
▪ Horizontal lines connect to Select terminals
▪ Vertical lines connect to Data-In/Sense terminals
▪ Multiple DRAMs must connect to memory controller to read/write an 8 bit word
▪ Illustrates why successive generations grow by a factor of 4 -- each extra pin
devoted to addressing doubles the number of rows and columns
• Chip Packaging
o Typical Pin outs
▪ A0-An: Address of word being accessed; with row/column multiplexing, n address
pins can carry a 2n-bit address
▪ D0-Dn: Data in/out for n bits
▪ Vcc: Power supply
▪ Vss: Ground
▪ CE: Chip enable - allows several chips to use same circuits for everything else, but
only have one chip use them
▪ Vpp: Program Voltage - used for writes to (programming) an EPROM
▪ RAS: Row Address Select
▪ CAS: Column Address Select
▪ W or WE: Write enable
▪ OE: Output enable
• Error Correction Principles
o Hard Failure
▪ A permanent defect
▪ Causes same result all the time, or randomly fluctuating results
o Soft Error - A random, nondestructive event that alters the contents of one or more
memory cells, without damaging the memory. Caused by:
▪ Power supply problems
▪ Alpha particles
o Detection and Correction
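Note: the detection-and-correction item above is typically illustrated with a single-error-correcting Hamming code, of the kind RAID Level 2 also uses. The sketch below is a minimal, generic Hamming(7,4) encoder and corrector in Python; it is a textbook construction for illustration, not a description of any specific memory controller.

```python
# Minimal Hamming(7,4) single-error-correcting code (illustrative sketch).
# Bit positions 1..7; positions 1, 2 and 4 hold even-parity check bits.

def encode(data4):
    """data4: list of 4 data bits -> list of 7 code bits (positions 1..7)."""
    c = [0] * 8                       # index 0 unused
    c[3], c[5], c[6], c[7] = data4
    c[1] = c[3] ^ c[5] ^ c[7]         # covers positions 1, 3, 5, 7
    c[2] = c[3] ^ c[6] ^ c[7]         # covers positions 2, 3, 6, 7
    c[4] = c[5] ^ c[6] ^ c[7]         # covers positions 4, 5, 6, 7
    return c[1:]

def correct(code7):
    """Recompute the check bits; the syndrome points at the bit in error."""
    c = [0] + list(code7)
    syndrome = 0
    for p in (1, 2, 4):
        parity = 0
        for i in range(1, 8):
            if i & p:
                parity ^= c[i]
        if parity:
            syndrome += p
    if syndrome:                      # non-zero syndrome = position of the flipped bit
        c[syndrome] ^= 1
    return [c[3], c[5], c[6], c[7]]

if __name__ == "__main__":
    word = [1, 0, 1, 1]
    sent = encode(word)
    sent[4] ^= 1                      # soft error: flip one stored bit
    print("recovered:", correct(sent), "original:", word)
```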
Cache Memory
• Principles
o Intended to give memory speed approaching that of fastest memories available but with
large size, at close to price of slower memories.
o Cache is checked first for all memory references.
o If not found, the entire block in which that reference resides in main memory is stored in
a cache slot, called a line.
o Each line includes a tag (usually a portion of the main memory address) which identifies
which particular block is being stored.
o The proportion of memory references, which are found already stored in cache, is called
the hit ratio.
• Elements of Cache Design
o Cache Size
▪ Small enough that overall average cost/bit is close to that of main memory alone
▪ large enough so that overall average access time is close to that of cache alone
▪ large caches tend to be slightly slower than small ones
▪ studies indicate that 1K-512K words is optimum cache size
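Note: the hit ratio defined under Principles above determines the average (effective) access time seen by the CPU. The short Python sketch below uses the standard weighted-average formula; the 10 ns cache and 100 ns main-memory times are assumed example values, not figures from these notes.

```python
# Illustrative: effective access time as a function of cache hit ratio H.
# Assumed timings: cache access T1 = 10 ns, main-memory access T2 = 100 ns.
def effective_access_time(hit_ratio, t_cache_ns=10, t_memory_ns=100):
    """On a hit: just the cache time. On a miss: cache check plus main-memory access."""
    return hit_ratio * t_cache_ns + (1 - hit_ratio) * (t_cache_ns + t_memory_ns)

if __name__ == "__main__":
    for h in (0.50, 0.90, 0.99):
        print(f"hit ratio {h:.2f} -> average access {effective_access_time(h):.1f} ns")
```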
THE COMPUTER SYSTEM
External Memory
Magnetic Disk
A disk is a circular platter constructed of nonmagnetic material, called the substrate, coated with a
magnetizable material. Traditionally, the substrate has been an aluminum or aluminum alloy material;
more recently, glass substrates have been introduced. The glass substrate has a number of benefits, including the following:
• Improvement in the uniformity of the magnetic film surface to increase disk reliability
• A significant reduction in overall surface defects to help reduce read-write errors
• Better stiffness to reduce disk dynamics
• Greater ability to withstand shock and damage
Magnetic Read and Write Mechanisms
• Data are recorded on and later retrieved from the disk via a conducting coil named the head
• During a read or write operation, the head is stationary while the platter rotates beneath it
• The write mechanism exploits the fact that electricity flowing through a coil produces a
magnetic field. Electric pulses are sent to the write head, and the resulting magnetic patterns
are recorded on the surface below, with different patterns for positive and negative currents.
Physical Characteristics
• In a fixed-head disk, there is one read-write head per track. All of the heads are mounted on
a rigid arm that extends across all tracks; such systems are rare today.
• In a movable-head disk, there is only one read-write head. Again, the head is mounted on
an arm.
The disk itself is mounted in a disk drive, which consists of the arm, a spindle that rotates the
disk, and the electronics needed for input and output of binary data.
• A nonremovable disk is permanently mounted in the disk drive; the hard disk in a
personal computer is a nonremovable disk.
• A removable disk can be removed and replaced with another disk.
RAID
Level 0 (Striped, Nonredundant)
• User data are distributed (striped) across all of the disks in the array; no redundant data are stored
• If a single I/O request consists of multiple contiguous strips, up to n strips can be handled in
parallel, greatly reducing I/O transfer time.
Level 1 (Mirrored)
• Only level where redundancy is achieved by simply duplicating all the data
• Data striping is used as in RAID 0, but each logical strip is mapped to two separate physical
disks
• A read request can be serviced by whichever of the two disks involves less seek plus rotational latency
• Write requests require updating 2 disks, but both can be updated in parallel, so no penalty
• When a drive fails, data may be accessed from other drive
• High cost for high performance
o Usually used only for highly critical data.
o Best performance when requests are mostly reads
Level 2 (Redundancy through Hamming Code)
• Uses parallel access – all member disks participate in every I/O request
• Uses small strips, often as small as a single byte or word
• An error-correcting code (usually Hamming) is calculated across corresponding bits on each
data disk, and the bits of the code are stored in the corresponding bit positions on multiple
parity disks.
• Useful in an environment where a lot of disk errors are expected
o Usually expensive overkill.
o Disks are so reliable that this is never implemented
Level 3 (Bit-Interleaved Parity)
• Uses parallel access – all member disks participate in every I/O request
• Uses small strips, often as small as a single byte or word
• Uses only a single parity disk, no matter how large the disk array
o A simple parity bit is calculated and stored
o In the event of a failure in one disk, the data on that disk can be reconstructed from the
data on the others
o Until the bad disk is replaced, data can still be accessed (at a performance penalty) in
reduced mode
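Note: the reconstruction described for Level 3 relies on the parity strip being the bit-wise XOR of the corresponding data strips, so any single lost strip equals the XOR of the surviving strips plus parity. A minimal Python sketch (the two-byte strips are made-up example data, not tied to any real controller):

```python
# Illustrative RAID-3 style parity: parity byte = XOR of corresponding data bytes.
from functools import reduce

def xor_strips(strips):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))

if __name__ == "__main__":
    disks = [b"\x10\x22", b"\x0f\x31", b"\xa5\x5a"]   # three data disks
    parity = xor_strips(disks)                        # stored on the parity disk
    lost = disks[1]                                   # pretend disk 1 fails
    rebuilt = xor_strips([disks[0], disks[2], parity])
    assert rebuilt == lost                            # reconstructed from the others
    print("rebuilt strip:", rebuilt.hex())
```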
Level 4 (Block-Level Parity)
• Uses independent access – each disk operates independently, so separate I/O requests can be satisfied in parallel
• Uses relatively large strips; a bit-by-bit parity strip is calculated across corresponding strips
on each data disk and stored on a dedicated parity disk
• Every write must also update the parity disk, which can become a bottleneck
Level 5 (Block-Level Distributed Parity)
• Like Level 4, but distributes parity strips across all disks, removing the parity bottleneck
Level 6 (Dual Redundancy)
• Like Level 5, but provides 2 parity strips for each stripe, allowing recovery from 2
simultaneous disk failures.
SOLID STATE DRIVES
• One of the most significant developments in computer architecture in recent years is the
increasing use of solid state drives (SSDs) to complement or even replace hard disk drives
(HDDs), both as internal and external secondary memory.
• The term solid state refers to electronic circuitry built with semiconductors.
• A solid state drive is a memory device made with solid state components that can be used as
a replacement for a hard disk drive.
Flash Memory
Flash memory is a type of semiconductor memory that has been around for a number of years and is
used in many consumer electronic products, including smart phones, GPS devices, MP3 players,
digital cameras, and USB devices.
• In a flash memory cell, a second gate—called a floating gate, because it is insulated by a thin
oxide layer—is added to the transistor.
• Initially, the floating gate does not interfere with the operation of the transistor; in this state,
the cell is deemed to represent binary 1.
• Applying a large voltage across the oxide layer causes electrons to tunnel through it and
become trapped on the floating gate, where they remain even if the power is disconnected; in
this state, the cell is deemed to represent binary 0.
There are two distinctive types of flash memory, designated as NOR and NAND.
• In NOR flash memory, the basic unit of access is a bit, and the logical organization resembles
a NOR logic device.
• For NAND flash memory, the basic unit is 16 or 32 bits, and the logical organization
resembles NAND devices.
• NOR flash memory provides high-speed random access. It can read and write data to specific
locations, and can reference and retrieve a single byte.
• NOR flash memory is used to store cell phone operating system code and on
Windows computers for the BIOS program that runs at startup.
• NAND reads and writes in small blocks. It is used in USB flash drives, memory cards (in
digital cameras, MP3 players, etc.), and in SSDs
OPTICAL MEMORY
• In 1983, one of the most successful consumer products of all time was introduced, the compact
disk (CD) digital audio system.
• The CD is a nonerasable disk that can store more than 60 minutes of audio information on
one side.
CD Operation
The disk is formed from a resin, such as polycarbonate. Digitally recorded information (either
music or computer data) is imprinted as a series of microscopic pits on the surface of the
polycarbonate. This is done, first of all, with a finely focused, high intensity laser to create a
master disk. The master is used, in turn, to make a die to stamp out copies onto polycarbonate.
The pitted surface is then coated with a highly reflective surface, usually aluminum or gold. This
shiny surface is protected against dust and scratches by a top coat of clear acrylic. Finally, a label
can be silkscreened onto the acrylic.
• Data on the CD-ROM are organized as a sequence of blocks. A typical block format is shown
in Figure 6.13. It consists of the following fields:
• Sync: The sync field identifies the beginning of a block. It consists of a byte of all 0s, 10 bytes
of all 1s, and a byte of all 0s.
• Header: The header contains the block address and the mode byte. Mode 0 specifies a blank
data field; mode 1 specifies the use of an error-correcting code and 2048 bytes of data; mode
2 specifies 2336 bytes of user data with no error-correcting code.
• Data: User data.
• Auxiliary: Additional user data in mode 2. In mode 1, this is a 288-byte error correcting code.
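Note: summing the mode 1 fields above gives the commonly quoted 2352-byte physical block. The 12-byte sync size follows from the description above (1 + 10 + 1 bytes); the 4-byte header (3 address bytes plus the mode byte) is an assumed detail not spelled out in the list.

```python
# Mode 1 CD-ROM block: sync (12) + header (4) + user data (2048) + ECC (288) bytes.
SYNC, HEADER, DATA, ECC = 12, 4, 2048, 288
print(SYNC + HEADER + DATA + ECC)   # 2352 bytes per physical block
```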
MAGNETIC TAPE
Tape systems use the same reading and recording techniques as disk systems. The medium is
flexible polyester (similar to that used in some clothing) tape coated with magnetizable material.
The coating may consist of particles of pure metal in special binders or vapor-plated metal films.
• The tape and the tape drive are analogous to a home tape recorder system.
• Tape widths vary from 0.38 cm (0.15 inch) to 1.27 cm (0.5 inch).
• Tapes used to be packaged as open reels that have to be threaded through a second spindle
for use.
• Data on the tape are structured as a number of parallel tracks running lengthwise.
• Earlier tape systems typically used nine tracks. This made it possible to store data one
byte at a time, with an additional parity bit as the ninth track. This was followed by tape
systems using 18 or 36 tracks, corresponding to a digital word or double word. Recording
data simultaneously across the tracks in this way is referred to as parallel recording, while
most modern systems instead use serial recording.
• The typical recording technique used in serial tapes is referred to as serpentine recording.
• Data are still recorded serially along individual tracks, but blocks in sequence are stored
on adjacent tracks, as suggested by Figure 6.16b.
• A tape drive is a sequential-access device. If the tape head is positioned at record 1, then
to read record N, it is necessary to read physical records 1 through N - 1, one at a time. If
the head is currently positioned beyond the desired record, it is necessary to rewind the
tape a certain distance and begin reading forward.
INPUT/OUTPUT (I/O)
Introduction
• Why not connect peripherals directly to system bus?
o Wide variety of peripherals with various operating methods
o Data transfer rate of peripherals is often much slower/faster than memory or CPU
o Different data formats and word lengths than used by computer
• Major functions of an I/O module
o Interface to CPU and memory via system bus or central switch
o Interface to one or more peripheral devices by tailored data links
External Devices
• External devices, often called peripheral devices or just peripherals, make computer systems
useful.
• Three broad categories of external devices:
o Human-Readable (ex. terminals, printers)
o Machine-Readable (ex. disks, sensors)
o Communication (ex. modems, NIC’s)
• Basic structure of an external device:
o Data - bits sent to or received from the I/O module
o Control signals - determine the function that the device will perform
o Status signals - indicate the state of the device (esp. READY/NOT-READY)
o Control logic - interprets commands from the I/O module to operate the device
o Transducer - converts data from computer-suitable electrical signals to the form of energy
used by the external device.
o Buffer - temporarily holds data being transferred between I/O module and the external
device.
I/O Modules
• An I/O Module is the entity within a computer responsible for:
o control of one or more external devices
o Exchange of data between those devices and main memory and/or CPU registers
• It must have two interfaces:
o Internal, to CPU and main memory
o External, to the device(s)
• Major function/requirement categories
o Control and Timing
▪ Coordinates the flow of traffic between internal resources and external devices
▪ Cooperation with bus arbitration
o CPU Communication
▪ Command Decoding
▪ Data
▪ Status Reporting
▪ Address Recognition.
o Device Communication (see diagram under External Devices)
▪ Commands
▪ Status Information
▪ Data
o Data Buffering
▪ Rate of data transfer to/from CPU is orders of magnitude faster than to/from external
devices
▪ I/O module buffers data so that peripheral can send/receive at its rate, and CPU can
send/receive at its rate
o Error Detection
▪ Must detect and correct or report errors that occur
▪ Types of errors
▪ Mechanical/electrical malfunctions
▪ Data errors during transmission
• I/O Module Structure
Programmed I/O
• With programmed I/O, data is exchanged under complete control of the CPU
o CPU encounters an I/O instruction
o CPU issues a command to appropriate I/O module
o I/O module performs requested action and sets I/O status register bits
o CPU must wait, and periodically check I/O module status until it finds that the operation
is complete
• To execute an I/O instruction, the CPU issues:
o An address, specifying I/O module and external device
o A command, 4 types:
▪ Control - activate a peripheral and tell it what to do
▪ Test - querying the state of the module or one of its external devices
▪ Read - obtain an item of data from the peripheral and place it in an internal buffer (data
register from preceding illustration)
▪ Write - take an item of data from the data bus and transmit it to the peripheral
• Two modes of addressing are possible:
o Memory-mapped I/O
o Isolated I/O
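Note: a minimal sketch of the programmed-I/O busy-wait described above, written as Python pseudocode against a hypothetical io_module object. The names issue_command, read_status and read_data, and the READY/ERROR status bits, are assumed for illustration; they are not a real device API.

```python
# Illustrative programmed I/O: the CPU busy-waits on the module's status register.
READY = 0x01      # assumed status bit meaning "operation complete"
ERROR = 0x80      # assumed status bit meaning "device error"

def read_word_programmed_io(io_module, device_addr):
    io_module.issue_command(device_addr, "READ")       # hypothetical module interface
    while True:                                        # CPU is tied up polling
        status = io_module.read_status(device_addr)
        if status & ERROR:
            raise IOError("device reported an error")
        if status & READY:
            return io_module.read_data(device_addr)    # data register -> CPU register
```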
Interrupt-Driven I/O
• The problem with programmed I/O is that the CPU has to wait for the I/O module to be ready for
either reception or transmission of data, repeatedly taking time to query its status.
• Interrupt-driven I/O is an alternative
o It allows the CPU to go back to doing useful work after issuing an I/O command.
o When the command is completed, the I/O module will signal the CPU that it is ready with
an interrupt.
Direct Memory Access
• Drawbacks of Programmed and Interrupt-Driven I/O
o The I/O transfer rate is limited by the speed with which the CPU can test and service a
device
o The CPU is tied up in managing an I/O transfer; a number of instructions must be executed
for each I/O transfer
• DMA Function
o When CPU wishes to read or write a block of data it issues a command to the DMA
module containing:
▪ Whether a read or write is requested
▪ The address of the I/O device involved
▪ The starting location in memory to read from or write to
▪ The number of words to be read or written
o CPU continues with other work
o DMA module handles entire operation. When memory has been modified as ordered, it
interrupts the CPU
o CPU is only involved at beginning and end of the transfer
o DMA module can force CPU to suspend operation while it transfers a word
▪ called cycle stealing
▪ not an interrupt, just a wait state
▪ slows operation of CPU, but not as badly as non-DMA
• Possible DMA Configurations
o Single Bus, Detached DMA
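Note: the four items the CPU hands to the DMA module (listed under DMA Function above) can be pictured as a small command block. The field names in this Python sketch are invented for illustration only.

```python
# Illustrative DMA command block: the four fields the CPU supplies before resuming work.
from dataclasses import dataclass

@dataclass
class DmaCommand:
    is_read: bool          # read (device to memory) or write (memory to device)
    device_address: int    # which I/O device is involved
    memory_address: int    # starting location in memory
    word_count: int        # number of words to transfer

# The CPU fills one of these, hands it to the DMA module, and carries on with other work;
# the module interrupts the CPU only after word_count words have been moved.
cmd = DmaCommand(is_read=True, device_address=0x3F0, memory_address=0x2000, word_count=512)
print(cmd)
```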
THE COMPUTER SYSTEM
• Process States - for short-term scheduling, a process is understood to be in one of 5 basic
states
o New - admitted by the high-level scheduler, but not yet ready to execute
o Ready - needs only the CPU
o Running - currently executing in the CPU
o Waiting - suspended from execution, waiting for some system resource
o Halted - the process has terminated and will be destroyed by the operating system
Integer Representation
• Sign-Magnitude Representation
o Leftmost bit is sign bit: 0 for positive, 1 for negative
o Remaining bits are magnitude
o Drawbacks
▪ Addition and subtraction must consider both the signs and relative magnitudes -- more
complex
▪ Testing for zero must consider two possible zero representations
• Two’s Complement Representation
o Leftmost bit still indicates sign
o Positive numbers exactly same as sign-magnitude
o Zero has only one representation: all zeros (treated as positive)
o Negative numbers found by taking 2’s complement
▪ Take complement of positive version
▪ Add 1
Integer Arithmetic (8.3)
• 2's complement examples (with 8-bit numbers); a short code sketch after the Division example below checks these
o Getting -55
▪ Start with +55: 00110111
▪ Complement that: 11001000
▪ Add 1: +00000001
▪ Total is -55: 11001001
o Negating -55
▪ Complement -55: 00110110
▪ Add 1: +00000001
▪ Total is 55 (see top): 00110111
o Adding -55 + 58
▪ Start with -55: 11001001
▪ Add 58: +00111010
▪ Result is 3: 00000011
▪ The carry out of the leftmost (sign) bit is ignored
• Overflow Rule - if two numbers are added, and they are both positive or both negative, then
overflow occurs if and only if the result has the opposite sign
• Converting between different bit lengths
o Move sign bit to new leftmost position
o Fill in with copies of the sign bit
o Examples (8 bit -> 16 bit)
▪ +18: 00010010 -> 0000000000010010
▪ -18: 11101110 -> 1111111111101110
• Multiplication
o Repeated Addition
o Unsigned Integers
▪ Generating partial products, shifting, and adding
▪ Just like longhand multiplication
• Two’s Complement Multiplication
o Straightforward multiplication will not work if either the multiplier or multiplicand are
negative
▪ Multiplicand would have to be padded with sign bit into a 2n-bit partial product, so
that the signs would line up
▪ In a negative multiplier, the 1’s and 0’s would no longer correspond to add-shift’s and
shift-only’s
o Simple solution
▪ Convert both multiplier and multiplicand to positive numbers
▪ Perform multiplication
▪ Take 2’s complement of result if and only if the signs of original numbers were differe nt
▪ Other methods do not require this final transformation step
• Booth’s Algorithm
• Why does Booth’s Algorithm work?
o Consider multiplying some multiplicand M by 30: M * (00011110) which would take 4
shift-adds of M (one for each 1)
o That is the same as multiplying M by (32 - 2): M * (00100000 - 00000010) = M *
(00100000) - M * (00000010) which would take:
▪ 1 shift-only on no transition (imagine last bit was 0)
▪ 1 shift-subtract on the transition from 0 to 1
▪ 3 shift-only’s on no transition
▪ 1 shift-add on the transition from 1 to 0
▪ 2 shift-only’s on no transition
• Division
o Unsigned integers, e.g. 10010011 ÷ 1011 (147 ÷ 11):
                 00001101    Quotient
  Divisor 1011 ) 10010011    Dividend
                   1011
                   001110
                     1011
                   001111
                     1011
                      100    Remainder
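As mentioned above, the following Python sketch verifies the 8-bit two's complement examples and the unsigned division example. The helper names are mine, but the arithmetic follows the rules given in this section.

```python
# Verify the 8-bit two's complement examples and the unsigned division above.
BITS = 8
MASK = (1 << BITS) - 1

def to_twos(value):
    """Encode a Python int as an 8-bit two's complement bit pattern."""
    return value & MASK

def from_twos(pattern):
    """Interpret an 8-bit pattern as a signed value (sign bit has weight -128)."""
    return pattern - (1 << BITS) if pattern & (1 << (BITS - 1)) else pattern

def add(a_pattern, b_pattern):
    """Add two patterns; the carry out of the sign bit is simply dropped."""
    return (a_pattern + b_pattern) & MASK

minus_55 = to_twos(-55)
print(f"-55      -> {minus_55:08b}")                       # 11001001
print(f"-(-55)   -> {(~minus_55 + 1) & MASK:08b}")         # complement, add 1 -> 00110111
result = add(minus_55, to_twos(58))
print(f"-55 + 58 -> {result:08b} = {from_twos(result)}")   # 00000011 = 3

# Unsigned division example: 10010011 / 1011 (147 / 11)
dividend, divisor = 0b10010011, 0b1011
print(f"quotient {dividend // divisor:08b}, remainder {dividend % divisor:03b}")
# quotient 00001101, remainder 100
```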
Floating-Point Representation
• Principles
o Using scientific notation, we can store a floating-point number in 3 parts, ±S × B^(±E):
▪ Sign
▪ Significand (or Mantissa)
▪ Exponent
▪ (The Base stays the same, so need not be stored)
o The sign applies to the significand. Exponents use a biased representation, where a fixed
value called the bias is subtracted from the field to get the actual exponent.
• We require that numbers be normalized, so that the radix point in the significand is always in the
same place
o we will choose just to the right of a leading 0
o format will be ±0.1bbb…b × 2^(±E)
o thus, it is unnecessary to store either that leading 0, or the next 1, since all numbers will
have them
• IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754)
o Facilitates portability of programs from one processor to another
o Defines 32-bit single and 64-bit double formats
o The 2008 revision also defines a 128-bit quadruple format
• A small amount of internal memory, called the registers, is needed by the CPU to hold the
instructions and data it is currently working with
Register Organization
• Registers are at top of the memory hierarchy. They serve two functions:
o User-Visible Registers - Enable the machine- or assembly-language programmer to
minimize main-memory references by optimizing use of registers
o Control and Status Registers - used by the control unit to control the operation of the CPU
and by privileged, OS programs to control the execution of programs
• User-Visible Registers
o Categories of Use
▪ General Purpose
▪ Data
▪ Address
▪ Condition Codes
o Design Issues
▪ Completely general-purpose registers or specialized use?
✓ Specialized registers save bits in instructions because their use can be implicit
✓ General-purpose registers are more flexible
✓ Trend is toward use of specialized registers
▪ Number of registers provided?
✓ More registers require more operand specifier bits in instructions
✓ 8 to 32 registers appears optimum (RISC systems use hundreds, but are a completely
different approach)
▪ Register Length?
✓ Address registers must be long enough to hold the largest address
✓ Data registers should be able to hold values of most data types
✓ Some machines allow two contiguous registers for double-length values
▪ Automatic or manual save of condition codes?
✓ Condition restore is usually automatic upon call return
✓ Saving condition code registers may be automatic upon call instruction, or may be
manual
• Control and Status Registers
o Essential to instruction execution
▪ Program Counter (PC)
▪ Instruction Register (IR)
▪ Memory Address Register (MAR) - usually connected directly to address lines of bus
▪ Memory Buffer Register (MBR) - usually connected directly to data lines of bus
o Program Status Word (PSW) - also essential, common fields or flags contained include:
▪ Sign - sign bit of last arithmetic operation
▪ Zero - set when result of last arithmetic operation is 0
▪ Carry - set if last operation resulted in a carry into or borrow out of a high-order bit
▪ Equal - set if a logical compare result is equality
▪ Overflow - set when last arithmetic operation caused overflow
▪ Interrupt Enable/Disable - used to enable or disable interrupts
▪ Supervisor - indicates if privileged ops can be used
o Other optional registers
▪ Pointer to a block of memory containing additional status info (like process control
blocks)
▪ An interrupt vector
▪ A system stack pointer
▪ A page table pointer
▪ I/O registers
o Design issues
▪ Operating system support in CPU
▪ How to divide allocation of control information between CPU registers and first part
of main memory (usual tradeoffs apply)
• Example Microprocessor Register Organization
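Note: as a small illustration of the PSW flags listed above, the Python sketch below models them as a bit field and sets them the way an ALU would after an 8-bit add. The bit positions are made-up example choices, not those of any real processor.

```python
# Illustrative PSW flag word; bit positions are arbitrary example choices.
from enum import IntFlag

class PSW(IntFlag):
    SIGN = 1 << 0        # sign of last arithmetic result
    ZERO = 1 << 1        # last arithmetic result was 0
    CARRY = 1 << 2       # carry/borrow out of the high-order bit
    OVERFLOW = 1 << 3    # last arithmetic operation overflowed
    INT_EN = 1 << 4      # interrupts enabled
    SUPERVISOR = 1 << 5  # privileged operations allowed

def flags_after_add(a, b, bits=8):
    """Set SIGN/ZERO/CARRY/OVERFLOW as an ALU would for an n-bit add of patterns a and b."""
    mask, sign_bit = (1 << bits) - 1, 1 << (bits - 1)
    result = (a + b) & mask
    flags = PSW(0)
    if result & sign_bit:
        flags |= PSW.SIGN
    if result == 0:
        flags |= PSW.ZERO
    if a + b > mask:
        flags |= PSW.CARRY
    if (a & sign_bit) == (b & sign_bit) != (result & sign_bit):
        flags |= PSW.OVERFLOW     # operands had the same sign, result does not
    return flags

print(flags_after_add(0x7F, 0x01))   # 127 + 1 overflows an 8-bit signed add -> SIGN and OVERFLOW
```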
Instruction Pipelining
• Concept is similar to a manufacturing assembly line
o Products at various stages can be worked on simultaneously
o Also referred to as pipelining, because, as in a pipeline, new inputs are accepted at one end
before previously accepted inputs appear as outputs at the other end
• Consider subdividing instruction processing into two stages:
o Fetch instruction
o Execute instruction
• During execution, there are times when main memory is not being accessed.
• During this time, the next instruction could be fetched and buffered (called instruction
prefetch or fetch overlap).
• If the Fetch and Execute stages were of equal duration, the instruction cycle time would be
halved.
• However, this doubling of execution speed is unlikely because:
o Execution time is generally longer than fetch time (it will also involve reading and storing
operands, in addition to operation execution)
o A conditional branch makes the address of the next instruction to be fetched unknown
(although we can minimize this problem by fetching the next sequential instruction anyway)
• To gain further speedup, the pipeline must have more stages. Consider the following
decomposition of instruction processing:
o Fetch Instruction (FI)
o Decode Instruction (DI) - determine opcode and operand specifiers
o Calculate Operands (CO) - calculate effective address of each source operand
o Fetch Operands (FO)
o Execute Instruction (EI)
o Write Operand (WO)
• Timing diagram, assuming 6 stages of fairly equal duration and no branching
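The timing diagram referred to above is missing from this copy. The small Python sketch below prints the kind of space-time diagram it would show, assuming 6 stages of equal duration, no branching, and 9 instructions; the nk/(k + n - 1) speedup expression at the end is the standard ideal-pipeline formula.

```python
# Print the space-time diagram for an ideal 6-stage pipeline (no stalls, no branches).
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def timing_diagram(n_instructions=9):
    total_cycles = len(STAGES) + n_instructions - 1        # k + n - 1 cycles in total
    print(" " * 6 + "".join(f"{t:>4}" for t in range(1, total_cycles + 1)))
    for i in range(n_instructions):
        row = ["    "] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = f"{stage:>4}"                     # instruction i is in stage s during cycle i+s+1
        print(f"{'I' + str(i + 1):<6}" + "".join(row))

timing_diagram()
# Ideal speedup for n instructions on a k-stage pipeline: n*k / (k + n - 1)
print("speedup for n=9, k=6:", 9 * 6 / (6 + 9 - 1))
```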
Notes on the diagram
o Each instruction is assumed to use all six stages
▪ Not always true in reality
▪ To simplify pipeline hardware, timing is set up assuming all 6 stages will be used
o It assumes that all stages can be performed in parallel
▪ Not actually true, especially due to memory access conflicts
▪ Pipeline hardware must accommodate exclusive use of memory access lines, so delays
may occur
▪ Often, the desired value will be in cache, or the FO or WO stage may be null, so
pipeline will not be slowed much of the time
• If the six stages are not of equal duration, there will be some waiting involved for shorter
stages
• The CO (Calculate Operands) stage may depend on the contents of a register that could be
altered by a previous instruction that is still in the pipeline
• It may appear that more stages will result in even more speedup
o There is some overhead in moving data from buffer to buffer, which increases with more
stages
o The amount of control logic for dependencies, etc. for moving from stage to stage increases
exponentially as stages are added
• Conditional branch instructions and interrupts can invalidate several instruction fetches
THE CENTRAL PROCESSING UNIT
Reduced Instruction Set Computers (RISCs)
Introduction
• RISC is one of the few true innovations in computer organization and architecture in the last
50 years of computing.
• Key elements common to most designs:
o A limited and simple instruction set
o A large number of general purpose registers, or the use of compiler technology to optimize
register usage
o An emphasis on optimizing the instruction pipeline
Instruction Execution Characteristics
• Overview
o Semantic Gap - the difference between the operations provided in high-level languages and
those provided in computer architecture
o Symptoms of the semantic gap:
▪ Execution inefficiency
▪ Excessive machine program size
▪ Compiler complexity
o New designs had features trying to close gap:
▪ Large instruction sets
▪ Dozens of addressing modes
▪ Various HLL statements in hardware
o Intent of these designs:
▪ Make compiler-writing easier
▪ Improve execution efficiency by implementing complex sequences of operations in
microcode
▪ Provide support for even more complex and sophisticated HLL's
o Concurrently, studies of the machine instructions generated by HLL programs
▪ Looked at the characteristics and patterns of execution of such instructions
▪ Results lead to using simpler architectures to support HLL's, instead of more complex
o To understand the reasoning of the RISC advocates, we look at study results on 3 main aspects
of computation:
▪ Operations performed - the functions to be performed by the CPU and its interaction
with memory.
▪ Operands used - types of operands and their frequency of use. Determine memory
organization and addressing modes.
▪ Execution Sequencing - determines the control and pipeline organization.
o Study results are based on dynamic measurements (during program execution), so that we can
see effect on performance
• Operations
o Simple counting of statement frequency indicates that assignment (data movement)
predominates, followed by selection/iteration.
o Weighted studies show that call/return actually accounts for the most work
o Target architectural organization to support these operations well
o Patterson study also looked at dynamic frequency of occurrence of classes of variables.
Results showed a preponderance of references to highly localized scalars:
▪ Majority of references are to simple scalars
▪ Over 80% of scalars were local variables
▪ References to arrays/structures require a previous ref to their index or pointer, which
is usually a local scalar
• Operands
o Another study found that each instruction (DEC-10 in this case) references 0.5 operands in
memory and 1.4 registers.
o Implications:
▪ Need for fast operand accessing
▪ Need for optimized mechanisms for storing and accessing local scalar variables
• Execution Sequencing
o Subroutine calls are the time-consuming operation in HLL's
o Minimize their impact by
▪ Streamlining the parameter passing
▪ Efficient access to local variables
▪ Support nested subroutine invocation
o Statistics
▪ 98% of dynamically called procedures passed fewer than 6 parameters
▪ 92% use less than 6 local scalar variables
▪ Rare to have long sequences of subroutine calls followed by returns (e.g., a recursive
sorting algorithm)
▪ Depth of nesting was typically rather low
• Implications
o Reducing the semantic gap through complex architectures may not be the most efficient use
of system hardware
o Optimize machine design based on the most time-consuming tasks of typical HLL programs
o Use large numbers of registers
▪ Reduce memory reference by keeping variables close to CPU (more register refs
instead)
▪ Streamlines instruction set by making memory interactions primarily loads and stores
o Pipeline design
▪ Minimize impact of conditional branches
o Simplify instruction set rather than make it more complex
Large Register Files
• How can we make programs use registers more often?
o Software - optimizing compilers
▪ Compiler attempts to allocate registers to those variables that will be used most in a
given time period
▪ Requires sophisticated program-analysis algorithms
o Hardware
▪ Make more registers available, so that they'll be used more often by ordinary compilers
▪ Pioneered at Berkeley; used in the first commercial RISC product, the Pyramid
Reduced Instruction Set Architecture
• Why CISC?
o CISC trends to richer instruction sets
▪ More instructions
▪ More complex instructions
o Reasons
▪ To simplify compilers
▪ To improve performance
• Are compilers simplified?
o Assertion: If there are machine instructions that resemble HLL statements, compiler
construction is simpler
o Counter-arguments:
▪ Complex machine instructions are often hard to exploit because the compiler must
find those cases that fit the construct
▪ Other compiler goals (minimizing code size, reducing instruction execution count, and
enhancing pipelining) are more difficult to achieve with a complex instruction set
▪ Studies show that most instructions actually produced by CISC compilers are the
relatively simple ones
• Is performance improved?
o Assertion: Programs will be smaller and they will execute faster
▪ Smaller programs save memory
▪ Smaller programs have fewer instructions, requiring less instruction fetching
▪ Smaller programs occupy fewer pages in a paged environment, so have fewer page
faults
o Counter-arguments:
▪ Inexpensive memory makes memory savings less compelling
• CISC programs may contain fewer instructions, but each instruction uses more bits, so total memory
used may not be smaller
o Opcodes require more bits
o Operands require more bits because they are usually memory addresses, as opposed to register
identifiers (which are the usual case for RISC)
• The entire control unit must be more complex to accommodate seldom used complex
operations, so even the more often-used simple operations take longer
• The speedup for complex instructions may be mostly due to their implementation as simpler
instructions in microcode, which is similar to the speed of simpler instructions in RISC
(except that the CISC designer must decide a priori which instructions to speed up in this way)
• Characteristics of RISC Architectures
o One instruction per cycle
▪ A machine cycle is defined by the time it takes to fetch two operands from registers,
perform an ALU operation, and store the result in a register
▪ RISC machine instructions should be no more complicated than, and execute about as
fast as microinstructions on a CISC machine
▪ No microcoding needed, and simple instructions will execute faster than their CISC
equivalents due to no access to microprogram control store.
o Register-to-register operations
▪ Only simple LOAD and STORE operations access memory
▪ Simplifies instruction set and control unit
▪ Ex. Typical RISC has 2 ADD instructions
▪ Ex. VAX has 25 different ADD instructions
▪ Encourages optimization of register use
o Simple addressing modes
▪ Almost all instructions use simple register addressing
▪ A few other modes, such as displacement and PC relative, may be provided
▪ More complex addressing is implemented in software from the simpler ones
▪ Further simplifies instruction set and control unit
o Simple instruction formats
▪ Only a few formats are used
▪ Further simplifies the control unit
▪ Instruction length is fixed and aligned on word boundaries
▪ Optimizes instruction fetching
▪ Single instructions don't cross page boundaries
▪ Field locations (especially the opcode) are fixed
▪ Allows simultaneous opcode decoding and register operand access
• Potential benefits
o More effective optimizing compilers
o Simpler control unit can execute instructions faster than a comparable CISC unit
o Instruction pipelining can be applied more effectively with a reduced instruction set
o More responsiveness to interrupts
▪ They are checked between rudimentary operations
▪ No need for complex instruction restarting mechanisms
• VLSI implementation
o Requires less "real estate" for control unit (6% in RISC I vs. about 50% for CISC microcode
store)
o Less design and implementation time
RISC Pipelining
• The simplified structure of RISC instructions allows us to reconsider pipelining
o Most instructions are register-to-register, so an instruction cycle has 2 phases
▪ I: Instruction Fetch
▪ E: Execute (an ALU operation w/ register input and output)
o For load and store operations, 3 phases are needed
▪ I: Instruction fetch
▪ E: Execute (actually memory address calculation)
▪ D: Memory (register-to-memory or memory-to-register)
• Since the E phase usually involves an ALU operation, it may be longer than the other phases.
In this case, we can divide it into 2 sub phases:
o E1: Register file read
o E2: ALU operation and register write
The RISC vs. CISC Controversy
• In spite of the apparent advantages of RISC, it is still an open question whether the RISC
approach is demonstrably better.
• Studies to compare RISC to CISC are hampered by several problems (as of the textbook
writing):
o There is no pair of RISC and CISC machines that are closely comparable
o No definitive set of test programs exist.
o It is difficult to sort out hardware effects from effects due to skill in compiler writing.
• Most of the comparative analysis on RISC has been done on “toy” machines, rather than
commercial products.
• Most commercially available “RISC” machines possess a mixture of RISC and CISC
characteristics.
• The controversy has died down to a great extent
o As chip densities and speeds increase, RISC systems have become more complex
o To improve performance, CISC systems have increased their number of general purpose
registers and increased emphasis on instruction pipeline design.