

ICS2101 COMPUTER ORGANISATION

computer science (Jomo Kenyatta University of Agriculture and Technology)



Downloaded by Enock Momanyi ([email protected])

ICS 2101: COMPUTER ORGANIZATION

Samson Ochingo, Dept of Computing, JKUAT, [email protected], COHRED Block, 1st Floor


ICS2101: COMPUTER ORGANIZATION


COURSE OUTLINE:
Processor architectures: RISC (Reduced Instruction Set Computers), CISC (Complex Instruction Set Computers)
and Superscalar. Instruction Set and Addressing Modes. Assembly Language. Basics of Digital Logic and Hardware
Construction. Memory Organization and the Cache Principles. System Buses. Input/Output methods and devices.

WEEK – CHAPTER – CONTENTS

1  Introduction
 What is computer architecture?
 What is computer organization?
 Levels within the computer architecture
 Structure and function of a computer
2  System Buses
 Computer components
 Interconnection structures
 What are buses
 Bus types and characteristics
3  Internal Memory
 Memory and the memory hierarchy
 Main memory, Registers, Cache, Mapping functions
4  External Memory
 Why external memory
 Common storage media and their characteristics
 RAID
5  Input/Output Methods and Devices
 I/O modules
 I/O channels and the processor
 Programmed I/O, Interrupt-driven I/O and DMA
 External interfaces etc
6  The Operating System
 Overview of OS
 Scheduling, Memory management etc
7  Instruction Sets (Characteristics and Functions)
 Machine instruction characteristics
 Instruction representation and types
 Number of addresses, design and operand types etc
8  Instruction Sets (Addressing Modes and Format)
 Addressing modes and instruction format
 Instruction length, instruction format etc
 Fixed/variable length instructions etc
9  RISC vs CISC
 What is RISC, what is CISC and their characteristics
 Comparison of RISC vs CISC systems etc
10  Superscalar Systems
 What are superscalar systems
 Their execution mode and their implementation
12  Parallel Processing
 Multiple processors, organization
 Symmetric processing and asymmetric processing
13  Basics of Digital Logic and Hardware Construction
 Boolean algebra
 Digital logic
 Data processing:
o Data paths – CU, Arithmetic Unit, Logic Unit
 Microprogramming
 Floating point number representation
14  Assembly Language Programming
 What is Assembly Language Programming
 How does it differ from the others
 Examining the Address, Contents, Label, Operation etc
 Or as an assignment
15 & 16  EXAMS

Recommended text(s)
1. Computer Organization and Architecture (5th Ed) By William Stallings, Prentice Hall
2. Introduction to Computer Science By C. S. French


CHAPTER 1: INTRODUCTION

Computer Organization
Refers to the operational units and their interconnections that realize the architectural specifications. (Architecture refers to those attributes of a system visible to the programmer, i.e. those attributes that have a direct impact on the logical execution of a program.) In effect, organizational attributes include those hardware details transparent to the programmer, e.g. control signals, interfaces between the computer and peripherals, and the memory technology used. An architecture may survive many years while the organization changes with changing technology, but generally the relationship between organization and architecture is very close, especially with regard to microcomputers.

Structure and Function:


A computer is a complex system; contemporary computers contain millions of electronic components.

Complex systems are built up in a hierarchic structure, where the behavior at each level depends only on a simplified, abstracted characterization of the system at the next lower level. At each of these levels, the designer is concerned with two main items:
a. Structure: which is the way in which the components are interrelated
b. Function: which is the operation of each individual component as part of that structure.

A top-down overview of the computer system follows.

Structure
The computer displays four main structural components namely:
 CPU – which controls the operations of the computer and performs its data processing functions.
 Main memory – which stores data and instructions.
 I/O – which moves data between the computer and its external environment.
 System interconnection – the mechanisms that provide for communication among the CPU, main
memory and I/O.

The CPU is structured as below:

 Control Unit (CU) - controls the operations of the CPU/computer


 ALU – performs data processing functions
 Registers – provide storage internal to CPU
 CPU interconnection – the mechanisms that provide for communication among the CU, ALU and registers.
Function
There are mainly four basic functions a computer performs thus:
 Data processing – since data may take a variety of forms and the range of processing requirements is broad.
 Data storage – since the computer must temporarily store at least those pieces of data that are being worked
on at any given time. There are both short-term and long-term storage.
 Data movement – occurs between itself and the outside world. I/O occurs when data moves from and to a
device that is directly connected to the computer (i.e. peripheral) while data communication occurs when data
are moved over longer distances.
 Control – of the three functions above; control is exercised in response to the instructions provided to the computer.

Computer systems architecture


The term “architecture” refers to the style of construction and organization of the many parts of a computer system. This
implies that there are variations in construction that reflect the differing ways in which computers are used, even though the
basic elements of the computer are the same for almost all digital computers.

Levels within the Computer Architecture:


The simplest way to look at the levels in the construction and organization of a computer is that of hardware and software
i.e. the software is a layer that sits on the hardware (see fig 1 below), using and controlling it while the hardware provides
all the operations that the software requires.

Software
Fig. 1 Hardware

In general, arrangements within the computer may be considered as multilayered with several s/ware levels sitting on
several h/ware levels (see fig 2 below)


Fig. 2

Software levels:
7  Applications Layer
6  Higher Order Software Layer
5  Operating Systems Layer
Hardware levels:
4  Machine Layer
3  Microprogrammed Layer
2  Digital Logic Layer
1  Physical Device Layer

Explanations:

1. The Physical Device Layer: forms the electrical and electronic components of the computer. Some of the gadgets
to be seen here include transistors, capacitors, resistors, power supply components etc.
2. The Digital Logic Layer: caters for all the most basic operations of the machine. Elements at this layer can store,
manipulate and transmit data in the form of binary representations. These digital logic elements are called “gates”
which are normally constructed from a small number of transistors and other electronic components. Such devices
may be combined together to form computer processors, memories and such other units used for I/O.
3. The Microprogrammed Layer: interprets the machine language instructions from the machine layer and directly
causes the digital logic elements to perform the required operations. In effect, it is a basic inner processor,
driven by its own control program instructions held in its own private inner ROM. Such instructions may be termed
“firmware” (i.e. software in ROM).
4. Machine Layer: is the lowest level at which a program can be written. It holds the machine language instructions
which can be directly interpreted by the hardware.
5. The Operating Systems Layer: controls the way in which all software use the underlying hardware. Also it hides
the complexities of the hardware from other s/ware by providing its own facilities which enable software to use
hardware more simply, and prevents other software from bypassing these facilities so that the hardware can only be
accessed by the OS.
6. The Higher Order Software Layer: covers all programs in languages other than machine language. They require
translation into machine code before they are executed.
7. The Applications Layer: represents the language of the computer as seen by the end-user.

Physical Organization of the Computer


Due to the high cost of constructing a computer from scratch, the manufacturers tend to construct their computers from
varied combinations of standard components e.g. many different microcomputers contain the same microprocessors. This
calls for standardization of components to allow for interconnections. One method to achieve this is by the use of “buses”.

Defn: A bus is a collection of parallel electrical conductors called “lines” onto which a number of components may be
connected. Connectors placed along the length of the bus provide multiple electrical contacts for these components.

Two basic bus types are:


a) Internal buses – used within the processor and an integral part of its construction.
b) External buses – used to connect separate hardware elements together e.g. connecting the processor to main memory.

Buses are mainly used to convey:


 Data signals
 Data address signals
 Control signals
 Power

The Influence of Size on Construction


There are mainly three different forms of construction as listed here:
1. Single-chip computers: are those found in devices e.g. watches, cameras etc. The processors are specialized,
having been programmed to do specific tasks.
2. Single-board computers: bigger than (1) above, but still relatively small. They are constructed on thin flat sheets
of electrical insulator onto which the components can be fixed and interconnected. Printed Circuit Boards (PCBs)
are often used for volume production.


These single-board computers fall into two broad categories:


 Small general-purpose computers e.g. small home computer
 Small special-purpose computers – often used for applications involving the control of physical processes
e.g. operating complex milling machines.

3. Multi-board, bus-based computers: usually general-purpose computers, and too large to fit onto a single board.
Each board has a particular function, and all boards are interconnected by plugging them into individual slots on
one or more general-purpose buses. For example, one board may contain the processor, another main storage etc.
Many microcomputers and mainframes are based on this type of construction. In some cases, a primary board,
called “motherboard” exists for the processor and other main components into which other boards may be slotted.

Main components of the digital logic layer


The speed of data transfer, and by extension processing, is determined by the size (width) of the bus, e.g. 8-bit, 16-bit etc.
Microcomputer-based systems especially tend to use a single bus.

Most other architectures use two buses, following the alternatives shown below (see figures 3 (a), (b) and (c)).

REFER TO THE HAND DRAWN DIAGRAMS IN THE POWERPOINT LECTURE

In these layouts, data transfers between memory and the processor use a faster bus and also are not held up waiting for the
slower devices used for I/O.

Arrangement (b) is used more in larger microcomputers while (c) is used on larger minicomputers and mainframes. The
main aims of doing these are:
 To maximize the use of the processor by freeing it of the burden of controlling low-level I/O operations.
 To maximize the speed and efficiency of I/O data transfers to and from memory

The processor:
Its functions are:
 To control the use of main storage to store data and instructions
 To control the sequence of operations
 To give commands to all parts of the computer system
 To carry out processing

It is connected to other parts of the computer by means of buses.

Registers
Are special-purpose temporary storage locations within the processor and are used as follows:


i. Memory data registers (MDR) – registers through which all data and instructions pass in and out of the
processor.
ii. Memory buffer registers (MBR) – registers through which all data and instructions pass in and out of main
storage.
iii. Prior to each transfer between the processor’s MDR and the main storage’s MBR, the exact source or
destination of the data in main storage must be specified. This is done by loading the appropriate location
address into the Memory address register (MAR).
iv. I/O units connected to the processor via a bus also have data buffer registers (DBR) similar to MBR.

Control Unit
Is the nerve center of the computer. It controls and coordinates all hardware operations, operating in a two-stage cycle called the
“fetch-execute” cycle. It automatically deals with instructions in the order in which they occur in main memory. It does this
by using a register called the Program Counter (PC) or Sequence Control Register (SCR), which holds the location address of
the next instruction to be performed.
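A minimal software sketch of the fetch-execute cycle, using the PC, MAR and MDR registers named above. The toy instruction set (LOAD/ADD/HALT) and the memory layout are invented purely for illustration, not any real machine:

```python
# Toy fetch-execute cycle. Instructions and data share one "memory";
# the LOAD/ADD/HALT instruction set is hypothetical.
memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("HALT", None),
          10: 5, 11: 7}          # addresses 10-11 hold data operands

pc, acc = 0, 0                    # program counter, accumulator
while True:
    mar = pc                      # fetch: MAR gets address of next instruction
    mdr = memory[mar]             # instruction arrives via MDR
    pc += 1                       # PC advances automatically
    op, operand = mdr             # decode
    if op == "LOAD":              # execute
        acc = memory[operand]
    elif op == "ADD":
        acc += memory[operand]
    elif op == "HALT":
        break
print(acc)                        # 5 + 7 = 12
```

Note how the PC is incremented during the fetch stage, so a HALT (or a jump, in a fuller machine) sees the address of the instruction that would otherwise run next.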

Arithmetic and Logic Unit (ALU)


Has two main functions:
 Carries out the arithmetic e.g. add, subtract etc
 Performs certain logical operations e.g. testing whether two data items match.

Multiprocessor Systems
Refer to any computer containing more than one processor. The extra processors may be used:
 As additional main processors sharing the normal processing load.
 As special-purpose processors catering for some particular function, e.g. a maths co-processor may be used in
conjunction with a single main processor to perform some standard complex computations.

Where there are a number of main processors, there are two basic methods of using the processors:
a. Asymmetric multiprocessing (AMP) – in this case, one processor is the master and all other are subordinate to it.
The master processor has special privileges over the operating system.
b. Symmetric multiprocessing (SMP) – here all processors have equal rights.

NB: alternative architectures are emerging to increase computational power. These include:
 Pipeline machines – in which each stage of fetch-execute cycle is handled by a separate machine hardware unit.
 Array Processors – in which there is one control unit but multiple ALUs which work in parallel with one another.

Defn:

Pipelining – is the name given to a method of speeding up the fetch-execute cycle by fetching not just the next instruction
but the next few that follow it in main storage. These pre-fetches are carried out when, during the execute part of the fetch-
execute cycle, a brief interval occurs in which memory does not need to be accessed, e.g. while an arithmetic operation
is in progress (i.e. there is no fetch task).
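The gain from pipelining can be quantified with an idealized model: with k pipeline stages, n instructions need roughly k + (n − 1) stage-times instead of n × k. This ignores hazards and stalls (an assumption; real pipelines do somewhat worse):

```python
def pipeline_speedup(k, n):
    """Idealized speedup of a k-stage pipeline executing n instructions."""
    unpipelined = n * k           # each instruction passes through all k stages serially
    pipelined = k + (n - 1)       # k cycles to fill the pipe, then one result per cycle
    return unpipelined / pipelined

print(pipeline_speedup(5, 1000))  # ~4.98: approaches k = 5 for large n
```

For long instruction streams the speedup tends toward the number of stages, which is why deeper pipelines were a major route to faster processors.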

Array processors – in which there is one control unit but multiple ALUs which work in parallel with one another.


CHAPTER 2: SYSTEM BUSES

The major computer system components (processor, memory and I/O modules) need to be interconnected in order to
exchange data and control signals. The most popular means of interconnection is the use of a shared system bus consisting
of multiple lines. There exists, usually, a hierarchy of buses to improve performance.

Bus Interconnection
A bus is simply a communication pathway connecting two or more devices, and it is a shared medium, meaning only one
device can send at a time. Several bus lines may be used to transfer more bits per unit time.

Buses typically consist of 50 – 100 lines, each line with its own function, classified into three groups:
i. Data lines – move data between system modules. There can be 8, 16, 32 etc. lines, a number referred to
as the ‘width’; the wider the bus, the better the system performance.
ii. Address lines – designate the source/destination of the data on the data bus.
iii. Control lines – control access to and use of the data and address lines. Control signals transmit both
command and timing information between system modules.

Key elements of bus design


i. Bus type: Bus lines can be grouped into either “dedicated” or “multiplexed”. The dedicated bus line is
permanently assigned to either one function or to a physical subset of computer components. The multiplexed
one divides its time between transmissions and has the advantage of using fewer lines thus less cost, but
requires more complex circuitry within each module. Also there is potential reduction in performance due to
lack of parallelism. The dedicated one has the advantage of high throughput due to less bus contention, but the
disadvantage is the increased size and cost of the system.
ii. Method of arbitration: Can be centralized i.e. a bus controller/arbiter, is responsible for allocating time on the
bus, or distributed – each module contains control access logic and so they act together to share the bus.
iii. Timing: Is the way events are coordinated on the bus and may be synchronous or asynchronous.
Synchronous timing is simpler but less flexible, i.e. the system cannot take advantage of advances
in device performance, while with asynchronous timing a mixture of slow and fast devices, using older and
newer technology, can share the bus.
iv. Bus width: The wider the data bus, the greater is the number of bits transferred at a time.
v. Data transfer types: The bus supports both write and read operations.
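The effect of bus width (element iv above) on peak transfer rate can be sketched with a back-of-envelope calculation; the clock rates used here are invented for illustration:

```python
def bus_bandwidth_mb_s(width_bits, clock_mhz, cycles_per_transfer=1):
    """Peak bandwidth in MB/s: bytes moved per transfer times transfers per second."""
    bytes_per_transfer = width_bits // 8
    transfers_per_s = clock_mhz * 1_000_000 / cycles_per_transfer
    return bytes_per_transfer * transfers_per_s / 1_000_000

print(bus_bandwidth_mb_s(8, 10))   # 8-bit bus at 10 MHz  -> 10.0 MB/s
print(bus_bandwidth_mb_s(32, 10))  # 32-bit bus, same clock -> 40.0 MB/s
```

Quadrupling the width quadruples the peak rate at the same clock, which is why wider data buses directly improve performance.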

Multiple – Bus hierarchies


The more devices are connected to the bus, the more performance will suffer, for two main reasons:
i. The more devices attached, the greater the bus length, and so the greater the propagation delay.
ii. The bus may become a bottleneck as the aggregate data transfer demand approaches the capacity of the bus,
i.e. there is contention and hence demand for multiple buses. Thus most computer systems employ multiple buses,
generally laid out in a hierarchy (see pg 75 of recommended text for diagram).
Note that in this case computer components are connected to the buses based on their relationships/similarities, e.g. I/O
devices on one bus and the processor and memory on another, an arrangement commonly referred to as “locality of reference”.
Interrupts:
Virtually all computers provide a mechanism by which other modules (I/O, memory) may interrupt the normal processing
of the processor.
Some common interrupts may be classed as:
i. Program interrupts - generated by some condition that occurs as a result of instruction execution e.g. an
arithmetic overflow, division by zero, reference outside allowed user memory etc.
ii. Timer interrupts – generated by the processor timer to allow the OS to perform certain functions on a regular
basis.
iii. I/O interrupts – generated by an I/O controller, to signal normal completion of an operation or to signal a
variety of error conditions.
iv. H/W failure interrupts – generated by failure e.g. power failure or memory parity error.

The need for interrupts therefore is primarily for improving processing efficiency e.g. when you consider the fast
processor and slow I/O devices.

I/O Function


An I/O module, e.g. a disk controller, can exchange data directly with the processor, i.e. the processor can read data from or
write data to an I/O module. At times it is worth allowing I/O exchanges to occur directly with memory, i.e. the
processor grants an I/O module the authority to read from or write to memory, enabling the I/O–memory transfer to
occur without tying up the processor. This is referred to as Direct Memory Access (DMA) (a cycle-stealing strategy).

Also note that there are primarily three types of buses namely:
i. Address bus – carries memory addresses from the processor to other components, e.g. memory and I/O.
ii. Data bus – carries data between the processor and other components.
iii. Control bus – carries control signals from the processor to the other components.


CHAPTER 3: MEMORY ORGANIZATION AND CACHE PRINCIPLES

Internal Memory:
Computer memory exhibits a wide range of type, technology, organization, performance and cost. A typical computer
system is equipped with a hierarchy of memory subsystems, some internal to the system (i.e. directly accessed by the
processor) and some external (accessed by the processor via an I/O module).

Characteristics of Memory Systems


i. Location – refers to whether the memory is internal or external to the computer e.g. main memory is internal while
disk and tape are external.
ii. Capacity – expressed in bytes or words e.g. 8, 16, 36 bits.
iii. Unit of transfer – is the number of bits read out of or written into memory at a time.

Memory may be accessed in the following methods


 Sequential access – accesses records in a specific linear sequence, as on tape units.
 Direct access – access is accomplished by moving directly to a general vicinity plus sequential
searching, counting or waiting to reach the final location (i.e. individual records or blocks have a unique
address based on physical location).
 Random access – any location can be selected at random and directly addressed and accessed, since each
addressable location in memory has a unique, physically wired-in addressing mechanism (i.e. time of
access is constant).

iv. Performance – is a characteristic that can be measured by three parameters:


 Access time – time taken to perform a read or write operation
 Memory cycle time – the access time plus the time required before a second access can commence
 Transfer rate – rate at which data can be transferred into or out of memory
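These parameters can be related numerically: the total time to move a block is the access time (latency) plus the block size divided by the transfer rate. The figures below are illustrative, not taken from any particular device:

```python
def block_transfer_time(block_bytes, rate_bytes_per_s, access_time_s=0.0):
    """Total time = access latency + block size / transfer rate."""
    return access_time_s + block_bytes / rate_bytes_per_s

# e.g. a 4 KB block at 100 MB/s with 5 ms access latency:
t = block_transfer_time(4096, 100_000_000, 0.005)
print(round(t * 1000, 3), "ms")   # 5.041 ms: latency dominates small transfers
```

The example shows why latency, not raw transfer rate, dominates the cost of small accesses on slow devices.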

Memory exists in a variety of forms such as semiconductor, magnetic, optical and magneto optical.

The memory hierarchy


Design constraints on memory focus on how much (capacity), how fast and how expensive. Tradeoffs exist, e.g. the faster
the memory, the greater the cost.

Memory is therefore built in a hierarchy to optimize the three constraints. Downwards, the following occur:
 Decreasing cost per bit
 Increasing capacity
 Increasing access time
 Decreasing frequency of access of the memory by the processor

See figure below – “the memory hierarchy”.

Inboard memory: Registers, cache, main memory

Outboard storage: Magnetic disks, CD, DVD-RW, DVD-RAM etc

Offline storage: Magnetic tape, MO, WORM


Semiconductor Main Memory

Types of random access semiconductor memory


The most common is RAM where you can read and write data easily and rapidly by electrical signals. It is also volatile and
so temporary storage. May be static or dynamic types.

There is also ROM, which contains a permanent pattern of data that cannot be changed. It can only be read, not written
to. Its major application is in microprogramming. Others include library subroutines for frequently wanted functions,
system programs and function tables.

The main advantage of ROM is that the data/program is permanently in main memory and need never be loaded from a
secondary storage device, since its data is actually wired into the chip. However this poses two main problems:
 The data insertion step includes a relatively large fixed cost
 There is no room for error i.e. if one bit is wrong the whole batch of ROMs must be discarded.

PROM – programmable ROM – is less expensive, non-volatile and may be written into only once, either by the supplier or the
customer; writing requires special equipment. It is used when only a small number of ROMs with particular memory content is
needed.

Read mostly memories – e.g. EPROM, EEPROM and flash memory are useful for applications in which there are more
read operations than write operations but for which non-volatile storage is required.

Flash memory is intermediate between EPROM and EEPROM in both cost and functionality and uses semiconductor
technology.

Cache Memory Principles

Fig.: CPU ↔ (word transfer) ↔ Cache ↔ (block transfer) ↔ Main memory

Cache memory is intended to give memory speeds approaching that of the fastest memories available while at the same time
providing a large memory size at the price of less expensive types of semiconductor memory. The cache contains a copy of
portions of main memory, so when the processor attempts to read a word of memory, a check is first made to determine if
the word is in the cache; if so, it is delivered directly to the processor. The principle of locality of reference applies:
i. An access is likely to be followed by accesses to its neighbours – spatial (physical) locality of reference.
ii. Repeated accesses are likely to locations accessed recently – temporal locality of reference.
iii. Replacement uses principles such as FIFO, LFU, LRU, random, round robin etc.
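Because of locality, most accesses hit the cache, and the effect on average access time can be sketched with the standard weighted formula; the hit ratio and timings below are illustrative, not measured values:

```python
def avg_access_time(hit_ratio, t_cache, t_main):
    """Average access time: hits served at cache speed, misses at main-memory speed."""
    return hit_ratio * t_cache + (1 - hit_ratio) * t_main

# e.g. 95% hit ratio, 2 ns cache, 60 ns main memory:
print(avg_access_time(0.95, 2, 60))  # ≈ 4.9 ns on average
```

Even a small improvement in hit ratio pays off disproportionately, since each miss costs the full main-memory access time.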

Elements of cache design

i. Cache size – should be small enough so that the average cost per bit is close to that of main memory alone and
large enough so that the overall average access time is close to that of cache alone. Large caches tend to be
slower than smaller ones.

Other motivations to minimize cache size include:


 The larger the cache the larger the number of gates involved in addressing the cache resulting in slower
caches
 Limitations by the available chip and board area.

Typical cache sizes are between 1k and 512k word.

ii. Mapping function – because there are fewer cache lines than main memory blocks, an algorithm is needed for
mapping main memory blocks into cache lines. Three techniques can be used namely direct, associative and
set associative, each with its advantages and disadvantages.

Here is the explanation.


a. Direct mapping – maps each block of main memory into only one possible cache line.
b. Associative mapping – overcomes the disadvantage of direct mapping by permitting each memory block to be
loaded into any line of the cache. Therefore, there is flexibility as to which block to replace when a new block is
read into the cache. Its disadvantage is the complex circuitry required to examine the tags of all the cache lines in
parallel (the tag field uniquely identifies a block of main memory so that blocks can be located appropriately).
c. Set associative mapping – exhibits the strengths of both the direct and associative approaches while reducing their
disadvantages.
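As a rough illustration, direct mapping can be modelled by splitting an address into tag, line and word fields; the field widths and the address below are assumed purely for illustration:

```python
def direct_map(address, word_bits=2, line_bits=4):
    """Split an address into (tag, line, word) fields for direct mapping.

    With word_bits=2 (4-word blocks) and line_bits=4 (16 cache lines),
    each memory block lands in exactly one line; the tag disambiguates
    which of the many blocks sharing that line is currently cached.
    """
    word = address & ((1 << word_bits) - 1)
    line = (address >> word_bits) & ((1 << line_bits) - 1)
    tag = address >> (word_bits + line_bits)
    return tag, line, word

print(direct_map(0b1101011010))   # (13, 6, 2): tag 1101, line 0110, word 10
```

The line field alone picks the cache line, which is what makes direct mapping cheap: only one tag comparison is needed per access.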

Replacement algorithms
When a new block is brought into the cache, one of the existing blocks must be replaced. In the case of direct mapping,
replacement takes only one possible line for any particular block i.e. no choice is possible. For associative and set
associative, a replacement algorithm is needed, the algorithm being implemented in hardware. The main four are:
i. LRU – is the most effective and replaces the block in the set that has been in the cache longest with no
reference to it.
ii. FIFO – easily implemented as a round-robin or circular buffer technique and replaces on a FIFO basis.
iii. LFU – replaces that block in the set that has experienced the fewest references. In this case, each line may be
associated with a counter.
iv. Random – pick a line at random from among the candidate lines.
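As a software sketch of the LRU policy above, an ordered dictionary can model the usage ordering that cache hardware tracks with reference bits; the cache size and block names here are illustrative:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU model: fixed number of lines, least recently used line evicted."""
    def __init__(self, lines):
        self.lines, self.data = lines, OrderedDict()

    def access(self, block, value=None):
        if block in self.data:                  # hit: mark as most recently used
            self.data.move_to_end(block)
        else:                                   # miss: evict the LRU victim if full
            if len(self.data) >= self.lines:
                self.data.popitem(last=False)   # front of the dict = least recent
            self.data[block] = value
        return self.data[block]

cache = LRUCache(2)
cache.access("A", 1); cache.access("B", 2)
cache.access("A")                 # touch A; B is now least recently used
cache.access("C", 3)              # evicts B
print(list(cache.data))           # ['A', 'C']
```

Real caches implement the same ordering with a few bits per line rather than a dictionary, but the replacement decision is identical.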

Write policy
Before a block in the cache can be replaced, it is necessary to consider whether it has been altered in the cache but not in the
main memory. If it has not then the old block in the cache may be overwritten otherwise main memory must be updated.

Since more than one device may have access to main memory, or many processors may attach to the same
bus, each with its own local cache, an alteration in one cache invalidates the corresponding word in the other caches.

A technique called write through is used which ensures that all write operations are made to main memory as well as to the
cache to maintain validity. The disadvantage here is that it generates substantial memory traffic thus may create a
bottleneck.

Another technique is write back where updates are made only in the cache thus minimizing memory writes. The
disadvantage is that parts of main memory are invalid and so accesses by I/O modules are allowed only through the cache.

Number of caches
More recently the use of multiple caches has become the norm. There is the on-chip (on same chip as processor) and
external chip caches. The on-chip cache reduces the processor’s external bus activity and so speeds up execution times and
increases overall system performance. Also more recently, it has become common practice to split the cache into two – one
dedicated to instructions and the other to data.


CHAPTER 4: EXTERNAL MEMORY

Magnetic disks – use magnetic technology. The disk is a circular platter with a metal/plastic base coated with magnetizable
material. A read/write head operates as the platter rotates below it: during a write operation, magnetic patterns are
recorded on the surface, and during a read operation, a current is generated with the same polarity as the pattern already recorded.

Data are written on tracks, about 500 – 2000 tracks per surface.

Data are transferred to and from the disk in blocks. Data are stored in block size regions known as sectors, about 10 – 100
sectors per track.

Redundant Array of Independent Disks


The rate of improvement in secondary storage performance has been considerably less than the rate for processors and main
memory. This mismatch has made the disk storage system perhaps the main focus of concern in improving overall computer
system performance. With the use of multiple disks, there is a wide variety of ways in which the data can be organized and
in which redundancy can be added to improve reliability. The RAID scheme thus consists of seven levels, zero to six, which
designate different design architectures that share three common characteristics:
i. RAID is a set of physical disk drives viewed by the operating system as a single logical drive.
ii. Data are distributed across the physical disk drives of an array.
iii. Redundant disk capacity is used to store parity information, which guarantees data recoverability in case
of a disk failure.

The RAID strategy replaces large-capacity disk drives with multiple smaller-capacity drives and distributes data in such
a way as to enable simultaneous access to data from multiple drives, thereby improving I/O performance and allowing easier
incremental increases in capacity. The RAID proposal effectively adds redundancy.

Although the use of multiple heads simultaneously achieves higher I/O and transfer rates, the use of multiple devices also
increases probability of failure, but this is compensated for by RAID making use of stored parity information enabling the
recovery of data lost due to a disk failure.
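The parity recovery mentioned above rests on XOR: the bytewise XOR of all data strips is stored as parity, and XOR-ing the surviving strips with the parity reconstructs any single lost strip. Below is a minimal sketch (the strip contents are made-up bytes; this models the idea behind RAID's parity levels, not any particular implementation):

```python
from functools import reduce

def xor_parity(strips):
    """Bytewise XOR of equal-length data strips."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*strips))

d0, d1, d2 = b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"
parity = xor_parity([d0, d1, d2])

# Suppose the disk holding d1 fails: XOR of the survivors plus parity recovers it.
recovered = xor_parity([d0, d2, parity])
print(recovered == d1)            # True
```

Because XOR is its own inverse, one parity strip suffices to recover any single failed drive, which is the basis of the single-parity RAID levels.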

Task: Research on the various levels of RAID.

Optical memory
The Compact Disk (CD) digital audio system was introduced in 1983. It is non-erasable and can store more than 60 minutes
of audio information on one side. A variety of optical disk types have since been produced, as below.

i. CD: - A non-erasable disk that stores digital audio information. The standard system uses 12 cm disks and records more than 60 minutes of uninterrupted playing time.
ii. CD-ROM: - A non-erasable disk used for storing computer data. Uses 12 cm disks storing more than 600 Mbytes.
iii. DVD: - Digital Video Disk. A technology for producing digitized, compressed representations of video information as well as large volumes of other digital data.
iv. WORM: - Write Once, Read Many. Is more easily written than CD-ROM. Size is 5 ¼”, holding between 200 and 800 Mbytes of data.
v. Erasable Optical Disk: - Uses optical technology but can be easily erased and rewritten. Both 3.25” and 5.25” disks are in use, with a typical capacity of 650 MB.
vi. Magneto-Optical Disk: - Uses optical technology for reading, and magnetic recording techniques assisted by optical focusing. Both 3.25” and 5.25” disks are in use, with capacities above 1 GB.
vii. Magnetic Tapes: - Use recording and reading techniques similar to disk systems. The tape is coated with magnetic oxide (similar to a home tape recorder system). Blocks of data on tape are separated by gaps called “inter-record gaps”. Tape is a serial-access medium, unlike disks, which are direct-access.
Magnetic tapes were the first kind of secondary memory and are still widely used as the low-cost, slowest-speed member of the memory hierarchy.


CHAPTER 5: INPUT/OUTPUT METHODS AND DEVICES


The third key element of a computer system after the processor and the memory modules is a set of I/O modules. Each I/O
module interfaces to the system bus or central switch and controls one or more peripheral devices. An I/O module contains
intelligence/logic for performing a communication function between the peripheral and the bus.

The reasons for not connecting peripherals directly to the system bus include:
i. It is impractical to incorporate within the processor the logic needed to control the wide variety of peripherals, each with its own method of operation.
ii. It is impractical to use the high-speed system bus to communicate directly with peripherals, whose speeds are much slower.
iii. Peripherals often use data formats and word lengths different from those of the computer to which they attach.

The I/O module is therefore required, with two main functions:


i. Interface to the processor and memory via the system bus or central switch.
ii. Interface to one or more peripheral devices by tailored data links.

I/O module:
Module function: the major functions or requirements for an I/O module fall into the following categories:
i. Control and timing – which is the coordinating of the flow of traffic between internal resources e.g. main
memory, system buses etc, and the external devices e.g. control of data transfer from an external device to the
processor.
ii. Processor communication – which involves the following
 Command decoding i.e. I/O module accepts command from the processor e.g. READ SECTOR etc
 Data exchanges between the processor and module
 Status reporting of the I/O module e.g. BUSY, READY etc
 Address recognition - I/O module recognizes one unique address for each peripheral it controls.
iii. Device communication – implies/involves commands, status information and data.
iv. Data buffering – enables I/O module to operate at both device and memory speeds i.e. during I/O.
v. Error detection – detects error and reports to the processor. Such errors may include
 Mechanical/electrical malfunctions of device e.g. paper jam, bad disk track etc.
 Unintentional changes to bit pattern as it is transmitted from device to I/O module e.g. use of parity bit to
detect transmission errors.
Fig: Structure of the I/O system (five layers, 1–5, managed under the OS):

1. Application programs – i.e. the user level.


2. I/O control system – that part of the OS dealing with I/O related system calls.
3. Device driver – software module that manages communication with and the control of a specific I/O device.
4. Device controller – provides interface between the computer and the I/O device itself. It is a hardware unit
attached to the computer.
5. Device – the peripheral itself.

Three techniques are possible for I/O operations namely:


i. Programmed I/O – where data are exchanged between the processor and the I/O module. The processor
executes a program that gives it direct control of the I/O operation, including sensing device status, sending a
read/write command and transferring the data. When the processor issues a command to the I/O module, it
must wait until the I/O operation is complete. If the processor is faster than the I/O module, this is wasteful of
the processor time.
ii. Interrupt-driven I/O – in this case the processor issues an I/O command, continues to execute other
instructions and is interrupted by the I/O module when the latter has completed its work.

Thus with (i) and (ii), the processor is responsible for extracting data from main memory for output and storing data in main memory for input.

Page 12 of 29

Downloaded by Enock Momanyi ([email protected])


lOMoARcPSD|46245439

iii. Direct memory access (DMA) – the I/O module and memory exchange data directly, without processor involvement. The DMA technique involves an additional module on the system bus. The DMA module is capable of mimicking the processor, and indeed of taking over control of the system from the processor; it needs to do this to transfer data to and from memory over the system bus. For this purpose, the DMA module must use the bus only when the processor does not need it, or it must force the processor to suspend operation temporarily. The latter is referred to as “cycle stealing”.
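The waste inherent in programmed I/O can be sketched as a busy-wait polling loop; a minimal simulation, where the Device class and its delay are hypothetical stand-ins for a slow peripheral's status register:

```python
BUSY, READY = 0, 1

class Device:
    """Hypothetical peripheral: becomes READY only after several status polls."""
    def __init__(self, delay):
        self.delay = delay        # number of polls before the device is ready
        self.data = 0x41          # byte waiting in the device's data register

    def status(self):
        self.delay -= 1
        return READY if self.delay <= 0 else BUSY

def programmed_read(device):
    """Programmed I/O: the CPU polls status, wasting cycles until READY."""
    wasted_polls = 0
    while device.status() != READY:   # busy-wait loop: CPU does nothing useful
        wasted_polls += 1
    return device.data, wasted_polls

value, wasted = programmed_read(Device(delay=5))
print(value, wasted)   # value == 0x41, wasted == 4
```

With interrupt-driven I/O the polling loop disappears: the CPU runs other instructions and the device signals readiness itself; with DMA even the per-word transfer work moves off the CPU.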

EXTERNAL DEVICES
Are those that provide a means of exchanging data between the external environment and the computer. They can be
classified as follows:-
i. Human readable:- Suitable for communicating with the computer user.
ii. Machine readable:- Communicates with equipment.
iii. Communication:- For communicating with remote devices.

Examples of (i) include video display terminals and printers, examples of (ii) include magnetic disks and tape systems.
An external device interfaces with an I/O module in the form of control, data and status signals. i.e.
 Control signals - determine the function the device performs e.g. input or read, output or write, etc.
 Data – occur in the form of a set of bits to be sent to or received from the I/O module.
 Status signals- indicate the status of the device. E.g. READY/NOT READY

Some further definitions:


Buffer: - is a temporary storage area that holds data during different stages of I/O. May include:
 Internal buffer area: - the area of main store set aside to hold data awaiting output or data recently input but not yet
processed.
 Buffer registers: - Located along the data path between the I/O devices and the processor, hold characters in the
process of being transferred.
Buffering: - Is the name given to the technique of transferring data into temporary storage prior to processing or output and
so enables simultaneous operation of devices.

Channel: - Is a path along which I/O data signals flow.

A bus (or highway): - hardware within the processor through which data signals pass from one of a choice of sources to one of a choice of destinations.

An interface: - hardware located on each channel adjacent to the processor. Converts control and data signals from the processor to forms usable by the device connected to the interface.

The problems of speed differences between the processor and peripherals lead to a variety of I/O transfer techniques. Examples of simple I/O with particular devices include:
i. Document readers.
ii. Line printers: - have a data buffer that holds characters; once it is full, a single print action is started by a suitable instruction.
iii. Graph plotters: - for drawing purposes.

Multiplexing: This is another way of overcoming the difference in speed of hardware devices and the processors, by the
use of multiplexers. It involves transmitting character codes from a number of devices along a single channel.

The multiplexer has a buffer register and may operate either synchronously or asynchronously. Multiplexers are useful in handling data from a number of terminals placed far from the processor and connected via telephone links.

Fig: A multiplexer interleaving multiple transmissions onto a single channel.
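Round-robin character multiplexing of several device streams onto one shared channel can be sketched as follows; the terminal strings are hypothetical:

```python
def multiplex(streams):
    """Round-robin multiplexing: take one character from each device buffer
    per scan until all buffers are drained, emitting one channel stream."""
    channel = []
    pos = 0
    while any(pos < len(s) for s in streams):
        for s in streams:
            if pos < len(s):            # skip devices whose buffer is drained
                channel.append(s[pos])
        pos += 1
    return "".join(channel)

# Three terminals transmitting at once over one shared channel:
print(multiplex(["abc", "123", "xy"]))   # "a1xb2yc3"
```

The receiving end performs the inverse (demultiplexing) by distributing every n-th character back to stream n, which is why both ends must agree on the scan order.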


Interrupt: - Is a break into the normal automatic sequential control. It allows the control unit to fetch the first instruction of another program (the supervisor), or an instruction from another part of the same program, instead of fetching the next instruction as part of the fetch-execute cycle.

INPUT DEVICES
Problems of Data Entry
i. The data to be processed must be presented in a machine-sensible form i.e. the language of the particular input
device.
ii. The process of data collection involves a great many people, machines and expense.
iii. Data can originate in many forms.

The various data collection media and methods may be outlined as below:
i. On-line systems: - Where the computer is linked directly to data source e.g. a computer that controls a
machine or factory process.
ii. Key-to-diskette: - As used on PCs where data is entered onto magnetic media as an alternative to online
systems.
iii. Character recognition: - OCR or MICR techniques require that the source documents themselves are prepared in a machine-sensible form.

Such document readers include:


i. Optical Character Recognition (OCR) or Optical Mark Recognition (OMR)
In these cases, a scanning device recognizes each character by the amount of reflected light, or a mark in particular positions on the document triggers a response. OCR is used extensively in connection with billing, e.g. gas/electricity bills and insurance premium renewals. OMR may be applied in meter-reading documents in conjunction with OCR.
ii. MICR: - documents are passed through a strong magnetic field, causing the iron oxide in the ink-encoded characters to become magnetized. Documents are then passed under a read head, where a current flows at a strength proportional to the size of the magnetized area. Applied in banks, e.g. the encoding on cheques.
iii. Direct input devices e.g.
- Laser scanners at a supermarket checkout read coded marks on food products.
- Voice data entry (VDE) devices:- Where data can be spoken into them.
iv. Data loggers/recorders: - record and store data at source, then data transferred to the computer later.


CHAPTER 6: OPERATING SYSTEMS

Why OS were developed


Problems encountered on early generations included:
 Setup time – was required as each job was put into the machine, during which time the computer was idle, e.g. changing stationery on a printer or tape reels on a tape drive.
 Manual intervention – was necessary to investigate error conditions and to initiate corrective measures (machine idle).
 Imbalance between processor and peripherals – meant that the CPU lay idle for long periods of time during the operation of the peripherals.

A “super controller” was therefore needed. This could only be provided by the use of an internally stored program: the OS.

Defn: OS is a suite of programs that has taken over many of the functions once performed by human operators. Its role is
that of resource management and such resources may include processors, I/O devices, programs, storage and data.

Functions of OS:
 Scheduling and loading of programs in order to provide a continuous sequence of processing or to provide
appropriate responses to events.
 Control over hardware resources
 Protecting hardware, software and data from improper use
 Calling into main storage programs and subroutines as and when required
 Passing of control from one job to another under a system of priority
 Provision of error correction routines
 Furnishing a complete record of all that happens
 Communication with the computer operator.

Types of Operating Systems:


 Single-program systems – i.e. single-user, single-program on small microcomputer-based systems, e.g. MS-DOS
 Simple batch systems – provide multiprogramming of batch programs but have few facilities for interaction or multi-access
 Multi-access and Time-sharing – the majority of OS fall into this category
 Real-time systems – examples include process control systems.

Methods of operation and modes of access

a. Multiprocessing – where two or more processors are present in a computer and share some or all of the computer's memory.
b. Multiprogramming – when more than one program in main memory is being processed, apparently at the same time.
c. Batch processing – the job is not processed until it has been fully input.
d. Remote job entry – batch processing where jobs are entered at a terminal remote from the computer and transmitted into the computer.
e. Interactive computing – where the computer and the terminal user can communicate with each other.
f. Conversational mode – where the response to the user's message is immediate.
g. Multi-access – where the computer allows interactive facilities to more than one user at a time.
h. Timesharing – processor time is divided into small units and shared in turn between users.
i. Real-time system – capable of processing data quickly enough that the results are available to influence the activity currently taking place.


CHAPTER 7: INSTRUCTION SETS & ADDRESSING MODES


Instruction Representation
Inside the computer, an instruction is represented by a sequence of bits. An instruction is divided into fields, with each field
corresponding to the constituent elements of the instruction (see diagram below).

Opcode (4 bits) | Operand Reference (6 bits) | Operand Reference (6 bits)    = 16 bits in total

Fig. A simple instruction format.

During instruction execution, the following occurs: the instruction is read into an instruction register (IR) in the CPU, and the CPU must be able to extract data from the various instruction fields to perform the required operation.
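Extracting the fields of the 16-bit format above can be sketched with shifts and masks; a minimal sketch, assuming (for illustration only) that the 4-bit opcode occupies the high-order bits:

```python
def decode(instruction):
    """Split a 16-bit instruction into a 4-bit opcode and two 6-bit operand
    references, assumed to be laid out opcode | op1 | op2, high to low bits."""
    opcode = (instruction >> 12) & 0xF     # top 4 bits
    op1 = (instruction >> 6) & 0x3F        # next 6 bits
    op2 = instruction & 0x3F               # low 6 bits
    return opcode, op1, op2

def encode(opcode, op1, op2):
    """Inverse: pack the three fields back into one 16-bit word."""
    return ((opcode & 0xF) << 12) | ((op1 & 0x3F) << 6) | (op2 & 0x3F)

word = encode(0b0011, 10, 42)
print(decode(word))   # (3, 10, 42)
```

This is exactly the kind of field extraction the CPU's decoding logic performs in hardware on the contents of the IR.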

Addressing modes and format


An operand reference in an instruction either contains the actual value of the operand (immediate) or a reference of the
address of the operand. A wide variety of addressing modes are used in various instruction sets, which include: immediate,
direct, indirect, register direct, register indirect, displacement and stack as discussed here below.

Note the following:


A = contents of an address field in the instruction
R = contents of an address field in the instruction that refers to a register
EA = actual (effective) address of the location containing the referenced operand
(X) = contents of location X

Comments:
 First, virtually all computer architectures provide more than one of these addressing modes. The CU can determine
which addressing mode is used via several approaches such as
o Often, different opcodes will use different addressing modes
o Also one or more bits in the instruction format can be used as a mode field and the value of the mode field
determines which addressing mode is to be used.
 The EA will be either a main memory address or a register in a system without virtual memory while EA is a
virtual address or a register in a virtual memory system.

An instruction format defines the layout of fields in the instruction. It considers issues such as instruction length (fixed or variable), the number of bits assigned to the opcode and to each operand reference, and how the addressing mode is determined.

Addressing
Usually the address field(s) in an instruction format are relatively small, but generally we would like to be able to reference a large range of locations in main memory or, for some systems, virtual memory. A variety of addressing techniques have therefore been employed to achieve this objective. These involve trade-offs between address range and/or addressing flexibility on one hand and the number of memory references and/or the complexity of address calculation on the other. The most common addressing techniques are discussed below.
i. Immediate: Is the simplest form of addressing in which the operand is actually present in the instruction. i.e.
OPERAND = A
This mode can be used to define and use constants or set initial values of variables. Its advantage is that no memory
reference other than the instruction fetch is required to obtain the operand. The disadvantage is that the size of the
number is restricted to the size of the address field, which, in most instruction sets is small compared with the word
length.

ii. Direct Addressing: Is a simple form of addressing in which the address field contains the effective address of
the operand. i.e. EA = A
The technique was common in earlier generations of computers and is still found on a number of small computer
systems. It requires only one memory reference and no special calculation but the limitation is that it provides only a
limited address space.
Page 16 of 29

Downloaded by Enock Momanyi ([email protected])


lOMoARcPSD|46245439

iii. Indirect Addressing: This solves the problem of the limited address range in direct addressing by having the
address field refer to the address of a word in memory, which in turn contains a full-length address of the
operand i.e. EA = (A).

The parentheses are to be interpreted as meaning “contents of”. The advantage is a larger address space, while the disadvantage is that instruction execution requires two memory references to fetch the operand: one to get its address and a second to get its value.

iv. Register Addressing: Is similar to direct addressing, the only difference being that the address field refers to a register rather than a main memory address i.e. EA = R

Advantages include

i. Only a small address field is needed in the instruction.


ii. No memory references are required.
Note that the memory access time for a register internal to the CPU is much less than that for a main memory address.
The disadvantage is that the address space is very limited.

NB: If register addressing is heavily used in an instruction set, this implies that the CPU registers will be heavily used.
Because of the severely limited number of registers (compared with main memory locations), their use in this manner
makes sense only if they are employed efficiently. If every operand is brought into a register from main memory,
operated on once, and then returned to main memory, then a wasteful intermediate step has been added. If, instead, the
operand in a register remains in use for multiple operations, then a real savings is achieved e.g. the intermediate result
in a calculation.

v. Register Indirect Addressing: Is analogous to indirect addressing. The only difference is whether the
address field refers to a memory location or a register i.e. EA = (R)
Advantages and disadvantages are similar, but register indirect addressing uses one less memory reference than indirect
addressing.

vi. Displacement Addressing: Combines the capabilities of direct addressing and register indirect addressing i.e.
EA = A + (R)

It requires that the instruction have two address fields, at least one of which is explicit. The value contained in one address field (value = A) is used directly. The other address field, or an implicit reference based on the opcode, refers to a register whose contents are added to A to produce the EA.

TASK: read about the three of the most common uses of displacement addressing. (Refer: William Stallings 5th Ed,
pgs 379-381).

vii. Stack Addressing: A stack is a linear array of locations, where items are appended to the top of the stack so that, at any given time, the block is partially filled. Associated with the stack is a pointer whose value is the address of the top of the stack. The stack pointer is maintained in a register; therefore references to stack locations in memory are in fact register indirect addresses.

The stack mode of addressing is a form of implied addressing. The machine instructions need not include a memory
reference but implicitly operate on the top of the stack. Stacks have not been common traditionally but are becoming quite
common in microprocessors.
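The EA rules above can be exercised in a small simulation; a minimal sketch with a hypothetical toy memory and register file, covering every mode except stack addressing (which is implicit):

```python
memory = {100: 500, 500: 7}    # toy main memory: address -> contents
registers = {1: 100}           # toy register file: register R1 holds 100

def operand(mode, field):
    """Return the operand for each addressing mode, per the EA rules above."""
    if mode == "immediate":          # operand = A
        return field
    if mode == "direct":             # EA = A
        return memory[field]
    if mode == "indirect":           # EA = (A)
        return memory[memory[field]]
    if mode == "register":           # EA = R; operand is the register's contents
        return registers[field]
    if mode == "register_indirect":  # EA = (R)
        return memory[registers[field]]
    if mode == "displacement":       # EA = A + (R); field is (A, register number)
        a, r = field
        return memory[a + registers[r]]
    raise ValueError(mode)

print(operand("immediate", 100))          # 100
print(operand("direct", 100))             # 500
print(operand("indirect", 100))           # 7   (memory[memory[100]])
print(operand("register", 1))             # 100
print(operand("register_indirect", 1))    # 500 (memory[registers[1]])
print(operand("displacement", (400, 1)))  # 7   (memory[400 + 100])
```

Note how the modes trade memory references for address range: immediate and register need no memory access, direct and register indirect need one, and indirect needs two.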

Instruction Format
An instruction format defines the layout of the bits of an instruction in terms of its constituent parts. It must include an opcode and, implicitly or explicitly, zero or more operands. It must also indicate the addressing mode for each operand; for most instruction sets, more than one instruction format is used. Some key design issues include:
 Instruction length: which affects and is affected by memory size, memory organization, bus structure, CPU
complexity and CPU speed.
 Allocation of bits: which could be determined by the following factors:
o Number of addressing modes
o Number of operands


o Register versus memory


o Number of register sets
o Address range etc

TASK: Read more on this from William Stallings 5th Ed, pgs 388 onwards. Please see the attached page 376 and Table 10.1 Basic Addressing Modes on page 377 for graphical representations and a summary.


CHAPTER 8: COMPLEX AND REDUCED INSTRUCTION SET COMPUTERS (CISC & RISC)

NB: CISC came earlier; RISC came later (1970s-80s) to overcome the increasing complexity of CISC processors.
There is a need to examine the general characteristics of, and the motivation for, RISC architecture. Even though such systems have been defined and designed in a variety of ways by different groups, the key elements they mostly share are:
 A large number of general-purpose registers, or the use of compiler technology to optimize register usage.
 A limited and simple instruction set.
 An emphasis on optimizing the instruction pipeline.

Why the Need for CISC (Complex Instruction Set Computers): CISC systems displayed a larger number of instructions and more complex instructions. The motivation was to simplify compilers and to improve performance. With the advent of HLLs (High-Level Languages), architects attempted to design machines that provided better support for HLLs. Simplifying compiler construction implied providing machine instructions that resembled HLL statements (the task of the compiler writer is to generate a sequence of machine instructions for each HLL statement). The other expectation, improving performance, implied that CISC would bring about smaller, faster programs.

The advantages of smaller programs include:


 They save memory space as a resource (but note that memory has become so inexpensive that this potential advantage is no longer compelling).
 Smaller programs should improve performance: there are fewer instructions to be fetched and, in a paging environment, smaller programs occupy fewer pages, thus reducing page faults.
Researchers, though, have found that CISC systems perform no better than RISC systems on these purported advantages.

Characteristic of RISC architectures include:


 One instruction per machine cycle.
 Register-to-register operation.
 Simple addressing mode.
 Simple instruction formats.

Explanations: A machine cycle is the time taken to fetch two operands from registers, perform an ALU operation and store the result in a register. Such instructions can be hardwired, eliminating the need for a microprogram to be accessed during instruction execution.

Register-to-register operation optimizes register use, so that frequently accessed operands remain in high-speed storage. This is unique to RISC systems, since others display memory-to-memory or mixed register/memory operations.

Almost all RISC instructions use simple register addressing which simplifies the instruction set and the control unit.
Finally, RISC systems use only one or a few instruction formats e.g. instruction length is fixed and aligned on word
boundaries.

The advantages of RISC may thus be summarized:


i. With respect to performance, more effective optimizing compilers can be developed.
ii. A CU built for those instructions executes faster than a comparable CISC.
iii. Instruction pipelining technique can be applied much more effectively with reduced instruction set.
iv. RISC programs should be more responsive to interrupts.

In general, it has been found that RISC and CISC systems may benefit from the inclusions of some features of each other.
Thus modern systems are no longer pure RISC or pure CISC.
RISC VS CISC controversy: The assessment of merits of the RISC approach can be grouped into two categories:
i. Quantitative: - Attempts to compare program size and execution speed of programs on RISC and CISC
machines that use comparable technology.
ii. Qualitative: - Examination of issues e.g. use of High Level Language support and optimum use of VLSIs.

Problems that do arise in comparing these systems include:-


i. There is no pair of RISC and CISC machines that are comparable in life cycle, cost, level of technology, gate complexity, compiler sophistication, operating system support, etc.
ii. No definitive set of test programs exists, i.e. performance varies with the program.

iii. It is difficult to sort out hardware effects from effects due to skill in compiler writing.
iv. Most of the comparative analysis of RISC has been done on “toy” machines rather than commercial products. Furthermore, most commercially available machines advertised as RISC possess a mixture of RISC and CISC characteristics. Thus a fair comparison with a commercial, “pure-play” CISC machine (e.g. a VAX or a Pentium) is difficult.

NOTES:

RISC:
 H/w is simpler.
 Instruction set is composed of a few basic steps of loading, evaluating and storing operations.
 Reduce cycles per instruction at the cost of instructions per program.
CISC:
 A single instruction will do all loading, evaluating and storing operations, hence complex.
 Minimize the number of instructions per program at the cost of an increase in the number of cycles per instruction.

CISC characteristics:
i. Complex instructions, hence complex instruction decoding.
ii. Instructions are larger than one word in size.
iii. An instruction may take more than a single clock cycle to execute.
iv. Fewer general-purpose registers, as operations are performed in memory itself.
v. Complex addressing modes.
vi. More data types.
vii. Does not make use of a pipelining strategy.

RISC characteristics:
i. Simple instructions, hence simple instruction decoding.
ii. Instructions fit within one word.
iii. An instruction takes a single clock cycle to execute.
iv. More general-purpose registers.
v. Simple addressing modes.
vi. Fewer data types.
vii. A pipeline can be achieved.

Further differences (CISC vs RISC):
i. Focus on hardware vs focus on software.
ii. Uses both hardwired and microprogrammed control units vs only a hardwired CU.
iii. Transistors used for storing complex instructions vs transistors used for more registers.
iv. Variable-size instructions vs fixed-size instructions.
v. Can perform reg-reg, reg-mem or mem-mem operations vs only reg-reg arithmetic operations.
vi. Requires fewer registers vs requires more registers.
vii. Code size is small vs code size is large.
viii. An instruction takes more than one cycle vs executes in a single clock cycle.
ix. Instructions are larger than one word vs instructions are one word in size.


CHAPTER 9: SUPERSCALAR PROCESSORS

 A superscalar processor is one in which multiple independent instruction pipelines are used. Each pipeline
consists of multiple stages so that each pipeline can handle multiple instructions at a time. Multiple pipelines
introduce a new level of parallelism, enabling multiple streams of instructions to be processed at a time. A
superscalar processor exploits what is known as instruction-level parallelism, which refers to the degree to which
the instructions of a program can be executed in parallel.
 A superscalar processor typically fetches multiple instructions at a time and then attempts to find nearby
instructions that are independent of one another and can therefore be executed in parallel. If the input to one
instruction depends on the output of a preceding instruction, then the latter instruction cannot complete execution
at the same time or before the former instruction. Once such dependencies have been identified, the processor may
issue and complete instructions in an order that differs from that of the original machine code.
 The processor may eliminate some unnecessary dependencies by the use of additional registers and the renaming
of register references in the original code.

Superscalar execution

Fig: Conceptual depiction of superscalar processing. A static program passes through instruction fetch & branch prediction, instruction dispatch, instruction issue, a window of execution, instruction execution, and finally instruction reorder & commit.

 Static program – written by the programmer.
 Instruction fetch/branch prediction – forms a dynamic stream of instructions; dependencies are examined.
 Window of execution – instructions are no longer serial but flow according to data dependencies and hardware resource availability.

Instructions finally are conceptually put back into sequential order and results recorded (instruction reorder and commit).
This final step is referred to as committing or retiring the instruction. This step is needed for the following reason:

Because of the use of parallel, multiple pipelines, instructions may complete in an order different from that shown in the
static program. Further, the use of branch prediction and speculative execution means that some instructions may complete
execution and then must be abandoned because the branch they represent is not taken. Therefore, permanent storage and
program-visible registers cannot be updated immediately when instructions complete execution. Results must be held in
some sort of temporary storage that is usable by dependent instructions and then made permanent when it is determined that
the sequential model would have executed the instruction.
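The in-order commit described above can be sketched with a simple reorder buffer: results may arrive out of order, but they are retired to program-visible state strictly in program order. A minimal sketch (the instruction indices and result names are hypothetical):

```python
def commit_in_order(completions):
    """completions: list of (program_index, result) pairs in the order
    execution finishes.  Results are buffered in a reorder buffer and
    retired to program-visible state strictly in program order."""
    buffer = {}      # reorder buffer: program index -> completed result
    retired = []     # program-visible state, only ever updated in order
    head = 0         # index of the next instruction allowed to retire
    for idx, result in completions:
        buffer[idx] = result
        while head in buffer:            # retire every instruction now ready
            retired.append(buffer.pop(head))
            head += 1
    return retired

# Instructions 0..3 finish out of order (2 finishes first) but retire in order:
print(commit_in_order([(2, "r3"), (0, "r1"), (1, "r2"), (3, "r4")]))
# ['r1', 'r2', 'r3', 'r4']
```

A real reorder buffer also discards entries from mispredicted branch paths instead of retiring them, which is exactly why the buffering stage is needed.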

Limitations of implementing superscalar systems

a. True data dependency – refers to the situation where the output of an earlier instruction becomes an input of a later instruction. Also referred to as flow dependency or write-read dependency e.g.

add r1, r2
move r3, r1

Which implies that the 2nd instruction can be fetched and decoded but cannot execute until the 1st instruction executes.


b. Procedural dependency – the presence of an instruction sequence complicates the pipeline operation. The
instructions following a branch have a procedural dependency on the branch and cannot be executed until the
branch is executed.

c. Resource conflicts – refers to the competition of two or more instructions for the same resource at the same time.
This problem can however be alleviated by duplicating resources or can be minimized by pipelining the
appropriate functional unit in a case where an operation takes a long time to complete.

d. Output dependency – also referred to as write-write dependency: a later instruction that writes the same register as an earlier instruction must not have its result overwritten by the earlier one, which may take longer to execute e.g.

I1: r3 <=r3 op r5
I2: r4 <= r3 + 1
I3: r3 <= r5 +1
I4: r7 <= r3 op r4

 I3 cannot complete execution before I1 in a case where I1 takes long to complete, since I1 would then overwrite r3 with a stale value.

e. Antidependency – also referred to as read-write dependency, is similar to true data dependency but reversed i.e. instead of the 1st instruction producing a value that the 2nd instruction uses, the 2nd instruction destroys a value that the 1st instruction uses e.g.

I1: r3 <= r3 op r5
I2: r4 <= r3 + 1
I3: r3 <= r5 + 1
I4: r7 <= r3 op r4

i.e. instruction I3 cannot complete execution before I2 begins execution and has fetched its operands, since I3 updates
register r3, which is a source operand for I2.
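The three register-based dependencies above (true, output and anti) can be detected mechanically by comparing each instruction's destination and source registers. A minimal Python sketch, using the example sequence from the text (the instruction encoding and function names are my own):

```python
# Sketch of a hazard detector for the register-based dependency types above.
# Each instruction is modeled as (name, destination_register, source_registers).

def find_hazards(instrs):
    """Return (kind, earlier, later) tuples for every dependent pair."""
    hazards = []
    for i, (n1, d1, s1) in enumerate(instrs):
        for n2, d2, s2 in instrs[i + 1:]:
            if d1 in s2:                 # later reads what earlier writes
                hazards.append(("true (write-read)", n1, n2))
            if d1 == d2:                 # both write the same register
                hazards.append(("output (write-write)", n1, n2))
            if d2 in s1:                 # later writes what earlier reads
                hazards.append(("anti (read-write)", n1, n2))
    return hazards

# The example sequence from the text:
program = [
    ("I1", "r3", {"r3", "r5"}),   # I1: r3 <= r3 op r5
    ("I2", "r4", {"r3"}),         # I2: r4 <= r3 + 1
    ("I3", "r3", {"r5"}),         # I3: r3 <= r5 + 1
    ("I4", "r7", {"r3", "r4"}),   # I4: r7 <= r3 op r4
]

for kind, a, b in find_hazards(program):
    print(f"{kind} dependency between {a} and {b}")
```

Among other pairs, this flags the output dependency between I1 and I3 and the antidependency between I2 and I3 discussed above; a real issue unit would use such checks to decide which instructions may issue in parallel.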

Superscalar implementation

The processor hardware required for the superscalar approach has the following key elements:
i. Instruction fetch strategies that simultaneously fetch multiple instructions, often predicting the outcomes of, and
fetching beyond, conditional branch instructions.
ii. Logic for determining true dependencies involving register values, and mechanisms for communicating these
values to where they are needed during execution.
iii. Mechanisms for initiating/issuing multiple instructions in parallel.
iv. Resources for parallel execution of multiple instructions.
v. Mechanisms for committing the process state in the correct order.

NB:
 A superscalar CPU is typically pipelined.
 Superscalar execution and pipelining are considered separate performance enhancement techniques.
 Superscalar – executes multiple instructions in parallel by using multiple execution units.
 Pipelining – executes multiple instructions in the same execution unit in parallel by dividing the execution unit into
different phases.

Task: See chapter 13, William Stallings for interested readers.


CHAPTER 10: PARALLEL PROCESSING

As computer technology has evolved and the cost of computer hardware has dropped, computer designers have sought more
opportunities for parallelism, usually to enhance performance, and in some cases to increase availability.

Three of the most prominent approaches to parallel processing are:


i. Symmetric Multiprocessing (SMP) – involves the use of multiple processors sharing a common
memory, but its organization raises the issue of cache coherence. It is the earliest and most common
example.
ii. Clusters – consist of multiple independent computers organized in a cooperative fashion. They have been
used to support workloads that are way above the capacity of a single SMP.
iii. Non-uniform Memory Access (NUMA) – new technology considered to be a possible alternative to the
two above.

A brief examination of each technology follows:

1. Symmetric Multiprocessing – refers to the computer hardware architecture and also to the operating system
behavior that reflects that architecture. An SMP is a standalone computer system with the following
characteristics:
 Two or more similar processors of comparable capability.
 The processors share the same memory and are interconnected by a bus or other internal connection
schemes such that memory access time is approximately the same for each processor.
 All the processors share access to I/O devices, either through the same channels or through different
channels that provide paths to the same device.
 All the processors can perform the same functions hence symmetric.
 The system is controlled by an integrated OS that provides interaction between processors and their
programs at the job, task, file and data element levels. The OS of an SMP schedules processes or threads
across all of the processors.

Advantages of SMP over uniprocessor architectures include:


 Performance is enhanced
 Availability i.e. failure of a single processor does not halt the machine.
 Incremental growth i.e. additional processors can be added.
 Scaling i.e. vendors can offer a range of products with different price and performance characteristics based on the
number of processors configured in the system.

NB: The existence of multiple processors is transparent to the user i.e. the OS takes care of scheduling work across them.

The organization of a multiprocessors system can be classified as:


i. Time-shared (common bus) – there are multiple processors and multiple I/O processors all attempting to
gain access to one or more memory modules via the bus. This approach has the advantage of being
simple, flexible and reliable.
ii. Multiport memory – allows for direct independent access of main memory modules by each processor
and I/O module. Logic associated with main memory is required for solving conflicts.

Adv:
 Provides better performance since each processor has a dedicated path to each memory module.
 It is also possible to configure portions of main memory as “private” to one or more processors and/or I/O
modules, thereby increasing security against unauthorized access etc.

Disadv:
 More complex than bus approach since more logic has to be added to the memory system.

NB: A write through policy should be used for cache control to alert other processors to a memory update.
iii. Central Control Unit – all logic for coordinating the multiprocessor configuration is concentrated in the
central control unit. The interfaces are as flexible and simple as the bus approach, but the control unit itself is
quite complex and a potential performance bottleneck.

Multiprocessor operating System design considerations


An SMP OS manages processor and other computer resources so that the user perceives a single OS controlling system
resources i.e. it may appear as a single-processor multiprogramming system. It is the responsibility of the OS to schedule
and allocate resources across multiple jobs.

Among the key design issues are:


i. Simultaneous concurrent processes – e.g. issues of deadlock and invalid operations must be avoided.
ii. Scheduling – properly directing processes to processors without conflicts.
iii. Synchronization – to enforce mutual exclusion and event-ordering.
iv. Memory management – must be ensured.
v. Reliability and fault tolerance – e.g. graceful degradation in the case of processor failure etc.

2. Clusters
Are particularly attractive to server applications. They are essentially a group of interconnected whole computers
working together as a unified computing resource that can create the illusion of being one machine.

Benefits (or objectives/design requirements)


i. Absolute scalability – clusters can grow very large, even surpassing the largest standalone machines.
ii. Incremental scalability – it is possible to add new systems to the cluster in small increments.
iii. High availability – failure of one node does not mean loss of service.
iv. Superior price/performance - it is possible to build a cluster with equal or greater computing power than a
single large machine at much lower cost.

Clusters Vs SMP
i. The main strength of the SMP approach is that it is easier to manage and configure than a cluster.
ii. An SMP also usually takes less physical space and draws less power than a comparable cluster.
iii. SMP products are well established and stable.

But,

 Clusters are far superior to SMP in terms of incremental and absolute growth.
 Also they are highly available in that all components of the system can readily be made highly redundant.

3. Non-Uniform Memory Access


These have just recently appeared for commercial use but are not as common as the above two.

In NUMA, the memory access time of a processor differs depending on which region of main memory is accessed. A
NUMA system without cache coherence is more or less equivalent to a cluster.

The objective of NUMA is to maintain a transparent system-wide memory while permitting multiple multiprocessor
nodes, each with its own bus or other internal interconnect system i.e. independent nodes with their own processors, L1,
L2 caches and main memory, but all the main memories are envisaged as one system wide addressable memory.

Some definitions:
i. Uniform Memory Access (UMA) – the memory access time of a processor to all regions of memory is
the same.
ii. NUMA – the memory access time of a processor differs depending on which region of main memory is
accessed.
iii. Cache-coherent NUMA (CC-NUMA) – a NUMA system in which cache coherence is maintained
among the caches of the various processors.

The limit on the number of processors in an SMP is one of the driving forces behind clusters. But in clusters, applications do not see a large global
memory; NUMA is thus one approach to achieving large-scale multiprocessing while retaining the flavor of SMP.

NUMA advantage
 CC-NUMA system can deliver effective performance at higher levels of parallelism than SMP without requiring
major software changes.
NUMA disadvantage
 A CC-NUMA does not transparently look like an SMP; software changes will be required to move an OS and
applications from an SMP to a CC-NUMA.


 Availability, which depends heavily on the exact implementation of the CC-NUMA, is hard/complex to
guarantee.

A brief history of Parallel Computing


 Earliest – only one program at a time.
 Then several programs interleaved to save time.
 Then multiprogramming – time sharing of the CPU.
 Then multiprocessing – two or more processors shared the work.
 Then SMP and MPP (massively parallel processors).
 Increases in the number of SMP processors reach a limit at which performance starts to degrade.
 To get around this, CLUSTER and NUMA were created, which use a message passing system i.e. no broadcasting of
messages; messages are sent directly to the affected programs.
Clusters:
Are a set of computers that work together so that they can be viewed as a single system, and the components are usually
connected to each other through fast LANs, with each node running its own instance of an OS – though at times they share the
same h/w and OS.

NUMA:
Is a way to configure clusters so that they can share memory locally, thereby improving performance and the system's ability
to be expanded – can be thought of as a “cluster in a box”.
It adds an intermediate level of memory shared among a few microprocessors so that all data accesses don’t have to travel on
the main bus.


CHAPTER 11: BASICS OF DIGITAL LOGIC AND HARDWARE CONSTRUCTION

The digital circuitry in digital computers and other digital systems is designed, and its behaviour analyzed, with the use of
Boolean algebra. Boolean algebra turns out to be a convenient tool in two ways:
i. Analysis – it is an economical way of describing the function of digital circuitry.
ii. Design – given a desired function, Boolean algebra can be applied to develop a simplified implementation
of that function.

It makes use of logical variables and operations. A variable takes on the value TRUE (1)
or FALSE (0). The basic logical operations are AND, OR and NOT, represented by a dot, a plus sign and an overbar e.g.
A AND B = A.B
A OR B = A + B
NOT A = Ā
The operation AND yields true if and only if both its operands are true. OR yields true if either or both of its operands
are true. NOT inverts the value of its operand e.g. the equation
D = A + NOT(B + C)
implies that D = 1 if A = 1 or if both B = 0 and C = 0.

Table 1. Boolean Operators


P Q NOT P P AND Q P OR Q P XOR Q P NAND Q P NOR Q
0 0 1 0 0 0 1 1
0 1 1 0 1 1 1 0
1 0 0 0 1 1 1 0
1 1 0 1 1 0 0 0

Note: In the absence of parentheses, the AND operation takes precedence over the OR operation. Also, when no ambiguity
will occur, the AND operation is represented simply by concatenation instead of the dot operator, thus;
A + B.C = A + (B.C) = A + BC
Which means:
“take the AND of B and C; then take OR of the result and A”.

Table 1 above defines the basic logical operations in a form known as a “truth table”, which simply lists the value of an
operation for every possible combination of operand values. From the table, note that;

 Exclusive-OR (XOR) – of two logical operands is 1 if exactly one of the operands has the value 1.
 The NAND function is the complement of the AND function, and NOR is the complement of OR.

A NAND B = NOT (A AND B)

A NOR B = NOT (A OR B)

These three new operations can be useful in implementing certain digital circuits.
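Since each operator in Table 1 is a simple function of P and Q, the table can be regenerated and checked mechanically. A minimal Python sketch (the dictionary layout and names are my own):

```python
# Each operator from Table 1 as a function on 0/1 values.
OPS = {
    "NOT P":    lambda p, q: 1 - p,
    "P AND Q":  lambda p, q: p & q,
    "P OR Q":   lambda p, q: p | q,
    "P XOR Q":  lambda p, q: p ^ q,
    "P NAND Q": lambda p, q: 1 - (p & q),   # complement of AND
    "P NOR Q":  lambda p, q: 1 - (p | q),   # complement of OR
}

# Print the truth table: one row per combination of operand values.
print("P Q  " + "  ".join(OPS))
for p in (0, 1):
    for q in (0, 1):
        row = "  ".join(str(f(p, q)).center(len(name)) for name, f in OPS.items())
        print(f"{p} {q}  {row}")
```

Running this reproduces the columns of Table 1 and is a quick way to sanity-check hand-filled truth tables.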

Gates

Gates are the building blocks of all digital circuits: a gate is an electronic circuit that produces an output signal that is a simple
Boolean operation on its input signals. The basic gates are AND, OR, NOT, NAND and NOR.

Gates can have one or more inputs. Usually not all gate types are used in an implementation, since design and fabrication are
simpler if only one or two types are used. It is therefore important to identify “functionally complete” sets of gates, as below;
 AND, OR, NOT
 AND, NOT
 OR, NOT
 NAND
 NOR

Note:
 AND, OR and NOT gates are functionally complete since they represent the three operations of Boolean algebra.


 AND and NOT gates form a functionally complete set by devising a way to synthesize the OR operation, e.g. by
applying De Morgan's theorem thus

A + B = NOT(Ā . B̄)
i.e. A OR B = NOT{(NOT A) AND (NOT B)}
Similarly, OR and NOT can be used to synthesize the AND operation, hence they also form a functionally complete set.

It should thus be noted that any Boolean function can be implemented solely with NAND gates, or solely with NOR gates.
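As a sketch of this functional completeness, NOT, AND and OR can each be synthesized from NAND alone (the function names are my own):

```python
# NAND is functionally complete: NOT, AND and OR can all be built from it.
def nand(a, b):
    return 1 - (a & b)

def not_(a):          # NOT A  =  A NAND A
    return nand(a, a)

def and_(a, b):       # A AND B  =  NOT (A NAND B)
    return nand(nand(a, b), nand(a, b))

def or_(a, b):        # De Morgan: A OR B  =  (NOT A) NAND (NOT B)
    return nand(nand(a, a), nand(b, b))

# Verify against the truth-table definitions for every input combination.
for a in (0, 1):
    for b in (0, 1):
        assert not_(a) == 1 - a
        assert and_(a, b) == (a & b)
        assert or_(a, b) == (a | b)
print("NAND-only synthesis of NOT, AND and OR verified")
```

The same construction works with NOR by duality, which is why NAND and NOR each appear alone in the list of functionally complete sets above.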


CHAPTER 12: ASSEMBLY LANGUAGE PROGRAMMING

The CPU can understand and execute machine instructions, which are simply binary numbers stored in the computer.
Programming in machine language would thus require the programmer to enter the program as binary data, which would be
extremely tedious.

Assembly language programming (ALP) was invented to address this. It uses symbolic names and addresses instead of binary data. Programs are then translated into machine
language by an “assembler”, which does two main things:
i. Performs symbolic translation of instructions and data items.
ii. Assigns some form of memory address to symbolic addresses.
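These two jobs can be illustrated with a toy two-pass assembler, written here as a Python sketch for a hypothetical instruction set (the mnemonics, opcodes and 16-bit encoding are all made up for the illustration):

```python
# Toy two-pass assembler: pass 1 assigns addresses to symbolic labels,
# pass 2 translates mnemonics to opcodes and resolves symbolic operands.

OPCODES = {"LOAD": 0x1, "ADD": 0x2, "STORE": 0x3, "JMP": 0x4}  # made-up encoding

def assemble(lines):
    # Pass 1: assign an address to every label (lines ending in ":").
    symbols, addr = {}, 0
    for line in lines:
        if line.endswith(":"):
            symbols[line[:-1]] = addr
        else:
            addr += 1
    # Pass 2: translate each instruction into a 16-bit word.
    code = []
    for line in lines:
        if line.endswith(":"):
            continue
        mnemonic, operand = line.split()
        value = symbols.get(operand)          # symbolic operand -> address
        if value is None:
            value = int(operand)              # otherwise a literal number
        code.append((OPCODES[mnemonic] << 8) | value)  # opcode in high byte
    return code

program = ["start:", "LOAD 7", "ADD 1", "JMP start"]
print([hex(w) for w in assemble(program)])  # → ['0x107', '0x201', '0x400']
```

Note how `JMP start` is assembled only after pass 1 has fixed the address of `start`; this is exactly why real assemblers make two passes (or back-patch forward references).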

ALP was the first step toward the HLLs in use today, and virtually all machines provide one, though few people make use of it. It is basically used for writing systems programs e.g. compilers and I/O routines.
They are basically for designing systems programs e.g. compilers and I/O routines.

The end
