ICS2101 Computer Organisation
Recommended text(s)
1. Computer Organization and Architecture (5th Ed) By William Stallings, Prentice Hall
2. Introduction to Computer Science By C. S. French
CHAPTER 1: INTRODUCTION
Computer Organization
Refers to the operational units and their interconnections that realize the architectural specifications. (Architecture refers to those attributes of a system visible to the programmer, i.e. those attributes that have a direct impact on the logical execution of a program.) In effect, organizational attributes include those hardware details transparent to the programmer, e.g. control signals, interfaces between the computer and peripherals, and the memory technology used. An architecture may survive many years, but the organization changes with changing technology; generally, however, the relationship between organization and architecture is very close, especially with regard to microcomputers.
The building of complex systems takes a hierarchic structure, where the behavior at each level depends only on a simplified, abstracted characterization of the system at the next lower level. At each of these levels, the designer is only concerned with two main items:
a. Structure: which is the way in which the components are interrelated
b. Function: which is the operation of each individual component as part of that structure.
Structure
The computer displays four main structural components, namely:
CPU – which controls the operations of the computer and performs its data processing functions.
Main memory – which stores data and instructions.
I/O – which moves data between the computer and its external environment.
System interconnection – the mechanisms that provide for communication among the CPU, main memory and I/O.
Fig. 1: Software and hardware
In general, arrangements within the computer may be considered as multilayered with several s/ware levels sitting on
several h/ware levels (see fig 2 below)
Fig. 2
Explanations:
1. The Physical Device Layer: forms the electrical and electronic components of the computer. Some of the gadgets
to be seen here include transistors, capacitors, resistors, power supply components etc.
2. The Digital Logic Layer: caters for all the most basic operations of the machine. Elements at this layer can store,
manipulate and transmit data in the form of binary representations. These digital logic elements are called “gates”
which are normally constructed from a small number of transistors and other electronic components. Such devices
may be combined together to form computer processors, memories and such other units used for I/O.
3. The Microprogrammed Layer: interprets the machine language instructions from the machine layer and directly causes the digital logic elements to perform the required operations. In effect, it is a basic inner processor, driven by its own control program instructions held in its own private inner ROM. Such control programs may be termed "firmware" (i.e. software in ROM).
4. Machine Layer: is the lowest level at which a program can be written. It holds the machine language instructions
which can be directly interpreted by the hardware.
5. The Operating Systems Layer: controls the way in which all software use the underlying hardware. Also it hides
the complexities of the hardware from other s/ware by providing its own facilities which enable software to use
hardware more simply, and prevents other software from bypassing these facilities so that the hardware can only be
accessed by the OS.
6. The Higher Order Software Layer: covers all programs in languages other than machine language. They require
translation into machine code before they are executed.
7. The Applications Layer: represents the language of the computer as seen by the end-user.
Defn: A bus is a collection of parallel electrical conductors called “lines” onto which a number of components may be
connected. Such connectors, placed along the length of the bus have multiple electrical contacts.
3. Multi-board, bus-based computers: usually general-purpose computers, and too large to fit onto a single board.
Each board has a particular function, and all boards are interconnected by plugging them into individual slots on
one or more general-purpose buses. For example, one board may contain the processor, another main storage etc.
Many microcomputers and mainframes are based on this type of construction. In some cases, a primary board,
called “motherboard” exists for the processor and other main components into which other boards may be slotted.
Most architectures otherwise use two buses, arranged in the alternative layouts shown below (see Figures 3(a), (b) and (c)). In these layouts, data transfers between memory and the processor use a faster bus and are not held up waiting for the slower devices used for I/O.
Arrangement (b) is used more in larger microcomputers while (c) is used on larger minicomputers and mainframes. The main aims of these arrangements are:
To maximize the use of the processor by freeing it of the burden of controlling low-level I/O operations.
To maximize the speed and efficiency of I/O data transfers to and from memory
The processor:
Its functions are:
To control the use of main storage to store data and instructions
To control the sequence of operations
To give commands to all parts of the computer system
To carry out processing
Registers
Are special-purpose temporary storage locations within the processor. They include:
i. Memory data registers (MDR) – registers through which all data and instructions pass in and out of the
processor.
ii. Memory buffer registers (MBR) – registers through which all data and instructions pass in and out of main
storage.
iii. Prior to each transfer between the processor’s MDR and the main storage’s MBR, the exact source or
destination of the data in main storage must be specified. This is done by loading the appropriate location
address into the Memory address register (MAR).
iv. I/O units connected to the processor via a bus also have data buffer registers (DBR) similar to MBR.
Control Unit
Is the nerve center of the computer. It controls and coordinates all hardware operations, and operates in a two-stage cycle called the "fetch-execute" cycle. It automatically deals with instructions in the order in which they occur in main memory. It does this by using a register called the Program Counter (PC), or Sequence Control Register (SCR), which holds the location address of the next instruction to be performed.
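To make the cycle concrete, here is a minimal sketch (in C) of a fetch-execute loop for a hypothetical one-address machine; the opcodes, the instruction encoding (opcode*100 + address) and the memory contents are invented purely for illustration and do not describe any real processor.

#include <stdio.h>

/* Hypothetical one-address machine: an instruction word is opcode*100 + operand address. */
enum { OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3, OP_HALT = 9 };

int main(void) {
    int memory[16] = { 110, 211, 312, 900,     /* program: LOAD 10; ADD 11; STORE 12; HALT */
                       [10] = 7, [11] = 5 };   /* data */
    int pc = 0, acc = 0, running = 1;

    while (running) {
        int ir = memory[pc++];       /* FETCH: read the instruction the PC points at, then advance the PC */
        int opcode = ir / 100;       /* DECODE: split the word into opcode and address fields */
        int addr   = ir % 100;
        switch (opcode) {            /* EXECUTE */
            case OP_LOAD:  acc = memory[addr];   break;
            case OP_ADD:   acc += memory[addr];  break;
            case OP_STORE: memory[addr] = acc;   break;
            case OP_HALT:  running = 0;          break;
        }
    }
    printf("memory[12] = %d\n", memory[12]);    /* prints 12, i.e. 7 + 5 */
    return 0;
}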
Multiprocessor Systems
Refer to any computer containing more than one processor. The extra processors may be used
As additional main processors sharing the normal processing load.
As special-purpose processors catering for some particular function, e.g. a maths co-processor may be used in conjunction with a single main processor to perform some standard complex computations.
Where there are a number of main processors, there are two basic methods of using the processors:
a. Asymmetric multiprocessing (AMP) – in this case, one processor is the master and all other are subordinate to it.
The master processor has special privileges over the operating system.
b. Symmetric multiprocessing (SMP) – here all processors have equal rights.
NB: alternative architectures are emerging to increase computational power. These include:
Pipeline machines – in which each stage of fetch-execute cycle is handled by a separate machine hardware unit.
Array Processors – in which there is one control unit but multiple ALUs which work in parallel with one another.
Defn:
Pipelining - is the name given to a method of speeding up the fetch-execute cycle by fetching not just the next instruction but the next few that follow it in main storage. These pre-fetches are carried out if, during the execute part of the fetch-execute cycle, a brief interval occurs during which memory does not need to be accessed, e.g. while an arithmetic operation is in progress (i.e. there is no fetch activity).
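A rough illustration of the overlap idea, assuming a simple two-stage pipeline in which every fetch and every execute takes exactly one cycle (the instruction count and timing are invented for the example):

#include <stdio.h>

#define N 5   /* number of instructions in the illustrative program */

int main(void) {
    /* Two-stage pipeline: in cycle c, instruction c+1 is fetched while
       instruction c (fetched in the previous cycle) is executed.       */
    for (int cycle = 0; cycle <= N; cycle++) {
        if (cycle < N)  printf("cycle %d: fetch   I%d\n", cycle, cycle + 1);
        if (cycle > 0)  printf("cycle %d: execute I%d\n", cycle, cycle);
    }
    /* N instructions finish in N+1 cycles instead of 2N without overlap. */
    return 0;
}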
The major computer system components (processor, memory and I/O modules) need to be interconnected in order to
exchange data and control signals. The most popular means of interconnection is the use of a shared system bus consisting
of multiple lines. There exists, usually, a hierarchy of buses to improve performance.
Bus Interconnection
A bus is simply a communication pathway connecting two or more devices, and it is a shared medium, meaning only one
device can send at a time. Several bus lines may be used to transfer more bits per unit time.
Buses consist of between 50 and 100 lines, each line having its own function. The lines are classified into three groups:
i. Data lines – for moving data between system modules. There may be 8, 16, 32 etc. lines; this number is referred to as the bus 'width', and the wider the data bus the better the system performance.
ii. Address lines – designate the source/destination of the data on the data bus.
iii. Control lines - control access to and the use of the data and address lines. Control signals transmit both
command and timing information between system modules.
Interrupts are therefore needed primarily to improve processing efficiency, e.g. to avoid the fast processor idling while waiting for slow I/O devices.
I/O Function
An I/O module, e.g. a disk controller, can exchange data directly with the processor, i.e. the processor can read data from or write data to an I/O module. At certain times, it is worth allowing I/O exchanges to occur directly with memory, i.e. the processor grants to an I/O module the authority to read from or write to memory, enabling the I/O–memory transfer to occur without tying up the processor. This is referred to as Direct Memory Access (DMA) (a cycle-stealing strategy).
Also note that there are primarily three types of buses namely:
i. Address bus – carries memory addresses from the processor to other components, e.g. memory and I/O.
ii. Data bus – carries data between the processor and the other components.
iii. Control bus – carries control signals from the processor to the other components.
Internal Memory:
Computer memory exhibits a wide range of type, technology, organization, performance and cost. A typical computer
system is equipped with a hierarchy of memory subsystems, some internal to the system (i.e. directly accessed by the
processor) and some external (accessed by the processor via an I/O module).
Memory exists in a variety of forms such as semiconductor, magnetic, optical and magneto optical.
Memory is therefore built as a hierarchy to optimize the three constraints of cost, capacity and access time. Moving down the hierarchy, the following occur:
Decreasing cost per bit
Increasing capacity
Increasing access time
Decreasing frequency of access of the memory by the processor
There is also ROM, which contains a permanent pattern of data that cannot be changed. It can only be read, not written to. Its major application is in microprogramming. Others include library subroutines for frequently wanted functions, system programs and function tables.
The main advantage of ROM is that the data/program is permanently in main memory and need never be loaded from a secondary storage device, since the data is actually wired into the chip. However, this poses two main problems:
The data insertion step includes a relatively large fixed cost
There is no room for error i.e. if one bit is wrong the whole batch of ROMs must be discarded.
PROM – programmable ROM – is less expensive, non-volatile and may be written into only once, either by the supplier or the customer; writing requires special equipment. It is used when only a small number of ROMs with a particular memory content is needed.
Read mostly memories – e.g. EPROM, EEPROM and flash memory are useful for applications in which there are more
read operations than write operations but for which non-volatile storage is required.
Flash memory is intermediate between EPROM and EEPROM in both cost and functionality and uses semiconductor
technology.
Fig: Cache and main memory – word transfer between the CPU and the cache; block transfer between the cache and main memory.
Cache memory is intended to give memory speeds approaching that of the fastest memories available and, at the same time, provide a large memory size at the price of less expensive types of semiconductor memory. The cache contains a copy of portions of main memory; when the processor attempts to read a word of memory, a check is first made to determine whether the word is in the cache. If so, the word is delivered to the processor; if not, a block of main memory containing the word is read into the cache and the word is then delivered to the processor (a minimal lookup sketch follows the list below). The principles of locality of reference apply:
i. A program is likely to reference locations near those it has just referenced – spatial (physical) locality of reference.
ii. Repeated references to recently used locations are likely – temporal locality of reference.
iii. Replacement of cache contents uses principles such as FIFO, LFU, LRU, random, round robin etc.
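A minimal sketch of the read check described above, assuming a tiny direct-mapped cache; the line count, block size and memory size are invented for the illustration.

#include <stdio.h>
#include <stdbool.h>

#define LINES      4        /* hypothetical, tiny cache: 4 lines          */
#define WORDS_LINE 4        /* 4 words per line (block)                   */

struct line { bool valid; unsigned tag; int data[WORDS_LINE]; };

static struct line cache[LINES];
static int main_memory[256];             /* hypothetical main memory of 256 words */

/* Read one word, going through the cache first (direct mapping assumed). */
int read_word(unsigned addr) {
    unsigned word  = addr % WORDS_LINE;
    unsigned block = addr / WORDS_LINE;
    unsigned index = block % LINES;      /* which cache line the block maps to        */
    unsigned tag   = block / LINES;      /* identifies which block occupies that line */

    if (cache[index].valid && cache[index].tag == tag) {
        printf("addr %3u: hit\n", addr);                 /* word already in the cache */
    } else {
        printf("addr %3u: miss - load block %u\n", addr, block);
        for (unsigned w = 0; w < WORDS_LINE; w++)        /* fetch the whole block from memory */
            cache[index].data[w] = main_memory[block * WORDS_LINE + w];
        cache[index].valid = true;
        cache[index].tag = tag;
    }
    return cache[index].data[word];      /* deliver the word to the processor */
}

int main(void) {
    for (int i = 0; i < 256; i++) main_memory[i] = i * 10;
    read_word(3); read_word(2);   /* same block: miss then hit (spatial locality)   */
    read_word(3);                 /* repeated reference: hit (temporal locality)    */
    read_word(64);                /* maps to the same line: miss, block is replaced */
    return 0;
}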
i. Cache size – should be small enough so that the average cost per bit is close to that of main memory alone and
large enough so that the overall average access time is close to that of cache alone. Large caches tend to be
slower than smaller ones.
ii. Mapping function – because there are fewer cache lines than main memory blocks, an algorithm is needed for
mapping main memory blocks into cache lines. Three techniques can be used namely direct, associative and
set associative, each with its advantages and disadvantages.
a. Direct mapping – maps each block of memory into only one possible cache line.
b. Associative mapping – overcomes the disadvantage of direct mapping by permitting each memory block to be loaded into any line of the cache. There is therefore flexibility as to which block to replace when a new block is read into the cache. Its disadvantage is the complex circuitry required to examine the tags of all the cache lines in parallel (the tag field uniquely identifies a block of main memory, so that the block occupying a line can be recognized).
c. Set associative mapping – exhibits the strengths of both the direct and associative approaches while reducing their disadvantages (the sketch below shows how each technique splits a memory address into fields).
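As a rough illustration of how the three techniques interpret a main memory address, the sketch below splits a 16-bit address into its fields for direct, 2-way set associative and fully associative mapping. The sizes assumed (64 lines, 32 sets, 4-word blocks) are invented for the example.

#include <stdio.h>

int main(void) {
    unsigned addr  = 0xBEEF;             /* an arbitrary 16-bit main memory address     */
    unsigned word  = addr & 0x3;         /* 2 bits: word within a 4-word block          */
    unsigned block = addr >> 2;          /* remaining 14 bits identify the memory block */

    /* Direct mapping with 64 cache lines: each block maps to exactly one line. */
    printf("direct:          tag=%u line=%u word=%u\n", block >> 6, block & 0x3F, word);

    /* 2-way set associative with 32 sets: a block may go in either line of one set. */
    printf("set associative: tag=%u set=%u word=%u\n", block >> 5, block & 0x1F, word);

    /* Fully associative: the whole block number is the tag; the block may go in any line. */
    printf("associative:     tag=%u word=%u\n", block, word);
    return 0;
}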
Replacement algorithms
When a new block is brought into the cache, one of the existing blocks must be replaced. In the case of direct mapping, there is only one possible line for any particular block, i.e. no choice is possible. For associative and set associative mapping, a replacement algorithm is needed, the algorithm being implemented in hardware. The four main ones are (a minimal LRU sketch follows the list):
i. LRU – is the most effective and replaces the block in the set that has been in the cache longest with no
reference to it.
ii. FIFO – easily implemented as a round-robin or circular buffer technique and replaces on a FIFO basis.
iii. LFU – replaces that block in the set that has experienced the fewest references. In this case, each line may be
associated with a counter.
iv. Random – pick a line at random from among the candidate lines.
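A minimal sketch of LRU replacement for a single cache set, assuming a 4-way set and a per-line timestamp as the counter mentioned under LFU/LRU; the line structure and sizes are invented for illustration.

#include <stdio.h>
#include <stdbool.h>

#define WAYS 4    /* hypothetical 4-way set: 4 candidate lines per set */

struct line { bool valid; unsigned tag; unsigned last_used; };

static struct line set[WAYS];
static unsigned now;                     /* global "clock", incremented on every reference */

/* Reference the block identified by 'tag'; return the way it ends up in. */
int reference(unsigned tag) {
    /* 1. Check for a hit anywhere in the set. */
    for (int w = 0; w < WAYS; w++)
        if (set[w].valid && set[w].tag == tag) {
            set[w].last_used = ++now;    /* refresh the reference timestamp */
            return w;
        }
    /* 2. Miss: pick an empty line if any, otherwise the least recently used one. */
    int victim = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!set[w].valid) { victim = w; break; }
        if (set[w].last_used < set[victim].last_used) victim = w;
    }
    set[victim] = (struct line){ true, tag, ++now };   /* replace the LRU line */
    return victim;
}

int main(void) {
    unsigned refs[] = { 1, 2, 3, 4, 1, 5 };   /* block 5 should evict block 2, the LRU one */
    for (int i = 0; i < 6; i++)
        printf("block %u -> way %d\n", refs[i], reference(refs[i]));
    return 0;
}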
Write policy
Before a block in the cache can be replaced, it is necessary to consider whether it has been altered in the cache but not in the
main memory. If it has not then the old block in the cache may be overwritten otherwise main memory must be updated.
Since more than one device may have access to main memory, and many processors may attach to the same bus each with its own local cache, an alteration of a word in one cache invalidates the corresponding word in the other caches.
A technique called write through is used which ensures that all write operations are made to main memory as well as to the
cache to maintain validity. The disadvantage here is that it generates substantial memory traffic thus may create a
bottleneck.
Another technique is write back where updates are made only in the cache thus minimizing memory writes. The
disadvantage is that parts of main memory are invalid and so accesses by I/O modules are allowed only through the cache.
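The sketch below contrasts the two policies for a single cached word; the one-word "cache", the dirty flag and the values used are simplifications invented for the example.

#include <stdio.h>
#include <stdbool.h>

static int  memory_word = 0;     /* the word as main memory sees it             */
static int  cache_word  = 0;     /* the cached copy used by the processor       */
static bool dirty       = false; /* write-back only: cache differs from memory  */

void write_through(int value) {
    cache_word  = value;
    memory_word = value;         /* every write also goes to main memory */
}

void write_back(int value) {
    cache_word = value;          /* update only the cache ...                    */
    dirty = true;                /* ... and remember that memory is now stale    */
}

void evict(void) {               /* on replacement, a dirty line must be written back */
    if (dirty) { memory_word = cache_word; dirty = false; }
}

int main(void) {
    write_through(1);
    printf("write-through: cache=%d memory=%d\n", cache_word, memory_word);          /* 1 1        */
    write_back(2);
    printf("write-back:    cache=%d memory=%d (stale)\n", cache_word, memory_word);  /* 2 1        */
    evict();
    printf("after evict:   cache=%d memory=%d\n", cache_word, memory_word);          /* 2 2        */
    return 0;
}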
Number of caches
More recently the use of multiple caches has become the norm. There is the on-chip (on same chip as processor) and
external chip caches. The on-chip cache reduces the processor’s external bus activity and so speeds up execution times and
increases overall system performance. Also more recently, it has become common practice to split the cache into two – one
dedicated to instructions and the other to data.
Magnetic disks – use magnetic technology. The disk is a circular platter with a metal or plastic base coated with magnetizable material. A read/write head is positioned over the surface, and a read/write operation occurs as the platter rotates below it. During a write operation, magnetic patterns are recorded on the surface; during a read operation, a current of the same polarity as the pattern already recorded is generated. Data are written on tracks, about 500 – 2000 tracks per surface.
Data are transferred to and from the disk in blocks. Data are stored in block size regions known as sectors, about 10 – 100
sectors per track.
The RAID strategy replaces large-capacity disk drives with multiple smaller-capacity drives and distributes data in such a way as to enable simultaneous access to data from multiple drives, thereby improving I/O performance and allowing easier incremental increases in capacity. The RAID proposal also, effectively, increases redundancy.
Although the use of multiple heads simultaneously achieves higher I/O and transfer rates, the use of multiple devices also
increases probability of failure, but this is compensated for by RAID making use of stored parity information enabling the
recovery of data lost due to a disk failure.
Optical memory
The Compact Disk (CD) digital audio system was introduced in 1983. It is non-erasable and can store more than 60 minutes of audio information on one side. A variety of optical-disk products have since been produced, as below.
i. CD: - A non-erasable disk that stores digital audio information. The standard systems use 12cm disks and
records more than 60 minutes of uninterrupted playing time.
ii. CD-ROM: - Non-erasable disk used for storing computer data. Uses 12 cm disks storing more than 600 Mbytes.
iii. DVD: - Digital Video Disk. A technology for producing digitized, compressed representation of video
information as well as large volumes of other digital data.
iv. WORM: - Write once, read many. Is more easily written than CD-ROM. Size is 5 ¼", holding between 200–800 Mbytes of data.
v. Erasable Optical Disk: - Uses optical technology but can be easily erased and rewritten. Both 3.25” and
5.25” disks are in use with typical capacity of 650MB.
vi. Magneto-Optical Disk: - Uses optical technology for read and magnetic recording techniques assisted by
optical focusing. Both 3.25” and 5.25” disks are in use, capacities being above 1GB.
vii. Magnetic Tapes: - Use similar recording and reading techniques to disk systems. The tape is coated with magnetic oxide (similar to a home tape recorder system). Blocks of data on tape are separated by gaps called "inter-record" gaps. It is a serial access medium, unlike disks, which are direct access.
Magnetic tapes were the first kind of secondary memory and are still widely used as the low-cost, slowest
speed member of the memory hierarchy.
The reasons for not connecting directly the peripherals to the system bus include:
i. It’s impractical to incorporate the necessary logic of the wide variety of peripherals with various methods of
operations within the processor to control a range of devices.
ii. It’s impractical to use the high-speed system bus to communicate directly with the peripherals whose speeds
are much slower.
iii. Peripherals often use different data formats and word lengths from the computers to which they attach.
I/O module:
Module function: the major functions or requirements for an I/O module fall into the following categories:
i. Control and timing – which is the coordinating of the flow of traffic between internal resources e.g. main
memory, system buses etc, and the external devices e.g. control of data transfer from an external device to the
processor.
ii. Processor communication – which involves the following
Command decoding i.e. I/O module accepts command from the processor e.g. READ SECTOR etc
Data exchanges between the processor and module
Status reporting of the I/O module e.g. BUSY, READY etc
Address recognition - I/O module recognizes one unique address for each peripheral it controls.
iii. Device communication – implies/involves commands, status information and data.
iv. Data buffering – enables the I/O module to cope with the speed difference between the device and memory during I/O transfers.
v. Error detection – detects errors and reports them to the processor (a parity-bit sketch follows this list). Such errors may include
Mechanical/electrical malfunctions of device e.g. paper jam, bad disk track etc.
Unintentional changes to bit pattern as it is transmitted from device to I/O module e.g. use of parity bit to
detect transmission errors.
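As an illustration of the parity-bit idea mentioned above, the sketch below computes even parity for a byte and checks it after transmission; the byte values and the flipped bit are arbitrary examples.

#include <stdio.h>

/* Even parity: the parity bit is chosen so that the total number of 1 bits is even. */
unsigned parity_bit(unsigned char byte) {
    unsigned ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (byte >> i) & 1u;        /* count the 1 bits in the byte                   */
    return ones & 1u;                    /* 1 if the count is odd, so the total becomes even */
}

int main(void) {
    unsigned char sent = 0x5A;           /* 0101 1010: four 1 bits                  */
    unsigned p = parity_bit(sent);       /* 0, since the count is already even      */

    unsigned char received = sent ^ 0x08;            /* one bit flipped in transit  */
    if (parity_bit(received) != p)
        printf("parity error detected\n");           /* single-bit errors are caught */
    else
        printf("no error detected\n");
    return 0;
}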
Fig: Structure of the I/O system
Thus with techniques (i) and (ii) (programmed and interrupt-driven I/O), the processor is responsible for extracting data from main memory for output and storing data in main memory for input.
iii. Direct memory access (DMA) – the I/O module and memory exchange data directly without processor involvement. The DMA technique involves an additional module on the system bus. The DMA module is capable of mimicking the processor and indeed of taking over control of the system from the processor; it needs to do this in order to transfer data to and from memory over the system bus. For this purpose, the DMA module must use the bus only when the processor does not need it, or it must force the processor to suspend operation temporarily. The latter is referred to as "cycle stealing".
EXTERNAL DEVICES
Are those that provide a means of exchanging data between the external environment and the computer. They can be
classified as follows:-
i. Human readable:- Suitable for communicating with the computer user.
ii. Machine readable:- Communicates with equipment.
iii. Communication:- For communicating with remote devices.
Examples of (i) include video display terminals and printers, examples of (ii) include magnetic disks and tape systems.
An external device interfaces with an I/O module in the form of control, data and status signals. i.e.
Control signals- determine the function the device performs e.g. input or read, output or write e.t.c.
Data – occur in the form of a set of bits to be sent to or received from the I/O module.
Status signals- indicate the status of the device. E.g. READY/NOT READY
A bus (or highway): - hardware within the processor through which data signals pass from one of a choice of sources to one of a choice of destinations.
An interface: - hardware located on each channel adjacent to the processor. It converts control and data signals from the processor into forms usable by the device connected to the interface.
The problems of speed differences between the processor and peripherals lead to a variety of I/O transfer techniques. An example is simple I/O with devices such as:
i. Document readers
ii. Line printers: - have a data buffer that holds characters; once it is full, a single print action is started by a suitable instruction.
iii. Graph plotters: - for drawing purposes.
Multiplexing: This is another way of overcoming the difference in speed of hardware devices and the processors, by the
use of multiplexers. It involves transmitting character codes from a number of devices along a single channel.
The multiplexer has buffer registers and may operate either synchronously or asynchronously. Multiplexers are useful in handling data from a number of terminals placed far from the processor and connected via telephone links.
Fig: A multiplexer combining multiple transmissions onto a single channel
Interrupt: - Is a break into the normal automatic sequential control. It allows the control unit to fetch the first instruction of another program (the supervisor) or an instruction from another part of the same program, instead of fetching the next instruction in sequence as part of the fetch-execute cycle.
INPUT DEVICES
Problems of Data Entry
i. The data to be processed must be presented in a machine-sensible form i.e. the language of the particular input
device.
ii. The process of data collection involves a great many people, machines and expense.
iii. Data can originate in many forms.
The various data collection media and methods may be outlined as below:
i. On-line systems: - Where the computer is linked directly to data source e.g. a computer that controls a
machine or factory process.
ii. Key-to-diskette: - As used on PCs where data is entered onto magnetic media as an alternative to online
systems.
iii. Character recognition: - OCR or MICR techniques require that the source documents themselves are prepared in a machine-sensible form.
A super controller was therefore needed. This could only be achieved by the use of an internally stored program called the OS.
Defn: OS is a suite of programs that has taken over many of the functions once performed by human operators. Its role is
that of resource management and such resources may include processors, I/O devices, programs, storage and data.
Functions of OS:
Scheduling and loading of programs in order to provide a continuous sequence of processing or to provide
appropriate responses to events.
Control over hardware resources
Protecting hardware, software and data from improper use
Calling into main storage programs and subroutines as and when required
Passing of control from one job to another under a system of priority
Provision of error correction routines
Furnishing a complete record of all that happens
Communication with the computer operator.
a. Multiprocessing – where two or more processors are present in a computer and are sharing some or all of the
computer memory.
b. Multiprogramming – when more than one program in main memory is being processed, apparently at the same time
c. Batch processing – the job is not processed until it has been fully input
d. Remote job entry – batch processing where jobs are entered at a terminal remote from the computer and
transmitted into the computer
e. Interactive computing – if the computer and terminal user can communicate with each other
f. Conversational mode – where the response to the user message is immediate
g. Multi-access – if the computer allows interactive facilities to more than one user at a time
h. Timesharing – processor time is divided into small units and shared in turn between users
i. Real time system – capable of processing data quickly such that the results are available to influence the activity
currently taking place.
Comments:
First, virtually all computer architectures provide more than one of these addressing modes. The CU can determine
which addressing mode is used via several approaches such as
o Often, different opcodes will use different addressing modes
o Also one or more bits in the instruction format can be used as a mode field and the value of the mode field
determines which addressing mode is to be used.
The EA will be either a main memory address or a register in a system without virtual memory while EA is a
virtual address or a register in a virtual memory system.
An instruction format defines the layout of fields in the instruction. It considers issues such as instruction length (fixed or variable), the number of bits assigned to the opcode and each operand reference, and how the addressing mode is determined.
Addressing
Usually the address field(s) in an instruction format are relatively small. But generally we would like to be able to
reference a large range of locations in main memory or, for some systems, virtual memory. A variety of addressing
techniques have therefore been employed to achieve this objective. These involve some tradeoffs between address
range and/or addressing flexibility on one hand and the number of memory references and/or the complexity of address
calculation on the other hand. The most common addressing techniques are discussed below.
i. Immediate: Is the simplest form of addressing in which the operand is actually present in the instruction. i.e.
OPERAND = A
This mode can be used to define and use constants or set initial values of variables. Its advantage is that no memory
reference other than the instruction fetch is required to obtain the operand. The disadvantage is that the size of the
number is restricted to the size of the address field, which, in most instruction sets is small compared with the word
length.
ii. Direct Addressing: Is a simple form of addressing in which the address field contains the effective address of
the operand. i.e. EA = A
The technique was common in earlier generations of computers and is still found on a number of small computer
systems. It requires only one memory reference and no special calculation but the limitation is that it provides only a
limited address space.
iii. Indirect Addressing: This solves the problem of the limited address range in direct addressing by having the
address field refer to the address of a word in memory, which in turn contains a full-length address of the
operand i.e. EA = (A).
The parenthesis is to be interpreted as meaning contents of. The advantage is a larger address space while the
disadvantage is that the instruction execution requires two memory references to fetch the operand i.e. one, to get its
address and two, to get its value.
iv. Register Addressing: Is similar to direct addressing the only difference being that the address field refers to a
register rather than main memory address i.e. EA = R
Advantages include the fact that only a small address field is needed in the instruction and that no time-consuming memory references are required.
NB: If register addressing is heavily used in an instruction set, this implies that the CPU registers will be heavily used.
Because of the severely limited number of registers (compared with main memory locations), their use in this manner
makes sense only if they are employed efficiently. If every operand is brought into a register from main memory,
operated on once, and then returned to main memory, then a wasteful intermediate step has been added. If, instead, the
operand in a register remains in use for multiple operations, then a real savings is achieved e.g. the intermediate result
in a calculation.
v. Register Indirect Addressing: Is analogous to indirect addressing. The only difference is whether the
address field refers to a memory location or a register i.e. EA = (R)
Advantages and disadvantages are similar, but register indirect addressing uses one less memory reference than indirect
addressing.
vi. Displacement Addressing: Combines the capabilities of direct addressing and register indirect addressing i.e.
EA = A + (R)
It requires that the instruction have two address fields, at least one of which is explicit. The value contained in one address field (value = A) is used directly. The other address field, or an implicit reference based on the opcode, refers to a register whose contents are added to A to produce the EA.
TASK: read about the three of the most common uses of displacement addressing. (Refer: William Stallings 5th Ed,
pgs 379-381).
vii. Stack Addressing: A stack is a linear array of locations, where items are appended to the top of the stack so that, at any given time, the block is partially filled. Associated with the stack is a pointer whose value is the address of the top of the stack. The stack pointer is maintained in a register; therefore references to stack locations in memory are in fact register indirect addresses.
The stack mode of addressing is a form of implied addressing. The machine instructions need not include a memory
reference but implicitly operate on the top of the stack. Stacks have not been common traditionally but are becoming quite
common in microprocessors.
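A minimal sketch pulling the modes together: for the same instruction field A and register R it computes the operand (or its effective address) under each technique, using the document's EA notation. The memory contents and register value are arbitrary illustrative numbers.

#include <stdio.h>

int main(void) {
    int memory[64];
    for (int i = 0; i < 64; i++) memory[i] = i + 3;   /* arbitrary contents: memory[a] = a + 3 */

    int A = 5;        /* address field taken from the instruction       */
    int R = 10;       /* contents of the register named in the instruction */

    printf("immediate:          operand = A        = %d\n", A);
    printf("direct:             EA = A             = %d, operand = %d\n", A, memory[A]);
    printf("indirect:           EA = (A)           = %d, operand = %d\n", memory[A], memory[memory[A]]);
    printf("register:           operand = (R)      = %d\n", R);
    printf("register indirect:  EA = (R)           = %d, operand = %d\n", R, memory[R]);
    printf("displacement:       EA = A + (R)       = %d, operand = %d\n", A + R, memory[A + R]);
    /* Stack addressing is implied: the operand is whatever the stack pointer register points at. */
    return 0;
}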
Instruction Format
An instruction format defines the layout of the bits of an instruction in terms of its constituent parts. It must include an opcode and, implicitly or explicitly, zero or more operands. It must also indicate the addressing mode for each operand; for most instruction sets, more than one instruction format is used. Some key design issues include:
Instruction length: which affects and is affected by memory size, memory organization, bus structure, CPU
complexity and CPU speed.
Allocation of bits: which could be determined by the following factors:
o Number of addressing modes
o Number of operands
TASK: Read more on this from William Stallings 5th Ed, pgs 388 onwards. Please see the attached page 376 and Table 10.1 (Basic Addressing Modes) on page 377 for graphical representations and a summary.
CHAPTER 8: COMPLEX AND REDUCED INSTRUCTION SET COMPUTERS (CISC & RISC)
NB: CISC came earlier. RISC came later (1970s–80s) to overcome the increasing complexity of CISC processors.
There is need to examine the general characteristics of and the motivation for RISC architecture. Even though such systems
have been defined and designed in a variety of ways by different groups, the key elements they mostly share are:
A large number of general-purpose registers, or the use of compiler technology to optimize register usage.
A limited and simple instruction set.
An emphasis on optimizing the instruction pipeline.
Why the Need for CISC (Complex Instruction Set Computers): CISC systems displayed a large number of instructions
and more complex instructions. This was a motivation to simplify compilers and the desire to improve performance. With
the advent of HLL (High Level Languages), architects attempted to design machines that provided better support of HLLs.
Simplifying compiler construction implied that there were to be machine instructions that resembled HLL statements (the task of the compiler writer is to generate a sequence of machine instructions for each HLL statement). The other expectation, i.e. improving performance, implied that CISC would bring about smaller, faster programs.
Explanations: A machine cycle is the time taken to fetch two operands from registers, perform an ALU operation and store the result in a register. Such instructions can be hardwired, eliminating the need for a microprogram to be accessed during instruction execution.
Register-to-register operation optimizes register use, so that frequently accessed operands remain in high-speed storage. This is unique to RISC systems, since others display memory-to-memory or mixed register/memory operations.
Almost all RISC instructions use simple register addressing which simplifies the instruction set and the control unit.
Finally, RISC systems use only one or a few instruction formats e.g. instruction length is fixed and aligned on word
boundaries.
In general, it has been found that RISC and CISC systems may benefit from the inclusions of some features of each other.
Thus modern systems are no longer pure RISC or pure CISC.
RISC VS CISC controversy: The assessment of merits of the RISC approach can be grouped into two categories:
i. Quantitative: - Attempts to compare program size and execution speed of programs on RISC and CISC
machines that use comparable technology.
ii. Qualitative: - Examination of issues e.g. High Level Language support and optimum use of VLSI. However, such assessments face several problems:
iii. It is difficult to sort out hardware effects from effects due to skill in compiler writing.
iv. Most of the comparative analysis of RISC has been done on "toy" machines rather than commercial products. Furthermore, most commercially available machines advertised as RISC possess a mixture of RISC and CISC characteristics. Thus a fair comparison with a commercial, "pure-play" CISC machine (e.g. a VAX or a Pentium) is difficult.
NOTES:
RISC:
H/w is simpler.
Instruction set is composed of a few basic steps of loading, evaluating and storing operations.
Reduce cycles per instruction at the cost of instructions per program.
CISC:
A single instruction will do all loading, evaluating and storing operations, hence complex.
Minimize the number of instructions per program at the cost of an increase in the number of cycles per instruction.
A superscalar processor is one in which multiple independent instruction pipelines are used. Each pipeline
consists of multiple stages so that each pipeline can handle multiple instructions at a time. Multiple pipelines
introduce a new level of parallelism, enabling multiple streams of instructions to be processed at a time. A
superscalar processor exploits what is known as instruction-level parallelism, which refers to the degree to which
the instructions of a program can be executed in parallel.
A superscalar processor typically fetches multiple instructions at a time and then attempts to find nearby instructions that are independent of one another and can therefore be executed in parallel. If the input to one instruction depends on the output of a preceding instruction, then the latter instruction cannot complete execution at the same time as, or before, the former instruction. Once such dependencies have been identified, the processor may issue and complete instructions in an order that differs from that of the original machine code.
The processor may eliminate some unnecessary dependencies by the use of additional registers and the renaming
of register references in the original code.
Fig: Superscalar execution – a static program passes through instruction fetch and branch prediction into a window of execution, then to instruction execution, and finally to instruction reorder and commit.
Instructions finally are conceptually put back into sequential order and results recorded (instruction reorder and commit).
This final step is referred to as committing or retiring the instruction. This step is needed for the following reason:
Because of the use of parallel, multiple pipelines, instructions may complete in an order different from that shown in the
static program. Further, the use of branch prediction and speculative execution means that some instructions may complete
execution and then must be abandoned because the branch they represent is not taken. Therefore, permanent storage and
program-visible registers cannot be updated immediately when instructions complete execution. Results must be held in
some sort of temporary storage that is usable by dependent instructions and then made permanent when it is determined that
the sequential model would have executed the instruction.
a. True data dependency – refers to the situation where the output of an earlier instruction becomes an input of a later instruction. Also referred to as flow dependency or write-read dependency, e.g.
add r1, r2
move r3, r1
Which implies that the 2nd instruction can be fetched and decoded but cannot execute until the 1st instruction executes.
b. Procedural dependency – the presence of branches in an instruction sequence complicates the pipeline operation. The instructions following a branch have a procedural dependency on the branch and cannot be executed until the branch is executed.
c. Resource conflicts – refers to the competition of two or more instructions for the same resource at the same time.
This problem can however be alleviated by duplicating resources or can be minimized by pipelining the
appropriate functional unit in a case where an operation takes a long time to complete.
d. Output dependency – also referred to as write-write dependency, refers to a situation where two instructions write to the same register; the later instruction must not complete its write before the earlier one does, otherwise the earlier (possibly slower) instruction would overwrite the later result, e.g.
I1: r3 <=r3 op r5
I2: r4 <= r3 + 1
I3: r3 <= r5 +1
I4: r7 <= r3 op r4
I3 cannot complete execution before I1; if I1 took long to complete and finished after I3, it would overwrite r3 with a stale value.
e. Antidependency – also referred to as read-write dependency, is similar to true data dependency but reversed, i.e. instead of the 1st instruction producing a value that the 2nd instruction uses, the 2nd instruction destroys a value that the 1st instruction uses, e.g.
I1: r3 <= r3 op r5
I2: r4 <= r3 + 1
I3: r3 <= r5 + 1
I4: r7 <= r3 op r4
i.e. instruction I3 cannot complete execution before I2 begins execution and has fetched its operands since I3 updates
register r3 which is a source of operand for I2
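Register renaming (the use of additional registers mentioned earlier) removes these last two kinds of dependency. The sketch below gives every new write to an architectural register a fresh physical register, so the two writes to r3 above no longer clash; the register counts and the tiny instruction list are invented for illustration.

#include <stdio.h>

#define ARCH_REGS 8

static int rename_map[ARCH_REGS];     /* architectural register -> physical register holding its latest value */
static int next_physical = ARCH_REGS; /* fresh physical registers are allocated from here upwards             */

/* Rename "dst <= src1 op src2"; pass src2 = -1 for an immediate operand. */
void rename(const char *insn, int dst, int src1, int src2) {
    printf("%-20s ->  P%d <= P%d", insn, next_physical, rename_map[src1]);
    if (src2 >= 0)
        printf(" op P%d", rename_map[src2]);
    printf("\n");
    rename_map[dst] = next_physical++;  /* the destination gets a brand-new physical register */
}

int main(void) {
    for (int r = 0; r < ARCH_REGS; r++) rename_map[r] = r;   /* initially r_i lives in P_i */

    rename("I1: r3 <= r3 op r5", 3, 3, 5);    /* P8  <= P3 op P5                                  */
    rename("I2: r4 <= r3 + 1",   4, 3, -1);   /* P9  <= P8                                        */
    rename("I3: r3 <= r5 + 1",   3, 5, -1);   /* P10 <= P5  - no longer conflicts with I1 or I2   */
    rename("I4: r7 <= r3 op r4", 7, 3, 4);    /* P11 <= P10 op P9                                 */
    return 0;
}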
Superscalar implementation
The processor hardware required for the superscalar approach has the following key elements:
i. Instruction fetch strategies that simultaneously fetch multiple instructions, often predicting the outcomes of, and
fetching beyond conditional branch instructions.
ii. Logic for determining true dependencies involving register values, and mechanisms for communicating these values to where they are needed during execution.
iii. Mechanisms for initiating/issuing multiple instructions in parallel.
iv. Resources for parallel execution of multiple instructions etc.
v. Mechanisms for committing the process state in the correct order.
NB:
Superscalar CPU is typically pipelined.
Superscalar and pipelining are considered as separate performance enhancement techniques.
Superscalar – executes multiple instructions in parallel by using multiple execution units.
Pipelining – executes multiple instructions in the same execution unit in parallel by dividing the execution unit into different phases.
As computer technology has evolved and the cost of computer hardware has dropped, computer designers have sought more
opportunities for parallelism, usually to enhance performance, and in some cases to increase availability.
1. Symmetric Multiprocessing – refers to computer hardware architecture and also to the operating system
behavior that reflects that architecture. They are standalone computer systems with the following
characteristics:
Two or more similar processors of comparable capability.
The processors share the same memory and are interconnected by a bus or other internal connection
schemes such that memory access time is approximately the same for each processor.
All the processors share access to I/O devices, either through the same channels or through different
channels that provide paths to the same device.
All the processors can perform the same functions hence symmetric.
The system is controlled by an integrated OS that provides interaction between processors and their
programs at the job, task, file and data element levels. The OS of an SMP schedules processes or threads
across all of the processors.
NB: The existence of multiple processors is transparent to the user, i.e. the OS takes care of it.
Adv:
Provides better performance since each processor has a dedicated path to each memory module.
It is also possible to configure portions of main memory as “private” to one or more processors and/or I/O
modules, thereby increasing security against unauthorized access etc.
Disadv:
More complex than bus approach since more logic has to be added to the memory system.
NB: A write through policy should be used for cache control to alert other processors to a memory update.
iii. Central Control Unit – all logic for coordinating the multiprocessor configuration is concentrated in the central control unit. The approach is flexible and simple, but the control unit itself is quite complex and is a potential performance bottleneck.
An SMP OS manages processor and other computer resources so that the user perceives a single OS controlling system resources, i.e. it may appear as a single-processor multiprogramming system. It is the responsibility of the OS to schedule and allocate resources among multiple jobs.
2. Clusters
Are particularly attractive to server applications. They are essentially a group of interconnected whole computers
working together as a unified computing resource that can create the illusion of being one machine.
Clusters Vs SMP
i. The main strength of SMP approach is that it is easier to manage and configure than a cluster.
ii. SMP also usually does take less physical space and draws less power than a comparable cluster.
iii. SMP products are well established and stable.
But,
Clusters are far superior to SMP in terms of incremental and absolute growth.
Also they are highly available in that all components of the system can readily be made highly redundant.
In NUMA, the memory access time of a processor differs depending on which region of main memory is accessed. A
NUMA system without cache coherence is more or less equivalent to a cluster.
The objective of NUMA is to maintain a transparent system-wide memory while permitting multiple multiprocessor
nodes, each with its own bus or other internal interconnect system i.e. independent nodes with their own processors, L1,
L2 caches and main memory, but all the main memories are envisaged as one system wide addressable memory.
Some definitions:
i. Uniform Memory Access (UMA) – the memory access time of a processor to all regions of memory is
the same.
ii. NUMA – the memory access time of a processor differs depending on which region of main memory is
accessed.
iii. Cache-coherent NUMA (CC-NUMA) – a NUMA system in which cache coherence is maintained
among the caches of the various processors.
The limit on the number of processors in an SMP is one of the driving forces behind clusters. However, in a cluster, applications do not see a large global memory; NUMA is thus one approach to achieving large-scale multiprocessing while retaining the flavor of SMP.
NUMA advantage
CC-NUMA system can deliver effective performance at higher levels of parallelism than SMP without requiring
major software changes.
NUMA disadvantage
Since a CC-NUMA system does not transparently look like an SMP, software changes will be required to move an OS and applications from an SMP to a CC-NUMA system.
Availability is a concern, since it depends heavily on the exact implementation of the CC-NUMA system, which is hard/complex to implement.
NUMA:
Is a way to configure clusters so that they can share memory locally, thereby improving performance and the system's ability to be expanded – it can be thought of as a "cluster in a box".
Adds an intermediate level of memory shared among a few microprocessors, so that not all data accesses have to travel on the main bus.
The digital circuitry in digital computers and other digital systems is designed, and its behaviour analyzed, with the use of Boolean algebra. Boolean algebra turns out to be a convenient tool in two ways:
i. Analysis – it is an economical way of describing the function of digital circuitry.
ii. Design – given a desired function, Boolean algebra can be applied to develop a simplified implementation
of that function.
It makes use of logical variables and logical operations. A variable takes on the value TRUE (1) or FALSE (0). The basic logical operations are AND, OR and NOT, represented by a dot, a plus sign and an overbar respectively, e.g.
A AND B = A·B
A OR B = A + B
NOT A = Ā
The operation AND yields true if and only if both its operands are true. OR yields true if either or both of its operands are true. NOT inverts the value of its operand, e.g. the equation
D = A + (B̄·C̄)
implies that D = 1 if A = 1 or if both B = 0 and C = 0.
Note: In the absence of parentheses, the AND operation takes precedence over the OR operation. Also, when no ambiguity will occur, the AND operation is represented by simple concatenation instead of the dot operator, thus;
A + B·C = A + (B·C) = A + BC
Which means:
“take the AND of B and C; then take OR of the result and A”.
Table 1 above defines the basic logical operations in a form known as “truth table”, which simply lists the value of an
operation for every possible combination of values of operands. The following imply thus;
Exclusive-OR (XOR) – of two logical operands is 1 if and only if exactly one of the operands has the value 1.
The NAND function is the complement of the AND function and NOR the complement of OR.
These three new operations can be useful in implementing certain digital circuits.
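A minimal sketch of these operations, printing a truth table like the one referred to above; operands are represented as ints with the values 0 and 1.

#include <stdio.h>

int main(void) {
    printf(" A B | AND OR NOT(A) XOR NAND NOR\n");
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++) {
            int a_and_b = a & b;     /* 1 only when both operands are 1       */
            int a_or_b  = a | b;     /* 1 when either or both operands are 1  */
            int not_a   = !a;        /* inversion of A                        */
            int a_xor_b = a ^ b;     /* 1 when exactly one operand is 1       */
            /* NAND and NOR are simply the complements of AND and OR. */
            printf(" %d %d |  %d   %d    %d     %d     %d    %d\n",
                   a, b, a_and_b, a_or_b, not_a, a_xor_b, !a_and_b, !a_or_b);
        }
    return 0;
}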
Gates
These are the building blocks of all digital circuits, i.e. electronic circuits that produce an output signal that is a simple Boolean operation on their input signals. The basic gates are AND, OR, NOR, NAND and NOT.
Gates can have one or more inputs. Usually not all gate types are used in an implementation, since design and fabrication are simpler if only one or two types are used. It is therefore important to identify "functionally complete" sets of gates, as below;
AND, OR, NOT
AND, NOT
OR, NOT
NAND
NOR
Note:
AND, OR and NOT gates are functionally complete since they represent the three operations of Boolean algebra.
AND and NOT gates form a functionally complete set, since the OR operation can be synthesized from them, e.g. by applying De Morgan's theorem thus
A + B = NOT(NOT A · NOT B), i.e. A OR B = NOT{(NOT A) AND (NOT B)}
Similarly, OR and NOT can be used to synthesize the AND operation, hence they too are functionally complete.
It should thus be noted that any Boolean function can be implemented solely with NAND gates or solely with NOR gates (the sketch below illustrates this for NAND).
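A minimal sketch showing NAND as a functionally complete gate: NOT, AND and OR are each built only from calls to a nand() function. Operand values are 0/1 ints.

#include <stdio.h>

/* The only primitive gate used in this sketch. */
int nand(int a, int b) { return !(a && b); }

int not_(int a)        { return nand(a, a); }              /* NOT A   = A NAND A             */
int and_(int a, int b) { return not_(nand(a, b)); }        /* A AND B = NOT(A NAND B)        */
int or_(int a, int b)  { return nand(not_(a), not_(b)); }  /* A OR B  = (NOT A) NAND (NOT B) */

int main(void) {
    printf(" A B | NOT(A) AND OR\n");
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            printf(" %d %d |   %d     %d   %d\n", a, b, not_(a), and_(a, b), or_(a, b));
    return 0;
}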
The CPU can understand and execute machine instructions, which are simply binary numbers stored in the computer. Programming in machine language would thus require the programmer to enter programs as binary data, which would be extremely tedious.
For this reason ALP (Assembly Language Programming) was invented. It uses symbolic names and addresses instead of binary data. Programs are then translated into machine language by an "assembler", which does two main things:
i. Performs symbolic translation of instructions and data items
ii. Assigns some form of memory address to symbolic addresses.
ALP was the first step towards the HLLs in use today, and virtually all machines provide an assembly language, though few people make use of it. It is basically used for writing systems programs, e.g. compilers and I/O routines.
The end