COAU5
Earlier, when programming was done in assembly language, there was a need to
make each instruction do more work, because programming in assembly was tedious
and error prone. This led to the CISC architecture. With the rise of high-level
languages, dependency on assembly decreased and the RISC architecture prevailed.
Characteristics of RISC –
1. Simpler instructions, hence simpler instruction decoding.
2. Each instruction fits within one word.
3. Instructions take a single clock cycle to execute.
4. A larger number of general-purpose registers.
5. Simple addressing modes.
6. Fewer data types.
7. Pipelining can be achieved.
Characteristics of CISC –
1. Complex instructions, hence complex instruction decoding.
2. Instructions are larger than one word.
3. An instruction may take more than a single clock cycle to execute.
4. Fewer general-purpose registers, as operations are performed in
memory itself.
5. Complex addressing modes.
6. More data types.
In RISC, an operation such as add is therefore divided into parts, i.e. load,
operate and store. As a result, RISC programs are longer and require more memory
to store, but the processor requires fewer transistors because its instructions
are less complex.
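As a rough illustration of the load, operate, store split, the sketch below models a toy register machine in Python. The instruction names (`LOAD`, `ADD`, `STORE`), the register names and the memory layout are all hypothetical and chosen only for this example; they do not correspond to any real instruction set.

```python
# Toy load/store machine. Every name here is an illustrative assumption.
def run_risc(program, memory):
    """Execute a list of (op, *args) tuples and return the final memory."""
    regs = {}
    for op, *args in program:
        if op == "LOAD":        # LOAD rd, addr: register <- memory
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "ADD":       # ADD rd, rs1, rs2: register-to-register only
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "STORE":     # STORE rs, addr: memory <- register
            rs, addr = args
            memory[addr] = regs[rs]
        else:
            raise ValueError(f"unknown op {op}")
    return memory

# One memory-to-memory add becomes four simple instructions.
risc_add = [
    ("LOAD", "r1", 0),
    ("LOAD", "r2", 1),
    ("ADD", "r3", "r1", "r2"),
    ("STORE", "r3", 2),
]
memory = run_risc(risc_add, {0: 7, 1: 5, 2: 0})
print(memory[2])
```

The program is longer than a single memory-to-memory add would be, but each step is simple enough to decode and execute uniformly.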
Difference –
RISC                                      | CISC
Uses only a hardwired control unit        | Uses both hardwired and microprogrammed control units
Transistors are used for more registers   | Transistors are used for storing complex instructions
An instruction fits in one word           | Instructions are larger than one word
PIPELINE AND VECTOR PROCESSING
Parallel processing:
• Parallel processing is a term used for a large class of techniques that
are used to provide simultaneous data-processing tasks for the purpose of increasing the
computational speed of a computer system.
The system may have two or more ALUs so that it can execute two or more
instructions at the same time.
It can be achieved by having multiple functional units that perform same or different
operation simultaneously.
Separate the execution unit into eight functional units operating in parallel.
There are a variety of ways in which parallel processing can be classified.
UNIT-V
Architectural Classification:
– Flynn's classification
» Instruction Stream
» Data Stream
SISD represents an organization containing a single control unit, a processor unit and a
memory unit. Instructions are executed sequentially, and the system may or may not have
internal parallel processing capabilities.
SIMD represents an organization that includes many processing units under the
supervision of a common control unit.
MISD structure is of only theoretical interest since no practical system has been
constructed using this organization.
The main difference between a multicomputer system and a multiprocessor system is that the
multiprocessor system is controlled by one operating system that provides interaction
between processors, and all the components of the system cooperate in the solution of a
problem.
Pipeline Processing
Vector Processing
Array Processors
PIPELINING:
• The final result is obtained when data have passed through all segments.
OPERATIONS IN EACH PIPELINE STAGE:
• Space-Time Diagram
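A space-time diagram shows which task occupies each segment in each clock cycle. The sketch below generates one as text for the values k = 4 segments and n = 6 tasks quoted in these notes. The function name `space_time` and the `T1`, `T2`, ... labels are assumptions made for this illustration.

```python
def space_time(k, n):
    """Return one row per segment; each entry is the task in that segment per cycle."""
    cycles = k + n - 1                  # total clock cycles to finish n tasks
    rows = []
    for seg in range(1, k + 1):
        row = []
        for clk in range(1, cycles + 1):
            task = clk - seg + 1        # task i reaches segment seg at cycle i + seg - 1
            row.append(f"T{task}" if 1 <= task <= n else "--")
        rows.append(row)
    return rows

diagram = space_time(k=4, n=6)
for seg, row in enumerate(diagram, start=1):
    print(f"Segment {seg}: " + " ".join(row))
```

Each row is shifted one cycle to the right of the row above it, and the whole diagram spans k + n - 1 = 9 clock cycles.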
PIPELINE SPEEDUP:
n = 6 in previous example
k = 4 in previous example
The first task t1 requires k clock cycles to complete its operation since there
are k segments
• Speedup (S)
Example:
- 4-stage pipeline
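The speedup formula itself did not survive in these notes. The sketch below assumes the usual textbook form, S = (n * tn) / ((k + n - 1) * tp), where tn is the time to complete one task without pipelining and tp is the pipeline clock period. The function name and the 20 ns clock period are assumptions for illustration only.

```python
def pipeline_speedup(k, n, tp, tn=None):
    """Speedup of a k-segment pipeline over a non-pipelined unit for n tasks.

    tp: pipeline clock period; tn: non-pipelined time per task.
    If tn is omitted, it is assumed to equal k * tp.
    """
    if tn is None:
        tn = k * tp
    return (n * tn) / ((k + n - 1) * tp)

# k = 4 and n = 6 as in the text; tp = 20 is a hypothetical value.
s = pipeline_speedup(k=4, n=6, tp=20)
print(round(s, 3))

# For a very large number of tasks the speedup approaches k.
s_large = pipeline_speedup(k=4, n=10**6, tp=20)
print(round(s_large, 3))
```

Under the assumption tn = k * tp, the speedup is bounded above by the number of segments k and approaches it only as n grows.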
Types of Pipelining:
• Arithmetic Pipeline
• Instruction Pipeline
ARITHMETIC PIPELINE:
Pipeline arithmetic units are usually found in very high speed computers.
We will now discuss the pipeline unit for the floating point addition and subtraction.
The inputs to floating point adder pipeline are two normalized floating point numbers.
The floating point addition and subtraction can be performed in four segments.
Floating-point adder:
1) Compare exponents:
3 - 2 = 1
2) Align mantissas:
X = 0.9504 x 10^3
Y = 0.08200 x 10^3
3) Add mantissas:
Z = 1.0324 x 10^3
4) Normalize result:
Z = 0.10324 x 10^4
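The four segments can be sketched in Python using decimal arithmetic. The sketch assumes the original operands were X = 0.9504 x 10^3 and Y = 0.8200 x 10^2, which is what the exponent difference 3 - 2 = 1 implies. The function name `fp_add` and the (mantissa, exponent) tuple representation are choices made for this illustration, and the segments run one after another here rather than overlapping as they would in a real pipeline.

```python
from decimal import Decimal

def fp_add(x, y):
    """Add two decimal (mantissa, exponent) pairs in four segments."""
    (ma, ea), (mb, eb) = x, y
    # Segment 1: compare exponents.
    diff = ea - eb
    exp = max(ea, eb)
    # Segment 2: align the mantissa that belongs to the smaller exponent.
    if diff > 0:
        mb = mb.scaleb(-diff)
    elif diff < 0:
        ma = ma.scaleb(diff)
    # Segment 3: add mantissas.
    mz = ma + mb
    # Segment 4: normalize so that 0.1 <= |mantissa| < 1.
    while abs(mz) >= 1:
        mz = mz.scaleb(-1)
        exp += 1
    while mz != 0 and abs(mz) < Decimal("0.1"):
        mz = mz.scaleb(1)
        exp -= 1
    return mz, exp

mz, exp = fp_add((Decimal("0.9504"), 3), (Decimal("0.8200"), 2))
print(mz, exp)
```

The second normalization loop handles results that come out too small, which can happen when the two mantissas have opposite signs.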
Instruction Pipeline:
Pipeline processing can occur not only in the data stream but in the instruction stream
as well.
This causes the instruction fetch and execute segments to overlap and perform
simultaneous operations.
INSTRUCTION CYCLE:
[3] Calculate the effective address of the operand
* Effective address calculation can be done as part of the decoding phase
* Storage of the operation result into a register is done automatically in the execution phase
[2] DA: Decode the instruction and calculate the effective address of the operand
Pipeline Conflicts:
1) Resource conflicts: memory access by two segments at the same time. Most of these
conflicts can be resolved by using separate instruction and data memories.
Example: an instruction with register indirect mode cannot proceed to fetch the operand
if the previous instruction is loading the address into the register.
3) Branch difficulties: branch and other instructions (interrupt, ret, ...) that change the value
of PC.
Hardware interlocks: A circuit that detects the conflict situation and
delays the instruction by enough cycles to resolve the conflict.
Operand forwarding: Special hardware detects the conflict and
avoids it by routing the data through a special path between pipeline
segments.
Delayed loads: The compiler detects the data conflict and reorders the
instructions as necessary to delay the loading of the conflicting data by
inserting no-operation instructions.
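The delayed-load idea can be sketched as a small compiler pass. This sketch assumes a load result is unavailable for exactly one following instruction, and it only inserts a no-operation rather than reordering useful instructions. The tuple format (op, destination, sources) and the instruction names are hypothetical.

```python
NOP = ("NOP", None, ())

def insert_delayed_load_nops(program):
    """Insert a NOP after any LOAD whose result is read by the next instruction."""
    out = []
    for i, instr in enumerate(program):
        out.append(instr)
        op, dest, _sources = instr
        if op == "LOAD" and i + 1 < len(program):
            next_sources = program[i + 1][2]
            if dest in next_sources:    # data conflict with the following instruction
                out.append(NOP)
    return out

program = [
    ("LOAD", "R1", ("A",)),
    ("LOAD", "R2", ("B",)),
    ("ADD", "R3", ("R1", "R2")),
    ("STORE", "C", ("R3",)),
]
scheduled = insert_delayed_load_nops(program)
for instr in scheduled:
    print(instr)
```

Only the second load needs a NOP after it, because the instruction following the first load does not read R1.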
Branch Prediction
Delayed Branch
RISC Pipeline:
Since all operations are performed on registers, there is no need for effective address
calculation.
I: Instruction Fetch
A: ALU Operation
E: Execute Instruction
Delayed Load:
Delayed Branch:
Organization of Intel 8085 Micro-Processor:
The microprocessors that are available today come with a wide variety of capabilities and
architectural features. All of them, regardless of their diversity, are provided with at least the
following functional components, which form the central processing unit (CPU) of a classical
computer.
1. Register Section: A set of registers for temporary storage of instructions, data and
addresses of data.
2. Arithmetic and Logic Unit: Hardware for performing primitive arithmetic and logical
operations.
3. Interface Section: Input and output lines through which the microprocessor
communicates with the outside world.
4. Timing and Control Section: Hardware for coordinating and controlling the activities
of the various sections within the microprocessor and other devices connected to the
interface section.
The block diagram of the microprocessor along with the memory and Input/Output (I/O)
devices is shown in the Figure 11.1.
Intel Microprocessors:
The Intel 4004 was the first 4-bit microprocessor, introduced by Intel in 1971. After that, Intel
introduced its first 8-bit microprocessor, the 8008, in 1972.
These microprocessors could not last long as general-purpose microprocessors due to their
design and performance limitations.
In 1974, Intel introduced the first general purpose 8-bit microprocessor 8080 and this is the
first step of Intel towards the development of advanced microprocessor.
After 8080, Intel launched microprocessor 8085 with a few more features added to its
architecture, and it is considered to be the first functionally complete microprocessor.
The main limitations of the 8-bit microprocessors were their low speed, low memory
capacity, limited number of general purpose registers and a less powerful instruction set.
In the family of 16-bit microprocessors, Intel's 8086 was the first one, introduced in 1978.
The 8086 microprocessor has a much more powerful instruction set along with architectural
developments, which provided substantial programming flexibility and improvement over the
8-bit microprocessors.
Intel 8085 is the first popular microprocessor used by many vendors. Due to its simple
architecture and organization, it is easy to understand the working principle of a
microprocessor.
The programmable registers of the 8085 are shown in the Figure 11.2-
Figure 11.2: Register Organisation of 8085
Apart from these programmable registers, some other registers are also available which are
not accessible to the programmer. These registers include -
Instruction Register (IR).
Memory address and data buffers (MAR & MDR).
o MAR: Memory Address Register.
o MDR: Memory Data Register.
Temporary register for ALU use.
ALU of 8085 :
The 8-bit parallel ALU of 8085 is capable of performing the following operations –
Because of limited chip area, complex operations like multiplication and division are not
available in earlier processors like the 8085.
The five flag bits give the status of the microprocessor after an ALU operation.
The Carry (C) flag bit indicates whether there is any overflow from the MSB.
The Parity (P) flag bit is set if the parity of the accumulator is even.
The Auxiliary Carry (AC) flag bit indicates overflow out of bit-3 (lower nibble) in the same
manner as the C flag indicates overflow out of bit-7.
The Zero (Z) flag bit is set if the content of the accumulator after an ALU operation is zero.
The Sign (S) flag bit is set to the value of bit-7 of the accumulator, which reflects the sign of the
contents of the accumulator (positive or negative).
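The flag definitions above can be checked with a short sketch that adds two 8-bit values and derives the five flag bits. This is a simplified model that follows the descriptions in these notes, not a cycle-accurate 8085 emulation; the function name `add8_with_flags` and the dictionary layout are assumptions.

```python
def add8_with_flags(a, b):
    """Add two 8-bit values and derive the five flag bits described above."""
    total = a + b
    result = total & 0xFF                              # value left in the accumulator
    flags = {
        "C": int(total > 0xFF),                        # carry out of bit 7
        "AC": int(((a & 0x0F) + (b & 0x0F)) > 0x0F),   # carry out of bit 3
        "Z": int(result == 0),                         # result is zero
        "S": (result >> 7) & 1,                        # copy of bit 7
        "P": int(bin(result).count("1") % 2 == 0),     # even number of 1 bits
    }
    return result, flags

r1, f1 = add8_with_flags(0x9A, 0x66)
r2, f2 = add8_with_flags(0x3B, 0x4C)
print(hex(r1), f1)
print(hex(r2), f2)
```

The first call wraps around to zero and sets C, AC, Z and P; the second produces 87H, which sets AC, S and P.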
Microprocessor chips are equipped with a number of pins for communication with the outside
world. This is known as the system bus.
The interface lines of the Intel 8085 microprocessor are shown in the Figure 11.3 –
The AD0 - AD7 lines are used as the lower-order 8-bit address bus and the data bus in a
time-division multiplexed manner.
The A8 - A15 lines are used for the higher-order 8 bits of the address bus.
IO/M: Indicates memory access when LOW and I/O access when HIGH.
ALE: ALE is the address latch enable signal. This signal is HIGH when address information
is present on AD0 - AD7. The falling edge of ALE can be used to latch the address into an
external buffer to demultiplex the address bus.
READY: The READY line is used for communication with slow memory and I/O devices.
S0 and S1: The status of the system bus is defined by the S0 and S1 lines as follows -
S1 | S0 | Operation Specified
0  | 0  | Halt
0  | 1  | Memory or I/O WRITE
1  | 0  | Memory or I/O READ
1  | 1  | Instruction Fetch
There are ten lines associated with CPU and bus control -
TRAP, RST7.5, RST6.5, RST5.5 and INTR are the interrupt lines.
INTA: Interrupt acknowledge line.
RESET IN: This is the reset input signal to the 8085.
RESET OUT: The 8085 generates the RESET OUT signal in response to the
RESET IN signal; it can be used as a system reset signal.
HOLD: The HOLD signal is used for DMA request.
HLDA: The HLDA signal is used for DMA grant.
Clock and Utility Lines:
X1 and X2: X1 and X2 are provided to connect a crystal or an RC network for generating
the clock internal to the chip.
SID: Input line for serial data communication.
SOD: Output line for serial data communication.
Vcc and Vss: Power supply.
The block diagram of the Intel 8085 is shown in the Figure 11.4 -
Addressing Modes :
The 8085 has four different modes for addressing data stored in memory or in registers -
Direct: Bytes 2 and 3 of the instruction contain the exact memory address of the data item
(the low-order bits of the address are in byte 2, the high-order bits in byte 3).
Register: The instruction specifies the register or register pair in which the data are located.
Register Indirect: The instruction specifies a register pair which contains the memory address
where the data are located (the high-order bits of the address are in the first register of the
pair and the low-order bits in the second).
Immediate: The instruction contains the data itself. This is either an 8-bit quantity or a
16-bit quantity (least significant byte first, most significant byte second).
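The byte ordering described for the direct and register indirect modes can be made concrete with a small sketch. The helper names are hypothetical. The sample instruction bytes 3A 50 20 are used on the understanding that 3AH is the 8085 LDA (load accumulator direct) opcode, though the opcode is ignored by the helper; the address 2050H is an arbitrary example value.

```python
def direct_address(instruction_bytes):
    """Direct mode: byte 2 is the low-order address byte, byte 3 the high-order byte."""
    _opcode, low, high = instruction_bytes
    return (high << 8) | low

def register_pair_address(first, second):
    """Register indirect mode: first register holds the high byte, second the low byte."""
    return (first << 8) | second

# Three-byte instruction: opcode, low address byte, high address byte.
addr = direct_address(bytes([0x3A, 0x50, 0x20]))
print(hex(addr))

# Register pair with 20H in the first register and 50H in the second.
pair_addr = register_pair_address(0x20, 0x50)
print(hex(pair_addr))
```

Both calls yield the same 16-bit address, which shows that the in-memory order (low byte first) is the reverse of the register pair order (high byte first).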
A branch instruction can specify the address of the next instruction to be executed in one of
two ways -
Direct: The branch instruction contains the address of the next instruction to be executed .
Computer Organization and Architecture Chapter 8 : Multiprocessors
Chapter – 8
Multiprocessors
8.1 Characteristics of multiprocessors
A multiprocessor system is an interconnection of two or more CPUs with memory
and input-output equipment.
The term “processor” in multiprocessor can mean either a central processing unit
(CPU) or an input-output processor (IOP).
Multiprocessors are classified as multiple instruction stream, multiple data stream
(MIMD) systems.
The similarity and distinction between multiprocessor and multicomputer are
o Similarity
Both support concurrent operations
o Distinction
A multicomputer system (network) consists of several autonomous computers
that may or may not communicate with each other.
A multiprocessor system is controlled by one operating system that
provides interaction between processors and all the components of
the system cooperate in the solution of a problem.
Multiprocessing improves the reliability of the system.
The benefit derived from a multiprocessor organization is an improved system
performance.
o Multiple independent jobs can be made to operate in parallel.
o A single job can be partitioned into multiple parallel tasks.
Multiprocessing can improve performance by decomposing a program into
parallel executable tasks.
o The user can explicitly declare that certain tasks of the program be
executed in parallel.
This must be done prior to loading the program by specifying the
parallel executable segments.
o The other is to provide a compiler with multiprocessor software that can
automatically detect parallelism in a user’s program.
Multiprocessors are classified by the way their memory is organized.
o A multiprocessor system with common shared memory is classified as a
shared-memory or tightly coupled multiprocessor.
Tolerate a higher degree of interaction between tasks.
o Each processor element with its own private local memory is classified as
a distributed-memory or loosely coupled system.
Are most efficient when the interaction between tasks is minimal
Multiport Memory
A multiport memory system employs separate buses between each memory module and
each CPU.
The module must have internal control logic to determine which port will have access to
memory at any given time.
Memory access conflicts are resolved by assigning fixed priorities to each memory port.
Adv.:
o The high transfer rate can be achieved because of the multiple paths.
Disadv.:
o It requires expensive memory control logic and a large number of cables and
connections
Crossbar Switch
Consists of a number of crosspoints that are placed at intersections between processor
buses and memory module paths.
The small square in each crosspoint is a switch that determines the path from a processor
to a memory module.
Adv.:
o Supports simultaneous transfers from all memory modules
Disadv.:
o The hardware required to implement the switch can become quite large and
complex.
Below fig. shows the functional design of a crossbar switch connected to one memory
module.
Using the 2x2 switch as a building block, it is possible to build a multistage network to
control the communication between a number of sources and destinations.
o To see how this is done, consider the binary tree shown in Fig. below.
o Certain request patterns cannot be satisfied simultaneously, e.g., if P1 is connected
to one of the destinations 000 to 011, then P2 can be connected only to one of the
destinations 100 to 111.
One such topology is the omega switching network shown in Fig. below
Some request patterns cannot be connected simultaneously, e.g., any two sources cannot
be connected simultaneously to destinations 000 and 001.
In a tightly coupled multiprocessor system, the source is a processor and the destination
is a memory module.
Set up the path -> transfer the address into memory -> transfer the data
In a loosely coupled multiprocessor system, both the source and destination are
processing elements.
Hypercube System
The hypercube or binary n-cube multiprocessor structure is a loosely coupled system
composed of N = 2^n processors interconnected in an n-dimensional binary cube.
o Each processor forms a node of the cube, in effect it contains not only a CPU but
also local memory and I/O interface.
o Each processor address differs from that of each of its n neighbors by exactly one
bit position.
Fig. below shows the hypercube structure for n=1, 2, and 3.
Routing messages through an n-cube structure may take from one to n links from a
source node to a destination node.
o A routing procedure can be developed by computing the exclusive-OR of the
source node address with the destination node address.
o The resulting binary value has 1 bits corresponding to the axes on which the
two nodes differ. The message is then sent along any one of those axes.
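The exclusive-OR routing procedure can be sketched as follows. The text allows the message to travel along any axis on which the two addresses differ; this sketch arbitrarily resolves the lowest-numbered differing axis first, which is an assumption made only to keep the example deterministic. The function name `hypercube_route` is hypothetical.

```python
def hypercube_route(src, dst, n):
    """Return the list of node addresses visited from src to dst in an n-cube."""
    path = [src]
    node = src
    diff = src ^ dst                # 1 bits mark the axes on which the nodes differ
    for axis in range(n):
        if diff & (1 << axis):
            node ^= 1 << axis       # cross one link along this axis
            path.append(node)
    return path

path = hypercube_route(0b010, 0b001, n=3)
print([format(node, "03b") for node in path])
```

The number of links used equals the number of 1 bits in the exclusive-OR value, so a route never needs more than n links.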
A representative of the hypercube architecture is the Intel iPSC computer complex.
o It consists of 128 (n = 7) microcomputers. Each node consists of a CPU, a
floating-point processor, local memory, and serial communication interface units.
Interprocess Synchronization
The instruction set of a multiprocessor contains basic instructions that are used to
implement communication and synchronization between cooperating processes.
o Communication refers to the exchange of data between different processes.
o Synchronization refers to the special case where the data used to communicate
between processors is control information.
INTERPROCESSOR ARBITRATION
Computer systems contain a number of buses at various levels to facilitate the transfer of
information between components. The CPU contains a number of internal buses for transferring
information between processor registers and ALU.
A memory bus consists of lines for transferring data, address, and read/write information.
An I/O bus is used to transfer information to and from input and output devices.
A bus that connects major components in a multiprocessor system, such as CPUs, IOPs, and
memory, is called a system bus.
The processors in a shared memory multiprocessor system request access to common memory
or other common resources through the system bus. If no other processor is currently utilizing
the bus, the requesting processor may be granted access immediately.
Other processors may request the system bus at the same time. Arbitration must then be
performed to resolve this multiple contention for the shared resources. The arbitration logic
would be part of the system bus controller placed between the local bus and the system bus.
System Bus
A typical system bus consists of approximately 100 signal lines. These lines are divided into
three functional groups: data, address, and control. In addition, there are power distribution
lines that supply power to the components.
For example, the IEEE standard 796 multibus system has 16 data lines, 24 address lines, 26
control lines, and 20 power lines, for a total of 86 lines.
Data transfers over the system bus may be synchronous or asynchronous.
In a synchronous bus, each data item is transferred during a time slice known in advance to
both source and destination units. Synchronization is achieved by driving both units from a
common clock source.
In an asynchronous bus, each data item being transferred is accompanied by handshaking
control signals to indicate when the data are transferred from the source and received by
the destination.
The following table lists the 86 lines that are available in
the IEEE standard 796 multibus.
The processor whose arbiter has PI = 1 and PO = 0 is the one that is given control of the
system bus.
A processor may be in the middle of a bus operation when a higher priority processor requests
the bus. The lower-priority processor must complete its bus operation before it relinquishes
control of the bus.
When an arbiter receives control of the bus (because its PI = 1 and PO = 0) it examines the
busy line. If the line is inactive, it means that no other processor is using the bus. The arbiter
activates the busy line and its processor takes control of the bus. However, if the arbiter finds
the busy line active, it means that another processor is currently using the bus.
The arbiter keeps examining the busy line while the lower-priority processor that lost control of
the bus completes its operation.
When the bus busy line returns to its inactive state, the higher-priority arbiter enables the busy
line, and its corresponding processor can then conduct the required bus transfers.
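The PI/PO rule can be sketched as a chain of arbiters in which each arbiter's priority-out (PO) feeds the next arbiter's priority-in (PI). The sketch assumes the first arbiter in the chain has PI tied to 1 and that an arbiter passes priority on (PO = 1) only when it has PI = 1 and its processor is not requesting the bus. The busy line is not modelled, and the function name `daisy_chain` is hypothetical.

```python
def daisy_chain(requests):
    """requests[i] is True if processor i (0 = highest priority) wants the bus.

    Returns the index of the arbiter with PI = 1 and PO = 0 (or None),
    together with the (PI, PO) pair seen at every arbiter.
    """
    pi = 1                              # assumed: first arbiter's PI is tied to 1
    signals = []
    winner = None
    for i, req in enumerate(requests):
        po = 1 if (pi == 1 and not req) else 0
        signals.append((pi, po))
        if pi == 1 and po == 0:
            winner = i                  # this arbiter is given control of the bus
        pi = po                         # PO feeds the next arbiter's PI
    return winner, signals

winner, signals = daisy_chain([False, True, False, True])
print(winner, signals)
```

Processor 3 also requests the bus in this example, but it sees PI = 0 because processor 1, which sits earlier in the chain, has already blocked the priority signal.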